The Annals of Statistics

2005, Vol. 33, No. 2, 774–805


DOI 10.1214/009053604000001156
© Institute of Mathematical Statistics, 2005

GENERALIZED FUNCTIONAL LINEAR MODELS¹

BY HANS-GEORG MÜLLER AND ULRICH STADTMÜLLER


University of California, Davis, and Universität Ulm

We propose a generalized functional linear regression model for a
regression situation where the response variable is a scalar and the predictor
is a random function. A linear predictor is obtained by forming the scalar
product of the predictor function with a smooth parameter function, and
the expected value of the response is related to this linear predictor via
a link function. If, in addition, a variance function is specified, this leads
to a functional estimating equation which corresponds to maximizing a
functional quasi-likelihood. This general approach includes the special cases
of the functional linear model, as well as functional Poisson regression
and functional binomial regression. The latter leads to procedures for
classification and discrimination of stochastic processes and functional
data. We also consider the situation where the link and variance functions
are unknown and are estimated nonparametrically from the data, using a
semiparametric quasi-likelihood procedure.
An essential step in our proposal is dimension reduction by approximating
the predictor processes with a truncated Karhunen–Loève expansion. We
develop asymptotic inference for the proposed class of generalized regression
models. In the proposed asymptotic approach, the truncation parameter
increases with sample size, and a martingale central limit theorem is applied
to establish the resulting increasing dimension asymptotics. We establish
asymptotic normality for a properly scaled distance between estimated and
true functions that corresponds to a suitable $L^2$ metric and is defined through
a generalized covariance operator. As a consequence, we obtain asymptotic
tests and simultaneous confidence bands for the parameter function that
determines the model.
The proposed estimation, inference and classification procedures and
variants with unknown link and variance functions are investigated in a
simulation study. We find that the practical selection of the number of
components works well with the AIC criterion, and this finding is supported
by theoretical considerations. We include an application to the classification
of medflies regarding their remaining longevity status, based on the observed
initial egg-laying curve for each of 534 female medflies.

Received September 2001; revised March 2004.


¹Supported in part by NSF Grants DMS-99-71602 and DMS-02-04869.
AMS 2000 subject classifications. Primary 62G05, 62G20; secondary 62M09, 62H30.
Key words and phrases. Classification of stochastic processes, covariance operator, eigenfunctions,
functional regression, generalized linear model, increasing dimension asymptotics, Karhunen–Loève
expansion, martingale central limit theorem, order selection, parameter function, quasi-likelihood,
simultaneous confidence bands.

1. Introduction. Many studies involve tightly spaced repeated measurements
on the same individuals or direct recordings of a sample of curves [Brumback
and Rice (1998) and Staniswalis and Lee (1998)]. If longitudinal measurements
are made on a suitably dense grid, such data can often be regarded as a sample
of curves or as functional data. Examples can be found in studies on longevity
and reproduction, where typical subjects are fruit flies [Müller et al. (2001)] or
nematodes [Wang, Müller, Capra and Carey (1994)].
Our procedures are motivated by a study where the goal is to find out whether
there is information in the egg-laying curve observed for the first 30 days of life
for female medflies, regarding whether the fly is going to be long-lived or short-
lived. Discrimination and classification of curve data is of wide interest, from
engineering [Hall, Poskitt and Presnell (2001)] and astronomy [Hall, Reimann
and Rice (2000)] to DNA expression arrays with repeated measurements, where
dynamic classification of genes is of interest [Alter, Brown and Botstein (2000)].
For multivariate predictors with fixed dimension, such discrimination tasks are
often addressed by fitting binomial regression models using quasi-likelihood-based
estimating equations.
Given the importance of discrimination problems for curve data, it is clearly of
interest to extend notions such as logistic, binomial or Poisson regression to the
case of a functional predictor, which may often be viewed as a random predictor
process. More generally, there is a need for new models and procedures allowing
one to regress univariate responses of various types on a predictor process. The
extension from the classical situation with a finite-dimensional predictor vector
to the case of an infinite-dimensional predictor process involves a distinctly
different and more complicated technology. One characteristic feature is that
the asymptotic analysis involves increasing dimension asymptotics, where one
considers a sequence of increasingly larger models.
The functional linear regression model with functional or continuous response
has been the focus of various investigations [see Ramsay and Silverman (1997),
Faraway (1997), Cardot, Ferraty and Sarda (1999) and Fan and Zhang (2000)].
An applied version of a generalized linear model with functional predictors has
been investigated by James (2002). We assume here that the dependent variable
is univariate and continuous or discrete, for example, of binomial or Poisson
type, and that the predictor is a random function. The main idea is to employ a
Karhunen–Loève or other orthogonal expansion of the random predictor function
[see, e.g., Ash and Gardner (1975) and Castro, Lawton and Sylvestre (1986)], with
the aim of reducing the dimension to the first few components of such an expansion.
The expansion is therefore truncated at a finite number of terms which increases
asymptotically.
Once the dimension is reduced to a finite number of components, the expansion
coefficients of the predictor process determine a finite-dimensional vector of
random variables. We can then apply the machinery of generalized linear or
quasi-likelihood models [Wedderburn (1974)], essentially solving an estimating or
generalized score equation. The resulting regression coefficients obtained for the
linear predictor in such a model then provide us with an estimate of the parameter
function of the generalized functional regression model. This parameter function
replaces the parameter vector of the ordinary finite-dimensional generalized
linear model. We derive an asymptotic limit result (Theorem 4.1) for the
deviation between estimated and true parameter function for increasing dimension
asymptotics, referring to a situation where the number of components in the model
increases with sample size.
Asymptotic tests for the regression effect and simultaneous confidence bands
are obtained as corollaries of this main result. We include an extension to the
case of a semiparametric quasi-likelihood regression (SPQR) model in which link
and variance functions are unknown and are estimated from the data, extending
previous approaches of Chiou and Müller (1998, 1999), and also provide an
analysis of the AIC criterion for order selection.
The paper is organized as follows: The basics of the proposed generalized
functional linear model and some preliminary considerations can be found in
Section 2. The underlying ideas of estimation and statistical analysis within the
generalized functional linear model will be discussed in Section 3. The main
results and their ramifications are described in Section 4, preceded by a discussion
of the appropriate metric in which to formulate the asymptotic result, which
is found to be tied to the link and variance functions used for the generalized
functional linear model. Simulation results are reported in Section 5. An illustrative
example for the special case of binomial functional regression with the goal to
discriminate between short- and long-lived medflies is provided in Section 6. This
is followed by the main proofs in Section 7. Proofs of auxiliary results are in the
Appendix.

2. The generalized functional linear model. The data we observe for the $i$th
subject or experimental unit are $(\{X_i(t), t \in \mathcal{T}\}, Y_i)$, $i = 1, \dots, n$. We assume that
these data form an i.i.d. sample. The predictor variable $X(t)$, $t \in \mathcal{T}$, is a random
curve which is observed per subject or experimental unit and corresponds to a
square integrable stochastic process on a real interval $\mathcal{T}$. The dependent variable
$Y$ is a real-valued random variable which may be continuous or discrete. For
example, in the important special case of a binomial functional regression, one
would have $Y \in \{0, 1\}$.
Assume that a link function $g(\cdot)$ is given which is a monotone and twice
continuously differentiable function with bounded derivatives and is thus invertible.
Furthermore, we have a variance function $\sigma^2(\cdot)$ which is defined on the range of
the link function and is strictly positive. The generalized functional linear model
or functional quasi-likelihood model is determined by a parameter function $\beta(\cdot)$,
which is assumed to be square integrable on its domain $\mathcal{T}$, in addition to the link
function $g(\cdot)$ and the variance function $\sigma^2(\cdot)$.

Given a real measure $dw$ on $\mathcal{T}$, define linear predictors
$$\eta = \alpha + \int \beta(t) X(t)\,dw(t)$$
and conditional means $\mu = g(\eta)$, where $E(Y \mid X(t), t \in \mathcal{T}) = \mu$ and $\operatorname{Var}(Y \mid X(t), t \in \mathcal{T}) = \sigma^2(\mu) = \tilde\sigma^2(\eta)$ for a function $\tilde\sigma^2(\eta) = \sigma^2(g(\eta))$. In a generalized
functional linear model the distribution of $Y$ would be specified within the
exponential family. For the following (except where explicitly noted), it will be
sufficient to consider the functional quasi-likelihood model
$$Y_i = g\Big(\alpha + \int \beta(t) X_i(t)\,dw(t)\Big) + e_i, \quad i = 1, \dots, n, \tag{1}$$
where
$$E\big(e \mid X(t), t \in \mathcal{T}\big) = 0, \qquad \operatorname{Var}\big(e \mid X(t), t \in \mathcal{T}\big) = \sigma^2(\mu) = \tilde\sigma^2(\eta).$$
Note that $\alpha$ is a constant, and the inclusion of an intercept allows us to require
$E(X(t)) = 0$ for all $t$.
The errors $e_i$ are i.i.d. and we use integration w.r.t. the measure $dw(t)$ to allow
for nonnegative weight functions $v(\cdot)$ such that $v(t) > 0$ for $t \in \mathcal{T}$, $v(t) = 0$ for
$t \notin \mathcal{T}$ and $dw(t) = v(t)\,dt$; the default choice will be $v(t) = 1_{\{t \in \mathcal{T}\}}$. Nonconstant
weight functions might be of interest when the observed predictor processes are
function estimates which may exhibit increased variability in some regions, for
example, toward the boundaries.
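As a concrete illustration of the linear predictor, the Python sketch below approximates $\eta = \alpha + \int \beta(t)X(t)\,dw(t)$, with $dw(t) = v(t)\,dt$, by the trapezoidal rule on a grid and then forms $\mu = g(\eta)$. It is a minimal sketch under stated assumptions: the logistic choice of $g$, the grid and the sine-shaped curves are hypothetical, and the helper names (`trapezoid`, `linear_predictor`) are ours, not the paper's.

```python
import numpy as np

def trapezoid(y, t):
    """Composite trapezoidal rule for samples y on an increasing grid t."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def linear_predictor(alpha, beta, x, t, v=None):
    """eta = alpha + int beta(t) X(t) dw(t), with dw(t) = v(t) dt.

    beta, x and v hold function values on the grid t; v defaults to the
    indicator weight v(t) = 1 on the observation interval.
    """
    if v is None:
        v = np.ones_like(t)
    return alpha + trapezoid(beta * x * v, t)

def g(eta):
    # Logistic link in the paper's convention mu = g(eta); an illustrative choice only.
    return 1.0 / (1.0 + np.exp(-eta))

t = np.linspace(0.0, 1.0, 201)
beta = np.sqrt(2.0) * np.sin(np.pi * t)   # hypothetical parameter function
x = np.sqrt(2.0) * np.sin(np.pi * t)      # one hypothetical predictor curve
eta = linear_predictor(0.5, beta, x, t)   # 0.5 + int 2 sin^2(pi t) dt = 1.5
mu = g(eta)                               # conditional mean E(Y | X)
```

Passing a nonconstant array `v` plays the role of the weight function $v(\cdot)$, for instance to down-weight noisy boundary regions of estimated predictor curves.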
The parameter function $\beta(\cdot)$ is a quantity of central interest in the statistical
analysis and replaces the vector of slopes in a generalized linear model or
estimating equation based model. Setting $\sigma^2 = E\{\tilde\sigma^2(\eta)\}$, we then find
$$\operatorname{Var}(e) = \operatorname{Var}\{E(e \mid X(t), t \in \mathcal{T})\} + E\{\operatorname{Var}(e \mid X(t), t \in \mathcal{T})\} = E\{\tilde\sigma^2(\eta)\} = \sigma^2,$$
as well as $E(e) = 0$.

Let $\rho_j$, $j = 1, 2, \dots$, be an orthonormal basis of the function space $L^2(dw)$, that
is, $\int_{\mathcal{T}} \rho_j(t)\rho_k(t)\,dw(t) = \delta_{jk}$. Then the predictor process $X(t)$ and the parameter
function $\beta(t)$ can be expanded into
$$X(t) = \sum_{j=1}^{\infty} \varepsilon_j \rho_j(t), \qquad \beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t)$$
[in the $L^2(dw)$ sense] with r.v.'s $\varepsilon_j$ and coefficients $\beta_j$, given by $\varepsilon_j = \int X(t)\rho_j(t)\,dw(t)$ and $\beta_j = \int \beta(t)\rho_j(t)\,dw(t)$, respectively. We note that $E(\varepsilon_j) = 0$ and
$\sum_j \beta_j^2 < \infty$. Writing $\sigma_j^2 = E(\varepsilon_j^2)$, we find $\sum_j \sigma_j^2 = \int E(X^2(t))\,dw(t) < \infty$.
From the orthonormality of the base functions $\rho_j$, it follows immediately that
$$\int \beta(t)X(t)\,dw(t) = \sum_{j=1}^{\infty} \beta_j \varepsilon_j.$$
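The expansion and the identity $\int \beta(t)X(t)\,dw(t) = \sum_j \beta_j\varepsilon_j$ can be checked numerically. The sketch below assumes $\mathcal{T} = [0,1]$, $dw(t) = dt$ and the sine basis $\rho_j(t) = \sqrt{2}\sin(\pi j t)$ mentioned in Section 5.2; the curves and the helper names are hypothetical.

```python
import numpy as np

def trapezoid(y, t):
    """Composite trapezoidal rule on the grid t."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def sine_basis(j, t):
    """rho_j(t) = sqrt(2) sin(pi j t), orthonormal in L^2([0, 1], dt)."""
    return np.sqrt(2.0) * np.sin(np.pi * j * t)

def coefficients(f, t, p):
    """First p expansion coefficients f_j = int f(t) rho_j(t) dt."""
    return np.array([trapezoid(f * sine_basis(j, t), t) for j in range(1, p + 1)])

t = np.linspace(0.0, 1.0, 401)
x = 2.0 * sine_basis(1, t) - 0.5 * sine_basis(3, t)   # hypothetical predictor path
beta = sine_basis(1, t) + sine_basis(3, t)            # hypothetical parameter function

eps = coefficients(x, t, p=10)      # eps_j, j = 1..10
b = coefficients(beta, t, p=10)     # beta_j, j = 1..10
lhs = trapezoid(beta * x, t)        # int beta(t) X(t) dw(t)
rhs = float(np.dot(b, eps))         # sum_j beta_j eps_j
```

Both `lhs` and `rhs` equal $2 - 0.5 = 1.5$ up to rounding, since only $\rho_1$ and $\rho_3$ carry weight.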
It will be convenient to work with standardized errors
$$e' = e/\sigma(\mu) = e/\tilde\sigma(\eta),$$
for which $E(e' \mid X) = 0$, $E(e') = 0$, $E(e'^2) = 1$. We assume that $E(e'^4) = \mu_4 < \infty$
and note that in model (1), the distribution of the errors ei does not need to be
specified and, in particular, does not need to be a member of the exponential
family. In this regard, model (1) is less an extension of the classical generalized
linear model [McCullagh and Nelder (1989)] than an extension of the quasi-
likelihood approach of Wedderburn (1974). We address the difficulty caused by
the infinite dimensionality of the predictors by approximating model (1) with a
series of models where the number of predictors is truncated at p = pn and the
dimension pn increases asymptotically as n → ∞.
A heuristic motivation for this truncation strategy is as follows: Setting
$$U_p = \alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j, \qquad V_p = \sum_{j=p+1}^{\infty} \beta_j \varepsilon_j,$$
we find $E(Y \mid X(t), t \in \mathcal{T}) = g(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j) = g(U_p + V_p)$. Conditioning on
the first $p$ components and writing $F_{V_p \mid U_p}$ for the conditional distribution function
leads to a truncated link function $g_p$,
$$E(Y \mid U_p) = g_p(U_p) = E[g(U_p + V_p) \mid U_p] = \int g(U_p + s)\,dF_{V_p \mid U_p}(s).$$
For the approximation of the full model by the truncated link function, we note
that the boundedness of $g'$, $|g'(\cdot)|^2 \le c$, implies that
$$\Big(\int [g(U_p + V_p) - g(U_p + s)]\,dF_{V_p \mid U_p}(s)\Big)^2 \le \int g'(\xi)^2 (V_p - s)^2\,dF_{V_p \mid U_p}(s) \le 2c \int (V_p^2 + s^2)\,dF_{V_p \mid U_p}(s)$$
and, therefore,
$$\begin{aligned}
E\{g(U_p + V_p) - g_p(U_p)\}^2 &= E\Big(\int [g(U_p + V_p) - g(U_p + s)]\,dF_{V_p \mid U_p}(s)\Big)^2 \\
&\le 2c\,E\{V_p^2 + E(V_p^2 \mid U_p)\} = 4c\,E(V_p^2) \\
&\le 4c \sum_{j=p+1}^{\infty} \beta_j^2 \sum_{j=p+1}^{\infty} \sigma_j^2.
\end{aligned} \tag{2}$$

The approximation error of the truncated model is seen to be directly tied to
$\operatorname{Var}(V_p)$ and is controlled by the sequence $\sigma_j^2 = \operatorname{Var}(\varepsilon_j)$, $j = 1, 2, \dots$, which
for the special case of an eigenbase corresponds to a sequence of eigenvalues.
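To see how the bound (2) behaves as $p$ grows, the following sketch evaluates its right-hand side for hypothetical decay rates $\beta_j = 1/j$ and $\sigma_j^2 = 1/j^2$ (truncated at $j = 1000$), with $c = 1/16$, which bounds $|g'|^2$ for the logistic link. It also checks, for uncorrelated $\varepsilon_j$, that $E(V_p^2) = \sum_{j>p}\beta_j^2\sigma_j^2$ is dominated by the Cauchy–Schwarz step used in (2). These choices are illustrative assumptions only.

```python
import numpy as np

def truncation_bound(beta_coef, sigma2, p, c):
    """Right-hand side of (2): 4 c (sum_{j>p} beta_j^2)(sum_{j>p} sigma_j^2).

    beta_coef[j-1] = beta_j and sigma2[j-1] = sigma_j^2 = Var(eps_j),
    both truncated at a large but finite index for illustration.
    """
    tail_beta = float(np.sum(beta_coef[p:] ** 2))
    tail_sigma = float(np.sum(sigma2[p:]))
    return 4.0 * c * tail_beta * tail_sigma

j = np.arange(1, 1001)
beta_coef = 1.0 / j           # hypothetical coefficient decay
sigma2 = 1.0 / j ** 2         # hypothetical eigenvalue decay
c = 0.25 ** 2                 # |g'| <= 1/4 for the logistic link, so |g'|^2 <= 1/16

bounds = [truncation_bound(beta_coef, sigma2, p, c) for p in (1, 2, 5, 10, 20)]

# With uncorrelated eps_j, E(V_p^2) = sum_{j>p} beta_j^2 sigma_j^2.
p = 5
ev2 = float(np.sum(beta_coef[p:] ** 2 * sigma2[p:]))
```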

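Setting $\varepsilon_j^{(i)} = \int X_i(t)\rho_j(t)\,dw(t)$, the full model with standardized errors $e_i'$ is
$$Y_i = g\Big(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j^{(i)}\Big) + e_i'\,\tilde\sigma\Big(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j^{(i)}\Big), \quad i = 1, \dots, n.$$
With truncated linear predictors $\eta$ and means $\mu$,
$$\eta_i = \alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}, \qquad \mu_i = g(\eta_i),$$
the $p$-truncated model becomes
$$Y_i^{(p)} = g_p\Big(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Big) + e_i'\,\tilde\sigma_p\Big(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Big), \quad i = 1, \dots, n, \tag{3}$$
where $\tilde\sigma_p$ is defined analogously to $g_p$. Note that $g(U_p) - g_p(U_p)$ and, analogously,
$\tilde\sigma(U_p) - \tilde\sigma_p(U_p)$ are bounded by the error (2). Since it will be assumed
that this error vanishes asymptotically, as $p \to \infty$, we may instead of (3) work
with the approximating sequence of models
$$Y_i^{(p)} = g\Big(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Big) + e_i'\,\tilde\sigma\Big(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Big), \quad i = 1, \dots, n, \tag{4}$$
in which the functions $g$ and $\tilde\sigma$ are fixed. We note that the random variables
$Y_i^{(p)}$ and $e_i'$, $i = 1, \dots, n$, form triangular arrays, $Y_{i,n}^{(p_n)}$ and $e_{i,n}'$, $i = 1, \dots, n$, with
changing distribution as $n$ changes; for simplicity, we suppress the indices $n$.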
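For simulation experiments, data from the $p$-truncated model (4) can be generated once $g$, $\tilde\sigma$ and the law of the scores $\varepsilon_j^{(i)}$ are fixed. The sketch below is one hypothetical setting: independent Gaussian scores with standard deviations $\sigma_j$, a logistic $g$ and Bernoulli responses, for which $\tilde\sigma^2(\eta) = g(\eta)\{1 - g(\eta)\}$. It also returns the standardized errors $e_i'$, so their first two moments can be compared with $E(e') = 0$ and $E(e'^2) = 1$.

```python
import numpy as np

def simulate_truncated_binomial(n, alpha, beta_p, sigma, rng):
    """Draw scores eps^(i) and responses Y_i from a p-truncated model (4).

    Illustrative assumptions: eps_j^(i) ~ N(0, sigma_j^2) independently,
    logistic g and Bernoulli Y, so that E(Y | X) = g(eta) and
    Var(Y | X) = g(eta)(1 - g(eta)) = sigma~^2(eta).
    """
    p = len(beta_p)
    eps = rng.normal(size=(n, p)) * sigma          # column j has sd sigma_j
    eta = alpha + eps @ beta_p                     # truncated linear predictor eta_i
    mu = 1.0 / (1.0 + np.exp(-eta))                # mu_i = g(eta_i)
    y = (rng.uniform(size=n) < mu).astype(float)   # Y_i = mu_i + e_i
    e_std = (y - mu) / np.sqrt(mu * (1.0 - mu))    # standardized errors e'_i
    return eps, y, mu, e_std

rng = np.random.default_rng(0)
beta_p = np.array([1.0, -0.5, 0.25])   # hypothetical beta_1..beta_3
sigma = np.array([1.0, 0.7, 0.5])      # hypothetical score standard deviations
eps, y, mu, e_std = simulate_truncated_binomial(5000, 0.2, beta_p, sigma, rng)
```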


Inference will be developed for the sequence of p-truncated models (4) with
asymptotic results for p → ∞. The practical choice of p in finite sample
situations will be discussed in Section 5. We also develop a version where the
link function g is estimated from the data, given p. The practical implementation
of this semiparametric quasi-likelihood regression (SPQR) version adapts to the
changing link functions gp of the approximating sequence (3).

3. Estimation in the generalized functional linear model. One central
aim is estimation and inference for the parameter function $\beta(\cdot)$. Inference for
$\beta(\cdot)$ is of interest for constructing confidence regions and testing whether the
predictor function has any influence on the outcome, in analogy to the test for
regression effect in a classical regression model. The orthonormal basis $\{\rho_j, j = 1, 2, \dots\}$
is commonly chosen as the Fourier basis or the basis formed by the
eigenfunctions of the covariance operator. The eigenfunctions can be estimated
from the data as described in Rice and Silverman (1991) or Capra and Müller
(1997). Whenever estimation and inference for the intercept $\alpha$ is to be included,
we change the summation range for the linear predictors $\eta_i$ on the right-hand side
of the $p$-truncated model (3) from $\sum_{j=1}^{p}$ to $\sum_{j=0}^{p}$, setting $\varepsilon_0^{(i)} = 1$ and $\beta_0 = \alpha$. In the
following, inclusion of $\alpha$ into the parameter vector will be the default.
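For densely and regularly observed curves, a crude way to estimate eigenfunctions is to eigen-decompose the sample covariance matrix on the grid, scaled so that it approximates the integral operator. The sketch below does only that; it is not the smoothing-based method of Rice and Silverman (1991) or Capra and Müller (1997), and it assumes a uniform grid, $dw(t) = dt$ and no measurement error. The simulated curves are hypothetical.

```python
import numpy as np

def estimate_eigenbasis(curves, t, k):
    """Crude FPCA: eigen-decompose the sample covariance on a uniform grid.

    curves: (n, m) array, curve i observed at the m grid points t.
    Returns eigenvalues lam (k,) and eigenfunctions phi (k, m), scaled so that
    sum(phi_j * phi_l) * dt = delta_jl (orthonormal in the discretized L^2(dt)).
    """
    dt = t[1] - t[0]                                       # uniform grid assumed
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (curves.shape[0] - 1)    # K(s, t) on the grid
    vals, vecs = np.linalg.eigh(cov * dt)                  # discretized operator A_K
    order = np.argsort(vals)[::-1][:k]                     # k largest eigenvalues
    lam = vals[order]
    phi = vecs[:, order].T / np.sqrt(dt)                   # unit L^2 norm
    return lam, phi

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
rho = np.array([np.sqrt(2.0) * np.sin(np.pi * j * t) for j in (1, 2, 3)])
scores = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.5])
curves = scores @ rho                                      # X_i(t) = sum_j eps_j rho_j(t)
lam, phi = estimate_eigenbasis(curves, t, k=3)             # lam roughly (4, 1, 0.25)
```

Eigenfunctions are determined only up to sign, so comparisons with a reference basis should use absolute inner products.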
Fixing $p$ for the moment, we are in the situation of the usual estimating equation
approach and can estimate the unknown parameter vector $\beta^T = (\beta_0, \dots, \beta_p)$ by
solving the estimating or score equation
$$U(\beta) = 0. \tag{5}$$
Setting $\varepsilon^{(i)T} = (\varepsilon_0^{(i)}, \dots, \varepsilon_p^{(i)})$, $\eta_i = \sum_{j=0}^{p} \beta_j \varepsilon_j^{(i)}$, $\mu_i = g(\eta_i)$, $i = 1, \dots, n$, the
vector-valued score function is defined by
$$U(\beta) = \sum_{i=1}^{n} (Y_i - \mu_i)\, g'(\eta_i)\, \varepsilon^{(i)} / \sigma^2(\mu_i). \tag{6}$$
The solutions of the score equation (5) will be denoted by
$$\hat\beta^T = (\hat\beta_0, \dots, \hat\beta_p); \qquad \hat\alpha = \hat\beta_0. \tag{7}$$
Relevant matrices which play a well-known role in solving the estimating
equation (5) are
$$D = D_{n,p} = \big(g'(\eta_i)\,\varepsilon_k^{(i)} / \sigma(\mu_i)\big)_{1 \le i \le n,\; 0 \le k \le p}, \qquad V = V_{n,p} = \operatorname{diag}\big(\sigma^2(\mu_1), \dots, \sigma^2(\mu_n)\big),$$
and with generic copies $\eta$, $\varepsilon$, $\mu$ of $\eta_i$, $\varepsilon^{(i)}$, $\mu_i$, respectively,
$$\Gamma = \Gamma_p = (\gamma_{kl})_{0 \le k,l \le p}, \quad \gamma_{kl} = E\Big(\frac{g'^2(\eta)}{\sigma^2(\mu)}\,\varepsilon_k \varepsilon_l\Big), \qquad \Xi = \Gamma^{-1} = (\xi_{kl})_{0 \le k,l \le p}. \tag{8}$$
We note that $\Gamma = \frac{1}{n}E(D^T D)$ is a symmetric and positive definite matrix and
the inverse matrix $\Xi$ exists. Otherwise, one would arrive at the contradiction
that $E\{(\sum_{k=0}^{p} \alpha_k \varepsilon_k\, g'(\eta)/\sigma(\mu))^2\} = 0$ for constants $\alpha_0, \dots, \alpha_p$ that are not all zero.

With vectors $Y^T = (Y_1, \dots, Y_n)$, $\mu^T = (\mu_1, \dots, \mu_n)$, the estimating equation
$U(\beta) = 0$ can be rewritten as
$$D^T V^{-1/2} (Y - \mu) = 0.$$
This equation is usually solved iteratively by the method of iteratively reweighted
least squares. Under our basic assumptions, as $\frac{1}{n}E(D^T D) = \Gamma_p$ is a fixed positive
definite matrix for each $p$, the existence of a unique solution for each fixed $p$ is
assured asymptotically.
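The iteratively reweighted least squares solution of (5) can be sketched as Fisher scoring: since $U(\beta) = D^TV^{-1/2}(Y - \mu)$ and the expected information is $D^TD$, each step solves $(D^TD)\,\delta = U(\beta)$. The function below takes $g$, $g'$ and $\sigma^2(\cdot)$ as arguments and also returns the empirical matrix $\tilde\Gamma = D^TD/n$. The function name, the logistic link, the Bernoulli variance and the simulated data are illustrative assumptions.

```python
import numpy as np

def fit_score_equation(eps, y, g, dg, var, n_iter=50, tol=1e-10):
    """Solve U(beta) = 0 of (6) by Fisher scoring (iteratively reweighted LS).

    eps: (n, p+1) array whose first column is 1, so beta[0] is the intercept alpha.
    g, dg, var: link g, its derivative g', and variance function sigma^2(mu).
    Returns beta_hat and the empirical matrix Gamma~ = D^T D / n at beta_hat.
    """
    def design(beta):
        eta = eps @ beta
        mu = g(eta)
        sd = np.sqrt(var(mu))
        D = dg(eta)[:, None] * eps / sd[:, None]   # D_ik = g'(eta_i) eps_k^(i) / sigma(mu_i)
        return mu, sd, D

    beta = np.zeros(eps.shape[1])
    for _ in range(n_iter):
        mu, sd, D = design(beta)
        U = D.T @ ((y - mu) / sd)                  # D^T V^{-1/2} (Y - mu), i.e. (6)
        step = np.linalg.solve(D.T @ D, U)         # Fisher scoring / IRLS step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, _, D = design(beta)
    return beta, D.T @ D / len(y)

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def dexpit(eta):
    return expit(eta) * (1.0 - expit(eta))

def bern_var(mu):
    return mu * (1.0 - mu)

rng = np.random.default_rng(3)
n = 4000
scores = rng.normal(size=(n, 2)) * np.array([1.0, 0.6])
eps = np.column_stack([np.ones(n), scores])        # eps_0 = 1
true_beta = np.array([0.3, 1.0, -0.7])
y = (rng.uniform(size=n) < expit(eps @ true_beta)).astype(float)
beta_hat, gamma_tilde = fit_score_equation(eps, y, expit, dexpit, bern_var)
```

For this canonical link $g'/\sigma^2 \equiv 1$, so Fisher scoring coincides with Newton's method; for other link and variance pairs the same code applies unchanged.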

In the above developments we have assumed that both the link function g(·)
and the variance function σ 2 (·) are known. Situations where the link and variance
functions are unknown are common, and we can extend our methods to cover the
general case where these functions are smooth, which for fixed p corresponds
to the semiparametric quasi-likelihood regression (SPQR) models considered in
Chiou and Müller (1998, 1999). In the implementation of SPQR one alternates
nonparametric (smoothing) and parametric updating steps, using a reasonable
parametric model for the initialization step. Since the link function is arbitrary,
except for smoothness and monotonicity constraints, we may require that estimates
and parameters satisfy $\|\beta\| = 1$, $\|\hat\beta\| = 1$ for identifiability.
For given $\hat\beta$, $\|\hat\beta\| = 1$, setting $\hat\eta_i = \sum_{j=0}^{p} \hat\beta_j \varepsilon_j^{(i)}$, updates of the link function
estimate $\hat g(\cdot)$ and its first derivative $\hat g'(\cdot)$ are obtained by smoothing (applying any
reasonable scatterplot smoothing method that allows the estimation of derivatives)
the scatterplot $(\hat\eta_i, Y_i)_{i=1,\dots,n}$. Updates for the variance function estimate $\hat\sigma^2(\cdot)$
are obtained by smoothing the scatterplot $(\hat\mu_i, \hat\varepsilon_i^2)_{i=1,\dots,n}$, where $\hat\mu_i = \hat g(\hat\eta_i)$ are
current mean response estimates and $\hat\varepsilon_i^2 = (Y_i - \hat\mu_i)^2$ are current squared residuals.
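The SPQR smoothing step needs estimates of both $\hat g$ and $\hat g'$. One standard smoother with this property is a local linear kernel fit, whose intercept estimates the function and whose slope estimates its first derivative. The sketch below uses a Gaussian kernel and a fixed bandwidth `h`; both are our assumptions (the text only asks for a reasonable smoother that allows derivative estimation), and the simulated scatterplot $(\hat\eta_i, Y_i)$ is hypothetical.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear kernel smoother with a Gaussian kernel and bandwidth h.

    Returns (fit, slope) at each point of x0: the intercept of the weighted
    least-squares line estimates the regression function and its slope
    estimates the first derivative.
    """
    fit = np.empty(len(x0))
    slope = np.empty(len(x0))
    for k, u in enumerate(x0):
        w = np.exp(-0.5 * ((x - u) / h) ** 2)          # kernel weights
        X = np.column_stack([np.ones_like(x), x - u])  # local design centered at u
        XtW = X.T * w
        a, b = np.linalg.solve(XtW @ X, XtW @ y)       # weighted least squares
        fit[k], slope[k] = a, b
    return fit, slope

rng = np.random.default_rng(4)
eta_hat = rng.uniform(-3.0, 3.0, size=20000)           # hypothetical current predictors
g_true = 1.0 / (1.0 + np.exp(-eta_hat))
y = (rng.uniform(size=eta_hat.size) < g_true).astype(float)
grid = np.array([-1.0, 0.0, 1.0])
g_hat, dg_hat = local_linear(eta_hat, y, grid, h=0.3)  # estimates of g and g'
```

The same function applied to the scatterplot $(\hat\mu_i, \hat\varepsilon_i^2)$ would give a variance function update $\hat\sigma^2(\cdot)$.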
The parametric updating step then proceeds by solving the score equation (5),
using the semiparametric score
$$U(\beta) = \sum_{i=1}^{n} \big(Y_i - \hat g(\eta_i)\big)\, \hat g'(\eta_i)\, \varepsilon^{(i)} / \hat\sigma^2\big(\hat g(\eta_i)\big). \tag{9}$$
This leads to the solutions $\hat\beta$, in analogy to (7). For solutions of the score equations
for both scores (6) and (9), we then obtain the regression function estimates
$$\hat\beta(t) = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j \rho_j(t). \tag{10}$$
Matrices $D$ and $\Gamma$ are modified analogously for the SPQR case, substituting
appropriate estimates.
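Given coefficient estimates, (10) is evaluated pointwise. The sketch below does this on a grid, assuming the sine basis $\rho_j(t) = \sqrt{2}\sin(\pi j t)$ on $[0,1]$ and hypothetical coefficients; it keeps the constant term $\hat\beta_0$ exactly as (10) is written.

```python
import numpy as np

def beta_hat_curve(beta_hat, t):
    """Evaluate (10), beta_hat_0 + sum_{j=1}^p beta_hat_j rho_j(t), on a grid t.

    Assumes the sine basis rho_j(t) = sqrt(2) sin(pi j t) on [0, 1].
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    j = np.arange(1, beta_hat.size)                       # 1, ..., p
    rho = np.sqrt(2.0) * np.sin(np.pi * np.outer(j, t))   # (p, len(t))
    return beta_hat[0] + beta_hat[1:] @ rho

t = np.array([0.0, 0.25, 0.5])
curve = beta_hat_curve([0.1, 1.0, -0.5], t)   # hypothetical (beta_hat_0, beta_hat_1, beta_hat_2)
```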

4. Asymptotic inference. Given an $L^2$-integrable integral kernel function
$R(s,t) : \mathcal{T}^2 \to \mathbb{R}$, define the linear integral operator $A_R : L^2(dw) \to L^2(dw)$ on
the Hilbert space $L^2(dw)$ for $f \in L^2(dw)$ by
$$(A_R f)(t) = \int f(s) R(s,t)\,dw(s). \tag{11}$$
For symmetric kernels, $R(s,t) = R(t,s)$, the operators $A_R$ are compact self-adjoint Hilbert–Schmidt operators if
$$\iint |R(s,t)|^2\,dw(s)\,dw(t) < \infty,$$
and can then be diagonalized [Conway (1990), page 47].


Integral operators of special interest are the autocovariance operator $A_K$ of $X$
with kernel
$$K(s,t) = \operatorname{cov}\big(X(s), X(t)\big) = E\big(X(s)X(t)\big) \tag{12}$$
and the generalized autocovariance operator $A_G$ with kernel
$$G(s,t) = E\Big(\frac{g'^2(\eta)}{\sigma^2(\mu)}\,X(s)X(t)\Big). \tag{13}$$
Hilbert–Schmidt operators $A_R$ generate a metric in $L^2$,
$$d_R^2(f, g) = \int \big(f(t) - g(t)\big)\big(A_R(f - g)\big)(t)\,dw(t) = \iint \big(f(s) - g(s)\big)\big(f(t) - g(t)\big) R(s,t)\,dw(s)\,dw(t)$$
for $f, g \in L^2(dw)$, and given an arbitrary orthonormal basis $\{\rho_j, j = 1, 2, \dots\}$, the
Hilbert–Schmidt kernels $R$ can be expressed as
$$R(s,t) = \sum_{k,l} r_{kl}\, \rho_k(s) \rho_l(t)$$
for suitable coefficients $\{r_{kl}, k, l = 1, 2, \dots\}$ [Dunford and Schwartz (1963),
page 1009]. Using for any given function $h \in L^2$ the notation
$$h_{\rho,j} = \int h(s)\rho_j(s)\,dw(s)$$
and denoting the normalized eigenfunctions and eigenvalues of the operator $A_R$
by $\{\rho_j^R, \lambda_j^R, j = 1, 2, \dots\}$, the distance $d_R$ can be expressed as
$$d_R^2(f, g) = \sum_{k,l} r_{kl}\,(f_{\rho,k} - g_{\rho,k})(f_{\rho,l} - g_{\rho,l}) = \sum_k \lambda_k^R\,(f_{\rho^R,k} - g_{\rho^R,k})^2. \tag{14}$$
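The two expressions for $d_R^2$ in (14) can be compared numerically by discretizing $A_R$. The sketch below assumes a midpoint grid on $[0,1]$ with $dw(t) = dt$ and a hypothetical kernel $R(s,t) = \sum_k \lambda_k\rho_k(s)\rho_k(t)$ built from three sine functions, so that for $f = \rho_1 + \rho_2$ and the zero function both versions should return $\lambda_1 + \lambda_2 = 1.5$.

```python
import numpy as np

def d2_direct(f, g, R, dt):
    """Riemann approximation of int int (f-g)(s) (f-g)(t) R(s,t) dw(s) dw(t)."""
    d = f - g
    return float(d @ R @ d) * dt * dt

def d2_spectral(f, g, R, dt):
    """Same distance via (14): sum_k lambda_k (f-g)_{rho^R, k}^2."""
    lam, vecs = np.linalg.eigh(R * dt)   # eigenpairs of the discretized operator A_R
    phi = vecs / np.sqrt(dt)             # eigenfunctions with unit L^2 norm
    coef = (f - g) @ phi * dt            # coefficients (f-g)_{rho^R, k}
    return float(np.sum(lam * coef ** 2))

m = 400
t = (np.arange(m) + 0.5) / m             # midpoint grid on [0, 1], dw(t) = dt
dt = 1.0 / m
rho = np.array([np.sqrt(2.0) * np.sin(np.pi * j * t) for j in (1, 2, 3)])
lam_true = np.array([1.0, 0.5, 0.1])
R = rho.T @ np.diag(lam_true) @ rho      # R(s, t) = sum_k lambda_k rho_k(s) rho_k(t)
f = rho[0] + rho[1]
zero = np.zeros(m)
```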
In the following we use the metric $d_G$, since it allows us to derive asymptotic
limits under considerably simpler conditions than for the $L^2$ metric, due to its
dampening effect on higher order frequencies. For the sequence of $p_n$-truncated
models (1) that we are considering,
$$d_G^2(\hat\beta, \beta) = \iint \big(\hat\beta(s) - \beta(s)\big)\big(\hat\beta(t) - \beta(t)\big)\, E\Big(\frac{g'^2(\eta)}{\sigma^2(\mu)}\,X(s)X(t)\Big)\,dw(s)\,dw(t)$$
is approximated by $d_{G,p}^2(\hat\beta, \beta) = (\hat\beta - \beta)^T \Gamma (\hat\beta - \beta)$ for each $p$.
In addition to the basic assumptions in Section 2 and usual conditions on
variance and link functions, we require some technical conditions which restrict
the growth of $p = p_n$ and the higher-order moments of the random coefficients $\varepsilon_j$.
Additional conditions are required for the semiparametric (SPQR) case where
both link and variance functions are assumed unknown and are estimated
nonparametrically.
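(M1) The link function $g$ is monotone, invertible and has two continuous bounded
derivatives with $|g'(\cdot)| \le c$, $|g''(\cdot)| \le c$ for a constant $c \ge 0$. The variance
function $\sigma^2(\cdot)$ has a continuous bounded derivative and there exists a $\delta > 0$
such that $\sigma(\cdot) \ge \delta$.
(M2) The number of predictor terms $p_n$ in the sequence of approximating
$p_n$-truncated models (1) satisfies $p_n \to \infty$ and $p_n n^{-1/4} \to 0$ as $n \to \infty$.
(M3) It holds that [see (8), where the $\xi_{kl}$ are defined]
$$\sum_{k_1,\dots,k_4=0}^{p_n} E\Big(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4}\,\frac{g'^4(\eta)}{\sigma^4(\mu)}\Big)\,\xi_{k_1k_2}\,\xi_{k_3k_4} = o(n/p_n^2).$$
(M4) It holds that
$$\sum_{k_1,\dots,k_8=0}^{p_n} E\Big(\varepsilon_{k_1}\varepsilon_{k_3}\varepsilon_{k_5}\varepsilon_{k_7}\,\frac{g'^4(\eta)}{\sigma^4(\mu)}\Big)\, E\Big(\varepsilon_{k_2}\varepsilon_{k_4}\varepsilon_{k_6}\varepsilon_{k_8}\,\frac{g'^4(\eta)}{\sigma^4(\mu)}\Big)\,\xi_{k_1k_2}\,\xi_{k_3k_4}\,\xi_{k_5k_6}\,\xi_{k_7k_8} = o(n^2 p_n^2).$$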
We are now in a position to state the central asymptotic result. Given $p = p_n$,
denote by $\hat\beta = (\hat\beta_0, \dots, \hat\beta_p)^T$ the solution of the estimating equations (5), (6)
and by $\beta = (\beta_0, \dots, \beta_p)^T$ the intercept $\alpha = \beta_0$ and the first $p$ coefficients of the
expansion of the parameter function $\beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t)$ in the basis $\{\rho_j, j \ge 1\}$.

THEOREM 4.1. If the basic assumptions and (M1)–(M4) are satisfied, then
$$\frac{n(\hat\beta - \beta)^T \Gamma_{p_n} (\hat\beta - \beta) - (p_n + 1)}{\sqrt{2(p_n + 1)}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty. \tag{15}$$

We note that the matrix $\Gamma_{p_n}$ in Theorem 4.1 may be replaced by the empirical
version $\tilde\Gamma = \frac{1}{n}(D^T D)$; this is a consequence of (21), (22) and Lemma 7.2 below.
Whenever only the “slope” parameters $\beta_1, \beta_2, \dots$ but not the intercept parameter
$\alpha = \beta_0$ are of interest, $p_n$ is replaced by $p_n - 1$ and the $(p + 1) \times (p + 1)$ matrix $\Gamma$
is replaced by the $p \times p$ submatrix of $\Gamma$ obtained by deleting the first row/column.
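In code, the standardized statistic in (15) and its upper-tail normal $p$-value take only a few lines. The sketch below is hypothetical in its names and example numbers; in practice `gamma` would be an empirical $\tilde\Gamma$ (or $\hat\Gamma$ in the SPQR case) and `beta0` the coefficient vector under the null hypothesis.

```python
import math
from statistics import NormalDist

import numpy as np

def wald_type_statistic(beta_hat, beta0, gamma, n):
    """Left-hand side of (15): (n d^T Gamma d - (p+1)) / sqrt(2(p+1)), d = beta_hat - beta0."""
    d = np.asarray(beta_hat, dtype=float) - np.asarray(beta0, dtype=float)
    k = d.size                                  # k = p + 1 coefficients, intercept included
    q = n * float(d @ gamma @ d)
    return (q - k) / math.sqrt(2.0 * k)

def upper_normal_pvalue(z):
    """P(N(0,1) > z), computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

stat = wald_type_statistic([0.1, 0.9, -0.6], [0.0, 1.0, -0.5], np.eye(3), n=100)
pval = upper_normal_pvalue(stat)
reject = stat > NormalDist().inv_cdf(0.95)      # level-0.05 test against Phi^{-1}(0.95)
```

With these inputs $n\,d^T\Gamma d = 3 = p + 1$, so the statistic is 0 and the $p$-value is 0.5.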
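To study the convergence of the estimated parameter function $\hat\beta(\cdot)$, we use the
distance $d_G$ and the representation (14) with $R \equiv G$, coupled with the expansion
$$\hat\beta(t) = \sum_{j=1}^{p_n} \hat\beta_{\rho_j^G}\, \rho_j^G(t)$$
of the estimated parameter function $\hat\beta(\cdot)$ in the basis $\{\rho_j^G, j = 1, 2, \dots\}$, the
eigenbasis of operator $A_G$ with associated eigenvalues $\lambda_j^G$. We obtain
$$\begin{aligned}
d_G^2\big(\hat\beta(\cdot), \beta(\cdot)\big) &= \iint \big(\hat\beta(s) - \beta(s)\big) G(s,t) \big(\hat\beta(t) - \beta(t)\big)\,dw(s)\,dw(t) \\
&= \sum_{j=1}^{p} \lambda_j^G \big(\hat\beta_{\rho_j^G} - \beta_{\rho_j^G}\big)^2 + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2 \\
&= (\hat\beta^G - \beta^G)^T \Gamma^G (\hat\beta^G - \beta^G) + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2.
\end{aligned}$$
Here
$$\hat\beta^G = \big(\hat\beta_{\rho_1^G}, \dots, \hat\beta_{\rho_p^G}\big)^T, \qquad \beta^G = \big(\beta_{\rho_1^G}, \dots, \beta_{\rho_p^G}\big)^T,$$
and the diagonal matrix $\Gamma^G$ is obtained by replacing in the definition of the matrix $\Gamma$
[see (8)] the $\varepsilon_j$ by $\varepsilon_j^G$ that are given by
$$\varepsilon_j^G = \frac{g'(\eta)}{\sigma(\mu)} \int X(t)\rho_j^G(t)\,dw(t),$$
with the property
$$E(\varepsilon_j^G \varepsilon_k^G) = \iint G(s,t)\,\rho_j^G(s)\rho_k^G(t)\,dw(s)\,dw(t) = \delta_{jk}\lambda_j^G. \tag{16}$$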
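These considerations lead under appropriate moment conditions to the following:

COROLLARY 4.1. If the parameter function $\beta(\cdot)$ has the property that
$$\sum_{j=p+1}^{\infty} E\big((\varepsilon_j^G)^2\big) \Big(\int \beta(t)\rho_j^G(t)\,dw(t)\Big)^2 = o\Big(\frac{\sqrt{p_n}}{n}\Big), \tag{17}$$
then
$$\frac{n \iint (\hat\beta(s) - \beta(s))(\hat\beta(t) - \beta(t))\, G(s,t)\,dw(s)\,dw(t) - (p_n + 1)}{\sqrt{2(p_n + 1)}} \xrightarrow{d} N(0, 1)$$
as $n \to \infty$.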

We note that property (17) relates to the rate at which higher-order oscillations,
relative to the oscillations of processes $X(t)$, contribute to the $L^2$ norm of the
parameter function $\beta(\cdot)$.
In the case of unknown link and variance functions (SPQR), one applies scatterplot
smoothing to obtain nonparametric estimates of functions and derivatives and
then obtains the parameter estimates $\hat\beta$ as solutions of the semiparametric score
equation (9). After iteration, final nonparametric estimates of the link function $\hat g$,
its derivative $\hat g'$ and of the variance function $\hat\sigma^2$ are obtained. We implement these
nonparametric curve estimators with local linear or quadratic kernel smoothers,
using a bandwidth $h$ in the smoothing step. For the following result we assume
these conditions:
(R1) The regularity conditions (M1)–(M6) and (K1)–(K3) of Chiou and Müller
(1998) hold uniformly for all $p_n$.
(R2) For the bandwidths $h$ of the nonparametric function estimates for link and
variance function, $h \to 0$, $nh^3/\log n \to \infty$ and $\sqrt{p_n}\,(\sqrt{n}\,h^2)^{-1/2} \to 0$ as $n \to \infty$.

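The following result refers to the matrix
$$\hat\Gamma = (\hat\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \hat\gamma_{kl} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat g'^2(\hat\eta_i)}{\hat\sigma^2(\hat\eta_i)}\, \varepsilon_k^{(i)} \varepsilon_l^{(i)}. \tag{18}$$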

COROLLARY 4.2. Assume (R1) and (R2) and replace the matrix $\Gamma$ in (15)
by the matrix $\hat\Gamma$ from (18). Then (15) remains valid for the semiparametric
quasi-likelihood (SPQR) estimates $\hat\beta$ that are obtained as solutions of the
semiparametric estimating equation (9), substituting nonparametrically estimated
link and variance functions.

Extending the arguments used in the proofs of Theorems 1 and 2 in Chiou and
Müller (1998), and assuming additional regularity conditions as described there,
we find for these nonparametric function estimates,
$$\sup_t \left| \frac{\hat g'^2(t)}{\hat\sigma^2(t)} - \frac{g'^2(t)}{\sigma^2(t)} \right| = O_p\left( \sqrt{\frac{\log n}{nh^3}} + h^2 + \frac{\sqrt{p_n}}{h^2}\,\|\hat\beta - \beta\| \right).$$
Assuming the bandwidth conditions in (R2), we obtain from the
boundedness of the design density of the linear predictors away from 0 and $\infty$
that
$$\frac{\hat g'^2(\hat\eta)}{\hat\sigma^2(\hat\eta)} = \frac{g'^2(\eta)}{\sigma^2(\eta)} + o_p(1),$$
where the $o_p$-terms are uniform in $p$ following (M2). Therefore, the matrix $\hat\Gamma$
approximates the elements of the matrix
$$\tilde\Gamma = \frac{1}{n}(D^T D) = (\tilde\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \tilde\gamma_{kl} = \frac{1}{n}\sum_{i=1}^{n} \frac{g'^2(\eta_i)}{\sigma^2(\eta_i)}\, \varepsilon_k^{(i)} \varepsilon_l^{(i)}$$
uniformly in $k$, $l$ and $p_n$. This, together with the remarks after Theorem 4.1,
justifies the extension to the semiparametric (SPQR) case with unknown link
and variance functions. This case will be included in the following, unless noted
otherwise.

A common problem of inference in regression models is testing for no
regression effect, that is, $H_0: \beta \equiv \text{const}$, which is a special case of testing for
$H_0: \beta \equiv \beta_0$ for a given regression parameter function $\beta_0$. With the representation
$\beta_0(t) = \sum_j \beta_{0j}\rho_j(t)$, the null hypothesis becomes $H_0: \beta_j = \beta_{0j}$, $j = 0, 1, 2, \dots$,
and $H_0$ is rejected when the test statistic in Theorem 4.1 exceeds the critical value
$\Phi^{-1}(1 - \alpha)$, with $\Phi$ the standard normal distribution function, for the case of a fully specified link function. Through a judicious
choice of the orthonormal basis $\{\rho_j, j = 1, 2, \dots\}$, these tests also include null
hypotheses of the type $H_0: \int \beta(t)h_j(t)\,dw(t) = \tau_j$, $j = 1, 2, \dots$, for a sequence
of linearly independent functions $h_j$; these are transformed into an orthonormal
basis by Gram–Schmidt orthonormalization, whence it is easy to see that these
null hypotheses translate into $H_0: \beta_j = \tau_j$, $j = 1, 2, \dots$, for suitable $\tau_j$ if we use
the new orthonormal basis in lieu of the $\{\rho_j, j \ge 1\}$. For alternative approaches to
testing in functional regression, we refer to Fan and Lin (1998).
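The Gram–Schmidt step used to turn constraints $\int\beta(t)h_j(t)\,dw(t) = \tau_j$ into coefficient constraints can be sketched with quadrature weights standing in for $dw$. The example assumes $dw(t) = dt$ on $[0,1]$ with a midpoint rule and the hypothetical choice $h_1 = 1$, $h_2 = t$, $h_3 = t^2$; the resulting functions play the role of the new orthonormal basis.

```python
import numpy as np

def gram_schmidt_l2(h, weights):
    """Orthonormalize functions h (rows, sampled on a grid) in L^2(dw).

    weights are quadrature weights so that <f, g> ~ sum(f * g * weights);
    returns rows q_j with <q_j, q_k> ~ delta_jk and
    span{q_1, ..., q_j} = span{h_1, ..., h_j}.
    """
    def inner(a, b):
        return float(np.sum(a * b * weights))

    q = []
    for f in h:
        v = np.array(f, dtype=float)
        for u in q:
            v = v - inner(v, u) * u     # modified Gram-Schmidt: project out q_1..q_{j-1}
        q.append(v / np.sqrt(inner(v, v)))
    return np.array(q)

m = 1000
t = (np.arange(m) + 0.5) / m
weights = np.full(m, 1.0 / m)            # midpoint rule for dw(t) = dt on [0, 1]
h = np.array([np.ones(m), t, t ** 2])    # hypothetical linearly independent h_j
Q = gram_schmidt_l2(h, weights)
```

Because each $q_j$ is a linear combination of $h_1, \dots, h_j$, the transformed targets are the same linear combinations of $\tau_1, \dots, \tau_j$.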
Another application of practical interest is the construction of confidence bands
for the unknown regression parameter function $\beta$. In a finite sample situation for
which $p = p_n$ is given and estimates $\hat\beta$ for $p$-vectors $\beta$ have been determined, an
asymptotic $(1 - \alpha)$ confidence region for $\beta$ according to Theorem 4.1 is given by
$$(\hat\beta - \beta)^T \Gamma (\hat\beta - \beta) \le c(\alpha), \qquad c(\alpha) = \big[p + 1 + \sqrt{2(p + 1)}\,\Phi^{-1}(1 - \alpha)\big]/n,$$
where $\Gamma$ may be replaced by its empirical counterparts $\tilde\Gamma$ or $\hat\Gamma$. More precisely, we have
the following:
COROLLARY 4.3. Denote the eigenvectors/eigenvalues of the matrix $\Gamma$ [see
(8)] by $(e_1, \lambda_1), \dots, (e_{p+1}, \lambda_{p+1})$, and let
$$e_k = (e_{k1}, \dots, e_{k,p+1})^T, \qquad \omega_k(t) = \sum_{l=1}^{p+1} \rho_l(t)\, e_{kl}, \quad k = 1, \dots, p + 1.$$
Then, for large $n$ and $p_n$, an approximate $(1 - \alpha)$ simultaneous confidence band
is given by
$$\hat\beta(t) \pm \sqrt{c(\alpha) \sum_{k=1}^{p+1} \frac{\omega_k(t)^2}{\lambda_k}}. \tag{19}$$

A practical simultaneous band is obtained by substituting estimates for
$\omega_k$ and $\lambda_k$ that result from empirical matrices $\tilde\Gamma$ or $\hat\Gamma$ instead of $\Gamma$.
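The band (19) can be computed from an eigendecomposition of $\Gamma$. The sketch below evaluates $c(\alpha)$ with the standard normal quantile from Python's `statistics.NormalDist` and the half-width $\sqrt{c(\alpha)\sum_k \omega_k(t)^2/\lambda_k}$ on a grid; `gamma` and the two basis functions are hypothetical placeholders for an empirical $\tilde\Gamma$ or $\hat\Gamma$ and the basis actually used.

```python
import math
from statistics import NormalDist

import numpy as np

def c_alpha(alpha_level, p, n):
    """c(alpha) = [p + 1 + sqrt(2(p+1)) Phi^{-1}(1 - alpha)] / n."""
    z = NormalDist().inv_cdf(1.0 - alpha_level)
    return (p + 1 + math.sqrt(2.0 * (p + 1)) * z) / n

def band_halfwidth(gamma, basis_at_t, c):
    """Half-width of (19): sqrt(c * sum_k omega_k(t)^2 / lambda_k).

    gamma: (p+1, p+1) matrix Gamma or an empirical version;
    basis_at_t: (p+1, m) array whose rows are the basis functions on a t-grid.
    """
    lam, vecs = np.linalg.eigh(gamma)    # columns of vecs are the eigenvectors e_k
    omega = vecs.T @ basis_at_t          # omega_k(t) = sum_l rho_l(t) e_kl
    return np.sqrt(c * np.sum(omega ** 2 / lam[:, None], axis=0))

t = np.linspace(0.0, 1.0, 5)
basis = np.array([np.ones_like(t), np.sqrt(2.0) * np.sin(np.pi * t)])   # hypothetical rho_1, rho_2
gamma = np.array([[2.0, 0.3], [0.3, 1.0]])                              # hypothetical Gamma
c = c_alpha(0.05, p=1, n=500)
half = band_halfwidth(gamma, basis, c)   # band: beta_hat(t) +/- half
```

Since $\sum_k \omega_k(t)^2/\lambda_k = \rho(t)^T\Gamma^{-1}\rho(t)$ with $\rho(t) = (\rho_1(t), \dots, \rho_{p+1}(t))^T$, the half-width is the largest value of $|(\hat\beta - \beta)^T\rho(t)|$ over the confidence ellipsoid, which is why the band is simultaneous in $t$.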

5. Simulation study and model selection.

5.1. Model order selection. An auxiliary parameter of importance in the
estimation procedure is the number $p$ of eigenfunctions that are used in fitting
the function $\beta(t)$. This number has to be chosen by the statistician. An appealing
method is the Akaike information criterion (AIC), due to its affinity to increasing
model orders, and, in addition, we found AIC to work well in practice. We discuss
here the consistency of AIC for choosing $p$ in the context of the generalized linear
model with full likelihood and known link function.
Assume the linear predictor vector $\eta_p$ consists of $n$ components $\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{(i)}\beta_j$, $i = 1, \dots, n$,
the vector $\hat\eta_p$ of the components $\hat\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{(i)}\hat\beta_j$
and the vector $\eta$ of the components $\sum_{j=0}^{\infty} \varepsilon_j^{(i)}\beta_j$. Let $G$ be the antiderivative
of the (inverse) link function $g$ so that $Y$ has the density (in canonical form)
$f_Y(y) = \exp(y\eta + a(y) - G(\eta))$. In particular, $\tilde\sigma^2(\eta) = g'(\eta)$. The deviance is
$$D = -2\ell_n(Y, \hat\eta_p) + 2\ell_n\big(Y, g^{-1}(Y)\big),$$
with log-likelihood
$$\ell_n(Y, \hat\eta_p) = \sum_{i=1}^{n} Y_i \hat\eta_{i,p} - \sum_{i=1}^{n} G(\hat\eta_{i,p}).$$

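Taylor expansion yields
$$-2\ell_n(Y, \hat\eta_p) = -2\ell_n(Y, \eta_p) + 2\big[\nabla_{\beta_p}\ell_n(Y, \hat\eta_p)\big]^T(\beta_p - \hat\beta_p) + (\beta_p - \hat\beta_p)^T \Big[\frac{\partial^2}{\partial\beta_k\,\partial\beta_l}\,\ell_n(Y, \tilde\eta_p)\Big](\beta_p - \hat\beta_p),$$
where the second term on the right-hand side is zero, due to the score equation, and
the matrix in the quadratic form is essentially $(D^T D)$. It follows from the proof of
Theorem 4.1 that the quadratic form $n(\beta_p - \hat\beta_p)^T (D^T D/n)(\beta_p - \hat\beta_p)$ has asymptotic
expectation $p$. Since
$$-2\ell_n(Y, \eta_p) = -2\ell_n(Y, \eta) - 2\sum_{i=1}^{n} \big(Y_i - g(\eta_i)\big)(\eta_{i,p} - \eta_i) + \sum_{i=1}^{n} g'(\eta_i)(\eta_{i,p} - \eta_i)^2,$$
we arrive at
$$\begin{aligned}
E(D) &= n \sum_{k,l=p+1}^{\infty} E\big(g'(\eta)\varepsilon_k\varepsilon_l\big)\beta_k\beta_l - p\big(1 + o(1)\big) + E_n \\
&= n \sum_{k,l=p+1}^{\infty} E\Big(\frac{g'^2(\eta)}{\tilde\sigma^2(\eta)}\,\varepsilon_k\varepsilon_l\Big)\beta_k\beta_l - p\big(1 + o(1)\big) + E_n,
\end{aligned}$$
where $E_n$ is an expression that does not depend on $p$.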


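Applying the law of large numbers, and similar considerations as in Section 7,
we find $D/E(D) \xrightarrow{p} 1$, as long as $p$ is chosen in $(p_0, cn^{1/4})$. Next, applying
results of Section 7,
$$\begin{aligned}
d\big(\hat\beta(\cdot), \beta(\cdot)\big) &= \iint \big(\hat\beta(s) - \beta(s)\big) G(s,t) \big(\hat\beta(t) - \beta(t)\big)\,dw(s)\,dw(t) \\
&= (\hat\beta_p - \beta_p)^T \Gamma (\hat\beta_p - \beta_p) + \sum_{k,j=p+1}^{\infty} \gamma_{j,k}\beta_j\beta_k + 2\sum_{j=1}^{p}\sum_{k=p+1}^{\infty} \gamma_{j,k}(\hat\beta_j - \beta_j)\beta_k,
\end{aligned}$$
where $\gamma_{k,l} = E\big(\frac{g'^2(\eta)}{\tilde\sigma^2(\eta)}\,\varepsilon_k\varepsilon_l\big)$. We obtain $E\big(d(\hat\beta(\cdot), \beta(\cdot))\big) = \frac{p}{n}\big(1 + o(1)\big) + \sum_{k,j=p+1}^{\infty} \gamma_{j,k}\beta_j\beta_k\,\big(1 + o(1)\big)$.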
This analysis shows that the target function d(β̂(·), β(·)) to be minimized is
asymptotically close to E(D/n) + 2p/n. This suggests that we are in the situation
considered by Shibata (1981) for sequences of linear models with normal residuals
and by Shao (1997) for the more general case. While the closeness of the target
function and AIC is suggestive, a rigorous proof that the order $p_A$ selected by AIC and the order $p_d$ that minimizes the target function satisfy $p_d/p_A \to 1$ in probability as $n \to \infty$, or a stronger consistency or efficiency result, requires additional analysis that is not provided here. One difficulty is that the usual normality assumption is not satisfied, since one operates in an exponential family or quasi-likelihood setting.
In practice, we implement AIC and the alternative Bayesian information
criterion BIC by obtaining first the deviance or quasi-deviance D(p), dependent
on the model order p. This is straightforward in the quasi-likelihood or maximum
likelihood case with known link function, and requires integrating the score
function to obtain the analogue of the log-likelihood in the SPQR case with
unknown link function. Once the deviance is obtained, we choose the minimizing argument of

$$C(p) = D(p) + P(p), \qquad (20)$$

where $P$ is the penalty term, chosen as $2p$ for the AIC and as $p \log n$ for the BIC. Several alternative selectors that we studied were found to be less stable and more computationally intensive in simulations. These included minimization of the leave-one-out prediction error, of the leave-one-out misclassification rate via cross-validation [Rice and Silverman (1991)], and of the relative difference between the Pearson criterion and the deviance [Chiou and Müller (1998)].
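Once $D(p)$ has been computed for each candidate order, criterion (20) reduces to a one-line minimization. The sketch below assumes the deviances are already available in a dictionary keyed by $p$; the helper names and the deviance values in the example are hypothetical.

```python
import math

def penalty(p, n, criterion):
    """Penalty P(p) in (20): 2p for AIC, p log n for BIC."""
    if criterion == "AIC":
        return 2.0 * p
    if criterion == "BIC":
        return p * math.log(n)
    raise ValueError(f"unknown criterion: {criterion}")

def select_order(deviance_by_p, n, criterion="AIC"):
    """Return the minimizing argument of C(p) = D(p) + P(p) over the candidate orders."""
    scores = {p: d + penalty(p, n, criterion) for p, d in deviance_by_p.items()}
    return min(scores, key=scores.get)

# Hypothetical deviances D(p) for p = 1, ..., 5 and n = 534
dev = {1: 700.0, 2: 690.0, 3: 684.0, 4: 681.0, 5: 680.5}
print(select_order(dev, n=534, criterion="AIC"), select_order(dev, n=534, criterion="BIC"))
```

Because $\log 534 > 2$, BIC penalizes each additional component more heavily and here picks a smaller order than AIC.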
GENERALIZED FUNCTIONAL LINEAR MODELS 789

5.2. Monte Carlo study. Besides choosing the number $p$ of components to include, an implementation of the proposed generalized functional linear model also requires choice of a suitable orthonormal basis $\{\rho_j, j = 1, 2, \dots\}$. Essentially one has two options: using a fixed standard basis such as the Fourier basis $\rho_j \equiv \varphi_j \equiv \sqrt{2}\sin(\pi j t)$, $t \in [0, 1]$, $j \ge 1$, or, alternatively, estimating the eigenfunctions of the covariance operator $A_K$ (11), (12) from the data, with the goal of achieving a sparse representation. We implemented this second option following an algorithm for the estimation of eigenfunctions which is described in detail in Capra and Müller (1997); see also Rice and Silverman (1991). Once the number of model components $p$ has been determined, the $i$th observed process is reduced to the $p$ predictors $\varepsilon_j^{(i)} = \int X_i(t)\rho_j(t)\,dw(t)$, $j = 1, \dots, p$. We substitute the estimated eigenfunctions for the $\rho_j$ and evaluate the integrals numerically.
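The reduction step can be sketched for curves observed on a dense regular grid. This is not the Capra and Müller (1997) algorithm used in the paper: the sketch assumes $dw(t) = dt$ on $[0, 1]$, uses the midpoint rule for the integrals, and estimates eigenfunctions by a plain eigendecomposition of the discretized sample covariance, without smoothing. All function names are illustrative.

```python
import numpy as np

def midpoint_grid(m):
    """m equally spaced midpoints on [0, 1]; each point has quadrature weight 1/m."""
    return (np.arange(m) + 0.5) / m

def fourier_basis(t, p):
    """Rows j = 1..p hold phi_j(t) = sqrt(2) sin(pi j t) evaluated on the grid t."""
    j = np.arange(1, p + 1)[:, None]
    return np.sqrt(2.0) * np.sin(np.pi * j * t[None, :])

def fpc_scores(X, basis, h):
    """eps_j^(i) = int X_i(t) rho_j(t) dt by the midpoint rule (dw(t) = dt assumed).
    X: n x m curves on the grid, basis: p x m; returns the n x p score matrix."""
    return h * X @ basis.T

def estimate_eigenfunctions(X, p, h):
    """Leading p eigenfunctions of the discretized sample covariance operator A_K."""
    Xc = X - X.mean(axis=0)
    K = Xc.T @ Xc / X.shape[0]              # sample covariance K(s, t) on the grid
    vals, vecs = np.linalg.eigh(K * h)      # (A_K f)(s) ~ sum_t K(s, t) f(t) h
    order = np.argsort(vals)[::-1][:p]
    rho = vecs[:, order].T / np.sqrt(h)     # rescale so that int rho_j(t)^2 dt = 1
    return rho, vals[order]
```

Eigenvectors are determined only up to sign, so an estimated $\hat\rho_j$ may come out as $-\varphi_j$; this does not affect the fitted model, since the sign of $\hat\beta_j$ flips accordingly.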
Once we have reduced the infinite-dimensional model (1) to its p-truncated
approximation (3), we are in the realm of finite-dimensional generalized linear
and quasi-likelihood models. The parameters α and β1 , . . . , βp in the p-truncated
generalized functional model are estimated by solving the respective score
equation. We adopted the weighted iterated least squares algorithm which is
described in McCullagh and Nelder (1989) for the case of a generalized linear
or quasi-likelihood model with known link function, and the QLUE algorithm
described in Chiou and Müller (1998) for the SPQR model with unknown link
function.
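For the known-link case, the iteratively reweighted least squares scheme of McCullagh and Nelder (1989) can be sketched as follows for the logit link. The design matrix `Z` is assumed to hold a column of ones followed by the scores $\varepsilon_1, \dots, \varepsilon_p$. This sketch covers only the canonical logit case; it does not implement the QLUE algorithm for unknown links.

```python
import numpy as np

def irls_logistic(Z, y, n_iter=50, tol=1e-10):
    """Iteratively reweighted least squares for a binomial GLM with logit link.
    Z: n x (p+1) design (intercept column, then scores eps_1..eps_p); y: 0/1 responses."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                  # for the canonical link, g'(eta) = variance
        z = eta + (y - mu) / w               # working response
        WZ = Z * w[:, None]
        beta_new = np.linalg.solve(Z.T @ WZ, WZ.T @ z)   # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score equation $Z^T(y - \hat\mu) = 0$ holds up to numerical error, which is a convenient check.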
The purpose of our Monte Carlo study was to compare AIC and BIC as selection
criteria for the order p, to study the power of statistical tests for regression
effect in a generalized functional regression model and, finally, to investigate
the behavior of the semiparametric SPQR procedure for functional regression,
in comparison to the maximum or quasi-likelihood implementation with a fully
specified link function. The design was as follows: pseudo-random processes based on the first 20 functions from the Fourier basis, $X(t) = \sum_{j=1}^{20} \varepsilon_j \varphi_j(t)$, were generated by using normal pseudo-random variables $\varepsilon_j \sim N(0, 1/j^2)$, $j \ge 1$. Choosing $\beta_j = 1/j$, $1 \le j \le 3$, $\beta_0 = 1$, $\beta_j = 0$, $j > 3$, we defined $\beta(t) = \sum_{j=1}^{20} \beta_j \varphi_j(t)$ and $p(X(\cdot)) = g\bigl(\beta_0 + \sum_{j=1}^{20} \beta_j \varepsilon_j\bigr)$, choosing the logit link [with $g(x) = \exp(x)/(1 + \exp(x))$] and the c-loglog link [with $g(x) = \exp(-\exp(-x))$]. Then we generated responses $Y(X) \sim \mathrm{Binomial}(p(X), 1)$ as pseudo-Bernoulli r.v.s with probability $p(X)$, obtaining a sample $(X_i(t), Y_i)$, $i = 1, \dots, n$. Estimation methods included generalized functional linear modeling with logit, c-loglog and unspecified (SPQR) link functions.
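Because the $\varphi_j$ are orthonormal, the linear predictor $\beta_0 + \int \beta(t)X(t)\,dt$ equals $\beta_0 + \sum_j \beta_j \varepsilon_j$, so the design can be simulated directly on the scores without generating curves. A minimal sketch under that observation, with the scaling factor $\delta$ of the power study included as an optional argument; names are illustrative.

```python
import numpy as np

LINKS = {
    "logit": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "c-loglog": lambda x: np.exp(-np.exp(-x)),   # g(x) = exp(-exp(-x)), as stated in the text
}

def simulate_design(n, link="logit", delta=1.0, n_basis=20, rng=None):
    """Scores eps_j ~ N(0, 1/j^2), coefficients (beta_0, beta_1, beta_2, beta_3) = (1, 1, 1/2, 1/3)
    scaled by delta, and Bernoulli responses with probability g(linear predictor)."""
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, n_basis + 1)
    eps = rng.normal(size=(n, n_basis)) / j      # standard deviation 1/j
    beta = np.zeros(n_basis)
    beta[:3] = 1.0 / j[:3]                       # beta_j = 1/j for j <= 3, zero otherwise
    beta0 = 1.0
    eta = delta * (beta0 + eps @ beta)           # parameter vector (delta, delta, delta/2, delta/3)
    prob = LINKS[link](eta)
    y = rng.binomial(1, prob)
    return eps, y, prob
```

With `delta=0.0` every success probability equals $g(0)$, which gives 0.5 under the logit link and $e^{-1}$ under the c-loglog link as defined above.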
In results not shown here, a first finding was that AIC performed somewhat better than BIC overall, in line with theoretical expectations; therefore, we used AIC in the data applications. To demonstrate the asymptotic results, in particular Theorem 4.1, we obtained empirical power functions for data generated and analyzed with the logit link, using the test statistic $T$ on the left-hand side of (16) to test the null hypothesis of no regression effect, $H_0: \beta_j = 0$, $j = 1, 2, \dots$.

This test was implemented as a one-sided test at the 5% level, that is, rejection was recorded whenever $|T| > \Phi^{-1}(0.95)$. The average rejection rate was determined over 500 Monte Carlo runs, for sample sizes $n = 50, 200$, as a function of $\delta$, $0 \le \delta \le 2$, where the underlying parameter vector was as described in the preceding paragraph, multiplied by $\delta$, that is, $(\delta, \delta, \delta/2, \delta/3)$. The resulting power functions are shown in Figure 1 and demonstrate that sample size plays a critical role.
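The rejection rule and the Monte Carlo rejection rate can be sketched as below. Statistic (16) is not reproduced here; as a stand-in we assume a standardized quadratic form $(Q - (p+1))/\sqrt{2(p+1)}$, modeled on Proposition 7.1, and under the null we draw $Q$ from $\chi^2_{p+1}$. Both the form of the statistic and the helper names are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

def rejection_rate(T_values, level=0.05):
    """Share of Monte Carlo runs with |T| > Phi^{-1}(1 - level), the rule quoted in the text."""
    crit = NormalDist().inv_cdf(1.0 - level)     # Phi^{-1}(0.95) is about 1.645 for level 0.05
    T = np.asarray(T_values, dtype=float)
    return float(np.mean(np.abs(T) > crit))

def standardized_quadratic_stat(q, p):
    """Hypothetical stand-in for T: (Q - (p+1)) / sqrt(2(p+1))."""
    return (np.asarray(q, dtype=float) - (p + 1)) / np.sqrt(2.0 * (p + 1))

rng = np.random.default_rng(4)
q_null = rng.chisquare(4, size=40000)            # p = 3, no regression effect
print(rejection_rate(standardized_quadratic_stat(q_null, 3)))
```

With only $p = 3$ the chi-square is visibly skewed, so the null rejection rate of this stand-in comes out near 7% rather than the nominal 5%; the normal approximation improves as $p$ grows.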
To demonstrate the usefulness of the SPQR approach with automatic link
estimation, we calculated the means of the estimated regression parameter
functions β̂(·) over 50 Monte Carlo runs for the following cases: In each run,
1000 samples were generated with either the logit or c-loglog link function and the
corresponding functions β(·) were estimated in three different ways: assuming a logit link, assuming a c-loglog link, and assuming no specific link, using the SPQR method.
The resulting mean function estimates can be seen in Figure 2. One finds that
misspecification of the link function can lead to serious problems with these
estimates and that the flexibility of the SPQR approach entails a clear advantage
over methods where a link function must be specified a priori.

FIG. 1. Empirical power functions for the significance test for a functional logistic regression effect at the 5% level, based on 500 simulations, for sample sizes 50 (dashed) and 200 (solid), with $p = 3$.

FIG. 2. Average estimates of the regression parameter function β(·) obtained over 50 Monte Carlo runs from data generated either with the logit link (left panel) or with the c-loglog link (right panel). Each panel displays the target function (solid), and estimates obtained assuming the logit link (dashed), the c-loglog link (dash-dot) and the SPQR method incorporating nonparametric link function estimation (dotted).

6. Application to medfly data and classification. It is a long-standing


problem in evolution and ecology to analyze the interplay of longevity and
reproduction. On one hand, longevity is a prerequisite for reproduction; on the
other hand, numerous articles have been written about a “cost of reproduction,”
which is the concept that a high degree of reproduction inflicts damage on the
organism and shortens its lifespan [see, e.g., Partridge and Harvey (1985)]. The
precise nature of this cost of reproduction remains elusive.
Studies with Mediterranean fruit flies (Ceratitis capitata), or medflies for short,
have been of considerable interest in pursuing these questions as hundreds of
flies can be reared simultaneously and their daily reproduction activity can be
observed by simply counting the daily eggs laid by each individual fly, in addition
to recording its lifetime [Carey et al. (1998a, b)]. For each medfly, one may thus
obtain a reproductive trajectory and one can then ask the operational question
whether particular features of this random curve have an impact on subsequent
mortality [see Müller et al. (2001) for a parametric approach and Chiou, Müller
and Wang (2003) for a functional model, where the egg-laying trajectories are
viewed as response]. In the present framework we cast this as the problem of
predicting whether a fly is short- or long-lived after an initial period of egg-laying
is observed. We adopt a functional binomial regression model where the initial
egg-laying trajectory is the predictor process and the subsequent longevity status
of the fly is the response. Of particular interest is the shape of the parameter
function β(·), as it provides an indication as to which features of the egg-laying
process are associated with the longevity of a fly.
From the one thousand medflies described in Carey et al. (1998a), we select flies which lived past 34 days, providing us with a sample of 534 medflies. For prediction, we use egg-laying trajectories from 0 to 30 days, slightly smoothed to obtain the predictor processes $X_i(t)$, $t \in [0, 30]$, $i = 1, \dots, 534$. A fly is classified as long-lived if the remaining lifetime past 30 days is 14 days or longer, otherwise as short-lived. Of the $n = 534$ flies, 256 were short-lived and 278 were long-lived. We apply the algorithm as described in the previous section, choosing the logit link and thus fitting a logistic functional regression.
When the reproductive trajectories for the long-lived and short-lived flies are plotted separately (upper panels of Figure 3), no clear visual differences between the two groups can be discerned. Failure to visually detect differences between the two groups could result from overcrowding of these plots with too many curves, but the picture remains the same when fewer curves are displayed (lower panels of Figure 3). The discrimination task at hand is therefore difficult, since at best subtle, hard-to-discern differences exist between the trajectories of the two groups.
FIG. 3. Predictor trajectories, corresponding to slightly smoothed daily egg-laying curves, for $n = 534$ medflies. The reproductive trajectories for 256 short-lived medflies are in the upper left panel and those for 278 long-lived medflies in the upper right panel. Randomly selected profiles from the panels above are shown in the lower panels for 50 medflies.

FIG. 4. Akaike information criterion (AIC) as a function of the number of model components $p$ for the medfly data.

We use the Akaike information criterion (AIC) for choosing the number of model components. As can be seen from Figure 4, where the AIC criterion is shown as a function of the model order $p$, this leads to the choice $p = 6$. The cross-validation prediction error criterion $\mathrm{PE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \hat p_i^{(-i)}\bigr)^2$, where $\hat p_i^{(-i)}$ is the leave-one-out estimate for $p_i$, supports a similar choice. The leave-one-out misclassification rate estimates are, for the group of long-lived flies, 37% with the logit link and 35% with the nonparametric SPQR link, while for the group of short-lived flies they are 47% for logit and 48% for SPQR, demonstrating the difficulty of classifying short-lived flies correctly.
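The leave-one-out quantities above can be computed with any fitting routine. The sketch below takes the fit and predict steps as arguments (for instance, a logistic fit on the first $p$ scores), and classifies an observation as long-lived when $\hat p_i^{(-i)} \ge 0.5$; that cutoff is our assumption, as the text does not state it. The function name is illustrative.

```python
import numpy as np

def loo_prediction_error(Z, y, fit, predict):
    """PE = (1/n) sum_i (Y_i - p_hat_i^(-i))^2 and leave-one-out misclassification rate per group.
    fit(Z, y) returns a model; predict(model, Z) returns success probabilities."""
    n = len(y)
    p_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(Z[keep], y[keep])
        p_loo[i] = predict(model, Z[i:i + 1])[0]
    pe = float(np.mean((y - p_loo) ** 2))
    yhat = (p_loo >= 0.5).astype(int)            # assumed cutoff of 0.5
    miscl = {g: float(np.mean(yhat[y == g] != g)) for g in (0, 1)}
    return pe, miscl

# Toy usage with an intercept-only "model" (leave-one-out sample mean)
y_toy = np.array([1, 1, 0, 0])
Z_toy = np.zeros((4, 1))
print(loo_prediction_error(Z_toy, y_toy, lambda Z, y: y.mean(), lambda m, Z: np.full(len(Z), m)))
```

The toy case shows why leave-one-out is informative: removing each observation pulls the mean toward the other class, so PE equals 4/9 and every point is misclassified.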
The fitted regression parameter functions $\hat\beta(\cdot)$ for both logistic (logit link) and SPQR (nonparametric link) functional regression, along with simultaneous confidence bands (19), are shown in Figure 5; we find that the estimate with nonparametric link is quite close to the estimate employing the logistic link, thus providing some support for the choice of the logistic link in this case. The asymptotic confidence bands allow us to conclude that the parameter function has a steep rise at the right end towards age 30 days, and that the null hypothesis of no effect would be rejected.
The shape of the parameter function $\hat\beta(\cdot)$ highlights periods of egg-laying that are associated with increased longevity. We note that under the logit link function, the predicted classification probability for a long-lived fly is $g(\eta) = \exp(\eta)/(1 + \exp(\eta))$. Overlaid with this expit function, the nonparametric link function estimate that is employed in SPQR is shown in Figure 6 (choosing local linear smoothing and the bandwidth 0.55 for the smoothing steps), along with the corresponding indicator data from the last iteration step. For both links, larger linear predictors $\eta$, and therefore larger values of the regression parameter function $\beta(\cdot)$, are seen to be associated with an increased chance for longevity. Since the parameter function is relatively large between about 12–17 days and past 26 days, we conclude that heavy reproductive activity during these periods is associated with increased longevity. In contrast, increased reproduction between 8–12 days and 20–26 days is associated with decreased longevity. A high level of late reproduction emerges as a significant and, overall, the strongest indicator of longevity in our analysis. This is of biological significance, since it implies that increased late reproduction is associated with increased longevity and may have a protective effect. Increased reproduction during the peak egg-laying period around 10 days has previously been associated with a cost of reproduction, an association that is supported by our analysis.

FIG. 5. The regression parameter function estimates $\hat\beta(\cdot)$ (19) (solid) for the medfly classification problem, with simultaneous confidence bands (5) (dashed). Left panel: logit link. Right panel: nonparametric link, using the SPQR algorithm.

FIG. 6. Logit link (dashed) and nonparametric link function (solid) obtained via the SPQR algorithm, with overlaid group indicators, versus level of linear predictor $\eta$.
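The nonparametric link estimate in Figure 6 comes from smoothing the 0/1 group indicators against the current linear predictor. A single local linear smoothing step with bandwidth 0.55 can be sketched as follows. The Gaussian kernel is our assumption (the kernel is not specified above), and this is one smoothing pass only, not the full SPQR iteration.

```python
import numpy as np

def local_linear(x, y, grid, h=0.55):
    """Local linear smoother of y on x at the grid points (Gaussian kernel assumed).
    At each x0 it fits a + b (x - x0) by kernel-weighted least squares and returns a."""
    out = np.empty(len(grid))
    for k, x0 in enumerate(grid):
        d = x - x0
        w = np.exp(-0.5 * (d / h) ** 2)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        out[k] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return out
```

A local linear fit reproduces straight lines exactly, and applied to indicators generated from an expit curve it returns approximately 0.5 at $\eta = 0$. The smoothed link need not be monotone, which is one reason to inspect it graphically as in Figure 6.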

7. Proof of Theorem 4.1 and auxiliary results. Proofs of the auxiliary results in this section are provided in the Appendix. Throughout, we assume that all assumptions of Theorem 4.1 are satisfied and work with the matrices $\Gamma = (\gamma_{kl})$, $\Xi = \Gamma^{-1} = (\xi_{kl})$, $0 \le k, l \le p$, defined in (8), and also with the matrix $\Xi^{1/2} =: (\xi_{kl}^{(1/2)})$, $0 \le k, l \le p$. We will use both versions $\sigma(\cdot)$ and $\tilde\sigma(\cdot)$ to represent the variance function, depending on the context, noting that $\sigma(\mu) = \tilde\sigma(\eta)$, and the notation $\beta, \hat\beta$ for the $(p_n + 1)$-vectors defined before Theorem 4.1 and $\beta(\cdot)$ for the parameter function.
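For the first step of the proof of Theorem 4.1, we adopt the usual Taylor-expansion-based approach for showing asymptotic normality of an estimator which is defined through an estimating equation; see, for example, McCullagh (1983). Writing the Hessian of the quasi-likelihood as $J_\beta = \nabla_\beta U(\beta)$ and noting that

$$D^T D = \sum_{i=1}^{n} g'^2(\eta_i)\,\varepsilon^{(i)}\varepsilon^{(i)T}/\tilde\sigma^2(\eta_i),$$

we obtain

$$J_\beta = \sum_{i=1}^{n} \frac{\partial}{\partial\eta_i}\Bigl[g'(\eta_i)\,\varepsilon^{(i)}\bigl(Y_i - g(\eta_i)\bigr)/\tilde\sigma^2(\eta_i)\Bigr]\,\nabla_\beta^T \eta_i = -D^T D + \sum_{i=1}^{n}\bigl(Y_i - g(\eta_i)\bigr)\varepsilon^{(i)}\varepsilon^{(i)T}\Bigl[\frac{g''(\eta_i)}{\tilde\sigma^2(\eta_i)} - \frac{g'(\eta_i)\,(\tilde\sigma^2)'(\eta_i)}{\tilde\sigma^4(\eta_i)}\Bigr] = -D^T D + R, \text{ say.}$$

We aim to show that the remainder term $R = J_\beta + D^T D$ can eventually be neglected. By a Taylor expansion, for a $\tilde\beta$ between $\beta$ and $\hat\beta$,

$$U(\beta) = U(\hat\beta) - J_{\tilde\beta}(\hat\beta - \beta) = -J_{\tilde\beta}(\hat\beta - \beta) = D^T D(\hat\beta - \beta) - (J_{\tilde\beta} - J_\beta)(\hat\beta - \beta) - (J_\beta + D^T D)(\hat\beta - \beta).$$

Denoting the $q \times q$ identity matrix by $I_q$, this leads to

$$\sqrt{n}(\hat\beta - \beta) = \sqrt{n}\bigl[D^T D - (J_{\tilde\beta} - J_\beta) - (J_\beta + D^T D)\bigr]^{-1} U(\beta) = \Bigl[I_{p_n+1} - \Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{J_{\tilde\beta} - J_\beta}{n} - \Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{J_\beta + D^T D}{n}\Bigr]^{-1}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{U(\beta)}{\sqrt{n}}.$$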

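Using the matrix norm $\|M\|_2 = \bigl(\sum_{k,l} m_{kl}^2\bigr)^{1/2}$, we find (see the Appendix for the proof) the following:

LEMMA 7.1. As $n \to \infty$,

$$\Bigl\|\sqrt{n}(\hat\beta - \beta) - \Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{U(\beta)}{\sqrt{n}}\Bigr\|_2 = o_p(1).$$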

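The asymptotically prevailing term is seen to be

$$\sqrt{n}(\hat\beta - \beta) \sim \Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{U(\beta)}{\sqrt{n}},$$

corresponding to

$$Z_n = \Gamma_n^{1/2}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{D^T V^{-1/2}(Y - \mu)}{\sqrt{n}} = \Gamma_n^{1/2}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{D^T e}{\sqrt{n}}, \qquad e = V^{-1/2}(Y - \mu).$$

Of interest is then the asymptotic distribution of $Z_n^T Z_n$. Defining $(p+1)$-vectors $X_n$ and $(p+1)\times(p+1)$-matrices $\Omega_n$ by

$$X_n = \frac{\Xi_n^{1/2} D^T e}{\sqrt{n}}, \qquad \Omega_n = \Gamma_n^{1/2}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\Gamma_n^{1/2}, \tag{21}$$

we may decompose this into three terms,

$$Z_n^T Z_n = X_n^T \Omega_n^2 X_n \tag{22}$$

$$= X_n^T X_n + 2X_n^T(\Omega_n - I_{p_n+1})X_n + X_n^T(\Omega_n - I_{p_n+1})(\Omega_n - I_{p_n+1})X_n \tag{23}$$

$$= F_n + G_n + H_n, \text{ say.} \tag{24}$$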
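The decomposition (22)–(24) rests only on the algebraic identity $\Omega^2 = I + 2(\Omega - I) + (\Omega - I)^2$ together with $Z_n = \Omega_n X_n$ for symmetric $\Omega_n$. A small numerical check of that identity, with an arbitrary symmetric positive definite matrix standing in for $\Omega_n$ (illustrative names):

```python
import numpy as np

def decomposition_terms(X, Omega):
    """F = X^T X, G = 2 X^T (Omega - I) X, H = X^T (Omega - I)^2 X, as in (23)-(24)."""
    I = np.eye(len(X))
    F = X @ X
    G = 2.0 * X @ (Omega - I) @ X
    H = X @ (Omega - I) @ (Omega - I) @ X
    return F, G, H

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 5))
Omega = A @ A.T / 5 + np.eye(5)        # symmetric positive definite stand-in for Omega_n
X = rng.normal(size=5)
F, G, H = decomposition_terms(X, Omega)
Z = Omega @ X
print(np.isclose(Z @ Z, F + G + H))
```

The lemma that follows shows that $\Omega_n$ is close to the identity, which is what makes $G_n$ and $H_n$ small relative to $F_n$.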
The following lemma is instrumental, as it implies that in deriving the limit
distribution, Gn and Hn are asymptotically negligible as compared to Fn .

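LEMMA 7.2. Under the conditions

(M3′) $p_n = o(n^{1/3})$,

(M4′) $\sum_{k_1,\dots,k_4=0}^{p_n} E\bigl(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4}\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\bigr)\xi_{k_1k_2}\xi_{k_3k_4} = o(n/p_n^2)$,

we have that

$$\|\Omega_n - I_{p_n+1}\|_2 = o_p(1/\sqrt{p_n}).$$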

Note that conditions (M3′) and (M4′) are weaker than the corresponding conditions (M2) and (M3) and, therefore, will be satisfied under the basic assumptions. A consequence of Lemma 7.2 is

$$\bigl|X_n^T(\Omega_n - I_{p_n+1})X_n\bigr| \le |X_n X_n^T|\,\|\Omega_n - I_{p_n+1}\|_2 = O_p(p_n)\,o_p(1/\sqrt{p_n}) = o_p(\sqrt{p_n}).$$

Therefore, $G_n/\sqrt{p_n} \to_p 0$. The bound for the term $H_n$ is completely analogous. Since we will show in Proposition 7.1 below that $(X_n^T X_n - (p_n + 1))/\sqrt{2p_n} \to_d N(0, 1)$ [this implies $|X_n X_n^T| = O_p(p_n)$], it follows that $G_n + H_n = o_p(F_n)$, so that these terms can indeed be neglected. The proof of Theorem 4.1 will therefore be complete if we show the following:

PROPOSITION 7.1. As $n \to \infty$, $(X_n^T X_n - (p_n + 1))/\sqrt{2p_n} \to_d N(0, 1)$.
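Proposition 7.1 is the functional analogue of the fact that a $\chi^2_{p+1}$ variable, centered at $p + 1$ and scaled by $\sqrt{2p}$, is approximately standard normal for large $p$. The sketch below illustrates this only in the idealized case $X_n \sim N(0, I_{p+1})$ exactly; in the proof, $X_n$ is merely asymptotically of this form, and the martingale argument that follows handles the general case.

```python
import numpy as np

def standardized_chi2_draws(p, reps=20000, seed=0):
    """Draw X ~ N(0, I_{p+1}) and return (X^T X - (p+1)) / sqrt(2p), the statistic of Proposition 7.1."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(reps, p + 1))
    q = (X ** 2).sum(axis=1)                 # chi-square with p+1 degrees of freedom
    return (q - (p + 1)) / np.sqrt(2.0 * p)

s = standardized_chi2_draws(50)
print(s.mean(), s.var())
```

For $p = 50$ the sample mean is near 0 and the sample variance near $(p+1)/p = 1.02$.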

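For the proof of Proposition 7.1, we make use of

$$X_n = \frac{\Xi_n^{1/2} D^T e}{\sqrt{n}} = \Bigl(\sum_{\nu=1}^{n}\sum_{t=0}^{p_n}\xi_{it}^{(1/2)}\varepsilon_t^{(\nu)}\frac{g'(\eta_\nu)}{\tilde\sigma(\eta_\nu)}e_\nu\Big/\sqrt{n}\Bigr)_{i=0,\dots,p_n}$$

and

$$\sum_{k=0}^{p_n}\xi_{kt_1}^{(1/2)}\xi_{kt_2}^{(1/2)} = \xi_{t_1t_2}$$

to obtain

$$X_n^T X_n = \frac{1}{n}\sum_{k=0}^{p_n}\sum_{\nu_1,\nu_2=1}^{n}\sum_{t_1,t_2=0}^{p_n} e_{\nu_1}e_{\nu_2}\frac{g'(\eta_{\nu_1})g'(\eta_{\nu_2})}{\tilde\sigma(\eta_{\nu_1})\tilde\sigma(\eta_{\nu_2})}\varepsilon_{t_1}^{(\nu_1)}\varepsilon_{t_2}^{(\nu_2)}\xi_{kt_1}^{(1/2)}\xi_{kt_2}^{(1/2)}$$

$$= \frac{1}{n}\sum_{\nu=1}^{n}\sum_{t_1,t_2=0}^{p_n} e_\nu^2\,\varepsilon_{t_1}^{(\nu)}\varepsilon_{t_2}^{(\nu)}\frac{g'^2(\eta_\nu)}{\tilde\sigma^2(\eta_\nu)}\xi_{t_1t_2} + \frac{1}{n}\sum_{\nu_1\ne\nu_2=1}^{n} e_{\nu_1}e_{\nu_2}\frac{g'(\eta_{\nu_1})g'(\eta_{\nu_2})}{\tilde\sigma(\eta_{\nu_1})\tilde\sigma(\eta_{\nu_2})}\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(\nu_1)}\varepsilon_{t_2}^{(\nu_2)}\xi_{t_1t_2}$$

$$= A_n + B_n, \text{ say.}$$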

We will analyze these terms in turn and utilize the independence of the random variables associated with observations $(X_i, Y_i)$ for different values of $i$, the independence of the $e_i$ of all $\varepsilon$'s, and $E(e) = 0$, $E(e^2) = 1$.

LEMMA 7.3. For $A_n$, it holds that

$$\frac{A_n - (p_n + 1)}{\sqrt{p_n}} \to_p 0.$$

Turning now to the second term $B_n$, we show that it is asymptotically normal. Defining the r.v.s

$$W_{nj} = \sum_{k=1}^{j-1} e_k e_j\frac{g'(\eta_k)g'(\eta_j)}{\tilde\sigma(\eta_k)\tilde\sigma(\eta_j)}\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(k)}\varepsilon_{t_2}^{(j)}\xi_{t_1t_2},$$

we may write

$$B_n = \frac{2}{n}\sum_{j=1}^{n} W_{nj}.$$

A key result is now the following:

LEMMA 7.4. The random variables $\{W_{nj}, 1 \le j \le n, n \in \mathbb{N}\}$ form a triangular array of martingale difference sequences w.r.t. the filtrations $(\mathcal{F}_{nj}) = \sigma\bigl(\varepsilon_t^{(i)}, e_i, 1 \le i \le j, 0 \le t \le p_n\bigr)$ $(1 \le j \le n, n \in \mathbb{N})$.
Note that $\mathcal{F}_{n,j} \subset \mathcal{F}_{n+1,j}$. Lemma 7.4 implies that the r.v.s $\widetilde W_{nj} = \frac{2}{n\sqrt{2p_n}}W_{nj}$ also form a triangular array of martingale difference sequences. According to the central limit theorem for martingale difference sequences [Brown (1971); see also Hall and Heyde (1980), Theorem 3.2 and corollaries], sufficient conditions for the asymptotic normality $\sum_{j=1}^{n}\widetilde W_{nj} \to_d N(0, 1)$ are the conditional normalization condition and the conditional Lyapunov condition. The following two lemmas, which are proved in the Appendix, demonstrate that these sufficient conditions are satisfied. We note that martingale methods have also been used by Ghorai (1980) for the asymptotic distribution of an error measure for orthogonal series density estimates.

LEMMA 7.5 (Conditional normalization condition).

$$\sum_{j=1}^{n} E\bigl(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}\bigr) \to_p 1, \qquad n \to \infty.$$

LEMMA 7.6 (Conditional Lyapunov condition).

$$\sum_{j=1}^{n} E\bigl(\widetilde W_{nj}^4 \mid \mathcal{F}_{n,j-1}\bigr) \to_p 0, \qquad n \to \infty.$$

A consequence of Lemmas 7.5 and 7.6 is then $B_n/\sqrt{2p_n} \to_d N(0, 1)$. Together with Lemma 7.3, this implies Proposition 7.1 and, thus, Theorem 4.1.

APPENDIX

We provide here the main arguments of the proofs of several corollaries and of
the auxiliary results which were used in Section 7 for the proof of Theorem 4.1.

PROOF OF COROLLARY 4.2. Extending the arguments used in the proofs of Theorems 1 and 2 in Chiou and Müller (1998), we find for these nonparametric function estimates under (R1) that

$$\sup_t\Bigl|\frac{\hat g'^2(t)}{\hat\sigma^2(t)} - \frac{g'^2(t)}{\sigma^2(t)}\Bigr| = O_p\Bigl(\Bigl(\frac{\log n}{nh^3}\Bigr)^{1/2} + h^2 + \frac{\sqrt{p_n}}{h^2}\|\hat\beta - \beta\|\Bigr).$$

Define the matrix

$$\tilde\Gamma = \frac{1}{n}(D^T D) = (\tilde\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \tilde\gamma_{k,l} = \frac{1}{n}\sum_{i=1}^{n}\frac{g'^2(\eta_i)}{\sigma^2(\eta_i)}\varepsilon_{ki}\varepsilon_{li}.$$

According to (21) and (22), the result (15) remains the same when replacing $\Gamma$ by $\tilde\Gamma$. From (R2) and observing the boundedness of $g'^2/\sigma^2$ below and above, we obtain $\hat\gamma_{kl} = \tilde\gamma_{kl}(1 + o_p(1))$, where the $o_p$-term is uniform in $k$, $l$ and $p_n$. The result follows by observing that the semiparametric estimate $\hat\beta$ has the same asymptotic behavior as the parametric estimate, except for some minor modifications due to the identifiability constraint. □

PROOF OF COROLLARY 4.3. The asymptotic $(1 - \alpha)$ confidence ellipsoid for $\beta \in \mathbb{R}^{p+1}$ is $(\hat\beta - \beta)^T(\Gamma/c(\alpha))(\hat\beta - \beta) \le 1$. Expressing the vectors $\beta, \hat\beta$ in terms of the eigenvectors $e_k$ leads to the coefficients $\hat\beta_k^* = \sum_l e_{kl}\hat\beta_l$, $\beta_k^* = \sum_l e_{kl}\beta_l$, and with $\gamma_k^* = (\hat\beta_k^* - \beta_k^*)/\sqrt{c(\alpha)/\lambda_k}$, $\omega_k^*(t) = \omega_k(t)\sqrt{c(\alpha)/\lambda_k}$, the confidence ellipsoid corresponds to the sphere $\sum_k \gamma_k^{*2} \le 1$. To obtain the confidence band, we need to maximize $\bigl|\sum_k(\hat\beta_k^* - \beta_k^*)\omega_k(t)\bigr| = \bigl|\sum_k \gamma_k^*\omega_k^*(t)\bigr|$ w.r.t. $\gamma_k^*$, subject to $\sum_k \gamma_k^{*2} \le 1$. By Cauchy–Schwarz, $\bigl|\sum_k \gamma_k^*\omega_k^*(t)\bigr| \le \bigl[\sum_k \omega_k^*(t)^2\bigr]^{1/2}$, and the maximizing $\gamma_k^*$ must be linearly dependent with the vector $(\omega_1^*(t), \dots, \omega_{p+1}^*(t))$, so that the Cauchy–Schwarz inequality becomes an equality. The result then follows from the definition of the $\omega_k^*(t)$. □
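The Cauchy–Schwarz step gives the band half-width at each $t$ as $\bigl[\sum_k \omega_k^*(t)^2\bigr]^{1/2} = \bigl[c(\alpha)\sum_k \omega_k(t)^2/\lambda_k\bigr]^{1/2}$, attained at $\gamma^* = \omega^*(t)/\|\omega^*(t)\|$. A minimal sketch, assuming the values $\omega_k(t)$ on a grid, the eigenvalues $\lambda_k$ and the constant $c(\alpha)$ are already available; the function name is illustrative.

```python
import numpy as np

def band_halfwidth(omega, lam, c_alpha):
    """max |sum_k gamma_k* omega_k*(t)| subject to sum_k gamma_k*^2 <= 1, for each grid point t.
    omega: (p+1) x m array of omega_k(t); lam: eigenvalues lambda_k; c_alpha: the constant c(alpha)."""
    omega_star = omega * np.sqrt(c_alpha / lam)[:, None]   # omega_k*(t) = omega_k(t) sqrt(c(alpha)/lambda_k)
    return np.sqrt((omega_star ** 2).sum(axis=0))          # Cauchy-Schwarz bound, attained

# Example at a single t: omega(t) = (3, 4), lambda = (4, 1), c(alpha) = 4 gives omega* = (3, 8)
print(band_halfwidth(np.array([[3.0], [4.0]]), np.array([4.0, 1.0]), 4.0))
```

The band is then $\hat\beta(t)$ plus or minus this half-width; random unit vectors $\gamma^*$ never exceed it, which is an easy numerical check of the argument.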

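PROOF OF LEMMA 7.1. We observe

$$E\Bigl(\Bigl\|\frac{J_\beta + D^T D}{n}\Bigr\|_2^2\Bigr) = O\Bigl(\frac{p_n^2}{n}\Bigr) \to 0,$$

since $|g^{(\nu)}(\cdot)| \le c < \infty$, $\nu = 1, 2$, $\tilde\sigma^2(\cdot) \le \tilde c < \infty$ and $\tilde\sigma^2(\cdot) \ge \delta > 0$ according to (M1). Together with $p_n = o(n^{1/4})$ (M2), this implies

$$\Bigl\|\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{J_\beta + D^T D}{n}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{U(\beta)}{\sqrt{n}}\Bigr\|_2 = o_p(1).$$

Similarly,

$$\Bigl\|\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{J_{\tilde\beta} - J_\beta}{n}\Bigl(\frac{D^T D}{n}\Bigr)^{-1}\frac{U(\beta)}{\sqrt{n}}\Bigr\|_2 = o_p(1),$$

whence the result follows. □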

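PROOF OF LEMMA 7.2. Note that

$$\|\Omega_n - I_{p_n+1}\|_2 \le \|\Omega_n\|_2\,\|\Omega_n^{-1} - I_{p_n+1}\|_2.$$

We show that $\|\Omega_n^{-1} - I_{p_n+1}\|_2 = o_p(1)$, implying

$$\|\Omega_n\|_2 \le \|I_{p_n+1}\|_2 + \frac{\|\Omega_n^{-1} - I_{p_n+1}\|_2}{1 - \|\Omega_n^{-1} - I_{p_n+1}\|_2} \sim \|I_{p_n+1}\|_2 = \sqrt{p_n + 1}.$$

Observe that

$$\Omega_n^{-1} = \Xi_n^{1/2}\,\frac{1}{n}D^T D\,\Xi_n^{1/2} = \Bigl(\sum_{j,m=0}^{p_n}\xi_{kj}^{(1/2)}\xi_{ml}^{(1/2)}\frac{1}{n}\sum_{\nu=1}^{n}\frac{g'^2(\eta_\nu)}{\tilde\sigma^2(\eta_\nu)}\varepsilon_j^{(\nu)}\varepsilon_m^{(\nu)}\Bigr)_{k,l=0}^{p_n}$$

and, therefore,

$$E\|\Omega_n^{-1} - I_{p_n+1}\|_2^2 = E\sum_{k,l=0}^{p_n}\Bigl(\frac{1}{n}\sum_{\nu_1=1}^{n}\sum_{j_1,m_1=0}^{p_n}\xi_{kj_1}^{(1/2)}\xi_{m_1l}^{(1/2)}\frac{g'^2(\eta_{\nu_1})}{\sigma^2(\mu_{\nu_1})}\varepsilon_{j_1}^{(\nu_1)}\varepsilon_{m_1}^{(\nu_1)} - \delta_{kl}\Bigr)\Bigl(\frac{1}{n}\sum_{\nu_2=1}^{n}\sum_{j_2,m_2=0}^{p_n}\xi_{kj_2}^{(1/2)}\xi_{m_2l}^{(1/2)}\frac{g'^2(\eta_{\nu_2})}{\tilde\sigma^2(\eta_{\nu_2})}\varepsilon_{j_2}^{(\nu_2)}\varepsilon_{m_2}^{(\nu_2)} - \delta_{kl}\Bigr) = O\Bigl(\frac{p_n + 1}{n}\Bigr) + o(1/p_n^2),$$

due to (M3′). Hence, by (M4′),

$$\|\Omega_n - I_{p_n+1}\|_2 = O_p(\sqrt{p_n})\,o_p(1/p_n) = o_p(1/\sqrt{p_n}). \qquad \square$$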
pn

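PROOF OF LEMMA 7.3. Since

$$E(A_n) = \frac{1}{n}\sum_{\nu=1}^{n}E(e_\nu^2)\sum_{t_1,t_2=0}^{p_n}E\Bigl(\varepsilon_{t_1}\varepsilon_{t_2}\frac{g'^2(\eta_\nu)}{\tilde\sigma^2(\eta_\nu)}\Bigr)\xi_{t_1t_2} = p_n + 1,$$

using the definition of $\Gamma$, $\Xi = \Gamma^{-1}$ and $E(e^2) = 1$, and, similarly, by (M3),

$$E(A_n^2) = o(p_n) + (p_n + 1)^2 - \frac{(p_n + 1)^2}{n},$$

we find that $0 \le \mathrm{Var}(A_n) = o(p_n)$. This concludes the proof. □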

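PROOF OF LEMMA 7.4. All random variables with upper index $j$ are independent of $\mathcal{F}_{n,j-1}$. Hence, we obtain

$$E(W_{nj} \mid \mathcal{F}_{n,j-1}) = \sum_{i=1}^{j-1}e_i\frac{g'(\eta_i)}{\tilde\sigma(\eta_i)}\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(i)}\xi_{t_1t_2}\,E\Bigl(e_j\frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\varepsilon_{t_2}^{(j)}\Bigm|\mathcal{F}_{n,j-1}\Bigr) = 0,$$

since

$$E\Bigl(e_j\frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\varepsilon_{t_2}^{(j)}\Bigm|\mathcal{F}_{n,j-1}\Bigr) = E(e_j)\,E\Bigl(\frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\varepsilon_{t_2}^{(j)}\Bigr) = 0. \qquad \square$$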

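PROOF OF LEMMA 7.5. We note

$$E(W_{nj}^2 \mid \mathcal{F}_{n,j-1}) = \sum_{i_1,i_2=1}^{j-1}e_{i_1}e_{i_2}\frac{g'(\eta_{i_1})g'(\eta_{i_2})}{\tilde\sigma(\eta_{i_1})\tilde\sigma(\eta_{i_2})}\sum_{t_1,\dots,t_4=0}^{p_n}\varepsilon_{t_1}^{(i_1)}\varepsilon_{t_3}^{(i_2)}\xi_{t_1t_2}\xi_{t_3t_4}\,E\Bigl(\varepsilon_{t_2}^{(j)}\varepsilon_{t_4}^{(j)}\frac{g'^2(\eta_j)}{\tilde\sigma^2(\eta_j)}e_j^2\Bigm|\mathcal{F}_{n,j-1}\Bigr)$$

$$= \sum_{i_1,i_2=1}^{j-1}e_{i_1}e_{i_2}\frac{g'(\eta_{i_1})g'(\eta_{i_2})}{\tilde\sigma(\eta_{i_1})\tilde\sigma(\eta_{i_2})}\sum_{t_1,t_3=0}^{p_n}\varepsilon_{t_1}^{(i_1)}\varepsilon_{t_3}^{(i_2)}\xi_{t_3t_1}$$

and obtain

$$E\bigl(E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})\bigr) = \sum_{i=1}^{j-1}\sum_{t_1,t_3=0}^{p_n}E\Bigl(\frac{g'^2(\eta)}{\tilde\sigma^2(\eta)}\varepsilon_{t_1}\varepsilon_{t_3}\Bigr)\xi_{t_3t_1} = (j - 1)(p_n + 1).$$

This implies

$$E\Bigl(\sum_{j=1}^{n}E\bigl(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}\bigr)\Bigr) \to 1, \qquad n \to \infty.$$

We are done if we can show $\mathrm{Var}\bigl(\sum_{j=1}^{n}E(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1})\bigr) \to 0$. In order to obtain the second moments, we first note

$$E\{E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})E(W_{nk}^2 \mid \mathcal{F}_{n,k-1})\} = \sum_{i_1,i_2=1}^{j-1}\sum_{i_3,i_4=1}^{k-1}E\Bigl(e_{i_1}e_{i_2}e_{i_3}e_{i_4}\frac{g'(\eta_{i_1})g'(\eta_{i_2})g'(\eta_{i_3})g'(\eta_{i_4})}{\tilde\sigma(\eta_{i_1})\tilde\sigma(\eta_{i_2})\tilde\sigma(\eta_{i_3})\tilde\sigma(\eta_{i_4})}\sum_{t_1,\dots,t_4=0}^{p_n}\varepsilon_{t_1}^{(i_1)}\varepsilon_{t_2}^{(i_2)}\varepsilon_{t_3}^{(i_3)}\varepsilon_{t_4}^{(i_4)}\xi_{t_1t_2}\xi_{t_3t_4}\Bigr)$$

$$= \mu_4(k - 1)\sum_{t_1,\dots,t_4=0}^{p_n}E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_1}\varepsilon_{t_2}\varepsilon_{t_3}\varepsilon_{t_4}\Bigr)\xi_{t_1t_2}\xi_{t_3t_4} + (j - 1)(k - 1)(p_n + 1)^2 + 2(k - 1)^2(p_n + 1),$$

and then obtain

$$E\Bigl(\sum_{j=1}^{n}E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})\Bigr)^2 = \sum_{j=1}^{n}E\{E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})\}^2 + 2\sum_{1 \le k < j \le n}E\{E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})E(W_{nk}^2 \mid \mathcal{F}_{n,k-1})\}$$

$$= \sum_{j=1}^{n}\Bigl[\mu_4(j - 1)\sum_{t_1,\dots,t_4=0}^{p_n}E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_1}\cdots\varepsilon_{t_4}\Bigr)\xi_{t_1t_2}\xi_{t_3t_4} + (j - 1)^2(p_n + 1)^2 + 2(j - 1)^2(p_n + 1)\Bigr]$$

$$+ 2\sum_{j=1}^{n}\sum_{k=1}^{j-1}\Bigl[(k - 1)\mu_4\sum_{t_1,\dots,t_4=0}^{p_n}E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_1}\cdots\varepsilon_{t_4}\Bigr)\xi_{t_1t_2}\xi_{t_3t_4} + (j - 1)(k - 1)(p_n + 1)^2 + 2(k - 1)^2(p_n + 1)\Bigr]$$

$$= O\Bigl(n^3\sum_{t_1,\dots,t_4=0}^{p_n}E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_1}\cdots\varepsilon_{t_4}\Bigr)\xi_{t_1t_2}\xi_{t_3t_4}\Bigr) + \frac{n^4}{4}(p_n + 1)^2\bigl(1 + o(1)\bigr) + \frac{n^4}{6}(p_n + 1)\bigl(1 + o(1)\bigr).$$

Applying (M2), we infer

$$E\Bigl(\sum_{j=1}^{n}E\bigl(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}\bigr)\Bigr)^2 = 1 + o(1)$$

and conclude that

$$\mathrm{Var}\Bigl(\sum_{j=1}^{n}E\bigl(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}\bigr)\Bigr) \to 0,$$

whence the result follows. □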

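PROOF OF LEMMA 7.6. Combining detailed calculations of $E(W_{nj}^4 \mid \mathcal{F}_{n,j-1})$ and $E\bigl(E(W_{nj}^4 \mid \mathcal{F}_{n,j-1})\bigr)$ with (M2) and (M3) leads to

$$\sum_{j=1}^{n}E\bigl(\widetilde W_{nj}^4\bigr) = O\Bigl(\frac{1}{n^4p_n^2}\Bigr)\Bigl[O(n^2)\sum_{t_1,\dots,t_8=0}^{p_n}E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_1}\varepsilon_{t_3}\varepsilon_{t_5}\varepsilon_{t_7}\Bigr)E\Bigl(\frac{g'^4(\eta)}{\tilde\sigma^4(\eta)}\varepsilon_{t_2}\varepsilon_{t_4}\varepsilon_{t_6}\varepsilon_{t_8}\Bigr)\xi_{t_1t_2}\xi_{t_3t_4}\xi_{t_5t_6}\xi_{t_7t_8} + O(n^3)\sum_{t_3,t_4,t_7,t_8=0}^{p_n}\xi_{t_3t_4}\xi_{t_7t_8}E\Bigl(\frac{g'^4(\eta)}{\sigma^4(\mu)}\varepsilon_{t_3}\varepsilon_{t_4}\varepsilon_{t_7}\varepsilon_{t_8}\Bigr)\Bigr] = o(1),$$

completing the proof. □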

Acknowledgments. We are grateful to an Associate Editor and two referees


for helpful remarks and suggestions that led to a substantially improved version of
the paper. We thank James Carey for making the medfly fecundity data available
to us and Ping-Shi Wu for help with the programming.

REFERENCES

ALTER, O., BROWN, P. O. and BOTSTEIN, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97 10101–10106.
ASH, R. B. and GARDNER, M. F. (1975). Topics in Stochastic Processes. Academic Press, New York.
BROWN, B. M. (1971). Martingale central limit theorems. Ann. Math. Statist. 42 59–66.
BRUMBACK, B. A. and RICE, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with discussion). J. Amer. Statist. Assoc. 93 961–994.
CAPRA, W. B. and MÜLLER, H.-G. (1997). An accelerated time model for response curves. J. Amer. Statist. Assoc. 92 72–83.
CARDOT, H., FERRATY, F. and SARDA, P. (1999). Functional linear model. Statist. Probab. Lett. 45 11–22.
CAREY, J. R., LIEDO, P., MÜLLER, H.-G., WANG, J.-L. and CHIOU, J.-M. (1998a). Relationship of age patterns of fecundity to mortality, longevity and lifetime reproduction in a large cohort of Mediterranean fruit fly females. J. Gerontology: Biological Sciences 53A B245–B251.
CAREY, J. R., LIEDO, P., MÜLLER, H.-G., WANG, J.-L. and VAUPEL, J. W. (1998b). Dual modes of aging in Mediterranean fruit fly females. Science 281 996–998.
CASTRO, P. E., LAWTON, W. H. and SYLVESTRE, E. A. (1986). Principal modes of variation for processes with continuous sample curves. Technometrics 28 329–337.
CHIOU, J.-M. and MÜLLER, H.-G. (1998). Quasi-likelihood regression with unknown link and variance functions. J. Amer. Statist. Assoc. 93 1376–1387.
CHIOU, J.-M. and MÜLLER, H.-G. (1999). Nonparametric quasi-likelihood. Ann. Statist. 27 36–64.
CHIOU, J.-M., MÜLLER, H.-G. and WANG, J.-L. (2003). Functional quasi-likelihood regression models with smooth random effects. J. R. Stat. Soc. Ser. B Stat. Methodol. 65 405–423.
CONWAY, J. B. (1990). A Course in Functional Analysis, 2nd ed. Springer, New York.
DUNFORD, N. and SCHWARTZ, J. T. (1963). Linear Operators. II. Spectral Theory. Wiley, New York.
FAN, J. and LIN, S.-K. (1998). Test of significance when the data are curves. J. Amer. Statist. Assoc. 93 1007–1021.
FAN, J. and ZHANG, J.-T. (2000). Two-step estimation of functional linear models with application to longitudinal data. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 303–322.
FARAWAY, J. J. (1997). Regression analysis for a functional response. Technometrics 39 254–261.
GHORAI, J. (1980). Asymptotic normality of a quadratic measure of orthogonal series type density estimate. Ann. Inst. Statist. Math. 32 341–350.
HALL, P. and HEYDE, C. (1980). Martingale Limit Theory and Its Applications. Academic Press, New York.
HALL, P., POSKITT, D. S. and PRESNELL, B. (2001). A functional data-analytic approach to signal discrimination. Technometrics 43 1–9.
HALL, P., REIMANN, J. and RICE, J. (2000). Nonparametric estimation of a periodic function. Biometrika 87 545–557.
JAMES, G. M. (2002). Generalized linear models with functional predictors. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 411–432.
MCCULLAGH, P. (1983). Quasi-likelihood functions. Ann. Statist. 11 59–67.
MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
MÜLLER, H.-G., CAREY, J. R., WU, D., LIEDO, P. and VAUPEL, J. W. (2001). Reproductive potential predicts longevity of female Mediterranean fruit flies. Proc. R. Soc. Lond. Ser. B Biol. Sci. 268 445–450.
PARTRIDGE, L. and HARVEY, P. H. (1985). Costs of reproduction. Nature 316 20–21.
SHAO, J. (1997). An asymptotic theory for linear model selection (with discussion). Statist. Sinica 7 221–264.
SHIBATA, R. (1981). An optimal selection of regression variables. Biometrika 68 45–54.
RAMSAY, J. O. and SILVERMAN, B. W. (1997). Functional Data Analysis. Springer, New York.
RICE, J. A. and SILVERMAN, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. J. Roy. Statist. Soc. Ser. B 53 233–243.
STANISWALIS, J. G. and LEE, J. J. (1998). Nonparametric regression analysis of longitudinal data. J. Amer. Statist. Assoc. 93 1403–1418.
WANG, J.-L., MÜLLER, H.-G., CAPRA, W. B. and CAREY, J. R. (1994). Rates of mortality in populations of Caenorhabditis elegans. Science 266 827–828.
WEDDERBURN, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss–Newton method. Biometrika 61 439–447.

DEPARTMENT OF STATISTICS
UNIVERSITY OF CALIFORNIA
ONE SHIELDS AVENUE
DAVIS, CALIFORNIA 95616
USA
E-MAIL: [email protected]

ABT. F. ZAHLEN- U. WAHRSCHEINLICHKEITSTHEORIE
UNIVERSITÄT ULM
89069 ULM
GERMANY
