Generalized Functional Linear Models
generalized score equation. The resulting regression coefficients obtained for the
linear predictor in such a model then provide us with an estimate of the parameter
function of the generalized functional regression model. This parameter function
replaces the parameter vector of the ordinary finite-dimensional generalized
linear model. We derive an asymptotic limit result (Theorem 4.1) for the
deviation between estimated and true parameter function for increasing dimension
asymptotics, referring to a situation where the number of components in the model
increases with sample size.
Asymptotic tests for the regression effect and simultaneous confidence bands
are obtained as corollaries of this main result. We include an extension to the
case of a semiparametric quasi-likelihood regression (SPQR) model in which link
and variance functions are unknown and are estimated from the data, extending
previous approaches of Chiou and Müller (1998, 1999), and also provide an
analysis of the AIC criterion for order selection.
The paper is organized as follows: The basics of the proposed generalized
functional linear model and some preliminary considerations can be found in
Section 2. The underlying ideas of estimation and statistical analysis within the
generalized functional linear model will be discussed in Section 3. The main
results and their ramifications are described in Section 4, preceded by a discussion
of the appropriate metric in which to formulate the asymptotic result, which
is found to be tied to the link and variance functions used for the generalized
functional linear model. Simulation results are reported in Section 5. An illustrative
example for the special case of binomial functional regression with the goal to
discriminate between short- and long-lived medflies is provided in Section 6. This
is followed by the main proofs in Section 7. Proofs of auxiliary results are in the
Appendix.
2. The generalized functional linear model. The data we observe for the ith
subject or experimental unit are ({Xi (t), t ∈ T }, Yi ), i = 1, . . . , n. We assume that
these data form an i.i.d. sample. The predictor variable X(t), t ∈ T , is a random
curve which is observed per subject or experimental unit and corresponds to a
square integrable stochastic process on a real interval T . The dependent variable
Y is a real-valued random variable which may be continuous or discrete. For
example, in the important special case of a binomial functional regression, one
would have Y ∈ {0, 1}.
Assume that a link function g(·) is given which is a monotone and twice con-
tinuously differentiable function with bounded derivatives and is thus invertible.
Furthermore, we have a variance function σ 2 (·) which is defined on the range of
the link function and is strictly positive. The generalized functional linear model
or functional quasi-likelihood model is determined by a parameter function β(·),
which is assumed to be square integrable on its domain T , in addition to the link
function g(·) and the variance function σ 2 (·).
and conditional means µ = g(η), where E(Y |X(t), t ∈ T ) = µ and Var(Y |X(t),
t ∈ T ) = σ 2 (µ) = σ̃ 2 (η) for a function σ̃ 2 (η) = σ 2 (g(η)). In a generalized
functional linear model the distribution of Y would be specified within the
exponential family. For the following (except where explicitly noted), it will be
sufficient to consider the functional quasi-likelihood model
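$$Y_i = g\Bigl(\alpha + \int \beta(t) X_i(t)\, dw(t)\Bigr) + e_i, \qquad i = 1, \ldots, n, \tag{1}$$
where
$$E\bigl(e \mid X(t), t \in T\bigr) = 0, \qquad \operatorname{Var}\bigl(e \mid X(t), t \in T\bigr) = \sigma^2(\mu) = \tilde\sigma^2(\eta).$$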
Note that α is a constant, and the inclusion of an intercept allows us to require
E(X(t)) = 0 for all t.
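As a worked illustration (a standard special case spelled out here, not a formula quoted from the text), consider binomial functional regression with the logit link. Writing $g$ for the inverse link as in model (1), the mean and variance functions take the following form.

```latex
\[
  g(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}, \qquad
  \sigma^2(\mu) = \mu(1 - \mu), \qquad
  \tilde\sigma^2(\eta) = \sigma^2\bigl(g(\eta)\bigr)
    = g(\eta)\bigl(1 - g(\eta)\bigr) = g'(\eta),
\]
\[
  \eta = \alpha + \int \beta(t) X(t)\, dw(t), \qquad
  P\bigl(Y = 1 \mid X(t), t \in T\bigr) = g(\eta).
\]
```

In this canonical case the variance function coincides with the derivative of $g$, so the weight $g'^2/\tilde\sigma^2$ appearing in later quantities reduces to $g'(\eta)$.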
The errors $e_i$ are i.i.d. and we use integration w.r.t. the measure $dw(t)$ to allow
for nonnegative weight functions $v(\cdot)$ such that $v(t) > 0$ for $t \in T$, $v(t) = 0$ for
$t \notin T$ and $dw(t) = v(t)\,dt$; the default choice will be $v(t) = 1_{\{t \in T\}}$. Nonconstant
weight functions might be of interest when the observed predictor processes are
function estimates which may exhibit increased variability in some regions, for
example, toward the boundaries.
The parameter function β(·) is a quantity of central interest in the statistical
analysis and replaces the vector of slopes in a generalized linear model or
estimating equation based model. Setting σ 2 = E{σ̃ 2 (η)}, we then find
$$\operatorname{Var}(e) = \operatorname{Var}\{E(e \mid X(t), t \in T)\} + E\{\operatorname{Var}(e \mid X(t), t \in T)\} = E\{\tilde\sigma^2(\eta)\} = \sigma^2,$$
as well as E(e) = 0.
Let $\rho_j$, $j = 1, 2, \ldots$, be an orthonormal basis of the function space $L^2(dw)$, that
is, $\int_T \rho_j(t)\rho_k(t)\,dw(t) = \delta_{jk}$. Then the predictor process $X(t)$ and the parameter
function $\beta(t)$ can be expanded into
$$X(t) = \sum_{j=1}^{\infty} \varepsilon_j \rho_j(t), \qquad \beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t)$$
[in the $L^2(dw)$ sense] with r.v.'s $\varepsilon_j$ and coefficients $\beta_j$, given by
$\varepsilon_j = \int X(t)\rho_j(t)\,dw(t)$ and $\beta_j = \int \beta(t)\rho_j(t)\,dw(t)$, respectively. We note that $E(\varepsilon_j) = 0$ and
$\sum \beta_j^2 < \infty$. Writing $\sigma_j^2 = E(\varepsilon_j^2)$, we find $\sum \sigma_j^2 = \int E(X^2(t))\,dw(t) < \infty$.
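The expansion coefficients can be approximated numerically when curves are recorded on a grid. The following is a minimal sketch, assuming a Fourier basis on [0, 1], the default weight v(t) = 1 and trapezoidal quadrature; the function names `fourier_basis`, `trapezoid_weights` and `basis_coefficients` are hypothetical and not part of the paper.

```python
import numpy as np


def fourier_basis(t, n_basis):
    """Orthonormal Fourier basis on [0, 1] for v(t) = 1 (an assumed choice of rho_j)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), n_basis)


def trapezoid_weights(t):
    """Quadrature weights of the trapezoidal rule on the grid t."""
    w = np.zeros_like(t)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w


def basis_coefficients(curves, t, basis, v=None):
    """Approximate eps_j = int X(t) rho_j(t) v(t) dt for each row of `curves`."""
    v = np.ones_like(t) if v is None else v
    w = trapezoid_weights(t) * v
    return curves @ (basis * w[:, None])  # shape (n_curves, n_basis)
```

Passing a nonconstant `v` changes the inner product, so the basis would then have to be orthonormalized with respect to that weight; the Fourier basis above is orthonormal only for v = 1.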
778 H.-G. MÜLLER AND U. STADTMÜLLER
where $\tilde\sigma_p$ is defined analogously to $g_p$. Note that $g(U_p) - g_p(U_p)$ and, analogously,
$\tilde\sigma(U_p) - \tilde\sigma_p(U_p)$ are bounded by the error (2). Since it will be assumed
that this error vanishes asymptotically, as $p \to \infty$, we may instead of (3) work
with the approximating sequence of models
$$Y_i^{(p)} = g\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr) + e_i\, \tilde\sigma\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr), \qquad i = 1, \ldots, n, \tag{4}$$
in which the functions $g$ and $\tilde\sigma$ are fixed. We note that the random variables
$Y_i^{(p)}$ and $e_i$, $i = 1, \ldots, n$, form triangular arrays, $Y_{i,n}^{(p_n)}$ and $e_{i,n}$, $i = 1, \ldots, n$, with
In the above developments we have assumed that both the link function g(·)
and the variance function σ 2 (·) are known. Situations where the link and variance
functions are unknown are common, and we can extend our methods to cover the
general case where these functions are smooth, which for fixed p corresponds
to the semiparametric quasi-likelihood regression (SPQR) models considered in
Chiou and Müller (1998, 1999). In the implementation of SPQR one alternates
nonparametric (smoothing) and parametric updating steps, using a reasonable
parametric model for the initialization step. Since the link function is arbitrary,
except for smoothness and monotonicity constraints, we may require that estimates
and parameters satisfy $\|\beta\| = 1$, $\|\hat\beta\| = 1$ for identifiability.
For given $\hat\beta$, $\|\hat\beta\| = 1$, setting $\hat\eta_i = \sum_{j=0}^{p} \hat\beta_j \varepsilon_j^{(i)}$, updates of the link function
estimate $\hat g(\cdot)$ and its first derivative $\hat g'(\cdot)$ are obtained by smoothing (applying any
reasonable scatterplot smoothing method that allows the estimation of derivatives)
the scatterplot (η̂i , Yi )i=1,...,n . Updates for the variance function estimate σ̂ 2 (·)
are obtained by smoothing the scatterplot (µ̂i , ε̂i2 )i=1,...,n , where µ̂i = ĝ(η̂i ) are
current mean response estimates and ε̂i2 = (Yi − µ̂i )2 are current squared residuals.
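The nonparametric updating step needs a smoother that returns both a function estimate and its derivative. The sketch below uses a local linear fit with an Epanechnikov kernel, which is one reasonable choice rather than the authors' implementation; the name `local_linear` and the bandwidth argument `h` are assumptions made for illustration.

```python
import numpy as np


def local_linear(x, y, x0, h):
    """Local linear smoother: returns (fit, derivative) estimates at the points x0.

    At each target point c a weighted least squares line a + b (x - c) is fitted;
    a estimates the regression function at c and b estimates its derivative.
    """
    x0 = np.atleast_1d(x0).astype(float)
    fit = np.empty_like(x0)
    deriv = np.empty_like(x0)
    for m, c in enumerate(x0):
        u = (x - c) / h
        k = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov kernel
        Z = np.column_stack([np.ones_like(x), x - c])
        WZ = Z * k[:, None]
        coef = np.linalg.solve(Z.T @ WZ, WZ.T @ y)
        fit[m], deriv[m] = coef
    return fit, deriv
```

Applied to the scatterplot of current linear predictors against responses, the two outputs play the roles of the link estimate and its derivative; applied to fitted means against squared residuals, the first output gives a variance function estimate. The window must contain enough distinct design points for the 2 × 2 system to be solvable.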
The parametric updating step then proceeds by solving the score equation (5),
using the semiparametric score
$$U(\beta) = \sum_{i=1}^{n} \bigl(Y_i - \hat g(\eta_i)\bigr)\, \hat g'(\eta_i)\, \varepsilon^{(i)} / \hat\sigma^2\bigl(\hat g(\eta_i)\bigr). \tag{9}$$
This leads to the solutions β̂, in analogy to (7). For solutions of the score equations
for both scores (6) and (9), we then obtain the regression function estimates
$$\hat\beta(t) = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j \rho_j(t). \tag{10}$$
Matrices $D$ and $\Gamma$ are modified analogously for the SPQR case, substituting
appropriate estimates.
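For a known link and variance function, the parametric step amounts to solving a score equation of the same form as (9) with $g$ and $\sigma^2$ in place of their estimates. A minimal Fisher scoring sketch under that assumption follows; the function name `fit_quasi_likelihood` and its arguments are hypothetical, and the design matrix `E` is assumed to hold a leading column of ones followed by the truncated coefficients.

```python
import numpy as np


def fit_quasi_likelihood(E, y, g, g_prime, var, n_iter=50, tol=1e-10):
    """Fisher scoring for U(beta) = sum_i (y_i - g(eta_i)) g'(eta_i) eps_i / var(g(eta_i)) = 0.

    E       : (n, p + 1) array, first column ones (intercept), then basis coefficients.
    g       : inverse link, mapping eta to the mean mu.
    g_prime : derivative of g.
    var     : variance function sigma^2(mu).
    """
    beta = np.zeros(E.shape[1])
    for _ in range(n_iter):
        eta = E @ beta
        mu = g(eta)
        s2 = var(mu)
        gp = g_prime(eta)
        U = E.T @ ((y - mu) * gp / s2)           # score vector
        D = E * (gp / np.sqrt(s2))[:, None]      # rows g'(eta_i) eps_i / sigma~(eta_i)
        step = np.linalg.solve(D.T @ D, U)       # scoring update with matrix D^T D
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

In the SPQR setting one would alternate this step with smoothing updates of the link and variance functions and renormalize the coefficient vector for identifiability; that outer loop is not shown here.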
for f, g ∈ L2 (dw), and given an arbitrary orthonormal basis {ρj , j = 1, 2, . . .}, the
Hilbert–Schmidt kernels R can be expressed as
$$R(s, t) = \sum_{k,l} r_{kl}\, \rho_k(s) \rho_l(t)$$
In the following we use the metric dG , since it allows us to derive asymptotic
limits under considerably simpler conditions than for the L2 metric, due to its
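dampening effect on higher order frequencies. For the sequence of $p_n$-truncated
models (1) that we are considering,
$$d_G^2(\hat\beta, \beta) = \iint \bigl(\hat\beta(s) - \beta(s)\bigr)\bigl(\hat\beta(t) - \beta(t)\bigr)\, E\Bigl(\frac{g'^2(\eta)}{\sigma^2(\mu)} X(s) X(t)\Bigr)\, dw(s)\, dw(t)$$
is approximated by $d_{G,p}^2(\hat\beta, \beta) = (\hat\beta - \beta)^T \Gamma (\hat\beta - \beta)$ for each $p$.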
In addition to the basic assumptions in Section 2 and usual conditions on
variance and link functions, we require some technical conditions which restrict
the growth of p = pn and the higher-order moments of the random coefficients εj .
Additional conditions are required for the semiparametric (SPQR) case where
both link and variance functions are assumed unknown and are estimated
nonparametrically.
(M1) The link function $g$ is monotone, invertible and has two continuous bounded
derivatives with $\|g'(\cdot)\| \le c$, $\|g''(\cdot)\| \le c$ for a constant $c \ge 0$. The variance
function $\sigma^2(\cdot)$ has a continuous bounded derivative and there exists a $\delta > 0$
such that $\sigma(\cdot) \ge \delta$.
(M2) The number of predictor terms $p_n$ in the sequence of approximating
$p_n$-truncated models (1) satisfies $p_n \to \infty$ and $p_n n^{-1/4} \to 0$ as $n \to \infty$.
(M3) It holds that [see (8), where the $\xi_{kl}$ are defined]
$$\sum_{k_1,\ldots,k_4=0}^{p_n} E\Bigl(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4} \frac{g'^4(\eta)}{\sigma^4(\mu)}\Bigr)\, \xi_{k_1 k_2}\, \xi_{k_3 k_4} = o(n/p_n^2).$$
THEOREM 4.1. If the basic assumptions and (M1)–(M4) are satisfied, then
We note that the matrix $\Gamma_{p_n}$ in Theorem 4.1 may be replaced by the empirical
version $\tilde\Gamma = \frac{1}{n}(D^T D)$; this is a consequence of (21), (22) and Lemma 7.2 below.
Whenever only the “slope” parameters $\beta_1, \beta_2, \ldots$ but not the intercept parameter
$\alpha = \beta_0$ are of interest, $p_n$ is replaced by $p_n - 1$ and the $(p + 1) \times (p + 1)$ matrix
$\Gamma$ is replaced by the $p \times p$ submatrix of $\Gamma$ obtained by deleting the first row/column.
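To study the convergence of the estimated parameter function $\hat\beta(\cdot)$, we use the
distance $d_G$ and the representation (14) with $R \equiv G$, coupled with the expansion
$$\hat\beta(t) = \sum_{j=1}^{p_n} \hat\beta_{\rho_j^G}\, \rho_j^G(t)$$
of the estimated parameter function $\hat\beta(\cdot)$ in the basis $\{\rho_j^G, j = 1, 2, \ldots\}$, the
eigenbasis of operator $A_G$ with associated eigenvalues $\lambda_j^G$. We obtain
$$\begin{aligned}
d_G^2\bigl(\hat\beta(\cdot), \beta(\cdot)\bigr) &= \iint \bigl(\hat\beta(s) - \beta(s)\bigr)\, G(s, t)\, \bigl(\hat\beta(t) - \beta(t)\bigr)\, dw(s)\, dw(t) \\
&= \sum_{j=1}^{p} \lambda_j^G \bigl(\hat\beta_{\rho_j^G} - \beta_{\rho_j^G}\bigr)^2 + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2 \\
&= (\hat\beta^G - \beta^G)^T\, \Gamma^G\, (\hat\beta^G - \beta^G) + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2 .
\end{aligned}$$
Here
$$\hat\beta^G = \bigl(\hat\beta_{\rho_1^G}, \ldots, \hat\beta_{\rho_p^G}\bigr)^T, \qquad \beta^G = \bigl(\beta_{\rho_1^G}, \ldots, \beta_{\rho_p^G}\bigr)^T,$$
and the diagonal matrix $\Gamma^G$ is obtained by replacing in the definition of the matrix
$\Gamma$ [see (8)] the $\varepsilon_j$ by $\varepsilon_j^G$ that are given by
$$\varepsilon_j^G = \frac{g'(\eta)}{\sigma(\mu)} \int X(t)\, \rho_j^G(t)\, dw(t),$$
with the property
$$E(\varepsilon_j^G \varepsilon_k^G) = \iint G(s, t)\, \rho_j^G(s)\, \rho_k^G(t)\, dw(s)\, dw(t) = \delta_{jk} \lambda_j^G. \tag{16}$$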
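COROLLARY 4.1. If the parameter function $\beta(\cdot)$ has the property that
$$\sum_{j=p+1}^{\infty} E\bigl((\varepsilon_j^G)^2\bigr) \Bigl(\int \beta(t)\, \rho_j^G(t)\, dw(t)\Bigr)^2 = o\Bigl(\frac{\sqrt{p_n}}{n}\Bigr), \tag{17}$$
then
$$\frac{n \iint \bigl(\hat\beta(s) - \beta(s)\bigr)\bigl(\hat\beta(t) - \beta(t)\bigr) G(s, t)\, dw(s)\, dw(t) - (p_n + 1)}{\sqrt{2(p_n + 1)}} \stackrel{d}{\to} N(0, 1)$$
as $n \to \infty$.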
We note that property (17) relates to the rate at which higher-order oscillations,
relative to the oscillations of processes X(t), contribute to the L2 norm of the
parameter function β(·).
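The standardized distance in this limit result can be evaluated directly once coefficient estimates and a weight matrix are available. The sketch below assumes the truncated quadratic form $n(\hat\beta - \beta)^T \Gamma (\hat\beta - \beta)$ as a stand-in for the double integral and a normalization by the standard deviation of a chi-square variable with $p_n + 1$ degrees of freedom; the function names are hypothetical.

```python
import math

import numpy as np


def standardized_distance(beta_hat, beta_0, Gamma, n):
    """(n * (b - b0)^T Gamma (b - b0) - q) / sqrt(2 q), with q = p_n + 1 coefficients."""
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta_0, dtype=float)
    q = diff.size
    stat = n * diff @ np.asarray(Gamma, dtype=float) @ diff
    return (stat - q) / math.sqrt(2 * q)


def normal_upper_pvalue(z):
    """Upper tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Taking `beta_0` as the zero vector gives a statistic for the hypothesis of no regression effect; large positive values speak against the hypothesis, which is why only the upper tail is computed.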
In the case of unknown link and variance functions (SPQR), one applies scatter-
plot smoothing to obtain nonparametric estimates of functions and derivatives and
then obtains the parameter estimates β̂ as solutions of the semiparametric score
equation (9). After iteration, final nonparametric estimates of the link function ĝ,
its derivative $\hat g'$ and of the variance function $\hat\sigma^2$ are obtained. We implement these
nonparametric curve estimators with local linear or quadratic kernel smoothers,
using a bandwidth h in the smoothing step. For the following result we assume
these conditions:
(R1) The regularity conditions (M1)–(M6) and (K1)–(K3) of Chiou and Müller
(1998) hold uniformly for all pn .
(R2) For the bandwidths $h$ of the nonparametric function estimates for link and
variance function, $h \to 0$, $nh^3/\log n \to \infty$ and $n^{-1/2}\sqrt{p}/h^2 \to 0$ as $n \to \infty$.
COROLLARY 4.2. Assume (R1) and (R2) and replace the matrix $\Gamma$ in (15)
by the matrix $\hat\Gamma$ from (18). Then (15) remains valid for the semiparametric
quasi-likelihood (SPQR) estimates β̂ that are obtained as solutions of the
semiparametric estimating equation (9), substituting nonparametrically estimated
link and variance functions.
Extending the arguments used in the proofs of Theorems 1 and 2 in Chiou and
Müller (1998), and assuming additional regularity conditions as described there,
we find for these nonparametric function estimates,
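$$\sup_t \Bigl| \frac{\hat g'^2(t)}{\hat\sigma^2(t)} - \frac{g'^2(t)}{\sigma^2(t)} \Bigr| = O_p\Bigl( \sqrt{\frac{\log n}{n h^3}} + h^2 + \frac{\sqrt{p_n}}{h^2}\, \|\hat\beta - \beta\| \Bigr).$$
Assuming that $h \to 0$, $nh^3/\log n \to \infty$ and $n^{-1/2}\sqrt{p}/h^2 \to 0$, we obtain from the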
boundedness of the design density of the linear predictors away from 0 and ∞
that
$$\frac{\hat g'^2(\hat\eta)}{\hat\sigma^2(\hat\eta)} = \frac{g'^2(\eta)}{\sigma^2(\eta)} + o_p(1),$$
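where the $o_p$-terms are uniform in $p$ following (M2). Therefore, the matrix $\hat\Gamma$
approximates the elements of the matrix
$$\tilde\Gamma = \frac{1}{n}(D^T D) = (\tilde\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \tilde\gamma_{kl} = \frac{1}{n} \sum_{i=1}^{n} \frac{g'^2(\eta_i)}{\sigma^2(\eta_i)}\, \varepsilon_{ki}\, \varepsilon_{li}$$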
uniformly in k, l and pn . This, together with the remarks after Theorem 4.1,
justifies the extension to the semiparametric (SPQR) case with unknown link
and variance functions. This case will be included in the following, unless noted
otherwise.
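The empirical matrix with elements $\tilde\gamma_{kl}$ is a weighted cross-product of the coefficient vectors. A minimal sketch of that computation follows, assuming the coefficients are stored row-wise in an array `E` and that callables for $g'$ and $\tilde\sigma^2$ (or their estimates) are supplied; the name `gamma_tilde` is hypothetical.

```python
import numpy as np


def gamma_tilde(E, eta, g_prime, sigma2_tilde):
    """Empirical matrix (1/n) sum_i w_i eps_i eps_i^T with w_i = g'(eta_i)^2 / sigma~^2(eta_i)."""
    w = g_prime(eta) ** 2 / sigma2_tilde(eta)
    return (E * w[:, None]).T @ E / E.shape[0]
```

In the semiparametric case one would pass the smoothed estimates of the link derivative and variance function in place of the known functions.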
model orders, and, in addition, we found AIC to work well in practice. We discuss
here the consistency of AIC for choosing p in the context of the generalized linear
model with full likelihood and known link function.
Assume the linear predictor vector $\eta_p$ consists of $n$ components $\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{i} \beta_j$,
$i = 1, \ldots, n$, the vector $\hat\eta_p$ of the components $\hat\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{i} \hat\beta_j$
and the vector $\eta$ of the components $\sum_{j=0}^{\infty} \varepsilon_j^{i} \beta_j$. Let $G$ be the antiderivative
of the (inverse) link function $g$ so that $Y$ has the density (in canonical form)
$f_Y(y) = \exp(y\eta + a(y) - G(\eta))$. In particular, $\tilde\sigma^2(\eta) = g'(\eta)$. The deviance is
$$D = -2\ell_n(Y, \hat\eta_p) + 2\ell_n\bigl(Y, g^{-1}(Y)\bigr),$$
with log-likelihood
$$\ell_n(Y, \hat\eta_p) = \sum_{i=1}^{n} Y_i \hat\eta_{i,p} - \sum_{i=1}^{n} G(\hat\eta_{i,p}).$$
we arrive at
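$$\begin{aligned}
E(D) &= n\, E\Bigl(\sum_{k,l=p+1}^{\infty} g'(\eta)\, \varepsilon_k \varepsilon_l \beta_k \beta_l\Bigr) - p\bigl(1 + o(1)\bigr) + E_n \\
&= n\, E\Bigl(\sum_{k,l=p+1}^{\infty} \frac{g'^2(\eta)}{\tilde\sigma^2(\eta)}\, \varepsilon_k \varepsilon_l \beta_k \beta_l\Bigr) - p\bigl(1 + o(1)\bigr) + E_n,
\end{aligned}$$
where $\gamma_{k,l} = E\bigl(\frac{g'^2(\eta)}{\tilde\sigma^2(\eta)} \varepsilon_k \varepsilon_l\bigr)$. We obtain
$E\bigl(d(\hat\beta(\cdot), \beta(\cdot))\bigr) = \frac{p}{n}(1 + o(1)) + \sum_{k,j=p+1}^{\infty} \gamma_{j,k} \beta_j \beta_k (1 + o(1))$.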
This analysis shows that the target function d(β̂(·), β(·)) to be minimized is
asymptotically close to E(D/n) + 2p/n. This suggests that we are in the situation
considered by Shibata (1981) for sequences of linear models with normal residuals
and by Shao (1997) for the more general case. While the closeness of the target
function and AIC is suggestive, a rigorous proof that the order pA selected by
AIC and the order pd that minimizes the target function satisfy pd /pA → 1
in probability as n → ∞ or a stronger consistency or efficiency result requires
additional analysis that is not provided here. One difficulty is that the usual
normality assumption is not satisfied as one operates in an exponential family or
quasi-likelihood setting.
In practice, we implement AIC and the alternative Bayesian information
criterion BIC by obtaining first the deviance or quasi-deviance D(p), dependent
on the model order p. This is straightforward in the quasi-likelihood or maximum
likelihood case with known link function, and requires integrating the score
function to obtain the analogue of the log-likelihood in the SPQR case with
unknown link function. Once the deviance is obtained, we choose the minimizing
argument of
where P is the penalty term, chosen as 2p for the AIC and as p log n for the BIC.
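The order selection step can be sketched as follows, assuming the criterion is the deviance plus the penalty, with the penalty 2p for AIC and p log n for BIC as stated above. The function name `select_order` and the dictionary layout of the deviances are hypothetical choices for illustration.

```python
import math


def select_order(deviances, n, criterion="aic"):
    """Pick the model order p minimizing D(p) + P, with P = 2p (AIC) or p log n (BIC).

    deviances : dict mapping candidate order p to its deviance or quasi-deviance D(p).
    n         : sample size, used only for the BIC penalty.
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def penalty(p):
        return 2 * p if criterion == "aic" else p * math.log(n)

    scores = {p: d + penalty(p) for p, d in deviances.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Because the BIC penalty grows with log n, it tends to select orders no larger than those chosen by AIC once n exceeds about eight observations.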
Several alternative selectors that we studied were found to be less stable and
more computer intensive in simulations. These included minimization of the leave-
one-out prediction error, of the leave-one-out misclassification rate via cross-
validation [Rice and Silverman (1991)], and of the relative difference between the
Pearson criterion and the deviance [Chiou and Müller (1998)].
This test was implemented as a one-sided test at the 5% level, that is, rejection was
recorded whenever $|T| > \Phi^{-1}(0.95)$. The average rejection rate was determined
over 500 Monte Carlo runs, for sample sizes n = 50, 200, as a function of δ, 0 ≤
δ ≤ 2, where the underlying parameter vector was as described in the preceding
paragraph, multiplied by δ, and is given by (δ, δ, δ/2, δ/3). The resulting power
functions are shown in Figure 1 and demonstrate that sample size plays a critical
role.
To demonstrate the usefulness of the SPQR approach with automatic link
estimation, we calculated the means of the estimated regression parameter
functions β̂(·) over 50 Monte Carlo runs for the following cases: In each run,
1000 samples were generated with either the logit or c-loglog link function and the
corresponding functions β(·) were estimated in three different ways: Assuming
a logit link, a c-loglog link and assuming no link, using the SPQR method.
The resulting mean function estimates can be seen in Figure 2. One finds that
misspecification of the link function can lead to serious problems with these
estimates and that the flexibility of the SPQR approach entails a clear advantage
over methods where a link function must be specified a priori.
FIG. 1. Empirical power functions for the significance test for a functional logistic regression effect
at the 5% level. Based on 500 simulations, for sample sizes 50 (dashed) and 200 (solid), with p = 3.
FIG. 2. Average estimates of the regression parameter function β(·) obtained over 50 Monte Carlo
runs from data generated either with the logit link (left panel) or with the c-loglog link (right
panel). Each panel displays the target function (solid), and estimates obtained assuming the logit
link (dashed), the c-loglog link (dash-dot) and the SPQR method incorporating nonparametric link
function estimation (dotted).
obtain the predictor processes Xi (t), t ∈ [0, 30], i = 1, . . . , 534. A fly is classified
as long-lived if the remaining lifetime past 30 days is 14 days or longer, otherwise
as short-lived. Of the n = 534 flies, 256 were short-lived and 278 were long-lived.
We apply the algorithm as described in the previous section, choosing the logit
link, fitting a logistic functional regression.
Plotting the reproductive trajectories for the long-lived and short-lived flies
separately (upper panels of Figure 3), no clear visual differences between the two
groups can be discerned. Failure to visually detect differences between the two
groups could result from overcrowding of these plots with too many curves, but
when displaying fewer curves (lower panels of Figure 3), this remains the same.
Therefore, the discrimination task at hand is difficult, as at best subtle and hard to
discern differences exist between the trajectories of the two groups.
FIG. 4. Akaike information criterion (AIC) as a function of the number of model components p for
the medfly data.
We use the Akaike information criterion (AIC) for choosing the number of
model components. As can be seen from Figure 4, where the AIC criterion is
shown in dependency on the model order p, this leads to the choice p = 6. The
cross-validation prediction error criterion $PE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat p_i^{(-i)})^2$, where $\hat p_i^{(-i)}$
is the leave-one-out estimate for $p_i$, supports a similar choice. The leave-out
misclassification rate estimates are, for the group of long-lived flies, 37% with
logit link and 35% for the nonparametric SPQR link, while for the group of short-
lived flies these are 47% for logit and 48% for SPQR, demonstrating the difficulty
of classifying short-lived flies correctly.
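The leave-one-out prediction error criterion PE described above can be computed with a generic refitting loop. The sketch below is written against two user-supplied callables, `fit` and `predict`, which are assumptions standing in for whichever functional regression fit is used; the name `loo_prediction_error` is hypothetical.

```python
import numpy as np


def loo_prediction_error(E, y, fit, predict):
    """PE = (1/n) sum_i (y_i - p_hat_i^(-i))^2, refitting with observation i removed.

    fit(E_train, y_train) -> model
    predict(model, E_new) -> array of predicted means for the rows of E_new
    """
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(E[keep], y[keep])
        errs[i] = (y[i] - predict(model, E[i:i + 1])[0]) ** 2
    return errs.mean()
```

A leave-one-out misclassification rate follows from the same loop by thresholding the predicted probability and comparing with the observed group label instead of squaring the difference.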
The fitted regression parameter functions β̂(·) for both logistic (logit link)
and SPQR (nonparametric link) functional regression, along with simultaneous
confidence bands (19), are shown in Figure 5; we find that the estimate with
nonparametric link is quite close to the estimate employing the logistic link,
thus providing some support for the choice of the logistic link in this case. The
asymptotic confidence bands allow us to conclude that the parameter function has a steep
rise at the right end towards age 30 days, and that the null hypothesis of no effect
would be rejected.
The shape of the parameter function β̂(·) highlights periods of egg-laying
that are associated with increased longevity. We note that under the logit link
FIG. 5. The regression parameter function estimates β̂(·) (19) (solid) for the medfly classification
problem, with simultaneous confidence bands (5) (dashed). Left panel: Logit link. Right panel:
Nonparametric link, using the SPQR algorithm.
FIG. 6. Logit link (dashed) and nonparametric link function (solid) obtained via the SPQR
algorithm, with overlaid group indicators, versus level of linear predictor η.
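that
$$D^T D = \sum_{i=1}^{n} g'^2(\eta_i)\, \varepsilon^{(i)} \varepsilon^{(i)T} / \tilde\sigma^2(\eta_i),$$
we obtain
$$\begin{aligned}
J_\beta &= \sum_{i=1}^{n} \frac{\partial}{\partial \eta_i} \Bigl[ g'(\eta_i)\, \varepsilon^{(i)} \bigl(Y_i - g(\eta_i)\bigr) / \tilde\sigma^2(\eta_i) \Bigr] \cdot \frac{\partial \eta_i}{\partial \beta^T} \\
&= -D^T D + \sum_{i=1}^{n} \bigl(Y_i - g(\eta_i)\bigr)\, \varepsilon^{(i)} \varepsilon^{(i)T} \Bigl[ \frac{g''(\eta_i)}{\tilde\sigma^2(\eta_i)} - \frac{g'(\eta_i)\, (\tilde\sigma^2)'(\eta_i)}{\tilde\sigma^4(\eta_i)} \Bigr] \\
&= -D^T D + R, \quad \text{say.}
\end{aligned}$$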
We aim to show that the remainder term R can eventually be neglected. By a Taylor
expansion, for a β̃ between β and β̂,
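$$\begin{aligned}
U(\beta) &= U(\hat\beta) - J_{\tilde\beta}(\hat\beta - \beta) = -J_{\tilde\beta}(\hat\beta - \beta) \\
&= -\bigl[ D^T D(\hat\beta - \beta) + (J_{\tilde\beta} - J_\beta)(\hat\beta - \beta) + (J_\beta - D^T D)(\hat\beta - \beta) \bigr].
\end{aligned}$$
Denoting the $q \times q$ identity matrix by $I_q$, this leads to
$$\begin{aligned}
\sqrt{n}(\hat\beta - \beta) &= \sqrt{n}\, \bigl[ D^T D + (J_{\tilde\beta} - J_\beta) + (J_\beta - D^T D) \bigr]^{-1} U(\beta) \\
&= \Bigl[ I_{p_n+1} + \Bigl(\frac{D^T D}{n}\Bigr)^{-1} \frac{J_{\tilde\beta} - J_\beta}{n} + \Bigl(\frac{D^T D}{n}\Bigr)^{-1} \frac{J_\beta - D^T D}{n} \Bigr]^{-1} \times \Bigl(\frac{D^T D}{n}\Bigr)^{-1} \frac{U(\beta)}{\sqrt{n}}.
\end{aligned}$$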
Using the matrix norm $\|M\|_2 = (\sum m_{kl}^2)^{1/2}$, we find (see Appendix for the
proof) the following:
LEMMA 7.1. As $n \to \infty$,
$$\Bigl\| \sqrt{n}(\hat\beta - \beta) - \Bigl(\frac{D^T D}{n}\Bigr)^{-1} \frac{U(\beta)}{\sqrt{n}} \Bigr\|_2 = o_p(1).$$
Note that conditions (M3 ) and (M4 ) are weaker than the corresponding
conditions (M2) and (M3) and, therefore, will be satisfied under the basic
assumptions. A consequence of Lemma 7.2 is
T
X n − Ip +1 Xn ≤ |Xn XT |n − Ip +1
n n n n 2
√ √
= Op (pn )op 1/ pn = op pn .
Therefore, $G_n/\sqrt{p_n} \stackrel{p}{\to} 0$. The bound for the term $H_n$ is completely analogous.
Since we will show in Proposition 7.1 below that $(X_n^T X_n - (p_n + 1))/\sqrt{2p_n} \stackrel{d}{\to} N(0, 1)$
[this implies $|X_n^T X_n| = O_p(p_n)$], it follows that $G_n + H_n = o_p(F_n)$ so
that these terms can indeed be neglected. The proof of Theorem 4.1 will therefore
be complete if we show the following:
PROPOSITION 7.1. As $n \to \infty$, $(X_n^T X_n - (p_n + 1))/\sqrt{2p_n} \stackrel{d}{\to} N(0, 1)$.
and
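$$\sum_{k=0}^{p_n} \xi_{k t_1}^{(1/2)}\, \xi_{k t_2}^{(1/2)} = \xi_{t_1 t_2}$$
to obtain
$$\begin{aligned}
X_n^T X_n &= \frac{1}{n} \sum_{k=0}^{p_n} \sum_{\nu_1,\nu_2=1}^{n} \sum_{t_1,t_2=0}^{p_n} \frac{g'(\eta_{\nu_1})}{\tilde\sigma(\eta_{\nu_1})} \frac{g'(\eta_{\nu_2})}{\tilde\sigma(\eta_{\nu_2})}\, e_{\nu_1} e_{\nu_2}\, \varepsilon_{t_1}^{(\nu_1)} \varepsilon_{t_2}^{(\nu_2)}\, \xi_{k t_1}^{(1/2)} \xi_{k t_2}^{(1/2)} \\
&= \frac{1}{n} \sum_{\nu=1}^{n} \sum_{t_1,t_2=0}^{p_n} e_\nu^2\, \varepsilon_{t_1}^{(\nu)} \varepsilon_{t_2}^{(\nu)}\, \frac{g'^2(\eta_\nu)}{\tilde\sigma^2(\eta_\nu)}\, \xi_{t_1 t_2} \\
&\quad + \frac{1}{n} \sum_{\nu_1 \ne \nu_2 = 1}^{n} \frac{g'(\eta_{\nu_1})}{\tilde\sigma(\eta_{\nu_1})} \frac{g'(\eta_{\nu_2})}{\tilde\sigma(\eta_{\nu_2})}\, e_{\nu_1} e_{\nu_2} \sum_{t_1,t_2=0}^{p_n} \varepsilon_{t_1}^{(\nu_1)} \varepsilon_{t_2}^{(\nu_2)}\, \xi_{t_1 t_2} \\
&= A_n + B_n, \quad \text{say.}
\end{aligned}$$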
We will analyze these terms in turn and utilize the independence of the random
variables associated with observations (Xi , Yi ) for different values of i, the
independence of the ei of all ε’s, and E(e ) = 0, E(e2 ) = 1.
we may write
$$B_n = \frac{2}{n} \sum_{j=1}^{n} W_{nj}.$$
Note that $\mathcal{F}_{n,j} \subset \mathcal{F}_{n+1,j}$. Lemma 7.4 implies that the r.v.s $\widetilde W_{nj} = \frac{2}{n\sqrt{2p_n}}\, W_{nj}$
also form a triangular array of martingale difference sequences. According to the
central limit theorem for martingale difference sequences [Brown (1971); see also
Hall and Heyde (1980), Theorem 3.2 and corollaries], sufficient conditions for
the asymptotic normality $\sum_{j=1}^{n} \widetilde W_{nj} \stackrel{d}{\to} N(0, 1)$ are the conditional normalization
condition and the conditional Lyapunov condition. The following two lemmas
which are proved in the Appendix demonstrate that these sufficient conditions are
satisfied. We note that martingale methods have also been used by Ghorai (1980)
for the asymptotic distribution of an error measure for orthogonal series density
estimates.
A consequence of Lemmas 7.5 and 7.6 is then $B_n/\sqrt{2p_n} \stackrel{d}{\to} N(0, 1)$. Together
with Lemma 7.4, this implies Proposition 7.1 and, thus, Theorem 4.1.
APPENDIX
We provide here the main arguments of the proofs of several corollaries and of
the auxiliary results which were used in Section 7 for the proof of Theorem 4.1.
follows by observing that the semiparametric estimate β̂ has the same asymptotic
behavior as the parametric estimate, except for some minor modifications due to
the identifiability constraint.
and, therefore,
2
E n−1 − Ipn +1 2
pn
1 n pn
2
(1/2) (1/2) g (ην )
=E ξkj1 ξm1 l 2 1 εj(ν1 1 ) εm
(ν1 )
1
− δkl
k,l=0
n ν1 =1 j1 ,m1 =0
σ (µ ν1 )
1 n p 2
(1/2) (1/2) g (ην2 ) (ν2 ) (ν2 )
× ξkj2 ξm2 l ε ε − δkl
n ν =1 j ,m =0 σ̃ 2 (ην2 ) j2 m2
2 2 2
pn + 1
=O + o(1/pn2 ),
n
$$E(A_n^2) = o(p_n) + (p_n + 1)^2 - \frac{(p_n + 1)^2}{n}.$$
We find that 0 ≤ Var(An ) = o(pn ). This concludes the proof.
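PROOF OF LEMMA 7.4. All random variables with upper index $j$ are
independent of $\mathcal{F}_{n,j-1}$. Hence, we obtain
$$E(W_{nj} \mid \mathcal{F}_{n,j-1}) = \sum_{i=1}^{j-1} \frac{g'(\eta_i)}{\tilde\sigma(\eta_i)}\, e_i \sum_{t_1,t_2=0}^{p_n} \varepsilon_{t_1}^{(i)}\, \xi_{t_1 t_2}\, E\Bigl( e_j \frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\, \varepsilon_{t_2}^{(j)} \Bigm| \mathcal{F}_{n,j-1} \Bigr) = 0$$
since
$$E\Bigl( e_j \frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\, \varepsilon_{t_2}^{(j)} \Bigm| \mathcal{F}_{n,j-1} \Bigr) = E(e_j)\, E\Bigl( \frac{g'(\eta_j)}{\tilde\sigma(\eta_j)}\, \varepsilon_{t_2}^{(j)} \Bigr) = 0.$$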
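$$\begin{aligned}
{} &= \sum_{i_1,i_2=1}^{j-1} \frac{g'(\eta_{i_1})\, g'(\eta_{i_2})}{\tilde\sigma(\eta_{i_1})\, \tilde\sigma(\eta_{i_2})}\, e_{i_1} e_{i_2} \sum_{t_1,\ldots,t_4=0}^{p_n} \varepsilon_{t_1}^{(i_1)} \varepsilon_{t_3}^{(i_2)}\, \xi_{t_1 t_2} \xi_{t_3 t_4} \\
&\qquad \times E\Bigl( \varepsilon_{t_2}^{(j)} \varepsilon_{t_4}^{(j)}\, \frac{g'^2(\eta_j)}{\tilde\sigma^2(\eta_j)}\, e_j^2 \Bigm| \mathcal{F}_{n,j-1} \Bigr) \\
&= \sum_{i_1,i_2=1}^{j-1} \frac{g'(\eta_{i_1})\, g'(\eta_{i_2})}{\tilde\sigma(\eta_{i_1})\, \tilde\sigma(\eta_{i_2})}\, e_{i_1} e_{i_2} \sum_{t_1,t_3=0}^{p_n} \varepsilon_{t_1}^{(i_1)} \varepsilon_{t_3}^{(i_2)}\, \xi_{t_3 t_1}
\end{aligned}$$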
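and obtain
$$E\bigl( E(W_{nj}^2 \mid \mathcal{F}_{n,j-1}) \bigr) = \sum_{i=1}^{j-1} \sum_{t_1,t_3}^{p_n} E\Bigl( \frac{g'^2(\eta)}{\tilde\sigma^2(\eta)}\, \varepsilon_{t_1} \varepsilon_{t_3} \Bigr)\, \xi_{t_3 t_1} = (j - 1)(p_n + 1).$$
This implies
$$E\Bigl( \sum_{j=1}^{n} E(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}) \Bigr) \to 1, \qquad n \to \infty.$$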
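We are done if we can show $\operatorname{Var}\bigl( \sum_{j=1}^{n} \{E(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1})\} \bigr) \to 0$. In order to obtain
the second moments, we first note
$$\begin{aligned}
&E\{ E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})\, E(W_{nk}^2 \mid \mathcal{F}_{n,k-1}) \} \\
&\quad = E\Bigl( \sum_{i_1,i_2=1}^{j-1} \sum_{i_3,i_4=1}^{k-1} \frac{g'(\eta_{i_1})\, g'(\eta_{i_2})\, g'(\eta_{i_3})\, g'(\eta_{i_4})}{\tilde\sigma(\eta_{i_1})\, \tilde\sigma(\eta_{i_2})\, \tilde\sigma(\eta_{i_3})\, \tilde\sigma(\eta_{i_4})}\, e_{i_1} e_{i_2} e_{i_3} e_{i_4} \\
&\qquad\quad \times \sum_{t_1,\ldots,t_4=0}^{p_n} \varepsilon_{t_1}^{(i_1)} \varepsilon_{t_2}^{(i_2)} \varepsilon_{t_3}^{(i_3)} \varepsilon_{t_4}^{(i_4)}\, \xi_{t_1 t_2} \xi_{t_3 t_4} \Bigr) \\
&\quad = \mu_4 (k - 1) \sum_{t_1,\ldots,t_4=0}^{p_n} E\Bigl( \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \cdot \varepsilon_{t_1} \varepsilon_{t_2} \varepsilon_{t_3} \varepsilon_{t_4} \Bigr)\, \xi_{t_1 t_2} \xi_{t_3 t_4}
\end{aligned}$$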
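$$\begin{aligned}
{} &= \sum_{j=1}^{n} E\{ E(W_{nj}^2 \mid \mathcal{F}_{n,j-1}) \}^2 + 2 \sum_{1 \le k < j \le n} E\{ E(W_{nj}^2 \mid \mathcal{F}_{n,j-1})\, E(W_{nk}^2 \mid \mathcal{F}_{n,k-1}) \} \\
&= \sum_{j=1}^{n} \Bigl[ \mu_4 (j - 1) \sum_{t_1,\ldots,t_4=0}^{p_n} E\Bigl( \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \cdot \varepsilon_{t_1} \cdots \varepsilon_{t_4} \Bigr)\, \xi_{t_1 t_2} \xi_{t_3 t_4} \\
&\qquad\quad + (j - 1)^2 (p_n + 1)^2 + 2 (j - 1)^2 (p_n + 1) \Bigr] \\
&\quad + 2 \sum_{j=1}^{n} \sum_{k=1}^{j-1} \Bigl[ (k - 1) \mu_4 \sum_{t_1,\ldots,t_4=0}^{p_n} E\Bigl( \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \cdot \varepsilon_{t_1} \cdots \varepsilon_{t_4} \Bigr)\, \xi_{t_1 t_2} \xi_{t_3 t_4} \\
&\qquad\quad + (j - 1)(k - 1)(p_n + 1)^2 + 2 (k - 1)^2 (p_n + 1) \Bigr] \\
&= O(n^3) \sum_{t_1,\ldots,t_4=0}^{p_n} E\Bigl( \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \cdot \varepsilon_{t_1} \cdots \varepsilon_{t_4} \Bigr)\, \xi_{t_1 t_2} \xi_{t_3 t_4} \\
&\quad + \frac{n^4}{4} (p_n + 1)^2 \bigl(1 + o(1)\bigr) + \frac{n^4}{6} (p_n + 1) \bigl(1 + o(1)\bigr).
\end{aligned}$$
Applying (M2), we infer
$$E\Bigl( \sum_{j=1}^{n} \{ E(\widetilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}) \} \Bigr)^2 = 1 + o(1)$$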
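PROOF OF LEMMA 7.6. Combining detailed calculations of $E(W_{nj}^4 \mid \mathcal{F}_{n,j-1})$
and $E(E(W_{nj}^4 \mid \mathcal{F}_{n,j-1}))$ with (M2) and (M3) leads to
$$\begin{aligned}
\sum_{j=1}^{n} E(\widetilde W_{nj}^4) &= O\Bigl( \frac{1}{n^4 p_n^2} \Bigr) \Bigl[ O(n^2) \sum_{t_1,\ldots,t_8=0}^{p_n} E\Bigl( \varepsilon_{t_1} \varepsilon_{t_3} \varepsilon_{t_5} \varepsilon_{t_7}\, \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \Bigr) \\
&\qquad\quad \times E\Bigl( \varepsilon_{t_2} \varepsilon_{t_4} \varepsilon_{t_6} \varepsilon_{t_8}\, \frac{g'^4(\eta)}{\tilde\sigma^4(\eta)} \Bigr)\, \xi_{t_1 t_2} \xi_{t_3 t_4} \xi_{t_5 t_6} \xi_{t_7 t_8} \\
&\qquad + O(n^3) \sum_{t_3,t_4,t_7,t_8=0}^{p_n} \xi_{t_3 t_4} \xi_{t_7 t_8}\, E\Bigl( \varepsilon_{t_3} \varepsilon_{t_4} \varepsilon_{t_7} \varepsilon_{t_8}\, \frac{g'^4(\eta)}{\sigma^4(\mu)} \Bigr) \Bigr] \\
&= o(1),
\end{aligned}$$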
completing the proof.
REFERENCES
ALTER, O., BROWN, P. O. and BOTSTEIN, D. (2000). Singular value decomposition for genome-wide
expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97 10101–10106.
ASH, R. B. and GARDNER, M. F. (1975). Topics in Stochastic Processes. Academic Press, New York.
BROWN, B. M. (1971). Martingale central limit theorems. Ann. Math. Statist. 42 59–66.
BRUMBACK, B. A. and RICE, J. A. (1998). Smoothing spline models for the analysis of nested and
crossed samples of curves (with discussion). J. Amer. Statist. Assoc. 93 961–994.
CAPRA, W. B. and MÜLLER, H.-G. (1997). An accelerated time model for response curves. J. Amer.
Statist. Assoc. 92 72–83.
CARDOT, H., FERRATY, F. and SARDA, P. (1999). Functional linear model. Statist. Probab. Lett. 45
11–22.
CAREY, J. R., LIEDO, P., MÜLLER, H.-G., WANG, J.-L. and CHIOU, J.-M. (1998a). Relationship of
age patterns of fecundity to mortality, longevity and lifetime reproduction in a large cohort
of Mediterranean fruit fly females. J. Gerontology: Biological Sciences 53A B245–B251.
CAREY, J. R., LIEDO, P., MÜLLER, H.-G., WANG, J.-L. and VAUPEL, J. W. (1998b). Dual modes of
aging in Mediterranean fruit fly females. Science 281 996–998.
CASTRO, P. E., LAWTON, W. H. and SYLVESTRE, E. A. (1986). Principal modes of variation for
processes with continuous sample curves. Technometrics 28 329–337.
CHIOU, J.-M. and MÜLLER, H.-G. (1998). Quasi-likelihood regression with unknown link and
variance functions. J. Amer. Statist. Assoc. 93 1376–1387.
CHIOU, J.-M. and MÜLLER, H.-G. (1999). Nonparametric quasi-likelihood. Ann. Statist. 27 36–64.
CHIOU, J.-M., MÜLLER, H.-G. and WANG, J.-L. (2003). Functional quasi-likelihood regression
models with smooth random effects. J. R. Stat. Soc. Ser. B Stat. Methodol. 65 405–423.
CONWAY, J. B. (1990). A Course in Functional Analysis, 2nd ed. Springer, New York.
DUNFORD, N. and SCHWARTZ, J. T. (1963). Linear Operators. II. Spectral Theory. Wiley, New
York.
FAN, J. and LIN, S.-K. (1998). Test of significance when the data are curves. J. Amer. Statist. Assoc.
93 1007–1021.
FAN, J. and ZHANG, J.-T. (2000). Two-step estimation of functional linear models with application
to longitudinal data. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 303–322.
FARAWAY, J. J. (1997). Regression analysis for a functional response. Technometrics 39 254–261.
GHORAI, J. (1980). Asymptotic normality of a quadratic measure of orthogonal series type density
estimate. Ann. Inst. Statist. Math. 32 341–350.
HALL, P. and HEYDE, C. (1980). Martingale Limit Theory and Its Applications. Academic Press,
New York.
HALL, P., POSKITT, D. S. and PRESNELL, B. (2001). A functional data-analytic approach to signal
discrimination. Technometrics 43 1–9.
HALL, P., REIMANN, J. and RICE, J. (2000). Nonparametric estimation of a periodic function.
Biometrika 87 545–557.
JAMES, G. M. (2002). Generalized linear models with functional predictors. J. R. Stat. Soc. Ser. B
Stat. Methodol. 64 411–432.
MCCULLAGH, P. (1983). Quasi-likelihood functions. Ann. Statist. 11 59–67.
MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and
Hall, London.
MÜLLER, H.-G., CAREY, J. R., WU, D., LIEDO, P. and VAUPEL, J. W. (2001). Reproductive potential
predicts longevity of female Mediterranean fruit flies. Proc. R. Soc. Lond. Ser. B Biol. Sci.
268 445–450.
PARTRIDGE, L. and HARVEY, P. H. (1985). Costs of reproduction. Nature 316 20–21.
SHAO, J. (1997). An asymptotic theory for linear model selection (with discussion). Statist. Sinica 7
221–264.
SHIBATA, R. (1981). An optimal selection of regression variables. Biometrika 68 45–54.
RAMSAY, J. O. and SILVERMAN, B. W. (1997). Functional Data Analysis. Springer, New York.
RICE, J. A. and SILVERMAN, B. W. (1991). Estimating the mean and covariance structure
nonparametrically when the data are curves. J. Roy. Statist. Soc. Ser. B 53 233–243.
STANISWALIS, J. G. and LEE, J. J. (1998). Nonparametric regression analysis of longitudinal data.
J. Amer. Statist. Assoc. 93 1403–1418.
WANG, J.-L., MÜLLER, H.-G., CAPRA, W. B. and CAREY, J. R. (1994). Rates of mortality in
populations of Caenorhabditis elegans. Science 266 827–828.
WEDDERBURN, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the
Gauss–Newton method. Biometrika 61 439–447.