Functional Estimation For Density, Regression Models and Processes (Odile Pons)
Odile Pons
INRA, France
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN-13 978-981-4343-73-2
ISBN-10 981-4343-73-0
Printed in Singapore.
Contents
Preface
1. Introduction
1.1 Estimation of a density
1.2 Estimation of a regression curve
1.3 Estimation of functionals of processes
1.4 Content of the book
Notations
Bibliography
Index
Chapter 1
Introduction
The aim of this book is to present within a single approach estimators for functions defining probability models: density, intensity of point processes, regression curves and diffusion processes. The observations may be continuous for processes or discretized for samples of densities, regressions and time series, with sequential observations over time. The regular sampling scheme of the time series is not common in regression models, where stochastic explanatory variables X are recorded together with a response variable Y according to a random sampling of independent and identically distributed observations $(X_i, Y_i)_{i\leq n}$. The discretization of a continuous diffusion process yields a regression model, and the approximation error can be made sufficiently small to extend the estimators of the regression model to the drift and variance functions of a diffusion process. The functions defining the probability models are not specified by parameters and they are estimated in functional spaces.
This chapter is a review of well-known estimators for density and regression functions and a presentation of models for continuous or discrete processes where nonparametric estimators are defined.
The empirical distribution function and the histogram are stepwise estimators; smooth estimators were later defined for regular functions. Several kinds of smoothing methods have been developed. The first one was the projection of functions onto regular and orthonormal bases of functions $(\phi_k)_{k\geq 0}$. The density of the observations is approximated by a countable projection on the basis, $f_n(x) = \sum_{k=1}^{K_n} a_k\phi_k(x)$, where $K_n$ tends to infinity and the coefficients are defined by the scalar product specific to the orthonormality of the basis, with
$$\int \phi_k^2(x)\mu_\phi(x)\,dx = 1, \qquad \int \phi_k(x)\phi_l(x)\mu_\phi(x)\,dx = 0, \quad \text{for all } k\neq l,$$
then $a_k = \langle f, \phi_k\rangle = \int f(x)\phi_k(x)\mu_\phi(x)\,dx$. The coefficients are estimated by integrating the basis with respect to the empirical distribution of the variable X,
$$\widehat a_{kn} = \int \phi_k(x)\mu_\phi(x)\,d\widehat F_n(x),$$
which yields an estimator of the density $\widehat f_n(x) = \sum_{k=1}^{K_n}\widehat a_{kn}\phi_k(x)$. The same principle applies to other stepwise estimators of functions. Well-known bases of $L^2$-orthogonal functions are
(ii) Hermite's polynomials of degree n, defined by the derivatives
$$H_n(x) = (-1)^n e^{x^2/2}\frac{d^n}{dx^n}\big(e^{-x^2/2}\big), \quad n\geq 1;$$
they satisfy the recurrence equation $H_{n+1}(x) = xH_n(x) - H_n'(x)$, with $H_0(x) = 1$. They are orthogonal with the scalar product
$$\langle f, g\rangle = \int_{-\infty}^{+\infty} f(x)g(x)e^{-x^2}\,dx$$
and their norm is $\|H_n\| = n!\sqrt{2\pi}$;
(iii) Laguerre's polynomials, defined by the derivatives
$$L_n(x) = \frac{e^x}{n!}\frac{d^n}{dx^n}\big(e^{-x}x^n\big), \quad n\geq 1,$$
and $L_0(x) = 1$. They satisfy the recurrence equation $L_{n+1}(x) = (2n+1-x)L_n(x) - n^2L_{n-1}(x)$ and they are orthogonal with the scalar product
$$\langle f, g\rangle = \int_{-\infty}^{+\infty} f(x)g(x)e^{-2x}\,dx.$$
R +∞
norm kfbn − fn k2 = { −∞
E(fbn − fn )2 (x)µφ (x) dx}1/2 if n−1 Kn tends to
zero, so that
Z +∞ Kn
X
kfbn − fn k22 = E akn − ak )2 φ2k (x)µφ (x) dx
(b
−∞ i=1
Kn
X
= akn − ak )2
E(b
i=1
Z
2
akn − ak ) = E{
E(b φk (x)µφ (x) d(Fbn − F )(x)}2
Z
= n−1 φk (x)φk (y)µφ (x)µφ (y) dC(x, y)
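For concreteness, a minimal sketch of this projection estimator follows, using the cosine basis on [0, 1] (an assumed choice, not one of the bases listed above) with $\mu_\phi \equiv 1$; the sample, truncation level $K_n$ and grid are illustrative.

```python
import numpy as np

def projection_density(x_grid, sample, K_n):
    """Orthogonal-series density estimator on [0, 1] with the cosine basis
    phi_0 = 1, phi_k(x) = sqrt(2) cos(k pi x), orthonormal for mu_phi = 1."""
    f_hat = np.ones_like(x_grid)            # contribution of phi_0, since a_0 = 1
    for k in range(1, K_n + 1):
        phi_k = lambda u: np.sqrt(2.0) * np.cos(k * np.pi * u)
        a_kn = phi_k(sample).mean()         # hat a_kn = integral of phi_k dF_n
        f_hat += a_kn * phi_k(x_grid)
    return f_hat

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 4.0, size=500)       # a density supported on [0, 1]
x = np.linspace(0.0, 1.0, 201)
f_hat = projection_density(x, sample, K_n=8)   # K_n grows slowly with n
```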
The regularity of the kernel K entails the continuity of the estimator $\widehat f_{X,n,h}$. All results established for a real valued variable X apply straightforwardly to a variable defined in a metric space.
Deheuvels (1977) presented a review of nonparametric methods of estimation for the density and compared the mean squared error of several kernel estimators, including the classical polynomial kernels which do not satisfy the above conditions; some of them diverge and their orders differ from those of the density kernels. Classical kernels are the normal density, with support $\mathbb{R}$, and densities with a compact support, such as the Bartlett-Epanechnikov kernel with support [−1, 1], $K(u) = 0.75(1-u^2)1_{\{|u|\leq 1\}}$; other kernels are presented in Parzen (1962), Prakasa Rao (1983), etc.
With a sequence $h_n$ converging to zero at a convenient rate, the estimator $\widehat f_{X,n,h}$ is biased, with an asymptotically negligible bias depending on the regularity properties of the density. Constants depending on moments of the kernel function also appear in the bias function $E\widehat f_{X,n,h} - f_X$ and the moments $E\widehat f_{X,n,h}^k$ of the estimated density. The variance does not depend on the class of the density. The weak and strong uniform consistency of the kernel density estimator and its derivatives were proved by Silverman (1978) under derivability conditions for the density. Their performances are measured by several error criteria corresponding to the estimation of the density at a single point or over its whole support. The mean squared error criterion is common for that purpose and it splits into a variance and a squared bias term.
The first order approximations of the MSE and the MISE as the sample size increases are the AMSE and the AMISE. Let $(h_n)_n$ be a bandwidth sequence converging to zero such that $nh$ tends to infinity, and let K be a kernel satisfying $m_{2K} = \int x^2K(x)\,dx < \infty$ and $\kappa_2 = \int K^2(x)\,dx < \infty$. Consider a variable X such that $EX^2$ is finite and the density $f_X$ is twice continuously differentiable; then
$$AMSE(\widehat f_{X,n,h}; x) = (nh)^{-1}f_X(x)\kappa_2 + \frac{h^4}{4}m_{2K}^2 f''^2(x).$$
These errors depend on the bandwidth h of the kernel, and the AMSE is minimized at the value
$$h_{AMSE}(x) = \Big\{\frac{f_X(x)\int K^2(x)\,dx}{n\,m_{2K}^2 f''^2(x)}\Big\}^{1/5}.$$
The global optimum of the AMISE is attained at
$$h_{AMISE} = \Big\{\frac{\int K^2(x)\,dx}{n\,m_{2K}^2\int f''^2(x)\,dx}\Big\}^{1/5}.$$
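A minimal sketch of the Bartlett-Epanechnikov kernel estimator together with this AMISE-optimal bandwidth follows; the kernel constants are exact, but the reference density $N(0,1)$ used for $\int f''^2$ and the data are illustrative assumptions — in practice these unknown quantities must themselves be estimated.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def kde(x_grid, sample, h):
    """Kernel density estimator f_hat(x) = (nh)^{-1} sum_i K((x - X_i)/h)."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h

# AMISE-optimal bandwidth with a N(0,1) reference density:
# for this kernel kappa_2 = int K^2 = 3/5 and m_2K = int x^2 K = 1/5,
# and int f''^2 = 3/(8 sqrt(pi)) for the standard normal density.
rng = np.random.default_rng(1)
sample = rng.normal(size=400)
n = len(sample)
kappa2, m2K = 3.0 / 5.0, 1.0 / 5.0
int_f2 = 3.0 / (8.0 * np.sqrt(np.pi))
h_amise = (kappa2 / (n * m2K**2 * int_f2)) ** 0.2
f_hat = kde(np.linspace(-3, 3, 121), sample, h_amise)
```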
The optimal AMSE then tends to zero with the order $n^{-4/5}$; it depends on the kernel and on the unknown values at x of the functions $f_X$ and $f''^2$, or their integrals for the integrated error (Silverman, 1986). If the bandwidth has a smaller order, the variance of the estimator is predominant in the expression of the errors and the variations of the estimator are larger; if the bandwidth is larger than the optimal value, the bias increases and the variance is reduced. The approximation made by suppressing the higher order terms in the expansions of the bias and the variance of the
with $I_h = [a+h, b-h]$.
The Fourier transform (named after the French mathematician Fourier, 1768–1830) is an isometry, as expressed by the equality $\int|\mathcal{F}f(s)|^2\,ds = \int|f(s)|^2\,ds$.
Let $(X_k)_{k\leq n}$ be a stationary time series with mean zero; the spectral density is defined from the autocorrelation coefficients $\gamma_k = E(X_0X_k)$ by $S(w) = \sum_{k=-\infty}^{+\infty}\gamma_k e^{-iwk}$, and the inverse relationship for the autocorrelations is $\gamma_k = \int_{-\infty}^{\infty}S(w)e^{iwk}\,dw$. The periodogram of the series is defined as $\widehat I_n(w) = T^{-1}\big|\sum_{k=1}^n X_k e^{-2\pi ikw}\big|^2$ and it is smoothed to yield a regular estimator of the spectral density, $\widehat S_n(w) = \int K_h(u-w)\widehat I_n(u)\,du$ (Brillinger, 1975).
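A minimal sketch of a smoothed periodogram follows; the AR(1) series, kernel and bandwidth are assumed for illustration.

```python
import numpy as np

def smoothed_periodogram(x, h):
    """Periodogram of a centered series at the Fourier frequencies w_j = j/n,
    smoothed by an Epanechnikov kernel of bandwidth h."""
    n = len(x)
    freqs = np.arange(n) / n
    I_n = np.abs(np.fft.fft(x))**2 / n           # periodogram at w_j = j/n
    u = (freqs[:, None] - freqs[None, :]) / h    # pairwise frequency gaps
    K = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)
    weights = K / K.sum(axis=1, keepdims=True)   # discretized kernel integral
    return freqs, weights @ I_n

rng = np.random.default_rng(2)
e = rng.normal(size=1024)
x = np.empty_like(e)                             # AR(1): X_t = 0.6 X_{t-1} + e_t
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.6 * x[t - 1] + e[t]
freqs, S_hat = smoothed_periodogram(x - x.mean(), h=0.05)
```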
Y = m(X) + σε (1.6)
This estimator was introduced by Watson (1964) and Nadaraya (1964), and detailed presentations can be found in the monographs by Eubank (1977), Nadaraya (1989) and Härdle (1990). The performance of the kernel estimator for the regression curve m is measured by error criteria corresponding to the estimation of the curve at a single point or over its whole support, like for the kernel estimator of a continuous density.
A global random measure of the distance between the estimator $\widehat m_{n,h}$ and the regression function m is the integrated squared error (ISE)
$$ISE(\widehat m_{n,h}; h) = \int\{\widehat m_{n,h}(x) - m(x)\}^2\,dx, \qquad (1.8)$$
whose convergence was studied by Hall (1984) and Härdle (1990). The mean squared error criterion develops as the sum of the variance and the squared bias of the estimator
$$MSE(\widehat m_{n,h}; x, h) = E\{\widehat m_{n,h}(x) - m(x)\}^2 = E\{\widehat m_{n,h}(x) - E\widehat m_{n,h}(x)\}^2 + \{E\widehat m_{n,h}(x) - m(x)\}^2.$$
A global mean squared error is the mean integrated squared error
$$MISE(\widehat m_{n,h}; h) = E\{ISE(\widehat m_{n,h}; h)\} = \int MSE(\widehat m_{n,h}; x, h)\,dx. \qquad (1.9)$$
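A minimal sketch of the kernel regression estimator and a Riemann approximation of the ISE (1.8) follows; the Gaussian kernel, simulated data and bandwidth are assumed for illustration.

```python
import numpy as np

def nadaraya_watson(x_grid, X, Y, h):
    """m_hat(x) = sum_i Y_i K_h(x - X_i) / sum_i K_h(x - X_i)."""
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)      # Gaussian kernel; normalization cancels in the ratio
    return (K * Y[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(3)
m = lambda x: np.sin(2.0 * np.pi * x)
X = rng.uniform(0.0, 1.0, size=300)
Y = m(X) + 0.3 * rng.normal(size=300)
x = np.linspace(0.05, 0.95, 91)               # stay inside the support, away from edges
m_hat = nadaraya_watson(x, X, Y, h=0.08)
ise = np.trapz((m_hat - m(x))**2, x)          # Riemann approximation of the ISE (1.8)
```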
Theorem 1.2 (Kiefer 1972). Let $p_n$ tend to zero with $np_n$ tending to infinity, and let δ = 1 or −1; then
$$\limsup_n\,\delta\,\frac{Q_n(p_n) - np_n}{\{2np_n\log\log n\}^{1/2}} = 1, \quad a.s.$$
The results were extended to conditional distribution functions, and Sheather and Marron (1990) considered kernel quantile estimators. The inverse function for a nonparametric regression curve determines thresholds for X given Y values; it is related to the distribution function of Y conditionally on X. The inverse empirical process for a monotone nonparametric regression function has been studied in Pinçon and Pons (2006) and Pons (2008); the main results are presented and generalized in Chapter 5. The behaviour of the threshold estimators $\widehat Q_{X,n,h}$ and $\widehat Q_{Y,n,h}$ of the conditional distribution is studied, with their bias and variance and the mean squared errors which determine the optimal bandwidths specific to the quantile processes.
The Bahadur representation for the quantile estimators is an expansion
$$\widehat F_{X,n}^{-1}(t) = F_X^{-1}(t) + \frac{t - \widehat F_{X,n}\circ F_X^{-1}(t)}{f_X\circ F_X^{-1}(t)} + R_n(t), \quad t\in[0,1],$$
where the main term is a sum of independent and identically distributed random variables and the remainder term $R_n(t)$ is an $o_p(n^{-1/2})$ (Ghosh, 1971); Bahadur (1966) studied its a.s. convergence. Lo and Singh (1986) and Gijbels and Veraverbeke (1988, 1989) extended this approach by differentiation to the Kaplan-Meier estimator of the distribution function of independent and identically distributed right-censored variables.
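A quick numerical illustration of the expansion, under assumed uniform data for which $F_X^{-1}(t) = t$ and $f_X \equiv 1$, compares the empirical quantile with the first two terms of the representation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 2000, 0.3
X = rng.uniform(size=n)                 # F_X = identity on [0, 1], f_X = 1
q_hat = np.quantile(X, t)               # empirical quantile F_hat^{-1}(t)
F_hat_at_q = (X <= t).mean()            # F_hat evaluated at F^{-1}(t) = t
bahadur = t + (t - F_hat_at_q) / 1.0    # main terms; remainder R_n(t) = o_p(n^{-1/2})
print(abs(q_hat - bahadur) * np.sqrt(n))   # small compared with 1 for large n
```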
Watson and Leadbetter (1964) introduced smooth estimators for the hazard function of a point process. The functional intensity λ(t) of an inhomogeneous Poisson point process N is defined by $\lambda(t) = \lim_{\delta\to 0}\delta^{-1}E\{N(t+\delta) - N(t)\}$.
have been intensively studied and they are estimated by empirical moments from observations on a subset G of $\mathbb{R}^d$. The centered moments are immediately obtained from the mean measure m and $\mu_k = \sum_{i=1}^k(-1)^i C_k^i m^i\nu_{k-i}$. The stationarity of the process implies that the k-th moment of N
process $(X_t)_{t>0}$. Assuming that $E\exp\{-\frac{1}{2}\int_0^t\beta^2(B_s)\,ds\}$ is finite, the Girsanov theorem formulates the density of the process X. Parametric diffusion models have been much studied and estimators of the parameters are defined by maximum likelihood from observations at regularly spaced discretization points or at random stopping times. In a discretization scheme with a constant interval of length $\Delta_n$ between observations, nonparametric estimators are defined like with samples of variables in nonparametric regression models (Pons, 2008). Let $(X_{t_i}, Y_i)_{i\leq n}$ be discrete observations with $Y_i = X_{t_{i+1}} - X_{t_i}$ defined by equation (1.11); the functions α and β² are estimated by
$$\widehat\alpha_n(x) = \frac{\sum_{i=1}^n Y_iK_{h_n}(x - X_{t_i})}{\Delta_n\sum_{i=1}^n K_{h_n}(x - X_{t_i})}, \qquad \widehat\beta_n^2(x) = \frac{\sum_{i=1}^n Z_i^2K_{h_n}(x - X_{t_i})}{\Delta_n\sum_{i=1}^n K_{h_n}(x - X_{t_i})},$$
where $Z_i = Y_i - \Delta_n\widehat\alpha_n(X_{t_i})$ is the variable of the centered variations for the diffusion process. The variance of the variable $Y_i$ conditionally on $X_{t_i}$ varies with $X_{t_i}$, and weighted estimators are also defined here. Varying sampling intervals or random sampling schemes modify the estimators. Functional models of diffusions with discontinuities were also considered in Pons (2008), where the jump size was assumed to be a square integrable function of the process X and a nonparametric estimator of this function was defined. Here the estimators of the discretized process are compared to those built with the continuously observed diffusion process X defined by (1.11), on an increasing time interval [0, T]. The kernel bandwidth $h_T$ tends to zero as T tends to infinity with the same rate as $h_n$. In Chapter 8, the MISE of each estimator and its optimal bandwidth are determined. The estimators are compared with those defined for the continuously observed diffusion processes.
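A minimal sketch of these discretized-diffusion estimators follows, applied to an assumed Ornstein-Uhlenbeck path simulated by the Euler scheme; the path, bandwidth and grid are illustrative.

```python
import numpy as np

def drift_variance_estimates(x_grid, X, delta, h):
    """Kernel estimators of the drift alpha and squared diffusion beta^2
    from increments Y_i = X_{t_{i+1}} - X_{t_i} of a discretized diffusion."""
    Xt, Y = X[:-1], np.diff(X)
    u = (x_grid[:, None] - Xt[None, :]) / h
    K = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)     # Epanechnikov kernel
    denom = delta * K.sum(axis=1)
    alpha_hat = (K * Y[None, :]).sum(axis=1) / denom
    # centered variations Z_i = Y_i - delta * alpha_hat(X_{t_i})
    uX = (Xt[:, None] - Xt[None, :]) / h
    KX = 0.75 * (1.0 - uX**2) * (np.abs(uX) <= 1.0)
    alpha_at_X = (KX * Y[None, :]).sum(axis=1) / (delta * KX.sum(axis=1))
    Z = Y - delta * alpha_at_X
    beta2_hat = (K * (Z**2)[None, :]).sum(axis=1) / denom
    return alpha_hat, beta2_hat

rng = np.random.default_rng(5)
n, delta = 2000, 0.01
X = np.empty(n + 1); X[0] = 0.0
for i in range(n):                                   # dX = -X dt + 0.5 dB
    X[i + 1] = X[i] - X[i] * delta + 0.5 * np.sqrt(delta) * rng.normal()
alpha_hat, beta2_hat = drift_variance_estimates(np.linspace(-1, 1, 41), X, delta, h=0.2)
```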
$\int_0^x\xi(u)\,du$. The estimators are based on the covariances of the process Z and are built with its quadratic variations. For the time transformation, the estimator of Φ(x) is defined by linearisation of $V_n(x) = \sum_{k=1}^{[nx]}(\Delta Z_k)^2$, where the variables $Z_k = Z(n^{-1}k) - Z(n^{-1}(k-1))$ are centered and independent,
$$v_n(x) = V_n(x) + (nx - [nx])(\Delta Z_{[nx]+1})^2, \quad x\in[0,1[,$$
$$v_n(1) = V_n(1), \qquad (1.12)$$
$$\widehat\Phi_n(x) = v_n^{-1}(1)v_n(x);$$
the process $\widehat\Phi_n - \Phi$ is uniformly consistent and $n^{1/2}(\widehat\Phi_n - \Phi)$ is asymptotically Gaussian. The method was extended to $[0,1]^3$. The diffusion processes cannot be reduced to the same model, but the method for estimating its variance function relies on similar properties of Gaussian processes.
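A minimal sketch of the estimator $\widehat\Phi_n$ built from the quadratic variations of an observed path follows; the time-changed Brownian motion with $\Phi(x) = x^2$ is an assumed illustration.

```python
import numpy as np

def phi_hat(x, Z):
    """Phi_hat(x) = v_n(x) / v_n(1), built from the squared increments
    of Z observed at t_k = k/n on [0, 1], with linear interpolation v_n."""
    dZ2 = np.diff(Z)**2
    n = len(dZ2)
    V = np.concatenate(([0.0], np.cumsum(dZ2)))   # V_n(k/n), k = 0..n
    k = np.minimum((n * x).astype(int), n - 1)
    v = V[k] + (n * x - k) * dZ2[k]               # v_n(x) = V_n + fractional term
    return v / V[n]

rng = np.random.default_rng(6)
n = 4000
t = np.arange(n + 1) / n
Phi = t**2                                        # true time transformation
Z = np.concatenate(([0.0], np.cumsum(rng.normal(size=n) * np.sqrt(np.diff(Phi)))))
x = np.linspace(0.0, 0.999, 200)
print(np.max(np.abs(phi_hat(x, Z) - x**2)))       # uniform error, small for large n
```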
In time series analysis, the models are usually defined by scalar parameters and a wide range of parametric models for stationary series have been intensively studied for many years. Nonparametric spectral densities of the parametric models have been estimated by smoothing the periodogram calculated from T discrete observations of stationary and mixing series (Wold, 1975; Brillinger, 1981; Robinson, 1986; Herrmann, Gasser and Kneip, 1992; Pons, 2008). The spectral density is supposed to be twice continuously differentiable and the bias, variance and moments of its kernel estimator have been expanded like those of a probability density. It converges weakly with the rate $T^{2/5}$ to a Gaussian process, as a consequence of the weak convergence of the empirical periodogram.
Chapters 2 and 3 focus on the density and the regression models, respectively. In models with a constant variance, the regression estimator defined as a ratio of kernel estimators is approximated by a weighted sum of two kernel estimators and its properties are easily deduced. In models with a
Chapter 2
2.1 Introduction
and it is bounded by the sum of a p-moment and a bias term. For every x in $I_{X,h}$, the pointwise and uniform convergence of the kernel estimator $\widehat f_{n,h}$ are established under the following conditions on the kernel and the density.
Condition 2.1.
(1) K is a symmetric density such that $|x|^2K(x)\to 0$ as |x| tends to infinity, or K has a compact support with value zero on its frontier;
(2) the density function f belongs to the class $C_2(I_X)$ of twice continuously differentiable functions defined in $I_X$;
(3) the kernel function satisfies integrability conditions: the moments $m_{2K} = \int u^2K(u)\,du$, $\kappa_\alpha = \int K^\alpha(u)\,du$, for α ≥ 0, and $\int|K'(u)|^\alpha\,du$, for α = 1, 2, are finite; as $n\to\infty$, $h_n\to 0$ and $nh_n\to\infty$;
(4) $nh_n^5$ converges to a finite limit γ.
The next conditions are stronger than Conditions 2.1 (2)-(4), with higher degrees of differentiability and integrability.
Condition 2.2.
$$\leq 58\exp\{-\alpha nh_n^2\}$$
with α > 0, and $\sum_{n=1}^{\infty}\exp\{-\alpha nh_n^2\}$ tends to zero under Condition 2.1 or 2.2.
Proposition 2.2. Assume $h_n\to 0$ and $nh_n\to\infty$.
(a) Under Conditions 2.1, the bias of $\widehat f_{n,h}(x)$ is
$$b_{n,h}(x) = \frac{h^2}{2}m_{2K}f^{(2)}(x) + o(h^2),$$
denoted $h^2b_f(x) + o(h^2)$; its variance is
$$Var\{\widehat f_{n,h}(x)\} = (nh)^{-1}\kappa_2 f(x) + o((nh)^{-1}),$$
also denoted $(nh)^{-1}\sigma_f^2(x) + o((nh)^{-1})$, where all approximations are uniform. Let K have the compact support [−1, 1]; the covariance of $\widehat f_{n,h}(x)$ and $\widehat f_{n,h}(y)$ is zero if |x − y| > 2h, otherwise it is approximated by
$$\frac{(nh)^{-1}}{2}\{f(x)+f(y)\}\,\delta_{x,y}\int K(v-\alpha_h)K(v+\alpha_h)\,dv,$$
where $\alpha_h = |x-y|/(2h)$ and $\delta_{x,y}$ is the indicator of {x = y}.
(b) Under Conditions 2.2, for every s ≥ 2, the bias of $\widehat f_{n,h}(x)$ is
$$b_{n,h}(x; s) = \frac{h^s}{s!}m_{sK}f^{(s)}(x) + o(h^s),$$
and
$$\|\widehat f_{n,h}(x) - f_{n,h}(x)\|_p = O((nh)^{-1/p}),$$
for every p ≥ 2, where the approximations are uniform.
Proof. The bias as h tends to zero is obtained from a second order expansion of f(x + ht) under Condition 2.1, and from its s-order expansion under Condition 2.2. The variance of $\widehat f_{n,h}(x)$ is
$$Var\{\widehat f_{n,h}(x)\} = n^{-1}\Big\{\int K_h^2(x-s)f(s)\,ds - f_{n,h}^2(x)\Big\}.$$
The first term of the sum is $n^{-1}\int K_h^2(x-u)f(u)\,du = (nh)^{-1}\kappa_2 f(x) + o((nh)^{-1})$; the second term, $n^{-1}f^2(x) + O(n^{-1}h)$, is smaller.
The covariance of $\widehat f_{n,h}(x)$ and $\widehat f_{n,h}(y)$ is written $n^{-1}\{\int_{I_X^2}K_h(u-x)K_h(u-y)f(u)\,du - f_{n,h}(x)f_{n,h}(y)\}$; it is zero if |x − y| > 2h. Otherwise let $\alpha_h = |x-y|/(2h) < 1$; changing the variables as $h^{-1}(x-u) = v-\alpha_h$ and $h^{-1}(y-u) = v+\alpha_h$ with $v = \{(x+y)/2-u\}/h$, the covariance develops as
$$Cov\{\widehat f_{n,h}(x), \widehat f_{n,h}(y)\} = (nh)^{-1}f\Big(\frac{x+y}{2}\Big)\int K(v-\alpha_h)K(v+\alpha_h)\,dv + o((nh)^{-1}).$$
If |x − y| ≤ 2h, f((x+y)/2) = f(x) + o(1) = f(y) + o(1), and the covariance is approximated by
$$\frac{(nh)^{-1}}{2}\{f(x)+f(y)\}1_{\{0\leq\alpha_h<1\}}\int K(v-\alpha_h)K(v+\alpha_h)\,dv.$$
Due to the compactness of the support of K, the covariance is zero if $\alpha_h\geq 1$. For x ≠ y, $\alpha_h$ tends to infinity and $1_{\{0\leq\alpha_h<1\}}$ tends to zero as n tends to infinity; then the indicator is approximated by the indicator $\delta_{x,y}$ of {x = y}.
For p = 1, $E|\widehat f_{n,h} - f_{n,h}|(x)\leq\int_{I_{X,h}}|K_h(x-s)|\,d|\widehat F_n - F|(s)$, which converges to zero as n tends to infinity. For p ≥ 3, the $L^p$-risk of $\widehat f_{n,h}(x)$ is obtained from the expansion of the sum and by recursion on the order of the moment in the expansion of
$$E\big|\widehat f_{n,h}(x) - f_{n,h}(x)\big|^p = E\Big|n^{-1}\sum_{i=1}^n K_h(x-X_i) - f_{n,h}(x)\Big|^p.$$
Integrating the above expansions entails similar bounds for the integrated norms $E\int|\widehat f_{n,h}(x) - f_{n,h}(x)|^p\,dx = O((nh)^{-1})$, for every p > 1. For p = 2, $E\{\widehat f_{n,h}(x) - f(x)\}^2 = Var\{\widehat f_{n,h}(x)\} + \{f_{n,h}(x) - f(x)\}^2$ and its first order expansion is $n^{-1}h^{-1}\kappa_2 f(x) + o(n^{-1}h^{-1}) + \frac14 m_{2K}^2h^4f^{(2)2}(x) + o(h^4)$. The asymptotic mean squared error for $\widehat f_{n,h}$ at x is then
$$AMSE(\widehat f_{n,h}; x) = (nh)^{-1}\kappa_2 f(x) + \frac14 m_{2K}^2h^4f^{(2)2}(x);$$
it is minimum for the bandwidth function
$$h_{AMSE}(x) = n^{-1/5}\Big\{\frac{\kappa_2 f(x)}{m_{2K}^2f^{(2)2}(x)}\Big\}^{1/5}.$$
A smaller order bandwidth increases the variance of the density estimator and reduces its bias; with the order $n^{-1/5}$ its asymptotic distribution cannot be centered. An estimator of the derivative $f^{(k)}$ is defined by means of the derivative $K^{(k)}$ of the symmetric kernel, for k ≥ 1. The convergence rates for estimators of a derivative of the density also depend on the order of the derivative. Consider the k-order derivative of $K_h$,
$$K_h^{(k)}(x) = h^{-(k+1)}K^{(k)}(h^{-1}x), \quad k\geq 1.$$
The estimators of the derivatives of the density are
$$\widehat f_{n,h}^{(k)}(x) = n^{-1}\sum_{i=1}^n K_h^{(k)}(x-X_i). \qquad (2.3)$$
The next lemma implies the uniform consistency of $\widehat f_{n,h}^{(k)}$ to $f^{(k)}$, for every order of derivability k ≥ 1, and allows the calculation of the variance of the derivative estimators. It is not exhaustive, and integrals of higher orders are easily obtained using integrations by parts.
The sum $\widehat f_{n,h}^{(1)}(x) = n^{-1}\sum_{i=1}^n K_h^{(1)}(x-X_i)$ converges uniformly on $I_{X,h}$ to its expectation
$$f_{n,h}^{(1)}(x) = EK_h^{(1)}(x-X) = \int K_h^{(1)}(u-x)f_X(u)\,du = -f^{(1)}(x)\int zK^{(1)}(z)\,dz - \frac{h^2}{6}f^{(3)}(x)\int z^3K^{(1)}(z)\,dz + o(h^2) = f^{(1)}(x) + \frac{h^2}{2}m_{2K}f^{(3)}(x) + o(h^2);$$
then $\widehat f_{n,h}^{(1)}$ converges uniformly to $f^{(1)}(x)$ and its bias is $\frac{h^2}{2}m_{2K}f^{(3)}(x)$. Its variance is $(nh^3)^{-1}f(x)\int K^{(1)2}(z)\,dz + o((nh^3)^{-1})$ and the optimal local bandwidth for estimating $f^{(1)}$ is deduced as
$$h_{AMSE}(f^{(1)}; x) = n^{-1/7}\Big\{\frac{f(x)\int K^{(1)2}(z)\,dz}{m_{2K}^2f^{(3)2}(x)}\Big\}^{1/7};$$
thus the estimator of the first density derivative (2.3) has to be computed with a bandwidth estimating $h_{AMSE}(f^{(1)}; x)$. For the second derivative, the expectation of $\widehat f_{n,h}^{(2)}$ is $f_{n,h}^{(2)}(x) = f^{(2)}(x) + \frac{h^2}{2}m_{2K}f^{(4)}(x) + o(h^2)$, so it converges uniformly to $f^{(2)}$ with the bias $\frac{h^2}{2}m_{2K}f^{(4)}(x) + o(h^2)$ and the variance $(nh^5)^{-1}f(x)\int K^{(2)2}(z)\,dz + o((nh^5)^{-1})$. More generally, Lemma 2.1 generalizes by induction to higher orders and the rate of optimal bandwidths is deduced as follows.
Proposition 2.3. Under Conditions 2.1, the estimator $\widehat f_{n,h}^{(k)}$ of the k-order derivative of a density in class $C_2$ has a bias $O(h^2)$ and a variance $O((nh^{2k+1})^{-1})$; its optimal local and global bandwidths are $O(n^{-1/(2k+5)})$, for every k ≥ 2.
For a density of class $C_s$ and under Conditions 2.2, the bias is a $O(h^s)$ and the variance a $O((nh^{2k+1})^{-1})$; its optimal bandwidths are $O(n^{-1/(2k+2s+1)})$ and the corresponding $L^2$-risks are $O(n^{-s/(2k+2s+1)})$.
As a consequence, the $L^2$-risk of the estimator $\widehat f_{n,\widehat h_{opt}}^{(k)}$ is a $O(n^{-2s/(2k+2s+1)})$ for every density in $C_s$, s ≥ 2. If the k-th derivative of the kernel and the density are lipschitzian, with $|K^{(k)}(x) - K^{(k)}(y)|\leq\alpha|x-y|$ and $|f^{(k)}(x) - f^{(k)}(y)|\leq\alpha|x-y|$ for some constant α > 0, then there exists a constant C such that for every x and y in $I_{X,h}$
$$|\widehat f_{n,h}^{(k)}(x) - \widehat f_{n,h}^{(k)}(y)|\leq C\alpha h^{-(k+1)}|x-y|.$$
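A minimal sketch of the derivative estimator (2.3) for k = 1 follows, with a Gaussian kernel whose derivatives have closed forms (an assumed choice); the data and the bandwidth rate are illustrative.

```python
import numpy as np

def density_derivative(x_grid, sample, h, k=1):
    """f_hat^{(k)}(x) = n^{-1} sum_i K_h^{(k)}(x - X_i), Gaussian kernel:
    K^{(1)}(u) = -u phi(u), K^{(2)}(u) = (u^2 - 1) phi(u)."""
    u = (x_grid[:, None] - sample[None, :]) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    Kk = -u * phi if k == 1 else (u**2 - 1.0) * phi
    return Kk.mean(axis=1) / h**(k + 1)

rng = np.random.default_rng(7)
sample = rng.normal(size=1000)
x = np.linspace(-2, 2, 81)
h = len(sample) ** (-1.0 / 7.0)           # the O(n^{-1/(2k+5)}) rate for k = 1
f1_hat = density_derivative(x, sample, h, k=1)
```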
The integral $\theta_k = \int f^{(k)2}(x)\,dx$ of the quadratic k-th derivative of the density is estimated by
$$\widehat\theta_{k,n,h} = \int\widehat f_{n,h}^{(k)2}(x)\,dx; \qquad (2.4)$$
the variance $E(\widehat\theta_{k,n,h} - \theta_k)^2$ has the same order as the MISE for the estimator $\widehat f_{n,h}^{(k)}$ of $f^{(k)}$, hence it converges to $\theta_k$ with the rate $O(n^{1/2}h^{k+1/2})$ and the estimator does not achieve the parametric rate of convergence $n^{1/2}$.
The $L^p$-risk of the estimator of the density decreases as s increases and, for p ≥ 2, a bound of the $L^p$-norm is
$$\|\widehat f_{n,h}(x) - f(x)\|_p^p\leq 2^{p-1}\Big[\frac{h^{ps}}{(s!)^p}\{m_{sK}^pf^{(s)p}(x) + o(1)\} + (nh)^{-1}\{g_p(x) + o(1)\}\Big],$$
where $g_p(x) = \sum_{k=2}^{[p/2]}\sum_{1<j_1\neq\ldots\neq j_k\leq p;\,\sum_i j_i = p}\kappa_{j_1}\cdots\kappa_{j_k}f^k(x)$. The optimal bandwidth is still reached when both terms of this bound are of the same order and minimal. With p = 2 and s = 2, it is
$$h_n(x) = O(n^{-\frac15}), \qquad \|\widehat f_{n,h}(x) - f(x)\|_2 = O(n^{-\frac25}).$$
The bandwidth and the risk decrease as the order of derivability of the density increases.
The derivability condition $f\in C_s$ in 2.1 can be replaced by the condition that f belongs to a Hölder class $H_{\alpha,M}$, with $|f^{(s)}(x) - f^{(s)}(y)|\leq M|x-y|^{\alpha-s}$, where $s = [\alpha]\geq 0$ is the integer part of α > 0.
The $L^p$-norms of the variations of the process $\widehat f_{n,h} - f_{n,h}$ are bounded by the same arguments as the bias and the variance. Assume that K has the support [−1, 1].
Lemma 2.2. Under Conditions 2.1 and 2.2, there exists a constant C such that for every x and y in $I_{X,h}$ satisfying |x − y| ≤ 2h
Proof. Let x and y be in $I_{X,h}$; the variance of $\widehat f_{n,h}(x) - \widehat f_{n,h}(y)$ develops according to their variances given by Proposition 2.2 and the covariance between both terms, which has the same bound by the Cauchy-Schwarz inequality. The second order moment $E|\widehat f_{n,h}(x) - \widehat f_{n,h}(y)|^2$ develops as the sum $n^{-1}\int\{K_{h_n}(x-u) - K_{h_n}(y-u)\}^2f(u)\,du + (1-n^{-1})\{f_{n,h_n}(x) - f_{n,h_n}(y)\}^2$. For an approximation of the integral $I_2(x,y) = \int\{K_{h_n}(x-u) - K_{h_n}(y-u)\}^2f(u)\,du$, the Mean Value Theorem implies $K_{h_n}(x-u) - K_{h_n}(y-u) = (x-y)\varphi_n^{(1)}(z-u)$, where $\varphi_n(x) = K_{h_n}(x)$ and z is between x and y; then $\int\{K_{h_n}(x-u) - K_{h_n}(y-u)\}^2f(u)\,du$ is approximated by
$$(x-y)^2\int\varphi_n^{(1)2}(z-u)f(u)\,du = (x-y)^2h_n^{-3}\Big\{f(x)\int K^{(1)2} + o(h_n)\Big\}.$$
Since $h_n^{-1}|x|$ and $h_n^{-1}|y|$ are bounded by 1, the order of the second moment of $\widehat f_{n,h}(x) - \widehat f_{n,h}(y)$ is a $O((x-y)^2(nh_n^3)^{-1})$ if $|x-y|\leq 2h_n$, and the covariance is zero otherwise.
Theorem 2.1. Under Conditions 2.1 and 2.2, for a density f of class $C_s(I_X)$ and with $nh^{2s+1}$ converging to a constant γ, the process
$$U_{n,h} = (nh)^{1/2}\{\widehat f_{n,h} - f\}1_{\{I_{X,h}\}}$$
converges weakly to $W_f + \gamma^{1/2}b_f$, where $W_f$ is a continuous Gaussian process on $I_X$ with mean zero and covariance $E\{W_f(x)W_f(x')\} = \delta_{x,x'}\sigma_f^2(x)$ at x and x'.
The bound $[\int K(z)\{f(x+hz) - f(y+hz)\}\,dz]^2\leq 2|x-y|^2\|f^{(1)}\|_\infty^2$ and Lemma 2.2 imply that the mean of the squared variations of the process $U_{n,h}$ is $O(h^{-2}|x-y|^2)$ for |x − y| ≤ 2h < 1; otherwise the estimators $\widehat f_{n,h}(x)$ and $\widehat f_{n,h}(y)$ are independent. Billingsley's Theorem 3 implies the tightness of the process $U_{n,h}1_{[-h,h]}$ and the convergence is extended to any compact subinterval of the support. With an unbounded support for X such that E|X| < ∞, for every η > 0 there exists A such that P(|X| > A) ≤ η; therefore $P(|U_{n,h}(A+1)| > 0)\leq\eta$ and the same result still holds on [−A − 1, A + 1] instead of the support of the process $U_{n,h}$.
Lemma 2.2, concerning second moments, does not depend on the smoothness of the density and it is not modified by the condition of a Hölder class instead of a class $C_s$. The variations of the bias are now bounded by $\{f_{n,h}(x) - f(x) - f_{n,h}(y) + f(y)\}^2\leq 2M|x-y|^{2\alpha}$, and the mean of the squared variations of the process $U_{n,h}$ is $O(h^{-2}|x-y|^2)$ for |x − y| ≤ 2h < 1. The weak convergence of Theorem 2.1 is therefore fulfilled for every α > 1.
Bootstrap estimators for the bias and the variance provide another estimation of $MISE_n(h)$ and $h_{AMISE}$. These consistent estimators are then used for centering and normalizing the process $\widehat f_{n,h} - f$ and provide an estimated process
$$\widehat U_n = (n\widehat h_n)^{1/2}\,\widehat\sigma_{f,n,\widehat h_n}^{-1}\{\widehat f_{n,\widehat h_n} - f - \widehat\gamma_{n,\widehat h_n}\widehat b_{f,n,\widehat h_n}\}1_{\{I_{X,\widehat h_n}\}}.$$
Theorem 2.2. Under Conditions 2.1 and 2.2, for a density f of class $C_s(I_X)$ and with $nh^{2s+2k+1}$ converging to a constant γ, the process
$$U_{n,h}^{(k)} = (nh^{2k+1})^{1/2}\{\widehat f_{n,h}^{(k)} - f^{(k)}\}1_{\{I_{X,h}\}}$$
by a multivariate kernel K defined on $[-1,1]^d$ and $K_h(x) = h^{-d}K(h^{-1}x)$ with a single bandwidth, or $K_h(x) = \prod_{k=1}^d h_k^{-1}K(h_k^{-1}x_k)$ with a vector bandwidth, for x in $I_{X,h}$. The derivatives of the density $f^{(k)}$ are arrays and the rates of their moments depend on the dimension d. If $h_k = h$,
$$b_{s,n,h}(x) = \frac{h^s}{s!}m_{sK}f^{(s)}(x) + o(h^s),$$
$$Var\{\widehat f_{n,h}(x)\} = (nh^d)^{-1}\kappa_2 f(x) + o((nh^d)^{-1}),$$
$$\|\widehat f_{n,h}(x) - f_{n,h}(x)\|_p = O((nh^d)^{-1/p}), \qquad (2.5)$$
$$MISE_n(h,x) = O(h^{2s}) + O((nh^d)^{-1}).$$
The optimal bandwidth $h_n(x)$ minimizing the $MISE_n(h,x)$ has the order $n^{-1/(2s+d)}$, where the local MISE reaches the minimal order $O(n^{-2s/(2s+d)})$. The convergence rate of $\widehat f_{n,h} - f$ is $(nh^d)^{1/2}$ and the results of Theorem 2.1 and its corollary still hold with this rate.
Consider a class F of densities and a risk $R(f,\widehat f_n)$ for the estimation of a density f of F by an estimator $\widehat f_n$ belonging to a space $\widehat F$. A minimax estimator $\widehat f_n^*$ is defined as a minimizer of the maximal risk over F,
$$\widehat f_n^* = \arg\inf_{\widehat f_n\in\widehat F}\sup_{f\in F}R(f,\widehat f_n).$$
With an optimal bandwidth related to the risk $R_p^p$, the kernel estimator of a density of $F = C_s$, s ≥ 2, provides a $L^p$-risk of order $h_n^{sp}(x; s, p)$ and this is the minimax risk order in a space $\widehat F$ determined by the regularity of the kernel; the kernel estimator reaches this bound.
The estimator (2.4) of the integral $\theta_k = \int f^{(k)2}(x)\,dx$ of the quadratic k-th derivative of a density of $C_2$ has therefore the optimal rate of convergence for an estimator of $\theta_k$.
The histogram is the oldest unsmoothed nonparametric estimator of the density. It is defined as the empirical distribution of the observations cumulated on small intervals of equal length $h_n$, divided by $h_n$, with $h_n$ and $(nh_n)^{-1}$ converging to zero as n tends to infinity. Let $(B_{jh})_{j=1,\ldots,J_{X,h}}$ be a partition of $I_X$ into subintervals of length h centered at $a_{jh}$, and let $K_h(x) = h^{-1}\sum_{j\in J_{X,h}}1_{B_{jh}}(x)$ be the kernel corresponding to the histogram; it is therefore defined as
$$\widetilde f_{n,h}(x) = hK_h(x)\int K_h(s)\,d\widehat F_n(s).$$
Its bias $\widetilde b_{f,h}(x) = \sum_{j\in J_{X,h}}1_{B_{jh}}(x)\{f(a_{jh}) - f(x)\} + o(h) = hf^{(1)}(x) + o(h)$ is larger than the bias of kernel estimators and its variance $\widetilde v_f(x)$ is a
has been introduced by Hall and Marron (1987). A second plug-in estimator was defined by Bickel and Ritov (1988) as the integral of the square of the estimated density
$$\bar\theta_{n,h} = \frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\int K_h(x-X_i)K_h(x-X_j)\,dx.$$
The integrals $\theta_k$ are also estimated by the integral of the square of the kernel estimator for the derivative of the density,
$$\bar\theta_{k,n,h} = \frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\int K_h^{(k)}(x-X_i)K_h^{(k)}(x-X_j)\,dx,$$
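A minimal sketch of the Bickel-Ritov statistic for k = 0 follows, with a Gaussian kernel for which the cross integral has the closed form of a Gaussian kernel with doubled squared bandwidth; the sample and h are assumed for illustration.

```python
import numpy as np

def theta_bar(sample, h):
    """U-statistic estimator of theta = int f^2: the convolution of two
    N(., h^2) kernels is a N(., 2 h^2) kernel, giving the cross integrals."""
    n = len(sample)
    d = sample[:, None] - sample[None, :]
    cross = np.exp(-d**2 / (4.0 * h**2)) / np.sqrt(4.0 * np.pi * h**2)
    off_diag = cross.sum() - np.trace(cross)    # sum over i != j
    return off_diag / (n * (n - 1))             # equals 2/(n(n-1)) sum_{i<j}

rng = np.random.default_rng(8)
sample = rng.normal(size=800)
print(theta_bar(sample, h=0.3))   # true int f^2 = 1/(2 sqrt(pi)) ~ 0.2821 for N(0,1)
```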
$$\widehat M_{f,n,h} = M_{\widehat f_{n,h}}.$$
For every x in $I_X$, the variable $U_{(1),n,h}(x) = (nh^3)^{1/2}(\widehat f_{n,h}^{(1)} - f^{(1)})(x)$ converges weakly to a Gaussian variable with a non degenerated variance $\kappa_2 f(x)$ and a mean $m_{(1)}(x) = \lim_n(nh_n^7)^{1/2}m_{2K}f^{(3)}(x)/2$. It follows that the convergence rate of $\widehat M_{f,n,h} - M_f$ is $(nh^3)^{1/2}$ and
Proposition 2.5. Under Conditions 2.1 and 2.2, $(nh^3)^{1/2}(\widehat M_{f,n,h} - M_f)$ converges weakly to a variable $N(f^{(2)-1}(M_f)m_{(1)}(M_f), \sigma_{M_f}^2)$.
The bias of $f^{(1)}(\widehat M_{f,n,h})$ is deduced from the bias of the process $(\widehat f_{n,h}^{(1)} - f^{(1)})$ and it equals
$$Ef^{(1)}(\widehat M_{f,n,h}) = -\frac{h^2}{2}m_{2K}f^{(3)}(\widehat M_{f,n,h}) + o(h^2) = -\frac{h^2}{2}m_{2K}f^{(3)}(M_f) + o(h^2);$$
it does not depend on the degree of derivability of the density f.
and $\widehat\pi_n = n^{-1}n_1$. Their densities with respect to the Lebesgue measure are denoted $f_1$ and $f_2$, and the density of the second sample with respect to the distribution of the first one is $\varphi = \pi(1-\pi)^{-1}f_1^{-1}f_2$. The densities $f_1$ and $f_2$ are estimated by smoothing $\widehat F_{1,n}$ and $\widehat F_{2,n}$; then $f_0$, $f_\varphi$ and φ are estimated by
$$\widehat f_{0,n,h}(t) = \widehat\pi_n^{-1}\int K_h(t-s)\,d\widehat F_{1,n}(s),$$
$$\widehat f_{n,h}(t) = (1-\widehat\pi_n)^{-1}\int K_h(t-s)\,d\widehat F_{2,n}(s),$$
and $J_n(s) = 1_{\{Y_n(s)>0\}}$. The process $\widehat F_n^R$ is also written in an additive form which is easily calculated. From the martingale property of the process $\widehat\Lambda_n$ and Gill's expression for the Kaplan-Meier estimator,
$$\frac{\widehat F_n^R - F}{1-F}(t) = -\int_0^t\frac{1-\widehat F_n^R(s^-)}{1-F(s)}\,d(\widehat\Lambda_n - \Lambda)(s), \quad t\leq\max T_i, \qquad (2.11)$$
it follows that
$$E\int_0^t\frac{1-\widehat F_n^R(s^-)}{1-F(s)}\{d\widehat\Lambda_n(s) - d\Lambda(s)\} = 0,$$
The a.s. uniform consistency of the process $\widehat F_n^R - F$, for the Kaplan-Meier estimator, implies that $\sup_{I_{X,h}}|\widehat f_{n,h}^R - f|$ converges in probability to zero, as n tends to infinity and h to zero. From (2.11), the estimator $\widehat f_{n,h}^R$ satisfies
$$\widehat f_{n,h}^R(t) = \int K_h(t-s)\Big[f(s)\Big\{1+\int_0^s\frac{1-\widehat F_n^{R-}}{1-F}\,d(\widehat\Lambda_n-\Lambda)\Big\}\,ds - \{1-\widehat F_n^R(s^-)\}\,d(\widehat\Lambda_n-\Lambda)(s)\Big]$$
$$= \int K_h(t-s)\,[dF(s) - \{1-\widehat F_n^R(s^-)\}\,d(\widehat\Lambda_n-\Lambda)(s)] + \int\Big\{\int_u K_h(t-s)\,dF(s)\Big\}\frac{1-\widehat F_n^{R-}}{1-F}(u)\,d(\widehat\Lambda_n-\Lambda)(u),$$
where the last two terms are $O((nh)^{-1})$ and the first one is a $O(n^{-1})$. The optimal bandwidths for estimating the density under left-censoring are then also $O(n^{-1/5})$ and the optimal $L^2$-risks are $O(n^{-2/5})$.
Under Conditions 2.1 or 2.2 and if the support of K is compact, the variance $v_f^L$ belongs to class $C_2(I_X)$ and for every t and t' in $I_{X,h}$ there exists a constant α such that for |t − t'| ≤ 2h
$$E\big|\widehat f_{n,h}^L(t) - \widehat f_{n,h}^L(t')\big|^2\leq\alpha(nh^3)^{-1}|t-t'|^2.$$
Under the conditions of Theorem 2.1, the process $U_{n,h}^L = (nh)^{1/2}\{\widehat f_{n,h}^L - f\}1_{\{I_{X,h}\}}$ converges weakly to $W_f^L + \gamma^{1/2}b_f^L$, where $W_f^L$ is a continuous Gaussian process on $I_X$ with mean and covariances zero and with variance function $v_f^L$.
The mean marginal density f of the process is the density of the distribution function F; it is estimated by replacing the integral of a kernel function with respect to the empirical distribution function of a sample by an integral with respect to the Lebesgue measure over [0, T], the bandwidth sequence being indexed by T. For every x in $I_{X,T,h}$,
$$\widehat f_{T,h}(x) = \frac1T\int_0^T K_h(X_s-x)\,ds; \qquad (2.15)$$
its expectation is $f_{T,h}(x) = \int_{I_{X,T}}K_h(y-x)f(y)\,dy$, so its bias is
$$b_{T,h}(x) = \int_{I_{X,T}}K_h(y-x)\{f(y)-f(x)\}\,dy = \frac{h_T^s}{s!}m_{sK}f^{(s)}(x) + o(h_T^s)$$
under Conditions 2.1-2.2. For a density in a Hölder class $H_{\alpha,M}$, $b_{T,h}(x)$ tends to zero for every α > 0 and it is a $O(h^{[\alpha]})$ under the condition $\int|u|^{[\alpha]}K(u)\,du < \infty$.
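A minimal sketch of the time-integrated estimator (2.15) follows, with the integral approximated by a Riemann sum over an assumed simulated ergodic Ornstein-Uhlenbeck path; T, the step and h are illustrative.

```python
import numpy as np

def marginal_density(x_grid, path, T, h):
    """f_hat_T(x) = T^{-1} int_0^T K_h(X_s - x) ds, evaluated by a Riemann
    sum over the discretization times of the observed path."""
    dt = T / (len(path) - 1)
    u = (path[None, :] - x_grid[:, None]) / h
    K = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)
    return (K.sum(axis=1) * dt) / (T * h)

rng = np.random.default_rng(9)
T, m = 200.0, 20000
dt = T / m
X = np.empty(m + 1); X[0] = 0.0
for i in range(m):                        # OU path; stationary law N(0, 1/2)
    X[i + 1] = X[i] - X[i] * dt + np.sqrt(dt) * rng.normal()
f_hat = marginal_density(np.linspace(-2, 2, 41), X, T, h=0.25)
```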
Its variance is expressed through the integral of the covariance between $K_h(X_s-x)$ and $K_h(X_t-x)$. For $X_s = X_t$, the integral on the diagonal $D_X$ of $I_{X,T}^2$ is a $(Th_T)^{-1}\kappa_2 f(x) + o((Th_T)^{-1})$ and the integral outside the diagonal, denoted $I_o(T)$, is expanded using the ergodicity property (2.13). Let $\alpha_h(u,v) = |u-v|/(2h_T)$; the integral $I_o(T)$ is written
$$\int_{[0,T]^2}\frac{ds}{T}\frac{dt}{T}\int_{I_{X,T}^2\setminus D_X}K_h(u-x)K_h(v-x)f_{X_s,X_t}(u,v)\,du\,dv = (Th_T)^{-1}\int_{I_X}\int_{I_X\setminus\{u\}}\Big\{\int_{-1+\alpha_h(u,v)}^{1-\alpha_h(u,v)}K(z-\alpha_h(u,v))K(z+\alpha_h(u,v))\,dz\Big\}\cdots$$
and the optimal local and global bandwidths minimizing the mean squared (integrated) errors are $O(T^{-1/(2s+1)})$. If $h_T$ has the rate of the optimal bandwidths, the MISE is a $O(T^{-2s/(2s+1)})$. The $L^p$-norm of the estimator satisfies $\|\widehat f_{T,h}(x) - f_{T,h}(x)\|_p = O((Th_T)^{-1/p})$ under an ergodicity condition for $(X_{t_1},\ldots,X_{t_k})$ similar to (2.13) for bounded functions ψ defined on $I_X^k$:
$$E\,T^{-1}\int_{[0,T]^k}\psi(X_{t_1},\ldots,X_{t_k})\,dt_1\ldots dt_k\ \to\ \int_{I_X^k}\psi(x_1,\ldots,x_k)\prod_{1\leq j\leq k-1}\pi_{x_j}(dx_{j+1})\,dF(x_1), \qquad (2.16)$$
for every integer k = 2, ..., p. The property (2.16) implies the weak convergence of the finite dimensional distributions of the process $(Th_T)^{1/2}(\widehat f_{T,h} - f - b_{T,h})$.
where
$$\widehat F_T(t) = T^{-1}\int_0^T 1_{\{X_s\leq t\}}\,ds$$
and F is its limit under the ergodicity property (2.14). The convergence rate of $\widehat F_T - F$ is $\sqrt T$, from the mixing property of the process X. Therefore $h_2(\widehat f_{T,h_T}, f)$ converges to zero in probability with the rate $T^{1/2}h_T^{1/2}$.
2.11 Exercises

(1) Let f and g be real functions defined on $\mathbb{R}$ and let $f*g(x) = \int f(x-y)g(y)\,dy$ be their convolution. Calculate $\int f*g(x)\,dx$ and prove that, for 1 ≤ p ≤ ∞, if f belongs to $L^p$ and g to $L^q$ such that $p^{-1}+q^{-1} = 1$, then $\sup_{x\in\mathbb{R}}|f*g(x)|\leq\|f\|_p\|g\|_q$. Assume p is finite and prove that f*g belongs to the space of continuous functions on $\mathbb{R}$ tending to zero at infinity.
(2) Prove the approximation of the bias in (b) of Proposition 2.2 using a Taylor expansion and make the expansions for the $L^p$-risk precise.
(3) Prove the results of Equation (2.5).
(4) Write the variance of the kernel estimator for the marginal density of dependent observations $(X_i)_{i\leq n}$ in terms of the auto-covariance coefficients $\rho_j = n^{-1}\sum_{i=1}^n Cov(X_i, X_{i+j})$.
(5) Consider a hierarchical sample $(X_{ij}, Y_{ij})_{j=1,\ldots,J_i,\ i=1,\ldots,n}$, with n independent and finite sub-samples of $J_i$ dependent observations. Let $N = \sum_{i=1}^n J_i$ and $f = \lim_n N^{-1}\sum_{i=1}^n\sum_{j=1}^{J_i}f_{X_{ij}}$ be the limiting marginal mean density of the observations of X. Define an estimator of the density f and give the first order approximation of the variance of the estimator under relevant ergodicity conditions.
(6) Let $H(x) = \int_{-1}^x K(y)\,dy$ be the integrated kernel, F be the distribution function of X and $\widehat F_{n,h}(x) = n^{-1}\sum_{i=1}^n H_h(X_i-x)$ be a smooth estimator of the distribution function. Prove that the bias of $\widehat F_{n,h}(x)$ is $\frac12 h^2m_{2K}f^{(1)}(x) + o(h^2)$ and its variance $(nh)^{-1}\kappa_2 F(x) + o((nh)^{-1})$. Define the optimal local and global bandwidths for $\widehat F_{n,h}$.
Chapter 3
regression function
$$m(x) = E(Y|X=x) = f_X^{-1}(x)\int yf_{X,Y}(x,y)\,dy$$
and its denominator is $\widehat f_{X,n,h}(x)$. The mean of $\widehat\mu_{n,h}(x)$ and its limit are respectively
$$\mu_{n,h}(x) = \int\!\!\int yK_h(x-s)\,dF_{X,Y}(s,y), \qquad \mu(x) = \int yf_{X,Y}(x,y)\,dy = f_X(x)m(x),$$
whereas the mean of $\widehat m_{n,h}(x)$ is denoted $m_{n,h}(x)$. The notations for the parameters and estimators of the density f are unchanged. The variance of Y is supposed to be finite and its conditional variance is denoted
$$\sigma^2(x) = E(Y^2|X=x) - m^2(x),$$
$$E(Y^2|X=x) = f_X^{-1}(x)w_2(x) = \int y^2f_{Y|X}(y;x)\,dy, \quad\text{with}\quad w_2(x) = \int y^2f_{X,Y}(x,y)\,dy = f_X(x)\int y^2f_{Y|X}(y;x)\,dy.$$
Let also $\sigma_4(x) = E[\{Y-m(x)\}^4\,|\,X=x]$; these are supposed to be bounded functions. The $L^p$-risk of the kernel estimator of the regression function m is defined by its $L^p$-norm $\|\cdot\|_p = \{E|\cdot|^p\}^{1/p}$.
Proof. Note that Condition 3.1 implies that the kernel estimator of $f_X$ is bounded away from zero on $I_X$, which may be a sub-interval of the support of the variable X. Proposition 2.2 and the almost sure convergence to zero of $\sup_{x\in I_{X,h}}|\widehat\mu_{n,h} - \mu_{n,h}|$, proved by the same arguments as for the density, imply the assertion (a). The bias and the variance are similar for
where the second difference is a $o(h^2)$, using (3.8). A second order Taylor expansion of $f_{X,n,h}^{-1}(x)$ as n tends to infinity leads to
$$\frac{\mu_{n,h}(x)}{f_{X,n,h}(x)} = m(x) + \{b_{\mu,n,h}(x) - m(x)b_{f_X,n,h}(x)\}f_X^{-1}(x) + o(h^2),$$
where the non random term is a $o(h^4)$, by (3.8). The first term develops using twice the equality $y^{-1} = x^{-1} - (y-x)(xy)^{-1}$:
$$f_{X,n,h}(x)\Big\{\widehat m_{n,h}(x) - \frac{\mu_{n,h}(x)}{f_{X,n,h}(x)}\Big\} = \widehat\mu_{n,h}(x) - \mu_{n,h}(x) - m_{n,h}(x)\{\widehat f_{X,n,h}(x) - f_{X,n,h}(x)\} \qquad (3.9)$$
$$-\ \frac{\{\widehat\mu_{n,h}(x) - \mu_{n,h}(x)\}\{\widehat f_{X,n,h}(x) - f_{X,n,h}(x)\}}{f_{X,n,h}(x)} + \frac{\widehat m_{n,h}(x)\{\widehat f_{X,n,h}(x) - f_{X,n,h}(x)\}^2}{f_{X,n,h}(x)},$$
so that
$$f_{X,n,h}^2(x)\,E\Big\{\widehat m_{n,h}(x) - \frac{\mu_{n,h}(x)}{f_{X,n,h}(x)}\Big\}^2 = Var\{\widehat\mu_{n,h}(x)\} + m_{n,h}^2(x)Var\{\widehat f_{X,n,h}(x)\} - 2m_{n,h}(x)Cov\{\widehat\mu_{n,h}(x),\widehat f_{X,n,h}(x)\}$$
$$+\ \frac{\pi_{0,2,2}(x)}{f_{X,n,h}^2(x)} + 2\frac{\pi_{0,1,2}(x)}{f_{X,n,h}(x)} - 2\frac{m_{n,h}(x)\pi_{0,2,1}(x)}{f_{X,n,h}^2(x)} + \frac{\pi_{2,0,4}(x)}{f_{X,n,h}^2(x)} + 2\frac{\pi_{1,1,2}(x)}{f_{X,n,h}(x)} - 2m_{n,h}(x)\frac{\pi_{1,0,3}(x)}{f_{X,n,h}(x)} - 2\frac{\pi_{1,1,3}(x)}{f_{X,n,h}^2(x)},$$
where
$$\pi_{k,k',k''}(x) = E\,\widehat m_{n,h}^k(x)\{\widehat\mu_{n,h}(x) - \mu_{n,h}(x)\}^{k'}\{\widehat f_{X,n,h}(x) - f_{X,n,h}(x)\}^{k''}$$
for k ≥ 0, k' ≥ 0 and k'' ≥ 0. Since $\widehat m_{n,h}(x)$ is bounded, Cauchy-Schwarz inequalities and the order of the moments of $\widehat\mu_{n,h}(x)$ and $\widehat f_{n,h}(x)$ imply that all terms $\pi_{k,k',k''}(x)$ in the above expression are $O((nh)^{-(k'+k'')/2})$, so they are $o((nh)^{-1})$ except the covariance term $\pi_{0,1,1}(x)$. Using the first order expansions of the means, $f_{X,n,h}(x) = f_X(x) + O(h^2)$ and $m_{n,h}(x) = m(x) + O(h^2)$. It follows that
$$v_{m,n,h}(x) = f_X^{-2}(x)\big[Var\{\widehat\mu_{n,h}(x)\} + m^2(x)Var\{\widehat f_{X,n,h}(x)\} - 2m(x)Cov\{\widehat\mu_{n,h}(x),\widehat f_{X,n,h}(x)\}\big] + o(n^{-1}h^{-1})$$
and the convergence to zero of the last term $r_{n,h}$ in (3.2) is satisfied. The other results are obtained by simple calculus.
The minimax property of the estimator $\widehat m_{n,h}$ is established by the same method as for density estimation.
For p ≥ 2,
$$\|\widehat f_{n,h}^{-1}(x) - f_{X,n,h}^{-1}(x)\|_p = O((nh)^{-1/p}),$$
and the decreasing order of the moments of the kernel estimator of the density implies
$$E\Big|\sum_{k\geq 1}(-a_n)^k\Big|^p = E|a_n|^p + o(E|a_n|^p).$$
Lemma 3.2. Under Conditions 2.1, the biases of $\widehat\mu_{n,h}(x)$ and $\widehat m_{n,h}(x)$ are uniformly approximated as
$$b_{\mu,n,h}(x; s) = \frac{h^s}{s!}m_{sK}\int y\,\frac{\partial^s f_{X,Y}(x,y)}{\partial x^s}\,dy + o(h^s),$$
$$b_{m,n,h}(x; s) = \frac{h^s}{s!}m_{sK}f_X^{-1}(x)\{\mu^{(s)}(x) - m(x)f_X^{(s)}(x)\} + o(h^s), \qquad (3.11)$$
for s ≥ 2, and their variances develop as in Proposition 3.1.
Proposition 3.2. Under Conditions 2.1 and 3.1 with s = 2, for every x in $I_{X,h}$,
$$(nh)^{1/2}(\widehat m_{n,h} - m) = (nh)^{1/2}f_X^{-1}\{(\widehat\mu_{n,h} - \mu_{n,h}) - m(\widehat f_{X,n,h} - f_{X,n,h})\} + (nh^5)^{1/2}b_m + \widehat r_{n,h}, \qquad (3.12)$$
and the remainder term of (3.12) satisfies
$$\sup_{x\in I_{X,h}}\|\widehat r_{n,h}\|_2 = O((nh)^{-1/2}).$$
$$+\ (nh)^{1/2}\big(f_{X,n,h}^{-1}\mu_{n,h} - m\big)\sum_{k\geq 1}\Big(-\frac{\widehat f_{X,n,h} - f_{X,n,h}}{f_{X,n,h}}\Big)^k.$$
By Lemma 3.1 and Proposition 3.1, the first term is a $O((nh)^{-1/2})$. The second order uniform approximation
$$f_{X,n,h}^{-1}(x)\mu_{n,h}(x) - m(x) = h^2b_\mu(x) + O(h^4) \qquad (3.14)$$
implies that the second term in the sum is a $O(h^4) = O((nh)^{-1})$, as a consequence of Condition 2.1. By Lemma 3.1 and (3.14), the third term is a $O((nh)^{1/2}h^2(nh)^{-1/2})$; it is therefore a $O((nh)^{-1/2})$.
For a regression function of class $C_s$, s ≥ 2, the $L^2$-norm of the remainder term $\widehat r_{n,h}$ is given by the next proposition.
Proposition 3.3. Under Conditions 2.1, 2.2 and 3.1, for every s ≥ 2 the remainder term of (3.12) satisfies the uniform bound
$$\sup_{I_{X,h}}\|\widehat r_{n,h}\|_2 = O((nh)^{-1/2}).$$
Propositions 2.2 and 3.1, Equation (3.2), and Propositions 3.2 and 3.3 determine an upper bound for the norm $\|\widehat m_{n,h} - m_{n,h}\|_p$ of the estimator of m:
$$\|\widehat m_{n,h} - m_{n,h}\|_p = \|f_X^{-1}\{(\widehat\mu_{n,h} - \mu_{n,h}) - m(\widehat f_{X,n,h} - f_{X,n,h})\}\|_p + O((nh)^{-1/2}\|\widehat r_{n,h}\|_p)\leq 2^{p-1}\big[\sup_{I_X}f_X^{-1}\{\|\widehat\mu_{n,h} - \mu_{n,h}\|_p$$
Results similar to those of Propositions 3.1 and 3.2 for the regression curve hold for $m_p(x)$:
$$(nh)^{1/2}(\widehat m_{p,n,h} - m_{p,n,h}) = (nh^5)^{1/2}b_{p,m} + (nh)^{1/2}f_X^{-1}\{(\widehat\mu_{p,n,h} - \mu_{p,n,h}) - m_p(\widehat f_{X,n,h} - f_{X,n,h})\} + \widehat r_{p,n,h},$$
$$\sup_{x\in I_{X,h}}\|\widehat r_{p,n,h}\|_2 = O((nh)^{-1/2}).$$
The expectation of $\widehat\mu_{n,h}^{(1)}(x)$ expands as
$$E\widehat\mu_{n,h}^{(1)}(x) = (mf_X)^{(1)}(x) + \frac{h^2}{2}m_{2K}(mf_X)^{(3)}(x) + o(h^2);$$
then $\widehat m_{n,h}^{(1)}$ converges uniformly to $f_X^{-1}\{(mf_X)^{(1)} - mf_X^{(1)}\} = m^{(1)}$ as h tends to zero. The bias of $\widehat m_{n,h}^{(1)}(x)$ is
$$\frac{h^2}{2}m_{2K}f_X^{-1}(x)\{(mf_X)^{(3)} - mf_X^{(3)}\}(x).$$
Its variance is obtained by an application of Proposition 3.1 to equation (3.15); its convergence rate is $(nh^3)^{-1}$ (see Appendix A) and the optimal global bandwidth for estimating $m^{(1)}$ follows. For the second derivative,
$$\widehat m_{n,h}^{(2)}(x) = \frac{\sum_{i=1}^n Y_iK_h^{(2)}(x-X_i)}{\sum_{i=1}^n K_h(x-X_i)} - 2\,\frac{\{\sum_{i=1}^n Y_iK_h^{(1)}(x-X_i)\}\{\sum_{i=1}^n K_h^{(1)}(x-X_i)\}}{\{\sum_{i=1}^n K_h(x-X_i)\}^2} - \frac{\{\sum_{i=1}^n Y_iK_h(x-X_i)\}\{\sum_{i=1}^n K_h^{(2)}(x-X_i)\}}{\{\sum_{i=1}^n K_h(x-X_i)\}^2} + 2\,\frac{\{\sum_{i=1}^n Y_iK_h(x-X_i)\}\{\sum_{i=1}^n K_h^{(1)}(x-X_i)\}^2}{\{\sum_{i=1}^n K_h(x-X_i)\}^3};$$
the estimators $\widehat f_{n,h}^{(2)}$ and $\widehat\mu_{n,h}^{(2)}(x) = n^{-1}\sum_{i=1}^n Y_iK_h^{(2)}(x-X_i)$ converge uniformly to $f^{(2)}$ and $\mu^{(2)}$, respectively, with respective biases $\frac{h^2}{2}m_{2K}f_X^{(4)}(x) + o(h^2)$ and $\frac{h^2}{2}m_{2K}\mu^{(4)}(x) + o(h^2)$. The result extends to a general order of derivative k ≥ 1.
Proposition 3.5. Under Conditions 2.2 and 3.1 with $nh^{2k+2s+1} = O(1)$, for k ≥ 1, and functions m and $f_X$ in class $C_s(I_X)$, the estimator $\widehat m_{n,h}^{(k)}$ is a uniformly consistent estimator of the k-order derivative of the regression function; its bias is a $O(h^s)$, its variance a $O((nh^{2k+1})^{-1})$, and the optimal bandwidth is a $O(n^{-1/(2k+2s+1)})$.
Lemma 3.3. Under Conditions 3.1, there exist positive constants $C_1$ and $C_2$ such that for every x and y in $I_{X,h}$ satisfying |x − y| ≤ 2h,
$$E|\Delta(\widehat\mu_{n,h} - \mu_{n,h})(x,y)|^2\leq C_1(nh^3)^{-1}|x-y|^2,$$
$$E|\Delta(\widehat m_{n,h} - m_{n,h})(x,y)|^2\leq C_2(nh^3)^{-1}|x-y|^2;$$
if |x − y| > 2h, they are $O((nh)^{-1})$ and the estimators at x and y are independent.
Proof. Let x and y be in $I_{X,h}$ such that |x − y| ≤ 2h; $E|\widehat\mu_{n,h}(x) - \widehat\mu_{n,h}(y)|^2$ develops as the sum $n^{-1}\int w_2(u)\{K_{h_n}(x-u) - K_{h_n}(y-u)\}^2f(u)\,du + (1-n^{-1})\{\mu_{n,h_n}(x) - \mu_{n,h_n}(y)\}^2$. For an approximation of the integral, the Mean Value Theorem implies $K_{h_n}(x-u) - K_{h_n}(y-u) = (x-y)\varphi_n^{(1)}(z-u)$, where z is between x and y, and
$$\int\{K_{h_n}(x-u) - K_{h_n}(y-u)\}^2w_2(u)f(u)\,du = (x-y)^2\int\varphi_n^{(1)2}(z-u)w_2(u)f(u)\,du = (x-y)^2h_n^{-3}\Big\{w_2(x)f(x)\int K^{(1)2} + o(h_n)\Big\}.$$
Let $|x|\leq h_n$ and $|y|\leq h_n$; the order of the second moment $E|\widehat\mu_{n,h}(x) - \widehat\mu_{n,h}(y)|^2$ is a $O((x-y)^2(nh_n^3)^{-1})$ if |x − y| ≤ 2h_n, and it is the sum of $E\widehat\mu_{n,h}^2(x)$ and $E\widehat\mu_{n,h}^2(y)$ otherwise. This bound and Lemma 2.2 imply the same orders for the estimator of the regression function m.
Theorem 3.1. For h > 0, the process $U_{n,h} = (nh)^{1/2}\{\widehat m_{n,h} - m\}1_{\{I_{X,h}\}}$ converges in distribution to $\sigma_mW_1 + \gamma^{1/2}b_m$, where $W_1$ is a centered Gaussian process on $I_X$ with variance 1 and covariances zero.
Proof. For any x in $I_{X,h}$, from the approximation (3.2) of Proposition 3.1 and the weak convergences for $\widehat\mu_{n,h} - \mu_{n,h}$ and $\widehat f_{X,n,h} - f_{X,n,h}$, the variable $U_{n,h}(x)$ develops as $(nh)^{1/2}\{\widehat m_{n,h}(x) - m_{n,h}(x)\} + (nh^5)^{1/2}b_m(x) + o((nh^5)^{1/2})$, and it converges to a non centered distribution $\{W + \gamma^{1/2}b_m\}(x)$, where W(x) is the Gaussian variable with mean zero and variance $\sigma_m^2(x)$. In the same way, the finite dimensional distributions of the process $U_{n,h}$ converge weakly to those of $\{W + \gamma^{1/2}b_m\}$, where W is a Gaussian process with the same distribution as W(x) at x. The covariance matrix $\{\sigma^2(x_k,x_l)\}_{k,l=1,\ldots,m}$ between components $W(x_k)$ and $W(x_l)$ of the limiting process is the limit of
$$Cov\{U_{n,h}(x_k),U_{n,h}(x_l)\} = \frac{nh}{f_X(x_k)f_X(x_l)}\Big[Cov\{\widehat\mu_{n,h}(x_k),\widehat\mu_{n,h}(x_l)\} - m(x_k)Cov\{\widehat f_{X,n,h}(x_k),\widehat\mu_{n,h}(x_l)\} - m(x_l)Cov\{\widehat\mu_{n,h}(x_k),\widehat f_{X,n,h}(x_l)\} + m(x_k)m(x_l)Cov\{\widehat f_{X,n,h}(x_k),\widehat f_{X,n,h}(x_l)\} + o(1)\Big],$$
where the o(1) is deduced from Propositions 3.1, 3.2 and 3.3. For all integers k and l, let $\alpha_h = |x_l-x_k|/(2h)$ and let $v = \{(x_l+x_k)/2-s\}/h$ be in [0, 1], hence $h^{-1}(x_k-s) = v-\alpha_h$ and $h^{-1}(x_l-s) = v+\alpha_h$. By a Taylor expansion in a neighborhood of $(x_l+x_k)/2$, the integral of the first covariance term develops as
$$Cov\{\widehat\mu_{n,h}(x_k),\widehat\mu_{n,h}(x_l)\} = n^{-1}h^{-1}\,w_2\Big(\frac{x_k+x_l}{2}\Big)f_X\Big(\frac{x_k+x_l}{2}\Big)\int K(v-\alpha_h)K(v+\alpha_h)\,dv + o(n^{-1}h^{-1})$$
if $|x_k-x_l|\leq 2h$, and zero otherwise, with the notation (3.10). Similar expansions are satisfied for the other terms of the covariance. Using the following approximations for $|x_k-x_l|\leq 2h$: $w_2(\{x_k+x_l\}/2) = w_2(x_k)+o(1) = w_2(x_l)+o(1)$ and $f_X(\{x_k+x_l\}/2) = f_X(x_k)+o(1) = f_X(x_l)+o(1)$, the covariance of $U_{n,h}(x_k)$ and $U_{n,h}(x_l)$ is approximated by
$$\frac{Var(Y|X=x_k) + Var(Y|X=x_l)}{f_X(x_k)+f_X(x_l)}\,1_{\{0\leq\alpha_h<1\}}\int K(v-\alpha_h)K(v+\alpha_h)\,dv.$$
Due to the compactness of the support of K, the covariance is zero if $\alpha_h\geq 1$. For $x_k\neq x_l$, $\alpha_h$ tends to infinity as h tends to zero and $1_{\{0\leq\alpha_h<1\}}$ tends to zero as n tends to infinity; therefore the covariance of $U_{n,h}(x_k)$ and $U_{n,h}(x_l)$ is equal to $\delta_{k,l}+o(1)$, where $Var\,U_{n,h}(x_k)$ is defined in Proposition 3.1.
The tightness of the sequence $\{U_{n,h}\}$ on $I_{X,h}$ will follow from (i) the tightness of $\{U_{n,h}(a)\}$ and (ii) a bound of the increments $E|U_{n,h}(x_2) - U_{n,h}(x_1)|^2$ for $|x_2-x_1| < 2h$. For condition (i), let η > 0 and $c > \gamma^{1/2}|b_m(a)| + 2\eta^{-1}\{\sigma^2(a)\}^{1/2}$; then
$$\Pr\{|U_{n,h}(a)| > c\}\leq\Pr\big\{(nh)^{1/2}|(\widehat m_{n,h} - m_{n,h})(a)| + (nh)^{1/2}|b_{n,h}(a)| > c\big\}\leq\frac{Var\{(nh)^{1/2}(\widehat m_{n,h} - m_{n,h})(a)\}}{\{c - (nh)^{1/2}|b_{n,h}(a)|\}^2}$$
if the orthogonality of the basis entails that $EH_k(X-x)H_l(X-x)K_h(X-x)$ converges to zero as h tends to zero, for every k ≠ l ≤ p. This estimator is consistent and its behaviour is further studied by the same method as the estimator of the nonparametric regression.
A multidimensional regression function $m(X_1,\ldots,X_d)$ can be expanded in sums of univariate regression functions $E(Y|X_k = x)$ and their interactions, like a nonparametric analysis of variance, if the regression variables $(X_1,\ldots,X_d)$ generate orthogonal spaces. The orthogonality is a necessary condition for the estimation of the components of this expansion since
$$E\{YK_h(x_k-X_k)\} = \int E(Y|X=x)\,F_X(dx_1,\ldots,dx_{k-1},dx_{k+1},\ldots,dx_d) + o(1) = m(x_k)f_k(x_k) + o(1),$$
$$E\{YK_h(x_k-X_k)K_h(x_l-X_l)\} = C_Km(x_k,x_l)f_{X_k,X_l}(x_k,x_l) + o(1),$$
where $C_K = \int K(u)K(v)\,du\,dv$, and $E\{YK_h(x_k-X_k)K_h(x_l-X_l)\}f_{X_k,X_l}^{-1}(x_k,x_l)$ can be factorized or expanded as a sum of regression functions only if $X_k$ and $X_l$ belong to orthogonal spaces. The orthogonalisation of the space generated by a vector variable X can be performed by a preliminary principal component analysis providing orthogonal linear combinations of the initial variables.
The mean of $\widehat S_{n,h,\delta}(x)$ is denoted $S_{n,h,\delta}(x)$. By the uniform consistency of $\widehat m_{n,h}$, $\widehat S_{n,h,\delta}$ converges uniformly to S as n tends to infinity, with h and δ tending to zero. At $X_j$, it is written $\widehat S_{n,h,\delta}(X_j) = n^{-1}\sum_{i\neq j}\{Y_i - \widehat m_{n,h}(X_i)\}^2K_\delta(X_j-X_i) + o((nh)^{-1})$. The rate of convergence of $\delta_n$ to zero is governed by the degree of derivability of the variance function σ².
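A minimal sketch of this two-bandwidth estimator of the conditional variance follows; the Gaussian kernel, heteroscedastic model and bandwidths are assumed for illustration.

```python
import numpy as np

def nw(x_grid, X, W, h):
    """Nadaraya-Watson smoother of responses W at the points x_grid."""
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)
    return (K * W[None, :]).sum(axis=1) / K.sum(axis=1)

def conditional_variance(x_grid, X, Y, h, delta):
    """S_hat(x): kernel smoothing (bandwidth delta) of the squared residuals
    Y_i - m_hat(X_i), where m_hat uses its own bandwidth h."""
    residuals2 = (Y - nw(X, X, Y, h))**2
    return nw(x_grid, X, residuals2, delta)

rng = np.random.default_rng(10)
X = rng.uniform(0.0, 1.0, 500)
sigma = lambda x: 0.2 + 0.3 * x                    # heteroscedastic noise level
Y = np.sin(2 * np.pi * X) + sigma(X) * rng.normal(size=500)
x = np.linspace(0.1, 0.9, 33)
S_hat = conditional_variance(x, X, Y, h=0.06, delta=0.12)   # compare with sigma(x)**2
```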
Proposition 3.6. Under Conditions 2.1, 2.2 and 3.1, for every function μ in $C_s$, density $f_X$ in $C_r$ and variance function σ² in $C_k$,
$$E\{Y - \widehat m_{n,h}(x)\}^2 = \sigma^2(x) + O(h^{2(s\wedge r)}) + O((nh)^{-1});$$
the bias of the estimator $\widehat S_{n,h,\delta}(x)$ of σ²(x) defined by (3.17) is
$$\beta_{n,h,\delta}(x) = b_{m,n,h}^2(x)f_X(x) + \sigma_{m,n,h}^2(x)f_X(x) + \frac{\delta^{2k}}{(k!)^2}(\sigma^2(x)f_X(x))^{(2)} + o(\delta^{2k} + h^{2(s\wedge r)} + (nh)^{-1})$$
and its variance is written $(n\delta)^{-1}\{v_{\sigma^2} + o(1)\}$ with $v_{\sigma^2}(x) = \kappa_2Var\{(Y-m(x))^2|X=x\}$. The process $(n\delta)^{1/2}(\widehat\sigma_{n,h,\delta}^2 - \sigma^2 - \beta_{n,h,\delta})$ converges weakly to a Gaussian process with mean zero, variance $v_{\sigma^2}$ and covariances zero.
Proof. Using Proposition 2.2 and Lemma 3.2, the mean squared error for $\widehat m_{n,h}$ at x is $E[\{Y - \widehat m_{n,h}(x)\}^2|X=x]$ and it is expanded as $\sigma^2(x) + b_{m,n,h}^2(x) + \sigma_{m,n,h}^2(x) + E[\{Y-m(x)\}\{m(x) - \widehat m_{n,h}(x)\}|X=x]$, where the last term is zero. For the variance of $\widehat S_{n,h,\delta}(x)$, the fourth conditional moment $E[\{Y - \widehat m_{n,h}(x)\}^4|X=x]$ is the conditional expectation of $\{(Y-m(x)) + (m-m_{n,h})(x) + (m_{n,h}-\widehat m_{n,h})(x)\}^4$ and it is expanded in a sum of $\sigma_4(x) = E[\{Y-m(x)\}^4|X=x]$, a bias term $b_{m,n,h}^4(x) = O(h^{8(s\wedge r)})$,
The weak convergence of the process $(nh^3)^{1/2}(\widehat m_{n,h}^{(1)} - m^{(1)})$ (Proposition 3.5) determines the convergence rate of $\widehat M_{m,n,h} - M_m$ as $(nh^3)^{-1/2}$ and implies the asymptotic behaviour of the estimator $\widehat M_{m,n,h}$.
Proposition 3.7. Under Conditions 2.1, 2.2 and 3.1, $(nh^3)^{1/2}(\widehat M_{m,n,h} - M_m)$ converges weakly to a centered Gaussian variable with finite variance $m^{(2)-2}(M_m)Var\,\widehat m_{n,h}^{(1)}(M_m)$.
If the regression function belongs to $C_3(I_X)$, the bias of $m^{(1)}(\widehat M_{m,n,h})$ is deduced from the bias of the process $\widehat m_{n,h}^{(1)}$ defined by (3.15); it equals
$$Em^{(1)}(\widehat M_{m,n,h}) = -\frac{h^2}{2}m_{2K}f_X^{-1}\{(mf_X)^{(3)} - mf_X^{(3)}\}(M_m) + o(h^2)$$
and does not depend on the degree of derivability of the regression function m. All results are extended to the search of the local maxima and minima of the function m, the local minima being local maxima of −m. The maximization of the function on the interval $I_X$ is then replaced by sequential maximizations or minimizations.
the estimator $\widehat\Lambda_{Y|X,n,h}$ is unbiased and $\widehat F_{Y|X,n,h}$ is the Kaplan-Meier estimator for the distribution function of Y conditional on {X = x}. The regression function m is then estimated by
$$\widehat m_{n,h}(x) = \int y\,\widehat F_{Y|X,n,h}(dy; x) = \sum_{i=1}^n Y_i\{1 - \widehat F_{Y|X,n,h}(Y_i^-; x)\}\frac{J_n(Y_i; x)}{Y_n(Y_i; x)}.$$
The estimators satisfy: $\sup_{I_X\times I}|\widehat\Lambda_{Y|X,n,h} - \Lambda_{Y|X}|$, $\sup_{I_{X,Y}}|\widehat F_{Y|X,n,h} - F_{Y|X}|$ and $\sup_{I_X}|\widehat m_{n,h} - m|$ converge in probability to zero as n tends to infinity, for every compact subinterval I of $I_Y$. For every $y\leq\max Y_i^*$, the conditional Kaplan-Meier estimator, given x in $I_{X,n,h}$, still satisfies
$$\frac{F_{Y|X} - \widehat F_{Y|X,n,h}}{1 - F_{Y|X}}(y; x) = \int_{-\infty}^y\frac{1 - \widehat F_{Y|X,n,h}(s^-; x)}{1 - F_{Y|X}(s; x)}\,d(\widehat\Lambda_{Y|X,n,h} - \Lambda_{Y|X})(s; x). \qquad (3.21)$$
The mean of this integral with respect to a centered martingale is zero, so the conditional Kaplan-Meier estimator and $\widehat\Lambda_{Y|X,n}$ are unbiased estimators. The bias of the estimator of the regression function for censored variables Y is then a $O(h^2)$.
with the constraint $\sum_{k=1}^K P(Y=k|X=x) = 1$ for every x. This reparametrization of the conditional probabilities $\alpha_k$ is not restrictive, though it is called the logistic model.
Estimating first the support of the regression variable reduces the number of unknown parameters to 2(K − 1), the thresholds of the classes and their probabilities, for k ≤ K − 1, in addition to the nonparametric regression function m. The probability functions $\pi_k(x)$ are estimated by the proportions $\widehat\pi_{n,k}(x)$ of observations of the variable Y in class k, conditionally on the regressor value x. Let
$$U_{ik} = \log\frac{\widehat\pi_{n,k}(X_i)}{1-\widehat\pi_{n,k}(X_i)}, \quad i = 1,\ldots,n,$$
calculated from the observations $(X_i, Y_i)_{i=1,\ldots,n}$ such that $Y_i = k$. The variations of the regression function m between two values x and y are estimated by
$$\widehat m_{n,h}(x) - \widehat m_{n,h}(y) = K^{-1}\sum_{k=1}^K\Big\{\frac{\sum_{i=1}^n U_{ik}K_h(X_i-x)}{\sum_{i=1}^n K_h(X_i-x)} - \frac{\sum_{i=1}^n U_{ik}K_h(X_i-y)}{\sum_{i=1}^n K_h(X_i-y)}\Big\}.$$
This estimator yields an estimator for the derivative of the regression function, $\widehat m_{n,h}^{(1)}(x) = \lim_{|x-y|\to 0}(x-y)^{-1}\{\widehat m_{n,h}(x) - \widehat m_{n,h}(y)\}$, which is written as the mean over the classes of the derivative estimator (3.15) with response variables $U_{ik}$. Integrating the mean derivative provides a nonparametric estimator of the regression function m. The bounds of the classes cannot be identified without observations of the underlying continuous variable Z; thus the odds ratio allows the removal of the unidentifiable parameters from the model for the observed variables.
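A minimal sketch of the estimated variation $\widehat m_{n,h}(x) - \widehat m_{n,h}(y)$ follows; it uses cumulative class proportions, a variant assumption consistent with an underlying logistic latent variable (logit $P(Y\leq k\,|\,x) = c_k - m(x)$, so the threshold $c_k$ cancels in the difference), and the simulated data and bandwidth are illustrative.

```python
import numpy as np

def nw(pt, Xs, W, h):
    """Nadaraya-Watson smoother of W at the point pt."""
    Kw = np.exp(-0.5 * ((pt - Xs) / h)**2)
    return (Kw * W).sum() / Kw.sum()

def m_variation(x, y, X, Ylab, K_classes, h):
    """Estimate m(x) - m(y) by averaging over k the differences of the
    logits of the smoothed cumulative class proportions P(Y <= k | X)."""
    diffs = []
    for k in range(1, K_classes):               # k = K is degenerate (prob. 1)
        ind = (Ylab <= k).astype(float)
        p_x, p_y = nw(x, X, ind, h), nw(y, X, ind, h)
        U_x, U_y = np.log(p_x / (1 - p_x)), np.log(p_y / (1 - p_y))
        diffs.append(U_y - U_x)                 # the threshold c_k cancels
    return np.mean(diffs)

rng = np.random.default_rng(11)
X = rng.uniform(0.0, 1.0, 600)
Z = 2.0 * X + rng.logistic(size=600)            # latent variable, m(x) = 2x
Ylab = np.digitize(Z, bins=[0.8, 1.6]) + 1      # 3 ordered classes
print(m_variation(0.8, 0.3, X, Ylab, 3, h=0.1))   # roughly 2*(0.8 - 0.3) = 1.0
```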
With a multidimensional regression variable X, the single-index model or a transformation model (Chapter 7) reduces the dimension of the variable and speeds up the convergence of the estimators.
and its denominator is $\widehat f_{X,T,h}(x)$. The mean of $\widehat\mu_{T,h}(x)$ and its limit are respectively
$$\mu_{T,h}(x) = \int_{I_{X,Y}}yK_h(x-u)\,dF_{X,Y}(u,y), \qquad \mu(x) = \int_{I_{X,Y}}yf_{X,Y}(x,y)\,dy = f_X(x)m(x).$$
and the optimal local and global bandwidths minimizing the mean squared (integrated) errors are $O(T^{-1/(2s+1)})$:
$$h_{AMSE,T}(x) = \Big\{\frac{\sigma_m^2(x)}{2s\,T\,b_m^2(x;s)m(x)}\Big\}^{1/(2s+1)}$$
and, for the asymptotic mean integrated squared error criterion,
$$h_{AMISE,T} = \Big\{\frac{\int\sigma_m^2(x)\,dx}{2s\,T\int b_m^2(x;s)m(x)\,dx}\Big\}^{1/(2s+1)}.$$
With the optimal bandwidth rate, the asymptotic mean (integrated) squared errors are $O(T^{-2s/(2s+1)})$. The same expansions as for the variances of $\widehat\mu_{T,h}(x)$ and $\widehat f_{X,T,h}(x)$ in Section 2.10 prove that the finite dimensional distributions of the process $(Th_T)^{1/2}(\widehat f_{T,h} - f - b_{T,h})$ converge to those of a centered Gaussian process with mean zero, covariances zero and variance $\kappa_2f(x)$ at x. Lemma 3.3 generalizes, and the increments are approximated as $E|\Delta(\widehat m_{T,h} - m_{T,h})(x,y)|^2 = O(|x-y|^2(Th_T^3)^{-1})$ for every x and y in $I_{X,h}$ such that |x − y| ≤ 2h_T. Then the process $(Th_T)^{1/2}\{\widehat m_{T,h} - m\}1_{\{I_{X,T}\}}$ converges weakly to $\sigma_mW_1 + \gamma^{1/2}b_m$, where $W_1$ is a centered Gaussian process on $I_X$ with variance 1 and covariances zero.
3.11 Exercises
(1) Detail the proof for the approximations of the biases and variances of
Proposition 3.1.
Chapter 4
4.1 Introduction
The increasing intervals $I_{X,h_n}$ are now defined with respect to the uniform norm of the function $h_n$ by $I_{X,h_n} = \{s\in I_X;\,[s-\|h_n\|, s+\|h_n\|]\subset I_X\}$. The main results of the previous chapters are extended to kernel estimators with functional bandwidth sequences satisfying this convergence rate. That is the case of the kernel estimators built with estimated optimal local bandwidths calculated from independent observations.
The second point of this chapter is the definition of an adaptive estimator of the bandwidth, when the degree of derivability of the density varies in its domain of definition, and the behaviour of the estimator of
Let us consider the random process $U_{n,h_n}(x) = (nh_n(x))^{1/2}\{\widehat f_{n,h_n(x)}(x) - f(x)\}$ for x in $I_{X,h_n}$. Under Conditions 2.1 and 4.1, $\sup_I|\widehat f_{n,h_n(x)} - f(x)|$ converges a.s. to zero for every compact subinterval I of $I_{X,h_n}$, and $\|\widehat f_{n,h_n(x)} - f(x)\|_p$ tends to zero as n tends to infinity. The bias of $\widehat f_{n,h_n(x)}(x)$ is $b_{n,h_n}(x) = \frac12 h_n^2(x)m_{2K}f^{(2)}(x) + o(\|h_n\|^2)$, its variance is $Var\{\widehat f_{n,h_n(x)}(x)\} = (nh_n(x))^{-1}\kappa_2f(x) + o(n^{-1}\|h_n^{-1}\|)$ and $\|\widehat f_{n,h_n(x)}(x) - f_{n,h_n(x)}(x)\|_p = O((n^{-1}\|h_n^{-1}\|)^{1/p})$.
Under Conditions 2.1-4.1, for a density of class $C_s(I_X)$ and for every x in $I_{X,h}$, the moments of order p ≥ 2 are unchanged and the bias of $\widehat f_{n,h_n(x)}(x)$ is modified as
$$b_{n,h_n}(x; s) = \frac{h_n^s(x)}{s!}m_{sK}f^{(s)}(x) + o(\|h_n\|^s).$$
The MISE and the optimal local bandwidth are similar to those of Chapter 2, using these expressions.
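A minimal sketch of the kernel estimator with a location-dependent bandwidth function $h_n(x)$ follows; the bandwidth function and data are assumed for illustration.

```python
import numpy as np

def kde_local_bandwidth(x_grid, sample, h_fun):
    """f_hat(x) = (n h_n(x))^{-1} sum_i K((x - X_i) / h_n(x)): the kernel
    estimator evaluated with the bandwidth attached to the point x."""
    h = h_fun(x_grid)                          # h_n(x), one value per grid point
    u = (x_grid[:, None] - sample[None, :]) / h[:, None]
    K = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)
    return K.mean(axis=1) / h

rng = np.random.default_rng(12)
sample = rng.normal(size=800)
n = len(sample)
# illustrative bandwidth function, wider in the tails where f is small
h_fun = lambda x: n**(-0.2) * (0.5 + 0.5 * np.abs(x))
f_hat = kde_local_bandwidth(np.linspace(-3, 3, 121), sample, h_fun)
```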
For every u in [−1, 1], let $\alpha_n$ and v in [−1, 1], with |u| in $[0, \{x+h_n(x)\}\wedge\{y+h_n(y)\}]$, be defined by
$$\alpha_n(x,y,u) = \frac12\{(u-x)h_n^{-1}(x) - (u-y)h_n^{-1}(y)\}, \qquad (4.1)$$
$$v = v_n(x,y,u) = \frac12\{(u-x)h_n^{-1}(x) + (u-y)h_n^{-1}(y)\} = \frac{1}{2h_n(x)h_n(y)}[\{h_n(x)+h_n(y)\}u - xh_n(y) - yh_n(x)],$$
$$u = u_n(x,y,v) = \{h_n(x)+h_n(y)\}^{-1}\{xh_n(y) + yh_n(x) + 2vh_n(x)h_n(y)\},$$
$$z_n(x,y) = \{h_n(x)+h_n(y)\}^{-1}\{xh_n(y) + yh_n(x)\},$$
$$\delta_n(x,y) = 2h_n(x)h_n(y)\{h_n(x)+h_n(y)\}^{-1} = o(1).$$
Lemma 4.1. The covariance of $\widehat f_{n,h}(x)$ and $\widehat f_{n,h}(y)$ equals
$$\frac{2}{n\{h_n(x)+h_n(y)\}}\Big[f(z_n(x,y))\int K(v-\alpha_n(v))K(v+\alpha_n(v))\,dv + \delta_n(x,y)f^{(1)}(z_n(x,y))\int vK(v-\alpha_n(v))K(v+\alpha_n(v))\,dv + o(\|h_n\|)\Big].$$
Proof. By the Mean Value Theorem, for every x and y in IX,h there
exists s between x and y such that |fn,hn (x) − fn,hn (y)| = |x − y|f (1) (s)
and
Using the notations (4.1), the first term of this sum is expanded as
Z
1
S1n = {hn (x)K(v − αn (v))
nhn (x)hn (y){hn (x) + hn (y)}
−hn (y)K(v + αn (v))}2 f (zn (v)) dv.
The derivability of the bandwidth functions implies
$$\frac{1}{h_n(x)h_n(y)}\int\{h_n(x)K(v-\alpha_n)-h_n(y)K(v+\alpha_n)\}^2 f(z_n)\,dv
\le 2\Big[\frac{h_n(x)}{h_n(y)}\int\{K(v-\alpha_n)-K(v+\alpha_n)\}^2 f(z_n)\,dv
+\frac{\{h_n(x)-h_n(y)\}^2}{h_n(x)h_n(y)}\int K^2(v-\alpha_n)f(z_n)\,dv\Big],$$
$$S_{1n}\le\frac{2}{n\{h_n(x)+h_n(y)\}}\Big[\frac{h_n(x)}{h_n(y)}\int f(z)\{K(v-\alpha_n)-K(v+\alpha_n)\}^2\,dv
+(x-y)^2\,\frac{h_n^{(1)2}(\eta(x-y))}{h_n(x)h_n(y)}\int K^2(v-\alpha_n)f(z_n)\,dv\Big],$$
where $\eta$ lies in $(-1,1)$; by the Mean Value Theorem, $h_n^{(1)2}(\eta)$ and $h_n(x)h_n(y)$ have the same order, and
$$\int\{K(v-\alpha_n)-K(v+\alpha_n)\}^2\,dv=4\alpha_n^2\int K^{(1)2}(v)\,dv=O(|x-y|^2\|h_n^{-1}\|^2).$$
Theorem 4.1. Under the conditions, for a density $f$ of class $C^s(I_X)$ and a varying bandwidth sequence such that $n\|h_n\|^{2s+1}$ converges to $\|h\|$, the process
$$U_{n,h_n}(x)=(nh_n(x))^{1/2}\{\hat f_{n,h_n(x)}-f(x)\}I\{x\in I_{X,\|h_n\|}\}$$
converges weakly to the process defined on $I_X$ as $W_f(x)+h^{1/2}(x)b_f(x)$, where $W_f$ is a continuous centered Gaussian process with covariance $\sigma_f^2(x)\delta_{\{x,x'\}}$ between $W_f(x)$ and $W_f(x')$.
The quadratic variations of the bias, $\{f_{n,h_n(x)}(x)-f(x)-f_{n,h_n(y)}(y)+f(y)\}^2$, are bounded by
$$\Big|\int K(z)\{f(x+h_n(x)z)-f(x)-f(y+h_n(y)z)+f(y)\}\,dz\Big|^2+o(\|h_n\|^2)
=m_{2K}\Big\{f_X(x)-h_n(x)f^{(1)}(x)\int zK^2(z)\,dz+o(\|h_n\|)\Big\};$$
the first order approximations of the variances are identical and their second order approximations have opposite signs.
Let us consider the variable bandwidth kernel estimator $\hat m_{n,h_n(x)}(x)$ of the regression function $m$ and the random process related to the estimated regression function $U_{m,n,h_n}(x)=(nh_n(x))^{1/2}\{\hat m_{n,h_n(x)}-m(x)\}I\{x\in I_{X,\|h_n\|}\}$. Conditions 2.1 and 4.1 for kernel estimators of densities with variable bandwidth are supposed to be satisfied in addition to Conditions 3.1 for kernel estimators of regression functions. Then $\sup_{x\in I_{X,\|h_n\|}}|\hat m_{n,h_n(x)}(x)-m(x)|$ converges a.s. to zero with the uniform approximations
$$m_{n,h_n(x)}(x)=\frac{\mu_{n,h_n(x)}(x)}{f_{X,n,h_n(x)}(x)}+O((n\|h_n\|)^{-1}),$$
$$(nh_n(x))^{1/2}\{\hat m_{n,h_n(x)}-m_{n,h_n(x)}\}(x)=(nh_n(x))^{1/2}\big[(\hat\mu_{n,h_n(x)}-\mu_{n,h_n(x)})(x)
-m(x)(\hat f_{X,n,h_n(x)}-f_{X,n,h_n(x)})(x)\big]f_X^{-1}(x)+r_{n,h_n(x)},$$
and the covariance expansion continues as
$$+\ \delta_n(x,y)f_X^{-2}(z_n(x,y))\{w_2-m^2f_X\}^{(1)}(z_n(x,y))\int vK(v-\alpha_n(v))K(v+\alpha_n(v))\,dv+o(\|h_n\|)\Big].$$
Theorem 4.2. Under the conditions, for a density $f$ of class $C^s(I_X)$ and a varying bandwidth sequence such that $n\|h_n\|^{2s+1}$ converges to $\|h\|$, the process $U_{n,h_n}$ converges weakly to the process defined on $I_X$ as $W_m+h^{1/2}b_m$, where $W_m$ is a continuous centered Gaussian process with covariance $\sigma_m^2(x)\delta_{\{x=x'\}}$ at $x$ and $x'$.
then a new estimator of the regression function is defined using this estimator as a weighting process $\hat w_n=\hat\sigma^{-1}_{n,h_n,\delta_n}$ in the estimator of the regression function
$$\hat m_{\hat w_n,n,h_n(x)}(x)=\frac{\sum_{i=1}^n\hat w_n(X_i)Y_iK_{h_n(x)}(x-X_i)}{\sum_{i=1}^n\hat w_n(X_i)K_{h_n(x)}(x-X_i)}.$$
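A minimal sketch of this weighted kernel regression estimator follows; it assumes a Gaussian kernel and, for illustration only, a pilot variance function that is treated as known (in the text the weights come from the estimator $\hat\sigma^{-1}_{n,h_n,\delta_n}$).

```python
import numpy as np

def nw_weighted(x, X, Y, w, h):
    """Weighted Nadaraya-Watson estimator m_hat_{w,n,h}(x) with a local
    bandwidth h(x); Gaussian kernel as a stand-in for K."""
    hx = h(x)
    k = np.exp(-0.5 * ((x - X) / hx) ** 2)
    return np.sum(w * Y * k) / np.sum(w * k)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 400)
Y = np.sin(np.pi * X) + 0.2 * (1 + X**2) * rng.normal(size=400)
# weights w_hat = sigma^{-1}; here a pilot variance assumed known
sigma2_pilot = 0.04 * (1 + X**2) ** 2
w = sigma2_pilot ** -0.5
m_hat = [nw_weighted(x, X, Y, w, lambda u: 0.15)
         for x in np.linspace(-0.8, 0.8, 9)]
```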
The bias and variance of the estimator $\hat\sigma^2_{n,h_n(x),\delta_n(x)}(x)$ and of the fixed bandwidth estimator of $\sigma^2(x)$ are still similar. The biases of $\hat m_{\hat w_n,n,h_n(x)}(x)$ and $\hat m_{w,n,h_n(x)}(x)$ have the same approximations, the variance of $\hat m_{w,n,h_n(x)}(x)$ is identical to the variance of $\hat m_{n,h_n(x)}(x)$, whereas the variance of $\hat m_{\hat w_n,n,h_n(x)}(x)$ is modified as with the fixed bandwidth estimator. The weak convergence Theorem 4.2 extends to the weighted regression estimator.
The process $(X_t,Y_t)_{t\ge0}$ is supposed to be ergodic, satisfying the properties (2.13) and (2.16). Under the same conditions as in Chapter 3, the regression function $m$ is estimated on an interval $I_{X,Y,T,\|h_T\|}$ by the kernel estimator
$$\hat m_{T,h_T}(x)=\frac{\int_0^T Y_sK_{h_T(x)}(x-X_s)\,ds}{\int_0^T K_{h_T(x)}(x-X_s)\,ds}.$$
The bias and variances established in Section 3.10 for the functions $f$ and $m$ of class $C^s$ and fixed bandwidth $h_T$ are modified, with the notation $\mu=mf$, and the covariance of $\hat m_{T,h_T(x)}(x)$ and $\hat m_{T,h_T(y)}(y)$ is a $o((T\|h_T\|)^{-1})$. The weak convergence of the process $(Th_T(x))^{1/2}\{\hat m_{T,h_T(x)}(x)-m(x)\}$ is then proved by the same methods, under the ergodicity properties.
In a model with a variance function, the regression function is also estimated using a weighting process $\hat w_T=\hat\sigma^{-1}_{T,h_T,\delta_T}$ in the estimator of the regression function
$$\hat\sigma^2_{T,h_T,\delta_T}(x)=\frac{\int_0^T\{Y_s-\hat m_{T,h_T(X_s)}(X_s)\}^2K_{\delta_T(x)}(x-X_s)\,ds}{\int_0^T K_{\delta_T(x)}(x-X_s)\,ds},$$
$$\hat m_{\hat w_T,T,h_T(x)}(x)=\frac{\int_0^T\hat w_T(X_s)Y_sK_{h_T(x)}(x-X_s)\,ds}{\int_0^T\hat w_T(X_s)K_{h_T(x)}(x-X_s)\,ds}.$$
The previous modifications of the bias and variance of the estimator extend to the continuously observed process $(X_t)_{t\le T}$.
4.5 Exercises
(1) Compute the fixed and varying optimal bandwidths for the estimation
of a density and compare the respective density estimators.
(2) Give the expressions of the first moments of the varying bandwidth estimator of the conditional probability $p(x)=P(Y=1\mid X=x)$ for a binary variable $Y$, conditionally on the value of a continuous variable $X$ (Exercise 3.10-(2)).
Chapter 5
5.1 Introduction
The results for the bias of the estimator $\hat F_{Y|X,n,h}$ extend to a density $f_X$ in $C^s$ as in Lemma 3.2. The weak convergence of the bivariate process $(nh)^{1/2}(\hat F_{Y|X,n,h}-F_{Y|X})$ defined on $I_{X,Y,h}$ requires an extension of the previous results, as for the empirical distribution function of $Y$. The bias of the conditional quantile estimator is
$$b_Y(u;x)=-h^2\,\frac{b_F}{f_{Y|X}}\circ Q_Y(u;x)+o(h^2),$$
and its variance is
$$v_Y(u;x)=(nh)^{-1}\,\frac{v_F}{\{f_{Y|X}\}^2}\circ Q_Y(u;x)+o((nh)^{-1}).$$
$$\cdots+\mathrm{Var}\{b_{F,n,h}\circ\hat Q_{Y,n,h}(u)\}
=(nh)^{-1}E\{v_F\circ\hat Q_{Y,n,h}(u)\}+h^4\,\mathrm{Var}\{b_F\circ\hat Q_{Y,n,h}(u)\}+o(n^{-1}h^{-1}+h^4).\qquad(5.11)$$
The moments of order $l\ge3$ of $\hat\eta_{Y,n,h}$ are bounded using an expansion of the moments of the sum in Equation (5.8) by
$$2^l\big\{|b_{Y,n,h}\circ\tilde Q_{Y,n,h}(u)|^l+E|(\hat F_{Y|X,n,h}-F_{Y|X})\circ\hat Q_{Y,n,h}(u)|^l\big\},$$
thus
$$E|\hat\eta_{Y,n,h}(u)|^l\le 2^l|b_{Y,n,h}\circ\tilde Q_{Y,n,h}(u)|^l
+2^lE\big[|b_{Y,n,h}\circ\hat Q_{Y,n,h}(u)|^l
+E\{|(\hat F_{Y|X,n,h}-F_{Y|X,n,h})\circ\hat Q_{Y,n,h}(u)|^l\mid\hat Q_{Y,n,h}(u)\}\big].$$
$$\hat Q_{Y,n,h}(u)=\tilde Q_{Y,n,h}(u)+\frac{\hat\eta_{Y,n,h}(u)}{f_{Y|X}\circ Q_Y(u)}
+O(h^2\hat\eta_{Y,n,h}(u))+O(\hat\eta^2_{Y,n,h}(u)),\qquad(5.13)$$
$$b_F\circ\hat Q_{Y,n,h}(u)=b_F\circ\tilde Q_{Y,n,h}(u)
+\hat\eta_{Y,n,h}(u)\,\frac{b_f\circ\tilde Q_{Y,n,h}(u)}{F_{Y|X}\circ Q_Y(u)}
+O(h^2\hat\eta_{Y,n,h}(u))+O(\hat\eta^2_{Y,n,h}(u)),$$
$$E\{b_F\circ\hat Q_{Y,n,h}(u)\}=b_F\circ\tilde Q_{Y,n,h}(u)+o(1)=b_F\circ Q_Y(u)+o(1).$$
Moreover, $\mathrm{Var}\{b_F\circ\hat Q_{Y,n,h}(u)\}=O(h^4+n^{-1}h^{-1})$ because of the approximations $\mathrm{Var}\{\hat\eta_{Y,n,h}(u)\}=O(h^4+n^{-1}h^{-1})$ and $E\{\hat\eta^4_{Y,n,h}(u)\}=o(n^{-1}h^{-1})$. From (5.10), the expectation of $\hat\eta_{Y,n,h}(u)$ becomes
$$E\{\hat\eta_{Y,n,h}(u)\}=o(h^2).\qquad(5.14)$$
In the expansion (5.11), $E\{v_F\circ\hat Q_{Y,n,h}(u)\}=v_F\circ Q_Y(u)+o(1)$ and $h^4\mathrm{Var}\{b_F\circ\hat Q_{Y,n,h}(u)\}=O(h^8+n^{-1}h^3)=o(n^{-1}h^{-1})$. The variance of
Proof. For every $x$ in $I_{X,n,h}$ and for every $u$ in $\hat D_{Y,n,h}$, there exists a unique $y$ in $I_{Y,n,h}$ such that $u=\hat F_{Y|X,n,h}(y;x)$, therefore
$$(\hat Q_{Y,n,h}-Q_Y)(u;x)=(\hat Q_{Y,n,h}\circ\hat F_{Y|X,n,h}-Q_Y\circ\hat F_{Y|X,n,h})(y;x)
=(Q_Y\circ F_{Y|X}-Q_Y\circ\hat F_{Y|X,n,h})(y;x).$$
From the convergence of $\hat F_{Y|X,n,h}$ to $F_{Y|X}$, it follows that
$$(nh)^{1/2}\{\hat Q_{Y,n,h}-Q_Y\}=-\frac{\nu_{Y|X,n,h}+\hat\gamma_Y}{f_{Y|X,n,h}}\circ\hat Q_{Y,n,h},$$
and its limit is deduced from Proposition 5.2.
The representation of the conditional quantile process
$$\hat Q_{Y,n,h}=Q_Y+\{(\hat\eta_{Y,n,h}+F_{Y|X}\circ\tilde Q_{Y,n,h}-F_{Y|X}\circ Q_Y)/(f_{Y|X}\circ\hat Q_{Y,n,h})\}+r_{Y,n,h},\qquad(5.16)$$
where $\hat\eta_{Y,n,h}$ is defined by (5.8) and the remainder term $r_{Y,n,h}$ is $o_{L^2}((nh_n)^{-1/2})$, was established to prove Proposition 5.3 and Theorem 5.1. An analogous representation holds for the quantile process $\hat Q_{X,n,h}$,
$$\hat Q_{X,n,h}=Q_X+\{(\hat\zeta_{X,n,h}+F_{Y|X}\circ\tilde Q_{X,n,h}-F_{Y|X}\circ Q_X)/(f_{Y|X}\circ Q_X)\}+r_{X,n,h},$$
where $\hat\zeta_{X,n,h}=F_{Y|X}\circ\hat Q_{X,n,h}-F_{Y|X}\circ\tilde Q_{X,n,h}$ and $r_{X,n,h}=o_{L^2}((nh_n)^{-1/2})$. The bias $b_X$, the variance $v_X$ and the weak convergence of $\hat Q_{X,n,h}$ are deduced. Let $F^{(1)}_{Y|X}$ be the derivative of $F_{Y|X}(y;x)$ with respect to $x$.
to zero. If the density $f_{X,Y}$ of $(X,Y)$ belongs to $C^s(I_{X,Y})$, then for every $x$ in $I_X$ and $u$ in $\hat D_{X,n,h}$, the bias of $\hat Q_{X,n,h}$ equals
$$b_X(y;u)=-h^2\,\frac{b_F}{\partial F_{Y|X}/\partial x}\circ Q_X(y;u)+o(h^2),$$
and its variance is
$$v_X(y;u)=(nh)^{-1}\,\frac{v_F}{\{\partial F_{Y|X}/\partial x\}^2}\circ Q_X(y;u)+o((nh)^{-1}).$$
Theorem 5.2. The process $U_{X,n,h}=(nh)^{1/2}\{\hat Q_{X,n,h}-Q_X\}1_{\{\hat D_{X,n,h}\}}$ converges weakly to
$$U_X=\frac{W_\nu+\lim_n(nh_n^5)^{1/2}b_F}{\partial F_{Y|X}/\partial x}\circ Q_X.$$
$$\mathrm{AMSE}_{Q_Y}(u;x,h)=(nh)^{-1}\,\frac{v_F\circ Q_Y(u;x)}{\{f_{Y|X}\circ Q_Y(u;x)\}^2}
+h^4\Big\{\frac{b_F}{f_{Y|X}}\circ Q_Y(u;x)\Big\}^2.$$
Its minimization in $h$ leads to an optimal local bandwidth, varying with $u$ and $x$,
$$h_{opt,loc}(u;x)=n^{-1/5}\Big\{\frac{v_F\circ Q_Y(u;x)}{4b_F^2\circ Q_Y(u;x)}\Big\}^{1/5}.$$
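As an illustration of this plug-in rule, the following sketch computes the local bandwidth from pilot values; the numbers for $v_F$ and $b_F$ are hypothetical and would in practice be pilot estimates of the variance and bias factors above.

```python
import numpy as np

def h_opt_quantile(n, v_F, b_F):
    """Plug-in local bandwidth h_{opt,loc}(u;x) = n^{-1/5} {v_F/(4 b_F^2)}^{1/5},
    with v_F, b_F evaluated at Q_Y(u;x) (pilot estimates assumed)."""
    return n ** (-0.2) * (v_F / (4.0 * b_F ** 2)) ** 0.2

# hypothetical pilot values at some (u, x)
print(h_opt_quantile(n=1000, v_F=0.25, b_F=0.8))
```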
This is also the optimal local bandwidth minimizing the AMSE of $\hat F_{Y|X,n,h}(u;x)$ for the unique value of $x$ such that $y=Q_Y(u)$, that is
$$\mathrm{AMSE}_F(u;x,h)=(nh)^{-1}v_F(u;x)+h^4b_F^2(u;x).$$
If the density $f_X$ has a continuous derivative, it is also identical to the optimal local bandwidth minimizing the AMSE of $\hat Q_{X,n,h}(y;x)$, at fixed $y$, by Proposition 5.4. Since the optimal rate for the bandwidth has the order $n^{-1/5}$, the optimal rate of convergence of $\hat Q_{Y,n,h}$ to $Q_Y$ is $n^{-4/5}$ and the
which is the empirical mean of $\mathrm{AMASE}_{Q_Y}(X,h)$. Similar ones are defined for $Q_X$ and other means. Note that the computation of the global errors and bandwidths does not require the computation of integrals of errors for the empirical inverse functions: all are expressed through integrals or empirical means of $\mathrm{AMSE}_F$ with various weights depending on the density of $X$ and the conditional density of $Y$ given $X$. The optimal window for $\mathrm{AMASE}_{Q_F}(h)$ is
$$n^{-1/5}\Big[\frac{\sum_{i=1}^n\{v_F(f_{X,Y})^{-1}\}(X_i,Y_i,h)}{4\sum_{i=1}^n\{b_F^2(f_{X,Y})^{-1}\}(X_i,Y_i,h)}\Big]^{1/5}.$$
The expressions of other optimal global bandwidths are easily written and all are estimated by plugging in estimators of the density, the bias $b_F$ and the variance $v_F$ with another bandwidth. The derivatives of the conditional distribution function are simply the derivatives of the conditional empirical distribution function, as for nonparametric regression curves. The mean squared errors and the optimal bandwidths for the quantile process $\hat Q_{X,n,h}$ are written in similar forms, with the bias $b_X$ and variance $v_X$.
$$b_f=\frac{1}{2}m_{2K}f_X^{-1}\{\partial^2f_{X,Y}/\partial y^2+\partial^2f_{X,Y}/\partial x^2-f_{Y|X}f_X^{(2)}\},$$
with covariances zero and variance function $v_f$. The optimal bandwidth is $O(n^{-1/6})$.
Generally, the ranges of the variables $X$ and $Y$ differ and two distinct kernels have to be used; the bias is then expressed as a sum of two terms
$$b_{f,n,h,h'}(y;x)=h^2b_F^{(1)}(y;x)+h'^2b_f(y;x)+o(h^2)+o(h'^2),$$
$$b_F^{(1)}=\frac{1}{2}m_{2K}f_X^{-1}\{\partial^2f_{X,Y}/\partial x^2-f_{Y|X}f_X^{(2)}\},\qquad
b_f=\frac{1}{2}m_{2K}f_X^{-1}\,\partial^2f_{X,Y}/\partial y^2.$$
The variance of the estimator is the limit of $\mathrm{Var}\,\hat f_{Y|X,n,h,h'}(y;x)$, written
$$\int K_{h'}(u-y)K_{h'}(v-y)\,\mathrm{Cov}\{\hat F_{Y|X,n,h}(du;x),\hat F_{Y|X,n,h}(dv;x)\}$$
$$=(nh)^{-1}\kappa_2f_X^{-1}(x)\Big\{\int K^2_{h'}(u-y)\,F_{Y|X}(du;x)
-\int K_{h'}(u-y)K_{h'}(v-y)\,F_{Y|X}(du;x)F_{Y|X}(dv;x)\Big\}.$$
The first integral develops as $I_1=h'^{-1}\{\kappa_2f_{Y|X}(y;x)+o(1)\}$; the second integral $I_2=\int K_{h'}(u-y)K_{h'}(v-y)\,F_{Y|X}(du;x)F_{Y|X}(dv;x)$ is the sum of the integral outside the diagonal $D_Y=\{(u,v)\in I^2_{X,T};\ |u-v|<2h'_n\}$, which is zero, and an integral restricted to the diagonal which is expanded by changing the variables as in the proof of Proposition 2.2.
Let $\alpha_{h'}(u,v)=|u-v|/(2h')$, $u=y+h'(z+\alpha_{h'})$, $v=y+h'(z-\alpha_{h'})$ and $z=\{(u+v)/2-y\}/h'$; then
$$I_2=\int_{D_Y}K_{h'}(u-y)K_{h'}(v-y)f_{Y|X}(u;x)f_{Y|X}(v;x)\,du\,dv
=h'^{-1}\Big\{\int_{D_Y}K(z-\alpha_{h'}(u,v))K(z+\alpha_{h'}(u,v))\,dz\;f^2_{Y|X}(y;x)+o(1)\Big\}$$
and it is equivalent to $h'^{-1}\kappa_2f^2_{Y|X}(y;x)+o(h'^{-1})$. The variance of the estimator of the conditional density $f_{Y|X}$ is then
$$v_{f_{Y|X},n,h,h'}=(nhh')^{-1}v_f(y;x)+o((nhh')^{-1}),\qquad v_f(y;x)=\kappa_2f_{Y|X}(1-f_{Y|X}),\qquad(5.17)$$
and its covariances at every $y\ne y'$ tend to zero.
The asymptotic mean squared error for the estimator of the conditional density is
$$MSE_{f_{Y|X}}(y;x;h_n,h'_n)=h_n^4b_F^{(1)2}(y;x)+h_n'^4b_f^2(y;x)+(nh_nh'_n)^{-1}v_f;$$
it is minimal at the optimal bandwidth
$$h'_{n,opt,f_{Y|X}}(y;x)=\Big\{\frac{1}{nh_n}\,\frac{v_f}{4b_f^2}(y;x)\Big\}^{1/5}.$$
In this expression, $h_n$ can be chosen as the optimal bandwidth for the kernel estimator of the conditional distribution function $F_{Y|X}$. The convergence rate $(nh_nh'_n)^{1/2}$ of the estimator of the conditional density is smaller than the convergence rate of a density estimator and than $(nh_n^2)^{1/2}=O(n^{2/5})$ at the optimal bandwidth. Assuming that $h'_n=h_n$, the optimal bandwidth is now
$$h_{n,opt,f_{Y|X}}(y;x)=\Big\{\frac{1}{n}\,\frac{v_f}{2b_f^2}(y;x)\Big\}^{1/6}$$
and the convergence rate of the estimator of the conditional density $f_{Y|X}$ is $n^{1/3}$, which is larger than the previous rate with two optimal bandwidths.
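The following minimal sketch evaluates a conditional density estimator of this form, with one bandwidth in $x$ and one in $y$; it assumes Gaussian kernels and synthetic data, and the bandwidth choices only mimic the rates discussed above.

```python
import numpy as np

def cond_density(y, x, X, Y, h, h2):
    """Kernel estimator of f_{Y|X}(y;x): smooth the conditional empirical
    distribution in x (bandwidth h) and in y (bandwidth h2)."""
    kx = np.exp(-0.5 * ((x - X) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    ky = np.exp(-0.5 * ((y - Y) / h2) ** 2) / (h2 * np.sqrt(2 * np.pi))
    return np.sum(kx * ky) / np.sum(kx)

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 2000)
Y = X + 0.1 * rng.normal(size=2000)
n = len(X)
h = n ** (-1 / 5)            # rate of the optimal bandwidth for F_{Y|X}
h2 = (n * h) ** (-1 / 5)     # second bandwidth of smaller order, as above
print(cond_density(0.5, 0.5, X, Y, h, h2))
```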
The numerator of (5.18), $\hat\mu_{F,T,h_T}(y;x)=\frac{1}{T}\int_0^T 1_{\{Y_s\le y\}}K_{h_T}(x-X_s)\,ds$, has the mean $\mu_{F,T,h_T}(x)=\int_{I_X}K_{h_T}(x-u)\,F_{X_s,Y_s}(du,y)=F(y;x)f(x)+h_T^2b_F(y;x)+o(h_T^2)$, with $b_\mu(y;x)=\partial^3\{F(x,y)\}/\partial x^3$, for a conditional density of $C^2(I_{XY})$.
The weak convergence of Proposition 5.2 is still satisfied with the convergence rate $(Th_T)^{1/2}$ and the notations of Proposition 5.6. The quantile processes of Section 5.2 are generalized to the continuous process $(X_t,Y_t)_{t>0}$ and their asymptotic behaviour is deduced by the same arguments from the weak convergence of $(Th_T)^{1/2}(\hat F_{Y|X,T,h_T}-F_{Y|X})$. The conditional density $f_{Y|X}(y;x)$ of the ergodic limit of the process is estimated using the kernel $K_{h_T}$, with the same bandwidth as the estimator of the distribution function $F_{Y|X}$:
$$\hat f_{Y|X,T,h_T}(y;x)=\frac{1}{T}\int_0^T K_{h_T}(Y_t-y)\,\Bigg\{\frac{\int_0^T K_{h_T}(x-X_s)1_{\{Y_s\le Y_t\}}\,ds}{\int_0^T K_{h_T}(x-X_s)\,ds}\Bigg\}\,dt.$$
The optimal bandwidth for $\hat f_{Y|X,T,h_T}$ is $O(T^{-1/(2s+2)})$ and the convergence rate of $\hat f_{Y|X,T,h_T}$ with the optimal bandwidth is $T^{s/(2s+2)}$, hence it is $T^{1/3}$ for $s=2$, and the expression of the optimal bandwidth is $h_{T,opt,f_{Y|X}}$ defined in the previous section.
Consider the inverse function (5.4) for a regression function $m$ of the model (1.6), monotone on a sub-interval $I_m$ of the support $I_X$ of the regression variable $X$. The kernel estimator of the function $m$ is monotone on the same interval with a probability converging to 1 as $n$ tends to infinity, by an extension of Lemma 5.1 to an increasing function. The maxima and minima of the estimated regression function, considered in Section 3.7, define empirical intervals of monotonicity where the inverse of the regression function is estimated by the inverse of its estimator. Let $t$ belong to the image $J_m$ by $m$ of an interval $I_m$ where $m$ is increasing, and let
$$\hat Q_{m,n,h}(t)=\hat m^{-1}_{n,h}(t)=\inf\{x\in I_m:\hat m_{n,h}(x)\ge t\}.\qquad(5.19)$$
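A minimal sketch of the empirical inverse (5.19) follows, assuming the kernel regression estimate has already been evaluated on a grid where it is increasing; the grid and the stand-in estimate are illustrative only.

```python
import numpy as np

def q_m(t, x_grid, m_hat_vals):
    """Empirical inverse (5.19): the smallest grid point x in I_m with
    m_hat_{n,h}(x) >= t; m_hat_vals is the kernel regression estimate
    on x_grid, assumed increasing on this interval."""
    idx = np.searchsorted(m_hat_vals, t, side="left")
    if idx == len(x_grid):
        raise ValueError("t is above the range of m_hat on I_m")
    return x_grid[idx]

x_grid = np.linspace(0, 1, 101)
m_hat_vals = x_grid ** 2          # a stand-in increasing estimate
print(q_m(0.25, x_grid, m_hat_vals))   # ~ 0.5
```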
Theorem 5.3. On $J_m$, the process $U_{Q_m,n,h}=(nh)^{1/2}\{\hat Q_{m,n,h}-Q_m\}$ converges weakly to
$$U_{Q_m}=\frac{W_1+\gamma^{1/2}b_m}{m^{(1)}}\circ Q_m.$$
The inverse of the estimator (3.22) for a regression function of an ergodic and mixing process $(X_t,Y_t)_{t\ge0}$ is $(Th_T)^{1/2}$-consistent and it satisfies the same approximations and weak convergence, with the notations and conditions of Section 3.10. Under derivability conditions for the kernel, the regression function and the density of the variable $X$, the estimator $\hat m_{n,h}$ and its inverse are differentiable and they belong to the same class, which is supposed to be sufficiently large to allow expansions of order $s$ for the estimator of a function $m$ in $C^{k+s}$. The derivatives of the quantile are determined by consecutive derivatives of the inverse: $\hat Q^{(1)}_{m,n,h}=\{\hat m^{(1)}_{n,h}\circ\hat Q_{m,n,h}\}^{-1}$, $\hat Q^{(2)}_{m,n,h}=-\{\hat m^{(2)}_{n,h}\{\hat m^{(1)}_{n,h}\}^{-3}\}\circ\hat Q_{m,n,h}$.
where $m_j(x)=E(Y\mid A_j,X=x)$ and the expectation in the whole sample is defined from the probability $p_j$ of $A_j$, the conditional density of $X$ given $A_j$ and the conditional regression functions given the group $A_j$:
$$m(x)=\sum_{j=1}^J\pi_j(x)m_j(x),\qquad \pi_j(x)=p_j\,\frac{f_j(x)}{f(x)}=P(A_j\mid X=x).$$
The density of $X$ in the whole sample is a mixture of $J$ densities conditionally on the group, $f_X(x)=\sum_{j=1}^J p_jf_j(x)$, and the ratio $f^{-1}(x)f_j(x)$ is one if the partition is independent of $X$. The regression functions and the
The inverse processes $\hat Q_{j,m,n,h}$ are defined as in Equation (5.19) for each group. The inverses of the conditional probabilities $\pi_j$ are estimated using the same arguments.
$$n^{1/2}(\hat Q_n-Q_F)=-n^{1/2}\,\frac{\psi_n}{f}\circ Q_F+r_n\qquad(5.20)$$
is unbiased and it converges weakly to a centered Gaussian process with covariance function $c(s,t)=C_F(Q_F(s),Q_F(t))\{f\circ Q_F(s)\}^{-1}\{f\circ Q_F(t)\}^{-1}$, for every $s$ and $t$ in $[0,F(\tau_0)]$. The remainder term is such that $\sup_{t\le F(\tau_0)}\|r_n\|$ is a $o_{L^2}(1)$.
A smoothed quantile process is defined by integrating the smoothed process $\int K_h(t-s)\,d\hat Q_n(s)$, which is a uniformly consistent estimator of the derivative $Q_F^{(1)}(t)=1/\{f\circ Q_F(t)\}$ of $Q_F(t)$. Its mean is $\int K_h(t-s)\,dQ(s)$ and its bias is $b_{Q_F,n,h}=\frac{h^2}{2}m_{2K}Q_F^{(3)}(t)$ if $F$ belongs to $C^3$. Its variance and covariance functions are deduced from the representation (5.20) of the quantile, for $s\ne t$ and as $n$ tends to infinity,
$$\mathrm{Cov}\{\hat Q_{n,h}(t),\hat Q_{n,h}(s)\}=n^{-1}\int_0^t\int_0^s\int_{-1}^1\int_{-1}^1 1_{\{u\ne v\}}K_h(u-u')K_h(v-v')\,du\,dv\,d^2c(u',v')
+(nh)^{-1}\kappa_2c(s\wedge t,s\wedge t)+o(n^{-1/2})$$
$$=(nh)^{-1}\kappa_2c(s\wedge t,s\wedge t)+o(n^{-1/2}).$$
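A minimal sketch of this smoothed quantile-increment estimator of $Q_F^{(1)}(t)=1/\{f\circ Q_F(t)\}$ follows; the integral against $d\hat Q_n$ is approximated by a sum of kernel weights times the jumps of the sample quantile function, and a Gaussian kernel replaces $K$.

```python
import numpy as np

def quantile_derivative(t, u_sorted, h):
    """Kernel smoothing of the increments of the empirical quantile Q_n,
    estimating Q_F^(1)(t) = 1/f(Q_F(t)); u_sorted is the ordered sample."""
    n = len(u_sorted)
    s = np.arange(1, n) / n                  # jump locations of Q_n
    jumps = np.diff(u_sorted)                # dQ_n at those locations
    k = np.exp(-0.5 * ((t - s) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return np.sum(k * jumps)

rng = np.random.default_rng(3)
u_sorted = np.sort(rng.normal(size=2000))
# for N(0,1), 1/f(Q(0.5)) = sqrt(2*pi) ~ 2.507
print(quantile_derivative(0.5, u_sorted, h=0.05))
```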
The pointwise conditional mean squared errors for the empirical conditional distribution function and its inverses reach their minimum at a varying bandwidth function, so the behaviour of the estimators with such bandwidths is now considered. Conditions 4.1 are supposed to be satisfied in addition to 2.1 or 2.2. The results of Propositions 5.1 and 5.2 still hold with a functional bandwidth sequence $h_n$ and approximation orders $o(\|h_n\|^2)$ for the bias and $o(n^{-1}\|h_n^{-1}\|)$ for the variance, and a functional convergence rate $(nh_n)^{1/2}$ for the process $\nu_{Y|X,n,h}$. This is an application of Section 4.3 with the following expansion of the covariances.
Lemma 5.2. The covariance of $\hat F_{Y|X,n,h}(y;x_1)$ and $\hat F_{Y|X,n,h}(y;x_2)$ equals an expansion of the form of Lemma 4.1. The mean squared errors for the functional bandwidth sequences are similar to the MSE and MISE of Section 5.3. The conditional quantiles are now defined with functional bandwidths satisfying the convergence condition 4.1. The representation of the conditional quantile process is extended in the same way.
5.9 Exercises
Chapter 6
Nonparametric estimation of intensities for stochastic processes
6.1 Introduction
and the estimators of $\lambda$ and $\beta$ are defined by means of the process $S_n^{(0)}$, obtained by weighting each term of the sum in $Y_T$ by the regression function at the jump time,
$$S_n^{(0)}(t;\beta)=\sum_{i=1}^n r_{Z_i}(t;\beta)1_{\{T_i\ge t\}},$$
with the parametric function $r_Z(t;\beta)=\exp\{\beta^TZ(t)\}$. For $k=1,2$, let also $S_n^{(k)}(t;\beta)=\sum_{i=1}^n r_{Z_i}(t;\beta)Z_i^{\otimes k}(t)1_{\{T_i\ge t\}}$ be the derivatives of $S_n^{(0)}$ with respect to $\beta$, with $Z^{\otimes0}=1$, $Z^{\otimes1}=Z$ and $Z^{\otimes2}=ZZ^T$. The true regression parameter value is $\beta_0$, or $r_0$ for the function $r$, and the predictable compensator of the point process $N_n$ is
$$\tilde N_n(t)=\int_0^t S_n^{(0)}(s;\beta)\lambda(s)\,ds.\qquad(6.3)$$
0
The function λ is estimated by smoothing the cumulative hazard function
of the Cox process, the parameter β by maximizing the partial likelihood
Z 1
bn,h (t; β) =
λ Jn (s){Sn(0) (s; β)}−1 Kh (t − s) dNn (s), (6.4)
−1
Y
βbn,h = arg max {rZi (t; β)λbn,h (Ti ; β)}δi ,
β
Ti ≤τ
en,h = λ
and λ en,h (βen,h ). More generally, a nonparametric function r is esti-
mated by a stepwise process rbn,h defined on each set Bjh of the partition
$(B_{jh})_{j\le J_h}$, centered at $a_{jh}$. Let also $(D_{lh})_{l\le L_h}$ be a partition of the values $Z_i(t)$, $i\le n$, centered at $z_{lh}$. The function $r$ is estimated in the form $\tilde r_{n,h}(Z(t))=\sum_{l\le L_h}\tilde r_{n,h}(z_{lh})1_{D_{lh}}(Z(t))$, with
$$\tilde\lambda_{n,h}(t;r)=\sum_{j\le J_h}1_{B_{jh}}(t)\int_{B_{jh}}J_n(s)\,dN_n(s)\Big[\int_{B_{jh}}S_n^{(0)}(s;r)\,ds\Big]^{-1},$$
$$\tilde r_{n,h}(z_{lh})=\arg\max_{r_l}\prod_{T_i\le\tau}\Big[\Big\{\sum_{l\le L_h}r_l1_{D_{lh}}(Z_i(T_i))\Big\}\tilde\lambda_{n,h}(T_i;r_{lh})\Big]^{\delta_i},\qquad(6.6)$$
where $S_n^{(0)}(t;r)=\sum_{i=1}^n r_{Z_i}(t)1_{\{T_i\ge t\}}$ is now defined for a nonparametric regression function; then $\tilde\lambda_{n,h}(t,Z)=\tilde\lambda_{n,h}(t,\tilde r_n(Z(t)))$. A kernel estimator for the function $\lambda$ is similarly defined by
$$\hat\lambda_{n,h}(t;r)=\int_{-1}^1 J_n(s)\{S_n^{(0)}(s;r)\}^{-1}K_h(t-s)\,dN_n(s).\qquad(6.7)$$
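A minimal numerical sketch of the kernel intensity estimator (6.4)/(6.7) follows, for right-censored data; it assumes a Gaussian kernel, known regression values $r_{Z_j}$ (set to one here, i.e. no covariate effect), and synthetic exponential times.

```python
import numpy as np

def lambda_hat(t_grid, times, delta, r_vals, h):
    """Kernel intensity estimator: sum over uncensored jump times T_i of
    K_h(t - T_i) / S_n^(0)(T_i), with S_n^(0)(s) = sum_j r_j 1{T_j >= s}."""
    out = np.zeros_like(t_grid, dtype=float)
    for Ti, di in zip(times, delta):
        if di == 0:
            continue                          # censored observation
        S0 = r_vals[times >= Ti].sum()        # weighted at-risk set
        k = np.exp(-0.5 * ((t_grid - Ti) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        out += k / S0
    return out

rng = np.random.default_rng(4)
n = 500
T0 = rng.exponential(1.0, n)                  # latent times, lambda = 1
C = rng.exponential(2.0, n)                   # censoring times
times, delta = np.minimum(T0, C), (T0 <= C).astype(int)
r_vals = np.ones(n)                           # no covariate effect
grid = np.linspace(0.2, 1.5, 14)
print(lambda_hat(grid, times, delta, r_vals, h=0.2))   # ~ 1 on the grid
```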
An approximation of the covariate values at jump times by $z$, when they are sufficiently close, allows one to build a nonparametric estimator of the regression function $r$, like $\beta$ in the parametric model for $r$:
$$\hat r_{n,h}(z)=\arg\max_{r_z}\sum_{i=1}^n\int_{-1}^1\{\log r_z(s)+\log\hat\lambda_n(s;r_z)\}K_{h_2}(z-Z_i(s))\,dN_i(s),$$
where $h_2=h_{n2}$ is a bandwidth sequence satisfying the same conditions as $h$, and $\hat\lambda_n(t,Z)=\hat\lambda_n(t;\hat r_n(t,Z(t)))$.
The L2 -risk of the estimators of the intensity functions splits into a
squared bias and a variance term and the minimization of the quadratic risk
provides an optimal bandwidth depending on the parameters and functions
of the models and having similar rates of convergence, following the same
arguments as for the optimal bandwidths for densities.
Conditions for expanding the bias and variance of the estimators are added
to Conditions 2.1 and 2.2 of the previous chapters concerning the kernel and
the bandwidths. For the intensities, they are regularity and boundedness
conditions for the functions of the models and for the processes.
Condition 6.1.
(1) As n tends to infinity, the process Yn is positive with a probability
tending to 1, i.e. P {inf [0,τ ] Yn > 0} tends to 1, and there exists a
function g such that sup[0,τ ] |n−1 Yn − g| tends a.s. to zero;
(2) $\int_0^\tau g^{-1}(s)\lambda(s)\,ds<\infty$;
(3) The functions $\lambda$ and $g$ belong to $C^s(\mathbb{R}^+)$, $s\ge2$.
(b) For every $t$ in $I_{n,h,\tau}$, the bias $b_{\lambda,n,h}(t)=\lambda_h(t)-\lambda_{n,h}(t)$ develops as
$$b_{\lambda,n,h}(t)=\int_{-1}^1K_h(t-u)E\{J_n(u)\lambda(u)-J_n(t)\lambda(t)\}\,du
=\int_{-1}^1E\{J_n(t+hz)\lambda(t+hz)-J_n(t)\lambda(t)\}K(z)\,dz
=\frac{h^s}{s!}\lambda^{(s)}(t)+o(h^s),$$
where $EJ_n(s)=P(Y_n(s)>0)=P(\max_{i\le n}T_i>s)=1-F_T^n(s)$ belongs to $]0,1[$ for every $s\le\tau$, for independent times $T_i$, $i\le n$. Its variance is
$$\mathrm{Var}\{\hat\lambda_{n,h}(t)\}=E\int_{-1}^1K_h^2(t-u)J_n(u)Y_n^{-1}(u)\lambda(u)\,du
=(nh)^{-1}\int K^2(z)EJ_n(t+hz)g^{-1}(t+hz)\lambda(t+hz)\,dz.$$
The higher order moments are deduced by iterations. In each case, the main term is expressed as the integral of the product of a squared kernel at a time $T_i$ and other kernel terms at $T_{j_k}$, where all time variables are independent. For $p=2$, $E\{\hat\lambda_{n,h}(t)-\lambda_{n,h}(t)\}^2$ develops on $I_{n,h,\tau}$ as the sum of a squared bias and a variance term, $\{\lambda_{n,h}(t)-\lambda(t)\}^2+\mathrm{Var}\{\hat\lambda_{n,h}(t)\}$, and its first order expansion is $(s!)^{-2}m_{sK}^2h^{2s}\lambda^{(s)2}(t)+o(h^{2s})+(nh)^{-1}\kappa_2g^{-1}(t)\lambda(t)+o((nh)^{-1})$.
An estimator of the derivative $\lambda^{(k)}$, or of its integral, is defined by means of the derivatives $K^{(k)}$ of the kernel, for $k\ge1$,
$$\hat\lambda^{(k)}_{n,h}(t)=\int K_h^{(k)}(t-s)J_n(s)Y_n^{-1}(s)\,dN_n(s),$$
for an intensity of $C^3$, or $\lambda^{(1)}_{n,h}(t)=\lambda^{(1)}(t)+\frac{h^s}{s!}m_{sK}\lambda^{(s)}(t)+o(h^s)$ for an intensity of $C^s$. The variance of $\hat\lambda^{(1)}_{n,h}$ is $(nh^3)^{-1}g^{-1}(t)\lambda(t)\int K^{(1)2}(z)\,dz+o((nh^3)^{-1})$. The optimal local bandwidth for estimating $\lambda^{(1)}$ belonging to $C^s$ is therefore
$$h_{AMSE}(\lambda^{(1)};t)=n^{-1/(2s+3)}\Big\{s!(s-1)!\,\frac{\lambda(t)\int K^{(1)2}(z)\,dz}{2m_{sK}^2g(t)\lambda^{(3)2}(t)}\Big\}^{1/(2s+3)}.$$
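A minimal sketch of the derivative estimator $\hat\lambda^{(1)}_{n,h}$ follows; it assumes a Gaussian kernel, whose derivative $K_h^{(1)}(u)=-(u/h)\,\varphi(u/h)/h^2$ is used below, and synthetic uncensored exponential data (constant hazard, so the true derivative is zero).

```python
import numpy as np

def lambda_deriv1(t_grid, times, delta, at_risk, h):
    """Estimator of lambda^(1) using the kernel derivative K_h^(1):
    integral of K_h^(1)(t - s) Y_n^{-1}(s) dN_n(s)."""
    out = np.zeros_like(t_grid, dtype=float)
    for Ti, di in zip(times, delta):
        if di == 0:
            continue
        z = (t_grid - Ti) / h
        k1 = -z * np.exp(-0.5 * z**2) / (h**2 * np.sqrt(2 * np.pi))
        out += k1 / at_risk(Ti)
    return out

rng = np.random.default_rng(11)
times = rng.exponential(1.0, 1000)
delta = np.ones(1000, dtype=int)
at_risk = lambda s: np.sum(times >= s)        # Y_n(s)
print(lambda_deriv1(np.array([0.5, 1.0]), times, delta, at_risk, h=0.25))
# close to zero: the exponential hazard is constant
```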
For the second derivative of the intensity, the estimator $\hat\lambda^{(2)}_{n,h}$ is the derivative of $\hat\lambda^{(1)}_{n,h}$, expressed by means of the second derivative of the kernel. For a function $\lambda$ in class $C^4$, the expectation of the estimator $\hat\lambda^{(2)}_{n,h}$ is $\lambda^{(2)}_{n,h}(t)=\lambda^{(2)}(t)+\frac{h^2}{2}m_{2K}\lambda^{(4)}(t)+o(h^2)$, so it converges uniformly to $\lambda^{(2)}$. The bias of $\hat\lambda^{(2)}_{n,h}$ is $\frac{h^2}{2}m_{2K}\lambda^{(4)}(t)+o(h^2)$ and its variance is $(nh^5)^{-1}g^{-1}(t)\lambda(t)\int K^{(2)2}(z)\,dz+o((nh^5)^{-1})$.
Proposition 6.2. Under Conditions 2.2, for all integers $k\ge0$ and $s\ge2$ and for intensities belonging to class $C^s$, the estimator $\hat\lambda^{(k)}_{n,h}$ of the $k$-th order derivative $\lambda^{(k)}$ has a bias $O(h^s)$ and a variance $O((nh^{2k+1})^{-1})$ on $I_{n,h,\tau}$. Its optimal local and global bandwidths are $O(n^{-1/(2k+2s+1)})$ and the optimal $L^2$-risks are $O(n^{-s/(2k+2s+1)})$.
Consider the normalized process
$$U_{\lambda,n,h}(t)=(nh)^{1/2}\{\hat\lambda_{n,h}(t)-\lambda(t)\},\quad t\in I_{n,h,\tau}.$$
The tightness and the weak convergence of $U_{\lambda,n,h}$ on $I_{n,h,\tau}$ are proved by studying the moments of its variations and the convergence of its finite dimensional distributions. For independent and identically distributed observations of right-censored variables, the intensity of the censored counting process has the same degree of derivability as the density functions of the random times of interest.
Lemma 6.1. Under Conditions 6.1, for every intensity of $C^s$ there exists a constant $C$ such that for every $t$ and $t'$ in $I_{n,h,\tau}$ satisfying $|t'-t|\le2h$,
$$\mathrm{Var}\big\{\hat\lambda_{n,h}(t)-\hat\lambda_{n,h}(t')\big\}^2\le C(nh^3)^{-1}|t-t'|^2.$$
Theorem 6.1. Under Conditions 6.1, for an intensity $\lambda$ of class $C^s(I_\tau)$ and with $nh^{2s+1}$ converging to a constant $\gamma$, the process
$$U_{\lambda,n,h}=(nh)^{1/2}\{\hat\lambda_{n,h}-\lambda\}1_{\{I_{n,h,\tau}\}}$$
converges weakly to $\sigma_\lambda W_1+\gamma^{1/2}b_\lambda$, where $W_1$ is the Gaussian process with mean zero, variance 1 and covariances zero. For every $\eta>0$, there exists a constant $c_\eta>0$ such that
$$\Pr\Big\{\sup_{I_{n,h,\tau}}\big|\sigma_\lambda^{-1}(U_{\lambda,n,h}-\gamma^{1/2}b_\lambda)-W_1\big|>c_\eta\Big\}$$
and it is also written $h^2(P_1,P_2)=\int\{1-(\frac{\lambda_1}{\lambda_2})^{1/2}e^{-(\Lambda_1-\Lambda_2)/2}\}\,dF_1$. The estimator of a function $\lambda$ satisfies
$$h^2(\hat\lambda_{n,h},\lambda)=\int\Big\{1-\Big(\frac{\hat\lambda_{n,h}}{\lambda}\,\frac{1-\hat F_n}{1-F}\Big)^{1/2}\Big\}\,dF
\le\int\Big\{\Big(\frac{\hat\lambda_{n,h}}{\lambda}\,\frac{1-\hat F_n}{1-F}\Big)^{1/2}-1\Big\}\,d(\hat F_n-F)$$
and the convergence rate of $h^2(\hat\lambda_{n,h},\lambda)$ to zero is $nh_n^{1/2}$.
and $E\|\hat\lambda_{n,h_n(t)}(t)-\lambda_{n,h_n(t)}(t)\|_p=O((n^{-1}\|h_n^{-1}\|)^{1/p})$. The covariance of $\hat\lambda_{n,h_n(t)}(t)$ and $\hat\lambda_{n,h_n(t')}(t')$ equals
$$E\int K_{h_n(t)}(t-u)K_{h_n(t')}(t'-u)Y_n^{-1}(u)\,d\Lambda(u)
=\frac{2}{n\{h_n(t)+h_n(t')\}}\Big\{(g^{-1}\lambda)(z_n(t,t'))\int K(v-\alpha_n(v))K(v+\alpha_n(v))\,dv
+\delta_n(t,t')(g^{-1}\lambda)^{(1)}(z_n(t,t'))\int vK(v-\alpha_n(v))K(v+\alpha_n(v))\,dv+o(\|h_n\|)\Big\}.$$
also denoted $\tilde b_{\lambda,h}(t)=h\tilde b_\lambda(t)+o(h)$, and it is larger than the bias of the kernel estimator. Assuming that $\mathrm{Var}\{n^{-1/2}(Y_n-g)(t)\}=O(1)$ for every $t$, the variance of the denominator of $\tilde\lambda_{n,h}(a_{j,h})$ equals $(nh)^{-1}\mathrm{Var}\,n^{-1/2}Y_n(a_{jh})+o((nh)^{-1})=O((nh)^{-1})$. For every $t$ in $B_{j,h}$, the variance of $\tilde\lambda_{n,h}(t)$ is
$$\tilde v_{n,h}(t)=E\Big\{\tilde\lambda_{n,h}(t)-\frac{\int_{B_{j,h}}g(s)\lambda(s)\,ds}{\int_{B_{j,h}}g(s)\,ds}\Big\}^2
-\Big\{E\tilde\lambda_{n,h}(t)-\frac{\int_{B_{j,h}}g(s)\lambda(s)\,ds}{\int_{B_{j,h}}g(s)\,ds}\Big\}^2,$$
and the last term is a $o(n^{-1/2}h^{1/2})$, by (6.8). Following the same calculus as for the variance of the nonparametric estimator of a regression function,
$$\tilde v_{n,h}(t)=\{g^{-2}(t)+o(h)\}\Big[\mathrm{Var}\Big\{(nh)^{-1}\int_{B_{j,h}}J_n\,dN_n\Big\}
-2\lambda(t)\,\mathrm{Cov}\Big\{(nh)^{-1}\int_{B_{j,h}}J_n\,dN_n,\ (nh)^{-1}\int_{B_{j,h}}Y_n\,ds\Big\}
+\lambda^2(t)\,\mathrm{Var}\Big\{(nh)^{-1}\int_{B_{j,h}}Y_n\,ds\Big\}\Big]+o((nh)^{-1})$$
with $\mathrm{Var}\{(nh)^{-1}\int_{B_{j,h}}J_n\,dN_n\}=(nh)^{-1}(g\lambda)(a_{jh})+o((nh)^{-1})$ and the covariance term
$$(nh)^{-2}E\Big\{\int_{B_{j,h}}J_n(dN_n-Y_n\,d\Lambda)\int_{B_{j,h}}(Y_n-g)\,ds\Big\}=O((nh)^{-1}).$$
This expression and the AMSE do not depend on the degree of derivability of the intensity.
The estimators (6.4) for the exponential regression of the intensity are special cases of those defined by (6.7) in a multiplicative intensity with explanatory predictable processes and an unknown regression function $r$. For every $t$ in $I_{n,h,\tau}$, the mean of $\hat\lambda_{n,h}(t;r)$ is still $\lambda_{n,h}(t)=\int_{-1}^1K_h(t-s)\lambda(s)\,ds$ and their degree of derivability is the same as that of $K$. With a parametric regression function $r$, the convergence in the first condition of 6.1 is replaced by the a.s. convergence to zero of $\sup_{t\in[0,\tau]}\sup_{\|\beta-\beta_0\|\le\varepsilon}|n^{-1}S_n^{(k)}(t;\beta)-s_0^{(k)}(t)|$, where $s_0^{(k)}=s^{(k)}(\beta_0)$, for $k=0,1,2$, and $\varepsilon>0$ is a small real number. In a nonparametric model, this condition is replaced by the a.s. convergence to zero of $\sup_{t\in[0,\tau]}\sup_{\|r-r_0\|\le\varepsilon}|n^{-1}S_n^{(k)}(t;r)-s_0^{(k)}(t)|$, where $s_0^{(k)}=s^{(k)}(r_0)$, for $k=0,1,2$.
The previous Conditions 6.1 are modified by the regression function. For expansions of the bias and the variance, they are now written as follows.

Condition 6.2.
(1) As $n$ tends to infinity, the processes $n^{-1}S_n^{(k)}(t;\beta)$ and $n^{-1}S_n^{(k)}(t;r)$ are positive with a probability tending to 1 and the function defined by $s^{(k)}(t)=n^{-1}ES_n^{(k)}(t)$ belongs to class $C^2(\mathbb{R}^+)$;
(2) The function $p_n(s)=\Pr(S_n^{(0)}(s;r)>0)$ belongs to class $C^2(\mathbb{R}^+)$ and $p_n(\tau,r_0)$ converges to 1 in probability;
(3) $\int_0^\tau r(z)g^{-1}(s)\lambda(s)\,ds<\infty$;
(4) The functions $\lambda$ and $g$ belong to $C^2(\mathbb{R}^+)$ and $r$ belongs to $C^s(\mathcal{Z})$.
Proof. The first derivative of $\hat L_n$ with respect to $r_z$ and its expectation depend on the derivative of $\hat\lambda_n$:
$$\hat\lambda_n^{(1)}(t;r_z)=-\int_{I_{n,h,\tau}}K_h(t-s)S_n^{(1)}(s;r_z)S_n^{(0)-2}(s;r_z)\,d\Lambda(s),$$
$$\hat L_n^{(1)}=n^{-1}\sum_{i=1}^n\int_{I_{n,h,\tau}}\Big\{\frac{1}{r_z(s)}+\frac{\hat\lambda_n^{(1)}(s;r_z)}{\hat\lambda_n(s;r_z)}\Big\}K_h(z-Z_i(s))\,dN_i(s),$$
$$L_n^{(1)}=E\int_{-1}^1\Big\{\frac{1}{r_z(s)}+\frac{\hat\lambda_n^{(1)}(s;r_z)}{\hat\lambda_n(s;r_z)}\Big\}K_h(z-Z(s))S_n^{(0)}(s;r_z)f_{Z(s)}(z)\lambda(s)\,ds,$$
and it is such that $\mathrm{Var}\,\hat L_n^{(1)}(z;r)=O((nh)^{-1})$; therefore $(nh)^{1/2}(\hat L_n^{(1)}-L^{(1)})(z;r)$ is bounded in probability, and the second derivative is a $O_p(1)$. By a Taylor
expansion of $\hat L_n^{(1)}(z;r)$ in a neighbourhood of the true value $r_{0z}\equiv r_0(z)$ of the regression function at $z$,
$$\hat L_n^{(1)}(z;r)-L^{(1)}(z;r_0)=\{r_z(s)-r_{0z}(s)\}^T\hat L_n^{(2)}(z;r_0)+O_p((\hat r_{n,h}-r_0)^2(z(s))).$$
Proposition 6.4. The processes $(nh)^{1/2}(\hat\lambda_{n,h}-\lambda_{n,h})(r_0)1_{I_{\tau,n,h}}$ and
$$(nh)^{1/2}(\hat\lambda_{n,h}-\lambda)+(nh)^{1/2}(\hat r_{n,h}-r_0)\int\frac{S_n^{(1)}}{S_n^{(0)3}}(s;\hat r_{n,h})K_h(\cdot-s)J_n(s)\,dN_n(s)$$
converge weakly to the same continuous and centered Gaussian process on $I_\tau$, with covariances zero and variance function $v_\lambda=\kappa_2s^{(0)-1}(r_0)\lambda$.
The bias of the estimator $\hat\lambda_{n,h}$ is obtained by smoothing the bias $b_{\Lambda,n,h}(t)$ and its first order approximation can be written as the mean of
$$\hat b_{\lambda,n,h}(t)=(\hat r_{n,h}-r)(t)\int_{-1}^1K_h(t-s)\frac{S_n^{(1)}(s,\hat r_{n,h})}{\{S_n^{(0)}(s,\hat r_{n,h})\}^3}\,dN_n(s).$$
Its mean
$$L_n(\beta)=\int_{I_{n,h,\tau}}\{\beta^Ts_n^{(1)}(s;\beta_0)+E\log\{\hat\lambda_n(s;\beta)\}s_n^{(0)}(s;\beta_0)\}\lambda(s)\,ds$$
converges to $L(\beta)=\int_0^\tau\{\beta^Ts^{(1)}(s;\beta_0)+\{\log\lambda(s;\beta)\}s^{(0)}(s;\beta_0)\}\lambda(s)\,ds$.
converge weakly to the same continuous and centered Gaussian process with covariances zero and variance function $v_\lambda=\kappa_2s^{(0)-1}(\beta_0)\lambda$.
Proof. The derivatives with respect to $\beta$ of the partial likelihood $\hat L_n$ are written
$$\hat L_n^{(1)}(\beta)=n^{-1}\sum_{i=1}^n\delta_i\Big\{Z_i(T_i)+\frac{\hat\lambda_n^{(1)}(T_i;\beta)}{\hat\lambda_n(T_i;\beta)}\Big\},$$
$$\hat L_n^{(2)}(\beta)=-n^{-1}\sum_{i=1}^n\delta_i\Big(\frac{\hat\lambda_n^{(1)\otimes2}}{\hat\lambda_n^2}-\frac{\hat\lambda_n^{(2)}}{\hat\lambda_n}\Big)(T_i;\beta).$$
The expectation of $\hat\lambda_n^{(k)}$, $k=1,2$, is deduced from the martingale property of $N_n-\tilde N_n$:
$$\lambda_n^{(1)}(t;\beta)=\int_{I_{n,h,\tau}}\frac{S_n^{(1)}(s;\beta)}{S_n^{(0)2}(s;\beta)}S_n^{(0)}(s;\beta_0)\lambda(s)K_h(t-s)\,ds
=\frac{S_n^{(1)}(t;\beta)}{S_n^{(0)2}(t;\beta)}S_n^{(0)}(t;\beta_0)\lambda(t)
+\frac{m_{2K}h^2}{2}\Big\{\frac{S_n^{(1)}(t;\beta)}{S_n^{(0)2}(t;\beta)}S_n^{(0)}(t;\beta_0)\lambda(t)\Big\}^{(2)}+o(h^2),$$
$$\lambda_n^{(2)}(t;\beta)=\int_{I_{n,h,\tau}}\Big(2\frac{S_n^{(1)\otimes2}}{S_n^{(0)2}}-\frac{S_n^{(2)}}{S_n^{(0)}}\Big)(s;\beta)K_h(t-s)\lambda(s)\,ds
=\Big(2\frac{S_n^{(1)\otimes2}}{S_n^{(0)2}}-\frac{S_n^{(2)}}{S_n^{(0)}}\Big)(t;\beta)\lambda(t)
+\frac{m_{2K}h^2}{2}\Big\{\Big(2\frac{S_n^{(1)\otimes2}}{S_n^{(0)2}}-\frac{S_n^{(2)}}{S_n^{(0)}}\Big)(t;\beta)\lambda(t)\Big\}^{(2)}+o(h^2).$$
It follows that $L_n^{(1)}(\beta)$ and $L_n^{(2)}(\beta)$ converge to
$$L^{(1)}(\beta)=\int_0^\tau\{s_n^{(1)}(s;\beta_0)-s_n^{(1)}(s;\beta)s_n^{(0)-2}(s;\beta)s_n^{(0)2}(s;\beta_0)\}\lambda(s)\,ds,$$
$$L^{(2)}(\beta)=-\int_0^\tau\Big(\frac{\lambda^{(1)\otimes2}}{\lambda^2}-\frac{\lambda^{(2)}}{\lambda}\Big)(s;\beta)s_n^{(0)}(s;\beta_0)\lambda(s)\,ds,\qquad
L^{(2)}(\beta_0)=-\int_0^\tau s^{(2)}(s;\beta_0)\lambda(s)\,ds,$$
where $-L^{(2)}(\beta_0)$ is positive definite and $L^{(1)}(t;\beta_0)=0$, so that the maximum $\hat\beta_{n,h}$ of $\hat L_n$ converges in probability to $\beta_0$, the maximum of the limit $L$ of $L_n$. The rate of convergence of $\hat\beta_{n,h}-\beta_0$ is that of $\hat L_n^{(1)}(\beta_0)$. First, $n^{1/2}(\hat L_n^{(1)}-L_n^{(1)})(\beta_0)$ is a sum of stochastic integrals of predictable processes with respect to centered martingales and it converges weakly to a centered Gaussian variable with variance $v_{(1)}=\int_{I_{n,h,\tau}}\{s^{(2)}-s^{(1)\otimes2}s^{(0)-1}\}(s;\beta_0)\lambda(s)\,ds$. Secondly,
$$n^{1/2}(L_n^{(1)}-L^{(1)})(\beta_0)=\int_{I_{n,h,\tau}}\Big[n^{1/2}\{n^{-1}S_n^{(1)}-s^{(1)}\}(s;\beta_0)
+n^{1/2}\Big\{\frac{\hat\lambda_n^{(1)}}{\hat\lambda_n}\,n^{-1}S_n^{(0)}-s^{(1)}\Big\}(s;\beta_0)\Big]\lambda(s)\,ds$$
is continuous and independent of $n^{1/2}(\hat L_n^{(1)}-L_n^{(1)})(\beta_0)$; its integrand is a sum of three terms $l_{1n}+l_{2n}+l_{3n}$, where $l_{1n}=n^{1/2}\{n^{-1}S_n^{(1)}-s^{(1)}\}(s;\beta_0)$ converges weakly to a centered Gaussian variable with a finite variance,
$$l_{2n}=n^{1/2}\Big\{\frac{\lambda_{n,h}^{(1)}}{\lambda_{n,h}}\,n^{-1}S_n^{(0)}-s^{(1)}\Big\}(\cdot;\beta_0),\qquad
l_{3n}=n^{1/2}\Big[n^{-1}S_n^{(0)}\Big\{\frac{\hat\lambda_{n,h}^{(1)}}{\hat\lambda_{n,h}}-\frac{\lambda_{n,h}^{(1)}}{\lambda_{n,h}}\Big\}\Big](\cdot;\beta_0).$$
The term $l_{2n}$ converges weakly to a centered Gaussian variable with a finite variance and $l_{3n}$ has the same asymptotic distribution as
$$n^{1/2}s^{(0)}\lambda^{-1}\{(\hat\lambda^{(1)}_{n,h}-\lambda^{(1)}_{n,h})-\lambda^{(1)}\lambda^{-1}(\hat\lambda_{n,h}-\lambda_{n,h})\}(\cdot;\beta_0),$$
where the process
$$(nh)^{1/2}(\hat\lambda_{n,h}-\lambda_{n,h})(t;\beta_0)=n^{1/2}\int_0^\tau S_n^{(0)-1}(s;\beta_0)K_h(t-s)J_n(s)\,d(N_n-\tilde N_n)(s)$$
has mean zero and variance $h\int_{I_{\tau,n,h}}S_n^{(0)-1}(s;\beta_0)K_h^2(t-s)\lambda(s)\,ds$, which converges in probability to $v_\lambda=\kappa_2s^{(0)-1}(t;\beta_0)\lambda(t)$. In the same way, the process
$$n^{1/2}(\hat\lambda^{(1)}_{n,h}-\lambda^{(1)}_{n,h})(t;\beta_0)=n^{1/2}\int_{I_{n,h,\tau}}\frac{S_n^{(1)}}{S_n^{(0)2}}(s;\beta_0)K_h(t-s)\,d(N_n-\tilde N_n)(s)$$
is consistent and it has the finite asymptotic variance $v_{\lambda,(1)}(t)=s^{(1)\otimes2}s^{(0)-3}(t;\beta_0)\lambda(t)$.
The term $l_{3n}$, with asymptotic variance zero, converges in probability to zero. The proof of the weak convergence of $\hat\beta_n$ ends as previously. The process $(nh)^{1/2}(\hat\lambda_{n,h}-\lambda)(t;\hat\beta_n)$ develops as
$$n^{1/2}\int_{I_{\tau,n,h}}J_n(s)S_n^{(0)-1}(s;\hat\beta_n)K_h(t-s)\,d(N_n-\tilde N_n)(s)
+\int_{I_{\tau,n,h}}\{S_n^{(0)-1}(s;\hat\beta_n)-S_n^{(0)-1}(s;\beta_0)\}S_n^{(0)}(s;\beta_0)K_h(t-s)\lambda(s)\,ds;$$
the first term of the right-hand side converges weakly to a centered Gaussian process with variance $\kappa_2s^{(0)-1}(t;\beta_0)\lambda(t)$ and covariances zero, and the second term is expanded as
$$-n^{1/2}(\hat\beta_{n,h}-\beta_0)^T\int_{I_{\tau,n,h}}S_n^{(1)}(s;\beta_0)S_n^{(0)-1}(s;\beta_0)K_h(t-s)\lambda(s)\,ds
=-n^{1/2}(\hat\beta_{n,h}-\beta_0)^Ts^{(1)}(t;\beta_0)s^{(0)-1}(t;\beta_0)\lambda(t)+o(1).$$
The histogram estimators (6.2), for the intensity $\lambda$ with a parametric regression, and (6.6), for a nonparametric regression of the intensity, are consistent as $h$ tends to zero and $n$ to infinity. Their expectations are approximated like (6.8) by a ratio of means. Their variances are calculated as in Section 6.2.2. The nonparametric regression function $r$ is estimated by
$$\tilde r_{n,h}(z)=\sum_{l\le L_h}\tilde r_{n,h}(z_{lh})1_{D_{lh}}(z)$$
and the histogram estimator for the intensity defines the estimator $\tilde r_{n,h}$ of the regression function by
$$\tilde r_{n,h}(z_{lh})=\arg\max_{r_l}\prod_{T_i\le\tau}\Big[\Big\{\sum_{l\le L_h}r_l1_{D_{lh}}(Z_n(T_i))\Big\}\tilde\lambda_{n,h}(T_i;r_{lh})\Big]^{\delta_i},$$
$$\tilde\lambda_{n,h}(t;r)=\sum_{j\le J_h}1_{B_{jh}}(t)\int_{B_{jh}}J_n(s)\,dN_n(s)\Big[\int_{B_{jh}}S_n^{(0)}(s;r)\,ds\Big]^{-1}.$$
For every $t$ in $B_{j,h}$, let $S_n^{(0)}(t;r)=\sum_{i=1}^n r_{Z_i}(t)Y_i(t)$; the limit of $n^{-1}S_n^{(0)}(t;r)$ is $s^{(0)}(t;r)=\sum_{l\le L_h}r_{z_{lh}}\Pr(Z(t)\in D_{lh})+o(1)$ and its variance is
$$v^{(0)}(t;r)=n^{-1}\sum_{l\le L_h}r_{z_{lh}}^2\big[\Pr(Z(t)\in D_{lh})g(t)-\{\Pr(Z(t)\in D_{lh})g(t)\}^2\big]+o(1)$$
and its bias is $\tilde b_{\lambda,h}(t;r_{lh})=h\lambda^{(1)}(t)+o(h)$, uniformly on $I_{\tau,n,h}$. Its variance is
$$\tilde v_{n,h}(t;r_{lh})=E\Big\{\tilde\lambda_{n,h}(t)-\frac{\int_{B_{j,h}}s^{(0)}(s;r_{lh})\lambda(s)\,ds}{\int_{B_{j,h}}s^{(0)}(s;r_{lh})\,ds}\Big\}^2+O((nh)^{-1})$$
$$=s^{(0)-2}(t;r_{lh})\mathrm{Var}\Big\{(nh)^{-1}\int_{B_{j,h}}J_n(s)\,dN_n\Big\}
-2\lambda(t)\,\mathrm{Cov}\Big\{(nh)^{-1}\int_{B_{j,h}}J_n\,dN_n,\ (nh)^{-1}\int_{B_{j,h}}S_n^{(0)}(s;r_{lh})\,ds\Big\}
+\lambda^2(t)\,\mathrm{Var}\Big\{(nh)^{-1}\int_{B_{j,h}}S_n^{(0)}(s;r_{lh})\,ds\Big\}+o((nh)^{-1})+o(h)$$
and it satisfies $L_{n,h}^{(1)}(\tilde r_{1h},\ldots,\tilde r_{L_h,h})=0$, where $L_{n,h}^{(1)}$ is the vector whose components are the derivatives with respect to the components of $r_h=(r_{lh})_{l\le L_h}$,
$$L_{n,h,l}^{(1)}(r_h)=\int\frac{1}{r_{lh}}1_{D_{lh}}(Z_n(s))\,J_n(s)\,dN_n(s)+\int\tilde\lambda_{n,h,l}^{(1)}(s;r_{lh})\,J_n(s)\,dN_n(s).$$
It is zero at $\tilde\beta_{n,h}$ and its convergence rate is a $O((nh^3))$, like $\tilde\lambda_{n,h}^{(1)}$. Therefore, $\tilde\beta_{n,h}$ has the convergence rate $O((nh^3))$ and the estimator of the hazard function has the convergence rate $O((nh))$.
and it converges weakly to a Gaussian process with mean zero and a finite variance. The probability $P_t(t+x)=\exp\{-\int_t^{t+x}\lambda(s)\,ds\}$ is also estimated by the product-limit estimator on the interval $]T_{1:n},T_{n:n}[$,
$$\hat P_{n,t}(t+x)=\prod_{1\le i\le n}\Big\{1-\frac{1_{\{t<T_i\le t+x\}}J_n(T_i)}{Y_n(T_i)}\Big\},\qquad(6.9)$$
$$B_{n,t}(t+x)=n^{1/2}\,\frac{P_t-\hat P_{n,t}}{P_t}(t+x).\qquad(6.10)$$
Theorem 6.2. On every interval $[a,b]$ included in the interval $]T_{1:n},T_{n:n}[$, the estimator $\hat P_n$ satisfies
$$B_{n,t}(t+x)=n^{1/2}\int_t^{t+x}d(\hat\Lambda_n-\Lambda).$$
Proof. Let $[a,b]$ be a sub-interval of $]T_{1:n},T_{n:n}[$ and let $\hat\Lambda_n$ be the martingale estimator of $\Lambda=-\log\bar F$; $\hat\Lambda_n=\int_0^tJ_nY_n^{-1}\,dN_n$ is uniformly consistent on $[a,b]$ and $n^{1/2}(\hat\Lambda_n-\Lambda)$ converges weakly to a centered Gaussian process with independent increments and variance $v_\Lambda(t)=\int_0^t\bar F^{-2}\,dF$.
The process $B_{n,t}(t+x)$ is expanded as
$$-P_t(t+x)B_{n,t}(t+x)=n^{1/2}\Big[\exp\Big\{-\int_t^{t+x}d\hat\Lambda_n\Big\}-\exp\Big\{-\int_t^{t+x}d\Lambda\Big\}\Big]
+n^{1/2}\Big[\exp\{\log\hat P_{n,t}(t+x)\}-\exp\Big\{-\int_t^{t+x}d\hat\Lambda_n\Big\}\Big]$$
$$=-n^{1/2}\int_t^{t+x}d(\hat\Lambda_n-\Lambda)\,\exp\Big\{-\int_t^{t+x}d\Lambda^*_n\Big\}\{1+o(1)\}
+n^{1/2}\exp\Big\{-\int_t^{t+x}d\Lambda^{**}_n\Big\}\Big\{\log\frac{\hat{\bar F}_n(t+x)}{\hat{\bar F}_n(t)}-\int_t^{t+x}d\hat\Lambda_n\Big\},$$
$$C_P(t+x,t+y;t)=P_t(t+x)P_t(t+y)\lim_nEB^2_{n,t}(t+x\wedge y)=P_t(t+x)P_t(t+y)v_B(t+x\wedge y;t).$$
The weak convergence of $n^{1/2}(\bar F_n-\bar F)$ on the interval $[0,T_{n:n}]$ allows the extension of the previous proposition to a weak convergence on $[T_{1:n},T_{n:n}]$. For $t<\tau_F$, if $\int_0^\tau F\bar F^{-1}\,d\Lambda<\infty$,
$$\frac{P_t-\hat P_{n,t}}{P_t}(t+x)=\int_{t\vee T_{1:n}}^{(t+x)\wedge T_{n:n}}\frac{1-\hat F_n(s^-)}{1-F(s)}\,d(\hat\Lambda_n-\Lambda)(s).\qquad(6.11)$$
Therefore, the process defined for $t$ and $t+x$ in $[T_{1:n},T_{n:n}]$ by $n^{1/2}\{(P_t-\hat P_{n,t})P_t^{-1}\}(t+x)$ converges weakly on the support of $F$ to a centered Gaussian process $B_P$, with independent increments and variance $v_{\bar F}(t+x)-v_{\bar F}(t)$. By the product definition of the estimator of the probability $P_t(t+x)$, it satisfies
$$\hat P_{n,t}(t+x)=\int_{t\vee T_{1:n}}^{(t+x)\wedge T_{n:n}}\hat P_{n,t}(t+s^-)\,d\hat\Lambda_n(s).$$
For $x<y$, the covariance of $\hat P_{n,t}(x)$ and $\hat P_{n,t}(y)$ is $n^{-1}P_t(t+x)P_t(t+y)v_B(t+x;t)+o(n^{-1})$; hence for $|x-y|>2h$, the limit of $nC_{n,h,p}$ converges to
$$\int\int K_h(t+x-u)K_h(t+y-v)\{p_t(t+u)p_t(t+v)v_B(t+u;t)+P_t(t+u)P_t(t+v)v_B^{(1)}(t+u;t)\}\,du\,dv$$
$$=p_t(t+y)\{p_t(t+x)v_B(t+x;t)\}^{(1)}+p_t(t+y)\{P_t(t+x)v_B(t+x;t)\}^{(1)},$$
and if $|x-y|\le2h$, $C_{n,h,p}=O((nh)^{-1})$. Then the process $(nh)^{1/2}(\hat p_{n,h}-p)$ converges weakly to a Gaussian variable with mean zero, covariances zero and variance function $v_p$.
The moments of the estimator satisfy $\sup_{t\in I_{T,h_T,\tau}}\|\hat\lambda_{T,h_T}(t)-\lambda_{T,h_T}(t)\|_p=O((Th_T)^{-1/p})$ and $\sup_{t\in I_{T,h_T,\tau}}\|\hat\lambda_{T,h_T}(t)-\lambda(t)\|_p=O((Th_T)^{-1/p}+h^s)$. Under a mixing condition for the point process $(N_T,Y_T)$ which ensures the weak convergence of the process $T^{1/2}(Y_T-g)$, the process $(Th_T)^{1/2}(\hat\lambda_{T,h_T}-\lambda)$ converges weakly to the Gaussian process limit of Theorem 6.1. Now the optimal bandwidths are $O(T^{-1/(2s+1)})$ and the minimal mean squared errors are $O(T^{-2s/(2s+1)})$.
In models with covariates, the process $Y_T$ is modified by a regression function at the jump times, as $S_T^{(0)}(t;\beta)=\sum_{i=1}^{N(T)}r_{Z_i}(t;\beta)1_{\{T_i\ge t\}}$ with a parametric regression function, or $S_T^{(0)}(t;r)=\sum_{i=1}^{N(T)}r_{Z_i}(t)1_{\{T_i\ge t\}}$ with a nonparametric regression function. The predictable compensator of $N_T$ becomes $\tilde N_T(t)=\int_0^{t\wedge T}S_T^{(0)}(s;r)\lambda(s)\,ds$. The function $\lambda$ is estimated by smoothing the estimator of the cumulative hazard function,
$$\hat\lambda_{T,h}(t;\beta)=\int_{-1}^1\{S_T^{(0)}(s;\beta)\}^{-1}K_h(t-s)\,dN_T(s),$$
$$\hat r_{T,h_T}(z)=\arg\max_{r_z}\sum_{i=1}^{N(T)}\int_0^T\{\log r_z(s)+\log\hat\lambda_{T,h_T}(s;r_z)\}K_{h_2}(z-Z_i(s))\,dN_i(s),$$
$$\lambda(t\mid X,Z)=\lambda(t)e^{\beta(X)^TZ(t)}.\qquad(6.14)$$
Let $\tilde X_i=X_i\wedge(C_i-S_i)$,
$$I_{n,\tau}=\{(s,x);\ s\in[h_n,\tau-h_n],\ x\in[0,\tau-s]\},$$
$$Y_i(x)=1_{\{T_i^0\wedge C_i\ge S_i+x\}}=1_{\{\tilde X_i\ge x\}},$$
$$S_n^{(0)}(x;s,\beta)=n^{-1}\sum_jK_{h_n}(s-S_j)Y_j(x)\exp\{\beta^TZ_j(S_j+x)\}.$$
The estimator of $\Lambda_{0,X|S}(x;s)=\int_0^x\lambda_{0,X|S}(y;s)\,dy$ is defined for $(s,x)\in I_{n,\tau}$ by $\hat\Lambda_{n,X|S}(x;s)=\hat\Lambda_{n,X|S}(x;s,\hat\beta_n)$ with
$$\hat\Lambda_{n,X|S}(x;s,\beta)=\sum_i\frac{K_{h_n}(s-S_i)1_{\{S_i\le C_i,\ X_i\le x\wedge(C_i-S_i)\}}}{nS_n^{(0)}(X_i;s,\beta)}.$$
The estimator $\hat\beta_n$ of the regression coefficient maximizes the following partial likelihood
$$l_n(\beta)=\sum_i\delta_i\big[\beta^TZ_i(T_i^0)-\log\{nS_n^{(0)}(X_i;S_i,\beta)\}\big]\varepsilon_n(S_i).$$
processes $B_n$ on $I_{n,\tau}$ such that $\|\hat B_n-B_n\|_{I_{n,\tau}}=o_p(h_n^{1/2})$. This property implies the weak convergence of the process $(nh_n)^{1/2}(\hat\Lambda_{n,X|S}-\Lambda_{0,X|S})1_{\{I_{n,\tau}\}}$
where $Y_i(t)=1_{\{T_i\ge t\}}$ is the risk indicator of individual $i$ at $t$. Let $S_n^{(0)}(t,\beta)=\sum_iY_i(t)e^{\beta(X_i)^TZ_i(t)}$; an estimator of the integrated baseline hazard function is $\hat\Lambda_n(t)=\int_0^tS_n^{(0)-1}(s,\hat\beta_{n,h})\,dN_n(s)$. For every $x$ in $I_{X,h}$, the process $n^{-1}l_{n,x}$ converges uniformly to
$$l_x(\beta)=\int_0^\tau\Big[(\beta-\beta_0)(x)^Ts^{(1)}(t,\beta_0(x),x)-s^{(0)}(t,\beta_0(x),x)\log\frac{s^{(0)}(t,\beta(x),x)}{s^{(0)}(t,\beta_0(x),x)}\Big]\,d\Lambda_0(t),$$
which is maximal at $\beta_0$; hence $\hat\beta_{n,h}(x)=\arg\max l_{n,x}$ converges to $\beta_0(x)$. Let $U_{n,h}(\cdot,x)$ and $I_{n,h}(\cdot,x)$ be the first two derivatives of the process $l_{n,x}$ with respect to $\beta$ at fixed $x$ in $I_{X,h}$; the estimator of $\beta(x)$ satisfies $U_{n,h}(\hat\beta_{n,h}(x),x)=0$ and $I_{n,h}(x)\le0$ converges uniformly to a limit $I(x)$. By a Taylor expansion, $U_{n,h}(\beta_0(x),x)=(\hat\beta_{n,h}(x)-\beta_0(x))^T\{I(\beta_0,x)+o(1)\}$ and $(\hat\beta_{n,h}(x)-\beta_0(x))=\{I_{n,h}(\beta_0,x)+o(1)\}^{-1}U_{n,h}(\beta_0(x),x)$. Under the assumptions that the bandwidth is a $O(n^{-1/5})$ and the function $\beta$ belongs to the class $C^2(I_X)$, the bias of $\hat\beta_{n,h}(x)$ is approximated by $I^{-1}(\beta_0,x)h^2u(x)$, where $u(x)$ has the form $u(x)=\frac{m_{2K}}{2}\int_0^\tau\varphi(t,x)\,d\Lambda_0(t)$, and its variance is $(nh)^{-1}\kappa_2I^{-1}(\beta_0,x)+o((nh)^{-1})$. The asymptotic mean integrated squared error $AMISE_w(h)=E\int_{X_{n,h}}\|\hat\beta_{n,h}(x)-\beta_0(x)\|w(x)\,dx$ for $\hat\beta_{n,h}(x)$ is therefore minimal for the bandwidth
$$h_{n,opt}=n^{-1/5}\Big\{\frac{\kappa_2\int_{X_{n,h}}\|I^{-1}(\beta_0,x)\|w(x)\,dx}{\int_{X_{n,h}}u(x)\|I^{-1}(\beta_0,x)\|w(x)\,dx}\Big\}^{1/5}.$$
Proposition 6.6. For every $x$ in $I_{X,n,h}$, the variable $(nh_n)^{1/2}(\hat\beta_{n,h}-\beta_0)(x)$ converges weakly to a Gaussian variable $N(0,\gamma_2(K)I_0^{-1}(x))$.

The process $(nh_n)^{1/2}(\hat\Lambda_n-\Lambda_0)$ converges weakly to the Gaussian process $-\int_0^\cdot G(t)\{\int s^{(0)}(t,y)\,dy\}^{-2}\,d\Lambda_0(t)$, where the process $G$ is the limiting distribution of $G_n$. The convergence rate $(nh_n)^{1/2}$ for the estimator of $\Lambda$ comes from the variance $E\int_0^tS_n^{(0)}(s,\beta_0)S_n^{(0)-2}(s,\hat\beta_{n,h})\,d\Lambda_0(s)$ of $\hat\Lambda_n(t)-\Lambda_0(t)$, developed by a first order Taylor expansion.
$$E\{Y_{n,m}(t)\mid R\}=\sum_{j=1}^mR_j\bar G_j(t)\bar F(t).$$
Let $\mu_R=\lim R_j$ for $j=1,\ldots,m$, and $J_{n,m}(t)=1_{\{Y_{n,m}(t)>0\}}$; the estimators of the cumulative hazard function $\Lambda$ and of its derivative $\lambda$ are
$$\hat\Lambda_{n,m}(t)=\int_0^tY_{n,m}^{-1}J_{n,m}\,dN_{n,m},\qquad
\hat\lambda_{n,m}(t)=\int K_h(t-s)\,d\hat\Lambda_{n,m}(s).$$
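A minimal sketch of this pair of estimators (the cumulative hazard by its martingale estimator, the intensity by kernel smoothing of its increments) follows, on synthetic uncensored exponential data with a Gaussian kernel standing in for $K$.

```python
import numpy as np

def hazard_smoothed(t_grid, times, delta, h):
    """Martingale estimator increments dLambda_hat(T_i) = 1/Y(T_i) at
    uncensored times, kernel-smoothed to estimate lambda(t)."""
    lam = np.zeros_like(t_grid, dtype=float)
    for Ti, di in zip(times, delta):
        if di == 0:
            continue
        Y = np.sum(times >= Ti)                 # risk set size Y_{n,m}(T_i)
        k = np.exp(-0.5 * ((t_grid - Ti) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        lam += k / Y
    return lam

rng = np.random.default_rng(6)
times = rng.exponential(1.0, 800)
delta = np.ones(800, dtype=int)
print(hazard_smoothed(np.linspace(0.3, 1.2, 4), times, delta, h=0.15))  # ~ 1
```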
Assuming that there exists a uniform limit for the mean survival function $\bar G=\lim_{m\to\infty}m^{-1}\sum_{j=1}^m\bar G_j$, the process $n^{-1}Y_{n,m}$ converges uniformly to its expectation $\mu_Y(t)=\mu_R\bar G(t)\bar F(t)$, and $n^{-1}N_{n,m}$ converges uniformly to $\mu_R\int_0^t\bar G\,dF$. The estimators $\hat\Lambda_{n,m}$ and $\hat\lambda_{n,m}$ are then unbiased and uniformly consistent. The variance of $n^{1/2}(\hat\Lambda_{n,m}-\Lambda)(t)$ is $\int_0^tnY_{n,m}^{-1}J_{n,m}\,d\Lambda$ and it converges to $v_\Lambda(t)=\int_0^t\mu_Y^{-1}\,d\Lambda$. The variance of the estimator $\hat\lambda_{n,m}(t)$ is $(nh)^{-1}\kappa_2v_\Lambda^{(1)}(t)+o((nh)^{-1})$ and both estimator processes converge weakly to Gaussian processes with zero mean and these variances. The process $n^{1/2}(\hat\Lambda_{n,m}-\Lambda)$ has independent increments and $n^{1/2}(\hat\lambda_{n,m}-\lambda)$ has asymptotic covariances zero. All results for multiplicative regression models with independent censoring times apply to this progressive random censoring scheme. With nonrandom numbers $R_j$, the necessary condition for the convergence of the processes is the uniform convergence of $n^{-1}\sum_{j=1}^mR_j\bar G_j$.
6.9 Exercises
Chapter 7
Estimation in semi-parametric regression models
7.1 Introduction
where
$$V(\eta,\theta)=E[\sigma^{-1}(\eta^TX_i)\{Y-g(\theta^TX)\}^2]$$
is the mean weighted squared error at fixed parameter values $\eta$ and $\theta$. Several empirical criteria can be defined for estimating $V$, and estimators of $\theta_0$ satisfying the same property (7.2) are deduced. Let $(X_i,Y_i)_{i=1,\ldots,n}$ be a sample of observations in $\mathbb{R}^{d+1}$, in the model with known variance function. At fixed $\theta$, let $\hat g_{n,h}(z)$ be the nonparametric
$$(\hat\eta^T_{n,h},\hat\theta^T_{n,h})^T=\arg\min_{\eta,\theta\in\Theta}\hat V_{n,h}(\eta,\theta).\qquad(7.4)$$
We first assume that the variance function is known and denoted $\sigma^2(x)$; the error and the estimator of $g$ are then only normalized by $\sigma^{-1}(x)$. The global error (7.3) and the estimator (7.4) are modified by considering the mean of local empirical squared errors. In a neighborhood of $z$, a local empirical squared error is defined by smoothing (7.3),
$$\hat V_{n,h}(z;\theta)=n^{-1}\sum_{i=1}^n\sigma^{-1}(X_i)\{Y_i-\hat g_{n,h}(\theta^TX_i;\theta)\}^2K_h(z-\theta^TX_i).$$
A global estimator $\bar\theta_{n,h}$ of $\theta$ was defined by an empirical mean of the local estimators at the random points $\hat Z_{n,i}=\hat\theta^T_{n,h,Z_i}Z_i$. Then an estimator of the regression function $m$ is
$$\bar m_{n,h}(x)=\hat g_{n,h}(\bar\theta^T_{n,h}x;\bar\theta_{n,h}).\qquad(7.5)$$
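A minimal sketch of the single-index criterion follows: it profiles the empirical squared error over candidate directions $\theta$ on synthetic data, with a Gaussian kernel, constant variance, and a leave-one-out fit of $g$ to avoid the degenerate zero-residual solution; all names and data are illustrative.

```python
import numpy as np

def g_hat(z, Zs, Y, h):
    """Nadaraya-Watson estimate of g at the index value z = theta^T x."""
    k = np.exp(-0.5 * ((z - Zs) / h) ** 2)
    return np.sum(k * Y) / np.sum(k)

def V_hat(theta, X, Y, h):
    """Empirical squared error V_hat_{n,h}(theta) for the single-index
    model m(x) = g(theta^T x), with constant variance."""
    Z = X @ theta
    resid = [Y[i] - g_hat(Z[i], np.delete(Z, i), np.delete(Y, i), h)
             for i in range(len(Y))]
    return np.mean(np.square(resid))

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
theta0 = np.array([np.cos(0.7), np.sin(0.7)])
Y = np.sin(X @ theta0) + 0.1 * rng.normal(size=300)
phis = np.linspace(0.3, 1.1, 9)     # directions theta(phi) on the circle
vals = [V_hat(np.array([np.cos(p), np.sin(p)]), X, Y, h=0.3) for p in phis]
print(phis[int(np.argmin(vals))])   # close to 0.7
```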
with sums over individuals having close values of the regression variable. Finally, a third estimator of the regression function is
$$\tilde m_{n,h}(x)=\hat g_{n,h}(\tilde\theta^T_{n,h}x;\tilde\theta_{n,h}).$$
Since the minimum of the limits $V(\theta)$ and $W(\theta)$ is $\theta_0$, all estimators converge to $\theta_0$. The minimization of $\hat V_{n,h}$ provides an estimator $\hat\theta_n$, solution of $\hat V^{(1)}_{n,h}(\theta)=0$, where
$$\hat V^{(1)}_{n,h}(\theta)=-2n^{-1}\sum_{i=1}^n\{Y_i-\hat g_{n,h}(\theta^TX_i;\theta)\}\,\hat g^{(1)}_{n,h}(\theta^TX_i;\theta)\,X_i.$$
The convergence rate of $v_{n,h}$ is the same as that of $\hat\theta_{n,h}-\theta$, which is $(nh^3)^{1/2}$ like $(\hat V^{(1)}_{n,h}-V^{(1)})(\theta_0)$, since the bias of $\hat V^{(1)}_{n,h}(\theta_0)$ disappears with $h_n=o(n^{-1/7})$. The process $(nh)^{1/2}u_{n,h}(x)=(nh)^{1/2}(\hat g_{n,h}-g)(\theta_0^Tx)\{1+o_p(1)\}$ is a $O_p(1)$, so the convergence rate of $\{\hat m_{n,h}(x)-m(x)\}$ is $(nh^3)^{1/2}$. The bandwidth minimizing the sum of the squared bias and the variance of $\hat V^{(1)}_{n,h}(\theta_0)$ is $h_{V,n}=O(n^{-1/7})$, and the convergence rate of $\hat V^{(1)}_{n,h}(\theta_0)$ is $n^{2/7}$ in that case. Note that with a bandwidth $h_n=O(n^{-1/7})$, the limit is a Gaussian variable with the finite mean $\lim_n(nh^7)^{1/2}V^{(2)-1}_{n,h}(\theta_0)V^{(1)}_{n,h}(\theta_0)$. This rate $h_n=O(n^{-1/7})$ is optimal for estimating the first derivative of a function of class $C^2$, and the biases were obtained under this assumption. This is a consequence of the approximation of $\hat\theta_{n,h}-\theta_0$ in terms of $(\hat V^{(1)}_{n,h}-V^{(1)})(\theta)$. The optimal local and global bandwidths for the estimators $\hat g_{n,h}$ and $\hat m_{n,h}$ are $O(n^{-1/(2s+3)})$; they are expressed as a ratio of their (integrated) bias and variance, and the mean squared errors for their estimation are $O(n^{-2s/(2s+3)})$.
the derivatives differ from those of the global error due to the kernel smoothing. The mean estimator has a smaller variance than the estimator minimizing the global error $\hat V_{n,h}$.
The sequence $(W^{(2)}_{n,h_n})_n$ is therefore a uniform $O(h^2+(nh^3)^{-1})$. Assuming that $nh_n^4$ tends to infinity with $n$, $W^{(1)}_{n,h_n}=O(h^2+(nh^2)^{-1})=O(h^2)$, and a necessary condition for the weak convergence of $(n^3h)^{1/2}\hat W^{(1)}_{n,h}$ is that its normalized mean converges, i.e. $h=O(n^{-3/5})$. Under this condition, $nh^5$ tends to zero and the convergence rate of $W^{(2)}_{n,h_n}$ is $h^2$. Arguing as for the estimator $\hat\theta_{n,h}$ related to $\hat V_{n,h}$, the estimator $\tilde\theta_{n,h}$ minimizing $\hat W_{n,h}$ is such that $(n^3h^5)^{1/2}(\tilde\theta_{n,h}-\theta_{n,h})$ is approximated by the variable $h^2\{\hat W^{(2)}_{n,h}(\theta_0)\}^{-1}n^{3/2}h^{1/2}(\hat W^{(1)}_{n,h}-W^{(1)}_{n,h})(\theta_0)$. It converges weakly to a Gaussian process with variance $v_{\theta_0}=W^{(2)-1}(\theta_0)\Sigma_{W,\theta_0}W^{(2)-1}(\theta_0)$. Following the arguments of the proof of the previous proposition, we obtain the following convergences.
and $\hat m_{n,h}=\hat g_{n,h}\circ\varphi_{\hat\theta_{n,h}}$.
Assume that the variance is constant and let $Z_i=\varphi_\theta(X_i)$ at fixed $\theta$. The derivatives of $\hat V_{n,h}$ are
$$\hat V^{(1)}_{n,h}(\theta)=-2n^{-1}\sum_{i=1}^n\{Y_i-\hat g_{n,h}(Z_i)\}\,\hat g^{(1)}_{n,h}(Z_i)\,\varphi^{(1)}_\theta(X_i),$$
$$\hat V^{(2)}_{n,h}(\theta)=-2n^{-1}\sum_{i=1}^n\big[\{Y_i-\hat g_{n,h}(Z_i)\}\hat g^{(2)}_{n,h}(Z_i)-\hat g^{(1)2}_{n,h}(Z_i)\big]\{\varphi^{(1)}_\theta(X_i)\}^{\otimes2}
-n^{-1}\sum_{i=1}^n\{Y_i-\hat g_{n,h}(Z_i)\}\,\hat g^{(1)}_{n,h}(Z_i)\,\varphi^{(2)}_\theta(X_i),$$
where $-\hat V^{(2)}_{n,h}(\theta)$ is a positive definite matrix converging to a finite limit $\Sigma_V(\theta)$ uniformly on the parameter space. The mean of $\hat V^{(1)}_{n,h}(\theta)$ and its variance have the same orders as for the derivative of (7.3) in model (7.1),
$$V^{(1)}_{n,h}(\theta)=2E[\{Y_i-\hat g_{n,h}\circ\varphi_\theta(X_i)\}\,\hat g^{(1)}_{n,h}\circ\varphi_\theta(X_i)\,\varphi^{(1)}_\theta(X_i)]
=2E\{[\{(m-g_{n,h})g^{(1)}_{n,h}-\mathrm{Cov}(\hat g_{n,h},\hat g^{(1)}_{n,h})\}\circ\varphi_\theta\,\varphi^{(1)}_\theta](X_i)\},$$
where the first term on the right-hand side is $O(h^2)$ and the last term is a $O((nh^2)^{-1})$. The variance of $\hat V^{(1)}_{n,h}(\theta)$ is a $O((nh^3)^{-1})$. An expansion of $V_{n,h}$ in a neighborhood of $\theta_0$ implies
$$(nh^3)^{1/2}(\hat\theta_{n,h}-\theta_0)=\{-\hat V^{(2)}_{n,h}(\theta_0)\}^{-1}(nh^3)^{1/2}\{\hat V^{(1)}_{n,h}(\theta_0)-V^{(1)}(\theta_0)\}+o_p(1).$$
The variance of $\hat V^{(1)}_{n,h}(\theta_0)$ is asymptotically equivalent to $(nh^3)^{-1}\Sigma_{V,\theta_0}+o((nh^3)^{-1})$, with the modified notation
$$\Sigma_{V,\theta_0}=4E[\varphi^{(1)\otimes2}_{\theta_0}(X)f_X^{-2}(X)\{g^2+\sigma_g^2\}\circ\varphi_{\theta_0}(X)w^2(\varphi_{\theta_0}(X))]\Big(\int K^{(1)2}\Big).$$
Proposition 7.3. Under Conditions 2.1 and 3.1, and with a bandwidth $h_n=o(n^{-1/7})$ for the estimation of a regression function $m$ in class $C^2$, the estimators of the parameter $\theta$ and of the function $m$ are consistent, $(nh^3)^{1/2}(\hat\theta_{n,h}-\theta_0)$ converges weakly to a centered Gaussian variable with variance $v_\theta$, and $(nh^3)^{1/2}(\hat m_{n,h}-m_{\theta_0})$ converges weakly to a centered Gaussian process with covariance $g^{(1)}\circ\varphi_{\theta_0}(x)\,v_{\theta_0}\,g^{(1)}\circ\varphi_{\theta_0}(x')\,\varphi^{(1)}_{\theta_0}(x)\otimes\varphi^{(1)}_{\theta_0}(x')$ at $(x,x')$.

With the optimal bandwidth $h_n=O(n^{-1/7})$, the convergence rates of the estimators are $n^{2/7}$ and the limiting distributions of the estimators have a non-zero mean, as in the previous section. The results of Proposition 7.2 are similar with these notations for the asymptotic variances.
7.4 Exercises
(1) Write the mean, the bias and the variance of the local mean squared error $\hat V_{n,h}(\theta;z)$ and define a sequence of optimal bandwidth functions for this criterion.
(2) Write the derivatives of $\hat V_{n,h}(\theta;z)$ and an approximation for the estimator $\hat\theta_{n,h}(z)$ minimizing $\hat V_{n,h}(\theta;z)$. Determine the orders of its bias and its variance.
Chapter 8
Diffusion processes
8.1 Introduction
Condition 8.1. There exists a mean density of the variables $(X_{t_i})_{1\le i\le n}$, defined as the limit
$$f(x)=\lim_{n\to\infty}n^{-1}\sum_{i=1}^nf_{X_{t_i}}(x)=Ef_{X_t}(x).$$
$$Z_i=Y_i-\Delta_n\hat\alpha_{n,h}(X_{t_i})=\Delta_n(\alpha-\hat\alpha_{n,h})(X_{t_i})+\beta(X_{t_i})\varepsilon_i,$$
its mean is $E\{Z_i\mid X_{t_i}=x\}=\Delta_nE(\alpha-\hat\alpha_{n,h})(X_{t_i})=\Delta_n(\alpha-\alpha_{n,h})(X_{t_i})$, its order is $\Delta_nO(h^2)$, and its variance satisfies
$$\Delta_n^{-1}\mathrm{Var}\{Z_i\mid X_{t_i}=x\}=\Delta_n^{-1}E\{\Delta_n(\alpha_{n,h}-\hat\alpha_{n,h})(X_{t_i})+\beta(X_{t_i})\varepsilon_i\}^2
=\Delta_n\mathrm{Var}\,\hat\alpha_{n,h}(X_{t_i})+\beta^2(x)
=(nh)^{-1}\kappa_2f^{-1}(x)+\beta^2(x).\qquad(8.4)$$
$$E\{\Delta_n^2(\hat\alpha_{n,h}-\alpha)^4(X_t)+\beta^4(X_t)\Delta_n^{-2}\varepsilon^4+2\beta^2(X_t)(\hat\alpha_{n,h}-\alpha)^2(X_t)\mid X_t=x\}-E^2(\Delta_n^{-1}Z_t^2\mid X_t=x)$$
$$=\beta^4(x)\Delta_n^{-2}E\varepsilon^4+\Delta_n^2E(\hat\alpha_{n,h}-\alpha)^4(x)+2\beta^2(x)\{v_{\alpha,n,h}(x)+b^2_{\alpha,n,h}(x)\}
-\{\beta^2(x)+v_{\alpha,n,h}(x)+b^2_{\alpha,n,h}(x)\}^2$$
$$=\beta^4(x)\{\Delta_n^{-2}E\varepsilon^4-1\}=2\beta^4(x)+o(1),$$
thus $\sigma^2_\beta$ is written in the form $\sigma^2_\beta(x)=2\kappa_2f^{-1}(x)\beta^4(x)$. Under the condition $h=h_T=O(T^{-1/5})$, let $c_\alpha=\lim_{T\to\infty}(Th_T^5)^{1/2}$.
Proposition 8.1. Under Conditions 2.1, 2.2, 3.1 and 3.2, for the functions $\alpha$ and $\beta$ in class $C^s(\mathcal{X})$, the estimators $\hat\alpha_{n,h}$ and $\hat\beta_{n,h}$ are uniformly consistent on $\mathcal{X}$, with
$$\|\hat\alpha_{n,h}(x)-\alpha_{n,h}(x)\|_p=O((Th)^{-1/p}),\qquad \|\hat\beta_{n,h}(x)-\beta_{n,h}(x)\|_p=O((nh)^{-1/p}),$$
The order for the bandwidths is the order of the optimal bandwidth for the asymptotic mean squared errors of estimation of $\alpha$. The conditions ensure a Lipschitz property for the second order moments of the increments of the processes, similar to Lemma 2.2 for the density. Moreover, the covariances develop as in the proof of Theorem 2.1. The variance of the variable $Y$ in model (8.2) being a function of $X$, the regression function $\alpha$ is also estimated by means of a weighted kernel as in Section 3.6, with the weighting variables $\hat w(X_{t_i})=\sigma_\alpha^{-1}(X_{t_i})$. As previously, the approximations of the bias and variance of the new estimator (3.19) of the drift function are modified by introducing $\hat w_n$, and its asymptotic distribution is modified.
With a partition of $[0,T]$ into subintervals $I_i$ of unequal lengths $\Delta_{n,i}$, varying with the observation times $t_i$ of the process, the variable $Y_i$ has to be normalized by $\Delta_{n,i}$, $1\le i\le n$. For every $x$ in $\mathcal{X}_{n,h}$, the estimators are
$$\hat\alpha_{n,h}(x)=\frac{\sum_{i=1}^n\Delta_{n,i}^{-1}Y_iK_h(x-X_{t_i})}{\sum_{i=1}^nK_h(x-X_{t_i})},\qquad
Z_{n,i}=\Delta_{n,i}^{-1/2}\{Y_i-\Delta_{n,i}\hat\alpha_{n,h}(X_{t_i})\},$$
$$\hat\beta^2_{n,h}(x)=\frac{\sum_{i=1}^nZ^2_{n,i}K_h(x-X_{t_i})}{\sum_{i=1}^nK_h(x-X_{t_i})}.$$
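A minimal sketch of these discretized-diffusion estimators follows, on an Euler-simulated Ornstein-Uhlenbeck path with drift $\alpha(x)=-x$ and $\beta=0.5$; the Gaussian kernel and the simulation setup are illustrative assumptions.

```python
import numpy as np

def drift_var_estimators(x_grid, X, dt, h):
    """Kernel estimators of alpha and beta^2 from discrete observations
    with steps dt_i, following the displayed formulas: Y_i = X_{t_{i+1}}
    - X_{t_i}, and Z_{n,i} = dt_i^{-1/2} (Y_i - dt_i alpha_hat(X_{t_i}))."""
    Xi, Yi = X[:-1], np.diff(X)
    k = np.exp(-0.5 * ((x_grid[:, None] - Xi[None, :]) / h) ** 2)
    denom = k.sum(axis=1)
    alpha_hat = (k * (Yi / dt)).sum(axis=1) / denom
    Z2 = (Yi - dt * np.interp(Xi, x_grid, alpha_hat)) ** 2 / dt
    beta2_hat = (k * Z2).sum(axis=1) / denom
    return alpha_hat, beta2_hat

rng = np.random.default_rng(8)
n, dt0 = 20000, 1e-3
dt = np.full(n, dt0)
X = np.zeros(n + 1)
for i in range(n):                       # Euler scheme for the OU process
    X[i + 1] = X[i] - X[i] * dt0 + 0.5 * np.sqrt(dt0) * rng.normal()
a, b2 = drift_var_estimators(np.linspace(-0.5, 0.5, 5), X, dt, h=0.1)
print(a, b2)                             # roughly -x and 0.25
```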
The results of Proposition 8.1 are satisfied, replacing the means of sums with terms $\Delta_{n,i}^{-1}$ by means with coefficient $n^{-1}\sum_{i=1}^n\Delta_{n,i}^{-1}$ and assuming that the lengths $\Delta_{n,i}$ have the order $n^{-1}T$. The optimal bandwidth for the estimation of $\alpha$ is $O(T^{-1/(2s+1)})$ and its asymptotic mean squared error is $AMSE_\alpha(x)=(Th)^{-1}\sigma^2_\alpha(x)+h^{2s}b^2_{\alpha,s}(x)$; it is minimal for the bandwidth function
$$h_{\alpha,AMSE}(x)=\Big\{\frac{(s!)^2\kappa_2T^{-1}\mathrm{Var}(\Delta_{n,i}^{-1}Y_{t_i})}{2s\,m^2_{sK}\{\mu^{(s)}_\alpha(x)-\alpha(x)f^{(s)}(x)\}^2}\Big\}^{1/(2s+1)}.$$
The optimal local bandwidth for estimating the variance function $\beta^2$ of the diffusion is a $O(n^{-1/(2s+1)})$ and it minimizes $AMSE_\beta(x)=(nh)^{-1}\sigma^2_\beta(x)+h^{2s}b^2_{\beta,s}(x)$.
A diffusion model including several explanatory processes in the coefficients $\alpha$ and $\beta$ may be written using an indicator process $(J_t)_t$ with values in a discrete space $\{1,\ldots,K\}$ as
$$dX_t=\sum_{k=1}^K\alpha_k(X_t)1_{\{J_t=k\}}\,dt+\sum_{k=1}^K\beta_k(X_t)1_{\{J_t=k\}}\,dB_t,\quad t\in[0,T],\qquad(8.5)$$
and the estimators of the $2K$ functions $\alpha_k$ and $\beta_k$ are defined for every $x$ in $\mathcal{X}_{n,h}$ by
$$\hat\alpha_{k,n,h}(x)=\frac{\sum_{i=1}^n\Delta_{n,i}^{-1}Y_iK_h(x-X_{t_i,k})}{\sum_{i=1}^nK_h(x-X_{t_i,k})},\qquad
\hat\beta^2_{k,n,h}(x)=\frac{\sum_{i=1}^n\Delta_{n,i}^{-1}Z_i^2K_h(x-X_{t_i,k})}{\sum_{i=1}^nK_h(x-X_{t_i,k})},$$
where
$$\mu_{\alpha,k,n,h}(x)=En^{-1}\sum_{i=1}^n\Delta_{n,i}^{-1}\int\int yK_h(x-s)\,dF_{X_{k,t_i},Y_{t_i}}(s,y)
=\mu_{\alpha_k}(x)+\frac{h^2}{2}m_{2K}\mu^{(2)}_{\alpha_k}(x)+o(h^2),\qquad \mu_{\alpha_k}(x)=f(x)\alpha_k(x),$$
$$\mu_{\beta_k,n,h}(x)=n^{-1}\sum_{i=1}^n\Delta_{n,i}^{-1}E[Z_i^2K_h(x-X_{k,t_i})]
=\mu_{\beta_k}(x)+\frac{h^2}{2}m_{2K}\mu^{(2)}_{\beta_k}(x)+o(h^2),\qquad \mu_{\beta_k}(x)=f(x)\beta_k^2(x).$$
The norms and the asymptotic behaviour of the estimators are the same as in Proposition 8.1. The two-dimensional model
$$dX_t=\alpha_X(Y_t)\,dt+\beta_X(X_t)\,dB_X(t),\qquad dY_t=\alpha_Y(Y_t)\,dt+\beta_Y(Y_t)\,dB_Y(t),$$
with independent Brownian motions $B_X$ and $B_Y$, is a special case where all parameters are estimated as before.
The process $\{X_t,t\in[0,1]\}$ is extended to a time interval $[0,T]$ by rescaling: $X_t=X_{Ts}$, with $s$ in $[0,1]$ and $t$ in $[0,T]$. The Gaussian process $B$ is mapped from $[0,1]$ onto $[0,T]$ by the same transform and $B_t=T^{1/2}B_{t/T}$ is the Brownian motion extended from $[0,1]$ to $[0,T]$. The observation of the sample path of the process $\{X_t,t\in[0,T]\}$ allows the construction of estimators similar to those for a smooth density and regression function in Sections 2.10 and 3.10, under the ergodic property (2.13). The Brownian motion $(B_t)_{t\ge0}$ is a martingale with respect to the filtration generated by $(B_u)_{u<t}$: $E(B_t-B_s\mid X_s)=0$ for every $0<s<t$. Its moments are $EB_t^{2k+1}=0$ and $EB_t^2=t$, thus $(B_t-B_0)^2$ has a $t\chi^2_1$ distribution and, for every integer $k$, $EB_t^{2k}=t^kG^{(k)}_{2k}(0)$ with the generating function $G_{2k}(t)=(1-2t)^{-k}$ of the $\chi^2_{2k}$ distribution, for $t<1/2$; hence $EB_t^4=3t^2$.
Estimators are built as for regression functions of processes, with the response process $Y_t=dX_t$, without derivability assumptions on the sample paths of $X$ since $B$ has only an $L^2$-derivative. The integrated drift function
$$A(t;X)=\int_0^t\alpha(X_s)\,ds$$
is estimated by $\hat A(t;X)=X_t-X_0$, thus $E\hat A(t;X)=A(t;X)$ and its variance equals $\mathrm{Var}\{\int_0^t\alpha(X_s)\,ds\}+E\int_0^t\beta^2(X_s)\,ds$. The drift function $\alpha(X_t)$ is estimated by smoothing the sample path of the process $X$ in a neighborhood of $X_t=x$:
$$\hat\alpha_{T,h}(x)=\frac{\int_0^TK_h(x-X_s)\,dX_s}{\int_0^TK_h(x-X_s)\,ds}.\qquad(8.7)$$
The estimators of the density and of $\mu_\alpha(x)=\alpha(x)f_X(x)$ defining (8.7) are
$$\hat f_{X,T,h}(x)=T^{-1}\int_0^TK_h(x-X_s)\,ds,\qquad
\hat\mu_{\alpha,T,h}(x)=T^{-1}\int_0^TK_h(x-X_s)\,dX_s.$$
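A minimal sketch of the drift estimator (8.7) follows; the stochastic and time integrals are approximated by Riemann-Itô sums over a finely discretized simulated path (Ornstein-Uhlenbeck here, with $\alpha(x)=-x$), and a Gaussian kernel stands in for $K$.

```python
import numpy as np

def alpha_hat_T(x_grid, X, dt, h):
    """Discretization of (8.7): int K_h(x - X_s) dX_s ~ sum K_h(x - X_i) dX_i,
    int K_h(x - X_s) ds ~ sum K_h(x - X_i) dt."""
    Xi, dXi = X[:-1], np.diff(X)
    k = np.exp(-0.5 * ((x_grid[:, None] - Xi[None, :]) / h) ** 2)
    return (k * dXi).sum(axis=1) / (k.sum(axis=1) * dt)

rng = np.random.default_rng(9)
n, dt = 50000, 1e-3
X = np.zeros(n + 1)
for i in range(n):                       # OU path with alpha(x) = -x
    X[i + 1] = X[i] - X[i] * dt + 0.3 * np.sqrt(dt) * rng.normal()
print(alpha_hat_T(np.array([-0.3, 0.0, 0.3]), X, dt, h=0.08))  # ~ 0.3, 0, -0.3
```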
Their limits are expressed with the mean marginal density of the process,
$$f_X(x)=\lim_{T\to\infty}T^{-1}E\int_0^Tf_{X_s}(x)\,ds,$$
and the mixing property of the sample path of the process $X$ implies that
$$f_X(x)-T^{-1}E\int_0^Tf_{X_s}(x)\,ds=O(T^{-1/2}).$$
Assuming that the kernel satisfies Conditions 2.1-2.2, their moments are approximated using Taylor expansions and the properties of the Brownian motion, with covariance function $E(B_sB_t)=s\wedge t$. With a diffusion process $X$, their expectations are
$$E\hat f_{X,T,h}(x)=\int_{I_X}K_h(u-x)f_X(u)\,du=f_X(x)+\frac{h^2}{2}m_{2K}f^{(2)}_X(x)+o(h^2),$$
$$E\hat\mu_{\alpha,T,h}(x)=T^{-1}E\int_0^TK_h(x-X_s)\{\alpha(X_s)\,ds+\beta(X_s)\,dB_s\}
=\int_{I_X}\alpha(u)K_h(u-x)f_X(u)\,du
=\alpha(x)f_X(x)+\frac{h^2}{2}m_{2K}(\alpha f_X)^{(2)}(x)+o(h^2),$$
so the bias of the estimator of $\mu_\alpha(x)$ is $h^2b_{\mu_\alpha}(x)=h^2m_{2K}(\alpha f_X)^{(2)}(x)/2+o(h^2)$. Its variance $T^{-2}E\{\int_0^TK_h(X_t-x)\{dX_t-\alpha(X_t)\,dt\}\}^2$ is expanded using the ergodicity property (2.16) as in Section 2.10, writing the covariance of $T^{-1}\int_0^TK_h(X_s-x)\beta(X_s)\,dB_s$ and $T^{-1}\int_0^TK_h(X_t-x)\beta(X_t)\,dB_t$ as a sum $I_d(T)+I_o(T)$, where $I_d(T)=T^{-2}E\int_0^TK_h^2(X_t-x)\beta^2(X_t)\,dt$ develops as
$$I_d(T)=T^{-1}\int_{I_X}K_h^2(u-x)\beta^2(u)f_X(u)\,du=(Th_T)^{-1}\kappa_2\beta^2(x)f_X(x)+o((Th_T)^{-1})$$
and the expectation $I_o(T)$ is expanded using the ergodicity property, with $\int_0^T\int_0^Td(s\wedge t)=2\int_0^T(T-s)\,ds=T^2$ and the notation $\alpha_h(u,v)=|u-v|/2h_T$,
$$I_o(T)=\int_{I_X^2\setminus D_X}K_h(u-x)K_h(v-x)\beta(u)\beta(v)\,dF_{X_s,X_t}(u,v)\,du\,dv
=\int_{I_X\setminus\{u\}}\int_{-1}^1K(z-\alpha_h(u,v))K(z+\alpha_h(u,v))\,dz\cdots$$
$\|\widehat\mu_{\alpha,T,h}(x) - \mu_{\alpha,T,h}(x)\|_p = O((Th_T)^{-1/p})$ and the approximation (3.2) is also satisfied for the estimator $\widehat\alpha_{T,h}$. It follows that the estimator $\widehat\alpha_{T,h}(x)$ of a drift function $\alpha$ in class $C_s$, for $s \ge 2$, has a bias and a variance
\begin{align*}
b_{\alpha,T,h}(x;s) &= h_T^s b_\alpha(x;s) + o(h_T^s),\\
b_\alpha(x;s) &= \frac{m_{sK}}{s!} f_X^{-1}(x)\{(\alpha f_X)^{(s)}(x) - \alpha(x) f_X^{(s)}(x)\},\\
v_{\alpha,T,h}(x) &= (Th_T)^{-1}\{\sigma^2_\alpha(x) + o(1)\}, \qquad \sigma^2_\alpha(x) = \kappa_2 f_X^{-1}(x)\beta^2(x),
\end{align*}
so they have the same expressions as in the discretized regression model (8.2), and the covariance of $\widehat\alpha_{T,h}(x)$ and $\widehat\alpha_{T,h}(y)$ tends to zero. Let
$$Z_t = X_t - X_0 - \int_0^t \widehat\alpha_{T,h}(X_s)\, ds = \int_0^t (\alpha - \widehat\alpha_{T,h})(X_s)\, ds + \int_0^t \beta(X_s)\, dB_s, \qquad (8.8)$$
its expectation conditionally on the filtration generated by the process $X$ up to $t^-$ is $E(Z_t \mid \mathcal F_{t^-}) = -\int_0^t b_{\alpha,T,h}(X_s)\, ds = O(h^2)$ for every $t > 0$, and the main term of its conditional variance
\begin{align*}
Var(Z_t \mid X_t) = Var\Big\{\int_0^t \widehat\alpha_{T,h}(X_s)\, ds\Big\} &+ \int_0^t \beta^2(X_s)\, ds\\
- 2\, Cov\Big\{\int_0^t \widehat\alpha_{T,h}(X_s)\, ds, \int_0^t \beta(X_s)\, dB_s\Big\} &+ O((Th_T)^{-1})
\end{align*}
is $\int_0^t \beta^2(X_s)\, ds$.
The variance function $\beta^2(X_t)$ is therefore consistently estimated by
$$\widehat\beta^2_{T,h}(x) = \frac{2\int_0^T Z_s K_h(X_s - x)\, dZ_s}{\int_0^T K_h(X_s - x)\, ds}. \qquad (8.9)$$
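Continuing the simulation sketch given after (8.7) (again an illustration, not the book's text): on a fine grid the compensated increments $dZ_i \approx \Delta X_i - \alpha(X_{t_i})\, dt$ satisfy $E\{(dZ_i)^2 \mid X_{t_i} = x\} \approx \beta^2(x)\, dt$, so a discrete analogue of (8.9) smooths the squared increments; plugging in the true drift, as done below for clarity, is an assumption of the example.

```python
def beta2_hat(x, h=0.2):
    # discrete analogue of (8.9): kernel-weighted squared compensated increments,
    # reusing X, theta, dt, sigma and K_h from the sketch after (8.7)
    w = K_h(x - X[:-1], h)
    dZ = np.diff(X) + theta * X[:-1] * dt  # increments minus the (true) drift part
    return np.sum(w * dZ ** 2) / (np.sum(w) * dt)

print(beta2_hat(0.5), sigma ** 2)  # estimate versus beta^2(x) = sigma^2
```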
Under Conditions (2.1) and (3.1) for the functions $\alpha$ and $\beta$ in class $C_s(\mathcal X)$, the bias of the estimator $\widehat\beta^2_{T,h}$ is
$$b_{\beta,T,h}(x) = h^s b_\beta(x;s) + o(h^s), \qquad b_\beta(x;s) = \frac{m_{sK}}{s!} f^{-1}(x)\{(f\beta^2)^{(s)}(x) - \beta^2(x) f^{(s)}(x)\}.$$
Its variance is $v_{\beta,T,h}(x) = (Th)^{-1}\{\sigma^2_\beta(x) + o(1)\}$ where $\sigma^2_\beta(x)$ is the first term in the expansion of $\kappa_2 f^{-1}(x) Var(Z_t^2 \mid X_t = x)$, calculated as in the discrete model, that is $\sigma^2_\beta(x) = 2\kappa_2 f^{-1}(x)\beta^4(x)$, as in Proposition 8.1.
Under the previous conditions, the processes $(Th_T)^{1/2}(\widehat\alpha_{T,h} - \alpha - b_{\alpha,T,h})$ and $(Th_T)^{1/2}(\widehat\beta^2_{T,h} - \beta^2 - b_{\beta^2,T,h})$ converge weakly to centered Gaussian processes with covariances zero and respective variance functions $\sigma^2_\alpha$ and $\sigma^2_\beta$.
The same expansions as for the variances of $\widehat\mu_{T,h}(x)$ and $\widehat f_{X,T,h}(x)$ in Section 2.10 prove that the finite dimensional distributions of the processes $(Th_T)^{1/2}(\widehat\alpha_{T,h} - \alpha - b_{\alpha,T,h})$ and $(Th_T)^{1/2}(\widehat\beta_{T,h} - \beta - b_{\beta,T,h})$ converge to those of a centered Gaussian process with covariances zero and variance functions $\sigma^2_\alpha$ and $\sigma^2_\beta$. Lemma 3.3 generalizes and the increments $E\{\widehat\alpha_{T,h}(x) - \widehat\alpha_{T,h}(y)\}^2$ and $E\{\widehat\beta_{T,h}(x) - \widehat\beta_{T,h}(y)\}^2$ are approximated by $O(|x-y|^2(Th_T^3)^{-1})$ for every $x$ and $y$ in $I_{X,h}$ such that $|x-y| \le 2h_T$. Then the processes $(Th_T)^{1/2}\{\widehat\alpha_{T,h} - \alpha\}1_{\{I_{X,T}\}}$ and $(Th_T)^{1/2}\{\widehat\beta_{T,h} - \beta\}1_{\{I_{X,T}\}}$ converge weakly to $\sigma_\alpha W_1 + \gamma^{1/2}b_\alpha$ and $\sigma_\beta W_2 + \gamma^{1/2}b_\beta$, respectively, where $W_1$ and $W_2$ are centered Gaussian processes on $I_X$ with variance 1 and covariances zero. The covariance $C_{\alpha,\beta,T,h}(x,y)$ of $\widehat\alpha_{T,h}(x)$ and $\widehat\beta_{T,h}(y)$, with $2|x-y| > h_T$, develops using the approximation (3.2) as
\begin{align*}
\{f(x)f(y)T\}^{-2}\Big[&E\Big\{\int_0^T K_h(X_s - x)\beta(X_s)\, dB_s\Big\}\Big\{\int_0^T K_h(X_t - y)(2Z_t\, dZ_t - \beta^2(X_t)\, dt)\Big\}\\
&- E\,\alpha(x)\Big\{\int_0^T K_h(X_s - y)\beta(X_s)\, dB_s\Big\}(\widehat f_{T,h} - f_{T,h})(x)\\
&- E\,\beta(y)\Big\{\int_0^T K_h(X_t - x)(2Z_t\, dZ_t - \beta^2(X_t)\, dt)\Big\}(\widehat f_{T,h} - f_{T,h})(y)\Big]\\
&+ \alpha(x)\beta(y)(Th_T)^{-1} Cov(\widehat f_{T,h}(x), \widehat f_{T,h}(y)),
\end{align*}
it is therefore an $o((Th_T)^{-1})$.
According to the local optimal bandwidths defined in the previous sections, the estimators $\widehat\alpha_{n,h}$ and $\widehat\beta_{n,h}$ are calculated with functional bandwidth sequences $(h_n(x))_n$ or $(h_T(x))_T$. The assumptions for the convergence of these sequences are similar to those for the nonparametric regression with a functional bandwidth, and the results of Chapter 4 apply immediately to the estimators of the discretized or continuous processes (8.2) and (8.1).
This condition is satisfied if the jump part of the process $X_t$ satisfies the property (8.3) and the limit is $f_N(x) = \lim_{T\to\infty} T^{-1}E\int_0^T f_{X_s}(x)\, d\widetilde N(s)$. The diffusion process $X_t$ defined by (8.10) has the mean
$$\mu_T = EX_0 + E\int_0^T \alpha(X_s)\, ds = EX_0 + \int_{\mathcal X}\int_0^T \alpha(x) f_{X_s}(x)\, ds\, dx = EX_0 + T\int_{\mathcal X} \alpha(x) f(x)\, dx$$
and the variance of the normalized variable $T^{-1/2}(X_T - \mu_T)$ is finite if the integrals
\begin{align*}
S_\alpha &= ET^{-1}\int_0^T \alpha^2(X_s)\, ds = \int_{\mathcal X} \alpha^2(x) f(x)\, dx + o(1),\\
S_\beta &= ET^{-1}\int_0^T \beta^2(X_s)\, ds = \int_{\mathcal X} \beta^2(x) f(x)\, dx + o(1),\\
S_\gamma &= ET^{-1}\int_0^T \gamma^2(X_s)\, d\widetilde N(s) = \int_{\mathcal X} \gamma^2(x) f_N(x)\, dx + o(1)
\end{align*}
are finite.
Let $a$ in $\mathbb R$ and $T_a = \inf\{s \in [0,1]; B_X(s) = a\}$ be a stopping time for the process $B_X$; then for every $\theta \ge 0$
$$E\exp\{\theta S_X(T_a)\} = \exp(-a\sqrt{2\theta}).$$
Let $a$ in $\mathbb R$ and $T_{T,a} = \inf\{s \in [0,1]; W_{T,s} = a\}$ be a stopping time for the process $W_{T,s}$.

Corollary 8.1. For every $\theta \ge 0$, $E\exp\{\theta S_X(T_{T,a})\}$ converges to $\exp(-a\sqrt{2\theta})$ as $T$ tends to infinity.
for $x$ in $\mathcal X_{n,h}$. Let $\Delta_n^{-1}$ denote $n^{-1}\sum_{i=1}^n \Delta_{n,i}^{-1}$ and $E\widetilde N_t = \int_0^t g(s)\lambda(s)\, ds$; then the mean of $\widehat\alpha_{n,h}(x)$ is approximated by
$$\alpha_{n,h}(x) = \alpha(x) + \frac{h^2}{2} m_{2K} f^{-1}(x)\{(f\alpha)^{(2)}(x) - \alpha(x) f^{(2)}(x)\} + o(h^2).$$
The variance of $\widehat\alpha_{n,h}(x)$ is a $O((Th)^{-1})$
\begin{align*}
v_{\alpha,n,h}(x) &= n^{-1}\sum_{i=1}^n \Delta_{n,i}^{-2}(nh)^{-1}\{\sigma^2_\alpha(x) + o(1)\},\\
\sigma^2_\alpha(x) &= \kappa_2 f^{-1}(x)\Delta_n^{-1} Var(Y_t \mid X_t = x) = \kappa_2 f^{-1}(x)\{\beta^2(x) + \gamma^2(x) g(t)\lambda(t)\}
\end{align*}
and its covariances tend to zero. The process $(Th)^{1/2}(\widehat\alpha_{n,h} - \alpha)$ has the asymptotic variance $\kappa_2\sigma^2_\alpha(x) f_X^{-1}(x)$ at $x$.
The discrete part of $X$ is $X^d(t) = \sum_{s\le t} \gamma(X_s)\Delta N_s$ and its continuous part is $X^c(t) = \int_0^t \alpha(X_s)\, ds + \int_0^t \beta(X_s)\, dB_s - \int_0^t \gamma(X_s)\, d\widetilde N_s$, with variations on $(t_i, t_{i+1})$
$$\Delta X^c_i = \alpha(X_{t_i})\Delta_{n,i} + \beta(X_{t_i})\Delta B_{t_i} - \gamma(X_{t_i})\Delta_{n,i} Y(t_i)\lambda(t_i) = O_p(\Delta_{n,i}).$$
Then the sum of its jumps converges to $\int_0^t E\gamma(X_s)\, g(s)\, d\Lambda_s$. Let $(T_i)_{1\le i\le N(T)}$ be the jump times of the process $N$. The jumps $\Delta X^d(T_i) = \gamma(X_{T_i})$ yield a consistent estimator of $\gamma(x)$, for $x$ in $\mathcal X_{n,h}$
$$\widehat\gamma_{n,h}(x) = \frac{\sum_{1\le i\le N(T)} \Delta X^d(T_i) K_h(x - X_{T_i})}{\sum_{1\le i\le N(T)} K_h(x - X_{T_i})} = \frac{\sum_{1\le i\le N(T)} \gamma(X_{T_i}) K_h(x - X_{T_i})}{\sum_{1\le i\le N(T)} K_h(x - X_{T_i})}.$$
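Since $\widehat\gamma_{n,h}$ is a kernel regression (of Nadaraya-Watson type) of the observed jump sizes on the pre-jump states, a minimal hedged sketch can illustrate it; the jump-size function $\gamma(x) = 0.3\cos x$ and the distribution of the states $X_{T_i}$ are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def K_h(u, h):
    # normalized Gaussian kernel, as in the sketch after (8.7)
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)

# hypothetical pre-jump states and jump sizes Delta X^d(T_i) = gamma(X_{T_i})
x_jump = rng.uniform(-1.0, 1.0, 200)
jump_sizes = 0.3 * np.cos(x_jump)

def gamma_hat(x, h=0.2):
    # kernel-weighted mean of the jump sizes around x
    w = K_h(x - x_jump, h)
    return np.sum(w * jump_sizes) / np.sum(w)

print(gamma_hat(0.0), 0.3)  # estimate versus gamma(0) = 0.3
```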
The expectation of $\widehat\gamma_{n,h}(x)$ is approximated by the ratio of the means of the numerator and the denominator. For the numerator
$$ET^{-1}\int_0^T \gamma(X_s) K_h(x - X_s)\, dN_s = (\gamma f_N)(x) + \frac{h^2}{2} m_{2K}\{(\gamma f_N)(x)\}^{(2)} + o(h^2)$$
and, for the denominator, $ET^{-1}\int_0^T K_h(x - X_s)\, dN_s = f_N(x) + \frac{h^2}{2} m_{2K} f_N^{(2)}(x) + o(h^2)$. The bias of $\widehat\gamma_{n,h}(x)$ is then
$$b_{\gamma,n,h}(x) = \frac{h^2}{2} m_{2K}\{f_N(x)\}^{-1}[\{\gamma(x) f_N(x)\}^{(2)} - \gamma(x) f_N^{(2)}(x)] + o(h^2),$$
also denoted $b_{\gamma,n,h}(x) = h^2 b_\gamma$. The variance of $\widehat\gamma_{n,h}(x)$ is deduced from the variance of the numerator
$$T^{-2}E\int_0^T K_h^2(x - X_s)\, dN_s = (Th)^{-1}\Big\{\kappa_2 f_N(x) + \frac{h^2}{2}\kappa_{22} f_N^{(2)}(x)\Big\} + o(T^{-1}h_T)$$
$$E\{\Delta_n^2(\widehat\alpha_{n,h} - \alpha)^4(x) + \beta^4(x)\Delta_n^{-2}\varepsilon^4 + (\gamma - \widehat\gamma_{n,h})^4(x)\Delta_n^{-2}\eta^4\}$$
In model (8.10), the estimator $\widehat\alpha_{T,h_T}$ of Section 8.3 is unchanged and new estimators of the functions $\beta$ and $\gamma$ must be defined from the continuous observation of the sample path of $X$. The discrete part of $X$ is also written $X^d_t = \int_0^t \gamma(X_s)\, dN_s$ and the point process $N$ is rescaled as $N_t = N_{Ts}$, with $t$ in $[0,T]$ and $s$ in $[0,1]$. Let
$$N_T(s) = T^{-1}N_{Ts}, \qquad X_T(s) = T^{-1}X_{Ts}, \qquad t \in [0,T],\ s \in [0,1].$$
The predictable compensator of $N_T$ is written $\widetilde N_T(t) = T^{-1}\int_0^t Y_T(s)\lambda(s)\, ds$ on $[0,1]$ and it is assumed to converge uniformly on $[0,1]$ to its mean $E\widetilde N_T(t) = \int_0^t g(s)\lambda(s)\, ds$, in probability. Then $X^d_T(t)$ converges uniformly in probability to $\int_0^t E\gamma(X_T(s))\, g(s)\, d\Lambda(s)$. The continuous part of $X$ is $dX^c_t = \alpha(X_t)\, dt + \beta(X_t)\, dB_t - \gamma(X_t)Y_t\lambda_t\, dt$. A consistent estimator of $\gamma(x)$, for $x$ in $I_{X,T,h}$, is
$$\widehat\gamma_{T,h}(x) = \frac{\int_0^T K_h(x - X_s)\, dX^d(s)}{\int_0^T K_h(x - X_s)\, dN(s)} = \frac{\int_0^T K_h(x - X_s)\gamma(X_s)\, dN(s)}{\int_0^T K_h(x - X_s)\, dN(s)};$$
it is identical to the estimator previously defined for the discrete diffusion process. Its moments calculated in the continuous model (8.10) are identical to those of Section 8.4, then the process $(Th_T)^{1/2}(\widehat\gamma_{T,h_T} - \gamma - c_\gamma b_\gamma)$ converges weakly to a centered Gaussian process with variance function $v_\gamma$ and covariances zero.
$$= \beta^2(x) f(x) + \frac{h^2}{2} m_{2K}(\beta^2(x) f(x))^{(2)} + o(h^2)$$
and its bias is denoted $b_{\beta,T,h} = h^2 b_\beta + o(h^2)$, with
$$b_\beta = \frac{1}{2} m_{2K} f^{-1}(x)\{(f\beta^2)^{(2)}(x) - \beta^2(x) f^{(2)}(x)\}.$$
Under Conditions (2.1) and (3.1) for the function $\beta$ in class $C_2$, the variance of the estimator $\widehat\beta_{T,h}$ is obtained from $E(Z_t^2 \mid X) = \int \beta^2(X_s)\, ds$, $Var(Z_t^2 \mid X) = O(t)$ and expanding
$$ET^{-2}\int_0^T\!\!\int K_h^2(x - y) Var(Z_t^2 \mid X_t = y) f_{X_t}(y)\, dy\, dt = O((hT)^{-1}),$$
it is therefore written $\sigma^2_{\beta,T,h} = (hT)^{-1}v_\beta + o((hT)^{-1})$. Then the process $(Th_T)^{1/2}(\widehat\beta_{T,h} - \beta - h_T^2 b_\beta)$ converges weakly to a centered Gaussian process with variance function $v_\beta$ and covariances zero.
The convergence rate of the process $\widehat\xi_{n,h}$ is therefore $(nh_n)^{1/2}$ and the finite dimensional distributions of $(nh_n)^{1/2}(\widehat\xi_{n,h} - \xi_{n,h})$ converge to those of a Gaussian process with mean zero, as normalized sums of the independent variables defined as the weighted quadratic variations of the increments of $Z$. The covariances of $(nh_n)^{1/2}(\widehat\xi_{n,h} - \xi_{n,h})$ are zero except on the interval $[-h_n, h_n]$ where they are bounded, hence the covariance function converges to zero. The quadratic variations of $\widehat\xi_{n,h}$ satisfy a Lipschitz property of moments: it is then a $O((nh_n^3)^{-1}|x - y|^2)$ for $|x - y| \le 2h_n$. It follows that the process $(nh_n)^{1/2}(\widehat\xi_{n,h} - \xi_{n,h})$ converges weakly to a continuous process with mean zero, variance function $2v^2$ and covariances zero.
The singularity function of the spatial covariance of a Gaussian process $Z$ is estimated by smoothing the estimator of the integrated spatial transform of $Z$ on $[0,1]^3$; the convergence rate of the estimator is then $(nh^3)^{1/2}$.
8.7 Exercises
(1) Calculate the moments of the estimators for the continuous process
(8.6) and write the necessary ergodic conditions for the convergences
in this model.
(2) Calculate the bias and variance of derivatives of the estimators of func-
tions α, β and γ in the stochastic differential equations model (8.10).
(3) Prove Proposition 8.2.
Chapter 9
Let $(\mathcal X, \|\cdot\|)$ be a metric space and $(X_t)_{t\in\mathbb N}$ be a time series defined on $\mathcal X^{\mathbb N}$ by its initial value $X_0$ and a recursive equation $X_t = m(X_{t-p}, \ldots, X_{t-1}) + \varepsilon_t$, where $m$ is a parametric or nonparametric function defined on $\mathcal X^p$ for some $p \ge 1$ and $(\varepsilon_t)_t$ is a sequence of independent noise variables with mean zero and variance $\sigma^2$, such that for every $t$, $\varepsilon_t$ is independent of $(X_{t-p}, \ldots, X_{t-1})$.
$$\widehat\mu_{t,k} = \frac{1}{k+1}\sum_{i=0}^k X_{t-i},$$
for a lag $k$ up to $t$. The transformed series is $X_t - \widehat\mu_{t,k} = \frac{k}{k+1}X_t - \frac{1}{k+1}\sum_{i=1}^k X_{t-i}$ and it equals $(X_t - X_{t-1})/2$ for $k = 1$. A polynomial trend is estimated by minimizing the empirical mean squared error of the model; the transformed series $X_t - \widehat\mu_{t,k}$ is then expressed by means of moving averages of higher order, according to the degree of the polynomial model.
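A hedged sketch of this detrending step (an illustration only; the toy series and window length are assumptions):

```python
import numpy as np

def detrend(X, k):
    # transformed series X_t - mu_hat_{t,k}, the moving average over the
    # window {t-k, ..., t} of k+1 values; undefined for t < k
    out = np.full(len(X), np.nan)
    for t in range(k, len(X)):
        out[t] = X[t] - X[t - k:t + 1].mean()
    return out

X = np.array([1.0, 2.0, 4.0, 7.0])
print(detrend(X, 1))  # for k = 1 the entries equal (X_t - X_{t-1})/2
```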
Consider the auto-regressive process with nonparametric mean
$$X_t = \mu_t + \alpha X_{t-1} + \varepsilon_t, \quad t \in \mathbb N, \qquad (9.1)$$
$$X_t = \sum_{k=0}^{t-1}\mu_{t-k}\alpha^k + \alpha^t X_0 + \sum_{k=0}^{t-1}\alpha^k\varepsilon_{t-k}.$$
The weak convergence of $t^{1/2}(\widehat m_t - m_t)$, when $|\alpha| < 1$, and of $t\alpha^{-t}(\widehat m_t - m_t)$, when $|\alpha| > 1$, follows from martingale properties of the time series, which imply its ergodicity and a mixing property (Appendix D). If $|\alpha| \ne 1$,
$$\widehat\alpha_t - \alpha = \frac{\sum_{k=1}^t (X_{k-1} - \widehat m_{k-1})((1-\alpha)(m_k - \widehat m_k) + \varepsilon_k)}{\sum_{k=1}^t (X_{k-1} - \widehat m_{k-1})^2};$$
it is therefore approximated in the same way as in the model AR(1) and it converges weakly with the same rate as in this model.
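For the special case of a constant mean, the ratio above reduces to the usual least-squares estimator of $\alpha$; a minimal hedged sketch (the simulated series and its parameters are assumptions of the example):

```python
import numpy as np

def ar1_fit(X):
    # least-squares estimator of alpha in X_t - m = alpha (X_{t-1} - m) + eps_t,
    # with the constant mean m estimated by the sample mean
    Xc = X - X.mean()
    return np.sum(Xc[1:] * Xc[:-1]) / np.sum(Xc[:-1] ** 2)

rng = np.random.default_rng(2)
alpha, m, t = 0.6, 1.0, 5000
X = np.empty(t)
X[0] = m
for k in range(1, t):
    X[k] = m * (1 - alpha) + alpha * X[k - 1] + rng.normal(0.0, 1.0)

print(ar1_fit(X), alpha)  # estimate versus the true coefficient
```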
The vector $\alpha$ is estimated by $\widehat\alpha_T = \arg\min_{\alpha\in]-1,1[^d} l_{T,K,t}(\alpha)$. For the first order derivative, $T^{1/2}\dot l_{T,K,t}(\alpha_0)$ converges weakly to a centered limiting distribution, and the second order derivative $\ddot l_{T,K,t}$ converges in probability to a positive definite matrix $E\ddot l_{T,K,t}$ which does not depend on $\alpha$. Then the estimator of $\alpha$ satisfies $T^{1/2}(\widehat\alpha_{T,K,t} - \alpha_0) = \ddot l^{-1}_{T,K,t} T^{1/2}\dot l_{T,K,t}(\alpha_0) + o(1)$. The estimator $\widehat\alpha_T$ is consistent and its weak convergence rate is $T^{1/2}$ if all components of the vector $\alpha$ have a norm smaller than 1. The function $\psi$ is
In the auto-regressive model AR(1) with independent errors with mean zero and variance $\sigma^2$, for $k \ge 1$, the variable $X_k$ is expressed from the initial value as
$$X_k - m = \alpha^k(X_0 - m) + S_{k,\alpha}, \quad\text{where } S_{k,\alpha} = \sum_{j=1}^k \alpha^{k-j}\varepsilon_j = \sum_{j=0}^{k-1}\alpha^j\varepsilon_{k-j}.$$
Let $B$ be the standard Brownian motion. If $|\alpha| < 1$, the process $S_{[ns],\alpha}$ defined up to the integer part of $ns$ converges weakly to $\sigma B\{(1-\alpha^2)^{1/2}\}^{-1}$; if $\alpha = 1$, the process $n^{-1/2}S_{[ns],1}$ converges weakly to $\sigma B$; and if $|\alpha| > 1$, the process $\alpha^{-[ns]}S_{[ns],\alpha}$ converges weakly to $\sigma B\{(\alpha^2 - 1)^{1/2}\}^{-1}$. The independence of the error variables $\varepsilon_j$ implies $ES_{k,\alpha}S_{k+s,\alpha} = \alpha^s Var S_{k,\alpha}$, so $E(X_k - m)(X_{k+s} - m) = \alpha^{2k+s} Var X_0 + \alpha^s Var S_{k,\alpha}$ and the covariance
function of the series is not stationary. The estimator (9.2) of the variance
$\sigma^2$ is defined as the empirical variance of the estimators of the noise variables, which are identically distributed and independent. In the same way, the covariance is estimated by
$$\widehat\rho_{t,k} = \frac{1}{t-k}\sum_{i=k+1}^t \{X_i - \widehat m_t - \widehat\alpha_t(X_{i-1} - \widehat m_t)\}\{X_{i-k} - \widehat m_t - \widehat\alpha_t(X_{i-k-1} - \widehat m_t)\};$$
the estimators $\widehat\sigma^2_t$ and $\widehat\rho_{t,k}$ are consistent (Pons 2008). The estimators are defined in the same way in an auto-regressive model of order $p$, with scalar products $\widehat\alpha_t^T X_{i-1}$ and $\widehat\alpha_t^T X_{i-k-1}$ for $p$-dimensional variables $X_{i-1}$ and $X_{i-k-1}$. In model (9.1), the expansion (9.6) of the variables centered by the mean function is not modified, and the covariance $E(X_k - m_k)(X_{k+s} - m_{k+s})$ has the same expression, depending only on the variances of the initial value and of $S_{k,\alpha}$, on $\alpha$ and on the rank of the observations. In auto-regressive series with deterministic models of the mean, the covariance estimator is modified by the corresponding estimator of the mean. In model (9.3), the covariance estimator becomes
$$\widehat\rho_{t,k} = \frac{1}{t-k}\sum_{i=k+1}^t \{X_i - \widehat m_{t,h}(X_i)\}\{X_{i-k} - \widehat m_{t,h}(X_{i-k})\}$$
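A hedged sketch of the residual covariance estimator (an illustration; it reuses ar1_fit, numpy and the simulated series X from the sketch above, and the constant-mean AR(1) form of the residuals is an assumption):

```python
def rho_hat(X, k):
    # empirical autocovariance of the estimated noise variables
    # eps_hat_i = (X_i - m_hat) - alpha_hat (X_{i-1} - m_hat)
    m_hat, a_hat = X.mean(), ar1_fit(X)
    e = (X[1:] - m_hat) - a_hat * (X[:-1] - m_hat)
    return np.sum(e[k:] * e[:-k]) / (len(e) - k)

print(rho_hat(X, 1), rho_hat(X, 5))  # both near zero for independent errors
```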
estimated by
$$\widehat\sigma^2_t(\tau) = \tau^{-1}\sum_{k=1}^\tau \widehat\varepsilon^2_{\tau,k} + (t-\tau)^{-1}\sum_{k=\tau+1}^t \widehat\varepsilon^2_{t,k}, \qquad \widehat\gamma_t = \arg\min_{\tau\in[0,t]}\widehat\sigma^2_t(\tau).$$
The change-point estimator is approximated by
$$\widehat\gamma_t = \arg\min_{\tau\in[0,t]} t^{1/2}\Big\{\frac{1}{\tau}\sum_{k=\tau_0+1}^\tau (X_k - \mu_\alpha - \alpha X_{k-1})^2 - \frac{1}{t-\tau}\sum_{k=\tau_0+1}^\tau (X_k - \mu_\beta - \beta X_{k-1})^2 - \gamma_0\Big\} + o_p(1).$$
Let
$$\widehat\sigma^2_{\tau,t,h} = t^{-1}\sum_{i=1}^t \{Y_i - I_{\tau,i}\,\widehat m_{1,t,h}(X_i,\tau) - (1 - I_{\tau,i})\,\widehat m_{2,t,h}(X_i,\tau)\}^2, \qquad \widehat\tau_{t,h} = \arg\min_{\tau\le t}\widehat\sigma^2_{\tau,t,h}$$
be the mean squared error for parameters $(m,\tau)$. The difference of the error from its minimum is

The estimator
$$\widehat m_{nh}(x) = \widehat m_{1nh}(x) I_{\widehat\tau_{t,h}} + \widehat m_{2nh}(x)(1 - I_{\widehat\tau_{t,h}})$$
of the regression function $m_0(x) = m_{10}(x)I_{\tau_0} + m_{20}(x)(1 - I_{\tau_0})$ is a.s. uniformly consistent and the process $(th)^{1/2}(\widehat m_{th} - m_0)$ converges weakly under $P_{m_0}$ to a Gaussian process $G_m$ on $I_X$, with mean and covariances zero and with variance function $V_m(x) = \kappa_2 Var(Y \mid X = x)$.
For the weak convergence of the change-point estimator, let $\|\varphi\|_X$ be the $L^2(F_X)$-norm of a function $\varphi$ on $I_X$, let $\rho(\theta,\theta') = (|\gamma - \gamma'| + \|m - m'\|^2_X)^{1/2}$ be the distance between $\theta = (m^T,\gamma)^T$ and $\theta' = (m'^T,\gamma')^T$, and let $V_\varepsilon(\theta_0)$ be a neighbourhood of $\theta_0$ with radius $\varepsilon$ for the metric $\rho$. The quadratic function $l_t(m,\tau)$ defined by (9.9) converges to its expectation $l$, with
$$l(\widehat m_{th}, \widehat\tau_{th}) = O(\|\widehat m_{nh} - m_0\|^2_X + |\widehat\tau_{nh} - \tau_0|).$$
it is denoted $l_t = (l_{1t} + l_{2t})\{1 + o(1)\}$. The process $W_t(m,\gamma) = t^{1/2}(l_t - l)(m,\tau_\gamma)$ is a $O_p(1)$. The estimator $\widehat m_{th}$ is a local maximum likelihood estimator of the nonparametric regression functions, and the estimator of the change-point is a maximum likelihood estimator. The variable $l_{1t}(\widehat m_{th})$ converges to $l_1(m_0) = 0$ and $l_{2t}(\widehat m_{th}, \widehat\tau_{\gamma t})$ converges to zero with the same rate if the convergence rate of $\widehat\gamma_t$ is the same as that of $\widehat m_{th}$. We obtain the next bounds.

Lemma 9.1. For every $\varepsilon > 0$, there exists a constant $\kappa_0$ such that $E\sup_{(m,\gamma)\in V_\varepsilon(\theta_0)} l_t(m,\tau_\gamma) \le \kappa_0\varepsilon^2$ and $0 \le l(m,\tau_\gamma) \le \kappa_0\rho^2(\theta,\theta_0)$, for every $\theta$ in $V_\varepsilon(\theta_0)$.

Lemma 9.2. For every $\varepsilon > 0$, there exists a constant $\kappa_1$ such that $E\sup_{(m,\gamma)\in V_\varepsilon(\theta_0)} W_t(m,\gamma) \le \kappa_1\rho(\theta,\theta_0)$.
where $\nu_{kt} = t^{1/2}(\widehat F_{kt} - F_{k0})$ is the empirical process of the series in phase $k = 1, 2$, with the ergodic distributions $F_{k0}$ of the process.
The discrete part of $W_t$ is approximated by $W_{2t}(\gamma) = t^{1/2}(l_{2t} - l_2)(\gamma)$ where $l_{2t} = t^{-1}\sum_{i=1}^t (I_{\tau,i} - I_{\tau_0,i})^2(m_{10} - m_{20})^2(X_i) + o_p(|\tau - \tau_0|)$ and the sum is developed with the notation $a_i = (m_{10} - m_{20})^2(X_i)$
\begin{align*}
t^{-1}\sum_{i=1}^t (I_{\tau,i} - I_{\tau_0,i})^2(m_{10} - m_{20})^2(X_i) &= \frac{1}{t}\Big\{\sum_{i=1}^{\tau_0}(1 - I_{\tau,i})a_i + \sum_{i=1+\tau_0}^t I_{\tau,i}a_i\Big\}\\
&= \frac{1}{t}\Big\{1_{\{\tau_{th}<\tau_0\}}\sum_{i=1+\tau_{th}}^{\tau_0} a_i + 1_{\{\tau_0<\tau_{th}\}}\sum_{i=1+\tau_0}^{\tau_{th}} a_i\Big\}\\
&= \frac{1}{t}\Big\{1_{\{\tau_{th}<\tau_0\}}\sum_{i=1+\tau_0-[h^{-1}u_\gamma]}^{\tau_0} a_i + 1_{\{\tau_0<\tau_{th}\}}\sum_{i=1+\tau_0}^{\tau_0+[h^{-1}u_\gamma]} a_i\Big\}.
\end{align*}
9.6 Exercises
Chapter 10
Appendix
10.1 Appendix A
The moments of the derivatives of the kernel estimator for the regression function are presented in Chapter 3; here the proofs are detailed. The variance of $\widehat m^{(1)}_{n,h}(x) = \widehat f^{-1}_{n,h}(x)\{\widehat\mu^{(1)}_{n,h}(x) - \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\}$ is obtained by an approximation similar to (3.2) in Proposition 3.1
\begin{align*}
Var\,\widehat m^{(1)}_{n,h}(x) = f_X^{-2}\big[&Var\{\widehat\mu^{(1)}_{n,h}(x) - \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\}\\
&+ \{\mu^{(1)}_{n,h}(x) - m_{n,h}(x) f^{(1)}_{n,h}(x)\}^2\, Var\,\widehat f_{n,h}(x)\\
&- 2\{\mu^{(1)}_{n,h}(x) - m_{n,h}(x) f^{(1)}_{n,h}(x)\}\, Cov\{\widehat f_{n,h}(x), \widehat\mu^{(1)}_{n,h}(x) - \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\}\big]\{1 + o(1)\},
\end{align*}
where the variances $Var\,\widehat\mu^{(1)}_{n,h}(x)$ and $Var\,\widehat f^{(1)}_{n,h}(x)$ are $O((nh^3)^{-1})$, $Var\,\widehat f_{n,h}(x) = O((nh)^{-1})$, $E\{\widehat f^{(1)}_{n,h}(x) - f^{(1)}_{n,h}(x)\}^4 = O((nh^3)^{-1})$ and $E\{\widehat m_{n,h}(x) - m_{n,h}(x)\}^4 = O((nh)^{-1})$,
\begin{align*}
Var\{\widehat\mu^{(1)}_{n,h}(x) - \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\} &= Var\,\widehat\mu^{(1)}_{n,h}(x) + Var\{\widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\} - 2\,Cov\{\widehat\mu^{(1)}_{n,h}(x), \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\},\\
Var\{\widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\} &\le [E\{\widehat m_{n,h}(x)\}^4 E\{\widehat f^{(1)}_{n,h}(x)\}^4]^{1/2} - E^2\{\widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\} = O((nh^2)^{-1}).
\end{align*}
Therefore $Var\{\widehat\mu^{(1)}_{n,h}(x) - \widehat m_{n,h}(x)\widehat f^{(1)}_{n,h}(x)\}$ and $Var\,\widehat m^{(1)}_{n,h}(x)$ are $O((nh^3)^{-1})$.

Proposition 10.1.
$$Var\,\widehat m^{(1)}_{n,h}(x) = f_X^{-2}\, Var\,\widehat\mu^{(1)}_{n,h}(x) + o((nh^3)^{-1}) = (nh^3)^{-1}\Big\{f_X^{-2}(x) w^2(x)\int K^{(1)2} + o(h)\Big\}.$$
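As a hedged numerical illustration of the objects in this appendix (not from the book): kernel estimators of $f$ and of its derivative $f^{(1)}$ with a Gaussian kernel; the sample and bandwidth are arbitrary choices of the example, and the derivative estimator fluctuates at the slower rate $(nh^3)^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.normal(0.0, 1.0, 5000)  # sample from N(0,1), an assumption of the example

def f_hat(x, h=0.3):
    # kernel density estimator with Gaussian kernel, variance O((nh)^{-1})
    u = (x - S) / h
    return np.mean(np.exp(-0.5 * u * u)) / (np.sqrt(2.0 * np.pi) * h)

def f1_hat(x, h=0.3):
    # estimator of f^(1): mean of (d/dx) K_h(x - X_i), variance O((nh^3)^{-1})
    u = (x - S) / h
    return np.mean(-u * np.exp(-0.5 * u * u)) / (np.sqrt(2.0 * np.pi) * h * h)

# at x = 1, f(1) = phi(1) ~ 0.2420 and f^(1)(1) = -phi(1) ~ -0.2420
print(f_hat(1.0), f1_hat(1.0))
```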
10.2 Appendix B

The expectation of the quadratic variations $|\widehat f_{n,h}(x) - \widehat f_{n,h}(y)|^2$ develops as the sum
$$n^{-1}\int \{K_{h_n(x)}(x - u) - K_{h_n(y)}(y - u)\}^2 f(u)\, du$$
Since $h_n^{-1}(x)|x|$ and $h_n^{-1}(y)|y|$ are bounded by 1, the order of $E|\widehat f_{n,h}(x) - \widehat f_{n,h}(y)|^2$ is $O((x-y)^2 n^{-1}\|h_n\|^{-3})$ if $|x h_n(y) - y h_n(x)| \le 2h_n(y)h_n(x)$, and it is a sum of variances otherwise.
10.3 Appendix C

In the single index model studied in Chapter 7, the precise order of the mean and variance of $\widehat V^{(1)}_{n,h}$ defined in Section 7.2 requires expansions. The empirical mean squared error of the estimated function $g$, at fixed $\theta$, has the derivative
$$\widehat V^{(1)}_{n,h}(\theta) = n^{-1}\sum_{i=1}^n \{Y_i - \widehat g_{n,h}(\theta^T X_i;\theta)\}\,\widehat g^{(1)}_{n,h}(\theta^T X_i;\theta)\, X_i.$$
Let $Z = \theta^T X$ at fixed $\theta$. The mean of $\widehat V^{(1)}_{n,h}$ is
\begin{align*}
V^{(1)}_{n,h}(\theta) &= E\,E[\{g(Z_i) - \widehat g_{n,h}(Z_i;\theta)\}\,\widehat g^{(1)}_{n,h}(Z_i;\theta)\, X_i \mid X_i]\\
&= E[\{b_{g^{(1)},n,h}\, g_{n,h}(Z_i;\theta) - Cov(\widehat g_{n,h}, \widehat g^{(1)}_{n,h})(Z_i;\theta)\}\, X_i^{\otimes 2}].
\end{align*}
Its variance is $n^{-1}Var[\{Y_i - \widehat g_{n,h}(Z_i;\theta)\}\,\widehat g^{(1)}_{n,h}(Z_i;\theta)\, X_i]$ and
\begin{align*}
Var[\{Y_i - \widehat g_{n,h}(Z_i;\theta)\}\,\widehat g^{(1)}_{n,h}(Z_i;\theta)] &= O(Var\{Y_i\,\widehat g^{(1)}_{n,h}(Z_i;\theta)\}) + O(Var\{\widehat g_{n,h}(Z_i;\theta)\,\widehat g^{(1)}_{n,h}(Z_i;\theta)\}),\\
Var\{Y_i\,\widehat g^{(1)}_{n,h}(Z_i;\theta)\} &= O(Var\{\widehat g^{(1)}_{n,h}(Z_i;\theta)\}) = O((nh^3)^{-1}).
\end{align*}
The expansions (3.2) for $\widehat g_{n,h}$ and (3.15) for $\widehat g^{(1)}_{n,h}$ are written
\begin{align*}
\{\widehat g_{n,h} - g_{n,h}\}(z) &= f_X^{-1}(z)\{(\widehat\mu_{n,h} - \mu_{n,h})(z) - g(z)(\widehat f_{X,n,h} - f_{X,n,h})(z)\} + o_{L^2}((nh)^{-1/2}),\\
\{\widehat g^{(1)}_{n,h} - g^{(1)}_{n,h}\}(z) &= f_X^{-1}(z)[(\widehat\mu^{(1)}_{n,h} - \mu^{(1)}_{n,h})(z) - \{\widehat g_{n,h}\widehat f^{(1)}_{X,n,h} - E(\widehat g_{n,h}\widehat f^{(1)}_{X,n,h})\}(z)\\
&\qquad - g^{(1)}(z)(\widehat f_{X,n,h} - f_{X,n,h})(z)] + o_{L^2}((nh^3)^{-1/2}).
\end{align*}
Then
\begin{align*}
\{\widehat g_{n,h}\widehat f^{(1)}_{X,n,h} - E(\widehat g_{n,h}\widehat f^{(1)}_{X,n,h})\}(z) &= \{\widehat f^{(1)}_{X,n,h} - f^{(1)}_{X,n,h}\}(z)\big([g_{n,h} + f_X^{-1}\{(\widehat\mu_{n,h} - \mu_{n,h}) - g(\widehat f_{X,n,h} - f_{X,n,h})\}](z)\big)\\
&\quad + f^{(1)}_{X,n,h}(z) f_X^{-1}\{(\widehat\mu_{n,h} - \mu_{n,h}) - g(\widehat f_{X,n,h} - f_{X,n,h})\}(z) + o_{L^2}((nh)^{-1/2}).
\end{align*}
10.4 Appendix D

B1 The sequence $(X_i)_{i\ge 0}$ is $\varphi$-mixing, with a sequence $(\varphi_k)_{k\ge 1}$ satisfying $\sum_{k\ge 1}(k+1)^2\varphi_k^{1/2} < \infty$.

The $\varphi$-mixing property and Condition B1 are defined in Billingsley (1968); they imply the weak convergence of the normalized variable $n^{1/2}\{n^{-1}\sum_{i=1}^n \phi(X_i) - E_\nu\phi(X_1)\}$ to a centered normal variable with variance $\sigma^2_\varphi$.

Let $(X_t)_{t\ge 0}$ be a time indexed process such that for every $t > 0$, $X_t$ is a random variable in a metric space $(\mathcal X, \mathcal A, \mu)$ and $EX_t^2$ is finite. Let $\mathcal M_0^s$ and $\mathcal M_{s+t}^\infty$ be the $\sigma$-algebras generated by the sample paths of the process observed on the time intervals $[0,s]$ and $[s+t,\infty[$ respectively,
$\mathcal M_0^s = \sigma\{(X_u)_{0\le u\le s}\}$ and $\mathcal M_{s+t}^\infty = \sigma\{(X_u)_{u\ge s+t}\}$, with $s$ and $t > 0$. The process $(X_t)_{t\ge 0}$ is $\varphi$-mixing if there exists a sequence $(\varphi_t)_{t\ge 0}$ converging to zero as $t$ tends to infinity and such that the marginal distributions of the process $(X_t)_{t\ge 0}$ satisfy
$$\sup\{|P(B \mid A) - P(B)|;\ A \in \mathcal M_0^s,\ B \in \mathcal M_{s+t}^\infty,\ s \ge 0\} < \varphi_t.$$

B2 The process $(X_t)_{t\ge 0}$ is $\varphi$-mixing with a sequence $(\varphi_t)_{t\ge 0}$ satisfying $\int_0^\infty (t+1)^2\varphi_t^{1/2}\, dt < \infty$.

The ergodic property is strengthened to allow the convergence of functionals of the joint distributions of the process at several observation times.

A2' The process $(X_t)_{t\ge 0}$ is ergodic if for every integer $k$ there exists a probability $\nu_k$ on $(\mathcal X^{\otimes k}, \mathcal A_k, \mu)$, with the Borel $\sigma$-algebra $\mathcal A_k$ on $\mathcal X^{\otimes k}$, such that for every real bounded function $\phi$ defined on the space $\mathcal X^{\otimes k}$
$$\frac{1}{t^k}\int_{[0,t]^k} \phi(X_{s_1}, \ldots, X_{s_k})\, ds_1\cdots ds_k \xrightarrow[t\to\infty]{P} E_{\nu_k}\phi(X_1, \ldots, X_k).$$
Notations

$1_A$: indicator of a set $A$,
a.s.: almost surely,
$(B_t)_{t\ge 0}$: Brownian motion,
$Cov(X_i, X_j)$: covariance of $X_i$ and $X_j$: $E(X_i - EX_i)(X_j - EX_j)$,
$C_s(I)$: class of real functions on $I$ having bounded and continuous derivatives of order $s$,
$\Delta f(x,y)$: variation of $f$: $f(y) - f(x)$,
$EX$: expectation (or mean) of a variable $X$: $\int x\, dF_X(x)$,
$F_X(x)$: probability of the event $\{X \le x\}$,
$f^{(s)}$: derivative of order $s$ of a function $f$,
$\widehat F_{X,n}$: empirical distribution function,
$m^{-1}$: either $1/m$ or the inverse of a monotone function $m$,
$H_{\alpha,M}$: class of real functions $f$ such that for all $x$ and $y$, $|f^{(s)}(x) - f^{(s)}(y)| \le M|x-y|^{\alpha-s}$, $s = [\alpha]$,
$K_h = h^{-1}K(h^{-1}\cdot)$: normalized kernel with bandwidth $h$,
$\Lambda = \int_0^\cdot dF/(1 - F^-)$: cumulative hazard function for the distribution function $F$,
$N = (N_t)_{t\ge 0}$: point process,
$\widehat\nu_n$: empirical process $n^{1/2}(\widehat F_{X,n} - F_X)$,
$L_n$: partial likelihood of $N$ at $t$ such that $N_t = n$,
$(\widetilde N_t)_{t\ge 0}$: predictable compensator of a point process $N$,
$\Omega$: sample space,
$\rho(i,j)$: correlation of variables $X_i$ and $X_j$: $Cov(X_i, X_j)\{Var X_i\, Var X_j\}^{-1/2}$,
$Var X$: variance of a variable $X$: $E(X - EX)^2$,
$\widehat V_{n,h}$: empirical mean squared error for a regression.
Bibliography
Andersen, P. and Gill, R. D. (1982). Cox’s regression model for counting processes:
a large sample study, Ann. Statist. 10, pp. 1100–1120.
Bahadur, R. R. (1966). A note on quantiles in large samples, Ann. Math. Statist.
37, pp. 577–580.
Barlow, R., Bartholomew, D. J., Bremner, J. and Brunk, H. D. (1972). Statistical Inference under Order Restrictions (Wiley, New York).
Beran, R. J. (1972). Upper and lower risks and minimax procedures, Proceedings
of the sixth Berkeley Symposium on Mathematical Statistics, L. Lecam, J.
Neyman and E. Scott (eds) , pp. 1–16.
Bickel, P. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates, Ann. Statist. 1, pp. 1071–1095.
Billingsley, P. (1968). Convergence of probability measures (Wiley, New York).
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes (2nd edition,
Springer, New York).
Bowman, A. W. (1984). An alternative method of cross-validation for the smooth-
ing of density estimates, Biometrika 71, pp. 353–360.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis. The Kernel Approach with S-Plus Illustrations (Oxford Statistical Science Series 18).
Breslow, N. and Crowley, J. (1974). A large sample study of the life table and
product limit estimates under random censorship, Ann. Statist. 2, pp. 437–
453.
Bretagnolle, J. and Huber, C. (1981). Estimation de densités : risque minimax,
Z. Wahrsch. Verw. Geb. 47, pp. 119–139.
Brillinger, D. R. (1981). Time Series: Data Analysis and Theory (Holt, Rinehart and Winston, New York).
Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their
local Bahadur representation, Ann. Statist. 19, pp. 760–777.
Chernoff, H. (1964). Estimation of the mode, Ann. Inst. Statist. Math. 16, pp.
31–41.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data (Chapman and Hall,
London).
191
January 31, 2011 17:17 World Scientific Book - 9in x 6in FunctionalEstimation
Cox, D. R. (1972). Regression models and life-tables, J. Roy. Statist. Soc. Ser. B 34, pp. 187–220.
De Boor, C. (1978). A Practical Guide to Splines (Springer, New York).
Deheuvels, P. (1977). Estimation non paramétrique de la densité par his-
togrammes généralisés, Rev. Statist. Appl 25, pp. 5–42.
Delecroix, M., Härdle, W. and Hristache, M. (2003). Optimal smoothing in single-index models, J. Multiv. Anal. 86, pp. 213–226.
Devroye, L. (1983). The equivalence of weak, strong and complete convergence in
l1 for kernel density estimates, Ann. Statist. 11, pp. 896–904.
Dümbgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency, Bernoulli 15, pp. 40–68.
Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution functions and of the classical multinomial estimator, Ann. Math. Statist. 27, pp. 642–669.
Eubank, R. (1977). Spline Smoothing and Nonparametric Regression (Dekker,
New York).
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications (Chapman and Hall CRC).
Ghosh, J. K. (1971). A new proof of the Bahadur representation of quantiles and an application, Ann. Math. Statist. 42, pp. 1957–1961.
Gijbels, I. and Veraverbeke, N. (1988a). Almost sure asymptotic representation
for a class of functionals of the product-limit estimator, Ann. Statist. 19,
pp. 1457–1470.
Gijbels, I. and Veraverbeke, N. (1988b). Weak asymptotic representations for
quantiles of the product-limit estimator, J. Statist. Plann. Inf. 18, pp.
151–160.
Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions, Probab. Theory Related Fields 81, pp. 79–109.
Groeneboom, P., Jongbloed, G. and Wellner, J. (2001). Estimation of a convex function: Characterization and asymptotic theory, Ann. Statist. 29, pp. 1653–1698.
Groeneboom, P. and Wellner, J. (1990). Empirical processes (Birkhäuser, Basel).
Guyon, X. and Perrin, O. (2000). Identification of space deformation using linear
and superficial quadratic variations, Statist. Prob. Lett. 47, pp. 307–316.
Hall, P. (1981). Law of the iterated logarithm for nonparametric density estima-
tors, Stoch. Proc. Appl. 56, pp. 47–61.
Hall, P. (1984). Integrated square error properties of kernel estimators of regres-
sion functions, Ann. Statist. 12, pp. 241–260.
Hall, P. and Huang, L.-S. (2001). Nonparametric kernel regression subject to monotonicity constraints, Ann. Statist. 29, pp. 624–647.
Hall, P. and Johnstone, I. (1992). Empirical functionals and efficient smoothing
parameter selection, J. Roy. Statist. Soc. Ser. B 54, pp. 475–530.
Hall, P. and Marron, J. M. (1987). Estimation of integrated squared density
derivatives, Statist. Probab. Lett. 6, pp. 109–115.
Schoenberg, I. (1964). Spline functions and the problem of graduation, Proc. Nat.
Acad. Sci. USA 52, pp. 947–950.
Schuster, E. F. (1969). Estimation of a probability density function and its deriva-
tives, Ann. Math. Statist. 40, pp. 1187–1195.
Scott, D. W. (1992). Multivariate density estimation: theory, practice, and visu-
alization (Wiley, New York).
Sheather, S. J. and Marron, J. S. (1990). Kernel quantile estimators, J. Amer.
Statist. Assoc. 85, pp. 410–416.
Shorack, G. R. and Wellner, J. A. (1986). Empirical processes and applications
to statistics (Wiley, New York).
Silverman, B. W. (1978a). On a Gaussian process related to multivariate probability density estimation, Math. Proc. Cambridge Philos. Soc. 80, pp. 136–144.
Silverman, B. W. (1978b). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives, Ann. Statist. 6, pp. 177–184.
Silverman, B. W. (1984). Spline smoothing: The equivalent variable kernel
method, Ann. Statist. 12, pp. 898–916.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to the
nonparametric regression curve fitting, J. Roy. Statist. Soc. Ser. B 47, pp.
1–22.
Simonoff, J. S. (1996). Smoothing Methods in Statistics (Springer-Verlag, New
York).
Singh, R. S. (1979). On necessary and sufficient conditions for uniform strong
consistency of estimators of a density and its derivatives, J. Multiv. Anal.
9, pp. 157–164.
Stieltjes, T.-J. (1890). Sur les polynômes de Legendre, Ann. Fac. Sci. Toulouse,
1e série 4 G, pp. 1–17.
Stone, M. (1974). Cross-validation choice and assessment of statistical prediction
(with discussion), J. Roy. Statist. Soc. Ser. B 36, pp. 111–147.
Stute, W. (1982). A law of the logarithm for kernel density estimators, Ann.
Probab. 10, pp. 414–422.
van de Geer, S. (1993). Hellinger consistency of certain nonparametric maximum
likelihood estimators, Ann. Statist. 21, pp. 14–44.
van de Geer, S. (1996). Applications of Empirical Process Theory (Cambridge University Press).
van der Vaart, A. and van der Laan, M. (2003). Smooth estimation of a monotone
density, Statistics 37, pp. 189–203.
van der Vaart, A. and Wellner, J. A. (1996). Weak convergence and Empirical
Processes (Springer, New York).
Wahba, G. (1977). Optimal smoothing of density estimates, Classification and
clustering, (ed.) J. Van Ryzin. Academic Press, New York , pp. 423–458.
Wahba, G. and Wold, S. (1975). A completely automatic french curve: Fitting
spline functions by cross-validation, Comm. Statist. 4, pp. 1–17.
Walker, A. M. (1971). On the estimation of a harmonic component in a time
series with stationary independent residuals, Biometrika 58, pp. 26–36.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing (Chapman and Hall,
CRC).