Convergence Rates in Weighted $L_1$ Spaces of Kernel Density Estimators for Linear Processes
Consider a linear process $X_t = \sum_{s=0}^{\infty} a_s \varepsilon_{t-s}$, $t \in \mathbb{Z}$, with independent and identically distributed (i.i.d.) innovations $\varepsilon_t$ that have finite mean and density $f$. We assume that $a_0 = 1$ and that the coefficients are summable, $\sum_{s=0}^{\infty} |a_s| < \infty$. Then $X_t$ has a stationary density $g$. It can be estimated by the kernel estimator
$$\hat g(x) = \frac{1}{n} \sum_{j=1}^{n} k_b(x - X_j), \quad x \in \mathbb{R}.$$
Here $k_b(v) = k(v/b)/b$, where $k$ is a kernel and $b$ is a bandwidth such that $b \to 0$ and $nb \to \infty$. Pointwise and uniform convergence rates have been studied by several authors, see for example Hall and Hart (1990), Tran (1992), Hallin and Tran (1996), Lu (2001), Wu and Mielniczuk (2002), Bryk and Mielniczuk (2005), and Schick and Wefelmeyer (2006).
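The kernel estimator can be illustrated with a minimal sketch. The finite coefficient list, the standard normal innovations, the Gaussian kernel and the bandwidth $n^{-1/5}$ are all assumptions made for this illustration only; none of them is prescribed by the text.

```python
import math
import random

def simulate_linear_process(coeffs, n, rng):
    """Simulate X_t = sum_s coeffs[s] * eps_{t-s} with standard normal innovations.

    Assumption of this sketch: a finite coefficient list stands in for the
    summable sequence a_0, a_1, ... of the text.
    """
    total = n + len(coeffs) - 1
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    return [
        sum(a * eps[t - s] for s, a in enumerate(coeffs))
        for t in range(len(coeffs) - 1, total)
    ]

def gaussian_kernel(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def kernel_density_estimate(x, data, b, kernel=gaussian_kernel):
    """Evaluate (1/n) * sum_j k_b(x - X_j), where k_b(v) = k(v/b)/b."""
    return sum(kernel((x - xj) / b) for xj in data) / (len(data) * b)

rng = random.Random(0)
coeffs = [1.0, 0.5, 0.25]            # a_0 = 1, hypothetical example coefficients
data = simulate_linear_process(coeffs, n=2000, rng=rng)
b = len(data) ** (-1.0 / 5.0)        # one bandwidth with b -> 0 and n*b -> infinity
g_hat_at_zero = kernel_density_estimate(0.0, data, b)
```

With these choices $X_t$ is normal with variance $1 + 0.25 + 0.0625$, so `g_hat_at_zero` can be compared with the corresponding normal density at zero.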
Anton Schick and Wolfgang Wefelmeyer
Received by the editors September 28, 2007; accepted March 26, 2008.
2000 Mathematics Subject Classification. 62G07, 62G20, 62M05.
Key words and phrases. $L_1$-Lipschitz, smoothness of convolutions, variance bound.
Anton Schick was supported by NSF Grant DMS 0405791.

The natural distance for densities is given by the $L_1$-norm. Convergence of $\hat g$ to $g$ in this norm has been neglected for time series. Here we study rates of convergence in weighted $L_1$-norms under mild assumptions on $f$. More specifically, we consider the weight function $V(x) = (1+|x|)^{\gamma}$ with $\gamma \ge 0$ and the associated $V$-norm $\|h\|_V = \int V(x)|h(x)|\,dx$. For a positive integer $m$, set
$$Y_t = \sum_{s=0}^{m-1} a_s \varepsilon_{t-s} \quad\text{and}\quad Z_t = \sum_{s=m}^{\infty} a_s \varepsilon_{t-s}$$
and write $f_m$ for the density of $Y_t$. Express $\hat g - g$ as the sum $S + T + B$ of three terms, where
$$S(x) = \frac{1}{n} \sum_{j=1}^{n} \Big( k_b(x - X_j) - k_b * f_m(x - Z_j) \Big), \tag{1.1}$$
$$T(x) = \frac{1}{n} \sum_{j=1}^{n} k_b * f_m(x - Z_j) - k_b * g(x), \tag{1.2}$$
$$B(x) = k_b * g(x) - g(x), \quad x \in \mathbb{R}. \tag{1.3}$$
For $m = 1$ this approach was used by Wu and Mielniczuk (2002), and for arbitrary $m$ by Schick and Wefelmeyer (2006). We study the $V$-norms of the terms in (1.1)-(1.3) individually. Let
$$N = \sum_{s \ge 1} 1[a_s \ne 0] \tag{1.4}$$
denote the number of nonzero coefficients among $a_s$, $s \ge 1$. If $N = 0$ we have i.i.d. observations $X_t = \varepsilon_t$. If $N$ is finite, the observations are $m$-dependent for some $m$. In those cases, we can choose $T = 0$ by taking $m$ large enough. Thus the term $T$ has to be dealt with only if $N = \infty$.
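For a process with finitely many nonzero coefficients, the quantity $N$ in (1.4) and a value of $m$ that makes $Z_t$ vanish (so that $T = 0$) can be read off the coefficient list. The following is a small sketch; the function names and the example coefficients are hypothetical.

```python
def count_nonzero(coeffs):
    """N: number of nonzero coefficients among a_s, s >= 1 (coeffs[0] is a_0)."""
    return sum(1 for a in coeffs[1:] if a != 0.0)

def smallest_m_with_zero_tail(coeffs):
    """Smallest m >= 1 with a_s = 0 for all s >= m, so that Z_t = 0."""
    m = len(coeffs)
    while m > 1 and coeffs[m - 1] == 0.0:
        m -= 1
    return m

coeffs = [1.0, 0.5, 0.0, 0.25]       # hypothetical finite-order example
N = count_nonzero(coeffs)
m = smallest_m_with_zero_tail(coeffs)
```

For the i.i.d. case `coeffs = [1.0]` the sketch returns $N = 0$ and $m = 1$.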
Under mild conditions on $f$, $g$ and $k$ we obtain the rates
$$\|S\|_V = O_P(n^{-1/2} b^{-1/2}), \quad \|T\|_V = O_P(n^{-1/2}), \quad \|B\|_V = O(b^r)$$
for a positive integer $r$. This yields the familiar rate
$$\|\hat g - g\|_V = O_P(n^{-1/2} b^{-1/2}) + O(b^r) \tag{1.5}$$
under such conditions.
For the special case $V = 1$ we are dealing with the usual $L_1$-norm and have the following results. We take a bounded kernel of order $\ell \ge 2$. We distinguish the cases when $N$ is finite and when $N$ is infinite.
(i) If $N$ is finite and $f$ has bounded variation and a finite moment of order greater than one, then (1.5) holds for $V = 1$ with $r = \min\{\ell, N + 1\}$.
(ii) If $N$ is infinite, the series $\sum_{s=1}^{\infty} s|a_s|$ converges, $f$ has a finite moment of order greater than one, and the function $x \mapsto (1 + |x|)f(x)$ has bounded variation, then (1.5) holds for $V = 1$ with $r = \ell$.
The paper is organized as follows. In Section 2 we state the results. Sections
3, 4 and 5 treat the terms S, B and T, respectively. Relations between various
smoothness conditions for V -norms are studied in Section 6. An auxiliary result
used in Section 5 is proved in Section 7. This result is of independent interest.
Together with the generalization in Corollary 7.1 it is used in Schick and Wefelmeyer
(2007b) and (2008).
2. Results
Let $V$ be a measurable function satisfying
$$V(x + y) \le V(x)V(y), \quad x, y \in \mathbb{R}, \tag{2.1}$$
and
$$1 \le V(zx) \le V(x), \quad x \in \mathbb{R},\ |z| \le 1. \tag{2.2}$$
Then the $V$-norm of a measurable function $h$ is defined by
$$\|h\|_V = \int V(x)|h(x)|\,dx.$$
Let $v(x) = 1 + |x|$ and $W_\alpha = V^2 v^{\alpha}$. Our main interest is in weights of the form $V = v^{\gamma}$ with $\gamma \ge 0$. In this case $W_\alpha = v^{2\gamma+\alpha}$. The reason for restricting attention to this case is that we can rely on the moment inequality (3.2) below and can then give conditions in terms of finiteness of moments of $f$.
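The conditions (2.1) and (2.2) and the $V$-norm can be checked numerically for the weight $V = v^{\gamma}$. The sketch below is an illustration under assumptions: the grid, the value $\gamma = 1.5$, the truncation of the integral to $[-30, 30]$ and the standard normal test density are arbitrary choices.

```python
import math

def v(x):
    return 1.0 + abs(x)

def V(x, gamma):
    """Weight V = v**gamma with gamma >= 0."""
    return v(x) ** gamma

def v_norm(h, gamma, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule approximation of int V(x)|h(x)| dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += V(x, gamma) * abs(h(x))
    return total * dx

def std_normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

gamma = 1.5
grid = [i / 4.0 for i in range(-20, 21)]
# (2.1): V(x + y) <= V(x) V(y) on the grid
submultiplicative = all(
    V(x + y, gamma) <= V(x, gamma) * V(y, gamma) + 1e-12 for x in grid for y in grid
)
# (2.2): 1 <= V(z x) <= V(x) for |z| <= 1 on the grid
monotone = all(
    1.0 <= V(z * x, gamma) <= V(x, gamma) + 1e-12
    for x in grid
    for z in (-1.0, -0.5, 0.0, 0.5, 1.0)
)
norm_gamma0 = v_norm(std_normal_density, 0.0)   # ordinary L1 norm of a density
norm_gamma1 = v_norm(std_normal_density, 1.0)   # 1 + E|Z| for standard normal Z
```

For $\gamma = 1$ the $V$-norm of a density is one plus its first absolute moment, which for the standard normal density is $1 + \sqrt{2/\pi}$.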
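We now state some inequalities on the $V$-norm. An application of the Cauchy-Schwarz inequality yields
$$\|h\|_V^2 \le K_\alpha \|h^2\|_{W_\alpha} \tag{2.3}$$
for all $\alpha > 1$, with $K_\alpha = \int v^{-\alpha}(x)\,dx$. Moreover, by (2.1),
$$\int V(x) \Big| \int h(x - y)\,\mu(dy) \Big|\,dx \le \|h\|_V \int V\,d\mu \tag{2.5}$$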
for every measure $\mu$ such that $\int V\,d\mu < \infty$, and every $h$ with finite $V$-norm. In particular, the $V$-norm of a convolution $h_1 * h_2$ of two functions with finite $V$-norms satisfies the inequality
$$\|h_1 * h_2\|_V \le \|h_1\|_V \|h_2\|_V. \tag{2.6}$$
Since $(h_1 * h_2)^2 \le \|h_2\|_1 (h_1^2 * |h_2|)$ in view of the Cauchy-Schwarz inequality, we obtain from the last inequality that
$$\|(h_1 * h_2)^2\|_V \le \|h_1^2\|_V \|h_2\|_V \|h_2\|_1 \tag{2.7}$$
if $h_1^2$ and $h_2$ have finite $V$-norms.
To state our results we introduce the following definitions. These concepts and their relations are studied in Section 6.

Definition 2.1. A function $h$ is $V$-Lipschitz (with constant $L$) if
$$\int V(x)|h(x - t) - h(x)|\,dx \le L|t|V(t), \quad t \in \mathbb{R}.$$
A function $h$ is $V$-Lipschitz of order $r$ (with constant $L$) for some positive integer $r$ if there are functions $h^{(1)}, \dots, h^{(r-1)}$ such that
$$\int V(x) \Big| h(x + t) - h(x) - \sum_{i=1}^{r-1} \frac{t^i}{i!} h^{(i)}(x) \Big|\,dx \le L|t|^r V(t), \quad t \in \mathbb{R}.$$
If the functions $h^{(1)}, \dots, h^{(r-1)}$ also have finite $V$-norms, then $h$ is strongly $V$-Lipschitz of order $r$.
Definition 2.2. A function $h$ has finite $V$-variation if there are finite measures $\mu_1$ and $\mu_2$ satisfying $\mu_1(\mathbb{R}) = \mu_2(\mathbb{R})$ and $\int V\,d(\mu_1 + \mu_2) < \infty$ such that $h(x) = \mu_1((-\infty, x]) - \mu_2((-\infty, x])$ for Lebesgue-almost-all $x$. In this case we call $\mu = \mu_1 + \mu_2$ a measure of $V$-variation of $h$.

We need the following strengthened concept of a kernel of order $\ell$. For $V = 1$ this definition reduces to the usual definition of a kernel of order $\ell$ if we also assume that $\int x^{\ell} k(x)\,dx \ne 0$.

Definition 2.3. A kernel $k$ is of $V$-order $\ell$ if $\ell$ is an integer greater than one,
$$\int x^i k(x)\,dx = 0, \quad i = 1, \dots, \ell - 1,$$
and $\int (1 + |x|)^{\ell} V(x)|k(x)|\,dx$ is finite.
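The moment conditions in Definition 2.3 can be checked numerically for a concrete kernel. The sketch below assumes the Gaussian-based kernel $k(x) = \tfrac12 (3 - x^2)\varphi(x)$, a common fourth-order choice that is not taken from the text, and approximates integrals by a midpoint rule on $[-12, 12]$; a numerical value on a finite range can only suggest, not prove, that the weighted integral is finite.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def fourth_order_kernel(x):
    """k(x) = (3 - x^2)/2 * phi(x); an illustrative choice, not from the text."""
    return 0.5 * (3.0 - x * x) * phi(x)

def integrate(fn, lo=-12.0, hi=12.0, steps=48000):
    """Midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(fn(lo + (i + 0.5) * dx) for i in range(steps)) * dx

def moment(kernel, i):
    return integrate(lambda x: x ** i * kernel(x))

def weighted_abs_moment(kernel, order, gamma):
    """Approximates int (1+|x|)^order * V(x) * |k(x)| dx with V(x) = (1+|x|)^gamma."""
    return integrate(lambda x: (1.0 + abs(x)) ** (order + gamma) * abs(kernel(x)))

moments = [moment(fourth_order_kernel, i) for i in range(5)]
weighted = weighted_abs_moment(fourth_order_kernel, 4, 1.0)
```

The kernel integrates to one, its moments of orders 1, 2 and 3 vanish, and its fourth moment equals $-3$, so it is a kernel of order 4 in the usual sense.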
We have the following results. The first result treats the case of independent observations and is essentially contained in Müller, Schick and Wefelmeyer (2005) and in Schick and Wefelmeyer (2007a).

Theorem 2.1. If $N = 0$, $f$ is $V$-Lipschitz, $f$ and $k^2$ have finite $W_\alpha$-norms for some $\alpha > 1$, and $\int |t|V(t)|k(t)|\,dt$ is finite, then
$$\|\hat g - g\|_V = O_P(n^{-1/2} b^{-1/2}) + O(b).$$
Proof: Since $N = 0$, we have $g = f$. If we take $m = 1$, we obtain $T = 0$ and $\hat g - g = S + B$. Hence Theorem 2.1 follows from Propositions 3.1 and 4.1.
Theorem 2.2. Let $N$ be positive and finite and let $V = v^{\gamma}$ for some $\gamma \ge 0$. If $f$ has finite $V$-variation, $f$ and $k^2$ have finite moments of order $\beta > 2\gamma + 1$, and $k$ is of $V$-order $\ell$, then
$$\|\hat g - g\|_V = O_P(n^{-1/2} b^{-1/2}) + O(b^r),$$
where $r$ is the minimum of $N + 1$ and $\ell$.
Proof: Since $N$ is finite, we can pick $m$ so large that $T = 0$. It follows from Lemma 6.5 that $g$ is strongly $V$-Lipschitz of order $N + 1$. We have $v(ax) \le v(a)v(x)$ and therefore $E[W_\alpha(a\varepsilon_0)] \le W_\alpha(a)E[W_\alpha(\varepsilon_0)]$. Hence we derive Theorem 2.2 from Propositions 3.2 and 4.1.
Theorem 2.3. Let $N$ be infinite and let $\sum_{j=1}^{\infty} j|a_j|$ be finite. Set $V = v^{\gamma}$ for some $\gamma \ge 0$. Suppose that for some non-negative $p$ and $q$ with $p + q > 2\gamma + 1$ and some positive integer $m$, the density $f$ has a finite moment of order $p + \max(q, 1)$ and finite $v^{\gamma}$-variation, $f_m$ is $v^q$-Lipschitz, and $v^p f_m$ is bounded. Suppose $k$ is a bounded kernel of $V$-order $r$ and $\int v$ …

Corollary 2.1. Let $N$ be infinite and let $\sum_{j=1}^{\infty} j|a_j|$ be finite. Set $V = v^{\gamma}$ for some $\gamma \ge 0$. Let the density $f$ have a finite moment of order $\beta > 2\gamma + 1$ and finite $v^{\gamma+1}$-variation. Suppose $k$ is a bounded kernel of $V$-order $r$ and $\int v^{\zeta}(x)|k(x)|\,dx$ is finite for some $\zeta > 2\gamma + 1$. Then
$$\|\hat g - g\|_V = O_P(n^{-1/2} b^{-1/2}) + O(b^r).$$
Suppose we know that $N$ is infinite. Under the assumptions of Theorem 2.3, we can control the rate $O(b^r)$ of the bias by choosing a kernel of high order $r$. A choice of bandwidth $b \sim n^{-1/(2r+1)}$ yields the rate
$$\|\hat g - g\|_V = O_P(n^{-r/(2r+1)}).$$
Thus we can achieve a rate close to the parametric rate $n^{-1/2}$. For invertible processes, even the parametric rate $n^{-1/2}$ can be achieved using the above results and constructing estimators that exploit the linear structure of the process; see Schick and Wefelmeyer (2007b) for the supremum norm and Schick and Wefelmeyer (2008) for the $V$-norm.
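The bandwidth choice comes from balancing the stochastic term against the bias term. A short worked derivation, ignoring constants, may help; it is a standard calculation added here for illustration.

```latex
n^{-1/2} b^{-1/2} = b^{r}
\iff b^{\,r + 1/2} = n^{-1/2}
\iff b = n^{-1/(2r+1)},
\qquad\text{so that}\qquad
b^{r} = n^{-r/(2r+1)}.
% Examples: r = 2 gives n^{-2/5}; r = 4 gives n^{-4/9};
% the exponent r/(2r+1) increases to 1/2 as r grows.
```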
Note that if $vf$ has bounded variation, then $vf$ is bounded, $f$ has bounded variation, and a simple argument shows that $f$ is $v$-Lipschitz. Thus we derive from Theorem 2.3 the following result for the case $V = 1$.

Corollary 2.2. Let $N$ be infinite and let $\sum_{j=1}^{\infty} j|a_j|$ be finite. Suppose $f$ has a finite moment of order greater than one, $vf$ has bounded variation, and the kernel $k$ is bounded and of order $r$. Then
$$\|\hat g - g\|_1 = O_P(n^{-1/2} b^{-1/2}) + O(b^r).$$
3. Behavior of S
Let us first deal with the term $S$ defined in (1.1). Since the $i$-th and $j$-th summands of $S$ are uncorrelated if $|i - j| \ge m$, we obtain that $nE[S^2(x)] \le 2mE[k_b^2(x - X_1)]$. Using this and the inequalities (2.3) and (2.5), the latter with $W_\alpha$ in place of $V$, we find
$$\begin{aligned}
nE[\|S\|_V^2] &\le K_\alpha \|nE[S^2]\|_{W_\alpha} \le 2mK_\alpha \int W_\alpha(x)E[k_b^2(x - X_1)]\,dx \\
&\le 2mK_\alpha E[W_\alpha(X_1)] \int W_\alpha(x)k_b^2(x)\,dx \\
&= \frac{2m}{b} K_\alpha E[W_\alpha(X_1)] \int W_\alpha(bx)k^2(x)\,dx
\end{aligned}$$
for all $\alpha > 1$. Thus we have the following result.

Proposition 3.1. Suppose $E[W_\alpha(X_1)]$ and $\|k^2\|_{W_\alpha}$ are finite for some $\alpha > 1$. Then $\|S\|_V = O_P(n^{-1/2} b^{-1/2})$.
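Now consider a finite $N$. Since $W_\alpha$ inherits the properties (2.1) and (2.2) from $V$ and $v$, and $X_1$ is a sum of $N + 1$ independent terms $a_s \varepsilon_{1-s}$ with $a_s \ne 0$, we obtain $E[W_\alpha(X_1)] \le (E[W_\alpha(a\varepsilon_0)])^{N+1}$ with $a = \sup\{|a_s| : s \ge 0\}$. Hence we have the following consequence of Proposition 3.1.

Proposition 3.2. Let $N$ be finite and suppose that $E[W_\alpha(a\varepsilon_0)]$ and $\|k^2\|_{W_\alpha}$ are finite for some $\alpha > 1$. Then $\|S\|_V = O_P(n^{-1/2} b^{-1/2})$.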
Now consider the case that $V$ is a non-negative power of $v$. Then $W_\alpha$ is also a power of $v$. We have
$$(1 + |x|)^r \le 2^{r-1}(1 + |x|^r), \quad x \in \mathbb{R},\ r \ge 1. \tag{3.1}$$
This and the Minkowski inequality give
$$E[v^r(X_1)] \le 2^{r-1}\Big(1 + \Big(\sum_{s=0}^{\infty} |a_s|\Big)^r E[|\varepsilon_0|^r]\Big), \quad r \ge 1. \tag{3.2}$$
Thus we have the following result.

Proposition 3.3. Suppose $E[|\varepsilon_0|^{\beta}]$ and $\int v^{\beta}(x)k^2(x)\,dx$ are finite for some $\beta > 2\gamma + 1$ with $\gamma \ge 0$. Then $\|S\|_V = O_P(n^{-1/2} b^{-1/2})$ for $V = v^{\gamma}$.
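Inequality (3.1) is a consequence of the convexity of $x \mapsto x^r$ for $r \ge 1$ and is easy to spot-check. The grid of points and exponents below is an arbitrary choice for illustration.

```python
def lhs(x, r):
    return (1.0 + abs(x)) ** r

def rhs(x, r):
    return 2.0 ** (r - 1.0) * (1.0 + abs(x) ** r)

xs = [i / 10.0 for i in range(-100, 101)]
rs = [1.0, 1.5, 2.0, 3.0, 4.5]

# (3.1) on the grid, with a small relative tolerance for rounding
holds = all(lhs(x, r) <= rhs(x, r) * (1.0 + 1e-12) for x in xs for r in rs)

# Equality is attained at |x| = 1
equality_gaps = [abs(lhs(1.0, r) - rhs(1.0, r)) for r in rs]

# The restriction r >= 1 matters: the inequality fails at x = 0 for r = 0.5
fails_for_small_r = lhs(0.0, 0.5) > rhs(0.0, 0.5)
```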
4. Behavior of the bias
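Next we deal with the bias term $B$ defined in (1.3). For this we shall use the following lemma.

Lemma 4.1. Suppose $h, h_1, \dots, h_{r-1}$, $w$ and $U$ are measurable functions such that
$$\int V(x) \Big| h(x - t) - h(x) - \sum_{i=1}^{r-1} \frac{(-t)^i}{i!} h_i(x) \Big|\,dx \le U(t), \quad t \in \mathbb{R},$$
and $c_i = \int t^i w(t)\,dt$, $i = 0, \dots, r - 1$, and $A = \int U(t)|w(t)|\,dt$ are finite. Then, with $h_0 = h$,
$$\Big\| h * w - \sum_{i=0}^{r-1} \frac{(-1)^i c_i}{i!} h_i \Big\|_V \le A. \tag{4.1}$$
Proof: Let $\Delta$ denote the left-hand side of (4.1). Then
$$\begin{aligned}
\Delta &= \int V(x) \Big| \int \Big( h(x - t) - h(x) - \sum_{i=1}^{r-1} \frac{(-t)^i}{i!} h_i(x) \Big) w(t)\,dt \Big|\,dx \\
&\le \iint V(x) \Big| h(x - t) - h(x) - \sum_{i=1}^{r-1} \frac{(-t)^i}{i!} h_i(x) \Big|\,dx\,|w(t)|\,dt
\end{aligned}$$
and hence $\Delta \le A$.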
Proposition 4.1. Suppose $g$ is $V$-Lipschitz of order $r$ and the kernel $k$ is of $V$-order $r$. Then $\|B\|_V = O(b^r)$.
Proof: This follows from Lemma 4.1 applied with $w = k_b$ and $U(t) = L|t|^r V(t)$. Note that $c_0 = 1$ and $c_i = 0$ for $i = 1, \dots, r - 1$ and
$$\int U(t)|k_b(t)|\,dt = L \int |bt|^r V(bt)|k(t)|\,dt \le Lb^r \int |t|^r V(t)|k(t)|\,dt$$
for $b \le 1$.

Sufficient conditions for $g$ to be $V$-Lipschitz of order $r \ge 2$ are given in Section 6.
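The bias rate of Proposition 4.1 can be observed in a simple closed-form case. The sketch below assumes a standard normal density $g$ and a Gaussian kernel (a kernel of order 2), so that $k_b * g$ is the $N(0, 1 + b^2)$ density; the weight exponent $\gamma = 1$, the integration range and the two bandwidths are arbitrary choices for illustration.

```python
import math

def normal_density(x, variance):
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)

def bias_v_norm(b, gamma, lo=-15.0, hi=15.0, steps=30000):
    """Midpoint-rule approximation of ||k_b * g - g||_V with V = (1+|x|)^gamma.

    Assumes g = N(0,1) density and a Gaussian kernel, so k_b * g is the
    N(0, 1 + b^2) density and no numerical convolution is needed.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        diff = normal_density(x, 1.0 + b * b) - normal_density(x, 1.0)
        total += (1.0 + abs(x)) ** gamma * abs(diff)
    return total * dx

gamma = 1.0
norm_coarse = bias_v_norm(0.2, gamma)
norm_fine = bias_v_norm(0.1, gamma)
ratio = norm_coarse / norm_fine   # halving b should divide the norm by about 2**2
```

A ratio near 4 is what the rate $O(b^r)$ with $r = 2$ predicts when the bandwidth is halved.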
5. Behavior of T
Finally we consider the term $T$ introduced in (1.2). As shown in the Introduction, we need to treat only the case when $N$ is infinite.

Proposition 5.1. Let $V = v^{\gamma}$ for some $\gamma \ge 0$, let $m$ be a positive integer, and let $p$ and $q$ be non-negative with $p + q > 2\gamma + 1$. Set $\beta = p + \max(q, 1)$. Suppose $\sum_{j=1}^{\infty} j|a_j|$ is finite, $f$ has finite $v^{\beta}$-norm, $v^p f_m$ is bounded, $f_m$ is $v^q$-Lipschitz, and $v^r k$ is integrable for some $r \ge \max(p, q)$. Then $\|T\|_V = O_P(n^{-1/2})$.
Proof: In view of (2.3) it suffices to show that $\|E[nT^2]\|_{v^{p+q}}$ is bounded. For this we apply Lemma 7.1 below with $h = f_m * k_b$, $c_j = a_{j+i}$ and $U_j = \varepsilon_{j-i}$, with $i = \inf\{j \ge m : a_j \ne 0\}$. Since $f$ has finite $v^{\beta}$-norm, $U_0$ has a finite moment of order $\beta$. Moreover, $f_m$ has finite $v^q$-norm and $\|f_m * k_b\|_{v^q} \le \|f_m\|_{v^q} \|k_b\|_{v^q}$. Since $f_m$ is $v^q$-Lipschitz with constant $L$ and $v^p f_m$ is bounded by $C$, say, we obtain from Remark 6.1 below that $f_m * k_b$ is $v^q$-Lipschitz with constant $L\|k_b\|_{v^q}$ and $v^p (f_m * k_b)$ is bounded by $C\|k_b\|_{v^p}$. Note that $\|k_b\|_{v^s} \le \|k_b\|_{v^r} \le v^r(b)\|k\|_{v^r}$ for $s \le r$. Our assumptions on the coefficients $a_j$ guarantee that $D$ of Lemma 7.1 is finite. From this lemma we obtain that $\|E[nT^2]\|_{v^{p+q}} = O(1)$, which is the desired result.
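By Lemma 6.2, if $f_m$ has finite $v^q$-variation, then $v^q f_m$ is bounded and $f_m$ is $v^q$-Lipschitz. Thus, taking $\gamma < p < \gamma + 1 = q$, we arrive at the following result.

Corollary 5.1. Let $V = v^{\gamma}$ for some $\gamma \ge 0$ and let $m$ be a positive integer. Suppose $\sum_{j=1}^{\infty} j|a_j|$ is finite, $f$ has finite $v^{\beta}$-norm for some $\beta > 2\gamma + 1$, $f_m$ has finite $v^{\gamma+1}$-variation, and $v^{\gamma+1} k$ is integrable. Then $\|T\|_V = O_P(n^{-1/2})$.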
6. Smoothness in the V -norm
Here we study finite $V$-variation and the $V$-Lipschitz property and their relations. Our first lemma shows that these properties are preserved under convolutions with a measure $\mu$ for which $\int V\,d\mu$ is finite.

Lemma 6.1. Let $\mu$ be a measure with $\int V\,d\mu$ finite. Let $h$ be a function for which
$$h_\mu(x) = \int h(x - y)\,\mu(dy), \quad x \in \mathbb{R},$$
is well-defined. Then the following are true.
(1) If $h$ has finite $V$-norm, then the $V$-norm of $h_\mu$ is bounded by $\|h\|_V \int V\,d\mu$.
(2) If $h$ is $V$-Lipschitz with constant $L$, then $h_\mu$ is $V$-Lipschitz with constant $L \int V\,d\mu$.
(3) If $h$ is strongly $V$-Lipschitz of order $r$ with constant $L$, then $h_\mu$ is strongly $V$-Lipschitz of order $r$ with constant $L \int V\,d\mu$.
(4) If $h$ has finite $V$-variation with measure of variation $\nu$, then $h_\mu$ has finite $V$-variation with measure of variation $\nu * \mu$.
(5) If $Vh$ is bounded by $C$, then $Vh_\mu$ is bounded by $C \int V\,d\mu$.

Proof: Conclusion (1) is a consequence of (2.5). Conclusion (2) follows from the bound
$$\begin{aligned}
\int V(x)|h_\mu(x - t) - h_\mu(x)|\,dx &\le \iint V(x)|h(x - t - y) - h(x - y)|\,dx\,\mu(dy) \\
&\le \int V(x)|h(x - t) - h(x)|\,dx \int V(y)\,\mu(dy).
\end{aligned}$$
To verify (3), take $h_\mu^{(i)}(x) = \int h^{(i)}(x - y)\,\mu(dy)$. Then, by (2.5), the functions $h_\mu^{(1)}, \dots, h_\mu^{(r-1)}$ have finite $V$-norms, and
$$\int V(x) \Big| h_\mu(x + t) - h_\mu(x) - \sum_{i=1}^{r-1} \frac{t^i}{i!} h_\mu^{(i)}(x) \Big|\,dx \le \int V\,d\mu \int V(x) \Big| h(x + t) - h(x) - \sum_{i=1}^{r-1} \frac{t^i}{i!} h^{(i)}(x) \Big|\,dx,$$
and (3) follows. To verify (4) we may assume that $h(x) = \nu_1((-\infty, x]) - \nu_2((-\infty, x])$ for all $x \in \mathbb{R}$, where $\nu_1$ and $\nu_2$ are finite measures with $\nu_1(\mathbb{R}) = \nu_2(\mathbb{R})$ and $\int V\,d\nu_1$ and $\int V\,d\nu_2$ finite. We now derive $h_\mu(x) = (\nu_1 * \mu - \nu_2 * \mu)((-\infty, x])$ and hence (4). Finally, (5) follows from the bound
$$|V(x)h_\mu(x)| \le \int V(y)V(x - y)|h(x - y)|\,\mu(dy) \le C \int V\,d\mu.$$
Remark 6.1. Let $h$ and $u$ be measurable functions with $h * u$ well-defined and $\|u\|_V$ finite. Then the conclusions of Lemma 6.1 hold with $h_\mu$ replaced by $h * u$ and $\mu(dx) = |u(x)|\,dx$ so that $\int V\,d\mu$ becomes $\|u\|_V$. To see this, write $u = u^+ - u^-$, where $u^+$ and $u^-$ are the positive and negative part of $u$, and apply Lemma 6.1 with $\mu(dx) = u^+(x)\,dx$ and $\mu(dx) = u^-(x)\,dx$.

Remark 6.2. An integrable function of bounded variation has finite 1-variation. Hence densities of bounded variation are 1-Lipschitz. Moreover, an integrable absolutely continuous function $h$ with $\|h'\|_V$ finite has finite $V$-variation (with $\mu_1$ having density $(h')^+ = \max(h', 0)$ and $\mu_2$ having density $(h')^- = \max(-h', 0)$).
The next lemma gives consequences of finite $V$-variation.

Lemma 6.2. If $h$ has finite $V$-variation, then $Vh$ is bounded by $\int V\,d\mu$ and $h$ is $V$-Lipschitz with constant $\int V\,d\mu$, where $\mu$ is a measure of $V$-variation of $h$. If $h$ has finite $vV$-variation, then $h$ has finite $V$-norm $\|h\|_V \le \int vV\,d\mu$.
Proof: We may assume that $h(x) = \mu_1((-\infty, x]) - \mu_2((-\infty, x])$ for all $x \in \mathbb{R}$, where $\mu_1$ and $\mu_2$ are finite measures with $\mu_1(\mathbb{R}) = \mu_2(\mathbb{R})$. Then we have $h(x) = \mu_2((x, \infty)) - \mu_1((x, \infty))$ for all $x$. By (2.2), we have $V(x) \le V(y)$ for $|x| \le |y|$. Then, with $\mu = \mu_1 + \mu_2$, we obtain for $x \ge 0$ the inequalities
$$V(x)|h(x)| \le V(x) \int_{x < y} \mu(dy) \le \int_{x < y} V(y)\,\mu(dy)$$
and
$$\int_0^{\infty} V(x)|h(x)|\,dx \le \iint_{0 \le x < y} V(y)\,\mu(dy)\,dx \le \int_0^{\infty} yV(y)\,\mu(dy).$$
For $x < 0$, we obtain the inequalities
$$V(x)|h(x)| \le V(x) \int_{y \le x} \mu(dy) \le \int_{y \le x} V(y)\,\mu(dy)$$
and
$$\int_{-\infty}^{0} V(x)|h(x)|\,dx \le \int_{-\infty}^{0} |y|V(y)\,\mu(dy).$$
Thus $Vh$ is bounded by $\int V\,d\mu$, and $\|h\|_V \le \int vV\,d\mu$. The arguments in the proof of Lemma 8 of Schick and Wefelmeyer (2007a) show that a function with finite $V$-variation is $V$-Lipschitz with constant $\int V\,d\mu$.
We now give sufficient conditions for a function to be $V$-Lipschitz of order $r \ge 2$. For this we make the following definitions.

Definition 6.1. A function $h$ is absolutely continuous of order $r$ if $h$ is $(r-1)$-times differentiable and if its $(r-1)$-th derivative $h^{(r-1)}$ is absolutely continuous with almost everywhere derivative $h^{(r)}$.

Definition 6.2. A function $h$ is $V$-regular of order $r$ if $h$ is absolutely continuous of order $r - 1$ and $h^{(r-1)}$ is $V$-Lipschitz. If also $h^{(1)}, \dots, h^{(r-1)}$ have finite $V$-norms, we call $h$ strongly $V$-regular of order $r$.

Lemma 6.3. If $h$ is (strongly) $V$-regular of order $r \ge 2$, then $h$ is (strongly) $V$-Lipschitz of the same order.
Proof: Let
$$\Delta(x, t) = h(x + t) - h(x) - \sum_{i=1}^{r-1} \frac{t^i}{i!} h^{(i)}(x).$$
We have
$$\Delta(x, t) = t^{r-1} \int_0^1 \frac{(1 - u)^{r-2}}{(r - 2)!} \Big( h^{(r-1)}(x + ut) - h^{(r-1)}(x) \Big)\,du.$$
Since $h^{(r-1)}$ is $V$-Lipschitz and $V(ut) \le V(t)$ for $|u| \le 1$,
$$\int V(x)|\Delta(x, t)|\,dx \le L|t|^r V(t) \int_0^1 \frac{(1 - u)^{r-2}}{(r - 2)!}\,du. \tag{6.1}$$
The desired results are now immediate.
Remark 6.3. In the case of strong $V$-regularity, the bound (6.1) can be replaced by
$$\int V(x)|\Delta(x, t)|\,dx \le 2\Lambda \frac{|t|^r}{1 + |t|} V(t) \int_0^1 \frac{(1 - u)^{r-2}}{(r - 2)!}\,du,$$
with $\Lambda = \max(L, 2\|h^{(r-1)}\|_V)$. This follows from the fact that the map $x \mapsto x/(1 + x)$ is increasing on the interval $(0, \infty)$ and the following lemma. This alternative bound is better if $|t|$ is large.

Lemma 6.4. If $h$ has finite $V$-norm and is $V$-Lipschitz with constant $L$, then
$$\|h(\cdot - t) - h\|_V \le 2\Lambda \frac{|t|}{1 + |t|} V(t), \quad t \in \mathbb{R},$$
where $\Lambda = \max(L, 2\|h\|_V)$.
Proof: The statement is clear if $|t| \le 1$, and it follows from the bound
$$\|h(\cdot - t) - h\|_V \le (V(t) + 1)\|h\|_V \le 2V(t)\|h\|_V$$
for $|t| > 1$.

Sufficient conditions for $V$-regularity of $g$ are given next.
Lemma 6.5. Suppose $f$ has finite $V$-norm and finite $V$-variation for $V = v^{\gamma}$ with $\gamma \ge 0$. Let $N \ge p$. Then $g$ is strongly $V$-regular of order $p + 1$ and hence strongly $V$-Lipschitz of that order.
Proof: Let $\alpha_1, \dots, \alpha_p$ denote the first $p$ nonzero numbers among $a_s$, $s \ge 1$. Let $g_p$ denote the density of $\varepsilon_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_i$. Then $g(x) = E[g_p(x - Z)]$, $x \in \mathbb{R}$, for some random variable $Z$ with $E[V(Z)] < \infty$. Thus by (3) of Lemma 6.1 it suffices to show that $g_p$ is strongly $V$-regular of order $p + 1$. This is true for $p = 0$. In this case, $g_0$ equals $f$, and the latter is $V$-Lipschitz with constant $\int V\,d\mu$ by Lemma 6.2. The desired result now follows by induction using the following lemma. Keep in mind that the density of $a\varepsilon_0$ inherits the properties of the density $f$ for non-zero $a$.
Lemma 6.6. Suppose the functions $h_1$ and $h_2$ have finite $V$-norms, $h_1$ is $V$-Lipschitz with constant $L$ and $h_2$ has finite $V$-variation. Then $h = h_1 * h_2$ is absolutely continuous, and its almost everywhere derivative $h'$ satisfies $\|h'\|_V \le \|h_1\|_V \int V\,d\mu$ and is $V$-Lipschitz with constant $L \int V\,d\mu$, where $\mu$ is a measure of $V$-variation of $h_2$. Hence $h$ is strongly $V$-Lipschitz of order 2.
Proof: We may assume that $h_2(x) = \mu_1((-\infty, x]) - \mu_2((-\infty, x])$ for all $x$, where $\mu_1$ and $\mu_2$ are measures such that $\int V\,d(\mu_1 + \mu_2)$ is finite. For $i = 1, 2$, set $q_i(x) = \int h_1(x - y)\,\mu_i(dy)$, $x \in \mathbb{R}$. By Lemma 6.1, $q_i$ has finite $V$-norm and $q_i$ is $V$-Lipschitz. Since $q_1 - q_2$ is an almost everywhere derivative of $h$, as shown in Lemma 1 of Schick and Wefelmeyer (2006), we obtain the desired result.
7. A bound
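Consider a linear process
$$S_t = \sum_{s=0}^{\infty} c_s U_{t-s}, \quad t \in \mathbb{Z},$$
with independent and identically distributed innovations $U_t$, $t \in \mathbb{Z}$, with finite mean and summable coefficients $c_0, c_1, \dots$ with $c_0 \ne 0$. For a bounded measurable function $h$, set
$$H(x) = n^{-1/2} \sum_{j=1}^{n} \Big( h(x - S_j) - E[h(x - S_j)] \Big), \quad x \in \mathbb{R}.$$
In this section we derive bounds for $\int v^r(x)E[H^2(x)]\,dx$ with $r \ge 0$. For this we set
$$c = \sum_{j=0}^{\infty} |c_j| \quad\text{and}\quad D = \sum_{j=0}^{\infty} (j + 1)|c_j|.$$
To simplify notation, we abbreviate $U_0$ by $U$. Also, let us set
$$A(\kappa, \beta) = 2^{\beta-1}(1 + \kappa^{\beta} E[|U|^{\beta}]), \quad \kappa \ge 0,\ \beta \ge 1.$$

Lemma 7.1. Let $p$ and $q$ be non-negative and $\beta = p + \max(q, 1)$. Suppose $h$ has finite $v^q$-norm and is $v^q$-Lipschitz with constant $L$, $v^p h$ is bounded, and $U$ has a finite moment of order $\beta$. Let $D$ be finite. Then
$$\int v^{p+q}(x)E[H^2(x)]\,dx \le 8\|v^p h\|_\infty \Lambda D A^4,$$
where $\Lambda = \max(L, 2\|h\|_{v^q})$ and $A = A(\max(1, 2c), \beta)$.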
Proof: Let $\beta = p + \max(q, 1)$. For $j = 0, 1, \dots$, let
$$Q_j = \sum_{s=0}^{\infty} |c_s U_{j-s}|, \quad T_j = \sum_{s=0}^{j-1} c_s U_{j-s}, \quad R_j = S_j - T_j = \sum_{s=j}^{\infty} c_s U_{j-s},$$
and $h_j(x) = E[h(x - T_j)]$, $x \in \mathbb{R}$. Note that $T_0 = 0$, $R_0 = S_0$ and $h_0 = h$. The absolute values of $T_j$ and $R_j$ are bounded by $Q_j$ so that for non-negative $t$ and all $j = 0, 1, \dots$,
$$E[v^t(T_j)] \le E[v^t(Q_j)] \quad\text{and}\quad E[v^t(R_j)] \le E[v^t(Q_j)]. \tag{7.1}$$
Let $Q = Q_0$. The argument leading to (3.2) yields that, for every $t \in [0, \beta]$ and every $j = 0, 1, \dots$,
$$E[v^t(Q)] \le E[v^{\beta}(Q + Q_j)] \le A. \tag{7.2}$$
Using stationarity and a conditioning argument, we obtain
$$\begin{aligned}
E[H^2(x)] &= \operatorname{Var}(h(x - S_0)) + \frac{2}{n} \sum_{j=1}^{n-1} (n - j) \operatorname{Cov}(h(x - S_0), h(x - S_j)) \\
&= \operatorname{Var}(h_0(x - R_0)) + \frac{2}{n} \sum_{j=1}^{n-1} (n - j) \operatorname{Cov}(h_0(x - R_0), h_j(x - R_j)).
\end{aligned}$$
Thus
$$\int v^{p+q}(x)E[H^2(x)]\,dx \le 2 \sum_{j=0}^{\infty} \Gamma_j$$
where
$$\Gamma_j = \int v^{p+q}(x) E\big[ |h(x - R_0) - E[h(x - R_0)]|\,|h_j(x - R_j) - h_j(x)| \big]\,dx.$$
Since $v^p h$ is bounded and $v(x + y) \le v(x)v(y)$, we derive the bound
$$v^p(x)|h(x - R_0)| \le v^p(x - R_0)v^p(R_0)|h(x - R_0)| \le \|v^p h\|_\infty v^p(Q)$$
which implies
$$v^p(x)|E[h(x - R_0)]| \le \|v^p h\|_\infty E[v^p(Q)] \le \|v^p h\|_\infty A.$$
Using these bounds and $v \ge 1$ and $A \ge 1$, we obtain for $j \ge 0$,
$$\Gamma_j \le 2A\|v^p h\|_\infty E\Big[ v^p(Q) \int v^q(x)|h_j(x - R_j) - h_j(x)|\,dx \Big].$$
Note that $\|h_j\|_{v^q} \le \|h\|_{v^q} E[v^q(T_j)]$ and $h_j$ is $v^q$-Lipschitz with constant $L_j = LE[v^q(T_j)]$. Thus, by Lemma 6.4 and the inequalities (7.1) and (7.2), we obtain the bound
$$\int v^q(x)|h_j(x - R_j) - h_j(x)|\,dx \le 2\Lambda A \big( v^{\max(q,1)-1}(Q_j)|R_j| \big).$$
Since $v^s(x)v^t(y) \le (v(x + y))^{s+t}$ for non-negative $s, t, x, y$, the above shows that
$$\Gamma_j \le 4\Lambda A^2 \|v^p h\|_\infty E[v^{\beta-1}(Q + Q_j)|R_j|].$$
Using $v(x + y) \le v(x)v(y)$ and the independence of $U_{-i}$ and $Q_{j,i} = Q + Q_j - |c_i U_{-i}| - |c_{j+i} U_{-i}|$ for $i \ge 0$, we obtain
$$\begin{aligned}
E[v^{\beta-1}(Q + Q_j)|R_j|] &\le E\Big[ \sum_{s=j}^{\infty} |c_s U_{j-s}| v^{\beta-1}(Q + Q_j) \Big] \\
&\le \sum_{s=j}^{\infty} |c_s| E\big[ |U_{j-s}| v^{\beta-1}((|c_{s-j}| + |c_s|)U_{j-s}) \big] E[v^{\beta-1}(Q_{j,s-j})] \\
&\le \sum_{s=j}^{\infty} |c_s| E[v^{\beta}(\kappa U)] E[v^{\beta}(Q + Q_j)],
\end{aligned}$$
with $\kappa = \max(1, 2c)$. We have $E[v^{\beta}(\kappa U)] \le A$ by the argument leading to (3.2), and $E[v^{\beta}(Q + Q_j)] \le A$ by (7.2). Hence
$$\Gamma_j \le 4\Lambda \|v^p h\|_\infty A^4 \sum_{s=j}^{\infty} |c_s|, \quad j \ge 0.$$
Note also that
$$\sum_{j=0}^{\infty} \sum_{s=j}^{\infty} |c_s| = \sum_{s=0}^{\infty} (1 + s)|c_s| = D.$$
The desired result is now immediate.
Using the inequality $\|h_j(\cdot - t) - h_j\|_{v^q} \le AL|t|v^q(t)$ instead of the inequality provided by Lemma 6.4, we can avoid the assumption that $h$ has finite $v^q$-norm at the price of (possibly) increasing the moment condition from $p + \max(q, 1)$ to $p + q + 1$. More precisely, we have the following result.

Lemma 7.2. Let $p$ and $q$ be non-negative. Suppose $h$ is $v^q$-Lipschitz with constant $L$, $v^p h$ is bounded, and $U$ has a finite moment of order $\beta = p + q + 1$. Let $D$ be finite. Then
$$\int v^{p+q}(x)E[H^2(x)]\,dx \le 4LA^4 \|v^p h\|_\infty D,$$
where now $A = A(\max(1, 2c), p + q + 1)$.

Repeating the above proof with $v^p = v^q = 1$, we obtain the following result.

Lemma 7.3. Suppose $h$ is bounded and 1-Lipschitz with constant $L$. Let $D$ be finite. Then
$$E[H^2(x)] \le 4\|h\|_\infty \sum_{j=0}^{\infty} E[|h(x - R_j) - h(x)|], \quad x \in \mathbb{R}.$$
Consequently,
$$\int E[H^2(x)]\,dx \le 4L\|h\|_\infty D E[|U|].$$
If we take $h(x) = 1[0 \le x]$, then $H$ becomes the empirical process
$$D_n(x) = n^{-1/2} \sum_{j=1}^{n} \big( 1[S_j \le x] - P(S_j \le x) \big), \quad x \in \mathbb{R}.$$
This choice of $h$ is bounded by 1 and 1-Lipschitz with constant $L = 1$. Thus we have the following result.

Corollary 7.1. Let $D$ be finite. Then there exists an integrable function $\Psi$ with $\|\Psi\|_1 \le 4DE[|U|]$ such that $E[D_n^2(x)] \le \Psi(x)$ for all $n$ and $x$.
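The constant in Corollary 7.1 is simple to evaluate once the coefficients are given. As a sketch, assume geometric coefficients $c_j = \rho^j$ (truncated to finitely many terms) and standard normal innovations; both are hypothetical choices for illustration, as are the function names.

```python
import math

def weighted_coefficient_sum(coeffs):
    """D = sum_j (j + 1) * |c_j| for a finite list of coefficients."""
    return sum((j + 1) * abs(c) for j, c in enumerate(coeffs))

def l1_bound(coeffs, mean_abs_innovation):
    """The upper bound 4 * D * E|U| on the L1-norm of the dominating function."""
    return 4.0 * weighted_coefficient_sum(coeffs) * mean_abs_innovation

rho = 0.5
coeffs = [rho ** j for j in range(200)]      # truncated geometric coefficients
D = weighted_coefficient_sum(coeffs)         # close to 1 / (1 - rho)**2
mean_abs_u = math.sqrt(2.0 / math.pi)        # E|U| for standard normal U
bound = l1_bound(coeffs, mean_abs_u)
```

For $\rho = 1/2$ the untruncated sum is $D = 1/(1 - \rho)^2 = 4$, so the bound is $16\sqrt{2/\pi}$.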
Acknowledgment. We thank two referees whose suggestions led us to rewrite the
paper completely.
References
A. Bryk and J. Mielniczuk. Asymptotic properties of density estimates for linear processes: application of projection method. J. Nonparametr. Stat. 17, 121-133 (2005).
P. Hall and J. D. Hart. Convergence rates in density estimation for data from infinite-order moving average processes. Probab. Theory Related Fields 87, 253-274 (1990).
M. Hallin and L. T. Tran. Kernel density estimation for linear processes: Asymptotic normality and optimal bandwidth derivation. Ann. Inst. Statist. Math. 48, 429-449 (1996).
Z. Lu. Asymptotic normality of kernel density estimators under dependence. Ann. Inst. Statist. Math. 53, 447-468 (2001).
U. U. Müller, A. Schick and W. Wefelmeyer. Weighted residual-based density estimators for nonlinear autoregressive models. Statist. Sinica 15, 177-195 (2005).
A. Schick and W. Wefelmeyer. Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes. Statist. Probab. Lett. 76, 1756-1760 (2006).
A. Schick and W. Wefelmeyer. Root-n consistent density estimators of convolutions in weighted $L_1$-norms. J. Statist. Plann. Inference 137, 1765-1774 (2007a).
A. Schick and W. Wefelmeyer. Uniformly root-n consistent density estimators for weakly dependent invertible linear processes. Ann. Statist. 35, 815-843 (2007b).
A. Schick and W. Wefelmeyer. Root-n consistency in weighted $L_1$-spaces for density estimators of invertible linear processes (2008). To appear in: Stat. Inference Stoch. Process.
L. T. Tran. Kernel density estimation for linear processes. Stochastic Process. Appl. 41, 281-296 (1992).
W. B. Wu and J. Mielniczuk. Kernel density estimation for linear processes. Ann. Statist. 30, 1441-1459 (2002).