
ON THE OPTIMAL SEGMENT LENGTH FOR PARAMETER ESTIMATES

FOR LOCALLY STATIONARY TIME SERIES

BY RAINER DAHLHAUS AND LIUDAS GIRAITIS

Universität Heidelberg

First version received August 1995

Abstract. We discuss the behaviour of parameter estimates when stationary time series models are fitted locally to non-stationary processes which have an evolutionary spectral representation. A particular example is the estimation for an autoregressive process with time-varying coefficients by local Yule–Walker estimates. The bias and the mean squared error for the parameter estimates are calculated and the optimal length of the data segment is determined.

Keywords. Non-stationary processes; autoregressive processes; evolutionary spectrum; local time series models; time-varying parameters; segment length.

1. INTRODUCTION

In many areas such as signal processing, geophysics or engineering, time series data do not satisfy the assumption of stationarity. Nevertheless, stationary time series models or spectral estimates are very often applied on small segments of the data where the time series is regarded as approximately stationary. Such an approach is quite common in signal processing where, for example, a spectral estimate is calculated on consecutive segments of the signal.

Another example is linear predictive coding where stationary autoregressive (AR) models are fitted to the signal on small (overlapping) segments of the signal. Afterwards the time behaviour of the estimated AR coefficients is studied. It is heuristically obvious that the non-stationarity over the segment causes a bias. Therefore the segment length should be chosen with respect to the 'degree of non-stationarity' of the signal.
In Section 2 we study this problem, i.e. we estimate non-parametrically the coefficients of a time-varying AR process given by the system of difference equations

$$\sum_{j=0}^{p} a_{j,t}\,X_{t-j} = \sigma_t\,\varepsilon_t$$

where $a_{0,t} \equiv 1$ and the $\varepsilon_t$ are independent and identically distributed (i.i.d.) variables with mean zero and variance 1. The coefficients $a_{j,t}$ ($j = 1, \ldots, p$) and $\sigma_t$ are assumed to change slowly over time. Given T observations of the process, we want to estimate $a_{j,t}$ and $\sigma_t$ for some fixed t. This is done by using the tapered Yule–Walker equations (2.2) (see later) on a segment of length $N = N(T)$ with $N = o(T)$ around t. One goal of the paper is to determine the optimal length of the segment N and the optimal data taper as a function of the local variation of the coefficients $a_{j,t}$ and $\sigma_t$.
The technical difficulties require asymptotic considerations. As in non-parametric regression we therefore rescale the parameters $a_{j,t}$ and $\sigma_t$ in an adequate way in order to obtain meaningful asymptotic results (cf. Dahlhaus, 1996b). Therefore, we consider the system of difference equations

$$\sum_{j=0}^{p} a_j\!\left(\frac{t}{T}\right)X_{t-j,T} = \sigma\!\left(\frac{t}{T}\right)\varepsilon_t, \qquad 1 \le t \le T \tag{1.1}$$

where $a_0(u) \equiv 1$, $\{\varepsilon_t\}$ are i.i.d. variables with zero mean and variance 1 and $\sigma(\cdot)$, $a_j(\cdot)$, $j = 1, \ldots, p$, are (smooth) real functions on [0, 1]. A unique solution of (1.1) which is causal exists under certain regularity conditions (see (2.4) later) if $\sigma(\cdot)$ and $a_j(\cdot)$, $j = 1, \ldots, p$, are extended to $a_j(u) = a_j(0)$ and $\sigma(u) = \sigma(0)$ for $u < 0$. This means that the process $X_{t,T}$ for $t \le 0$ and in particular the starting values $X_{-p+1,T}, \ldots, X_{0,T}$ are assumed to be stationary. $\varepsilon_{t,T} := \sigma(t/T)\varepsilon_t$ are the one-step-ahead forecast errors. We denote by t a time point in the interval [1, T] while u is a time point in the rescaled interval [0, 1], i.e. $u = t/T$.
The properties of tapered Yule–Walker estimates are studied in this framework in Section 2. The optimal segment length N and the optimal data taper are derived in Theorem 2.2.
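As an illustration of the rescaled model (1.1), the following sketch simulates a time-varying AR process in Python. It is not part of the original paper: the AR order, coefficient function and innovation scale in the example are hypothetical choices, and the coefficients are frozen at u = 0 during a burn-in period, in line with the extension a_j(u) = a_j(0), sigma(u) = sigma(0) for u < 0 described above.

import numpy as np

def simulate_tvar(T, a_funcs, sigma_func, seed=0):
    """Simulate the time-varying AR model (1.1):
    sum_{j=0}^p a_j(t/T) X_{t-j,T} = sigma(t/T) eps_t, with a_0(u) = 1.
    a_funcs: callables [a_1, ..., a_p] on [0, 1]; sigma_func: sigma(u) > 0."""
    rng = np.random.default_rng(seed)
    burn = 200                               # burn-in with u frozen at 0
    X = np.zeros(burn + T)
    eps = rng.standard_normal(burn + T)
    for t in range(burn + T):
        u = max((t - burn + 1) / T, 0.0)     # rescaled time u = t/T, clipped at 0
        acc = sigma_func(u) * eps[t]         # X_t = sigma(u) eps_t - sum_{j>=1} a_j(u) X_{t-j}
        for j, a_j in enumerate(a_funcs, start=1):
            if t - j >= 0:
                acc -= a_j(u) * X[t - j]
        X[t] = acc
    return X[burn:]

# Hypothetical example: tvAR(1) with a slowly varying coefficient.
X = simulate_tvar(T=1000,
                  a_funcs=[lambda u: -0.9 * np.cos(1.5 * np.pi * u)],
                  sigma_func=lambda u: 1.0)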
A time-varying AR process as in (1.1) is a special case of a locally stationary process as introduced by Dahlhaus (1996a, 1996b, 1997); see the definition below. In Section 3 we study as a generalization the non-parametric estimation of arbitrary time-varying parameters of locally stationary processes. Furthermore, the fitted model may be misspecified. We assume that the observed data are generated by an arbitrary locally stationary process with spectral density $f(u, \lambda)$ and we fit locally at time t a (stationary) finite-dimensional model with spectral density $f_{\theta(t/T)}(\lambda)$. The goal then is to estimate the parameter function $\theta(u)$, $u \in [0, 1]$ (for the AR model (1.1) we have $\theta(u) = (\sigma^2(u), a_1(u), \ldots, a_p(u))'$).
We discuss the asymptotic properties of the (local) estimator $\hat\theta(u)$ of the parameter $\theta(u)$ defined as the minimizer of the Kullback–Leibler distance. As in Section 2 we determine the optimal segment length and the optimal data taper by minimizing the mean squared error (quadratic risk) $E\|\hat\theta(u) - \theta(u)\|^2$ and the integrated squared error $\int E\|\hat\theta(u) - \theta(u)\|^2\,du$, where $\|\cdot\|$ denotes the Euclidean norm.

In Section 4 we investigate the asymptotic properties of the integrated periodogram. The results are used in Sections 2 and 3.
Processes with an evolutionary spectral representation were introduced and investigated by Priestley (1965, 1988). The present approach of local stationarity may be regarded as a setting which allows for a meaningful asymptotic theory for processes with an evolutionary spectral representation (cf. the discussion in Dahlhaus, 1996b). Subba Rao (1970, 1996) and Hussain and Subba Rao (1976) have studied the estimation of time-dependent parameters for processes with an evolutionary spectral representation. Besides yielding different results, our approach also differs from theirs.

2. SEMIPARAMETRIC LOCALLY STATIONARY AR MODELS

In this section we study the properties of Yule–Walker estimates for the coefficients of Equation (1.1) applied locally on a segment of length N around some fixed time point u. As mentioned in Section 1 this procedure is often justified by the idea that the data on the segment can be reasonably well approximated by a stationary process. We formalize this idea by first looking at the (theoretical) Yule–Walker equations, the covariance function and the spectral density of a stationary AR process with parameters $a_1(u), \ldots, a_p(u)$ where u is fixed. In this situation it is well known that

$$a_u = -R_u^{-1}r_u$$

$$\sigma^2(u) = c(u, 0) + a_u' r_u$$

where $a_u := (a_1(u), \ldots, a_p(u))'$, $r_u := (c(u, 1), \ldots, c(u, p))'$, $R_u := \{c(u, i-j)\}_{i,j=1,\ldots,p}$,

$$c(u, j) := \int_{-\pi}^{\pi}\exp(ij\lambda)\,f(u, \lambda)\,d\lambda, \qquad j \in \mathbb{Z},$$

is the local covariance function at time u and


$$f(u, \lambda) := \frac{\sigma^2(u)}{2\pi}\Bigl|\sum_{j=0}^{p} a_j(u)\exp(-ij\lambda)\Bigr|^{-2}. \tag{2.1}$$

$f(u, \lambda)$ is the spectral density of a stationary AR process with parameters $a_1(u), \ldots, a_p(u)$, $\sigma^2(u)$ (u fixed). It can be shown that $X_{t,T}$ as defined by (1.1) is locally stationary with time-varying spectral density $f(u, \lambda)$; see the definition in Section 3 and the discussion thereafter.
Before investigating the estimate we mention that one can also obtain a slightly different set of Yule–Walker type equations by multiplying both sides of Equation (1.1) by $X_{t-i,T}$ and taking expectations (note that (1.1) has the causal solution (2.4) and that the relation $EX_{t-j,T}X_{t-i,T} = c(t/T, i-j) + O(T^{-1})$ can be proved, showing that both sets of equations are asymptotically equivalent).
We now estimate the coefficients $a_1(u), \ldots, a_p(u)$ by the local empirical Yule–Walker equations based on a segment of N observations around u, i.e. we proceed as if the observations were stationary on the segment $[uT] - N/2 + 1, \ldots, [uT] + N/2$. Therefore $\hat a_{N,u} = (\hat a_N(u, 1), \ldots, \hat a_N(u, p))'$ and $\hat\sigma^2_{N,u}$ are given by

$$\hat a_{N,u} = -\hat R_{N,u}^{-1}\hat r_u$$

$$\hat\sigma^2_{N,u} = \hat c_N(u, 0) + \hat a_{N,u}'\hat r_u \tag{2.2}$$

where $\hat r_u := (\hat c_N(u, 1), \ldots, \hat c_N(u, p))'$, $\hat R_{N,u} := \{\hat c_N(u, i-j)\}_{i,j=1,\ldots,p}$ and

$$\hat c_N(u, j) := \frac{1}{H_N}\sum_{\substack{s,t=0\\ s-t=j}}^{N-1} h\!\left(\frac{s}{N}\right)h\!\left(\frac{t}{N}\right)X_{[uT]-N/2+s+1,T}\,X_{[uT]-N/2+t+1,T} \tag{2.3}$$

is the sample covariance with tapered data on the segment $[uT] - N/2 + 1, \ldots, [uT] + N/2$. Here $h: [0, 1] \to \mathbb{R}$ is a data taper with $h(x) = h(1-x)$, $H_N := \sum_{j=0}^{N-1}h^2(j/N) \approx N\int_0^1 h^2(x)\,dx$ is the normalizing factor and [a] denotes the integer part of a real number a. For $h(x) = 1$ if $x \in [0, 1]$ and $h(x) = 0$ elsewhere we obtain the classical non-tapered covariance estimate.

The use of tapered covariances has two benefits: first, it reduces the bias of the AR estimate due to leakage, which leads to a good AR estimate also for small samples (cf. Dahlhaus, 1988); second, it downweights the bias due to non-stationarity in the segment. A major result of this paper is the determination of the optimal segment length N and the optimal taper with respect to the 'degree of non-stationarity' of the process (Theorem 2.2). We mention that the above empirical covariance estimate is equivalent to a kernel estimate with kernel $h(x)^2$ and bandwidth $N/T$ (see Remark 2.6).
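Read as an algorithm, (2.2) and (2.3) reduce to a few lines of linear algebra. The following sketch is an illustrative implementation (with 0-based indexing of the segment), not code from the paper; the taper used in the example anticipates the optimal taper (2.14) derived in Theorem 2.2 below, and X may be taken from the simulation sketch in Section 1.

import numpy as np

def local_yule_walker(X, u, N, p, h):
    """Tapered local Yule-Walker estimate (2.2) on the segment
    [uT] - N/2 + 1, ..., [uT] + N/2, with sample covariances (2.3)."""
    T = len(X)
    start = int(u * T) - N // 2              # 0-based index of [uT] - N/2 + 1
    taper = h(np.arange(N) / N)
    H_N = np.sum(taper ** 2)                 # normalizing factor H_N
    Y = taper * X[start:start + N]           # tapered data h(s/N) X_{[uT]-N/2+s+1,T}
    # c_hat(u, j) = (1/H_N) sum_{s-t=j} h(s/N) h(t/N) X_s X_t
    c = np.array([np.dot(Y[j:], Y[:N - j]) / H_N for j in range(p + 1)])
    R = np.array([[c[abs(i - j)] for j in range(p)] for i in range(p)])
    r = c[1:]
    a_hat = -np.linalg.solve(R, r)           # a_hat = -R_hat^{-1} r_hat
    sigma2_hat = c[0] + a_hat @ r            # sigma2_hat = c_hat(u, 0) + a_hat' r_hat
    return a_hat, sigma2_hat

h_opt = lambda x: np.sqrt(6.0 * x * (1.0 - x))   # taper (2.14), see Theorem 2.2
a_hat, sigma2_hat = local_yule_walker(X, u=0.5, N=128, p=1, h=h_opt)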
We use the following assumptions.

ASSUMPTIONS 2.1

(i) The derivatives $|\partial^3\sigma^2(u)/\partial u^3|$, $|\partial^3 a_j(u)/\partial u^3|$, $j = 1, \ldots, p$, are uniformly bounded in u.
(ii) The roots of $\sum_{j=0}^{p}a_j(u)z^j$ are uniformly bounded away from the unit circle.
(iii) $P(X_{t,T} = 0) = 0$, $1 \le t \le T$.

Under these assumptions it follows from Künsch (1995) that the AR equations (1.1) have a solution of the form

$$X_{t,T} = \sum_{j \ge 0}\psi_{j,t,T}\,\varepsilon_{t-j}, \qquad 1 \le t \le T \tag{2.4}$$

where $\{\varepsilon_j\}$ is an i.i.d. sequence and the real weights $\psi_{j,t,T}$ are such that $\sum_{j\ge 0}|\psi_{j,t,T}| < \infty$ uniformly in t and T (see Miller, 1969; Hallin, 1978, 1984; Mélard, 1985).
Furthermore, let

$$d(h) := \frac{\int_0^1 h^2(x)(x - \tfrac{1}{2})^2\,dx}{\int_0^1 h^2(x)\,dx}, \qquad v(h) = \frac{\int_0^1 h^4(x)\,dx}{\{\int_0^1 h^2(x)\,dx\}^2} \tag{2.5}$$

and

$$b(u) = -(a_1(u), 2a_2(u), \ldots, pa_p(u))' + \begin{cases}\displaystyle\sum_{j=0}^{p/2-1}\{a_j(u) - a_{p-j}(u)\}c_j & p\ \text{even}\\[2ex]\displaystyle\sum_{j=0}^{(p-1)/2}\{a_{j-1}(u) - a_{p-j}(u)\}d_j & p\ \text{odd}\end{cases}$$

where $c_j$ is $p\times 1$ and has 1s in rows $j+2, j+4, \ldots, p-j$ and 0s elsewhere, and $d_j$ is $p\times 1$ and has 1s in rows $j+1, j+3, \ldots, p-j$ and 0s elsewhere. $b(u)$ occurs as a bias term for least squares and Yule–Walker estimates in the stationary case (cf. Shaman and Stine, 1988; Zhang, 1992).

THEOREM 2.1. Suppose that Assumptions 2.1 hold. Then, as $T/N^2 + N/T \to 0$,

$$E\hat a_{N,u} = a_u - \frac{1}{2}\,\frac{N^2}{T^2}\,d(h)\,R_u^{-1}\!\left(\frac{d^2}{du^2}R_u\,a_u + \frac{d^2}{du^2}r_u\right) + \frac{1}{N}v(h)b(u) + o\!\left(\frac{N^2}{T^2} + \frac{1}{N}\right)$$

and the variance-covariance matrix is

$$\operatorname{cov}(\hat a_{N,u}) = \frac{1}{N}v(h)\,\sigma^2(u)\,R_u^{-1} + o\!\left(\frac{1}{N}\right). \tag{2.6}$$

PROOF. In our proofs we use C as a generic constant. According to the definitions of $\hat a_{N,u}$ and $a_u$, we have $R_u a_u + r_u = 0$ and $\hat R_{N,u}\hat a_{N,u} + \hat r_u = 0$. Then, in view of the equality

$$R_u(\hat a_{N,u} - a_u) = -\{(\hat R_{N,u} - R_u)\hat a_{N,u} + \hat r_u - r_u\} \tag{2.7}$$

and the well-known inequality

$$\|AB\| \le \|A\|_{\mathrm{sp}}\,\|B\| \tag{2.8}$$

where $\|A\|$ denotes the Euclidean norm and $\|A\|_{\mathrm{sp}} = \sup_{\|x\|=1}\|Ax\|$ the spectral norm of a matrix A (cf. Davies, 1973), it follows that

$$\|\hat a_{N,u} - a_u\| \le C\,\sup_u\|R_u^{-1}\|_{\mathrm{sp}}\,(\|\hat R_{N,u} - R_u\|_{\mathrm{sp}}\,\|\hat a_{N,u}\| + \|\hat r_u - r_u\|) \le C(\|\hat R_{N,u} - R_u\|_{\mathrm{sp}} + \|\hat r_u - r_u\|)$$

since by Lemma 4.2 (later) $\|\hat a_{N,u}\| \le 2^p$. (For this property Assumption 2.1(iii) is needed.) Note that, by Assumption 2.1(ii), the spectral density fulfils $0 < c_1 \le f(u, \lambda) \le c_2 < \infty$ for some $c_1, c_2 > 0$, which implies that $\sup_u\|R_u^{-1}\|_{\mathrm{sp}} < \infty$. To find an upper bound for the above expression we use the results of Section 4 on the integrated periodogram. Note that with the notations (3.5) and (4.1) the elements $\hat c_N(u, j)$ of the matrix $\hat R_{N,u}$ can be written as integrated periodograms, i.e.

$$\hat c_N(u, j) = \int_{-\pi}^{\pi}\exp(i\lambda j)\,I_N(u, \lambda)\,d\lambda = J_N\{\exp(ij\cdot)\}. \tag{2.9}$$

Furthermore, $c(u, j) = J\{\exp(ij\cdot)\}$ with J as defined in (4.19). Thus, by Theorems 4.1 and 4.2, we get with $d_N = N^2/T^2 + 1/N^{1/2}$

$$E\|\hat a_{N,u} - a_u\|^{2l} \le C(E\|\hat R_{N,u} - R_u\|_{\mathrm{sp}}^{2l} + E\|\hat r_u - r_u\|^{2l}) \le C\,d_N^{2l} \qquad (l \ge 1). \tag{2.10}$$

Further, rewrite (2.7) as

$$-\hat R_{N,u}a_u - \hat r_u = \hat R_{N,u}\hat a_{N,u} + \hat r_u - (\hat R_{N,u}a_u + \hat r_u) = \hat R_{N,u}(\hat a_{N,u} - a_u) = R_u(\hat a_{N,u} - a_u) + (\hat R_{N,u} - R_u)(\hat a_{N,u} - a_u).$$

This gives

$$\hat a_{N,u} - a_u = -R_u^{-1}\hat m_{N,u} + R_u^{-1}(\hat R_{N,u} - R_u)R_u^{-1}\hat m_{N,u} + r_N^{ar} \tag{2.11}$$

where $\hat m_{N,u} := (\hat R_{N,u} - R_u)a_u + \hat r_u - r_u$ and

$$r_N^{ar} = R_u^{-1}(\hat R_{N,u} - R_u)R_u^{-1}(\hat R_{N,u} - R_u)(\hat a_{N,u} - a_u).$$

Now, by (2.8) we obtain

$$\|r_N^{ar}\| \le \sup_{0\le u\le 1}\|R_u^{-1}\|_{\mathrm{sp}}^2\,\|\hat R_{N,u} - R_u\|_{\mathrm{sp}}^2\,\|\hat a_{N,u} - a_u\|.$$

Therefore it follows with the Cauchy inequality and (2.10) that

$$E\|r_N^{ar}\|^2 \le C(E\|\hat R_{N,u} - R_u\|_{\mathrm{sp}}^{8}\,E\|\hat a_{N,u} - a_u\|^4)^{1/2} = O(d_N^6). \tag{2.12}$$

Finally, Theorem 4.2 implies that

$$E\hat m_{N,u} := (E\hat R_{N,u} - R_u)a_u + E\hat r_u - r_u = \frac{1}{2}\,\frac{N^2}{T^2}\,d(h)\!\left(\frac{d^2}{du^2}R_u\,a_u + \frac{d^2}{du^2}r_u\right)\{1 + o(1)\}$$

and

$$E\{(\hat R_{N,u} - R_u)R_u^{-1}\hat m_{N,u}\} = E\{(\hat R_{N,u} - E\hat R_{N,u})R_u^{-1}(\hat m_{N,u} - E\hat m_{N,u})\} + o\!\left(\frac{N^2}{T^2}\right) =: l_{N,u} + o\!\left(\frac{N^2}{T^2}\right).$$

Theorems 4.1 and 4.2 and Corollary 4.1 establish the existence of the limits $\lim_T N l_{N,u} =: v(h)l(u)$ and $\lim_T N\operatorname{cov}(\hat a_{N,u}) =: v(h)W(u)$, which are the same as in the stationary case for the AR model (1.1) with coefficients $a_0(u), \ldots, a_p(u)$, $\sigma^2(u)$. But then it is well known that $l(u) = R^{-1}(u)b(u)$ (see Shaman and Stine, 1988; Zhang, 1992) and that $W(u) = \sigma^2(u)R_u^{-1}$ (see Brockwell and Davis, 1987, Theorem 8.1.1).

This together with (2.11) and (2.12) proves the theorem. ∎

The first term of the bias in Theorem 2.1 is due to non-stationarity (note that it disappears if $R_u$ and $r_u$ are constant over time) while the second term is the bias of the tapered Yule–Walker estimate in the stationary case (cf. Zhang, 1992). The covariance is the same as the covariance of the Yule–Walker estimate in the stationary case. If the goal is a small bias we may now balance the two bias terms. In the following theorem we minimize the mean squared error with respect to the segment length N and the taper function h.

Let $\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2 := (\hat a_{N,u} - a_u)'\Sigma_u(\hat a_{N,u} - a_u)$. In view of the asymptotic variance of $\hat a_{N,u}$ it makes sense to consider this norm with $\Sigma_u = \{1/\sigma^2(u)\}R_u$. Alternatively, one may choose $\Sigma_u$ as the identity matrix. Furthermore, let

$$D_1(u) := \left(\frac{d^2}{du^2}R_u\,a_u + \frac{d^2}{du^2}r_u\right)' R_u^{-1}\,\Sigma_u\,R_u^{-1}\left(\frac{d^2}{du^2}R_u\,a_u + \frac{d^2}{du^2}r_u\right)$$

$$D_2(u) := \sigma^2(u)\operatorname{tr}(\Sigma_u R_u^{-1})$$

$$C(h) := \frac{v(h)}{d^2(h)} = \int_0^1 h^4(x)\,dx\Big/\left\{\int_0^1 h^2(x)\left(x - \frac{1}{2}\right)^2 dx\right\}^2$$

$$c(h) := v(h)\,d(h)^{1/2}.$$

We have

$$E\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2 = \frac{N^4}{T^4}\,\frac{d^2(h)}{4}\,D_1(u) + \frac{1}{N}v(h)D_2(u) + o\!\left(\frac{N^4}{T^4} + \frac{1}{N}\right). \tag{2.13}$$

THEOREM 2.2. The mean squared error $E\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2$ is minimal for

$$h(x) = h_{opt}(x) = \{6x(1-x)\}^{1/2}, \qquad 0 \le x \le 1 \tag{2.14}$$

and

$$N = N_{opt}(u) = C(h_{opt})^{1/5}\left\{\frac{D_2(u)}{D_1(u)}\right\}^{1/5}T^{4/5}. \tag{2.15}$$

In this case

$$T^{4/5}\,E\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2 = \frac{5}{4}\,c(h_{opt})^{4/5}D_1(u)^{1/5}D_2(u)^{4/5} + o(1) \tag{2.16}$$

uniformly in u.

PROOF. We use similar arguments as in Priestley's derivation of the optimal window in spectral density estimation (Priestley, 1981, Ch. 7.5). According to (2.13) we have to minimize

$$f(N, h) := \frac{N^4}{T^4}\,\frac{d^2(h)D_1(u)}{4} + \frac{1}{N}v(h)D_2(u)$$

over N and h. The minimum of $f(N, h)$ in N is determined by the equation

$$\frac{\partial}{\partial N}f(N, h) = \frac{N^3}{T^4}\,d^2(h)D_1(u) - \frac{1}{N^2}v(h)D_2(u) = 0$$

and is achieved at $N_h = \{C(h)D_2(u)/D_1(u)\}^{1/5}T^{4/5}$.

It remains to minimize $f(N_h, h) = \frac{5}{4}c(h)^{4/5}D_1(u)^{1/5}D_2(u)^{4/5}T^{-4/5}$ with respect to h. Since $d(h)$, $v(h)$ and the estimate are invariant under multiplication of h by a constant we can restrict ourselves to tapers with $\int_0^1 h^2(x)\,dx = 1$. We now minimize $f(N_h, h)$ with respect to the larger class of functions $h: \mathbb{R}\to\mathbb{R}$ with

$$\int_{\mathbb{R}}h^2(x)\,dx = 1 \tag{2.17}$$

and

$$h(x) = h(1-x). \tag{2.18}$$

Suppose

$$h_\tau(x) = \frac{1}{\tau^{1/2}}\,h\!\left(\frac{x - 1/2}{\tau} + \frac{1}{2}\right)$$

(if $h^2$ is regarded as a probability density centred around 1/2, then $h_\tau^2$ is the density with scale multiplied by $\tau$, also centred around 1/2). We have $\int_{\mathbb{R}}h_\tau^2(x)\,dx = 1$, $d(h_\tau) = \tau^2 d(h)$ and $v(h_\tau) = \tau^{-1}v(h)$ (with the integrals in the definitions of d and v extended to the real line). As a consequence we obtain $N_{h_\tau} = N_h/\tau$ and $f(N_{h_\tau}, h_\tau) = f(N_h, h)$, i.e. we can restrict ourselves to h with

$$\int_{\mathbb{R}}h^2(x)\left(x - \frac{1}{2}\right)^2 dx = \int_{\mathbb{R}}h_{opt}^2(x)\left(x - \frac{1}{2}\right)^2 dx \tag{2.19}$$

where $h_{opt}$ is as in (2.14) with $h_{opt}(x) = 0$ for $x \notin [0, 1]$. Therefore, we have to show that

$$\int_{\mathbb{R}}h^4(x)\,dx \ge \int_{\mathbb{R}}h_{opt}^4(x)\,dx$$

for all h that fulfil (2.17), (2.18) and (2.19). Let $h^2(x) = h_{opt}^2(x) + \varepsilon(x)$, $x \in \mathbb{R}$, where $\varepsilon \not\equiv 0$. Since $\int_{\mathbb{R}}\varepsilon^2(x)\,dx \ge 0$ and $h_{opt}$ vanishes outside [0, 1] it is sufficient to prove that

$$\int_0^1\varepsilon(x)h_{opt}^2(x)\,dx \ge 0.$$

By (2.17) and (2.19)

$$\int_{\mathbb{R}}\varepsilon(x)\,dx = 0, \qquad \int_{\mathbb{R}}(x^2 - x)\varepsilon(x)\,dx = 0.$$

We obtain

$$\int_0^1\varepsilon(x)h_{opt}^2(x)\,dx = -6\int_{\mathbb{R}\setminus[0,1]}\varepsilon(x)\,x(1-x)\,dx \ge 0$$

since $\varepsilon(x) \ge 0$ and $x(1-x) \le 0$ for $x \notin [0, 1]$. ∎

By the same argument we obtain the following theorem for the global risk on the interval $I_T := [N/(2T), 1 - N/(2T)]$.

THEOREM 2.3. The minimum of the integrated squared risk $\int_{I_T}E\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2\,du$ is achieved by the taper $h_{opt}$ as given in (2.14) and the global bandwidth

$$N = N_{gl} = C(h_{opt})^{1/5}\left\{\frac{\int_{I_T}D_2(u)\,du}{\int_{I_T}D_1(u)\,du}\right\}^{1/5}T^{4/5}.$$

In this case

$$T^{4/5}\int_{I_T}E\|\hat a_{N,u} - a_u\|_{\Sigma_u}^2\,du = \frac{5}{4}\,c(h_{opt})^{4/5}\left\{\int_{I_T}D_1(u)\,du\right\}^{1/5}\left\{\int_{I_T}D_2(u)\,du\right\}^{4/5} + o(1).$$

REMARK 2.1. Assumption 2.1(iii) is needed in the proof of the boundedness of the empirical Yule–Walker estimate $\hat a_{N,u}$ in Lemma 4.2. If the distribution of the i.i.d. sequence $(\varepsilon_t)$ has a density, then it is easy to show that the distribution function of $X_{t,T}$ is absolutely continuous and therefore Assumption 2.1(iii) is satisfied.

REMARK 2.2. $D_1(u)$ measures the 'degree of non-stationarity' while $D_2(u)$ measures the variability of the estimate at time u. $N_{opt}$ gets larger if $D_1(u)$ gets smaller, i.e. if the process is closer to stationarity. At the same time the mean squared error decreases. The results are similar to kernel estimation in non-parametric regression or to kernel density estimation (cf. Silverman, 1986). $N/T$ is the bandwidth while $h^2(x)$ corresponds to the kernel (cf. Remark 2.6 below). In fact $h_{opt}^2(x) = 6x(1-x)$ is a transformation of the Epanechnikov kernel. If we define the efficiency of the taper h as

$$\operatorname{eff}(h) = \left\{\frac{c(h_{opt})}{c(h)}\right\}^{4/5} \quad (\le 1),$$

which compares it with the optimal taper, then for large N the mean squared error (2.16) will be the same using N observations and the taper h as if we use $\operatorname{eff}(h)N$ observations and the taper $h_{opt}$ (cf. Silverman, 1986).
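For illustration, the taper functionals in (2.5), together with C(h), c(h) and eff(h), are one-dimensional integrals that can be evaluated numerically, and (2.15) is then immediate. In the sketch below D1 and D2 are treated as given inputs; in practice they are unknown and would have to be estimated, e.g. by a plug-in rule, which is not discussed in the paper.

import numpy as np

def taper_constants(h, n=100_000):
    """Evaluate d(h) and v(h) of (2.5), C(h) = v(h)/d(h)^2 and
    c(h) = v(h) d(h)^{1/2} by a midpoint rule on [0, 1]."""
    x = (np.arange(n) + 0.5) / n
    h2 = h(x) ** 2
    m2 = np.mean(h2)                             # int_0^1 h^2(x) dx
    d = np.mean(h2 * (x - 0.5) ** 2) / m2        # d(h)
    v = np.mean(h2 ** 2) / m2 ** 2               # v(h)
    return d, v, v / d ** 2, v * np.sqrt(d)

def N_opt(D1, D2, T, h=lambda x: np.sqrt(6.0 * x * (1.0 - x))):
    """Optimal segment length (2.15): N = {C(h) D2(u)/D1(u)}^{1/5} T^{4/5}."""
    C = taper_constants(h)[2]
    return (C * D2 / D1) ** 0.2 * T ** 0.8

# Efficiency of the non-tapered (rectangular) estimate relative to h_opt:
c_opt = taper_constants(lambda x: np.sqrt(6.0 * x * (1.0 - x)))[3]
c_rect = taper_constants(lambda x: np.ones_like(x))[3]
eff_rect = (c_opt / c_rect) ** 0.8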

REMARK 2.3. We have excluded in this paper the problem of estimation of $a_u$ at the edges. Here one may use special tapers (or kernels; compare the discussion in Remark 2.6). Of course, the problem of selecting an adequate segment length from the sample also remains to be solved. The literature on smoothing offers a wide variety of methods for bandwidth selection (e.g. cross-validation, plug-in methods, bootstrap methods) which should be investigated in the present situation. Furthermore, there are a large number of other smoothing techniques whose use could be discussed for time-varying AR models. Dahlhaus et al. (1997) have investigated non-linear wavelet estimates for time-varying AR processes.

REMARK 2.4. We omit the discussion of the properties of the scale parameter estimate $\hat\sigma^2(u)$ at this point. They can be obtained as a special case of the results of Section 3.

REMARK 2.5. It should be pointed out that the asymptotic properties of the bias and the quadratic risk for least squares and Yule–Walker estimates of AR coefficients were obtained by Shaman and Stine (1988) and Lewis and Reinsel (1988) only under an additional ergodicity-type assumption on the sample covariance, $E\|\hat R_{N,u}^{-1} - R_u^{-1}\|_{\mathrm{sp}}^{8} < \infty$. By our approach this assumption can be omitted for Yule–Walker estimates in both the stationary and the locally stationary case.

REMARK 2.6. An equivalent estimator is a local least squares estimator. Consider the minimization of

$$\frac{1}{T}\sum_{t=1}^{T}\frac{1}{b_T}\,W\!\left(\frac{u - t/T}{b_T}\right)\left\{\sum_{j=0}^{p}a_j X_{t-j,T}\right\}^2$$

with respect to $a_1, \ldots, a_p$, where W is a kernel and $b_T$ is a bandwidth. The resulting estimate is

$$\tilde a_{N,u} = -\tilde R_{N,u}^{-1}\tilde r_u$$

where $\tilde R_{N,u} := \{\tilde c_N(u, i, j)\}_{i,j=1,\ldots,p}$, $\tilde r_u := (\tilde c_N(u, 0, 1), \ldots, \tilde c_N(u, 0, p))'$ and

$$\tilde c_N(u, i, j) := \frac{1}{T}\sum_{t=1}^{T}\frac{1}{b_T}\,W\!\left(\frac{u - t/T}{b_T}\right)X_{t-i,T}X_{t-j,T}.$$

If $W(x) = h(x)^2$ and $b_T = N/T$ then

$$\tilde c_N(u, i, j) - \hat c_N(u, i-j) = O_p(N^{-1})$$

and therefore also

$$\tilde a_{N,u} - \hat a_{N,u} = O_p(N^{-1}).$$

Furthermore, we conjecture that $\tilde a_{N,u}$ has the same bias expansion as $\hat a_{N,u}$ and that a similar result on the mean squared error can be proved. Subba Rao (1970) has studied a weighted least squares regression for time-varying parameters of processes with an evolutionary spectral representation.
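The criterion of Remark 2.6 can be transcribed directly into a weighted least squares problem. The sketch below is an illustrative reading of the displayed formulas, not code from the paper; the kernel W used in the example is h_opt^2 recentred at the origin, a hypothetical choice meant to be consistent with W = h^2 and b_T = N/T up to the centring convention.

import numpy as np

def local_least_squares(X, u, p, W, b_T):
    """Kernel-weighted least squares of Remark 2.6: minimize
    (1/T) sum_t (1/b_T) W((u - t/T)/b_T) {sum_{j=0}^p a_j X_{t-j,T}}^2."""
    T = len(X)
    t = np.arange(p, T)                           # times with p lags available (0-based)
    w = W((u - (t + 1) / T) / b_T) / b_T          # kernel weights
    Z = np.column_stack([X[t - j] for j in range(1, p + 1)])  # lags X_{t-1}, ..., X_{t-p}
    y = X[t]                                      # the a_0 = 1 term
    R = (Z * w[:, None]).T @ Z / T                # tilde-R_{N,u}
    r = (Z * w[:, None]).T @ y / T                # tilde-r_u
    return -np.linalg.solve(R, r)                 # tilde-a = -tilde-R^{-1} tilde-r

W = lambda x: np.where(np.abs(x) < 0.5, 6.0 * (0.25 - x ** 2), 0.0)  # h_opt^2, recentred
a_tilde = local_least_squares(X, u=0.5, p=1, W=W, b_T=128 / len(X))  # X as in Section 1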

3. PARAMETER ESTIMATION IN SEMIPARAMETRIC LOCALLY STATIONARY MODELS

We now generalize the results of the preceding section to arbitrary locally stationary models. Furthermore, we include the case where the model is misspecified.

As in Dahlhaus (1997) we call a sequence of stochastic processes $X_{t,T}$ ($t = 1, \ldots, T$) locally stationary if they admit a time-varying spectral representation

$$X_{t,T} = \mu\!\left(\frac{t}{T}\right) + \int_{-\pi}^{\pi}\exp(it\lambda)\,A^o_{t,T}(\lambda)\,\xi(d\lambda), \qquad t = 1, \ldots, T \tag{3.1}$$

where the transfer function $A^o_{t,T}(\lambda) = \overline{A^o_{t,T}(-\lambda)}$ is such that

$$A^o_{t,T}(\lambda) = A\!\left(\frac{t}{T}, \lambda\right) + O(T^{-1})$$

uniformly in $1 \le t \le T$ and $|\lambda| \le \pi$ for some function $A(u, \lambda): [0, 1]\times[-\pi, \pi]\to\mathbb{C}$ with $A(u, \lambda) = \overline{A(u, -\lambda)}$. The functions $A(u, \lambda)$ and $\mu(u)$ are assumed to be smooth (in this paper we assume $\mu(u) \equiv 0$ and differentiability of $A(u, \lambda)$; compare Assumption 3.1(ii)). The stochastic process $\xi(\lambda)$ is assumed to have bounded spectral densities $h_k$, $k \ge 2$ ($h_1 \equiv 0$, $h_2 \equiv 1$), defined by the cumulants of the random measure $d\xi(\lambda)$,

$$\operatorname{cum}\{d\xi(\lambda_1), \ldots, d\xi(\lambda_k)\} = \delta\!\left(\sum_{j=1}^{k}\lambda_j\right)h_k(\lambda_1, \ldots, \lambda_{k-1})\,d\lambda_1\cdots d\lambda_k \tag{3.2}$$

where $\delta(\cdot)$ is the Dirac function periodically extended to $\mathbb{R}$ with period $2\pi$.

The function

$$f(u, \lambda) := |A(u, \lambda)|^2$$

is called the time-varying spectral density. It is uniquely defined by the covariance structure of the triangular array $(X_{t,T})_{t=1}^{T}$, $T \ge 1$ (see Dahlhaus, 1996a).

The time-varying AR process (1.1) is locally stationary with spectral density

$$f(u, \lambda) = \frac{\sigma(u)^2}{2\pi}\Bigl|\sum_{j=0}^{p}a_j(u)\exp(i\lambda j)\Bigr|^{-2} = f_{\theta(u)}(\lambda) \tag{3.3}$$

(see Dahlhaus, 1996a, Theorem 2.3). Additional examples are given by Dahlhaus (1997).
Again we denote by t a time point in the interval [1, T] while u is a time point in the rescaled interval [0, 1], i.e. $u = t/T$.
Suppose now that we have observations $X_{1,T}, \ldots, X_{T,T}$ of a locally stationary process with mean zero and spectral density $f(u, \lambda)$ and we fit a locally stationary model with spectral density $f_{\theta(u)}(\lambda)$. $\theta(u)$ is estimated by the minimizer $\hat\theta_N(u)$ of the local Whittle likelihood

$$\hat L_N(\theta, u) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{\log f_\theta(\lambda) + \frac{I_N(u, \lambda)}{f_\theta(\lambda)}\right\}d\lambda, \qquad \theta\in\Theta \tag{3.4}$$

with a local version of the periodogram

$$I_N(u, \lambda) := \frac{1}{2\pi H_N}\Bigl|\sum_{s=0}^{N-1}h\!\left(\frac{s}{N}\right)\exp(-i\lambda s)\,X_{[uT]-N/2+s+1,T}\Bigr|^2 \tag{3.5}$$

where $h: [0, 1]\to\mathbb{R}$ is a data taper, $H_N := \sum_{j=0}^{N-1}h^2(j/N) \approx N\int_0^1 h^2(x)\,dx$ is the normalizing factor and [a] denotes the integer part of a real number a. We assume that the tapering function $h(x)$, $0 \le x \le 1$, has bounded second derivative and is symmetric:

$$h(x) = h(1-x), \qquad 0 \le x \le 1. \tag{3.6}$$

Note that the above estimate can also be interpreted as a local fit of a stationary model with spectral density $f_{\theta(u)}(\lambda)$. More generally, all results below also hold if a locally stationary model with spectral density $f_{\theta(u)}(u, \lambda)$ is fitted. However, we restrict ourselves to the slightly simpler case $f_{\theta(u)}(\lambda)$.
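As an illustration of the estimation step, the sketch below computes the tapered local periodogram (3.5) at the Fourier frequencies and minimizes a Riemann-sum version of (3.4). The AR(1) spectral density standing in for f_theta and the use of a generic numerical optimizer are hypothetical choices; for this AR model the minimizer agrees with the local Yule-Walker estimate of Section 2 up to discretization error, as noted around (2.9) below.

import numpy as np
from scipy.optimize import minimize

def local_periodogram(X, u, N, h):
    """Tapered local periodogram I_N(u, lambda_k) of (3.5) at the
    Fourier frequencies lambda_k = 2 pi k / N."""
    start = int(u * len(X)) - N // 2
    taper = h(np.arange(N) / N)
    d = np.fft.fft(taper * X[start:start + N])
    lam = 2.0 * np.pi * np.fft.fftfreq(N)
    return lam, np.abs(d) ** 2 / (2.0 * np.pi * np.sum(taper ** 2))

def f_ar1(theta, lam):
    """Spectral density f_theta of a stationary AR(1); theta = (sigma2, a1)."""
    sigma2, a1 = theta
    return sigma2 / (2.0 * np.pi * np.abs(1.0 + a1 * np.exp(-1j * lam)) ** 2)

def whittle_fit(X, u, N, h):
    """Minimize the Riemann-sum analogue of the local Whittle likelihood (3.4)."""
    lam, I = local_periodogram(X, u, N, h)
    def L_N(theta):                               # (1/4pi) int {log f + I/f} dlambda
        f = f_ar1(theta, lam)
        return 0.5 * np.mean(np.log(f) + I / f)
    res = minimize(L_N, x0=(1.0, -0.5), method="L-BFGS-B",
                   bounds=[(1e-6, None), (-0.99, 0.99)])
    return res.x                                  # estimate of (sigma^2(u), a_1(u))

h_opt = lambda x: np.sqrt(6.0 * x * (1.0 - x))
theta_hat = whittle_fit(X, u=0.5, N=128, h=h_opt)  # X as in Section 1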
If the model is correct, i.e. if $f(u, \lambda) = f_{\theta_0(u)}(\lambda)$, then $\hat\theta_N(u)$ will be an estimate of $\theta_0(u)$. If the model is misspecified and $f(u, \lambda)$ is not of the above form, then $\hat\theta_N(u)$ usually converges to $\theta_0(u)$ where $\theta_0(u)$ is the minimizer of

$$L(\theta, u) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{\log f_\theta(\lambda) + \frac{f(u, \lambda)}{f_\theta(\lambda)}\right\}d\lambda, \qquad \theta\in\Theta,\ 0 \le u \le 1. \tag{3.7}$$

Note that

$$d(f_\theta, f) := \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\left\{\log\frac{f_{\theta(u)}(\lambda)}{f(u, \lambda)} + \frac{f(u, \lambda)}{f_{\theta(u)}(\lambda)} - 1\right\}d\lambda\,du$$

is the asymptotic Kullback–Leibler information divergence between two Gaussian locally stationary processes with time-varying spectral densities $f(u, \lambda)$ and $f_{\theta(u)}(\lambda)$ (Dahlhaus, 1996a, Theorem 3.4). In the stationary case we have a similar expression (without the time integral). Therefore, $\theta_0(u)$ is that parameter which gives locally the best fit in the sense of the Kullback–Leibler information divergence.

For the time-varying AR process (1.1) where $\theta_0(u) = (\sigma^2(u), a_1(u), \ldots, a_p(u))'$ it can be shown that the estimate $\hat\theta_N(u)$ is given exactly by Equations (2.2). This follows from an evaluation of the equations

$$\frac{\partial}{\partial\theta_i}\hat L_N(\theta, u)\Big|_{\theta = \hat\theta_N(u)} = 0, \qquad i = 1, \ldots, p,$$

by using (2.9). Theorem 2.1 is therefore a special case of Theorems 3.1 and 3.2 proved below. However, the proofs in this section are much more complicated and the assumptions are more restrictive.
As in Section 2 we look at the weighted mean squared error, i.e. we calculate and minimize

$$E\|\hat\theta_N(u) - \theta_0(u)\|_{\Sigma_u}^2 \qquad\text{and}\qquad \int_{I_T}E\|\hat\theta_N(u) - \theta_0(u)\|_{\Sigma_u}^2\,du$$

where $I_T = [N/(2T), 1 - N/(2T)]$. To derive these expressions we need an expansion of the estimate. Let $\nabla = (\partial/\partial\theta_1, \ldots, \partial/\partial\theta_p)'$ be the gradient with respect to $\theta$, and define

$$S_N(\theta, u) := N^{1/2}\{\nabla\hat L_N(\theta, u) - E\nabla\hat L_N(\theta, u)\} = \frac{N^{1/2}}{4\pi}\int_{-\pi}^{\pi}\nabla f_\theta^{-1}(u, \lambda)\{I_N(u, \lambda) - EI_N(u, \lambda)\}\,d\lambda \tag{3.8}$$

$$B_N(\theta, u) := \frac{T^2}{N^2}\{E\nabla\hat L_N(\theta, u) - \nabla L(\theta, u)\} = \frac{T^2}{N^2}\,\frac{1}{4\pi}\int_{-\pi}^{\pi}\nabla f_\theta^{-1}(u, \lambda)\{EI_N(u, \lambda) - f(u, \lambda)\}\,d\lambda$$

where $\nabla^2 L_{\theta_0,u} := \nabla^2 L\{\theta_0(u), u\}$. In this section we use the following assumptions.

ASSUMPTIONS 3.1

(i) $\nabla\hat L_N\{\hat\theta_N(u), u\} = 0$ and $\nabla L\{\theta_0(u), u\} = 0$ for all u, N.
(ii) The derivatives (a) $\partial^2 A(u, \lambda)/\partial u\,\partial\lambda$ and (b) $\partial^3 A(u, \lambda)/\partial u^3$ are uniformly bounded in $(u, \lambda) \in [0, 1]\times[-\pi, \pi]$.
(iii) The derivatives

$$\frac{\partial^3}{\partial\theta_{i_1}\partial\theta_{i_2}\partial\theta_{i_3}}f_\theta^{-1}(u, \lambda), \qquad \frac{\partial^3}{\partial\theta_{i_1}\partial\theta_{i_2}\partial\theta_{i_3}}f_\theta(u, \lambda), \qquad \frac{\partial^2}{\partial\lambda^2}\,\frac{\partial}{\partial\theta_{i_1}}f_\theta^{-1}(u, \lambda)$$

are bounded for $1 \le i_1, i_2, i_3 \le p$ uniformly in $(\theta, u, \lambda) \in \bar\Theta\times[0, 1]\times[-\pi, \pi]$, where $\Theta$ is an open convex subset of $\mathbb{R}^p$ and $\bar\Theta$ denotes the closure of $\Theta$.
(iv) $\sup_{0\le u\le 1,\,\theta\in\Theta}\|\nabla^2 L(\theta, u)^{-1}\|_{\mathrm{sp}} < \infty$, where $\|\cdot\|_{\mathrm{sp}}$ denotes the spectral norm of the matrix.

Assumption 3.1(i) is usually fulfilled for $\hat\theta_N(u)$; an example is $(\hat a_{N,u}, \hat\sigma^2_{N,u})$ from Section 2. We now obtain the following asymptotic expansion for $\hat\theta_N(u)$.

THEOREM 3.1. Suppose Assumptions 3.1 hold and $T/N^2 + N/T \to 0$. Then the asymptotic expansion

$$\hat\theta_N(u) = \theta_0(u) - \nabla^2 L\{\theta_0(u), u\}^{-1}\left[\frac{N^2}{T^2}B_N\{\theta_0(u), u\} + \frac{1}{N^{1/2}}S_N\{\theta_0(u), u\}\right] + r_N \tag{3.9}$$

with

$$E^{1/2}(\|r_N\|^2) = O\!\left(\frac{N^4}{T^4} + \frac{1}{N}\right) \tag{3.10}$$

holds uniformly in u.

PROOF. We fix u and set $L_\theta = L(\theta, u)$, $\hat L_{N,\theta} = \hat L_N(\theta, u)$. It is sufficient to show that

$$\hat\theta_N(u) - \theta_0(u) = -(\nabla^2 L_{\theta_0(u)})^{-1}\nabla\hat L_{N,\theta_0(u)} + r_N \tag{3.11}$$

where

$$\nabla\hat L_{N,\theta_0(u)} = \frac{1}{4\pi}\int_{-\pi}^{\pi}\nabla f_{\theta_0(u)}^{-1}(u, \lambda)\{I_N(u, \lambda) - f(u, \lambda)\}\,d\lambda = \frac{N^2}{T^2}B_N\{\theta_0(u), u\} + \frac{1}{N^{1/2}}S_N\{\theta_0(u), u\} \tag{3.12}$$

and that the remainder $r_N$ is of the order $E^{1/2}(\|r_N\|^2) \le C\,d_N^2$ uniformly in u and N with

$$d_N = \frac{N^2}{T^2} + \frac{1}{N^{1/2}}.$$

By the mean value theorem

$$\nabla\hat L_{N,\hat\theta(u)} - \nabla\hat L_{N,\theta_0(u)} = \nabla^2\hat L_{N,\bar\theta}\{\hat\theta_N(u) - \theta_0(u)\} \tag{3.13}$$

for some $\bar\theta \in \bar\Theta$ with $\|\bar\theta - \theta_0\| \le \|\hat\theta - \theta_0\|$. Since $\nabla\hat L_{N,\hat\theta(u)} = 0$ and $\nabla L_{\theta_0(u)} = 0$ we can rewrite (3.13) in the form (3.11) with

$$r_N = -(\nabla^2 L_{\theta_0(u)})^{-1}V_{N,\bar\theta}\{\hat\theta_N(u) - \theta_0(u)\}$$

and

$$V_{N,\bar\theta} := \nabla^2\hat L_{N,\bar\theta} - \nabla^2 L_{\theta_0}.$$

Next, using (2.8) and Assumption 3.1(iv) we obtain the bound

$$E\|r_N\|^2 \le CE\Bigl\{\sup_{\bar\theta:\,\|\bar\theta-\theta_0\|\le\|\hat\theta-\theta_0\|}\|V_{N,\bar\theta}\|_{\mathrm{sp}}^2\,\|\hat\theta_N(u) - \theta_0(u)\|^2\Bigr\} \le CE\Bigl\{\|\hat\theta_N(u) - \theta_0(u)\|^4 + \sup_{\bar\theta:\,\|\bar\theta-\theta_0\|\le\|\hat\theta-\theta_0\|}\|V_{N,\bar\theta}\|_{\mathrm{sp}}^4\Bigr\}$$

uniformly in u. Therefore, the result of the theorem follows if we show (for $l = 2$) that

$$E\|\hat\theta_N(u) - \theta_0(u)\|^{2l} \le C\,d_N^{2l} \tag{3.14}$$

and

$$E\sup_{\bar\theta:\,\|\bar\theta-\theta_0\|\le\|\hat\theta-\theta_0\|}\|V_{N,\bar\theta}\|_{\mathrm{sp}}^{2l} \le C\,d_N^{2l}. \tag{3.15}$$

To show this let

$$M_N(\theta) \equiv M_N(f_\theta^{-1}) := \frac{1}{4\pi}\int_{-\pi}^{\pi}f_\theta^{-1}(u, \lambda)\{I_N(u, \lambda) - f(u, \lambda)\}\,d\lambda.$$

Clearly, $\nabla M_N(\theta) = \nabla\hat L_{N,\theta} - \nabla L_\theta$. Since $\nabla L_{\theta_0(u)} = 0$ and $\nabla\hat L_{N,\hat\theta(u)} = 0$, this allows us to rewrite

$$\nabla L_{\hat\theta(u)} - \nabla L_{\theta_0(u)} = \nabla L_{\hat\theta(u)} = -\nabla M_N\{\hat\theta_N(u)\}.$$

Therefore, by the mean value theorem

$$\hat\theta_N(u) - \theta_0(u) = (\nabla^2 L_{\bar\theta})^{-1}\nabla L_{\hat\theta(u)} = -(\nabla^2 L_{\bar\theta})^{-1}\nabla M_N\{\hat\theta_N(u)\}$$

for some $\bar\theta \in \bar\Theta$ with $\|\bar\theta - \theta_0\| \le \|\hat\theta_N(u) - \theta_0(u)\|$. Consequently, in view of (2.8) and Assumptions 3.1(iii) and 3.1(iv), we conclude that

$$\|\hat\theta_N(u) - \theta_0(u)\| \le C\sup_{\theta\in\bar\Theta}\|(\nabla^2 L_\theta)^{-1}\|_{\mathrm{sp}}\,\sup_{\theta\in\bar\Theta}\|\nabla L_\theta\| =: D < \infty \tag{3.16}$$

and

$$\|\hat\theta_N(u) - \theta_0(u)\| \le C\sup_{\theta\in\bar\Theta}\|(\nabla^2 L_\theta)^{-1}\|_{\mathrm{sp}}\sup_{\theta\in\bar\Theta,\,\|\theta\|\le D_1}\|\nabla M_N(\theta)\| \le C\sup_{\|\theta\|\le D_1}\|\nabla M_N(\theta)\| \tag{3.17}$$

where $D_1 = D + \|\theta_0\|$. Now, from Corollary 4.2 (later) it follows easily that for $l \ge 1$

$$E\sup_{\theta\in\bar\Theta,\,\|\theta\|\le D_1}\|\nabla M_N(\theta)\|^{2l} \le C\,d_N^{2l} \tag{3.18}$$

uniformly in u. From (3.17) and (3.18) we obtain (3.14).

It remains to prove (3.15). Observe that

$$\nabla^2\hat L_{N,\theta} = \nabla^2 M_N(\theta) + \nabla^2 L_\theta.$$

From Assumption 3.1(iii) we obtain with a Taylor expansion that $\nabla^2 L_\theta = \nabla^2 L_{\theta_0} + r_{N,1}$ where $\|r_{N,1}\|_{\mathrm{sp}} \le C\|\theta - \theta_0(u)\|$. Thus, $\nabla^2\hat L_{N,\theta} = \nabla^2 L_{\theta_0(u)} + V_{N,\theta}$, where $V_{N,\theta} = \nabla^2 M_N(\theta) + r_{N,1}$. Therefore, keeping in mind (3.16),

$$V_N := \sup_{\bar\theta:\,\|\bar\theta-\theta_0\|\le\|\hat\theta-\theta_0\|}\|V_{N,\bar\theta}\|_{\mathrm{sp}} \le C\sup_{\|\theta\|\le D_1}\|\nabla^2 M_N(\theta)\|_{\mathrm{sp}} + C\|\hat\theta_N(u) - \theta_0(u)\|. \tag{3.19}$$

The (i, j)th component of the $p\times p$ random matrix $\nabla^2 M_N(\theta)$ is a random process

$$\kappa^N_{i,j}(\theta) = \int_{-\pi}^{\pi}\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}f_\theta^{-1}(u, \lambda)\{I_N(u, \lambda) - f(u, \lambda)\}\,d\lambda$$

which is continuous in $\theta \in \bar\Theta \cap \{\|\theta\| \le D_1\}$. Thus, in view of Assumption 3.1(iii), it follows from Corollary 4.2 that

$$EV_N^{2l} \le CE\|\hat\theta_N(u) - \theta_0(u)\|^{2l} + C\max_{i,j=1,\ldots,p}E\sup_{\theta\in\bar\Theta,\,\|\theta\|\le D_1}|\kappa^N_{i,j}(\theta)|^{2l} \le C\,d_N^{2l}.$$

Thus, (3.15) and therefore also Theorem 3.1 have been proved. ∎

To formulate the results on the asymptotic bias and variance, we introduce a Gaussian field $\xi_u(g)$, indexed by a complex-valued or real function $g: [-\pi, \pi]\to\mathbb{C}$, with mean $E\xi_u(g) = 0$ and covariances

$$E\xi_u(g)\xi_u(k) = 2\pi\int_{-\pi}^{\pi}g(\lambda)\{k(\lambda) + k(-\lambda)\}f^2(u, \lambda)\,d\lambda + 2\pi\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}g(\lambda)k(-\mu)f(u, \lambda)f(u, \mu)h_4(\lambda, -\lambda, \mu)\,d\lambda\,d\mu. \tag{3.20}$$

Set

$$\{\zeta_j(u)\}_{j=1,\ldots,p} := \left\{\xi_u\!\left(\frac{1}{4\pi}\,\frac{\partial}{\partial\theta_j}f_{\theta_0}^{-1}(u)\right)\right\}_{j=1,\ldots,p}.$$

Further let

$$b(\theta, u) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\nabla f_\theta^{-1}(\lambda)\,\frac{\partial^2}{\partial u^2}f(u, \lambda)\,d\lambda \tag{3.21}$$

and $d(h)$ and $v(h)$ be defined as in (2.5).

THEOREM 3.2. Under the conditions of Theorem 3.1, as $N \to \infty$,

$$B_N\{\theta_0(u), u\} = \frac{d(h)}{2}\,b\{\theta_0(u), u\} + o(1) \tag{3.22}$$

uniformly in u, and

$$S_N(\theta_0, u) \Rightarrow \{v(h)\}^{1/2}\{\zeta_j(u)\}_{j=1,\ldots,p}. \tag{3.23}$$

Furthermore, the covariance matrix of $S_N(\theta_0)$ converges to the covariance matrix of $\{v(h)\}^{1/2}\{\zeta_j(u)\}_{j=1,\ldots,p}$, denoted by $v(h)W(u)$.

PROOF. The convergences (3.22) and (3.23) follow from Corollary 4.1 and Theorem 4.2.

COROLLARY 3.1. Suppose Assumptions 3.1 hold and $N^{5/2}/T^2 \to 0$. Then

$$N^{1/2}\{\hat\theta_N(u) - \theta_0(u)\} \Rightarrow \mathcal{N}\{0,\ v(h)(\nabla^2 L_{\theta_0,u})^{-1}W(u)(\nabla^2 L_{\theta_0,u})^{-1}\}.$$

As in Section 2 we now calculate the mean squared error for $\hat\theta_N(u)$ and minimize it with respect to N and h. Theorems 3.1 and 3.2 imply

$$E\|\hat\theta_N(u) - \theta_0(u)\|_{\Sigma_u}^2 = \frac{1}{4}\,\frac{N^4}{T^4}\,d^2(h)D_1(u) + \frac{1}{N}v(h)D_2(u) + o\!\left(\frac{N^4}{T^4} + \frac{1}{N}\right) \tag{3.24}$$

where

$$D_1(u) := b\{\theta_0(u), u\}'(\nabla^2 L_{\theta_0,u})^{-1}\Sigma_u(\nabla^2 L_{\theta_0,u})^{-1}b\{\theta_0(u), u\}$$

and

$$D_2(u) := \operatorname{tr}\{\Sigma_u(\nabla^2 L_{\theta_0,u})^{-1}W(u)(\nabla^2 L_{\theta_0,u})^{-1}\}.$$

We now have the analogue of Theorem 2.2.

THEOREM 3.3. The mean squared error $E\|\hat\theta(u) - \theta_0(u)\|_{\Sigma_u}^2$ is minimal if

$$h(x) = h_{opt}(x) = \{6x(1-x)\}^{1/2}, \qquad 0 \le x \le 1$$

and

$$N = N_{opt}(u) = C(h_{opt})^{1/5}\left\{\frac{D_2(u)}{D_1(u)}\right\}^{1/5}T^{4/5}.$$

In this case

$$T^{4/5}\,E\|\hat\theta_N(u) - \theta_0(u)\|_{\Sigma_u}^2 = \frac{5}{4}\,c(h_{opt})^{4/5}D_1(u)^{1/5}D_2(u)^{4/5} + o(1)$$

uniformly in u, N, T. A corresponding result also holds for the integrated squared error as in Theorem 2.3. Moreover, if $h_4$ is continuous then

$$N_{opt}\,\|\hat\theta_N(u) - \theta_0(u)\|_{\Sigma_u}^2 \Rightarrow Z_{opt}(u)'\,\Sigma_u\,Z_{opt}(u) \tag{3.25}$$

where $Z_{opt}$ is a Gaussian vector with mean $\frac{1}{2}\{C(h_{opt})D_2(u)/D_1(u)\}^{1/2}\,d(h_{opt})(\nabla^2 L_{\theta_0,u})^{-1}b\{\theta_0(u), u\}$ and covariance matrix $v(h_{opt})(\nabla^2 L_{\theta_0,u})^{-1}W(u)(\nabla^2 L_{\theta_0,u})^{-1}$.

Theorem 3.3 follows in the same way as Theorem 2.2. Relation (3.25) is a consequence of Theorems 3.1 and 3.2.

REMARK. The above results also give the mean squared error for the variance estimate $\hat\sigma^2_{N,u}$ of Section 2. Unfortunately, Assumptions 3.1(iii) and 3.1(iv) are not fulfilled in this case. However, one may derive this expression directly by similar arguments to those in the proof of Theorem 2.1.

4. LIMIT PROPERTIES OF THE INTEGRATED PERIODOGRAM

We consider in this section the asymptotic properties of the field

$$J_N(\phi) = \int_{-\pi}^{\pi}\phi(\lambda)\,I_N(u, \lambda)\,d\lambda, \qquad \phi(\cdot)\in\Phi \tag{4.1}$$

where $\Phi$ is the set of complex-valued bounded functions, equipped with the uniform norm $\|\phi\|_\infty := \sup_t|\phi(t)|$. The results of this section are used in Sections 2 and 3 (see (2.9) and (3.8)). Furthermore, they are of independent interest for other applications (e.g. for $\phi_\mu(\lambda) = 1_{[0,\mu]}(\lambda)$ we obtain the local spectral measure). Throughout this section we assume the function $\phi$ to be periodically extended to $\mathbb{R}$ with period $2\pi$. We start by approximating $J_N(\phi)$ by the corresponding statistic of a stationary process which has the same characteristics locally at $t = uT$. Let

$$J_N^Y(\phi) := \int_{-\pi}^{\pi}\phi(\lambda)\,I_N^Y(\lambda)\,d\lambda$$

where

$$I_N^Y(\lambda) := \frac{1}{2\pi H_N}\Bigl|\sum_{s=0}^{N-1}h\!\left(\frac{s}{N}\right)\exp(-i\lambda s)\,Y_{[uT]-N/2+s+1}\Bigr|^2$$

is the periodogram on the segment $[uT] - N/2 + 1, \ldots, [uT] + N/2$ of the stationary process

$$Y_s := \int_{-\pi}^{\pi}\exp(i\lambda s)\,A(u, \lambda)\,\xi(d\lambda).$$

THEOREM 4.1. Suppose Assumption 3.1(ii)(a) holds, $N/T \to 0$ as $T \to \infty$ and $\phi \in \Phi$ is a function of bounded variation. Then we have

$$J_N(\phi) - EJ_N(\phi) = J_N^Y(\phi) - EJ_N^Y(\phi) + o_p(N^{-1/2}) \tag{4.2}$$

and

$$J_N(\phi) = J_N^Y(\phi) + O\!\left(\frac{N}{T}\right) + o_p(N^{-1/2}). \tag{4.3}$$

Furthermore, for any $k \ge 2$, k even, we have

$$E|N^{1/2}\{J_N(\phi) - EJ_N(\phi)\}|^k \le C\|\phi\|_\infty^k \tag{4.4}$$

uniformly in N and $\phi$, where C does not depend on $\phi$.

PROOF. In view of definitions (3.1) and (3.5) we have

$$J_N(\phi) = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\hat d_N(x_1, x_2)\,\xi(dx_1)\,\xi(dx_2)$$

where

$$\hat d_N(x_1, x_2) = \frac{1}{2\pi H_N}\int_{-\pi}^{\pi}\phi(\lambda)\,i_N(\lambda, x_1)\,i_N(-\lambda, x_2)\,d\lambda$$

and

$$i_N(\lambda, x) = \sum_{s=0}^{N-1}\exp\{-is(\lambda - x)\}\exp\left[i\left([uT] - \frac{N}{2} + 1\right)x\right]h\!\left(\frac{s}{N}\right)A^o_{[uT]-N/2+s+1,T}(x). \tag{4.5}$$

We now decompose $\hat d_N(x_1, x_2)$ into a main term and some remainders which have to be estimated. Put

$$i_N^{(l)}(\lambda, x) := \sum_{s=0}^{N-1}\exp\{-is(\lambda - x)\}\exp\left[i\left([uT] - \frac{N}{2} + 1\right)x\right]h\!\left(\frac{s}{N}\right)A_s^{(l)}(x), \qquad l = 1, 2, 3,$$

where

$$A_s^{(1)}(x) := A(u, x), \qquad A_s^{(2)}(x) := -A(u, x) + A\!\left(\frac{[uT]-N/2+s+1}{T}, x\right),$$

$$A_s^{(3)}(x) := -A\!\left(\frac{[uT]-N/2+s+1}{T}, x\right) + A^o_{[uT]-N/2+s+1,T}(x).$$

Then

$$\hat d_N(x_1, x_2) = \sum_{j,l=1}^{3}\frac{1}{2\pi H_N}\int_{-\pi}^{\pi}\phi(\lambda)\,i_N^{(j)}(\lambda, x_1)\,i_N^{(l)}(-\lambda, x_2)\,d\lambda =: \sum_{j,l=1}^{3}\hat d_N^{(j,l)}(x_1, x_2). \tag{4.6}$$

Therefore, (4.1) can be written as

$$J_N(\phi) = \sum_{j,l=1}^{3}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\hat d_N^{(j,l)}(x_1, x_2)\,\xi(dx_1)\,\xi(dx_2) =: \sum_{j,l=1}^{3}J_N^{(j,l)}(\phi). \tag{4.7}$$

Direct verification gives that $J_N^{(1,1)}(\phi) = \int_{-\pi}^{\pi}\phi(\lambda)I_N^Y(\lambda)\,d\lambda$. Thus (4.2) follows if $J_N^{(j,l)}(\phi) - EJ_N^{(j,l)}(\phi) = o_p(N^{-1/2})$ for $j + l > 2$. This will be derived at the end of this proof.

To prove (4.4) we first consider the kth-order cumulants of $J_N(\phi)$. Let $c: [-\pi, \pi]^2 \to \mathbb{C}$ and

$$q(c) = \int_{[-\pi,\pi]^2}c(x_1, x_2)\{\xi(dx_1)\xi(dx_2) - E[\xi(dx_1)\xi(dx_2)]\}.$$

We have $J_N^{(j,l)}(\phi) - EJ_N^{(j,l)}(\phi) = q(\hat d_N^{(j,l)})$. We now obtain with the product theorem for cumulants (cf. Brillinger, 1981, Theorem 2.3.2)

$$\operatorname{cum}\{q(\hat d_N^{(j_1,l_1)}), \ldots, q(\hat d_N^{(j_k,l_k)})\} = \sum_{\gamma=(V_1,\ldots,V_r)\in\Gamma(G)}\int_{-\pi}^{\pi}\!\!\cdots\!\int_{-\pi}^{\pi}\Bigl\{\prod_{i=1}^{k}\hat d_N^{(j_i,l_i)}(x_{i1}, x_{i2})\Bigr\}\Bigl[\prod_{m=1}^{r}\operatorname{cum}\{\xi(dx_{st});\ (s,t)\in V_m\}\Bigr] \tag{4.8}$$

where $\Gamma(G)$ is the set of all indecomposable partitions $\gamma = (V_1, \ldots, V_r)$ of the set $G = \{(s,t)\}_{s=1,2;\ t=1,\ldots,k}$ with $|V_i| \ge 2$. With (3.2) and the notation $x^V := \{x_{st};\ (s,t)\in V\}$, $dx^V := \prod_{(s,t)\in V}dx_{st}$, $\delta(x^V) := \delta\bigl(\sum_{(s,t)\in V}x_{st}\bigr)$ this is equal to

$$\sum_{\gamma\in\Gamma(G)}\int_{-\pi}^{\pi}\!\!\cdots\!\int_{-\pi}^{\pi}\Bigl\{\prod_{i=1}^{k}\hat d_N^{(j_i,l_i)}(x_{i1}, x_{i2})\Bigr\}\prod_{m=1}^{r}\delta(x^{V_m})\,\tilde h(x^{V_m})\,dx^{V_m}$$

where $\tilde h(x_1, \ldots, x_k) := h_k(x_1, \ldots, x_{k-1})$. Repeated application of the Cauchy–Schwarz inequality now implies that the cumulant in (4.8) is bounded by

$$C\prod_{i=1}^{k}\|\hat d_N^{(j_i,l_i)}\|_{L^2((-\pi,\pi]^2)}$$

with some constant C independent of N and the $\hat d_N$. Theorem 2.3.2 of Brillinger (1981) now also implies for the moments

$$E\Bigl|\prod_{i=1}^{k}q(\hat d_N^{(j_i,l_i)})\Bigr| \le C\prod_{i=1}^{k}\|\hat d_N^{(j_i,l_i)}\|_{L^2((-\pi,\pi]^2)}. \tag{4.9}$$

In order to prove (4.4) we therefore have to estimate

$$\|\hat d_N^{(j,l)}\|_{L^2((-\pi,\pi]^2)}^2 \le C\,\frac{\|\phi\|_\infty^2}{H_N^2}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\left\{\int_{-\pi}^{\pi}|i_N^{(j)}(\lambda, x_1)\,i_N^{(l)}(-\lambda, x_2)|\,d\lambda\right\}^2 dx_1\,dx_2. \tag{4.10}$$

Summation by parts gives under Assumption 3.1(ii) (see Dahlhaus, 1997, Lemma A.5)

$$|i_N^{(1)}(\lambda, x)| \le C\,L_N(x - \lambda) \tag{4.11}$$

$$|i_N^{(2)}(\lambda, x)| \le C\,\frac{N}{T}\,L_N(x - \lambda), \qquad |i_N^{(3)}(\lambda, x)| \le C\,\frac{N}{T} \tag{4.12}$$

uniformly in u, x, $\lambda$, where $L_N(x)$ is the periodic extension of

$$L_N(x) := \begin{cases}N & |x| \le 1/N\\ 1/|x| & 1/N \le |x| \le \pi.\end{cases} \tag{4.13}$$

Moreover, it is easy to verify that, for any $0 < \varepsilon < 1/2$,

$$\int_{-\pi}^{\pi}L_N(x + y)L_N(y)\,dy \le C(\varepsilon)\,N^{\varepsilon}L_N^{1-\varepsilon}(x) \tag{4.14}$$

$$\int_{-\pi}^{\pi}|N^{\varepsilon}L_N^{1-\varepsilon}(y)|^2\,dy \le C(\varepsilon)\,N \tag{4.15}$$

uniformly in $|x| \le \pi$ and $N \ge 1$, where the constant $C(\varepsilon)$ does not depend on N. This implies

$$\|\hat d_N^{(1,1)}\|_{L^2((-\pi,\pi]^2)}^2 \le C\,\frac{\|\phi\|_\infty^2}{N} \tag{4.16}$$

and

$$\|\hat d_N^{(j,l)}\|_{L^2((-\pi,\pi]^2)}^2 \le C\,\|\phi\|_\infty^2\,\frac{N}{T^2} \tag{4.17}$$

for $j + l > 2$. We therefore obtain for k even

$$E|N^{1/2}\{J_N(\phi) - EJ_N(\phi)\}|^k = N^{k/2}\Bigl|\sum_{j_1,\ldots,j_k=1}^{3}\sum_{l_1,\ldots,l_k=1}^{3}E\prod_{i=1}^{k}q(\hat d_N^{(j_i,l_i)})\Bigr| \le C\|\phi\|_\infty^k$$

which proves (4.4). We also have for $j + l > 2$

$$E|N^{1/2}\{J_N^{(j,l)}(\phi) - EJ_N^{(j,l)}(\phi)\}|^2 \le C\,\|\phi\|_\infty^2\,\frac{N^2}{T^2}$$

which finally proves (4.2), and for $j + l > 2$

$$EJ_N^{(j,l)}(\phi) = \frac{1}{2\pi H_N}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\phi(\lambda)\,i_N^{(j)}(\lambda, -x)\,i_N^{(l)}(-\lambda, x)\,d\lambda\,dx = O\!\left(\frac{N}{T}\right)$$

which, together with (4.2), gives (4.3). ∎

COROLLARY 4.1. Let Assumption 3.1(ii)(a) be satisfied, $N/T \to 0$ as $T \to \infty$ and $\phi_j \in \Phi$ be functions of bounded variation. Then

$$\bigl[N^{1/2}\{J_N(\phi_j) - EJ_N(\phi_j)\}\bigr]_{j=1,\ldots,p} \Rightarrow \{v(h)\}^{1/2}\{\xi(\phi_j)\}_{j=1,\ldots,p} \tag{4.18}$$

where $\xi(\phi_j)$, $j = 1, \ldots, p$, is a Gaussian vector with zero mean and covariance (3.20).

PROOF. The result follows from (4.2) since Theorem 2 of Dahlhaus (1983) implies that

$$\left[N^{1/2}\int_{-\pi}^{\pi}\phi_j(\lambda)\{I_N^Y(\lambda) - EI_N^Y(\lambda)\}\,d\lambda\right]_{j=1,\ldots,p} \Rightarrow \{v(h)\}^{1/2}\{\xi(\phi_j)\}_{j=1,\ldots,p}. \qquad ∎$$

Keeping in mind relation (2.9) it is therefore heuristically clear from (4.3) that $\hat a_{N,u}$ as given in (2.2) has the same asymptotic distribution as in the stationary case, which leads to (2.6).
We now derive the bias of $J_N(\phi)$. Let

$$J(\phi) := \int_{-\pi}^{\pi}\phi(\lambda)\,f(u, \lambda)\,d\lambda. \tag{4.19}$$

THEOREM 4.2. Let the function $\phi$ have bounded second derivative $\phi'' = d^2\phi/d\lambda^2$ and let Assumption 3.1(ii)(b) and $T/N^2 + N/T \to 0$ be satisfied. Then

$$EJ_N(\phi) - J(\phi) = \frac{1}{2}\,\frac{N^2}{T^2}\,d(h)\int_{-\pi}^{\pi}\phi(\lambda)\,\frac{\partial^2}{\partial u^2}f(u, \lambda)\,d\lambda + o\!\left\{\frac{N^2}{T^2}(\|\phi\|_\infty + \|\phi''\|_\infty)\right\} \tag{4.20}$$

with $d(h)$ as in (2.5), uniformly in u and $\phi$.
PROOF. Denote

$$q_N := E\int_{-\pi}^{\pi}\phi(\lambda)I_N(u, \lambda)\,d\lambda = (2\pi H_N)^{-1}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\phi(\lambda)\,|i_N(\lambda, \mu)|^2\,d\lambda\,d\mu.$$

Observe that, in view of Taylor's formula,

$$\phi(\lambda) = \phi(\mu) + \phi'(\mu)\{(\lambda - \mu)\bmod 2\pi\} + O\{|(\lambda - \mu)\bmod 2\pi|^2\,\|\phi''\|_\infty\}$$

uniformly in $\lambda$ and $\mu$, and

$$\int_{-\pi}^{\pi}\{(\lambda - \mu)\bmod 2\pi\}\,|i_N(\lambda, \mu)|^2\,d\lambda = 0$$

for each $\mu \in [-\pi, \pi)$. This implies

$$q_N = (2\pi H_N)^{-1}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\phi(\mu)\,|i_N(\lambda, \mu)|^2\,d\lambda\,d\mu + O\left\{\|\phi''\|_\infty\,N^{-1}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}|(\lambda - \mu)\bmod 2\pi|^2\,|i_N(\lambda, \mu)|^2\,d\lambda\,d\mu\right\} =: q_N^{(1)} + O(\|\phi''\|_\infty\,r_N).$$

We show below that

$$q_N^{(1)} = J(\phi) + \frac{1}{2}\,\frac{N^2}{T^2}\,d(h)\int_{-\pi}^{\pi}\phi(\lambda)\,\frac{\partial^2}{\partial u^2}f(u, \lambda)\,d\lambda + o\!\left(\frac{N^2}{T^2}\|\phi\|_\infty\right) \tag{4.21}$$

$$r_N = o\!\left(\frac{N^2}{T^2}\right) \tag{4.22}$$

uniformly in $\phi$. Clearly, (4.21) and (4.22) imply (4.20). First we shall prove (4.21). By definition (4.5) of $i_N(\cdot, \cdot)$,

$$q_N^{(1)} = H_N^{-1}\int_{-\pi}^{\pi}\phi(\mu)\sum_{s=0}^{N-1}h^2\!\left(\frac{s}{N}\right)f\!\left(\frac{[uT]-N/2+s+1}{T}, \mu\right)d\mu + O\!\left(\frac{1}{T}\|\phi\|_\infty\right).$$

Next, from Taylor's formula we have

$$f(u + v, \mu) = f(u, \mu) + v\,\frac{\partial}{\partial u}f(u, \mu) + \frac{v^2}{2}\,\frac{\partial^2}{\partial u^2}f(u, \mu) + O(v^3).$$

Furthermore, for h such that $h(x) = h(1-x)$, $0 \le x \le 1$, the following assertions are valid:

$$\frac{1}{H_N}\sum_{s=0}^{N-1}h^2\!\left(\frac{s}{N}\right)\frac{s + 1 - N/2}{T} = O(T^{-1})$$

$$\frac{1}{H_N}\sum_{s=0}^{N-1}h^2\!\left(\frac{s}{N}\right)\left(\frac{s + 1 - N/2}{T}\right)^2 = \frac{N^2}{T^2}\,d(h)\{1 + o(1)\}$$

$$\frac{1}{H_N}\sum_{s=0}^{N-1}h^2\!\left(\frac{s}{N}\right)\left(\frac{s + 1 - N/2}{T}\right)^3 = O\!\left(\frac{N^3}{T^3}\right).$$

Consequently,

$$q_N^{(1)} = J(\phi) + \frac{1}{2}\,\frac{N^2}{T^2}\,d(h)\int_{-\pi}^{\pi}\phi(\mu)\,\frac{\partial^2}{\partial u^2}f(u, \mu)\,d\mu + o\!\left(\frac{N^2}{T^2}\|\phi\|_\infty\right)$$

uniformly in $\phi$ and u.

Now we turn to the proof of assertion (4.22). In view of (4.11) and (4.12),

$$i_N(\lambda, \mu) = A(u, \mu)\exp\left[i\left([uT] - \frac{N}{2} + 1\right)\mu\right]\sum_{s=0}^{N-1}\exp\{-is(\lambda - \mu)\}h\!\left(\frac{s}{N}\right) + O\!\left\{\frac{N}{T}\,L_N(\lambda - \mu)\right\}.$$

Furthermore, by Lemma A.7 of Dahlhaus (1997),

$$\Bigl|\sum_{s=0}^{N-1}\exp(-is\mu)\,h\!\left(\frac{s}{N}\right)\Bigr| \le C\,\frac{L_N^2(\mu)}{N}.$$

Hence

$$r_N \le C\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}|(\lambda - \mu)\bmod 2\pi|^2\left\{\frac{L_N^4(\lambda - \mu)}{N^3} + \frac{N}{T^2}\,L_N^2(\lambda - \mu)\right\}d\lambda\,d\mu \le C\left(\frac{1}{N^2} + \frac{N}{T^2}\right) = o\!\left(\frac{N^2}{T^2}\right)$$

by the assumption $T/N^2 + N/T \to 0$, where C does not depend on $\phi$ and N. This finishes the proof of assertion (4.22). ∎

We remark that Theorem 4.2 together with (4.2) leads to the same approximation as in (4.3) with $O(N^2/T^2)$ instead of $O(N/T)$, but under stronger assumptions.

To prove the next result we need the following lemma, which can be established in the same way as Theorem 19 of Ibragimov and Has'minskii (1981).

LEMMA 4.1. Let the random process $\zeta(t)$ be defined and continuous with probability 1 on the closed set $F \subseteq \mathbb{R}^p$. Assume that there exist numbers $k \ge m > p$ and a number $H > 0$ such that for all t, h with $t, t + h \in F$

$$E|\zeta(t + h) - \zeta(t)|^k \le H|h|^m, \qquad E|\zeta(t)|^k \le H.$$

Then

$$E\sup_{|x-y|\le h;\ x,y\in F}|\zeta(x) - \zeta(y)|^k \le B_0 H h^{m-p}, \qquad E\sup_{x\in F}|\zeta(x)|^k \le B_0 H$$

where the constant $B_0$ depends on F, k, m and p.

COROLLARY 4.2. Let Assumption 3.1(ii)(b) and $T/N^2 + N/T \to 0$ be satisfied. Suppose $F \subseteq \mathbb{R}^p$ is a closed set and $\phi_\theta: [-\pi, \pi]\to\mathbb{C}$ are functions whose derivatives $\partial\phi_\theta(\lambda)/\partial\theta_i$, $i = 1, \ldots, p$, and $\partial^2\phi_\theta(\lambda)/\partial\lambda^2$ are uniformly bounded in $(\theta, \lambda)$ on $F\times[-\pi, \pi]$. Then for any $k \ge 2$, k even,

$$E\sup_{\theta\in F}|N^{1/2}\{J_N(\phi_\theta) - J(\phi_\theta)\}|^k \le B_0\,d_N^k$$

where $d_N = N^2/T^2 + 1/N^{1/2}$ and $B_0$ depends on k and F.

PROOF. Theorem 4.1 implies

$$E|N^{1/2}\{J_N(\phi_v) - EJ_N(\phi_v)\} - N^{1/2}\{J_N(\phi_\mu) - EJ_N(\phi_\mu)\}|^k \le C\|\phi_v - \phi_\mu\|_\infty^k \le C\|v - \mu\|^k$$

and

$$E|N^{1/2}\{J_N(\phi_v) - EJ_N(\phi_v)\}|^k \le C$$

where C does not depend on N and $v, \mu \in F$. The result now follows from Lemma 4.1 and Theorem 4.2. ∎

Finally we prove a result that is needed in the proof of Theorem 2.1.

LEMMA 4.2. Under Assumption 2.1(iii) the Yule–Walker estimate (2.2), $\hat a_{N,u} = -\hat R_{N,u}^{-1}\hat r_u$, satisfies the bound

$$\|\hat a_{N,u}\| \le 2^p \qquad \text{almost surely.} \tag{4.23}$$

PROOF. First we show that $\hat R_{N,u}$ from (2.2) is non-singular with probability 1. Suppose it were singular, i.e. $P(|\hat R_{N,u}| = 0) > 0$. If $|\hat R_{N,u}| = 0$ then there exists $a \ne 0$ such that $\hat R_{N,u}a = 0$, and therefore $a'\hat R_{N,u}a = \int_{-\pi}^{\pi}I_N(u, \lambda)\bigl|\sum_{j=1}^{p}a_j\exp(i\lambda j)\bigr|^2\,d\lambda = 0$. This implies $I_N(u, \lambda) = 0$ for all $\lambda$, and consequently $h(s/N)X_{[uT]-N/2+s+1,T} = 0$, $s = 0, \ldots, N-1$. But then $P(X_t = 0) > 0$ for some $t = 1, \ldots, T$, contradicting Assumption 2.1(iii).

Because $\hat R_{N,u}$ is non-singular, the roots $z_j$ of the polynomial $\sum_{j=0}^{p}\hat a_N(u, j)z^j$ satisfy the inequalities $|z_j| > 1$ (see Brockwell and Davis, 1987, Problem 8.3; also Whittle, 1963). Therefore

$$\|\hat a\|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\prod_{j=1}^{p}|1 - z_j^{-1}\exp(i\lambda)|^2\,d\lambda \le 4^p$$

almost surely, i.e. $\|\hat a_{N,u}\| \le 2^p$. ∎

ACKNOWLEDGEMENTS

The authors are very grateful to the referees whose comments led to a substantial
improvement of the paper.
This research was supported by the Deutsche Forschungsgemeinschaft and the
Alexander von Humboldt Foundation.

REFERENCES

BRILLINGER, D. R. (1981) Time Series: Data Analysis and Theory. San Francisco, CA: Holden-Day.
BROCKWELL, P. J. and DAVIS, R. A. (1987) Time Series: Theory and Methods. New York: Springer.
DAHLHAUS, R. (1983) Spectral analysis with tapered data. J. Time Ser. Anal. 4, 163–75.
DAHLHAUS, R. (1988) Small sample effects in time series analysis: a new asymptotic theory and a new estimate. Ann. Stat. 16, 804–41.
DAHLHAUS, R. (1996a) On the Kullback–Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62, 139–68.
DAHLHAUS, R. (1996b) Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In Athens Conference on Applied Probability and Time Series, Vol. II (eds P. M. Robinson and M. Rosenblatt), Lecture Notes in Statistics 115. New York: Springer, pp. 145–59.
DAHLHAUS, R. (1997) Fitting time series models to nonstationary processes. Ann. Stat. 25, 1–37.
DAHLHAUS, R., NEUMANN, M. and VON SACHS, R. (1997) Nonlinear wavelet estimation of time-varying autoregressive processes. Bernoulli, to be published.
DAVIES, R. (1973) Asymptotic inference in stationary Gaussian time-series. Adv. Appl. Probab. 5, 469–97.
HALLIN, M. (1978) Mixed autoregressive moving average multivariate processes with time-dependent coefficients. J. Multivariate Anal. 8, 567–72.
HALLIN, M. (1984) Spectral factorization of nonstationary moving average processes. Ann. Stat. 12, 172–92.
HUSSAIN, M. Y. and SUBBA RAO, T. (1976) The estimation of autoregressive moving average and mixed autoregressive moving average systems with time-dependent parameters of non-stationary time series. Int. J. Control 23, 647–56.
IBRAGIMOV, I. A. and HAS'MINSKII, R. Z. (1981) Statistical Estimation. Asymptotic Theory. New York: Springer.
KÜNSCH, H. R. (1995) A note on causal solutions for locally stationary AR processes. Preprint, ETH Zürich.
LEWIS, R. A. and REINSEL, G. C. (1988) Prediction error of multivariate time series with misspecified models. J. Time Ser. Anal. 9, 43–57.
MÉLARD, G. (1985) An example of the evolutionary spectrum theory. J. Time Ser. Anal. 6, 81–90.
MILLER, K. S. (1969) Nonstationary autoregressive processes. IEEE Trans. Inform. Theory IT-15, 315–16.
PRIESTLEY, M. B. (1965) Evolutionary spectra and non-stationary processes. J. R. Stat. Soc. B 27, 204–29.
PRIESTLEY, M. B. (1981) Spectral Analysis and Time Series, Vol. 1. London: Academic Press.
PRIESTLEY, M. B. (1988) Non-linear and Non-stationary Time Series Analysis. London: Academic Press.
SHAMAN, P. and STINE, R. A. (1988) The bias of autoregressive coefficient estimators. J. Am. Stat. Assoc. 83, 842–48.
SILVERMAN, B. W. (1986) Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
SUBBA RAO, T. (1970) The fitting of non-stationary time-series models with time-dependent parameters. J. R. Stat. Soc. B 32, 312–22.
SUBBA RAO, T. (1996) Spectral and higher order spectral analysis of nonstationary and nonlinear time series. Technical Report, Manchester Centre for Statistical Science.
WHITTLE, P. (1963) On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix. Biometrika 50, 129–34.
ZHANG, H.-C. (1992) Reduction of the asymptotic bias of autoregressive and spectral estimators by tapering. J. Time Ser. Anal. 13, 451–69.
