100% found this document useful (1 vote)

151 views32 pages

Forecasting Non-Stationary Time Series by Wavelet Process Modelling (Lsero)

Uploaded by

Ana Scalet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

151 views32 pages

Forecasting Non-Stationary Time Series by Wavelet Process Modelling (Lsero)

Uploaded by

Ana Scalet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 32

Piotr Fryzlewicz, Sébastien van Bellegem and Rainer von

Sachs
Forecasting non-stationary time series by
wavelet process modelling

Article (Accepted version)

(Unrefereed)

Original citation:
Fryzlewicz, Piotr and van Bellegem, Sébastien and von Sachs, Rainer (2003) Forecasting non-
stationary time series by wavelet process modelling. Annals of the Institute of Statistical
Mathematics, 55 (4). pp. 737-764. ISSN 0020-3157
DOI: 10.1007/BF02523391

© 2003 The Institute of Statistical Mathematics

This version available at: http://eprints.lse.ac.uk/25830/

Available in LSE Research Online: November 2011

LSE has developed LSE Research Online so that users may access research output of the
School. Copyright © and Moral Rights for the papers on this site are retained by the individual
authors and/or other copyright owners. Users may download and/or print one copy of any
article(s) in LSE Research Online to facilitate their private study or for non-commercial research.
You may not engage in further distribution of the material or use it for any profit-making activities
or any commercial gain. You may freely distribute the URL (http://eprints.lse.ac.uk) of the LSE
Research Online website.

This document is the author’s final accepted version of the journal article. There may be
differences between this version and the published version. You are advised to consult the
publisher’s version if you wish to cite from it.
Forecasting non-stationary time series
by wavelet process modelling
P. Fryźlewicz1 S. Van Bellegem2, 4, ∗ R. von Sachs3, 4
December 16, 2002

Abstract
Many time series in the applied sciences display a time-varying second order struc-
ture. In this article, we address the problem of how to forecast these non-stationary
time series by means of non-decimated wavelets. Using the class of Locally Station-
ary Wavelet processes, we introduce a new predictor based on wavelets and derive the
prediction equations as a generalisation of the Yule-Walker equations. We propose
an automatic computational procedure for choosing the parameters of the forecasting
algorithm. Finally, we apply the prediction algorithm to a meteorological time series.

Keywords: Local stationarity, non-decimated wavelets, prediction, time-modulated pro-

cesses, Yule-Walker equations.
Running head: Forecasting non-stationary processes by wavelets

1
University of Bristol, Department of Mathematics, Bristol, UK. E-mail: [email protected]
2
Research Fellow of the National Fund for Scientific Research (F.N.R.S.). Université catholique de Lou-
vain, Institut de statistique, Louvain-la-Neuve, Belgium. E-mail: [email protected]
3
Université catholique de Louvain, Institut de statistique, Louvain-la-Neuve, Belgium. E-mail:
[email protected]
4
Financial support from the contract ‘Projet d’Actions de Recherche Concertées’ nr. 98/03-217 of the
Belgian Government and from the IAP research network No. P5/24 of the Belgian State (Federal Office for
Scientific, Technical and Cultural Affairs) are gratefully acknowledged.
* Corresponding author. Address for correspondence: Université catholique de Louvain, Institut de
statistique, Voie du Roman Pays, 20, B-1348 Louvain-la-Neuve, Belgium. Fax: +32 10 47.30.32

1
1 Introduction
In a growing number of fields, such as biomedical time series analysis, geophysics, telecom-
munications, or financial data analysis, to name but a few, explaining and inferring from
observed serially correlated data calls for non-stationary models of their second order struc-
ture. That is, variance and covariance, or equivalently the spectral structure, are likely to
change over time.
In this article, we address the problem of whether and how wavelet methods can help
in forecasting non-stationary time series. Recently, Antoniadis and Sapatinas (2002) used
wavelets for forecasting time-continuous stationary processes. The use of wavelets has proved
successful in capturing local features of observed data. There arises a natural question of
whether they can also be useful for prediction in situations where too little homogeneous
structure at the end of the observed data set prevents the use of classical prediction methods
based on stationarity. Obviously, in order to develop a meaningful approach, one needs to
control this deviation from stationarity, and hence one first needs to think about what kind
of non-stationary models to fit to the observed data. Let us give a brief overview of the
existing possibilities.
Certainly the simplest approach consists in assuming piecewise stationarity, or approxi-
mate piecewise stationarity, where the challenge is to find the stretches of homogeneity opti-
mally in a data-driven way (Ombao et al., 2001). The resulting estimate of the time-varying
second order structure is, necessarily, rather blocky over time, so some further thoughts on
how to cope with these potentially artificially introduced discontinuities are needed. To name
a few out of the many models which allow a smoother change over time, we cite the following
approaches to the idea of “local stationarity”: the work of Mallat et al. (1998), who impose
bounds on the derivative of the Fourier spectrum as a function of time, and the approaches
which allow the coefficients of a parametric model (such as AR) to vary slowly with time
(e.g. Mélard and Herteleer-De Schutter (1989), Dahlhaus et al. (1999) or Grillenzoni (2000)).
The following fact is a starting point for several other more general and more non-parametric
approaches: every covariance-stationary process Xt has a Cramér representation
Z
(1.1) Xt = A(ω) exp(iωt)dZ(ω), t ∈ Z,
(−π,π]

where Z(ω) is a stochastic process with orthonormal increments. Non-stationary processes

are defined by assuming a slow change over time of the amplitude A(ω) (Priestley (1965),
Dahlhaus (1997), Ombao et al. (2002)). All the above models are of the “time-frequency”
type as they use, directly or indirectly, the concept of a time-varying spectrum, being the
Fourier transform of a time-varying autocovariance.
The work of Nason, von Sachs and Kroisandt (2000) adopts the concept of local sta-
tionarity but replaces the aforementioned spectral representation with respect to the Fourier
basis by a representation with respect to non-decimated (or translation-invariant) wavelets.
With their model of “Locally Stationary Wavelet” (LSW) processes, the authors introduce
a time-scale representation of a stochastic process. The representation allows for a rigorous
theory of how to estimate the wavelet spectrum, i.e. the coefficients of the resulting repre-
sentation of the local autocovariance function with respect to autocorrelation wavelets. This
theory parallels the one developed by Dahlhaus (1997), where rescaling the time argument
of the autocovariance and the Fourier spectrum makes it possible to embed the estimation
in the non-parametric regression setting, including asymptotic considerations of consistency

2
and inference. Nason et al. (2000) also propose a fast and easily implementable estimation
algorithm which accompanies their theory.
As LSW processes are defined with respect to a wavelet system, they have a mean-square
representation in the time-scale plane. It is worth recalling that many time series in the ap-
plied sciences are believed to have an inherent “multiscale” structure (e.g. financial log-return
data, see Calvet and Fisher (2001)). In contrast to Fourier-based models of nonstationarity,
the LSW model offers a multiscale representation of the (local) covariance (see Section 2).
This representation is often sparse, and thus the covariance may be estimated more easily
in practice. The estimator itself is constructed by means of the wavelet periodogram, which
mimicks the structure of the LSW model and is naturally localised.
Given all these benefits, it seems appropriate to us to use the (linear) LSW model to
generalise the stationary approach of forecasting Xt by means of a predictor based on the
previous observations up to time t − 1. While the classical linear predictor can be viewed
as based on a non-local Fourier-type representation, our generalisation uses a local wavelet-
based approach.
The paper is organised as follows: Section 2 familiarises the reader with the general
LSW model, as well as with the particular subclass of time-modulated processes. These are
stationary processes modulated by a time-varying variance function, and have proved useful,
for instance, in modelling financial log-return series (Van Bellegem and von Sachs (2002)).
In the central Section 3, we deal with the theory of prediction for LSW processes, where
the construction of our linear predictor is motivated by the approach in the stationary case,
i.e. the objective is to minimise the mean-square prediction error (MSPE). This leads to
a generalisation of the Yule-Walker equations, which can be solved numerically by matrix
inversion or standard iterative algorithms such as the innovations algorithm (Brockwell and
Davis, 1991), provided that the non-stationary covariance structure is known. However, the
estimation of a non-stationary covariance structure is the main challenge in this context, and
this issue is addressed in Section 4. In the remainder of Section 3, we derive an analogue of
the classical Kolmogorov formula for the theoretical prediction error, and we generalise the
one-step-ahead to h-step-ahead prediction.
Section 4 deals with estimation of the time-varying covariance structure. We discuss some
asymptotic properties of our estimators based on the properties of the corrected wavelet
periodogram, which is an asymptotically unbiased, but not consistent, estimator of the
wavelet spectrum. To achieve consistency, we propose an automatic smoothing procedure,
which forms an integral part of our new algorithm for forecasting non-stationary time series.
The algorithm implements the idea of adaptive forecasting (see Ledolter (1980)) in the LSW
model. In Section 5 we apply our algorithm to a meteorological time series.
We close with a conclusions section and we present our proofs in two appendices. Ap-
pendix A contains all the results related to approximating the finite-sample covariance struc-
ture of the non-stationary time series by the locally stationary limit. In Appendix B, we
show some relevant basic properties of the system of autocorrelation wavelets, and provide
the remaining proofs of the statements made in Section 3 and 4.

2 Locally Stationary Wavelet processes

LSW processes are constructed by replacing the amplitude A(ω) in the Cramér representation
(1.1) with a quantity which depends on time (this ensures that the second-order structure
of the process changes over time), as well as by replacing the Fourier harmonics exp(iωt)

3
with non-decimated discrete wavelets ψjk (t), j = −1, −2, . . ., k ∈ Z. Here, j is the scale
parameter (with j = −1 denoting the finest scale) and k is the location parameter. Note that
unlike decimated wavelets, for which the permitted values of k at scale j are restricted to the
set {c2−j , c ∈ Z}, non-decimated wavelets can be shifted to any location defined by the finest
resolution scale, determined by the observed data (k ∈ Z). As a consequence, non-decimated
wavelets do not constitute bases for `2 but overcomplete sets of vectors. The reader is referred
to Coifman and Donoho (1995) for an introduction to non-decimated wavelets.
By way of example, we recall the simplest discrete non-decimated wavelet system: the
Haar wavelets. They are defined by

ψj0 (t) = 2j/2 I{0,1,...,2−j−1 −1} (t) − 2j/2 I{2−j−1 ,...,2−j −1} (t) for j = −1, −2, . . . and t ∈ Z ,

and ψjk (t) = ψj0 (t − k) for all k ∈ Z, where IA (t) is 1 if t ∈ A and 0 otherwise.
We are now in a position to quote the formal definition of an LSW process from Nason,
von Sachs and Kroisandt (2000).
Definition 1. A sequence of doubly-indexed stochastic processes X t,T (t = 0, . . . , T − 1) with
mean zero is in the class of LSW processes if there exists a mean-square representation
−1
X ∞
X
(2.1) Xt,T = wj,k;T ψjk (t) ξjk ,
j=−J k=−∞

where {ψjk (t)}jk is a discrete non-decimated family of wavelets for j = −1, −2, . . . , −J, based
on a mother wavelet ψ(t) of compact support and J = − min{j : Lj 6 T } = O(log(T )), where
Lj is the length of support of ψj0 (t). Also,

1. ξjk is a random orthonormal increment sequence with Eξjk = 0 and Cov (ξjk , ξ`m ) =
δj` δkm for all j, `, k, m; where δj` = 1 if j = ` and 0 otherwise;

2. For each j 6 −1, there exists a Lipschitz-continuous function Wj (z) on (0, 1) possessing
the following properties:
P−1 2
• j=−∞ |Wj (z)| < ∞ uniformly in z ∈ (0, 1) ;
• there exists a sequence of constants Cj such that for each T

k Cj
(2.2) sup wj,k;T − Wj 6 ;
k=0,...,T −1 T T
P−1
• the constants Cj and the Lipschitz constants Lj are such that j=−∞ Lj (Cj +
Lj Lj ) < ∞.

LSW processes are not uniquely determined by the sequence {wjk;T }. However, Nason et
al. (2000) develop a theory which defines a unique spectrum. This spectrum measures the
power of the process at a particular scale and location. Formally, the evolutionary wavelet
spectrum of an LSW process {Xt,T }t=0,...,T −1 , with respect to ψ, is defined by

(2.3) Sj (z) = |Wj (z)|2 , z ∈ (0, 1)

and is such that, by definition of the process, Sj (z) = limT →∞ |wj,[zT ];T |2 for all z in (0, 1).

4
Remark 1 (Rescaled time). In Definition 1, the functions {Wj (z)}j and {Sj (z)}j are
defined on the interval (0, 1) and not on {0, . . . , T − 1}. Throughout the paper, we refer to
z as the rescaled time. This idea goes back to Dahlhaus (1997), who shows that the time-
rescaling permits an asymptotic theory of statistical inference for a time-varying Fourier
spectrum. The rescaled time is related to the observed time t ∈ {0, . . . , T − 1} by the natural
mapping t = [zT ], which implies that as T → ∞, functions {Wj (z)}j and {Sj (z)}j are
sampled on a finer and finer grid. Due to the rescaled time concept, the estimation of the
wavelet spectrum {Sj (z)}j is a statistical problem analogous to the estimation of a regression
function (see also Dahlhaus (1996a)).

In the classical theory of stationary processes, the spectrum and the autocovariance
function are Fourier transforms of each other. To establish an analogous relationship for
the wavelet spectrum, observe that the autocovariance function of an LSW process can be
written as
cT (z, τ ) = Cov X[zT ],T , X[zT ]+τ,T
for z ∈ (0, 1) and τ in Z, and where [ · ] denotes the integer part of a real number. The next
result shows that this covariance tends to a local covariance as T tends to infinity. Let us
introduce the autocorrelation wavelets as
∞
X
Ψj (τ ) = ψjk (0) ψjk (τ ) , j < 0, τ ∈ Z.
k=−∞

Some useful properties of the system {Ψj }j<0 can be found in Appendix B. By definition,
the local autocovariance function of an LSW process with evolutionary spectrum (2.3) is
given by
−1
X
(2.4) c (z, τ ) = Sj (z)Ψj (τ )
j=−∞

for all τ ∈ Z and z in (0, 1). In particular, the local variance is given by the multiscale
decomposition
−1
X
2
(2.5) σ (z) = c(z, 0) = Sj (z)
j=−∞

as Ψj (0) = 1 for all scales j.

Proposition 1 (Nason et al. (2000)). Under the assumptions of Definition 1, if T → ∞,

then |cT (z, τ ) − c (z, τ )| = O (T −1 ) uniformly in τ ∈ Z and z ∈ (0, 1).

Note that formula (2.4) provides a decomposition of the autocovariance structure of the
process over scales and rescaled-time locations. In practice, it often turns out that spectrum
Sj (z) is only significantly different from zero at a limited number of scales (Fryźlewicz, 2002).
If this is the case, then the local autocovariance function c(z, τ ) has a sparse representation
and can thus be estimated more easily.

Remark 2 (Stationary processes). A stationary process with an absolutely summable

autocovariance function is an LSW process (Nason et al., 2000, Proposition 3). Stationarity

5
is characterised by a wavelet spectrum which is constant over rescaled time: Sj (z) = Sj for
all z ∈ (0, 1).

Remark 3 (Time-modulated processes). Time-modulated (TM) processes constitute a

particularly simple class of non-stationary processes. A TM process Xt,T is defined as

t
(2.6) Xt,T = σ Yt ,
T

where Yt is a zero-mean stationary process with variance one, and the local standard deviation
function σ(z) is Lipschitz continuous on (0, 1) with the Lipschitz constant D. Process Xt,T
is LSW if

• the autocovariance function of Yt is absolutely summable (so that Yt is LSW with a

time-invariant spectrum {SjY }j );

• and if the Lipschitz constants LX Y 1/2

j = D(Sj ) satisfy the requirements of Definition 1.

If these two conditions hold, then the spectrum Sj (z) of Xt,T is given by the formula
Sj (z) = σ 2 (z)SjY . The local autocorrelation function ρ(τ ) = c(z, τ )/c(z, 0) of a TM pro-
cess is independent of z.

However, the real advantage of introducing general LSW processes lies in their ability to
model processes whose both variance and autocorrelation function vary over time. Figure
1 shows simulated examples of LSW processes in which the spectrum is only non-zero at a
limited number of scales. A sample realisation of a TM process is plotted in Figure 1(c),
and Figure 1(d) shows a sample realisation of an LSW process which cannot be modelled as
a TM series.

Figure 1 here

3 The predictor and its theoretical properties

In this section, we define and analyse the general linear predictor for non-stationary data
that are modelled to follow the LSW process representation given in Definition 1.

3.1 Definition of the linear predictor

Given t observations X0,T , X1,T , . . . , Xt−1,T of an LSW process, we define the h-step-ahead
predictor of Xt−1+h,T by
t−1
X (h)
(3.1) X̂t−1+h,T = bt−1−s;T Xs,T ,
s=0

(h)
where the coefficients bt−1−s;T are such that they minimise the Mean Square Prediction Error
(MSPE). The MSPE is defined by
2
MSPE(X̂t−1+h,T , Xt−1+h,T ) = E X̂t−1+h,T − Xt−1+h,T .

6
The predictor (3.1) is a linear combination of doubly-indexed observations where the
weights need to follow the same doubly-indexed framework. This means that as T → ∞,
we augment our knowledge about the local structure of the process, which allows us to
(h)
fit coefficients bt−1−s;T more and more accurately. The double indexing of the weights is
necessary due to the non-stationary nature of the data. This scheme is different to the
traditional filtering of the data Xs,T by a linear filter {bt }. In particular, we do not assume
the (square) summability of the sequence bt because (3.1) is a relation which is written in
rescaled time.
The following assumption holds in the sequel of the paper.
Assumption 1. If h is the prediction horizon and t is the number of observed data, then
we set T = t + h and we assume h = o(T ).
Remark 4 (Prediction domain in the rescaled time). With this assumption, the last
observation of the LSW process is denoted by Xt−1,T = XT −h−1,T , while X̂T −1,T is the last
possible forecast (h steps ahead). Consequently, in the rescaled time (see Remark 1), the
evolutionary wavelet spectrum Sj (z) can only be estimated on the interval

h+1
(3.2) 0, 1 − .
T

The rescaled-time segment

h+1
(3.3) 1− ,1
T

accommodates the predicted values of Sj (z). With Assumption 1, the estimation domain
(3.2) asymptotically tends to [0, 1) while the prediction domain (3.3) shrinks to an empty set
in the rescaled time. Thus, Assumption 1 ensures that asymptotically, we acquire knowledge
of the wavelet spectrum over the full interval [0, 1).

3.2 Prediction in the wavelet domain

There is an interesting link between the above definition of the linear predictor (3.1) and
another, “intuitive” definition of a predictor in the LSW model. For ease of presentation,
let us suppose the forecasting horizon is h = 1, so that T = t + 1. Given observations up
to time t − 1, a natural way of defining a predictor of Xt,T is to mimic the structure of the
LSW model itself by moving to the wavelet domain. The empirical wavelet coefficients are
defined by
t−1
X
djk;T = Xs,T ψjk (s)
s=0

for all j = −1, . . . , −J and k ∈ Z. Then, the one-step-ahead predictor is constructed as

−1 X
X (1)
(3.4) X̂t,T = djk;T ajk;T ψjk (t) ,
j=−J k∈

(1)
where the coefficients ajk have to be estimated and are such that they minimise the MSPE.
This predictor (3.4) may be viewed as a projection of Xt,T on the space of random variables

7
spanned by {dj,k;T |j = −1, . . . , −J and k = 0, . . . , T − 1}.
It turns out that due to the redundancy of the non-orthogonal wavelet system {ψjk (t)},
the predictor (3.4) does not have a unique representation: there exists more than one solution
(1)
{ajk } minimising the MSPE, but each solution gives the same predictor (expressed as a
different linear combination of the redundant functions {ψjk (t)}). One canPeasily verify this
observation by considering, for example, the stationary process Xs = ∞ k=−∞ ψ−1k (s)ζk ,
where ψ−1 is the non-decimated discrete Haar wavelet at scale −1 and ζk is an orthonormal
increment sequence.
It is not surprising that the wavelet predictor (3.4) is related to the linear predictor (3.1)
by
−1 X
X
(1) (1)
bt−s;T = ajk;T ψjk (t) ψjk (s).
j=−J k∈

(1)
Because of the redundancy of the non-decimated wavelet system, for a fixed sequence b t−s;T ,
(1)
there exists more than one sequence ajk;T such that this relation holds. For this reason, we
prefer to work directly with the general linear predictor (3.1), bearing in mind that it can
also be expressed as a (non-unique) projection onto the wavelet domain.

3.3 One-step ahead prediction equations

In this subsection, we consider a forecasting horizon h = 1 (so that T = t + 1) and want
(1)
to minimise the mean square prediction error MSPE(X̂t;T , Xt;T ) with respect to bt−s;T . This
quadratic function may be written as

MSPE(X̂t;T , Xt;T ) = b0t Σt;T bt ,

(1) (1)
where bt is the vector (bt−1;T , . . . , b0;T , −1) and Σt;T is the covariance matrix of X0;T , . . . , Xt;T .
2
However, the matrix Σt;T depends on wjk;T which cannot be estimated, as they are not
identifiable (recall that the representation (2.1) is not unique due to the redundancy of
the system {ψjk }). The next proposition shows that the MSPE may be approximated by
b0t Bt;T bt , where Bt;T is a (t + 1) × (t + 1) matrix whose (m, n)−th element is given by
−1
X
n+m
Sj Ψj (n − m) ,
j=−J
2T

and can be estimated by estimating the (uniquely defined) wavelet spectrum Sj . We first
consider the following assumptions on the evolutionary wavelet spectrum.

Assumption 2. The evolutionary wavelet spectrum is such that

∞
X
(3.5) sup |c(z, τ )| < ∞,
z
τ =0
X
(3.6) C1 := ess inf Sj (z)|ψ̂j (ω)|2 > 0,
z,ω
j<0

P∞
where ψ̂j (ω) = s=−∞ ψj0 (s) exp(iωs).

8
Note that if (3.5) holds, then
X
(3.7) C2 := ess sup Sj (z)|ψ̂j (ω)|2 < ∞.
z,ω
j<0

Assumption (3.5) ensures that for each z, the local covariance c(z, τ ) is absolutely summable,
so the process is short-memory (in fact, Assumption (3.5) is slightly stronger than that, for
technical reasons). Assumption (3.6) and formula (3.7) become more transparent when we
recall that for a stationary
P process Xt with spectral density f (ω) and wavelet spectrum
Sj , we have f (ω) = j Sj |ψ̂j (ω)|2 (the Fourier transform of equation (2.4) for stationary
processes). In this sense, (3.6) and (3.7) are “time-varying” counterparts of the classical
assumptions of the (stationary) spectral density being bounded away from zero, as well as
bounded from above.
Proposition 2. Under Assumptions (3.5) and (3.6), the mean square one-step-ahead pre-
diction error may be written as

(3.8) MSPE(X̂t;T , Xt;T ) = b0t Bt;T bt (1 + oT (1)) .

(1) (1)
Moreover, if {bs;T } are the coefficients which minimise b0t Bt;T bt , then {bs;T } solve the fol-
lowing linear system
t−1
( −1 ) −1
X (1)
X n + m X t+n
(3.9) bt−1−m;T Sj Ψj (m − n) = Sj Ψj (t − n)
m=0 j=−J
2T j=−J
2T

for all n = 0, . . . , t − 1.
The proof of the first result can be found in Appendix A (see Lemma 5) and uses standard
approximations of covariance matrices of locally stationary processes. The second result is
simply the minimisation of the quadratic form (3.8) and the system of equations (3.9) is
called the prediction equations. The key observation here is that minimising b0t Σt;T bt is
asymptotically equivalent to minimising b0t Bt;T bt . Bearing in mind the relation of formula
(2.4) between the wavelet spectrum and the local autocovariance function, the prediction
equations can also be written as
t−1
X
(1) n+m n+t
(3.10) bt−1−m;T c ,m− n =c ,t− n .
m=0
2T 2T

The following two remarks demonstrate how the prediction equations simplify in the case of
two important subclasses of locally stationary wavelet processes.
Remark 5 (Stationary processes). If the underlying process is stationary, then the local
autocovariance function c(z, τ ) is no longer a function of two variables, but only a function
of τ . In this context, the prediction equations (3.10) become
t−1
X (1)
bt−1−m c(m − n) = c(t − n)
m=0

for all n = 0, . . . , t − 1, which are the standard Yule-Walker equations used to forecast

9
stationary processes.

Remark 6 (Time-modulated processes). For the processes considered in Remark 3

(equation (2.6)), the local autocovariance function has a multiplicative structure: c(z, τ ) =
σ 2 (z)ρ(τ ). Therefore, for these processes, prediction equations (3.10) become
t−1
X
(1) n+m n+t
bt−1−m;T σ 2 ρ(m − n) = σ 2
ρ(t − n).
m=0
2T 2T

We will now study the inversion of the system (3.9) in the general case, and the stability
of the inversion. Denote by Pt the matrix of this linear system, i.e.
−1
X
n+m
(Pt )nm = Sj Ψj (m − n)
j=−J
2T

for n, m = 0, . . . , t − 1. Using classical results of numerical analysis (see for instance Kress
(1991, Theorem 5.3)) the measure of this stability is given by the so-called condition number,
which is defined by cond (Pt ) = kPt k kPt−1 k. It can be proved along the lines of Lemma 3
(Appendix A) that, under Assumptions (3.5) and (3.6), cond (Pt ) 6 C1 C2 .

3.4 The prediction error

The next result generalises the classical Kolmogorov formula for the theoretical one-step-
ahead prediction error (Brockwell and Davis, 1991, Theorem 5.8.1). It is a direct modification
of a similar result stated by Dahlhaus (1996b, Theorem 3.2(i)) for locally stationary Fourier
processes.

Proposition 3. Suppose that Assumptions (3.5) and (3.6) hold. Given t observations
X0,T , . . . , Xt−1,T of the LSW process {Xt,T } (with T = t + 1), the one-step ahead mean
2
square prediction error σospe in forecasting X̂t,T is given by
( Z " −1 #)
1 π X t
2
σospe = exp dω ln Sj |ψ̂j (ω)|2 (1 + oT (1)) .
2π −π j=−∞
T
P
Note that due to Assumption (3.6), the sum j Sj (t/T )|ψ̂j (ω)|2 is strictly positive,
except possibly on a set of measure zero.

3.5 h-step-ahead prediction

The one-step-ahead prediction equations have a natural generalisation to the h-step-ahead
prediction problem with h > 1. The mean square prediction error can be written
2
MSPE(X̂t+h−1,T , Xt+h−1,T ) = E X̂t+h−1,T − Xt+h−1,T = b0t+h−1 Σt+h−1;T bt+h−1 ,

where Σt+h−1;T is the covariance matrix of X0,T , . . . , Xt+h−1,T and bt+h−1 is the vector
(h) (h) (h) (h) (h) (h) (h)
(bt−1 , . . . , b0 , b−1 , . . . , b−h ), with b−1 , . . . , b−h+1 = 0 and b−h = −1. Like before, we approx-
imate the mean square error by b0t+h−1 Bt+h−1;T bt+h−1 , where Bt+h−1;T is a (t + h) × (t + h)

10
matrix whose (m, n)-th element is given by
−1
X
n+m
Sj Ψj (n − m) .
j=−J
2T

Proposition 4. Under Assumptions (3.5) and (3.6), the mean square prediction error may
be written as

MSPE(X̂t+h−1;T , Xt+h−1;T ) = b0t+h−1 Bt+h−1;T bt+h−1 (1 + oT (1)) .

4 Prediction based on data

Having treated the prediction problem from a theoretical point of view, we now address the
question of how to estimate the unknown time-varying second order structure in the system
of equations (3.9). In Subsection 4.3, we propose a complete algorithm for forecasting non-
stationary time series using the LSW framework.

4.1 Estimation of the time-varying second-order structure

Our estimator of the local autocovariance function c(z, τ ), with 0 < z < t/T , is constructed
by estimating the unknown wavelet spectrum Sj (z) in the multiscale representation (2.4).
Let us first define the function J(t) = − min{j : Lj 6 t}. Following Nason et al. (2000) we
define the wavelet periodogram as the sequence of squared wavelet coefficients d jk;T , where j
and k are scale and location parameters, respectively:
t−1
!2
X
Ij (k/T ) = d2jk;T = Xs,T ψjk (s) , −J(t) 6 j 6 −1, k = Lj − 1, . . . , t − 1 .
s=0

Note that as ψjk is only nonzero for s = 0, . . . , Lj − 1, the estimator Ij (k/T ) is a function of
Xt,T for t 6 k. At the left edge, we set Ij (k/T ) = Ij ((Lj − 1)/T ) for k = 0, . . . , Lj − 2.
From this definition, we define our multiscale estimator of the local variance function
(2.5) as
−1
X
k k
j
(4.1) c̃ ,0 = 2 Ij .
T j=−J
T

The next proposition concerns the asymptotic behaviour of the first two moments of this
estimator.

Proposition 5. The estimator (4.1) satisfies

k k
E c̃ ,0 = c , 0 + O T −1 log(T ) .
T T

11
If, in addition, the increment process {ξjk } in Definition 1 is Gaussian and (3.5) holds, then
−1
!2
k X X X
Var c̃ ,0 =2 2i+j c(k/T, τ ) ψin (τ )ψjn (0) + O(T −1 ).
T i,j=−J τ n

Remark 7 (Time-modulated processes). For Gaussian time-modulated processes con-

sidered in Remark 3 (formula (2.6)), the variance of estimator (4.1) reduces to
−1
!2
k X X X
(4.2) Var c̃ ,0 = 2σ 4 (k/T ) 2i+j ρ(τ ) ψin (τ )ψjn (0) + O(T −1 ),
T i,j=−J τ n

where ρ(τ ) is the autocorrelation function of Yt (see equation (2.6)). If Xt,T = σ(t/T )Zt ,
where Zt are i.i.d. N (0, 1), then the leading term in (4.2) reduces to (2/3)σ 4 (k/T ) for all
compactly supported wavelets ψ. Other possible estimators of the local variance for time-
modulated processes, as well as an empirical study of the explanatory power of these models
as applied to financial time series, may be found in Van Bellegem and von Sachs (2002).

Remark 8. Proposition 5 can be generalised for the estimation of c(z, τ ) for τ 6= 0. Define
the estimator
−1 −1
!
k X X k
−1
(4.3) c̃ ,τ = Aj` Ψ` (τ ) Ij , k = 0, . . . , t − 1, τ 6= 0,
T j=−J `=−J
T

where the matrix A = (Aj` )j,`<0 is defined by

X
(4.4) Aj` := hΨj , Ψ` i = Ψj (τ ) Ψ` (τ ) .
τ

Note that the matrix Aj` is not simply diagonal due to the redundancy in the system of
autocorrelation wavelets {Ψj }. Nason et al. (2000) proved the invertibility of A if {Ψj } is
constructed using Haar wavelets. If other compactly supported wavelets are used, numerical
results suggest that the invertibility of A still holds, but a complete proof of this result has
not been established yet. Using Lemma 8, it is possible to generalise the proof of Proposition
5 for Haar wavelets to show that

k k
E c̃ ,τ = c , τ + O T −1/2
T T

for τ 6= 0 and, if Assumption (3.5) hold and if the increment process {ξjk } in Definition 1 is
Gaussian, then
( )2
k X−1 X k X
Var c̃ ,τ = 2 hi (τ )hj (τ ) c ,τ ψin (τ )ψjn (0) + O T −1 log2 (T )
T i,j=−J τ
T n

P−1
for τ 6= 0, where hj (τ ) = `=−J A−1
j` Ψ` (τ ).

These results show the inconsistency of the estimator of the local (co)variance, which
needs to be smoothed w.r.t. the rescaled time z (i.e. c̃(·, τ ) needs to be smoothed for all

12
τ ). We use standard kernel smoothing where the problem of the choice of the bandwidth
parameter g arises. The goal of Subsection 4.3 is to provide a fully automatic procedure for
choosing g.
To compute the linear predictor in practice, we invert the generalised Yule-Walker equa-
tions (3.10) in which the theoretical local autocovariance function is replaced by the smoothed
version of c̃(k/T, τ ). However, in equations (4.1) and (4.3), our estimator is only defined for
k = 0, . . . , t − 1 while the prediction equations (3.10) require the local autocovariance up to
k = t (for h = 1). This problem is inherent to our non-stationary framework. We denote the
predictor of c(t/T, τ ) by ĉ(t/T, τ ) and, motivated by the slow evolution of the local autoco-
variance function, propose to compute ĉ(t/T, τ ) by the local smoothing of the (unsmoothed)
estimators {c̃(k/T, τ ), k = t − 1, . . . , t − µ}. In practice, the smoothing parameter µ for
prediction is set to be equal to gT , where g is the smoothing parameter (bandwidth) for
estimation. They can be obtained by the data-driven procedure described in Subsection 4.3.

4.2 Future observations in rescaled time

For clarity of presentation, we restrict ourselves (in this and the following subsection) to the
case h = 1.
In remarks 1 and 4, we recalled the mechanics of rescaled time for non-stationary pro-
cesses. An important ingredient of this concept is that the data come in the form of a trian-
gular array whose rows correspond to different stochastic processes, only linked through the
asymptotic wavelet spectrum sampled on a finer and finer grid. This mechanism is inherently
different to what we observe in practice, where, typically, observations arrive one by one and
neither the values of the “old” observations, nor their corresponding second-order structure,
change when a new observation arrives.
One way to reconcile the practical setup with our theory is to assume that for an observed
process X0 , . . . , Xt−1 , there exists a doubly-indexed LSW process Y such that Xk = Yk,T for
k = 0, . . . , t − 1. When a new observation Xt arrives, the underlying LSW process changes,
i.e. there exists another LSW process Z such that Xk = Zk,T +1 for k = 0, . . . , t. An essential
point underlying our adaptive algorithm of the next subsection is that the spectra of Y and
Z are close to each other, due to the above construction and the regularity assumptions
imposed by Definition 1 (in particular, the Lipschitz continuity of Sj (z)).
The objective of our algorithm is to choose appropriate values of certain nuisance pa-
rameters (see the next subsection) in order to forecast Xt from X0 , . . . , Xt−1 . Assume that
these parameters have been selected well, i.e. that the forecasting has been successful. The
closeness of the two spectra implies that we can also expect to successfully forecast X t+1 from
X0 , . . . , Xt using the same, or possibly “neighbouring”, values of the nuisance parameters.
Bearing in mind the above discussion, we introduce our algorithm with a slight abuse of
notation: we drop the second subscript when referring to the observed time series.

4.3 Data-driven choice of parameters

In theory, the best one-step-ahead linear predictor of Xt,T is given by (3.1), where bt =
(1)
(bt−1−s;T )s=0,...,t−1 solves the prediction equations (3.9). In practice, each of the t components
of the vector bt is estimated using our estimator of the local autocovariance function based
on observations X0,T , . . . , Xt−1,T . Hence, we have to find a balance between the estimation

13
error, potentially increasing with t, and the prediction error which is a decreasing function
of t.
As a natural balancing rule which works well in practice, we suggest to choose a number
p such that the “clipped” predictor
t−1
X
(p) (1)
(4.5) X̂t,T = bt−1−s;T Xs,T
s=t−p

gives a good compromise between the theoretical prediction error and the estimation er-
ror. The construction (4.5) is reminiscent of the classical idea of AR(p) approximation for
stationary processes.
We propose an automatic procedure for selecting the two nusiance parameters: the order
p in (4.5) and the bandwidth g, necessary to smooth the inconsistent estimator c̃(z, τ ) using
a kernel method. The idea of this procedure is to start with some initial values of p and
g and to gradually update these parameters using a criterion which measures how well the
series gets predicted using a given pair of parameters. This type of approach is in the spirit
of adaptive forecasting (Ledolter, 1980).
Suppose that we observe the series up to Xt−1 and want to predict Xt , using an ap-
propriate pair (p, g). The idea of our method is as follows. First, we move backwards by
s observations and choose some initial parameters (p0 , g0 ) for predicting Xt−s from the ob-
served series up to Xt−s−1 . Next, we compute the prediction of Xt−s using the pairs of
parameters around our preselected pair (i.e. (p0 − 1, g0 − δ), (p0 , g0 − δ), . . . , (p0 + 1, g0 + δ)
for a fixed constant δ). As the true value of Xt−s is known, we are able to use a preset
criterion to compare the 9 obtained prediction results, and we choose the pair corresponding
to the best predictor (according to this preset criterion). This step is called the update of
the parameters by predicting Xt−s . In the next step, the updated pair is used as the ini-
tial parameters, and itself updated by predicting Xt−s+1 from X0 , . . . , Xt−s . By applying
this procedure to predict Xt−s+2 , Xt−s+3 , . . . , Xt−1 , we finally obtain an updated pair (p1 , g1 )
which is selected to perform the actual prediction.
Many different criteria can be used to compare the quality of the pairs of parameters at
each step. Denote by X̂t−i (p, g) the predictor of Xt−i computed using pair (p, g), and by
It−i (p, g) the corresponding 95% prediction interval based on the assumption of Gaussianity:
h i
(4.6) It−i (p, g) = −1.96σ̂t−i (p, g) + X̂t−i (p, g) , 1.96σ̂t−i (p, g) + X̂t−i (p, g) ,

2
where σ̂t−i (p, g) is the estimate of MSPE(X̂t−i (p, g), Xt−i ) computed using formula (3.8) with
the remainder neglected. The criterion which we use in the simulations reported in the next
section is to compute
Xt−i − X̂t−i (p, g)
length(It−i (p, g))
for each of the 9 pairs at each step of the procedure and select the updated pair as the one
that minimises this ratio.
We also need to choose the initial parameters (p0 , g0 ) and the number s of data points at
the end of the series which are used in the procedure. We suggest that s should be set to the
length of the largest segment at the end of the series which does not contain any apparent
breakpoints observed after a visual inspection. To avoid dependence on the initial values

14
(p0 , g0 ), we suggest to iterate the algorithm a few times, using (p1 , g1 ) as the initial value for
each iteration. We propose to stop when the parameters (p1 , g1 ) are such that at least 95%
of the observations fall into the prediction intervals.
In order to be able to use our procedure completely on-line, we do not have to repeat the
whole algorithm. Indeed, when observation Xt becomes available, we only have to update
the pair (p1 , g1 ) by predicting Xt , and we directly obtain the “optimal” pair for predicting
Xt+1 .
There are, obviously, many possible variants of our algorithm. Possible modifications
include, for example, using a different criterion, restricting the allowed parameter space for
(p, g), penalising certain regions of the parameter space, or allowing more than one parameter
update at each time point.
We have tested our algorithm on numerous examples, and the following section presents
an application to a real data set. A more theoretical study of this algorithm is left for future
work.

5 Application of the general predictor to real data

El Niño is a disruption of the ocean atmosphere system in the tropical Pacific which has
important consequences for the weather around the globe. Even though the effect of El
Niño is not avoidable, research on its forecast and its impacts allows specialists to attenuate
or prevent its harmful consequences (see Philander (1990) for a detailed overview). The
effect of the equatorial Pacific meridional reheating may be measured by the deviation of the
wind speed on the ocean surface from its average. It is worth mentioning that this effect is
produced by conduction, and thus we expect the wind speed variation to be smooth. This
legitimates the use of LSW processes to model the speed. In this section, we study the wind
speed anomaly index, i.e. its standardised deviation from the mean, in a specific region of the
Pacific (12-2N, 160E-70W). Modelling this anomaly helps to understand the effect of El Niño
effect in that region. The time series composed of T = 910 monthly observations is avail-
able free of charge at http://tao.atmos.washington.edu/data sets/eqpacmeridwindts.
Figure 2(a) shows the plot of the series.

Figure 2 here

Throughout this section, we use Haar wavelets to estimate the local (co)variance. Having
provisionally made a safe assumption of the possible non-stationarity of the data, we first
attempt to find a suitable pair of parameters (p, g) which will be used for forecasting the
series. By inspecting the acf of the series, and by trying different values of the bandwidth,
we have found that the pair (7, 70/T ) works well for many segments of the data; indeed, the
segment of 100 observations from June 1928 to October 1936 gets predicted very accurately in
one-step-ahead prediction: 96% of the actual observations are contained in the corresponding
95% prediction intervals (formula (4.6)).
However, the pair (7, 70/T ) does not appear to be uniformly well suited for forecasting
the whole series. For example, in the segment of 40 observations between November 1986
and February 1990, only 5% of the observations fall into the corresponding one-step-ahead
prediction intervals computed using the above pair of parameters. This provides strong
evidence that the series is non-stationary (indeed, if it was stationary, we could expect to

15
obtain a similar percentage of accurately predicted values in both segments). This further
justifies our approach of modelling and forecasting the series as an LSW process.
Motivated by the above observation, we now apply our algorithm, described in the pre-
vious section, to the segment of 40 observations mentioned above, setting the initial param-
eters to (7, 70/T ). After the first iteration along the segment, the parameters drift up to
(14, 90/T ), and 85% of the observations fall within the prediction intervals, which is indeed
a dramatic improvement over the 5% obtained without applying our adaptive algorithm.
In the second pass, we set the initial values to (14, 90/T ), and obtain a 92.5% coverage
by the one-step-ahead prediction intervals, with the parameters drifting up to (14, 104/T ).
In the last iteration, we finally obtain a 95% coverage, and the parameters get updated to
(14, 114/T ). We now have every reason to believe that this pair of parameters is well suited
for one-step-ahead prediction within a short distance of February 1990. Without performing
any further updates, we apply the one-step-ahead forecasting procedure to predict, one by
one, the eight observations which follow February 1990, the prediction parameters being
fixed at (14, 114/T ). The results are plotted in Figure 2(b), which also compares our results
to those obtained by means of AR modelling. At each time point, the order of the AR
process is chosen as the one that minimises the AIC criterion, and then the parameters are
estimated by means of the standard S-Plus routine. We observe that for both models, all of
the true observed values fall within the corresponding one-step-ahead prediction intervals.
However, the main gain obtained using our procedure is that the prediction intervals are
on average 17.45% narrower in the case of our algorithm. This result is not peculiar to AR
modelling as this percentage is also similar in comparison with other stationary models, like
ARMA(2,10), believed to accurately fit the series. A similar phenomenon has been observed
at several other points of the series.

Figure 3 here

We end this section by applying our general prediction method to compute multi-step-
ahead forecasts. Figure 3 shows the 1- up to 9-step-ahead forecasts of the series, along with
the corresponding prediction intervals, computed at the end of the series (December 1995).
In Figure 3(a), the LSW model is used to construct the forecast values, with parameters
(10, 2.18) chosen automatically by our adaptive algorithm described above. Figure 3(b)
shows the 9-step-ahead prediction based on AR modelling (here, AR(2)). The prediction in
Figure 3(a) looks “smoother” because it uses the information from the whole series. This
information is averaged out, whereas in the LSW forecast, local information is picked up at
the end of the series, and the forecasts look more “jagged”.

6 Conclusion
In this paper, we have given an answer to the pertinent question, asked by time series analysts
over the past few years, of whether and how wavelet methods can help in forecasting non-
stationary time series. To develop the forecasting methodology, we have considered the
Locally Stationary Wavelet (LSW) model, which is based on the idea of a localised time-
scale representation of a time-changing autocovariance function. This model includes the
class of second-order stationary processes and has several attractive features, not only for
modelling, but also for estimation and prediction purposes. Its linearity and the fact that
the time-varying second order quantities are modelled as smooth functions, have enabled

16
us to formally extend the classical theory of linear prediction to the whole class of LSW
processes. These results are a generalisation of the Yule-Walker equations and, in particular,
of Kolmogorov’s formula for the one-step-ahead prediction error.
In the empirical prediction equations the second-order quantities have to be estimated,
and this is where the LSW model proves most useful. The rescaled time, one of the main
ingredients of the model, makes it possible to develop a rigorous estimation theory. Moreover,
by using well-localised non-decimated wavelets instead of a Fourier based approach, our
estimators are able to capture the local time-scale features of the observed non-stationary
data very well (Nason and von Sachs, 1999).
In practice, our new prediction methodology depends on two nuisance parameters which
arise in the estimation of the local covariance and the mean-square prediction error. More
specifically, we need to smooth our inconsistent estimators over time, and to do so, we have to
choose the bandwidth of the smoothing kernel. Moreover, we need to reduce the dimension of
the prediction equations to avoid too much inaccuracy of the resulting prediction coefficients
due to estimation errors. We have proposed an automatic computational procedure for
selecting these two parameters. Our algorithm is in the spirit of adaptive forecasting as it
gradually updates the two parameters basing on the success of prediction. This new method
is not only essential for the success of our whole prediction methodology, it also seems to
be promising in a much wider context of choosing nuisance parameters in non-parametric
methods in general.
We have applied our new algorithm to a meteorological data set. Our non-parametric
forecasting algorithm shows interesting advantages over the classical parametric alternative
(AR forecasting). Moreover, we believe that one of the biggest advantages of our new
algorithm is that it can be successfully applied to a variety of data sets, ranging from financial
log-returns (Fryźlewicz (2002), Van Bellegem and von Sachs (2002)) to series traditionally
modelled as ARMA processes, including in particular data sets which are not, or do not
appear to be, second-order stationary. The S-Plus routines implementing our algorithm, as
well as the data set, can be downloaded from the associated web page

http://www.stats.bris.ac.uk/~mapzf/flsw/flsw.html

In the future, we intend to derive the theoretical properties of our automatic algorithm
for choosing the nuisance parameters of the adaptive predictor. Finally, our approach offers
the attractive possibility to use the prediction error for model selection purposes. LSW
processes are constructed using a fixed wavelet system, e.g. Haar or another Daubechies’
system. It is clear that we can compare the fitting quality of each such model by comparing
its prediction performance on the observed data. In the future, we intend to investigate this
in more detail in order to answer the question, left open by Nason et al. (2000), of which
wavelet basis to use to model a given series.

7 Acknowledgements
The authors would like to thank Rainer Dahlhaus, Christine De Mol, Christian Hafner, Guy
Nason and two anonymous referees for stimulating discussions and suggestions which helped
to improve the presentation of this article. They are also grateful to Peter Brockwell and
Brandon Whitcher for their expertise and advice in analysing the wind data of Section 5.
Sébastien Van Bellegem and Rainer von Sachs would like to express their gratitude to the

17
Department of Mathematics, University of Bristol, and Piotr Fryźlewicz — to the Institut de
statistique, Université catholique de Louvain, and SVB, for their hospitality during mutual
visits in 2001 and 2002. RvS and SVB were funded from the National Fund for Scientific
Research – Wallonie, Belgium (F.N.R.S.), by the contract “Projet d’Actions de Recherche
Concertées” no. 98/03-217 of the Belgian Government and by the IAP research network No.
P5/24 of the Belgian State (Federal Office for Scientific, Technical and Cultural Affairs).
PF was funded by the Department of Mathematics at the University of Bristol, Universities
UK, Unilever Research, and Guy Nason.

A Theoretical properties of the predictor

Let Xt;T = (X0;T , . . . , Xt−1;T )0 be a realisation of an LSW process. In this appendix, we study
0
the theoretical properties of the covariance matrix Σt;T = E(Xt;T Xt;T ). As we need upper
−1
bounds for the spectral norms kΣt;T k and kΣt;T k, we base the following results and their
proofs on methods developed in Dahlhaus (1996b, Section 4) for approximating covariance
matrices of locally stationary Fourier processes. However, in our setting these methods need
important modifications. The idea is to approximate Σt;T by overlapping block Toeplitz
matrices along the diagonal.
The approximating matrix is constructed as follows. First, we construct a coverage of
the time axis [0, T ). Let L be a divisor of T such that L/T → 0, and consider the following
partition of the time axis:
n o
P0 = [0, L), [L, 2L), . . . , [T − L, T ) .

Then, consider another partition of the time axis, which is a shift of P0 by δ < L:
n o
P1 = [0, δ), [δ, L + δ), [L + δ, 2L + δ), . . . , [T − L + δ, T ) .

In what follows, assume that L is a multiple of δ and that δ/L → 0 as T tends to infinity.
Also, consider the partition of the time axis which is a shift of P1 by δ:
n o
P2 = [0, 2δ), [2δ, L + 2δ), [L + 2δ, 2L + 2δ), . . . , [T − L + 2δ, T )

and, analogously, define P3 , P4 , . . . up to PM where M = (L/δ) − 1. Consider the union

of all these partitions P = {P0 , P1 , . . . , PM }, which is a highly redundant coverage of the
time axis. Denote by P the number of intervals in P, and denote the elements of P by Mp ,
p = 1, . . . , P .
For each p, we fix a point νp in Mp and consider matrix D (p) defined by:
X νp
(p)
Dnm = Sj Ψj (n − m)In,m∈Mp
j<0
T

where In,m∈Mp means that we only include those n, m that are in Mp . Observe that each νp
is contained exactly in L/δ segments. The following lemma concerns the approximation of

18
Σt;T by matrix D defined by
P
δ X (p)
Dnm = D .
L p=1 nm

Lemma 1. Assume that (3.5) holds. If L → ∞, δ/L → 0 and L2 /T → 0 as T → ∞, then

x0 (Σt;T − D) x = x0 xoT (1).

(p) (p)
Proof. Define matrix Σt;T by Σt;T = (Σt;T )nm In,m∈Mp . Straightforward calculations
nm
yield
" P
#
δ X (p)
(A.1) x0 (Σt;T − D) x = x0 (Σ − D (p) ) x + RestT
L p=1 t;T

T
X
δ
−1 dδ
X
δ
| RestT | 6 2x x 0
min d , 1 b(k) + Rest0T
L
d=1 k=(d−1)δ+1
 √ 
∞
δ+ LX X
6 2x0 x  b(k) + b(k) + Rest0T
L k=1
√
k> L

and the main term in the above is oT (1) since L → ∞ and δ/L → 0 as T → ∞, and by
assumption (3.5). Let us now turn to the remainder Rest0T . We have

T −1
X X
n + m

0 2
RestT 6 xn xm wjk;T − Sj ψj,k (m)ψj,k (n)
2T
n,m=0 j,k

which may be bounded as follows using the definition of an LSW process, and the Lipschitz
property of Sj :
 2
X X k
X
Rest0T 6 O(T −1 ) (Cj + Lj Lj )  |xn ψj,k (n)|
j k n=k−Lj +1
X
6 O(T −1 )x0 x (Cj + Lj Lj )Lj 6 O(T −1 )x0 x
j

by assumption of the Lemma.

19
Let us finally consider the main term in (A.1). We have
! !2
δ X X 2 ν X
P P
δ X (p) p
x0 Σ − D (p) x6 wjk;T − Sj ψj,k (u)xu Iu∈Mp
L p=1 t;T L p=1 jk T u
P
!
δ XX X X
6 O(T −1 ) x2n In∈Mp Cj + Lj (Lj + L)
L p=1 jk n j
X
(A.2) = O(T −1 )x0 x (Cj + Lj (Lj + L))(Lj + L)
j

where the last equality holds because, by construction, each xn is contained in exactly L/δ
segments of the coverage. Since we assumed that L2 /T → 0 as T → ∞, we obtain the result.

Lemma 2. Assume that (3.5) holds and there exists a t∗ such that xu = 0 for all u 6∈
{t∗ , . . . , t∗ + L}. Then for each t0 ∈ {t∗ , . . . , t∗ + L},
X tX
∗ +L !2
X t0 L2
0 0
(A.3) x Σt,T x = Sj xu ψj,k (u) + x xO .
j
T k u=t∗
T

Proof. Identical to the part of the proof of Lemma 1 leading to the bound for the main term,
i.e. formula (A.2).
In√what follows, the matrix norm kM k denotes the spectral norm of the matrix M , i.e.
max{ λ : λ is the eigenvalue of M 0 M }. If M is symmetric and nonnegative definite, by
standard theory we have
−1
0 −1 0
(A.4) kM k = sup x M x kM k = inf
2
x Mx .
k k22 =1 k k2 =1

Lemma 3. Assume that (3.5) holds. The spectral norm kΣt;T k is bounded in t. Also, if
(3.6) holds, then the spectral norm kΣ−1
t;T k is bounded in t.
Proof. Lemma 1 implies
!2
δ X X νp X X
P
kΣt;T k = sup Sj xn ψj,k−n In∈Mp + oT (1)
k k22 =1 L p=1 j<0 T n k

using Parseval formula, we have

2
P Z X νp 2 X
δ X π
= sup dω Sj ψ̂j (ω) xn exp(−iωn)In∈Mp + oT (1)
2 2πL T
k k2 =1 p=1 −π j<0 n
X 2 X 2

6 ess sup Sj (z) ψ̂j (ω) sup kxk22 + oT (1) = ess sup Sj (z) ψ̂j (ω) + oT (1)
z,ω k k22 =1 z,ω
j j

which is bounded by (3.5) (as (3.5) implies (3.7)). Using (A.4) with M = Σt;T , the bound-
edness of kΣ−1
t;T k is shown in exactly the same way.

20
Proof of Proposition 3. The proof uses Lemmas 1 to 3 and is along the lines of Dahlhaus
(1996b, Theorem 3.2(i)). The idea is to reduce the problem to a stationary situation by
fixing the local time at νp . Then, the key point is to use the following relation between
the wavelet spectrum of a stationary process and its classical Fourier spectrum. If Xt is a
stationary process with an absolutely summable autocovariance and with Fourier spectrum
f (·), then its wavelet spectrum is given by
X Z
−1
(A.5) Sj = Aj` dλ f (λ)|ψ̂` (λ)|2
`

for any fixed non-decimated system of compactly supported wavelets {ψjk }. We refer to
Dahlhaus (1996b, Theorem 3.2(i)) for details.

We will now study the approximation of Σt;T by Bt;T .

Lemma 4. Under the assumptions of Proposition 2 and 4,

MSPE(X̂t+h−1;T , Xt+h−1;T ) = b0t+h−1 Bt+h−1;T bt+h−1 + b0t+h−1 bt+h−1 oT (1)

and, in particular,
MSPE(X̂t;T , Xt;T ) = b0t Bt;T bt + b0t bt oT (1)

Proof. By the definition of an LSW process, we have |wjk;T |2 = Sj ((n + m)/T ) + (Cj + Lj |k −
n − m|)O(T −1 ). Therefore,

X t+h−1
X
b0t+h−1 Σt+h−1;T bt+h−1 = bn bm ψjk (n)ψjk (m)|wjk;T |2
jk n,m=0

X t+h−1
X
n+m
(A.6) = bn bm Ψj (n − m)Sj + Rest1
jk n,m=0
2T

We bound Rest1 as follows:

X
X t+h−1
n + m

−1
| Rest1 | 6 O(T ) k − Lj + Cj |bn bm ψjk (n)ψjk (m)|.
jk n,m=0
2

If Lj denotes the length of support of ψj , we have 0 6 k−n, k−m 6 Lj and so k−(n+m)/2 6

Lj such that

X t+h−1
X
−1
| Rest1 | 6 O(T ) (Lj Lj + Cj ) |bn bm ψjk (n)ψjk (m)|
jk n,m=0
X
6 O(T −1 )b0t+h−1 bt+h−1 Lj (Lj Lj + Cj ) = b0t+h−1 bt+h−1 oT (1) by assumption.
j

Finally, by Assumption (3.5), (A.6) yields the result.

21
Lemma 5. Under the assumptions of Proposition 4, we have

b0t+h−1 Σt+h−1;T bt+h−1 = b0t+h−1 Bt+h−1;T bt+h−1 (1 + oT (1))

Proof of Lemma 5. By Lemma 4, we have b0t+h−1 Σt+h−1;T bt+h−1 = b0t+h−1 Bt+h−1;T bt+h−1 +
b0t+h−1 bt+h−1 oT (1) By Lemma 3, the inverse of Σt;T is bounded in T and, by standard
properties of the spectral norm, we have

b0t+h−1 bt+h−1 6 b0t+h−1 Σt+h−1;T bt+h−1 kΣ−1

t+h−1;T k

for all sequences bt+h−1 . The above gives

b0t+h−1 Σt+h−1;T bt+h−1 6 b0t+h−1 Bt+h−1;T bt+h−1 + b0t+h−1 Σt+h−1;T bt+h−1 kΣ−1
t+h−1;T k oT (1)

which is equivalent to
−1
b0t+h−1 Σt+h−1;T bt+h−1 6 b0t+h−1 Bt+h−1;T bt+h−1 1 − kΣ−1
t+h−1;T k o T (1)

for large T . On the other hand, we have

−1
b0t+h−1 Σt+h−1;T bt+h−1 > b0t+h−1 Bt+h−1;T bt+h−1 1 + kΣ−1
t;T k o T (1)

which implies the result.

B Estimation of the local autocovariance function

In this section, we study the properties of the estimator of the local autocovariance. We
first show some relevant properties of the autocorrelation function Ψj (τ ) and the matrix A
defined in (4.4).

Lemma 6. 1. The system {Ψj (τ ), j = −1, −2, . . .} is linearly independent.

2. Denote by Ψ(τ ) the wavelet autocorrelation function of a continuous wavelet ψ, i.e.

Z
Ψ(τ ) = duψ(u)ψ(u − τ ), τ ∈ Z.

We have
Ψj (τ ) = Ψ 2j |τ |
for all j = −1, −2, . . . and τ ∈ Z.

The proof of the first result can be found in Nason et al. (2000, Theorem 1). For the
proof of the second result, see, for example, Berkner and Wells (2002, Lemma 4.2).
P
Lemma 7. −1 j
j=−∞ 2 Ψj (τ ) = δ0 (τ ).

22
Proof. Using Lemma 6 and Parseval’s formula,
−1
X −1
X −1 Z
X ∞
j j j

2 Ψj (τ ) = 2 Ψ 2 |τ | = dω |ψ̂(2−j ω)|2 exp(iωτ )
j=−∞ j=−∞ j=−∞ −∞

−1 Z X
X 2π 2
(B.1) = dω ψ̂ 2−j (ω + 2kπ) exp(iωτ ).
j=−∞ 0 k∈

Denote by m0 (ξ) the trigonometric polynomial which corresponds to the construction of

wavelet ψ and its corresponding scaling function φ (Daubechies, 1992, Theorem 6.3.6). We
may write
X 2 X 2 2
b −j
ψ 2 (ω + 2kπ) = m0 2 −j−1
ω+2 −j−1
k2π + π φ 2 b −j−1
ω+2 −j−1
k2π
k∈ k∈

and, using the 2πk-periodicity of m0 ,

2 X 2
= m0 2−j−1 ω + π φb 2−j−1 ω + 2−j−1 k2π
k∈
2 X 2
= m0 2−j−1 ω + π m0 2−j−2 ω + 2−j−2 k2π 2 φb 2−j−2 ω + 2−j−2 k2π
k∈
2 2 X 2
= m0 2−j−1 ω + π m0 2−j−2 ω φb 2−j−2 ω + 2−j−2 k2π .
k∈

By similar transformations, we finally arrive at

−j
2 Y X 2

= m0 2 −j−1
ω+π m0 2−j−n ω 2 φb (ω + k2π)
n=2 k∈
−j
2 Y
= (2π)−1 m0 2−j−1 ω + π m0 2−j−n ω 2
n=2

−j−2
Y
= (2π) −1 1 − m0 2−j−1 ω 2 m0 2 ` ω 2 .
`=0

Using (B.1), we obtain

Z
−1
X 2π −1
X 2 −j−2
Y
j
2 Ψj (τ ) = (2π) −1
dω exp(iτ ω) 1 − m0 2−j−1 ω m0 2 ` ω 2 .
j=−∞ 0 j=−∞ `=0

23
Expanding the telescopic sum over j, we get
−1
X −j−2
Y −j−1
Y
1 − m0 2−j−1 ω 2 m0 2l ω 2 = 1 − lim m0 2 l ω 2
j→−∞
j=−∞ l=0 l=0
+∞
Y
=1− m0 2 l ω 2 .
l=0

Thus, we obtain
−1 Z +∞
( )
X 1 2π Y
2j Ψj (τ ) = dω exp (iτ ω) 1 − m0 (2` ω)2
j=−∞
2π 0 `=0
Z 2π +∞
Y
1
(B.2) = δ0 (τ ) − dω exp (iτ ω) m0 (2` ω)2 .
2π 0 `=0

Now, it remains to prove that the second term in (B.2) is equal to zero. By definition,
P2N
−1/2 −1 −inω
m0 (ω) = 2 n=0 hn e , where {hk }k∈ is the low pass quadrature mirror filter used in
the construction of Daubechies’ compactly supported continuous time wavelet ψ (Daubechies,
1992, Section 6.4). We have
Z 2π L
Y L
Y 2N
X −1
1
dω exp (iτ ω) m0 (2` ω)2 = 2−` hn hm δ0 (n − m)
2π 0 `=0 `=0 n,m=0

which clearly tends to 0 as L tends to infinity.

Lemma 8. Matrix A defined in (4.4) has the following properties:

−1
X
(B.3) 2j Aj` = 1.
j=−∞

If, in addition, A is constructed using Haar wavelets, then

−1
X
(B.4) |A−1
j` | 6 C · 2
j/2

`=−∞
−1
X
(B.5) A−1
j` = 2
j

`=−∞

for all j < 0, where C is a constant.

Proof. (B.3) is a straightforward corollary of Lemma 7. To prove (B.4), we introduce the

auxiliary matrix Γ = D 0 AD, where D = diag(2j/2 )j<0 is diagonal, i.e. Γj` = 2j/2 Aj` 2`/2 .
Nason et al. (2000, Theorem 2) show that P−1the spectral Pnorm of Γ−1 is bounded for Haar
−1
wavelets. Therefore, we obtain (B.4) as `=−∞ |A−1 j` | = `=−∞ 2
j/2 `/2 −1
2 |Γj` | 6 C · 2j/2 . To
prove (B.5), observe that if Xt,T is a white noise, then its classical FourierPspectrum is f (λ) =
(2π)−1 . On the other hand, white noise is an LSW process such that j Sj Ψj (τ ) = δ0 (τ )

24
which implies that Sj = 2j (Lemma 7). (B.5) then follows from the following property: If Xt
is the wavelet spectrum of a stationary process with absolute summable
P autocovariance
R and
−1
with Fourier spectrum f , then its wavelet spectrum is given by Sj = ` Aj` dλf (λ)|ψ` (λ)|2
R
and, moreover, dλ|ψ̂` (λ)|2 = 2π.

Proof of Proposition 5. We will first show

(B.6) !
X X X X
cov Xs,T ψi,k (s), Xs,T ψj,k (s) = c(k/T, τ ) ψi,n (τ )ψj,n (0) + O(2−(i+j)/2 T −1 ).
s s τ n

We have
!
X X
cov Xs,T ψi,k (s), Xs,T ψj,k (s) =
s s
X
k

Cl + Ll (u − k)
X
Sl +O ψl,s (u)ψj,k (s)ψl,t (u)ψi,k (t).
l,u
T T s,t

Using Lj = O(M 2−j ) in the first step, and the Cauchy inequality in the second one, we
bound the reminder as follows:

X C + L (u − k) X
l l
O ψl,s (u)ψj,k (s)ψl,t (u)ψi,k (t) ≤
T
l,u s,t

X Cl + M Ll (2−l + min(2−i , 2−j )) X X

ψ l,s (u)ψ j,k (s)ψ l,t (u)ψ i,k (t) ≤
T u
s,t

l
X Cl + M Ll (2−l + 2−i/2 2−j/2 )
(Alj )1/2 (Ali )1/2 =
l
T
( )
2 −(i+j)/2 X X
(Cl + M Ll 2−l )2(i+j)/2 (Alj )1/2 (Ali )1/2 + M Ll (Alj )1/2 (Ali )1/2 =
T l l
−(i+j)/2
2
{I + II}.
T
By formula (B.3),
X X X
I≤ (Cl + M Ll 2−l )(2i Ali + 2j Alj ) ≤ (Cl + M Ll 2−l )2 2i Ali ≤ D1 .
l l i
P P
As i Li 2−i < ∞, we must have Li ≤ C2i so i Li Aij ≤ C again by (B.3). This and the
Cauchy inequality give
!1/2 !1/2
X X
II ≤ 2M Ll Ali Ll Alj ≤ D2 .
l l

The bound for the reminder is therefore O(2−(i+j)/2 T −1 ). For the main term, straightforward

25
computation gives
X kX X X
Sl ψl,s (u)ψj,k (s)ψl,t (u)ψi,k (t) = c(k/T, τ ) ψi,n (τ )ψj,n (0),
l,u
T s,t τ n

which yields formula (B.6). Using Lemma 7 and (B.6) with i = j, we obtain
−1
( )
X X
j −j
E(c̃(k/T, 0)) = 2 c(k/T, τ )Ψj τ + O(2 /T )
j=−J τ
X
= c(k/T, τ )δ0 (τ ) + O(log(T )/T ) = c(k/T, 0) + O(log(T )/T ),
τ

which proves the expectation. For the variance, observe that, using Gaussianity, we have
!2
k k X X
cov Ii , Ij = 2 c(k/T, τ ) ψi,n (τ )ψj,n (0) + O(2−(i+j)/2 T −1 )
T T τ n
!2
X X
(B.7) = 2 c(k/T, τ ) ψi,n (τ )ψj,n (0) + O(2−(i+j)/2 T −1 ),
τ n

provided that (3.5) holds. Using (B.7), we finally obtain

−1
!2
X X X
(B.8) Var(c̃(k/T, 0)) = 2 2i+j c(k/T, τ ) ψi,n (τ )ψj,n (0) + O(T −1 ).
i,j=−J τ n

References
Antoniadis, A. and Sapatinas, T. (2002). Wavelet methods for continuous-time prediction using
representations of autoregressive processes in Hilbert spaces. J. Multivariate Anal. (Under
revision)
Berkner, K. and Wells, R. (2002). Smoothness estimates for soft-threshold denoising via translation-
invariant wavelet transforms. Appl. Comput. Harmon. Anal., 12, 1–24.
Brockwell, P. J. and Davis, R. A. (1991). Time series: Theory and methods (Second ed.). Springer,
New York.
Calvet, L. and Fisher, A. (2001). Forecasting multifractal volatility. J. Econometrics, 105, 27–58.
Coifman, R. and Donoho, D. (1995). Time-invariant de-noising. In A. Antoniadis and G. Oppen-
heim (Eds.), Wavelets and Statistics (Vol. 103, pp. 125–150). New York: Springer-Verlag.
Dahlhaus, R. (1996a). Asymptotic statistical inference for nonstationary processes with evolu-
tionary spectra. In P. Robinson and M. Rosenblatt (Eds.), Athens conference on applied
probability and time series analysis (Vol. 2). Springer, New York.
Dahlhaus, R. (1996b). On the Kullback-Leibler information divergence of locally stationary pro-
cesses. Stochastic Process. Appl., 62, 139–168.

26
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist., 25, 1–37.

Dahlhaus, R., Neumann, M. H. and von Sachs, R. (1999). Non-linear wavelet estimation of time-
varying autoregressive processes. Bernoulli, 5, 873–906.

Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.

Fryźlewicz, P. (2002). Modelling and forecasting financial log-returns as locally stationary wavelet
processes (Research Report). Department of Mathematics, University of Bristol. (http:
//www.stats.bris.ac.uk/pub/ResRept/2002.html)

Grillenzoni, C. (2000). Time-varying parameters prediction. Ann. Inst. Statist. Math., 52, 108–122.

Kress, R. (1991). Numerical analysis. New York: Springer.

Ledolter, J. (1980). Recursive estimation and adaptive forecasting in ARIMA models with time
varying coefficients. In Applied Time Series Analysis, II (Tulsa, Okla.) (pp. 449–471). New
York-London: Academic Press.

Mallat, S., Papanicolaou, G. and Zhang, Z. (1998). Adaptive covariance estimation of locally
stationary processes. Ann. Statist., 26, 1–47.

Mélard, G. and Herteleer-De Schutter, A. (1989). Contributions to the evolutionary spectral theory.
J. Time Ser. Anal., 10, 41–63.

Nason, G. P. and von Sachs, R. (1999). Wavelets in time series analysis. Phil. Trans. Roy. Soc.
Lond. A, 357, 2511–2526.

Nason, G. P., von Sachs, R. and Kroisandt, G. (2000). Wavelet processes and adaptive estimation
of evolutionary wavelet spectra. J. Roy. Statist. Soc. Ser. B, 62, 271–292.

Ombao, H., Raz, J., von Sachs, R. and Guo, W. (2002). The SLEX model of a non-stationary
random process. Ann. Inst. Statist. Math., 54, 171–200.

Ombao, H., Raz, J., von Sachs, R. and Malow, B. (2001). Automatic statistical analysis of bivariate
nonstationary time series. J. Amer. Statist. Assoc., 96, 543–560.

Philander, S. (1990). El Niño, La Niña and the southern oscillation. San Diego: Academic Press.

Priestley, M. (1965). Evolutionary spectra and non-stationary processes. J. Roy. Statist. Soc. Ser.
B, 27, 204–237.

Van Bellegem, S. and von Sachs, R. (2002). Forecasting economic time series using models of
nonstationarity (Discussion paper No. 0227). Institut de statistique, UCL. (ftp://www.
stat.ucl.ac.be/pub/papers/dp/dp02/dp0227.ps)

27
List of Figures

1 These simulated examples demonstrate the idea of a sparse representation of

the local (co)variance. The left-hand column shows an example of a smooth
time-varying variance function of a TM process. The example on the right
hand side is constructed in such a way that the local variance function c(z, 0)
is constant over time. In this example, the only deviation from stationarity is

in the covariance structure. The simulations, like all throughout the article,
use Gaussian innovations ξjk and Haar wavelets. . . . . . . . . . . . . . . . . 29
(a) Theoretical wavelet spectrum equal to zero everywhere except scale −2
where S−2 (z) = 0.1 + cos2 (3πz + 0.25π). . . . . . . . . . . . . . . . . . 29

(b) Theoretical wavelet spectrum S−2 (z) = 0.1+cos2 (3πz+0.25π), S−1 (z) =
0.1 + sin2 (3πz + 0.25π) and Sj (z) = 0 for j 6= −1, −2. . . . . . . . . . . 29
(c) A sample path of length 1024 simulated from the wavelet spectrum
defined in (a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

(d) A sample path of length 1024 simulated from the wavelet spectrum
defined in (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 The wind anomaly data (910 observations from March 1920 to December 1995). 30
(a) The wind anomaly index (in cm/s). The two vertical lines indicate the
segment shown in Figure 2(b). . . . . . . . . . . . . . . . . . . . . . . . 30

(b) Comparison between the one-step-ahead prediction in our model (dashed

lines) and AR (dotted lines). . . . . . . . . . . . . . . . . . . . . . . . . 30
3 The last observations of the wind anomaly series and its 1- up to 9-step-ahead
forecasts (in cm/s). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

(a) 9-step-ahead prediction using LSW modelling . . . . . . . . . . . . . . 31

(b) 9-step-ahead prediction using AR modelling . . . . . . . . . . . . . . . 31

28
Figures

9
9

8
8

7
7

6
6

Scales

5
Scales

4
4

3
3

2
2

1
1

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Rescaled Time
Rescaled Time

(a) Theoretical wavelet spectrum equal to zero (b) Theoretical wavelet spectrum S−2 (z) = 0.1 +
everywhere except scale −2 where S−2 (z) = 0.1+ cos2 (3πz + 0.25π), S−1 (z) = 0.1 + sin2 (3πz +
cos2 (3πz + 0.25π). 0.25π) and Sj (z) = 0 for j 6= −1, −2.
2

2
0

0
-2

-2
-4

-4

0 200 400 600 800 1000 0 200 400 600 800 1000

(c) A sample path of length 1024 simulated from (d) A sample path of length 1024 simulated from
the wavelet spectrum defined in (a). the wavelet spectrum defined in (b).

Figure 1: These simulated examples demonstrate the idea of a sparse representation of

the local (co)variance. The left-hand column shows an example of a smooth time-varying
variance function of a TM process. The example on the right hand side is constructed in
such a way that the local variance function c(z, 0) is constant over time. In this example,
the only deviation from stationarity is in the covariance structure. The simulations, like all
throughout the article, use Gaussian innovations ξjk and Haar wavelets.

29
100

50
0
0

-50
-100

-100
-200

1920 1930 1940 1950 1960 1970 1980 1990 1987 1988 1989 1990

(a) The wind anomaly index (in cm/s). The two (b) Comparison between the one-step-ahead pre-
vertical lines indicate the segment shown in Fig- diction in our model (dashed lines) and AR (dot-
ure 2(b). ted lines).

Figure 2: The wind anomaly data (910 observations from March 1920 to December 1995).

30
100
100

50
50

0
0

-50
-50

-100
-100

-150
-150

1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997

(a) 9-step-ahead prediction using LSW modelling (b) 9-step-ahead prediction using AR modelling

Figure 3: The last observations of the wind anomaly series and its 1- up to 9-step-ahead
forecasts (in cm/s).

Mlfinlab Release Hudson & Thames
100% (1)
Mlfinlab Release Hudson & Thames
74 pages
Coelum Stellatum Christianum
No ratings yet
Coelum Stellatum Christianum
31 pages
Time Series Analysis With R
No ratings yet
Time Series Analysis With R
6 pages
Statistical Arbitrage in High Frequency Trading Based On Limit Order Book Dynamics
No ratings yet
Statistical Arbitrage in High Frequency Trading Based On Limit Order Book Dynamics
26 pages
Markov Switching Model Tool
No ratings yet
Markov Switching Model Tool
39 pages
Building A Simple Backtester - Quantitative Endeavor (1) .
No ratings yet
Building A Simple Backtester - Quantitative Endeavor (1) .
5 pages
High Frequency Trading in A Limit Order Book
No ratings yet
High Frequency Trading in A Limit Order Book
8 pages
Krolzig Markov-Switching Vector Autoregressions - Modelling, Statistical Inference, and Application To Business Cycle A
No ratings yet
Krolzig Markov-Switching Vector Autoregressions - Modelling, Statistical Inference, and Application To Business Cycle A
375 pages
An Automated FX Trading System Using Adaptive Reinforcement Learning
No ratings yet
An Automated FX Trading System Using Adaptive Reinforcement Learning
10 pages
A Q-Learning Agent For Automated Trading in Equity Stock Markets
No ratings yet
A Q-Learning Agent For Automated Trading in Equity Stock Markets
12 pages
Deep Robust Reinforcement Learning For Practical Algorithmic Trading
No ratings yet
Deep Robust Reinforcement Learning For Practical Algorithmic Trading
9 pages
Market Microstructure: Information-Based Models
No ratings yet
Market Microstructure: Information-Based Models
8 pages
Algorithms - Hidden Markov Models
No ratings yet
Algorithms - Hidden Markov Models
7 pages
Automatic Extraction and Identification of Chart Patterns Towards Financial Forecast
No ratings yet
Automatic Extraction and Identification of Chart Patterns Towards Financial Forecast
12 pages
Microstructure Tutorial PDF
No ratings yet
Microstructure Tutorial PDF
38 pages
Importing Data From Tick Data Into 3rd Party Software
No ratings yet
Importing Data From Tick Data Into 3rd Party Software
32 pages
Ensemble Average and Time Average
No ratings yet
Ensemble Average and Time Average
31 pages
MS&E448: Statistical Arbitrage: Group 5: Carolyn Soo, Zhengyi Lian, Jiayu Lou, Hang Yang
No ratings yet
MS&E448: Statistical Arbitrage: Group 5: Carolyn Soo, Zhengyi Lian, Jiayu Lou, Hang Yang
31 pages
Trading Based On Classification and Regression Trees
No ratings yet
Trading Based On Classification and Regression Trees
64 pages
Test of 20 Different MA Filters For Smoothness and Responsiveness
No ratings yet
Test of 20 Different MA Filters For Smoothness and Responsiveness
12 pages
On Machine Learning Based Cryptocurrency Trading
No ratings yet
On Machine Learning Based Cryptocurrency Trading
121 pages
Quant Trading Notes
100% (1)
Quant Trading Notes
20 pages
Deep Reinforcement Learning in High Frequency Trad
No ratings yet
Deep Reinforcement Learning in High Frequency Trad
6 pages
Pair Trade For Matlab
No ratings yet
Pair Trade For Matlab
4 pages
A Q-Learning Agent For Automated Trading in Equity Stock Markets - Anna's Archive
No ratings yet
A Q-Learning Agent For Automated Trading in Equity Stock Markets - Anna's Archive
34 pages
Implementing A Pairs Trading Strategy in Python - A Step-by-Step Guide - by The Python Lab - Medium
No ratings yet
Implementing A Pairs Trading Strategy in Python - A Step-by-Step Guide - by The Python Lab - Medium
23 pages
Applying Machine Learning To Pairs Trading - Illya Barziy
No ratings yet
Applying Machine Learning To Pairs Trading - Illya Barziy
36 pages
Py Fi
No ratings yet
Py Fi
17 pages
Quantitative Trading As A Mathematical Science: Quantcon Singapore 2016
No ratings yet
Quantitative Trading As A Mathematical Science: Quantcon Singapore 2016
54 pages
High Frequency Trading Final Pres Slides
No ratings yet
High Frequency Trading Final Pres Slides
43 pages
Volatility Surface Interpolation
No ratings yet
Volatility Surface Interpolation
24 pages
BNP Paribas Dupire Arbitrage Pricing With Stochastic Volatility
100% (1)
BNP Paribas Dupire Arbitrage Pricing With Stochastic Volatility
18 pages
A Novel Deep Learning Framework: Prediction and Analysis of Financial Time Series Using CEEMD and LSTM
No ratings yet
A Novel Deep Learning Framework: Prediction and Analysis of Financial Time Series Using CEEMD and LSTM
21 pages
02 A New Hybrid Cryptocurrency Returns Forecasting Method Based On Multiscale Decomposition and An Optimized Extreme Learning Machine Using The Sparro
No ratings yet
02 A New Hybrid Cryptocurrency Returns Forecasting Method Based On Multiscale Decomposition and An Optimized Extreme Learning Machine Using The Sparro
15 pages
A Systematic Review For Transformer-Based Long-Term Series Forecasting
No ratings yet
A Systematic Review For Transformer-Based Long-Term Series Forecasting
30 pages
TFG Luis Gonzalez Corujo PDF
No ratings yet
TFG Luis Gonzalez Corujo PDF
180 pages
FoRex Trading Using Supervised Machine Learning PDF
No ratings yet
FoRex Trading Using Supervised Machine Learning PDF
5 pages
Bates 1996
No ratings yet
Bates 1996
40 pages
(Whalley, Wilmott) An Asymptotic Analysis of An Optimal Hedging Model For Option Pricing With Transaction Costs (Jul1997)
100% (2)
(Whalley, Wilmott) An Asymptotic Analysis of An Optimal Hedging Model For Option Pricing With Transaction Costs (Jul1997)
18 pages
Algo Trader Documentation
No ratings yet
Algo Trader Documentation
48 pages
Genetic Trading
No ratings yet
Genetic Trading
22 pages
The Value of Queue Position in A Limit Order Book
100% (1)
The Value of Queue Position in A Limit Order Book
48 pages
A Machine Learning Framework For An Algorithmic Trading System PDF
No ratings yet
A Machine Learning Framework For An Algorithmic Trading System PDF
11 pages
Time Series Analysis and Spectral Analysis
No ratings yet
Time Series Analysis and Spectral Analysis
11 pages
Q For Gods Whitepaper Series (Edition 17) : Temporal Data: A KDB+ Framework For Corporate Actions
No ratings yet
Q For Gods Whitepaper Series (Edition 17) : Temporal Data: A KDB+ Framework For Corporate Actions
20 pages
A Reinforcement Learning Extension To The Almgren-Chriss Framework For Optimal Trade Execution
No ratings yet
A Reinforcement Learning Extension To The Almgren-Chriss Framework For Optimal Trade Execution
8 pages
A Mean Reverting Processes
No ratings yet
A Mean Reverting Processes
5 pages
Deep Reinforcement Learning For Automated Stock Trading - An Ensemble Strategy
No ratings yet
Deep Reinforcement Learning For Automated Stock Trading - An Ensemble Strategy
9 pages
Micro Structure Tutorial
No ratings yet
Micro Structure Tutorial
38 pages
Mit Thesis HFT
No ratings yet
Mit Thesis HFT
59 pages
A CNN-LSTM-based Model To Forecast Stock Prices
No ratings yet
A CNN-LSTM-based Model To Forecast Stock Prices
10 pages
Event-Driven LSTM For Forex Price Prediction
No ratings yet
Event-Driven LSTM For Forex Price Prediction
7 pages
A Generative Model of A Limit Order Book Using Recurrent Neural Networks
No ratings yet
A Generative Model of A Limit Order Book Using Recurrent Neural Networks
29 pages
Principles of Quantitative Development
From Everand
Principles of Quantitative Development
Manoj Thulasidas
No ratings yet
Exercises of Stochastic Processes
From Everand
Exercises of Stochastic Processes
Simone Malacrida
No ratings yet
TradeStation EasyLanguage for Algorithmic Trading: Discover real-world institutional applications of Equities, Futures, and Forex markets
From Everand
TradeStation EasyLanguage for Algorithmic Trading: Discover real-world institutional applications of Equities, Futures, and Forex markets
Domenico D'Errico
No ratings yet
The High Frequency Game Changer: How Automated Trading Strategies Have Revolutionized the Markets
From Everand
The High Frequency Game Changer: How Automated Trading Strategies Have Revolutionized the Markets
Paul Zubulake
No ratings yet
Advanced Dynamic-System Simulation: Model Replication and Monte Carlo Studies
From Everand
Advanced Dynamic-System Simulation: Model Replication and Monte Carlo Studies
Granino A. Korn
No ratings yet
Mastering AI-Powered Trading Bots for Options
From Everand
Mastering AI-Powered Trading Bots for Options
Jeffery William Long
No ratings yet
Modelling Single-name and Multi-name Credit Derivatives
From Everand
Modelling Single-name and Multi-name Credit Derivatives
Dominic O'Kane
No ratings yet
Getting Started in Creating Your Own Forex Robots
From Everand
Getting Started in Creating Your Own Forex Robots
M.N Thorne
1/5 (2)
Physics Experiment-05 (Concave Mirror)
No ratings yet
Physics Experiment-05 (Concave Mirror)
2 pages
Ebcs-2 12
No ratings yet
Ebcs-2 12
18 pages
Course File - HMT
No ratings yet
Course File - HMT
14 pages
Rating Curve
No ratings yet
Rating Curve
7 pages
FMC TSJ RG 001
No ratings yet
FMC TSJ RG 001
9 pages
Uses of Concave and Convex Mirrors Class 10
No ratings yet
Uses of Concave and Convex Mirrors Class 10
2 pages
Cycolac MG 47
No ratings yet
Cycolac MG 47
3 pages
Grade 7 Physics Summer Task
No ratings yet
Grade 7 Physics Summer Task
6 pages
Aging Formula For Lithium Ion Batteries With Solid Electrolyte Interphase Layer Growth
No ratings yet
Aging Formula For Lithium Ion Batteries With Solid Electrolyte Interphase Layer Growth
9 pages
CET WS - 01 (Electric Charge and Fields, Electrostatic Potential and Capacitance)
No ratings yet
CET WS - 01 (Electric Charge and Fields, Electrostatic Potential and Capacitance)
9 pages
Grade 11 Chem Atomic Structure 3
No ratings yet
Grade 11 Chem Atomic Structure 3
68 pages
Pressure Vessel and Brakes Take Home Problems
No ratings yet
Pressure Vessel and Brakes Take Home Problems
4 pages
USP Chapter 788 Particulate Matter in Injection
No ratings yet
USP Chapter 788 Particulate Matter in Injection
4 pages
MDA & FDA (Clubbed) 2023
No ratings yet
MDA & FDA (Clubbed) 2023
5 pages
PHYS 813: Statistical Mechanics, Assignment 1: X y y X
No ratings yet
PHYS 813: Statistical Mechanics, Assignment 1: X y y X
3 pages
Composite Materials: Presented By: 2018-ME-335 2018-ME-336 2018-ME-337 2018-ME-338 2018-ME-339
100% (1)
Composite Materials: Presented By: 2018-ME-335 2018-ME-336 2018-ME-337 2018-ME-338 2018-ME-339
18 pages
Corrosión y Preservación de La Infraestructura Industrial
No ratings yet
Corrosión y Preservación de La Infraestructura Industrial
290 pages
Appendix G2 - Ii Design Criteria
No ratings yet
Appendix G2 - Ii Design Criteria
48 pages
11 - Semiconductor Solution
No ratings yet
11 - Semiconductor Solution
4 pages
Padhle 10th - Human Eye & The Colourful World + Integrated PYQs
No ratings yet
Padhle 10th - Human Eye & The Colourful World + Integrated PYQs
18 pages
Project Report Specimen
No ratings yet
Project Report Specimen
5 pages
SBT Mechanics TH 1 PDF Free
100% (1)
SBT Mechanics TH 1 PDF Free
535 pages
Syllabus GDCE
No ratings yet
Syllabus GDCE
10 pages
JEE MAIN 2020 (03.09.2020 - 1st Shift)
No ratings yet
JEE MAIN 2020 (03.09.2020 - 1st Shift)
11 pages
Unit 3 - Steam and Gas Turbines
No ratings yet
Unit 3 - Steam and Gas Turbines
94 pages
Xi WS 2 Math's Jul 23
No ratings yet
Xi WS 2 Math's Jul 23
3 pages
Dilday 2010 ApJ 713 1026
No ratings yet
Dilday 2010 ApJ 713 1026
11 pages
SMA 2100 Discrete Mathematics
No ratings yet
SMA 2100 Discrete Mathematics
2 pages
The Edrizzi System: User Manual March 2020/4
No ratings yet
The Edrizzi System: User Manual March 2020/4
58 pages

Forecasting Non-Stationary Time Series by Wavelet Process Modelling (Lsero)

Uploaded by

Forecasting Non-Stationary Time Series by Wavelet Process Modelling (Lsero)

Uploaded by

Piotr Fryzlewicz, Sébastien van Bellegem and Rainer von

Article (Accepted version)

© 2003 The Institute of Statistical Mathematics

This version available at: http://eprints.lse.ac.uk/25830/

Keywords: Local stationarity, non-decimated wavelets, prediction, time-modulated pro-

where Z(ω) is a stochastic process with orthonormal increments. Non-stationary processes

2 Locally Stationary Wavelet processes

(2.3) Sj (z) = |Wj (z)|2 , z ∈ (0, 1)

as Ψj (0) = 1 for all scales j.

Proposition 1 (Nason et al. (2000)). Under the assumptions of Definition 1, if T → ∞,

Remark 2 (Stationary processes). A stationary process with an absolutely summable

Remark 3 (Time-modulated processes). Time-modulated (TM) processes constitute a

• the autocovariance function of Yt is absolutely summable (so that Yt is LSW with a

• and if the Lipschitz constants LX Y 1/2

3 The predictor and its theoretical properties

3.1 Definition of the linear predictor

The rescaled-time segment

3.2 Prediction in the wavelet domain

for all j = −1, . . . , −J and k ∈ Z. Then, the one-step-ahead predictor is constructed as

3.3 One-step ahead prediction equations

MSPE(X̂t;T , Xt;T ) = b0t Σt;T bt ,

Assumption 2. The evolutionary wavelet spectrum is such that

(3.8) MSPE(X̂t;T , Xt;T ) = b0t Bt;T bt (1 + oT (1)) .

Remark 6 (Time-modulated processes). For the processes considered in Remark 3

3.4 The prediction error

3.5 h-step-ahead prediction

MSPE(X̂t+h−1;T , Xt+h−1;T ) = b0t+h−1 Bt+h−1;T bt+h−1 (1 + oT (1)) .

4 Prediction based on data

4.1 Estimation of the time-varying second-order structure

Proposition 5. The estimator (4.1) satisfies

Remark 7 (Time-modulated processes). For Gaussian time-modulated processes con-

where the matrix A = (Aj` )j,`<0 is defined by

4.2 Future observations in rescaled time

4.3 Data-driven choice of parameters

5 Application of the general predictor to real data

A Theoretical properties of the predictor

and, analogously, define P3 , P4 , . . . up to PM where M = (L/δ) − 1. Consider the union

Lemma 1. Assume that (3.5) holds. If L → ∞, δ/L → 0 and L2 /T → 0 as T → ∞, then

x0 (Σt;T − D) x = x0 xoT (1).

by assumption of the Lemma.

using Parseval formula, we have

We will now study the approximation of Σt;T by Bt;T .

Lemma 4. Under the assumptions of Proposition 2 and 4,

MSPE(X̂t+h−1;T , Xt+h−1;T ) = b0t+h−1 Bt+h−1;T bt+h−1 + b0t+h−1 bt+h−1 oT (1)

We bound Rest1 as follows:

If Lj denotes the length of support of ψj , we have 0 6 k−n, k−m 6 Lj and so k−(n+m)/2 6

Finally, by Assumption (3.5), (A.6) yields the result.

b0t+h−1 Σt+h−1;T bt+h−1 = b0t+h−1 Bt+h−1;T bt+h−1 (1 + oT (1))

b0t+h−1 bt+h−1 6 b0t+h−1 Σt+h−1;T bt+h−1 kΣ−1

for all sequences bt+h−1 . The above gives

for large T . On the other hand, we have

which implies the result.

B Estimation of the local autocovariance function

Lemma 6. 1. The system {Ψj (τ ), j = −1, −2, . . .} is linearly independent.

2. Denote by Ψ(τ ) the wavelet autocorrelation function of a continuous wavelet ψ, i.e.

Denote by m0 (ξ) the trigonometric polynomial which corresponds to the construction of

and, using the 2πk-periodicity of m0 ,

By similar transformations, we finally arrive at

Using (B.1), we obtain

which clearly tends to 0 as L tends to infinity.

Lemma 8. Matrix A defined in (4.4) has the following properties:

If, in addition, A is constructed using Haar wavelets, then

for all j < 0, where C is a constant.

Proof. (B.3) is a straightforward corollary of Lemma 7. To prove (B.4), we introduce the

Proof of Proposition 5. We will first show

provided that (3.5) holds. Using (B.7), we finally obtain

Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.

Kress, R. (1991). Numerical analysis. New York: Springer.

1 These simulated examples demonstrate the idea of a sparse representation of

(b) Comparison between the one-step-ahead prediction in our model (dashed

(a) 9-step-ahead prediction using LSW modelling . . . . . . . . . . . . . . 31

Figure 1: These simulated examples demonstrate the idea of a sparse representation of

You might also like