
Overview

Financial time series

Dimitris N. Politis

The evolution of financial markets is a complicated real-world phenomenon that ranks at the top in terms of difficulty of modeling and/or prediction. One reason for this difficulty is the well-documented nonlinearity that is inherently at work. The state-of-the-art in the nonlinear modeling of financial returns is given by the popular auto-regressive conditional heteroscedasticity (ARCH) models and their generalizations, but they all have their shortcomings. Foregoing the goal of finding the 'best' model, it is possible to simply transform the problem into a more manageable setting, such as the setting of linearity. The form and properties of such a transformation are given, and the issue of one-step-ahead prediction using the new approach is explicitly addressed. © 2009 John Wiley & Sons, Inc. WIREs Comp Stat 2009 1 157-166

INTRODUCTION

Consider data X_1, . . . , X_n arising as an observed stretch from a financial returns time series {X_t}, such as the percentage returns of a stock index, stock price, or foreign exchange rate; the returns may be daily, weekly, or calculated at different (discrete) intervals. The returns {X_t} are typically assumed to be strictly stationary with mean zero, which, from a practical point of view, implies that trends and/or other nonstationarities have been successfully removed.
At the turn of the 20th century, the pioneering work of L. Bachelier [1] suggested the Gaussian random walk model for (the logarithm of) stock market prices. Because of the approximate equivalence of percentage returns to differences in the (logarithm of the) price series, the direct implication was that the returns series {X_t} can be modeled as independent, identically distributed (i.i.d.) random variables with Gaussian N(0, σ²) distribution. Although Bachelier's thesis was not so well-received by its examiners at the time, his work served as the foundation for financial modeling for a good part of the last century.
The Gaussian hypothesis was first challenged in the 1960s when it was noticed that the distribution of returns seemed to have fatter tails than the normal [2]. Recent work has empirically confirmed this fact, and has furthermore suggested that the degree of heavy tails is such that the distribution of returns has finite moments only up to order about two [3-5].

Correspondence to: [email protected]
Department of Mathematics, University of California at San Diego, La Jolla, CA 92093-0112, USA
DOI: 10.1002/wics.024
Volume 1, September/October 2009
Furthermore, in an early paper of B. Mandelbrot [6], the phenomenon of volatility clustering was pointed out, i.e., the fact that high-volatility days are clustered together and the same is true for low-volatility days. This effectively negates the assumption of independence of the returns; it manifests itself in the fact that the absolute values (or squares) of the returns are positively correlated.
For example, Figure 1 depicts the daily returns
of the S&P500 index from August 30, 1979 to August
30, 1991; the extreme values associated with the
crash of October 1987 are very prominent in the plot.
Figure 2(a) is a correlogram of the S&P500 returns,
i.e., a plot of the estimated autocorrelation function
(ACF); the plot is consistent with the hypothesis of
uncorrelated returns. By contrast, the correlogram
of the squared returns of Figure 2(b) shows some
significant correlations thus lending support to the
volatility clustering hypothesis.
The celebrated auto-regressive conditional heteroscedasticity (ARCH) models of 2003 Nobel Laureate R. Engle [7] were designed to capture the phenomenon of volatility clustering by postulating a particular structure of dependence for the time series of squared returns {X_t²}. A typical ARCH(p) model is described by the equation:ᵃ

X_t = Z_t √( a + Σ_{i=1}^{p} a_i X²_{t−i} )    (1)

ᵃ Equation (1) and subsequent equations where the time variable t is left unspecified are assumed to hold for all t = 0, ±1, ±2, . . ..


where a, a_1, a_2, . . . are nonnegative real-valued parameters, p is a nonnegative integer indicating the order of the model, and the series {Z_t} is assumed to be i.i.d. N(0, σ²). Bachelier's model is a special case of the ARCH(p) model; just let a_i = 0 for all i, effectively implying a model of order zero.

The ARCH model (1) is closely related to an auto-regressive (AR) model on the squared returns. It is a simple calculation [8] that Eq. (1) implies

X_t² = a + Σ_{i=1}^{p} a_i X²_{t−i} + W_t    (2)

where the errors W_t constitute a mean zero, uncorrelatedᵇ sequence. Note, however, that the W_t's in the above are not independent, thus making the original Eq. (1) more useful.
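As an illustrative sketch (not from the paper), the ARCH recursion of Eq. (1) can be simulated to reproduce the two stylized facts discussed above: returns that are serially uncorrelated while their squares are positively correlated. All parameter values below are arbitrary choices for the demonstration:

```python
import numpy as np

def simulate_arch(n, a, coeffs, seed=0):
    """Simulate Eq. (1): X_t = Z_t * sqrt(a + sum_i a_i X_{t-i}^2)."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    x = np.zeros(n + p)                 # p zeros as start-up values
    z = rng.standard_normal(n + p)
    for t in range(p, n + p):
        var = a + sum(c * x[t - 1 - i] ** 2 for i, c in enumerate(coeffs))
        x[t] = z[t] * np.sqrt(var)
    return x[p:]

def lag1_corr(v):
    """Sample lag-1 autocorrelation."""
    v = v - v.mean()
    return float((v[:-1] * v[1:]).mean() / v.var())

x = simulate_arch(20000, a=1.0, coeffs=[0.3])   # ARCH(1), arbitrary parameters
print(lag1_corr(x))        # close to zero: the returns are uncorrelated
print(lag1_corr(x ** 2))   # clearly positive: volatility clustering
```

For an ARCH(1) model the lag-1 autocorrelation of the squares is a_1 in theory, so the second printed value should be near 0.3 here.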
It is intuitive to also consider an auto-regressive moving average (ARMA) model on the squared returns; this idea is closely related to Bollerslev's [9] GARCH(p, q) models. Among these, the GARCH(1,1) model is by far the most popular, and often forms the benchmark for modeling financial returns. The generalized ARCH (GARCH)(1,1) model is described by the equation:

X_t = s_t Z_t  with  s_t² = C + A X²_{t−1} + B s²_{t−1}    (3)

where the Z_t's are i.i.d. N(0, 1) as in Eq. (1), and the parameters A, B, C are assumed nonnegative. Under model (3) it can be shown [9,10] that EX_t² < ∞ only when A + B < 1; for this reason the latter is sometimes called a weak stationarity condition, since the strict stationarity of {X_t} also implies weak stationarity when second moments are finite.

ᵇ To talk about the second moments of W_t here, we have tacitly assumed EX_t⁴ < ∞.
FIGURE 1 | Daily returns of the S&P500 index spanning the period 8-30-1979 to 8-30-1991.

Back-solving in the right-hand side of Eq. (3), it is easy to see [10] that the GARCH model (3) is tantamount to the ARCH model (1) with p = ∞ and the following identifications:

a = C / (1 − B),  and  a_i = A B^{i−1}  for i = 1, 2, . . .    (4)

In fact, under some conditions, all GARCH(p, q) models have ARCH(∞) representations similar to the above. So, in some sense, the only advantage GARCH models may offer over the simpler ARCH is parsimony, i.e., achieving the same quality of model fitting with fewer parameters. Nevertheless, if one is to impose a certain structure on the ARCH parameters, then the effect is the same; the exponential structure (4) is a prime such example.
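The identification (4) can be checked numerically: simulating a GARCH(1,1) path and rebuilding its conditional variance from the truncated ARCH(∞) representation should agree up to truncation and initialization error. This is an illustrative sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
C, A, B = 0.1, 0.1, 0.8          # weak stationarity holds: A + B < 1
n = 500

# simulate GARCH(1,1): X_t = s_t Z_t, s_t^2 = C + A X_{t-1}^2 + B s_{t-1}^2
z = rng.standard_normal(n)
x = np.zeros(n)
s2 = np.zeros(n)
s2[0] = C / (1 - A - B)          # start at the unconditional variance
x[0] = np.sqrt(s2[0]) * z[0]
for t in range(1, n):
    s2[t] = C + A * x[t - 1] ** 2 + B * s2[t - 1]
    x[t] = np.sqrt(s2[t]) * z[t]

# ARCH(infinity) representation with the identifications of Eq. (4),
# truncated at a large lag M (B^M is negligible here):
M = 200
a = C / (1 - B)
ai = A * B ** np.arange(M)       # a_i = A B^{i-1} for i = 1, ..., M
t = n - 1
arch_s2 = a + np.sum(ai * x[t - 1::-1][:M] ** 2)
print(abs(arch_s2 - s2[t]))      # tiny: the two representations agree
```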
The above ARCH/GARCH models beautifully capture the phenomenon of volatility clustering in simple equations, at the same time implying a marginal distribution for the {X_t} returns that has heavier tails than the normal. Viewed differently, the ARCH(p) and/or GARCH(1,1) model may be considered as attempts to normalize the returns, i.e., to reduce the problem to a model with normal residuals (the Z_t's). In that respect, though, the ARCH(p) and/or GARCH(1,1) models are only partially successful, as empirical work suggests that ARCH/GARCH residuals often exhibit heavier tails than the normal; the same is true for ARCH/GARCH spin-off models such as the EGARCH; see Refs 11,12 for a review.

Nonetheless, the goal of normalization is most worthwhile, and it is indeed achievable, as will be shown in the sequel, where the connection with the issue of nonlinearity of stock market returns will also be brought forward.


FIGURE 2 | (a) Correlogram of the S&P500 returns. (b) Correlogram of the S&P500 squared returns.

LINEAR AND GAUSSIAN TIME SERIES


Consider a strictly stationary time series {Y_t} that, for ease of notation, is assumed to have mean zero. The most basic tool for quantifying the inherent strength of dependence is given by the autocovariance function γ(k) = EY_t Y_{t+k} and the corresponding Fourier series f(w) = (2π)^{−1} Σ_{k=−∞}^{∞} γ(k) e^{−iwk}; the latter function is termed the spectral density, and is well-defined (and continuous) when Σ_k |γ(k)| < ∞. We can also define the autocorrelation function (ACF) as ρ(k) = γ(k)/γ(0). If ρ(k) = 0 for all k > 0, then the series {Y_t} is said to be a white noise, i.e., an uncorrelated sequence; the reason for the term 'white' is the constancy of the resulting spectral density function.

The function γ(k) represents the second-order moments of the time series {Y_t}; more technically, it represents the second-order cumulants [13]. The third-order cumulants are given by the function Γ(j, k) = EY_t Y_{t+j} Y_{t+k}, whose Fourier series F(w_1, w_2) = (2π)^{−2} Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} Γ(j, k) e^{−iw_1 j − iw_2 k} is termed the bispectral density. We can similarly define the cumulants of higher order, and their corresponding Fourier series that constitute the so-called higher-order spectra.

The set of cumulant functions of all orders, or equivalently the set of all higher-order spectral density functions, is a complete description of the dependence structure of the general time series {Y_t}. Of course, working with an infinity of functions is very cumbersome; a short-cut is desperately needed, and is presented to us by the notion of linearity.

A time series {Y_t} is called linear if it satisfies an equation of the type:

Y_t = Σ_{k=−∞}^{∞} ψ_k Z_{t−k}    (5)

where the coefficients ψ_k are (at least) square-summable, and the series {Z_t} is i.i.d. with mean zero and variance σ² > 0. A linear time series {Y_t} is called causal if ψ_k = 0 for k < 0, i.e., if

Y_t = Σ_{k=0}^{∞} ψ_k Z_{t−k}.    (6)

Equation (6) should not be confused with the Wold decomposition that all purely nondeterministic time series possess [14]. In the Wold decomposition, the error series {Z_t} is only assumed to be a white noise and not i.i.d.; the latter assumption is much stronger.

Linear time series are easy objects to work with, since the totality of the dependence structure of a linear time series is perfectly captured by a single entity, namely the sequence of ψ_k coefficients. To elaborate, the autocovariance and spectral density of {Y_t} can be calculated to be γ(k) = σ² Σ_{s=−∞}^{∞} ψ_s ψ_{s+k} and f(w) = (2π)^{−1} σ² |Ψ(w)|², respectively, where Ψ(w) is the Fourier series of the ψ_k coefficients, i.e., Ψ(w) = Σ_{k=−∞}^{∞} ψ_k e^{−iwk}. In addition, the bispectral density is simply given by

F(w_1, w_2) = (2π)^{−2} μ_3 Ψ(w_1) Ψ(w_2) Ψ̄(w_1 + w_2)    (7)


where μ_3 = EZ_t³ is the third moment of the errors. Similarly, all higher-order spectra can be calculated in terms of Ψ(w).
The prime example of a linear time series is given by the aforementioned AR family pioneered by G. Yule [15], in which the time series {Y_t} has a linear relationship with respect to its own lagged values, namely

Y_t = Σ_{k=1}^{p} φ_k Y_{t−k} + Z_t    (8)

with the error process {Z_t} being i.i.d. as in Eq. (5).


AR modeling lends itself ideally to the problem of prediction of future values of the time series. For concreteness, let us focus on the one-step-ahead prediction problem, i.e., predicting the value of Y_{n+1} on the basis of the observed data Y_1, . . . , Y_n, and denote by Ŷ_{n+1} the optimal (with respect to mean squared error) predictor. In general, we can write Ŷ_{n+1} = g_n(Y_1, . . . , Y_n) where g_n(·) is an appropriate function. As can easily be shown [16], the function g_n(·) that achieves this optimal prediction is given by the conditional expectation, i.e., Ŷ_{n+1} = E(Y_{n+1}|Y_1, . . . , Y_n). Thus, implementing one-step-ahead prediction in a general nonlinear setting requires knowledge (or accurate estimation) of the unknown function g_n(·), which is far from trivial [8,17,18].

In the case of a causal AR model [19], however, it is easy to show that the function g_n(·) is actually linear, and that Ŷ_{n+1} = Σ_{k=1}^{p} φ_k Y_{n+1−k}. Note furthermore the property of finite memory, in that the prediction function g_n(·) is only sensitive to its last p arguments. Although the finite memory property is specific to finite-order causal AR (and Markov) models, the linearity of the optimal prediction function g_n(·) is a property shared by all causal linear time series satisfying Eq. (6); this broad class includes all causal and invertible, i.e., minimum-phase [20], ARMA models with i.i.d. innovations.
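A minimal sketch of this point: for a simulated causal AR(2) series, the optimal one-step-ahead predictor is linear in the last p values, and its coefficients can be recovered by least squares (the "true" coefficients below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
phi = np.array([0.5, -0.3])           # true AR(2) coefficients (causal)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + rng.standard_normal()

# least-squares fit of the linear prediction function g_n:
# columns are the lagged values y_{t-1}, y_{t-2}; target is y_t
p = 2
X = np.column_stack([y[p - 1 - k : n - 1 - k] for k in range(p)])
phi_hat = np.linalg.lstsq(X, y[p:], rcond=None)[0]

# one-step-ahead prediction: \hat Y_{n+1} = sum_k phi_k Y_{n+1-k}
y_pred = phi_hat @ y[-1:-p - 1:-1]
print(phi_hat, y_pred)                # phi_hat should be close to (0.5, -0.3)
```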
However, the property of linearity of the optimal prediction function g_n(·) is shared by an even larger class of processes. To define this class, consider a weaker form of Eq. (6) that amounts to relaxing the i.i.d. assumption on the errors to the assumption of a martingale difference, i.e., to assume that

Y_t = Σ_{i=0}^{∞} ψ_i ε_{t−i}    (9)

where {ε_t} is a stationary martingale difference adapted to F_t, the σ-field generated by {Y_s, s ≤ t}, i.e., that

E[ε_t | F_{t−1}] = 0 and E[ε_t² | F_{t−1}] = 1 for all t.    (10)

Following Ref 21, we will use the term weakly linear for a time series {Y_t} that satisfies (9) and (10). As it turns out, the linearity of the optimal prediction function g_n(·) is shared by all members of the family of weakly linear time series;ᶜ for example, see Ref 23 and Theorem 1.4.2 of Ref 14.
Gaussian series form an interesting subset of the class of linear time series. They occur when the series {Z_t} of Eq. (5) is i.i.d. N(0, σ²), and they too exhibit the useful linearity of the optimal prediction function g_n(·); to see this, recall that the conditional expectation E(Y_{n+1}|Y_1, . . . , Y_n) turns out to be a linear function of Y_1, . . . , Y_n when the variables Y_1, . . . , Y_{n+1} are jointly normal [19].

Furthermore, in the Gaussian case, all spectra of order higher than two are identically zero; it follows that all dependence information is concentrated in the spectral density f(w). Thus, the investigation of a Gaussian series' dependence structure can focus on the simple study of second-order properties, namely the ACF ρ(k) and/or the spectral density f(w). For example, an uncorrelated Gaussian series, i.e., one satisfying ρ(k) = 0 for all k, necessarily consists of independent random variables.

To some extent, this last remark can be generalized to the linear setting: if a linear time series is deemed to be uncorrelated, then practitioners typically infer that it is independent as well.ᵈ Note that to check/test whether an estimated ACF, denoted by ρ̂(k), is significantly different from zero, the Bartlett confidence limits are typically used (see, e.g., the bands in Figure 2(a)); but those too are only valid for linear or weakly linear time series [14,25,26].

To sum up: all the usual statistical goals of prediction, confidence intervals, and hypothesis testing are greatly facilitated in the presence of linearity, and particularly in the presence of normality.
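The Bartlett-type check mentioned above can be sketched as follows: for a truly i.i.d. Gaussian series, roughly 95% of the estimated autocorrelations should fall within the ±1.96/√n limits. This is an illustrative simulation, not tied to the paper's data:

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    y = y - y.mean()
    c0 = np.dot(y, y) / len(y)
    return np.array([np.dot(y[:-k], y[k:]) / (len(y) * c0)
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(3)
n = 2000
y = rng.standard_normal(n)            # i.i.d. Gaussian: truly independent
rho_hat = acf(y, 20)
band = 1.96 / np.sqrt(n)              # Bartlett limits under white noise
inside = np.mean(np.abs(rho_hat) <= band)
print(inside)                         # typically around 0.95
```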

ᶜ There exist, however, time series not belonging to the family of weakly linear series for which the best predictor is linear. An example is given by a typical series of squared financial returns, i.e., the series {V_t} where V_t = X_t² for all t, and {X_t} is modeled by an ARCH/GARCH model [21]. Other examples can be found in the class of random coefficient AR models [22].

ᵈ Strictly speaking, this inference is only valid for the aforementioned class of causal and invertible ARMA models [24].

CAN THE STOCK MARKET BE LINEARISED?

It should come as no surprise that a simple parametric model such as (1) might not perfectly capture the behavior


of a complicated real-world phenomenon such as the evolution of financial returns that, almost by definition of market efficiency, ranks at the top in terms of difficulty of modeling/prediction. As a consequence, researchers have recently been focusing on alternative models for financial time series.
For example, consider the model

X_t = σ(t) Z_t    (11)

where Z_t is i.i.d. (0, 1). If {σ(t)} is considered a random process independent of {Z_t}, then (11) falls in the class of stochastic volatility models [12]. If, however, σ(·) is thought to be a deterministic function that changes slowly (smoothly) with t, then model (11) is nonstationary, although it is locally stationary [27], and σ(·) can be estimated from the data using nonparametric smoothing techniques; see, e.g., Ref 28 and the references therein.
As another example, consider the nonparametric ARCH model defined by the equation:

X_t = g_p(X_{t−1}, . . . , X_{t−p}) Z_t    (12)

where Z_t is i.i.d. (0, σ²), and g_p is an unknown smooth function to be estimated from the data. Additional nonparametric methods for financial time series are discussed in the review paper of Ref 29.
Despite their nonparametric (and possibly nonstationary) character, the above are just some different models attempting to fully capture/describe the probabilistic characteristics of a financial time series, which is perhaps an overly ambitious task. Foregoing the goal of finding the 'best' model, we may instead resort to an exploratory, model-free approach in trying to understand this difficult type of data. In particular, we may attempt to transform the problem into a more manageable setting such as the setting of linearity.
Consider again the financial returns data X_n = (X_1, . . . , X_n), and a transformation of the type V_n = H(X_n) where V_n is also n-dimensional. Ideally, we would like the transformed series V_n = (V_1, . . . , V_n) to be linear since, as mentioned before, such time series are easy to work with.

However, just asking for linearity of the transformed series is not enough. For example, the naive transformation V_t = sign(X_t) may be thought of as a linearizing transformation since, by the efficient market hypothesis, sign(X_t) is i.i.d. (taking the values +1 and −1 with equal probability), and therefore linear. Nevertheless, in spite of the successful linearization, the sign transformation is not at all useful, as the passage from X_n to V_n is associated with a profound loss of information.
To avoid such information loss due to processing [30], we should further require that the transformation H be in some suitable sense invertible, allowing us to work with the linear series V_t but then being able to recapture the original series by the inverse transformation H^{−1}(V_n). Interestingly, the key to finding such a transformation is asking for more: look for a normalizing (instead of just linearizing) information-preserving transformation.
We now show how this quest may indeed be fruitful, using the ARCH equation (1) as a stepping stone. Note that Eq. (1) can be re-written as

Z_t = X_t / √( a + Σ_{i=1}^{p} a_i X²_{t−i} ).

Hence, we are led to define the transformed variable V_t by

V_t = X_t / √( α s²_{t−1} + a_0 X_t² + Σ_{i=1}^{p} a_i X²_{t−i} )  for t = p + 1, p + 2, . . . , n,    (13)

and V_t = X_t / s_t for t = 1, 2, . . . , p. In the above, α, a_0, a_1, . . . , a_p are nonnegative real-valued parameters, and s²_{t−1} is an estimator of σ_X² = Var(X_1) based on the data up to (but not including) time t. Under the zero-mean assumption for X_t, the natural estimator is s²_{t−1} = (t − 1)^{−1} Σ_{k=1}^{t−1} X_k².
The invertibility of the above transformation is manifested by solving Eq. (13) for X_t, thus obtaining:

X_t = ( V_t / √(1 − a_0 V_t²) ) √( α s²_{t−1} + Σ_{i=1}^{p} a_i X²_{t−i} )  for t = p + 1, p + 2, . . . , n.    (14)

Given the initial conditions X_1, . . . , X_p, the information set F_n^X = {X_t, 1 ≤ t ≤ n} is equivalent to the information set F_n^V = {V_t, 1 ≤ t ≤ n}. To see this, note that with Eq. (14) we can recursively regenerate X_t for t = p + 1, p + 2, . . . , n using just F_n^V and the initial conditions; conversely, Eq. (13) defines V_t in terms of F_n^X.
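The transformation (13) and its inverse (14) can be sketched directly; the roundtrip below checks the invertibility argument numerically. The parameter values are arbitrary illustrative choices, subject only to nonnegativity (and here α + Σ a_i = 1):

```python
import numpy as np

def novas(x, alpha, a):
    """NoVaS transform, a sketch of Eq. (13); a = [a0, a1, ..., ap]."""
    x = np.asarray(x, float)
    n, p = len(x), len(a) - 1
    cum = np.cumsum(x ** 2)
    v = np.empty(n)
    for i in range(n):                                   # i = t - 1 (0-based)
        if i < p:
            v[i] = x[i] / np.sqrt(cum[i] / (i + 1))      # V_t = X_t / s_t
        else:
            s2 = cum[i - 1] / i                          # s_{t-1}^2
            v[i] = x[i] / np.sqrt(alpha * s2 + np.dot(a, x[i::-1][:p + 1] ** 2))
    return v

def novas_inverse(v, x_init, alpha, a):
    """Recover X from V via Eq. (14), given the initial conditions X_1, ..., X_p."""
    v = np.asarray(v, float)
    n, p = len(v), len(a) - 1
    a0, arest = a[0], np.asarray(a[1:])
    x = np.empty(n)
    x[:p] = x_init
    for i in range(p, n):
        s2 = np.sum(x[:i] ** 2) / i                      # s_{t-1}^2
        past = np.dot(arest, x[i - 1::-1][:p] ** 2)      # sum_i a_i X_{t-i}^2
        x[i] = v[i] / np.sqrt(1 - a0 * v[i] ** 2) * np.sqrt(alpha * s2 + past)
    return x

rng = np.random.default_rng(4)
x = rng.standard_normal(300)
alpha, a = 0.1, np.array([0.3, 0.3, 0.2, 0.1])           # alpha + sum(a) = 1
v = novas(x, alpha, a)
x_back = novas_inverse(v, x[:3], alpha, a)
print(np.max(np.abs(x_back - x)))                        # machine-precision roundtrip
```

Note that 1 − a_0 V_t² is always strictly positive by construction, so the inverse map never divides by zero.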
Equation (13) describes the candidate normalizing (and therefore also linearizing) transformation, i.e., the operator H in V_n = H(X_n); this transformation was termed NoVaS in Ref 31, which is an acronym for normalizing and variance stabilising. Note that formally the main difference between


Eq. (13) and the ARCH Eq. (1) is the presence of the term X_t² paired with the coefficient a_0 inside the square root; this is a small but crucial difference without which the normalization goal is not always feasible [31,32]. A secondary difference is having α s²_{t−1} take the place of the parameter a; this is motivated by a dimension (scaling) argument, in the sense that choosing/estimating α is invariant with respect to a change in the units of measurement of X_t. Such invariance does not hold in estimating the parameter a in Eq. (1).
Despite its similarity to model (1), Eq. (13) is not to be interpreted as a model for the {X_t} series. In a modeling situation, the characteristics of the model are prespecified (e.g., errors that are i.i.d. N(0, σ²), etc.), and standard methods such as maximum likelihood or least squares are used to fit the model to the data. By contrast, Eq. (13) does not aspire to fully describe the probabilistic behavior of the {X_t} series. The order p and the vector of nonnegative parameters (α, a_0, . . . , a_p) are chosen by the practitioner with just the normalization goal in mind, i.e., in trying to render the transformed series {V_t} as close to normal as possible; here, closeness to normality can be conveniently measured by the Shapiro-Wilk (SW) test statistic [33] or its corresponding P-value.
It is advantageous (and parsimonious) in practice to assign a simple structure of decay for the a_k coefficients. The most popular such structure, shared by the popular GARCH(1,1) model [9], is associated with an exponential rate of decay, i.e., postulating that a_k = C e^{−dk} for some positive constants d and C which, together with the parameter α, are to be chosen by the practitioner.

Taking into account the convexity requirement α + Σ_{k≥0} a_k = 1, the exponential coefficients scheme effectively has only two free parameters that can be chosen with the normalization goal in mind, i.e., chosen to maximize the SW statistic calculated on the transformed series V_t, or on linear combinations thereof; the latter in order to also ensure normality of joint distributions.
As it turns out, the normalization goal can typically be achieved by a great number of combinations of these two free parameters, yielding an equally great number of possible normalizing transformations. Among those equally valid normalizing transformations, the simplest one corresponds to the choice α = 0. Alternatively, the value of α may be chosen by an additional optimization criterion driven by an application of interest, such as predictive ability.
For illustration, let us revisit the S&P500 returns dataset. The normalizing transformation with a_k = C e^{−dk} and the simple choice α = 0 is achieved with d = 0.0675; the resulting transformed V-series is plotted in Figure 3, which should be compared with Figure 1. Not only is the phenomenon of volatility clustering totally absent in the transformed series, but the outliers corresponding to the crash of October 1987 are also hardly (if at all) discernible.
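Since the S&P500 data are not reproduced here, the following sketch applies the exponential-coefficient scheme (with α = 0 and a hypothetical truncation lag p = 20) to simulated ARCH(1) returns, scanning d to maximize the Shapiro-Wilk statistic via `scipy.stats.shapiro`:

```python
import numpy as np
from scipy import stats

def novas_exp(x, d, p=20):
    """NoVaS with exponential coefficients a_k ~ exp(-d*k) and alpha = 0 (a sketch)."""
    a = np.exp(-d * np.arange(p + 1))
    a /= a.sum()                                   # convexity: sum_k a_k = 1
    n = len(x)
    v = np.empty(n - p)
    for i in range(p, n):
        v[i - p] = x[i] / np.sqrt(np.dot(a, x[i::-1][:p + 1] ** 2))
    return v

# simulated ARCH(1) returns stand in for a real dataset here
rng = np.random.default_rng(5)
z = rng.standard_normal(1500)
x = np.zeros(1500)
for t in range(1, 1500):
    x[t] = z[t] * np.sqrt(0.5 + 0.5 * x[t - 1] ** 2)

grid = np.linspace(0.01, 0.5, 25)
sw = [stats.shapiro(novas_exp(x, d)).statistic for d in grid]
d_best = grid[int(np.argmax(sw))]
print(d_best, max(sw))   # transformed series is far closer to normal than x itself
```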

QUANTIFYING NONLINEARITY AND NONNORMALITY

There are many indications pointing to the nonlinearity of financial returns. For instance, the fact that returns are uncorrelated but not independent is a good indicator; see, e.g., Figure 2(a) and (b). Notably, the ARCH model and its generalizations are all models for nonlinear series.
FIGURE 3 | Normalized S&P500 returns, i.e., the transformed V-series, spanning the same period 8-30-1979 to 8-30-1991.

To quantify nonlinearity, it is useful to define the new function

K(w_1, w_2) = |F(w_1, w_2)|² / ( f(w_1) f(w_2) f(w_1 + w_2) ).    (15)

From Eq. (7), it is apparent that if the time series is linear, then K(w_1, w_2) is the constant function, equal to μ_3² / (2π σ⁶) for all w_1, w_2; this observation can be used in order to test a time series for linearity [34,35]; see also Refs 36-38 for an up-to-date review of different tests for linearity.ᵉ In the Gaussian case we have μ_3 = 0, and therefore F(w_1, w_2) = 0 and K(w_1, w_2) = 0 as well.

ᵉ A different, but related, fact is that the closure of the class of linear time series is large enough so that it contains some elements that can be confused with a particular type of nonlinear time series; see, e.g., Fact 3 of Ref 39.

FIGURE 4 | Plot of K̂(w_1, w_2) vs. (w_1, w_2) for the S&P500 returns.

FIGURE 6 | QQ-plot of the S&P500 returns.

FIGURE 5 | Plot of K̂(w_1, w_2) vs. (w_1, w_2) for the normalized S&P500 returns.

FIGURE 7 | QQ-plot of the normalized S&P500 returns.

Let K̂(w_1, w_2) denote a data-based nonparametric estimator of the quantity K(w_1, w_2). For our purposes, K̂(w_1, w_2) will be a kernel smoothed estimator based on infinite-order flat-top kernels that lead to improved accuracy [40]. Figure 4 shows a plot of K̂(w_1, w_2) for the S&P500 returns; its nonconstancy is direct evidence of nonlinearity. By contrast, the function K̂(w_1, w_2) computed from the normalized S&P500 returns, i.e., the V-series, is not statistically different from the zero function. Figure 5 shows a plot


of K̂(w_1, w_2); note that, in order to present a nontrivial figure, the scale on the vertical axis of Figure 5 is magnified by a factor of 500 relative to that of Figure 4.

The fact that K̂(w_1, w_2) is not statistically different from the zero function lends support to the conclusion that the transformed series is linear with distribution symmetric about zero; the normal is such a distribution, but it is not the only one. To further delve into the issue of possible normality, recall the aforementioned SW test, which effectively measures the lack-of-fit of a quantile-quantile plot (QQ-plot) to a straight line. Figure 6 shows the QQ-plot of the S&P500 returns; it is apparent that a straight line is not a good fit. As a matter of fact, the SW test yields a P-value that is zero to several decimal points: the strongest evidence of nonnormality of stock returns. By contrast, the QQ-plot of the normalized S&P500 returns can be very well approximated by a straight line: the R² associated with the plot in Figure 7 is 0.9992, and the SW test yields a P-value of 0.153, lending strong support to the fact that the transformed series is indistinguishable from a Gaussian series.
The proposed transformation technique has been applied to a host of different financial datasets, including returns from several stock indices, stock prices, and foreign exchange rates [41]. Invariably, it was shown to be successful in its dual goal of normalization and linearization. Furthermore, as already discussed, a welcome by-product of linearization/normalization is that the construction of out-of-sample predictors becomes easy in the transformed space.

Of course, the desirable objective is to obtain predictions in the original (untransformed) space of financial returns. The first thing that comes to mind is to invert the transformation so that the predictor in the transformed space is mapped back to a predictor in the original space; this is indeed possible, albeit suboptimal. Optimal predictors were formulated in Refs 31,42 and are outlined in the next section, where the construction of predictive distributions is also discussed.
Notably, predictors based on the NoVaS transformation technique have been shown to outperform GARCH-based predictors in a host of applications involving both real and simulated data [31,41]. In addition, the NoVaS predictors are very robust, performing well even in the presence of structural breaks or other nonstationarities in the data [43]. Perhaps the most striking finding is that, with moderately large sample sizes (of the order of 350 daily data), the NoVaS predictors appear to outperform GARCH-based predictors even when the underlying data-generating process is itself GARCH [43].

PREDICTION VIA THE TRANSFORMATION TECHNIQUE

For concreteness, we focus on the problem of one-step-ahead prediction, i.e., prediction of a function of the unobserved return X_{n+1}, say h(X_{n+1}), based on the observed data F_n^X = {X_t, 1 ≤ t ≤ n}. Our normalizing transformation affords us the opportunity to carry out the prediction in the V-domain, where the prediction problem is easiest, since the problem of optimal prediction reduces to linear prediction in a Gaussian setting.
The prediction algorithm is outlined as follows:

- Calculate the transformed series V_1, . . . , V_n using Eq. (13).
- Calculate the optimal predictor of V_{n+1}, denoted by V̂_{n+1}, given F_n^V. This predictor has the general form V̂_{n+1} = Σ_{i=0}^{q−1} c_i V_{n−i}. The c_i coefficients can be found by Hilbert space projection techniques, or by simply fitting the causal AR model

  V_{t+1} = Σ_{i=0}^{q−1} c_i V_{t−i} + ε_{t+1}    (16)

  to the data, where ε_t is i.i.d. N(0, τ²). The order q can be chosen by an information criterion such as Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) [44].

Note that Eq. (14) suggests that h(X_{n+1}) = u_n(V_{n+1}), where u_n is given by

u_n(V) = h( ( V / √(1 − a_0 V²) ) √( α s_n² + Σ_{i=1}^{p} a_i X²_{n+1−i} ) ).    (17)

Thus, a quick-and-easy predictor of h(X_{n+1}) could then be given by u_n(V̂_{n+1}).
A better predictor, however, is given by the center of location of the distribution of $u_n(V_{n+1})$ conditionally on $F_n^V$. Formally, to obtain an optimal predictor, the optimality criterion must first be specified; the form of the predictor is then obtained from the distribution of the quantity in question. Typical optimality criteria are $L_2$, $L_1$, and 0/1 losses, with corresponding optimal predictors the (conditional) mean, median, and mode of the distribution. For reasons of robustness, let us focus on the median of the distribution of $u_n(V_{n+1})$ as such a center of location.
Using Eq. (16), it follows that the distribution of $V_{n+1}$ conditionally on $F_n^V$ is approximately $N(\hat V_{n+1}, \hat\tau^2)$, where $\hat\tau^2$ is an estimate of $\tau^2$ in Eq. (16). Thus, the median-optimal one-step-ahead predictor of $h(X_{n+1})$ is the median of the distribution of $u_n(V)$, where $V$ has the normal distribution $N(\hat V_{n+1}, \hat\tau^2)$ truncated to the values $|V| < 1/\sqrt{a_0}$; this median is easily computable by Monte-Carlo simulation.
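The Monte-Carlo computation of the median can be sketched as follows. Again this is an illustrative implementation rather than the authors' code; the arguments (`v_hat`, `tau2`, the NoVaS coefficients `a`, the variance estimate `s2`, and a vectorized `h`) are placeholders, and the truncation is enforced by simple rejection of out-of-range draws, which leaves the conditional distribution of the retained draws correct.

```python
import numpy as np

def median_predictor(v_hat, tau2, h, a, X, s2, n_sim=100_000, seed=0):
    """Median of u_n(V) for V ~ N(v_hat, tau2) truncated to |V| < 1/sqrt(a_0)."""
    rng = np.random.default_rng(seed)
    a0, rest = a[0], a[1:]
    draws = rng.normal(v_hat, np.sqrt(tau2), size=n_sim)
    if a0 > 0:
        draws = draws[np.abs(draws) < 1.0 / np.sqrt(a0)]  # rejection step
    p = len(rest)
    recent = X[-1 : -p - 1 : -1]                 # X_n, ..., X_{n-p+1}
    scale = np.sqrt(s2 + np.sum(rest * recent ** 2))
    vals = h(draws / np.sqrt(1.0 - a0 * draws ** 2) * scale)
    return np.median(vals)
```

The array `vals` is a Monte-Carlo sample from the predictive distribution of $h(X_{n+1})$, so other summaries of that distribution are available from it as well.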
The above Monte-Carlo simulation actually creates a predictive distribution for the quantity $h(X_{n+1})$. Thus, we can go one step further than the notion of a point predictor: clipping the left and right tails of this predictive distribution, say $\beta \cdot 100\%$ on each side, a $(1 - 2\beta) \cdot 100\%$ prediction interval for $h(X_{n+1})$ is obtained. Note, however, that this prediction interval treats as negligible the variability of the fitted parameters $\alpha, a_0, a_1, \ldots, a_p$, which is reasonable only as a first-order approximation; alternatively, a bootstrap method might be in order.45,46

© 2009 John Wiley & Sons, Inc.   Volume 1, September/October 2009
WIREs Computational Statistics: Financial time series

CONCLUSION
An introduction to the statistical intricacies of financial time series was presented, with an emphasis on the crucial notions of nonlinearity and non-Gaussianity. An overview of popular approaches for the analysis and/or modeling of such nonlinear time series was also given, with special mention of the problem of one-step-ahead prediction via a transformation technique.

ACKNOWLEDGEMENT
Many thanks are due to the Economics and Statistics sections of the National Science Foundation for
their support through grants SES-04-18136 and DMS-07-06732. The author is grateful to D. Gatzouras,
D. Kalligas, and D. Thomakos for helpful discussions, to A. Berg for compiling the software for the
bispectrum computations, and to R. Davis, M. Rosenblatt, and G. Sugihara for their advice and
encouragement.

REFERENCES
1. Bachelier L. Theory of speculation (1900). Reprinted in: Cootner PH, ed. The Random Character of Stock Market Prices. Cambridge, MA: MIT Press; 1964, 17-78.
2. Fama EF. The behaviour of stock market prices. J Bus 1965, 38:34-105.
3. Davis RA, Mikosch T. The sample autocorrelations of financial time series models. In: Fitzgerald WJ, Smith RL, Walden AT, Young P, eds. Nonlinear and Nonstationary Signal Processing. Cambridge: Cambridge University Press; 2000, 247-274.
4. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature 2003, 423:267-270.
5. Politis DN. A heavy-tailed distribution for ARCH residuals with application to volatility prediction. Ann Econ Finance 2004, 5:283-298.
6. Mandelbrot B. The variation of certain speculative prices. J Bus 1963, 36:394-419.
7. Engle R. Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica 1982, 50:987-1008.
8. Fan J, Yao Q. Nonlinear Time Series. New York: Springer; 2003.
9. Bollerslev T. Generalized autoregressive conditional heteroscedasticity. J Econom 1986, 31:307-327.
10. Gourieroux C. ARCH Models and Financial Applications. New York: Springer; 1997.
11. Bollerslev T, Chou R, Kroner K. ARCH modeling in finance: a review of theory and empirical evidence. J Econom 1992, 52:5-60.
12. Shephard N. Statistical aspects of ARCH and stochastic volatility. In: Cox DR, Hinkley DV, Barndorff-Nielsen OE, eds. Time Series Models in Econometrics, Finance and Other Fields. London: Chapman & Hall; 1996, 1-67.
13. Brillinger D. Time Series: Data Analysis and Theory. New York: Holt, Rinehart and Winston; 1975.
14. Hannan EJ, Deistler M. The Statistical Theory of Linear Systems. New York: Wiley; 1988.
15. Yule GU. On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos Trans R Soc London, Ser A 1927, 226:267-298.
16. Billingsley P. Probability and Measure. New York: Wiley; 1986.
17. Sugihara G, May RM. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 1990, 344:734-741.
18. Tong H. Non-linear Time Series Analysis: A Dynamical Systems Approach. Oxford: Oxford University Press; 1990.
19. Brockwell P, Davis R. Time Series: Theory and Methods. 2nd ed. New York: Springer; 1991.
20. Rosenblatt M. Gaussian and Non-Gaussian Linear Time Series and Random Fields. New York: Springer; 2000.
21. Kokoszka P, Politis DN. The variance of sample autocorrelations: does Bartlett's formula work with ARCH data? Discussion Paper No. 2008-12, Department of Economics, UCSD; 2008. Available at: http://repositories.cdlib.org/ucsdecon/2008-12.
22. Tsay RS. Analysis of Financial Time Series. New York: Wiley; 2002.
23. Poskitt DS. Properties of the sieve bootstrap for fractionally integrated and non-invertible processes. J Time Ser Anal 2008, 29(2):224-250.
24. Breidt FJ, Davis RA, Trindade AA. Least absolute deviation estimation for all-pass time series models. Ann Stat 2001, 29:919-946.
25. Fuller WA. Introduction to Statistical Time Series. 2nd ed. New York: Wiley; 1996.
26. Romano JP, Thombs L. Inference for autocorrelations under weak assumptions. J Am Stat Assoc 1996, 91:590-600.
27. Dahlhaus R. Fitting time series models to nonstationary processes. Ann Stat 1997, 25:1-37.
28. Herzel S, Starică C, Tütüncü R. A non-stationary paradigm for the dynamics of multivariate returns. In: Bertail P, Doukhan P, Soulier P, eds. Dependence in Probability and Statistics, Springer Lecture Notes in Statistics No. 187. New York: Springer; 2006, 391-430.
29. Fan J. A selective overview of nonparametric methods in financial econometrics. Stat Sci 2005, 20(4):317-357.
30. Cover T, Thomas J. Elements of Information Theory. New York: Wiley; 1991.
31. Politis DN. Model-free vs. model-based volatility prediction. J Financ Econom 2007, 5(3):358-389.
32. Politis DN. A normalizing and variance-stabilizing transformation for financial time series. In: Akritas MG, Politis DN, eds. Recent Advances and Trends in Nonparametric Statistics. North Holland: Elsevier; 2003, 335-347.
33. Shapiro SS, Wilk M. An analysis of variance test for normality (complete samples). Biometrika 1965, 52:591-611.
34. Hinich MJ. Testing for Gaussianity and linearity of a stationary time series. J Time Ser Anal 1982, 3(3):169-176.
35. Subba Rao T, Gabr M. An Introduction to Bispectral Analysis and Bilinear Time Series Models, Springer Lecture Notes in Statistics No. 24. New York: Springer; 1984.
36. Hinich MJ, Patterson DM. Evidence of nonlinearity in stock returns. J Bus Econ Stat 1985, 3:69-77.
37. Hsieh D. Testing for nonlinear dependence in daily foreign exchange rates. J Bus 1989, 62:339-368.
38. Kugiumtzis D. Evaluation of surrogate and bootstrap tests for nonlinearity in time series. Stud Nonlinear Dyn Econom 2008, 12(1), Article 4. Available at: http://www.bepress.com/snde/vol12/iss1/art4.
39. Bickel P, Bühlmann P. Closure of linear processes. J Theor Prob 1997, 10:445-479.
40. Politis DN, Romano JP. Bias-corrected nonparametric spectral estimation. J Time Ser Anal 1995, 16:67-104.
41. Politis DN, Thomakos DD. Financial time series and volatility prediction using NoVaS transformations. In: Rapach DE, Wohar ME, eds. Forecasting in the Presence of Parameter Uncertainty and Structural Breaks. Bingley, UK: Emerald Group Publishing Ltd; 2008, 417-447.
42. Politis DN. Can the stock market be linearised? Discussion Paper No. 2006-03, Department of Economics, UCSD; 2006. Available at: http://repositories.cdlib.org/ucsdecon/2006-03.
43. Politis DN, Thomakos DD. NoVaS transformations: flexible inference for volatility forecasting. Discussion Paper No. 2008-13, Department of Economics, UCSD; 2008. Available at: http://repositories.cdlib.org/ucsdecon/2008-13.
44. Choi BS. ARMA Model Identification. New York: Springer; 1992.
45. Politis DN. The impact of bootstrap methods on time series analysis. Stat Sci 2003, 18(2):219-230.
46. Politis DN. Model-free model-fitting and predictive distributions. Discussion Paper No. 2008-14, Department of Economics, UCSD; 2008. Available at: http://repositories.cdlib.org/ucsdecon/2008-14.
