
1 Basic concepts

1.1 Introduction
Time series analysis refers to statistical theories and methods used to analyze data
indexed by time.
In time series analysis, the term time series has two related meanings:
(i) A data set, actual or simulated, of observations indexed by time.
(ii) A stochastic process (see Definition 1.3) used to model observed time series data.
For clarity, we may use time series data for (i) and time series model for (ii). We
typically denote random variables and stochastic processes in uppercase (e.g., Xt ),
and observed/realized values in lowercase (e.g., xt ). That time is linearly ordered and
directed is crucial in the statistical analysis of time series data (as well as in real life).
It is essential to gain experience by examining a wide variety of real data-sets. We
give two examples to illustrate some general ideas and terminologies.
Example 1.1. The S&P/TSX Composite Index represents the overall performance of
the stocks listed on the Toronto Stock Exchange (TSX).1 We consider the daily closing
value of the index which is available on, say, Yahoo Finance2 and can be downloaded
either directly from Yahoo Finance or through R using the function getSymbols()
in the package quantmod. Familiarity with the underlying subject matter, or domain
knowledge, is essential to meaningful analysis of the data.
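
A minimal R sketch of this download step (assuming the quantmod package, an internet connection, and the ticker ^GSPTSE mentioned in the footnote; not the notes' own script):

    library(quantmod)
    # Download daily data for the S&P/TSX Composite index from Yahoo Finance;
    # getSymbols() creates an xts object named GSPTSE in the workspace.
    getSymbols("^GSPTSE", src = "yahoo", from = "2000-01-04", to = "2024-06-14")
    y <- Cl(GSPTSE)   # daily closing values
    plot(y, main = "S&P/TSX Composite index")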

Figure 1.1: Left: Time series of the S&P/TSX Composite Index from 2000-01-04 to 2024-06-14. Right: Daily log return of the index.

It is always a good idea to plot the data to examine its features. In Figure 1.1(left)
we plot the daily closing values from 2000-01-04 to 2024-06-14. We say that the
frequency is daily. Other common frequencies (at least for economic data) are weekly,
monthly, quarterly and yearly. While we plot the data in calendar time, the data is
indexed by trading day which skips in particular all holidays. For example, 2000-01-07
is a Friday and the next trading day is 2000-01-10 (Monday). If we represent the
series by yt, t = 1, 2, . . . , T, where T is the length of the series, we interpret t as the
t-th trading day of the data-set.

¹ More precisely, it is a capitalization-weighted market index where the influence of a constituent stock is proportional to its market capitalization.
² See https://ca.finance.yahoo.com/. The ticker symbol (unique identifier) of the S&P/TSX Composite index is ^GSPTSE.

We say that this time series is in discrete time since the time t takes values in a
discrete set (here t ∈ T = {1, . . . , T}). For visualization purposes we often interpolate
linearly between the data points as done in the figure. Some processes, such as high
frequency trading, speech, brain activities and weather, may be considered to occur
in continuous time (say t ∈ T = [a, b] ⊂ R for some interval), although for practical
reasons they are usually sampled in discrete time.
In Figure 1.1(left), we observe that the series yt evolves in a zig-zag manner but
exhibits an overall increasing trend (in finance, a key question is the trade-off between
reward and risk). It is often useful to transform the data to reveal hidden structures
and/or make it more suitable for statistical analysis; some techniques are discussed in
Section 2. In Figure 1.1(right), we consider the daily logarithmic return (which forms
another time series) given by

rt = log yt − log yt−1 = ∇ log yt , t = 2, 3, . . . , T,

where ∇, defined by
∇xt = xt − xt−1 , (1.1)
is the difference operator. This transformation removes the apparent trend in the
data: the log returns are roughly symmetric about zero.3 Also observe that there are
periods of high volatility (e.g., in 2008 (financial crisis) and 2020 (COVID-19)) and
they seem to cluster (rather than spread evenly across time). This phenomenon is
called volatility clustering and is observed in many financial time series.
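
A short sketch of the log-return transformation in R, assuming y holds the daily closing values (e.g., from the quantmod sketch above):

    r <- diff(log(y))   # daily log returns, r_t = log y_t - log y_{t-1}
    r <- na.omit(r)     # drop the NA produced at the first observation
    plot(r, main = "Daily log returns")
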
Example 1.2. We consider Canadian climate data which is publicly available on https:
//climate.weather.gc.ca/. Here, we consider the daily mean temperature,4 in
degrees Celsius, recorded at a meteorological station (id: 6158731) within the Toronto
Pearson International Airport (YYZ). (The concept “temperature of Toronto” is useful
in daily life but is not very specific. An observation must be recorded sometime
somewhere.) The data set is plotted in Figure 1.2. We observe immediately a strong
seasonal pattern (i.e., periodic behaviour) whose fluctuations differ from year to year.

Figure 1.2: Daily mean temperature, from 2013-06-13 to 2024-06-13, at the Toronto International Airport.
³ The trend is accounted for by the small but positive mean of the daily log returns.
⁴ Defined by mean = ½ max + ½ min; see https://climate.weather.gc.ca/glossary_e.html. We may consider the time-averaged temperature but this quantity is not reported.

A raw data-set must be suitably cleaned and preprocessed before formal statistical
analysis begins. For example, there may be errors in data entry possibly revealed by
values which are unreasonably large or small. The series of interest may be derived
from other data. Here, the daily mean temperature is derived from the daily maximum
and minimum temperatures. In this data-set there are 21 missing values. Missing
values may be treated by various methods often on a case-by-case basis. Since here
the number of missing values is small (21 in 4013 data points), a reasonable approach
is to estimate a missing value yt by averaging nearby values, say ½ yt−1 + ½ yt+1. In
other cases missing, or unobserved, values are important and may be accounted for by
a suitable model. It is concerning if the results of an analysis are sensitive to the
conventions used.
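
A minimal sketch of this neighbour-averaging imputation; the vector temp (daily mean temperatures with NA entries) is hypothetical:

    impute_neighbours <- function(x) {
      # Replace an isolated missing value by the average of its two neighbours.
      for (t in which(is.na(x))) {
        if (t > 1 && t < length(x) && !is.na(x[t - 1]) && !is.na(x[t + 1])) {
          x[t] <- 0.5 * x[t - 1] + 0.5 * x[t + 1]
        }
      }
      x
    }
    temp_filled <- impute_neighbours(temp)
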
The key idea of time series analysis is to regard an observed time series (possibly
after a suitable transformation) as the realized value of a stochastic process. Statistical
methods are used to infer the properties of the process. We recall here the general
definition of stochastic process.
Definition 1.3. Let T ⊂ R be a nonempty index set.
(i) (Stochastic process) A stochastic process indexed by T is a collection (Xt )t∈T of
random variables defined on some probability space (Ω, F, P). We simply write
(Xt ) if T is clear from the context. We say that (Xt ) is d-dimensional if each
Xt = (X1,t , . . . , Xd,t ) is a d-dimensional random vector.
(ii) (Sample path) If (Xt )t∈T is a stochastic process, the sample path corresponding
to the sample point ω ∈ Ω is the function t ∈ T 7→ Xt (ω).
For the most part we focus on univariate processes (d = 1) in discrete time, such
that T is an interval of Z (e.g., T = Z+ = {0, 1, 2, . . .} or Z).5 A distinctive feature of
time series is that successive data points are typically dependent, so we cannot regard
them as i.i.d. samples as in “conventional statistics”. In a nutshell, time series analysis
is concerned with the modelling of dependence for stochastic processes.

1.2 Simple examples of stochastic processes


The concept of white noise is fundamental in time series analysis. A white noise may
be regarded as a relatively simple kind of time series and can be used as a building
block of more sophisticated models. Some basic examples are given.
Definition 1.4. Let (Xt )t∈T , T ⊂ Z, be a real-valued stochastic process.
(i) (i.i.d. noise) (Xt ) follows an i.i.d. noise process with mean 0 and variance σ 2 >
0, written (Xt ) ∼ IID(0, σ 2 ), if (Xt )t∈T is a collection of i.i.d. random variables
with mean 0 and variance σ 2 .
(ii) (White noise) (Xt ) follows a white noise process with mean 0 and variance σ 2 >
0, written (Xt ) ∼ WN(0, σ 2 ), if each Xt has mean 0 and variance σ 2 and (Xt )
are pairwise uncorrelated, i.e.,
Cov(Xs, Xt) = σ² if s = t, and Cov(Xs, Xt) = 0 if s ≠ t.    (1.2)
Example 1.5 (Gaussian white noise). If the Xt are i.i.d. N(0, σ²) then (Xt) ∼ IID(0, σ²). A
simulated sample path is shown in Figure 1.3(left). We can obtain non-Gaussian
i.i.d. noise processes simply by using other distributions such as the t-distribution and
the Laplace distribution. Examples of white noise processes which are not i.i.d. noise
processes will be given later.
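
A sketch in R of how such noise processes can be simulated (the sample size and seed are arbitrary):

    set.seed(1)
    x_gauss   <- rnorm(300)          # Gaussian i.i.d. noise, N(0, 1)
    x_student <- rt(300, df = 5)     # a non-Gaussian i.i.d. alternative
    plot.ts(x_gauss, main = "A sample path of Gaussian white noise")
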
⁵ In applications a discrete time series may not be indexed by an integer. For example, for a monthly series we may use 1997 for January 1997 and 1997 + 1/12 for February 1997.
Figure 1.3: Left: A sample path of Gaussian white noise with Xt i.i.d. N(0, 1). Right: Ten sample paths of a Gaussian random walk where the Xt are i.i.d. N(0, 1).

Since independence implies zero correlation (for random variables with finite variance), we have

(Xt) ∼ IID(0, σ²) ⇒ (Xt) ∼ WN(0, σ²).

The converse is generally false but holds if (Xt) is a Gaussian process, i.e., (Xt1, . . . , Xtk)
has a multivariate normal distribution for all k ≥ 1 and t1, . . . , tk ∈ T.
Example 1.6 (A white noise which is not i.i.d.). Let R ≥ 0 be a non-negative and non-
constant random variable with E[R2 ] = σ 2 , and let Zt be an i.i.d. process, independent
of R, such that P(Zt = ±1) = ½. Now define

Xt = RZt .

By independence, we have

E[Xt] = E[RZt] = E[R]E[Zt] = E[R] · 0 = 0.

Next compute

Cov(Xs, Xt) = E[(RZs)(RZt)] = E[R²] E[Zs Zt] = σ² δst,

where δst is the Kronecker delta. Thus (Xt) ∼ WN(0, σ²).


Nevertheless, (Xt) is not an i.i.d. noise. To see this, note that by construction
|Xt| = R for all t, so Cor(|Xs|, |Xt|) = 1 for all s, t (recall that R is non-constant, so Var(R) > 0). If (Xt) were an
i.i.d. process we would have Cor(|Xs|, |Xt|) = 0 for s ≠ t by independence.
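
A sketch of Example 1.6 in R, taking R to be Exp(1) (so E[R²] = 2) purely for illustration:

    set.seed(2)
    n <- 300
    R <- rexp(1)                                # a single draw of R, shared by all t
    Z <- sample(c(-1, 1), n, replace = TRUE)    # i.i.d. signs with P(Z = 1) = P(Z = -1) = 1/2
    X <- R * Z                                  # white noise, but |X_t| = R for every t
    plot.ts(X)
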
Example 1.7 (Random walk). A random walk started at s0 ∈ R is a process (St )t∈Z+
given by S0 = s0 and

St = S0 + X1 + · · · + Xt , t = 1, 2, . . . , (1.3)

where (Xt)_{t=1}^∞ is a collection of i.i.d. random variables (e.g., (Xt) ∼ IID(0, σ²)). See
Figure 1.3(right) for an illustration. Assuming Xt has finite variance, we may write

Xt = µ + σεt,

Figure 1.4: A simulated sample path of an MA(3) process where the noises are i.i.d. N(0, 1) and (θ1, θ2, θ3) = (0.8, 0.2, −0.2). Note the difference with the Gaussian white noise simulated in Figure 1.3(left).

where µ = E[Xt] and (εt) ∼ IID(0, 1). Then

St = s0 + tµ + σ ∑_{s=1}^{t} εs,    (1.4)

which has a linear trend. We may recover (Xt ) from (St ) by taking the first difference:
∇St = St − St−1 = Xt = µ + σεt.    (1.5)
Here differencing removes the trend. Observe that the TSX series in Figure 1.1(left)
looks qualitatively somewhat similar to a random walk if we neglect the big jumps
and volatility clustering.
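
A minimal simulation of a Gaussian random walk (as in Figure 1.3, right), assuming s0 = 0:

    set.seed(3)
    S <- cumsum(rnorm(300))     # S_t = X_1 + ... + X_t with X_t i.i.d. N(0, 1)
    plot.ts(S, main = "A Gaussian random walk")
    X_recovered <- diff(S)      # first differences recover the increments X_t
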
Example 1.8 (Moving average process). Let (Zt )t∈Z ∼ WN(0, σ 2 ). Given a constant
θ ∈ R, define (Xt )t∈Z by
Xt = Zt + θZt−1 , t ∈ Z. (1.6)
More generally, given an integer q ≥ 1 and θ1 , . . . , θq ∈ R, we may define (Xt )t∈Z by
Xt = Zt + θ1 Zt−1 + · · · + θq Zt−q , t ∈ Z. (1.7)
If θq ≠ 0, we call (Xt) defined by (1.7) a moving average process of order q, or simply
an MA(q) process. Thus (1.6) defines an MA(1) process if θ ≠ 0. The idea is that
Xt is given as a weighted sum of the current noise Zt and up to q previous noises
Zt−1 , . . . , Zt−q . See Figure 1.4 for an example.
Unlike a white noise process, successive values of a moving average process are
correlated, up to lag q. For example, for an MA(1) process (1.6) we have, for any t,
Cov(Xt+1 , Xt )
= Cov(Zt+1 + θZt , Zt + θZt−1 )
= Cov(Zt+1 , Zt ) + θCov(Zt+1 , Zt−1 ) + θCov(Zt , Zt ) + θ2 Cov(Zt , Zt−1 )
= 0 + θ · 0 + θ · σ 2 + θ2 · 0 = θσ 2 ,
and, by a similar argument,
Cov(Xt+2 , Xt ) = Cov(Xt+3 , Xt ) = · · · = 0.

Also, we have

Cov(Xt , Xt ) = Var(Xt ) = Var(Zt + θZt−1 )


= Var(Zt ) + θ2 Var(Zt−1 ) = (1 + θ2 )σ 2 .

Thus (using the symmetry of covariance) we have, for time t ∈ Z and lag h ∈ Z,

Cov(Xt+h, Xt) = (1 + θ²)σ²   if h = 0;
               θσ²           if h = ±1;      (1.8)
               0             otherwise.

This is an example of an autocovariance function (Definition 1.13). By varying q and


the parameters θ1 , . . . , θq , we can obtain a wide range of behaviours. In practice, q
and the coefficients θ1, . . . , θq have to be estimated from data.
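
A sketch of how an MA(q) path such as the one in Figure 1.4 can be simulated, using arima.sim from the stats package (the seed is arbitrary; the coefficients follow the figure caption):

    set.seed(4)
    # MA(3) with (theta1, theta2, theta3) = (0.8, 0.2, -0.2) and i.i.d. N(0, 1) noise
    x_ma <- arima.sim(model = list(ma = c(0.8, 0.2, -0.2)), n = 300)
    plot.ts(x_ma, main = "A sample path of an MA(3) process")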

1.3 Stationarity
In time series analysis, we often observe only a single series (of some length) regarded
as the realized sample path of a stochastic process. Without some assumption of
stationarity, statistical inference is not possible. For example, suppose we observe one
sample (x1 , . . . , xT ) from N (a, I) where a = (a1 , . . . , aT ) ∈ RT is arbitrary. Regardless
of the length T we cannot expect to estimate a accurately. A key idea of time series
is that the underlying process is built up from some stationary process for which
statistical inference is possible. In the following we let the index set T be either Z+
or Z.
Definition 1.9. A stochastic process (Xt )t∈T is strictly stationary if for all k ≥ 1,
distinct t1 , . . . , tk , and h, we have
(Xt1, . . . , Xtk) =d (Xt1+h, . . . , Xtk+h).

That is, the finite dimensional distributions of (Xt ) are invariant under time shifts.
Example 1.10 (Simple examples of strictly stationary processes).
(i) Any i.i.d. process is strictly stationary.
(ii) Let (Xt )t∈Z+ be a time homogeneous Markov chain. Suppose X0 ∼ π where π
is stationary for the chain, i.e., if X0 ∼ π then Xt ∼ π for all t. Then (Xt )
is strictly stationary. To give a concrete example, suppose the state space is
X = {0, 1} and the transition matrix P (x, y) = P(Xt+1 = y|Xt = x), x, y ∈ X ,
is given by

P (0, 0) = P (1, 1) = 1 − p, P (0, 1) = P (1, 0) = p.

Then the stationary distribution π is the uniform distribution on X ; a short simulation sketch is given after this example.
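
A minimal sketch of the two-state chain in Example 1.10(ii), started from its stationary (uniform) distribution; the flip probability p = 0.3 is arbitrary:

    set.seed(5)
    simulate_chain <- function(n, p) {
      x <- numeric(n)
      x[1] <- sample(0:1, 1)        # X_0 drawn from the uniform stationary distribution
      for (t in 2:n) {
        # stay with probability 1 - p, flip with probability p
        x[t] <- if (runif(1) < p) 1 - x[t - 1] else x[t - 1]
      }
      x
    }
    chain <- simulate_chain(500, p = 0.3)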


Strict stationarity involves knowing all finite dimensional distributions. This is
difficult, if not impossible, to verify in practice since with only a finite number of
observations the invariance of joint distributions under time shift cannot be accurately
assessed. Weak stationarity (also called second order stationarity) focuses only on the
first two moments and hence is generally more applicable. Also, it is mathematically
and computationally more tractable to work with only the first two moments.
Definition 1.11 (Weak/second order stationarity). Let (Xt )t∈T⊂Z be a real-valued
stochastic process such that Var(Xt ) < ∞ for all t. Consider its mean function defined
by
µX (t) = E[Xt ], t ∈ T, (1.9)

and its autocovariance function defined by

γX (s, t) = Cov(Xs , Xt ), s, t ∈ T. (1.10)

We also define the autocorrelation function by ρX (s, t) = Cor(Xs , Xt ). We say that


(Xt ) is weakly stationary if (i) the mean function is constant, i.e., µX (t) ≡ m for
some constant m ∈ R; and (ii) the autocovariance function is invariant under time
shift, i.e., γX (s, t) = γX (s + h, t + h) for all s, t, h.
Proposition 1.12. If (Xt ) is strictly stationary and Var(Xt ) < ∞ for all t, then
(Xt ) is weakly stationary. The converse holds if (Xt ) is a Gaussian process.

Proof. The first statement is clear. The second statement holds since a (multivariate)
normal distribution is completely specified by the mean and covariance matrix.

From now on, by stationarity we mean weak stationarity unless otherwise specified.
In condition (ii) above, taking the pair (s, s + h) and the shift t − s shows that (ii) is equivalent to

γX (s, s + h) = γX (t, t + h) (1.11)

for all s, t and h. That is, the covariance function depends only on the lag h. When
h = 0,
γX (t, t) = Var(Xt )
is the common variance of Xt .
Definition 1.13. Let (Xt )t∈T , T = Z+ or Z, be a stationary process.
(i) (Autocovariance function) The autocovariance function (ACVF) of X is defined
by
γX (h) = Cov(Xt , Xt+h ), h ∈ Z. (1.12)
(ii) (Autocorrelation function) The autocorrelation function (ACF) of X is defined
(when γX (0) > 0) by
ρX(h) = Cor(Xt, Xt+h) = Cov(Xt, Xt+h) / ( √Var(Xt) √Var(Xt+h) ) = γX(h) / γX(0),   h ∈ Z.    (1.13)

Remark 1.14. The autocovariance γX (h) = Cov(Xt , Xt+h ) only measures linear de-
pendence between Xt and Xt+h . Xt and Xt+h can be dependent even if Cov(Xt , Xt+h ) =
0.
Let X and Y be real-valued random variables. It can be shown that X and Y are
independent if and only if

E[f (X)g(Y )] = E[f (X)]E[g(Y )] (1.14)

for any (bounded) functions f and g. Uncorrelatedness of X and Y only requires
(1.14) to hold when f and g are affine functions (i.e., functions of the form ax + b).
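
A quick numerical illustration of Remark 1.14 (a sketch; the distributions are chosen only for convenience): X ∼ N(0, 1) and Y = X² are uncorrelated but clearly dependent.

    set.seed(6)
    X <- rnorm(1e5)
    Y <- X^2
    cor(X, Y)      # close to 0: no linear dependence
    cor(X^2, Y)    # equal to 1: X and Y are strongly dependent
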
Proposition 1.15 (Properties of ACVF and ACF). Let (Xt ) be a stationary process.
Then:
(i) (Normalization) ρX (0) = 1.
(ii) (Symmetry) γX (h) = γX (−h) and ρX (h) = ρX (−h). (Thus in (1.12) and (1.13)
we may restrict to h ≥ 0.)
(iii) (Positive semidefiniteness) For k ≥ 1, lags h1, . . . , hk and constants a1, . . . , ak ∈ R, we have

∑_{i,j=1}^{k} ai aj γX(hi − hj) ≥ 0.    (1.15)

Proof. (i) Obvious since ρX (0) = Cor(Xt , Xt ) = 1.
(ii) By symmetry of the covariance, we have

γX(h) = Cov(Xt+h, Xt) = Cov(Xt, Xt−h) = γX(−h).

(iii) Observe that the left hand side of (1.15) is the variance of the linear combination ∑_{i=1}^{k} ai Xt+hi, which is non-negative:

0 ≤ Var( ∑_{i=1}^{k} ai Xt+hi ) = ∑_{i,j=1}^{k} ai aj Cov(Xt+hi, Xt+hj) = ∑_{i,j=1}^{k} ai aj γX(hi − hj).

Remark 1.16. Conversely, any function γ : Z → R (or Z+ → R) which satisfies (ii)


and (iii) in Proposition 1.15 is the ACVF of some stationary process. In particular,
we can construct a strictly stationary Gaussian process (Xt ) whose ACVF is γ.
Example 1.17.
(i) A white noise process (Xt ) ∼ WN(0, σ 2 ) is stationary by construction. Clearly
Cov(Xt, Xt+h) is equal to σ² if h = 0 and 0 if h ≠ 0. Thus its ACF is given by

ρX(h) = 1   if h = 0;
        0   if h ≠ 0.      (1.16)

(ii) Let (Xt) be an MA(1) process as in (1.6) in Example 1.8. From (1.8), (Xt) is
stationary and its ACF is given by

ρX(h) = 1             if h = 0;
        θ/(1 + θ²)    if |h| = 1;      (1.17)
        0             if |h| ≥ 2.

A numerical check of (1.17) is sketched below.
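A numerical check of (1.16)–(1.17) can be done with ARMAacf from the stats package; a sketch with θ = 0.8 (arbitrary):

    theta <- 0.8
    ARMAacf(ma = theta, lag.max = 3)   # lag 1 equals theta / (1 + theta^2); lags >= 2 are 0
    theta / (1 + theta^2)              # direct computation for comparison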

Example 1.18. Consider the process St = s0 + X1 + · · · + Xt , t ≥ 0, where s0 is


a constant and (Xt ) ∼ WN(0, σ 2 ) (c.f. Example 1.7). Consider its autocovariance
function γS (s, t) = Cov(Ss , St ). By symmetry of γS , we may assume without loss of
generality that t ≥ s. Since the increments of (St ) are uncorrelated, we have

γS (s, t) = Cov(Ss + (St − Ss ), Ss )


= Cov(Ss , Ss ) + Cov(Ss , St − Ss )
= Var(Ss ) + 0 = sσ 2 .

Thus for s, t ≥ 0 we have γS(s, t) = min{s, t}σ², which is not a function of the lag s − t.
We conclude that S is not stationary. Consider

Cor(Ss, St) = γS(s, t) / ( √γS(s, s) √γS(t, t) ) = min{s, t}σ² / ( √(sσ²) √(tσ²) ) = min{s, t} / ( √s √t ).

For h ≥ 0, we have

Cor(St, St+h) = t / ( √t √(t + h) ) = √( t / (t + h) ),    (1.18)

which, for fixed t, decays like O(1/√h) as h increases.


We study one more fundamental example.

Example 1.19 (Autoregressive process of order 1). Let (Zt )t∈Z ∼ WN(0, σ 2 ), and let
φ ∈ R be a constant satisfying |φ| < 1. Define a process (Xt )t∈Z by

Xt = ∑_{j=0}^{∞} φ^j Zt−j = Zt + φZt−1 + φ²Zt−2 + · · · ,    (1.19)

which is a moving average process of infinite order (an instance of an MA(∞) process).
Since ∑_{j=0}^{∞} |φ^j| < ∞ and (Zt) ∼ WN(0, σ²), the series converges absolutely with
probability 1. To show this, consider
 
E[ ∑_{j=0}^{∞} |φ^j Zt−j| ] = ∑_{j=0}^{∞} |φ^j| E[|Zt−j|],

where the equality holds by the monotone convergence theorem. On the other hand,
since (Zt) ∼ WN(0, σ²), we have, by the Cauchy–Schwarz inequality,

E[|Zt−j|] ≤ √(E[Zt−j²]) = σ.

Using the geometric sum formula, we have

E[ ∑_{j=0}^{∞} |φ^j Zt−j| ] ≤ σ ∑_{j=0}^{∞} |φ|^j = σ / (1 − |φ|) < ∞.

Thus with probability 1 we have ∑_{j=0}^{∞} |φ^j Zt−j| < ∞, i.e., the sum converges absolutely.
Since

Xt = Zt + φ(Zt−1 + φZt−2 + · · ·) = Zt + φXt−1,

(Xt ) satisfies the recursion

Xt = φXt−1 + Zt , t ∈ Z. (1.20)

Thus Xt is a weighted sum of its previous value Xt−1 and the current noise Zt .
We say that (Xt ) follows an autoregressive process of order 1 (denoted as AR(1)).
Autoregressive processes of higher orders will be introduced later.

Let’s verify that (Xt) is stationary. Since ∑_{j=0}^{∞} |φ^j| < ∞ and (Zt) ∼ WN(0, σ²),
it is possible to exchange sums with expectation/covariance operators.6 We have
 
E[Xt] = E[ ∑_{j=0}^{∞} φ^j Zt−j ] = ∑_{j=0}^{∞} φ^j E[Zt−j] = ∑_{j=0}^{∞} φ^j · 0 = 0,

so the mean function is identically zero.7 Since (Zt ) ∼ WN(0, σ 2 ), we have


 
Var(Xt) = Var( ∑_{j=0}^{∞} φ^j Zt−j ) = ∑_{j=0}^{∞} φ^{2j} Var(Zt−j) = σ² / (1 − φ²).

⁶ We omit the proof, which can be found in [1, Section 3.1].
⁷ Letting Xt = µ + ∑_{j=0}^{∞} φ^j Zt−j, where µ ∈ R, shifts the mean to µ. The recursion (1.20) becomes Xt − µ = φ(Xt−1 − µ) + Zt.
Figure 1.5: Simulated sample paths of AR(1) processes where the Zt are i.i.d. N(0, 1). Left: φ = 0.9. Right: φ = −0.9. The same noise sequence is used to simulate both paths.

So the covariance function Cov(Xt , Xt+h ) is well-defined. To find γX (t, t + h) for


h > 0, we use the recursion (1.20) and observe that

γX (t, t + h) = Cov(Xt , Xt+h )


= Cov(Xt , φXt+h−1 + Zt+h )
= φCov(Xt , Xt+h−1 ) + Cov(Xt , Zt+h ) (1.21)
= φCov(Xt , Xt+h−1 )
= φγX (t, t + (h − 1)).

In the fourth equality we used the fact that



Cov(Xt, Zt+h) = ∑_{j=0}^{∞} φ^j Cov(Zt−j, Zt+h) = 0,   h > 0.

Iterating (1.21) shows that

γX(t, t + h) = φ^h γX(t, t) = φ^h σ² / (1 − φ²),   h ≥ 0.

Since γX (t, t + h) is independent of t, (Xt ) is stationary with autocorrelation function



ρX(h) = φ^{|h|},   h ∈ Z,    (1.22)

which decays geometrically. When φ ∈ (0, 1), ρX (h) decays monotonically. When
φ ∈ (−1, 0), the sign of ρX (h) alternates. See Figure 1.5 to get a feel of how this
relates to behaviours of the sample paths. The ACFs (theoretical and sample) are
plotted in Figure 1.6.
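
A sketch of how the AR(1) paths of Figure 1.5 and the theoretical ACF of Figure 1.6 can be reproduced, using arima.sim and ARMAacf with the same noise sequence for both signs of φ (the seed is arbitrary):

    set.seed(7)
    z <- rnorm(300)                                            # common noise sequence
    x_pos <- arima.sim(model = list(ar =  0.9), n = 300, innov = z)
    x_neg <- arima.sim(model = list(ar = -0.9), n = 300, innov = z)
    ARMAacf(ar = 0.9, lag.max = 20)                            # theoretical ACF: 0.9^h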

1.4 Sample autocorrelation function


We introduce the sample analogues of the ACVF and ACF.

Figure 1.6: Theoretical and sample ACFs of the sample paths in Figure 1.5. Left: the case φ = 0.9. Right: the case φ = −0.9. The horizontal dashed lines are at ±1.96/√T, where T is the length of the series.

Definition 1.20. Let (xt)_{t=1}^{T} be a real-valued time series data-set.
(i) (Sample mean) The sample mean x̄ is defined by x̄ = (1/T) ∑_{t=1}^{T} xt.
(ii) (Sample autocovariance function) The sample autocovariance function is defined for
0 ≤ h < T by

γ̂(h) = γ̂(−h) = (1/T) ∑_{t=1}^{T−h} (xt+h − x̄)(xt − x̄).    (1.23)

(iii) (Sample autocorrelation function) The sample autocorrelation function is defined for
0 ≤ h < T by

ρ̂(h) = ρ̂(−h) = γ̂(h) / γ̂(0).    (1.24)
Note that in (1.23) we divide by T rather than T − h (which varies with the lag
h). This ensures that γ̂ is positive semidefinite in the sense of Proposition 1.15(iii).
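
A sketch of (1.23)–(1.24) computed directly, compared with R's acf() (which also demeans and uses the divisor T):

    sample_acvf <- function(x, h) {
      T_len <- length(x)
      xbar  <- mean(x)
      sum((x[(1 + h):T_len] - xbar) * (x[1:(T_len - h)] - xbar)) / T_len
    }
    set.seed(8)
    x <- rnorm(200)
    sample_acvf(x, 1) / sample_acvf(x, 0)          # sample ACF at lag 1
    acf(x, lag.max = 1, plot = FALSE)$acf[2]       # should agree (index 1 is lag 0)
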
Example 1.21. In Figure 1.6 we plot the sample ACFs of the two simulated paths of
AR(1) process (for φ = 0.9 and φ = −0.9 respectively). Observe that the patterns of
the sample ACFs match – up to sampling errors – those of the theoretical ones.
Example 1.22. In Figure 1.7 we plot a simulated path of a random walk (as in Figure
1.3) and its sample ACF. Observe that the sample ACF stays positive for all lags
shown and decays rather slowly (c.f. (1.18)). These behaviours usually indicate that
the underlying process is nonstationary.
In time series analysis it is frequently useful to know whether a process is approx-
imately a white noise. Naturally, we may examine this by the sample ACF. Even if
(xt ) is a realization of a white noise process, due to sampling errors the sample ACF is
likely to be non-zero for non-zero lags. The typical magnitude of fluctuations is given
by the following theorem, whose proof is beyond the scope of the course. We use →d
to denote convergence in distribution.

Figure 1.7: Left: Simulated path of a random walk. Right: Its sample ACF.

Theorem 1.23 (Asymptotic distribution of sample ACF). Let (Xt) ∼ IID(0, σ²).
Let ρ̂T be the sample ACF computed from (X1, . . . , XT). Under additional technical

assumptions,⁸ for any positive lag h we have

√T (ρ̂T(1), . . . , ρ̂T(h)) →d N(0, Ih),    (1.25)

where N(0, Ih) is the h-dimensional standard normal distribution. Thus, for any h ≠ 0, ρ̂T(h) is
approximately distributed as N(0, 1/T) when T is sufficiently large.
Here is a straightforward application of the theorem. Fix a lag h, say h = 1. If
(Xt) is an i.i.d. noise (more precisely, we mean that it satisfies the assumptions of
Theorem 1.23), then ρ̂T(h) is approximately distributed as N(0, 1/T). If we observe
|ρ̂T(h)| > 1.96/√T ≈ 2/√T, then we can reject the null hypothesis that (Xt) is an i.i.d. noise
at the 5% significance level. The lines at ±1.96/√T, typically provided in a plot of the ACF, are
useful for visual inspection of the sample ACF. For example, in Figures 1.6 and 1.7
the sample ACF is significant (by which we mean larger than 1.96/√T in magnitude)
at many lags. This is strong evidence that the series is not a white noise.
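
A quick sketch of this visual check in R: for simulated i.i.d. noise the sample ACF should stay mostly within the ±1.96/√T bands that acf() draws by default.

    set.seed(9)
    T_len <- 300
    acf(rnorm(T_len), lag.max = 20)   # dashed confidence bands at +-1.96/sqrt(T_len)
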
Corollary 1.24. Let h ≥ 1 be a given maximum lag under consideration. Under the
assumptions of Theorem 1.23, we have

QT := T ∑_{j=1}^{h} ρ̂T(j)² →d χ²h,    (1.26)

where χ²h is the chi-squared distribution with h degrees of freedom.

Proof. This follows from the continuous mapping theorem applied to (1.25).

Given a lag h and a significance level α, the Portmanteau test⁹ (also called the
Box–Pierce test) states that we reject the null hypothesis that the series is an i.i.d. noise
if the test statistic QT defined by (1.26) exceeds the (1 − α)-quantile of χ²h. Note that
Corollary 1.24 is an asymptotic result, so it may not be accurate if T is not sufficiently
large. A refinement of the Portmanteau test, called the Ljung–Box test, uses instead
the test statistic

QT := T(T + 2) ∑_{j=1}^{h} ρ̂T(j)² / (T − j),    (1.27)

which has the same limiting distribution χ²h.

⁸ A sufficient condition is that the fourth moment of Xt is finite: E[Xt⁴] < ∞.
⁹ The term “Portmanteau test” is also used for related tests that use a similar test statistic.
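
Both tests are available in base R through Box.test (stats package); a sketch on simulated i.i.d. noise with h = 10 lags (arbitrary):

    set.seed(10)
    x <- rnorm(300)
    Box.test(x, lag = 10, type = "Box-Pierce")   # Q_T as in (1.26)
    Box.test(x, lag = 10, type = "Ljung-Box")    # the refinement (1.27)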


Remark 1.25. Although we use various statistical tests, certain subjective decisions
are involved in the analysis of time series (and other) data. For example, we may use
the same data (and prior information) to decide the class of models considered and
estimate the parameters, and we may consider multiple tests and p-values in model
selection and diagnostics (each p-value only applies to a single test when considered
alone). In practice, time series analysis is both a science and an art.
