
Notes: ST304 Time Series and Forecasting

Zetai Cen

Update (20240313): adjustment of notations.

* The contents of this file are extracted from the ST304@LSE course materials, and hence the file only serves as a cookbook for the course. See Moodle for more details of the course; all typos or inaccuracies are my responsibility.

Contents

1 Overview and Basics

2 Stationary Time Series Models
  2.1 WN: White Noise process
  2.2 MA(q), Moving Average process of order q
  2.3 AR(p), Auto-Regressive process of order p
  2.4 ARMA(p, q), Auto-Regressive and Moving Average process of order p, q
  2.5 Notes on ARMA
    2.5.1 Stationarity of MA(∞)
    2.5.2 Invertibility of ARMA
    2.5.3 Summary
  2.6 ARCH(p), Auto-Regressive Conditional Heteroscedasticity model of order p
  2.7 GARCH(p, q), Generalised ARCH model of order p, q

3 Non-stationary Time Series Models
  3.1 Trend & Seasonality
  3.2 ARIMA: integrated ARMA
  3.3 SARMA: seasonal ARMA (Only for those interested)
  3.4 SARIMA: seasonal ARIMA

4 Estimation: Time-Domain
  4.1 Basic Quantities
    4.1.1 Sample Mean
    4.1.2 Sample ACVS & ACF
  4.2 MME: the Yule-Walker estimators for AR(p)
  4.3 Asymptotic results to test MA(q) & AR(p)
  4.4 LSE: least square estimators for AR(p) & ARMA(p, q)
  4.5 MLE: MA(q) & AR(p) & ARMA(p, q)

5 Model Selection & Forecasting
  5.1 Techniques for model selection & diagnostics
  5.2 Forecasting: prediction equations (Only for those interested, except for MSPE definition)
  5.3 Forecasting: ARMA(p, q)
1 Overview and Basics
On a suitable filtered-probability space (R, B, (B)t , P), we define a time series to be
a discrete sequence of random variables X := {Xt , t ∈ Z} or denote it by {Xt } if no
ambiguity.

Definitions

1. joint distribution of X at time points t1 , . . . , tn is defined to be

Fx (c1 , . . . , cn ) = P(Xt1 ≤ c1 , . . . , Xtn ≤ cn ).

2. autocovariance function (ACVF) of X at time t, s is defined as

γx (t, s) = cov(Xt , Xs ) = E[(Xt − µt )(Xs − µs )].

3. autocorrelation function (ACF) of X at time t, s is defined as

ρx(t, s) = γx(t, s) / √( γx(t, t) γx(s, s) ).

4. strong stationarity of X means the joint distribution is invariant under time shifts. For
any time points t1, . . . , tn and ∀h:

P(Xt1 ≤ c1, . . . , Xtn ≤ cn) = P(Xt1+h ≤ c1, . . . , Xtn+h ≤ cn).

5. weak stationarity of X means:

- finite first- and second-order moments: E[Xt²] < ∞,
- time-independent mean: E[Xt] = µ,
- time-independent ACVF: ∀t, s, c: γx(t, s) = γx(t + c, s + c).

Thus for a weakly stationary, discrete and equally spaced series, denote the
autocovariance series (ACVS) and ACF as:

sτ := γx(t, t + τ) = cov(Xt, Xt+τ),

ρτ := γx(t, t + τ) / √( γx(t, t) γx(t + τ, t + τ) ) = sτ / s0.

6. Gaussian process is a process X such that for any t1, . . . , tn, the vector (Xt1, . . . , Xtn)
has a multivariate Normal distribution with finite means and covariances. Notice
a weakly stationary Gaussian process is strongly stationary as well.

7. partial autocorrelation function (PACF) for a mean-zero stationary X is defined as:

ϕ11 := corr(X2, X1),
ϕ22 := corr(X3 − E[X3 | X2], X1 − E[X1 | X2]),
ϕ33 := corr(X4 − E[X4 | X3, X2], X1 − E[X1 | X3, X2]), · · ·

Note: for a stationary AR(p) with zero mean, ϕpp = ϕp, the coefficient of Xt−p.

Inspiration from Wold’s Decomposition Theorem

Any covariance-stationary time series Xt (weakly stationary, without requiring a constant
mean) could be decomposed additively as:

Xt = Tt + St + Mt

where Tt is deterministic, called the trend component, St is deterministic, called the
seasonal component, and Mt is stochastic, called the microscopic component. Our
modelling focus is the microscopic part, and we hope Mt is weakly stationary.

We mainly focus on stationary (meaning weakly-stationary) time series and might


mention just a few non-stationary models.

The overall analysis could be dichotomised as: time domain and frequency domain.
The latter is preferred when periodicity is less apparent, and is not considered in this
course. After the estimation, we talk about forecasting. Thus, generally we have:

modelling (Sec. 2 & 3) → estimation (Sec. 4) → model selection and forecasting (Sec. 5)

2 Stationary Time Series Models
A filtered series is a linear combination of time series variables. We start from white
noise, the building block of important models.

2.1 WN: White Noise process


A white noise process {Xt} is a collection of uncorrelated time series variables with
constant, finite mean and variance:
(i) E[Xt] = µ < ∞
(ii) σ² := s0 < ∞ and sτ = 0 for τ ≠ 0
We then denote it as:
Xt ∼ WN(µ, σ²)

2.2 MA(q), Moving Average process of order q


A process {Xt} is a moving average process of order q, denoted Xt ∼ MA(q), if it could
be written as:

Xt = µ + θ0 εt + θ1 εt−1 + · · · + θq εt−q

where µ and the θi are constants with θq ≠ 0, and εt ∼ WN(0, σε²). Usually θ0 = 1.
A finite-order MA(q) is always weakly stationary; MA(∞) requires extra conditions (see 2.5.1).

2.3 AR(p), Auto-Regressive process of order p


A process {Xt} is an autoregressive process of order p, denoted Xt ∼ AR(p), if it could
be written as:

Xt = ϕ1 Xt−1 + ϕ2 Xt−2 + · · · + ϕp Xt−p + εt                    (1)

where the ϕj are constants with ϕp ≠ 0, and εt ∼ WN(0, σε²).

Generally, AR(p) is not necessarily stationary. If it is stationary, we can also allow an
AR(p) process with non-zero mean, of the form:

Xt = α + ϕ1 Xt−1 + ϕ2 Xt−2 + · · · + ϕp Xt−p + εt                    (2)

which relates to (1) as follows: assuming E[Xt] = µ and taking expectations on both sides,
we have α = µ(1 − ϕ1 − · · · − ϕp).

2.4 ARMA(p, q), Auto-Regressive and Moving Average process of order p, q

A process {Xt} is an ARMA(p, q) process, denoted Xt ∼ ARMA(p, q), if it could be
written as:

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + εt + θ1 εt−1 + · · · + θq εt−q                    (3)

where the ϕj and θj are constants with ϕp, θq ≠ 0, and εt ∼ WN(0, σε²).
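
To make the definition concrete, here is a minimal simulation sketch (not part of the original notes) that generates a zero-mean ARMA(p, q) path directly from the recursion in (3), assuming Gaussian white noise; the function name, burn-in length and seed are arbitrary choices.

```python
import numpy as np

def simulate_arma(phi, theta, T, sigma=1.0, burn=500, seed=0):
    """Simulate a zero-mean ARMA(p, q) path of length T from equation (3).

    phi   : AR coefficients (phi_1, ..., phi_p)
    theta : MA coefficients (theta_1, ..., theta_q)
    """
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    n = T + burn                      # extra observations discarded as burn-in
    eps = rng.normal(0.0, sigma, n)   # white noise innovations
    x = np.zeros(n)
    for t in range(n):
        ar = sum(phi[j] * x[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        x[t] = ar + eps[t] + ma
    return x[burn:]

# Example: a stationary, invertible ARMA(1, 1) path
x = simulate_arma(phi=[0.6], theta=[0.4], T=300)
print(x[:5])
```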

2.5 Notes on ARMA
We define the backward shift operator B such that:

B Xt = Xt−1,   B^s Xt = Xt−s

Rewrite (3) as:

(1 − ϕ1 B − · · · − ϕp B^p) Xt = (1 + θ1 B + · · · + θq B^q) εt                    (4)

with Φ(B) := 1 − ϕ1 B − · · · − ϕp B^p and Θ(B) := 1 + θ1 B + · · · + θq B^q.

In (4), we define:
- Φ(B) as the autoregressive operator
- Θ(B) as the moving average operator
Hence also define:
- Φ(z) as the AR characteristic polynomial
- Θ(z) as the MA characteristic polynomial
Thus, ARMA is written as Φ(B)Xt = Θ(B)εt .

2.5.1 Stationarity of MA(∞)



Xt = Σ_{j=−∞}^{∞} ψj εt−j

The above is an extended MA(∞) process if {εt} is weakly stationary with ACVS sτ^ε.
Notice the process is NOT necessarily weakly stationary. If the coefficients are absolutely
convergent / absolutely summable, Σ_j |ψj| < ∞, then the process is weakly stationary
with ACVS:

sτ^x = Σ_{j,k=−∞}^{∞} ψj ψk s^ε_{τ−(j−k)}

2.5.2 Invertibility of ARMA
For an ARMA process:

Φ(B) Xt = Θ(B) εt,    εt ∼ WN(0, σ²)

we say the process is invertible if there is a sequence of constants πj such that
Σ_{j=0}^{∞} |πj| < ∞ and:

εt = Σ_{j=0}^{∞} πj Xt−j

2.5.3 Summary
For ARMA(p, q):
stationarity implies absolute summability of the MA(∞) coefficients, but this is not true for other processes.

                AR(p)                      MA(q)                      ARMA(p, q)
  stationary    roots of Φ(z) = 0          always                     roots of Φ(z) = 0
                outside the unit circle                               outside the unit circle
  invertible    always                     roots of Θ(z) = 0          roots of Θ(z) = 0
                                           outside the unit circle    outside the unit circle
  ACF           exp. decay                 cut-off at lag q           exp. decay
  PACF          cut-off at lag p           exp. decay                 exp. decay
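
The root conditions in the table can be checked numerically. Below is a small sketch (not from the notes) that uses numpy to test whether all roots of Φ(z) = 0 and Θ(z) = 0 lie outside the unit circle; the helper name and example coefficients are ad hoc.

```python
import numpy as np

def roots_outside_unit_circle(coeffs):
    """Check a characteristic polynomial given as [1, c1, ..., ck] in
    increasing powers of z (AR side: 1 - phi1*z - ... ; MA side: 1 + theta1*z + ...)."""
    # np.roots expects coefficients from highest power to lowest
    roots = np.roots(coeffs[::-1])
    return np.all(np.abs(roots) > 1.0)

# ARMA(2, 1) with Phi(z) = 1 - 0.5 z - 0.3 z^2 and Theta(z) = 1 + 0.4 z
phi_poly = [1.0, -0.5, -0.3]   # AR characteristic polynomial coefficients
theta_poly = [1.0, 0.4]        # MA characteristic polynomial coefficients

print("stationary:", roots_outside_unit_circle(phi_poly))    # True here
print("invertible:", roots_outside_unit_circle(theta_poly))  # True here
```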

2.6 ARCH(p), Auto-Regressive Conditional Heteroscedasticity model of order p

An ARCH(p) model, denoted Xt ∼ ARCH(p), is defined as:

Xt = σt εt,    σt² = α0 + α1 Xt−1² + · · · + αp Xt−p²                    (5)

where {εt} is i.i.d. with zero mean and unit variance, denoted εt ∼ IID(0, 1). Furthermore,
α0, αp > 0, and the other αj ≥ 0. Usually we also assume Σ_{i=1}^{p} αi < 1 to
ensure stationarity, see property 1 below.

Properties

1. If Σ_{i=1}^{p} αi < 1, then Xt ∼ WN(0, σx²) where σx² = α0 / (1 − Σ_{i=1}^{p} αi).

2. {Xt} is not independent. Actually the squared process {Xt²} is an AR(p) by
defining the white noise νt := σt²(εt² − 1) (check!):

Xt² = νt + σt² = α0 + α1 Xt−1² + · · · + αp Xt−p² + νt.

2.7 GARCH(p, q), Generalised ARCH model of order p, q
A GARCH(p, q) model, denoted Xt ∼ GARCH(p, q), is defined as:

Xt = σt εt,    σt² = α0 + α1 Xt−1² + · · · + αp Xt−p² + β1 σt−1² + · · · + βq σt−q²                    (6)

where εt ∼ IID(0, 1), and α0, αp, βq > 0, and the other αj, βj ≥ 0. We also assume

Σ_{i=1}^{max(p,q)} (αi + βi) := Σ_{i=1}^{p} αi + Σ_{i=1}^{q} βi < 1.

Properties

1. By the assumed constraints on the coefficients, the process is stationary and Xt ∼
WN(0, σx²) where σx² = α0 / (1 − Σ_{i=1}^{p} αi − Σ_{i=1}^{q} βi).

2. {Xt²} is an ARMA(max(p, q), q) under certain conditions on finite moments. If
so, using the white noise νt defined for ARCH:

Xt² = νt + σt² = α0 + Σ_{i=1}^{p} αi Xt−i² + Σ_{i=1}^{q} βi (Xt−i² − νt−i) + νt
    = α0 + Σ_{i=1}^{max(p,q)} (αi + βi) Xt−i² + νt − Σ_{i=1}^{q} βi νt−i

3. For GARCH(1, 1), if we adopt Gaussian innovations {εt}, we can check that the
kurtosis is larger than 3, so the process cannot be Gaussian.
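
As an illustration (not part of the notes), here is a minimal GARCH(1, 1) simulation following definition (6), assuming Gaussian innovations; initialising σ0² at the unconditional variance from property 1 is a convenient but arbitrary choice.

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, T, seed=0):
    """Simulate X_t = sigma_t * eps_t with
    sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    assert alpha1 + beta1 < 1, "stationarity constraint from property 1"
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    sigma2 = np.zeros(T)
    sigma2[0] = alpha0 / (1 - alpha1 - beta1)   # unconditional variance sigma_x^2
    x[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * eps[t]
    return x, sigma2

x, sigma2 = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, T=1000)
print("sample variance:", x.var(), "vs sigma_x^2 =", 0.1 / (1 - 0.1 - 0.8))
```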

3 Non-stationary Time Series Models
3.1 Trend & Seasonality
The d-th order difference on a process {Xt} is defined recursively as:

∆^d Xt := ∆(∆^{d−1} Xt),    ∆^1 Xt = Xt − Xt−1

Thus, essentially ∆^d Xt = (1 − B)^d Xt with B as the backward shift operator.

The seasonal difference with period s on a process {Xt} is defined as:

∆_s Xt := Xt − Xt−s = (1 − B^s) Xt

Thus, if we want, ∆_s^d Xt = (1 − B^s)^d Xt.
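
For concreteness (not from the notes), ordinary and seasonal differencing can be sketched in a few lines of numpy; the period s = 12 and the toy series are arbitrary examples.

```python
import numpy as np

def difference(x, d=1):
    """Apply the d-th order difference (1 - B)^d to a series."""
    return np.diff(x, n=d)

def seasonal_difference(x, s=12):
    """Apply the seasonal difference (1 - B^s) to a series."""
    return x[s:] - x[:-s]

# Example: linear trend plus a period-12 seasonal pattern plus noise
t = np.arange(200)
x = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(0).normal(size=200)

y = seasonal_difference(difference(x, d=1), s=12)  # Delta_s Delta X_t
print(y.mean(), y.var())
```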

3.2 ARIMA: integrated ARMA


Usually we might find the process non-stationary. Denoting Xt ∼ I(d), we say a
process {Xt} is integrated of order d if there is a stationary {yt} such that:

∆^d Xt = yt

Thus, if the above yt ∼ ARMA(p, q), we say the process Xt ∼ ARIMA(p, d, q).

3.3 SARMA: seasonal ARMA (Only for those interested)

If a process {Xt} demonstrates seasonality with period s and its dependence across
seasonal lags is of ARMA form, we denote Xt ∼ ARMA(P, Q)s and formally:

Φ_P(B^s) Xt = Θ_Q(B^s) εt

where εt ∼ WN(0, σ²) and:

Φ_P(B^s) := 1 − ϕ1 B^s − ϕ2 B^{2s} − · · · − ϕ_P B^{Ps}
Θ_Q(B^s) := 1 + θ1 B^s + θ2 B^{2s} + · · · + θ_Q B^{Qs}

This is a pure seasonal ARMA model. Similar to ARMA, the process is stationary if
Φ_P(z^s) = 0 has all roots outside the unit circle, and it is invertible if Θ_Q(z^s) = 0 has
all roots outside the unit circle.

We extend the pure seasonal ARMA to the multiplicative SARMA model. We denote
Xt ∼ ARMA(p, q) × (P, Q)s if the model has the form:

Φ_P(B^s) Φ(B) Xt = Θ_Q(B^s) Θ(B) εt

where εt ∼ WN(0, σ²), and the operators are defined as in the pure seasonal ARMA
and ARMA models.

3.4 SARIMA: seasonal ARIMA

A general form of seasonal model is introduced by also taking seasonal differences; the
process is denoted Xt ∼ ARIMA(p, d, q) × (P, D, Q)s:

Φ_P(B^s) Φ(B) ∆_s^D ∆^d Xt = α + Θ_Q(B^s) Θ(B) εt

where εt ∼ WN(0, σ²).

For a pure seasonal AR or MA model, we have properties parallel to the ACF and
PACF properties of MA and AR models: the ACF cuts off at lag Qs for MA(Q)s, and
the PACF cuts off at lag Ps for AR(P)s.

4 Estimation: Time-Domain
4.1 Basic Quantities
For later estimation, we first define and discuss some basic quantities. We assume the
process {Xt } is weakly stationary, and we want to estimate the mean µ, ACVS sτ ,
ACF ρτ .

4.1.1 Sample Mean


Define the sample mean:

X̄ := (1/T) Σ_{t=1}^{T} Xt

First, notice it is an unbiased estimator. In the following, we focus on the variance
of the estimator:

var(X̄) = (1/T²) Σ_{i=1}^{T} Σ_{j=1}^{T} cov(Xi, Xj) = (1/T²) Σ_{i=1}^{T} Σ_{j=1}^{T} si−j                    (7)

We call a matrix a Toeplitz matrix if the entries are constant along each diagonal.
Specifically, here we define:

    [ s0      s1      · · ·  sT−1 ]
    [ s1      s0      · · ·  sT−2 ]
    [ ...     ...     · · ·  ...  ]
    [ sT−1    sT−2    · · ·  s0   ]

Thus, the summation in (7) is just the sum over all entries of this matrix, so we can
re-write it by summing over diagonals:

var(X̄) = (1/T²) Σ_{τ=−(T−1)}^{T−1} (T − |τ|) sτ = (1/T) Σ_{τ=−T}^{T} (1 − |τ|/T) sτ                    (8)

If {sτ} is absolutely summable (i.e. Σ_τ |sτ| < ∞, see 2.5.1), then as T → ∞ we can see
that var(X̄) → 0:

E[(X̄ − µ)²] → 0,

which essentially says X̄ converges to µ in mean square, and hence in probability.

4.1.2 Sample ACVS & ACF
Define the unbiased estimator for the ACVS, assuming µ is known:

sτ^(u) := (1/(T − |τ|)) Σ_{t=1}^{T−|τ|} (Xt − µ)(Xt+|τ| − µ)                    (9)

However, we almost never know µ, so (9) is adjusted, and the new estimator is biased:

sτ^(⋆) := (1/(T − |τ|)) Σ_{t=1}^{T−|τ|} (Xt − X̄)(Xt+|τ| − X̄)                    (10)

We propose another biased estimator which is often preferred:

ŝτ := (1/T) Σ_{t=1}^{T−|τ|} (Xt − X̄)(Xt+|τ| − X̄)                    (11)

for the following reasons:

1. MSE(ŝτ) ≤ MSE(sτ^(⋆)).

2. absolute summability: most processes have {sτ} absolutely summable, implying
sτ → 0 as τ → ∞. It is hence good to have an estimator that decreases with τ.

3. positive semi-definiteness: ŝτ ensures this property while sτ^(⋆) does not.

Lastly, we define the estimator for the ACF:

ρ̂τ := ŝτ / ŝ0                    (12)
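
A minimal numpy sketch (not part of the notes) of the preferred biased estimators (11) and (12); the function names are arbitrary.

```python
import numpy as np

def sample_acvs(x, max_lag):
    """Biased ACVS estimator (11): divide by T rather than T - |tau|."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    return np.array([
        np.sum((x[: T - tau] - xbar) * (x[tau:] - xbar)) / T
        for tau in range(max_lag + 1)
    ])

def sample_acf(x, max_lag):
    """ACF estimator (12): rho_hat_tau = s_hat_tau / s_hat_0."""
    s = sample_acvs(x, max_lag)
    return s / s[0]

# Example on white noise: sample ACF at lags >= 1 should be small
rng = np.random.default_rng(0)
print(sample_acf(rng.standard_normal(500), max_lag=5).round(3))
```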

4.2 MME: the Yule-Walker estimators for AR(p)
The Yule-Walker estimator is a type of Method of Moments Estimator (MME);
in the discussion below, we assume {Xt} is weakly stationary.

For an AR(p) process of the form (1) and τ > 0, we want to estimate a collection
of p + 1 parameters: {σε², ϕ1, . . . , ϕp}. Taking covariance with Xt−τ on both sides gives
(13); taking covariance with Xt instead gives (14):

sτ = ϕ1 sτ−1 + · · · + ϕp sτ−p                    (13)

s0 = ϕ1 s1 + · · · + ϕp sp + σε²                    (14)

For our p + 1 unknown parameters, we need p + 1 equations. Thus, taking τ = 1, . . . , p
in (13) together with (14), we have p + 1 Yule-Walker equations, in matrix form:

γp = Γp ϕ,    σε² = s0 − ϕ^T γp                    (15)

where ϕ := (ϕ1, . . . , ϕp)^T, γp := (s1, . . . , sp)^T and:

        [ s0      s1      · · ·  sp−1 ]
Γp :=   [ s1      s0      · · ·  sp−2 ]
        [ ...     ...     · · ·  ...  ]
        [ sp−1    sp−2    · · ·  s0   ]

From (15), we replace all ACVS terms by ŝτ in (11) and get the estimated parameters,
which guarantee a stationary AR(p):

ϕ̂ = Γ̂p^{−1} γ̂p,    σ̂ε² = ŝ0 − ϕ̂^T γ̂p                    (16)
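
A sketch (not part of the notes) of the Yule-Walker estimates (16) for a zero-mean AR(p), recomputing the biased ACVS (11) internally; the function name is ad hoc and no stationarity check is performed beyond solving the system.

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker (MME) estimates for a zero-mean AR(p), equation (16)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    # biased ACVS estimates s_hat_0, ..., s_hat_p as in (11)
    s = np.array([np.sum((x[: T - k] - xbar) * (x[k:] - xbar)) / T
                  for k in range(p + 1)])
    Gamma = np.array([[s[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma = s[1: p + 1]
    phi_hat = np.linalg.solve(Gamma, gamma)     # phi_hat = Gamma_p^{-1} gamma_p
    sigma2_hat = s[0] - phi_hat @ gamma         # sigma_hat^2 = s_hat_0 - phi_hat^T gamma_p
    return phi_hat, sigma2_hat

# Example: fit an AR(2) to an observed zero-mean series x
rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
print(yule_walker(x, p=2))
```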

4.3 Asymptotic results to test MA(q) & AR(p)
First, MA(q) could be estimated by the method of moments (MME); try it.

Asymptotic results of the sample ACF for MA(q)

For a zero-mean MA(q) process with εt ∼ IID(0, σ²) and E[εt⁴] < ∞, as T → ∞:

√T ρ̂τ →_D N(0, 1 + 2 Σ_{j=1}^{q} ρj²)   for all τ > q                    (17)

If the process is WN (q = 0), then by (17):

√T ρ̂τ →_D N(0, 1)   for all τ > 0                    (18)

Therefore, (−1.96/√T, +1.96/√T) is a 95% confidence interval on the sample ACF for
the process being a white noise. For general MA(q) processes, we often use the sample
ACF to replace the true ACF in (17).

Asymptotic results of the Yule-Walker estimators for AR(p)

√T (ϕ̂ − ϕ) →_D N(0, σε² Γp^{−1}),    √T (σ̂ε² − σε²) →_D N(0, 2σε⁴)                    (19)

If the true order of the AR is p, while we estimate an AR(h) process with h > p, then by
(19):

√T ϕ̂h →_D N(0, 1)                    (20)

Note the result does NOT hold for other k with p < k < h. Now, by the note in Def. 7
on the PACF, we have ϕhh = ϕh = 0 for the case in (20). Thus for a well-defined
sample PACF:

√T ϕ̂hh = √T ϕ̂h →_D N(0, 1)   ∀h > p                    (21)

Therefore, we have (−1.96/√T, +1.96/√T) as the 95% confidence bounds on the sample
PACF for the AR process having order ≤ p.

4.4 LSE: least square estimators for AR(p) & ARMA(p, q)

AR(p)
We consider the least square estimators (LSE) for AR(p) processes with zero mean.
Given data points {X1, . . . , XT}, write the equations in matrix form:

xF = F ϕ + εF                    (22)

where xF := (Xp+1, . . . , XT)^T, εF := (εp+1, . . . , εT)^T, and F is of dimension
(T − p) × p:

        [ Xp      Xp−1    · · ·  X1   ]
F :=    [ Xp+1    Xp      · · ·  X2   ]
        [ ...     ...     · · ·  ...  ]
        [ XT−1    XT−2    · · ·  XT−p ]

The forward least square estimator minimises the sum of squared errors:

ϕ̂OLS := argmin_ϕ ||εF||² = argmin_ϕ ||xF − F ϕ||² = (F^T F)^{−1} F^T xF                    (23)

Hence:

σ̂²OLS := ||xF − F ϕ̂OLS||² / ((T − p) − p)                    (24)

However, the above estimators do not ensure a weakly stationary process, which is
problematic for forecasting.
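
A sketch (not from the notes) of the forward least squares estimator (23)–(24) for a zero-mean AR(p), built with numpy's least-squares solver; variable names mirror the matrix notation above.

```python
import numpy as np

def ar_ols(x, p):
    """Forward OLS for a zero-mean AR(p): build F and x_F, then solve (23)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Column j of F holds lag-(j+1) values, so each row is (X_{t-1}, ..., X_{t-p})
    F = np.column_stack([x[p - 1 - j: T - 1 - j] for j in range(p)])
    xF = x[p:]
    phi_hat, *_ = np.linalg.lstsq(F, xF, rcond=None)
    resid = xF - F @ phi_hat
    sigma2_hat = np.sum(resid ** 2) / ((T - p) - p)   # equation (24)
    return phi_hat, sigma2_hat

# Example usage on any observed series x of length T > 2p
rng = np.random.default_rng(2)
x = rng.standard_normal(500)
print(ar_ols(x, p=2))
```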

ARMA(p, q)
We now consider LSE for general ARMA(p, q) processes. We only mention the idea, as
the details are left to computers. For a non-zero mean ARMA(p, q):

Xt = c + Σ_{j=1}^{p} bj Xt−j + εt + Σ_{i=1}^{q} ai εt−i                    (25)

With εt(c, a, b) denoting the value of εt implied by (c, a, b), where a := (a1, . . . , aq)^T and
b := (b1, . . . , bp)^T:

(ĉ, â, b̂) := argmin_{c,a,b} Σ_{t=p+1}^{T} [εt(c, a, b)]²
           = argmin_{c,a,b} Σ_{t=p+1}^{T} [Xt − c − Σ_{j=1}^{p} bj Xt−j − Σ_{i=1}^{q} ai εt−i(c, a, b)]²                    (26)

This is a nonlinear optimisation, but we can initialise (c0, a0, b0) and iterate until
convergence of our estimators.
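
A possible numerical implementation (not from the notes) of the idea in (26): compute εt(c, a, b) recursively with pre-sample errors set to zero (a conditional sum-of-squares convention, which is an assumption on my part), then hand the residuals to scipy's nonlinear least-squares routine. For simplicity the summation starts at max(p, q) rather than p + 1.

```python
import numpy as np
from scipy.optimize import least_squares

def arma_residuals(params, x, p, q):
    """Recursively compute eps_t(c, a, b) from (25), assuming pre-sample errors are zero."""
    c, b, a = params[0], params[1:1 + p], params[1 + p:]
    T = len(x)
    eps = np.zeros(T)
    m = max(p, q)
    for t in range(m, T):
        ar = b @ x[t - p:t][::-1] if p > 0 else 0.0    # b1*X_{t-1} + ... + bp*X_{t-p}
        ma = a @ eps[t - q:t][::-1] if q > 0 else 0.0  # a1*eps_{t-1} + ... + aq*eps_{t-q}
        eps[t] = x[t] - c - ar - ma
    return eps[m:]

def arma_lse(x, p, q):
    """Minimise the sum of squared eps_t(c, a, b) over (c, b, a), as in (26)."""
    x0 = np.zeros(1 + p + q)                 # initial guess (c0, b0, a0)
    fit = least_squares(arma_residuals, x0, args=(np.asarray(x, float), p, q))
    return fit.x                             # estimated (c, b1..bp, a1..aq)

rng = np.random.default_rng(3)
print(arma_lse(rng.standard_normal(400), p=1, q=1))
```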

4.5 MLE: MA(q) & AR(p) & ARMA(p, q)
We consider maximum likelihood estimators (MLE) for the three types of models.
The general idea is to write MA or ARMA models as AR models using the backward
shift operator, and then apply MLE to the AR model.

Only the example of AR(1) is discussed here; other cases are similar. Assume an AR(1)
model with zero mean, data (X0, X1, . . . , Xn), and εt being i.i.d. N(0, σ²). Thus we can
assume:

X0 ∼ N(0, σ² / (1 − ϕ²))                    (27)

Together with the fact that the conditional joint likelihood f(X1, . . . , Xn | X0) is Gaussian,
we obtain the log-likelihood function:

l(ϕ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) + (1/2) log(1 − ϕ²) − S(ϕ)/(2σ²)                    (28)

where we define S(ϕ):

S(ϕ) := Σ_{i=1}^{n} (Xi − ϕXi−1)² + (1 − ϕ²)X0² =: S⋆(ϕ) + (1 − ϕ²)X0²                    (29)

We have three methods for the optimisation:

1. exact likelihood: maximise l(ϕ, σ²)

2. unconditional likelihood: minimise S(ϕ)

3. conditional likelihood: minimise S⋆(ϕ)

The first method is nonlinear and requires numerical routines, but all three are
asymptotically equivalent: their asymptotic distribution coincides with that of the
Yule-Walker estimators.
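
One way (not from the notes) to carry out the exact-likelihood method numerically: maximise (28) over (ϕ, σ²) with a general-purpose optimiser, reparameterising σ² on the log scale for an unconstrained search; the optimiser choice and starting values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def ar1_exact_negloglik(params, x):
    """Negative of the exact log-likelihood (28) for a zero-mean AR(1).
    x = (X_0, X_1, ..., X_n); params = (phi, log_sigma2)."""
    phi, log_sigma2 = params
    if abs(phi) >= 1:
        return np.inf                        # stationarity needed for (27)
    sigma2 = np.exp(log_sigma2)
    n = len(x) - 1
    S = np.sum((x[1:] - phi * x[:-1]) ** 2) + (1 - phi ** 2) * x[0] ** 2   # (29)
    ll = (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
          + 0.5 * np.log(1 - phi ** 2) - S / (2 * sigma2))
    return -ll

def ar1_mle(x):
    res = minimize(ar1_exact_negloglik, x0=np.array([0.0, 0.0]),
                   args=(np.asarray(x, float),), method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])        # (phi_hat, sigma2_hat)

rng = np.random.default_rng(4)
print(ar1_mle(rng.standard_normal(300)))
```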

5 Model Selection & Forecasting
5.1 Techniques for model selection & diagnostics
For model selection, we can consider Akaike's Information Criterion (AIC),
defined as:

AIC := −2(Maximised Log-likelihood) + 2m                    (30)

where m is the number of estimated parameters. We aim to minimise the AIC, and the
2m term penalises large models. More advanced versions based on the AIC are the BIC
(or the AICc, which is not discussed here), with the BIC defined as:

BIC := −2(Maximised Log-likelihood) + m log(n),                    (31)

where n is the sample size (and hence will be T in our time series setting).

Next we might want to test whether a process, such as the residuals, is a white noise.
At large lags we expect to observe negligible ACF, and a formal test uses the
Ljung-Box-Pierce statistic:

Q⋆ := T(T + 2) Σ_{j=1}^{k} ρ̂j² / (T − j)                    (32)

where T is the sample size, ρ̂j is the sample ACF, and k is pre-chosen and represents
the number of lags we want to look at. As T → ∞, Q⋆ is approximately χ²_{k−m} distributed,
where m is the number of estimated parameters.
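
A sketch (not part of the notes) of the Ljung-Box-Pierce statistic (32) together with its approximate χ²_{k−m} p-value; the function name and defaults are ad hoc.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, k, m=0):
    """Ljung-Box-Pierce statistic (32) on a series x (e.g. model residuals),
    with k lags and m estimated parameters; returns (Q_star, p_value)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    s0 = np.sum((x - xbar) ** 2) / T
    rho = np.array([np.sum((x[: T - j] - xbar) * (x[j:] - xbar)) / T / s0
                    for j in range(1, k + 1)])          # sample ACF as in (12)
    Q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, k + 1)))
    p_value = chi2.sf(Q, df=k - m)                       # upper tail of chi^2_{k-m}
    return Q, p_value

# Example: white-noise residuals should give a large p-value
rng = np.random.default_rng(5)
print(ljung_box(rng.standard_normal(400), k=10, m=0))
```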

5.2 Forecasting: prediction equations (Only for those interested, except for MSPE definition)

We assume the process in focus is stationary. For prediction, we define the "best"
predictor as the one minimising the mean square prediction error (MSPE):

MSPE(Xt(l)) := E[(Xt(l) − Xt+l)²]                    (33)

where Xt(l) is the l-step prediction at time t.

For the rest of the notes, we denote the conditional expectation:

Et[·] := E[· | Ft]

where Ft represents the information of Xs for all s ≤ t.

Theorem 6.2.1.
Xt(l) = Et[Xt+l] minimises the MSPE. Proof omitted, but please try.

Now we concentrate only on linear predictors, of the form:

Xt(l) = α0 + Σ_{k=1}^{t} αk Xk                    (34)

By Theorem 6.2.1 above, we have the following.

Theorem 6.2.2.
Xt(l) is the best linear predictor (lowest MSPE among all linear predictors) of Xt+l if
and only if it satisfies:

E[(Xt(l) − Xt+l) Xj] = 0,   ∀j = 0, 1, . . . , t,                    (35)

with X0 = 1; the involved equations are called the prediction equations. The proof
is omitted here, but PLEASE TRY. The "if" part is to prove that Xt(l) has the lowest
MSPE among all linear predictors if the prediction equations are satisfied; the "only if"
part is shown by minimising the MSPE over all linear predictors (by differentiation), at
which point the prediction equations hold. (If the MSPE is minimised among all
predictors, the proof is easy: use the tower property of conditional expectation.)

Notice we can consider the de-meaned process {Xt − µ} and hence α0 in (34) could
be ignored (think about why). In practice, µ could be estimated by the sample mean
anyway. Next, (35) implies, for j = 1, . . . , t:

st+l−j = α1 sj−1 + α2 sj−2 + · · · + αt sj−t.                    (36)
In matrix form, we have:

γl = Γt α                    (37)

where γl := (sl, sl+1, . . . , st+l−1)^T, α := (αt, αt−1, . . . , α1)^T, and:

        [ s0      s1      · · ·  st−1 ]
Γt :=   [ s1      s0      · · ·  st−2 ]
        [ ...     ...     · · ·  ...  ]
        [ st−1    st−2    · · ·  s0   ]

By replacing all sj's by their sample estimates ŝτ in (11), we can obtain the coefficient
estimates:

α̂ = Γ̂t^{−1} γ̂l.
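
A rough sketch (not from the notes) of the prediction equations (37) in practice: de-mean the series, estimate the ACVS with (11), solve for α̂ and form the l-step predictor. Using all T observations and setting sample autocovariances at unavailable lags to zero are simplifications of my own.

```python
import numpy as np

def best_linear_predictor(x, l):
    """l-step best linear predictor via the prediction equations (37),
    with the biased sample ACVS (11) and the de-meaned series."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    xc = x - xbar
    # sample ACVS s_hat_0, ..., s_hat_{T+l-1}; lags >= T are set to 0
    s = np.zeros(T + l)
    for k in range(T):
        s[k] = np.sum(xc[: T - k] * xc[k:]) / T
    Gamma = np.array([[s[abs(i - j)] for j in range(T)] for i in range(T)])
    gamma_l = s[l: T + l]                       # (s_l, s_{l+1}, ..., s_{T+l-1})
    alpha = np.linalg.solve(Gamma, gamma_l)     # ordered (alpha_T, ..., alpha_1)
    # alpha is aligned with (X_T, X_{T-1}, ..., X_1), i.e. the reversed series
    return xbar + alpha @ xc[::-1]

rng = np.random.default_rng(6)
x = rng.standard_normal(100)
print(best_linear_predictor(x, l=1))
```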

5.3 Forecasting: ARMA(p, q)

Similarly, we assume we are forecasting a stationary ARMA process with zero mean and
IID innovations. Thus we have the MA(∞) representation:

Xt = Σ_{k=0}^{∞} ψk εt−k                    (38)

where εt ∼ IID(0, σ²). Then we easily have the l-step prediction:

Xt(l) = Et[ Σ_{k=0}^{∞} ψk εt+l−k ] = Σ_{k=l}^{∞} ψk εt+l−k = Σ_{j=0}^{∞} ψj+l εt−j                    (39)
We can observe that the ε terms from t + 1 to t + l are removed by the conditional
expectation, so the l-step prediction variance is the variance of those removed ε's:

σ²(l) := E[(Xt(l) − Xt+l)²] = E[( Σ_{k=0}^{l−1} ψk εt+l−k )²] = σ² Σ_{k=0}^{l−1} ψk²

Assuming the process is invertible, we can also express Xt(l) directly in terms of the
observed Xt's; details are not discussed here.
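
As a final illustration (not part of the notes), the ψ-weights in (38) can be computed recursively from the identity Φ(B)ψ(B) = Θ(B), i.e. ψ0 = 1 and ψj = θj + Σi ϕi ψj−i (with θj = 0 for j > q), giving the l-step prediction variance σ²(l); function names are arbitrary.

```python
import numpy as np

def psi_weights(phi, theta, n):
    """First n+1 psi-weights of the MA(infinity) representation (38),
    via Phi(B) * psi(B) = Theta(B)."""
    p, q = len(phi), len(theta)
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        psi[j] = (theta[j - 1] if j <= q else 0.0)
        psi[j] += sum(phi[i] * psi[j - 1 - i] for i in range(min(j, p)))
    return psi

def forecast_variance(phi, theta, sigma2, l):
    """l-step prediction variance sigma^2(l) = sigma^2 * sum_{k=0}^{l-1} psi_k^2."""
    psi = psi_weights(phi, theta, l - 1)
    return sigma2 * np.sum(psi ** 2)

# Example: ARMA(1, 1) with phi = 0.6, theta = 0.4, sigma^2 = 1
for l in (1, 2, 5):
    print(l, forecast_variance(phi=[0.6], theta=[0.4], sigma2=1.0, l=l))
```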

 
Version edited using LaTeX.
