ST304 Notes Zetai (v2)
Zetai Cen
* The contents of this file are extracted from the course materials, and hence the file only serves as a cookbook for ST304@LSE. See Moodle for more details of the course; all typos or inaccuracies are my responsibility.
Contents

1 Overview and Basics
2 Stationary Time Series Models
3 Non-stationary Time Series Models
4 Estimation: Time-Domain
  4.1 Basic Quantities
    4.1.1 Sample Mean
    4.1.2 Sample ACVS & ACF
  4.2 MME: the Yule-Walker estimators for AR(p)
  4.3 Asymptotic results to test MA(q) & AR(p)
  4.4 LSE: least squares estimators for AR(p) & ARMA(p, q)
  4.5 MLE: MA(q) & AR(p) & ARMA(p, q)
5 Model Selection & Forecasting
  5.1 Techniques for model selection & diagnostics
  5.2 Forecasting: prediction equations (Only for those interested, except for MSPE definition)
  5.3 Forecasting: ARMA(p, q)
1 Overview and Basics
On a suitable filtered probability space $(\mathbb{R}, \mathcal{B}, (\mathcal{B}_t)_t, \mathbb{P})$, we define a time series to be a discrete sequence of random variables X := {Xt , t ∈ Z}, or simply {Xt } if there is no ambiguity.
Definitions
\[
\rho_x(t, s) = \frac{\gamma_x(t, s)}{\sqrt{\gamma_x(t, t)\,\gamma_x(s, s)}}.
\]
Thus, for a weakly stationary, discrete and equally spaced series, we denote the autocovariance sequence (ACVS) and the ACF by:
\[
s_\tau := \gamma_x(t, t+\tau), \qquad
\rho_\tau := \frac{\gamma_x(t, t+\tau)}{\sqrt{\gamma_x(t, t)\,\gamma_x(t+\tau, t+\tau)}} = \frac{s_\tau}{s_0}.
\]
6. A Gaussian process is a process X such that, for any t1 , . . . , tn , the vector (Xt1 , . . . , Xtn ) has a multivariate normal distribution with finite means and covariances. Note that a weakly stationary Gaussian process is also strongly stationary.
Xt = Tt + St + Mt
The overall analysis can be dichotomised into the time domain and the frequency domain. The latter is preferred when periodicity is less apparent, and it is not considered in this course. After estimation, we discuss forecasting. Thus, in general we have:
\[
\underbrace{\text{modelling}}_{\text{Sec.\,2 \& Sec.\,3}} \;\to\; \underbrace{\text{estimation}}_{\text{Sec.\,4}} \;\to\; \underbrace{\text{model selection} \to \text{forecasting}}_{\text{Sec.\,5}}
\]
2 Stationary Time Series Models
A filtered series is a linear combination of time series variables. We start from white
noise, the building block of important models.
This could be written as (1) by assuming E[Xt ] = µ; taking expectations on both sides, we have α = µ(1 − ϕ1 − · · · − ϕp ).
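As a quick numerical check of this relation, here is a minimal simulation sketch (not from the notes), assuming an AR(1) with intercept, Xt = α + ϕ1 Xt−1 + εt , so that µ = α/(1 − ϕ1 ):

import numpy as np

# AR(1) with intercept: the sample mean should be close to mu = alpha / (1 - phi)
rng = np.random.default_rng(0)
phi, alpha, sigma, T = 0.6, 1.0, 1.0, 100_000
x = np.zeros(T)
eps = rng.normal(0.0, sigma, size=T)
for t in range(1, T):
    x[t] = alpha + phi * x[t - 1] + eps[t]

print(x.mean(), alpha / (1 - phi))   # both should be close to mu = 2.5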
2.5 Notes on ARMA
We define the backward shift operator B such that:
\[
B X_t = X_{t-1}, \qquad B^k X_t = X_{t-k}.
\]
In (4), we define:
- Φ(B) as the autoregressive operator
- Θ(B) as the moving average operator
Hence also define:
- Φ(z) as the AR characteristic polynomial
- Θ(z) as the MA characteristic polynomial
Thus, ARMA is written as Φ(B)Xt = Θ(B)εt .
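For instance (a standard special case, not spelled out in the notes), an ARMA(1, 1) model Xt = ϕ1 Xt−1 + εt + θ1 εt−1 reads in operator form as
\[
(1 - \phi_1 B)X_t = (1 + \theta_1 B)\varepsilon_t,
\qquad \Phi(z) = 1 - \phi_1 z, \quad \Theta(z) = 1 + \theta_1 z.
\]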
The above is an extended MA(∞) process if {εt } is weakly stationary with ACVS sετ . Notice the process is not necessarily weakly stationary. If the coefficients are absolutely convergent / absolutely summable, i.e. $\sum_j |\psi_j| < \infty$, then the process is weakly stationary with ACVS:
\[
s^x_\tau = \sum_{j,k=-\infty}^{\infty} \psi_j \psi_k \, s^{\varepsilon}_{\tau-(j-k)}.
\]
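As a quick check of this formula (a standard special case, not worked in the notes): if {εt } is white noise with variance σ², only the terms with k = j − τ survive, so
\[
s^x_\tau = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j-\tau};
\]
for an MA(1) with ψ0 = 1, ψ1 = θ and all other ψj = 0, this gives s0 = σ²(1 + θ²), s±1 = σ²θ and sτ = 0 for |τ | ≥ 2.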
2.5.2 Invertibility of ARMA
For an ARMA process:
We say the process is invertible if there is a sequence of constants {πj } such that $\sum_{j=0}^{\infty} |\pi_j| < \infty$ and:
\[
\varepsilon_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}.
\]
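As a standard illustration (not worked in the notes): an MA(1) process Xt = εt + θεt−1 is invertible precisely when |θ| < 1, since repeated substitution of εt−1 = Xt−1 − θεt−2 gives
\[
\varepsilon_t = X_t - \theta\varepsilon_{t-1} = \sum_{j=0}^{\infty}(-\theta)^j X_{t-j},
\]
so πj = (−θ)^j and $\sum_j |\pi_j| < \infty$ if and only if |θ| < 1.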
2.5.3 Summary
For ARMA(p, q):
stationarity −→ absolute summability, but this implication is not true for other processes.
2.6 ARCH(p), Autoregressive Conditional Heteroskedastic model of order p

An ARCH(p) model, denoted Xt ∼ ARCH(p), is defined as:
\[
X_t = \sigma_t \varepsilon_t, \qquad
\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \cdots + \alpha_p X_{t-p}^2 \tag{5}
\]
where {εt } is i.i.d. with zero mean and unit variance, denoted εt ∼ IID(0, 1). Furthermore, α0 , αp > 0 and the other αj ≥ 0. Usually we also assume $\sum_{i=1}^{p} \alpha_i < 1$ to ensure stationarity; see property 1 below.
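A minimal simulation sketch of an ARCH(1) (not from the notes), assuming Gaussian innovations and using the standard fact that the unconditional variance is α0 /(1 − α1 ) (presumably property 1 below):

import numpy as np

rng = np.random.default_rng(1)
alpha0, alpha1, T = 0.2, 0.5, 200_000
x = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    sigma2 = alpha0 + alpha1 * x[t - 1] ** 2          # conditional variance
    x[t] = np.sqrt(sigma2) * eps[t]

print(x.var(), alpha0 / (1 - alpha1))                 # both should be close to 0.4
print(np.corrcoef(x[1:], x[:-1])[0, 1])               # X_t is serially uncorrelated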
Properties
2.7 GARCH(p, q), Generalised ARCH model of order p, q
A GARCH(p, q) model, denoted Xt ∼ GARCH(p, q), is defined as:
\[
X_t = \sigma_t \varepsilon_t, \qquad
\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \cdots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_q \sigma_{t-q}^2 \tag{6}
\]
where εt ∼ IID(0, 1), α0 , αp , βq > 0, and the other αj , βj ≥ 0. We also assume
\[
\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) := \sum_{i=1}^{p} \alpha_i + \sum_{i=1}^{q} \beta_i < 1.
\]
Properties
3. For GARCH(1, 1), if we adopt Gaussian innovations {εt }, we can check that the kurtosis of Xt is larger than 3, so the process cannot be Gaussian (a quick numerical check is sketched below).
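A minimal sketch of that check (not from the notes), assuming Gaussian innovations:

import numpy as np

rng = np.random.default_rng(2)
alpha0, alpha1, beta1, T = 0.1, 0.15, 0.8, 300_000
x = np.zeros(T)
sigma2 = alpha0 / (1 - alpha1 - beta1)                # start at the unconditional variance
eps = rng.standard_normal(T)
for t in range(1, T):
    sigma2 = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2
    x[t] = np.sqrt(sigma2) * eps[t]

kurtosis = np.mean(x ** 4) / np.mean(x ** 2) ** 2
print(kurtosis)                                        # noticeably larger than 3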
3 Non-stationary Time Series Models
3.1 Trend & Seasonality
The d-th order difference on a process {Xt } is defined recursively as:
\[
\Delta^d X_t := \Delta(\Delta^{d-1} X_t), \qquad \Delta^1 X_t = X_t - X_{t-1}
\]
\[
\Delta_s X_t := X_t - X_{t-s} = (1 - B^s)X_t
\]
\[
\Delta^d X_t = y_t
\]
Thus, if the above yt ∼ ARMA(p, q), we say the process Xt ∼ ARIMA(p, d, q).
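A standard example (not worked in the notes): for a random walk Xt = Xt−1 + εt with εt ∼ WN(0, σ²),
\[
\Delta^1 X_t = X_t - X_{t-1} = \varepsilon_t \sim \mathrm{ARMA}(0, 0),
\]
so Xt ∼ ARIMA(0, 1, 0), even though Xt itself is not stationary.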
For a pure seasonal ARMA(P, Q)s model:
\[
\Phi_P(B^s) X_t = \Theta_Q(B^s)\varepsilon_t
\]
\[
\Phi_P(B^s) := 1 - \phi_1 B^s - \phi_2 B^{2s} - \cdots - \phi_P B^{Ps}, \qquad
\Theta_Q(B^s) := 1 + \theta_1 B^s + \theta_2 B^{2s} + \cdots + \theta_Q B^{Qs}
\]
We extend the pure seasonal ARMA to the multiplicative SARMA model. We denote Xt ∼ ARMA(p, q) × (P, Q)s if the model has the form:
\[
\Phi_P(B^s)\Phi(B) X_t = \Theta_Q(B^s)\Theta(B)\varepsilon_t
\]
where εt ∼ WN(0, σ²), and the other operators are defined in the same way as in the pure seasonal ARMA and ARMA models.
Allowing additionally for (seasonal) differencing and a constant, the multiplicative seasonal ARIMA model takes the form:
\[
\Phi_P(B^s)\Phi(B)\,\Delta_s^D \Delta^d X_t = \alpha + \Theta_Q(B^s)\Theta(B)\varepsilon_t
\]
where εt ∼ WN(0, σ²).
For a pure seasonal AR or MA model, we have properties parallel to the ACF and PACF properties for MA and AR models: the ACF cuts off at lag Qs for an MA(Q)s model, and the PACF cuts off at lag P s for an AR(P )s model.
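For example (a standard special case, not worked in the notes): the pure seasonal MA(1)12 process Xt = εt + Θ1 εt−12 has
\[
\rho_{12} = \frac{\Theta_1}{1 + \Theta_1^2}, \qquad \rho_\tau = 0 \ \text{for } \tau \neq 0, 12,
\]
so its ACF cuts off after lag Qs = 12, as claimed.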
4 Estimation: Time-Domain
4.1 Basic Quantities
For later estimation, we first define and discuss some basic quantities. We assume the
process {Xt } is weakly stationary, and we want to estimate the mean µ, ACVS sτ ,
ACF ρτ .
4.1.1 Sample Mean

We call a matrix a Toeplitz matrix if the entries are the same along each diagonal. Specifically, here we define:
\[
\begin{pmatrix}
s_0 & s_1 & \cdots & s_{T-1} \\
s_1 & s_0 & \cdots & s_{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
s_{T-1} & s_{T-2} & \cdots & s_0
\end{pmatrix}
\]
Thus, the summation in (7) simply sums over all the entries of this matrix, so we can re-write it by summing along the diagonals:
\[
\operatorname{var}(\bar{X}) = \frac{1}{T^2} \sum_{\tau=-(T-1)}^{T-1} (T - |\tau|)\, s_\tau
= \frac{1}{T} \sum_{\tau=-T}^{T} \Big(1 - \frac{|\tau|}{T}\Big) s_\tau \tag{8}
\]
If {sτ } is absolutely summable (i.e. $\sum_\tau |s_\tau| < \infty$, see 2.5.1), then as T → ∞ we can see that var(X̄) → 0:
\[
E\big[(\bar{X} - \mu)^2\big] \to 0,
\]
which essentially says that X̄ converges to µ in mean square, and hence in probability.
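A minimal Monte Carlo check of (8) (not from the notes), assuming an AR(1) process Xt = ϕXt−1 + εt , whose true ACVS is sτ = σ²ϕ^|τ| /(1 − ϕ²):

import numpy as np

rng = np.random.default_rng(3)
phi, sigma, T, reps = 0.5, 1.0, 200, 10_000

# theoretical var(X_bar) from (8)
taus = np.arange(-(T - 1), T)
s = sigma**2 * phi**np.abs(taus) / (1 - phi**2)
var_theory = np.sum((T - np.abs(taus)) * s) / T**2

# empirical variance of the sample mean over many replications
eps = rng.normal(0.0, sigma, size=(reps, T))
x = np.zeros((reps, T))
x[:, 0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), size=reps)   # start in stationarity
for t in range(1, T):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]

print(var_theory, x.mean(axis=1).var())                              # should be close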
4.1.2 Sample ACVS & ACF
Define the unbiased estimator for ACVS, assuming µ is known:
\[
s^{(u)}_\tau := \frac{1}{T - |\tau|} \sum_{t=1}^{T-|\tau|} (X_t - \mu)(X_{t+|\tau|} - \mu) \tag{9}
\]
However, we almost never know µ, so (9) is adjusted; the new estimator is biased:
\[
s^{(\star)}_\tau := \frac{1}{T - |\tau|} \sum_{t=1}^{T-|\tau|} (X_t - \bar{X})(X_{t+|\tau|} - \bar{X}) \tag{10}
\]
We propose another biased estimator, which is often preferred:
\[
\hat{s}_\tau := \frac{1}{T} \sum_{t=1}^{T-|\tau|} (X_t - \bar{X})(X_{t+|\tau|} - \bar{X}) \tag{11}
\]
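A minimal implementation sketch of (11) (not from the notes): the biased sample ACVS with the 1/T normalisation, and the corresponding sample ACF:

import numpy as np

def sample_acvs(x, tau):
    """hat{s}_tau as in (11): divide by T rather than by T - |tau|."""
    x = np.asarray(x, dtype=float)
    T, k = len(x), abs(tau)
    xbar = x.mean()
    return np.sum((x[:T - k] - xbar) * (x[k:] - xbar)) / T

def sample_acf(x, max_lag):
    s0 = sample_acvs(x, 0)
    return np.array([sample_acvs(x, k) / s0 for k in range(max_lag + 1)])

# usage: for i.i.d. noise the sample ACF at non-zero lags should be near zero
x = np.random.default_rng(4).standard_normal(500)
print(sample_acf(x, 5))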
4.2 MME: the Yule-Walker estimators for AR(p)
The Yule-Walker estimator is a type of method of moments estimator (MME); in the discussion below, we assume {Xt } is weakly stationary.
For an AR(p) process in the form (1) and τ > 0, we want to estimate a collection of p + 1 parameters: $\{\sigma_\varepsilon^2, \phi_1, \ldots, \phi_p\}$. Taking the covariance with Xt−τ on both sides, we have (13); taking the covariance with Xt instead, we have (14):
\[
s_\tau = \phi_1 s_{\tau-1} + \cdots + \phi_p s_{\tau-p}, \tag{13}
\]
\[
s_0 = \phi_1 s_1 + \cdots + \phi_p s_p + \sigma_\varepsilon^2. \tag{14}
\]
For our p + 1 unknown parameters, we need p + 1 equations. Thus, taking τ = 1, . . . , p in (13), we have the p + 1 Yule-Walker equations, in matrix form:
\[
\gamma_p = \Gamma_p \phi, \qquad \sigma_\varepsilon^2 = s_0 - \phi^T \gamma_p \tag{15}
\]
where γp := (s1 , . . . , sp )T and Γp is the p × p Toeplitz matrix with (i, j) entry s|i−j| .
From (15), we replace all ACVS values by ŝτ in (11) and obtain the estimated parameters, which guarantee a stationary AR(p):
\[
\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_p, \qquad \hat{\sigma}_\varepsilon^2 = \hat{s}_0 - \hat{\phi}^T \hat{\gamma}_p \tag{16}
\]
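A minimal sketch of the Yule-Walker estimator (16) (not from the notes); scipy.linalg.toeplitz is used to build Γ̂p from the biased sample ACVS ŝτ in (11):

import numpy as np
from scipy.linalg import toeplitz

def yule_walker(x, p):
    x = np.asarray(x, dtype=float)
    T, xbar = len(x), x.mean()
    s = np.array([np.sum((x[:T - k] - xbar) * (x[k:] - xbar)) / T for k in range(p + 1)])
    Gamma = toeplitz(s[:p])               # p x p matrix with entries hat{s}_{|i-j|}
    gamma = s[1:p + 1]                    # (hat{s}_1, ..., hat{s}_p)
    phi_hat = np.linalg.solve(Gamma, gamma)
    sigma2_hat = s[0] - phi_hat @ gamma
    return phi_hat, sigma2_hat

# usage: fit an AR(2) to simulated data with phi = (0.5, -0.3), sigma^2 = 1
rng = np.random.default_rng(5)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()
print(yule_walker(x, 2))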
4.3 Asymptotic results to test MA(q) & AR(p)
First, MA(q) could be estimated by the method of moments (MME); try it.
Therefore, $\big(-\tfrac{1.96}{\sqrt{T}},\, +\tfrac{1.96}{\sqrt{T}}\big)$ is a 95% confidence interval on the sample ACF for the process being a white noise. For common MA(q) processes, we often use the sample ACF to replace the real ACF in (17).
If the true order of the AR process is p, while we estimate an AR(h) process with h > p, then by (19):
\[
\sqrt{T}\,\hat{\phi}_h \xrightarrow{D} N(0, 1) \tag{20}
\]
Note the result does NOT hold for other k with p < k < h. Now, by the note in Def. 7 on PACF, we have the PACF ϕhh = ϕh = 0 for the case in (20). Thus, for a well-defined sample PACF:
\[
\sqrt{T}\,\hat{\phi}_{hh} = \sqrt{T}\,\hat{\phi}_h \xrightarrow{D} N(0, 1) \quad \forall h > p \tag{21}
\]
Therefore, we have $\big(-\tfrac{1.96}{\sqrt{T}},\, +\tfrac{1.96}{\sqrt{T}}\big)$ as the 95% confidence bounds on the sample PACF for the AR process having order ≤ p.
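A minimal sketch of the sample PACF (not from the notes): ϕ̂hh is taken as the last Yule-Walker coefficient of a fitted AR(h), for h = 1, 2, . . ., and compared with the ±1.96/√T bounds from (21):

import numpy as np
from scipy.linalg import toeplitz

def sample_pacf(x, max_lag):
    x = np.asarray(x, dtype=float)
    T, xbar = len(x), x.mean()
    s = np.array([np.sum((x[:T - k] - xbar) * (x[k:] - xbar)) / T for k in range(max_lag + 1)])
    pacf = []
    for h in range(1, max_lag + 1):
        phi = np.linalg.solve(toeplitz(s[:h]), s[1:h + 1])   # Yule-Walker fit of AR(h)
        pacf.append(phi[-1])                                  # hat{phi}_{hh}
    return np.array(pacf)

rng = np.random.default_rng(6)
x = np.zeros(3000)
for t in range(1, 3000):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()             # an AR(1), so p = 1

print(sample_pacf(x, 6))            # only the lag-1 value should clearly exceed the bound
print(1.96 / np.sqrt(len(x)))       # approximate 95% bound from (21)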
4.4 LSE: least squares estimators for AR(p) & ARMA(p, q)
AR(p)
We consider the least squares estimators (LSE) for AR(p) processes with zero mean. Given data points {X1 , . . . , XT }, we write the equations in matrix form:
\[
x_F = F\phi + \varepsilon_F \tag{22}
\]
The forward least squares estimator minimises the sum of squared errors:
\[
\hat{\phi}_{OLS} := \operatorname*{argmin}_{\phi} \|x_F - F\phi\|^2 = (F^T F)^{-1} F^T x_F \tag{23}
\]
Hence:
\[
\hat{\sigma}_{OLS}^2 := \frac{\|x_F - F\hat{\phi}_{OLS}\|^2}{(T - p) - p} \tag{24}
\]
However, the above estimators do not ensure a weakly stationary process, which is problematic for forecasting.
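A minimal sketch of the forward LSE for a zero-mean AR(p) (not from the notes): regress Xt on (Xt−1 , . . . , Xt−p ) for t = p + 1, . . . , T by ordinary least squares:

import numpy as np

def ar_lse(x, p):
    x = np.asarray(x, dtype=float)
    T = len(x)
    F = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])   # lagged design matrix
    xF = x[p:]                                                       # responses X_{p+1}, ..., X_T
    phi_hat, _, _, _ = np.linalg.lstsq(F, xF, rcond=None)
    resid = xF - F @ phi_hat
    sigma2_hat = np.sum(resid ** 2) / ((T - p) - p)                  # as in (24)
    return phi_hat, sigma2_hat

# usage: fit an AR(1) with phi = 0.7, sigma^2 = 1
rng = np.random.default_rng(7)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(ar_lse(x, 1))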
ARMA(p, q)
We now consider the LSE for general ARMA(p, q) processes. We only mention the idea, as the details are left to the computer. For a non-zero mean ARMA(p, q):
\[
X_t = c + \sum_{j=1}^{p} b_j X_{t-j} + \varepsilon_t + \sum_{i=1}^{q} a_i \varepsilon_{t-i} \tag{25}
\]
With εt (c, a, b) denoting the value of εt given (c, a, b), where a := (a1 , . . . , aq )T and b := (b1 , . . . , bp )T :
\[
(\hat{c}, \hat{a}, \hat{b}) := \operatorname*{argmin}_{c,a,b} \sum_{t=p+1}^{T} \big[\varepsilon_t(c, a, b)\big]^2
= \operatorname*{argmin}_{c,a,b} \sum_{t=p+1}^{T} \Big[X_t - c - \sum_{j=1}^{p} b_j X_{t-j} - \sum_{i=1}^{q} a_i \varepsilon_{t-i}(c, a, b)\Big]^2 \tag{26}
\]
It is a nonlinear optimisation, but we can initialise at (c0 , a0 , b0 ) and iterate until the estimators converge (see the sketch below).
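A minimal sketch of this procedure for an ARMA(1, 1) (not from the notes): the residuals εt (c, a, b) are built recursively (with the pre-sample residual set to 0), and the sum of squares (26) is minimised numerically with scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

def arma11_sse(params, x):
    c, b1, a1 = params
    eps = np.zeros(len(x))
    for t in range(1, len(x)):
        eps[t] = x[t] - c - b1 * x[t - 1] - a1 * eps[t - 1]
    return np.sum(eps[1:] ** 2)

# simulate an ARMA(1,1) with c = 1, b1 = 0.6, a1 = 0.3
rng = np.random.default_rng(8)
T, e = 3000, rng.standard_normal(3000)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 1.0 + 0.6 * x[t - 1] + e[t] + 0.3 * e[t - 1]

res = minimize(arma11_sse, x0=np.array([0.0, 0.1, 0.1]), args=(x,), method="Nelder-Mead")
print(res.x)                                           # roughly (1.0, 0.6, 0.3)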
4.5 MLE: MA(q) & AR(p) & ARMA(p, q)
We consider maximum likelihood estimators (MLE) for the three types of mod-
els. The general idea is to write MA or ARMA models as AR models by using the
backward shift operator, and we use MLE on the AR model.
Only the AR(1) example is discussed here; other cases are similar. Assume an AR(1) model with zero mean, with data (X0 , X1 , . . . , Xn ), where the εt are i.i.d. N(0, σ²). Thus we can assume:
\[
X_0 \sim N\Big(0, \frac{\sigma^2}{1 - \phi^2}\Big) \tag{27}
\]
Together with the fact that the conditional joint likelihood f (X1 , . . . , Xn |X0 ) is Gaussian, we obtain the log-likelihood function:
\[
l(\phi, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) + \frac{1}{2}\log(1 - \phi^2) - \frac{1}{2\sigma^2} S(\phi) \tag{28}
\]
where we define S(ϕ):
\[
S(\phi) := \sum_{i=1}^{n} (X_i - \phi X_{i-1})^2 + (1 - \phi^2)X_0^2 =: S^\star(\phi) + (1 - \phi^2)X_0^2 \tag{29}
\]
The first method is nonlinear and requires numerical routines, but all three are asymptotically equivalent, since their asymptotic distribution coincides with the asymptotic result for the Yule-Walker estimators.
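A minimal numerical sketch (not from the notes): maximise the exact AR(1) log-likelihood (28)-(29) over (ϕ, σ²) with scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    phi, log_sigma2 = params                    # parameterise sigma^2 on the log scale
    if abs(phi) >= 1:
        return np.inf
    sigma2 = np.exp(log_sigma2)
    n = len(x) - 1                              # data are (X_0, X_1, ..., X_n)
    S = np.sum((x[1:] - phi * x[:-1]) ** 2) + (1 - phi ** 2) * x[0] ** 2
    l = (-n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2)
         + 0.5 * np.log(1 - phi ** 2) - S / (2 * sigma2))
    return -l

# simulate an AR(1) with phi = 0.6, sigma^2 = 1, started in stationarity
rng = np.random.default_rng(9)
x = np.zeros(2000)
x[0] = rng.normal(0.0, 1.0 / np.sqrt(1 - 0.6 ** 2))
for t in range(1, 2000):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(x,), method="Nelder-Mead")
print(res.x[0], np.exp(res.x[1]))               # roughly 0.6 and 1.0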
5 Model Selection & Forecasting
5.1 Techniques for model selection & diagnostics
For model selection, we can consider Akaike's Information Criterion (AIC), defined as:
\[
\mathrm{AIC} := -2(\text{Maximised Log-likelihood}) + 2m \tag{30}
\]
where m is the number of estimated parameters. We hope to minimise the AIC, and the 2m term penalises large models. A more advanced version based on the AIC is the BIC (or the AICC, which is not discussed here), defined as:
\[
\mathrm{BIC} := -2(\text{Maximised Log-likelihood}) + m\log(n) \tag{31}
\]
where n is the sample size (and hence equals T in our time series setting).
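A minimal helper (not from the notes) computing (30) and (31) from a maximised log-likelihood, the number of estimated parameters m and the sample size n:

import numpy as np

def aic(loglik, m):
    return -2.0 * loglik + 2.0 * m

def bic(loglik, m, n):
    return -2.0 * loglik + m * np.log(n)

# usage: e.g. with the AR(1) MLE sketch above, loglik = -res.fun, m = 2, n = len(x)
print(aic(-1500.0, 2), bic(-1500.0, 2, 1000))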
Next, we might want to test whether a process, such as the residuals, is white noise. At large lags we expect to observe negligible ACF; a formal test uses the Ljung-Box-Pierce statistic:
\[
Q^\star := T(T + 2)\sum_{j=1}^{k} \frac{\hat{\rho}_j^2}{T - j} \tag{32}
\]
where T is the sample size, ρ̂j is the sample ACF, and k is pre-chosen and represents the number of lags we want to look at. As T → ∞, Q⋆ approximately follows a $\chi^2_{k-m}$ distribution, where m is the number of estimated parameters.
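A minimal sketch of the Ljung-Box-Pierce test (32) (not from the notes), using scipy.stats.chi2 for the approximate p-value:

import numpy as np
from scipy.stats import chi2

def ljung_box(x, k, m=0):
    x = np.asarray(x, dtype=float)
    T, xbar = len(x), x.mean()
    s0 = np.sum((x - xbar) ** 2) / T
    rho = np.array([np.sum((x[:T - j] - xbar) * (x[j:] - xbar)) / T / s0
                    for j in range(1, k + 1)])                    # sample ACF hat{rho}_j
    Q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, k + 1)))
    pval = chi2.sf(Q, df=k - m)                                   # approximate chi^2_{k-m} tail
    return Q, pval

# usage: white noise should give a large p-value
x = np.random.default_rng(10).standard_normal(1000)
print(ljung_box(x, k=10))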
5.2 Forecasting: prediction equations (Only for those interested, except for MSPE definition)

We write
\[
E_t[\,\cdot\,] := E[\,\cdot \mid \mathcal{F}_t],
\]
where Ft represents the information of Xs for all s ≤ t.
Theorem 6.2.1. Xt (l) = Et [Xt+l ] minimises the MSPE. The proof is omitted, but please try it.
Notice we can consider the de-meaned process {Xt − µ} and hence α0 in (34) could
be ignored (think about why?). In practice, µ could be estimated by the sample mean
anyway. Next, (35) implies, for j = 0, 1, . . . , t:
\[
s_{t+l-j} = \alpha_1 s_{j-1} + \alpha_2 s_{j-2} + \cdots + \alpha_t s_{j-t}. \tag{36}
\]
In matrix form, we have:
\[
\gamma_l = \Gamma_t \alpha \tag{37}
\]
By replacing all the sj 's by their sample estimates ŝτ in (11), we can obtain the coefficient estimates:
\[
\hat{\alpha} = \hat{\Gamma}_t^{-1} \hat{\gamma}_l.
\]
Assuming the process is invertible, we can also express Xt (l) directly using the Xt 's; the details are not discussed here.
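A minimal sketch of the prediction equations (37) (not from the notes): Γ̂t is built from the biased sample ACVS ŝτ , the system is solved for α̂, and the forecast is formed from the de-meaned observations. Here the convention (an assumption of this sketch) is that α̂k multiplies the k-th most recent observation, and ŝk is taken as 0 for lags beyond the sample:

import numpy as np
from scipy.linalg import toeplitz

def forecast(x, l):
    """Forecast X_{T+l} from X_1, ..., X_T via hat{alpha} = hat{Gamma}_T^{-1} hat{gamma}_l."""
    x = np.asarray(x, dtype=float)
    T, xbar = len(x), x.mean()
    s = np.zeros(T + l)                                  # hat{s}_k = 0 beyond the sample
    for k in range(T):
        s[k] = np.sum((x[:T - k] - xbar) * (x[k:] - xbar)) / T
    Gamma = toeplitz(s[:T])                              # entries hat{s}_{|i-j|}
    gamma_l = s[l:l + T]                                 # (hat{s}_l, ..., hat{s}_{l+T-1})
    alpha = np.linalg.solve(Gamma, gamma_l)
    return xbar + alpha @ (x[::-1] - xbar)               # most recent observation first

# usage: for an AR(1) the one-step forecast should be close to phi * X_T
rng = np.random.default_rng(11)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
print(forecast(x, 1), 0.8 * x[-1])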
Version edited using LaTeX.