
Time Series

Preliminaries

Gaussian Conditional Distribution

Proposition 0.1. Let 𝑋 = (𝑋1 , 𝑋2 , ..., 𝑋𝑛+1 ) be a multivariate gaussian random vector with
mean vector 𝜇, covariance matrix Σ, and pdf defined as

$$\mathcal{N}(\mu, \Sigma) = (2\pi)^{-\frac{n+1}{2}} \, |\Sigma|^{-\frac{1}{2}} \exp\Big(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Big)$$

where |Σ| denotes the determinant.


Then given observations (𝑥1 , 𝑥2 , ...𝑥𝑛 ), and rewriting the covariance matrix in block form,

$$\Sigma = \begin{bmatrix} \Sigma_{1:n,1:n} & \Sigma_{1:n,n+1} \\ \Sigma_{n+1,1:n} & \Sigma_{n+1,n+1} \end{bmatrix}$$

the distribution of $X_{n+1}$ conditioned on $X_1 = x_1, ..., X_n = x_n$ is also gaussian, with mean

$$\mu_{n+1} + \Sigma_{1:n,n+1}^T \, \Sigma_{1:n,1:n}^{-1} (x_{1:n} - \mu_{1:n})$$

and variance

$$\Sigma_{n+1,n+1} - \Sigma_{1:n,n+1}^T \, \Sigma_{1:n,1:n}^{-1} \, \Sigma_{1:n,n+1}$$
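
A minimal numpy sketch of Proposition 0.1 for a 3-dimensional example, conditioning the last coordinate on the first two; the mean vector, covariance matrix, and observed values below are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative 3-dimensional gaussian: condition X_3 on (X_1, X_2) = (x_1, x_2).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])
x_obs = np.array([0.8, 2.4])            # observed (x_1, x_2)

n = len(x_obs)
S11 = Sigma[:n, :n]                     # Sigma_{1:n,1:n}
S12 = Sigma[:n, n]                      # Sigma_{1:n,n+1}
S22 = Sigma[n, n]                       # Sigma_{n+1,n+1}

# Conditional mean and variance from Proposition 0.1
cond_mean = mu[n] + S12 @ np.linalg.solve(S11, x_obs - mu[:n])
cond_var = S22 - S12 @ np.linalg.solve(S11, S12)
print(cond_mean, cond_var)
```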

Stochastic process

• a sequence of random variables $X_1, X_2, ..., X_n$, denoted from now on by $(X_t)$

• Time series models treat a sequence of observations $x_1, x_2, ..., x_n$ as a realization of
the stochastic process and specify its joint distribution

ℙ{𝑋1 ≤ 𝑥1 , 𝑋2 ≤ 𝑥2 , .., 𝑋𝑛 ≤ 𝑥𝑛 }

• Moments of a stochastic process are functions

| function | definition |
|---|---|
| mean | $\mu_X(t) := \mathbb{E}[X_t]$ |
| autocovariance | $\gamma_X(s,t) := \mathrm{Cov}[X_s, X_t]$ |
| cross-covariance | $\gamma_{XY}(s,t) := \mathrm{Cov}[X_s, Y_t]$ |

Properties of stationary processes

• Causality
• Invertibility

A stochastic process is invertible if

$$\sum_{j=0}^{\infty} \pi_j (X_{t-j} - \mu) = W_t \qquad \text{with} \qquad \sum_{j=0}^{\infty} |\pi_j| < \infty$$

Stationarity

• A stochastic process is strongly stationary if…


• A stochastic process is weakly stationary if

| function | properties | simplifies to |
|---|---|---|
| mean | $\mu_X(t)$ is independent of $t$ | $\mu_X$ |
| autocovariance | $\gamma_X(t, t+h)$ exists for all $t$ and $h$ and is independent of $t$ | $\gamma_X(h) = \mathrm{Cov}(X_{t+h}, X_t)$ |
| autocorrelation | even function | $\rho_X(h) = \gamma_X(h)/\gamma_X(0)$ |
| cross-correlation | if $(X_t), (Y_t)$ stationary, $\gamma_{X,Y}(h) = \gamma_{X,Y}(t+h, t)$ is independent of $t$ | $\rho_{X,Y}(h) = \gamma_{XY}(h)/\sqrt{\gamma_X(0)\gamma_Y(0)}$ |

• There are two types of non-stationarity

1. Trend stationarity

$$X_t = f_t + Y_t$$
where $f_t$ is a deterministic sequence and $Y_t$ is a stationary process. The series can be made stationary by
estimating the $f_t$ component and subtracting it from $X_t$.

2. Unit roots

Intuition: processes that become stationary after taking $d$ differences are said to have $d$ unit
roots. Example: the first difference of a random walk is stationary, so a random walk has 1 unit root.

Moments Estimation

Big idea: the sample mean, autocovariance, and autocorrelation are consistent and asymptotically normal estimators of the corresponding population quantities.

Theorem 0.1. If (𝑋𝑡 ) is a stationary linear process, then


$$\sqrt{n}\,(\bar{X}_n - \mu) \to_d N(0, V)$$

where

$$V = \sum_{h=-\infty}^{\infty} \gamma_X(h) = \sigma^2 \Big(\sum_{j=-\infty}^{\infty} \psi_j\Big)^2$$

If $(X_t)$ is a linear process with gaussian white noise, then

$$\bar{X}_n \sim N\Big(\mu, \; \frac{1}{n} \sum_{|h|<n} \Big(1 - \frac{|h|}{n}\Big) \gamma_X(h)\Big)$$

Theorem 0.2. Denote $\rho_{1:k} = (\rho(1), ..., \rho(k))$ and $\hat{\rho}_{1:k} = (\hat{\rho}(1), ..., \hat{\rho}(k))$.
If $(X_t)$ is a stationary linear process with white noise and finite fourth moments, then for any
fixed $k$,

$$\sqrt{n}\,(\hat{\rho}_{1:k} - \rho_{1:k}) \to_d N(0, V)$$

where

$$V_{ij} = \sum_{u=1}^{\infty} \big(\rho(u+i) + \rho(u-i) - 2\rho(i)\rho(u)\big)\big(\rho(u+j) + \rho(u-j) - 2\rho(j)\rho(u)\big)$$

If $(X_t)$ is white noise, then for any fixed $k$,

$$\sqrt{n}\,\hat{\rho}_{1:k} \to_d N(0, I)$$
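
A small numpy sketch that computes the sample ACF and compares it against the $\pm 1.96/\sqrt{n}$ bands implied by the white-noise case of Theorem 0.2; the simulated series is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)                 # illustrative white-noise sample

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n
    return np.array([np.sum(xc[h:] * xc[:n - h]) / (n * gamma0)
                     for h in range(1, max_lag + 1)])

rho_hat = sample_acf(x, 20)
band = 1.96 / np.sqrt(len(x))            # pointwise limit bands under H0: white noise
print(np.sum(np.abs(rho_hat) > band), "of 20 lags outside the bands")
```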

Conditional distribution under stationarity

Corollary 0.1. If (𝑋𝑡 ) is a mean zero Gaussian process, then given (𝑋1 = 𝑥1 , 𝑋2 =
𝑥2 , ..., 𝑋𝑛 = 𝑥𝑛 ), Proposition 0.1 tells us that the conditional distribution of 𝑋𝑛+ℎ is a gaussian
with mean

$$\mu_{X_{n+h} \,|\, X_1 = x_1, ..., X_n = x_n} = \gamma_{h:n+h-1}^T \, \Gamma_n^{-1} \, x_{n:1}$$

and variance

$$\gamma_X(0) - \gamma_{h:n+h-1}^T \, \Gamma_n^{-1} \, \gamma_{h:n+h-1}$$

where

$$\gamma_{a:b} = (\gamma(a), \gamma(a+1), ..., \gamma(b)) \quad \text{for } a < b,$$

$$\Gamma_n = \begin{bmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0) \end{bmatrix}$$

Corollary 0.1 provides the distributional forecast: the point forecast is the conditional mean,
and the prediction intervals are the corresponding quantiles of the conditional distribution.

ARMA(p,q) Models

These are a broad class of stationary stochastic processes.


The autoregressive polynomial is defined as

𝜙(𝑧) = 1 − 𝜙1 𝑧 − 𝜙2 𝑧2 − ... − 𝜙𝑝 𝑧𝑝

Note: use Φ(𝑧) to denote its seasonal counterpart


The moving average polynomial is defined as

$$\theta(z) = 1 + \theta_1 z + \theta_2 z^2 + ... + \theta_q z^q$$

Note: use Θ(𝑧) to denote its seasonal counterpart


Using the above definitions, write

| Type | Equation |
|---|---|
| (p,0) | $\phi(B)X_t = \alpha + W_t$ |
| (0,q) | $X_t = \alpha + \theta(B)W_t$ |
| (p,q) | $\phi(B)X_t = \alpha + \theta(B)W_t$ |
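
A quick simulation sketch for the (p,q) row above: the recursion $X_t = \alpha + \sum_i \phi_i X_{t-i} + W_t + \sum_j \theta_j W_{t-j}$ is the ARMA equation rearranged, and the parameter values, burn-in length, and seed are arbitrary.

```python
import numpy as np

def simulate_arma(phi, theta, alpha, n, sigma=1.0, burn=200, seed=0):
    """Simulate an ARMA(p,q) path from phi(B)X_t = alpha + theta(B)W_t."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    w = rng.normal(scale=sigma, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * w[t - 1 - j] for j in range(q))
        x[t] = alpha + ar + w[t] + ma
    return x[burn:]                      # drop burn-in so the path is near-stationary

x = simulate_arma(phi=[0.6], theta=[0.4], alpha=0.0, n=500)
```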

ACF (AR,MA,ARMA)

PACF

Causality

Invertibility

Transformation

Box-cox

Decomposition

We assume that a time series $(x_t)$ has 3 components: a trend-cycle component $T_t$, a
mean-zero seasonal component $S_t$ that repeats with period $p$ (i.e., $S_{t+p} = S_t$), and a mean-zero
stationary stochastic process $R_t$.
They can be combined in an additive manner,

𝑥𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝑅𝑡

or in a multiplicative manner, $x_t = T_t \cdot S_t \cdot R_t$, which becomes additive after taking logs:

$$\log x_t = \log T_t + \log S_t + \log R_t$$

The above assumptions imply that a centered moving average recovers the trend:

$$\frac{1}{m}\sum_{j=-k}^{l} x_{t+j} = \frac{1}{m}\sum_{j=-k}^{l} T_{t+j} + \frac{1}{m}\sum_{j=-k}^{l} S_{t+j} + \frac{1}{m}\sum_{j=-k}^{l} R_{t+j} \approx T_t$$

where $m = k + l + 1$ and $(k+l+1) \bmod \text{period} = 0$, so the seasonal terms sum to zero over whole periods and the average of $R_t$ is approximately zero.


Note that for any $1 \le k \le p$, the detrended series $x_t - T_t$ satisfies

$$\frac{1}{m}\sum_{j=1}^{m}(x_{k+jp} - T_{k+jp}) = \frac{1}{m}\sum_{j=1}^{m} S_{k+jp} + \frac{1}{m}\sum_{j=1}^{m} R_{k+jp} \approx S_k$$

where $m$ is the number of complete periods available, so averaging the detrended values season by season estimates the seasonal component.

Finally,

𝑅𝑡 = 𝑥𝑡 − 𝑇𝑡 − 𝑆𝑡
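
A minimal sketch of the additive decomposition described above for period-12 data: a centered moving average estimates $T_t$, seasonal averages of the detrended series estimate $S_t$, and the remainder is $R_t$. The synthetic series and edge handling are illustrative, not a reference implementation.

```python
import numpy as np

def classical_decompose(x, period):
    """Additive decomposition x_t = T_t + S_t + R_t via centered moving averages."""
    x = np.asarray(x, dtype=float)
    k = period // 2
    kernel = np.ones(period) / period
    if period % 2 == 0:
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period   # 2 x (period) MA
    trend = np.convolve(x, kernel, mode="same")
    trend[:k] = trend[-k:] = np.nan       # edges where the window does not fit
    detrended = x - trend
    # Seasonal: average the detrended values for each season, then re-center to mean zero.
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()
    seasonal = np.tile(seasonal_means, len(x) // period + 1)[:len(x)]
    remainder = x - trend - seasonal
    return trend, seasonal, remainder

t = np.arange(120)
x = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(1).normal(size=120)
trend, seasonal, remainder = classical_decompose(x, period=12)
```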

Statistical Tests

𝐻0 ∶ (𝑋𝑡 ) is white noise

All tests are based on Theorem 0.2.


Table 4: note: $h = \min(10, n/5)$ for non-seasonal data; $h = \min(2p, n/5)$ for seasonal data with period $p$.

| Test | Test statistic | Reject $H_0$ when |
|---|---|---|
| ACF plot inspection | count how many of the first $h$ sample autocorrelations exceed $\pm 1.96/\sqrt{n}$ | more than 5% exceed the bounds |
| Box-Pierce | $Q_{BP} := n \sum_{j=1}^{h} \hat{\rho}_X(j)^2$; $\;Q_{BP} \to_d \chi^2_h$ | $Q_{BP} > (1-\alpha)$-quantile of $\chi^2_h$ |
| Ljung-Box | $Q_{LB} := n(n+2) \sum_{j=1}^{h} \frac{\hat{\rho}_X(j)^2}{n-j}$; $\;Q_{LB} \to_d \chi^2_h$ | $Q_{LB} > (1-\alpha)$-quantile of $\chi^2_h$ |
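
A hand-rolled sketch of the Ljung-Box statistic and its $\chi^2_h$ rejection threshold (via scipy); the simulated series and the choice of $h$ are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h, alpha=0.05):
    """Ljung-Box test of H0: (X_t) is white noise, using lags 1..h."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n
    rho = np.array([np.sum(xc[j:] * xc[:n - j]) / (n * gamma0) for j in range(1, h + 1)])
    q_lb = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    critical = chi2.ppf(1 - alpha, df=h)
    return q_lb, critical, q_lb > critical   # reject H0 when the statistic exceeds the quantile

x = np.random.default_rng(2).normal(size=300)
print(ljung_box(x, h=10))
```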

Unit Root Tests

Table 5: $\gamma = \sum_{j=1}^{p} \phi_j - 1$, $\;\psi_j = -\sum_{i=j+1}^{p} \phi_i$, $\;j = 1, ..., p-1$

| Test | Model assumption |
|---|---|
| DF | $(I-B)X_t = (\phi - 1)X_{t-1} + W_t$ |
| ADF | $(I-B)X_t = \gamma X_{t-1} + \sum_{j=1}^{p-1} \psi_j (I-B)X_{t-j} + W_t$ |
| KPSS | $X_t = R_t + Y_t$; $\;R_t = R_{t-1} + W_t$ with $(W_t) \sim WN(0, \sigma^2)$ and $Y_t$ stationary |

| Test | $H_0$ ; $H_1$ | Test statistic | Limiting dist. under $H_0$ |
|---|---|---|---|
| DF | $\phi = 1$ ; $\lvert\phi\rvert < 1$ | $n(\hat{\phi} - 1) = n \, \frac{\sum_{t=1}^{n}(X_t - X_{t-1})X_{t-1}}{\sum_{t=1}^{n} X_{t-1}^2}$ | $\dfrac{\frac{1}{2}(W(1)^2 - 1)}{\int_0^1 W(t)^2 \, dt}$ |
| ADF | $\gamma = 0$ ; $\gamma < 0$ | $\dfrac{\hat{\gamma}}{SE(\hat{\gamma})}$ | Dickey-Fuller distribution |
| KPSS | $\sigma^2 = 0$ ; $\sigma^2 > 0$ | $\dfrac{1}{n^2 \hat{V}} \sum_{t=1}^{n} S_t^2$, with $S_t = \sum_{j=1}^{t}(X_j - \bar{X})$ and $\hat{V}$ an estimate of $\mathrm{Var}[S_n/\sqrt{n}]$ | $\int_0^1 (W(t) - tW(1))^2 \, dt$ |
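
In practice these tests are rarely computed by hand. A sketch using statsmodels (assuming a recent version is installed), applied to an illustrative random walk:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=400))      # random walk: has a unit root

adf_stat, adf_pvalue, *_ = adfuller(x)   # H0: unit root; expect a large p-value here
kpss_stat, kpss_pvalue, *_ = kpss(x, regression="c", nlags="auto")  # H0: stationarity
print(f"ADF p-value {adf_pvalue:.3f}, KPSS p-value {kpss_pvalue:.3f}")
```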

Common Examples of Time series

Random Walk

Definition 0.1.

7
$$X_t = \delta + X_{t-1} + W_t = t\delta + \sum_{j=1}^{t} W_j \qquad (\text{taking } X_0 = 0)$$

Note that a random walk is not stationary, since $\mu_X(t) = t\delta$ depends on $t$ (and even when $\delta = 0$, $\mathrm{Var}[X_t] = t\sigma^2$ depends on $t$).

Linear Process

Definition 0.2.

$$X_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j W_{t-j}$$

A linear process is causal if $\psi_j = 0$ for all $j < 0$.

Gaussian Process

Definition 0.3. A stochastic process (𝑋𝑡 ) is a Gaussian process if all finite dimensional
projections (𝑋𝑡1 , 𝑋𝑡2 , ..., 𝑋𝑡𝑘 ) have a multivariate normal distribution.
Denoting $\Gamma_{ij} = \gamma_X(t_i, t_j)$, $\mu = (\mu_X(t_1), ..., \mu_X(t_k))$, and $x_{1:k} = (x_1, x_2, ..., x_k)$,

$$p(x_1, x_2, ..., x_k) = (2\pi)^{-\frac{k}{2}} \det(\Gamma)^{-\frac{1}{2}} \exp\Big(-\frac{1}{2}(x_{1:k} - \mu)^T \Gamma^{-1} (x_{1:k} - \mu)\Big)$$

Note that if (𝑋𝑡 ) is weakly stationary, then 𝜇 is a constant and entries of Γ will only depend
on |𝑡𝑖 − 𝑡𝑗 |.

Exponential Smoothing Methods

Taxonomy
N: none
A: additive
Ad: additive damped
M: multiplicative

| E | T | S | level $l_t$ | trend $b_t$ | seasonal $s_t$ | $\hat{x}_{n+h\mid n}$ |
|---|---|---|---|---|---|---|
| A | N | N | $\alpha x_t + (1-\alpha)l_{t-1}$ | | | $l_n$ |
| A | A | N | $\alpha x_t + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | | $l_n + h b_n$ |
| A | Ad | N | $\alpha x_t + (1-\alpha)(l_{t-1} + \phi b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)\phi b_{t-1}$ | | $l_n + \big(\sum_{i=1}^{h}\phi^i\big) b_n$ |
| A | A | A | $\alpha(x_t - s_{t-p}) + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | $\gamma(x_t - l_{t-1} - b_{t-1}) + (1-\gamma)s_{t-p}$ | $l_n + h b_n + s_{n-p+(h \bmod p)}$ |
| M | A | M | $\alpha(x_t / s_{t-p}) + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | $\gamma\big(x_t / (l_{t-1} + b_{t-1})\big) + (1-\gamma)s_{t-p}$ | $(l_n + h b_n)\, s_{n-p+(h \bmod p)}$ |
| M | Ad | M | $\alpha(x_t / s_{t-p}) + (1-\alpha)(l_{t-1} + \phi b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)\phi b_{t-1}$ | $\gamma\big(x_t / (l_{t-1} + \phi b_{t-1})\big) + (1-\gamma)s_{t-p}$ | $\big(l_n + \big(\sum_{i=1}^{h}\phi^i\big) b_n\big)\, s_{n-p+(h \bmod p)}$ |
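
A sketch of the (A,A,N) row (Holt's linear method); the smoothing parameters and the crude initial states are illustrative rather than fitted by maximum likelihood.

```python
import numpy as np

def holt_linear(x, alpha=0.3, beta_star=0.1, h=12):
    """ETS(A,A,N): additive error, additive trend, no seasonality."""
    x = np.asarray(x, dtype=float)
    level, trend = x[0], x[1] - x[0]      # simple initial states
    for xt in x[1:]:
        prev_level = level
        level = alpha * xt + (1 - alpha) * (prev_level + trend)
        trend = beta_star * (level - prev_level) + (1 - beta_star) * trend
    return level + np.arange(1, h + 1) * trend   # x_hat_{n+h|n} = l_n + h * b_n

x = np.cumsum(np.random.default_rng(4).normal(loc=0.2, size=100))
print(holt_linear(x)[:3])
```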

Best linear predictors

The BLP map $Q: X_t \mapsto \tilde{X}_t$ is a projection onto the span of $X_{n-1}, X_{n-2}, ...$
Let the point forecast $r$ be a function that maps $(X_1, ..., X_n)$ to $\mathbb{R}$, and define $r(x_1, ..., x_n)$ as the conditional mean of $X_{n+h}$ given the observations $(x_1, ..., x_n)$, as in Corollary 0.1. Note that conditional means are Bayes-optimal predictors w.r.t. squared loss:

$$r(x_1, ..., x_n) := \mathbb{E}[X_{n+h} \mid X_1 = x_1, ..., X_n = x_n]$$

$$r = \operatorname*{argmin}_{f: \mathbb{R}^n \to \mathbb{R}} \mathbb{E}\big[(f(X_{1:n}) - X_{n+h})^2\big]$$

This holds for any multivariate distributions. For the gaussian, recall that the conditional
mean is a function that is linear in its inputs. In other words, if we define 𝑟 as we have done
before, then

$$r(x_1, ..., x_n) = \phi_{n+h|n}^T \, x_{n:1}, \qquad \text{where } \phi_{n+h|n} = \Gamma_n^{-1} \gamma_{h:n+h-1}$$

𝜙𝑛+ℎ|𝑛 is the argument that minimises ℒ(𝛽) = 𝔼[(𝛽 𝑇 𝑋𝑛∶1 − 𝑋𝑛+ℎ )2 ] even if the distribution is
not Gaussian

The connection between the BLP and the PACF

• If 𝛼𝑋 (ℎ) is the lag ℎ partial autocorrelation of (𝑋𝑡 ), then 𝛼𝑋 (ℎ) is the h-coordinate of
the regression vector of 𝑋ℎ+1 on 𝑋1 , 𝑋2 , ..., 𝑋ℎ
• Denote $\mathrm{Var}\{X_{h+1} - \phi_{h+1|h}^T X_{h:1}\}$ as $\nu_{h+1}$. Then

$$\nu_{h+1} = (1 - \alpha_X(h)^2)\,\nu_h$$

An Algorithm to compute PACFs


Denote 𝜙𝑡+1|𝑡 = (𝜙𝑡1 , 𝜙𝑡2 , ..., 𝜙𝑡𝑡 ).
𝜙00 = 0
for 𝑡 = 1, 2, ..

$$\phi_{tt} = \frac{\rho_X(t) - \sum_{k=1}^{t-1} \phi_{t-1,k}\, \rho_X(t-k)}{1 - \sum_{k=1}^{t-1} \phi_{t-1,k}\, \rho_X(k)}$$
for 𝑘 = 1, 2, ..., 𝑡 − 1

𝜙𝑡𝑘 = 𝜙𝑡−1,𝑘 − 𝜙𝑡𝑡 𝜙𝑡−1,𝑡−𝑘
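
A direct transcription of the recursion above; `durbin_levinson_pacf` is a hypothetical helper name, and the input ACF (an AR(1) with $\phi = 0.7$) is only an illustration.

```python
import numpy as np

def durbin_levinson_pacf(rho, max_lag):
    """Compute phi_{tt} (the PACF) from autocorrelations rho[0]=1, rho[1], ..."""
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.zeros(max_lag + 1)
    for t in range(1, max_lag + 1):
        num = rho[t] - sum(phi[t - 1, k] * rho[t - k] for k in range(1, t))
        den = 1.0 - sum(phi[t - 1, k] * rho[k] for k in range(1, t))
        phi[t, t] = num / den
        for k in range(1, t):
            phi[t, k] = phi[t - 1, k] - phi[t, t] * phi[t - 1, t - k]
        pacf[t] = phi[t, t]
    return pacf[1:]

rho = 0.7 ** np.arange(11)               # AR(1) ACF with phi = 0.7
print(durbin_levinson_pacf(rho, 10))     # PACF: ~0.7 at lag 1, ~0 afterwards
```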

Forecasts

General recipe for forecast

• Estimate model parameters from observations 𝑥1 , ..., 𝑥𝑛


• Derive the forecast distribution of $\hat{X}_{n+h}$
• Use the mean of the forecast distribution as the point forecast

Examples

AR(p) models

$$\tilde{\phi}_{t+1|t} := (\phi_1, ..., \phi_p, 0, 0, ...)$$

$$\hat{x}_{t|t-1} = \tilde{\phi}_{t+1|t}^T \, x_{t-1:1}$$
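
A sketch of h-step AR(p) forecasting under the mean-zero model with known coefficients: one-step forecasts use the last $p$ observations, and longer horizons feed earlier forecasts back into the recursion.

```python
import numpy as np

def ar_forecast(x, phi, h):
    """h-step forecasts for X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + W_t."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    history = list(np.asarray(x, dtype=float))
    forecasts = []
    for _ in range(h):
        xhat = float(np.dot(phi, history[-1:-p - 1:-1]))  # phi_1*last + ... + phi_p*p-th last
        forecasts.append(xhat)
        history.append(xhat)              # plug the forecast back in for longer horizons
    return np.array(forecasts)

x = [0.5, 1.2, 0.9, 1.4, 1.1]
print(ar_forecast(x, phi=[0.6, -0.2], h=5))
```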

ARMA(p,q) models
Let 𝑊̃ 𝑡 ∶= 𝑄[𝑊𝑡 ]

$$\begin{aligned}
Q[X_{t+h}] &= Q\Big[\sum_{i=1}^{p} \phi_i X_{t+h-i} + W_{t+h} + \sum_{j=1}^{q} \theta_j W_{t+h-j}\Big] \\
&= \sum_{i=1}^{p} \phi_i \tilde{X}_{t+h-i} + \sum_{j=1}^{q} \theta_j \tilde{W}_{t+h-j}
\end{aligned}$$

where

$$\tilde{W}_t = \begin{cases} 0, & t > n \\ W_t, & \text{otherwise} \end{cases}$$
This formula assumes an infinite past, which is impractical. The truncated forecast removes this
assumption by constraining the unobserved past to be 0:

$$\check{X}_t = \begin{cases} 0, & t \le 0 \\ X_t, & \text{otherwise} \end{cases}$$

The forecast is now a conditional expectation over only the observed values of $(X_t)$:

$$\check{X}_{t|n} = \mathbb{E}[X_t \mid X_n = x_n, ..., X_1 = x_1, \; X_j = 0 \text{ for all } j \le 0]$$


This conditional expectation is also a linear map $P$ on the space of random variables.
Letting $\check{W}_t := P[W_t]$, we have our forecast equations

$$\check{X}_{t|n} = \begin{cases} \sum_{j=1}^{p} \phi_j \check{X}_{t-j|n} + \sum_{j=1}^{q} \theta_j \check{W}_{t-j|n}, & t > n \\ X_t, & 1 \le t \le n \\ 0, & \text{otherwise} \end{cases}$$

$$\check{W}_{t|n} = \begin{cases} 0, & t > n \\ \phi(B)\check{X}_{t|n} - \sum_{j=1}^{q} \theta_j \check{W}_{t-j|n}, & 1 \le t \le n \\ 0, & \text{otherwise} \end{cases}$$
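
A sketch of the truncated recursion for an ARMA(1,1) with known parameters: first recover the in-sample residuals $\check{W}_t$ (with the unobserved past set to 0), then iterate the forecast equation with future noise set to 0.

```python
import numpy as np

def arma11_truncated_forecast(x, phi, theta, h):
    """Truncated forecasts for X_t = phi*X_{t-1} + W_t + theta*W_{t-1}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = np.zeros(n)                       # W_check_t for 1 <= t <= n (unobserved past = 0)
    for t in range(n):
        x_prev = x[t - 1] if t >= 1 else 0.0
        w_prev = w[t - 1] if t >= 1 else 0.0
        w[t] = x[t] - phi * x_prev - theta * w_prev
    forecasts = []
    x_prev, w_prev = x[-1], w[-1]
    for _ in range(h):
        xhat = phi * x_prev + theta * w_prev
        forecasts.append(xhat)
        x_prev, w_prev = xhat, 0.0        # future noise forecasts are 0
    return np.array(forecasts)

x = np.random.default_rng(5).normal(size=200)
print(arma11_truncated_forecast(x, phi=0.5, theta=0.3, h=4))
```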

Prediction intervals

long range forecasts
Let $(X_t)$ be an ARIMA process

$$\phi(B)(I - B)^d X_t = \theta(B) W_t$$

Since $Y_t = (I - B)^d X_t$ is an ARMA(p,q) process with mean $\mu$,

$$\lim_{h \to \infty} \hat{Y}_{n+h|n} = \mu$$

The forecast curve is the discrete integral of the sequence $\hat{Y}_{n+1|n}, \hat{Y}_{n+2|n}, ....$
If $d = 1$ and $\mu \ne 0$,

$$\hat{X}_{n+h|n} = X_n + \hat{Y}_{n+1|n} + \hat{Y}_{n+2|n} + ... + \hat{Y}_{n+h|n} \approx X_n + h\mu$$

Parameter estimation

Need to do this in order to apply models to data.

Method of Moments (Yule-Walker Equations)

• Compute sample moments

• find model parameters such that population moments match sample moments

Method of Moments estimates for an AR(p) model.


First, we note that the BLP coefficients are also the parameters of the AR(p) model:

$$\phi = (\phi_1, ..., \phi_p)^T = \Gamma_p^{-1} \gamma_{1:p}$$

Next, multiply both sides of the AR(p) equation by 𝑋𝑡 and take expectations.

$$\mathbb{E}[X_t X_t] = \phi_1 \mathbb{E}[X_{t-1} X_t] + ... + \phi_p \mathbb{E}[X_{t-p} X_t] + \mathbb{E}[W_t X_t]$$

$$\gamma_X(0) = \phi_1 \gamma_X(1) + ... + \phi_p \gamma_X(p) + \sigma^2$$

$$\sigma^2 = \gamma_X(0) - \gamma_{1:p}^T \Gamma_p^{-1} \gamma_{1:p}$$

The method of moments estimators of $\phi$ and $\sigma^2$ are thus

$$\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_{1:p}$$

and

$$\hat{\sigma}^2 = \hat{\gamma}_X(0) - \hat{\gamma}_{1:p}^T \hat{\Gamma}_p^{-1} \hat{\gamma}_{1:p}$$

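A sketch of the Yule-Walker estimator: compute sample autocovariances, build $\hat{\Gamma}_p$ as a Toeplitz matrix, and solve; the simulated AR(2) series is only there to give the estimator something to fit.

```python
import numpy as np
from scipy.linalg import toeplitz

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[h:] * xc[:n - h]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    g = sample_acvf(x, p)
    Gamma_p = toeplitz(g[:p])             # Gamma_hat_p
    phi_hat = np.linalg.solve(Gamma_p, g[1:p + 1])
    sigma2_hat = g[0] - g[1:p + 1] @ phi_hat
    return phi_hat, sigma2_hat

# Simulate an AR(2) with phi = (0.6, -0.3) to check the estimator
rng = np.random.default_rng(6)
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
print(yule_walker(x, p=2))
```
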
Method of Maximum Likelihood (MLE)

The goal of MLE is to maximise the (log-)likelihood function $f_\beta(x_1, ..., x_n)$ given the data, or equivalently to minimise the negative log-likelihood

$$l(\beta) = -\log(f_\beta(x_1, ..., x_n))$$

For AR(p) models, 𝑓𝛽 (𝑥1 , ..., 𝑥𝑛 ) factorises into

$$f_\beta(x_t \mid x_1, ..., x_{t-1}) = \begin{cases} (2\pi\sigma^2)^{-1/2} \exp\Big(-\dfrac{(x_t - \phi_{1:p}^T(x_{t-1:t-p} - \mu) - \mu)^2}{2\sigma^2}\Big), & t > p \\[2ex] (2\pi\nu_t^2)^{-1/2} \exp\Big(-\dfrac{(x_t - \phi_{1:t-1}^T(x_{t-1:1} - \mu) - \mu)^2}{2\nu_t^2}\Big), & 1 \le t \le p \end{cases}$$

where $\nu_t^2 = \gamma(0) - \gamma_{1:t-1}^T \Gamma_{t-1}^{-1} \gamma_{1:t-1}$.

Factorising the log-likelihood into conditional densities,

$$\begin{aligned}
l(\beta) &= -\log(f_\beta(x_1, ..., x_n)) \\
&= -\log\Big(\prod_{t=1}^{n} f_\beta(x_t \mid x_1, ..., x_{t-1})\Big) \\
&= \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2}\sum_{t=1}^{p} \log(\nu_t^2/\sigma^2) + \frac{S(\mu, \phi_{1:p})}{2\sigma^2}
\end{aligned}$$

where

$$S(\mu, \phi_{1:p}) = \sum_{t=1}^{p} \big(x_t - \mu - \phi_{1:t-1}^T(x_{t-1:1} - \mu)\big)^2 \cdot \frac{\sigma^2}{\nu_t^2} \; + \; \sum_{t=p+1}^{n} \big(x_t - \mu - \phi_{1:p}^T(x_{t-1:t-p} - \mu)\big)^2$$

Note that the presence of the $\nu_1^2, \nu_2^2, ...$ terms makes $l(\beta)$ a very complicated function. It can be
greatly simplified by conditioning on the first $p$ values, $X_1 = x_1, ..., X_p = x_p$.

Conditional MLE
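
Conditioning on the first $p$ values reduces $S(\mu, \phi_{1:p})$ to its second sum, so the objective becomes an ordinary least-squares-type criterion. A sketch of this conditional MLE for an AR(1), minimised numerically with scipy; the optimiser, starting values, and simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def conditional_neg_loglik(params, x):
    """Conditional negative log-likelihood of an AR(1), conditioning on x_1."""
    mu, phi, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)           # keep the variance positive
    resid = (x[1:] - mu) - phi * (x[:-1] - mu)
    n = len(resid)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)

rng = np.random.default_rng(7)
x = np.zeros(1000)
for t in range(1, len(x)):
    x[t] = 2.0 + 0.7 * (x[t - 1] - 2.0) + rng.normal()

result = minimize(conditional_neg_loglik, x0=[0.0, 0.0, 0.0], args=(x,), method="Nelder-Mead")
mu_hat, phi_hat, sigma2_hat = result.x[0], result.x[1], np.exp(result.x[2])
print(mu_hat, phi_hat, sigma2_hat)
```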

Asymptotics

Others

Model Selection (AIC, AICc)

A digression into difference equations

A sequence of numbers 𝑢0 , 𝑢1 , 𝑢2 , ... that satisfies

𝑢𝑛 − 𝛼𝑢𝑛−1 = 0

where 𝛼 ≠ 0, 𝑛 = 1, 2, ... , represents a 1𝑠𝑡 order homogeneous difference equation.


We can solve this 1st-order equation by

$$\begin{aligned}
u_1 &= \alpha u_0 \\
u_2 &= \alpha u_1 = \alpha^2 u_0 \\
&\;\;\vdots \\
u_n &= \alpha u_{n-1} = \alpha^n u_0
\end{aligned}$$

given the initial condition $u_0$.


The 1st-order equation can be written in operator notation

(𝐼 − 𝛼𝐵)𝑢𝑛 = 0

with associated polynomial

𝛼(𝑧) = 1 − 𝛼𝑧 (18)

We want to solve for the roots of $\alpha(z)$. That is, we want the $z_0$ that solves $\alpha(z_0) = 0$. Easily,
we can see that $z_0 = \alpha^{-1}$. The solution of the 1st-order equation with initial condition $u_0 = c$ is

$$u_n = \alpha^n u_0 = (z_0)^{-n} c$$

which shows that $u_n$ depends only on the initial condition $u_0$ and the inverse of the root
associated with $\alpha(z)$.

Corollary 0.2. Consider the AR(1) equation

𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝑊𝑡

where 𝑊𝑡 is white noise.


The ACF of an AR(1) process is a sequence that satisfies a 1st-order homogeneous difference equation (for $h > 0$).

Proof.
Multiply both sides by $X_t$ and take expectations:

$$\mathbb{E}[X_t X_t] = \phi\,\mathbb{E}[X_{t-1} X_t] + \mathbb{E}[W_t X_t] \quad\Longrightarrow\quad \gamma_X(0) = \phi\gamma_X(1) + \sigma^2$$

Multiply both sides by $X_{t-1}$ and take expectations:

$$\mathbb{E}[X_t X_{t-1}] = \phi\,\mathbb{E}[X_{t-1} X_{t-1}] + \mathbb{E}[W_t X_{t-1}] \quad\Longrightarrow\quad \gamma_X(1) = \phi\gamma_X(0)$$

Now multiply both sides by $X_{t-2}$ and take expectations:

$$\mathbb{E}[X_t X_{t-2}] = \phi\,\mathbb{E}[X_{t-1} X_{t-2}] + \mathbb{E}[W_t X_{t-2}] \quad\Longrightarrow\quad \gamma_X(2) = \phi\gamma_X(1)$$

Hence we see that

$$\gamma_X(h) = \begin{cases} \phi\gamma_X(1) + \sigma^2, & h = 0 \\ \phi\gamma_X(h-1), & h > 0 \end{cases}$$

Substituting $\gamma_X(1) = \phi\gamma_X(0)$ into the $h = 0$ equation gives the initial value $\gamma_X(0) = \frac{\sigma^2}{1 - \phi^2}$.

A sequence of numbers $u_0, u_1, u_2, ...$ that satisfies

𝑢𝑛 − 𝛼1 𝑢𝑛−1 − 𝛼2 𝑢𝑛−2 = 0

where 𝛼2 ≠ 0, 𝑛 = 1, 2, ... , represents a 2𝑛𝑑 order homogeneous difference equation.


Going through a similar proof as Corollary 0.2, we see that the sequence of ACF values
$(\gamma_X(0), \gamma_X(1), ...)$ of an AR(2) process also satisfies a 2nd-order difference equation

𝑢𝑡 = 𝜙1 𝑢𝑡−1 + 𝜙2 𝑢𝑡−2

The values of the ACF can be obtained by solving this difference equation.

Proof. First, write the polynomial for the difference equation as

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 = (1 - z_1^{-1} z)(1 - z_2^{-1} z)$$

Where the second equality holds because of the fundamental theorem of algebra.
Writing the 2nd-order difference equation in operator form, and noting that we can perform a
similar factorisation, we get

(𝐼 − 𝑧1−1 𝐵)(𝐼 − 𝑧2−1 𝐵)𝑢 = 0

We can easily check that all solutions to the 2nd-order difference equations are linear combi-
nations of solutions to either a) (𝐼 − 𝑧1−1 𝐵)𝑢 = 0 or b) (𝐼 − 𝑧2−1 𝐵)𝑢 = 0.
For a), we see that the 𝑢 that satisfies

$$u - z_1^{-1} B(u) = 0, \qquad \text{i.e.} \qquad u_n - z_1^{-1} u_{n-1} = 0,$$

is $u_n = z_1^{-n}$. Doing the same thing for b), we come to the conclusion that all solutions to $(I - z_1^{-1}B)(I - z_2^{-1}B)u = 0$ are linear combinations of $z_1^{-n}$ and $z_2^{-n}$. In other words,

$$u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$$

which can be verified simply by plugging these values back into the 2nd-order difference equation.

So now we can solve for $c_1$ and $c_2$ given initial conditions $u_0$ and $u_1$:

$$u_0 = c_1 + c_2, \qquad u_1 = c_1 z_1^{-1} + c_2 z_2^{-1}$$

Finally, note that $z_1$ and $z_2$ may be complex numbers, but the solution to the difference
equation always has to be real. This implies that if $z_1$ and $z_2$ are complex, then they are
complex conjugates: the imaginary parts cancel in the sum, and we are left with real terms
that undergo sinusoidal fluctuations.

Example 0.1. Find the ACF of

𝑋𝑡 = 1.5𝑋𝑡−1 − .75𝑋𝑡−2 + 𝑊𝑡

where $\sigma_w^2 = 1$.
The AR polynomial of $X_t$ is

$$\phi(z) = 1 - 1.5z + 0.75z^2$$

Its roots are $z_1, \bar{z}_1 = 1 \pm \tfrac{i}{\sqrt{3}}$. Thus

$$\begin{aligned}
\rho(h) &= c_1 z_1^{-h} + \bar{c}_1 \bar{z}_1^{-h} \\
&= a\,|z_1|^{-h} \cos(h\theta + b) \\
&= a\,|z_1|^{-h} \cos\big(h \arctan(\tfrac{1/\sqrt{3}}{1}) + b\big) \\
&= a\,|z_1|^{-h} \cos\big(\tfrac{h\pi}{6} + b\big)
\end{aligned}$$

Since the period of revolution of the cosine function is $2\pi$, the period of $X_t$ is $\frac{2\pi}{\pi/6} = 12$.
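
A quick numerical check of this example: iterate the difference equation $\rho(h) = 1.5\rho(h-1) - 0.75\rho(h-2)$ from $\rho(0) = 1$ and the Yule-Walker initial value $\rho(1) = \phi_1/(1-\phi_2)$, and confirm the damped oscillation with period 12.

```python
import numpy as np

phi1, phi2 = 1.5, -0.75
rho = np.zeros(25)
rho[0] = 1.0
rho[1] = phi1 / (1 - phi2)               # Yule-Walker initial value rho(1)
for h in range(2, len(rho)):
    rho[h] = phi1 * rho[h - 1] + phi2 * rho[h - 2]   # the 2nd-order difference equation

print(np.round(rho[:13], 3))             # damped oscillation; sign pattern repeats every 12 lags
```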

In general, a sequence of numbers $u_0, u_1, u_2, ...$ that satisfies

$$u_n + \sum_{i=1}^{p} \alpha_i u_{n-i} = 0$$

where $\alpha_p \ne 0$, $n = 1, 2, ...$, represents a $p^{th}$ order homogeneous difference equation.
