Time-series Analysis
Preliminaries
Proposition 0.1. Let 𝑋 = (𝑋1 , 𝑋2 , ..., 𝑋𝑛+1 ) be a multivariate Gaussian random vector with
mean vector 𝜇, covariance matrix Σ, and pdf

$$\mathcal{N}(\mu, \Sigma) = (2\pi)^{-\frac{n+1}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Partitioning the covariance matrix as

$$\Sigma = \begin{bmatrix} \Sigma_{1:n,1:n} & \Sigma_{1:n,n+1} \\ \Sigma_{n+1,1:n} & \Sigma_{n+1,n+1} \end{bmatrix}$$

the distribution of 𝑋𝑛+1 conditioned on 𝑋1 = 𝑥1 , ..., 𝑋𝑛 = 𝑥𝑛 is also Gaussian, with mean

$$\mu_{n+1} + \Sigma_{n+1,1:n}\, \Sigma_{1:n,1:n}^{-1}\, (x_{1:n} - \mu_{1:n})$$

and variance

$$\Sigma_{n+1,n+1} - \Sigma_{n+1,1:n}\, \Sigma_{1:n,1:n}^{-1}\, \Sigma_{1:n,n+1}.$$
Stochastic process

A stochastic process (𝑋𝑡) is characterised by its finite-dimensional distributions

$$\mathbb{P}\{X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n\}$$

| function         | definition                    |
|------------------|-------------------------------|
| mean             | 𝜇𝑋(𝑡) ≔ 𝔼[𝑋𝑡]                 |
| autocovariance   | 𝛾𝑋(𝑠, 𝑡) = Cov[𝑋𝑠, 𝑋𝑡]        |
| cross-covariance | 𝛾𝑋𝑌(𝑠, 𝑡) = Cov[𝑋𝑠, 𝑌𝑡]       |
• Causality: (𝑋𝑡) can be written as a one-sided linear process in the noise,

$$X_t - \mu = \sum_{j=0}^{\infty} \psi_j W_{t-j}, \qquad \sum_{j=0}^{\infty} |\psi_j| < \infty$$

• Invertibility: the noise can be recovered from the present and past of the process,

$$\sum_{j=0}^{\infty} \pi_j (X_{t-j} - \mu) = W_t, \qquad \sum_{j=0}^{\infty} |\pi_j| < \infty$$
Stationarity
cross-correlation: if (𝑋𝑡), (𝑌𝑡) are jointly stationary, then 𝛾𝑋,𝑌(𝑡 + ℎ, 𝑡) is independent of 𝑡, written 𝛾𝑋,𝑌(ℎ), and

$$\rho_{X,Y}(h) = \frac{\gamma_{XY}(h)}{\sqrt{\gamma_X(0)\,\gamma_Y(0)}}$$
1. Trend stationarity
𝑋𝑡 = 𝑓𝑡 + 𝑌𝑡

where 𝑓𝑡 is a deterministic sequence and 𝑌𝑡 is a stationary process. The series can be made
stationary by estimating the 𝑓𝑡 component and subtracting it from 𝑋𝑡.
2. Unit roots
Intuition: processes that become stationary after taking 𝑑 differences are said to have 𝑑 unit
roots. Example: the first difference of a random walk is stationary, so a random walk has 1 unit
root (a minimal simulation of this is sketched below).
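A minimal simulation of this example (numpy only, synthetic data):

```python
# A minimal sketch (synthetic data): the first difference of a simulated
# random walk recovers the stationary white-noise increments.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=500)        # white noise W_t
x = np.cumsum(w)                # random walk X_t = X_{t-1} + W_t

dx = np.diff(x)                 # first difference X_t - X_{t-1}
print(np.allclose(dx, w[1:]))   # True: one difference removes the unit root
```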
Moments Estimation
Big idea: the sample mean, autocovariance, and autocorrelation are consistent and asymp-
totically normal estimators of their population counterparts.
$$\sqrt{n}\,(\bar{X}_n - \mu) \to_d N(0, V)$$

where

$$V = \sum_{h=-\infty}^{\infty} \gamma_X(h) = \sigma^2 \Big(\sum_{j=-\infty}^{\infty} \psi_j\Big)^2$$
For finite 𝑛, approximately,

$$\bar{X}_n \sim N\Big(\mu, \; \frac{1}{n}\sum_{|h|<n}\Big(1 - \frac{|h|}{n}\Big)\gamma_X(h)\Big)$$
Theorem 0.2. Denote 𝜌1∶𝑘 = (𝜌(1), ..., 𝜌(𝑘)) and 𝜌̂1∶𝑘 = (𝜌̂(1), ..., 𝜌̂(𝑘)).

If (𝑋𝑡) is a stationary linear process driven by white noise with finite fourth moments, then for
any fixed k,

$$\sqrt{n}\,(\hat{\rho}_{1:k} - \rho_{1:k}) \to_d N(0, V)$$

where

$$V_{ij} = \sum_{u=1}^{\infty}\big(\rho(u+i) + \rho(u-i) - 2\rho(i)\rho(u)\big)\big(\rho(u+j) + \rho(u-j) - 2\rho(j)\rho(u)\big)$$

In particular, if (𝑋𝑡) is white noise, then for any fixed k,

$$\sqrt{n}\,\hat{\rho}_{1:k} \to_d N(0, I)$$
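The white-noise case is what justifies the usual ±1.96/√n bands on ACF plots. A quick simulation check with synthetic i.i.d. noise (numpy only):

```python
# Sketch of the white-noise check implied by the theorem: sample
# autocorrelations of i.i.d. noise should fall inside +/- 1.96/sqrt(n)
# roughly 95% of the time. Data are simulated; no real series assumed.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[h:], xc[:n - h]) / (n * gamma0)
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
rho_hat = sample_acf(x, 20)
band = 1.96 / np.sqrt(len(x))
print(np.mean(np.abs(rho_hat) <= band))   # close to 0.95 for white noise
```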
Corollary 0.1. If (𝑋𝑡) is a mean-zero stationary Gaussian process, then given (𝑋1 = 𝑥1 , 𝑋2 =
𝑥2 , ..., 𝑋𝑛 = 𝑥𝑛 ), Proposition 0.1 tells us that the conditional distribution of 𝑋𝑛+ℎ is Gaussian
with mean

$$\mu_{X_{n+h} \mid X_1 = x_1, ..., X_n = x_n} = \gamma_{h:n+h-1}^T \,\Gamma_n^{-1}\, x_{n:1}$$

and variance

$$\gamma_X(0) - \gamma_{h:n+h-1}^T \,\Gamma_n^{-1}\, \gamma_{h:n+h-1}$$

where, for 𝑎 < 𝑏, 𝛾𝑎∶𝑏 = (𝛾𝑋(𝑎), ..., 𝛾𝑋(𝑏)), 𝑥𝑏∶𝑎 = (𝑥𝑏, 𝑥𝑏−1, ..., 𝑥𝑎), and Γ𝑛 is the 𝑛 × 𝑛 matrix with entries (Γ𝑛)𝑖𝑗 = 𝛾𝑋(𝑖 − 𝑗).
Corollary 0.1 provides the distributional forecast, with the point forecast being the conditional
mean and the prediction intervals given by the corresponding quantiles of the conditional distribution.
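A minimal sketch of this forecast for a mean-zero stationary Gaussian process with a known autocovariance function; here γ is taken from an AR(1) with φ = 0.7 (an assumption purely for illustration) and the observations are placeholders:

```python
# Sketch of the Corollary 0.1 forecast: point forecast and 95% prediction
# interval from a *known* autocovariance function (AR(1), phi = 0.7).
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n, h = 0.7, 1.0, 50, 1
gamma = lambda k: sigma2 / (1 - phi**2) * phi**np.abs(k)   # AR(1) autocovariance

rng = np.random.default_rng(2)
x = rng.normal(size=n)              # placeholder "observations" x_1, ..., x_n

Gamma_n = toeplitz(gamma(np.arange(n)))          # Cov(X_{1:n})
g = gamma(np.arange(h, n + h))                   # Cov(X_{n+h}, X_{n:1})
w = np.linalg.solve(Gamma_n, g)                  # Gamma_n^{-1} gamma

point = w @ x[::-1]                              # conditional mean
var = gamma(0) - g @ w                           # conditional variance
lo, hi = point - 1.96 * np.sqrt(var), point + 1.96 * np.sqrt(var)
print(point, (lo, hi))                           # point forecast and 95% PI
```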
ARMA(p,q) Models
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - ... - \phi_p z^p$$
$$\theta(z) = 1 + \theta_1 z + \theta_2 z^2 + ... + \theta_q z^q$$
| Type  | Equation                |
|-------|-------------------------|
| (p,0) | 𝜙(𝐵)𝑋𝑡 = 𝛼 + 𝑊𝑡         |
| (0,q) | 𝑋𝑡 = 𝛼 + 𝜃(𝐵)𝑊𝑡         |
| (p,q) | 𝜙(𝐵)𝑋𝑡 = 𝛼 + 𝜃(𝐵)𝑊𝑡     |
Behaviour of the ACF/PACF and the causality/invertibility conditions:

|               | AR(p)                                    | MA(q)                                    | ARMA(p,q)                                |
|---------------|------------------------------------------|------------------------------------------|------------------------------------------|
| ACF           | tails off                                | cuts off after lag q                     | tails off                                |
| PACF          | cuts off after lag p                     | tails off                                | tails off                                |
| Causality     | roots of 𝜙(𝑧) outside the unit circle    | always causal                            | roots of 𝜙(𝑧) outside the unit circle    |
| Invertibility | always invertible                        | roots of 𝜃(𝑧) outside the unit circle    | roots of 𝜃(𝑧) outside the unit circle    |
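A quick numerical check of the root conditions (numpy only); the coefficients below are illustrative, not estimated from data:

```python
# Sketch: an ARMA(p,q) is causal when all roots of phi(z) lie outside the
# unit circle, and invertible when all roots of theta(z) do.
import numpy as np

phi = [0.5, 0.3]      # phi(z) = 1 - 0.5 z - 0.3 z^2
theta = [0.4]         # theta(z) = 1 + 0.4 z

def roots_outside_unit_circle(coefs, sign):
    # polynomial 1 + sign*c1*z + sign*c2*z^2 + ...; np.roots wants highest degree first
    poly = np.r_[1.0, sign * np.asarray(coefs)][::-1]
    return bool(np.all(np.abs(np.roots(poly)) > 1))

print("causal:", roots_outside_unit_circle(phi, -1.0))        # roots of phi(z)
print("invertible:", roots_outside_unit_circle(theta, +1.0))  # roots of theta(z)
```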
Transformation
Box-Cox
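A minimal sketch using scipy's boxcox on synthetic positive data whose variance grows with the level; an estimated λ near 0 corresponds to a log transform:

```python
# Sketch of a Box-Cox variance-stabilising transform; the data are
# synthetic (positive, multiplicative noise), not from any real series.
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(3)
level = np.linspace(1, 10, 200)
x = level * np.exp(rng.normal(scale=0.3, size=200))   # variance grows with level

y, lam = boxcox(x)          # lambda chosen by maximum likelihood
print(round(lam, 2))        # lambda near 0 ~ log transform
```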
Decomposition
We assume that a time series (𝑥𝑡) has 3 components: a trend-cycle component 𝑇𝑡, a mean-zero
seasonal component 𝑆𝑡 with period 𝑝 (i.e., 𝑆𝑡+𝑝 = 𝑆𝑡), and a mean-zero stationary stochastic
process 𝑅𝑡.
They can be combined in an additive manner,

𝑥𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝑅𝑡

or in a multiplicative manner,

𝑥𝑡 = 𝑇𝑡 × 𝑆𝑡 × 𝑅𝑡

To estimate the trend, take a centred moving average spanning roughly one full period (𝑚 = 𝑘 + 𝑙 + 1 terms); the seasonal and remainder components average out to approximately zero, leaving

$$\frac{1}{m}\sum_{j=-k}^{l} x_{t+j} = \frac{1}{m}\sum_{j=-k}^{l} T_{t+j} + \frac{1}{m}\sum_{j=-k}^{l} S_{t+j} + \frac{1}{m}\sum_{j=-k}^{l} R_{t+j} \tag{1}$$
$$\approx T_t \tag{2}$$

To estimate the seasonal component at seasonal position 𝑘, average the detrended series over the 𝑚 observations sharing that position (the stationary remainder again averages to approximately zero):

$$\frac{1}{m}\sum_{j=1}^{m}\big(x_{k+jp} - T_{k+jp}\big) = \frac{1}{m}\sum_{j=1}^{m} S_{k+jp} + \frac{1}{m}\sum_{j=1}^{m} R_{k+jp} \approx S_k$$
Finally,
𝑅𝑡 = 𝑥𝑡 − 𝑇𝑡 − 𝑆𝑡
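A sketch of this classical additive decomposition using statsmodels' seasonal_decompose (centred moving-average trend, averaged seasonal component); the monthly series is simulated for illustration:

```python
# Sketch: classical additive decomposition x_t = T_t + S_t + R_t on a
# simulated monthly series with trend + period-12 seasonality + noise.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(4)
t = np.arange(120)
x = 0.1 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=120)
series = pd.Series(x, index=pd.date_range("2000-01", periods=120, freq="MS"))

res = seasonal_decompose(series, model="additive", period=12)
# res.trend ~ T_t, res.seasonal ~ S_t, res.resid ~ R_t
print(res.seasonal.iloc[:12].round(2))   # one full cycle of the seasonal component
```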
Statistical Tests
Table 5: $\gamma = \sum_{j=1}^{p} \phi_j - 1$, $\;\psi_j = -\sum_{i=j+1}^{p} \phi_i$, $\;j = 1, ..., p-1$
Random Walk
Definition 0.1. A random walk with drift 𝛿 satisfies

$$X_t = \delta + X_{t-1} + W_t \tag{3}$$
$$= \delta t + \sum_{j=1}^{t} W_j \tag{4}$$

Note that a random walk is not stationary, since 𝜇𝑋(𝑡) = 𝛿𝑡 depends on 𝑡.
Linear Process
Definition 0.2.
$$X_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j W_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty$$
Gaussian Process
Definition 0.3. A stochastic process (𝑋𝑡 ) is a Gaussian process if all finite dimensional
projections (𝑋𝑡1 , 𝑋𝑡2 , ..., 𝑋𝑡𝑘 ) have a multivariate normal distribution.
Denoting Γ𝑖𝑗 = 𝛾𝑋(𝑡𝑖, 𝑡𝑗), 𝜇 = (𝜇𝑋(𝑡1), ..., 𝜇𝑋(𝑡𝑘)), 𝑥1∶𝑘 = (𝑥1, 𝑥2, ..., 𝑥𝑘),

$$p(x_1, x_2, ..., x_k) = (2\pi)^{-\frac{k}{2}} \det(\Gamma)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x_{1:k} - \mu)^T \Gamma^{-1} (x_{1:k} - \mu)\right)$$
Note that if (𝑋𝑡 ) is weakly stationary, then 𝜇 is a constant and entries of Γ will only depend
on |𝑡𝑖 − 𝑡𝑗 |.
Taxonomy
N: none
A: additive
Ad: additive damped
M: multiplicative
| E | T  | S | level $l_t$ | trend $b_t$ | seasonal $s_t$ | forecast $\hat{x}_{n+h\mid n}$ |
|---|----|---|-------------|-------------|----------------|--------------------------------|
| A | A  | N | $\alpha x_t + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | — | $l_n + h b_n$ |
| A | Ad | N | $\alpha x_t + (1-\alpha)(l_{t-1} + \phi b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)\phi b_{t-1}$ | — | $l_n + \big(\sum_{i=1}^{h}\phi^i\big) b_n$ |
| A | A  | A | $\alpha(x_t - s_{t-p}) + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | $\gamma(x_t - l_{t-1} - b_{t-1}) + (1-\gamma)s_{t-p}$ | $l_n + h b_n + s_{n-p+(h \bmod p)}$ |
| M | A  | M | $\alpha(x_t / s_{t-p}) + (1-\alpha)(l_{t-1} + b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)b_{t-1}$ | $\gamma\big(x_t / (l_{t-1} + b_{t-1})\big) + (1-\gamma)s_{t-p}$ | $(l_n + h b_n)\, s_{n-p+(h \bmod p)}$ |
| M | Ad | M | $\alpha(x_t / s_{t-p}) + (1-\alpha)(l_{t-1} + \phi b_{t-1})$ | $\beta^*(l_t - l_{t-1}) + (1-\beta^*)\phi b_{t-1}$ | $\gamma\big(x_t / (l_{t-1} + \phi b_{t-1})\big) + (1-\gamma)s_{t-p}$ | $\big(l_n + \big(\sum_{i=1}^{h}\phi^i\big) b_n\big)\, s_{n-p+(h \bmod p)}$ |
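As an illustration of the (A, A, A) row, a minimal Holt-Winters fit with statsmodels on a simulated monthly series; the smoothing parameters α, β*, γ are estimated by the library:

```python
# Sketch: additive Holt-Winters (level + trend + period-12 seasonality)
# fitted to a simulated series, then forecast one year ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(5)
t = np.arange(144)
y = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.4, size=144)
y = pd.Series(y, index=pd.date_range("2010-01", periods=144, freq="MS"))

fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                           seasonal_periods=12).fit()
print(fit.forecast(12).round(2))     # x_hat_{n+1|n}, ..., x_hat_{n+12|n}
```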
The BLP map 𝑄 ∶ 𝑋𝑡 → 𝑋̃ 𝑡 is a projection onto the span of 𝑋𝑛−1 , 𝑋𝑛−2 , ...
Let the point forecast 𝑟 be a function that maps (𝑋1 , ..., 𝑋𝑛 ) to ℝ, and define 𝑟(𝑥1 , ..., 𝑥𝑛 ) as
the conditional mean of 𝑋𝑛+ℎ given the observations (𝑥1 , ..., 𝑥𝑛 ), as given in Corollary 0.1. Note
that conditional means are Bayes-optimal predictors with respect to squared loss:

$$r = \underset{f:\,\mathbb{R}^n \to \mathbb{R}}{\operatorname{argmin}} \; \mathbb{E}\big[(f(X_{1:n}) - X_{n+h})^2\big]$$
This holds for any multivariate distribution. For the Gaussian, recall that the conditional
mean is a linear function of its inputs. In other words, if we define 𝑟 as we have done
before, then

$$r(x_1, ..., x_n) = \phi_{n+h|n}^T \, x_{n:1}$$

Moreover, 𝜙𝑛+ℎ|𝑛 is the argument that minimises ℒ(𝛽) = 𝔼[(𝛽𝑇 𝑋𝑛∶1 − 𝑋𝑛+ℎ )2 ] even if the distribution is
not Gaussian.
The connection between the BLP and the PACF
• If 𝛼𝑋 (ℎ) is the lag-ℎ partial autocorrelation of (𝑋𝑡 ), then 𝛼𝑋 (ℎ) is the ℎ-th coordinate of
  the regression vector of 𝑋ℎ+1 on 𝑋1 , 𝑋2 , ..., 𝑋ℎ .
• Denote $\mathrm{Var}\{X_{h+1} - \phi_{h+1|h}^T X_{h:1}\}$ as $\nu_{h+1}$. Then the PACF can be computed recursively (the Durbin–Levinson recursion):

$$\phi_{tt} = \frac{\rho_X(t) - \sum_{k=1}^{t-1}\phi_{t-1,k}\,\rho_X(t-k)}{1 - \sum_{k=1}^{t-1}\phi_{t-1,k}\,\rho_X(k)}, \qquad \phi_{t,k} = \phi_{t-1,k} - \phi_{tt}\,\phi_{t-1,t-k} \quad \text{for } k = 1, 2, ..., t-1$$
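The recursion above can be coded directly. A sketch, with the input ACF taken from an AR(1) with φ = 0.6 purely for illustration:

```python
# Sketch of the Durbin-Levinson recursion: PACF phi_tt from a given ACF.
import numpy as np

def pacf_from_acf(rho, max_lag):
    """rho[h] = autocorrelation at lag h, with rho[0] = 1."""
    phi = np.zeros((max_lag + 1, max_lag + 1))
    alpha = np.zeros(max_lag + 1)
    phi[1, 1] = alpha[1] = rho[1]
    for t in range(2, max_lag + 1):
        num = rho[t] - np.dot(phi[t - 1, 1:t], rho[t - 1:0:-1])
        den = 1.0 - np.dot(phi[t - 1, 1:t], rho[1:t])
        phi[t, t] = num / den
        phi[t, 1:t] = phi[t - 1, 1:t] - phi[t, t] * phi[t - 1, t - 1:0:-1]
        alpha[t] = phi[t, t]
    return alpha[1:]

rho = 0.6 ** np.arange(21)                # AR(1) autocorrelation rho(h) = 0.6^h
print(pacf_from_acf(rho, 5).round(3))     # ~ [0.6, 0, 0, 0, 0]: PACF cuts off after lag 1
```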
Forecasts
Examples
AR(p) models

For an AR(p) model, the one-step BLP coefficients are just the AR coefficients padded with zeros:

$$\tilde{\phi}_{t+1|t} := (\phi_1, ..., \phi_p, 0, 0, ...), \qquad \hat{x}_{t+1|t} = \tilde{\phi}_{t+1|t}^T \, x_{t:1}$$
ARMA(p,q) models
Let 𝑊̃ 𝑡 ∶= 𝑄[𝑊𝑡 ], where

$$\tilde{W}_t = \begin{cases} 0, & t > n \\ W_t, & \text{otherwise} \end{cases}$$
This formula assumes an infinite past, which is impractical. The truncated forecast removes this
assumption by constraining the unobserved past to be 0:

$$\check{X}_t = \begin{cases} 0, & \text{if } t \le 0 \\ X_t, & \text{otherwise} \end{cases}$$

The forecast is now a conditional expectation over only the observed values of (𝑋𝑡 ):

$$\check{X}_{t|n} = \begin{cases} \sum_{j=1}^{p}\phi_j \check{X}_{t-j|n} + \sum_{j=1}^{q}\theta_j \check{W}_{t-j|n}, & t > n \\ X_t, & 1 \le t \le n \\ 0, & \text{otherwise} \end{cases}$$

$$\check{W}_{t|n} = \begin{cases} 0, & t > n \\ \phi(B)\check{X}_t - \sum_{j=1}^{q}\theta_j \check{W}_{t-j|n}, & 1 \le t \le n \\ 0, & \text{otherwise} \end{cases}$$
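A sketch of the truncated forecast recursion, assuming a mean-zero series and known coefficients; unobserved past values and future noise are set to zero, and the input series below is a placeholder rather than a fitted model:

```python
# Sketch: truncated ARMA(p,q) forecasts. First pass recovers the truncated
# residuals W-check over the sample; second pass rolls the recursion forward
# with future noise set to zero.
import numpy as np

def truncated_forecast(x, phi, theta, h):
    n, p, q = len(x), len(phi), len(theta)
    off = max(p, q)                                           # index of x_1 in the padded arrays
    x_chk = np.concatenate([np.zeros(off), x, np.zeros(h)])   # X-check (zero unobserved past)
    w_chk = np.zeros_like(x_chk)                              # W-check
    for t in range(n):              # in-sample residuals
        i = off + t
        ar = sum(phi[j] * x_chk[i - 1 - j] for j in range(p))
        ma = sum(theta[j] * w_chk[i - 1 - j] for j in range(q))
        w_chk[i] = x_chk[i] - ar - ma
    for t in range(n, n + h):       # out-of-sample forecasts
        i = off + t
        ar = sum(phi[j] * x_chk[i - 1 - j] for j in range(p))
        ma = sum(theta[j] * w_chk[i - 1 - j] for j in range(q))
        x_chk[i] = ar + ma
    return x_chk[off + n: off + n + h]

rng = np.random.default_rng(6)
x = rng.normal(size=200)                                 # placeholder observed series
print(truncated_forecast(x, phi=[0.5], theta=[0.3], h=3))
```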
Prediction intervals
Long-range forecasts

Let (𝑋𝑡 ) be an ARIMA process and let (𝑌𝑡 ) denote the differenced ARMA series. The forecast
curve is the discrete integral of the sequence 𝑌̂𝑛+1|𝑛 , 𝑌̂𝑛+2|𝑛 , ....

If 𝑑 = 1 and 𝜇 ≠ 0, then since the ARMA forecasts 𝑌̂𝑛+𝑗|𝑛 converge to 𝜇 as the horizon grows,

$$\hat{X}_{n+h|n} = X_n + \hat{Y}_{n+1|n} + \hat{Y}_{n+2|n} + \cdots + \hat{Y}_{n+h|n} \approx X_n + h\mu$$
Parameter estimation
Parameter estimation is needed in order to apply these models to data.

Method of Moments (Yule-Walker Equations)

• Find model parameters such that population moments match sample moments.

Multiply both sides of the AR(p) equation by 𝑋𝑡−ℎ and take expectations to obtain the Yule-Walker
equations; in matrix form their solution is

$$\hat{\phi} = \hat{\Gamma}_p^{-1}\, \hat{\gamma}_{1:p}$$
and

$$\hat{\sigma}^2 = \hat{\gamma}_X(0) - \hat{\gamma}_{1:p}^T\, \hat{\Gamma}_p^{-1}\, \hat{\gamma}_{1:p}$$
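A sketch of Yule-Walker estimation for an AR(2) using statsmodels' yule_walker on a simulated series:

```python
# Sketch: solve the sample Yule-Walker equations for an AR(2) and compare
# with the true coefficients of the simulated series.
import numpy as np
from statsmodels.regression.linear_model import yule_walker

rng = np.random.default_rng(7)
phi_true = np.array([0.6, 0.2])
x = np.zeros(2000)
for t in range(2, 2000):                        # simulate AR(2) with unit-variance noise
    x[t] = phi_true @ x[t - 1:t - 3:-1] + rng.normal()

phi_hat, sigma_hat = yule_walker(x, order=2, method="mle")
print(phi_hat.round(3), round(sigma_hat, 3))    # close to [0.6, 0.2] and 1.0
```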
Maximum Likelihood Estimation

The goal of MLE is to maximise the (log-)likelihood function 𝑓𝛽 (𝑥1 , ..., 𝑥𝑛 ) given the data.
$$f_\beta(x_t \mid x_1, ..., x_{t-1}) = \begin{cases} (2\pi\sigma^2)^{-1/2} \exp\Big(-\dfrac{\big(x_t - \phi_{1:p}^T(x_{t-1:t-p} - \mu) - \mu\big)^2}{2\sigma^2}\Big), & t > p \\[2ex] (2\pi\nu_t^2)^{-1/2} \exp\Big(-\dfrac{\big(x_t - \phi_{1:t-1}^T(x_{t-1:1} - \mu) - \mu\big)^2}{2\nu_t^2}\Big), & 1 \le t \le p \end{cases}$$
where
$$S(\mu, \phi_{1:p}) = \sum_{t=1}^{p}\big(x_t - \mu - \phi_{1:t-1}^T(x_{t-1:1} - \mu)\big)^2 \cdot \frac{\sigma^2}{\nu_t^2} + \sum_{t=p+1}^{n}\big(x_t - \mu - \phi_{1:p}^T(x_{t-1:t-p} - \mu)\big)^2$$
Note that the presence of the 𝜈12 , 𝜈22 , ... terms makes 𝑙(𝛽) a very complicated function. This can be
greatly simplified by conditioning on the first 𝑝 values, 𝑋1 = 𝑥1 , ..., 𝑋𝑝 = 𝑥𝑝 .
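For a Gaussian AR(p), conditioning on the first p values reduces the MLE to a least-squares problem; statsmodels' AutoReg fits exactly this kind of conditional regression. A sketch on simulated AR(2) data:

```python
# Sketch: conditional estimation of a Gaussian AR(2) via AutoReg
# (conditional least squares on the simulated series).
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(8)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + rng.normal()

res = AutoReg(x, lags=2, trend="c").fit()
print(res.params.round(3))     # intercept and AR coefficients, near [0, 0.6, 0.2]
```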
Conditional MLE
Asymptotics
Others
Consider the first-order difference equation

$$u_n - \alpha u_{n-1} = 0$$

Iterating from 𝑢0 ,
$$u_1 = \alpha u_0 \tag{13}$$
$$u_2 = \alpha u_1 = \alpha^2 u_0 \tag{14}$$
$$\vdots$$
$$u_n = \alpha u_{n-1} = \alpha^n u_0 \tag{16}$$
(𝐼 − 𝛼𝐵)𝑢𝑛 = 0
𝛼(𝑧) = 1 − 𝛼𝑧 (18)
We want to solve for the roots of 𝛼(𝑧). That is, we want the 𝑧0 that solves 𝛼(𝑧0 ) = 0. Easily,
we can see that 𝑧0 = 𝛼−1 . The solution of the first-order difference equation with initial condition 𝑢0 = 𝑐 is

$$u_n = \alpha^n u_0 \tag{19}$$
$$= (z_0)^{-n} c \tag{20}$$
which suggests that 𝑢𝑛 only depends on the initial condition 𝑢0 and the inverse of the root
associated with 𝛼(𝑧).
𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝑊𝑡
Proof.
Multiply both sides by 𝑋𝑡−ℎ and take expectations. For ℎ = 1 and ℎ = 2,

$$\mathbb{E}[X_t X_{t-1}] = \phi\,\mathbb{E}[X_{t-1} X_{t-1}] + \mathbb{E}[W_t X_{t-1}] \;\Rightarrow\; \gamma_X(1) = \phi\,\gamma_X(0)$$
$$\mathbb{E}[X_t X_{t-2}] = \phi\,\mathbb{E}[X_{t-1} X_{t-2}] + \mathbb{E}[W_t X_{t-2}] \;\Rightarrow\; \gamma_X(2) = \phi\,\gamma_X(1)$$

In general,

$$\gamma_X(h) = \begin{cases} \phi\,\gamma_X(1) + \sigma^2, & h = 0 \\ \phi\,\gamma_X(h-1), & h > 0 \end{cases}$$

Substituting 𝛾𝑋 (1) = 𝜙𝛾𝑋 (0) into the ℎ = 0 case gives the initial value 𝛾𝑋 (0) = 𝜎2 /(1 − 𝜙2 ).
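A quick simulation check of this value (numpy only), with φ = 0.7 and unit noise variance, so γ_X(0) = 1/(1 − 0.49) ≈ 1.96:

```python
# Sketch: check gamma_X(0) = sigma^2 / (1 - phi^2) for an AR(1) by simulation.
import numpy as np

phi, n = 0.7, 100_000
rng = np.random.default_rng(9)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

print(x.var(), 1 / (1 - phi**2))   # both close to 1.96
```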
Now consider a sequence of numbers 𝑢0 , 𝑢1 , 𝑢2 , ... that satisfies the second-order difference equation

$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0$$

For an AR(2) process the ACF satisfies exactly such an equation,

$$u_t = \phi_1 u_{t-1} + \phi_2 u_{t-2}$$

so the values of the ACF can be obtained by solving this difference equation.
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 \tag{21}$$
$$= (1 - z_1^{-1} z)(1 - z_2^{-1} z) \tag{22}$$

where the second equality holds because of the fundamental theorem of algebra.
Writing the 2nd-order difference equation in operator form, and noting that we can perform a
similar factorisation, we get

$$(I - z_1^{-1} B)(I - z_2^{-1} B)\, u_n = 0$$

We can easily check that all solutions to the 2nd-order difference equation are linear combi-
nations of solutions to either a) $(I - z_1^{-1}B)u = 0$ or b) $(I - z_2^{-1}B)u = 0$.
For a), we see that the $u_t$ that satisfies $(I - z_1^{-1}B)u = 0$ is $u_t = z_1^{-t}$. Doing the same thing for b), we come to
the conclusion that all solutions to $(I - z_1^{-1}B)(I - z_2^{-1}B)u = 0$ are linear combinations of $z_1^{-t}$ and $z_2^{-t}$. In other words,

$$u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$$

which can be verified simply by plugging these values back into the second-order difference equation.
So now we can solve for 𝑐1 and 𝑐2 given some initial conditions 𝑢0 and 𝑢1 :

$$u_0 = c_1 + c_2, \qquad u_1 = c_1 z_1^{-1} + c_2 z_2^{-1}$$
Finally, note that 𝑧1 and 𝑧2 may be complex numbers, but the solution to the difference
equation always has to be real. This implies that if 𝑧1 and 𝑧2 are complex, then they are
complex conjugates, which means the imaginary part cancels out in the sum, and we are left
with the real parts which undergo sinusoidal fluctuations.
$$X_t = 1.5 X_{t-1} - 0.75 X_{t-2} + W_t$$

where 𝜎𝑤2 = 1. The AR polynomial of 𝑋𝑡 is

$$\phi(z) = 1 - 1.5z + 0.75z^2$$

whose roots are the complex conjugate pair $z = 1 \pm i/\sqrt{3}$, with argument $\pm\pi/6$.
Since the period of revolution for the cosine function is 2𝜋, the period of 𝑋𝑡 is 2𝜋/(𝜋/6) = 12.
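A sketch of the same computation numerically: find the roots of φ(z) with numpy and read off the period from their argument.

```python
# Sketch: pseudo-cyclic period of the AR(2) above from the complex roots
# of its AR polynomial phi(z) = 1 - 1.5 z + 0.75 z^2.
import numpy as np

roots = np.roots([0.75, -1.5, 1.0])   # coefficients in decreasing degree
theta = np.angle(roots[0])            # argument of one of the conjugate roots
print(np.abs(roots))                  # both ~1.155 > 1, so the process is causal
print(2 * np.pi / abs(theta))         # ~12.0, the period of the ACF oscillation
```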
In general, a 𝑝-th order difference equation is a sequence satisfying

$$u_n + \sum_{i=1}^{p} \alpha_i u_{n-i} = 0$$