
Statistics in Finance

Last time:
Linear regression (cars, lm in R)
Causality & Invertibility examples
AR(2) analysis (YW)
Parameter redundancy (ARMA)
PACF via Durbin-Levinson alg. (AR)

Today:
Parameter redundancy, AR(1)
Parameter estimation (method of moments, OLS)
Forecasting (BLP)

Parameter redundancy I

Why did we consider this exercise?


(Past) Exercise
Consider a process satisfying:

Xt = 0.4Xt−1 + 0.45Xt−2 + εt + εt−1 + 0.25εt−2

What model would best fit this?


Check causality, invertibility and express it, if possible, as a
linear model.
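One way to approach this (a sketch, not part of the original exercise) is to compare the roots of the AR and MA polynomials in base R; a shared root signals a redundant common factor:

```r
# AR polynomial: 1 - 0.4z - 0.45z^2,  MA polynomial: 1 + z + 0.25z^2
# polyroot() expects coefficients in increasing powers of z.
ar.roots <- polyroot(c(1, -0.4, -0.45))   # roots -2 and 10/9
ma.roots <- polyroot(c(1,  1.0,  0.25))   # double root at -2

Mod(ar.roots)   # 2.000, 1.111 -- all outside the unit circle: causal
Mod(ma.roots)   # 2.000, 2.000 -- outside the unit circle: invertible

# The shared root z = -2 means the factor (1 + 0.5z) cancels:
# 1 - 0.4z - 0.45z^2 = (1 + 0.5z)(1 - 0.9z)  and  1 + z + 0.25z^2 = (1 + 0.5z)^2,
# so the process reduces to the ARMA(1,1) form (1 - 0.9B)Xt = (1 + 0.5B)eps_t.
```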

Parameter redundancy II

Motivating example
Consider a process Xt = εt , ∀t.
Then Xt−1 = εt−1 and 0.5Xt−1 = 0.5εt−1 .
Adding the two equations gives:
Xt + 0.5Xt−1 = εt + 0.5εt−1 .
Given this, can we recover the original process?
Φ(B)Xt = Θ(B)εt, with Φ(x) = 1 + 0.5x, Θ(x) = 1 + 0.5x.
Since both polynomials have degree 1, this is ARMA(1,1).
Is it? (1 + 0.5B)Xt = (1 + 0.5B)εt, or (1 + 0.5B)(Xt − εt) = 0.
It suffices to take Xt = εt, so we can recover the original process:
white noise.

Parameter redundancy III

Answer: we considered parameter redundancy because we want a parsimonious model (i.e. one that includes the smallest number of variables necessary).
E.g. ARMA(2,2) has more variables than ARMA(1,1) (and more parameters to estimate).

Causal process, linear model, stationary sol. to AR(1) I

(Slide in lecture week 3)


AR(1) model: Xt = φ1 Xt−1 + εt
If |φ1| < 1 then

    (1 − φ1 B)^{−1} = ∑_{t=0}^{∞} φ1^t B^t

One can show that the series on the RHS, applied to εt, converges (in L2 and a.s.) to a r.v. with finite variance for |φ1| < 1.
In general ∑_{t=0}^{∞} θt B^t converges if ∑_{t=0}^{∞} |θt| < ∞.
Similarly ∑_{t=−∞}^{∞} θt B^t converges if ∑_{t=−∞}^{∞} |θt| < ∞.

Causal process, linear model, stationary sol. to AR(1) II

AR(1): Xt = εt + φ1 Xt−1 = εt + φ1 εt−1 + φ1^2 Xt−2 = . . .
    = εt + · · · + φ1^{n−1} εt−n+1 + φ1^n Xt−n
Now let n → ∞: the partial sum ∑_{j=0}^{n−1} φ1^j B^j(εt) converges to ∑_{j=0}^{∞} φ1^j B^j(εt), which is well defined for |φ1| < 1.
The last term φ1^n Xt−n has mean square (by stationarity):
    E(φ1^{2n} Xt−n^2) = φ1^{2n} E(Xt−n^2) = φ1^{2n} E(X0^2)
Using |φ1| < 1 gives 0 as the limit of this last term as n → ∞ (ARMA assumes finite variance, so E(X0^2) < ∞).
We get the stationary solution: Xt = ∑_{j=0}^{∞} φ1^j B^j(εt) = ∑_{j=0}^{∞} φ1^j εt−j a.s.
It has linear dependence; think MA(∞).
It is causal: it does not depend on any future value εt+j, j > 0.
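As a numerical illustration (not from the slides), a short R sketch can check that a truncated version of this causal sum reproduces a simulated AR(1) path:

```r
# Simulate Xt = phi * X_{t-1} + eps_t and compare X_t with the truncated
# causal sum  sum_{j=0}^{J} phi^j eps_{t-j}.  Illustrative sketch only.
set.seed(1)
phi <- 0.6
n   <- 500
eps <- rnorm(n)
x   <- numeric(n)
for (t in 2:n) x[t] <- phi * x[t - 1] + eps[t]

J  <- 30                  # truncation point: phi^J is negligible for |phi| < 1
t0 <- 200
x.trunc <- sum(phi^(0:J) * eps[t0 - (0:J)])
c(exact = x[t0], truncated.MA = x.trunc)   # the two values agree closely
```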
AR(1) when |φ1 | > 1

When |φ1| > 1 the series ∑_{t=0}^{∞} φ1^t B^t(εt) is not well-defined.
However ∑_{k=1}^{∞} (1/φ1^k) εt+k is well-defined, and it can be shown that

    Xt = − ∑_{k=1}^{∞} (1/φ1^k) εt+k

is the unique stationary solution to AR(1).
BUT, it is not causal: it depends on future values of the white noise process, at odds with reality.

AR(1) when |φ1 | = 1

Cannot define 1/(1 − z) as a power/Laurent series that is absolutely convergent on the unit disk.
No stationary solution exists.
For |φ1| > 1 one can define the Laurent series¹:

    1/(1 − φ1 z) := −(1/(φ1 z)) (1 − 1/(φ1 z))^{−1} = −1/(φ1 z) − 1/(φ1^2 z^2) − 1/(φ1^3 z^3) − . . .

¹ Laurent series include negative powers, unlike Taylor series.

Wold’s decomposition theorem

Wold’s representation theorem


Any stationary time series process can be represented as the
sum of two mutually uncorrelated processes: one being a
deterministic process and the other a causal MA(∞) process.

The deterministic process can be:
linear: remove a linear trend
constant: remove a constant trend
deterministic cycle: remove a cosine trend
Corollary: a dynamic time series process that is stationary can be approximated with a causal linear process (and possibly a deterministic trend).
Even an AR (or more generally ARMA) process can be represented as a causal linear process (see the AR(1) stationary solution).
Estimation of parameters for the chosen model I

Given a time series: X1 , . . . , Xn


Compute the sample characteristics:
sample mean: X̄
sample ACF: ρ̂(h)
Assume model selection was done already, and a particular model has been chosen for the data.
Thus, under stationarity assumptions, the data follow a causal and invertible ARMA(p,q) model, with p and q known, but with unknown parameters φi, θj.
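As a minimal R illustration (simulated placeholder data, not from the slides), the sample characteristics can be obtained with base R:

```r
# Sample characteristics; x is placeholder simulated data standing in for the series.
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 200))

xbar    <- mean(x)                                  # sample mean X-bar
rho.hat <- acf(x, lag.max = 20, plot = FALSE)$acf   # sample ACF rho-hat(h), incl. lag 0
```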

Estimation of parameters for the chosen model II

The problem is that of the estimation of these parameters.


An estimator is a statistic, i.e. a rule for using the data (e.g. the average).
An estimate is the numerical value of an estimator when applied to a particular data set.
The same estimator results in different estimates when applied to different data sets (the average gives different results when applied to height data vs weight data).

Method of moments I

This method equates the theoretical moments of the model (population) to the sample moments of the data, and solves for the parameters to obtain their estimates.
Recall: the moment of order n is E(X^n).
Mean = moment of order 1
(so equate the theoretical mean with the sample mean and solve for the parameter).

Method of moments II

Example of parameter estimation in the AR(p) model

    Xt = φ1 Xt−1 + . . . + φp Xt−p + εt

Denote ri = ρ̂(i) = ci/c0 = ((1/(n−1)) ∑_k Xk Xk+i) / ((1/(n−1)) ∑_k Xk^2), ∀i ∈ {1, . . . , p}, and let c0 = γ̂(0) be the sample variance (say (1/(n−1)) ∑_{k=1}^{n} Xk^2).
Then the method of moments combined with the Yule-Walker equations gives the Yule-Walker estimators φ̂1, . . . , φ̂p, σ̂ε^2 satisfying:

    ρ̂(h) = φ̂1 ρ̂(h − 1) + . . . + φ̂p ρ̂(h − p),   h ∈ {1, . . . , p}

    σ̂ε^2 = γ̂(0)[1 − φ̂1 ρ̂(1) − . . . − φ̂p ρ̂(p)]
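A hedged R sketch of these Yule-Walker estimators, on simulated placeholder data (base R's ar.yw() implements the same idea, up to centring and normalisation details):

```r
# Yule-Walker estimation for an AR(p), following the slide's equations (sketch).
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 500))
p <- 2

r <- acf(x, lag.max = p, plot = FALSE, demean = FALSE)$acf[-1]   # r1, ..., rp
R <- toeplitz(c(1, r[-p]))                 # matrix with entries rho-hat(|i - j|)
phi.hat  <- solve(R, r)                    # Yule-Walker estimates of phi_1, ..., phi_p
c0       <- mean(x^2)                      # gamma-hat(0) (mean-zero convention)
sig2.hat <- c0 * (1 - sum(phi.hat * r))    # estimate of sigma_eps^2

# Base R equivalent (it centres the series first):
# ar.yw(x, aic = FALSE, order.max = p)$ar
```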

Method of moments III

AR(1): obtain the equations r1 = φ̂1 · 1 (since r0 = ρ(0) = 1) and σ̂ε^2 = c0[1 − φ̂1 r1], with solutions:
φ̂1 = r1 and σ̂ε^2 = c0(1 − r1^2).

Method of moments IV

AR(2): write the equations satisfied by the Yule-Walker estimators and solve them.
Hint: one equation for σ̂ε^2, one for h = 1 and one for h = 2, using ρ(−h) = ρ(h). That is three equations, with unknowns φ̂1, φ̂2 and σ̂ε^2.

Method of moments V

MA(1): Xt = εt + θ1 εt−1.
Recall the theoretical autocorrelation function has a single non-zero spike at lag 1, given by ρ(1) = θ1 / (1 + θ1^2).

Method of moments: equate theoretical with sample values and solve for the parameters:

    ρ(1) = r1 = c1/c0 = ((1/(n−1)) ∑ Xi Xi+1) / ((1/(n−1)) ∑ Xi^2)  ⟹  θ1 = 1/(2 r1) ± sqrt(1/(4 r1^2) − 1)

(assuming σε^2 = 1)

Method of moments VI

Exercise: Estimate the parameters with the method of moments for MA(1) with r1 = 0.2 and write the equations satisfied by the data. What do you notice in terms of the dependence structure (i.e. ρ(h))?

We obtain two solutions, θ1 = 0.208 or θ1 = 4.791, so Xt = εt + 0.208 εt−1 and Xt = εt + 4.791 εt−1 have the same dependence structure. Which MA representation do we retain, and why?
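A quick R check (illustrative only) that both solutions reproduce r1 = 0.2, using stats::ARMAacf:

```r
# Both method-of-moments solutions for r1 = 0.2 give the same ACF.
r1    <- 0.2
theta <- 1 / (2 * r1) + c(-1, 1) * sqrt(1 / (4 * r1^2) - 1)
theta                                    # 0.2087 and 4.7913

ARMAacf(ma = theta[1], lag.max = 2)      # lag-1 autocorrelation = 0.2
ARMAacf(ma = theta[2], lag.max = 2)      # lag-1 autocorrelation = 0.2
# The invertible choice (|theta| < 1), i.e. theta = 0.208, is the one usually retained.
```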

Comparison: Yule-Walker vs OLS estimators I

OLS for AR(1)

    ( X2  )   ( X1   )        ( ε2 )
    ( X3  ) = ( X2   ) φ1  +  ( ε3 )
    ( ··· )   ( ···  )        ( ···)
    ( Xn  )   ( Xn−1 )        ( εn )

or X = Zβ + ε

OLS: β̂ = (Z′Z)^{−1}(Z′X) = (∑ Xi^2)^{−1} (∑ Xi Xi+1)
       = ((1/(n−1)) ∑ Xi Xi+1) / ((1/(n−1)) ∑ Xi^2) = c1/c0 = r1.

Therefore: φ̂1 = r1 (compare with YW).
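An illustrative R comparison on simulated data (not from the slides): the OLS fit via lm() against r1:

```r
# OLS estimate of phi_1 for an AR(1), compared with r1 (illustrative sketch).
set.seed(2)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 400))
n <- length(x)

phi.ols <- coef(lm(x[2:n] ~ 0 + x[1:(n - 1)]))   # regress X_t on X_{t-1}, no intercept
r1      <- acf(x, lag.max = 1, plot = FALSE, demean = FALSE)$acf[2]

c(OLS = unname(phi.ols), r1 = r1)                # essentially equal for large n
```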

Comparison: Yule-Walker vs OLS estimators II

OLS for AR(2)

    ( X3  )   ( X2     X1   )            ( ε3 )
    ( X4  ) = ( X3     X2   ) ( φ1 )  +  ( ε4 )
    ( ··· )   ( ···    ···  ) ( φ2 )     ( ···)
    ( Xn  )   ( Xn−1   Xn−2 )            ( εn )

or X = Zβ + ε

Derive
    β̂ = ( r1(1 − r2) / (1 − r1^2),  (r2 − r1^2) / (1 − r1^2) )′
and compare it with the YW estimators.
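A possible R check (simulated placeholder data): compute this closed form from (r1, r2) and compare it with a direct OLS regression:

```r
# AR(2): closed-form estimates from (r1, r2) versus OLS (sketch, not from the slides).
set.seed(3)
x <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 500))
n <- length(x)

r       <- acf(x, lag.max = 2, plot = FALSE, demean = FALSE)$acf[-1]   # r1, r2
phi.cf  <- c(r[1] * (1 - r[2]), r[2] - r[1]^2) / (1 - r[1]^2)          # formula above
phi.ols <- coef(lm(x[3:n] ~ 0 + x[2:(n - 1)] + x[1:(n - 2)]))          # OLS regression

rbind(closed.form = phi.cf, OLS = unname(phi.ols))                     # very close
```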

Comparison: Yule-Walker vs OLS estimators III

Prop
i) YW estimators are optimal for AR(p), just as OLS estimators are.
ii) YW estimators are suboptimal for MA(q) or ARMA(p,q) (these models are not linear in the parameters).

(But they can be used as starting values for MLE with Gauss-Newton iterations.)

Remark: some argue against YW for AR(2) because of poor behaviour when the solutions of the characteristic equation are close to 1, within the order of 10^{−1} to 10^{−3}. Check whether the autocovariance matrix is almost singular; if so, use alternative methods (e.g. Burg).

Forecasting I

Goal: Given a stationary time series X1, . . . , Xn from a known model with known parameters (estimated already), predict future values Xn+m, m ≥ 1.

Definition (1-step ahead)

The Best Linear Predictor (BLP) of Xn+1 is:

    X^n_{n+1} = φn1 Xn + φn2 Xn−1 + . . . + φnn X1

where the coeff φni are chosen to minimize the
Mean Square Prediction Error: P^n_{n+1} = E[(Xn+1 − X^n_{n+1})^2]

Theorem
Prediction error and prediction variables are orthogonal:

    E[(Xn+1 − X^n_{n+1}) Xi] = 0, ∀i = 1, . . . , n

Forecasting II

Corollary
The BLP coeff φni satisfy:

    ∑_{j=1}^{n} φnj γ(k − j) = γ(k), ∀k = 1, . . . , n.    (1)

Proof: From the Theorem, (Xn+1 − X^n_{n+1}) ⊥ Xi, so E[Xn+1 Xi] = E[∑_{j=1}^{n} φnj Xn+1−j Xi].
Thus γ(n + 1 − i) = ∑_{j=1}^{n} φnj γ(n + 1 − j − i); denote k := n + 1 − i to get the result.

Forecasting III

BLP coeff
Matrix form: γn = Γn φn, where

    γn = ( γ(1), γ(2), . . . , γ(n) )′

         ( γ(0)       γ(1)      . . .  γ(n − 1) )
    Γn = ( γ(1)       γ(0)      . . .  γ(n − 2) )
         (  ···        ···      . . .    ···    )
         ( γ(n − 1)   γ(n − 2)  . . .  γ(0)     )

    φn = ( φn1, φn2, . . . , φnn )′

ARMA: φ̂n = Γn^{−1} γn,   P^n_{n+1} = γ(0) − γn′ Γn^{−1} γn
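A small R sketch of these formulas (the autocovariance function gamma.fn below is a hypothetical helper, here set to that of an AR(1) with φ = 0.6 and σε^2 = 1):

```r
# BLP weights and one-step MSPE from a given autocovariance function (sketch).
# gamma.fn returns gamma(h); for this AR(1), gamma(h) = phi^|h| / (1 - phi^2).
gamma.fn <- function(h, phi = 0.6) phi^abs(h) / (1 - phi^2)

n       <- 5
Gamma.n <- outer(1:n, 1:n, function(i, j) gamma.fn(i - j))   # Toeplitz matrix Gamma_n
gamma.n <- gamma.fn(1:n)                                     # vector gamma_n

phi.n <- solve(Gamma.n, gamma.n)              # BLP coefficients phi_n1, ..., phi_nn
round(phi.n, 10)                              # ~ (0.6, 0, 0, 0, 0), as expected
P.n1  <- gamma.fn(0) - sum(gamma.n * phi.n)   # one-step MSPE; equals sigma_eps^2 = 1
```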

Example: prediction for AR(2)

n = 1 (prediction based on one observation X1): BLP X^1_2 = φ11 X1 with φ11 γ(1 − 1) = γ(1), so X^1_2 = ρ(1) X1.
n = 2 (prediction based on two observations X1, X2): BLP X^2_3 = φ21 X2 + φ22 X1 with φ21 = φ1, φ22 = φ2.
n > 2: BLP X^n_{n+1} = ∑_{j=1}^{n} φnj Xn+1−j where φn1 = φ1, φn2 = φ2, φn3 = · · · = φnn = 0, so X^n_{n+1} = φ1 Xn + φ2 Xn−1.
For the AR(p) model P^n_{n+1} = σε^2.
For invertible MA(q) or ARMA(p,q) use truncated prediction, i.e. truncate the infinite sum to the number of data points available.

m-step ahead prediction

Goal: what is the best guess of Xn+m given X1, . . . , Xn?

    X^n_{n+m} = φn1^{(m)} Xn + φn2^{(m)} Xn−1 + . . . + φnn^{(m)} X1

where

    γ(m + k − 1) = ∑_{j=1}^{n} φnj^{(m)} γ(k − j),   ∀k = 1, . . . , n.
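Continuing the same hypothetical AR(1) example, the m-step weights follow from the same kind of linear system (sketch):

```r
# m-step-ahead BLP weights for the hypothetical AR(1) autocovariance (sketch).
gamma.fn <- function(h, phi = 0.6) phi^abs(h) / (1 - phi^2)

n <- 5; m <- 3
Gamma.n   <- outer(1:n, 1:n, function(i, j) gamma.fn(i - j))
gamma.n.m <- gamma.fn(m + (1:n) - 1)     # gamma(m), gamma(m+1), ..., gamma(m+n-1)
phi.n.m   <- solve(Gamma.n, gamma.n.m)   # ~ (phi^m, 0, ..., 0) for an AR(1)
```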

PACF via Durbin-Levinson algorithm I

PACF
Plotting φnn for n = 1, 2, . . . gives the PACF (partial
autocorrelation function)

It represents the (partial / additional) correlation between Xt−n and Xt beyond that accounted for by Xt−1, . . . , Xt−n+1 (see the 1-step ahead BLP formula).

PACF via Durbin-Levinson algorithm II

Durbin-Levinson alg. for PACF

Solve for φnn recursively, without any matrix inversion:

n = 0:  φ00 = 0,  P^0_1 = γ(0)

n ≥ 1:  φnn = [ρ(n) − ∑_{k=1}^{n−1} φn−1,k ρ(n − k)] / [1 − ∑_{k=1}^{n−1} φn−1,k ρ(k)],   P^n_{n+1} = P^{n−1}_n (1 − φnn^2)

where φnk = φn−1,k − φnn φn−1,n−k for n ≥ 2, k = 1, . . . , n − 1.
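A direct R implementation of this recursion (a sketch; the helper name durbin_levinson_pacf is ours), checked against stats::ARMAacf:

```r
# Durbin-Levinson recursion for the PACF, given rho with rho[h] = rho(h), h = 1..H.
durbin_levinson_pacf <- function(rho) {
  H    <- length(rho)
  pacf <- numeric(H)
  phi.prev <- numeric(0)                  # phi_{n-1, 1..n-1}
  for (n in 1:H) {
    if (n == 1) {
      phi.nn  <- rho[1]
      phi.new <- phi.nn
    } else {
      num     <- rho[n] - sum(phi.prev * rho[(n - 1):1])
      den     <- 1 - sum(phi.prev * rho[1:(n - 1)])
      phi.nn  <- num / den
      phi.new <- c(phi.prev - phi.nn * rev(phi.prev), phi.nn)   # update phi_{n,k}
    }
    pacf[n]  <- phi.nn
    phi.prev <- phi.new
  }
  pacf
}

# Check against stats::ARMAacf for an ARMA(1,1):
rho <- ARMAacf(ar = 0.6, ma = 0.4, lag.max = 5)[-1]
rbind(DL = durbin_levinson_pacf(rho),
      R  = ARMAacf(ar = 0.6, ma = 0.4, lag.max = 5, pacf = TRUE))
```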

PACF via Durbin-Levinson algorithm III

Interpretation:
Eq. (1), satisfied by the BLP coefficients, is similar to the normal equations of regression.
Thus the coefficient φnn is like the auto-regression coefficient of Xn+1 on X1, measuring the impact of X1 on Xn+1 while everything else is held constant (hence a partial correlation).
For AR(p) the sample PACF satisfies φ̂nn → 0 for n > p (with asymptotic variance the inverse of the number of observations), and φ̂pp → φp.

ARMA(1,1) I

Consider a process satisfying (1 − φB)Xt = (1 + θB)εt.
Then the autocorrelation function can be computed as:

    ρ(h) = [(1 + θφ)(φ + θ) / (1 + 2θφ + θ^2)] φ^{h−1},   h ≥ 1.

Note: the ACF decreases to zero and does not cut off at any lag.
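A quick R check of this formula against stats::ARMAacf (illustrative values φ = 0.6, θ = 0.4):

```r
# Check of the ARMA(1,1) ACF formula against stats::ARMAacf (sketch).
phi <- 0.6; theta <- 0.4
h   <- 1:6
rho.formula <- (1 + theta * phi) * (phi + theta) /
               (1 + 2 * theta * phi + theta^2) * phi^(h - 1)
rho.R       <- ARMAacf(ar = phi, ma = theta, lag.max = 6)[-1]
round(rbind(formula = rho.formula, ARMAacf = rho.R), 4)   # identical values
```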

ARMA(1,1) II

[Figure: ACF of an ARMA(1,1) process, tailing off towards zero without cutting off.]
ARMA(1,1) III

The ACF of ARMA(1,1) is just like that of an AR process, so how do we distinguish between ARMA and AR based on the ACF?
Answer: it is not possible; we also need the PACF, which is zero for lags greater than p for AR(p), while it tails off for ARMA(p,q). (Tails off = decreases quickly towards zero without reaching it.)

AR vs ARMA via PACF I

[Figure]
AR vs ARMA via PACF II

[Figure]
AR vs ARMA via PACF III

(Recall the ACF can identify MA and its order.)

AR(1) and ARMA(1,1) have similar (non-informative) ACFs
The PACF of AR(1) has one non-zero spike
The PACF of ARMA(1,1) decreases towards zero
The PACF is used to distinguish AR from ARMA (see the simulation sketch below)
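A minimal simulation sketch (not from the slides) illustrating these points with arima.sim and pacf:

```r
# Sample PACFs of an AR(1) and an ARMA(1,1) with comparable ACFs;
# only the AR(1) PACF cuts off after lag 1.
set.seed(4)
x.ar   <- arima.sim(model = list(ar = 0.6), n = 1000)
x.arma <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 1000)

pacf(x.ar,   lag.max = 10, main = "PACF of AR(1)")      # single spike at lag 1
pacf(x.arma, lag.max = 10, main = "PACF of ARMA(1,1)")  # tails off over several lags
```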
