Part VI: Dynamics
• Whether or not our initial theory suggests dynamics, time series data often exhibit dependence across adjacent observations that must be modeled.
This large section of the course deals with models in which adjacent observations
are related in one way or another. The most prominent application of this form of
model is with time-series data, although spatial data can have similar characteristics.
$$y_t = \mu + \sum_{i=1}^{p}\gamma_i y_{t-i} + \sum_{j=0}^{r}\beta_j x_{t-j} + \varepsilon_t,$$
For standard estimation and inference in this model, several conditions are needed:
1. The disturbances are well behaved: i.i.d., with no autocorrelated or MA errors. [note: here x denotes all RHS vars, including lagged dep. vars.]
2. A regularity condition: $\operatorname{plim}\frac{1}{T-s}\sum_{t=s+1}^{T}x_t x_{t-s}' = Q(s)$, a finite matrix.
3. (a) Stationarity: For a model $y_t = \mu + \sum_{i=1}^{p}\gamma_i y_{t-i} + \varepsilon_t$, the roots $z$ of the characteristic equation $C(z) = 1 - \gamma_1 z - \cdots - \gamma_p z^p$ lie outside the unit circle.
4. (required for ARDL models) The regression relationship between y and its regressors x is “stable” over time: either the series are “cointegrated” (note that each series y, x need not be stationary, just the relationship between them), or y and x are each themselves stationary series.
When one or more of these conditions doesn't hold, a number of approaches are available, discussed below.
The lag operator L is defined by
$$Lx_t = x_{t-1}, \qquad L(Lx_t) = L^2 x_t = x_{t-2}, \qquad L^p x_t = x_{t-p}.$$
Differences can be written with the lag operator:
$$(1-L)x_t = x_t - x_{t-1} = \Delta x_t \ \text{(first difference)}, \qquad (1-L)^2 x_t = x_t - 2x_{t-1} + x_{t-2} = \Delta^2 x_t \ \text{(second difference)}.$$
$$A(L) = 1 + aL + (aL)^2 + (aL)^3 + \cdots = \sum_{i=0}^{\infty}(aL)^i = \frac{1}{1-aL}.$$
$$\frac{x_t}{1-aL} = x_t\sum_{i=0}^{\infty}(aL)^i = \sum_{i=0}^{\infty}(aL)^i x_t = \sum_{i=0}^{\infty}a^i x_{t-i}.$$
The general model can be written with lag polynomials as
$$C(L)y_t = \mu + R(L)\varepsilon_t,$$
where $R(L)$ defines a moving average process in $\varepsilon$, such as $R(L) = 1 - \theta L$ for an MA(1) disturbance.
• Weak stationarity is satisfied if the root(s) $z$ of $C(z)$ lie outside the unit circle.
• When a moving average disturbance process exists, nonlinear least squares (or maximum likelihood) is required for estimation.
For example, inverting the moving average polynomial turns the model into an (infinite) autoregressive process of y:
$$\varepsilon_t = D(L)y_t - \frac{\mu}{R(L)}, \qquad D(L) \equiv \frac{C(L)}{R(L)}.$$
Three leading nonstationary processes are the random walk, $y_t = y_{t-1} + u_t$; the random walk with drift, $y_t = \alpha + y_{t-1} + u_t$; and the trend stationary series, $y_t = \mu + \beta t + u_t$. Each of these can be characterized as $(1-L)y_t = \alpha + \varepsilon_t$, where $\varepsilon_t$ is white noise. For example, consider the trend stationary case. Take first differences:
$$\Delta y_t = \beta + (u_t - u_{t-1}) = \beta + \varepsilon_t,$$
where $\varepsilon_t = u_t - u_{t-1}$ is a moving average error, MA(1). In any of these cases, the root of the characteristic equation for $(1-L)y_t$ equals one: a unit root.
For the random walk, substituting back to the initial value,
$$y_1 = y_0 + u_1, \qquad y_2 = (y_0 + u_1) + u_2, \qquad \ldots, \qquad y_t = y_0 + \sum_{s=1}^{t}u_s,$$
so
$$E[y_t] = y_0; \qquad \operatorname{Var}[y_t] = \sum_{s=1}^{t}\sigma_u^2 = t\,\sigma_u^2.$$
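As a quick check of this variance result, here is a minimal simulation sketch (not part of the original notes; it assumes only numpy, and all names and values are illustrative):

```python
import numpy as np

# For a pure random walk y_t = y_{t-1} + u_t with Var[u_t] = sigma^2 and y_0 = 0,
# the variance of y_t across many replications should grow roughly like t*sigma^2.
rng = np.random.default_rng(0)
T, n_reps, sigma = 200, 5000, 1.0

u = rng.normal(0.0, sigma, size=(n_reps, T))
y = np.cumsum(u, axis=1)                      # y_t = sum of shocks through t

for t in (10, 50, 200):
    print(t, round(float(y[:, t - 1].var()), 2), t * sigma**2)  # sample vs. theoretical variance
```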
2. $\hat\gamma_{OLS}$ converges to its probability limit faster than in the standard stationary case, and its t-ratio does not have the usual distribution under the unit-root null, so with conventional critical values we would reject the null too often; Dickey–Fuller critical values are needed.
• Take a random walk with a drift, and substitute lags back to infinity, and you
get:
$$y_t = \mu + y_{t-1} + \varepsilon_t = \sum_{i=0}^{\infty}(\mu + \varepsilon_{t-i})$$
$$y_t - y_{t-1} = \sum_{i=0}^{\infty}(\mu + \varepsilon_{t-i}) - \sum_{i=1}^{\infty}(\mu + \varepsilon_{t-i})$$
$$\Delta y_t = \mu + \varepsilon_t$$
• Note that a unit root implies that past levels of y do not provide any information beyond the most recent observation for forecasting future values.
• A series is integrated of order d, written I(d), if the dth differences of $y_t$ are stationary.
• For an individual (univariate) time series, the solution to the estimation problems associated with a unit root is to difference the data until it is I(0). Then proceed with estimation (e.g., an ARMA model) on the differenced series.
• For two or more related series in a regression, differencing of the data is not necessarily appropriate, because it can discard the long-run relationship among the levels; see the discussion of cointegration below.
The Dickey–Fuller test nests a random walk, a random walk with a drift, and a trend stationary series:
$$(1-\gamma L)(y_t - \alpha - \beta t) = \varepsilon_t.$$
Rearranging and defining $\gamma^* = \gamma - 1$ gives a regression of $\Delta y_t$ on a constant, a trend, and $y_{t-1}$; under the null of a unit root ($\gamma = 1$, so $\gamma^* = 0$) the model collapses to
$$\Delta y_t = \beta + \varepsilon_t,$$
a random walk with drift.
• The alternative is one-sided: stationarity requires $\gamma^* < 0$ (in general $\gamma^* \le 0$).
• The test statistic is calculated as the usual t-statistic, $\hat\gamma^*/se(\hat\gamma^*)$, but it does not have the usual t distribution under the unit-root null; Dickey–Fuller critical values must be used.
• Table 20.5 (Greene) provides three sets of critical values. If you KNOW your model doesn't include a constant and/or a trend, then the appropriate critical values will provide a more powerful test (reduce the chances of a type II error).
• If |γ̂ ∗ /se(γ̂ ∗ )| < |critical value|, then fail to reject null, so difference or de-
trend the data. Else proceed with estimation on the original (nondifferenced)
data.
Augmented Dickey–Fuller (ADF) test: same concept, but add lagged differences (out to p − 1) on the right hand side:
$$\Delta y_t = \alpha + \beta t + \gamma^* y_{t-1} + \sum_{i=1}^{p-1}\phi_i\,\Delta y_{t-i} + \varepsilon_t,$$
where $\Delta y_{t-i}$ is the ith lag of the first-difference of the dependent variable (first difference: $\Delta y_{t-i} = y_{t-i} - y_{t-i-1}$).
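In practice the ADF regression is rarely run by hand. A minimal sketch using statsmodels' `adfuller` (this assumes the library is available; the simulated series is purely illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=500))       # a random walk, I(1) by construction

# Constant included, lag length chosen by AIC
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print("ADF stat:", round(stat, 2), "p-value:", round(pvalue, 3), "5% crit:", round(crit["5%"], 2))
# Large p-value: fail to reject the unit-root null, so difference the data and re-test.
print("p-value after differencing:", round(adfuller(np.diff(y))[1], 4))
```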
Once we have a stationary series, we can begin to examine the data more closely, specify a time series model, and (ultimately) consistently estimate the parameters of that model.
Our goal is to characterize the relationship between $y_t$ and lags of $y_t$ using the estimated autocovariances and autocorrelations.
Notation:
• $\lambda_0 \equiv \operatorname{Var}[y_t]$.
• The autocovariances are $\lambda_0, \lambda_1, \lambda_2, \cdots$, where $\lambda_k \equiv \operatorname{Cov}[y_t, y_{t-k}]$.
• The autocorrelation coefficient is $\rho_k = \dfrac{E[y_t y_{t-k}]}{\sqrt{\operatorname{Var}(y_t)}\sqrt{\operatorname{Var}(y_{t-k})}} = \dfrac{\lambda_k}{\lambda_0}$, i.e., the autocovariances divided through by $\lambda_0$.
For estimation we will work mainly with the ACF and the Partial Autocorrelation Function (PACF).
2.5.1 Variance of y
$$y_t = \frac{\mu}{C(1)} + A(L)\varepsilon_t = \frac{\mu}{C(1)} + \sum_{i=0}^{\infty}\alpha_i\varepsilon_{t-i}, \qquad\text{where } A(L) = R(L)/C(L).$$
Then
$$\lambda_0 = \operatorname{Var}[y_t] = \sum_{i=0}^{\infty}\alpha_i^2\,\sigma_\varepsilon^2.$$
For an AR(2) process, $y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \varepsilon_t$, multiply through by $y_t$, $y_{t-1}$, and $y_{t-2}$ in turn and take expectations to obtain the Yule–Walker equations:
$$\lambda_0 = \gamma_1\lambda_1 + \gamma_2\lambda_2 + E[y_t\varepsilon_t] = \gamma_1\lambda_1 + \gamma_2\lambda_2 + \sigma_\varepsilon^2,$$
and similarly
$$\lambda_1 = \gamma_1\lambda_0 + \gamma_2\lambda_1, \qquad \lambda_2 = \gamma_1\lambda_1 + \gamma_2\lambda_0.$$
Solve for $\lambda_0$ to get
$$\lambda_0 = \sigma_y^2 = \frac{(1-\gamma_2)\,\sigma_\varepsilon^2}{(1+\gamma_2)\left[(1-\gamma_2)^2 - \gamma_1^2\right]},$$
then plug this into the formulas for $\lambda_1$ and $\lambda_2$ to get the autocovariances. The autocorrelation coefficients for the first two lags are
$$\rho_1 = \frac{\lambda_1}{\lambda_0} = \gamma_1 + \gamma_2\rho_1 \;\Longrightarrow\; \rho_1 = \frac{\gamma_1}{1-\gamma_2},$$
$$\rho_2 = \frac{\lambda_2}{\lambda_0} = \gamma_1\rho_1 + \gamma_2 \;\Longrightarrow\; \rho_2 = \frac{\gamma_1^2}{1-\gamma_2} + \gamma_2.$$
Generally, for $k \ge 2$,
$$\lambda_k = \gamma_1\lambda_{k-1} + \gamma_2\lambda_{k-2}, \qquad\text{so}\qquad \rho_k = \gamma_1\rho_{k-1} + \gamma_2\rho_{k-2}.$$
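A quick numerical sketch of this recursion (not from the notes; it assumes numpy and arbitrarily chosen, stationary values of $\gamma_1,\gamma_2$) compares the implied ACF with sample autocorrelations from a long simulated AR(2):

```python
import numpy as np

g1, g2 = 0.5, 0.3                      # assumed AR(2) coefficients (stationary)
rho = np.empty(8)
rho[0], rho[1] = 1.0, g1 / (1 - g2)    # rho_1 = gamma_1/(1 - gamma_2)
for k in range(2, 8):
    rho[k] = g1 * rho[k - 1] + g2 * rho[k - 2]

rng = np.random.default_rng(2)
T = 100_000
y = np.zeros(T)
eps = rng.normal(size=T)
for t in range(2, T):                  # simulate y_t = g1*y_{t-1} + g2*y_{t-2} + eps_t
    y[t] = g1 * y[t - 1] + g2 * y[t - 2] + eps[t]

sample_rho = [np.corrcoef(y[k:], y[:T - k])[0, 1] for k in range(8)]
print(np.round(rho, 3))
print(np.round(sample_rho, 3))         # should be close to the theoretical values
```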
For an MA(q) process, $y_t = \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}$, the autocovariances are
$$\gamma_0 = E[y_t^2] = \sigma_\varepsilon^2\Big(1 + \sum_{i=1}^{q}\theta_i^2\Big),$$
$$\gamma_k = \operatorname{Cov}[y_t, y_{t-k}] = \sigma_\varepsilon^2\Big(-\theta_k + \sum_{i=k+1}^{q}\theta_{i-k}\theta_i\Big) \quad\text{for } 1 \le k \le q,$$
and $\gamma_k = 0$ for $k > q$.
Exercises: derive the ACF for an MA(1), then for an ARMA(1,1): $(1-\gamma L)y_t = (1-\theta L)\varepsilon_t$.
The autocorrelation $\rho_k$ is the gross correlation between $y_t$ and $y_{t-k}$. The partial autocorrelation, $\rho^*_k$, is the simple linear correlation between $y_t$ and $y_{t-k}$ after controlling for the intervening values $y_{t-1},\ldots,y_{t-k+1}$. E.g., for an AR(1), $y_t = \gamma y_{t-1} + \varepsilon_t$, the correlation at lag 2 is $\rho_2 = \operatorname{corr}(y_t, y_{t-2}) = \gamma^2$, but the partial autocorrelation at lag 2 is zero. The partial autocorrelations can be defined through the sequence of regressions
$$y_t = \rho^*_1 y_{t-1} + e_t, \qquad y_t = \beta_1 y_{t-1} + \rho^*_2 y_{t-2} + e_t,$$
etc.
An invertible process can also be written in autoregressive form, $R(L)^{-1}(C(L)y_t - \mu) = \varepsilon_t$. This implies:
$$y_t = \frac{\mu}{R(L)} + \sum_{i=1}^{\infty}\pi_i y_{t-i} + \varepsilon_t,$$
an AR(∞) whose coefficients $\pi_i$ die out as the lag gets larger.
The sample autocorrelations are calculated as
$$r_k = \frac{\sum_{t=k+1}^{T} y_t y_{t-k}}{\sum_{t=1}^{T} y_t^2}.$$
The partial autocorrelations could be estimated based on the progressive regressions above, but usually they are estimated in the following way: regress $y_t$ and $y_{t-k}$ (separately) on $1, y_{t-1}, \ldots, y_{t-(k-1)}$, keep the residuals $y_t^*$ and $y_{t-k}^*$, and calculate
$$r_k^* = \frac{\sum_{t=k+1}^{T} y_t^*\, y_{t-k}^*}{\sum_{t=k+1}^{T}\left(y_{t-k}^*\right)^2}.$$
This machinery applies to all of the basic ARMA processes; we will not discuss every case in class, but understand how to implement it.
The Wold Decomposition theorem states that every zero-mean covariance stationary process can be written as
$$y_t = \sum_{i=1}^{p}\gamma_i y_{t-i} + \sum_{i=0}^{\infty}\pi_i\varepsilon_{t-i}.$$
The Box–Jenkins (ARIMA) approach to modeling such a series can be broken into 3 steps: identification (difference to stationarity, then use the ACF/PACF and tests for white noise (i.i.d.) errors to determine the appropriate lag structure), estimation of the model parameters, and diagnostic checking/forecasting, including forecast variances.
NOTE: this class will only scratch the surface of ARIMA modeling.
The goal of the Box-Jenkins approach is to specify the most parsimonious model that is consistent with the data.
Once you have a stationary series, you can test for non-zero ACFs and PACFs. The sample ACFs and PACFs will be approximately N(0, 1/T) under the null hypothesis of white noise. A test statistic for the joint test of whether all elements of the ACF (PACF) out to lag p are zero is the (Ljung–Box) Q statistic,
$$Q' = T(T+2)\sum_{k=1}^{p}\frac{r_k^2}{T-k},$$
which is approximately $\chi^2(p)$ under the null.
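A small sketch of this statistic (my own illustration, assuming numpy and scipy; in applied work the degrees of freedom are usually reduced by the number of estimated ARMA parameters):

```python
import numpy as np
from scipy import stats

def ljung_box_q(y, p):
    """Q' = T(T+2) * sum_{k=1..p} r_k^2/(T-k); approximately chi2(p) under white noise."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    T = len(y)
    r = np.array([np.sum(y[k:] * y[:T - k]) / np.sum(y**2) for k in range(1, p + 1)])
    Q = T * (T + 2) * np.sum(r**2 / (T - np.arange(1, p + 1)))
    return Q, 1 - stats.chi2.cdf(Q, df=p)

rng = np.random.default_rng(3)
print(ljung_box_q(rng.normal(size=300), p=10))   # white noise: large p-value

y = np.zeros(300)
e = rng.normal(size=300)
for t in range(1, 300):                          # an AR(1) series: test should reject
    y[t] = 0.7 * y[t - 1] + e[t]
print(ljung_box_q(y, p=10))
```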
1. Generate the autocorrelogram and partial autocorrelogram for the series. If, for any lag, the Q statistic leads to rejection of the null hypothesis of white noise in the ACF and/or PACF, then add AR or MA model components based on the patterns observed in the ACF and PACF.
2. Test for white noise in the residuals of the ARIMA model you have specified.
If tests indicate white noise for all lags in the correlograms, then stop. Else,
respecify the model and check the residuals again for white noise.
3. The Akaike and Schwarz information criteria can also be used to help select a model specification.
Recall that the Wold Decomposition theorem states that every zero-mean covariance stationary process can be written as
$$y_t = \sum_{i=1}^{p}\gamma_i y_{t-i} + \sum_{i=0}^{\infty}\pi_i\varepsilon_{t-i}.$$
If we are able to specify our model such that $\pi_i = 0$ for all $i > 0$ (that is, no MA component), then we have a pure autoregression,
$$y_t = \sum_{i=1}^{p}\gamma_i y_{t-i} + \varepsilon_t, \qquad \varepsilon_t \sim \text{white noise},$$
which can be estimated consistently by OLS; no special estimation methods are required.
When an MA component is present, the disturbances are unobserved but can be built up recursively. For an MA(1), $y_t = \mu + \varepsilon_t - \theta\varepsilon_{t-1}$:
$$y_1 = \mu + \varepsilon_1 - \theta\varepsilon_0 \;\rightarrow\; \varepsilon_1 = y_1 - \mu + \theta\varepsilon_0, \qquad \varepsilon_2 = y_2 - \mu + \theta\varepsilon_1, \;\ldots$$
The resulting sum of squares is nonlinear in the parameters and the recursion depends on the sample size. Thus, nonlinear least squares is often used.
• In practice, we don't know what the true value of $\varepsilon_0$ is. Often $\varepsilon_0 = 0$ is used; for an invertible process its influence dies out quickly.
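Here is a minimal nonlinear least squares sketch for the MA(1) case along the lines just described (my own illustration, assuming numpy and scipy; the data are simulated and $\varepsilon_0 = 0$ is imposed):

```python
import numpy as np
from scipy.optimize import least_squares

def ma1_residuals(params, y):
    """Recursive residuals for y_t = mu + e_t - theta*e_{t-1}, with e_0 set to 0."""
    mu, theta = params
    e = np.zeros_like(y)
    e_prev = 0.0
    for t in range(len(y)):
        e[t] = y[t] - mu + theta * e_prev
        e_prev = e[t]
    return e

rng = np.random.default_rng(4)
eps = rng.normal(size=1001)
y = 1.0 + eps[1:] - 0.6 * eps[:-1]        # true mu = 1.0, theta = 0.6

fit = least_squares(ma1_residuals, x0=[0.0, 0.0], args=(y,))
print(fit.x)                               # NLS estimates of (mu, theta)
```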
With more complicated ARIMA models, the estimated model can become quite
complicated.
You will see some forecasting methods based on Kalman filters in the next section.
Distributed lags deal with the current and lagged effects of an independent variable x on a dependent variable y:
$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \ldots + e_t = \alpha + \sum_{i=0}^{\infty}\beta_i x_{t-i} + e_t$$
• The long-run effect over all future periods is $\sum_i\beta_i$ (AKA the equilibrium multiplier);
• The mean lag is $\dfrac{\sum_{i=1}^{\infty} i\,\beta_i}{\sum_{j=0}^{\infty}\beta_j} = \sum_{i=1}^{\infty} i\,w_i$, where the $w_i = \beta_i/\sum_j\beta_j$ are the normalized lag weights.
The problem with the above model: an infinite number of coefficients. Two ways around this are (i) to impose structure on the $\beta_i$ (e.g., a geometric pattern) for all i = 1 to ∞, or (ii) to truncate the lag at some finite length p and estimate the coefficients of the current and lagged values of x directly, but then we need to decide on p.
t-tests are usually not good for selecting lag length, because lagged values of x are likely to be highly correlated with current values; i.e., t-tests will have low power.
Two better approaches, both based on the assumption that you know some maximum lag length P:
• Choose the lag length $p \le P$ that maximizes $\bar R^2$ or minimizes the Akaike Information Criterion, $AIC = \ln\!\left(\frac{e'e}{T}\right) + \frac{2p}{T}$.
• Start with a high P and do F-tests for joint significance of the longest lags' $\beta_i$; successively drop lags that are insignificant.
Both methods tend to “overfit” (leave too many lags in), so strong significance should be required before additional lags are retained.
Two models, the Adaptive Expectations Model and the Partial Adjustment
Model have been used a great deal in the literature. They are two specific models
that imply a specific form of infinite distributed lag effects called Geometric lags.
Suppose the current value of the independent variables determines the desired value
$$y_t^* = \alpha + \beta x_t + \varepsilon_t,$$
but only a fixed fraction of the desired adjustment is accomplished in one period (it is costly or slow to adjust fully), say $y_t - y_{t-1} = (1-\lambda)(y_t^* - y_{t-1})$. Substitute the first equation into the second and rearrange:
$$y_t = (1-\lambda)\alpha + (1-\lambda)\beta x_t + \lambda y_{t-1} + (1-\lambda)\varepsilon_t,$$
a geometric lag in x.
The Adaptive Expectations model arises when behavior depends on expectations and expectations change slowly. Example: when input decisions (supply decisions) are based on the expected output price. The model is
$$y_t = \alpha + \beta x_{t+1}^* + \delta w_t + \varepsilon_t, \qquad x_{t+1}^* = x_t^* + (1-\lambda)(x_t - x_t^*),$$
where $x_t^*$ is the expected value for $x_t$ evaluated at time $t-1$, and $0 < \lambda < 1$. The second equation updates expectations by a fraction $(1-\lambda)$ of the difference between the actual value of x in period t and last period's expectation about $x_t$.
1. Rearrange the second equation to get $x_{t+1}^* = \frac{1-\lambda}{1-\lambda L}\,x_t$.
2. Substitute into the first equation:
$$y_t = \alpha + \beta\,\frac{1-\lambda}{1-\lambda L}\,x_t + \delta w_t + \varepsilon_t = \alpha + \beta(1-\lambda)\sum_{i=0}^{\infty}\lambda^i x_{t-i} + \delta w_t + \varepsilon_t.$$
3. For a given $\lambda$, construct the geometric-lag regressor recursively as $z_t = \lambda z_{t-1} + (1-\lambda)x_t$, starting from $z_1 = x_1(1-\lambda)$; regress y on a constant, z, and w, and search over $\lambda$ for the value that minimizes the sum of squared errors.
Note that the disturbances satisfy the CLRM assumptions, and if they are i.i.d. normal, this recursive process (that minimizes SSE) is also Maximum Likelihood.
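A compact sketch of that grid search (my own illustration, numpy only; the regressor $w_t$ is dropped for brevity and the data-generating values and variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
T, lam_true = 400, 0.6
x = rng.normal(size=T).cumsum() * 0.1 + rng.normal(size=T)    # some persistent regressor
z0 = np.zeros(T); z0[0] = (1 - lam_true) * x[0]
for t in range(1, T):
    z0[t] = lam_true * z0[t - 1] + (1 - lam_true) * x[t]
y = 2.0 + 1.5 * z0 + rng.normal(scale=0.5, size=T)             # true alpha=2.0, beta=1.5

def sse(lam):
    z = np.zeros(T); z[0] = (1 - lam) * x[0]
    for t in range(1, T):
        z[t] = lam * z[t - 1] + (1 - lam) * x[t]               # z_t = lam*z_{t-1} + (1-lam)*x_t
    X = np.column_stack([np.ones(T), z])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    return u @ u, b

grid = np.linspace(0.01, 0.99, 99)
lam_hat = min(grid, key=lambda l: sse(l)[0])
print(lam_hat, sse(lam_hat)[1])          # lambda near 0.6; coefficients near (2.0, 1.5)
```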
Omitting $w_t$ for simplicity, the model is
$$y_t = \alpha + \beta\,\frac{1-\lambda}{1-\lambda L}\,x_t + \varepsilon_t.$$
Multiplying through by $(1-\lambda L)$ gives
$$y_t = \alpha(1-\lambda) + \lambda y_{t-1} + \beta(1-\lambda)x_t + u_t,$$
where $u_t = \varepsilon_t - \lambda\varepsilon_{t-1}$ is a moving average error. Rather than the recursive approach discussed above, you could also use an Instrumental Variables approach (replace $y_{t-1}$ with an instrument, e.g., lagged x).
• The geometric lag is very restrictive regarding the relative impact of different
lagged values of x.
The autoregressive distributed lag (ARDL) model is a more general form that can accommodate and approximate a huge variety of lag patterns:
$$y_t = \mu + \sum_{i=1}^{p}\gamma_i y_{t-i} + \sum_{j=0}^{r}\beta_j x_{t-j} + \varepsilon_t, \qquad \varepsilon \sim \text{i.i.d.}\ \forall\, t,$$
or, in lag operator notation, $C(L)y_t = \mu + B(L)x_t + \varepsilon_t$, where
$$C(L) = 1 - \gamma_1 L - \gamma_2 L^2 - \cdots - \gamma_p L^p \qquad\text{and}\qquad B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_r L^r.$$
3.4.1 Estimation
$$y_t = \alpha y_{t-1} + \varepsilon_t$$
$$y_t = \frac{\varepsilon_t}{1-\alpha L} = \sum_{i=0}^{\infty}\alpha^i\varepsilon_{t-i}$$
$$y_{t-1} = \sum_{i=1}^{\infty}\alpha^{i-1}\varepsilon_{t-i} \qquad \text{(note the index starts at } i = 1\text{)}$$
So, $y_{t-1}$ is a function of $\varepsilon_{t-1}$ and all previous disturbances. The CLRM assumption of $E[\varepsilon'X] = 0$ holds if $\varepsilon$ is i.i.d. (in this case X includes lagged y), because $y_{t-1}$ is not a function of the current disturbance $\varepsilon_t$. Show for yourself that $\operatorname{Cov}[y_{t-1},\varepsilon_t]\neq 0$ for a model with one lagged dependent variable and serially correlated errors.
The long-run (equilibrium) multiplier generally is
$$\text{Long-run multiplier} = \sum_{i=0}^{\infty}\alpha_i = A(1) = \frac{B(1)}{C(1)} = \frac{\sum_{i=0}^{r}\beta_i}{1-\sum_{i=1}^{p}\gamma_i},$$
where $A(L) = B(L)/C(L)$. Assuming no shocks (disturbances) and assuming stationarity, the process settles at the long-run equilibrium $\bar y = \frac{\mu}{C(1)} + \frac{B(1)}{C(1)}\bar x$.
Consider an ARDL(2,1):
$$y_t = \tilde\mu + \frac{\beta_0 + \beta_1 L}{1-\gamma_1 L - \gamma_2 L^2}\,x_t + \tilde\varepsilon_t = \tilde\mu + \frac{B(L)}{C(L)}\,x_t + \tilde\varepsilon_t = \tilde\mu + A(L)x_t + \tilde\varepsilon_t = \tilde\mu + \sum_{i=0}^{\infty}\alpha_i x_{t-i} + \tilde\varepsilon_t.$$
To calculate the $\alpha_i$, equate coefficients on powers of L in $A(L)C(L) = B(L)$:
$$(\alpha_0 - \alpha_0\gamma_1 L - \alpha_0\gamma_2 L^2) + (\alpha_1 L - \alpha_1\gamma_1 L^2 - \alpha_1\gamma_2 L^3) + (\alpha_2 L^2 - \alpha_2\gamma_1 L^3 - \alpha_2\gamma_2 L^4) + \cdots = \beta_0 + \beta_1 L$$
Matching coefficients on each power of L:
$$L^0: \;\alpha_0 = \beta_0, \qquad L^1: \;-\alpha_0\gamma_1 + \alpha_1 = \beta_1, \qquad L^2: \;-\alpha_0\gamma_2 - \alpha_1\gamma_1 + \alpha_2 = 0, \qquad L^3: \;-\alpha_1\gamma_2 - \alpha_2\gamma_1 + \alpha_3 = 0.$$
Solving recursively gives the $\alpha_i$ in terms of the $\beta_i$ and $\gamma_i$:
$$\alpha_0 = \beta_0$$
$$\alpha_1 = \beta_1 + \alpha_0\gamma_1 = \beta_1 + \beta_0\gamma_1$$
$$\alpha_2 = \alpha_0\gamma_2 + \alpha_1\gamma_1 = \beta_0\gamma_2 + (\beta_1+\beta_0\gamma_1)\gamma_1$$
$$\alpha_3 = \alpha_1\gamma_2 + \alpha_2\gamma_1 = \text{etc.}\cdots$$
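The same recursion is easy to compute numerically. A small sketch (mine, numpy only, with made-up coefficient values) that also checks the long-run multiplier $B(1)/C(1)$:

```python
import numpy as np

def ardl_multipliers(beta, gamma, n):
    """alpha_i from A(L)C(L) = B(L): alpha_i = beta_i + sum_j gamma_j * alpha_{i-j}."""
    beta, gamma = np.asarray(beta, float), np.asarray(gamma, float)
    alpha = np.zeros(n)
    for i in range(n):
        b = beta[i] if i < len(beta) else 0.0
        ar = sum(gamma[j] * alpha[i - 1 - j] for j in range(len(gamma)) if i - 1 - j >= 0)
        alpha[i] = b + ar
    return alpha

a = ardl_multipliers(beta=[1.0, 0.5], gamma=[0.6, 0.2], n=200)   # ARDL(2,1) example
print(np.round(a[:5], 3))                          # alpha_0, alpha_1, ... as derived above
print(a.sum(), (1.0 + 0.5) / (1 - 0.6 - 0.2))      # approx equal: long-run multiplier B(1)/C(1)
```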
For forecasting, consider an ARDL(2,1). One period ahead,
$$y_{T+1}\,|\,y_T = \gamma_1 y_T + \gamma_2 y_{T-1} + \beta_0 x_{T+1} + \varepsilon_{T+1} = \gamma' x_{T+1} + \varepsilon_{T+1}, \qquad\text{where } x_{T+1} = \left(y_T\;\; y_{T-1}\;\; x_{T+1}\right)'.$$
$$\operatorname{Var}[e_1|T] = E[\varepsilon_{T+1}^2] = \sigma^2.$$
A forecast interval for $y_{T+1}$ is $\hat y_1 \pm t_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}[e_1|T]}$.
Stacking the forecast in companion (Kalman filter) form,
$$\begin{pmatrix}\hat y_{T+1}\\ y_T\\ y_{T-1}\\ \vdots\end{pmatrix} = \begin{pmatrix}\hat\mu_{T+1}\\ 0\\ 0\\ \vdots\end{pmatrix} + \begin{pmatrix}\gamma_1 & \gamma_2 & \gamma_3 & \cdots & \gamma_{p-1} & \gamma_p\\ 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & 0 & \cdots & 0 & 0\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 & 0\end{pmatrix}\begin{pmatrix}y_T\\ y_{T-1}\\ y_{T-2}\\ \vdots\\ y_{T-p+1}\end{pmatrix} + \begin{pmatrix}\hat\varepsilon_{T+1}\\ 0\\ 0\\ \vdots\end{pmatrix},$$
or, calling the coefficient matrix C, $\hat y_{T+1} = \hat\mu_{T+1} + C\,y_T + \hat\varepsilon_{T+1}$ in stacked notation (any exogenous terms $\beta_0 x_{T+1} + \beta_1 x_T$ are absorbed into $\hat\mu_{T+1}$, so the forecast is conditional on $x_{T+1}$).
$$\operatorname{Cov}[\hat\varepsilon_{T+1}] = E[(\hat y_{T+1} - y_{T+1})(\hat y_{T+1} - y_{T+1})'] = \begin{pmatrix}\sigma^2 & 0 & \cdots\\ 0 & 0 & \cdots\\ \vdots & \vdots & \ddots\end{pmatrix} = \sigma^2 jj',$$
where $j = (1,\; 0,\; 0,\; \cdots)'$ (so that $jj'$ is a $p\times p$ matrix) and $\operatorname{Var}[\varepsilon_{T+1}] = \operatorname{Cov}_{11}[\hat\varepsilon_{T+1}] = \sigma^2$.
Note: The forecast errors ε̂T +i are included above for intuition about the forecast
variance. When calculating the point estimates ŷT +i , set ε̂T +i to its expected value
of zero.
For T + 2 and, more generally, F periods ahead, iterate the recursion:
$$\hat y_F = C^F y_0 + \sum_{f=1}^{F} C^{f-1}\left[\hat\mu_{F-(f-1)} + \hat\varepsilon_{F-(f-1)}\right]$$
$$\operatorname{Var}[\hat y_F] = \sigma^2\left[jj' + \sum_{i=1}^{F-1}\,[C^i]\,jj'\,[C^i]'\right].$$
Example: ARDL(2,1):
$$\begin{pmatrix}\hat y_{T+1}\\ y_T\end{pmatrix} = \begin{pmatrix}\hat\mu_{T+1}\\ 0\end{pmatrix} + \begin{pmatrix}\hat\gamma_1 & \hat\gamma_2\\ 1 & 0\end{pmatrix}\begin{pmatrix}y_T\\ y_{T-1}\end{pmatrix}$$
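A small numerical sketch of this recursion for point forecasts and forecast variances (my own illustration; the coefficient values, $\sigma^2$, and starting values are made up, and any x-terms are assumed to be folded into $\mu$):

```python
import numpy as np

g1, g2, mu, sigma2 = 0.5, 0.3, 1.0, 0.25
C = np.array([[g1, g2],
              [1.0, 0.0]])                 # companion matrix for an AR(2)/ARDL(2,1)
j = np.array([[1.0], [0.0]])
state = np.array([[2.0], [1.5]])           # (y_T, y_{T-1})'
mu_vec = np.array([[mu], [0.0]])

var = np.zeros((2, 2))
for s in range(1, 5):
    state = mu_vec + C @ state             # point forecast: future shocks set to zero
    var = sigma2 * (j @ j.T) + C @ var @ C.T   # Var after s steps = sigma^2[jj' + C jj' C' + ...]
    print(s, round(float(state[0, 0]), 3), round(float(var[0, 0]), 3))
```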
An AR(1) error model
$$y_t = \beta x_t + v_t; \qquad v_t = \rho v_{t-1} + \varepsilon_t$$
can be written as
$$y_t = \beta x_t + v_t, \qquad (1-\rho L)v_t = R(L)v_t = \varepsilon_t,$$
so that
$$y_t = \beta x_t + \frac{\varepsilon_t}{R(L)} \quad\Longrightarrow\quad R(L)y_t = \beta R(L)x_t + \varepsilon_t,$$
i.e., $y_t = \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + \varepsilon_t$.
Implications: a model with an autoregressive error process is simply a restricted version of an ARDL(p,p).
E.g., an ARDL(2,2) model is
$$y_t = \mu + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \varepsilon_t,$$
and the common factor restrictions (CFRs) implied by an AR(2) error process are
$$f(b) = \begin{pmatrix}\beta_1 + \gamma_1\beta_0\\ \beta_2 + \gamma_2\beta_0\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}.$$
CFRs using characteristic roots. A more flexible and general method of testing the specification of ARDL models is based on the roots of the lag operator polynomials:
$$C(L) = (1-\gamma_1 L - \gamma_2 L^2) = (1-\lambda_1 L)(1-\lambda_2 L)$$
$$B(L) = \beta_0(1-\beta_1 L - \beta_2 L^2) = \beta_0(1-\tau_1 L)(1-\tau_2 L),$$
where $\lambda_i$ and $\tau_i$ are characteristic roots (note, we just arbitrarily changed the signs of the coefficients inside the factored polynomials). If $C(L)$ and $B(L)$ share a common root, say $\lambda_1 = \tau_1 = \rho$, the common factor $(1-\rho L)$ can be divided out of both sides:
$$(1-\lambda_2 L)y_t = (1-\tau_2 L)\beta_0 x_t + u_t,$$
where $u_t = \frac{\varepsilon_t}{1-\rho L} = \rho u_{t-1} + \varepsilon_t$, an AR(1) error process.
1. The ARDL(2,2) has a white noise error and can be estimated consistently with OLS.
2. The restricted model has a lagged dependent variable and an AR(1) error — OLS is inconsistent.
2. Use IV with an instrument for the lagged dep. vars on the right hand side; lagged values of x are natural candidates.
Question: How would you test for autocorrelated errors in an ARDL(p,r) model?
Error correction form: start from the ARDL(1,1), $y_t = \mu + \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t$, subtract $y_{t-1}$ from both sides, and add and subtract $\beta_0 x_{t-1}$ on the right. Then multiply $(\beta_0+\beta_1)x_{t-1}$ by $\frac{\gamma-1}{\gamma-1}$ to get
$$\Delta y_t = \mu + \beta_0\Delta x_t + (\gamma-1)(y_{t-1} - \theta x_{t-1}) + \varepsilon_t, \qquad\text{where } \theta = \frac{\beta_0+\beta_1}{1-\gamma} = \frac{B(1)}{C(1)}.$$
Here $\theta = B(1)/C(1)$ is the long-run multiplier we saw a while back. This is called an Error Correction Model (ECM), or more precisely, the error correction form of the ARDL(1,1) model.
Equivalently,
$$\Delta y_t = \beta_0\Delta x_t + \tilde\gamma\,(y_{t-1} - \tilde\mu - \theta x_{t-1}) + \varepsilon_t,$$
where $\tilde\mu = \frac{\mu}{1-\gamma} = -\frac{\mu}{\gamma-1}$ and $\tilde\gamma = (\gamma-1)$. $\Delta y_t$ is comprised of two components (plus the disturbance): a short-run shock from $\Delta x_t$ and a reversion toward equilibrium, where the equilibrium relationship is
$$\bar y = \tilde\mu + \theta\bar x.$$
Therefore, $y_{t-1} - (\tilde\mu + \theta x_{t-1})$ represents the deviation from the equilibrium relationship in the previous period.
All of the ECM parameters can be calculated based on estimates from the original ARDL(1,1) model. Alternatively, all parameters of the ARDL(1,1) model can be calculated from the parameters of the estimated ECM. The results will be identical. Covariances can be calculated using the Delta method if necessary. You could also estimate the ECM parameters directly via nonlinear least squares.
3.5 Cointegration
• If any of the variables in a regression are I(d) with d > 0, then many or all of the parameter estimates will have nonstandard (nonstationary) sampling distributions, so the usual inference does not apply.
• The exception: if two or more of the series are integrated of the same order — drifting or trending at the same rate — then we may be able to find a linear combination of them that is stationary; such series are said to be cointegrated.
Example: consider two series generated by
$$y_{1t} = 3t + u_t, \qquad y_{2t} = t + v_t,$$
where $v_t$ and $u_t$ are uncorrelated white noise errors. Both $y_1$ and $y_2$ are I(1), because their first differences are stationary. Now consider the error process from a relationship between $y_1$ and $y_2$:
$$y_{1t} = \alpha y_{2t} + \varepsilon_t$$
$$\varepsilon_t = y_{1t} - \alpha y_{2t} = \left(1\;\; -\alpha\right)\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} = (3t+u_t) - \alpha(t+v_t).$$
• This is a linear combination of two I(1) variables, and so would in most cases be I(1), and the variance of $\varepsilon_t$ would explode as t increases (i.e., not stationary). But for $\alpha = 3$ the trend terms cancel and $\varepsilon_t = u_t - 3v_t$ is stationary: $y_1$ and $y_2$ are cointegrated, with cointegrating vector $(1, -3)$.
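A simulation sketch of this example (mine; numpy only) makes the point concrete — the residual variance keeps growing with the sample unless the cointegrating value $\alpha = 3$ is used:

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(1, 2001)
u, v = rng.normal(size=t.size), rng.normal(size=t.size)
y1, y2 = 3 * t + u, t + v                      # the two I(1)-style series above

for a in (1.0, 2.0, 3.0):
    eps = y1 - a * y2
    print(a, round(float(eps[:1000].var()), 1), round(float(eps.var()), 1))
# Variance explodes with sample size except when a = 3 (eps = u - 3v, stationary).
```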
An ARDL(1,1) model with additional (stationary) regressors $w_t$,
$$y_t = \alpha'w_t + \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t,$$
can be written as
$$\Delta y_t = \alpha'w_t + \beta_0\Delta x_t + \gamma^* z_t + \varepsilon_t, \qquad z_t = y_{t-1} - \theta x_{t-1},\quad \gamma^* = \gamma - 1.$$
• If y and x are I(1), and the $w_t$ are I(0), then $\varepsilon$ is I(0) if $z_t = y_{t-1} - \theta x_{t-1}$ is I(0).
• If such a relationship DOES hold, then ε will be stationary and we can estimate both the ARDL(1,1) form and the ECM form in a standard fashion (OLS, NLS, with standard sampling distributions for the parameter estimates).
• If it does NOT hold, then $z_t$ — and hence the regression — is not covariance stationary, and therefore the parameter estimates are not covariance stationary, which means their sampling distributions are not standard.
$$\Delta y_t = \alpha'w_t + \beta_0\Delta x_t + \gamma^* z_t + \varepsilon_t$$
If the model can be rewritten, as above, so that its regressors are all I(0) variables, then the parameters on those I(0) variables can be estimated consistently with OLS applied to the original ARDL(p,q) model, and the t-statistics on these parameters have their usual asymptotic distributions. In the original levels regression only a subset of the parameters are associated with I(0) variables; this is why conventional inference applies to some coefficients and not to others.
There are two general approaches to testing for cointegration: a single equation approach and a multiple equation approach. We begin with the single equation approach: if the variables are cointegrated, then the errors in a regression of one on the others will be stationary (or I(0)), which can be checked with a Dickey–Fuller type test on the residuals.
• This test statistic does not have the same distribution as the usual DF test statistic, so different critical values must be used.
Vector autoregressions (VARs) treat a set of variables jointly, with each equation containing lags of all of the variables; their main uses — forecasting, Granger causality testing, and impulse response analysis — are taken up below. A two-variable VAR(2) is
$$y_{1t} = \mu_1 + \delta_{111}y_{1t-1} + \delta_{112}y_{2t-1} + \delta_{211}y_{1t-2} + \delta_{212}y_{2t-2} + \varepsilon_{1t}$$
$$y_{2t} = \mu_2 + \delta_{121}y_{1t-1} + \delta_{122}y_{2t-1} + \delta_{221}y_{1t-2} + \delta_{222}y_{2t-2} + \varepsilon_{2t}$$
or
$$\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} + \begin{pmatrix}\delta_{111} & \delta_{112}\\ \delta_{121} & \delta_{122}\end{pmatrix}\begin{pmatrix}y_{1t-1}\\ y_{2t-1}\end{pmatrix} + \begin{pmatrix}\delta_{211} & \delta_{212}\\ \delta_{221} & \delta_{222}\end{pmatrix}\begin{pmatrix}y_{1t-2}\\ y_{2t-2}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}$$
or
$$y_t = \mu + \Gamma_1 y_{t-1} + \Gamma_2 y_{t-2} + \varepsilon_t,$$
where $\delta_{jml}$ is the coefficient for the $j$th lag in the $m$th equation on the $l$th endogenous variable.
In a model such as $y_t = \gamma y_{t-1} + \beta x_{t-1} + \varepsilon_t$, if $\beta \neq 0$ we say that x “Granger causes” y.
• Granger causality of x on y is absent when $f(y_t\,|\,y_{t-1}, x_{t-1}, x_{t-2},\cdots) = f(y_t\,|\,y_{t-1})$; that is, when lagged x's add no information about $y_t$ beyond that already contained in lagged y's.
Example 19.8 (Greene), but extended to a VAR(2,2) here: increased oil prices have preceded all but one recession since WWII. Let $y_t = \left(\text{GNP}\;\; \text{OIL PRICE}\right)'$.
$$\begin{pmatrix}\text{GNP}_t\\ \text{POIL}_t\end{pmatrix} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} + \begin{pmatrix}\alpha_1 & \alpha_2\\ \beta_1 & \beta_2\end{pmatrix}\begin{pmatrix}\text{GNP}_{t-1}\\ \text{POIL}_{t-1}\end{pmatrix} + \begin{pmatrix}\alpha_3 & \alpha_4\\ \beta_3 & \beta_4\end{pmatrix}\begin{pmatrix}\text{GNP}_{t-2}\\ \text{POIL}_{t-2}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}$$
If $\alpha_2 = \alpha_4 = 0$, then oil prices do not Granger cause GNP; otherwise they do.
To test this, estimate first the restricted and then the unrestricted regression of the first equation (GNP) alone; there is no need to estimate the second equation for this test. The test statistic is distributed $\chi^2(J = 2)$.
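A sketch of that restricted-vs-unrestricted comparison (mine; numpy and scipy, simulated stand-in data, and an LR-type statistic $T\ln(\text{SSR}_r/\text{SSR}_u)$ rather than any particular textbook variant):

```python
import numpy as np
from scipy import stats

def lag_matrix(z, p):
    # columns z_{t-1}, ..., z_{t-p}, aligned with y[p:]
    return np.column_stack([z[p - k:len(z) - k] for k in range(1, p + 1)])

def granger_chi2(y, x, p=2):
    """H0: coefficients on lagged x are zero in the equation for y."""
    Y = y[p:]
    Xr = np.column_stack([np.ones(len(Y)), lag_matrix(y, p)])       # restricted
    Xu = np.column_stack([Xr, lag_matrix(x, p)])                    # unrestricted
    ssr_r = np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2)
    ssr_u = np.sum((Y - Xu @ np.linalg.lstsq(Xu, Y, rcond=None)[0]) ** 2)
    lr = len(Y) * np.log(ssr_r / ssr_u)
    return lr, 1 - stats.chi2.cdf(lr, df=p)

rng = np.random.default_rng(7)
T = 400
oil = rng.normal(size=T)
gnp = np.zeros(T)
for t in range(2, T):
    gnp[t] = 0.3 * gnp[t - 1] - 0.4 * oil[t - 1] + rng.normal()

print(granger_chi2(gnp, oil))    # should reject: oil "Granger causes" gnp here
print(granger_chi2(oil, gnp))    # should not reject
```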
Impulse response functions track the effect of a single shock (an innovation in one or more of the equations) as it works through to current and future values of all of the variables in the system.
$$\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} + \begin{pmatrix}\delta_{111} & \delta_{112}\\ \delta_{121} & \delta_{122}\end{pmatrix}\begin{pmatrix}y_{1t-1}\\ y_{2t-1}\end{pmatrix} + \begin{pmatrix}\delta_{211} & \delta_{212}\\ \delta_{221} & \delta_{222}\end{pmatrix}\begin{pmatrix}y_{1t-2}\\ y_{2t-2}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}$$
or, in general,
$$\underset{(m\times1)}{y_t} = \underset{(m\times1)}{\mu} + \underset{(m\times m)}{\Gamma_1}\,\underset{(m\times1)}{y_{t-1}} + \cdots + \underset{(m\times m)}{\Gamma_p}\,\underset{(m\times1)}{y_{t-p}} + \underset{(m\times1)}{v_t}$$
For forecasting we can use the same Kalman filter arrangement as with the ARDL model before: recast the general model $y_t = \mu + \sum_{i=1}^{p}\Gamma_i y_{t-i} + v_t$ as
$$\begin{pmatrix}y_t\\ y_{t-1}\\ \vdots\\ y_{t-p+1}\end{pmatrix} = \begin{pmatrix}\mu\\ 0\\ \vdots\\ 0\end{pmatrix} + \begin{pmatrix}\Gamma_1 & \Gamma_2 & \cdots & \Gamma_p\\ I & 0 & \cdots & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & I & 0\end{pmatrix}\begin{pmatrix}y_{t-1}\\ y_{t-2}\\ \vdots\\ y_{t-p}\end{pmatrix} + \begin{pmatrix}v_t\\ 0\\ \vdots\\ 0\end{pmatrix},$$
or
$$\tilde y_t = \tilde\mu + \tilde\Gamma\,\tilde y_{t-1} + \tilde v_t.$$
Alternatively, write the system compactly as
$$y_t = \mu + \Gamma(L)y_t + v_t.$$
Then
$$[I - \Gamma(L)]\,y_t = \mu + v_t$$
$$y_t = [I - \Gamma(L)]^{-1}(\mu + v_t)$$
$$= [I - \Gamma(L)]^{-1}\mu + \sum_{i=0}^{\infty}\Gamma^i v_{t-i}$$
$$= \bar y + \sum_{i=0}^{\infty}\Gamma^i v_{t-i} = \bar y + [I - \Gamma(L)]^{-1}v_t.$$
For this representation to exist (i.e., for the VAR to be stationary), the eigenvalues of $\Gamma$ must be less than one in absolute value (i.e., in modulus), whether or not the eigenvalue(s) are real or complex. Note: the modulus of a complex number $h + vi$ is $R = \sqrt{h^2 + v^2}$.
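A quick way to check this stationarity condition numerically is to form the companion matrix and inspect its eigenvalue moduli (a sketch of mine; the coefficient matrices are made up):

```python
import numpy as np

G1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
G2 = np.array([[0.2, 0.0],
               [0.1, 0.1]])
m, p = 2, 2

top = np.hstack([G1, G2])
bottom = np.hstack([np.eye(m * (p - 1)), np.zeros((m * (p - 1), m))])
companion = np.vstack([top, bottom])           # stacked (companion) form of the VAR(2)

moduli = np.abs(np.linalg.eigvals(companion))  # modulus = sqrt(h^2 + v^2) for complex roots
print(np.round(moduli, 3), "stationary:", bool(np.all(moduli < 1)))
```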
What we are interested in is how a one-time shock flows through to the $y_{i,t+j}$. In general, the set of impulse response functions and its covariance matrix are calculated as
$$\hat y_{T+s} = \bar y + \Gamma^s v_T$$
$$\hat\Sigma_{T+s} = \sum_{i=0}^{s-1}\Gamma^i\,\Omega\,(\Gamma')^i.$$
Example: Suppose a first-order VAR with $\mu = 0$ for both equations:
$$\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} = \begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}\begin{pmatrix}y_{1t-1}\\ y_{2t-1}\end{pmatrix} + \begin{pmatrix}v_{1t}\\ v_{2t}\end{pmatrix}; \qquad \Omega = \operatorname{Cov}[v_{1t}, v_{2t}] = \begin{pmatrix}1 & .5\\ .5 & 2\end{pmatrix}.$$
Now, suppose a one unit change in v2t at t=0, such that v20 = 1. Then
$$\begin{pmatrix}y_{10}\\ y_{20}\end{pmatrix} = \begin{pmatrix}0\\ 1\end{pmatrix}$$
$$\begin{pmatrix}y_{11}\\ y_{21}\end{pmatrix} = \begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}\begin{pmatrix}y_{10}\\ y_{20}\end{pmatrix} = \begin{pmatrix}0.461\\ 0.297\end{pmatrix}$$
$$\begin{pmatrix}y_{12}\\ y_{22}\end{pmatrix} = \begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}\begin{pmatrix}y_{11}\\ y_{21}\end{pmatrix} = \begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}^2\begin{pmatrix}y_{10}\\ y_{20}\end{pmatrix} = \begin{pmatrix}0.141\\ 0.195\end{pmatrix}$$
$$\hat\Sigma_2 = \begin{pmatrix}1 & .5\\ .5 & 2\end{pmatrix} + \begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}\begin{pmatrix}1 & .5\\ .5 & 2\end{pmatrix}\begin{pmatrix}0.008 & 0.461\\ 0.232 & 0.297\end{pmatrix}'$$
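The same arithmetic in a few lines of numpy (a sketch of mine reproducing the numbers above):

```python
import numpy as np

G = np.array([[0.008, 0.461],
              [0.232, 0.297]])
Omega = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

y = np.array([0.0, 1.0])                 # one-unit shock to v2 at t = 0
Sigma = np.zeros((2, 2))
for s in range(1, 4):
    y = G @ y                            # impulse responses: Gamma^s times the shock
    Sigma = Omega + G @ Sigma @ G.T      # Sigma_s = sum_{i=0}^{s-1} Gamma^i Omega Gamma'^i
    print(s, np.round(y, 3))
    print(np.round(Sigma, 3))
```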
A VAR can also be applied to a set of nonstationary variables that are cointegrated (see the error correction representation below).
Start from the levels VAR
$$Y_t = X_t B + \sum_{i=1}^{p+1} Y_{t-i}\Phi_i + U_t,$$
where the $Y_t$ are I(1), the $X_t$ are assumed to be I(0) deterministic variables, and B, the $\Phi_i$, and the $\Gamma_i$ below are conformable coefficient matrices. This can be rewritten as
$$\Delta Y_t = X_t B + Y_{t-1}\Pi + \sum_{i=1}^{p}\Delta Y_{t-i}\Gamma_i + U_t,$$
where $\Gamma_p = -\Phi_{p+1}$, $\Gamma_i = \Gamma_{i+1} - \Phi_{i+1}$ for $i = 1,\ldots,p$, and $\Pi = \sum_{i=1}^{p+1}\Phi_i - I_g$. This is the vector error correction form.
If the original variables Y are I(1) and the deterministic variables X are I(0), then $\Delta Y_t$ and $X_t$ are stationary, so $Y_{t-1}\Pi$ must also be stationary for the equation to balance: $\Pi$ must either be zero or define stationary (cointegrating) combinations of the Y's.
Note that the estimated value of Π will always be full rank (unless there is perfect collinearity in the data to begin with, but then you wouldn't be able to run a regression in the first place). The question is: can we test whether our estimate of Π is consistent with a reduced rank r < g, where r is the number of cointegrating vectors? With g variables (Greene uses M) in a VAR of I(1) variables, there can be up to g − 1 cointegrating vectors.
The Johansen test seems to be the most popular method for testing for cointegrating vectors in VARs. We will not go into detail about estimation, but there is a standard sequence of tests for the number of cointegrating vectors r (up to g):
1. Estimate the system and obtain the ordered eigenvalues associated with the Π matrix.
2. For each r starting with zero, a trace statistic or max statistic is calculated and compared with the appropriate critical value; if the statistic is below the critical value, accept the null of r cointegrating vectors and stop; otherwise move on to r + 1.
Note that r > 1 implies more than one possible long-run relationship represented in the data.
If current values of the endogenous variables also appear on the right hand side, the system becomes
$$\Theta y_t = \alpha + \Phi y_{t-1} + \varepsilon_t.$$
Thus, we are simply back to simultaneous equation systems, but with the issues of dynamics (lagged dependent variables and possible nonstationarity) added. For example, with two endogenous variables,
$$\Theta = \begin{pmatrix}1 & -\theta_{12}\\ -\theta_{21} & 1\end{pmatrix}.$$
Then we have a dynamic simultaneous equations problem, with all the lagged
dep. vars. being predetermined and therefore, for our purposes, exogenous.
Hsiao (1997) shows that if you have nonstationarity but cointegrating relation-
ships in your model, then 2SLS and 3SLS can proceed as usual to address endo-
geneity.