Lecture 5 Notes Final20180304112607
Massimo Guidolin
March 2018
• The trending effects caused by functions of t are permanent and hence impress a trend,
as time is irreversible; for instance, when Q = 1 ⇒ $\Delta y_t = y_t - y_{t-1} = \delta_1 + (\epsilon_t - \epsilon_{t-1})$.
• They are trend stationary because $y_{t+1}$ can differ from its trend value by the amount
$\epsilon_{t+1}$, and because this deviation is stationary, the series will exhibit only temporary
departures from the trend ⇒ the long-term forecast of $y_{t+1}$ will converge to the trend
line $\sum_{j=0}^{Q} \delta_j t^j$.
where ητ follows any ARMA(p, q) stationary process.
This expression is centered on the first-differences of yt .
• All deterministic trends can be converted into stochastic trends, while the opposite is
not true.
• In large samples, the deterministic time trend induced by the drift component dom-
inates the time series, while in small samples it is not always easy to discern the
difference between a driftless RW and a model with drift.
Figures 1 and 2 compare simulated deterministic (linear, quadratic, and cubic) trends to
random walk (RW) stochastic trend series (with no drift and with positive/negative drift): in
the RW series with drift, the deterministic time trend induced by the drift component dominates the time series.
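The comparison between deterministic and stochastic trends can be reproduced with a short simulation. The sketch below is only illustrative: the sample size, the drift `mu`, the shock volatility `sigma`, and the seed are assumptions chosen for the example, not values taken from the figures.

```python
import numpy as np

def simulate_trends(T=10_000, mu=0.5, sigma=1.0, seed=0):
    """Simulate a driftless RW, a RW with drift, and a linear deterministic trend."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T)
    t = np.arange(1, T + 1)
    rw = np.cumsum(eps)                  # y_t = y_{t-1} + eps_t, with y_0 = 0
    rw_drift = mu * t + np.cumsum(eps)   # y_t = mu + y_{t-1} + eps_t, with y_0 = 0
    det_trend = mu * t + eps             # y_t = mu * t + eps_t (trend stationary)
    return t, rw, rw_drift, det_trend

t, rw, rw_drift, det_trend = simulate_trends()

# In a long sample the term mu * t dominates the accumulated shocks,
# so the average slope of the RW with drift is close to mu.
print("average slope, RW with drift:      ", rw_drift[-1] / t[-1])
print("average slope, deterministic trend:", det_trend[-1] / t[-1])
```

With a short sample (for instance `T=50`) the same ratio for the RW with drift can be far from `mu`, which is the small-sample difficulty mentioned above.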
Expectation of a Random Walk:
1. If µ = 0
$$E[y_t] = E[y_{t+s}] = y_0 + E\Big[\sum_{\tau} \epsilon_\tau\Big] = y_0 + \sum_{\tau} E[\epsilon_\tau] = y_0$$
that is, constant and equal to the initial value. However, all stochastic shocks have non-
decaying effects on the series.
If µ ≠ 0
$$E[y_t] = E\Big[y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau\Big] = y_0 + \mu t$$
2. If µ = 0
$$E_t[y_{t+s}] = E_t[y_{t+s-1} + \epsilon_{t+s}] = E_t[y_{t+s-2} + \epsilon_{t+s} + \epsilon_{t+s-1}] = \dots = E_t[y_t] + E_t\Big[\sum_{i=1}^{s} \epsilon_{t+i}\Big] = y_t$$
that is, for any s ≥ 1 the conditional means of all future values $y_{t+s}$ are identical, and the
current $y_t$ is the minimum mean squared error forecast of all future
values of the series.
If µ ≠ 0
$$E_t[y_{t+s}] = E_t\Big[y_0 + \mu(t+s) + \sum_{\tau=1}^{t+s} \epsilon_\tau\Big] = y_0 + \mu(t+s) + \sum_{\tau=1}^{t} \epsilon_\tau = y_t + \mu s$$
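(a)
$$Var[y_t] = Var\Big[y_0 + \sum_{\tau=1}^{t} \epsilon_\tau\Big] = \sum_{\tau=1}^{t} Var[\epsilon_\tau] = t\sigma^2$$
(b)
$$Var[y_{t+s}] = Var\Big[y_0 + \sum_{\tau=1}^{t+s} \epsilon_\tau\Big] = \sum_{\tau=1}^{t+s} Var[\epsilon_\tau] = (t+s)\sigma^2$$
(d)
$$Corr[y_t, y_{t+s}] = \frac{E[(y_t - y_0)(y_{t+s} - y_0)]}{\sqrt{Var[y_t]\,Var[y_{t+s}]}} = \frac{t\sigma^2}{\sqrt{t\sigma^2\,(t+s)\sigma^2}} = \sqrt{\frac{t}{t+s}}$$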
• For sufficiently large samples, $Corr[y_t, y_{t+s}] \simeq 1$, but as s grows $Corr[y_t, y_{t+s}]$
declines below 1 ⇒ using the ACF alone, we cannot distinguish between a unit root process and a
stationary process with an autoregressive coefficient that is close to unity.
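The variance and correlation formulas for a driftless random walk can be checked by Monte Carlo. This is a minimal sketch: the number of paths, the dates `t` and `s`, the value of `sigma`, and the seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, t, s, sigma = 20_000, 50, 50, 1.0

# Each row is one driftless random walk path with y_0 = 0.
paths = np.cumsum(rng.normal(0.0, sigma, size=(n_paths, t + s)), axis=1)
y_t = paths[:, t - 1]        # value at date t
y_ts = paths[:, t + s - 1]   # value at date t + s

var_t = y_t.var()
var_ts = y_ts.var()
corr = np.corrcoef(y_t, y_ts)[0, 1]

print("Var[y_t]    :", var_t, "theory:", t * sigma**2)
print("Var[y_{t+s}]:", var_ts, "theory:", (t + s) * sigma**2)
print("Corr        :", corr, "theory:", np.sqrt(t / (t + s)))
```

Increasing `t` while holding `s` fixed pushes the theoretical correlation toward 1, which is why the sample ACF of a random walk decays so slowly.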
Figure 3 shows that, even though the sample autocorrelations are close to 1, they visi-
bly decay possibly instilling the doubt that we may be facing a highly persistent AR(1)
process.
• The appropriate degree of the polynomial can be set by standard t-tests, F-tests,
and/or information criteria.
• The regression is usually estimated using the largest value of Q considered rea-
sonable given the type of data or their frequency.
• The de-trended process can then be modeled using traditional methods
Unit Root Process and dth Order Integration: When a time series process {yt }
needs to be differenced d times before being reduced to the sum of constant terms plus
a white noise process, {yt } is said to contain d unit roots or to be integrated of order
d; we also write that yt ∼ I(d).
⇒ white noise series plus a constant intercept. It is an I(1) process and it contains one unit root.
• If {yt } contains d unit roots, then {yt } is non-stationary, while the opposite does
not hold (e.g. explosive autoregressive process: $y_{t+1} = \mu + \phi y_t + \epsilon_{t+1}$ with $|\phi| > 1$).
• All I(d) processes (with d > 0) are non-stationary, but there are non-stationary
processes that do not fit the definition of unit root processes.
1.1 What Happens if One Incorrectly De-Trends a Unit Root Series?
De-trending a Unit Root Process: When a time series process {yt } is I(d) but an
attempt is made to remove its stochastic trend by fitting deterministic (often polyno-
mial, spline-type) time trend functions, the resulting OLS residuals will still contain
one or more unit roots. Equivalently, deterministic de-trending does not remove the
stochastic trends.
that is, the resulting expression for $y_t - \hat{y}_t^{de-trend}$ still equals $\sum_{\tau=1}^{t} \epsilon_\tau$ even when $\hat{\delta}_0 = y_0$
and $\hat{\delta}_1 = \mu$. Therefore, the source of unit roots and, consequently, the stochastic trend
have not been removed.
Figure 6 plots the series $y_t - \hat{y}_t^{de-trend}$ on the left, and the ACF of $y_t - \hat{y}_t^{de-trend}$ on the
right.
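The persistence of the de-trended residuals can be illustrated numerically. The sketch below fits a linear trend by OLS to a simulated random walk with drift and compares the lag-1 autocorrelation of the residuals with that of the first-differenced series. The drift, sample size, seed, and the helper `lag1_autocorr` are assumptions made for the example.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))

rng = np.random.default_rng(2)
T, mu = 2_000, 0.2
t = np.arange(1, T + 1)
y = mu * t + np.cumsum(rng.normal(size=T))   # random walk with drift, y_0 = 0

# Deterministic de-trending: OLS of y on a constant and a linear trend.
X = np.column_stack([np.ones(T), t])
delta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
detrended = y - X @ delta_hat

rho_detrended = lag1_autocorr(detrended)     # stays close to 1
rho_differenced = lag1_autocorr(np.diff(y))  # close to 0

print("lag-1 autocorrelation, de-trended residuals:", rho_detrended)
print("lag-1 autocorrelation, first differences:   ", rho_differenced)
```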
and an attempt is made to remove the deterministic trend by differencing the series d
times, the resulting differenced series will contain d unit roots in its moving-average
components and will therefore not be invertible. Equivalently, differencing a trend-
stationary series creates new stochastic trends that are shifted inside the shocks of the
series.
• The more you difference a trend-stationary process, the higher the number of
unit roots in the resulting MA process.
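Example:
Starting from a deterministic trend
$$y_t = \sum_{j=0}^{Q} \delta_j t^j + \epsilon_t$$
first differencing gives
$$\Delta y_t = \sum_{j=0}^{Q} \delta_j [t^j - (t-1)^j] + \epsilon_t - \epsilon_{t-1}$$
that is, an MA(1) with a complex time trend, characterized by a unit root in the MA
component.
When Q = 1 this becomes an integrated moving-average of order 1
$$\Delta y_t = \delta_1 + (\epsilon_t - \epsilon_{t-1})$$
If we difference a second time we obtain an integrated MA(2) process with two unit
roots in its MA polynomial, and an increasingly complex time trend function
$$\Delta^2 y_t \equiv \Delta y_t - \Delta y_{t-1} = \Big(\sum_{j=0}^{Q} \delta_j [t^j - (t-1)^j] + \epsilon_t - \epsilon_{t-1}\Big) - \Big(\sum_{j=0}^{Q} \delta_j [(t-1)^j - (t-2)^j] + \epsilon_{t-1} - \epsilon_{t-2}\Big)$$
$$= \sum_{j=0}^{Q} \delta_j \{[t^j - (t-1)^j] - [(t-1)^j - (t-2)^j]\} + \epsilon_t - 2\epsilon_{t-1} + \epsilon_{t-2}$$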
Figure 7: Incorrect First-Differencing of a Trend-Stationary Series
Figure 7 plots the series ∆yt on the left, and the ACF of ∆yt on the right.
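The MA unit root created by differencing a trend-stationary series leaves a clear footprint: an MA(1) with coefficient −1 has lag-1 autocorrelation θ/(1 + θ²) = −0.5 and variance 2σ². The sketch below checks both numbers on simulated data for the case Q = 1. The trend coefficients, sample size, and seed are illustrative assumptions.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))

rng = np.random.default_rng(3)
T, delta0, delta1 = 5_000, 1.0, 0.05
t = np.arange(1, T + 1)
eps = rng.normal(size=T)          # white noise with sigma^2 = 1

y = delta0 + delta1 * t + eps     # trend-stationary series with Q = 1
dy = np.diff(y)                   # equals delta1 + eps_t - eps_{t-1}

rho1 = lag1_autocorr(dy)
var_dy = dy.var()

print("lag-1 autocorrelation of the differenced series:", rho1)   # theory: -0.5
print("variance of the differenced series:", var_dy)              # theory: 2.0
```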
• Even when the trend-stationary component is absent, if the time series is I(0)
but it is differenced d times, the resulting differenced series will contain d unit
roots in its moving-average components and will therefore not be invertible.
(a) If r > 0, when a time series is I(d) but it is differenced more than d times,
the resulting differenced series will contain r unit roots in its moving-average
components and will therefore not be invertible (over-differencing).
(b) If r < 0, if the original time series is I(d) but it is differenced fewer than d times,
the resulting differenced series will still contain d + r < d unit roots and will
therefore remain non-stationary (under-differencing).
Figure 8: Excessive Differencing of a I(1) Series
Figure 8 shows that the over-differenced integrated MA series tends to be spikier and
more volatile than the correctly differenced white noise series should be.
can be rewritten as
ηt = yt − a − bxt
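Substituting $y_t$ and $x_t$ with their stochastic trend equations we obtain
$$\eta_t = \Big(y_0 + \mu_y t + \sum_{\tau=1}^{t} \epsilon^y_\tau\Big) - a - b\Big(x_0 + \mu_x t + \sum_{\tau=1}^{t} \epsilon^x_\tau\Big) = (y_0 - a - b x_0) + (\mu_y - b\mu_x)t + \sum_{\tau=1}^{t} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
⇒ $\eta_t$ is a unit root process, given that $\epsilon^x_\tau$ and $\epsilon^y_\tau$ are independent white noise errors so
that their weighted sum is also white noise.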
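$$\eta_t = \Big(y_0 + \mu_y t + \sum_{\tau=1}^{t} \epsilon^y_\tau\Big) - a - b\Big(x_0 + \mu_x t + \sum_{\tau=1}^{t} \epsilon^x_\tau\Big)$$
$$= [(\mu_y - b\mu_x) + \epsilon^y_t - b\epsilon^x_t] + (y_0 - a - b x_0) + (\mu_y - b\mu_x)(t-1) + \sum_{\tau=1}^{t-1} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
$$= (\mu_y - b\mu_x) + \eta_{t-1} + (\epsilon^y_t - b\epsilon^x_t)$$
that is, another random walk with drift.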
This follows from the result on Sums of stationary and Non-stationary series stated
above, in the case of a sum of two I(1) variables.
• ηt is the weighted sum minus a constant of the two I(1) variables. Therefore
ηt ∼ I(1), which is the highest integration order of the variables that we are
combining.
• Starting from yt = a + bxt + ηt , when the regressand and regressor are both I(1),
it turns out that the regression errors must then also contain, unless special con-
ditions occur, a unit root.
⇒ When ηt is a random walk with drift, the assumptions of the classical regres-
sion model (yt and xt are stationary and the errors have a zero mean and a finite
variance) are not respected.
(a) the residuals are I(1) and as such any shock has a permanent effect on the regres-
sion, being equivalent to a permanent change of the intercept of the model
(b) R-square is high and t-statistics appear to be significant, but the results are void
of any economic meaning
(c) Standard OLS estimators are inconsistent and the associated inferential proce-
dures are invalid and statistically meaningless.
• Estimating spurious regressions and reporting and discussing their results is use-
less.
• Problems will also arise in regression analysis when the regressand and the regres-
sors are integrated of different orders. Regression equations using such variables
are meaningless.
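The spurious regression problem is easy to reproduce by simulation. The sketch below regresses one driftless random walk on another, independent one, and counts how often the conventional t-test on the slope rejects at the nominal 5% level. The sample size, number of replications, seed, and the helper `ols_tstat_slope` are assumptions introduced for this illustration.

```python
import numpy as np

def ols_tstat_slope(y, x):
    """Conventional OLS t-ratio on b in y_t = a + b * x_t + eta_t."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(4)
T, n_sims = 100, 500
rejections = 0
for _ in range(n_sims):
    y = np.cumsum(rng.normal(size=T))   # I(1) regressand
    x = np.cumsum(rng.normal(size=T))   # independent I(1) regressor
    if abs(ols_tstat_slope(y, x)) > 1.96:
        rejections += 1

rejection_rate = rejections / n_sims
print("share of 'significant' slopes between unrelated random walks:", rejection_rate)
```

If the two series were independent white noise instead, the same rejection rate would be close to the nominal 5%.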
Example: if in
$$\eta_t = (y_0 - a - b x_0) + (\mu_y - b\mu_x)t + \sum_{\tau=1}^{t} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
• Because the estimated AR(1) coefficient equals Corr[yt , yt−1 ], this means that any
attempt to recover a unit root from generated RW data will fail because the OLS
estimate of the unit root coefficient will contain a downward bias.
⇒ Dickey and Fuller offer a procedure to formally test for the presence of a unit
root that takes this bias into account.
Dickey and Fuller procedure: find critical values using a Monte Carlo design
(a) Generate a large number of random walk sequences and for each of them calculate
the estimated value of the AR(1) coefficient
(b) By the law of large numbers, the larger the number of simulation trials, the more
precise the resulting distribution of the statistic
(c) Obtain critical values, which are generated by assuming a random walk.
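The Monte Carlo design in steps (a) to (c) can be sketched as follows. This is an illustrative version, not the original Dickey and Fuller experiment: it assumes a test regression with an intercept and no trend, Gaussian shocks, and arbitrary choices for the sample size, number of replications, and seed. Besides the estimated AR(1) coefficient it also stores the t-ratio, since that is the statistic usually compared with the tabulated critical values.

```python
import numpy as np

def df_regression(y):
    """OLS of dy_t = c + alpha * y_{t-1} + e_t; returns (alpha_hat, t-ratio of alpha)."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])
    se_alpha = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1]), float(beta[1] / se_alpha)

rng = np.random.default_rng(5)
T, n_sims = 200, 5_000

# (a) generate random walks under the null and estimate the regression on each
results = np.array([df_regression(np.cumsum(rng.normal(size=T))) for _ in range(n_sims)])
phi_hat = 1.0 + results[:, 0]      # implied AR(1) coefficient, phi = 1 + alpha
t_stats = results[:, 1]

# (b)-(c) the empirical quantiles of the t-ratio approximate the critical values
crit_1, crit_5, crit_10 = np.quantile(t_stats, [0.01, 0.05, 0.10])

print("mean estimated AR(1) coefficient:", phi_hat.mean())   # below 1: downward bias
print("simulated critical values (1%, 5%, 10%):", crit_1, crit_5, crit_10)
```

The simulated 5% value lies well to the left of the −1.645 that a one-sided normal approximation would suggest, which is the reason standard tables cannot be used.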
Dickey-Fuller test: test for the presence of a unit root. From the AR(1) model
equation specified as
where (φ − 1) = α, or as
• It would be incorrect to simply use the standard Student's t distribution, because standard
inference is invalid when the assumed stochastic process is non-stationary.
• The t-test used takes into account that, under the null of a random walk, its
distribution is nonstandard and cannot be analytically evaluated.
• The critical values need to be approximated by simulation, using null models that
are adjusted to either exclude the drift or include a time trend. The decision to
include a time trend depends also on economic theory or financial models.
• The ADF critical values depend on which deterministic regressors are included,
if any: the confidence intervals around α = 0 dramatically expand if a drift and
a time trend are included in the model.
• When the series is stationary the distribution of the t-statistic does not depend
on the presence of the other regressors.
• The DF test is limited by the fact that, given the null hypothesis of a random
walk, the alternative hypothesis is specified as an AR(1) stationary process. How-
ever, not all stationary time-series variables can be well represented by an AR(1)
process.
Augmented Dickey-Fuller (ADF) test: for general AR(p) processes
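Consider the AR(p) process
$$y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + \dots + \phi_{p-1} y_{t-p+2} + \phi_p y_{t-p+1} + \epsilon_{t+1}$$
Add and subtract $\phi_p y_{t-p+2}$
$$y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + \dots + (\phi_{p-1} + \phi_p) y_{t-p+2} - \phi_p \Delta y_{t-p+2} + \epsilon_{t+1}$$
Add and subtract $(\phi_{p-1} + \phi_p) y_{t-p+3}$
$$y_{t+1} = \phi_0 + \phi_1 y_t + \dots + (\phi_{p-2} + \phi_{p-1} + \phi_p) y_{t-p+3} - (\phi_{p-1} + \phi_p) \Delta y_{t-p+3} - \phi_p \Delta y_{t-p+2} + \epsilon_{t+1}$$
Repeat this step until all lags beyond $y_t$ are expressed in differences, then subtract $y_t$ from both sides to obtain
$$\Delta y_{t+1} = \phi_0 + \alpha y_t + \sum_{i=2}^{p} \gamma_i \Delta y_{t-i+2} + \epsilon_{t+1}$$
where $\alpha \equiv -(1 - \sum_{i=1}^{p} \phi_i)$, $\gamma_i \equiv -\sum_{j=i}^{p} \phi_j$, and $\Delta y_t \equiv y_t - y_{t-1}$.
Alternatively, the regression can be specified without an intercept,
$$\Delta y_{t+1} = \alpha y_t + \sum_{i=2}^{p} \gamma_i \Delta y_{t-i+2} + \epsilon_{t+1}$$
or with a polynomial time trend,
$$\Delta y_{t+1} = \phi_0 + \sum_{j=1}^{Q} \delta_j t^j + \alpha y_t + \sum_{i=2}^{p} \gamma_i \Delta y_{t-i+2} + \epsilon_{t+1}$$
If α = 0 the equation is entirely in first differences and so it has a unit root, while
α < 0 is an indication of stationarity.
• When p = 1, the ADF test boils down to a classical DF test because the sum $\sum_{i=2}^{1} \gamma_i \Delta y_{t-i+2}$
is empty and equals 0.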
• α and its standard error cannot be properly estimated unless all of the autoregres-
sive terms are included in the estimating equation, through correct specification
of p.
– Few lags: the regression residuals do not behave like a white noise process ⇒
the model does not appropriately capture the actual error process ⇒ α and
its standard error are not correctly estimated.
– Too many lags: the power of the test to reject the null of a unit root is reduced
since the increased number of lags necessitates the estimation of additional
parameters and a loss of degrees of freedom.
• Select p:
– Start with a relatively long lag length and pare down the model by the usual
t- and/or F -tests.
• The most general of the alternative regressions is not necessarily the best choice,
since the presence of the additional estimated parameters may reduce the degrees
of freedom and the power of the test.
• The first regression assumes an AR(p) process, but in reality the DGP may con-
tain both autoregressive and moving average components. However, because an
invertible MA model can be transformed into an AR model, the procedure can
be generalized to allow for moving average components.
• ADF tests suffer from low power because they cannot distinguish between a series with a
characteristic root that is close to unity and a true unit root process, or between
a trend-stationary and a stochastically trending process. This is because in finite sam-
ples a trend-stationary process can be arbitrarily well approximated by a unit root
process, and a unit root process can be arbitrarily well approximated by a trend-
stationary process.
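An ADF regression with an intercept and a chosen number of lagged differences can be estimated by plain OLS. The sketch below is a hand-rolled illustration rather than a library routine: the function name `adf_tstat`, the lag choice, the AR coefficient 0.5, and the seed are assumptions. The argument `n_lags` counts the lagged differences, so it corresponds to p − 1 for an AR(p) process. The returned t-ratio must be compared with Dickey-Fuller critical values, not with the Student's t table.

```python
import numpy as np

def adf_tstat(y, n_lags=1):
    """t-ratio of alpha in dy_t = c + alpha * y_{t-1} + sum_i gamma_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    dep = dy[n_lags:]                          # dependent variable
    cols = [np.ones_like(dep), y[n_lags:-1]]   # intercept and lagged level
    for i in range(1, n_lags + 1):
        cols.append(dy[n_lags - i:-i])         # i-th lagged difference
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    s2 = resid @ resid / (len(dep) - X.shape[1])
    se_alpha = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1] / se_alpha)

rng = np.random.default_rng(6)
T = 500
eps = rng.normal(size=T)           # the same shocks feed both series

rw = np.cumsum(eps)                # unit root process
ar = np.zeros(T)                   # stationary AR(1) with coefficient 0.5
for t in range(1, T):
    ar[t] = 0.5 * ar[t - 1] + eps[t]

t_rw = adf_tstat(rw, n_lags=2)
t_ar = adf_tstat(ar, n_lags=2)
print("ADF t-ratio, random walk:     ", t_rw)
print("ADF t-ratio, stationary AR(1):", t_ar)
```

Replacing 0.5 with a value such as 0.98 typically moves the second t-ratio much closer to the first, which illustrates the low-power problem for near-unit-root processes.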
The plot on the left compares 180 generated values from the two models:
both initialized at y0 = y1 = 0, in which the errors are Gaussian IID and kept identical
across simulated values.
The rightmost plot of Figure 9 shows that it can be difficult to distinguish between a
trend stationary and a random walk with drift process.
where the recommended ã is equal to $1 - 7/T$ when there is only an intercept, and
$1 - 13.5/T$ when there is also a linear time trend.
• Instead of modelling a constant and/or time trends in the test regression, the data
are de-trended so that the deterministic explanatory variables are “taken out” of
the data prior to estimating the test regression.
(a) Phillips and Perron (PP) non-parametric method of controlling for serial correla-
tion when testing for a unit root: estimates the classical DF test equations and
modifies the t-ratio of the coefficient α so that serial correlation in the residuals
does not affect the asymptotic distribution of the test statistic.
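$$t^{PP}_\alpha = t^{DF}_\alpha \varsigma - \psi = \frac{\hat{\alpha}}{se(\hat{\alpha})} \sqrt{\frac{\frac{T-m}{T} \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2}{\sum_{i=-[T^{2/9}]}^{[T^{2/9}]} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat{\epsilon}_t \hat{\epsilon}_{t-i}}} - \frac{se(\hat{\alpha})\, T \Big(\sum_{i=-[T^{2/9}]}^{[T^{2/9}]} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat{\epsilon}_t \hat{\epsilon}_{t-i} - \frac{T-m}{T} \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2\Big)}{2 \sqrt{\Big(\sum_{i=-[T^{2/9}]}^{[T^{2/9}]} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat{\epsilon}_t \hat{\epsilon}_{t-i}\Big) \Big(\frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2\Big)}}$$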
where m is the number of deterministic regressors included and se(α̂) is the OLS
standard error of the estimated coefficient.
• Because ς, ψ > 0, even when ς > 1, it is possible for $t^{PP}_\alpha < t^{DF}_\alpha$, so that while
a DF test would not reject the null of a unit root, the PP test will.
(b) Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test: based on the residuals
from the OLS regression of the series on the exogenous, deterministic factors
where xt may contain the intercept and a polynomial time trend and the series is
assumed to be (trend-)stationary under the null.
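$$KPSS_T \equiv \frac{\sum_{t=1}^{T} \big(\sum_{i=1}^{t} \hat{u}_i\big)^2}{T^2 \sum_{i=-T^{2/9}}^{T^{2/9}} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat{\epsilon}_t \hat{\epsilon}_{t-i}}$$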
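A KPSS-type statistic can be computed directly from the partial sums of the OLS residuals. The sketch below deviates from the formula in the notes in one respect, which is an assumption of this example: the long-run variance in the denominator uses Bartlett (Newey-West) weights instead of an unweighted truncated sum, so that the estimate cannot turn negative. The function name `kpss_stat`, the default truncation lag, and the simulated data are also illustrative.

```python
import numpy as np

def kpss_stat(y, trend=False, n_lags=None):
    """KPSS-type statistic: sum of squared partial sums of residuals / (T^2 * long-run variance)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), t]) if trend else np.ones((T, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                      # residuals from the deterministic regression
    S = np.cumsum(u)                      # partial sums of the residuals
    if n_lags is None:
        n_lags = int(T ** (2 / 9))        # truncation lag of order T^(2/9)
    lrv = u @ u / T                       # lag-0 autocovariance
    for i in range(1, n_lags + 1):
        w = 1.0 - i / (n_lags + 1)        # Bartlett weight (assumption of this sketch)
        lrv += 2.0 * w * (u[i:] @ u[:-i]) / T
    return float(S @ S / (T**2 * lrv))

rng = np.random.default_rng(7)
white_noise = rng.normal(size=500)              # stationary under the null
random_walk = np.cumsum(rng.normal(size=500))   # violates the null

kpss_wn = kpss_stat(white_noise)
kpss_rw = kpss_stat(random_walk)
print("KPSS statistic, white noise:", kpss_wn)
print("KPSS statistic, random walk:", kpss_rw)
```

Large values of the statistic speak against the null of (trend-)stationarity, so the random walk should produce the larger number.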
• Unit roots may characterize the MA(q) components of a time series process, which
makes it non-invertible.
• Testing for a unit root in the MA polynomial is equivalent to testing that a time
series has been over-differenced.
• Testing for a unit root in the MA polynomial gives a chance to distinguish be-
tween trend-stationary and stochastically trending processes, because we know
that while the dth difference of an I(d) process is I(0), the dth difference of a
trend stationary process will contain d unit roots in the MA polynomial.
• Let $\{\epsilon_t\}$ be observations from $\epsilon_{t+1} = \theta\eta_t + \eta_{t+1}$ with $\eta_{t+1} \sim IID(0, \sigma^2_\eta)$. Under
$H_0: \theta = -1$ the statistic $T(\hat{\theta} + 1)$ based on the MLE $\hat{\theta}$ converges to a distribution
characterized by Davis and Dunsmuir.
A one-sided test with $H_0: \theta = -1$ vs. $H_1: \theta > -1$ can be shaped on this
limiting result by rejecting the null when $\hat{\theta} > -1 + c_{size}/T$, where $c_{size}$ is the
(1 − size)-quantile of the limit distribution of $T(\hat{\theta} + 1)$.
4 Cointegration and Error Correction Models
• There are situations in which the choice to simultaneously difference all non-
stationary variables to transform them into a set of stationary variables may
imply a major loss of information, possibly invalid inference, and sub-optimal
predictive performance.
with
κ ≡ (1 + g)/(r − g) > 0
$P_{t+1}$ = (real) stock price
$F_{t+1}$ = some fundamental real quantity that investors discount when pricing stocks,
e.g. cash dividends or earnings
r = fixed real discount rate that reflects the riskiness of stocks
g = constant real rate of growth of fundamentals
$\epsilon_{t+1}$ = pricing error relative to the fair value of the stock index, that is, temporary deviations of market prices from
what is justified by fundamentals.
• If $\{\epsilon_{t+1}\}$ were an I(1) series, this would mean that pricing errors would never be
corrected and on the contrary could diverge forever.
• If the time series {Pt+1 } and {Ft+1 } are I(1), a coefficient κ exists such that a
weighted sum of real prices and fundamentals yields a stationary variable:
I(1) − κI(1) ∼ I(0)
that violates the result on Sums of Stationary and Non-Stationary Series.
• It is necessary that the time paths of the two non-stationary variables {Pt+1 } and
{Ft+1 } are closely linked.
• By definition, if two or more variables are integrated of different orders, they can-
not be cointegrated.
Multiplicity of Cointegrating Vectors: If yt has N non-stationary components,
there may be as many as (N –1) linearly independent cointegrating vectors. The num-
ber of cointegrating vectors is called the cointegrating rank of yt .
• If yt contains only two variables, there can be at most one independent cointe-
grating vector.
• If
• Cointegrated variables are such because they share common stochastic trends.
Example: dividend growth model where {Pt } and {Ft } are constructed as two
random walks plus a stationary noise process
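$$\kappa_1 P_t + \kappa_2 F_t = \kappa_1 (RW^P_t + \epsilon^P_t) + \kappa_2 (RW^F_t + \epsilon^F_t) = (\kappa_1 RW^P_t + \kappa_2 RW^F_t) + (\kappa_1 \epsilon^P_t + \kappa_2 \epsilon^F_t)$$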
Because the series in yt share one or more common stochastic trends, they cannot
drift “too far apart”.
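The role of a common stochastic trend can be shown with two simulated series. In the sketch below both series load on the same random walk, so a suitable linear combination cancels the trend and leaves only stationary noise. The loadings (2 and 1), the weights (1 and −2), the sample size, and the seed are assumptions made for the illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))

rng = np.random.default_rng(8)
T = 2_000
common_rw = np.cumsum(rng.normal(size=T))   # shared stochastic trend

P = 2.0 * common_rw + rng.normal(size=T)    # "prices": trend plus stationary noise
F = 1.0 * common_rw + rng.normal(size=T)    # "fundamentals": trend plus stationary noise

# kappa_1 = 1 and kappa_2 = -2 make kappa_1 * RW^P + kappa_2 * RW^F vanish.
combo = 1.0 * P - 2.0 * F

print("variance of P:              ", P.var())
print("variance of the combination:", combo.var())          # about 1 + 4 = 5
print("lag-1 autocorr of P:        ", lag1_autocorr(P))     # close to 1
print("lag-1 autocorr of the combination:", lag1_autocorr(combo))  # close to 0
```

Any other choice of weights leaves a multiple of the random walk in the combination, which then remains I(1).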
4.3 Error Correction Models
(where the error vectors $u_{t+1}$ can be serially correlated) then, because the vector equa-
tion will need to be balanced, $\Pi y_t \sim I(0)$ will imply that the variables in $y_t$ are CI(1, 1).
Because the N × N matrix Π ≠ 0 contains only constants, each row of this matrix is
a cointegrating vector of $y_t$ and rank(Π) is the cointegrating rank of $y_t$.
• The only case in which a VECM fails to imply a cointegrating relationship, is
when Π = 0 ⇒ the VECM boils down to a traditional reduced-form VAR(p) in
first differences and yt+1 will not respond to previous period’s deviation from the
long-run equilibrium.
• When Π ≠ 0 ⇒ yt+1 will respond to the previous period’s deviations from the
long-run equilibrium ⇒ estimating the VECM as a VAR(p) in first differences is
inappropriate ⇒ the omission of Πyt leads to misspecification.
• In a model in which the error correction terms have been unduly dropped, the
VECM has no long-run solution, thus it has nothing to say about whether the
variables have an equilibrium relationship.
• It is preferable to use the first differences if the N I(1) variables are not cointe-
grated because
– For a VAR in levels, tests for Granger causality conducted on the I(1)
variables do not have a standard F-distribution.
– The impulse responses at long horizons would also lead to inconsistent esti-
mates of the true responses.
Granger’s Representation Theorem: For any set of N I(d) variables with identical
integration order, error correction and cointegration are equivalent representations.
where $a_{12} \neq 0$, $|a_{11}| < 1$, and $\epsilon^P_{t+1}$ and $\epsilon^F_{t+1}$ are possibly correlated white noise processes.
We drop an intercept from the price equation to prevent the presence of a drift that
would cause arbitrage opportunities.
If a22 = 1 the equation for fundamentals is a random walk with drift and as such taking
the first difference of both equations gives
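$$\Delta P_{t+1} = (a_{11} - 1) P_t + a_{12} F_t + \epsilon^P_{t+1} = -a_{12} \Big[\frac{1 - a_{11}}{a_{12}} P_t - F_t\Big] + \epsilon^P_{t+1} \quad \text{and} \quad \Delta F_{t+1} = a_{20} + \epsilon^F_{t+1}$$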
which shows that the first equation is balanced, in the sense that both its left- and
right-hand sides are I(0) ⇐⇒ [((1 − a11 )/a12 )Pt − Ft ] ∼ I(0) independently of the
value taken by a12 . Impose two additional restrictions on the VAR to imply that prices
and fundamentals are CI(1, 1):
(a) $a_{12} > 0$ ⇒ when $((1 - a_{11})/a_{12}) P_t < F_t$ and the price is too low, then
$-a_{12} [((1 - a_{11})/a_{12}) P_t - F_t] > 0$ ⇒ valid error correction dynamic
(b) $a_{11} < 1$ ⇒ the cointegrating vector is $[(1 - a_{11})/a_{12} \;\; -1]'$ ⇒ the model reaches its long-run equilibrium when
$F_t = ((1 - a_{11})/a_{12}) P_t$ or $F_t / P_t = (1 - a_{11})/a_{12}$.
$$F_{t+1} = F_0 + a_{20}(t+1) + \sum_{\tau=1}^{t+1} \epsilon^F_\tau$$
⇒ prices are I(2) and fundamentals are I(1), then they cannot be cointegrated.
When |a22 | > 1 and/or |a11 | > 1 ⇒ both variables are non-stationary and explosive
and therefore no cointegration exists.
Pt = κ0 + κ1 Ft + et
• Determine whether prices and fundamentals are actually cointegrated ⇒ if they are,
the time series of estimated deviations from the long-run relationship, $\hat{e}_t =
P_t - \hat{\kappa}_0 - \hat{\kappa}_1 F_t$, is stationary ⇒ apply a unit root test to the residuals of the
Engle-Granger regression: failing to reject the null of a unit root in the residuals
implies that we cannot reject the null hypothesis
that prices and fundamentals are not cointegrated.
$$\Delta F_{t+1} = \lambda_F \hat{e}_t + a^F_0 + \sum_{i=1}^{p} a^F_{1i} \Delta P_{t+1-i} + \sum_{i=1}^{p} a^F_{2i} \Delta F_{t+1-i} + \epsilon^F_{t+1}$$
• The estimation of the long-run equilibrium regression requires that the re-
searcher places one variable on the left-hand side and uses the others as
regressors.
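The two-step Engle-Granger logic can be sketched on simulated data. Everything specific in the example is an assumption made for illustration: the data generating process (fundamentals follow a random walk with drift, prices equal 1 + 2F plus white noise), one lag of the differences in the error correction equations, the helper `ols`, and the seed. The t-ratio on the residuals must be compared with Engle-Granger critical values, which differ from the standard Dickey-Fuller ones because the residuals are estimated.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(9)
T = 1_000
F = np.cumsum(0.1 + rng.normal(size=T))     # I(1) fundamentals with drift
P = 1.0 + 2.0 * F + rng.normal(size=T)      # prices cointegrated with F

# Step 1: long-run regression P_t = kappa_0 + kappa_1 * F_t + e_t
kappa_hat, e_hat = ols(P, np.column_stack([np.ones(T), F]))

# Unit root regression on the residuals: de_t = alpha * e_{t-1} + v_t (no intercept)
de, e_lag = np.diff(e_hat), e_hat[:-1]
alpha_hat = (e_lag @ de) / (e_lag @ e_lag)
v = de - alpha_hat * e_lag
se_alpha = np.sqrt((v @ v) / (len(de) - 1) / (e_lag @ e_lag))
t_alpha = alpha_hat / se_alpha

# Step 2: error correction equations with one lag of the differences
dP, dF = np.diff(P), np.diff(F)
X = np.column_stack([e_hat[1:-1], np.ones(T - 2), dP[:-1], dF[:-1]])
coef_P, _ = ols(dP[1:], X)
coef_F, _ = ols(dF[1:], X)
lambda_P, lambda_F = coef_P[0], coef_F[0]   # speed-of-adjustment coefficients

print("estimated kappa_0, kappa_1:", kappa_hat)
print("t-ratio on lagged residual:", t_alpha)
print("lambda_P, lambda_F:", lambda_P, lambda_F)
```

In this particular design only prices adjust to the disequilibrium, so `lambda_P` should be clearly negative while `lambda_F` should be close to zero.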
(b) Multivariate, VECM-based tests: take the potential existence of multiple coin-
tegrating vectors into account by using single-step full information maximum
likelihood estimation (multivariate generalization of the Dickey-Fuller tests).
(c) Estimate the matrix Π from the unrestricted VAR for N non-stationary series
(d) Test whether we can reject the restrictions implied by the reduced rank of Π:
test for the number of eigenvalues that are insignificantly different from unity
using
H0 : number of distinct cointegrating vectors ≤ r
H1 : number of distinct cointegrating vectors > r
$$\lambda_{trace}(r) \equiv -T \sum_{i=r+1}^{N} \ln(1 - \hat{\lambda}_i)$$
• The statistics have heterogeneous functional form, thus may give different results.
λmax is preferred when trying to pin down the number of cointegrating vectors.
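Given estimated eigenvalues, both statistics are simple to compute. The sketch below implements the trace statistic as defined above and the maximum eigenvalue statistic in its usual form, $\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1})$, with eigenvalues sorted in descending order. The eigenvalues and sample size in the usage lines are hypothetical numbers, not estimates from any data set, and the resulting statistics would have to be compared with the appropriate tabulated critical values.

```python
import numpy as np

def trace_stat(eigvals, T, r):
    """lambda_trace(r) = -T * sum_{i=r+1}^{N} ln(1 - lambda_i)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    return float(-T * np.sum(np.log(1.0 - lam[r:])))

def max_eig_stat(eigvals, T, r):
    """lambda_max(r, r+1) = -T * ln(1 - lambda_{r+1})."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    return float(-T * np.log(1.0 - lam[r]))

# Hypothetical eigenvalues for a system of N = 2 variables and T = 100 observations.
eig = [0.5, 0.1]
T = 100

for r in range(len(eig)):
    print(f"r = {r}: trace = {trace_stat(eig, T, r):.3f}, max = {max_eig_stat(eig, T, r):.3f}")
```

For r = N − 1 the two statistics coincide, because only the smallest eigenvalue enters both formulas.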