
Lecture 5 Notes Final20180304112607

The document discusses unit root processes and cointegration. It defines unit root processes as either containing deterministic trends like linear, quadratic, or cubic functions of time, or containing a stochastic trend. A series with a stochastic trend follows a random walk or random walk with drift process and is non-stationary. The document also discusses how to identify unit root processes and their order of integration, as well as the implications of incorrectly de-trending a unit root series.

Unit Roots and Cointegration

Massimo Guidolin

March 2018

1 Defining Unit Root Processes

Deterministic trends: linear or nonlinear functions of time t, $y_t = f(t) + \epsilon_t$. Example:

$$y_t = \sum_{j=0}^{Q} \delta_j t^j + \epsilon_t$$

where $\{\epsilon_t\}$ is a white noise process.

• The trending effects caused by functions of t are permanent and hence impress a trend, as time is irreversible; for instance, when Q = 1 ⇒ $\Delta y_t = y_t - y_{t-1} = \delta_1 + (\epsilon_t - \epsilon_{t-1})$.

• Such series are trend stationary: $y_{t+1}$ can differ from its trend value by the amount $\epsilon_{t+1}$ and, because this deviation is stationary, the series will exhibit only temporary departures from the trend ⇒ the long-term forecast of $y_{t+1}$ will converge to the trend line $\sum_{j=0}^{Q} \delta_j t^j$.

Stochastic trend: all series that can be written as

$$y_t = y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau$$

• $Var[\epsilon_t] = \sigma^2$ and $y_{t+1} = y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau + \mu + \epsilon_{t+1} = \mu + y_t + \epsilon_{t+1}$
⇒ The series follows a random walk with drift process.

or more generally, $y_t$ contains a stochastic trend ⇐⇒ it can be decomposed as

$$y_t = y_0 + \mu t + \sum_{\tau=1}^{t} \eta_\tau$$

where $\eta_\tau$ follows any stationary ARMA(p, q) process.
This expression is centered on the first differences of $y_t$: $\Delta y_t = \mu + \eta_t$ is stationary.
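The equivalence between the cumulated-shock form and the recursive form of a random walk with drift can be checked numerically. The sketch below is only an illustration: the function name `simulate_rw_with_drift`, the Gaussian shocks, and the parameter values are assumptions, not part of the notes.

```python
import random

def simulate_rw_with_drift(y0, mu, sigma, T, seed=0):
    """Build y_0, ..., y_T from y_t = y0 + mu*t + sum_{tau<=t} eps_tau.

    Gaussian shocks are an assumption made for illustration only.
    """
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma) for _ in range(T)]
    path, cumulated = [y0], 0.0
    for t, eps in enumerate(shocks, start=1):
        cumulated += eps                      # stochastic trend: sum of past shocks
        path.append(y0 + mu * t + cumulated)  # deterministic drift plus stochastic trend
    return path, shocks

mu = 0.2
path, shocks = simulate_rw_with_drift(y0=1.0, mu=mu, sigma=1.0, T=200)

# Recursive form: y_{t+1} = mu + y_t + eps_{t+1}; shocks[t] holds eps_{t+1}.
max_gap = max(abs(path[t + 1] - (mu + path[t] + shocks[t])) for t in range(200))
print("largest gap between the two representations:", max_gap)
```

The gap is zero up to floating-point rounding, which is the algebraic identity stated in the bullet above.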

• All deterministic trends can be converted into stochastic trends, while the opposite is
not true.

• A series $\{y_t\}$ can be decomposed as $y_t$ = trend + stationary component + noise = (deterministic trend + stochastic trend) + stationary component + noise.

• In large samples, the deterministic time trend induced by the drift component dom-
inates the time series, while in small samples it is not always easy to discern the
difference between a driftless RW and a model with drift.

Figure 1: Comparing Trend-Stationary Series and Stochastic Trends

Figure 2: Estimated Deterministic Trends and Estimated Random Walk

Figures 1 and 2 compare simulated deterministic (linear, quadratic, and cubic) trends to random walk (RW) stochastic trend series (with no drift and with positive/negative drift): the deterministic time trend induced by the drift component dominates the time series.
Expectation of a Random Walk:

1. If µ = 0
$$E[y_t] = E[y_{t+s}] = y_0 + E\Big[\sum_{\tau} \epsilon_\tau\Big] = y_0 + \sum_{\tau} E[\epsilon_\tau] = y_0$$
that is, the mean is constant and equal to the initial value. However, all stochastic shocks have non-decaying effects on the series.
If µ ≠ 0
$$E[y_t] = E\Big[y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau\Big] = y_0 + \mu t$$

2. If µ = 0
$$E_t[y_{t+s}] = E_t[y_{t+s-1} + \epsilon_{t+s}] = E_t[y_{t+s-2} + \epsilon_{t+s} + \epsilon_{t+s-1}] = ... = E_t[y_t] + E_t\Big[\sum_{i=1}^{s} \epsilon_{t+i}\Big] = y_t$$
that is, for any s ≥ 1 the conditional means of all values of $y_{t+s}$ are equal and the current $y_t$ is the minimum mean-squared-error forecast of all future values of the series.
If µ ≠ 0
$$E_t[y_{t+s}] = E_t\Big[y_0 + \mu(t+s) + \sum_{\tau=1}^{t+s} \epsilon_\tau\Big] = y_0 + \mu(t+s) + \sum_{\tau=1}^{t} \epsilon_\tau = y_t + \mu s$$
that is, the forecast function changes deterministically with the horizon s.

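Variance and autocorrelation of a driftless Random Walk (µ = 0):

(a)
$$Var[y_t] = Var\Big[y_0 + \sum_{\tau=1}^{t} \epsilon_\tau\Big] = \sum_{\tau=1}^{t} Var[\epsilon_\tau] = t\sigma^2$$
the variance is time-dependent and therefore a RW is not covariance stationary

(b)
$$Var[y_{t+s}] = Var\Big[y_0 + \sum_{\tau=1}^{t+s} \epsilon_\tau\Big] = \sum_{\tau=1}^{t+s} Var[\epsilon_\tau] = (t+s)\sigma^2$$
as s → ∞ the variance approaches infinity.

(c) Given that the mean is constant
$$E[(y_t - y_0)(y_{t+s} - y_0)] = E\Big[\sum_{\tau=1}^{t} \epsilon_\tau \cdot \sum_{\tau=1}^{t+s} \epsilon_\tau\Big] = E\Big[\sum_{\tau=1}^{t} \epsilon_\tau^2\Big] = t\sigma^2$$

(d)
$$Corr[y_t, y_{t+s}] = \frac{E[(y_t - y_0)(y_{t+s} - y_0)]}{\sqrt{Var[y_t]Var[y_{t+s}]}} = \frac{t\sigma^2}{\sqrt{t\sigma^2 (t+s)\sigma^2}} = \sqrt{\frac{t}{t+s}}$$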

• A driftless RW meanders without exhibiting any tendency to increase or decrease. In fact, for any fixed real number M, the absolute value of the random walk will exceed M with probability 1 as the series lengthens.

• For sufficiently large samples (t large relative to s), $Corr[y_t, y_{t+s}] \approx 1$, but as s grows $Corr[y_t, y_{t+s}]$ declines below 1 ⇒ We cannot distinguish between a unit root process and a stationary process with an autoregressive coefficient that is close to unity using the ACF.
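The moments in (a) and (d) can be checked by Monte Carlo. The following is a minimal sketch under assumed settings (Gaussian shocks, $y_0 = 0$, t = s = 50, 4000 simulated paths); the helper names are hypothetical.

```python
import math
import random

def simulate_driftless_rw(T, sigma, rng):
    """Return [y_1, ..., y_T] for y_t = y_{t-1} + eps_t with y_0 = 0."""
    y, path = 0.0, []
    for _ in range(T):
        y += rng.gauss(0.0, sigma)
        path.append(y)
    return path

def sample_corr(a, b):
    """Plain sample correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(42)
t, s, sigma, n_paths = 50, 50, 1.0, 4000
y_t, y_ts = [], []
for _ in range(n_paths):
    path = simulate_driftless_rw(t + s, sigma, rng)
    y_t.append(path[t - 1])       # y_t across replications
    y_ts.append(path[t + s - 1])  # y_{t+s} across replications

var_t = sum(v * v for v in y_t) / n_paths  # E[y_t] = y_0 = 0, so no demeaning
corr = sample_corr(y_t, y_ts)
theory_corr = math.sqrt(t / (t + s))
print("Var[y_t]:", var_t, "vs t*sigma^2 =", t * sigma ** 2)
print("Corr[y_t, y_t+s]:", corr, "vs sqrt(t/(t+s)) =", theory_corr)
```

The moments are computed across replications at fixed dates, which is what the formulas describe; a single sample path would not reveal them.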

Figure 3: Sample ACF and Estimated AR(p) Models on RW Series

Figure 3 shows that, even though the sample autocorrelations are close to 1, they visi-
bly decay possibly instilling the doubt that we may be facing a highly persistent AR(1)
process.

Deterministic De-Trending: De-trending entails regressing a variable on a deterministic (polynomial) function of time and saving the residuals, $\{\hat\epsilon_t\}$, which then represent the new, de-trended series
$$y_t = \sum_{j=0}^{Q} \hat\delta_j t^j + \hat\epsilon_t$$
where the coefficients can be simply estimated by OLS.

• The appropriate degree of the polynomial can be set by standard t-tests, F-tests,
and/or information criteria.

• The regression is usually estimated using the largest value of Q considered reasonable given the type of data or their frequency.

• The de-trended process can then be modeled using traditional methods.
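For the linear case (Q = 1) the de-trending regression has a closed-form OLS solution. The sketch below assumes a time index t = 1, ..., T and an illustrative trend-stationary series; `detrend_linear` is a hypothetical helper, not a routine from the notes.

```python
import random

def detrend_linear(y):
    """OLS of y_t on (1, t), t = 1..T; returns (delta0_hat, delta1_hat, residuals)."""
    T = len(y)
    t_vals = range(1, T + 1)
    t_bar = (T + 1) / 2
    y_bar = sum(y) / T
    s_tt = sum((t - t_bar) ** 2 for t in t_vals)
    s_ty = sum((t - t_bar) * (v - y_bar) for t, v in zip(t_vals, y))
    d1 = s_ty / s_tt                 # slope estimate
    d0 = y_bar - d1 * t_bar          # intercept estimate
    resid = [v - d0 - d1 * t for t, v in zip(t_vals, y)]
    return d0, d1, resid

# Illustrative trend-stationary data: y_t = 3 + 0.5 t + eps_t (assumed values).
rng = random.Random(1)
y = [3.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(1, 201)]
d0, d1, resid = detrend_linear(y)
print("intercept:", d0, "slope:", d1)
print("mean of de-trended series:", sum(resid) / len(resid))
```

The residual list is the de-trended series; higher-order polynomials would need a general least-squares solver instead of the closed form used here.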

Unit Root Process and dth Order Integration: When a time series process {yt }
needs to be differenced d times before being reduced to the sum of constant terms plus
a white noise process, {yt } is said to contain d unit roots or to be integrated of order
d; we also write that yt ∼ I(d).

Example: RW with drift process.
Take its first difference:

$$\Delta y_{t+1} \equiv y_{t+1} - y_t = (\mu + y_t + \epsilon_{t+1}) - y_t = \mu + \epsilon_{t+1}$$

⇒ white noise series plus a constant intercept. The process is I(1) and contains one unit root.

• If $\{y_t\}$ contains d unit roots, then $\{y_t\}$ is non-stationary, while the opposite does not hold (e.g. explosive autoregressive process: $y_{t+1} = \mu + \phi y_t + \epsilon_{t+1}$ with $|\phi| > 1$).

• Typically φ = 1 is used to characterize non-stationarity because it accurately describes many time series.

• All I(d) processes (with d > 0) are non-stationary, but there are non-stationary
processes that do not fit the definition of unit root processes.

Figure 4: De-trended and First-Differenced Series

Figure 5: Sample ACF of De-trended and First-Differenced Series

1.1 What Happens if One Incorrectly De-Trends a Unit Root
Series?

De-trending a Unit Root Process: When a time series process {yt } is I(d) but an
attempt is made to remove its stochastic trend by fitting deterministic (often polyno-
mial, spline-type) time trend functions, the resulting OLS residuals will still contain
one or more unit roots. Equivalently, deterministic de-trending does not remove the
stochastic trends.

Example: RW with drift process
De-trend using a simple linear time trend, i.e. subtracting $\hat\delta_0 + \hat\delta_1 t$
$$y_t - \hat y_t^{de-trend} = y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau - \hat\delta_0 - \hat\delta_1 t = (y_0 - \hat\delta_0) + (\mu - \hat\delta_1)t + \sum_{\tau=1}^{t} \epsilon_\tau$$
that is, the resulting expression for $y_t - \hat y_t^{de-trend}$ still equals $\sum_{\tau=1}^{t} \epsilon_\tau$ even when $\hat\delta_0 = y_0$ and $\hat\delta_1 = \mu$. Therefore, the source of unit roots and, consequently, the stochastic trend have not been removed.

Figure 6: Incorrect Polynomial De-trending of a Random Walk

Figure 6 plots the series $y_t - \hat y_t^{de-trend}$ on the left, and the ACF of $y_t - \hat y_t^{de-trend}$ on the right.

1.2 What Happens if One Incorrectly Applies Differencing to
(Deterministic) Trend-Stationary Series?

Differencing a Trend-Stationary Process: When a time series process $\{y_t\}$ contains a deterministic time trend but is otherwise I(0) (i.e., it is trend-stationary) and an attempt is made to remove the deterministic trend by differencing the series d times, the resulting differenced series will contain d unit roots in its moving-average components and will therefore not be invertible. Equivalently, differencing a trend-stationary series creates new stochastic trends that are shifted inside the shocks of the series.

• The more times you difference a trend-stationary process, the higher the number of unit roots in the resulting MA process.

Example:
Starting from a deterministic trend
$$y_t = \sum_{j=0}^{Q} \delta_j t^j + \epsilon_t$$
take its first difference
$$\Delta y_t \equiv y_t - y_{t-1} = \Big(\sum_{j=0}^{Q} \delta_j t^j + \epsilon_t\Big) - \Big(\sum_{j=0}^{Q} \delta_j (t-1)^j + \epsilon_{t-1}\Big) = \sum_{j=0}^{Q} \delta_j [t^j - (t-1)^j] + (\epsilon_t - \epsilon_{t-1})$$
that is, an MA(1) with a complex time trend, characterized by a unit root in the MA component.
When Q = 1 this becomes an integrated moving-average of order 1
$$\Delta y_t = \delta_1 + (\epsilon_t - \epsilon_{t-1})$$
If we difference a second time we obtain an integrated MA(2) process with two unit roots in its MA component, and an increasingly complex time trend function
$$\Delta^2 y_t \equiv \Delta y_t - \Delta y_{t-1} = \Big(\sum_{j=0}^{Q} \delta_j [t^j - (t-1)^j] + \epsilon_t - \epsilon_{t-1}\Big) - \Big(\sum_{j=0}^{Q} \delta_j [(t-1)^j - (t-2)^j] + \epsilon_{t-1} - \epsilon_{t-2}\Big)$$
$$= \sum_{j=0}^{Q} \delta_j \{[t^j - (t-1)^j] - [(t-1)^j - (t-2)^j]\} + \epsilon_t - 2\epsilon_{t-1} + \epsilon_{t-2}$$
Figure 7: Incorrect First-Differencing of a Trend-Stationary Series

Figure 7 plots the series ∆yt on the left, and the ACF of ∆yt on the right.

1.3 What Happens if One Incorrectly Applies Differencing to
a Stationary Series?

• Even when the deterministic trend component is absent, if the time series is I(0) but it is differenced d times, the resulting differenced series will contain d unit roots in its moving-average components and will therefore not be invertible.
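A quick way to see the effect of over-differencing is to difference a white noise series once. The result is $\epsilon_t - \epsilon_{t-1}$, an MA(1) with coefficient −1, whose theoretical lag-1 autocorrelation is $\theta/(1+\theta^2) = -0.5$ and whose variance is $2\sigma^2$. The sketch below uses assumed Gaussian shocks and hypothetical helper names.

```python
import random

def difference(x):
    """First difference: returns [x_2 - x_1, ..., x_n - x_{n-1}]."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

rng = random.Random(7)
eps = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # I(0) white noise
d_eps = difference(eps)                            # over-differenced series

rho1_level = autocorr(eps, 1)
rho1_diff = autocorr(d_eps, 1)
var_ratio = variance(d_eps) / variance(eps)
print("lag-1 autocorrelation, level:", rho1_level)
print("lag-1 autocorrelation, differenced:", rho1_diff)
print("variance ratio differenced/level:", var_ratio)
```

The strongly negative first autocorrelation and the doubled variance are the usual symptoms of a series that has been differenced once too often.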

1.4 What Happens if One Incorrectly Applies Differencing
d + r Times to an I(d) Series?

• This generalizes the previous cases, namely differencing r times an I(0) series that contains a deterministic trend and differencing r times an I(0) series (in both, d = 0).

Two cases when differencing $y_t \sim I(d)$ d + r times:

(a) If r > 0, i.e., when a time series is I(d) but it is differenced more than d times, the resulting differenced series will contain r unit roots in its moving-average components and will therefore not be invertible (over-differencing).

(b) If r < 0, i.e., when the original time series is I(d) but it is differenced fewer than d times, the resulting differenced series will still contain −r > 0 unit roots and will therefore remain non-stationary (under-differencing).
Figure 8: Excessive Differencing of a I(1) Series

Figure 8 shows that the over-differenced integrated MA series tends to be spikier and
more volatile than the correctly differenced white noise series should be.

2 The Spurious Regression Problem

Sums of Stationary and Non-Stationary Series: Consider N time series, $y_{1,t} \sim I(d_1)$, $y_{2,t} \sim I(d_2)$, ..., $y_{N,t} \sim I(d_N)$. Then, unless special conditions occur, their weighted sum will be integrated with an order that is the maximum across all integration orders:
$$\sum_{i=1}^{N} w_i y_{i,t} \sim I(\max(d_1, d_2, ..., d_N))$$

(Heuristic) proof in case of three series:
Let $y_t \sim I(1)$, $x_t \sim I(1)$ and $\eta_t$ be three series such that $y_t$ and $x_t$ are independent. The regression of $y_t$ on $x_t$
$$y_t = a + b x_t + \eta_t$$
can be rewritten as
$$\eta_t = y_t - a - b x_t$$
Substituting $y_t$ and $x_t$ with their stochastic trend equations we obtain
$$\eta_t = \Big(y_0 + \mu_y t + \sum_{\tau=1}^{t} \epsilon^y_\tau\Big) - a - b\Big(x_0 + \mu_x t + \sum_{\tau=1}^{t} \epsilon^x_\tau\Big) = (y_0 - a - b x_0) + (\mu_y - b\mu_x)t + \sum_{\tau=1}^{t} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
⇒ $\eta_t$ is a unit root process, given that $\epsilon^x_\tau$ and $\epsilon^y_\tau$ are independent white noise errors so that their weighted sum is also a white noise. Isolating the most recent shock,
$$\eta_t = [(\mu_y - b\mu_x) + \epsilon^y_t - b\epsilon^x_t] + (y_0 - a - b x_0) + (\mu_y - b\mu_x)(t-1) + \sum_{\tau=1}^{t-1} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
$$= (\mu_y - b\mu_x) + \eta_{t-1} + (\epsilon^y_t - b\epsilon^x_t)$$
that is, another random walk with drift.
This follows from the result on Sums of Stationary and Non-Stationary Series stated above, in the case of a sum of two I(1) variables.

• ηt is the weighted sum minus a constant of the two I(1) variables. Therefore
ηt ∼ I(1), which is the highest integration order of the variables that we are
combining.

• Starting from $y_t = a + b x_t + \eta_t$, when the regressand and regressor are both I(1), it turns out that the regression errors must then also contain, unless special conditions occur, a unit root.
⇒ When $\eta_t$ is a random walk with drift, the assumptions of the classical regression model ($y_t$ and $x_t$ are stationary and the errors have zero mean and finite variance) are violated.

Spurious regression: a regression where

(a) the residuals are I(1) and as such any shock has a permanent effect on the regres-
sion, being equivalent to a permanent change of the intercept of the model

(b) R-square is high and t-statistics appear to be significant, but the results are void
of any economic meaning

(c) Standard OLS estimators are inconsistent and the associated inferential proce-
dures are invalid and statistically meaningless.

• Estimating spurious regressions and reporting and discussing their results is use-
less.

• Problems will also arise in regression analysis when the regressand and the regres-
sors are integrated of different orders. Regression equations using such variables
are meaningless.

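Example: if in
$$\eta_t = (y_0 - a - b x_0) + (\mu_y - b\mu_x)t + \sum_{\tau=1}^{t} (\epsilon^y_\tau - b\epsilon^x_\tau)$$
only $x_t$ is I(1), the resulting regression errors would be I(1)
$$\eta_t' = (\mu_y + \epsilon^y_t) - a - b\Big(x_0 + \mu_x t + \sum_{\tau=1}^{t} \epsilon^x_\tau\Big) = (\mu_y - a - b x_0) - b\mu_x t - b\sum_{\tau=1}^{t} \epsilon^x_\tau + \epsilon^y_t$$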
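The spurious regression problem described in this section is easy to reproduce by simulation. The sketch below regresses one driftless random walk on another, independent one, and counts how often the conventional t-test on the slope rejects b = 0 at the nominal 5% level. The sample size, the number of replications, the Gaussian shocks, and the helper names are all assumptions made for illustration.

```python
import math
import random

def random_walk(T, rng):
    """Driftless random walk of length T started at zero."""
    y, path = 0.0, []
    for _ in range(T):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    return path

def ols_slope_stats(y, x):
    """OLS of y on (1, x): returns (b_hat, conventional t-statistic, R^2)."""
    n = len(y)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((v - xb) ** 2 for v in x)
    sxy = sum((v - xb) * (w - yb) for v, w in zip(x, y))
    syy = sum((w - yb) ** 2 for w in y)
    b = sxy / sxx
    ssr = syy - b * sxy                      # residual sum of squares
    se = math.sqrt(ssr / (n - 2) / sxx)      # textbook standard error
    return b, b / se, 1.0 - ssr / syy

rng = random.Random(123)
T, n_sims = 200, 300
rejections, r2_sum = 0, 0.0
for _ in range(n_sims):
    y = random_walk(T, rng)
    x = random_walk(T, rng)                  # independent of y by construction
    _, t_stat, r2 = ols_slope_stats(y, x)
    rejections += abs(t_stat) > 1.96
    r2_sum += r2

reject_rate = rejections / n_sims
print("share of |t| > 1.96:", reject_rate, "(nominal size 0.05)")
print("average R^2:", r2_sum / n_sims)
```

Although the two series are unrelated, the rejection frequency is far above 5%, which illustrates why the usual t-statistics are not meaningful when the errors contain a unit root.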

3 Unit Root Tests

• Because the estimated AR(1) coefficient equals Corr[yt , yt−1 ], this means that any
attempt to recover a unit root from generated RW data will fail because the OLS
estimate of the unit root coefficient will contain a downward bias.
⇒ Dickey and Fuller offer a procedure to formally test for the presence of a unit
root that takes this bias into account.

3.1 Classical Dickey-Fuller Tests

Dickey and Fuller procedure: find critical values using a Monte Carlo design

(a) Generate a large number of random walk sequences and for each of them calculate
the estimated value of the AR(1) coefficient

(b) By the law of large numbers, the larger the number of simulation trials, the more precise the resulting distribution of the statistic

(c) Obtain critical values, which are generated by assuming a random walk.
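Steps (a) to (c) can be sketched as a small Monte Carlo experiment. The version below simulates the t-statistic of α in the regression without a constant, $\Delta y_{t+1} = \alpha y_t + \epsilon_{t+1}$, rather than the AR(1) coefficient itself, since the t-statistic is what is compared with critical values. Sample size, number of trials, Gaussian shocks, and function names are assumptions.

```python
import math
import random

def df_t_stat(y):
    """t-statistic of alpha in Delta y_{t+1} = alpha * y_t + eps_{t+1} (no constant)."""
    lagged = y[:-1]
    delta = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    sxx = sum(v * v for v in lagged)
    alpha = sum(v * d for v, d in zip(lagged, delta)) / sxx
    ssr = sum((d - alpha * v) ** 2 for v, d in zip(lagged, delta))
    se = math.sqrt(ssr / (len(delta) - 1) / sxx)
    return alpha / se

def simulate_df_critical_value(T, n_sims, level, seed=0):
    """Empirical `level` quantile of the DF t-statistic under a driftless random walk."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        y = [0.0]
        for _ in range(T):
            y.append(y[-1] + rng.gauss(0.0, 1.0))  # data generated under the null
        stats.append(df_t_stat(y))
    stats.sort()
    return stats[int(level * n_sims)]

cv_5 = simulate_df_critical_value(T=100, n_sims=3000, level=0.05)
print("simulated 5% critical value (no constant):", cv_5)
print("standard normal 5% one-sided value: -1.645")
```

The simulated quantile lies to the left of the standard normal value, which is why using the usual t tables would reject the unit root null too often.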

Dickey-Fuller test: test for the presence of a unit root. From the AR(1) model equation specified as

$$y_{t+1} - y_t \equiv \Delta y_{t+1} = (\phi - 1) y_t + \epsilon_{t+1}$$

where (φ − 1) = α, or as

$$\Delta y_{t+1} = \alpha y_t + \epsilon_{t+1}$$ (no constant)

$$\Delta y_{t+1} = \mu + \delta t + \alpha y_t + \epsilon_{t+1}$$ (with a deterministic time trend)

From one of these, obtain the OLS estimate of α and the corresponding t-statistic, and compare it with the critical value found by Dickey and Fuller in a one-sided t-test.

• It would be incorrect to simply use the standard Student's t distribution because standard inference is invalid when the assumed stochastic process is non-stationary.

• The t-test used takes into account that, under the null of a random walk, its distribution is nonstandard and cannot be analytically evaluated.

• The limiting distribution of $\hat\alpha$ is not affected by the removal of any deterministic seasonal components through dummy variables inserted in $\Delta y_{t+1} = (\phi - 1) y_t + \epsilon_{t+1}$ and $\Delta y_{t+1} = \mu + \delta t + \alpha y_t + \epsilon_{t+1}$, as in
$$\Delta y_{t+1} = \mu + \delta t + \sum_{s=1}^{S-1} \lambda_s D_s + \alpha y_t + \epsilon_{t+1}$$
where $D_s$ are standard seasonal dummies, S − 1 in number.

• $\hat\alpha$ has a non-standard (finite sample) distribution.

• The critical values need to be approximated by simulation, using null models that
are adjusted to either exclude the drift or include a time trend. The decision to
include a time trend depends also on economic theory or financial models.

• The ADF critical values depend on which deterministic regressors are included,
if any: the confidence intervals around α = 0 dramatically expand if a drift and
a time trend are included in the model.

• When the series is stationary the distribution of the t-statistic does not depend
on the presence of the other regressors.

3.2 The Augmented Dickey-Fuller Test

• The DF test is limited by the fact that, given the null hypothesis of a random
walk, the alternative hypothesis is specified as an AR(1) stationary process. How-
ever, not all stationary time-series variables can be well represented by an AR(1)
process.

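Augmented Dickey-Fuller (ADF) test: for general AR(p) processes
Consider the AR(p) process
$$y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + ... + \phi_{p-1} y_{t-p+2} + \phi_p y_{t-p+1} + \epsilon_{t+1}$$
Add and subtract $\phi_p y_{t-p+2}$
$$y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + ... + (\phi_{p-1} + \phi_p) y_{t-p+2} - \phi_p \Delta y_{t-p+2} + \epsilon_{t+1}$$
Add and subtract $(\phi_{p-1} + \phi_p) y_{t-p+3}$
$$y_{t+1} = \phi_0 + \phi_1 y_t + ... + (\phi_{p-2} + \phi_{p-1} + \phi_p) y_{t-p+3} - (\phi_{p-1} + \phi_p) \Delta y_{t-p+3} - \phi_p \Delta y_{t-p+2} + \epsilon_{t+1}$$
Repeat this step for the remaining lags, subtract $y_t$ from both sides, and obtain
$$\Delta y_{t+1} = \phi_0 + \alpha y_t + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i+1} + \epsilon_{t+1}$$
where $\alpha \equiv -(1 - \sum_{i=1}^{p} \phi_i)$ and $\gamma_i \equiv -\sum_{j=i+1}^{p} \phi_j$ (so that $\gamma_p = 0$ and only p − 1 lagged differences actually enter).
Alternatively, the regression can be specified as
$$\Delta y_{t+1} = \alpha y_t + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i+1} + \epsilon_{t+1}$$
$$\Delta y_{t+1} = \phi_0 + \sum_{j=1}^{Q} \delta_j t^j + \alpha y_t + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i+1} + \epsilon_{t+1}$$

If α = 0 the equation is entirely in first differences and so it has a unit root, while α < 0 is an indication of stationarity.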
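The mapping from AR(p) coefficients to the ADF parameters can be written as a short helper. The sketch assumes the reading $\alpha = \sum_i \phi_i - 1$ and $\gamma_i = -\sum_{j=i+1}^{p} \phi_j$ for i = 1, ..., p − 1; the function name and the two AR(2) coefficient sets are illustrative choices.

```python
def ar_to_adf(phi):
    """Map AR(p) coefficients [phi_1, ..., phi_p] to (alpha, [gamma_1, ..., gamma_{p-1}]).

    alpha   = phi_1 + ... + phi_p - 1
    gamma_i = -(phi_{i+1} + ... + phi_p), the coefficient on Delta y_{t-i+1}
    """
    alpha = sum(phi) - 1.0
    gammas = [-sum(phi[i:]) for i in range(1, len(phi))]
    return alpha, gammas

# Illustrative AR(2) with coefficients summing to one: alpha is zero (unit root).
alpha_ur, gammas_ur = ar_to_adf([1.1, -0.1])
# Illustrative AR(2) with coefficients summing to 0.95: alpha is negative.
alpha_st, gammas_st = ar_to_adf([1.1, -0.15])

print("unit-root case:  alpha =", alpha_ur, "gammas =", gammas_ur)
print("stationary case: alpha =", alpha_st, "gammas =", gammas_st)
```

For an AR(1) the list of gammas is empty, so the helper returns only α = φ − 1, the classical DF parameter.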

• The ADF test implements a parametric correction for higher-order correlation.

• When p = 1, the ADF test boils down to a classical DF test because no lagged differences enter the regression.

• The appropriate statistic to use depends on the deterministic components included in the regression equation.

• α and its standard error cannot be properly estimated unless all of the autoregres-
sive terms are included in the estimating equation, through correct specification
of p.

• If p is not correctly specified:

– Too few lags: the regression residuals do not behave like a white noise process ⇒ the model does not appropriately capture the actual error process ⇒ α and its standard error are not correctly estimated.

– Too many lags: the power of the test to reject the null of a unit root is reduced, since the increased number of lags necessitates the estimation of additional parameters and a loss of degrees of freedom.

• Selecting p:

– Start with a relatively long lag length and pare down the model by the usual t- and/or F-tests.

– In regressions containing a mixture of I(1) and I(0) variables, t- and F-tests applied to the coefficients of the stationary variables will be (asymptotically) valid only when the residuals are white noise ⇒ use information criteria.

• The most general of the alternative regressions is not necessarily the best choice, since the presence of the additional estimated parameters may reduce the degrees of freedom and the power of the test.

• The first regression assumes an AR(p) process, but in reality the DGP may con-
tain both autoregressive and moving average components. However, because an
invertible MA model can be transformed into an AR model, the procedure can
be generalized to allow for moving average components.

• ADF tests suffer from low power because they cannot distinguish between a series with a characteristic root that is close to unity and a true unit root process, or between a trend stationary and a stochastic trend process. This is because in finite samples a trend stationary process can be arbitrarily well approximated by a unit root process, and a unit root process can be arbitrarily well approximated by a trend stationary process.

Figure 9: Four Simulated Series Based on Identical Shocks

The plot on the left compares 180 generated values from the two models:

$$y_{t+1} = 1.1 y_t - 0.1 y_{t-1} + \epsilon_{t+1}$$ (Nonstationary AR(2))

$$z_{t+1} = 1.1 z_t - 0.15 z_{t-1} + \epsilon_{t+1}$$ (Stationary AR(2))

both initialized at $y_0 = y_1 = 0$, in which the errors are Gaussian IID and kept identical across simulated values.
The rightmost plot of Figure 9 shows that it can be difficult to distinguish between a trend stationary and a random walk with drift process.

Quasi-differenced series: a series that depends on the choice of a parameter value a representing the specific point alternative against which we wish to test $H_0$: $\Delta_a y_t = y_t - a y_{t-1}$ for t = 2, ..., T ($\Delta_a y_1 = y_1$).

Alternative ADF test (ERS): Estimate an OLS regression of the quasi-differenced data on the quasi-differenced deterministic regressors (the GLS de-trending step)
$$\Delta_a y_{t+1} = \Delta_a x_{t+1}' \delta(a) + \nu_{t+1}$$
where $\Delta_a t = t - a(t-1)$, $\Delta_a t^2 = t^2 - a(t-1)^2$ and so on.
Apply the standard ADF test to the (GLS) de-trended data,
$$\hat y_{t+1}(\tilde a) = y_{t+1} - x_{t+1}' \hat\delta(\tilde a)$$
$$\Delta \hat y_{t+1}(\tilde a) = \alpha \hat y_t(\tilde a) + \sum_{i=1}^{p} \gamma_i \Delta \hat y_{t+1-i}(\tilde a) + \epsilon_{t+1}$$
where the recommended $\tilde a$ is equal to 1 − (7/T) when there is only an intercept, and 1 − (13.5/T) when there is also a linear time trend.

• Instead of modelling a constant and/or time trends in the test regression, the data
are de-trended so that the deterministic explanatory variables are “taken out” of
the data prior to estimating the test regression.
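The quasi-differencing step is simple enough to write out directly. The sketch below assumes the convention that the first observation is kept as is and that every later observation has a times its predecessor subtracted; `quasi_difference` and `ers_a` are hypothetical helper names, and the regression and ADF steps are not implemented here.

```python
def quasi_difference(y, a):
    """Return [y_1, y_2 - a*y_1, ..., y_T - a*y_{T-1}]."""
    return [y[0]] + [y[t] - a * y[t - 1] for t in range(1, len(y))]

def ers_a(T, with_trend=False):
    """Recommended point alternative: 1 - 7/T (intercept only) or 1 - 13.5/T (linear trend)."""
    return 1.0 - (13.5 if with_trend else 7.0) / T

T = 100
a = ers_a(T)                                  # intercept-only case
y = [float(t) for t in range(1, T + 1)]       # illustrative series: a pure linear trend
qd_y = quasi_difference(y, a)
qd_const = quasi_difference([1.0] * T, a)     # quasi-differenced intercept regressor

print("a =", a)
print("first quasi-differenced values:", qd_y[:3])
print("quasi-differenced constant:", qd_const[:3])
```

With a = 1 the transformation is ordinary first differencing (apart from the first observation) and with a = 0 it leaves the series unchanged, so a close to one sits between the two.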

3.3 Other Unit Root Tests

(a) Phillips and Perron (PP) non-parametric method of controlling for serial correlation when testing for a unit root: estimates the classical DF test equations and modifies the t-ratio of the coefficient α so that serial correlation in the residuals does not affect the asymptotic distribution of the test statistic.
$$t_\alpha^{PP} = t_\alpha^{DF} \varsigma - \psi = \frac{\hat\alpha}{se(\hat\alpha)} \sqrt{\frac{\frac{T-m}{T} \frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2}{\hat f_0}} - \frac{se(\hat\alpha)\, T \Big(\hat f_0 - \frac{T-m}{T} \frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2\Big)}{2 \sqrt{\hat f_0 \Big(\frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2\Big)}}$$
with
$$\hat f_0 \equiv \sum_{i=-[T^{2/9}]}^{[T^{2/9}]} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat\epsilon_t \hat\epsilon_{t-i}$$
where m is the number of deterministic regressors included and $se(\hat\alpha)$ is the OLS standard error of the estimated coefficient.

• Because ς, ψ > 0, even when ς > 1, it is possible for $t_\alpha^{PP} < t_\alpha^{DF}$, so that while a DF test would not reject the null of a unit root, the PP test will.

• Instead of “whitening” the test regression residuals by fitting some AR(p), PP propose to directly adjust the test statistic in a “HAC way”.

• The asymptotic distribution of the PP modified t-ratio is the same as that of the ADF statistic and the test is also performed under the null of a unit root.

(b) Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test: based on the residuals from the OLS regression of the series on the exogenous, deterministic factors

$$y_{t+1} = x_t' \delta + u_{t+1}$$

where $x_t$ may contain the intercept and a polynomial time trend and the series is assumed to be (trend-)stationary under the null.
$$KPSS_T \equiv \frac{\sum_{t=1}^{T} \Big(\sum_{i=1}^{t} \hat u_i\Big)^2}{T^2 \sum_{i=-[T^{2/9}]}^{[T^{2/9}]} \frac{1}{T-i} \sum_{t=i+1}^{T} \hat u_t \hat u_{t-i}}$$

where the sum that multiplies $T^2$ in the denominator is an estimator of the residual spectrum at frequency zero.

3.4 Testing for Unit Roots in Moving Average Processes

• Unit roots may characterize the MA(q) components of a time series process, which makes it non-invertible.

• Testing for a unit root in the MA polynomial is equivalent to testing that a time
series has been over-differenced.

• Testing for a unit root in the MA polynomial gives a chance to distinguish be-
tween trend-stationary and stochastically trending processes, because we know
that while the dth difference of an I(d) process is I(0), the dth difference of a
trend stationary process will contain d unit roots in the MA polynomial.

• Let $\{\epsilon_t\}$ be observations from $\epsilon_{t+1} = \theta \eta_t + \eta_{t+1}$ with $\eta_{t+1} \sim IID(0, \sigma_\eta^2)$. Under $H_0: \theta = -1$ the statistic $T(\hat\theta + 1)$ based on the MLE $\hat\theta$ converges to a distribution characterized by Davis and Dunsmuir.
A one-sided test with $H_0: \theta = -1$ vs. $H_1: \theta > -1$ can be based on this limiting result by rejecting the null when $\hat\theta > -1 + c_{size}/T$ where $c_{size}$ is the (1 − size)-quantile of the limit distribution of $T(\hat\theta + 1)$.
4 Cointegration and Error Correction Models

4.1 The Relationship Between Cointegration and Economic
Theory

• There are situations in which the choice to simultaneously difference all non-
stationary variables to transform them into a set of stationary variables may
imply a major loss of information, possibly invalid inference, and sub-optimal
predictive performance.

Cointegrated variables: N variables whose linear combination is stationary.

• If a linear relationship is already stationary, differencing it entails a misspecification error; thus differencing cointegrated variables to make each of them I(0) is a mistake.

Standard discounted dividend/earnings growth model:
$$P_{t+1} = \frac{(1+g) F_{t+1}}{(r-g)} + \epsilon_{t+1}$$
where $\epsilon_{t+1} \sim IID(0, \sigma^2)$, so that

$$P_{t+1} - \kappa F_{t+1} = \epsilon_{t+1}$$

with
$\kappa \equiv (1+g)/(r-g) > 0$
$P_{t+1}$ = (real) stock price
$F_{t+1}$ = some fundamental real quantity that investors discount when pricing stocks, e.g. cash dividends or earnings
r = fixed real discount rate that reflects the riskiness of stocks
g = constant real rate of growth of fundamentals
$\epsilon_{t+1}$ = mispricing of the stock index, that is, temporary deviations of market prices from the fair value justified by fundamentals.

• It is an example of equilibrium frameworks in which deviations from equilibrium must be temporary.

• Because the time series of mispricings, $\{\epsilon_{t+1}\}$, represents temporary deviations, we expect them to be stationary, or even to be white noise.

• If $\{\epsilon_{t+1}\}$ were an I(1) series, this would mean that pricing errors would never be corrected and on the contrary could diverge forever.

• If the time series $\{P_{t+1}\}$ and $\{F_{t+1}\}$ are I(1), a coefficient κ exists such that a weighted sum of real prices and fundamentals yields a stationary variable:
$$I(1) - \kappa I(1) \sim I(0)$$
which is an exception to the general result on Sums of Stationary and Non-Stationary Series (one of the "special conditions" mentioned there).

• It is necessary that the time paths of the two non-stationary variables $\{P_{t+1}\}$ and $\{F_{t+1}\}$ are closely linked.

4.2 Definition of Cointegration

Cointegrated System: The components of the N-dimensional random vector $y_t \equiv [y_{1t}\ y_{2t}\ ...\ y_{Nt}]'$ are said to be cointegrated of order d, b, denoted by $y_t \sim CI(d, b)$, if all components of $y_t$ are integrated of order d, and there exists a vector $\kappa \equiv [\kappa_1\ \kappa_2\ ...\ \kappa_N]'$ such that the linear combination $\kappa' y_t$ is integrated of order (d − b) where b > 0. The vector κ is called the cointegrating vector.

Example: Consider a set of economic variables in a long-run equilibrium when
$$\kappa_1 y_{1t} + \kappa_2 y_{2t} + ... + \kappa_N y_{Nt} = 0$$
or compactly, $\kappa' y_t = 0$. The deviation from the long-run equilibrium (equilibrium error) is a scalar random variable $\epsilon_t = \kappa' y_t$. If the equilibrium is meaningful, it must be the case that the equilibrium error process is stationary.

• In general, given a set of N integrated variables, these will not be cointegrated, implying that there is no long-run equilibrium among the variables, so that they can wander arbitrarily far from each other.

• It is quite possible that nonlinear long-run relationships exist among a set of integrated variables.

• The cointegrating vector is not unique: if $[\kappa_1\ \kappa_2\ ...\ \kappa_N]'$ is a cointegrating vector, then for any nonzero value of ξ, $[\xi\kappa_1\ \xi\kappa_2\ ...\ \xi\kappa_N]'$ is also a cointegrating vector.

• Typically, one of the variables is used to normalize the cointegrating vector by fixing its coefficient at unity, e.g. $[1\ (\kappa_2/\kappa_1)\ (\kappa_3/\kappa_1)\ ...\ (\kappa_N/\kappa_1)]'$.

• By definition, if two or more variables are integrated of different orders, they cannot be cointegrated.
Multiplicity of Cointegrating Vectors: If $y_t$ has N non-stationary components, there may be as many as (N − 1) linearly independent cointegrating vectors. The number of cointegrating vectors is called the cointegrating rank of $y_t$.

• If $y_t$ contains only two variables, there can be at most one independent cointegrating vector.

• If $[y_{1t}\ y_{2t}\ ...\ y_{Nt}]' \sim CI(d, b)$, this does not imply that a subset of the same N variables needs to be CI(d, b) as well. For instance,
$$\kappa_1 y_{1t} + \kappa_2 y_{2t} + ... + \kappa_N y_{Nt} = \epsilon_t \sim I(0) \Rightarrow \kappa_2 y_{2t} + ... + \kappa_N y_{Nt} = \epsilon_t - \kappa_1 y_{1t} \sim I(1)$$

• Cointegrated variables are such because they share common stochastic trends.
Example: dividend growth model where $\{P_t\}$ and $\{F_t\}$ are each constructed as a random walk plus a stationary noise process

$$P_t = RW_t^P + \epsilon_t^P \quad \text{and} \quad F_t = RW_t^F + \epsilon_t^F$$

If $P_t$ and $F_t$ are CI(1, 1) ⇒ there must be nonzero values of some coefficients $\kappa_1$ and $\kappa_2$ such that $\kappa_1 P_t + \kappa_2 F_t$ is stationary. But

$$\kappa_1 P_t + \kappa_2 F_t = \kappa_1 (RW_t^P + \epsilon_t^P) + \kappa_2 (RW_t^F + \epsilon_t^F) = (\kappa_1 RW_t^P + \kappa_2 RW_t^F) + (\kappa_1 \epsilon_t^P + \kappa_2 \epsilon_t^F)$$

therefore, for $\kappa_1 P_t + \kappa_2 F_t$ to be stationary, $\kappa_1 RW_t^P + \kappa_2 RW_t^F$ should be zero. However, $RW_t^P$ and $RW_t^F$ are random variables whose realized values change continually over time and cannot simply be set to zero. It follows that
$$\kappa_1 P_t + \kappa_2 F_t \text{ is stationary} \iff RW_t^P = -\frac{\kappa_2}{\kappa_1} RW_t^F$$
In the N-variable case:
$$y_t = RW_t + \epsilon_t$$
where $RW_t$ is an N × 1 vector of random walk components and $\epsilon_t$ is the corresponding N × 1 vector of white noise components.
If one trend can be expressed as a linear combination of the other trends in the system, it means that

$$\exists \kappa : \kappa' RW_t = 0 \Rightarrow \kappa' y_t = \kappa' RW_t + \kappa' \epsilon_t = \kappa' \epsilon_t \sim I(0)$$

Because the series in $y_t$ share one or more common stochastic trends, they cannot drift “too far apart”.
4.3 Error Correction Models

• The dynamics of the cointegrated variables are influenced by the magnitude of any departures from the long-run equilibrium: if the system is to return to the equilibrium, the movements of at least some of the variables must respond to the magnitude of the recorded disequilibrium.

Vector Error Correction Representation: If the N I(1) variables in $y_t$ have a vector error-correction (VECM) representation,
$$\Delta y_{t+1} = \pi_0 + \Pi y_t + \sum_{i=1}^{p} \Pi_i \Delta y_{t+1-i} + u_{t+1}$$
(where the error vectors $u_{t+1}$ can be serially correlated) then, because the vector equation will need to be balanced, $\Pi y_t \sim I(0)$ will imply that the variables in $y_t$ are CI(1, 1). Because the N × N matrix Π ≠ 0 contains only constants, each row of this matrix is a cointegrating vector of $y_t$ and rank(Π) is the cointegrating rank of $y_t$.

Example: dynamic error-correction model
$$\Delta P_{t+1} = \lambda_P (P_t - \kappa F_t) + u_t^P \qquad \lambda_P \leq 0$$
$$\Delta F_{t+1} = -\lambda_F (P_t - \kappa F_t) + u_t^F \qquad \lambda_F \leq 0$$
where $u_t^P$ and $u_t^F$ are white-noise disturbance terms which may be correlated, and $\lambda_P$ and $\lambda_F$ are estimable parameters.
The unique long-run equilibrium is attained when $P_t = \kappa F_t$.
Because we have assumed that prices and fundamentals are all I(1), $\Delta P_{t+1}$ and $\Delta F_{t+1}$ are I(0), and $u_t^P$ and $u_t^F$ are stationary. Furthermore, for the dynamic error-correction model to be sensible the right-hand side must be I(0).
⇒ $P_t - \kappa F_t$ must be stationary, so that prices and fundamentals must be cointegrated with the cointegrating vector $[1\ \ -\kappa]'$. This does not change if lagged changes of each variable are introduced in the model
$$\Delta P_{t+1} = \lambda_P (P_t - \kappa F_t) + \sum_{i=1}^{p} \phi_i^{PP} \Delta P_{t+1-i} + \sum_{i=1}^{p} \phi_i^{PF} \Delta F_{t+1-i} + u_t^P \qquad \lambda_P \leq 0$$
$$\Delta F_{t+1} = -\lambda_F (P_t - \kappa F_t) + \sum_{i=1}^{p} \phi_i^{FP} \Delta P_{t+1-i} + \sum_{i=1}^{p} \phi_i^{FF} \Delta F_{t+1-i} + u_t^F \qquad \lambda_F \leq 0$$
This is a bi-variate VAR(p) in first differences that contains error-correction terms, and in which $\lambda_P$ and $\lambda_F$ can be interpreted as speed of adjustment parameters. At least one of the speed of adjustment terms must be non-zero, otherwise the long-run equilibrium relationship would not appear and the model would not be one of error correction or cointegration.
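The role of the speed of adjustment parameters can be illustrated by simulating the simple error-correction model without lagged changes. In the sketch below the disequilibrium $P_t - \kappa F_t$ follows an AR(1) with coefficient $1 + \lambda_P + \kappa\lambda_F$, so it is stationary when that coefficient is below one in absolute value and a random walk when both speeds are zero. The parameter values, the independent Gaussian shocks, and the function names are assumptions for illustration.

```python
import random

def simulate_ecm(T, kappa, lam_p, lam_f, seed=0):
    """Simulate Delta P = lam_p*(P - kappa*F) + u_P and Delta F = -lam_f*(P - kappa*F) + u_F."""
    rng = random.Random(seed)
    P, F = [kappa * 10.0], [10.0]            # start on the equilibrium P = kappa * F
    for _ in range(T):
        gap = P[-1] - kappa * F[-1]          # previous period's disequilibrium
        P.append(P[-1] + lam_p * gap + rng.gauss(0.0, 1.0))
        F.append(F[-1] - lam_f * gap + rng.gauss(0.0, 1.0))
    return P, F

def gap_series(P, F, kappa):
    return [p - kappa * f for p, f in zip(P, F)]

def ar1_coefficient(x):
    """OLS slope of x_t on x_{t-1} without a constant."""
    return sum(a * b for a, b in zip(x[1:], x[:-1])) / sum(b * b for b in x[:-1])

kappa, T = 2.0, 2000
P_ec, F_ec = simulate_ecm(T, kappa, lam_p=-0.2, lam_f=-0.1)   # error correction active
P_no, F_no = simulate_ecm(T, kappa, lam_p=0.0, lam_f=0.0)     # no error correction

gap_ec = gap_series(P_ec, F_ec, kappa)
gap_no = gap_series(P_no, F_no, kappa)
max_abs_ec = max(abs(g) for g in gap_ec)
max_abs_no = max(abs(g) for g in gap_no)
rho_ec = ar1_coefficient(gap_ec)

print("largest |P - kappa*F| with error correction:", max_abs_ec)
print("largest |P - kappa*F| without error correction:", max_abs_no)
print("estimated AR(1) coefficient of the gap:", rho_ec, "vs implied", 1 - 0.2 + kappa * (-0.1))
```

With both speeds set to zero the gap wanders like a random walk, which is the case in which the model is no longer one of error correction or cointegration.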

• The only case in which a VECM fails to imply a cointegrating relationship, is
when Π = 0 ⇒ the VECM boils down to a traditional reduced-form VAR(p) in
first differences and yt+1 will not respond to previous period’s deviation from the
long-run equilibrium.

• When Π ≠ 0 ⇒ $y_{t+1}$ will respond to the previous period’s deviations from the long-run equilibrium ⇒ estimating the VECM as a VAR(p) in first differences is inappropriate ⇒ the omission of $\Pi y_t$ leads to misspecification.

• In a model in which the error correction terms have been unduly dropped, the
VECM has no long-run solution, thus it has nothing to say about whether the
variables have an equilibrium relationship.

• It is preferable to use first differences if the $N$ I(1) variables are not cointegrated because

– If cointegration is incorrectly assumed, tests lose power because of the estimation of $N^2$ additional parameters.

– For a VAR in levels, tests for Granger causality conducted on the I(1) variables do not have a standard F-distribution.

– The impulse responses at long horizons would also be inconsistent estimates of the true responses.

Granger’s Representation Theorem: For any set of N I(d) variables with identical
integration order, error correction and cointegration are equivalent representations.

Example: VAR(1) model when financial markets are efficient

$$P_{t+1} = a_{11} P_t + a_{12} F_t + \epsilon^P_{t+1} \quad \text{and} \quad F_{t+1} = a_{20} + a_{22} F_t + \epsilon^F_{t+1}$$

where $a_{12} \neq 0$, $|a_{11}| < 1$, and $\epsilon^P_{t+1}$ and $\epsilon^F_{t+1}$ are possibly correlated white noise processes.
We drop the intercept from the price equation to prevent the presence of a drift that would cause arbitrage opportunities.
If $a_{22} = 1$, the equation for fundamentals is a random walk with drift, and subtracting the lagged level from both sides of each equation gives
$$\Delta P_{t+1} = (a_{11} - 1) P_t + a_{12} F_t + \epsilon^P_{t+1} = -a_{12}\left[\frac{1 - a_{11}}{a_{12}} P_t - F_t\right] + \epsilon^P_{t+1} \quad \text{and} \quad \Delta F_{t+1} = a_{20} + \epsilon^F_{t+1}$$
which shows that the first equation is balanced, in the sense that both its left- and right-hand sides are I(0) ⇐⇒ $[((1 - a_{11})/a_{12}) P_t - F_t] \sim I(0)$, independently of the value taken by $a_{12}$. Impose two additional restrictions on the VAR to imply that prices and fundamentals are CI(1, 1):

(a) $a_{12} > 0$ ⇒ when $((1 - a_{11})/a_{12}) P_t < F_t$, so that the price is too low, then $-a_{12}[((1 - a_{11})/a_{12}) P_t - F_t] > 0$ ⇒ valid error-correction dynamics

(b) $a_{11} < 1$ ⇒ the cointegrating vector is $[(1 - a_{11})/a_{12} \;\; -1]'$ ⇒ the model reaches its long-run equilibrium when $F_t = ((1 - a_{11})/a_{12}) P_t$, or $F_t / P_t = (1 - a_{11})/a_{12}$.

Therefore, all VECMs can be interpreted as special, restricted VAR models.
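
As a purely illustrative numerical check of this mapping (the parameter values are assumptions, not estimates), set $a_{11} = 0.8$, $a_{12} = 0.1$ and $a_{22} = 1$ in the VAR(1). The price equation in first differences then reads

$$\Delta P_{t+1} = (0.8 - 1) P_t + 0.1 F_t + \epsilon^P_{t+1} = -0.1\,[\,2 P_t - F_t\,] + \epsilon^P_{t+1}$$

so the long-run equilibrium is $F_t = 2 P_t$, the cointegrating vector is $[2 \;\; -1]'$, and the adjustment coefficient is $-a_{12} = -0.1$. For instance, with $P_t = 10$ and $F_t = 25$ the price is too low relative to fundamentals, and the expected price change is $-0.1\,(20 - 25) = +0.5$, which moves the price toward equilibrium.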


Given that

$$P_{t+1} = a_{11} P_t + a_{12} F_t + \epsilon^P_{t+1} \quad \text{and} \quad F_{t+1} = a_{20} + a_{22} F_t + \epsilon^F_{t+1}$$

when $a_{11} = a_{22} = 1$


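$$P_{t+1} = (P_0 + a_{12} F_0) + a_{12} \sum_{\tau=1}^{t} F_\tau + \sum_{\tau=1}^{t+1} \epsilon^P_\tau$$

$$F_{t+1} = F_0 + a_{20}(t + 1) + \sum_{\tau=1}^{t+1} \epsilon^F_\tau$$

Each $F_\tau = F_0 + a_{20}\tau + \sum_{s=1}^{\tau} \epsilon^F_s$ is itself I(1), so the price equation contains the cumulated sum of an I(1) process
⇒ prices are I(2) and fundamentals are I(1), so they cannot be cointegrated.
When $|a_{22}| > 1$ and/or $|a_{11}| > 1$ ⇒ both variables are non-stationary and explosive, and therefore no cointegration exists.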
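
A small sketch can make the difference in integration orders concrete. The helper `simulate_var1` and all parameter values below are illustrative assumptions; the shocks are switched off so that the orders of integration show up as the number of differences needed to obtain a constant series.

```python
import numpy as np

def simulate_var1(a11, a12, a20, a22, T, p0=1.0, f0=1.0, sigma=0.0, seed=0):
    """Iterate P_{t+1} = a11*P_t + a12*F_t + eps^P_{t+1}
               F_{t+1} = a20 + a22*F_t + eps^F_{t+1}.
    Starting values and sigma are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    eps = sigma * rng.standard_normal((T, 2))
    P = np.empty(T + 1)
    F = np.empty(T + 1)
    P[0], F[0] = p0, f0
    for t in range(T):
        P[t + 1] = a11 * P[t] + a12 * F[t] + eps[t, 0]
        F[t + 1] = a20 + a22 * F[t] + eps[t, 1]
    return P, F

a12, a20 = 0.5, 0.2
P, F = simulate_var1(a11=1.0, a12=a12, a20=a20, a22=1.0, T=20)
d1_F = np.diff(F)        # constant: one difference is enough for fundamentals
d1_P = np.diff(P)        # still trending: one difference is not enough for prices
d2_P = np.diff(P, n=2)   # constant: prices must be differenced twice
print(d1_F[:3], d1_P[:3], d2_P[:3])
```

Re-running the sketch with `sigma > 0` adds the stochastic components, in which case the same differencing pattern applies to the stochastic trends rather than to deterministic ones.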

4.4 Testing for Cointegration

Tests for cointegration:

(a) Univariate, regression-based tests (Engle and Granger's methodology): determine whether the residuals of an estimated equilibrium relationship are stationary.

Example: dividend/earnings growth model.

• Assume that {Pt+1 } and {Ft+1 } are both I(1)

• Estimate the long-run equilibrium relationship

Pt = κ0 + κ1 Ft + et

• Cointegrated variables ⇒ OLS regression yields a super-consistent estimator of the cointegrating parameters $\kappa_0$ and $\kappa_1$ ⇒ the OLS estimator converges faster than in OLS models using stationary variables.

• Determine whether prices and fundamentals are actually cointegrated ⇒ check whether the time series of estimated deviations from the long-run relationship, $\hat e_t = P_t - \hat\kappa_0 - \hat\kappa_1 F_t$, is stationary ⇒ apply a unit root test to these residuals: failing to reject the null of a unit root in the residuals of the Engle-Granger regression implies that we cannot reject the null hypothesis that prices and fundamentals are not cointegrated.

• Standard ADF critical values in Engle-Granger tests will contain a bias toward finding a stationary error process
⇒ adjusted critical values
⇒ tests on the normalized autocorrelation coefficient $\hat z_\alpha \equiv T\hat\alpha \,/\, (1 - \sum_{i=2}^{p} \hat\gamma_i)$
⇒ Durbin-Watson (DW) test statistic (Cointegrating Regression Durbin-Watson (CRDW) test): under $H_0$ of a unit root in the errors, CRDW will be close to zero, so $H_0$ is rejected if the CRDW statistic is larger than the relevant critical value
⇒ PP test (Phillips-Ouliaris' test): the residuals $\{\hat\epsilon_t\}$ under $p = 0$ are used to compute estimates of the long-run variance ($\hat V_0$) to perform the adjustment to the estimated autocorrelation coefficient given by $\hat\alpha^{PP} = \hat\alpha - T \hat V_0 \left(\sum_{t=2}^{T} \hat e_t^2\right)^{-1/2}$, so that $\hat z_\alpha^{PP} = T \hat\alpha^{PP}$.

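• When the null of no cointegration is rejected, estimate (usually by OLS) the VECM. In this case, it is a VAR(p) in which the error correction terms directly use the stationary estimated residuals $\hat e_t$ from the long-run regression:
$$\Delta P_{t+1} = \lambda_P \hat e_t + \sum_{i=1}^{p} a^P_{1i} \Delta P_{t+1-i} + \sum_{i=1}^{p} a^P_{2i} \Delta F_{t+1-i} + \epsilon^P_{t+1}$$

$$\Delta F_{t+1} = \lambda_F \hat e_t + a^F_0 + \sum_{i=1}^{p} a^F_{1i} \Delta P_{t+1-i} + \sum_{i=1}^{p} a^F_{2i} \Delta F_{t+1-i} + \epsilon^F_{t+1}$$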

• The estimation of the long-run equilibrium regression requires that the re-
searcher places one variable on the left-hand side and uses the others as
regressors.

• Relies on a two-step estimator, in which the first-step regression residuals are used in the second step to estimate an ADF (or PP)-type regression, which causes errors and contamination deriving from a generated-regressors problem.

• No systematic procedure to perform the separate estimation of multiple cointegrating vectors.
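
The two steps of the Engle-Granger procedure can be sketched as follows. This is a minimal illustration on simulated data, not a full implementation: the function names, the data-generating values and the choice of a Dickey-Fuller regression without lagged differences are assumptions, and the resulting statistic must be compared with adjusted (Engle-Granger type) critical values, which are not included here.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def engle_granger(P, F):
    """Step 1: long-run regression P_t = k0 + k1 * F_t + e_t.
    Step 2: Dickey-Fuller regression (no lagged differences) on the residuals."""
    X = np.column_stack([np.ones_like(F), F])
    kappa, e = ols(X, P)
    de, e_lag = np.diff(e), e[:-1]
    alpha = (e_lag @ de) / (e_lag @ e_lag)      # d e_t = alpha * e_{t-1} + error
    resid = de - alpha * e_lag
    s2 = (resid @ resid) / (len(de) - 1)        # residual variance
    t_stat = alpha / np.sqrt(s2 / (e_lag @ e_lag))
    crdw = (de @ de) / (e @ e)                  # CRDW statistic of the residuals
    return kappa, t_stat, crdw

# Simulated example (assumed values): F is a random walk, P is tied to F.
rng = np.random.default_rng(1)
T = 500
F = np.cumsum(rng.standard_normal(T))           # I(1) fundamentals
P = 1.0 + 2.0 * F + rng.standard_normal(T)      # cointegrated by construction
kappa, t_stat, crdw = engle_granger(P, F)
print(kappa, t_stat, crdw)
```

Because the simulated deviations are white noise, the test statistic is strongly negative and the CRDW statistic is far from zero; with non-cointegrated series both would move toward the values expected under a unit root in the residuals.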

(b) Multivariate, VECM-based tests: Take the potential existence of multiple cointegrating vectors into account by using single-step full information maximum likelihood estimation (multivariate generalization of the Dickey-Fuller tests).

Example: reduced form VAR(p) model


$$y_{t+1} = \sum_{i=1}^{p} A_i y_{t+1-i} + \epsilon_{t+1}, \qquad \epsilon_{t+1} \sim \text{IID}(0, \Sigma_\epsilon)$$

• Add and subtract $A_p y_{t-p+2}$ on the right-hand side

$$y_{t+1} = \sum_{i=1}^{p-2} A_i y_{t+1-i} + (A_{p-1} + A_p) y_{t-p+2} - A_p \Delta y_{t-p+2} + \epsilon_{t+1}$$

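• Add and subtract $(A_{p-1} + A_p) y_{t-p+3}$ and so on, obtaining

$$\Delta y_{t+1} = \Pi y_t + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t+1-i} + \epsilon_{t+1}, \qquad \Pi \equiv -\Big(I_N - \sum_{i=1}^{p} A_i\Big), \qquad \Gamma_i \equiv -\sum_{j=i+1}^{p} A_j$$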

• For a set of $N$ variables $y_{t+1}$ that can be represented as in the equation for $\Delta y_{t+1}$ above, the rank of $\Pi$ equals the number of cointegrating vectors, $r$. If $\Pi$ consists of all zeroes, so that $\mathrm{rank}(\Pi) = 0$, then all the variables in the vector $y_{t+1}$ contain a unit root and there is no cointegrating relationship. If $\mathrm{rank}(\Pi) = N$, $\Delta y_{t+1}$ represents a convergent system of difference equations, so that all variables are stationary. If $N > \mathrm{rank}(\Pi) > 0$, then $\Pi y_t$ is the error correction term such that
$$0 = E[\Delta y_{t+1}] = \Pi y_t + \sum_{i=1}^{p-1} \Gamma_i E[\Delta y_{t+1-i}] + E[\epsilon_{t+1}] \Rightarrow \Pi y_t = 0$$
and $\Pi = \Lambda K'$, where $K$ is the $N \times r$ matrix of cointegrating vectors and $\Lambda$ is the $N \times r$ matrix of weights with which each cointegrating vector enters the $N$ equations of the VAR. $\Lambda$ can also be interpreted as containing $r$ different $N \times 1$ vectors of adjustment coefficients.
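
The mapping from the VAR(p) matrices to $\Pi$ and $\Gamma_i$ can be checked numerically. The sketch below uses a hypothetical helper `var_to_vecm` and assumed coefficient matrices for a bivariate VAR(2); it verifies that the VECM form reproduces the VAR and reports the rank of $\Pi$.

```python
import numpy as np

def var_to_vecm(A):
    """Map VAR(p) matrices [A_1, ..., A_p] to (Pi, [Gamma_1, ..., Gamma_{p-1}])
    with Pi = -(I - sum_i A_i) and Gamma_i = -sum_{j > i} A_j."""
    A = [np.asarray(a, dtype=float) for a in A]
    N = A[0].shape[0]
    Pi = -(np.eye(N) - sum(A))
    Gammas = [-sum(A[i:]) for i in range(1, len(A))]
    return Pi, Gammas

# Assumed coefficients: A_1 + A_2 has the unit root in the second variable.
A1 = np.array([[0.7, 0.1], [0.0, 0.8]])
A2 = np.array([[0.1, 0.0], [0.0, 0.2]])
Pi, Gammas = var_to_vecm([A1, A2])

# Check the identity on arbitrary values of y_t and y_{t-1} (shocks set to zero).
y_t, y_tm1 = np.array([1.0, 2.0]), np.array([0.5, 1.5])
lhs = (A1 @ y_t + A2 @ y_tm1) - y_t               # Delta y_{t+1} from the VAR
rhs = Pi @ y_t + Gammas[0] @ (y_t - y_tm1)        # Delta y_{t+1} from the VECM
rank = np.linalg.matrix_rank(Pi)
print(Pi, rank, np.allclose(lhs, rhs))
```

In this example the second row of $\Pi$ is zero, so the rank is one and only the first equation responds to the deviation from the long-run relationship.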

(c) Estimate the matrix Π from the unrestricted VAR for N non-stationary series

(d) Test whether we can reject the restrictions implied by the reduced rank of Π:
test for the number of eigenvalues that are insignificantly different from unity
using
$H_0$: number of distinct cointegrating vectors $\le r$
$H_1$: number of distinct cointegrating vectors $> r$

$$\lambda_{trace}(r) \equiv -T \sum_{i=r+1}^{N} \ln(1 - \hat\lambda_i)$$

$H_0$: the number of cointegrating vectors $= r$
$H_1$: the number of cointegrating vectors $= r + 1$

$$\lambda_{max}(r, r+1) \equiv -T \ln(1 - \hat\lambda_{r+1})$$
where $\hat\lambda_1 > \hat\lambda_2 > \dots > \hat\lambda_N$ are the estimated eigenvalues obtained from $\hat\Pi$. Because the distribution of the statistics is non-standard, their critical values are obtained using a Monte Carlo approach. The critical values depend on the value of $N - r$, the number of non-stationary components, and on whether constants are included in each of the equations.

• The two statistics have different functional forms and may therefore give different results. $\lambda_{max}$ is preferred when trying to pin down the number of cointegrating vectors.

• The VECM cannot be estimated by OLS because it is necessary to impose cross-equation restrictions on the $\Pi$ matrix ⇒ ML estimation methods.
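
Given a set of estimated eigenvalues, the trace and maximum-eigenvalue statistics are simple to compute. The sketch below assumes hypothetical eigenvalues and sample size; the function name `johansen_statistics` is illustrative, and the non-standard critical values needed to complete the test are not included.

```python
import numpy as np

def johansen_statistics(eigenvalues, T, r):
    """Return (lambda_trace(r), lambda_max(r, r+1)) for estimated eigenvalues."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    if not 0 <= r < lam.size:
        raise ValueError("r must satisfy 0 <= r < N")
    trace = -T * np.sum(np.log(1.0 - lam[r:]))   # uses lambda_{r+1}, ..., lambda_N
    lmax = -T * np.log(1.0 - lam[r])             # uses lambda_{r+1} only
    return trace, lmax

# Hypothetical eigenvalues for N = 3 variables and T = 100 observations.
eig, T = [0.40, 0.10, 0.01], 100
for r in range(len(eig)):
    trace, lmax = johansen_statistics(eig, T, r)
    print(f"r = {r}: trace = {trace:.3f}, max = {lmax:.3f}")
```

The trace statistic for a given $r$ is the sum of the maximum-eigenvalue statistics from $r$ onward, which is why the two tests coincide for $r = N - 1$ and can disagree for smaller $r$.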
