Analysis of Multiple Time Series
Note: The primary references for these notes are chapters 5 and 6 in Enders (2004). An alternative,
but more technical treatment can be found in chapters 10-11 and 18-19 in Hamilton (1994).
5.1 Vector Autoregressions

5.1.1 Definition
The definition of a vector autoregression is nearly identical to that of a univariate autoregression.
Definition 5.1 (Vector Autoregression of Order P). A Pth order vector autoregression, written VAR(P), is a process that evolves according to

yt = Φ0 + Φ1 yt−1 + Φ2 yt−2 + . . . + ΦP yt−P + εt    (5.1)

where yt is a k by 1 vector stochastic process, Φ0 is a k by 1 vector of intercepts, Φj, j = 1, . . . , P are k by k parameter matrices and εt is a vector white noise process.
Simply replacing the vectors and matrices with scalars will produce the definition of an AR(P). A
vector white noise process has the same useful properties as a univariate white noise process; it
is mean zero, has finite covariance and is uncorrelated with its past although the elements of a
vector white noise process are not required to be contemporaneously uncorrelated.
Definition 5.2 (Vector White Noise Process). A k by 1 vector valued stochastic process, {εt} is said to be a vector white noise if

E[εt] = 0k    (5.2)
E[εt ε′t−s] = 0k×k  for s ≠ 0
E[εt ε′t] = Σ

where Σ is a finite, positive definite covariance matrix.
The simplest VAR is a first-order bivariate specification which can be equivalently expressed
as
yt = Φ0 + Φ1 yt−1 + εt ,
It is clear that each element of yt is a function of each element of yt −1 , although certain param-
eterizations of Φ1 may remove the dependence. Treated as individual time-series, deriving the
properties of VARs is an exercise in tedium. However, a few tools from linear algebra make work-
ing with VARs hardly more difficult than autoregressions.
5.1.2 Properties of a VAR(1)

The properties of the VAR(1) are fairly simple to study. More importantly, section 5.2 shows that all VAR(P)s can be rewritten as a VAR(1), and so the general case requires no additional effort beyond the first order VAR.
5.1.2.1 Stationarity
A VAR(1)

yt = Φ0 + Φ1 yt−1 + εt

is covariance stationary if the eigenvalues of Φ1 are less than 1 in modulus.¹ In the univariate case, this is equivalent to the condition |φ1| < 1. Assuming the eigenvalues of Φ1 are less than one in absolute value, backward substitution can be used to show that

yt = Σ_{i=0}^{∞} Φ1^i Φ0 + Σ_{i=0}^{∞} Φ1^i εt−i    (5.3)

where the eigenvalue condition ensures that Φ1^i will converge to zero as i grows large.
¹Definition 5.3 (Eigenvalue). λ is an eigenvalue of a square matrix A if and only if |A − λIn| = 0 where | · | denotes determinant.

The crucial property of eigenvalues for applications to VARs is given in the following theorem:

Theorem 5.1 (Matrix Power). Let A be an n by n matrix. Then Am → 0 as m → ∞ if and only if all eigenvalues of A are less than 1 in modulus.

5.1.2.2 Mean

Taking the expectation of both sides of the backward substitution form in eq. (5.3), the unconditional mean of a stationary VAR(1) is
" ∞
#
X
E [yt ] = E (Ik − Φ1 )−1 Φ0 + E Φi1 εt −i
(5.5)
i =0
∞
X
= (Ik − Φ1 )−1 Φ0 + Φi1 E [εt −i ]
i =0
∞
X
−1
= (Ik − Φ1 ) Φ0 + Φi1 0
i =0
−1
= (Ik − Φ1 ) Φ0
This result is similar to that of a univariate AR(1) which has a mean of (1 − φ1 )−1 φ0 . The eigen-
values play an important role in determining the mean. If an eigenvalue of Φ1 is close to one,
(Ik − Φ1 )−1 will contain large values and the unconditional mean will be large. Similarly, if Φ1 = 0,
then the mean is Φ0 since { yt } is composed of white noise and a constant.
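To make these calculations concrete, the following minimal sketch (Python with NumPy) checks the eigenvalue condition of Theorem 5.1 and computes the unconditional mean (Ik − Φ1)^{−1}Φ0. The parameter values are taken from the stock and bond return VAR(1) estimated later in section 5.3.

import numpy as np

# VAR(1) parameters from the stock-bond example in section 5.3
Phi0 = np.array([9.733, 1.058])          # intercept vector
Phi1 = np.array([[0.097, 0.301],
                 [-0.095, 0.299]])       # lag-1 coefficient matrix

# Stationarity: all eigenvalues of Phi1 must be less than 1 in modulus
eigenvalues = np.linalg.eigvals(Phi1)
assert np.all(np.abs(eigenvalues) < 1), "VAR(1) is not covariance stationary"

# Unconditional mean: (I_k - Phi1)^{-1} Phi0
k = Phi1.shape[0]
mu = np.linalg.solve(np.eye(k) - Phi1, Phi0)
print(mu)  # approximately [10.795, 0.046], matching the values in section 5.3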
5.1.2.3 Variance
Before deriving the variance of a VAR(1), it is often useful to express a VAR in deviations form. Define µ = E[yt] to be the unconditional expectation of y (and assume it is finite). The deviations form of a VAR(P)
yt = Φ0 + Φ1 yt−1 + Φ2 yt−2 + . . . + ΦP yt−P + εt

is given by

ỹt = Φ1 ỹt−1 + Φ2 ỹt−2 + . . . + ΦP ỹt−P + εt

where ỹt = yt − µ. In the VAR(1) case, backward substitution gives

ỹt = Σ_{i=0}^{∞} Φ1^i εt−i    (5.7)
The deviations form is simply a translation of the VAR from its original mean, µ, to a mean of 0.
The advantage of the deviations form is that all dynamics and the shocks are identical, and so
can be used in deriving the long-run covariance, autocovariances and in forecasting. Using the
backward substitution form of a VAR(1), the long run covariance can be derived as
" ∞
! ∞
!0 #
X X
0 0
E (yt − µ) (yt − µ) = E ỹt ỹt = E Φi1 εt −i Φi1 εt −i
(5.8)
i =0 i =0
" ∞
#
0 0
X
=E Φi1 εt −i ε0t −i Φ1 (Since εt is WN)
i =0
∞
X 0
= Φi1 E εt −i ε0t −i Φ01
i =0
∞
X 0
= Φi1 Σ Φ01
i =0
0
vec E (yt − µ) (yt − µ) = (Ik 2 − Φ1 ⊗ Φ1 )−1 vec (Σ)
Definition 5.4 (vec). Let A = [a1 a2 . . . an] be an m by n matrix where ai is the ith column of A. The vec operator (also known as the stack operator) is defined

vec A = [a′1 a′2 . . . a′n]′    (5.9)

and vec A is an mn by 1 vector.
Definition 5.5 (Kronecker Product). Let A = [aij] be an m by n matrix, and let B = [bij] be a k by l matrix. The Kronecker product is defined

A ⊗ B = [ a11 B  a12 B  . . .  a1n B
          a21 B  a22 B  . . .  a2n B
           ..    ..     ..     ..
          am1 B  am2 B  . . .  amn B ]

and has dimension mk by nl.
Theorem 5.2 (Kronecker and vec of a product). Let A, B and C be conformable matrices as needed. Then

vec(ABC) = (C′ ⊗ A) vec(B).
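The closed-form expression for the long-run covariance is straightforward to evaluate numerically. The sketch below (NumPy, reusing the bivariate Φ1 from the earlier example together with an assumed, purely illustrative shock covariance Σ) solves (Ik² − Φ1 ⊗ Φ1) vec(V) = vec(Σ) and verifies the result against a truncated version of the infinite sum.

import numpy as np

Phi1 = np.array([[0.097, 0.301],
                 [-0.095, 0.299]])      # lag matrix, as above
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])          # assumed shock covariance (illustrative)
k = Phi1.shape[0]

# vec(V) = (I_{k^2} - Phi1 kron Phi1)^{-1} vec(Sigma)
# NumPy flattens row-by-row by default, so use order='F' for column-stacking vec
vecV = np.linalg.solve(np.eye(k**2) - np.kron(Phi1, Phi1),
                       Sigma.flatten(order="F"))
V = vecV.reshape(k, k, order="F")

# Check against a truncated version of sum_i Phi1^i Sigma (Phi1^i)'
V_sum, P = np.zeros((k, k)), np.eye(k)
for _ in range(200):
    V_sum += P @ Sigma @ P.T
    P = P @ Phi1
assert np.allclose(V, V_sum)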
5.1.2.4 Autocovariance

The autocovariance matrices of a vector valued stochastic process are defined

Γs = E[(yt − µ)(yt−s − µ)′]    (5.10)

and

Γ−s = E[(yt − µ)(yt+s − µ)′]    (5.11)
These present the first significant deviation from the univariate time-series analysis in chapter 4. Instead of being symmetric around t, they are symmetric in their transpose. Specifically,

Γs ≠ Γ−s

but it is the case that³

Γs = Γ′−s.
In contrast, the autocovariances of stationary scalar processes satisfy γs = γ−s . Computing the
autocovariances is also easily accomplished using the backward substitution form,
" ∞
! ∞
!0 #
X X
0
Γ s = E (yt − µ) (yt −s − µ) = E Φi1 εt −i Φi1 εt −s −i
(5.12)
i =0 i =0
" s −1
! ∞
!0 #
X X
=E Φi1 εt −i Φi1 εt −s −i
i =0 i =0
" ∞
! ∞
!0 #
X X
+E Φ1s Φi1 εt −s −i Φi1 εt −s −i (5.13)
i =0 i =0
" ∞
! ∞
!0 #
X X
= 0 + Φ1s E Φi1 εt −s −i Φi1 εt −s −i
i =0 i =0
= Φ1s V [yt ]
and

Γ−s = E[(yt − µ)(yt+s − µ)′] = E[(Σ_{i=0}^{∞} Φ1^i εt−i)(Σ_{i=0}^{∞} Φ1^i εt+s−i)′]    (5.14)
= E[(Σ_{i=0}^{∞} Φ1^i εt−i)(Φ1^s Σ_{i=0}^{∞} Φ1^i εt−i)′]
  + E[(Σ_{i=0}^{∞} Φ1^i εt−i)(Σ_{i=0}^{s−1} Φ1^i εt+s−i)′]    (5.15)
= E[Σ_{i=0}^{∞} Φ1^i εt−i ε′t−i (Φ1^i)′] (Φ1^s)′ + 0
= (Σ_{i=0}^{∞} Φ1^i Σ (Φ1^i)′)(Φ1^s)′
= V[yt] (Φ1^s)′

where V[yt] is the symmetric covariance matrix of the VAR. Like most properties of a VAR, this result is similar to the autocovariance function of an AR(1): γs = φ1^s σ²/(1 − φ1²) = φ1^s V[yt].

³This follows directly from the property of a transpose that if A and B are compatible matrices, (AB)′ = B′A′.
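As a quick numerical check, the autocovariances of a VAR(1) follow directly from the long-run covariance: Γs = Φ1^s V[yt]. A minimal sketch (NumPy, continuing the illustrative Φ1 and Σ used above):

import numpy as np
from numpy.linalg import matrix_power

Phi1 = np.array([[0.097, 0.301], [-0.095, 0.299]])   # as above
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])            # assumed shock covariance
k = Phi1.shape[0]
V = np.linalg.solve(np.eye(k**2) - np.kron(Phi1, Phi1),
                    Sigma.flatten(order="F")).reshape(k, k, order="F")

def autocovariance(Phi1, V, s):
    """Gamma_s = Phi1^s V for s >= 0, Gamma_{-s} = V (Phi1^s)' for s < 0."""
    if s >= 0:
        return matrix_power(Phi1, s) @ V
    return V @ matrix_power(Phi1, -s).T

# The transpose symmetry Gamma_s = Gamma_{-s}' holds numerically
assert np.allclose(autocovariance(Phi1, V, 2), autocovariance(Phi1, V, -2).T)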
5.2 Companion Form

Any VAR(P) can be transformed into a VAR(1) by subtracting the mean and stacking P lags of yt into a large column vector denoted zt; the result is known as the companion form. Consider the VAR(P)

yt = Φ0 + Φ1 yt−1 + Φ2 yt−2 + . . . + ΦP yt−P + εt

where εt is a vector white noise process and µ = (Ik − Σ_{p=1}^{P} Φp)^{−1} Φ0 = E[yt] is assumed finite. The companion form is given by
zt = Υ zt−1 + ξt    (5.16)

where

zt = [ yt − µ
       yt−1 − µ
        ..
       yt−P+1 − µ ],    (5.17)
Υ = [ Φ1  Φ2  Φ3  . . .  ΦP−1  ΦP
      Ik   0   0  . . .   0     0
       0  Ik   0  . . .   0     0
      ..  ..  ..   ..    ..    ..
       0   0   0  . . .  Ik     0 ]    (5.18)

and

ξt = [ εt
        0
       ..
        0 ].    (5.19)

This is known as the companion form and allows the statistical properties of any VAR(P) to be directly computed using only the results of a VAR(1) noting that

E[ξt ξ′t] = [ Σ  0  . . .  0
              0  0  . . .  0
             ..  ..   ..  ..
              0  0  . . .  0 ].
Using this form, it can be determined that a VAR(P) is covariance stationary if all of the eigenvalues of Υ (there are kP of them) are less than one in absolute value (modulus if complex).⁴
5.3 Empirical Examples

Throughout this chapter two examples from the finance literature will be used.

Stocks and long term bonds are often thought to hedge one another. VARs provide a simple method to determine whether their returns are linked through time. Consider the VAR(1)

VWMt  = φ01 + φ11,1 VWMt−1 + φ12,1 10YRt−1 + ε1,t
10YRt = φ02 + φ21,1 VWMt−1 + φ22,1 10YRt−1 + ε2,t

where VWM is the return on the value-weighted market and 10YR is the return on a 10-year bond.
⁴Companion form is also useful when working with univariate AR(P) models. An AR(P) can be reexpressed using its companion VAR(1) which allows properties such as the long-run variance and autocovariances to be easily computed.
Since these models do not share any parameters, they can be estimated separately using OLS.
Using annualized return data for the VWM from CRSP and the 10-year constant maturity treasury
yield from FRED covering the period May 1953 until December 2008, a VAR(1) was estimated.5
VWMt  =  9.733  + 0.097 VWMt−1  + 0.301 10YRt−1  + ε1,t
        (0.000)  (0.104)          (0.000)
10YRt =  1.058  − 0.095 VWMt−1  + 0.299 10YRt−1  + ε2,t
        (0.000)  (0.000)          (0.000)
where the p-val is in parentheses below each coefficient. A few things are worth noting. Stock
returns are not predictable with their own lag but do appear to be predictable using lagged bond
returns: positive bond returns lead to positive future returns in stocks. In contrast, positive re-
turns in equities result in negative returns for future bond holdings. The long-run mean can be
computed as
µ = ([1 0; 0 1] − [0.097 0.301; −0.095 0.299])^{−1} [9.733; 1.058] = [10.795; 0.046].
These values are similar to the sample means of 10.801 and 0.056.
Campbell (1996) builds a theoretical model for asset prices where economically meaningful vari-
ables evolve according to a VAR. Campbell’s model included stock returns, real labor income
growth, the term premium, the relative t-bill rate and the dividend yield. The VWM series from
CRSP is used for equity returns. Real labor income is computed as the log change in income from
labor minus the log change in core inflation and both series are from FRED. The term premium
is the difference between the yield on a 10-year constant maturity bond and the 3-month t-bill
rate. Both series are from FRED. The relative t-bill rate is the current yield on a 1-month t-bill
minus the average yield over the past 12 months and the data is available on Ken French’s web
site. The dividend yield was computed as the difference in returns on the VWM with and without
dividends; both series are available from CRSP.
Using a VAR(1) specification, the model can be described
⁵The yield is first converted to prices and then returns are computed as the log difference in consecutive prices.
Raw Data
          VWMt−1    LBRt−1    RTBt−1    TERMt−1   DIVt−1
VWMt      0.073     0.668     −0.050    −0.000    0.183
          (0.155)   (0.001)   (0.061)   (0.844)   (0.845)
LBRt      0.002     −0.164    0.002     0.000     −0.060
          (0.717)   (0.115)   (0.606)   (0.139)   (0.701)
RTBt      0.130     0.010     0.703     −0.010    0.137
          (0.106)   (0.974)   (0.000)   (0.002)   (0.938)
TERMt     −0.824    −2.888    0.069     0.960     4.028
          (0.084)   (0.143)   (0.803)   (0.000)   (0.660)
DIVt      0.001     −0.000    −0.001    −0.000    −0.045
          (0.612)   (0.989)   (0.392)   (0.380)   (0.108)

Standardized Series
          VWMt−1    LBRt−1    RTBt−1    TERMt−1   DIVt−1
VWMt      0.073     0.112     −0.113    −0.011    0.007
          (0.155)   (0.001)   (0.061)   (0.844)   (0.845)
LBRt      0.012     −0.164    0.027     0.065     −0.013
          (0.717)   (0.115)   (0.606)   (0.139)   (0.701)
RTBt      0.057     0.001     0.703     −0.119    0.002
          (0.106)   (0.974)   (0.000)   (0.002)   (0.938)
TERMt     −0.029    −0.017    0.006     0.960     0.005
          (0.084)   (0.143)   (0.803)   (0.000)   (0.660)
DIVt      0.024     −0.000    −0.043    −0.043    −0.045
          (0.612)   (0.989)   (0.392)   (0.380)   (0.108)
Table 5.1: Parameter estimates from Campbell’s VAR. The top panel contains estimates using unscaled data while the bottom panel contains estimates from data which have been standardized to have unit variance. While the magnitudes of many coefficients change, the p-vals and the eigenvalues of these two parameter matrices are identical, and the parameters are roughly comparable since the series have the same variance.
[VWMt; LBRt; RTBt; TERMt; DIVt] = Φ0 + Φ1 [VWMt−1; LBRt−1; RTBt−1; TERMt−1; DIVt−1] + [ε1,t; ε2,t; ε3,t; ε4,t; ε5,t].
Two sets of parameters are presented in table 5.1. The top panel contains estimates using non-
scaled data. This produces some very large (in magnitude, not statistical significance) estimates
which are the result of two variables having very different scales. The bottom panel contains esti-
mates from data which have been standardized by dividing each series by its standard deviation.
This makes the magnitude of all coefficients approximately comparable. Despite this transfor-
mation and very different parameter estimates, the p-vals remain unchanged. This shouldn’t be
surprising since OLS t -stats are invariant to scalings of this type. One less obvious feature of the
two sets of estimates is that the eigenvalues of the two parameter matrices are identical and so
both sets of parameter estimates indicate the same persistence.
5.4 VAR forecasting

Forecasting from a VAR is nearly identical to forecasting from an AR. Recall that the h-step ahead forecast from an AR(1) is

Et[yt+h] = Σ_{j=0}^{h−1} φ1^j φ0 + φ1^h yt.

The h-step ahead forecast of a VAR(1) has the same form,

Et[yt+h] = Σ_{j=0}^{h−1} Φ1^j Φ0 + Φ1^h yt.
Forecasts from higher order VARs can be constructed by direct forward recursion beginning at
h = 1, although they are often simpler to compute using the deviations form of the VAR since it
includes no intercept,
ỹt = Φ1 ỹt−1 + Φ2 ỹt−2 + . . . + ΦP ỹt−P + εt.

Using the deviations form, h-step ahead forecasts from a VAR(P) can be computed using the recurrence

Et[ỹt+h] = Φ1 Et[ỹt+h−1] + Φ2 Et[ỹt+h−2] + . . . + ΦP Et[ỹt+h−P]

starting at Et[ỹt+1]. Once the forecast of Et[ỹt+h] has been computed, the h-step ahead forecast of yt+h is constructed by adding the long run mean, Et[yt+h] = µ + Et[ỹt+h].
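A minimal sketch of this recursion (NumPy; Phi_list and mu are hypothetical inputs, and history holds the most recent P observations, most recent last):

import numpy as np

def var_forecast(Phi_list, mu, history, h):
    """h-step ahead forecast of a VAR(P) using the deviations-form recurrence."""
    P = len(Phi_list)
    # Work with deviations from the long-run mean, most recent observation last
    dev = [obs - mu for obs in history[-P:]]
    for _ in range(h):
        step = sum(Phi @ dev[-(j + 1)] for j, Phi in enumerate(Phi_list))
        dev.append(step)
    return mu + dev[-1]

# Example with the hypothetical bivariate VAR(2) from the companion-form sketch
Phi = [np.array([[0.4, 0.1], [0.0, 0.3]]),
       np.array([[0.2, 0.0], [0.1, 0.2]])]
mu = np.zeros(2)
history = [np.array([1.0, -0.5]), np.array([0.3, 0.2])]
print(var_forecast(Phi, mu, history, h=4))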
Forecasts from two models were compared: the VAR(1)

VWMt  = 9.733 + 0.097 VWMt−1 + 0.301 10YRt−1 + ε1,t
10YRt = 1.058 − 0.095 VWMt−1 + 0.299 10YRt−1 + ε2,t
and a simple AR(1) for each series. The data set contains a total of 620 observations. Begin-
ning at observation 381 and continuing until observation 620, the models (the VAR and the two
ARs) were estimated using an expanding window of data and 1-step ahead forecasts were com-
puted.6 Figure 5.1 contains a graphical representation of the differences between the AR(1)s and
⁶Recursive forecasts computed using an expanding window use data from t = 1 to R to estimate any model parameters and to produce a forecast of R + 1. The sample then grows by one observation and data from 1 to R + 1 are used to estimate model parameters and a forecast of R + 2 is computed. This pattern continues until the end of the sample. An alternative is to use rolling windows where both the end point and the start point move through time so that the distance between the two is constant.
[Figure 5.1: Differences between 1-step ahead forecasts from the AR(1) models and the VAR(1) for equity returns (top panel) and 10-year bond returns (bottom panel), 1987–2006.]
the VAR(1). The forecasts for the market are substantially different while the forecasts for the 10-year bond return are not. The changes (or lack thereof) are simply a function of the model specification: the return on the 10-year bond has predictive power for both series. The VAR(1) is a better model for stock returns (than an AR(1)) although it is not meaningfully better for bond returns.
5.5 Estimation and Identification

Definition 5.8 (Cross-correlation). The sth cross-correlations between two covariance stationary series {xt} and {yt} are defined

ρxy,s = E[(xt − µx)(yt−s − µy)] / √(V[xt]V[yt])    (5.20)

and

ρyx,s = E[(yt − µy)(xt−s − µx)] / √(V[xt]V[yt])    (5.21)
where the order of the indices indicates which variable is measured using contemporaneous val-
ues and which variable is lagged, E[yt ] = µ y and E[x t ] = µ x .
It should be obvious that, unlike autocorrelations, cross-correlations are not symmetric – the order, xy or yx, matters. Partial cross-correlations are defined in a similar manner; the correlation between xt and yt−s controlling for yt−1, . . . , yt−(s−1).
Definition 5.9 (Partial Cross-correlation). The partial cross-correlations between two covariance stationary series {xt} and {yt} are defined as the population values of the coefficients ϕxy,s in

xt = φ0 + φ1 yt−1 + . . . + φs−1 yt−(s−1) + ϕxy,s yt−s + εx,t    (5.22)

and ϕyx,s in

yt = φ0 + φ1 xt−1 + . . . + φs−1 xt−(s−1) + ϕyx,s xt−s + εy,t    (5.23)

where the order of the indices indicates which variable is measured using contemporaneous values and which variable is lagged.
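Sample cross-correlations are simple to compute. A minimal sketch (NumPy; x and y are assumed to be equal-length arrays of observations):

import numpy as np

def cross_correlation(x, y, s):
    """Sample estimate of rho_{xy,s}: corr(x_t, y_{t-s}) for s >= 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    # Align x_t with y_{t-s} and scale by the unconditional standard deviations
    cov = np.mean(xd[s:] * yd[:len(y) - s])
    return cov / (x.std() * y.std())

# rho_{xy,s} and rho_{yx,s} generally differ: the order of the indices matters
rng = np.random.default_rng(0)
e = rng.standard_normal((500, 2))
x = e[:, 0]
y = 0.8 * np.roll(x, 1) + e[:, 1]   # y depends on lagged x
y[0] = e[0, 1]                      # remove the wrap-around value from roll
print(cross_correlation(x, y, 1))   # near zero
print(cross_correlation(y, x, 1))   # large and positive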
Figure 5.2 contains the CCF (cross-correlation function) and PCCF (partial cross-correlation function) of two first order VARs with identical persistence. The top panel contains the functions for

[yt; xt] = [.5 .4; .4 .5] [yt−1; xt−1] + [ε1,t; ε2,t]

while the bottom contains the functions for a trivial VAR

[yt; xt] = [.9 0; 0 .9] [yt−1; xt−1] + [ε1,t; ε2,t]
which is just two AR(1)s in a system. The nontrivial VAR(1) exhibits dependence between both
series while the AR-in-disguise shows no dependence between yt and x t − j , j > 0.
With these new tools, it would seem that Box-Jenkins could be directly applied to vector processes, and while technically possible, exploiting the ACF, PACF, CCF and PCCF to determine what type of model is appropriate is difficult. For specifications larger than a bivariate VAR, there are simply too many interactions.

The usual solution is to take a hands-off approach as advocated by Sims (1980). The VAR specification should include all variables which theory indicates are relevant and a lag length should
[Figure 5.2: The top panel contains the ACF and CCF for a nontrivial VAR process where contemporaneous values depend on both series. The bottom contains the ACF and CCF for a trivial VAR which is simply composed of two AR(1)s.]
be chosen which has a high likelihood of capturing all of the dynamics. Once these values have been set, either a general-to-specific search can be conducted or an information criterion can be used to select the appropriate lag length. In the VAR case, the Akaike IC, Hannan & Quinn (1979) IC and the Schwarz/Bayes IC are given by
AIC:  ln |Σ̂(P)| + k²P (2/T)

HQC:  ln |Σ̂(P)| + k²P (2 ln ln T / T)

SBIC: ln |Σ̂(P)| + k²P (ln T / T)
where Σ̂(P ) is the covariance of the residuals using P lags and | · | indicates determinant.7 The
⁷ln |Σ̂| is, up to an additive constant, the gaussian log-likelihood divided by T, and these information criteria are all special cases of the usual information criteria for log-likelihood models which take the form L + PIC where PIC is the penalty which depends on the number of estimated parameters in the model.
lag length should be chosen to minimize one of these criteria, and the SBIC will always choose a
(weakly) smaller model than the HQC which in turn will select a (weakly) smaller model than the
AIC. Ivanov & Kilian (2005) recommend the AIC for monthly models and the HQC for quarterly
models, unless the sample size is less than 120 quarters in which case the SBIC is preferred. Their
recommendation is based on the accuracy of the impulse response function, and so may not be
ideal in other applications such as forecasting.
To use a general-to-specific approach, a simple likelihood ratio test can be computed as

(T − P2k²) (ln |Σ̂(P1)| − ln |Σ̂(P2)|) ∼A χ²_{(P2−P1)k²}
where P1 is the number of lags in the restricted (smaller) model, P2 is the number of lags in the
unrestricted (larger) model and k is the dimension of yt . Since model 1 is a restricted version
of model 2, its variance is larger which ensures this statistic is positive. The −P2 k 2 term in the
log-likelihood is a degree of freedom correction that generally improves small-sample perfor-
mance. Ivanov & Kilian (2005) recommended against using sequential likelihood ratio testing
for selecting lag length.
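A minimal sketch of IC-based lag length selection under these formulas (NumPy; estimation is done by OLS equation-by-equation on a common sample, a simplification assumed here):

import numpy as np

def var_ic(y, P_max):
    """AIC/HQC/SBIC for VAR(P), P = 1..P_max, using a common sample."""
    T_full, k = y.shape
    results = {}
    for P in range(1, P_max + 1):
        # Regressors: constant plus lags 1..P, aligned on a common sample
        X = np.hstack([np.ones((T_full - P_max, 1))] +
                      [y[P_max - p: T_full - p] for p in range(1, P + 1)])
        Y = y[P_max:]
        T = Y.shape[0]
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        E = Y - X @ B
        logdet = np.linalg.slogdet(E.T @ E / T)[1]
        results[P] = {"AIC": logdet + k**2 * P * 2 / T,
                      "HQC": logdet + k**2 * P * 2 * np.log(np.log(T)) / T,
                      "SBIC": logdet + k**2 * P * np.log(T) / T}
    return results

The lag length minimizing the chosen criterion is then selected; by construction the SBIC penalty is the heaviest and the AIC penalty the lightest, producing the (weak) ordering of model sizes described above.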
A lag length selection procedure was conducted using Campbell’s VAR. The results are contained
in table 5.2. This table contains both the AIC and SBIC values for lags 0 through 12 as well as like-
lihood ratio test results for testing l lags against l + 1. Note that the LR and P-val corresponding
to lag l is a test of the null of l lags against an alternative of l + 1 lags. Using the AIC, 12 lags
would be selected since it produces the smallest value. If the initial lag length was less than 12,
7 lags would be selected. The HQC and SBIC both choose 3 lags in a specific-to-general search
and 12 in a general-to-specific search. Using the likelihood ratio, a general-to-specific proce-
dure chooses 12 lags while a specific-to-general procedure chooses 3. The test statistic for a null
of H0 : P = 11 against an alternative that H1 : P = 12 has a p-val of 0.
One final specification search was conducted. Rather than beginning at the largest lag and working down one at a time, a “global search” which evaluates models using every combination of lags up to 12 was computed. This required fitting 4096 VARs which only requires a few seconds
on a modern computer.8 For each possible combination of lags, the AIC and the SBIC were com-
puted. Using this methodology, the AIC search selected lags 1-4, 6, 10 and 12 while the SBIC
selected a smaller model with only lags 1, 3 and 12 - the values of these lags indicate that there
may be a seasonality in the data. Search procedures of this type are computationally viable for
checking up to about 20 lags.
⁸For a maximum lag length of L, 2^L models must be estimated.
Table 5.2: Normalized values for the AIC and SBIC in Campbell’s VAR. The AIC chooses 12 lags while the SBIC chooses only 3. A general-to-specific search would stop at 12 lags since the likelihood ratio test of 12 lags against 11 rejects with a p-value of 0. If the initial number of lags was less than 12, the GtS procedure would choose 6 lags. Note that the LR and P-val corresponding to lag l is a test of the null of l lags against an alternative of l + 1 lags.
5.6 Granger Causality

5.6.1 Definition
Granger causality is defined in the negative.
Definition 5.10 (Granger causality). A scalar random variable {xt} is said to not Granger cause {yt} if E[yt | xt−1, yt−1, xt−2, yt−2, . . .] = E[yt | yt−1, yt−2, . . .].⁹ That is, {xt} does not Granger cause {yt} if the forecast of yt is the same whether conditioned on past values of xt or not.
Granger causality can be simply illustrated in a bivariate VAR,

[xt; yt] = [φ11,1 φ12,1; φ21,1 φ22,1] [xt−1; yt−1] + [φ11,2 φ12,2; φ21,2 φ22,2] [xt−2; yt−2] + [ε1,t; ε2,t].

In this model, if φ21,1 = φ21,2 = 0 then {xt} does not Granger cause {yt}. If this is the case, it may be tempting to model yt using

yt = φ22,1 yt−1 + φ22,2 yt−2 + ε2,t.

However, the two are not equivalent in general; ε1,t and ε2,t can be contemporaneously correlated. If it happens to be the case that {xt} does not Granger cause {yt} and ε1,t and ε2,t have no contemporaneous correlation, then yt is said to be weakly exogenous, and yt can be modeled completely independently of xt.
Finally it is worth noting that {xt} not Granger causing {yt} says nothing about whether {yt} Granger causes {xt}.

One important limitation of GC is that it doesn’t account for indirect effects. For example, suppose xt and yt are both Granger caused by zt. When this is the case, xt will usually Granger cause yt even when it has no effect once zt has been conditioned on, and so E[yt | yt−1, zt−1, xt−1, . . .] = E[yt | yt−1, zt−1, . . .] but E[yt | yt−1, xt−1, . . .] ≠ E[yt | yt−1, . . .].
5.6.2 Testing
Testing for Granger causality in a VAR(P) is usually conducted using likelihood ratio tests. In this specification,

yt = Φ0 + Φ1 yt−1 + Φ2 yt−2 + . . . + ΦP yt−P + εt,

the null of no Granger causation corresponds to P zero restrictions on the coefficients of one variable in another variable’s equation, and the likelihood ratio test statistic is

(T − (Pk² − k)) (ln |Σ̂r| − ln |Σ̂u|) ∼A χ²_P

where Σ̂r is the estimated residual covariance when the null of no Granger causation is imposed (H0: φij,1 = φij,2 = . . . = φij,P = 0) and Σ̂u is the estimated covariance in the unrestricted VAR(P). If there is no Granger causation in a VAR, it is probably not a good idea to use one.¹⁰
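A sketch of this likelihood ratio test (NumPy/SciPy; fitting is done by OLS, and the restricted model simply drops the candidate variable's lags from the affected equation, an assumed simplification of the general restricted estimator):

import numpy as np
from scipy import stats

def gc_lr_test(y, caused, causing, P):
    """LR test of H0: column `causing` does not Granger cause column `caused`."""
    T_full, k = y.shape
    Y = y[P:]
    T = Y.shape[0]
    lags = np.hstack([y[P - p:T_full - p] for p in range(1, P + 1)])
    X_u = np.hstack([np.ones((T, 1)), lags])

    def residuals(j, X):
        b, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
        return Y[:, j] - X @ b

    # Unrestricted: every equation uses all lags; restricted: the `caused`
    # equation excludes the lags of `causing`
    keep = [0] + [1 + p * k + c for p in range(P) for c in range(k)
                  if c != causing]
    E_u = np.column_stack([residuals(j, X_u) for j in range(k)])
    E_r = E_u.copy()
    E_r[:, caused] = residuals(caused, X_u[:, keep])

    Sr = E_r.T @ E_r / T
    Su = E_u.T @ E_u / T
    # Degree of freedom adjusted multiplier, as in the text (T taken as the
    # effective sample size here)
    lr = (T - (P * k**2 - k)) * (np.linalg.slogdet(Sr)[1]
                                 - np.linalg.slogdet(Su)[1])
    return lr, stats.chi2.sf(lr, P)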
Campbell’s VAR will be used to illustrate testing for Granger causality. Table 5.3 contains the
results of Granger causality tests from a VAR which included lags 1, 3 and 12 (as chosen by the
“global search” SBIC method) for the 5 series in Campbell’s VAR. Tests of a variable causing itself
have been omitted as these aren’t particularly informative in the multivariate context. The table
tests whether the variables in the left hand column Granger cause the variables along the top row.
From the table, it can be seen that every variable causes at least one other variable since each row
contains a p-val indicating significance using standard test sizes (5 or 10%) since the null is no
¹⁰The multiplier in the test is a degree of freedom adjusted factor. There are T data points and there are Pk² − k parameters in the restricted model.
Table 5.3: Tests of Granger causality. This table contains tests where the variable on the left hand side is excluded from the regression for the variable along the top. Since the null is no GC, rejection indicates a relationship between past values of the variable on the left and contemporaneous values of variables on the top.
Granger causation. It can also be seen that every variable is caused by another by examining the
p-values column by column.
5.7 Impulse Response Function

5.7.1 Defined
Definition 5.11 (Impulse Response Function). The impulse response function of yi, an element of y, with respect to a shock in εj, an element of ε, for any j and i, is defined as the change in yi,t+s, s ≥ 0 for a unit shock in εj,t.
This definition is somewhat difficult to parse and the impulse response function can be clearly
illustrated through a vector moving average (VMA).11 As long as yt is covariance stationary it must
have a VMA representation,
yt = µ + εt + Ξ1 εt −1 + Ξ2 εt −2 + . . .
Using this VMA, the impulse response of yi with respect to a shock in εj is simply {1, Ξ1[ii], Ξ2[ii], Ξ3[ii], . . .} if i = j and {0, Ξ1[ij], Ξ2[ij], Ξ3[ij], . . .} otherwise. The difficult part is computing Ξl, l ≥ 1. In the simple VAR(1) model this is easy since Ξl = Φ1^l.

¹¹Recall that a stationary AR(P) can also be transformed into a MA(∞). Transforming a stationary VAR(P) into a VMA(∞) is accomplished in an analogous manner.
However, in more complicated models, whether higher order VARs or VARMAs, determining the MA(∞) form can be tedious. One surprisingly simple, yet correct, method to compute the elements of {Ξj} is to simulate the effect of a unit shock of εj,t. Suppose the model is a VAR in deviations form¹²,

ỹt = Φ1 ỹt−1 + Φ2 ỹt−2 + . . . + ΦP ỹt−P + εt.

The impulse responses can be computed by “shocking” εt by 1 unit and stepping the process forward. To use this procedure, set yt−1 = yt−2 = . . . = yt−P = 0 and then begin the simulation by setting εj,t = 1. The 0th impulse will obviously be ej = [0j−1 1 0k−j]′, a vector with a 1 in the jth position and zeros everywhere else. The first impulse will be

Ξ1 = Φ1 ej,

the second will be

Ξ2 = Φ1² ej + Φ2 ej

and the third will be

Ξ3 = Φ1³ ej + Φ1 Φ2 ej + Φ2 Φ1 ej + Φ3 ej.
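A minimal sketch of this simulation approach (NumPy; Phi_list is the hypothetical list of deviation-form lag matrices):

import numpy as np

def impulse_response(Phi_list, j, horizon):
    """IRFs of all series to a unit shock in epsilon_j via forward simulation."""
    k, P = Phi_list[0].shape[0], len(Phi_list)
    # Start from zeros, then shock the jth error by one unit
    path = [np.zeros(k) for _ in range(P)]
    e_j = np.zeros(k)
    e_j[j] = 1.0
    path.append(e_j)  # Xi_0
    for _ in range(horizon):
        nxt = sum(Phi @ path[-(p + 1)] for p, Phi in enumerate(Phi_list))
        path.append(nxt)
    return np.array(path[P:])  # rows are Xi_0, Xi_1, ..., Xi_horizon

# Example: VAR(2); row s, column i gives the response of y_i at horizon s
Phi = [np.array([[0.4, 0.1], [0.0, 0.3]]),
       np.array([[0.2, 0.0], [0.1, 0.2]])]
print(impulse_response(Phi, j=0, horizon=3))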
The spectral matrix square root is symmetric, and a shock to the jth error will generally affect every series instantaneously. Unfortunately there is no right choice. If there is a natural ordering in
a VAR where shocks to one series can be reasoned to have no contemporaneous effect on the
other series, then the Choleski is the correct choice. However, in many situations there is little
theoretical guidance and the spectral decomposition is the natural choice.
Monte Carlo confidence intervals come in two forms, one that directly simulates Φ̂i from its
asymptotic distribution and one that simulates the VAR and draws Φ̂i as the result of estimat-
ing the unknown parameters in the simulated VAR. The direct sampling method is simple:
1. Compute Φ̂ from the initial data and estimate the covariance matrix Λ̂ in the asymptotic distribution √T(Φ̂ − Φ0) ∼A N(0, Λ).¹³

2. Using Φ̂ and Λ̂, generate simulated values Φ̃b from the asymptotic distribution as Λ̂^{1/2} ε + Φ̂ where ε ∼ i.i.d. N(0, I). These are i.i.d. draws from a N(Φ̂, Λ̂) distribution.

3. Using Φ̃b, compute the impulse responses {Ξ̂j,b} where b = 1, 2, . . . , B. Save these values.
¹³This is an abuse of notation. Φ is a matrix and the vec operator is needed to transform it into a vector. Interested readers should see section 11.7 in Hamilton (1994) for details on the correct form.
[Figure 5.3: Impulse response functions for 12 steps of the response of the relative T-bill rate to equity returns, labor income growth, the term premium rate and the dividend yield. The dotted lines represent 2 standard deviation (in each direction) confidence intervals. All values have been scaled by 1,000.]
4. Return to step 2 and compute a total of B impulse responses. Typically B is between 100
and 1000.
5. For each impulse response for each horizon, sort the responses. The 5th and 95th percentile
of this distribution are the confidence intervals.
The second Monte Carlo method differs only in the method used to compute Φ̃b.

1. Compute Φ̂ from the initial data and estimate the residual covariance Σ̂.

2. Using Φ̂ and Σ̂, simulate a time-series {ỹt} with as many observations as the original data. These can be computed directly using forward recursion

ỹt = Φ̂0 + Φ̂1 yt−1 + . . . + Φ̂P yt−P + Σ̂^{1/2} εt
where ε ∼ i.i.d. N(0, Ik) are i.i.d. multivariate standard normally distributed.

3. Using {ỹt}, estimate Φ̃b from the simulated data.
4. Using Φ̃b , compute the impulse responses {Ξ̃ j ,b } where b = 1, 2, . . . , B . Save these values.
5. Return to step 2 and compute a total of B impulse responses. Typically B is between 100
and 1000.
6. For each impulse response for each horizon, sort the impulse responses. The 5th and 95th
percentile of this distribution are the confidence intervals.
Of these two methods, the former should be preferred as the assumption of i.i.d. normal errors in
the latter may be unrealistic. This is particularly true for financial data. The final method, which
uses a procedure known as the bootstrap, combines the ease of the second with the robustness
of the first.
The bootstrap is a computational tool which has become popular in recent years primarily due
to the significant increase in the computing power of typical PCs. Its name is derived from the ex-
pression “pulling oneself up by the bootstraps”, a seemingly impossible feat. The idea is simple:
if the residuals are realizations of the actual error process, one can use them directly to simu-
late this distribution rather than making an arbitrary assumption about the error distribution
(e.g. i.i.d. normal). The procedure is essentially identical to the second Monte Carlo procedure
outlined above:
1. Compute Φ̂ from the initial data and estimate the residuals ε̂t .
2. Using ε̂t, compute a new series of residuals ε̃t by sampling, with replacement, from the original residuals. The new series of residuals can be described

{ε̂u1, ε̂u2, . . . , ε̂uT}

where ui are i.i.d. discrete uniform random variables taking the values 1, 2, . . . , T. In essence, the new set of residuals is just the old set of residuals reordered with some duplication and omission.¹⁴

3. Using Φ̂ and {ε̂u1, ε̂u2, . . . , ε̂uT}, simulate a time-series {ỹt} with as many observations as the original data. These can be computed directly using the VAR

ỹt = Φ̂0 + Φ̂1 yt−1 + . . . + Φ̂P yt−P + ε̂ut.

4. Using {ỹt}, estimate Φ̆b from the simulated data.
5. Using Φ̆b , compute the impulse responses {Ξ̆ j ,b } where b = 1, 2, . . . , B . Save these values.
6. Return to step 2 and compute a total of B impulse responses. Typically B is between 100
and 1000.
7. For each impulse response for each horizon, sort the impulse responses. The 5th and 95th
percentile of this distribution are the confidence intervals.
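A sketch of the residual bootstrap for IRF confidence bands, specialized to a VAR(1) to keep it self-contained (NumPy; the simulated data at the end are purely illustrative):

import numpy as np

rng = np.random.default_rng(42)

def estimate_var1(y):
    """OLS estimates of a VAR(1) with intercept; returns (Phi0, Phi1, residuals)."""
    X = np.hstack([np.ones((len(y) - 1, 1)), y[:-1]])
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return B[0], B[1:].T, y[1:] - X @ B

def irf_var1(Phi1, j, horizon):
    e = np.zeros(Phi1.shape[0]); e[j] = 1.0
    return np.array([np.linalg.matrix_power(Phi1, s) @ e
                     for s in range(horizon + 1)])

def bootstrap_irf(y, j, horizon, B=500):
    """Percentile confidence bands for the IRF via the residual bootstrap."""
    Phi0, Phi1, resid = estimate_var1(y)
    draws = np.empty((B, horizon + 1, y.shape[1]))
    for b in range(B):
        idx = rng.integers(0, len(resid), size=len(resid))  # with replacement
        y_b = np.empty_like(y)
        y_b[0] = y[0]
        for t in range(1, len(y)):
            y_b[t] = Phi0 + Phi1 @ y_b[t - 1] + resid[idx[t - 1]]
        _, Phi1_b, _ = estimate_var1(y_b)
        draws[b] = irf_var1(Phi1_b, j, horizon)
    # 5th and 95th percentiles at each horizon form the confidence interval
    return np.percentile(draws, [5, 95], axis=0)

# Illustrative data: a simulated stationary bivariate VAR(1)
T, A = 250, np.array([[0.5, 0.2], [0.1, 0.4]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)
bands = bootstrap_irf(y, j=0, horizon=8, B=200)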
The bootstrap has many uses in econometrics. Interested readers can find more applications in
Efron & Tibshirani (1998).
5.8 Cointegration
Many economic time-series have two properties that make standard VAR analysis unsuitable:
they contain one or more unit roots and most equilibrium models specify that deviations be-
tween key variables, either in levels or ratios, are transitory. Before formally defining cointegra-
tion, consider the case where two important economic variables that contain unit roots, con-
sumption and income, had no long-run relationship. If this were true, the values of these vari-
ables would grow arbitrarily far apart given enough time. Clearly this is unlikely to occur and so
there must be some long-run relationship between these two time-series. Alternatively, consider
the relationship between the spot and future price of oil. Standard finance theory dictates that the futures price, ft, is a conditionally unbiased estimate of the spot price in period t + 1, st+1 (Et[st+1] = ft). Additionally, today’s spot price is also an unbiased estimate of tomorrow’s spot price (Et[st+1] = st). However, both of these price series contain unit roots. Combining these
two identities reveals a cointegrating relationship: st − f t should be stationary even if the spot
and future prices contain unit roots.15
It is also important to note how cointegration is different from stationary VAR analysis. In
stationary time-series, whether scalar or when the multiple processes are linked through a VAR,
the process is self-equilibrating; given enough time, a process will revert to its unconditional
mean. In a VAR, both the individual series and linear combinations of the series are stationary.
The behavior of cointegrated processes is meaningfully different. Treated in isolation, each pro-
cess contains a unit root and has shocks with permanent impact. However, when combined with
another series, a cointegrated pair will show a tendency to revert towards one another. In other
words, a cointegrated pair is mean reverting to a stochastic trend.
Cointegration and error correction provide the tools to analyze temporary deviations from
long-run equilibria. In a nutshell, cointegrated time-series may show temporary deviations from
a long-run trend but are ultimately mean reverting to this trend. It may also be useful to relate
cointegration to what has been studied thus far: cointegration is to VARs as unit roots are to
stationary time-series.
¹⁵This assumes the horizon is short.
5.8.1 Definition
Recall that an integrated process is defined as a process which is not stationary in levels but is
stationary in differences. When this is the case, yt is said to be I(1) and ∆yt = yt − yt −1 is I(0).
Cointegration builds on this structure by defining relationships across series which transform
I(1) series into I(0) series.
Definition 5.12 (Bivariate Cointegration). Let {xt} and {yt} be two I(1) series. These series are said to be cointegrated if there exists a vector β with both elements non-zero such that

β′[xt yt]′ = β1 xt − β2 yt ∼ I(0).    (5.24)
Put another way, there exists a nontrivial linear combination of x t and yt which is station-
ary. This feature, when present, is a powerful link in the analysis of nonstationary data. When
treated individually, the data are extremely persistent; however there is a combination of the
data which is well behaved. Moreover, in many cases this relationship takes a meaningful form
such as yt − xt. Note that cointegrating relationships are only defined up to a non-zero scaling. For example if xt − βyt is a cointegrating relationship, then 2xt − 2βyt = 2(xt − βyt) is also a cointegrating relationship. The standard practice is to choose one variable to normalize the vector. For example, if β1xt − β2yt was a cointegrating relationship, one normalized version would be xt − β2/β1 yt = xt − β̃yt.
The definition in the general case is similar, albeit slightly more intimidating.

Definition 5.13 (Cointegration). A set of k variables yt, at least two of which are I(1), are said to be cointegrated if there exists a non-zero, reduced rank k by k matrix π such that

πyt ∼ I(0).    (5.25)

The non-zero requirement is obvious: if π = 0 then πyt = 0 and is trivially I(0). The second requirement, that π is reduced rank, is not. This technical requirement is necessary since whenever π is full rank and πyt ∼ I(0), it must be the case that yt is also I(0). However, in order for variables to be cointegrated they must also be integrated. Thus, if the matrix is full rank, there is no possibility for the common unit roots to cancel and yt must have the same order of integration before and after the multiplication by π. Finally, the requirement that at least 2 of the series are I(1) rules out the degenerate case where all components of yt are I(0), and allows yt to contain both I(0) and I(1) random variables.
For example, suppose xt and yt are cointegrated and xt − βyt is stationary. One choice for π is

π = [1 −β; 1 −β].
To begin developing a feel for cointegration, examine the plots in figure 5.4. These four plots
correspond to two nonstationary processes and two stationary processes all beginning at the
same point and all using the same shocks. These plots contain data from a simulated VAR(1) with
5.8 Cointegration 347
[Figure 5.4: A plot of four time-series that all begin at the same initial value and use the same shocks. All data were generated by yt = Φij yt−1 + εt where Φij varies: two nonstationary processes (Φ11 and Φ12, top panels) and two stationary processes, Persistent, Stationary (Φ21) and Anti-persistent, Stationary (Φ22) (bottom panels).]
different parameters, Φij,

yt = Φij yt−1 + εt

Φ11 = [.8 .2; .2 .8]      Φ12 = [1 0; 0 1]
λi = 1, 0.6               λi = 1, 1

Φ21 = [.7 .2; .2 .7]      Φ22 = [−.3 .3; .1 −.2]
λi = 0.9, 0.5             λi = −0.43, −0.06
where λi are the eigenvalues of the parameter matrices. Note that the eigenvalues of the nonstationary processes contain the value 1 while the eigenvalues for the stationary processes are all less than 1 (in absolute value). Also, note that the cointegrated process has only one eigenvalue which is unity while the independent unit root process has two. Higher dimension cointegrated
systems may contain between 1 and k − 1 unit eigenvalues. The number of unit eigenvalues
indicates the number of unit root “drivers” in a system of equations. The picture presents evi-
dence of another issue in cointegration analysis: it can be very difficult to tell when two series
are cointegrated, a feature in common with unit root testing of scalar processes.
5.8.2 Error Correction Models (ECM)

The Granger representation theorem provides a key insight to understanding cointegrating re-
lationships. Granger demonstrated that if a system is cointegrated then there exists an error cor-
rection model and if there is an error correction model then the system must be cointegrated.
The error correction model is a form which governs short deviations from the trend (a stochastic trend or unit root). The simplest ECM is given by

[∆xt; ∆yt] = [α1; α2] [1 −β] [xt−1; yt−1] + [ε1,t; ε2,t].    (5.27)

The short-run dynamics take the forms

∆xt = α1(xt−1 − βyt−1) + ε1,t    (5.28)

and

∆yt = α2(xt−1 − βyt−1) + ε2,t.    (5.29)
The important elements of this ECM can be clearly labeled: x t −1 − β yt −1 is the deviation from
the long-run trend (also known as the equilibrium correction term) and α1 and α2 are the speed
of adjustment parameters. ECMs impose one restriction on the αs: they cannot both be 0 (if they were, π would also be 0). In its general form, an ECM can be augmented to allow past short-run deviations to also influence present short-run deviations or to include deterministic trends. In vector form, the generalized ECM is

∆yt = π0 + πyt−1 + π1∆yt−1 + π2∆yt−2 + . . . + πP∆yt−P + εt

where πyt−1 captures the cointegrating relationship, π0 represents a linear time trend in the original data (levels) and πj∆yt−j, j = 1, 2, . . . , P capture short-run dynamics around the stochastic trend.
It may not be obvious how a cointegrated VAR is transformed into an ECM. Consider a simple cointegrated bivariate VAR(1)

[xt; yt] = [.8 .2; .2 .8] [xt−1; yt−1] + [ε1,t; ε2,t].

To transform this VAR to an ECM, begin by subtracting [xt−1; yt−1] from both sides,

[xt; yt] − [xt−1; yt−1] = [.8 .2; .2 .8] [xt−1; yt−1] − [xt−1; yt−1] + [ε1,t; ε2,t]    (5.30)

[∆xt; ∆yt] = ([.8 .2; .2 .8] − [1 0; 0 1]) [xt−1; yt−1] + [ε1,t; ε2,t]

[∆xt; ∆yt] = [−.2 .2; .2 −.2] [xt−1; yt−1] + [ε1,t; ε2,t]

[∆xt; ∆yt] = [−.2; .2] [1 −1] [xt−1; yt−1] + [ε1,t; ε2,t]
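A quick numerical check of this decomposition (NumPy): π = Φ1 − I2 has rank 1, and factoring it as αβ′ recovers the speed of adjustment parameters and the normalized cointegrating vector.

import numpy as np

Phi1 = np.array([[0.8, 0.2], [0.2, 0.8]])
pi = Phi1 - np.eye(2)

# pi has one zero eigenvalue (one unit root) and rank 1
print(np.round(np.linalg.eigvals(pi), 6))   # -0.4 and 0.0 (order may vary)
print(np.linalg.matrix_rank(pi))            # 1

# Factor pi = alpha beta' with beta normalized on x: beta = [1, -1]'
beta = np.array([1.0, -1.0])
alpha = pi[:, 0] / beta[0]                  # columns of pi are proportional
assert np.allclose(np.outer(alpha, beta), pi)
print(alpha)                                # [-0.2, 0.2]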
In this example, the speed of adjustment parameters are -.2 for ∆x t and .2 for ∆yt and the nor-
malized (on x t ) cointegrating relationship is [1 − 1]. In the general multivariate case, a coin-
tegrated VAR(P) can be turned into an ECM by recursive substitution. Consider a cointegrated
VAR(3),
yt = Φ1 yt−1 + Φ2 yt−2 + Φ3 yt−3 + εt.

This system will be cointegrated if at least one but fewer than k eigenvalues of π = Φ1 + Φ2 + Φ3 − Ik are not zero. To begin the transformation, add and subtract Φ3 yt−2 to the right side,

yt = Φ1 yt−1 + Φ2 yt−2 + Φ3 yt−2 − Φ3 yt−2 + Φ3 yt−3 + εt
   = Φ1 yt−1 + Φ2 yt−2 + Φ3 yt−2 − Φ3 ∆yt−2 + εt
   = Φ1 yt−1 + (Φ2 + Φ3) yt−2 − Φ3 ∆yt−2 + εt.

Next, add and subtract (Φ2 + Φ3) yt−1 to the right side and then subtract yt−1 from both sides,

yt = Φ1 yt−1 + (Φ2 + Φ3) yt−1 − (Φ2 + Φ3) ∆yt−1 − Φ3 ∆yt−2 + εt
∆yt = (Φ1 + Φ2 + Φ3 − Ik) yt−1 − (Φ2 + Φ3) ∆yt−1 − Φ3 ∆yt−2 + εt
∆yt = πyt−1 + π1 ∆yt−1 + π2 ∆yt−2 + εt

which is equivalent to an ECM where π = αβ′, α contains the speed of adjustment parameters and β contains the cointegrating vectors. This recursion can be used to transform any cointegrated VAR(P)

yt = Φ1 yt−1 + Φ2 yt−2 + . . . + ΦP yt−P + εt

into its ECM form

∆yt = πyt−1 + π1 ∆yt−1 + π2 ∆yt−2 + . . . + πP−1 ∆yt−P+1 + εt.
The key to understanding cointegration in systems with 3 or more variables is to note that the
matrix which governs the cointegrating relationship, π, can always be decomposed into two ma-
trices,
π = αβ′
where α and β are both k by r matrices where r is the number of cointegrating relationships.
For example, suppose the parameter matrix in an ECM was

π = [ 0.3  0.2  −0.36
      0.2  0.5  −0.35
     −0.3 −0.3   0.39 ]

The eigenvalues of this matrix are .9758, .2142 and 0. The 0 eigenvalue of π indicates there are two cointegrating relationships since the number of cointegrating relationships is rank(π). Since there are two cointegrating relationships, β can be specified as

β = [ 1   0
      0   1
      β1  β2 ]
The rank of π is the same as the number of cointegrating vectors since π = αβ 0 and so if π has
rank r , then α and β must both have r linearly independent columns. α contains the speed
of adjustment parameters and β contains the cointegrating vectors. Note that since there are r
cointegrating vectors there are m = k − r distinct unit roots in the system. This relationship
holds since when there are k variables, and m distinct unit roots, then there are r distinct linear
combinations of the series which will be stationary (except in special circumstances).
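The rank calculation is easy to verify numerically (NumPy, using the example π above):

import numpy as np

pi = np.array([[0.3, 0.2, -0.36],
               [0.2, 0.5, -0.35],
               [-0.3, -0.3, 0.39]])

# The number of cointegrating relationships is rank(pi); the zero
# eigenvalue corresponds to the single common unit root (m = k - r = 1)
print(np.round(np.linalg.eigvals(pi), 4))   # approximately 0.9758, 0.2142, 0
print(np.linalg.matrix_rank(pi))            # 2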
Consider a trivariate cointegrated system driven by either one or two unit roots. Denote the underlying unit root processes as w1,t and w2,t. When there is a single unit root driving all three variables, the system can be expressed

y1,t = κ1 w1,t + ε1,t
y2,t = κ2 w1,t + ε2,t
y3,t = κ3 w1,t + ε3,t

where εj,t is a covariance stationary error (or I(0), but not necessarily white noise).
In this system there are two linearly independent cointegrating vectors. First consider normalizing the coefficient on y1,t to be 1, and so the equilibrium relationship y1,t − β1 y2,t − β2 y3,t must satisfy

κ1 = β1 κ2 + β2 κ3
to ensure that the unit roots are not present. This equation does not have a unique solution since
there are two unknown parameters. One solution is to further restrict β1 = 0 so that the unique
solution is β2 = κ1 /κ3 and an equilibrium relationship is y1,t − (κ1 /κ3 )y3,t . This is a cointegration
relationship since
y1,t − (κ1/κ3) y3,t = κ1 w1,t + ε1,t − (κ1/κ3)(κ3 w1,t + ε3,t) = ε1,t − (κ1/κ3) ε3,t
Alternatively one could normalize the coefficient on y2,t and so the equilibrium relationship y2,t −
β1 y1,t − β2 y3,t would require
κ2 = β1 κ1 + β2 κ3
which again is not identified since there are 2 unknowns and 1 equation. To solve assume β1 = 0
and so the solution is β2 = κ2/κ3, which is a cointegrating relationship since

y2,t − (κ2/κ3) y3,t = κ2 w1,t + ε2,t − (κ2/κ3)(κ3 w1,t + ε3,t) = ε2,t − (κ2/κ3) ε3,t
These solutions are the only two needed since any other definition of the equilibrium would be a linear combination of these two. For example, suppose you choose next to try and normalize on y1,t to define an equilibrium of the form y1,t − β1 y2,t − β2 y3,t, and impose that β2 = 0 to solve so that β1 = κ1/κ2 to produce the equilibrium condition

y1,t − (κ1/κ2) y2,t.

This equilibrium is already implied by the first two,

y1,t − (κ1/κ3) y3,t  and  y2,t − (κ2/κ3) y3,t,

and can be seen to be redundant since

y1,t − (κ1/κ2) y2,t = [y1,t − (κ1/κ3) y3,t] − (κ1/κ2)[y2,t − (κ2/κ3) y3,t].
In this system of three variables and 1 common unit root the set of cointegrating vectors can be expressed as

β = [ 1      0
      0      1
      κ1/κ3  κ2/κ3 ]

since with only 1 unit root and three series, there are two non-redundant linear combinations of the underlying variables which will be stationary.
Next consider a trivariate system driven by two unit roots,

y1,t = κ11 w1,t + κ12 w2,t + ε1,t
y2,t = κ21 w1,t + κ22 w2,t + ε2,t
y3,t = κ31 w1,t + κ32 w2,t + ε3,t

where the errors εj,t are again covariance stationary. By normalizing the coefficient on y1,t to be 1, the weights in the equilibrium condition, y1,t − β1 y2,t − β2 y3,t, must satisfy

κ11 = β1 κ21 + β2 κ31
κ12 = β1 κ22 + β2 κ32

in order to eliminate both unit roots. This system of 2 equations in 2 unknowns has the solution
[β1; β2] = [κ21 κ31; κ22 κ32]^{−1} [κ11; κ12].
This solution is unique after the initial normalization and there are no other cointegrating vectors, and so

β = [ 1
      (κ11κ32 − κ12κ31)/(κ21κ32 − κ22κ31)
      (κ12κ21 − κ11κ22)/(κ21κ32 − κ22κ31) ]
The same line of reasoning extends to k-variate systems driven by m unit roots, and r cointegrating vectors can be constructed by normalizing on the first r elements of y one at a time. In the general case

yt = Kwt + εt

where K is a k by m matrix of loadings on the m unit root processes wt, partitioned as K = [K′1 K′2]′ with K1 r by m and K2 m by m, the cointegrating vectors are

β = [Ir β̃′]′    (5.35)

where

β̃ = (K1 K2^{−1})′.    (5.36)

In the trivariate system driven by a single unit root,

K1 = [κ1; κ2] and K2 = κ3,

and in the trivariate system driven by two unit roots,

K1 = [κ11 κ12] and K2 = [κ21 κ22; κ31 κ32].
Applying eqs. (5.35) and (5.36) will produce the previously derived set of cointegrating vectors. Note that when r = 0 the system contains k unit roots and so is not cointegrated (in general) since any normalized equilibrium condition would involve more equations than unknowns (e.g. three equations and only two unknowns in the trivariate case). Similarly when r = k there are no unit roots since any linear combination of the series must be stationary.
Cointegration is a special case of a broader concept known as common features. In the case of
cointegration, both series have a common stochastic trend (or common unit root). Other exam-
ples of common features which have been examined are common heteroskedasticity, defined
as x t and yt are heteroskedastic but there exists a combination, x t − β yt , which is not, and
common nonlinearities which are defined in an analogous manner (replacing heteroskedastic-
ity with nonlinearity). When modeling multiple time series, you should always consider whether
the aspects you are interested in may be common.
5.8.4 Testing
Testing for cointegration shares one important feature with its scalar counterpart (unit root test-
ing): it can be complicated. Two methods will be presented, the original Engle-Granger 2-step
procedure and the more sophisticated Johansen methodology. The Engle-Granger method is
generally only applicable if there are two variables or the cointegrating relationship is known
(e.g. an accounting identity where the left-hand side has to add up to the right-hand side). The
Johansen methodology is substantially more general and can be used to examine complex sys-
tems with many variables and more than one cointegrating relationship.
The Johansen methodology is the dominant technique to determine whether a system of I(1) variables is cointegrated, and if so, the number of cointegrating relationships. Recall that one of the requirements for a set of integrated variables to be cointegrated is that π has reduced rank, and so the number of non-zero eigenvalues of π is between 1 and k − 1. If the number of non-zero eigenvalues was k, the system would be stationary. If no non-zero eigenvalues were present, the system would contain k unit roots. The Johansen framework for cointegration analysis uses
the magnitude of these eigenvalues to directly test for cointegration. Additionally, the Johansen
methodology allows the number of cointegrating relationships to be determined from the data
directly, a key feature missing from the Engle-Granger two-step procedure.
The Johansen methodology makes use of two statistics, the trace statistic (λtrace ) and the max-
imum eigenvalue statistic (λmax ). Both statistics test functions of the estimated eigenvalues of π
but have different null and alternative hypotheses. The trace statistic tests the null that the num-
ber of cointegrating relationships is less than or equal to r against an alternative that the number
is greater than r. Define λ̂i, i = 1, 2, . . . , k to be the complex modulus of the eigenvalues of π̂ and let them be ordered such that λ1 > λ2 > . . . > λk.¹⁶ The trace statistic is defined

λtrace(r) = −T Σ_{i=r+1}^{k} ln(1 − λ̂i).
There are k trace statistics. The trace test is applied sequentially, and the number of cointegrating relationships is determined by proceeding through the test statistics until the null cannot be rejected. The first trace statistic, λtrace(0) = −T Σ_{i=1}^{k} ln(1 − λ̂i), tests the null of no cointegrating relationships (e.g. k unit roots) against an alternative that the number of cointegrating relationships is 1 or more. For example, if there were no cointegrating relationships, each of the eigenvalues would be close to zero and λtrace(0) ≈ 0 since every unit root “driver” corresponds to a zero eigenvalue in π. When the series are cointegrated, π will have one or more non-zero eigenvalues.
Like unit root tests, cointegration tests have nonstandard distributions that depend on the in-
cluded deterministic terms, if any. Fortunately, most software packages return the appropriate
critical values for the length of the time-series analyzed and any included deterministic regres-
sors.
The maximum eigenvalue test examines the null that the number of cointegrating relation-
ships is r against the alternative that the number is r + 1. The maximum eigenvalue statistic is
defined
λmax (r, r + 1) = −T ln(1 − λ̂r +1 )
Intuitively, if there are r +1 cointegrating relationships, then the r +1th ordered eigenvalue should
be different from zero and the value of λmax (r, r + 1) should be large. On the other hand, if there
are only r cointegrating relationships, the r + 1th eigenvalue should be close to zero and the
statistic will be small. Again, the distribution is nonstandard but most statistical packages pro-
vide appropriate critical values for the number of observations and the included deterministic
regressors.
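In practice these statistics are rarely computed by hand. A sketch using statsmodels' coint_johansen (det_order and k_ar_diff control the deterministic terms and the number of lagged differences; the simulated data here are purely illustrative):

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
w = rng.standard_normal(500).cumsum()                 # common stochastic trend
data = np.column_stack([w + rng.standard_normal(500),
                        0.5 * w + rng.standard_normal(500)])

result = coint_johansen(data, det_order=0, k_ar_diff=2)
print(result.lr1)   # trace statistics, lambda_trace(r) for r = 0, ..., k-1
print(result.cvt)   # corresponding critical values (90%, 95%, 99%)
print(result.lr2)   # maximum eigenvalue statistics
print(result.cvm)   # their critical values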
The steps to implement the Johansen procedure are:
Step 1: Plot the data series being analyzed and perform univariate unit root testing. A set of vari-
ables can only be cointegrated if they are all integrated. If the series are trending, either linearly or
quadratically, make note of this and remember to include deterministic terms when estimating
the ECM.
Step 2: The second stage is lag length selection. Select the lag length using one of the procedures
outlined in the VAR lag length selection section (General-to-Specific, AIC or SBIC). For example,
to use the General-to-Specific approach, first select a maximum lag length L and then, starting
with l = L , test l lags against l − 1 use a likelihood ratio test,
Repeat the test decreasing the number of lags (l ) by one each iteration until the LR rejects the
null that the smaller model is appropriate.
Step 3: Estimate the selected ECM,

∆yt = πyt−1 + π1∆yt−1 + . . . + πP−1∆yt−P+1 + εt,

and determine the rank of π where P is the lag length previously selected. If the levels of the series appear to be trending, then the model in differences should include a constant and

∆yt = π0 + πyt−1 + π1∆yt−1 + . . . + πP−1∆yt−P+1 + εt

should be estimated. Using the λtrace and λmax tests, determine the cointegrating rank of the system. It is important to check that the residuals are weakly correlated – so that there are no important omitted variables – not excessively heteroskedastic, which will affect the size and power of the procedure, and are approximately Gaussian.
Step 4: Analyze the normalized cointegrating vectors to determine whether these conform to im-
plications of finance theory. Hypothesis tests on the cointegrating vector can also be performed
to examine whether the long-run relationships conform to a particular theory.
Step 5: The final step of the procedure is to assess the adequacy of the model by plotting and an-
alyzing the residuals. This step should be the final task in the analysis of any time-series data, not
just the Johansen methodology. If the residuals do not resemble white noise, the model should
be reconsidered. If the residuals are stationary but autocorrelated, more lags may be necessary.
If the residuals are I(1), the system may not be cointegrated.
To illustrate cointegration and error correction, three series which have played an important role
in the revival of the CCAPM in recent years will be examined. These three series are consumption
(c ), asset prices (a ) and labor income (y ). The data were made available by Martin Lettau on his
web site,
http://faculty.haas.berkeley.edu/lettau/data_cay.html
and contain quarterly data from 1952:1 until 2009:1.
The Johansen methodology begins by examining the original data for unit roots. Since it has been clearly established that all series have unit roots, this will be skipped. The next step tests the eigenvalues of π in the error correction model

∆yt = π0 + πyt−1 + π1∆yt−1 + . . . + πP−1∆yt−P+1 + εt

using λtrace and λmax tests. Table 5.4 contains the results of the two tests. These tests are applied sequentially. However, note that all of the p-vals for the null r = 0 indicate no significance at conventional levels (5-10%), and so the system appears to contain k unit roots.¹⁷ The Johansen methodology therefore finds no evidence that these series are cointegrated.
¹⁷Had the first null been rejected, the testing would have continued until a null could not be rejected. The first null not rejected would indicate the cointegrating rank of the system. If all null hypotheses are rejected, then the original system appears stationary, and a further analysis of the I(1) classification of the original data is warranted.
Trace Test
Null    Alternative   λtrace   Crit. Val.   P-val
r = 0   r ≥ 1         16.77    29.79        0.65
r = 1   r ≥ 2          7.40    15.49        0.53
r = 2   r = 3          1.86     3.841       0.17

Max Test
Null    Alternative   λmax     Crit. Val.   P-val
r = 0   r = 1          9.37    21.13        0.80
r = 1   r = 2          5.53    14.26        0.67
r = 2   r = 3          1.86     3.841       0.17
Table 5.4: Results of testing using the Johansen methodology. Unlike the Engle-Granger proce-
dure, no evidence of cointegration is found using either test statistic.
The Engle-Granger method exploits the key feature of any cointegrated system where there is a
single cointegrating relationship – when data are cointegrated, a linear combination of the series
can be constructed that is stationary. If they are not, any linear combination will remain I(1).
When there are two variables, the Engle-Granger methodology begins by specifying the cross-
section regression
yt = βxt + εt

where β̂ can be estimated using OLS. It may be necessary to include a constant and

yt = β1 + β2 xt + εt

can be estimated instead if the residuals from the first regression are not mean 0. Once the coefficients have been estimated, the model residuals, ε̂t, can be tested for the presence of a unit root.
If x t and yt were both I(1) and ε̂t is I(0), the series are cointegrated. The procedure concludes by
using ε̂t to estimate the error correction model to estimate parameters which may be of interest
(e.g. the speed of convergence parameters).
Step 1: Begin by analyzing xt and yt in isolation to ensure that they are both integrated. You should plot the data and perform ADF tests. Remember, variables can only be cointegrated if they are all integrated.

Step 2: Estimate the cross-section regression

yt = β1 + β2 xt + εt
using OLS and computing the estimated residuals {ε̂t}. Use an ADF test (or DF-GLS for more power) and test H0: γ = 0 against H1: γ < 0 in the regression

∆ε̂t = γε̂t−1 + δ1∆ε̂t−1 + . . . + δP∆ε̂t−P + ηt.
It may be necessary to include deterministic trends. Fortunately, standard procedures for testing
time-series for unit roots can be used to examine if this series contains a unit root. If the null is
rejected and ε̂t is stationary, then x t and yt appear to be cointegrated. Alternatively, if ε̂t still
contains a unit root, the series are not cointegrated.
Step 3: If a cointegrating relationship is found, specify and estimate the error correction model

∆xt = α1(yt−1 − β1 − β2xt−1) + ε1,t
∆yt = α2(yt−1 − β1 − β2xt−1) + ε2,t.

Note that this specification is not linear in its parameters. Both equations have interactions between the α and β parameters and so OLS cannot be used. Engle and Granger noted that the terms involving β can be replaced with ε̂t−1 = (yt−1 − β̂1 − β̂2xt−1),

∆xt = α1ε̂t−1 + ε1,t
∆yt = α2ε̂t−1 + ε2,t,

and the speed of adjustment parameters can then be estimated using OLS.
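A sketch of the two-step procedure (using statsmodels' adfuller for the residual unit root test; note that testing estimated residuals strictly requires Engle-Granger critical values rather than the standard Dickey-Fuller ones, so the p-value below is only indicative):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T = 500
w = rng.standard_normal(T).cumsum()           # shared stochastic trend
x = w + rng.standard_normal(T)
y = 1.0 + 0.5 * w + rng.standard_normal(T)    # cointegrated with x

# Step 1 (omitted here): verify x and y are I(1) with ADF tests
# Step 2: estimate y_t = beta1 + beta2 x_t + e_t and test the residuals
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
stat, pval, *_ = adfuller(resid, regression="n")  # residuals are mean zero
print(b, stat, pval)

# Step 3: estimate the speed of adjustment parameters by OLS
dx, dy, lag_e = np.diff(x), np.diff(y), resid[:-1]
alpha1 = np.dot(lag_e, dx) / np.dot(lag_e, lag_e)
alpha2 = np.dot(lag_e, dy) / np.dot(lag_e, lag_e)
print(alpha1, alpha2)

statsmodels also provides coint in statsmodels.tsa.stattools, which performs this residual-based test with appropriate critical values.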
The Engle-Granger procedure begins by performing unit root tests on the individual series and
examining the data. Table 5.5 and figure 5.5 contain these tests and graphs. The null of a unit root
cannot be rejected in any of the three series and all have time-detrended errors which appear to
be nonstationary.
The next step is to specify the cointegrating regression
c t = β1 + β2 a t + β3 yt + εt
and to estimate the long-run relationship using OLS. The estimated cointegrating vector from
5.8 Cointegration 359
Table 5.5: Unit root test results. The top three lines contain the results of ADF tests for unit roots in the three components of cay: Consumption, Asset Prices and Aggregate Wealth. None of these series reject the null of a unit root. The final line contains the results of a unit root test on the estimated residuals where the null is strongly rejected indicating that there is a cointegrating relationship between the three. The lags column reports the number of lags used in the ADF procedure as automatically selected using the AIC.
the first stage OLS was [1 −0.170 −0.713] which corresponds to a long-run relationship of ct − 0.994 − 0.170at − 0.713yt. Finally, the residuals were tested for the presence of a unit root. The
results of this test are in the final line of table 5.5 and indicate a strong rejection of a unit root in
the errors. Based on the Engle-Granger procedure, these three series appear to be cointegrated.
A related danger when working with integrated data is spurious regression. Suppose

xt = xt−1 + ηt

and

yt = yt−1 + νt

are independent unit root processes. In the regression

xt = βyt + εt,

β̂ does not converge to zero even though the series are unrelated, and the t-stat on β̂ diverges as the sample grows, so standard inference will incorrectly indicate a relationship.
[Figure 5.5 appears here: “Analysis of cay”. The top panel, “Original Series (logs)”, plots Consumption, Asset Prices and Labor Income; the bottom panel, “Error”, plots the estimated residual ε̂_t. Both panels span 1954–2004.]
Figure 5.5: The top panel contains plots of time-detrended residuals from regressions of consumption, asset prices and labor income on a linear time trend. These series may contain unit roots and are clearly highly persistent. The bottom panel contains a plot of ε̂_t = c_t − 0.994 − 0.170a_t − 0.713y_t, which is commonly known as the cay scaling factor (pronounced consumption-aggregate wealth). The null of a unit root is rejected using the Engle-Granger procedure for this series, indicating that the three original series are cointegrated.
Balance is another important concept when working with data that contain both stationary and integrated series. An equation is said to be balanced if all variables have the same order of integration. The usual case occurs when a stationary variable (I(0)) is regressed on other stationary variables. However, other situations may arise, and it is useful to consider the four possibilities:

• I(0) on I(0): The usual case. Standard asymptotic arguments apply. See section 5.9 for more issues in cross-section regression using time-series data.

• I(1) on I(0): This regression is unbalanced. An I(0) variable can never explain the long-run variation in an I(1) variable. The usual solution is to difference the I(1) variable and then examine whether the short-run dynamics in the differenced series can be explained by the I(0) variable.

• I(0) on I(1): This regression is unbalanced. An I(1) variable can never explain the variation in an I(0) variable, and unbalanced regressions are not meaningful in explaining economic phenomena. Unlike spurious regressions, the t-stat still has a standard asymptotic distribution, although caution is needed as small-sample properties can be very poor. This is a common problem in finance, where a stationary variable, such as returns on the market, is regressed on a very persistent “predictor” (such as the default premium or dividend yield).

• I(1) on I(1): One of two outcomes is possible. If the series are cointegrated, the regression is balanced in the sense that the residual is I(0), and the estimates recover the cointegrating relationship. If they are not cointegrated, the regression is spurious.
5.9 Cross-sectional Regression with Time-series Data

The observation index above can be replaced with t to indicate that the data used in the regression are from a time series,

y_t = β_1 x_{t1} + β_2 x_{t2} + . . . + β_k x_{tk} + ε_t. (5.38)

Also recall the five assumptions used in establishing the asymptotic distribution of the parameter estimators, recast with time-series indices. The key assumption often violated in applications using time-series data is the martingale difference assumption: that the scores from the linear regression, x_t'ε_t, form a martingale difference sequence (MDS) with respect to the time t − 1 information set, F_{t−1}. When the scores are not an MDS, it is usually the case that the errors from the model, ε_t, can be predicted by variables in F_{t−1}, often their own lagged values. The MDS assumption featured prominently in two theorems about the asymptotic distribution of β̂ and a consistent estimator of its covariance:
√T (β̂ − β) →d N(0, Σ_XX⁻¹ S Σ_XX⁻¹)

and

Σ̂_XX⁻¹ Ŝ Σ̂_XX⁻¹ →p Σ_XX⁻¹ S Σ_XX⁻¹,

where Σ̂_XX = T⁻¹ Σ_{t=1}^{T} x_t'x_t, Ŝ = T⁻¹ Σ_{t=1}^{T} ε̂_t² x_t'x_t = T⁻¹ X'ÊX, and Ê = diag(ε̂_1², . . . , ε̂_T²) is a matrix with the squared estimated residuals along the diagonal.
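To make the pieces concrete, the following sketch assembles the sandwich estimator by hand on simulated heteroskedastic data; names are illustrative, and this is the White (HC0) form, which is valid under MDS scores but not yet robust to autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
eps = rng.standard_normal(T) * (1 + np.abs(X[:, 1]))  # heteroskedastic errors
y = X @ np.array([1.0, 0.5]) + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

Sigma_XX = X.T @ X / T                    # Sigma-hat_XX
S_hat = (X * (e**2)[:, None]).T @ X / T   # S-hat = T^{-1} sum e_t^2 x_t' x_t
avar = np.linalg.inv(Sigma_XX) @ S_hat @ np.linalg.inv(Sigma_XX)
print("White standard errors:", np.sqrt(np.diag(avar / T)))
```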
The major change when the assumption of martingale difference scores is relaxed is that a more complicated covariance estimator is required to estimate the variance of β̂. A modification of White's covariance estimator is needed to pick up the increase in the long-run variance due to the predictability of the scores (x_t ε_t). In essence, the correlation in the scores reduces the amount of “unique” information present in the data. The standard covariance estimator assumes that the scores are uncorrelated with their past and so each contributes its full share to the precision of β̂.
Heteroskedasticity Autocorrelation Consistent (HAC) covariance estimators address this issue. Before turning attention to the general case, consider the simple differences that arise in the estimation of the unconditional mean (a regression on a constant) when the errors are a white noise process and when the errors follow an MA(1).
Suppose the errors are white noise, so that

y_t = µ + ε_t.

Unbiasedness of the usual mean estimator then follows directly from the white noise assumption. Define the sample mean in the usual way,
µ̂ = T⁻¹ Σ_{t=1}^{T} y_t,

so that

E[µ̂] = E[T⁻¹ Σ_{t=1}^{T} y_t]
     = T⁻¹ Σ_{t=1}^{T} E[y_t]
     = T⁻¹ Σ_{t=1}^{T} µ
     = µ.
The variance of the mean estimator exploits the white noise property, which ensures E[ε_i ε_j] = 0 whenever i ≠ j:

V[µ̂] = E[(T⁻¹ Σ_{t=1}^{T} y_t − µ)²]
     = E[(T⁻¹ Σ_{t=1}^{T} ε_t)²]
     = E[T⁻² (Σ_{t=1}^{T} ε_t² + Σ_{r=1}^{T} Σ_{s=1, s≠r}^{T} ε_r ε_s)]
     = T⁻² Σ_{t=1}^{T} E[ε_t²] + T⁻² Σ_{r=1}^{T} Σ_{s=1, s≠r}^{T} E[ε_r ε_s]
     = T⁻² T σ² + T⁻² Σ_{r=1}^{T} Σ_{s=1, s≠r}^{T} 0
     = σ²/T,

and so V[µ̂] = σ²/T – the standard result.
Consider a modification of the original model where the error process {η_t} is a mean-zero MA(1) constructed from white noise shocks {ε_t},

y_t = µ + η_t, where η_t = θ ε_{t−1} + ε_t.

The variance of the errors is

V[η_t] = E[(θ ε_{t−1} + ε_t)²]
       = E[θ² ε_{t−1}² + 2θ ε_t ε_{t−1} + ε_t²]
       = θ² E[ε_{t−1}²] + 2θ E[ε_t ε_{t−1}] + E[ε_t²]
       = θ² σ² + 2 · 0 + σ²
       = σ²(1 + θ²).

These results follow from the derivations in chapter 4 for the MA(1) model. More importantly, the usual mean estimator remains unbiased,
µ̂ = T⁻¹ Σ_{t=1}^{T} y_t,

E[µ̂] = E[T⁻¹ Σ_{t=1}^{T} y_t]
     = T⁻¹ Σ_{t=1}^{T} E[y_t]
     = T⁻¹ Σ_{t=1}^{T} µ
     = µ,
although its variance is different. The difference is due to the fact that η_t is autocorrelated, and so E[η_t η_{t−1}] ≠ 0.
V[µ̂] = E[(T⁻¹ Σ_{t=1}^{T} y_t − µ)²]
     = E[(T⁻¹ Σ_{t=1}^{T} η_t)²]
     = E[T⁻² (Σ_{t=1}^{T} η_t² + 2 Σ_{t=1}^{T−1} η_t η_{t+1} + 2 Σ_{t=1}^{T−2} η_t η_{t+2} + . . . + 2 Σ_{t=1}^{2} η_t η_{t+T−2} + 2 Σ_{t=1}^{1} η_t η_{t+T−1})]
     = T⁻² Σ_{t=1}^{T} E[η_t²] + 2T⁻² Σ_{t=1}^{T−1} E[η_t η_{t+1}] + 2T⁻² Σ_{t=1}^{T−2} E[η_t η_{t+2}] + . . . + 2T⁻² Σ_{t=1}^{2} E[η_t η_{t+T−2}] + 2T⁻² Σ_{t=1}^{1} E[η_t η_{t+T−1}]
     = T⁻² Σ_{t=1}^{T} γ_0 + 2T⁻² Σ_{t=1}^{T−1} γ_1 + 2T⁻² Σ_{t=1}^{T−2} γ_2 + . . . + 2T⁻² Σ_{t=1}^{2} γ_{T−2} + 2T⁻² Σ_{t=1}^{1} γ_{T−1}
where γ0 = E[η2t ] = V[ηt ] and γs = E[ηt ηt −s ]. The two which are non-zero in this specification
are γ0 and γ1 .
γ_1 = E[η_t η_{t−1}]
    = E[(θ ε_{t−1} + ε_t)(θ ε_{t−2} + ε_{t−1})]
    = E[θ² ε_{t−1} ε_{t−2} + θ ε_{t−1}² + θ ε_t ε_{t−2} + ε_t ε_{t−1}]
    = θ² E[ε_{t−1} ε_{t−2}] + θ E[ε_{t−1}²] + θ E[ε_t ε_{t−2}] + E[ε_t ε_{t−1}]
    = θ² · 0 + θ σ² + θ · 0 + 0
    = θ σ²
so the variance of the mean estimator is

V[µ̂] = T⁻² Σ_{t=1}^{T} γ_0 + 2T⁻² Σ_{t=1}^{T−1} γ_1
     = T⁻² T γ_0 + 2T⁻² (T − 1) γ_1
     ≈ (γ_0 + 2γ_1)/T,

and so when the errors are autocorrelated, the usual mean estimator has a different variance, one which reflects the dependence in the errors, and so it is not the case that

V[µ̂] = γ_0/T.
This simple illustration captures the basic idea behind the Newey-West covariance estimator, which is defined as

σ̂²_NW = γ̂_0 + 2 Σ_{l=1}^{L} (1 − l/(L+1)) γ̂_l.

When L = 1, the only weight is 2(1 − 1/2) = 1 and σ̂²_NW = γ̂_0 + γ̂_1, which is different from the variance in the MA(1) error example. However, as L increases, the weight on γ̂_1 converges to 2 since lim_{L→∞} (1 − 1/(L+1)) = 1, and the Newey-West covariance estimator will, asymptotically, include the important terms from the covariance, γ_0 + 2γ_1, with the correct weights. What happens when we use σ̂²_NW instead of the usual variance estimator? As L grows large,

σ̂²_NW → γ_0 + 2γ_1

and the variance of the estimated mean can be estimated using σ̂²_NW:

V[µ̂] = (γ_0 + 2γ_1)/T ≈ σ̂²_NW/T.
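A direct implementation is only a few lines. The sketch below (names and the choice θ = 0.8 are illustrative) computes σ̂²_NW for the sample mean of MA(1) data and compares it to the naive estimator γ̂_0/T and to the true value (γ_0 + 2γ_1)/T = (1 + θ)²σ²/T.

```python
import numpy as np

def nw_variance(y, L):
    """Newey-West long-run variance: gamma_0 + 2*sum_l (1 - l/(L+1)) gamma_l."""
    e = y - y.mean()
    T = len(e)
    gamma = [e[l:] @ e[:T - l] / T for l in range(L + 1)]
    return gamma[0] + 2 * sum((1 - l / (L + 1)) * gamma[l] for l in range(1, L + 1))

rng = np.random.default_rng(0)
T, theta = 100_000, 0.8
eps = rng.standard_normal(T + 1)
eta = eps[1:] + theta * eps[:-1]           # MA(1) errors with sigma^2 = 1

print("naive V[mu-hat]:", eta.var() / T)   # gamma_0 / T, too small here
print("NW    V[mu-hat]:", nw_variance(eta, L=20) / T)
print("true  V[mu-hat]:", (1 + theta) ** 2 / T)
```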
As a general principle, the variance of the sum is not the sum of the variances – this statement is only true when the errors are uncorrelated. Using a HAC covariance estimator allows for time-series dependence and leads to correct inference as long as L grows with the sample size.¹⁸

It is tempting to estimate γ̂_0 and γ̂_1 and use the natural estimator σ̂²_HAC = γ̂_0 + 2γ̂_1. Unfortunately, this estimator is not guaranteed to be positive, while the Newey-West estimator, γ̂_0 + γ̂_1 (when L = 1), is always (weakly) positive. More generally, for any choice of L, the Newey-West covariance estimator, σ̂²_NW, is guaranteed to be positive, while the unweighted estimator, σ̂²_HAC = γ̂_0 + 2γ̂_1 + 2γ̂_2 + . . . + 2γ̂_L, is not. This ensures that the variance estimator passes a minimal sanity check.

¹⁸Allowing L to grow at the rate T^{1/3} has been shown to be optimal in a certain sense related to testing.
The key property of White's covariance estimator,

Ŝ = T⁻¹ Σ_{t=1}^{T} e_t² x_t'x_t,

is that it explicitly captures the dependence between the e_t² and x_t'x_t. Heteroskedasticity Autocorrelation Consistent estimators work similarly by capturing both the dependence between the e_t² and x_t'x_t (heteroskedasticity) and the dependence between the x_t e_t and x_{t−j} e_{t−j} (autocorrelation). A HAC estimator for a linear regression can be defined as an estimator of the form
Ŝ_NW = T⁻¹ ( Σ_{t=1}^{T} e_t² x_t'x_t + Σ_{l=1}^{L} w_l ( Σ_{s=l+1}^{T} e_s e_{s−l} x_s'x_{s−l} + Σ_{q=l+1}^{T} e_{q−l} e_q x_{q−l}'x_q ) )        (5.40)

     = Γ̂_0 + Σ_{l=1}^{L} w_l (Γ̂_l + Γ̂_{−l})

     = Γ̂_0 + Σ_{l=1}^{L} w_l (Γ̂_l + Γ̂_l')

where {w_l} are a set of weights. The Newey-West estimator is produced by setting w_l = 1 − l/(L+1). Other estimators can be computed using different weighting schemes.
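In applied work these estimators are rarely coded by hand; for instance, statsmodels can fit a regression with Newey-West standard errors directly. A minimal sketch on simulated data with a persistent regressor and AR(1) errors (so the scores x_t e_t are autocorrelated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 500
x = np.zeros(T)
e = np.zeros(T)
for t in range(1, T):                    # persistent regressor, autocorrelated errors
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                         # assumes MDS scores
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})  # Newey-West, L = 8
print("OLS se:", ols.bse)                # typically too small here
print("HAC se:", hac.bse)
```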
5.A Cointegration in a trivariate VAR

This appendix examines cointegration in a trivariate VAR(1) through a set of examples: a stationary VAR, three independent unit roots, and cointegrated systems with one and with two cointegrating vectors. For each parameterization the analysis follows the same steps:

• Determine whether the VAR is stationary by computing the eigenvalues of the parameter matrix.
• Rewrite the model in error correction form and determine the rank of the π matrix.
• Decompose the π matrix into α, the adjustment coefficients, and β, the cointegrating vectors.

The first example is a stationary VAR(1),

[x_t]   [.9  −.4   .2] [x_{t−1}]   [ε_{1,t}]
[y_t] = [.2   .8  −.3] [y_{t−1}] + [ε_{2,t}]
[z_t]   [.5   .2   .1] [z_{t−1}]   [ε_{3,t}]
An easy method to determine the stationarity of this VAR is to compute the eigenvalues of the parameter matrix. If the eigenvalues are all less than one in modulus, the VAR(1) is stationary. These values are 0.97, 0.62, and 0.2. Since these are all less than one, the model is stationary.
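This check is immediate with numpy; a small sketch (the printed moduli should match 0.97, 0.62, and 0.2 up to rounding):

```python
import numpy as np

Phi = np.array([[0.9, -0.4,  0.2],
                [0.2,  0.8, -0.3],
                [0.5,  0.2,  0.1]])
moduli = np.abs(np.linalg.eigvals(Phi))
print(np.round(moduli, 2))                 # approximately [0.97, 0.62, 0.20]
print("stationary:", bool(np.all(moduli < 1)))
```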
An alternative method is to transform the model into its error correction form,

[∆x_t]   ( [.9  −.4   .2]   [1 0 0] ) [x_{t−1}]   [ε_{1,t}]
[∆y_t] = ( [.2   .8  −.3] − [0 1 0] ) [y_{t−1}] + [ε_{2,t}]
[∆z_t]   ( [.5   .2   .1]   [0 0 1] ) [z_{t−1}]   [ε_{3,t}]

[∆x_t]   [−.1  −.4   .2] [x_{t−1}]   [ε_{1,t}]
[∆y_t] = [ .2  −.2  −.3] [y_{t−1}] + [ε_{2,t}]
[∆z_t]   [ .5   .2  −.9] [z_{t−1}]   [ε_{3,t}]

which can be compactly expressed as

∆w_t = π w_{t−1} + ε_t

where w_t is a vector composed of x_t, y_t and z_t. The rank of the parameter matrix π can be determined by transforming it into row-echelon form; here π has full rank (three), consistent with a stationary system.
The second example is a VAR composed of three independent unit roots,

[x_t]   [1 0 0] [x_{t−1}]   [ε_{1,t}]
[y_t] = [0 1 0] [y_{t−1}] + [ε_{2,t}]
[z_t]   [0 0 1] [z_{t−1}]   [ε_{3,t}]

with error correction form

[∆x_t]   ( [1 0 0]   [1 0 0] ) [x_{t−1}]   [ε_{1,t}]
[∆y_t] = ( [0 1 0] − [0 1 0] ) [y_{t−1}] + [ε_{2,t}]
[∆z_t]   ( [0 0 1]   [0 0 1] ) [z_{t−1}]   [ε_{3,t}]

[∆x_t]   [0 0 0] [x_{t−1}]   [ε_{1,t}]
[∆y_t] = [0 0 0] [y_{t−1}] + [ε_{2,t}]
[∆z_t]   [0 0 0] [z_{t−1}]   [ε_{3,t}]

and the rank of π is obviously 0, so these are three independent unit roots.
The third example is a cointegrated VAR,

[x_t]   [ 0.80   0.10  0.10] [x_{t−1}]   [ε_{1,t}]
[y_t] = [−0.16   1.08  0.08] [y_{t−1}] + [ε_{2,t}]
[z_t]   [ 0.36  −0.18  0.82] [z_{t−1}]   [ε_{3,t}]

where the eigenvalues of the parameter matrix are 1, 1 and 0.7. The ECM form of this model is

[∆x_t]   ( [ 0.80   0.10  0.10]   [1 0 0] ) [x_{t−1}]   [ε_{1,t}]
[∆y_t] = ( [−0.16   1.08  0.08] − [0 1 0] ) [y_{t−1}] + [ε_{2,t}]
[∆z_t]   ( [ 0.36  −0.18  0.82]   [0 0 1] ) [z_{t−1}]   [ε_{3,t}]

[∆x_t]   [−0.20   0.10   0.10] [x_{t−1}]   [ε_{1,t}]
[∆y_t] = [−0.16   0.08   0.08] [y_{t−1}] + [ε_{2,t}]
[∆z_t]   [ 0.36  −0.18  −0.18] [z_{t−1}]   [ε_{3,t}]

and the eigenvalues of π are 0, 0 and −0.3, indicating it has rank one. Remember, in a cointegrated system, the number of cointegrating vectors is the rank of π. In this example, there is one cointegrating vector, which can be solved for by transforming π into row-echelon form; the single non-zero row is [1 −1/2 −1/2], so β' = [1 −1/2 −1/2] and

          [α_1]                     [α_1  −(1/2)α_1  −(1/2)α_1]
π = αβ' = [α_2] [1  −1/2  −1/2]  =  [α_2  −(1/2)α_2  −(1/2)α_2]
          [α_3]                     [α_3  −(1/2)α_3  −(1/2)α_3]
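The rank, the eigenvalues, and the decomposition can all be verified numerically. A small sketch, where β is normalized so that the coefficient on x_t is 1:

```python
import numpy as np

Pi = np.array([[-0.20,  0.10,  0.10],
               [-0.16,  0.08,  0.08],
               [ 0.36, -0.18, -0.18]])
print("rank:", np.linalg.matrix_rank(Pi))                  # 1 => one cointegrating vector
print("eigenvalues:", np.round(np.linalg.eigvals(Pi), 3))  # 0, 0 and -0.3

beta = np.array([1.0, -0.5, -0.5])   # cointegrating vector, normalized on x_t
alpha = Pi[:, 0] / beta[0]           # adjustment coefficients (first column of Pi)
print("alpha:", alpha)
print("alpha * beta' == Pi:", np.allclose(np.outer(alpha, beta), Pi))
```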
The final example is a cointegrated VAR with two cointegrating vectors,

[x_t]   [0.3  0.4  0.3] [x_{t−1}]   [ε_{1,t}]
[y_t] = [0.1  0.5  0.4] [y_{t−1}] + [ε_{2,t}]
[z_t]   [0.2  0.2  0.6] [z_{t−1}]   [ε_{3,t}]

where the eigenvalues of the parameter matrix are 1, 0.2 + 0.1i and 0.2 − 0.1i. The ECM form of this model is

[∆x_t]   ( [0.3  0.4  0.3]   [1 0 0] ) [x_{t−1}]   [ε_{1,t}]
[∆y_t] = ( [0.1  0.5  0.4] − [0 1 0] ) [y_{t−1}] + [ε_{2,t}]
[∆z_t]   ( [0.2  0.2  0.6]   [0 0 1] ) [z_{t−1}]   [ε_{3,t}]

[∆x_t]   [−0.7   0.4   0.3] [x_{t−1}]   [ε_{1,t}]
[∆y_t] = [ 0.1  −0.5   0.4] [y_{t−1}] + [ε_{2,t}]
[∆z_t]   [ 0.2   0.2  −0.4] [z_{t−1}]   [ε_{3,t}]
and the eigenvalues of π are 0, −0.8 + 0.1i and −0.8 − 0.1i, indicating it has rank two (note that two of the eigenvalues are complex). Remember, in a cointegrated system, the number of cointegrating vectors is the rank of π. In this example, there are two cointegrating vectors, which can be solved for by transforming π into row-echelon form. The two non-zero rows are [1 0 −1] and [0 1 −1], so β' contains these two rows and

    [−0.7   0.4]
α = [ 0.1  −0.5]
    [ 0.2   0.2]

reproduces π = αβ'.
Exercises
Shorter Questions
Longer Questions
Exercise 5.1. Consider the estimation of a mean where the errors are a white noise process.
i. Show that the usual mean estimator is unbiased and derive its variance without assuming
the errors are i.i.d.
Now suppose the error process follows an MA(1) so that ε_t = ν_t + θ_1 ν_{t−1}, where {ν_t} is a WN process.
ii. Show that the usual mean estimator is still unbiased and derive the variance of the mean.
Suppose that {η1,t } and {η2,t } are two sequences of uncorrelated i.i.d. standard normal
random variables.
ii. Which of the following bivariate Vector Autoregressions are stationary? If they are not sta-
tionary are they cointegrated, independent unit roots or explosive? Assume
[ε_{1t}]
[ε_{2t}]  ~  i.i.d. N(0, I_2).

Hint: the eigenvalues of the 2 by 2 matrix

π = [π_11  π_12]
    [π_21  π_22]

can be solved using the two-equation, two-unknowns system λ_1 + λ_2 = π_11 + π_22 and λ_1 λ_2 = π_11 π_22 − π_12 π_21.
(a)

[x_t]   [1.4  .4] [x_{t−1}]   [ε_{1t}]
[y_t] = [−.6  .4] [y_{t−1}] + [ε_{2t}]

(b)

[x_t]   [1  0] [x_{t−1}]   [ε_{1t}]
[y_t] = [0  1] [y_{t−1}] + [ε_{2t}]

(c)

[x_t]   [.8  0] [x_{t−1}]   [ε_{1t}]
[y_t] = [.2  .4] [y_{t−1}] + [ε_{2t}]
y_t = β_1 + β_2 x_t + ε_t.

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t
i. Rewrite the model with ∆yt on the left-hand side and yt −1 and ∆yt −1 on the right-hand
side.
ii. What restrictions are needed on φ1 and φ2 for this model to collapse to an AR(1) in the first
differences?
iii. When the model collapses, what does this tell you about yt ?
x t = x t −1 + ε1,t
yt = β x t −1 + ε2,t
[x_t]   [0.4  0.3] [x_{t−1}]   [ε_{1,t}]
[y_t] = [0.8  0.6] [y_{t−1}] + [ε_{2,t}]
iii. Compute the speed of adjustment coefficient α and the cointegrating vector β where the
β on x t is normalized to 1.
Exercise 5.4. Data on interest rates on US government debt were collected for 3-month (3MO) T-bills, and 3-year (3YR) and 10-year (10YR) bonds from 1957 until 2009. Using these three series, the following variables were defined:

Level:     3MO
Slope:     10YR − 3MO
Curvature: (10YR − 3YR) − (3YR − 3MO)
i. In terms of VAR analysis, does it matter whether the original data or the level-slope-curvature
model is fit? Hint: Think about reparameterizations between the two.
Granger Causality analysis was performed on this set and the p-values were
iii. When constructing impulse response graphs the selection of the covariance of the shocks is
important. Outline the alternatives and describe situations when each may be preferable.
iv. Figure 5.6 contains the impulse response curves for this model. Interpret the graph. Also
comment on why the impulse responses can all be significantly different from 0 in light of
the Granger Causality table.
Figure 5.6: Impulse response functions and 95% confidence intervals for the level-slope-
curvature exercise.
v. Why are some of the “0” lag impulses 0 while others aren’t?
(a) Rewrite the model with ∆yt on the left-hand side and yt −1 and ∆yt −1 on the right-
hand side.
(b) What restrictions are needed on φ1 and φ2 for this model to collapse to an AR(1) in
the first differences?
(c) When the model collapses, what does this tell you about yt ?
x t = x t −1 + ε1,t
yt = β x t −1 + ε2,t
[x_t]   [ 0.625  −0.3125] [x_{t−1}]   [ε_{1,t}]
[y_t] = [−0.75    0.375 ] [y_{t−1}] + [ε_{2,t}]
Exercise 5.6. Consider the estimation of a mean where the errors are a white noise process.
i. Show that the usual mean estimator is unbiased and derive its variance without assuming
the errors are i.i.d.
Now suppose the error process follows an AR(1) so that ε_t = ρ ε_{t−1} + ν_t where {ν_t} is a WN
process.
ii. Show that the usual mean estimator is still unbiased and derive the variance of the sample
mean.
iii. What is Granger Causality and how is it useful in Vector Autoregression analysis? Be as
specific as possible.
Suppose that {η1,t } and {η2,t } are two sequences of uncorrelated i.i.d. standard normal
random variables.
vi. The AIC, HQC and SBIC were computed for a bivariate VAR with lag length ranging from 0
to 12 and are in the table below. Which model is selected by each?
x t = x t −1 + ε1,t
yt = β x t −1 + ε2,t
i. Describe two methods for determining the number of lags to use in a VAR(P).
Write this in companion form. Under what conditions is the VAR(P) stationary?
iii. For the remainder of the question, consider the two-dimensional VAR(1)
yt = Φ1 yt−1 + εt .
Define Granger Causality and explain what conditions on Φ1 are needed for no series in yt
to Granger cause any other series in yt .
vii. In this setup, describe how to test for cointegration using the Engle-Granger method.
iii. What conditions on the eigenvalues of Φ1 are required for cointegration to be present?
∆y_t = Π y_{t−1} + ε_t.

Assume each of the variables in y_t is I(1). What conditions on the rank of Π must hold when: