Modelling Non-Stationary Time Series
A Multivariate Approach
Contents
Preface vii
Notes 203
Appendix B Matrix Algebra for Engle and Granger (1987) Representation 217
B.1 Determinant/Adjoint Representation of a Polynomial Matrix 217
B.2 Expansions of the Determinant and Adjoint about z ∈ [0, 1] 217
B.3 Drawing out a Factor of z from a Reduced Rank Matrix Polynomial 218
References 240
Index 250
Preface
This book deals with an analysis of non-stationary time series that has been
very influential in applied research in econometrics, economics and finance.
The notion that series are non-stationary alters the way in which series are
grouped and may even prove to be relevant to some aspects of regulation and
competition policy when the definition of market becomes an economic issue.
The latter might also apply to any discussion of the nature of globalized
financial markets. In terms of econometric and statistical theory an enormous
literature has grown up to handle the behaviour of the different forms of per-
sistence and non-stationary behaviour that economic and financial data
might exhibit. This is emphasized by the Nobel Prize that has been presented
to Clive Granger and Robert Engle in relation to their extension of our under-
standing of the way in which non-stationary series behave. However, the
requirement to analyze non-stationary behaviour has spawned a wide range of
approaches that relate to and interrelate with the notion that series are non-
stationary and/or cointegrated.
It has been our privilege to in some part be involved in these developments
and to have learned much from our colleagues and teachers alike. We must
acknowledge our debt of gratitude to those who taught us and supervised us
over the years. We would also like to thank participants at various Econo-
metrics Society Conferences, EC2, Econometrics Study Group Conferences
held at the Burwells campus of Bristol University and participants in the
Econometrics workshop for their insights, comments and stimulating research.
Dr. Lindsey Anne Gillan also provided us with some guidance through the
potential minefield that is academic publishing. However, all errors are our
own.
SIMON P. BURKE
JOHN HUNTER
1
Introduction: Cointegration, Economic
Equilibrium and the Long Run
equation became the basis of the Rotterdam model developed by Theil (1965)
and Barten (1969). In the early 1970s, Box and Jenkins wrote a book that
became highly influential in the statistical analysis of time series data. Box
and Jenkins set out a methodology for building time series models that first considers the appropriate degree of differencing required to render a series stationary, then discusses the alternative model types, autoregressive (AR), moving average (MA) or ARMA, that might be used to describe univariate time series, and then considers the method of estimation. Fama (1970) sug-
gests that the observation that financial time series follow random walks is
consistent with the idea that markets were efficient. The random walk model
implies that financial time series are non-stationary and, following Box and
Jenkins, need to be differenced to make them stationary. The difference in the
log of the share price approximates a return and when the financial market is
efficient then returns are not supposed to be predictable.
The structure of time series models pre-dates Box and Jenkins. Yule (1927)
first estimated AR processes and in 1929 Kolmogorov considered the behav-
iour of sums of independent random variables (see the discussion in Wold and
Jureen (1953)). In the regression context, Sargan (1964) applied an MA error
structure to a dynamic model of UK wage inflation. The Sargan model became
the basis of most of the UK wage equations used in the large macroeconomic
models (Wallis et al. 1984). In demand analysis, approximation rather than non-stationarity was the motivation for differencing, and developments in economic theory related to the structure of demand equations were more concerned with issues of aggregation than with the possible time series structure of the data (Deaton and Muellbauer 1980). To difference time series became
common practice in modelling univariate time series and this approach was
also applied in finance where it was common to consider returns of different
assets rather than share prices. The market model relates the return on a share
to the return on the market. There was now a discrepancy between the
methods applied in statistics and finance to time series data and the approach
predominantly used by economists.
However, the first oil shock precipitated a crisis in macroeconomic model
building. Most of the world’s large macroeconomic models were unable to
resolve many of the problems that ensued from this shock. Forecasts and policy simulations, which provide governments' predictions of the future and a practical tool for understanding the impact of policy on the economy, were unable to explain what had happened and what policies might remedy the situation (Wallis et al. 1984). The UK Treasury's inability to forecast the
balance of payments position led to the ludicrous situation of a developed
economy being forced to borrow from the IMF – a remedy that would not
have been sought had reasonable estimates been available of the true pay-
ments position. The whole approach to the econometric modelling of eco-
nomic time series was in doubt.
prices or in finance, share prices, stock indices and dividends. Otherwise, frac-
tional differencing might be required, with the resulting models being special
cases of the autoregressive fractionally integrated moving average (ARFIMA)
model.
In chapter 3, modelling non-stationary time series is handled in a single
equation framework. When more than one series is analyzed, differencing
might be more than is required. This occurs when series in combination are
stationary (cointegration). Non-integer differencing is often required, in the
case of series such as interest rates. Single equation models, which incorporate
some different right-hand side variables in levels, are classified as error correc-
tion models. When the original data or their logarithms are non-stationary,
cointegration may be observed when linear combinations of two or more
levels variables are stationary. Then cointegration is valid when the relationships are bivariate or there is one cointegrating relationship in a system.
When the regressors are exogenous, in a univariate time series context, the
regressions can be viewed as ARMAX or ARMA models with exogenous
variables.
In chapter 4, the multivariate time series model is developed from a stationary representation of the data that is known always to exist, the vector moving average (VMA) model in differences. The book explains the nature of multivariate time
series under stationarity and then extends this to the cointegration case. We
then explain how the VMA in differences can be transformed into an error
correction model using the Granger representation theorem and the Smith–
McMillan form developed by Yoo (1986). Cointegration is then described in
terms of error correcting VARs or VECMs. A procedure for determining the
existence of the VAR is described along with the Johansen approach to estima-
tion and inference. The book explains the asymptotic theory that lies behind
the Johansen test statistic. An application is developed based on the models of
the UK effective exchange rate estimated by Hunter (1992), Johansen and
Juselius (1992) and Hunter and Simpson (1995). Finally a number of alterna-
tive representations are developed and the question of multi-cointegration
discussed.
In chapter 5, the exogeneity of variables in the VAR and the identification
of long-run parameters are considered. Exogeneity is discussed in terms of the
restrictions required for weak, strict and cointegrating exogeneity in the long
run. Then alternative forms of exogeneity and causality are considered and
the results associated with Hunter (1992) and Hunter and Simpson (1995) are
presented. Identification is discussed in terms of conventional systems with
I(0) series, this approach is extended to show when the parameters can be
identified via imposing the restrictions and solving out for the long-run para-
meters and their loadings. Identification is then discussed in terms of the
results derived by Bauwens and Hunter (2000), Johansen (1995) and Boswijk
(1996). All three approaches are applied to the model estimated by Hunter
(1992).
In chapter 6, more advanced topics are considered in some detail. Firstly, the I(2) case is considered, first using an extension to the Sargan–Bézout approach adopted by Hunter (1994), then in terms of the representation and test due to
Johansen (1992) and Paruolo (1996), and finally the test procedures due to
Johansen and Paruolo are applied to the exchange rate data in Hunter (1992).
Fractional cointegration is briefly discussed in terms of the estimator due to
Robinson and Marinucci (1998) and the test due to Robinson and Yajima
(2002). Secondly, forecasting of non-stationary and stationary components is
considered. The results produced by Lin and Tsay (1996) and Clements and
Hendry (1995, 1998) are presented with a graphical analysis of the perfor-
mance of the simulations developed by Lin and Tsay (1996). Finally, models
with short-run structural equations are discussed – in particular, models with
unit roots in the endogenous and exogenous processes. It is shown how to
estimate models where the unit roots relate to the endogenous variables and
then to the case associated with the exogenous variables.
In chapter 7, the reader is guided to further issues in the literature. Firstly, a
plethora of articles on testing stationarity and non-stationarity has developed;
the reader is directed where appropriate to the book by Patterson (2005). A
condensed discussion of structural breaks is provided along with direction to
appropriate references.
2
Properties of Univariate Time Series
2.1 Introduction
2.2 Non-stationarity
time series and that it is the temporal dependence between elements of these
series that is of concern. Furthermore, the dependence will be considered at a
relatively simple level: that of covariance. This last point does not matter if
the distribution being used is the normal (or Gaussian) distribution, since this
distribution is characterized entirely by its mean and variance and covariance.
Consider Figure 2.1. This shows the time series plot of the annual rate of
growth of UK real output from 1963 to 1993. Its characteristics are that it
varies around a more or less fixed level, that it does not drift away from this
level for any great length of time, and that higher values at some point in time tend to be followed by other high values, or at least that changes from the high values are often smooth. The same applies for low values, followed by low values or changing relatively smoothly.1 The controlled variability
around a fixed level is a manifestation of stationarity. The relationship
between neighbouring values can be described by autocorrelation – literally,
the quantification of the correlation between values in the time series sep-
arated by fixed periods of time. A type of stationarity can be defined in terms
of the autocorrelation and mean of a time series. This is a restricted but very
useful and practical definition.
In theory, the individual observations comprising the time series are
thought of as realizations of underlying random variables. The autocorrelation
of a time series is defined in terms of these underlying random variables as
follows. Let Xt t = 1, 2, … be a sequence of scalar random variables, one for
each equally spaced point in time, t, but otherwise referring to the same
random variable, X. Such a sequence may (loosely) be called a stochastic
process.2 Let E(.) be the expectation operator.
Var(X_t) = E[(X_t − E(X_t))²] = γ_x(0)

ρ_x(j) = γ_x(j) / γ_x(0),   j = …, −2, −1, 0, 1, 2, …   (2.2)
Being a correlation, it follows that
−1 ≤ ρ_x(j) ≤ 1
making it a useful basis on which to compare time series.
The sequence of autocovariances and autocorrelations obtained as j, the
time gap between random variables changes, are often referred to as functions.
That is, (2.1) is called the autocovariance function and (2.2) the autocorrela-
tion function (abbreviated to ACF).
2.2.2 Stationarity
The definitions of autocovariance and autocorrelation have been written to indicate that they depend only on the time gap, not the point in time. That is, for example, considering two different points in time, t and τ,

Cov(X_t, X_{t−j}) = γ_x(j)

and

Cov(X_τ, X_{τ−j}) = γ_x(j)

even though t ≠ τ. But the time gap, j, is the same so they have the same autocovariance. This is an assumption consisting of two components.

It is assumed that the expected value, or mean, of the time series does not change over time, so that for any t ≠ τ,

E(X_t) = E(X_τ). (2.3a)
from which it follows that the autocorrelations depend only on the time gap,
not on the time itself. The assumption that these quantities remain fixed over
time is a fundamental aspect of stationarity, and goes most of the way to
E(X_t) = μ, |μ| < ∞,

Var(X_t) = σ² < ∞,

E[(X_t − E(X_t))(X_{t−j} − E(X_{t−j}))] = γ_x(j), |γ_x(j)| < ∞.

The autocovariances may be estimated from a sample x_t, t = 1, 2, …, T, by

γ̂_x(j) = (1/(T − j)) Σ_{t=j+1}^{T} (x_t − x̄)(x_{t−j} − x̄)
sample ACF. The sample ACF for the UK output growth data is presented in
Figure 2.2.
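The estimator above is easy to reproduce; a minimal sketch, assuming Python with NumPy (the function name `sample_acf` and the simulated series are ours, not the book's):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample ACF: estimated autocovariances divided by the estimated variance.

    Uses the estimator printed in the text, which divides the lag-j sum by
    T - j (many packages divide by T instead).
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    gamma = np.array([np.sum((x[j:] - xbar) * (x[:T - j] - xbar)) / (T - j)
                      for j in range(max_lag + 1)])
    return gamma / gamma[0]

# Illustration on simulated white noise: rho(0) = 1 and the higher-lag
# sample autocorrelations should be close to zero.
rng = np.random.default_rng(0)
acf = sample_acf(rng.standard_normal(500), 10)
```

For a persistent series such as the output growth data, the same function would instead produce autocorrelations that damp off gradually with the lag.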
Figure 2.2 has two leading characteristics. The sample autocorrelations
damp off over time, that is they decline towards zero as the time gap, or lag
(j), gets larger. There is a degree of oscillation, so that the autocorrelations
start off positive, then decline to zero, but go through zero before returning to
Figure 2.2 Sample ACF for rate of growth of UK real output, quarterly, 1963–1993
Figure 2.3 Daily $/£ exchange rate, January 1985–July 1993, T = 2168
the line is fuzzier, caused by the fact that a great many more observations are
being plotted to the same real length of horizontal axis, which is a matter of
scaling only, the series is seen to wander away from its starting point, to such
an extent that it is difficult to argue that it appears to be varying around a
fixed level. If it isn’t varying about a fixed level (that is, there doesn’t seem to
be a fixed mean), then it is difficult to see how the variances or covariances
might be behaving. It seems that they must also be varying with time,
although, care should be taken since it is quite possible to imagine a series
that varies to a constant degree around a mean that is changing.6 However, in
this case it is difficult to discern what that mean could be. The sample ACF for
this series is given in Figure 2.4. In contrast to the ACF for the growth data,
this declines linearly, and has not reached zero, even by the 100th lag (ρ̂(100) = 0.53521). This series appears to have very long memory, in terms of lags. Its
sample ACF does not look like it is damping off at all. This is not consistent
with the idea of covariance stationarity and suggests that the calculations may indeed be meaningless. It seems likely that the exchange rate series is not covariance stationary.

Figure 2.5 Moving window sample variance estimates of the $/£ exchange rate data, window length 100
To emphasize the point, Figure 2.5 plots the moving window sample vari-
ance estimates of the $/£ exchange rate series, computing the sample variance
for observations 1–100, followed by that for observations 2–101, and so on.
From this it is clear that the variance around the mean does not remain con-
stant even when the mean itself is allowed to vary across windows.
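The moving-window calculation behind Figure 2.5 can be reproduced in a few lines; a sketch, assuming Python with NumPy (the window length of 100 follows the text, but the simulated series are ours):

```python
import numpy as np

def moving_window_variance(x, window=100):
    # Sample variance over observations 1-100, then 2-101, and so on,
    # as described for Figure 2.5.
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + window].var(ddof=1)
                     for i in range(len(x) - window + 1)])

rng = np.random.default_rng(1)
v_noise = moving_window_variance(rng.standard_normal(2168))            # stationary
v_walk = moving_window_variance(np.cumsum(rng.standard_normal(2168)))  # random walk

# For the stationary series the window variances hover near sigma^2 = 1;
# for the random walk they wander and move far from any fixed level.
```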
F(X_{t+1}, …, X_{t+n}) = F(X_{t+h+1}, …, X_{t+h+n}), h ≥ 0 (2.5)
then the process generating the time series observations is said to be strictly
stationary.
Equation (2.5) simply states that the joint distribution of the sequence of
random variables is unchanged when considering the distribution any number
of periods earlier or later. In the case of covariance stationarity, it is not the dis-
tribution as a whole that is considered, but only its first two moments, the
mean, and the (co)variances. This is clearly a weaker requirement.
It can be seen that strict stationarity, while having appeal from a philosoph-
ical point of view, is very demanding and so impracticable. In common with
most textbook treatments, econometric research, applied and theoretical, this
book will adopt covariance stationarity as its definition of stationarity, and,
unless otherwise stated, stationarity will mean covariance stationarity. In
addition, a common – though not universal – assumption of time series
models is of normality, in which case the two definitions are coincident.
E(ε_t) = 0 (2.6a)

Var(ε_t) = σ² ∀t (2.6b)

ρ_ε(j) = 0 ∀j ≠ 0 (2.6c)

(E(ε_t ε_{t−j}) = 0, ∀j ≠ 0) (2.6d)

the sequence is said to be white noise9 and the symbol ∀ means 'for all' or 'for any'.
Equations (2.6a–2.6d) state that the process has zero mean, constant vari-
ance, and that different members of the sequence are uncorrelated. In addi-
tion, there will often be a distributional assumption, which is that the random
variables are normally distributed. Since, under normality, non-correlation is
equivalent to independence, the sequence is then described as normally independently identically distributed (NIID) with a mean of zero and variance σ². In short, ε_t ~ NIID(0, σ²). Realizations of an NIID(0, 1) sequence are provided
in Figure 2.6, with the time index labelled as though for the same time period
and frequency of data as Figure 2.1 for the growth in output data.
a_t = ε_t − (1/2)ε_{t−1}.
Then clearly there is some temporal structure to the at, that is they are auto-
correlated. Note that the mean of the process is given by
γ_a(j) = E(a_t a_{t−j})
     = E((ε_t − (1/2)ε_{t−1})(ε_{t−j} − (1/2)ε_{t−j−1}))
     = E(ε_t ε_{t−j}) − (1/2)E(ε_{t−1}ε_{t−j}) − (1/2)E(ε_t ε_{t−j−1}) + (1/4)E(ε_{t−1}ε_{t−j−1})

     = −(1/2)σ²  for j = 1
     = 0         for j > 1
since the expectation terms in the last expression will be zero if the time index
on the random variables is not the same because the white noise series is
uncorrelated; if the index is the same then the expectation is the expectation
of a square of a zero mean process, and so is its variance, σ². So the process is
autocorrelated as far as but not beyond the first lag and this is because it is a
function of the current white noise term and its previous value.
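This cut-off is easy to confirm by simulation; a sketch, assuming Python with NumPy (sample size and seed are ours). For a_t = ε_t − (1/2)ε_{t−1} with σ² = 1, the variance is (1 + 1/4)σ² and γ_a(1) = −(1/2)σ², so ρ_a(1) = −0.4 and ρ_a(j) = 0 for j > 1:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.standard_normal(100_001)
a = eps[1:] - 0.5 * eps[:-1]          # a_t = eps_t - (1/2) eps_{t-1}

def acf(x, max_lag):
    # Sample autocorrelations rho(0), ..., rho(max_lag)
    x = x - x.mean()
    gamma = np.array([np.dot(x[j:], x[:len(x) - j]) for j in range(max_lag + 1)])
    return gamma / gamma[0]

r = acf(a, 5)
# Theory: r[1] should be near -0.4 and r[2:] near zero.
```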
It is possible to build more general models of this type. Let θi, i = 1, 2, …, q
be constant coefficients and define
a_t = ε_t − Σ_{i=1}^{q} θ_i ε_{t−i},  (2.8)

E(a_t) = 0.  (2.9a)

Var(a_t) = (1 + Σ_{i=1}^{q} θ_i²)σ²,  (2.9b)

γ_a(j) = (−θ_j + θ_{j+1}θ_1 + θ_{j+2}θ_2 + … + θ_q θ_{q−j})σ²  for j = 1, 2, …, q
       = 0  otherwise  (2.9c)
These equations show that the mean and variance are fixed, that the autocovariances depend only on the time gap, not on the time itself, and that all moments are finite as long as the parameters are. The process is therefore stationary, but has an autocorrelation structure that cuts off after q lags.
However, since there are q parameters in the model, these values may be
chosen so as to reproduce any desired sequence of q autocovariances, and
hence any ACF cutting off after lag q.
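Equations (2.9b) and (2.9c) can be verified directly for a small example; a sketch, assuming Python with NumPy and the illustrative values θ₁ = 0.4, θ₂ = 0.3 (ours), with σ² = 1. For a finite MA, the lag-j autocovariance is the inner product of the coefficient vector with itself shifted j places:

```python
import numpy as np

theta = np.array([0.4, 0.3])           # illustrative theta_1, theta_2
c = np.concatenate(([1.0], -theta))    # coefficients of a_t = eps_t - sum theta_i eps_{t-i}

def ma_gamma(c, j):
    # gamma(j) = sigma^2 * sum_i c_i c_{i+j}, here with sigma^2 = 1
    return float(np.dot(c[:len(c) - j], c[j:])) if j < len(c) else 0.0

# (2.9b): Var(a_t) = (1 + theta_1^2 + theta_2^2) sigma^2
var_theory = 1 + np.sum(theta ** 2)
# (2.9c): gamma(1) = -theta_1 + theta_2*theta_1, gamma(2) = -theta_2, gamma(j) = 0 for j > 2
gamma1_theory = -theta[0] + theta[1] * theta[0]
gamma2_theory = -theta[1]
```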
Equation (2.8) defines a moving average (MA) process (or model) of order q.
It is important to note that all such processes are stationary, and that they are
very flexible in terms of reproducing autocorrelation structure. To obtain a
process whose autocorrelations last out to lag 15, a MA(15) model can be used.
In theory, the model can extend to an infinite number of lags if the autocorre-
lations damp off asymptotically rather than all at once. Then it is necessary to
place a restriction on the coefficients so that the variance (2.9b) exists,
namely, that Σ_{i=1}^{∞} θ_i² < ∞. These properties also demonstrate the drawbacks of the
MA model: it isn’t practical to work with a very large number of lags; the
model cannot capture non-stationary behaviour; and, finally, it is not easy to
motivate in terms of the real life structures that might have given rise to data.
The point is that, by extending the order of the MA far enough, it is always
possible to provide a MA process whose ACF approximates that of any given
ACF to whatever degree of accuracy is required, and that the approximation
error goes to zero as the order of the MA increases. As long as the foregoing is
understood, this may be abbreviated by stating that any (covariance) station-
ary time series with no deterministic components has an infinite order MA
representation.
Thus, if xt is a stationary time series with only stochastic components, it is
always possible to represent it as
x_t = ε_t − Σ_{i=1}^{∞} θ_i ε_{t−i}

where ε_t is zero mean white noise with variance σ², the only restriction on the parameters being that Σ_{i=1}^{∞} θ_i² < ∞. A more detailed account of this theorem may
be found in Hamilton (1994, section 4.8) and a rigorous one in Brockwell and
Davis (1991).11
x_t = αx_{t−1} + ε_t,  (2.10a)

x_t = Σ_{i=1}^{p} α_i x_{t−i} + ε_t,  (2.10b)

where α and α_i, i = 1, 2, …, p are constant coefficients.12

γ_x(j) = α^j σ² / (1 − α²)  (2.11a)

ρ_x(j) = γ_x(j) / γ_x(0) = α^j.  (2.11b)
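The geometric decay in (2.11b) can be checked by simulation; a sketch, assuming Python with NumPy (the value α = 0.7, sample size and seed are ours):

```python
import numpy as np

alpha = 0.7
rng = np.random.default_rng(3)
T = 200_000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = alpha * x[t - 1] + eps[t]   # x_t = alpha * x_{t-1} + eps_t

x = x - x.mean()
gamma = np.array([np.dot(x[j:], x[:T - j]) for j in range(6)])
rho = gamma / gamma[0]
# Theory: rho[j] should be close to alpha**j for each j.
```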
In the case of an AR(p) process, the ACF is the solution to the difference equation

ρ_x(j) = Σ_{i=1}^{p} α_i ρ_x(j − i).

The form of this solution depends on the solutions of the characteristic equation13

1 − Σ_{i=1}^{p} α_i z^i = 0,  (2.12)

where z is the argument of the function on the left-hand side of (2.12). Let these solutions be λ_i, i = 1, 2, …, p. Then, in the case where the solutions are all distinct, the solution will be of the form14

ρ_x(j) = Σ_{i=1}^{p} A_i λ_i^{−j}  (2.13)
x_t − Σ_{i=1}^{p} α_i x_{t−i} = ε_t.  (2.14)

Lx_t = x_{t−1},

L^n x_t = x_{t−n},

x_t − Σ_{i=1}^{p} α_i L^i x_t = ε_t

or

(1 − Σ_{i=1}^{p} α_i L^i) x_t = ε_t.
The term 1 − Σ_{i=1}^{p} α_i L^i of this equation is a polynomial of degree p in the lag operator L (and so is itself an operator). That is, it is a polynomial function of L. It is therefore conveniently rewritten as

α(L) = 1 − Σ_{i=1}^{p} α_i L^i.  (2.15)
This function is called a lag polynomial operator (of order p). In general, the
coefficient of L0 = 1 does not have to be equal to 1 as it is here. This has arisen
because the starting point was an autoregressive model.
Using (2.15), the AR(p) model of (2.10b) may be written
α(L)x_t = ε_t.

Similarly, defining the MA lag polynomial

θ(L) = 1 − Σ_{i=1}^{q} θ_i L^i,  (2.16)

the MA(q) model (2.8) may be written

a_t = θ(L)ε_t.

1 − Σ_{i=1}^{p} α_i z^i = 0,  (2.12) again
But the left-hand side of this equation is the same function as (2.15) except
that the lag operator has been replaced by the general complex argument, z.
So, writing

α(z) = 1 − Σ_{i=1}^{p} α_i z^i

equation (2.12) may be written

α(z) = 0.  (2.17)
The values of z that satisfy (2.17) are called the roots of the polynomial α(z). As a short hand, they are also referred to as the roots of the lag polynomial operator, α(L), although, obviously, it is not correct in any sense to assign numerical values to an operator (the lag operator in this case).
δ(L) = δ_0 − Σ_{i=1}^{n} δ_i L^i  (2.18)

where δ_i, i = 0, 1, …, n are constant coefficients. By 'evaluating' the function at certain values of its argument, useful functions of the coefficients can result. There are two important cases:

(i) Replace L by 0. Then (2.18) becomes

δ(0) = δ_0 − Σ_{i=1}^{n} δ_i 0^i = δ_0.

That is, δ(0) is the value of the coefficient of the zero lag term of δ(L).

(ii) Replace L by 1. Then (2.18) becomes

δ(1) = δ_0 − Σ_{i=1}^{n} δ_i 1^i = δ_0 − Σ_{i=1}^{n} δ_i.

So δ(1) is the sum of the coefficients of δ(L).
white noise variance. So, it is the roots of the autoregressive lag polynomial that
determine the evolution of the ACF as a function of the time gap, or lag, j.
The condition that all the roots (of the autoregressive lag polynomial) lie
outside the unit circle is the stationarity condition for autoregressive processes.
x_t = x_{t−1} + ε_t,

which can be written

α(L)x_t = ε_t,

α(L) = 1 − L.  (2.19)

To see that (2.19) has a unit root, and is therefore non-stationary, note that α(z) = 0 has the solution z = 1 from (2.19). That is, the lag polynomial of this model has a root of 1.
That is,

ΔΔ = Δ²
and in general, if the process is differenced n times, the operation can be represented as Δⁿ, the lag operator representation of which can be calculated from (1 − L)ⁿ. Although the first difference of the random walk is stationary, so is the second, because it is an MA(1) process, Δ²x_t = ε_t − ε_{t−1}, and all MA processes are stationary. However, it has been over-differenced, meaning that in order to reduce the original (random walk) process to stationarity it was only necessary to difference once. This can be detected in the time series structure by observing that ε_t − ε_{t−1} is an MA(1) process with a unit root. (It could be said that differencing the minimal number of times to reduce the series to stationarity removes the unit root altogether, while over-differencing moves it from the AR to the MA side of the equation.) Strictly speaking, it is the minimal number of times it is necessary to difference a non-stationary series to stationarity that defines its order of integration. This is made precise in section 2.3.11 below.
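The effect of over-differencing shows up clearly in first-order autocorrelations; a sketch, assuming Python with NumPy (series length and seed are ours). The first difference of a random walk is white noise, so ρ(1) ≈ 0, while the second difference is the MA(1) process ε_t − ε_{t−1}, whose first-order autocorrelation is −σ²/(2σ²) = −1/2:

```python
import numpy as np

rng = np.random.default_rng(4)
walk = np.cumsum(rng.standard_normal(100_000))   # random walk

d1 = np.diff(walk)          # Delta x_t = eps_t: white noise
d2 = np.diff(walk, n=2)     # Delta^2 x_t = eps_t - eps_{t-1}: over-differenced MA(1)

def rho1(x):
    # First-order sample autocorrelation
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))
```

The unit MA root introduced by the unnecessary difference is what the first-order autocorrelation of −1/2 reflects.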
yt = a + bt (2.20)
yt − yt −1 = ( a + bt ) − ( a + b(t − 1)) = b.
yt = yt −1 + b. (2.21a)
However, (2.21a) does not tie down the value of the process at any point in
time, whereas (2.20) does. In particular, considering the value of the process at
t = 0, called the initial value, (2.20) gives
y0 = a. (2.21b)
The time trend model is fully described by equations (2.21a) and (2.21b).
Because the amount added each time period is fixed, b, this is known as a
deterministic trend. If instead of adding a fixed amount, a white noise is
added, the resultant process is still called a trend, but it is now termed a
stochastic trend. Thus in place of (2.21a) write
y*_t = y*_{t−1} + ε_t = y*_{t−2} + ε_{t−1} + ε_t = … = y*_0 + Σ_{j=1}^{t} ε_j.  (2.23)
The fact that (2.23) involves a simple (unweighted) sum of white noise terms
leads to the general label of integrated for processes of this type, although the
class is not restricted to pure random walks of the type illustrated here.18
Using equation (2.23), it is straightforward to show that both the variance
and autocorrelation structure of the random walk are varying over time
according to
Var(y*_t) = tσ²,

Cov(y*_t, y*_{t−j}) = (t − j)σ²

and defining the correlation to be the covariance divided by the variance of the process at time t, the autocorrelation is

Cor(y*_t, y*_{t−j}) = 1 − j/t.  (2.24)
It is clear that the process is non-stationary since its moments are not con-
stant over time. From this non-constancy, it also follows that the manipula-
tions underlying the derivation of the difference equation for the ACF of an
autoregressive process are not valid. So, in fact, equation (2.13) only applies in
the stationary case.20
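That Var(y*_t) = tσ² can be confirmed by simulating many independent walks and computing the cross-section variance at each date; a sketch, assuming Python with NumPy (the number of replications and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, T = 20_000, 200
# Each row is one random walk y*_t = y*_{t-1} + eps_t, with y*_0 = 0 and sigma^2 = 1.
paths = np.cumsum(rng.standard_normal((n_paths, T)), axis=1)

var_t = paths.var(axis=0)        # variance across replications at each t
t_index = np.arange(1, T + 1)
# Theory: var_t should track t * sigma^2 = 1, 2, ..., 200.
```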
Figure 2.7 Random walk, 2,168 observations, initial value 0, NIID(0,1) white noise
Figure 2.8 Sample ACF of random walk series plotted in Figure 2.7
Figure 2.9 Theoretical ACF of a random walk at different points in time: t = 50, 75, 100
Δx_t = x_t − x_{t−1} = (x_0 + μt + Σ_{j=1}^{t} ε_j) − (x_0 + μ(t − 1) + Σ_{j=1}^{t−1} ε_j) = μ + ε_t

Such a process is called a random walk with drift and μ is called the drift parameter. There are now two aspects to the non-stationarity: not only is the variance growing over time (and the autocorrelation structure changing over time) but the mean of the process is also evolving since

E(x_t) = E(x_0) + E(μt) + E(Σ_{j=1}^{t} ε_j) = x_0 + μt
where α⁻¹(L) is such that α⁻¹(L)α(L) = 1.21 The inverse operator does not exist unless all the roots of α(L) lie outside the unit circle. The operator ψ(L) is of
infinite order, and so ARMA models can be thought of as a restricted way of
obtaining an MA(∞) representation. That is, the ARMA model of finite orders
provides an approximation to the infinite order MA representation of a sta-
tionary process.
It is easily verified that α⁻¹(L)α(L) = 1 in this case. Then the ARMA(1,1) model

(1 − αL)x_t = (1 − θL)ε_t

can be written

x_t = Σ_{i=0}^{∞} α^i L^i (1 − θL)ε_t  (2.27)
Multiplying out the operators on the right hand side of (2.27) gives

Σ_{i=0}^{∞} α^i L^i (1 − θL) = Σ_{i=0}^{∞} α^i L^i − θ Σ_{i=0}^{∞} α^i L^{i+1}

= 1 + Σ_{i=1}^{∞} α^i L^i − Σ_{i=1}^{∞} θα^{i−1} L^i

= 1 + Σ_{i=1}^{∞} (α^i − θα^{i−1}) L^i.
x_t = ψ(L)ε_t  (2.28a)

where ψ(L) = 1 − Σ_{i=1}^{∞} ψ_i L^i, with
(2.26) it appears that the lag polynomials have cancelled. In the stationary
case, where the common operator has its root outside the unit circle, this is a
reasonable way to describe what has happened, since if θ = α the AR and MA
operators are indeed the same. The situation is a little more complex in the
non-stationary case where dependency on initial values is not negligible.
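The MA(∞) weights derived above can be cross-checked numerically; a sketch, assuming Python with NumPy (parameter values are ours). From the expansion, x_t = (1 + Σ_{i≥1}(α^i − θα^{i−1})L^i)ε_t, and the same weights must satisfy the recursion implied by (1 − αL)ψ(L) = (1 − θL):

```python
import numpy as np

alpha, theta = 0.8, 0.3    # illustrative ARMA(1,1) parameters
K = 20

# Weights read off the expansion: psi_0 = 1, psi_i = alpha**i - theta*alpha**(i-1)
psi_formula = np.array([1.0] + [alpha ** i - theta * alpha ** (i - 1)
                                for i in range(1, K)])

# Cross-check via the recursion (1 - alpha L) psi(L) = (1 - theta L):
# psi_0 = 1, psi_1 = alpha - theta, psi_i = alpha * psi_{i-1} for i >= 2.
psi_rec = np.empty(K)
psi_rec[0] = 1.0
psi_rec[1] = alpha - theta
for i in range(2, K):
    psi_rec[i] = alpha * psi_rec[i - 1]
```

After the first lag the weights decay geometrically at rate α, which is the restricted shape the finite-order ARMA model imposes on the MA(∞) representation.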
If the roots of δ(L) are λ_i, i = 1, 2, …, n, then (normalizing δ_0 = 1) it may be written as the product of first order factors (1 − λ_i⁻¹L),

δ(L) = ∏_{i=1}^{n} (1 − λ_i⁻¹L).
This factorization does not depend on whether the roots are outside the unit
circle. Thus, if in a stationary ARMA(p, q) model, there is a common factor (i.e.
root) between the AR and the MA polynomial, this may be cancelled to give
an ARMA(p – 1, q – 1) model with exactly the same time series characteristics.
2.3.10.1 Invertibility
An MA(q) or ARMA(p, q) model is said to be invertible if the moving average
operator has all roots outside the unit circle. That is, if
α(L)x_t = θ(L)ε_t
ρ_{x₂}(1) = −θ⁻¹ / (1 + (θ⁻¹)²) = −(1/θ) / (1 + 1/θ²) = −θ / (1 + θ²) = ρ_{x₁}(1)
(Both are stationary because they are pure moving average processes.) In
general, an ARMA(p, q) or MA(q) process will have 2q different parameteriza-
tions that generate the same ACF, because any subset of the moving average
roots may be replaced by their inverses.24 The coefficients of the MA compo-
nent may not therefore be uniquely identified from the ACF. However, if
invertibility holds, there is a unique set of MA coefficients corresponding to
the ACF.25
yt = ∆ d xt (2.29a)
β(L) = α(L)Δ^d.

For example, if α(L) = 1 − 0.1L and d = 1 then

β(L) = (1 − 0.1L)Δ = (1 − 0.1L)(1 − L) = 1 − 1.1L + 0.1L²

where β(L) has roots of 10 and 1, one stationary root and a unit root respectively. This suggests another way of thinking about the order of integration as being the number of unit roots in the autoregressive lag polynomial.
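The factorization in this example can be checked numerically; a sketch, assuming Python with NumPy (we store coefficients in ascending powers of L, and `np.roots` expects descending order):

```python
import numpy as np

# (1 - 0.1L)(1 - L), coefficients in ascending powers of L
beta = np.convolve([1.0, -0.1], [1.0, -1.0])   # -> 1 - 1.1L + 0.1L^2

# Roots of 1 - 1.1z + 0.1z^2 = 0 (reverse for np.roots's descending convention)
roots = np.sort(np.roots(beta[::-1]))
# One unit root (z = 1) and one stationary root (z = 10, outside the unit circle).
```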
If x_t has an ARMA(m, q) representation β(L)x_t = θ(L)ε_t which is invertible and where non-stationarities are due only to unit roots, then the order of integration is equal to the number of unit roots of β(L). If β(L) has d ≤ m unit roots, then it may be factorized as α(L)Δ^d x_t = θ(L)ε_t, where α(L) is of order m − d.
A time series with a positive order of integration is said to be integrated.
Clearly, integrated time series are not the only type of non-stationary time
series, but this is a very popular way of modelling non-stationarity, not least
because it is simple and because a great deal of statistical theory has been
developed to further this approach.
Δx_t = Δa + Δbt + Δε_t = 0 + bΔt + Δε_t = b(t − (t − 1)) + Δε_t = b + Δε_t.  (2.32)
Δ^d = (1 − L)^d = 1 − dL − (1/2)d(1 − d)L² − (1/6)d(1 − d)(2 − d)L³ − … + ψ_k L^k + …

where ψ_k ∝ k^{−(1+d)} for large k, and so the coefficients die away slowly, such that a very high
order autoregressive model would be needed to approximate the ACF reason-
ably well. The ARFIMA model was developed by Granger and Joyeux (1980)
and Hosking (1981).
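The coefficients of (1 − L)^d can be generated by a simple binomial recursion; a sketch, assuming Python with NumPy (d = 0.4 is an illustrative value of ours). The first few terms reproduce the series above, and the ratio of coefficients at lags 2k and k approaches 2^{−(1+d)}, reflecting the hyperbolic decay ψ_k ∝ k^{−(1+d)}:

```python
import numpy as np

d = 0.4
K = 4000
psi = np.empty(K + 1)
psi[0] = 1.0
for k in range(1, K + 1):
    psi[k] = psi[k - 1] * (k - 1 - d) / k   # binomial recursion for (1 - L)**d

# First few coefficients: 1, -d, -(1/2)d(1-d), -(1/6)d(1-d)(2-d), ...
```

The slow hyperbolic decay is what distinguishes fractional differencing from the geometric decay of a stationary ARMA model.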
2.4.1 Background
The form of non-stationarity that is commonly tested for is the unit root. The
structure within which such tests are performed is the AR or ARMA model.
32 Modelling Non-Stationary Time Series
The idea is to obtain a parameterization of the model that allows the hypo-
thesis to be tested to involve a single parameter. This subject is discussed in
detail in Patterson (2005). However, we illustrate some of the structure of
these tests briefly here for two reasons. Firstly, because multivariate generaliza-
tions form the basis of tests discussed in greater length in chapters 3 and 4;31
and secondly because prior testing for non-stationarity is crucial to a great
deal of the methodology of time series modelling used in economics and
finance.
φ(L) = −ψL + φ*(L)(1 − L)   (2.34a)

where

φ*(L) = 1 − Σ_{i=1}^{p−1} φ*_i L^i,

φ*_i = −Σ_{j=i+1}^{p} φ_j,   i = 1, 2, …, p − 1,

ψ = −φ(1).   (2.34b)
Equation (2.34b) shows that ψ = 0 if and only if φ(L) has a unit root. For convenience, write φ*(L) = Σ_{i=0}^{p−1} φ*_i L^i with φ*_0 = 1, so that φ*(L) is a (p − 1)th order lag polynomial with φ*(0) = 1. That is, the AR(p) model may be reparameterized as an AR(p − 1) model in first differences (φ*(L)∆x_t), together with a correction term in the lagged level (ψx_{t−1}). The unit root test is then a test of the null hypothesis

H_0: ψ = 0

in the regression

∆x_t = ψx_{t−1} + Σ_{i=1}^{p−1} φ*_i ∆x_{t−i} + ε_t.   (2.36)
The summation term on the right-hand side of (2.36) does not appear if p = 1,
and so can be thought of as the correction for autocorrelation beyond
that which would be due to an AR(1) process. The alternative hypothesis
can be one or two sided, according to whether the alternative of interest is
stationarity (ψ < 0) or explosiveness (ψ > 0), or either. Typically the alternative
of interest is stationarity, and so the alternative used is

H_A: ψ < 0.
Unit root tests based on (2.36) are called augmented Dickey–Fuller (ADF) tests
(see Patterson 2005).
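The mechanics of the ADF regression referred to in (2.36) can be sketched with ordinary least squares; the implementation below is illustrative only (function and variable names are ours, and no critical values are computed):

```python
import numpy as np

# Illustrative ADF regression (no intercept, for a zero-mean process):
#   dx_t = psi * x_{t-1} + sum_i phi*_i dx_{t-i} + e_t
# The test statistic is the t-ratio on psi.

def adf_stat(x, lags=1):
    dx = np.diff(x)
    y = dx[lags:]                                  # dependent variable
    cols = [x[lags:-1]]                            # lagged level x_{t-1}
    for i in range(1, lags + 1):
        cols.append(dx[lags - i:len(dx) - i])      # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0], beta[0] / se                   # psi_hat and its t-ratio

rng = np.random.default_rng(0)
e = rng.standard_normal(600)
ar1 = np.zeros(600)            # stationary AR(1) with phi = 0.5, so psi = -0.5
for t in range(1, 600):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]
psi_hat, t_ratio = adf_stat(ar1)
# psi_hat should be near -0.5, with a large negative t-ratio
```

Under the unit-root null the t-ratio does not follow the usual t distribution, which is why the Dickey–Fuller critical values in Patterson (2005) are needed.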
white noise. Tests based on (2.37) that assume the disturbances are white
noise will not be valid and inferences from them could be seriously
misleading. However, it is possible to correct the test statistics for the
disturbance autocorrelation so that inferences are once again valid. This
methodology is that developed by Phillips (1987) and Phillips and Perron
(1988). These tests require calculation of a term that has become known as the long-run variance, which is computed using a weighted average of autocovariances in a way related to spectral estimation and to heteroscedasticity and autocorrelation consistent (HAC) covariance matrix estimation (see Andrews 1991; Newey and West 1987; and White 1980). Again, more details may be found in Patterson (2005).
(i) The underlying model may not be AR but ARMA. In this case the AR
approximation would be arbitrarily long and impractical in an empirical
setting. Practically speaking, the optimal length of the pure AR approx-
imation depends on the sample size, and longer models can only be
entertained as more data becomes available. The relationship between
the sample size and the AR order is critical to the (asymptotic) validity of
the test. Ng and Perron (1995) discuss this problem and Hall (1989)
offers a different approach. See also Galbraith and Zinde-Walsh (1999).
(ii) The number of unit roots may be greater than 1 in which case testing
can become unreliable if performed in such a way that unit roots remain
unparameterized in the model. Dickey and Pantula (1987) advise on this
issue. It is relevant since economic time series, especially those recorded
in nominal terms, can be integrated of higher order, especially I(2).
(iii) Economic time series are often subject to structural breaks. This is a port-
manteau term to cover many possibilities, but relates simply to the
assumption of constancy of parameters where this does not exist. This
may affect the parameters of interest, so that, for example, a series may
change from being I(1) to being stationary. Alternatively, a time series
may in fact be stationary around a trend (or mean level) that is subject
to jumps or sudden changes in slope. Since the tests themselves look at
the stochastic behaviour around the trend, misspecification of this trend
leads to unreliable inferences about the stochastic component of the
series. This is a topic of current research, but established papers in the
area are Perron (1989), Zivot and Andrews (1992) and Perron (1990).
(iv) As already observed, a unit root test is an examination of the stochastic
component of a series, that is the random fluctuations about some deter-
ministically determined level. This could be many things: zero, non-zero
but fixed, or a trend of some polynomial degree. But misspecification of
the deterministic component can lead to incorrect inference on the sto-
chastic properties of the data. Dickey and Fuller (1979) address this to
some extent, developing tests for the trend as well as the unit root.
Patterson (2000, section 6.4) discusses a framework for joint determination
of the stochastic and deterministic components of a univariate time series.
where |θ| < 1, and φ(L) has all its roots outside the unit circle, so the differencing operator is the only source of the unit root and x_t ~ I(1). But as θ → 1, so (1 − θL) → (1 − L) and the MA operator will tend to cancel with the differencing
operator. In the limit where this occurs, the process will be stationary and the
null ought to be rejected. But in finite samples, this will be a smooth rather
than a sudden transition, leading to a tendency for tests to reject the null of a
unit root for θ close to unity, even though strictly speaking the process is still
I(1). (See Blough, 1992, for a discussion of this issue.)32
This idea has formed the basis of a set of stationarity tests where the null is
of stationarity. This amounts to a null hypothesis of
H_0: θ = 1
in models such as (2.38). Naturally, this literature is closely related to that for
testing for moving average unit roots, important contributions being
Kwiatkowski, Phillips, Schmidt and Shin (1992), and Leybourne and McCabe
(1994). Of course, such tests suffer from finite sample power problems for θ
close to unity, and size problems when an additional AR root tends to 1 (see
also Lee and Schmidt, 1996, for behaviour in the presence of fractional inte-
gration). KPSS also suggest using both unit root and stationarity tests jointly
in confirmatory data analysis. This was investigated in Burke (1994) and
found to add little to the use of either test individually.
The power of unit root tests can be improved by the use of covariates
(Hansen 1995). This has not yet become a popular approach. A method that is
becoming as popular as the ADF test is that advanced by Elliott, Rothenberg
and Stock (1996).
The Bayesian approach to unit root testing is now well developed and may
be found more appealing since the impact of the unit root’s presence or
absence is not so crucial for the distribution theory (see Bauwens, Lubrano
and Richard, 2000, chapter 6). Harvey’s (1989) structural models relax the
concentration on the ARIMA model that has taken such a firm hold in the
analysis of non-stationary economic time series, and offer an alternative set of
testing techniques. Lastly, in an alternative view of the uncertainty of struc-
ture, Leybourne, McCabe and Tremayne (1996) have developed tests for a
stochastic unit root, where, rather than have a fixed value of 1, the root being
tested is stochastic, having a distribution centred on unity under the null
hypothesis.
holds with an error that has mean zero, constant variance, the ACF of which
damps off quite quickly. Being stationary, it will not wander widely from its mean value of zero and will cross it frequently. That is, a_0 + Σ_{i=1}^{n} a_i x_{i,t} will not depart from zero in any permanent way. So (2.41) holds in the long run – never exactly, but without long periods of failure. This property is known as
cointegration.
Unit root tests may feature in two ways in order to establish the existence of
such a long-run relationship. First, it is necessary to test for the unit roots in
the first place. Secondly, in order to establish that a long-run relationship
exists, it is necessary to test if the function of the data is stationary – that is, to
check that it does not contain a unit root. Thus one might perform a unit root
or stationarity test on f (Xt), although, importantly, if the parameters of this
function are estimated then adjustments to the critical values of the tests are
necessary due to the uncertainty of the estimates as representative values of
the true parameter values.
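The two steps can be illustrated with simulated data; the sketch below (our construction, not the book's) regresses one I(1) series on another with which it is cointegrated by design and checks that the residuals remain bounded, omitting the critical-value adjustment the text warns about:

```python
import numpy as np

# Residual-based sketch: regress y on z, then examine the residuals.
# Simulated data only; no adjusted critical values are computed here.

rng = np.random.default_rng(1)
T = 2000
z = np.cumsum(rng.standard_normal(T))          # z_t ~ I(1), a random walk
y = 1.0 + z + rng.standard_normal(T)           # cointegrated with z

X = np.column_stack([np.ones(T), z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # static regression
resid = y - X @ beta                           # estimated equilibrium errors

# The residuals stay bounded while y itself wanders widely:
assert resid.var() < 0.05 * y.var()
```

A unit root or stationarity test would then be applied to `resid`, with critical values adjusted for the fact that the cointegrating coefficients were estimated.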
2.5 Conclusion
3.1 Introduction
The previous chapter dealt with the properties of univariate time series, and in
particular non-stationarity as characterized by the autoregressive unit root.
This chapter develops the theme by looking at the way in which this type of
non-stationarity can be modelled as a common feature such that the non-
stationarity in one series is fully explained by that present in an appropriate
combination of other series. It is natural to think of this in terms of a single
regression equation.
The unit root corresponds to long-run behaviour of a series, that is
to a component that has an arbitrarily low frequency. Thus, an equa-
tion which fully explains unit root behaviour can be thought of as fully
describing the long-run relationship between the series concerned, or, in
other words, it describes the underlying equilibrium behaviour. If the equa-
tion fails to capture all unit root behaviour it cannot be an equilibrium
relationship.
These ideas are discussed below. The context is intuitively appealing, being
that of a single equation with an unambiguous distinction between depend-
ent variable and (weakly exogenous) regressors.1 This does have limitations,
however, among which is that only one equilibrium relationship can be con-
sidered. This is relaxed in later chapters.
There is never any deviation from this relationship. To emphasize that there is
zero deviation, rewrite (3.1) as
y_t − α − β_0 z_t = 0.   (3.2)
However, rather than hold exactly, (3.2) might be subject to deviation. So, for
a given value zt, if the relationship (3.2) held exactly, the value of the y
process would be
y^e_t = α + β_0 z_t.   (3.3)
But the y process is not equal to y^e_t but takes some other value, simply y_t. Denote the
difference between these two, the extent to which the exact relationship does
not hold, by
η_t = y_t − y^e_t.   (3.4)
If the η_t process varies about zero with a controlled size, then it is reasonable to regard the exact relationship (3.3) as the underlying relationship between the variables. Such a relationship is referred to as a long-run relationship. If, on the other hand, the deviations η_t seem to grow without bound, or become increasingly dispersed about zero, then the exact relationship seems to be irrelevant. The stochastic property required of the deviations η_t is stationarity (and zero mean). Substituting (3.3) into (3.4) and rearranging gives

y_t = α + β_0 z_t + η_t.   (3.5)
E(y_t) = α + β_0 E(z_t) + E(η_t).   (3.6)

E(y_t) = ȳ,
E(z_t) = z̄,

ȳ = α + β_0 z̄.   (3.7)
This is the same functional relationship as (3.3), that is, it is the underlying
or long-run relationship. The sequence of operations leading to (3.7) can be
stylized as follows:
(i) Assume the processes z_t and y_t have settled to fixed values z̄ and ȳ respectively.
(ii) Assume also that there are no more deviations to the system, i.e. assume η_t = 0 (which can be regarded as its settled value).
(iii) Substitute these values into the complete relationship (3.5).
The resultant function relates the long-run static values of the variables. It is
known as the static equilibrium. The condition that all variables have settled
down in this way is known as the steady state.
The treatment here attempts to point out that while it is perfectly possible
to make the above substitutions and obtain the long-run static solution in this
way, this does not prove its existence. Rather it says that if zt and yt settle to
fixed values and disturbances are stationary then the long-run solution to the
model is given by (3.7). This discussion also indicates that the origin of these
settled values should be the expected value of the processes.
y_t = α + α_1 y_{t−1} + β_0 z_t + β_1 z_{t−1} + u_t.   (3.8)
E(y_t) = α + α_1 E(y_{t−1}) + β_0 E(z_t) + β_1 E(z_{t−1}) + E(u_t).   (3.9)
If it is assumed that
E(y_t) = E(y_{t−1}) = ȳ,   (3.10a)
E(z_t) = E(z_{t−1}) = z̄,   (3.10b)
E(u_t) = 0,   (3.10c)
then (3.9) can be used to derive a relationship between z̄ and ȳ. Substituting equations (3.10a), (3.10b) and (3.10c) into (3.9) and rearranging gives
ȳ = α/(1 − α_1) + ((β_0 + β_1)/(1 − α_1)) z̄.   (3.11)
The left-hand side of (3.12) can be evaluated at a pair of actual values (z_t, y_t). If the system was in equilibrium, this should be zero. The extent to which it is not zero is the equilibrium error, which has been denoted η_t. That is,

y_t − α/(1 − α_1) − ((β_0 + β_1)/(1 − α_1)) z_t = η_t,   (3.13)
which, on grouping terms in the lagged levels on the right-hand side, gives
∆y_t = α + β_0 ∆z_t − (1 − α_1)(y_{t−1} − ((β_0 + β_1)/(1 − α_1)) z_{t−1}) + u_t.   (3.14)
From (3.13),
y_{t−1} − ((β_0 + β_1)/(1 − α_1)) z_{t−1} = η_{t−1} + α/(1 − α_1),
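The equivalence of the ADL and ECM parameterizations can be confirmed numerically: driving both recursions with the same shocks must give identical paths. The coefficient values below are arbitrary illustrations, not values from the text:

```python
import numpy as np

# Numerical check of the ADL(1,1) -> ECM rearrangement.
alpha, alpha1, beta0, beta1 = 0.4, 0.7, 0.9, -0.2
rng = np.random.default_rng(2)
z = np.cumsum(rng.standard_normal(200))
u = rng.standard_normal(200)

y_adl = np.zeros(200)
y_ecm = np.zeros(200)
lr = (beta0 + beta1) / (1 - alpha1)            # long-run coefficient on z
for t in range(1, 200):
    # ADL(1,1): y_t = alpha + alpha1*y_{t-1} + beta0*z_t + beta1*z_{t-1} + u_t
    y_adl[t] = (alpha + alpha1 * y_adl[t - 1]
                + beta0 * z[t] + beta1 * z[t - 1] + u[t])
    # ECM: dy_t = alpha + beta0*dz_t - (1-alpha1)*(y_{t-1} - lr*z_{t-1}) + u_t
    dy = (alpha + beta0 * (z[t] - z[t - 1])
          - (1 - alpha1) * (y_ecm[t - 1] - lr * z[t - 1]) + u[t])
    y_ecm[t] = y_ecm[t - 1] + dy

assert np.allclose(y_adl, y_ecm)   # identical by the algebra above
```

The ECM is thus a pure reparameterization: no restriction has been imposed in moving between the two forms.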
The long-run solution and hence the equilibrium error can also be written in
terms of these polynomials since
α(1) = 1 − α_1,
β(1) = β_0 + β_1,

which can be substituted into (3.13) to give

η_t = y_t − α/α(1) − (β(1)/α(1)) z_t.   (3.17)
These results can be generalized for the ADL(m, n) model, which is (3.16) with

α(L) = 1 − Σ_{i=1}^{m} α_i L^i,   (3.19a)

β(L) = Σ_{i=0}^{n} β_i L^i.   (3.19b)
where

β*(L) = β_0 − Σ_{i=1}^{p−1} β*_i L^i,   (3.20c)

β*_i = −Σ_{j=i+1}^{p} β_j,   i = 1, 2, …, p − 1.   (3.20d)
Equation (3.21) is the ECM in this general case. Noting that the lag of a fixed value is the same as the fixed value, L^i ȳ = ȳ for i = 0, 1, 2, …, and so

α(L)ȳ = (1 − Σ_{i=1}^{m} α_i L^i)ȳ = ȳ − Σ_{i=1}^{m} α_i ȳ = (1 − Σ_{i=1}^{m} α_i)ȳ = α(1)ȳ

and similarly

β(L)z̄ = β(1)z̄,

and hence the equilibrium error is given by (3.17), but using the operators (3.19a) and (3.19b). The ECM may therefore be written in terms of the equilibrium error as

α*(L)∆y_t = β*(L)∆z_t − α(1)η_{t−1} + u_t,   (3.22)

where
α*(L) = 1 − Σ_{i=1}^{m−1} α*_i L^i.

Similarly

β*(L) = β_0 + Σ_{i=1}^{n−1} β*_i L^i.
Equation (3.22) shows that the ECM reparameterization of the ADL(m, n) model is in terms of the differences of the processes and the lagged equilibrium
error, the maximum lag of the differences of each variable being one less
than the maximum lag of the level in the ADL. Notice that the current
value of z t appears on the right-hand side of (3.22), that is the summation
involving its lags begins at 0, while the summation involving the lags of
y t begins at 1 because the current value is on the left-hand side of the
equation.
y_t = ȳ, z_t = z̄, u_t = 0, ∀t

and hence

∆y_t = y_t − y_{t−1} = ȳ − ȳ = 0,
∆z_t = z_t − z_{t−1} = z̄ − z̄ = 0.
α*(L)·0 = β*(L)·0 − α(1)(ȳ − α/α(1) − (β(1)/α(1)) z̄) + 0,

so that

ȳ = α/α(1) + (β(1)/α(1)) z̄

is the long-run or steady-state solution to the model.6
Non-Stationary Time Series: Relationships 45
α*(L)∆y_t = Σ_{j=1}^{r} β*_j(L)∆z_{j,t} − α(1)(y_{t−1} − α/α(1) − Σ_{j=1}^{r} (β_j(1)/α(1)) z_{j,t−1}) + u_t.   (3.23)
The lag polynomials α*(L) and β*_j(L) are interpreted as before and are of order one less than α(L) and β_j(L), respectively.
where ∆ȳ indicates that the y_t process is still changing, but by the same (steady-state) amount every period. This long-run equilibrium relates the steady-state
change in the y process to the steady-state levels of the z processes. If there are
unit roots in any of the j (L) the same approach could be used here resulting
in the replacement of the steady-state level of the corresponding zj by its
steady-state change. The possibly uncomfortable result is a long-run solution
that mixes equilibrium levels, changes along a steady-state growth path and
changes that might be best described as flows rather than stocks.
To give a slightly more concrete example, suppose (L) does not have a
unit root, and suppose there are two explanatory variables, the lag polynomial
on the second of which has a unit root. The long-run solution of the ADL is then

ȳ = α/α(1) + (β_1(1)/α(1)) z̄_1 + (β̃_2(1)/α(1)) ∆z̄_2.   (3.24)
α*(L)∆y_t = β*_1(L)∆z_{1,t} + β̃*_2(L)∆²z_{2,t} − α(1)(y_{t−1} − α/α(1) − (β_1(1)/α(1)) z_{1,t−1} − (β̃_2(1)/α(1)) ∆z_{2,t−1}) + u_t.   (3.25b)
This type of choice over specification is not very comfortable as the two long-
run equilibria are different, one including a steady state change, the other not.
However, some clarification is often available either from the underlying
economic theory or, in the case where some or all of the variables are non-
stationary, their orders of integration.
ȳ = α/α(1) + (β(1)/α(1)) z̄.   (3.26)

η_t = y_t − α/α(1) − (β(1)/α(1)) z_t.   (3.27)
But in what sense is (3.26) an equilibrium rather than just a long-run solution that may or may not be relevant? It won't be relevant if, for example, the variables do not tend to steady-state values, that is z̄ or ȳ don't exist. This depends on the properties of the error sequence, η_t. Writing (3.27) as

y_t = α/α(1) + (β(1)/α(1)) z_t + η_t,
being I(0). If η_t is I(1), then the idea that (3.26) represents an equilibrium is
entirely unhelpful. Granger (1991) and Engle and Granger (1991) compare the
properties of I(1) and I(0) variables. An I(0) series has a mean and there is a
tendency for the series to return to this value frequently with deviations that
are large on a relatively small number of occasions. An I(1) process will
wander widely, only rarely return to an earlier value and its autocorrelations
will remain close to one even at long lags. Theoretically, the expected time it
takes for a random walk to return to some fixed value is infinite.8 Clearly then,
it makes no sense whatsoever for η_t to be I(1), but it does seem reasonable to
require it to be I(0). There is the issue of its mean value as well. Clearly this
should be zero, although this does not affect the time series properties of the
variable, meaning its stationarity, variance, and autocorrelation structure.9
So the working definition of a static equilibrium will be as follows.
y = β_0 + β_1 z   (3.28)
More generally, there may be an arbitrary number of variables and the func-
tion need not be linear. Engle and Granger (1991) use the term attractors to
describe relationships such as (3.28) when (3.29) holds.
where ε_{z,t} and ε_{y,t} are two white noise processes. These relationships can be used to obtain the ARMA representations for any linear combination of z_t and y_t. Consider

w_t = z_t + y_t.   (3.31)

Equations in (3.30) indicate that it will be necessary to work with lag operators applied to both processes. To remove z_t from (3.31), multiply through by φ_z(L), to give
φ(L) = φ_y(L)φ_z(L),   (3.34)

which is a polynomial lag operator of order p = p_z + p_y. The last line of (3.33) is the sum of two MA processes, φ_y(L)θ_z(L)ε_{z,t} and φ_z(L)θ_y(L)ε_{y,t}, of orders p_y + q_z and p_z + q_y respectively. As long as the white noise processes ε_{z,t} and ε_{y,t} are only contemporaneously correlated (i.e. E(ε_{z,t−i} ε_{y,t−j}) = 0 if i ≠ j), then the autocorrelations of the sum of these processes will extend only as far as the larger of the two individual orders. That is, the sum will be an MA process whose order is the larger of p_y + q_z and p_z + q_y. The variance of the new white noise driving sequence and the MA coefficients will depend on the variance–covariance matrix of ε_{y,t} and ε_{z,t} and the coefficient values of the original operators, φ_z(L), θ_z(L), φ_y(L) and θ_y(L).10 Thus, the final line of (3.33) may be written
φ_y(L)θ_z(L)ε_{z,t} + φ_z(L)θ_y(L)ε_{y,t} = θ(L)ε_t,   (3.35)

where θ(L) is a lag polynomial of order q = max(p_y + q_z, p_z + q_y), and the variance of ε_t is chosen so that θ(0) = 1. Thus the time series model for w_t is ARMA(p, q), where

p = p_z + p_y,
q = max(p_y + q_z, p_z + q_y),

and φ(L)w_t = θ(L)ε_t.
From (3.34), the roots of φ(L) will be those of φ_z(L) and φ_y(L). Consider three important cases.
(i) If all the roots of these lie outside the unit circle, then all the roots of φ(L) lie outside the unit circle and so w_t is stationary. This means if z_t and y_t are stationary so is their sum, w_t.
(ii) Suppose z_t is I(1) and y_t is I(0). Then φ_z(L) has one unit root, all the others lying outside the unit circle, and all the roots of φ_y(L) lie outside the unit circle. Since φ(L) = φ_y(L)φ_z(L), it follows that φ(L) has one unit root, all others outside the unit circle. Thus, w_t is I(1). This means that the sum of an I(0) and an I(1) process is I(1).
(iii) Suppose that both zt and yt are I(1). This case is a little more complic-
ated and it is necessary to go back to the working used when obtain-
ing the ARMA structure for the sum. Consider equation (3.33), using
the last equality on the right-hand side,
Since z_t and y_t are I(1), the AR operators from their separate ARMA representations may be written in terms of a new AR operator consisting of all and only the stationary roots, and the differencing operator. So:

φ_z(L) = φ̃_z(L)∆,
φ_y(L) = φ̃_y(L)∆,

where φ̃_z(L) and φ̃_y(L) are of orders p_z − 1 and p_y − 1 respectively. Thus (3.33) may be written

φ̃_y(L)φ̃_z(L)∆²w_t = ∆(φ̃_y(L)θ_z(L)ε_{z,t} + φ̃_z(L)θ_y(L)ε_{y,t}).

The common factor of ∆ can now be cancelled on each side of the equation11 to give

φ̃_y(L)φ̃_z(L)∆w_t = φ̃_y(L)θ_z(L)ε_{z,t} + φ̃_z(L)θ_y(L)ε_{y,t}.

Thus w_t has one (not two) unit roots and ∆w_t has an ARMA(p*, q*) structure where

p* = p_z + p_y − 2,
q* = max(p_y + q_z, p_z + q_y) − 1.
and define

z̃_t = z_t + μ.   (3.38)

In the stationary case, E(z̃_t) = μ, Var(z̃_t) = Var(z_t), and the covariances are given by γ_{z̃}(j) = E((z̃_t − μ)(z̃_{t−j} − μ)) = E(z_t z_{t−j}), so are the same as those of the original process, z_t, and since the variance is also unchanged, so are the autocorrelations. Thus although a constant has to be added to the model, it is otherwise unchanged.
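This invariance is easy to confirm empirically; a sketch with an MA(1) series (the example and names are ours):

```python
import numpy as np

# Adding a constant mu to a series leaves its sample autocovariances
# (and hence autocorrelations) unchanged, since they are computed
# around the mean.

def autocov(x, j):
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return (xc[:len(x) - j] * xc[j:]).mean()

rng = np.random.default_rng(3)
e = rng.standard_normal(500)
z = e[1:] + 0.6 * e[:-1]        # an MA(1) series
z_tilde = 5.0 + z               # shifted by mu = 5

for j in range(4):
    assert abs(autocov(z, j) - autocov(z_tilde, j)) < 1e-10
```

Only the intercept of the fitted model changes; the AR and MA operators are untouched.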
When z_t is I(1), it is the case that φ(1) = 0 since φ(L) has a unit root. Equation (3.39) can therefore be written as:

φ(L)z̃_t = θ(L)ε_t.

Since ε_t is zero mean white noise, so is any scalar multiple of it. The structure is therefore unchanged as no new autocorrelation has been induced. To summarize, if z_t is ARMA(p, q) then so is z̃_t = z_t + μ, and furthermore this process has the same autocorrelation structure, so that its AR and MA operators are unchanged. In particular, if z_t is I(d), then so is any linear transformation. Mathematically,

φ(L)z_t = θ(L)ε_t   (3.41)
⇒ φ(L)z̃_t = μ* + θ(L)ε_t,

where

z̃_t = z_t + μ,
μ* = φ(1)μ.
it follows from the results for the sum of two ARIMA processes that the combination is ARIMA(p*, d*, q*) where

p* = p_z + p_y,
q* = max(p_y + q_z, p_z + q_y),   (3.43)
d* = max(d_z, d_y).

Equations (3.43) are easily generalized to show that any linear combination of an ARIMA(p_z, d_z, q_z) process and an ARIMA(p_y, d_y, q_y) process is ARIMA(p*, d*, q*), with p*, d*, q* as defined by equations (3.43). In particular:
∆z_t = ε_{z,t},

where ε_{z,t} is white noise. Let ε_{y,t} be another white noise process, uncorrelated with ε_{z,t}, and define

y_t = z_t + ε_{y,t}.   (3.44)

φ_z(L)∆^{d_z} z_t = θ_z(L)ε_{z,t},
φ_y(L)∆^{d_y} y_t = θ_y(L)ε_{y,t},
φ*(L) = (1 − L)φ̃*(L),
θ*(L) = (1 − L)θ̃*(L),   (3.46)

the polynomials φ̃*(L) and θ̃*(L) being of orders p* − 1 and q* − 1 respectively. Substituting (3.46) into (3.45) and cancelling the common factor of (1 − L) gives a representation whose orders satisfy

p̃ ≤ p_z + p_y,
q̃ ≤ max(p_y + q_z, p_z + q_y),
d̃ ≤ max(d_z, d_y).
3.3.3.2 Example
Figure 3.1 shows a time series plot of two series generated artificially according
to equations
∆z_t = ε_{z,t},   (3.48a)
y_t = 1 + z_t + ε_{y,t},   (3.48b)

where ε_{z,t} and ε_{y,t} are two independent NIID(0,1) series. Both z_t and y_t are I(1)
and cointegrated by construction. Figure 3.2 shows the same data using a
scatter plot.
The time series plots indicate the non-stationary nature of both series, and
that, in this case, they track one another very closely. The latter property is
not necessary for two series to be cointegrated. It is quite possible that an
increasing gap may open up between them. This depends on exactly what the
cointegrating combination is. The scatter plot strongly emphasizes the linear
nature of the underlying long run, and in this case equilibrium relationship,
which is, from (3.48b), ȳ = 1 + z̄.
It is also important to realize that the cointegrating property depends on the selection of the correct linear combination. Using equation (3.48b), the linear combination generating cointegration is η_{1,t} = y_t − z_t. This is stationary by construction. But suppose instead the combination η_{2,t} = y_t − (1/2)z_t is considered. Subtracting (1/2)z_t from both sides of (3.48b) gives

y_t − (1/2)z_t = 1 + (1/2)z_t + ε_{y,t}.   (3.49)

But (1/2)z_t is I(1) and ε_{y,t} is I(0), so from (3.49), η_{2,t} is I(1), and so non-stationary. This illustrates a key point: to obtain cointegration where it exists, the correct linear combination must be used.
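A simulation of (3.48a) and (3.48b) makes the point concrete (the seed and sample size are our choices):

```python
import numpy as np

# Only y_t - z_t is stationary; y_t - 0.5*z_t retains half a random walk.
rng = np.random.default_rng(4)
T = 4000
z = np.cumsum(rng.standard_normal(T))          # (3.48a): z_t is a random walk
y = 1.0 + z + rng.standard_normal(T)           # (3.48b)

eta1 = y - z                                   # correct combination: stationary
eta2 = y - 0.5 * z                             # wrong combination: still I(1)

assert eta1.var() < 2.0                        # bounded around its mean of 1
assert eta2.var() > 20 * eta1.var()            # dominated by 0.5 * random walk
```

The first combination fluctuates tightly around 1, while the second wanders with the surviving random-walk component.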
Figure 3.3 shows clearly that this combination is not stationary and so not a
cointegrating combination. To illustrate the case where a cointegrating combi-
nation still results in a gap opening up between series represented on a time
series plot, suppose instead of (3.48b), a series y*t is generated according to
y*_t = 1 + (1/2)z_t + ε_{y,t}.   (3.50)
Figure 3.4 is a time series plot of zt and y*t. It would be wrong to conclude
from this that just because the series are diverging that they are not cointe-
grated. It is simply that the difference between the two is not the cointegrat-
ing combination.
Figure 3.4 Time series plot of y*t and zt generated according to (3.48a) and (3.50)
ȳ = α/(1 − α_1) + ((β_0 + β_1)/(1 − α_1)) z̄   (3.52a)

η_{t−1} = y_{t−1} − α/(1 − α_1) − ((β_0 + β_1)/(1 − α_1)) z_{t−1}   (3.52b)
All terms on the right-hand side of (3.53) are I(0): ∆y_t and ∆z_t because y_t and z_t are I(1), u_t by assumption, and, as long as 1/(1 − α_1) is well defined, η_{t−1} also.
3.3.4.2 Example
Consider the ADL(1, 1) model
y_t = 1 + (1/2)y_{t−1} + (3/4)z_t − (3/8)z_{t−1} + u_t   (3.54a)
∆z_t = ε_t   (3.54b)
where ut and t are uncorrelated white noise processes. First of all, zt is I(1)
since it is a random walk, and so is yt since equations (3.54a) and (3.54b)
imply
∆(1 − (1/2)L)y_t = (3/4)(1 − (1/2)L)ε_t + ∆u_t.   (3.55)
The right-hand side of (3.55) can be written as an MA(1) process, the left-hand
side shows that the AR operator has a single unit root, so the process is I(1),
and more fully is ARIMA(1,1,1). Equation (3.54a) has long-run solution
ȳ = 2 + (3/4)z̄,   (3.56a)

with equilibrium error

η_t = y_t − 2 − (3/4)z_t,   (3.56b)

and ECM

∆y_t = (3/4)∆z_t − (1/2)(y_{t−1} − 2 − (3/4)z_{t−1}) + u_t.
Figure 3.5 presents time series plots of z_t and y_t, and Figure 3.6 plots the equilibrium error η_t as given by equation (3.56b). The disturbance processes, u_t and ε_t, are both NIID(0, 1).
The equilibrium errors in Figure 3.6 are stationary, but do not appear to be
white noise. There are runs where the values remain continuously positive for
a period of time, and others where negativity persists. This is consistent with
autocorrelation. The time series properties of the process can be obtained from
(3.56b) as follows.
First write the ADL of equation (3.54a) as

α(L)y_t = 1 + β(L)z_t + u_t,

where

α(L) = 1 − (1/2)L,   (3.57a)
β(L) = (3/4) − (3/8)L.   (3.57b)
Then

η_t = y_t − 2 − (3/4)z_t
⇒ α(L)η_t = α(L)y_t − α(1)·2 − (3/4)α(L)z_t
⇒ α(L)η_t = 1 + β(L)z_t + u_t − α(1)·2 − (3/4)α(L)z_t.   (3.58)

Note that α(1) = 1/2 and hence α(1)·2 = 1, which, on substitution into (3.58) and rearrangement, gives

α(L)η_t = β(L)z_t − (3/4)α(L)z_t + u_t.

Using (3.57a) and (3.57b),

β(L) − (3/4)α(L) = (3/4) − (3/8)L − (3/4)(1 − (1/2)L) = 0.

Hence α(L)η_t = u_t, or

(1 − (1/2)L)η_t = u_t.
Thus η_t is a stationary AR(1) process. Clearly the root of α(L) determines the persistence of the equilibrium errors. The closer it is to one, the more persistent they will be. This also determines the speed of adjustment, which is α(1).
As the root tends to 1 this speed of adjustment will tend to 0. In the limit, the
long-run solution does not exist, and therefore neither does an equilibrium
relationship or cointegration.13
It is straightforward to construct an alternative ADL(1, 1) model to (3.57a)
and (3.57b) that has the same long-run solution, but much more persistent
equilibrium errors and slower adjustment to equilibrium. To get the increased
persistence and slower adjustment to equilibrium, replace (3.57a) by
α(L) = 1 − 0.95L.   (3.59a)

This has α(1) = 0.05 instead of 0.5. In order to obtain the same long-run solution, the intercept, which was originally 1, must be multiplied by 0.1. Although there are a number of ways to obtain the latter result, the easiest is to multiply the original β operator given by (3.57b) by 0.1 to give

β(L) = 0.075 − 0.0375L,   (3.59b)

with the new value of the intercept being 0.1. Thus the DGP is now α(L)y_t = 0.1 + β(L)z_t + u_t,
with the polynomial lag operators defined by equations (3.59a) and (3.59b),
while the DGP for zt is still the random walk (3.54b). Figure 3.7 shows both
the original errors (etaold) and the new much more persistent ones (etanew);
note the scale of this and the earlier plot are different.14 In fact, what seems to
have happened is that the broad pattern of fluctuations has remained the
same, but their amplitude has become much larger. For example, there are
occasions where the low persistence series (etaold) is positive, but an indi-
vidual shock (i.e. disturbance term) is sufficient to drive the sequence across
the zero line so that the neighbouring value is negative. However, with
increased persistence (etanew), the same shock is insufficient to drive the
series into negativity because it is a lot further away from zero.
This result can be shown algebraically. For the models under consideration, the structure of the equilibrium errors is:

η_t = α_1^t η_0 + Σ_{i=1}^{t−1} α_1^{t−i} u_i + u_t.

So the current equilibrium errors can be decomposed into two parts: that due to previous shocks, say

U_{t−1} = Σ_{i=1}^{t−1} α_1^{t−i} u_i,

and that due to the current shock, u_t. In the case where u_t is white noise with variance σ_u², the variance of the component due to previous shocks is

Var(U_{t−1}) = Σ_{i=1}^{t−1} α_1^{2(t−i)} σ_u² = (α_1²(1 − α_1^{2(t−1)})/(1 − α_1²)) σ_u².

The ratio of the variance of the current shock to that of the component due to past shocks is therefore

r(α_1) = (1 − α_1²)/(α_1²(1 − α_1^{2(t−1)})).
vector α′ = (α_1 … α_n) is called the cointegrating vector, and α′x_t = Σ_{i=1}^{n} α_i x_{i,t} is called the cointegrating combination of the variables.
The most important special case of this is where d = b so that the linear
combination is stationary. The ADL case discussed above is of this type with
d = 1, so the variables in this case are CI(1, 1).
A useful thing to realize at this point is that a regression equation with dis-
turbances can be written as a linear combination like (3.60). Suppose
x_{1,t} = β_2 x_{2,t} + β_3 x_{3,t} + … + β_n x_{n,t} + u_t,

so that

u_t = x_{1,t} − β_2 x_{2,t} − β_3 x_{3,t} − … − β_n x_{n,t},
β̂ = (Σ_{t=1}^{T} z_t²)^{−1} Σ_{t=1}^{T} z_t y_t = β + (Σ_{t=1}^{T} z_t²)^{−1} Σ_{t=1}^{T} z_t η_t.
for neglected structure in the disturbances of the test regression (Kremers et al.
1992).
To illustrate, consider the bivariate ECM of equation (3.8), but without an
intercept for simplicity,
y_t = α_1 y_{t−1} + β_0 z_t + β_1 z_{t−1} + u_t,
original ECM, namely ψ_1 = ψ_3, so that there is no requirement for the extra difference term, or, more accurately, ignores the fact that there will be a correlation between the disturbances and the regressors of the standard ADF regression, since both will include a component of z_{t−1}.19 The test statistic is the usual ADF t-ratio on η̂_{t−1} in (3.63b). The equilibrium error, η_t, should be calculated using a consistent estimator of the cointegrating coefficients. These could come from the static regression or from the long-run solution to the dynamic model.
Equation (3.63b) is modified for more complex dynamics and additional
variables by adding differences of all explanatory variables and lagged differ-
ences of all variables (including lags of yt).
Finite sample critical values have to be simulated. Illustrative values may be
found in Patterson (2000, table 8.11). Inder (1993) has found that such tests
display more power than the usual ADF residual based tests, and have addi-
tional desirable properties. They are more robust because, when ψ2 ≠ 0, the finite sample performance of the usual tests is distorted by the exclusion of extra dynamic terms such as Δzt.
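The mechanics of the test can be illustrated with a small simulation. The sketch below is purely illustrative: the data generating process, all parameter values, and the use of a known cointegrating vector (1, −1) are assumptions, not taken from the text.

```python
# Illustrative sketch of an ECM-based cointegration test: simulate a
# cointegrated pair, regress dy_t on dz_t and the lagged equilibrium
# error, and form the t-ratio on the equilibrium-error coefficient.
import numpy as np

rng = np.random.default_rng(0)
T = 500

z = np.cumsum(rng.normal(size=T))      # z_t is a random walk
y = np.empty(T)
y[0] = z[0]
for t in range(1, T):                  # y_t adjusts towards z_t
    y[t] = y[t-1] - 0.3 * (y[t-1] - z[t-1]) + 0.5 * (z[t] - z[t-1]) + rng.normal()

dy = np.diff(y)
dz = np.diff(z)
xi = (y - z)[:-1]                      # equilibrium error, known vector (1, -1)

# test regression: dy_t = psi1 * dz_t + psi2 * xi_{t-1} + u_t
X = np.column_stack([dz, xi])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
resid = dy - X @ coef
s2 = resid @ resid / (len(dy) - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
t_ecm = coef[1] / se[1]                # large negative values favour cointegration
print(round(float(t_ecm), 2))
```

With the adjustment coefficient set at −0.3 the t-ratio comes out strongly negative; in practice the statistic must be compared with simulated critical values, not with the normal distribution.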
(i) If there is more than one cointegrating vector, which is possible when
there are more than two integrated variables, then the single equation
approach is only likely to result in a linear combination of these.
(ii) Even if there is only one cointegrating relationship, all variables may
be responding to deviations from equilibrium. Estimating a single
equation only ignores this and leads to inefficiencies in the estimation.
This amounts to assuming that the right-hand side variables are weakly
exogenous, so their dynamic equations exclude the cointegrating
relationship.
3.5 Conclusion
Econometric analysis was traditionally concerned with models that were mainly static in nature, while the dynamic nature of
data was implicit in univariate time series analysis. In univariate time series
analysis data is differenced to induce stationarity, but this was not common
practice in economics until the 1980s. One of the first articles to amalgamate
a time series model with an econometric formulation with levels was the wage
equation article produced by Sargan (1964), the model unlike many wage
equations of the time considered the question of the dynamic specification of
a wage inflation equation in the context of a model that is estimated by
instrumental variables. The article is the first example of an error correction
model, which was both well ahead of its time and highly influential in terms
of the institutional modelling of UK wage equations. The ARMAX representation provides an early example of cointegration: the error correction term is significant, provided the asymptotic normality of the t-test on its coefficient is accepted (Kremers et al. 1992).
Granger and Newbold (1974) provided the first simulation experiments to
consider the impact of non-stationarity on the diagnostics associated with
nonsense regressions. The problem of nonsense correlation was well known to
time series statistics through the work of Yule (1926) and should have been
known to the econometrics literature because of the intervention of Keynes
(1939), who discussed the potential for misanalysis when regressions between
variables with intermediate causes were considered. Granger and Newbold
considered the special case for which the intermediate cause was an indepen-
dent stochastic trend. While time series analysts started to consider tests for
non-stationarity (Dickey and Fuller 1979), econometricians in the UK started
to implement models which exhibited error correction behaviour. Davidson et
al. (1978) introduced the notion that the error correction term explained the
long-run behaviour of economic series and these dynamic models are again
cointegrating relationships in the sense of Kremers et al. (1992) as they
include combinations of stationary variables. In one case the lagged series renders the variable stationary; for the error correction term, the contemporaneous observation on a different time series does the same thing.
Granger (1983) introduced the term cointegration to the literature, while
Sargan and Bhargava (1983) provided the first recognized test of existence of
long-run behaviour. It was Granger, via his decomposition of the Wold representation of what are quasi over-differenced series, who explained how dependent moving average behaviour might yield long-run relations among variables that are stationary. Engle and Granger’s (1987) article provided a means by
which bivariate relationships might be given error correction representations,
though more generally this proposition does not follow from the results devel-
oped in the article. The two-step method developed by Engle and Granger
shows that the long-run parameters in the case where there are two variables
68 Modelling Non-Stationary Time Series
4.1 Introduction
This chapter considers the case where there are a number of non-stationary
series driven by common processes. It was shown in the previous chapter that
the underlying behaviour of time series may arise from a range of different
time series processes. Time series models separate into autoregressive processes
that have long-term dependence on past values and moving average processes
that are dynamic but limited in terms of the way they project back in time. In
the previous chapter the issue of non-stationarity was addressed in a way that
was predominantly autoregressive: stationarity was tested via the comparison of a difference stationary process under the null with a stationary autoregressive process of higher order under the alternative. The technique is
extended to consider the extent to which the behaviour of the discrepancy
between two series is stationary or not. In the context of single equations, a
Dickey–Fuller test can be used to determine whether such series are related;
when they are this is called cointegration. When it comes to analyzing more
than one series then the nature of the time series process driving the data
becomes more complicated and the number of combinations of non-station-
ary series that are feasible increases.
Here we consider in detail a number of alternative mathematical models
that have the property of cointegration. Initially we discuss representations
that derive from the multivariate Wold form. This is the approach first consid-
ered by Granger (1983) and Granger and Weiss (1983), in which the Granger
representation theorem is developed. From this theorem there are a number
of mathematical decompositions, which transform moving average models
into vector autoregressive models with multivariate error correction terms.
From the perspective of the probability model from which the Wold form
derives the VMA representation provides a more elegant explanation of non-
stationary time series. First, the conditions associated with cointegration in
the VMA are more succinct and, secondly, implicit in the fundamental condition for cointegration in the VMA is the explicit conclusion that under cointegration the long-run levels relationships are stationary. An alternative
mechanism of decomposing the VAR into an error correction form derives
from Engle and Granger (1987), but beyond the single equation case inference
about non-stationary processes, estimation of the long-run parameters and
testing hypotheses about the long run all derive from the maximum likeli-
hood theory developed by Johansen. When it comes to constructing dynamic
models, then the approach developed by Johansen appears to provide a bridge
between two main strands of econometric time series modelling: first, the VAR
reduced form approach derived from the rational expectations literature by
Sims (1980); and second, the error correction approach that has developed from the
work of Sargan (1964), Davidson et al. (1978) and Ericsson, Hendry and Mizon
(1998). The cointegration/error correction approach emphasizes the descrip-
tion of detectable economic phenomena in the long run. The cointegration
approach assumes that short-run processes are not well defined by virtue of
aggregation across agents, goods and time, differing forms of expectations,
learning, habits and dynamic adjustment. Alternatively, the long run provides
a useful summary of the non-detectable short-run dynamics, while the error
correction approach in the confines of the VAR permits short-run policy
analysis via the impulse response function and the ability to analyze both
short-run and long-run causality and exogeneity. If the VAR defines a valid
reduced form, then it allows the detection of the readily available structure.
More conventional econometric approaches (Pesaran et al. 2000) criticize
the Johansen methodology for being ad hoc in the sense that it doesn’t use
as its starting point an econometric system of the type defined by the
Cowles foundation, but, as is discussed in the context of RE models in
chapter 6, it is still possible to introduce short-run restrictions in the
confines of a VAR-style cointegration approach. The VMA approach appears
to be less interested in the distinction between the long run and the short
run, as to whether money causes inflation as compared with money leading
to price rises, but still permits impulse response and short-run causality
analysis. However, in the context of pure MA models, inference and detec-
tion of long-run behaviour is less well developed. Impulse response analysis
emphasizes the responsiveness of variables to, and the effectiveness of, policy.
The use of the VAR and VMA for short-run analysis is discussed in detail by
Lippi and Reichlin (1996).
Here, we define cointegration in terms of the Wold decomposition, then we
consider the Johansen approach to testing and estimation, some empirical
results are derived from the literature and discussed in the context of an
increasing body of evidence based on Monte Carlo simulation. Alternative
Multivariate Approach to Cointegration 71
representations are discussed along with the extension of the methods applied
to multi-cointegration and polynomial cointegration.
xt = [ .5   .25 ] xt−1 + εt
     [ 1    .5  ]
Just as with scalar autoregressive models, the VAR(p) may be reparameterized
into differences and a single lagged levels term. Any pth order n × n matrix
polynomial of the form

A(L) = I_n − Σ_{i=1}^{p} A_i L^i

may be written in the form

A(L) = −ΠL + A*(L)(1 − L),

where A*(L) = I_n − Σ_{i=1}^{p−1} A_i* L^i if p > 1 and A*(L) = I_n if p = 1, with

A_i* = −Σ_{j=i+1}^{p} A_j,  i = 1, 2, …, p − 1,  and  Π = −A(1).
xt = μ + εt + Σ_{i=1}^{∞} Θi L^i εt (4.5)
4.2.3.1 Cointegration starting from a VMA and deriving VAR and VECM forms
This was the first approach to explaining how to characterize cointegration in
the context of a multiple time series model. It is in many ways the most
natural for two reasons. First, it builds on an established representation
theorem, the multivariate version of the Wold representation. Secondly, it
naturally restricts cases under examination to whatever orders of integration
are the subject of investigation. Suppose xt is a n × 1 vector of time series each
element of which is I(1). Then the first difference of the vector, Δxt, is I(0). As
such it has a Wold representation,

Δxt = C(L)εt. (4.6)
The task is to determine how this relationship can give rise to cointegration.
This follows by application of the reparameterization to C(L), then C(L) =
C(1)L + C*(L)(1 – L), for some C*(L) of order one less than C(L). Substituting
this into (4.6) gives
Δxt = (C(1)L + C*(L)(1 − L))εt. (4.7)
α′C(1) = 0 (4.12)

since otherwise the right-hand side of (4.11) would be the sum of the I(1) process α′C(1) Σ_{i=0}^{t−1} εi and the I(0) process (α′x0 + α′C*(L)εt), and hence α′xt ~ I(1). But
A moving average process such as (4.7) with singular C (1) may be called a
reduced rank moving average process. Next, the link between a reduced rank
moving average and a VECM needs to be established. This is done by first
establishing that xt has a vector autoregressive moving average (VARMA) repres-
entation. A VARMA process is a VAR with VMA disturbances, so may be
written
A(L)xt = B(L)εt (4.13)

where A(L) = I − Σ_{i=1}^{p} A_i L^i and B(L) = I − Σ_{j=1}^{q} B_j L^j.
Δxt = C(L)εt
must be inverted. However, since xt is CI(1, 1), it follows that C (1) is singular.
That is, C (L) has unit roots preventing its inversion (see Appendix A.3). In
addition, a representation of xt rather than Δxt is required. The problem is
overcome by factoring out the unit root components from C (L), although
scalar factors are not available. Even so, the approach still very neatly allows
Pre-multiplying the Wold form above by H̃C(L) transforms the VMA into a
VARMA:

H̃C(L)Δxt = H̃C(L)C(L)εt = g̃C(L)I_n εt. (4.14)

This is a unique VARMA representation of Δxt, for the case where the order of
cointegration is (1, 1) and g̃C(L) is a scalar polynomial.
To further motivate this result consider the following example.
Let q = 1, n = 3 and
C(L) = [  1           (1/2)L        (1/2)L      ]
       [ −(1/2)L      1 − (5/4)L   −(1/4)L      ]
       [  (1/4)L      (1/8)L        1 − (7/8)L  ]

Then

C(1) = [  1      1/2    1/2  ]
       [ −1/2   −1/4   −1/4  ]
       [  1/4    1/8    1/8  ]

It is easy to see that C(1) is rank deficient, because the rows and columns of
this matrix are scalar multiples of each other. For example, using the notation C(1)i.
to denote the ith row of C(1),

C(1)1. = [1  1/2  1/2] = −2C(1)2. = −2[−1/2  −1/4  −1/4] = 4C(1)3. = 4[1/4  1/8  1/8].
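The rank deficiency is easy to confirm numerically; the following fragment is a simple machine check of the example:

```python
# Sketch: confirm the rank deficiency of C(1) in the example.
import numpy as np

C1 = np.array([[ 1.0,  1/2,  1/2],
               [-1/2, -1/4, -1/4],
               [ 1/4,  1/8,  1/8]])

print(np.linalg.matrix_rank(C1))        # 1
print(np.allclose(C1[0], -2 * C1[1]))   # True: row 1 = -2 x row 2
print(np.allclose(C1[0],  4 * C1[2]))   # True: row 1 =  4 x row 3
```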
By definition the rank of a matrix is the number of linearly independent
rows or columns, which in this case is 1. The decomposition requires pre-
multiplication of C(L) by the matrix H̃C(L), where the adjoint of C(L)
is given by Ca(L) = (1 − L)H̃C(L). Calculation of the adjoint follows from the
transpose of the usual matrix of minors (for further detail see Dhrymes 1984).
Therefore

Ca(L) = [ 1 − (17/8)L + (9/8)L²    −(1/2)L + (1/2)L²       −(1/2)L + (1/2)L²     ]
        [ (1/2)L − (1/2)L²          1 − (7/8)L − (1/8)L²    (1/4)L − (1/4)L²     ]
        [ −(1/4)L + (1/4)L²        −(1/8)L + (1/8)L²        1 − (5/4)L + (1/4)L² ]

      = (1 − L) [ 1 − (9/8)L    −(1/2)L       −(1/2)L     ]
                [ (1/2)L         1 + (1/8)L    (1/4)L     ]  = (1 − L)H̃C(L)
                [ −(1/4)L       −(1/8)L        1 − (1/4)L ]
This establishes the AR operator of the VARMA (4.14). To obtain the scalar
MA operator note that, from the results on reduced rank polynomials, |C(L)| =
(1 − L)g̃C(L). In this case

|C(L)| = 1 − (17/8)L + (5/4)L² − (1/8)L³ = (1 − L)²(1 − (1/8)L) = (1 − L)[(1 − L)(1 − (1/8)L)], (4.15)

and therefore g̃C(L) = (1 − L)(1 − (1/8)L). Hence the VARMA representation is:
[ 1 − (9/8)L    −(1/2)L       −(1/2)L     ]
[ (1/2)L         1 + (1/8)L    (1/4)L     ] Δxt = (1 − L)(1 − (1/8)L)εt.
[ −(1/4)L       −(1/8)L        1 − (1/4)L ]
It should be noticed that the MA component is not invertible. In general the
VMA does not directly transform into a VAR, as only in special cases does g̃C(L)
invert. This completes the numerical example.
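The example can also be machine-checked: multiplying H̃C(L) and C(L) coefficient by coefficient should give (1 − L)(1 − (1/8)L)I₃. A sketch:

```python
# Sketch: multiply H~_C(L) and C(L) coefficient by coefficient and check
# that the product is (1 - L)(1 - L/8) I_3 = (1 - (9/8)L + (1/8)L^2) I_3.
import numpy as np

I3 = np.eye(3)
C = [I3, np.array([[ 0.0,  1/2,  1/2],     # C(L) = I + C1 L
                   [-1/2, -5/4, -1/4],
                   [ 1/4,  1/8, -7/8]])]
H = [I3, np.array([[-9/8, -1/2, -1/2],     # H~_C(L) = I + H1 L
                   [ 1/2,  1/8,  1/4],
                   [-1/4, -1/8, -1/4]])]

prod = [np.zeros((3, 3)) for _ in range(len(H) + len(C) - 1)]
for i, Hi in enumerate(H):
    for j, Cj in enumerate(C):
        prod[i + j] += Hi @ Cj             # coefficient of L^{i+j}

expected = [I3, -9/8 * I3, 1/8 * I3]
print(all(np.allclose(p, e) for p, e in zip(prod, expected)))  # True
```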
An important reason for wanting to re-express a cointegrating VMA in differences as a VAR in levels is that the widely employed techniques of Johansen (1995a) assume a (finite order) VAR representation. The
VMA in differences is a very natural starting point since it employs Wold’s
fundamental representation of a stationary process. It also conveniently
allows the scalar processes to have a unit root (be I(1)) and be cointegrated.
Such properties are more difficult to impose starting from a VAR (Johansen
1995a).
From the Johansen point of view, the Engle–Granger approach to trans-
forming a VMA in first differences to a VARMA in levels is inconvenient in
that some moving average structure remains. The right-hand side of equation
(4.14) is a VMA with a scalar diagonal matrix lag operator. It is not a pure VAR
as defined in equation (4.2). The advantage is that it applies to any cointegrat-
ing (CI(1, 1)) VMA.
Engle and Yoo (1991) show that if the lag polynomial operator of the original
cointegrating VMA is rational (each element of the VMA operator is rational
and may have a different denominator polynomial), then there exists a VAR
representation where the right-hand side is white noise and the autoregressive
operator is rational. As with the Engle–Granger transformation, the unit root
moves from being explicit in the VMA to being implicit in the VAR, but now
there is no autocorrelation of the disturbances, and there is no restriction that
the denominator polynomials of the final VAR operator need all be the same.
The Engle–Yoo approach also has the advantage that it extends fairly readily
to other forms of cointegration.
The problem to be addressed is how to obtain a VAR form in levels from
a VMA form in differences. There are various ways of establishing the relation-
ship. In general such theorems have become known as (Granger) representa-
tion theorems, after a working of the problem in Engle and Granger (1987).
As in the univariate case there are a number of alternative time series repres-
entations. Each representation has different characteristics. Here the alternat-
ive forms are used to move between models where differencing eliminates
strong autoregressive behaviour, but due to dependence among economic
series some over-differencing remains in the form of moving average behav-
iour with unit roots. If this type of behaviour inverts to a model with auto-
regressive behaviour then there may be cointegration amongst the levels of
the non-differenced data. It is the movement from the MA to the AR which is
important.
The application of the Smith–McMillan (SM) form to cointegrated systems is
presented in Engle and Yoo (1991). A rational operator is not in general finite,
which is a problem for the Johansen methodology, although special cases
exist where the left-hand side reduces to a finite order VAR. (See section 4.7.2
for a discussion of a situation where a finite order pure VAR is available for the
first differences.) However, as the denominator polynomials in the Engle–Yoo
representation have all their roots outside the unit circle, the operator co-
efficients tend to zero as the lag length increases. This approach is described
below.
Before describing the approach in detail, it is useful to make some prelim-
inary points.
C(L) = G(L)⁻¹CS(L)H(L)⁻¹

where CS(L) is a diagonal finite order polynomial matrix and G(L) and H(L)
are invertible polynomial matrices having unit determinant (called unimodular
matrices; see Appendix A.2 for details), representing the elementary
and hence
G(L)Δxt = CS(L)H(L)⁻¹εt. (4.16)
For example, the operator

C(L) = [ 1 − (3/4)L    −L          ]
       [ −(1/8)L        1 − (1/2)L ]

can be written

C(L) = [ 1         −6          ]⁻¹ [ 1   0                     ] [ 1   −(2L − 6) ]⁻¹
       [ (1/8)L    1 − (3/4)L  ]   [ 0   1 − (5/4)L + (1/4)L² ] [ 0    1        ]  .
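The factorization can be verified numerically by evaluating the polynomial matrices at a few values of L; the sketch below also confirms that G(L) and H(L) are unimodular:

```python
# Sketch: evaluate the factorization C(L) = G(L)^{-1} C_S(L) H(L)^{-1}
# at a few values of L, and confirm G and H are unimodular (det = 1).
import numpy as np

def C(L):
    return np.array([[1 - 3*L/4, -L],
                     [-L/8, 1 - L/2]])

def G(L):
    return np.array([[1.0, -6.0],
                     [L/8, 1 - 3*L/4]])

def CS(L):
    return np.diag([1.0, 1 - 5*L/4 + L**2/4])

def H(L):
    return np.array([[1.0, -(2*L - 6)],
                     [0.0, 1.0]])

for L in (0.0, 0.3, 1.0, 2.0):
    assert np.allclose(C(L), np.linalg.inv(G(L)) @ CS(L) @ np.linalg.inv(H(L)))
    assert np.isclose(np.linalg.det(G(L)), 1.0)
    assert np.isclose(np.linalg.det(H(L)), 1.0)
print("decomposition checks out")
```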
The roots of C(L) and the Smith form, CS(L), are the same since G(L) and
H(L) are unimodular. Further, the diagonality of CS(L) allows any individual
roots to be factored out into another diagonal matrix. In particular, unit roots
may be factored out. In this example, since 1 − (5/4)L + (1/4)L² = (1 − L)(1 − (1/4)L),

CS(L) = C̃S(L)D(L)

where

C̃S(L) = [ 1   0            ]    and    D(L) = [ 1   0     ]
         [ 0   1 − (1/4)L   ]                  [ 0   1 − L ]

By construction, C̃S(L) has all roots outside the unit circle (see Appendix A.3),
and so can be inverted. So, equation (4.16) can be pre-multiplied by C̃S(L)⁻¹ to give

C̃S(L)⁻¹G(L)Δxt = D(L)H(L)⁻¹εt.

Through D(L), the presence of a unit root is now much more apparent than
was the case in the original VMA expression.
Now define D*(L) = [ 1 − L   0 ]
                   [ 0       1 ]

so that

D*(L)D(L) = [ 1 − L   0 ] [ 1   0     ]  =  [ 1 − L   0     ]  = (1 − L)I₂.
            [ 0       1 ] [ 0   1 − L ]     [ 0       1 − L ]
Define

K(L) = H(L)D*(L)C̃S(L)⁻¹G(L) = (1/(1 − (1/4)L)) [ 1 − (1/2)L    L           ]
                                                [ (1/8)L        1 − (3/4)L  ]
so that
K(L)xt = Δεt.
But now consider the case where the original VMA described the differences of a process, that is xt = Δyt, so that, after rearrangement
K(L)Δyt = Δεt.
Then, apart from initial conditions, the differencing operator can be cancelled
to give
K(L)yt = εt.
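That K(L) annihilates C(L) up to a single difference, K(L)C(L) = (1 − L)I₂, can likewise be spot-checked numerically:

```python
# Sketch: spot-check that K(L) C(L) = (1 - L) I_2 for the example, where
# K(L) = (1 - L/4)^{-1} [[1 - L/2, L], [L/8, 1 - 3L/4]].
import numpy as np

def C(L):
    return np.array([[1 - 3*L/4, -L],
                     [-L/8, 1 - L/2]])

def K(L):
    return np.array([[1 - L/2, L],
                     [L/8, 1 - 3*L/4]]) / (1 - L/4)

for L in (0.0, 0.5, 1.0, 2.0):
    assert np.allclose(K(L) @ C(L), (1 - L) * np.eye(2))
print("K(L) C(L) = (1 - L) I verified at sample points")
```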
Δyt = C(L)εt.
where:
Since it is diagonal, CSM (L) can be factorized into the product of two diagonal
matrices, one of the divisor polynomials, gi (L), and one of the fi (L). That is
F(L) = F̃(L)D(L)

where F̃(L) = diag(f̃i(L)) and

D(L) = [ I_{n−r}   0′r                                                           ]
       [ 0r        diag((1 − L)^{d_{n−r+1}}, (1 − L)^{d_{n−r+2}}, …, (1 − L)^{d_n}) ]   (4.25)
As long as all the non-unit roots of C(L) lie outside the unit circle, it follows
that all the roots of F̃(L) lie outside the unit circle.
Assumption A3: The roots of C (L) are either equal to unity or lie outside the
unit circle.
Then F̃(L)⁻¹ exists. Writing the VMA as

Δxt = C(L)εt, (4.28)

pre-multiplying by F̃(L)⁻¹G(L)U(L) gives
This makes the presence of the unit roots explicit but is not in VAR form. In
order to take the problem further, specific cases must be considered.
C(L) = (1 − L)⁻¹C*(L) (4.32)

where C*(L) has typical element φ̃i,j(L)/λ̃i,j(L), with

φ̃i,j(L) = φi,j(L) if i = m, j = n, and φ̃i,j(L) = (1 − L)φi,j(L) otherwise,

and

λ̃i,j(L) = λ*i,j(L) if i = m, j = n, and λ̃i,j(L) = λi,j(L) otherwise.
D * ( L)D( L) = ∆I (4.33)
since then, pre-multiplying (4.31) by D*(L) gives (apart from initial values)
which is of the required VAR form. However, such a D* (L) will not be avail-
able for all D (L) of the form given in equation (4.25). To see what is required,
write
D̄(L) = diag((1 − L)^{d_{n−r+1}}, (1 − L)^{d_{n−r+2}}, …, (1 − L)^{d_n})   (4.36)
so that
D(L) = [ I_{n−r}   0′r  ]   (4.37)
       [ 0r        D̄(L) ]

where

D*1,1(L) = ΔI_{n−r},  D*1,2(L) = 0,  D*2,1(L) = 0,  D*2,2(L) = Ir, (4.39)

and, when the variables are CI(1, 1) so that each di = 1,

D(L) = [ I_{n−r}   0′r  ].  (4.40)
       [ 0r        ΔIr  ]
As a result D*(L)D(L) = ΔI and substituting into (4.35) gives the VAR in levels
corresponding to the VMA in differences when the variables are CI(1, 1).
This illustrates that if xt is CI(1, 1) with cointegrating rank r (Assumption
A2), then the system may be represented either as a VMA in xt or a VAR in xt,
providing the VMA is rational (Assumption A1).
The SMY form of the VMA operator is given by
where U (L) and V (L) are unimodular matrices corresponding to sets of ele-
mentary row and column operations respectively.
In summary, the SMY form consists of the factorization of all the unit
roots from the VMA operator (C(L)) in such a way (as D(L)) that, by pre-
multiplication by an appropriate matrix (D*(L)), a single differencing operator
(Δ) may be isolated on the MA side of the equation. This may then be can-
celled with the differencing operator on the AR side where the original VMA is
for a differenced process. This is the process represented in (4.34) leading to
the final representation of (4.35).
α′C(1) = 0. (4.43)
There are r such vectors that are linearly independent. The space of such
vectors is the null space (of the columns) of C (1). This can be compared with
the corresponding VAR representation. For convenience, put
C(1)A(1) = U(1)⁻¹G(1)⁻¹F̃(1)D(1)D*(1)F̃(1)⁻¹G(1)U(1) (4.46)

and

D(1) = [ I_{n−r}   0 ]    and    D*(1) = [ 0   0  ]
       [ 0         0 ]                   [ 0   Ir ]
In addition,
A(L)C(L) = C(L)A(L) = ΔIn. (4.48)

Pre-multiplying the VMA Δxt = C(L)εt by A(L) gives A(L)Δxt = A(L)C(L)εt = Δεt,
which, on cancelling the differencing operator, gives the VAR form. Pre-
multiplying again by C(L) reverses the transformation:

C(L)A(L)xt = C(L)εt
⇒ Δxt = C(L)εt,
regenerating the VMA. Broadly speaking, then, the problem that has been
solved to show that the VMA in differences can be expressed as a VAR in
levels is to find a matrix A (L) such that equation (4.48) holds. The solution is
(4.44).8
circle (because these are the poles of C(L)) and those of D*(L) (unit roots). Its
poles are the roots of F̃(L), and so are all outside the unit circle.
Now consider any other VAR in levels representation of a CI(1, 1) system,
say
Ã(L)xt = εt.

Then as long as Ã(L) satisfies assumptions A1–A4, there exists a matrix
C̃(L) such that Ã(L)C̃(L) = C̃(L)Ã(L) = ΔI and hence, pre-multiplying by
C̃(L), the VAR becomes

C̃(L)Ã(L)xt = C̃(L)εt ⇒ Δxt = C̃(L)εt,

which is a VMA representation. By arguments similar to those above, C̃(L) will
also satisfy the assumptions. It is therefore the case that, among the class of
models having operators obeying assumptions A1–A4, the VMA in differences
and VAR in levels are equivalent representations of a CI(1, 1) system, and that
this sub-class of models is closed.
The starting point is a VAR where the intercept has been set to zero for
simplicity. That is
A(L)xt = εt, (4.49)
where
A(L) = I + Σ_{i=1}^{p} A_i L^i.
It is also assumed that all the roots of A (L) are either outside the unit circle or
equal to unity. Thus while non-stationarity is allowed, this can only be due to
standard unit roots.9 This VAR may be written
xt + Σ_{i=1}^{p} A_i xt−i = εt (4.50)

or, in VECM form,

Δxt = Πxt−1 + Σ_{i=1}^{p−1} Γi Δxt−i + εt (4.51)

where Π = −(I + Σ_{i=1}^{p} A_i) = −A(1) and Γi = Σ_{j=i+1}^{p} A_j.
Rearranging the VECM,

Πxt−1 = Δxt − Σ_{i=1}^{p−1} Γi Δxt−i − εt,

which is I(0) since all terms on the right-hand side are I(0) when xt ~ I(1).
Then Π must be of reduced rank, since if this were not the case then its inverse
would exist and

xt−1 = Π⁻¹ (Δxt − Σ_{i=1}^{p−1} Γi Δxt−i − εt) ~ I(0),

which contradicts xt ~ I(1). The fact that Πxt−1 ~ I(0) then establishes cointegra-
tion as long as Π ≠ 0, the rows of Π being cointegrating vectors. If Π = 0 then it
is immediate from the VECM that the process is not cointegrated. Note that Π
is an n × n matrix, and let rank(Π) = r, where for cointegration r < n, so that Π is
of reduced rank. Then there exist n × r matrices α and β, both of maximum
rank r, such that

Π = αβ′ (4.52)
Furthermore, since each row of Π is a linear combination of the rows of β′, the
rows of β′ are cointegrating vectors. The rank of Π is known as the cointegrating
rank of the system. This establishes the following result.
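The reduced rank factorization can be illustrated with a small numerical example; the values of α and β below are purely illustrative:

```python
# Sketch: a rank-1 Pi factorized as alpha beta' for n = 3, r = 1.
import numpy as np

alpha = np.array([[-0.5], [0.2], [0.0]])   # adjustment coefficients, n x r
beta = np.array([[1.0], [-1.0], [0.5]])    # cointegrating vector, n x r
Pi = alpha @ beta.T

print(np.linalg.matrix_rank(Pi))                        # 1
# each row of Pi is a multiple of beta', itself a cointegrating vector
print(np.allclose(Pi[0], alpha[0, 0] * beta.ravel()))   # True
```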
Γ(L)Δxt = Πxt−1 + εt,

where

ϒ = Γ(1). (4.53)
(ii) For any full rank n × r (r ≤ n) matrix ϕ, define its orthogonal complement,
ϕ⊥, dimensioned n × (n − r) with rank n − r, such that

ϕ′ϕ⊥ = 0,

with ϕ⊥ = 0 if r = n and ϕ⊥ = I if r = 0.
There are explicit formulations of ϕ⊥, though sub-blocks of this matrix are
arbitrary. Also define

ϕ̄ = ϕ(ϕ′ϕ)⁻¹ (4.57)

with the projection matrix

Pϕ = ϕ(ϕ′ϕ)⁻¹ϕ′ = ϕϕ̄′ = ϕ̄ϕ′. (4.58)
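These objects are easy to compute; the sketch below uses the SVD as one of the possible explicit constructions of ϕ⊥ and checks the projection identities of (4.57)–(4.58):

```python
# Sketch: construct phi_perp via the SVD null space, then check
# phi' phi_perp = 0 and the projection identities.
import numpy as np

rng = np.random.default_rng(2)
n, r = 4, 2
phi = rng.normal(size=(n, r))

U, s, Vt = np.linalg.svd(phi)
phi_perp = U[:, r:]                        # n x (n - r), spans null space of phi'

phi_bar = phi @ np.linalg.inv(phi.T @ phi)
P_phi = phi @ phi_bar.T

print(np.allclose(phi.T @ phi_perp, 0))                        # True
print(np.allclose(P_phi, phi_bar @ phi.T))                     # True
print(np.allclose(P_phi @ phi, phi))                           # True
print(np.allclose(P_phi + phi_perp @ phi_perp.T, np.eye(n)))   # True
```

The last identity shows that the projections onto the column spaces of ϕ and ϕ⊥ together reproduce the identity matrix, which is what allows a process to be split into cointegrating and non-cointegrating directions.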
Johansen’s key (necessary and sufficient) condition on the VAR, such that
the processes are integrated of order 1 and cointegrated, is expressed in terms
of ϒ, α⊥ and β⊥. An outline of the derivation of this condition is provided
below.10 The result applies only to VARs the roots of which are either equal to
one or lie outside the unit circle.
The approach used is to split the differenced process, Δxt, into components
relating to the directions of (potential) cointegration, ξt (which occur in dif-
ferenced form), and non-cointegration, ut (in levels). The difference process is
then cumulated (summed from the first to the tth values) to give an equation
for the levels, xt. The cumulation results in: the sum of the ut, giving rise to a
stochastic trend (a unit root process if ut is stationary); the transformation of
the differences of ξt to its levels; and the appearance of an initial value vector
(analogous to a constant of integration). To keep the treatment simple, the
initial values are ignored (set to zero).11 Since, in detail, ξt is a set of linear
combinations of the components of xt, if both the ut and the ξt are I(0) then xt
is both I(1) (as a result of the stochastic trend involving ut) and cointegrated
(because then ξt is a linear combination of I(1) variables that is I(0)). So the
proof revolves around showing that ut and the ξt are I(0). The condition
results from the need for the stationarity of these processes. Having shown
this, it is fairly straight forward to show that cointegration of order (1,1)
implies the condition, and hence it is established that the condition is both
sufficient and necessary.
An outline of the statement and proof is provided here. The result is that a
necessary and sufficient condition for xt to be both I(1) and cointegrated (i.e.
CI(1, 1)) is that

α′⊥ϒβ⊥ is of full rank. (4.59)
The second term on the right-hand side of (4.60) can be rearranged in terms
of potentially cointegrating combinations of xt. Define

ξt = β′xt, (4.61)

these being the potentially cointegrating combinations. Also, arising from the
first term on the right-hand side of equation (4.60), define

ut = β′⊥Δxt. (4.62)

Then, from (4.60),

Δxt = β⊥(β′⊥β⊥)⁻¹ut + β(β′β)⁻¹Δξt. (4.63)
The process of interest is not Δxt but xt itself, obtained by summing the differ-
ence process up to the current period. When this is done, an initial value is
also generated. In addition, in order to reuse t as the index for the current
period, a different index has to be used on the process being summed. Thus,

Σ_{i=1}^{t} Δxi = xt − x0.
Thus it is sufficient to show that both ut and ξt are I(0). It is in the process of
obtaining this result that condition (4.59) arises.
Define

x̃t = (ξ′t  u′t)′.
If a VAR representation can be found for x̃t, all the roots of which lie
outside the unit circle, then x̃t is stationary.12 The required VAR is obtained by:

(ii) substituting using (4.61) and (4.62) to give equations in ut and ξt, though a
term in Δ²xt remains;
(iii) noting that the term in Δ²xt can be expressed in terms of the differences
of x̃t;
(iv) expressing the resultant equation in terms of x̃t only.
Ã(1) = [ −I   ᾱ′ϒβ⊥   ]   (4.68)
       [ 0    α′⊥ϒβ⊥  ]
It remains to establish that Ã(L) has all its roots outside the unit circle. This is
done in two stages. Firstly it is established that any non-stationarity is due to
unit roots (by showing that the roots of Ã(L) and A(L) are the same, except that
the number of unit roots may differ), and then showing that Ã(L) has no unit
roots. To show the relationship between the roots of Ã(L) and A(L), note that

|Ã(z)| = (1 − z)^{−(n−r)} |A(z)| |Q|. (4.69)

Further,

α′⊥ϒβ⊥ full rank ⇒ |α′⊥ϒβ⊥| ≠ 0 ⇔ |Ã(1)| ≠ 0,
Δxt = Πxt−1 + Σ_{i=1}^{p−1} Γi Δxt−i + εt,
that Δxt must be I(0), and hence that ξt = β′xt is I(0). Thus the VAR for
x̃t = (ξ′t u′t)′, still given by (4.66), must be stationary, so |Ã(1)| ≠ 0. But as
before,

Ã(1) = [ −I   ᾱ′ϒβ⊥   ]
       [ 0    α′⊥ϒβ⊥  ]

and so |Ã(1)| = ±|α′⊥ϒβ⊥|. Hence |α′⊥ϒβ⊥| ≠ 0, that is, α′⊥ϒβ⊥ is of full rank.
4.4.2.1 Discussion
This key condition is undoubtedly difficult to understand from an intuitive
point of view. However, practically speaking, its function is to guard against
the component processes being I(2). If it is assumed from the outset that the
processes are I(1), then the required condition on the VAR is simply that Π is
of reduced rank. The condition can be used to extend the analysis of cointe-
grated systems to cases where the processes can be I(2). Having established the
condition for I(1) and cointegration, since this is necessary and sufficient,
clearly α′⊥ϒβ⊥ must be of reduced rank in order for the processes to be of a
higher order of integration.
C̃(1) = Ã(1)⁻¹ = [ −I   ᾱ′ϒβ⊥   ]⁻¹  =  [ −I   ᾱ′ϒβ⊥(α′⊥ϒβ⊥)⁻¹ ]. (4.72)
                [ 0    α′⊥ϒβ⊥  ]        [ 0    (α′⊥ϒβ⊥)⁻¹      ]
Thus

ut = (0  I)x̃t = (0  I)C̃(L)(ᾱ  β⊥)′εt = (0  I)[C̃(1) + (1 − L)C̃*(L)](ᾱ  β⊥)′εt. (4.73)

Summing terms in (4.73) and setting initial values to zero for simplicity:

β̄⊥ Σ_{i=1}^{t} ui = β⊥(α′⊥ϒβ⊥)⁻¹α′⊥ Σ_{i=1}^{t} εi + C̃⁺(L)εt.
This can be substituted for the first term on the right-hand side of (4.65). The
remaining term, ξ̄t = β(β′β)⁻¹ξt, requires the expression of ξt in terms of εt. It
follows from (4.70), and the fact that ξt = (I  0)x̃t, that this term may be written

ξt = D(L)εt (4.74)

and

ξ̄t = D⁺(L)εt. (4.75)
This is further simplified by setting C̄ = β⊥(α′⊥ϒβ⊥)⁻¹α′⊥ and C̄(L) = C̃⁺(L) + D⁺(L). Hence

xt = C̄ Σ_{i=1}^{t} εi + C̄(L)εt.
Δxt = Πxt−1 − Σ_{i=1}^{p−1} Γi Δxt−i + εt
is the same as that from (4.51).18 From the point of view of maximum likeli-
hood estimation, this is equivalent to concentrating the likelihood function.
As long as a Gaussian likelihood is used, the maximum likelihood estimator of
Π is also unaffected, even under the restriction that the matrix is of reduced
rank, r, r < n. That is, the estimates of α and β in

Π = αβ′
and does not seem to address the issue of correlation maximization. However,
a close examination of the relationship between this and the complete
maximum likelihood problem reveals that in fact the problems yield the same
solutions (see Appendix D). Both the maximum likelihood (ML) and canon-
ical correlation problem deal with the sample covariance matrix of the resid-
ual vectors. Define the sample covariance matrices

Si,j = T⁻¹ Σ_{t=1}^{T} Ri,t R′j,t,   i, j = 0, 1,

where R0,t and R1,t denote the residual vectors referred to above.
(i) The eigenvalues of this problem are the squares of the canonical
correlations.
(ii) The corresponding eigenvectors are the potential cointegrating vectors,
β̂.
(iii) The maximized value of the log-likelihood function depends only on
the r largest eigenvalues and S0,0, where the term in S0,0 is additive and so
does not appear in expressions for the difference between maximized
log-likelihood functions for different r.
(iv) Estimates of α, called the adjustment coefficients, are available as a func-
tion of the estimates of β and S0,1.
when S1,1 is non-singular (Dhrymes 1984), and as such is the same as that for
the canonical correlation problem.
For each eigenvalue that satisfies (4.77), there is an equivalent eigenvector, vi,
that is a solution to the following homogeneous system of linear equations:19
or
It follows from the algebra of the problem that S0,1S1,1⁻¹ is an estimator of Π, so
an estimate of α can be obtained from that of β since

α̂ = S0,1β̂.
In addition:

λi = α′i S0,0−1 αi (4.79)

where αi is the ith column of α̂. This result follows only where the normalization β̂′S1,1β̂ = I is used. Equation (4.79) shows that a test of λi = 0 is equivalent
to a test of αi = 0, that is, that the ith column of α is zero. The restriction αi = 0
means that the ith potentially cointegrating combination does not appear in
the VECM, the reason being either that it is not a stationary combination, or
that it is not significantly linearly independent of the combinations associated
with the larger eigenvalues, λj, j < i.
The maximized log likelihood conditional on r, ignoring certain constants,
is given by

log L̃MAX = −(T/2)[log|S0,0| + Σ_{i=1}^{r} log(1 − λi)]

or, as a function of r,

log L̃MAX(r) = −(T/2)[log|S0,0| + Σ_{i=1}^{r} log(1 − λi)], r = 0, 1, …, n,
where the summation term does not appear if r = 0. The log likelihood
~ ~
log LMAX(r1) is a restricted version of log LMAX(r0) if r0 < r1. Thus, the likelihood
ratio statistic for comparing H0: r ≤ r0 with the alternative H1: r ≤ r1 is
[
LR(r0 , r1 ) = −2 log L˜MAX (r1 ) − log L˜MAX (r0 ) ]
since log L̃MAX(r*) is the log-likelihood for a model where H0: r ≤ r*.
Substituting from the expression for the maximized log likelihood in terms of
the eigenvalues, this can be written
Multivariate Approach to Cointegration 101
LR(r0, r1) = −T Σ_{i=r0+1}^{r1} log(1 − λi)
If used in a conventional way, the null hypothesis would be rejected for large
values of the test statistic, such a rejection being a statement that the eigenvalues λi, i = r0 + 1, …, r1 were jointly significantly different from zero. The
normal choices of r0 and r1 are:

(a) r0 = j − 1, r1 = n, j = 1, 2, …, n;
(b) r0 = j − 1, r1 = j, j = 1, 2, …, n.

In case (a), the test is of whether the eigenvalues λi, i = j, …, n are jointly zero.
These are the n − j + 1 smallest eigenvalues. In case (b), the test is of whether the
eigenvalue λj alone is zero.20 In performing the two tests, the information
exploited is different, and so the inferences may not always agree.
The test associated with (a) is known as the trace statistic, denoted λtrace
(j − 1). The null (H0) and alternative (H1) hypotheses are, for j = 1, 2, …, n:

H0: r ≤ j − 1
H1: r ≤ n.

LR(j − 1, n) = −T Σ_{i=j}^{n} log(1 − λi) = λtrace(j − 1)
The test related to (b) is known as the maximal eigenvalue statistic, denoted
λmax(j − 1), and has the hypotheses

H0: r ≤ j − 1
H1: r ≤ j.

Each test rejects the null hypothesis for large values of the test statistic,
which must be positive. Thus, using cv to stand for the critical value of the
test, and λ(j − 1) to represent the test statistic, the form of the test is:

reject H0 if λ(j − 1) > cv
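Both statistics are simple functions of the estimated eigenvalues. A minimal sketch, assuming a vector of eigenvalues is available (the values below are illustrative and not taken from the text):

```python
import numpy as np

# Eigenvalues lam_1 > ... > lam_n from the Johansen problem (illustrative)
lam = np.array([0.45, 0.25, 0.10, 0.03])
T = 100  # sample size

def trace_stat(lam, T, j):
    """lambda_trace(j-1) = -T * sum_{i=j}^{n} log(1 - lam_i)."""
    return -T * np.sum(np.log(1.0 - lam[j - 1:]))

def max_stat(lam, T, j):
    """lambda_max(j-1) = -T * log(1 - lam_j)."""
    return -T * np.log(1.0 - lam[j - 1])

# The trace statistic is the sum of the remaining max statistics
n = len(lam)
for j in range(1, n + 1):
    assert np.isclose(trace_stat(lam, T, j),
                      sum(max_stat(lam, T, k) for k in range(j, n + 1)))
```

The identity checked in the loop makes the relation between the two tests explicit: λtrace(j − 1) cumulates λmax(k − 1) over k = j, …, n.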
The critical values for the two tests are different in general (except when
j = n), come from non-standard null distributions and are dependent on the
sample size and the number of cointegrating vectors being tested for. The dis-
tribution theory leading to the critical values of the test is described in
Appendix D.22 Most computer packages that compute the test statistics also
compute critical values for the tests.
The interpretation of these tests should be considered carefully.
The trace statistic always has as its unrestricted case, that the cointegrating
rank is at most n. The restricted, or null, case is that the cointegrating rank is
at most j − 1. This is consistent with the statement of the hypotheses in terms of
the eigenvalues as
H0: λi = 0, i = j, …, n
H1: λi > 0 for at least one of i = j, …, n,
since in the alternative case at least one of the set of eigenvalues being tested
must be non-zero. So, it might be that only the largest remaining, the jth, is
non-zero, hence that the cointegrating rank is j, or at the other extreme, it
could be that all are, in which case the rank is n. Given that the cointegrating
rank cannot exceed n, the simplest way to represent the case under the alter-
native is r ≥ j.
The maximal eigenvalue test has the same restricted model, but the unrestricted model only considers a cointegrating rank one higher. Thus, the only
case explicitly considered under the alternative is a cointegrating rank
greater by one. In terms of the eigenvalues the hypotheses become

H0: λj = 0
H1: λj > 0.
From the hypotheses expressed in terms of eigenvalues it can be seen that the
trace test is a joint test of all eigenvalues smaller than λj−1, that is λj, λj+1, …, λn,
while the maximal eigenvalue test is of λj only. The hypotheses of the two
tests are summarized in Table 4.1.
Neither test establishes the cointegrating rank uniquely. To
determine the cointegrating rank it is necessary to focus down onto a particu-
lar value for r. This can be achieved by testing in sequence, moving in the
direction of increasing cointegrating rank. Notice that when using the trace
test, rejection of the null r ≤ s – 1 leads to the conclusion that r ≥ s. The next
[Table 4.1 summarizes the hypotheses: trace test, H0: λi = 0, i = j, j + 1, …, n against H1: λi > 0 for at least one i = j, j + 1, …, n; maximal eigenvalue test, H0: λj = 0 against H1: λj > 0, with λmax(j − 1) = −T log(1 − λj).]
[Table 4.2 sets out the sequence of trace tests; the final stage tests r ≤ n − 1 against r = n, concluding r = n on rejection and r = n − 1 otherwise. Note: the sequential interpretation assumes rejection of the previous null hypothesis.]
null in the sequence is r ≤ s, but since r ≤ s – 1 has already been rejected, this
reduces to r = s.23 The alternative is r ≥ s + 1. Rejection of the null again would
lead to a test of the null r ≤ s + 1 (in effect r = s + 1) against the alternative
r ≥ s + 2, and so on until the null is not rejected. This sequence and the inter-
pretation of rejection or non-rejection at each stage is described in Table 4.2.
The maximal eigenvalue test may be used in an analogous way, as described
in Table 4.3. Rejection or non-rejection of the null hypothesis should be
treated cautiously. Rejection of the null hypothesis does not imply that the
[Table 4.3 sets out the analogous sequence for the maximal eigenvalue test. Note: the sequential interpretation assumes rejection of the previous null hypothesis.]
(i) Use the max test only to check the cointegrating rank determined by the
trace procedure. Thus a confirmatory inference is achieved if the first non-
rejection of the trace sequence occurs at H0 : r ≤ j (interpreted sequentially
as r = j) versus H1 : r ≥ j + 1 and the non-sequential max test does not reject
at H0: r ≤ j versus H1: r = j + 1. In this way the more reliable test is used in
the sequence, and a test with greater power against the point alternative is
used to confirm the inference.
(ii) Rather than compute the statistics in sequence, it is possible to compute
p-values for all cases. The preferred alternative would be that of the test
with the highest p-value. The interpretation from the maximal eigenvalue
test is clear as this has a point alternative. That from the trace test is less
obvious since the alternative is of a compound form. However, the
natural interpretation is to select the lower bound since cases involving
only higher orders of cointegration are not preferred.
Hendry (1995) and others have argued that a general to specific approach is
to be preferred in model selection. The sequential testing procedure, however,
begins by testing the most restricted case: that all eigenvalues are zero. The
restrictions are then relaxed one eigenvalue at a time. This is a specific to
general approach. It is also specific to general in that the lower the rank, the
fewer coefficients are needed to parameterize the VECM.24 Nonetheless,
Johansen’s result establishes that the specific to general approach is a valid
method for determining the cointegrating rank.
appropriate number of times, while the long run is estimated in the usual way
from the residuals from equations in the lag of the original data. Otherwise, a
more general estimator is required. Currently there is an appropriate estimator
for the I(2) case, which will be covered in more detail in chapter 6. Here, we
consider an example which satisfies the property of balanced I(2) behaviour.
Either the data are all I(2) and the dynamic models are specified in their
second differences or, when the data are logarithmic, accelerations of all the
series analyzed are specified as being I(1), and then the usual Johansen
method is applied to I(1) series of which some may also be differenced. To
confirm the appropriateness of the balanced I(2) case, the test for I(2) by
Johansen (1992) is applied.
Finally, the current evidence on the performance of tests of cointegration is
discussed.
log L(·) = −(T/2) Σ_{i=1}^{n} log(1 − λi)

where λi is solved from the determinantal equation |λi S1,1 − S1,0(S0,0)−1S0,1| = 0,
Si,j = T−1 Σ_{t=1}^{T} Ri,tR′j,t, i, j = 0, 1. In the VAR(1) case:

R0,t = Δxt
R1,t = xt−1.
The latter equation is a VAR(1). Not only is this a VAR(1), but this equation
can be readily viewed as a multivariate generalization of the model estimated
by Dickey and Fuller, to test stationarity of a single series; the estimation of
this type of model is briefly considered in Engle and Granger (1987). For a
single equation, based on one or more regressors, Engle and Granger test coin-
tegration using regression residuals, while the Johansen estimator requires a
system of equations. The trace test is

λtrace(i) = −T Σ_{j=i}^{n} log(1 − λj), i = 1, …, n.27
that all the series were stationary, then both the Johansen test statistics, that
are essentially likelihood ratio tests, would follow a Chi-squared distribution.
However, as was discussed above, when the series are I(1), then the distribu-
tion is non-standard. It has been common practice to compare the test statis-
tics with their asymptotic critical values, which come from simulating a null
Table 4.4 Eigenvalues, Johansen test statistics for VAR due to Hunter (1992)
Note:
* Indicates significant at the 5% level for critical values. For tables of the Johansen trace test with
unrestricted intercept and T = 50 observations see Francis (1994).
distribution for the test that the series are multivariate random walks. The
tests are significant when the null hypothesis r = i is rejected, for both tests,
against the alternative that r > i. From the results presented in Table 4.4, both
tests (λmax(1) = 50.82 > 39.43 and λtrace(1) = 119.69 > 95.18) yield the same conclusion, that there is r = 1 cointegrating vector. The test is only significant in
the case where r = 1; otherwise none of the tests are significant. The test statis-
tics are asymptotic and much of the research that has looked at the impact of
testing would conclude that the performance of both tests in small samples is
poor. Based on the suggestion that the trace test is more reliable than the max
test and the fact that rejection of the proposition that there are two cointe-
grating vectors is very marginal (λtrace(2) = 68.86), Johansen and Juselius, whose
results are for a restricted version of this model, suggested r = 2. Some theoret-
ical and empirical evidence is presented in the next two sections as to why
there may be over-rejection.
Johansen and Juselius (1992) used the same data, but they assumed that the
oil price was strictly exogenous to the system, which means that it has no
influence on the long run. They estimate a five-variable VAR conditional on
changes in the oil price (this proposition is tested in the next chapter).
The results presented in Table 4.5 are based on the same model, except that it
is estimated on the data set extended to 1991q4. The results and conclusions
are not materially different from those of Johansen and Juselius (1992). As was
concluded before, Johansen and Juselius suggested that there were r = 2 coin-
tegrating vectors, even though the test statistics did not quite bear this out. In
what follows the analysis is based on the Johansen trace test. The extended
data set implies that Johansen and Juselius (1992) were correct to suggest that
Table 4.5 Eigenvalues and trace test statistics for Johansen and Juselius model
* Indicates significant at the 5% level for tabulated values of the test statistic with trend and one
exogenous variable. For similar values, see Pesaran et al. (2000).
there are r = 2 cointegrating vectors, because the trace test is significant for
the proposition that r exceeds zero and one. It will be discovered that the
VAR(2) model is not well formulated, but any opportunity to re-specify the
models associated with Hunter (1992a) and Johansen and Juselius (1992) is
limited by the number of observations. For further comparison with the
results in Johansen and Juselius (1992), eigenvectors are calculated for the case
in which r = 2.
The two vectors are normalized with respect to the first element, but the
normalization is arbitrary and no suggestion is made that these vectors have
any meaning. However, when compared with the results presented in
Johansen and Juselius (1992), the unrestricted eigenvectors suggest that the
following restriction (1 – 1 – 1) might be applied to both aggregate price series
and the exchange rate. The restriction implies that there is a long-run corres-
pondence between the terms of trade and the exchange rate (a condition for
Purchasing Power Parity or PPP). This conclusion is quite consistent with the
results in Johansen and Juselius (1992). This type of restriction is analyzed in
more depth in the next chapter where identification and exogeneity are dis-
cussed. It is of interest to note that neither Johansen and Juselius (1992) nor
Hunter (1992a) could force the first vector to be restricted to satisfy pure PPP;
that is to say the proposition that the real exchange rate is stationary was not
sustained by the data. And, unlike Juselius (1995), who considers similar
results for Denmark and Germany, the interest rates that appear in the model
[Unrestricted eigenvectors for r = 2, normalized on the first element:]

p1     1.00    1.00
−p2   −1.07   −1.6
e12   −1.03    2.9
i1    −3.34   −8.4
i2    −0.31   14.5
do not yield a PPP vector augmented by uncovered interest rate parity (UIRP).
However, as will be observed in the next chapter the second vector does
appear to suggest UIRP. Here, it is not possible to interpret the unrestricted
cointegrating vectors as they have not been appropriately identified. Three
matrices (Π, α and β) were calculated; they all have the same rank, which
implies that only part of Π can be used to identify both α and β. Without
restriction not all of the matrix pair (α, β) is identified. Alternatively, without
restriction both matrices can be transformed by an arbitrary r × r non-singular matrix (ξ). Therefore:

Π = αβ′ = αξξ−1β′.
Table 4.7 Key diagnostic tests for the VAR(2) model with strictly exogenous oil prices

p1    25.74**   7.33*
p2     8.53*    2.28
e12    6.92*    0.84
i1     3.42     2.57
i2    32.44**  15.43**
Table 4.8 Eigenvalues and trace test statistics for Juselius model applied to UK data
set than Johansen and Juselius (1992) and the decision to include this vector
is based on a statistic that is significant in conventional terms.
A valid analysis and interpretation of the results is left to the next chapter,
after identification is discussed. However, the first vector would appear to be
PPP augmented by a UK interest rate and the inflation rate, the second vector
suggests an interest parity condition, while the interpretation of the other
vectors is not clear.
Thus far, three alternative VAR models have been devised to explain the UK
effective exchange rate in association with a set of related variables. However,
the conclusions drawn from this exercise depend on the performance of all
the equations in the long-run and short-run models. Exclusion of variables in
the long run depends on tests of exogeneity, long-run exclusion and restric-
tions associated with economic hypotheses that are likely to identify.
Discussion of these issues is left for the next chapter; here the question of
specification would appear to be a key mechanism to discriminate between
models.
Comparison of the results in Table 4.7 with those in Table 4.10 suggests that
the transformed model is better behaved (none of the diagnostics are
significant at the 1% level). According to the system-wide diagnostic tests, the
VAR(2) which includes the dummy variables is well specified, as can be
observed from Table 4.10. If testing at the 1% level is considered acceptable,
then none of the tests of dynamic specification are significant, which implies
that each of the estimated equations is well specified. When tests are applied
Table 4.10 Single equation diagnostics for each equation in the VAR(2) model

p1 − p2   2.74    9.94*
∆p1       7.19*   2.53
e12       0.63    4.13
i1        1.25    6.28
i2        0.35    7.98
Figures 4.1–4.6 Recursive Chow tests for the 5 VAR equations and the VAR system
rank(α′⊥Γβ⊥) = n − r,

where

Γ = I + Σ_{i=1}^{p} Γ*i.

If rank(α′⊥Γβ⊥) < n − r then there are trends in the VAR that have not been
accounted for. Taking the extreme case where rank(α′⊥Γβ⊥) = 0, Π = 0 and

Δ²x+t + Σ_{i=1}^{p−1} Γ*i Δx+t−i = ε+t,
Table 4.11 Eigenvalues and trace test statistics for I(2) test
Note:
* Indicates significant at the 5% level.
(i) Specify a general model with s ≤ T/3n lags per equation.
Another problem that is likely to arise in this case relates to the existence of
what Caner and Kilian (2001) call hidden moving average behaviour. In
section 4.2, the question of inversion of the Wold representation was dis-
cussed. It was stated that the VECM only derives from the Granger representa-
tion theorem when the system is bivariate. A more general transformation
exists when the matrix polynomial from the Wold representation (C(L)) is
rational, but this proposition is still not testable from the VAR. An alternative
inversion is considered in the next section, but this only yields a finite order
VAR when C(L) is first order. One solution is to apply the Johansen procedure
to a Frisch–Waugh equation where the residuals are estimated using either a
VARMA(1,q) or shorter order VARMA (Hunter and Dislis 1996; and Lütkepohl
and Claessen 1993). Burke and Hunter (1998) have shown, via simulation of
models with quite simple moving average structure, that the size and size-
corrected power can be quite strongly affected by the existence of moving
average errors and that this does not disappear as the sample size increases.
However, Marinucci and Robinson (2001) show that the Johansen trace test
would appear to work quite well with samples of 100 observations, when com-
pared with fully modified estimators, though there is some evidence for small
systems that the Phillips modified estimator might perform better when the
sample size is less than 100 (Hubrich et al. 2001). If the system is bivariate and
one variable is weakly exogenous then the semi-parametric approach first
applied by Robinson and Marinucci (1998) to fractionally integrated series
appears to work well (Marinucci and Robinson 2001).
The number of observations likely to yield reasonable inference depends on
the nature and complexity of the problem to be analyzed and the order of
integration of the series. The advantage of the Johansen approach is that it
still provides an inferential procedure, which permits the long run to be estim-
ated, long-run systems to be identified, and causal structure and endogeneity
to be tested. None of the other approaches appear to do all of the above. The
approach also generalizes to higher order cointegration.
In the next section we consider some further issues related to representa-
tions and in the next chapter issues of exogeneity and identification are
discussed.
It was observed that the switching between cointegrating forms in the Wold
VMA and the Johansen VAR was not a straightforward exercise. One possible
explanation is that the VAR and the VMA are always approximations; the other is
that the natural time series representation in the cointegration case is either a
VAR or a VMA. However, the finite VMA that forms the basis of the Granger
representation theorem and the Smith–McMillan–Yoo Form does not usually
conform with a finite order VAR. In this section we develop an extension to
the results previously considered, which derives from the literature on matrix
polynomials (Gantmacher 1960; Gohberg et al. 1983). Based on some broad
conditions for the extraction of divisor matrices from a matrix polynomial it
follows that the VMA can be directly inverted. In this section the Generalized
Bézout Theorem and an extension that considers the unit root case are used to
derive a VAR and VARMA representation for cointegration (Hunter 1989a,
1992). It is shown that under the conditions required for the extended Bézout
Theorem, that the VMA(1) inverts exactly to a VAR(1), this result is demon-
strated for a simple bivariate system, which is used by Burke and Hunter
(1998) to develop their Monte Carlo study. The section concludes with a brief
discussion of the articles by Haldrup and Salmon (1998) and Engsted and
Johansen (1999).
Theorem 2 If Q0 ≠ 0, then there exists a left-hand divisor Q0(z) = (Iz − F) such that
Q(z) = Q0(z)Q1(z) if and only if Q(F) = 0.
In the case where Q(z) has a block of common roots, then the result devised
by Sargan (1983a) to extract Matrix Common Factors in autoregressive error
models can be applied to the case of common unit roots:
Theorem 3 If Q(z) has a block of common roots (for cointegration on the unit
circle), then Q(z) has a left-hand divisor Q0(z) = (Iz − F), if and only if FQ(F) = 0.

Q(z) = Q0 z^q + Q1 z^{q−1} + … + Qq

where Q0 = I and rank(Q(1)) = n − r. If there is a left-hand divisor Q0(z) = (zI − F)
zQ(z) = Q0(z)Q1(z). (4.83)
By comparison of the jth polynomial powers of z on the left-hand and right-
hand side of (4.83):
It follows that FQ1^{(q)} = 0 is necessary and sufficient for (4.83) and (4.85) to be
isomorphic. Replacing j by q and re-arranging (4.84):

F²Q1^{(q−1)} = −FQq.

Repeated substitution gives

F^{q+1} = −F Σ_{k=0}^{q−1} F^k Qq−k (4.86)

and hence

FQ(F) = F Σ_{k=0}^{q} F^k Qq−k = 0. (a)
The existence of the left-hand divisor relies on FQ(F) = 0, which occurs either
when the Generalized Bézout Theorem holds and Q(F) = 0 or when F is a left-
hand annihilator of Q(F). ■
This generalization implies that F lies in the null space of Q(F) or, when rank(F) = r, then rank(Q(F)) = n − r. Given rank(Q(F)) = n − r, there exists an
r × n matrix K1, which annihilates Q(F). There is an arbitrary matrix K2 of
dimension n × r defined so that K1K2 is non-singular and without loss of
generality F = K2(K1K2)−1K1 is an idempotent matrix, which annihilates Q(F).
When F is idempotent, then by definition F^k = F and:

FQ(F) = F Σ_{k=0}^{q} FQq−k = F Σ_{k=0}^{q} Qq−k = FQ(1).
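The construction of F can be checked numerically. A short NumPy sketch with randomly drawn K1 and K2 (our own example, not the book's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 2

# K1 (r x n) plays the role of the annihilator of Q(F); K2 (n x r) is
# arbitrary subject to K1 @ K2 being non-singular.
K1 = rng.standard_normal((r, n))
K2 = rng.standard_normal((n, r))

F = K2 @ np.linalg.solve(K1 @ K2, K1)   # F = K2 (K1 K2)^{-1} K1

assert np.allclose(F @ F, F)            # idempotent
assert np.linalg.matrix_rank(F) == r    # rank r
assert np.allclose(K1 @ F, K1)          # F acts as the identity on the rows of K1
```

The last assertion shows why F annihilates anything K1 annihilates: K1F = K1, so FQ(F) = K2(K1K2)⁻¹K1Q(F) = 0 whenever K1Q(F) = 0.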
z^q C(z−1) = Q(z) = Q0(z)Q1(z).
Given that F is idempotent, Q0(z) has the following Smith rational form:

Q0(z) = (Iz − F) = H−1 [(z − 1)Ir 0; 0 zIn−r] H

and

F = H−1 [Ir 0; 0 0] H.
Following Engle and Granger (1987), the necessary condition for cointegra-
tion is K1C(1) = 0. For K′1 = [K′11 : K′12], then any matrix K′2 which satisfies the
condition that (K1K2) is non-singular can be used. It is convenient to select
K′2 = [K′12 : 0] as it is then straightforward to show that H1 = K1 and H1xt = ξt
defines an r vector of cointegrating variables, as F then has the following form:
F = [Ir  K11−1K12 ; 0  0].
Q0(L) = H−1 [(L − 1)Ir 0; 0 LIn−r] H = LC0(L−1)
      = LH−1 [(L−1 − 1)Ir 0; 0 L−1In−r] H.

Therefore:

C0(L) = H−1 [(1 − L)Ir 0; 0 In−r] H.
A0(L)xt = C1(L)εt

where

A0(L) = H−1 [Ir 0; 0 ΔIn−r] H = (ΔI − H−1 [LIr 0; 0 0] H) = (ΔI − FL). (4.88)
The above factorization is unique as long as (a) above holds and this prohibits
the possibility of polynomial cointegration. The partial common factor (1 – L)
cancels to leave the following VARMA(1,q) in levels and differences:
Δxt − Fxt−1 = C1(L)εt
A(0)F = A(0)H−1 [Ir 0; 0 0] [H1; H2]
      = A(0)H−1 [Ir 0; 0 0][Ir 0; 0 0] [H1; H2]
      = A(0)H1* H1,

where H1* contains the first r columns of H−1.
what will be called weakly exogenous variables.32 If A(0) has full rank and H1*
has rank r, this implies that Π can be factorized so that there is an n × r block of
well-defined elements. It is also of interest to notice that, conditional on the
knowledge of the number of cointegrating vectors, the VAR has the following
structural representation:

Ψ(L)Δxt − A(0)Fxt−1 = εt

or

Ψ+(L)Δxt = Fxt−1 + ε+t

where Ψ+(L) = A(0)−1Ψ(L) and ε+t = A(0)−1εt, which has the same cointegrating vectors as the
VARMA(1,q) representation.
Δxt = −(1/2) [1 −1; −1 1] xt−1 + εt.
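A small simulation of this system (our own sketch) makes the cointegration properties concrete: the combination (1, −1)xt is exactly the same combination of the current errors, while the orthogonal combination (1, 1)xt accumulates a common stochastic trend.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
Pi = -0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])
A1 = np.eye(2) + Pi                      # levels form: x_t = (I + Pi) x_{t-1} + e_t

x = np.zeros(2)
eps = rng.standard_normal((T, 2))
xs = np.empty((T, 2))
for t in range(T):
    x = A1 @ x + eps[t]
    xs[t] = x

beta = np.array([1.0, -1.0])             # cointegrating vector
# beta'(I + Pi) = 0, so beta' x_t = beta' e_t: a stationary (white noise) combination
assert np.allclose(xs @ beta, eps @ beta)
# (1, 1)(I + Pi) = (1, 1): this combination is a pure random walk
assert np.allclose(xs @ np.ones(2), np.cumsum(eps @ np.ones(2)))
```

The starting value x0 = 0 is assumed for the random-walk identity to hold exactly; with other initial values the combination differs only by a constant.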
A(L)C(L) = ΔI,
The differencing operator then cancels, so that, apart from initial values,
A(L)xt = εt.
When A(L) and C(L) are first order, a sufficient condition on C(L) is that the
matrix lag coefficient must be idempotent. The required lag coefficient of A(L)
may then be solved for. In this case A(L) = I − (1/2) [1 1; 1 1] L, therefore:
A(L)C(L) = [1 − ½L  −½L ; −½L  1 − ½L] [1 − ½L  ½L ; ½L  1 − ½L]
= [(1 − 0.5L)² − 0.25L²  0 ; 0  (1 − 0.5L)² − 0.25L²]
= [1 − L  0 ; 0  1 − L].
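The cancellation A(L)C(L) = ΔI can be verified by multiplying the matrix polynomials coefficient by coefficient. A NumPy sketch (the storage of lag coefficients as an array is our own layout):

```python
import numpy as np

# Matrix polynomials stored as arrays of lag coefficients: P(L) = P[0] + P[1] L + ...
G = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])     # lag coefficient of A(L)
M = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])   # idempotent lag coefficient of C(L)
A = np.array([np.eye(2), -G])    # A(L) = I - (1/2)[[1,1],[1,1]] L
C = np.array([np.eye(2), -M])    # C(L) = I - (1/2)[[1,-1],[-1,1]] L

def polymul(A, C):
    """Coefficient-wise product of two matrix polynomials."""
    out = np.zeros((len(A) + len(C) - 1, 2, 2))
    for i, Ai in enumerate(A):
        for j, Cj in enumerate(C):
            out[i + j] += Ai @ Cj
    return out

prod = polymul(A, C)
# A(L) C(L) = (1 - L) I: coefficient I on L^0, -I on L^1, zero on L^2
assert np.allclose(prod[0], np.eye(2))
assert np.allclose(prod[1], -np.eye(2))
assert np.allclose(prod[2], 0.0)
```

The zero coefficient on L² is where the idempotency of the MA lag coefficient does its work: GM = 0 here, so no second-order terms survive.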
It is also of interest to note from the Granger reparameterization applied to
the AR and the MA representation, that the above condition implies:
A(L)C(L) = (A(1)L + ΔA*(L))(C(1) + ΔC*(L))
= A(1)C(1)L + ΔA*(L)C(1) + ΔA(1)C*(L)L + Δ²A*(L)C*(L)
= ΔI. (4.91)

It is necessary and sufficient for the above result to hold that the following
conditions apply:

A(1)C(1) = αβ′C(1) = 0
ΔA*(L)C(1) + ΔA(1)C*(L)L + Δ²A*(L)C*(L) = ΔI.
A(1)C(1) = (1/2) [1 −1; −1 1] (1/2) [1 1; 1 1] = [0 0; 0 0] = 0,

which derives from the condition for cointegration β′C(1) = 0. If we now look
at the second term, then this yields the difference operator that cancels and
for this example A*(L) = I and C*(L) = (1/2) [1 −1; −1 1]. Therefore:
= [−2L + L² + 2L(−½L + ½) + 1   L(−½L + ½) + L(½L − ½) ; L(−½L + ½) + L(½L − ½)   −2L + L² + 2L(−½L + ½) + 1]
= [1 − L  0 ; 0  1 − L].
MA and AR components. When compared with the VAR derived using the
Smith–McMillan–Yoo form, the VARMA defines a unique factorization, which
can be made robust to the choice of r, the number of cointegrating vectors, and
when r is known, the long-run parameters can be estimated in one step. It is
also feasible that a Johansen-type procedure can be applied in this case
(Hunter and Dislis 1996). The VARMA approach associated with this decomposition selects unique linear combinations of variables which are stationary
when FC(1) = 0. Were an exact VARMA procedure to be used, then it is possible to handle roots on or inside the unit circle (Phadke and Kedem 1978).
A similar approach has been adopted by Lütkepohl and Claessen (1993),
though they estimate the long run using the Johansen procedure and then
estimate the short-run model using a VARMA model.
4.8 Conclusion
In this chapter, cointegration associated with series that are I(1) or may be
transformed to being I(1), has been considered. Granger (1983) first specified
cointegration in terms of VMA processes which have been over-differenced. If
one considers such over-differencing, then it is mirrored in the error processes,
which then exhibit moving average behaviour with unit roots. The theory was
developed for a system of equations and from the reparameterization of the
VMA polynomial follows the fundamental result for cointegration that
rank(C(1)) = n − r. This implies that there are r over-differences or r unit roots
in the moving average representation of the differenced data. The over-
differences relate to the series that cointegrate or form linear combinations that
are stationary, while the remaining n − r series require differencing to be
made stationary. In the Granger representation theorem it is shown that the
linear combinations that are stationary are associated with error correction
terms or cointegrating vectors that transform the non-stationary series to
stationarity. The cointegrating vectors transform the series to stationarity
under the Wold form, because they annihilate C(1), which leads to the r
cointegrating variables having a multivariate moving average representation
with all roots outside the unit circle.
Unfortunately, it is not easy to show that the VMA in differences inverts to
a VAR in levels. The result developed by Engle and Granger (1987) is only
valid for bivariate systems. Yoo (1986) developed a factorization based on
Smith–McMillan forms, but these are only correct when C (L) is a rational
polynomial. In this chapter an alternative approach is developed, which gives
rise to an exact inversion of the VMA to an error correcting VAR, but this
requires a matrix F that is idempotent and which annihilates C(1). It follows
that F contains the cointegrating vectors.
Exogeneity and Identification
In this chapter, the idea of exogeneity is first discussed in broad terms and
it is then considered relative to the long-run parameters. When compared
with the short run some of the long-run concepts are directly testable.
Identification is then discussed in terms of a conventional system of equations
and finally in terms of the long-run parameters of the model.
Ψ(L)Δxt = Πxt−1 + μ + εt (5.2)

H1(r): Π = αβ′
where x′t = [y′t, z′t] and Xt = (X0, x1, x2, …, xt). Weak exogeneity requires that the
parameters of interest depend only on the parameters of the conditional
density of yt and that there is a sequential cut of the parameter spaces for λ1
and λ2 (Florens, Mouchart and Rolin 1990). If so, the marginal density for zt
can be ignored without loss of information when conducting statistical infer-
ence about the parameters of interest. Strong exogeneity combines weak exo-
geneity with Granger non-causality, so that the marginal density for zt
becomes D(zt|Zt−1, λ2). Super exogeneity requires weak exogeneity and that
the parameters of the conditional process for yt are invariant to changes in the
process for zt. Weak exogeneity can either be defined in terms of the matrix
as a whole or in terms of a sub-block .1.
where i,j is (ni × rj) and ′i,j is (rj × ni), and the following vectors: it = .′1xt
2t = ′2.2zt define r1 and r2 blocks of stationary variables.
If conditions (i) and (ii) above hold, then cointegrating exogeneity in this
form is an exact analogue of strong exogeneity since, in the usual setting of
dynamic models, weak exogeneity is combined with non-causality (see Engle
et al. 1983). Unfortunately the restrictions implied by (i) are not easy to
impose which leads to the alternative special case of diagonalization first dis-
cussed in Hunter (1992). Diagonalization or quasi-diagonalization of the
system requires (ii) in combination with (iv) below.
α1,2 = 0 (iv)
132 Modelling Non-Stationary Time Series
However, (iv) is sufficient, since α1,2 = 0 implies that (α1,1)−1α1,2β′2,2 = 0.1 Once the quasi-diagonal form is accepted, then weak exogeneity
of zt for β.1 is equivalent to weak exogeneity of zt for the first n1 blocks of β. As
a result the first sub-block of cointegrating vectors can be estimated from the y
sub-system. Hall and Wickens (1994) discuss a special case of the above result
which occurs when 1,1 is non-singular. As a result, the quasi-diagonal form is
observationally equivalent to the cointegrating exogenous case. This occurs
when rank(α1,1) = n1 = r1, because it is then possible to reparameterize Π in the
following way:

Π = [α1,1 0; 0 α2,2] [β′1,1 b; 0 β′2,2] = [α1,1β′1,1  α1,1b; 0  α2,2β′2,2].
This diagonal form is equivalent to (iii) above when b = β′2,1 + (α1,1)−1α1,2β′2,2
and:2

Π = [α1,1β′1,1  α1,1β′2,1 + α1,2β′2,2 ; 0  α2,2β′2,2].
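The observational equivalence of the two forms is easy to verify numerically. A NumPy sketch with arbitrary blocks (the dimensions and random draws are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n1 = r1 = 2          # Hall-Wickens case: alpha_{1,1} square and non-singular
n2, r2 = 3, 1

a11 = rng.standard_normal((n1, r1))
a12 = rng.standard_normal((n1, r2))
a22 = rng.standard_normal((n2, r2))
b11 = rng.standard_normal((r1, n1))  # beta'_{1,1}
b21 = rng.standard_normal((r1, n2))  # beta'_{2,1}
b22 = rng.standard_normal((r2, n2))  # beta'_{2,2}

# Pi with alpha_{2,1} = 0 (cointegrating exogeneity of z)
Pi = np.block([[a11 @ b11, a11 @ b21 + a12 @ b22],
               [np.zeros((n2, n1)), a22 @ b22]])

# Observationally equivalent quasi-diagonal form with
# b = beta'_{2,1} + alpha_{1,1}^{-1} alpha_{1,2} beta'_{2,2}
b = b21 + np.linalg.solve(a11, a12 @ b22)
Pi_diag = np.block([[a11 @ b11, a11 @ b],
                    [np.zeros((n2, n1)), a22 @ b22]])

assert np.allclose(Pi, Pi_diag)
```

Since the two forms produce identical Π for any draw of the blocks, no test on Π alone can distinguish them, which is the identification difficulty discussed in the text.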
However, when α1,1 is non-singular, then b is a linear combination of some
minimal or more primitive set of cointegrating vectors of which β2,1 and β2,2
are sub-blocks. This difficulty in interpretation does not arise when zt is
weakly exogenous for β′b.1 = [β1,1 : b], but weak exogeneity implies a sequential
cut in the parameter space, which only occurs when (α1,1)−1α1,2β′2,2 = 0,
as otherwise βb.1 = f(β.2), which violates the condition for a sequential cut.
If (α1,1)−1α1,2β′2,2 = 0, then either zt is weakly exogenous for β.1 or β2,2 = 0.
In the latter case r2 = 0, r = r1 and the system is decomposed into n2 difference
stationary variables and r1 = n1 stationary variables.
It is more usual to start from the proposition that zt is weakly or cointegrating
exogenous for some parameters of interest β.1; block triangularity implies and
is implied by (ii) when β.1 define the parameters of interest. However, zt is
only weakly exogenous for β.1 when α1,2 = 0 or zt is strongly exogenous. The
invariance of β.1 when a block diagonality restriction is applied is an indicator
that the diagonal form is valid.
H4: β = H4φ. H4 (n × s), φ (s × r).
H6: β = (H6φ1, ψ2). H6 (n × s), φ1 (s × r1), ψ2 (n × r2).
H7: β = (ψ1, H7φ2). H7 (n × s), φ2 (s × r2), ψ1 (n × r1).
where r ≤ s ≤ n and r1 + r2 = r.
\[
\beta = H_4\varphi, \qquad H_4 = \begin{bmatrix} 0' \\ I_5 \end{bmatrix}, \qquad \varphi\ (5 \times 2).
\]
Hence, H4 is a 6 × 5 selection matrix and φ is a 5 × 2 matrix of unrestricted
parameters. Testing for strict exogeneity requires the application of the
restrictions associated with LE (H4) and WE (H4). Using the results
presented in Table 4 of Hunter (1992), the restriction does not hold as
χ2(4) = 23.83 exceeds the critical value (9.49). As a result of the above finding,
all subsequent tests were applied to a model, which included all six variables.
By applying to each variable the same type of restriction as H4 above, Hunter
(1992) finds that three variables out of six might be viewed as being WE for β.6
Subsequently, WE tests are applied to groups of variables. In particular, for the
case where (e12) and (i1) are tested, then φ is a 4 × 2 matrix of parameters and
H4 is a 6 × 4 selection matrix:
\[
\beta = H_4\varphi, \qquad
H_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad
\varphi = \begin{bmatrix} \varphi_{11} & \varphi_{12} \\ \varphi_{21} & \varphi_{22} \\ \varphi_{31} & \varphi_{32} \\ \varphi_{61} & \varphi_{62} \end{bmatrix}
\]
and the WE variables are associated with the 4th and 5th rows of H4 and β
respectively. The test is not significant as χ2(4) = 4.04 does not exceed the
critical value.
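The mechanics of such selection-matrix restrictions are simple to reproduce. A minimal sketch (hypothetical parameter values, numpy assumed) builds the 6 × 4 matrix H4 above and confirms that β = H4φ has zero 4th and 5th rows, so the restricted variables are excluded from the long run:

```python
import numpy as np

# H4 keeps rows 1-3 and 6 of beta and zeros rows 4 and 5 (the WE variables)
H4 = np.zeros((6, 4))
H4[[0, 1, 2, 5], [0, 1, 2, 3]] = 1.0

phi = np.array([[1.0, 0.2],     # hypothetical unrestricted parameters
                [-0.5, 0.7],
                [0.3, -0.4],
                [0.1, 1.0]])
beta = H4 @ phi                  # 6 x 2 matrix of cointegrating vectors
print(beta)
```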
To test whether i1 and i2 are CE for the first cointegrating vector implies the
following restrictions:
\[
\begin{bmatrix} \beta_{1,1} \\ \beta_{2,1} \end{bmatrix} =
\begin{bmatrix} I_4 \\ 0\;\;0\;\;0\;\;0 \\ 0\;\;0\;\;0\;\;0 \end{bmatrix}\varphi_1,
\qquad
\begin{bmatrix} \beta_{1,2} \\ \beta_{2,2} \end{bmatrix} =
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\varphi_2.
\]
The restrictions are accepted as χ2(6) = 7.82 is less than the critical value at the
5% level.
Here emphasis is placed on long-run non-causality; the short-run concept
relates to a combination of restrictions associated with CE, that α1,2 = 0, and
those on the short-run dynamics. Mosconi and Giannini (1992) apply the test
of non-causality in a short-run sense, while here the emphasis is solely on the
long run. Non-causality in the long-run relations associated with the variables
in the cointegrating equations implies a recursive structure to Π, whereas co-
integrating exogeneity also implies that the equations associated with the CE
variables do not include the CE vectors associated with the non-CE variables.
Cointegrating exogeneity implies that long-run forecasts can be made
conditional on the CE variables.
In practice, all the restrictions applied above can be undertaken using the
general restrictions approach dealt with in Hendry and Doornik (2001):
\[
H_g:\; \alpha = f(\theta_\alpha) \;\cap\; \beta = g(\theta_\beta).
\]
Table 5.1 Tests of weak and strict exogeneity, long-run exclusion and cointegrating
exogeneity
Note: †Cointegrating exogeneity (CE), strict exogeneity (SE), weak exogeneity (WE) and long-run
exclusion (LE). (* significant at the 5% level and ** significant at the 1% level.)
5.2 Identification
in practice. A further issue which limits our ability to identify is the notion of
observational equivalence. Appropriate restrictions might be found and
generic identification satisfied, the restrictions applied might be accepted, but
it may not be possible to discriminate between one class of model and another
model drawn from a different set of theoretical principles.
For linear models identification is usually straightforward, depending on
simple order conditions and a rank restriction (Goldberger 1964). When one
considers further degrees of non-linearity, then it becomes more difficult to
prove generic identification and the process becomes more empirical in
nature. Although certain advances have been made, the notion of observa-
tional equivalence is often all that is available to discriminate between
identified and non-identified models (Rothenberg 1971). Rothenberg (1971)
makes a further distinction between local and global identification. Local
identification is described as the ability to discriminate between models with
observationally distinct parameterizations within a neighbourhood of the
optimum. Consequently, identification, by its very nature, becomes more
empirical and any conclusions drawn are reliant on the parameterization of
the problem. Generic identification often stems from the rank of the informa-
tion matrix, which is a necessary criterion for safe optimization, though in
practice highly ill conditioned problems may yield locally well-defined para-
meter estimates. The empirical and generic notions become intimately related.
The ability to estimate some ‘structure’ consistently yields the possibility of a
sub-category of models, which may be observationally equivalent. Usually,
the minimum parametric form is a reduced form and from this more specific
structural models can be identified.
It is a combination of such necessary and sufficient conditions that will be
the main concern of the following sections of the chapter, in combination
with the question of observational equivalence. These results are then applied
to the identification of long-run relationships. In the above sense, generic
identification depends on sufficient conditions derived from Rothenberg
(1971) combined with an order condition necessary for identification.
Identification and identifiability are viewed as being non-linear in nature,
which implies that this treatment is both different and more general than that
of Johansen (1995a) and Boswijk (1996). The treatment also permits the ready
combination of restrictions on all the parameters associated with the long-run
behaviour of the model.
Some of the conditions considered here stem from the article by Hunter
(1998) where the question of non-identification is addressed. Sargan (1983a)
emphasized what he defines as conditions for higher order identification,
the very existence of which may depend on higher-order moments. In
this context consistency and non-identifiability are not equivalent when
identification depends on distributional assumptions. This renders the usual
\[
P = -B^{-1}\Gamma. \tag{5.4}
\]
It is common to redefine (5.3) above, thus:
Axt = ut
Ri ai = 0 for i = 1, …, n1
\[
BP + \Gamma = 0
\]
or
\[
[B \;\; \Gamma]\begin{bmatrix} P \\ I \end{bmatrix} = A\begin{bmatrix} P \\ I \end{bmatrix} = 0. \tag{5.5}
\]
When (5.5) is transposed and a single row from A is considered then the rank
condition becomes:
\[
\operatorname{rank}([\,\Phi' \;\; R_i\,]) = n_1 - 1.
\]
\[
j_i + n_2 \ge n_1 + n_2 - 1 \quad\text{or}\quad j_i \ge n_1 - 1.^{11}
\]
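The order condition can be checked mechanically by counting independent restrictions. A small sketch (hypothetical restriction matrix, numpy assumed):

```python
import numpy as np

def satisfies_order_condition(R_i, n1):
    """Equation i needs at least n1 - 1 independent restrictions."""
    j_i = np.linalg.matrix_rank(R_i)
    return j_i >= n1 - 1

# Hypothetical: n1 = 3 endogenous variables, two independent restrictions
R = np.array([[0., 1., 1., 0., 0.],
              [0., 0., 0., 1., 0.]])
print(satisfies_order_condition(R, n1=3))   # j_i = 2 >= n1 - 1 = 2
```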
Furthermore, some of the types of restriction that violate the rank condition
are well known: identification is lost when two equations use the same restric-
tions, as they are then observationally equivalent, and the same restriction applied to
all equations simply reduces the number of operational variables in the model,
so that a restriction is lost. However, the type of restriction discussed above
is linear in nature and often restrictions might well be non-linear (i.e., the case
of CE discussed above requires non-linear estimation). Prior to any discussion
of cointegration we consider non-linear identification, based on the results in
Rothenberg (1971) and Sargan (1988). The following theorem follows from
Sargan (1975):
\[
L(\theta_0 \mid X_t) = L(\theta_1 \mid X_t)
\]
yt = Pzt + εt
where yt and zt are defined above, εt an n1 vector of reduced form errors and P
is an n1 × n2 matrix of reduced form parameters. A consistent estimator of P is:
\[
\hat{P} = Y'Z(Z'Z)^{-1}.
\]
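This estimator is ordinary least squares applied equation by equation. A quick simulation sketch (hypothetical data-generating values, numpy assumed) confirms that P̂ = Y′Z(Z′Z)⁻¹ recovers P:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n1, n2 = 500, 2, 3
Z = rng.standard_normal((T, n2))
P_true = np.array([[1.0, 0.5, 0.0],
                   [0.2, -1.0, 0.3]])          # hypothetical P (n1 x n2)
Y = Z @ P_true.T + 0.1 * rng.standard_normal((T, n1))

P_hat = Y.T @ Z @ np.linalg.inv(Z.T @ Z)       # P_hat = Y'Z (Z'Z)^{-1}
print(P_hat)
```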
\[
\operatorname{rank}\!\left(\frac{\partial g}{\partial \theta'}\right) < q. \tag{5.7}
\]
Again, the above condition signals failure of a necessary condition for local
identification, and a model whose Jacobian ∂g/∂θ′ fails the full rank condition
is generally viewed as being unidentified. However, for non-linear models,
even where ∂g/∂θ′ does not have full rank, it may still be possible to obtain
solutions to (5.6), because the conditions for singularity or near singularity are
less burdensome than those required to solve (5.6). This gives rise to the
following theorem that derives from Sargan (1983a):
By simulation Sargan (1983b) shows that there may be near singular models
that cannot be distinguished from singular models, but satisfy (5.6) and are
thus identified. The convergence in distribution of estimators derived from
such near singular cases turns out to be much slower than usual. Because of a
larger than usual asymptotic variance they tend to be classified empirically as
unidentified.
\[
\Pi = \alpha\beta'
\]
\[
r^2 - j \le r \quad\text{or}\quad j \ge r^2 - r \tag{5.8}
\]
\[
P = -B^{-1}\Gamma. \tag{5.9}
\]
If P is unrestricted, then identification of P follows from our ability to estimate
the long-run parameters.
A multivariate generalization of the conventional condition for the identifi-
cation of a regression equation is required, that is a first moment matrix com-
posed of some regressors has to have full rank. If P = Π is calculated using the
\[
\beta' = (\alpha_1)^{-1}\Pi_1.
\]
H1 ⊂ H 2 ⊂ H 3 … ⊂ H i .
Identification follows from the acceptance of the sequence of tests. Let us look
at some linear restrictions of the form:
Ri Ai = 0 for i = 1, …, r.
Any linear statistical model with a set of restrictions may be defined thus:
\[
L = \{A_{g\times r}, \Sigma \mid R_i A_i = 0,\; i = 1, \ldots, r\}.
\]
Notice that the restrictions associated with M are now non-linear by virtue of
the rank restriction. When the set of all possible restrictions is considered,
then the class of just identified models is likely to be large, though it will
define a subset of the restricted models, so that M ⊂ L.
The above result implies that there is a non-null set of models that cannot be
distinguished on the basis of the likelihood and they define a family of obser-
vationally equivalent models, which satisfy the rank condition and thus corre-
spond to a point in M with certainty. If these results are made particular to the
cointegration case, then the parameter point given by the restrictions Ri for
i = 1, … r, with
\[
\operatorname{rank}(R_i' H_{i_1} \;\; R_i' H_{i_2} \;\ldots\; R_i' H_{i_k}) \ge k
\]
Consider, for example, the model estimated by Hunter and Simpson (1995),
which has r = 4 cointegrating vectors, β = [H1φ1  H2φ2  H3φ3  H4φ4], then
identification of the first cointegrating vector alone requires us to check:
rank(R1′Hi1) = 1, for i1 = 2, 3, 4
rank(R1′H2  R1′Hi2) = 2 for i2 = 3, 4
rank(R1′H2  R1′H3  R1′H4) = 3.
In the case of the second cointegrating vector:
rank(R2′Hi1) = 1, for i1 = 1, 3, 4
rank(R2′H1  R2′Hi2) = 2 for i2 = 3, 4
rank(R2′H1  R2′H3  R2′H4) = 3.
Similar types of rank condition need to be checked for each remaining cointe-
grating vector.
Consider the simpler case estimated by Hunter (1992) and used before in
section 5.1.3. In this case n = 6, r = 2, β = [H1φ1  H2φ2] and based on the order
condition two restrictions are required to identify each cointegrating vector
without normalization. For this section PPP is applied as a parametric restric-
tion to the first vector [*, a, -a, -a, *, 0] in combination with a zero restriction
on the eurodollar rate,17 while the second vector is restricted to accept UIP,
[0, 0, 0, 0, b, –b]. Hence, there are j1 = 2 restrictions in the first vector, which
without normalization is enough to just identify. And j2 = 5 means that the
second vector ought to be over-identified before normalization. Therefore:
\[
H_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
H_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}.
\]
\[
R_1' = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
R_2' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}.
\]
In this case both vectors are identified when k = r − 1 = 1 conditions are
satisfied, for a block of homogeneous restrictions of the form R′kβk = 0 or R′kHk =
0 for k = 1, 2. It follows that the Johansen approach to identification checks
each combination of conditions rank(R′iHj) = 1 for i ≠ j. In the case of the first
vector, it follows that,
\[
R_1'H_1 = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
The two matrices are orthogonal, while for identification:
\[
R_1'H_2 = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
\[
R_2'H_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Hence, the second vector is identified, because the matrix product above has 3
independent rows and columns.
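These hand calculations can be verified numerically. The sketch below (numpy assumed) uses the restriction (R) and design (H) matrices of this example and checks the three rank conditions:

```python
import numpy as np

R1t = np.array([[0, 1, 1, 0, 0, 0],
                [0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 0]], dtype=float)   # R1'
R2t = np.array([[1, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0],
                [0, 0, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1]], dtype=float)   # R2'
H1 = np.array([[1, 0, 0],
               [0, 1, 0],
               [0, -1, 0],
               [0, -1, 0],
               [0, 0, 0],
               [0, 0, 1]], dtype=float)
H2 = np.array([[0], [0], [0], [0], [1], [-1]], dtype=float)

print(R1t @ H1)                                     # zero: orthogonal
print(np.linalg.matrix_rank(R1t @ H2))              # 1: first vector identified
print(np.linalg.matrix_rank(R2t @ H1))              # 3: second vector identified
```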
As can be seen from the above derivations, the algebra becomes increasingly
burdensome with r. The conditions also relate to the specific definition of
generic identification described by Johansen (1995b) and the article does not
address the issue of empirical identification or the more general notion of
identification associated with observational equivalence.
\[
\beta' = \begin{bmatrix} a & 0 & b \\ c & d & 0 \end{bmatrix}
\quad\text{and}\quad
H_2 = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix}.
\]
Selecting the normalization, a = 1 and d = 1 it follows from Boswijk (1996),
that the first vector in ′ is identifiable when the matrix H2 has full rank. To
discriminate between failure of normalization and other types of failure, a
further rank test is applied to an r – 1 dimensioned sub-matrix. Therefore:
\[
H_{03}: \beta \in B_4 = \{\beta : \operatorname{rank}(R_1'\beta) \le r - 2\}.
\]
In the example rank failure occurs for H02 when a = 0 (normalization) and for
the further restriction associated with H03: d = 0. However, from the accep-
tance of the Johansen test for cointegration (rank(β′) = r), β is identifiable as r
linearly independent cointegrating vectors must exist and, given acceptance
of the over-identifying restrictions, then the first vector is identified when I(0)
variables are precluded from the system. Using the approach of Boswijk, once
the first vector is identifiable, then rank conditions need to be tested for each
of the other vectors in turn.
Based on the results presented in Hunter and Simpson (1995) and those
above, some of the problems associated with incorrect normalization may be
avoided, and secondly
\[
\operatorname{rank}\!\left(\frac{\partial\operatorname{vec}(\beta')}{\partial\operatorname{vec}(\Pi_i)'}\right) = nr,
\]
if the normalization is ignored. By similar
argument, β is identifiable when there exist two matrices Π.j and B for which
Π.j = Bβ′ and B is non-singular. As a result, a unique solution for β exists of the
form:
\[
\operatorname{vec}(\beta) = (B \otimes I_n)^{-1}\operatorname{vec}(\Pi_{.j}) \tag{5.12}
\]
■
In the cointegration case, the existence of one or more solutions to (5.11) and
(5.12) is sufficient for the existence of a solution to Π = g(θ), which is what is
required for identification given (5.8). Finding such solutions negates the need
to undertake the test in Johansen (1995b).
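The vec solution is a standard Kronecker-product identity; with column-stacking vec, vec(βB′) = (B ⊗ In)vec(β). A minimal numerical sketch (hypothetical β and B, numpy assumed) recovers β from Π.j = Bβ′:

```python
import numpy as np

n, r = 4, 2
rng = np.random.default_rng(1)
beta = rng.standard_normal((n, r))
B = np.array([[1.0, 0.5], [-0.3, 1.0]])       # non-singular r x r matrix
Pi_j = B @ beta.T                              # Pi.j = B beta' (r x n)

# Column-stacking vec: vec(beta B') = (B kron I_n) vec(beta)
vec_beta = np.linalg.solve(np.kron(B, np.eye(n)), (Pi_j.T).flatten('F'))
beta_rec = vec_beta.reshape((n, r), order='F')
print(np.allclose(beta_rec, beta))
```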
Linearity, or the need to consider α and β, does not present a problem as
the condition in Theorem 9 may be applied sequentially to α and β to
yield a sufficient set of solutions. Empirical verification of the generic result
follows from a direct test of the over-identifying restrictions:
\[
(\mathrm{I})\quad H_\alpha: R_\alpha\operatorname{vec}(\alpha) = 0 \;\cap\; H_\beta: R_\beta\operatorname{vec}(\beta) = 0.
\]
The existence of a solution to (5.11) and (5.12) implies the system is generi-
cally identified. As Boswijk suggests, on empirical grounds identification may
fail due to insignificance of certain parameters. Here, identifiability follows
from the existence of sufficient information in certain rows and columns of Π
to identify α and β (Sargan, 1983). Clearly, many such orientations related to
particular over-identifying restrictions may exist. However, it is sufficient to
find one such orientation of the system to empirically accept the generic solu-
tion. Consider the example used above where for comparison with Boswijk we
let B = H2. When rank(H2) = r,20 then the condition in Boswijk (1996) is
satisfied, but so also is the sufficient condition for the existence of a solution to
(5.12) (a matrix B of full rank). From Theorem 9, the rank condition identifies
α based on the restrictions in (I). Then conditional on (I), discovery of a
matrix (B) with full rank is sufficient for identification of β.
If the variable chosen for normalization is invalid (a = 0 and rank(H2) < r),
then failure of the rank condition yields an additional restriction on the set of
cointegrating vectors (β′). Therefore β can be identified from a new orientation:
\[
\beta' = \begin{bmatrix} 0 & 0 & b \\ c & d & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 0 & b \\ d & 0 \end{bmatrix}.
\]
The system is now over-identified as j = 3 > r² − r. From acceptance of the
Johansen rank test, |B| = 0 can only occur when d = 0, but this contradicts the
proposition that rank(β′) = 2. The structure of β′ based on d = 0 gives x1 and x3
as the cointegrating relations, so two series in xt are I(0).21
where κ = Ω1,2Ω−12,2. One set of sufficient conditions for weak exogeneity of zt
for β′.1 = [β′1,1 : β′2,1] is α1,2 − κα2,2 = 0 and α2,1 = 0, see Lemma 2 in Ericsson et al.
(1998). Combining (5.15) with (5.14) yields a system which, up to a non-singular
transformation matrix, is equivalent to the original VAR. If (α1,2 = 0, α2,1 = 0) is
applied to (5.13) and (5.14), then the VAR has a quasi-diagonal long-run struc-
ture (Hunter, 1992). For weak exogeneity additional restrictions may apply as
α1,2 − κα2,2 = 0 is required. Should α1,2 = 0, then κα2,2 = 0 is sufficient for weak
exogeneity. This result can be associated with three possible requirements:
(i) κ = 0; (ii) α2,2 = 0; or (iii) κ is a left-hand side annihilation matrix of α2,2.
Under cointegration, (ii) does not apply as rank(α2,2) = r2. Case (i) is consistent
with Lemma 2 in Ericsson et al. (1998). For case (iii), the quasi-diagonality
restriction (α1,2 = 0, α2,1 = 0) combined with κα2,2 = 0 is sufficient for weak exo-
geneity of zt for β.1.
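Case (iii) is easy to illustrate: κ need not be zero provided its rows annihilate α2,2 from the left. A toy numerical sketch (hypothetical values, numpy assumed):

```python
import numpy as np

# Case (iii): kappa != 0 and alpha_22 != 0, yet kappa annihilates alpha_22
# from the left, so alpha_12 - kappa alpha_22 = 0 holds with alpha_12 = 0.
alpha_22 = np.array([[1.0], [1.0]])       # n2 x r2, rank r2 = 1
kappa = np.array([[1.0, -1.0]])           # rows orthogonal to alpha_22
alpha_12 = np.zeros((1, 1))

print(kappa @ alpha_22)                   # the annihilation
print(alpha_12 - kappa @ alpha_22)        # the weak exogeneity condition
```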
Weak exogeneity for a sub-block implies that analysis may be undertaken at
the level of the sub-system. More specifically, identification conditions now
apply at the level of the sub-system, as previously at the level of the full
system. Let Π1 denote an n1 × n sub-matrix of Π for which rank(Π1) = r1 and
n1 > r1 ≥ 1. If Π1(r1) defines an r1 × n sub-matrix of Π1 for which the maximum
rank is given by its smallest dimension, then an equivalent column matrix
exists which is n1 × r1 and has full column rank. Given the quasi-diagonality
restriction, it follows that:
\[
\Pi_1 = \alpha_{1,1}\beta'_{.1} \quad\text{and}\quad \Pi_1(r_1) = A_1\beta'_{.1}, \tag{5.16}
\]
where A1 is a square matrix of full rank r1 obtained from α1,1 (by selecting r1
rows). To identify α1,1 and β.1 subject to a standard normalization (i.e. r1
restrictions) the following sub-system order condition now applies:
Proof. By analogy with the proof of Theorem 9, vec(β.1), which follows from
vectorizing (5.16), is identifiable when A1 has full rank. ■
If rank(α2,2) = r2, then there is a sub-matrix Π2(r2) of dimension r2 × n2, and a
matrix of column vectors dimensioned n2 × r2, both of rank r2. Now the order
condition for this sub-system is:
Even with all of the zero restrictions in the second block of cointegrating
vectors, the number of relevant restrictions in the order condition for the sub-
block remains unchanged at the level of the sub-block. Subject to an appropri-
ate number of identifying restrictions, then a sufficient condition for the
existence of a solution to the system associated with β2,2 is the existence of A2,
an r2 × r2 sub-matrix of α2,2. By analogy with the result in Theorem 10, the fol-
lowing relationship exists for β2,2:
both weakly exogenous for β.1 and zt is not long-run caused by yt, then zt is
termed long-run strongly exogenous for β.1. Therefore, strong exogeneity com-
bines the restrictions associated with weak exogeneity and the restrictions
appropriate for cointegrating exogeneity.
In the next section, the identification and identifiability of a model involv-
ing weak, cointegrating and strongly exogenous variables is addressed.
To motivate the analytic solution and empirical results discussed in the last
section, the approach is applied to the data set analyzed by Johansen and
Juselius (1992) and Hunter (1992a).24 The system of equations associated with
Theorem 9 is observed to have a number of solutions, which directly relate to
the correct degrees of freedom for the test of over-identifying restrictions.
Emphasis is placed on a model that is identified via restrictions on α dis-
cussed in section 5.3 and both weak exogeneity and cointegrating exogeneity
are tested.
From the discussion in section 5.2, whether it is possible to identify the
parameters in the long run follows from the ability to solve for α and β from
well-defined rows and columns of Π. According to Theorem 6, this depends
on the existence of what might be called a valid orientation of the system. If
Πi = Aβ′ and from the cointegrating rank test rank(β′) = r, then it follows from
the conditions on the rank of sub-matrices, that rank(A) = r ⇒ rank(Πi) = r.
Hence, determining an A matrix with full rank is equivalent to associating the
solved system with well-defined parts of the matrix Π. The ability to identify
the parameters empirically from the solution to the algebraic problem of the
form (5.11) and (5.12) relies empirically on finding matrices A and B with full
rank. Prior to undertaking such a test, a set of minimum restrictions will be
defined and then tested.25 For generic identification of a system with r = 2
cointegrating vectors r² − r = 2 restrictions are required with normalization and
r² without. To test the over-identifying restrictions and identifiability, the like-
lihood ratio test discussed in Johansen and Juselius (1992) and implemented
in Doornik and Hendry (1998, 2001) is used. Using the results in section 5.3,
α and β can be identified via a normalization and the restrictions associated
with quasi-diagonality also discussed in section 5.1.2:
\[
\beta' = \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} & 0 & 0 & 0 \\ 0 & 0 & 0 & \beta_{42} & \beta_{52} & \beta_{62} \end{bmatrix}. \tag{5.17}
\]
Without the quasi-diagonality restrictions, the only restrictions applied to β
are those associated with the normalization (β41 = −1, β52 = 1); with columns
ordered (p0, p1, p2, e12, i1, i2):
\[
\beta' = \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} & -1 & \beta_{51} & \beta_{61} \\ \beta_{12} & \beta_{22} & \beta_{32} & \beta_{42} & 1 & \beta_{62} \end{bmatrix}.
\]
It can be seen from the p-value associated with test (I) in Table 5.2 that the
long run is identified: (i) six restrictions are imposed (j = 6 > r² − r = 2) and (ii) the
test of over-identifying restrictions is accepted at the 5% level.26
Now consider the orientation of the system or the selection of the appropri-
ate r-dimensioned square matrices A and B. A valid choice for A is based on
the 3rd and 6th rows from . For a solution, it is required that:
Hence, any matrix A needs to be of full rank. Following the acceptance of the
quasi-diagonality restriction, the identifiability of β depends on the rejec-
tion of the condition |A| = 0.27 One possible orientation is:
\[
A = \begin{bmatrix} \alpha_{31} & 0 \\ 0 & \alpha_{62} \end{bmatrix}
\quad\text{and}\quad
\Pi_3 = \begin{bmatrix} \pi_{31} & \pi_{32} & \pi_{33} & \pi_{34} & \pi_{35} & \pi_{36} \\ \pi_{61} & \pi_{62} & \pi_{63} & \pi_{64} & \pi_{65} & \pi_{66} \end{bmatrix}.
\]
This test is applied under a null of non-identifiability of β (Table 5.2, II); the
test is χ2(2) and the null is rejected at 5% and any other conventional level of
significance. Should one consider the alternative orientation associated with
the treasury bill rate (i1) and the exchange rate equations, then both were
jointly accepted to be weakly exogenous by Hunter (1992a). To compare this
orientation with that used above it is of interest to note that when the restric-
tions α51 = 0 and α52 = 0 are used to augment α31 = 0 and α32 = 0 (Table 5.2, IIb),
then when compared with a χ2(2) statistic the null cannot be rejected at the
5% significance level. This implies that the fifth column does not yield an
appropriate sub-matrix to orientate the system and by a similar argument the
fourth column can also not be used.
A possible choice of B is based on the fourth and fifth columns of Π, so that
\[
\operatorname{vec}(\beta) = (B \otimes I_6)^{-1}\begin{bmatrix} \operatorname{vec}(\Pi'_{.4}) \\ \operatorname{vec}(\Pi'_{.5}) \end{bmatrix}
\quad\text{and}\quad
B' = \begin{bmatrix} -1 & \beta_{42} \\ \beta_{51} & 1 \end{bmatrix} \tag{5.19}
\]
where Π′.j = [π1j π2j … π6j] for j = 4, 5. Here the test of orientation for the
identification of β is undertaken prior to the imposition of any restriction (see
Table 5.2, IIc). Under the null the determinant of B is set to zero, the test is
χ2(1) and from the critical value non-identifiability can be rejected at the 5%
level. It follows from Theorem 9 that Π.j = Bβ′ and from the cointegrating rank
test rank(Π) = r, so rank(B) = r ⇒ rank(Π.j) = r also and the orientation with
respect to β is valid.
It follows that a solution can now be derived from (5.18) and (5.19) based
on the selected A and B matrices (see Appendix H):
\[
\theta = [\beta_{11}\;\beta_{21}\;\beta_{31}\;\beta_{42}\;\beta_{52}\;\beta_{62}\;\alpha_{11}\;\alpha_{21}\;\alpha_{31}\;\alpha_{51}\;\alpha_{61}\;\alpha_{12}\;\alpha_{22}\;\alpha_{32}\;\alpha_{42}\;\alpha_{62}]' = g^{-1}(\Pi),
\]
where Δ = −1 − β42β51. Each element of θ is a ratio of elements of Π: the β
elements are recovered from the fourth and fifth columns of Π scaled by Δ (for
example, β11 = (π14 − β51π15)/Δ), and the α elements from the third and sixth
rows of Π scaled by α31−1 and α62−1 respectively.
From Corollary 11, when r1 = 1, then the existence of a block of weakly exoge-
nous variables is a sufficient condition for identification of the cointegrating
vectors in the first block. By analogy the second block is also identified, when
r2 = 1. The system is sequentially identifiable from the restrictions on alone
and the selection of the normalization. In this case, the long run is partitioned
into two sub-systems for which ri = 1 and consequently each vector is
identified by the normalization alone.
5.5 Conclusion
6.1 Introduction
In this chapter three further topics are considered in some detail: estimation
of models with I(2) variables; forecasting; and structural models with short-
run behaviour driven by expectations. Though mathematically the notions of
order of integration and cointegration are exact, in practice they are valid to
the best approximation or resolution that the data may permit. To define an
order of integration as a specific integer quantity is to assume that the series is
approximated by a single well-defined time series process across the sample.
Time series data for developed economies have exhibited many features, from
behaviour that might be viewed as purely stationary through to series that
require first or second differencing to render them stationary. Some nominal
series in first differences may require further differencing, which suggests that
the original nominal series are of order I(2) or higher. In this chapter,
discussion is limited to processes up to I(2).
The condition required for a series to be considered I(1), as compared
with one exhibiting further features only consistent with I(2) behaviour, is
necessary and sufficient for cointegration amongst I(1) series, and beyond
testing this condition there is a well-defined procedure for inference and esti-
mation of I(2) processes (Johansen 1992, 1995).
distinguish between an I(1) and an I(2) series, which suggests that series,
which appear to be I(2), are being approximated to some order of accuracy by
second differences. Alternatively, these series may be better modelled using
non-integer orders of differencing (Granger and Joyeux 1980; Hosking 1981).
To this end, the question of fractional processes and long-memory will be dis-
cussed briefly after the section on I(2) behaviour. A further reason why it
might be difficult to detect the order of integration of a series may be due to
the existence of structural breaks. This opens up a plethora of potential
difficulties for any form of structural modelling. Breaks in structure have a
\[
\Delta^2 x_t = C(L)\varepsilon_t
\]
and xt cointegrate when β′I(2)C(1) = 0 and β′I(2)xt = ξI(2),t ~ I(0). If a left-hand
factor can be extracted in the manner described in section 4.5, then:
When C1(L) has no more unit roots, then an I(2) cointegrating VAR exists in
second differences:
\[
A(L)\Delta^2 x_t = \Pi_{I(2)}\Delta x_{t-1} + \varepsilon_t
\]
where ΠI(2) = αI(2)β′I(2) = F. This has been called balanced I(2) behaviour by
Juselius (1995). Now consider the case where C(1) has further unit roots; then
it might be possible to undertake a further factorization when a left-hand term
C01(L) = (I − GL) can be extracted and GC1(1) = 0. Therefore:
\[
(\Delta I - FL)\Delta x_t = (I - GL)C_{11}(L)\varepsilon_t \tag{6.3}
\]
or
\[
(\Delta I - GL)(\Delta I - FL)x_t = C_{11}(L)\varepsilon_t. \tag{6.4}
\]
6.2.1.1 The Johansen procedure for testing cointegrating rank with I(2) variables
Prior to any discussion of the appropriate method of estimation the more con-
ventional VECM for the I(2) case is presented (Johansen 1995a):
\[
\Delta^2 x_t = \alpha\beta' x_{t-1} - \Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\Psi_i\,\Delta^2 x_{t-i} + N_0 D_t + \varepsilon_t. \tag{6.7}
\]
Alternatively, when αβ′ = 0 and the differenced I(1) series have linear com-
binations that are stationary:
\[
\Delta^2 x_t = -\bar\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\Psi_i\,\Delta^2 x_{t-i} + N_0 D_t + \varepsilon_t \tag{6.9}
\]
where Γ̄ = (α′⊥)−1κ′β′ = αI(2)β′I(2), as α′⊥ has full rank, because αβ′ = 0 implies
α = 0 and β = 0. The full I(2) case allows for the possibility of cointegration
amongst I(2) series that become I(0) in combination, and cointegration
amongst I(1) series that become I(0).
Clearly, (6.8) can be estimated using the Johansen procedure, except the re-
gression that is purged of short-run behaviour in, for example, the VAR(1) case is:
\[
R_{0,t} = \alpha\beta' R_{1,t}
\]
or
\[
\Delta^2 x_t = \Pi x_{t-1},
\]
and decomposition and testing follows in the usual way (see sections 4.3–4.4).
Alternatively, for the VAR(1) case associated with (6.9) the estimation
procedure is in every respect the same as that derived by Johansen (1991),
except the data are first and second differenced. For the VAR(1) case this
involves estimating the following model:
\[
R_{0,t} = \alpha_{I(2)}\beta'_{I(2)} R_{1,t} = (\alpha'_\perp)^{-1}\kappa'\beta' R_{1,t}
\]
or
\[
\Delta^2 x_t = -\bar\Gamma\Delta x_{t-1}.
\]
This becomes more complicated when the two types of cointegration are com-
bined, then (6.7) needs to be estimated, but this requires two blocks of reduced
rank tests to be undertaken. One procedure for undertaking this analysis would
be to consider the unit roots associated with cointegration amongst I(2) series
whose first differences cointegrate. However, when αβ′ ≠ 0, the model to
be estimated will either require very long lags, as the moving average terms
αβ′xt−1 = J(L)εt−1 have been omitted, or the Johansen approach might be applied
to a VARMA(1,q) model. To see this re-write (6.2) as:
\[
\Delta^2 x_t - F\Delta x_{t-1} = C_1(L)\varepsilon_t. \tag{6.10}
\]
If (6.10) were to be estimated, then the method must account for roots on the
unit circle as when the level terms cointegrate, C1(L) contains further unit
roots. Otherwise, the conventional VAR associated with this problem is of
infinite order and not conventionally invertible. There is no unique way
of deriving the estimator and in general the existence of the time series
representation cannot be proven.
In general, the case with both I(2) and I(1) interdependencies can be
handled by considering the solution to two reduced rank problems:
\[
\Pi = \alpha\beta'
\]
\[
\alpha'_\perp\Gamma\beta_\perp = \xi\eta'
\]
rank (section 4.4). To confirm that the I(1) analysis is valid the test for I(2)
components discussed previously in section 4.4.5 needs to be undertaken; this relates
to the solution to the second reduced rank problem, that is rank(α′⊥Γβ⊥) =
n − r. Should this matrix not have full rank, then there are I(2) components
not accounted for. Next an analysis of the I(2) components of the model is
undertaken, controlling for the I(1) variables. Subject to knowledge of (α, β, r)
the I(1) terms are eliminated by pre-multiplying (6.7) by ᾱ′:
\[
\bar\alpha'\Delta^2 x_t = \beta' x_{t-1} - \bar\alpha'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\bar\alpha'\Psi_i\,\Delta^2 x_{t-i} + \bar\alpha' N_0 D_t + \bar\alpha'\varepsilon_t \tag{6.13}
\]
where ᾱ′α = Ir. Subtracting (6.13) from ϑ × (6.12):
\[
-\beta' x_{t-1} = (\vartheta\alpha'_\perp - \bar\alpha')\Big(\Delta^2 x_t + \Gamma\Delta x_{t-1} - \sum_{i=1}^{p-1}\Psi_i\,\Delta^2 x_{t-i} - N_0 D_t\Big) + (\bar\alpha' - \vartheta\alpha'_\perp)\varepsilon_t \tag{6.14}
\]
where ϑ = Ω̄⊥Ω−1⊥⊥, Ω̄⊥ = ᾱ′Ωα⊥ and Ω⊥⊥ = α′⊥Ωα⊥. The errors of (6.12) and
(6.14) are independent by construction, while the parameters of (6.12), (α′⊥Γ,
α′⊥Ψi, Ω⊥⊥), and of (6.14), (ϑ, (ᾱ′ − ϑα′⊥)Γ, (ᾱ′ − ϑα′⊥)Ψi, (ᾱ′ − ϑα′⊥)N0), are variation
free. It follows that the parameters (Γ, Ψi, N0, Ω) can be disentangled from the
\[
\alpha'_\perp\Delta^2 x_t = -\alpha'_\perp\Gamma\beta\bar\beta'\Delta x_{t-1} + \alpha'_\perp\Gamma\beta_\perp\bar\beta'_\perp\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha'_\perp\Psi_i\,\Delta^2 x_{t-i} + \alpha'_\perp N_0 D_t + \alpha'_\perp\varepsilon_t \tag{6.15}
\]
\[
\alpha'_\perp\Delta^2 x_t = -\alpha'_\perp\Gamma\beta\bar\beta'\Delta x_{t-1} + \xi\eta'\bar\beta'_\perp\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha'_\perp\Psi_i\,\Delta^2 x_{t-i} + \alpha'_\perp N_0 D_t + \alpha'_\perp\varepsilon_t. \tag{6.16}
\]
tic is based on the solution to the eigenvalue problem |λS1,1 − S1,0S−10,0S0,1| = 0,
calculated from sample product moments derived for the I(2) case using:
\[
S_{i,j} = T^{-1}\sum_{t=1}^{T} R_{i,t}R'_{j,t} \quad\text{for } i, j = 0, 1
\]
s1
LR( s0 , s1 ) = −T ∑
log(1 − i )
i = s +1
0
and for an appropriate choice of s the matrix ′ is the matrix whose columns
are the eigenvectors associated with the first s significant eigenvalues.
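The eigenvalue calculation can be sketched directly. The code below is a minimal illustration in which synthetic residual series R_0, R_1 stand in for those delivered by the concentrated model; it solves |λS_{1,1} − S_{1,0}S_{0,0}^{-1}S_{0,1}| = 0 through the equivalent product matrix and forms the trace-type statistic from the eigenvalues.

```python
import numpy as np

# Synthetic stand-ins for the concentrated-model residuals R_{0,t}, R_{1,t}.
rng = np.random.default_rng(1)
T, n = 500, 3
R1 = rng.standard_normal((T, n))
R0 = 0.5 * R1 + rng.standard_normal((T, n))   # correlated, so lambda_i > 0

S = lambda A, B: A.T @ B / T                  # product moments S_{i,j}
S00, S01, S10, S11 = S(R0, R0), S(R0, R1), S(R1, R0), S(R1, R1)

# eigenvalues of S11^{-1} S10 S00^{-1} S01: the squared canonical correlations
lam = np.sort(np.linalg.eigvals(
    np.linalg.solve(S11, S10) @ np.linalg.solve(S00, S01)).real)[::-1]
LR = -T * np.sum(np.log(1 - lam))             # trace-type LR statistic
print(np.round(lam, 3), round(LR, 1))
```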
An alternative approach is derived in Johansen (1997) and Hansen and
Johansen (1998) using (6.6), where the parameters to be estimated that are
variation free are (\alpha, \beta, \rho, \tau, \psi, \kappa).
Inference based on the notion that the series are all I(1) may not be valid (Paruolo 1996).
When the series are I(2) they become stationary by virtue of a combination of
I(1) and I(2) processes, and from (6.6) the cointegrating relations have the
following form:

\alpha(\rho'\tau' x_{t-1} - \kappa'\Delta x_{t-1}) = \alpha(\beta' x_{t-1} - \kappa'\Delta x_{t-1}),

where \beta = \tau\rho.
Engle and Yoo (1989) defined cointegrating relationships of the form \beta'x_{t-1} -
\kappa'\Delta x_{t-1} as polynomial cointegration. To observe this, re-write the cointegrating
vectors as a lag polynomial \beta(L) in x:

\beta(L)'x_t = \beta'x_{t-1} - \kappa'x_{t-1} + \kappa'x_{t-2} = ((\beta' - \kappa') + \kappa'L)x_{t-1}.
The test statistic (1Q_{r,s}) is compared with associated points on the null distribution;
the comparison is made either with p-values calculated by PCGIVE 10.1
(Doornik and Hendry 2001) or with 5% critical values (c_{r,n-r-s}(5%)) taken from
Paruolo (1996). It is suggested in Doornik and Hendry (2001) that testing is
applied from the top left of the table, while Paruolo (1996) suggests progress-
ing from the top to the bottom of each column to a point at which the null
can no longer be rejected. Paruolo (1996) advises that tests are applied to
the specific case, moving to the general, or from the most restricted to less
Table 6.1  I(2) rank tests: Paruolo joint statistics 1Q_{r,s}, Johansen statistics Q_{r,s} (in square brackets), 5% critical values c_{r,n−r−s} and [p-values]; Q*_r and c_{n−r} are the I(1) trace statistic and its 5% critical value; c*_{n−r−s} are the 5% critical values for the Johansen I(2) tests.

            n − r − s:    6         5         4         3         2         1       Q*_r     c_{n−r}
 r = 0   1Q_{0,s}       314.01    254.23    199.22    163.69    141.70    126.62   119.69    93.92
         [Q_{0,s}]     [194.32]  [134.54]  [79.53]   [44.00]   [22.01]   [6.93]
         c_{0,n−r−s}    240.35    203.12    174.83    148.54    126.69    109.21
         [p-value]     [0.0000]  [0.0000]  [0.0031]  [0.0105]  [0.0073]  [0.0028]
 r = 1   1Q_{1,s}                 203.82    148.40    114.58    90.026    74.347    68.861    68.68
         [Q_{1,s}]               [134.96]  [79.539]  [45.719]  [21.165]  [5.486]
         c_{1,n−r−s}              171.89    142.57    117.63    97.97     81.93
         [p-value]               [0.0009]  [0.0429]  [0.1335]  [0.2082]  [0.1840]
 r = 2   1Q_{2,s}                           124.56    88.233    65.029    49.417    44.376    47.21
         [Q_{2,s}]                         [80.184]  [43.857]  [20.653]  [5.041]
         c_{2,n−r−s}                        116.31    91.41     72.99     57.95
         [p-value]                         [0.0226]  [0.1234]  [0.2247]  [0.2537]
 r = 3   1Q_{3,s}                                     83.798    56.535    35.023    23.938    29.38
         [Q_{3,s}]                                   [59.868]  [32.605]  [11.093]
         c_{3,n−r−s}                                  70.87     51.35     38.82
         [p-value]                                   [0.0039]  [0.0176]  [0.1215]
 r = 4   1Q_{4,s}                                               48.922    27.513    13.413    15.34
         [Q_{4,s}]                                             [35.512]  [14.103]
         c_{4,n−r−s}                                            36.12     22.60
         [p-value]                                             [0.0016]  [0.0084]
 r = 5   1Q_{5,s}                                                         13.576     5.184     3.84
         [Q_{5,s}]                                                       [8.392]
         c_{5,n−r−s}                                                      12.93
         [p-value]                                                       [0.0601]
 c*_{n−r−s}                       75.33     53.35     35.07     20.17     9.09
restricted cases. Following this approach, the first diagonal element implies
r = 0, n − r − s = 6, and the test statistic for the case with an unrestricted constant
(\mu \neq 0) is 1Q_{0,0} = 314.01 > c_{0,6}(5%) = 240.35. Based on the calculated statistic,
the null hypothesis (rank(\xi\eta') = s = 0 and rank(\Pi) = r = 0) cannot be accepted.
Progressing to the next column, where r = 0 and n − r − s = 5, 1Q_{0,1} = 254.23 >
c_{0,5}(5%) = 203.12: the null, that rank(\xi\eta') = s = 1 and rank(\Pi) = r = 0, is rejected.
At this point, using Paruolo’s (1996) suggestion to move down the column,
r = 1, n − r − s = 5, s = 0: the joint test statistic 1Q_{1,0} = 203.82 > c_{1,5}(5%) = 171.89,
and the p-value = 0.0009 confirms that the null hypothesis cannot be accepted
at either the 5% or the 1% level. Now the next column is considered, r = 0,
n − r − s = 4, s = 2, and the p-value = 0.0031 implies the null (rank(\xi\eta') = s = 2,
rank(\Pi) = r = 0) cannot be accepted.
Following this approach, testing stops and the correct decomposition of the
long-run is detected once a null in the above table is accepted.
Based on the first rank test it is suggested that r = 2 is selected and then s is
determined by moving along that row to the point at which the null cannot
be rejected. The Johansen test along each row considers the specific case and
moves towards the more general, but this now occurs for different values of
n – r – s, which for fixed r imply different values of s. Given r = 2, the test
statistic Q2,s is considered for s = 0, 1, 2, 3. Starting from the left n – r – s =
6 – 2 – 0 = 4, the Johansen tests statistic is Q2,0 = 80.184, which exceeds the 5%
critical value (c*6–2–0 = 53.35) taken from Johansen (1995a), implying that the
null (r = 2, s = 0) cannot be accepted. Continuing along the row where r = 2,
the null eventually cannot be rejected when n − r − s = 6 − 2 − 2 and s = 2
(Q_{2,2} = 20.653 against c*_{6−2−2} = 20.17). In line with Doornik and Hendry, the Johansen
testing procedure implies that there are r = 2 stationary linear combinations
(cointegrating vectors), n – r – s = 6 – 2 – s = 2, I(1) trends and s = 2, I(2)
trends.
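The sequential logic of these procedures can be sketched as a simple scan over an array of statistics and critical values. The numbers below are hypothetical, and the stopping rule shown is the "scan each row left to right, stop at the first acceptance" variant; it is an illustration of the mechanics rather than either author's exact recipe.

```python
import numpy as np

def select_r_s(stats, crits):
    """stats[r][k], crits[r][k] aligned so column k indexes decreasing
    n - r - s; return the first (row, col) at which the null is accepted."""
    for r, (row_s, row_c) in enumerate(zip(stats, crits)):
        for k, (q, c) in enumerate(zip(row_s, row_c)):
            if q < c:                 # statistic below critical value: accept
                return r, k
    return None

# hypothetical statistics and 5% critical values (three rows, three columns)
stats = [[314.0, 254.2, 199.2], [203.8, 148.4, 124.6], [124.6, 88.2, 15.0]]
crits = [[240.4, 203.1, 174.8], [171.9, 142.6, 117.6], [116.3, 91.4, 20.2]]
print(select_r_s(stats, crits))       # -> (2, 1): first non-rejection
```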
The two test procedures advanced by Johansen (1995a) and Paruolo (1996)
imply that s = 2, but they disagree about the number of cointegrating vectors
and I(1) trends. Johansen (1995a) shows that by progressing from s = 0, 1, 2, 3,
the Q2,2 test has the same optimal properties in the limit as the Johansen test
statistic for cointegration. Furthermore, looking at the Johansen I(2) tests pre-
sented in the table above (Qr,s), when r = 0, 1, 2 the tests are not materially dif-
ferent whatever value n – r – s is selected. Partial confirmation of the
optimality of the test may be observed by comparing values of Q_{r,s}. For the
column headed n − r − s = 3, Q_{0,3} = 44.00, Q_{1,2} = 45.719 and Q_{2,1} = 43.857, and all
these values exceed the 5% critical value (c* = 35.07).
Inspection of the roots of the companion matrix of the VAR is often viewed
as a useful tool in determining the number of unit roots and, as a result, some
idea of the likely number of non-stationary processes driving x_t (Johansen
1995a). The VAR(2) written as a first-order model in state space form is:

x^*_t = \begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} = A_c x^*_{t-1} + \varepsilon^*_t = \begin{bmatrix} A_1 & A_2 \\ I & 0 \end{bmatrix}\begin{bmatrix} x_{t-1} \\ x_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix} = \begin{bmatrix} A_1 x_{t-1} + A_2 x_{t-2} + \varepsilon_t \\ x_{t-1} \end{bmatrix}

or

\begin{bmatrix} A(L)x_t \\ x_{t-1} - x_{t-1} \end{bmatrix} = \begin{bmatrix} x_t - A_1 x_{t-1} - A_2 x_{t-2} \\ x_{t-1} - x_{t-1} \end{bmatrix} = \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}.

Dhrymes (1984) shows that the characteristic roots of the dynamic process
described by the polynomial A(L) can be calculated from the eigenroots of the
companion matrix A_c. The eigenvalues (roots) for the VAR(2) model estimated
above, and for comparison a similar VAR(1), are given in Table 6.2.
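The companion-matrix diagnostic is easily computed. In the sketch below the coefficient matrices A_1 and A_2 are illustrative (not those of the estimated VAR(2)), chosen so that the stacked system has exactly one unit eigenvalue.

```python
import numpy as np

# Stack a VAR(2) as a first-order system and inspect the eigenvalue moduli
# of the companion matrix A_c. These A1, A2 are illustrative only.
A1 = np.array([[1.2, 0.0],
               [0.1, 0.6]])
A2 = np.array([[-0.2, 0.0],
               [ 0.0, 0.1]])
n = A1.shape[0]
Ac = np.block([[A1, A2],
               [np.eye(n), np.zeros((n, n))]])   # companion matrix
moduli = np.sort(np.abs(np.linalg.eigvals(Ac)))[::-1]
print(np.round(moduli, 4))                       # largest modulus is exactly 1
```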
The Australian exchange rate example in Johansen (1991a), summarized in
Johansen (1995a), yields the clear-cut conclusion that there are three unit
roots when n – r = 5 – 2 = 3. By contrast, the VAR(2) case considered here
appears to reveal three roots close to the unit circle, a real root (.9719) and a
complex conjugate pair of roots with modulus (.9001), but, according to the
I(2) test produced by Johansen, n – r = 4. This suggests that detecting the
number of unit roots from the companion matrix is not always straight-
forward. Firstly, a VAR(2) system can be decomposed into two stationary
processes (r = 2), two non-stationary processes (either n – 2 – s = 2 or s = 2) and
a pair of common I(2) or I(1) trends driven by a single unit root. Secondly,
should the roots of the VAR(1) be considered for comparison, then the esti-
mates are quite consistent with the proposition that there are n – r = 4 unit
roots. Analysis associated with both sets of eigenvalues for the two companion
matrices does not appear to support the approach due to Paruolo (1996),
which suggests r = 1 and n – r = 4.
Having found that some of the series are I(2), the usual cointegrating
vectors may not be valid as the stationary linear combinations may require
combinations of I(2) processes that are I(1) to make them stationary or poly-
nomial cointegration. Consider these following suggestions for the long-run
relationships associated with the VAR(2) system developed above. Based on
the findings in Hunter (1992a) and Johansen and Juselius (1992), there are
two cointegrating vectors that accept PPP and UIRP restrictions. The conclu-
sion of the I(2) analysis for PPP is that the series may only be rendered station-
ary when the cointegrating vector is augmented by differences in I(2)
variables. For example, relative movements in the cross-country inflation rates
may be what is required. With s = 2 common I(2) trends driving the price
series (p_0, p_1, p_2), the cointegrating vectors could take the following form:

\beta'x_{t-1} - \kappa'\Delta x_{t-1} = \left(\begin{bmatrix} 0 & 1 & -1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix} - \begin{bmatrix} 0 & 0 & \kappa_{31} & -\kappa_{31} & 0 & 0 \\ \kappa_{12} & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\Delta\right)\begin{bmatrix} p_0 \\ p_1 \\ p_2 \\ e_{12} \\ r_1 \\ r_2 \end{bmatrix}_{t-1}.
A similar type of long run occurs with polynomial cointegration (Engle and
Yoo 1991; Gregoir and Laroque 1993):

\beta'x_{t-1} - \kappa'\Delta x_{t-1} = \begin{bmatrix} 0 & 1 & -1 - \kappa_{31} + \kappa_{31}L & -1 + \kappa_{31} - \kappa_{31}L & \beta_{51} & \beta_{61} \\ \beta_{12} - \kappa_{12} + \kappa_{12}L & 0 & 0 & 0 & 1 & -1 \end{bmatrix} x_{t-1}

where x'_t = [p_{0t}\; p_{1t}\; p_{2t}\; e_{12t}\; r_{1t}\; r_{2t}]. The two forms of I(2) cointegration are equiva-
lent when \beta_{51} = 0, \beta_{61} = 0 and \beta_{12} = 0. Unfortunately, prior to any evaluation
of the long run, the system needs to be identified, but identification of the
type discussed in chapter 4 is considerably more complicated in the I(2) case
as three sets of matrices lack identification.12
Hence, the same likelihood can be defined for (6.6) using the parameters [\alpha, \beta', \tau',
\psi', \kappa', \alpha'_{\perp}] and a transformed set [\alpha^*, \beta^{*\prime}, \tau^{*\prime}, \psi^{*\prime}, \kappa^{*\prime}, \alpha^{*\prime}_{\perp}]. The two sets of parameterizations are
observationally equivalent, and observational equivalence leads to a funda-
mental loss of identification.
Although inflation seemed to be I(1) in the late 1980s and early 1990s the
argument appears less compelling in a world where inflation is predominantly
under control, which suggests that economic and financial time series might
be better described as long-memory.
The estimator is similar to that used by Phillips and Hansen (1990) to estimate
long-run parameters when the series are I(1). The unknown moving average
parameters in J(L) are captured by a frequency domain estimator, which also
appears to compare well with Phillips and Hansen (1990) when the series are
I(1) (Marinucci and Robinson 2001). Although there is evidence that this type
of approach is able to estimate long-run parameters when r is known or not
large, the method, though efficient in calculating well-known long-run rela-
tionships, does not provide a formal test of the proposition that either frac-
tional or integer integrated series are cointegrated. The method can determine
the extent to which the variables in the regression are related by determining
whether the estimated long-run coefficient is significant or not. Clearly, any such conclusion is conditional
on the appropriateness of this normalization.
Robinson and Yajima have attempted to determine the order of integration
and cointegration by two different methods. They consider three different
crude oil prices (WTI, Dubai and Brent). Based on an Augmented Dickey–
Fuller test with an intercept, the three series are found to be stationary at the
5% level of significance. But when the order of difference is assumed to be
fractional, the estimates of d for the three series are [.5336, .4367, .4538].5
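Estimates of this kind are typically produced by log-periodogram (GPH-type) regression. The sketch below simulates fractional noise with d = 0.4 by truncating the MA expansion of (1 − L)^{−d} and regresses the log periodogram on −2 log λ_j over a narrow band; the slope estimates d. The bandwidth rule and the truncation length are assumptions of the illustration, not choices made in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d_true, T, K = 0.4, 5000, 1000

# MA coefficients of (1 - L)^{-d}: psi_k = psi_{k-1} (k - 1 + d) / k
psi = np.ones(K)
for k in range(1, K):
    psi[k] = psi[k - 1] * (k - 1 + d_true) / k

eps = rng.standard_normal(T + K)
x = np.convolve(eps, psi, mode="valid")[:T]   # truncated fractional noise

m = int(T ** 0.5)                             # bandwidth: number of frequencies
j = np.arange(1, m + 1)
lam = 2 * np.pi * j / T
I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2 * np.pi * T)   # periodogram
d_hat = np.polyfit(-2 * np.log(lam), np.log(I), 1)[0]       # slope estimates d
print(round(d_hat, 2))                        # typically near d_true
```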
Robinson and Yajima (2002) suggest two approaches to the problem of
selecting the cointegrating rank, but they use one of them in their example.
Consider the Vector Auto-Regressive Fractionally Integrated Moving Average
(VARFIMA) model:
E(L)x_t = C(L)\varepsilon_t

where E(L) = diag[(1 - L)^{d_1}, (1 - L)^{d_2}, \ldots, (1 - L)^{d_n}].6 The series are ordered on the
basis of the prior estimate of the difference order. The test is based as is usually
the case on the rank of the matrix C(1), which, under conventional cointegra-
tion, has rank n – r associated with the extent to which there is any over-
differencing. The test, as is the case with integer cointegration, progresses
from the most restricted model, where C(1) has full rank, n – r = n and r = 0,
there is no cointegration to the cointegration cases, r = 1, 2, 3. The test for
fractional cointegration is:
H i : rank(G ) = rank(C(1)) = n − r
1
where G = C(1)C(1)′ .
2
To make the test operational, Robinson and Yajima use the following non-
parametric estimator of G:

\hat G = \frac{1}{m_1}\sum_{j=1}^{m_1}\mathrm{Re}\{\hat\Lambda(\lambda_j)^{-1}I(\lambda_j)\hat\Lambda(\lambda_j)^{-1}\},

where I(\lambda_j) = w(\lambda_j)w(\lambda_j)', w(\lambda_j) = (w_1(\lambda_j), w_2(\lambda_j), \ldots, w_n(\lambda_j))', Re{·} is the real component,

\hat\Lambda(\lambda_j) = \mathrm{diag}(\lambda_j^{d^*}e^{i\pi d^*/2}, \ldots, \lambda_j^{d^*}e^{i\pi d^*/2}), \quad \lambda_j = \frac{2\pi j}{T}, \quad m_1 < \frac{T}{2}.

It has been assumed that d_a is replaced by a pooled estimate d^* = (\hat d_1 + \hat d_2 + \hat d_3)/3 and

w_a(\lambda_j) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T}x_{at}e^{it\lambda_j}

is the discrete Fourier transform of the original data. The effec-
tive bandwidth m_1 is set to increase at a faster rate than m to counteract the
effect of using an estimate of d_a. Robinson and Yajima (2002) provide estimates
of G evaluated with m = 13 and m_1 = 15:

\hat G = \begin{bmatrix} .00493 & .00542 & .00575 \\ .00542 & .00625 & .00653 \\ .00575 & .00653 & .00730 \end{bmatrix},
where Ĝ has the following eigenvalues [.01807, .000275, .000124]. The most
important eigenvector is associated with the largest root, which given that the
other two roots are small suggests that n – r = 1 or with n = 3 variables then
there are r = 2 cointegrating relationships. Robinson and Yajima (2002)
proceed to analyze the case where the three series have two distinct orders of
differencing. This suggests that the WTI oil price series is handled differently
than that for Brent and Dubai. Once Brent and Dubai crude prices are consid-
ered together with two types of difference, the reduced rank calculation is
applied to a 2 × 2 sub-matrix, which from the obvious rank deficiency in Ĝ
above implies r = 1.
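The rank deficiency in Ĝ can be checked directly from its eigenvalues: with one dominant root and two near-zero roots, n − r = 1 and hence r = 2. A minimal sketch (the 10% threshold for classifying an eigenvalue as "small" is an arbitrary illustrative choice, not the authors' formal criterion):

```python
import numpy as np

# Eigenvalues of the estimated G matrix reported above; near-zero roots
# signal rank deficiency, i.e. cointegration among the three prices.
G = np.array([[.00493, .00542, .00575],
              [.00542, .00625, .00653],
              [.00575, .00653, .00730]])
eig = np.sort(np.linalg.eigvalsh(G))[::-1]
n_r = int(np.sum(eig > 0.1 * eig[0]))   # eigenvalues that are not "small"
r_hat = len(G) - n_r                    # implied cointegrating rank
print(np.round(eig, 6), "r =", r_hat)
```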
To obtain an expression for x_t, which is to be the object of the forecast, sum both sides of
(6.18) from i = 1, …, t to give

x_t - x_0 = \sum_{i=1}^{t}C(L)\varepsilon_i

and so

x_t = \sum_{i=1}^{t}\sum_{j=0}^{i-1}C_j\varepsilon_{i-j}. \quad (6.19)
Shifting the time index to t + h and redefining the index on the last summation to emphasize that it contains
terms in the disturbances beyond t only gives

x_{t+h} = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s}C_r\varepsilon_s + \sum_{q=1}^{h}\sum_{r=0}^{h-q}C_r\varepsilon_{t+q}. \quad (6.20)
Equation (6.20) expresses xt+h as the sum of two terms that partition the dis-
turbances between those occurring up to and including time t, and later
values.
The forecast of xt+h based on information available at time t is the expected
value of xt+h given the information, and is denoted xt+h|t. In this context, h is
known as the forecast horizon and t is called the forecast origin. Using the fact
that the conditional expectation of a future disturbance term is zero, and the
conditional expectation of any current or past value is the expectation of a
realized value, from (6.20),
x_{t+h|t} = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s}C_r\varepsilon_s. \quad (6.21)
This does not yet establish that the forecasts are linearly related. The require-
ment for this is for there to exist a linear combination of the forecasts that is
zero (in the absence of deterministic terms). That is, there must exist an n × 1
vector \beta such that \beta'x_{t+h|t} = 0. From (6.21), a sufficient condition for this is
that

\beta'\sum_{r=0}^{t+h-s}C_r = 0 \quad \text{for each } s = 1, \ldots, t.
But this does not follow from the properties of the VMA, as it requires each of
the finite partial sums to be annihilated individually. However, note that

\lim_{h\to\infty}\sum_{r=0}^{t+h-s}C_r = \sum_{r=0}^{\infty}C_r = C(1), \quad (6.22)
and define what can be called the long-run forecast, x_{\infty|t}, as:

x_{\infty|t} = \lim_{h\to\infty}x_{t+h|t}. \quad (6.23)

Then

x_{\infty|t} = \lim_{h\to\infty}\sum_{s=1}^{t}\sum_{r=0}^{t+h-s}C_r\varepsilon_s = \sum_{s=1}^{t}\left(\lim_{h\to\infty}\sum_{r=0}^{t+h-s}C_r\right)\varepsilon_s = C(1)\sum_{s=1}^{t}\varepsilon_s.
The long-run forecast therefore follows a linear combination of the realized
value of a vector stochastic trend. But rank(C(1)) = n − r, and so there exist r
linearly independent vectors, that is the cointegrating vectors, \beta, such that
\beta'C(1) = 0. Therefore:

\beta'x_{\infty|t} = \beta'C(1)\sum_{s=1}^{t}\varepsilon_s = 0, \quad (6.24)

and so

\beta'x_{t+h|t} = -\beta'\sum_{s=1}^{t}\sum_{r=t+h-s+1}^{\infty}C_r\varepsilon_s.
The forecast error is e_{t+h|t} = x_{t+h} - x_{t+h|t}; from (6.20) and (6.21),

e_{t+h|t} = \sum_{q=1}^{h}\sum_{r=0}^{h-q}C_r\varepsilon_{t+q}, \quad (6.25)

so that

\mathrm{var}(e_{t+h|t}) = \sum_{q=1}^{h}\left(\sum_{r=0}^{h-q}C_r\right)\Omega\left(\sum_{r=0}^{h-q}C_r\right)',

where \Omega = E(\varepsilon_t\varepsilon'_t), for all t. That is, the forecast error variance grows with h.
Interestingly, it is also the case that the forecast errors are cointegrated, with
precisely the same time series structure as the original process, xt, under the
condition that all forecasts are made using the same information, that avail-
able at time t. To see this use (6.25) to construct the forecast error difference
process
\Delta e_{t+h|t} = e_{t+h|t} - e_{t+h-1|t}

= \sum_{q=1}^{h}\sum_{r=0}^{h-q}C_r\varepsilon_{t+q} - \sum_{q=1}^{h-1}\sum_{r=0}^{h-1-q}C_r\varepsilon_{t+q}

= C_0\varepsilon_{t+h} + \sum_{q=1}^{h-1}\left(\sum_{r=0}^{h-q}C_r - \sum_{r=0}^{h-q-1}C_r\right)\varepsilon_{t+q}

= C_0\varepsilon_{t+h} + \sum_{q=1}^{h-1}C_{h-q}\varepsilon_{t+q} = \sum_{q=1}^{h}C_{h-q}\varepsilon_{t+q} = \sum_{q=1}^{h}C_{h-q}\varepsilon_{t+h-(h-q)}

= \sum_{k=0}^{h-1}C_k\varepsilon_{t+h-k} = C(L)\varepsilon_{t+h}, \quad \varepsilon_q = 0,\; q \le t,
where the initial values are now relative to the forecast origin, and consistent
with the original VMA, have been set to zero. Thus
\Delta e_{t+h|t} = C(L)\varepsilon_{t+h}
Further Topics 177
and hence, from the original VMA, all h-step ahead forecast errors are cointe-
grated of order (1,1). That is, the difference between the h-step ahead and the
h – 1-step ahead forecast errors, both made conditional on information avail-
able at time t, is stationary, but the sequence of h-step ahead forecast errors,
for h = 1,2, …, is I(1).
An intuition for the non-stationarity of the forecast error can be provided
by expressing a future value of the process as a sum of the forecast and the
forecast error,
x_{t+h} = x_{t+h|t} + e_{t+h|t}. \quad (6.26)
Since x_{t+h|t} depends only on realized values (the disturbance values at time t
and before), it is non-stochastic given the information available at t. Thus the stochastic non-stationarity proper-
ties of x_{t+h} and e_{t+h|t} must be the same, so they must both be integrated of order
1. Applying the initial value condition \varepsilon_q = 0, q \le t, to equation (6.26) gives
x_{t+h|t} = x_{\infty|t} and hence:

x_{t+h} = x_{\infty|t} + e_{t+h|t}. \quad (6.27)

Pre-multiplying (6.27) by \beta', the left-hand side is I(0) from the VMA, and therefore so is \beta'e_{t+h|t};
hence e_{t+h|t} is CI(1,1).
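Both features — the forecast-error variance grows with h, yet the cointegrating combination of the errors remains bounded — can be verified numerically for the simplest cointegrated system x_t = (I + αβ′)x_{t−1} + ε_t, for which the cumulated MA weights satisfy Σ_{r=0}^{k} C_r = (I + αβ′)^k. The α and β below are illustrative values, not estimates from the text.

```python
import numpy as np

alpha = np.array([[-0.5], [0.25]])
beta = np.array([[1.0], [-1.0]])
A = np.eye(2) + alpha @ beta.T      # levels VAR(1): x_t = A x_{t-1} + eps_t
Omega = np.eye(2)

def fe_var(h):
    # var(e_{t+h|t}) = sum_{k=0}^{h-1} A^k Omega (A^k)'
    V, Ak = np.zeros((2, 2)), np.eye(2)
    for _ in range(h):
        V += Ak @ Omega @ Ak.T
        Ak = Ak @ A
    return V

tr = [np.trace(fe_var(h)) for h in (1, 10, 50)]            # grows with h
bv = [(beta.T @ fe_var(h) @ beta).item() for h in (1, 10, 50)]  # bounded
print(np.round(tr, 2), np.round(bv, 3))
```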
where, again, \Pi = \alpha\beta' with \alpha and \beta dimensioned n × r. Following Lin and Tsay
(1996), in order to understand how the forecasts from (6.28) have the same
long-run properties as the series themselves, note that \Delta x_t is I(0), and that
forecasts of a stationary series converge to the expected value of the process as
the forecast horizon tends to infinity. That is

\lim_{h\to\infty}\Delta x_{t+h|t} = \overline{\Delta x} \quad (6.30)

where \overline{\Delta x} = E(\Delta x_t). The properties of the forecasts of the difference process are
used to obtain those of the levels via the VECM. Using (6.29), the h-step ahead
forecast equation for the difference process is
\Delta x_{t+h|t} = \Pi x_{t+h-1|t} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t+h-i|t}. \quad (6.31)
In order to derive the properties of the long-run forecasts, take the limit of
(6.31) as h → ∞, and substitute from (6.30) to give
\overline{\Delta x} = \lim_{h\to\infty}\left[\Pi x_{t+h-1|t} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t+h-i|t}\right].

Rearranging, and using the notation of (6.23) for the long-run forecast of the
level,

\Pi x_{\infty|t} = \left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\overline{\Delta x}. \quad (6.32)
The right-hand side of (6.32) is a constant matrix, and so shows that the long-
run forecasts, x_{\infty|t}, are tied together. The analysis can be taken further to com-
plete the analogy with equation (6.24) for the VMA case. Pre-multiplying
(6.32) by \alpha' and replacing \Pi by \alpha\beta' gives

(\alpha'\alpha)\beta'x_{\infty|t} = \alpha'\left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\overline{\Delta x}

where (\alpha'\alpha) is non-singular, so that

\beta'x_{\infty|t} = (\alpha'\alpha)^{-1}\alpha'\left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\overline{\Delta x}.

This is directly comparable with (6.24) (except that in (6.24) initial values have
been set to zero), and shows that each cointegrating vector constitutes a con-
straint on the long-run forecasts.
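A minimal sketch of the constraint: iterating the forecast recursion of a single-lag VECM with no deterministic terms drives β′x_{t+h|t} to zero, so the long-run level forecasts are tied together even though each individual level forecast is non-zero. The parameter values are illustrative.

```python
import numpy as np

alpha = np.array([[-0.5], [0.25]])
beta = np.array([[1.0], [-1.0]])
A = np.eye(2) + alpha @ beta.T           # forecast recursion matrix

x = np.array([3.0, 1.0])                 # forecast origin: beta'x = 2
combo = []
for h in range(60):
    x = A @ x                            # x_{t+h|t} = (I + alpha beta') x_{t+h-1|t}
    combo.append((beta.T @ x).item())    # cointegrating combination of forecasts
print(round(combo[0], 3), round(combo[-1], 6))   # decays towards zero
```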
The forecasts may equivalently be generated from the levels VAR

x_t = \sum_{i=1}^{p}\hat A_i x_{t-i} + e_t, \quad (6.33)

where \hat A_1 = I_n + \hat\Pi - \hat\Gamma_1, \hat A_i = \hat\Gamma_{i-1} - \hat\Gamma_i for i = 2, \ldots, p-1, \hat A_p = \hat\Gamma_{p-1},
and x_{t+h-i|t} = x_{t+h-i} for h ≤ i. If r and \beta are unknown, they may be replaced by
values \hat r and \hat\beta estimated using the Johansen procedure. This is the approach
used by Lin and Tsay (1996).
The order of the forecasting VAR in (6.33), and that used for the Johansen
pre-whitening, should be the same, determined, for example, using an infor-
mation criterion, such as the Schwarz (SIC) (see Reimers 1992; Lütkepohl
1991). Otherwise, as was explained in section 4.3.3, programs such as PCGIVE
provide systems and single-equation diagnostic tests for each equation in the
VAR (Doornik and Hendry 2001).
The details of information criteria vary according to the weight put on addi-
tional parameters, but they are generally of the form
\mathrm{IC} = \ln\left|T^{-1}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon'_t\right| + m f(T), \quad (6.34)

where m is the number of freely estimated parameters and f(T) is the penalty function.
(i) How useful is the long-run information in providing long but finite time
horizon forecasts?
These issues are discussed by Clements and Hendry (1995, 1998), Lin and Tsay
(1996) and Engle and Yoo (1987), among others. The three studies report
Monte Carlo results; their findings are summarized below.
Clements and Hendry introduce another issue, which is the form of the process used to compare
forecasts: the levels, the differences, or the stationary combinations. The last
of these representations is obtained by transforming the model to one in
terms of the cointegrating combinations and the differenced common trends.
Thus, the number of processes is unaltered, and their integration and cointe-
gration properties preserved. Their notation for the I(0) variables is w_t, where
w'_t = ((\beta'x_t)' : (J'\Delta x_t)'). Consider the partition \beta' = (\beta'_a : \beta'_b) with \beta_a dimensioned
r × r and \beta'_b dimensioned r × (n − r), and defining

J' = (0 \;\; I_{n-r}) \quad \text{and} \quad Q = (\beta \;\; J)',

the representation is

w_t = G w_{t-1} + \nu_t \quad (6.35)

where

G = \begin{bmatrix} I_r + \beta'\alpha & 0 \\ J'\alpha & 0 \end{bmatrix} \quad \text{and} \quad \nu_t = \begin{bmatrix} \beta' \\ J' \end{bmatrix}\varepsilon_t.
Clements and Hendry produce forecasts of x_t and \Delta x_t using each of the four
estimation methods, UVAR, ML, EG and DV. These primary forecasts are
transformed to produce forecasts of each of x_t, \Delta x_t and w_t. That is, each fore-
cast is one of x_t or \Delta x_t initially, but all are transformed (as necessary) into x_t,
\Delta x_t and w_t. The purpose of the exercise is to emphasize that the superiority of
one forecast method over another depends not only on what model is used to
produce the forecast, but also on what properties of the forecast are being
compared.
In particular, in comparing EG and UVAR to forecast xt, the level of the
process, the importance of the imposition of a valid long-run restriction is
examined. But the question then arises as to whether it matters that the
restriction is specifically a long-run restriction. In other words, are the advan-
tages available from the imposition of correct restrictions markedly different
in a non-stationary cointegrated environment compared to a stationary one?
The way to get at this issue is to transform the forecasts to stationarity before
comparing them, effectively filtering out long-run variation. The appropriate
transformation is that of equation (6.35), applied to the forecasts. This proce-
dure is only available in the context of simulations (using parameter values
from the DGP), since the UVAR, by its very nature, brings with it no estima-
tion of the cointegrating combinations. It is still the case that the forecasts
differ in the method of their production, but are now being compared on a
more appropriately matched basis – that is, in stationary terms. If relative
forecasting performance is different in stationary space, then it suggests that
the long-run nature of the restrictions is relevant in determining forecast
behaviour.
If it is the long run nature of the restrictions that improve the long-run fore-
casts, then direct comparisons of the forecasts of the level of the process
where the restrictions are, and are not imposed, should favour the forecasts
made subject to the restrictions. However, if the long-run components are
removed prior to comparison, these transformed forecasts should not differ
significantly. Equation (6.35) is a very useful device for decomposing the
causes of relative forecast behaviour.
In their simplest case (among 13 parameterizations), Clements and Hendry
generate data according to a bivariate VECM with a single lag,

\Delta x_t = \alpha\beta'x_{t-1} + \varepsilon_t.
(i) When the system is stationary the long-run forecasts approach a constant
quickly as the forecast horizon increases. (The size of the forecast errors,
in terms of their variance is also relatively small.)
(ii) If the system is stationary, then under-specifying the rank of the long-run
matrix leads to under-performance. That is, imposing long-run restric-
tions that do not exist in practice (which are not valid) damages long-run
forecast performance. The more of these there are, the worse the perfor-
mance of the forecasts.
(iii) Unless the system is very close to non-stationarity (the near non-
stationary DGP is model 3), correct specification of the cointegrating rank
is best.
(iv) Under-specification of the cointegrating rank is not serious if the
processes concerned are non-stationary. This should be contrasted with
the stationary case, where, although cointegration is not defined, the
rank of the long-run matrix still is, and where this is under-specified,
there is a deterioration in forecast performance.
Figure 6.1 Forecasting performance from Lin and Tsay study, by model
Lin and Tsay control carefully for the roots of the processes involved; only
their cointegrated structure displays common features, in this case of the unit
root. All the other models are diagonal, meaning that, in the case of model 3
for example, although there are roots very close to being unit roots, they do
not constitute a common feature. For this to be so, the VAR lag operator
evaluated at that root would have to have reduced rank without being the
zero matrix; diagonality results in its being zero (Burke 1996).9 Model 3 also
has the interesting property that the quality of forecasts is least affected by the
choice of (cointegrating) rank.

Figure 6.2 Lin and Tsay results, all models, rank 2 system
By grouping these results differently, a further conclusion can be made.
Instead of looking at the results by model and varying the cointegrating rank
imposed, it is possible to fix the imposed cointegrating rank, and see which
model is easiest or hardest to forecast for that restriction. Figure 6.2 demon-
strates the case for the imposition of rank 2, which is correct for model 4. It is
immediately obvious that, using the trace measure (see Forecast Evaluation
below), the cointegrated system is the hardest to forecast at medium and
long horizons. It is even harder to forecast than the non-stationary non-
cointegrated case.10 In fact, no matter what cointegrating rank is imposed
(0 to 4), the cointegrated system is the most difficult to forecast, in the sense
that it has the largest trace statistic. However, it remains the case that, if the
system is cointegrated, it is best to impose the appropriate cointegrating rank
(figure 6.1d).11
These forecast comparisons are more limited since they are compared in
levels terms only. Clements and Hendry demonstrate that once transformed
to stationarity, there is much less difference between forecasts based on differ-
ent procedures. It is not clear from Lin and Tsay if the same transformation
would result in less obvious distinctions between the forecasts based on the
imposition of different cointegrating ranks at the estimation stage. Broadly
speaking, the extension to the multivariate case is not found to undermine
the findings of Clements and Hendry for the bivariate case. However, the
four-variable setting makes it even more difficult to generalize the findings,
and the multiplicity of possible cases should lead to reticence when interpret-
ing the results in a wider setting.
In order to reduce the impact of such criticisms, Lin and Tsay present two
real data examples, one financial and one macroeconomic. They observe that
the problem of roots close to the unit circle, but not actually being unit roots,
is observable in data (that is, similarity to model 2, or, more extremely, model 3).
In such circumstances, the under-specification of the rank (imposing unit
roots that are not present) can be expected to result in poor long term fore-
casts.12 Secondly, they observe that forecast error variances from a stationary
system converge fairly rapidly as a function of forecast horizon. This is used to
explore the stationarity of a system of bond yields. In this case, the unit root
and cointegration tests performed suggest cointegration. This could be a case
where the process is near non-stationary, and with a common feature, but the
common feature is a root close to, but not on, the unit circle. It is clear from
their investigations that, at a practical level, cointegrating restrictions cannot
be assumed to improve long term forecasts, even where there is within-sample
statistical evidence to support them.
One of the measures used by Clements and Hendry, and the one relevant to
most of the results reported above, is

T(j) = \mathrm{trace}\left[\frac{1}{K}\sum_{k=1}^{K}\hat\Omega_{k,t}(j)\right],
which is referred to as the trace mean-square forecast error (TMSFE). Lin and
Tsay use a modified version of this criterion since each replication gives rise to
a set of j–step ahead forecasts, as a result of rolling forward the forecast origin
within the same replication. They construct a within replication estimate of
the forecast error variance–covariance matrix as
\hat\Omega_k(j) = \frac{1}{100 - j + 1}\sum_{t=300}^{400-j}\hat\Omega_{k,t}(j)

and the corresponding summary measure is

E(j) = \mathrm{trace}\left[\frac{1}{K}\sum_{k=1}^{K}\hat\Omega_k(j)\right]. \quad (6.36)
Clements and Hendry (1998) discuss the choice of criterion, and use others in
addition to TMSFE. An important aspect of these is their sensitivity to linear
transformations of the data, although extensive use continues to be made of it.
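The criterion itself is straightforward to compute. The sketch below averages outer products of synthetic j-step forecast errors over replications and takes the trace; the error distribution is a placeholder with known variances 1 and 4, so the statistic should settle near 5.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n = 500, 2

def tmsfe(errors):
    """errors: (K, n) array of j-step forecast errors, one per replication;
    returns the trace of the averaged error variance-covariance matrix."""
    V = sum(np.outer(e, e) for e in errors) / len(errors)
    return np.trace(V)

errors = rng.standard_normal((K, n)) * np.array([1.0, 2.0])  # variances 1 and 4
stat = tmsfe(errors)
print(round(stat, 2))   # close to 1 + 4 = 5
```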
Imposing the reduced rank restriction

\Pi_{n\times n} = \alpha_{n\times r}\,\beta'_{r\times n},

such that there are only 2nr parameters of \Pi to be freely estimated, the infor-
mation criterion is therefore of the form of (6.34) with m = (p − 1)n² + 2nr, the
selected model being that for which the criterion is minimized over a grid of
values of p and r = 0, 1, …, n (the upper limit on the range of r allowing for sta-
tionarity).
tionarity). The evidence on the appropriate form of the penalty term, f(T), is
mixed (Reimers 1992), and while SIC can dominate, relative performance
depends on simulation design. In practice, it is best to compute a range of crite-
ria and search for corroborative evidence amongst them as to model order and
cointegrating rank, and, if there is significant deviation in the findings, to check
that subsequent inferences are not sensitive across the set of models suggested.13
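The joint search over lag order and rank can be sketched as follows. Only the penalty m = (p − 1)n² + 2nr comes from the text; the ln|Σ̂| values are hypothetical placeholders standing in for fitted models.

```python
import numpy as np

def ic(logdet_sigma, p, r, n, T, f):
    # information criterion of the form (6.34) with m = (p - 1) n^2 + 2 n r
    m = (p - 1) * n ** 2 + 2 * n * r
    return logdet_sigma + m * f(T)

n, T = 4, 200
sic_penalty = lambda T: np.log(T) / T          # Schwarz-type penalty

# hypothetical ln|Sigma-hat| for each (p, r) pair, p = 1, 2 and r = 0, ..., 4
logdet = {(1, 0): 1.90, (1, 1): 1.60, (1, 2): 1.52, (1, 3): 1.50, (1, 4): 1.49,
          (2, 0): 1.70, (2, 1): 1.40, (2, 2): 1.38, (2, 3): 1.37, (2, 4): 1.37}
best = min(logdet, key=lambda pr: ic(logdet[pr], *pr, n, T, sic_penalty))
print(best)   # -> (1, 1): the criterion trades fit against the 2nr penalty
```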
Lin and Tsay (1996) point out that a model should be selected (and estimated)
according to its purpose. In their paper they develop the idea that if the objec-
tive of the model is to forecast at a long-term forecast horizon, then it should be
optimized to do this. Since standard methods of estimation and the form of
information criteria are based on one step-ahead errors, it would not be surpris-
ing that such models were sub-optimal in terms of, say, 50-step ahead forecasts.
where Q_0 = (1 + \rho)K + H and Q_1 = \rho K.
Consider the process when it approaches its terminal value (at T* = T + N).
Simplifying (6.40):

E\left(\rho^{-\frac{1}{2}T^*}Q_0 y_{T^*} - \rho^{-\frac{1}{2}(T^*+1)}Q_1 y_{T^*+1} - \rho^{-\frac{1}{2}(T^*+1)}Q'_1 y_{T^*-1} - \rho^{-\frac{1}{2}T^*}HAz_{T^*}\,\middle|\,\Omega_t\right) = 0. \quad (6.41)

Re-defining (6.41) in terms of y^*_{T^*} = \rho^{-\frac{1}{2}T^*}y_{T^*} and z^*_{T^*} = \rho^{-\frac{1}{2}T^*}z_{T^*} gives rise to
the symmetric solution:

E\left(\rho^{-\frac{1}{2}}Q_0 y^*_{T^*} - Q_1 y^*_{T^*+1} - Q'_1 y^*_{T^*-1} - \rho^{-\frac{1}{2}}HAz^*_{T^*}\,\middle|\,\Omega_t\right) = 0. \quad (6.42)
In the limit (6.42) is bounded when the roots of the processes driving z_t and y_t
are of mean order less than \rho^{-\frac{1}{2}}, as:

\lim_{T^*\to\infty}E(y^*_{T^*+1}\,|\,\Omega_t) \to 0 \quad \text{and} \quad \lim_{T^*\to\infty}E(z^*_{T^*+1}\,|\,\Omega_t) \to 0.
Notice that (6.42) is bounded even when y and z have univariate time
series representations that are non-stationary. Now consider the cointegration
case. Dividing (6.38) by \rho^{\frac{1}{2}t} and transforming yields an error correction
representation:

E(-K\Delta y_{t+1} + \rho K\Delta y_t + H(y_t - Az_t)\,|\,\Omega_t) = 0. \quad (6.43)

From the above discussion, a regular solution (see Pesaran 1987) to (6.42)
exists if and only if: (a) Q_0 is symmetric; (b) K is non-singular; and (c) the roots
are of mean order less than \rho^{-\frac{1}{2}}.
Dividing through (6.38) by \rho^{\frac{1}{2}t} yields a difference equation that may be redefined
using the forward (L^{-1}) and backward (L) lag operators as:

Q(L)E(y_t\,|\,\Omega_t) = HAE(z_t\,|\,\Omega_t), \quad (6.46)

where G_1 = \rho F, F = P\Lambda P^{-1} and \Lambda is a matrix whose diagonal elements are the
stable eigenroots of the system. The solution follows Sargent (1978), where R_o = ((F - I) + \rho F^{-1} - I) and M_t satisfies the martingale
property E(M_{t+1}\,|\,\Omega_t) = G_1 M_t (Pesaran 1987).
Further Topics 191
(I − G₁L⁻¹)(y_t − Fy_{t−1} − u_t)

= (I − G₁L⁻¹)(Σ_{s=0}^∞ (G₁)^s F E(R₀Az_{t+s} | Ω_t) + (G₁)^{−t}M_t)

= Σ_{s=0}^∞ (G₁)^s F E(R₀Az_{t+s} | Ω_t) − G₁Σ_{s=0}^∞ (G₁)^s F E(R₀Az_{t+s+1} | Ω_{t+1})
+ (I − G₁L⁻¹)(G₁)^{−t}M_t.
The first two terms on the right-hand side simplify, while the Koyck operator
annihilates the bubble behaviour. Therefore:
(I − G₁L⁻¹)(y_t − Fy_{t−1} − u_t)

= FR₀Az_t + Σ_{s=1}^∞ (G₁)^s F E(R₀Az_{t+s} | Ω_t) − Σ_{s=1}^∞ (G₁)^s F E(R₀Az_{t+s} | Ω_{t+1})

= FR₀Az_t + Σ_{s=1}^∞ (G₁)^s F(E(R₀Az_{t+s} | Ω_t) − E(R₀Az_{t+s} | Ω_{t+1})).
Assuming that there are no bubbles and a forcing process z_t = B(L)w_t (w_t is
white noise), then:

E(z_{t+s} | Ω_t) − E(z_{t+s} | Ω_{t+1}) = −B_{s−1}w_{t+1}
and
(I − G₁L⁻¹)(y_t − Fy_{t−1} − u_t) = FR₀Az_t − Σ_{s=1}^∞ (G₁)^s FR₀(AB_{s−1}w_{t+1})

= FR₀Az_t − FR₀(Σ_{s=1}^∞ (G₁)^s AB_{s−1})w_{t+1}.
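The forecast-revision identity used above can be checked numerically for a finite MA process; the lag coefficients and innovation draws below are illustrative, not from the text.

```python
import numpy as np

# Check of E(z_{t+s}|O_t) - E(z_{t+s}|O_{t+1}) = -B_{s-1} w_{t+1} for a
# scalar finite MA process z_t = sum_j B_j w_{t-j}.
rng = np.random.default_rng(1)
B = np.array([1.0, 0.7, 0.4, 0.2])          # B_0, ..., B_3 (illustrative)
q = len(B) - 1
w = rng.standard_normal(30)                 # innovation draws
t = 25                                      # forecast origin

def cond_exp(s, t_info):
    """E(z_{t+s} | innovations observed up to index t_info)."""
    total = 0.0
    for j in range(q + 1):                  # z_{t+s} = sum_j B_j w_{t+s-j}
        if t + s - j <= t_info:             # unobserved innovations have mean zero
            total += B[j] * w[t + s - j]
    return total

revisions = [cond_exp(s, t) - cond_exp(s, t + 1) for s in range(1, q + 1)]
expected = [-B[s - 1] * w[t + 1] for s in range(1, q + 1)]
```

Only the innovation w_{t+1} is added to the information set, and it enters the moving average for z_{t+s} with coefficient B_{s−1}, which is exactly what the identity records.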
Now reversing the Koyck lead and setting Σ_{s=1}^∞ (G₁)^s AB_{s−1} = D gives rise to a
forward-looking representation, which depends on future values of z_t.
Therefore:
Δy_t = [Π₁₁ : Π₁₂](y′_{t−1} : z′_{t−1})′ + ε₁_t (6.52)

Δz_t = [Π₂₁ : Π₂₂](y′_{t−1} : z′_{t−1})′ + ε₂_t, (6.53)

Δy_t = [Π₁₁ : Π₁₂](y′_{t−1} : z′_{t−1})′ + ε₁_t (6.54)

Δz_t = ε₂_t, (6.55)
where Π = [Π′₁. : Π′₂.]′ and ε₂_t = C(L)w_t. Notice that inference on the short-
run parameters is not appropriate, as the coefficients of the ARMA error
process forcing y_t depend on the MA process forcing ε₁_t. It follows that the
cointegrating relations are defined in the equations for yt. Now consider the
= (I − F)(Σ_{s=0}^∞ (G₁)^s E(Az_{t+s} | Ω_t) − Σ_{s=0}^∞ (G₁)^{s+1} E(Az_{t+s} | Ω_t))

= (I − F){Az_{t−1} + Σ_{s=0}^∞ (G₁)^s E(AΔz_{t+s} | Ω_t)}. (6.56)
Now it follows from the results in Engsted and Haldrup (1997) that (6.56) has
an error correction type representation in differences and levels. Furthermore:
Δy_t + (I − F)y_{t−1} − u_t = (I − F){Az_{t−1} + Σ_{s=0}^∞ (G₁)^s E(AΔz_{t+s} | Ω_t)}

Δy_t + (I − F){y_{t−1} − Az_{t−1}} − u_t = (I − F)Σ_{s=0}^∞ (G₁)^s E(AΔz_{t+s} | Ω_t).
= (I − F){Σ_{s=0}^∞ (G₁)^s E(AΔz_{t+s} | Ω_t) − G₁Σ_{s=0}^∞ (G₁)^s E(AΔz_{t+s+1} | Ω_{t+1})}
or
(I − G₁L⁻¹)(y_t − Fy_{t−1} − (I − F)Az_{t−1} − u_t)

= (I − F)AΔz_t + (I − F)Σ_{s=1}^∞ (G₁)^s (E(AΔz_{t+s} | Ω_t) − E(AΔz_{t+s} | Ω_{t+1})).
It follows from the Granger representation theorem that z_t has the following
Wold form Δz_t = C(L)w_t and

= (I − F)AΔz_t − (I − F)(Σ_{s=1}^∞ (G₁)^s AC_{s−1})w_{t+1}.
Now reversing the Koyck lead and setting Σ_{s=1}^∞ (G₁)^s AC_{s−1} = D*, gives rise to
a forward-looking representation, which depends on future values of z_t:

y_t − Fy_{t−1} − (I − F)Az_{t−1} − u_t
= (I − G₁L⁻¹)⁻¹((I − F)AΔz_t − (I − F)D*w_{t+1})
= Σ_{s=0}^∞ (G₁)^s ((I − F)AΔz_{t+s} − (I − F)D*w_{t+s+1}).
Now decompose the last relationship as follows:

y_t − Fy_{t−1} − (I − F)Az_{t−1} − u_t
= Σ_{s=0}^∞ (G₁)^s (I − F)Az_{t+s} − Σ_{s=0}^∞ (G₁)^s (I − F)Az_{t+s−1} − Σ_{s=0}^∞ (G₁)^s (I − F)D*w_{t+s+1}.
Therefore:

y_t − Fy_{t−1} − u_t = Σ_{s=0}^∞ (G₁)^s (I − F)Az_{t+s} + (I − F)Az_{t−1} − (I − F)Az_{t−1}
− Σ_{s=1}^∞ (G₁)^s (I − F)Az_{t+s−1} − Σ_{s=0}^∞ (G₁)^s (I − F)D*w_{t+s+1}.
Re-indexing the second sum and gathering terms yields a levels relationship:

y_t − Fy_{t−1} − u_t = (I − F)(I − G₁)Σ_{s=0}^∞ (G₁)^s Az_{t+s} − Σ_{s=0}^∞ (G₁)^s (I − F)D*w_{t+s+1}.
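The re-indexing step rests on the following identity (with the factor (I − F) carried through each term), while the two ±(I − F)Az_{t−1} terms cancel:

```latex
\sum_{s=0}^{\infty} (G_1)^s A z_{t+s} \;-\; \sum_{s=1}^{\infty} (G_1)^s A z_{t+s-1}
= \sum_{s=0}^{\infty} (G_1)^s A z_{t+s} \;-\; G_1 \sum_{s=0}^{\infty} (G_1)^{s} A z_{t+s}
= (I - G_1) \sum_{s=0}^{\infty} (G_1)^s A z_{t+s}.
```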
In such circumstances the above relationship has the same forward recursion
as was considered before, except the transversality condition relies on the
existence of cointegration. Decompose (6.44) as follows:
The conditions for cointegration (Engle and Granger 1987) are sufficient for
this to be satisfied. That is, if y_t ~ I(1) and (y_t − Az_t) ~ I(0), then y_t and z_t cointegrate.
Furthermore, (6.57) has an error correction form:

Δy_t + (I − F)(y_{t−1} − Az_{t−1}) − u_t = h_t
h_t = (I − F)AΔz_t − (I − F)D*w_{t+1} + G₁h_{t+1}
In the next section the case with dependence amongst the endogenous vari-
ables is considered.
6.4.3 Models with forward behaviour and unit roots in the process
driving yt
There are a number of reasons for finding dependence amongst the endoge-
nous processes, one of which would be cointegration, the other would be the
type of dependence that exists amongst series that might satisfy an adding up
type constraint. In the former case the cause of rank failure is the existence of
a unit root and it can be shown that the original objective function can be
solved in the usual way (Hunter 1989a).
where y*′_t = [y*′_{1t} y*′_{2t}] = y′_t[E′ : M′], K* = [E′ : M′]⁻¹K([E′ : M′]′)⁻¹ and v*_t conformable with
y*_t. It follows that the loss function has the following form:
E(ℑ_t | Ω_t) = Σ_{t=0}^{T*} E{δ^t(y*′_{1t}K*₁₁y*_{1t} + 2y*′_{1t}K*₁₂y*_{2t} + y*′_{2t}K*₂₂y*_{2t}
+ (y*_{1t} − v*_{1t})′(y*_{1t} − v*_{1t})) | Ω_t}. (6.60)

… + y⁺′_{2t}K*₂₂y⁺_{2t} + (y⁺_{1t} − v⁺_{1t})′(y⁺_{1t} − v⁺_{1t})) | Ω_t}.
Now differentiating with respect to y⁺_{1t} gives rise to the following first-order
condition:

E(δ^t K*₁₁y⁺_{1t} − δ^{t+1}K*₁₁y⁺_{1t+1} − δ^t(y⁺_{1t} − v⁺_{1t}) − 2δ^t K*₁₂(y⁺_{2t} − y⁺_{2t+1}) | Ω_t) = 0, (6.61)
Subtracting the above equation from its forward value and re-writing:

E(δ^t K*₂₁(y*_{1t} − y*_{1t+1}) + δ^t K*₂₂(y*_{2t} − y*_{2t+1}) | Ω_t) = 0.

E(K(y_t − y_{t+1}) + H(y_t − Az_t) | Ω_t) = 0.
log L((δ, H, K, A), Σ) = −(Tn/2) log(2π) − (T/2) log|Σ| − (1/2)Σ_{t=1}^{T} tr(Σ⁻¹u_tu′_t)
using a quasi-Newton algorithm such as that of Gill, Murray and Pitfield (see Sargan
1988) or an equivalent method. The method due to Gill, Murray and Pitfield
has the advantage of using the Cholesky factors of the inverse of the
Hessian, which are then bounded to be positive definite subject to an appropri-
ately conditioned Hessian matrix.
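As a sketch of the objective being maximized, the log-likelihood above can be evaluated directly; here Σ is concentrated out as the sample moment of illustrative stand-in residuals, and the quasi-Newton iteration itself is not reproduced.

```python
import numpy as np

# logL = -(Tn/2)log(2*pi) - (T/2)log|Sigma| - (1/2) sum_t tr(Sigma^{-1} u_t u_t').
# Sigma_hat = (1/T) sum_t u_t u_t' maximizes this over Sigma, so any other
# choice of Sigma lowers the likelihood.  The residuals are simulated stand-ins.
rng = np.random.default_rng(2)
T, n = 200, 2
u = rng.standard_normal((T, n))

def log_likelihood(u, Sigma):
    T, n = u.shape
    Sigma_inv = np.linalg.inv(Sigma)
    quad = sum(np.trace(Sigma_inv @ np.outer(ut, ut)) for ut in u)
    return (-0.5 * T * n * np.log(2 * np.pi)
            - 0.5 * T * np.log(np.linalg.det(Sigma))
            - 0.5 * quad)

Sigma_hat = (u.T @ u) / T                   # ML estimate of Sigma given u
ll_at_mle = log_likelihood(u, Sigma_hat)
ll_elsewhere = log_likelihood(u, 2.0 * Sigma_hat)
```

In the full problem the residuals u_t themselves depend on (δ, H, K, A), and it is over those parameters that the quasi-Newton routine iterates.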
However, the conventional estimates of the parameter variance based on
the information matrix are not valid, even when the model for the endoge-
nous equations is estimated as a system. The correct estimate needs to take
account of the generated regressors and their parameter estimates. The follow-
ing algorithm is suggested to do this. Initial estimates of the exogenous
variables are obtained from a VAR, and the residuals are saved. The VMA repre-
sentation is estimated by OLS using the method described by Spliid (1983). In
state space form:
z = Wς + w.
where

z = (z₁, z₂, …, z_T)′, W = [w₋₁ w₋₂ … w₋ₚ], w = (w₁, w₂, …, w_T)′ and ς = (C₁, C₂, …, C_p)′,
where W(0) contains the initial estimates of the surprises, unobserved values of
the residual are set to zero and ς(0) are the initial estimates of the parameters.
Once the system has been estimated, the likelihood is re-estimated based
on B = 200 bootstrap re-samplings of the original residual vector w, where
each iteration replaces a block of residuals wᵢ by the new residual set w(b)
used to provide new estimates of the VMA parameters (ς(b) for b = 1, …, B).
Then, given the maximum likelihood estimates of the parameters (δ, H, K, A),
an empirical distribution for the estimated test statistics is generated
from the bootstrap re-sampling regime. A sample of 400 is created by
use of the antithetic variate technique, providing at each bootstrap replication
a pair of residuals w(b) and −w(b) (see Hendry 1995). Then percentiles of
the empirical distribution can be used to determine critical values for the
estimated parameters.
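The re-sampling scheme just described can be sketched as follows. The statistic recomputed at each replication is a simple mean, standing in for the re-estimated VMA parameters, and all names and sizes are illustrative.

```python
import numpy as np

# Residual bootstrap with antithetic variates: each of B = 200 replications
# re-samples the residual vector w, and the antithetic pair (w_b, -w_b)
# doubles the set of replications to 400, from which empirical percentiles
# (critical values) are read off.
rng = np.random.default_rng(3)
w = rng.standard_normal(100)                # original residuals (stand-in)
B = 200

stats = []
for b in range(B):
    w_b = rng.choice(w, size=w.size, replace=True)   # bootstrap re-sample
    for draw in (w_b, -w_b):                # antithetic pair w_b and -w_b
        stats.append(draw.mean())           # stand-in for a re-estimated parameter

stats = np.array(stats)
lower, upper = np.percentile(stats, [2.5, 97.5])     # empirical critical values
```

The antithetic pairing guarantees that each replication and its mirror image contribute statistics of equal magnitude and opposite sign for an odd statistic such as the mean, which is the variance-reduction device referred to above.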
6.5 Conclusion
In this chapter a number of more advanced issues have been addressed: coin-
tegration amongst series with different orders of integration; forecasting with
cointegrating relationships; and cointegration combined with short-run struc-
ture defined by rational expectations.
7.1 Approximation
Conclusion 201
Many developments of the basic model have taken place, such as the intro-
duction of non-linear adjustment2 and more detailed characterizations of non-sta-
tionarity (such as fractional cointegration, considered briefly in the previous
chapter). There have also been developments in other branches of time series
econometrics, such as the modelling of higher order moments of the data,
including variance, skewness and kurtosis. These models provide different
means of analyzing data, not necessarily focussed on the concept of equilib-
rium. Even so, some of the features of their data generating processes have
been used to investigate the robustness of the cointegration methodology. So,
for example, the issue arises as to how reliably cointegration, or its absence, is
identified in the presence of autoregressive conditionally heteroscedastic dis-
turbances, or where the Gaussian disturbance structure is replaced by one with
more frequently occurring extreme values (relative to a Gaussian distribu-
tion).3 It is inevitable that eventually the methods will fail.
Cointegration analysis has also been extended to panel data models where
the time series dimension is sufficiently large.
Probably the main feature of economic time series that is capable of under-
mining cointegration analysis is that of structural breaks. Breaks in individual
time series can lead to incorrect inference as to their order of integration. Thus
data that are considered to be integrated of order 1, might in fact be stationary
There is no doubt about the impact that methods for the empirical analysis of time
series equilibrium have had on applied economics. The methods and models
continue to develop, and the range of subjects to which they can be applied
seems only to be limited by the availability of adequate data. Indeed, even rel-
atively small samples have been analyzed via the use of bootstrapping tech-
niques. Outside the realm of high frequency financial models, it is unlikely
that a similar revolution in econometric time series analysis will occur in the
near future.
Notes
1 Introduction
1 Muellbauer (1983) showed that a random walk model of consumption with innova-
tions in income and interest rates can be nested in the ADL framework due to
Davidson et al. (1978). However, the tests used do not take account of the under-
lying series being non-stationary.
2 As will be discovered in the last section of chapter 6, stationarity is overly strong. In
addition, the types of model used by Sargent are excessively restrictive (Hunter
1989).
3 It should be noted that the impulse response function solved from the VAR is not
unique (Lippi and Reichlin 1994) and any findings on causality depend on the
variables in the VAR model estimated (Hendry and Ericsson 1990).
4 Keynes discusses the latent nature of expectations, the problems with dynamic
specification, measurement error, the role of forecast performance and structural
breaks.
component, and that which cannot be perfectly predicted from its own past. A
purely non-deterministic process has no component that can be predicted from its
own past, and it is this type of series to which this abbreviated version of the
theorem refers.
11 See also Box and Jenkins (1976) and Granger and Newbold (1976).
12 In addition, εt is uncorrelated with future values of the process, xt+j, j > 0.
13 The initial values for this equation can be calculated from the process. See Hamilton
(1994), chapter 3.
14 In fact, this derivation requires the autocorrelations to be non-time varying. In
other words, equation (2.19) only applies in the stationary case. See section 2.3.7
below.
15 As described in any textbook dealing with difference equations, the other case that
has to be considered, but which is less interesting, is that where the roots are
repeated, in which case equation (2.19) has to be modified.
16 The AR(1) process xt = φxt−1 + εt will have one root given by λ1 = φ−1. Substituting
this and p = 1 into (2.13) gives (2.11b).
17 This is, in fact, a linear trend, being a linear function of time. Higher-order polyno-
mial functions of time, such as the quadratic, are also referred to as time trends. It is
for the purposes of analogy that the linear case is used here.
18 This is not a very helpful piece of terminology as it seems to mix up the discrete
and continuous time cases. Perhaps “summed” would have been a better, if more
prosaic, choice.
19 There is another absurdity about this calculation. Although purporting to be a cor-
relation, it is clear that this quantity is not restricted to [–1, +1], for if j is large but t
small, then this quantity can fall below –1.
20 Similar arguments apply in the explosive case when the roots lie inside the unit
circle.
21 Preserving the ordering so the inverse operator is the premultiplying factor on the
right-hand side of (2.25) is not necessary in the univariate case, but is good practice
since in the multivariate case discussed in section 4.2 it is important.
22 The zero lag coefficient does not have to be 1 but it simplifies things a little to con-
sider this case, which is anyway appropriate for ARMA models.
23 As with all ACFs, ρx(0) = 1, and for all MA(1) ACFs, ρx(j) = 0 for j > 1, so only ρx(1) is
considered in this illustration.
24 Strictly this applies only to cases of distinct real roots. Complex roots will occur as
complex conjugate pairs and both be replaced by their inverses in order that the
process remain real. Repetition of roots will mean that fewer new parameterisations
can be generated by inverting just one root.
25 Note also that the MA still has to be normalized so that θ(0) = 1.
26 This definition deals with the case where the non-stationarity is due to a root of
z = 1. As already stated, all that is required for non-stationarity is |z| ≤ 1, so z = 1 is a
special case.
27 Unless otherwise stated it is the case that εt is white noise and φ(0) = θ(0) = 1.
28 If the (first) differenced process does have a non-zero mean, then the undifferenced
process will possess a linear deterministic trend. In other words, although a linear
trend plus noise model (2.31) is not I(1), a random walk with drift is.
29 The key property is that, in the representation of the model using the initial values
and the summed disturbance process, the order of integration of the purely stoch-
astic component and the order of the polynomial time trend is the same. So, in a
random walk with drift, the drift cumulates to the linear trend Σ_{j=1}^t μ = μt, while
the summed disturbance Σ_{j=1}^t εj is I(1).
13 The case for general a(L) and b(L) in the ADL demonstrates further the power of
the lag polynomial notation. In this case the ADL can still be written as a(L)yt =
α + b(L)zt + ut and the equilibrium error will be ξt. Applying a(L) to both sides of
the long-run relationship yields

a(L)ξt = a(L)yt − α − (b(1)/a(1)) a(L)zt.

Now substituting out for a(L)yt implies

a(L)ξt = α + b(L)zt + ut − α − (b(1)/a(1)) a(L)zt
= {b(L) − (b(1)/a(1)) a(L)}zt + ut.

Letting c(L) = b(L) − (b(1)/a(1)) a(L), it can be seen that c(L) has a unit root as
c(1) = b(1) − (b(1)/a(1)) a(1) = 0. Reparameterizing using c(L) = c(1)L + c*(L)Δ, it
follows that c(L) = c*(L)Δ. When substituted into the expression for a(L)ξt this gives
rise to a(L)ξt = c*(L)Δzt + ut. If a(L) has all its roots outside the unit circle and zt is
at most I(1), then ξt is stationary. As a special case, if zt is the random walk
defined by (3.54b), then Δzt is white noise and ξt is ARMA(p, q), where p is at most
the order of a(L) and q is at most the larger of the orders of a(L) and b(L) minus
one (because the unit root has been factored out). This also shows that the closer
are any of the roots of a(L) to unity, the more persistent will be the equilibrium
errors. At the same time, a(1) → 0, so the speed of adjustment to equilibrium gets
smaller. Note, however, that in addition to the roots of a(L), those of the autore-
gressive operator of the ARMA representation of zt will also determine the
behaviour of ξt. Thus if zt displays persistence, so will ξt, independent of the
speed of convergence.
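The stationarity of the equilibrium error claimed in this note can be illustrated by simulation; the lag orders, parameter values and sample size below are illustrative, not taken from the text.

```python
import numpy as np

# ADL(1,0): y_t = alpha + 0.5 y_{t-1} + z_t + u_t with z_t a random walk.
# The equilibrium error xi_t = y_t - alpha/a(1) - (b(1)/a(1)) z_t follows
# xi_t = 0.5 xi_{t-1} - dz_t + u_t, stationary although y_t and z_t are I(1).
rng = np.random.default_rng(4)
T = 5000
a1, b0, alpha = 0.5, 1.0, 0.2               # a(L) = 1 - 0.5L, b(L) = 1
z = np.cumsum(rng.standard_normal(T))       # z_t: random walk
u = rng.standard_normal(T)

y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha + a1 * y[t - 1] + b0 * z[t] + u[t]

a_of_1 = 1.0 - a1                           # a(1)
xi = y - alpha / a_of_1 - (b0 / a_of_1) * z # equilibrium error

# Crude stationarity check: the dispersion of xi is of the same order in
# both halves of the sample, unlike that of the integrated series y and z.
ratio = xi[:T // 2].var() / xi[T // 2:].var()
```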
14 Exactly the same random number sequences are used in the two cases.
15 This terminology is also appropriate for any regression between I(d) variables with
disturbances that are I(d − b). However, it is generally reserved for the case where the
disturbances are stationary, as this is the case of most interest because of its
equilibrium interpretation.
16 Speeds of convergence and Op(.) are discussed in more detail in Spanos (1986,
Chapter 10) and Patterson (2000, section 4.4.2) provides a brief introduction.
17 Asymptotic normality does apply if zt is strongly exogenous for the estimation of b,
that is, it is both weakly exogenous and zt is not Granger caused by yt.
18 If regressions involving not only I(1) but also I(2) variables are being considered,
then the critical values of the tests must be further adjusted. The tests are still of the
null that the disturbances are I(1) against the alternative that they are I(0), thus it is
assumed that any I(2) processes are cointegrating to I(1). Haldrup (1994) discusses
this problem and presents appropriate critical values.
19 The common factor restriction for autoregressive models in the error is discussed
by Hendry and Mizon (1978). The ADF test applied to the cointegration case is a
transformation of such autoregressive behaviour in the residual associated with the
common factor restriction. The effect of such restrictions on ADF and ECM tests of
cointegration is considered in Kremers et al. (1992).
13 The expression for this determinant is found on p. 51 of Johansen (1995). The addi-
tional step of factoring out the unit root term is achieved using the formula for the
determinant of partitioned matrix provided by Dhrymes (1984, p. 37).
14 Johansen’s theorem 2.2 establishes that the necessary and sufficient condition for a
VAR to be stationary is that all the roots lie outside the unit circle.
15 The nature of the projection matrices is such that C may also be written C =
β⊥(α′⊥Γβ⊥)−1α′⊥.
16 The method of maximum likelihood is not discussed here, although its relevance is
described in Appendix C. For an introduction see Patterson (2000) or Sargan (1988).
17 Condition (4.59) is not usually considered in applied work. Instead, the series are
individually tested to confirm that they are I(1).
18 This result is known as the Frisch–Waugh theorem. See, for example, Davidson and
MacKinnon (1993, p. 19).
19 There are a number of standard programmes that can be used to solve eigenvalue
problems, Doornik (1995) prefers the singular value decomposition which limits
the problem to a solution in terms of positive/negative semi-definite matrices.
20 Note that since λ1 ≥ λ2 ≥ … ≥ λn−1 ≥ λn ≥ 0, then
λj = 0 ⇒ λi = 0, i = j, …, n.
21 Note that the trace statistics can be written as the sum of a series of λmax statistics:

λtrace(j − 1) = Σ_{i=j}^{n} λmax(i − 1).
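This decomposition is easy to verify numerically; the eigenvalues and T below are illustrative.

```python
import numpy as np

# The trace statistic equals the sum of the remaining lambda-max statistics.
T = 100
eigenvalues = np.array([0.40, 0.25, 0.10, 0.02])   # lambda_1 >= ... >= lambda_n
n = len(eigenvalues)

def lam_max(i):
    """lambda_max statistic for the i-th eigenvalue (0-indexed)."""
    return -T * np.log(1 - eigenvalues[i])

def lam_trace(r):
    """trace statistic for H0: at most r cointegrating vectors."""
    return -T * np.sum(np.log(1 - eigenvalues[r:]))

checks = [np.isclose(lam_trace(r), sum(lam_max(i) for i in range(r, n)))
          for r in range(n)]
```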
22 We will also see later that dummy variables and stationary variables may be
included in the VECM and the number of these that are included also effects the
critical values. This type of sensitivity is typical of tests and estimation procedures
involving non-stationary processes.
23 This is not strictly correct since the rejection of the previous null was achieved
using a different test. However, this is the way the non-rejection of the null would
be interpreted in a sequential testing procedure, so it is stated as the null for conve-
nience.
24 In addition, it is the last test of the sequence that examines whether the data is sta-
tionary or I(1), yet this is in practice a property that is pre-tested using unit root
tests. That is, this is not the last but the first specification issue to be decided.
25 We would like to thank Paul Fisher and Ken Wallis for providing us with the data.
26 Hence, for r = 2, α and β are 6 × 2 dimensioned matrices.
27 For i = 1, with λ̂1 = .0827 and T = 60, the max test is λmax(1) = −T log(1 − λ̂1) = −60 log
(1 − .0827) ≈ 5.18, and for i = 2, λmax(2) = −T log(1 − λ̂2) = 8.08. The trace test is the
sum of the max tests, and for i = 2, λtrace(2) = 5.18 + 8.08 = 13.26.
28 When the small sample adjustment due to Reimers (1994) is used to test whether
there are r = 4 or more cointegrating vectors, then the revised test statistic is 14.2
and the test is marginally rejected at the 5% level. The test adjusts for the number
of observations by correcting for degrees of freedom, but this corrected statistic is
not necessarily any more reliable than the Johansen test statistic. More specifically,
it is known that shift dummies will alter the distribution of the test statistic
(Johansen 1995), while centred seasonal dummies do not. However, the critical
values used here are based on T = 50 and are again taken from Franses (1994).
29 The one-step Chow test is based on recursive estimation starting with an initial
sample of M – 1 observations and then re-estimated over samples M, M + 1 … T.
Here M = 50 and T = 74. To give a perspective on the choice of the initial sample,
the Sargan rule for model parameterization requires k < T/3. The minimum sample
when k = 18 (the VAR has 8 constants/dummies and 2 × 5 lag coefficients) is
n = 3k = 54. For simplicity the recursive estimates were derived from M = 50 obser-
vations, but the first four calculations must in each case be viewed with caution.
30 The Cauchy is generated by a ratio of normals. Where nominal variables and prices
are normally distributed then their ratio would not converge in distribution to
normality.
31 In practice, C(1) is not always of rank n – r, so that there may be insufficient zero
roots. When C(L) has n – r zero roots, then C(z) = C0(z)C1(z) and C1(z) is of degree
q – 1. If there are insufficient zero roots, this can be rectified by extending the poly-
nomial. Consider zC*(z), then zC*(z) = znC*(z) and this extension introduces n addi-
tional null roots. For the extended model C*(z) = C0(z)C1(z) where C1(z) is of degree
q and C0(z) is defined above.
32 Hunter and Simpson (1995) suggested that the system should be re-ordered on the
basis of tests of weak exogeneity.
22 The matrices αij and βij have the dimensions ni × rj for i = 1, 2 and j = 1, 2. For
example, the matrix β is partitioned into two blocks of columns, β.1 of dimensions
n × r1 and β.2 of dimensions n × r2; each block is then itself cut into two blocks of
rows.
23 In the limit there are r such sub-blocks, which leads to the identification case
considered by Boswijk (1992) where β = [I : 0]′.
24 The original source of the data is the National Institute of Economic Research, that
has been kindly passed on to us by Paul Fisher and Ken Wallis.
25 The model in Hunter (1992a) is massively over-identified. It is possible to identify
subject to restrictions on both α and β. Here we will concentrate on identification
from β alone.
26 The discovery of four valid solutions implies that the model has four over-
identifying restrictions.
27 If the determinant is tested for any sub-matrix of β, then it is found that no such
combination with non-zero determinant appears to exist.
necessarily the short-run forecasts, as the sample dynamics may be better described
by some other order than that used to generate the data.
8 Roots in the paper are the reciprocals of those normally reported, thus a root less
than one in modulus is a stationary root. On this basis the roots of the process are,
respectively: {0.5, 0.5, 0.5, 0.5}, {0.5, 0.5, 0.95, 0.95}, {0.5, 0.5, 0.99, 0.99}, {0.5, 0.5,
1.0, 1.0}, {1.0, 1.0, 1.0, 1.0}.
9 An alternative study would be one based on perturbations of the cointegrated
model, model 4, that retained the common feature, but moved it from being at the
unit root to being further outside the unit circle. This would mean that the
processes became stationary, and more solidly so, but retained the reduced rank
property key to cointegration. In this way, it is possible to isolate two aspects of the
problem with potentially different impacts: stationarity and common features
(reduced rank).
10 This conclusion can also be drawn by comparing the scaling on the vertical axes of
Figures 6.1a–6.1e, whence it will be seen that much the smallest scale is employed
in Figure 6.1c.
11 Figure 6.1d also shows clearly that, in this case, under-specification of the co-
integrating rank is not harmful to forecast performance (including imposing unit
roots), whereas over-specification leads to a deterioration in forecasting perform-
ance.
12 Though they do not establish whether it is the imposition of any false restriction
that matters, or that of unit roots in particular. This is the point made by Clements
and Hendry. They also do not consider if the near unit root is a common feature, or
if restricting it to being so would be advantageous.
13 The information criterion can also be written in terms of the eigenvalues of the
underlying problem, and hence in terms of the test statistics.
7 Conclusions
1 Johansen (2002b) provides a small sample correction to the rank test for cointegra-
tion r = 0 and r = 1. The correction factors are difficult to calculate, but based on the
simulation results there can be considerable benefits to their use. Based on the study
of a four-variable model of Danish Money, the critical values are adjusted by any-
thing between 1.14 and 1.07 for T = 50, 100. For the empirical results in section 4.6.2
such adjustments would not affect the conclusions associated with the trace test for
r = 0 and r = 1. Quite clearly such an adjustment might alter our conclusions when
r > 1. Even so, the critical values used here were taken from Franses (1994), which
assumed T = 50. Further, wrong rejection of the null might not be of paramount
importance when over-rejection of the alternative of cointegration is what is critical
to the applied researcher. Hence, were the true size of the test 10%, then over-rejec-
tion of the null might not be a problem, but cases where the size is considerably
larger ought to be avoided. In particular, test properties are likely to be very poor
when some series are I(2), because conventional tests of cointegration require all the
series in the VAR to be no more than I(1). When there are I(2) series in the VAR, this
violates the necessary and sufficient condition required for the cointegrating rela-
tionships to exist. Johansen shows that the correction increases in line with the true
size of the test as the series tend to become I(2) and in the limit non-cointegration is
always rejected. The reader is referred back to sections 4.4.2 and 4.6.2.
2 For further discussion of such issues the reader is directed to Granger and Hallman
(1991), Granger (1995) and Corradi et al. (2000).
Appendix A
1 Notice that in this simple case the (2,2) element of C3(L) is |C(L)|.
Appendix C
1 The eigenvalue problem is solved with respect to both and under some of the
restrictions considered in chapter 5, while the likelihood associated with general
restrictions, applied to both and , is presented in Appendix F.
Appendix E
1 This statement implies that, for u > 0, w(u) ~ N(0, u).
2 For x > 0, x can be written x = X + , where X is a non-negative integer and
0 ⭐ < 1. Then 〈x〉 = X.
3 To be more precise, let X be a random variable and x represent a value taken by X.
Also, let XT be a sequence of random variables. Let the distribution function of X be
F(.) and that of XT be FT(.). Then FT(.) is said to converge weakly to F(.) if
FT ( x) = Pr( XT ≤ x) → Pr( X ≤ x) = F ( x) as T → ∞.
4 For a proof of this result, see McCabe and Tremayne (1993, chapter 8).
5 See Johansen (1995, p. 151) for details.
6 Weak convergence, in contrast, indicates the convergence of one random variable
to another.
7 Technically, the term converges in probability to zero (→p 0), or, equivalently, is said to be op(1).
8 For details, see Johansen (1995, p. 158).
9 These generalizations break down the residual product moment matrices in terms of
components in the cointegrating space and orthogonal to it.
10 Davidson (1994) provides detailed discussion of different types of stochastic
convergence.
11 Pesaran, Shin and Smith (2000) extended this set up by allowing exogenous I(1)
variables, which distorts the distributions.
12 MacKinnon, Haug and Michelis (1999) find that using Monte Carlo simulations
based on 400 observations leads to quite inaccurate results, especially when n – r is
large. They use a response surface estimated across a range of experiments using dif-
ferent sample sizes. This method calculates the relevant percentile, say the 95th,
appropriate for a test of 5%, for each set of Monte Carlo experiments using a partic-
ular DGP, and regresses this on the characteristics of the DGP. In the simplest form
the dependent variables are an intercept and powers of the reciprocal of the sample
size, such that the estimated intercept is the estimated asymptotic critical value of
the test. Critical values for other sample sizes are obtained by using the estimated
regression to predict substituting the relevant value for T. This approach is also used
in MacKinnon (1991) for unit root and residual based cointegration tests.
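The mechanics of such a response surface can be sketched with synthetic critical values generated from a known surface plus noise; every number below is invented for illustration and is not one of MacKinnon's estimates.

```python
import numpy as np

# Regress simulated critical values on an intercept and powers of 1/T; the
# intercept then estimates the asymptotic critical value, and the fitted
# regression predicts critical values for other sample sizes.
rng = np.random.default_rng(5)
T_vals = np.array([25.0, 50.0, 100.0, 200.0, 400.0, 800.0])
true_asymptotic = -2.86                     # assumed asymptotic 5% critical value
cv = (true_asymptotic - 8.0 / T_vals - 20.0 / T_vals**2
      + 0.01 * rng.standard_normal(T_vals.size))

X = np.column_stack([np.ones_like(T_vals), 1.0 / T_vals, 1.0 / T_vals**2])
beta, *_ = np.linalg.lstsq(X, cv, rcond=None)

estimated_asymptotic = beta[0]              # intercept ~ asymptotic critical value
cv_at_T75 = np.array([1.0, 1 / 75.0, 1 / 75.0**2]) @ beta  # prediction for T = 75
```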
13 Asymptotic tests are those based on finite samples of data but using asymptotic
critical values.
Appendix G
1 The normalization adopted by Hunter and Simpson implies that the first vector is an
inflation equation, the second an exchange rate equation, the third a terms of trade
or real exchange rate equation and the fourth a real interest rate equation.
Appendix A: Matrix Preliminaries
A left (right) elementary matrix is a matrix such that, when it multiplies from the left
(right) it performs an elementary row (column) operation. The matrix formed from the
product of such matrices therefore performs the same transformation as a sequence of
such row (column) operations. For example, consider the use of row and column opera-
tions to diagonalize the 2 × 2 finite order polynomial matrix,

C(L) = [ 1 − (3/4)L   −L ; −(1/8)L   1 − (1/2)L ]. (A.1)

Row operation 1 (objective: to alter the (1,1) element to unity): replace row 1 by row
1 minus 6 times row 2. This can be achieved by pre-multiplication by the matrix

[ 1   −6 ; 0   1 ].

The new matrix is

C1(L) = [ 1   −6 ; 0   1 ][ 1 − (3/4)L   −L ; −(1/8)L   1 − (1/2)L ] = [ 1   −6 + 2L ; −(1/8)L   1 − (1/2)L ]. (A.2)

Row operation 2 on C1(L) (objective: to alter the (2,1) element to zero): replace row 2 by row 2
plus (1/8)L times row 1. This can be achieved by pre-multiplication by the matrix

[ 1   0 ; (1/8)L   1 ].

The new matrix is

C2(L) = [ 1   0 ; (1/8)L   1 ][ 1   −6 + 2L ; −(1/8)L   1 − (1/2)L ] = [ 1   −6 + 2L ; 0   1 − (5/4)L + (1/4)L² ]. (A.3)

Column operation 1 on C2(L) (objective: to alter the (1,2) element to zero): replace column
2 by column 2 minus (2L − 6) times column 1. This can be achieved by post-
multiplication by the matrix

[ 1   −(2L − 6) ; 0   1 ].
and so are not functions of L. Furthermore, because the determinant is non-zero, the
matrices are invertible. Such matrices are known as unimodular matrices (having constant
non-zero determinant). Usefully, all elementary matrices are unimodular and so there-
fore is the product of two or more elementary matrices. It is therefore possible to invert
the transformation and express C(L) in terms of C3(L) as
C ( L) = G( L)−1 C3 ( L) H ( L)−1.
so |A(z)| = (1 − (3/4)z)(1 − (1/2)z) − (1/8)z² = 1 − (5/4)z + (1/4)z² = (1 − z)(1 − (1/4)z),

and the roots are therefore z = 1 and z = 4.
An important special case is that of a unit root. If A(L) has a unit root then |A(1)| = 0,
that is, A(1) is singular. This is the case in the example above, putting z = 1.
Appendix B: Matrix Algebra for Engle
and Granger (1987) Representation
G(z) = I_n − Σ_{i=1}^m G_i z^i (B.1)

where the G_i, i = 1, 2, …, m are n × n coefficient matrices. Then, denoting the determi-
nant of G(z) by |G(z)| and its adjoint by Gᵃ(z):

Gᵃ(z)G(z) = |G(z)| I_n. (B.2)

Note that Gᵃ(z) is an n × n polynomial matrix of order at most m × (n − 1), and |G(z)| is a
scalar polynomial of order at most m × n.
Now, rank(G(0)) = n − r, 0 ≤ r ≤ n and z ∈ [0, 1]. For the case considered here g(0) ≠ 0
and the determinant of the polynomial in z is:

|G(z)| = z^r g(z),

where

g(z) = Σ_{i=0}^a g_i z^i, a ≤ (m × n) − r,

Gᵃ(z) = z^{r−1} H(z),

where

H(z) = Σ_{i=0}^b H_i z^i.
It follows that the index on the sum is limited by b = (m × [n – 1]) – r + 1 with Hi being
n × n coefficient matrices. If G(z) is originally of infinite order, then a and b are also
infinite.
Pre-multiplying G(z) by H(z) extracts a factor of z and reduces the expression to a scalar diagonal form:
\[
H(z)G(z) = z\, g(z)\, I_n. \tag{B.3}
\]
Application to a lag polynomial to draw out a unit root factor

Let A(L) be an n × n lag polynomial matrix of order m. This may be written instead as a polynomial of order m in Δ = (1 − L), using L = 1 − Δ, so that
\[
A(L) = I_n - \sum_{i=1}^{m} A_i L^{i} = I_n - \sum_{i=1}^{m} A_i (1-\Delta)^{i} = G(\Delta).
\]
Replacing Δ by the argument z, consider
\[
G(z) = I_n - \sum_{i=1}^{m} A_i (1-z)^{i}.
\]
Evaluating this at z = 0,
\[
G(0) = I_n - \sum_{i=1}^{m} A_i (1-0)^{i} = I_n - \sum_{i=1}^{m} A_i = A(1).
\]
It is also important to recall that in the reduced rank case G(0) must be singular, as is A(1). Assuming this condition to be satisfied, replace z in equation (B.3) by Δ to give
\[
H(\Delta)G(\Delta) = \Delta\, g(\Delta)\, I_n. \tag{B.4}
\]
Both H(Δ) and g(Δ) may be written as polynomials in L (of unchanged order), say H̃(L) and g̃(L) respectively, and so (B.4) may be written
\[
\tilde{H}(L)A(L) = \Delta\, \tilde{g}(L)\, I_n. \tag{B.5}
\]
Equation (B.5) states that pre-multiplying A(L) by H̃(L) results in a scalar diagonal lag polynomial matrix with a scalar factor in the difference operator Δ.
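The algebra above can be illustrated symbolically. The sketch below, using SymPy, builds a hypothetical 2 × 2 example (not taken from the text) in which A(1) is singular, so that G(0) has reduced rank with r = 1, and verifies the adjoint identity (B.2) and the factorization (B.3):

```python
import sympy as sp

z = sp.symbols('z')

# Hypothetical example: A(L) = I - A1*L with A(1) = I - A1 singular (rank 1),
# so G(z) = I - A1*(1 - z) has G(0) = A(1) of reduced rank, r = 1.
A1 = sp.Matrix([[1, 0], [sp.Rational(1, 2), sp.Rational(1, 2)]])
G = sp.eye(2) - A1 * (1 - z)

detG = sp.factor(G.det())     # z*(z + 1)/2 = z^r * g(z), with r = 1
adjG = G.adjugate()           # equals z^(r-1) * H(z) = H(z) when r = 1

# Adjoint identity (B.2): G^a(z) G(z) = |G(z)| I_n
assert sp.expand(adjG * G - detG * sp.eye(2)) == sp.zeros(2, 2)

# Drawing out the factor of z, as in (B.3): H(z) G(z) = z g(z) I_n
g = sp.cancel(detG / z)       # g(z) = (z + 1)/2, with g(0) != 0
H = adjG                      # since r = 1, z^(r-1) = 1
assert sp.expand(H * G - z * g * sp.eye(2)) == sp.zeros(2, 2)

print(detG, g)
```

The two assertions confirm that the determinant carries exactly one factor of z (one unit root) and that pre-multiplication by H(z) reduces G(z) to the scalar diagonal form of (B.3).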
Appendix C: Johansen’s Procedure as a
Maximum Likelihood Procedure
The starting point for obtaining the maximized log-likelihood function in terms of the
relevant eigenvalues is a multivariate Gaussian distribution. From this assumption
follow the maximum likelihood estimates of the cointegrating vectors as particular
eigenvectors and the expression of the maximized likelihood in terms of the subset of
the corresponding eigenvalues. This in turn leads to simple expressions for test statistics
based on the comparison of maximized likelihoods, since these too will depend on the
relevant eigenvalues. Not all distributional assumptions will lead to these results and, as
such, the Johansen procedure can be said to depend on the Gaussian assumption. The
distributional assumption is that the disturbances of the VAR follow a multivariate
Gaussian distribution. That is:
\[
\Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i^{*} \Delta x_{t-i} + \varepsilon_t, \tag{C.1}
\]
\[
\Pi = \alpha\beta' \quad \text{and} \quad \varepsilon_t \sim NIID(0, \Omega).
\]
The density of a single observation is
\[
g(x_t, \Pi, \Gamma_i^{*}, \Omega) = (2\pi)^{-\frac{n}{2}} |\Omega|^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right)'
\Omega^{-1}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right)\right\}.
\]
The natural logarithm of the joint density of x_t, t = 1, 2, …, T, ignoring initial values for convenience, is
\[
G(x_t, t = 1, 2, \dots, T, \Pi, \Gamma_i^{*}, \Omega) = -\frac{1}{2} nT \log(2\pi) - \frac{T}{2}\log|\Omega|
- \frac{1}{2}\sum_{t=1}^{T}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right)'
\Omega^{-1}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right).
\]
Thus the log-likelihood of the VECM (conditional on the data), minus the constant term −½nT log(2π), is given by
\[
\log L(\alpha, \beta, \Gamma_i^{*}, \Omega) = -\frac{T}{2}\log|\Omega|
- \frac{1}{2}\sum_{t=1}^{T}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right)'
\Omega^{-1}\left(\Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Gamma_i^{*}\Delta x_{t-i}\right).
\]
This expression and the subsequent algebra are simplified by re-expressing the log-likelihood in terms of the following:
\[
z_{0,t} = \Delta x_t, \qquad z_{1,t} = x_{t-1}, \qquad
z_{2,t} = \left[\Delta x_{t-1}' \;\; \dots \;\; \Delta x_{t-(p-1)}'\right]'
\]
and \( \Gamma = -[\Gamma_1^{*} \; \dots \; \Gamma_{p-1}^{*}] \). Then the log-likelihood can be written:
\[
\log L(\alpha, \beta, \Gamma, \Omega) = -\frac{T}{2}\log|\Omega|
- \frac{1}{2}\sum_{t=1}^{T}\left(z_{0,t} - \alpha\beta' z_{1,t} - \Gamma z_{2,t}\right)'
\Omega^{-1}\left(z_{0,t} - \alpha\beta' z_{1,t} - \Gamma z_{2,t}\right).
\]
This function may be maximized with respect to Γ alone, giving rise to an expression for the maximum likelihood estimator of Γ in terms of the data and the other parameters of the model. Denote this estimator as Γ̄. By differentiating the log-likelihood with respect to Γ and solving the first-order conditions, Γ̄ is given by
\[
\bar{\Gamma} = M_{0,2}M_{2,2}^{-1} - \alpha\beta' M_{1,2}M_{2,2}^{-1},
\]
where \( M_{i,j} = \frac{1}{T}\sum_{t=1}^{T} z_{i,t} z_{j,t}' \). The values of α, β and Ω that maximize log L(α, β, Γ, Ω) will also maximize this expression with Γ̄ substituted for Γ, that is, log L(α, β, Γ̄, Ω). The latter function is known as the concentrated likelihood function. Before writing it out in full, note that Γ̄ appears in log L(α, β, Γ̄, Ω) only in the term (z_{0,t} − αβ′z_{1,t} − Γ̄z_{2,t}) or its transpose. But
\[
z_{0,t} - \alpha\beta' z_{1,t} - \bar{\Gamma} z_{2,t}
= z_{0,t} - \alpha\beta' z_{1,t} - \left(M_{0,2}M_{2,2}^{-1} - \alpha\beta' M_{1,2}M_{2,2}^{-1}\right) z_{2,t}
= \left(z_{0,t} - M_{0,2}M_{2,2}^{-1} z_{2,t}\right) - \alpha\beta'\left(z_{1,t} - M_{1,2}M_{2,2}^{-1} z_{2,t}\right).
\]
Define
\[
R_{0,t} = z_{0,t} - M_{0,2}M_{2,2}^{-1} z_{2,t} \tag{C.2}
\]
\[
R_{1,t} = z_{1,t} - M_{1,2}M_{2,2}^{-1} z_{2,t} \tag{C.3}
\]
so that
\[
z_{0,t} - \alpha\beta' z_{1,t} - \bar{\Gamma} z_{2,t} = R_{0,t} - \alpha\beta' R_{1,t},
\]
and note that R_{0,t} and R_{1,t} are the residuals from the least squares regressions of z_{0,t} and z_{1,t} respectively on z_{2,t}. Using this residual notation, the concentrated log-likelihood may be written
\[
\log L(\alpha, \beta, \Omega) = -\frac{T}{2}\log|\Omega|
- \frac{1}{2}\sum_{t=1}^{T}\left\{\left(R_{0,t} - \alpha\beta' R_{1,t}\right)'\Omega^{-1}\left(R_{0,t} - \alpha\beta' R_{1,t}\right)\right\}.
\]
For given β, maximizing with respect to α and Ω gives
\[
\hat{\alpha} = S_{0,1}\beta\left(\beta' S_{1,1}\beta\right)^{-1},
\]
\[
\hat{\Omega} = S_{0,0} - S_{0,1}\beta\left(\beta' S_{1,1}\beta\right)^{-1}\beta' S_{1,0},
\]
where \( S_{i,j} = \frac{1}{T}\sum_{t=1}^{T} R_{i,t} R_{j,t}' \).
Thus, finally, the only term of interest in the concentrated likelihood, i.e. in log L_MAX, is Ω̂, which itself is a function only of β (and the data). It therefore remains only to maximize log L_MAX with respect to β. Clearly the value of β that maximizes log L_MAX also maximizes log L̃ = −(T/2) log(|Ω̂|), since the difference is a constant term (a multiplicative term in the likelihoods themselves). The problem is to obtain the value of β that maximizes log L̃. By definition, this will be the maximum likelihood estimator. Equivalently, the problem is to minimize
\[
Q(\beta) = |\hat{\Omega}| = \left|S_{0,0} - S_{0,1}\beta\left(\beta' S_{1,1}\beta\right)^{-1}\beta' S_{1,0}\right|. \tag{C.4}
\]
The solution to this problem is obtained by first re-expressing Q(β) using the formulae for the determinant of a partitioned matrix. In general, for any matrix
\[
A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}
\]
with invertible diagonal blocks,
\[
|A| = |A_{1,1}|\,\left|A_{2,2} - A_{2,1}A_{1,1}^{-1}A_{1,2}\right| = |A_{2,2}|\,\left|A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1}\right|.
\]
Setting A₁,₁ = S₀,₀, A₁,₂ = S₀,₁β, A₂,₁ = A′₁,₂ and A₂,₂ = β′S₁,₁β gives rise to the following expression:
\[
Q(\beta) = |\hat{\Omega}| = \left|S_{0,0} - S_{0,1}\beta\left(\beta' S_{1,1}\beta\right)^{-1}\beta' S_{1,0}\right|
= |S_{0,0}|\,\frac{\left|\beta'\left(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}\right)\beta\right|}{\left|\beta' S_{1,1}\beta\right|}.
\]
Let β̂ be the n × r matrix that minimizes Q(β). Consider the solutions λ = λᵢ to the eigenvalue problem:
\[
\left|\lambda I - S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}\right| = 0 \tag{C.5}
\]
ordered so that λ₁ > λ₂ > … > λₙ. Let β̂ᵢ, for i = 1, 2, …, r, be the eigenvectors corresponding to λᵢ, i = 1, 2, …, r, the r largest eigenvalues. Then it is stated without proof that
\[
\beta = \hat{\beta} = \left(\hat{\beta}_1 \;\; \dots \;\; \hat{\beta}_r\right)
\]
minimizes (C.4). Furthermore, the minimized function can be written:
\[
Q(\hat{\beta}) = |S_{0,0}| \prod_{i=1}^{r} \left(1 - \hat{\lambda}_i\right),
\]
so that the maximized log-likelihood is
\[
\log L_{MAX} = -\frac{T}{2}\log\left(|S_{0,0}| \prod_{i=1}^{r}\left(1 - \hat{\lambda}_i\right)\right). \tag{C.6}
\]
Since the β̂ᵢ are eigenvectors, many normalizations are possible. A convenient choice for deriving the above expressions in terms of the eigenvalues follows from observing that the original eigenvalue problem is equivalent to solving
\[
\left|\lambda S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1}\right| = 0. \tag{C.7}
\]
(This is known as solving for the eigenvalues of S₁,₀S₀,₀⁻¹S₀,₁ in the metric of S₁,₁.) Consequently the matrix of eigenvectors (β̂) that diagonalizes S₁,₁⁻¹S₁,₀S₀,₀⁻¹S₀,₁ also diagonalizes S₁,₁ and S₁,₀S₀,₀⁻¹S₀,₁ in the following manner:
\[
\hat{\beta}' S_{1,1} \hat{\beta} = I, \qquad \hat{\beta}' S_{1,0} S_{0,0}^{-1} S_{0,1} \hat{\beta} = \Lambda,
\]
where Λ = diag(λ̂₁, λ̂₂, …, λ̂ᵣ). It follows from the diagonalization that
\[
\left|I - \Lambda\right| = \prod_{i=1}^{r} \left(1 - \hat{\lambda}_i\right)
\]
is, apart from the factor |S₀,₀|, the minimized value of (C.4), giving (C.6) as the maximized log-likelihood. Subject to conditioning the problem on β̂, the values of α̂ and Ω̂ can be calculated directly from the formulae above, but with this normalization reduce to:
\[
\hat{\alpha} = S_{0,1}\hat{\beta}\left(\hat{\beta}' S_{1,1}\hat{\beta}\right)^{-1} = S_{0,1}\hat{\beta},
\]
\[
\hat{\Omega} = S_{0,0} - \hat{\alpha}\hat{\alpha}'.
\]
It follows that
\[
\hat{\Pi} = \hat{\alpha}\hat{\beta}' = S_{0,1}\hat{\beta}\hat{\beta}'.
\]
The determination of α̂ and β̂ in this way ensures that Π̂ is of rank r ≤ n. Since the approach works regardless of whether Π̂ is of full or reduced rank, the procedure is known as reduced rank regression. As will be indicated below, it is very closely related to the calculation of canonical correlations.
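The eigenvalue calculations described above can be sketched in a few lines. The following fragment simulates an illustrative bivariate cointegrated VAR(1) (the data-generating process and all parameter values are invented for the example) and solves the eigenvalue problem (C.7); with p = 1 there are no lagged differences to partial out, so R₀ and R₁ are just Δxₜ and xₜ₋₁:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bivariate system: x1 is a random walk, x2 error-corrects
# towards x1, so the cointegrating rank is r = 1.
T = 500
eps = rng.standard_normal((T, 2))
x = np.zeros((T, 2))
for t in range(1, T):
    x[t, 0] = x[t - 1, 0] + eps[t, 0]
    x[t, 1] = x[t - 1, 1] - 0.5 * (x[t - 1, 1] - x[t - 1, 0]) + eps[t, 1]

R0, R1 = np.diff(x, axis=0), x[:-1]       # R0 = dx_t, R1 = x_{t-1}
Tm = R0.shape[0]
S00 = R0.T @ R0 / Tm
S01 = R0.T @ R1 / Tm
S11 = R1.T @ R1 / Tm

# Solve |lambda S11 - S10 S00^{-1} S01| = 0, as in (C.7)
M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
lams = np.sort(np.linalg.eigvals(M).real)[::-1]

trace_stat = -Tm * np.sum(np.log(1 - lams))   # trace statistic for r = 0
print(lams, trace_stat)
```

The eigenvalues lie in [0, 1), and the largest one dominates here because one cointegrating relation is present by construction.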
This analysis demonstrates that the Johansen approach rests on the Gaussian assump-
tion in the following ways:
(i) Through concentration of the likelihood function it explains the generation of R0,t
and R1,t, and how this relates to the Gaussian likelihood.
(ii) The expression of the maximized likelihood in terms of the eigenvalues depends
on the particular form of the concentrated likelihood function in terms of the
ratio of the determinants of quadratic forms.1
(iii) The expressions for the likelihood ratio statistics in terms of the eigenvalues
depends on the expression for the maximized log likelihood, and hence these too
depend on the distributional assumption.
Appendix D: The Maximum
Likelihood Procedure in Terms
of Canonical Correlations
\[
S_{i,j}^{*} = \frac{1}{T}\sum_{t=1}^{T} R_{i,t}^{*} R_{j,t}^{*\prime} = I \;\text{ for } i = j \;\text{ and } P \text{ otherwise,}
\]
where P = diag(p₁, …, pₙ), pᵢ > 0, and, by appropriate ordering of the elements of R₀,ₜ and R₁,ₜ, p₁ ≥ p₂ ≥ … ≥ pₙ. As all the pᵢ are correlations and positive by construction, they lie in the [0, 1] interval. They are called canonical correlations. The solutions to the problems of the selection of A and B are the solutions to two closely associated eigenvalue problems. Consider the matrices
\[
H_0 = S_{0,0}^{-1} S_{0,1} S_{1,1}^{-1} S_{1,0}, \qquad
H_1 = S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}.
\]
The eigenvalues of these two matrices are identical and given by the solution to equation (C.7) above. That is, they are the λᵢ, i = 1, 2, …, n of the maximum likelihood problem. The eigenvectors of H₀ are the solutions for the columns of A and are denoted aᵢ. They are chosen so that a′ᵢS₀,₀aⱼ = 1 for i = j, 0 otherwise. The eigenvectors of H₁ have already been denoted β̂ᵢ, and are normalized as before, so that β̂′ᵢS₁,₁β̂ⱼ = 1 for i = j, 0 otherwise. Thus B is an n × n matrix with ith column β̂ᵢ. In addition, P² = diag(λ₁, …, λₙ); in other words, the eigenvalues are the squared canonical correlations. Thus, from the expression for the maximized log-likelihood of equation (C.6), the Johansen ML
procedure can be seen to be the calculation of the coefficients of the linear combina-
tions of the non-stationary variables such that their correlation with the (canonically
combined) stationary variables is maximized. For given r, the required linear combina-
tions of the levels will be those using the eigenvectors, i, i = 1, 2, …, r. In order to max-
imize correlation with stationary variables, the linear combinations of the I(1) variables
will need to be as close to stationarity as possible. The problem is restricted by only
considering the r most correlated combinations. The cointegrating rank r has to
be determined by testing. The values of the model parameters are then obtained as
outlined in Appendix C.
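The equivalence between the Johansen eigenvalues and the squared canonical correlations can be checked numerically. In this sketch the data are arbitrary (generated purely for illustration), and the whitening is done with Cholesky factors, one of several possible normalizations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary correlated data standing in for the residuals R0 and R1
T, n = 400, 3
R1 = rng.standard_normal((T, n))
R0 = 0.8 * R1 + rng.standard_normal((T, n))

S00 = R0.T @ R0 / T
S01 = R0.T @ R1 / T
S11 = R1.T @ R1 / T

# Eigenvalues of H1 = S11^{-1} S10 S00^{-1} S01 ...
H1 = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
lams = np.sort(np.linalg.eigvals(H1).real)[::-1]

# ... equal the squared canonical correlations p_i^2, obtained directly as
# singular values of the whitened cross-moment matrix S00^{-1/2} S01 S11^{-1/2}
L0 = np.linalg.cholesky(S00)
L1 = np.linalg.cholesky(S11)
K = np.linalg.solve(L0, S01) @ np.linalg.inv(L1).T
p = np.sort(np.linalg.svd(K, compute_uv=False))[::-1]

assert np.allclose(lams, p ** 2)
print(p, lams)
```

The assertion verifies λᵢ = pᵢ² exactly (up to floating-point error), which is the relationship stated in the text.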
Appendix E: Distribution Theory
(i) b(0) = 0
(ii) b(u) − b(v) ~ N(0, |u − v|) ∀ u ≠ v¹
(iii) E[(b(u₁) − b(u₂))(b(u₃) − b(u₄))] = 0 for non-overlapping intervals (u₁, u₂) and (u₃, u₄).
Consider a sequence of random variables
\[
\varepsilon_t \sim IID(0, \sigma^2), \quad t = 1, 2, \dots, T,
\]
with partial sum
\[
s_t = \sum_{i=1}^{t} \varepsilon_i,
\]
and, for τ ∈ [0, 1],
\[
s_{[T\tau]} = \sum_{i=1}^{[T\tau]} \varepsilon_i,
\]
where [Tτ] denotes the integer part of Tτ. If the distribution function of the normalized partial sum, F_{T,τ}(.), converges as T → ∞ to a function F_τ(.),
then F_τ(.) is called the limiting distribution of the sequence s_{[Tτ]}. It is said that F_{T,τ}(.) converges weakly to F_τ(.).³ Notationally, F_{T,τ} ⇒ F_τ means F_{T,τ}(.) converges weakly to
F_τ(.). Furthermore, if S(τ) is a random variable having distribution function F_τ(.), then s_{[Tτ]} is said to converge in distribution to S(τ). The notation for this ought strictly to be different since it involves random variables rather than their distribution functions, but the same symbol ⇒ will be used. Otherwise, a commonly used notation for convergence in distribution is \( \xrightarrow{D} \).
In fact, s_{[Tτ]} does not have a limiting distribution. In order to obtain convergence, it must be divided by T^{1/2}. Donsker's theorem defines the random variable to which T^{-1/2}s_{[Tτ]} tends in distribution in terms of Brownian motion. It states
\[
\sigma^{-1} T^{-\frac{1}{2}} \sum_{i=1}^{[T\tau]} \varepsilon_i \Rightarrow b(\tau) \quad \text{for } \tau \in [0, 1].^{4}
\]
i =1
A further tool is needed since the asymptotic distributions required are those of func-
tions of (normalized) partial sums. The continuous mapping theorem (CMT) states that,
if a sequence of random variables tends in distribution to another random variable,
then a given continuous function of the sequence tends to the same function of that
limit. So, for any continuous function g (.), the CMT states that
\[
g\left(\sigma^{-1} T^{-\frac{1}{2}} s_{[T\tau]}\right) \Rightarrow g(b(\tau)).
\]
An important example of a function to which the CMT applies is the integral. The CMT
and Donsker’s theorem can therefore be used to derive the Brownian motion character-
istics of a wide range of random variables based on partial sums of IID random variables.
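A small Monte Carlo experiment illustrates Donsker's theorem: with IID draws from any zero-mean distribution, the normalized partial sum evaluated at τ = 1 should be approximately standard normal, since b(1) ~ N(0, 1). The uniform innovations and sample sizes below are purely illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# IID innovations: uniform on (-1, 1), so mean 0 and variance 1/3
T, reps = 500, 5000
eps = rng.uniform(-1, 1, size=(reps, T))
sigma = np.sqrt(1 / 3)

# sigma^{-1} T^{-1/2} s_[T], i.e. the partial-sum process at tau = 1
b1 = eps.sum(axis=1) / (sigma * np.sqrt(T))

print(b1.mean(), b1.var())   # both close to the N(0, 1) values 0 and 1
```

Replacing the terminal value by another continuous functional of the path (a maximum, an integral) illustrates the continuous mapping theorem in the same way.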
(i) B(0) = 0;
(ii) B(τ) ~ N(0, τIₙ);
and, for a vector sequence εₜ ~ IID(0, Ω), the multivariate version of Donsker's theorem states that
\[
T^{-\frac{1}{2}} \sum_{i=1}^{[T\tau]} \varepsilon_i \Rightarrow W(\tau),
\]
where W is n-dimensional Brownian motion with covariance matrix Ω.
Johansen (1995, appendix B) discusses the key results needed to obtain the limiting distributions of the test statistics and estimators. Concerning the trace statistic, an important observation, allowing the application of the multivariate version of Donsker's theorem via the CMT, is that the eigenvalues on which the statistic depends are continuous functions of product moments, the asymptotic distributions of which are available.
(i) Establish the relationship between the eigenvalues that appear in the test statistic and the product moment matrices, S_{i,j}. This can be derived since the eigenvalues are the solutions to the problem |λI − S₁,₁⁻¹S₁,₀S₀,₀⁻¹S₀,₁| = 0. This is the standard eigenvalue problem, for which the solutions λ = λᵢ, i = 1, 2, …, n are the eigenvalues of S₁,₁⁻¹S₁,₀S₀,₀⁻¹S₀,₁, and so
\[
\sum_{i=1}^{n} \lambda_i = \mathrm{tr}\left(S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}\right).
\]
(ii) Establish the asymptotic distributions of the S_{i,j} under the null that the cointegrating rank is zero. These are⁵
\[
T^{-1} S_{1,1} \Rightarrow \int_0^1 WW'\,du, \tag{E.1}
\]
\[
S_{1,0} \Rightarrow \int_0^1 W(dW)', \tag{E.2}
\]
\[
S_{0,0} \xrightarrow{P} \Omega,
\]
where \( \xrightarrow{P} \) indicates 'convergence in probability', meaning that the random variable on the left-hand side tends to the deterministic quantity on the right.⁶
(iii) Replace λᵢ by the more appropriate notation λ̂ᵢ to emphasize that they are random variables, and apply the CMT to obtain the limiting behaviour of
\[
\sum_{i=1}^{n} \hat{\lambda}_i = \mathrm{tr}\left(S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}\right).
\]
This simply requires the substitution of the limit results at (ii) into the expression, and adjusting for the required normalization such that convergence is to a random variable. Thus,
\[
\sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left(T^{-1}\left(\int_0^1 WW'\,du\right)^{-1} \int_0^1 W(dW)'\;\Omega^{-1} \int_0^1 (dW)W'\right).
\]
Clearly the right-hand side of this expression has a factor of T⁻¹, indicating that it tends to zero rather than a random variable. So, for weak convergence, both sides must be multiplied by T to give
\[
T\sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left(\left(\int_0^1 WW'\,du\right)^{-1} \int_0^1 W(dW)'\;\Omega^{-1} \int_0^1 (dW)W'\right).
\]
This expression can be written in terms of standard Brownian motion, B(u) = Ω^{-1/2}W(u), as
\[
T\sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left(\int_0^1 (dB)B' \left(\int_0^1 BB'\,du\right)^{-1} \int_0^1 B(dB)'\right).
\]
(iv) Next, establish how the trace statistic can be expressed in terms of \( \sum_{i=1}^{n}\hat{\lambda}_i \). Note that, since |λ̂ᵢ| < 1, the usual expansion of the natural logarithm function applies, such that
\[
-T\sum_{i=1}^{n}\log\left(1 - \hat{\lambda}_i\right) = T\sum_{i=1}^{n}\hat{\lambda}_i + \zeta
\]
where ζ is an asymptotically irrelevant term, such that it can be ignored in the sense that both sides converge weakly to the same random variable.⁷ But \( -T\sum_{i=1}^{n}\log(1 - \hat{\lambda}_i) \) is the trace statistic for testing the null of cointegrating rank 0 against the alternative of rank n, and so
\[
-T\sum_{i=1}^{n}\log\left(1 - \hat{\lambda}_i\right) \Rightarrow \mathrm{tr}\left(\int_0^1 (dB)B' \left(\int_0^1 BB'\,du\right)^{-1} \int_0^1 B(dB)'\right). \tag{E.3}
\]
More generally, the trace statistic for testing the null of cointegrating rank r is
\[
-T\sum_{i=r+1}^{n} \log\left(1 - \hat{\lambda}_i\right).
\]
The analysis proceeds by examining the behaviour of the n − r smallest eigenvalues under the null. It is stated, without proof, that under the null hypothesis that the cointegrating rank is r, with appropriate normalization, the smallest n − r eigenvalues converge to zero while the remaining r tend to positive constants.⁸ It transpires that the problem is best addressed not in terms of the eigenvalues λᵢ, but of ρᵢ = Tλᵢ. For convenience, define
\[
S(\lambda) = \lambda S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1}.
\]
The eigenvalues are the solutions to |S(λ)| = 0. Clearly the solutions are unchanged for the problem
\[
\left|A' S(\lambda) A\right| = 0
\]
for any non-singular matrix A. Now partition A such that A = (A₁ A₂); then:
\[
\left|A' S(\lambda) A\right| = |H|\,|G| = 0 \tag{E.4}
\]
so that it is only necessary to consider G. But G can be broken down into a number of components whose asymptotic distributions can be derived, and hence, via the CMT, the distribution of the trace statistic is obtained.
Let A₁ = β, A₂ = β⊥(β′⊥β⊥)⁻¹, where β is n × r and β⊥ is n × (n − r) and orthogonal to β. The derivation begins by showing that H is redundant. For this choice of A,
\[
H = H_1 - H_2, \qquad H_1 = \lambda\,\beta' S_{1,1}\beta, \qquad H_2 = \beta' S_{1,0} S_{0,0}^{-1} S_{0,1}\beta.
\]
H is seen to be a function of λ only through H₁. Now reparameterize the problem using ρ = Tλ. Then H₁ = T⁻¹ρ β′S₁,₁β. The asymptotic limits taken from now on will be such that ρ remains fixed as T → ∞, which means λ → 0. Thus from this point on, the discussion is with respect to the eigenvalues normalized by T. Under this limit, H₁ → 0, and so, asymptotically, H is not a function of λ, and the required solutions will follow from (E.4).
Now consider G, and for convenience put D = β⊥(β′⊥β⊥)⁻¹, so that
\[
G = D'\left(S(\lambda) - S(\lambda)\beta\left[\beta' S(\lambda)\beta\right]^{-1}\beta' S(\lambda)\right)D = G_1 - G_2 - G_3 \tag{E.5}
\]
where G₁ = ρT⁻¹D′S₁,₁D, G₂ = D′S₁,₀S₀,₀⁻¹S₀,₁D, G₃ = G̃₃(β′S(λ)β)⁻¹G̃₃′, and G̃₃ = D′S(λ)β. Further convergence results are now required (Johansen, 1995, lemma 10.3). These are, generalizing (E.1) and (E.2) respectively:
\[
D'\left(S_{1,0} - S_{1,1}\beta\alpha'\right) \Rightarrow \int_0^1 W(dW)'
\]
where W now has dimension n − r;⁹ and
\[
\beta' S_{1,1}\beta \xrightarrow{P} \Sigma_{\beta,\beta}, \tag{E.6}
\]
\[
\beta' S_{1,0} \xrightarrow{P} \Sigma_{\beta,0}, \tag{E.7}
\]
\[
S_{0,0} \xrightarrow{P} \Sigma_{0,0}, \tag{E.8}
\]
\[
D' S_{1,1}\beta = O_P(1). \tag{E.9}
\]
The last equality means that the probability that D′S₁,₁β diverges tends to zero, and hence that it can be regarded as bounded in the limit.¹⁰ In the following the "=" sign represents either equality or weak convergence to the same random variable. Then, by (E.8),
\[
\tilde{G}_3 = T^{-1}\rho\, D' S_{1,1}\beta - D' S_{1,0}\Sigma_{0,0}^{-1} S_{0,1}\beta
= -D' S_{1,0}\Sigma_{0,0}^{-1} S_{0,1}\beta = -D' S_{1,0}\Sigma_{0,0}^{-1}\Sigma_{0,\beta},
\]
where the last equality follows from (E.8) and (E.7), and the previous one from (E.9). Then,
\[
\begin{aligned}
G_3 &= \tilde{G}_3\left(\beta' S(\lambda)\beta\right)^{-1}\tilde{G}_3' \\
&= \tilde{G}_3\left(T^{-1}\rho\,\beta' S_{1,1}\beta - \beta' S_{1,0}S_{0,0}^{-1}S_{0,1}\beta\right)^{-1}\tilde{G}_3' \\
&= -\tilde{G}_3\left(\beta' S_{1,0}S_{0,0}^{-1}S_{0,1}\beta\right)^{-1}\tilde{G}_3', \quad \text{as } T \to \infty \\
&= -\tilde{G}_3\left(\Sigma_{\beta,0}\Sigma_{0,0}^{-1}\Sigma_{0,\beta}\right)^{-1}\tilde{G}_3' \tag{E.12}
\end{aligned}
\]
where the last equality follows using (E.7, E.8). Substituting (E.10, E.11, and E.12) into (E.5) gives
\[
G = G_1 - G_2 - \tilde{G}_3\left(\beta' S(\lambda)\beta\right)^{-1}\tilde{G}_3'
= T^{-1}\rho\, D' S_{1,1}D - D' S_{1,0}\Sigma_{0,0}^{-1} S_{0,1}D + \tilde{G}_3\left(\Sigma_{\beta,0}\Sigma_{0,0}^{-1}\Sigma_{0,\beta}\right)^{-1}\tilde{G}_3',
\]
and define
\[
Q = \Sigma_{0,0}^{-1} - \Sigma_{0,0}^{-1}\Sigma_{0,\beta}\left(\Sigma_{\beta,0}\Sigma_{0,0}^{-1}\Sigma_{0,\beta}\right)^{-1}\Sigma_{\beta,0}\Sigma_{0,0}^{-1}.
\]
\[
D' S_{1,0}\alpha_{\perp} \Rightarrow \int_0^1 W(dW)'\alpha_{\perp}
\]
and so
\[
G_4 \Rightarrow \int_0^1 W(dW)'\alpha_{\perp}\left(\mathrm{Var}(\alpha'_{\perp}W)\right)^{-1}\left(\int_0^1 W(dW)'\alpha_{\perp}\right)'. \tag{E.14}
\]
Similarly,
\[
G_1 \Rightarrow \rho\int_0^1 WW'\,du \tag{E.16}
\]
and so
\[
G \Rightarrow \rho\int_0^1 WW'\,du - \int_0^1 W(dW)'\alpha_{\perp}\left(\mathrm{Var}(\alpha'_{\perp}W)\right)^{-1}\left(\int_0^1 W(dW)'\alpha_{\perp}\right)'.
\]
It follows from the CMT that the solutions of the problem |G| = 0 converge in distribution to those of the problem
\[
\left|\rho\int_0^1 WW'\,du - \int_0^1 W(dW)'\alpha_{\perp}\left(\mathrm{Var}(\alpha'_{\perp}W)\right)^{-1}\left(\int_0^1 W(dW)'\alpha_{\perp}\right)'\right| = 0. \tag{E.17}
\]
The solutions for ρ are unchanged if the matrix of which the determinant is being taken is pre- and post-multiplied by Ω^{-1/2}, which leads to simplification since the outer occurrences of W become standardized as B = Ω^{-1/2}W. Thus (E.17) may be replaced by
\[
\left|\rho\int_0^1 BB'\,du - \int_0^1 B(dW)'\alpha_{\perp}\left(\mathrm{Var}(\alpha'_{\perp}W)\right)^{-1}\left(\int_0^1 B(dW)'\alpha_{\perp}\right)'\right| = 0. \tag{E.18}
\]
Finally, noting also that (Var(α′⊥W))^{-1/2}(α′⊥W) = B, equation (E.18) may be written
\[
\left|\rho\int_0^1 BB'\,du - \int_0^1 B(dB)'\left(\int_0^1 B(dB)'\right)'\right| = 0. \tag{E.19}
\]
The trace statistic for testing the null of cointegrating rank r is
\[
-T\sum_{i=r+1}^{n}\log\left(1 - \hat{\lambda}_i\right),
\]
which is asymptotically equivalent to
\[
T\sum_{i=r+1}^{n}\hat{\lambda}_i.
\]
In terms of the normalized eigenvalues ρ̂ᵢ = Tλ̂ᵢ,
\[
T\sum_{i=r+1}^{n}\hat{\lambda}_i = \sum_{i=r+1}^{n}\hat{\rho}_i \Rightarrow
\mathrm{tr}\left(\int_0^1 (dB)B' \left(\int_0^1 BB'\,du\right)^{-1} \int_0^1 B(dB)'\right) \tag{E.20}
\]
providing the required asymptotic distribution for the trace statistic for testing the null of cointegrating rank r against the alternative of rank n. This result specializes to that for testing cointegrating rank 0 against rank n by setting r = 0, as can be seen by comparing equations (E.20) and (E.3).
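The limit distribution in (E.20) can be approximated by simulation. For n − r = 1 (and no deterministic terms) the trace expression reduces to (∫B dB)²/∫B² du; the sketch below discretizes the Brownian motion (the step count and replication number are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

T, reps = 500, 5000
draws = np.empty(reps)
for j in range(reps):
    dB = rng.standard_normal(T) / np.sqrt(T)   # Brownian increments
    B = np.cumsum(dB)
    num = np.sum(B[:-1] * dB[1:]) ** 2         # approximates (int B dB)^2
    den = np.sum(B ** 2) / T                   # approximates int B^2 du
    draws[j] = num / den

# Empirical 95% quantile, in the region of the tabulated 5% critical value
print(np.quantile(draws, 0.95))
```

The simulated 95% quantile lands near the tabulated asymptotic critical value for this case (around 4, subject to Monte Carlo and discretization error), which is how such tables are constructed in practice.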
\[
\Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i^{*} \Delta x_{t-i} + \mu_t + \varepsilon_t
\]
where μₜ = μ₀ + μ₁t which, in its most general form, allows the process xₜ to have a quadratic trend, and the cointegrating relations to have a linear trend (Johansen, 1991).¹¹ The deterministic components, in increasing order of complexity, are:
(i) xₜ has no deterministic terms and all stationary components have zero mean.
(ii) xₜ has neither quadratic nor linear trend, but both Δxₜ and β′xₜ have constant terms.
(iii) xₜ has a linear trend, but this is eliminated in the cointegrating combinations.
(iv) xₜ has no quadratic trend, but has a linear trend that is also present in the cointegrating relations.
(v) xₜ has a quadratic trend, but the cointegrating relations have a linear trend only.
The asymptotic distribution of the trace statistic for testing the null of cointegrating rank r has the same generic form in each case, but the distributions have to be corrected differently. This form is
\[
\mathrm{tr}\left(\int_0^1 (dB)F' \left(\int_0^1 FF'\,du\right)^{-1} \int_0^1 F(dB)'\right) \tag{E.21}
\]
where B is an (n − r)-dimensional standard Brownian motion, and F is the same standard Brownian motion corrected for the deterministic components, with the final element (either the (n − r)th or the (n − r + 1)st) consisting of the appropriate power of u corrected for the same components.
This is described in Table E.1. The coefficients aᵢ and bᵢ are fixed and required to correct for the included deterministic terms. All elements of the corrected Brownian motion, except the last, are, in effect, a residual having regressed the standard case on the deterministic terms. The last term, the qth in the table below, corresponds to regressing the random variable u on the same terms. If the highest order deterministic term is orthogonal to α then the final term is the (n − r + 1)st, otherwise it is the (n − r)th.
Table E.1

Case   μ₀           μ₁    q          Corrections                                Final element
(i)    0            0     n − r      aᵢ = 0, bᵢ = 0                             –
(ii)   ακ₀          0     n − r      aᵢ = 0, bᵢ = 0                             1
(iii)  α′⊥μ₀ ≠ 0    0     n − r − 1  aᵢ = ∫₀¹Bᵢ(u)du⁺, bᵢ = 0                   u − ā, ā = 1/2*

Note: aᵢ and bᵢ are fixed coefficients necessary to correct for the included deterministic terms. ⁺Corrects Bᵢ(u) for a constant. *Corrects u for a constant. ⁺⁺Corrects Bᵢ(u) for a linear time trend. **Corrects u² for a linear time trend.
Sources for the cases: D – Doornik (2003); J – Johansen (1995); OL – Osterwald-Lenum (1992).
Cheung and Lai (1993) suggest correcting for, in effect, the number of parameters esti-
mated in the VAR. The correction is to replace T by T – np. Equivalently, the asymptotic
critical values can be multiplied by T/(T – np). The result is to correct a tendency of the
asymptotic tests13 to be over-sized. That is, when used naively, the tests reject the null
hypothesis too frequently. When testing the null of non-cointegration, this results in
findings of cointegration where it does not exist.
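The correction is simple arithmetic; a sketch with invented numbers (the critical value below is illustrative, not taken from a published table):

```python
# Cheung and Lai (1993) degrees-of-freedom correction: replace T by T - n*p,
# or equivalently scale the asymptotic critical value by T / (T - n*p).
T, n, p = 100, 4, 2            # sample size, variables, VAR lag order (illustrative)
asymptotic_cv = 47.21          # hypothetical asymptotic critical value

corrected_cv = asymptotic_cv * T / (T - n * p)
print(round(corrected_cv, 2))  # 51.32: the corrected test is more conservative
```

Raising the critical value in this way offsets the over-rejection of the naive asymptotic test in small samples.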
Defining
\[
A(n, r) = \int_0^1 (dB)F' \left(\int_0^1 FF'\,du\right)^{-1} \int_0^1 F(dB)'
\]
as described in equation (E.21), the asymptotic distribution of the maximal eigenvalue
statistic is, analogously, the maximal eigenvalue of A(n,r). In practice, the maximal
eigenvalue statistic would be used in the same sequential manner as the trace statistic,
but it is important to note that there is no proof yet available of the consistency of this
procedure for this statistic. It is therefore reasonable to place emphasis on the trace
statistic.
Partial systems
The system discussed treats all variables as endogenous. There is no sense in which any of them plays a different causal role to any others. Johansen (1992) has discussed this, and, more recently, Harbo, Johansen, Nielsen, and Rahbek (1998), and Pesaran, Shin and Smith (2001) have considered the impact of exogenous I(1) variables on the asymptotic distribution of the test statistics. This generates a wider set of models for which the distributions must be calculated, depending not only on the total number of variables in the system (n), but also on the number of these that are endogenous (n₁, say). Thus A(n, r) as defined above, where B is of dimension n − r, and F depends on B as described in Table E.1, is replaced by
\[
\tilde{A}(n, k, r) = \int_0^1 (d\tilde{B})\tilde{F}' \left(\int_0^1 \tilde{F}\tilde{F}'\,du\right)^{-1} \int_0^1 \tilde{F}(d\tilde{B})'
\]
\[
R_{0,t} = \alpha\beta' R_{1,t} + \varepsilon_t
\]
or
\[
\varepsilon_t = R_{0,t} - \alpha\beta' R_{1,t} = [I : -\alpha\beta'] \begin{bmatrix} R_{0,t} \\ R_{1,t} \end{bmatrix}.
\]
It follows from Doornik and Hendry (2001) that the concentrated likelihood for this multivariate least squares problem can be written:
\[
\log L = K - \frac{T}{2}\log\left|\varepsilon\varepsilon'\right|
= K - \frac{T}{2}\log\left|[I : -\alpha\beta'] \begin{bmatrix} R_0 \\ R_1 \end{bmatrix} [R_0' : R_1'] \begin{bmatrix} I \\ -\beta\alpha' \end{bmatrix}\right|
= K - \frac{T}{2}\log\left|[I : -\alpha\beta'] \begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix} \begin{bmatrix} I \\ -\beta\alpha' \end{bmatrix}\right|.
\]
Maximizing with respect to α gives
\[
\log L = K - \frac{T}{2}\log\left|S_{0,0} - S_{0,1}\beta\left(\beta' S_{1,1}\beta\right)^{-1}\beta' S_{1,0}\right|
= K - \frac{T}{2}\log\left(|S_{0,0}|\,\left|\left(\beta' S_{1,1}\beta\right)^{-1}\right|\,\left|\beta'\left(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}\right)\beta\right|\right).
\]
Subject to the normalization β′S₁,₁β = I, and given that the solution to the likelihood problem with respect to β is invariant to S₀,₀, the likelihood problem is equivalent to solving the determinantal equation |β′(S₁,₁ − S₁,₀S₀,₀⁻¹S₀,₁)β|, which in the cointegration case is the reduced rank problem, |S₁,₁ − S₁,₀S₀,₀⁻¹S₀,₁| = 0. What is required is a solution to the usual eigenvalue problem, |λS₁,₁ − S₁,₀S₀,₀⁻¹S₀,₁| = 0, where for each non-zero eigenvalue there is an eigenvector βᵢ such that:
\[
\left(\lambda_i S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}\right)\beta_i = 0.
\]
Stacking the eigenvectors associated with the non-zero eigenvalues into an n × r matrix β̂, then β̂ is the matrix that diagonalizes S₁,₁ − S₁,₀S₀,₀⁻¹S₀,₁. Therefore:
\[
\left|\hat{\beta}'\left(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}\right)\hat{\beta}\right| = \left|I - \Lambda_r\right| = \prod_{i=1}^{r}\left(1 - \lambda_i\right).
\]
As was stated in chapter 4, any test of parameters must compare the above likelihood,
which imposes no restrictions on either or with one on which restrictions have been
imposed. Therefore:
\[
\log L\left(r, H_g: \alpha = f(\theta) \cap \beta = g(\varphi)\right)
= K - \frac{T}{2}\log\left|[I : -f(\theta)g(\varphi)'] \begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix} \begin{bmatrix} I \\ -g(\varphi)f(\theta)' \end{bmatrix}\right|.
\]
Doornik and Hendry (2001) explain how to maximize the non-linear likelihood under a
range of different restrictions.
Appendix G: Proof of Identification
based on an Indirect Solution
Define α and β as consisting of αᵢⱼ and βᵢⱼ elements for i = 1, …, 5, and j = 1, …, 4, and Π as consisting of πᵢⱼ elements for i = 1, …, 5, and j = 1, …, 5. For (WE), αᵢ₂ = 0 and π₅. = 0, which excludes them from our deliberations. However, over-identification is sufficient for identification, which implies that the conditions for over-identification are necessary for the preferred parameters to be identified. If we look at equation (5.10) and set α₂′ = 0, then:
\[
\Pi = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix} \begin{pmatrix} \beta_1 \\ 0 \end{pmatrix}'.
\]
After imposing the same restrictions as Hunter and Simpson (1995), α₁ and β take the following form:
\[
\beta_{13} = \pi_{13}, \quad \beta_{14} = \pi_{14}, \quad \beta_{33} = \pi_{33}, \quad \beta_{34} = \pi_{34} \quad \text{and} \quad \beta_{44} = \pi_{44}.
\]
Consequently:
\[
\alpha_{11} = \pi_{11} + \pi_{14}, \quad \alpha_{21} = \frac{(\pi_{13} + \pi_{12})}{(\pi_{11} + \pi_{14})} \quad \text{and} \quad \alpha_{41} = \pi_{41} + \pi_{44}.
\]
Furthermore:
\[
\beta_{21} = \pi_{21} - (\pi_{25} + \pi_{24}), \quad \beta_{24} = \pi_{24} + \pi_{25}, \quad
\beta_{22} = \pi_{22} - \beta_{21}\alpha_{21} \quad \text{and} \quad \alpha_{52} = \frac{\pi_{25}}{\pi_{22}}.
\]
The long-run restrictions imply that there are three over-identified parameters, as there are three unused solutions associated with some of the parameters in the system:
\[
\beta_{34} = -\pi_{31}, \quad \beta_{33} = -\pi_{32}, \quad \alpha_{41}\beta_{21} = \pi_{42}.
\]
Hence, the parameters are slightly over-identified, which is surprising given the number
of restrictions adopted, 20.
Appendix H: Generic Identification of
Long-Run Parameters in Section 5.5
\[
\begin{pmatrix}
\mathrm{vec}(\beta_{11}\,\beta_{12}) \\
\mathrm{vec}(\beta_{21}\,\beta_{22}) \\
\mathrm{vec}(\beta_{31}\,\beta_{32}) \\
\mathrm{vec}(\beta_{41}\,\beta_{42}) \\
\mathrm{vec}(\beta_{51}\,\beta_{52}) \\
\mathrm{vec}(\beta_{61}\,\beta_{62})
\end{pmatrix}
=
\begin{pmatrix}
A^{-1}\mathrm{vec}(\pi_{31}\,\pi_{61}) \\
A^{-1}\mathrm{vec}(\pi_{32}\,\pi_{62}) \\
A^{-1}\mathrm{vec}(\pi_{33}\,\pi_{63}) \\
A^{-1}\mathrm{vec}(\pi_{34}\,\pi_{64}) \\
A^{-1}\mathrm{vec}(\pi_{35}\,\pi_{65}) \\
A^{-1}\mathrm{vec}(\pi_{36}\,\pi_{66})
\end{pmatrix},
\qquad
A = \begin{pmatrix} \pi_{31} & 0 \\ 0 & \pi_{62} \end{pmatrix},
\]
so that
\[
\beta_{i1} = \pi_{31}^{-1}\pi_{3i}, \;\; i = 1, 2, 3, 5, 6, \qquad
\beta_{i2} = \pi_{62}^{-1}\pi_{6i}, \;\; i = 1, 2, 3, 4, 6,
\]
\[
1 = \pi_{31}^{-1}\pi_{34}, \qquad 1 = \pi_{62}^{-1}\pi_{65}.
\]
\[
\begin{pmatrix}
\mathrm{vec}(\beta_{11}\,\beta_{12}) \\
\mathrm{vec}(\beta_{21}\,\beta_{22}) \\
\mathrm{vec}(\beta_{31}\,\beta_{32}) \\
\mathrm{vec}(\beta_{41}\,\beta_{42}) \\
\mathrm{vec}(\beta_{51}\,\beta_{52}) \\
\mathrm{vec}(\beta_{61}\,\beta_{62})
\end{pmatrix}
= \left(B^{-1} \otimes I_6\right)
\begin{pmatrix}
\mathrm{vec}(\pi_{14}\,\pi_{24}\,\dots\,\pi_{64}) \\
\mathrm{vec}(\pi_{15}\,\pi_{25}\,\dots\,\pi_{65})
\end{pmatrix},
\qquad
B = \begin{pmatrix} 1 & -\gamma_{51} \\ -\gamma_{42} & 1 \end{pmatrix},
\]
where δ = 1 − γ₄₂γ₅₁. Solving the former equation, subject to the restrictions on β:
\[
\begin{aligned}
\beta_{11} &= \delta^{-1}(\pi_{14} - \gamma_{51}\pi_{15}), &
\beta_{21} &= \delta^{-1}(\pi_{24} - \gamma_{51}\pi_{25}), &
\beta_{12} &= \delta^{-1}(-\gamma_{42}\pi_{14} - \pi_{15}) = 0, \\
\beta_{22} &= \delta^{-1}(-\gamma_{42}\pi_{24} - \pi_{25}) = 0, &
\beta_{31} &= \delta^{-1}(\pi_{34} - \gamma_{51}\pi_{35}), \\
\beta_{41} &= \delta^{-1}(\pi_{44} - \gamma_{42}\pi_{45}) = 0, &
\beta_{51} &= \delta^{-1}(\pi_{54} - \gamma_{42}\pi_{55}) = 0, \\
\beta_{61} &= \delta^{-1}(\pi_{64} - \gamma_{42}\pi_{65}) = 0, &
\beta_{32} &= \delta^{-1}(-\gamma_{42}\pi_{34} - \pi_{35}) = 0, \\
\beta_{42} &= \delta^{-1}(-\gamma_{51}\pi_{44} - \pi_{45}), &
\beta_{52} &= \delta^{-1}(-\gamma_{51}\pi_{54} - \pi_{55}), &
\beta_{62} &= \delta^{-1}(-\gamma_{51}\pi_{64} - \pi_{65}).
\end{aligned}
\]
As the parameters are over-identified, one only needs to consider the following results: β₁₁, β₂₁, β₃₁, β₄₂, β₅₂.
References
Burke, S.P. and Hunter, J. (1998) The impact of moving average behaviour on the
Johansen trace test for cointegration. Discussion Papers in Quantitative Economics and
Computing, no. 60, Department of Economics, University of Reading.
Caner, M., and Kilian, L. (2001) Size distortions of tests of the null hypothesis of
stationarity: evidence and implications for the PPP debate. Journal of International
Money and Finance, 20, 639–57.
Cheung, Y.-W. and Lai, K.S. (1993) Finite-sample sizes of Johansen’s likelihood ratio
tests for cointegration. Oxford Bulletin of Economics and Statistics, 55, 313–28.
Chow, G.C. (1978) Analysis and Control of Dynamic Economic Systems. New York: John
Wiley.
Clements, M.P. and Hendry, D.F. (1995) Forecasting in cointegrated systems. Journal of
Applied Econometrics, 10, 127–46.
Clements, M.P. and Hendry, D.F. (1998) Forecasting Economic Time Series. Cambridge:
Cambridge University Press.
Clements, M.P. and Hendry, D.F. (2001) Forecasting Non-Stationary Economic Time Series.
London: The MIT Press.
Corradi, V., Swanson, N.R., and White, H. (2000) Testing for stationarity-ergodicity and
for comovements between nonlinear discrete time Markov processes. Journal of
Econometrics, 96, 39–73.
Davidson, J.E.H. (1994) Stochastic Limit Theory. Oxford: Oxford University Press.
Davidson, J.E.H., Hendry, D.F., Srba, F., and Yeo, S. (1978) Econometric modelling of
the aggregate time series relationships between consumers, expenditure and income
in the United Kingdom. Economic Journal, 88, 661–92.
Davidson, R. and MacKinnon, J.G. (1993) Estimation and Inference in Econometrics.
New York: Oxford University Press.
Davidson, R. and MacKinnon, J.G. (1998) Graphical methods for investigating the size
and power of hypothesis tests. The Manchester School, 6, 1–26.
Deaton, A.S. and Muellbauer, J.N.J. (1980) An almost ideal demand system. American
Economic Review, 70, 312–26.
Dickey, D.A. and Fuller, W.A. (1979) Distribution of the estimators for autoregressive
time series with a unit root. Journal of the American Statistical Association, 74, 427–31.
Dickey, D.A. and Fuller, W.A. (1981) Likelihood ratio statistics for autoregressive time
series with a unit root. Econometrica, 49, 1057–72.
Dickey, D.A, Hasza, D.P. and Fuller, W.A. (1984) Testing for unit roots in seasonal time
series. Journal of the American Statistical Association, 79, 355–67.
Dickey, D.A. and Pantula, S.G. (1987) Determining the order of differencing in auto-
regressive processes. Journal of Business and Economic Statistics, 5, 455–61.
Dhrymes, P.J. (1984) Mathematics for Econometrics. New York: Springer-Verlag.
Dolado, J., Galbraith, J.W. and Banerjee, A. (1991) Estimating intertemporal quadratic
adjustment costs models with dynamic data. International Economic Review, 32,
919–36.
Dornbusch, R. (1976) Expectations and exchange rate dynamics. Journal of Political
Economy, 84, 1161–76.
Doornik, J.A. (1995) Testing general restrictions on the cointegration space. Mimeo,
Nuffield College, Oxford.
Doornik, J.A. (1998), Approximations to the asymptotic distribution of cointegration
tests. Journal of Economic Surveys, 12, 573–93.
Doornik, J.A. (2003) Asymptotic tables for cointegration tests based on the gamma-
distribution approximation. Mimeo, Nuffield College, University of Oxford.
Doornik, J.A. and Hendry, D.F. (1996) PCFIML 9. London: Thompson International
Publishers.
Doornik, J.A. and Hendry, D.F. (2001) PCFIML 10. London: Timberlake Consultants
Press.
Dunne, J.P. and Hunter, J. (1998) The allocation of government expenditure in the UK:
a forward looking dynamic model. Paper presented at the International Institute of
Public Finance Conference, Cordoba, Argentina, August.
Elliott, G., Rothenberg, T.J and Stock, J.H. (1996) Efficient tests for an autoregressive
unit root. Econometrica, 64, 813–36.
Engle, C. (2001) The responsiveness of consumer prices to exchange rates and the impli-
cations for exchange-rate policy: a survey of a few recent new open economy macro
models. Mimeo University of Wisconsin.
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation. Econometrica, 50, 987–1007.
Engle, R.F. and Granger, C.W.J. (1987) Co-integration and error-correction: representa-
tion, estimation and testing. Econometrica, 55, 251–76.
Engle, R.F. and Granger, C.W.J. (1991) Long-Run Economic Relationships. Oxford: Oxford
University Press.
Engle, R.F. and Yoo, B.S. (1987) Forecasting and testing in co-integrated systems. Journal
of Econometrics, 35, 143–59.
Engle, R.F. and Yoo, B.S. (1991) Cointegrated time series: an overview with new results.
Chapter 12 in R.F. Engle and C.W.J. Granger (eds), Long-run Economic Relationships.
Oxford: Oxford University Press.
Engle, R.F., Hendry, D.F. and Richard, R.F. (1983) Exogeneity. Econometrica, 51, 277–304.
Engsted, T. and Haldrup, N. (1997) Money demand, adjustment costs and forward
looking behaviour. Journal of Policy Modeling, 19, 153–73.
Engsted, T. and Johansen, S. (1999) Granger's representation theorem and multicointegration. In R.F. Engle and H. White (eds), Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W.J. Granger. Oxford and New York: Oxford University Press, 200–11.
Ericsson, N.R. (1994) Testing exogeneity: an introduction. In Testing Exogeneity,
N.R. Ericsson and J.S. Irons (eds). Oxford: Oxford University Press, 3–38.
Ericsson, N.R. and Irons, J.S. (1994) Testing Exogeneity. Oxford: Oxford University Press.
Ericsson, N.R., Hendry, D.F. and Mizon, G.E. (1998) Exogeneity, cointegration and
economic policy analysis. Journal of Business and Economic Statistics, 16, 371–87.
Fama, E.F. (1970) Efficient capital markets: a review of theory and empirical work.
Journal of Finance, 25, 383–417.
Favero, C. and Hendry, D.F. (1992) Testing the Lucas critique: a review. Econometric
Reviews, 11, 265–306.
Fisher, P.G., Tanna, S.K., Turner, D.S., Wallis, K.F. and Whitley, J.D. (1990) Econometric
evaluation of the exchange rate in models of the UK economy. Economic Journal, 100,
1024–56.
Flôres, R. and Szafarz, A. (1995) Efficient markets do not cointegrate. Discussion Paper
9501, CEME, Université Libre de Bruxelles.
Flôres, R., and Szafarz, A. (1996) An extended definition of cointegration. Economics
Letters, 50, 193–5.
Florens, J.-P., Mouchart, M. and Rolin, J.-M. (1990) Sequential experiments. Chapter 6 in
Elements of Bayesian Statistics. New York: Marcel Dekker.
Franses, P.H. (1994) A multivariate approach to modeling univariate seasonal time
series. Journal of Econometrics, 63, 133–51.
Hall, R.E. (1978) Stochastic implications of the life cycle-permanent income hypothesis:
theory and evidence. Journal of Political Economy, 86, 971–87.
Hall, S.J. and Wickens, M. (1994) Causality in integrated systems. Centre for Economic
Forecasting, Discussion paper, 27–93, London Business School.
Hamilton, J.D. (1994) Time Series Analysis. Princeton: Princeton University Press.
Hansen, B.E. (1995) Rethinking the univariate approach to unit root testing: using
covariates to increase power. Econometric Theory, 11, 1148–72.
Hansen, L.P. and Sargent, T.J. (1982) Instrumental variables procedures for estimating
linear rational expectations models. Journal of Monetary Economics, 9, 263–96.
Hansen, P. and Johansen, S. (1998) Workbook on Cointegration. Oxford: Oxford
University Press.
Harbo, I., Johansen, S., Nielsen, B. and Rahbek, A. (1998) Asymptotic inference on
cointegrating rank in partial systems. Journal of Business and Economic Statistics, 16,
388–99.
Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter.
Cambridge: Cambridge University Press.
Harvey, A.C. (1993) Time Series Models (second edition). London: Harvester Wheatsheaf.
Hatanaka, M. (1996) Time-series-based Econometrics: Unit Roots and Cointegration. Oxford:
Oxford University Press.
Haug, A.A. (1993) Residual based tests for cointegration: a Monte Carlo study of size
distortions. Economics Letters, 41, 345–51.
Haug, A.A. (1996) Tests for cointegration: A Monte Carlo comparison. Journal of
Econometrics, 71, 89–115.
Hendry, D.F. (1988) The encompassing implications of feedback versus feed-forward
mechanisms in econometrics. Oxford Economic Papers, 40, 132–49.
Hendry, D.F. (1995) Dynamic Econometrics. Oxford: Oxford University Press.
Hendry, D.F. and Ericsson, N.R. (1990) An econometric analysis of U.K. money demand
in Monetary Trends in the United States and the United Kingdom by Milton
Friedman and Anna Schwartz. American Economic Review, 81, 8–38.
Hendry, D.F. and Favero, C. (1992) Testing the Lucas critique: a review. Econometric
Reviews, 11, 265–306.
Hendry, D.F. and Mizon, G.E. (1978) Serial correlation as a convenient simplification
not a nuisance: a comment on a study of the demand for money by the Bank of
England. Economic Journal, 88, 549–63.
Hendry, D.F. and Mizon, G.E. (1993) Evaluating dynamic econometric models by
encompassing the VAR. Chapter 18 in P.C.B. Phillips (ed.), Models, Methods and
Applications of Econometrics: Essays in Honour of A.R. Bergstrom. Cambridge, MA:
Blackwell Publishers, 272–300.
Hendry, D.F., Pagan, A. and Sargan, J.D. (1983) Dynamic specification. In The Handbook of
Econometrics. Amsterdam: North-Holland.
Hendry, D.F. and Richard, J.F. (1982) On the formulation of empirical models in
dynamic econometrics. Journal of Econometrics, 20, 3–33.
Hendry, D.F. and Richard, J.F. (1983) The econometric analysis of economic time series.
International Statistical Review, 51, 111–63.
Henry, M. and Robinson, P.M. (1996) Bandwidth choice in Gaussian semi-parametric
estimation of long-run dependence. In Papers and Proceedings of the Athens
Conference on Applied Probability and Time Series Analysis, P.M. Robinson and
M. Rosenblatt (eds). New York: Springer-Verlag, 220–32.
Hosking, J.R.M. (1981) Fractional differencing. Biometrika, 68, 165–76.
Hubrich, K., Lütkepohl, H. and Saikkonen, P. (2001) A review of systems cointegration
tests. Econometric Reviews, 20, 247–318.
Hull, J. (2002) Options, Futures and Other Derivatives. London: Prentice Hall.
Hunter, J. (1989a) Dynamic modelling of expectations: with particular reference to the
labour market. Unpublished PhD manuscript, London School of Economics.
Hunter, J. (1989b) The effect of cointegration on solutions to rational expectations
models. Paper presented at European Econometrics Society Conference in Munich,
September.
Hunter, J. (1990) Cointegrating exogeneity. Economics Letters, 34, 33–5.
Hunter, J. (1992a) Tests of cointegrating exogeneity for PPP and uncovered interest rate
parity for the UK. Journal of Policy Modeling, Special Issue: Cointegration, Exogeneity
and Policy Analysis, 14, 453–63.
Hunter, J. (1992b) Representation and global identification of linear rational expecta-
tions models. Paper presented at the European Econometrics Society Conference in
Uppsala, CERF Discussion Paper, 92–03, Brunel University.
Hunter, J. (1994) A parsimonious cointegration representation of multi-cointegration.
Paper presented at the European Econometrics Society Conference in Maastricht,
CERF Discussion paper no 94–02, Brunel University.
Hunter, J. (1995) Representation and global identification of linear rational expecta-
tions. Mimeo, Brunel University.
Hunter, J. and Dislis, C.D. (1996) Cointegration representation, identification and esti-
mation. Brunel University, Centre for Research in Empirical Finance, Discussion
Paper.
Hunter, J. and Ioannidis, C. (2000) Identification and identifiability of non-linear
IV/GMM Estimators. Paper presented at the LACEA conference in Uruguay and the
ECSG conference in Bristol, Brunel University Discussion Paper, DP07–00.
Hunter, J. and Simpson, M. (1995) Exogeneity and identification in a model of the UK
effective exchange rate. Paper presented at the EC2 Conference in Aarhus, December
1995, and the Econometric Society European Meeting in Istanbul, 1996.
Inder, B. (1993) Estimating long-run relationships in economics: a comparison of
different approaches. Journal of Econometrics, 57, 53–68.
Johansen, S. (1988a) The mathematical structure of error correction models.
Contemporary Mathematics, 80, 359–86.
Johansen, S. (1988b) Statistical analysis of cointegration vectors. Journal of Economic
Dynamics and Control, 12, 231–54.
Johansen, S. (1991a) Estimation and hypothesis testing of cointegrating vectors in
Gaussian vector autoregressive models. Econometrica, 59, 1551–80.
Johansen, S. (1991b) A statistical analysis of cointegration for I(2) variables. University
of Helsinki, Department of Statistics Report, no. 77.
Johansen, S. (1992a) Testing weak exogeneity and the order of cointegration in UK
money demand data. Journal of Policy Modeling, Special Issue: Cointegration,
Exogeneity and Policy Analysis, 14, 313–34.
Johansen, S. (1992b) Cointegration in partial systems and the efficiency of single-equa-
tion analysis. Journal of Econometrics, 52, 389–402.
Johansen, S. (1995a) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models.
Oxford: Oxford University Press.
Johansen, S. (1995b) Identifying restrictions of cointegrating vectors. Journal of
Econometrics, 69, 111–32.
Johansen, S. (1995c) A statistical analysis of cointegration for I(2) variables. Econometric
Theory, 11, 25–59.
Johansen, S. (2002a) A small sample correction for the test of cointegrating rank in the
vector autoregressive model. Econometrica, 70, 1929–61.
Johansen, S. (2002b) A small sample correction for tests of hypotheses on the co-
integrating vectors. Journal of Econometrics, 111, 195–221.
Johansen, S. and Juselius, K. (1990) Maximum likelihood estimation and inference on
cointegration – with applications to the demand for money. Oxford Bulletin of
Economics and Statistics, 52, 169–210.
Johansen, S. and Juselius, K. (1992) Some structural hypotheses in a multi-variate coin-
tegration analysis of the purchasing power parity and the uncovered interest parity
for UK. Journal of Econometrics, 53, 211–44.
Johansen, S. and Juselius, K. (1994) Identification of the long-run and the short-run
structure: An application to the IS/LM model. Journal of Econometrics, 63, 7–36.
Johansen, S. and Swensen A.R. (1999) Testing exact rational expectations in co-
integrated vector autoregressive models. Journal of Econometrics 93, 73–91.
Juselius, K. (1994) Do PPP and UIRP hold in the long run? An example of likelihood
inference in a multivariate time-series model. Paper presented at the Econometric
Society European Meeting, Maastricht.
Juselius, K. (1995) Do purchasing power parity and uncovered interest rate parity hold
in the long run? An example of likelihood inference in a multivariate time-series
model. Journal of Econometrics, 69, 178–210.
Keynes, J.M. (1939) Professor Tinbergen’s method. Reprinted in the Collected Writings of
John Maynard Keynes, vol. XIV, 306–18.
Kollintzas, T. (1985) The symmetric linear rational expectations model. Econometrica, 53,
963–76.
Koopmans, T.C. (1953) Identification problems in economic model construction. In
Studies in Econometric Method, Cowles Commission Monograph 14, T.C. Koopmans
and W.C. Hood (eds). New York: John Wiley and Sons.
Kremers, J.J.M., Ericsson, N.R. and Dolado, J. (1992) The power of cointegration tests.
Oxford Bulletin of Economics and Statistics, 54, 325–48.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P. and Shin, Y. (1992) Testing the null of
stationarity against the alternative of a unit root: how sure are we that economic time
series have a unit root? Journal of Econometrics, 54, 159–78.
Lee, D., and Schmidt, P. (1996) On the power of the KPSS test of stationarity against
fractionally-integrated alternatives. Journal of Econometrics, 73, 285–302.
Leybourne, S.J. and McCabe, B.M.P. (1994) A consistent test for a unit root. Journal of
Business and Economic Statistics, 12, 157–66.
Leybourne, S.J., McCabe, B.P.M. and Tremayne, A.R. (1996) Can economic time series be
differenced to stationarity? Journal of Business and Economic Statistics, 14, 435–46.
Lin, J.-L. and Tsay, R.S. (1996) Co-integration constraint and forecasting: an empirical
examination. Journal of Applied Econometrics, 11, 519–38.
Lippi, M. and Reichlin, L. (1994) VAR analysis, non-fundamental representations,
Blaschke matrices. Journal of Econometrics, 63, 290–307.
Lucas, R.E. (1976) Econometric policy evaluation: a critique. In The Phillips Curve and
Labor Markets, Carnegie-Rochester Conference Series on Public Policy, vol. 1, K. Brunner
and A.H. Meltzer (eds). Amsterdam: North-Holland.
Lütkepohl, H. (1991) Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag.
Lütkepohl, H. and Claessen, H. (1993) Analysis of cointegrated VARMA processes. Paper
presented at the EC2 conference at the Institute for Economics and Statistics, Oxford,
December.
MacKinnon, J.G. (1991) Critical values for cointegration tests. In Long-Run Economic
Relationships, R.F. Engle and C.W.J. Granger (eds). Oxford: Oxford University Press.
MacKinnon, J.G., Haug, A.A. and Michelis, L. (1999) Numerical distribution functions of
likelihood ratio tests for cointegration. Journal of Applied Econometrics, 14, 563–77.
Maddala, G.S. and Kim, I.-M. (1998) Unit Roots, Cointegration and Structural Change.
Cambridge: Cambridge University Press.
Marinucci, D. and Robinson, P.M. (2001) Finite-sample improvements in statistical
inference with I(1) processes. Journal of Applied Econometrics, 16, 431–44.
McCabe, B. and Tremayne, A.R. (1993) Elements of Modern Asymptotic Theory with
Statistical Applications. Manchester: Manchester University Press.
Mosconi, R. and Giannini, C. (1992) Non-causality in cointegrated systems: representa-
tion, estimation and testing. Oxford Bulletin of Economics and Statistics, 54, 399–417.
Muellbauer, J. (1983) Surprises in the consumption function. Economic Journal,
Supplement March, 34–50.
Nankervis, J.C. and Savin, N.E. (1985) Testing the autoregressive parameter with the
t-statistic. Journal of Econometrics, 27, 143–61.
Nankervis, J.C. and Savin, N.E. (1988) The Student’s t approximation in a stationary first
order autoregressive model. Econometrica, 56, 119–45.
Nickell, S.J. (1985) Error-correction, partial adjustment and all that: an expository note.
Oxford Bulletin of Economics and Statistics, 47, 119–29.
Newey, W. and West, K. (1987) A simple positive semi-definite heteroskedasticity and
autocorrelation consistent covariance matrix. Econometrica, 55, 703–8.
Ng, S. and Perron, P. (1995) Unit root tests in ARMA models with data-dependent
methods for selection of the truncation lag. Journal of the American Statistical
Association, 90, 268–81.
Osterwald-Lenum, M. (1992) A note with quantiles of the asymptotic distribution of the
maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and
Statistics, 54, 461–71.
Park, J.Y. and Phillips, P.C.B. (1988) Statistical inference in regressions with integrated
processes: Part 1. Econometric Theory, 4, 468–97.
Parker, S. (1998) Opening a can of worms: the pitfalls of time series regression analyses
of income inequality. Brunel University Discussion Paper, 98–11.
Paruolo, P. (1996) On the determination of integration indices in I(2) systems. Journal of
Econometrics, 72, 313–56.
Patterson, K. (2000) An Introduction to Applied Econometrics: a Time Series Approach.
Basingstoke: Macmillan.
Patterson, K. (2005) Topics in Nonstationary Economic Time Series. Basingstoke: Palgrave
Macmillan.
Pesaran, M.H. (1981) Identification of rational expectations models. Journal of
Econometrics, 16, 375–98.
Pesaran, M.H. (1987) The Limits to Rational Expectations. Oxford: Basil Blackwell.
Pesaran, M.H., Shin, Y. and Smith, R.J. (2000) Structural analysis of vector error correc-
tion models with exogenous I(1) variables. Journal of Econometrics, 97, 293–343.
Pesaran, B. and Pesaran, M.H. (1998) Microfit 4. Oxford: Oxford Electronic Publishing.
Perron, P. (1989) The great crash, the oil price shock and the unit root hypothesis.
Econometrica, 57, 1361–1401.
Perron, P. (1990) Testing for a unit root in a time series with a changing mean. Journal of
Business and Economic Statistics, 8, 153–62.
Phillips, P.C.B. (1987) Time series regression with a unit root. Econometrica, 55, 277–302.
Phadke, M.S. and Kedem, G. (1978) Computation of the exact likelihood function of
multivariate moving average models. Biometrika, 65, 511–19.
Phillips, P.C.B. (1991) Optimal inference in cointegrated systems. Econometrica, 59,
283–306.
Phillips, P.C.B. (1994) Some exact distribution theory for maximum likelihood estima-
tors of cointegrating coefficients in error correction models. Econometrica, 62, 73–93.
Phillips, P.C.B. and Hansen, B.E. (1990) Statistical inference in instrumental variables
regression with I(1) processes. Review of Economic Studies, 57, 99–125.
Phillips, P.C.B. and Ouliaris, S. (1990) Asymptotic properties of residual based tests of
cointegration. Econometrica, 58, 165–93.
Phillips, P.C.B. and Perron, P. (1988) Testing for a unit root in time series regression.
Biometrika, 75, 335–46.
Podivinsky, J.M. (1993) Small sample properties of tests of linear restrictions on cointe-
grating vectors and their weights. Economics Letters, 39, 13–18.
Reinsel, G.C. and Ahn, S.K. (1992) Vector autoregressive models with unit roots and
reduced rank structure: estimation, likelihood ratio test, and forecasting. Journal of
Time Series Analysis, 13, 353–75.
Reimers, H.-E. (1992) Comparisons of tests for multivariate cointegration. Statistical
Papers, 33, 335–59.
Robinson, P.M. (1994) Semi-parametric analysis of long-memory time series. Annals of
Statistics, 23, 1630–61.
Robinson, P.M. and Marinucci, D. (1998) Semiparametric frequency domain analysis of
fractional cointegration. STICERD discussion paper EM/98/348, London School of
Economics.
Robinson, P.M. and Yajima, Y. (2002) Determination of cointegrating rank in fractional
systems. Journal of Econometrics, 106, 217–41.
Rothenberg, T.J. (1971) Identification in parametric models. Econometrica, 39, 577–91.
Said, S.E. and Dickey, D.A. (1984) Testing for unit roots in autoregressive-moving
average models of unknown order. Biometrika, 71, 599–607.
Saikkonen, P. (1991) Asymptotically efficient estimation of cointegrating regressions.
Econometric Theory, 7, 1–21.
Sargan, J.D. (1964) Wages and prices in the UK: a study in econometric methodology.
In Econometric Analysis for National Economic Planning, P.E. Hart, G. Mills and
J.K. Whitaker (eds). London: Butterworth.
Sargan, J.D. (1975) The identification and estimation of sets of simultaneous stochastic
equations. LSE discussion paper no. A1.
Sargan, J.D. (1982) Alternatives to the Muellbauer method of specifying and estimating
a rational expectations model. Florida University discussion paper 68.
Sargan, J.D. (1983a) Identification and lack of identification. Econometrica, 51, 1605–33.
Sargan, J.D. (1983b) Identification in models with autoregressive errors. In Studies in
Econometrics, Time Series and Multivariate Statistics, S. Karlin, T. Amemiya and L.A.
Goodman (eds). New York: Academic Press, 169–205.
Sargan, J.D. (1988) Lectures on Advanced Econometric Theory. Oxford: Basil Blackwell.
Sargan, J.D. and Bhargava, A. (1983) Testing residuals from least squares regression for
being generated by a Gaussian random walk. Econometrica, 51, 153–74.
Sargent, T.J. (1978) Estimation of dynamic labour demand schedules under rational
expectations. Journal of Political Economy, 86, 1009–44.
Schwert, G.W. (1989) Tests for unit roots: a Monte Carlo investigation. Journal of
Business and Economic Statistics, 7, 147–59.
Sims, C. (1980) Macroeconomics and reality. Econometrica, 48, 1–48.
Spanos, A. (1986) Statistical Foundations of Econometric Modelling. Cambridge: Cambridge
University Press.
Spanos, A. (1994) On modeling heteroskedasticity: the Student’s t and elliptical linear
regression models. Econometric Theory, 10, 286–315.
Spliid, H. (1983) A fast estimation method for the vector auto-regressive moving average
model with exogenous variables. Journal of the American Statistical Association, 78,
843–49.