Modelling Non-Stationary Time Series
A Multivariate Approach

Simon P. Burke and John Hunter


Modelling Non-Stationary Time Series
Palgrave Texts in Econometrics
Series Editor: Kerry Patterson
Titles include:

Simon P. Burke and John Hunter


MODELLING NON-STATIONARY TIME SERIES
Michael P. Clements
EVALUATING ECONOMETRIC FORECASTS OF ECONOMIC AND FINANCIAL
VARIABLES
Terence C. Mills
MODELLING TRENDS AND CYCLES IN ECONOMIC TIME SERIES
Kerry Patterson
UNIT ROOTS IN ECONOMIC TIME SERIES
Jan Podivinsky
MODELLING VOLATILITY

© Simon P. Burke and John Hunter 2005
All rights reserved. No reproduction, copy or transmission of this
publication may be made without written permission.
No paragraph of this publication may be reproduced, copied or transmitted
save with written permission or in accordance with the provisions of the
Copyright, Designs and Patents Act 1988, or under the terms of any licence
permitting limited copying issued by the Copyright Licensing Agency, 90 Tottenham
Court Road, London W1T 4LP.
Any person who does any unauthorized act in relation to this publication
may be liable to criminal prosecution and civil claims for damages.
The authors have asserted their rights to be identified as the authors of this work in
accordance with the Copyright, Designs and
Patents Act 1988.
First published 2005 by
PALGRAVE MACMILLAN
Houndmills, Basingstoke, Hampshire RG21 6XS and
175 Fifth Avenue, New York, N. Y. 10010
Companies and representatives throughout the world
PALGRAVE MACMILLAN is the global academic imprint of the Palgrave
Macmillan division of St. Martin’s Press, LLC and of Palgrave Macmillan Ltd.
Macmillan® is a registered trademark in the United States, United Kingdom
and other countries. Palgrave is a registered trademark in the European
Union and other countries.
ISBN-10: 1–4039–0202–X hardback
ISBN-13: 978–1–4039–0202–3 hardback
ISBN-10: 1–4039–0203–8 paperback
ISBN-13: 978–1–4039–0203–0 paperback
This book is printed on paper suitable for recycling and made from fully managed and
sustained forest sources.
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Burke, Simon P.
Modelling non-stationary economic time series : a multivariate approach / by
Simon P. Burke and John Hunter.
p. cm. – (Palgrave texts in econometrics)
Includes bibliographical references and index.
ISBN 1–4039–0202–X (cloth)–ISBN 1–4039–0203–8 (pbk.)
1. Econometric models. 2. Time-series analysis. I. Title. II. Series.
HB141.B866 2004
330′.01′51955–dc22 2004056896

Printed and bound in Great Britain by
Antony Rowe Ltd, Chippenham and Eastbourne
Contents

Preface

1 Introduction: Cointegration, Economic Equilibrium and the Long Run

2 Properties of Univariate Time Series
2.1 Introduction
2.2 Non-stationarity
2.3 Univariate statistical time series models and non-stationarity
2.4 Testing for non-stationarity in single series
2.5 Conclusion

3 Relationships Between Non-Stationary Time Series
3.1 Introduction
3.2 Equilibrium and equilibrium correction
3.3 Cointegration and equilibrium
3.4 Regression amongst cointegrated variables
3.5 Conclusion

4 Multivariate Time Series Approach to Cointegration
4.1 Introduction
4.2 The VMA, the VAR and VECM
4.3 The Smith–McMillan–Yoo form
4.4 Johansen’s VAR representation of cointegration
4.5 Johansen’s approach to testing for cointegration in systems
4.6 Tests of cointegration in VAR models
4.7 Alternative representations of cointegration
4.8 Conclusion

5 Exogeneity and Identification
5.1 An introduction to exogeneity
5.2 Identification
5.3 Exogeneity and identification
5.4 Empirical examples
5.5 Conclusion

6 Further Topics in the Analysis of Non-Stationary Time Series
6.1 Introduction
6.2 Inference and estimation when series are not I(1)
6.3 Forecasting in cointegrated systems
6.4 Models with short-run dynamics induced by expectations
6.5 Conclusion

7 Conclusions: Limitations, Developments and Alternatives
7.1 Approximation
7.2 Alternative models
7.3 Structural breaks
7.4 Last comments

Notes

Appendix A Matrix Preliminaries
A.1 Elementary Row Operations and Elementary Matrices
A.2 Unimodular Matrices
A.3 Roots of a Matrix Polynomial

Appendix B Matrix Algebra for Engle and Granger (1987) Representation
B.1 Determinant/Adjoint Representation of a Polynomial Matrix
B.2 Expansions of the Determinant and Adjoint about z ∈ [0, 1]
B.3 Drawing out a Factor of z from a Reduced Rank Matrix Polynomial

Appendix C Johansen’s Procedure as a Maximum Likelihood Procedure

Appendix D The Maximum Likelihood Procedure in Terms of Canonical Correlations

Appendix E Distribution Theory
E.1 Some Univariate Theory
E.2 Vector Processes and Cointegration
E.3 Testing the Null Hypothesis of Non-Cointegration
E.4 Testing a Null Hypothesis of Non-zero Rank
E.5 Distribution Theory when there are Deterministic Trends in the Data
E.6 Other Issues

Appendix F Estimation Under General Restrictions

Appendix G Proof of Identification based on an Indirect Solution

Appendix H Generic Identification of Long-run Parameters in Section 5.5

References

Index
Preface

This book deals with an analysis of non-stationary time series that has been
very influential in applied research in econometrics, economics and finance.
The notion that series are non-stationary alters the way in which series are
grouped and may even prove to be relevant to some aspects of regulation and
competition policy when the definition of market becomes an economic issue.
The latter might also apply to any discussion of the nature of globalized
financial markets. In terms of econometric and statistical theory an enormous
literature has grown up to handle the behaviour of the different forms of per-
sistence and non-stationary behaviour that economic and financial data
might exhibit. This is emphasized by the Nobel Prize that has been presented
to Clive Granger and Robert Engle in relation to their extension of our under-
standing of the way in which non-stationary series behave. However, the
requirement to analyze non-stationary behaviour has spawned a wide range of
approaches that relate to and interrelate with the notion that series are non-
stationary and/or cointegrated.
It has been our privilege to be involved in some part in these developments
and to have learned much from our colleagues and teachers alike. We must
acknowledge our debt of gratitude to those who taught us and supervised us
over the years. We would also like to thank participants at various Econo-
metrics Society Conferences, EC2, Econometrics Study Group Conferences
held at the Burwells campus of Bristol University and participants in the
Econometrics workshop for their insights, comments and stimulating research.
Dr. Lindsey Anne Gillan also provided us with some guidance through the
potential minefield that is academic publishing. However, all errors are our
own.

SIMON P. BURKE
JOHN HUNTER

1
Introduction: Cointegration, Economic
Equilibrium and the Long Run

The econometrician or statistician might be viewed as a forensic scientist,
trying to detect from the splatter (of blood) a line through space from which
it may be determined how and by whom a crime was committed. The tools
available to calculate and describe this evidence are estimators and tests,
followed, conditional on the model selected, by identification of the cause or the
perpetrator of the crime.
At the very core of econometrics lies measurement, the quality of measure-
ment and the existence of the measure. When a measure is considered then
there is the practical question of whether measurement is feasible or not.
Conventional statistical measurement and inference considered the behaviour
of processes that are associated with distributions that are generally viewed as
being fixed across the sample. When economists started to apply statistical
measurement to economic data then the notion that the data were identically
and independently distributed (IID) had to be rejected. Regression was used to
measure the heterogeneity by estimating a mean conditional on exogenous
information, while the IID assumption was used to give structure to the
unknown error in the model. Essentially some form of least squares regression became the method
generally applied to explain economic phenomena, but in the early literature
it is hard to find reference to the notion of non-stationarity. One exception is
the book written by Herman Wold with Lars Jureen on the subject of demand
analysis, which does consider the behaviour of stationary economic time
series. However, Wold and Jureen (1953) analyzed data for the inter-war years,
a period when price series fell in relative terms and growth of output was
relatively stagnant. Hence, any question of how demand models might be derived
when time series are non-stationary was, apart from some exceptions, ignored.
It is of interest to note that, in a study of the demand for food, James Tobin
estimated both a logarithmic inverse demand curve and, in an attempt to
remove serial correlation, the same relationship in differences. The latter
equation became the basis of the Rotterdam model developed by Theil (1965)
and Barten (1969). In the early 1970s, Box and Jenkins wrote a book that
became highly influential in the statistical analysis of time series data. Box
and Jenkins set out a methodology for building time series models that first
considers the degree of differencing required to render a series stationary,
then discusses the alternative models, autoregressive (AR), moving average
(MA) or ARMA, that might be used to describe a univariate time series, and
finally considers the method of estimation. Fama (1970) suggests that the
observation that financial time series follow random walks is consistent with
the idea that markets are efficient. The random walk model
implies that financial time series are non-stationary and, following Box and
Jenkins, need to be differenced to make them stationary. The difference in the
log of the share price approximates a return and when the financial market is
efficient then returns are not supposed to be predictable.
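The effect of differencing a random walk can be illustrated numerically. The sketch below is not taken from the text: it simulates a hypothetical log share price as a driftless random walk (the seed, the sample size, the innovation scale and the helper name `lag1_autocorr` are all illustrative assumptions) and compares the lag-one sample autocorrelation of the level with that of its first difference, the approximate return.

```python
import random


def lag1_autocorr(series):
    """Sample autocorrelation at lag one (illustrative helper, not from the text)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev) / n
    gamma1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / n
    return gamma1 / gamma0


random.seed(0)  # arbitrary seed, fixed only for reproducibility

# Hypothetical log share price: p_t = p_{t-1} + e_t, e_t ~ N(0, 0.01^2).
log_price = [0.0]
for _ in range(2000):
    log_price.append(log_price[-1] + random.gauss(0.0, 0.01))

# First difference of the log price, which approximates the return.
returns = [log_price[t] - log_price[t - 1] for t in range(1, len(log_price))]

rho_level = lag1_autocorr(log_price)
rho_return = lag1_autocorr(returns)
print(f"lag-1 autocorrelation, log price: {rho_level:.3f}")
print(f"lag-1 autocorrelation, returns:   {rho_return:.3f}")
```

Under these assumptions the level is almost perfectly correlated with its own past, while the differenced series shows essentially no first-order dependence, which is the sense in which returns in an efficient market are not predictable from their own history.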
The structure of time series models pre-dates Box and Jenkins. Yule (1927)
first estimated AR processes and in 1929 Kolmogorov considered the behav-
iour of sums of independent random variables (see the discussion in Wold and
Jureen (1953)). In the regression context, Sargan (1964) applied an MA error
structure to a dynamic model of UK wage inflation. The Sargan model became
the basis of most of the UK wage equations used in the large macroeconomic
models (Wallis et al. 1984). In demand analysis, approximation rather than
non-stationarity was behind differencing, and developments in economic
theory related to the structure of demand equations were more concerned with
issues of aggregation than with the possible time series structure of the
data (Deaton and Muellbauer 1980). Differencing time series became
common practice in modelling univariate time series and this approach was
also applied in finance where it was common to consider returns of different
assets rather than share prices. The market model relates the return on a share
to the return on the market. There was now a discrepancy between the
methods applied in statistics and finance to time series data and the approach
predominantly used by economists.
However, the first oil shock precipitated a crisis in macroeconomic model
building. Most of the world’s large macroeconomic models were unable to
resolve many of the problems that ensued from this shock. Forecasts and
policy simulations that provide the governments’ predictions of the future
and a practical tool for understanding the impact of policy on the economy
were unable to explain what had happened and what policies might remedy
the situation (Wallis et al. 1984). The UK Treasury’s inability to forecast the
balance of payments position led to the ludicrous situation of a developed
economy being forced to borrow from the IMF – a remedy that would not
have been sought had reasonable estimates been available of the true pay-
ments position. The whole approach to the econometric modelling of eco-
nomic time series was in doubt.

Econometric modelling was criticized on three grounds: the specification
of the models used, their forecast accuracy and their existence. The model
building approach adopted at the London School of Economics (LSE) built on
the methodology developed by Sargan (1964). The Sargan approach attempted
to combine the lessons of conventional time series modelling by applying the
difference operator to the dependent variable with the practical requirement
of the economist that the model could be solved back to reveal relationships
from which the levels of the data might be forecast. The LSE approach implied
that economic time series were dynamic and best modelled as regressions that
included an appropriate description of the dynamic process underlying the
data. The approach reinforced the proposition that a valid regression was
required to satisfy the Gauss–Markov conditions (Patterson 2000) and that
any regression models estimated ought to be well specified. This became what
has been called the Hendry methodology and in the UK and Europe this
approach has provided a potent mechanism to generate reasonable approx-
imations to many aggregate economic time series. In particular, the articles by
Davidson et al. (1978) and Hendry and Mizon (1978) expound a single equation
modelling methodology for consumption and money. Davidson et al.
(1978) emphasize that correct specification follows from estimating general
autoregressive distributed lag (ADL) models, that the dynamic model
explains the short-run behaviour of the stationary form of the data in
differences, that any levels variables explain the long run and that the long run is
associated with conventional economic theory. Hendry and Richard (1982,
1983) elaborated on these ideas further by explaining what an adequate
approximation of the data is and how systems models are sequentially
reduced into valid sub-models. The final important development that came
out of this approach was the categorization of exogeneity into strict, weak,
strong and super. As far as inference and the estimation of single equation
regression models is concerned, weak exogeneity justified the use of contem-
poraneous variables such as income in consumption and money equations.
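The link between an ADL model in levels and a model in differences with a levels term can be checked with a few lines of arithmetic. The sketch below uses the simplest case, an ADL(1,1), purely as an illustration; it is not the specification estimated by Davidson et al. (1978), and the coefficient names `a0`, `a1`, `b0`, `b1` and the numerical values are hypothetical.

```python
def adl_fitted(y_lag, x, x_lag, a0, a1, b0, b1):
    """ADL(1,1) in levels: y_t = a0 + a1*y_{t-1} + b0*x_t + b1*x_{t-1}."""
    return a0 + a1 * y_lag + b0 * x + b1 * x_lag


def ecm_fitted(y_lag, x, x_lag, a0, a1, b0, b1):
    """The same model rewritten in equilibrium correction form.

    dy_t = a0 + b0*dx_t - (1 - a1)*(y_{t-1} - theta*x_{t-1}),
    where theta = (b0 + b1)/(1 - a1) is the long-run coefficient.
    """
    theta = (b0 + b1) / (1.0 - a1)   # long-run response of y to x
    adjustment = 1.0 - a1            # speed of adjustment towards the long run
    dy = a0 + b0 * (x - x_lag) - adjustment * (y_lag - theta * x_lag)
    return y_lag + dy


# Hypothetical coefficients with |a1| < 1, so that the long run is defined.
params = dict(a0=0.2, a1=0.7, b0=0.25, b1=0.05)
levels = adl_fitted(y_lag=10.0, x=12.0, x_lag=11.5, **params)
correction = ecm_fitted(y_lag=10.0, x=12.0, x_lag=11.5, **params)
long_run = (params["b0"] + params["b1"]) / (1.0 - params["a1"])
print(levels, correction, long_run)
```

The two functions return the same value for any inputs because the second is only a re-parameterization of the first: the differenced terms carry the short-run dynamics and the lagged levels term carries the long-run relationship.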
The LSE approach provided model builders with a methodology for estimating
single equations by regression. Poor forecast performance was viewed as a
sign of a poorly performing model and thus as correctable by
valid model selection. In the US the failure of econometric model building
was viewed as a failure of economic theory. Forecasts based on large macro
models broke down because the postwar Keynesian consensus had broken
down and the basis of failure was neoclassical monetary neutrality combined
with hyper-rational agent behaviour. The Lucas critique suggested that
the conventional macro models were unable to capture changes in agent
responses to government policy, the deep parameters of the economic system.
Models based on classical assumptions purported to show that monetary
policy was not effective, while the notion that macroeconomic time series fol-
lowed random walks was embedded in the article by Robert Hall (1978),
which showed that consumption followed a random walk.¹ In 1978 Sargent
derived dynamic models based on rational expectations, which impose
theoretical propositions about the underlying behaviour of agents on the
short-run behaviour of the data. However, Sargent explicitly requires that the series
are stationary for the solution to exist.² The literature derived from the
neoclassical rational expectations solution to macro modelling has adopted two
approaches to the problem of model specification. The first is to build
dynamic models with data that are differenced and then to solve the
expectations problem; the second is to estimate the models using an unrestricted
vector autoregressive (VAR) model. The former approach often uses the Generalized
Method of Moments to estimate the Euler equation via the errors in variables approach
best explained by Wickens (1982). Sims (1980), by contrast, proposed the VAR
methodology: first differencing the data to render it stationary and then
estimating economic behaviour by systems of autoregressive models, which
treats all the variables modelled as endogenous. Policy invariance is tested
by looking at impulse responses and causal structure, rather than by deriving
structural models.³
The LSE methodology assumed that long-run relationships existed and that
conventional inference was valid irrespective of whether series are stationary
or not. The rational expectations literature that transformed the data into dif-
ferences risked the possibility that there may be over-differencing. Both
approaches understood that time series modelling required dynamic models,
the former assuming that conventional economic theory can be detected in
terms of long-run relationships from the data, the latter approach that it
cannot be. The idea that a correlation is not valid is best explained in Yule
(1926) who considers a number of correlations that can only be viewed as
nonsense. In particular, Yule found that the fall in Church of England mar-
riages was positively correlated with the fall in the death rate between 1861
and 1913. This idea of nonsense correlation along with many of the problems
associated with econometric modelling, including the appropriate measurement
of expectations, was discussed by Keynes (1939).⁴ Keynes emphasizes the
role of economics in statistical model building and explains that economists
need to be looking at true causes as compared with correlations that derive
from the dependence of variables on an underlying primary cause. In 1974
Granger and Newbold presented simulation results for nonsense regressions –
relationships that are observed to be correlated, but cannot be. Granger and
Newbold (1986) describe how univariate and multivariate economic time
series ought to be modelled. Simulations presented in Granger and Newbold
(1974, 1986) show that it is possible to run regressions on unrelated data and
find significant relationships where there should be none. The 1974 article
suggests that the discovery of an R² that exceeds the Durbin–Watson (DW)
statistic ought to be indicative of the problem as then the DW statistic has to
be less than one and as a result the model must suffer from significant serial
correlation. The article appears to emphasize that badly misspecified models
should be viewed with deep suspicion, because they may reveal relationships
that are spurious. It is apparent that the econometrics profession had adopted
this research agenda by building on one side of the Atlantic ADL models and
on the other VARs in differences. However, the results associated with Granger
and Newbold (1986) were somewhat subtler, in that when the data were gen-
erated via random walks with MA errors, spurious regressions could be
observed with DW statistics in excess of one. Hence, the question of what
determines a true regression relationship is further complicated by the exis-
tence of more complex explanations of individual time series.
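A small Monte Carlo experiment in the spirit of the nonsense regression simulations can make the problem concrete. The sketch below is an illustration only and does not reproduce the design of Granger and Newbold: the sample size, number of replications, seed, the rule of thumb |t| > 2 and the helper names are all assumptions. It regresses one driftless random walk on another, independent one, and repeats the regression in first differences.

```python
import random


def ols_summary(y, x):
    """Regress y on a constant and x; return the slope t-ratio, R-squared and DW."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((v - my) ** 2 for v in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    rss = sum(e * e for e in resid)
    t_ratio = slope / ((rss / (n - 2)) / sxx) ** 0.5
    r2 = 1.0 - rss / syy
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / rss
    return t_ratio, r2, dw


def random_walk(n, rng):
    """Driftless random walk with standard normal increments."""
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First difference of a series."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


rng = random.Random(12345)  # arbitrary seed
reps, n = 200, 100          # illustrative design, not taken from the text
reject_levels = reject_diffs = r2_above_dw = 0
for _ in range(reps):
    y, x = random_walk(n, rng), random_walk(n, rng)  # unrelated by construction
    t_lev, r2_lev, dw_lev = ols_summary(y, x)
    t_dif, _, _ = ols_summary(diff(y), diff(x))
    reject_levels += abs(t_lev) > 2.0
    reject_diffs += abs(t_dif) > 2.0
    r2_above_dw += r2_lev > dw_lev

print("share with |t| > 2, levels:     ", reject_levels / reps)
print("share with |t| > 2, differences:", reject_diffs / reps)
print("share with R2 > DW, levels:     ", r2_above_dw / reps)
```

In this design the levels regressions appear significant far more often than the nominal rate would suggest, whereas the regressions in differences do not, and the comparison of R² with the DW statistic can be read off the last line of output.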
This book considers methods by which it can be determined whether time
series are stationary or non-stationary in differences, difference stationary or
trend stationary or rendered stationary by subtracting from the non-stationary
series some part of another series. The latter case is the cointegration case,
which occurs when two or more series combine to produce stationary vari-
ables and a conventional regression equation between these variables has eco-
nomic meaning in a long-run sense. This notion of cointegration is then
developed in the context of multiple time series. A conclusion for the VAR
methodology in differences is that when long-run behaviour exists, in terms
of stationary combinations of variables in levels, the VAR is fundamentally
misspecified. However, the generalization of the ADL to a system can, under
the restrictions associated with cointegration, provide a short-run explanation
of the data, with long-run behaviour explained by restrictions on the levels in
each equation.
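The idea that two non-stationary series can combine to give a stationary variable can be sketched with simulated data. Everything in the example below is a hypothetical construction for illustration: the long-run coefficient `beta`, the AR(1) coefficient of the deviation, the seed and the helper `lag1_autocorr` are assumptions, not quantities from the text.

```python
import random


def lag1_autocorr(series):
    """Sample autocorrelation at lag one (illustrative helper)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev) / n
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / n / gamma0


rng = random.Random(7)  # arbitrary seed
n = 2000
beta = 2.0              # hypothetical long-run coefficient

x, y = [], []
level, u = 0.0, 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)       # x_t: random walk, non-stationary
    u = 0.5 * u + rng.gauss(0.0, 1.0)  # u_t: stationary AR(1) deviation
    x.append(level)
    y.append(beta * level + u)         # y_t shares the stochastic trend of x_t

# Equilibrium error: the combination of the levels that removes the trend.
z = [b - beta * a for a, b in zip(x, y)]

print("lag-1 autocorrelation of x:", round(lag1_autocorr(x), 3))
print("lag-1 autocorrelation of y:", round(lag1_autocorr(y), 3))
print("lag-1 autocorrelation of y - beta*x:", round(lag1_autocorr(z), 3))
```

Each series on its own behaves like a random walk, but the combination y − βx inherits only the stationary deviation. A model specified purely in differences of x and y would discard this levels information.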
In chapter 2, the characteristics of economic and financial time series are
considered. The properties of the variance, covariance and autocovariance of
stationary and non-stationary time series are defined, in addition to the
alternative definitions of stationarity. Time series models are defined for both
their stationary and non-stationary representations. The statistical properties
of the error are defined in terms of white noise residuals and the Wold decom-
position. Non-invertibility, random walks and alternative notions of persis-
tence are dealt with, as, before time series are modelled, they ought to be
stationary. The proposition that a series is stationary needs to be tested and
the data transformed to take account of non-stationarity or persistence.
Having decided on the stationary form of the data, a time series model can be
identified and estimated. Much of the existing literature handles persistence
by first or second differencing data. The former is often appropriate for real
variables such as output or employment, while second differences might often
be required for nominal variables in economic models, GDP, sales and retail
prices or in finance, share prices, stock indices and dividends. Otherwise,
fractional differencing might be required, with the resulting models being special
cases of the autoregressive fractionally integrated moving average (ARFIMA)
model.
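First, second and fractional differencing can all be viewed as applications of the filter (1 − L)^d, where L is the lag operator. The sketch below computes the filter coefficients from the standard binomial recursion w_0 = 1, w_k = w_{k−1}(k − 1 − d)/k. The function names, the truncation length and the fractional order 0.5 are illustrative assumptions rather than anything specified in the text.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients on L^k, k = 0..n_weights-1, in the expansion of (1 - L)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def apply_filter(series, weights):
    """Apply the truncated filter; early observations use only the lags available."""
    return [
        sum(w * series[t - k] for k, w in enumerate(weights) if t - k >= 0)
        for t in range(len(series))
    ]


print(frac_diff_weights(1, 4))    # first difference
print(frac_diff_weights(2, 4))    # second difference
print(frac_diff_weights(0.5, 4))  # a fractional order, chosen for illustration
print(apply_filter([1.0, 3.0, 6.0, 10.0], frac_diff_weights(1, 4)))
```

For integer d the weights vanish after d lags, which recovers ordinary differencing. For fractional d they decay slowly and never become exactly zero, which is why a truncation length has to be chosen in practice.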
In chapter 3, modelling non-stationary time series is handled in a single
equation framework. When more than one series is analyzed, differencing
might be more than is required. This occurs when series in combination are
stationary (cointegration). Non-integer differencing is often required, in the
case of series such as interest rates. Single equation models, which incorporate
some different right-hand side variables in levels, are classified as error correc-
tion models. When the original data or their logarithms are non-stationary,
cointegration may be observed when linear combinations of two or more
levels variables are stationary. Then cointegration is valid when the relationships
are bivariate or there is one cointegrating relationship in a system.
When the regressors are exogenous, in a univariate time series context, the
regressions can be viewed as ARMAX or ARMA models with exogenous
variables.
In chapter 4, the multivariate time series model is developed from a stationary
representation of the data that is known always to exist, the vector moving
average (VMA) model in differences. The book explains the nature of multivariate time
series under stationarity and then extends this to the cointegration case. We
then explain how the VMA in differences can be transformed into an error
correction model using the Granger representation theorem and the Smith–
McMillan form developed by Yoo (1986). Cointegration is then described in
terms of error correcting VARs or VECMs. A procedure for determining the
existence of the VAR is described along with the Johansen approach to estima-
tion and inference. The book explains the asymptotic theory that lies behind
the Johansen test statistic. An application is developed based on the models of
the UK effective exchange rate estimated by Hunter (1992), Johansen and
Juselius (1992) and Hunter and Simpson (1995). Finally a number of alterna-
tive representations are developed and the question of multi-cointegration
discussed.
In chapter 5, the exogeneity of variables in the VAR and the identification
of long-run parameters are considered. Exogeneity is discussed in terms of the
restrictions required for weak, strict and cointegrating exogeneity in the long
run. Then alternative forms of exogeneity and causality are considered and
the results associated with Hunter (1992) and Hunter and Simpson (1995) are
presented. Identification is discussed in terms of conventional systems with
I(0) series; this approach is extended to show when the parameters can be
identified by imposing the restrictions and solving out for the long-run
parameters and their loadings. Identification is then discussed in terms of the
results derived by Bauwens and Hunter (2000), Johansen (1995) and Boswijk
(1996). All three approaches are applied to the model estimated by Hunter
(1992).
In chapter 6, more advanced topics are considered in some detail. Firstly,
the I(2) case is considered, initially using an extension to the Sargan–Bézout approach
adopted by Hunter (1994), then in terms of the representation and test due to
Johansen (1992) and Paruolo (1996); finally, the test procedures due to
Johansen and Paruolo are applied to the exchange rate data in Hunter (1992).
Fractional cointegration is briefly discussed in terms of the estimator due to
Robinson and Marinucci (1998) and the test due to Robinson and Yajima
(2002). Secondly, forecasting of non-stationary and stationary components is
considered. The results produced by Lin and Tsay (1996) and Clements and
Hendry (1995, 1998) are presented with a graphical analysis of the perfor-
mance of the simulations developed by Lin and Tsay (1996). Finally, models
with short-run structural equations are discussed – in particular, models with
unit roots in the endogenous and exogenous processes. It is shown how to
estimate models where the unit roots relate to the endogenous variables and
then models where they relate to the exogenous variables.
In chapter 7, the reader is guided to further issues in the literature. Firstly, a
plethora of articles on testing stationarity and non-stationarity has developed;
the reader is directed where appropriate to the book by Patterson (2005). A
condensed discussion of structural breaks is provided along with direction to
appropriate references.
2
Properties of Univariate Time Series

2.1 Introduction

This chapter introduces a number of concepts in the analysis of univariate
time series that are important for an understanding of non-stationarity in the
multivariate case. The fundamental building block is the autocorrelation
structure of a time series. This describes the way in which current and past
values of a time series are related to one another. Capturing the main charac-
teristics of these relationships can be thought of as the primary task of a time
series model: to provide theoretical structures the properties of which closely
approximate those of observed time series, and to provide estimates of such
models using specific time series that can be used to draw inferences about
other aspects of behaviour.
Linear models designed to capture the leading properties of autocorrelation
structures, namely the autoregressive and moving average models, define a set
of structures for which generic concepts, especially non-stationarity, have very
specific but simply stated implications. The discussion below begins with
autocorrelation and non-stationarity in fairly general terms. It moves
on to describe how these properties can be reasonably approximated by
univariate autoregressive moving average models, and lastly to how they can be
used to test for a limited form of non-stationarity. The treatment throughout
is univariate.

2.2 Non-stationarity

2.2.1 Time series structure: autocorrelation


There are various aspects to the idea of stationarity and so to non-stationarity.
A general definition may be very difficult to exploit in practice. A practical
definition has to be precise, but will be more prescriptive, dealing with a
limited set of situations relevant to the problem at hand. The characteristics of
the set of problems dealt with in this book relate to the fact that the data are
time series and that it is the temporal dependence between elements of these
series that is of concern. Furthermore, the dependence will be considered at a
relatively simple level: that of covariance. This last point does not matter if
the distribution being used is the normal (or Gaussian) distribution, since this
distribution is characterized entirely by its mean and variance and covariance.

Figure 2.1 UK annual rate of growth of real output, quarterly, 1963Q1–1993Q4, T = 84

Consider Figure 2.1. This shows the time series plot of the annual rate of
growth of UK real output from 1963 to 1993. Its characteristics are that it
varies around a more or less fixed level, that it does not drift away from this
level for any great length of time, and that higher values at some point in
time tend to be followed by other high values, or at least that changes from
the high values are often smooth. The same applies to low values, which are followed
by low values or change relatively smoothly.¹ The controlled variability
around a fixed level is a manifestation of stationarity. The relationship
between neighbouring values can be described by autocorrelation – literally,
the quantification of the correlation between values in the time series sep-
arated by fixed periods of time. A type of stationarity can be defined in terms
of the autocorrelation and mean of a time series. This is a restricted but very
useful and practical definition.
In theory, the individual observations comprising the time series are
thought of as realizations of underlying random variables. The autocorrelation
of a time series is defined in terms of these underlying random variables as
follows. Let $X_t$, $t = 1, 2, \ldots$, be a sequence of scalar random variables, one for
each equally spaced point in time, $t$, but otherwise referring to the same
random variable, $X$. Such a sequence may (loosely) be called a stochastic
process.² Let $E(\cdot)$ be the expectation operator.

2.2.1.1 Autocovariance and autocorrelation


The autocovariance between two random variables at different points in time
is their covariance, and is given by

$$\gamma_x(j) = E[(X_t - E(X_t))(X_{t-j} - E(X_{t-j}))], \qquad j = \dots, -2, -1, 0, 1, 2, \dots \tag{2.1}$$

The autocorrelation is the correlation between the two random variables.
Noting that the variance of the process is given by

$$\mathrm{Var}(X_t) = E[(X_t - E(X_t))^2] = \gamma_x(0),$$

the autocorrelation is given by

$$\rho_x(j) = \frac{\gamma_x(j)}{\gamma_x(0)}, \qquad j = \dots, -2, -1, 0, 1, 2, \dots \tag{2.2}$$

Being a correlation, it follows that

$$-1 \le \rho_x(j) \le 1,$$

making it a useful basis on which to compare time series.
The sequences of autocovariances and autocorrelations obtained as j, the
time gap between the random variables, changes are often referred to as functions.
That is, (2.1) is called the autocovariance function and (2.2) the
autocorrelation function (abbreviated to ACF).

2.2.2 Stationarity
The definitions of autocovariance and autocorrelation have been written to
indicate that they depend only on the time gap, not the point in time. That is,
for example, considering two different points in time, t and t – j,

$$E[(X_t - E(X_t))(X_{t-j} - E(X_{t-j}))] = \gamma_x(j)$$

and

$$E[(X_\tau - E(X_\tau))(X_{\tau-j} - E(X_{\tau-j}))] = \gamma_x(j)$$

even though $t \neq \tau$. But the time gap, j, is the same, so they have the same
autocovariance. This is an assumption consisting of two components.
It is assumed that the expected value, or mean, of the time series does not
change over time, so that for any $t \neq \tau$,

$$E(X_t) = E(X_\tau). \tag{2.3a}$$

It is also assumed that, given the mean is constant, the autocovariance
between equally separated random variables does not change. As a special case
of this last assumption, the variance does not alter over time so that

$$\gamma_x(0) = E[(X_t - E(X_t))^2] = E[(X_\tau - E(X_\tau))^2] \tag{2.3b}$$

from which it follows that the autocorrelations depend only on the time gap,
not on the time itself. The assumption that these quantities remain fixed over
time is a fundamental aspect of stationarity, and goes most of the way to

providing a practical definition of stationarity for the purposes of time series
analysis.
The expectation of the process, such as in equation (2.3a), is referred to as
the first moment, the (co)variance as the second moment (about the mean).
Thus, in equations (2.3a) and (2.3b), it has been assumed that the first two
moments of the process are constant over time. Some definitions stop at this
point and use the constancy of these moments to define covariance stationar-
ity.3 The definition used here will add one clarification: that these moments
must be finite. (This is only a clarification because the definition obviously
requires the moments to exist, but if infinite, they do not exist.) This is a
common addition, see for example Banerjee et al. (1993, p. 11). The definition
of covariance stationarity used in this book is at the beginning of the next
section.

2.2.2.1 Covariance stationarity4


The sequence of random variables $X_t$, t = 1, 2, …, is said to be covariance
stationary if, for all t,

$$\begin{aligned}
E(X_t) &= \mu, \quad \mu < \infty,\\
\mathrm{Var}(X_t) &= \sigma^2 < \infty,\\
E[(X_t - E(X_t))(X_{t-j} - E(X_{t-j}))] &= \gamma_x(j), \quad \gamma_x(j) < \infty.
\end{aligned}$$

Under the assumption of covariance stationarity it is meaningful to estimate
the autocorrelation function in the following way. Let $x_t$, t = 1, 2, …, T be
the observations on a time series. Then the autocovariance function may be
estimated as

$$\hat{\gamma}_x(j) = \frac{\sum_{t=j+1}^{T} (x_t - \bar{x})(x_{t-j} - \bar{x})}{T - j}$$

and the ACF as

$$\hat{\rho}_x(j) = \frac{\hat{\gamma}_x(j)}{\hat{\gamma}_x(0)} \tag{2.4}$$

where $\bar{x}$ is the sample mean, $\bar{x} = \sum_{t=1}^{T} x_t / T$. Equation (2.4) is
referred to as the sample ACF. The sample ACF for the UK output growth data is
presented in Figure 2.2.
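
The estimator in (2.4) can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the toy series are hypothetical (they are not the UK output growth data), and the divisor T − j follows the formula as printed above.

```python
def sample_autocovariance(x, j):
    """Estimate gamma_x(j): cross-products about the overall sample mean,
    summed over t = j+1, ..., T and divided by T - j."""
    T = len(x)
    if not 0 <= j < T:
        raise ValueError("lag j must satisfy 0 <= j < T")
    mean = sum(x) / T
    total = sum((x[t] - mean) * (x[t - j] - mean) for t in range(j, T))
    return total / (T - j)


def sample_acf(x, max_lag):
    """Sample ACF as in (2.4): autocovariance at each lag over that at lag 0."""
    gamma0 = sample_autocovariance(x, 0)
    return [sample_autocovariance(x, j) / gamma0 for j in range(max_lag + 1)]


# Hypothetical toy series with a period of four observations.
series = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
acf = sample_acf(series, 3)
```

With the divisor T − j the estimates are not guaranteed to lie between −1 and 1 at longer lags; dividing by T instead is a common alternative that avoids this.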
Figure 2.2 has two leading characteristics. The sample autocorrelations
damp off over time, that is they decline towards zero as the time gap, or lag
(j), gets larger. There is a degree of oscillation, so that the autocorrelations
start off positive, then decline to zero, but go through zero before returning to

Figure 2.2 Sample ACF for rate of growth of UK real output, quarterly, 1963–1993

this level. Of course, zero indicates an absence of association between the
values concerned.5 So, in this case, after the gap between observations reaches
20 (5 years) there is no discernible relationship between values. If confidence
limits can be added to the sample ACF, it may well turn out that any statist-
ically significant relationship dies out earlier than 5 years.
The damping off of the sample ACF is an empirical characteristic of covari-
ance stationary time series. In the case of Figure 2.1, it suggests that, while
there is a relationship between temporally close values (say up to gaps of one
year), values separated by a greater length of time are not much correlated. As
a special case of this, the values of the time series observations are not
dependent on the initial value, x1, because if this was so, autocorrelations at
very large time gaps would remain high.
Of course, the sample ACF can be calculated using equation (2.4) whether
or not the conditions for covariance stationarity apply. Consider the $/£
exchange rate data plotted in Figure 2.3. The characteristics of this time series
plot are markedly different from those of Figure 2.1. Apart from the fact that

Figure 2.3 Daily $/£ exchange rate, January 1985–July 1993, T = 2168

Figure 2.4 Sample of ACF of $/£ exchange rate data

the line is fuzzier, caused by the fact that a great many more observations are
being plotted to the same real length of horizontal axis, which is a matter of
scaling only, the series is seen to wander away from its starting point, to such
an extent that it is difficult to argue that it appears to be varying around a
fixed level. If it isn’t varying about a fixed level (that is, there doesn’t seem to
be a fixed mean), then it is difficult to see how the variances or covariances
might be behaving. It seems that they must also be varying with time,
although care should be taken, since it is quite possible to imagine a series
that varies to a constant degree around a mean that is changing.6 However, in
this case it is difficult to discern what that mean could be. The sample ACF for
this series is given in Figure 2.4. In contrast to the ACF for the growth data,
this declines linearly, and has not reached zero, even by the 100th lag
($\hat{\rho}(100) = 0.53521$). This series appears to have very long memory, in terms of lags. Its
sample ACF does not look like it is damping off at all. This is not consistent
with the idea of covariance stationarity and suggests that the calculations may

Figure 2.5 Moving window sample variance estimates of the $/£ exchange rate data,
window length 100

indeed be meaningless. It seems likely that the exchange rate series is not
covariance stationary.
To emphasize the point, Figure 2.5 plots the moving window sample vari-
ance estimates of the $/£ exchange rate series, computing the sample variance
for observations 1–100, followed by that for observations 2–101, and so on.
From this it is clear that the variance around the mean does not remain con-
stant even when the mean itself is allowed to vary across windows.
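
The moving window calculation can be sketched as follows. This is an illustration under stated assumptions: a simulated random walk stands in for the exchange rate data (which are not reproduced here), the function name is hypothetical, and the divisor (window − 1) is one common choice that the text does not specify.

```python
import random


def moving_window_variance(x, window):
    """Sample variance of each run of `window` consecutive observations,
    each taken about its own window mean (divisor window - 1)."""
    out = []
    for start in range(len(x) - window + 1):
        chunk = x[start:start + window]
        mean = sum(chunk) / window
        out.append(sum((v - mean) ** 2 for v in chunk) / (window - 1))
    return out


random.seed(0)  # arbitrary seed, for reproducibility only
walk, level = [], 0.0
for _ in range(500):
    level += random.gauss(0.0, 1.0)  # accumulate N(0, 1) shocks
    walk.append(level)

# One estimate for observations 1-100, one for 2-101, and so on.
variances = moving_window_variance(walk, 100)
```

Plotting `variances` against the window start gives the analogue of Figure 2.5 for the simulated series.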

2.2.2.2 Strict stationarity


Covariance stationarity is a useful but rather specific version of stationarity.
It is useful because it relates only to the first two moments, and because it
can be defined precisely in terms of the parameters of the commonly used
autoregressive-moving average (ARMA) time series models, as well as their
multivariate counterpart, the vector autoregressive (VAR) model. Furthermore,
if the distribution of the random variables is normal, then it is only necessary
to consider the first two moments.
However, a more general definition is available, and is expressed in terms of
the joint (entire) distribution of the set of random variables underlying the time
series observations. Suppose there are T time series observations, $x_t$, t = 1, 2, …, T.
Consider a subsample of n of these, $x_t$, $t = \tau + 1, \tau + 2, \dots, \tau + n$. Each of these is
thought of as a realization of an underlying random variable, $X_t$,
$t = \tau + 1, \tau + 2, \dots, \tau + n$. If the joint distribution of these n random variables remains
unchanged through time, then the time series is said to be strictly stationary.7

2.2.3 Strict (joint distribution) stationarity


Let F(·) be the joint distribution function of $X_t$, $t = \tau + 1, \tau + 2, \dots, \tau + n$,
written as $F(X_{\tau+1}, \dots, X_{\tau+n})$. Then if

$$F(X_{\tau+1}, \dots, X_{\tau+n}) = F(X_{\tau+h+1}, \dots, X_{\tau+h+n}), \qquad h \ge 0 \tag{2.5}$$

then the process generating the time series observations is said to be strictly
stationary.
Equation (2.5) simply states that the joint distribution of the sequence of
random variables is unchanged when considering the distribution any number
of periods earlier or later. In the case of covariance stationarity, it is not the dis-
tribution as a whole that is considered, but only its first two moments, the
mean, and the (co)variances. This is clearly a weaker requirement.
It can be seen that strict stationarity, while having appeal from a philosophical
point of view, is very demanding and so impracticable. In common with
most textbook treatments and with econometric research, applied and theoretical,
this book will adopt covariance stationarity as its definition of stationarity, and,
unless otherwise stated, stationarity will mean covariance stationarity. In
addition, a common – though not universal – assumption of time series
models is of normality, in which case the two definitions are coincident.

2.3 Univariate statistical time series models and non-stationarity

2.3.1 Describing covariance non-stationarity: parametric models


Covariance non-stationarity is an observable feature of a time series, as seen
from Figures 2.3–2.5. Non-stationarity is suggested by the failure of the sample
autocorrelations of a time series to damp off over time, or by the wandering of
a time series away from its starting (initial) value with a tendency not to return
to it. These properties relate to the dynamic properties of the series rather
than to the joint distribution of observations.
Such properties can be captured by very simple models of the series that
relate the current value of a series to its past values and to the current and past
values of a largely structureless stochastic component. That is, it is possible to
invent theoretical models of the underlying random variables that would
produce realizations whose sample properties approximate those observed in
actual data.
It is important to realize at this early stage, that what is going on here is the
approximation of the underlying process generating the data. This process is
known as the data generating process (DGP). It can be expected to be highly
complex, and incapable of exact description.8 No model will be exact, not
simply in terms of the parameter values chosen, but also in terms of the basic
form of the model used. This having been said, models can often capture key
features of data relevant for the purposes at hand. The key feature of interest
in this case is covariance non-stationarity.

2.3.2 The white noise process


The building block of the time series models considered here is a stochastic
process with simplest possible structure, having no temporal dependence and
constant moments over time. Typically, the moments described are only the
first two, though there is no reason why this should not be extended to cover
all moments. Put in terms of stationarity, the process is a zero mean stationary
process. It is called a white noise process.

2.3.2.1 White noise


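Let $\varepsilon_t$, t = 1, 2, …, be a sequence of random variables. Then if

$$E(\varepsilon_t) = 0 \tag{2.6a}$$
$$\mathrm{Var}(\varepsilon_t) = \sigma^2 \quad \forall t \tag{2.6b}$$
$$\gamma_\varepsilon(j) = 0 \quad \forall j \neq 0 \tag{2.6c}$$
$$(E(\varepsilon_t \varepsilon_{t-j}) = 0, \quad \forall j \neq 0) \tag{2.6d}$$

the sequence is said to be white noise9 and the symbol ∀ means ‘for all’ or ‘for
any’.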

Figure 2.6 Realizations of a NIID(0,1) white noise sequence

Equations (2.6a–2.6d) state that the process has zero mean, constant variance,
and that different members of the sequence are uncorrelated. In addition,
there will often be a distributional assumption, which is that the random
variables are normally distributed. Since, under normality, non-correlation is
equivalent to independence, the sequence is then described as normally
independently identically distributed (NIID) with a mean of zero and variance $\sigma^2$.
In short, $\varepsilon_t \sim \mathrm{NIID}(0, \sigma^2)$. Realizations of an NIID(0, 1) sequence are provided
in Figure 2.6, with the time index labelled as though for the same time period
and frequency of data as Figure 2.1 for the growth in output data.

2.3.3 The moving average process


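It is easy to construct a correlated sequence from white noise by forming
linear combinations of different members of the white noise sequence. For
example, define

$$a_t = \varepsilon_t - \tfrac{1}{2}\varepsilon_{t-1}.$$

Then clearly there is some temporal structure to the $a_t$, that is they are
autocorrelated. Note that the mean of the process is given by

$$E(a_t) = E(\varepsilon_t) - \tfrac{1}{2}E(\varepsilon_{t-1}) = 0; \tag{2.7a}$$

the variance by

$$\mathrm{Var}(a_t) = \mathrm{Var}(\varepsilon_t) + \tfrac{1}{4}E(\varepsilon_{t-1}^2) - 2 \cdot \tfrac{1}{2}\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1}), \tag{2.7b}$$

where, because $\varepsilon_t$ is white noise, the covariance between $\varepsilon_t$ and $\varepsilon_{t-1}$,
$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1})$, is zero, so this becomes

$$\mathrm{Var}(a_t) = \mathrm{Var}(\varepsilon_t) + \tfrac{1}{4}E(\varepsilon_{t-1}^2) = \tfrac{5}{4}\sigma^2; \tag{2.7c}$$

and the autocovariance for j ≠ 0 by

$$\begin{aligned}
\gamma_a(j) &= E(a_t a_{t-j})\\
&= E\big((\varepsilon_t - \tfrac{1}{2}\varepsilon_{t-1})(\varepsilon_{t-j} - \tfrac{1}{2}\varepsilon_{t-j-1})\big)\\
&= E(\varepsilon_t \varepsilon_{t-j}) - \tfrac{1}{2}E(\varepsilon_{t-1}\varepsilon_{t-j}) - \tfrac{1}{2}E(\varepsilon_t \varepsilon_{t-j-1}) + \tfrac{1}{4}E(\varepsilon_{t-1}\varepsilon_{t-j-1})\\
&= \begin{cases} -\tfrac{1}{2}\sigma^2 & \text{for } j = 1\\ 0 & \text{for } j > 1 \end{cases}
\end{aligned}$$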

since the expectation terms in the last expression will be zero if the time index
on the random variables is not the same, because the white noise series is
uncorrelated; if the index is the same then the expectation is the expectation
of a square of a zero mean process, and so is its variance, $\sigma^2$. So the process is
autocorrelated as far as but not beyond the first lag, and this is because it is a
function of the current white noise term and its previous value.
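It is possible to build more general models of this type. Let $\theta_i$, i = 1, 2, …, q
be constant coefficients and define

$$a_t = \varepsilon_t - \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \tag{2.8}$$

Then

$$E(a_t) = 0, \tag{2.9a}$$

$$\mathrm{Var}(a_t) = \left(1 + \sum_{i=1}^{q} \theta_i^2\right)\sigma^2, \tag{2.9b}$$

$$\gamma_a(j) = \begin{cases} (-\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \dots + \theta_q\theta_{q-j})\sigma^2 & \text{for } j = 1, 2, \dots, q\\ 0 & \text{otherwise.} \end{cases} \tag{2.9c}$$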
These equations show that the mean and variance are fixed, that the
autocovariances depend only on the time gap and not on the time itself, and that all
moments are finite as long as the parameters are. The process is therefore
stationary, but has an autocorrelation structure that cuts off after q lags.
However, since there are q parameters in the model, these values may be
chosen so as to reproduce any desired sequence of q autocovariances, and
hence any ACF cutting off after lag q.
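
A short sketch of how the MA(q) autocovariances can be computed follows. The function name is hypothetical, and rather than coding (2.9c) term by term it uses the equivalent sum of products of the lag coefficients 1, −θ1, …, −θq, which is an implementation choice made for this illustration.

```python
def ma_autocovariance(theta, sigma2, j):
    """Autocovariance at lag j of a_t = e_t - sum_i theta[i-1] * e_{t-i}.

    With lag coefficients c = [1, -theta_1, ..., -theta_q] the value is
    sigma2 * sum_k c[k] * c[k + j], which is zero once j exceeds q.
    """
    c = [1.0] + [-th for th in theta]
    j = abs(j)
    return sigma2 * sum(c[k] * c[k + j] for k in range(len(c) - j))


# The MA(1) example with coefficient 1/2 and unit white noise variance.
variance = ma_autocovariance([0.5], 1.0, 0)   # 5/4
lag_one = ma_autocovariance([0.5], 1.0, 1)    # -1/2
lag_two = ma_autocovariance([0.5], 1.0, 2)    # 0: the ACF cuts off after lag 1
```

For an MA(2) with hypothetical coefficients 0.4 and 0.3, the same function gives −0.4 + 0.3 × 0.4 = −0.28 at lag 1, in line with the pattern of (2.9c).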
Equation (2.8) defines a moving average (MA) process (or model) of order q.
It is important to note that all such processes are stationary, and that they are
very flexible in terms of reproducing autocorrelation structure. To obtain a
process whose autocorrelations last out to lag 15, a MA(15) model can be used.
In theory, the model can extend to an infinite number of lags if the
autocorrelations damp off asymptotically rather than all at once. Then it is necessary to
place a restriction on the coefficients so that the variance (2.9b) exists,
namely, that $\sum_{i=1}^{\infty} \theta_i^2 < \infty$. These properties also demonstrate the drawbacks of the

MA model: it isn’t practical to work with a very large number of lags; the
model cannot capture non-stationary behaviour; and, finally, it is not easy to
motivate in terms of the real life structures that might have given rise to data.

2.3.4 Wold’s representation theorem


The approximation of autocorrelation structures of stationary processes by
moving average models lies at the heart of one of the most important theories
of time series analysis. Wold’s representation theorem states that:
‘any covariance stationary time series with only stochastic components10
can be represented by an infinite order MA model with appropriately
chosen coefficient values and white noise variance’.

The point is that, by extending the order of the MA far enough, it is always
possible to provide a MA process whose ACF approximates that of any given
ACF to whatever degree of accuracy is required, and that the approximation
error goes to zero as the order of the MA increases. As long as the foregoing is
understood, this may be abbreviated by stating that any (covariance) station-
ary time series with no deterministic components has an infinite order MA
representation.
Thus, if $x_t$ is a stationary time series with only stochastic components, it is
always possible to represent it as

$$x_t = \varepsilon_t - \sum_{i=1}^{\infty} \theta_i \varepsilon_{t-i}$$

where $\varepsilon_t$ is zero mean white noise with variance $\sigma^2$, the only restriction on the
parameters being that $\sum_{i=1}^{\infty} \theta_i^2 < \infty$. A more detailed account of this theorem may
be found in Hamilton (1994, section 4.8) and a rigorous one in Brockwell and
Davis (1991).11

2.3.5 The autoregressive process


A moving average process cannot capture non-stationarity. In addition, it
cannot capture an autocorrelation structure that damps slowly off to zero
other than in the case of an arbitrarily high-order (q) process. An alternative
model relates the current value of a process to its past values plus a white
noise disturbance term. This is the autoregressive process, and equations
(2.10a) and (2.10b) below define an autoregressive process of order 1 and of
order p respectively:

$$x_t = \phi x_{t-1} + \varepsilon_t, \tag{2.10a}$$

$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t, \tag{2.10b}$$

where $\phi$ and $\phi_i$, i = 1, 2, …, p are constant coefficients.12

2.3.5.1 The ACF of an autoregressive process


It is straightforward to show that the autocovariance and autocorrelation
functions of the AR(1) process are

$$\gamma_x(j) = \frac{\sigma^2 \phi^j}{1 - \phi^2} \tag{2.11a}$$

$$\rho_x(j) = \frac{\gamma_x(j)}{\gamma_x(0)} = \phi^j. \tag{2.11b}$$
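
As a numerical check on the AR(1) autocovariance, the closed form can be compared with a direct calculation. The sketch below is illustrative only: the function names are hypothetical, and the second function relies on writing the stationary AR(1) as an infinite weighted sum of past white noise terms (obtained by repeatedly substituting the AR(1) equation into itself), truncated at an arbitrary number of terms.

```python
def ar1_autocovariance(phi, sigma2, j):
    """Closed form sigma2 * phi**j / (1 - phi**2), valid for |phi| < 1."""
    if abs(phi) >= 1:
        raise ValueError("closed form requires |phi| < 1")
    return sigma2 * phi ** abs(j) / (1.0 - phi ** 2)


def ar1_autocovariance_from_ma(phi, sigma2, j, terms=2000):
    """Same quantity from x_t = sum_k phi**k * e_{t-k}, truncated at `terms`."""
    j = abs(j)
    return sigma2 * sum(phi ** k * phi ** (k + j) for k in range(terms))


phi, sigma2 = 0.8, 1.0  # hypothetical parameter values
closed = [ar1_autocovariance(phi, sigma2, j) for j in range(6)]
summed = [ar1_autocovariance_from_ma(phi, sigma2, j) for j in range(6)]
acf = [g / closed[0] for g in closed]  # equals phi**j
```

The two lists agree to rounding error, and the implied autocorrelations decline geometrically in the lag.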

In the case of an AR(p) process, the ACF is the solution to the difference
equation $\rho_x(j) = \sum_{i=1}^{p} \phi_i \rho_x(j - i)$.13

The solution of this equation depends on the p solutions of the polynomial
equation,

$$1 - \sum_{i=1}^{p} \phi_i z^i = 0, \tag{2.12}$$

where z is the argument of the function on the left-hand side of (2.12). Let
these solutions be $\lambda_i$, i = 1, 2, …, p. Then, in the case where the solutions are
all distinct, the solution will be of the form14

$$\rho_x(j) = \sum_{i=1}^{p} A_i \lambda_i^{-j} \tag{2.13}$$

where the $A_i$, i = 1, 2, …, p are constant coefficients determined from the
initial values of the ACF, $\rho_x(j)$, j = 0, 1, …, p – 1. A special case of particular
interest in economics is where the solutions occur in complex conjugate
pairs.15
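
The relationship between the difference equation and the root-based solution (2.13) can be illustrated for an AR(2) process. The following is a sketch under stated assumptions: the function names and coefficient values are hypothetical, the starting value ρ(1) = φ1/(1 − φ2) is obtained from the difference equation at j = 1 together with the symmetry of the ACF, and the quadratic formula restricts the code to p = 2 with distinct roots.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 (requires phi2 != 0)."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return (-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)


def ar2_acf_recursive(phi1, phi2, max_lag):
    """ACF from rho(j) = phi1*rho(j-1) + phi2*rho(j-2), with rho(0) = 1."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for j in range(2, max_lag + 1):
        rho.append(phi1 * rho[j - 1] + phi2 * rho[j - 2])
    return rho[:max_lag + 1]


def ar2_acf_from_roots(phi1, phi2, max_lag):
    """ACF as A1*lam1**(-j) + A2*lam2**(-j), for distinct roots."""
    lam1, lam2 = ar2_roots(phi1, phi2)
    u1, u2 = 1.0 / lam1, 1.0 / lam2
    rho1 = phi1 / (1.0 - phi2)
    a1 = (rho1 - u2) / (u1 - u2)  # solves rho(0) = 1 and rho(1) = rho1
    a2 = 1.0 - a1
    return [(a1 * u1 ** j + a2 * u2 ** j).real for j in range(max_lag + 1)]


# Hypothetical coefficients giving a complex conjugate pair of roots.
recursive = ar2_acf_recursive(0.5, -0.3, 12)
closed_form = ar2_acf_from_roots(0.5, -0.3, 12)
moduli = [abs(root) for root in ar2_roots(0.5, -0.3)]
```

With a complex conjugate pair the ACF oscillates as it damps off, which is the kind of pattern seen in the sample ACF of the output growth data.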

2.3.6 Lag polynomials and their roots


2.3.6.1 The lag operator and lag polynomials
The representation and analysis of the autoregressive and moving average
models is made more succinct by the use of a functional representation of the
lag structure involved. In the case of the AR(p) model, rewrite equation (2.10b)
as

$$x_t - \sum_{i=1}^{p} \phi_i x_{t-i} = \varepsilon_t. \tag{2.14}$$

Using the lag operator, L, defined such that

$$L x_t = x_{t-1}, \qquad L^n x_t = x_{t-n},$$

equation (2.14) may be written in terms of $x_t$ as

$$x_t - \sum_{i=1}^{p} \phi_i L^i x_t = \varepsilon_t$$

or

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) x_t = \varepsilon_t.$$

The term $\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)$ of this equation is a polynomial of degree p in the lag
operator L (and so is itself an operator). That is, it is a polynomial function of
L. It is therefore conveniently rewritten as

$$\phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i. \tag{2.15}$$

This function is called a lag polynomial operator (of order p). In general, the
coefficient of $L^0 = 1$ does not have to be equal to 1 as it is here. This has arisen
because the starting point was an autoregressive model.
Using (2.15), the AR(p) model of (2.10b) may be written

$$\phi(L) x_t = \varepsilon_t.$$

Similarly, defining the qth order lag polynomial,

$$\theta(L) = 1 - \sum_{i=1}^{q} \theta_i L^i, \tag{2.16}$$

the MA(q) model of (2.8) may be written as

$$a_t = \theta(L)\varepsilon_t.$$
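
Applying a lag polynomial to data is a mechanical operation, sketched below. The function name and the example values are hypothetical; the function returns values only for the dates at which all p lags are available, which is one of several possible conventions for handling the start of the sample.

```python
def apply_lag_polynomial(phi, x):
    """Apply phi(L) = 1 - sum_i phi[i-1] * L**i to the list x.

    Returns the values for positions p, ..., T-1 (0-based), the dates
    at which all p lagged values exist.
    """
    p = len(phi)
    return [x[t] - sum(phi[i - 1] * x[t - i] for i in range(1, p + 1))
            for t in range(p, len(x))]


# With phi(L) = 1 - L the operator returns period-on-period changes.
changes = apply_lag_polynomial([1.0], [1.0, 3.0, 6.0, 10.0])

# Building an AR(1) series from known shocks and applying phi(L) = 1 - 0.5L
# recovers the shocks from the second observation onwards.
shocks = [0.3, -1.2, 0.7, 0.1, -0.4]  # hypothetical values
x = [shocks[0]]
for e in shocks[1:]:
    x.append(0.5 * x[-1] + e)
recovered = apply_lag_polynomial([0.5], x)
```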

2.3.6.2 The roots of a lag polynomial


In obtaining the ACF of an AR(p) process, it was necessary to refer to the
solutions to the equation

$$1 - \sum_{i=1}^{p} \phi_i z^i = 0. \tag{2.12 again}$$

But the left-hand side of this equation is the same function as (2.15) except
that the lag operator has been replaced by the general complex argument, z.
So, writing

$$\phi(z) = 1 - \sum_{i=1}^{p} \phi_i z^i$$

so that it can be seen that $\phi(z)$ is a polynomial function of z, equation (2.12)
may be written

$$\phi(z) = 0. \tag{2.17}$$

The values of z that satisfy (2.17) are called the roots of the polynomial $\phi(z)$.
As a shorthand, they are also referred to as the roots of the lag polynomial
operator, $\phi(L)$, although, obviously, it is not correct in any sense to assign
numerical values to an operator (the lag operator in this case).

2.3.6.3 Convenient short-hands for referring to functions of the coefficients of lag polynomials

Let $\alpha(L)$ be a lag polynomial given by

$$\alpha(L) = \alpha_0 - \sum_{i=1}^{n} \alpha_i L^i \tag{2.18}$$

where $\alpha_i$, i = 0, 1, …, n are constant coefficients. By ‘evaluating’ the function at
certain values of its argument, useful functions of the coefficients can result.
There are two important cases:
(i) Replace L by 0. Then (2.18) becomes

$$\alpha(0) = \alpha_0 - \sum_{i=1}^{n} \alpha_i 0^i = \alpha_0.$$

That is, $\alpha(0)$ is the value of the coefficient of the zero lag term of $\alpha(L)$.
(ii) Replace L by 1. Then (2.18) becomes

$$\alpha(1) = \alpha_0 - \sum_{i=1}^{n} \alpha_i 1^i = \alpha_0 - \sum_{i=1}^{n} \alpha_i.$$

So $\alpha(1)$ is the sum of the coefficients of $\alpha(L)$.

2.3.6.4 Roots and the ACF of an autoregressive process


Having defined the lag polynomial and its roots, it is possible to refer very easily
to an AR model and its ACF as follows. Consider the AR(p) model $\phi(L)x_t = \varepsilon_t$
(where, it could be added for precision, but would normally be understood from
the definition of an AR process, $\varepsilon_t$ is white noise and $\phi(0) = 1$). Let $\lambda_i$, i = 1, 2, …,
p be the (distinct) roots of the lag polynomial. Then the ACF of $x_t$ is given by
$\rho_x(j) = \sum_{i=1}^{p} A_i \lambda_i^{-j}$, where the $A_i$ depend on the parameters of the process, including the
white noise variance. So, it is the roots of the autoregressive lag polynomial that
determine the evolution of the ACF as a function of the time gap, or lag, j.

2.3.7 Non-stationarity and the autoregressive process


Equation (2.13) and its simplification in the first order case, equation
(2.11b),16 show that the pattern of the autocorrelations of an AR process
depends on the roots, $\lambda_i$, i = 1, 2, …, n, of the lag polynomial, $\phi(L)$. A necessary
and sufficient condition that $\rho_x(j) \to 0$ as $j \to \infty$ is that $|\lambda_i^{-1}| < 1$ for all i = 1, 2,
…, n. In terms of the roots instead of their inverses, the condition is that all
roots are such that $|\lambda_i| > 1$. This condition is referred to as ‘all the roots lying
outside the unit circle’. It applies to complex as well as to real roots. In the
complex case, a root may be written c = a + ib, where in this case $i = \sqrt{-1}$ (not
an indexing subscript) and a and b are real coefficients. Then $|c| = +\sqrt{a^2 + b^2}$.

The condition that all the roots (of the autoregressive lag polynomial) lie
outside the unit circle is the stationarity condition for autoregressive processes.
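
As a worked illustration of the condition, consider two second order autoregressive lag polynomials. The coefficient values are chosen purely for illustration and are not taken from the text; the first corresponds to coefficients 1 and −0.5 on the first and second lags, the second to 1.5 and −0.5.

```latex
\begin{aligned}
\phi(z) &= 1 - z + 0.5z^2 = 0
  \;\Rightarrow\; z = 1 \pm i, \qquad |z| = \sqrt{1^2 + 1^2} = \sqrt{2} > 1,\\
\phi(z) &= 1 - 1.5z + 0.5z^2 = 0.5(z - 1)(z - 2) = 0
  \;\Rightarrow\; z = 1 \ \text{or}\ z = 2.
\end{aligned}
```

Both roots of the first polynomial lie outside the unit circle, so that process satisfies the condition. The second polynomial has a root of modulus exactly 1, so the condition fails even though its other root lies outside the unit circle.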

2.3.7.1 Stationarity of an autoregressive process


The AR(p) process $\phi(L)y_t = \varepsilon_t$ is stationary if and only if, for $\lambda_i$ such that
$\phi(\lambda_i) = 0$, $|\lambda_i| > 1 \ \forall i = 1, 2, \dots, p$.

2.3.8 The random walk and the unit root


Notice that, by definition, if the polynomial function evaluated at some
number is equal to zero, then that number is a root of the polynomial. So, for
example, if $\phi(1) = 0$ then 1 is a root, usually referred to as a unit root. Such a
root would mean the autoregressive process with this lag polynomial was
non-stationary because this root is not greater than 1 in modulus (i.e. |1| = 1), and
hence the stationarity condition is contradicted.

2.3.8.1 The random walk process


The random walk is an AR(1) process with a unit root. It is therefore a
non-stationary process. Equation (2.19) below defines a random walk,

$$x_t = x_{t-1} + \varepsilon_t,$$

which can be written

$$\phi(L)x_t = \varepsilon_t, \qquad \phi(L) = 1 - L. \tag{2.19}$$

To see that (2.19) has a unit root, and is therefore non-stationary, note that
$\phi(z) = 0$ has the solution z = 1 from (2.19). That is, the lag polynomial of this
model has a root of 1.

2.3.8.2 Differencing and stationarity


The period on period changes of a process are known as its first difference.
Thus, $x_t - x_{t-1}$ is the first difference of $x_t$. It is denoted $\Delta x_t$. Clearly, $\Delta$ can be
represented as the (autoregressive) lag operator (1 – L). That is, it is a first order
operator with a unit root. The random walk may thus be written $\Delta x_t = \varepsilon_t$,
which illustrates an important principle. Since $\varepsilon_t$ is white noise it is stationary.
Therefore $\Delta x_t$ is stationary. But $x_t$ itself is non-stationary. That is, taking the
first difference of the non-stationary process has reduced it to stationarity.
Such a process is said to be integrated of order 1.
The second difference of the random walk would be

$$\begin{aligned}
\Delta x_t &= \varepsilon_t\\
\Rightarrow\; \Delta(x_t - x_{t-1}) &= \varepsilon_t - \varepsilon_{t-1}\\
\Rightarrow\; \Delta x_t - \Delta x_{t-1} &= \varepsilon_t - \varepsilon_{t-1}\\
\Rightarrow\; x_t - x_{t-1} - x_{t-1} + x_{t-2} &= \varepsilon_t - \varepsilon_{t-1}\\
\Rightarrow\; (1 - L)^2 x_t &= \varepsilon_t - \varepsilon_{t-1}\\
\Rightarrow\; \Delta^2 x_t &= \varepsilon_t - \varepsilon_{t-1}.
\end{aligned}$$

That is,

$$\Delta\Delta = \Delta^2$$

and in general, if the process is differenced n times, the operation can be
represented as $\Delta^n$, the lag operator representation of which can be calculated
from $(1 - L)^n$. Although the first difference of the random walk is stationary, so
is the second, because it is an MA(1) process, $\Delta^2 x_t = \varepsilon_t - \varepsilon_{t-1}$, and all MA
processes are stationary. However, it has been over-differenced, meaning that
in order to reduce the original (random walk) process to stationarity it was
only necessary to difference once. This can be detected in the time series
structure by observing that $\varepsilon_t - \varepsilon_{t-1}$ is a MA(1) process with a unit root. (It
could be said that differencing the minimal number of times to reduce the
series to stationarity removes the unit root altogether, while over-differencing
moves it from the AR to the MA side of the equation.) Strictly speaking, it is
the minimal number of times it is necessary to difference a non-stationary
series to stationarity that defines its order of integration. This is made precise
in section 2.3.11 below.
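
The effect of differencing once and twice can be checked on a simulated random walk. The sketch below is illustrative: the function name, the seed and the sample size are arbitrary choices made for this example, and the walk is started at zero.

```python
import random


def difference(x, times=1):
    """Apply the first-difference operator (1 - L) `times` times."""
    for _ in range(times):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


random.seed(1)  # arbitrary seed
eps = [random.gauss(0.0, 1.0) for _ in range(200)]

# Random walk started at zero: each value is the running sum of the shocks.
walk, level = [], 0.0
for e in eps:
    level += e
    walk.append(level)

first = difference(walk, 1)    # recovers eps[1:], the white noise itself
second = difference(walk, 2)   # eps_t - eps_{t-1}: an MA(1) with a unit root
```

The first difference returns the shocks themselves, while the second difference returns their successive changes, which is the over-differenced MA(1) form described above.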

2.3.8.3 The random walk as stochastic trend


The idea of a trend is a process that increases by the same amount each time
period.17 Thus, a process defined as

$$y_t = a + bt \tag{2.20}$$

where a and b are constant coefficients, increases by an amount b each period
since

$$y_t - y_{t-1} = (a + bt) - (a + b(t - 1)) = b.$$

So, (2.20) could be written

$$y_t = y_{t-1} + b. \tag{2.21a}$$

However, (2.21a) does not tie down the value of the process at any point in
time, whereas (2.20) does. In particular, considering the value of the process at
t = 0, called the initial value, (2.20) gives

$$y_0 = a. \tag{2.21b}$$

The time trend model is fully described by equations (2.21a) and (2.21b).
Because the amount added each time period is fixed at b, this is known as a
deterministic trend. If instead of adding a fixed amount, a white noise term is
added, the resultant process is still called a trend, but it is now termed a
stochastic trend. Thus in place of (2.21a) write

$$y^*_t = y^*_{t-1} + \varepsilon_t. \tag{2.22}$$

Comparing (2.19) and (2.22), it is clear that $y^*_t$ is a random walk. That is, the
random walk and the stochastic trend are, when defined in this way, the same
thing.

By the same argument as in the deterministic case, in order to tie the
process down, it is necessary to provide some information about the process at
some point in time. As before, it is most convenient and intuitively appealing
to make this the initial value, $y^*_0$. The simplest case is $y^*_0 = 0$.
It is possible to obtain the expression for the stochastic trend analogous to
(2.20) in the deterministic case, that is to express the process in terms of its
initial value and the accumulation of its increments. This is done by the
process of back-substitution, which means using the original equation for $y^*_t$,
(2.22), lagging it one period to get an expression for $y^*_{t-1}$, and substituting this
into (2.22). This generates an expression in $y^*_{t-2}$ which can be substituted for
in a similar manner. The process is repeated until the y* variable in the
equation is the initial value. These steps are:

$$\begin{aligned}
y^*_t &= y^*_{t-1} + \varepsilon_t\\
&= y^*_{t-2} + \varepsilon_{t-1} + \varepsilon_t\\
&= y^*_{t-3} + \varepsilon_{t-2} + \varepsilon_{t-1} + \varepsilon_t\\
&= \dots = y^*_0 + \sum_{j=1}^{t} \varepsilon_j.
\end{aligned} \tag{2.23}$$

The fact that (2.23) involves a simple (unweighted) sum of white noise terms
leads to the general label of integrated for processes of this type, although the
class is not restricted to pure random walks of the type illustrated here.18
Using equation (2.23), it is straightforward to show that both the variance
and autocorrelation structure of the random walk are varying over time
according to

$$\mathrm{Var}(y^*_t) = t\sigma^2, \qquad \mathrm{Cov}(y^*_t, y^*_{t-j}) = (t - j)\sigma^2$$

and defining the correlation to be the covariance divided by the variance of
the process at time t, the autocorrelation is

$$\mathrm{Cor}(y^*_t, y^*_{t-j}) = 1 - \frac{j}{t}. \tag{2.24}$$

Alternatively, dividing by the square root of the product of the variances at t
and t – j would give the autocorrelation19

$$\mathrm{Cor}(y^*_t, y^*_{t-j}) = \sqrt{1 - j/t}.$$

It is clear that the process is non-stationary since its moments are not con-
stant over time. From this non-constancy, it also follows that the manipula-
tions underlying the derivation of the difference equation for the ACF of an
autoregressive process are not valid. So, in fact, equation (2.13) only applies in
the stationary case.20
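
The time dependence of the random walk's moments can be illustrated numerically. The sketch below is an illustration only: the function names are hypothetical, the Monte Carlo settings (replications, seed) are arbitrary, and the walk is assumed to start from a fixed initial value of zero with unit white noise variance.

```python
import math
import random


def random_walk_correlations(t, j):
    """Return (1 - j/t, sqrt(1 - j/t)): the two scalings of the covariance
    (t - j) * sigma2 between the walk at times t and t - j."""
    ratio = 1.0 - j / t
    return ratio, math.sqrt(ratio)


def simulated_variance(t, replications=4000, seed=0):
    """Monte Carlo estimate of the variance of the walk at time t,
    for a walk started at zero with N(0, 1) increments."""
    rng = random.Random(seed)
    finals = [sum(rng.gauss(0.0, 1.0) for _ in range(t))
              for _ in range(replications)]
    mean = sum(finals) / replications
    return sum((v - mean) ** 2 for v in finals) / (replications - 1)


# Theoretical autocorrelation at lag 10, evaluated at three dates.
by_date = {t: random_walk_correlations(t, 10)[0] for t in (50, 75, 100)}
estimate = simulated_variance(50)  # should be in the region of 50
```

The same lag therefore corresponds to a different autocorrelation at each date, which is the feature plotted in Figure 2.9.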

Figure 2.7 Random walk, 2,168 observations, initial value 0, NIID(0,1) white noise

Figure 2.8 Sample ACF of random walk series plotted in Figure 2.7

Figure 2.9 Theoretical ACF of a random walk at different points in time: t = 50, 75, 100

Figure 2.7 presents an artificially generated random walk sequence based on
an NIID(0,1) white noise sequence, and Figure 2.8 is its sample ACF, although
recall that this plot cannot have the meaning it possesses in the stationary
case.
The theoretical ACF will vary with the time t, according to equation (2.24).
Figure 2.9 provides a suite of such functions for three different points in
time.

2.3.8.4 The random walk with drift


An important case for economic time series involves a generalization of the
random walk so that the process consists of the sum of both a stochastic and a
linear deterministic trend,

$$x_t = x_0 + \delta t + \sum_{j=1}^{t} \varepsilon_j,$$

$x_0$ being the initial value of the process. In this case,

$$\Delta x_t = x_t - x_{t-1} = x_0 + \delta t + \sum_{j=1}^{t} \varepsilon_j - \left(x_0 + \delta(t - 1) + \sum_{j=1}^{t-1} \varepsilon_j\right) = \delta + \varepsilon_t.$$

Such a process is called a random walk with drift and $\delta$ is called the drift
parameter. There are now two aspects to the non-stationarity: not only is the
variance growing over time (and the autocorrelation structure changing over
time) but the mean of the process is also evolving since

$$E(x_t) = E(x_0) + E(\delta t) + E\left(\sum_{j=1}^{t} \varepsilon_j\right) = x_0 + \delta t$$

assuming $x_0$ is fixed (non-stochastic).
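
A random walk with drift is easily built up from its increments, as the following sketch shows. The function name, the argument name `delta` for the drift parameter and the numerical values are all hypothetical choices made for this illustration.

```python
def random_walk_with_drift(x0, delta, shocks):
    """Return the path for t = 1, ..., T, where each value is the previous
    one plus the drift `delta` plus that period's shock, starting from x0."""
    path, level = [], x0
    for e in shocks:
        level += delta + e
        path.append(level)
    return path


shocks = [0.2, -0.1, 0.4]  # hypothetical shock values
path = random_walk_with_drift(1.0, 0.5, shocks)

# Period-on-period changes: each equals the drift plus that period's shock.
changes = [b - a for a, b in zip([1.0] + path, path)]

# With the shocks switched off only the deterministic trend remains.
trend_only = random_walk_with_drift(1.0, 0.5, [0.0, 0.0, 0.0])
```

Setting the shocks to zero leaves the straight line through the initial value with slope equal to the drift, which is also the mean path of the process when the initial value is fixed.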

2.3.9 The autoregressive moving average process and operator inversion


Autoregressive and moving average models can be combined to form a single
model. This may be written

$$\phi(L)x_t = \theta(L)\varepsilon_t \tag{2.25}$$

where $\varepsilon_t$ is white noise, $\phi(L)$ a pth order lag polynomial with $\phi(0) = 1$ and $\theta(L)$
qth order with $\theta(0) = 1$ as defined by equations (2.15) and (2.16). The model is
then autoregressive-moving average of order (p, q) or ARMA(p, q). Since $\theta(L)\varepsilon_t$
is a moving average process, it is stationary. Thus (2.25) is stationary if and
only if the autoregressive contribution is stationary, that is if all the roots of
$\phi(L)$ lie outside the unit circle. In the stationary case, models of this type give
rise to ACFs which begin with q irregular autocorrelations determined by the
MA coefficients, followed by a pattern of values generated by the solution of
the difference equation arising from the autoregressive polynomial. For details
see Box and Jenkins (1976).
ARMA models are also of practical importance since they provide a way of
representing a relatively complex ACF with relatively few parameters: they are
said to be parsimonious.
In the stationary case, both sides of (2.25) can be divided by $\phi(L)$ to give

$$x_t = \psi(L)\varepsilon_t$$

where

$$\psi(L) = \theta(L) / \phi(L).$$
Properties of Univariate Time Series 27

It is stronger to think of this as the inversion of the AR operator, and write


instead

ψ ( L) =  −1( L)θ ( L).

where –1 (L) is such that –1 (L)  (L) = 1.21 The inverse operator does not exist
unless all the roots of  (L) lie outside the unit circle. The operator ψ (L) is of
infinite order, and so ARMA models can be thought of as a restricted way of
obtaining an MA(∞) representation. That is, the ARMA model of finite orders
provides an approximation to the infinite order MA representation of a sta-
tionary process.

2.3.9.1 Illustration of operator inversion


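Let $\phi(L) = 1 - \phi L$, $|\phi| < 1$. Then a Taylor series expansion may be used to
obtain

$$\phi^{-1}(L) = 1 / (1 - \phi L) = 1 + \phi L + \phi^2 L^2 + \dots = \sum_{i=0}^{\infty} \phi^i L^i.$$

It is easily verified that $\phi^{-1}(L)\phi(L) = 1$ in this case. Then the ARMA(1,1)
model

$$(1 - \phi L)x_t = (1 - \theta L)\varepsilon_t \qquad (2.26)$$

can be written

$$x_t = \left( \sum_{i=0}^{\infty} \phi^i L^i \right)(1 - \theta L)\varepsilon_t \qquad (2.27)$$

Multiplying out the operators on the right hand side of (2.27) gives

$$\left( \sum_{i=0}^{\infty} \phi^i L^i \right)(1 - \theta L) = \sum_{i=0}^{\infty} \phi^i L^i - \theta \sum_{i=0}^{\infty} \phi^i L^{i+1}$$

$$= 1 + \sum_{i=1}^{\infty} \phi^i L^i - \sum_{i=1}^{\infty} \theta \phi^{i-1} L^i$$

$$= 1 + \sum_{i=1}^{\infty} (\phi^i - \theta \phi^{i-1}) L^i.$$

This ARMA process (2.26) can therefore be represented as

$$x_t = \psi(L)\varepsilon_t \qquad (2.28a)$$

where $\psi(L) = 1 - \sum_{i=1}^{\infty} \psi_i L^i$, with

$$\psi_i = -(\phi^i - \theta \phi^{i-1}) = \phi^{i-1}(\theta - \phi). \qquad (2.28b)$$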


Equation (2.28b) also illustrates another point. If $\theta = \phi$ then $\psi_i = 0$, ∀ i ≥ 1, that
is $\psi(L) = 1$. Substituting this into (2.28a) gives $x_t = \varepsilon_t$. Comparing this with
(2.26) it appears that the lag polynomials have cancelled. In the stationary
case, where the common operator has its root outside the unit circle, this is a
reasonable way to describe what has happened, since if $\theta = \phi$ the AR and MA
operators are indeed the same. The situation is a little more complex in the
non-stationary case where dependency on initial values is not negligible.
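
The inversion can be checked numerically. The sketch below expands the ratio of the MA(1) operator to the AR(1) operator term by term and compares the result with the closed form in (2.28b); it also shows the cancellation when the two coefficients coincide. The function name and the parameter values 0.7 and 0.4 are illustrative assumptions.

```python
def arma11_ma_coefficients(phi, theta, n_terms):
    """Coefficients c_0, c_1, ... of (1 - theta L) / (1 - phi L) in powers of L."""
    # 1 / (1 - phi L) has coefficients phi**i (valid for |phi| < 1).
    inverse_ar = [phi ** i for i in range(n_terms)]
    coeffs = []
    for i in range(n_terms):
        c = inverse_ar[i]
        if i >= 1:
            # Contribution of the -theta L term, shifted by one lag.
            c -= theta * inverse_ar[i - 1]
        coeffs.append(c)
    return coeffs


phi, theta = 0.7, 0.4  # illustrative values, not from the text
coeffs = arma11_ma_coefficients(phi, theta, 6)

# The text writes psi(L) = 1 - sum psi_i L^i, so psi_i is minus the coefficient.
psi = [-c for c in coeffs[1:]]
closed_form = [phi ** (i - 1) * (theta - phi) for i in range(1, 6)]
print(psi)
print(closed_form)

# With theta equal to phi every coefficient beyond lag 0 vanishes.
print(arma11_ma_coefficients(0.5, 0.5, 5))
```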

2.3.10 Factorizing polynomial lag operators


Any polynomial lag operator may be factorized in terms of its roots. Consider
the nth order lag polynomial²²

$$\alpha(L) = 1 - \sum_{i=1}^{n} \alpha_i L^i.$$

If the roots of $\alpha(L)$ are $\lambda_i$, i = 1, 2, …, n, then it may be written as the product
of first order factors $(1 - \lambda_i^{-1} L)$,

$$\alpha(L) = \prod_{i=1}^{n} (1 - \lambda_i^{-1} L).$$

This factorization does not depend on whether the roots are outside the unit
circle. Thus, if in a stationary ARMA(p, q) model, there is a common factor (i.e.
root) between the AR and the MA polynomial, this may be cancelled to give
an ARMA(p – 1, q – 1) model with exactly the same time series characteristics.

2.3.10.1 Invertibility
An MA(q) or ARMA(p, q) model is said to be invertible if the moving average
operator has all roots outside the unit circle. That is, if

$$\phi(L)x_t = \theta(L)\varepsilon_t$$

then the process is invertible if and only if

$$\theta(z) = 0 \Rightarrow |z| > 1.$$

2.3.10.2 Identifiability and invertibility


Moving average models have the property that a given set of coefficients is
not the only one that reproduces a specific autocorrelation structure. In
particular, if any root of the moving average operator, $z^*$ say, is replaced by its
inverse, then the autocorrelation structure is unchanged. In the simple MA(1)
case, the processes

$$x_{1,t} = \varepsilon_t - \theta \varepsilon_{t-1},$$
$$x_{2,t} = \varepsilon_t - (1/\theta) \varepsilon_{t-1},$$

have the same ACF,²³

$$\rho_{x_2}(1) = \frac{-1/\theta}{1 + (-1/\theta)^2} = -\frac{1/\theta}{1 + 1/\theta^2} = -\frac{\theta}{1 + \theta^2} = \rho_{x_1}(1).$$

(Both are stationary because they are pure moving average processes.) In
general, an ARMA(p, q) or MA(q) process will have $2^q$ different
parameterizations that generate the same ACF, because any subset of the moving
average roots may be replaced by their inverses.²⁴ The coefficients of the MA
component may not therefore be uniquely identified from the ACF. However, if
invertibility holds, there is a unique set of MA coefficients corresponding to
the ACF.²⁵
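
As a quick numerical check of the MA(1) case, the lag-1 autocorrelation can be evaluated for a coefficient and for its reciprocal. This is a minimal sketch; the function name and the value 0.5 are illustrative assumptions.

```python
def ma1_lag1_autocorrelation(theta):
    """Lag-1 autocorrelation of x_t = eps_t - theta * eps_{t-1}."""
    return -theta / (1.0 + theta ** 2)


theta = 0.5  # invertible parameterization: the root 1/theta = 2 lies outside the unit circle
rho_invertible = ma1_lag1_autocorrelation(theta)
rho_non_invertible = ma1_lag1_autocorrelation(1.0 / theta)
print(rho_invertible, rho_non_invertible)  # the two values coincide
```

Only the first of the two parameterizations is invertible, which is why imposing invertibility picks out a unique coefficient.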

2.3.10.3 Comparison with stationarity


Stationarity and invertibility are the same mathematical condition applied to
different operators. Stationarity ensures that an AR(p) process may be
expressed as an MA (∞) (in other words that the inverse of the AR lag polyno-
mial exists). Similarly, invertibility means that the MA operator may be
inverted and so the model expressed in AR(∞) form.

2.3.11 Order of integration and autoregressive integrated moving average models
As already mentioned, there are many ways in which a time series can be
covariance non-stationary. All that is required is that at least one of the mean,
variance or covariance is changing over time. However, one particular way of
capturing or describing non-stationarity relates to the stochastic trend. A sto-
chastic trend (random walk) is non-stationary as has been seen. The difference
of the process however, is stationary. The unit root associated with the differ-
encing operator is in this case responsible for the non-stationarity. Such a series
is described as integrated of order 1 because differencing it once removes the
non-stationarity. A general definition of an integrated process in the ARMA
context requires a condition to avoid over-differencing. This is invertibility.

2.3.11.1 Integration: Definition 1²⁶

Let $x_t$ be a non-stationary process. If

$$y_t = \Delta^d x_t \qquad (2.29a)$$

has a stationary and invertible ARMA representation²⁷

$$\phi(L)y_t = \theta(L)\varepsilon_t \qquad (2.29b)$$

then $x_t$ is said to be integrated of order d, denoted I(d).

2.3.11.2 ARIMA models


Substituting (2.29a) into (2.29b) gives

$$\phi(L)\Delta^d x_t = \theta(L)\varepsilon_t \qquad (2.30)$$

A time series having this representation is said to be autoregressive integrated
moving average (ARIMA) of order (p, d, q), where $\phi(L)$ and $\theta(L)$ have all their
roots outside the unit circle and are of order p and q respectively. The
operators on the left-hand side of (2.30) can be expressed as a single operator, say

$$\alpha(L) = \phi(L)\Delta^d.$$

For example, if $\phi(L) = 1 - 0.1L$ and d = 1 then

$$\alpha(L) = (1 - 0.1L)\Delta = (1 - 0.1L)(1 - L) = 1 - 1.1L + 0.1L^2$$

where $\alpha(L)$ has roots of 10 and 1, one stationary root and a unit root
respectively. This suggests another way of thinking about the order of integration as
being the number of unit roots in the autoregressive lag polynomial.
If $x_t$ has an ARMA(m, q) representation $\alpha(L)x_t = \theta(L)\varepsilon_t$ which is invertible and
where non-stationarities are due only to unit roots, then the order of integration
is equal to the number of unit roots of $\alpha(L)$. If $\alpha(L)$ has d ≤ m unit roots, then the
model may be written as $\phi(L)\Delta^d x_t = \theta(L)\varepsilon_t$, where $\phi(L)$ is of order m – d.
A time series with a positive order of integration is said to be integrated.
Clearly, integrated time series are not the only type of non-stationary time
series, but this is a very popular way of modelling non-stationarity, not least
because it is simple and because a great deal of statistical theory has been
developed to further this approach.
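
The worked example, a second order lag polynomial with one stationary root and one unit root, can be reproduced by solving the quadratic directly and counting roots of modulus one. This is a minimal sketch restricted to second order polynomials; the function names and the tolerance are illustrative assumptions.

```python
import cmath


def quadratic_lag_roots(c1, c2):
    """Roots z of 1 + c1*z + c2*z**2 = 0, assuming c2 != 0."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)
    return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]


def count_unit_roots(roots, tol=1e-8):
    """Number of roots whose modulus is one, up to a numerical tolerance."""
    return sum(1 for z in roots if abs(abs(z) - 1.0) < tol)


# The polynomial 1 - 1.1 L + 0.1 L^2 from the example in the text.
roots = quadratic_lag_roots(-1.1, 0.1)
print(sorted(abs(z) for z in roots))           # moduli close to 1 and 10
print("unit roots:", count_unit_roots(roots))  # suggests integration of order 1
```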

2.3.12 Trend and difference stationarity


The random walk with drift illustrated that non-stationarity can be due to
deterministic or stochastic trends. In this example it was both. However,
interest often focuses on the distinction between non-stationarity due to time
trends and that due to stochastic trends. A time series that is non-stationary
due to a linear time trend is called trend stationary, because it consists simply
of stationary fluctuations around a trend. So, if the fluctuations about trend
are white noise, this would be written

$$x_t = a + bt + \varepsilon_t. \qquad (2.31)$$

If the trend is removed, the process is stationary since $x_t - a - bt = \varepsilon_t$.
However, careful consideration of definition 1 of integration shows that this
process is not an integrated process. To see this, first note that by (2.31) $x_t$ is
non-stationary as its expected value is the trend and so changes over time.
Now consider whether differencing removes that non-stationarity:

$$\Delta x_t = \Delta a + \Delta (bt) + \Delta \varepsilon_t = 0 + b\Delta t + \Delta \varepsilon_t = b(t - (t - 1)) + \Delta \varepsilon_t = b + \Delta \varepsilon_t. \qquad (2.32)$$

The definition of integration is trivially generalized to include a non-zero
mean for the difference process (here b),²⁸ but the real problem with (2.32) lies
with the MA process $\Delta \varepsilon_t$. This is first order but with a unit root and so is
non-invertible. The definition requires the ARMA representation of the differenced
process to be invertible. Therefore $x_t$ is not integrated of order 1 (or higher
order as further differencing will just induce further MA unit roots). In
contrast, a simple random walk is white noise after differencing by definition, and
so clearly I(1). It is the quintessential I(1) process. It is said to be difference
stationary.
But there is an uncomfortable wrinkle in this terminology: a difference
stationary process can still have a deterministic trend. The simplest case is the
random walk with drift, $x_t = x_{t-1} + b + \varepsilon_t$, or $\Delta x_t = b + \varepsilon_t$, which is stationary
and invertible. So the process is I(1).²⁹
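
The effect of over-differencing can be seen in simulated data. Differencing a trend stationary series leaves an MA(1) with a unit root, whose lag-1 autocorrelation is -0.5, whereas differencing a random walk with drift leaves white noise about the drift. The sketch below is illustrative only: the intercept, slope, sample size and seed are assumptions, and both series are built from the same shocks for ease of comparison.

```python
import random


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a list of numbers."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    den = sum(d * d for d in dev)
    return num / den


def first_difference(series):
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
n, a, b = 20000, 1.0, 0.2  # illustrative values
eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]

# Trend stationary: x_t = a + b t + eps_t.
trend_stationary = [a + b * t + eps[t] for t in range(n + 1)]

# Difference stationary: x_t = x_{t-1} + b + eps_t.
rw_drift = [0.0]
for t in range(1, n + 1):
    rw_drift.append(rw_drift[-1] + b + eps[t])

rho_ts = lag1_autocorrelation(first_difference(trend_stationary))
rho_rw = lag1_autocorrelation(first_difference(rw_drift))
print("differenced trend stationary series:", rho_ts)  # near -0.5
print("differenced random walk with drift:", rho_rw)   # near 0
```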

2.3.13 Other models


2.3.13.1 Fractional integration
The order of integration of an ARIMA model need not be integer valued. If d is
non-integer, the model is known as fractionally integrated and is abbreviated
to ARFIMA(p, d, q). The process is stationary for d < 0.5, but the
autocorrelations die down more slowly than those of a stationary AR process. For d ≥ 0.5
the process is non-stationary.
The definition of the fractional differencing operator requires the gamma
function, $\Gamma(\cdot)$, and is given by

$$\Delta^d = 1 + \sum_{k=1}^{\infty} \frac{\Gamma(k - d)}{\Gamma(-d)\Gamma(k + 1)} L^k$$

$$= 1 - dL - \tfrac{1}{2} d(1 - d) L^2 - \tfrac{1}{6} d(1 - d)(2 - d) L^3 - \dots + \pi_k L^k + \dots$$

where $\pi_k \propto k^{-(1 + d)}$ for large k, and so the coefficients die away slowly, such
that a very high order autoregressive model would be needed to approximate the
ACF reasonably well. The ARFIMA model was developed by Granger and Joyeux (1980)
and Hosking (1981).
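
The coefficients of the fractional differencing operator are easy to generate. The sketch below uses the standard recursion for binomial series coefficients and checks it against the gamma function formula and against the first few terms written out above; the function names and the value d = 0.3 are illustrative assumptions, and the gamma form is only evaluated for non-integer d.

```python
import math


def frac_diff_weights(d, n_terms):
    """Coefficients of L^0, ..., L^(n_terms - 1) in (1 - L)**d, by recursion."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff_weight_gamma(d, k):
    """Same coefficient from the gamma function form (d must not be an integer)."""
    return math.gamma(k - d) / (math.gamma(-d) * math.gamma(k + 1))


d = 0.3  # illustrative value
weights = frac_diff_weights(d, 11)
print(weights[:4])
print([-d, -0.5 * d * (1 - d), -d * (1 - d) * (2 - d) / 6.0])
print([frac_diff_weight_gamma(d, k) for k in range(1, 4)])
```

The slow hyperbolic decay of these weights, compared with the geometric decay of a stationary AR process, is what makes a low order autoregression a poor approximation.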

2.3.13.2 Structural models


Rather than embed time series properties in a single statement, as in the
ARIMA class of models, components having identifiably different
characteristics can be modelled separately so that each component can be
interpreted directly. Such models are described extensively in Harvey (1989) and,
despite fairly persuasive arguments in their favour, will not be dealt with in
this book.³⁰

2.4 Testing for non-stationarity in single series

2.4.1 Background
The form of non-stationarity that is commonly tested for is the unit root. The
structure within which such tests are performed is the AR or ARMA model.

The idea is to obtain a parameterization of the model that allows the hypo-
thesis to be tested to involve a single parameter. This subject is discussed in
detail in Patterson (2005). However, we illustrate some of the structure of
these tests briefly here for two reasons. Firstly, because multivariate
generalizations form the basis of tests discussed in greater length in chapters 3 and 4;³¹
and secondly because prior testing for non-stationarity is crucial to a great
deal of the methodology of time series modelling used in economics and
finance.

2.4.2 Reparameterizing the autoregressive model


Consider the AR(p) model,

$$\phi(L)x_t = \varepsilon_t \qquad (2.33)$$

where $\phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i$. As discussed by many authors (e.g. Burke, 1996), and
applied by Dickey and Fuller (1979, 1981) in their ground-breaking work on
testing for unit roots, this polynomial can be rewritten as

$$\phi(L) = -\psi L + \phi^*(L)(1 - L) \qquad (2.34a)$$

where

$$\phi^*(L) = 1 - \sum_{i=1}^{p-1} \phi_i^* L^i,$$

$$\phi_i^* = -\sum_{j=i+1}^{p} \phi_j, \quad i = 1, 2, \dots, p - 1$$

and, most relevantly,

$$\psi = -\phi(1). \qquad (2.34b)$$

Equation (2.34b) shows that $\psi = 0$ if and only if $\phi(L)$ has a unit root.
Substituting (2.34a) into (2.33) and using $\Delta = (1 - L)$ gives

$$(-\psi L + \phi^*(L)\Delta)x_t = \varepsilon_t$$

or

$$\phi^*(L)\Delta x_t = \psi x_{t-1} + \varepsilon_t \qquad (2.35)$$

where $\phi^*(L)$ is a (p – 1)th order lag polynomial with $\phi^*(0) = 1$. That is, the AR(p)
model may be reparameterized as an AR(p – 1) model in first differences
($\phi^*(L)\Delta x_t$), together with a correction term in the lagged level ($\psi x_{t-1}$). The unit root
test is then a test of the null hypothesis

$$H_0: \psi = 0$$

in the model obtained by rearranging (2.35) in regression format as

$$\Delta x_t = \psi x_{t-1} + \sum_{i=1}^{p-1} \phi_i^* \Delta x_{t-i} + \varepsilon_t. \qquad (2.36)$$

The summation term on the right-hand side of (2.36) does not appear if p = 1,
and so can be thought of as the correction for autocorrelation beyond
that which would be due to an AR(1) process. The alternative hypothesis
can be one or two sided, according to whether the alternative of interest is
stationarity ($\psi < 0$) or explosiveness ($\psi > 0$), or either. Typically the alternative
of interest is stationarity, and so that used is

$$H_A: \psi < 0.$$

Unit root tests based on (2.36) are called augmented Dickey–Fuller (ADF) tests
(see Patterson 2005).
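
The reparameterization can be verified by computing the coefficient on the lagged level and the coefficients on the lagged differences from a set of AR coefficients, and then multiplying the pieces back together. The following is a minimal sketch, with hypothetical function names and illustrative AR coefficients; it assumes the sign conventions described above, namely an AR polynomial with unit leading coefficient and minus signs on the lag coefficients.

```python
def adf_reparameterize(phi):
    """phi = [phi_1, ..., phi_p]. Return (psi, [phi*_1, ..., phi*_{p-1}])."""
    p = len(phi)
    psi = -(1.0 - sum(phi))                          # psi = -phi(1)
    phi_star = [-sum(phi[i:]) for i in range(1, p)]  # phi*_i = -(phi_{i+1} + ... + phi_p)
    return psi, phi_star


def rebuild_polynomial(psi, phi_star):
    """Coefficients of -psi*L + phi*(L)(1 - L), in ascending powers of L."""
    star = [1.0] + [-c for c in phi_star]  # phi*(L) = 1 - sum phi*_i L^i
    out = [0.0] * (len(star) + 1)
    for i, c in enumerate(star):
        out[i] += c       # phi*(L) times 1
        out[i + 1] -= c   # phi*(L) times -L
    out[1] -= psi         # the -psi*L term
    return out


phi = [0.4, 0.3, 0.2]  # illustrative stationary AR(3)
psi, phi_star = adf_reparameterize(phi)
print(psi, phi_star)
print(rebuild_polynomial(psi, phi_star))  # approximately 1, -0.4, -0.3, -0.2

# With a unit root the AR coefficients sum to one and psi is zero.
print(adf_reparameterize([1.2, -0.2])[0])
```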

2.4.3 Semi-parametric methods


A contrasting approach is based on the observation that it is only the
parameter $\psi$ that is of interest in (2.36). The others, while important because they
correct for autocorrelation, are of no direct interest and are consequently
known as nuisance parameters. Their specific values are certainly of no
interest in this context. To see this, rewrite (2.36) so that the correction (or
augmentation) terms do not appear explicitly, as

$$\Delta x_t = \psi x_{t-1} + u_t \qquad (2.37)$$

where $u_t = \sum_{i=1}^{p-1} \phi_i^* \Delta x_{t-i} + \varepsilon_t$ is an autocorrelated disturbance term, and no longer
white noise. Tests based on (2.37) that assume the disturbances are white
noise will not be valid and inferences from them could be seriously
misleading. However, it is possible to correct the test statistics for the
disturbance autocorrelation so that inferences are once again valid. This
methodology is that developed by Phillips (1987) and Phillips and Perron
(1988). These tests require calculation of a term that has become known as
the long-run variance, which is computed using a weighted sum of sample
autocovariances in a way related to spectral estimation and heteroscedasticity
and autocorrelation consistent (HAC) variance–covariance matrix estimation
(see Andrews 1991; Newey and West 1987; and White 1980). Again, more
details may be found in Patterson (2005).
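
To give a sense of what such a calculation involves, the sketch below computes a weighted sum of sample autocovariances using linearly declining (Bartlett) weights, which is the weighting scheme associated with Newey and West (1987). It is a hedged illustration rather than the estimator of any particular test: the function name, the choice of weights and the bandwidth argument are assumptions made here, and the bandwidth would in practice be chosen with reference to the sample size.

```python
def long_run_variance(u, bandwidth):
    """Bartlett-weighted sum of sample autocovariances of u, up to lag `bandwidth`."""
    n = len(u)
    mean = sum(u) / n
    dev = [v - mean for v in u]

    def autocov(j):
        # Sample autocovariance at lag j, with divisor n.
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, bandwidth + 1):
        weight = 1.0 - j / (bandwidth + 1.0)  # declines linearly with the lag
        lrv += 2.0 * weight * autocov(j)
    return lrv


# A small deterministic example: strongly negatively autocorrelated data.
u = [1.0, -1.0, 1.0, -1.0]
print(long_run_variance(u, bandwidth=0))  # the ordinary variance
print(long_run_variance(u, bandwidth=1))  # reduced by the negative autocovariance
```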

2.4.4 Other issues


A number of complicating features present themselves in testing for unit roots
that have analogues in the multivariate tests discussed later in this book. Some
important ones are listed below:

(i) The underlying model may not be AR but ARMA. In this case the AR
approximation would be arbitrarily long and impractical in an empirical
setting. Practically speaking, the optimal length of the pure AR approx-
imation depends on the sample size, and longer models can only be
entertained as more data becomes available. The relationship between
the sample size and the AR order is critical to the (asymptotic) validity of
the test. Ng and Perron (1995) discuss this problem and Hall (1989)
offers a different approach. See also Galbraith and Zinde-Walsh (1999).
(ii) The number of unit roots may be greater than 1 in which case testing
can become unreliable if performed in such a way that unit roots remain
unparameterized in the model. Dickey and Pantula (1987) advise on this
issue. It is relevant since economic time series, especially those recorded
in nominal terms, can be integrated of higher order, especially I(2).
(iii) Economic time series are often subject to structural breaks. This is a port-
manteau term to cover many possibilities, but relates simply to the
assumption of constancy of parameters where this does not exist. This
may affect the parameters of interest, so that, for example, a series may
change from being I(1) to being stationary. Alternatively, a time series
may in fact be stationary around a trend (or mean level) that is subject
to jumps or sudden changes in slope. Since the tests themselves look at
the stochastic behaviour around the trend, misspecification of this trend
leads to unreliable inferences about the stochastic component of the
series. This is a topic of current research, but established papers in the
area are Perron (1989), Zivot and Andrews (1992) and Perron (1990).
(iv) As already observed, a unit root test is an examination of the stochastic
component of a series, that is the random fluctuations about some deter-
ministically determined level. This could be many things: zero, non-zero
but fixed, or a trend of some polynomial degree. But misspecification of
the deterministic component can lead to incorrect inference on the sto-
chastic properties of the data. Dickey and Fuller (1979) address this to
some extent, developing tests for the trend as well as the unit root.
Patterson (2000, section 6.4) discusses a framework for joint determination
of the stochastic and deterministic components of a univariate time series.

2.4.5 Other approaches


Unit root tests of the type outlined above have been criticized for a number of
reasons. In empirical work, great importance is placed on the distinction
between unit root and stationary processes, so high precision is required of
the tests. This is unfortunate because it has been demonstrated that even in
the stationary case significant distortions can occur in the estimation and
testing of autoregressive roots (Nankervis and Savin, 1985, 1988). The main
attack is on the power of the tests: their ability to correctly reject the null of
non-stationarity. To be useful, any test must, asymptotically, be able to reject
a false null with certainty. Such tests are called consistent tests. The tests of
the previous section satisfy this requirement under reasonable conditions.
However, they are likely to perform less well if the root is stationary but close
to unity, or if the process is fractionally integrated with an integration para-
meter close to one half.
The dual of the power problem can also be encountered, where size, the
probability of rejecting a true null, is distorted from its nominal value. Of
course, what this amounts to saying is that the appropriate tail of the null
distribution is altered. This will occur, for example, in an ARIMA(p,1,1) model,

$$\phi(L)\Delta x_t = (1 - \theta L)\varepsilon_t \qquad (2.38)$$

where $|\theta| < 1$, and $\phi(L)$ has all its roots outside the unit circle, so the
differencing operator is the only source of the unit root and $x_t \sim I(1)$. But as $\theta \to 1$, so
$(1 - \theta L) \to (1 - L)$ and the MA operator will tend to cancel with the differencing
operator. In the limit where this occurs, the process will be stationary and the
null ought to be rejected. But in finite samples, this will be a smooth rather
than a sudden transition, leading to a tendency for tests to reject the null of a
unit root for $\theta$ close to unity, even though strictly speaking the process is still
I(1). (See Blough, 1992, for a discussion of this issue.)³²
This idea has formed the basis of a set of stationarity tests where the null is
of stationarity. This amounts to a null hypothesis of

H0 :θ = 1

in models such as (2.38). Naturally, this literature is closely related to that for
testing for moving average unit roots, important contributions being
Kwiatkowski, Phillips, Schmidt and Shin (1992), and Leybourne and McCabe
(1994). Of course, such tests suffer from finite sample power problems for θ
close to unity, and size problems when an additional AR root tends to 1 (see
also Lee and Schmidt, 1996, for behaviour in the presence of fractional inte-
gration). KPSS also suggest using both unit root and stationarity tests jointly
in confirmatory data analysis. This was investigated in Burke (1994) and
found to add little to the use of either test individually.
The power of unit root tests can be improved by the use of covariates
(Hansen 1995). This has not yet become a popular approach. A method that is
becoming as popular as the ADF test is that advanced by Elliott, Rothenberg
and Stock (1996).
36 Modelling Non-Stationary Time Series

The Bayesian approach to unit root testing is now well developed and may
be found more appealing since the impact of the unit root’s presence or
absence is not so crucial for the distribution theory (see Bauwens, Lubrano
and Richard, 2000, chapter 6). Harvey’s (1989) structural models relax the
concentration on the ARIMA model that has taken such a firm hold in the
analysis of non-stationary economic time series, and offer an alternative set of
testing techniques. Lastly, in an alternative view of the uncertainty of struc-
ture, Leybourne, McCabe and Tremayne (1996) have developed tests for a
stochastic unit root, where, rather than have a fixed value of 1, the root being
tested is stochastic, having a distribution centred on unity under the null
hypothesis.

2.4.6 Relevance of unit root testing to multivariate methods


There are a number of reasons why it is necessary to test for unit roots. Among
these are that the presence of unit roots alters the statistical properties of
estimators and test statistics used in the econometric analysis of the relation-
ships between variables. Another is that the presence of a unit root in a group
of series makes it possible to identify the presence of a long run relationship
between the series.
Consider an n × 1 vector of I(1) time series $X_t = (x_{1,t}, \dots, x_{n,t})'$. Consider any
function of these series,

$$\xi_t = f(X_t) \qquad (2.39)$$

although a linear function is the easiest to work with, say,

$$f(X_t) = a_0 + \sum_{i=1}^{n} a_i x_{i,t} \qquad (2.40)$$

where the $a_i$, i = 0, 1, …, n are constant coefficients.
If this combination results in a zero mean stationary process, then,
substituting (2.40) into (2.39),

$$\xi_t = a_0 + \sum_{i=1}^{n} a_i x_{i,t}$$

is stationary. That is to say that the relationship

$$a_0 + \sum_{i=1}^{n} a_i x_{i,t} = 0 \qquad (2.41)$$

holds with an error that has mean zero, constant variance, and an ACF that
damps off quite quickly. Being stationary, it will not wander widely from its
mean value of zero and will cross it frequently. That is, $a_0 + \sum_{i=1}^{n} a_i x_{i,t}$ will
not depart from zero in any permanent way. So (2.41) holds in the long run:
never exactly, but without long periods of failure. This property is known as
cointegration.
Unit root tests may feature in two ways in order to establish the existence of
such a long-run relationship. First, it is necessary to test for the unit roots in
the first place. Secondly, in order to establish that a long-run relationship
exists, it is necessary to test if the function of the data is stationary – that is, to
check that it does not contain a unit root. Thus one might perform a unit root
or stationarity test on f (Xt), although, importantly, if the parameters of this
function are estimated then adjustments to the critical values of the tests are
necessary due to the uncertainty of the estimates as representative values of
the true parameter values.
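
The idea can be illustrated with artificial data in which the coefficients of the linear combination are known rather than estimated. In the sketch below one series is a random walk and the other is twice that random walk plus stationary noise, so the combination with coefficients (-2, 1) is stationary by construction. The function name, the coefficient 2, the sample size and the seed are all illustrative assumptions.

```python
import random
import statistics


def simulate_cointegrated_pair(n, seed=42):
    """Return (x1, x2): x1 is a random walk, x2 = 2 * x1 + white noise."""
    rng = random.Random(seed)
    x1 = [0.0]
    for _ in range(n):
        x1.append(x1[-1] + rng.gauss(0.0, 1.0))
    x2 = [2.0 * v + rng.gauss(0.0, 1.0) for v in x1]
    return x1, x2


x1, x2 = simulate_cointegrated_pair(5000)
combination = [b - 2.0 * a for a, b in zip(x1, x2)]

# The levels wander widely; the combination stays close to zero.
print("sample variance of x1:", statistics.pvariance(x1))
print("sample variance of x2 - 2 * x1:", statistics.pvariance(combination))
```

With estimated rather than known coefficients, a unit root test applied to the combination would need the adjusted critical values mentioned above.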

2.5 Conclusion

This chapter has considered the characterization of non-stationarity for
univariate time series. Testing for non-stationarity is mainly considered in the
context of univariate models that are autoregressive (Dickey and Fuller 1979),
and thus focuses on the presence of unit roots. The unit root is a very
powerful property of a time series, and in cases where there are a reasonable number
of observations, it is generally fairly straightforward to determine its presence
or otherwise. Not only that, but its presence or absence has important
structural implications since unit roots are associated with long-run behaviour.
A powerful tool of empirical analysis can be based on testing whether such
long-run behaviour is exhaustively shared by a set of time series. This is the
notion of cointegration. To develop the idea, it is necessary to consider the
relationship between series, rather than the properties of individual series
alone. This is the subject of the next chapter.
3 Relationships Between Non-Stationary Time Series

3.1 Introduction

The previous chapter dealt with the properties of univariate time series, and in
particular non-stationarity as characterized by the autoregressive unit root.
This chapter develops the theme by looking at the way in which this type of
non-stationarity can be modelled as a common feature such that the non-
stationarity in one series is fully explained by that present in an appropriate
combination of other series. It is natural to think of this in terms of a single
regression equation.
The unit root corresponds to long-run behaviour of a series, that is
to a component that has an arbitrarily low frequency. Thus, an equa-
tion which fully explains unit root behaviour can be thought of as fully
describing the long-run relationship between the series concerned, or, in
other words, it describes the underlying equilibrium behaviour. If the equa-
tion fails to capture all unit root behaviour it cannot be an equilibrium
relationship.
These ideas are discussed below. The context is intuitively appealing, being
that of a single equation with an unambiguous distinction between dependent
variable and (weakly exogenous) regressors.¹ This does have limitations,
however, among which is that only one equilibrium relationship can be con-
sidered. This is relaxed in later chapters.

3.2 Equilibrium and equilibrium correction

3.2.1 Long-run relationships


The idea of equilibrium is fundamental to the interrelationship of economic
processes. In the time series econometric context, the idea is encapsulated in
many ways. In general, the concept implies an underlying relationship about
which the process or processes under examination vary, without deviating too
far or for too long away from the values that would have to exist if the rela-
tionship held exactly at each period in time.

3.2.1.1 In a static model


A static model is one in which all the processes appear with the same time
index so that only current values are concerned and there are no
intertemporal links between them. Consider two scalar processes, $z_t$ and $y_t$. Suppose
there exists an exact linear relationship between them so that, at all points in
time t,

$$y_t = \gamma + \beta_0 z_t. \qquad (3.1)$$

There is never any deviation from this relationship. To emphasize that there is
zero deviation, rewrite (3.1) as

$$y_t - \gamma - \beta_0 z_t = 0. \qquad (3.2)$$

However, rather than hold exactly, (3.2) might be subject to deviation. So, for
a given value $z_t$, if the relationship (3.2) held exactly, the value of the y
process would be

$$y_t^e = \gamma + \beta_0 z_t. \qquad (3.3)$$

But the y process is not equal to $y_t^e$ but some other value, simply $y_t$. Denote the
difference between these two, the extent to which the exact relationship does
not hold, by

$$\xi_t = y_t - y_t^e. \qquad (3.4)$$

If the $\xi_t$ process varies about zero with a controlled size, then it is reasonable
to regard the exact relationship (3.3) as the underlying relationship between
the variables. Such a relationship is referred to as a long-run relationship. If,
on the other hand, the deviations $\xi_t$ seem to grow without bound, or become
increasingly dispersed about zero, then the exact relationship seems to be
irrelevant. The stochastic property required of the deviations $\xi_t$ is stationarity (and
zero mean). Substituting (3.3) into (3.4) and rearranging gives

$$y_t = \gamma + \beta_0 z_t + \xi_t. \qquad (3.5)$$

Taking expectations of this gives

$$E(y_t) = \gamma + \beta_0 E(z_t) + E(\xi_t). \qquad (3.6)$$

Then, if $E(\xi_t) = 0$ and

$$E(y_t) = \bar{y},$$
$$E(z_t) = \bar{z},$$

equation (3.6) can be written

$$\bar{y} = \gamma + \beta_0 \bar{z}. \qquad (3.7)$$

This is the same functional relationship as (3.3), that is, it is the underlying
or long-run relationship. The sequence of operations leading to (3.7) can be
stylized as follows:

(i) Assume the processes $z_t$ and $y_t$ have settled to fixed values $\bar{z}$ and $\bar{y}$
respectively.
(ii) Assume also that there are no more deviations to the system, i.e. assume
$\xi_t = 0$ (which can be regarded as its settled value).
(iii) Substitute these values into the complete relationship (3.5).

The resultant function relates the long-run static values of the variables. It is
known as the static equilibrium. The condition that all variables have settled
down in this way is known as the steady state.
The treatment here attempts to point out that while it is perfectly possible
to make the above substitutions and obtain the long-run static solution in this
way, this does not prove its existence. Rather it says that if zt and yt settle to
fixed values and disturbances are stationary then the long-run solution to the
model is given by (3.7). This discussion also indicates that the origin of these
settled values should be the expected value of the processes.

3.2.1.2 In a dynamic model


Consider the model

$$y_t = \gamma + \alpha_1 y_{t-1} + \beta_0 z_t + \beta_1 z_{t-1} + u_t \qquad (3.8)$$

where the $u_t$ are disturbances to the dynamic relationship, but it remains to be
seen how these relate to the deviations from equilibrium.² Since this
relationship includes lags, it is said to be dynamic. Taking expected values of (3.8),
treating all variables as stochastic, gives

$$E(y_t) = \gamma + \alpha_1 E(y_{t-1}) + \beta_0 E(z_t) + \beta_1 E(z_{t-1}) + E(u_t). \qquad (3.9)$$

If it is assumed that

$$E(y_t) = E(y_{t-1}) = \bar{y}, \qquad (3.10a)$$
$$E(z_t) = E(z_{t-1}) = \bar{z}, \qquad (3.10b)$$
$$E(u_t) = 0, \qquad (3.10c)$$

then (3.9) can be used to derive a relationship between $\bar{z}$ and $\bar{y}$. Substituting
equations (3.10a), (3.10b) and (3.10c) into (3.9) and rearranging gives

$$\bar{y} = \frac{\gamma}{(1 - \alpha_1)} + \frac{(\beta_0 + \beta_1)}{(1 - \alpha_1)} \bar{z}. \qquad (3.11)$$

3.2.2 Equilibrium and equilibrium error


Equation (3.11) allows the definition of deviations from equilibrium. It can be
rearranged as

$$\bar{y} - \frac{\gamma}{(1 - \alpha_1)} - \frac{(\beta_0 + \beta_1)}{(1 - \alpha_1)} \bar{z} = 0. \qquad (3.12)$$

The left-hand side of (3.12) can be evaluated at a pair of actual values ($z_t$, $y_t$). If
the system was in equilibrium, this should be zero. The extent to which it is
not zero is the equilibrium error, which has been denoted $\xi_t$. That is,

$$y_t - \frac{\gamma}{(1 - \alpha_1)} - \frac{(\beta_0 + \beta_1)}{(1 - \alpha_1)} z_t = \xi_t. \qquad (3.13)$$

This defines the equilibrium error. If this process is non-stationary then, of
course, it does not make much sense to regard the long-run solution as an
equilibrium.
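
The long-run solution of a dynamic model with one lag of each variable, and the associated equilibrium error, can be checked numerically: holding the explanatory variable fixed and switching off the disturbances, repeated application of the dynamic equation should converge to the long-run value whenever the coefficient on the lagged dependent variable is less than one in absolute value. The sketch below is illustrative; the function and argument names (`gamma` for the intercept, `alpha1` for the coefficient on the lagged dependent variable, `beta0` and `beta1` for the coefficients on the current and lagged explanatory variable) and the numerical values are assumptions.

```python
def long_run_solution(gamma, alpha1, beta0, beta1, z_bar):
    """Static long-run value of y implied by the dynamic model at z = z_bar."""
    return gamma / (1.0 - alpha1) + (beta0 + beta1) / (1.0 - alpha1) * z_bar


def equilibrium_error(gamma, alpha1, beta0, beta1, y, z):
    """Deviation of an observed pair (z, y) from the long-run relationship."""
    return y - long_run_solution(gamma, alpha1, beta0, beta1, z)


def iterate_to_steady_state(gamma, alpha1, beta0, beta1, z_bar, y0=0.0, steps=500):
    """Iterate the dynamic equation with z fixed at z_bar and no disturbances."""
    y = y0
    for _ in range(steps):
        y = gamma + alpha1 * y + beta0 * z_bar + beta1 * z_bar
    return y


params = dict(gamma=1.0, alpha1=0.6, beta0=0.5, beta1=0.3)  # illustrative values
y_bar = long_run_solution(z_bar=2.0, **params)
print(y_bar)                                          # 6.5, up to rounding
print(iterate_to_steady_state(z_bar=2.0, **params))   # converges to the same value
print(equilibrium_error(y=7.0, z=2.0, **params))      # y is above its equilibrium value
```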

3.2.3 Equilibrium correction


The dynamic model (3.8) can be rewritten in terms of the equilibrium error
(3.13). Subtracting yt – 1 from both sides and adding 0zt – 1 – 0zt – 1 to the right-
hand side of (3.8) gives
∆yt = + 0 ∆zt + (1 − 1)yt −1 + (0 + 1 )zt −1 + ut
= + 0 ∆zt − (1 − 1 )yt −1 + (0 + 1 )zt −1 + ut

which, on grouping terms in the lagged levels on the right-hand side, gives

(0 + 1 )
∆yt = + 0 ∆zt − (1 − 1 )( yt −1 − zt −1 ) + ut (3.14)
(1 − 1 )
From (3.13),
(0 + 1 )
yt −1 − zt −1 = t −1 + ,
(1 − 1 ) (1 − 1 )

which on substitution into (3.14) gives3

∆yt = 0 ∆zt − (1 − 1 )t −1 + ut . (3.15)

This can also be called an equilibrium correction model.4 Changes in $y_t$ are seen to be due to changes in $z_t$ and the extent to which the system was out of equilibrium in the previous period, that is $\eta_{t-1}$. From (3.13), the equilibrium error is positive if the y process is of a value higher than is consistent with equilibrium ($\eta_t > 0 \Leftrightarrow y_t > \frac{\gamma}{1-\alpha_1} + \frac{\beta_0+\beta_1}{1-\alpha_1} z_t$). This suggests that such an error should exert a downward pressure on $y_t$ in the next period, in other words that there should be a negative pressure on the change. This means the coefficient on the lagged equilibrium error in (3.15) ought to be negative if this simple behavioural rule applies. Similarly, if the equilibrium error is negative, there should be upward pressure, and again the argument is for a negative coefficient on $\eta_{t-1}$. In this simple model this requires $\alpha_1 < 1$.
The speed of adjustment to equilibrium is measured by the size of the coefficient on the disequilibrium error, $-(1 - \alpha_1)$. The larger this is in absolute value (assuming it is of the “correct” negative sign), the quicker is the adjustment.
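The equivalence of the levels form (3.8) and the equilibrium correction form (3.15) can be checked numerically. The following is a minimal sketch; the function names `adl_step` and `ecm_step`, the parameter values and the data points are illustrative assumptions, not taken from the text.

```python
def adl_step(y_lag, z, z_lag, u, gamma, alpha1, beta0, beta1):
    """One step of the ADL(1, 1) model written in levels."""
    return gamma + alpha1 * y_lag + beta0 * z + beta1 * z_lag + u


def ecm_step(y_lag, z, z_lag, u, gamma, alpha1, beta0, beta1):
    """The same step via the ECM: change in z plus lagged equilibrium error."""
    a1 = 1.0 - alpha1  # alpha(1), the adjustment speed
    # Lagged equilibrium error, as in (3.13) dated t-1.
    eta_lag = y_lag - gamma / a1 - (beta0 + beta1) / a1 * z_lag
    dy = beta0 * (z - z_lag) - a1 * eta_lag + u
    return y_lag + dy


# Hypothetical parameter values and data, chosen only for illustration.
params = dict(gamma=1.0, alpha1=0.5, beta0=0.75, beta1=-0.375)
y_adl = adl_step(y_lag=3.0, z=2.0, z_lag=1.5, u=0.1, **params)
y_ecm = ecm_step(y_lag=3.0, z=2.0, z_lag=1.5, u=0.1, **params)
print(y_adl, y_ecm)
```

Both functions should return the same value of $y_t$ up to rounding, since the ECM is only a reparameterization of the ADL and imposes no restriction on it.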

3.2.4 Equilibrium correction and autoregressive distributed lag models in general

Equation (3.8) is an example of an autoregressive distributed lag (ADL) model.
There are two variables in the model, each appearing with a maximum lag of
1. The model is therefore referred to as an ADL(1, 1) model, the first number
of the ordered pair referring to the maximum lag of the dependent variable.
Note that (3.8) could be written in terms of lag polynomials. Define

$$\alpha(L) = 1 - \alpha_1 L, \qquad \beta(L) = \beta_0 + \beta_1 L.$$

Then (3.8) can be written

$$\alpha(L) y_t = \gamma + \beta(L) z_t + u_t. \tag{3.16}$$

The long-run solution and hence the equilibrium error can also be written in terms of these polynomials since

$$\alpha(1) = 1 - \alpha_1, \qquad \beta(1) = \beta_0 + \beta_1,$$

which can be substituted into (3.13) to give

$$\eta_t = y_t - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, z_t. \tag{3.17}$$

Similarly, the ECM may be written

$$\Delta y_t = \beta_0 \Delta z_t - \alpha(1)\eta_{t-1} + u_t. \tag{3.18}$$

These results can be generalized for the ADL(m, n) model, which is (3.16) with

$$\alpha(L) = 1 - \sum_{i=1}^{m} \alpha_i L^i, \tag{3.19a}$$

$$\beta(L) = \sum_{i=0}^{n} \beta_i L^i. \tag{3.19b}$$

Using a slightly generalized version of the reparameterization used in section 2.4.2, equation (2.34a), that is if

$$\delta(L) = \delta_0 - \sum_{i=1}^{p} \delta_i L^i \tag{3.20a}$$

then it can be written

$$\delta(L) = \delta(1) L + \delta^*(L)(1 - L) \tag{3.20b}$$

where

$$\delta^*(L) = \delta_0 - \sum_{i=1}^{p-1} \delta_i^* L^i \tag{3.20c}$$

$$\delta_i^* = -\sum_{j=i+1}^{p} \delta_j, \quad i = 1, 2, \ldots, p - 1. \tag{3.20d}$$
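The reparameterization can be verified by multiplying out the polynomials. The sketch below assumes a generic lag polynomial written as $\delta_0 - \sum_i \delta_i L^i$; the helper names `star_coeffs` and `rebuild` and the coefficient values are hypothetical and serve only as an illustration.

```python
def star_coeffs(delta):
    """delta = [d_0, d_1, ..., d_p] for the polynomial d_0 - sum_i d_i L^i.

    Returns (value of the polynomial at L = 1, [d_0, d*_1, ..., d*_{p-1}]),
    where d*_i = -(d_{i+1} + ... + d_p).
    """
    d0, rest = delta[0], delta[1:]
    value_at_1 = d0 - sum(rest)
    star = [d0] + [-sum(rest[i:]) for i in range(1, len(rest))]
    return value_at_1, star


def rebuild(delta):
    """Coefficients, in powers of L, of value_at_1 * L + star(L) * (1 - L)."""
    value_at_1, star = star_coeffs(delta)
    s = [star[0]] + [-c for c in star[1:]]  # star polynomial in powers of L
    out = [0.0] * (len(s) + 1)
    for k, c in enumerate(s):               # multiply by (1 - L)
        out[k] += c
        out[k + 1] -= c
    out[1] += value_at_1                    # add the term in L
    return out


# Hypothetical example: 1 - 0.5 L - 0.25 L^2 - 0.125 L^3.
delta = [1.0, 0.5, 0.25, 0.125]
target = [delta[0]] + [-c for c in delta[1:]]
rebuilt = rebuild(delta)
print(rebuilt)
```

The rebuilt coefficients should coincide with those of the original polynomial, which is all that the identity claims.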

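Thus (3.16) becomes

$$\left(\alpha(1) L + \alpha^*(L)(1 - L)\right) y_t = \gamma + \left(\beta(1) L + \beta^*(L)(1 - L)\right) z_t + u_t$$

This can be rearranged as

$$\alpha^*(L)\Delta y_t = \gamma - \alpha(1) y_{t-1} + \beta(1) z_{t-1} + \beta^*(L)\Delta z_t + u_t$$

$$\alpha^*(L)\Delta y_t = \gamma + \beta^*(L)\Delta z_t - \alpha(1)\left( y_{t-1} - \frac{\beta(1)}{\alpha(1)}\, z_{t-1} \right) + u_t$$

or

$$\alpha^*(L)\Delta y_t = \beta^*(L)\Delta z_t - \alpha(1)\left( y_{t-1} - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, z_{t-1} \right) + u_t \tag{3.21}$$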

Equation (3.21) is the ECM in this general case. Noting that the lag of a fixed value is the same as the fixed value, $L^i \bar{y} = \bar{y}$ for $i = 0, 1, 2, \ldots$, and so

$$\alpha(L)\bar{y} = \left(1 - \sum_{i=1}^{m} \alpha_i L^i\right)\bar{y} = \bar{y} - \sum_{i=1}^{m} \alpha_i \bar{y} = \left(1 - \sum_{i=1}^{m} \alpha_i\right)\bar{y} = \alpha(1)\bar{y}$$

and similarly

$$\beta(L)\bar{z} = \beta(1)\bar{z}$$

the long-run static solution to the model can be written

$$\bar{y} - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, \bar{z} = 0$$

and hence the equilibrium error as (3.17) but using the operators (3.19a) and (3.19b). The ECM may therefore be written in terms of the equilibrium error as

$$\alpha^*(L)\Delta y_t = \beta^*(L)\Delta z_t - \alpha(1)\eta_{t-1} + u_t.$$


Equation (3.19a) gives the form of $\alpha(L)$ and from (3.20a) to (3.20d) it can be seen that $\alpha^*(0) = 1$ and $\alpha^*(L)$ is of order $m - 1$, that is

$$\alpha^*(L) = 1 - \sum_{i=1}^{m-1} \alpha_i^* L^i$$

Similarly

$$\beta^*(L) = \beta_0 + \sum_{i=1}^{n-1} \beta_i^* L^i$$

and so, writing $\beta_0^* = \beta_0$, the ECM can be rewritten

$$\Delta y_t = \sum_{i=1}^{m-1} \alpha_i^* \Delta y_{t-i} + \sum_{i=0}^{n-1} \beta_i^* \Delta z_{t-i} - \alpha(1)\eta_{t-1} + u_t \tag{3.22}$$

Equation (3.22) shows that the ECM reparameterization of the ADL(m, n) model is in terms of the differences of the processes and the lagged equilibrium error, the maximum lag of the differences of each variable being one less than the maximum lag of the level in the ADL. Notice that the current value of $z_t$ appears on the right-hand side of (3.22), that is the summation involving its lags begins at 0, while the summation involving the lags of $y_t$ begins at 1 because the current value is on the left-hand side of the equation.

3.2.4.1 Solving the ECM for the long-run solution


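The ECM reparameterization of the ADL is performed without any initial reference to the long-run solution of the model. That is,

$$\alpha(L) y_t = \gamma + \beta(L) z_t + u_t$$

is rewritten as

$$\alpha^*(L)\Delta y_t = \beta^*(L)\Delta z_t - \alpha(1)\left( y_{t-1} - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, z_{t-1} \right) + u_t.$$

The ECM provides an immediate calculation of the long-run solution. Under the assumption of a steady state without growth:5

$$y_t = \bar{y}, \quad z_t = \bar{z}, \quad u_t = 0, \quad \forall t$$

and hence

$$\Delta y_t = y_t - y_{t-1} = \bar{y} - \bar{y} = 0, \qquad \Delta z_t = z_t - z_{t-1} = \bar{z} - \bar{z} = 0.$$

Substituting these equations into the ECM gives

$$\alpha^*(L)\,0 = \beta^*(L)\,0 - \alpha(1)\left( \bar{y} - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, \bar{z} \right) + 0$$

which on rearrangement gives

$$\bar{y} = \frac{\gamma}{\alpha(1)} + \frac{\beta(1)}{\alpha(1)}\, \bar{z}$$

as the long-run or steady-state solution to the model.6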
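The long-run solution depends only on the lag polynomials evaluated at 1, so it is easily computed from the ADL coefficients. A minimal sketch follows; the function name `long_run` and the numerical values are illustrative assumptions.

```python
def long_run(gamma, alpha, beta):
    """Static long-run solution y = intercept + slope * z of an ADL(m, n) model.

    alpha: [alpha_1, ..., alpha_m], so alpha(L) = 1 - sum_i alpha_i L^i
    beta:  [beta_0, ..., beta_n],   so beta(L)  = sum_i beta_i L^i
    """
    alpha_at_1 = 1.0 - sum(alpha)
    if alpha_at_1 == 0.0:
        # alpha(L) has a unit root: the divisor vanishes and no solution exists.
        raise ValueError("alpha(1) = 0: no long-run solution")
    return gamma / alpha_at_1, sum(beta) / alpha_at_1


# Hypothetical ADL(1, 1): y_t = 1 + 0.5 y_{t-1} + 0.75 z_t - 0.375 z_{t-1} + u_t
intercept, slope = long_run(gamma=1.0, alpha=[0.5], beta=[0.75, -0.375])
print(intercept, slope)
```

The explicit check on $\alpha(1)$ reflects the fact that the long-run solution is only defined when the autoregressive lag polynomial has no unit root.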

3.2.4.2 Generalization to ADL models with more than two variables


The reparameterization generalizes in a straightforward manner to the situation where there are more than two variables in the equation. The general ADL may be written

$$\alpha(L) y_t = \gamma + \sum_{j=1}^{r} \beta_j(L) z_{j,t} + u_t$$

where $\alpha(L)$ and $\beta_j(L)$, $j = 1, 2, \ldots, r$ are lag polynomials of possibly different order and $\alpha(0) = 1$, and $z_{j,t}$, $j = 1, 2, \ldots, r$ are a set of explanatory variables. Then, using the reparameterization on each of the polynomials and rearranging, the ECM is of the form

$$\alpha^*(L)\Delta y_t = \sum_{j=1}^{r} \beta_j^*(L)\Delta z_{j,t} - \alpha(1)\left( y_{t-1} - \frac{\gamma}{\alpha(1)} - \sum_{j=1}^{r} \frac{\beta_j(1)}{\alpha(1)}\, z_{j,t-1} \right) + u_t. \tag{3.23}$$

The lag polynomials $\alpha^*(L)$ and $\beta_j^*(L)$ are interpreted as before and are of order one less than $\alpha(L)$ and $\beta_j(L)$, respectively.

3.2.5 Unit roots and the ECM reparameterization


Equations (3.14),7 (3.21), and (3.23) demonstrate the ECM reparameterization of ADL models of increasing generality. But they all have the same structure and (3.23) covers them all. It is clear that the value of $\alpha(1)$ is very important. It directly determines the speed of adjustment to equilibrium, as $-\alpha(1)$ is the coefficient of the lagged equilibrium error term. This coefficient should be negative, that is $\alpha(1) > 0$, if the equilibrium error interpretation is to make sense. Furthermore, in order for the long-run (i.e. equilibrium) solution to exist, $\alpha(1)$ must be non-zero because it appears as a divisor in the equilibrium error term.
The condition $\alpha(1) \neq 0$ means the lag polynomial $\alpha(L)$ must have no unit roots. Of course, as $\alpha(1) \to 0$, the speed of adjustment gets slower. In the limit there is no equilibrium to which to adjust. This point is returned to in the context of non-stationary processes. It is also clear that if any of the lag polynomials on the explanatory variables, that is any of the $\beta_j(L)$, have unit roots, then the corresponding variable disappears from the long-run solution since then $\beta_j(1) = 0$. If $\alpha(L)$ has a unit root, then one avenue to consider, if economically meaningful, would be to work instead with the differences of $y_t$ right from the start, since under these circumstances $\alpha(L)$ could be factorized, thus extracting the unit root, as

$$\alpha(L) = \tilde{\alpha}(L)(1 - L) = \tilde{\alpha}(L)\Delta$$

and so

$$\alpha(L) y_t = \tilde{\alpha}(L)(1 - L) y_t = \tilde{\alpha}(L)\Delta y_t$$

The ADL would then be

$$\tilde{\alpha}(L)\Delta y_t = \gamma + \sum_{j=1}^{r} \beta_j(L) z_{j,t} + u_t$$

and the long-run solution

$$\Delta\bar{y} = \frac{\gamma}{\tilde{\alpha}(1)} + \sum_{j=1}^{r} \frac{\beta_j(1)}{\tilde{\alpha}(1)}\, \bar{z}_j$$

where $\Delta\bar{y}$ means that the $y_t$ process is still changing, but by the same (steady-state) amount every period. This long-run equilibrium relates the steady-state change in the y process to the steady-state levels of the z processes. If there are unit roots in any of the $\beta_j(L)$ the same approach could be used here, resulting in the replacement of the steady-state level of the corresponding $z_j$ by its steady-state change. The possibly uncomfortable result is a long-run solution that mixes equilibrium levels, changes along a steady-state growth path and changes that might be best described as flows rather than stocks.
To give a slightly more concrete example, suppose $\alpha(L)$ does not have a unit root, and suppose there are two explanatory variables, the lag polynomial on the second of which has a unit root. The ADL is

$$\alpha(L) y_t = \gamma + \beta_1(L) z_{1,t} + \beta_2(L) z_{2,t} + u_t,$$

$$\alpha(1),\ \beta_1(1) \neq 0, \quad \beta_2(1) = 0.$$

There are then two possible interpretations in terms of the long-run solution (also see the discussion of I(2) processes in chapter 6). Leaving $\beta_2(L)$ as it is, that is not factorizing out the unit root, gives the long-run solution

$$\bar{y} = \frac{\gamma}{\alpha(1)} + \frac{\beta_1(1)}{\alpha(1)}\, \bar{z}_1 + \frac{\beta_2(1)}{\alpha(1)}\, \bar{z}_2 = \frac{\gamma}{\alpha(1)} + \frac{\beta_1(1)}{\alpha(1)}\, \bar{z}_1$$

that is, $z_2$ drops out of the long-run solution. Alternatively, $\beta_2(L)$ could be factorized so as to draw out the unit root, $\beta_2(L) = \tilde{\beta}_2(L)\Delta$. The ADL would then be

$$\alpha(L) y_t = \gamma + \beta_1(L) z_{1,t} + \tilde{\beta}_2(L)\Delta z_{2,t} + u_t$$

and the long-run solution

$$\bar{y} = \frac{\gamma}{\alpha(1)} + \frac{\beta_1(1)}{\alpha(1)}\, \bar{z}_1 + \frac{\tilde{\beta}_2(1)}{\alpha(1)}\, \Delta\bar{z}_2 \tag{3.24}$$

Equation (3.24) relates the steady-state level of y to the steady-state level of $z_1$ and the steady-state change of $z_2$. The speed of adjustment to equilibrium is the same in both cases, but in the former, the first differences of $z_{2,t}$ will appear in the short-run dynamics, whereas in the latter it will be the second differences ($\Delta^2$). The ECMs for the two cases are given below in equations (3.25a) and (3.25b) respectively.
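
$$\alpha^*(L)\Delta y_t = \beta_1^*(L)\Delta z_{1,t} + \beta_2^*(L)\Delta z_{2,t} - \alpha(1)\left( y_{t-1} - \frac{\gamma}{\alpha(1)} - \frac{\beta_1(1)}{\alpha(1)}\, z_{1,t-1} \right) + u_t, \tag{3.25a}$$

$$\alpha^*(L)\Delta y_t = \beta_1^*(L)\Delta z_{1,t} + \tilde{\beta}_2^*(L)\Delta^2 z_{2,t} - \alpha(1)\left( y_{t-1} - \frac{\gamma}{\alpha(1)} - \frac{\beta_1(1)}{\alpha(1)}\, z_{1,t-1} - \frac{\tilde{\beta}_2(1)}{\alpha(1)}\, \Delta z_{2,t-1} \right) + u_t. \tag{3.25b}$$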

This type of choice over specification is not very comfortable as the two long-
run equilibria are different, one including a steady state change, the other not.
However, some clarification is often available either from the underlying
economic theory or, in the case where some or all of the variables are non-
stationary, their orders of integration.

3.3 Cointegration and equilibrium

3.3.1 Equilibrium error and static equilibrium


For purposes of illustration, consider again the two variable ADL(m, n) model,

$$\alpha(L) y_t = \gamma + \beta(L) z_t + u_t \tag{3.16 again}$$

where the lag polynomials are as in equations (3.19a) and (3.19b). Assuming it exists, this has a long-run static solution given by

$$\bar{y} = \frac{\gamma}{\alpha(1)} + \frac{\beta(1)}{\alpha(1)}\, \bar{z}. \tag{3.26}$$

In any period t, the equilibrium error will be

$$\eta_t = y_t - \frac{\gamma}{\alpha(1)} - \frac{\beta(1)}{\alpha(1)}\, z_t. \tag{3.27}$$

But in what sense is (3.26) an equilibrium rather than just a long-run solution that may or may not be relevant? It won't be relevant if, for example, the variables do not tend to steady-state values, that is $\bar{z}$ or $\bar{y}$ don't exist. This depends on the properties of the error sequence, $\eta_t$. Writing (3.27) as

$$y_t = \frac{\gamma}{\alpha(1)} + \frac{\beta(1)}{\alpha(1)}\, z_t + \eta_t$$

shows that $y_t$ is its putative equilibrium value, $\frac{\gamma}{\alpha(1)} + \frac{\beta(1)}{\alpha(1)} z_t$, plus a deviation, $\eta_t$, from this value. If these deviations display any form of permanence, then it is not sensible to regard (3.26) as the underlying relationship. One definition of a lack of permanence is stationarity, or, more precisely, the property of being I(0). If $\eta_t$ is I(1), then the idea that (3.26) represents an equilibrium is entirely unhelpful. Granger (1991) and Engle and Granger (1991) compare the properties of I(1) and I(0) variables. An I(0) series has a mean and there is a tendency for the series to return to this value frequently, with deviations that are large on a relatively small number of occasions. An I(1) process will wander widely, only rarely return to an earlier value, and its autocorrelations will remain close to one even at long lags. Theoretically, the expected time it takes for a random walk to return to some fixed value is infinite.8 Clearly then, it makes no sense whatsoever for $\eta_t$ to be I(1), but it does seem reasonable to require it to be I(0). There is the issue of its mean value as well. Clearly this should be zero, although this does not affect the time series properties of the variable, meaning its stationarity, variance, and autocorrelation structure.9
So the working definition of a static equilibrium will be as follows.

3.3.1.1 Static equilibrium


The relationship

$$y = \beta_0 + \beta_1 z \tag{3.28}$$

is a static equilibrium relationship for the processes $z_t$ and $y_t$ if

$$y_t - \beta_0 - \beta_1 z_t \sim I(0). \tag{3.29}$$

More generally, there may be an arbitrary number of variables and the func-
tion need not be linear. Engle and Granger (1991) use the term attractors to
describe relationships such as (3.28) when (3.29) holds.

3.3.2 Static equilibrium with I(1) variables


3.3.2.1 Sums of ARMA processes
Suppose that $z_t$ and $y_t$ have ARMA($p_z$, $q_z$) and ARMA($p_y$, $q_y$) representations respectively. In particular, let

$$\begin{aligned}
\phi_z(L) z_t &= \theta_z(L)\varepsilon_{z,t} \\
\phi_y(L) y_t &= \theta_y(L)\varepsilon_{y,t}
\end{aligned} \tag{3.30}$$

where $\varepsilon_{z,t}$ and $\varepsilon_{y,t}$ are two white noise processes. These relationships can be used to obtain the ARMA representations for any linear combination of $z_t$ and $y_t$. Consider

$$w_t = z_t + y_t. \tag{3.31}$$

Equations (3.30) indicate that it will be necessary to work with lag operators applied to both processes. To remove $z_t$ from (3.31) multiply through by $\phi_z(L)$, to give

$$\phi_z(L) w_t = \phi_z(L) z_t + \phi_z(L) y_t = \theta_z(L)\varepsilon_{z,t} + \phi_z(L) y_t. \tag{3.32}$$



To substitute for $y_t$ multiply (3.32) through by $\phi_y(L)$, to give

$$\begin{aligned}
\phi_y(L)\phi_z(L) w_t &= \phi_y(L)\theta_z(L)\varepsilon_{z,t} + \phi_y(L)\phi_z(L) y_t \\
&= \phi_y(L)\theta_z(L)\varepsilon_{z,t} + \phi_z(L)\phi_y(L) y_t \\
&= \phi_y(L)\theta_z(L)\varepsilon_{z,t} + \phi_z(L)\theta_y(L)\varepsilon_{y,t}.
\end{aligned} \tag{3.33}$$

The left-hand side of (3.33) is easily simplified by writing

$$\phi(L) = \phi_y(L)\phi_z(L) \tag{3.34}$$

which is a polynomial lag operator of order $p = p_z + p_y$. The last line of (3.33) is the sum of two MA processes, $\phi_y(L)\theta_z(L)\varepsilon_{z,t}$ and $\phi_z(L)\theta_y(L)\varepsilon_{y,t}$, of orders $p_y + q_z$ and $p_z + q_y$ respectively. As long as the white noise processes $\varepsilon_{z,t}$ and $\varepsilon_{y,t}$ are only contemporaneously correlated (i.e. $E(\varepsilon_{z,t-i}\,\varepsilon_{y,t-j}) = 0$ if $i \neq j$), then the autocorrelations of the sum of these processes will extend only as far as the larger of the two individual orders. That is, the sum will be an MA process whose order is the larger of $p_y + q_z$ and $p_z + q_y$. The variance of the new white noise driving sequence and the MA coefficients will depend on the variance–covariance matrix of $\varepsilon_{y,t}$ and $\varepsilon_{z,t}$ and the coefficient values of the original operators, $\phi_z(L)$, $\theta_z(L)$, $\phi_y(L)$ and $\theta_y(L)$.10 Thus, the final line of (3.33) may be written

$$\phi_y(L)\theta_z(L)\varepsilon_{z,t} + \phi_z(L)\theta_y(L)\varepsilon_{y,t} = \theta(L)\varepsilon_t \tag{3.35}$$

where $\theta(L)$ is a lag polynomial of order $q = \max(p_y + q_z,\ p_z + q_y)$, and the variance of $\varepsilon_t$ is chosen so that $\theta(0) = 1$. Thus the time series model for $w_t$ is ARMA(p, q), where

$$p = p_z + p_y, \qquad q = \max(p_y + q_z,\ p_z + q_y), \qquad \phi(L) w_t = \theta(L)\varepsilon_t.$$

From (3.34), the roots of $\phi(L)$ will be those of $\phi_z(L)$ and $\phi_y(L)$. Consider three important cases.

(i) If all the roots of $\phi_z(L)$ and $\phi_y(L)$ lie outside the unit circle, then all the roots of $\phi(L)$ lie outside the unit circle and so $w_t$ is stationary. This means if $z_t$ and $y_t$ are stationary so is their sum, $w_t$.
(ii) Suppose $z_t$ is I(1) and $y_t$ is I(0). Then $\phi_z(L)$ has one unit root, all the others lying outside the unit circle, and all the roots of $\phi_y(L)$ lie outside the unit circle. Since $\phi(L) = \phi_y(L)\phi_z(L)$, it follows that $\phi(L)$ has one unit root, all others lying outside the unit circle. Thus, $w_t$ is I(1). This means that the sum of an I(0) and an I(1) process is I(1).
(iii) Suppose that both $z_t$ and $y_t$ are I(1). This case is a little more complicated and it is necessary to go back to the working used when obtaining the ARMA structure for the sum. Consider equation (3.33), using the last equality on the right-hand side,

$$\phi_y(L)\phi_z(L) w_t = \phi_y(L)\theta_z(L)\varepsilon_{z,t} + \phi_z(L)\theta_y(L)\varepsilon_{y,t}.$$

Since $z_t$ and $y_t$ are I(1), the AR operators from their separate ARMA representations may be written in terms of a new AR operator consisting of all and only the stationary roots, and the differencing operator. So:

$$\phi_z(L) = \tilde{\phi}_z(L)\Delta, \qquad \phi_y(L) = \tilde{\phi}_y(L)\Delta$$

where $\tilde{\phi}_z(L)$ and $\tilde{\phi}_y(L)$ are of orders $p_z - 1$ and $p_y - 1$ respectively. Thus (3.33) may be written

$$\begin{aligned}
\tilde{\phi}_y(L)\tilde{\phi}_z(L)\Delta^2 w_t &= \tilde{\phi}_y(L)\Delta\theta_z(L)\varepsilon_{z,t} + \tilde{\phi}_z(L)\Delta\theta_y(L)\varepsilon_{y,t} \\
&= \Delta\left(\tilde{\phi}_y(L)\theta_z(L)\varepsilon_{z,t} + \tilde{\phi}_z(L)\theta_y(L)\varepsilon_{y,t}\right).
\end{aligned}$$

The common factor of $\Delta$ can now be cancelled on each side of the equation11 to give

$$\tilde{\phi}_y(L)\tilde{\phi}_z(L)\Delta w_t = \tilde{\phi}_y(L)\theta_z(L)\varepsilon_{z,t} + \tilde{\phi}_z(L)\theta_y(L)\varepsilon_{y,t}. \tag{3.36}$$

Thus $w_t$ has one (not two) unit roots and $\Delta w_t$ has an ARMA($p^*$, $q^*$) structure where

$$p^* = p_z + p_y - 2, \qquad q^* = \max(p_y + q_z,\ p_z + q_y) - 1.$$

Alternatively, in ARIMA terminology, $w_t$ is ARIMA($p^*$, 1, $q^*$). This important result shows that the sum of two I(1) processes is also I(1).

3.3.2.2 Linear functions of ARMA processes


It is also easily shown that multiplying by or adding constants alters neither the ARMA orders nor the integration properties of a process. Let

$$\phi(L) z_t = \theta(L)\varepsilon_t, \tag{3.37}$$

and define

$$\tilde{z}_t = z_t + \mu \tag{3.38}$$

where $\mu$ is a constant. Multiplying both sides of (3.38) by $\phi(L)$ gives

$$\phi(L)\tilde{z}_t = \phi(L) z_t + \phi(L)\mu = \theta(L)\varepsilon_t + \phi(1)\mu. \tag{3.39}$$

In the stationary case, $E(\tilde{z}_t) = \mu$, $\mathrm{Var}(\tilde{z}_t) = \mathrm{Var}(z_t)$, and the covariances are given by $E((\tilde{z}_t - \mu)(\tilde{z}_{t-j} - \mu)) = E(z_t z_{t-j})$, so are the same as those of the original process, $z_t$, and since the variance is also unchanged, so are the autocorrelations. Thus although a constant has to be added to the model, it is otherwise unchanged.
When $z_t$ is I(1), it is the case that $\phi(1) = 0$ since $\phi(L)$ has a unit root. Equation (3.39) can therefore be written as:

$$\phi(L)\tilde{z}_t = \theta(L)\varepsilon_t$$

which is exactly the same process as the original. To show multiplying by a constant makes no difference, continue to define the $z_t$ process by equation (3.37), and consider the transformed process,

$$\breve{z}_t = \lambda z_t \tag{3.40}$$

Multiplying both sides of (3.40) by $\phi(L)$ gives

$$\phi(L)\breve{z}_t = \phi(L)\lambda z_t = \lambda\phi(L) z_t = \lambda\theta(L)\varepsilon_t = \theta(L)\lambda\varepsilon_t.$$

Since $\varepsilon_t$ is zero mean white noise, so is any scalar multiple, so $\lambda\varepsilon_t$ is zero mean white noise. The structure is therefore unchanged as no new autocorrelation has been induced. To summarize, if $z_t$ is ARMA(p, q) then so is $\psi_t = \mu + \lambda z_t$ and furthermore this process has the same autocorrelation structure, so that its AR and MA operators are unchanged. In particular, if $z_t$ is I(d), then so is any linear transformation. Mathematically,

$$\phi(L) z_t = \theta(L)\varepsilon_t \;\Rightarrow\; \phi(L)\psi_t = \mu^* + \theta(L)\lambda\varepsilon_t \tag{3.41}$$

where

$$\psi_t = \mu + \lambda z_t, \qquad \mu^* = \phi(1)\mu.$$

3.3.2.3 Linear combinations of ARIMA processes


Let $z_t$ and $y_t$ be ARIMA($p_z$, $d_z$, $q_z$) and ARIMA($p_y$, $d_y$, $q_y$) processes, where $d_z, d_y = 0, 1$. Then (3.41) states that $\psi_t = \mu + \lambda z_t$ is ARIMA($p_z$, $d_z$, $q_z$), and, defining

$$\xi_t = y_t + \mu + \lambda z_t \tag{3.42}$$

it follows from the results for the sum of two ARIMA processes that $\xi_t$ is ARIMA($p^*$, $d^*$, $q^*$) where

$$p^* = p_z + p_y, \qquad q^* = \max(p_y + q_z,\ p_z + q_y), \qquad d^* = \max(d_z, d_y). \tag{3.43}$$

Equations (3.43) are easily generalized to show that any linear combination of an ARIMA($p_z$, $d_z$, $q_z$) and an ARIMA($p_y$, $d_y$, $q_y$) process is ARIMA($p^*$, $d^*$, $q^*$), with $p^*$, $d^*$, $q^*$ as defined by equations (3.43). In particular:

(i) A linear combination of I(0) processes is I(0).
(ii) A linear combination of an I(0) and an I(1) process is I(1).
(iii) A linear combination of I(1) processes is I(1).

These results generalize trivially to the case of a linear combination of an arbitrary number of ARIMA processes as follows. If $x_{i,t}$ is ARIMA($p_i$, $d_i$, $q_i$), $d_i = 0, 1$, $i = 1, 2, \ldots, n$, and

$$\xi_t = \lambda_0 + \sum_{i=1}^{n} \lambda_i x_{i,t}$$

for a set of constants $\lambda_i$, $i = 0, 1, \ldots, n$, then $\xi_t$ is I($d^*$), where $d^* = \max(d_i)$.

3.3.3 Cointegration: static equilibrium with I(1) variables


Thus it appears that a linear combination of two I(1) variables is also I(1), not I(0). But, if static equilibrium is to exist between I(1) variables, a linear combination has to exist that is not I(1), but I(0). For some processes, this cannot happen. But it can happen that there exists a special association between the processes such that a special linear combination does result in a stationary series. This is called a cointegrating combination.
Rather than look for the properties of a special pair of I(1) series such that cointegration can result, it is easier to construct such a pair directly.12 To keep things simple, suppose $z_t$ is a pure random walk given by

$$\Delta z_t = \varepsilon_{z,t}$$

where $\varepsilon_{z,t}$ is white noise. Let $\varepsilon_{y,t}$ be another white noise process, uncorrelated with $\varepsilon_{z,t}$, and define

$$y_t = z_t + \varepsilon_{y,t}. \tag{3.44}$$

If $y_t$ is I(1), then $z_t$ and $y_t$ are cointegrated since

$$y_t - z_t = \varepsilon_{y,t}$$

is a linear combination of I(1) processes which, being white noise, is I(0). To show that $y_t$ is indeed I(1), note that the right-hand side of (3.44) is the sum of an I(1) and an I(0) process. It has been shown that an I(1) plus an I(0) process is I(1). Thus, $y_t$ is I(1) and so $z_t$ and $y_t$ are cointegrated.
Given that earlier it was shown that any linear combination of an ARIMA($p_z$, $d_z$, $q_z$) and an ARIMA($p_y$, $d_y$, $q_y$) process is ARIMA($p^*$, $d^*$, $q^*$), with $p^*$, $d^*$, $q^*$ as defined by equations (3.43), some further explanation is needed here. In particular, this states that the order of integration is $d^* = \max(d_z, d_y)$. In fact, all the orders reported in equations (3.43) are upper bounds on the orders of the model, because it is possible that simpler representations may exist if the moving average polynomial has some roots in common with those of the autoregressive polynomial, or if it has any unit roots. Such common roots are known as common factors. For example, suppose that $z_t$ and $y_t$ have ARIMA($p_z$, $d_z$, $q_z$) and ARIMA($p_y$, $d_y$, $q_y$) representations respectively, given by

$$\phi_z(L)\Delta^{d_z} z_t = \theta_z(L)\varepsilon_{z,t}, \qquad \phi_y(L)\Delta^{d_y} y_t = \theta_y(L)\varepsilon_{y,t},$$

and define the linear combination

$$\eta_t = \lambda_z z_t + \lambda_y y_t$$

Then $\eta_t$ has the ARIMA representation

$$\phi^*(L)\Delta^{d^*}\eta_t = \theta^*(L)\varepsilon_t \tag{3.45}$$

for some white noise process $\varepsilon_t$, where $\phi^*(L)$ and $\theta^*(L)$ are lag polynomials of orders $p^*$ and $q^*$ as defined by equations (3.43), and $d^* = \max(d_z, d_y)$. But it is possible that $\phi^*(L)$ and $\theta^*(L)$ contain a common factor, say $(1 - \rho L)$, so that

$$\phi^*(L) = (1 - \rho L)\tilde{\phi}^*(L), \qquad \theta^*(L) = (1 - \rho L)\tilde{\theta}^*(L), \tag{3.46}$$

the polynomials $\tilde{\phi}^*(L)$ and $\tilde{\theta}^*(L)$ being of orders $p^* - 1$ and $q^* - 1$ respectively. Substituting (3.46) into (3.45) gives

$$(1 - \rho L)\tilde{\phi}^*(L)\Delta^{d^*}\eta_t = (1 - \rho L)\tilde{\theta}^*(L)\varepsilon_t \tag{3.47}$$

which on cancelling the common factor $(1 - \rho L)$ becomes

$$\tilde{\phi}^*(L)\Delta^{d^*}\eta_t = \tilde{\theta}^*(L)\varepsilon_t,$$

which is an ARIMA($p^* - 1$, $d^*$, $q^* - 1$) model. A related case is that in which the MA lag polynomial has a unit root, and can therefore be written

$$\theta^*(L) = (1 - L)\tilde{\theta}^*(L).$$

Substituting this into (3.45) gives

$$\phi^*(L)\Delta^{d^*}\eta_t = (1 - L)\tilde{\theta}^*(L)\varepsilon_t$$

which on cancelling with one of the unit roots represented by $\Delta^{d^*}$ (and ignoring the generation of a possibly non-zero mean) gives

$$\phi^*(L)\Delta^{d^* - 1}\eta_t = \tilde{\theta}^*(L)\varepsilon_t.$$

In this case $\eta_t$ is an ARIMA($p^*$, $d^* - 1$, $q^* - 1$) process, since the autoregressive polynomial $\phi^*(L)$ is unchanged by the cancellation.
A special case of this last example is the case of cointegration, where $\phi^*(L)$ has no unit roots, and $d_z = d_y = d^* = 1$. Then $z_t$ and $y_t$ are I(1) but $\eta_t$ is I(0). Hence $z_t$ and $y_t$ are cointegrated.
The general result for the linear combination of two ARIMA processes is
considered next.

3.3.3.1 Linear combinations of ARIMA processes


The linear combination of an ARIMA($p_z$, $d_z$, $q_z$) and an ARIMA($p_y$, $d_y$, $q_y$) process will be an ARIMA($\tilde{p}$, $\tilde{d}$, $\tilde{q}$) process

$$\tilde{\phi}(L)\Delta^{\tilde{d}}\eta_t = \tilde{\theta}(L)\varepsilon_t$$

where $\tilde{\theta}(L)$ is invertible and $\tilde{\phi}(L)$ and $\tilde{\theta}(L)$ have no common factors, with the orders given by

$$\tilde{p} \leq p_z + p_y, \qquad \tilde{q} \leq \max(p_y + q_z,\ p_z + q_y), \qquad \tilde{d} \leq \max(d_z, d_y)$$

with equality only if no common factors have been cancelled.
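The upper bounds on the orders are simple arithmetic on the orders of the two component processes. As a minimal sketch (the function name `combination_order_bounds` and the example orders are illustrative assumptions):

```python
def combination_order_bounds(order_z, order_y):
    """Upper bounds (p, d, q) on the ARIMA orders of a linear combination.

    Each argument is a tuple (p, d, q). The bounds hold with equality only
    if no common factors cancel between the AR and MA polynomials.
    """
    pz, dz, qz = order_z
    py, dy, qy = order_y
    return (pz + py, max(dz, dy), max(py + qz, pz + qy))


# A pure random walk, ARIMA(0, 1, 0), combined with white noise, ARIMA(0, 0, 0).
walk_plus_noise = combination_order_bounds((0, 1, 0), (0, 0, 0))
# A hypothetical ARIMA(1, 1, 0) combined with a hypothetical ARIMA(2, 0, 1).
mixed = combination_order_bounds((1, 1, 0), (2, 0, 1))
print(walk_plus_noise, mixed)
```

These are bounds only: under cointegration the actual order of integration of the combination is strictly lower than the bound returned here.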

3.3.3.2 Example
Figure 3.1 shows a time series plot of two series generated artificially according to equations

$$\Delta z_t = \varepsilon_{z,t}, \tag{3.48a}$$

$$y_t = 1 + z_t + \varepsilon_{y,t}, \tag{3.48b}$$

where $\varepsilon_{z,t}$ and $\varepsilon_{y,t}$ are two independent NIID(0, 1) series. Both $z_t$ and $y_t$ are I(1) and cointegrated by construction. Figure 3.2 shows the same data using a scatter plot.
The time series plots indicate the non-stationary nature of both series, and that, in this case, they track one another very closely. The latter property is not necessary for two series to be cointegrated. It is quite possible that an increasing gap may open up between them. This depends on exactly what the cointegrating combination is. The scatter plot strongly emphasizes the linear nature of the underlying long-run, and in this case equilibrium, relationship, which is, from (3.48b), $\bar{y} = 1 + \bar{z}$.
It is also important to realize that the cointegrating property depends on the selection of the correct linear combination. Using equation (3.48b), the linear combination generating cointegration is $\eta_{1,t} = y_t - z_t$. This is stationary by construction. But suppose instead, the combination $\eta_{2,t} = y_t - \frac{1}{2} z_t$ is considered. Subtracting $\frac{1}{2} z_t$ from both sides of (3.48b) gives,

$$y_t - \frac{1}{2} z_t = 1 + \frac{1}{2} z_t + \varepsilon_{y,t}. \tag{3.49}$$

Figure 3.1 Time series plot of artificially generated cointegrated series



Figure 3.2 Scatter plot of artificially generated cointegrated series

Figure 3.3 Plot of $\eta_{2,t}$, non-cointegrating combination of cointegrating variables

But $\frac{1}{2} z_t$ is I(1) and $\varepsilon_{y,t}$ is I(0), so from (3.49), $\eta_{2,t}$ is I(1), and so non-stationary. This illustrates a key point: to obtain cointegration where it exists, the correct linear combination must be used.
Figure 3.3 shows clearly that this combination is not stationary and so not a cointegrating combination. To illustrate the case where a cointegrating combination still results in a gap opening up between series represented on a time series plot, suppose instead of (3.48b), a series $y_t^*$ is generated according to

$$y_t^* = 1 + \frac{1}{2} z_t + \varepsilon_{y,t}. \tag{3.50}$$

Figure 3.4 is a time series plot of $z_t$ and $y_t^*$. It would be wrong to conclude from this that just because the series are diverging that they are not cointegrated. It is simply that the difference between the two is not the cointegrating combination.
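The artificial series of this example are easy to generate. The sketch below follows (3.48a), (3.48b) and (3.50); the sample size, the seed and the variable names are arbitrary assumptions, and the draws will not reproduce the series plotted in the figures.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only
T = 200         # arbitrary sample size
eps_z = [random.gauss(0, 1) for _ in range(T)]
eps_y = [random.gauss(0, 1) for _ in range(T)]

# (3.48a): z_t is a pure random walk, started at zero.
z, level = [], 0.0
for e in eps_z:
    level += e
    z.append(level)

y = [1 + zt + e for zt, e in zip(z, eps_y)]             # (3.48b)
y_star = [1 + 0.5 * zt + e for zt, e in zip(z, eps_y)]  # (3.50)

eta1 = [yt - zt for yt, zt in zip(y, z)]                 # cointegrating combination
eta2 = [yt - 0.5 * zt for yt, zt in zip(y, z)]           # retains 0.5 * z_t, so I(1)
eta_star = [yt - 0.5 * zt for yt, zt in zip(y_star, z)]  # cointegrating for y*
```

By construction `eta1` and `eta_star` equal one plus white noise, whereas `eta2` still contains half of the random walk, which is why the choice of linear combination matters.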

Figure 3.4 Time series plot of y*t and zt generated according to (3.48a) and (3.50)

3.3.4 ADL models, cointegration, and equilibrium


There is a very strong link between cointegration and equilibrium when a set of I(1) variables are related according to an ADL model with stationary disturbances. To introduce the idea, consider a reparameterization of the ADL(1, 1) model (3.8) into an ECM:

$$\Delta y_t = \beta_0 \Delta z_t - (1 - \alpha_1)\left( y_{t-1} - \frac{\gamma}{1 - \alpha_1} - \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, z_{t-1} \right) + u_t. \tag{3.51}$$

Note that the static long-run solution is

$$\bar{y} = \frac{\gamma}{1 - \alpha_1} + \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, \bar{z} \tag{3.52a}$$

and for convenience, put

$$\eta_{t-1} = y_{t-1} - \frac{\gamma}{1 - \alpha_1} - \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, z_{t-1} \tag{3.52b}$$

and substitute this into (3.51) as in section 3.2.3 to give

$$\Delta y_t = \beta_0 \Delta z_t - (1 - \alpha_1)\eta_{t-1} + u_t.$$

Rearranging this last equation, $\eta_{t-1}$ can be seen to be a linear combination of I(0) variables and is therefore itself I(0) as long as $1 - \alpha_1$ is materially different from zero, since

$$\eta_{t-1} = -\frac{1}{1 - \alpha_1}\,\Delta y_t + \frac{\beta_0}{1 - \alpha_1}\,\Delta z_t + \frac{1}{1 - \alpha_1}\, u_t. \tag{3.53}$$

All terms on the right-hand side of (3.53) are I(0): $\Delta y_t$ and $\Delta z_t$ because $y_t$ and $z_t$ are I(1), $u_t$ by assumption, and, as long as $\frac{1}{1 - \alpha_1}$ is well defined, multiplying a variable by a constant does not change its order of integration. Hence, $\eta_t$ is the sum of three I(0) processes, and so is I(0). From (3.52b), this means that $y_t - \frac{\gamma}{1 - \alpha_1} - \frac{\beta_0 + \beta_1}{1 - \alpha_1} z_t$ is I(0) and hence, by the definition of static equilibrium in section 3.3.1.1, equation (3.52a) is a static equilibrium. It is also the cointegrating combination.
In this sense there is an intimate link between cointegration and equilibrium, and it is for this reason that the concept of cointegration is so appealing. It provides an empirically testable definition of equilibrium relationships amongst time series data. The general ADL result follows immediately along the same lines.

3.3.4.1 ADL models, cointegration and equilibrium


Let $y_t$ and $z_{j,t}$, $j = 1, 2, \ldots, r$ be a set of I(1) variables related according to the ADL model

$$\alpha(L) y_t = \gamma + \sum_{j=1}^{r} \beta_j(L) z_{j,t} + u_t,$$

where $\alpha(L)$ and $\beta_j(L)$, $j = 1, 2, \ldots, r$ are lag polynomials of possibly different order, $\alpha(1) \neq 0$, and $u_t$ is I(0). The long-run static solution is both an equilibrium and a cointegrating combination of the variables. The deviations from the long-run values are therefore stationary and can be interpreted as deviations from equilibrium.

3.3.4.2 Example
Consider the ADL(1, 1) model

$$y_t = 1 + \frac{1}{2} y_{t-1} + \frac{3}{4} z_t - \frac{3}{8} z_{t-1} + u_t \tag{3.54a}$$

$$\Delta z_t = \varepsilon_t \tag{3.54b}$$

where $u_t$ and $\varepsilon_t$ are uncorrelated white noise processes. First of all, $z_t$ is I(1) since it is a random walk, and so is $y_t$ since equations (3.54a) and (3.54b) imply

$$\Delta\left(1 - \frac{1}{2} L\right) y_t = \frac{3}{4}\left(1 - \frac{1}{2} L\right)\varepsilon_t + \Delta u_t. \tag{3.55}$$

The right-hand side of (3.55) can be written as an MA(1) process, the left-hand side shows that the AR operator has a single unit root, so the process is I(1), and more fully is ARIMA(1, 1, 1). Equation (3.54a) has long-run solution

$$\bar{y} = 2 + \frac{3}{4}\, \bar{z}, \tag{3.56a}$$

and equilibrium error

$$\eta_t = y_t - 2 - \frac{3}{4} z_t, \tag{3.56b}$$

the ECM being

$$\Delta y_t = \frac{3}{4}\Delta z_t - \frac{1}{2}\left( y_{t-1} - 2 - \frac{3}{4} z_{t-1} \right) + u_t.$$

Figure 3.5 presents time series plots of $z_t$ and $y_t$, and Figure 3.6 plots the equilibrium error $\eta_t$ as given by equation (3.56b). The disturbance processes, $u_t$ and $\varepsilon_t$, are both NIID(0, 1).
The equilibrium errors in Figure 3.6 are stationary, but do not appear to be
white noise. There are runs where the values remain continuously positive for
a period of time, and others where negativity persists. This is consistent with
autocorrelation. The time series properties of the process can be obtained from
(3.56b) as follows.
First write the ADL of equation (3.54a) as

$$\alpha(L) y_t = 1 + \beta(L) z_t + u_t,$$

where

$$\alpha(L) = 1 - \frac{1}{2} L, \tag{3.57a}$$

$$\beta(L) = \frac{3}{4} - \frac{3}{8} L. \tag{3.57b}$$

Figure 3.5 Cointegrated processes generated by (3.54a) and (3.54b)

Figure 3.6 The equilibrium error sequence for (3.56b)

Then

$$\begin{aligned}
\eta_t &= y_t - 2 - \frac{3}{4} z_t \\
\Rightarrow\ \alpha(L)\eta_t &= \alpha(L) y_t - \alpha(1)\,2 - \frac{3}{4}\alpha(L) z_t \\
\Rightarrow\ \alpha(L)\eta_t &= 1 + \beta(L) z_t + u_t - \alpha(1)\,2 - \frac{3}{4}\alpha(L) z_t.
\end{aligned} \tag{3.58}$$

Note that $\alpha(1) = \frac{1}{2}$ and hence $\alpha(1)\,2 = 1$, which, on substitution into (3.58) and rearrangement, gives

$$\alpha(L)\eta_t = \beta(L) z_t - \frac{3}{4}\alpha(L) z_t + u_t.$$

Using (3.57a) and (3.57b),

$$\beta(L) - \frac{3}{4}\alpha(L) = \frac{3}{4} - \frac{3}{8} L - \frac{3}{4}\left(1 - \frac{1}{2} L\right) = 0.$$

Hence $\alpha(L)\eta_t = u_t$ or

$$\left(1 - \frac{1}{2} L\right)\eta_t = u_t.$$

Thus $\eta_t$ is a stationary AR(1) process. Clearly the root of $\alpha(L)$ determines the persistence of the equilibrium errors. The closer it is to one, the more persistent they will be. This also determines the speed of adjustment, which is $\alpha(1)$. As the root tends to 1 this speed of adjustment will tend to 0. In the limit, the long-run solution does not exist, and therefore neither does an equilibrium relationship or cointegration.13
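The AR(1) structure of the equilibrium error can be confirmed by simulation. The following sketch generates data from (3.54a) and (3.54b) and checks, period by period, that the equilibrium error (3.56b) equals one half of its previous value plus the current disturbance; the seed, the sample size and the starting values are arbitrary assumptions.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only
T = 300         # arbitrary sample size

z_prev, y_prev = 0.0, 2.0              # start on the long-run line y = 2 + 0.75 z
eta_prev = y_prev - 2 - 0.75 * z_prev  # equilibrium error (3.56b) at the start
max_gap = 0.0
for _ in range(T):
    eps, u = random.gauss(0, 1), random.gauss(0, 1)
    z = z_prev + eps                                      # (3.54b)
    y = 1 + 0.5 * y_prev + 0.75 * z - 0.375 * z_prev + u  # (3.54a)
    eta = y - 2 - 0.75 * z                                # (3.56b)
    # Largest departure from eta_t = 0.5 * eta_{t-1} + u_t seen so far.
    max_gap = max(max_gap, abs(eta - (0.5 * eta_prev + u)))
    z_prev, y_prev, eta_prev = z, y, eta

print(max_gap)
```

The reported gap should be of the order of floating point rounding error only, because the AR(1) representation is an exact algebraic consequence of the model rather than an approximation.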
It is straightforward to construct an alternative ADL(1, 1) model to (3.57a) and (3.57b) that has the same long-run solution, but much more persistent equilibrium errors and slower adjustment to equilibrium. To get the increased persistence and slower adjustment to equilibrium, replace (3.57a) by

$$\alpha(L) = 1 - 0.95 L \tag{3.59a}$$

This has $\alpha(1) = 0.05$ instead of 0.5. In order to obtain the same long-run solution, the intercept, $\gamma$, which was originally 1, and $\beta(1)$ must both be multiplied by 0.1. Although there are a number of ways to obtain the latter result, the easiest is to multiply the original operator given by (3.57b) by 0.1 to give

$$\beta(L) = 0.075 - 0.0375 L, \tag{3.59b}$$

with the new value of $\gamma$ being 0.1. Thus the DGP is now

$$\alpha(L) y_t = 0.1 + \beta(L) z_t + u_t,$$

with the polynomial lag operators defined by equations (3.59a) and (3.59b), while the DGP for $z_t$ is still the random walk (3.54b). Figure 3.7 shows both
the original errors (etaold) and the new much more persistent ones (etanew);
note the scale of this and the earlier plot are different.14 In fact, what seems to
have happened is that the broad pattern of fluctuations has remained the
same, but their amplitude has become much larger. For example, there are
occasions where the low persistence series (etaold) is positive, but an indi-
vidual shock (i.e. disturbance term) is sufficient to drive the sequence across
the zero line so that the neighbouring value is negative. However, with
increased persistence (etanew), the same shock is insufficient to drive the
series into negativity because it is a lot further away from zero.

Figure 3.7 Time series plot comparing equilibrium errors



This result can be shown algebraically. For the models under consideration, the structure of the equilibrium errors is:

$$\eta_t = \alpha^t \eta_0 + \sum_{i=1}^{t-1} \alpha^{t-i} u_i + u_t.$$

Assuming the initial value, $\eta_0$, is zero gives

$$\eta_t = \sum_{i=1}^{t-1} \alpha^{t-i} u_i + u_t.$$

So the current equilibrium errors can be decomposed into two parts: that due to previous shocks, say

$$U_{t-1} = \sum_{i=1}^{t-1} \alpha^{t-i} u_i,$$

and that due to the current shock, $u_t$. In the case where $u_t$ is white noise with variance $\sigma_u^2$, the variance of previous shocks is

$$\mathrm{Var}(U_{t-1}) = \sum_{i=1}^{t-1} \alpha^{2(t-i)} \sigma_u^2 = \frac{\alpha^2\left(1 - \alpha^{2(t-1)}\right)}{1 - \alpha^2}\, \sigma_u^2$$

The ratio of the variance of the current shock to that of the component due to past shocks is therefore

$$r(\alpha) = \frac{1 - \alpha^2}{\alpha^2\left(1 - \alpha^{2(t-1)}\right)}$$

This is a decreasing function of $\alpha$, and so as $\alpha$ increases, the variance of the current shock becomes smaller relative to that of the accumulated shocks.
That is, increasing persistence will manifest itself by a decreasing importance
of the current shock relative to the aggregate effect of all those that have pre-
ceded it in the evolution of the process.
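As a numerical check on this result, the following minimal Python sketch evaluates the ratio both from the finite sum of past-shock weights and from the closed form, and tabulates it for a few values of the persistence parameter. The function names, the use of `alpha` for the persistence parameter, and the values chosen are assumptions made purely for illustration.

```python
def variance_ratio(alpha: float, t: int) -> float:
    """Var(u_t) / Var(U_{t-1}) computed from the finite sum (0 < |alpha| < 1, t >= 2)."""
    # Var(U_{t-1}) / sigma_u^2 is the sum of alpha^(2j) for j = 1, ..., t-1.
    past = sum(alpha ** (2 * j) for j in range(1, t))
    return 1.0 / past


def variance_ratio_closed_form(alpha: float, t: int) -> float:
    """The same ratio from the closed-form expression for the geometric sum."""
    a2 = alpha ** 2
    return (1.0 - a2) / (a2 * (1.0 - a2 ** (t - 1)))


if __name__ == "__main__":
    # Illustrative persistence values only; the ratio falls as alpha rises.
    for alpha in (0.25, 0.5, 0.75, 0.95):
        print(alpha, variance_ratio(alpha, 50), variance_ratio_closed_form(alpha, 50))
```

The two functions should agree up to rounding error, and the printed ratio should fall as `alpha` approaches one.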

3.3.5 Cointegration amongst I(d) variables


It is necessary to provide a definition of cointegration amongst variables that
are integrated of the same order.
Definition: Cointegration Amongst Variables Integrated of Order d
Let $x_{i,t}$, $i = 1, 2, \ldots, n$, be a set of variables all integrated of order $d$. Let $\alpha_i$, $i = 1,
2, \ldots, n$, be a set of constants. If there exists a linear combination of the vari-
ables given by

$$\eta_t = \sum_{i=1}^{n} \alpha_i x_{i,t} \tag{3.60}$$

that is integrated of order $d - b$, where $0 < b \le d$, then the variables $x_{i,t}$, $i = 1, 2,
\ldots, n$, are said to be cointegrated of order $(b, d)$ (or CI($b$, $d$)). The coefficient
vector $\alpha' = (\alpha_1 \ldots \alpha_n)$ is called the cointegrating vector, and $\sum_{i=1}^{n} \alpha_i x_{i,t}$ is called
the cointegrating combination of the variables.

The most important special case of this is where d = b so that the linear
combination is stationary. The ADL case discussed above is of this type with
d = 1, so the variables in this case are CI(1, 1).
A useful thing to realize at this point is that a regression equation with dis-
turbances can be written as a linear combination like (3.60). Suppose
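$$x_{1,t} = \beta_2 x_{2,t} + \beta_3 x_{3,t} + \ldots + \beta_n x_{n,t} + u_t.$$

This can be written as

$$u_t = x_{1,t} - \beta_2 x_{2,t} - \beta_3 x_{3,t} - \ldots - \beta_n x_{n,t},$$

which is of the form (3.60) with $\eta_t = u_t$, $\alpha_1 = 1$, $\alpha_i = -\beta_i$ for $i = 2, 3, \ldots, n$.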


A regression where all the variables are I(1) but the disturbances are I(0) is
called a cointegrating regression.15

3.4 Regression amongst cointegrated variables

3.4.1 Static regressions


The existence of cointegrating relationships between I(1) variables can be tested
using a single static regression equation estimated by ordinary least squares
(OLS). A static regression is one involving only contemporaneous values of the
variables. There are strong reasons for preferring a multiple equation approach
to this problem, but the single equation approach is described here due to its
popularity. Its original application is due to Engle and Granger (1987).
Static regressions can be thought of as falling into three cases defined by the
order of integration of the variables. These are: (i) variables I(0); (ii) variables
I(1) but not cointegrated; (iii) variables I(1) and cointegrated. Case (i) is not
discussed in detail, as it is the foundation case free of the complications due to
non-stationarity. (For a clear discussion, see Patterson 2000).
Consider the bivariate case involving two variables $z_t$ and $y_t$, both I(1), and
consider the regression of $y_t$ on $z_t$,

$$y_t = \beta z_t + \eta_t, \quad t = 1, 2, \ldots, T,$$

estimated by OLS. The OLS estimator of $\beta$, say $\hat\beta$, can be written

$$\hat\beta = \left(\sum_{t=1}^{T} z_t y_t\right)\left(\sum_{t=1}^{T} z_t^2\right)^{-1} = \beta + \left(\sum_{t=1}^{T} z_t \eta_t\right)\left(\sum_{t=1}^{T} z_t^2\right)^{-1}.$$

The large sample behaviour of the estimator $\hat\beta$ relative to $\beta$ therefore depends
on that of $\left(\sum_{t=1}^{T} z_t \eta_t\right)\left(\sum_{t=1}^{T} z_t^2\right)^{-1}$. Stock (1987) shows that, when $z_t$ and $y_t$ are
cointegrated with parameter $\beta$, so that $\eta_t \sim I(0)$, this term converges to zero
at a rate of $O_p(T^{-1})$.16 This means that $\hat\beta$ tends to $\beta$ at the same rate, and is
therefore a consistent estimator. When the series are stationary, the conver-
gence is only $O_p(T^{-1/2})$. Stock’s result is therefore known as a proof of super-
consistency. It is unaffected by the disturbances, $\eta_t$, being autocorrelated (as
long as they are I(0)), or being correlated with $z_t$. The consistency of $\hat\beta$ estab-
lishes that it can replace $\beta$ in any model with no change to the asymptotic
properties of other estimated parameters.
Standard inference on $\beta$ is not available, however, as, appropriately normal-
ized by multiplying by $T$, its asymptotic distribution is non-normal, being that
of a random variable depending on Wiener processes (see Banerjee et al.,
1993, p176, and Park and Phillips, 1988). Its t-ratio is also asymptotically non-
normal in general.17
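The faster convergence under cointegration can be illustrated by simulation. The sketch below is a minimal illustration only: it assumes a pure random walk for the regressor, independent standard normal disturbances, a no-intercept regression, and an arbitrary true coefficient of 0.75; the function names are hypothetical. It reports the average absolute estimation error of the OLS slope for several sample sizes.

```python
import random


def simulate_beta_hat(T: int, beta: float, rng: random.Random) -> float:
    """OLS slope (no intercept) of y on z, where z is a random walk and
    y = beta * z + eta with eta i.i.d. N(0, 1), so that y and z are CI(1, 1)."""
    z = 0.0
    szy = 0.0  # running sum of z_t * y_t
    szz = 0.0  # running sum of z_t ** 2
    for _ in range(T):
        z += rng.gauss(0.0, 1.0)
        y = beta * z + rng.gauss(0.0, 1.0)
        szy += z * y
        szz += z * z
    return szy / szz


def mean_abs_error(T: int, beta: float = 0.75, reps: int = 200, seed: int = 0) -> float:
    """Average |beta_hat - beta| over `reps` simulated samples of length T."""
    rng = random.Random(seed)
    return sum(abs(simulate_beta_hat(T, beta, rng) - beta) for _ in range(reps)) / reps


if __name__ == "__main__":
    # Under super-consistency the error should shrink roughly in proportion to 1/T.
    for T in (50, 200, 800):
        print(T, mean_abs_error(T))
```

Quadrupling the sample size should cut the average error by a factor of about four, rather than the factor of about two expected with stationary regressors.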
It is important to note that these results are asymptotic, and that in finite
samples, significant biases may occur (Inder 1993; Banerjee et al. 1993). These
arguments extend to multiple regressions of the form

$$x_{1,t} = \sum_{i=2}^{n} \beta_i x_{i,t} + u_t, \quad t = 1, 2, \ldots, T, \tag{3.61}$$

where $x_{i,t}$, $i = 1, 2, \ldots, n$, are CI(1, 1).


When the dynamic model is relatively complex, containing a larger number
of variables and higher-order lags, the structure of the disturbance term in
(3.61) will be correspondingly complex. This greater complexity does not
affect the super-consistency of OLS estimation, but it does increase the
chances of considerable bias. This will impact on subsequent tests of cointe-
gration, and upon any model, such as an ECM, estimated using the residuals
from the static regression. In such cases, tests and estimations can be based
upon the ADL model directly.
Analogously to the unit root testing problem, an alternative to, in effect,
correcting the regression by adding lags, is to correct the estimates (and
t-ratios) non-parametrically, as is done in the Phillips (1987) unit root test.
This approach was developed by Phillips and Hansen (1990), and can be
found described in detail in Hamilton (1994, p. 613). Super-consistency states
that problems arising from autocorrelated residuals and endogeneity can be
ignored in the limit. However, in finite samples these effects will still be
present. Modifications can be developed to reduce these problems, the result-
ant estimators being known as fully modified least squares (FMLS).
This approach is not discussed in detail here. Its application requires the cal-
culation of variance matrices that can have rather poor properties with the
result that simulation evidence varies according to the form of data genera-
tion process used. See Inder (1993), Patterson (2000), and Phillips and Hansen
(1990) for more details.

3.4.2 Testing for cointegration in single equations


3.4.2.1 Tests based on static regressions
Equation (3.61) provides the basis for testing for cointegration of order
CI(1, 1). Assume that $x_{i,t}$, $i = 1, 2, \ldots, n$, are I(1). Then, in general, any linear
combination of these variables will also be I(1). The exception is if they are
cointegrated, in which case, estimating (3.61) by OLS (which minimizes the
residual variance), should provide a good estimate of the cointegrating
coefficients, $\beta_i$. It has already been argued that OLS provides a consistent esti-
mate of $\beta_i$ under cointegration. Thus one way to proceed is to estimate (3.61)
by OLS, obtaining residuals $\hat u_t$, and to test the residuals for a unit root, since

$$\hat u_t = x_{1,t} - \sum_{i=2}^{n} \hat\beta_i x_{i,t}$$

is a minimum residual variance linear combination of observations on the
variables. This means that any standard procedure for testing for unit roots or
stationarity is available for testing the integratedness of the residuals, and
hence whether or not the series are cointegrated. In the sense that the $\hat\beta_i$ are
consistent for the $\beta_i$, the residuals $\hat u_t$ can be said to be consistent estimators of
the disturbances $u_t$, so that the test is of the cointegration properties of the
variables. Clearly $u_t \sim I(0)$ is equivalent to cointegration, while $u_t \sim I(1)$ is
equivalent to non-cointegration. Unit root tests are tests of the null hypo-
thesis of a unit root and are thus tests of the null of non-cointegration
when the alternative is I(0), while stationarity tests have stationarity as the
null, corresponding to cointegration.
Tests of the former type are the usual augmented Dickey–Fuller tests, or the
Phillips Z tests. The asymptotic properties of these and other residual based tests
of non-cointegration are discussed by Phillips and Ouliaris (1990). A leading
example of the latter type of test, being one of the null of cointegration against
an alternative of non-cointegration, is the test given in Kwiatkowski et al. (1992).
The asymptotic distributions of these test statistics are altered as a result of
the estimation of $\beta_i$, and finite sample distributions vary with both $T$ and $n$.
Critical values may be calculated using the response surfaces of MacKinnon
(1991).18
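The residual-based procedure can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not a complete test: it uses a bivariate static regression without an intercept, the simplest Dickey–Fuller regression on the residuals (no lag augmentation and no deterministic terms), and simulated data with an arbitrary coefficient of 0.75. The function names are hypothetical. The resulting t-ratio must be compared with critical values appropriate to residual-based tests, not with standard t or Dickey–Fuller tables.

```python
import math
import random


def ols_slope(x, y):
    """Slope of a no-intercept OLS regression of y on x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def residual_df_tstat(y, z):
    """Step 1: static regression y_t = beta * z_t + u_t by OLS.
    Step 2: regress the differenced residuals on the lagged residuals and
    return the t-ratio on the lagged residual (unaugmented DF regression)."""
    beta_hat = ols_slope(z, y)
    u = [yt - beta_hat * zt for yt, zt in zip(y, z)]
    lag = u[:-1]
    diff = [u[t] - u[t - 1] for t in range(1, len(u))]
    rho = ols_slope(lag, diff)
    resid = [d - rho * l for d, l in zip(diff, lag)]
    s2 = sum(e * e for e in resid) / (len(resid) - 1)
    se = math.sqrt(s2 / sum(l * l for l in lag))
    return rho / se


def random_walk(T, rng):
    """Driftless Gaussian random walk of length T."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(1)
T = 500
z = random_walk(T, rng)
y_coint = [0.75 * zt + rng.gauss(0.0, 1.0) for zt in z]  # cointegrated with z
y_indep = random_walk(T, rng)                             # unrelated I(1) series

t_coint = residual_df_tstat(y_coint, z)
t_indep = residual_df_tstat(y_indep, z)
print(t_coint, t_indep)
```

With cointegrated data the statistic should be large and negative, whereas for two unrelated random walks it should typically lie much closer to zero.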

3.4.2.2 Test based on dynamic model


It has been found that unit root tests lack power in finite samples. That is,
they do not reject the null of a unit root sufficiently frequently as the autore-
gressive parameter approaches unity from below (e.g. Schwert, 1989). This
problem transfers to their use as tests of non-cointegration, where the null of
non-cointegration is not rejected with sufficient frequency when the residuals
are close to being non-stationary, but are in fact stationary (Banerjee et al.
1993; and Inder 1993). The power of such tests can be improved by correcting
for neglected structure in the disturbances of the test regression (Kremers et al.
1992).
To illustrate, consider the bivariate ECM of equation (3.8), but without an
intercept for simplicity,

$$y_t = \alpha_1 y_{t-1} + \beta_0 z_t + \beta_1 z_{t-1} + u_t,$$

which has the ECM form

$$\Delta y_t = \gamma_1 \Delta z_t + \gamma_2 (y_{t-1} - \gamma_3 z_{t-1}) + u_t,$$

with $\gamma_1 = \beta_0$, $\gamma_2 = -(1 - \alpha_1)$, $\gamma_3 = \dfrac{\beta_0 + \beta_1}{1 - \alpha_1}$. If $\gamma_2 = 0$ then the series are not cointe-
grated. In practice, the cointegrating coefficient $\gamma_3$ is unknown, and so a test
on an estimated coefficient requiring knowledge of its value appears imprac-
ticable. However, the ECM may be rewritten as

$$\Delta y_t = \gamma_1 \Delta z_t + \gamma_2 (y_{t-1} - z_{t-1}) + \gamma_4 z_{t-1} + u_t, \tag{3.62}$$

$$\gamma_4 = \gamma_2 (1 - \gamma_3).$$

The test is of $H_0: \gamma_2 = 0$ in (3.62), the test statistic being the usual OLS t-ratio.
If the actual cointegrating coefficient happens to be equal to 1 (i.e. $\gamma_3 = 1$),
then the term in $\gamma_4 z_{t-1}$ will be superfluous. Banerjee et al. (1993, table 7.6)
provide some critical values for this test statistic.
In the case of more than two variables the approach generalizes to specify-
ing any potential equilibrium error as the variable of which $\gamma_2$ is the
coefficient, and adding correcting terms in the first lag of each of the variables
other than $y_{t-1}$. More complex dynamics are allowed for by adding lagged dif-
ferences of the variables, as in the original ECM.
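The test regression (3.62) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated from a bivariate ADL with hypothetical coefficients (0.5 on the lagged dependent variable, 0.5 and 0.25 on the current and lagged regressor, implying an error correction coefficient of -0.5), the regressor is a pure random walk, and the small `ols` helper is written here only to keep the example self-contained. The t-ratio on the error correction term is to be judged against the non-standard critical values cited in the text, not against standard normal tables.

```python
import math
import random


def ols(X, y):
    """OLS of y on the columns of X (a list of rows). Returns (coefs, t_ratios).
    (X'X)^-1 is obtained by Gauss-Jordan elimination with partial pivoting."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * xi for bi, xi in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    t = [b[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, t


def ecm_test(y, z):
    """Regress dy_t on dz_t, (y_{t-1} - z_{t-1}) and z_{t-1}, as in (3.62)."""
    X, dy = [], []
    for t in range(1, len(y)):
        X.append([z[t] - z[t - 1], y[t - 1] - z[t - 1], z[t - 1]])
        dy.append(y[t] - y[t - 1])
    return ols(X, dy)


rng = random.Random(2)
alpha1, beta0, beta1 = 0.5, 0.5, 0.25  # hypothetical ADL coefficients
T = 400
z, y = [0.0], [0.0]
for t in range(1, T):
    z.append(z[-1] + rng.gauss(0.0, 1.0))
    y.append(alpha1 * y[-1] + beta0 * z[t] + beta1 * z[t - 1] + rng.gauss(0.0, 1.0))

coefs, t_ratios = ecm_test(y, z)
print("error correction coefficient:", coefs[1], "t-ratio:", t_ratios[1])
```

The second coefficient estimates the error correction coefficient, and its t-ratio is the test statistic for the null of non-cointegration.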
An alternative approach is to reparameterize the ECM to form an analogue
of the equation in the disturbances used to perform the ADF test. Trans-
forming the ECM:

$$\Delta y_t = \gamma_1 \Delta z_t + \gamma_2 (y_{t-1} - \gamma_3 z_{t-1}) + u_t$$

$$\Rightarrow \Delta y_t - \gamma_3 \Delta z_t = \gamma_2 (y_{t-1} - \gamma_3 z_{t-1}) + (\gamma_1 - \gamma_3)\Delta z_t + u_t \tag{3.63a}$$

But $\eta_t = y_t - \gamma_3 z_t$ is the cointegrating combination, the consistent estimates of
which are the residuals from the static regression on which the ADF test is per-
formed. However, rather than the usual ADF regression, (3.63a) is a further
augmentation:

$$\Delta \eta_t = \psi_1 \eta_{t-1} + \psi_2 \Delta z_t + u_t, \tag{3.63b}$$

$$\psi_1 = \gamma_2, \quad \psi_2 = (\gamma_1 - \gamma_3).$$

This suggests that the ADF regression should be further augmented by
the difference, $\Delta z_t$, of the right-hand side variable in the static regression. It
also shows that the usual ADF regression assumes a restriction applies to the
original ECM, namely $\gamma_1 = \gamma_3$, so that there is no requirement for the extra dif-
ference term, or, more accurately, ignores the fact that there will be a correla-
tion between the disturbances and the regressors of the standard ADF
regression, since both will include a component of $z_{t-1}$.19 The test statistic is the
usual ADF t-ratio on $\eta_{t-1}$ in (3.63b). The equilibrium error, $\eta_t$, should be calcu-
lated using a consistent estimator of the cointegrating coefficients. These could
come from the static regression, or from the long run solution to the dynamic
model.
Equation (3.63b) is modified for more complex dynamics and additional
variables by adding differences of all explanatory variables and lagged differ-
ences of all variables (including lags of $y_t$).
Finite sample critical values have to be simulated. Illustrative values may be
found in Patterson (2000, table 8.11). Inder (1993) has found that such tests
display more power than the usual ADF residual based tests, and have addi-
tional desirable properties. They are more robust because, when $\psi_2 \neq 0$,
the finite sample performance of the usual tests is distorted by the exclusion of
extra dynamic terms such as $\Delta z_t$.

3.4.3 Problems with the single equation approach


The single equation approach is problematical for a number of reasons:

(i) If there is more than one cointegrating vector, which is possible when
there are more than two integrated variables, then the single equation
approach is only likely to result in a linear combination of these.
(ii) Even if there is only one cointegrating relationship, all variables may
be responding to deviations from equilibrium. Estimating a single
equation only, ignores this and leads to inefficiencies in the estimation.
This amounts to assuming that the right-hand side variables are weakly
exogenous, so their dynamic equations exclude the cointegrating
relationship.

Given that an approach allowing estimation of a system of equations
describing multiple cointegrating relations is available, it is likely that this will
provide more robust estimation and inference since it does not rely on condi-
tions that are difficult to test in practice. This is the subject of the following
chapter.

3.5 Conclusion

The notion of cointegration developed via the integration of conventional
time series analysis with econometric methods. Econometrics dealt initially
with models that were mainly static in nature, while the dynamic nature of
data was implicit in univariate time series analysis. In univariate time series
analysis data is differenced to induce stationarity, but this was not common
practice in economics until the 1980s. One of the first articles to amalgamate
a time series model with an econometric formulation in levels was the wage
equation article produced by Sargan (1964). Unlike many wage
equations of the time, the model considered the question of the dynamic specification of
a wage inflation equation in the context of a model that is estimated by
instrumental variables. The article is the first example of an error correction
model, which was both well ahead of its time and highly influential in terms
of the institutional modelling of UK wage equations. The ARMAX representa-
tion is the first example of cointegration, as the error correction term is
significant, provided that asymptotic normality of the t-test on the
coefficient of the error correction term is accepted (Kremers et al. 1992).
Granger and Newbold (1974) provided the first simulation experiments to
consider the impact of non-stationarity on the diagnostics associated with
nonsense regressions. The problem of nonsense correlation was well known to
time series statistics through the work of Yule (1926) and should have been
known to the econometrics literature because of the intervention of Keynes
(1939), who discussed the potential for misanalysis when regressions between
variables with intermediate causes were considered. Granger and Newbold
considered the special case for which the intermediate cause was an indepen-
dent stochastic trend. While time series analysts started to consider tests for
non-stationarity (Dickey and Fuller 1979), econometricians in the UK started
to implement models which exhibited error correction behaviour. Davidson et
al. (1978) introduced the notion that the error correction term explained the
long-run behaviour of economic series and these dynamic models are again
cointegrating relationships in the sense of Kremers et al. (1992) as they
include combinations of stationary variables. In one case the lagged series renders
the variable stationary; for the error correction term, the contemporaneous
observation on a different time series does the same thing.
Granger (1983) introduced the term cointegration to the literature, while
Sargan and Bhargava (1983) provided the first recognized test of the existence of
long-run behaviour. It was Granger, via his decomposition of the Wold repre-
sentation of what are quasi over-differenced series, who explained how depen-
dent moving average behaviour might yield long-run relations with variables
that are stationary. Engle and Granger’s (1987) article provided a means by
which bivariate relationships might be given error correction representations,
though more generally this proposition does not follow from the results devel-
oped in the article. The two-step method developed by Engle and Granger
shows that the long-run parameters in the case where there are two variables
or a single cointegrating vector can be estimated from a cointegrating
regression. In general, this is not the case, which suggests the need
to develop an approach that might be applied to a system. It is the systems
approach developed by Johansen (1988a, 1988b) for the autoregressive model
that will be considered in detail in the next chapter.
4
Multivariate Time Series Approach to
Cointegration

4.1 Introduction

This chapter considers the case where there are a number of non-stationary
series driven by common processes. It was shown in the previous chapter that
the underlying behaviour of time series may arise from a range of different
time series processes. Time series models separate into autoregressive processes
that have long-term dependence on past values and moving average processes
that are dynamic but limited in terms of the way they project back in time. In
the previous chapter the issue of non-stationarity was addressed in a way that
was predominantly autoregressive. That is, stationarity was tested by compar-
ing a difference stationary process under the null with a stationary auto-
regressive process of higher order under the alternative. The technique is
extended to consider the extent to which the behaviour of the discrepancy
between two series is stationary or not. In the context of single equations, a
Dickey–Fuller test can be used to determine whether such series are related;
when they are, this is called cointegration. When it comes to analyzing more
than one series then the nature of the time series process driving the data
becomes more complicated and the number of combinations of non-station-
ary series that are feasible increases.
Here we consider in detail a number of alternative mathematical models
that have the property of cointegration. Initially we discuss representations
that derive from the multivariate Wold form. This is the approach first consid-
ered by Granger (1983) and Granger and Weiss (1983), in which the Granger
representation theorem is developed. From this theorem there are a number
of mathematical decompositions, which transform moving average models
into vector autoregressive models with multivariate error correction terms.
From the perspective of the probability model from which the Wold form
derives, the VMA representation provides a more elegant explanation of non-
stationary time series. First, the conditions associated with cointegration in
the VMA are more succinct and, secondly, implicit in the fundamental con-
dition for cointegration in the VMA is the explicit conclusion that under co-
integration the long-run levels relationships are stationary. An alternative
mechanism of decomposing the VAR into an error correction form derives
from Engle and Granger (1987), but beyond the single equation case inference
about non-stationary processes, estimation of the long-run parameters and
testing hypotheses about the long run all derive from the maximum likeli-
hood theory developed by Johansen. When it comes to constructing dynamic
models, the approach developed by Johansen appears to provide a bridge
between two main strands of econometric time series modelling: first, the VAR
reduced form approach derived from the rational expectations literature by
Sims (1980) and, second, the error correction approach that has developed from the
work of Sargan (1964), Davidson et al. (1978) and Ericsson, Hendry and Mizon
(1998). The cointegration/error correction approach emphasizes the descrip-
tion of detectable economic phenomena in the long run. The cointegration
approach assumes that short-run processes are not well defined by virtue of
aggregation across agents, goods and time, differing forms of expectations,
learning, habits and dynamic adjustment. Alternatively, the long run provides
a useful summary of the non-detectable short-run dynamics, while the error
correction approach in the confines of the VAR permits short-run policy
analysis via the impulse response function and the ability to analyze both
short-run and long-run causality and exogeneity. If the VAR defines a valid
reduced form, then it allows the detection of the readily available structure.
More conventional econometric approaches (Pesaran et al. 2000) criticize
the Johansen methodology for being ad hoc in the sense that it doesn’t use
as its starting point an econometric system of the type defined by the
Cowles foundation, but, as is discussed in the context of RE models in
chapter 6, it is still possible to introduce short-run restrictions in the
confines of a VAR-style cointegration approach. The VMA approach appears
to be less interested in the distinction between the long run and the short
run, as to whether money causes inflation as compared with money leading
to price rises, but still permits impulse response and short-run causality
analysis. However, in the context of pure MA models, inference and detec-
tion of long-run behaviour is less well developed. Impulse response analysis
emphasizes the responsiveness of variables to, and the effectiveness of, policy.
The use of the VAR and VMA for short-run analysis is discussed in detail by
Lippi and Reichlin (1996).
Here, we define cointegration in terms of the Wold decomposition and then
consider the Johansen approach to testing and estimation. Some empirical
results are derived from the literature and discussed in the context of an
increasing body of evidence based on Monte Carlo simulation. Alternative
representations are discussed along with the extension of the methods applied
to multi-cointegration and polynomial cointegration.

4.2 The VMA, the VAR and VECM

The concept of cointegration is now well established following the seminal
work of Engle and Granger (1987), and the development of practical estima-
tion and inferential methods by Johansen (1988a,b). Although in the latter
case, many different approaches are now available (for example, on estima-
tion, see Gonzalo, 1994), it is the Johansen methodology that dominates
empirical work. However, there is an uneasy relationship between the struc-
ture explored by Engle and Granger and those exploited by Johansen: the
former is based on the Wold decomposition, that is, a potentially infinite
order vector moving average (VMA) representation, while the latter employs a
vector autoregressive (VAR) model.
It is not difficult to motivate consideration of moving average structure:
there is much evidence in the literature for the poor performance of the
Dickey–Fuller test (Said and Dickey 1984; Hall 1989) and some for the
Johansen test (Boswijk and Franses 1992; Cheung and Lai 1993) under moving
average errors. In the multivariate case, fundamentalness, for which inverti-
bility is a necessary condition (see below and Lippi and Reichlin 1996), is
required for impulse response analysis. Cointegrating VARs must be able to
deliver reasonable approximations to an underlying VMA and prior testing
must be reliable in identifying whether or not cointegration exists.
Interest in MA behaviour also has a generic basis in terms of univariate time
series modelling, while what might be viewed as one of the earliest examples
of error correction behaviour, the wage–price model developed by Sargan
(1964), had an MA error. Further, moving average behaviour often defines
measurement, or rather mismeasurement, equations associated with rational expec-
tations models. It is the principal reason for estimating dynamic Euler equa-
tions by generalized method of moments (GMM). More specifically, market
efficiency in the confines of a model relating spot and futures contracts might
be expected to display this form of over-differencing linked with a dependent
relationship and the use of the Wold representation (Flôres and Szafarz 1995).
One problem to be addressed is how to obtain a VAR form in levels from a
VMA form in differences. There are various ways of establishing the relation-
ship. In general such theorems have become known as (Granger) representa-
tion theorems, after a working of the problem in Engle and Granger (1987).
The application of the Smith–McMillan (SM) form to cointegrated systems is
presented in Engle and Yoo (1991) and Yoo (1986). This approach is handled
in detail in section 4.3.

4.2.1 The VAR and VECM models


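Let $x_t$ be an $n \times 1$ vector of time series, and $\varepsilon_t$ an $n \times 1$ vector white noise
series having $E(\varepsilon_t) = 0$ and

$$E(\varepsilon_t \varepsilon_{t-j}') = \begin{cases} \Sigma & \text{for } j = 0 \\ 0 & \text{otherwise,} \end{cases} \tag{4.1}$$

$\Sigma$ being a positive definite matrix. Then $x_t$ has a vector autoregressive structure of
order $p$ (VAR($p$)) if

$$A(L) x_t = \mu + \varepsilon_t \tag{4.2}$$

where $\mu$ is an $n \times 1$ vector of constants, and $A(L)$ is an $n \times n$ matrix lag poly-
nomial given by $A(L) = I_n - \sum_{i=1}^{p} A_i L^i$, $I_n$ being the $n \times n$ identity matrix and
$A_i$, $i = 1, 2, \ldots, p$, $n \times n$ coefficient matrices. Equation (4.2) can alternatively be
written

$$x_t = \mu + \sum_{i=1}^{p} A_i x_{t-i} + \varepsilon_t.$$

A simple example for $n = 2$ and $p = 1$ is $x_t = [x_{1,t} \; x_{2,t}]'$ and $\varepsilon_t = [\varepsilon_{1,t} \; \varepsilon_{2,t}]'$ with

$$x_t = \begin{bmatrix} 0.5 & 0.25 \\ 1 & 0.5 \end{bmatrix} x_{t-1} + \varepsilon_t.$$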
Just as with scalar autoregressive models, the VAR($p$) may be reparameterized
into differences and a single lagged levels term. Any $p$th order $n \times n$ matrix
polynomial of the form $A(L) = I_n - \sum_{i=1}^{p} A_i L^i$ may be written in the form

$$A(L) = -\Pi L + A^*(L)(1 - L)$$

where

$$A^*(L) = \begin{cases} I_n - \sum_{i=1}^{p-1} A_i^* L^i & \text{if } p > 1 \\ I_n & \text{if } p = 1, \end{cases}$$

$$A_i^* = -\sum_{j=i+1}^{p} A_j, \; i = 1, 2, \ldots, p - 1, \quad \text{and} \quad \Pi = -A(1).$$

Applying this reparameterization to (4.2) yields

$$(-\Pi L + A^*(L)(1 - L)) x_t = \mu + \varepsilon_t.$$

On rearrangement this gives

$$\Delta x_t = \mu + \Pi x_{t-1} + \sum_{i=1}^{p-1} A_i^* \Delta x_{t-i} + \varepsilon_t, \tag{4.3}$$

where the summation on the right-hand side does not appear if $p = 1$.
Equation (4.3) is known as the vector error correction representation of the VAR.
It exists irrespective of the orders of integration of the processes $x_{i,t}$, $i = 1, 2, \ldots,
n$. This is commonly used in the analysis of cointegrated variables (Johansen
1995) and is then often known as a vector error correction model (VECM).
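The reparameterization is purely algebraic and can be checked numerically. The sketch below is an illustration under stated assumptions: it builds the levels matrix as minus A(1) and the starred matrices from the coefficient matrices of a VAR with zero intercept, then confirms that the disturbance recovered from the levels form equals the one recovered from the factorized form, A*(L) applied to the differenced series less the levels term in the first lag. The first-lag matrix is the one in the text's VAR(1) example; the second-lag matrix, the data points, and all function names are hypothetical.

```python
def vecm_from_var(A):
    """A = [A_1, ..., A_p], each an n x n nested list.
    Returns Pi = -A(1) = A_1 + ... + A_p - I and [A*_1, ..., A*_{p-1}],
    where A*_i = -(A_{i+1} + ... + A_p)."""
    p, n = len(A), len(A[0])
    pi = [[sum(A[k][r][c] for k in range(p)) - (1.0 if r == c else 0.0)
           for c in range(n)] for r in range(n)]
    a_star = [[[-sum(A[k][r][c] for k in range(i + 1, p)) for c in range(n)]
               for r in range(n)] for i in range(p - 1)]
    return pi, a_star


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def var_residual(A, xs):
    """eps_t = x_t - sum_i A_i x_{t-i}, with xs = [x_t, x_{t-1}, ..., x_{t-p}]."""
    out = list(xs[0])
    for i, Ai in enumerate(A, start=1):
        out = [o - a for o, a in zip(out, matvec(Ai, xs[i]))]
    return out


def vecm_residual(pi, a_star, xs):
    """eps_t = A*(L) dx_t - Pi x_{t-1}, with A*(L) = I - sum_i A*_i L^i."""
    dx = [[a - b for a, b in zip(xs[k], xs[k + 1])] for k in range(len(xs) - 1)]
    out = [d - m for d, m in zip(dx[0], matvec(pi, xs[1]))]
    for i, Asi in enumerate(a_star, start=1):
        out = [o - a for o, a in zip(out, matvec(Asi, dx[i]))]
    return out


A1 = [[0.5, 0.25], [1.0, 0.5]]           # VAR(1) example from the text
pi, a_star = vecm_from_var([A1])
det_pi = pi[0][0] * pi[1][1] - pi[0][1] * pi[1][0]
print(pi, det_pi)

A2 = [[0.1, 0.0], [0.2, -0.3]]           # hypothetical second lag for a VAR(2) check
xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]  # arbitrary x_t, x_{t-1}, x_{t-2}
pi2, a_star2 = vecm_from_var([A1, A2])
res_var = var_residual([A1, A2], xs)
res_vecm = vecm_residual(pi2, a_star2, xs)
print(res_var, res_vecm)
```

For the VAR(1) example the levels matrix has a zero determinant, so it has reduced rank, which is the case of interest when the variables are cointegrated.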

4.2.2 The VMA model


Let $\varepsilon_t$ be a vector white noise process as defined in equation (4.1). Let $x_t$ be a
$n \times 1$ vector of stationary variables. Then there exists a multivariate version of
Wold’s representation theorem that states that $x_t$ may be represented as (a
possibly infinite order) vector moving average (VMA) process. If $x_t$ has a non-
zero mean this may be introduced to the model. Thus $x_t$ may be represented

$$x_t = \mu + \Theta(L)\varepsilon_t \tag{4.4}$$

where $\mu$ is a $n \times 1$ vector of constants and $\Theta(L)$ is a $n \times n$ matrix lag poly-
nomial given by $\Theta(L) = I_n + \sum_{i=1}^{q} \Theta_i L^i$ where $q$ may be infinite. Equation (4.4)
can be written less compactly:

$$x_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \Theta_i L^i \varepsilon_t. \tag{4.5}$$

Equations (4.4) and (4.5) define a VMA process of order $q$ (VMA($q$)).

4.2.3 The Granger representation theorem: systems representation of cointegrated variables
Chapter 2 established a link between cointegration of order (1,1) and the
single equation error correction model. It was shown that cointegrated
variables could be represented either as an ADL or as an ECM. Cointegration
can be characterized in a systems context in a number of ways and manifests
itself as a set of restrictions on a general model. Naturally there is more than
one way in which this may be achieved. The first characterization of this
nature is due to Granger (1983) and as such is known as the Granger represen-
tation theorem. Subsequent treatments develop alternative representations
(Johansen 1988a, 1995) and generalize the cases considered.

4.2.3.1 Cointegration starting from a VMA and deriving VAR and VECM forms
This was the first approach to explaining how to characterize cointegration in
the context of a multiple time series model. It is in many ways the most
natural for two reasons. First, it builds on an established representation
theorem, the multivariate version of the Wold representation. Secondly, it
naturally restricts cases under examination to whatever orders of integration
are the subject of investigation. Suppose $x_t$ is a $n \times 1$ vector of time series each
element of which is I(1). Then the first difference of the vector, $\Delta x_t$, is I(0). As
such it has a Wold representation,1

$$\Delta x_t = C(L)\varepsilon_t. \tag{4.6}$$

The task is to determine how this relationship can give rise to cointegration.
This follows by application of the reparameterization to $C(L)$, so that $C(L) =
C(1)L + C^*(L)(1 - L)$, for some $C^*(L)$ of order one less than $C(L)$. Substituting
this into (4.6) gives

$$\Delta x_t = (C(1)L + C^*(L)(1 - L))\varepsilon_t. \tag{4.7}$$

Cointegration requires that there exists a $n \times 1$ vector $\beta$, such that $\beta' x_t$ is I(0).
Equation (4.7) can be used to obtain an expression for $\beta' x_t$ by pre-multiplying
by $\beta'$, and remembering that $\Delta$ is a scalar linear operator. Thus

$$\beta' \Delta x_t = \beta'(C(1)L + C^*(L)(1 - L))\varepsilon_t \tag{4.8}$$

$$\Delta \beta' x_t = \beta' C(1)\varepsilon_{t-1} + \beta' C^*(L)\Delta\varepsilon_t \tag{4.9}$$

Equation (4.9) can be used to develop the following theorem.

Theorem 1 $x_t$ is cointegrated if and only if $C(1)$ is singular.

Proof. Singularity of $C(1)$ implies cointegration since $C(1)$ is singular $\Leftrightarrow$
$\mathrm{rank}(C(1)) = r < n$ $\Rightarrow$ there exists a vector $\beta$ such that $\beta' C(1) = 0$. It follows from
substitution of this term in (4.9), that

$$\Delta \beta' x_t = \beta' C^*(L)\Delta\varepsilon_t$$

$$\beta' x_t = \xi_0 + \beta' C^*(L)\varepsilon_t \tag{4.10}$$

The last implication can be thought of as cancellation of the differencing
operator, but in the non-stationary context is better thought of as summation,
effectively the discrete analogue of integration. However described, this
process generates a constant of integration, $\xi_0$, that is a function of the initial
value of the processes involved. Equation (4.10) shows that $\beta' x_t$ has a moving
average representation and is therefore stationary. However, $x_t$ itself is I(1) and
is therefore also CI(1, 1).
Cointegration implies singularity of $C(1)$ since $x_t \sim$ CI(1, 1) $\Rightarrow$ there exists a
vector $\beta$ such that

$$\beta' x_t \sim I(0).$$

Therefore $\beta' x_t$ has an invertible MA representation,

$$\beta' x_t = M(L)\nu_t$$

$$\Delta \beta' x_t = \Delta M(L)\nu_t$$

and hence, from (4.9),

$$\Delta M(L)\nu_t = \beta' C(1)\varepsilon_{t-1} + \beta' C^*(L)\Delta\varepsilon_t.$$

Summing this last expression from 1 to $t$ gives

$$M(L)\nu_t = \xi_0 + \beta' C(1)\sum_{i=0}^{t-1}\varepsilon_i + \beta' C^*(L)\varepsilon_t, \tag{4.11}$$

where $\xi_0$ is the constant of integration, and so

$$\beta' C(1) = 0 \tag{4.12}$$

since otherwise the right-hand side of (4.11) would be the sum of the I(1)
process $\left(\sum_{i=0}^{t-1}\varepsilon_i\right)$ and the I(0) process $(\xi_0 + \beta' C^*(L)\varepsilon_t)$ and hence $\beta' x_t \sim I(1)$. But
$M(L)\nu_t$ is an I(0) process, which means, by contradiction, that an I(1) process
must not enter (4.11) and hence $\beta' C(1)$ must be zero. However:

$$\beta' C(1) = 0 \Leftrightarrow \mathrm{rank}(C(1)) < n \Leftrightarrow C(1) \text{ is singular.} \quad \blacksquare$$

A moving average process such as (4.7) with singular $C(1)$ may be called a
reduced rank moving average process. Next, the link between a reduced rank
moving average and a VECM needs to be established. This is done by first
establishing that $x_t$ has a vector autoregressive moving average (VARMA) repres-
entation. A VARMA process is a VAR with VMA disturbances, so may be
written

$$A(L) x_t = B(L)\varepsilon_t \tag{4.13}$$

where $A(L) = I - \sum_{i=1}^{p} A_i L^i$ and $B(L) = I - \sum_{j=1}^{q} B_j L^j$, $x_t$ is $n \times 1$, $A_i$ and $B_j$ are
$n \times n$ coefficient matrices, and $\varepsilon_t$ is a $n \times 1$ vector white
noise process as defined in equation (4.1). Equation (4.13) defines a VARMA
process of order $(p, q)$. In order to derive the VARMA structure from the Wold
representation of equation (4.6), two results for polynomial matrices are devel-
oped in Appendix A (Engle and Granger, 1987).

4.2.4 VARMA representation of CI(1, 1) variables


In order to obtain a VARMA representation for $x_t$, it appears that $C(L)$ in
\[ \Delta x_t = C(L)\varepsilon_t \]
must be inverted. However, since $x_t$ is CI(1, 1), it follows that $C(1)$ is
singular. That is, $C(L)$ has unit roots preventing its inversion (see Appendix
A.3). In addition, a representation of $x_t$ rather than $\Delta x_t$ is
required. The problem is overcome by factoring out the unit root components
from $C(L)$, although scalar factors are not available. Even so, the approach
still very neatly allows the cancellation of the differencing operator as
required. Thus both objectives are achieved. The results in Appendix B may be
used to obtain a VARMA form of the Wold representation, (4.6). Using notation
consistent with equation (B.1), let $\operatorname{rank}(C(1)) = n - r$,
$1 \le r \le n$. Therefore the result developed to derive (B.5) may be applied
to $C(L)$, which is a $q$th order polynomial in $L$ ($m = q$). It follows that
there exists a matrix lag polynomial $\tilde H_C(L)$ of order
$b \le q(n-1) - r + 1$, and a scalar lag polynomial $\tilde g_C(L)$ of order
$a \le qn - r$, such that
\[ \tilde H_C(L)C(L) = \Delta\tilde g_C(L) I_n. \]
Pre-multiplying the Wold form above by $\tilde H_C(L)$ transforms the VMA into a
VARMA:
\[ \tilde H_C(L)\Delta x_t = \tilde H_C(L)C(L)\varepsilon_t = \Delta\tilde g_C(L) I_n\varepsilon_t. \]
Dividing through by the difference operator gives
\[ \tilde H_C(L)x_t = \eta_0 + \tilde g_C(L)\varepsilon_t, \]
where $\eta_0$ is a constant of integration, which vanishes for appropriately
set initial conditions or data transformation, leaving
\[ \tilde H_C(L)x_t = \tilde g_C(L)\varepsilon_t. \tag{4.14} \]
This is a unique VARMA representation of $x_t$, for the case where the order of
cointegration is (1, 1) and $\tilde g_C(L)$ is a scalar polynomial.
To further motivate this result consider the following example.
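Let $q = 1$, $n = 3$ and
\[ C(L) = \begin{bmatrix} 1 & \tfrac12 L & \tfrac12 L \\ -\tfrac12 L & 1 - \tfrac54 L & -\tfrac14 L \\ \tfrac14 L & \tfrac18 L & 1 - \tfrac78 L \end{bmatrix}. \]
Then
\[ C(1) = \begin{bmatrix} 1 & \tfrac12 & \tfrac12 \\ -\tfrac12 & -\tfrac14 & -\tfrac14 \\ \tfrac14 & \tfrac18 & \tfrac18 \end{bmatrix}. \]
It is easy to see that $C(1)$ is rank deficient, because the rows and columns of
this matrix are scalar multiples of each other. For example, using the notation
$C(1)_{i.}$ to denote the $i$th row of $C(1)$,
\[ C(1)_{1.} = [\,1 \;\; \tfrac12 \;\; \tfrac12\,] = -2C(1)_{2.} = -2[\,-\tfrac12 \;\; -\tfrac14 \;\; -\tfrac14\,] = 4C(1)_{3.} = 4[\,\tfrac14 \;\; \tfrac18 \;\; \tfrac18\,]. \]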
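By definition the rank of a matrix is the number of linearly independent
rows or columns, which in this case is 1, so that $n - r = 1$ and $r = 2$. The
decomposition requires pre-multiplication of $C(L)$ by the matrix
$\tilde H_C(L)$, where the adjoint of $C(L)$ is given by
$C^a(L) = (1 - L)\tilde H_C(L)$. Calculation of the adjoint follows from the
transpose of the usual matrix of cofactors (for further detail see Dhrymes
1984). Therefore
\[ C^a(L) = \begin{bmatrix} 1 - \tfrac{17}{8}L + \tfrac98 L^2 & -\tfrac12 L + \tfrac12 L^2 & -\tfrac12 L + \tfrac12 L^2 \\ \tfrac12 L - \tfrac12 L^2 & 1 - \tfrac78 L - \tfrac18 L^2 & \tfrac14 L - \tfrac14 L^2 \\ -\tfrac14 L + \tfrac14 L^2 & -\tfrac18 L + \tfrac18 L^2 & 1 - \tfrac54 L + \tfrac14 L^2 \end{bmatrix} \]
\[ = (1 - L)\begin{bmatrix} 1 - \tfrac98 L & -\tfrac12 L & -\tfrac12 L \\ \tfrac12 L & 1 + \tfrac18 L & \tfrac14 L \\ -\tfrac14 L & -\tfrac18 L & 1 - \tfrac14 L \end{bmatrix} = (1 - L)\tilde H_C(L). \]
This establishes the AR operator of the VARMA (4.14). To obtain the scalar
MA operator note that $C^a(L)C(L) = |C(L)| I_n$, so that, from the results on
reduced rank polynomials, $|C(L)| = (1 - L)\Delta\tilde g_C(L)$. In this case
\[ |C(L)| = 1 - \tfrac{17}{8}L + \tfrac54 L^2 - \tfrac18 L^3 = (1 - L)^2(1 - \tfrac18 L) = (1 - L)[(1 - L)(1 - \tfrac18 L)], \tag{4.15} \]
and therefore $\Delta\tilde g_C(L) = (1 - L)(1 - \tfrac18 L)$, that is
$\tilde g_C(L) = 1 - \tfrac18 L$. Hence the VARMA representation of the
differenced process is
\[ \begin{bmatrix} 1 - \tfrac98 L & -\tfrac12 L & -\tfrac12 L \\ \tfrac12 L & 1 + \tfrac18 L & \tfrac14 L \\ -\tfrac14 L & -\tfrac18 L & 1 - \tfrac14 L \end{bmatrix}\Delta x_t = (1 - L)(1 - \tfrac18 L)\varepsilon_t, \]
which, on cancelling the common differencing operator as in (4.14), gives
$\tilde H_C(L)x_t = (1 - \tfrac18 L)\varepsilon_t$ for the levels.
It should be noticed that the MA component of the representation in
differences is not invertible, because of the factor $(1 - L)$. In general the
VMA does not directly transform into a VAR as only in special cases does
$\tilde g_C(L)$ invert.
This completes the numerical example.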
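The matrix products in this example are easy to check with exact rational arithmetic. The following is a minimal sketch, not part of the original text: the helper names (`poly`, `mat_mul`, `left_mult`) and the two vectors `beta_a` and `beta_b` are illustrative assumptions. It multiplies the AR operator obtained from the adjoint by $C(L)$ and confirms that two linearly independent vectors annihilate $C(1)$, as (4.12) requires when $r = 2$.

```python
from fractions import Fraction as F

def poly(*coeffs):
    """Scalar lag polynomial c0 + c1*L + ... as a list of exact fractions."""
    return [F(c) for c in coeffs]

def poly_mul(a, b):
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [F(0)] * (n - len(a))
    b = b + [F(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def trim(p):
    """Drop trailing zero coefficients, keeping at least the constant term."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def mat_mul(A, B):
    """Product of two matrices whose entries are lag polynomials."""
    out = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            acc = [F(0)]
            for k in range(len(B)):
                acc = poly_add(acc, poly_mul(A[i][k], B[k][j]))
            row.append(trim(acc))
        out.append(row)
    return out

# C(L) from the example (q = 1, n = 3).
C = [[poly(1), poly(0, "1/2"), poly(0, "1/2")],
     [poly(0, "-1/2"), poly(1, "-5/4"), poly(0, "-1/4")],
     [poly(0, "1/4"), poly(0, "1/8"), poly(1, "-7/8")]]

# AR operator: the adjoint of C(L) with one factor (1 - L) removed.
H = [[poly(1, "-9/8"), poly(0, "-1/2"), poly(0, "-1/2")],
     [poly(0, "1/2"), poly(1, "1/8"), poly(0, "1/4")],
     [poly(0, "-1/4"), poly(0, "-1/8"), poly(1, "-1/4")]]

HC = mat_mul(H, C)
target = trim(poly_mul(poly(1, -1), poly(1, "-1/8")))  # (1 - L)(1 - L/8)

# C(1): evaluate each entry at L = 1 by summing its coefficients.
C1 = [[sum(entry) for entry in row] for row in C]

def left_mult(beta, M):
    """Row vector beta' times the numeric matrix M."""
    return [sum(b * M[i][j] for i, b in enumerate(beta))
            for j in range(len(M[0]))]

beta_a = [F(1), F(2), F(0)]    # hypothetical cointegrating vector
beta_b = [F(1), F(0), F(-4)]   # a second, linearly independent one

print(HC[0][0], left_mult(beta_a, C1), left_mult(beta_b, C1))
```

Every diagonal entry of the product equals $(1 - L)(1 - \tfrac18 L)$ and every off-diagonal entry is zero, so the product is a scalar polynomial times the identity, with a single factor $(1 - L)$ left to cancel against the differencing operator.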
An important reason for wanting to re-express a cointegrating VMA in
differences as a VAR in levels is that the widely employed techniques of
Johansen (1995a) assume a (finite order) VAR representation. The
VMA in differences is a very natural starting point since it employs Wold’s
fundamental representation of a stationary process. It also conveniently
allows the scalar processes to have a unit root (be I(1)) and be cointegrated.
Such properties are more difficult to impose starting from a VAR (Johansen
1995a).
From the Johansen point of view, the Engle–Granger approach to transforming
a VMA in first differences to a VARMA in levels is inconvenient in
that some moving average structure remains. The right-hand side of equation
(4.14) is a VMA whose matrix lag operator is a scalar polynomial times the
identity matrix. It is not a pure VAR as defined in equation (4.2). The
advantage is that it applies to any cointegrating (CI(1, 1)) VMA.

4.3 The Smith–McMillan-Yoo form

Engle and Yoo (1991) show that if the lag polynomial operator of the original
cointegrating VMA is rational (each element of the VMA operator is rational
and may have a different denominator polynomial), then there exists a VAR
representation where the right-hand side is white noise and the autoregressive
operator is rational. As with the Engle–Granger transformation, the unit root
moves from being explicit in the VMA to being implicit in the VAR, but now
there is no autocorrelation of the disturbances, and there is no restriction that
the denominator polynomials of the final VAR operator need all be the same.
The Engle–Yoo approach also has the advantage that it extends fairly readily
to other forms of cointegration.
The problem to be addressed is how to obtain a VAR form in levels from
a VMA form in differences. There are various ways of establishing the relation-
ship. In general such theorems have become known as (Granger) representa-
tion theorems, after a working of the problem in Engle and Granger (1987).
As in the univariate case there are a number of alternative time series repres-
entations. Each representation has different characteristics. Here the alternat-
ive forms are used to move between models where differencing eliminates
strong autoregressive behaviour, but due to dependence among economic
series some over-differencing remains in the form of moving average behav-
iour with unit roots. If this type of behaviour inverts to a model with auto-
regressive behaviour then there may be cointegration amongst the levels of
the non-differenced data. It is the movement from the MA to the AR which is
important.
The application of the Smith–McMillan (SM) form to cointegrated systems is
presented in Engle and Yoo (1991). A rational operator is not in general finite,
which is a problem for the Johansen methodology, although special cases
exist where the left-hand side reduces to a finite order VAR. (See section 4.7.2
for a discussion of a situation where a finite order pure VAR is available for the
first differences.) However, as the denominator polynomials in the Engle–Yoo
representation have all their roots outside the unit circle, the operator co-
efficients tend to zero as the lag length increases. This approach is described
below.
Before describing the approach in detail, it is useful to make some prelim-
inary points.

(i) The Smith–McMillan (SM) form is a decomposition of a matrix polynomial.
It can be applied to convert a VMA in differences to a VAR in levels
or vice versa, despite the presence of unit roots.
(ii) It is limited in its application to matrix lag polynomial operators the
individual elements of which are rational (one scalar polynomial divided
by another). While rational operators are in general of infinite order in
the lag operator, there exist infinite order polynomials that cannot be
represented in rational form. Strictly speaking, therefore, this form of
decomposition, and hence conversion from VMA to VAR and vice versa,
applies only to a sub-class of models: those of rational form. (This does
not rule out finite order polynomials, as these are a special case of
rational polynomials.)
(iii) The SM form allows the diagonalization of rational polynomial matri-
ces, making their manipulation much easier. This is done in two stages.
First, it is noted that there exists a diagonal form for all finite order
polynomial matrices, called a Smith form. Secondly, a rational operator
can be expressed as a scalar factor dividing a finite order polynomial
matrix. The finite order polynomial can then be put in Smith form after
which the result can be divided again by the scalar factor. This gives the
SM form. The Smith form relies on the application of elementary row
or column operations (see Appendix A for details), and it is this
approach that restricts application to finite order polynomials, and
hence restricts the decomposition of infinite order cases to those that
are rational.
(iv) The diagonalization process requires the pre- and post-multiplication of
the original matrix by polynomial matrices that are always invertible.
This has two consequences: first, problems of simplification (this is not
really inversion, as will be seen) focus entirely on the diagonalized form
(this is called the SM form); and secondly, multiplication by these
matrices or their inverses does not alter rank.

The distinctive feature of the Smith–McMillan–Yoo form is the factorization
of all the unit roots from the VMA operator in such a way that, by
pre-multiplication by an appropriate matrix, a single differencing operator
may be isolated on the MA side of the equation.

4.3.1 Using the Smith form to reparameterize a finite order VMA


Consider the VMA
\[ x_t = C(L)\varepsilon_t, \]
where $C(L)$ is a finite order operator. The Smith form of the operator $C(L)$ is
\[ C(L) = G(L)^{-1}C_S(L)H(L)^{-1}, \]
where $C_S(L)$ is a diagonal finite order polynomial matrix and $G(L)$ and
$H(L)$ are invertible polynomial matrices having unit determinant (called
unimodular matrices, see Appendix A.2 for details), representing the elementary
operations necessary to obtain the diagonalization. Applying this decomposition
to the VMA gives
\[ x_t = G(L)^{-1}C_S(L)H(L)^{-1}\varepsilon_t \]
and hence
\[ G(L)x_t = C_S(L)H(L)^{-1}\varepsilon_t. \tag{4.16} \]

For example, it is shown in Appendix A.1 that the operator
\[ C(L) = \begin{bmatrix} 1 - \tfrac34 L & -L \\ -\tfrac18 L & 1 - \tfrac12 L \end{bmatrix} \]
can be written
\[ C(L) = \begin{bmatrix} 1 & -6 \\ \tfrac18 L & 1 - \tfrac34 L \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 0 & 1 - \tfrac54 L + \tfrac14 L^2 \end{bmatrix} \begin{bmatrix} 1 & -(2L - 6) \\ 0 & 1 \end{bmatrix}^{-1}. \]
The roots of $C(L)$ and the Smith form, $C_S(L)$, are the same since $G(L)$ and
$H(L)$ are unimodular. Further, the diagonality of $C_S(L)$ allows any
individual roots to be factored out into another diagonal matrix. In
particular, unit roots may be factored out. In this example,
\[ C_S(L) = \tilde C_S(L)D(L), \]
where
\[ \tilde C_S(L) = \begin{bmatrix} 1 & 0 \\ 0 & 1 - \tfrac14 L \end{bmatrix} \quad\text{and}\quad D(L) = \begin{bmatrix} 1 & 0 \\ 0 & 1 - L \end{bmatrix}. \]
By construction, $\tilde C_S(L)$ has all roots outside the unit circle (see
Appendix A.3), and so can be inverted. So, equation (4.16) can be
pre-multiplied by $\tilde C_S(L)^{-1}$ to give
\[ \tilde C_S(L)^{-1}G(L)x_t = D(L)H(L)^{-1}\varepsilon_t. \]
Through $D(L)$, the presence of a unit root is now much more apparent than
was the case in the original VMA expression.
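The Smith form of this 2 × 2 example can be verified directly. The sketch below is illustrative only (the helper names are assumptions, not from the text): it checks that $G(L)C(L)H(L)$ equals the diagonal matrix $C_S(L)$, that $G(L)$ and $H(L)$ have determinant 1, and that the non-trivial diagonal element factors as $(1 - \tfrac14 L)(1 - L)$.

```python
from fractions import Fraction as F

def poly(*coeffs):
    """Scalar lag polynomial c0 + c1*L + ... as a list of exact fractions."""
    return [F(c) for c in coeffs]

def poly_mul(a, b):
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [F(0)] * (n - len(a))
    b = b + [F(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def mat_mul(A, B):
    """Product of two matrices whose entries are lag polynomials."""
    out = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            acc = [F(0)]
            for k in range(len(B)):
                acc = poly_add(acc, poly_mul(A[i][k], B[k][j]))
            row.append(trim(acc))
        out.append(row)
    return out

def det2(M):
    """Determinant of a 2 x 2 polynomial matrix."""
    minus_bc = [-x for x in poly_mul(M[0][1], M[1][0])]
    return trim(poly_add(poly_mul(M[0][0], M[1][1]), minus_bc))

C = [[poly(1, "-3/4"), poly(0, -1)],
     [poly(0, "-1/8"), poly(1, "-1/2")]]
G = [[poly(1), poly(-6)],
     [poly(0, "1/8"), poly(1, "-3/4")]]
H = [[poly(1), poly(6, -2)],      # -(2L - 6) = 6 - 2L
     [poly(0), poly(1)]]
C_S = [[poly(1), poly(0)],
       [poly(0), poly(1, "-5/4", "1/4")]]

GCH = mat_mul(mat_mul(G, C), H)
unit_root_factor = trim(poly_mul(poly(1, "-1/4"), poly(1, -1)))

print(GCH == C_S, det2(G), det2(H), unit_root_factor == C_S[1][1])
```

Because $G(L)$ and $H(L)$ have constant determinant, the determinant of $C(L)$ equals that of $C_S(L)$, which is why the two share the same roots.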

4.3.1.1 Reparameterizing a VMA in differences


A further stage in the decomposition of $C(L)$, useful when the VMA describes
the differences of a process, is to isolate the differencing factor as a scalar
term. To do this in the context of the example, define
\[ D^*(L) = \begin{bmatrix} (1 - L) & 0 \\ 0 & 1 \end{bmatrix} \]
so that
\[ D^*(L)D(L) = \begin{bmatrix} (1 - L) & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & (1 - L) \end{bmatrix} = \begin{bmatrix} (1 - L) & 0 \\ 0 & (1 - L) \end{bmatrix} = (1 - L)I_2, \]
where $I_2$ is the 2 × 2 identity matrix. Clearly, such a simplification will
always be available when $D(L)$ has diagonal elements that are either 1 or
$\Delta = (1 - L)$. Thus
\[ D^*(L)\tilde C_S(L)^{-1}G(L)x_t = \Delta H(L)^{-1}\varepsilon_t. \]
Since $\Delta$ is a scalar, this may be rewritten
\[ H(L)D^*(L)\tilde C_S(L)^{-1}G(L)x_t = \Delta I_2\varepsilon_t. \]
Thus the VMA can be written
\[ H(L)D^*(L)\tilde C_S(L)^{-1}G(L)x_t = \Delta\varepsilon_t. \tag{4.17} \]


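Continuing the example,
\[ \tilde C_S(L)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 - \tfrac14 L \end{bmatrix}^{-1} = \left(1 - \tfrac14 L\right)^{-1}\begin{bmatrix} 1 - \tfrac14 L & 0 \\ 0 & 1 \end{bmatrix}, \]
giving
\[ H(L)D^*(L)\tilde C_S(L)^{-1}G(L) = \left(1 - \tfrac14 L\right)^{-1}\begin{bmatrix} 1 - \tfrac12 L & L \\ \tfrac18 L & 1 - \tfrac34 L \end{bmatrix}, \]
which is a rational lag polynomial matrix.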


Clearly (4.17) is not a VAR because the right-hand side is the difference of a
white noise process, not a white noise process, so it cannot be said that the
VMA has been inverted to form a VAR. It has been inverted as far as possible.
That is, all components involving roots outside the unit circle have been
inverted. Those parts of the VMA operator $C(L)$ that cannot be inverted, the
unit roots, have been isolated and alone remain on the right-hand side. For
convenience, let
\[ K(L) = H(L)D^*(L)\tilde C_S(L)^{-1}G(L) \]
so that
\[ K(L)x_t = \Delta\varepsilon_t. \]
But now consider the case where the original model was a VMA for the
differences of a process, that is $x_t = \Delta y_t$, so that, after
rearrangement,
\[ K(L)\Delta y_t = \Delta\varepsilon_t. \]
Then, apart from initial conditions, the differencing operator can be cancelled
to give
\[ K(L)y_t = \varepsilon_t. \]
This is a VAR in levels corresponding to the VMA in differences
\[ \Delta y_t = C(L)\varepsilon_t. \]

The VAR is of infinite order, but is rational.2
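To see what "infinite order, but rational" means in the 2 × 2 example, the operator $K(L)$ can be expanded as a power series in $L$. The sketch below is an illustration under the assumption that $K(L) = (1 - \tfrac14 L)^{-1}(I + N_1 L)$, with $N_1$ the coefficient on $L$ in the numerator matrix of the example; the function names are hypothetical. It also confirms, coefficient by coefficient, that $K(L)C(L) = (1 - L)I_2$.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def scale(c, X):
    return [[c * a for a in r] for r in X]

I2 = [[F(1), F(0)], [F(0), F(1)]]
ZERO = [[F(0), F(0)], [F(0), F(0)]]
C1_COEF = [[F(-3, 4), F(-1)], [F(-1, 8), F(-1, 2)]]  # coefficient on L in C(L)
N1 = [[F(-1, 2), F(1)], [F(1, 8), F(-3, 4)]]         # coefficient on L in numerator of K(L)

def var_coefficients(J):
    """K_0, ..., K_J of K(L) = (I + N1 L) / (1 - L/4) as a power series."""
    K = [I2]
    for j in range(1, J + 1):
        # 1/(1 - L/4) = sum_k (1/4)^k L^k, so K_j = (1/4)^j I + (1/4)^(j-1) N1.
        K.append(mat_add(scale(F(1, 4) ** j, I2),
                         scale(F(1, 4) ** (j - 1), N1)))
    return K

def product_coefficient(K, j):
    """Coefficient on L^j of K(L) C(L), where C(L) = I + C1_COEF L."""
    out = K[j]
    if j >= 1:
        out = mat_add(out, mat_mul(K[j - 1], C1_COEF))
    return out

K = var_coefficients(8)
largest = [max(abs(a) for row in Kj for a in row) for Kj in K]
print(largest)                       # shrinks by a factor of 4 at each lag
print(product_coefficient(K, 1))     # minus the identity matrix
```

The VAR coefficient matrices never become exactly zero, but they decay geometrically at the rate set by the root of $1 - \tfrac14 L$, so a finite order VAR can approximate the levels representation closely.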



4.3.2 The Smith–McMillan form in general applied to a rational VMA: the Smith–McMillan–Yoo form
The manipulation above starts with a finite order operator. It can be general-
ized by allowing every element of C (L) to be a rational (and hence infinite
order) operator. This requires the generalization of the Smith form to the SM
form, the latter being a diagonalized form of a rational polynomial matrix.3
The SM form has a strong structure. Let C (L) be an n × n rational polynomial
matrix.

Assumption A1: C (L) is rational.


If Assumption A1 holds, there exist a set of elementary row and column
operations represented by unimodular matrices $U(L)$ and $V(L)$ respectively
such that
\[ C_{SM}(L) = U(L)C(L)V(L) \tag{4.18} \]
where $C_{SM}(L)$ is a diagonal rational matrix given by
\[ C_{SM}(L) = \operatorname{diag}\left(\frac{f_i(L)}{g_i(L)}\right) \tag{4.19} \]
where:

(i) $f_i(L)$ and $g_i(L)$ have no common factors;
(ii) $f_i(L)$ is a factor of $f_{i+1}(L)$, $i = 1, 2, \dots, n - 1$;
(iii) $g_{i+1}(L)$ is a factor of $g_i(L)$, $i = 1, 2, \dots, n - 1$.

There are a number of implications of this result, usefully summarized by
Hatanaka (1996). Let $z$ be a general complex argument.

(I1) For any specific value $z_0$ of $z$, the rank of $C(z_0)$ is equal to that
of $C_{SM}(z_0)$.
(I2) If $z_0$ is not a root of $f_{n-r}(z)$ nor of $g_1(z)$, but is a root of
$f_{n-r+1}(z)$, then $z_0$ is a root of $f_{n-r+i}(z)$, $i = 2, 3, \dots, r$.
(I3) The roots of $f_i(z)$, $i = 1, 2, \dots, n$, are the roots of $C(z)$.

If $C(L)$ has any unit roots, then it follows from implications (I2) and (I3)
that they can be associated only with a set of consecutive $f_i(L)$, and that
this sequence must extend to the $(n, n)$th element. (One such case is
associated with $\operatorname{rank}(C(1)) = n - r$; this is defined as a
necessary condition for cointegration.)

Assumption A2: C (1) has rank n – r.


Under Assumption A2 the rank of $C_{SM}(1)$ is $n - r$ by implication (I1).
Therefore $n - r$ of the $f_i(1)$ must be non-zero, meaning that $n - r$ of the
$f_i(L)$ cannot have a unit root. The remaining $f_i(1)$ must be zero, meaning
that the corresponding $f_i(L)$ do have a unit root. Implication (I2)
establishes that those having the unit root must be $f_i(L)$,
$i = n - r + 1, \dots, n$, as otherwise there may be too many unit roots.
This observation is applied by Engle and Yoo (1991) to obtain the
Smith–McMillan–Yoo (SMY) form. Define a set of scalar polynomial lag
operators, $\tilde f_i(L)$, such that $\tilde f_i(1) \ne 0$, and
\[ f_i(L) = (1 - L)^{d_i}\tilde f_i(L), \quad i = 1, 2, \dots, n; \tag{4.20} \]
\[ d_i = 0 \text{ for } i \le n - r \quad\text{and}\quad d_i > 0 \text{ for } i > n - r. \tag{4.21} \]
Since it is diagonal, $C_{SM}(L)$ can be factorized into the product of two
diagonal matrices, one of the divisor polynomials, $g_i(L)$, and one of the
$f_i(L)$. That is
\[ C_{SM}(L) = G(L)^{-1}F(L), \tag{4.22} \]
where
\[ G(L) = \operatorname{diag}(g_i(L)), \tag{4.23} \]
\[ F(L) = \operatorname{diag}(f_i(L)). \tag{4.24} \]
Using equations (4.20) and (4.21), $F(L)$ may be written
\[ F(L) = \tilde F(L)D(L), \]
where $\tilde F(L) = \operatorname{diag}(\tilde f_i(L))$,
\[ D(L) = \begin{bmatrix} I_{n-r} & 0_r' \\ 0_r & \begin{matrix} (1 - L)^{d_{n-r+1}} & 0 & \cdots & 0 \\ 0 & (1 - L)^{d_{n-r+2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1 - L)^{d_n} \end{matrix} \end{bmatrix}, \tag{4.25} \]
and $0_r$ is an $r \times (n - r)$ matrix of zeros. It follows that
\[ C_{SM}(L) = G(L)^{-1}\tilde F(L)D(L). \tag{4.26} \]
By construction, in this expression, the roots of $\tilde F(L)$ are the
non-unit roots of $C(L)$. There is, however, no control over the roots of
$G(L)$. By inverting the unimodular matrices in equation (4.18), $C(L)$ may be
written
\[ C(L) = U(L)^{-1}C_{SM}(L)V(L)^{-1}. \tag{4.27} \]
Substituting (4.26) into (4.27) gives
\[ C(L) = U(L)^{-1}G(L)^{-1}\tilde F(L)D(L)V(L)^{-1} \]
and
\[ G(L)U(L)C(L) = \tilde F(L)D(L)V(L)^{-1}. \]
As long as all the non-unit roots of $C(L)$ lie outside the unit circle, it
follows that all the roots of $\tilde F(L)$ lie outside the unit circle.

Assumption A3: The roots of $C(L)$ are either equal to unity or lie outside the
unit circle.

Then $\tilde F(L)^{-1}$ exists, implying
\[ \tilde F(L)^{-1}G(L)U(L)C(L) = D(L)V(L)^{-1}. \]
Since the roots of $D(L)$ are all unit roots it cannot be inverted. If $C(L)$ is
the matrix lag polynomial of the VMA
\[ x_t = C(L)\varepsilon_t, \tag{4.28} \]
then pre-multiplying by $\tilde F(L)^{-1}G(L)U(L)$ gives
\[ \tilde F(L)^{-1}G(L)U(L)x_t = D(L)V(L)^{-1}\varepsilon_t. \tag{4.29} \]
This makes the presence of the unit roots explicit but is not in VAR form. In
order to take the problem further, specific cases must be considered.

4.3.2.1 The Smith–McMillan–Yoo form and cointegration of order (1, 1)


The starting point for the analysis is the VMA representation in differences,
\[ \Delta x_t = C(L)\varepsilon_t. \tag{4.30} \]
Let assumptions A1, A2 and A3 hold, then from (4.29)
\[ \tilde F(L)^{-1}G(L)U(L)\Delta x_t = D(L)V(L)^{-1}\varepsilon_t. \tag{4.31} \]
Since $C(L)$ is rational, it may be possible to draw out a factor from the
denominator that has a unit root. Mathematically, write
\[ C(L) = \left[\frac{\nu_{i,j}(L)}{\delta_{i,j}(L)}\right]. \]
If any of the $\delta_{i,j}(L)$ polynomials have a unit root, then this can be
factored out of $C(L)$. Suppose $\delta_{m,n}(L) = (1 - L)\delta^*_{m,n}(L)$, so
that it has a unit root, then:
\[ C(L) = (1 - L)^{-1}C^*(L) \tag{4.32} \]
where
\[ C^*(L) = \left[\frac{\tilde\nu_{i,j}(L)}{\tilde\delta_{i,j}(L)}\right], \qquad \tilde\nu_{i,j}(L) = \begin{cases} \nu_{i,j}(L) & \text{if } i = m,\ j = n \\ (1 - L)\nu_{i,j}(L) & \text{otherwise} \end{cases} \]
and
\[ \tilde\delta_{i,j}(L) = \begin{cases} \delta^*_{i,j}(L) & \text{if } i = m,\ j = n \\ \delta_{i,j}(L) & \text{otherwise.} \end{cases} \]
Substituting (4.32) into (4.30) gives
\[ \Delta x_t = (1 - L)^{-1}C^*(L)\varepsilon_t \]
or
\[ \Delta^2 x_t = C^*(L)\varepsilon_t. \]
This gives rise to $x_t$ being I(2), in direct contradiction of the assumption
that this process is CI(1, 1). The following assumption is therefore required
to exclude this possibility.

Assumption A4: All the roots of the denominator polynomials of the elements of
$C(L)$, $\delta_{i,j}(L)$, $i, j = 1, 2, \dots, n$, must lie outside the unit
circle.

This assumption is worded so as to exclude not only unit roots, but also any
roots on or inside the unit circle. Thus assumption A4 is that all the poles of
$C(L)$ lie outside the unit circle.4 A more fundamental way of justifying this
assumption is to recognize that if $\delta_{i,j}(L)$ has any roots on or inside
the unit circle, then the coefficients of the ratio
$\nu_{i,j}(L)/\delta_{i,j}(L)$ of numerator to denominator do not converge and
so, strictly speaking, the operator is not even defined, just as it can be
argued that the operator $(1 - L)$ cannot be inverted. In other words, it is
meaningless for $C(L)$ to have any poles on or inside the unit circle.5
Assumption A4 implies that all the roots of $G(L)$ in equation (4.31) lie
outside the unit circle. The objective is to re-express (4.31) (and hence
(4.30)) as a VAR in the levels of the process $x_t$. In order to do this it is
necessary to find a way of cancelling the differencing operator from the
left-hand side of (4.31). Since $V(L)$ has no unit roots (because it is
unimodular), it is necessary and sufficient to find a matrix $D^*(L)$ such that
\[ D^*(L)D(L) = \Delta I \tag{4.33} \]
since then, pre-multiplying (4.31) by $D^*(L)$ gives (apart from initial values)
\[ D^*(L)\tilde F(L)^{-1}G(L)U(L)\Delta x_t = D^*(L)D(L)V(L)^{-1}\varepsilon_t = \Delta V(L)^{-1}\varepsilon_t, \tag{4.34} \]
\[ V(L)D^*(L)\tilde F(L)^{-1}G(L)U(L)x_t = \varepsilon_t, \tag{4.35} \]
which is of the required VAR form. However, such a $D^*(L)$ will not be
available for all $D(L)$ of the form given in equation (4.25). To see what is
required, write
\[ \bar D(L) = \begin{bmatrix} (1 - L)^{d_{n-r+1}} & 0 & \cdots & 0 \\ 0 & (1 - L)^{d_{n-r+2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1 - L)^{d_n} \end{bmatrix} \tag{4.36} \]
so that
\[ D(L) = \begin{bmatrix} I_{n-r} & 0_r' \\ 0_r & \bar D(L) \end{bmatrix} \tag{4.37} \]
and recall that $d_{n-r+i} \ge 1$ for $i = 1, 2, \dots, r$. Partitioning
$D^*(L)$ conformably as
\[ D^*(L) = \begin{bmatrix} D^*_{1,1}(L) & D^*_{1,2}(L) \\ D^*_{2,1}(L) & D^*_{2,2}(L) \end{bmatrix} \]
gives
\[ D^*(L)D(L) = \begin{bmatrix} D^*_{1,1}(L) & D^*_{1,2}(L)\bar D(L) \\ D^*_{2,1}(L) & D^*_{2,2}(L)\bar D(L) \end{bmatrix} = \Delta I, \tag{4.38} \]
where
\[ D^*_{1,1}(L) = \Delta I_{n-r}, \quad D^*_{1,2}(L) = 0, \quad D^*_{2,1}(L) = 0, \quad D^*_{2,2}(L) = I_r. \tag{4.39} \]
The constraints on $D^*_{i,j}(L)$ in (4.39) follow from the matrix equivalence
of (4.33) and (4.38). Given (4.37), $D^*_{1,1}(L)$, $D^*_{1,2}(L)$ and
$D^*_{2,1}(L)$ impose no restrictions on $\bar D(L)$. However,
$D^*_{2,2}(L)\bar D(L) = \Delta I_r$ implies $D^*_{2,2}(L) = I_r$,6
$\bar D(L) = \Delta I_r$, and hence
\[ D(L) = \begin{bmatrix} I_{n-r} & 0_r' \\ 0_r & \Delta I_r \end{bmatrix}. \tag{4.40} \]
That is, if $x_t$ is CI(1, 1) then $D(L)$ of (4.31) must be given by (4.40).7
Furthermore, from the conditions on $D^*(L)$ in (4.39),
\[ D^*(L) = \begin{bmatrix} \Delta I_{n-r} & 0_r' \\ 0_r & I_r \end{bmatrix}. \tag{4.41} \]
As a result $D^*(L)D(L) = \Delta I$ and substituting into (4.35) gives the VAR
in levels corresponding to the VMA in differences when the variables are
CI(1, 1).
This illustrates that if $x_t$ is CI(1, 1) with cointegrating rank $r$
(Assumption A2), then the system may be represented either as a VMA in
$\Delta x_t$ or a VAR in $x_t$, providing the VMA is rational (Assumption A1).
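As a small worked illustration (the dimensions $n = 3$ and $r = 1$ are chosen here only as an example), the two diagonal matrices in the CI(1, 1) case are
\[ D(L) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \Delta \end{bmatrix}, \qquad D^*(L) = \begin{bmatrix} \Delta & 0 & 0 \\ 0 & \Delta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad D^*(L)D(L) = \Delta I_3. \]
Setting $L = 1$, so that $\Delta = 0$, gives $D(1) = \operatorname{diag}(1, 1, 0)$ of rank $n - r = 2$ and $D^*(1) = \operatorname{diag}(0, 0, 1)$ of rank $r = 1$, with $D(1)D^*(1) = 0$. Each unit root removed from the MA side by $D(L)$ is matched by a zero row in the complementary matrix.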
The SMY form of the VMA operator is given by
\[ C(L) = U(L)^{-1}C_{SM}(L)V(L)^{-1}, \tag{4.42} \]
\[ C_{SM}(L) = G(L)^{-1}\tilde F(L)D(L), \qquad D(L) = \begin{bmatrix} I_{n-r} & 0_r' \\ 0_r & \Delta I_r \end{bmatrix}, \]
where $U(L)$ and $V(L)$ are unimodular matrices corresponding to sets of
elementary row and column operations respectively.
In summary, the SMY form consists of the factorization of all the unit
roots from the VMA operator ($C(L)$) in such a way (as $D(L)$) that, by
pre-multiplication by an appropriate matrix ($D^*(L)$), a single differencing
operator ($\Delta$) may be isolated on the MA side of the equation. This may
then be cancelled with the differencing operator on the AR side where the
original VMA is for a differenced process. This is the process represented in
(4.34) leading to the final representation of (4.35).

4.3.3 Cointegrating vectors in the VMA and VAR representations of CI(1, 1)
When a CI(1, 1) system is represented in VMA form, the rank of $C(1)$ is
$n - r$. The $n \times 1$ cointegrating vectors, $\beta$, are those such that
\[ \beta' C(1) = 0. \tag{4.43} \]
There are $r$ such vectors that are linearly independent. The space of such
vectors is the null space (of the columns) of $C(1)$. This can be compared with
the corresponding VAR representation. For convenience, put
\[ A(L) = V(L)D^*(L)\tilde F(L)^{-1}G(L)U(L) \tag{4.44} \]
so that the VAR form of (4.35) may be written
\[ A(L)x_t = \varepsilon_t. \]
From (4.27) and (4.42)
\[ C(L) = U(L)^{-1}G(L)^{-1}\tilde F(L)D(L)V(L)^{-1} \]
and hence
\[ C(1) = U(1)^{-1}G(1)^{-1}\tilde F(1)D(1)V(1)^{-1}, \]
where $U(1)$, $G(1)$, $\tilde F(1)$ and $V(1)$ are all of full rank, while
(4.44) implies
\[ A(1) = V(1)D^*(1)\tilde F(1)^{-1}G(1)U(1). \tag{4.45} \]
It is also straightforward to see that
\[ C(1)A(1) = U(1)^{-1}G(1)^{-1}\tilde F(1)D(1)D^*(1)\tilde F(1)^{-1}G(1)U(1) \tag{4.46} \]
and
\[ A(1)C(1) = V(1)D^*(1)D(1)V(1)^{-1}. \tag{4.47} \]
Now replacing $L$ by 1 in (4.40) and (4.41),
\[ D(1) = \begin{bmatrix} I_{n-r} & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad D^*(1) = \begin{bmatrix} 0 & 0 \\ 0 & I_r \end{bmatrix}, \]
so clearly $\operatorname{rank}(D(1)) = n - r$ and
$\operatorname{rank}(D^*(1)) = r$. Thus
\[ \operatorname{rank}(A(1)) = \operatorname{rank}(D^*(1)) = r. \]
In addition,
\[ D(1)D^*(1) = D^*(1)D(1) = 0 \]
and substituting into (4.46) and (4.47) gives
\[ C(1)A(1) = A(1)C(1) = 0. \]
It follows that the rank of $A(1)$ is $r$, the cointegrating rank, and its rows
are cointegrating vectors and span the space of cointegrating vectors (meaning
all cointegrating vectors can be constructed from a linear combination of the
rows of $A(1)$).
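These rank results can be illustrated with the 2 × 2 example of section 4.3.1, treating its $C(L)$ as the VMA operator of a differenced process so that $n = 2$ and $r = 1$. The sketch below is illustrative only: it assumes $A(1)$ is obtained by evaluating the rational operator $K(L)$ of that example at $L = 1$, and the variable names are hypothetical.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# C(1) for C(L) = [[1 - 3/4 L, -L], [-1/8 L, 1 - 1/2 L]].
C1 = [[F(1, 4), F(-1)],
      [F(-1, 8), F(1, 2)]]

# Numerator of K(L) at L = 1, then divide by (1 - 1/4) to obtain A(1).
N_AT_1 = [[F(1, 2), F(1)],
          [F(1, 8), F(1, 4)]]
A1 = [[a / F(3, 4) for a in row] for row in N_AT_1]

ZERO = [[F(0), F(0)], [F(0), F(0)]]
det_A1 = A1[0][0] * A1[1][1] - A1[0][1] * A1[1][0]

# Both rows of A(1) are multiples of (1, 2), the candidate cointegrating vector.
beta = [F(1), F(2)]
beta_C1 = [sum(beta[i] * C1[i][j] for i in range(2)) for j in range(2)]

print(A1, det_A1, beta_C1)
```

Here $A(1)$ is non-zero with zero determinant, so its rank is $r = 1$, and the product of $A(1)$ and $C(1)$ vanishes in either order, in line with (4.46) and (4.47).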

4.3.3.1 A(L) as partial inverse of C(L) in the CI(1, 1) case


From the definitions of $A(L)$ and $C(L)$,
\[ A(L)C(L) = C(L)A(L) = \Delta I_n. \tag{4.48} \]
When the VMA in differences is considered then
\[ \Delta x_t = C(L)\varepsilon_t. \]
Pre-multiplying this by $A(L)$,
\[ A(L)\Delta x_t = A(L)C(L)\varepsilon_t = \Delta\varepsilon_t, \]
which, on cancelling the differencing operator, gives the VAR form.
Pre-multiplying again by $C(L)$ reverses the transformation:
\[ C(L)A(L)x_t = C(L)\varepsilon_t, \qquad \Delta x_t = C(L)\varepsilon_t, \]
regenerating the VMA. Broadly speaking, then, the problem that has been
solved to show that the VMA in differences can be expressed as a VAR in
levels is to find a matrix $A(L)$ such that equation (4.48) holds. The solution
is (4.44).8

4.3.4 Equivalence of VAR and VMA representations in the CI(1, 1) case


It has been shown that the VMA in differences representation of a CI(1, 1)
system implies a VAR in levels as long as assumptions A1–A4 hold. It is also
straightforward to move back to the VMA representation again, since the VAR
operator $A(L)$ of (4.44) also satisfies the assumptions, with $r$ replacing
$n - r$ throughout. This follows since
$A(L) = V(L)D^*(L)\tilde F(L)^{-1}G(L)U(L)$ is rational (due to the presence of
$\tilde F(L)^{-1}$), with $A(1)$ of rank $r$. Its roots are those of $G(L)$
(all outside the unit circle because these are the poles of $C(L)$) and those
of $D^*(L)$ (unit roots). Its poles are the roots of $\tilde F(L)$, and so are
all outside the unit circle.
Now consider any other VAR in levels representation of a CI(1, 1) system,
say
\[ \tilde A(L)x_t = \varepsilon_t. \]
Then as long as $\tilde A(L)$ satisfies assumptions A1–A4, there exists a
matrix $\tilde C(L)$ such that
$\tilde A(L)\tilde C(L) = \tilde C(L)\tilde A(L) = \Delta I$ and, hence,
pre-multiplying by $\tilde C(L)$, the VAR becomes
\[ \tilde C(L)\tilde A(L)x_t = \tilde C(L)\varepsilon_t \Rightarrow \Delta x_t = \tilde C(L)\varepsilon_t, \]
which is a VMA representation. By arguments similar to those above,
$\tilde C(L)$ will also satisfy the assumptions. It is therefore the case that,
among the class of models having operators obeying assumptions A1–A4, the VMA
in differences and VAR in levels are equivalent representations of a CI(1, 1)
system, and that this sub-class of models is closed.

4.4 Johansen’s VAR representation of cointegration

The Engle–Granger–Yoo approach begins by assuming, that is imposing,
that the univariate processes are I(1) and that the vector moving average
process has reduced rank. Johansen’s (1995) approach reflects the assumption
of Sims (1980) that the VAR, though not necessarily the correct
underlying process, may in practice be the only type of model that can
be reliably identified and estimated. This approach also eliminates a
dichotomy that existed prior to our knowledge of cointegration, between
dynamic time-series models that derive from the LSE approach to econometrics
via Hendry and Sargan and the approach based on expectations that views the
VAR as a fundamental reduced form. The former approach emphasizes the role of
the underlying Data Generation Process (DGP) in modelling complex agent
interaction at an aggregate level, with the error correction revealing the
long-run theoretical model. The VAR is a natural extension of the univariate
time series approach to the analysis of the properties of a vector of time
series. Johansen has extended the time series methodology of the VAR to
incorporate long-run relationships associated with cointegration, and has
provided an approach to estimation and testing which determines the conditions
necessary on the VAR for the processes to be I(1) and cointegrated. The
required conditions are more complex than those for the VMA in differences,
but the benefit lies in facility of estimation and an inferential procedure
that derives from the conventional maximum likelihood approach, both in
confirming cointegration and in testing theoretical propositions on
parameters.

The starting point is a VAR where the intercept has been set to zero for
simplicity. That is
\[ A(L)x_t = \varepsilon_t, \tag{4.49} \]
where
\[ A(L) = I + \sum_{i=1}^{p} A_i L^i. \]
It is also assumed that all the roots of $A(L)$ are either outside the unit
circle or equal to unity. Thus while non-stationarity is allowed, this can only
be due to standard unit roots.9 This VAR may be written
\[ x_t + \sum_{i=1}^{p} A_i x_{t-i} = \varepsilon_t \tag{4.50} \]
and reparameterized as the VECM
\[ \Delta x_t + \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} = \Pi x_{t-1} + \varepsilon_t, \tag{4.51} \]
where
\[ \Pi = -\left(I + \sum_{i=1}^{p} A_i\right) = -A(1) \quad\text{and}\quad \Gamma_i = -\sum_{j=i+1}^{p} A_j. \]
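The reparameterization from (4.50) to (4.51) is an exact algebraic identity, which can be checked numerically. The sketch below is a hedged illustration, not the authors' code: it assumes the sign convention $A(L) = I + \sum_i A_i L^i$ with the $\Gamma_i$ terms on the left-hand side, under which $\Pi = -A(1)$ and $\Gamma_i = -\sum_{j=i+1}^{p} A_j$. The function names and the numerical values are hypothetical.

```python
from fractions import Fraction as F

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def zeros(n):
    return [[F(0)] * n for _ in range(n)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def neg(X):
    return [[-a for a in r] for r in X]

def mat_vec(X, v):
    return [sum(a * b for a, b in zip(r, v)) for r in X]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def vec_sub(u, v):
    return [a - b for a, b in zip(u, v)]

def vecm_from_var(A):
    """Map [A_1, ..., A_p] of x_t + sum_i A_i x_{t-i} = eps_t to (Pi, [Gamma_i])."""
    n = len(A[0])
    total = identity(n)
    for A_i in A:
        total = mat_add(total, A_i)
    Pi = neg(total)                      # Pi = -A(1)
    Gammas = []
    for i in range(1, len(A)):
        tail = zeros(n)
        for A_j in A[i:]:                # A_{i+1}, ..., A_p
            tail = mat_add(tail, A_j)
        Gammas.append(neg(tail))         # Gamma_i = -(A_{i+1} + ... + A_p)
    return Pi, Gammas

def var_side(A, xs):
    """x_t + sum_i A_i x_{t-i}, with xs = [..., x_{t-1}, x_t]."""
    out = xs[-1]
    for i, A_i in enumerate(A, start=1):
        out = vec_add(out, mat_vec(A_i, xs[-1 - i]))
    return out

def vecm_side(Pi, Gammas, xs):
    """Delta x_t + sum_i Gamma_i Delta x_{t-i} - Pi x_{t-1}."""
    def diff(k):                         # Delta x_{t-k}
        return vec_sub(xs[-1 - k], xs[-2 - k])
    out = diff(0)
    for i, G_i in enumerate(Gammas, start=1):
        out = vec_add(out, mat_vec(G_i, diff(i)))
    return vec_sub(out, mat_vec(Pi, xs[-2]))

# Hypothetical VAR(3) coefficients and an arbitrary data history x_{t-3}, ..., x_t.
A = [[[F(-1, 2), F(0)], [F(1, 4), F(-1)]],
     [[F(-1, 4), F(1, 8)], [F(0), F(1, 2)]],
     [[F(1, 8), F(0)], [F(0), F(-1, 4)]]]
xs = [[F(3), F(-1)], [F(2), F(5)], [F(-4), F(1)], [F(7), F(2)]]

Pi, Gammas = vecm_from_var(A)
print(var_side(A, xs) == vecm_side(Pi, Gammas, xs))
```

Because the two sides agree for any data history, the VAR and VECM are the same model, and the rank of $\Pi$ can be studied without changing the process being described.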

4.4.1 Cointegration assuming integration of order 1


For simplicity, assume that $x_t \sim I(1)$. Then $\Pi$ must be of reduced
rank, and unless $\Pi = 0$, $x_t$ must be cointegrated. This can be shown by
contradiction by assuming that $\Pi$ is of full rank. First note that the VECM
can be written
\[ \Pi x_{t-1} = \Delta x_t + \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} - \varepsilon_t, \]
which is I(0) since all terms on the right-hand side are I(0) when
$x_t \sim I(1)$. Then $\Pi$ must be of reduced rank, since if this were not the
case then its inverse would exist and
\[ x_{t-1} = \Pi^{-1}\left(\Delta x_t + \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} - \varepsilon_t\right) \sim I(0), \]
which contradicts $x_t \sim I(1)$. The fact that $\Pi x_{t-1} \sim I(0)$ then
establishes cointegration as long as $\Pi \ne 0$, the rows of $\Pi$ being
cointegrating vectors. If $\Pi = 0$ then it is immediate from the VECM that the
process is not cointegrated. Note that $\Pi$ is an $n \times n$ matrix, and let
$\operatorname{rank}(\Pi) = r$ where for cointegration $r < n$, so that $\Pi$
is of reduced rank. Then there exist $n \times r$ matrices $\alpha$ and
$\beta$, both of maximum rank, $r$, such that
\[ \Pi = \alpha\beta'. \tag{4.52} \]
Furthermore, since each row of $\Pi$ is a linear combination of the rows of
$\beta'$, the rows of $\beta'$ are cointegrating vectors. The rank of $\Pi$ is
known as the cointegrating rank of the system. This establishes the following
result.

4.4.1.1 Cointegrated VARs with I(1) processes


Let $x_t \sim I(1)$ and obey the VECM (4.51) with $\operatorname{rank}(\Pi) = r$.
Then:

(i) $0 < r < n$;
(ii) the rows of $\Pi$ are cointegrating vectors;
(iii) the rows of $\beta'$ in the representation of equation (4.52) constitute
a set of linearly independent cointegrating vectors.

4.4.2 Conditions for the VAR process to be I(1) and cointegrated


The difficulty with assuming $x_t \sim I(1)$ is that the order of integration
in the VAR can be greater than 1. It is necessary to establish conditions for
the processes being I(1), to check that these can be satisfied, and to begin to
consider how to handle higher order integrated processes. Some further
preliminaries are necessary.

(i) Defining
\[ \Gamma(L) = I + \sum_{i=1}^{p-1}\Gamma_i L^i, \]
the VECM may be written
\[ \Gamma(L)\Delta x_t = \Pi x_{t-1} + \varepsilon_t. \]
Define also
\[ \Upsilon = \Gamma(1). \tag{4.53} \]
Then $A(L)$ may be written
\[ A(L) = -\Pi + (\Upsilon + \Pi)(1 - L) + \Gamma^*(L)(1 - L)^2, \tag{4.54} \]
where $\Gamma^*(L)$ is a polynomial of order $p - 2$. Thus, substituting (4.54)
into (4.49), the VAR may be written
\[ -\Pi x_t + (\Upsilon + \Pi)\Delta x_t + \Gamma^*(L)\Delta^2 x_t = \varepsilon_t. \tag{4.55} \]
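(ii) For any full rank $n \times r$ ($r \le n$) matrix $\varphi$, define its
orthogonal complement, $\varphi_\perp$, dimensioned $n \times (n - r)$ with
rank $n - r$, such that
\[ \varphi'\varphi_\perp = 0, \]
with the conventions
\[ \varphi_\perp = \begin{cases} 0 & \text{if } r = n \\ I & \text{if } r = 0. \end{cases} \]
There are explicit formulations of $\varphi_\perp$, though sub-blocks of this
matrix are arbitrary. Also define
\[ \bar\varphi = \varphi(\varphi'\varphi)^{-1} \tag{4.57} \]
with the projection matrix
\[ P_\varphi = \varphi(\varphi'\varphi)^{-1}\varphi' = \bar\varphi\varphi' = \varphi\bar\varphi', \tag{4.58} \]
and note that
\[ \bar\varphi'\varphi = I_r = \varphi'\bar\varphi. \]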
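A small worked example may help fix these definitions; the vector used is chosen purely for illustration. Let $n = 2$, $r = 1$ and $\varphi = (1, 2)'$. Then $\varphi'\varphi = 5$, one valid orthogonal complement is $\varphi_\perp = (2, -1)'$, and
\[ \bar\varphi = \tfrac15\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad P_\varphi = \tfrac15\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \qquad P_{\varphi_\perp} = \tfrac15\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}, \]
so that $\bar\varphi'\varphi = 1$ and $P_\varphi + P_{\varphi_\perp} = I_2$. Any non-zero scalar multiple of $(2, -1)'$ serves equally well as $\varphi_\perp$, which is the sense in which the orthogonal complement is not unique, while the projection matrices are unaffected by that choice.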

Johansen’s key (necessary and sufficient) condition on the VAR such that
the processes are integrated of order 1 and cointegrated, is expressed in terms
of ϒ, ⊥ and ⊥. An outline of the derivation of this condition is provided
below.10 The result applies only to VARs the roots of which are either equal to
one or lie outside the unit circle.
The approach used is to split the differenced process, $\Delta x_t$, into components relating to the directions of (potential) cointegration, $\xi_t$ (which occur in differenced form) and non-cointegration, $u_t$ (in levels). The difference process is then cumulated (summed from the first to the $t$th values) to give an equation for the levels, $x_t$. The cumulation results in: the sum of the $u_t$, giving rise to a stochastic trend (a unit root process if $u_t$ is stationary); the transformation of the differences of $\xi_t$ to its levels; and the appearance of an initial value vector (analogous to a constant of integration). To keep the treatment simple, the initial values are ignored (set to zero).11 Since in detail, $\xi_t$ is a set of linear combinations of the components of $x_t$, if both the $u_t$ and the $\xi_t$ are I(0) then $x_t$ is both I(1) (as a result of the stochastic trend involving $u_t$) and cointegrated (because then $\xi_t$ is a linear combination of I(1) variables that is I(0)). So the proof revolves around showing that $u_t$ and the $\xi_t$ are I(0). The condition results from the need for the stationarity of these processes. Having shown this, it is fairly straightforward to show that cointegration of order (1,1) implies the condition, and hence it is established that the condition is both sufficient and necessary.
An outline of the statement and proof is provided here. The result is that a necessary and sufficient condition for $x_t$ to be both I(1) and cointegrated (i.e. CI(1, 1)) is that
\[ \operatorname{rank}(\alpha_\perp'\Upsilon\beta_\perp) = n - r , \tag{4.59} \]
i.e. $\alpha_\perp'\Upsilon\beta_\perp$ is of full rank.
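A small numerical check may help to fix ideas about the rank condition (4.59). The sketch below is illustrative only: it assumes a VAR(1), so that $\Gamma(L) = I$ and hence $\Upsilon = I$, and the matrices `alpha_ok`, `beta_ok`, `alpha_bad` and `beta_bad` are hypothetical. In the second case the implied system has $\Delta x_{1t} = x_{2,t-1} + \varepsilon_{1t}$ with $x_{2t}$ a random walk, so $x_{1t}$ is I(2) and the condition should fail.

```python
import numpy as np

def orth_complement(phi, tol=1e-10):
    """Columns spanning the orthogonal complement of the columns of phi (via the SVD)."""
    u, s, _ = np.linalg.svd(phi, full_matrices=True)
    return u[:, int(np.sum(s > tol)):]

def i1_condition_holds(alpha, beta, upsilon):
    """Check rank(alpha_perp' Upsilon beta_perp) = n - r, i.e. condition (4.59)."""
    n, r = alpha.shape
    m = orth_complement(alpha).T @ upsilon @ orth_complement(beta)
    return bool(np.linalg.matrix_rank(m) == n - r)

upsilon = np.eye(3)  # assumption: VAR(1), so Upsilon = Gamma(1) = I

# Hypothetical case 1: Pi = alpha beta' gives an I(1), cointegrated system.
alpha_ok = np.array([[-0.5], [0.0], [0.0]])
beta_ok = np.array([[1.0], [-1.0], [0.0]])

# Hypothetical case 2: x1 cumulates the random walk x2, so x1 is I(2).
alpha_bad = np.array([[1.0], [0.0], [0.0]])
beta_bad = np.array([[0.0], [1.0], [0.0]])

ok = i1_condition_holds(alpha_ok, beta_ok, upsilon)
bad = i1_condition_holds(alpha_bad, beta_bad, upsilon)
```

Because any two bases of the same complement differ by a non-singular transformation, the rank computed here does not depend on the particular complement returned by the SVD.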
To show why this matrix is important, first decompose the difference process as
\[ \Delta x_t = (P_{\beta_\perp} + P_\beta)\Delta x_t = \beta_\perp(\beta_\perp'\beta_\perp)^{-1}\beta_\perp'\Delta x_t + \beta(\beta'\beta)^{-1}\beta'\Delta x_t . \tag{4.60} \]
The second term on the right-hand side of (4.60) can be rearranged in terms of potentially cointegrating combinations of $x_t$. Define
\[ \xi_t = \beta' x_t , \tag{4.61} \]
these being the potentially cointegrating combinations. Also, arising from the first term on the right-hand side of equation (4.60), define
\[ u_t = \beta_\perp'\Delta x_t . \tag{4.62} \]
Then, from (4.60),
\[ \Delta x_t = \beta_\perp(\beta_\perp'\beta_\perp)^{-1}u_t + \beta(\beta'\beta)^{-1}\Delta\xi_t . \tag{4.63} \]

The process of interest is not $\Delta x_t$ but $x_t$ itself, obtained by summing the difference process up to the current period. When this is done, an initial value is also generated. In addition, in order to reuse $t$ as the index for the current period, a different index has to be used on the process being summed. Thus,
\[ \sum_{i=1}^{t} \Delta x_i = x_t - x_0 . \]
Applying the same operation to the right-hand side of (4.63) yields
\[ x_t - x_0 = \beta_\perp(\beta_\perp'\beta_\perp)^{-1}\sum_{i=1}^{t} u_i + \beta(\beta'\beta)^{-1}(\xi_t - \xi_0) . \tag{4.64} \]
Ignoring the initial values $x_0$ and $\xi_0$, this becomes
\[ x_t = \beta_\perp(\beta_\perp'\beta_\perp)^{-1}\sum_{i=1}^{t} u_i + \beta(\beta'\beta)^{-1}\xi_t . \tag{4.65} \]

From this last equation, since
\[ u_t \sim I(0) \Rightarrow \sum_{i=1}^{t} u_i \sim I(1) , \]
it can be seen that $x_t \sim I(d)$, $d \ge 1$, depending on the order of integration of $\xi_t$. In particular, if $\xi_t \sim I(0)$ then, from (4.65), $x_t$ is the sum of an I(1) and an I(0) process and so is itself I(1). But if $x_t \sim I(1)$ and $\xi_t \sim I(0)$, then by the definition of $\xi_t$ (a set of linear combinations of $x_t$), $x_t$ is also cointegrated. In brief,
\[ u_t \sim I(0),\ \xi_t \sim I(0) \Rightarrow x_t \sim CI(1,1). \]
Thus it is sufficient to show that both $u_t$ and $\xi_t$ are I(0). It is in the process of obtaining this result that condition (4.59) arises.
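The decomposition in (4.65) can be checked on simulated data. The following is a minimal sketch under stated assumptions: a bivariate system with a hypothetical cointegrating vector `beta` = (1, –1)′, white-noise increments `u` for the stochastic trend, and a stationary AR(1) process `xi` for the cointegrating combination, with initial values set to zero. Building $x_t$ from (4.65) and then applying (4.61) and (4.62) should return the two components exactly.

```python
import numpy as np

def bar(m):
    """m_bar = m (m' m)^{-1}."""
    return m @ np.linalg.inv(m.T @ m)

rng = np.random.default_rng(0)
T = 500
beta = np.array([[1.0], [-1.0]])       # hypothetical cointegrating vector
beta_perp = np.array([[1.0], [1.0]])   # orthogonal complement: beta' beta_perp = 0

u = rng.standard_normal((T, 1))        # I(0) increments of the stochastic trend
e = rng.standard_normal((T, 1))
xi = np.zeros((T, 1))                  # stationary AR(1) cointegrating combination
for t in range(1, T):
    xi[t] = 0.5 * xi[t - 1] + e[t]

# Equation (4.65): x_t = beta_perp_bar * sum_{i<=t} u_i + beta_bar * xi_t.
x = np.cumsum(u, axis=0) @ bar(beta_perp).T + xi @ bar(beta).T

xi_back = x @ beta                          # beta' x_t, as in (4.61)
u_back = np.diff(x, axis=0) @ beta_perp     # beta_perp' Delta x_t, as in (4.62)
```

Here `x` wanders like a random walk in each component, while `xi_back` reproduces the stationary series `xi`, which is the sense in which $x_t$ is CI(1, 1).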
Define
\[ \tilde{x}_t = (\xi_t' \;\; u_t')' . \]
If a VAR representation can be found for $\tilde{x}_t$, all the roots of which lie outside the unit circle, then $\tilde{x}_t$ is stationary.12 The required VAR is obtained by:

(i) pre-multiplying equation (4.55) by $\bar{\alpha}'$ to give one new equation, and by $\bar{\alpha}_\perp'$ to obtain another;
(ii) substituting using (4.61) and (4.62) to give equations in $u_t$ and $\xi_t$, though a term in $\Delta^2 x_t$ remains;
(iii) noting that the term in $\Delta^2 x_t$ can be expressed in terms of the differences of $\tilde{x}_t$;
(iv) expressing the resultant equation in terms of $\tilde{x}_t$ only.

The result is the VAR
\[ \tilde{A}(L)\tilde{x}_t = (\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t , \tag{4.66} \]
where the operator $\tilde{A}(L)$ can be written
\[ \tilde{A}(L) = \tilde{A}^{(1)}(L) + \tilde{A}^{(2)}(L)\Delta \tag{4.67} \]
and $\tilde{A}^{(1)}(L)$ is partitioned conformably with $\tilde{x}_t = (\xi_t' \;\; u_t')'$ as
\[ \tilde{A}^{(1)}(L) = \begin{pmatrix} -I & \bar{\alpha}'\Upsilon\bar{\beta}_\perp \\ 0 & \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp \end{pmatrix} . \tag{4.68} \]
It remains to establish that $\tilde{A}(L)$ has all its roots outside the unit circle. This is done in two stages. Firstly it is established that any non-stationarity is due to unit roots (by showing that the roots of $\tilde{A}(L)$ and $A(L)$ are the same, except that the number of unit roots may differ), and then showing that $\tilde{A}(L)$ has no unit roots. To show the relationship between the roots of $\tilde{A}(L)$ and $A(L)$, note that
\[ |\tilde{A}(z)| = (1 - z)^{-(n-r)}\,|A(z)|\,|Q| \tag{4.69} \]
where $Q$ is a (n – r) × (n – r) matrix depending on $\alpha$, $\alpha_\perp'$, $\beta$ and $\beta_\perp$.13 Thus
\[ |A(z)| = 0 \Rightarrow |\tilde{A}(z)| = 0 \quad \text{for } z \ne 1, \]
so all non-unit roots of $A(L)$ are also roots of $\tilde{A}(L)$, but due to the presence of the factor of $(1 - z)^{-(n-r)}$, $z = 1$ may or may not be a root of $\tilde{A}(L)$. Thus if the roots of $A(L)$ are all either outside the unit circle or equal to 1, so are those of $\tilde{A}(L)$.
To show that $\tilde{A}(L)$ has no unit roots, consider $\tilde{A}(1)$. The required condition is $|\tilde{A}(1)| \ne 0$, or equivalently that $\tilde{A}(1)$ should have full rank. From (4.67), $\tilde{A}(1) = \tilde{A}^{(1)}(1)$, and directly from (4.68):
\[ \tilde{A}^{(1)}(1) = \begin{pmatrix} -I & \bar{\alpha}'\Upsilon\bar{\beta}_\perp \\ 0 & \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp \end{pmatrix} . \]
Thus $|\tilde{A}(1)| = (-1)^{r}\,|\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp|$ and since
\[ \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp \text{ full rank} \Rightarrow |\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp| \ne 0 \Leftrightarrow |\tilde{A}(1)| \ne 0, \]
it follows that $\operatorname{rank}(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp) = n - r$ is a sufficient condition for $x_t \sim CI(1, 1)$.


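Necessity is easily established. Start from the assumption that $x_t \sim I(1)$. This means that in the original VAR, $|A(1)| = 0$, that is $A(1)$ is not of full rank. The rank of $A(1)$ is r. Therefore, there exist full rank n × r matrices $\alpha$ and $\beta$ such that $\Pi = \alpha\beta'$ and the variables $\xi_t = \beta' x_t$ and $u_t = \beta_\perp'\Delta x_t$ can be constructed, where $u_t \sim I(0)$ since $x_t \sim I(1)$. It also follows from the VECM
\[ \Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + \varepsilon_t , \]
that $\alpha\beta' x_t$ must be I(0), and hence that $\xi_t = \beta' x_t$ is I(0). Thus the VAR for $\tilde{x}_t = (\xi_t' \;\; u_t')'$, still given by (4.66), must be stationary, so $|\tilde{A}(1)| \ne 0$. But as before,
\[ \tilde{A}(1) = \begin{pmatrix} -I & \bar{\alpha}'\Upsilon\bar{\beta}_\perp \\ 0 & \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp \end{pmatrix} \]
and so $|\tilde{A}(1)| = (-1)^{r}\,|\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp|$. Hence $|\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp| \ne 0$, that is $\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp$ is of full rank.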

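Finally note that
\[ \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp = (\alpha_\perp'\alpha_\perp)^{-1}\alpha_\perp'\Upsilon\beta_\perp(\beta_\perp'\beta_\perp)^{-1} , \]
where $(\alpha_\perp'\alpha_\perp)^{-1}$ and $(\beta_\perp'\beta_\perp)^{-1}$ are full rank (n – r) × (n – r) matrices. Thus
\[ \operatorname{rank}(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp) = \operatorname{rank}(\alpha_\perp'\Upsilon\beta_\perp), \]
so that the rank condition applies equivalently to the simpler matrix $\alpha_\perp'\Upsilon\beta_\perp$, as required.
In summary, when the vector process follows a VAR given by (4.49), and where the only non-stationary roots are unity, $\alpha_\perp'\Upsilon\beta_\perp$ being of full rank is necessary and sufficient for cointegration of order (1,1). In this matrix, $\alpha_\perp$ and $\beta_\perp$ are the orthogonal complements of $\alpha$ and $\beta$ defined by (4.52), and $\Upsilon$ is given by (4.53).14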

4.4.2.1 Discussion
This key condition is undoubtedly difficult to understand from an intuitive point of view. However, practically speaking, its function is to guard against the component processes being I(2). If it is assumed from the outset that the processes are I(1), then the required condition on the VAR is simply that $\Pi$ is of reduced rank. The condition can be used to extend the analysis of cointegrated systems to cases where the processes can be I(2). Having established the condition for I(1) and cointegration, since this is necessary and sufficient, clearly $\alpha_\perp'\Upsilon\beta_\perp$ must be of reduced rank in order for the processes to be of a higher order of integration.

4.4.3 The moving average representation


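To obtain a VMA representation, note that equation (4.65) provides an expression for $x_t$ in terms of $\xi_t$ and $u_t$. Equation (4.66) is a stationary VAR for $\tilde{x}_t$ (i.e. $\xi_t$ and $u_t$) and can therefore be inverted. Thus a solution is available for $\xi_t$ and $u_t$ in terms of the stationary disturbances $\varepsilon_t$. That is
\[ \tilde{x}_t = \tilde{C}(L)(\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t . \tag{4.70} \]
Thus from (4.66), $x_t$ can be expressed as a function of $\varepsilon_t$ through expressing $\xi_t$ and $u_t$ in this way. In equation (4.65) $u_t$ appears as the increment in the stochastic trend, and a little further investigation of this term is potentially useful. Note that
\[ u_t = (0 \;\; I)\tilde{x}_t \tag{4.71} \]
and applying the usual reparameterization $\tilde{C}(z) = \tilde{C}(1) + (1 - z)\tilde{C}^{*}(z)$ where, since $\tilde{C}(z) = \tilde{A}^{-1}(z)$,
\[ \tilde{C}(1) = \tilde{A}^{-1}(1) = \begin{pmatrix} -I & \bar{\alpha}'\Upsilon\bar{\beta}_\perp \\ 0 & \bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp \end{pmatrix}^{-1} = \begin{pmatrix} -I & \bar{\alpha}'\Upsilon\bar{\beta}_\perp(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1} \\ 0 & (\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1} \end{pmatrix} . \tag{4.72} \]
Thus
\[ \begin{aligned} u_t &= (0 \;\; I)\tilde{x}_t = (0 \;\; I)\tilde{C}(L)(\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t \\ &= (0 \;\; I)\left[\tilde{C}(1) + (1 - L)\tilde{C}^{*}(L)\right](\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t \\ &= (0 \;\; I)\tilde{C}(1)(\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t + (0 \;\; I)(1 - L)\tilde{C}^{*}(L)(\bar{\alpha} \;\; \bar{\alpha}_\perp)'\varepsilon_t . \end{aligned} \]
Using equation (4.72), pre-multiplying by $\bar{\beta}_\perp$ and letting $\tilde{C}^{+}(L) = \bar{\beta}_\perp(0 \;\; I)\tilde{C}^{*}(L)(\bar{\alpha} \;\; \bar{\alpha}_\perp)'$, gives:
\[ \bar{\beta}_\perp u_t = \bar{\beta}_\perp(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1}\bar{\alpha}_\perp'\varepsilon_t + \tilde{C}^{+}(L)\Delta\varepsilon_t . \tag{4.73} \]
Summing terms in (4.73) and setting initial values to zero for simplicity:
\[ \bar{\beta}_\perp\sum_{i=1}^{t} u_i = \bar{\beta}_\perp(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1}\bar{\alpha}_\perp'\sum_{i=1}^{t}\varepsilon_i + \tilde{C}^{+}(L)\varepsilon_t . \]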

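This can be substituted for the first term on the right-hand side of (4.65). The remaining term, $\bar{\beta}\xi_t = \beta(\beta'\beta)^{-1}\xi_t$, requires the expression of $\xi_t$ in terms of $\varepsilon_t$. It follows from (4.70) and the fact that $\xi_t = (I \;\; 0)\tilde{x}_t$ that this term may be written
\[ \xi_t = D(L)\varepsilon_t \tag{4.74} \]
\[ \beta(\beta'\beta)^{-1}\xi_t = D^{+}(L)\varepsilon_t . \tag{4.75} \]
Expressions (4.74) and (4.75) can be substituted into (4.65) to give
\[ x_t = \bar{\beta}_\perp(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1}\bar{\alpha}_\perp'\sum_{i=1}^{t}\varepsilon_i + \tilde{C}^{+}(L)\varepsilon_t + D^{+}(L)\varepsilon_t . \tag{4.76} \]
This is further simplified by setting $C = \bar{\beta}_\perp(\bar{\alpha}_\perp'\Upsilon\bar{\beta}_\perp)^{-1}\bar{\alpha}_\perp'$ and $C(L) = \tilde{C}^{+}(L) + D^{+}(L)$. Hence
\[ x_t = C\sum_{i=1}^{t}\varepsilon_i + C(L)\varepsilon_t . \]
This is the VMA representation corresponding to the VAR (4.49).15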
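The long-run impact matrix $C$ of the VMA representation can be computed directly from $\alpha$, $\beta$ and $\Upsilon$. The sketch below is illustrative only: the values of `alpha` and `beta` are hypothetical, and a VAR(1) is assumed so that $\Upsilon = I$. It uses the form $C = \beta_\perp(\alpha_\perp'\Upsilon\beta_\perp)^{-1}\alpha_\perp'$, which coincides with the barred expression in the text because the normalizing factors cancel.

```python
import numpy as np

def orth_complement(phi, tol=1e-10):
    """Columns spanning the orthogonal complement of the columns of phi (via the SVD)."""
    u, s, _ = np.linalg.svd(phi, full_matrices=True)
    return u[:, int(np.sum(s > tol)):]

# Hypothetical adjustment and cointegrating vectors for n = 3, r = 1.
alpha = np.array([[-0.5], [0.2], [0.0]])
beta = np.array([[1.0], [-1.0], [0.0]])
upsilon = np.eye(3)   # assumption: VAR(1), so Upsilon = Gamma(1) = I

a_perp = orth_complement(alpha)
b_perp = orth_complement(beta)

# Long-run impact matrix multiplying the cumulated disturbances in the VMA.
C = b_perp @ np.linalg.inv(a_perp.T @ upsilon @ b_perp) @ a_perp.T

# beta' C = 0: the stochastic trends are annihilated by the cointegrating vector.
# C alpha = 0: shocks in the direction of alpha have no permanent effect.
rank_C = np.linalg.matrix_rank(C)
```

The rank of `C` is n – r, the number of independent stochastic trends driving the system.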

4.5 Johansen’s approach to testing for cointegration in systems

The Johansen methodology is based on the VAR representation and referred


to as a maximum likelihood approach. This is because the underlying estima-
tion method which provides the Johansen test statistics is in fact maximum
likelihood.16

4.5.1 Testing for reduced rank and estimating cointegrating vectors


4.5.1.1 Review of source of reduced rank in cointegrated systems
Consider the VECM
\[ \Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + \varepsilon_t . \]
The rank conditions on $\Pi$ are discussed above: the condition for $x_t$ to be CI(1, 1), given $x_t \sim I(1)$, is rank($\Pi$) = r, where 0 < r < n. The cointegrating rank r is the number of linearly independent cointegrating vectors of the system.17
The Johansen approach to testing for cointegration (that is, testing to
obtain an estimate of the cointegrating rank) exploits these properties. In the
sections below, the two commonly used tests of Johansen are derived.

4.5.1.2 Using eigenvalues and eigenvectors in cointegration analysis


Eigenvalues can be regarded as a set of summary statistics of a matrix from
which a number of key properties can be determined. Each eigenvalue is asso-
ciated with an eigenvector. Any matrix can be expressed in terms of its eigen-
values and eigenvectors.
In Johansen’s cointegration analysis, the key statistics are a set of non-nega-
tive eigenvalues. In testing for cointegration, or more accurately determining
the cointegrating rank, interest focuses on those that are significant, that is,
significantly greater than zero. Because of the association of an eigenvector
with an eigenvalue, an eigenvector is insignificant if its eigenvalue is
insignificantly different from zero. Thus the significance of an eigenvector can
be tested through the significance of its eigenvalue. If the eigenvalue is significant it is meaningful to go on to calculate, and work with, the corresponding eigenvector. In the problem that arises in cointegration analysis, the
eigenvectors are the cointegrating vectors.

4.5.2 The removal of nuisance parameters


The matrix that characterizes the cointegration properties of the system is $\Pi$. All other parameters of the model and associated variables are irrelevant. These nuisance terms can be removed by regressing both $\Delta x_t$ and $x_{t-1}$ on $\Delta x_{t-i}$, i = 1, 2, …, p – 1 using ordinary least squares. The residuals from these regressions will be purged of their correlation with the lagged differences. Let $R_{0,t}$ and $R_{1,t}$ be the n × 1 residual vectors from the regressions with $\Delta x_t$ and $x_{t-1}$ respectively as dependent variables. Then the least squares estimate of $\Pi$ in
\[ R_{0,t} = \Pi R_{1,t} + \text{error} \]
is the same as that from (4.51).18 From the point of view of maximum likelihood estimation, this is equivalent to concentrating the likelihood function. As long as a Gaussian likelihood is used, the maximum likelihood estimator of $\Pi$ is also unaffected, even under the restriction that the matrix is of reduced rank, r, r < n. That is, the estimates of $\alpha$ and $\beta$ in
\[ \Pi = \alpha\beta' \]
are unaffected. This is explained in Appendix D. The requirement for Gaussianity means that the disturbances, $\varepsilon_t$, must be jointly normally distributed, an important assumption.
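The equivalence between the concentrated and the full least squares regressions can be verified numerically. The following is a minimal sketch, assuming simulated random-walk data, a lag length of p = 2 and no deterministic terms; all variable names are hypothetical. The residuals `R0` and `R1` come from regressing $\Delta x_t$ and $x_{t-1}$ on $\Delta x_{t-1}$, and the estimate of $\Pi$ from regressing `R0` on `R1` is compared with the one from the unconcentrated regression.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 200, 2
x = np.cumsum(rng.standard_normal((T, n)), axis=0)   # hypothetical I(1) data

dx = np.diff(x, axis=0)
Y = dx[1:]        # Delta x_t
X1 = x[1:-1]      # x_{t-1}
Z = dx[:-1]       # Delta x_{t-1}, the only lagged difference when p = 2

def residuals(y, z):
    """Residuals from the least squares regression of y on z."""
    coef = np.linalg.lstsq(z, y, rcond=None)[0]
    return y - z @ coef

R0 = residuals(Y, Z)    # Delta x_t purged of the lagged differences
R1 = residuals(X1, Z)   # x_{t-1} purged of the lagged differences

# Concentrated regression: R0_t = Pi R1_t + error.
pi_conc = np.linalg.lstsq(R1, R0, rcond=None)[0].T

# Full regression of Delta x_t on x_{t-1} and Delta x_{t-1}; the first n rows give Pi'.
pi_full = np.linalg.lstsq(np.hstack([X1, Z]), Y, rcond=None)[0][:n].T
```

The two estimates agree up to rounding error, which is the usual partitioned-regression result underlying the concentration step.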

4.5.3 Estimating potentially cointegrating relations


The residual vectors inherit the integration properties of the dependent vari-
ables since all explanatory variables in the auxiliary regressions generating
them are stationary. Thus R0,t ~ I(0), R1,t ~ I(1). One way of motivating the
problem of the determination of the cointegrating vectors is to observe that
the correlation between an I(0) variable and a linear combination of I(1) vari-
ables will be low unless the particular linear combination of the I(1) variables
is itself I(0). In such a case, the coefficients of the linear combination consti-
tute a cointegrating vector. So, by choosing the linear combination to maxi-
mize the correlation, if a cointegrating combination is available, the procedure
should select this combination. Of course, it may not exist, in which case the
correlation between R0,t and all linear combinations of R1, t will be low. Or,
there may be more than one, in which case some will be more correlated with
the stationary residuals than others. This problem is closely related to that of
canonical correlation applied to R0,t and R1,t. Canonical correlation involves
the transformation of each vector using linear combinations of their elements
such that the transformed vectors have the identity matrix as variance –
covariance matrix and the elements of the transformed vectors have a
diagonal correlation matrix. This is explained in detail in Appendix D. The
resultant correlations are known as canonical correlations. At first glance, the
canonical correlation problem seems to apply more structure than is required,

and does not seem to address the issue of correlation maximization. However,
a close examination of the relationship between this and the complete
maximum likelihood problem reveals that in fact the problems yield the same
solutions (see Appendix D). Both the maximum likelihood (ML) and canonical correlation problem deal with the sample covariance matrix of the residual vectors. Define the sample covariance matrices
\[ S_{i,j} = T^{-1}\sum_{t=1}^{T} R_{i,t}R_{j,t}' , \qquad i, j = 0, 1, \]
where T is the sample size. In each case the problem reduces to an examination of the eigenvalues and eigenvectors of the matrix $S_{1,1}^{-1}S_{1,0}S_{0,0}^{-1}S_{0,1}$. The following points are relevant:

(i) The eigenvalues of this problem are the squares of the canonical
correlations.
(ii) The corresponding eigenvectors are the potential cointegrating vectors, $\beta$.
(iii) The maximized value of the log-likelihood function depends only on the r largest eigenvalues and $S_{0,0}$, where the term in $S_{0,0}$ is additive and so does not appear in expressions for the difference between maximized log-likelihood functions for different r.
(iv) Estimates of $\alpha$, called the adjustment coefficients, are available as a function of the estimates of $\beta$ and $S_{0,1}$.

The eigenvalue problem in the ML context is often expressed in generalized form as
\[ |\lambda S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}| = 0 \]
where the eigenvalues lie in [0,1] and are denoted in ordered form as $0 \le \lambda_n \le \lambda_{n-1} \le \dots \le \lambda_2 \le \lambda_1 \le 1$. This eigenvalue problem is equivalent to the more usual problem
\[ |\lambda I - S_{1,1}^{-1}S_{1,0}S_{0,0}^{-1}S_{0,1}| = 0 \tag{4.77} \]
when $S_{1,1}$ is non-singular (Dhrymes 1984), and as such is the same as that for the canonical correlation problem.
For each eigenvalue $\lambda_i$ that satisfies (4.77), there is an associated eigenvector, $v_i$, that is a solution to the following homogeneous system of linear equations:19
\[ (\lambda_i I - S_{1,1}^{-1}S_{1,0}S_{0,0}^{-1}S_{0,1})v_i = 0 \]
or
\[ \lambda_i v_i = S_{1,1}^{-1}S_{1,0}S_{0,0}^{-1}S_{0,1}v_i . \tag{4.78} \]


For i = 1, …, r, the eigenvectors define r cointegrating relationships, that is $\beta_i = v_i$, and so
\[ \lambda_i\beta_i = S_{1,1}^{-1}S_{1,0}S_{0,0}^{-1}S_{0,1}\beta_i . \]
It follows from the algebra of the problem that $S_{0,1}S_{1,1}^{-1}$ is an estimator of $\Pi$ so an estimate of $\alpha$ can be obtained from that of $\beta$ since
\[ \alpha = S_{0,1}\beta . \]
In addition:
\[ \lambda_i = \alpha_i' S_{0,0}^{-1}\alpha_i \tag{4.79} \]
where $\alpha_i$ is the ith column of $\alpha$. This result follows only where the normalization $\beta' S_{1,1}\beta = I$ is used. Equation (4.79) shows that a test of $\lambda_i = 0$ is equivalent to a test of $\alpha_i = 0$, that is, that the ith column of $\alpha$ is zero. The restriction $\alpha_i = 0$ means that the ith potentially cointegrating combination does not appear in the VECM, the reason being either that it is not a stationary combination, or that it is not significantly linearly independent of the combinations associated with the larger eigenvalues, $\lambda_j$, j < i.
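The eigenvalue calculations can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' implementation: the residual matrices `R0` and `R1` are simulated stand-ins, and the generalized problem is solved through a Cholesky factor of $S_{1,1}$, which is one convenient route that delivers eigenvectors already normalized so that $\beta' S_{1,1}\beta = I$. It then checks (4.78), the relation $\alpha = S_{0,1}\beta$, and the identity $\lambda_i = \alpha_i' S_{0,0}^{-1}\alpha_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 300, 3
# Hypothetical stand-ins for the residual vectors R_{1,t} (levels) and R_{0,t} (differences).
R1 = np.cumsum(rng.standard_normal((T, n)), axis=0)
R0 = rng.standard_normal((T, n)) - 0.1 * R1[:, [0]]

# Sample covariance matrices S_{i,j} = T^{-1} sum_t R_{i,t} R_{j,t}'.
S00 = R0.T @ R0 / T
S01 = R0.T @ R1 / T
S10 = S01.T
S11 = R1.T @ R1 / T

# Generalized problem |lambda S11 - S10 S00^{-1} S01| = 0, made symmetric via S11 = L L'.
N = S10 @ np.linalg.solve(S00, S01)
L = np.linalg.cholesky(S11)
L_inv = np.linalg.inv(L)
lam, W = np.linalg.eigh(L_inv @ N @ L_inv.T)

# Order the eigenvalues from largest to smallest, as in the text.
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

beta = L_inv.T @ W     # eigenvectors, normalized so that beta' S11 beta = I
alpha = S01 @ beta     # adjustment coefficients under this normalization
```

In practice only the first r columns of `beta` and `alpha` would be retained once the cointegrating rank has been chosen.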
The maximized log likelihood conditional on r, ignoring certain constants, is given by
\[ \log\tilde{L}_{MAX} = -\frac{T}{2}\left[\log|S_{0,0}| + \sum_{i=1}^{r}\log(1 - \lambda_i)\right] . \]
The estimation of the $\lambda_i$ is not dependent on r, although their interpretation is. If $\lambda_j$ is insignificantly different from zero, then the corresponding canonical correlation, using $\beta_j$ as the coefficients of the I(1) processes, is insignificantly different from zero. That is, $\beta_j$ does not result in a stationary combination of the I(1) processes.

4.5.4 Testing cointegrating rank


Since it is a function of r, denote the maximized likelihood above as
\[ \log\tilde{L}_{MAX}(r) = -\frac{T}{2}\left[\log|S_{0,0}| + \sum_{i=1}^{r}\log(1 - \lambda_i)\right], \qquad r = 0, 1, \dots, n, \]
where the summation term does not appear if r = 0. The log likelihood $\log\tilde{L}_{MAX}(r_0)$ is a restricted version of $\log\tilde{L}_{MAX}(r_1)$ if $r_0 < r_1$. Thus, the likelihood ratio statistic for comparing $H_0: r \le r_0$ with the alternative $H_1: r \le r_1$ is
\[ LR(r_0, r_1) = -2\left[\log\tilde{L}_{MAX}(r_0) - \log\tilde{L}_{MAX}(r_1)\right] \]
since $\log\tilde{L}_{MAX}(r^{*})$ is the log-likelihood for a model where $r \le r^{*}$. Substituting from the expression for the maximized log likelihood in terms of the eigenvalues, this can be written
\[ LR(r_0, r_1) = -T\sum_{i=r_0+1}^{r_1}\log(1 - \lambda_i) . \]
If used in a conventional way, the null hypothesis would be rejected for large values of the test statistic, such a rejection being a statement that the eigenvalues $\lambda_i$, i = r0 + 1, …, r1 were jointly significantly different from zero. The normal choices of r0 and r1 are:

(a) r0 = j – 1, r1 = n, j = 1, 2, …, n;
(b) r0 = j – 1, r1 = j, j = 1, 2, …, n.

In case (a), the test is of whether the eigenvalues $\lambda_i$, i = j, …, n are jointly zero. These are the n – j + 1 smallest eigenvalues. In case (b), the test is of whether the eigenvalue $\lambda_j$ alone is zero.20 In performing the two tests, the information exploited is different, and so the inferences may not always agree.
The test associated with (a) is known as the trace statistic, denoted $\lambda_{trace}(j - 1)$. The null (H0) and alternative (H1) hypotheses are, for j = 1, 2, …, n:
\[ H_0: r \le j - 1, \qquad H_1: r \le n. \]
The test statistic is
\[ LR(j - 1, n) = -T\sum_{i=j}^{n}\log(1 - \lambda_i) = \lambda_{trace}(j - 1). \]
The test related to (b) is known as the maximal eigenvalue statistic, denoted $\lambda_{max}(j - 1)$, and has the hypotheses
\[ H_0: r \le j - 1, \qquad H_1: r \le j, \]
for which the test statistic is
\[ LR(j - 1, j) = -T\log(1 - \lambda_j) = \lambda_{max}(j - 1).^{21} \]
Each test rejects the null hypothesis for large values of the test statistic, which must be positive. Thus, using cv to stand for the critical value of the test, and $\lambda(j - 1)$ to represent the test statistic, the form of the test is:
\[ \text{reject } H_0 \text{ if } \lambda(j - 1) > cv . \]
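Given the ordered eigenvalues, both statistics are simple to compute. The following is a minimal sketch; the function name `johansen_statistics` and the eigenvalues used are hypothetical, and no critical values are supplied since these come from the non-standard distributions discussed in the text.

```python
import numpy as np

def johansen_statistics(eigenvalues, T):
    """Return (trace, lmax): trace[j-1] and lmax[j-1] test H0: r <= j - 1, j = 1, ..., n."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # lambda_1 >= ... >= lambda_n
    lmax = -T * np.log(1.0 - lam)                 # -T log(1 - lambda_j)
    trace = np.cumsum(lmax[::-1])[::-1]           # -T sum_{i=j}^{n} log(1 - lambda_i)
    return trace, lmax

# Hypothetical eigenvalues for a three-variable system with T = 100 observations.
lam = [0.40, 0.10, 0.02]
trace, lmax = johansen_statistics(lam, T=100)
```

The last elements of `trace` and `lmax` coincide, reflecting the fact that the two tests are identical when j = n.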

The critical values for the two tests are different in general (except when
j = n), come from non-standard null distributions and are dependent on the
sample size and the number of cointegrating vectors being tested for. The dis-
tribution theory leading to the critical values of the test is described in

Appendix D.22 Most computer packages that compute the test statistics also
compute critical values for the tests.
The interpretation of these tests should be considered carefully.
The trace statistic always has as its unrestricted case, that the cointegrating rank is at most n. The restricted, or null, case is that the cointegrating rank is at most j – 1. This is consistent with the statement of the hypotheses in terms of the eigenvalues as
\[ H_0: \lambda_i = 0,\ i = j, \dots, n \qquad H_1: \lambda_i > 0 \text{ for at least one of } i = j, \dots, n , \]
since in the alternative case at least one of the set of eigenvalues being tested must be non-zero. So, it might be that only the largest remaining, the jth, is non-zero, hence that the cointegrating rank is j, or at the other extreme, it could be that all are, in which case the rank is n. Given that the cointegrating rank cannot exceed n, the simplest way to represent the case under the alternative is r ≥ j.
The maximal eigenvalue test has the same restricted model, but the unrestricted model only considers a cointegrating rank one higher. Thus, the only case explicitly considered under the alternative is a cointegrating rank that is greater by one. In terms of the eigenvalues the hypotheses become
\[ H_0: \lambda_j = 0 \qquad H_1: \lambda_j > 0. \]
From the hypotheses expressed in terms of eigenvalues it can be seen that the trace test is a joint test of all eigenvalues smaller than $\lambda_{j-1}$, that is $\lambda_j, \lambda_{j+1}, \dots, \lambda_n$, while the maximal eigenvalue test is of $\lambda_j$ only. The hypotheses of the two tests are summarized in Table 4.1.
Table 4.1 Hypotheses of the maximal eigenvalue and trace statistics, j = 1, 2, …, n

| Test | Null (cointegrating rank) | Alternative (cointegrating rank) | Null (eigenvalues) | Alternative (eigenvalues) | Test statistic |
|---|---|---|---|---|---|
| λmax(j – 1) | r ≤ j – 1 | r = j | λj = 0 | λj ≠ 0 | –T log(1 – λj) |
| λtrace(j – 1) | r ≤ j – 1 | r ≥ j | λi = 0, i = j, j + 1, …, n | λi > 0 for at least one i, i = j, j + 1, …, n | –T Σ_{i=j}^{n} log(1 – λi) |

Neither test establishes the cointegrating rank uniquely. To determine the cointegrating rank it is necessary to focus down onto a particular value for r. This can be achieved by testing in sequence, moving in the direction of increasing cointegrating rank. Notice that when using the trace test, rejection of the null r ≤ s – 1 leads to the conclusion that r ≥ s. The next null in the sequence is r ≤ s, but since r ≤ s – 1 has already been rejected, this reduces to r = s.23 The alternative is r ≥ s + 1. Rejection of the null again would lead to a test of the null r ≤ s + 1 (in effect r = s + 1) against the alternative r ≥ s + 2, and so on until the null is not rejected. This sequence and the interpretation of rejection or non-rejection at each stage is described in Table 4.2. The maximal eigenvalue test may be used in an analogous way, as described in Table 4.3.

Table 4.2 Sequential testing using the trace test

| Null hypothesis (actual) | Null hypothesis (sequential*) | Alternative hypothesis | Interpretation: rejection of null | Interpretation: non-rejection of null |
|---|---|---|---|---|
| r = 0 | r = 0 | r ≥ 1 | r ≥ 1, continue to next stage, test null r ≤ 1 | Conclude r = 0. No further testing. |
| r ≤ 1 | r = 1 | r ≥ 2 | r ≥ 2, continue to next stage, test null r ≤ 2 | Conclude r = 1. No further testing. |
| … | … | … | … | … |
| r ≤ n – 1 | r = n – 1 | r = n | Conclude r = n | Conclude r = n – 1 |

Note: *Sequential interpretation assumes rejection of previous null hypothesis.

Table 4.3 Sequential testing using the maximal eigenvalue test

| Null hypothesis (actual) | Null hypothesis (sequential*) | Alternative hypothesis | Interpretation: rejection of null | Interpretation: non-rejection of null |
|---|---|---|---|---|
| r = 0 | r = 0 | r = 1 | Apparently r = 1, but r > 1 not considered so continue to next stage, test null r ≤ 1 | Conclude r = 0. No further testing. |
| r ≤ 1 | r = 1 | r = 2 | Apparently r = 2, but r > 2 not considered so continue to next stage, test null r ≤ 2 | Conclude r = 1. No further testing. |
| … | … | … | … | … |
| r ≤ n – 1 | r = n – 1 | r = n | Conclude r = n | Conclude r = n – 1 |

Note: *Sequential interpretation assumes rejection of previous null hypothesis.

Rejection or non-rejection of the null hypothesis should be treated cautiously. Rejection of the null hypothesis does not imply that the alternative should be accepted. Similarly with non-rejection of the null. For example, rejection may occur because untested assumptions about the data are being contravened – that is, the hypotheses are in effect more complex than is being stated. The point is stronger still when a test is being performed on a parameter, but the union of the sets of values under null and alternative hypotheses is not exhaustive. Under such circumstances, if the true or best approximating value of the parameter is not accounted for under either hypothesis, it is difficult to predict which of the two competing hypotheses will be preferred. However, if both hypotheses constitute similarly poor approximations, then the null will be favoured by the test, as in tests of this sort, the null hypothesis is reverted to in the absence of discriminating evidence.
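The sequential use of the trace test set out in Table 4.2 can be sketched as a short routine. This is a minimal illustration, not a definitive implementation: the function name `select_rank` is hypothetical, the critical values must be supplied by the user from appropriate tables or software, and the numbers in the example are invented purely to show the mechanics.

```python
def select_rank(trace_stats, critical_values):
    """Sequential trace procedure.

    trace_stats[j] is the trace statistic for H0: r <= j (j = 0, ..., n - 1) and
    critical_values[j] is the matching critical value, supplied by the user.
    Returns the selected cointegrating rank.
    """
    for r0, (stat, cv) in enumerate(zip(trace_stats, critical_values)):
        if stat <= cv:             # first non-rejection: conclude r = r0 and stop
            return r0
    return len(trace_stats)        # every null rejected: conclude r = n

# Invented statistics and critical values for n = 3, for illustration only.
rank = select_rank([60.0, 20.0, 3.0], [30.0, 15.0, 4.0])
```

In this invented example the nulls r = 0 and r ≤ 1 are rejected and r ≤ 2 is not, so the procedure stops at r = 2.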
At each stage, the trace test covers all possible values of the parameter r, so
the non-exhaustive problem does not arise. However, the maximal eigenvalue
test only covers all possible cointegrating ranks when testing the final null in
the sequence, r ≤ n – 1. The procedure based on this test may fail to reject the
null because neither the null nor alternative includes the true cointegrating
rank. The procedure tests for lower orders of cointegration first and so may
well underestimate the cointegrating rank. Thus, intuitively, the method
based on the trace test is to be preferred. Johansen (1995, chapter 12) shows
formally that, asymptotically, the sequential procedure based on the trace test
does not underestimate the cointegrating rank, and overestimates it with
probability equal to the size of test at an individual step. Thus, a procedure
using tests of size 5% at each step, would, asymptotically, select the correct
order of cointegration with probability 95%. This is not to say that the
maximal eigenvalue test is inferior at all stages. If the alternative of the
maximal eigenvalue test is the true rank then it can be expected to have more
power than the trace test since the latter will be considering it jointly with a
set of irrelevant alternatives. While it is common practice to use both tests as
the basis of sequential testing procedures, the trace test should be relied upon
more heavily. This begs the question of how the two tests might be combined
in a more useful way in a finite sample. Two suggestions are as follows:

(i) Use the $\lambda_{max}$ test only to check the cointegrating rank determined by the trace procedure. Thus a confirmatory inference is achieved if the first non-rejection of the trace sequence occurs at H0 : r ≤ j (interpreted sequentially as r = j) versus H1 : r ≥ j + 1 and the non-sequential $\lambda_{max}$ test does not reject at H0 : r ≤ j versus H1 : r = j + 1. In this way, the test with the better power is used up to a point where a test of greater power is used to confirm the inference.
(ii) Rather than compute the statistics in sequence, it is possible to compute
p-values for all cases. The preferred alternative would be that of the test

with the highest p-value. The interpretation from the maximal eigenvalue
test is clear as this has a point alternative. That from the trace test is less
obvious since the alternative is of a compound form. However, the
natural interpretation is to select the lower bound since cases involving
only higher orders of cointegration are not preferred.

Hendry (1995) and others have argued that a general to specific approach is
to be preferred in model selection. The sequential testing procedure, however,
begins by testing the most restricted case: that all eigenvalues are zero. The
restrictions are then relaxed one eigenvalue at a time. This is a specific to
general approach. It is also specific to general in that the lower the rank, the
fewer coefficients are needed to parameterize the VECM.24 Nonetheless,
Johansen’s result establishes that the specific to general approach is a valid
method for determining the cointegrating rank.

4.6 Tests of cointegration in VAR models

In this section, we consider the application of the Johansen procedure.


Although many alternative methods are available in the literature for testing cointegration and detecting the long-run parameters, the Johansen procedure tests the proposition that series are cointegrated, estimates the parameters, and permits theoretical propositions and exogeneity to be tested. Here the test is considered within the confines of a simple case. Then some results from the literature and their economic interpretation are discussed. Firstly, we consider the simplest case where the underlying models are essentially random walks, then models of the UK exchange rate considered by Hunter (1992a), and Johansen and Juselius (1992) and the results based on an extended data set first presented in Hunter and Simpson (1995). In the next chapter, identification and exogeneity are discussed.
The Engle–Granger two-step procedure first considered the estimation of the long run from a single equation regression and then the residual from this model defines a cointegrating vector. As was explained in chapter 3 the lagged residual can be entered into a dynamic model and this is then described as an error correction or, more precisely, an equilibrium correction term. The equilibrium correction term has estimated parameters, while an error correction term sets the coefficients in absolute terms to unity (i.e. [1, –1]). It must be re-emphasized that the Engle–Granger method will only generally be valid when there is r = 1 cointegrating vector or there are only two equations in the system. Excepting very particular cases the method will be incorrect when there is more than one cointegrating vector and more than two equations in the system. In spite of this, there have been many attempts to improve the performance of the long-run estimator. Saikkonen (1991) suggested the inclusion of further dynamics to improve the estimates of the long-run parameter, while Phillips and Hansen (1990), and Phillips (1991) provide non-parametric corrections robust to different types of error structure and the correction proposed by Marinucci and Robinson (2001) seems to perform well when the system includes weakly exogenous variables. However, the good performance of these types of estimator has generally been established in Monte-Carlo studies applied to bivariate models.
In this section, results associated with the multivariate approach due to
Johansen are considered. The results associated with the Johansen estimator
are well defined when the conditions described in section 4.3 are satisfied:

(1) The error process is normally distributed.


(2) The underlying VAR is well defined.
(3) There are no structural breaks.
(4) All the series are of the same order of integration (usually I(1)).

The Johansen test can be significantly altered by non-normality. Non-nor-


mality can be observed, because the series follow a non-normal distribution,
due to intercept shifts, structural changes or the type of error variance behav-
iour linked to volatility. Non-normality can often be rectified by the introduc-
tion of dummy variables, when it has a simple institutional or structural
cause. However, the impact of a dummy on the Johansen test statistic is not
always innocuous. We will discuss the question of the effect of dynamic specification on the
Johansen trace test in the next section. It will be assumed here that the VAR
has been correctly specified. Structural breaks whose point of occurrence is
unknown are more difficult to handle. Here, it is assumed that any breaks that
do occur are associated with well-documented events. As the order of integra-
tion is not known by definition, a number of issues arise. Firstly, when the
Johansen procedure is used I(0) and I(1) processes can be mixed when there
are at least two I(1) variables in the system. Secondly, non-stationary series
that are fractionally integrated require a different type of estimator (Robinson
and Marinucci 1998). Thirdly, balanced I(2) behaviour can be incorporated.
Fourth, more general I(2) processes require a different estimator as do higher
order processes. Flôres and Szafarz (1996) consider an extended definition
of cointegration where there is a mixture of I(1) and I(0) processes, this is
much more readily dealt with by the Johansen approach (Juselius 1995).
Robinson and Yajima (2002) consider processes with long-memory that are
stationary but require fractional differencing for stationarity, this approach is
not handled by the Johansen procedure. When series are integrated of an
order in excess of (1), but the integer order of integration is the same, then
both the Engle–Granger and Johansen approaches are still valid. The dynamic
model is estimated by rendering the data stationary through differencing an
appropriate number of times, while the long run is estimated in the usual way
from the residuals from equations in the lag of the original data. Otherwise, a
more general estimator is required. Currently there is an appropriate estimator
for the I(2) case, which will be covered in more detail in chapter 6. Here, we
consider an example which satisfies the property of balanced I(2) behaviour.
Either the data are all I(2) and the dynamic models are specified in their
second differences or, when the data are logarithmic, accelerations of all the
series analyzed are specified as being I(1), and then the usual Johansen
method is applied to I(1) series of which some may also be differenced. To
confirm the appropriateness of the balanced I(2) case, the test for I(2) by
Johansen (1992) is applied.
Finally, the current evidence on the performance of tests of cointegration is
discussed.

4.6.1 Special cases of the Johansen test


If we assume the simplest process drives the underlying series, then the
following special case provides a more intuitive explanation of the Johansen
procedure. Let all series be generated by random walks, then the likelihood
statistic due to Johansen (1991) is (see Appendix C):

$$\mathrm{Log}L(.) = T \sum_{i=1}^{r} \log(1 - \lambda_i)$$

where $\lambda_i$ is solved from the determinantal equation
$|\lambda_i S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1}| = 0$, with
$S_{i,j} = \sum_{t=1}^{T} R_{i,t} R_{j,t}'$ for $i, j = 0, 1$. In the VAR(1) case:

$$R_{0,t} = \Delta x_t, \qquad R_{1,t} = x_{t-1}.$$

Essentially, the Johansen procedure generalizes these equations to transform a
more complex dynamic model (VAR(i)) into two sets of equations that reduce
to a multivariate first order autoregression: based on the above description of
$R_{0,t}$ and $R_{1,t}$ the equations that are being implicitly estimated by
the Johansen procedure are

$$R_{0,t} = \alpha\beta' R_{1,t} \qquad (4.80)$$

or

$$\Delta x_t = \Pi x_{t-1}.$$

The latter equation is a VAR(1). Not only is this a VAR(1), but this equation
can be readily viewed as a multivariate generalization of the model estimated
by Dickey and Fuller, to test stationarity of a single series; the estimation of
this type of model is briefly considered in Engle and Granger (1987). For a
single equation, based on one or more regressors, Engle and Granger test
cointegration using regression residuals, while the Johansen estimator
requires a variance decomposition. Consequently the two methods may produce
different results (Haug 1993, 1996).
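The VAR(1) special case described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes no intercept and no additional lags, simulates three independent random walks, forms the product moment matrices from the levels and differences, and solves the determinantal equation as an ordinary eigenvalue problem. The function name `johansen_var1`, the seed and the sample size are hypothetical choices made for the example.

```python
import numpy as np

def johansen_var1(x):
    """Eigenvalues and trace statistics for the VAR(1) special case.

    x is a (T+1) x n array of levels; no intercept or extra lags are used
    (an assumption made to keep the sketch short).
    """
    r0 = np.diff(x, axis=0)          # R_{0,t} = Delta x_t
    r1 = x[:-1]                      # R_{1,t} = x_{t-1}
    T = r0.shape[0]
    # Product moment matrices; dividing by T does not change the eigenvalues.
    s00 = r0.T @ r0 / T
    s01 = r0.T @ r1 / T
    s11 = r1.T @ r1 / T
    # |lambda S11 - S10 S00^{-1} S01| = 0 rewritten as an eigenproblem for
    # S11^{-1} S10 S00^{-1} S01, with S10 = S01'.
    m = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    lam = np.sort(np.linalg.eigvals(m).real)[::-1]
    # Tail sums: trace[i] = -T * sum of log(1 - lambda_j) over j >= i (0-based).
    trace = -T * np.cumsum(np.log(1.0 - lam)[::-1])[::-1]
    return lam, trace

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal((201, 3)), axis=0)  # three independent random walks
lam, trace = johansen_var1(x)
print("eigenvalues:", lam)
print("trace statistics:", trace)
```

The statistics produced this way would have to be compared with simulated non-standard critical values, not with Chi-squared tables; the sketch only shows where the eigenvalues come from.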

4.6.2 Empirical examples of the Johansen test


Now let us consider some empirical examples. These are: the model of the
exchange rate derived using the data presented in Johansen and Juselius (1992)
and Fisher et al. (1990); the model estimated by Hunter (1992a); estimates
based on an extended data set for the purchasing power parity model considered
by Johansen and Juselius (1992); and estimates of a UK exchange rate model
with balanced I(2) behaviour based on Juselius (1995).
Firstly, for simplicity the six-variable VAR(2) model,25 estimated by Hunter
(1992a), is considered. The model is an extension of the five-variable VAR
model estimated in Johansen and Juselius (1992). The system contains the
following variables all in logarithms: oil prices ($p_{0t}$), UK prices
($p_{1t}$), world prices ($p_{2t}$), the UK effective exchange rate
($e_{12t}$), UK treasury bill rate ($i_{1t}$) and the Eurodollar rate
($i_{2t}$). Johansen and Juselius (1992) wanted to confirm that the UK
effective exchange rate satisfied the conditions for Purchasing Power Parity
(PPP). The six variables are stacked into the following VAR(2) model with
normal errors and unrestricted intercept:

$$(I + \Gamma_1 L)\Delta x_t = \Pi x_{t-1} + \mu + \varepsilon_t. \qquad (4.81)$$

The hypothesis to be tested relates to the cointegrating rank,

$$H_1(r): \Pi = \alpha\beta'.$$

This test determines how many cointegrating vectors or long-run relationships
(r) exist in the system. In this case there are at most r = 6 and at minimum
r = 0, none.26 A number of hypotheses exist in relation to trends.
Unrestricted intercepts in the model operate as drift parameters in the same
way as occurs when all series in the system are purely difference stationary,
which is equivalent to saying r = 0. Otherwise the VAR can have a time trend.
The model considered by Johansen and Juselius has unrestricted intercepts,
which implies that there is drift. Let us consider the results for the Johansen
test outlined above in the case of the six-variable VAR, which allows for drift
and includes centred seasonals.
The max test is calculated as $\lambda_{max}(i) = -T\log(1 - \lambda_i)$ for
i = 1, …, n, and the trace test is
$\lambda_{trace}(i) = -T\sum_{j=i}^{n}\log(1 - \lambda_j)$ for i = 1, …, n.27
If it were known a priori that all the series were stationary, then both the
Johansen test statistics, which are essentially likelihood ratio tests, would
follow a Chi-squared distribution. However, as was discussed above, when the
series are I(1), then the distribution is non-standard. It has been common
practice to compare the test statistics with their asymptotic critical values,
which come from simulating a null distribution for the test that the series
are multivariate random walks.

Table 4.4 Eigenvalues, Johansen test statistics for VAR due to Hunter (1992)

Eigenvalue   Alternative   λmax     95% critical   λtrace    95% critical
             hypothesis             value                    value
0.571        r = 1         50.82*   39.43          119.69*   95.18
0.335        r = 2         24.48    33.26           68.86    69.38
0.289        r = 3         20.44    27.34           44.37    48.41
0.161        r = 4         10.52    21.28           23.93    31.25
0.128        r = 5          8.22    14.6            13.41    17.84
0.083        r = 6          5.18     8.08            5.18     8.08

Note:
* Indicates significant at the 5% level for critical values. For tables of the
Johansen trace test with unrestricted intercept and T = 50 observations see
Francis (1994).

The tests are significant when the null hypothesis r = i is rejected against
the alternative for both tests that r > i. From the results presented in
Table 4.4, both tests ($\lambda_{max}(1) = 50.82 > 39.43$ and
$\lambda_{trace}(1) = 119.69 > 95.18$) yield the same conclusion that there is
r = 1 cointegrating vector. The tests are only significant in the case where
r = 1. The test statistics are asymptotic and much of the research that has
looked at the impact of testing would conclude that the performance of both
tests in small samples is poor. Based on the suggestion that the trace test is
more reliable than the max test and the fact that rejection of the proposition
that there are two cointegrating vectors is very marginal
($\lambda_{trace}(2) = 68.86$ against a critical value of 69.38), Johansen and
Juselius, whose results are for a restricted version of this model, suggested
r = 2. Some theoretical and empirical evidence is presented in the next two
sections as to why there may be over-rejection.
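As a rough check on how the entries of Table 4.4 fit together, the statistics can be recomputed from the reported eigenvalues. This is only a sketch: the effective sample size T = 60 is an assumption inferred from the table (it is the value that approximately reproduces the reported max statistics) and is not stated in the text. Each trace statistic is then the sum of the max statistics from that row downwards.

```python
import numpy as np

# Eigenvalues as reported in Table 4.4 (rounded to three decimals).
eigenvalues = np.array([0.571, 0.335, 0.289, 0.161, 0.128, 0.083])
T = 60  # assumed effective sample size, inferred from the table, not given in the text

lam_max = -T * np.log(1.0 - eigenvalues)
# Tail sums: the statistic in row i adds the max statistics of rows i, i+1, ..., n.
lam_trace = lam_max[::-1].cumsum()[::-1]

for i, (m, t) in enumerate(zip(lam_max, lam_trace), start=1):
    print(f"row r = {i}: max = {m:6.2f}, trace = {t:7.2f}")
```

Because the eigenvalues are rounded, the recomputed values agree with the table only to within a few hundredths.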
Johansen and Juselius (1992) used the same data, but they assumed that the
oil price was strictly exogenous to the system, which means that it has no
influence on the long run. They estimate the following five-variable VAR
conditional on changes in the oil price (this proposition is tested in the
next chapter):

$$(I + \Gamma_1 L)\Delta x_t = \Pi x_{t-1} + (\Xi_0 + \Xi_1 L)\Delta p_{0t} + \mu + \varepsilon_t.$$

The results presented in Table 4.5 are based on the same model, except that it
is estimated on the data set extended to 1991q4. The results and conclusions
are not materially different from those of Johansen and Juselius (1992). As was
concluded before, Johansen and Juselius suggested that there were r = 2 coin-
tegrating vectors, even though the test statistics did not quite bear this out. In
what follows, the analysis is based on the Johansen trace test. The extended
data set implies that Johansen and Juselius (1992) were correct to suggest that
there are r = 2 cointegrating vectors, because the trace test is significant for
the proposition that r exceeds zero and one. It will be discovered that the
VAR(2) model is not well formulated, but any opportunity to re-specify the
models associated with Hunter (1992a) and Johansen and Juselius (1992) is
limited by the number of observations. For further comparison with the
results in Johansen and Juselius (1992), eigenvectors are calculated for the case
in which r = 2.

Table 4.5 Eigenvalues and trace test statistics for Johansen and Juselius model

Eigenvalue   Alternative   λtrace   95% critical
0.31         r = 1         84.3*    70.6
0.27         r = 2         55.4*    48.3
0.19         r = 3         31.3     31.5
0.13         r = 4         14.9     17.9
0.05         r = 5          4.4      8.2

* Indicates significant at the 5% level for tabulated values of the test
statistic with trend and one exogenous variable. For similar values, see
Pesaran et al. (2000).

The two vectors are normalized with respect to the first element, but the
normalization is arbitrary and no suggestion is made that these vectors have
any meaning. However, when compared with the results presented in
Johansen and Juselius (1992), the unrestricted eigenvectors suggest that the
following restriction (1, –1, –1) might be applied to both aggregate price series
and the exchange rate. The restriction implies that there is a long-run
correspondence between the terms of trade and the exchange rate (a condition for
Purchasing Power Parity or PPP). This conclusion is quite consistent with the
results in Johansen and Juselius (1992). This type of restriction is analyzed in
more depth in the next chapter where identification and exogeneity are dis-
cussed. It is of interest to note that neither Johansen and Juselius (1992) nor
Hunter (1992a) could force the first vector to be restricted to satisfy pure PPP;
that is to say the proposition that the real exchange rate is stationary was not
sustained by the data.

Table 4.6 Normalized significant eigenvectors

Equation   β.1      β.2
p1          1.00     1.00
–p2        –1.07    –1.6
e12        –1.03     2.9
i1         –3.34    –8.4
i2         –0.31    14.5

Unlike Juselius (1995), who considers similar results for Denmark and
Germany, the interest rates that appear in the model do not yield a PPP vector
augmented by uncovered interest rate parity (UIRP).
However, as will be observed in the next chapter the second vector does
appear to suggest UIRP. Here, it is not possible to interpret the unrestricted
cointegrating vectors as they have not been appropriately identified. Three
matrices ($\Pi$, $\alpha$ and $\beta$) were calculated; they all have the same
rank, which implies that only part of $\Pi$ can be used to identify both
$\alpha$ and $\beta$. Without restriction not all of the matrix pair
($\alpha$, $\beta$) is identified. Alternatively, without restriction both
matrices can be transformed by an arbitrary square r × r non-singular matrix
($\xi$). Therefore:

$$\alpha\beta' = \alpha\xi\xi^{-1}\beta' = \alpha^{*}\beta^{*\prime}.$$

The system with cointegrating vectors $\beta$ has the same likelihood function
as the system with cointegrating vectors $\beta^{*}$. Consequently
$L(\beta, .) = L(\beta^{*}, .)$ and both systems conditional on the existing
information are observationally equivalent (Rothenberg 1971). Observational
equivalence is a key criterion for non-identification: when models cannot be
distinguished then neither can their parameters. Further, the Johansen test
statistics are well defined when the criteria for cointegration are satisfied
and the DGP is well approximated. What is required is a set of single equations
in the VAR which satisfy the conventional regression criteria (Spanos 1986;
Davidson and MacKinnon 1993; or Hendry 1995). More appropriately, the criteria
for well-defined dynamic systems of equations are discussed by Hendry and
Richard (1982, 1983); that is, they should have spherical disturbances and
define stable processes, subject to appropriate conditioning on impulse or
periodic dummies. Should the disturbances be non-normal, then a
quasi-likelihood result is required; that is, the sample should be sufficient
for the test statistics derived from the estimator to tend to their asymptotic
distributions. The Johansen test is viewed to be sensitive to different types
of deviation from normality, the lag length of the VAR and unmodelled dynamic
behaviour in the structure of the variance–covariance matrix of the
disturbances. When the dynamics are not well defined or the residuals are
non-normal then the Johansen test may not be optimal. The tests may have low
power to discriminate against the local alternative of cointegration or be
inappropriately sized. The latter problem implies that when the test is
calculated at a particular nominal level (5%), the true probability in the
tail of the distribution may exceed 5% (over-sized) or be less than 5%
(under-sized).
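The non-identification of the unrestricted loadings and cointegrating vectors discussed above can be illustrated with a small numerical sketch. The matrices below are arbitrary hypothetical values, and the name `xi` for the r × r non-singular transformation is chosen purely for the example: two different pairs of loadings and vectors give exactly the same long-run matrix, so the data cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2
alpha = rng.standard_normal((n, r))      # hypothetical loadings
beta = rng.standard_normal((n, r))       # hypothetical cointegrating vectors
xi = np.array([[2.0, 1.0],
               [1.0, 1.0]])              # arbitrary non-singular r x r matrix (det = 1)

alpha_star = alpha @ xi
beta_star = beta @ np.linalg.inv(xi).T   # so that beta_star' = xi^{-1} beta'

pi = alpha @ beta.T
pi_star = alpha_star @ beta_star.T

print(np.allclose(pi, pi_star))          # same long-run matrix from both pairs
print(np.allclose(beta, beta_star))      # although the vectors themselves differ
```

Identification therefore requires restrictions that rule out all such transformations other than the identity.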
The VAR model in Johansen and Juselius (1992) and the VAR estimated on
the extended data set are not well defined, as can be observed from the
diagnostic tests presented in Table 4.7. Here, emphasis has been placed only on
the tests that reject the null of correct specification (no serial correlation,
normality, no autoregressive conditional heteroscedasticity), which leads to
the possibility that the statistics may be a reflection of the testing as
compared with true mis-specification. One procedure to counter the possibility
of false rejection is to use a stricter criterion than usual (i.e., 1%) on each
test. This is the Bonferroni Principle (Stock and Watson, 2003, p. 191), which
implies that i tests applied at the a/i% level yield an overall rejection rate
of at most a% for the tests as a whole. Special attention should be paid to the
tests that fail at the 1% level as they are likely to be the ones that imply
misspecification. Consequently, the UK price equation and eurodollar equations
are the ones to be concerned about as they would reject the null hypothesis of
normal errors. In the case of the eurodollar equation no serial correlation is
rejected irrespective of the cut-off point selected for the test. There is
almost zero probability that the errors from these models are drawn from a
normal distribution or in the latter case are not serially correlated.

Table 4.7 Key diagnostic tests for the VAR(2) model with strictly exogenous oil prices

Equation   Normality statistic   Fourth order serial correlation
p1         25.74**                7.33*
p2          8.53*                 2.28
e12         6.92*                 0.84
i1          3.42                  2.57
i2         32.44**               15.43**
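The arithmetic behind the Bonferroni argument can be sketched in a few lines. The function names are hypothetical; the independence calculation is included only for comparison, since the Bonferroni bound itself does not require the tests to be independent.

```python
def bonferroni_level(overall, n_tests):
    """Per-test significance level that keeps the overall rate at or below `overall`."""
    return overall / n_tests

def familywise_bound(per_test, n_tests):
    """Upper bound on the probability of at least one false rejection."""
    return min(1.0, per_test * n_tests)

def familywise_if_independent(per_test, n_tests):
    """Exact probability of at least one false rejection if the tests were independent."""
    return 1.0 - (1.0 - per_test) ** n_tests

n_tests = 5  # one diagnostic per equation of a five-variable VAR
print(bonferroni_level(0.05, n_tests))
print(familywise_bound(0.01, n_tests))
print(familywise_if_independent(0.01, n_tests))
```

With five tests each applied at the 1% level, the overall false rejection rate is bounded by 5%, and is slightly below 5% if the tests happen to be independent.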
In this light, and given the possibility that the price series may be I(2),
Hunter and Simpson (1995) used the same extended data, but dropped the oil
price. To partially correct for I(2) behaviour they followed the structure
employed by Juselius (1995) to model the Danish krone relative to the
deutsche mark and they found that dummy variables were more effective in
accounting for non-normality than oil prices. The model is a five-variable
VAR(2) of the same form as (4.81), except
$x_t' = [\Delta p_{1t}, p_{1t} - p_{2t}, e_{12t}, i_{1t}, i_{2t}]$.
According to the trace test there are r = 4 cointegrating vectors. With a
correction for the number of observations due to Reimers (1994) the conclusion
of this test is less clear and given the number of dummies the test may also
have the wrong size.28 The fourth vector is marginal when inclusion of the
intervention dummies is considered, but the model is based on a longer data
set than Johansen and Juselius (1992) and the decision to include this vector
is based on a statistic that is significant in conventional terms.

Table 4.8 Eigenvalues and trace test statistics for Juselius model applied to UK data

Eigenvalue   Alternative   Statistic   95% critical
0.471        r ≥ 1         117.5       68.5
0.316        r ≥ 2          70.4       47.2
0.294        r ≥ 3          42.2       29.7
0.138        r ≥ 4          16.4       15.4
0.071        r = 5           5.4        8.2

Table 4.9 Normalized significant eigenvectors

Equation   β.1      β.2      β.3     β.4
p1 – p2     1.00     0.16    0.32     0.53
∆p1         9.15     0.22    1.83     1.00
e12        –0.76     0.05    1.00    –0.83
i1         –3.03    –1.26    1.05     0.31
i2          0.02     1.00    1.45    –2.13

A valid analysis and interpretation of the results is left to the next chapter,
after identification is discussed. However, the first vector would appear to be
PPP augmented by a UK interest rate and the inflation rate, the second vector
suggests an interest parity condition, while the interpretation of the other
vectors is not clear.
Thus far, three alternative VAR models have been devised to explain the UK
effective exchange rate in association with a set of related variables. However,
the conclusions drawn from this exercise depend on the performance of all
the equations in the long-run and short-run models. Exclusion of variables in
the long run depends on tests of exogeneity, long-run exclusion and
restrictions associated with economic hypotheses that are likely to identify.
Discussion of these issues is left for the next chapter; here the question of
specification would appear to be a key mechanism to discriminate between
models.
Comparison of the results in Table 4.7 with those in Table 4.10 suggests that
the transformed model is better behaved (none of the diagnostics are
significant at the 1% level). According to the system-wide diagnostic tests, the
VAR(2) which includes the dummy variables is well specified, as can be
observed from Table 4.10. If testing at the 1% level is considered acceptable,
then none of the tests of dynamic specification are significant, which implies
that each of the estimated equations is well specified. When tests are applied
at the 5% level, then the serial correlation up to order four is marginally
significant in the terms of trade equation and the normality test fails at the
5% level for the inflation equation. Again, 5% might be viewed as being
overly harsh due to the risk of over-rejection; otherwise the model has
performed better than any of the existing models analyzed with the extended
data set.

Table 4.10 Single equation diagnostics for each equation in the VAR(2) model

Equation   Normality statistic   Fourth order serial correlation
p1 – p2    2.74                  9.94*
∆p1        7.19*                 2.53
e12        0.63                  4.13
i1         1.25                  6.28
i2         0.35                  7.98

(**significant at the 1% level and *significant at the 5% level)

Beyond the tests described above, the models would also appear to have stable
parameters based on the sequence of recursive Chow tests presented in
Figures 4.1–4.6. The 1-step-ahead Chow test is a one-period-ahead forecast F
test which is a variant of the Chow type 2 F test (Spanos 1986). This is an
in-sample prediction test, which examines the model parameters over the data
period for parameter constancy. It is scaled in such a way that critical
values at each point in the sample are equal to unity. Hence the horizontal
line at unity becomes the critical value to use for making inference about
stability. If the parameters are found not to be constant, then the model is
spurious and the equation estimates meaningless. Details of the various
recursive estimation tests can be found in PCFIML v10 (Doornik and Hendry
2001).29 As can be observed from Figures 4.1–4.5 each of the short-run
equations has stable parameters according to the sequence of Chow tests applied
at the 1% level, which is reflected in the result in Figure 4.6 for the VAR as
a system.

Figures 4.1–4.6 Recursive Chow tests for the 5 VAR equations and the VAR system

In section 4.4 the key condition for the existence of a cointegrated VAR
according to Johansen (1995) is

$$\mathrm{rank}(\alpha_\perp' \Psi \beta_\perp) = n - r.$$

As is explained in Johansen (1995) a test of this rank condition can be
undertaken in the usual way by transforming the data through pre-multiplying
the data by $\beta_\perp'$ and then applying the Johansen test to the n – r
dimensioned system based on the difference of $x_t^+ = \beta_\perp' x_t$.
Consider the model:

$$\Delta^2 x_t^+ + \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i}^+ = \Psi \Delta x_{t-1}^+ + \varepsilon_t^+ \qquad (4.82)$$

where $\Psi = I + \sum_{i=1}^{p} \Gamma_i = -\Gamma(1)$.

If rank($\alpha_\perp' \Psi \beta_\perp$) < n – r then there are trends in the
VAR that have not been accounted for. Taking the extreme case where
rank($\alpha_\perp' \Psi \beta_\perp$) = 0, $\Psi = 0$ and

$$\Delta^2 x_t^+ + \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i}^+ = \varepsilon_t^+,$$

$\Delta^2 x_t^+$ is a stationary process or there are linear combinations of
$x_t$ or more appropriately n – r variables which are I(2). Failure of this
condition breaks a fundamental criterion for cointegration in the VAR, which is
that the system is predicated on variables that are at most I(1). If the test
fails and rank($\alpha_\perp' \Psi \beta_\perp$) < n – r then it is best to test
for cointegration within the confines of a dynamic model, which accounts for
the interdependencies associated with I(2) variables; this analysis is
undertaken in chapter 6 for the data set used by Hunter (1992a) and Johansen
and Juselius (1992).
For the case considered here r = 4 and n – r = 1; calculating $\beta_\perp'$
from the singular value decomposition of $\beta$ leaves a 1 × 5 vector, which is
then used to multiply the variables in the VAR in second differences. Applying
the Frisch–Waugh Theorem to (4.82) yields a Johansen test statistic, which
under the null implies that rank($\Psi$) = 0. If the Johansen test statistic is
significant at the 5% level then we accept the alternative that
rank($\Psi$) = n – r = 1 and that the VAR(5) is a well-defined system in terms
of I(1) variables that cointegrate. For the system developed here, the result
in Table 4.11 implies that the alternative cannot be rejected and as a result
there are no additional I(2) trends. This suggests that we have what is called
by Johansen (1994) balanced I(2) behaviour. This seems to be the only coherent
way by which a single I(2) variable can cointegrate with other series that are
I(1).

Table 4.11 Eigenvalues and trace test statistics for I(2) test

Eigenvalue   Alternative   Statistic   95% critical
0.407        r = 1         36.36*      9.243

Note:
* Indicates significant at the 5% level
Cointegration as a statistical construct exists; whether it is an effective
procedure for approximating actual behaviour is open to question. The
usefulness of any method depends on our capacity to detect this type of
phenomenon with the nature and quantity of data available. Søren Johansen did a
great service to applied econometric and statistical research by providing a
structure for estimation, inference and identification when series are close to
random walks.

4.6.3 Evidence on the performance of the Johansen test


The usefulness of this approach depends on the properties and performance of
the test statistics. There is a burgeoning literature on size, power and compar-
ative performance of the Johansen test statistic based on simulation (Burke
and Hunter 1998; Gonzalo and Pitarakis 1999; Hubrich et al. 2001; and
Marinucci and Robinson 2001).
Performance of these tests generally relates to the quality and informative-
ness of the data, and the extent to which the underlying model satisfies the
Gaussian properties that underlie the Johansen VAR. If the series to be consid-
ered have residuals that are approximately normal, the dynamic process can
be well approximated by a finite order VAR, the residuals are not sensitive to
dynamic behaviour in the variance and breaks in structure are limited to those
that can be handled by dummy variables, then the Johansen methodology
would appear to work reasonably well (Hubrich et al. 2001). Whether series
are normal is not known a priori, but normality is testable and some of the
aberrant features associated with non-normality can often be removed by
transformation, aggregation or dummy variables. Financial time series are
prone to significant non-normality, but evidence exists that aggregate returns
tend to normality as the period of aggregation increases. Hence, daily returns
that are non-normal will when aggregated at a quarterly frequency appear
normal (Barndorff-Nielsen and Shephard 2001). Conventional finance theory
argues that share prices are log normal and that logarithms of share prices
follow Wiener processes in the limit (Hull 2002). Hence, log transformations
of income, share prices or wealth statistics are likely to be closer to normality.
Shocks that induce large errors will cause series to fail normality tests, which
has led to the use of dummy variables as a correction (Hendry 1995).
However, in the latter case the distribution of the Johansen test is altered by
certain types of intercept correction (Johansen 1995; Hubrich et al. 2001).
When the distribution of the error satisfies appropriate regularity conditions
(see Appendix D), then the Johansen test statistic will converge to the asymp-
totic distribution, but the rate of convergence depends on the information
content or innate variability of the data. When series are highly informative
convergence may be fast; statistics based on some underlying distributions
converge more readily to normality (e.g. means calculated from data generated
from a uniform distribution are close to normal after thirty observations,
while means of data from the $t_1$ or Cauchy distribution never converge to the
normal).30
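The contrast between the uniform and the Cauchy case can be seen in a small simulation. This is an illustrative sketch under arbitrary assumptions (seed, 2000 replications, sample sizes of 30 and 1000): it measures the spread of the sample mean by its interquartile range, which shrinks with the sample size for uniform data but stays roughly constant for Cauchy data, because the mean of Cauchy draws is itself Cauchy distributed.

```python
import numpy as np

rng = np.random.default_rng(42)
reps = 2000  # number of simulated samples per design (arbitrary choice)

def iqr_of_means(draw, n):
    """Interquartile range of `reps` sample means, each based on n draws."""
    means = draw((reps, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

results = {}
for n in (30, 1000):
    results[("uniform", n)] = iqr_of_means(lambda size: rng.uniform(0.0, 1.0, size), n)
    results[("cauchy", n)] = iqr_of_means(lambda size: rng.standard_cauchy(size), n)

for key, value in results.items():
    print(key, round(value, 4))
```

The interquartile range is used instead of the standard deviation because the Cauchy distribution has no finite variance.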
In finance, there is significant interest in volatility and financial series are
often viewed as being $t_6$ or a mixture of normals. This implies that such
series may not behave in the way that is desirable. Spanos (1994) has suggested
that the conditional t with small degrees of freedom provides an appropriate
statistical model for financial time series that are volatile. An alternative
model of volatility arises when there is a dynamic structure in the variance;
this is often modelled by ARCH models (Engle 1982). With dynamic behaviour in
the variance, the Johansen test statistics may not perform well. Bauwens et al.
(1997) have suggested a correction for the two-step Engle–Granger approach,
but no correction appears to exist for the Johansen procedure. One suggestion
is to apply a GLS correction to the first step of the Frisch–Waugh procedure
and then use the small sample tail correction to the test statistic suggested by
Doornik and Hendry (2001); this approach yields recalibrated p-values for the
calculated Johansen trace test. The same correction may apply when dummy
variables are included in the model.
The order of the VAR is often difficult to determine, but is critical to the
performance of the Johansen test. Too many lags will affect the small sample
performance of the test, while too few lags will imply that the model is not
well specified. Often information criteria are used to suggest lag length; this
derives from univariate analysis of time series. However, such measures tend
to perform less well when a system is considered and the dynamics of the
different variables in the system are not homogeneous. Firstly, any VAR model
may be tested for the presence of serial correlation and should that be found
then the dynamic model needs to be re-specified (Hendry 1995). Secondly,
asymptotically the Johansen test is invariant to the number of lags in the
VAR, which suggests a general to specific approach to derive the short-run
dynamics:

(i) Specify a general model with $s \le T/(3n)$ lags per equation.
(ii) Eliminate insignificant lags to order p ≤ s.
(iii) Eliminate insignificant intermediate lags in each VAR equation.
(iv) Estimate the long-run relationships by applying the Frisch–Waugh
Theorem to a restricted version of (4.80).
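The bound in step (i) is simple to evaluate. The sketch below uses a hypothetical helper, `max_initial_lag`, and illustrative sample sizes that are not taken from the text; the bound means that the n × s lagged regressors in each equation use up at most a third of the T observations.

```python
def max_initial_lag(T, n):
    """Largest integer s satisfying s <= T / (3n)."""
    return T // (3 * n)

# Hypothetical sample sizes for a five-variable system.
for T in (60, 100, 200):
    s = max_initial_lag(T, 5)
    print(f"T = {T}: start from s = {s} lags ({5 * s} lagged regressors per equation)")
```

In small samples the bound leaves little room for a general starting model, which is one reason why re-specification of the dynamics is limited by the number of observations.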

Another problem that is likely to arise in this case relates to the existence of
what Caner and Kilian (2001) call hidden moving average behaviour. In
section 4.2, the question of inversion of the Wold representation was dis-
cussed. It was stated that the VECM only derives from the Granger representa-
tion theorem when the system is bivariate. A more general transformation
exists when the matrix polynomial from the Wold representation (C(L)) is
rational, but this proposition is still not testable from the VAR. An alternative
inversion is considered in the next section, but this only yields a finite order
VAR when C(L) is first order. One solution is to apply the Johansen procedure
to a Frisch–Waugh equation where the residuals are estimated using either a
VARMA(1,q) or shorter order VARMA (Hunter and Dislis 1996; and Lütkepohl
and Claessen 1993). Burke and Hunter (1998) have shown via simulation of
models with quite simple moving average structure, that the size and size cor-
rected power can be quite strongly affected by the existence of moving
average errors and that this does not disappear as the sample size increases.
However, Marinucci and Robinson (2001) show that the Johansen trace test
would appear to work quite well with samples of 100 observations, when com-
pared with fully modified estimators, though there is some evidence for small
systems that the Phillips modified estimator might perform better when the
sample size is less than 100 (Hubrich et al. 2001). If the system is bivariate and
one variable is weakly exogenous then the semi-parametric approach first
applied by Robinson and Marinucci (1998) to fractionally integrated series
appears to work well (Marinucci and Robinson 2001).
The number of observations likely to yield reasonable inference depends on
the nature and complexity of the problem to be analyzed and the order of
integration of the series. The advantage of the Johansen approach is that it
still provides an inferential procedure, which permits the long run to be estim-
ated and long-run systems to be identified, causal structure and endogeneity
tested. None of the other approaches appear to do all of the above. The
approach also generalizes to higher order cointegration.
In the next section we consider some further issues related to representa-
tions and in the next chapter issues of exogeneity and identification are
discussed.

4.7 Alternative representations of cointegration VAR

It was observed that switching between cointegrating forms in the Wold
VMA and the Johansen VAR was not a straightforward exercise. One possible
explanation is that the VAR and the VMA are always approximations; the other is
that the natural time series representation in the cointegration case is either a
VAR or a VMA. However, the finite VMA that forms the basis of the Granger
representation theorem and the Smith–McMillan–Yoo Form does not usually
conform with a finite order VAR. In this section we develop an extension to
the results previously considered, which derives from the literature on matrix
polynomials (Gantmacher 1960; Gohberg et al. 1983). Based on some broad
conditions for the extraction of divisor matrices from a matrix polynomial it
follows that the VMA can be directly inverted. In this section the Generalized
Bézout Theorem and an extension that considers the unit root case are used to
derive a VAR and VARMA representation for cointegration (Hunter 1989a,
1992). It is shown that, under the conditions required for the extended Bézout
Theorem, the VMA(1) inverts exactly to a VAR(1); this result is demonstrated
for a simple bivariate system, which is used by Burke and Hunter
(1998) to develop their Monte Carlo study. The section concludes with a brief
discussion of the articles by Haldrup and Salmon (1998) and Engsted and
Johansen (1999).

4.7.1 The Sargan–Bézout factorization


From the Wold decomposition C(L) is a finite matrix polynomial of degree q.
Following the convention in the literature on matrix decompositions it is
usual to look at the inverse of the spectral decomposition of C(L):

Q ( z ) = (Q0 z q + Q1z q −1 + …Q q ) = z qC(1 / z ).

If Q(z) is a matrix polynomial, such that Q0 ≠ 0, then it follows from the


Generalized Bézout Theorem:

Theorem 2 If Q0 ≠ 0, then there exists a left-hand divisor Q0(z) = (Iz – F) such that
Q(z) = Q0(z)Q1(z), if and only if, Q(F) = 0.

Proof: see Gantmacher (1960).
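
As an informal numerical check of Theorem 2 (not part of the original argument), the sketch below builds a quadratic Q(z) = (Iz – F)(Iz + B) and evaluates the left value Q(F) = F²Q0 + FQ1 + Q2. The matrices `F` and `B` and the helper names are hypothetical choices made only for this illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def neg(a):
    """Element-wise negation."""
    return [[-x for x in row] for row in a]

I2 = [[1, 0], [0, 1]]
F = [[2, 1], [0, 3]]    # hypothetical divisor matrix
B = [[1, -1], [4, 0]]   # hypothetical quotient Q1(z) = Iz + B

# Coefficients of Q(z) = (Iz - F)(Iz + B) = I z^2 + (B - F) z - F B
Q0, Q1, Q2 = I2, matadd(B, neg(F)), neg(matmul(F, B))

def left_value(X):
    """Left value Q(X) = X^2 Q0 + X Q1 + Q2 (powers of X multiply from the left)."""
    return matadd(matadd(matmul(matmul(X, X), Q0), matmul(X, Q1)), Q2)

Q_at_F = left_value(F)    # zero matrix: (Iz - F) divides Q(z) from the left
Q_at_I = left_value(I2)   # non-zero: (Iz - I) is not a left divisor of this Q(z)
```

The left value is used because the divisor sits on the left of the product; evaluating with powers of F on the right would in general not vanish.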

In the case where Q(z) has a block of common roots, then the result devised
by Sargan (1983a) to extract Matrix Common Factors in autoregressive error
models can be applied to the case of common unit roots:

Theorem 3 If Q(z) has a block of common roots (for cointegration on the unit
circle), then Q(z) has a left-hand divisor Q0(z) = (Iz – F), if and only if FQ(F) = 0.

Proof: Consider the quasi-monic matrix polynomial,

$$Q(z) = Q_0 z^q + Q_1 z^{q-1} + \dots + Q_q$$

where $Q_0 = I$ and rank(Q(1)) = n – r. If there is a left-hand divisor $Q_0(z) = (zI - F)$
of Q(z), then

$$zQ(z) = Q_0(z)Q_1(z). \tag{4.83}$$

By comparison of the jth polynomial powers of z on the left-hand and right-hand
side of (4.83):

$$Q_1^{(j)} = FQ_1^{(j-1)} + Q_j \quad \text{for } j = 1, \dots, q. \tag{4.84}$$


Expanding the right-hand side of (4.83) into its component matrices, when the
leading coefficient of $Q_1(z)$ is $I$, then:

$$(Iz - F)Q_1(z) = (Iz - F)\left(Iz^q + Q_1^{(1)}z^{q-1} + \dots + Q_1^{(q)}\right) = zQ(z) - FQ_1^{(q)}. \tag{4.85}$$

It follows that $FQ_1^{(q)} = 0$ is necessary and sufficient for (4.83) and (4.85) to be
isomorphic. Replacing j by q and re-arranging (4.84):

$$FQ_1^{(q)} = F^2Q_1^{(q-1)} + FQ_q = 0$$

or equivalently

$$F^2Q_1^{(q-1)} = -FQ_q.$$

By replacement of terms of the form $Q_1^{(q-k)}$ it follows that:

$$F^{q+1} = -F\sum_{k=0}^{q-1} F^kQ_{q-k}. \tag{4.86}$$

Gathering together terms in powers of F on the left-hand side of (4.86) and
extracting the common term in F gives rise to the following polynomial in F:

$$FQ(F) = F\sum_{k=0}^{q} F^kQ_{q-k} = 0. \tag{a}$$

The existence of the left-hand divisor relies on FQ(F) = 0, which occurs either
when the Generalized Bézout Theorem holds and Q(F) = 0 or when F is a left-
hand annihilator of Q(F). ■

This generalization implies that F lies in the null space of Q(F) or, when
rank(F) = r, then rank(Q(F)) = n – r. Given rank(Q(F)) = n – r, there exists an
r × n matrix $K_1$ which annihilates Q(F). There is an arbitrary matrix $K_2$ of
dimension n × r defined so that $K_1K_2$ is non-singular and, without loss of
generality, $F = K_2(K_1K_2)^{-1}K_1$ is an idempotent matrix which annihilates Q(F).
When F is idempotent, then by definition $F^k = F$ and:

$$FQ(F) = F\sum_{k=0}^{q} FQ_{q-k} = F\sum_{k=0}^{q} Q_{q-k} = FQ(1).$$

If F is idempotent, then the condition FQ(F) = 0 is equivalent to FQ(1) = 0 and
F is the matrix which annihilates Q(1). Consequently, Q(1) satisfies a necessary
condition for cointegration that rank(Q(1)) = n – r and F contains the
cointegrating vectors K1.
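
The construction can be illustrated with a small bivariate sketch, offered under stated assumptions rather than taken from the text: `K1`, `K2` and the coefficient `Q1` below are hypothetical values chosen so that K1 annihilates Q(1). The code forms F = K2(K1K2)⁻¹K1, confirms that it is idempotent, and shows a case in which Q(F) is not zero while FQ(F) = FQ(1) = 0, which is the situation covered by Theorem 3 but not by Theorem 2.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

K1 = [[Fr(1), Fr(-1)]]        # hypothetical 1 x 2 cointegrating vector (r = 1, n = 2)
K2 = [[Fr(1)], [Fr(-1)]]      # hypothetical 2 x 1 matrix with K1 K2 non-singular
k = matmul(K1, K2)[0][0]      # K1 K2 is the scalar 2 because r = 1
F = [[x / k for x in row] for row in matmul(K2, K1)]   # F = K2 (K1 K2)^{-1} K1

I2 = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
Q1 = [[Fr(-1, 2), Fr(1, 4)], [Fr(1, 2), Fr(-3, 4)]]    # Q(z) = I z + Q1, hypothetical

Q_at_1 = matadd(I2, Q1)       # Q(1), of rank n - r = 1
Q_at_F = matadd(F, Q1)        # left value Q(F) = F Q0 + Q1 with Q0 = I

F_squared = matmul(F, F)      # equals F when F is idempotent
F_Q_at_F = matmul(F, Q_at_F)  # zero, although Q(F) itself is not
F_Q_at_1 = matmul(F, Q_at_1)  # zero: F annihilates Q(1)
```

With K2 chosen as the transpose of K1, F equals ½[[1, –1], [–1, 1]], so the rows of F are proportional to the cointegrating vector K1.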
As Q(z) is a simple inversion of the ordering of the spectral form of C(z),
a similar result exists for C(z):

$$z^q C(z^{-1}) = Q(z) = Q_0(z)Q_1(z).$$

Given that F is idempotent, $Q_0(z)$ has the following Smith rational form:

$$Q_0(z) = (Iz - F) = H^{-1}\begin{bmatrix}(z-1)I_r & 0\\ 0 & zI_{n-r}\end{bmatrix}H$$

and F the following canonical form

$$F = H^{-1}\begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}H.$$

Following Engle and Granger (1987), the necessary condition for cointegra-
tion is K1C(1) = 0. For K′1 = [K′11 : K′12], then any matrix K′2 which satisfies the
condition that (K1K2) is non-singular can be used. It is convenient to select
K′2 = [K′12 : 0] as it is then straightforward to show that H1 = K1 and H1xt = t
defines an r vector of cointegrating variables as F then has the following form

$$F = \begin{bmatrix}I_r & K_{11}^{-1}K_{12}\\ 0 & 0\end{bmatrix}.$$

The only condition required for Theorem 3 to go through is the existence of a
block of common roots in Q(z). In the cointegration case Q(z) has a canonical
form  with a block of unit roots and a sub-matrix 1 with roots within the
unit circle. Then  has a multiplicity of r common roots and the left-hand
divisor has r unit roots and n – r zero roots and moving from frequency
domain to the time domain C(L) can be decomposed in an equivalent manner
to Q(L) when there are sufficient zero roots31

$$\Delta x_t = C_0(L)C_1(L)\varepsilon_t \tag{4.87}$$

where $Q(L) = L^{q-1}C(L^{-1})$, $Q_1(L) = L^{q}C_1(L^{-1})$, $Q_0(L) = LC_0(L^{-1})$ and

$$Q_0(L) = H^{-1}\begin{bmatrix}(L-1)I_r & 0\\ 0 & LI_{n-r}\end{bmatrix}H = LC_0(L^{-1}) = LH^{-1}\begin{bmatrix}(L^{-1}-1)I_r & 0\\ 0 & L^{-1}I_{n-r}\end{bmatrix}H.$$

Therefore:

$$C_0(L) = H^{-1}\begin{bmatrix}(1-L)I_r & 0\\ 0 & I_{n-r}\end{bmatrix}H.$$

Having defined a unique factorization, which extracts an appropriate
number of unit roots, the Yoo inversion procedure can be applied to (4.87).
Therefore:

$$C_0(L)^{-1}\Delta x_t = C_1(L)\varepsilon_t.$$

The non-invertible MA is eliminated by cancellation of the inverted difference
operator in $C_0(L)^{-1}$:

$$C_0(L)^{-1} = H^{-1}\begin{bmatrix}\dfrac{1}{1-L}I_r & 0\\ 0 & I_{n-r}\end{bmatrix}H.$$

By extracting a common factor C(L) becomes quasi-invertible, because it
removes a partial over-difference. Therefore:

$$A_0(L)x_t = C_1(L)\varepsilon_t$$

where

$$A_0(L) = H^{-1}\begin{bmatrix}I_r & 0\\ 0 & \Delta I_{n-r}\end{bmatrix}H = \left(\Delta I - H^{-1}\begin{bmatrix}LI_r & 0\\ 0 & 0\end{bmatrix}H\right) = (\Delta I - FL). \tag{4.88}$$

The above factorization is unique as long as (a) above holds and this prohibits
the possibility of polynomial cointegration. The partial common factor (1 – L)
cancels to leave the following VARMA(1,q) in levels and differences:

$$(\Delta I - FL)x_t = C_1(L)\varepsilon_t \tag{4.89}$$

or

$$\Delta x_t - Fx_{t-1} = C_1(L)\varepsilon_t$$

where $F = K_2(K_1K_2)^{-1}K_1$ is an idempotent matrix, FC(1) = 0 and $K_1x_t$ defines
a block of r cointegrating vectors. Under cointegration, when the $K_1x_t$ processes
are all I(0), then $C_1(L)$ is invertible and (4.89) has the following VAR
representation

$$C_1(L)^{-1}\Delta x_t - C_1(L)^{-1}Fx_{t-1} = \varepsilon_t$$

where $A(L) = C_1(L)^{-1}$. It is now straightforward to transform this into an
error-correcting vector autoregressive (ECmVAR) representation, as the conventional
reparameterization sets $A(L) = A(0) + (1 - L)A^*(L)$. Therefore:

$$A(L)\Delta x_t - \left(A(0) + (1 - L)A^*(L)\right)Fx_{t-1} = \varepsilon_t$$

or

$$\Gamma(L)\Delta x_t = \Pi x_{t-1} + \varepsilon_t$$

where $\Gamma(L) = A(L) - A^*(L)FL$, $\Pi = A(0)F = \alpha\beta'$ and for the VAR to be equivalent
to the VARMA(1,q) form it follows:

$$\begin{aligned}
A(0)F &= A(0)H^{-1}\begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}H_1\\ H_2\end{bmatrix}\\
&= A(0)H^{-1}\begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}H_1\\ H_2\end{bmatrix}\\
&= A(0)H_1^*H_1,
\end{aligned}$$

$\alpha = A(0)H_1^*$, $\beta' = H_1$ and $H^{-1} = [H_1^* : H_2^*]$. As a consequence the cointegrating
vectors are equivalent to those which result from the VARMA(1,q).
Identification will be considered in more detail in the next chapter, but for
the case considered identification stems from the existence of a number of
what will be called weakly exogenous variables.32 If A(0) has full rank and H*
has rank r this implies that  can be factorized so that there is an n × r block of
well-defined elements. It is also of interest to notice that conditional on the
knowledge of the number of cointegrating vectors, the VAR has the following
structural representation:
$$\Gamma(L)\Delta x_t - A(0)Fx_{t-1} = \varepsilon_t$$

or

$$\Gamma^{+}(L)\Delta x_t = Fx_{t-1} + \varepsilon_t$$

where $\Gamma^{+}(L) = A(0)^{-1}\Gamma(L)$, which has the same cointegrating vectors as the
VARMA(1,q) representation.

4.7.2 A VAR(1) representation of a VMA(1) model under cointegration


The following example is used to motivate the algebraic results presented
above and assist the reader's understanding. If the underlying process is a
VMA(1), then the analytic result presented above yields a very simple alternative
representation, which is a VAR(1) case:

$$\Delta x_t = C(L)\varepsilon_t \tag{4.90}$$

where

$$C(L) = \begin{bmatrix}1 - \tfrac{1}{2}L & \tfrac{1}{2}L\\ \tfrac{1}{2}L & 1 - \tfrac{1}{2}L\end{bmatrix}.$$

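Using the notation of the section above

$$M = \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix},\quad F = \frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}\quad\text{and}\quad H = h\begin{bmatrix}1 & -1\\ -1 & -1\end{bmatrix}$$

where h is any non-zero scalar. As $C_1(L) = I$ in this case, then it follows from
(4.88) that the VAR representation is

$$\left(I - \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}L\right)x_t = \varepsilon_t$$

the VECM being

$$\Delta x_t = -\frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}x_{t-1} + \varepsilon_t.$$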

An alternative way of explaining these manipulations is to state that an operator,
A(L), is required such that

$$A(L)C(L) = \Delta I,$$

where I is the 2 × 2 identity matrix. Pre-multiplying (4.90) by A(L) yields

$$A(L)\Delta x_t = A(L)C(L)\varepsilon_t = \Delta\varepsilon_t.$$

The differencing operator then cancels, so that, apart from initial values,

$$A(L)x_t = \varepsilon_t.$$
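
The phrase "apart from initial values" can be made concrete with a short simulation, given here as a sketch under stated assumptions: the sample size, the seed and the zero starting values are arbitrary choices for the illustration, not values from the text. Data are generated from the VMA(1) in differences and the VAR(1) in levels is then checked observation by observation.

```python
import random

random.seed(0)                       # fixed seed so the run is reproducible
F = [[0.5, -0.5], [-0.5, 0.5]]       # idempotent MA lag matrix: C(L) = I - F L
B = [[0.5, 0.5], [0.5, 0.5]]         # VAR(1) lag matrix: A(L) = I - B L, with B = I - F

def matvec(m, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

T = 200                              # hypothetical sample size
# eps[0] and x[0] are set to zero: an assumption about the initial values
eps = [[0.0, 0.0]] + [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(T)]
x = [[0.0, 0.0]]
for t in range(1, T + 1):
    f_eps = matvec(F, eps[t - 1])
    # VMA(1) in differences: x_t = x_{t-1} + eps_t - F eps_{t-1}
    x.append([x[t - 1][i] + eps[t][i] - f_eps[i] for i in range(2)])

# Largest discrepancy in the VAR(1) in levels: x_t - B x_{t-1} - eps_t
var_gap = max(abs(x[t][i] - matvec(B, x[t - 1])[i] - eps[t][i])
              for t in range(1, T + 1) for i in range(2))

# The cointegrating combination x_1t - x_2t reduces to eps_1t - eps_2t
coint_gap = max(abs((x[t][0] - x[t][1]) - (eps[t][0] - eps[t][1]))
                for t in range(1, T + 1))
```

Both gaps are zero up to floating-point rounding with these starting values; other initial conditions would leave a constant term, which is what the qualification in the text allows for.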

When A(L) and C(L) are first order, a sufficient condition on C(L) is that the
matrix lag coefficient must be idempotent. The required lag coefficient of A(L)
may then be solved for. In this case $A(L) = \left(I - \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}L\right)$, therefore:

$$\begin{aligned}
A(L)C(L) &= \begin{bmatrix}1 - \tfrac{1}{2}L & -\tfrac{1}{2}L\\ -\tfrac{1}{2}L & 1 - \tfrac{1}{2}L\end{bmatrix}\begin{bmatrix}1 - \tfrac{1}{2}L & \tfrac{1}{2}L\\ \tfrac{1}{2}L & 1 - \tfrac{1}{2}L\end{bmatrix}\\
&= \begin{bmatrix}(-0.5L + 1.0)^2 - 0.25L^2 & 0.0\\ 0.0 & (-0.5L + 1.0)^2 - 0.25L^2\end{bmatrix}\\
&= \begin{bmatrix}1 - L & 0\\ 0 & 1 - L\end{bmatrix}.
\end{aligned}$$
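
The same product can be checked mechanically. The following is a minimal sketch, assuming matrix polynomials are stored as lists of coefficient matrices in ascending powers of L; the function name `polymat_mul` is a hypothetical helper, not something defined in the text.

```python
from fractions import Fraction as Fr

def polymat_mul(A, C):
    """Multiply two matrix polynomials given as [M0, M1, ...] in powers of L."""
    n = len(A[0])
    out = [[[Fr(0)] * n for _ in range(n)] for _ in range(len(A) + len(C) - 1)]
    for p, Ap in enumerate(A):
        for q, Cq in enumerate(C):
            for i in range(n):
                for j in range(n):
                    out[p + q][i][j] += sum(Ap[i][k] * Cq[k][j] for k in range(n))
    return out

h = Fr(1, 2)
I2 = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
A = [I2, [[-h, -h], [-h, -h]]]   # A(L) = I - (1/2)[[1, 1], [1, 1]] L
C = [I2, [[-h, h], [h, -h]]]     # C(L) = I - (1/2)[[1, -1], [-1, 1]] L

product = polymat_mul(A, C)      # coefficients of L^0, L^1 and L^2
```

The coefficient on L² vanishes and the coefficient on L is –I, so the product is (1 – L)I as stated.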
It is also of interest to note from the Granger reparameterization applied to
the AR and the MA representation, that the above condition implies:

$$\begin{aligned}
A(L)C(L) &= (A(1)L + \Delta A^*(L))(C(1)L + \Delta C^*(L))\\
&= A(1)C(1)L^2 + \Delta A^*(L)C(1)L + \Delta A(1)C^*(L)L + \Delta^2A^*(L)C^*(L)\\
&= \Delta I.
\end{aligned} \tag{4.91}$$

It is necessary and sufficient for the above result to hold that the following
conditions apply

$$A(1)C(1) = \alpha\beta'C(1) = 0$$

$$\Delta A^*(L)C(1)L + \Delta A(1)C^*(L)L + \Delta^2A^*(L)C^*(L) = \Delta I.$$

For the example you will observe that:

$$A(1)C(1) = \frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}\frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix} = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix} = 0,$$

which derives from the condition for cointegration $\beta'C(1) = 0$. If we now look
at the second term, then this yields the difference operator that cancels and
for this example $A^*(L) = I$ and $C^*(L) = I$. Therefore:

$$\Delta A^*(L)C(1)L + \Delta A(1)C^*(L)L + \Delta^2A^*(L)C^*(L) = \Delta C(1)L + \Delta A(1)L + \Delta^2.$$


Applying this result to the matrices for the VAR(1) example above demonstrates
the result presented in (4.91):

$$\begin{aligned}
\Delta C(1)L + \Delta A(1)L + \Delta^2
&= \frac{1}{2}\begin{bmatrix}1 - L & 1 - L\\ 1 - L & 1 - L\end{bmatrix}L + \frac{1}{2}\begin{bmatrix}1 - L & -1 + L\\ -1 + L & 1 - L\end{bmatrix}L + \begin{bmatrix}1 - 2L + L^2 & 0\\ 0 & 1 - 2L + L^2\end{bmatrix}\\
&= \begin{bmatrix}-2L + L^2 + 2L(-\tfrac{1}{2}L + \tfrac{1}{2}) + 1 & L(-\tfrac{1}{2}L + \tfrac{1}{2}) + L(\tfrac{1}{2}L - \tfrac{1}{2})\\ L(-\tfrac{1}{2}L + \tfrac{1}{2}) + L(\tfrac{1}{2}L - \tfrac{1}{2}) & -2L + L^2 + 2L(-\tfrac{1}{2}L + \tfrac{1}{2}) + 1\end{bmatrix}\\
&= \begin{bmatrix}-L + 1 & 0\\ 0 & -L + 1\end{bmatrix}.
\end{aligned}$$

4.7.3 Further discussion of representation and estimation


Beyond what is discussed here, a number of other attempts have been made to
explain non-stationary series that cointegrate. A variant of what has been
termed the Bewley transformation was developed by Wickens and Breusch
(1988), while Gregoir and Laroque (1994) develop a definition for time series
processes under cointegration that embodies polynomial cointegration.
Haldrup and Salmon (1998) have developed a number of decompositions of
C(L) that separate out components with different orders of integration, based
on forms that generalize the Smith–McMillan form developed by Yoo (1986).
The theory of monic matrix polynomials (Gohberg et al. 1983) has been used
to factor the Wold form and transform the VMA into a VECM (Engsted and
Johansen 1999).
However, none of these alternative representations has yielded a structure
for inference and estimation to match the Johansen methodology. The reason
for seeking an alternative approach derives from an inability to properly
invert either VMA or VARMA representations. The Johansen procedure may be
severely compromised by significant MA behaviour and this is not alleviated
by increasing the number of observations. More specifically, should there be a
VMA process generating the data then the order of VAR that approximates the
VMA will increase with the number of observations. In the limit, the VAR
order that properly approximates MA or ARMA behaviour is infinite.
Should that be the case then one might consider approximating the error
behaviour using a semi-parametric estimator. Phillips and Hansen (1990) and
Marinucci and Robinson (2001) have developed this approach to estimation
and inference for long-run behaviour. Toda and Phillips (1994) suggest a tri-
angular representation to identify the long-run relationships, which suggests
that the Phillips and Hansen approach can be applied sequentially to estimate
the long-run equations of a system. However, this type of structure will only
by chance embed the types of restriction that economic theory might suggest
and such systems might be viewed as a long-run reduced form. As was stated
above, much of the evidence in support of modified estimators relates to
bivariate or trivariate systems and often still requires the use of a test of
cointegration.
Otherwise, the VARMA is preferred to the VAR when it defines a parsimo-
nious time series representation of the data. The existence of VMA errors is
likely when the original data has been differenced, as often strongly auto-
regressive univariate time series exhibit signs of some over-differencing. The
above is generally observed as a spike in the autocorrelation function at
the frequency of the difference. An advantage of the factorization presented in
the last section is that it maintains a minimum order for the lag-length of the
MA and AR components. When compared with the VAR derived using the
Smith–McMillan–Yoo form, the VARMA defines a unique factorization, which
can be made robust to the choice of r the number of cointegrating vectors and
when r is known, the long-run parameters can be estimated in one step. It is
also feasible that a Johansen type procedure can be applied in this case
(Hunter and Dislis 1996). The VARMA approach associated with this decom-
position selects unique linear combinations of variables which are stationary
when FC(1) = 0. Were an exact VARMA procedure to be used, then it is possible
to handle roots on or inside the unit circle (Phadke and Kedem 1978).
A similar approach has been adopted by Lütkepohl and Claessen (1993),
though they estimate the long run using the Johansen procedure and then
estimate the short-run model using a VARMA model.

4.8 Conclusion

In this chapter, cointegration associated with series that are I(1) or may be
transformed to being I(1), has been considered. Granger (1983) first specified
cointegration in terms of VMA processes which have been over-differenced. If
one considers such over-differencing, then it is mirrored in the error processes,
which then exhibit moving average behaviour with unit roots. The theory was
developed for a system of equations and from the reparameterization of the
VMA polynomial follows the fundamental result for cointegration that
rank(C(1)) = n – r. This implies that there are r over-differences or r unit roots
in the moving average representation of the differenced data. The over-
differences relate the series that cointegrate or form linear combinations that
are stationary, while the remaining n – r series, require differencing to be
made stationary. In the Granger representation theorem it is shown that the
linear combinations that are stationary are associated with error correction
terms or cointegrating vectors that transform the non-stationary series to
stationarity. The cointegrating vectors transform the series to stationarity
under the Wold form, because they annihilate C(1), which leads to the r
cointegrating variables having a multivariate moving average representation
with all roots outside the unit circle.
Unfortunately, it is not easy to show that the VMA in differences inverts to
a VAR in levels. The result developed by Engle and Granger (1987) is only
valid for bivariate systems. Yoo (1986) developed a factorization based on
Smith–McMillan forms, but these are only correct when C (L) is a rational
polynomial. In this chapter an alternative approach is developed, which gives
rise to an exact inversion of the VMA to an error correcting VAR, but this
requires a matrix F that is idempotent and which annihilates C(1). It follows
that F contains the cointegrating vectors.
Johansen, in a sequence of papers that are best summarized in Johansen
(1991, 1995), decided to adopt an approach based on the VAR to estimate and
test for cointegration. The difficulty with the VAR is that it is difficult to prove
that it exists and to show that the cointegrating vectors define stationary
series excepting when all the series are by definition I(1). This is not the case
for the Granger representation theorem as the Wold form always exists and
when the cointegrating vectors annihilate C(1), then they always yield new
series that are I(0). However, in the context of the VAR inference and estima-
tion are relatively straightforward. It is also easy to undertake inference on the
cointegrating vectors once the appropriate order of the VAR has been selected.
The Johansen procedure has been an enormously useful tool for modelling
non-stationary time series and, as may be observed from the plethora of arti-
cles based on this methodology, it has been much used in economics and
finance. With sufficient data, the tests have relatively good size and power
properties as long as the underlying disturbances are Gaussian and the order
of the VAR is finite. Should there be non-normality or should the VAR length
be difficult to determine then the approach might be jeopardized and the tests
ill sized (Hubrich et al. 2001). It would appear possible to correct for some of
these problems by altering the model specification to correct for ARCH behav-
iour in the variance and outliers to capture the non-normality. It is even pos-
sible to correct the Johansen method for moving average errors (Hunter and
Dislis 1996). However, all of these corrections require alternative tables for
the Johansen test statistic.
In the next chapter the questions of exogeneity and identification are considered
in the context of the Johansen VAR.
5
Exogeneity and Identification

In this chapter, we consider the question of long-run exogeneity and the
related issue of identification. In the authors’ opinion, detection of the exogenous
variables in either the long run or the short run is a precursor to any
attempt to structurally identify economic or financial phenomena.
In the preceding chapters, such issues were not addressed because single
equations are always identified to a normalization and VAR models are viewed
as being multicausal. Economic theory often determines that certain variables
are viewed as being exogenous to the system, but, given the inherent inter-
relatedness of economic systems, it may prove too arbitrary to purely permit
theory to select what is exogenous as compared with endogenous.
It is difficult for the economist to concede that the theory might not be
paramount in this context or that there may well be systems where theory
may have no prior view as to the variables that are exogenous. The require-
ment to devise a theory of exogeneity, which is about the model within which
variables are embedded, has led to the development of a range of alternative
notions of exogeneity based on the principle that models are almost invari-
ably incomplete. Engle et al. (1983) defined such notions in the short run,
while a similar sort of discussion for the long run occurs in Ericsson (1994)
and further consideration is given to these ideas here.
The notion of exogeneity combined with the existence of a set of exogenous
variables is viewed as a preface to any process of identification. A distinction is
drawn between the theoretical (or generic) concept of identification and
empirical identification. Generic identification relates to the technical feasibil-
ity of being able to detect the parametric structure of the model. The process
of generic identification may or may not reveal operational conditions either
necessary, sufficient or both necessary and sufficient to identify. Empirical
identification relates to an ability to detect by a range of measurable condi-
tions the parameters of a model. Consequently, even though a model may be
generically identified, empirically this might not be the case and vice versa.

In this chapter, the idea of exogeneity is first discussed in broad terms and
it is then considered relative to the long-run parameters. When compared
with the short run some of the long-run concepts are directly testable.
Identification is then discussed in terms of a conventional system of equations
and finally in terms of the long-run parameters of the model.

5.1 An introduction to exogeneity

In terms of cointegration Johansen (1992) first defined the conditions on
the matrix of loadings ($\alpha$) for weak exogeneity when the matrix of cointegrating
vectors defines the parameters of interest. Hunter (1992a) extended
the discussion in Johansen to deal with weak exogeneity for a sub-block
($\beta_i$) of parameters in the VAR and cointegrating exogeneity. Cointegrating
exogeneity implies a separation between the cointegrating vectors or
long-run non-causality between the exogenous variables (z) and endogenous
variables (y).
As is reported in Hendry and Mizon (1993), a necessary condition for
weak exogeneity is a block triangular $\Pi$ matrix. This would suggest that
cointegrating exogeneity is an exact long-run analogue of strong exogeneity (see
Engle et al. 1983) as it combines weak exogeneity for a sub-vector with long-
run non-causality. The statement above is valid when further restrictions
are applied to the  matrix. One such type of restriction leads to the quasi-
diagonal form first discussed in Hunter (1992). Hall and Wickens (1994) point
out an observational equivalence between the triangular and diagonal forms
of  associated with cointegrating exogeneity. As is stated in Hunter and
Simpson (1995), this algebraic result holds only under very special conditions
and it is only consistent with the definition of weak exogeneity of z for a
specific sub-block of parameters (i) when strong exogeneity is accepted.
It follows, from Engle et al. (1983), that exogeneity is model-dependent in
the sense that variables are exogenous for a particular parameterization of a
model. This is of interest as in the context of the long run the standard
definitions of exogeneity can be directly tested (see Ericsson and Irons 1994).
In particular weak exogeneity, cointegrating exogeneity, strong exogeneity for
both a sub-block and for  depend on restrictions on  and .

5.1.1 Conditional models and testing for cointegration and exogeneity


In this section we formulate a VAR system and relate it to an error correction
model. The conditions for cointegration are specified in terms of the levels
parameters in the error correction model. Cointegration imposes a restriction
on the matrix of long-run parameters, implying that questions about the
nature of exogeneity need to be discussed in this context. At the end of
this section, we look at cointegrating exogeneity and the restrictions on the
long-run parameters associated with cointegrating and weak exogeneity.


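Consider the n variable, kth order VAR in levels with Gaussian errors:

$$A(L)x_t = \mu + \varepsilon_t \quad\text{and}\quad \varepsilon_t \sim NIID(0, \Sigma)\ \text{for } t = 1, \dots, T \tag{5.1}$$

where $A(L) = (I + A_1L + A_2L^2 + \dots + A_kL^k)$. In error correction form:

$$\Gamma(L)\Delta x_t = \Pi x_{t-1} + \mu + \varepsilon_t \tag{5.2}$$

where $\Gamma(L) = (I + \Gamma_1L + \Gamma_2L^2 + \dots + \Gamma_kL^k)$ and $\Pi x_{t-1}$ is a set of non-zero stationary
linear combinations of $x_{t-1}$. The hypothesis of r cointegrating vectors is:

$$H_1(r): \Pi = \alpha\beta'$$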

where rank($\Pi$) = rank($\alpha$) = rank($\beta$) = r and 0 ≤ r ≤ n. Conditional on the rank
of $\Pi$ and thus r, we can test further restrictions on $\Pi$ to determine whether the
variables are cointegrating exogenous and/or weakly exogenous. Engle et al.
(1983) distinguish between a number of concepts of exogeneity: strict, strong,
weak and super. The cointegration literature has mainly dealt with the weak
exogeneity of a variable $z_t$ for $\beta$ (Johansen 1992b). Weak exogeneity is defined
in terms of specific parameters of interest and formulated in terms of the
distribution of observable variables. The joint density of $x_t$ in (5.2) can be
partitioned into a conditional density of $y_t$ given $z_t$ and a marginal density of $z_t$
(Engle et al. 1983):

$$D(x_t \mid X_{t-1}, \theta) = D(y_t \mid z_t, X_{t-1}, \lambda_1)\,D(z_t \mid X_{t-1}, \lambda_2)$$

where $x_t = [y_t, z_t]$ and $X_t = (X_0, x_1, x_2, \dots, x_t)$. Weak exogeneity requires that the
parameters of interest depend on only the parameters of the conditional
density of $y_t$ and that there is a sequential cut of the parameter spaces for $\lambda_1$
and $\lambda_2$ (Florens, Mouchart and Rolin 1990). If so, the marginal density for $z_t$
can be ignored without loss of information when conducting statistical inference
about the parameters of interest. Strong exogeneity combines weak exogeneity
with Granger non-causality, so that the marginal density for $z_t$
becomes $D(z_t \mid Z_{t-1}, \lambda_2)$. Super exogeneity requires weak exogeneity and that
the parameters of the conditional process for $y_t$ are invariant to changes in the
process for $z_t$. Weak exogeneity can either be defined in terms of the $\beta$ matrix
as a whole or in terms of a sub-block $\beta_{.1}$.

5.1.2 Cointegration and exogeneity


Here, we take as our point of departure the matrix of long-run parameters ($\Pi$)
in the vector auto-regression (VAR) in error correction form. To define more
precisely the different forms of exogeneity we partition $\Pi$ into blocks of
cointegrating vectors associated with $y_t$ and those related to $z_t$:

$$\Pi = \begin{bmatrix}\Pi_{1,1} & \Pi_{1,2}\\ \Pi_{2,1} & \Pi_{2,2}\end{bmatrix} = \begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix}' = \begin{bmatrix}\alpha_1\beta_1' & \alpha_1\beta_2'\\ \alpha_2\beta_1' & \alpha_2\beta_2'\end{bmatrix}$$

where $\alpha_1$ is $n_1 \times r$, $\alpha_2$ is $n_2 \times r$, $\beta_1$ is $n_1 \times r$ and $\beta_2$ is $n_2 \times r$. Hendry and Mizon
(1993) emphasize the exogeneity conditions associated with the short-run
parameters of the VAR. It also holds in the long run that a necessary condition
for weak exogeneity of $z_t$ for a sub-block $\beta_{.1} = [\beta_{1,1}\ \beta_{2,1}]$ is a block triangular $\Pi$
matrix

$$\Pi = \begin{bmatrix}\Pi_{1,1} & \Pi_{1,2}\\ 0 & \Pi_{2,2}\end{bmatrix}.$$
However, the triangular form is not sufficient for weak exogeneity which
means that we require further restrictions for appropriate long-run inference
in a conditional model. When triangularization is combined with (i) below
(see Hendry and Mizon 1993) then these two conditions are necessary and
sufficient for weak exogeneity of zt for the sub-block .1:
1,2 = ∑1,2 ( ∑ 2,2 )−1 2,2 = 0 or 1,2 = − ∑1,2 ( ∑ 2,2 )−1 2,2 . (i)

Cointegrating exogeneity augments the triangular $\Pi$ with non-causality
between y and z at the level of the system. Hence, the long-run relationships
for z do not depend on the levels of y. It follows that $z_t$ is cointegrating exogenous
for the sub-vector $\beta_{.1}$, if and only if:

$$\Pi_{2,1} = 0. \tag{ii}$$

Following Hunter (1990), this form of separating cointegration is quite arbit-


rary in that any orthogonal combination of 1 and 2 satisfy (ii). However,
when .1 defines the parameters of interest, then the partition 2 = [0 : 11]
and 1 = [1,1 : 0] is the only one that is relevant. This gives rise to following
matrix of long-run parameters (Hunter 1992):

∏1,1 ∏1,2  1,1 1,2  1′ ,1 2′ ,1 


∏=  =  
∏ 2 ,1 ∏ 2 ,2   0 2 ,2   0 2′ ,2 
1,11′ ,1 1,12′ ,1 + 1,22′ ,2 
= 
 0 2,22′ ,2  (iii)

where i,j is (ni × rj) and ′i,j is (rj × ni), and the following vectors: it = .′1xt
2t = ′2.2zt define r1 and r2 blocks of stationary variables.
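
The block structure in (iii) can be illustrated numerically. The sketch below is only an example under stated assumptions: the dimensions n1 = n2 = 2 and r1 = r2 = 1 and all numerical entries are hypothetical, chosen so that the loadings and the transposed cointegrating vectors are both block upper triangular.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Loadings: rows 0-1 belong to y, rows 2-3 to z; column 0 is the first
# cointegrating vector, column 1 the second. The lower-left block is zero.
alpha = [[1, 2],
         [3, -1],
         [0, 4],
         [0, 5]]

# Transposed cointegrating vectors: the second vector excludes y.
beta_t = [[1, -1, 2, 0],
          [0, 0, 1, -3]]

Pi = matmul(alpha, beta_t)            # 4 x 4 long-run matrix
Pi_21 = [row[:2] for row in Pi[2:]]   # block linking z equations to levels of y
```

The lower-left block of the product is zero, which is condition (ii): the long-run relationships in the z equations do not involve the levels of y.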
If conditions (i) and (ii) above hold, then cointegrating exogeneity in this
form is an exact analogue of strong exogeneity as in the usual setting of
dynamic models weak exogeneity is combined with non-causality (see Engle
et al. 1983). Unfortunately the restrictions implied by (i) are not easy to
impose which leads to the alternative special case of diagonalization first dis-
cussed in Hunter (1992). Diagonalization or quasi-diagonalization of the
system requires (ii) in combination with (iv) below.

1,2 = 0 (iv)
However, (iv) is sufficient for 1,2 = 1,2 (2,2)–1 2,2 = 0 as this condition implies
that 1,2 = 0.1 Once the quasi–diagonal form is accepted, then weak exogeneity
of zt for .1 is equivalent to weak exogeneity of zt for the first n1 blocks of . As
a result the first sub-block of cointegrating vectors can be estimated from the y
sub-system. Hall and Wickens (1994) discuss a special case of the above result
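which occurs when $\alpha_{1,1}$ is non-singular. As a result, the quasi-diagonal form is
observationally equivalent to the cointegrating exogenous case. This occurs
when rank($\alpha_{1,1}$) = $n_1$ = $r_1$, because it is then possible to reparameterize $\Pi$ in the
following way

$$\Pi = \begin{bmatrix}\alpha_{1,1} & 0\\ 0 & \alpha_{2,2}\end{bmatrix}\begin{bmatrix}\beta_{1,1}' & b\\ 0 & \beta_{2,2}'\end{bmatrix} = \begin{bmatrix}\alpha_{1,1}\beta_{1,1}' & \alpha_{1,1}b\\ 0 & \alpha_{2,2}\beta_{2,2}'\end{bmatrix}.$$

This diagonal form is equivalent to (iii) above when $b = \beta_{2,1}' + (\alpha_{1,1})^{-1}\alpha_{1,2}\beta_{2,2}'$
and:2

$$\Pi = \begin{bmatrix}\alpha_{1,1}\beta_{1,1}' & \alpha_{1,1}\beta_{2,1}' + \alpha_{1,2}\beta_{2,2}'\\ 0 & \alpha_{2,2}\beta_{2,2}'\end{bmatrix}.$$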
However, when 1,1 is non-singular, then b is a linear combination of some
minimal or more primitive set of cointegrating vectors of which 2,1 and 2,2
are sub-blocks. This difficulty in interpretation does not arise when zt is
weakly exogenous for b.1 = [1,1: b], but weak exogeneity implies a sequential
cut in the parameter space, which only occurs when (1,1)– 1 1,2 ′2,2 = 0
as otherwise b.1 = f(.2), which violates the condition for a sequential cut.
If (1,1)– 1 1,2 ′2,2 = 0 then either zt is weakly exogenous for .1 or 2,2 = 0.
In the latter case r2 = 0, r = r1 and the system is decomposed into r2 difference
stationary variables and r1 = n1 stationary variables.
It is more usual to start from the proposition that zt is weak or cointegrating
exogenous for some parameters of interest .1, block triangularity implies and
is implied by (ii) when .1 define the parameters of interest. However, zt is
only weakly exogenous for .1 when 12 = 0 or zt is strongly exogenous. The
invariance of .1 when a block diagonality restriction is applied is an indicator
that the diagonal form is valid.

5.1.3 Tests of long-run exogeneity


For such tests see Johansen (1991a) and Mosconi and Giannini (1992). Long-
run exclusion (Juselius 1994) and weak exogeneity tests can be readily applied,
while cointegrating exogeneity can be implemented using the procedure in
PCGIVE (Doornik and Hendry 2001). A more detailed explanation of the tests
of cointegrating exogeneity is given in Hunter (1992). Johansen and Juselius
(1990) show that conditional on the choice of r, a likelihood ratio test, which
is asymptotically distributed chi-squared can be used to test these hypotheses.
A range of tests associated with $\alpha$ and $\beta$, which are related to the restrictions
discussed both above and subsequently, are discussed in more detail by
Johansen and Juselius (1992) and Mosconi and Giannini (1992). Such tests
were categorized as follows in Hunter (1992):

H 4  :  = H 4  . H 4  (n × s),( s × r ).
H 6  :  = ( H 6  1! 2 ), H 4  (n × s), 1 ( s × r1 ), ! 2 (n × r2 ).
H7  :  = (!1 , H7  2 ), H7  (n × s), 2 ( s × r2 ), !1 (n × r1 ).
H 4 :  = H 4  . H 4 (n × s),  ( s × r ).
H 6 :  = (H 6 1 ,"2 ) H 4 (n × s), 1 ( s × r1 ), "2 (n × r2 ).
H7 :  = ("1 , H7  2 ). H7 (n × s), 2 ( s × r2 ), "1 (n × r1 ).
where r ≤ s ≤ n and r1 + r2 = r

Tests of weak exogeneity, long-run exclusion, strict exogeneity, cointegrating
exogeneity and diagonalization are presented in Table 5.1 and implemented
in PCGIVE (Doornik and Hendry 2001).
Let us consider the results presented in Hunter (1992)3 who tests for WE, CE
and diagonalization in the context of a six-variable VAR(2) model, which is an
extension of the VAR model presented in Johansen and Juselius (1992). As is
discussed in Johansen (1992), the diagnostic tests are conditional on the coin-
tegrating rank being assumed to be the same as that selected by Johansen and
Juselius (1992) as r = 2. Variables in logarithms are: oil prices (pot), UK prices
(p1t), world prices (p2t), the UK effective exchange rate (e12t), UK treasury bill
rate (i1t) and the Eurodollar rate (i2t).4
Before undertaking the usual tests, Hunter (1992) checks whether the oil
price can be excluded from the long-run behaviour of the model. The test
applied is termed general exclusion by Juselius (1995) or strict exogeneity by
Hunter and Simpson (1995). The test combines two hypotheses: that the oil
price is WE for β, which means that none of the cointegrating vectors appears
in the short-run oil price equation, and long-run exclusion (LE), which means
that the oil price is excluded from all the cointegrating vectors. The
restriction for weak exogeneity implies that the first row of α is set to zero,
which, based on the framework presented above, requires a 5 × 2 matrix of
freely estimated parameters θ and a 6 × 5 selection matrix H4α, the exact
form of which is:
0 0 0 0 0  0 0 
 = H 4  =   ,  =  .
 I 5   2 
Similarly for long-run exclusion:

0 0 0 0 0  0 0  5
 = H4   =  ,  =  .
 I 5   2 
Hence, H4 is a 6 × 5 selection matrix and φ is a 5 × 2 matrix of
unrestricted parameters. Testing for strict exogeneity requires the joint
application of the restrictions associated with LE (H4) and WE (H4α). Using
the results presented in Table 4 of Hunter (1992), the restriction does not
hold, as χ²(4) = 23.83 exceeds the 5% critical value (9.49). As a result of the
above finding, all subsequent tests were applied to a model that included all
six variables.
By applying to each variable the same type of restriction as H4α above, Hunter
(1992) finds that three variables out of six might be viewed as being WE for β.6
Subsequently, WE tests are applied to groups of variables. In particular, for the
case where e12 and i1 are tested, θ is a 4 × 2 matrix of parameters and H4α is
a 6 × 4 selection matrix:

1 0 0 0 11 12 
   
0 1 0 0 21 22 
0 0 1 0 31 32 
 = H 4  =  ,  =  
0 0 0 0  0 0 
0 0 0 0  0 0 
   
0 0 0 1  61 62 

and the WE variables are associated with the 4th and 5th rows of H4 and 
respectively. The test is not significant as 2(4) = 4.04 does not exceed the
critical value.
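The construction of the selection matrix can be sketched numerically. The block below is only an illustration: the helper name `selection_matrix`, the variable ordering (p0, p1, p2, e12, i1, i2) and the randomly drawn free parameters are assumptions made for the example and are not taken from Hunter (1992).

```python
import numpy as np

# Variable ordering assumed from the text: p0, p1, p2, e12, i1, i2.
NAMES = ["p0", "p1", "p2", "e12", "i1", "i2"]

def selection_matrix(names, weakly_exogenous):
    """Keep one identity column for every variable that is NOT weakly exogenous."""
    keep = [k for k, name in enumerate(names) if name not in weakly_exogenous]
    return np.eye(len(names))[:, keep]

H = selection_matrix(NAMES, {"e12", "i1"})      # 6 x 4 selection matrix
rng = np.random.default_rng(0)
theta = rng.normal(size=(H.shape[1], 2))        # illustrative 4 x 2 free parameters (r = 2)
alpha = H @ theta                               # restricted 6 x 2 loading matrix

print(H.astype(int))
print(np.round(alpha, 3))
```

Whatever values the free parameters take, the 4th and 5th rows of the product are zero, which is exactly the weak exogeneity restriction on the two equations.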
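To test whether i1 and i2 are CE for the first cointegrating vector implies the
following restrictions:

$$
\begin{bmatrix} \alpha_{1,1} \\ \alpha_{2,1} \end{bmatrix}
=
\begin{bmatrix} I_4 \\ 0 \;\; 0 \;\; 0 \;\; 0 \\ 0 \;\; 0 \;\; 0 \;\; 0 \end{bmatrix}\theta_1,
\qquad
\begin{bmatrix} \beta_{1,2} \\ \beta_{2,2} \end{bmatrix}
=
\begin{bmatrix}
0 & 0\\
0 & 0\\
0 & 0\\
0 & 0\\
1 & 0\\
0 & 1
\end{bmatrix}\varphi_2.{}^{7}
$$

The restrictions are accepted as χ²(6) = 7.82 is less than the critical value at the
5% level.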
Here emphasis is placed on long-run non-causality; the short-run concept
relates to a combination of the restrictions associated with CE, that β1,2 = 0,
and those on the short-run dynamics. Mosconi and Giannini (1992) apply the test
of non-causality in a short-run sense, while here the emphasis is solely on the
long run. Non-causality in the long-run relations associated with the variables
in the cointegrating equations implies a recursive structure to β, whereas
cointegrating exogeneity also implies that the equations associated with the CE
variables do not include the cointegrating vectors associated with the non-CE
variables. Cointegrating exogeneity implies that long-run forecasts can be made
conditional on the CE variables.
In practice, all the restrictions applied above can be undertaken using the
general restrictions approach dealt with in Hendry and Doornik (2001):

$$H_g : \alpha = \alpha(\theta) \cap \beta = \beta(\theta)$$

where α(θ) and β(θ) take a general form, which permits non-linear restrictions
of the form θ1θ4 + θ2θ3 = 0. Furthermore, the restrictions can apply both within
and across equations. Such restrictions can be imposed on the parameters of
the following matrices:

11 12  11 0 


   
21 22   1 0 
31 32  −1 0 
( ) =   , ( ) =  .
 0 0 −1 − 1 
 0 0 3 4 
   
1 2  −3 − 4 

The restriction editor in PCFIML permits the imposition of a wide range of
non-linear cross-equation restrictions (e.g., θ1θ2 = 0). The exact basis of the
likelihood comparison is considered in more detail in Appendix F. Based on the
above procedure, tests of WE, LE, SE and CE are applied to the exchange rate
system estimated by Hunter and Simpson (1995) and presented in the previous
chapter. The I(0) variables are defined as p1 – p2, ∆p1, e12, i1, i2 based on the
same data set considered before, though estimated over the sample
1973q3–1991q4. Hunter and Simpson (1995) consider the reordering of the
system by the degree of exogeneity or causal nature of variables and the
relationship of this ordering to identification and identifiability.
The test results and associated restrictions are presented in Table 5.1. Firstly,
whether variables influence the long run will be considered, that is, a sub-set of
variables might be either SE or WE for β or LE8 from β. It can be seen from
the results in Table 5.1 that the test of long-run exclusion is rejected for all
five variables in the VAR. By contrast, strict exogeneity for β is accepted at
the 1% level for i2 (the Eurodollar rate), and WE of i2 for β is accepted at both
the 1% and the 5% level. If i2 is weakly exogenous for β, then the short-run
equation does not include any of the cointegrating relationships. The
Eurodollar equation is a difference equation, which means that the behaviour
of the Eurodollar rate is predominantly a random walk. Hunter and Simpson
(1995) suggest that uncovering exogenous variables through tests of WE, LE
and SE for β does not identify, because the restrictions are either common
across and/or to all the long-run equations. Hence, the restrictions do not
identify, though it will be observed

Table 5.1 Tests of weak and strict exogeneity, long-run exclusion and cointegrating
exogeneity

Hypothesis            Variable     Null                           Statistic (95% critical value)†

Cointegration, r = 4               r ≤ 3                          –nΣln(1 – λi) = 16.4* (15.4)

(WE)|r = 4            ∆p1          α1i = 0, for i = 1,…,4         χ²(4) = 32.54** (9.49)
                      e12          α2i = 0, for i = 1,…,4         χ²(4) = 17.62**
                      p1 – p2      α3i = 0, for i = 1,…,4         χ²(4) = 16.71**
                      i1           α4i = 0, for i = 1,…,4         χ²(4) = 10.43*
                      i2           α5i = 0, for i = 1,…,4         χ²(4) = 8.18

(LE)|r = 4            ∆p1          βj1 = 0, for j = 1,…,4         χ²(4) = 39.94** (9.49)
                      e12          βj2 = 0, for j = 1,…,4         χ²(4) = 20.31**
                      p1 – p2      βj3 = 0, for j = 1,…,4         χ²(4) = 29.97**
                      i1           βj4 = 0, for j = 1,…,4         χ²(4) = 25.73**
                      i2           βj5 = 0, for j = 1,…,4         χ²(4) = 9.52*

(SE)|r = 4            ∆p1          α1i = βj1 = 0, i,j = 1,…,4     χ²(8) = 43.66** (15.51)
                      e12          α2i = βj2 = 0, i,j = 1,…,4     χ²(8) = 31.34**
                      p1 – p2      α3i = βj3 = 0, i,j = 1,…,4     χ²(8) = 38.37**
                      i1           α4i = βj4 = 0, i,j = 1,…,4     χ²(8) = 29.88*
                      i2           α5i = βj5 = 0, i,j = 1,…,4     χ²(8) = 18.56*

(CE)|r = 4            e12 and i1   α23 = α43 = 0,                 χ²(5) = 2.50 (11.07)
                      for β.3      β31 = β32 = β34 = 0

Note: †Cointegrating exogeneity (CE), strict exogeneity (SE), weak exogeneity (WE) and long-run
exclusion (LE). (* significant at the 5% level and ** significant at the 1% level.)

subsequently that uncovering exogenous variables or excluding variables from
all cointegrating vectors can aid the identification process.
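The reading of the weak exogeneity block of Table 5.1 can be reproduced with a few lines of code. This is a minimal sketch: the statistics and the 5% critical value are transcribed from the table, while the dictionary keys and the helper `not_rejected` are hypothetical names introduced for the example.

```python
# Weak exogeneity statistics transcribed from Table 5.1 (chi-square, 4 d.f.).
we_stats = {"dp1": 32.54, "e12": 17.62, "p1-p2": 16.71, "i1": 10.43, "i2": 8.18}
CRIT_5PCT = 9.49  # 95% critical value quoted in the table

def not_rejected(stats, critical_value):
    """Return the variables whose statistic does not exceed the critical value."""
    return [name for name, stat in stats.items() if stat <= critical_value]

weakly_exogenous = not_rejected(we_stats, CRIT_5PCT)
print(weakly_exogenous)
```

Only the Eurodollar rate survives the comparison, which matches the conclusion drawn in the text that i2 is WE at the 5% level.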
As part of their approach to identification Hunter and Simpson also test for
CE in their revised exchange rate system. The test for CE implies a sub-block
of β for which condition (ii) holds (α2,1 = 0). Evidence for CE may be drawn
from the insignificance of parameters in α. It was decided on the basis of
t-tests by Hunter and Simpson (1995) that the treasury bill rate and the
exchange rate are CE for a cointegrating vector which can be restricted to
satisfy PPP. It can be observed from Table 5.1 that to test this proposition, six
restrictions are applied to α and β, but of these only five are binding. The test
is not significant as χ²(5) = 2.5, so the restrictions for CE can be accepted for
both the exchange rate and the UK treasury bill rate (i1), for the cointegrating
vector that satisfies PPP. That the terms of trade are caused by the exchange
rate might be viewed as counter-intuitive, but it is quite consistent with the
type of sticky price monetary model of the exchange rate recently considered
by Charles Engel (2001). Turning to the cointegrating vector for which i1 is
CE, because this variable is also set to zero in the long-run relationship,
long-run non-causality takes a trivial form.

Bauwens and Hunter (2000) discussed identification in association with
conditions for exogeneity, and by applying tests of WE for a sub-block of
variables they showed that the model estimated by Hunter (1992) can be
identified from restrictions on α alone. At this point, it is appropriate
to consider the tests of weak exogeneity and strong exogeneity presented in
the article by Bauwens and Hunter (2000).
Weak exogeneity for a sub-block of cointegrating vectors (β.1) implies that
α2,1 = 0 and α1,2 = Ω1,2 (Ω2,2)–1 α2,2. The exchange rate and interest rates are WE
for a long-run augmented PPP equation when: firstly, the loadings for that
cointegrating vector are zero in both the exchange rate and interest rate
equations (α2,1 = 0) and secondly, the three coefficients in α associated with the
price equations are in proportion to the coefficients in α2,2.9 The test is
insignificant as χ²(4) = 2.5132 does not exceed the criterion at the 5% level.
Bauwens and Hunter proceed to test for long-run strong exogeneity, which
combines WE with long-run non-causality. For the model in Hunter (1992), both
interest rates (i1, i2) and the exchange rate (e12) are strongly exogenous in the
long run for the interest rate augmented PPP vector, because (i1, i2, e12) satisfy
the WE restrictions, and the interest rate vector is not long-run caused by the
real oil price (p0) and goods prices (p1, p2).10 The restrictions that are being
applied are a combination of those required for WE (α2,1 = 0 and α1,2 = Ω1,2
(Ω2,2)–1 α2,2) and CE (α2,1 = 0 and β1,2 = 0).
As has been observed above, it is possible to undertake direct parametric
tests of WE, LE and SE for β and WE, CE and strong exogeneity for a sub-block
of β. Otherwise, the observation that the long-run parameters are invariant to
a sub-set of variables combined with WE implies that such variables are super
exogenous, but no direct restrictions apply to the parameters in the long run.
For further discussion of super exogeneity, see Ericsson and Irons (1994).

5.2 Identification

Parametric econometric identification is the capacity to appropriately detect
model parameters from empirical observations. One can further discriminate
between a conceptual capacity to determine parameters algebraically and an
observational capability of the data to permit such a distinction. The former is
termed generic identification by Johansen and Juselius (1994). Generic
identification is concerned with the specification of conditions that permit
parameters to be solved, discriminated or detected from an unrestricted
system or estimable reduced form, and consequently such conditions may be
defined prior to any analysis based on the innate structure of the model. The
latter is empirical identification: although it might be possible, based on
some restrictions, to identify some parameters, the restrictions selected might
not be empirically acceptable and, as a result, the model will not be identified
in practice. A further issue which limits our ability to identify is the notion of
observational equivalence. Appropriate restrictions might be found and
generic identification satisfied, and the restrictions applied might be accepted,
but it may not be possible to discriminate between one class of model and
another model drawn from a different set of theoretical principles.
For linear models identification is usually straightforward, depending on
simple order conditions and a rank restriction (Goldberger 1964). When one
considers further degrees of non-linearity, then it becomes more difficult to
prove generic identification and the process becomes more empirical in
nature. Although certain advances have been made, the notion of observa-
tional equivalence is often all that is available to discriminate between
identified and non-identified models (Rothenberg 1971). Rothenberg (1971)
makes a further distinction between local and global identification. Local
identification is described as the ability to discriminate between models with
observationally distinct parameterizations within a neighbourhood of the
optimum. Consequently, identification, by its very nature, becomes more
empirical and any conclusions drawn are reliant on the parameterization of
the problem. Generic identification often stems from the rank of the informa-
tion matrix, which is a necessary criterion for safe optimization, though in
practice highly ill conditioned problems may yield locally well-defined para-
meter estimates. The empirical and generic notions become intimately related.
The ability to estimate some ‘structure’ consistently yields the possibility of a
sub-category of models, which may be observationally equivalent. Usually,
the minimum parametric form is a reduced form and from this more specific
structural models can be identified.
It is a combination of such necessary and sufficient conditions that will be
the main concern of the following sections of the chapter, in combination
with the question of observational equivalence. These results are then applied
to the identification of long-run relationships. In the above sense, generic
identification depends on sufficient conditions derived from Rothenberg
(1971) combined with an order condition necessary for identification.
Identification and identifiability are viewed as being non-linear in nature,
which implies that this treatment is both different and more general than that
of Johansen (1995a) and Boswijk (1996). The treatment also permits the ready
combination of restrictions on all the parameters associated with the long-run
behaviour of the model.
Some of the conditions considered here stem from the article by Hunter
(1998) where the question of non-identification is addressed. Sargan (1983a)
emphasized what he defines as conditions for higher order identification,
the very existence of which may depend on higher-order moments. In
this context consistency and non-identifiability are not equivalent when
identification depends on distributional assumptions. This renders the usual
condition on the Hessian or information criterion (Doornik 1995; Doornik
and Hendry 1996) a necessary, but not sufficient, condition. Questions of
distribution automatically open the door to a Bayesian treatment of the
problem (see Bauwens, Lubrano and Richard 2000, for a discussion of this
issue).
Firstly, the preliminaries of identification, identifiability and observational
equivalence are discussed, and then their relation to cointegration is consid-
ered. Next the results of Johansen, Boswijk and Hunter are placed in context
and discussed in relation to some simple cases.

5.2.1 I(0) systems and some preliminaries


For a generic notion of identification or identifiability, it is important to con-
sider the issue of observational equivalence. It is this idea which forms the
basis for most conditions and definitions of identification, even though the
final condition may be far removed from this. It is this definition, which is
most general in nature, though often less easy to consider in practice. If one
considers the simultaneous equation model (SEM), it is common in the
identification literature to take as point of departure a matrix of reduced form
parameters P. Consider the following structural form for a linear SEM (see
Goldberger 1964):

$$B y_t + \Gamma z_t = u_t \quad \text{and} \quad u_t \sim D(0, \Sigma) \tag{5.3}$$

where B is an n1 × n1 matrix of endogenous variable parameters, yt an n1 vector
of endogenous variables, Γ an n1 × n2 matrix of exogenous variable parameters,
zt an n2 vector of predetermined variables, ut an n1 vector of structural errors
and Σ an n1 × n1 variance–covariance matrix. It is well known that the
relationship between the reduced form parameters (P) and the structural form
parameters is:

$$P = -B^{-1}\Gamma. \tag{5.4}$$
It is common to redefine (5.3) above, thus:

$$A x_t = u_t$$

where A = [B : Γ] and x′t = [y′t : z′t]. Identification usually follows from the
acceptance of a number of linear restrictions of the form:

$$R_i a_i = 0 \quad \text{for } i = 1, \ldots, n_1$$

where ai is a column vector composed of the ith column from A and Ri is a
selection matrix that determines the variables to be restricted in the ith
equation. This leads to the classical result on identification attributed to
Koopmans (1953), which implies satisfaction of a rank condition. This gives rise
to the following theorem:

Theorem 4 A necessary and sufficient condition that parameters [A : Σ] are
identified is rank(Ri) = n1 – 1 for i = 1, …, n1.

As it stands, no account is taken of the impact of cross-equation restrictions. It
follows from simple algebraic manipulation of (5.4) that:

$$BP + \Gamma = 0$$

or

$$[B \;\; \Gamma] \begin{bmatrix} P \\ I \end{bmatrix} = A\Phi = 0. \tag{5.5}$$

When (5.5) is transposed and a single row from A is considered then the rank
condition becomes:

$$\operatorname{rank}([\Phi' \;\; R_i]) = n_1 - 1.$$
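The link between the structural and reduced forms in (5.4) and (5.5) can be verified numerically. The following is a small sketch with made-up coefficient values for a system with two endogenous and two predetermined variables; the numbers are purely illustrative and do not come from any model in the text.

```python
import numpy as np

# Hypothetical structural coefficients (n1 = 2 endogenous, n2 = 2 predetermined).
B = np.array([[1.0, -0.5],
              [-0.3, 1.0]])
Gamma = np.array([[-1.0, 0.0],
                  [0.0, -2.0]])

P = -np.linalg.solve(B, Gamma)           # reduced form, P = -B^{-1} Gamma  (5.4)
A = np.hstack([B, Gamma])                # A = [B : Gamma]
stacked = np.vstack([P, np.eye(2)])      # the matrix [P; I] appearing in (5.5)
residual = A @ stacked                   # zero matrix if (5.5) holds

print(np.round(P, 4))
print(np.round(residual, 12))
```

Using `np.linalg.solve` rather than forming the inverse of B explicitly is a numerical convenience only; the algebra is the same as in (5.4).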

However, applying the above condition to identify generically is somewhat
complicated as it requires the matrix P associated with the restricted model for
each case; otherwise a rank test similar to that associated with cointegration is
needed. However, there is an order condition that can be used to select the
appropriate number of restrictions, where ji denotes the number of restrictions
imposed on the ith equation:

$$j_i + n_2 \ge n_1 + n_2 - 1$$

or

$$j_i \ge n_1 - 1.{}^{11}$$
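A check of the order condition, together with a rank check, might be sketched as follows. The rank check used here is the familiar textbook version for exclusion (zero) restrictions, namely that the coefficients which the other equations attach to the variables excluded from equation i must have rank n1 – 1; that form, the function name `order_and_rank` and both coefficient matrices are assumptions made for illustration.

```python
import numpy as np

def order_and_rank(A, i):
    """Check the order and rank conditions for equation i under exclusion restrictions."""
    n1 = A.shape[0]                               # number of equations
    excluded = np.flatnonzero(A[i] == 0)          # variables restricted out of equation i
    order_ok = len(excluded) >= n1 - 1
    # Coefficients that the other equations place on the excluded variables.
    others = np.delete(A, i, axis=0)[:, excluded]
    rank_ok = excluded.size > 0 and np.linalg.matrix_rank(others) == n1 - 1
    return bool(order_ok), bool(rank_ok)

# Rows are equations; columns are (y1, y2, z1, z2). Values are hypothetical.
A_good = np.array([[1.0, -0.5, -1.0, 0.0],
                   [-0.3, 1.0, 0.0, -2.0]])
# Both equations exclude the same variable z2.
A_bad = np.array([[1.0, -0.5, -1.0, 0.0],
                  [-0.3, 1.0, -2.0, 0.0]])

print(order_and_rank(A_good, 0), order_and_rank(A_bad, 0))
```

In the second system the order condition is met but the rank condition fails, because the same exclusion is applied to every equation and so carries no identifying information.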

Furthermore, some of the types of restriction that violate the rank condition
are well known. Identification is lost when two equations use the same
restriction, as they are then observationally equivalent, and the same
restriction applied to all equations simply reduces the number of operational
variables in the model and thus a restriction is lost. However, the type of
restriction discussed above is linear in nature and often restrictions might well
be non-linear (e.g., the case of CE discussed above requires non-linear
estimation). Prior to any discussion of cointegration we consider non-linear
identification, based on the results in Rothenberg (1971) and Sargan (1988).
The following theorem follows from Sargan (1975):

Theorem 5 If there exists a regular point θ0 ∈ Θ, where Θ is some well-defined
parameter space, then θ0 is locally unidentifiable when there exists a θ1 ∈ Θ
such that:

$$L(\theta_0 \mid X_t) = L(\theta_1 \mid X_t)$$

where L is the likelihood function and Xt is the observed sample.

Proof: Sargan (1975).



Consider the following reduced form system

$$y_t = P z_t + \varepsilon_t$$

where yt and zt are defined above, εt is an n1 vector of reduced form errors
and P is an n1 × n2 matrix of reduced form parameters. A consistent estimator
of P is:

$$\hat{P} = Y'Z(Z'Z)^{-1}$$

where Y and Z are matrices composed of T stacked observations of the vectors
y′t and z′t.
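The estimator can be illustrated on simulated data. In this sketch the true coefficient matrix, the sample size and the error scale are all invented for the example, and the comparison with `np.linalg.lstsq` is included only to show that the formula coincides with equation-by-equation least squares.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n1, n2 = 200, 2, 3

# Hypothetical reduced form coefficients (n1 x n2).
P_true = np.array([[0.5, -1.0, 0.2],
                   [1.5, 0.3, -0.7]])

Z = rng.normal(size=(T, n2))                    # T stacked observations of z_t'
E = 0.1 * rng.normal(size=(T, n1))              # reduced form errors
Y = Z @ P_true.T + E                            # T stacked observations of y_t'

P_hat = Y.T @ Z @ np.linalg.inv(Z.T @ Z)        # P_hat = Y'Z (Z'Z)^{-1}
P_lstsq = np.linalg.lstsq(Z, Y, rcond=None)[0].T

print(np.round(P_hat, 3))
```

With a small error variance the estimate lies close to the coefficients used to generate the data, and the two computations agree up to rounding.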
Now take a non-linear function which maps vec(P) = p (an n1n2 vector) onto
θ (a vector of q structural parameters).

$$p = g(\theta) \tag{5.6}$$

To preclude a trivial over-parameterization, it is assumed that q ≤ n1n2. A
necessary condition for identification, given by Rothenberg (1971), is that the
Hessian matrix is non-singular.12 An alternative approach relates to the
Jacobian of the transformation and this is described as first-order
identification by Sargan (1983):

$$\operatorname{rank}\left(\frac{\partial g}{\partial \theta'}\right) < q. \tag{5.7}$$

Again, the above condition is necessary for local identification and failure
leads to a model that satisfies the full rank condition that is generally viewed
as being unidentified. However, for non-linear models, where ∂g/∂θ′ has full
rank, it may still be possible to obtain solutions to (5.6), because the
conditions for singularity or near singularity are less burdensome than those
required to solve (5.6). This gives rise to the following theorem that derives
from Sargan (1983a):

Theorem 6 Given p a vector of reduced form coefficients, then a sufficient condition
for the identification of a vector of structural parameters θ is the existence of a
unique solution θ = θ* to the vector function p* = g(θ*).

Proof: If g(.) is continuously differentiable within a neighbourhood of θ* and
rank(∂g/∂θ′) < q, then θ = θ* is a solution to (5.6) and θ* is identified.
However, when rank(∂g/∂θ′) ≈ q and θ = θ* is a solution to (5.6), then θ* is
still identified. ■

By simulation Sargan (1983b) shows that there may be near singular models
that cannot be distinguished from singular models, but satisfy (5.6) and are
thus identified. The convergence in distribution of estimators derived from
such near singular cases turns out to be much slower than usual. Because of a
larger than usual asymptotic variance they tend to be classified empirically as
unidentified.
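The rank of the Jacobian of the mapping from structural to reduced form parameters can be inspected numerically. The sketch below uses central finite differences; the two mappings `g_identified` and `g_unidentified` are toy functions invented for the illustration (with q = 2 and three reduced form coefficients) and are not models discussed in the text.

```python
import numpy as np

def numerical_jacobian(g, theta, h=1e-6):
    """Central finite-difference approximation to dg/dtheta' at theta."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        cols.append((g(theta + step) - g(theta - step)) / (2 * h))
    return np.column_stack(cols)

def g_identified(theta):
    a, b = theta
    return np.array([a, a * b, b])          # each parameter enters separately

def g_unidentified(theta):
    s = theta[0] + theta[1]
    return np.array([s, 2 * s, s ** 2])     # only the sum of the parameters matters

theta0 = np.array([0.5, -1.2])
rank_ok = np.linalg.matrix_rank(numerical_jacobian(g_identified, theta0), tol=1e-6)
rank_bad = np.linalg.matrix_rank(numerical_jacobian(g_unidentified, theta0), tol=1e-6)
print(rank_ok, rank_bad)
```

The explicit tolerance matters: as the discussion of near singular models suggests, a Jacobian that is numerically close to rank deficient can be hard to distinguish from one that is exactly so, and the classification depends on the threshold chosen.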

An order condition which is necessary for identification obviates the
problem of over-parameterization, while conditions for the existence of a
solution to (5.6) are sufficient for global identification for a broad range of
non-linear models. For example, Hunter (1992) considers such conditions for
rational expectations models.

5.2.1.1 The cointegration case


Equation (5.3) might equally well define a structural cointegration model
when yt = ∆xt and zt = xt–1. Taking a range of specifications of ut, (5.3) can
represent any order of VAR or VMA process. In the cointegration case the matrix
P defines the matrix of long-run coefficients usually termed Π. The latter,
assumed to have rank equal to r (the number of cointegrating relations), is
decomposed thus:

$$\Pi = \alpha\beta'$$

where α and β are n × r matrices of rank r. Cointegration takes as a starting
point the identification of Π. The error correction form under the usual
assumptions produces r dependencies between n variables. In theory rank(Π) = r,
but excepting exact dependencies, all n² elements are commonly estimated as
compared to the 2nr elements in α and β, which need to be identified. After
normalization, there are only (n – 1)r unrestricted elements in β. Under
cointegration, a comparison between the number of parameters in Π, equal to
n² – (n – r)², and those in α and β, equal to nr + (n – 1)r, gives rise to the
order condition that at least r² – r elements of α and β must be restricted
(when r = 1, there is no need for restriction beyond the normalization). For
identification of the long-run parameters, we require j = r² – r restrictions to
reduce the number of redundant parameters in α and β. This defines a
multi-equation version of the usual systems order condition:13

$$r \ge r^2 - j \quad \text{or} \quad j \ge r^2 - r \tag{5.8}$$
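The parameter count behind the order condition is simple arithmetic and can be tabulated directly. The function name below is a hypothetical helper; the two formulas are those quoted in the paragraph above.

```python
def long_run_parameter_count(n, r):
    """Compare free parameters in Pi with those in (alpha, normalized beta)."""
    in_pi = n ** 2 - (n - r) ** 2            # free elements of a rank-r n x n matrix
    in_alpha_beta = n * r + (n - 1) * r      # alpha plus beta after normalization
    required = in_alpha_beta - in_pi         # restrictions needed: r^2 - r
    return in_pi, in_alpha_beta, required

# Six-variable system with r = 2, and a five-variable system with r = 4.
print(long_run_parameter_count(6, 2))
print(long_run_parameter_count(5, 4))
```

The third entry equals r² – r whatever the value of n, which is why the order condition depends only on the cointegrating rank.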

Without a true knowledge of structure, the reduced rank condition on Π
reduces the set of alternative long-run models. However, the order condition
is only necessary for identification. Two issues arise: (a) certain types of
restriction do not identify; (b) alternative structural models may be identified.
The former is well known within the conventional identification literature
(Goldberger 1964; Sargan 1988), specifically, the same restriction applied to
each equation in turn or the same restrictions applied to two or more
equations. Observational equivalence implies the existence of two models with
different structure, which are statistically indistinguishable. In the context of
cointegration, Johansen (1995a) shows that there is a non-null set of models,
which are observationally equivalent depending on the nature of the
restrictions imposed.14

There is a presumption in the identification literature related to
cointegration that restrictions on α are a priori non-identifying. This would
seem to be the sentiment in Johansen’s work, while Pesaran et al. (2000) is
stronger in condemnation of those who impose such restrictions. In the
Bayesian literature there is a suggestion that this overly complicates the
estimation process (Bauwens, Lubrano and Richard 2000). It must be stated that
(5.3), from which (5.8) is defined, has no such limitation on the imposition of
restrictions. However, P, as distinct from Π, presumes the existence of a
partition of x into endogenous and exogenous variables (Ericsson and Irons
(1994) and Section 5.1).
Prior to any discussion of the merits of alternative approaches to
identification there must be some discussion of these issues. Firstly, there are
some cases where there may be no natural restrictions on β, whereas there
may be strong views over causality (Parker 1998). Secondly, both cause and
restriction might be relevant to a specific theory which implies that
identification depends on joint tests on both α and β (for example, tests of the
monetary approach to the balance of payments imply PPP and causality from
price to the exchange rate). Thirdly, exogeneity might be viewed as a
precursor to any analysis, because weak exogeneity of some zs for β may be a
sufficient condition for identification (Hunter 1998). Fourthly, both
non-linearity and non-normality might imply a significant role for prior
information in the identification process (Bauwens, Lubrano and Richard 2000).

5.2.2 A simple indirect procedure for generic identification


In this section, identification is handled by a procedure that is widely used in
the literature for I(0) econometrics.15 First, a sufficient number of restrictions
needs to be selected and this follows from the order condition derived above
(i.e., ji = r – 1 restrictions per long-run equation). Secondly, for generic
identification a solution needs to be found for the parameters of the structural
form from some reduced form. Thirdly, empirical identification is checked by
testing any over-identifying restrictions.
Rothenberg (1971) suggests that global conditions for identification depend
on the relationship between the structural form (SF) and the reduced form
(RF) parameters. It is well known that the relationship between the reduced
form parameters (P) and the structural form parameters for the linear case is:

$$P = -B^{-1}\Gamma. \tag{5.9}$$

If P is unrestricted, then identification of P follows from our ability to estimate
the long-run parameters.
A multivariate generalization of the conventional condition for the
identification of a regression equation is required, that is, a first moment matrix
composed of some regressors has to have full rank. If P = Π is calculated using
the Johansen procedure, then the VAR(1) transformation yields well-defined
parameters for all the long-run equations in the system when rank(R′1R1) = n,
where R1 is an n × n matrix of regression residuals. Under cointegration, Π is
typically a reduced rank matrix and rank(R′1R1) = r. Hence, the Johansen rank
test for cointegration determines the extent to which the rows and columns of
Π may be uniquely defined. Identification of Π is only necessary for the
existence of the long-run parameters.
Under cointegration Π is a reduced rank matrix, which implies that n – r
rows and columns are dependent. Now, from the traditional treatment of
matrix algebra (Dhrymes 1984), a matrix of rank r has r independent rows and
columns. Partitioning Π such that

$$\Pi = \begin{bmatrix} \Pi_1 \\ \Pi_2 \end{bmatrix} = \begin{bmatrix} \alpha_1 \beta' \\ \alpha_2 \beta' \end{bmatrix}, \tag{5.10}$$

then Π1 is r × n dimensioned and α1 is an r × r dimensioned sub-matrix of α.
Subject to Π, α and β all of rank r (cointegrating rank), then rank(α1) = r ⇒ β is
identified. Here α1 with full rank is a sufficient condition for β to be
identified from Π as:

$$\beta' = (\alpha_1)^{-1} \Pi_1.$$

By a similar argument an analogous result exists for α, which may be
identified when a square sub-matrix of β exists such that:

$$\alpha' = (\beta_1)^{-1} \Pi_{.1}$$

where Π.1 is an r × n matrix composed of the first r columns of Π. A generic
proof of identification for the model estimated by Hunter and Simpson (1995)
follows from solving equations of a similar form to (5.9), except for the
cointegration case P = αβ′. The solution to the restricted parameters is shown to
exist for the model in Hunter and Simpson (1995) using the indirect result
derived in Appendix G.
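The recovery of the cointegrating vectors from the first r rows of the long-run matrix can be checked with a short numerical experiment. The loadings and cointegrating vectors below are random draws chosen only for illustration, and the sketch assumes that the r × r block of loadings is known (for instance through a normalization) and non-singular.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2

alpha = rng.normal(size=(n, r))          # illustrative loadings
beta = rng.normal(size=(n, r))           # illustrative cointegrating vectors
Pi = alpha @ beta.T                      # long-run matrix of rank r

alpha1 = alpha[:r, :]                    # r x r sub-matrix of the loadings
Pi1 = Pi[:r, :]                          # first r rows of Pi (r x n)

# beta' = alpha1^{-1} Pi1, computed by solving the linear system.
beta_recovered = np.linalg.solve(alpha1, Pi1).T

print(np.linalg.matrix_rank(Pi))
print(np.allclose(beta_recovered, beta))
```

If the leading block of loadings were singular, the call to `np.linalg.solve` would fail or be numerically unstable, which mirrors the requirement in the text that the sub-matrix has full rank.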
Identification stems from the imposition of a number of additional
restrictions on α and β. In this case, identification was simplified by reordering
the system using the tests of weak exogeneity presented in Table 5.1.

5.2.3 Johansen identification conditions


Johansen, in a series of papers with Juselius (1994) and (1995a), places
emphasis on β as the parameters over which structural hypotheses are defined.
However, cointegration does not differentiate between endogenous and
exogenous variables, which negates the original Cowles Foundation view that
identification stems from information on exogenous or pre-determined
variables. By concentrating on the long run with no exogenous variables,
conventional cointegration has no predetermined variables.16

The Johansen approach to identification considers a series of rank
conditions which yield a family of ordered tests of the form:

$$H_1 \subset H_2 \subset H_3 \cdots \subset H_i.$$

Identification follows from the acceptance of the sequence of tests. Let us look
at some linear restrictions of the form:

$$R_i A_i = 0 \quad \text{for } i = 1, \ldots, r.$$

As was observed in section 5.4, it is more common in the cointegration
literature to formulate the restrictions as:

$$A_i = H_i \varphi_i,$$

where Hi is a selection matrix composed of zeros and ones, and φi is the
corresponding vector of unrestricted parameters. As a result:

$$A = [H_1\varphi_1 \;\; H_2\varphi_2 \;\; \cdots \;\; H_r\varphi_r].$$

Any linear statistical model with a set of restrictions may be defined thus:

$$L = \{A_{g \times r}, \Sigma : R_i A_i = 0,\ i = 1, \ldots, r\}.$$

It is then of use to differentiate between models which are linear and
restricted as compared with those which are identified:

$$M = \{A_{g \times r}, \Sigma : R_i A_i = 0,\ \operatorname{rank}(R_i A) = r - 1,\ i = 1, \ldots, r\}.$$

Notice that the restrictions associated with M are now non-linear by virtue of
the rank restriction. When the set of all possible restrictions is considered,
then the class of just identified models is likely to be large, though it will
define a subset of the restricted models, so that M ⊂ L.

Theorem 7 If L contains an identified parameter value, that is M is not empty, then
M is an open dense subset of L.

Proof: see Johansen (1995b).

The above result implies that there is a non-null set of models that cannot be
distinguished on the basis of the likelihood and they define a family of
observationally equivalent models, which satisfy the rank condition and thus
correspond to a point in M with certainty. If these results are made particular
to the cointegration case, then the parameter point given by the restrictions Ri
for i = 1, …, r, with

$$A(\varphi_i) = [H_1\varphi_1 \;\; H_2\varphi_2 \;\; \cdots \;\; H_r\varphi_r],$$

defines a representation which is identified when:

$$\operatorname{rank}(R_i H_1\varphi_1 \;\; R_i H_2\varphi_2 \;\; \cdots \;\; R_i H_r\varphi_r) = r - 1 \quad \text{for } i = 1, \ldots, r.$$

Under cointegration a necessary condition for a specific set of linear
restrictions to be identifying is given by the following theorem.

Theorem 8 The linear statistical model L defined by the restrictions Ri for
i = 1, …, r is identifying if and only if for each i:

$$\operatorname{rank}(R_i H_{i_1} \;\; R_i H_{i_2} \;\; \cdots \;\; R_i H_{i_k}) \ge k$$

where k is set in turn to 1, 2, …, r – 1 for all sequences of indices
1 ≤ i1 ≤ i2 … ≤ ik ≤ r such that ij ≠ i.

Proof: Of necessity and sufficiency see Johansen (1995b).

Consider, for example, the model estimated by Hunter and Simpson (1995),
which has r = 4 cointegrating vectors, β = [H1φ1 H2φ2 H3φ3 H4φ4]; then
identification of the first cointegrating vector alone requires us to check:

$$
\begin{aligned}
&\operatorname{rank}(R_1' H_{i_1}) = 1, && \text{for } i_1 = 2, 3, 4\\
&\operatorname{rank}(R_1' H_2 \;\; R_1' H_{i_2}) = 2, && \text{for } i_2 = 3, 4\\
&\operatorname{rank}(R_1' H_2 \;\; R_1' H_3 \;\; R_1' H_4) = 3.
\end{aligned}
$$

In the case of the second cointegrating vector:

$$
\begin{aligned}
&\operatorname{rank}(R_2' H_{i_1}) = 1, && \text{for } i_1 = 1, 3, 4\\
&\operatorname{rank}(R_2' H_1 \;\; R_2' H_{i_2}) = 2, && \text{for } i_2 = 3, 4\\
&\operatorname{rank}(R_2' H_1 \;\; R_2' H_3 \;\; R_2' H_4) = 3.
\end{aligned}
$$

Similar types of rank condition need to be checked for each remaining
cointegrating vector.
Consider the simpler case estimated by Hunter (1992) and used before in
section 5.1.3. In this case n = 6, r = 2, β = [H1φ1 H2φ2] and, based on the order
condition, two restrictions are required to identify each cointegrating vector
without normalization. For this section PPP is applied as a parametric
restriction to the first vector [*, a, –a, –a, *, 0] in combination with a zero
restriction on the Eurodollar rate,17 while the second vector is restricted to
accept UIP, [0, 0, 0, 0, b, –b]. Hence, there are j1 = 2 restrictions in the first
vector, which without normalization is enough to just identify. And j2 = 5
means that the second vector ought to be over-identified before normalization.
Therefore:

1 0 0   0
   
 0 1 0   0
0 − 1 0   0
H1 =   and H 2 =  
0 − 1 0   0
0 0 0   1
   
0 0 1 −1
Exogeneity and Identification 147

  
1 0 0 0 0 0
 
0 1 1 0 0 0  0 1 0 0 0 0
 
R1′ = 0 1 0 1 0 0  and R2′ = 0 0 1 0 0 0.
 
0 0 0 0 1 0 0 0 0 1 0 0
0 0 0 0 1 1 

In this case both vectors are identified when k = r – 1 = 1 conditions are
satisfied, for a block of homogeneous restrictions of the form R′k βk = 0 or
R′k Hk = 0 for k = 1, 2. It follows that the Johansen approach to identification
checks each combination of conditions rank(R′i Hj) = 1 for i ≠ j. In the case of
the first vector, it follows that,

1 0 0
 
0 1 0
0 1 1 0 0 0  
 
 0 −1 0
R1′ H1 = 0 1 0 1 0 0   
0 −1 0
0 0 0 0 1 0 
0 0 0
 
0 0 1 
0 0 0 
 
0 0 0 
= 0 0 0.
 
the two matrices are orthogonal, while for identification:

 0
 
0
0 1 1 0 0 0   
  0

R1′ H 2 = 0 1 0 1 0 0   
0
0 0 0 0 1 0  
 1
 
−1
0 
 
= 0.
1 

According to the Johansen conditions, the first equation is just identified, whereas identification of the second vector follows from rank(R′2H1) ≥ r – 1:

$$R_2' H_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


Hence, the second vector is identified, because the matrix product above has 3
independent rows and columns.
As can be seen from the above derivations, the algebra becomes increasingly
burdensome with r. The conditions also relate to the specific definition of
generic identification described by Johansen (1995b) and the article does not
address the issue of empirical identification or the more general notion of
identification associated with observational equivalence.
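
The rank checks for the Hunter (1992) example can be reproduced numerically. What follows is a minimal sketch in Python with NumPy and is not part of the original derivation: the function name `johansen_rank_conditions` and all variable names are illustrative assumptions, while H1, H2, R1′ and R2′ are the matrices of the example above.

```python
from itertools import combinations

import numpy as np


def johansen_rank_conditions(H, R):
    """For each vector i, check rank(R_i'H_i1, ..., R_i'H_ik) >= k
    for k = 1, ..., r-1 and every index set that excludes i."""
    r = len(H)
    results = []
    for i in range(r):
        others = [j for j in range(r) if j != i]
        ok = True
        for k in range(1, r):
            for subset in combinations(others, k):
                block = np.hstack([R[i].T @ H[j] for j in subset])
                if np.linalg.matrix_rank(block) < k:
                    ok = False
        results.append(ok)
    return results


# Matrices of the Hunter (1992) example (n = 6, r = 2).
H1 = np.array([[1, 0, 0], [0, 1, 0], [0, -1, 0],
               [0, -1, 0], [0, 0, 0], [0, 0, 1]], dtype=float)
H2 = np.array([[0], [0], [0], [0], [1], [-1]], dtype=float)
# R1 and R2 are stored as n x j matrices, so R.T is the R' of the text.
R1 = np.array([[0, 1, 1, 0, 0, 0],
               [0, 1, 0, 1, 0, 0],
               [0, 0, 0, 0, 1, 0]], dtype=float).T
R2 = np.array([[1, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0],
               [0, 0, 0, 1, 0, 0],
               [0, 0, 0, 0, 1, 1]], dtype=float).T

identified = johansen_rank_conditions([H1, H2], [R1, R2])
rank_R1H2 = np.linalg.matrix_rank(R1.T @ H2)
rank_R2H1 = np.linalg.matrix_rank(R2.T @ H1)
print(identified, rank_R1H2, rank_R2H1)
```

With r = 2 only the k = 1 checks arise; for larger r the loop over index subsets grows quickly, which is the algebraic burden noted above.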

5.2.4 Boswijk conditions and observational equivalence


Boswijk (1996) emphasizes the restriction on β rather than those on α, though he argues that similar results will also hold for α. To solve this problem, Boswijk (1996) provides two further conditions for what he terms identifiability. According to Boswijk, β is non-identifiable when the normalization fails or some of the remaining parameters are not significant. Therefore:

$$H_{02}: \beta \in B_3 \cup B_4 = \{\beta : \operatorname{rank}(R_1^{*\prime}\beta) \leq r - 1\},$$

where R*1′ is the restriction matrix including the normalization and B3 ∪ B4 defines the null associated with non-identifiability. Consider the following example, developed from Boswijk (1996), n = 3, r = 2 and j = 2 = r² – r restrictions identify β:

$$\beta' = \begin{bmatrix} a & 0 & b \\ c & d & 0 \end{bmatrix} \quad \text{and} \quad H_2 = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix}.$$

Selecting the normalization, a = 1 and d = 1, it follows from Boswijk (1996) that the first vector in β′ is identifiable when the matrix H2 has full rank. To discriminate between failure of normalization and other types of failure, a further rank test is applied to an r – 1 dimensioned sub-matrix. Therefore:

$$H_{03}: \beta \in B_4 = \{\beta : \operatorname{rank}(R_1'\beta) \leq r - 2\}.$$

In the example rank failure occurs for H02 when a = 0 (normalization) and for the further restriction associated with H03: d = 0. However, from the acceptance of the Johansen test for cointegration (rank(αβ′) = r), β is identifiable as r linearly independent cointegrating vectors must exist and, given acceptance of the over-identifying restrictions, the first vector is identified when I(0) variables are precluded from the system. Using the approach of Boswijk, once the first vector is identifiable, rank conditions need to be tested for each of the other vectors in turn.
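
As a hedged illustration of the rank condition in this example (a worked step added here, using only the 2 × 2 matrix H2 with rows (a, 0) and (c, d) defined above):

$$\det H_2 = \begin{vmatrix} a & 0 \\ c & d \end{vmatrix} = ad,$$

so H2 has full rank r = 2 if and only if a ≠ 0 and d ≠ 0. Under the normalization a = 1 and d = 1 the determinant equals 1, and the rank falls below r as soon as either normalized coefficient is in fact zero.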
Based on the results presented in Hunter and Simpson (1995) and those above, some of the problems associated with incorrect normalization may be obviated by excluding normalizations associated with weakly exogenous and long-run excluded variables.

5.2.5 Hunter’s conditions for identification


It is conventional in the identification literature to consider the relationship between reduced form and structural form parameters when the problem is non-linear (Rothenberg 1971). Given the existence of a set of row and column vectors of appropriate dimension selected from Π, the question arises as to which such sub-blocks might be used to identify. In this light an orientation might be selected for the system which does not prejudge the possible normalization of the long-run parameters, but defines possible rows and columns from which α and β might be identified; it should be noted that there are n!/((n – r)! r!) possible alternative combinations of rows and columns from Π.
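
To make the count of candidate orientations concrete, the sketch below enumerates every choice of r rows for n = 6 and r = 2 and records which choices give a non-singular r × r block. It is only an illustration: the loading matrix `alpha` is a hypothetical block-structured example with made-up numbers, not an estimate from the text.

```python
from itertools import combinations
from math import comb

import numpy as np

n, r = 6, 2

# Hypothetical loadings with a block pattern (values are invented).
alpha = np.array([[0.5, 0.0],
                  [-0.3, 0.0],
                  [0.8, 0.0],
                  [0.0, 0.2],
                  [0.0, -0.7],
                  [0.0, 0.4]])

# All n! / ((n - r)! r!) ways of choosing r rows.
row_sets = list(combinations(range(n), r))

# Keep the row choices whose r x r sub-matrix has full rank.
full_rank = [rows for rows in row_sets
             if np.linalg.matrix_rank(alpha[list(rows), :]) == r]

print(len(row_sets), comb(n, r), len(full_rank))
```

In this made-up case only pairs that take one row from each block give a usable sub-matrix, which is why the choice of orientation matters.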
The following theorem provides sufficient conditions for the existence of a unique solution to a vector function relating the identifiable elements of Π, that is π = vec(Πr), where Πr = [πij ∈ Πi ∪ Π.j],18 to the unknown parameters in α and β, that may be stacked in θ.

Theorem 9 Given (5.8) is satisfied and knowledge of r (the cointegrating rank), a sufficient condition for a solution to the vector function π = g(θ) is the existence of two r × r dimensioned non-singular sub-matrices A and B, in α and β respectively.

Proof. Rank(α) = r is equivalent to the existence of a sub-matrix A such that rank(A) = r. There are n!/((n – r)! r!) possible alternative combinations of rows of α from which A might be formed. It follows that each A has a related sub-matrix Πi of Π such that rank(A) = r ⇔ rank(Πi) = r and Πi = Aβ′. Vectorizing Πi implies that vec(Πi) = vec(Aβ′) = (In ⊗ A)vec(β′) (see Dhrymes 1984, pp. 40–3 and chapter 4). Following the argument in Sargan (1983a, pp. 282–3), β is identifiable when A has full rank as firstly a unique solution results:

$$\operatorname{vec}(\beta') = (I_n \otimes A)^{-1}\operatorname{vec}(\Pi_i) \tag{5.11}$$

and secondly

$$\operatorname{rank}\left(\frac{\partial \operatorname{vec}(\beta')}{\partial \operatorname{vec}(\Pi_i)'}\right) = nr,$$

if the normalization is ignored. By similar argument, α is identifiable when there exist two matrices Π.j and B for which Π.j = αB′ and B is non-singular. As a result, a unique solution for α exists of the form:

$$\operatorname{vec}(\alpha) = (B \otimes I_n)^{-1}\operatorname{vec}(\Pi_{.j}) \tag{5.12}$$

In the cointegration case, the existence of one or more solutions to (5.11) and (5.12) is sufficient for the existence of a solution to π = g(θ), which is what is required for identification given (5.8). Finding such solutions negates the need to undertake the test in Johansen (1995b).
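
The two vectorized solutions in the proof can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: `alpha` and `beta` are hypothetical 6 × 2 matrices with invented values, `vec` stacks columns, and the chosen rows and columns are arbitrary selections that happen to give non-singular A and B.

```python
import numpy as np

n, r = 6, 2

# Hypothetical loadings and cointegrating vectors (invented values).
alpha = np.array([[1.0, 0.0], [2.0, 1.0], [0.5, -1.0],
                  [0.0, 2.0], [1.0, 1.0], [-1.0, 3.0]])
beta = np.array([[1.0, 2.0], [0.0, 1.0], [-1.0, 0.0],
                 [-1.0, 0.5], [0.3, 1.0], [2.0, -1.0]])
Pi = alpha @ beta.T                      # n x n matrix of rank r


def vec(M):
    """Stack the columns of M into a single vector."""
    return M.flatten(order="F")


# (5.11): r rows of alpha give A, and the same rows of Pi equal A beta'.
rows = [2, 5]
A = alpha[rows, :]
Pi_i = Pi[rows, :]
vec_beta_t = np.linalg.solve(np.kron(np.eye(n), A), vec(Pi_i))
beta_rec = vec_beta_t.reshape((r, n), order="F").T

# (5.12): r rows of beta give B, and the matching columns of Pi equal alpha B'.
cols = [3, 4]
B = beta[cols, :]
Pi_j = Pi[:, cols]
vec_alpha = np.linalg.solve(np.kron(B, np.eye(n)), vec(Pi_j))
alpha_rec = vec_alpha.reshape((n, r), order="F")

print(np.allclose(beta_rec, beta), np.allclose(alpha_rec, alpha))
```

Solving the linear systems directly avoids forming the Kronecker inverses explicitly; the recovered matrices coincide with the originals only because A and B are non-singular here.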
Linearity, or the need to consider α and β, does not present a problem for the condition in Theorem 9 that may be applied sequentially to α and β to yield a sufficient set of solutions. Empirical verification of the generic result follows from a direct test of the over-identifying restrictions:

(I) Hβ : ρ + Rβ vec(β) = 0
    Hα : Rα vec(α) = 0.

Now ρ is a jβ × 1 vector of known constants (normalizations), Rα and Rβ are jα × nr and jβ × nr matrices, which select all the jα and jβ restrictions on α and β respectively, and j = jα + jβ.19 The degrees of freedom of the test are calculated
from the number of solutions to (5.11) and (5.12). If (I) is rejected, then this is
sufficient for non-identification, while identification requires a different set of
restrictions. However, acceptance of (I) is only necessary for identification as
there may be a sequence of models, that accept either the over-identifying
restrictions or Johansen’s test (Johansen, 1995b).
Here, an alternative approach follows from the sufficient conditions for a
solution to (5.11) and (5.12) given in Theorem 9:

(II) Test identifiability: rank(B) = r and rank(A) = r.

The existence of a solution to (5.11) and (5.12) implies the system is generically identified. As Boswijk suggests, on empirical grounds identification may fail due to insignificance of certain parameters. Here, identifiability follows from the existence of sufficient information in certain rows and columns of Π to identify α and β (Sargan, 1983). Clearly, many such orientations related to particular over-identifying restrictions may exist. However, it is sufficient to find one such orientation of the system to empirically accept the generic solution. Consider the example used above where for comparison with Boswijk we
let B = H2. When rank(H2) = r,20 then the condition in Boswijk (1996) is
satisfied, but also the sufficient condition for the existence of a solution to
(5.12) (a matrix B of full rank). From Theorem 9, the rank condition identifies
 based on the restrictions in (I). Then conditional on (I), discovery of a
matrix (B) with full rank is sufficient for identification of .
If the variable chosen for normalization is invalid (a = 0 and rank(H2) < r), then failure of the rank condition yields an additional restriction on the set of cointegrating vectors (β′). Therefore α can be identified from a new orientation:

$$\beta' = \begin{bmatrix} 0 & 0 & b \\ c & d & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & b \\ d & 0 \end{bmatrix}.$$

The system is now over-identified as j = 3 > r² – r. From acceptance of the Johansen rank test, |B| = 0 can only occur when d = 0, but this contradicts the proposition that rank(αβ′) = 2. The structure of β′ based on d = 0 gives x1 and x3 as the cointegrating vectors, so two series in xt are I(0).21
Boswijk and Johansen emphasize a limited information approach associated with linear restrictions that can only be applied to α and β in turn. In this section, restrictions can be applied to both α and β, they can be non-linear and they apply to the system as a whole.
In the next section, the results are extended further to take account of
exogeneity.

5.3 Exogeneity and identification

Traditional econometric methodology assumes the existence of a set of exogenous variables, whereas the notion of cointegration and vector autoregressive (VAR) modelling negates this. Cointegration is multi-causal and the VAR treats all variables as endogenous, but within such a system it is feasible to test a number of notions of long-run exogeneity. The reader is directed to Ericsson and Irons (1994) and Ericsson et al. (1998). Now consider the impact of long-run exogeneity and identification on the system.
Let the system (5.2) be separated into two sub-models, corresponding to a partition of xt into yt and zt of dimensions n1 and n2, respectively, and conformable partitioning of α and β:22

$$\Delta y_t = (\alpha_{1,1}\beta_{1,1}' + \alpha_{1,2}\beta_{1,2}')y_{t-1} + (\alpha_{1,1}\beta_{2,1}' + \alpha_{1,2}\beta_{2,2}')z_{t-1} + \varepsilon_{1t} \tag{5.13}$$

$$\Delta z_t = (\alpha_{2,1}\beta_{1,1}' + \alpha_{2,2}\beta_{1,2}')y_{t-1} + (\alpha_{2,1}\beta_{2,1}' + \alpha_{2,2}\beta_{2,2}')z_{t-1} + \varepsilon_{2t}, \tag{5.14}$$

where (ε′1t ε′2t)′ ~ N(0, Σ) and independently over t = 1, …, T. It is well known that when [α2,1 : α2,2] = [0 : 0], then zt is weakly exogenous for β (Johansen 1992).
However, such restrictions do not directly assist in the identification of the long-run parameters as they apply to a part of α which is non-informative. In terms of the requirement to find a solution to (5.11) and (5.12), weak exogeneity is of direct use when there are n – r weakly exogenous variables, as the only basis for a choice of A is the matrix [α1,1 : α1,2], which is then by definition of rank r.
Otherwise, one might consider weak exogeneity associated with a sub-block of cointegrating vectors. To discuss issues of exogeneity it is useful to look at the conditional model for yt given zt (Johansen 1992):

$$\Delta y_t = [(\alpha_{1,1}\beta_{1,1}' + \alpha_{1,2}\beta_{1,2}') - \omega(\alpha_{2,1}\beta_{1,1}' + \alpha_{2,2}\beta_{1,2}')]y_{t-1} + \omega\Delta z_t + [(\alpha_{1,1}\beta_{2,1}' + \alpha_{1,2}\beta_{2,2}') - \omega(\alpha_{2,1}\beta_{2,1}' + \alpha_{2,2}\beta_{2,2}')]z_{t-1} + \varepsilon_{1t} - \omega\varepsilon_{2t} \tag{5.15}$$

where ω = Σ1,2Σ2,2⁻¹. One set of sufficient conditions for weak exogeneity of zt for β′.1 = [β′1,1 : β′2,1] is α1,2 – ωα2,2 = 0 and α2,1 = 0, see Lemma 2 in Ericsson et al. (1998). Combining (5.15) with (5.14) yields a system which, to a non-singular transformation matrix, is equivalent to the original VAR. If (α1,2 = 0, α2,1 = 0) is applied to (5.13) and (5.14), then the VAR has a quasi-diagonal long-run structure (Hunter, 1992). For weak exogeneity additional restrictions may apply as α1,2 – ωα2,2 = 0 is required. Should α1,2 = 0, then ωα2,2 = 0 is sufficient for weak exogeneity. This result can be associated with three possible requirements: (i) ω = 0; (ii) α2,2 = 0; or (iii) ω is a left-hand side annihilation matrix of α2,2. Under cointegration, (ii) does not apply as rank(α2,2) = r2. Case (i) is consistent with Lemma 2 in Ericsson et al. (1998). For case (iii), the quasi-diagonality restriction (α1,2 = 0, α2,1 = 0) combined with ωα2,2 = 0 is sufficient for weak exogeneity of zt for β.1.
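
The role of ω in the conditional model can be illustrated numerically. The sketch below is an assumption-laden example, not an estimate: the covariance matrix and the loading blocks are invented, with n1 = n2 = 2 and r1 = r2 = 1, and α1,2 is deliberately set equal to ωα2,2 so that the sufficient condition for weak exogeneity holds.

```python
import numpy as np

n1, n2, r1, r2 = 2, 2, 1, 1

# Hypothetical positive definite error covariance, built as L L'.
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.2, -0.3, 1.0, 0.0],
              [0.1, 0.4, 0.6, 1.0]])
Sigma = L @ L.T
Sigma12 = Sigma[:n1, n1:]
Sigma22 = Sigma[n1:, n1:]

# omega = Sigma_{1,2} Sigma_{2,2}^{-1}
omega = Sigma12 @ np.linalg.inv(Sigma22)

# Covariance between the conditional error (e1 - omega e2) and e2.
cond_cov = Sigma12 - omega @ Sigma22

# Invented loading blocks satisfying alpha_{2,1} = 0 and
# alpha_{1,2} - omega alpha_{2,2} = 0.
alpha11 = np.array([[0.4], [-0.2]])
alpha21 = np.zeros((n2, r1))
alpha22 = np.array([[0.3], [0.5]])
alpha12 = omega @ alpha22

alpha1 = np.hstack([alpha11, alpha12])   # loadings in the y equations
alpha2 = np.hstack([alpha21, alpha22])   # loadings in the z equations

# Loadings of the conditional model (5.15): alpha_1 - omega alpha_2.
alpha_cond = alpha1 - omega @ alpha2
print(np.round(alpha_cond, 10))
```

Under these assumed values the second block of cointegrating vectors drops out of the conditional model and the first block enters with loading α1,1, which is the sense in which zt is weakly exogenous for β.1.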
Weak exogeneity for a sub-block implies that analysis may be undertaken at the level of the sub-system. More specifically, identification conditions now apply at the level of the sub-system, as previously at the level of the full system. Let Π1 denote an n1 × n sub-matrix of Π for which rank(Π1) = r1 and n1 > r1 ≥ 1. If Π1(r1) defines an r1 × n sub-matrix of Π1 for which the maximum rank is given by its smallest dimension, then an equivalent column matrix exists which is n1 × r1 and has full column rank. Given the quasi-diagonality restriction, it follows that:

$$\Pi_1 = \alpha_{1,1}\beta_{.1}' \quad \text{and} \quad \Pi_{1(r_1)} = A_1\beta_{.1}', \tag{5.16}$$

where A1 is a square matrix of full rank r1 obtained from α1,1 (by selecting r1 rows). To identify α1,1 and β.1 subject to a standard normalization (i.e. r1 restrictions) the following sub-system order condition now applies:

$$r_1 n + r_1 n_1 - r_1 - j_1 \leq r_1 n + r_1 n_1 - r_1^2 \;\Leftrightarrow\; r_1^2 - r_1 \leq j_1,$$


where j1 is the number of restrictions associated with the sub-system. Now, r1 – 1 restrictions apply to each equation in the first sub-block as compared with r – 1 when the full system condition is used. Hence, r2 variables are viewed as exogenous to the sub-system.
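
As a hedged worked illustration of how the order condition relaxes (simple arithmetic added here, taking a system with r = 2 cointegrating vectors split into two sub-blocks with r1 = r2 = 1):

$$r^2 - r = 2^2 - 2 = 2 \quad \text{(full system)}, \qquad r_1^2 - r_1 = 1^2 - 1 = 0 \quad \text{(each sub-system)}.$$

With normalization, the full system therefore needs at least two further restrictions, whereas a sub-block containing a single cointegrating vector needs none beyond its normalization.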

Theorem 10 Given r1² – r1 = j1 and knowledge of the sub-system cointegrating rank (r1), a sufficient condition for the existence of a solution to the vector sub-system vec(β′.1) = (In ⊗ A1)⁻¹vec(Π1(r1)) is the existence of a matrix A1 of full rank r1 constructed by selection of r1 rows of α1,1.

Proof. By analogy with the proof of Theorem 9, vec(β.1), which follows from vectorizing (5.16), is identifiable when A1 has full rank. ■

A special case arises when r1 = 1 and, excepting the choice of normalization, no further restrictions are required to identify β.1.

Corollary 11 If r1 = 1, then subject to a normalization, weak exogeneity is sufficient for identification of the long-run parameters β.1 associated with the first sub-block.

If in addition, r2 = 1, then for a specific normalization weak exogeneity is all that is required for the identification of β when r1 + r2 = r. It follows from weak exogeneity that identification is a natural consequence of the partition. In more general sub-systems, the type of conditions derived in the previous section are relevant.
It can readily be shown that a similar result to Theorem 10 applies to any subsequent sub-system. Hence, vec(β.2) is identified when a sub-matrix A2 of α2 has full rank. There are now at least two sub-systems that can be separately estimated and identified based on the above conditions.23 However, the quasi-diagonal form of weak exogeneity implies that while y is dependent in the long run on z in the first sub-block, z is also dependent on y in the second block. The latter statement does not appear to be consistent with the idea that in the long run the notions of exogeneity and causality are coherent. To address the above concern, attention is focused on cointegrating exogeneity: the restrictions β1,2 = 0 combined with α2,1 = 0 imply that z is not long-run caused by y and as a result Π2,1 = 0. Restrictions associated with cointegrating exogeneity direct attention towards the identification of the long-run parameters in a sub-block. However, such restrictions only identify β to the sub-block as (β1,2 = 0) implies that the same restrictions are applied to all the rows of β.2. However, the order condition per sub-block is now less onerous (r2 – 1 restrictions). And when r2 = 1, then β2,2 is identified via a normalized coefficient. When compared with the impact of quasi-diagonalizing the system, cointegrating exogeneity applies only to the set of identified sub-system relationships. In terms of identifying that sub-block, the following relationship is of interest:

$$\Pi_{2,2} = \alpha_{2,2}\beta_{2,2}'.$$

If rank(Π2,2) = r2, then there is a sub-matrix Π2,(r2) of dimension r2 × n2, and a matrix of column vectors dimensioned n2 × r2, both of rank r2. Now the order condition for this sub-system is:

$$r_2 n + r_2 n_2 - r_2 - j_2 \leq r_2 n + r_2 n_2 - r_2^2 \;\Leftrightarrow\; r_2^2 - r_2 \leq j_2.$$

Even with all of the zero restrictions in the second block of cointegrating vectors, the number of relevant restrictions in the order condition for the sub-block remains unchanged at the level of the sub-block. Subject to an appropriate number of identifying restrictions, a sufficient condition for the existence of a solution to the system associated with Π2,2 is the existence of A2, an r2 × r2 sub-matrix of α2,2. By analogy with the result in Theorem 10, the following relationship exists for β2,2:

$$\operatorname{vec}(\beta_{2,2}') = (I_{n_2} \otimes A_2)^{-1}\operatorname{vec}(\Pi_{2,(r_2)}).$$

Further, when zt is also cointegrating exogenous, then the long-run behaviour of the sub-system for zt does not depend on the endogenous variables. If zt is both weakly exogenous for β.1 and zt is not long-run caused by yt, then zt is termed long-run strongly exogenous for β.1. Therefore, strong exogeneity combines the restrictions associated with weak exogeneity and the restrictions appropriate for cointegrating exogeneity.
In the next section, the identification and identifiability of a model involving weak, cointegrating and strongly exogenous variables is addressed.

5.4 Empirical examples

To motivate the analytic solution and empirical results discussed in the last section, the approach is applied to the data set analyzed by Johansen and Juselius (1992) and Hunter (1992a).24 The system of equations associated with Theorem 9 is observed to have a number of solutions, which directly relate to the correct degrees of freedom for the test of over-identifying restrictions. Emphasis is placed on a model that is identified via restrictions on α discussed in section 5.3, and both weak exogeneity and cointegrating exogeneity are tested.
From the discussion in section 5.2, whether it is possible to identify the parameters in the long run follows from the ability to solve for α and β from well-defined rows and columns of Π. According to Theorem 6, this depends on the existence of what might be called a valid orientation of the system. If Πi = Aβ′ and from the cointegrating rank test rank(αβ′) = r, then it follows from the conditions on the rank of sub-matrices, that rank(A) = r ⇒ rank(Πi) = r. Hence, determining an A matrix with full rank is equivalent to associating the solved system with well-defined parts of the matrix Π. The ability to identify the parameters empirically from the solution to the algebraic problem of the form (5.11) and (5.12) relies empirically on finding matrices A and B with full rank. Prior to undertaking such a test, a set of minimum restrictions will be defined and then tested.25 For generic identification of a system with r = 2 cointegrating vectors r² – r = 2 restrictions are required with normalization and r² without. To test the over-identifying restrictions and identifiability, the likelihood ratio test discussed in Johansen and Juselius (1992) and implemented in Doornik and Hendry (1998, 2001) is used. Using the results in Section 5.3, α and β can be identified via a normalization and the restrictions associated with quasi-diagonal α also discussed in section 5.1.2:

$$\alpha' = \begin{bmatrix} \alpha_{11} & \alpha_{21} & \alpha_{31} & 0 & 0 & 0 \\ 0 & 0 & 0 & \alpha_{42} & \alpha_{52} & \alpha_{62} \end{bmatrix}. \tag{5.17}$$

The only restrictions applied to β are those associated with the normalization (β41 = –1, β52 = 1), with the columns of β′ ordered as p0, p1, p2, e12, i1, i2:

$$\beta' = \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} & -1 & \beta_{51} & \beta_{61} \\ \beta_{12} & \beta_{22} & \beta_{32} & \beta_{42} & 1 & \beta_{62} \end{bmatrix}.$$
Table 5.2 Tests of exogeneity and identification conditional on r = 2

| Test | Null | Statistic [p-value] |
|---|---|---|
| (I) Quasi-diagonality | αi1 = 0 for i = 4, 5, 6; β41 = –1; αi2 = 0 for i = 1, 2, 3; β52 = 1 | χ²(4) = 3.9595 [0.4115] |
| (IIa) Non-identifiability | α31 = 0, α62 = 0 | χ²(2) = 30.0465 [0.0000] |
| (IIb) Non-identifiability | α51 = 0, α52 = 0 | χ²(2) = 4.42 [0.1097] |
| (IIc) Non-identifiability | β41β52 – β42β51 = 0 | χ²(1) = 3.9087 [0.0481] |
| (IIIa) Weak exogeneity | αi1 = 0 for i = 4, 5, 6; β41 = –1; αi2 = ωi1α42 + ωi2α52 + ωi3α62 for i = 1, 2, 3; β52 = 1 | χ²(4) = 2.5132 [0.6423] |
| (IIIb) Strong exogeneity (Weak + Cointegrating Exogeneity) | αi1 = 0 for i = 4, 5, 6; αi2 = 0 for i = 1, 2, 3; βi2 = 0 for i = 1, …, 4 | χ²(8) = 12.708 [0.1223] |

It can be seen from the p-value associated with test (I) in Table 5.2 that the long run is identified: (i) six restrictions are imposed (j = r² – r = 2) and (ii) the test of over-identifying restrictions is accepted at the 5% level.26
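
The p-values reported in Table 5.2 follow from the upper tail of the χ² distribution and can be recomputed as a cross-check. The sketch below uses only the Python standard library; the helper `chi2_sf` is an illustrative function written for this purpose (it handles one degree of freedom or an even number of degrees of freedom, which covers every row of the table), and the statistics are copied from the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variate.

    Closed forms are used, so only df = 1 or even df are supported.
    """
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch handles df = 1 or even df only")


# (statistic, degrees of freedom, reported p-value) from Table 5.2
table_5_2 = {
    "I": (3.9595, 4, 0.4115),
    "IIa": (30.0465, 2, 0.0000),
    "IIb": (4.42, 2, 0.1097),
    "IIc": (3.9087, 1, 0.0481),
    "IIIa": (2.5132, 4, 0.6423),
    "IIIb": (12.708, 8, 0.1223),
}

p_values = {name: chi2_sf(stat, df) for name, (stat, df, _) in table_5_2.items()}
for name, p in p_values.items():
    print(f"{name}: recomputed p = {p:.4f}, reported p = {table_5_2[name][2]:.4f}")
```

A statistical library would normally be used for this; the closed forms are chosen here only to keep the example self-contained.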
Now consider the orientation of the system or the selection of the appropriate r-dimensioned square matrices A and B. A valid choice for A is based on the 3rd and 6th rows from α. For a solution, it is required that:

$$\operatorname{vec}(\beta') = (I_6 \otimes A)^{-1}\operatorname{vec}(\Pi_3). \tag{5.18}$$

Hence, any matrix A needs to be of full rank. Following the acceptance of the quasi-diagonality restriction, the identifiability of β depends on the rejection of the condition |A| = 0.27 One possible orientation is:

$$A = \begin{bmatrix} \alpha_{31} & 0 \\ 0 & \alpha_{62} \end{bmatrix} \quad \text{and} \quad \Pi_3 = \begin{bmatrix} \pi_{31} & \pi_{32} & \pi_{33} & \pi_{34} & \pi_{35} & \pi_{36} \\ \pi_{61} & \pi_{62} & \pi_{63} & \pi_{64} & \pi_{65} & \pi_{66} \end{bmatrix}.$$
This test is applied under a null of non-identifiability of β (Table 5.2, IIa), the test is χ²(2) and the null is rejected at 5% and any other conventional level of significance. Should one consider the alternative orientation associated with the treasury bill rate (i1) and the exchange rate equations, then both were jointly accepted to be weakly exogenous by Hunter (1992a). To compare this orientation with that used above it is of interest to note that when the restrictions α51 = 0 and α52 = 0 are used to augment α31 = 0 and α32 = 0 (Table 5.2, IIb), then when compared with a χ²(2) statistic the null cannot be rejected at the 5% significance level. This implies that the fifth column does not yield an appropriate sub-matrix to orientate the system and by a similar argument the fourth column can also not be used.
A possible choice of B is based on the fourth and fifth columns of Π, so that

$$\operatorname{vec}(\alpha) = (B \otimes I_6)^{-1}\begin{bmatrix} \operatorname{vec}(\pi_{.4}') \\ \operatorname{vec}(\pi_{.5}') \end{bmatrix} \quad \text{and} \quad B' = \begin{bmatrix} -1 & \beta_{42} \\ \beta_{51} & 1 \end{bmatrix} \tag{5.19}$$

where π′.j = [π1j π2j … π6j] for j = 4, 5. Here the test of orientation for the identification of α is undertaken prior to the imposition of any restriction (see Table 5.2, IIc). Under the null the determinant of B is set to zero, the test is χ²(1) and from the critical value non-identifiability can be rejected at the 5% level. It follows from Theorem 9 that Π.j = αB′ and from the cointegrating rank test rank(Π) = r, so rank(B) = r ⇒ rank(Π.j) = r also and the orientation with respect to α is valid.
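It follows that a solution can now be derived from (5.18) and (5.19) based on the selected A and B matrices (see Appendix H):

$$\theta = [\alpha_{11}\ \alpha_{21}\ \alpha_{31}\ \alpha_{42}\ \alpha_{52}\ \alpha_{62}\ \beta_{11}\ \beta_{21}\ \beta_{31}\ \beta_{51}\ \beta_{61}\ \beta_{12}\ \beta_{22}\ \beta_{32}\ \beta_{42}\ \beta_{62}] = g^{-1}(\pi),$$

with elements

$$\alpha_{i1} = \tfrac{1}{\delta}\pi_{i4} - \tfrac{\beta_{51}}{\delta}\pi_{i5} \quad (i = 1, 2, 3), \qquad \alpha_{i2} = -\tfrac{\beta_{42}}{\delta}\pi_{i4} - \tfrac{1}{\delta}\pi_{i5} \quad (i = 4, 5, 6),$$

$$\beta_{j1} = \alpha_{31}^{-1}\pi_{3j} \quad (j = 1, 2, 3, 5, 6), \qquad \beta_{j2} = \alpha_{62}^{-1}\pi_{6j} \quad (j = 1, 2, 3, 4, 6),$$

where δ = –1 – β42β51.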

Theorem 9 implies that a sufficient condition for the existence of a solution to the vector system associated with the first r1 cointegrating vectors is the existence of a matrix A1 such that:

$$\operatorname{vec}(\beta_{.1}') = (I_6 \otimes A_1)^{-1}\operatorname{vec}(\Pi_{1,(r_1)}).$$

From Corollary 11, when r1 = 1, the existence of a block of weakly exogenous variables is a sufficient condition for identification of the cointegrating vectors in the first block. By analogy the second block is also identified when r2 = 1. The system is sequentially identifiable from the restrictions on α alone and the selection of the normalization. In this case, the long run is partitioned into two sub-systems for which ri = 1 and consequently each vector is identified by the normalization alone.

5.5 Conclusion

In this chapter exogeneity and identification have been discussed. Exogeneity implies restrictions on the long-run parameters of the model. In the case of weak exogeneity for β, the requirement is that all cointegrating vectors are excluded from the equation for the weakly exogenous variable. This proposition is tested using a likelihood ratio test, which compares the model estimated using a VAR, which is only restricted by virtue of the rank restriction on Π, with models that, irrespective of the restriction, can be estimated using the generalized restriction estimator given in Appendix F (Doornik 1995). For weak exogeneity and long-run exclusion there are r restrictions on α and β respectively for each variable excluded, while for strict exogeneity, there are 2r restrictions on α and β for each variable excluded. Such restrictions are binding and can be tested by a test which is asymptotically distributed χ²(ir) for WE and LE, and χ²(2ir) for SE, where i denotes the number of variables excluded. Small sample corrections are available for these tests either via the bootstrap (Podivinsky 1993) or exact small sample correction (Johansen 2002).
Cointegrating exogeneity is comparable with Granger causality, in the sense that the non-linear restrictions, when applied, are associated with non-causation of the exogenous by the endogenous variables in the long run, but in the latter case the restriction also applies to the short-run dynamics of the model. The variables not caused are termed cointegrating exogenous for β. Forecasts of the endogenous variables in the long run can be made conditional on the forecasts of the cointegrating exogenous variables, because both α and β have a block triangular structure. As pointed out by Toda and Phillips (1994), care must be taken in determining the degrees of freedom of this test, because there is an annihilation of parameters that implies that not all of these restrictions are binding. Doornik (1998) has implemented a procedure for checking the degrees of freedom, but in more general terms the problem is best viewed as one of identification. The restrictions for exogeneity only in very special cases identify the cointegrating vectors. Furthermore, such common restrictions applied to β only identify to a sub-block of equations.
The procedure for identification outlined can be applied using standard packages and identifiability is a product of the conditions required for generic identification. The procedure requires identification to be checked on an a priori basis. The test of the existence of the sufficient conditions associated with Sargan (1983a) stems from the application of restrictions to both α and β, and the whole approach can be made operational with a range of non-linear restrictions.
The method was applied to data well known in the cointegration literature. The discovery of a solution to the vector conditions associated with Theorem 9 verifies the restrictions as over-identifying and determines the degree of over-identification. Identifiability of β is accepted on the basis of a test similar to the H02 in Boswijk (1996). However, this test confirms that it is appropriate to solve the system using the selected rows and columns of Π. Hence, the orientation of the system and the solution uncovered are empirically identified. Identifiability of β follows from restrictions on α that relate to the exogeneity of the variables selected. The question of which variables are exogenous would appear to be of importance when the appropriateness of the normalization is at issue.
Based on the results in section 5.4, the system was identified by imposing a quasi-diagonality restriction on α and by normalizing with respect to r coefficients in β. It is shown that quasi-diagonality, subject to additional covariance restrictions, implies weak exogeneity for a sub-block of β. Finally, the joint acceptance of weak exogeneity and cointegrating exogeneity tests for the interest rates implies that they are long-run strongly exogenous for the first cointegrating vector. Given the diagonalization of the system, this causal ordering further emphasizes that the interest rates are the exogenous variables in the system.
6
Further Topics in the Analysis of
Non-Stationary Time Series

6.1 Introduction

In this chapter three further topics are considered in some detail: estimation
of models with I(2) variables; forecasting; and structural models with short-
run behaviour driven by expectations. Though mathematically the notions of
order of integration and cointegration are exact, in practice they are valid to
the best approximation or resolution that the data may permit. To define an
order of integration as a specific integer quantity is to assume that the series is
approximated by a single well-defined time series process across the sample.
Time series data for developed economies have exhibited many features, from
behaviour that might be viewed as purely stationary through to series that
require first or second differencing to render them stationary. Some nominal
series in first differences may require further differencing, which suggests that
the original nominal series are of order I(2) or higher when further differenc-
ing is required. In this chapter, discussion is limited to processes up until I(2).
The condition required for a series to be considered to be I(1), as compared
with one exhibiting further features only consistent with I(2) behaviour, is
necessary and sufficient for cointegration amongst I(1) series, but beyond
testing this condition, there is a well defined procedure for inference and esti-
mation of I(2) processes (Johansen 1992, 1995). It might often be difficult to
distinguish between an I(1) and an I(2) series, which suggests that series,
which appear to be I(2), are being approximated to some order of accuracy by
second differences. Alternatively, these series may be better modelled using
non-integer orders of differencing (Granger and Joyeux 1980; Hosking 1981).
To this end, the question of fractional processes and long-memory will be dis-
cussed briefly after the section on I(2) behaviour. A further reason why it
might be difficult to detect the order of integration of a series may be due to
the existence of structural breaks. This opens up a plethora of potential
difficulties for any form of structural modelling. Breaks in structure have a

number of forms when conventional (I(0)) linear econometrics is considered,


but beyond slope and intercept shifts, there are other types of intercept cor-
rection used in macro modelling (see Clements and Hendry 1998, 2001). The
break may also apply to the cointegrating relations (co-breaks) or in the order
of integration and cointegration. Testing was limited in chapter 4 to recursive
break tests and tests with a known break in structure that could be corrected
by the use of dummy variables. In this chapter forecast performance is com-
pared by considering the difference between forecasts made with and without
the imposition of cointegration. Specifically, the simulation results of Hendry
and Clements and Lin and Tsay are evaluated.
Once the notion of forecast failure is considered, then issues associated with
our ability to detect short-run structure arise. In this context, there can be no
difference between estimating a structural relationship as compared with a
reduced form, except for the added efficiency that might derive from the
imposition of further restrictions on the long-run and short-run parameters.
There are a number of approaches to defining structural models under cointe-
gration of which the best defined follows from the work of Pesaran et al.
(2000). The elegance of the Johansen approach is lost once the long-run and the
short-run coefficients are interrelated, as testing for a unit root in multivariate
processes cannot be readily disentangled from the estimation of the long-run
and short-run parameters. In particular, when the long-run parameters are
embedded within the short run, as occurs with models with future expecta-
tions, then testing for cointegration is less straightforward. Here, the impact of
forward-looking behaviour is considered in terms of exogenous processes that
are weakly and cointegrating exogenous and then processes that have unit
roots in the exogenous variables. The simple method suggested by Dolado et
al. (1991) is considered along with an extension of this method to the multi-
variate context by Engsted and Haldrup (1997). An alternative maximum like-
lihood approach is discussed here, though the inference is contaminated by
both the unit root and generated regressor problem.

6.2 Inference and estimation when series are not I(1)

In this section the I(2) approach advanced in Johansen (1992a) is considered
along with some discussion of multi-cointegrated and fractional processes.
Whether a series is I(1), close to I(1) in levels or differences, is a matter of
debate. To some extent cointegration operates beyond the framework of this
debate, because long memory processes may also interact, as has been
observed recently by Abadir and Talmain (2002). From the original definition
of cointegration due to Engle and Granger (1987) series of order I(j) cointe-
grate and I(1) and I(0) series may also combine in the manner described by
Flôres and Szafarz (1996). One estimator, which combines I(0), I(1) and I(2)
processes is that given in Johansen (1992a). This assumes that differenced
series are of integer order, which rules out the possibility that series such as
inflation rates are fractional processes. The distinction between long memory
and non-stationarity might be viewed as semantic for the data sets readily
available, but one cannot dismiss the possibility that series may move across
orders of integration from non-stationarity through long-memory to sta-
tionarity. In this light the series might never be purely stationary or non-
stationary. Where this would appear to accord with sound economic principle
then one might have to look for the best approximation.1

6.2.1 Cointegration when series are I(2)


Consider the cointegration case developed by Engle and Granger (1987),
where all the series are I(2). It follows from our discussion of cointegration in
chapter 4 that second differences have the following Wold decomposition:

$$\Delta^2 x_t = C(L)\varepsilon_t$$

and $x_t$ cointegrate when $\beta_{I(2)}' C(1) = 0$, so that $\beta_{I(2)}' x_t \sim I(0)$. If a left-hand
factor can be extracted in the manner described in section 4.5, then:

$$\Delta^2 x_t = C_0(L) C_1(L)\varepsilon_t. \qquad (6.1)$$

It is possible to transform the Wold form into an error-correcting VARMA
when FC(1) = 0, and F is an idempotent matrix. Therefore:

$$(\Delta I - FL)\Delta x_t = C_1(L)\varepsilon_t. \qquad (6.2)$$

When $C_1(L)$ has no more unit roots, then an I(2) cointegrating VAR exists in
second differences:

$$\Gamma(L)\Delta^2 x_t = \Pi_{I(2)}\Delta x_{t-1} + \varepsilon_t$$

where $\Pi_{I(2)} = \alpha_{I(2)}\beta_{I(2)}' = F$. This has been called balanced I(2) behaviour by
Juselius (1995). Now consider the case where C(1) has further unit roots, then
it might be possible to undertake a further factorization when a left-hand term
$C_{01}(L) = (I - GL)$ can be extracted and $GC_1(1) = 0$. Therefore:

$$(\Delta I - FL)\Delta x_t = (I - GL)C_{11}(L)\varepsilon_t \qquad (6.3)$$

$$(\Delta I - GL)(\Delta I - FL)x_t = C_{11}(L)\varepsilon_t \qquad (6.4)$$

The following I(2) representation can be readily derived from multiplying
through the two left-hand divisors above. Therefore:

$$\Delta^2 x_t - F\Delta x_{t-1} - G(\Delta x_{t-1} - F x_{t-2}) = C_{11}(L)\varepsilon_t,$$

transforming to the VAR by inverting $C_{11}(L)$ and applying the reparameterization
$(A(1)L + (1 - L)A^*(L))$ to produce terms in first differences and
$(A(0) + (1 - L)A^+(L) + (1 - L)^2A^{++}(L))$ terms in levels,

$$\Psi(L)\Delta x_t - A(1)F\Delta x_{t-2} + A(0)GFx_{t-2} - A(1)G\Delta x_{t-2} + A^+(0)GF\Delta x_{t-2} = \varepsilon_t$$

or

$$\Psi(L)\Delta x_t = A(1)F\Delta x_{t-2} - A(0)(GF)x_{t-2} + (A^x(1)G - A^x(0)GF)\Delta x_{t-2} + \varepsilon_t \qquad (6.5)$$

where $\Psi(L) = (A(L) - A^*(L)(F + G)L + A^{++}(L)GFL^2)$, $A^x(1) = A(0)^{-1}A(1)$ and
$A^x(0) = A(0)^{-1}A^+(0)$. Assuming a VAR(2) system with
$A(1)F = \Sigma\alpha_\perp(\alpha_\perp'\Sigma\alpha_\perp)^{-1}\kappa'\tau'$, $F = H^{-1}M_{I(2)}H$,
$M_{I(2)} = \mathrm{diag}(1 \ldots 1, 0 \ldots 0)$, $A(0)G = \alpha\rho'\tau'$ and
$(A^x(1)G - A^x(0)GF) = \alpha\psi'$, then (6.5) is a restricted version of the I(2)
representation in Hansen and Johansen (1998):

$$\Delta^2 x_t = \Sigma\alpha_\perp(\alpha_\perp'\Sigma\alpha_\perp)^{-1}\kappa'\tau'\Delta x_{t-1} + \alpha(\rho'\tau' x_{t-1} - \psi'\Delta x_{t-1}) + \varepsilon_t. \qquad (6.6)$$


In the notation of Hansen and Johansen, $\alpha$ is n × r, $\rho$ is (r + s) × r, $\tau$ is
n × (r + s), $\psi$ is n × r, $\kappa$ is (r + s) × (n – r) and $\Sigma$ is n × n.
Next the approach due to Johansen (1992) is considered for testing for
cointegration in I(2) systems, then an example is discussed along with
identification and estimation.

6.2.1.1 The Johansen procedure for testing cointegrating rank with I(2) variables
Prior to any discussion of the appropriate method of estimation the more
conventional VECM for the I(2) case is presented (Johansen 1995a):

$$\Delta^2 x_t = \alpha\beta' x_{t-1} - \Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\Psi_i\Delta^2 x_{t-i} + N_0 D_t + \varepsilon_t, \qquad (6.7)$$

where $\Gamma = \Sigma\alpha_\perp(\alpha_\perp'\Sigma\alpha_\perp)^{-1}\kappa'\tau' + \alpha\psi'$, and $\alpha$ and $\beta' = \rho'\tau'$ are the
conventional loadings and cointegrating vectors for the case in which series of
any order may collapse to a stationary linear combination. If $\Gamma = 0$, then this is
the cointegration case considered by Engle and Granger (1987) where all the
series are I(2) and:

$$\Delta^2 x_t = \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1}\Psi_i\Delta^2 x_{t-i} + N_0 D_t + \varepsilon_t. \qquad (6.8)$$

Alternatively, when $\alpha\beta' = 0$ and the differenced I(1) series have linear
combinations that are stationary:

$$\Delta^2 x_t = -\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\Psi_i\Delta^2 x_{t-i} + N_0 D_t + \varepsilon_t \qquad (6.9)$$

where $-\Gamma = (\alpha_\perp')^{-1}\kappa'\tau' = \alpha_{I(2)}\beta_{I(2)}'$ as $\alpha_\perp'$ has full rank, because $\alpha\beta' = 0$ implies
$\alpha = 0$ and $\beta = 0$. The full I(2) case allows for the possibility of cointegration
amongst I(2) series that become I(0) in combination, and cointegration
amongst I(1) series that become I(0).
Clearly, (6.8) can be estimated using the Johansen procedure, except the
regression that is purged of short-run behaviour in, for example, the VAR(1) case is:

$$R_{0,t} = \alpha\beta' R_{1,t}$$

or

$$\Delta^2 x_t = \Pi x_{t-1},$$

and decomposition and testing follows in the usual way (see sections 4.3–4.4).
Alternatively, for the VAR(1) case associated with (6.9) the estimation
procedure is in every respect the same as that derived by Johansen (1991),
except the data are first and second differenced. For the VAR(1) case this
involves estimating the following model:

$$R_{0,t} = \alpha_{I(2)}\beta_{I(2)}' R_{1,t} = (\alpha_\perp')^{-1}\kappa'\tau' R_{1,t}$$

or

$$\Delta^2 x_t = -\Gamma\Delta x_{t-1}.$$

This becomes more complicated when the two types of cointegration are combined;
then (6.7) needs to be estimated, but this requires two blocks of reduced
rank tests to be undertaken. One procedure for undertaking this analysis would
be to consider the unit roots associated with cointegration amongst I(2) series
whose first differences cointegrate. However, when $\alpha\beta' \neq 0$, the model to
be estimated will either require very long lags, as the moving average terms
$\beta' x_{t-1} = J(L)\varepsilon_{t-1}$ have been omitted, or the Johansen approach might be applied
to a VARMA(1,q) model. To see this re-write (6.2) as:

$$\Delta^2 x_t - F\Delta x_{t-1} = C_1(L)\varepsilon_t. \qquad (6.10)$$

If (6.10) were to be estimated, then the method must account for roots on the
unit circle as when the level terms cointegrate, $C_1(L)$ contains further unit
roots. Otherwise, the conventional VAR associated with this problem is of
infinite order and not conventionally invertible. There is no unique way
of deriving the estimator and in general the existence of the time series
representation cannot be proven.
In general, the case with both I(2) and I(1) interdependencies can be
handled by considering the solution to two reduced rank problems:

$$\Pi = \alpha\beta'$$

$$\alpha_\perp'\Gamma\beta_\perp = \xi\eta'$$

where $\xi$ and $\eta$ are (n – r) × s dimensioned matrices. To simplify the exposition


quadratic trends are not considered here. Johansen (1995) suggests the
problem is made tractable by correcting the short-run behaviour firstly for the
usual cointegration case as the I(2) series collapse to linear combinations that
are stationary. When the Frisch–Waugh theorem is applied to purge the short-run
relationship of the nuisance terms, then $\Delta^2 x_t$ and $x_{t-1}$ are both regressed
on $\Delta x_{t-1}$ and $\Delta^2 x_{t-i}$, i = 1, 2, …, p – 1, by ordinary least squares. The residuals
from these regressions will not be correlated with the lagged second differences
and the influence of the first form of cointegration will be removed.
Again $R_{0,t}$ and $R_{1,t}$ are, in essence, the n × 1 residual vectors from regressions
with $\Delta^2 x_t$ and $x_{t-1}$ as the dependent variables. The following regressions yield
estimates of the first long-run parameter matrix:

$$R_{0,t} = \alpha\beta' R_{1,t} = \Pi R_{1,t}. \qquad (6.11)$$
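
The partialling-out step behind (6.11) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the data are simulated I(2) series, deterministic terms are omitted, and the names `partial_out`, `d1`, `d2` and the settings `T`, `n`, `p` are hypothetical choices made only for the example.

```python
import numpy as np

def partial_out(Y, Z):
    """Residuals from OLS regressions of each column of Y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Y - Z @ coef

# Simulated I(2) data (assumption): double cumulated white noise.
rng = np.random.default_rng(0)
T, n, p = 200, 3, 3
x = np.cumsum(np.cumsum(rng.standard_normal((T, n)), axis=0), axis=0)

# First and second differences, stored at the date they refer to.
d1 = np.full_like(x, np.nan)
d1[1:] = x[1:] - x[:-1]
d2 = np.full_like(x, np.nan)
d2[2:] = d1[2:] - d1[1:-1]

# Dates for which all lags are available.
t = np.arange(p + 1, T)

# Nuisance regressors: the lagged first difference and p - 1 lagged second differences.
Z = np.hstack([d1[t - 1]] + [d2[t - i] for i in range(1, p)])

R0 = partial_out(d2[t], Z)      # second differences purged of short-run terms
R1 = partial_out(x[t - 1], Z)   # lagged levels purged of the same terms
```

By construction the residuals `R0` and `R1` are orthogonal to every column of `Z`, which is the sense in which the short-run terms have been removed before the reduced rank regression is run.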

Now $\beta$ is calculated by solving the conventional eigenvalue problem for the
I(1) case and the usual I(1) analysis is undertaken to determine cointegrating
rank (section 4.4). To confirm that the I(1) analysis is valid the test for I(2)
components discussed previously in 4.4.5 needs to be undertaken; this relates
to the solution to the second reduced rank problem, that is rank($\alpha_\perp'\Gamma\beta_\perp$) =
n – r. Should this matrix not have full rank, then there are I(2) components
not accounted for. Next an analysis of the I(2) components of the model is
undertaken, controlling for the I(1) variables. Subject to knowledge of ($\alpha$, $\beta$, r)
the I(1) terms are eliminated by pre-multiplying (6.7) by $\alpha_\perp'$:

$$\begin{aligned}
\alpha_\perp'\Delta^2 x_t &= \alpha_\perp'\alpha\beta' x_{t-1} - \alpha_\perp'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} + \alpha_\perp' N_0 D_t + \alpha_\perp'\varepsilon_t \\
&= -\alpha_\perp'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} + \alpha_\perp' N_0 D_t + \alpha_\perp'\varepsilon_t. \qquad (6.12)
\end{aligned}$$

This is an n – r dimensioned system and in the pure I(1) case rank($\alpha_\perp'\Gamma$) = n – r.
The test for further I(2) trends is undertaken by regressing $\alpha_\perp'\Delta^2 x_t$ and $\alpha_\perp'\Delta x_{t-1}$
on $\alpha_\perp'\Delta^2 x_{t-i}$, i = 1, 2, …, p – 1. The residuals $R_{0,t}$ and $R_{1,t}$ from these regressions
yield an eigenvalue problem that can be solved in the usual way.
The Johansen test for this case determines rank($\alpha_\perp'\Gamma\beta_\perp$) = s, where 0 ≤ s ≤
n – r and associated with s significant eigenvalues is the s × (n – r) matrix of
eigenvectors $\eta'$ that define common trends. If all the variables are I(1), then
the system separates into r stationary variables ($\beta' x_{t-1}$) and n – r common
trends. Otherwise there are s common trends and n – r – s, I(2) trends.
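To complete the I(2) analysis, (6.7) is now multiplied by the r × n matrix $\bar\alpha'$:

$$\begin{aligned}
\bar\alpha'\Delta^2 x_t &= \bar\alpha'\alpha\beta' x_{t-1} - \bar\alpha'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\bar\alpha'\Psi_i\Delta^2 x_{t-i} + \bar\alpha' N_0 D_t + \bar\alpha'\varepsilon_t \\
&= \beta' x_{t-1} - \bar\alpha'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\bar\alpha'\Psi_i\Delta^2 x_{t-i} + \bar\alpha' N_0 D_t + \bar\alpha'\varepsilon_t \qquad (6.13)
\end{aligned}$$

where $\bar\alpha'\alpha = I_r$. Subtracting $\omega \times$ (6.12) from (6.13):

$$\begin{aligned}
\bar\alpha'\Delta^2 x_t - \omega\alpha_\perp'\Delta^2 x_t = {} & \beta' x_{t-1} - \bar\alpha'\Gamma\Delta x_{t-1} + \sum_{i=1}^{p-1}\bar\alpha'\Psi_i\Delta^2 x_{t-i} + \bar\alpha' N_0 D_t + \bar\alpha'\varepsilon_t \\
& + \omega\Big(\alpha_\perp'\Gamma\Delta x_{t-1} - \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} - \alpha_\perp' N_0 D_t - \alpha_\perp'\varepsilon_t\Big)
\end{aligned}$$

$$\bar\alpha'\Delta^2 x_t - \beta' x_{t-1} = \omega\alpha_\perp'\Delta^2 x_t - (\bar\alpha' - \omega\alpha_\perp')\Big(\Gamma\Delta x_{t-1} - \sum_{i=1}^{p-1}\Psi_i\Delta^2 x_{t-i} - N_0 D_t\Big) + (\bar\alpha' - \omega\alpha_\perp')\varepsilon_t \qquad (6.14)$$

where $\omega = \Sigma_{\alpha\alpha_\perp}\Sigma_{\alpha_\perp\alpha_\perp}^{-1}$, $\Sigma_{\alpha\alpha_\perp} = \bar\alpha'\Sigma\alpha_\perp$ and $\Sigma_{\alpha_\perp\alpha_\perp} = \alpha_\perp'\Sigma\alpha_\perp$. The errors of (6.12) and
(6.14) are independent by construction, while the parameters of (6.12),
($\alpha_\perp'\Gamma$, $\alpha_\perp'\Psi_i$, $\Sigma_{\alpha_\perp\alpha_\perp}$), and of (6.14),
($\omega$, $(\bar\alpha' - \omega\alpha_\perp')\Gamma$, $(\bar\alpha' - \omega\alpha_\perp')\Psi_i$, $(\bar\alpha' - \omega\alpha_\perp')N_0$), are variation
free. It follows that the parameters ($\Gamma$, $\Psi_i$, $N_0$, $\Sigma$) can be disentangled from the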

above reparameterization. If there are no further cross-equation restrictions on


the higher-order dynamics and cointegration, then (6.12) and (6.14) can be
analyzed separately, while the dependence that operates on the common
trends applies to (6.12) alone.
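The second reduced rank hypothesis is:

$$H_{r,s}: \mathrm{rank}(\alpha_\perp'\Gamma\beta_\perp) = \mathrm{rank}(\xi\eta') = s$$

where 0 ≤ s ≤ n – r. Using the identity $I = \bar\beta\beta' + \bar\beta_\perp\beta_\perp'$, the variables $\beta'\Delta x_{t-1}$ and
$\beta_\perp'\Delta x_{t-1}$ may be introduced into (6.12):

$$\begin{aligned}
\alpha_\perp'\Delta^2 x_t &= -\alpha_\perp'\Gamma(\bar\beta\beta' + \bar\beta_\perp\beta_\perp')\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} + \alpha_\perp' N_0 D_t + \alpha_\perp'\varepsilon_t \\
&= -\alpha_\perp'\Gamma\bar\beta\beta'\Delta x_{t-1} - \alpha_\perp'\Gamma\bar\beta_\perp\beta_\perp'\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} + \alpha_\perp' N_0 D_t + \alpha_\perp'\varepsilon_t \qquad (6.15) \\
&= -\alpha_\perp'\Gamma\bar\beta\beta'\Delta x_{t-1} - \xi\eta'\bar\beta_\perp'\Delta x_{t-1} + \sum_{i=1}^{p-1}\alpha_\perp'\Psi_i\Delta^2 x_{t-i} + \alpha_\perp' N_0 D_t + \alpha_\perp'\varepsilon_t. \qquad (6.16)
\end{aligned}$$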

The parameters ($\Gamma$, $\Psi_i$ for i = 1, …, p – 1, $N_0$) can be estimated by regressing
$\bar\alpha'\Delta^2 x_t - \beta' x_{t-1}$ on $\alpha_\perp'\Delta^2 x_t$, $\Delta x_{t-1}$, $\Delta^2 x_{t-i}$ and $D_t$. The dependence amongst the s
common trends can be determined from the regression:

$$R_{0,t} = -\xi\eta' R_{1,t} \qquad (6.17)$$

where $R_{0,t}$ and $R_{1,t}$ are residuals based on regressing $\alpha_\perp'\Delta^2 x_t$ and $\bar\beta_\perp'\Delta x_{t-1}$
respectively on $\beta'\Delta x_{t-1}$, $\Delta^2 x_{t-i}$ for i = 1, …, p – 1 and $D_t$. The likelihood ratio test
statistic is based on the solution to the eigenvalue problem $|\lambda S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}| = 0$,
calculated from sample product moments derived for the I(2) case using:

$$S_{i,j} = T^{-1}\sum_{t=1}^{T} R_{i,t}R_{j,t}' \quad \text{for } i, j = 0, 1.$$

It follows that s is selected by calculating the maximal eigenvalue test:

$$LR(s_0, s_1) = -T\sum_{i=s_0+1}^{s_1}\log(1 - \lambda_i)$$

and for an appropriate choice of s the matrix $\eta'$ is the matrix whose columns
are the eigenvectors associated with the first s significant eigenvalues.
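
The eigenvalue problem and the statistic LR(s0, s1) above can be sketched as follows. This is an illustrative computation under stated assumptions, not the procedure of any particular package: the residual matrices are simulated, the function names `reduced_rank_eigenvalues` and `lr_statistic` are hypothetical, and no critical values are supplied because the limiting distributions are non-standard and must be taken from the tables cited in the text.

```python
import numpy as np

def reduced_rank_eigenvalues(R0, R1):
    """Solve |lambda*S11 - S10 S00^{-1} S01| = 0 for residual matrices R0, R1 (T x k)."""
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    inner = S01.T @ np.linalg.solve(S00, S01)        # S10 S00^{-1} S01
    L = np.linalg.cholesky(S11)                       # S11 = L L'
    M = np.linalg.solve(L, np.linalg.solve(L, inner).T)   # L^{-1} inner L^{-T}
    M = (M + M.T) / 2                                 # remove rounding asymmetry
    return np.linalg.eigvalsh(M)[::-1]                # eigenvalues, largest first

def lr_statistic(lam, T, s0, s1):
    """-T * sum_{i=s0+1}^{s1} log(1 - lambda_i), with lam sorted in descending order."""
    return -T * np.sum(np.log(1.0 - lam[s0:s1]))

# Simulated residuals (assumption): R0 depends on two directions of R1 only.
rng = np.random.default_rng(42)
T = 300
R1 = rng.standard_normal((T, 3))
B = np.diag([0.8, 0.3, 0.0])
R0 = R1 @ B + rng.standard_normal((T, 3))

lam = reduced_rank_eigenvalues(R0, R1)
print(lam)
print(lr_statistic(lam, T, 0, 3))
```

The Cholesky step turns the generalized problem into a symmetric one, so the eigenvalues are real and lie between zero and one; they are the squared sample canonical correlations between the two sets of residuals.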
An alternative approach is derived in Johansen (1997) and Hansen and
Johansen (1998) using (6.6) where the parameters to be estimated that are
variation free are ($\alpha$, $\rho$, $\tau$, $\Sigma$, $\kappa$, $\psi$).

6.2.1.2 An example of I(2)


Identification and model selection in the I(2) case is more complicated than in
the I(1) case and partial consideration of the null of cointegration conditioned

on the notion that the series are all I(1) may not be valid (Paruolo 1996).
When the series are I(2) they become stationary by virtue of a combination of
I(1) and I(2) processes and from (6.6) the cointegrating relations have the
following form:

$$\alpha(\rho'\tau' x_{t-1} - \psi'\Delta x_{t-1}) = \alpha\rho'\tau' x_{t-1} - \alpha\psi'\Delta x_{t-1} = \alpha\beta' x_{t-1} - \alpha\psi'\Delta x_{t-1} = \alpha(\beta' x_{t-1} - \psi'\Delta x_{t-1}).$$

Engle and Yoo (1989) defined cointegrating relationships of the form $\beta' x_{t-1} - \psi'\Delta x_{t-1}$
as polynomial cointegration. To observe this re-write the cointegrating
vectors as a lag polynomial $\beta(L)$ in x:

$$\beta(L)' x_t = \beta' x_{t-1} - \psi' x_{t-1} + \psi' x_{t-2} = ((\beta' - \psi')I + \psi' L)x_{t-1}.$$

The cointegrating vectors reduce to linear combinations (′xt′) of xt–1 (Engle


and Granger 1987, when either !′ = 0 or !′ = ⊥ and = "′′. In general, (6.7)
has r linear combination of I(2) variables that are I(0), s independent linear
combinations of I(1) variables that are I(0) and n – r – s variables that follow
I(2) trends. If, in addition, = "′′ = 0, then s = 0 and there are n – r, I(2)
trends rendered stationary by the second difference operator; the case consid-
ered by Engle and Granger (1987).
It was suggested in Hunter (1992a) that some of the series analyzed by
Johansen and Juselius (1992) were I(2). In response to this suggestion Hunter
and Simpson (1995) analyzed a system in which the UK inflation series enters
the model in first difference form, but they based their analysis on a longer data
set. Here, the extended VAR(2) model estimated by Hunter (1992a) is tested for
I(2) behaviour. For this example, n = 6, $x_t' = [p_{0t}\; p_{1t}\; p_{2t}\; e_{12t}\; r_{1t}\; r_{2t}]$, the variables are
described in section 4.3.1.2 and the statistics are calculated for the period
1973Q2–1987Q3. When the first reduced rank regression (6.11) is undertaken to
calculate $\beta'$, the intercept is unrestricted and a trend is introduced into the
model. At the second stage the trend is restricted to exclude quadratic trends.
The problem is addressed firstly using the approach adopted by Paruolo (1996)
and this is then compared with that described in Johansen (1995).
Paruolo (1996) derives critical values for the test of the joint hypothesis:2

$$H_{r,s}: \mathrm{rank}(\alpha_\perp'\Gamma\beta_\perp) + \mathrm{rank}(\Pi) = s + r.$$

The test statistic (1Qr,s) is compared with associated points on the null distribu-
tion, the comparison is made either with [p.value] calculated by PCGIVE 10.1
(Doornik and Hendry 2001) or 5% critical values (cr,n–r–s (5%)) taken from
Paruolo (1996). It is suggested in Doornik and Hendry (2001) that testing is
applied from the top left of the table, while Paruolo (1996) suggests progress-
ing from the top to the bottom of each column to a point at which the null
can no longer be rejected. Paruolo (1996) advises that tests are applied to
the specific case, moving to the general or from the most restricted to less
Table 6.1 I(2) Cointegration tests

For each r the four rows give the joint statistic 1Q_{r,s}, the statistic [Q_{r,s}],
the 5% critical value c_{r,n-r-s} and the [p-value]. The last two columns give Q_r
and its 5% critical value c_{n-r}.

r   statistic        n-r-s=6         5         4         3         2         1       Q_r   c_{n-r}
0   1Q_{r,s}          314.01    254.23    199.22    163.69     141.7    126.62    119.69     93.92
    [Q_{r,s}]       [194.32]  [134.54]   [79.53]    [44.0]   [22.01]    [6.93]
    c_{r,n-r-s}       240.35    203.12    174.83    148.54    126.69    109.21
    [p-value]       [0.0000]  [0.0000]  [0.0031]  [0.0105]  [0.0073]  [0.0028]
1   1Q_{r,s}                    203.82     148.4    114.58    90.026    74.347    68.861     68.68
    [Q_{r,s}]                 [134.96]  [79.539]  [45.719]  [21.165]   [5.486]
    c_{r,n-r-s}                 171.89    142.57    117.63     97.97     81.93
    [p-value]                 [0.0009]  [0.0429]  [0.1335]  [0.2082]  [0.1840]
2   1Q_{r,s}                              124.56    88.233    65.029    49.417    44.376     47.21
    [Q_{r,s}]                           [80.184]  [43.857]  [20.653]   [5.041]
    c_{r,n-r-s}                           116.31     91.41     72.99     57.95
    [p-value]                           [0.0226]  [0.1234]  [0.2247]  [0.2537]
3   1Q_{r,s}                                        83.798    56.535    35.023    23.938     29.38
    [Q_{r,s}]                                     [59.868]  [32.605]  [11.093]
    c_{r,n-r-s}                                      70.87     51.35     38.82
    [p-value]                                     [0.0039]  [0.0176]  [0.1215]
4   1Q_{r,s}                                                  48.922    27.513    13.413     15.34
    [Q_{r,s}]                                               [35.512]  [14.103]
    c_{r,n-r-s}                                                36.12      22.6
    [p-value]                                               [0.0016]  [0.0084]
5   1Q_{r,s}                                                            13.576     5.184      3.84
    [Q_{r,s}]                                                          [8.392]
    c_{r,n-r-s}                                                          12.93
    [p-value]                                                         [0.0601]
    c*_{n-r-s}                   75.33     53.35     35.07     20.17      9.09

restricted cases. Following this approach, the first diagonal element implies
r = 0, n – r – s = 6 and the test statistic for the case with an unrestricted constant
is ${}_1Q_{0,0}$ = 314.01 > $c_{0,6}$(5%) = 240.35. Based on the calculated statistic
the null hypothesis (rank($\alpha_\perp'\Gamma\beta_\perp$) = s = 0 and rank($\Pi$) = r = 0) cannot be accepted.
Progressing to the next column, where r = 0 and n – r – s = 5, ${}_1Q_{0,1}$ = 254.23 >
$c_{0,5}$(5%) = 203.12, so the null that s = 1 and r = 0 is rejected.
At this point using Paruolo’s (1996) suggestion to move down the column,
r = 1, n – r – s = 5, s = 0, the joint test statistic ${}_1Q_{1,0}$ = 203.82 > $c_{1,5}$(5%) = 171.89
and the [p-value] = 0.0009 confirms that the null hypothesis cannot be accepted
at either the 5% or the 1% level. Now the next column is considered, r = 0,
n – r – s = 4, s = 2 and the [p-value] = 0.0031 implies the null (s = 2,
r = 0) cannot be accepted.
Following this approach, testing stops and the correct decomposition of the
long-run is detected once a null in the above table is accepted. Looking at the

[p.values] in the column headed n – r – s = 4, there is no case where the null


hypothesis can be accepted. The final rejection of the null implies that there
are at least r = 2 cointegrating vectors and 6 – r – s ≤ 3, I(1) trends. Now pro-
gression is from the top of the next column (n – r – s = 3) and again to a point
at which the null cannot be rejected. From the size of the [p.value] = 0.1335,
this occurs when r = 1, n – r – s = 3 and s = 2. The Paruolo approach implies
that there are r = 1 stationary linear combinations (cointegrating vectors),
n – r – s = 6 – 1 – s = 3, I(1) trends and s = 2, I(2) trends. Were one to follow the
direction in Doornik and Hendry (2001), to progress down and to the right,
then this suggests shifting to the next column at the point at which r = 2 and
then progressing down that column.3 The direction of Doornik and Hendry is
consistent with the proposition that the first step of the Johansen I(2) estim-
ator correctly determines the number, but not necessarily the exact nature of
the cointegrating vectors.
In comparison, Johansen (1995a) suggests that the cointegrating rank calcu-
lated from the first step estimation is still reliable, which suggests testing the
hypothesis associated with I(2) trends conditional on selecting a particular
value for r. The null hypothesis that Johansen (1995a) tests is:

$$H_{r,s} \mid H(r): \mathrm{rank}(\xi\eta') = s.$$

Based on the first rank test it is suggested that r = 2 is selected and then s is
determined by moving along that row to the point at which the null cannot
be rejected. The Johansen test along each row considers the specific case and
moves towards the more general, but this now occurs for different values of
n – r – s, which for fixed r imply different values of s. Given r = 2, the test
statistic $Q_{2,s}$ is considered for s = 0, 1, 2, 3. Starting from the left n – r – s =
6 – 2 – 0 = 4, the Johansen test statistic is $Q_{2,0}$ = 80.184, which exceeds the 5%
critical value ($c^*_{6-2-0}$ = 53.35) taken from Johansen (1995a), implying that the
null (r = 2, s = 0) cannot be accepted. Continuing along the row where r = 2,
the null is taken not to be rejected when n – r – s = 6 – 2 – 2 and s = 2, although
the statistic marginally exceeds the critical value ($Q_{2,2}$ = 20.653 against
$c^*_{6-2-2}$ = 20.17). In line with Doornik and Hendry, the Johansen
testing procedure implies that there are r = 2 stationary linear combinations
(cointegrating vectors), n – r – s = 6 – 2 – s = 2, I(1) trends and s = 2, I(2)
trends.
The two test procedures advanced by Johansen (1995a) and Paruolo (1996)
imply that s = 2, but they disagree about the number of cointegrating vectors
and I(1) trends. Johansen (1995a) shows that by progressing from s = 0, 1, 2, 3,
the Q2,2 test has the same optimal properties in the limit as the Johansen test
statistic for cointegration. Furthermore, looking at the Johansen I(2) tests pre-
sented in the table above (Qr,s), when r = 0, 1, 2 the tests are not materially dif-
ferent whatever value n – r – s is selected. Partial confirmation of the
optimality of the test may be observed by comparing values of Qr,s. For the
column headed n – r – s = 3, $Q_{0,3}$ = 44 ≈ $Q_{1,2}$ = 45.719 ≈ $Q_{2,1}$ = 43.857 and all
these values exceed the critical value ($c^*_{6-2-1}$ = 35.07) at the 5% level.
Inspection of the roots of the companion matrix of the VAR is often viewed
as a useful tool in determining the number of unit roots and as a result some
idea of the likely number of non-stationary processes driving xt (Johansen
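1995a). The VAR(2) written as a first order model in state space form is:

$$x_t^* = \begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} = A_c x_{t-1}^* + \varepsilon_t^* = \begin{bmatrix} A_1 & A_2 \\ I & 0 \end{bmatrix}\begin{bmatrix} x_{t-1} \\ x_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$

$$\begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} = \begin{bmatrix} A_1 x_{t-1} + A_2 x_{t-2} + \varepsilon_t \\ x_{t-1} \end{bmatrix}$$

or

$$\begin{bmatrix} A(L)x_t \\ x_{t-1} - x_{t-1} \end{bmatrix} = \begin{bmatrix} x_t - A_1 x_{t-1} - A_2 x_{t-2} \\ x_{t-1} - x_{t-1} \end{bmatrix} = \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}.$$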
Dhrymes (1984) shows that the characteristic roots of the dynamic process
described by the polynomial A(L) can be calculated from the eigenroots of the
companion matrix Ac. The eigenvalues (roots) for the VAR(2) model estimated
above and for comparison a similar VAR(1) are given in Table 6.2.
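
The calculation of the roots from the companion matrix can be sketched as follows. The coefficient matrices `A1` and `A2` are hypothetical values chosen for illustration, not the estimates behind Table 6.2, and the function names are assumptions made for this example.

```python
import numpy as np

def companion_matrix(A1, A2):
    """Stack a VAR(2), x_t = A1 x_{t-1} + A2 x_{t-2} + e_t, into first-order form."""
    n = A1.shape[0]
    top = np.hstack([A1, A2])
    bottom = np.hstack([np.eye(n), np.zeros((n, n))])
    return np.vstack([top, bottom])

def companion_roots(A1, A2):
    """Eigenvalues of the companion matrix, ordered by decreasing modulus."""
    eig = np.linalg.eigvals(companion_matrix(A1, A2))
    return eig[np.argsort(-np.abs(eig))]

# Hypothetical stationary bivariate VAR(2).
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
A2 = np.array([[0.2, 0.0],
               [0.0, 0.1]])
roots = companion_roots(A1, A2)
for z in roots:
    print(f"real {z.real: .4f}  imag {z.imag: .4f}  modulus {abs(z):.4f}")

# A bivariate random walk (A1 = I, A2 = 0) has two roots on the unit circle.
rw_roots = companion_roots(np.eye(2), np.zeros((2, 2)))
```

Each eigenvalue z of the companion matrix satisfies det(z²I – A1 z – A2) = 0, so moduli close to one point to unit roots in the dynamic process. As the discussion of Table 6.2 shows, deciding how close is close enough remains a matter of judgement.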
The Australian exchange rate example in Johansen (1991a), summarized in
Johansen (1995a), yields the clear-cut conclusion that there are three unit
roots when n – r = 5 – 2 = 3. By contrast, the VAR(2) case considered here
appears to reveal three roots close to the unit circle, a real root (.9719) and a
complex conjugate pair of roots with modulus (.9001), but, according to the
I(2) test produced by Johansen, n – r = 4. This suggests that detecting the

Table 6.2 Eigenvalues of companion matrix

              VAR(2)                           VAR(1)
    real       imag    modulus       real       imag    modulus
-0.01897     0.3874     0.3879
-0.01897    -0.3874     0.3879
  0.1327     0.0000     0.1327
  0.4550     0.3193     0.5559
  0.4550    -0.3193     0.5559
  0.9719     0.0000     0.9719     0.9574     0.0000     0.9574
  0.8877     0.1486     0.9001     0.9222     0.1115     0.9289
  0.8877    -0.1486     0.9001     0.9222    -0.1115     0.9289
  0.6553     0.2302     0.6946     0.6587     0.2145     0.6927
  0.6553    -0.2302     0.6946     0.6587    -0.2145     0.6927
  0.4910     0.0000     0.4910     0.9252     0.0000     0.9252
  0.7729     0.0000     0.7729


number of unit roots from the companion matrix is not always straight-
forward. Firstly, a VAR(2) system can be decomposed into two stationary
processes (r = 2), two non-stationary processes (either n – 2 – s = 2 or s = 2) and
a pair of common I(2) or I(1) trends driven by a single unit root. Secondly,
should the roots of the VAR(1) be considered for comparison, then the esti-
mates are quite consistent with the proposition that there are n – r = 4 unit
roots. Analysis associated with both sets of eigenvalues for the two companion
matrices does not appear to support the approach due to Paruolo (1996),
which suggests r = 1 and n – r = 4.
Having found that some of the series are I(2), the usual cointegrating
vectors may not be valid as the stationary linear combinations may require
combinations of I(2) processes that are I(1) to make them stationary or poly-
nomial cointegration. Consider these following suggestions for the long-run
relationships associated with the VAR(2) system developed above. Based on
the findings in Hunter (1992a) and Johansen and Juselius (1992), there are
two cointegrating vectors that accept PPP and UIRP restrictions. The conclu-
sion of the I(2) analysis for PPP is that the series may only be rendered station-
ary when the cointegrating vector is augmented by differences in I(2)
variables. For example, relative movements in the cross-country inflation rates
may be what is required. With s = 2 common I(2) trends driving the price
series ($p_0$, $p_1$, $p_2$) then the cointegrating vectors could take the following form:

$$\alpha\beta' x_{t-1} - \alpha\psi'\Delta x_{t-1} = \alpha\left(\begin{bmatrix} 0 & 1 & -1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix} x_{t-1} - \begin{bmatrix} 0 & 0 & \psi_{31} & -\psi_{31} & 0 & 0 \\ \psi_{12} & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\Delta x_{t-1}\right),$$

with $x_{t-1} = (p_0, p_1, p_2, e_{12}, r_1, r_2)_{t-1}'$.
A similar type of long run occurs with polynomial cointegration (Engle and
Yoo 1991; Gregoir and Laroque 1993):

$$\beta' x_{t-1} - \psi'\Delta x_{t-1} = \begin{bmatrix} 0 & 1 & \psi_{31} - 1 - \psi_{31}L & -1 - \psi_{31} + \psi_{31}L & \beta_{51} & \beta_{61} \\ (\beta_{12} - \psi_{12} + \psi_{12}L) & 0 & 0 & 0 & 1 & -1 \end{bmatrix} x_{t-1}$$

where $x_t' = [p_{0t}\; p_{1t}\; p_{2t}\; e_{12t}\; r_{1t}\; r_{2t}]$. The two forms of I(2) cointegration are equivalent
when $\beta_{51} = 0$, $\beta_{61} = 0$ and $\beta_{12} = 0$. Unfortunately, prior to any evaluation

of the long run, the system needs to be identified, but identification of the
type discussed in chapter 4 is considerably more complicated in the I(2) case
as three sets of matrices lack identification:121

 ′ ′ = ςς −1 ′ ′ ′ −1 ′ =  *  +′ *′


! ′ = ςς −1! ′ =  * ! *′
⊥ (⊥′ ⊥ ) " ′ ′ = ⊥  −1(⊥′ ⊥ )−1 " ′ ′ ′ −1 ′
−1

= ∗⊥ (⊥′ ⊥ )−1 " +′ *′

Hence, the same likelihood can be defined for (6.6) using parameters
[$\alpha$, $\rho'$, $\tau'$, $\kappa'$, $\psi'$, $\alpha_\perp'$] and [$\alpha^*$, $\rho^{+\prime}$, $\tau^{*\prime}$, $\kappa^{+\prime}$, $\psi^{*\prime}$, $\alpha_\perp^{*\prime}$].
The two sets of parameterizations are observationally equivalent and
observational equivalence leads to a fundamental loss of identification.
Although inflation seemed to be I(1) in the late 1980s and early 1990s the
argument appears less compelling in a world where inflation is predominantly
under control, which suggests that economic and financial time series might
be better described as long-memory.

6.2.2 Fractional cointegration


The notion of fractional differenced series was introduced in chapter 2. When
such processes are considered then the possibility of fractional cointegration
ought to be entertained. Robinson and Yajima (2002) explain that this notion
of fractional cointegration is quite consistent with the original definition of
cointegration due to Engle and Granger (1987). Consider a pair of series $x_{1t}$
and $x_{2t}$ that require fractional differencing for them to be rendered stationary,
then:

$$\Delta^d x_{it} = (1 - L)^d x_{it} \sim I(0) \quad \text{for } i = 1, 2$$

where $(1 - L)^d = \sum_{j=0}^{\infty}\frac{\Gamma(j - d)}{\Gamma(-d)\Gamma(j + 1)}L^j$. For a > 0, $\Gamma(a) = \int_0^\infty z^{a-1}e^{-z}\,dz$; at a = –l, l = 0,
1, …, $\Gamma(a)$ has simple poles with residues $(-1)^l/l!$; otherwise $\Gamma(a) = \Gamma(a + 1)/a$. It
follows that $x_t$ is cointegrated when:

$$\beta' x_t = J(L)\varepsilon_t \sim C(i, d)$$

when $x_t = [x_{1t}\; x_{2t}]'$.
Proofs exist for the analysis of stationary fractional series with –.5 < d < .5
(Robinson and Yajima 2002). The conventional question arises over the rank
of the matrix of cointegrating vectors, rank($\beta$) = r: do there exist r linear
combinations of variables $x_t$ that require the fractional difference operator $(1 - L)^d$
to be applied for the series to be I(0)? Robinson (1994) explains how to use
non-parametric estimates of the dynamic process to calculate the cointegrat-
ing relationships when series have the same order of integration. Robinson
and Marinucci (1998) apply this approach to stationary fractionally integrated
series to estimate the long-run parameters from the equation:

$$\begin{bmatrix} 1 & -\beta_1 \end{bmatrix}\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = J(L)\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}.$$

The estimator is similar to that used by Phillips and Hansen (1990) to estimate
long-run parameters when the series are I(1). The unknown moving average
parameters in J(L) are captured by a frequency domain estimator, which also
appears to compare well with Phillips and Hansen (1990) when the series are
I(1) (Marinucci and Robinson 2001). Although there is evidence that this type
of approach is able to estimate long-run parameters when r is known or not
large, the method, though efficient in calculating well-known long-run rela-
tionships, does not provide a formal test of the proposition that either frac-
tional or integer integrated series are cointegrated. The method can determine
the extent to which the variables in the regression are related by determining
whether $\beta_1$ is significant or not. Clearly, any such conclusion is conditional
on the appropriateness of this normalization.
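
The fractional difference operator $(1 - L)^d$ that underlies this section can be computed directly from its expansion. The sketch below is illustrative only: the function names are hypothetical, the weights use the recursion implied by the gamma-function ratio (each weight equals the previous one times (j – 1 – d)/j), and the filter is truncated at the start of the sample, which is an assumption rather than part of the definition.

```python
import math
import numpy as np

def frac_diff_weights(d, n_weights):
    """Coefficients of L^j in (1 - L)^d for j = 0, ..., n_weights - 1."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for j in range(1, n_weights):
        # Gamma(j - d) / (Gamma(-d) Gamma(j + 1)) via the ratio of successive terms.
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_diff(x, d):
    """Apply (1 - L)^d to a series, treating pre-sample values as zero."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])

# Hypothetical example: weights for d = 0.4 and a check against d = 1.
print(frac_diff_weights(0.4, 5))
series = np.arange(1.0, 9.0) ** 2
print(frac_diff(series, 1.0))
```

With d = 1 the weights collapse to (1, –1, 0, 0, …) and the filter reproduces the ordinary first difference, while for fractional d the weights decay slowly, which is the source of the long memory.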
Robinson and Yajima have attempted to determine the order of integration
and cointegration by two different methods. They consider three different
crude oil prices (WTI, Dubai and Brent). Based on an Augmented Dickey–
Fuller test with an intercept, the three series are found to be stationary at the
5% level of significance. But when the order of difference is assumed to be
fractional, the estimates of d for the three series are [.5336, .4367, .4538].5
Robinson and Yajima (2002) suggest two approaches to the problem of
selecting the cointegrating rank, but they use one of them in their example.
Consider the Vector Auto-Regressive Fractionally Integrated Moving Average
(VARFIMA) model:

$$E(L)x_t = C(L)\varepsilon_t$$

where $E(L) = \mathrm{diag}[(1 - L)^{d_1}, (1 - L)^{d_2}, \ldots, (1 - L)^{d_n}]$.6 The series are ordered on the
basis of the prior estimate of the difference order. The test is based, as is usually
the case, on the rank of the matrix C(1), which, under conventional cointegration,
has rank n – r associated with the extent to which there is any over-differencing.
The test, as is the case with integer cointegration, progresses
from the most restricted model, where C(1) has full rank, n – r = n and r = 0,
so that there is no cointegration, to the cointegration cases, r = 1, 2, 3. The test for
fractional cointegration is:

$$H_i: \mathrm{rank}(G) = \mathrm{rank}(C(1)) = n - r \quad \text{where } G = \frac{1}{2\pi}C(1)C(1)'.$$

To make the test operational, Robinson and Yajima use the following non-
parametric estimator of G:

$$\hat G = \frac{1}{m_1}\sum_{j=1}^{m_1}\mathrm{Re}\left\{\hat\Lambda(\lambda_j)^{-1} I_j \hat\Lambda(\lambda_j)^{-1}\right\}.$$

Where $I_j = \omega(\lambda_j)\omega(\lambda_j)'$, $\omega(\lambda_j) = (\omega_1(\lambda_j)\;\omega_2(\lambda_j) \ldots \omega_n(\lambda_j))'$, Re{·} is the real component,
$\hat\Lambda(\lambda_j) = \mathrm{diag}(e^{i\pi d^*/2}\lambda_j^{-d^*}, \ldots, e^{i\pi d^*/2}\lambda_j^{-d^*})$, $\lambda_j = 2\pi j/T$, and $m < T/2$. It has been assumed that
$d_a$ is replaced by a pooled estimate $d^* = (\hat d_1 + \hat d_2 + \hat d_3)/3$ and
$\omega_a(\lambda_j) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T} x_{at}e^{it\lambda_j}$ is the discrete Fourier transform of the original data. The
effective bandwidth $m_1$ is set to increase at a faster rate than m to counteract the
effect of using an estimate of $d_a$. Robinson and Yajima (2002) provide estimates
of G evaluated with m = 13 and $m_1$ = 15:

$$\hat G = \begin{bmatrix} .00493 & .00542 & .00575 \\ .00542 & .00625 & .00653 \\ .00575 & .00653 & .0073 \end{bmatrix},$$

where Ĝ has the following eigenvalues [.01807, .000275, .000124]. The most
important eigenvector is associated with the largest root, which given that the
other two roots are small suggests that n – r = 1 or with n = 3 variables then
there are r = 2 cointegrating relationships. Robinson and Yajima (2002)
proceed to analyze the case where the three series have two distinct orders of
differencing. This suggests that the WTI oil price series is handled differently
than that for Brent and Dubai. Once Brent and Dubai crude prices are consid-
ered together with two types of difference, the reduced rank calculation is
applied to a 2 × 2 sub-matrix, which from the obvious rank deficiency in Ĝ
above implies r = 1.
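
The rank deficiency of the reported matrix can be inspected numerically. The sketch below simply takes the entries of $\hat G$ printed above, which are rounded, so the two smallest eigenvalues it produces need not match the reported figures to every digit; the variable names are assumptions made for the example.

```python
import numpy as np

# Estimate of G as printed in the text (rounded entries).
G_hat = np.array([[0.00493, 0.00542, 0.00575],
                  [0.00542, 0.00625, 0.00653],
                  [0.00575, 0.00653, 0.00730]])

# G_hat is symmetric, so eigvalsh applies; reverse to list the largest root first.
eigvals = np.linalg.eigvalsh(G_hat)[::-1]
share = eigvals / eigvals.sum()

print(eigvals)   # compare with the eigenvalues reported in the text
print(share)     # proportion of the trace attributable to each root
```

One root accounts for almost all of the trace, which is the informal basis for treating $\hat G$ as having rank one, so that n – r = 1 and r = 2. The text does not attach a formal critical value to this comparison, and neither does the sketch.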

6.3 Forecasting in cointegrated systems

6.3.1 VMA analysis


Cointegration describes how, in the long run, the levels of a set of variables
should move together. A similar property should therefore be expected of
forecasts from such a system. That is, the forecasts of a set of variables from a
cointegrated system should be related to one another such that, although
individually subject to the implications of non-stationarity, there remain
linear combinations of the forecasts that are zero, or constant (depending on
the deterministic terms in the model). If valid long-run relationships are
imposed on an empirical model of the data, this ought to improve the
quality of long-run forecasts, as additional information is being exploited.
But is the value of the long-run restrictions, in terms of forecast improve-
ment, greater than for other types of restriction, or restrictions on stationary
systems? Engle and Yoo (1987) provide an analysis of this problem in the
CI(1, 1) case.

Consider the usual VMA representation of an n × 1, CI(1, 1) system discussed in section 4.2:

$$\Delta x_t = C(L)\varepsilon_t, \tag{6.18}$$

where $C(L) = \sum_{i=0}^{\infty} C_i L^i$, $\operatorname{rank}(C(1)) = n - r$, and $C_0 = I_n$. In order to obtain an
expression for $x_t$, which is to be the object of the forecast, sum both sides of
(6.18) from $i = 1, \ldots, t$ to give

$$x_t - x_0 = \sum_{i=1}^{t} C(L)\varepsilon_i.$$
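In addition, assume that the initial values $x_0$ and $\varepsilon_q$, $q \le 0$, are zero. Then,

$$C(L)\varepsilon_i = \sum_{j=0}^{\infty} C_j\varepsilon_{i-j} = \sum_{j=0}^{i-1} C_j\varepsilon_{i-j}$$

and so

$$x_t = \sum_{i=1}^{t}\sum_{j=0}^{i-1} C_j\varepsilon_{i-j}. \tag{6.19}$$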

Equation (6.19) can be rewritten in terms of $\varepsilon_s$, $s = 1, \ldots, t$:

$$x_t = \sum_{s=1}^{t}\sum_{r=0}^{t-s} C_r\varepsilon_s.$$

Moving forward another h periods,

$$x_{t+h} = \sum_{s=1}^{t+h}\sum_{r=0}^{t+h-s} C_r\varepsilon_s = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s + \sum_{s=t+1}^{t+h}\sum_{r=0}^{t+h-s} C_r\varepsilon_s$$

and redefining the index on the last summation to emphasize that it contains
terms in the disturbances beyond t only, gives

$$x_{t+h} = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s + \sum_{q=1}^{h}\sum_{r=0}^{h-q} C_r\varepsilon_{t+q}. \tag{6.20}$$

Equation (6.20) expresses xt+h as the sum of two terms that partition the dis-
turbances between those occurring up to and including time t, and later
values.
The forecast of xt+h based on information available at time t is the expected
value of xt+h given the information, and is denoted xt+h|t. In this context, h is
known as the forecast horizon and t is called the forecast origin. Using the fact
that the conditional expectation of a future disturbance term is zero, and the
conditional expectation of any current or past value is the expectation of a
realized value, from (6.20),

$$x_{t+h|t} = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s. \tag{6.21}$$

This does not yet establish that the forecasts are linearly related. The require-
ment for this is for there to exist a linear combination of the forecasts that is
zero (in the absence of deterministic terms). That is, there must exist an n × 1
vector $\beta$ such that $\beta' x_{t+h|t} = 0$. From (6.21), a sufficient condition for this is
that

$$\beta'\sum_{r=0}^{t+h-s} C_r = 0.$$

But this does not follow from the properties of the VMA, as it requires each of
$\sum_{r=0}^{t+h-s} C_r$, $s = 1, \ldots, t$, to be of reduced rank and to have the same null space.

However, cointegration is a long-run property and its implications can only be
expected to follow in the long run. In a forecasting context, this means that
any special properties of the forecast arising from cointegration can only be
expected to become apparent as the forecast horizon, h, becomes large. So
consider the limit of $\sum_{r=0}^{t+h-s} C_r$ as $h \to \infty$:

$$\lim_{h\to\infty}\sum_{r=0}^{t+h-s} C_r = \sum_{r=0}^{\infty} C_r = C(1), \tag{6.22}$$

and define what can be called the long-run forecast, $x_{\infty|t}$, as:

$$x_{\infty|t} = \lim_{h\to\infty}\left[x_{t+h|t}\right]. \tag{6.23}$$

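Then, from (6.21) and (6.22), $x_{\infty|t}$ is given by

$$x_{\infty|t} = \lim_{h\to\infty}\left[x_{t+h|t}\right] = \lim_{h\to\infty}\left[\sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s\right] = \sum_{s=1}^{t}\left[\lim_{h\to\infty}\sum_{r=0}^{t+h-s} C_r\right]\varepsilon_s = C(1)\sum_{s=1}^{t}\varepsilon_s.$$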

The long-run forecast therefore follows a linear combination of the realized
value of a vector stochastic trend. But rank (C(1)) = n – r, and so there exist r
linearly independent vectors, that is the cointegrating vectors, $\beta$, such that
$\beta' C(1) = 0$. Therefore:

$$\beta' x_{\infty|t} = \beta' C(1)\sum_{s=1}^{t}\varepsilon_s = 0. \tag{6.24}$$

The extent to which fixed horizon forecasts approximate to (6.24) depends on
how quickly the matrix coefficients $C_i$, $i = 0, 1, \ldots$, decay. From (6.21)

$$x_{t+h|t} = \sum_{s=1}^{t}\left[C(1) - \sum_{r=t+h-s+1}^{\infty} C_r\right]\varepsilon_s = x_{\infty|t} - \sum_{s=1}^{t}\sum_{r=t+h-s+1}^{\infty} C_r\varepsilon_s$$

and so

$$\beta' x_{t+h|t} = -\beta'\sum_{s=1}^{t}\sum_{r=t+h-s+1}^{\infty} C_r\varepsilon_s.$$

Thus the smallest index on the $C_r$ is r = h + 1, indicating that, assuming the $C_r$
do decay with r, the greater is the forecast horizon, the smaller will be the
deviation of the forecasts from their long-run relationship. Thus, empirically,
the evidence for cointegration restrictions improving forecasts should be
weaker for short horizons, than longer ones. The more rapidly the coefficients
decay, the fewer steps ahead the forecasts need to be before they display a
functional relationship similar to the cointegrating relations.
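The dependence of the deviation on the decay rate can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than anything taken from the studies cited: it uses a hypothetical bivariate VMA with $C_0 = I$ and geometrically decaying $C_r = \rho^r D$, with D chosen so that the coefficients sum (up to a negligible truncation error) to a rank-one matrix P, and it evaluates the forecast formula (6.21) directly. The names `vma_coefficients` and `forecast` are illustrative.

```python
import numpy as np

def vma_coefficients(P, rho, R):
    """C_0 = I and C_r = rho**r * D for r = 1..R, with D chosen so that the
    coefficients sum to P (up to a truncation error of order rho**R)."""
    n = P.shape[0]
    D = (P - np.eye(n)) * (1.0 - rho) / rho
    return [np.eye(n)] + [rho**r * D for r in range(1, R + 1)]

def forecast(eps, C, h):
    """x_{t+h|t} = sum_{s=1}^{t} (sum_{r=0}^{t+h-s} C_r) eps_s, as in (6.21)."""
    t, n = eps.shape
    partial = np.cumsum(np.stack(C), axis=0)  # partial[k] = C_0 + ... + C_k
    x = np.zeros(n)
    for s in range(1, t + 1):
        k = min(t + h - s, len(C) - 1)        # coefficients beyond R treated as zero
        x += partial[k] @ eps[s - 1]
    return x

rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0])                  # assumed cointegrating vector
P = np.array([[0.5, 0.5], [0.5, 0.5]])        # plays the role of C(1): rank 1, beta'P = 0
C = vma_coefficients(P, rho=0.5, R=200)
eps = rng.standard_normal((50, 2))            # realized disturbances, s = 1..t

# Deviation of the forecasts from the long-run relation at several horizons.
deviation = {h: beta @ forecast(eps, C, h) for h in (1, 5, 30)}
for h, d in deviation.items():
    print(h, d)
```

In this design the deviation shrinks in proportion to $\rho^h$, so a smaller $\rho$ brings the forecasts onto the cointegrating relation at shorter horizons.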
Turning to the h-step ahead forecast error, denoted $e_{t+h|t}$, and its variance,
from (6.20) and (6.21), this error is

$$e_{t+h|t} = \sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s + \sum_{q=1}^{h}\sum_{r=0}^{h-q} C_r\varepsilon_{t+q} - \sum_{s=1}^{t}\sum_{r=0}^{t+h-s} C_r\varepsilon_s = \sum_{q=1}^{h}\sum_{r=0}^{h-q} C_r\varepsilon_{t+q} \tag{6.25}$$

and, since the disturbances are not autocorrelated,

$$\operatorname{var}(e_{t+h|t}) = \sum_{q=1}^{h}\left(\sum_{r=0}^{h-q} C_r\right)\Omega\left(\sum_{r=0}^{h-q} C_r'\right),$$

where $\Omega = E(\varepsilon_t\varepsilon_t')$, for all t. That is, the forecast error variance grows with h.
Interestingly, it is also the case that the forecast errors are cointegrated, with
precisely the same time series structure as the original process, xt, under the
condition that all forecasts are made using the same information, that avail-
able at time t. To see this use (6.25) to construct the forecast error difference
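process

$$\begin{aligned}
\Delta e_{t+h|t} &= e_{t+h|t} - e_{t+h-1|t} \\
&= \sum_{q=1}^{h}\sum_{r=0}^{h-q} C_r\varepsilon_{t+q} - \sum_{q=1}^{h-1}\sum_{r=0}^{h-1-q} C_r\varepsilon_{t+q} \\
&= C_0\varepsilon_{t+h} + \sum_{q=1}^{h-1}\left[\sum_{r=0}^{h-q} C_r - \sum_{r=0}^{h-q-1} C_r\right]\varepsilon_{t+q} \\
&= C_0\varepsilon_{t+h} + \sum_{q=1}^{h-1} C_{h-q}\varepsilon_{t+q} = \sum_{q=1}^{h} C_{h-q}\varepsilon_{t+q} = \sum_{q=1}^{h} C_{h-q}\varepsilon_{t+h-(h-q)} \\
&= \sum_{k=0}^{h-1} C_k\varepsilon_{t+h-k} = C(L)\varepsilon_{t+h}, \qquad \varepsilon_q = 0,\ q \le t,
\end{aligned}$$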

where the initial values are now relative to the forecast origin and, consistent
with the original VMA, have been set to zero. Thus

$$\Delta e_{t+h|t} = C(L)\varepsilon_{t+h}$$

and hence, from the original VMA, all h-step ahead forecast errors are cointe-
grated of order (1,1). That is, the difference between the h-step ahead and the
h – 1-step ahead forecast errors, both made conditional on information avail-
able at time t, is stationary, but the sequence of h-step ahead forecast errors,
for h = 1,2, …, is I(1).
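The variance formula and the cointegration of the forecast errors can be illustrated together. The following is a minimal sketch, assuming a hypothetical bivariate VMA(1) with $C_0 = I$, $C(1) = C_0 + C_1$ of rank one, cointegrating vector $\beta = (1, -1)'$ and $\Omega = I$; the function name `forecast_error_variance` is illustrative.

```python
import numpy as np

def forecast_error_variance(C, Omega, h):
    """var(e_{t+h|t}) = sum_{q=1}^{h} S_{h-q} Omega S_{h-q}',
    where S_k = C_0 + ... + C_k and C_r = 0 beyond the supplied list."""
    n = Omega.shape[0]

    def partial_sum(k):
        return sum(C[: k + 1], np.zeros((n, n)))

    V = np.zeros((n, n))
    for q in range(1, h + 1):
        S = partial_sum(h - q)
        V += S @ Omega @ S.T
    return V

# Assumed example: C_0 = I, C_1 = C(1) - I with C(1) = [[.5, .5], [.5, .5]].
C = [np.eye(2), np.array([[-0.5, 0.5], [0.5, -0.5]])]
Omega = np.eye(2)
beta = np.array([1.0, -1.0])

for h in (1, 2, 3, 10):
    V = forecast_error_variance(C, Omega, h)
    # The trace grows with h; the variance of beta'e stays bounded.
    print(h, np.trace(V), beta @ V @ beta)
```

In this example the trace of the forecast error variance grows linearly in h, while the variance of the cointegrating combination of the errors does not change with h, which is what CI(1,1) forecast errors imply.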
An intuition for the non-stationarity of the forecast error can be provided
by expressing a future value of the process as a sum of the forecast and the
forecast error,

$$x_{t+h} = x_{t+h|t} + e_{t+h|t}. \tag{6.26}$$

Since $x_{t+h|t}$ depends only on realized values (the disturbance values at time t
and before), it is non-stochastic. Thus the stochastic non-stationarity properties
of $x_{t+h}$ and $e_{t+h|t}$ must be the same, so they must both be integrated of order
1. Applying the initial value condition $\varepsilon_q = 0$, $q \le t$, equation (6.26) gives
$x_{t+h|t} = x_{\infty|t}$ and hence:

$$x_{t+h} = x_{\infty|t} + e_{t+h|t},$$

from which, pre-multiplication by the cointegrating vector gives

$$\beta' x_{t+h} = \beta' x_{\infty|t} + \beta' e_{t+h|t} = \beta' e_{t+h|t}. \tag{6.27}$$

The left-hand side of (6.27) is I(0) from the VMA, and therefore so is $\beta' e_{t+h|t}$,
hence $e_{t+h|t}$ is CI(1,1).

6.3.2 Forecasting from the VAR


The property that the long-run forecasts should be linearly constrained can
also be obtained from a VAR. Again, let $x_t$ be an n × 1 CI(1,1) vector, this time
having the VAR(p) structure

$$x_t = \sum_{i=1}^{p} A_i x_{t-i} + \varepsilon_t. \tag{6.28}$$

Reparameterize this in the usual way as the VECM

$$\Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} + \varepsilon_t \tag{6.29}$$

where, again, $\Pi = \alpha\beta'$ with $\alpha$ and $\beta$ dimensioned n × r. Following Lin and Tsay
(1996), in order to understand how the forecasts from (6.28) have the same
long-run properties as the series themselves, note that $\Delta x_t$ is I(0), and that
forecasts of a stationary series converge to the expected value of the process as
the forecast horizon tends to infinity. That is

$$\lim_{h\to\infty}\Delta x_{t+h|t} = \mu_{\Delta x} \tag{6.30}$$

where $\mu_{\Delta x} = E(\Delta x_t)$. The properties of the forecasts of the difference process are
used to obtain those of the levels via the VECM. Using (6.29), the h-step ahead
forecast equation for the difference process is

$$\Delta x_{t+h|t} = \Pi x_{t+h-1|t} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t+h-i|t}. \tag{6.31}$$

In order to derive the properties of the long-run forecasts, take the limit of
(6.31) as $h \to \infty$, and substitute from (6.30) to give

$$\mu_{\Delta x} = \Pi\lim_{h\to\infty}\left[x_{t+h-1|t}\right] - \sum_{i=1}^{p-1}\Gamma_i\mu_{\Delta x}.$$

Rearranging, and using the notation of (6.23) for the long-run forecast of the
level,

$$\Pi x_{\infty|t} = \left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\mu_{\Delta x}. \tag{6.32}$$
The right-hand side of (6.32) is a constant vector, and so shows that the long-
run forecasts, $x_{\infty|t}$, are tied together. The analysis can be taken further to
complete the analogy with equation (6.24) for the VMA case. Pre-multiplying
(6.32) by $\alpha'$ and replacing $\Pi$ by $\alpha\beta'$ gives

$$(\alpha'\alpha)\beta' x_{\infty|t} = \alpha'\left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\mu_{\Delta x}$$

where $(\alpha'\alpha)$ is non-singular, so that

$$\beta' x_{\infty|t} = (\alpha'\alpha)^{-1}\alpha'\left(I_n + \sum_{i=1}^{p-1}\Gamma_i\right)\mu_{\Delta x}.$$

This is directly comparable with (6.24) (except that in 6.24 initial values have
been set to zero), and shows that each cointegrating vector constitutes a con-
straint on the long run forecasts.

6.3.3 The mechanics of forecasting from a VECM


In order to benefit from any perceived advantages to forecasting from cointe-
grated models, it is necessary to impose the cointegrating relationships. In the
VAR setting, this may be undertaken as follows.
For given $\beta$ and, by implication, known cointegrating rank, r, construct
cointegrating combinations $\xi_t = \beta' x_t$, and estimate the VECM, conditional on r, as

$$\Delta x_t = \alpha\xi_{t-1} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} + \varepsilon_t.$$

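Estimation may be performed by OLS, to give

$$\Delta x_t = \hat\alpha\xi_{t-1} - \sum_{i=1}^{p-1}\hat\Gamma_i\Delta x_{t-i} + e_t = \hat\Pi x_{t-1} - \sum_{i=1}^{p-1}\hat\Gamma_i\Delta x_{t-i} + e_t,$$

where $\hat\Pi = \hat\alpha\beta'$. Now, rearrange the VECM as the VAR

$$x_t = \sum_{i=1}^{p}\hat A_i x_{t-i} + e_t,$$

where $\hat A_1 = I_n + \hat\Pi - \hat\Gamma_1$, $\hat A_i = \hat\Gamma_{i-1} - \hat\Gamma_i$ for $i = 2, \ldots, p-1$, and $\hat A_p = \hat\Gamma_{p-1}$.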

The h-step ahead forecasts can then be produced recursively using

$$x_{t+h|t} = \sum_{i=1}^{p}\hat A_i x_{t+h-i|t}, \tag{6.33}$$

where $x_{t+h-i|t} = x_{t+h-i}$ for $h \le i$. If r and $\beta$ are unknown, they may be replaced by
values $\hat r$ and $\hat\beta$ estimated using the Johansen procedure. This is the approach
used by Lin and Tsay (1996).
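The mechanics just described can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure of any particular package: it takes the VECM in the sign convention of (6.29), with the lagged differences entering with a minus sign, maps given values of $\Pi$ and $\Gamma_i$ into levels VAR coefficients, and iterates the recursion (6.33). The function names and the numerical values of $\alpha$, $\beta$ and $\Gamma_1$ are hypothetical.

```python
import numpy as np

def vecm_to_var(Pi, Gammas):
    """Levels VAR coefficients A_1..A_p implied by
    dx_t = Pi x_{t-1} - sum_i Gamma_i dx_{t-i} + e_t (sign convention of 6.29)."""
    n = Pi.shape[0]
    if len(Gammas) == 0:
        return [np.eye(n) + Pi]
    A = [np.eye(n) + Pi - Gammas[0]]
    for i in range(1, len(Gammas)):
        A.append(Gammas[i - 1] - Gammas[i])
    A.append(Gammas[-1])
    return A

def forecast_levels(A, history, h):
    """Recursive forecasts x_{t+1|t}, ..., x_{t+h|t}; history holds at least
    the last p observed levels, oldest first."""
    path = [np.asarray(x, dtype=float) for x in history]
    for _ in range(h):
        # path[-i] is x_{t+k-i|t}: an observed value or an earlier forecast.
        path.append(sum(A_i @ path[-i] for i, A_i in enumerate(A, start=1)))
    return np.array(path[-h:])

# Assumed bivariate example with r = 1 and p = 2.
alpha = np.array([[-0.5], [0.25]])
beta = np.array([[1.0], [-1.0]])
Pi = alpha @ beta.T
Gamma1 = 0.2 * np.eye(2)

A = vecm_to_var(Pi, [Gamma1])
history = [[1.0, 0.0], [2.0, 0.5]]           # x_{t-1}, x_t
forecasts = forecast_levels(A, history, h=50)

print("beta'x forecast at h = 1 :", (beta.T @ forecasts[0]).item())
print("beta'x forecast at h = 50:", (beta.T @ forecasts[-1]).item())
```

With no deterministic terms, the cointegrating combination of the forecasts converges to zero as the horizon lengthens, while the forecasts of the levels settle at a value determined by the forecast origin.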
The order of the forecasting VAR in (6.33), and that used for the Johansen
pre-whitening, should be the same, determined, for example, using an infor-
mation criterion, such as the Schwarz (SIC) (see Reimers 1992; Lütkepohl
1991). Otherwise, as was explained in section 4.3.3, programs such as PCGIVE
provide system and single equation diagnostic tests for each equation in the
VAR (Doornik and Hendry 2001).
The details of information criteria vary according to the weight put on
additional parameters, but they are generally of the form

$$IC = \ln\left|\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon_t'\right| + m f(T), \tag{6.34}$$

where f(T) is a function of the sample size T that sets the penalty, $m = pn^2$ is the
number of estimated coefficients in an unrestricted VAR, and $\hat\varepsilon_t$ is the vector of
VAR residuals. A criterion which is often preferred is the SIC, for which
$f(T) = \ln(T)/T$. Amongst the
criteria most commonly used, this penalizes additional parameters (increasing
VAR order) the most heavily, leading to relatively parsimonious models. The
favoured model is that for which the information criterion value is mini-
mized. When used in this way, the SIC provides consistent model selection in
the sense that, as the sample size tends to infinity, it will select the correct
model order with probability tending to one.
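A minimal sketch of order selection by SIC follows, assuming a VAR without deterministic terms estimated by OLS, the determinant form of (6.34), and a common estimation sample across candidate orders; the helper names `var_residuals`, `sic` and `select_order` are illustrative, and the simulated VAR(1) is a hypothetical example.

```python
import numpy as np

def var_residuals(x, p, max_p):
    """OLS residuals of a VAR(p) with no deterministic terms, using rows
    max_p..T-1 as the dependent sample so all orders share the same data."""
    T = x.shape[0]
    Y = x[max_p:]
    X = np.hstack([x[max_p - i : T - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ B

def sic(residuals, p):
    """ln|T^{-1} sum e_t e_t'| + m ln(T)/T with m = p n^2."""
    T, n = residuals.shape
    cov = residuals.T @ residuals / T
    _, logdet = np.linalg.slogdet(cov)
    return logdet + p * n**2 * np.log(T) / T

def select_order(x, max_p):
    """Return the order minimizing SIC, together with all criterion values."""
    values = {p: sic(var_residuals(x, p, max_p), p) for p in range(1, max_p + 1)}
    return min(values, key=values.get), values

# Simulated bivariate VAR(1), used only to exercise the functions.
rng = np.random.default_rng(42)
T, n = 300, 2
x = np.zeros((T, n))
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal(n)

p_hat, values = select_order(x, max_p=4)
print("selected order:", p_hat)
print(values)
```

Holding the estimation sample fixed across orders is a design choice made here so that the residual covariance matrices are comparable; other conventions are possible.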

6.3.4 Forecast performance


The imposition of cointegrating restrictions on a model of I(1) series should
lead to forecast improvements for two reasons. Firstly, valid long-run relation-
ships should improve the accuracy of long-run forecasts by exploiting infor-
mation about the interrelatedness of the series. Secondly, fewer parameters are
estimated. In the unrestricted VECM, $\Pi$ has $n^2$ elements, whereas when
restricted, it has 2nr. However, a number of practical issues arise:

(i) How useful is the long-run information in providing long but finite time
horizon forecasts?

(ii) How are short-run forecasts affected?


(iii) What are the costs of mistakenly identifying series as I(1) when they are
really I(0)?
(iv) What is the cost of incorrectly estimating r?
(v) What is the cost of imposing invalid long-run restrictions (getting the
cointegrating vectors wrong)?

These issues are discussed by Clements and Hendry (1995, 1998), Lin and Tsay
(1996) and Engle and Yoo (1987), among others. The three studies report
Monte Carlo results; their findings are summarized below.

6.3.4.1 Engle and Yoo


These authors consider a bivariate model (representable as a first-order VAR)
and discuss two types of forecast that can be made from it, one ignoring any
long-run restrictions, and one imposing them. These forecasts are based on an
unrestricted VAR (UVAR) and the Engle and Granger two-step methodology
(EG) respectively. In the latter case, at each replication, a preliminary static
regression is used to estimate the cointegrating relations, and the lagged
residuals from this model are included as the lagged levels term in a
dynamic ECM.7 The putative long-run relations are not subject to prior testing for
cointegration.
The sample size is 100 and the forecast horizon from 1 to 20, so that in this
case, a long-run forecast is being defined as one with a horizon 20 per cent
beyond the sample, if not less. The finding is that, in terms of the mean
square forecast error as measured by the trace of the sample covariance matrix
of the forecast errors (see section 6.3.4.4 for more detail on forecast
evaluation), the unrestricted VAR provides superior forecasts up to and including the
5-step ahead forecast (5 per cent of sample size). Thereafter, the imposition of
estimated long-run restrictions improves the forecast monotonically, to an
advantage of 40 per cent over the unrestricted forecast at 20 steps ahead. This
is, of course, against a background of worsening forecast performance as
forecast horizon increases.

6.3.4.2 Clements and Hendry


In their book and earlier paper, Clements and Hendry (1998, 1995) generalize
the study of Engle and Yoo. They present the results of a bivariate VAR(1)
system estimated on 100 observations, but for a wider range of parameter
values and models. In addition to UVAR and EG, they consider the Johansen
maximum likelihood estimator (ML) and a misspecified model in differences
alone (DV), the lagged levels term being excluded. The DV model can be
used to forecast the level of the process by adding successive forecasts of the
differences to the known value of the level at the forecast origin. They also

introduce another issue, which is the form of the process used to compare
forecasts: the levels, the differences, or the stationary combinations. The last
of these representations is obtained by transforming the model to one in
terms of the cointegrating combinations and the differenced common trends.
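Thus, the number of processes is unaltered, and their integration and
cointegration properties preserved. Their notation for the I(0) variables is $w_t$, where
$w_t' = (x_t'\beta \;\; \Delta x_t' J)$. Consider the partition $\alpha' = (\alpha_a' \;\; \alpha_b')$ with $\alpha_a$ dimensioned
r × r and $\alpha_b$ dimensioned (n – r) × r, and defining

$$J' = (0 \;\; I_{n-r}) \quad\text{and}\quad Q = (\beta \;\; J)',$$

the representation is

$$w_t = Gw_{t-1} + \eta_t \tag{6.35}$$

where

$$G = \begin{pmatrix} I_r + \beta'\alpha & 0 \\ \alpha_b & 0 \end{pmatrix} \quad\text{and}\quad \eta_t = \begin{pmatrix} \beta' \\ J' \end{pmatrix}\varepsilon_t.$$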
Clements and Hendry produce forecasts of $x_t$ and $\Delta x_t$ using each of the four
estimation methods, UVAR, ML, EG, and DV. These primary forecasts are
transformed to produce forecasts of each of $x_t$, $\Delta x_t$ and $w_t$. That is, each
forecast is one of $x_t$ or $\Delta x_t$, initially, but all are transformed (as necessary) into $x_t$,
$\Delta x_t$ and $w_t$. The purpose of the exercise is to emphasize that the superiority of
one forecast method over another depends not only on what model is used to
produce the forecast, but also on what properties of the forecast are being
compared.
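The stationary representation can be verified on simulated data. The sketch below is an illustration under stated assumptions: it takes the first-order VECM $\Delta x_t = \alpha\beta' x_{t-1} + \varepsilon_t$, defines $w_t$ by stacking $\beta' x_t$ on $J'\Delta x_t$ with $J' = (0 \;\; I_{n-r})$, and checks that $w_t = Gw_{t-1} + \eta_t$ holds with G built from $I_r + \beta'\alpha$ and the last n – r rows of $\alpha$. The parameter values are hypothetical and are not those of Clements and Hendry.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters: n = 2 variables, r = 1 cointegrating vector.
alpha = np.array([[-0.5], [0.25]])
beta = np.array([[1.0], [-1.0]])
n, r = alpha.shape

J = np.vstack([np.zeros((r, n - r)), np.eye(n - r)])   # J' = (0  I_{n-r})
Q = np.hstack([beta, J]).T                             # Q = (beta  J)'

# Simulate dx_t = alpha beta' x_{t-1} + eps_t from x_0 = 0.
T = 200
eps = rng.standard_normal((T + 1, n))
x = np.zeros((T + 1, n))
for t in range(1, T + 1):
    x[t] = x[t - 1] + alpha @ (beta.T @ x[t - 1]) + eps[t]

dx = np.diff(x, axis=0)                                # row t-1 holds x_t - x_{t-1}
w = np.hstack([x[1:] @ beta, dx @ J])                  # rows hold (beta'x_t, J'dx_t)

alpha_b = alpha[r:, :]                                 # last n - r rows of alpha
G = np.block([
    [np.eye(r) + beta.T @ alpha, np.zeros((r, n - r))],
    [alpha_b, np.zeros((n - r, n - r))],
])
eta = eps[1:] @ Q.T                                    # rows hold (Q eps_t)'

# Largest discrepancy between w_t and G w_{t-1} + eta_t over the sample.
print(np.max(np.abs(w[1:] - (w[:-1] @ G.T + eta[1:]))))
```

Applying the same transformation to forecasts of the levels and differences, rather than to the data, gives forecasts of $w_t$ that can be compared in stationary terms.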
In particular, in comparing EG and UVAR to forecast xt, the level of the
process, the importance of the imposition of a valid long-run restriction is
examined. But the question then arises as to whether it matters that the
restriction is specifically a long-run restriction. In other words, are the advan-
tages available from the imposition of correct restrictions markedly different
in a non-stationary cointegrated environment compared to a stationary one?
The way to get at this issue is to transform the forecasts to stationarity before
comparing them, effectively filtering out long-run variation. The appropriate
transformation is that of equation (6.35), applied to the forecasts. This proce-
dure is only available in the context of simulations (using parameter values
from the DGP), since the UVAR, by its very nature, brings with it no estima-
tion of the cointegrating combinations. It is still the case that the forecasts
differ in the method of their production, but are now being compared on a
more appropriately matched basis – that is, in stationary terms. If relative
forecasting performance is different in stationary space, then it suggests that
the long-run nature of the restrictions is relevant in determining forecast
behaviour.
If it is the long-run nature of the restrictions that improves the long-run
forecasts, then direct comparisons of the forecasts of the level of the process
where the restrictions are, and are not, imposed should favour the forecasts
made subject to the restrictions. However, if the long-run components are
removed prior to comparison, these transformed forecasts should not differ
significantly. Equation (6.35) is a very useful device for decomposing the
causes of relative forecast behaviour.
In their simplest case (among 13 parameterizations), Clements and Hendry
generate data according to a bivariate VECM model with a single lag,

$$\Delta x_t = \alpha\beta' x_{t-1} + \varepsilon_t.$$

Forecast comparisons are made in a number of ways, the simplest of which is
based on the trace of the estimated variance–covariance matrix of the forecast
errors (see section 6.3.4.4 for more detail on forecast evaluation). One para-
meterization is very similar to that used by Engle and Yoo, and therefore com-
parable with the earlier results. It is shown that, at longer forecast horizons,
material improvement in the levels forecast are available by imposing cointe-
grating restrictions. That is, EG and ML are superior in levels forecasting to
UVAR when the forecast horizon is relatively long. In addition, the superiority
is more marked with smaller sample sizes due to the enhanced role of the
degrees of freedom saved by imposing the restrictions.
When the forecasts are transformed to stationarity (using equation (6.35))
and compared again, UVAR is no longer inferior. This suggests that the gains
in forecast performance from the imposition of the restrictions are due to
their long-run characteristics, as no further restrictions have been imposed. In
contrast to these findings, the misspecified DV model performs only slightly
worse than EG and ML (and therefore better than UVAR) in levels forecasts at
longer forecast horizons, but notably under-performs the other three when
the forecasts are compared in stationary space.
These findings must be interpreted with care because, in practice, VAR order
and cointegration rank are decided from the data. In addition, systems will
normally consist of more than two variables. Clements and Hendry summa-
rize the results of their more widely parameterized study using response sur-
faces, presenting their conclusions with a number of warnings about the
additional complexities that enter in the practical forecasting setting. The
results represent a benchmark case only.

6.3.4.3 Lin and Tsay


Lin and Tsay (1996) generalize the model for forecast performance compar-
isons to one involving four variables. Their Monte Carlo study is necessarily
restricted in terms of the parameter values used, but the DGPs used are chosen
to mimic observed data characteristics, so in this sense are calibrated so as to
apply to a relevant parameter space. The structures used have the following
characteristics.

(i) All systems are second order (VAR(2)).


(ii) Five DGPs are considered in all, being respectively, from model 1 to
model 5: strongly stationary; stationary, but with two roots close to the
unit circle; stationary with two roots very close to the unit circle; a
non-stationary system with cointegrating rank 2; and non-stationary and
non-cointegrating.8 Of these, the stationary and unit root non-cointegrated
cases are diagonal.
(iii) The in-sample period consists of 400 observations, with 100 additional
out-of-sample data points generated for forecasting comparison. Forecast
horizons of 1 to 60 are used. Each replication gives rise to a set of fore-
casts at each forecast horizon.
(iv) All models are estimated as ECMs with cointegrating rank r = 0, 1, 2, 3, 4
using Johansen’s (1988, 1991, 1995) approach, and then recast as VARs in
levels for the purpose of forecasting the levels.
(v) The forecasting metric, E(L), where L is the forecast horizon (see equation
6.36), is based on the trace of the estimated variance–covariance matrix
of the forecast errors. Each replication gives rise to an estimated vari-
ance–covariance matrix of forecast errors, and these are then averaged
across replications. The larger is the statistic, the poorer the forecast.

The results of these exercises are presented in Figure 6.1.


Lin and Tsay gather their conclusions on these results into the following
principal points:

(i) When the system is stationary the long-run forecasts approach a constant
quickly as the forecast horizon increases. (The size of the forecast errors,
in terms of their variance is also relatively small.)
(ii) If the system is stationary, then under-specifying the rank of the long-run
matrix leads to under-performance. That is, imposing long-run restric-
tions that do not exist in practice (which are not valid) damages long-run
forecast performance. The more of these there are, the worse the perfor-
mance of the forecasts.
(iii) Unless the system is very close to non-stationarity (the near non-
stationary DGP is model 3), correct specification of the cointegrating rank
is best.
(iv) Under-specification of the cointegrating rank is not serious if the
processes concerned are non-stationary. This should be contrasted with
the stationary case, where, although cointegration is not defined, the
rank of the long-run matrix still is, and where this is under-specified,
there is a deterioration in forecast performance.

Clearly, non-stationary and near non-stationary systems are harder to
forecast than stationary ones. As a matter of design, it should be noted that while
Lin and Tsay control carefully for the roots of the processes involved, only
their cointegrated structure displays common features, in this case of the unit
root. All the other models are diagonal, meaning that, in the case of model 3
for example, although there are roots very close to being unit roots, they do
not constitute a common feature. For this to be so, the VAR lag operator
evaluated at that root would have to be of less than full rank, but not zero.
Diagonality results in its being zero (Burke 1996).9 Model 3 also
has the interesting property that the quality of forecasts is least affected by the
choice of (cointegrating) rank.

Figure 6.1 Forecasting performance from Lin and Tsay study, by model

Figure 6.2 Lin and Tsay results, all models, rank 2 system

By grouping these results differently, a further conclusion can be made.
Instead of looking at the results by model and varying the cointegrating rank
imposed, it is possible to fix the imposed cointegrating rank, and see which
model is easiest or hardest to forecast for that restriction. Figure 6.2 demon-
strates the case for the imposition of rank 2, which is correct for model 4. It is
immediately obvious that, using the trace measure (see Forecast Evaluation
below), the cointegrated system is the hardest to forecast at medium and
long horizons. It is even harder to forecast than the non-stationary non-
cointegrated case.10 In fact, no matter what cointegrating rank is imposed
(0 to 4), the cointegrated system is the most difficult to forecast, in the sense
that it has the largest trace statistic. However, it remains the case that, if the
system is cointegrated, it is best to impose the appropriate cointegrating rank
(figure 6.1d).11
These forecast comparisons are more limited since they are compared in
levels terms only. Clements and Hendry demonstrate that once transformed
to stationarity, there is much less difference between forecasts based on differ-
ent procedures. It is not clear from Lin and Tsay if the same transformation
would result in less obvious distinctions between the forecasts based on the
imposition of different cointegrating ranks at the estimation stage. Broadly
speaking, the extension to the multivariate case is not found to undermine
the findings of Clements and Hendry for the bivariate case. However, the
four-variable setting makes it even more difficult to generalize the findings,
and the multiplicity of possible cases should lead to reticence when
interpreting the results in a wider setting.
In order to reduce the impact of such criticisms, Lin and Tsay present two
real data examples, one financial and one macroeconomic. They observe that
the problem of roots close to the unit circle, but not actually being unit roots,
is observable in data (that is, similarity to model 2, or, more extremely, model 3).
In such circumstances, the under-specification of the rank (imposing unit
roots that are not present) can be expected to result in poor long term fore-
casts.12 Secondly, they observe that forecast error variances from a stationary
system converge fairly rapidly as a function of forecast horizon. This is used to
explore the stationarity of a system of bond yields. In this case, the unit root
and cointegration tests performed suggest cointegration. This could be a case
where the process is near non-stationary, and with a common feature, but the
common feature is a root close to, but not on, the unit circle. It is clear from
their investigations that, at a practical level, cointegrating restrictions cannot
be assumed to improve long term forecasts, even where there is within-sample
statistical evidence to support them.

6.3.4.4 Forecast evaluation


In both the Lin and Tsay and Clements and Hendry studies, the basic measure
of forecast accuracy is the trace of the Monte Carlo estimate of the
variance–covariance matrix of the forecast errors. It has the following form. Let
$e_{k,t}(j)$ be the j-step ahead vector forecast error made at time t arising from the
kth replication. Let the total number of replications be K. Then let

$$\hat\Omega_{k,t}(j) = e_{k,t}(j)\,e_{k,t}(j)'.$$

One of the measures used by Clements and Hendry, and the one relevant to
most of the results reported above, is

$$T(j) = \operatorname{trace}\left(\frac{\sum_{k=1}^{K}\hat\Omega_{k,t}(j)}{K}\right),$$
which is referred to as the trace mean-square forecast error (TMSFE). Lin and
Tsay use a modified version of this criterion since each replication gives rise to
a set of j-step ahead forecasts, as a result of rolling forward the forecast origin
within the same replication. They construct a within-replication estimate of
the forecast error variance–covariance matrix as

$$\hat\Omega_k(j) = \frac{\sum_{t=300}^{400-j}\hat\Omega_{k,t}(j)}{100 - j + 1}.$$

This is then averaged across replications, the final measure being

$$E(j) = \operatorname{trace}\left(\frac{\sum_{k=1}^{K}\hat\Omega_k(j)}{K}\right). \tag{6.36}$$

Clements and Hendry (1998) discuss the choice of criterion, and use others in
addition to TMSFE. An important aspect of these criteria is their sensitivity to
linear transformations of the data. TMSFE is not invariant to such
transformations, although extensive use continues to be made of it.
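The two measures can be computed directly from arrays of forecast errors. The following is a minimal sketch, assuming the errors for a given horizon j have already been collected; the function names and the array layouts (replications by variables, and replications by rolling origins by variables) are illustrative choices.

```python
import numpy as np

def tmsfe(errors):
    """Trace of the average outer product of forecast errors.
    errors: array of shape (K, n), one j-step error vector per replication."""
    errors = np.asarray(errors, dtype=float)
    second_moment = errors.T @ errors / errors.shape[0]
    return np.trace(second_moment)

def lin_tsay_measure(errors):
    """Average the outer products within each replication over its rolling
    forecast origins, then across replications, and take the trace.
    errors: array of shape (K, M, n)."""
    errors = np.asarray(errors, dtype=float)
    within = np.einsum("kmi,kmj->kij", errors, errors) / errors.shape[1]
    return np.trace(within.mean(axis=0))

# Two replications of a bivariate forecast error.
e = np.array([[1.0, 0.0], [0.0, 2.0]])
print(tmsfe(e))
print(lin_tsay_measure(e[:, None, :]))

# Rescaling the second variable changes the measure, which illustrates the
# sensitivity of the trace criterion to linear transformations of the data.
scale = np.diag([1.0, 10.0])
print(tmsfe(e @ scale))
```

Because the trace simply sums the mean-square errors of the individual variables, a change of units in one variable changes its weight in the comparison.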

6.3.4.5 Other issues relevant to forecasting performance in practice


In practice, forecasting will be subject to a number of other possible sources of
error (see Clements and Hendry, 1998, chapter 7, for a taxonomy). In the context
of forecasting in cointegrated systems, these include the uncertainties
associated with the selection of VAR order, the reliability of unit root and
cointegration tests, and the estimation of the cointegrating vectors. This analysis has
dealt exclusively with CI(1, 1) systems; elsewhere in this book, the case of
cointegration in I(2) systems has been considered. This raises the question not
just of how forecasting might be affected by choice of cointegration rank, but
also types of (linear) cointegration, especially where there exists the possibility
of variables being integrated of order up to 2.
All forecasting is predicated on at least two assumptions regarding model
stability. That is, that the model structure has remained constant during the
in-sample period, and that this same structure will remain into the forecasting
period. Clements and Hendry (2001) have considered the implications for
forecasting of some types of model instability in depth. Other procedures
allow model switching (usually in a univariate setting, however), or non-linear
adjustment to equilibrium. Any or all of these methods may be appropriate
where a simple linear approximation fails to provide adequate forecasting
performance.
Typically, the order of the underlying VAR model is chosen by the optimization
of some form of parsimonious information criterion, such as the SIC. These do
not all have the same model selection properties, however (Reimers 1992). A
potentially important variant of these criteria is to jointly select over VAR
order and cointegrating rank. The criteria given by equation (6.34) are easily
modified for this purpose. The VAR(p) can be estimated as a VECM as this
does not alter the value of $\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon_t'$, but cointegrating restrictions can
be placed on the long-run matrix, via the Johansen procedure for example,
such that

$$\underset{n\times n}{\Pi} = \underset{n\times r}{\alpha}\;\underset{r\times n}{\beta'}$$

so that there are only 2nr parameters of $\Pi$ to be freely estimated. The
information criterion is therefore of the form of (6.34) with $m = (p - 1)n^2 + 2nr$, the
selected model being that for which the criterion is minimized over a grid of
values of p and r = 0, 1 …, n (the upper limit on the range of r allowing for sta-
tionarity). The evidence on the appropriate form of the penalty term, f(T), is
mixed (Reimers 1992), and while SIC can dominate, relative performance
depends on simulation design. In practice, it is best to compute a range of crite-
ria and search for corroborative evidence amongst them as to model order and
cointegrating rank, and, if there is significant deviation in the findings, to check
that subsequent inferences are not sensitive across the set of models suggested.13
Lin and Tsay (1996) point out that a model should be selected (and estimated)
according to its purpose. In their paper they develop the idea that if the objec-
tive of the model is to forecast at a long-term forecast horizon, then it should be
optimized to do this. Since standard methods of estimation and the form of
information criteria are based on one step-ahead errors, it would not be surpris-
ing that such models were sub-optimal in terms of, say, 50-step ahead forecasts.

6.4 Models with short-run dynamics induced by expectations

A number of papers have considered the issue of estimating linear
quadratic cost of adjustment models under the type of dependence associated
with cointegration (Dunne and Hunter 1998; Hunter 1989; and Engsted and
Haldrup 1997). It should be understood that other forms of dependence might
lead to similar types of problems. However, none of these are insurmountable.
One issue which has been much discussed in the literature is the question of
identification. As much of the analysis to date has concerned single
equations, the identification of the discount rate is of concern (Hendry et al.
1983; or Sargan 1982a). In general identification of parameters in structural or
quasi-structural relationships is feasible (Arellano et al. 1999; Hunter 1989,
1992; and Pesaran 1981, 1987). A significant issue, as far as identification of
forward-looking behaviour is concerned, is that both the IV and GMM estima-
tors do not bind the solution based on the minimum of the optimization
problem to the restrictions associated with the terminal condition (Nickell
1985; Hunter and Ioannidis 2000). Tests of over-identifying restrictions do not
impose burdensome conditions on the estimator, and satisfaction of the
necessary conditions follows without difficulty with the exception of highly
non-persistent processes (Stock, Wright and Yogo 2002).
This section considers the impact of cointegration amongst endogenous
and exogenous variables on rational expectations solutions and reveals a

computationally efficient estimation procedure that can readily be adapted to


incorporate dependent I(1) processes either in the endogenous or the exogenous
variables. The necessary and sufficient conditions for separation into two
forms of long-run process are discussed in Hunter (1989, 1990), in terms of the
types of condition discussed under cointegrating exogeneity in chapter 5.
Otherwise efficient estimation of the long run requires the existence of a
number of weakly exogenous variables either for the system or a sub-system
for which behaviour is predominantly forward looking. This is intimately
related to the notion of super exogeneity, which may negate the practical
value of the Lucas critique (Lucas 1976; Hendry 1988; and Hendry and Favero
1992).

6.4.1 Linear quadratic adjustment cost models


Consider the following objective function based on Kollintzas (1985), though
for ease of exposition the interaction between $y_t$ and $(y_t - v_t)$ is excluded
here:
$$E(\Im_t \mid \Omega_t) = \sum_{t=0}^{T^*} E\{\beta^t(\Delta y_t' K \Delta y_t + (y_t - v_t)' H (y_t - v_t)) \mid \Omega_t\}. \tag{6.37}$$

Let (6.37) define a control problem (Chow 1978), where $y_t$ is an $n_1$ vector of
endogenous variables and $v_t$ is an $n_1$ vector of unobserved targets that can be
defined as a linear function of $n_2$ exogenous variables, $z_t$, with
$v_t = Az_t + w_t$. Here $A$ is a matrix of long-run multipliers,
$w_t = z_t - E(z_t \mid \Omega_t)$ is an $n_1$ vector of white noise innovations and
$\beta$ is the discount rate. With fixed initial conditions $y_0 = \bar{y}$, then from
Kollintzas (1985) the Lagrange–Euler first-order condition after substituting
out for $v_t$ is:

$$E(\beta^t Q_0 y_t - \beta^{t+1} Q_1 y_{t+1} - \beta^t Q_1' y_{t-1} - \beta^t H A z_t \mid \Omega_t) = 0, \tag{6.38}$$

where $Q_0 = (1 + \beta)K + H$ and $Q_1 = K$.
Consider the process when it approaches its terminal value (at T* = T + N):

$$E(\beta^{T^*} Q_0 y_{T^*} - \beta^{T^*+1} Q_1 y_{T^*+1} - \beta^{T^*} Q_1' y_{T^*-1} - \beta^{T^*} H A z_{T^*} \mid \Omega_t) = 0. \tag{6.39}$$


Stationarity is one precondition traditionally accepted for the transversality
condition to be satisfied (Pesaran 1987), but when the structure includes a dis-
count factor this assumption is too strong. In general all that is required is for
(6.39) to be bounded as T* → ∞.
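To reveal a standard symmetric solution to the forward-looking problem,
(6.39) is scaled by $\beta^{-\frac{1}{2}(T^*+1)}$:
$$E(\beta^{-\frac{1}{2}(T^*+1)}\beta^{T^*} Q_0 y_{T^*} - \beta^{-\frac{1}{2}(T^*+1)}\beta^{T^*+1} Q_1 y_{T^*+1} - \beta^{-\frac{1}{2}(T^*+1)}\beta^{T^*} Q_1' y_{T^*-1} - \beta^{-\frac{1}{2}(T^*+1)}\beta^{T^*} H A z_{T^*} \mid \Omega_t) = 0. \tag{6.40}$$

Simplifying (6.40):
$$E(\beta^{-\frac{1}{2}} Q_0 \beta^{\frac{1}{2}T^*} y_{T^*} - \beta^{\frac{1}{2}(T^*+1)} Q_1 y_{T^*+1} - \beta^{\frac{1}{2}(T^*-1)} Q_1' y_{T^*-1} - \beta^{-\frac{1}{2}} H A \beta^{\frac{1}{2}T^*} z_{T^*} \mid \Omega_t) = 0. \tag{6.41}$$
Re-defining (6.41) in terms of $y^*_{T^*} = \beta^{\frac{1}{2}T^*} y_{T^*}$ and
$z^*_{T^*} = \beta^{\frac{1}{2}T^*} z_{T^*}$ gives rise to the symmetric solution:
$$E(\beta^{-\frac{1}{2}} Q_0 y^*_{T^*} - Q_1 y^*_{T^*+1} - Q_1' y^*_{T^*-1} - \beta^{-\frac{1}{2}} H A z^*_{T^*} \mid \Omega_t) = 0. \tag{6.42}$$
In the limit (6.42) is bounded when the roots of the processes driving $z_t$ and $y_t$
are of mean order less than $\beta^{-\frac{1}{2}}$ as:
$$\lim_{T^* \to \infty} E(y^*_{T^*+1} \mid \Omega_t) \to 0 \quad \text{and} \quad \lim_{T^* \to \infty} E(z^*_{T^*+1} \mid \Omega_t) \to 0.$$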

Notice that (6.42) is bounded even when $y$ and $z$ have univariate time
series representations that are non-stationary. Now consider the cointegration
case. Dividing (6.38) by $\beta^t$ and transforming yields an error correction
representation:
$$E(-\beta K \Delta y_{t+1} + K \Delta y_t + H(y_t - Az_t) \mid \Omega_t) = 0. \tag{6.43}$$

It follows that (6.43) is bounded in the limit when:


$$\lim_{T^* \to \infty} \{-\beta K \Delta y_{T^*+1} + K \Delta y_{T^*} + H(y_{T^*} - Az_{T^*})\} \to 0. \tag{6.44}$$

From the above discussion, a regular solution (see Pesaran 1987) to (6.42)
exists, if and only if: (a) $Q_0$ is symmetric; (b) $K$ is non-singular; and (c) the
roots of the processes driving $y_t$ and $z_t$ are of mean order less than
$\beta^{-\frac{1}{2}}$. Dividing through (6.38) by $\beta^t$ yields the following
difference equation:

$$E(Q_0 y_t - \beta Q_1 y_{t+1} - Q_1' y_{t-1} - H A z_t \mid \Omega_t) = 0. \tag{6.45}$$

Redefining (6.45) using the forward ($L^{-1}$) and backward ($L$) lag operators:
$$Q(L) E(y_t \mid \Omega_t) = H A E(z_t \mid \Omega_t). \tag{6.46}$$

Now $Q(L) = (Q_0 I - \beta Q_1 L^{-1} - Q_1' L)$ has the following factorization:

$$Q_1^{-1} Q(L) = (I - G_1 L^{-1})(I - FL),$$

where $G_1 = \beta F$, $F = P \Lambda P^{-1}$ and $\Lambda$ is a matrix whose diagonal
elements are the stable eigenroots of the system. Therefore:

$$(I - G_1 L^{-1})(I - FL) E(y_t \mid \Omega_t) = K^{-1} H A E(z_t \mid \Omega_t). \tag{6.47}$$

It follows that the solution of the system can be written as:

$$y_t - Fy_{t-1} = \sum_{s=0}^{\infty} (G_1)^s F E(R_0 A z_{t+s} \mid \Omega_t) + (G_1)^{-t} M_t + u_t \tag{6.48}$$

(Sargent 1978), where $R_0 = (\beta(F - I) + F^{-1} - I)$ and $M_t$ satisfies the
martingale property $E(M_{t+1} \mid \Omega_t) = (G_1) M_t$ (Pesaran 1987).

Reversing the transformation and applying it to (6.48):

$$\begin{aligned}
(I - G_1 L^{-1})(y_t - Fy_{t-1} - u_t)
&= (I - G_1 L^{-1})\Big(\sum_{s=0}^{\infty} (G_1)^s F E(R_0 A z_{t+s} \mid \Omega_t) + (G_1)^{-t} M_t\Big) \\
&= \sum_{s=0}^{\infty} (G_1)^s F E(R_0 A z_{t+s} \mid \Omega_t) - G_1 \sum_{s=0}^{\infty} (G_1)^s F E(R_0 A z_{t+s+1} \mid \Omega_{t+1}) \\
&\quad + (I - G_1 L^{-1})(G_1)^{-t} M_t.
\end{aligned}$$
The first two terms on the right-hand side simplify, while the Koyck operator
annihilates the bubble behaviour. Therefore:

$$\begin{aligned}
(I - G_1 L^{-1})(y_t - Fy_{t-1} - u_t)
&= F R_0 A z_t + \sum_{s=1}^{\infty} (G_1)^s F E(R_0 A z_{t+s} \mid \Omega_t) - \sum_{s=1}^{\infty} (G_1)^s F E(R_0 A z_{t+s} \mid \Omega_{t+1}) \\
&= F R_0 A z_t + \sum_{s=1}^{\infty} (G_1)^s F \big(E(R_0 A z_{t+s} \mid \Omega_t) - E(R_0 A z_{t+s} \mid \Omega_{t+1})\big).
\end{aligned}$$

Assuming that there are no bubbles and a forcing process $z_t = B(L)w_t$ ($w_t$ is
white noise), then:
$$E(z_{t+s} \mid \Omega_t) - E(z_{t+s} \mid \Omega_{t+1}) = -B_{s-1} w_{t+1}$$

and

$$\begin{aligned}
(I - G_1 L^{-1})(y_t - Fy_{t-1} - u_t)
&= F R_0 A z_t - \sum_{s=1}^{\infty} (G_1)^s F R_0 A (B_{s-1} w_{t+1}) \\
&= F R_0 A z_t - F R_0 \Big(\sum_{s=1}^{\infty} (G_1)^s A B_{s-1}\Big) w_{t+1}.
\end{aligned}$$
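
The forecast revision identity used in this step can be checked numerically. The following is a minimal sketch in Python, assuming a scalar moving average of finite order with illustrative coefficients; the function name `conditional_forecast` and all numbers are hypothetical and are not taken from the text.

```python
import numpy as np

def conditional_forecast(B, w, t_last, target):
    """E(z_target | w_0, ..., w_t_last) for the scalar MA process z_t = sum_j B[j] * w[t - j]."""
    total = 0.0
    for j, b in enumerate(B):
        idx = target - j
        # innovations dated after t_last are unknown and have zero conditional mean
        if 0 <= idx <= t_last:
            total += b * w[idx]
    return total

B = [1.0, 0.6, 0.3, 0.1]                       # illustrative MA coefficients B_0, ..., B_3
w = np.array([0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.7, 0.2, 0.9])
t, s = 5, 2

# forecast of z_{t+s} made at t minus the forecast made at t+1
revision = conditional_forecast(B, w, t, t + s) - conditional_forecast(B, w, t + 1, t + s)
expected = -B[s - 1] * w[t + 1]                # the identity: -B_{s-1} w_{t+1}
```

Only the innovation that arrives between the two information sets contributes to the difference, which is why a single coefficient appears on the right-hand side.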

Now reversing the Koyck lead and setting $\big(\sum_{s=1}^{\infty} (G_1)^s A B_{s-1}\big) = D$
gives rise to a forward-looking representation, which depends on future values
of $z_t$. Therefore:

$$\begin{aligned}
(y_t - Fy_{t-1} - u_t) &= (I - G_1 L^{-1})^{-1}(F R_0 A z_t - F R_0 D w_{t+1}) \\
&= \sum_{s=0}^{\infty} (G_1)^s F (R_0 A z_{t+s} - R_0 D w_{t+s+1}).
\end{aligned}$$

It is possible to estimate the above model by FIML using the following
recursion:
$$\begin{aligned}
(y_t - Fy_{t-1} - u_t) &= h_t \\
h_t &= F R_0 A z_t - F R_0 D w_{t+1} + G_1 h_{t+1}.
\end{aligned} \tag{6.49}$$

A fixed initial condition can be handled by recursively de-meaning the
dependent variable (Taylor 1999), while the problem of selecting an appropriate
terminal condition is solved by introducing a large enough future horizon or
setting $(G_1)^s h_{t+s+1} = 0$.
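
The forward term in the recursion can be evaluated by running backwards from the terminal condition. The following is a minimal sketch in Python under the assumption that the terminal value is set to zero at the end of the sample; the function name `forward_recursion`, the array layout and the parameter values are illustrative and not the authors' implementation.

```python
import numpy as np

def forward_recursion(F, R0, A, D, G1, z, w):
    """Evaluate h_t = F R0 A z_t - F R0 D w_{t+1} + G1 h_{t+1} backwards in time.

    z has shape (T, n2) and w has shape (T + 1, n2); h_T = 0 is the assumed
    terminal condition.
    """
    T = z.shape[0]
    h = np.zeros((T + 1, F.shape[0]))
    FR0 = F @ R0
    for t in range(T - 1, -1, -1):             # from the horizon back to t = 0
        h[t] = FR0 @ (A @ z[t] - D @ w[t + 1]) + G1 @ h[t + 1]
    return h[:T]

# Scalar illustration with hypothetical values
beta = 0.9
I1 = np.eye(1)
F = np.array([[0.5]])
G1 = beta * F
R0 = beta * (F - I1) + np.linalg.inv(F) - I1
A = np.array([[2.0]])
D = np.array([[0.3]])
z = np.linspace(1.0, 2.0, 20).reshape(-1, 1)
w = np.full((21, 1), 0.1)
h = forward_recursion(F, R0, A, D, G1, z, w)
```

Because the stable roots are discounted, the contribution of the terminal value to early `h[t]` shrinks geometrically as the horizon is extended, which is the sense in which a large enough future horizon makes the choice of terminal condition harmless.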
Alternatively, the solution has the following backward representation, by
substituting terms of the form E(zt+s|'t) using the Wiener–Kolmogorov predic-
tion formula, which gives rise to the reduced form:

yt − Fyt −1 = Ξ( L)zt + ut , (6.50)

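where $\Xi(L) = (\Xi_0 + \Xi_1 L + \dots + \Xi_{s-1} L^{s-1})$ is a function of $\beta$,
$H$, $K$, $A$ and the coefficients of the $s$-th order lag polynomial that describes
the process driving $z_t$. However, this is a more complex set of non-linear
relations to deal with (Hunter 1995; or Johansen and Swensen 1999).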
It is also possible to give (6.49) a recursive structural form as long as $K^{-1}$
exists. Notice that $R_0 = K^{-1}H$ and:
$$\begin{aligned}
(y_t - Fy_{t-1} - u_t) &= R_0 F A z_t - R_0 F D w_{t+1} + G_1 h_{t+1} \\
K(y_t - Fy_{t-1} - u_t) &= H F A z_t - H F D w_{t+1} + K G_1 h_{t+1}.
\end{aligned} \tag{6.51}$$

As in a conventional system (Sargan 1988), to identify $K$, $H$ and $F$, then
$n_1 - 1$ additional restrictions are required (Hunter 1992a). Subject to
knowledge of $K$ and $F$, then $H$ can be calculated from the following restriction
$K R_0 = K(\beta(F - I) + F^{-1} - I) = H$ as $R_0 = K^{-1}H$ commutes. Essentially,
identification of $K$ follows from the additional restrictions, while
identification of $H$ follows from $F$, given knowledge of $K$ and any additional
restrictions to the system.
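
The link between $K$, $F$ and $H$ can be illustrated in the scalar case. The following is a minimal sketch in Python: it assumes a single endogenous variable, so that the stable root solves the quadratic implied by $R_0 = \beta(F - I) + F^{-1} - I = K^{-1}H$. The function names and parameter values are hypothetical.

```python
import math

def stable_root(beta, k, h):
    """Stable root f of beta*f**2 - q*f + 1 = 0, where q = (1 + beta) + h/k (scalar case)."""
    q = (1.0 + beta) + h / k
    return (q - math.sqrt(q * q - 4.0 * beta)) / (2.0 * beta)

def implied_h(beta, k, f):
    """Recover h from k and f through h = k * R0, with R0 = beta*(f - 1) + 1/f - 1."""
    r0 = beta * (f - 1.0) + 1.0 / f - 1.0
    return k * r0

beta, k, h = 0.95, 2.0, 1.0                    # illustrative values only
f = stable_root(beta, k, h)
h_back = implied_h(beta, k, f)                 # should reproduce h
r0 = h / k
lhs = r0 * f                                   # R0 F
rhs = (1.0 - beta * f) * (1.0 - f)             # (I - beta F)(I - F)
```

Given $k$ and the estimated root, `implied_h` returns the original $h$, which mirrors the statement that $H$ is identified from $F$ once $K$ is known.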

6.4.2 Models with forward behaviour and n2 weakly exogenous I(1) variables
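If one considers the backward-looking form of the forward-looking model,
then this is a VAR. The cointegrating VAR takes the form

$$\Delta y_t = [\alpha_{11} : \alpha_{12}] \begin{bmatrix} \beta_{1.} \\ \beta_{2.} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \varepsilon_{1t} \tag{6.52}$$
$$\Delta z_t = [\alpha_{21} : \alpha_{22}] \begin{bmatrix} \beta_{1.} \\ \beta_{2.} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \varepsilon_{2t}, \tag{6.53}$$

where any further dynamic can be incorporated in an appropriate time series
representation of the error process. It follows for weak exogeneity relative to
the long run, that $[\alpha_{21} : \alpha_{22}] = [0 : 0]$. As a result:

$$\Delta y_t = [\alpha_{11} : \alpha_{12}] \begin{bmatrix} \beta_{1.} \\ \beta_{2.} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \varepsilon_{1t} \tag{6.54}$$
$$\Delta z_t = \varepsilon_{2t}, \tag{6.55}$$
where $\beta = [\beta_{1.}' \; \beta_{2.}']$ is the matrix of cointegrating vectors and
$\varepsilon_{2t} = C(L)w_t$. Notice that inference on the short-run parameters is
not appropriate as the coefficients of the ARMA error process forcing $y_t$
depend on the MA process forcing $\varepsilon_{1t}$. It follows that the
cointegrating relations are defined in the equations for $y_t$. Now consider the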

solution to the forward-looking relationship given above, then the long-run


behaviour that is important applies to the equation for yt.

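$$y_t - Fy_{t-1} = R_0 F \sum_{s=0}^{\infty} (G_1)^s E(A z_{t+s} \mid \Omega_t) + u_t$$

where $R_0 F = (\beta(F^2 - F) + I - F) = (\beta F(F - I) + I - F) = (I - \beta F)(I - F)$.
It follows that:

$$\begin{aligned}
y_t - Fy_{t-1} - u_t
&= (I - \beta F)(I - F) \sum_{s=0}^{\infty} (G_1)^s E(A z_{t+s} \mid \Omega_t) \\
&= (I - F)\Big(\sum_{s=0}^{\infty} (G_1)^s E(A z_{t+s} \mid \Omega_t) - \sum_{s=0}^{\infty} (G_1)^{s+1} E(A z_{t+s} \mid \Omega_t)\Big) \\
&= (I - F)\Big\{A z_{t-1} + \sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t)\Big\}.
\end{aligned} \tag{6.56}$$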

Now it follows from the results in Engsted and Haldrup (1997) that (6.56) has
an error correction type representation in differences and levels. Furthermore:

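$$\begin{aligned}
\Delta y_t + (I - F) y_{t-1} - u_t &= (I - F)\Big\{A z_{t-1} + \sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t)\Big\} \\
\Delta y_t + (I - F)\{y_{t-1} - A z_{t-1}\} - u_t &= (I - F) \sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t).
\end{aligned}$$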

In the error correction form $\alpha_1 = (I - F)$ and the cointegrating relations
are normalized with respect to the $n_2$ weakly exogenous variables as follows,
$\beta = (I : A)$. The representations in Dolado et al. (1991) and Engsted and
Haldrup (1997) rely on the existence of exactly $n_2$ weakly exogenous variables
for the long run to be estimated from the equations on $y_t$ alone. It then
follows that the above system can be estimated in two steps. Firstly the long
run might be estimated using a regression or the Johansen procedure, and
then the short-run relationship is estimated. There is no separate long-run
relationship amongst the endogenous variables. Alternatively, consider a
solved form similar to the one dealt with in section 6.4.1:

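$$y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t = (I - F) \sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t).$$

Reversing the Koyck transformation:

$$\begin{aligned}
(I - G_1 L^{-1})&(y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t) \\
&= (I - G_1 L^{-1})(I - F)\Big(\sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t)\Big) \\
&= (I - F)\Big\{\sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s} \mid \Omega_t) - G_1 \sum_{s=0}^{\infty} (G_1)^s E(A \Delta z_{t+s+1} \mid \Omega_{t+1})\Big\}
\end{aligned}$$

or
$$\begin{aligned}
(I - G_1 L^{-1})&(y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t) \\
&= (I - F) A \Delta z_t + (I - F) \sum_{s=1}^{\infty} (G_1)^s \big(E(A \Delta z_{t+s} \mid \Omega_t) - E(A \Delta z_{t+s} \mid \Omega_{t+1})\big).
\end{aligned}$$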

It follows from the Granger representation theorem that $\Delta z_t$ has the
following Wold form $\Delta z_t = C(L)w_t$ and

$$E(\Delta z_{t+s} \mid \Omega_t) - E(\Delta z_{t+s} \mid \Omega_{t+1}) = -C_{s-1} w_{t+1}.$$

Substituting back into the forward-looking model:

$$\begin{aligned}
(I - G_1 L^{-1})&(y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t) \\
&= (I - F) A \Delta z_t - (I - F) \sum_{s=1}^{\infty} (G_1)^s (A C_{s-1} w_{t+1}) \\
&= (I - F) A \Delta z_t - (I - F)\Big(\sum_{s=1}^{\infty} (G_1)^s A C_{s-1}\Big) w_{t+1}.
\end{aligned}$$


Now reversing the Koyck lead and setting $\big(\sum_{s=1}^{\infty} (G_1)^s A C_{s-1}\big) = D^*$
gives rise to a forward-looking representation, which depends on future values
of $z_t$:
$$\begin{aligned}
y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t
&= (I - G_1 L^{-1})^{-1}\big((I - F) A \Delta z_t - (I - F) D^* w_{t+1}\big) \\
&= \sum_{s=0}^{\infty} (G_1)^s \big((I - F) A \Delta z_{t+s} - (I - F) D^* w_{t+s+1}\big).
\end{aligned}$$
Now decompose the last relationship as follows:
$$\begin{aligned}
y_t - Fy_{t-1} - (I - F) A z_{t-1} - u_t
&= \sum_{s=0}^{\infty} (G_1)^s (I - F) A \Delta z_{t+s} - \sum_{s=0}^{\infty} (G_1)^s (I - F) D^* w_{t+s+1} \\
&= \sum_{s=0}^{\infty} (G_1)^s (I - F) A z_{t+s} - \sum_{s=0}^{\infty} (G_1)^s (I - F) A z_{t+s-1} \\
&\quad - \sum_{s=0}^{\infty} (G_1)^s (I - F) D^* w_{t+s+1}.
\end{aligned}$$
Therefore:

$$\begin{aligned}
y_t - Fy_{t-1} - u_t
&= \sum_{s=0}^{\infty} (G_1)^s (I - F) A z_{t+s} + (I - F) A z_{t-1} - (I - F) A z_{t-1} \\
&\quad - \sum_{s=1}^{\infty} (G_1)^s (I - F) A z_{t+s-1} - \sum_{s=0}^{\infty} (G_1)^s (I - F) D^* w_{t+s+1}.
\end{aligned}$$

Re-writing the above into an equation purely in levels:

$$y_t - Fy_{t-1} - u_t = (I - F)\Big\{\sum_{s=0}^{\infty} (G_1)^s A z_{t+s} - G_1 \sum_{s=1}^{\infty} (G_1)^{s-1} A z_{t+s-1}\Big\} - \sum_{s=0}^{\infty} (G_1)^s (I - F) D^* w_{t+s+1}.$$

Re-indexing the second sum and gathering terms, yields a levels relationship:

$$y_t - Fy_{t-1} - u_t = (I - F)(I - G_1) \sum_{s=0}^{\infty} (G_1)^s A z_{t+s} - \sum_{s=0}^{\infty} (G_1)^s (I - F) D^* w_{t+s+1}.$$

It is possible to estimate the above model by FIML using the following
recursion:
$$\begin{aligned}
y_t - Fy_{t-1} - u_t &= h_t \\
h_t &= F R_0 A z_t - (I - F) D^* w_{t+1} + G_1 h_{t+1}.
\end{aligned} \tag{6.57}$$

In such circumstances the above relationship has the same forward recursion
as was considered before, except the transversality condition relies on the
existence of cointegration. Decompose (6.44) as follows:

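$$\begin{aligned}
\lim_{T^* \to \infty} &\{-\beta K \Delta y_{T^*+1} + K \Delta y_{T^*} + H(y_{T^*} - A z_{T^*})\} \\
&= \lim_{T^* \to \infty} \{-\beta K \Delta y_{T^*+1} + K \Delta y_{T^*}\} + \lim_{T^* \to \infty} \{H(y_{T^*} - A z_{T^*})\} \to 0.
\end{aligned}$$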

The conditions for cointegration (Engle and Granger 1987) are sufficient for
this to be satisfied. That is, $y_t \sim I(1)$ and $(y_t - Az_t) \sim I(0)$, so that
$y_t$ and $z_t$ cointegrate. Furthermore, (6.57) has an error correction form:
$$\begin{aligned}
\Delta y_t + (I - F)(y_{t-1} - A z_{t-1}) - u_t &= h_t \\
h_t &= (I - F) A \Delta z_t - (I - F) D^* w_{t+1} + G_1 h_{t+1}.
\end{aligned}$$

In the next section the case with dependence amongst the endogenous vari-
ables is considered.
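
The two-step approach described in this section (long run first, then the short-run adjustment) can be sketched as follows. This is a minimal illustration in Python, assuming a scalar system with one weakly exogenous random walk and a static OLS regression for the first step; the simulated design, the function names and the parameter values are hypothetical, and the forward-looking term is omitted for simplicity.

```python
import numpy as np

def simulate(T, a, f, seed=0):
    """Simulate dy_t = -(1 - f) * (y_{t-1} - a * z_{t-1}) + eps_t with z_t a random walk."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    w = rng.standard_normal(T)
    z = np.cumsum(w)                            # weakly exogenous I(1) variable
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = y[t - 1] - (1.0 - f) * (y[t - 1] - a * z[t - 1]) + eps[t]
    return y, z

def two_step(y, z):
    """Step 1: long-run coefficient by OLS in levels. Step 2: adjustment coefficient."""
    a_hat = (z @ y) / (z @ z)                   # static cointegrating regression
    ecm = y[:-1] - a_hat * z[:-1]               # lagged disequilibrium
    dy = np.diff(y)
    alpha_hat = (ecm @ dy) / (ecm @ ecm)        # expected to be close to -(1 - f)
    return a_hat, alpha_hat

y, z = simulate(T=2000, a=2.0, f=0.5)
a_hat, alpha_hat = two_step(y, z)
```

In this design the adjustment coefficient on the lagged disequilibrium is $-(1 - f)$, so the second-step estimate should be near $-0.5$, while the first-step estimate converges quickly to the long-run multiplier because the regressor is I(1).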

6.4.3 Models with forward behaviour and unit roots in the process
driving yt
There are a number of reasons for finding dependence amongst the endoge-
nous processes, one of which would be cointegration, the other would be the
type of dependence that exists amongst series that might satisfy an adding up
type constraint. In the former case the cause of rank failure is the existence of
a unit root and it can be shown that the original objective function can be
solved in the usual way (Hunter 1989a).

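Consider the loss function

$$E(\Im_t \mid \Omega_t) = \sum_{t=0}^{T^*} E\{\beta^t(\Delta y_t' K \Delta y_t + (y_t - v_t)' H (y_t - v_t)) \mid \Omega_t\} \tag{6.58}$$

where $\mathrm{rank}(H) = r_1$. As a result, the following decomposition exists:
$H = E'E$ and $\mathrm{rank}(E) = r_1$. Now define $M$ such that the matrix $[E' : M']$ has
full rank. Now we can redefine the loss function in terms of new variables:
$$E(\Im_t \mid \Omega_t) = \sum_{t=0}^{T^*} E\{\beta^t(\Delta y_t^{*\prime} K^* \Delta y_t^* + (y_t^* - v_t^*)' H^* (y_t^* - v_t^*)) \mid \Omega_t\} \tag{6.59}$$

where $y_t^{*\prime} = [y_{1t}^{*\prime} \; y_{2t}^{*\prime}] = y_t'[E' : M']$,
$K^* = [E' : M']^{-1} K \begin{bmatrix} E \\ M \end{bmatrix}^{-1}$,
$H^* = \begin{bmatrix} I_{r_1} & 0 \\ 0 & 0 \end{bmatrix}$ and $v_t^*$ is conformable with
$y_t^*$. It follows that the loss function has the following form:
$$E(\Im_t \mid \Omega_t) = \sum_{t=0}^{T^*} E\{\beta^t(y_{1t}^{*\prime} K_{11}^* y_{1t}^* + 2 y_{1t}^{*\prime} K_{12}^* y_{2t}^* + y_{2t}^{*\prime} K_{22}^* y_{2t}^* + (y_{1t}^* - v_{1t}^*)'(y_{1t}^* - v_{1t}^*)) \mid \Omega_t\}. \tag{6.60}$$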

Re-writing the above relationship in terms of a new set of stationary variables,
then $y_t^{+\prime} = [y_{1t}^{*\prime} \; y_{2t}^{*\prime}]$ and here it is assumed that
the long-run target for $v_{2t}^* = 0$ and $y_{2t} = 0$. Therefore:
$$E(\Im_t \mid \Omega_t) = \sum_{t=0}^{T^*} E\{\beta^t(y_{1t}^{+\prime} K_{11}^* y_{1t}^+ + 2 y_{1t}^{+\prime} K_{12}^* y_{2t}^+ + y_{2t}^{+\prime} K_{22}^* y_{2t}^+ + (y_{1t}^+ - v_{1t}^+)'(y_{1t}^+ - v_{1t}^+)) \mid \Omega_t\}.$$

Now differentiating with respect to $y_{1t}^+$ gives rise to the following first-order
condition:

$$E(\beta^t K_{11}^* y_{1t}^+ - \beta^{t+1} K_{11}^* y_{1t+1}^+ - \beta^t (y_{1t}^+ - v_{1t}^+) - 2\beta^t K_{12}^* (y_{2t}^+ - y_{2t+1}^+) \mid \Omega_t) = 0, \tag{6.61}$$

and with respect to $y_{2t}^+$:

$$E(\beta^t K_{21}^* y_{1t}^+ + \beta^t K_{22}^* y_{2t}^+ \mid \Omega_t) = 0.$$

Subtracting the above equation from its forward value and re-writing:

$$E(\beta^t K_{21}^* (y_{1t}^* - \beta y_{1t+1}^*) + \beta^t K_{22}^* (y_{2t}^* - \beta y_{2t+1}^*) \mid \Omega_t) = 0.$$

Now consider the system:

$$E\Big(\beta^t \Big(K^*(y_t^* - \beta y_{t+1}^*) + \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}(y_t^* - v_t^*)\Big) \mid \Omega_t\Big) = 0.$$
Now divide through by $\beta^t$ and reverse the transformation:

$$E(K(y_t - \beta y_{t+1}) + H(y_t - z_t) \mid \Omega_t) = 0.$$

Hence, irrespective of the existence of cointegration, the same first-order
condition exists as does the solution dealt with before, except that $H$ is rank
deficient. Therefore $R_0 = K^{-1}H$ is rank deficient, $F$ has $n_1 - r_1$ unit roots
and $R_0 = (I - \beta F)(I - F)F^{-1}$ is rank deficient as can be observed from the
following decomposition:
$$\begin{aligned}
(I - \beta F)(I - F) &= (I - \beta P \Lambda P^{-1})(I - P \Lambda P^{-1}) \\
&= P(I - \beta \Lambda)(I - \Lambda)P^{-1},
\end{aligned}$$
where $\mathrm{rank}((I - \beta F)(I - F)) = r_1$ when there are $n_1 - r_1$ unit roots.
Hence the rank of the matrix $H$ determines the number of unit roots. Now it is
probably better to consider the recursive representation (6.57):
$$\begin{aligned}
\Delta y_t + (I - F)(y_{t-1} - A z_{t-1}) - u_t &= h_t \\
h_t &= (I - F) A \Delta z_t - (I - F) D^* w_{t+1} + G_1 h_{t+1}.
\end{aligned}$$

If I – F is rank deficient, then there is also the possibility of cointegration


amongst the endogenous and exogenous variables. Notice the dependence
also feeds forward into the relations in differences.
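
The rank argument above can be illustrated numerically. The following is a minimal sketch in Python, assuming $n_1 = 3$ with two unit roots and one stable root; the matrix of eigenvectors `P`, the discount rate and the tolerance passed to `matrix_rank` are arbitrary illustrative choices.

```python
import numpy as np

beta = 0.95                                     # illustrative discount rate
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])                 # any non-singular matrix of eigenvectors
Lam = np.diag([1.0, 1.0, 0.5])                  # n1 - r1 = 2 unit roots, r1 = 1 stable root
F = P @ Lam @ np.linalg.inv(P)

I3 = np.eye(3)
product = (I3 - beta * F) @ (I3 - F)            # (I - beta F)(I - F)

# an explicit tolerance guards against rounding noise in the numerically zero directions
rank_product = np.linalg.matrix_rank(product, tol=1e-10)
rank_I_minus_F = np.linalg.matrix_rank(I3 - F, tol=1e-10)
```

Both ranks equal the number of stable roots, since each unit eigenvalue contributes a zero on the diagonal of $(I - \Lambda)$ while $(I - \beta\Lambda)$ stays non-singular for a discount rate below one.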

6.4.4 Estimation and inference


The benefit of the above approach is that it reduces the dimension of the
estimation problem when forward-looking behaviour needs to be considered,
especially in terms of the need to estimate and store future predictions.
However, the downside is that inference is made more complicated.
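As far as estimation is concerned, then the usual likelihood function
applies, where:
$$\mathrm{Log}L((\beta, H, K, A), \Sigma \mid .) = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}\Big(\Sigma^{-1}\sum_{t=1}^{T} u_t u_t'\Big)$$

and $u_t = y_t - Fy_{t-1} - h_t$. Now concentrating out $\Sigma$ yields the
quasi-likelihood:

$$\mathrm{Log}L_c((\beta, H, K, A) \mid .) = C - \frac{T}{2}\log|S|$$

where $S = \frac{1}{T}\sum_{t=1}^{T} \hat{u}_t \hat{u}_t'$ is a consistent estimate of
$\Sigma$. The likelihood is maximized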
using a Quasi-Newton algorithm such as Gill, Murray and Pitfield (see Sargan
1988) or an equivalent method. The method due to Gill, Murray and Pitfield
has the advantage of using the Cholesky factors from the inverse of the
Hessian. They are then bounded to be positive definite subject to an appropri-
ately conditioned Hessian matrix.
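
The step from the full likelihood to the concentrated form can be sketched as follows. This is a minimal illustration in Python that assumes the residuals are already available as a $T \times n$ array; the function names are hypothetical, and the optimizer itself (the Quasi-Newton step) is not shown.

```python
import numpy as np

def gaussian_loglik(U, Sigma):
    """Gaussian log-likelihood of residuals U (T x n) for a given covariance matrix Sigma."""
    T, n = U.shape
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, U.T @ U))   # tr(Sigma^{-1} sum_t u_t u_t')
    return -0.5 * T * n * np.log(2.0 * np.pi) - 0.5 * T * logdet - 0.5 * quad

def concentrated_loglik(U):
    """Log-likelihood with Sigma replaced by S = U'U / T, i.e. C - (T/2) log|S|."""
    T, n = U.shape
    S = U.T @ U / T
    _, logdet = np.linalg.slogdet(S)
    constant = -0.5 * T * n * (np.log(2.0 * np.pi) + 1.0)
    return constant - 0.5 * T * logdet

U = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0],
              [-1.0, 0.5]])                      # toy residuals, for illustration only
S = U.T @ U / U.shape[0]
```

Evaluating the full likelihood at $\Sigma = S$ reproduces the concentrated value, because the trace term collapses to $Tn$ and is absorbed into the constant.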
However, the conventional estimates of the parameter variance based on
the information matrix are not valid, even when the model for the endoge-
nous equations is estimated as a system. The correct estimate needs to take
account of the generated regressors and their parameter estimates. The follow-
ing algorithm is suggested to do this. Initial estimates of the exogenous

variables are estimated as a VAR, then the residuals are saved. The VMA
representation is estimated by OLS using the method described by Spliid (1983).
In state space form:

$$z = W\varsigma + w,$$
where
$$z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_T \end{bmatrix}, \quad
W = [w_{-1} \; w_{-2} \; \dots \; w_{-p}], \quad
w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_T \end{bmatrix} \quad \text{and} \quad
\varsigma = \begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_p \end{bmatrix}.$$

Hence, the OLS estimator of the parameters is given by:

$$\hat{\varsigma}_{(0)} = (W_{(0)}' W_{(0)})^{-1} W_{(0)}' z,$$

where $W_{(0)}$ contains the initial estimates of the surprises, unobserved values
of the residual are set to zero and $\hat{\varsigma}_{(0)}$ are the initial estimates
of the parameters. Once the system has been estimated, then the likelihood is
re-estimated based on B = 200 bootstrap re-samplings of the original residuals
vector $w$, where each iteration replaces a block of residuals $w_i$ by the new
residual set $w_{(b)}$ used to provide new estimates of the VMA parameters
($\hat{\varsigma}_{(b)}$ for b = 1, …, B). Then given the maximum likelihood estimates
of the parameters ($\beta$, $H$, $K$, $A$) an empirical distribution for the estimated
test statistics is generated from the bootstrap re-sampling regime. A sample of
400 is created by the use of the antithetic variate technique, providing at each
bootstrap replication a pair of residuals $w_{(b)}$ and $-w_{(b)}$ (see Hendry 1995).
Then percentiles of the empirical distribution can be used to determine critical
values for the estimated parameters.
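
The regression step and the antithetic re-sampling scheme can be sketched as follows. This is a minimal illustration in Python and not the authors' algorithm: it assumes simple re-sampling of whole residual rows with replacement rather than the block scheme described above, and the names `lagged_matrix`, `ols_vma`, `antithetic_bootstrap` and `critical_values`, as well as the `statistic` argument, are hypothetical.

```python
import numpy as np

def lagged_matrix(w, p):
    """Stack w_{t-1}, ..., w_{t-p} column-wise; unobserved presample values are set to zero."""
    T, n = w.shape
    W = np.zeros((T, n * p))
    for j in range(1, p + 1):
        W[j:, (j - 1) * n:j * n] = w[:T - j]
    return W

def ols_vma(z, w, p):
    """OLS estimate of the stacked coefficients in z = W c + w, given residual estimates w."""
    W = lagged_matrix(w, p)
    coef, *_ = np.linalg.lstsq(W, z, rcond=None)
    return coef

def antithetic_bootstrap(statistic, w, B=200, seed=0):
    """Return 2B draws of statistic(.), evaluated on each re-sample w_b and on -w_b."""
    rng = np.random.default_rng(seed)
    T = w.shape[0]
    draws = []
    for _ in range(B):
        w_b = w[rng.integers(0, T, size=T)]     # re-sample rows with replacement
        draws.append(statistic(w_b))
        draws.append(statistic(-w_b))           # antithetic partner
    return np.array(draws)

def critical_values(draws, levels=(2.5, 97.5)):
    """Percentiles of the empirical bootstrap distribution."""
    return np.percentile(draws, levels)
```

With B = 200 the antithetic pairing yields 400 draws, matching the sample size quoted in the text. In practice `statistic` would re-estimate the model on each re-sampled residual set; any function of the residual array can be passed in its place for experimentation.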

6.5 Conclusion

In this chapter a number of more advanced issues have been addressed: coin-
tegration amongst series with different orders of integration; forecasting with
cointegrating relationships; and cointegration combined with short-run struc-
ture defined by rational expectations.

With orders of integration in excess of I(1), inference is similar to the I(1)


case except that there are now three types of process that evolve to generate
the data. Cointegration not only occurs in the usual way amongst the levels,
but may also occur between levels and differenced series, there are I(1)
common trends and also I(2) trends. However, identification is a fundamental
problem for the estimation of long-run behaviour in the I(2) case as three sets
of parameters are potentially ill-defined.
When the order of integration is less than 1, then series are not likely to
have the same fractional order of differencing. One approach is to consider
the average non-integer order of differencing for a group of series. Estimation
of the cointegrating vectors can be undertaken in a similar way as that for
I(1) series when a non-parametric approach is considered (Robinson and
Marinucci 1998), but testing is more complex (Robinson and Yajima 2002). It
is relatively straightforward to compare the order of difference between series
and to calculate the cointegrating rank, but there is no conventional proce-
dure for inference.
Forecasting in cointegrated systems occurs at two levels – the short run and
the long run and cointegration influences both of these. Short-run forecasts
are less influenced by cointegration, but long-run forecasts may be strongly
influenced. The literature is unclear as to whether gains in forecast accuracy
depend on the restrictions that cointegration imposes on the long-run process
or the interrelationship that cointegration imposes on the long-run forecasts.
It appears that there is little difference between long-run forecasts derived
from models that impose the long-run restrictions as the forecast evolves
when they are compared with forecasts that ex-post have the cointegrating
restriction imposed on them. This might suggest that the benefits to long-run
forecasting associated with cointegration follow from the imposition of the
restriction rather than cointegration per se. This would appear to be an issue
for further investigation, though the authors would conjecture that cointegra-
tion has a role in the accuracy of long-run forecasts.
Estimation of the structural parameters of optimizing models has become
enormously popular. It has become common practice to suggest that the VAR
is a solution to a forward-looking model, but then not to consider the relation
between the long-run and the short-run behaviour of the model. However,
both the Engle–Granger and the Johansen procedure have been applied to
models with forward-looking behaviour. The final section of this chapter con-
sidered the impact of unit root processes in the endogenous and exogenous
variables on the solution and estimation of forward-looking models with
rational expectations. Inference is significantly more complicated in these
cases and has thus far had to derive from the proposition that series are
cointegrated.
7
Conclusion: Limitations,
Developments and Alternatives

7.1 Approximation

Many economic theories, especially those in macroeconomics, are theories of


the way in which economic processes interact with one another to provide a
stable set of underlying equilibrium relationships. Failure to observe equilib-
rium is strong evidence against the theory predicting its existence.
In general, relatively little is said that would be useful to econometricians or
policy makers about the detailed nature of these relationships. This is not nec-
essarily a failure of the theories; they point the way towards what might be true
in a general sense, in effect indicating to the applied researcher where to look.
Cointegration analysis is one tool in this search. It may or may not be
useful, depending on the circumstances in which it is used. It is certainly not
a definitive statement about the structure of an economic system. It is far
more constructive to view the mathematical and statistical structures on
which cointegration is based as being approximations to reality. In this case
the question is how useful the approximation is given the ultimate aim of dis-
tinguishing between situations where equilibrium does and does not exist.
A reasonable requirement of a statistical tool is that it is internally consis-
tent, in the sense that it works well in situations for which the approximation
is exact. This requirement is satisfied by cointegration analysis, as demon-
strated by Johansen (1995a), and many other statistical theorists. This is not
really sufficient, however, and it is also necessary to examine the performance
of the technique in situations where the underlying models are not an exact
description of the actual processes generating the data. Such investigations
indicate, as with all statistical techniques, that certain approximation failings,
such as structural breaks, are more serious than others. The methods exam-
ined in this book use as the approximating process, one that is linear, has
fixed coefficients and Gaussian disturbances. A further aspect of the approx-
imation of the technique as a whole is that the distribution theory of the tests


is based on an arbitrarily large sample size – as is the analysis of the power of


the tests.1
It is, of course, vacuous to state that cointegration does not exist in the real
world. Cointegration is a model on which a set of methods are based, allowing
inferences to be drawn from data about the existence and characteristics of
equilibrium relationships. The reliability of the inferences varies, but it is
essential to have a tool that is capable of making the inferences. And this
methodology is capable of more. It can examine characteristics of equilibrium
relationships, and the dynamic properties of the variations about equilibrium.
As always, this is based on a set of approximations and unlikely findings
should be viewed against the possibility of approximation failure. In this,
cointegration does not differ substantially from any other statistical tech-
nique. Its benefit lies in its crucial ability to resolve the matter of the existence
or otherwise of equilibrium, a key concept throughout the subject of eco-
nomics. The huge and rapid adoption of cointegration methodology is
evidence of its invaluable contribution.

7.2 Alternative models

Many developments of the basic model have taken place, such as the
introduction of non-linear adjustment,2 and more detailed characterizations of
non-stationarity (such as fractional cointegration considered briefly in the
previous chapter). There have also been developments in other branches of time
series econometrics, such as the modelling of higher order moments of the data,
including variance, skewness and kurtosis. These models provide different
means of analyzing data, not necessarily focussed on the concept of
equilibrium. Even so, some of the features of their data generating processes
have been used to investigate the robustness of the cointegration methodology.
So for example, the issue arises as to how reliably cointegration, or its absence, is
identified in the presence of autoregressive conditionally heteroscedastic dis-
turbances, or where the Gaussian disturbance structure is replaced by one with
more frequently occurring extreme values (relative to a Gaussian distribu-
tion).3 It is inevitable that eventually the methods will fail.
Cointegration analysis has also been extended to panel data models where
the time series dimension is sufficiently large.

7.3 Structural breaks

Probably the main feature of economic time series that is capable of under-
mining cointegration analysis is that of structural breaks. Breaks in individual
time series can lead to incorrect inference as to their order of integration. Thus
data that are considered to be integrated of order 1, might in fact be stationary

in the sense that they consist of stationary stochastic deviations around a


deterministic trend that displays jumps in value or changes in slope.
Alternatively, it is conceivable that the nature of the cointegrating relation-
ship between I(1) variables may change, or that the adjustment coefficients
may change. That is, in the notation of the previous chapters, the structure of
the intercept vector $\mu$, $\Pi$ and the $\Gamma_i$ of
$$\Delta x_t = \mu + \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + \varepsilon_t,$$

may change. If $\Pi$ changes, then this may be represented as changes in $\alpha$ or $\beta$,


and possibly the cointegrating rank (though this seems less acceptable since it
suggests the appearance and disappearance of equilibrium relationships over
time). Clearly this is not an exhaustive list.
The problem for cointegration analysis is that failure to allow for structural
breaks, especially in $\alpha$ or $\beta$, is likely to result in an inference of
non-cointegration, even where cointegration exists. Economically, this will result in the
failure to infer the presence of an equilibrium relationship where in fact one
exists.4
There is also a potential identification problem – where a structural break in
$\Pi$ occurs, is this parameterized as a change in $\alpha$ or $\beta$? Is it a change in
adjustment to disequilibrium or the equilibrium that is being adjusted to?5

7.4 Last comments

There is no doubt about the impact that methods for the empirical analysis of
time series equilibrium have had on applied economics. The methods and models
continue to develop, and the range of subjects to which they can be applied
seems only to be limited by the availability of adequate data. Indeed, even
relatively small samples have been analyzed via the use of bootstrapping
techniques. Outside the realm of high frequency financial models, it is unlikely
that a similar revolution in econometric time series analysis will occur in the
near future.
Notes

1 Introduction
1 Muellbauer (1983) showed that a random walk model of consumption with innova-
tions in income and interest rates can be nested in the ADL framework due to
Davidson et al. (1978). However, the tests used do not take account of the under-
lying series being non-stationary.
2 As will be discovered in the last section of chapter 6, stationarity is overly strong. In
addition, the types of model used by Sargent are excessively restrictive (Hunter
1989).
3 It should be noted that the impulse response function solved from the VAR is not
unique (Lippi and Reichlin 1994) and any findings on causality depend on the
variables in the VAR model estimated (Hendry and Ericsson 1990).
4 Keynes discusses the latent nature of expectations, the problems with dynamic
specification, measurement error, the role of forecast performance and structural
breaks.

2 Univariate and Single Equation Methods


1 The sudden drops in level of the series around 1973 and 1981 are typical of real eco-
nomic time series and can cause a problem with their statistical analysis. They are
called structural breaks, and their characterisation and impact on estimation and
inference is a major concern. See, for example, Maddala and Kim (1999).
2 For a more precise definition of a stochastic process see Banerjee et al. (1993, p. 10).
3 The use of the word ‘stationarity’ can now be understood to refer to properties that
are unchanging, hence ‘stationary’. Changing the properties that are required to be
fixed through time changes the definition of stationarity. The more that are
required to be fixed, the stricter (and more impractical) the definition.
4 Covariance stationarity is also known as weak and second-order stationarity.
5 Although of no direct interest in this book, note that correlation measures only
linear association. Concentration on linearity can be justified by a distributional
assumption of normality.
6 A series that consisted of a linear time trend as the mean plus a stationary process
would have this property. In its simplest form, this is known as a trend plus noise
model.
7 In fact, the sequence of random variables underlying the time series observations is
referred to as a stochastic process, and it is then the stochastic process that is
labelled stationary.
8 In addition, the observed data will also not be a function of the DGP alone, but also
of the observation process (Hendry 1995; and Patterson 2000) including errors and
systematic distortion due to such procedures as seasonal adjustment (Wallis 1974).
9 This definition includes the requirement of zero mean. This is not really substan-
tive, but keeps things simple. All white noise in this book is zero mean white noise.
10 The detailed theory draws a distinction between two components of a time series:
that which is perfectly predictable from its own past, called a deterministic
component, and that which cannot be perfectly predicted from its own past. A
purely non-deterministic process has no component that can be predicted from its
own past, and it is this type of series to which this abbreviated version of the
theorem refers.
11 See also Box and Jenkins (1976) and Granger and Newbold (1976).
12 In addition, $\varepsilon_t$ is uncorrelated with future values of the process, $x_{t+j}$, $j > 0$.
13 The initial values for this equation can be calculated from the process. See Hamilton
(1994), chapter 3.
14 In fact, this derivation requires the autocorrelations to be non-time varying. In
other words, equation (2.19) only applies in the stationary case. See section 2.3.7
below.
15 As described in any textbook dealing with difference equations, the other case that
has to be considered, but which is less interesting, is that where the roots are
repeated, in which case equation (2.19) has to be modified.
16 The AR(1) process $x_t = \phi x_{t-1} + \varepsilon_t$ will have one root given by $\lambda_1 = \phi^{-1}$. Substituting
this and $p = 1$ into (2.13) gives (2.11b).
17 This is, in fact, a linear trend, being a linear function of time. Higher-order polyno-
mial functions of time, such as the quadratic, are also referred to as time trends. It is
for the purposes of analogy that the linear case is used here.
18 This is not a very helpful piece of terminology as it seems to mix up the discrete
and continuous time cases. Perhaps “summed” would have been a better, if more
prosaic, choice.
19 There is another absurdity about this calculation. Although purporting to be a cor-
relation, it is clear that this quantity is not restricted to [–1, +1], for if j is large but t
small, then this quantity can fall below –1.
20 Similar arguments apply in the explosive case when the roots lie inside the unit
circle.
21 Preserving the ordering so the inverse operator is the premultiplying factor on the
right-hand side of (2.25) is not necessary in the univariate case, but is good practice
since in the multivariate case discussed in section 4.2 it is important.
22 The zero lag coefficient does not have to be 1 but it simplifies things a little to con-
sider this case, which is anyway appropriate for ARMA models.
23 As with all ACFs, $\rho_x(0) = 1$, and for all MA(1) ACFs, $\rho_x(j) = 0$ for $j > 1$, so only $\rho_x(1)$ is
considered in this illustration.
24 Strictly this applies only to cases of distinct real roots. Complex roots will occur as
complex conjugate pairs and both be replaced by their inverses in order that the
process remain real. Repetition of roots will mean that fewer new parameterisations
can be generated by inverting just one root.
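A minimal numerical sketch of the point made in notes 23 and 24, assuming the MA(1) is parameterised as x_t = e_t + theta*e_{t-1} (the sign convention and the value theta = 0.5 are assumptions made for illustration only). Replacing the MA root by its inverse leaves the first-order autocorrelation unchanged:

```python
def ma1_rho1(theta):
    """First-order autocorrelation of x_t = e_t + theta * e_{t-1} (assumed convention)."""
    return theta / (1.0 + theta ** 2)

theta = 0.5                              # hypothetical MA coefficient
rho_original = ma1_rho1(theta)           # MA root at z = -1/theta, outside the unit circle
rho_inverted = ma1_rho1(1.0 / theta)     # MA root at z = -theta, inside the unit circle

# Both parameterisations generate the same ACF, so the ACF alone cannot
# distinguish them.
print(rho_original, rho_inverted)
```

The two parameterisations are observationally equivalent in terms of second moments, which is why a normalisation is needed to pick one of them.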
25 Note also that the MA still has to be normalized so that $\theta(0) = 1$.
26 This definition deals with the case where the non-stationarity is due to a root of
z = 1. As already stated, all that is required for non-stationarity is |z| ≤ 1, so z = 1 is a
special case.
27 Unless otherwise stated it is the case that $\varepsilon_t$ is white noise and $\phi(0) = \theta(0) = 1$.
28 If the (first) differenced process does have a non-zero mean, then the undifferenced
process will possess a linear deterministic trend. In other words, although a linear
trend plus noise model (2.31) is not I(1), a random walk with drift is.
29 The key property is that, in the representation of the model using the initial values
and the summed disturbance process, the order of integration of the purely stochastic
component and the order of the polynomial time trend are the same. So, in a
random walk with drift, $x_t = x_0 + bt + \sum_{j=1}^{t} \varepsilon_j$. The time trend is first order (linear),
and $\Delta \sum_{j=1}^{t} \varepsilon_j = \varepsilon_t$ and so is I(1).
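The equivalence between the recursive and the summed representation of a random walk with drift can be checked numerically. The sketch below is illustrative only: the initial value, drift and sample size are hypothetical, and Gaussian disturbances are an assumption.

```python
import random
from itertools import accumulate

random.seed(0)
T = 200
x0, b = 1.0, 0.3                         # hypothetical initial value and drift
eps = [random.gauss(0.0, 1.0) for _ in range(T)]

# Recursive form: x_t = b + x_{t-1} + eps_t
x = [x0]
for e in eps:
    x.append(b + x[-1] + e)

# Summed form: x_t = x_0 + b*t + sum_{j=1}^{t} eps_j
summed = [0.0] + list(accumulate(eps))
x_alt = [x0 + b * t + summed[t] for t in range(T + 1)]
max_gap = max(abs(u - v) for u, v in zip(x, x_alt))

# First-differencing the summed disturbance recovers the white noise itself
d_summed = [summed[t] - summed[t - 1] for t in range(1, T + 1)]
max_diff_gap = max(abs(u - v) for u, v in zip(d_summed, eps))
print(max_gap, max_diff_gap)
```

Both gaps are zero up to floating-point rounding: the trend is linear in t, and the stochastic component needs exactly one difference to become white noise.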

30 A brief introduction may also be found in chapter 5 of Harvey (1993).


31 The tests referred to are tests of cointegration, the property that two or more time
series share a common unit root driving process.
32 In practice, this would be exacerbated by the problem of approximating an MA with
a near unit root by a finite order autoregressive process. To achieve a given level of
AR approximation to the ACF of an MA process, more AR terms will be needed as
the MA root approaches unity. See Burke (1994a) and Galbraith and Zinde-Walsh
(1993).

3 Relationships Between Non-Stationary Time Series


1 See chapter 5 for a discussion of weak exogeneity.
2 In the static case there is no distinction between the disturbances to the relation-
ship and the deviations from equilibrium because the relationship without the
disturbance is the same as the long-run solution to the model.
3 Comparing (3.14) and (3.15) it can be seen that the intercept can be included either
inside or outside the error correction term. When approached in this way, it is clear
that the true equilibrium error must include the intercept, that is if the term in the
lagged levels is supposed to represent the extent to which the system was out of
equilibrium in the previous period then it must include the intercept. However, in
general, the intercept could be divided between the constant in the equation and
the lagged levels term with the only restriction being that they sum to the appropriate
value. Thus the Error Correction Model (ECM) is sometimes written
$$\Delta y_t = \delta_0 + \beta_0 \Delta x_t - (1 - \alpha_1)\left[ y_{t-1} - \delta_1 - \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, x_{t-1} \right] + u_t, \quad \text{where } \delta_0 + (1 - \alpha_1)\delta_1 = \alpha.$$
Being unable to deduce $\delta_0$ and $\delta_1$ from $\alpha$ is an example of the identification problem.
4 David Hendry (1995) reserves the term error correction for the case where
$(\beta_0 + \beta_1)/(1 - \alpha_1) = 1$.
5 More generally, there is a steady-state growth rate for y and z to which the
equilibrium adjusts, but except for the constant this does not affect the long-run
relationship.
6 A previous footnote made reference to the identification problem in terms of the
representation of the intercept in the ECM. Notice that there is no ambiguity in
moving between an ECM with a constant outside the equilibrium correction term and
one with a constant within it. They will combine to form an intercept in the long-run solution.
7 Constant term outside the equilibrium correction term in this case, but inside in
the other two equations listed here.
8 This does not imply that reversion to some mean value may never occur, but that
the distribution of such reversions is so long tailed that the expected value does not
exist.
9 In estimating equilibrium relationships it is important to include an intercept since
failure to do so will bias the coefficient estimates of the relationship.
10 See Hamilton (1994, p. 106) for more details.
11 In general, such cancellation gives rise to an additive constant term, which depends
on the initial values of the z and y processes. This contributes to the time series
structure in the same way as summing a random walk does.
12 This approach has more obvious appeal when thinking in terms of a DGP that
might give rise to the data, since this is almost certainly going to be causal in some
way.

13 The case for general $\alpha(L)$ and $\beta(L)$ in the ADL demonstrates further the power of
the lag polynomial notation. In this case the ADL can still be written as
$\alpha(L) y_t = \mu + \beta(L) z_t + u_t$ and the equilibrium error will be $\xi_t$. Applying $\alpha(L)$ to both sides of
the long-run relationship yields
$$\alpha(L)\xi_t = \alpha(L) y_t - \mu - \alpha(L)\frac{\beta(1)}{\alpha(1)} z_t.$$
Now substituting out for $\alpha(L) y_t$ implies
$$\alpha(L)\xi_t = \mu - \mu + \beta(L) z_t - \alpha(L)\frac{\beta(1)}{\alpha(1)} z_t + u_t = \left\{ \beta(L) - \alpha(L)\frac{\beta(1)}{\alpha(1)} \right\} z_t + u_t.$$
Letting $\gamma(L) = \beta(L) - \alpha(L)\frac{\beta(1)}{\alpha(1)}$, it can be seen that $\gamma(L)$ has a unit root, as
$\gamma(1) = \beta(1) - \alpha(1)\frac{\beta(1)}{\alpha(1)} = 0$. Reparameterizing using $\gamma(L) = \gamma(1)L + \gamma^*(L)\Delta$, it
follows that $\gamma(L) = \gamma^*(L)\Delta$. Substituting this into the expression for $\alpha(L)\xi_t$ gives
$\alpha(L)\xi_t = \gamma^*(L)\Delta z_t + u_t$. If $\alpha(L)$ has all its roots outside the unit circle and $z_t$ is
at most I(1), then $\xi_t$ is stationary. As a special case, if $z_t$ is the random walk
defined by (3.54b), then $\Delta z_t$ is white noise and $\xi_t$ is ARMA($p$, $q$), where $p$ is at most
the order of $\alpha(L)$ and $q$ is at most the larger of the orders of $\alpha(L)$ and $\beta(L)$ minus
one (because the unit root has been factored out). This also shows that the closer
are any of the roots of $\alpha(L)$ to unity, the more persistent will be the equilibrium
errors. At the same time, $\alpha(1) \to 0$, so the speed of adjustment to equilibrium gets
smaller. Note, however, that in addition to the roots of $\alpha(L)$, those of the autoregressive
operator of the ARMA representation of $z_t$ will also determine the
behaviour of $\xi_t$. Thus if $z_t$ displays persistence, so will $\xi_t$, independent of the
speed of convergence.
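The argument of note 13 can be verified numerically in the simplest case. The sketch below assumes an ADL(1,1), y_t = mu + a*y_{t-1} + b0*z_t + b1*z_{t-1} + u_t, with a random walk regressor; the parameter names and values are hypothetical and chosen only for illustration. With long-run coefficient (b0 + b1)/(1 - a), the equilibrium error should satisfy xi_t - a*xi_{t-1} = (b0 - long_run)*(z_t - z_{t-1}) + u_t exactly.

```python
import random

random.seed(1)
T = 300
mu, a, b0, b1 = 0.5, 0.6, 0.2, 0.4       # hypothetical ADL(1,1) parameters
long_run = (b0 + b1) / (1.0 - a)         # long-run coefficient

z = [0.0]
u = [0.0]
y = [mu / (1.0 - a)]                     # start y at its long-run value for z = 0
for t in range(1, T + 1):
    z.append(z[-1] + random.gauss(0.0, 1.0))    # random walk regressor
    u.append(random.gauss(0.0, 1.0))
    y.append(mu + a * y[t - 1] + b0 * z[t] + b1 * z[t - 1] + u[t])

# Equilibrium error: deviation of y from the long-run relationship
xi = [y[t] - mu / (1.0 - a) - long_run * z[t] for t in range(T + 1)]

# AR operator applied to xi versus differenced regressor plus disturbance
lhs = [xi[t] - a * xi[t - 1] for t in range(1, T + 1)]
rhs = [(b0 - long_run) * (z[t] - z[t - 1]) + u[t] for t in range(1, T + 1)]
max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_gap)
```

The gap is zero up to rounding, so the equilibrium error is driven only by the stationary difference of z and the disturbance, filtered through the autoregressive operator.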
14 Exactly the same random number sequences are used in the two cases.
15 This terminology is also appropriate for any regression between I(d) variables with
disturbances that are I(d–b). However, it is generally reserved for the case where the
disturbances are stationary, since this is the case of most interest
because of its equilibrium interpretation.
16 Speeds of convergence and Op(.) are discussed in more detail in Spanos (1986,
Chapter 10); Patterson (2000, section 4.4.2) provides a brief introduction.
17 Asymptotic normality does apply if zt is strongly exogenous for the estimation of b,
that is, it is both weakly exogenous and zt is not Granger caused by yt.
18 If regressions involving not only I(1) but also I(2) variables are being considered,
then the critical values of the tests must be further adjusted. The tests are still of the
null that the disturbances are I(1) against the alternative that they are I(0), thus it is
assumed that any I(2) processes are cointegrating to I(1). Haldrup (1994) discusses
this problem and presents appropriate critical values.
19 The common factor restriction for autoregressive models in the error is discussed
by Hendry and Mizon (1978). The ADF test applied to the cointegration case is a
transformation of such autoregressive behaviour in the residual associated with the
common factor restriction. The effect of such restrictions on ADF and ECM tests of
cointegration is considered in Kremers et al. (1992).

4 Multivariate Time Series Approach to Cointegration


1 Without explaining or deriving the origin of the vector white noise process, this
equality is best interpreted as meaning the autocorrelation structures of the
processes on each side of the equation are the same.
2 This is an important point. Although the example has not explicitly applied the SM
form, this is in fact being used. The rationality of the VAR operator means that the
SM reparameterization can be applied.
3 Accounts of the Smith–McMillan form only use the term “matrix polynomial” or
“polynomial matrix” for matrices with finite order scalar polynomial elements.
 i , j ( L)
4 The roots of the i,j (L) are called the poles of the rational polynomial . Since
i, j ( L)
these are elements of C (L), their poles, for i,j = 1,2, …, n are called the poles of C(L).
5 In addition, the Wold representation requires the coefficient matrices of the VMA
to converge in the sense that
$$\lim_{g \to \infty} \sum_{i=1}^{g} C_i C_i'$$
must exist. This would not be the case if there were any poles on or inside the unit
circle. That is, the only way a rational VMA form can be consistent with the Wold
representation is for the operator to have all poles outside the unit circle.

6 If the jth diagonal element of $D^*_{2,2}(L)D(L)$ is
$$d^*_{2,2,j}(L)(1 - L)^{d_{n-r+j}} = (1 - L),$$
where $d^*_{2,2,j}(L)$ is the jth diagonal element of $D^*_{2,2}(L)$, then it follows from (4.38) that
$$d^*_{2,2,j}(L) = (1 - L)^{1 - d_{n-r+j}}.$$
Since negative powers of $\Delta = (1 - L)$ are not defined, $1 - d_{n-r+j} \geq 0$, but, as is stated
above, $d_{n-r+j} \geq 1$. This implies $d_{n-r+j} = 1$ for $j = 1, 2, \ldots, r$, and $d^*_{2,2,j}(L) = 1$.
7 Clearly equation (4.40) will not hold for all C(L). It implies conditions on C(L).
These are not discussed here.
8 The equation $A(L)C(L) = C(L)A(L) = \Delta I_n$ expresses very clearly the extent to
which this process is not inversion. If $A(L)$ were the inverse of $C(L)$ then the relationship
would be $A(L)C(L) = C(L)A(L) = I_n$. Instead, the inversion is only up to a
scalar factor of $\Delta$, so is a form of partial inversion, where all factors apart from $\Delta$ are
cancelled.
9 Mathematically, this is written |A(z)| = 0 ⇔ |z| > 1 or z = 1.
10 The complete theorem is theorem 4.2, p. 49 of Johansen (1995a). The statement of
this theorem is rigorous, and rather than simply refer to the I(0) property of the
cointegrating combinations of the variables and of their difference, it refers to the
condition required so that initial distributions may be given such that the processes
are I(0). The reason for this is that the definition of stationarity used by Johansen
(1995, p. 14) is such that despite the parametric condition, only specific manifesta-
tions of the initial values will deliver stationarity. However, without the parametric
condition, none would suffice.
11 Johansen does not do this, but leaves in the initial values. They are reduced to zero
later on in the proof anyway, by pre-multiplication.
12 See Johansen (1995a), theorem 2.2, p. 14.

13 The expression for this determinant is found on p. 51 of Johansen (1995). The addi-
tional step of factoring out the unit root term is achieved using the formula for the
determinant of partitioned matrix provided by Dhrymes (1984, p. 37).
14 Johansen’s theorem 2.2 establishes that the necessary and sufficient condition for a
VAR to be stationary is that all the roots lie outside the unit circle.
15 The nature of the projection matrices is such that C may also be written C =
⊥(′⊥ ⊥)–1′⊥.
16 The method of maximum likelihood is not discussed here, although its relevance is
described in Appendix C. For an introduction see Patterson (2000) or Sargan (1988).
17 Condition (4.59) is not usually considered in applied work. Instead, the series are
individually tested to confirm that they are I(1).
18 This result is known as the Frisch–Waugh theorem. See, for example, Davidson and
MacKinnon (1993, p. 19).
19 There are a number of standard programmes that can be used to solve eigenvalue
problems; Doornik (1995) prefers the singular value decomposition, which limits
the problem to a solution in terms of positive/negative semi-definite matrices.
20 Note that since $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{n-1} \geq \lambda_n \geq 0$, then
$\lambda_j = 0 \Rightarrow \lambda_i = 0$, $i = j, \ldots, n$.
21 Note that the trace statistics can be written as the sum of a series of max statistics:
$$\lambda_{\mathrm{trace}}(j - 1) = \sum_{i=j}^{n} \lambda_{\max}(i - 1).$$
22 We will also see later that dummy variables and stationary variables may be
included in the VECM and the number of these that are included also affects the
critical values. This type of sensitivity is typical of tests and estimation procedures
involving non-stationary processes.
23 This is not strictly correct since the rejection of the previous null was achieved
using a different test. However, this is the way the non-rejection of the null would
be interpreted in a sequential testing procedure, so it is stated as the null for conve-
nience.
24 In addition, it is the last test of the sequence that examines whether the data is sta-
tionary or I(1), yet this is in practice a property that is pre-tested using unit root
tests. That is, this is not the last but the first specification issue to be decided.
25 We would like to thank Paul Fisher and Ken Wallis for providing us with the data.
26 Hence, for r = 2, $\alpha$ and $\beta$ are 6 × 2 dimensioned matrices.
27 For $i = 1$, with $\lambda_1 = .0827$, $T = 60$, the max test is $\lambda_{\max}(1) = -T\log(1 - \lambda_1) = -60\log(1 - .0827) \approx 5.18$,
and for $i = 2$, $\lambda_{\max}(2) = -T\log(1 - \lambda_2) = 8.08$. The trace test is the
sum of the max tests and for $i = 2$, $\lambda_{\mathrm{trace}}(2) = 5.15 + 8.08 = 13.41$.
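The max statistic quoted in this note can be reproduced directly from its definition. The sketch below uses natural logarithms; the function names are hypothetical, and the eigenvalue and sample size are the ones given in the note.

```python
import math

def lambda_max(eigenvalue, T):
    """Maximal eigenvalue statistic: -T * log(1 - eigenvalue)."""
    return -T * math.log(1.0 - eigenvalue)

def trace_from_max(max_stats):
    """Trace statistic as the sum of the supplied max statistics."""
    return sum(max_stats)

stat = lambda_max(0.0827, 60)
print(round(stat, 2))
```

With an eigenvalue of .0827 and T = 60 the statistic is approximately 5.18, as stated.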
28 When the small sample adjustment due to Reimers (1994) is used to test whether
there are r = 4 or more cointegrating vectors, then the revised test statistic is 14.2
and the test is marginally rejected at the 5% level. The test adjusts for the number
of observations by correcting for degrees of freedom, but this corrected statistic is
not necessarily any more reliable than the Johansen test statistic. More specifically,
it is known that shift dummies will alter the distribution of the test statistic
(Johansen 1995), while centred seasonal dummies do not. However, the critical
values used here are based on T = 50 and are again taken from Franses (1994).
29 The one-step Chow test is based on recursive estimation starting with an initial
sample of M – 1 observations and then re-estimated over samples M, M + 1 … T.
Here M = 50 and T = 74. To give a perspective on the choice of the initial sample,
following the Sargan rule for model parameterization, $k < T/3$. The minimum sample
when k = 18 (a VAR has 8 constants/dummies and 2 × 5 lag coefficients) is
n = 3k = 54. For simplicity the recursive estimates were derived from M = 50 observations,
but the first four calculations must in each case be viewed with caution.
30 The Cauchy is generated by a ratio of normals. Where nominal variables and prices
are normally distributed then their ratio would not converge in distribution to
normality.
31 In practice, C(1) is not always of rank n – r, so that there may be insufficient zero
roots. When C(L) has n – r zero roots, then C(z) = C0(z)C1(z) and C1(z) is of degree
q – 1. If there are insufficient zero roots, this can be rectified by extending the poly-
nomial. Consider zC*(z), then zC*(z) = znC*(z) and this extension introduces n addi-
tional null roots. For the extended model C*(z) = C0(z)C1(z) where C1(z) is of degree
q and C0(z) is defined above.
32 Hunter and Simpson (1995) suggested that the system should be re-ordered on the
basis of tests of weak exogeneity.

5 Exogeneity and Identification


1 I would like to thank Graham Mizon for his discussion of this issue.
2 For the cointegrating exogenous case in Hunter (1992a) r1 = 1 and n1 = 3.
3 Hunter (1992a) uses the same data set as Fisher et al. (1990).
4 Hence, $\alpha$ and $\beta$ are 6 × 2 dimensioned matrices.
5 The restrictions for SE of the oil price are ij = 0 and ij = 0 for j = 1, 2.
6 In fact the same test can be used by re-running the Johansen procedure with the
variable to be tested for WE being placed first.
7 The restrictions for CE are 51 = 0, 61 = 0 and j2 = 0 for j = 1, …, 4.
8 Johansen (2002) provides a correction factor when  is known for linear restrictions
of the form  = H. This appears to work well when the correction is less than 2,
which generally implies that there are about 100 observations in the sample available
for estimation. The correction factor used to weight the Likelihood Ratio test is:
$$1 + \frac{1}{T}\left[ (n_d + n_D + kn) + \frac{1}{2}(n + 1 + s - r) \right] + \frac{1}{Tr}\left[ (n - 2r + s + 2n_D - 1)v + 2(c + c_d) \right].$$
However, this is not trivial as $c$, $c_d$ and $v$ need to be evaluated. For the case where
there are no higher-order dynamics, the non-stationary series are all random walks
and r = 1, then:
 ′(1 +  ′)
c = −2
 ′ ′ −1
 ′(2 +  ′)
cd = − = v.
 ′ ′ −1
Otherwise, cd ≠ v and these terms are derived from the trace of products of the
matrices in (L), ′, ,  and –1. Calculations not readily available in existing
software.
9 The WE restrictions for the model in Hunter (1992) are i1 = 0 for i = 4, 5, 6 and
j2 = $j142 + $j252 + $j362 for j = 1, 2, 3.

10 Strong exogeneity augments the sub-block WE restriction above by i2 = 0 for i = 1, 2, 3.
11 If an equation has exactly ji = n1 – 1 restrictions, then it has enough restrictions to
be exactly identified. When ji > n – 1, it has enough restrictions to be over-identified, but
without the appropriate number of restrictions it will be under-identified or not identified.
12 The hessian is the second derivative of the likelihood, which provides an estimate
of the variance–covariance matrix of the parameters. If some parameters are ill-
defined, then the likelihood is flat and the hessian matrix singular. Then some para-
meters in the model are not identified. Perfect multi-collinearity is a special case of
this and it occurs when two or more variables are related and their parameters
cannot be independently estimated and as a result are not identifiable.
13 A number of authors have considered this issue, including Hunter and Simpson (1995) and
Juselius (1995).
14 Usually, two equations or blocks of equations with different parameterizations have
the same value for their likelihoods. In general, there exists at least one model with
r exactly identifying restrictions per equation with a likelihood value, the same as
the unrestricted likelihood.
15 The approach described here was first outlined for the I(1) case in Hunter and
Simpson (1995).
16 This does not account for multi-cointegration (Granger and Lee 1989) and polyno-
mial cointegration (Yoo 1986), which does introduce lags into the long-run rela-
tionships.
17 The additional restriction is required to solve for all of the parameters, and of
the eight restrictions implied by
$$\beta_r' = \begin{bmatrix} \beta_{11} & \beta_{21} & -\beta_{21} & -\beta_{21} & 0 & \beta_{61} \\ 0 & 0 & 0 & 0 & \beta_{52} & -\beta_{52} \end{bmatrix}$$
only six are binding. The test associated with this structure for $\beta$ is $\chi^2_6 = 6.8291$, which is
accepted at the 5% level based on a p-value = [0.3369].
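The p-value reported in this note can be checked without statistical software, because the upper-tail probability of a chi-squared variate has a closed form when the degrees of freedom are even. The sketch below assumes the statistic 6.8291 is referred to a chi-squared distribution with 6 degrees of freedom; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variate; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    k = df // 2
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

p_value = chi2_sf_even_df(6.8291, 6)
print(round(p_value, 4))
```

The result agrees with the reported p-value to the four decimal places given, so the restriction is not rejected at the 5% level.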
18 For every row and column of $\Pi$ selected, there is an equivalent r dimensioned sub-matrix
of $\alpha$ and $\beta$. To determine an appropriate orientation of the system the sub-matrices
selected need to be of full rank.
19 In the case where more complex restrictions apply, then the general restriction con-
dition and procedure in Doornik and Hendry (1996) apply.
20 Here  is identifiable for the restrictions in (I) when the selected columns of  yield
a matrix A of rank r.
21 For n = 4, a more complex example, the approach discussed above can be shown to
identify. Let:
$$\beta' = \begin{bmatrix} a & 0 & b & c \\ d & e & f & 0 \end{bmatrix} \quad \text{and} \quad B = H_2 = \begin{bmatrix} a & 0 \\ d & e \end{bmatrix}.$$
Following Boswijk (1996), identifiability is lost when a normalization is invalid
(i.e., $a = 0 \Rightarrow \mathrm{rank}(H_2) < r$), but with this new restriction $[\alpha : \beta]$ is over-identified as
$j = 3 > r^2 - r$. Selecting a new orientation, ensuring the generic result associated with
Theorem 9 holds, then:
$$\beta'_{(1)} = \begin{bmatrix} 0 & 0 & b & c \\ d & e & f & 0 \end{bmatrix} \quad \text{and} \quad B_{(1)} = \begin{bmatrix} b & c \\ f & 0 \end{bmatrix}.$$
This orientation is rejected when $x_t \sim I(1)$, $f = 0$ and $\beta$ is not identifiable. But the
following orientation for $x_t \sim I(1)$ implies:
$$\beta'_{(2)} = \begin{bmatrix} 0 & 0 & b & c \\ d & e & 0 & 0 \end{bmatrix}, \quad B_{(2)} = \begin{bmatrix} 0 & b \\ e & 0 \end{bmatrix}$$
and rank(B) = r. Now, $[\alpha : \beta_{(2)}]$ is always empirically identified and identifiable.

22 The matrices ij and ij have the dimensions ni × rj for i = 1, 2 and j = 1, 2. For
example, the matrix  is partitioned into two blocks of columns, .1 of dimensions
n × r1, and .2 of dimensions n × r2, then each block is itself cut into two blocks of
rows.
23 In the limit there are r such sub-blocks, which leads to the identification case
I 
considered by Boswijk (1992) where  =  .
0 
24 The original source of the data is the National Institute of Economic Research; the
data were kindly passed on to us by Paul Fisher and Ken Wallis.
25 The model in Hunter (1992a) is massively over-identified. It is possible to identify
subject to restrictions on both  and . Here we will concentrate on identification
from  alone.
26 The discovery of four valid solutions implies that the model has four over-
identifying restrictions.
27 If the determinant is tested for any sub-matrix of  then it is found that no such
combination with non-zero determinant appears to exist.

6 Further Topics in the Analysis of Non-Stationary Time Series


1 Many series are bounded to lie in the range [0, 1], as is the case for interest
rates. The question of non-stationarity in this context is further complicated by the
notion of what an extreme value might be. Maybe one should consider the performance
of bond prices from which the rate of return of the safe asset is derived.
Then again the non-stationarity may be a function of the process of aggregation or
the pricing formula. In practice all models are not identified, the models estimated
are always approximations and the modeller's task is to limit the degree of
non-identification (Sargan 1983a).
2 Here our analysis is restricted to the case where trends are possible via unrestricted
intercepts in the conventional cointegration analysis ( 1 ≠ 0), but there are no
quadratic trends. Otherwise, the second step of the I(2) estimator has a restricted
intercept ( 2 = 0). This is the case considered by Johansen (1995) and, unlike
Paruolo (1996), it restricts our discussion to a single table. In the empirical example
considered by Paruolo (1996), he concludes that the selection of results associated
with 1Qr,s is quite consistent when a pre-analysis of the data suggests that there are
trends in the differences ( 1 ≠ 0), but not the second differences ( 2 = 0).
3 For the example considered by Paruolo inference progressed in a straightforward
manner, by sequentially moving past each test statistic a table at a time. For the
case considered here, the progress is more complicated, even when one only con-
siders the table of tests associated with 1 ≠ 0.
4 This is an extension of the I(1) case where $\alpha\xi\xi^{-1}\beta' = \alpha^*\beta^{*\prime}$, meaning that the estimated
loadings and cointegrating vectors are not distinguished from any non-singular
matrix product. That is, $[\alpha, \beta']$ and $[\alpha^*, \beta^{*\prime}]$ are observationally equivalent. Now
this problem is further complicated in the I(2) case.
5 The values of d estimated are found to be sensitive to the bandwidth m. A common
assumption made in the literature on evaluating standard errors in cointegrating
regressions is to set the bandwidth to a third of the sample, $m = T/3$. Alternatively,
Henry and Robinson (1996) provide some methods for the selection of m.
(L)
6 More generally C(L) = —— (L)
with the roots to the two finite polynomials all lying
outside the unit circle.
7 In both cases, the lag specification of the models used is that known from the data
generation process (DGP) and not determined empirically at each replication.
This can be expected to improve the performance of long-run forecasts, but not
necessarily the short-run forecasts, as the sample dynamics may be better described
by some other order than that used to generate the data.
8 Roots in the paper are the reciprocals of those normally reported, thus a root less
than one in modulus is a stationary root. On this basis the roots of the process are,
respectively: {0.5, 0.5, 0.5, 0.5}, {0.5, 0.5, 0.95, 0.95}, {0.5, 0.5, 0.99, 0.99}, {0.5, 0.5,
1.0, 1.0}, {1.0, 1.0, 1.0, 1.0}.
9 An alternative study would be one based on perturbations of the cointegrated
model, model 4, that retained the common feature, but moved it from being at the
unit root to being further outside the unit circle. This would mean that the
processes became stationary, and more solidly so, but retained the reduced rank
property key to cointegration. In this way, it is possible to isolate two aspects of the
problem with potentially different impacts: stationarity and common features
(reduced rank).
10 This conclusion can also be drawn by comparing the scaling on the vertical axes of
Figures 6.1a–6.1e, whence it will be seen that much the smallest scale is employed
in Figure 6.1c.
11 Figure 6.1d also shows clearly that, in this case, under-specification of the co-
integrating rank is not harmful to forecast performance (including imposing unit
roots), whereas over-specification leads to a deterioration in forecasting perform-
ance.
12 Though they do not establish whether it is the imposition of any false restriction
that matters, or that of unit roots in particular. This is the point made by Clements
and Hendry. They also do not consider if the near unit root is a common feature, or
if restricting it to being so would be advantageous.
13 The information criterion can also be written in terms of the eigenvalues of the
underlying problem, and hence in terms of the test statistics.

7 Conclusions
1 Johansen (2002b) provides a small sample correction to the rank test for cointegra-
tion r = 0 and r = 1. The correction factors are difficult to calculate, but based on the
simulation results there can be considerable benefits to their use. Based on the study
of a four-variable model of Danish Money, the critical values are adjusted by anything
between 1.14 and 1.07 for T = 50, 100. For the empirical results in section 4.6.2
such adjustments would not affect the conclusions associated with the trace test for
r = 0 and r = 1. Quite clearly such an adjustment might alter our conclusions when
r > 1. Even so, the critical values used here were taken from Franses (1994), which
assumed T = 50. Further, wrong rejection of the null might not be of paramount
importance when over-rejection of the alternative of cointegration is what is critical
to the applied researcher. Hence, were the true size of the test 10%, then over-rejec-
tion of the null might not be a problem, but cases where the size is considerably
larger ought to be avoided. In particular, test properties are likely to be very poor
when some series are I(2), because conventional tests of cointegration require all the
series in the VAR to be no more than I(1). When there are I(2) series in the VAR, this
violates the necessary and sufficient condition required for the cointegrating rela-
tionships to exist. Johansen shows that the correction increases in line with the true
size of the test as the series tend to become I(2) and in the limit non-cointegration is
always rejected. The reader is referred back to sections 4.4.2 and 4.6.2.
2 For further discussion of such issues the reader is directed to Granger and Hallman
(1991), Granger (1995) and Corradi et al. (2000).

3 As is mentioned in Haug (1996), non-Gaussianity ought not to be crucial, as long as
sums of the residual vector of the Johansen VAR can be approximated by vector
Brownian motion (Johansen 1991, Appendix E).
4 From the point of view of an analysis of an economic system, it is a moot point
whether it is desirable to infer equilibrium where, although it exists, it does not have
fixed long run coefficients or where the rate of adjustment towards equilibrium
varies.
5 And further, the technique would not illuminate a situation where the adjustment
coefficients and the cointegrating vectors change over time, but in such a way that
the $\Pi$ matrix remains constant. This would correspond to a situation where the
nature of the equilibrium relationships was developing, but being compensated for
by changing adjustment coefficients.

Appendix A
1 Notice that in this simple case the (2,2) element of C3(L) is |C(L)|.

Appendix C
1 The eigenvalue problem is solved with respect to both  and  under some of the
restrictions considered in chapter 5, while the likelihood associated with general
restrictions, applied to both  and , is presented in Appendix F.

Appendix E
1 This statement implies that, for u > 0, w(u) ~ N(0, u).
2 For $x > 0$, $x$ can be written $x = X + \delta$, where $X$ is a non-negative integer and
$0 \leq \delta < 1$. Then $\langle x \rangle = X$.
3 To be more precise, let X be a random variable and x represent a value taken by X.
Also, let XT be a sequence of random variables. Let the distribution function of X be
F(.) and that of XT be FT(.). Then FT(.) is said to converge weakly to F(.) if
FT ( x) = Pr( XT ≤ x) → Pr( X ≤ x) = F ( x) as T → ∞.

4 For a proof of this result, see McCabe and Tremayne (1993, chapter 8).
5 See Johansen (1995, p. 151) for details.
6 Weak convergence, in contrast, indicates the convergence of one random variable
to another.
p
7 Technically  → 0, or, equivalently,  is said to be op(1).
8 For details, see Johansen (1995, p. 158).
9 These generalizations break down the residual product moment matrices in terms of
components in the cointegrating space and orthogonal to it.
10 Davidson (1994) provides detailed discussion of different types of stochastic
convergence.
11 Pesaran, Shin and Smith (2000) extended this set up by allowing exogenous I(1)
variables, which distorts the distributions.
12 MacKinnon, Haug and Michelis (1999) find that using Monte Carlo simulations
based on 400 observations leads to quite inaccurate results, especially when n – r is
large. They use a response surface estimated across a range of experiments using different
sample sizes. This method calculates the relevant percentile, say the 95th,
appropriate for a test of 5%, for each set of Monte Carlo experiments using a particular
DGP, and regresses this on the characteristics of the DGP. In the simplest form
the regressors are an intercept and powers of the reciprocal of the sample
size, such that the estimated intercept is the estimated asymptotic critical value of
the test. Critical values for other sample sizes are obtained by using the estimated
regression to predict, substituting the relevant value for T. This approach is also used
in MacKinnon (1991) for unit root and residual based cointegration tests.
13 Asymptotic tests are those based on finite samples of data but using asymptotic
critical values.

Appendix G
1 The normalization adopted by Hunter and Simpson implies that the first vector is an
inflation equation, the second an exchange rate equation, the third a terms of trade
or real exchange rate equation and the fourth a real interest rate equation.
Appendix A: Matrix Preliminaries

A.1 Elementary row operations and elementary matrices


In what follows, the word row can be replaced by the word column to define an
elementary column operation. There are three types of elementary row operation:

(i) The interchange of two rows.
(ii) Multiplication of one row by a constant.
(iii) Addition to one row of another row multiplied by a polynomial.

A left (right) elementary matrix is a matrix such that, when it multiplies from the left
(right) it performs an elementary row (column) operation. The matrix formed from the
product of such matrices therefore performs the same transformation as a sequence of
such row (column) operations. For example, consider the use of row and column
operations to diagonalize the 2 × 2 finite order polynomial matrix,

$$C(L) = \begin{bmatrix} 1 - \tfrac{3}{4}L & -L \\ -\tfrac{1}{8}L & 1 - \tfrac{1}{2}L \end{bmatrix}. \tag{A.1}$$

Row operation 1 (objective to alter the (1,1) element to unity): replace row 1 by row
1 minus 6 times row 2. This can be achieved by pre-multiplication by the matrix

$$\begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix}.$$

The new matrix is

$$C_1(L) = \begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 - \tfrac{3}{4}L & -L \\ -\tfrac{1}{8}L & 1 - \tfrac{1}{2}L \end{bmatrix} = \begin{bmatrix} 1 & -6 + 2L \\ -\tfrac{1}{8}L & 1 - \tfrac{1}{2}L \end{bmatrix}. \tag{A.2}$$

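Row operation 2 on C1(L) (objective to alter the (2,1) element to zero): replace row 2 by row 2
plus (1/8)L times row 1. This can be achieved by pre-multiplication by the matrix

$$\begin{bmatrix} 1 & 0 \\ \tfrac{1}{8}L & 1 \end{bmatrix}.$$

The new matrix is

$$C_2(L) = \begin{bmatrix} 1 & 0 \\ \tfrac{1}{8}L & 1 \end{bmatrix} \begin{bmatrix} 1 & -6 + 2L \\ -\tfrac{1}{8}L & 1 - \tfrac{1}{2}L \end{bmatrix} = \begin{bmatrix} 1 & -6 + 2L \\ 0 & 1 - \tfrac{5}{4}L + \tfrac{1}{4}L^2 \end{bmatrix}. \tag{A.3}$$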

Column operation 1 on C2(L) (objective to alter the (1,2) element to zero): replace column
2 by column 2 minus (2L – 6) times column 1. This can be achieved by post-multiplication
by the matrix

$$\begin{bmatrix} 1 & -(2L - 6) \\ 0 & 1 \end{bmatrix}.$$

The new matrix is

$$C_3(L) = \begin{bmatrix} 1 & -6 + 2L \\ 0 & 1 - \tfrac{5}{4}L + \tfrac{1}{4}L^2 \end{bmatrix} \begin{bmatrix} 1 & -(2L - 6) \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 - \tfrac{5}{4}L + \tfrac{1}{4}L^2 \end{bmatrix}.\,{}^{1}$$

The elementary matrices of the row operations can be multiplied together (retaining the
order of multiplication) as

$$G(L) = \begin{bmatrix} 1 & 0 \\ \tfrac{1}{8}L & 1 \end{bmatrix} \begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -6 \\ \tfrac{1}{8}L & 1 - \tfrac{3}{4}L \end{bmatrix}$$

and writing the elementary matrix of the column operation as

$$H(L) = \begin{bmatrix} 1 & -(2L - 6) \\ 0 & 1 \end{bmatrix}$$

the diagonalized matrix may be written

$$C_3(L) = G(L)\,C(L)\,H(L).$$
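
The diagonalization can be checked mechanically. The following is a minimal sketch, not part of the original text: polynomials in L are stored as lists of exact fractions, constant term first, and the helper names (`trim`, `padd`, `pmul`, `matmul`) are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction as F

def trim(p):
    """Drop trailing zero coefficients, keeping at least the constant term."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def padd(p, q):
    """Add two polynomials in L stored as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def pmul(p, q):
    """Multiply two polynomials in L."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def matmul(A, B):
    """Multiply two matrices whose entries are polynomials in L."""
    result = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            acc = [F(0)]
            for h in range(len(B)):
                acc = padd(acc, pmul(A[i][h], B[h][j]))
            row.append(acc)
        result.append(row)
    return result

one, zero = [F(1)], [F(0)]
# C(L) from (A.1), G(L) and H(L) as derived in the text
C = [[[F(1), F(-3, 4)], [F(0), F(-1)]],
     [[F(0), F(-1, 8)], [F(1), F(-1, 2)]]]
G = [[one, [F(-6)]],
     [[F(0), F(1, 8)], [F(1), F(-3, 4)]]]
H = [[one, [F(6), F(-2)]],   # -(2L - 6) = 6 - 2L
     [zero, one]]

C3 = matmul(matmul(G, C), H)
print(C3)  # diagonal, with (2,2) element 1 - (5/4)L + (1/4)L^2
```

Exact fractions are used rather than floating point so that the off-diagonal elements cancel to exactly zero.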

A.2 Unimodular matrices


Consider the matrices defining the elementary operations in the previous example. Note
that both G(L) and H(L) are matrix polynomials. In general, the determinant of a
polynomial matrix would be a polynomial in L. But in this case

$$|H(L)| = \begin{vmatrix} 1 & -(2L - 6) \\ 0 & 1 \end{vmatrix} = 1$$

and

$$|G(L)| = \begin{vmatrix} 1 & -6 \\ \tfrac{1}{8}L & 1 - \tfrac{3}{4}L \end{vmatrix} = 1$$

and so are not functions of L. Furthermore, because the determinant is non-zero, the
matrices are invertible. Such matrices are known as unimodular matrices (having constant
non-zero determinant). Usefully, all elementary matrices are unimodular and so therefore
is the product of two or more elementary matrices. It is therefore possible to invert
the transformation and express C(L) in terms of C3(L) as

$$C(L) = G(L)^{-1}\,C_3(L)\,H(L)^{-1}.$$

A.3 Roots of a matrix polynomial


Let A(L) be an n by n matrix polynomial of order p, and let |A(L)| be its determinant.
Then z is a root of A(L) if |A(z)| = 0. The maximum number of roots possible is np.
For example:

$$A(L) = \begin{bmatrix} 1 - \tfrac{3}{4}L & -L \\ -\tfrac{1}{8}L & 1 - \tfrac{1}{2}L \end{bmatrix},$$

so

$$|A(z)| = \left(1 - \tfrac{3}{4}z\right)\left(1 - \tfrac{1}{2}z\right) - \tfrac{1}{8}z^2 = 1 - \tfrac{5}{4}z + \tfrac{1}{4}z^2 = (1 - z)\left(1 - \tfrac{1}{4}z\right),$$

and the roots are therefore z = 1 and z = 4.
An important special case is that of a unit root. If A(L) has a unit root then |A(1)| = 0,
that is A(1) is singular. This is the case in the example above, as can be seen by putting z = 1.
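
As an illustration of the root calculation, the sketch below (an assumption-laden example, not taken from the text) forms the determinant of the 2 × 2 example as a coefficient list and solves the resulting quadratic. The helper names `pmul` and `psub` are hypothetical, and the quadratic formula is used only because the determinant here happens to be of order two.

```python
from fractions import Fraction as F
import cmath

def pmul(p, q):
    """Multiply two polynomials stored as coefficient lists [c0, c1, ...]."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def psub(p, q):
    """Subtract q from p, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

# A(z) from the example: entries are polynomials in z, constant term first
A = [[[F(1), F(-3, 4)], [F(0), F(-1)]],
     [[F(0), F(-1, 8)], [F(1), F(-1, 2)]]]

# |A(z)| = a11*a22 - a12*a21 for a 2 x 2 matrix
det = psub(pmul(A[0][0], A[1][1]), pmul(A[0][1], A[1][0]))

# Solve c + b z + a z^2 = 0 with the quadratic formula
c, b, a = (float(x) for x in det)
disc = cmath.sqrt(b * b - 4 * a * c)
roots = sorted([((-b - disc) / (2 * a)).real, ((-b + disc) / (2 * a)).real])

# A unit root is present exactly when |A(1)| = 0, i.e. the coefficients sum to zero
has_unit_root = sum(det) == 0
print(det, roots, has_unit_root)
```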
Appendix B: Matrix Algebra for Engle
and Granger (1987) Representation

B.1 Determinant/adjoint representation of a polynomial matrix


Consider the matrix inverse, $A^{-1} = A^a/|A|$, where $A^a$ is the adjoint matrix of A. Let z be a
scalar complex number. The corresponding condition for the existence of the inverse of
a square polynomial matrix G(z) is that |G(1)| ≠ 0. If this is satisfied, then the inverse
polynomial may be written

$$G^{-1}(z) = G^a(z)/|G(z)|. \tag{B.1}$$

In particular, let G(z) be an n × n polynomial matrix with $G(0) = I_n$. That is,

$$G(z) = I_n - \sum_{i=1}^{m} G_i z^i$$

where the $G_i$, i = 1, 2, …, m are n × n coefficient matrices. Then, denoting the
determinant of G(z) by |G(z)| and its adjoint by $G^a(z)$:

$$G^a(z)\,G(z) = |G(z)|\,I_n. \tag{B.2}$$

Note that $G^a(z)$ is an n × n polynomial matrix of order at most m × (n – 1), and |G(z)| is a
scalar polynomial of order at most m × n.

B.2 Expansions of the determinant and adjoint about z ∈ [0, 1]


The proof of this lemma may be found in Engle and Granger (1987). Consider the
case where G(z) may be of reduced rank at z = 0, and the expansion

$$G(z) = G(0) + zG^*(z).$$

Now, rank(G(0)) = n – r, 0 ≤ r ≤ n and z ∈ [0, 1]. For the case considered here $G^*(0) \neq 0$
and the determinant of the polynomial in z is:

$$|G(z)| = z^r g(z),$$

where

$$g(z) = \sum_{i=0}^{a} g_i z^i, \qquad a \le (m \times n) - r,$$

$g_i$ being scalar coefficients, and the adjoint polynomial is

$$G^a(z) = z^{r-1} H(z),$$

where

$$H(z) = \sum_{i=0}^{b} H_i z^i.$$

It follows that the index on the sum is limited by b = (m × [n – 1]) – r + 1 with $H_i$ being
n × n coefficient matrices. If G(z) is originally of infinite order, then a and b are also
infinite.

B.3 Drawing out a factor of z from a reduced rank matrix polynomial


It is possible to extract a factor of z from a matrix polynomial G(z) for the singular case
where G(0) is a reduced rank matrix. If G(0) is singular then r ≥ 1, and substituting
out for the adjoint and the determinant of G(z) from (B.2) gives

$$G^a(z)\,G(z) = z^{r-1} H(z)\,G(z) = |G(z)|\,I_n = z^r g(z)\,I_n.$$

Dividing left and right by $z^{r-1}$ and rearranging the polynomials in z gives

$$H(z)\,G(z) = z\,g(z)\,I_n. \tag{B.3}$$

Pre-multiplying G(z) by H(z) extracts a factor of z and reduces the expression to a scalar
diagonal form.

Application to lag polynomial to draw out a unit root factor
Let A(L) be an n × n lag polynomial matrix of order m. This may be written instead as a
polynomial of order m in Δ = (1 – L) using 1 – Δ = L so that

$$A(L) = I_n - \sum_{i=1}^{m} A_i L^i = I_n - \sum_{i=1}^{m} A_i (1 - \Delta)^i.$$

By application of the binomial expansion to the terms $(1 - \Delta)^i$ this can be shown to be a
polynomial of order m in Δ. For easy application of equation (B.3), let A(L) ≡ G(Δ):

$$A(L) = I_n - \sum_{i=1}^{m} A_i L^i = I_n - \sum_{i=1}^{m} A_i (1 - \Delta)^i = G(\Delta).$$

Now consider the z transform by setting z = Δ:

$$G(z) = I_n - \sum_{i=1}^{m} A_i (1 - z)^i.$$

Now consider G(z) evaluated at the zero frequency:

$$G(0) = I_n - \sum_{i=1}^{m} A_i (1 - 0)^i = I_n - \sum_{i=1}^{m} A_i = A(1).$$

It is also important to recall that in the reduced rank case G(0) must be singular, as is
A(1). Assuming this condition to be satisfied, replace z in equation (B.3) by Δ to
give

$$H(\Delta)\,G(\Delta) = \Delta\,g(\Delta)\,I_n. \tag{B.4}$$

Both H(Δ) and g(Δ) may be written as polynomials of L (of unchanged order), say $\tilde{H}(L)$
and $\tilde{g}(L)$ respectively, and so (B.4) may be written

$$\tilde{H}(L)\,A(L) = \Delta\,\tilde{g}(L)\,I_n. \tag{B.5}$$

Equation (B.5) states that pre-multiplying A(L) by $\tilde{H}(L)$ results in a scalar diagonal lag
polynomial matrix with a scalar factor in the difference operator Δ.
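
A small numerical check of (B.5) may help. The sketch below is an illustration under stated assumptions, not the authors' own example: it reuses the 2 × 2 matrix A(L) of Appendix A.3, for which A(1) has rank 1, so r = 1 and the adjoint itself can serve as $\tilde{H}(L)$, with $\tilde{g}(L) = 1 - \tfrac{1}{4}L$. The helper names are hypothetical.

```python
from fractions import Fraction as F

def trim(p):
    """Drop trailing zero coefficients, keeping at least the constant term."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def padd(p, q):
    """Add two polynomials in L stored as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def pmul(p, q):
    """Multiply two polynomials in L."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def neg(p):
    """Negate a polynomial."""
    return [-a for a in p]

# A(L) from Appendix A.3 (assumed here as the worked example)
a11, a12 = [F(1), F(-3, 4)], [F(0), F(-1)]
a21, a22 = [F(0), F(-1, 8)], [F(1), F(-1, 2)]

# Adjoint of a 2 x 2 matrix: swap the diagonal, negate the off-diagonal
H = [[a22, neg(a12)],
     [neg(a21), a11]]
A = [[a11, a12], [a21, a22]]

# Product H(L) A(L), element by element
HA = [[padd(pmul(H[i][0], A[0][j]), pmul(H[i][1], A[1][j])) for j in range(2)]
      for i in range(2)]

delta = [F(1), F(-1)]        # the difference operator 1 - L
g_tilde = [F(1), F(-1, 4)]   # the remaining scalar factor 1 - L/4
target = pmul(delta, g_tilde)
print(HA, target)            # HA should equal target times the identity matrix
```

The off-diagonal elements of the product vanish and each diagonal element factors as (1 – L)(1 – L/4), which is the scalar diagonal form with a unit root factor drawn out.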
Appendix C: Johansen’s Procedure as a
Maximum Likelihood Procedure

The starting point for obtaining the maximized log-likelihood function in terms of the
relevant eigenvalues is a multivariate Gaussian distribution. From this assumption
follow the maximum likelihood estimates of the cointegrating vectors as particular
eigenvectors and the expression of the maximized likelihood in terms of the subset of
the corresponding eigenvalues. This in turn leads to simple expressions for test statistics
based on the comparison of maximized likelihoods, since these too will depend on the
relevant eigenvalues. Not all distributional assumptions will lead to these results and, as
such, the Johansen procedure can be said to depend on the Gaussian assumption. The
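distributional assumption is that the disturbances of the VAR follow a multivariate
Gaussian distribution. That is:

$$\Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + \varepsilon_t, \qquad \Pi = \alpha\beta' \ \text{and} \ \varepsilon_t \sim NIID(0, \Omega). \tag{C.1}$$

The individual disturbance vector $\varepsilon_t$ has density

$$f(\varepsilon_t) = (2\pi)^{-\frac{1}{2}n}\,|\Omega|^{-\frac{1}{2}} \exp\left\{ -\tfrac{1}{2}\,\varepsilon_t' \Omega^{-1} \varepsilon_t \right\}$$

giving rise to the density for $\Delta x_t$ from the VECM, conditional on past values, as

$$g(\Delta x_t, \alpha, \beta, \Gamma_i^*, \Omega) = (2\pi)^{-\frac{1}{2}n}\,|\Omega|^{-\frac{1}{2}} \exp\left\{ -\tfrac{1}{2} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right)' \Omega^{-1} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right) \right\}.$$

The natural logarithm of the joint density of $\Delta x_t$, t = 1, 2, …, T, ignoring initial values for
convenience, is

$$G(\Delta x_t, t = 1, 2, \ldots, T, \alpha, \beta, \Gamma_i^*, \Omega) = -\tfrac{1}{2} nT \log(2\pi) - \tfrac{1}{2} T \log(|\Omega|) - \tfrac{1}{2} \sum_{t=1}^{T} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right)' \Omega^{-1} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right).$$

Thus the log-likelihood of the VECM (conditional on the data), minus the constant
term $-\tfrac{1}{2} nT \log(2\pi)$, is given by

$$\log L(\alpha, \beta, \Gamma_i^*, \Omega) = -\tfrac{1}{2} T \log(|\Omega|) - \tfrac{1}{2} \sum_{t=1}^{T} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right)' \Omega^{-1} \left( \Delta x_t - \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Gamma_i^* \Delta x_{t-i} \right).$$
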

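This expression and subsequent algebra are simplified by re-expressing the log likelihood
in terms of the following:

$$z_{0,t} = \Delta x_t, \qquad z_{1,t} = x_{t-1}, \qquad z_{2,t} = \left[ \Delta x_{t-1}' \ \ldots \ \Delta x_{t-(p-1)}' \right]'$$

and $\Gamma = [\Gamma_1 \ \ldots \ \Gamma_{p-1}]$. Then the log likelihood can be written:

$$\log L(\alpha, \beta, \Gamma, \Omega) = -\tfrac{1}{2} T \log(|\Omega|) - \tfrac{1}{2} \sum_{t=1}^{T} \left( z_{0,t} - \alpha\beta' z_{1,t} + \Gamma z_{2,t} \right)' \Omega^{-1} \left( z_{0,t} - \alpha\beta' z_{1,t} + \Gamma z_{2,t} \right).$$
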
This function may be maximized with respect to Γ alone, giving rise to an expression for
the maximum likelihood estimator for Γ in terms of the data and the other parameters
of the model. Denote this estimator as $\bar{\Gamma}$. By differentiating the log likelihood with respect to Γ and
solving the first-order conditions, $\bar{\Gamma}$ is given by

$$\bar{\Gamma} = -\left( M_{0,2} M_{2,2}^{-1} - \alpha\beta' M_{1,2} M_{2,2}^{-1} \right)$$

where $M_{i,j} = \frac{1}{T} \sum_{t=1}^{T} z_{i,t} z_{j,t}'$. The sign follows from the convention of (C.1), in which the
lagged differences enter with a negative sign. The values of α, β and Ω that maximize
$\log L(\alpha, \beta, \Gamma, \Omega)$ will also maximize this expression with $\bar{\Gamma}$ substituted for Γ, that is
$\log L(\alpha, \beta, \bar{\Gamma}, \Omega)$. The latter function is known as the concentrated likelihood function.
Before writing it out in full, note that Γ appears in $\log L(\alpha, \beta, \Gamma, \Omega)$ only in the term
$(z_{0,t} - \alpha\beta' z_{1,t} + \Gamma z_{2,t})$ or its transpose, so $\bar{\Gamma}$ appears in the concentrated log-likelihood
only in $(z_{0,t} - \alpha\beta' z_{1,t} + \bar{\Gamma} z_{2,t})$. But

$$z_{0,t} - \alpha\beta' z_{1,t} + \bar{\Gamma} z_{2,t} = z_{0,t} - \alpha\beta' z_{1,t} - \left( M_{0,2} M_{2,2}^{-1} - \alpha\beta' M_{1,2} M_{2,2}^{-1} \right) z_{2,t} = \left( z_{0,t} - M_{0,2} M_{2,2}^{-1} z_{2,t} \right) - \alpha\beta' \left( z_{1,t} - M_{1,2} M_{2,2}^{-1} z_{2,t} \right).$$

Define

$$R_{0,t} = z_{0,t} - M_{0,2} M_{2,2}^{-1} z_{2,t} \tag{C.2}$$

$$R_{1,t} = z_{1,t} - M_{1,2} M_{2,2}^{-1} z_{2,t} \tag{C.3}$$

so that

$$z_{0,t} - \alpha\beta' z_{1,t} + \bar{\Gamma} z_{2,t} = R_{0,t} - \alpha\beta' R_{1,t}$$

and note that $R_{0,t}$ and $R_{1,t}$ are the residuals from the least squares regression of $z_{0,t}$ and
$z_{1,t}$ respectively on $z_{2,t}$. Using this residual notation, the concentrated log-likelihood may
be written
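
$$\log L(\alpha, \beta, \Omega) = -\tfrac{1}{2} T \log(|\Omega|) - \tfrac{1}{2} \sum_{t=1}^{T} \left\{ \left( R_{0,t} - \alpha\beta' R_{1,t} \right)' \Omega^{-1} \left( R_{0,t} - \alpha\beta' R_{1,t} \right) \right\}.$$

This likelihood function may be concentrated further to express it in terms of β only.
Regarding β as fixed in $\log L(\alpha, \beta, \Omega)$ above and solving the first-order conditions for α
and Ω, their maximum likelihood estimators may be written

$$\hat{\alpha} = S_{0,1}\beta\,(\beta' S_{1,1} \beta)^{-1},$$

$$\hat{\Omega} = S_{0,0} - S_{0,1}\beta\,(\beta' S_{1,1} \beta)^{-1} \beta' S_{1,0},$$

where $S_{i,j} = \frac{1}{T} \sum_{t=1}^{T} R_{i,t} R_{j,t}'$.
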

A property of a Gaussian log-likelihood such as $\log L(\alpha, \beta, \Omega)$, evaluated at the
maximum, is that it may be expressed in terms of $\hat{\Omega}$ alone, as

$$\log L_{MAX} = -\frac{T}{2} \left( \log |\hat{\Omega}| + n \right).$$

Thus, finally, the only term of interest in the concentrated likelihood, i.e. in $\log L_{MAX}$, is
$\hat{\Omega}$, which itself is a function only of β (and the data). It therefore remains only to
maximize $\log L_{MAX}$ with respect to β. Clearly the value of β that maximizes $\log L_{MAX}$ also
maximizes $\log \tilde{L} = -\frac{T}{2} \log(|\hat{\Omega}|)$, since the difference is a constant term (a multiplicative
term in the likelihoods themselves). The problem is to obtain the value of β that
maximizes $\log \tilde{L}$. By definition, this will be the maximum likelihood estimator. Equivalently,
the problem is to minimize

$$Q(\beta) = |\hat{\Omega}| = \left| S_{0,0} - S_{0,1}\beta\,(\beta' S_{1,1} \beta)^{-1} \beta' S_{1,0} \right|.$$

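The solution to this problem is obtained by first re-expressing Q(β) using the formulae
for the determinant of a partitioned matrix. In general, for any matrix

$$A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}$$

with invertible diagonal blocks

$$|A| = |A_{1,1}|\,\left| A_{2,2} - A_{2,1} A_{1,1}^{-1} A_{1,2} \right| = |A_{2,2}|\,\left| A_{1,1} - A_{1,2} A_{2,2}^{-1} A_{2,1} \right|.$$

Equating these two expressions for |A| and rearranging gives

$$\left| A_{1,1} - A_{1,2} A_{2,2}^{-1} A_{2,1} \right| = |A_{1,1}|\,\left| A_{2,2} - A_{2,1} A_{1,1}^{-1} A_{1,2} \right| / |A_{2,2}|.$$

Setting $A_{1,1} = S_{0,0}$, $A_{1,2} = S_{0,1}\beta$, $A_{2,1} = A_{1,2}'$ and $A_{2,2} = \beta' S_{1,1} \beta$ gives rise to the following
expression

$$Q(\beta) = |\hat{\Omega}| = \left| S_{0,0} - S_{0,1}\beta\,(\beta' S_{1,1} \beta)^{-1} \beta' S_{1,0} \right| = |S_{0,0}|\,\left| \beta' S_{1,1} \beta - \beta' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta \right| / \left| \beta' S_{1,1} \beta \right|.$$

The optimum is found by minimizing

$$\tilde{Q}(\beta) = \left| \beta' S_{1,1} \beta - \beta' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta \right| / \left| \beta' S_{1,1} \beta \right| = \left| \beta' \left( S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1} \right) \beta \right| / \left| \beta' S_{1,1} \beta \right|. \tag{C.4}$$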

Let $\hat{\beta}$ be the n × r matrix that minimizes $\tilde{Q}(\beta)$. Consider the solutions $\lambda = \lambda_i$ to the
eigenvalue problem:

$$\left| \lambda I - S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1} \right| = 0 \tag{C.5}$$

ordered so that $\lambda_1 > \lambda_2 > \ldots > \lambda_n$. Let $\hat{\beta}_i$ for i = 1, 2, …, r, be the eigenvectors corresponding
to $\lambda_i$, i = 1, 2, …, r, the r largest eigenvalues. Then it is stated without proof that

$$\beta = \hat{\beta} = (\hat{\beta}_1 \ \ldots \ \hat{\beta}_r)$$

minimizes (C.4). Furthermore, the minimized function can be written:

$$Q(\hat{\beta}) = |S_{0,0}| \prod_{i=1}^{r} (1 - \lambda_i).$$

Thus, apart from constants, the maximized log-likelihood may be written

$$\log \tilde{L}_{MAX} = -\frac{T}{2} \left\{ \log |S_{0,0}| + \sum_{i=1}^{r} \log(1 - \lambda_i) \right\}. \tag{C.6}$$


Since the $\hat{\beta}_i$ are eigenvectors, many normalizations are possible. A convenient choice for
deriving the above expressions in terms of the eigenvalues follows from observing that
the original eigenvalue problem is equivalent to solving

$$\left| \lambda S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1} \right| = 0. \tag{C.7}$$

(This is known as solving for the eigenvalues of $S_{1,0} S_{0,0}^{-1} S_{0,1}$ in the metric of $S_{1,1}$.)
Consequently the matrix of eigenvectors ($\hat{\beta}$) that diagonalizes $S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}$ also
diagonalizes $S_{1,1}$ and $S_{1,0} S_{0,0}^{-1} S_{0,1}$ in the following manner:

$$\hat{\beta}' S_{1,1} \hat{\beta} = I,$$

$$\hat{\beta}' S_{1,0} S_{0,0}^{-1} S_{0,1} \hat{\beta} = \Lambda,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$. It follows from the diagonalization that

$$\left| \hat{\beta}' \left( S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1} \right) \hat{\beta} \right| / \left| \hat{\beta}' S_{1,1} \hat{\beta} \right| = |I - \Lambda| / |I| = \prod_{i=1}^{r} (1 - \lambda_i)$$

is the minimized value of (C.4), giving (C.6) as the maximized log-likelihood. Subject to
conditioning the problem on the $\hat{\beta}$, the values of $\hat{\alpha}$ and $\hat{\Omega}$ can be calculated directly
from the formulae above, but with this normalization reduce to:

$$\hat{\alpha} = S_{0,1} \hat{\beta} \left( \hat{\beta}' S_{1,1} \hat{\beta} \right)^{-1} = S_{0,1} \hat{\beta},$$

$$\hat{\Omega} = S_{0,0} - \hat{\alpha} \hat{\alpha}'.$$

It follows that

$$\hat{\Pi} = \hat{\alpha} \hat{\beta}' = S_{0,1} \hat{\beta} \hat{\beta}'.$$

The determination of $\hat{\alpha}$ and $\hat{\beta}$ in this way ensures that $\hat{\Pi}$ is of rank r ≤ n. Since the
approach works regardless of whether $\hat{\Pi}$ is of full or reduced rank, the procedure is
known as reduced rank regression. As will be indicated below, it is very closely related to
the calculation of canonical correlations.
This analysis demonstrates that the Johansen approach rests on the Gaussian assumption
in the following ways:

(i) Through concentration of the likelihood function it explains the generation of R0,t
and R1,t, and how this relates to the Gaussian likelihood.
(ii) The expression of the maximized likelihood in terms of the eigenvalues depends
on the particular form of the concentrated likelihood function in terms of the
ratio of the determinants of quadratic forms.1
(iii) The expressions for the likelihood ratio statistics in terms of the eigenvalues
depend on the expression for the maximized log likelihood, and hence these too
depend on the distributional assumption.
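
The steps of the reduced rank regression can be sketched numerically. The following is a minimal illustration, assuming NumPy is available and using a simulated bivariate system with one cointegrating vector and no lagged differences (so the residuals $R_{0,t}$ and $R_{1,t}$ are simply $\Delta x_t$ and $x_{t-1}$). The data generating process, sample size and variable names are all assumptions made for this example and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, r = 500, 2, 1

# Assumed DGP: x1 is a random walk and x2 = x1 + stationary noise,
# so x2 - x1 is stationary and the cointegrating vector is proportional to (1, -1)
e = rng.standard_normal((T + 1, n))
x1 = np.cumsum(e[:, 0])
x2 = x1 + e[:, 1]
x = np.column_stack([x1, x2])

R0 = np.diff(x, axis=0)   # Delta x_t (no lagged differences to partial out)
R1 = x[:-1]               # x_{t-1}

# Product moment matrices S_ij = T^{-1} sum R_it R_jt'
S00 = R0.T @ R0 / T
S01 = R0.T @ R1 / T
S10 = S01.T
S11 = R1.T @ R1 / T

# Solve |lambda S11 - S10 S00^{-1} S01| = 0 as a symmetric problem via S11 = L L'
L = np.linalg.cholesky(S11)
Linv = np.linalg.inv(L)
M = Linv @ S10 @ np.linalg.solve(S00, S01) @ Linv.T
M = (M + M.T) / 2                        # remove rounding asymmetry
lam, V = np.linalg.eigh(M)
order = np.argsort(lam)[::-1]            # largest eigenvalue first
lam, V = lam[order], V[:, order]

beta_all = Linv.T @ V                    # normalized so that beta' S11 beta = I
beta = beta_all[:, :r]                   # r eigenvectors for the r largest eigenvalues
alpha = S01 @ beta                       # alpha-hat under this normalization
Omega = S00 - alpha @ alpha.T            # Omega-hat
Pi = alpha @ beta.T                      # Pi-hat, of rank r by construction

# Trace statistics -T * sum_{i>k} log(1 - lambda_i) for null rank k = 0, ..., n-1
trace_stats = [-T * np.sum(np.log(1 - lam[k:])) for k in range(n)]
print(lam, beta.ravel(), trace_stats)
```

The Cholesky transformation is one convenient way to turn the problem in the metric of $S_{1,1}$ into an ordinary symmetric eigenvalue problem; it delivers the normalization $\hat{\beta}' S_{1,1} \hat{\beta} = I$ automatically.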
Appendix D: The Maximum
Likelihood Procedure in Terms
of Canonical Correlations

An interpretation of the maximum likelihood treatment described in Appendix C is
available through the concept of canonical correlations. The problem can be viewed as
one of finding the maximum extent of correlation between the two residual series, $R_{0,t}$
and $R_{1,t}$ (C.2, C.3). However, rather than deal with the residuals as they are, arising as
they do from individual equations in the system associated with a single process or its
difference as the dependent variable, the correlations considered are between linear
combinations of the $R_{0,t}$ and linear combinations of the $R_{1,t}$. It transpires that the
correlations between the linear combinations of the residuals relate directly to the
eigenvalues of the problem described in Appendix C (equation C.7), in such a way that
maximizing the log-likelihood corresponds to choosing the r greatest correlations. The
linear combinations of the $R_{1,t}$ that arise are the cointegrating combinations, where they
exist.
The idea of canonical correlation is to transform two vectors of variables so that the
elements of each vector have unit variance and are individually uncorrelated. In addi-
tion, only matching elements of the transformed vectors are correlated with each other.
Since all transformed variates have unit variance, covariance matrices and correlation
matrices are identical. This makes calculation easier.
Mapping the canonical correlation problem onto the maximum likelihood problem,
the starting point is the residuals $R_{0,t}$ and $R_{1,t}$. These are transformed by
pre-multiplication by n × n matrices A and B to give $R^*_{0,t} = A R_{0,t}$ and $R^*_{1,t} = B R_{1,t}$, where A and B must be
chosen such that:

$$S^*_{i,j} = \frac{1}{T} \sum_{t=1}^{T} R^*_{i,t} R^{*\prime}_{j,t} = I \ \text{for} \ i = j \ \text{and} \ P \ \text{otherwise}$$

where $P = \mathrm{diag}(p_1, \ldots, p_n)$, $p_i > 0$, and, by appropriate ordering of the elements of $R_{0,t}$ and
$R_{1,t}$, $p_1 \ge p_2 \ge \ldots \ge p_n$. As all the $p_i$ are correlations and positive by construction, they lie
on the [0, 1] interval. They are called canonical correlations. The solutions to the
problems of the selection of A and B are the solutions to two closely associated eigenvalue
problems. Consider the matrices

$$H_0 = S_{0,0}^{-1} S_{0,1} S_{1,1}^{-1} S_{1,0}$$

$$H_1 = S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}.$$

The eigenvalues of these two matrices are identical and given by the solution to
equation (C.7) above. That is, they are the $\lambda_i$, i = 1, 2, …, n of the maximum likelihood
problem. The eigenvectors of $H_0$ are the solutions for the columns of A and are denoted
$a_i$. They are chosen so that $a_i' S_{0,0} a_j = 1$ for i = j, 0 otherwise. The eigenvectors of $H_1$ have
already been denoted $\beta_i$, and are normalized as before, so that $\beta_i' S_{1,1} \beta_j = 1$ for i = j, 0
otherwise. Thus B is an n × n matrix with ith column $\beta_i$. In addition, $P^2 = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$;
in other words, the eigenvalues are the squared canonical correlations. Thus, from
the expression for the maximized log-likelihood of equation (C.6), the Johansen ML
procedure can be seen to be the calculation of the coefficients of the linear
combinations of the non-stationary variables such that their correlation with the (canonically
combined) stationary variables is maximized. For given r, the required linear
combinations of the levels will be those using the eigenvectors, $\beta_i$, i = 1, 2, …, r. In order to
maximize correlation with stationary variables, the linear combinations of the I(1) variables
will need to be as close to stationarity as possible. The problem is restricted by only
considering the r most correlated combinations. The cointegrating rank r has to
be determined by testing. The values of the model parameters are then obtained as
outlined in Appendix C.
Appendix E: Distribution Theory

E.1 Some univariate theory


The fundamental building blocks of unit root asymptotic theory are convergence in dis-
tribution, or weak convergence, and the scalar Wiener process or Brownian motion,
b(u), defined as follows.
Let b(u), u ∈ [0, 1], be a continuous time stochastic process with b(0) = 0. The
construct b(u) – b(v) is called an increment of the process. Let b(u1) – b(u2) and b(u3) –
b(u4) be two such non-overlapping increments, then

(i) b(0) = 0
(ii) b(u) – b(v) ~ N(0, |u – v|) for all u ≠ v.1
(iii) E[(b(u1) – b(u2))(b(u3) – b(u4))] = 0.

The fundamental distributional result involving scalar Brownian motion is Donsker’s


theorem. In I(1) systems, partial sums of IID processes occur naturally and frequently.
Donsker’s theorem provides an approximate large sample distribution for such quanti-
ties in terms of Brownian motions. The term ‘asymptotic’ is normally used in place of
‘approximate large sample’. For convenience, this simpler but not very informative
abbreviation is used in what follows and in the main text.
Consider the sequence

$$\varepsilon_t \sim IID(0, \sigma^2), \quad \text{for } t = 1, 2, \ldots, T$$

and partial sum

$$s_t = \sum_{i=1}^{t} \varepsilon_i.$$

In order to employ Brownian motions to characterize the asymptotic distributions of
such quantities, $s_t$ has to be manipulated so as to relate to the unit interval. To do this,
note that $s_t$ can be thought of as the sum of the sequence up to some point a fraction of
the way, say ξ, into the complete sequence. To represent this notationally, let 〈x〉
represent the integer part of x.2 Then for any t = 1, 2, …, T there is a ξ ∈ [0, 1] such that
t = 〈Tξ〉, allowing the partial sum to be written

$$s_{\langle T\xi \rangle} = \sum_{i=1}^{\langle T\xi \rangle} \varepsilon_i.$$

As T increases, $s_{\langle T\xi \rangle}$ forms a sequence of random variables. In particular, in order to
obtain the asymptotic distribution, interest is in the limit of this sequence as T → ∞.
Each of the random variables $s_{\langle T\xi \rangle}$ can be thought of as possessing a distribution
function depending on T and ξ, say $F_{T,\xi}(.)$. If there exists some distribution function $F_\xi(.)$
such that

$$F_{T,\xi}(.) \to F_\xi(.) \quad \text{as } T \to \infty,$$

then $F_\xi(.)$ is called the limiting distribution of the sequence $s_{\langle T\xi \rangle}$. It is said that $F_{T,\xi}(.)$
converges weakly to $F_\xi(.)$.3 Notationally, $F_{T,\xi} \Rightarrow F_\xi$ means $F_{T,\xi}(.)$ converges weakly to
$F_\xi(.)$. Furthermore, if S(ξ) is a random variable having distribution function $F_\xi(.)$, then
$s_{\langle T\xi \rangle}$ is said to converge in distribution to S(ξ). The notation for this ought strictly to be
different since it involves random variables rather than their distribution functions, but
the same symbol ⇒ will be used. Otherwise, a commonly used notation for convergence
in distribution is $\xrightarrow{D}$.
In fact, $s_{\langle T\xi \rangle}$ does not have a limiting distribution. In order to obtain convergence, it
must be divided by $T^{1/2}$. Donsker’s theorem defines the random variable to which $T^{-1/2} s_{\langle T\xi \rangle}$
tends in distribution in terms of Brownian motion. It states

$$T^{-\frac{1}{2}} \sum_{i=1}^{\langle T\xi \rangle} \varepsilon_i \Rightarrow \sigma\,b(\xi) \quad \text{for } \xi \in [0, 1].^{4}$$

A further tool is needed since the asymptotic distributions required are those of
functions of (normalized) partial sums. The continuous mapping theorem (CMT) states that,
if a sequence of random variables tends in distribution to another random variable,
then a given continuous function of the sequence tends to the same function of that
limit. So, for any continuous function g(.), the CMT states that

$$g\left( T^{-\frac{1}{2}} s_{\langle T\xi \rangle} \right) \Rightarrow g\left( \sigma\,b(\xi) \right).$$

An important example of a function to which the CMT applies is the integral. The CMT
and Donsker’s theorem can therefore be used to derive the Brownian motion
characteristics of a wide range of random variables based on partial sums of IID random variables.
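
Donsker’s theorem can be illustrated by simulation. The sketch below is an assumption-based example rather than anything taken from the text: it draws uniform (non-Gaussian) IID errors with variance σ², forms the scaled partial sum at a chosen fraction ξ of the sample, and compares the simulated variance with the variance σ²ξ implied by the limit σb(ξ). The sample size, number of replications, seed and function name are arbitrary choices.

```python
import random

def scaled_partial_sum(T, xi, sigma, rng):
    """Return T^{-1/2} * s_<T xi> for one simulated IID(0, sigma^2) sequence."""
    k = int(T * xi)                 # integer part <T xi>
    half_width = sigma * 3 ** 0.5   # uniform(-a, a) has variance a^2 / 3
    s = sum(rng.uniform(-half_width, half_width) for _ in range(k))
    return s / T ** 0.5

rng = random.Random(12345)
T, xi, sigma, reps = 400, 0.5, 2.0, 2000

draws = [scaled_partial_sum(T, xi, sigma, rng) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)

# The limit sigma * b(xi) is N(0, sigma^2 * xi), so the simulated variance
# should be close to the theoretical value below
print(mean, var, sigma ** 2 * xi)
```

Uniform errors are used so that the approximate normality of the scaled sum comes from the theorem rather than from the error distribution itself.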

E.2 Vector processes and cointegration


The generalization from scalar to vector processes is necessary to deal with cointegration,
but is straightforward. Standard vector Brownian motion with variance covariance
matrix $\xi I_n$ is defined as

$$B(\xi) = [b_1(\xi) \ \ldots \ b_n(\xi)]'$$

where the $b_i(\xi)$ are uncorrelated scalar Brownian motions. Thus, for ξ ∈ [0, 1]:

(i) B(0) = 0;
(ii) $B(\xi) \sim N(0, \xi I_n)$;

and the process has independent increments.

More generally, the process $W(\xi) = \Omega^{1/2} B(\xi)$ is encountered, having the same properties
as B(ξ) except that its variance-covariance matrix is ξΩ. Donsker’s theorem now applies
to partial sums of IID(0, Ω) vectors, $\varepsilon_t$, t = 1, 2, …, T, and is that

$$T^{-\frac{1}{2}} \sum_{i=1}^{\langle T\xi \rangle} \varepsilon_i \Rightarrow W(\xi).$$

Johansen (1995, appendix B) discusses the key results needed to obtain the limiting
distributions of the test statistics and estimators. Concerning the trace statistic, an important
observation, allowing the application of the multivariate version of Donsker’s theorem
via the CMT, is that the eigenvalues on which the statistic depends are continuous
functions of product moments, the asymptotic distributions of which are available.


E.3 Testing the null hypothesis of non-cointegration


The main features in establishing the distribution of the test statistics are the following,
where, for simplicity, the case of the statistic for testing the null of zero cointegrating rank
against the full rank alternative is considered, in the model with no deterministic terms.

(i) Establish the relationship between the eigenvalues that appear in the test statistic
and the product moment matrices, $S_{i,j}$. This can be derived since the eigenvalues
are the solutions to the problem $|\lambda I - S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}| = 0$. This is the standard
eigenvalue problem, for which the solutions $\lambda = \lambda_i$, i = 1, 2, …, n are the eigenvalues of
$S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1}$ and so

$$\sum_{i=1}^{n} \lambda_i = \mathrm{tr}\left( S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1} \right).$$

(ii) Establish the asymptotic distributions of the $S_{i,j}$ under the null that the
cointegrating rank is zero. These are:5

$$T^{-1} S_{1,1} \Rightarrow \int_0^1 W W'\,du \tag{E.1}$$

$$S_{1,0} \Rightarrow \int_0^1 W (dW)' \tag{E.2}$$

$$S_{0,0} \xrightarrow{P} \Omega$$

where $\xrightarrow{P}$ indicates ‘convergence in probability’, meaning that the random variable
on the left-hand side tends to the deterministic quantity on the right.6
(iii) Replace $\lambda_i$ by the more appropriate notation $\hat{\lambda}_i$ to emphasize that they are random
variables, and apply the CMT to obtain the limiting behaviour of

$$\sum_{i=1}^{n} \hat{\lambda}_i = \mathrm{tr}\left( S_{1,1}^{-1} S_{1,0} S_{0,0}^{-1} S_{0,1} \right).$$

This simply requires the substitution of the limit results at (ii) into the expression,
and adjusting for the required normalization such that convergence is to a random
variable. Thus,

$$\sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left\{ \left( T \int_0^1 W W'\,du \right)^{-1} \left( \int_0^1 W [dW]' \right) \Omega^{-1} \left( \int_0^1 dW\,W' \right) \right\}.$$

Clearly the right-hand side of this expression has a factor of $T^{-1}$, indicating that it
tends to zero rather than a random variable. So, for weak convergence, both sides
must be multiplied by T to give

$$T \sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left\{ \left( \int_0^1 W W'\,du \right)^{-1} \left( \int_0^1 W [dW]' \right) \Omega^{-1} \left( \int_0^1 dW\,W' \right) \right\}.$$

This expression can be written in terms of standard Brownian motion, $B(u) = \Omega^{-1/2} W(u)$, as

$$T \sum_{i=1}^{n} \hat{\lambda}_i \Rightarrow \mathrm{tr}\left\{ \left( \int_0^1 (dB) B' \right) \left( \int_0^1 B B'\,du \right)^{-1} \left( \int_0^1 B (dB)' \right) \right\}.$$


(iv) Next, establish how the trace statistic can be expressed in terms of $\sum_{i=1}^{n} \hat{\lambda}_i$.
Note that, since $|\hat{\lambda}_i| < 1$, the usual expansion of the natural logarithm function
applies, such that

$$-T \sum_{i=1}^{n} \log\left( 1 - \hat{\lambda}_i \right) = T \sum_{i=1}^{n} \hat{\lambda}_i + \zeta$$

where ζ is an asymptotically irrelevant term such that it can be ignored in the
subsequent analysis, that is, $-T \sum_{i=1}^{n} \log(1 - \hat{\lambda}_i)$ and $T \sum_{i=1}^{n} \hat{\lambda}_i$ converge weakly to the
same random variable.7 But $-T \sum_{i=1}^{n} \log(1 - \hat{\lambda}_i)$ is the trace statistic for testing the
null of non-cointegration against the full rank alternative.

(v) Thus the null distribution of the test statistic is given by

$$-T \sum_{i=1}^{n} \log\left( 1 - \hat{\lambda}_i \right) \Rightarrow \mathrm{tr}\left\{ \left( \int_0^1 (dB) B' \right) \left( \int_0^1 B B'\,du \right)^{-1} \left( \int_0^1 B (dB)' \right) \right\}. \tag{E.3}$$


E.4 Testing a null hypothesis of non-zero rank


This treatment has to be generalized to allow for null hypotheses of non-zero rank and
for the various forms of trend that can be added to the basic VAR model. For testing the
null of cointegrating rank r, the trace test statistic is

$$-T \sum_{i=r+1}^{n} \log\left( 1 - \hat{\lambda}_i \right).$$

The analysis proceeds by examining the behaviour of the n – r smallest eigenvalues
under the null. It is stated, without proof, that under the null hypothesis that the
cointegrating rank is r, with appropriate normalization, the smallest n – r eigenvalues
converge to zero while the remaining r tend to positive constants.8 It transpires that the
problem is best addressed not in terms of the eigenvalues, $\lambda_i$, but of $\rho_i = T\lambda_i$. For
convenience, define

$$S(\lambda) = \lambda S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1}.$$

The eigenvalues are the solutions to $|S(\lambda)| = 0$. Clearly the solutions are unchanged for
the problem

$$|A' S(\lambda) A| = 0$$

for any non-singular matrix A. Now partition A such that $A = (A_1 \ A_2)$, then:

$$|A' S(\lambda) A| = |H|\,|G|$$

where $H = A_1' S(\lambda) A_1$ and $G = A_2' \left( S(\lambda) - S(\lambda) A_1 [A_1' S(\lambda) A_1]^{-1} A_1' S(\lambda) \right) A_2$.

The derivation of the distribution is obtained by choosing the partition of A such
that, asymptotically, H is not a function of λ. Then, asymptotically, solutions for λ will
arise only from

$$|G| = 0 \tag{E.4}$$

so that it is only necessary to consider G. But G can be broken down into a number of
components whose asymptotic distributions can be derived, and hence, via the CMT,
the distribution of the trace statistic is obtained.
Let $A_1 = \beta$, $A_2 = \beta_\perp (\beta_\perp' \beta_\perp)^{-1}$ where β is n × r and $\beta_\perp$ is n × (n – r) and orthogonal to β.
The derivation begins by showing that H is redundant. For this choice of A

$$H = H_1 - H_2$$
$$H_1 = \lambda \beta' S_{1,1} \beta$$
$$H_2 = \beta' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta.$$

H is seen to be a function of λ only through $H_1$. Now reparameterize the problem using
ρ = Tλ. Then, $H_1 = \rho T^{-1} \beta' S_{1,1} \beta$. The asymptotic limits taken from now on will be such that
ρ remains fixed as T → ∞, which means λ → 0. Thus from this point on, the discussion is
with respect to the eigenvalues normalized by T. Under this limit, $H_1 \to 0$, and so,
asymptotically, H is not a function of λ, and the required solutions will follow from (E.4).
Now consider G, and for convenience put $D = \beta_\perp (\beta_\perp' \beta_\perp)^{-1}$ so that

$$G = D' \left( S(\lambda) - S(\lambda) \beta [\beta' S(\lambda) \beta]^{-1} \beta' S(\lambda) \right) D = G_1 - G_2 - G_3 \tag{E.5}$$

where $G_1 = \rho T^{-1} D' S_{1,1} D$, $G_2 = D' S_{1,0} S_{0,0}^{-1} S_{0,1} D$, $G_3 = \tilde{G}_3 (\beta' S(\lambda) \beta)^{-1} \tilde{G}_3'$, $\tilde{G}_3 = D' S(\lambda) \beta$. Further
convergence results are now required (Johansen, 1995, lemma 10.3). These are,
generalizing (E.1) and (E.2) respectively:

$$T^{-1} D' S_{1,1} D \Rightarrow \int_0^1 W W'\,du$$

$$D' (S_{1,0} - S_{1,1} \beta \alpha') \Rightarrow \int_0^1 W (dW)'$$

where W now has dimension n – r;9 and

$$\beta' S_{1,1} \beta \xrightarrow{P} \Sigma_{\beta,\beta}, \tag{E.6}$$

$$\beta' S_{1,0} \xrightarrow{P} \Sigma_{\beta,0}, \tag{E.7}$$

$$S_{0,0} \xrightarrow{P} \Sigma_{0,0}, \tag{E.8}$$

$$D' S_{1,1} \beta = O_P(1). \tag{E.9}$$
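The last equality means that the probability that $D' S_{1,1} \beta$ diverges from a constant value
tends to zero, and hence that it can be regarded as a constant in the limit.10 In the
following the “=” sign represents either equality or weak convergence to the same random
variable. Then, by (E.8)

$$G_2 = D' S_{1,0} \Sigma_{0,0}^{-1} S_{0,1} D \tag{E.10}$$

and

$$\tilde{G}_3 = \rho T^{-1} D' S_{1,1} \beta - D' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta = -D' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta = -D' S_{1,0} \Sigma_{0,0}^{-1} \Sigma_{0,\beta} \tag{E.11}$$

where the last equality follows from (E.8) and (E.7), and the previous one from
(E.9). Then,

$$\begin{aligned}
G_3 &= \tilde{G}_3 \left( \beta' S(\lambda) \beta \right)^{-1} \tilde{G}_3' \\
&= \tilde{G}_3 \left( \rho T^{-1} \beta' S_{1,1} \beta - \beta' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta \right)^{-1} \tilde{G}_3' \\
&= -\tilde{G}_3 \left( \beta' S_{1,0} S_{0,0}^{-1} S_{0,1} \beta \right)^{-1} \tilde{G}_3', \quad \text{as } T \to \infty \\
&= -\tilde{G}_3 \left( \Sigma_{\beta,0} \Sigma_{0,0}^{-1} \Sigma_{0,\beta} \right)^{-1} \tilde{G}_3'
\end{aligned} \tag{E.12}$$

where the last equality follows using (E.7, E.8). Substituting (E.10, E.11, and E.12) into
(E.5) gives

$$\begin{aligned}
G &= G_1 - G_2 - \tilde{G}_3 \left( \beta' S(\lambda) \beta \right)^{-1} \tilde{G}_3' \\
&= \rho T^{-1} D' S_{1,1} D - D' S_{1,0} \Sigma_{0,0}^{-1} S_{0,1} D + \tilde{G}_3 \left( \Sigma_{\beta,0} \Sigma_{0,0}^{-1} \Sigma_{0,\beta} \right)^{-1} \tilde{G}_3'
\end{aligned}$$

or, using (E.11)

$$G = \rho T^{-1} D' S_{1,1} D - D' S_{1,0} Q S_{0,1} D = G_1 - G_4 \tag{E.13}$$

$$G_4 = D' S_{1,0} Q S_{0,1} D$$

$$Q = \Sigma_{0,0}^{-1} - \Sigma_{0,0}^{-1} \Sigma_{0,\beta} \left( \Sigma_{\beta,0} \Sigma_{0,0}^{-1} \Sigma_{0,\beta} \right)^{-1} \Sigma_{\beta,0} \Sigma_{0,0}^{-1}.$$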

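It can be shown that
\[
Q = \alpha_\perp(\alpha_\perp'\Omega\alpha_\perp)^{-1}\alpha_\perp' = \alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'
\]
and so,
\[
G_4 = D'S_{1,0}\alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'S_{0,1}D.
\]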

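The asymptotic distribution of $D'S_{1,0}\alpha_\perp$ is given by
\[
D'S_{1,0}\alpha_\perp \Rightarrow \int_0^1 W(dW)'\alpha_\perp
\]
and so
\[
G_4 \Rightarrow \left[\int_0^1 W(dW)'\right]\alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'\left[\int_0^1 W(dW)'\right]'. \tag{E.14}
\]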

Similarly,
\[
T^{-1}D'S_{1,1}D \Rightarrow \int_0^1 WW'\,du \tag{E.15}
\]
and so
\[
G_1 \Rightarrow \rho\int_0^1 WW'\,du. \tag{E.16}
\]

Thus, substituting (E.14) and (E.16) into (E.13) gives
\[
G \Rightarrow \rho\int_0^1 WW'\,du - \left[\int_0^1 W(dW)'\right]\alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'\left[\int_0^1 W(dW)'\right]'.
\]

It follows from the CLT that the solutions of the problem |G| = 0 converge in distribution to those of the problem
\[
\left|\rho\int_0^1 WW'\,du - \left[\int_0^1 W(dW)'\right]\alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'\left[\int_0^1 W(dW)'\right]'\right| = 0. \tag{E.17}
\]
The solutions for $\rho$ are unchanged if the matrix of which the determinant is being taken is pre- and post-multiplied by $\Omega^{-1/2}$, which leads to simplification since the outer occurrences of $W$ become standardized as $B = \Omega^{-1/2}W$. Thus (E.17) may be replaced by
\[
\left|\rho\int_0^1 BB'\,du - \left[\int_0^1 B(dW)'\right]\alpha_\perp(\mathrm{Var}(\alpha_\perp'W))^{-1}\alpha_\perp'\left[\int_0^1 B(dW)'\right]'\right| = 0. \tag{E.18}
\]

Finally, noting also that $(\mathrm{Var}(\alpha_\perp'W))^{-1/2}(\alpha_\perp'W) = B$, equation (E.18) may be written
\[
\left|\rho\int_0^1 BB'\,du - \left[\int_0^1 B(dB)'\right]\left[\int_0^1 B(dB)'\right]'\right| = 0 \tag{E.19}
\]

where $B$ is now $n - r$ standardized Brownian motion (dimension equal to the number of zero eigenvalues under the null). The trace statistic is
\[
-T\sum_{i=r+1}^{n}\log(1 - \hat{\lambda}_i),
\]
which is asymptotically equivalent to
\[
T\sum_{i=r+1}^{n}\hat{\lambda}_i.
\]
Equation (E.19) then gives
\[
T\sum_{i=r+1}^{n}\hat{\lambda}_i \Rightarrow \sum_{i=r+1}^{n}\rho_i = \mathrm{tr}\left\{\left[\int_0^1 (dB)B'\right]\left[\int_0^1 BB'\,du\right]^{-1}\left[\int_0^1 B(dB)'\right]\right\} \tag{E.20}
\]

providing the required asymptotic distribution for the trace statistic for testing the null
of cointegrating rank r against the alternative of rank n. This result specializes to that for
testing cointegrating rank 0 against rank n by setting r = 0, as can be seen by comparing
equations (E.20) and (E.3).
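The limit in (E.20) has no closed form, so its quantiles are usually obtained by simulation. The following is a minimal sketch of such a simulation, assuming the Brownian motion is approximated by a scaled Gaussian random walk. The function name `simulate_trace_limit` and the default step and replication counts are illustrative choices, not taken from the text.

```python
import numpy as np

def simulate_trace_limit(dim, n_steps=400, n_reps=2000, seed=0):
    """Draws from a discretised tr{ int(dB)B' [int BB'du]^{-1} int B(dB)' }.

    dim plays the role of n - r, the dimension of the Brownian motion B.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(n_reps)
    for k in range(n_reps):
        # Increments of a dim-dimensional standard Brownian motion on [0, 1].
        dB = rng.standard_normal((n_steps, dim)) / np.sqrt(n_steps)
        B = np.cumsum(dB, axis=0)
        # Level of the process just before each increment (starts at zero).
        B_lag = np.vstack([np.zeros((1, dim)), B[:-1]])
        int_B_dB = B_lag.T @ dB              # approximates int B (dB)'
        int_BB = B_lag.T @ B_lag / n_steps   # approximates int B B' du
        draws[k] = np.trace(int_B_dB.T @ np.linalg.solve(int_BB, int_B_dB))
    return draws
```

For example, `np.quantile(simulate_trace_limit(dim=2), 0.95)` gives a rough 95% point for n − r = 2. Its accuracy depends on `n_steps` and `n_reps`, so published tables should be preferred in applied work.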

E.5 Distribution theory when there are deterministic trends in the data
The distribution has to be modified according to the deterministic components in the
process. The processes with respect to which integration takes place are unchanged, but
the integrands are modified.
The general model is
\[
\Delta x_t = \Pi x_{t-1} - \sum_{i=1}^{p-1}\Gamma_i\Delta x_{t-i} + \mu_t + \varepsilon_t
\]
where $\mu_t = \mu_0 + \mu_1 t$ which, in its most general form, allows the process $x_t$ to have a quadratic trend, and the cointegrating relations to have a linear trend (Johansen, 1991).11 The deterministic components, in increasing order of complexity, are:

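(i) no deterministic terms: $\mu_t = 0$
(ii) intercept only, in space of $\alpha$: $\mu_t = \mu_0 = \alpha\kappa_0$
(iii) intercept only, not in space of $\alpha$: $\mu_t = \mu_0$, $\alpha_\perp'\mu_0 \neq 0$
(iv) time trend, slope in space of $\alpha$: $\mu_t = \mu_0 + \alpha\kappa_0 t$
(v) time trend, slope not in space of $\alpha$: $\mu_t = \mu_0 + \mu_1 t$, $\alpha_\perp'\mu_1 \neq 0$.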

These cases correspond to different solutions of the underlying process as follows.

(i) $x_t$ has no deterministic terms and all stationary components have zero mean.
(ii) $x_t$ has neither quadratic nor linear trend, but both $x_t$ and $\beta'x_t$ have constant terms.
(iii) $x_t$ has a linear trend, but this is eliminated in the cointegrating combinations.
(iv) $x_t$ has no quadratic trend, but has a linear trend that is also present in the cointegrating relations.
(v) $x_t$ has a quadratic trend, but the cointegrating relations have a linear trend only.

The asymptotic distribution of the trace statistic for testing the null of cointegrating rank r has the same generic form in each case, but the distributions have to be corrected differently. This form is
\[
\mathrm{tr}\left\{\left[\int_0^1 (dB)F'\right]\left[\int_0^1 FF'\,du\right]^{-1}\left[\int_0^1 F(dB)'\right]\right\} \tag{E.21}
\]
where $B$ is an $n - r$ standard Brownian motion, and $F$ is the same standard Brownian motion corrected for the deterministic components, with the final element (either the $(n - r)$th or $(n - r + 1)$st) consisting of the appropriate power of $u$ corrected for the same components.
This is described in Table E.1. The coefficients ai and bi are fixed and required to
correct for the included deterministic terms. All elements of the corrected Brownian
motion, except the last, are, in effect, a residual having regressed the standard case on
the deterministic terms. The last term, the qth in the table below, corresponds to regress-
ing the random variable u on the same terms. If the highest order deterministic term is
orthogonal to a then the final term is n – r + 1st, otherwise it is the n – rth.

Tables of approximate asymptotic and finite sample distributions


Statistical tables are available for each of these cases reported in Table E.1, calculated for
finite samples by simulation. Table E.2 indicates where each of the cases above may be
found, with comments on their coverage. Johansen (1995) presents finite sample and
approximate asymptotic critical values for the tests, employing the standard form of the
test statistic. Osterwald-Lenum (1992) extends Johansen’s tables to consider a wider
range of dimensions for the process, n = 1, 2, …, 11. Doornik (1998, 2003) discusses an
alternative method of obtaining approximate asymptotic critical values, the latter paper
providing tables. It is now common practice for regression packages to compute critical
values or p-values as required. MacKinnon, Haug and Michelis (1999) provide a
response surface methodology for computing finite sample and approximate asymptotic
critical values and p-values for all the standard cases.12
An alternative to providing tables for different sample sizes is to correct either the test
statistic or the asymptotic critical values. Reinsel and Ahn (1992), Reimers (1992) and

Table E.1 Corrections to trace statistic distributions due to deterministic components

F = {Fi(u)}; for i = 1, 2, …, q the general form is Fi(u) = Bi(u) – ai – biu.

Case   Intercept µ0   Slope µ1     q           Fi(u), i = 1, 2, …, q               Fi(u), i = q + 1
(i)    0              0            n – r       ai = 0, bi = 0                      –
(ii)   ακ0            0            n – r       ai = 0, bi = 0                      1
(iii)  α⊥′µ0 ≠ 0      0            n – r – 1   ai = ∫₀¹ Bi(u)du, bi = 0 (+)        u – ai, ai = 1/2 (*)
(iv)   µ0             ακ0          n – r       ai = ∫₀¹ Bi(u)du, bi = 0 (+)        u – ai, ai = 1/2 (*)
(v)    µ0             α⊥′µ1 ≠ 0                Bi(u) – ai – biu (++)               u² – ai – biu (**)

ai and bi are fixed coefficients necessary to correct for the included deterministic terms.
(+) Corrects Bi(u) for a constant. (*) Corrects u for a constant.
(++) Corrects Bi(u) for a linear time trend. (**) Corrects u² for a linear time trend.

Table E.2 Sources of tables for the trace test

Case    D (asymptotic)   J (finite sample)   OL
(i)     table 1          table 15.1          table 0
(ii)    table 2          table 15.2          table 1*
(iii)   table 3          table 15.3          table 1
(iv)    table 4          table 15.4          table 2*
(v)     table 5          table 15.5          table 2

Note: D – Doornik (2003); J – Johansen (1995); OL – Osterwald-Lenum (1992).

Cheung and Lai (1993) suggest correcting for, in effect, the number of parameters esti-
mated in the VAR. The correction is to replace T by T – np. Equivalently, the asymptotic
critical values can be multiplied by T/(T – np). The result is to correct a tendency of the
asymptotic tests13 to be over-sized. That is, when used naively, the tests reject the null
hypothesis too frequently. When testing the null of non-cointegration, this results in
findings of cointegration where it does not exist.
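The two equivalent forms of this correction can be sketched as follows. This is an illustration only: the function names, and the argument `n_lags` standing for the VAR order p, are hypothetical and not taken from the text.

```python
import math

def trace_statistic(eigenvalues, T, r, n_lags=None):
    """Trace statistic for rank r; if n_lags is given, T is replaced by T - n*p."""
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    scale = T if n_lags is None else T - n * n_lags
    # Sum over the n - r smallest eigenvalues, i = r+1, ..., n.
    return -scale * sum(math.log(1.0 - x) for x in lam[r:])

def scaled_critical_value(cv, T, n, n_lags):
    """Equivalent adjustment applied to an asymptotic critical value instead."""
    return cv * T / (T - n * n_lags)
```

Comparing the corrected statistic with the asymptotic critical value gives the same decision as comparing the uncorrected statistic with the scaled critical value, since both sides differ only by the positive factor T/(T − np).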

E.6 Other issues

The maximal eigenvalue statistic


This discussion has dealt with the results for the trace test. Hansen and Johansen (1998)
discuss the results for the maximal eigenvalue test that can be derived using the same
basic distributional results. Where the distribution of the trace statistic is given by the
trace of
\[
A(n, r) = \left[\int_0^1 (dB)F'\right]\left[\int_0^1 FF'\,du\right]^{-1}\left[\int_0^1 F(dB)'\right]
\]

as described in equation (E.21), the asymptotic distribution of the maximal eigenvalue
statistic is, analogously, the maximal eigenvalue of A(n,r). In practice, the maximal
eigenvalue statistic would be used in the same sequential manner as the trace statistic,
but it is important to note that there is no proof yet available of the consistency of this
procedure for this statistic. It is therefore reasonable to place emphasis on the trace
statistic.

Sequential testing and model selection


The distributions discussed above, whether asymptotic or finite sample, do not allow for
distortionary effects of model selection or, in finite samples, that due to sequential tests.
Each result assumes that the test takes place in isolation and is not subject to pre-
testing. In practice, it is likely that tests will suffer both inflated size and reduced power
if the critical values are not adjusted. That is, typically, the finite sample null distribu-
tions of the test statistics become more dispersed due to pre-testing or model selection.

Partial systems
The system discussed treats all variables as endogenous. There is no sense in which any of them plays a different causal role to any others. Johansen (1992) has discussed this, and, more recently, Harbo, Johansen, Nielsen, and Rahbek (1998), and Pesaran, Shin and Smith (2001) have considered the impact of exogenous I(1) variables on the asymptotic distribution of the test statistics. This generates a wider set of models for which the distributions must be calculated, depending not only on the total number of variables in the system (n), but also on the number of these that are endogenous (k, say). Thus A(n,r) of equation (E.21), where $B$ is of dimension $n - r$, and $F$ depends on $B$ as described in Table E.1, is replaced by
\[
\tilde{A}(n, k, r) = \left[\int_0^1 (d\tilde{B})\tilde{F}'\right]\left[\int_0^1 \tilde{F}\tilde{F}'\,du\right]^{-1}\left[\int_0^1 \tilde{F}(d\tilde{B})'\right]
\]
where $\tilde{B}$ is now $k - r$ standard Brownian motion, and $\tilde{F}$ is a modified $n - r$ standard Brownian motion, analogous to the modifications of Table E.1 for the purely endogenous case. The underlying models are conveniently explained in MacKinnon, Haug and Michelis (1999), where tables of critical values may also be found. Further tables may be found in Harbo et al. (1998), with modifications provided by Doornik (2003).
Appendix F: Estimation under General
Restrictions

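From the Frisch–Waugh form the system is written:
\[
R_{0,t} = \alpha\beta'R_{1,t} + \varepsilon_t
\]
or
\[
\varepsilon_t = R_{0,t} - \alpha\beta'R_{1,t} = [I : -\alpha\beta']\begin{bmatrix} R_{0,t} \\ R_{1,t} \end{bmatrix}.
\]
It follows from Doornik and Hendry (2001) that the concentrated likelihood for this multivariate least squares problem can be written:
\[
\begin{aligned}
\log L &= K - \frac{T}{2}\log|\varepsilon\varepsilon'| = K - \frac{T}{2}\log\left|[I : -\alpha\beta']\begin{bmatrix} R_0 \\ R_1 \end{bmatrix}[R_0' : R_1']\begin{bmatrix} I \\ -\beta\alpha' \end{bmatrix}\right| \\
&= K - \frac{T}{2}\log\left|[I : -\alpha\beta']\begin{bmatrix} S_{0,0} & S_{0,1} \\ S_{1,0} & S_{1,1} \end{bmatrix}\begin{bmatrix} I \\ -\beta\alpha' \end{bmatrix}\right|
\end{aligned}
\]
where $S_{i,j} = R_iR_j' = \sum_{t=1}^{T}R_{i,t}R_{j,t}'$. Now
\[
\log L = K - \frac{T}{2}\log\left|S_{0,0} - \alpha\beta'S_{1,0} - S_{0,1}\beta\alpha' + \alpha\beta'S_{1,1}\beta\alpha'\right|.
\]
Concentrating out the above likelihood for $\alpha = S_{0,1}\beta(\beta'S_{1,1}\beta)^{-1}$: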

\[
\begin{aligned}
\log L &= K - \frac{T}{2}\log\left|S_{0,0} - S_{0,1}\beta(\beta'S_{1,1}\beta)^{-1}\beta'S_{1,0}\right| \\
&= K - \frac{T}{2}\log\left(|S_{0,0}|\,\left|(\beta'S_{1,1}\beta)^{-1}\right|\,\left|\beta'(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1})\beta\right|\right).
\end{aligned}
\]
Subject to the normalization $\beta'S_{1,1}\beta = I$ and given that the solution to the likelihood problem with respect to $\beta$ is invariant to $S_{0,0}$, the likelihood problem is equivalent to minimizing the determinant $|\beta'(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1})\beta|$, which in the cointegration case is the reduced rank problem. What is required is a solution to the usual eigenvalue problem, $|\lambda S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}| = 0$, where for each non-zero eigenvalue $\lambda_i$ there is an eigenvector $\beta_i$ such that:
\[
(\lambda_iS_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1})\beta_i = 0.
\]
Stacking the eigenvectors associated with the non-zero eigenvalues into an $n \times r$ matrix $\beta$, then $\beta$ is the matrix that diagonalizes $S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1}$. Therefore:
\[
\left|\beta'(S_{1,1} - S_{1,0}S_{0,0}^{-1}S_{0,1})\beta\right| = |I - \Lambda_r| = \prod_{i=1}^{r}(1 - \lambda_i).
\]

It follows that the likelihood can be re-written thus:
\[
\log L_{MAX} = K - \frac{T}{2}\left[\log|S_{0,0}| + \sum_{i=1}^{r}\log(1 - \lambda_i)\right].
\]
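The eigenvalue problem and the normalization $\beta'S_{1,1}\beta = I$ can be checked numerically. The sketch below is an illustration under stated assumptions rather than the procedure of any particular package: the function names are hypothetical, the residual matrices `R0` and `R1` are stored with observations in rows, and the moment matrices are divided by T, which rescales every $S_{i,j}$ equally and so leaves the eigenvalues unchanged.

```python
import numpy as np

def reduced_rank_eigen(R0, R1):
    """Solve |lambda*S11 - S10 S00^{-1} S01| = 0 with beta' S11 beta = I."""
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    M = S01.T @ np.linalg.solve(S00, S01)   # S10 S00^{-1} S01
    L = np.linalg.cholesky(S11)             # S11 = L L'
    # Symmetric matrix L^{-1} M L^{-T}; its eigenvalues solve the problem above.
    C = np.linalg.solve(L, np.linalg.solve(L, M).T)
    C = (C + C.T) / 2                       # remove rounding asymmetry
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]           # largest eigenvalue first
    lam, V = lam[order], V[:, order]
    beta = np.linalg.solve(L.T, V)          # eigenvectors, beta' S11 beta = I
    return lam, beta, S00, S11, M

def log_det_term(lam, S00, r):
    """log|S00| + sum_{i<=r} log(1 - lambda_i), the bracketed term in log L_MAX."""
    return np.linalg.slogdet(S00)[1] + np.sum(np.log(1.0 - lam[:r]))
```

With the eigenvectors normalized this way, `beta.T @ (S11 - M) @ beta` equals the identity minus the diagonal matrix of eigenvalues, which is the diagonalization used above.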


As was stated in chapter 4, any test of parameters must compare the above likelihood, which imposes no restrictions on either $\alpha$ or $\beta$, with one on which restrictions have been imposed. Therefore:
log L( r , H g :  = f ( ) ∩  = f ( ))

T  S0 ,0 S0 ,1   I 
=K− [
log I : − ( )( )′ ] S  .
S1,1  −( )( )′ 
2  1,0

The test is a likelihood ratio test:


LR(i ) = 2 log LMAX − 2 log L( r , H g :  = f ( ) ∩  = f ( )) ~  i2 .

Doornik and Hendry (2001) explain how to maximize the non-linear likelihood under a
range of different restrictions.
Appendix G: Proof of Identification
based on an Indirect Solution

Define $\alpha$ and $\beta$ as consisting of $\alpha_{ij}$ and $\beta_{ij}$ elements for i = 1, …, 5, and j = 1, …, 4, and $\Pi$ as consisting of $\pi_{ij}$ elements for i = 1, …, 5, and j = 1, …, 5. For (WE) of i2,.5 = 0 and 5. = 0, which excludes them from our deliberations. However, over-identification is
sufficient for identification which implies that the conditions for over-identification are necessary for the preferred parameters to be identified. If we look at equation (5.10) and set $\alpha_2' = 0$, then:
\[
\begin{bmatrix} \Pi_1 \\ 0 \end{bmatrix} = \begin{bmatrix} \alpha_1\beta' \\ 0 \end{bmatrix}.
\]

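After imposing the same restrictions as Hunter and Simpson (1995) $\alpha_1$ and $\beta$ take the following form:
\[
\alpha_1 = \begin{bmatrix}
\alpha_{11} & 0 & \alpha_{13} & \alpha_{14} \\
\alpha_{21} & \alpha_{22} & 0 & \alpha_{24} \\
0 & 0 & \alpha_{33} & \alpha_{34} \\
\alpha_{41} & 0 & 0 & \alpha_{44}
\end{bmatrix},
\qquad
\beta' = \begin{bmatrix}
1 & \beta_{21} & 0 & 0 & 0 \\
0 & 1 & 0 & -\beta_{52} & \beta_{52} \\
0 & -1 & 1 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0
\end{bmatrix}.
\]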

With r = 4 cointegrating vectors, the requirement of the order condition is for $r^2 - r = 16 - 4 = 12$ restrictions with normalization. In $\alpha_1$ and $\beta$ above there are 20 restrictions without normalization.1 Hence, there are enough a priori restrictions to identify $\alpha$ and $\beta$. However, based on the indirect least squares approach, we need to find whether there are enough solutions to the equation $\Pi_1 = \alpha_1\beta'$ to derive at least one estimate of $\alpha_{11}$, $\alpha_{13}$, $\alpha_{14}$, $\alpha_{21}$, $\alpha_{22}$, $\alpha_{24}$, $\alpha_{33}$, $\alpha_{34}$, $\alpha_{41}$, $\alpha_{44}$, $\beta_{21}$ and $\beta_{52}$.
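Multiplying $\beta'$ through by $\alpha_1$ yields the following matrix of restricted long-run parameters:
\[
\alpha_1\beta' = \begin{bmatrix}
\alpha_{11} - \alpha_{14} & \alpha_{11}\beta_{21} - \alpha_{13} & \alpha_{13} & \alpha_{14} & 0 \\
\alpha_{21} - \alpha_{24} & \alpha_{21}\beta_{21} + \alpha_{22} & 0 & -\alpha_{22}\beta_{52} + \alpha_{24} & \alpha_{22}\beta_{52} \\
-\alpha_{34} & -\alpha_{33} & \alpha_{33} & \alpha_{34} & 0 \\
\alpha_{41} - \alpha_{44} & \alpha_{41}\beta_{21} & 0 & \alpha_{44} & 0
\end{bmatrix}.
\]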

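Comparing $\alpha_1\beta'$ with $\Pi_1$, where $\Pi_1 = [\pi_{ij}]$ for i = 1, …, 4 and j = 1, …, 5, by matching parameters, it follows that:
\[
\alpha_{13} = \pi_{13}, \quad \alpha_{14} = \pi_{14}, \quad \alpha_{33} = \pi_{33}, \quad \alpha_{34} = \pi_{34} \quad \text{and} \quad \alpha_{44} = \pi_{44}.
\]
Consequently:
\[
\alpha_{11} = \pi_{11} + \pi_{14}, \quad \beta_{21} = \frac{\pi_{13} + \pi_{12}}{\pi_{11} + \pi_{14}} \quad \text{and} \quad \alpha_{41} = \pi_{41} + \pi_{44}.
\]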

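Furthermore:
\[
\alpha_{21} = \pi_{21} + (\pi_{25} + \pi_{24}), \quad \alpha_{24} = \pi_{24} + \pi_{25}, \quad \alpha_{22} = \pi_{22} - \alpha_{21}\beta_{21} \quad \text{and} \quad \beta_{52} = \frac{\pi_{25}}{\alpha_{22}}.
\]
The long-run restrictions imply that there are three over-identified parameters as there are three unused solutions associated with some of the parameters in the system:
\[
\alpha_{34} = -\pi_{31}, \quad \alpha_{33} = -\pi_{32}, \quad \alpha_{41}\beta_{21} = \pi_{42}.
\]
Hence, the parameters are slightly over-identified, which is surprising given the number of restrictions adopted, 20.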
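The indirect solution can be verified numerically by building $\Pi_1 = \alpha_1\beta'$ from arbitrary parameter values and recovering them by matching elements. The sketch below is only a check of the algebra: the numerical values and the function name `recover_parameters` are invented for illustration, and the expression used for $\alpha_{21}$ follows from matching $\pi_{21} = \alpha_{21} - \alpha_{24}$.

```python
import numpy as np

# Arbitrary illustrative values respecting the zero restrictions on alpha_1.
alpha1 = np.array([
    [0.5, 0.0, 0.2, -0.3],
    [0.1, 0.4, 0.0, 0.6],
    [0.0, 0.0, -0.7, 0.8],
    [0.9, 0.0, 0.0, -0.2],
])
b21, b52 = 1.5, 2.0
beta_t = np.array([           # beta' (4 x 5)
    [1.0, b21, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -b52, b52],
    [0.0, -1.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 1.0, 0.0],
])
Pi1 = alpha1 @ beta_t

def recover_parameters(Pi1):
    """Recover the free elements of alpha_1 and beta from Pi_1 by matching."""
    p = lambda i, j: Pi1[i - 1, j - 1]      # 1-based access to pi_ij
    out = {"a13": p(1, 3), "a14": p(1, 4), "a33": p(3, 3),
           "a34": p(3, 4), "a44": p(4, 4)}
    out["a11"] = p(1, 1) + p(1, 4)
    out["b21"] = (p(1, 3) + p(1, 2)) / out["a11"]
    out["a41"] = p(4, 1) + p(4, 4)
    out["a24"] = p(2, 4) + p(2, 5)
    out["a21"] = p(2, 1) + out["a24"]
    out["a22"] = p(2, 2) - out["a21"] * out["b21"]
    out["b52"] = p(2, 5) / out["a22"]
    return out

est = recover_parameters(Pi1)
# The three unused (over-identifying) solutions should also hold.
over_id = (np.isclose(est["a34"], -Pi1[2, 0]),
           np.isclose(est["a33"], -Pi1[2, 1]),
           np.isclose(est["a41"] * est["b21"], Pi1[3, 1]))
```

The recovery requires $\pi_{11} + \pi_{14} \neq 0$ and $\alpha_{22} \neq 0$, since both appear as divisors.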
Appendix H: Generic Identification of
Long-Run Parameters in Section 5.5

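From (5.18), which can be written as:
\[
\begin{bmatrix}
\mathrm{vec}(\beta_{11}\ \beta_{12}) \\
\mathrm{vec}(\beta_{21}\ \beta_{22}) \\
\mathrm{vec}(\beta_{31}\ \beta_{32}) \\
\mathrm{vec}(\beta_{41}\ \beta_{42}) \\
\mathrm{vec}(\beta_{51}\ \beta_{52}) \\
\mathrm{vec}(\beta_{61}\ \beta_{62})
\end{bmatrix}
=
\begin{bmatrix}
A^{-1}\mathrm{vec}(\pi_{31}\ \pi_{61}) \\
A^{-1}\mathrm{vec}(\pi_{32}\ \pi_{62}) \\
A^{-1}\mathrm{vec}(\pi_{33}\ \pi_{63}) \\
A^{-1}\mathrm{vec}(\pi_{34}\ \pi_{64}) \\
A^{-1}\mathrm{vec}(\pi_{35}\ \pi_{65}) \\
A^{-1}\mathrm{vec}(\pi_{36}\ \pi_{66})
\end{bmatrix},
\qquad
A^{-1} = \begin{bmatrix} \alpha_{31}^{-1} & 0 \\ 0 & \alpha_{62}^{-1} \end{bmatrix}
\]
and using the restrictions embodied in (5.17), we obtain:
\[
\beta_{i1} = \alpha_{31}^{-1}\pi_{3i}, \text{ for } i = 1, 2, 3, 5, 6, \qquad \beta_{i2} = \alpha_{62}^{-1}\pi_{6i}, \text{ for } i = 1, 2, 3, 4, 6,
\]
\[
1 = \alpha_{31}^{-1}\pi_{34}, \qquad 1 = \alpha_{62}^{-1}\pi_{65}.
\]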

Similarly for (5.19):

vec(1112 ) 
 
vec( 21 22 )
vec(3132 )   1 −  51 
vec(
14
24 …
64     
  = ( B −1 ⊗ I 6 ) 
−1
 and B =   −1 
vec(4142 )  vec(
15
25 …
65  42
vec(  )  −    
 51 52

vec(6162 ) 

where  = – 1 – 12 51. Solving the former equation, subject to the restrictions on :

1  1   1
11 =
14 − 51
15 ,  21 =
24 − 51
25 , 12 = − 42
14 −
15 = 0,
     
 1 1 
 22 = − 42
24 −
25 = 0, 31 =
34 − 51
35 ,
   
1 42 1 42
41 =
44 −
45 = 0, 51 =
54 −
55 = 0,
   
1   1
61 =
64 − 42
65 = 0, 32 = − 42
34 −
35 = 0,
   
 1  1  1
42 = − 51
44 −
45 , 52 = − 51
54 −
55 , 62 = − 51
64 −
65.
     

As the parameters are over-identified one only needs to consider the following results: $\alpha_{11}$, $\alpha_{21}$, $\alpha_{31}$, $\alpha_{42}$, $\alpha_{52}$.

References

Abadir, K. and Talmain, G. (2002) Aggregation, persistence and volatility in a macro
model. The Review of Economic Studies, 69, 749–79.
Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance
matrix estimation. Econometrica, 59, 817–58.
Arellano, M., Hansen, L.P., and Sentana, E. (1999) Underidentification? Paper presented
at the Econometrics Study Group Conference, Bristol, July.
Banerjee, A., Dolado, J.J., Galbraith, J.W., and Hendry, D.F. (1993) Co-integration, Error-
Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford
University Press.
Barndorff-Nielsen, O.E., and Shephard, N. (2001) Modelling by Lévy processes for
financial econometrics, in Lévy Processes: Theory and Applications. Barndorff-Nielsen,
O.E., Mikosch, T. and Resnick, S.I., eds, Boston and Basel: Birkhäuser,
283–318.
Bauwens, L., Deprins, D. and Vandeuren, J.-P. (1997) Bivariate modelling of interest
rates with a cointegrated VAR-GARCH model. Discussion Paper CORE, The Catholic
University, Louvain-la-Neuve DP 9780.
Bauwens, L., and Hunter, J. (2000) Identifying long-run behaviour with non-stationary
data. Discussion Paper CORE, The Catholic University, Louvain-la-Neuve DP 2000/43.
Bauwens, L., Lubrano, M., and Richard J.-F. (2000) Bayesian Inference in Dynamic
Econometric Models. Oxford: Oxford University Press.
Barten, A.P. (1969) Maximum likelihood estimation of an almost complete set of
demand equations. European Economic Review, 1, 7–73.
Blough, S.R. (1992) The relationship between power and level for generic unit root tests
in finite samples. Journal of Applied Econometrics, 7, 295–308.
Boswijk, H.P. (1992) Cointegration, Identification and Exogeneity: Inference in Structural Error
Correction Models. Amsterdam: Thesis Publishers.
Boswijk, H.P. (1996) Cointegration, identification and exogeneity: inference in
structural error correction models. Journal of Business and Economic Statistics, 14,
153–60.
Boswijk, H.P. and Franses, P.H. (1992) Dynamic specification and cointegration. Oxford
Bulletin of Economics and Statistics, 54, 369–81.
Box, G.E.P. and Jenkins, G.M. (1976) Time Series Analysis: Forecasting and Control. San
Francisco: Holden-Day.
Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods (second edition).
New York: Springer Verlag.
Burke, S.P. (1994a) Confirmatory data analysis: the joint application of stationarity and
unit root tests. Discussion Papers in Quantitative Economics and Computing, no. 20.
Department of Economics, University of Reading.
Burke, S.P. (1994b) Unit root tests of the Phillips type with data dependent selection of
the lag truncation parameter. Discussion Papers in Quantitative Economics and
Computing, no. 11, University of Reading.
Burke, S.P. (1996) Some reparameterizations of lag polynomials for dynamic analysis.
Oxford Bulletin of Economics and Statistics, 58, 373–89.


Burke, S.P. and Hunter, J. (1998) The impact of moving average behaviour on the
Johansen trace test for cointegration. Discussion Papers in Quantitative Economics and
Computing, no. 60, Department of Economics, University of Reading.
Caner, M., and Kilian, L. (2001) Size distortions of tests of the null hypothesis of
stationarity: evidence and implications for the PPP debate. Journal of International
Money and Finance, 20, 639–57.
Cheung, Y.-W. and Lai, K.S. (1993) Finite-sample sizes of Johansen’s likelihood ratio
tests for cointegration. Oxford Bulletin of Economics and Statistics, 55, 313–28.
Chow, G.C. (1978) Analysis and Control of Dynamic Economic Systems. New York: John
Wiley.
Clements, M.P. and Hendry, D.F. (1995) Forecasting in cointegrated systems. Journal of
Applied Econometrics, 10, 127–46.
Clements, M.P. and Hendry, D.F. (1998) Forecasting Economic Time Series. Cambridge:
Cambridge University Press.
Clements, M.P. and Hendry, D.F. (2001) Forecasting Non-Stationary Economic Time Series.
London: The MIT Press.
Corradi, V., Swanson, N.R., and White, H. (2000) Testing for stationarity-ergodicity and
for comovements between nonlinear discrete time Markov processes. Journal of
Econometrics, 96, 39–73.
Davidson, J.E.H. (1994) Stochastic Limit Theory. Oxford: Oxford University Press.
Davidson, J.E.H., Hendry, D.F., Srba, F., and Yeo, S. (1978) Econometric modelling of
the aggregate time series relationships between consumers’ expenditure and income
in the United Kingdom. Economic Journal, 88, 661–92.
Davidson, R. and MacKinnon, J.G. (1993) Estimation and Inference in Econometrics.
New York: Oxford University Press.
Davidson, R. and MacKinnon, J.G. (1998) Graphical methods for investigating the size
and power of hypothesis tests. The Manchester School, 66, 1–26.
Deaton, A.S. and Muellbauer, J.N.J. (1980) An almost ideal demand system. American
Economic Review, 70, 312–26.
Dickey, D.A. and Fuller, W.A. (1979) Distribution of the estimators for autoregressive
time series with a unit root. Journal of the American Statistical Association, 74, 427–31.
Dickey, D.A. and Fuller, W.A. (1981) Likelihood ratio statistics for autoregressive time
series with a unit root. Econometrica, 49, 1057–72.
Dickey, D.A., Hasza, D.P. and Fuller, W.A. (1984) Testing for unit roots in seasonal time
series. Journal of the American Statistical Association, 79, 355–67.
Dickey, D.A. and Pantula, S.G. (1987) Determining the order of differencing in auto-
regressive processes. Journal of Business and Economic Statistics, 5, 455–61.
Dhrymes, P.J. (1984) Mathematics for Econometrics. New York: Springer-Verlag.
Dolado, J., Galbraith, J.W. and Banerjee, A. (1991) Estimating intertemporal quadratic
adjustment costs models with dynamic data. International Economic Review, 32,
919–36.
Dornbusch, R. (1976) Expectations and exchange rate dynamics. Journal of Political
Economy, 84, 1161–76.
Doornik, J.A. (1995) Testing general restrictions on the cointegration space. Mimeo,
Nuffield College, Oxford.
Doornik, J.A. (1998) Approximations to the asymptotic distribution of cointegration
tests. Journal of Economic Surveys, 12, 573–93.
Doornik, J.A. (2003) Asymptotic tables for cointegration tests based on the gamma-
distribution approximation. Mimeo, Nuffield College, University of Oxford.

Doornik, J.A. and Hendry, D.F. (1996) PCFIML 9. London: Thompson International
Publishers.
Doornik, J.A. and Hendry, D.F. (2001) PCFIML 10. London: Timberlake Consultants
Press.
Dunne, J.P. and Hunter, J. (1998) The allocation of government expenditure in the UK:
a forward looking dynamic model. Paper presented at the International Institute of
Public Finance Conference, Cordoba, Argentina, August.
Elliott, G., Rothenberg, T.J. and Stock, J.H. (1996) Efficient tests for an autoregressive
unit root. Econometrica, 64, 813–36.
Engel, C. (2001) The responsiveness of consumer prices to exchange rates and the implications
for exchange-rate policy: a survey of a few recent new open economy macro
models. Mimeo, University of Wisconsin.
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation. Econometrica, 50, 987–1007.
Engle, R.F. and Granger, C.W.J. (1987) Co-integration and error-correction: representa-
tion, estimation and testing. Econometrica, 55, 251–76.
Engle, R.F. and Granger, C.W.J. (1991) Long-Run Economic Relationships. Oxford: Oxford
University Press.
Engle, R.F. and Yoo, B.S. (1987) Forecasting and testing in co-integrated systems. Journal
of Econometrics, 35, 143–59.
Engle, R.F. and Yoo, B.S. (1991) Cointegrated time series: an overview with new results.
Chapter 12 in R.F. Engle and C.W.J. Granger (eds), Long-run Economic Relationships.
Oxford: Oxford University Press.
Engle, R.F., Hendry, D.F. and Richard, J.-F. (1983) Exogeneity. Econometrica, 51, 277–304.
Engsted, T. and Haldrup, N. (1997) Money demand, adjustment costs and forward
looking behaviour. Journal of Policy Modeling, 19, 153–73.
Engsted, T. and Johansen, S. (1999) Granger’s representation theorem and multicointegration.
In Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive
W.J. Granger. Engle, R.F. and White, H., eds., Oxford and New York: Oxford
University Press, 200–11.
Ericsson, N.R. (1994) Testing exogeneity: An introduction, in Testing Exogeneity.
Ericsson, N.R. and Irons, J.S., eds, Oxford University Press, 3–38.
Ericsson, N.R. and Irons, J.S. (1994) Testing Exogeneity. Oxford: Oxford University Press.
Ericsson, N.R., Hendry, D.F. and Mizon, G.E. (1998) Exogeneity, cointegration and
economic policy analysis. Journal of Business and Economic Statistics, 16, 371–87.
Fama, E.F. (1970) Efficient capital markets: a review of theory and empirical work.
Journal of Finance, 25, 383–417.
Favero, C. and Hendry, D.F. (1992) Testing the Lucas critique: a review. Econometric
Reviews, 11, 265–306.
Fisher, P.G., Tanna, S.K., Turner, D.S., Wallis, K.F., and Whitley, J.D. (1990) Econometric
evaluation of the exchange rate in models of the UK economy. Economic Journal, 100,
1024–56.
Flôres, R. and Szafarz, A. (1995) Efficient markets do not cointegrate. Discussion Paper
9501, CEME, Université Libre de Bruxelles.
Flôres, R., and Szafarz, A. (1996) An extended definition of cointegration. Economics
Letters, 50, 193–5.
Florens, J.-P., Mouchart, M. and Rolin, J.-M. (1990) Sequential Experiments, Chapter 6 in
Elements of Bayesian Statistics. New York: Marcel Dekker.
Franses, P.H. (1994) A multivariate approach to modeling univariate seasonal time
series. Journal of Econometrics, 63, 133–51.
Galbraith, J.W. and Zinde-Walsh, V. (1993) Autoregressive approximation of ARMA
processes and choice of order in parametric unit root tests. Paper presented at EC2,
University of Oxford, December.
Galbraith, J.W. and Zinde-Walsh, V. (1999) On the distribution of the augmented
Dickey–Fuller statistics in processes with moving average components. Journal of
Econometrics, 93, 25–47.
Gantmacher, F.R. (1960) Matrix Theory, vol. I. New York: Chelsea Publishing Company.
Gohberg, I., Lancaster, P. and Rodman, L. (1983) Matrix Polynomials. New York:
Academic Press.
Goldberger, A.S. (1964) Econometric Theory. New York: John Wiley and Sons.
Gonzalo, J. (1994) Comparison of five alternative methods of estimating long-run
equilibrium relationships. Journal of Econometrics, 60, 203–33.
Gonzalo, J. and Pitarakis, J.-Y. (1999) Dimensionality effect in cointegration analysis.
In Cointegration, Causality and Forecasting: A Festschrift in Honour of Clive
W.J. Granger, Engle, R.F., and White, H., eds, Oxford: Oxford University Press.
Granger, C.W.J. (1969) Investigating causal relations by econometric models and cross-
spectral methods. Econometrica, 37, 424–38.
Granger, C.W.J. (1981) Some properties of time series data and their use in econometric
model specification. Journal of Econometrics, 16, 121–30.
Granger, C.W.J. (1983) Cointegrated variables and error-correcting models. University of
California San Diego Discussion Paper 83–13.
Granger, C.W.J. (1991) Developments in the study of cointegrated economic variables.
Chapter 4 in R.F. Engle and C.W.J. Granger (eds), Long-run Economic Relationships.
Oxford: Oxford University Press.
Granger, C.W.J. (1995) Modelling nonlinear relationships between extended-memory
variables. Econometrica, 63, 265–79.
Granger, C.W.J. and Hallman, J.J. (1991) Long memory series with attractors. Oxford
Bulletin of Economics and Statistics, 53, 11–26.
Granger, C.W.J. and Joyeux, R. (1980) An introduction to long memory time series
models and fractional differencing. Journal of Time Series Analysis, 1, 15–39.
Granger, C.W.J. and Lee, T-H. (1989) Multicointegration. In Co-integration, Spurious
Regressions, and Unit Roots. Fomby, T.B. and Rhodes, G.F., eds., Advances in
Econometrics, vol. 8 Greenwich, Conn. and London: JAI Press. 71–84.
Granger, C.W.J. and Morris, M.J. (1976) Time series modelling and interpretation.
Journal of the Royal Statistical Society, Series A, 139, 246–57.
Granger, C.W.J. and Newbold, P. (1974) Spurious regression in econometrics. Journal of
Econometrics, 2, 111–20.
Granger, C.W.J. and Newbold, P. (1986) Forecasting with Economic Time Series. New York:
Academic Press.
Granger, C.W.J. and Weiss, A.A. (1983) Time Series Analysis of Error-Correcting Models.
In Studies in Econometrics, Time Series, and Multivariate Statistics. New York: Academic
Press, 255–78.
Gregoir, S. and Laroque, G. (1994) Polynomial cointegration estimation and test. Journal
of Econometrics, 63, 183–214.
Haldrup, N. (1994) The asymptotics of single equation cointegration regressions with
I(1) and I(2) variables. Journal of Econometrics, 63, 153–81.
Haldrup, N. and Salmon, M. (1998) Representations of I(2) cointegrated systems using
the Smith–McMillan form. Journal of Econometrics, 84, 303–25.
Hall, A. (1989) Testing for a unit root in the presence of moving average errors.
Biometrika, 76, 49–56.

Hall, R.E. (1978) Stochastic implications of the life cycle-permanent income hypothesis:
theory and evidence. Journal of Political Economy, 86, 971–87.
Hall, S.J. and Wickens, M. (1994) Causality in integrated systems. Centre for Economic
Forecasting, Discussion paper, 27–93, London Business School.
Hamilton, J.D. (1994) Time Series Analysis. Princeton: Princeton University Press.
Hansen, B.E. (1995) Rethinking the univariate approach to unit root testing: using
covariates to increase power. Econometric Theory, 11, 1148–72.
Hansen, L.P. and Sargent, T.J. (1982) Instrumental variables procedures for estimating
linear rational expectations models. Journal of Monetary Economics, 9, 263–96.
Hansen, P. and Johansen, S. (1998) Workbook for Cointegration. Oxford: Oxford
University Press.
Harbo, I., Johansen, S., Nielsen, B., and Rahbek, A. (1998) Asymptotic inference on
cointegrating rank in partial systems. Journal of Business and Economic Statistics, 16,
388–399.
Harvey, A.C. (1989) Forecasting Structural Time Series Models and the Kalman Filter.
Cambridge: Cambridge University Press.
Harvey, A.C. (1993) Time Series Models (second edition). London: Harvester Wheatsheaf.
Hatanaka, M. (1996) Time-series-based Econometrics: Unit Roots and Cointegration. Oxford:
Oxford University Press.
Haug, A.A. (1993) A Monte Carlo study of size distortions. Economics Letters, 41, 345–51.
Haug, A.A. (1996) Tests for cointegration: A Monte Carlo comparison. Journal of
Econometrics, 71, 89–115.
Hendry, D.F. (1988) The encompassing implications of feedback versus feed-forward
mechanisms in econometrics. Oxford Economic Papers, 40, 132–49.
Hendry, D.F. (1995) Dynamic Econometrics. Oxford: Oxford University Press.
Hendry, D.F. and Ericsson, N.R. (1990) An econometric analysis of U.K. money demand
in Monetary Trends in the United States and the United Kingdom by Milton
Friedman and Anna Schwartz. American Economic Review, 81, 8–38.
Hendry, D.F. and Favero, C. (1992) Testing the Lucas critique: a review. Econometric
Reviews, 11, 265–306.
Hendry, D.F. and Mizon, G.E. (1978) Serial correlation as a convenient simplification
not a nuisance: a comment on a study of the demand for money by the Bank of
England. Economic Journal, 88, 549–63.
Hendry, D.F. and Mizon, G.E. (1993) Evaluating dynamic econometric models by
encompassing the VAR. Chapter 18 in P.C.B. Phillips (ed.), Models, Methods and
Applications of Econometrics: Essays in Honour of A.R. Bergstrom. Cambridge, MA:
Blackwell Publishers, 272–300.
Hendry, D.F., Pagan, A. and Sargan, J.D. (1983) Dynamic Specification: The Handbook of
Econometrics. Amsterdam: North Holland.
Hendry, D.F. and Richard, J.F. (1982) On the formulation of empirical models in
dynamic econometrics. Journal of Econometrics, 20, 3–33.
Hendry, D.F. and Richard, J.F. (1983) The econometric analysis of economic time series.
International Statistical Review, 51, 111–63.
Henry, M. and Robinson, P.M. (1996) Bandwidth choice in Gaussian Semi-parametric
estimation of long-run dependence. In the Papers and Proceedings of the Athens
Conference on Applied Probability and Time Series Analysis. Robinson, P.M. and
Rosenblatt, M. eds. New York: Springer-Verlag, 220–32.
Hosking, J.R.M. (1981) Fractional differencing. Biometrika, 68, 165–76.
Hubrich, K., Lutkepohl, H., and Saikkonen, P. (2001) A review of systems cointegration
tests. Econometric Reviews, 20, 247–318.
Hull, J. (2002) Options, Futures and Other Derivatives. London: Prentice Hall.
Hunter, J. (1989a) Dynamic modelling of expectations: with particular reference to the
labour market. Unpublished PhD manuscript, London School of Economics.
Hunter, J. (1989b) The effect of cointegration on solutions to rational expectations
models. Paper presented at European Econometrics Society Conference in Munich,
September.
Hunter, J. (1990) Cointegrating exogeneity. Economics Letters, 34, 33–5.
Hunter, J. (1992a) Tests of cointegrating exogeneity for PPP and uncovered interest rate
parity for the UK. Journal of Policy Modelling, Special Issue: Cointegration, Exogeneity
and Policy Analysis 14, 4, 453–63.
Hunter, J. (1992b) Representation and global identification of linear rational
expectations models. Paper presented at the European Econometrics Society Conference in
Uppsala, CERF Discussion Paper, 92–03, Brunel University.
Hunter, J. (1994) A parsimonious cointegration representation of multi-cointegration.
Paper presented at the European Econometrics Society Conference in Maastricht,
CERF Discussion paper no 94–02, Brunel University.
Hunter, J. (1995) Representation and global identification of linear rational
expectations. Mimeo, Brunel University.
Hunter, J. and Dislis, C.D. (1996) Cointegration representation, identification and
estimation. Brunel University, Centre for Research in Empirical Finance, Discussion
Paper.
Hunter, J. and Ioannidis, C. (2000) Identification and identifiability of non-linear
IV/GMM Estimators. Paper presented at the LACEA conference in Uruguay and the
ECSG conference in Bristol, Brunel University Discussion Paper, DP07–00.
Hunter, J. and Simpson, M. (1995) Exogeneity and identification in a model of the UK
effective exchange rate. Paper presented at the EC2 Conference in Aarhus Dec. 1995
and the Econometrics Society European Meeting in Istanbul 1996.
Inder, B. (1993) Estimating long-run relationships in economics: a comparison of
different approaches. Journal of Econometrics, 57, 53–68.
Johansen, S. (1988a) The mathematical structure of error correction models.
Contemporary Mathematics, 80, 359–86.
Johansen, S. (1988b) Statistical analysis of cointegration vectors. Journal of Economic
Dynamics and Control, 12, 231–54.
Johansen, S. (1991a) Estimation and hypothesis testing of cointegrating vectors in
Gaussian vector autoregressive models. Econometrica, 59, 1551–80.
Johansen, S. (1991b) A statistical analysis of cointegration for I(2) variables. University
of Helsinki, Department of Statistics Report, no. 77.
Johansen, S. (1992a) Testing weak exogeneity and the order of cointegration in UK
money demand data. Journal of Policy Modelling, Special Issue: Cointegration,
Exogeneity and Policy Analysis, 14, 313–34.
Johansen, S. (1992b) Cointegration in partial systems and the efficiency of single
equation analysis. Journal of Econometrics, 52, 3, 389–402.
Johansen, S. (1995a) Likelihood-Inference in Cointegrated Vector Auto-Regressive Models.
Oxford: Oxford University Press.
Johansen, S. (1995b) Identifying restrictions of cointegrating vectors. Journal of
Econometrics, 69, 111–32.
Johansen, S. (1995c) A statistical analysis of cointegration for I(2) variables. Econometric
Theory, 11, 25–59.
Johansen, S. (2002a) A small sample correction for the test of cointegrating rank in the
vector autoregressive model. Econometrica, 70, 1929–61.
Johansen, S. (2002b) A small sample correction for tests of hypotheses on the
cointegrating vectors. Journal of Econometrics, 111, 195–221.
Johansen, S. and Juselius, K. (1990) Maximum likelihood estimation and inference on
cointegration – with applications to the demand for money. Oxford Bulletin of
Economics and Statistics, 52, 169–210.
Johansen, S. and Juselius, K. (1992) Some structural hypotheses in a multi-variate
cointegration analysis of the purchasing power parity and the uncovered interest parity
for UK. Journal of Econometrics, 53, 211–44.
Johansen, S. and Juselius, K. (1994) Identification of the long-run and the short-run
structure: An application to the IS/LM model. Journal of Econometrics, 63, 7–36.
Johansen, S. and Swensen, A.R. (1999) Testing exact rational expectations in
cointegrated vector autoregressive models. Journal of Econometrics, 93, 73–91.
Juselius, K. (1994) Do PPP and UIRP hold in the long-run? – An example of likelihood
inference in a multivariate time-series model. Paper presented at Econometric Society
European Meeting, Maastricht.
Juselius, K. (1995) Do purchasing power parity and uncovered interest rate parity hold
in the long-run? – An example of likelihood inference in a multivariate time-series
model. Journal of Econometrics, 69, 178–210.
Keynes, J.M. (1939) Professor Tinbergen’s method. Reprinted in the Collected Writings of
John Maynard Keynes, vol. XIV, 306–18.
Kollintzas, T. (1985) The symmetric linear rational expectations model. Econometrica, 53,
963–76.
Koopmans, T.C. (1953) Identification problems in economic model construction. In
Studies in Econometric Method, Cowles Commission Monograph 14, Koopmans, T.C.
and Hood, W.C., eds. New York: John Wiley and Sons.
Kremers, J.J.M., Ericsson, N.R. and Dolado, J. (1992) The power of cointegration tests.
Oxford Bulletin of Economics and Statistics, 54, 325–48.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P. and Shin, Y. (1992) Testing the null of
stationarity against the alternative of a unit root: how sure are we that economic time
series have a unit root? Journal of Econometrics, 54, 159–78.
Lee, D., and Schmidt, P. (1996) On the power of the KPSS test of stationarity against
fractionally-integrated alternatives. Journal of Econometrics, 73, 285–302.
Leybourne, S.J. and McCabe, B.P.M. (1994) A consistent test for a unit root. Journal of
Business and Economic Statistics, 12, 157–66.
Leybourne, S.J., McCabe, B.P.M. and Tremayne, A.R. (1996) Can economic time series be
differenced to stationarity? Journal of Business and Economic Statistics, 14, 435–46.
Lin, J.-L. and Tsay, R.S. (1996) Co-integration constraint and forecasting: an empirical
examination. Journal of Applied Econometrics, 11, 519–38.
Lippi, M. and Reichlin, L. (1994) VAR analysis, non-fundamental representations,
Blaschke matrices. Journal of Econometrics, 63, 290–307.
Lucas, R.E. (1976) Econometric policy evaluation: a critique. In The Phillips Curve and
Labor Markets, Carnegie-Rochester Conference Series on Public Policy, vol. 1, Brunner,
K. and Meltzer, A.H. (eds). Amsterdam: North-Holland.
Lütkepohl, H. (1991) Introduction to Multiple Time-Series. Berlin: Springer-Verlag.
Lütkepohl, H. and Claessen, H. (1993) Analysis of cointegrated VARMA processes. Paper
presented at the EC2 conference at the Institute for Economics and Statistics, Oxford,
December.
MacKinnon, J.G. (1991) Critical values for cointegration tests. In Long-Run Economic
Relationships, R.F. Engle and C.W.J. Granger (eds). Oxford: Oxford University Press.
MacKinnon, J.G., Haug, A.A. and Michelis, L. (1999) Numerical distribution functions of
likelihood ratio tests for cointegration. Journal of Applied Econometrics, 14, 563–77.
Maddala, G.S. and Kim, I.-M. (1998) Unit Roots, Cointegration and Structural Change.
Cambridge: Cambridge University Press.
Marinucci, D. and Robinson, P.M. (2001) Finite-sample improvements in statistical
inference with I(1) processes. Journal of Applied Econometrics, 16, 431–44.
McCabe, B. and Tremayne, A.R. (1993) Elements of Modern Asymptotic Theory with
Statistical Applications. Manchester: Manchester University Press.
Mosconi, R. and Giannini, C. (1992) Non-causality in cointegrated systems:
representation, estimation and testing. Oxford Bulletin of Economics and Statistics, 54, 399–417.
Muellbauer, J. (1983) Surprises in the consumption function. Economic Journal,
Supplement March, 34–50.
Nankervis, J.C., and Savin, N.E. (1985) Testing the autoregressive parameter with the
t-Statistic. Journal of Econometrics, 27, 143–61.
Nankervis, J.C. and Savin, N.E. (1988) The student’s t approximation in a stationary first
order autoregressive model. Econometrica, 56, 119–45.
Nickell, S.J. (1985) Error-correction, partial adjustment and all that: an expository note.
Oxford Bulletin of Economics and Statistics, 47, 119–29.
Newey, W. and West, K. (1987) A simple positive semi-definite heteroskedasticity and
autocorrelation consistent covariance matrix. Econometrica, 55, 703–8.
Ng, S. and Perron, P. (1995) Unit root tests in ARMA models with data-dependent
methods for selection of the truncation lag. Journal of the American Statistical
Association, 90, 268–81.
Osterwald-Lenum, M. (1992) A note with quantiles of the asymptotic distribution of the
maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and
Statistics, 54, 461–71.
Park, J.Y. and Phillips, P.C.B. (1988) Statistical inference in regressions with integrated processes:
Part I. Econometric Theory, 4, 468–97.
Parker, S. (1998) Opening a can of worms: the pitfalls of time series regression analyses
of income inequality. Brunel University Discussion Paper, 98–11.
Paruolo, P. (1996) On the determination of integration indices in I(2) systems. Journal of
Econometrics, 72, 313–56.
Patterson, K. (2000) An Introduction to Applied Econometrics: a Time Series Approach.
Basingstoke: Macmillan.
Patterson, K. (2005) Topics in Nonstationary Economic Time Series. Basingstoke: Palgrave
Macmillan.
Pesaran, M.H. (1981) Identification of rational expectations models. Journal of
Econometrics, 16, 375–98.
Pesaran, M.H. (1987) The Limits to Rational Expectations. Oxford: Basil Blackwell.
Pesaran, M.H., Shin, Y. and Smith, R.J. (2000) Structural analysis of vector error
correction models with exogenous I(1) variables. Journal of Econometrics, 97, 293–343.
Pesaran, B. and Pesaran, M.H. (1998) Microfit 4. Oxford: Oxford Electronic Publishing.
Perron, P. (1989) The great crash, the oil price shock and the unit root hypothesis.
Econometrica, 57, 1361–1401.
Perron, P. (1990) Testing for a unit root in a time series with a changing mean. Journal of
Business and Economic Statistics, 8, 153–62.
Phillips, P.C.B. (1987) Time series regression with a unit root. Econometrica, 55, 277–302.
Phadke, M.S. and Kedem, G. (1978) Computation of the exact likelihood function of
multivariate moving average models. Biometrika, 65, 511–19.
Phillips, P.C.B. (1991) Optimal inference in cointegrated systems. Econometrica, 59,
283–306.
Phillips, P.C.B. (1994) Some exact distribution theory for maximum likelihood
estimators of cointegrating coefficients in error correction models. Econometrica, 62, 73–93.
Phillips, P.C.B. and Hansen, B.E. (1990) Statistical inference in instrumental variables
regression with I(1) processes. Review of Economic Studies, 57, 99–125.
Phillips, P.C.B. and Ouliaris, S. (1990) Asymptotic properties of residual based tests of
cointegration. Econometrica, 58, 165–93.
Phillips, P.C.B. and Perron, P. (1988) Testing for a unit root in time series regression.
Biometrika, 75, 335–46.
Podivinsky, J.M. (1993) Small sample properties of tests of linear restrictions on
cointegrating vectors and their weights. Economics Letters, 39, 13–18.
Reinsel, G.C. and Ahn, S.K. (1992) Vector autoregressive models with unit roots and
reduced rank structure: estimation, likelihood ratio test, and forecasting. Journal of
Time Series Analysis, 13, 353–75.
Reimers, H.-E. (1992) Comparisons of tests for multivariate cointegration. Statistical
Papers, 33, 335–59.
Robinson, P.M. (1994) Semi-parametric analysis of long-memory time series. Annals of
Statistics, 23, 1630–61.
Robinson, P.M. and Marinucci, D. (1998) Semiparametric frequency domain analysis of
fractional cointegration. STICERD discussion paper EM/98/348, London School of
Economics.
Robinson, P.M. and Yajima, Y. (2002) Determination of cointegrating rank in fractional
systems. Journal of Econometrics, 106, 217–41.
Rothenberg, T.J. (1971) Identification in parametric models. Econometrica, 39, 577–91.
Said, S.E. and Dickey, D.A. (1984) Testing for unit roots in autoregressive-moving
average models of unknown order. Biometrika, 71, 599–607.
Saikkonen, P. (1991) Asymptotically efficient estimation of cointegrating regressions.
Econometric Theory, 7, 1–21.
Sargan, J.D. (1964) Wages and prices in the UK: a study in econometric methodology.
In Econometric Analysis for National Economic Planning, P.E. Hart, G. Mills and
J.K. Whitaker (eds). London: Butterworth.
Sargan, J.D. (1975) The identification and estimation of sets of simultaneous stochastic
equations. LSE discussion paper no. A1.
Sargan, J.D. (1982) Alternatives to the Muellbauer method of specifying and estimating
a rational expectations model. Florida University discussion paper 68.
Sargan, J.D. (1983a) Identification and lack of identification. Econometrica, 51, 1605–33.
Sargan, J.D. (1983b) Identification in models with autoregressive errors. In Studies in
Econometrics. Time Series and Multivariate Statistics, S. Karlin, T. Amemiya and L.A.
Goodman (eds). New York: Academic Press, 169–205.
Sargan, J.D. (1988) Lectures on Advanced Econometric Theory. Oxford: Basil Blackwell.
Sargan, J.D. and Bhargava, A. (1983) Testing residuals from least squares regression for
being generated by a Gaussian random walk. Econometrica, 51, 153–74.
Sargent, T.J. (1978) Estimation of dynamic labour demand schedules under rational
expectations. Journal of Political Economy, 86, 1009–44.
Schwert, G.W. (1989) Tests for unit roots: a Monte Carlo investigation. Journal of
Business and Economic Statistics, 7, 147–59.
Sims, C. (1980) Macroeconomics and reality. Econometrica, 48, 11–48.
Spanos, A. (1986) Statistical Foundations of Econometric Modelling. Cambridge: Cambridge
University Press.
Spanos, A. (1994) On modeling heteroskedasticity: the Student’s t and elliptical linear
regression models. Econometric Theory, 10, 286–315.
Spliid, H. (1983) A fast estimation method for the vector auto-regressive moving average
model with exogenous variables. Journal of the American Statistical Association, 78,
843–49.
Stock, J.H. (1987) Asymptotic properties of least squares estimates of cointegration
vectors. Econometrica, 55, 1035–56.
Stock, J. and Watson, M. (1993) A simple estimator of cointegrating vectors in higher
order integrated systems. Econometrica, 61, 783–820.
Stock, J.H. and Watson, M.W. (2003) Introduction to Econometrics. Boston: Addison
Wesley.
Stock, J.H., Wright, J., and Yogo, M. (2002) A survey of weak instruments and weak
identification in GMM. Journal of Business and Economic Statistics, 20, 518–29.
Taylor, A.M. (1999) Recursive mean adjustment to tests of the seasonal unit root
hypothesis. Birmingham University Discussion paper, 99–11.
Theil, H. (1965) The information approach to demand analysis. Econometrica, 33, 67–87.
Tobin, J. (1950) A statistical demand function for food in the USA. Journal of the Royal
Statistical Society, Series A, 113–41.
Toda, H.Y. and Phillips, P.C.B. (1994) Vector autoregression and causality: a theoretical
overview and simulation study. Econometric Reviews, 13, 259–85.
Wallis, K.F. (1974) Seasonal adjustment and relations between variables. Journal of the
American Statistical Association, 69, 18–32.
Wallis, K.F., Andrews, M.J., Bell, D.N.F., Fisher, P.G. and Whitley, J.D. (1984) Models of
the UK Economy. Oxford: Oxford University Press.
White, H. (1980) A heteroskedasticity-consistent covariance estimator and a direct test
for heteroskedasticity. Econometrica, 48, 817–38.
Wickens, M.R. (1982) The efficient estimation of econometric models with rational
expectations. Review of Economic Studies, 49, 55–67.
Wickens, M.R., and Breusch, T.S. (1988) Dynamic specification, the long run and the
estimation of transformed regression models. Economic Journal, Conference Papers, 98,
189–205.
Wold, H. and Jureen, L. (1953) Demand Analysis. New York: Wiley.
Yoo, S. (1986) Multi-cointegrated time series and generalised error-correction models.
University of San Diego working paper.
Yule, G.U. (1926) Why do we sometimes get non-sense correlation between time-series?
A study of sampling and the nature of time-series. Journal of the Royal Statistical Society,
89, 1–64.
Yule, G.U. (1927) On a method of investigating periodicities in disturbed series with
special reference to Wolfer’s sunspot numbers. Philosophical Transactions (A), 226,
267–98.
Zivot, E. and Andrews, D.W.K. (1992) Further evidence on the Great Crash, the Oil Price
Shock and the unit root hypothesis. Journal of Business and Economic Statistics, 10,
251–70.
Index

Abadir, K., 160
Andrews, D.W.K., 33, 34
Asymptotic distribution
  of (cointegration) tests, 64, 225–34
  of estimators in cointegrated systems, 63, 67
Arellano, M., 188
Autocovariance, 9, 11, 18
Autoregressive Conditional Heteroscedasticity (ARCH), 117, 158, 201
Autoregressive distributed lag (ADL) model, 42, 45, 51, 52, 53, 56, 57
Autoregressive integrated moving average (ARIMA), 29, 50, 51, 52, 53
  linear combinations of, 51, 53
Autoregressive-moving average (ARMA) processes, 14, 16
  sum of, 48
  linear functions of, 48

Bannerjee, A., 11, 63–5
Barndorff-Nielsen, O.E., 116
Bauwens, L., 6, 36, 117, 137, 139, 143
Boswijk, H.P., 6, 71, 148, 150, 157, 210, 211n23
Brownian motion, 225, 226

Canonical Correlations, 223–4
Chow, G.C., 189
Claessen, H., 118, 126
Clements, M.P., 7, 160, 180–2, 185–7
Cointegration, 5, 6, 37, 47, 52, 56, 61, 71, 73, 74, 78, 84, 89–90, 95, 97, 105, 107, 116, 118, 125–6, 130, 139, 142, 159–61, 171, 173, 175, 187–8, 190, 195, 197–9, 200–1
  testing (see testing, cointegration)
Cointegrating Regression, 62, 68
  OLS Estimator, 62
Continuous mapping theorem, 226
Convergence, 63
  in distribution, 225, 226
  in probability, 63, 227
  rate of, 63
  weak, 225

Davidson, J.E.H., 3, 67, 70, 203n1
Davidson, R., 64, 208n18
Differencing, 2, 3, 5, 22, 29, 50, 78, 80, 84, 88, 106, 124, 126, 159
Dislis, C., 118, 126–127
Dolado, J.J., 160, 193
Donsker’s theorem, 225, 226
Doornik, J.A., 114, 117, 133, 135, 139, 156, 157, 168, 234, 235, 236
Dynamic
  models, 4, 40, 64, 70, 107
  specification, 2, 106, 107, 117, 128, 129, 130, 131
Dunne, J.P., 188

Elementary matrices, 215
Elementary row operations, 215
Elliot, G., 35
Engle, C., 136
Engle, R.F., 48, 62, 70, 71, 75, 78, 83, 105, 106, 161, 171, 173, 180, 193
Engsted, T., 160, 188
Equilibrium, 38, 56, 200–1
  error, 40, 41
  correction, 38, 41, 42
  speed of adjustment to (see speed of adjustment)
  static, 40, 47, 48, 52
Ericsson, N.R., 70, 128, 129, 137, 143, 151, 152, 203n3
Error-Correction, 67, 70, 71
  models (ECM), 6, 42, 43, 44, 45, 73, 129
  term, 67, 69, 105, 126, 205
Exact Distribution Theory, 64, 67
Exogeneity
  cointegrating (CE), 6, 129, 131–7, 152, 154, 157, 158, 189
  concept of, 70
  long-run, 128, 132
  strict (SE), 133, 135–6, 156
  strong, 129, 131, 137, 154–5
  super, 130, 137, 189
  weak (WE), 3, 129–137, 143–4, 151–8, 192

Favero, C., 189
Fisher, P.G., 108, 208n25, 209n3
Flôres, R., 71, 106, 160
Forecast evaluation
  cointegrated systems, 160, 186–8, 199, 212nn22, 23
  stationary time series, 186–7, 203n4
  non-stationary time series, 160, 187–8, 199, 211n7
  under co-integration constraints, 160, 188, 199
Forecasting, 160, 173, 175, 177–9, 181–4, 186, 198–9, 212nn9, 10
Fractional
  cointegration, 7, 171–2, 201
  differencing, 6, 31, 106, 171
  integration, 31, 35
  processes, 7, 160–1
Fuller, W.A., 32, 34, 37, 67, 107, 172

Gantmacher, F.R., 119
Generalized Bézout Theorem, 119
Generalized Least Squares (GLS), 117
Generalized Method of Moments (GMM), 4, 71, 188
Giannini, 133–4
Gohberg, I., 118, 125
Gonzalo, J., 71, 116
Granger, C.W.J., 4–5, 31, 40, 48, 62, 67, 69–71, 78, 89, 105, 107, 117–18, 121, 124, 126, 130, 159–62, 166, 171, 180, 194–5, 199, 210n16
Granger
  Causality, 157
  Representation Theorem, 6, 69, 73, 117–18, 126–7, 194
Gregoir, S., 125, 170

Haldrup, N., 119, 125, 160, 188, 193, 206
Hall, A., 34, 71
Hall, S.J., 129
Hansen, B.E., 35, 64, 106
Haug, A.A., 108, 213n3, 232, 234
Hendry, D.F., 3, 7, 70, 89, 111, 114, 117, 129, 131–2, 135, 139, 154, 160, 166, 168, 180–2, 185–7, 189, 198, 203n3, 235–6
Hubrich, K., 11, 118, 127
Hull, J., 116

Ioannidis, C., 188
Identification, 28, 202, 211n1
  Boswijk conditions for, 148–9
  empirical, 129, 137, 143, 148
  exact, 210n11
  generic, 128, 137–41, 143–4, 148
  global, 138, 142
  Hunter conditions for, 150, 210n17
  Johansen conditions for, 144–148, 210n17
  local, 138, 141
  order condition, 138, 140, 142–3, 146, 152–3, 237
  over, 210n11, 211n25
  rank condition, 139–41, 145–6, 150
  under, 210nn11, 12
Inference, 63, 66, 70, 201
  cointegrating rank, 6, 103–4, 116, 118, 125, 127, 202, 227–8, 231
Integration, 29, 30, 62
  of Order 1 (I(1)), 31, 34, 36, 48–53, 57, 62, 64, 74, 75, 77, 89–95, 97–8, 100, 106–7, 115, 150, 159–60, 162–4, 166, 168, 170–1, 177, 179–80, 189, 199
  Order 2 (I(2)), 7, 34, 85, 95, 106–7, 112, 115, 159–71, 187, 199
Interest Rate Parity (UIRP), 111, 170
Instrumental variables (IV), 67, 188

Johansen, S., 6–7, 70, 73, 77, 78, 89, 97, 104–11, 113–19, 125, 127, 129–30, 132–3, 137–9, 142–51, 154, 159–63, 165–6, 168–70, 179–80, 192, 212n1
Johansen
  procedure, 97–105, 105–7, 116–18, 125–7, 144, 179, 187, 193, 199, 207n10
  Procedure for I(2), 162–5, 168–9
  trace test, 100–5, 106, 108–10, 112–13, 117–18, 208nn20, 27, 227–8, 231
  λ-max test, 100–2, 104–5, 108–9, 208nn20, 21, 27, 233
  test for I(2), 114–15, 163–5, 167–8

Keynes, J.M., 4, 67, 203n4
Killian, L., 117
Koopmans, T.C., 139

Lag polynomials, 19–21, 26, 28, 42
  roots of, 20–1, 28, 32, 45, 49, 75, 78–85, 216
Laroque, G., 125, 170
Likelihood, 89, 97, 99, 111, 171, 180, 197, 208n16, 219–22, 223
  concentrated, 98–9, 197, 220, 235–6
  conditional, 100, 107
Lin, J.-L., 7, 160, 177, 179–80, 182–6, 188
Linear Quadratic Adjustment Cost Models, 189–97
Lippi, M., 71, 203n3
Long memory, 13, 160–1
Long-run solution, 38, 40, 42, 44–6, 48
Lubrano, M., 143
Lucas, R.E., 3, 188
Lütkepohl, H., 118, 126, 179

MacKinnon, J.G., 64, 208n18, 232, 234
Marinucci, D., 7, 106, 116, 118, 125, 172, 199
Matrix
  canonical form, 121
  determinant, 217
  inversion, 217–18
  polynomial, 72, 73, 75, 79, 82, 118–19, Appendix B.3, 207n3
  polynomial roots, 75, 78–85, 216
  singular value decomposition, 115, 208n19
  rational, 78, 79, 81, 82, 84
  uni-modular, 79, 216
Mizon, G.E., 3, 70, 129, 131, 209n1
Monte Carlo Simulation, 4, 116, 118, 141, 160, 188, 212n8
Mosconi, R., 133–4
Multicointegration, 6, 71, 210n16

Newbold, P., 4, 5, 67
Non-Stationarity, 8, 15, 21, 38, 62, 90, 94, 161, 173, 177, 183

Over-identifying restrictions, 143, 148, 150, 154–5, 188
Order in probability
  Op, 63, 206
  op, 213

Parker, S., 143
Partial systems, 234
Paruolo, P., 7, 166–8, 170
Patterson, K., 7, 32–4, 62, 63, 66
Pesaran, M.H., 143, 160, 188–90, 236
Phillips, P.C.B., 33, 35, 63, 64, 106, 118, 125, 157, 172
Podivinsky, J.M., 157
Polynomial
  cointegration, 71, 122, 125, 166, 170, 210n16
  lag (see lag polynomial)
Purchasing Power Parity (PPP), 108, 110–11, 113, 136–7, 143, 146, 170

Rahbek, A., 234
Random walk, 2–4, 22, 23, 26, 48, 52, 105, 107, 109, 116, 135
Rational Expectations Models, 4, 70, 71, 142, 188–97, 199
  cointegration, 192–7
  estimation, 197–8
  unit roots in the endogenous variables, 195–7
  weakly exogenous I(1) variables, 192–5
Reduced Form, 89, 125, 138–9, 141, 143, 149, 160, 192
Reichlin, L., 71, 203n3
Reparameterization, 32, 42, 44, 45, 72, 79, 80, 96, 122, 124, 126, 161, 165
Richard, J.-F., 3, 36, 111, 139, 143
Reimers, H.-E., 208n28
Robinson, P.M., 7, 106, 116, 118, 125, 171–3, 199, 211n5
Roots, see lag polynomials; matrix; unit root
Rothenberg, T.J., 35, 111, 138, 140–1, 143, 149

Salmon, M., 119, 125
Sargan, J.D., 2–3, 7, 67, 70, 71, 89, 119, 138, 140–2, 149, 157, 188, 192, 197, 211n1
Sargan–Bézout Factorization, 119–20
Sargent, T.J., 4, 190, 203
Savin, N.E., 35
Schmidt, P., 35
Sequential testing, 103–5, 208nn20, 23, 24, 211n3, 234
Shephard, N., 116
Simpson, M., 6, 105, 112–15, 129, 133, 135–6, 144, 146, 148, 166, 209n32, 237
Sims, C., 4, 70, 89
Small Sample Correction, 212
  Bartlett, 157
  Bootstrap, 157, 202
  Hypothesis Tests on cointegrating vectors (β), 157, 209n8
  Trace Test, 208n28, 212n1
  Tests of Linear Restrictions, 157, 209n8
Smith–McMillan, 71, 78–86, 125
  rational form, 79
  Yoo transformation (Smith–McMillan–Yoo form), 4, 79, 82, 83, 84, 118, 126
Speed of adjustment, 41, 42, 45, 46, 60, 202, 213nn4, 5
Stationarity, 5, 10, 11, 29
  co-variance, 11, 203
  difference, 30
  strict, 14
  tests, 35, 69
  trend stationarity, 30
  weak, 203
Stock, J.H., 35, 36, 62, 188
Structural
  form, 139, 143, 149, 192
  models, 4, 31, 36, 138, 142, 159–60
Structural breaks, 7, 34, 106, 116, 159, 201–2, 203n4, 212n4
Spanos, A., 114, 117
Spliid, H., 198
Spurious Regression, 5
Super-consistency, 63
Szafarz, A., 71, 106, 160

Talmain, G., 160
Taylor, A.M., 191
Testing
  autoregressive t-values, 63
  cointegration, 57, 64–7, 70, 97–105, 108–10, 112, 114–15
  cointegration in the I(2) case, 186–9, 211nn2, 3
  cointegrating exogeneity (CE), 132–7, 155, 209n7
  general restrictions, 135–7, 155–6, 235–6
  identifiability, 150, 154, 155
  identifying restrictions, 145, 148, 155
  long-run exclusion (LE), 132–7, 156–7
  over-identifying restrictions, 150, 154, 155
  normalization, 148
  null of stationarity, 35
  restrictions, 132–4, 154–5
  strict exogeneity (SE), 133, 135–6, 209n5
  strong exogeneity, 137, 154, 156–7, 210n10
  unit roots see Unit root tests
  weak exogeneity (WE), 132–7, 154, 156–7, 209n9
Time Series Models
  ARFIMA, 6, 31
  autoregressive (AR), 2, 18, 21, 22, 32
  autoregressive-moving average (ARMA), 2, 26, 28, 48, 50
  FARIMA see ARFIMA
  moving average (MA), 2, 16, 17, 20, 29
  multivariate, 4–7, 71–77, 118, 125–7 see also Vector Processes (VAR, VARMA and VMA)
Trend, 5, 108
  common, 164–6, 168, 170, 181, 199
  deterministic, 5, 23, 26, 108–9, 110, 166, 168, 202, 211n2
  I(2), 115, 164, 166, 168, 170, 199
  quadratic, 163, 166, 211n2
  stochastic, 23, 92, 96, 108, 168, 175
Tsay, R.S., 7, 160, 177, 179–80, 182–6, 188

Unit root, 22, 29, 30, 31
Unit root tests
  Augmented Dickey–Fuller, 34, 35, 64
  critical values, 64–5
  Dickey–Fuller, 71
  Elliot–Stock–Rothenberg, 36
  power, 36, 64
  Phillips, 33, 63
  Phillips–Perron, 33, 64, 66
  Zivot–Andrews, 34

Vector autoregressive (VAR) process, 4–6, 69–70, 72–3, 77–8, 81, 87–92, 94–5, 97, 105–19, 122–3, 125–31, 142, 144, 151, 156, 161–3, 169–70, 177–82, 187, 192, 198, 199
Vector autoregressive-moving average (VARMA) process, 75–7, 118–19, 122, 125–6, 161
Vector autoregressive fractionally integrated moving average (VARFIMA) process, 172, 211n6
Vector error correction model (VECM), 6, 72–3, 89–91, 95, 97, 100, 117, 122–3, 125, 162, 177–80, 182–3, 187
Vector moving average (VMA) process, 6, 69–70, 73–5, 78–89, 96–7, 118–19, 123, 125–6, 142, 173–8, 198

Wallis, K.F., 2, 203n8, 208n25
White, H., 34
White noise, 5, 15–6, 189, 191
Wickens, M., 4, 116, 129, 132
Wold, H., 1–2, 5, 119, 117–18, 125–6, 161, 194
Wold Representation theorem, 17, 67, 69, 71, 73, 74, 207n5

Yajima, Y., 7, 171–3, 199
Yoo, B.S., 6, 71, 78, 83, 89, 118, 121, 125–6, 166, 170, 173, 180, 182
Yule, G.U., 2, 4, 67, 71, 78, 79, 82
