
NOTES ON TIME SERIES ECONOMETRICS FOR BEGINNERS USING STATA


Phùng Thanh Bình
School of Economics – UEH
[email protected]
[26/2/2018]

Special thanks to Dr. Costas Leon for his comments and motivation

Contents

Section 1: Introduction
Section 2: An introduction to time series econometrics
Section 3: Stationary stochastic processes
Section 4: Nonstationary stochastic processes
Section 5: Unit roots and spurious regressions
Section 6: Testing for unit roots
Section 7: Short-run and long-run relationships
Section 8: Cointegration and error correction models
Section 9: Vector autoregressive models
Section 10: VECM and the Johansen approach to cointegration tests
Section 11: Causality tests
Section 12: Bounds test for cointegration
Section 13: Nonstationary panels
Section 14: Suggested research topics
Section 15: Concluding remarks
References

1. INTRODUCTION
I wrote these notes on time series econometrics for my students in Applied Economics at the University of Economics, HCMC (UEH)1. Since most economics students in developing countries are likely to have problems with English as a second language, with their mathematics background, and especially with access to up-to-date resources for self-study, I hope this series of lectures makes some contribution. The aim is to help you understand the key concepts of time series econometrics through hands-on examples in Stata. By the end, you should be able to read research articles based on time series methods. Moreover, I also expect that you will become interested in time series data analysis and write your dissertation in this field. At the time of preparing this series of lectures, I believe that Vietnamese time series data2 are long enough for you to conduct such a study. This is just a brief summary of the body of knowledge in time series econometrics according to my own understanding. Obviously, it has no scientific value for citation. In addition, research using bivariate models is not strongly appreciated by journal editors3 or by university supervisors. As a researcher, you must be fully responsible for your own choice of research project. My advice is that you should start with the research problem of interest, not with data availability or statistical techniques. Ironically, at the time of writing, 'exploratory factor analysis' is still the preferred choice of many young researchers in economics and business at UEH. They blindly imitate previous studies. Honestly, I do not want the models presented in these notes to become the target of a second wave of such critiques. Therefore, use them only when you really need them and clearly understand them.
Some topics such as serial correlation, distributed lag models, ARIMA models, ARCH
models, multivariate ARCH models, unit root tests and cointegration tests with
structural breaks4, dynamic OLS, and fully modified OLS are beyond the scope of this
series of lectures. You can find them elsewhere, such as in econometrics textbooks, journal articles, Stata manuals, and my handouts.
After studying this series of lectures, you should be able to basically understand the
following topics in time series econometrics:
▪ An overview of time series econometrics
▪ The concepts of nonstationary, AR, MA, and random walk processes

1
Website: www.ueh.edu.vn. Address: 59C Nguyen Đinh Chieu Street, District 3, Ho Chi Minh City, Vietnam.
2
The most important data sources for these studies can be World Bank’s World Development Indicators, IMF-
IFS, the General Statistics Office, and Thomson Reuters.
3
See Ozturk (2010), Omri (2014).
4
See ‘Nonstationarity II: Breaks’ in Stock & Watson (2015: p.561-67), Binh (2011), Narayan (2005).

▪ The concept of spurious regression
▪ The unit root tests
▪ The short-run and long-run relationships
▪ Autoregressive distributed lag (ARDL) model and error correction model (ECM)
▪ Engle-Granger (EG) approach for cointegration and ECM estimation
▪ Vector autoregressive (VAR) models
▪ Vector error correction model (VECM) and Johansen approach for cointegration
▪ Granger causality tests (standard and augmented versions)
▪ ARDL and bounds test for cointegration
▪ Nonstationary panels
▪ Basic practicalities in Stata (versions 14 & 15)
▪ Suggested research topics
To get started, you should be familiar with basic econometrics and statistics5. Searching for research articles, I realize that this kind of model has been widely applied in the fields of macroeconomics, financial economics, and especially energy economics. These models, however, only equip you with the tools for doing research; specialized knowledge from the literature review is the real key.

2. AN INTRODUCTION TO TIME SERIES ECONOMETRICS


In this series of lectures, we will mainly discuss single equation estimation techniques
in a very different way from what you have previously learned in either Basic
Econometrics or Applied Econometrics courses. Asteriou & Hall (2011: p.266) said that
there are various aspects to time series analysis but the most common theme is to fully
exploit the dynamic structure in the data. Put differently, we will extract as much
information as possible from the past values of a certain series. The analysis of time
series is usually explored within two fundamental categories, namely, forecasting and
dynamic modelling. Pure time series forecasting, such as ARIMA and ARCH/GARCH
models, is often referred to as univariate analysis. Unlike most other econometrics, in univariate analysis we are not much concerned with building structural models, understanding the economy, or testing economic hypotheses; what we are really interested in is developing efficient models that are able to forecast accurately. Forecasting models can be empirically evaluated in various ways, such as the

5
Suggested references: Gujarati/Porter (2009), Gujarati (2011), Wooldridge (2013), Asteriou & Hall (2011),
Studenmund (2017), Adkins & Hill (2011), Acock (2014), and Hamilton (2013).

significance of the estimated coefficients (especially the longest lags and white noise
nature of the errors in ARIMA models), correct sign of the estimated coefficients in
ARCH models, diagnostic checking using the correlogram, Akaike and Schwarz
criteria, and so on. In these cases, we try to exploit the dynamic inter-relationship, which
exists over time for any single series (say, sales, asset prices, or interest rates). On the
other hand, dynamic modelling, including bivariate and multivariate analysis, is mostly
concerned with understanding the structure of an economy and testing economic
hypotheses. However, this kind of modelling assumes that the series slowly adjusts to a
shock and so to understand the process must fully capture the adjustment process which
may be long and complex (Asteriou & Hall, 2011: p.266). The dynamic modelling has
become increasingly popular thanks to the works of three Nobel laureates in Economics,
namely, Clive Granger (for methods of analyzing economic time series with common
trends, or cointegration), Robert Engle (for methods of analyzing economic time series
with time-varying volatility or ARCH), and Christopher Sims (for vector autoregressive
or VAR). Up to now, dynamic modelling has remarkably contributed to economic
policy formulation, especially in macroeconomics, financial markets and energy
sectors. Generally, the key purpose of time series analysis is to capture and examine the
dynamics of the data.
In time series econometrics, it is equally important that the analysts should clearly
understand the term stochastic process. It is a collection of random variables ordered in
time (Gujarati & Porter, 2009: p.740). If we let Y denote a random variable, and if it is
continuous, we denote it as Y(t), but if it is discrete, we denote it as Yt. Since most
economic data are collected at discrete points in time, we usually use the notation Yt
rather than Y(t). If we let Y represent GDP, we have Y1, Y2, Y3, …, Y99, where the
subscript 1 denotes the first observation (i.e., GDP for the third quarter of 1993) and the
subscript 99 denotes the last observation (i.e., GDP for the first quarter of 2018). Keep
in mind that each of these Y’s is a random variable.
In what sense can we regard GDP as a stochastic process? Consider, for instance, the
Vietnam GDP of 836.270 billion VND for 2017Q3. In theory, the GDP figure for the
third quarter of 2017 could have been any number, depending on the prevailing
economic and political climates. The figure of 836.270 billion VND is just a particular
realization of all such possibilities. In this case, we can think of the value of 836.270
billion VND as the mean value of all possible values of GDP for the third quarter of
2017. In other words, GDP value at a certain point in time is characterized as a normal
distribution. Therefore, we can say that GDP is a stochastic process and the actual values
observed for the period 1993Q3 to 2018Q1 are a particular realization of that process.
Gujarati & Porter (2009: p.740) states that “the distinction between the stochastic
process and its realization in time series data is just like the distinction between
population and sample in cross-sectional data”. Just as we use sample data to draw

inferences about a population, in time series we use the realization to draw inferences
about the underlying stochastic process.
The reason why we mention this term before examining specific models is that all basic
assumptions in time series models relate to the stochastic process (population). Stock &
Watson (2015: p.523) said that the assumption that the future will be like the past is an
important one in time series regression, sufficiently so that it is given its own name:
“stationarity”. If the future is like the past, then the historical relationships can be used to
forecast the future. But if the future differs fundamentally from the past, then the
historical relationships might not be reliable guides to the future. Therefore, in the
context of time series regression, the idea that historical relationships can be generalized
to the future is formalized by the concept of stationarity.

3. STATIONARY STOCHASTIC PROCESSES


3.1 Definition
According to Gujarati & Porter (2009: p.740), a key concept underlying stochastic
process that has received a great attention and investigation by time series analysts is
the stationary stochastic process. Broadly speaking, “a time series is said to be stationary
if its mean and variance are constant over time and the value of the covariance6 between
the two periods depends only on the distance between the two periods and not the actual
time at which the covariance is computed” (Gujarati & Porter, 2009: p.740). In the time
series literature, such a stochastic process is known as a weakly stationary or covariance
stationary. By contrast, a time series is strictly stationary if all the moments of its
probability distribution are time-invariant. If, however, the stationary process is normal,
the weakly stationary stochastic process is also strictly stationary. For most practical
situations, the weak type of stationarity generally suffices. According to Asteriou &
Hall (2011: p.267), a weakly stationary series is characterized by:
(a) exhibits mean reversion in that it fluctuates around a constant long-run mean;
(b) has a finite variance that is time-invariant; and
(c) has a theoretical correlogram that diminishes as the lag length increases.
In its simplest terms, a time series Yt is said to be weakly stationary (hereafter referred to as stationary) if:
(a) mean: E(Yt) = μ (constant for all t);
(b) variance: Var(Yt) = E(Yt − μ)² = σ² (constant for all t); and

6
or the autocorrelation coefficient.

(c) covariance: Cov(Yt, Yt+k) = γk = E[(Yt − μ)(Yt+k − μ)].
where γk, the covariance (or more exactly, the autocovariance) at lag k, is the covariance between the values of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent values of Y.
Suppose we shift the origin of Y from Yt to Yt+m (say, from the third quarter of 1998 to
the third quarter of 2008 for our GDP data). Now, if Yt is to be stationary, the mean,
variance, and autocovariance of Yt+m must be the same as those of Yt. In short, if a time
series is stationary, its mean, variance, and autocovariance (at various lags) remain the
same no matter at what point in time we measure them. Gujarati & Porter (2009: p.741)
state that such a time series will tend to return to its mean (i.e., mean reversion) and
fluctuations around this mean will have a broadly constant amplitude.
If a time series is not stationary in the sense just defined, it is called a nonstationary
time series. In other words, a nonstationary time series will have a time-varying mean
or a time-varying variance or both.
Why is stationarity important? There are at least two reasons. First, if a time series is
nonstationary, we can study its behavior only for the time period under consideration
(Gujarati & Porter, 2009: p.741). Therefore, each set of time series data will be for a
particular episode only. As a result, it is impossible to generalize it to other time periods.
Therefore, for the purpose of forecasting or policy analysis, such time series may have
little practical value. Second, if we run regressions between nonstationary series, the
results may be spurious (Gujarati & Porter, 2009: p.748; Asteriou & Hall, 2011: p.267).
In addition, a special type of stochastic process, namely, a purely random, or white noise
process, is also popular in time series econometrics. According to Gujarati & Porter
(2009: p.741), we call a stochastic process purely random if it has zero mean, constant
variance 2, and is serially uncorrelated. This is similar to what we call the error term,
ut, in the classical normal linear regression model (CNLRM). This error term is often
denoted as ut ~ iid(0,2).
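As a quick illustration (a minimal sketch; the variable name wn and the seed are arbitrary choices), you can simulate a white noise series in Stata and confirm that its autocorrelations are close to zero:

clear
set obs 400
set seed 12345
gen t = _n
tsset t
gen wn = rnormal()          // purely random (white noise) series, iid N(0,1)
tsline wn
corrgram wn, lags(15)       // all autocorrelations should be close to zero
wntestq wn                  // Ljung-Box portmanteau (Q) test for white noise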
3.2 MA and AR Processes
In this section, we will investigate two typical types of the stationary process, namely
moving average (MA) and autoregressive (AR).
MA(1) process
The first order of MA process [MA(1)] is defined as:
Xt = t + t-1, where t ~ iid(0,2) (1)
For examples,
OilPt = t + 0.5t-1
5
where OilPt is change in oil price and t is typhoon at sea at the current time.
Lemonadet = t - 0.5t-1
Where Lemonadet is change in lemonade quantity demanded and t is change in
temperature at the current time (see Ben Lambert’s online tutorial lectures).
MA(1) is a stationary series because it satisfies all three conditions for stationarity.

Proof:
From equation (1):
• Mean is constant.
E[Xt] = E[t + t-1] = E[t] + E[t-1] = 0 (2)
• Variance is constant.
Var(Xt) = Var(t + t-1) = Var(t) + 2Var(t-1)
= 2 + 22
= 2(1 + 2) (3)
Both  and  are constant, so Var(Xt) is indeed constant.
• Covariance only depends on the distance between two periods.
Cov(Xt, Xt-h) = f(h) ≠ f(t)
Cov(Xt, Xt-1) = Cov(t + t-1, t-1 + t-2)
= Cov(t-1, t-1) = 2 (4)
Cov(Xt, Xt-) = Cov(t + t-1, t-  + t-1- ) = 0 (5)

From (3) and (4), we have:


• The autocorrelation coefficient of order 1 is different from zero.
ρ1 = Cov(Xt, Xt-1)/Var(Xt) = θσ²/[σ²(1 + θ²)] = θ/(1 + θ²) ≠ 0

From (3) and (5), we have:


• The autocorrelation coefficient of order τ (τ > 1) is always zero.
ρτ = Cov(Xt, Xt-τ)/Var(Xt) = 0/[σ²(1 + θ²)] = 0
This property of autocorrelation is essentially important for identifying whether a
certain series follows MA(1) process or AR(1) process.

AR(1) process
The first order of AR process [AR(1)] is defined as:
Xt = θXt-1 + εt, where εt ~ iid(0, σ²)   (6)
For example,
OilPt = 0.5OilPt-1 + εt
where OilPt-1 is the change in the oil price in the last period and εt is any shock at the current period.
AR(1) is a stationary series because it satisfies all three conditions for stationarity.

Proof:
From equation (6):
• Mean is constant (i.e., zero).
Xt = Xt-1 + t; t ~ iid(0,2) (6)
= [Xt-2 + t-1] + t
= 2Xt-2 + t-1 + t
= 2[Xt-3 + t-2] + t-1 + t
=…
= tX0 + 0t-0 + 1t-1 + 2t-2 + … + Tt-T
= 0t-0 + 1t-1 + 2t-2 + … + Tt-T
= t + t-1 + 2t-2 + … + Tt-T (7)
Because X0 is always zero (i.e., the starting value of any series is obviously zero).
Therefore,
E[Xt] = E[t + t-1 + 2t-2 + … + Tt-T]
=0 (8)
• Variance is constant.
Because
Xt = t + t-1 + 2t-2 + … + Tt-T (9)
So we have
Xt-1 = t-1 + t-2 + 2t-3 + … + Tt-T-1 (10)

Therefore,
Var(Xt) = Var(Xt-1)
As a result, from (6), (9), and (10) we have
Var(Xt) = Var(θXt-1) + Var(εt)
⇒ Var(Xt) = θ²Var(Xt-1) + Var(εt)
⇒ Var(Xt) = θ²Var(Xt) + Var(εt)
⇒ (1 − θ²)Var(Xt) = σ²
⇒ Var(Xt) = σ²/(1 − θ²) = constant if |θ| < 1   (11)
• Covariance is constant.
We have
Xt-h = hXt + t-h + t-h-1 + 2t-h-2 + … + Tt-h-T (12)
Therefore,
Cov(Xt, Xt-h) = Cov(Xt, hXt) = hCov(Xt, Xt) = hVar(Xt)
= h[2/(1 - 2)] (13)

From (11) and (13), the autocorrelation coefficient of order h is
ρh = Cov(Xt, Xt-h)/Var(Xt) = θ^h   (14)
For AR(1), therefore, autocorrelation coefficient declines as h increases. This property
is essentially important for identifying whether a certain time series follows MA(1)
process or AR(1) process.
By the same token, we can show that all MA(q) and AR(p) are also stationary processes.
In principle, we can easily recognize whether a certain series follows MA(q) or AR(p)
by looking their corresponding autocorrelation functions (ACF) and partial
autocorrelation functions (PACF).
• The ACF simply represents the pattern of autocorrelation coefficients with
respect to the corresponding lags. For example, AC1 is corr(Yt,Yt-1), AC2
corr(Yt,Yt-2), …, and ACq corr(Yt,Yt-q).
• Whereas the PACF simply represents the pattern of partial regression
coefficients of the autoregressive model. For example, PAC1 is the regression
coefficient of Yt-1 when regressing Yt on Yt-1 (i.e., simple regression model);
PAC2 is the regression coefficient of Yt-2 when regressing Yt on Yt-1 and Yt-2 (i.e.,
multiple regression model); PAC3 is the regression coefficient of Yt-3 when

regressing Yt on Yt-1, Yt-2 and Yt-3 (i.e., multiple regression model), …, and PACp
is the regression coefficient of Yt-p when regressing Yt on Yt-1, Yt-2, …, and Yt-p
(i.e., multiple regression model).

With a stationary series, the rule of thumb is as follows:


• MA(1) has AC1 ≠ 0, while the rest = 0; MA(2) has AC1 ≠ 0 and AC2 ≠ 0, while the rest = 0; and so on.
• AR(1) has PAC1 ≠ 0, while the rest = 0; AR(2) has PAC1 ≠ 0 and PAC2 ≠ 0, while the rest = 0; and so on.
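As an informal check of the PACF interpretation above (a minimal sketch, assuming the AR1 series generated by the do-file in Section 3.3 is already in memory), the partial autocorrelation at lag 2 should be close to the coefficient on the second lag in a regression of the series on its first two lags:

reg AR1 L.AR1 L2.AR1        // the coefficient on L2.AR1 approximates PAC2
corrgram AR1, lags(5)       // compare with the PAC column reported here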

3.3 Examples of MA and AR processes

MA(1) and AR(1)


Using Stata do-file as the following commands:
clear
set obs 400
gen timevar = _n
set seed 12345
drawnorm e, n(400) means(0) sds(1)
tsset timevar
gen AR1 = 0
replace AR1 = 0.7*L.AR1 + e if _n > 1
gen MA1 = 0
replace MA1 = e+0.7*L.e if _n > 1
ac MA1, lags(15)
pac MA1, lags(15)
ac AR1, lags(15)
pac AR1, lags(15)

We have the following graphs:


Figure 3.1: ACF of MA(1) process. Figure 3.2: PACF of MA(1) process.

We see that only AC1 (i.e., ρ1) of the MA1 process is statistically different from zero.

Figure 3.3: ACF of AR(1) process. Figure 3.4: PACF of AR(1) process.

We see that ACh (i.e., ρh = θ^h) of the AR1 process declines toward zero as h increases. In this case, only PAC1 is statistically different from zero.

MA(2) and AR(2)


Using Stata do-file as the following commands:
clear
set obs 400
gen t = _n

tsset t
gen eps = invnorm(uniform())
scalar theta0= 0
scalar theta1= 0.6
scalar theta2= 0.3
gen double AR2 = 0
qui replace AR2 in 3/l = theta0 + theta1*L.AR2 + theta2*L2.AR2 + eps
gen double MA2 =0
qui replace MA2 in 3/l = eps + theta1*L.eps + theta2*L2.eps
ac MA2, lags(15)
pac MA2, lags(15)
ac AR2, lags(15)
pac AR2, lags(15)

We have the following graphs:



Figure 3.5: ACF of MA(2) process. Figure 3.6: PACF of MA(2) process.

We see that only AC1 and AC2 (i.e., ρ1 and ρ2) of the MA2 process are statistically different from zero.


Figure 3.7: ACF of AR(2) process. Figure 3.8: PACF of AR(2) process.

We see that ACh (i.e., ρh) of the AR2 process declines toward zero as h increases. In this case, only PAC1 and PAC2 are statistically different from zero.
By the same token, you can generate other series with higher lag orders such as MA(3)
and AR(3), MA(4) and AR(4), and so on. However, actual economic time series rarely
exhibit the exact patterns as theoretically shown.

3.4 Invertibility: Converting an AR(1) to an MA(∞) and Vice Versa


AR(1) => MA(∞)
If |θ| < 1, the AR(1) process can be converted into an infinite-order MA process with geometrically declining weights. This is simply proved as follows:
Xt = Xt-1 + t; t ~ iid(0,2) (6)
= [Xt-2 + t-1] + t
= 2Xt-2 + t-1 + t
= 2[Xt-3 + t-2] + t-1 + t
=…
= tX0 + 0t-0 + 1t-1 + 2t-2 + … + t-
= 0t-0 + 1t-1 + 2t-2 + … + t-
= t + t-1 + 2t-2 + … + t-
= t + 1t-1 + 2t-2 + … + t- = MA() (15)

MA(1) => AR(∞)
If |θ| < 1, the MA(1) process can be converted into an infinite-order AR process with geometrically declining weights. This is simply proved as follows:
Xt = t + t-1, where t ~ iid(0,2) (1)
Using the lag operator [i.e., Lt = t-1, L2t = t-2, L3t = t-3, …], equation (1) can be
rewritten as:
Xt = (1 + L)t
Xt
t = (16)
(1+ θL)

If || < 1, then the left-hand side of equation (16) can be considered as the sum of an
infinite geometric progression:
t = Xt(1 - L + 2L2 - 3L3 + …)
t = Xt - LXt + 2L2Xt - 3L3Xt + …
Xt = LXt - 2L2Xt + 3L3Xt - … + t
Xt = Xt-1 - 2Xt-2 + 3Xt-3 - … + t
Xt = 1Xt-1 - 2Xt-2 + 3Xt-3 - … + t (17)
To understand equation (17), let us rewrite the MA(1) process as defined in equation
(1) as below:
εt = Xt − θεt-1   (18)
Lagging the relationship in equation (18) one period, we have:
εt-1 = Xt-1 − θεt-2   (19)
Substituting this into the original expression [i.e., Eq.(1)], we have:
Xt = εt + θ[Xt-1 − θεt-2] = εt + θXt-1 − θ²εt-2   (20)
Lagging the relationship in equation (19) one period, we have:
εt-2 = Xt-2 − θεt-3   (21)
Substituting this into the expression in Eq.(20), we have:
Xt = εt + θXt-1 − θ²[Xt-2 − θεt-3] = εt + θXt-1 − θ²Xt-2 + θ³εt-3   (22)
If we go on this procedure [i.e., lagging and substituting] for an infinite number of times,
we finally get the expression in equation (17).
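A small numerical check of the AR(1) => MA(∞) result can also be done in Stata (a hedged sketch; the truncation at 20 lags and the coefficient 0.7 are arbitrary choices):

clear
set obs 400
set seed 12345
gen t = _n
tsset t
gen e = rnormal()
gen AR1 = 0
replace AR1 = 0.7*L.AR1 + e if _n > 1
* truncated MA(infinity) built from the same shocks:
* X_t is approximately e_t + 0.7*e_{t-1} + 0.7^2*e_{t-2} + ... + 0.7^20*e_{t-20}
gen MAinf = e
forvalues i = 1/20 {
    replace MAinf = MAinf + (0.7^`i')*L`i'.e if _n > `i'
}
corr AR1 MAinf if _n > 20   // the two series should be almost identical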

4. NONSTATIONARY STOCHASTIC PROCESSES
4.1 Random Walk
According to Stock and Watson (2015: p.523), time series variables can fail to be
stationary in various ways, but two are especially relevant for regression analysis of
economic data: (1) the series can have persistent, long-run movements, that is, the series
can have trends; and, (2) the population regression can be unstable over time, that is,
the population regression can have breaks. For the purpose of this series of lectures, we
only focus on the first type of nonstationarity.
A trend is a persistent long-term movement of a variable over time. A time series
variable fluctuates around its trend. There are two types of trends often observed in time
series data: deterministic and stochastic. A deterministic trend is a nonrandom function of time (i.e., Yt = A + B·T + ut, Yt = A + B·T + C·T² + ut, and so on)7. For example, the LEX series [the logarithm of the dollar/euro daily exchange rate, i.e., LEX = log(EX); see the data in Table13-1.dta, Gujarati (2011)] is a nonstationary series (Figure 4.1), and its detrended series (i.e., the residuals from the regression of log(EX) on time: et = log(EX) − a − b·Time) is still nonstationary (Figure 4.2). This indicates that log(EX) is not a trend stationary series. Note that for now we temporarily accept that a series with a trend is nonstationary. However, this informal method is not always reliable. We will shortly introduce formal statistical tests for nonstationarity, called unit root tests, such as the ADF [augmented Dickey-Fuller] and PP [Phillips-Perron] tests.
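A minimal sketch of the commands that can reproduce Figures 4.1 and 4.2 (assuming the exchange rate variable in Table13-1.dta is named ex; adjust the name to your dataset):

use Table13-1.dta, clear
gen lex = log(ex)            // LEX = log of the dollar/euro exchange rate
gen time = _n
tsset time
tsline lex                   // Figure 4.1: the level series
reg lex time                 // fit a linear deterministic trend
predict e_lex, resid         // detrended series
tsline e_lex                 // Figure 4.2: residuals, still nonstationary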

Figure 4.1: Log of the dollar/euro daily exchange rate. Figure 4.2: Residuals from the regression of LEX on time.

7
Yt = a + bT + et => et = Yt − a − bT is called the detrended series [where T is a trend variable, et is the residual, and a and b are estimated coefficients]. If Yt is nonstationary while et is stationary, Yt is known as a trend stationary process (TSP). Here, a process with a deterministic trend is nonstationary but is not a unit root process [this term is defined shortly].

In contrast, a stochastic trend is random and varies over time. According to Stock and
Watson (2015: p.552), it is more appropriate to model economic time series as having
stochastic rather than deterministic trends. Therefore, our treatment of trends in
economic time series data focuses mainly on stochastic rather than deterministic trends,
and when we refer to “trends” in time series data, we mean stochastic trends unless we
explicitly say otherwise.
The simplest model of a variable with a stochastic trend is the random walk. There are
two types of random walks: (1) random walk without drift (i.e. no constant or intercept
term) and (2) random walk with drift (i.e. a constant term is present).
The random walk without drift is defined as follow. Suppose ut is a white noise error
term with mean 0 and variance 2. The Yt is said to be a random walk if:

Yt = Yt-1 + ut (23)

This equation is just a special case of equation (6), except that the autoregressive coefficient is now equal to 1. In statistical language, 'ρ = 1' is called a unit root. The basic idea of a random walk is that the value of the series tomorrow (Yt+1) is its value today (Yt), plus an unpredictable change (ut+1).
From equation (23), we can write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
Y4 = Y3 + u4 = Y0 + u1 + … + u4
…
Yt = Yt-1 + ut = Y0 + u1 + … + ut
In general, if the process started at some time 0 with a value Y0 [which is often assumed
as zero], we have
Yt = Y0 + Σut   (24)
therefore,
E(Yt) = E(Y0 + Σut) = Y0   (25)
In like fashion, it can be shown that
Var(Yt) = E(Y0 + Σut − Y0)² = E(Σut)² = tσ²   (26)
Therefore, the mean of Yt is equal to its initial or starting value, which is constant, but
as t increases, its variance increases indefinitely, thus violating a condition of

stationarity. In other words, the variance of Yt depends on t, its distribution depends on
t, that is, it is nonstationary.
Interestingly, we can re-write equation (23) as:
(Yt − Yt-1) = ΔYt = ut   (27)
where ∆Yt is the first difference of Yt. It is easy to show that, while Yt is nonstationary,
its first difference is stationary (why?). And this is very significant when we work with
time series data. In its terminology, this is widely known as the difference stationary
(stochastic) process (DSP).
Using Stata do-file as the following commands:
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e, n(500) means(0) sds(1)
tsset time
gen RW = 0
replace RW = L.RW + e if _n > 1
label variable RW "Random walk without drift"
tsline RW
tsline D.RW
We have the following graphs:

Figure 4.3: Random walk without drift. Figure 4.4: First difference of random walk without drift.

The random walk with drift can be defined as follows:
Yt = δ + Yt-1 + ut   (28)
where δ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as:
Yt − Yt-1 = ΔYt = δ + ut   (29)
it shows that Yt drifts upward or downward, depending on δ being positive or negative. We can easily show that the random walk with drift violates both conditions of stationarity [while its first difference is indeed stationary]. Equation (28) can be rewritten as:
Y1 = δ + Y0 + u1
Y2 = δ + Y1 + u2 = 2δ + Y0 + u1 + u2
Y3 = δ + Y2 + u3 = 3δ + Y0 + u1 + u2 + u3
Y4 = δ + Y3 + u4 = 4δ + Y0 + u1 + … + u4
…
Yt = δ + Yt-1 + ut = tδ + Y0 + u1 + … + ut
In general, if the process started at some time 0 with a value Y0 [which is often assumed as zero], we have
E(Yt) = Y0 + tδ   (30)
Var(Yt) = tσ²   (31)
In other words, both the mean and the variance of Yt depend on t; its distribution depends on t, that is, it is nonstationary.
Using Stata do-file as the following commands:
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e, n(500) means(0) sds(1)
tsset time
gen RW = 0
replace RW = 0.2 + L.RW + e if _n > 1
label variable RW "Random walk with a positive drift"

tsline RW
tsline D.RW

We have the following graphs:


Figure 4.5: Random walk with drift = 0.2. Figure 4.6: First difference of random walk with drift = 0.2.

Using Stata do-file as the following commands:


clear
set obs 500
gen time = _n
set seed 12345
drawnorm e, n(500) means(0) sds(1)
tsset time
gen RW = 0
replace RW = -0.2 + L.RW + e if _n > 1
label variable RW "Random walk with a negative drift"
tsline RW
tsline D.RW

We have the following graphs:

Figure 4.7: Random walk with drift = −0.2. Figure 4.8: First difference of random walk with drift = −0.2.

Stock and Watson (2015: p.553) say that because the variance of a random walk increases without bound, its population autocorrelations (e.g., ρ1) are not defined (the first autocovariance and the variance both grow without bound, so the ratio of the two is not well defined).
Corr(Yt, Yt−1) = Cov(Yt, Yt−1)/√[Var(Yt)Var(Yt−1)]   (32)

In a nutshell, a random walk is a nonstationary process, where its mean or its variance (or both) increases over time. However, it is a difference stationary process because its first difference is stationary.
Let us return to the LEX example. Figures 4.1 and 4.9 show that the logarithm of the dollar/euro daily exchange rate is characterized as a difference stationary process because its level is not stationary, whereas its first difference is stationary.
Figure 4.1: Log of the dollar/euro daily exchange rate. Figure 4.9: First difference of log(EX).

4.2 Unit Root Stochastic Process
According to Gujarati & Porter (2009: p.744), the random walk model is an example of
what is known in the literature as a unit root process.
Let us write the random walk model (23) as:
Yt = Yt-1 + ut (-1    1) (33)
This model resembles the Markov first-order autoregressive model [AR(1)], usually
mentioned in the basic econometrics course, serial correlation topic. If  = 1, equation
(33) becomes a random walk without drift. If  is in fact 1, we face what is known as
the unit root problem, that is, a situation of nonstationarity. The name unit root is due
to the fact that  = 1. Technically, if  = 1, we can write equation (33) as Yt – Yt-1 = ut.
Now using the lag operator L so that Lyt = Yt-1, L2Yt = Yt-2, and so on, we can write
equation (33) as (1-L)Yt = ut. If we set (1-L) = 0, we obtain, L = 1, hence the name unit
root. Thus, the terms nonstationarity, random walk, and unit root can be treated as
synonymous.
If, however, ||  1, that is if the absolute value of  is less than one, then it can be
shown that the time series Yt is stationary. In other words, equation (33) is really an
AR(1) process, which is previously proved as a stationary process with constant mean,
constant variance, and time-invariant covariance.

4.3 Illustrative Examples


Consider the AR(1) model as presented in equation (33). Generally, we can have three
possible cases:
Case 1: || < 1 and therefore the series Yt is stationary. Graphs of stationary series for
 = 0.67 and  = 0.98 are presented in Figures 4.8-9.
Case 2:  = 1 where in this case the series contains a unit root and is non-
stationary. Graph of stationary series for  = 1 are presented in Figure
4.10.
Case 3: || > 1 where in this case the series explodes. A graph of an explosive
series for  = 1.26 is presented in Figure 4.11.
In order to reproduce the graphs and the series which are stationary, exploding and
nonstationary, we type the following commands in Stata:
clear
set obs 500
gen timevar = _n
set seed 12345
tsset timevar
drawnorm e1 e2 e3 e4, n(500) means(0 0 0 0) sds(1 1 1 1)
gen AR_67 = 0
replace AR_67 = 0.67*L.AR_67 + e1 if _n > 1
label variable AR_67 "AR(1) with theta = 0.67"
gen AR_97 = 0
replace AR_97 = 0.97*L.AR_97 + e2 if _n > 1
label variable AR_97 "AR(1) with theta = 0.97"
gen RW = 0
replace RW = L.RW + e3 if _n > 1
label variable RW "AR(1) with theta = 1"
gen EP = 0
replace EP = 1.1*L.EP + e4 if _n > 1
label variable EP "AR(1) with theta = 1.1"
tsline AR_67
tsline AR_97
tsline RW
tsline EP
Figure 4.10: AR(1) with ρ = 0.67 (stationary). Figure 4.11: AR(1) with ρ = 0.97 (stationary).

Figure 4.12: AR(1) with ρ = 1 (random walk). Figure 4.13: AR(1) with ρ = 1.1 (explosive).

5. UNIT ROOTS AND SPURIOUS REGRESSIONS


5.1 Spurious Regressions
Most macroeconomic time series have trends and are therefore nonstationary. The problem with nonstationary or trended data is that the standard ordinary least squares (OLS) regression procedures can easily lead to incorrect conclusions. According to Asteriou & Hall (2011: p.338), it can be shown that in these cases the regression results have a very high R² (sometimes even higher than 0.95) and very high t-statistics (sometimes even higher than 4), while the variables used in the analysis have no real interrelationships.
Asteriou & Hall (2011: p.338) states that many economic series typically have an
underlying rate of growth, which may or may not be constant, for example GDP, prices
or money supply all tend to grow at a regular annual rate. Such series are not only
nonstationary because the mean is continually rising but not integrated after taking the
first difference. This gives rise to one of the main reasons for taking the logarithm of
data before doing formal econometric analysis. If we take the logarithm of a series,
which exhibits an average growth rate, we will turn it into a series that follows a linear
trend and therefore integrated. This can be easily seen formally. Suppose we have a
series Xt, which increases by 10% every period, thus:
Xt = 1.1Xt-1
If we then take the logarithm of this we get
log(Xt) = log(1.1) + log(Xt-1)

Now the lagged dependent variable [i.e., log(Xt-1)] has a unit coefficient and each period
it increases by an absolute amount equal to log(1.1), which is of course constant. This
series would now be I(1).
More formally, consider the model:
Yt = β1 + β2Xt + ut   (34)
where ut is the error term. The assumptions of the classical linear regression model (CLRM) require both Yt and Xt to be covariance stationary. In the presence of nonstationarity, the results obtained from a regression of this kind are totally spurious8, and these
regressions are called spurious regressions.
The intuition behind this is quite simple. Over time, we expect any nonstationary series
to wander around, so over any reasonably long sample the series either drift up or down.
If we then consider two completely unrelated series which are both nonstationary, we
would expect that either they will both go up or down together, or one will go up while
the other goes down (see Figure 5.1). If we performed a regression of one series on
another, we would then find either a significant positive relationship if they are going
in the same direction or a significant negative one if they are going in opposite directions
even though they are really unrelated. This is the essence of a spurious regression.
It is said that a spurious regression usually has a very high R2, t statistics that appear to
provide significant estimates, but the results may have no economic meaning. This is
because the OLS estimates may not be consistent, and therefore all the tests of statistical
inference are not valid.
Granger and Newbold (1974) constructed a Monte Carlo analysis generating a large
number of Yt and Xt series containing unit roots following the formulas:
Yt = Yt-1 + eYt (35)
Xt = Xt-1 + eXt (36)
where eYt and eXt are artificially generated normal random numbers (as the same way
performed in Section 4).
Since Yt and Xt are independent of each other, any regression between them should give
insignificant results. However, when they regressed various Yt series on Xt series, as shown in Table 5.1, they surprisingly found that they rejected the null hypothesis of β2 = 0 for approximately 75% of the cases. They also found that their regressions had very
high R2s and very low values of Durbin-Watson d statistics.

8
This was first introduced by Yule (1926), and re-examined by Granger and Newbold (1977) using the Monte
Carlo simulations.

To see the spurious regression problem, we can type the following commands in Stata and check whether we reject the null hypothesis of β2 = 0 (a small simulation that repeats the exercise many times is sketched after the commands):
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e1 e2, n(500) means(0 0) sds(1 1)
tsset time
gen Y = 0
gen X = 0
replace Y = L.Y + e1 if _n > 1
replace X = L.X + e2 if _n > 1
label variable Y "Y is a random walk"
label variable X "X is a random walk"
twoway scatter Y X || lfit Y X, ytitle("Y is a random walk") xtitle("X is a random
walk") legend(off)
reg Y X
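To actually count how many times the null hypothesis of β2 = 0 is rejected, a small Monte Carlo can be run in the spirit of Granger and Newbold (1974). This is only a sketch: the program name sim_spur, the 500 replications, and the 5% critical value 1.96 are illustrative choices.

capture program drop sim_spur
program define sim_spur, rclass
    clear
    set obs 500
    gen y = sum(rnormal())            // random walk: cumulative sum of shocks
    gen x = sum(rnormal())            // an independent random walk
    reg y x
    return scalar tstat = abs(_b[x]/_se[x])
end
simulate tstat = r(tstat), reps(500) seed(12345): sim_spur
count if tstat > 1.96                 // nominal 5% two-sided rejections
display "Share of spurious rejections: " r(N)/500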
An example of a plot of Y against X obtained in this way is shown in Figure 5.1. The
estimated equation between these simulated series is:

Table 5.1: Spurious regression


Figure 5.1: Scatter plot of a spurious regression

Granger and Newbold (1974) proposed the following “rule of thumb” for detecting spurious regressions: if R² > DW statistic, or if R² ≈ 1, then the estimated regression ‘must’ be spurious (Gujarati, 2011: p.226).
To understand the problem of spurious regression better, it might be useful to use an
example with real economic data. This example was conducted by Asteriou & Hall
(2011: p.340). Consider a regression of the logarithm of real GDP (Yt) to the logarithm
of real money supply (Mt) and a constant. The results obtained from such a regression
are the following:
Yt = 0.042 + 0.453Mt; R2 = 0.945; DW = 0.221 (37)
(4.743) (8.572)
Here we see very high t-ratios, with coefficients that have the correct signs and more or
less plausible magnitudes. The coefficient of determination is very high (R2 = 0.945),
but there is a high degree of serial correlation (DW = 0.221). This shows evidence of
the possible existence of spurious regression. In fact, this regression is totally
meaningless because the money supply data are for the UK economy, while the GDP
data are for the US economy. Therefore, although there should not be any significant
relationship, the regression seems to fit the data well, and this happens because the

variables used in the example are trended (i.e. nonstationary). So, Asteriou & Hall
(2011: p.340) recommends that econometricians should be very careful when working
with trended variables. You can see similar examples in Gujarati (2011, pp.224-226).

5.2 Explaining the Spurious Regression Problem


According to Asteriou & Hall (2011: p.340-1), in a slightly more formal way the source
of the spurious regression problem comes from the fact that if two variables, X and Y,
are both stationary, then in general any linear combination of them will certainly be
stationary. One important linear combination of them is of course the error term, and so
if both variables are stationary, the error term will also be stationary and have a well-
behaved distribution. However, when the variables are nonstationary, then we cannot
guarantee that the errors will be stationary and as a general rule (although not always)
the error itself is nonstationary. If this happens, we violate the CLRM assumptions of
OLS regression. If the errors are nonstationary, we could expect them to wander around
and eventually get larger. But OLS regression, because it selects the parameters so as to make the sum of the squared errors as small as possible, will select whatever parameter values give the smallest errors, and so almost any parameter value can result.
The simplest way to examine the behavior of ut is to rewrite (34) as:
ut = Yt – β1 – β2Xt (38)
or, excluding the constant β1 (which only affects ut sequence by rescaling it):
ut = Yt – β2Xt (39)
If Yt and Xt are generated by equations (35) and (36), then if we impose the initial
conditions Y0 = X0 = 0 we get that:
ut = (Y0 + eY1 + eY2 + … + eYt) − β2(X0 + eX1 + eX2 + … + eXt)

or

ut = (eY1 + eY2 + … + eYt) − β2(eX1 + eX2 + … + eXt)   (40)

From equation (40), we realize that the variance of the error term will tend to become
infinitely large as t increases. Hence, the assumptions of the CLRM are violated, and
therefore, any t test, F test or R2 are unreliable.
In terms of equation (34), there are four different cases to discuss (Asteriou & Hall,
2011: p.342):

Case 1: Both Yt and Xt are stationary9, and the CLRM is appropriate with OLS
estimates being BLUE (Best Linear Unbiased Estimators).
Case 2: Yt and Xt are integrated of different orders. In this case, the regression
equations are meaningless.
Case 3: Yt and Xt are integrated of the same order [often I(1)] and the ut sequence
contains a stochastic trend. In this case, we have spurious regression and it is
often recommended to re-estimate the regression equation using semi-differencing methods (such as the FGLS methods: the Cochrane-Orcutt procedure and the Prais-Winsten procedure) or Newey-West standard errors.
Case 4: Yt and Xt are integrated of the same order and the ut is stationary. In this special
case, Yt and Xt are said to be cointegrated. The concept of cointegration will
be examined in detail later.

6. TESTING FOR UNIT ROOTS


6.1 Graphical Analysis
According to Gujarati & Porter (2009: p.749), before implementing formal tests, it is
always advisable to plot the time series under study. Such a plot (line graph of the level
and the first difference) [and correlogram of both the level and the first difference (i.e.,
using ACF)] gives an initial clue about the likely nature of the time series. Such an
intuitive feel is the starting point of formal tests of stationarity (i.e. choose the
appropriate test equation). You can see various graphs in Section 4.
6.2 Autocorrelation Function and Correlogram
Autocorrelation is the correlation between a variable lagged one or more periods and
itself. The correlogram or autocorrelation function is a graph of the autocorrelations for
various lags of a time series data. According to Hanke (2005), the autocorrelation
coefficients10 of a series can be used to answer the following questions:
(1) Are the data random? (This is usually used for the diagnostic tests of
forecasting models).
(2) Do the data have a trend (nonstationary)?
(3) Are the data stationary?
(4) Are the data seasonal?

9
Based on the statistical tests such as ADF, PP, and KPSS.
10
This is not explained in this lecture. You can make references from either Gujarati & Porter (2009: pp.808-13),
Hanke (2005: 60-74), or Nguyen Trong Hoai et al (2009: Chapters 3, 4, and 8).

Besides, the correlogram is very useful when selecting the appropriate lags [i.e., p and
q] in the ARIMA models and ARCH family models (Hoai et al., 2009)11.
(1) If a series is random, the autocorrelations (i.e. ACF) between Yt and Yt - k for
any lag k are close to zero (i.e., individual autocorrelation coefficients are
statistically insignificant). The successive values of a time series are not related
to each other (Figure 6.1). In other words, Yt and Yt - k are completely
independent for all values of k (k = 1, …., p).
(2) If a series has a (stochastic) trend, successive observations are highly
correlated, and the autocorrelation coefficients are typically significantly
different from zero for the first several time lags and then gradually drop
toward zero as the number of lags increases [i.e., not weakly dependent]. The
autocorrelation coefficient for lag 1 is often very large (close to 1). The
autocorrelation coefficient for lag 2 will also be large, and so on. However, it
will not be as large as for lag 1 (Figure 6.2).
(3) If a series is stationary, the autocorrelation coefficients for, say lag 1, lag 2, or
lag 3, are significantly different from zero and then suddenly die out as the
number of lags increases (Figure 6.3). In other words, Yt and Yt-1, Yt and Yt-2,
Yt and Yt-3 are weakly correlated [i.e., weakly dependent]; but Yt and Yt-k [as
k increases] are completely independent.
(4) If a series has a seasonal pattern, a significant autocorrelation coefficient will
occur at the seasonal time lag or multiples of seasonal lag (Figure 6.4). This is
beyond the scope of this series of lectures.

Figure 6.1: Correlogram of a random series

11
As discussed in Section 3 for AR(p) and MA(q), p is selected by using the PAC graph and q by using the AC graph. ARIMA(p,d,q) is just a combination of the two processes after differencing d times. Since ARIMA models are beyond the scope of this series of lectures, we will not discuss them here.

Figure 6.2: Correlogram of a nonstationary series

Figure 6.3: Correlogram of a stationary series

Figure 6.4: Correlogram of a seasonal series

The correlogram becomes very useful for time series forecasting and other practical (business) applications. If you conduct academic studies, however, it is necessary to
provide more formal statistics such as t statistic12, Box-Pierce Q statistic, Ljung-Box
(LB) statistic, and especially unit root tests.
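In Stata, the corrgram command reports the AC and PAC values together with the Ljung-Box Q statistic and its p-value at each lag (a minimal sketch, applied here to the simulated AR1 series from Section 3.3):

corrgram AR1, lags(15)       // AC, PAC, Ljung-Box Q and Prob>Q for each lag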
6.3 Simple Dickey-Fuller Test for Unit Roots
Dickey and Fuller (1979, 1981) proposed a procedure to formally test for
nonstationarity (hereafter referred to as the DF test). The key insight of their tests is that testing for nonstationarity is equivalent to testing for the existence of a unit root. Thus the test is based on the AR(1) model:
Yt = ρYt-1 + ut   (41)
What we need to examine here is whether ρ = 1 (unity, and hence ‘unit root’). Obviously, the null hypothesis is H0: ρ = 1, and the alternative hypothesis is H1: ρ < 1.
We obtain a different (more convenient) version of the test by subtracting Yt-1 from both
sides of Eq.(41):
Yt – Yt-1 = Yt-1 – Yt-1 + ut
∆Yt = ( - 1)Yt-1 + ut
∆Yt = Yt-1 + ut (42)
where  = ( - 1). Then, now the null hypothesis is H0:  = 0, and the alternative
hypothesis is H1:  < 0. In this case, if  = 0, then Yt follows a pure random walk (and,
of course, in this case Yt is nonstationary).
Dickey and Fuller (1979) also proposed two alternative regression equations that can
be used for testing for the presence of a unit root. The first contains a constant in the
random walk with drift process, as follows:
ΔYt = α + δYt-1 + ut   (43)
According to Asteriou & Hall (2011: p.343), this is an extremely important case, because such a process exhibits a deterministic trend in the series when δ = 0 (why?), which is often the case for macroeconomic variables.
The second case is to also allow a time trend in the model13, so as to have:
ΔYt = α + γT + δYt-1 + ut   (44)

12
See Hoai et al. (2009) and my lecture on ARIMA models to understand the standard error in time series econometrics, s.e. = 1/√n.
13
Exactly, a deterministic trend exists in the first differenced series.

The Dickey-Fuller test for stationarity is simply the normal ‘t’ test on the coefficient
of the lagged dependent variable Yt-1 from one of the three models (42, 43, and 44).
This test does not, however, have a conventional ‘t’ distribution and so we must use
special critical values which were originally calculated by Dickey and Fuller. This is
also known as the Dickey-Fuller tau statistic (Gujarati & Porter, 2009: p.755). However,
most modern statistical packages such as Stata and Eviews routinely produce the critical
values for Dickey-Fuller tests at 1%, 5%, and 10% significant levels.
MacKinnon (1991,1996) tabulated appropriate critical values for each of the three
above models and these are presented in Table 6.1.

Table 6.1: Critical values for DF tests.

Model                               1%        5%        10%

ΔYt = δYt-1 + ut                   −2.56     −1.94     −1.62

ΔYt = α + δYt-1 + ut               −3.43     −2.86     −2.57

ΔYt = α + γT + δYt-1 + ut          −3.96     −3.41     −3.13

Standard critical values           −2.33     −1.65     −1.28

Source: Asteriou & Hall (2011: p.343)

In all cases, the tests concern whether δ = 0. The DF test statistic is the t statistic for the lagged dependent variable. If the DF statistic is smaller than the critical value (that is, more negative, and hence larger in absolute terms), then we reject the null hypothesis of a unit root and conclude that Yt is a stationary process. An easy way is to compare the ‘MacKinnon approximate’ p-value with the significance level, often 1%, 5%, or 10%. If the p-value is smaller than the chosen level of significance, we reject the null hypothesis of a unit root. Note that the MacKinnon approximate p-value and the test statistic are not always consistent with each other (see StataCorp, 2017b: dfgls).
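In Stata, the three DF test equations correspond to the following dfuller specifications (a hedged sketch; Y stands for whatever series you are testing):

dfuller Y, noconstant regress    // Eq.(42): no constant, no trend
dfuller Y, regress               // Eq.(43): constant only (the default)
dfuller Y, trend regress         // Eq.(44): constant and trend

The regress option simply displays the underlying test regression.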
6.4 Augmented Dickey-Fuller Test for Unit Roots
As the error term may not be white noise, Dickey and Fuller extended their test procedure by suggesting an augmented version of the test (hereafter referred to as the ADF test), which includes additional lagged terms of the dependent variable in order to control for serial correlation in the test equation. The lag length14 on these additional terms is either

14
See ‘Lag length selection using information criteria’ and ‘Determining lag lengths in VARs’ in Stock & Watson
(2015: p.547-551, p.641).

determined by Akaike Information Criterion (AIC) or Schwarz Bayesian/Information
Criterion (SBC, SIC), or more usefully by the lag length necessary to whiten the
residuals (i.e. after each case, we check whether the residuals of the ADF regression are
autocorrelated or not through LM tests and not the Durbin-Watson d test (why?)).
The three possible forms of the ADF test are given by the following equations:
ΔYt = δYt-1 + β1ΔYt-1 + … + βpΔYt-p + ut   (45)
ΔYt = α + δYt-1 + β1ΔYt-1 + … + βpΔYt-p + ut   (46)
ΔYt = α + γT + δYt-1 + β1ΔYt-1 + … + βpΔYt-p + ut   (47)

The difference between the three regressions concerns the presence of the deterministic elements α and γT. The critical values for the ADF test are the same as those given in
Table 6.1 for the DF test.
Similar to the simple cases, the ADF tests also concern whether δ = 0. The ADF test statistic is the t statistic for the lagged dependent variable. If the ADF statistic is smaller than the critical value (more negative, and hence larger in absolute terms), then we reject the null hypothesis of a unit root and conclude that Yt is a stationary process. Again, an easy way is to compare the MacKinnon approximate p-value with the significance level, often 1%, 5%, or 10%. If the MacKinnon approximate p-value is smaller than the chosen level of significance (say 5%), we reject the null hypothesis that Yt represents a random walk or has a unit root.
According to Asteriou & Hall (2011: p.344), unless the econometrician knows the
actual data-generating process, there is a question concerning whether it is most
appropriate to estimate models (45), (46), or (47). It is suggested that the test procedure should start by estimating the most general model, given by equation (47), and then answering a set of questions regarding the appropriateness of each model and moving to the next model. This procedure is illustrated in Figure 6.5. It needs to be stressed here
that, although useful, this procedure is not designed to be applied in a mechanical
fashion. Plotting the data and observing the graph is sometimes very useful because it
can clearly indicate the presence or not of deterministic regressors (StataCorp, 2017b:
dfuller). [Note: we mean tsline of the first differenced series]. However, this procedure
is the most sensible way to test for unit roots when the form of the actual data-generating
process is typically unknown. In addition, the ADF test results are sensitive to the lag
lengths selected (StataCorp, 2017b: dfgls). Therefore, in practical applications, it is
necessary to use other tests for comparison purposes.
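The ADF versions of the test are obtained by adding the lags() option to dfuller (a minimal sketch; the choice of 4 lags is arbitrary and should be guided by the information criteria or residual checks discussed above):

dfuller Y, lags(4) regress           // Eq.(46) with 4 augmentation lags
dfuller Y, lags(4) trend regress     // Eq.(47) with 4 augmentation lags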

Figure 6.5: Procedure for testing for unit roots using the ADF methodology. Starting from the most general model (with constant and trend), test whether δ = 0; if the unit root hypothesis is not rejected, test whether the trend (and then the constant) belongs in the model before re-testing δ = 0 in the more restricted equations.

Source: Asteriou & Hall (2011: p.345)

6.5 Other Unit Root Tests
In practical studies, researchers mostly use both the ADF and the Phillips-Perron (PP)
tests15. Because the distribution theory supporting the Dickey-Fuller tests is based on the assumption of random error terms [iid(0, σ²)], when using the ADF methodology we have to make sure that the error terms are uncorrelated and really have a constant variance. Phillips and Perron (1988) developed a generalization of the ADF test procedure that allows for fairly mild assumptions concerning the distribution of the errors (Asteriou & Hall, 2011: p.344-5). The regression for the PP test is similar to DF equation (43):
ΔYt = α + δYt-1 + et   (48)
While the ADF corrects for higher order serial correlation by adding lagged differenced
terms of dependent variable on the right-hand side of the test equation, the PP test uses
Newey-West (1987) standard errors16 to account for serial correlation (Asteriou & Hall,
2011: p.345-6; StataCorp, 2017b: pperron).
So, the PP statistics are just modifications of the ADF t statistics that take into account
the less restrictive nature of the error process. The expressions are extremely complex
to derive and are beyond the scope of my notes. Since most statistical packages have
routines available to calculate these statistics, it is good for researchers to test the order
of integration of a series by performing the PP test as well. The asymptotic distribution of
the PP t statistic is the same as that of the ADF t statistic, and therefore the MacKinnon
(1991, 1996) critical values are still applicable. That means the PP tests also concern
whether γ = 0. The PP test statistic is the t statistic for the lagged dependent variable. If
the PP statistic is larger [in absolute terms] than the critical value, then we
reject the null hypothesis of a unit root and conclude that Yt is a stationary process.
Again, an easy way is to compare the MacKinnon approximate p-value and the
significance level (α), often at 1%, 5%, and 10%. If the MacKinnon approximate p-
value is smaller than a chosen level of significance, we reject the null hypothesis that
Yt represents a random walk or has a unit root.
As with the ADF tests, the PP tests can be performed with the inclusion of a constant
and linear trend, or none of them in the test regression.
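A minimal sketch of the PP test in Stata, using the same hypothetical variable y as above, is:
pperron y, trend lags(4)
pperron y, lags(4)
Note that for pperron the lags() option sets the number of Newey-West lags used in estimating the long-run variance, not the number of lagged differences as in dfuller.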
Dickey-Fuller tests may have low power (H0 of a unit root is not rejected, whereas in reality
there may be no unit root) when ρ is close to one. This could be the case of trend stationarity.

15 Recently, the dfgls test has become a priority in practical applications.
16 See Wooldridge (2013: p.431-4).

An alternative test, in which stationarity is the null hypothesis (H0), is the KPSS17 test (Kwiatkowski-Phillips-Schmidt-Shin, 1992). Its test procedure is briefly summarized as:
(1) Regress Yt on intercept and time trend and obtain OLS residuals et.
(2) Calculate the partial sums St = Σs=1..t es for all t.
(3) Calculate the test statistic KPSS = [T^(−2) Σt=1..T St^2] / σ̂^2, and compare it with the critical value.
The critical values are routinely produced by statistical packages such as Stata and
Eviews. The null hypothesis [of stationarity] is rejected if the KPSS test statistic is larger
than the selected critical value, often at 5% level of significance.
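In Stata, the KPSS test is available through the user-written kpss command (Baum), which can be installed from SSC; the variable name y and the maximum lag below are illustrative:
ssc install kpss
kpss y, maxlag(8)
kpss y, notrend maxlag(8)
By default the null hypothesis is trend stationarity; the notrend option tests the null of level stationarity instead.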
Another statistical test for a unit root, namely the augmented Dickey-Fuller test using GLS –
generalized least squares (dfgls)18 – has recently been developed. Among statistical tests for a
unit root, dfgls is the most powerful and informative (Hamilton, 2012: p.376; StataCorp,
2017b: dfgls). It performs the modified Dickey-Fuller t test proposed by Elliott,
Rothenberg, and Stock (1996). Basically, dfgls is an augmented Dickey-Fuller test,
except that the series is transformed via a generalized least squares regression before
performing the test (see StataCorp, 2017b: dfgls).
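A hedged example of the dfgls command, again with an illustrative variable y and maximum lag, is:
dfgls y, maxlag(10)
dfgls y, maxlag(10) notrend
By default dfgls includes a linear trend in the GLS-detrended test regression; the notrend option detrends using a constant only.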
Special attention is needed because the above unit root tests assume that no structural
breaks exist in the series of interest. If structural breaks are present, we must use alternative tests
such as Zivot and Andrews (ZA, 1992) or Lumsdaine and Papell (LP, 1997)19.

6.6 Performing Unit Root Tests in Stata


In this section, we use the example about the log(EX) series [Table13-1.dta, Gujarati
(2011)] to illustrate the unit root tests using Stata. In all cases, the null hypothesis is
that the log(EX) series represents a random walk, or has a unit root [except in the case of the KPSS
test, as mentioned above].
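The commands below sketch how the tests reported in Tables 6.2–6.10 could be reproduced. The dataset path, the raw variable name ex, and the exact deterministic terms used in each table are assumptions, since only the log series is referred to in the text:
use Table13-1.dta, clear
gen t = _n
tsset t
gen lex = ln(ex)
gen dlex = D.lex
dfuller lex
dfuller lex, trend
dfgls lex, maxlag(10)
dfuller lex, trend lags(5)
pperron lex, trend
kpss lex, maxlag(10)
dfgls dlex, maxlag(10)
dfuller dlex, trend lags(5)
pperron dlex, trend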

17 See Greene (2008: p.755).
18 See 'The DF-GLS test for a unit root' in Stock & Watson (2015: p.651-4); StataCorp (2017b: dfgls): here you can see that the ADF results are not as strong as those produced by dfgls through an example about the log of investment in Germany.
19 See Narayan (2005).

Table 6.2: DF test of log(EX), Eq.(43).

The DF t statistic is 0.172, which is positive. This incorrect sign may arise because the test
equation is incorrectly specified. A positive γ would imply ρ > 1, which means
log(EX) is explosive. This is not usual in macroeconomic data (Greene, 2008: p.740).
Therefore, we rule out this possibility.

Table 6.3: DF test of log(EX), Eq.(44).

The absolute value of the DF t statistic in this case is 3.026, less than the 10% critical
value of 3.128, so we should not reject the null hypothesis that log(EX) represents a
random walk, or has a unit root. In other words, log(EX) series is not stationary at 10%
level of significance. The MacKinnon approximate p-value of this test statistic is
approximately 12.48 percent as you can see at the bottom of the test results.

Table 6.4: DF-GLS test of log(EX).

The dfgls above reports tests of the nonstationary null hypothesis [i.e., the log(EX)
series represents a random walk, or has a unit root] for lags from 1 to 10 days. At the
bottom, the output offers three different methods for choosing an appropriate number
of lags: Ng-Perron sequential t, minimum Schwarz information criteria, and Ng-Perron
modified Akaike information criteria (MAIC). The MAIC is more recently developed,
and Monte Carlo experiments support its advantages over the Schwarz method. The
absolute value of DF-GLS statistic for 5 lags is 0.511, less than the 10% critical value
of 2.556, so we should not reject the null hypothesis. Note that the Ng-Perron sequential t
indicates a maximum lag of 26. However, to save space, we restrict the maximum lag
length to 10.
Using a maximum lag of 5 from the DF-GLS test results, we find that the absolute value
of the ADF t statistic from equation (47) is 2.809, less than the 10% critical value of
3.120, so we should not reject the null hypothesis that log(EX) represents a random
walk, or has a unit root. As a result, we conclude that the log(EX) series is not stationary
at the 10% level of significance.

Table 6.5: ADF test of log(EX), Eq.(47).

Table 6.6: PP test of log(EX), Eq.(48).

Table 6.7: KPSS test of log(EX) for trend stationarity hypothesis.

The absolute value of the PP t statistic from equation (48) is 3.027, less than the 10%
critical value of 3.120, so we should not reject the null hypothesis that log(EX)

represents a random walk, or has a unit root. Therefore, both ADF and PP tests confirm
that log(EX) series is not stationary at 10% level of significance.
To make sure that the log(EX) series is not trend stationary, we use the KPSS test. The
results in Table 6.7 reject the null hypothesis that log(EX) is trend stationary, because
the test statistics at all lags are larger than the critical value at the 10% level of significance.
Table 6.8: dfgls test of the first difference of log(EX).

A similar dfgls test of the first difference of log(EX) in Table 6.8, on the other hand, rejects
the nonstationary null hypothesis [i.e., that the first difference of log(EX) has a unit root] at all lags [note:
the maximum lag based on MAIC is up to 26], even at the 1% level. Therefore, we can
confirm that the log(EX) series is a difference stationary process.
Table 6.9: ADF test of the first difference of log(EX), Eq.(47).

The absolute value of the ADF t statistic is about 17.68 [Table 6.9] and PP t statistic is
about 48.39 [Table 6.10], greater than the 1% critical value of 3.43, so we should reject
the null hypothesis that the first difference of log(EX) series represents a random walk,
or has a unit root. Eventually, we could conclude that log(EX) series follows a
difference stationary process.

Table 6.10: PP test of the first difference of log(EX), Eq.(48).

7. SHORT-RUN AND LONG-RUN RELATIONSHIPS


7.1 Understanding Concepts
In case of bivariate model, you have once known the static or short-run causal
relationship between two time-series Yt and Xt, where Yt is dependent variable and Xt
is the independent variable. The OLS regression often suffers from serial correlation,
and we apply various remedies such as quasi-differencing methods (Cochrane-Orcutt,
Prais-Winsten; generally, GLS methods), the first-difference method, and the Newey-West
standard error method. In any case, the purpose of such a study is just to know the
short-run slope or elasticity of Yt with respect to Xt [∂Yt/∂Xt]. However, the nature of
the structural modeling is to discover the dynamic causal relationship between Yt and
Xt. In such models, you must at least distinguish between the short-run and long-run
relationships [either slope or elasticity]. To simplify the analysis, we consider the simple
autoregressive distributed lag model [ARDL(1,1)] in the following form:

Yt = A0 + A1Yt-1 + B0Xt + B1Xt-1 + ut (49)

[Important note: We implicitly assume that ut is a white noise; i.e., the simple
ARDL(1,1) is a well-specified model].
We can analyze both short-run and long-run effects (either slopes or elasticities) defined
as follows:

(1) Short-run or static effect:
∂Yt/∂Xt = B0        (50)

(2) Long-run or dynamic or equilibrium effect:

∂YT/∂Xt = (B0 + B1)/(1 − A1)        (51)

Proof:
∂Yt/∂Xt = B0
∂Yt+1/∂Xt = A1(∂Yt/∂Xt) + B1 = A1B0 + B1
∂Yt+2/∂Xt = A1(∂Yt+1/∂Xt) = A1(A1B0 + B1)
∂Yt+3/∂Xt = A1(∂Yt+2/∂Xt) = A1^2(A1B0 + B1)
…
∂Yt+τ/∂Xt = A1(∂Yt+τ−1/∂Xt) = A1^(τ−1)(A1B0 + B1)

If |A1| < 1, the cumulative effect or long-run slope (Slr) will be the sum of all derivatives:

Slr = B0 + [A1B0 + B1] + A1[A1B0 + B1] + A1^2[A1B0 + B1] + … + A1^(τ−1)[A1B0 + B1] + …        (52)

Multiplying both sides of (52) by A1, we have:

A1Slr = A1B0 + A1[A1B0 + B1] + A1^2[A1B0 + B1] + … + A1^τ[A1B0 + B1] + …        (53)

Subtracting (53) from (52), we obtain:

Slr − A1Slr = B0 + B1
Slr = (B0 + B1)/(1 − A1), which is equation (51)

We can also take expectations to derive the long-run relation between Yt and Xt [see
Asteriou & Hall, 2011: p.360]:
E(Yt) = A0 + A1E(Yt-1) + B0E(Xt) + B1E(Xt-1)
E(Yt) = A0 + A1E(Yt) + B0E(Xt) + B1E(Xt)
E(Yt) - A1E(Yt) = A0 + (B0 + B1)E(Xt)
(1 − A1)E(Yt) = A0 + (B0 + B1)E(Xt)
=> E(Yt) = A0/(1 − A1) + [(B0 + B1)/(1 − A1)]E(Xt)
         = α + βE(Xt)
or simply to write:
Y* = α + βX* (54)
Here, β = (B0 + B1)/(1 - A1) is the long-run effect of a lasting shock in Xt. And the short-
run effect of a change in Xt is B0.
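To make this concrete, the short-run and long-run effects of an ARDL(1,1) can be recovered directly after OLS estimation. The sketch below uses hypothetical variable names Y and X in a dataset that has already been tsset; nlcom applies the delta method to the nonlinear expression in Eq.(51):
regress Y L.Y X L.X
nlcom (_b[X] + _b[L.X]) / (1 - _b[L.Y])
The coefficient on X is the short-run effect B0, while the nlcom expression reports the long-run effect (B0 + B1)/(1 − A1) together with a standard error.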
By the same token, we can expand to a more complicated ARDL(p,q) model [important
note: we implicitly assume that ut is a white noise]:
(1) Short-run or static effect:

∂Yt/∂Xt = B0        (55)

(2) Long-run or dynamic or equilibrium effect:

∂YT/∂Xt = (B0 + B1 + B2 + … + Bq)/(1 − A1 − A2 − … − Ap)        (56)

7.2 ARDL and ECM: Two Sides of the Same Coin


By subtracting Yt−1 from both sides of equation (49) and re-arranging, we have:
Yt − Yt−1 = A0 + A1Yt−1 − Yt−1 + B0Xt − B0Xt−1 + B0Xt−1 + B1Xt−1 + ut
∆Yt = A0 − (1 − A1)Yt−1 + B0∆Xt + (B0 + B1)Xt−1 + ut
     = B0∆Xt − (1 − A1)[Yt−1 − A0/(1 − A1) − ((B0 + B1)/(1 − A1))Xt−1] + ut
     = B0∆Xt − (1 − A1)[Yt−1 − α − βXt−1] + ut
     = B0∆Xt − π[Yt−1 − α − βXt−1] + ut        (57a)
     = B0∆Xt − πECTt−1 + ut        (57b)


This is applicable to all ARDL(p,q) models. The part in brackets in equation (57a) is the
error-correction term (i.e., the equilibrium error). Equations (57a or 57b) are widely
known as the Error Correction Mechanism (ECM). Therefore, ECM and ARDL are
basically the same if the series Yt and Xt are integrated of the same order [often I(1)]
and cointegrated. The term ‘cointegration’ is defined shortly.

In this model, Yt and Xt are assumed to be in long-run equilibrium, i.e. short-run changes in Yt
relate to changes in Xt according to B0. If Yt−1 deviates from its optimal value (i.e. its
equilibrium), there will be a correction. The speed of adjustment is given by π = (1 − A1),
which lies between 0 and 1. We will discuss the π coefficient in detail when discussing
the ECM model in the next section. Note that the size of π depends on which
mechanism [i.e., AR(p) or DL(q)] the ARDL model mainly follows. If the coefficients A1, A2,
…, Ap are large [i.e., the ARDL model mainly follows the AR(p) process], then π will
be small. That means the speed of adjustment toward equilibrium is slow. Besides, this
coefficient also depends on the number of explanatory variables [Xt] included in the
model (Gujarati, 2011: p.243).

8. COINTEGRATION AND ERROR CORRECTION MODELS


8.1 Cointegration20
According to Asteriou & Hall (2011: p.357), the concept of cointegration was first
introduced by Granger (1981) and elaborated further by Engle and Granger (1987),
Engle and Yoo (1987), Phillips and Ouliaris (1990), Stock and Watson (1988), Phillips
(1987), and Johansen (1991 &1995).
It is known that trended time series can potentially create major problems in empirical
econometrics due to spurious regressions. One way of resolving this is to difference the
series successively until stationarity is achieved and then use the stationary series for
regression analysis. According to Asteriou & Hall (2011: p.356), this solution, however,
is not ideal because it not only differences the error process in the regression, but also
no longer gives a unique long-run solution.
If two variables are nonstationary, then we can represent the error as a combination of
two cumulated error processes. These cumulated error processes are often called
stochastic trends and normally we could expect that two nonstationary processes would
combine to produce another non-stationary process. However, in a special case that two
variables, say Xt and Yt, are really related, then we would expect them to move together
and thus the two stochastic trends would be very similar to each other. As a result, their
combination could possibly eliminate the nonstationarity. In this special case, we say
that the variables are cointegrated. Cointegration should only happen when there is truly
a relationship linking the two variables, so it becomes a very powerful way of detecting
the presence of economic structures. If the variables do not cointegrate, we usually face
the problems of spurious regression and our econometric analysis becomes almost

meaningless. On the other hand, if the stochastic trends do cancel each other out, then we
have cointegration (i.e., a common trend), which gives us various practical
implications for policy design (Asteriou & Hall, 2011: p.356).

20 Standard regression techniques, such as OLS, require that the variables be covariance stationary ... Cointegration analysis provides a framework for estimation, inference, and interpretation when the variables are not covariance stationary (StataCorp, 2017b: vec intro).
Suppose that there really is a genuine long-run relationship between Yt and Xt;
although they will rise over time (because they are trended), there will be a common
trend that links them together. For an equilibrium, or long-run, relationship to exist, what
we require, then, is a linear combination of Yt and Xt that is a stationary variable [an
I(0) variable]. A linear combination of Yt and Xt can be directly taken from estimating
the following regression (Asteriou & Hall, 2011: p.356-7):
Yt = β1 + β2Xt + ut        (58)
And then obtain the residuals:

ût = Yt − β̂1 − β̂2Xt        (59)21

If ût ~ I(0), we say that two variables Yt and Xt are cointegrated. Therefore, two
variables are said to be cointegrated if each is an I(1) process but a linear combination
of them is an I(0) process. It is important to note that if Yt and Xt cointegrate, the simple
regression of ∆Yt on ∆Xt is mis-specified because it omits the error-correction term (StataCorp, 2017b: vec intro).
8.2 An Example of Cointegration
Table14-1.dta [Gujarati, 2011: Chapter 14] gives quarterly data on personal
consumption expenditure (PCE) and personal disposable (i.e. after-tax) income (PDI)
for the USA for the period 1970-2008 (Gujarati, 2011: p.226). Both graph (Figure 8.1)
and ADF tests (Tables 8.1 and 8.2) indicate that these two series are not stationary. They
are I(1), that is, they have stochastic trends. In addition, the regression of log(PCE) on
log(PDI) seems to be spurious (Table 8.3) [because R2 > DW d statistic].
Since both series are trending, let us see what happens if we add a trend variable to the
model. The elasticity coefficient is now changed, but the regression is still spurious
(Table 8.4). However, after estimating the regression of log(PCE) on log(PDI) and
trend, we realize that the obtained residuals is a stationary series [i.e., I(0)] at 5% level
of significance (Table 8.5). This implies that a linear combination (et = log(PCE) – b1 –
b2log(PDI) – b3T) cancels out the stochastic trends in the two variables. Therefore, this
regression is, in fact, not spurious (Gujarati, 2011: pp.229-30). In other words, the
variables log(PCE) and log(PDI) are cointegrated.

21
Greene (2008: p.756) calls this as ‘partial difference between the two variables’. If this difference is stable
around a fixed mean, it implies the series are drifting together at roughly the same rate.

Figure 8.1: Logs of PDI and PCE, USA 1970-2008.

Table 8.1: Unit root tests for log(PCE).

Table 8.2: Unit root tests for log(PDI).

Table 8.3: OLS regression of log(PCE) on log(PDI)

Table 8.4: Regression of log(PCE) on log(PDI) and trend.

Table 8.5: ADF test for residual series from Table 8.4.

In terms of economic interpretation, two variables will be cointegrated if they have a


long-run, or equilibrium, relationship between them. In the present context, economic
theory tells us that there is a strong relationship between consumption expenditure and
personal disposable income (Gujarati, 2011: p.230).

In the language of cointegration theory, the equation log(PCE) = B1 + B2log(PDI) + B3T
is known as a cointegrating regression and the slope parameters B2 and B3 are known
as cointegrating parameters.

8.3 Tests of Cointegration


There are four popular tests of cointegration for time series data depending on the nature
of integration and number of variables in the model. These tests can be briefly
summarized as follows:
a) Xt: I(1) and Yt: I(1): we can use EG and AEG tests. We will discuss this case in
detail in this section.
b) Xt: I(1), Yt: I(1) and Zt: I(1) [and more variables: I(1)]: we can use Johansen’s
rank testing approach. We will discuss this case in Section 10.
c) Xt: I(0) and Yt: I(1) [and more variables: I(0) or I(1)]: we can use Pesaran's
bounds testing approach. We will discuss this case in Section 12.
d) Xt: I(1) and Yt: I(d) [and more variables: I(1) or I(d)], where d > 1: we can use
Toda-Yamamoto (1995) testing approach. This rarely happens in economic time
series data. Therefore, it is beyond the scope of my notes.
For single equation of two variables, the simple tests of cointegration are DF and ADF
unit root tests on the residuals estimated from the cointegrating regression [i.e., residual-
based cointegration tests]. These unit root tests for the residuals are widely known as
the Engle-Granger (EG) and augmented Engle-Granger (AEG) tests. Notice the
difference between the unit root and cointegration tests. Tests for unit roots are
performed on single time series, whereas residual-based cointegration tests deal with
the relationship among a group of variables [usually two variables], each having a unit
root, I(1), (Gujarati, 2011: p.230-1).
This Engle-Granger 2-step approach involves the following steps:
Table 8.6: Engle-Granger 2-step approach: Step-by-step

Step 1 Test the variables for their order of integration.


The first step is to test each variable to determine its order of
integration. All tests for a unit root discussed in Section 6 can be
applied in order to infer the number of unit roots in each of the
variables. We might face three cases:
a) If both variables are stationary [I(0)], the standard OLS can be
applied to investigate the relationship between these variables. We
often mention this case as ‘stationary’ time series regression

models. Here, what we are mainly concerned with is the problem of serial
correlation [see Adkins & Hill, 2011: Chapter 9].
b) If the variables are integrated of different orders, we could apply
other methods such as bounds tests and/or Toda-Yamamoto (1995)
tests for cointegration.
c) If both variables are integrated of order 1: I(1), we proceed with
step two.

Step 2 Estimate the long-run (i.e., cointegrating) relationship.


If the results of Step 1 indicate that both Xt and Yt are integrated of the
same order [usually I(1) in economics], the next step is to estimate the
long-run equilibrium relationship of the form:
Yt = β̂1 + β̂2Xt + ût

and obtain the residuals ût of this equation.

Step 3 Check the order of integration of the residuals (i.e., test for cointegration).


We might face two cases:
1) If ût is not stationary, the regression in Step 2 might be
spurious. If this is a case, we can only investigate either the
short-run relationship or the causality between them by
transforming into first differenced series. Causality tests are
discussed in Section 11.
2) If ût is stationary, the regression in Step 2 is called
cointegrating regression, which shows the long-run relationship
between the two variables. If this is a case, we can investigate
not only the short-run relationship or the causality between
them but also the long-run relationship by using the error
correction model.

Step 4 Estimate the error correction model.


If the variables are cointegrated, the residuals from the equilibrium
regression can be used to estimate the error correction model, which
allows us to analyze both the long-run and short-run effects between
the variables as well as to see how fast the disequilibrium is adjusted
toward its long-run equilibrium state. This is discussed shortly.

Source: Modified from Asteriou & Hall (2011: p.364-5)
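A compact Stata sketch of Steps 1–4, assuming two I(1) series named Y and X that have already been tsset (variable names and lag choices are purely illustrative), could be:
dfuller Y, trend lags(4)
dfuller X, trend lags(4)
regress Y X
predict uhat, resid
dfuller uhat, noconstant lags(4)
regress D.Y D.X L.uhat
The first two commands correspond to Step 1, the regression and predict to Step 2, the unit root test on uhat to Step 3, and the final regression to Step 4. Keep in mind that the critical values in Step 3 should be those for residual-based (Engle-Granger) cointegration tests rather than the ordinary Dickey-Fuller values, because uhat is an estimated series.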

According to Asteriou & Hall (2011: p.366) and Gujarati (2011: p.235-6), one of the
best features of the Engle-Granger 2-step approach is that it is both very easy to
understand and to implement. However, it also has various shortcomings:

(1) One very important issue has to do with the order of the variables. When
estimating the long-run relationship, one has to place one variable in the left-
hand side and use the others as regressors. The test does not say anything about
which of the variables can be used as regressors and why. Consider, for example,
the case of just two variables, Xt and Yt. One can either regress Yt on Xt (i.e. Yt
= A + BXt + u1t) or choose to reverse the order and regress Xt on Yt (i.e. Xt = C
+ DYt + u2t). It can be shown, which asymptotic theory, that as the sample goes
to infinity the test for cointegration on the residuals of those two regressions is
equivalent (i.e. there is no difference in testing for unit roots in u1t and u2t).
However, in practice, especially in economics we rarely have very big samples
[i.e., realizations] and it is therefore possible to find that one regression exhibits
cointegration while another doesn’t. This is obviously a very undesirable feature
of the EG approach. The problem obviously becomes far more complicated when
we have more than two variables under investigation.

(2) A second problem is that when there are more than two variables there may be
more than one cointegrating relationship, and the EG 2-step approach using residuals
from a single relationship cannot treat this possibility. In other words, the EG
approach does not allow for estimation of more than one cointegrating
regression. Suppose we have k variables, there can be at most (k - 1)
cointegrating relationships. If this is a case, we have to use cointegration tests
developed by Johansen.

(3) Along with the second problem, a third problem in dealing with multiple time
series is that we not only have to consider finding more than one cointegrating
relationship, but then we will also deal with the error correction term for each
cointegrating relationship. As a result, the simple, or bivariate error correction
model will obviously not work. This problem can be solved by using the vector
error correction model (VECM).

(4) The final problem is that it relies on a two-step estimator. The first step is to
generate the residual series and the second step is to estimate a regression for this
series in order to see if the series is stationary or not. Hence, any error introduced
in the first step is of course carried into the second step.

8.4 Interpreting the Error Correction Model
According to Asteriou & Hall (2011: p.360), the concepts of cointegration and the error
correction mechanism are very closely related. To understand the ECM, it is better to
think first of the ECM as a convenient reparameterization of the general linear
autoregressive distributed lag (ARDL) model [as shown in Section 7.2].
Consider the very simple dynamic ARDL(1,1) model describing the behavior of Yt in
terms of Xt as equation (49):
Yt = A0 + A1Yt-1 + B0Xt + B1Xt-1 + ut (49)
where ut ~ iid(0, σ2).
[That mean we implicitly assume that ut is a white noise, i.e., the ARDL(1,1) is a
correctly specified model].
In this model22, the parameter B0 denotes the short-run reaction of Yt after a change in
Xt [Eq.(50)]. The long-run effect is given when the model is in equilibrium where:
Y* = α + βX* (54)
Recall that the long-run effect (either slope or elasticity) between Yt and Xt is captured
by β = (B0 + B1)/(1 - A1) [Eq.(51)]. It is noted that, we need to make the assumption that
|A1| < 1 (why?) in order that the short-run model converges to a long-run solution.
The ECM is shown in equations (57a or 57b):

∆Yt = B0∆Xt − π[Yt−1 − α − βXt−1] + ut        (57a)

or

∆Yt = B0∆Xt − πECTt−1 + ut        (57b)
According to Asteriou & Hall (2011: p.361), what is of importance here is that when
the two variables Yt and Xt are cointegrated, the ECM incorporates not only short-run
but also long-run effects. This is because the long-run equilibrium [Yt - 1 – α – βXt - 1] is
included in the model together with the short-run dynamics captured by the differenced
term. Another important advantage is that all the terms in the ECM model are stationary
and the standard OLS estimation is therefore valid. This is because if Yt and Xt are I(1),
then ∆Yt and ∆Xt are I(0), and by definition if Yt and Xt are cointegrated then their
linear combination [ut-1 = Yt-1 – α – βXt - 1] ~ I(0).
A final important point is that the coefficient π = (1 − A1) provides us with information about the speed of adjustment in cases of disequilibrium. Note again that the value of the coefficient π depends on A1 and the number of X variable(s) included in the ARDL model [the bias problem due to omitted X variable(s)]. To understand this better, consider the long-run condition. When equilibrium holds, [Yt−1 − α − βXt−1] = 0. However, during periods of disequilibrium this term is no longer zero and measures the distance the system is away from its equilibrium state. For example, suppose that, due to a series of negative shocks in the economy in period t − 1, [Yt−1 − α − βXt−1] becomes negative because Yt−1 has moved below its long-run equilibrium path. However, since π = (1 − A1) is positive (why?), the product −π·ut−1 > 0, so the overall effect (assuming that the short-run effect of ∆Xt on ∆Yt is unchanged) is to push ∆Yt back towards its long-run path [i.e., Yt = Yt−1 + ∆Yt]. Again, notice that the speed of this adjustment to equilibrium depends on the magnitude of π = (1 − A1).

22 We can easily expand this model to a more general case for large numbers of lagged terms [ARDL(p,q)] as shown in Eq.(55) and Eq.(56).
The coefficient π in equations (57a or 57b) is the error-correction coefficient and is also
called the adjustment coefficient. In fact, π tells us how much of the adjustment to
equilibrium takes place each period [say month, quarter, or year; depending on the
original data], or how much of the equilibrium error is corrected each period. According
to Asteriou & Hall (2011: p.363), it can be explained in the following ways:
(1) If π ~ 1, then nearly 100% of the adjustment takes place within the period23, or
the adjustment is very fast [i.e., Xt and their lags are key determinants of Yt].
(2) If π ~ 0.5, then about 50% of the adjustment takes place each period.
(3) If π ~ 0, then there is no adjustment [i.e., Xt and their lags do not at all determine
Yt; i.e., Yt purely follows AR() mechanism].
According to Asteriou & Hall (2011: p.359-60), the ECM is important and popular for
many reasons, such as:
(1) It is a convenient model measuring the correction from disequilibrium of the
previous period which has a very good economic implication.
(2) If we have cointegration, ECM models are formulated in terms of first difference,
which typically eliminate trends from the variables involved; they resolve the
problem of spurious regressions.
(3) A very important advantage of ECM models is the ease with which they can fit into the
general-to-specific (or Hendry) approach to econometric modeling, which is a
search for the best ECM model that fits the given data sets.
(4) The most important feature of ECM comes from the fact that the disequilibrium
error term is a stationary variable. Because of this, the ECM has important
implications: the fact that the two variables are cointegrated implies that there is

23
Again, this depends on the kind of data used, say, annually, quarterly, or monthly.

some automatic adjustment process which prevents the errors in the long-run
relationship becoming larger and larger.

8.5 Error Correction Model: Some Numerical Examples


8.5.1 Personal Consumption Expenditure and Personal Disposable Income
In Section 8.2 we saw that the log of personal consumption expenditure (LPCE) and the log of
personal disposable income (LPDI) are cointegrated, and the long-run elasticity of
consumption with respect to income is about 0.77 [Table 8.4]. The relationship between
LPCE and LPDI is presented as follows:
Figure 8.2: Long-run relationship and short-run deviations between LPDI and LPCE.

The long-run relationship between LPCE and LPDI is given by the following equation:

LPCEt = 1.67 + 0.77LPDIt + 0.0024Time + ût (60)

The lagged residuals obtained from Eq.(60) are then included in the error correction
model as a regressor for the change in LPCE at the current time (∆LPCEt). The Stata
commands are as follows:

use "D:\My Blog\Time series econometrics for beginners\Table14_1.dta", clear
tsset time
regress lnpce lnpdi time
predict S1, resid
regress D.lnpce D.lnpdi L.S1

Table 8.6: Error correction model of LPCE and LPDI.

All coefficients in the table are individually statistically significant at 6% or lower level.
The coefficient of about 0.31 shows that a 1% increase in log(PDIt/PDIt-1) will lead on
average to a 0.31% increase in ln(PCEt/PCEt-1). This is the short-run consumption-
income elasticity. Whereas the long-run value is given by the cointegrating regression
(Table 8.4), which is about 0.77.
The coefficient of the error-correction term of about -0.065 suggests that about 6.5% of
the discrepancy between long-term and short-term LPCE is corrected within a quarter
(quarterly data), suggesting a slow rate of adjustment to equilibrium. Gujarati (2011,
p.233) said that one reason the rate of adjustment seems low is that our model is rather
simple. If we had the necessary data on interest rate, wealth of consumer, and so on,

probably we might have seen a different result. In addition, we might expect that LPCE
strongly follows the AR() mechanism.
Therefore, the ECM is presented as the following equation:

∆LPCEt = A1 + A2∆LPDIt + A3ût−1 + vt        (61)

∆LPCEt = 0.0055 + 0.306∆LPDIt − 0.065ût−1 + vt        (62)

This equation postulates that changes in LPCE depend on changes in LPDI and the
lagged equilibrium error term estimator, ût−1. If this error term is zero, there will not
be any disequilibrium in the cointegrating relationship [no error term here,
Eq.(60)]. But if the equilibrium error term is nonzero, the relationship between LPCE and
LPDI will be out of equilibrium (Gujarati, 2011: p.232).
Suppose that LPDI = 0 (no change in LPDI) and ût − 1 is positive. This means LPCEt-
1 is too high to be in equilibrium – that is LPCEt - 1 is above its equilibrium value [= 1.67
+ 0.77LPDIt - 1 + 0.0024(Time – 1)]. Therefore, the product – 0.065ût − 1 is negative,
and LPDIt will be negative to restore the equilibrium. That is, if LPCEt - 1 is above its
equilibrium value, it will start falling in the period t to correct the equilibrium error. By
the same token, if LPCEt - 1 is below its equilibrium value [= 1.67 + 0.77LPDIt - 1 +
0.0024(Time – 1)], i.e., ût−1 is negative. The product – 0.065ût − 1 is positive, and
LPDIt will be positive to restore the equilibrium. That is, if LPCEt - 1 is below its
equilibrium value, it will start rising in the period t to correct the equilibrium error.
How about in our current example? Let us list the actual values of LPCE, LPDI, and
ût−1 in some periods. Here are the Stata commands:
use "D:\My Blog\Time series econometrics for beginners\Table14_1.dta", clear
tsset time
regress lnpce lnpdi time
predict S1, resid
regress D.lnpce D.lnpdi L.S1
predict D_lpce
ereturn list
matrix b=e(b)
matrix list b
scalar b1 = b[1,3]
scalar b2 = b[1,1]

scalar b3 = b[1,2]
gen A1 = b1
gen A2 = b2
gen A3 = b3
rename lnpce lpce
rename lnpdi lpdi
list A1 A2 D.lpdi A3 S1 D_lpce in 152/156

Table 8.7: ∆LPCEt due to ∆LPDIt and ût−1 from Eqs.(61, 62).

For example, at observation 155, the actual LPCE is below its long-run equilibrium (S1
< 0). From Table 8.7, we have ∆LPCÊ156 = 0.0055 + 0.306*0.0082 + 0.065*0.024 ≈ 0.00845.
Therefore, LPCÊ156 = LPCE155 + 0.00845, which indicates that LPCE is rising
to restore the equilibrium (although the rate of this adjustment is very slow).

8.5.2 The 3-Month and 6-Month Treasury Bill Rates


The above example indicates that the speed of adjustment toward equilibrium is quite
slow. This might be the nature of the consumption expenditure – it strongly depends on
the lagged consumption expenditures. In other contexts, especially in financial markets,
the financial variables might be less dependent on the AR() mechanism, but other
mechanisms such as MA() and/or distributed lags. To see this, we now investigate an
example about the relationship between 6-month and 3-month T-bill rates in the US
economy. This example is also from Gujarati (2011: p.233-5). Table14-8.dta gives
monthly data on 3-month and 6-month T-bill rates from January 1981 to January 2010.
To see how they relate to each other, we firstly plot the data using the following
commands in Stata:
use "D:\My Blog\Time series econometrics for beginners\Table14_8.dta", clear

des
label variable tb3 "3-month treasury bill rate"
label variable tb6 "6-month treasury bill rate"
set obs 349
gen month = ym(1981, 1) + _n - 1
format %tm month
tsset month
tsline tb3 || tsline tb6, legend(position(18) ring(0) rows(2)) ylabel(0 5 10 15 17) /*
*/ xtitle(" ")
Figure 8.3: Monthly three and six months Treasury Bill rates.

Figure 8.3 shows that two series TB3 and TB6 closely go together, so we would expect
that the two rates are cointegrated. In other words, there might be a stable equilibrium
relationship between them, although each series exhibits stochastic trend, I(1). To
further investigate their relationship, we first test each series for stationarity. By using
dfgls with trend, we realize that the maximum lags based on MAIC for TB3 and TB6
are 16 and 15, respectively. Using these lag lengths, we then apply the ADF tests with
constant, trend, and the corresponding maximum lag, and find that neither series is stationary;
that is, each has a unit root (Tables 8.8 and 8.9).

Table 8.8: ADF test for stationarity of TB3 series.

Table 8.9: ADF test for stationarity of TB6 series.

Now let us find out if the two series are cointegrated. Gujarati (2011: p.234) suggests
the cointegrating equation with a quadratic trend, as presented in Table 8.10. From this
regression's results, we obtain the residuals [denoted ECT], and then apply the dfgls and ADF
tests for stationarity of this residual series. The ADF test is reported in Table 8.11. The
unit root test results show that the two series (TB6 and TB3) are cointegrated.
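A sketch of commands that could produce results like those in Tables 8.10 and 8.11 is shown below; the trend variables are constructed here for illustration and the lag length is arbitrary:
gen t = _n
gen t2 = t^2
regress tb6 tb3 t t2
predict ECT, resid
dfuller ECT, lags(8)
As with the AEG test in general, the critical values for the test on ECT should be those appropriate for residual-based cointegration tests, because ECT is an estimated series.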

Table 8.10: Relationship between TB6 and TB3.

Table 8.11: AEG test of cointegration between TB6 and TB3.

The ECM estimation is presented in Table 8.12. Since the TB rates are in percentage
form, the findings here suggest that if the 6-month TB rate was higher than the 3-month
TB rate more than expected in the previous month, this month it will be reduced by
about 0.2 percentage points to restore the equilibrium relationship between the two
series (Gujarati, 2011: p.234-5).
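A minimal sketch of the ECM reported in Table 8.12, using the residual series ECT generated above, is:
regress D.tb6 D.tb3 L.ECT
Here the coefficient on L.ECT is the adjustment coefficient (about −0.2) and the coefficient on D.tb3 is the short-run effect (about 0.88) discussed below.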

From the cointegrating regression given in Table 8.10, we see that after allowing for
deterministic trends, if the 3-month TB rate goes up by one percentage point, on average
the 6-month TB rate goes up by about 0.95 percentage point – a very close relationship
between the two. From the ECM model given in Table 8.12, we observe that in the short
run a one percentage point change in the 3-month TB rate leads on average to about
0.88 percentage point change in the 6-month TB rate, which shows how quickly the two
rates move together.
Table 8.12: Error correction model for TB6 and TB3.

We also try regressing the 3-month TB rate on the 6-month TB rate (Tables 8.13 and 8.14), and find
similar results because our sample size is large. However, the results will be
different if we are studying more than two series (Gujarati, 2011: p.235).
Table 8.13: Relationship between TB3 and TB6.

Table 8.14: Error correction model for TB3 and TB6.

9. VECTOR AUTOREGRESSIVE MODELS


9.1 Bivariate VAR
According to Asteriou & Hall (2011: p.320), it is quite common in economics to have
models where some variables are not only explanatory variables for a given dependent
variable, but they are also explained by the variables that they are used to determine. In
those cases, we have models of simultaneous equations, in which it is necessary to
clearly identify which are the endogenous and which are the exogenous or
predetermined variables. The decision regarding such a differentiation among variables
was heavily criticized by Sims24 (1980).
According to Sims (1980), if there is simultaneity among a number of variables, then
all these variables should be treated in the same way. In other words, we should not
distinguish between endogenous and exogenous variables. Therefore, once this distinction is
abandoned, all variables are equally treated as endogenous. This means that in the
reduced form, each equation has the same set of regressors which leads to the
development of the VAR models.
A VAR is defined as a system of ARDL equations describing the dynamic evolution of a set
of variables from their common history (here 'vector' implies that multiple variables are
involved). The VAR model is defined as follows. Suppose we have two series, in which
Yt is affected by not only its past (or lagged) values but current25 and lagged values of

24
Nobel prize in economics 2012.
25
Gujarati (2011, p.266) said that [from the point of view of forecasting] each equation in VAR contains only its
own lagged values and the lagged values of the other variables in the system. Similarly, Wooldridge (2003, p.620-
1) said that whether the contemporaneous (current) value is included or not depends partly on the purpose of the
equation. In forecasting, it is rarely included.

Xt, and simultaneously, Xt is affected by not only its lagged values but current and
lagged values of Yt. This simple bivariate VAR model [i.e., a system of two variables
and one lagged value of each variable on the right-hand side: VAR(1)] is given by:

Yt = β10 − β11Xt + γ11Yt−1 + γ12Xt−1 + u1t        (63)

Xt = β20 − β21Yt + γ21Yt−1 + γ22Xt−1 + u2t        (64)

We assume that both Yt and Xt are stationary; and u1t and u2t are unrelated white-noise
error terms, which are called impulses or innovations or shocks in the language of VAR
(Gujarati, 2011, p.266; Gujarati & Porter, 2009: p.785). Note that a critical requirement
of VAR is that the time series under consideration are stationary (Gujarati, 2011: p.267).
Eq.(63) and Eq.(64) are ARDL(1,1) models, and they both constitute a first-order VAR
model [VAR(1)]. Also note that these equations are not reduced-form equations since
Yt has a contemporaneous impact on Xt (given by β21) and Xt has a contemporaneous
impact on Yt (given by β11) (Asteriou & Hall, 2011: p.320). Based on Asteriou & Hall
(2011: p.320-1), rewriting the system using matrix algebra, we get:
[1, β11; β21, 1][Yt, Xt]′ = [β10, β20]′ + [γ11, γ12; γ21, γ22][Yt−1, Xt−1]′ + [u1t, u2t]′        (65)

or

BZt = Γ0 + Γ1Zt−1 + ut        (66)

where

B = [1, β11; β21, 1], Zt = [Yt, Xt]′, Γ0 = [β10, β20]′, Γ1 = [γ11, γ12; γ21, γ22], and ut = [u1t, u2t]′

Multiplying both sides of Eq.(66) by B-1 we obtain:

Zt = A0 + A1Zt-1 + et (67)

where A0 = B−1Γ0, A1 = B−1Γ1, and et = B−1ut.


We can now rewrite the VAR(1) model as:

Yt = μ1 + a1Yt−1 + c1Xt−1 + e1t        (68)

Xt = μ2 + b1Yt−1 + d1Xt−1 + e2t        (69)

where

Zt = [Yt, Xt]′, A0 = [μ1, μ2]′, A1 = [a1, c1; b1, d1], and et = [e1t, e2t]′

To distinguish between the original VAR model and the system we have just obtained,
we call the first a structural26 or primitive VAR system and the second a VAR in
the standard (or reduced) form. It is important to note that the new error terms, e1t and
e2t, are composites of the two shocks u1t and u2t. Since et = B−1ut, we can obtain e1t and
e2t as follows:

e1t = (u1t − β11u2t)/(1 − β11β21)        (70)

e2t = (u2t − β21u1t)/(1 − β11β21)        (71)

Since u1t and u2t are white-noise processes, it follows that both e1t and e2t are also white-
noise processes.
Similarly, we can now write the VAR(2) model as:

Zt = A0 + A1Zt - 1 + A2Zt - 2 + et (72)


or

Yt = μ1 + a1Yt−1 + c1Xt−1 + a2Yt−2 + c2Xt−2 + e1t        (73)

Xt = μ2 + b1Yt−1 + d1Xt−1 + b2Yt−2 + d2Xt−2 + e2t        (74)

where

Zt = [Yt, Xt]′, A0 = [μ1, μ2]′, A1 = [a1, c1; b1, d1], A2 = [a2, c2; b2, d2], and et = [e1t, e2t]′

Generally, the VAR(q) model in reduced form is as follows:

Zt = A0 + A1Zt - 1 + A2Zt - 2 + … + AqZt - q + et (75)

or

Yt = μ1 + a1Yt−1 + c1Xt−1 + a2Yt−2 + c2Xt−2 + … + aqYt−q + cqXt−q + e1t        (76)

Xt = μ2 + b1Yt−1 + d1Xt−1 + b2Yt−2 + d2Xt−2 + … + bqYt−q + dqXt−q + e2t        (77)

where

Zt = [Yt, Xt]′, A0 = [μ1, μ2]′, A1 = [a1, c1; b1, d1], A2 = [a2, c2; b2, d2], …, Aq = [aq, cq; bq, dq], and et = [e1t, e2t]′

26
See ‘Using VARs for causal analysis’ in Stock & Watson (2015: p.641-2).

The bivariate VAR often has the following features (Gujarati, 2011: p.266):
(1) The bivariate VAR resembles a simultaneous equation system, but the
fundamental difference between them is that each equation in VAR contains only
its own lagged values and the lagged values of the other variables in the system.
In other words, no current values of the two variables are included on the right-
hand side of these equations.
(2) Although the number of lagged values of each variable can be different, in most
cases we use the same number of lagged terms in each equation.
(3) The bivariate VAR system given above is known as a VAR(q) model, because
we have q lagged values of each variable on the right-hand side. If we have only
one lagged value of each variable on the right-hand side, it would be a VAR(1)
model; if two lagged terms, it would be a VAR(2) model; and so on.
(4) Although we are dealing with only two variables, the VAR system can be
extended to several variables.
(5) But if we consider several variables in the system with several lags for each
variable, we will have to estimate several parameters, which is not a problem in
our age of high-speed computers and sophisticated software, but the system
quickly becomes unwieldy.
(6) In a two-variable VAR system such as the one above, there can be at most one
cointegrating, or equilibrium, relationship between the variables. If we have a three-
variable VAR system, there can be at most two cointegrating relationships
between the three variables. In general, a k-variable VAR system can have at
most (k - 1) cointegrating relationships. Note that finding out how many
cointegrating relationships exist among k variables requires the use of
Johansen's methodology.
Note that all variables have to be of the same order of integration. The following cases
are distinct:
(1) All the variables are I(0) (stationary): the standard case, i.e. a VAR in level. In
that case, we can estimate each equation by OLS. The VAR(q) system is defined
as follows:
Zt = A0 + A1Zt - 1 + A2Zt - 2 + … + AqZt - q + et (75)
or
Yt = μ1 + a1Yt−1 + c1Xt−1 + a2Yt−2 + c2Xt−2 + … + aqYt−q + cqXt−q + e1t        (76)
Xt = μ2 + b1Yt−1 + d1Xt−1 + b2Yt−2 + d2Xt−2 + … + bqYt−q + dqXt−q + e2t        (77)
where
Zt = [Yt, Xt]′, A0 = [μ1, μ2]′, A1 = [a1, c1; b1, d1],
A2 = [a2, c2; b2, d2], …, Aq = [aq, cq; bq, dq], and et = [e1t, e2t]′

(2) All variables are I(1) but are not cointegrated, then we estimate a VAR using
first differences of variables, which are now stationary. Here we can also use
OLS to estimate each equation individually. However, we are just able to
investigate the short-run relationships and causality directions among these
variables. The VAR(p) system is defined as follows [note that p = q - 1]:
∆Zt = Γ0 + Γ1∆Zt−1 + Γ2∆Zt−2 + … + Γp∆Zt−p + vt        (78)
or
∆Yt = μ1 + a1∆Yt−1 + c1∆Xt−1 + … + ap∆Yt−p + cp∆Xt−p + v1t        (79)
∆Xt = μ2 + b1∆Yt−1 + d1∆Xt−1 + … + bp∆Yt−p + dp∆Xt−p + v2t        (80)
where
∆Zt = [∆Yt, ∆Xt]′, Γ0 = [μ1, μ2]′, Γ1 = [a1, c1; b1, d1],
Γ2 = [a2, c2; b2, d2], …, Γp = [ap, cp; bp, dp], and vt = [v1t, v2t]′

(3) All variables are I(1), but are cointegrated, then we have to use the error
correction mechanism (ECM). However, we are dealing with more than one
equation in a VAR system, the multivariate counterpart of ECM is known as the
vector error correction model (VECM). VECM is just a special case of the VAR
for variables that are stationary in their first differences. In addition, VECM can
also take into account any cointegrating relationships among the variables
(Adkins & Hill, 2011: p.407). The VECM is defined as follows:
∆Zt = Γ0 + Γ1∆Zt−1 + Γ2∆Zt−2 + … + Γp∆Zt−p + ΠZt−1 + vt        (81)
or
∆Yt = μ1 + a1∆Yt−1 + c1∆Xt−1 + … + ap∆Yt−p + cp∆Xt−p + e1Yt−1 + g1Xt−1 + v1t        (82)
∆Xt = μ2 + b1∆Yt−1 + d1∆Xt−1 + … + bp∆Yt−p + dp∆Xt−p + f1Yt−1 + h1Xt−1 + v2t        (83)
where
∆Zt = [∆Yt, ∆Xt]′, Γ0 = [μ1, μ2]′, Γ1 = [a1, c1; b1, d1], Γ2 = [a2, c2; b2, d2], …, Γp = [ap, cp; bp, dp],
Π = αβ′ = [e1, g1; f1, h1], Zt−1 = [Yt−1, Xt−1]′, and vt = [v1t, v2t]′

We can decompose Π = αβ′, where α is the matrix of speed-of-adjustment-to-equilibrium
coefficients and β′ is the matrix of long-run coefficients. In the next section, we will
discuss VECM models in detail for a case of more than one cointegrating equations.
According to Asteriou & Hall (2011: p.321) and Gujarati & Porter (2009: p.788), the
VAR model has some good characteristics.
• First, it is very simple because we do not have to worry about which variables
are endogenous or exogenous.
• Second, estimation is also very simple, in the sense that each equation can be
estimated with the usual OLS method separately.
• Third, forecasts obtained from VAR models are in most cases better than those
obtained from the far more complex simultaneous equation models.
• Fourth, besides forecasting purposes, VAR models also provide a framework for
causality tests, which will be presented shortly in Section 11.
According to Asteriou & Hall (2011: p.321-2), the VAR models have been criticized
by the following aspects.
• First, they are a-theoretic since they are not based on any economic theory. Since
initially there are no restrictions on any of the parameters under estimation, in
effect ‘everything causes everything’. However, statistical inference is often
used in the estimated models so that some coefficients that appear to be
insignificant can be dropped, in order to lead models that might have an
underlying consistent theory. Such inference is normally carried out using what
are called causality tests.
• Second, they are criticized due to the loss of degrees of freedom. Thus, if the
sample size is not sufficiently large, estimating that large a number of
parameters, say, a three-variable VAR model with 12 lags for each, will consume
many degrees of freedom, creating problems in estimation.
• Third, the obtained coefficients of the VAR models are difficult to interpret since
they totally lack any theoretical background.
Gujarati & Porter (2009: p.788-9) add some other aspects:
• Because of its emphasis on forecasting, VAR models are less suited for policy
analysis.
• In an m-variable VAR model, all the m variables should be (jointly) stationary.
If that is not the case, we will have to transform the data appropriately (e.g., by
first-differencing). But the results from the transformed data may be
unsatisfactory.

9.2 Estimating VAR Models in Stata
It is important to remember that a VAR model is used where there is no cointegration
among the variables and it is estimated using time series that have been transformed to
their stationary values. In other words, all variables in a VAR system must be stationary.
In Stata, the command for estimating a VAR model is:
varbasic27 endvariables, lags(# / #)
where endvariables is simply the names of the endogenous variables in the model, and
after lags the number of lags is specified by stating the first and the last lag numbers in
the parentheses. For example, suppose we have two stationary variables Yt and Xt [i.e.,
both are I(0)], and the optimal lag length is 4; then we have:
varbasic Yt Xt, lags(1/4)
Note that the optimal lag length is determined by using information criteria such as AIC,
SIC, etc., as we will see in the following examples.
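For instance, the lag order of a VAR in first differences could be chosen with varsoc before fitting the model; the variable names below are illustrative:
varsoc D.Y D.X, maxlag(8)
varbasic D.Y D.X, lags(1/2)
varsoc reports the likelihood-ratio statistics and the FPE, AIC, HQIC and SBIC criteria for lag orders up to maxlag(), and the selected order is then passed to varbasic through lags().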

9.2.1 Relationship between consumption expenditure and income


This example is based on Adkins & Hill (2011: p.412-9). The dataset consumption.dta
includes quarterly macroeconomic data log of real personal disposable income (denoted
as Y) and log of real personal consumption expenditure (denoted as C) for the U.S.
economy over the period 1960:1 to 2009:4. The first step is to determine whether these
variables are stationary. If they are not, then take first differences, checking to make
sure that the differences are stationary (i.e., that the levels are integrated of order one). Next, test for cointegration. If
they are cointegrated, estimate the VECM model (see Section 10). If not, use the
differences and lagged differences to estimate a VAR model.
The implementation steps are as follows:
1) Checking the stationarity by using graphs and statistical tests;
2) Checking the cointegration of the relationship between two variables [and
suppose that they are not cointegrated];
3) Selecting the optimal lag length of the VAR model;
4) Estimating the selected VAR model.

27
Stata has two commands for fitting reduced-form VARs: var and varbasic. var allows for constraints to be
imposed on the coefficients, while varbasic allows you to fit a simple VAR quickly without constraints and graph
the IRFs (StataCorp, 2017b: var intro).

Figure 9.1: Personal consumption expenditure and disposable income in the U.S.
Figure 9.2: First differences of personal consumption expenditure and disposable income in the U.S.

The Stata commands are as follows:
use "D:\My Blog\s4poe_statadata\consumption.dta", clear
gen date =q(1960q1)+_n-1
format %tq date
gen Y = log(inc)
gen C = log(cons)
tsset date
tsline C Y, legend(lab(1 "ln(Consumption)") lab(2 "ln(Income)"))
tsline D.C D.Y, legend(lab(1 "D.ln(Consumption)") lab(2 "D.ln(Income)"))
dfgls C, trend
dfuller C, trend lags(3)
dfgls Y, trend
dfuller Y, trend lags(1)
regress C Y time
predict ehat, resid
tsline ehat
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc28 D.C D.Y
varbasic D.C D.Y, lag(1/1) step(12) nograph

28
See StataCorp (2017b: varsoc). Because fitting a VAR of the correct order can be important, varsoc offers
several methods for choosing the lag order p of the VAR to fit. After fitting a VAR, and before proceeding with
inference, interpretation, or forecasting, checking that the VAR fits the data is important. varlmar can be used to
check for autocorrelation in the disturbances. varwle performs Wald tests to determine whether certain lags can
be excluded. varnorm tests the null hypothesis that the disturbances are normally distributed (StataCorp, 2017b:
var intro).

Table 9.1: dfgls test for selecting the optimal lag length of consumption.

Table 9.2: ADF test for real personal consumption expenditure.

Table 9.3: dfgls test for selecting the optimal lag length of income.

Table 9.4: ADF test for real personal income.

Table 9.5: Regressing C on Y and deterministic trend.

Figure 9.3: Residuals from regressing C on Y and Time.



Table 9.6: dfgls test for selecting the optimal lag length of residual.

Table 9.7: ADF test for stationarity of residual from Table 9.5.

General comments from the above graphs and tables are as follows. First, both C and Y
series are I(1). The relationship between C and Y is spurious because the residual
obtained from the regression between C and Y is not stationary. In other words, we have

73
no cointegration between C and Y29. As a result, we only estimate the coefficients of
the model using a VAR in differences instead of using a VECM model. Before
estimating a VAR model in differences, we should select the optimal lag length for such
a VAR model. Table 9.9 indicates that the optimal lag length is 1.

Table 9.9: Lag length selection for the VAR model.

Table 9.10: A VAR model between C and Y.

29
Note that a similar example about the relationship between consumption expenditure and income for the period
1970 – 2008 by Gujarati (2011, p.229-31) concludes that there is a cointegrating relationship between
consumption expenditure and disposable income [see Sections 8.2 and 8.5].

9.2.2 Relationship between money supply and interest rate
This example is based on Gujarati & Porter (2009: p.785-7) with some modifications.
Specifically, Gujarati & Porter used the 'levels' in their VAR models, although these
series are not stationary. Therefore, we make some modifications following a 4-step
procedure as done in the previous example. The dataset Table17_1.dta includes
quarterly data on 4 variables: M1 (money supply), R (interest rate), P (inflation), and
GDP from first quarter of 1979 to fourth quarter of 1988. However, we just use 36
observations for the analysis because we want to compare predicted values with actual
values in 1988 for forecasting purposes. The Stata commands are listed below [note that
we do not present all graphs and tables, to save space]:
use "D:\My Blog\Time series econometrics for beginners\Table17_5.dta", clear
gen date =q(1979q1)+_n-1
format %tq date
tsset date
dfgls m, trend
dfuller m, trend lags(1)
dfgls r, trend
dfuller r, trend lags(1)
regress m r
estat dwatson
predict ehat, resid
tsline ehat, xtitle(" ") ytitle("Residuals from regression of M1 on R")
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.m D.r
varbasic D.m D.r, lag(1/1) step(12) nograph
varbasic D.m D.r, lag(1/2) step(12) nograph
varbasic D.m D.r, lag(1/4) step(12) nograph

The test results show that both series money supply (M) and interest rate (R) are I(1)
and not cointegrated. As a result, we are able to estimate VAR models in first
differences instead of VECM models. Thanks to information criteria such as AIC, SIC,
etc., we eventually select 1 as the optimal lag length in the final VAR model [Table
9.11]. We also try other lag lengths such as 2 and 4, but these models are not as good as
the model with a lag length of 1 [based on the information criteria and the significance of the
higher lag-length coefficients].
In order to produce forecasts [Table 9.12], we can use the command 'fcast compute' in
Stata immediately after estimating a VAR model. A list of commands is as follows:
varbasic D.m D.r in 1/36, lag(1/1) step(12) nograph
fcast compute f_, step(4)
gen LM = L1.m
gen f_m = LM + f_D_m
gen LR = L1.r
gen f_r = LR + f_D_r
list m f_m r f_r in 37/40

Table 9.11: A VAR model between M and R.

Table 9.12: Forecast values and actual values of M and R in 1988

10. VECM AND JOHANSEN METHOD OF COINTEGRATION
10.1 VECM
As mentioned in Section 8.3, when there are more than two variables in the model,
it is possible to have more than one cointegrating relationship. Generally, in a model with
k variables, there can be at most (k – 1) cointegrating vectors. In this
case, the EG single-equation approach cannot be applied, and we have to use the
Johansen approach for multiple equations.
In this section, we extend the single-equation error correction model to a multivariate
one. Let’s assume that we have three variables, Yt, Xt and Wt, which can all be
endogenous, i.e., using matrix notation Zt = [Yt, Xt, Wt] we have that

Zt = A0 + A1Zt - 1 + A2Zt - 2 + … + AqZt - q + et (75)

or

Yt = μ1 + a1Yt−1 + d1Xt−1 + g1Wt−1 + … + aqYt−q + dqXt−q + gqWt−q + e1t        (84)

Xt = μ2 + b1Yt−1 + e1Xt−1 + h1Wt−1 + … + bqYt−q + eqXt−q + hqWt−q + e2t        (85)
Wt = μ3 + c1Yt−1 + f1Xt−1 + k1Wt−1 + … + cqYt−q + fqXt−q + kqWt−q + e3t        (86)

where

Zt = [Yt, Xt, Wt]′, A0 = [μ1, μ2, μ3]′, A1 = [a1, d1, g1; b1, e1, h1; c1, f1, k1],
A2 = [a2, d2, g2; b2, e2, h2; c2, f2, k2], …, Aq = [aq, dq, gq; bq, eq, hq; cq, fq, kq], and et = [e1t, e2t, e3t]′

Suppose that all variables in model (75) are I(1) and there are two cointegrating relationships. Similar to the ECM in the single-equation case, we then have a counterpart for multiple equations. The simplest form of a VECM(p) is as below:

ΔZt = Γ1ΔZt-1 + Γ2ΔZt-2 + … + ΓpΔZt-p + ΠZt-1 + vt (87)

or

Yt = a1Yt-1 + d1Xt-1 + g2Wt-1 + … + apYt-p + dpXt-p + gpWt-p + 1Zt-1 + v1t (88)
Xt = b1Yt-1 + e1Xt-1 + h1Wt-1 + … + bpYt-p + epXt-p + hpWt-p+ 2Zt-1 + v2t (89)
Wt = c1Yt-1 + f1Xt-1 + k1Wt-1 + … + cpYt-p + fpXt-p + kpWt-p+ 3Zt-1 + v3t (90)

where
Zt = [Yt, Xt, Wt]′, Γ1 = [a1 d1 g1; b1 e1 h1; c1 f1 k1], Γ2 = [a2 d2 g2; b2 e2 h2; c2 f2 k2], …, Γp = [ap dp gp; bp ep hp; cp fp kp], Zt-1 = [Yt-1, Xt-1, Wt-1]′, and vt = [v1t, v2t, v3t]′

Important note: A VECM includes one fewer lag of the first differences in comparison with the original VAR; therefore, we replace q by p [i.e., p = q – 1]. Also note that, for simplicity, we denote the elements of Γ1, Γ2, …, Γp in the same way as those of A1, A2, …, Aq, but their nature is completely different. In addition, there may be constant and trend terms in both the VAR part and the cointegrating equations [i.e., in β′Zt-1]. We will expand on these terms when discussing the Johansen approach.
The Π matrix contains information regarding the long-run relationships. We can decompose Π = αβ′, where α contains the speed-of-adjustment-to-equilibrium coefficients and β′ is the matrix of long-run coefficients. The matrix Π is defined as follows:

1 11 12
 21 31

 = [2 ] = ’ = [ 21 22 ] [ 11 ]
3 31 32 12 22 32

Therefore, the β’Zt - 1 term is equivalent to the error correction term [Yt - 1 – α – βXt - 1]
in the single-equation case, except that now β’Zt - 1 contains up to (k – 1) vectors in a
multivariate framework.
Let us now analyze only the error correction part of the first equation [Eq. (88), i.e., for ΔYt on the left-hand side], which gives:

π1Zt-1 = ([α11β11 + α12β12]  [α11β21 + α12β22]  [α11β31 + α12β32]) [Yt-1, Xt-1, Wt-1]′ (91)

Equation (91) can be rewritten as:

π1Zt-1 = α11(β11Yt-1 + β21Xt-1 + β31Wt-1) + α12(β12Yt-1 + β22Xt-1 + β32Wt-1) (92)

which clearly shows the two cointegrating vectors with their respective speed of adjustment terms α11 and α12.
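To see the decomposition Π = αβ′ at work, the following is a minimal sketch using Stata's matrix commands; the numbers are purely hypothetical and chosen only for illustration:
* A minimal sketch with made-up numbers: alpha is 3 x 2 (speeds of adjustment),
* beta is 3 x 2 (two cointegrating vectors), so Pi = alpha*beta' is 3 x 3 with rank 2.
matrix alpha = (-0.15, -0.02 \ 0.07, -0.30 \ 0.19, 0.61)
matrix beta  = (1, 0 \ -0.5, 1 \ -1.2, 0.3)
matrix Pi = alpha*beta'
matrix list Pi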

10.2 Advantages of the Multiple-Equation Approach
According to Asteriou & Hall (2011: p.369-70), the multiple-equation approach has the
following advantages over the single-equation approach:
(1) From the multiple-equation approach, we can obtain estimates for both cointegrating vectors [Eq. (92)], while with the single-equation approach we have only a linear combination of the two long-run relationships.
(2) Even if there is only one cointegrating relationship [for example, only the first one in Eq. (92)] rather than two, with the multiple-equation approach we can calculate all three differing speed of adjustment coefficients (α11, α21, α31).
(3) Only when α21 = α31 = 0, and only one cointegrating relationship exists, can we say that the multiple-equation method is the same as (reduces to) the single-equation approach; in that case there is no loss from not modelling the determinants of ΔXt and ΔWt. It is also worth mentioning that α21 = α31 = 0 is equivalent to Xt and Wt being weakly exogenous.
In a nutshell, only when all right-hand variables in a single equation are weakly
exogenous, does the single-equation approach provide the same result as a multiple-
equation approach.

10.3 The Johansen Approach of Cointegration


10.3.1 Introduction
The Johansen approach to cointegration testing relates to the rank of the matrix Π, denoted r(Π), i.e., the number of linearly independent rows in this matrix [more exactly, in the matrix β′]. It is important to note that the Johansen approach requires that all variables in Zt be I(1), i.e., Zt is a vector of nonstationary variables. In our current setting, Zt is a vector of three endogenous variables, i.e., Zt = [Yt, Xt, Wt]. If this is the case, ΔZt and of course its lags ΔZt-1, ΔZt-2, …, ΔZt-p are I(0), i.e., stationary. Then equation (87) is well behaved only if the term ΠZt-1 is also I(0), i.e., stationary.
Similar to Section 9.1, there are three possible cases in the multiple-equation approach. [Again, note that all variables in Zt have to be of the same order of integration.]
(1) All the variables in Zt are I(0) (i.e., stationary): the standard case, i.e. a simple
VAR in levels. In that case, we can estimate each equation by OLS. The VAR(q)
system is defined as follows:

Zt = A0 + A1Zt - 1 + A2Zt - 2 + … + AqZt - q + et (75)

or

Yt = μ1 + a1Yt-1 + d1Xt-1 + g1Wt-1 + … + aqYt-q + dqXt-q + gqWt-q + e1t (84)
Xt = μ2 + b1Yt-1 + e1Xt-1 + h1Wt-1 + … + bqYt-q + eqXt-q + hqWt-q + e2t (85)
Wt = μ3 + c1Yt-1 + f1Xt-1 + k1Wt-1 + … + cqYt-q + fqXt-q + kqWt-q + e3t (86)

where
Zt = [Yt, Xt, Wt]′, A0 = [μ1, μ2, μ3]′, A1 = [a1 d1 g1; b1 e1 h1; c1 f1 k1], A2 = [a2 d2 g2; b2 e2 h2; c2 f2 k2], …, Aq = [aq dq gq; bq eq hq; cq fq kq], and et = [e1t, e2t, e3t]′

Note: A0 may also include a trend term.


(2) All variables are I(1) but are not cointegrated, and therefore the Π matrix is a k × k matrix of zeros. In this case, the appropriate strategy is to estimate a VAR model using the first differences of the variables, which are now stationary. Here we can also use OLS to estimate each equation individually. However, we are only able to investigate the short-run relationships and causality directions among these variables. The VAR(p) system is defined as follows:

ΔZt = Γ1ΔZt-1 + Γ2ΔZt-2 + … + ΓpΔZt-p + vt (93)

or

Yt = a1Yt-1 + d1Xt-1 + g2Wt-1 + … + apYt-p + dpXt-p + gpWt-p + v1t (94)


Xt = b1Yt-1 + e1Xt-1 + h1Wt-1 + … + bpYt-p + epXt-p + hpWt-p + v2t (95)
Wt = c1Yt-1 + f1Xt-1 + k1Wt-1 + … + cpYt-p + fpXt-p + kpWt-p + v3t (96)

where
Zt = [Yt, Xt, Wt]′, Γ1 = [a1 d1 g1; b1 e1 h1; c1 f1 k1], Γ2 = [a2 d2 g2; b2 e2 h2; c2 f2 k2], …, Γp = [ap dp gp; bp ep hp; cp fp kp], and vt = [v1t, v2t, v3t]′

Note: The VAR(p) model may also include a constant and a trend term, i.e., a Γ0 vector.

(3) All variables are I(1) but are cointegrated, i.e., there exist up to (k – 1) [= 2 in the current case] cointegrating relationships of the form β′Zt-1 ~ I(0). In this particular case, r ≤ (k – 1) cointegrating vectors exist in Π. This simply means that r columns of β form r linearly independent combinations of the variables in Zt, each of which is stationary. Here, we have to use the vector error correction mechanism (VECM) as defined in Eq. (87).
In terms of the rank of the matrix Π, the above cases are summarized in Table 10.1.
Table 10.1: Rank of matrix Π and its implications

Rank of Π       Implications

r = k           All variables in Zt are stationary, i.e., I(0); we say Π has full rank.
                There is no need to estimate the model as a VECM: a VAR on the
                untransformed data is well behaved.

r = 0           All variables in Zt are I(1), but there is no cointegration; we say the
                rank of Π is zero. There are no stable long-run relations among the
                variables. A VECM is not possible (only a VAR in first differences is
                applicable).

0 < r ≤ k – 1   There are r cointegrating vectors (relationships). These vectors
                describe the long-run relationships among the variables in Zt. A
                VECM is the appropriate strategy.

10.3.2 The steps of the Johansen approach


According to Asteriou & Hall (2011: p.371-5), the Johansen approach in practice
involves the following steps:
Table 10.2: Johansen approach in practice.

Step 1 Testing the order of integration of all variables.


We can use ADF tests for each variable in Zt, as discussed in Section
6. There are some possibilities:
(1) All variables are I(0). Stop here: we use case 1 in Section 10.3.1 and estimate VAR models as in Section 9.2.
(2) All variables are I(1). Go to Step 2.
(3) A mixed case, where both I(0) and I(1) variables are present in the model. Here we can use the Pesaran bounds test of cointegration. Another strategy is to replace an I(0) variable by a related proxy that is I(1). For example, if the inflation rate is I(0), we might expect that the CPI is I(1). Similarly, if we face a mix of I(1) and I(2) variables, we can replace an I(2) variable by a proxy that is I(1). For example, if GDP is I(2), we might expect that the GDP growth rate is I(1)30.
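As a sketch of how Step 1 can be carried out in Stata (the variable names y, x and w are hypothetical, and whether to include a trend is a case-by-case choice):
* A minimal sketch: DF-GLS tests on the levels and first differences of each variable.
foreach v of varlist y x w {
    dfgls `v', trend      // test the level (with trend, if the graph suggests one)
    dfgls D.`v'           // test the first difference
}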

Step 2 Setting the appropriate lag length of the model.


Setting the value of the lag length is affected by the omission of
variables that might affect only the short-run behavior of the model.
This is due to the fact that omitted variables instantly become part of
the error term. Therefore, very careful inspection of the data and the
functional relationship is necessary before proceeding with estimation
in order to decide whether to include additional variables. It is quite
common to use dummy variables to take into account short-run
‘shocks’ to the system, such as political events that had important
effects on macroeconomic conditions (Asteriou & Hall: p.371).
The most common procedure in choosing the optimal lag length is to
estimate a VAR model including all variables in levels (non-
differenced). This VAR model should be estimated for a large number
of lags, then reducing down by re-estimating the model for one lag less
until we reach zero lag (Asteriou & Hall: p.371-2).
In each of these models, we inspect the values of the AIC and the SBC
criteria, as well as the diagnostics concerning autocorrelation,
heteroskedasticity, possible ARCH effects and normality of the
residuals. In general, the model that minimizes AIC and SBC is
selected as the one with the optimal lag length. This model should also
pass all the diagnostic checks (Asteriou & Hall: p.372).
In Stata, we can use the command ‘varsoc’ as in Section 9.2 above.
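For example (a sketch; the variable names y, x, w and the maximum lag of 8 are arbitrary assumptions):
varsoc y x w, maxlag(8)    // reports AIC, HQIC and SBIC for VAR(0) to VAR(8) in levels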

Step 3   Choosing the appropriate model regarding the deterministic components in the multivariate system.
Another important aspect in the formulation of the dynamic model is whether an intercept and/or a trend should enter the short-run model (i.e., the VAR part), the long-run model (i.e., the cointegrating equation part), or both models.
30 Most macroeconomic flows and stocks, such as output and employment, are I(1). An I(2) series is growing at an ever-increasing rate, such as price level data. Series that are I(3) or greater are extremely unusual, but they do exist, for example the money stocks or price levels in hyperinflationary economies (Greene, 2008: p.740).
The general case of the VECM, including all the various options that can possibly occur, is given by the following equation:
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + α(β′Zt-1 + μ + δt) + γ0 + γ1t + vt (97)
In general, five distinct models can be considered. Although the first and the fifth are unlikely to occur in practice, we present all of them for completeness.

Model 1: No intercept or trend in the CE (the cointegrating equation) or in the VAR (i.e., μ = δ = γ0 = γ1 = 0).
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + αβ′Zt-1 + vt (98)

Model 2: Intercept (no trend) in the CE, no intercept or trend in the VAR (i.e., μ ≠ 0; δ = γ0 = γ1 = 0).
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + α(β′Zt-1 + μ) + vt (99)

Model 3: Intercept in the CE and the VAR, no trend in the CE or the VAR (i.e., μ ≠ 0, γ0 ≠ 0; δ = γ1 = 0).
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + α(β′Zt-1 + μ) + γ0 + vt (100)

Model 4: Intercept in the CE and the VAR, linear trend in the CE, no trend in the VAR (i.e., γ1 = 0).
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + α(β′Zt-1 + μ + δt) + γ0 + vt (101)

Model 5: Intercept and quadratic trend in the CE, intercept and linear trend in the VAR.
ΔZt = Γ1ΔZt-1 + … + ΓpΔZt-p + α(β′Zt-1 + μ + δt²) + γ0 + γ1t + vt (102)

The Pantula Principle: This principle involves estimating all three models (usually models 2, 3 and 4) and presenting the results [e.g., the trace statistics] from the most restrictive hypothesis (i.e., r = 0 and model 2) to the least restrictive hypothesis (r = k – 1 and model 4). The model selection procedure then comprises moving from the most restrictive model and, at each stage, comparing the trace test statistic to its critical value, stopping only when it is concluded for the first time that the null hypothesis of no cointegration is not rejected (Asteriou & Hall: p.373).
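In Stata, the Pantula procedure amounts to running vecrank under the three trend specifications and comparing the trace statistics across models for each r. A minimal sketch (the variable names y, x, w and the lag length of 3 are assumptions; the mapping between models and the trend() option is given in Section 10.3.4 below):
vecrank y x w, trend(rconstant) lags(3)   // model 2
vecrank y x w, trend(constant)  lags(3)   // model 3
vecrank y x w, trend(rtrend)    lags(3)   // model 4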

Step 4   Determining the rank of Π or the number of cointegrating vectors.

There are two methods (and corresponding test statistics) for determining the number of cointegrating relations, and both involve estimation of the matrix Π.
(1) One method tests the null hypothesis (H0) that rank(Π) = r against the alternative that the rank is r + 1. The test statistics are based on the characteristic roots (also called eigenvalues) obtained from the estimation procedure. The test consists of ordering the eigenvalues in descending order and considering whether they are significantly different from zero. To understand the test procedure, suppose we obtained n characteristic roots denoted by λ1 > λ2 > λ3 > … > λn. If the variables under examination are not cointegrated, the rank of Π is zero and all the characteristic roots will equal zero; therefore each (1 – λ̂i) will equal 1 and, since ln(1) = 0, each ln(1 – λ̂i) will equal zero. To test how many of the characteristic roots are significantly different from zero, this test uses the following statistic:

λmax(r, r + 1) = –T ln(1 – λ̂r+1) (103)

As we said before, the test statistic is based on the maximum eigenvalue and because of that it is called the maximal eigenvalue statistic (denoted by λmax).
(2) The second method is based on a likelihood ratio test for the trace of the matrix (and because of that it is called the trace statistic). The trace statistic considers whether the trace is increased by adding more eigenvalues beyond the rth. The null hypothesis (H0) in this case is that the number of cointegrating vectors is less than or equal to r. From the previous analysis, it is clear that when all λ̂i = 0, the trace statistic is equal to zero as well. This statistic is calculated by:

λtrace(r) = –T ∑i=r+1..n ln(1 – λ̂i) (104)

The usual procedure is to work downwards and stop at the value of r that is associated with a test statistic exceeding the displayed critical value [in other words, if the test statistic > critical value, we reject H0]. Critical values for both statistics are provided by Johansen and Juselius (1990); they are reported directly by Stata after conducting a cointegration test.

Source: Asteriou & Hall (2011: p.371-4)

10.3.3 A Numerical example


Recall that we reject the null hypothesis that the number of cointegrating vectors is at most r if the test statistic is greater than the corresponding critical value.
Table 10.3: Trace test

H0       H1      Statistic   95% critical value   Decision
r = 0    r = 1   62.18       47.21                Reject H0
r ≤ 1    r = 2   19.55       29.68                Accept H0
r ≤ 2    r = 3   8.62        15.41                Accept H0
r ≤ 3    r = 4   2.41        3.76                 Accept H0

We conclude that this data exhibits one cointegrating vector.

10.3.4 The Johansen approach and VECM in Stata


In Stata, the command for the Johansen cointegration test has the following syntax
(Asteriou & Hall: p.380-1):

vecrank varnames, options

where in varnames we type the names of the variables (in levels) to be tested for cointegration. The options allow us to specify the different models discussed in the theory; for each case (models 1–5) the options are as follows:

Model 1: trend(none)
Model 2: trend(rconstant)
Model 3: trend(constant)
Model 4: trend(rtrend)
Model 5: trend(trend)
For example, suppose we want to test for cointegration between two variables (say, y
and x) through the third model, the command is:
vecrank y x, max trend(constant) lags(2)

where the option max asks Stata to show both the max and the trace statistics (if max is omitted, Stata reports only the trace statistics), and lags(#) determines the number of lags to be used in the test.
If it appears that there is cointegration, the command:

vec varnames, options

provides the VECM estimation results. The options are the same as above. So, the command:
vec y x, trend(trend) lags(3)
yields VECM results for the variables y and x with three lags in the underlying VAR, when the cointegrating equation has been determined from the fifth model according to the theory.
In order to illustrate, we use the following example, which is based on group assignment
for an advanced econometrics course in 2012, School of Social Sciences, Wageningen
University, the Netherlands31. In this example, we use the dataset texashousing.dta with
monthly housing prices in four major cities in Texas (USA): Austin, Dallas, Houston
and San Antonio. Natural logarithms of housing prices are available from January 1990
till December 2003 (168 observations). It is expected that there are regional linkages
between these housing markets. If houses get very expensive in one city, people may
decide to move to another city, creating upward pressure on housing prices in all cities.
In other words, it is assumed that there exists a long-run (spatial) equilibrium between these four housing price series. That is what we will investigate here.

Step 1: testing the order of integration of the variables.


In order to select appropriate test equations (i.e., whether to include a constant and a trend), we first investigate the graph of each series, then use the dfgls command to choose the optimal lag lengths based on the MAIC criterion. The test statistics for Dallas and Houston in levels without a trend are positive, so we try other test equations with a trend. The final results are presented in Table 10.4.

31
More exactly, this is an example in StataCorp (2017b: vec intro – VECM estimation in Stata).

Table 10.4: Unit root tests of four housing prices.

Variable Level First difference Conclusion


Lag length P-value Lag length P-value
Austin 4 0.524 13 0.010 I(1)
Dallas 11 0.809 1 0.000 I(1)
Houston 11 0.825 13 0.000 I(1)
Sa 12 0.749 13 0.000 I(1)

From Table 10.4 we realize that all housing prices in these cities are integrated of the
same order one, i.e., I(1). Therefore, there could be cointegrating relationships among
these housing prices.

Step 2: determining the lag length of VAR models


Using the AIC criterion, we see that the appropriate lag length of VAR part in the test
equation is 3.

Table 10.5: Optimal lag length of the VAR model.

(varsoc austin dallas houston sa)

Step 3: Choosing the appropriate model regarding the deterministic components


in the multivariate system.
The Johansen tests for cointegration are presented in Tables 10.6–8, and the Pantula principle test results are presented in Table 10.9. All three models suggest that the rank of the Π matrix is 2. In addition, the trace statistics for all three models are collected together in Table 10.9 to choose which model is the most appropriate. Start with the smallest number of cointegrating vectors, r = 0, and check whether the trace statistic for model 2 rejects H0; if 'yes', proceed to the right, checking whether model 3 rejects H0, and so on. In our case, the trace statistic for model 2 is smaller than the 5% critical value at r = 2, so this is the first point at which the null is not rejected. Therefore, model 2 (only an intercept in the CE) is the selected model.

Table 10.6: Model 2 results.

Table 10.7: Model 3 results.

Table 10.8: Model 4 results.

Table 10.9: The Pantula principle test results.

r Model 2 Model 3 Model 4

0 108.80 101.61 128.42

1 48.78 41.68 67.40

2 16.93* 9.89 23.75

3 6.10 0.34 9.53

Step 4: Determining the rank of Π or the number of cointegrating vectors.


From the above tables, we could conclude that the rank of the cointegrating matrix is 2.
This implies that there exist two long-run relationships among housing prices in the
sample of four cities under study.

Step 5: Estimating the VECM32 model and interpreting the results.


From Step 4, we see that the VECM model is specified as follows:

vec austin dallas houston sa, trend(rconstant) rank(2) lag(3)

The results are divided into two parts: Table 10.10 presents the short-run relationships
and speed of adjustment coefficients, and Table 10.11 presents the long-run
relationships among four variables. Again, note that the VECM model has one fewer
lag of the first differences.

32
Note that before estimating the parameters of a VECM model, you must choose the number of lags in the
underlying VAR, the trend specification, and the number of cointegrating equations. vecrank offers several ways
of determining the number of cointegrating vectors conditional on a trend specification and lag order (StataCorp,
2017b: vecrank).

Table 10.10: Short-run relationships among variables.

Table 10.11: Long-run relationships among variables33.

From the cointegrating equations results, (based on the significance of the estimated
coefficients) we realize that there are two long-run cointegrating relationships
between/among house prices of: (i) Austin and San Antonio; and (ii) Dallas, Houston,
and San Antonio.
The speed of adjustment parameters in the VECM model are derived from Table 10.10 and presented in Table 10.12. From Tables 10.11 and 10.12, we can write the two cointegrating vectors with their respective speed of adjustment terms for each equation in the VECM model as follows:

33
Note that the coefficient of houston in the first cointegrating equation (_ce1) is not statistically significant. We
can refit the model with the Johansen normalization and the overidentifying constraint that the coefficient on
houston in the first cointegrating equation is zero [See StataCorp (2017b: vec intro – VECM estimation in Stata)
to learn this command].

For Austin:
-0.154(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) - 0.025(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)

For Dallas:
0.071(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) - 0.302(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)

For Houston:
0.188(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) + 0.612(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)

For San Antonio:
0.281(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) - 0.171(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)

Table 10.12: Speed of adjustment coefficients34.

Equation      Parameter   Adjustment coefficient   P-value   Significant at 5%?
D_austin      α11         -0.154                   0.010     Yes
              α12         -0.025                   0.839     No
D_dallas      α21          0.071                   0.142     No
              α22         -0.302                   0.002     Yes
D_houston     α31          0.188                   0.000     Yes
              α32          0.612                   0.000     Yes
D_sa          α41          0.281                   0.000     Yes
              α42         -0.171                   0.200     No

34
To create a separate table of adjustment parameters only, we can replay the results by specifying ‘alpha’ option
plus nobtable noetable [the command is vec, alpha nobtable noetable]. See StataCorp (2017b: vec – example 2).

There are some notes:
▪ For Austin: The adjustment parameter of the second cointegrating relation is
not significant because Austin is omitted in this relation (i.e., _ce2 in
cointegrating equations).
▪ For Dallas: The adjustment parameter of the first cointegrating relation is not
significant because Dallas is omitted in this relation (i.e., _ce1 in the
cointegrating equations).
▪ For Houston: Both adjustment parameters are highly significant because
Houston exists in both relations (i.e., _ce1 and _ce2 in the cointegrating
equations).
▪ For San Antonio: The adjustment parameter of the second cointegrating relation is not significant (although San Antonio is included in both cointegrating equations), possibly because of the lag selection. For example, when we change from lag(3) to lag(4), both adjustment parameters become significant at the 5% significance level.
You can try with other model specifications such as model 3, model 4, and/or different
lags based on other information criteria such as SIC. The Stata commands for this
example are as follows:
use "D:\My Blog\Time series econometrics for beginners\texashousing.dta", clear
tsset t
tsline D.austin D.dallas D.houston D.sa
dfgls austin
dfuller austin, lag(4)
dfgls dallas, trend
dfuller dallas, trend lag(11)
dfgls houston, trend
dfuller houston, trend lag(11)
dfgls sa
dfuller sa, lag(12)
dfgls D.austin
dfuller D.austin, lag(13)
dfgls D.dallas
dfuller D.dallas, lag(1)
dfgls D.houston

dfuller D.houston, lag(13)
dfgls D.sa
dfuller D.sa, lag(13)
varsoc austin dallas houston sa
vecrank austin dallas houston sa, trend(none) lag(3) /* Model 1 */
vecrank austin dallas houston sa, trend(rconstant) lag(3) /* Model 2 */
vecrank austin dallas houston sa, trend(constant) lag(3) /* Model 3 */
vecrank austin dallas houston sa, trend(rtrend) lag(3) /* Model 4 */
vecrank austin dallas houston sa, trend(trend) lag(3) /* Model 5 */

vec austin dallas houston sa, trend(rconstant) rank(2) lag(3) /* Model 2 */


vec austin dallas houston sa, trend(constant) rank(2) lag(3) /* Model 3 */
vec austin dallas houston sa, trend(rtrend) rank(2) lag(3) /* Model 4 */

11. CAUSALITY TESTS


According to Asteriou & Hall (2011: p.322), one of the good features of VAR models
is that they allow us to test the direction of causality. Causality in econometrics is
somewhat different to the concept in everyday use; it refers more to the ability of one
variable to predict (and therefore cause) the other. Gujarati (2011: p.270) said that
causality between variables, if any, must be determined externally, by appealing to some
theory or by some kind of experimentation.
Suppose two stationary variables, say Yt and Xt, affect each other with distributed lags
[more exactly in the reduced-form VAR model]. The relationship between Yt and Xt
can be captured by a VAR model. In this case, it is possible to have that (a) Yt causes
Xt (i.e., unidirectional Granger causality from Y to X), (b) Xt causes Yt (i.e.,
unidirectional Granger causality from X to Y), (c) there is a bidirectional feedback
(i.e., causality among the variables), and (d) the two variables are independent. The
problem is to find an appropriate procedure that allows us to test and statistically detect
the cause and effect relationship among variables.
Granger (1969) developed a relatively simple test that defined causality as follows: a
variable Yt is said to Granger-cause Xt, if Xt can be predicted with greater accuracy by
using past values of the Yt variable rather than not using such past values, all other terms
remaining unchanged (Asteriou & Hall: p.322).

There are two causality testing approaches, namely the Granger causality test and the Sims causality test. However, given its popularity in practical applications, we concentrate only on the test procedure for the Granger causality test.
11.1 The Standard Granger Causality Test
The standard Granger causality test for the case of two stationary variables, say, Yt and Xt, involves as a first step the estimation of the following (reduced-form) VAR model:

Yt = a1 + ∑i=1..q βiXt-i + ∑i=1..q γiYt-i + u1t (105)
Xt = a2 + ∑i=1..q θiXt-i + ∑i=1..q δiYt-i + u2t (106)

where it is assumed that both u1t and u2t are uncorrelated white-noise error terms [i.e., a well-specified model: no autocorrelation, no heteroskedasticity35, no omission of important lagged variables, etc.], and, importantly, Yt and Xt are stationary. We also assume that the lag lengths of both equations are the same (i.e., q), although they might be different. In addition, both equations (105) and (106) might include other exogenous variables such as a linear trend, a quadratic trend, and so on. In this model, we can have the following different cases:

Case 1 The lagged X terms in equation (105) are statistically different from zero as
a group, and the lagged Y terms in equation (106) are not statistically
different from zero. In this case, we have that Xt causes Yt.

Case 2 The lagged Y terms in equation (106) are statistically different from zero as
a group, and the lagged X terms in equation (105) are not statistically
different from zero. In this case, we have that Yt causes Xt.

Case 3 Both sets of lagged X and lagged Y terms are statistically different from zero
as a group in equations (105) and (106), so that we have bidirectional
causality between Yt and Xt.

Case 4 Both sets of lagged X and lagged Y terms are not statistically different from
zero in equations (105) and (106), so that Xt is independent of Yt.

The Granger causality test involves the following procedure. First, estimate the VAR model given by equations (105) and (106). Then check the significance of the coefficients and apply variable deletion tests, first on the lagged X terms in equation (105), and then on the lagged Y terms in equation (106). According to the results of the variable deletion tests, we may conclude about the direction of causality based upon the four cases mentioned above.
35 Many cases of heteroskedasticity in time series data involve an error term with a variance that tends to increase with time. That kind of heteroskedastic error term is also nonstationary (Studenmund, 2017: p.377).
More analytically, for the case of one equation [i.e., we examine equation (105); it is straightforward to reverse the procedure in order to test equation (106)], we perform the following steps (Asteriou & Hall: p.323-4):

Step 1 Regress Yt on lagged Y terms as in the following model:


Yt = a1 + ∑i=1..q γiYt-i + u1t (107)

and obtain the RSS of this regression (which is the restricted one) and label
it as RSSR.

Step 2 Regress Yt on lagged Y terms plus lagged X terms as in the following


model:
Yt = a1 + ∑i=1..q βiXt-i + ∑i=1..q γiYt-i + u1t (105)

and obtain the RSS of this regression (which is the unrestricted one) and
label it as RSSU.

Step 3 Set the null and alternative hypotheses as below:


H0: β1 = β2 = … = βq = 0, i.e., Xt does not cause Yt
H1: at least one βi ≠ 0, i.e., Xt does cause Yt

Step 4   Calculate the F statistic for the normal Wald test on the coefficient restrictions, given by:

F = [(RSSR – RSSU)/q] / [RSSU/(n – k)]

where n is the number of included observations and k = 2q + 1 is the number of estimated coefficients in the unrestricted model.

Step 5 If the computed F value exceeds the critical F value, reject the null
hypothesis and conclude that Xt causes Yt.

We then repeat the same test procedure for equation (106). Note that if both Yt and Xt are nonstationary and not cointegrated, the standard Granger causality test is applied to the first differences of the nonstationary variables, i.e., ΔYt, ΔXt, and their corresponding lags [with p = q – 1 in this case].
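As a sketch (the variable names y and x and the lag length q = 2 are hypothetical, and the series are assumed stationary and tsset), the Wald test of Step 3 can be carried out directly with regress and test:
* A minimal sketch of the manual Granger test that the lagged x terms are jointly zero.
regress y L(1/2).y L(1/2).x     // unrestricted model (105) with q = 2
test L.x L2.x                   // H0: both lagged x coefficients equal zero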

11.2 Remarking Points


Before we implement the Granger causality tests, we need to consider several factors
(Gujarati, 2011: p.272):
1) The number of lagged terms to be introduced in the Granger causality tests is an
important practical question, for the direction of causality may depend critically
on the number of lagged terms included in the model. We will have to use the
Akaike, Schwarz or similar criterion to determine the length of the lags. Some
trial and error is inevitable. Note: we can apply the Stata command ‘varsoc’ as
previously mentioned.
2) We have assumed that the error terms entering the Granger test are uncorrelated. If this is not the case, we will have to use an appropriate error transformation such as the Cochrane-Orcutt, Prais-Winsten, or Newey-West techniques. These techniques are widely discussed in the autocorrelation chapter of most econometrics texts.
3) We have to guard against spurious causality. When we say that Y (say, consumption expenditure) causes X (say, income) or vice versa, it is quite possible that there is a "lurking" variable, such as Z (say, the interest rate), that causes both Y and X. Therefore, the causality between Y and X may in fact be due to the omitted variable, i.e., the interest rate. One way to find this out is to consider a three-variable VAR, with one equation for each of the three variables (see the short sketch after this list).
4) The critical assumption underlying the Granger causality test is that the variables under study are stationary. If this is not the case, we have to take first differences so as to obtain stationary series and run the Granger causality tests with the transformed variables.
5) However, while individually nonstationary, it is possible that the variables in question are cointegrated. In that situation, we will have to use the error correction mechanism (ECM). This is because, if Y and X are cointegrated, then following the Granger Representation Theorem either Y must cause X or X must cause Y. In this case, there are two sources of causation: (a) through the lagged values of one variable (say, X) on the other (say, Y); and (b) through the lagged value of the cointegrating vector (i.e., the error correction term). The standard Granger test neglects the latter source of causation.
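A sketch of the three-variable check mentioned in point 3 (the stationary variables y, x and z and the two lags are assumptions):
var y x z, lags(1/2)     // trivariate VAR so causality between y and x controls for z
vargranger               // Granger causality Wald tests for each equation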

11.3 The Augmented Granger Causality Test
The augmented Granger causality test for the case of two nonstationary but cointegrated variables, say, Yt ~ I(1) and Xt ~ I(1), follows the ECM model36 as:

ΔYt = a1 + ∑i=1..p βiΔXt-i + ∑i=1..p γiΔYt-i + λ1et-1 + u1t (108)
ΔXt = a2 + ∑i=1..p θiΔXt-i + ∑i=1..p δiΔYt-i + λ2et-1 + u2t (109)

where et-1 ~ I(0) is the lagged residual from the cointegrating equation between Yt and Xt:

Yt = α + βXt + et (110)

and it is assumed that u1t and u2t are uncorrelated white-noise error terms and, importantly, that et is stationary. We also assume that the lag lengths of both equations are the same (i.e., p = q – 1 for the first-differenced series), although they might be different. Similar to the standard version of Granger causality, we can have the following different cases:

Case 1   The lagged ΔX terms and the lagged error term in equation (108) are statistically different from zero as a group, and the lagged ΔY terms and the lagged error term in equation (109) are not statistically different from zero. In this case, we have that Xt causes Yt.

Case 2   The lagged ΔY terms and the lagged error term in equation (109) are statistically different from zero as a group, and the lagged ΔX terms and the lagged error term in equation (108) are not statistically different from zero. In this case, we have that Yt causes Xt.

Case 3   Both sets of lagged ΔX and lagged ΔY terms, or both lagged error terms, are statistically different from zero as a group in equations (108) and (109), so that we have bidirectional causality between Yt and Xt.

Case 4   Both the lagged ΔX terms and the lagged error term in equation (108) and the lagged ΔY terms and the lagged error term in equation (109) are not statistically different from zero, so that Xt is independent of Yt.

36
Note that we are considering the single equation case, so the ECM is used. However, if we have the multiple
equations with more than two nonstationary variables, the counterpart of ECM, i.e., VECM is used instead.

The augmented Granger causality test involves the following steps (suppose that Yt and
Xt are nonstationary):

Step 1 Test for cointegration between variables of interest [in the current
situation, EG approach for single equation is used]. Suppose that
cointegration exists between variables.

Step 2   Regress ΔYt on lagged ΔY terms as in the following model:

ΔYt = a1 + ∑i=1..p γiΔYt-i + u1t (111)

and obtain the RSS of this regression (which is the restricted one) and label
it as RSSR.

Step 3   Regress ΔYt on lagged ΔY terms plus lagged ΔX terms and the lagged error term et-1 as in the following model:

ΔYt = a1 + ∑i=1..p βiΔXt-i + ∑i=1..p γiΔYt-i + λ1et-1 + u1t (108)

and obtain the RSS of this regression (which is the unrestricted one) and
label it as RSSU.

Step 4   Set the null and alternative hypotheses as below:

H0: β1 = β2 = … = βp = λ1 = 0  =>  Xt does not cause Yt
H1: at least one βi ≠ 0 or λ1 ≠ 0  =>  Xt does cause Yt

Step 5   Calculate the F statistic for the normal Wald test on the coefficient restrictions, given by:

F = [(RSSR – RSSU)/(p + 1)] / [RSSU/(n – k)]

where n is the number of included observations, (p + 1) is the number of restrictions (the p lagged ΔX terms plus the error correction term), and k = 2p + 2 is the number of estimated coefficients in the unrestricted model.

Step 6   If the computed F value exceeds the critical F value, reject the null hypothesis and conclude that Xt causes Yt.

We then repeat the same test procedure for equation (109).

11.4 Illustrative Examples
11.4.1 Causality test of the consumption expenditure and income relationship
This example continues the relationship between consumption expenditure and income in Section 9.2.1. We already know that both variables, ln(consumption) and ln(income), are I(1) and not cointegrated. Therefore, we can apply the standard Granger causality test to investigate the causation between the two variables. The Stata commands are the same as in Section 9.2.1, but we add one more command, 'vargranger'37, after 'varbasic':
use "D:\My Blog\s4poe_statadata\consumption.dta", clear
gen date =q(1960q1)+_n-1
format %tq date
gen Y = log(inc)
gen C = log(cons)
tsset date
tsline C Y, legend(lab (1 "ln(Consumption") lab(2 "ln(Income"))
tsline D.C D.Y, legend(lab (1 "D.ln(Consumption") lab(2 "D.ln(Income"))
dfgls C, trend
dfuller C, trend lags(3)
dfgls Y, trend
dfuller Y, trend lags(1)
regress C Y time
predict ehat, resid
tsline ehat
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.C D.Y
varbasic D.C D.Y, lag(1/1) step(12) nograph
vargranger

The standard Granger causality test for the first differenced variables of consumption
expenditure and income is presented in Table 11.1.

37
vargranger can be used only after var or svar. Besides, we can use ‘test’ instead of vargranger (StataCorp,
2017b: vargranger).

Table 11.1: Causality test of consumption expenditure and income relationship.

The null hypotheses are as follows:


H01: Income does not cause consumption expenditure.
H02: Consumption expenditure does not cause income.
The p-values are 0.003 and 0.000, respectively. Our conclusion is that there is a bidirectional or feedback causality between consumption expenditure and income.

11.4.2 Causality test of the money supply and interest rate relationship
This example continues the relationship between money supply and interest rate in Section 9.2.2. We already know that both variables, money supply and interest rate, are I(1) and not cointegrated. Therefore, we can apply the standard Granger causality test to investigate the causation between the two variables. The Stata commands are the same as in Section 9.2.2, but we add one more command, 'vargranger', after 'varbasic':
use "D:\My Blog\Time series econometrics for beginners\Table17_5.dta", clear
gen date =q(1979q1)+_n-1
format %tq date
tsset date
dfgls m, trend
dfuller m, trend lags(1)
dfgls r, trend
dfuller r, trend lags(1)
regress m r
estat dwatson
predict ehat, resid

tsline ehat, xtitle(" ") ytitle("Residuals from regression of M1 on R")
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.m D.r
varbasic D.m D.r, lag(1/1) step(12) nograph
vargranger

Table 11.2: Causality test of money supply and interest rate relationship.

The null hypotheses are as follows:


H01: Interest rate does not cause money supply.
H02: Money supply does not cause interest rate.
The p-values are 0.000 and 0.001, respectively. Our conclusion is that there is a bidirectional or feedback causality between money supply and interest rate.

11.4.3 The relationship between wheat prices and oil prices


This example is based on a group assignment for an advanced econometrics course in 2012, School of Social Sciences, Wageningen University, the Netherlands. In this example, we use the dataset wheatoil.dta, which contains nominal prices of wheat (pwht), nominal oil prices (poil) and a time indicator (t). The data are monthly and available for the period January 1990 till December 2008 (19 years × 12 months = 228 observations). We will investigate whether there is a long-run relationship between wheat prices and oil prices. There may be all kinds of reasons for such a relationship: oil is an important input in fertilizer production, is used to run machinery, drives transportation costs, etc. The aim of this example is also to show the role of the KPSS test for stationarity.

Figure 11.1: Wheat prices and oil prices over time.

[Line plot of pwht (left axis) and poil (right axis) against the time indicator t]

The graphs of both pwht and poil indicate that there are stochastic trends (the means are not constant) and that their variances are also not constant. The pwht series first increases and fluctuates strongly (from observation 1 to about 70), followed by a declining period (from about observation 70 to about 120) with less fluctuation; it then tends to increase and finally declines very quickly in the last months. Therefore, we might say that these prices are not stationary.
In order to check the order of integration of pwht, we perform the Augmented Dickey-Fuller (ADF) test and the KPSS test on pwht until we find a stationary time series.

Unit root tests for the wheat prices


ADF test
H0: The pwht series is non-stationary (the pwht series has a unit root)
As this is a monthly series, we start with 12 lags. In addition, we include a trend in the test equation. Because the coefficients of the trend and of lag 12 are not statistically significant (for space reasons, we do not show the output here), we can remove them from the test equation.

Table 11.3: ADF test for wheat prices with 11 lags.

If we choose the 5% significance level, the coefficients of lag 8 to lag 11 are not
significant. Therefore, we try the test equation with 7 lags.

Table 11.4: ADF test for wheat prices with 7 lags.

As the absolute value of the test statistic (2.579) is smaller than the absolute value of the 5% critical value (2.882), we cannot reject the null hypothesis at the 5% significance level. Therefore, the ADF test suggests that the pwht series is not stationary. To be sure, we apply the KPSS test.

KPSS test

H0: the pwht series is (trend) stationary


All test statistics are greater than the 5% critical values, so we reject the null hypothesis.
That means the pwht series is non-stationary [see Table 11.5].
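The KPSS statistics used here come from the user-written kpss command (Baum), available from SSC. A minimal sketch of how the tests might be run (the notrend option for the differenced series is our assumption, matching a test equation without a trend):
ssc install kpss              // one-off installation of the user-written command
kpss pwht                     // null: pwht is (trend) stationary
gen dpwht = D.pwht            // first difference of pwht
kpss dpwht, notrend           // null: the first difference is level stationary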
We now examine stationarity of the first-differenced series of pwht without the constant
term in the test equation because there is no trend in the original series of pwht. Here is
the test result.

Table 11.5: KPSS test for wheat prices.

ADF test
H0: The first-difference of pwht is not stationary.
Table 11.6: ADF test for the first difference of wheat prices with 12 lags.

As the absolute value of the test statistics (4.796) is larger than the 5% critical value
(1.95), we reject the null hypothesis. That means the first-differenced series of pwht is
stationary. We now examine the KPSS test for this first-differenced series.

KPSS test
H0: The first-differenced series of pwht is stationary.
Table 11.7: KPSS test for the first difference of wheat prices.

The KPSS test results indicate that we fail to reject the null hypothesis.
In conclusion, the pwht series is integrated of order one [I(1)].

Unit root tests for the oil prices


ADF test
H0: The poil series is not stationary.

Similarly, we first introduce 12 lags because of the monthly data. However, lag 11 and lag 12 are not significant, so we remove them from the test equation. With 10 lags, the test results are presented in Table 11.8.

Table 11.8: ADF test for oil prices with 10 lags.

As the absolute value of the test statistics (3.37) is smaller than the 5% critical value
(3.43), we cannot reject the null hypothesis. This implies that the poil series is not
stationary. To be sure, we apply the KPSS test.

KPSS test

H0: The poil series is (trend) stationary.

All test statistics are greater than the critical values (even at the 1% significance level), so we reject the null hypothesis (output not shown here). That means the poil series is non-stationary.
Therefore, we now examine the stationarity of the first-differenced series of poil with a constant term in the test equation, because there is a trend in the original series of poil. The test results indicate that, as the absolute value of the test statistic (4.166) is larger than the 5% critical value (2.882), we reject the null hypothesis. The ADF test thus indicates that the first-differenced series of poil is stationary. We also apply the KPSS test to the first difference of the oil prices, and it confirms that the first-differenced series of poil is stationary. Therefore, the poil series is integrated of order one [I(1)]. Note that, for space reasons, these outputs are not shown here.

Cointegration analysis
As both series are integrated of order one, there could exist a long-run relationship
between pwht and poil. We must apply the cointegration tests to see whether there is
really a long-run (or cointegrating) relationship between them.

Table 11.9: Regressing wheat prices on oil prices.

The OLS estimation results seem to be spurious because of the following signals: the t-ratio is very high, while the Durbin-Watson test statistic is very small (0.103), and the graph of the residuals from this regression (Figure 11.2) shows that the residuals seem to be non-stationary. The R² is low (0.213), so this is not the typical picture of a spurious regression; the very low Durbin-Watson statistic can instead be a signal of strong positive autocorrelation. To be sure, we must apply the statistical tests.
We apply two different tests: (i) the residual-based test for no cointegration; and (ii) the CRDW38 test for no cointegration. Both tests check for cointegration between poil and pwht: poil and pwht are cointegrated if the residuals of the above estimated model are a stationary process.

Residual-based test for no cointegration (Engle-Granger approach)


We first try with 12 lags and linear trend, but the coefficients of trend and lag 12 are not
significant, so we remove them from the test equation. With 11 lags and no trend, the
test results indicate that the residuals seem to be not stationary at 5% significance level.
However, the constant in this test equation is not significant, so we try to remove it from
the test equation (see Table 11.10).
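A sketch of the two residual-based test equations described here (11 lags, with and then without the constant; the residuals are assumed to have been saved as ehat via predict ehat, resid, as in the earlier examples):
dfuller ehat, lags(11)                  // Engle-Granger step-two ADF test with constant
dfuller ehat, noconstant lags(11)       // same test without the constant term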

Figure 11.2: Residuals from regressing wheat prices on oil prices.


[Plot of the residuals from the regression of pwht on poil against the time indicator t]

38
See ‘Cointegration’ in Verbeek (2004: p.314-7).

Table 11.10: ADF test of residual from wheat prices and oil prices regression.

The test equation without the constant term shows that the residuals become stationary even at the 1% significance level. These conflicting results might be due to the low power of the ADF test. To avoid this problem, we now apply the KPSS test.

Table 11.11: KPSS test of residual from above regression.

From the KPSS test results, we reject the null hypothesis that the residuals are stationary. Therefore, there seems to be no cointegration between pwht and poil.

CRDW test for no cointegration

Table 11.12: CRDW test for no cointegration.

The Durbin-Watson test statistic is 0.103, which is smaller than the 5% critical value of the CRDW test for no cointegration (approximately 0.2 for about 200 observations and 2 variables; Table 9.3, Verbeek, 2012). Therefore, we fail to reject the null hypothesis that the residuals are non-stationary. In other words, pwht and poil are not cointegrated.
In conclusion, there is no long-run relationship between pwht and poil. Therefore, the OLS regression of pwht on poil is likely to be a spurious regression.

VAR model

Because pwht and poil are not cointegrated, we cannot apply a VECM. It is only possible to use a VAR model for the first-differenced series of pwht and poil.

Table 11.13: VAR model for the relationship between wheat prices and oil prices.

The VAR model results indicate that the p-values of the coefficients of LD.poil in the first equation (0.202) and of LD.pwht in the second equation (0.97) are very high. These suggest that neither poil affects pwht nor pwht affects poil. However, the coefficients of LD.pwht in the first equation and of LD.poil in the second equation are highly significant. These indicate that the first-differenced series follow AR processes.

Causality test

Table 11.14: Causality test of the relationship between wheat prices and oil prices.

The null hypotheses are as follows:


H01: Oil prices do not cause wheat prices.
H02: Wheat prices do not cause oil prices.
The p-values are 0.202 and 0.970, respectively. Our conclusion is that wheat prices and oil prices are independent.

11.4.4 The relationship between consumption expenditure and income


Table16-1.dta [Gujarati, 2011: Chapter 16] gives yearly data on personal consumption expenditure (PCE) and personal disposable (i.e., after-tax) income (PDI) for the USA for the period 1970-2008 (Gujarati, 2011: p.252). The unit root tests support the hypothesis that these two series are nonstationary. In addition, after regressing log(PCE) on log(PDI) and a linear trend, and log(PDI) on log(PCE) and a linear trend, we run unit root tests on the residuals obtained from these regressions and find that these residual series are stationary. Therefore, the two variables are cointegrated. Thanks to these cointegrating relationships, we can apply the augmented Granger causality test by estimating the ECM models. Unfortunately, the command 'vargranger' only works with estimates from var or svar. Hence, we must implement the Granger causality test manually. The Stata commands are as follows:
use "D:\My Blog\Time series econometrics for beginners\Table16_1.dta" , clear
tsset year
regress lnpce lnpdi time
predict S1, resid
regress lnpdi lnpce time

114
predict S2, resid
dfuller S1, lag(1)
dfuller S2, lag(1)
varsoc lnpce lnpdi
reg D.lnpce LD.lnpce LD.lnpdi L.S1
test LD.lnpdi L.S1
reg D.lnpdi LD.lnpce LD.lnpdi L.S2
test LD.lnpce L.S2

Table 11.15: Causation from PDI to PCE.

The null hypotheses are as follows:


H01: Income does not cause consumption expenditure.
Because the p-value is 0.0163, we reject the null hypothesis that income does not cause
consumption expenditure at 5% level of significance.

Table 11.16: Causation from PCE to PDI.

The null hypotheses are as follows:


H01: Consumption expenditure does not cause income.

Because the p-value is 0.0697, we do not reject the null hypothesis that consumption
expenditure does not cause income at 5% level of significance.

In a nutshell, we can conclude that there is a unidirectional causality from income to


consumption expenditure. This result seems to be reliable.

12. BOUNDS TEST FOR COINTEGRATION
12.1 Introduction
Another way to test for cointegration and causality is the bounds test for cointegration within the ARDL modelling approach. This approach was developed by Pesaran et al. (2001) and can be applied irrespective of the order of integration of the variables (irrespective of whether the regressors are purely I(0), purely I(1) or mutually cointegrated). It is closely linked with the ECM models and is called the conditional ECM39 or unrestricted ECM40. Note that in the case of the multiple-equation approach, we will have the conditional/unrestricted VECM.
The ARDL bounds test approach for cointegration has recently been used in many practical applications thanks to the contributions of Kripfganz41 & Schneider (2016) in terms of Stata commands. The ardl and ardlbounds commands in Stata help researchers implement their data analysis more quickly.

12.2 Test Procedure


The ARDL modelling approach for a single-equation bivariate relationship42 between Yt and Xt involves estimating the conditional error correction model (CECM) as:

ΔYt = c0 + c1t + aYt-1 + bXt-1 + ∑i=1..p θiΔYt-i + ∑i=1..p φiΔXt-i + ωΔXt + εt (112)

Equation (112) is Case V in Pesaran et al. (2001: p.296). If c1 = 0, it becomes Case III, and if both c0 = c1 = 0, we have Case I. In addition, in equation (112), k = 1 (i.e., one explanatory variable); similarly, for a three-variable relationship k = 2, for a four-variable relationship k = 3, and so on. We also need to know the number of observations used to estimate the above CECM in order to determine the lower and upper bound critical values for the ARDL bounds test using Stata. For example, suppose we have n = 100 and Case V; the Stata commands are as follows:

39 See Pesaran et al. (2001: p.290), Rahman & Kashem (2017: p.603), Rushdi et al. (2012: p.537).
40 See Zhang et al. (2015: p.274).
41 http://www.kripfganz.de/stata/
42 For multivariate relationships, see Rushdi et al. (2012: p.537).

Table 12.1: Critical F values for ARDL bounds test.

For the Wald F statistics, we do not need to specify the option 'stat(F)' because it is the default of the 'ardlbounds' command. However, if we want to determine the critical t values for the ARDL bounds test, we must add such an option.

Table 12.2: Critical t values for ARDL bounds test.

The lag lengths of ΔYt-i and ΔXt-i may be different, but we assume that they are the same. The selection of the optimal lag lengths is also based on information criteria, as discussed earlier. In Stata, we can apply the command 'varsoc'. However, in empirical studies, the 'trial and error' method is inevitable, especially in the case of small samples.
In order to test for the absence of a long-run level relationship (i.e., cointegrating
relationship) between Yt and Xt in the CECM [Eq. (112)], a sequential testing of the
two null hypotheses, defined as:

H10: a = 0 and b = 0; against H11: a ≠ 0 and b ≠ 0;
and
H20: a = 0; against H21: a ≠ 0,

is conducted. If H10 is not rejected, then there does not exist a long-run level relationship between Yt and Xt, and the testing procedure is terminated. If this null is rejected, we then test the null H20; if the latter is also rejected, then there exists a long-run level relationship between Yt and Xt. According to Rushdi et al. (2012: p.537), under the assumptions of all variables being I(0) and all being I(1), respectively, the lower and upper bounds of the critical values of the test statistics for these hypotheses are tabulated in Pesaran et al. (2001). The first null hypothesis (i.e., H10) is tested by using the Wald F statistic, while the second null hypothesis is tested by using the t statistic. Following the philosophy of Pesaran et al. (2001), under the null hypothesis H10, if the computed Wald F statistic exceeds the upper bound critical value at the prescribed level of significance (e.g., 1%, 5%, or 10%), then the null hypothesis is rejected. On the other hand, if the F statistic is below the lower bound critical value, then the null hypothesis is not rejected. However, if the statistic falls within these bounds, then the decision is inconclusive. Similar decision rules apply to the t statistic for testing the null H20.
If these hypothesis tests establish the existence of a level relationship among the variables, we can then proceed to estimate the long-run and short-run coefficients in Eq. (112). The long-run coefficients (if they exist) are directly calculated from the estimated coefficients of the CECM. To see how, you can refer either to the relationship between the ECM and the ARDL model discussed in Section 7, or to Rushdi et al. (2012: p.573). If the results of the ARDL bounds tests indicate that there exists a long-run relationship between (among) the variables, we can employ ECM or VECM models to investigate the short-run and long-run relationships and the speed of adjustment to the equilibrium state, using the traditional ECM or VECM methods. In addition, we can also conduct Granger causality tests by using the traditional VECM or the conditional VECM models.
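Before turning to the examples, here is a minimal sketch of how the CECM in Eq. (112) and the two bounds-test statistics can be obtained with standard Stata commands; the variable names y and x, the lag length p = 2, and the trend variable t are all assumptions (in practice, the ardl command discussed in Section 12.4 automates this, and the resulting F and t statistics must be compared with the bounds critical values, e.g., Tables 12.1 and 12.2, not with conventional ones):
* A minimal sketch of the conditional ECM (112), Case V, estimated by OLS.
gen t = _n                                      // linear trend (assumes the data are tsset)
regress D.y t L.y L.x L(1/2)D.y L(1/2)D.x D.x   // CECM with p = 2 lagged differences
test L.y L.x                                    // Wald F statistic for H10: a = b = 0
display _b[L.y]/_se[L.y]                        // t statistic for H20: a = 0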

12.3 Illustrative Examples
12.3.1 Stock returns and inflation
This example is cited from the study of Rushdi et al. (2012). Its aim is to investigate the long-run relationship between real stock returns and inflation in Australia over the period 1969q2 to 2008q1. The data are collected from the International Financial Statistics (IFS). The variables of interest include real stock returns (rsr), inflation (π), expected inflation (πe) [estimated by an ARMA(p,q) model], real economic activity (act), and monetary policy (mp). The unit root tests reveal a mixture of I(0) and I(1) variables. Therefore, the ARDL bounds testing method seems to be appropriate. In order to test for cointegration between real stock returns and inflation, the authors use both bivariate and multivariate models. The former are presented in Table 12.3 and the latter in Table 12.4. The long-run coefficients are calculated by using estimates from the CECM models and are presented in Table 12.5.
Both bounds tests, from the bivariate and the multivariate models, reject the null hypotheses and imply the existence of long-run relationships between real stock returns and inflation, and between real stock returns and expected inflation. We refer readers to the original paper for a detailed discussion. We expect such mixtures of I(0) and I(1) variables to be common in economic studies, because some series are in differenced form, such as asset returns, growth rates, and so on.
Table 12.3: ARDL models and bounds tests for bivariate relationships.

Source: Rushdi et al. (2012: p.540).

Table 12.4: ARDL models and bounds tests for multivariate relationships.

Source: Rushdi et al. (2012: p.541).

Table 12.5: Long-run coefficients in bivariate and multivariate models.

Source: Rushdi et al. (2012: p.541).

It is worth noting that all the above mentioned cointegration tests (i.e., EG, Johansen,
and ARDL bounds tests) assume that no structural change exists in the system. If structural
change is present, an alternative method such as Gregory and Hansen (1996)43 should be used.

12.3.2 Imported technology and CO2 emission


This example is based on the research article by Danish, Wang and Wang (2018). Its
aim is to investigate the determinants of CO2 emission through the relationship between
imported technology and environmental degradation over the period 1980-2011 in
China. In doing so, this paper employs both ARDL bounds test and Johansen approach
for testing cointegration, and the VECM model for Granger causality analysis. The
variables used in this study include: CO2 emission (CO2, per capita carbon in tons),
energy consumption (EN, kg of oil equivalent per capita), FDI (FDI, net inflow as
percentage of GDP), and trade openness (TO, sum of import and export as percentage
of GDP). These variables are collected from the World Development Indicators (WDI)
database. Imported technology is measured as royalty and licensing fees (IT, as
fees per one million dollars of GDP per capita); these data are collected from the World
Bank website. All the variables are in logarithmic form.
It is firstly noted that there are some ‘typing mistakes’ in the CECM equations (4) to
(7) of that paper: the presence of the first difference operator (Δ) in the cointegrating vectors
is incorrect; in fact, these operators should be removed. Another mistake occurs in Table 3 (Johansen
cointegration), where the ranks should be 0, 1, 2, 3, and 4. In addition, the presentation of
the VECM for causality analysis is incorrect. Both ADF and PP tests indicate that all
five variables are integrated of order 1 (Table 12.6). Therefore, from a time series
econometrics perspective, we can apply the Johansen approach for testing the existence of long-
run relationships among the variables of interest. In this study, the ARDL bounds testing
approach is the main test for cointegration, while the Johansen approach (Table 12.8) is
used to check the robustness of the ARDL bounds test results (Table 12.7).

43 See Narayan (2005).

Table 12.6: Unit root analysis.

Source: Danish et al. (2018: p.4208)

Table 12.7: Results of ARDL bounds test.

Source: Danish et al. (2018: p.4208)

Table 12.8: Results of Johansen cointegration test.

Source: Danish et al. (2018: p.4208)

Table 12.9: Long-run and short-run results.

Source: Danish et al. (2018: p.4208)

Both the ARDL bounds test and the Johansen test confirm that there are long-run relationships
among imported technology, energy consumption, FDI, trade openness, and CO2
emission. In particular, at least three cointegrating relationships exist among
the variables of interest. The paper also provides results on both the long-run and short-
run relationships between CO2 emission and the other variables in the model using the ARDL
cointegration technique (Table 12.9). However, there are two questionable issues. First,
a lag length of one for all variables seems unreasonable because the optimal
lag structure in the ARDL bounds test for this model is (1,0,0,1,0). Second, the interpretation
of the elasticities is problematic. The results of the causality analysis indicate bidirectional
causality between imported technology and carbon emission.

12.4 Estimating ARDL Model and Bounds Test in Stata


This guideline is based on Kripfganz and Schneider (2016). The syntax of ARDL
estimation is as follows:

ardl depvar [indepvars] [if] [in] [, options]

where selected options include:


lags(numlist): set lag lengths,
maxlags(numlist): set maximum lag lengths,
ec: display output in error-correction form,
ec1: like option ec, but level variables in t − 1 instead of t,
aic: use AIC as information criterion instead of BIC,
exog(varlist): exogenous variables in the regression,
noconstant: suppress constant term,
trendvar(varname): specify trend variable,
restricted: restrict constant or trend term.
If we want to implement the ARDL bounds test, the option should be either ec or ec1,
and right after estimating the ARDL model with this kind of option, we can type the
following command for ARDL bounds test:
estat btest
Suppose we have four endogenous variables Yt, Xt, Wt, Zt and one dummy variable D,
and we assume that Yt−6 has a statistically significant effect on Yt [i.e., the AR process],
while Xt (at lag 0), Wt−3, and Zt−5 have statistically significant effects on Yt [i.e., the DL
processes]. Then the ARDL model is estimated with the following command:

ardl Yt Xt Wt Zt, exog(D) maxlags(6) aic

Here, ‘aic’ is the Akaike information criterion, used for selecting the optimal lag lengths
of the model [with a maximum of 6 in this case]. Given the assumed data-generating process,
the estimated coefficient of Yt−6 will be statistically significant, as will the estimated
coefficients of Xt, Wt−3, and Zt−5. If we just use the option lags(6), we implicitly assume that
all endogenous variables in the model have the same lag length of 6.
If we want to estimate the CECM with the above information, the Stata command will
be as follows:

ardl Yt Xt Wt Zt, exog(D) ec1 lags(6 0 3 5)

The output of this command includes four components: ADJ (i.e., the speed-of-adjustment
coefficient for the cointegrating equation, in which Yt−1 enters as the lagged level of the
dependent variable), LR (the long-run relationship between Yt and the other endogenous
variables), SR (the short-run coefficients of the first differences and their lags), and the
exogenous variable (i.e., D in this example).
After estimating above equation, if we want to test for cointegration by using ARDL
bounds test, we type:
estat btest
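To tie the pieces together, below is a minimal end-to-end sketch. It assumes the community-contributed ardl package is installed and uses the lutkepohl2 dataset shipped with Stata; the variable names ln_inv, ln_inc, ln_consump and the chosen lag lengths are purely illustrative, not a recommendation for any particular application.

ssc install ardl, replace
webuse lutkepohl2, clear
* select the lag lengths by AIC, with at most 4 lags per variable
ardl ln_inv ln_inc ln_consump, maxlags(4) aic
* re-estimate in conditional error-correction (CECM) form with illustrative lags
ardl ln_inv ln_inc ln_consump, ec1 lags(2 1 1)
* ARDL bounds test for a level relationship
estat btest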

13. NONSTATIONARY PANELS


13.1 Introduction
I observe that a new strand of research, especially in the energy sector, has recently used
panel unit root and panel cointegration tests. In macroeconomics, this research field is
also growing as longer time series become available for many groups of economies.
Asteriou & Hall (2011: p.442) note that traditional panel data analysis has ignored
unit root and cointegration tests. With the growing number of macroeconomic applications,
where a large sample of countries provides data over lengthy time series, the issues of
stationarity and cointegration have also emerged in panel data analysis. The traditional
panel data models, such as fixed effects and random effects, are mainly appropriate for
micro panels with large N but small T. As the time dimension grows, various macro panels
may contain unit roots.

Westerlund et al. (2015) argue that running the traditional unit root and cointegration tests
country by country is wasteful. They list some typical reasons why it is worth using a joint
panel approach. First, in many studies, a group of countries is the main interest of
investigation. Second, the use of panel data instead of individual time series not
only increases the number of observations and the variation but also reduces the noise
coming from individual time series regressions. Third, the power of the tests is increased
in panels because individual time series are often not long enough. This is particularly
relevant when doing research in developing countries, where data may be unavailable, or
available only over a very short period. Fourth, unlike the unit-by-unit approach, the joint
panel approach accounts for the multiplicity of the testing problem. In addition, Narayan
& Smyth (2014) state that the traditional time series testing methods produce mixed
findings. Therefore, they expect a shift towards the nonstationary panel data approach,
which now sees a very large number of studies being published.
Using a search engine with keywords like ‘panel unit root*’ or ‘panel cointegration*’,
we can see that an increasing number of empirical studies using nonstationary panel
techniques has been published, notably in economics and energy journals. Below
is an example list:
Table 13.1: Journals with nonstationary panel publications.

Journal H Index Quartiles


Ecological Economics 151 Q1
Journal of Health Economics 97 Q1
Journal of Money, Credit and Banking 84 Q1
Journal of Economic Surveys 72 Q1
Journal of Comparative Economics 66 Q1
Economic Papers 56 Q2
Economic Modelling 45 Q2
Review of Development Economics 40 Q2
Journal of Economics and Business 40 Q2
Journal of Policy Modeling 38 Q2
Journal of Macroeconomics 34 Q2
Economic Record 36 Q3
Economic Systems 27 Q3

Bulletin of Economic Research 23 Q3
Economic Analysis and Policy 17 Q3
Journal of Financial Research 25 Q3
Renewable and Sustainable Energy Reviews 176 Q1
Energy Policy 146 Q1
Energy 134 Q1
Applied Energy 125 Q1
Energy Economics 101 Q1
Resources Policy 44 Q1
Journal of Cleaner Production 116 Q1

Source: http://www.scimagojr.com/index.php.

This indicates that the new strand of research is clearly promising. Therefore,
I think that economics students at UEH should be equipped with nonstationary panel
techniques along with traditional econometric models. If so, a bright prospect of
publication awaits young researchers, because expensive survey-based research is often
beyond the means of our economics students.
13.2 Panel Unit Root Tests
This section is mainly based on three key references: Banerjee (1999), Asteriou & Hall
(2011), and especially StataCorp (2015). To get started with panel unit root tests, it is
worth noting the following points (see Asteriou & Hall, 2011: p.443; StataCorp, 2015:
p.512). First, some of the tests require balanced panels (i.e., Ti = T for all i such as LLC,
HT, Breitung, and Hadri), whereas others allow for unbalanced panels (such as IPS,
MW). Second, one may form the null hypothesis as a generalization of the standard
ADF test (i.e., all series in the panel are assumed to be nonstationary) and reject the null
hypothesis if some of the series in the panel appear to be stationary, while on the other
hand one can formulate the null hypothesis in exactly the opposite way (i.e., all series
in the panel are stationary) and reject the null hypothesis if there is sufficient evidence
of nonstationarity [e.g., the Hadri LM test; see StataCorp, 2015: p.522-3]. Third, the tests
differ in their assumptions about the asymptotic behavior of a panel’s N and T dimensions
(i.e., the rates at which these dimensions approach infinity).
Similar to the unit root tests in time series data, the counterparts in panel data are based
on the following first-order autoregressive model:
Yit = ρi Yi,t−1 + Zitγi + εit (113)

where i = 1, …, N indexes panels; t = 1, …, Ti indexes time; Yit is the variable being
tested; and εit is a stationary error term. The Zit term can represent panel-specific means,
panel-specific means and a linear time trend, or nothing, depending on the options
specified in the Stata command syntax (i.e., xtunitroot). By default, Zit = 1, so that the
term Zitγi represents panel-specific means (i.e., fixed effects). If trend is specified,
Zit = (1, t), so that Zitγi represents panel-specific means and linear time trends. For
tests that allow it, specifying ‘noconstant’ omits the Zitγi term.
Panel unit root tests are used to test the null hypothesis H0: ρi = 1 for all i versus the
alternative hypothesis Ha: ρi < 1 [Note: this is very similar to the unit root tests in time series,
Eq.(41)]. Depending on the test, Ha may hold for one i, a fraction of all i, or all i; the
output of the respective test states the alternative hypothesis precisely [as shown in the
tables below]. Because Yit may be nonstationary, Eq.(113) could lead to spurious results.
Therefore, the test equation is often rewritten as:

ΔYit = φi Yi,t−1 + Zitγi + uit,  where φi = ρi − 1 (114)

Note that Zit may now include lagged terms of dependent variables for controlling serial
correlations. This is similar to the ADF equation [in Section 6.5].
For Eq.(114), the null hypothesis is then H0: i = 0 for all i versus the alternative
hypothesis Ha: i < 0. [Note that the Hadri LM test assumes the null hypothesis that all
panels are stationary (i.e., H0: i < 0) versus the alternative hypothesis that at least some
of the panels contain unit roots (i.e., Ha: i = 0 for some i). In general, most tests assume
the null hypothesis that the panels contain unit roots, i.e., H0: i = 0]. We now discuss
typical panel unit root tests that are available in Stata. In addition, we will give
illustrative examples using pennxrate.dta [i.e., in Stata command, we type webuse
pennxrate to open the data file from web]44. This dataset contains real exchange rate
data based on the Penn World Table. This is a balanced panel consisting of 151 countries
observed over 34 years, from 1970 to 2003. The variable of interest is lnxrate, the log
of the real exchange rate. The dataset contains the variable g7, which indicates a group of
six advanced economies; because the U.S. is treated as the domestic country, it is not
included (StataCorp, 2015: p.514).

44 Note that the webuse datasets are clearly specified in the respective examples of the Stata manuals. Depending on the version we use, the file names may differ. This dataset is currently used with Stata 14.

Levin, Lin and Chu (LLC) test
This test is an extension of the conventional ADF test to a sample of N cross-sections
observed over T time periods, and is given by:

ΔYi,t = φ Yi,t−1 + Zitβi + ∑pj=1 ϕij ΔYi,t−j + uit (115)

where j = 1, 2, …, p indexes the ADF lags. The term Zitβi may include unit-specific fixed effects
and unit-specific time effects in addition to common time effects. The unit-specific
effects are an important source of heterogeneity, since the coefficient of the lagged Yi
[i.e., φ] is restricted to be homogeneous across all units of the panel (Banerjee, 1999;
Asteriou & Hall, 2011: p.443). In other words, the LLC test assumes a homogeneous
panel; that is, it imposes an identical first-order autoregressive coefficient on each series
in the panel: ρ1 = ρ2 = … = ρN = ρ. In Stata, this is called the ‘common’ autoregressive
parameter. The terms ∑pj=1 ϕij ΔYi,t−j are included in order to control for possible
serial correlation. The number of lags p can be specified using the option
lags(aic #), i.e., we choose the lag lengths that minimize the Akaike information
criterion within the maximum # specified [e.g., with lags(aic 10), Stata calculates the AIC
for each of up to 10 lags and reports the lag length producing the smallest AIC]. It is
assumed that if we include sufficient lags, the error term uit will be white noise.
The null and the alternative hypotheses of this test are:
H0: φ = 0
H1: φ < 0 [that is, φ1 = φ2 = … = φN = φ < 0]
The LLC test also assumes that the individual processes are cross-sectionally
independent [i.e., the errors are assumed to be independent across the units of the sample
(Banerjee, 1999)]. Under this assumption, the test derives conditions for which the
pooled OLS estimator of φ will follow a standard normal distribution under the null
hypothesis. The LLC test may be viewed as a pooled ADF test, potentially with different
lag lengths across the different sections in the panels. The LLC can be used with panels
of “moderate” size, i.e., having between 10 and 250 panels and 25 to 250 observations
per panel (StataCorp, 2015: p.513). The Stata command for the LLC test is:

webuse pennxrate
xtunitroot llc lnrxrate if g7, lags(aic 10)

Table 13.2: LLC test of lnrxrate for G7 group.

Source: StataCorp, 2015: p.515.

The LLC bias-adjusted test statistic t* = −4.0277 is significantly less than zero (p =
0.0000), so we reject the null hypothesis of a unit root [that is, that φ = 0 in (115)] in
favor of the alternative that lnrxrate is stationary [that is, that φ < 0]. Note that the
‘unadjusted t’ is a conventional t statistic for testing H0: φ = 0.
Because the G7 economies have many similarities, the test results could be affected by
cross-sectional correlation in real exchange rates. One way to control for this problem is to
remove the cross-sectional averages from the data. The Stata command is as follows:
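Presumably this is the previous LLC command with the demean option added, which subtracts the cross-sectional averages from the data before testing (the exact command behind Table 13.3 is not shown here, so this is a reconstruction):

xtunitroot llc lnrxrate if g7, lags(aic 10) demean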

Table 13.3: LLC test of lnrxrate for G7 group with demean option.

Source: StataCorp, 2015: p.516.

Harris-Tzavalis (HT) test
In many datasets, particularly in microeconomics, the time dimension, T, is small, so
tests whose asymptotic properties are established by assuming that T tends to infinity
can lead to incorrect inference. Harris and Tzavalis (1999) derived a unit-root test that
assumes the time dimension, T, is fixed. Their simulation results suggest that the test has
favorable size and power properties for N greater than 25, and they report that the power
improves faster as T increases for a given N than as N increases for a given T (StataCorp,
2015: p.516).
The HT test statistic is based on the OLS estimator, ρ̂, in the regression model:

Yit = ρ Yi,t−1 + Zitγi + εit (116)

Harris and Tzavalis assume that εit is independent and identically distributed (i.i.d.)
normal with constant variance across panels. Because of the bias induced by the
inclusion of the panel means and time trends in this model, the expected value of the
OLS estimator is not equal to unity under the null hypothesis. Harris and Tzavalis
derived the mean and standard error of ρ̂ for (116) under the null hypothesis H0: ρ = 1
when neither panel-specific means nor time trends are included, when only panel-
specific means are included (the default), and when both panel-specific means and time
trends are included. The asymptotic distribution of the test statistic is justified as N → ∞,
so we should have a relatively large number of panels if we want to use this test.
Note that, like the LLC test, the HT test assumes that all panels share the same
autoregressive parameter [i.e., ρ instead of ρi].
Because the HT test is designed for cases where N is relatively large, here we test
whether the series lnrxrate contains a unit root using all countries in the dataset. We will
again remove cross-sectional means to help control for contemporaneous correlation.
The Stata command is:
webuse pennxrate
xtunitroot ht lnrxrate, demean

Table 13.4: HT test of lnrxrate for all countries.

Source: StataCorp, 2015: p.517.

The point estimate of ρ in Eq.(116) is 0.8184, its z statistic is −13.1239, and the p-value
is practically zero. Therefore, we strongly reject the null hypothesis of a unit root.
Note that we cannot compare the results of the two tests (i.e., LLC and HT) because LLC
uses just a subset of the data, while HT uses the whole dataset. The LLC test assumes that
N/T → 0, so N should be small relative to T. For the G7 group, it is more likely that we add
more years of data rather than more countries, because the number of such countries in the
world is virtually fixed; the assumption that T grows faster than N is therefore certainly
reasonable. The HT test, on the other hand, assumes that T is fixed whereas N goes to
infinity, an assumption that is less plausible for a small, fixed group such as the G7, which
is why the HT test is applied here to the full set of countries.
In short, it is important to remember that when selecting a panel unit-root test, you must
consider the relative size of N and T, and the relative speeds at which they tend to
infinity or whether either N or T is fixed.

Breitung test
Both the LLC and HT tests take the approach of first fitting a regression model and
subsequently adjusting the autoregressive parameter or its t statistic to compensate for
the bias induced by having a dynamic regressor and fixed effects in the model. The
Breitung (2000; Breitung and Das, 2005) test takes a different strategy: adjusting the
data before fitting a regression model so that bias adjustments are not needed.

Table 13.5: The Breitung test of lnrxrate for OECD countries.

Source: StataCorp, 2015: p.518.

In the LLC test, additional lags of the dependent variable could be included to control
for serial correlation. The Breitung procedure instead allows for prewhitening of the
series before computing the test. In particular, if the trend option is not specified, we
regress ΔYit and Yi,t−1 on ΔYi,t−1, ΔYi,t−2, …, ΔYi,t−p and use the residuals from those
regressions in place of ΔYit and Yi,t−1 when computing the test. If the trend option is
specified, the Breitung method uses a different prewhitening procedure that involves
fitting only one (instead of two) preliminary regressions (StataCorp, 2015: p.517).
Monte Carlo simulations by Breitung (2000) show that bias-corrected statistics such as
LLC’s t* suffer from low power, particularly against alternative hypotheses with
autoregressive parameters near one (i.e., ρ ≈ 1) and when panel-specific effects are
included. In contrast, the Breitung (2000) test statistic exhibits much higher power in
these cases. Moreover, the Breitung test has good power even with small datasets (N =
25, T = 25), though the power of the test appears to deteriorate when T is fixed and N
is increased. The Breitung test assumes that the error term uit is uncorrelated across both
the cross-sectional dimension i and the time dimension t.
The Breitung test results for the OECD countries are presented in Table 13.5. Because the
p-value is 0.0465, we can reject the null hypothesis of a unit root at the 5% level, but
not at the 1% level.
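The exact options behind Table 13.5 are not reported here, but a Breitung test for the OECD subsample can be requested along the following lines; the lags() option controls the prewhitening, and one lag is only an illustration:

xtunitroot breitung lnrxrate if oecd, lags(1) demean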

Im, Pesaran and Shin (IPS) test
All the tests we have discussed so far assume that all panels are homogeneous
[i.e., ρ1 = ρ2 = … = ρN = ρ]. Im et al. (1997, 2003) extended the LLC test, allowing
for heterogeneity in the value of ρ under the alternative hypothesis. The IPS test
provides separate estimations for each section i, allowing different specifications of the
parameter values, the residual variance and the lag lengths (Asteriou & Hall, 2011:
p.444). In addition, the IPS test does not require balanced datasets, though there cannot
be gaps within a panel (StataCorp, 2015: p.518). Their model is given by:

ΔYi,t = φi Yi,t−1 + Zitβi + ∑pj=1 ϕij ΔYi,t−j + uit (117)

while the null and alternative hypotheses are now formulated as:

H0: φi = 0 for all i
H1: φi < 0 for at least one i

Thus, the null for this test is that all series are nonstationary processes, against the
alternative that some or all of the individual series in the panel are stationary. This is in
sharp contrast with the LLC test, which assumes that all series are stationary under the
alternative hypothesis (Asteriou & Hall, 2011: p.444; Ouedraogo, 2013). In addition,
the model allows the errors uit to be serially correlated, with different serial
correlation (and variance) properties across units (Banerjee, 1999).
The authors object to the use of pooled panel estimators, such as those used by the LLC test, for
processes which display heterogeneity. Therefore, Im et al. (1997) propose the use of a
group-mean Lagrange multiplier (LM) statistic to test the null hypothesis (Banerjee,
1999). However, when N and T are fixed, IPS use simulations to calculate ‘exact’
critical values for the average of the ti statistics (i.e., t-bar), which requires a balanced
panel (Asteriou & Hall, 2011: p.444; StataCorp, 2015: p.519). Their t-bar statistic is
nothing other than the average of the individual ADF t statistics (denoted ti) for testing
that φi = 0 for all i:

t̄ = (1/N) ∑Ni=1 ti (118)

The Stata command for the IPS test is:


xtunitroot ips lnrxrate if oecd, demean

Table 13.6: The IPS test of lnrxrate for OECD countries.

Because the t-bar value (= −3.1327) is less than even its 1% critical value (= −1.810), we
strongly reject the null hypothesis of a unit root.
The statistic labeled t-tilde-bar is similar to the t-bar statistic, except that a different
estimator of the Dickey-Fuller regression error variance is used (StataCorp, 2015:
p.519). In addition, a standardized version of the t-tilde-bar statistic, labeled Z-t-tilde-bar,
has an asymptotic standard normal distribution. The p-value corresponding to Z-t-
tilde-bar is practically zero, which strongly rejects the null hypothesis of a unit root.
If we include lags of the dependent variable in the test equation, the output reports
the W-t-bar statistic instead. This statistic has an asymptotically standard normal
distribution as T → ∞, so in this case we should have a reasonably large number of both
time periods and panels (StataCorp, 2015: p.520).
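For instance, a lag-augmented IPS test that chooses the number of lags by the AIC (up to an assumed maximum of 5), and therefore reports the W-t-bar statistic, could be requested as:

xtunitroot ips lnrxrate if oecd, lags(aic 5) demean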

Fisher-type test
Maddala and Wu (1999) attempted to improve to some degree on the shortcomings of both
the LLC and IPS tests. They argue that while the Im et al. (1997) test relaxes the assumption
of homogeneity of the root across the units, several difficulties still remain (see
Banerjee, 1999). Basically, Maddala and Wu concur that a heterogeneous alternative is
preferable; however, they disagree with the use of the average ADF t statistic, arguing
that it is not the most effective way of evaluating stationarity (Asteriou & Hall, 2011:
p.445). They propose the use of a test due to Fisher (1932), which is based on combining
the p-values of the unit-root test statistic in each cross-sectional unit. The Fisher test is
non-parametric and may be computed for any arbitrary choice of unit-root test. It is an
exact test, and its statistic is given by (Banerjee, 1999):

λ = −2 ∑Ni=1 ln(πi) (119)

where πi is the probability value (p-value) from the regular ADF (or PP) unit-root test for each
cross-section i. Because −2ln(πi) has a χ2 distribution with 2 degrees of freedom, the λ
statistic follows a χ2 distribution with 2N degrees of freedom as Ti → ∞ for finite
N. To account for the dependence between cross-sections, Maddala and Wu propose
obtaining the πi values using bootstrap procedures, arguing that correlations between
groups can induce significant size distortions in the tests.
The Stata command is as follows:
xtunitroot fisher lnrxrate, dfuller drift lags(2) demean
or
xtunitroot fisher lnrxrate, pperron lags(2) demean
Table 13.7: The Fisher-type test of lnrxrate for all countries.

Source: StataCorp, 2015: p.522.

All four of the tests strongly reject the null hypothesis that all the panels contain unit
roots (StataCorp, 2015: p.522).

13.3 Panel Cointegration Tests
This section is mainly drawn from StataCorp (2017a). In this manual of panel
cointegration tests, a new Stata command, xtcointtest, is introduced to replace separate
community-contributed commands used with previous Stata versions, such as xtpedroni,
xtwest, and xtdolshm. Of course, you must install Stata 15 to run this command. The
xtcointtest command performs the Kao (1999), Pedroni (1999, 2004), and Westerlund (2005)
tests of cointegration on a panel dataset. We can include panel-specific means and panel-
specific time trends in the cointegrating regression model. All tests have a common null
hypothesis of no cointegration. The alternative hypothesis of the Kao and Pedroni tests is
that the variables are cointegrated in all panels. The Westerlund test has two different
versions of the alternative hypothesis: one assumes cointegration in all panels, the other
assumes cointegration in some of the panels.
All the cointegration tests in xtcointest are based on the following panel-data model for
the I(1) dependent variable Yit, where i = 1, 2, …, N denotes the panel and t = 1, 2, …,
T, denotes time:

Yit = Xitβi + Zitγi + eit (120)

For each panel i, each of the covariates in Xit is an I(1) series. All the tests require that
the covariates are not cointegrated among themselves. The Pedroni and Westerlund tests
allow a maximum of seven covariates in Xit. βi denotes the cointegrating vector, which
may vary across panels; γi is a vector of coefficients on Zit, including the deterministic
terms that control for panel-specific effects and linear time trends; and eit is the error
term. Depending on the options specified with xtcointtest, the vector Zit allows for
panel-specific means, panel-specific means and panel-specific time trends, or nothing.
By default, Zit = 1, so the term Zitγi represents panel-specific means (i.e., fixed effects).
If trend is specified, Zit = (1, t), so Zitγi represents panel-specific means and panel-
specific linear trends. The option ‘noconstant’ specifies that Zit contains nothing.
All tests share a common null hypothesis that Yit and Xit are not cointegrated. xtcointtest
tests for no cointegration by testing that eit [from Eq.(120)] is nonstationary. Rejection
of the null hypothesis implies that eit is stationary and that the series Yit and Xit are
cointegrated. The alternative hypothesis of the Kao tests, the Pedroni tests, and the
allpanels version of the Westerlund tests is that the variables are cointegrated in all
panels. Whereas the alternative hypothesis of the somepanels version of the Westerlund
tests is that the variables are cointegrated in some of the panels.
All the tests allow unbalanced panels and require that N is large enough that the
distribution of a sample average of panel-level statistics converges to its population

distribution. They also require that each Ti is large enough to run time-series regressions
using observations only from that panel.
The Kao, Pedroni, and Westerlund tests implement different types of tests for whether
eit is nonstationary. The DF tests, ADF tests, PP tests, and their variants that are reported
by xtcointtest kao and xtcointtest pedroni use different regression frameworks to handle
serial correlation in eit. The VR (variance ratio) tests that are reported by xtcointtest
westerlund and xtcointtest pedroni do not require modeling or accommodating for serial
correlation.
All variants of the DF t test statistics are constructed by fitting the model in (120) using
ordinary least squares, obtaining the predicted residuals (êit ), and then fitting the DF
regression models:

êit = ρ êi,t−1 + νit (121)

Δêit = (ρ − 1) êi,t−1 + νit (121′)

where ρ is the AR parameter and νit is a stationary error term. The DF and the
unadjusted DF tests examine whether the coefficient ρ is 1. By contrast, the modified DF and the
unadjusted modified DF test whether (ρ − 1) = 0. Nonstationarity under the null
hypothesis causes a test of whether ρ = 1 to differ from a test of whether (ρ − 1) = 0.
Note that these test equations assume the same AR coefficient across panels.
The variants of these test statistics are based on the following DF regression model:

êit = ρi êi,t−1 + νit (122)

Δêit = (ρi − 1) êi,t−1 + νit (122′)

In this case, we have a panel-specific AR parameter ρi. The PP t test statistic and its
variants are nonparametrically adjusted for serial correlation in the residuals using the
Newey and West (1987) heteroskedasticity- and autocorrelation-consistent (HAC)
covariance matrix estimator.
The DF, the modified DF, the PP, the modified PP, and the modified VR tests are
derived by specifying a data-generating process for the dependent variable and the
regressors. This specification allows the regressors to be endogenous as well as serially
correlated. Therefore, constructing the test statistics requires estimating the
contemporaneous and dynamic covariances between the regressors and the dependent
variable. The unadjusted DF and the unadjusted modified DF assume absence of serial

correlation and strictly exogenous covariates and do not require any adjustments in the
residuals.
Like the DF and PP tests, the ADF test examines whether ρ = 1. However, the ADF test uses
additional lags of the residuals to control for serial correlation instead of the Newey–West
nonparametric adjustments. The ADF regression is

êit = ρ êi,t−1 + ∑pj=1 αij Δêi,t−j + wit (123)

Δêit = (ρ − 1) êi,t−1 + ∑pj=1 αij Δêi,t−j + wit (123′)

or

êit = ρi êi,t−1 + ∑pj=1 αij Δêi,t−j + wit (124)

Δêit = (ρi − 1) êi,t−1 + ∑pj=1 αij Δêi,t−j + wit (124′)

where Δêi,t−j is the jth lag of the first difference of êit, and p is the number of lagged
differences of the dependent variable in each respective test equation.
The VR tests are based on Phillips and Ouliaris (1990) and Breitung (2002), where the
test statistic is constructed as a ratio of variances. These tests do not require modeling
or accommodating serial correlation. VR tests also test for no cointegration by testing
for the presence of a unit root in the residuals. However, they do so using the ratio of
variances of the predicted residuals. The modified VR test removes estimated
conditional variances prior to computing the VR.
Let us now work through some examples using the command xtcointtest. The dataset used in these
examples is xtcoint.dta, which can be downloaded from the Stata Press website by typing
webuse xtcoint [remember that this requires Stata 15]. This balanced panel
dataset on 100 countries observed from 1973q3 to 2010q4 contains quarterly data on
the log of productivity (productivity), the log of the domestic R&D capital stock (rddomestic),
and the log of foreign R&D (rdforeign). In these examples, we are interested in the long-
run effects of domestic research and development (R&D) and foreign R&D on an
economy’s productivity.

Kao tests
The cointegrating relationship is specified as:

productivityit = γi + β1 rddomesticit + β2 rdforeignit + eit (125)

Here γi denotes the panel-specific means, and the cointegrating parameters β1 and β2 are the
same across panels. We assume each series is I(1); this can be checked using the panel
unit root tests discussed above (xtunitroot). It is noted that the Kao tests assume the same
AR coefficient across panels [i.e., they use Eqs.(121, 121′, 123, 123′)].
The test results are as below:
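The syntax follows the pattern xtcointtest kao depvar indepvars; a command along these lines (with the default options) produces the kind of output shown in the table below:

xtcointtest kao productivity rddomestic rdforeign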

Table 13.8: Kao test for cointegration.

Source: StataCorp (2017a).


All test statistics reject the null hypothesis of no cointegration in favor of the alternative
hypothesis of the existence of a cointegrating relationship among productivity, domestic
R&D, and foreign R&D.

Pedroni tests
The cointegrating relationship is specified as:

productivityit = γi + β1i rddomesticit + β2i rdforeignit + eit (126)

This test allows for panel-specific cointegrating vectors. This heterogeneity
distinguishes the Pedroni tests from those derived by Kao. Another difference is that the
Pedroni tests allow the AR coefficient to vary across panels [i.e., they use Eqs.(122, 122′,
124, 124′)]. These panel-specific AR coefficients are the default in xtcointtest pedroni,
but the ar(same) option restricts the AR coefficients to be the same across panels (ρi = ρ).
The test results are as below:
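The corresponding commands take the form below; the first uses the default panel-specific AR parameters, and the second imposes a common AR parameter via the ar(same) option:

xtcointtest pedroni productivity rddomestic rdforeign
xtcointtest pedroni productivity rddomestic rdforeign, ar(same)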

Table 13.9: Pedroni test for cointegration with panel-specific AR parameter.

Source: StataCorp (2017a).

Table 13.10: Pedroni test for cointegration with a common AR parameter.

Source: StataCorp (2017a).

All test statistics reject the null hypothesis of no cointegration in favor of the alternative
hypothesis of the existence of a cointegrating relationship among productivity, domestic
R&D, and foreign R&D.

Westerlund tests
With the allpanels option, the Westerlund tests use the model in which the AR parameter is
the same across panels and test the alternative that all panels are cointegrated, while the
default option allows panel-specific AR parameters and tests the alternative that some of
the panels are cointegrated.
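The two variants can be requested as follows; the first (default) command tests the alternative that some panels are cointegrated, while the second, with the allpanels option, tests the alternative that all panels are cointegrated:

xtcointtest westerlund productivity rddomestic rdforeign
xtcointtest westerlund productivity rddomestic rdforeign, allpanels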

Table 13.11: Westerlund test for cointegration with some panels cointegrated.

Source: StataCorp (2017a).

Table 13.12: Westerlund test for cointegration with all panels cointegrated.

Source: StataCorp (2017a).

The VR statistics reject the null hypothesis of no cointegration. This implies that at least
some of the panels, and possibly all of them, are cointegrated.

14. SUGGESTED RESEARCH TOPICS
From previous studies, I would like to suggest the following topics that economics
students at UEH can consider for their research proposals.
Saving, Investment and Economic Development
▪ An analysis of the interaction among savings, investments and growth in Vietnam
▪ Are saving and investment cointegrated? The case of Vietnam
▪ Causal relationship between domestic savings and economic growth: Evidence from
Vietnam
▪ Does saving really matter for growth? Evidence from Vietnam
▪ The relationship between savings and growth: Cointegration and causality evidence
from Vietnam
▪ The saving and investment nexus for Vietnam: Evidence from cointegration tests
▪ Do foreign direct investment and gross domestic investment promote economic
growth?
▪ Foreign direct investment and economic growth in Vietnam: An empirical study of
causality and error correction mechanisms
▪ The interactions among foreign direct investment, economic growth, degree of
openness and unemployment in Vietnam
Trade and Economic Development
▪ How trade and foreign investment affect the growth: A case of Vietnam?
▪ Trade, foreign direct investment and economic growth in Vietnam
▪ A cointegration analysis of the long-run relationship between black and official
foreign exchange rates: The case of the Vietnam dong
▪ An empirical investigation of the causal relationship between openness and
economic growth in Vietnam
▪ Export and economic growth in Vietnam: A Granger causality analysis
▪ Export expansion and economic growth: Testing for cointegration and causality for
Vietnam
▪ Is the export-led growth hypothesis valid for Vietnam?
▪ Is there a long-run relationship between exports and imports in Vietnam?
▪ On economic growth, FDI and exports in Vietnam
▪ Trade liberalization and industrial growth in Vietnam: A cointegration analysis

Stock Market and Economic Development
▪ Causality between financial development and economic growth: An application of
vector error correction to Vietnam
▪ Financial development and the FDI growth nexus: The Vietnam case
▪ Macroeconomic environment and stock market: The Vietnam case
▪ The relationship between economic factors and equity market in Vietnam
▪ Modelling the linkages between the US and Vietnam stock markets
▪ The long-run relationship between stock returns and inflation in Vietnam
▪ The relationship between financial deepening and economic growth in Vietnam
▪ Testing market efficient hypothesis: The Vietnam stock market
▪ Threshold adjustment in the long-run relationship between stock prices and
economic activity
Energy and the Economy
▪ The dynamic relationship between the GDP, imports and domestic production of
crude oil: Evidence from Vietnam
▪ Causal relationship between gas consumption and economic growth: A case of
Vietnam
▪ Causal relationship between energy consumption and economic growth: The case of
Vietnam
▪ Causality relationship between electricity consumption and GDP in Vietnam
▪ The causal relationship between electricity consumption and economic growth in
Vietnam
▪ A cointegration analysis of gasoline demand in Vietnam
▪ Cointegration and causality testing of the energy-GDP relationship: A case of
Vietnam
▪ Does more energy consumption bolster economic growth?
▪ Energy consumption and economic growth in Vietnam: Evidence from a
cointegration and error correction model
▪ The causality between energy consumption and economic growth in Vietnam
▪ The relationship between the price of oil and macroeconomic performance:
Empirical evidence for Vietnam

Fiscal Policy and Economic Development
▪ A causal relationship between government spending and economic development:
An empirical examination of the Vietnam economy
▪ Economic growth and government expenditure: Evidence from Vietnam
▪ Government revenue, government expenditure, and temporal causality: Evidence
from Vietnam
▪ The relationship between budget deficits and money demand: Evidence from
Vietnam
Monetary Policy and Economic Development
▪ Granger causality between money and income for the Vietnam economy
▪ Money, inflation and causality: Evidence from Vietnam
▪ Money-output Granger causality: An empirical analysis for Vietnam
▪ Time-varying parameter error correction models: The demand for money in
Vietnam
▪ Monetary transmission mechanism in Vietnam: A VAR analysis
Tourism and Economic Development
▪ Cointegration analysis of quarterly tourism demand by international tourists:
Evidence from Vietnam
▪ Does tourism influence economic growth? A dynamic panel data approach
▪ International tourism and economic development in Vietnam: A Granger causality
test
▪ Tourism demand modelling: Some issues regarding unit roots, co-integration and
diagnostic tests
▪ Tourism, trade and growth: the case of Vietnam
Agriculture and Economic Development
▪ Dynamics of rice prices and agricultural wages in Vietnam
▪ Macroeconomic factors and agricultural production linkages: A case of Vietnam
▪ Is agriculture the engine of growth?
▪ The causal relationship between fertilizer consumption and agricultural productivity
in Vietnam
▪ Macroeconomics and agriculture in Vietnam

Others
▪ Hypotheses testing concerning relationships between spot prices of various types of
coffee
▪ The relationship between wages and prices in Vietnam
▪ An error correction model of luxury goods expenditures: Evidence from Vietnam
▪ The relationship between macroeconomic variables and housing price index: A case
of Vietnam
▪ Explaining house prices in Vietnam
▪ Long-term trend and short-run dynamics of the Vietnam gold price: an error
correction modelling approach
▪ Macroeconomic adjustment and private manufacturing investment in Vietnam: A
time-series analysis
▪ Testing for the long run relationship between nominal interest rates and inflation
using cointegration techniques
▪ The long-run relationship between house prices and income: Evidence from
Vietnam housing markets
It is noted that empirical studies have increasingly used nonstationary panels,
typically characterized by panel unit root tests and panel cointegration tests. The above
topics can also be addressed using this strand of models.

15. CONCLUDING REMARKS


We have discussed several topics in time series econometrics in this series of lectures.
Now I want to summarize the key points. We started with an overview of time series
econometrics for the beginners in Applied Economics by distinguishing the
fundamental applications in forecasting versus dynamic modeling and realization
versus stochastic process. The concept ‘stochastic process’ is then clarified into
stationarity and nonstationarity. Every model in applied time series econometrics is
based on stationary variables, but we often encounter nonstationary ones in reality,
especially in macroeconomic data. Therefore, in most practical applications we have to
do some transformations and first differencing is very popular. Two basic stationary
processes of a certain series are the moving average (MA) and the autoregressive (AR), because
they satisfy all the properties of weak stationarity (constant mean, constant variance,
and time-invariant covariance). These processes are fundamental components of the so-
called ARIMA forecasting models, although we do not mention such models in this
series of lectures. The typical nonstationary series is indeed the random walk. If this is
the case, standard OLS regressions may lead to spurious results. In this language, a
series characterized by a unit root is known as I(1), i.e., it becomes stationary after
taking the first difference.
In order to know whether a certain series is stationary or not, we can initially use visual
graphics such as time line plot or correlogram. However, the formal statistical tests are
always preferred. We introduced various tests for a unit root such as Dickey-Fuller,
Phillips-Perron, DF-GLS, and KPSS. We started discussing dynamic modeling by
firstly clarifying the short-run and long-run relationships between I(1) variables within
a single equation context. The key to have long-run relationship is the cointegration
between or among variables. If two variables are cointegrated, we are able to investigate
both short-run and long-run effects through error correction mechanism (ECM) models.
In contrast, we just investigate the short-run relationship by regressing a model of the
first differences. For a single equation, the most popular method for testing
cointegration is the Engle-Granger residual-based unit root test (EG approach). This
testing procedure is simply an application of standard unit root tests to the residuals
obtained from regression between or among variables of interest. If the residual is a
stationary series, we conclude that the variables used in such a regression are
cointegrated. The cointegrating equation represents the long-run or equilibrium
relationship between or among variables. Otherwise, we encounter the problem of
spurious regression. Thanks to cointegration, we can estimate the ECM model, in which
the speed of adjustment to equilibrium has practical implications for policy formulation.
If nonstationary variables are cointegrated, the conventional OLS regression models of
the first differences are seriously mis-specified. Although EG approach has many useful
contributions, it also remains various drawbacks, especially in cases of multivariate
analysis and multiple equation context.
The framework for analyzing multivariate relationships and multiple equation systems
is vector autoregressive (VAR) model. VAR modeling provides an useful framework
for forecasting purposes, causality analysis and especially estimation of vector error
correction mechanism (VECM) models. Similar to the single equation case,
cointegration is also a topic of interest in multiple equation approach. In such a situation,
Johansen test for cointegration has dominated the existing literature over the last two
decades. However, Johansen test requires that all the variables under study are
integrated of the same order [i.e., I(1)]. This is not always the case in practice. If we are
skeptical, i.e., the variables may be I(0), I(1), or mutually cointegrated, we can proceed
with the ARDL bounds testing approach. It is noted that the ARDL bounds test is used
for a single equation, in either bivariate or multivariate models.
We end our discussion with nonstationary panels, an extension of time series methods
to panel data whose time dimension is characterized by nonstationarity. This
offers opportunities for pursuing a new strand of research in macroeconomics and
especially in energy economics. For Applied Economics students, this new package of
techniques is likely more complicated than traditional time series models because it has
not been officially introduced into the curriculum. However, everything has its
price: it is harder, but it gives you the chance to do promising research. I have introduced
a brief summary of techniques for nonstationary panels, such as panel unit root tests and
panel cointegration tests. Other issues such as causality analysis, dynamic OLS (i.e.,
DOLS), and fully modified OLS (FMOLS) are beyond the scope of this series of lectures.45
Therefore, if you are really interested, previous empirical studies and advanced
econometrics textbooks are indeed good references.
There are still things that I’d like to share with you on topics of time series econometrics,
but time is over. I hope the notes provide you the very basic knowledge, and it is now
time for you to grab the key points learned so far and prepare a research project of your
interest with real data. My final words for you are as follows:
➢ Self-study
Stata Press provides very good documentation for self-studying
(http://www.stata-press.com). Here you can find out four sources of updated
materials: Books, eBooks, Stata documentation, and the Stata Journal. I’m most
interested in the Stata documentation, because it provides (for free) every syntax
and interpretation in detail, including various examples that are extremely useful
for life-long study. Two manuals that closely relate to our current discussion
are Time-Series and Longitudinal-Data/Panel-Data. To be effective, you should
download the datasets (in Supplemental materials), redo the examples step by
step, look at the results, and carefully read the interpretations in these manuals.
In addition, you should prepare do-files for every exercise you do, because they
are a good way to review the lessons you have learned in case you forget.
Furthermore, you can learn plenty from the experiences of others via Statalist
and the Stata Blog (in Support). Here you can see every problem that one may face
when working with Stata.
➢ Literature review
Students often ask me and my colleagues a question like ‘where do the research
topics come from?’ They are from everywhere around us such as real life
observations, talking with others, and so on. But I think the most important
source is from reading the field of knowledge that you are mostly interested in.

45 A good example of nonstationary panel analysis is the study by Ouedraogo (2013) on the relationship between energy consumption and human development in 15 developing countries for the period 1988 to 2008. In this study, the author used all the panel unit root tests (LLC, Breitung, IPS, Fisher-type, and Hadri), the Pedroni tests for panel cointegration, FMOLS and DOLS for long-run elasticities using a panel error correction model, and panel causality analysis.

Reading previous studies makes you think like a researcher. Every research article
feeds you new ideas for further studies. Reading will show you the gaps that need
to be filled. For time series topics, you can find plenty of studies in
macroeconomics, financial economics, development economics, environmental
economics, energy economics, health economics, etc. It depends on the field
you pay the most attention to. For students in developing countries, official access to
academic journals is not an easy task. But you can proceed in two ways: Google
Scholar and your supervisor. For cross-sectional research, it is hard for a student
to conduct expensive surveys. But for time series data, you can access
available databases much more easily, because the UEH Data Center has updated
data on various topics.
➢ Hard working
Doing research is never easy. Of course, it is actually a narrow door. It requires
a very strong passion, an ascetic spirit, and an open mind. You will face a lot of
difficulties from the beginning to the end. Finding a novel research idea is not
easy. Developing a feasible research proposal is not easy. Looking for funding
is not easy. Writing a complete manuscript is not easy. And publishing it is really
hard. Besides, successive failures are waiting for you at every step. But being an
economics student you should think like an economist. Trade-offs. Being an
economics student you should think of a research career, at least an analyst at a
local fund management company, not just a sales assistant at an MNC in an
empty suit with a couple of luxury smart phones. What I mean is you must work
harder than you think. Societies in developing countries like ours still prefer
money earners over knowledge creators. However, I believe things are changing.
Economics graduates will be publicly recognized if you and your next
generations are working more seriously./.

REFERENCES
Acock, A. C. (2014). A gentle introduction to Stata, 4th Edition. Stata Press.
Adkins, L. C., and Hill, R. C. (2011). Using Stata for principles of econometrics, 4th
Edition. John Wiley & Sons.
Asteriou, D., and Hall, S.G. (2011). Applied econometrics, 2nd Edition. Palgrave
Macmillan.
Banerjee, A. (1999). Panel data unit roots and cointegration: An overview. Oxford
Bulletin of Economics and Statistics, Special Issue, 607-629.
Binh, P. T. (2011). Energy consumption and economic growth in Vietnam: Threshold
cointegration and causality analysis. International Journal of Energy Economics
and Policy, 1, 1-17.
Danish, Wang, B., and Wang, Z. (2018). Imported technology and CO2 emission in
China: Collecting evidence through bound testing and VECM approach. Renewable
and Sustainable Energy Reviews, 82, 4204-14.
Dickey, D.A. and Fuller, W.A. (1979). Distribution of the estimators for autoregressive
time series with a unit root. Journal of the American Statistical Association, 74,
427- 431.
Dickey, D.A. and Fuller, W.A. (1981). Likelihood ratio statistics for autoregressive
time series with a unit root. Econometrica, 49, 1063.
Engle, R.F., and Granger, C.W.J. (1987). Co-integration and error correction estimates:
representation, estimation, and testing. Econometrica, 55, 251–276.
Granger, C.W.J. (1981). Some properties of time series data and their use in
econometric model specification. Journal of Econometrics, 16, 121-130.
Granger, C.W.J. and Newbold, P. (1974). Spurious regressions in econometrics.
Journal of Econometrics, 2, 111-120.
Greene, W. H. (2008). Econometric analysis, 6th Edition. Pearson.
Gregory, A. W., and Hansen, B. E. (1996). Residual-based tests for cointegration in
models with regime shifts. Journal of Econometrics, 70, 461-70.
Gujarati, D. (2011). Econometrics by Example, 1st Edition, Palgrave Macmillan.
Gujarati, D., and Porter, D. (2009). Basic Econometrics, 5th Edition, McGraw-Hill.
Hamilton, L. C. (2013). Statistics with Stata: Updated for version 12. CENGAGE
Learning.
Hanke, J.E., and Wichern, D.W. (2005). Business Forecasting, 8th Edition. Pearson
Education.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in
Gaussian vector autoregressive models. Econometrica, 59, 1551-1580.

Johansen, S. and Juselius, K. (1990). Maximum likelihood estimation and inference on
cointegration, with applications for the demand for money. Oxford Bulletin of
Economics and Statistics, 52, 169-210.
Kripfganz, S., and Schneider, D. C. (2016). Ardl: Stata module to estimate
autoregressive distributed lag models. Stata Conference, Chicago.
Ljung, G.M. and Box, G.E.P. (1978). On a measure of lack of fit in times series models.
Biometrica, 65, 297-303.
Lumsdaine, R., and Papell, D. (1997). Multiple trend breaks and the unit root
hypothesis. Review of Economics and Statistics, 79, 212-18.
MacKinnon, J.G. (1991). Critical values for cointegration tests, in R.F. Engle and
C.W.J. Granger (eds), Long-run economic relationships: Readings in cointegtion.
Oxford: Oxford University Press.
Mackinnon, J.G. (1996). Numerical distribution functions for unit root and
cointegration tests. Journal of Applied Econometrics 11, 601-618.
Narayan, P. K. (2005). The saving and investment nexus for China: evidence from
cointegration tests. Applied Economics, 37, 1979-1990.
Narayan, P. K., and R. Smyth (2014). Applied econometrics and a decade of energy
economics research. Unpublished manuscript.
Nguyen Trong Hoai, Phung Thanh Binh, and Nguyen Khanh Duy. (2009). Forecasting
and data analysis in economics and finance, Statistical Publishing House.
Omri, A. (2014). An international literature survey on energy economic growth nexus:
Evidence from country-specific studies. MPRA Paper, No. 82452.
Ouedraogo, N. (2013). Energy consumption and human development: Evidence from
a panel cointegration and error correction model. Energy, 63, 28-41.
Ozturk, I. (2010). A literature survey on energy–growth nexus. Energy Policy 38, 340–
349.
Pesaran, H.M., Shin, Y., Smith, R.J. (2001). Bounds testing approaches to the analysis
of level relationships. Journal of Applied Econometrics 16, 289–326.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal
of Econometrics, 33, 311-340.
Phillips, P.C.B. (1987). Time series regression with a unit root. Econometrica, 55,
277-301.
Phillips, P.C.B. (1998). New tools for understanding spurious regressions.
Econometrica, 66, 1299-1325.
Phillips, P.C.B. and Perron, P. (1988). Testing for a unit root in time series regression.
Biometrica, 75, 335-346.
Rahman, M. M., and Kashem, M. A. (2017). Carbon emissions, energy consumption
and industrial growth in Bangladesh: Empirical evidence from ARDL cointegration
and Granger causality analysis. Energy Policy, 110, 600-8.

Rushdi, M., Kim, J. H., and Silvapulle, P. (2012). ARDL bounds tests and robust
inference for the long run relationship between real stock returns and inflation
in Australia. Economic Modelling, 29, 535-543.
Sims, C.A. (1980). Macroeconomics and reality. Econometrica, 48, 1-48.
StataCorp. (2015). Longitudinal data/panel data reference manual release 14:
xtunitroot. College Station, TX, Stata-Press.
StataCorp. (2017a). Longitudinal data/panel data reference manual release 15:
xtcointtest. College Station, TX, Stata-Press.
StataCorp. (2017b). Time series reference manual release 15. College Station, TX,
Stata-Press.
Stock, J.H., and Watson, M.W. (2015) Introduction to econometrics, 3rd Edition,
Pearson Education.
Studenmund, A.H. (2017). Using econometrics: A practical guide, 7th Edition, Pearson.
Toda, H.Y. and Yamamoto, T. (1995). Statistical inference in vector autoregressive
with possibly integrated processes. Journal of Econometrics, 66, 225-250.
Verbeek, M. (2004). A Guide to modern econometrics. 2nd Edition, John Wiley & Sons.
Westerlund, J., Thuraisamy, K., and Sharma, S. (2015). On the use of panel
cointegration tests in energy economics. Energy Economics, 50, 359-63.
Wooldridge, J. M. (2013). Introductory econometrics: A modern approach, 5th Edition,
South-Western CENGAGE Learning.
Zhang, H., Zhao, Q., Kuuluvainen, I., Wang, C., and Li, S. (2015). Determinants
of China’s lumber import: A bounds test for cointegration with monthly data.
Journal of Forest Economics, 21, 269-82.
Zivot, E., and Andrews, D. W. K. (1992). Further evidence on the great crash, the oil
price shock and the unit root hypothesis. Journal of Business and Economic
Statistics, 10, 251-70.
