Notes On Time Series Econometrics For Beginners Using Stata
Special thanks to Dr. Costas Leon for his comments and motivation
1. INTRODUCTION
I wrote these notes on time series econometrics for my students in Applied Economics at
the University of Economics, HCMC (UEH)1. Since most economics students in
developing countries are likely to have problems with English as a second language,
with their mathematics background, and especially with access to up-to-date resources
for self-study, this series of lectures will hopefully make some contribution. The aim is
to help you understand key concepts of time series econometrics through hands-on
examples in Stata. By the end, you should be able to read research articles based on time
series methods. Moreover, I also hope that you will become interested in time series data
analysis and write your dissertation in this field. At the time of writing, I believe that the
Vietnamese time series data2 are long enough for you to conduct such a study. This is
just a brief summary of the body of knowledge in time series econometrics according to
my own understanding. Obviously, it has no scientific value for citation. In addition,
research using bivariate models is not strongly appreciated by journal editors3 or by
university supervisors. As a researcher, you must be fully responsible for your own
choice of research project. My advice is that you should start with the research problem
of interest, not with data availability or statistical techniques. Ironically, at the time of
writing, ‘exploratory factor analysis’ is still the preferred choice of many young
researchers in economics and business at UEH. They blindly imitate previous studies.
Honestly, I do not want the series of models presented in my notes to become the subject
of a second wave of such critiques. Therefore, use them only when you really need them
and understand them crystal clearly.
Some topics such as serial correlation, distributed lag models, ARIMA models, ARCH
models, multivariate ARCH models, unit root tests and cointegration tests with
structural breaks4, dynamic OLS, and fully modified OLS are beyond the scope of this
series of lectures. You can find them elsewhere, such as in econometrics textbooks, journal
articles, Stata manuals, and my handouts.
After studying this series of lectures, you should be able to basically understand the
following topics in time series econometrics:
▪ An overview of time series econometrics
▪ The concepts of nonstationary, AR, MA, and random walk processes
1. Website: www.ueh.edu.vn. Address: 59C Nguyen Dinh Chieu Street, District 3, Ho Chi Minh City, Vietnam.
2. The most important data sources for such studies are the World Bank's World Development Indicators, IMF-IFS, the General Statistics Office, and Thomson Reuters.
3. See Ozturk (2010), Omri (2014).
4. See ‘Nonstationarity II: Breaks’ in Stock & Watson (2015: pp.561-67), Binh (2011), Narayan (2005).
▪ The concept of spurious regression
▪ The unit root tests
▪ The short-run and long-run relationships
▪ Autoregressive distributed lag (ARDL) model and error correction model (ECM)
▪ EG approach for cointegration and ECM estimation
▪ Vector autoregressive (VAR) models
▪ Vector error correction model (VECM) and Johansen approach for cointegration
▪ Granger causality tests (standard and augmented versions)
▪ ARDL and bounds test for cointegration
▪ Nonstationary panels
▪ Basic practicalities in Stata (versions 14 & 15)
▪ Suggested research topics
To get started, you should be familiar with basic econometrics and statistics5. Searching
for research articles, I realize that these kinds of models have been widely applied in
macroeconomics, financial economics, and especially energy economics. These models,
however, only equip you with tools to do research; specialized knowledge from the
literature review is the real key.
5. Suggested references: Gujarati/Porter (2009), Gujarati (2011), Wooldridge (2013), Asteriou & Hall (2011), Studenmund (2017), Adkins & Hill (2011), Acock (2014), and Hamilton (2013).
significance of the estimated coefficients (especially the longest lags and white noise
nature of the errors in ARIMA models), correct sign of the estimated coefficients in
ARCH models, diagnostic checking using the correlogram, Akaike and Schwarz
criteria, and so on. In these cases, we try to exploit the dynamic inter-relationship, which
exists over time for any single series (say, sales, asset prices, or interest rates). On the
other hand, dynamic modelling, including bivariate and multivariate analysis, is mostly
concerned with understanding the structure of an economy and testing economic
hypotheses. However, this kind of modelling assumes that the series adjusts slowly to a
shock, so to understand the process we must fully capture the adjustment process, which
may be long and complex (Asteriou & Hall, 2011: p.266). Dynamic modelling has
become increasingly popular thanks to the works of three Nobel laureates in Economics,
namely Clive Granger (for methods of analyzing economic time series with common
trends, or cointegration), Robert Engle (for methods of analyzing economic time series
with time-varying volatility, or ARCH), and Christopher Sims (for vector autoregressions,
or VAR). Up to now, dynamic modelling has contributed remarkably to economic
policy formulation, especially in macroeconomics, financial markets and energy
sectors. Generally, the key purpose of time series analysis is to capture and examine the
dynamics of the data.
In time series econometrics, it is equally important that the analysts should clearly
understand the term stochastic process. It is a collection of random variables ordered in
time (Gujarati & Porter, 2009: p.740). If we let Y denote a random variable, and if it is
continuous, we denote it as Y(t), but if it is discrete, we denote it as Yt. Since most
economic data are collected at discrete points in time, we usually use the notation Yt
rather than Y(t). If we let Y represent GDP, we have Y1, Y2, Y3, …, Y99, where the
subscript 1 denotes the first observation (i.e., GDP for the third quarter of 1993) and the
subscript 99 denotes the last observation (i.e., GDP for the first quarter of 2018). Keep
in mind that each of these Y’s is a random variable.
In what sense can we regard GDP as a stochastic process? Consider for instance the
Vietnam GDP of 836.270 billion VND for 2017Q3. In theory, the GDP figure for the
third quarter of 2017 could have been any number, depending on the prevailing
economic and political climates. The figure of 836.270 billion VND is just a particular
realization of all such possibilities. In this case, we can think of the value of 836.270
billion VND as the mean value of all possible values of GDP for the third quarter of
2017. In other words, GDP value at a certain point in time is characterized as a normal
distribution. Therefore, we can say that GDP is a stochastic process and the actual values
observed for the period 1993Q2 to 2018Q1 are a particular realization of that process.
Gujarati & Porter (2009: p.740) state that “the distinction between the stochastic
process and its realization in time series data is just like the distinction between
population and sample in cross-sectional data”. Just as we use sample data to draw
inferences about a population; in time series, we use the realization to draw inferences
about the underlying stochastic process.
The reason why we mention this term before examining specific models is that all basic
assumptions in time series models relate to the stochastic process (population). Stock &
Watson (2015: p.523) said that the assumption that the future will be like the past is an
important one in time series regression, sufficiently so that it is given its own name:
“stationarity”. If the future is like the past, then the historical relationships can be used to
forecast the future. But if the future differs fundamentally from the past, then the
historical relationships might not be reliable guides to the future. Therefore, in the
context of time series regression, the idea that historical relationships can be generalized
to the future is formalized by the concept of stationarity.
6. Or the autocorrelation coefficient.
(c) Covariance: Cov(Yt, Yt+k) = γk = E[(Yt − μ)(Yt+k − μ)]
where γk, the covariance (or, exactly, autocovariance) at lag k, is the covariance between the
values of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain
γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two
adjacent values of Y.
Suppose we shift the origin of Y from Yt to Yt+m (say, from the third quarter of 1998 to
the third quarter of 2008 for our GDP data). Now, if Yt is to be stationary, the mean,
variance, and autocovariance of Yt+m must be the same as those of Yt. In short, if a time
series is stationary, its mean, variance, and autocovariance (at various lags) remain the
same no matter at what point in time we measure them. Gujarati & Porter (2009: p.741)
state that such a time series will tend to return to its mean (i.e., mean reversion) and
fluctuations around this mean will have a broadly constant amplitude.
If a time series is not stationary in the sense just defined, it is called a nonstationary
time series. In other words, a nonstationary time series will have a time-varying mean
or a time-varying variance or both.
Why is stationarity important? There are at least two reasons. First, if a time series is
nonstationary, we can study its behavior only for the time period under consideration
(Gujarati & Porter, 2009: p.741). Therefore, each set of time series data will be for a
particular episode only. As a result, it is impossible to generalize it to other time periods.
Therefore, for the purpose of forecasting or policy analysis, such time series may have
little practical value. Second, if we run regressions between nonstationary series, the
results may be spurious (Gujarati & Porter, 2009: p.748; Asteriou & Hall, 2011: p.267).
In addition, a special type of stochastic process, namely, a purely random, or white noise
process, is also popular in time series econometrics. According to Gujarati & Porter
(2009: p.741), we call a stochastic process purely random if it has zero mean, constant
variance σ², and is serially uncorrelated. This is similar to what we call the error term,
ut, in the classical normal linear regression model (CNLRM). This error term is often
denoted as ut ~ iid(0, σ²).
3.2 MA and AR Processes
In this section, we will investigate two typical types of the stationary process, namely
moving average (MA) and autoregressive (AR).
MA(1) process
The first-order MA process [MA(1)] is defined as:
Xt = εt + θεt−1, where εt ~ iid(0, σ²) (1)
For example,
OilPt = εt + 0.5εt−1
where OilPt is the change in the oil price and εt is a typhoon shock at sea in the current period.
Lemonadet = εt − 0.5εt−1
where Lemonadet is the change in the quantity of lemonade demanded and εt is the change in
temperature in the current period (see Ben Lambert’s online tutorial lectures).
MA(1) is a stationary series because it satisfies all three conditions for stationarity.
Proof:
From equation (1):
• Mean is constant.
E[Xt] = E[εt + θεt−1] = E[εt] + θE[εt−1] = 0 (2)
• Variance is constant.
Var(Xt) = Var(εt + θεt−1) = Var(εt) + θ²Var(εt−1)
= σ² + θ²σ²
= σ²(1 + θ²) (3)
Both σ² and θ are constants, so Var(Xt) is indeed constant.
• Covariance only depends on the distance between two periods.
Cov(Xt, Xt−h) = f(h) ≠ f(t)
Cov(Xt, Xt−1) = Cov(εt + θεt−1, εt−1 + θεt−2)
= θCov(εt−1, εt−1) = θσ² (4)
Cov(Xt, Xt−h) = Cov(εt + θεt−1, εt−h + θεt−h−1) = 0 for h ≥ 2 (5)
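As a quick check of these results, you can simulate an MA(1) series and compare its sample moments with the theoretical values above. The following sketch uses illustrative names (t, eps, MA1) with θ = 0.5 and σ² = 1, in the same spirit as the do-files shown later in this section:
clear
set obs 10000
set seed 12345
gen t = _n
tsset t
* white-noise shocks with sigma^2 = 1
gen eps = rnormal(0, 1)
* MA(1): X_t = eps_t + 0.5*eps_(t-1); the first observation is missing
gen MA1 = eps + 0.5*L.eps
* sample variance should be close to sigma^2*(1 + theta^2) = 1.25
summarize MA1
* AC1 should be close to theta/(1 + theta^2) = 0.4; higher-lag autocorrelations close to 0
corrgram MA1, lags(5)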
AR(1) process
The first-order AR process [AR(1)] is defined as:
Xt = θXt−1 + εt, where εt ~ iid(0, σ²) (6)
For example,
OilPt = 0.5OilPt−1 + εt
where OilPt−1 is the change in the oil price in the last period and εt is any shock in the current
period.
AR(1) is a stationary series because it satisfies all three conditions for stationarity.
Proof:
From equation (6):
• Mean is constant (i.e., zero).
Xt = θXt−1 + εt; εt ~ iid(0, σ²) (6)
= θ[θXt−2 + εt−1] + εt
= θ²Xt−2 + θεt−1 + εt
= θ²[θXt−3 + εt−2] + θεt−1 + εt
= …
= θ^t X0 + θ^0 εt + θ^1 εt−1 + θ²εt−2 + … + θ^t ε0
= εt + θεt−1 + θ²εt−2 + … + θ^t ε0 (7)
because we assume the starting value X0 is zero.
Therefore,
E[Xt] = E[εt + θεt−1 + θ²εt−2 + … + θ^t ε0]
= 0 (8)
• Variance is constant.
Because
Xt = εt + θεt−1 + θ²εt−2 + … + θ^t ε0 (9)
so we have
Xt−1 = εt−1 + θεt−2 + θ²εt−3 + … + θ^(t−1) ε0 (10)
Therefore,
Var(Xt) = Var(Xt−1)
As a result, from (6), (9), and (10) we have
Var(Xt) = Var(θXt−1) + Var(εt)
Var(Xt) = θ²Var(Xt−1) + Var(εt)
Var(Xt) = θ²Var(Xt) + Var(εt)
(1 − θ²)Var(Xt) = σ²
Var(Xt) = σ²/(1 − θ²) = constant if |θ| < 1 (11)
• Covariance depends only on the lag h, not on t.
We have
Xt = θ^h Xt−h + εt + θεt−1 + … + θ^(h−1) εt−h+1 (12)
Therefore,
Cov(Xt, Xt−h) = Cov(θ^h Xt−h, Xt−h) = θ^h Cov(Xt−h, Xt−h) = θ^h Var(Xt−h)
= θ^h [σ²/(1 − θ²)] (13)
regressing Yt on Yt-1, Yt-2 and Yt-3 (i.e., multiple regression model), …, and PACp
is the regression coefficient of Yt-p when regressing Yt on Yt-1, Yt-2, …, and Yt-p
(i.e., multiple regression model).
Figure 3.1: ACF of MA(1) process. Figure 3.2: PACF of MA(1) process.
We see that only AC1 (i.e., ρ1) of the MA(1) process is statistically different from zero.
Figure 3.3: ACF of AR(1) process. Figure 3.4: PACF of AR(1) process.
We see that ACh (i.e., ρh = θ^h) of the AR(1) process declines to zero as h increases. In this
case, only PAC1 (i.e., θ1) is statistically different from zero.
* setup assumed (the header of the original do-file is not shown): 500 observations and a time index
clear
set obs 500
set seed 12345
gen t = _n
tsset t
* white-noise shocks and the AR(2)/MA(2) coefficients
gen eps = invnorm(uniform())
scalar theta0 = 0
scalar theta1 = 0.6
scalar theta2 = 0.3
* generate the AR(2) series recursively and the MA(2) series from observation 3 onwards
gen double AR2 = 0
qui replace AR2 in 3/l = theta0 + theta1*L.AR2 + theta2*L2.AR2 + eps
gen double MA2 = 0
qui replace MA2 in 3/l = eps + theta1*L.eps + theta2*L2.eps
* correlograms
ac MA2, lags(15)
pac MA2, lags(15)
ac AR2, lags(15)
pac AR2, lags(15)
Figure 3.5: ACF of MA(2) process. Figure 3.6: PACF of MA(2) process.
We see that only AC1 and AC2 (i.e., ρ1 and ρ2) of the MA(2) process are statistically
different from zero.
Figure 3.7: ACF of AR(2) process. Figure 3.8: PACF of AR(2) process.
We see that ACh (i.e., ρh) of the AR(2) process declines to zero as h increases. In this
case, only PAC1 and PAC2 (i.e., θ1 and θ2) are statistically different from zero.
By the same token, you can generate other series with higher lag orders such as MA(3)
and AR(3), MA(4) and AR(4), and so on. However, actual economic time series rarely
exhibit the exact patterns as theoretically shown.
MA(1) => AR(∞)
If |θ| < 1, the MA(1) process can be converted into an infinite-order AR process with
geometrically declining weights. This is simply proved as follows:
Xt = εt + θεt−1, where εt ~ iid(0, σ²) (1)
Using the lag operator [i.e., Lεt = εt−1, L²εt = εt−2, L³εt = εt−3, …], equation (1) can be
rewritten as:
Xt = (1 + θL)εt
εt = Xt/(1 + θL) (16)
If |θ| < 1, then the right-hand side of equation (16) can be expanded as the sum of an
infinite geometric progression:
εt = Xt(1 − θL + θ²L² − θ³L³ + …)
εt = Xt − θLXt + θ²L²Xt − θ³L³Xt + …
Xt = θLXt − θ²L²Xt + θ³L³Xt − … + εt
Xt = θXt−1 − θ²Xt−2 + θ³Xt−3 − … + εt (17)
To understand equation (17), let us rewrite the MA(1) process as defined in equation
(1) as below:
εt = Xt − θεt−1 (18)
Lagging the relationship in equation (18) one period, we have:
εt−1 = Xt−1 − θεt−2 (19)
Substituting this into the original expression [i.e., Eq.(1)], we have:
Xt = εt + θ[Xt−1 − θεt−2] = εt + θXt−1 − θ²εt−2 (20)
Lagging the relationship in equation (19) one period, we have:
εt−2 = Xt−2 − θεt−3 (21)
Substituting this into the expression in Eq.(20), we have:
Xt = εt + θXt−1 − θ²[Xt−2 − θεt−3] = εt + θXt−1 − θ²Xt−2 + θ³εt−3 (22)
If we go on this procedure [i.e., lagging and substituting] for an infinite number of times,
we finally get the expression in equation (17).
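One way to see equation (17) at work in Stata is to simulate an MA(1) series and regress Xt on several of its own lags: the estimated coefficients should be close to θ, −θ², θ³, and so on. This is only an illustrative sketch (the variable names are mine, θ = 0.5), not part of the original example:
clear
set obs 20000
set seed 12345
gen t = _n
tsset t
gen eps = rnormal(0, 1)
* MA(1) with theta = 0.5
gen X = eps + 0.5*L.eps
* the estimated coefficients should be roughly 0.5, -0.25, 0.125, -0.0625, ...
regress X L(1/6).X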
4. NONSTATIONARY STOCHASTIC PROCESSES
4.1 Random Walk
According to Stock and Watson (2015: p.523), time series variables can fail to be
stationary in various ways, but two are especially relevant for regression analysis of
economic data: (1) the series can have persistent, long-run movements, that is, the series
can have trends; and, (2) the population regression can be unstable over time, that is,
the population regression can have breaks. For the purpose of this series of lectures, we
only focus on the first type of nonstationarity.
A trend is a persistent long-term movement of a variable over time. A time series
variable fluctuates around its trend. There are two types of trends often observed in time
series data: deterministic and stochastic. A deterministic trend is a nonrandom function
of time (i.e., Yt = A + BT + ut, Yt = A + BT + CT² + ut, and so on)7. For example, the
LEX [the logarithm of the dollar/euro daily exchange rate, i.e., LEX = log(EX), see data
on Table13-1.dta, Gujarati (2011)] is a nonstationary series (Figure 4.1), and its
detrended series (i.e. residuals from the regression of log(EX) on time: et = log(EX) – a
– b*Time) is still nonstationary (Figure 4.2). This points out that log(EX) is not a trend
stationary series. Note that we now temporarily accept a series with trend is
nonstationary. However, this informal method is not always reliable. We will shortly
introduce formal statistical tests for nonstationarity, called unit root tests such as ADF
[augmented Dickey-Fuller] and PP [Phillips-Perron].
Figure 4.1: Log of the dollar/euro daily exchange rate. Figure 4.2: Residuals from the regression of LEX on time.
7. Yt = a + bT + et => et = Yt − a − bT is called the detrended series [where T is a trend variable, et is the residual, and a and b are estimated coefficients]. If Yt is nonstationary while et is stationary, Yt is known as a trend stationary process (TSP). Here, the process with a deterministic trend is nonstationary but not a unit root process [this term is defined shortly].
In contrast, a stochastic trend is random and varies over time. According to Stock and
Watson (2015: p.552), it is more appropriate to model economic time series as having
stochastic rather than deterministic trends. Therefore, our treatment of trends in
economic time series data focuses mainly on stochastic rather than deterministic trends,
and when we refer to “trends” in time series data, we mean stochastic trends unless we
explicitly say otherwise.
The simplest model of a variable with a stochastic trend is the random walk. There are
two types of random walks: (1) random walk without drift (i.e. no constant or intercept
term) and (2) random walk with drift (i.e. a constant term is present).
The random walk without drift is defined as follows. Suppose ut is a white noise error
term with mean 0 and variance σ². Then Yt is said to be a random walk if:
Yt = Yt−1 + ut (23)
This equation is just a special case of equation (6), except that θ now equals 1. In statistical
language, ‘θ = 1’ is called a unit root. The basic idea of a random walk is that the value
of the series tomorrow (Yt+1) is its value today (Yt), plus an unpredictable change (ut+1).
From equation (23), we can write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
Y4 = Y3 + u4 = Y0 + u1 + … + u4
…
Yt = Yt−1 + ut = Y0 + u1 + … + ut
In general, if the process started at some time 0 with a value Y0 [which is often assumed
to be zero], we have
Yt = Y0 + u1 + u2 + … + ut (24)
therefore,
E(Yt) = E(Y0 + u1 + u2 + … + ut) = Y0 (25)
stationarity. In other words, the variance of Yt depends on t, its distribution depends on
t, that is, it is nonstationary.
Interestingly, if we re-write equation (23) as:
(Yt – Yt-1) = ∆Yt = ut (27)
where ∆Yt is the first difference of Yt. It is easy to show that, while Yt is nonstationary,
its first difference is stationary (why?). And this is very significant when we work with
time series data. In time series terminology, this is widely known as a difference stationary
(stochastic) process (DSP).
Using a Stata do-file with the following commands:
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e, n(500) means(0) sds(1)
tsset time
gen RW = 0
replace RW = L.RW + e if _n > 1
label variable RW "Random walk without drift"
tsline RW
tsline D.RW
We have the following graphs:
Figure 4.3: Random walk without drift. Figure 4.4: First difference of random walk without drift.
The random walk with drift can be defined as follows:
Yt = δ + Yt−1 + ut (28)
where δ is known as the drift parameter. The name drift comes from the fact that if we
write the preceding equation as:
Yt − Yt−1 = ∆Yt = δ + ut (29)
it shows that Yt drifts upward or downward, depending on whether δ is positive or negative.
We can easily show that the random walk with drift violates the conditions of
stationarity [while its first difference is indeed stationary]. Equation (28) can be
rewritten as:
Y1 = δ + Y0 + u1
Y2 = δ + Y1 + u2 = 2δ + Y0 + u1 + u2
Y3 = δ + Y2 + u3 = 3δ + Y0 + u1 + u2 + u3
Y4 = δ + Y3 + u4 = 4δ + Y0 + u1 + u2 + u3 + u4
…
Yt = δ + Yt−1 + ut = tδ + Y0 + u1 + … + ut
In general, if the process started at some time 0 with a value Y0 [which is often assumed
to be zero], we have
E(Yt) = Y0 + δt (30)
Var(Yt) = tσ² (31)
In other words, both the mean and the variance of Yt depend on t; its distribution depends on
t, that is, it is nonstationary.
Using a Stata do-file with the following commands:
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e, n(500) means(0) sds(1)
tsset time
gen RW = 0
replace RW = 0.2 + L.RW + e if _n > 1
label variable RW "Random walk with a positive drift"
tsline RW
tsline D.RW
Figure 4.3: Random walk with drift = 0.2. Figure 4.4: First difference of random walk with drift = 0.2.
Figure 4.5: A random walk without drift. Figure 4.6: First difference of a random walk without drift.
Stock and Watson (2015: p.553) say that because the variance of a random walk
increases without bound, its population autocorrelations (e.g., ρ1) are not defined (the first
autocovariance and variance are infinite, and the ratio of the two is not well defined):
Corr(Yt, Yt−1) = Cov(Yt, Yt−1)/√[Var(Yt)Var(Yt−1)] (32)
In a nutshell, a random walk is a nonstationary process, where either its mean or its
variance or both increases over time. However, it is a difference stationary process
because its first difference is stationary.
Let’s return to the LEX example. Figures 4.1 and 4.7 show that the logarithm of the
dollar/euro daily exchange rate is characterized as a difference stationary process
because its level is not stationary, whereas its first difference is stationary.
Figure 4.1: Log of the dollar/euro daily exchange rate. Figure 4.7: First difference of log(EX).
4.2 Unit Root Stochastic Process
According to Gujarati & Porter (2009: p.744), the random walk model is an example of
what is known in the literature as a unit root process.
Let us write the random walk model (23) as:
Yt = ρYt−1 + ut (−1 ≤ ρ ≤ 1) (33)
This model resembles the Markov first-order autoregressive model [AR(1)], usually
covered in the basic econometrics course under the serial correlation topic. If ρ = 1, equation
(33) becomes a random walk without drift. If ρ is in fact 1, we face what is known as
the unit root problem, that is, a situation of nonstationarity. The name unit root is due
to the fact that ρ = 1. Technically, if ρ = 1, we can write equation (33) as Yt − Yt−1 = ut.
Now, using the lag operator L so that LYt = Yt−1, L²Yt = Yt−2, and so on, we can write
equation (33) as (1 − L)Yt = ut. If we set (1 − L) = 0, we obtain L = 1, hence the name unit
root. Thus, the terms nonstationarity, random walk, and unit root can be treated as
synonymous.
If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then it can be
shown that the time series Yt is stationary. In other words, equation (33) is really an
AR(1) process, which was previously shown to be a stationary process with constant mean,
constant variance, and time-invariant covariance.
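The do-file that generated Figures 4.8 to 4.11 below is not reproduced in these notes, but a minimal sketch along the lines of the earlier simulations might look as follows (the variable names are illustrative; only the value of θ changes across the four series):
clear
set obs 500
set seed 12345
gen timevar = _n
tsset timevar
gen e = rnormal(0, 1)
gen AR067 = 0
gen AR097 = 0
gen AR100 = 0
gen AR110 = 0
replace AR067 = 0.67*L.AR067 + e if _n > 1   // stationary
replace AR097 = 0.97*L.AR097 + e if _n > 1   // stationary but highly persistent
replace AR100 = 1.00*L.AR100 + e if _n > 1   // random walk (unit root)
replace AR110 = 1.10*L.AR110 + e if _n > 1   // explosive
tsline AR067
tsline AR097
tsline AR100
tsline AR110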
Figure 4.8: AR(1) with θ = 0.67 (stationary). Figure 4.9: AR(1) with θ = 0.97 (stationary).
Figure 4.10: AR(1) with θ = 1 (random walk). Figure 4.11: AR(1) with θ = 1.1 (explosive).
Now the lagged dependent variable [i.e., log(Xt-1)] has a unit coefficient and each period
it increases by an absolute amount equal to log(1.1), which is of course constant. This
series would now be I(1).
More formally, consider the model:
Yt = β1 + β2Xt + ut (34)
where ut is the error term. The assumptions of the classical linear regression model (CLRM)
require both Yt and Xt to be covariance stationary. In the presence of nonstationarity,
the results obtained from a regression of this kind are totally spurious8, and such
regressions are called spurious regressions.
The intuition behind this is quite simple. Over time, we expect any nonstationary series
to wander around, so over any reasonably long sample the series either drift up or down.
If we then consider two completely unrelated series which are both nonstationary, we
would expect that either they will both go up or down together, or one will go up while
the other goes down (see Figure 5.1). If we performed a regression of one series on
another, we would then find either a significant positive relationship if they are going
in the same direction or a significant negative one if they are going in opposite directions
even though they are really unrelated. This is the essence of a spurious regression.
It is said that a spurious regression usually has a very high R2, t statistics that appear to
provide significant estimates, but the results may have no economic meaning. This is
because the OLS estimates may not be consistent, and therefore all the tests of statistical
inference are not valid.
Granger and Newbold (1974) constructed a Monte Carlo analysis generating a large
number of Yt and Xt series containing unit roots following the formulas:
Yt = Yt-1 + eYt (35)
Xt = Xt-1 + eXt (36)
where eYt and eXt are artificially generated normal random numbers (generated in the same
way as in Section 4).
Since Yt and Xt are independent of each other, any regression between them should give
insignificant results. However, when they regressed the various Yt series on the Xt series as
shown in Table 5.1, they surprisingly found that they were able to reject the null hypothesis
of β2 = 0 in approximately 75% of the cases. They also found that their regressions had very
high R²s and very low values of the Durbin-Watson d statistic.
8. This was first introduced by Yule (1926), and re-examined by Granger and Newbold (1977) using Monte Carlo simulations.
To see the spurious regression problem, we can type the following commands in Stata
to see how many times we can reject the null hypothesis of β2 = 0. The commands are:
clear
set obs 500
gen time = _n
set seed 12345
drawnorm e1 e2, n(500) means(0 0) sds(1 1)
tsset time
gen Y = 0
gen X = 0
replace Y = L.Y + e1 if _n > 1
replace X = L.X + e2 if _n > 1
label variable Y "Y is a random walk"
label variable X "X is a random walk"
twoway scatter Y X || lfit Y X, ytitle("Y is a random walk") xtitle("X is a random walk") legend(off)
reg Y X
An example of a plot of Y against X obtained in this way, together with the fitted line from the
estimated equation between these simulated series, is shown in Figure 5.1.
Figure 5.1: Scatter plot of Y against X (two independent random walks) with a fitted regression line.
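Granger and Newbold's finding can be reproduced on a small scale by repeating the experiment above in a loop and counting how often the t test on β2 rejects at the nominal 5% level (critical value ≈ 1.96). The sketch below uses 500 replications with T = 100 observations each; it is only an illustration, and the exact rejection rate will vary with the seed and sample size, but it should be far above 5%:
set seed 12345
local reject = 0
forvalues r = 1/500 {
    quietly {
        clear
        set obs 100
        gen t = _n
        tsset t
        gen e1 = rnormal()
        gen e2 = rnormal()
        * random walks as cumulative sums of independent shocks
        gen Y = sum(e1)
        gen X = sum(e2)
        regress Y X
        if abs(_b[X]/_se[X]) > 1.96 local reject = `reject' + 1
    }
}
display "Share of spurious rejections of b2 = 0: " `reject'/500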
Granger and Newbold (1974) proposed the following “rule of thumb” for detecting
spurious regressions: if R² > DW statistic or if R² ≈ 1, then the estimated regression
‘must’ be spurious (Gujarati, 2011: p.226).
To understand the problem of spurious regression better, it might be useful to use an
example with real economic data. This example was conducted by Asteriou & Hall
(2011: p.340). Consider a regression of the logarithm of real GDP (Yt) on the logarithm
of real money supply (Mt) and a constant. The results obtained from such a regression
are the following:
Yt = 0.042 + 0.453Mt; R² = 0.945; DW = 0.221 (37)
(t-ratios: 4.743 and 8.572)
Here we see very high t-ratios, with coefficients that have the correct signs and more or
less plausible magnitudes. The coefficient of determination is very high (R2 = 0.945),
but there is a high degree of serial correlation (DW = 0.221). This shows evidence of
the possible existence of spurious regression. In fact, this regression is totally
meaningless because the money supply data are for the UK economy, while the GDP
data are for the US economy. Therefore, although there should not be any significant
relationship, the regression seems to fit the data well, and this happens because the
variables used in the example are trended (i.e. nonstationary). So, Asteriou & Hall
(2011: p.340) recommend that econometricians should be very careful when working
with trended variables. You can see similar examples in Gujarati (2011, pp.224-226).
or
ut = (eY1 + eY2 + … + eYt) − β2(eX1 + eX2 + … + eXt) (40)
From equation (40), we realize that the variance of the error term will tend to become
infinitely large as t increases. Hence, the assumptions of the CLRM are violated, and
therefore, any t test, F test or R2 are unreliable.
In terms of equation (34), there are four different cases to discuss (Asteriou & Hall,
2011: p.342):
Case 1: Both Yt and Xt are stationary9, and the CLRM is appropriate with OLS
estimates being BLUE (Best Linear Unbiased Estimators).
Case 2: Yt and Xt are integrated of different orders. In this case, the regression
equations are meaningless.
Case 3: Yt and Xt are integrated of the same order [often I(1)] and the ut sequence
contains a stochastic trend. In this case, we have spurious regression, and it is
often recommended to re-estimate the regression equation using quasi-differencing
(FGLS) methods such as the Cochrane-Orcutt procedure or the Prais-Winsten
procedure, or to use Newey-West standard errors.
Case 4: Yt and Xt are integrated of the same order and the ut is stationary. In this special
case, Yt and Xt are said to be cointegrated. The concept of cointegration will
be examined in detail later.
9. Based on statistical tests such as ADF, PP, and KPSS.
10. This is not explained in this lecture. You can consult Gujarati & Porter (2009: pp.808-13), Hanke (2005: pp.60-74), or Nguyen Trong Hoai et al. (2009: Chapters 3, 4, and 8).
Besides, the correlogram is very useful when selecting the appropriate lags [i.e., p and
q] in the ARIMA models and ARCH family models (Hoai et al., 2009)11.
(1) If a series is random, the autocorrelations (i.e. ACF) between Yt and Yt - k for
any lag k are close to zero (i.e., individual autocorrelation coefficients are
statistically insignificant). The successive values of a time series are not related
to each other (Figure 6.1). In other words, Yt and Yt - k are completely
independent for all values of k (k = 1, …., p).
(2) If a series has a (stochastic) trend, successive observations are highly
correlated, and the autocorrelation coefficients are typically significantly
different from zero for the first several time lags and then gradually drop
toward zero as the number of lags increases [i.e., not weakly dependent]. The
autocorrelation coefficient for lag 1 is often very large (close to 1). The
autocorrelation coefficient for lag 2 will also be large, and so on. However, it
will not be as large as for lag 1 (Figure 6.2).
(3) If a series is stationary, the autocorrelation coefficients for, say lag 1, lag 2, or
lag 3, are significantly different from zero and then suddenly die out as the
number of lags increases (Figure 6.3). In other words, Yt and Yt-1, Yt and Yt-2,
Yt and Yt-3 are weakly correlated [i.e., weakly dependent]; but Yt and Yt-k [as
k increases] are completely independent.
(4) If a series has a seasonal pattern, a significant autocorrelation coefficient will
occur at the seasonal time lag or multiples of seasonal lag (Figure 6.4). This is
beyond the scope of this series of lectures.
11. As discussed in Section 3 on AR(p) and MA(q) processes, p is selected using the PAC graph and q using the AC graph. ARIMA(p,d,q) is just a combination of the two processes after differencing d times. Since ARIMA models are beyond the scope of this series of lectures, we do not discuss them here.
Figure 6.2: Correlogram of a nonstationary series
The correlogram is very useful for time series forecasting and other practical
(business) applications. If you conduct academic studies, however, it is necessary to
provide more formal statistics such as the t statistic12, the Box-Pierce Q statistic, the Ljung-Box
(LB) statistic, and especially unit root tests.
6.3 Simple Dickey-Fuller Test for Unit Roots
Dickey and Fuller (1979, 1981) proposed a procedure to formally test for
nonstationarity (hereafter referred to as the DF test). The key insight of their tests is that testing
for nonstationarity is equivalent to testing for the existence of a unit root. Thus the test
is based on the AR(1) model:
Yt = ρYt−1 + ut (41)
What we need to examine here is whether ρ = 1 (unity, and hence ‘unit root’). Obviously, the
null hypothesis is H0: ρ = 1, and the alternative hypothesis is H1: ρ < 1.
We obtain a different (more convenient) version of the test by subtracting Yt−1 from both
sides of Eq.(41):
Yt − Yt−1 = ρYt−1 − Yt−1 + ut
∆Yt = (ρ − 1)Yt−1 + ut
∆Yt = γYt−1 + ut (42)
where γ = (ρ − 1). Now the null hypothesis is H0: γ = 0, and the alternative
hypothesis is H1: γ < 0. In this case, if γ = 0, then Yt follows a pure random walk (and,
of course, in this case Yt is nonstationary).
Dickey and Fuller (1979) also proposed two alternative regression equations that can
be used for testing for the presence of a unit root. The first contains a constant in the
random walk with drift process, as follows:
∆Yt = α + γYt−1 + ut (43)
According to Asteriou & Hall (2011: p.343), this is an extremely important case,
because such a process exhibits a deterministic trend in the series when γ = 0 (why?),
which is often the case for macroeconomic variables.
The second case also allows for a time trend in the model13, so as to have:
∆Yt = α + λT + γYt−1 + ut (44)
12. See Hoai et al. (2009) and my lecture on ARIMA models to understand the standard error in time series econometrics, s.e. = 1/√n.
13. To be exact, a deterministic trend exists in the first differenced series.
The Dickey-Fuller test for stationarity is simply the normal ‘t’ test on the coefficient
of the lagged dependent variable Yt−1 from one of the three models (42, 43, or 44).
This test does not, however, have a conventional ‘t’ distribution and so we must use
special critical values which were originally calculated by Dickey and Fuller. This is
also known as the Dickey-Fuller tau statistic (Gujarati & Porter, 2009: p.755). However,
most modern statistical packages such as Stata and Eviews routinely produce the critical
values for Dickey-Fuller tests at the 1%, 5%, and 10% significance levels.
MacKinnon (1991,1996) tabulated appropriate critical values for each of the three
above models and these are presented in Table 6.1.
Table 6.1: MacKinnon critical values for the DF/ADF tests by model at the 1%, 5%, and 10% significance levels.
In all cases, the tests concern whether γ = 0. The DF test statistic is the t statistic on the
lagged dependent variable. If the DF statistic is larger [in absolute terms] than the
critical value, then we reject the null hypothesis of a unit root and conclude that Yt
is a stationary process. An easier way is to compare the ‘MacKinnon approximate’ p-
value with the significance level (α), often 1%, 5%, or 10%. If the p-value is smaller
than the chosen level of significance, we reject the null hypothesis of a unit root. Note
that the MacKinnon approximate p-value and the test statistic are not always consistent
to each other (see StataCorp, 2017b: dfgls).
6.4 Augmented Dickey-Fuller Test for Unit Roots
As the error term may not be white noise, Dickey and Fuller extended their test
procedure by suggesting an augmented version of the test (hereafter referred to as the ADF test),
which includes additional lagged terms of the dependent variable in order to control for
serial correlation in the test equation. The lag length14 for these additional terms is either
14. See ‘Lag length selection using information criteria’ and ‘Determining lag lengths in VARs’ in Stock & Watson (2015: pp.547-551, p.641).
determined by the Akaike Information Criterion (AIC) or the Schwarz Bayesian/Information
Criterion (SBC, SIC), or, more usefully, by the lag length necessary to whiten the
residuals (i.e., in each case we check whether the residuals of the ADF regression are
autocorrelated through LM tests, and not through the Durbin-Watson d test (why?)).
The three possible forms of the ADF test are given by the following equations:
∆Yt = γYt−1 + φ1∆Yt−1 + … + φp∆Yt−p + ut (45)
∆Yt = α + γYt−1 + φ1∆Yt−1 + … + φp∆Yt−p + ut (46)
∆Yt = α + λT + γYt−1 + φ1∆Yt−1 + … + φp∆Yt−p + ut (47)
The difference between the three regressions concerns the presence of the deterministic
elements α and T. The critical values for the ADF test are the same as those given in
Table 6.1 for the DF test.
Similar to the simple cases, the ADF tests also concern whether γ = 0. The ADF test
statistic is the t statistic on the lagged dependent variable. If the ADF statistic
is larger [in absolute terms] than the critical value, then we reject the null hypothesis
of a unit root and conclude that Yt is a stationary process. Again, an easier way is to
compare the MacKinnon approximate p-value with the significance level (α), often
1%, 5%, or 10%. If the MacKinnon approximate p-value is smaller than the chosen level
of significance (say 5%), we reject the null hypothesis that Yt represents a random walk
or has a unit root.
According to Asteriou & Hall (2011: p.344), unless the econometrician knows the
actual data-generating process, there is a question concerning whether it is most
appropriate to estimate model (45), (46), or (47). It is suggested that the test procedure
should start by estimating the most general model given by equation (47) and then
answering a set of questions regarding the appropriateness of each model and moving
to the next model. This procedure is illustrated in Figure 6.1. It needs to be stressed here
that, although useful, this procedure is not designed to be applied in a mechanical
fashion. Plotting the data and observing the graph is sometimes very useful because it
can clearly indicate the presence or not of deterministic regressors (StataCorp, 2017b:
dfuller). [Note: we mean tsline of the first differenced series]. However, this procedure
is the most sensible way to test for unit roots when the form of the actual data-generating
process is typically unknown. In addition, the ADF test results are sensitive to the lag
lengths selected (StataCorp, 2017b: dfgls). Therefore, in practical applications, it is
necessary to use other tests for comparison purposes.
Figure 6.1: Procedure for testing for unit roots using ADF methodology.
6.5 Other Unit Root Tests
In practical studies, researchers mostly use both the ADF and the Phillips-Perron (PP)
tests15. Because the distribution theory supporting the Dickey-Fuller tests is based
on the assumption of random error terms [iid(0, σ²)], when using the ADF methodology
we have to make sure that the error terms are uncorrelated and really do have a
constant variance. Phillips and Perron (1988) developed a generalization of the ADF
test procedure that allows for fairly mild assumptions concerning the distribution of the
errors (Asteriou & Hall, 2011: pp.344-5). The regression for the PP test is similar to DF
equation (43):
∆Yt = α + γYt−1 + et (48)
While the ADF test corrects for higher-order serial correlation by adding lagged differenced
terms of the dependent variable on the right-hand side of the test equation, the PP test uses
Newey-West (1987) standard errors16 to account for serial correlation (Asteriou & Hall,
2011: p.345-6; StataCorp, 2017b: pperron).
So, the PP statistics are just modifications of the ADF t statistics that take into account
the less restrictive nature of the error process. The expressions are extremely complex
to derive and are beyond the scope of my notes. Since most statistical packages have
routines available to calculate these statistics, it is good practice for researchers to test the order
of integration of a series by performing the PP test as well. The asymptotic distribution of
the PP t statistic is the same as that of the ADF t statistic, and therefore the MacKinnon
(1991, 1996) critical values are still applicable. That means the PP tests also concern
whether γ = 0. The PP test statistic is the t statistic on the lagged dependent variable. If
the PP statistic is larger [in absolute terms] than the critical value, then we
reject the null hypothesis of a unit root and conclude that Yt is a stationary process.
Again, an easier way is to compare the MacKinnon approximate p-value with the
significance level (α), often 1%, 5%, or 10%. If the MacKinnon approximate p-
value is smaller than the chosen level of significance, we reject the null hypothesis that
Yt represents a random walk or has a unit root.
As with the ADF tests, the PP tests can be performed with the inclusion of a constant,
a constant and a linear trend, or neither in the test regression.
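As a quick illustration of the mechanics in Stata, the ADF and PP tests can be applied to the random walk RW simulated in Section 4 (re-run that do-file first if the series is no longer in memory) and to its first difference. The lag length of 4 used here is arbitrary and would normally be chosen by an information criterion:
* ADF with a constant: should fail to reject the unit root null for RW
dfuller RW, lags(4)
* ADF with a constant and trend
dfuller RW, trend lags(4)
* Phillips-Perron test
pperron RW
* the first difference should clearly reject the unit root null
dfuller D.RW, lags(4)
pperron D.RW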
Dickey-Fuller tests may have low power (H0 of unit root not rejected, whereas in reality
there may be no unit root) when ρ is close to one. This could be the case of trend
15. Recently, dfgls has become a priority in practical applications.
16. See Wooldridge (2013: pp.431-4).
stationarity (H0). An alternative test is KPSS17 (Kwiatkowski-Phillips-Schmidt-Shin,
1992). Its test procedure is briefly summarized as follows:
(1) Regress Yt on an intercept and time trend and obtain the OLS residuals et.
(2) Calculate the partial sums St = e1 + e2 + … + et for all t.
(3) Calculate the test statistic KPSS = (1/T²)ΣSt²/σ̂², where the sum runs over t = 1, …, T
and σ̂² is an estimate of the (long-run) error variance, and compare it with the critical value.
The critical values are routinely produced by statistical packages such as Stata and
Eviews. The null hypothesis [of stationarity] is rejected if the KPSS test statistic is larger
than the selected critical value, often at the 5% level of significance.
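Stata does not ship a built-in KPSS command, but a community-contributed implementation by Baum can be installed from SSC and used as a cross-check. This is only a sketch on the simulated random walk RW from Section 4 (assuming it is in memory; the maxlag value is illustrative):
* one-time installation of the community-contributed command
ssc install kpss, replace
* the null hypothesis is (trend) stationarity: large statistics reject it for the random walk
kpss RW, maxlag(8)
* the first difference should not reject stationarity
gen dRW = D.RW
kpss dRW, maxlag(8)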
Another statistical test for a unit root, namely the augmented Dickey-Fuller test using GLS –
generalized least squares (dfgls)18 – has recently been developed. Among statistical tests for a
unit root, dfgls is the most powerful and informative (Hamilton, 2012: p.376; StataCorp,
2017b: dfgls). It performs the modified Dickey-Fuller t test proposed by Elliott,
Rothenberg, and Stock (1996). Basically, dfgls is an augmented Dickey-Fuller test,
except that the series is transformed via a generalized least squares regression before
performing the test (see StataCorp, 2017b: dfgls).
A special point of attention is that the above unit root tests assume that no structural
breaks exist in the series of interest. If breaks are present, we must use alternative tests such
as Zivot and Andrews (ZA, 1992) or Lumsdaine and Papell (LP, 1997)19.
17. See Greene (2008: p.755).
18. See ‘The DF-GLS test for a unit root’ in Stock & Watson (2015: pp.651-4) and StataCorp (2017b: dfgls): there you can see that the ADF results are not as strong as those produced by dfgls in an example about the log of investment in Germany.
19. See Narayan (2005).
Table 6.2: DF test of log(EX), Eq.(43).
The DF t statistic is 0.172, which is positive. This incorrect sign may arise because the test
equation is incorrectly specified. A positive γ would imply ρ > 1, which means
log(EX) is explosive. This is not usual for macroeconomic data (Greene, 2008: p.740).
Therefore, we rule out this possibility.
The absolute value of the DF t statistic in this case is 3.026, less than the 10% critical
value of 3.128, so we should not reject the null hypothesis that log(EX) represents a
random walk, or has a unit root. In other words, log(EX) series is not stationary at 10%
level of significance. The MacKinnon approximate p-value of this test statistic is
approximately 12.48 percent as you can see at the bottom of the test results.
Table 6.4: DF-GLS test of log(EX).
The dfgls output above reports tests of the nonstationary null hypothesis [i.e., that the log(EX)
series represents a random walk, or has a unit root] for lags from 1 to 10 days. At the
bottom, the output offers three different methods for choosing an appropriate number
of lags: the Ng-Perron sequential t, the minimum Schwarz information criterion, and the Ng-Perron
modified Akaike information criterion (MAIC). The MAIC was developed more recently,
and Monte Carlo experiments support its advantages over the Schwarz method. The
absolute value of the DF-GLS statistic at 5 lags is 0.511, less than the 10% critical value
of 2.556, so we should not reject the null hypothesis. Note that the Ng-Perron sequential t
indicates a maximum lag of 26. However, to save space, we restrict the maximum lag
length to 10.
Using the lag length of 5 from the DF-GLS test results, we find that the absolute value
of the ADF t statistic from equation (47) is 2.809, less than the 10% critical value of
3.120, so we should not reject the null hypothesis that log(EX) represents a random
walk, or has a unit root. As a result, we can conclude that the log(EX) series is not stationary
at the 10% level of significance.
Table 6.5: ADF test of log(EX), Eq.(47).
The absolute value of the PP t statistic from equation (48) is 3.027, less than the 10%
critical value of 3.120, so we should not reject the null hypothesis that log(EX)
represents a random walk, or has a unit root. Therefore, both the ADF and PP tests confirm
that the log(EX) series is not stationary at the 10% level of significance.
To make sure that the log(EX) series is not trend stationary, we use the KPSS test. The
results in Table 6.7 reject the null hypothesis that log(EX) is trend stationary, because
the test statistics at all lags are larger than the critical values at the 10% level.
Table 6.8: dfgls test of log(EX).
A similar test of the first difference of log(EX) in Table 6.10, on the other hand, rejects
the nonstationary null hypothesis [i.e., that the differenced series has a unit root] at all lags
[note: the maximum lag based on the MAIC is up to 26], even at the 1% level. Therefore, we can
confirm that the log(EX) series is a difference stationary process.
Table 6.9: ADF test of the first difference of log(EX), Eq.(47).
The absolute value of the ADF t statistic is about 17.68 [Table 6.9] and that of the PP t statistic is
about 48.39 [Table 6.10], both greater than the 1% critical value of 3.43, so we should reject
the null hypothesis that the first difference of the log(EX) series represents a random walk,
or has a unit root. We can therefore conclude that the log(EX) series follows a
difference stationary process.
[Important note: we implicitly assume that ut is white noise; i.e., the simple
ARDL(1,1) is a well-specified model].
We can analyze both short-run and long-run effects (either slopes or elasticities) defined
as follows:
(1) Short-run or static effect:
∂Yt/∂Xt = B0 (50)
Proof:
∂Yt/∂Xt = B0
∂Yt+1/∂Xt = A1(∂Yt/∂Xt) + B1 = A1B0 + B1
∂Yt+2/∂Xt = A1(∂Yt+1/∂Xt) = A1(A1B0 + B1)
∂Yt+3/∂Xt = A1(∂Yt+2/∂Xt) = A1²(A1B0 + B1)
…
∂Yt+τ/∂Xt = A1(∂Yt+τ−1/∂Xt) = A1^(τ−1)(A1B0 + B1)
If |A1| < 1, the cumulative effect, or long-run slope (Slr), will be the sum of all of these derivatives:
Slr = B0 + [A1B0 + B1] + A1[A1B0 + B1] + A1²[A1B0 + B1] + … = (B0 + B1)/(1 − A1) (52)
We can also take expectations to derive the long-run relation between Yt and Xt [see
Asteriou & Hall, 2011: p.360]:
E(Yt) = A0 + A1E(Yt−1) + B0E(Xt) + B1E(Xt−1)
E(Yt) = A0 + A1E(Yt) + B0E(Xt) + B1E(Xt)
E(Yt) − A1E(Yt) = A0 + (B0 + B1)E(Xt)
(1 − A1)E(Yt) = A0 + (B0 + B1)E(Xt)
=> E(Yt) = A0/(1 − A1) + [(B0 + B1)/(1 − A1)]E(Xt)
= α + βE(Xt)
or, simply:
Y* = α + βX* (54)
Here, β = (B0 + B1)/(1 - A1) is the long-run effect of a lasting shock in Xt. And the short-
run effect of a change in Xt is B0.
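In Stata, the ARDL(1,1) in equation (49) can be estimated by OLS and the long-run effect β = (B0 + B1)/(1 − A1) recovered with nlcom, which also provides a delta-method standard error. This is only a sketch with generic variable names Y and X, assumed to be in memory and tsset:
* ARDL(1,1): Y_t = A0 + A1*Y_(t-1) + B0*X_t + B1*X_(t-1) + u_t
regress Y L.Y X L.X
* short-run (impact) effect: B0
display "Short-run effect = " _b[X]
* long-run effect: (B0 + B1)/(1 - A1), with a delta-method standard error
nlcom (_b[X] + _b[L.X]) / (1 - _b[L.Y])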
By the same token, we can expand to a more complicated ARDL(p,q) model [important
note: we again implicitly assume that ut is white noise]:
(1) Short-run or static effect:
∂Yt/∂Xt = B0 (55)
In this model, Yt and Xt are assumed to be in long-run equilibrium, i.e., changes in Yt
relate to changes in Xt according to B1. If Yt−1 deviates from its optimal value (i.e., its
equilibrium), there will be a correction. The speed of adjustment is given by π = (1 − A1),
which lies between 0 and 1. We will discuss the coefficient π in detail when discussing
the ECM model in the next section. Note that how large π is depends on which
mechanism [i.e., AR(p) or DL(q)] the ARDL model mainly follows. If the coefficients A1, A2,
…, Ap are large [i.e., the ARDL model mainly follows the AR(p) process], then π will
be small. That means the speed of adjustment toward equilibrium is slow. Besides, this
coefficient also depends on the number of explanatory variables [Xt] included in the
model (Gujarati, 2011: p.243).
20. Standard regression techniques, such as OLS, require that the variables be covariance stationary ... Cointegration analysis provides a framework for estimation, inference, and interpretation when the variables are not covariance stationary (StataCorp, 2017b: vec intro).
meaningless. On the other hand, if the stochastic trends do cancel each other out, then we
have cointegration (i.e., a common trend), which gives us various practical
implications for policy design (Asteriou & Hall, 2011: p.356).
Suppose there really is a genuine long-run relationship between Yt and Xt; then,
although both will rise over time (because they are trended), there will be a common
trend that links them together. For an equilibrium, or long-run relationship, to exist, what
we require, then, is a linear combination of Yt and Xt that is a stationary variable [an
I(0) variable]. A linear combination of Yt and Xt can be directly taken from estimating
the following regression (Asteriou & Hall, 2011: p.356-7):
Yt = β 1 + β 2 Xt + u t (58)
and then obtain the residuals:
ût = Yt − β̂1 − β̂2Xt (59)21
If ût ~ I(0), we say that two variables Yt and Xt are cointegrated. Therefore, two
variables are said to be cointegrated if each is an I(1) process but a linear combination
of them is an I(0) process. It is important to note that if Yt and Xt cointegrate, the simple
regression of Yt on Xt is mis-specified (StataCorp, 2017b: vec intro).
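In practice, the residual-based check in equations (58) and (59) takes only a few Stata commands. The sketch below uses generic variables Y and X, both assumed to be I(1) and tsset; note that the usual Dickey-Fuller critical values reported by dfuller are not strictly valid for residuals from an estimated cointegrating regression, so Engle-Granger critical values should be used for a formal test:
* candidate long-run (cointegrating) regression
regress Y X
* uhat = Y - b1hat - b2hat*X, as in equation (59)
predict uhat, residuals
* should look mean-reverting if Y and X are cointegrated
tsline uhat
* informal unit root check on the residuals (no constant, since the residuals have mean zero)
dfuller uhat, noconstant lags(4)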
8.2 An Example of Cointegration
Table14-1.dta [Gujarati, 2011: Chapter 14] gives quarterly data on personal
consumption expenditure (PCE) and personal disposable (i.e. after-tax) income (PDI)
for the USA for the period 1970-2008 (Gujarati, 2011: p.226). Both graph (Figure 8.1)
and ADF tests (Tables 8.1 and 8.2) indicate that these two series are not stationary. They
are I(1), that is, they have stochastic trends. In addition, the regression of log(PCE) on
log(PDI) seems to be spurious (Table 8.3) [because R2 > DW d statistic].
Since both series are trending, let us see what happens if we add a trend variable to the
model. The elasticity coefficient is now changed, but the regression is still spurious
(Table 8.4). However, after estimating the regression of log(PCE) on log(PDI) and
trend, we find that the obtained residual series is stationary [i.e., I(0)] at the 5% level
of significance (Table 8.5). This implies that a linear combination (et = log(PCE) – b1 –
b2log(PDI) – b3T) cancels out the stochastic trends in the two variables. Therefore, this
regression is, in fact, not spurious (Gujarati, 2011: pp.229-30). In other words, the
variables log(PCE) and log(PDI) are cointegrated.
21. Greene (2008: p.756) calls this the ‘partial difference between the two variables’. If this difference is stable around a fixed mean, it implies the series are drifting together at roughly the same rate.
Figure 8.1: Logs of PCE and PDI over the sample period.
Table 8.2: Unit root tests for log(PDI).
Table 8.4: Regression of log(PCE) on log(PDI) and trend.
Table 8.5: ADF test for residual series from Table 8.4.
In the language of cointegration theory, the equation log(PCE) = B1 + B2log(PDI) + B3T
is known as a cointegrating regression and the slope parameters B2 and B3 are known
as cointegrating parameters.
models. Here, what we are mainly concerned with is the problem of serial
correlation [see Adkins & Hill, 2011: Chapter 9].
b) If the variables are integrated of different orders, we could apply
other methods such as bounds tests and/or Toda-Yamamoto (1995)
tests for cointegration.
c) If both variables are integrated of order 1: I(1), we proceed with
step two.
According to Asteriou & Hall (2011: p.366) and Gujarati (2011: pp.235-6), one of the
best features of the Engle-Granger 2-step approach is that it is both very easy to
understand and to implement. However, it also has various shortcomings:
(1) One very important issue has to do with the order of the variables. When
estimating the long-run relationship, one has to place one variable in the left-
hand side and use the others as regressors. The test does not say anything about
which of the variables can be used as regressors and why. Consider, for example,
the case of just two variables, Xt and Yt. One can either regress Yt on Xt (i.e. Yt
= A + BXt + u1t) or choose to reverse the order and regress Xt on Yt (i.e. Xt = C
+ DYt + u2t). It can be shown, using asymptotic theory, that as the sample goes
to infinity the test for cointegration on the residuals of those two regressions is
equivalent (i.e. there is no difference in testing for unit roots in u1t and u2t).
However, in practice, especially in economics we rarely have very big samples
[i.e., realizations] and it is therefore possible to find that one regression exhibits
cointegration while another doesn’t. This is obviously a very undesirable feature
of the EG approach. The problem obviously becomes far more complicated when
we have more than two variables under investigation.
(2) A second problem is that when there are more than two variables there may be
more than one cointegrating relationship, and the EG 2-step approach using residuals
from a single relationship cannot treat this possibility. In other words, the EG
approach does not allow for estimation of more than one cointegrating
regression. Suppose we have k variables, there can be at most (k - 1)
cointegrating relationships. If this is the case, we have to use the cointegration tests
developed by Johansen.
(3) Along with the second problem, a third problem in dealing with multiple time
series is that we not only have to consider finding more than one cointegrating
relationship, but then we will also deal with the error correction term for each
cointegrating relationship. As a result, the simple, or bivariate error correction
model will obviously not work. This problem can be solved by using the vector
error correction model (VECM).
(4) The final problem is that it relies on a two-step estimator. The first step is to
generate the residual series and the second step is to estimate a regression for this
series in order to see if the series is stationary or not. Hence, any error introduced
in the first step is of course carried into the second step.
8.4 Interpreting the Error Correction Model
According to Asteriou & Hall (2011: p.360), the concepts of cointegration and the error
correction mechanism are very closely related. To understand the ECM, it is better to
think first of the ECM as a convenient reparameterization of the general linear
autoregressive distributed lag (ARDL) model [as shown in Section 7.2].
Consider the very simple dynamic ARDL(1,1) model describing the behavior of Yt in
terms of Xt as equation (49):
Yt = A0 + A1Yt-1 + B0Xt + B1Xt-1 + ut (49)
where ut ~ iid(0, σ²).
[That means we implicitly assume that ut is white noise, i.e., the ARDL(1,1) is a
correctly specified model].
In this model22, the parameter B0 denotes the short-run reaction of Yt after a change in Xt [Eq.(50)]. The long-run effect is given when the model is in equilibrium, where:
Y* = α + βX* (54)
Recall that the long-run effect (either slope or elasticity) between Yt and Xt is captured by β = (B0 + B1)/(1 - A1) [Eq.(51)]. Note that we need to assume |A1| < 1 (why?) in order for the short-run model to converge to a long-run solution.
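As a small illustration, the long-run coefficient can be recovered directly after estimating Eq.(49) by OLS; a minimal sketch (hypothetical variables y and x on tsset data):
regress y L.y x L.x                          /* ARDL(1,1): A1 = _b[L.y], B0 = _b[x], B1 = _b[L.x] */
nlcom (_b[x] + _b[L.x]) / (1 - _b[L.y])      /* beta = (B0 + B1)/(1 - A1) with a delta-method standard error */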
The ECM is shown in equation (57a or 57b):
∆Yt = B0∆Xt - λ[Yt-1 - α - βXt-1] + ut (57a)
or
∆Yt = B0∆Xt - λECTt-1 + ut (57b)
where λ = (1 - A1).
According to Asteriou & Hall (2011: p.361), what is of importance here is that when
the two variables Yt and Xt are cointegrated, the ECM incorporates not only short-run
but also long-run effects. This is because the long-run equilibrium [Yt - 1 – α – βXt - 1] is
included in the model together with the short-run dynamics captured by the differenced
term. Another important advantage is that all the terms in the ECM model are stationary
and the standard OLS estimation is therefore valid. This is because if Yt and Xt are I(1),
then ∆Yt and ∆Xt are I(0), and by definition if Yt and Xt are cointegrated then their
linear combination [ut-1 = Yt-1 – α – βXt - 1] ~ I(0).
A final important point is that the coefficient λ = (1 - A1) provides us with information about the speed of adjustment in cases of disequilibrium.
Note again that the value of the coefficient λ depends on A1 and on the X variable(s) included in the ARDL model [the bias problem due to omitted X variable(s)]. To understand this better, consider the long-run condition. When equilibrium holds, [Yt-1 - α - βXt-1] = 0. However, during periods of disequilibrium this term is no longer zero and measures the distance of the system from its equilibrium state. For example, suppose that, due to a series of negative shocks in the economy in period t - 1, [Yt-1 - α - βXt-1] becomes negative because Yt-1 has moved below its long-run equilibrium path. Since λ = (1 - A1) is positive (why?), the product -λût-1 is positive, so the overall effect (with the assumption that the short-run effect of Xt on Yt is unchanged) is to boost ∆Yt back towards its long-run path [i.e., Yt = Yt-1 + ∆Yt]. Again, notice that the speed of this adjustment to equilibrium depends upon the magnitude of λ = (1 - A1).
22 We can easily expand this model to a more general case with larger numbers of lagged terms [ARDL(p,q)], as shown in Eq.(55) and Eq.(56).
The coefficient λ in equations (57a or 57b) is the error-correction coefficient and is also called the adjustment coefficient. In fact, λ tells us how much of the adjustment to equilibrium takes place each period [say month, quarter, or year, depending on the original data], or how much of the equilibrium error is corrected each period. According to Asteriou & Hall (2011: p.363), it can be interpreted in the following ways:
(1) If λ ≈ 1, then nearly 100% of the adjustment takes place within the period23, or the adjustment is very fast [i.e., Xt and its lags are key determinants of Yt].
(2) If λ ≈ 0.5, then about 50% of the adjustment takes place each period.
(3) If λ ≈ 0, then there is no adjustment [i.e., Xt and its lags do not determine Yt at all; Yt purely follows an AR mechanism].
According to Asteriou & Hall (2011: p.359-60), the ECM is important and popular for
many reasons, such as:
(1) It is a convenient model for measuring the correction from the disequilibrium of the previous period, and it has a very good economic interpretation.
(2) If we have cointegration, ECM models are formulated in terms of first differences, which typically eliminates trends from the variables involved; they therefore resolve the problem of spurious regressions.
(3) A very important advantage of ECM models is the ease with which they can fit into the general-to-specific (or Hendry) approach to econometric modelling, which is a search for the best ECM model that fits the given data sets.
(4) The most important feature of the ECM comes from the fact that the disequilibrium error term is a stationary variable. Because of this, the ECM has an important implication: the fact that the two variables are cointegrated implies that there is some automatic adjustment process which prevents the errors in the long-run relationship from becoming larger and larger.
23 Again, this depends on the kind of data used, say, annual, quarterly, or monthly.
Figure 8.2: Long-run relationship and short-run deviations between LPDI and LPCE.
The long-run relationship between LPCE and LPDI is given by the following equation (see the cointegrating regression in Table 8.4; coefficients rounded):
LPCEt = 1.67 + 0.77LPDIt + 0.0024Time + ut (60)
The lagged residuals obtained from Eq.(60) are then included in the error correction model as a regressor for the change in LPCE at the current time (∆LPCEt). The Stata commands are as follows:
use "D:\My Blog\Time series econometrics for beginners\Table14_1.dta", clear
tsset time
regress lnpce lnpdi time
predict S1, resid
regress D.lnpce D.lnpdi L.S1
All coefficients in the table are individually statistically significant at the 6% level or lower. The coefficient of about 0.31 shows that a 1% increase in PDI [i.e., in ∆ln(PDIt)] leads on average to a 0.31% increase in PCE [∆ln(PCEt)]. This is the short-run consumption-income elasticity. The long-run value is given by the cointegrating regression (Table 8.4), which is about 0.77.
The coefficient of the error-correction term, about -0.065, suggests that about 6.5% of the discrepancy between the long-run and short-run values of LPCE is corrected within a quarter (quarterly data), suggesting a slow rate of adjustment to equilibrium. Gujarati (2011, p.233) notes that one reason the rate of adjustment seems low is that our model is rather simple. If we had the necessary data on interest rates, consumer wealth, and so on, we might have seen a different result. In addition, we might expect that LPCE strongly follows its own AR mechanism.
Therefore, the ECM is presented as the following equation (coefficients rounded from the regression output above):
∆LPCEt = 0.0055 + 0.306∆LPDIt - 0.065ût-1
This equation postulates that changes in LPCE depend on changes in LPDI and on the lagged estimated equilibrium error term, ût-1. If this error term is zero, there is no disequilibrium in the cointegrating relationship [no error term, Eq.(60)]. But if the equilibrium error term is nonzero, the relationship between LPCE and LPDI is out of equilibrium (Gujarati, 2011: p.232).
Suppose that ∆LPDI = 0 (no change in LPDI) and ût-1 is positive. This means LPCEt-1 is too high to be in equilibrium, that is, LPCEt-1 is above its equilibrium value [= 1.67 + 0.77LPDIt-1 + 0.0024(Time - 1)]. Therefore, the product -0.065ût-1 is negative, and ∆LPCEt will be negative to restore the equilibrium. That is, if LPCEt-1 is above its equilibrium value, it will start falling in period t to correct the equilibrium error. By the same token, if LPCEt-1 is below its equilibrium value [= 1.67 + 0.77LPDIt-1 + 0.0024(Time - 1)], then ût-1 is negative, the product -0.065ût-1 is positive, and ∆LPCEt will be positive to restore the equilibrium. That is, if LPCEt-1 is below its equilibrium value, it will start rising in period t to correct the equilibrium error.
How about our current example? Let us list the actual values of LPCE, LPDI, and ût-1 for some periods. Here are the Stata commands:
use "D:\My Blog\Time series econometrics for beginners\Table14_1.dta", clear
tsset time
regress lnpce lnpdi time
predict S1, resid
regress D.lnpce D.lnpdi L.S1
predict D_lpce                /* fitted values of D.lnpce from the ECM */
ereturn list
matrix b = e(b)               /* coefficient vector of the ECM */
matrix list b
scalar b1 = b[1,3]            /* constant */
scalar b2 = b[1,1]            /* coefficient of D.lnpdi */
scalar b3 = b[1,2]            /* coefficient of L.S1 */
gen A1 = b1
gen A2 = b2
gen A3 = b3
rename lnpce lpce
rename lnpdi lpdi
list A1 A2 D.lpdi A3 S1 D_lpce in 152/156
Table 8.7: ∆LPCEt due to ∆LPDIt and ût-1 from Eqs.(61, 62).
For example, at observation 155, the actual LPCE is below its long-run equilibrium (S1 < 0). From Table 8.7, we have ∆LPCÊ156 = 0.0055 + 0.306*0.0082 + 0.065*0.024 ≈ 0.00845. Therefore, LPCÊ156 = LPCE155 + 0.00845, which indicates that LPCE is rising to restore the equilibrium (although the rate of this adjustment is very slow).
des
label variable tb3 "3-month treasury bill rate"
label variable tb6 "6-month treasury bill rate"
set obs 349
gen month = ym(1981, 1) + _n
format %tm month
tsset month
tsline tb3 || tsline tb6, legend(position(18) ring(0) rows(2)) ylabel(0 5 10 15 17) /*
*/ xtitle(" ")
Figure 8.3: Monthly three and six months Treasury Bill rates.
Figure 8.3 shows that the two series TB3 and TB6 move closely together, so we would expect the two rates to be cointegrated. In other words, there might be a stable equilibrium relationship between them, although each series exhibits a stochastic trend, I(1). To investigate their relationship further, we first test each series for stationarity. Using dfgls with trend, we find that the lag lengths based on the MAIC for TB3 and TB6 are 16 and 15, respectively. Given these results, we then apply the ADF tests with constant, trend, and these lag lengths, and find that both series are nonstationary in levels, i.e., I(1) (Tables 8.8 and 8.9).
Table 8.8: ADF test for stationarity of TB3 series.
Now let us find out whether the two series are cointegrated. Gujarati (2011: p.234) suggests the cointegrating equation with a quadratic trend, as presented in Table 8.10. From this regression, we obtain the residuals [denoted ECT], and then apply the dfgls and ADF tests for stationarity of this residual series. The ADF test is reported in Table 8.11. The unit root test results show that the residuals are stationary, i.e., the two series (TB6 and TB3) are cointegrated.
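A sketch of the commands implied by this paragraph is given below (the trend variables and the lag length of the residual test are assumptions; the exact choices follow Tables 8.10 and 8.11):
gen t = _n
gen t2 = t^2
regress tb6 tb3 t t2          /* cointegrating regression with quadratic trend (Table 8.10) */
predict ECT, resid
dfgls ECT                     /* choose the lag length for the residual-based test */
dfuller ECT, lags(1)          /* ADF test on the residuals (Table 8.11) */
regress D.tb6 D.tb3 L.ECT     /* error correction model (Table 8.12) */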
Table 8.10: Relationship between TB6 and TB3.
The ECM estimation is presented in Table 8.12. Since the TB rates are in percentage form, the findings suggest that if the 6-month TB rate was above the level implied by its equilibrium relationship with the 3-month TB rate in the previous month, about 20% of that discrepancy (the EC coefficient is about -0.2) is corrected this month, restoring the equilibrium relationship between the two series (Gujarati, 2011: p.234-5).
From the cointegrating regression given in Table 8.10, we see that after allowing for
deterministic trends, if the 3-month TB rate goes up by one percentage point, on average
the 6-month TB rate goes up by about 0.95 percentage point – a very close relationship
between the two. From the ECM model given in Table 8.12, we observe that in the short
run a one percentage point change in the 3-month TB rate leads on average to about
0.88 percentage point change in the 6-month TB rate, which shows how quickly the two
rates move together.
Table 8.12: Error correction model for TB6 and TB3.
We also try regressing the 3-month TB rate on the 6-month TB rate (Tables 8.13 and 8.14) and find similar results because our sample size is large. However, the results would be different if we were studying more than two series (Gujarati, 2011: p.235).
Table 8.13: Relationship between TB3 and TB6.
Table 8.14: Error correction model for TB3 and TB6.
24 Nobel prize in economics 2011.
25 Gujarati (2011, p.266) said that [from the point of view of forecasting] each equation in a VAR contains only its own lagged values and the lagged values of the other variables in the system. Similarly, Wooldridge (2003, p.620-1) said that whether the contemporaneous (current) value is included or not depends partly on the purpose of the equation. In forecasting, it is rarely included.
Xt, and simultaneously, Xt is affected by not only its lagged values but current and
lagged values of Yt. This simple bivariate VAR model [i.e., a system of two variables and one lagged value of each variable on the right-hand side: VAR(1)] is given by:
Yt = b10 - b12Xt + γ11Yt-1 + γ12Xt-1 + u1t (63)
Xt = b20 - b21Yt + γ21Yt-1 + γ22Xt-1 + u2t (64)
We assume that both Yt and Xt are stationary; and u1t and u2t are unrelated white-noise
error terms, which are called impulses or innovations or shocks in the language of VAR
(Gujarati, 2011, p.266; Gujarati & Porter, 2009: p.785). Note that a critical requirement
of VAR is that the time series under consideration are stationary (Gujarati, 2011: p.267).
Eq.(63) and Eq.(64) are ARDL(1,1) models, and they both constitute a first-order VAR
model [VAR(1)]. Also note that these equations are not reduced-form equations since
Yt has a contemporaneous impact on Xt (given by b21) and Xt has a contemporaneous impact on Yt (given by b12) (Asteriou & Hall, 2011: p.320). Based on Asteriou & Hall
(2011: p.320-1), rewriting the system using matrix algebra, we get:
[1, b12; b21, 1][Yt; Xt] = [b10; b20] + [γ11, γ12; γ21, γ22][Yt-1; Xt-1] + [u1t; u2t] (65)
or
BZt = Γ0 + Γ1Zt-1 + ut (66)
where
B = [1, b12; b21, 1], Zt = [Yt; Xt], Γ0 = [b10; b20], Γ1 = [γ11, γ12; γ21, γ22], and ut = [u1t; u2t]
[here and below, matrices are written row by row, with rows separated by semicolons]. Premultiplying both sides by B-1 [so that A0 = B-1Γ0, A1 = B-1Γ1, and et = B-1ut], we obtain the VAR in standard (reduced) form:
Zt = A0 + A1Zt-1 + et (67)
where
Zt = [Yt; Xt], A0 = [μ1; μ2], A1 = [a1, c1; b1, d1], and et = [e1t; e2t]
To distinguish between the original VAR model and the system we have just obtained, we call the first a structural26 or primitive VAR system and the second a VAR in standard (or reduced) form. It is important to note that the new error terms, e1t and e2t, are composites of the two shocks u1t and u2t. Since et = B-1ut, we can obtain e1t and e2t as follows:
e1t = (u1t - b12u2t)/(1 - b12b21)
e2t = (u2t - b21u1t)/(1 - b12b21)
Since u1t and u2t are white-noise processes, it follows that both e1t and e2t are also white-
noise processes.
Similarly, we can write the VAR(2) model as:
Zt = A0 + A1Zt-1 + A2Zt-2 + et
where
Zt = [Yt; Xt], A0 = [μ1; μ2], A1 = [a1, c1; b1, d1], A2 = [a2, c2; b2, d2], and et = [e1t; e2t]
or, more generally, the VAR(q) model as:
Zt = A0 + A1Zt-1 + A2Zt-2 + … + AqZt-q + et
where
Zt = [Yt; Xt], A0 = [μ1; μ2], A1 = [a1, c1; b1, d1], A2 = [a2, c2; b2, d2], …, Aq = [aq, cq; bq, dq], and et = [e1t; e2t]
26 See 'Using VARs for causal analysis' in Stock & Watson (2015: p.641-2).
The bivariate VAR often has the following features (Gujarati, 2011: p.266):
(1) The bivariate VAR resembles a simultaneous equation system, but the
fundamental difference between them is that each equation in VAR contains only
its own lagged values and the lagged values of the other variables in the system.
In other words, no current values of the two variables are included on the right-
hand side of these equations.
(2) Although the number of lagged values of each variable can be different, in most
cases we use the same number of lagged terms in each equation.
(3) The bivariate VAR system given above is known as a VAR(q) model, because
we have q lagged values of each variable on the right-hand side. If we have only
one lagged value of each variable on the right-hand side, it would be a VAR(1)
model; if two lagged terms, it would be a VAR(2) model; and so on.
(4) Although we are dealing with only two variables, the VAR system can be
extended to several variables.
(5) But if we consider several variables in the system with several lags for each variable, we will have to estimate many parameters, which is not a problem in our age of high-speed computers and sophisticated software, but the system quickly becomes unwieldy.
(6) In the two-variable VAR system above, there can be at most one cointegrating, or equilibrium, relationship between the variables. If we have a three-variable VAR system, there can be at most two cointegrating relationships between the three variables. In general, a k-variable VAR system can have at most (k - 1) cointegrating relationships. Note that finding out how many cointegrating relationships exist among k variables requires the use of Johansen's methodology.
Note that all variables have to be of the same order of integration. The following cases
are distinct:
(1) All the variables are I(0) (stationary): the standard case, i.e., a VAR in levels. In that case, we can estimate each equation by OLS. The VAR(q) system is defined as follows:
Zt = A0 + A1Zt - 1 + A2Zt - 2 + … + AqZt - q + et (75)
or
Yt = μ1 + a1Yt-1 + c1Xt-1 + a2Yt-2 + c2Xt-2 + … + aqYt-q + cqXt-q + e1t (76)
Xt = μ2 + b1Yt-1 + d1Xt-1 + b2Yt-2 + d2Xt-2 + … + bqYt-q + dqXt-q + e2t (77)
where
Zt = [Yt; Xt], A0 = [μ1; μ2], A1 = [a1, c1; b1, d1], A2 = [a2, c2; b2, d2], …, Aq = [aq, cq; bq, dq], and et = [e1t; e2t]
(2) All variables are I(1) but are not cointegrated: then we estimate a VAR using the first differences of the variables, which are now stationary. Here we can also use OLS to estimate each equation individually. However, we are only able to investigate the short-run relationships and causality directions among these variables. The VAR(p) system is defined as follows [note that p = q - 1]:
∆Zt = Γ0 + Γ1∆Zt-1 + Γ2∆Zt-2 + … + Γp∆Zt-p + vt (78)
or
∆Yt = μ1 + a1∆Yt-1 + c1∆Xt-1 + … + ap∆Yt-p + cp∆Xt-p + v1t (79)
∆Xt = μ2 + b1∆Yt-1 + d1∆Xt-1 + … + bp∆Yt-p + dp∆Xt-p + v2t (80)
where
∆Zt = [∆Yt; ∆Xt], Γ0 = [μ1; μ2], Γ1 = [a1, c1; b1, d1], Γ2 = [a2, c2; b2, d2], …, Γp = [ap, cp; bp, dp], and vt = [v1t; v2t]
(3) All variables are I(1) but are cointegrated: then we have to use the error correction mechanism (ECM). However, since we are dealing with more than one equation in a VAR system, the multivariate counterpart of the ECM, known as the vector error correction model (VECM), is used. The VECM is just a special case of the VAR for variables that are stationary in their first differences. In addition, the VECM can also take into account any cointegrating relationships among the variables (Adkins & Hill, 2011: p.407). The VECM is defined as follows:
∆Zt = Γ0 + Γ1∆Zt-1 + Γ2∆Zt-2 + … + Γp∆Zt-p + ΠZt-1 + vt (81)
or
∆Yt = μ1 + a1∆Yt-1 + c1∆Xt-1 + … + ap∆Yt-p + cp∆Xt-p + e1Yt-1 + g1Xt-1 + v1t (82)
∆Xt = μ2 + b1∆Yt-1 + d1∆Xt-1 + … + bp∆Yt-p + dp∆Xt-p + f1Yt-1 + h1Xt-1 + v2t (83)
where
∆Zt = [∆Yt; ∆Xt], Γ0 = [μ1; μ2], Γ1 = [a1, c1; b1, d1], Γ2 = [a2, c2; b2, d2], …, Γp = [ap, cp; bp, dp],
Π = αβ' = [e1, g1; f1, h1], Zt-1 = [Yt-1; Xt-1], and vt = [v1t; v2t]
We can decompose Π = αβ', where α is the speed-of-adjustment-to-equilibrium coefficient and β' is the matrix of long-run coefficients. In the next section, we will discuss VECM models in detail for the case of more than one cointegrating equation.
According to Asteriou & Hall (2011: p.321) and Gujarati & Porter (2009: p.788), the
VAR model has some good characteristics.
• First, it is very simple because we do not have to worry about which variables
are endogenous or exogenous.
• Second, estimation is also very simple, in the sense that each equation can be
estimated with the usual OLS method separately.
• Third, forecasts obtained from VAR models are in most cases better than those
obtained from the far more complex simultaneous equation models.
• Fourth, besides forecasting, VAR models also provide a framework for causality tests, which will be presented shortly in Section 11.
According to Asteriou & Hall (2011: p.321-2), VAR models have been criticized on the following grounds.
• First, they are a-theoretic, since they are not based on any economic theory. Since initially there are no restrictions on any of the parameters under estimation, in effect 'everything causes everything'. However, statistical inference is often used on the estimated models, so that some coefficients that appear to be insignificant can be dropped, in order to arrive at models that might have an underlying consistent theory. Such inference is normally carried out using what are called causality tests.
• Second, they are criticized due to the loss of degrees of freedom. Thus, if the
sample size is not sufficiently large, estimating that large a number of
parameters, say, a three-variable VAR model with 12 lags for each, will consume
many degrees of freedom, creating problems in estimation.
• Third, the obtained coefficients of the VAR models are difficult to interpret since
they totally lack any theoretical background.
Gujarati & Porter (2009: p.788-9) add some other aspects:
• Because of their emphasis on forecasting, VAR models are less suited for policy analysis.
• In an m-variable VAR model, all the m variables should be (jointly) stationary. If that is not the case, we will have to transform the data appropriately (e.g., by first-differencing), but the results from the transformed data may be unsatisfactory.
9.2 Estimating VAR Models in Stata
It is important to remember that a VAR model is used where there is no cointegration
among the variables and it is estimated using time series that have been transformed to
their stationary values. In other words, all variables in a VAR system must be stationary.
In Stata, the command for estimating a VAR model is27:
varbasic endvariables, lags(#/#)
where endvariables is simply the names of the endogenous variables in the model, and after lags the number of lags is specified by stating the first and the last lag numbers in the parentheses. For example, suppose we have two stationary variables Yt and Xt [i.e., both are I(0)] and the optimal lag length is 4; then we have:
varbasic Yt Xt, lags(1/4)
Note that the optimal lag length is determined by using information criteria such as AIC,
SIC, etc., as we will see in the following examples.
27 Stata has two commands for fitting reduced-form VARs: var and varbasic. var allows for constraints to be imposed on the coefficients, while varbasic allows you to fit a simple VAR quickly without constraints and graph the IRFs (StataCorp, 2017b: var intro).
Figure 9.1: Personal consumption expenditure and disposable income in the U.S.
Figure 9.2: First differences of personal consumption expenditure and disposable income in the U.S.
The Stata commands are as follows:
use "D:\My Blog\s4poe_statadata\consumption.dta", clear
gen date =q(1960q1)+_n-1
format %tq date
gen Y = log(inc)
gen C = log(cons)
tsset date
tsline C Y, legend(lab (1 "ln(Consumption") lab(2 "ln(Income"))
tsline D.C D.Y, legend(lab (1 "D.ln(Consumption") lab(2 "D.ln(Income"))
dfgls C, trend
dfuller C, trend lags(3)
dfgls Y, trend
dfuller Y, trend lags(1)
regress C Y time
predict ehat, resid
tsline ehat
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.C D.Y
varbasic D.C D.Y, lag(1/1) step(12) nograph
28 See StataCorp (2017b: varsoc). Because fitting a VAR of the correct order can be important, varsoc offers several methods for choosing the lag order p of the VAR to fit. After fitting a VAR, and before proceeding with inference, interpretation, or forecasting, checking that the VAR fits the data is important. varlmar can be used to check for autocorrelation in the disturbances. varwle performs Wald tests to determine whether certain lags can be excluded. varnorm tests the null hypothesis that the disturbances are normally distributed (StataCorp, 2017b: var intro).
Table 9.1: dfgls test for selecting the optimal lag length of consumption.
Table 9.3: dfgls test for selecting the optimal lag length of income.
Table 9.5: Regressing C on Y and deterministic trend.
Table 9.6: dfgls test for selecting the optimal lag length of residual.
Table 9.7: ADF test for stationarity of residual from Table 9.5.
General comments from the graphs and tables above are as follows. First, both the C and Y series are I(1). Second, the relationship between C and Y is spurious because the residual obtained from the regression of C on Y is not stationary. In other words, there is no cointegration between C and Y29. As a result, we estimate the coefficients of the model using a VAR in differences instead of a VECM model. Before estimating a VAR model in differences, we should select the optimal lag length for such a VAR model. Table 9.8 indicates that the optimal lag length is 1.
29 Note that a similar example on the relationship between consumption expenditure and income for the period 1970 - 2008 by Gujarati (2011, p.229-31) concludes that there is a cointegrating relationship between consumption expenditure and disposable income [see Sections 8.2 and 8.5].
9.2.2 Relationship between money supply and interest rate
This example is based on Gujarati & Porter (2009: p.785-7) with some modifications. Specifically, Gujarati & Porter used the 'levels' in their VAR models, although these series are not stationary. Therefore, we make some modifications following the 4-step procedure used in the previous example. The dataset Table17_1.dta includes quarterly data on 4 variables: M1 (money supply), R (interest rate), P (inflation), and GDP from the first quarter of 1979 to the fourth quarter of 1988. However, we use only 36 observations for the analysis because we want to compare predicted values with actual values in 1988 for forecasting purposes. The Stata commands are listed below [note that not all graphs and tables are presented, for reasons of space]:
use "D:\My Blog\Time series econometrics for beginners\Table17_5.dta", clear
gen date =q(1979q1)+_n-1
format %tq date
tsset date
dfgls m, trend
dfuller m, trend lags(1)
dfgls r, trend
dfuller r, trend lags(1)
regress m r
estat dwatson
predict ehat, resid
tsline ehat, xtitle(" ") ytitle("Residuals from regression of M1 on R")
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.m D.r
varbasic D.m D.r, lag(1/1) step(12) nograph
varbasic D.m D.r, lag(1/2) step(12) nograph
varbasic D.m D.r, lag(1/4) step(12) nograph
The test results show that both series, money supply (M) and interest rate (R), are I(1) and not cointegrated. As a result, we estimate VAR models in first differences instead of VECM models. Based on information criteria such as AIC, SIC, etc., we eventually select 1 as the optimal lag length in the final VAR model [Table 9.11]. We also try other lag lengths such as 2 and 4, but these models are not as good as the model with lag length 1 [judged by the information criteria and by the significance of the higher-lag coefficients].
To produce forecasts [Table 9.12], we can use the command 'fcast compute' in Stata immediately after estimating a VAR model. A list of commands is as follows:
varbasic D.m D.r in 1/36, lag(1/1) step(12) nograph   /* estimate the VAR on the first 36 observations */
fcast compute f_, step(4)                             /* forecasts of D.m and D.r, 4 steps ahead */
gen LM = L1.m
gen f_m = LM + f_D_m                                  /* convert the forecast of D.m back to the level of m */
gen LR = L1.r
gen f_r = LR + f_D_r                                  /* convert the forecast of D.r back to the level of r */
list m f_m r f_r in 37/40                             /* compare actual and forecast values in 1988 */
10. VECM AND JOHANSEN METHOD OF COINTEGRATION
10.1 VECM
As mentioned in Section 8.3, when there are more than two variables in the model, it is possible to have more than one cointegrating relationship. Generally, in a model with k variables, there can be at most (k - 1) cointegrating vectors. In this case, the EG single-equation approach cannot be applied, and we have to use the Johansen approach for multiple equations.
In this section, we extend the single-equation error correction model to a multivariate one. Let us assume that we have three variables, Yt, Xt and Wt, which can all be endogenous. Using the matrix notation Zt = [Yt; Xt; Wt], the VAR(q) model is
Zt = A0 + A1Zt-1 + A2Zt-2 + … + AqZt-q + et
where
A0 = [μ1; μ2; μ3], A1 = [a1, d1, g1; b1, e1, h1; c1, f1, k1], A2 = [a2, d2, g2; b2, e2, h2; c2, f2, k2], …, Aq = [aq, dq, gq; bq, eq, hq; cq, fq, kq], and et = [e1t; e2t; e3t]
Suppose that all variables in the VAR model above are I(1) and that there are two cointegrating relationships. Similar to the ECM in the single-equation case, we then have a counterpart of the ECM for multiple equations. The simplest form of a VECM(p) is:
∆Zt = Γ1∆Zt-1 + Γ2∆Zt-2 + … + Γp∆Zt-p + ΠZt-1 + vt (87)
or
∆Yt = a1∆Yt-1 + d1∆Xt-1 + g1∆Wt-1 + … + ap∆Yt-p + dp∆Xt-p + gp∆Wt-p + π1Zt-1 + v1t (88)
∆Xt = b1∆Yt-1 + e1∆Xt-1 + h1∆Wt-1 + … + bp∆Yt-p + ep∆Xt-p + hp∆Wt-p + π2Zt-1 + v2t (89)
∆Wt = c1∆Yt-1 + f1∆Xt-1 + k1∆Wt-1 + … + cp∆Yt-p + fp∆Xt-p + kp∆Wt-p + π3Zt-1 + v3t (90)
where
∆Zt = [∆Yt; ∆Xt; ∆Wt], Γ1 = [a1, d1, g1; b1, e1, h1; c1, f1, k1], Γ2 = [a2, d2, g2; b2, e2, h2; c2, f2, k2], …, Γp = [ap, dp, gp; bp, ep, hp; cp, fp, kp], Zt-1 = [Yt-1; Xt-1; Wt-1], and vt = [v1t; v2t; v3t]
Important note: the VECM includes one fewer lag of the first differences than the original VAR. Therefore, we replace q by p [i.e., p = q - 1]. Also note that, for simplicity, we denote the elements of Γ1, …, Γp in the same way as those of A1, …, Aq, but they are of course different quantities. In addition, there may be constant and trend terms in both the VAR and the cointegrating equations [i.e., in β'Zt-1]. We will discuss these terms when presenting the Johansen approach.
The matrix Π contains the information regarding the long-run relationships. We can decompose Π = αβ', where α contains the speed-of-adjustment-to-equilibrium coefficients and β' is the matrix of long-run coefficients. The matrix Π is defined as follows:
Π = [π1; π2; π3] = αβ' = [α11, α12; α21, α22; α31, α32][β11, β21, β31; β12, β22, β32]
Therefore, the β'Zt-1 term is equivalent to the error correction term [Yt-1 - α - βXt-1] in the single-equation case, except that now β'Zt-1 contains up to (k - 1) cointegrating vectors in the multivariate framework.
Let us now analyze only the error correction part of the first equation [Eq.(88), i.e., for ∆Yt on the left-hand side], which gives:
π1Zt-1 = [α11β11 + α12β12, α11β21 + α12β22, α11β31 + α12β32][Yt-1; Xt-1; Wt-1] (91)
which clearly shows the two cointegrating vectors with their respective speed of adjustment terms α11 and α12.
10.2 Advantages of the Multiple-Equation Approach
According to Asteriou & Hall (2011: p.369-70), the multiple-equation approach has the
following advantages over the single-equation approach:
(1) From the multiple-equation approach, we can obtain estimates for both cointegrating vectors [Eq.(92)], while with the single-equation approach we have only a linear combination of the two long-run relationships.
(2) Even if there is only one cointegrating relationship [for example, only the first one in Eq.(92)] rather than two, with the multiple-equation approach we can calculate all three differing speed of adjustment coefficients (α11, α21, α31).
(3) Only when α21 = α31 = 0, and only one cointegrating relationship exists, can we say that the multiple-equation method is the same as (reduces to) the single-equation approach, and that therefore there is no loss from not modelling the determinants of ∆Xt and ∆Wt. It is worth mentioning that α21 = α31 = 0 is equivalent to Xt and Wt being weakly exogenous.
In a nutshell, only when all right-hand variables in a single equation are weakly exogenous does the single-equation approach provide the same result as the multiple-equation approach.
As in the two-variable case, the appropriate model depends on the order of integration of the variables. The following cases are distinct:
(1) All the variables are I(0) (stationary): we estimate the VAR(q) in levels,
Zt = A0 + A1Zt-1 + A2Zt-2 + … + AqZt-q + et
or
Yt = μ1 + a1Yt-1 + d1Xt-1 + g1Wt-1 + … + aqYt-q + dqXt-q + gqWt-q + e1t (84)
Xt = μ2 + b1Yt-1 + e1Xt-1 + h1Wt-1 + … + bqYt-q + eqXt-q + hqWt-q + e2t (85)
Wt = μ3 + c1Yt-1 + f1Xt-1 + k1Wt-1 + … + cqYt-q + fqXt-q + kqWt-q + e3t (86)
where
Zt = [Yt; Xt; Wt], A0 = [μ1; μ2; μ3], A1 = [a1, d1, g1; b1, e1, h1; c1, f1, k1], A2 = [a2, d2, g2; b2, e2, h2; c2, f2, k2], …, Aq = [aq, dq, gq; bq, eq, hq; cq, fq, kq], and et = [e1t; e2t; e3t]
(2) All variables are I(1) but are not cointegrated: we estimate the VAR(p) in first differences,
∆Zt = Γ1∆Zt-1 + Γ2∆Zt-2 + … + Γp∆Zt-p + vt
where
∆Zt = [∆Yt; ∆Xt; ∆Wt], Γ1 = [a1, d1, g1; b1, e1, h1; c1, f1, k1], Γ2 = [a2, d2, g2; b2, e2, h2; c2, f2, k2], …, Γp = [ap, dp, gp; bp, ep, hp; cp, fp, kp], and vt = [v1t; v2t; v3t]
Note: The VAR(p) model may also include a constant and a trend variable, i.e., a Γ0 term.
(3) All variables are I(1) but are cointegrated, i.e., there exist up to (k - 1) [= 2 in the current case] cointegrating relationships of the form β'Zt-1 ~ I(0). In this particular case, r ≤ (k - 1) cointegrating vectors exist in β. This simply means that r columns of β form r linearly independent combinations of the variables in Zt, each of which is stationary. Here, we have to use the vector error correction mechanism (VECM) as defined in Eq.(87).
In terms of the rank of the matrix Π, the above cases are summarized in Table 10.1.
Table 10.1: Rank of the matrix Π and its implications
Rank of Π | Implications
r = k | All variables in Zt are stationary, i.e., I(0). We say that Π has full rank. There is no need to estimate the model as a VECM; a VAR on the untransformed data (levels) is well behaved.
which is I(1). For example, if inflation rate is I(0), we might
expect that CPI is I(1). Similarly, if we face a mix of I(1) and
I(2), we can select another proxy of I(2) variable which is I(1).
For example, if GDP is I(2), we might expect that GDP growth
rate is I(1)30.
30 Most macroeconomic flows and stocks such as output and employment are I(1). An I(2) series is growing at an ever-increasing rate, such as price level data. Series that are I(3) or greater are extremely unusual, but they do exist, for example the money stocks or price levels in hyperinflationary economies (Greene, 2008: p.740).
both models. The general case of the VECM, including all the various options that can possibly occur, is given by the following equation:
∆Zt = Γ1∆Zt-1 + … + Γp∆Zt-p + α(β'Zt-1 + ρ0 + ρ1t) + γ0 + γ1t + vt (97)
where ρ0 and ρ1t denote a constant and a trend restricted to the cointegrating equations (CE), and γ0 and γ1t denote a constant and a trend in the VAR part.
In general, five distinct models can be considered. Although the first and the fifth model are not likely to occur, we present all of them for completeness.
Model 5: Intercept and quadratic trend in the CE, intercept and linear trend in the VAR.
∆Zt = Γ1∆Zt-1 + … + Γp∆Zt-p + α(β'Zt-1 + ρ0 + ρ1t + ρ2t²) + γ0 + γ1t + vt (102)
that the null hypothesis of no cointegration is not rejected (Asteriou & Hall: p.373).
λmax(r, r + 1) = -T ln(1 - λ̂r+1) (103)
displayed critical value. [In other words, if test statistic > critical
value, we reject H0]. Critical values for both statistics are
provided by Johansen and Juselius (1990). These critical values
are directly provided from Stata after conducting a cointegration
test.
The Stata command for the Johansen cointegration test is:
vecrank varnames, trend() lags(#) [max]
where in varnames we type the names of the variables (in levels) to be tested for cointegration. Through the trend() option we specify the different models discussed in the theory. The options for each case (models 1-5) are as follows:
Model 1: trend(none)
Model 2: trend(rconstant)
Model 3: trend(constant)
Model 4: trend(rtrend)
Model 5: trend(trend)
For example, suppose we want to test for cointegration between two variables (say, y and x) using the third model; the command is:
vecrank y x, max trend(constant) lags(2)
where the max option asks Stata to show both the maximum eigenvalue and the trace statistics (if max is omitted, Stata reports only the trace statistic). Also, lags(#) determines the number of lags to be used in the test.
If it appears that there is cointegration, the command:
vec varnames, trend() lags(#) rank(#)
provides the VECM estimation results. The options are the same as above, with rank(#) giving the number of cointegrating equations. So, the command:
vec y x, trend(trend) lags(3)
yields VECM results for the variables y and x with three lags in the underlying VAR (i.e., two lagged short-run difference terms), when the cointegrating equation has been specified according to the fifth model.
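After fitting a VECM with vec, it is also good practice to check the model before interpreting it; a minimal sketch of the standard postestimation commands (run immediately after vec):
veclmar, mlag(4)      /* LM test for residual autocorrelation */
vecnorm               /* test for normality of the disturbances */
vecstable, graph      /* check the eigenvalue stability condition */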
To illustrate, we use the following example, which is based on a group assignment for an advanced econometrics course in 2012, School of Social Sciences, Wageningen University, the Netherlands31. In this example, we use the dataset texashousing.dta with monthly housing prices in four major cities in Texas (USA): Austin, Dallas, Houston and San Antonio. Natural logarithms of housing prices are available from January 1990 to December 2003 (168 observations). It is expected that there are regional linkages between these housing markets: if houses get very expensive in one city, people may decide to move to another city, creating upward pressure on housing prices in all cities. In other words, it is assumed that there exists a long-run (spatial) equilibrium between these four housing price series. That is what we will investigate here.
31 More exactly, this is an example in StataCorp (2017b: vec intro - VECM estimation in Stata).
Table 10.4: Unit root tests of four housing prices.
From Table 10.4 we realize that all housing prices in these cities are integrated of the
same order one, i.e., I(1). Therefore, there could be cointegrating relationships among
these housing prices.
Table 10.6: Model 2 results.
Table 10.9: The Pantula principle test results.
The results are divided into two parts: Table 10.10 presents the short-run relationships
and speed of adjustment coefficients, and Table 10.11 presents the long-run
relationships among four variables. Again, note that the VECM model has one fewer
lag of the first differences.
32 Note that before estimating the parameters of a VECM model, you must choose the number of lags in the underlying VAR, the trend specification, and the number of cointegrating equations. vecrank offers several ways of determining the number of cointegrating vectors conditional on a trend specification and lag order (StataCorp, 2017b: vecrank).
Table 10.10: Short-run relationships among variables.
Table 10.11: Long-run relationships among variables33.
From the cointegrating equation results (based on the significance of the estimated coefficients), we see that there are two long-run cointegrating relationships, between/among the house prices of: (i) Austin and San Antonio; and (ii) Dallas, Houston, and San Antonio.
The speed of adjustment parameters in the VECM are derived from Table 10.10 and presented in Table 10.12. From Tables 10.11 and 10.12, we can write the two cointegrating vectors with their respective speed of adjustment terms for each equation in the VECM as follows:
33 Note that the coefficient of houston in the first cointegrating equation (_ce1) is not statistically significant. We can refit the model with the Johansen normalization and the overidentifying constraint that the coefficient on houston in the first cointegrating equation is zero [see StataCorp (2017b: vec intro - VECM estimation in Stata) to learn this command].
For Austin:
-0.154(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) - 0.025(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)
For Dallas:
0.071(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) + 0.612(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)
For Houston:
0.188(Austint-1 - 0.267Houstont-1 - 1.235SAt-1 + 5.546) - 0.302(Dallast-1 - 1.094Houstont-1 + 0.286SAt-1 - 2.343)
34 To create a separate table of adjustment parameters only, we can replay the results by specifying the 'alpha' option plus nobtable noetable [the command is vec, alpha nobtable noetable]. See StataCorp (2017b: vec - example 2).
There are some notes:
▪ For Austin: the adjustment parameter on the second cointegrating relation is not significant because Austin is omitted from this relation (i.e., _ce2 in the cointegrating equations).
▪ For Dallas: the adjustment parameter on the first cointegrating relation is not significant because Dallas is omitted from this relation (i.e., _ce1 in the cointegrating equations).
▪ For Houston: both adjustment parameters are highly significant because Houston appears in both relations (i.e., _ce1 and _ce2 in the cointegrating equations).
▪ For San Antonio: the adjustment parameter on the second cointegrating relation is not significant (although San Antonio is included in both cointegrating equations), possibly because of the lag selection. For instance, when we change from lags(3) to lags(4), both adjustment parameters become significant at the 5% significance level.
You can try with other model specifications such as model 3, model 4, and/or different
lags based on other information criteria such as SIC. The Stata commands for this
example are as follows:
use "D:\My Blog\Time series econometrics for beginners\texashousing.dta", clear
tsset t
tsline D.austin D.dallas D.houston D.sa
dfgls austin
dfuller austin, lag(4)
dfgls dallas, trend
dfuller dallas, trend lag(11)
dfgls houston, trend
dfuller houston, trend lag(11)
dfgls sa
dfuller sa, lag(12)
dfgls D.austin
dfuller D.austin, lag(13)
dfgls D.dallas
dfuller D.dallas, lag(1)
dfgls D.houston
dfuller D.houston, lag(13)
dfgls D.sa
dfuller D.sa, lag(13)
varsoc austin dallas houston sa
vecrank austin dallas houston sa, trend(none) lag(3) /* Model 1 */
vecrank austin dallas houston sa, trend(rconstant) lag(3) /* Model 2 */
vecrank austin dallas houston sa, trend(constant) lag(3) /* Model 3 */
vecrank austin dallas houston sa, trend(rtrend) lag(3) /* Model 4 */
vecrank austin dallas houston sa, trend(trend) lag(3) /* Model 5 */
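The vec command that produces results such as Tables 10.10 and 10.11 would look roughly as follows; the rank and trend specification here are assumptions based on the two cointegrating relationships and the Pantula principle results reported in the text:
vec austin dallas houston sa, lags(3) rank(2) trend(rconstant)   /* estimate the VECM */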
There are two causality testing approaches, namely the Granger causality test and the Sims causality test. However, given its popularity in practical applications, we concentrate only on the test procedures for the Granger causality test.
11.1 The Standard Granger Causality Test
The standard Granger causality test for the case of two stationary variables, say, Yt and Xt, involves as a first step the estimation of the following (reduced-form) VAR model:
Yt = a1 + ∑(i=1 to q) βiXt-i + ∑(i=1 to q) γiYt-i + u1t (105)
Xt = a2 + ∑(i=1 to q) θiXt-i + ∑(i=1 to q) δiYt-i + u2t (106)
where it is assumed that both u1t and u2t are uncorrelated white-noise error terms [i.e., a well-specified model: no autocorrelation, no heteroskedasticity35, no omission of important lagged variables, etc.], and, importantly, that Yt and Xt are stationary. We also assume that the lag lengths in both equations are the same (i.e., q), although they might differ. In addition, both equations (105) and (106) might include other exogenous variables such as a linear trend, quadratic trend, and so on. In this model, we can have the following different cases:
following different cases:
Case 1 The lagged X terms in equation (105) are statistically different from zero as
a group, and the lagged Y terms in equation (106) are not statistically
different from zero. In this case, we have that Xt causes Yt.
Case 2 The lagged Y terms in equation (106) are statistically different from zero as
a group, and the lagged X terms in equation (105) are not statistically
different from zero. In this case, we have that Yt causes Xt.
Case 3 Both sets of lagged X and lagged Y terms are statistically different from zero
as a group in equations (105) and (106), so that we have bidirectional
causality between Yt and Xt.
Case 4 Both sets of lagged X and lagged Y terms are not statistically different from
zero in equations (105) and (106), so that Xt is independent of Yt.
The Granger causality test involves the following procedure. First, estimate the VAR model given by equations (105) and (106). Then check the significance of the coefficients and apply variable deletion tests, first on the lagged X terms in equation (105), and then on the lagged Y terms in equation (106). According to the results of the variable deletion tests, we may conclude about the direction of causality based upon the four cases mentioned above.
35 Many cases of heteroskedasticity in time series data involve an error term with a variance that tends to increase with time. That kind of heteroskedastic error term is also nonstationary (Studenmund, 2017: p.377).
More analytically, for the case of one equation [we will examine equation (105); it is straightforward to reverse the procedure in order to test equation (106)], we perform the following steps (Asteriou & Hall: p.323-4):
Step 1 Regress Yt on its own lagged terms only,
Yt = a1 + ∑(i=1 to q) γiYt-i + u1t
and obtain the RSS of this regression (which is the restricted one) and label it as RSSR.
Step 2 Regress Yt on its own lagged terms plus the lagged X terms, as in equation (105), and obtain the RSS of this regression (which is the unrestricted one) and label it as RSSU.
Step 3 Set the null hypothesis H0: β1 = β2 = … = βq = 0, i.e., the lagged X terms do not belong in the regression (Xt does not Granger-cause Yt).
Step 4 Calculate the F statistic for the normal Wald test on coefficient restrictions given by:
F = [(RSSR - RSSU)/q] / [RSSU/(n - k)]
where n is the number of observations and k is the number of parameters in the unrestricted regression.
Step 5 If the computed F value exceeds the critical F value, reject the null hypothesis and conclude that Xt causes Yt.
We then repeat the same test procedure for equation (106). Note that if both Yt and Xt are nonstationary and not cointegrated, the standard Granger causality test is applied to the first differences of the nonstationary variables, i.e., ∆Yt, ∆Xt, and their corresponding lags [with p = q - 1 in this case].
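In Stata, the variable-deletion test can be carried out with a single regression followed by a Wald test; a minimal sketch for Eq.(105) with q = 2 (hypothetical stationary variables y and x on tsset data):
regress y L(1/2).y L(1/2).x   /* unrestricted regression for equation (105) */
test L1.x L2.x                /* H0: the lagged x terms are jointly zero (x does not Granger-cause y) */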
11.3 The Augmented Granger Causality Test
The augmented Granger causality test for the case of two nonstationary but cointegrated variables, say, Yt ~ I(1) and Xt ~ I(1), follows the ECM model36:
∆Yt = a1 + ∑(i=1 to p) βi∆Xt-i + ∑(i=1 to p) γi∆Yt-i + λ1et-1 + u1t (108)
∆Xt = a2 + ∑(i=1 to p) θi∆Xt-i + ∑(i=1 to p) δi∆Yt-i + λ2et-1 + u2t (109)
where et-1 ~ I(0) is the lagged value of the error term from the cointegrating equation between Yt and Xt:
Yt = α + βXt + et (110)
and it is assumed that u1t and u2t are uncorrelated white-noise error terms and, importantly, that et is white noise and stationary. We also assume that the lag lengths in both equations are the same (i.e., p = q - 1 for the first-differenced series), although they might differ. Similar to the standard version of the Granger causality test, we can have the following different cases:
Case 1 The lagged X terms and lagged error term in equation (108) are statistically
different from zero as a group, and the lagged Y terms and lagged error
term in equation (109) are not statistically different from zero. In this case,
we have that Xt causes Yt.
Case 2 The lagged Y terms and lagged error term in equation (109) are statistically
different from zero as a group, and the lagged X terms and lagged error term
in equation (108) are not statistically different from zero. In this case, we
have that Yt causes Xt.
Case 3 Both sets of lagged X and lagged Y terms or both sets of lagged error
terms are statistically different from zero as a group in equations (108) and
(109), so that we have bidirectional causality between Yt and Xt.
Case 4 Both sets of lagged X terms and lagged error term in equation (108) and
lagged Y terms and lagged error term in equation (109) are not statistically
different from zero, so that Xt is independent of Yt.
36 Note that we are considering the single-equation case, so the ECM is used. However, if we have multiple equations with more than two nonstationary variables, the counterpart of the ECM, i.e., the VECM, is used instead.
The augmented Granger causality test involves the following steps (suppose that Yt and Xt are nonstationary):
Step 1 Test for cointegration between the variables of interest [in the current situation, the EG approach for a single equation is used]. Suppose that cointegration exists between the variables.
Step 2 Regress ∆Yt on its own lagged terms only,
∆Yt = a1 + ∑(i=1 to p) γi∆Yt-i + u1t
and obtain the RSS of this regression (which is the restricted one) and label it as RSSR.
Step 3 Regress ∆Yt on its own lagged terms plus the lagged ∆X terms and the lagged error term et-1, as in Eq.(108), and obtain the RSS of this regression (which is the unrestricted one) and label it as RSSU.
Step 4 Calculate the F statistic for the normal Wald test on the coefficient restrictions (H0: β1 = … = βp = λ1 = 0):
F = [(RSSR - RSSU)/(p + 1)] / [RSSU/(n - k)]
Step 5 If the computed F value exceeds the critical F value, reject the null hypothesis and conclude that Xt causes Yt.
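These steps mirror the Stata commands used in the examples of Section 11.4 below; a minimal sketch with hypothetical cointegrated I(1) variables y and x and p = 1:
regress y x                   /* cointegrating regression, Eq.(110) */
predict e, resid
regress D.y LD.y LD.x L.e     /* Eq.(108) with p = 1 */
test LD.x L.e                 /* H0: x does not cause y (lagged D.x and the ECT are jointly zero) */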
11.4 Illustrative Examples
11.4.1 Causality test of the consumption expenditure and income relationship
This example continues the relationship between consumption expenditure and income from Section 9.2.1. We already know that both variables, ln(consumption) and ln(income), are I(1) and not cointegrated. Therefore, we can apply the standard Granger causality test to investigate the causation between the two variables. The Stata commands are the same as in Section 9.2.1, but we add one more command, 'vargranger'37, after 'varbasic':
use "D:\My Blog\s4poe_statadata\consumption.dta", clear
gen date =q(1960q1)+_n-1
format %tq date
gen Y = log(inc)
gen C = log(cons)
tsset date
tsline C Y, legend(lab (1 "ln(Consumption") lab(2 "ln(Income"))
tsline D.C D.Y, legend(lab (1 "D.ln(Consumption") lab(2 "D.ln(Income"))
dfgls C, trend
dfuller C, trend lags(3)
dfgls Y, trend
dfuller Y, trend lags(1)
regress C Y time
predict ehat, resid
tsline ehat
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.C D.Y
varbasic D.C D.Y, lag(1/1) step(12) nograph
vargranger
The standard Granger causality test for the first differenced variables of consumption
expenditure and income is presented in Table 11.1.
37 vargranger can be used only after var or svar. Alternatively, we can use 'test' instead of vargranger (StataCorp, 2017b: vargranger).
Table 11.1: Causality test of consumption expenditure and income relationship.
11.4.2 Causality test of the money supply and interest rate relationship
This example continues the relationship between money supply and interest rate from Section 9.2.2. We already know that both variables, money supply and interest rate, are I(1) and not cointegrated. Therefore, we can apply the standard Granger causality test to investigate the causation between the two variables. The Stata commands are the same as in Section 9.2.2, but we add one more command, 'vargranger', after 'varbasic':
use "D:\My Blog\Time series econometrics for beginners\Table17_5.dta", clear
gen date =q(1979q1)+_n-1
format %tq date
tsset date
dfgls m, trend
dfuller m, trend lags(1)
dfgls r, trend
dfuller r, trend lags(1)
regress m r
estat dwatson
predict ehat, resid
tsline ehat, xtitle(" ") ytitle("Residuals from regression of M1 on R")
dfgls ehat, trend
dfuller ehat, trend lag(1)
varsoc D.m D.r
varbasic D.m D.r, lag(1/1) step(12) nograph
vargranger
Table 11.2: Causality test of money supply and interest rate relationship.
11.4.3 Causality test of the wheat price and oil price relationship
Figure 11.1: Wheat prices and oil prices over time.
The graphs of pwht and poil indicate that both series have stochastic trends (their means are not constant) and that their variances are also not constant. The pwht series first increases and fluctuates strongly (from observation 1 to about 70), followed by a declining period (from about observation 70 to about 120) with less fluctuation; it then tends to increase and, especially, declines very quickly in the last months. Therefore, we might say that these prices are not stationary.
In order to check the order of integration for pwht we perform the Augmented Dickey
Fuller (ADF) test and the KPSS test on pwht until finding a stationary time series.
Table 11.3: ADF test for wheat prices with 11 lags.
If we choose the 5% significance level, the coefficients of lag 8 to lag 11 are not
significant. Therefore, we try the test equation with 7 lags.
As the absolute value of the test statistic (2.579) is smaller than the absolute value of the 5% critical value (2.882), we cannot reject the null hypothesis at the 5% significance level. Therefore, the ADF test suggests that the pwht series is not stationary. To be sure, we apply the KPSS test.
KPSS test
ADF test
H0: The first-difference of pwht is not stationary.
Table 11.6: ADF test for the first difference of wheat prices with 12 lags.
As the absolute value of the test statistic (4.796) is larger than the 5% critical value (1.95), we reject the null hypothesis. That means the first-differenced series of pwht is stationary. We now examine the KPSS test for this first-differenced series.
KPSS test
H0: The first-differenced series of pwht is stationary.
Table 11.7: KPSS test for wheat prices.
The KPSS test results indicate that we fail to reject the null hypothesis.
In conclusion, the pwht series is integrated of order one [I(1)].
Similarly, for poil we first introduce 12 lags because the data are monthly. However, lags 11 and 12 are not significant, so we remove them from the test equation. With 10 lags, the test results are presented in Table 11.8.
As the absolute value of the test statistic (3.37) is smaller than the 5% critical value (3.43), we cannot reject the null hypothesis. This implies that the poil series is not stationary. To be sure, we apply the KPSS test.
KPSS test
All test statistics are greater than the critical values (even at the 1% significance level), so we reject the null hypothesis (results not shown here). That means the poil series is non-stationary.
Therefore, we now examine the stationarity of the first-differenced series of poil, with a constant term in the test equation because there is a trend in the original series of poil. The test results indicate that, as the absolute value of the test statistic (4.166) is larger than the 5% critical value (2.882), we reject the null hypothesis. The ADF test thus indicates that the first-differenced series of poil is stationary. We also apply the KPSS test to the first difference of the oil prices, and it confirms that the first-differenced series of poil is stationary. Therefore, the poil series is integrated of order one [I(1)]. Note that, to save space, these results are not shown here.
Cointegration analysis
As both series are integrated of order one, there could exist a long-run relationship between pwht and poil. We must apply cointegration tests to see whether there really is a long-run (or cointegrating) relationship between them.
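A sketch of the commands implied by the discussion below (the lag length of the residual test is an assumption):
regress pwht poil             /* static OLS regression of wheat prices on oil prices */
estat dwatson                 /* CRDW statistic */
predict res, resid
tsline res                    /* Figure 11.2 */
dfuller res, lags(1)          /* residual-based (Engle-Granger) test for no cointegration */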
The OLS estimation results seem to be spurious because of the following signals: the t-ratio is very high, while the Durbin-Watson test statistic is very small (0.103), and the graph of the residuals from this regression (Figure 11.2) shows that the residuals seem to be non-stationary. The R2 is, however, low (0.213), so this is not quite the typical picture of a spurious regression; the very low Durbin-Watson statistic can instead be a signal of positive autocorrelation. To be sure, we must apply the statistical tests.
We apply two different tests: (i) the residual-based test for no cointegration; and (ii) the CRDW38 test for no cointegration. Both tests check for cointegration between poil and pwht: poil and pwht are cointegrated if the residuals of the estimated model above are a stationary process.
Figure 11.2: Residuals from the regression of wheat prices on oil prices.
38 See 'Cointegration' in Verbeek (2004: p.314-7).
Table 11.10: ADF test of residual from wheat prices and oil prices regression.
The test equation without the constant term shows that the residuals are stationary even at the 1% significance level. These conflicting results might be due to the low power of the ADF test. To avoid this problem, we now apply the KPSS test.
From the KPSS test results, we reject the null hypothesis that the residuals are stationary. Therefore, there seems to be no cointegration between pwht and poil.
The Durbin-Watson test statistic is 0.103, which is smaller than the 5% critical value of the CRDW test for no cointegration (~0.2, for about 200 observations and 2 variables; Table 9.3 in Verbeek, 2012). Therefore, we fail to reject the null hypothesis that the residuals are non-stationary. In other words, pwht and poil are not cointegrated.
In conclusion, there is no long-run relationship between pwht and poil. Therefore, the OLS regression of pwht on poil is likely to be a spurious regression.
VAR model
Because pwht and poil are not cointegrated, we cannot apply a VECM. It is only possible to use a VAR model for the first-differenced series of pwht and poil.
Table 11.13: VAR model for the relationship between wheat prices and oil prices.
The VAR model results indicate that the p-values of the coefficient of LD.poil in the first equation (0.202) and of LD.pwht in the second equation (0.97) are very high. These suggest that neither does poil affect pwht, nor does pwht affect poil. However, the coefficients of LD.pwht in the first equation and LD.poil in the second equation are highly significant, which indicates that each first-differenced series follows an AR process.
Causality test
Table 11.14: Causality test of the relationship between wheat prices and oil prices.
predict S2, resid                     /* residuals from the second cointegrating regression */
dfuller S1, lag(1)                    /* unit root test on the first residual series */
dfuller S2, lag(1)                    /* unit root test on the second residual series */
varsoc lnpce lnpdi                    /* choose the optimal lag length */
reg D.lnpce LD.lnpce LD.lnpdi L.S1    /* ECM for D.lnpce */
test LD.lnpdi L.S1                    /* H0: income does not cause consumption */
reg D.lnpdi LD.lnpce LD.lnpdi L.S2    /* ECM for D.lnpdi */
test LD.lnpce L.S2                    /* H0: consumption does not cause income */
Table 11.16: Causation from PCE to PDI.
Because the p-value is 0.0697, we do not reject, at the 5% level of significance, the null hypothesis that consumption expenditure does not cause income.
12. BOUNDS TEST FOR COINTEGRATION
12.1 Introduction
Another way to test for cointegration and causality is the bounds test for cointegration within the ARDL modelling approach. This approach was developed by Pesaran et al. (2001) and can be applied irrespective of the order of integration of the variables (irrespective of whether the regressors are purely I(0), purely I(1) or mutually cointegrated). It is closely linked with ECM models and is called the conditional ECM39 or unrestricted ECM40. Note that in the case of the multiple-equation approach, we have the conditional/unrestricted VECM.
The ARDL bounds test approach for cointegration has recently been used in many practical applications thanks to the contributions of Kripfganz41 & Schneider (2016) in terms of Stata commands. The ardl and ardlbounds commands in Stata help researchers implement their data analysis more quickly. For two variables Yt and Xt, the conditional ECM42 takes the form:
∆Yt = c0 + c1t + aYt-1 + bXt-1 + ∑(i=1 to p) θi∆Yt-i + ∑(i=1 to p) φi∆Xt-i + ω∆Xt + εt (112)
39 See Pesaran et al. (2001: p.290), Rahman & Kashem (2017: p.603), Rushdi et al. (2012: p.537).
40 See Zhang et al. (2015: p.274).
41 http://www.kripfganz.de/stata/
42 For multivariate relationships, see Rushdi et al. (2012: p.537).
Table 12.1: Critical F values for ARDL bounds test.
With the Wald F statistic, we do not need the option 'stat(F)' because it is the default for the command 'ardlbounds'. However, if we want to obtain the critical t values for the ARDL bounds test, we must add the corresponding option.
The lag lengths of ∆Yt-i and ∆Xt-i may be different, but we assume here that they are the same. The selection of the optimal lag lengths is also based on information criteria, as discussed earlier. In Stata, we can apply the command 'varsoc'. However, in empirical studies, a 'trial and error' method is inevitable, especially in the case of small samples.
In order to test for the absence of a long-run level relationship (i.e., cointegrating
relationship) between Yt and Xt in the CECM [Eq. (112)], a sequential testing of the
two null hypotheses, defined as:
is conducted. If H10 is not rejected, then there does not exist a long-run
level relationship between Yt and Xt. The testing procedure is terminated (i.e., stop). If
this null is rejected, then test for the null H20 and if the latter is also rejected,
then there exists a long-run level relationship between Yt and Xt. According to Rushdi
et al. (2012: p.537), under the assumptions that all variables are I(0) and that all are I(1),
respectively, the lower and upper bounds of the critical values of the test statistics
for these hypotheses are tabulated in Pesaran et al. (2001). Accordingly, the first null
hypothesis (i.e., H10) is tested by using the Wald F statistic, while the second null
hypothesis is tested by using the t statistic. Following the philosophy of Pesaran et al. (2001),
under the null hypothesis H10, if the computed Wald F statistic exceeds the upper
bound critical value at the prescribed level of significance (e.g., 1%, 5%, or 10%),
then the null hypothesis is rejected. On the other hand, if the F statistic is
below the lower bound critical value, then the null hypothesis is not rejected. However,
if the statistic falls within these bounds, then the decision is inconclusive. Similar
decision rules apply to the t statistic for testing the null of H20.
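Before turning to the ardl package, the bounds F statistic can also be computed by hand with standard Stata commands. The following is only a minimal sketch, assuming two tsset series y and x, one lag of the differences, and no deterministic trend; the variable names and lag length are illustrative, not taken from any example in these notes.
* CECM of Eq. (112) estimated by OLS with one lagged difference of each variable
regress D.y L.y L.x LD.y LD.x D.x
* Wald F statistic for H10: a = b = 0 (joint significance of the lagged levels)
test L.y L.x
The resulting F statistic is then compared with the Pesaran et al. (2001) lower and upper bound critical values rather than with conventional F critical values.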
If these hypothesis tests establish the existence of a level relationship among the
variables, we can then proceed to estimate the long-run and short-run coefficients in
Eq.(112). The long-run coefficients (if they exist) are directly calculated from the
estimated coefficients of the CECM. How to do it? You can refer either to
the relationship between ECM and ARDL discussed in Section 7, or to Rushdi et al. (2012:
p.537). If the results from the ARDL bounds tests indicate that there exists a long-run
relationship between (among) the variables, we can employ either ECM or VECM models
to investigate the short-run and long-run relationships, and the speed of adjustment to
the equilibrium state, by using the traditional ECM or VECM methods. In addition, we can
also conduct Granger causality tests by using the traditional VECM or the
conditional VECM models.
12.3 Illustrative Examples
12.3.1 Stock returns and inflation
This example is cited from the study of Rushdi et al. (2012). Its aim is to investigate the
long-run relationship between real stock returns and inflation in Australia over the
period 1969q2 to 2008q1. The data are collected from the International Financial
Statistics (IFS). The variables of interest include real stock returns (rsr), inflation (π),
expected inflation (πe) [estimated by an ARMA(p,q) model], real economic activity (act),
and monetary policy (mp). The unit root tests reveal a mixture of I(0) and I(1)
variables. Therefore, the ARDL bounds testing method seems appropriate. In order
to test for cointegration between real stock returns and inflation, they use both bivariate
and multivariate models. The former are presented in Table 12.3, and the latter are
presented in Table 12.4. The long-run coefficients are calculated by using estimates
from the CECM models and are presented in Table 12.5.
Both bounds tests from the bivariate and multivariate models reject the null hypotheses,
and imply the existence of long-run relationships between real stock returns and
inflation, and between real stock returns and expected inflation. We refer readers to the
original paper for detailed discussion. Note that economic series are often
characterized by a mixture of I(0) and I(1) variables because some of them are in
differenced form, such as asset returns, growth rates, and so on.
Table 12.3: ARDL models and bounds tests for bivariate relationships.
Table 12.4: ARDL models and bounds tests for multivariate relationships.
It is worth noting that all the above-mentioned cointegration tests (i.e., EG, Johansen,
and ARDL bounds tests) assume that no structural change exists in the system. If a
structural break is present, an alternative method such as the Gregory and Hansen (1996)43
test should be used.
43 See Narayan (2005).
Table 12.6: Unit root analysis.
Table 12.8: Results of Johansen cointegration test.
Both the ARDL bounds test and the Johansen test confirm that there are long-run relationships
among imported technology, energy consumption, FDI, trade openness, and CO2
emissions. In particular, at least three cointegrating relationships exist among
the variables of interest. The paper also provides the results of both the long-run and short-
run relationships between CO2 emissions and the other variables in the model using the ARDL
cointegration technique (Table 12.9). However, there are two questionable issues. First,
the choice of a lag length of one for all variables seems unreasonable because the optimal
lag length in the ARDL bounds test for this model is (1,0,0,1,0). Second, the interpretation
of the elasticities is problematic. The results of the causality analysis indicate bi-
directional causality between imported technology and carbon emissions.
and Zt - 5 has statistically significant effect on Yt [i.e., DL processes]. Then the ARDL
model is estimated as the following command:
Here, 'aic' is the Akaike information criterion, which is used for selecting the optimal
lag length of the model [6 in this case]. The estimated coefficients of Yt-6, Xt, Wt-3, and
Zt-5 will be statistically significant. If we just use the option lags(6), we implicitly
assume that all endogenous variables in the model have the same lag length of 6.
If we want to estimate the CECM with the above information, the Stata command will
be as follows:
The output of this command includes four components: ADJ (the speed-of-adjustment
coefficient, i.e., the coefficient on the lagged level Yt-1 in the error correction equation),
LR (the long-run relationship between Yt and the other endogenous variables), SR (the
short-run coefficients of the first differences and their lags), and the exogenous
variable (i.e., D in this current example).
After estimating the above equation, if we want to test for cointegration by using the ARDL
bounds test, we type:
estat btest
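As a minimal sketch of this workflow with the ardl package, assuming four series y, x, w, z and an exogenous dummy d (hypothetical names standing in for Y, X, W, Z, and D above), the commands might look as follows; the exact option names should be checked against the installed version of the package (help ardl):
* ARDL in levels; optimal lags selected by AIC with at most 6 lags per variable
ardl y x w z, aic maxlags(6) exog(d)
* the same model re-expressed as a conditional ECM (reports the ADJ, LR and SR panels)
ardl y x w z, aic maxlags(6) exog(d) ec
* bounds test for a level relationship after the EC estimation
estat btest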
Westerlund et al. (2015) argue that running the traditional unit root and cointegration tests
country by country is wasteful. They list some typical reasons why it is worth using a joint
panel approach. First, in many studies, a group of countries is the main interest of
investigation. Second, the use of panel data instead of individual time series not
only increases the number of observations and the variation but also reduces the noise from
the individual time series regressions. Third, the power of the tests is increased in panels
when the individual time series are not long enough. This is particularly relevant
when doing research in developing countries, where data may be unavailable, or
available only over a very short period. Fourth, unlike the unit-by-unit approach, the joint
panel approach accounts for the multiplicity of the testing problem. In addition, Narayan
& Smyth (2014) stated that the traditional time series testing methods yield mixed
findings. Therefore, they expect a shift towards the nonstationary panel data approach,
under which a very large number of studies is now being published.
Searching with keywords like 'panel unit root*' or 'panel cointegration*',
we can see that an increasing number of empirical studies using nonstationary panel
techniques has been published, notably in economics and energy journals. Below
is an example list:
Table 13.1: Journals with nonstationary panel publications.
Bulletin of Economic Research 23 Q3
Economic Analysis and Policy 17 Q3
Journal of Financial Research 25 Q3
Renewable and Sustainable Energy Reviews 176 Q1
Energy Policy 146 Q1
Energy 134 Q1
Applied Energy 125 Q1
Energy Economics 101 Q1
Resources Policy 44 Q1
Journal of Cleaner Production 116 Q1
…
Source: http://www.scimagojr.com/index.php.
These figures indicate that this new strand of research is clearly promising. Therefore,
I think that economics students at UEH should be equipped with nonstationary panel
techniques along with traditional econometric models. If so, a promising
prospect of publications awaits young researchers, because expensive survey-
based research is beyond the means of most of our economics students.
13.2 Panel Unit Root Tests
This section is mainly based on three key references: Banerjee (1999), Asteriou & Hall
(2011), and especially StataCorp (2015). To get started with panel unit root tests, it is
worth noting the following points (see Asteriou & Hall, 2011: p.443; StataCorp, 2015:
p.512). First, some of the tests require balanced panels (i.e., Ti = T for all i such as LLC,
HT, Breitung, and Hadri), whereas others allow for unbalanced panels (such as IPS,
MW). Second, one may form the null hypothesis as a generalization of the standard
ADF test (i.e., all series in the panel are assumed to be nonstationary) and reject the null
hypothesis if some of the series in the panel appear to be stationary, while on the other
hand one can formulate the null hypothesis in exactly the opposite way (i.e., all series
in the panel are stationary) and reject the null hypothesis if there is sufficient evidence
of nonstationarity [e.g., the Hadri LM test, see StataCorp, 2015: p.522-3]. Third, the tests
differ in their assumptions about the asymptotic behavior of a panel's N and T dimensions
(i.e., the rates at which these dimensions approach infinity).
Similar to the unit root tests in time series data, the counterparts in panel data are based
on the following first-order autoregressive model:
Y_{it} = ρ_i Y_{i,t-1} + Z_{it} γ_i + ε_{it}   (113)
Note that Zit may now include lagged terms of dependent variables for controlling serial
correlations. This is similar to the ADF equation [in Section 6.5].
For Eq.(114), the null hypothesis is then H0: φi = 0 for all i versus the alternative
hypothesis Ha: φi < 0. [Note that the Hadri LM test takes the null hypothesis that all
panels are stationary (i.e., H0: φi < 0) versus the alternative hypothesis that at least some
of the panels contain unit roots (i.e., Ha: φi = 0 for some i). In general, most tests take
the null hypothesis that the panels contain unit roots, i.e., H0: φi = 0]. We now discuss
typical panel unit root tests that are available in Stata. In addition, we will give
illustrative examples using pennxrate.dta [i.e., in Stata command, we type webuse
pennxrate to open the data file from web]44. This dataset contains real exchange rate
data based on the Penn World Table. This is a balanced panel consisting of 151 countries
observed over 34 years, from 1970 to 2003. The variable of interest is lnxrate, the log
of the real exchange rate. The data contain the variable g7, which indicates a group of
six advanced economies; the U.S. is treated as the domestic country, so it is not
included (StataCorp, 2015: p.514).
44 Note that the webuse datasets are clearly specified in the respective examples of the Stata manuals. Depending on the version we use, the file names may be different. This dataset is currently used with Stata 14.
Levin, Lin and Chu (LLC) test
This test is an extension of the conventional ADF test. For a sample of N cross sections
observed over T time periods, the test equation is given by:
ΔY_{it} = φ Y_{i,t-1} + Z_{it} γ_i + Σ_{j=1}^{p} θ_{ij} ΔY_{i,t-j} + u_{it}   (115)
where j = 1, 2, …, p are the ADF lags. The term Z_{it} γ_i may include unit-specific fixed effects
and unit-specific time effects in addition to common time effects. The unit-specific
effects are an important source of heterogeneity, since the coefficient of the lagged Y_i
[i.e., φ] is restricted to be homogeneous across all units of the panel (Banerjee, 1999;
Asteriou & Hall, 2011: p.443). In other words, the LLC test assumes a homogeneous
panel; that is, it imposes an identical first-order autoregressive coefficient on each series
in the panel: φ_1 = φ_2 = … = φ_N = φ. In Stata, this is called the 'common' autoregressive
parameter. The terms Σ_{j=1}^{p} θ_{ij} ΔY_{i,t-j} are included in order to control for possible
serial correlation. The number of lags p can be specified using the option
lags(aic #), i.e., we choose the lag length that minimizes the Akaike information
criterion within the # specified [e.g., with lags(aic 10), Stata will calculate the AIC for each of
10 lags and report the lag length producing the smallest AIC]. It is assumed that if we
include sufficient lags, the error term u_it will be white noise.
The null and the alternative hypotheses of this test are:
H0: φ = 0
H1: φ < 0 [that is, φ_1 = φ_2 = … = φ_N = φ < 0]
The LLC test also assumes that the individual processes are cross-sectionally
independent [i.e., the errors are assumed to be independent across the units of the sample
(Banerjee, 1999)]. Under this assumption, the test derives conditions under which the
pooled OLS estimator of φ follows a standard normal distribution under the null
hypothesis. The LLC test may be viewed as a pooled ADF test, potentially with different
lag lengths across the different sections of the panel. The LLC test can be used with panels
of "moderate" size, i.e., having between 10 and 250 panels and 25 to 250 observations
per panel (StataCorp, 2015: p.513). The Stata commands for the LLC test are:
webuse pennxrate
xtunitroot llc lnrxrate if g7, lags(aic 10)
Table 13.2: LLC test of lnrxrate for G7 group.
The LLC bias-adjusted test statistic t* = -4.0277 is significantly less than zero (the p-value
is 0.0000), so we reject the null hypothesis of a unit root [that is, that φ = 0 in (115)] in
favor of the alternative that lnrxrate is stationary [that is, that φ < 0]. Note that the
'unadjusted t' is a conventional t statistic for testing H0: φ = 0.
Because the G7 economies have many similarities, the test results could be affected by
cross-sectional correlation in real exchange rates. One way to control for this problem is to
remove the cross-sectional averages from the data with the demean option. The command
below simply repeats the previous LLC call with this option added:
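xtunitroot llc lnrxrate if g7, lags(aic 10) demean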
Table 13.3: LLC test of lnrxrate for G7 group with demean option.
Harris-Tzavalis (HT) test
In many datasets, particularly microeconomic ones, the time dimension, T, is small, so
tests whose asymptotic properties are established by assuming that T tends to infinity
can lead to incorrect inference. Harris and Tzavalis (1999) derived a unit-root test that assumes the time
dimension, T, is fixed. Their simulation results suggest that the test has favorable size
and power properties for N greater than 25, and they report that the power improves
faster as T increases for a given N than as N increases for a given T (StataCorp,
2015: p.516).
The HT test statistic is based on the OLS estimator, ρ, in the regression model:
Y_{it} = ρ Y_{i,t-1} + Z_{it} γ_i + u_{it}   (116)
Harris and Tzavalis assume that uit is independent and identically distributed (iid)
normal with constant variance across panels. Because of the bias induced by the
inclusion of the panel means and time trends in this model, the expected value of the
OLS estimator is not equal to unity under the null hypothesis. Harris and Tzavalis
derived the mean and standard error of ρ̂ for (116) under the null hypothesis H0: ρ = 1
when neither panel-specific means nor time trends are included, when only panel-
specific means are included (the default), and when both panel-specific means and time
trends are included. The asymptotic distribution of the test statistic is justified as N → ∞,
so we should have a relatively large number of panels if we want to use this test.
Note that, like the LLC test, the HT test assumes that all panels share the same
autoregressive parameter ρ [i.e., ρ instead of ρ_i].
Because the HT test is designed for cases where N is relatively large, here we test
whether the series lnrxrate contains a unit root using all countries in the dataset. We will
again remove cross-sectional means to help control for contemporaneous correlation.
The Stata commands are:
webuse pennxrate
xtunitroot ht lnrxrate, demean
Table 13.4: HT test of lnrxrate for all countries.
The point estimate of ρ in Eq.(116) is 0.8184, its z statistic is -13.1239, and the p-value
is practically zero. Therefore, we strongly reject the null hypothesis of a unit root.
Note that we cannot compare the results of the two tests (i.e., LLC and
HT) because LLC uses just a subset of the data, while HT uses the whole dataset. The LLC test
assumes that N/T → 0, so N should be small relative to T. For the G7 group, it is more
likely that we add more years of data rather than more countries, because the number of
such countries in the world is virtually fixed. Therefore, the assumption that T grows
faster than N is certainly reasonable. On the other hand, the HT test assumes that T is
fixed whereas N goes to infinity. For the G7 group, this assumption does not seem plausible.
In short, it is important to remember that when selecting a panel unit-root test, you must
consider the relative size of N and T, and the relative speeds at which they tend to
infinity or whether either N or T is fixed.
Breitung test
Both the LLC and HT tests take the approach of first fitting a regression model and
subsequently adjusting the autoregressive parameter or its t statistic to compensate for
the bias induced by having a dynamic regressor and fixed effects in the model. The
Breitung (2000; Breitung and Das, 2005) test takes a different strategy: it adjusts the
data before fitting a regression model so that bias adjustments are not needed.
Table 13.5: The Breitung test of lnrxrate for OECD countries.
In the LLC test, additional lags of the dependent variable could be included to control
for serial correlation. The Breitung procedure instead allows for a prewhitening of the
series before computing the test. In particular, if the trend option is not specified, we
regress ΔY_it and Y_{i,t-1} on ΔY_{i,t-1}, ΔY_{i,t-2}, …, ΔY_{i,t-p} and use the residuals from those
regressions in place of ΔY_it and Y_{i,t-1} in computing the test. If the trend option is
specified, the Breitung method uses a different prewhitening procedure that involves
fitting only one (instead of two) preliminary regressions (StataCorp, 2015: p.517).
Monte Carlo simulations by Breitung (2000) show that bias-corrected statistics such as
LLC's t* suffer from low power, particularly against alternative hypotheses with
autoregressive parameters near one (i.e., ρ ≈ 1) and when panel-specific effects are
included. In contrast, the Breitung (2000) test statistic exhibits much higher power in
these cases. Moreover, the Breitung test has good power even with small datasets (N =
25, T = 25), though the power of the test appears to deteriorate when T is fixed and N
is increased. The Breitung test assumes that the error term uit is uncorrelated across both
the cross-sectional dimension i and the time dimension t.
The Breitung test results for OECD countries are presented in Table 13.5. Because the
p-value is 0.0465, we can reject the null hypothesis of a unit root at the 5% level, but
not at the 1% level.
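As a sketch of the corresponding command, assuming the pennxrate data include an oecd indicator analogous to g7 and choosing two prewhitening lags (both are assumptions, so the output need not match Table 13.5 exactly):
xtunitroot breitung lnrxrate if oecd, lags(2) demean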
Im, Pesaran and Shin (IPS) test
All the tests we have discussed so far assume that the autoregressive parameter is homogeneous
across panels [i.e., φ_1 = φ_2 = … = φ_N = φ]. Im et al. (1997, 2003) extended the LLC test, allowing
for heterogeneity in the value of φ_i under the alternative hypothesis. The IPS test
provides separate estimations for each i section, allowing different specifications of the
parametric values, the residual variance and the lag lengths (Asteriou & Hall 2011:
p.444). In addition, the IPS test does not require balanced datasets, though there cannot
be gaps within a panel (StataCorp, 2015: p.518). Their model is given by:
while now the null and alternative hypotheses are formulated as:
Thus, the null for this test is that all series are non-stationary processes under the
alternative that some or all of the individual series in the panel are stationary. This is in
sharp contrast with the LLC test, which assumes that all series are stationary under the
alternative hypothesis (Asteriou & Hall, 2011: p.444; Ouedraogo, 2013). In addition,
the model assumes the errors uit are serially autocorrelated with different serial
correlation (and variance) properties across units (Banerjee, 1999).
The authors object to the use of pooled panel estimators such as those used by the LLC test for
processes which display heterogeneity. Therefore, Im et al. (1997) propose the use of a
group-mean Lagrange multiplier (LM) statistic to test for the null hypothesis (Banerjee,
1999). However, when N and T are fixed, IPS uses simulation to calculate ‘exact’
critical values for the average of the ti statistics (i.e., t-bar), which requires a balanced
panel (Asteriou & Hall, 2011: p.444; StataCorp, 2015: p.519). Their t-bar statistic is
nothing other than the average of the individual ADF t statistics for testing that φ_i = 0
for all i (denoted by t_i):
t̅ = (1/N) Σ_{i=1}^{N} t_i   (118)
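As a sketch of the corresponding Stata commands (assuming, as above, that the data contain an oecd indicator; the lag choice is illustrative):
xtunitroot ips lnrxrate if oecd, demean
xtunitroot ips lnrxrate if oecd, lags(5) demean
The first call reports the t-bar, t-tilde-bar, and Z-t-tilde-bar statistics; the second, with lagged differences included, reports the W-t-bar statistic discussed below.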
Table 13.5: The IPS test of lnrxrate for OECD countries.
Because the t-bar value (= -3.1327) is less than even its 1% critical value (= -1.810), we
strongly reject the null hypothesis of a unit root.
The statistic labeled t-tilde-bar is similar to the t-bar statistic, except that a different
estimator of the Dickey-Fuller regression error variance is used (StataCorp, 2015:
p.519). In addition, a standardized version of the t-tilde-bar statistic, labeled Z-t-tilde-bar,
has an asymptotic standard normal distribution. The p-value corresponding to Z-t-
tilde-bar is practically zero, which again strongly rejects the null hypothesis of a unit root.
If we include lags of the dependent variable in the test equation, the output
shows the W-t-bar statistic. This statistic has an asymptotically standard normal
distribution as T → ∞. In this case, we should have a reasonably large number of both
time periods and panels (StataCorp, 2015: p.520).
Fisher-type test
Maddala and Wu (1999) attempted to remedy some of the shortcomings of both
the LLC and IPS tests. They argue that while the Im et al. (1997) tests relax the assumption
of homogeneity of the root across the units, several difficulties still remain (see
Banerjee, 1999). Basically, Maddala and Wu agree that a
heterogeneous alternative is preferable; they, however, disagree with the use of the
average ADF t statistic, arguing that it is not the most effective way of evaluating
stationarity (Asteriou & Hall, 2011: p.445). They propose the use of a test due to Fisher
(1932), which is based on combining the p-values of the test statistics for a unit root in
each cross-sectional unit. The Fisher test is non-parametric and may be computed for
any arbitrary choice of a test for the unit root. It is an exact test, and the statistic is given
by (Banerjee, 1999):
λ = -2 Σ_{i=1}^{N} ln(π_i)   (119)
where π_i is the p-value from the regular ADF (or PP) unit-root test for
cross-section i. Because -2 ln(π_i) has a χ2 distribution with 2 degrees of freedom, the λ
statistic follows a χ2 distribution with 2N degrees of freedom as Ti → ∞ for finite
N. To account for the dependence between cross-sections, Maddala and Wu propose
obtaining the π_i values using bootstrap procedures, arguing that correlations between
groups can induce significant size distortions in the tests.
The Stata commands are as follows:
xtunitroot fisher lnrxrate, dfuller drift lags(2) demean
or
xtunitroot fisher lnrxrate, pperron lags(2) demean
Table 13.6: The Fisher-type test of lnrxrate for all countries.
All four of the tests strongly reject the null hypothesis that all the panels contain unit
roots (StataCorp, 2015: p.522).
13.3 Panel Cointegration Tests
This section is mainly based on StataCorp (2017a). This manual of panel
cointegration tests introduces a new Stata command, 'xtcointtest', which
replaces separately installed community-contributed commands such as xtpedroni, xtwest, and
xtdolshm. Of course, you must install Stata 15 to run this command. The
xtcointtest command performs the Kao (1999), Pedroni (1999, 2004), and Westerlund (2005) tests
of cointegration on a panel dataset. We can include panel-specific means and panel-
specific time trends in the cointegrating regression model. All tests have a common null
hypothesis of no cointegration. The alternative hypothesis of the Kao and Pedroni tests is
that the variables are cointegrated in all panels. The Westerlund test has two different
versions of the alternative hypothesis: one assumes cointegration in all panels, the other
assumes cointegration in some of the panels.
All the cointegration tests in xtcointtest are based on the following panel-data model for
the I(1) dependent variable Yit, where i = 1, 2, …, N denotes the panel and t = 1, 2, …,
T denotes time:
Y_{it} = X_{it} β_i + Z_{it} γ_i + e_{it}   (120)
For each panel i, each of the covariates in Xit is an I(1) series. All the tests require that
the covariates are not cointegrated among themselves. The Pedroni and Westerlund tests
allow a maximum of seven covariates in Xit. β_i denotes the cointegrating vector, which
may vary across panels. γ_i is a vector of coefficients on Zit, the deterministic
terms that control for panel-specific effects and linear time trends, and eit is the error
term. Depending on the options specified with xtcointtest, the vector Zit allows for
panel-specific means, panel-specific means and panel-specific time trends, or nothing.
By default, Zit = 1, so the term Zit γi represents panel-specific means (i.e., fixed effects).
If trend is specified, Zit = (1, t), so Zit γi represents panel-specific means and panel-
specific linear trends. The option 'noconstant' omits the deterministic terms altogether.
All tests share a common null hypothesis that Yit and Xit are not cointegrated. xtcointtest
tests for no cointegration by testing that eit [from Eq.(120)] is nonstationary. Rejection
of the null hypothesis implies that eit is stationary and that the series Yit and Xit are
cointegrated. The alternative hypothesis of the Kao tests, the Pedroni tests, and the
allpanels version of the Westerlund tests is that the variables are cointegrated in all
panels, whereas the alternative hypothesis of the somepanels version of the Westerlund
tests is that the variables are cointegrated in some of the panels.
All the tests allow unbalanced panels and require that N is large enough that the
distribution of a sample average of panel-level statistics converges to its population
distribution. They also require that each Ti is large enough to run time-series regressions
using observations only from that panel.
The Kao, Pedroni, and Westerlund tests implement different types of tests for whether
eit is nonstationary. The DF tests, ADF tests, PP tests, and their variants that are reported
by xtcointtest kao and xtcointtest pedroni use different regression frameworks to handle
serial correlation in eit. The VR (variance ratio) tests that are reported by xtcointtest
westerlund and xtcointtest pedroni do not require modeling or accommodating for serial
correlation.
All variants of the DF t test statistics are constructed by fitting the model in (120) using
ordinary least squares, obtaining the predicted residuals (êit ), and then fitting the DF
regression models:
Δê_{it} = (ρ - 1) ê_{i,t-1} + ν_{it}   (121')
where ρ is the AR parameter and ν_it is a stationary error term. The DF test and the
unadjusted DF test examine whether the coefficient ρ is 1. By contrast, the modified DF and the
unadjusted modified DF test whether (ρ - 1) = 0. Nonstationarity under the null
hypothesis causes a test of whether ρ = 1 to differ from a test of whether (ρ - 1) = 0.
Note that these test equations assume the same AR coefficient across panels.
The variants of these test statistics are based on the following DF regression model:
Δê_{it} = (ρ_i - 1) ê_{i,t-1} + ν_{it}   (122')
In this case, we have a panel-specific AR parameter ρi. The PP t test statistic and its
variants are nonparametrically adjusted for serial correlation in the residuals using the
Newey and West (1987) heteroskedasticity- and autocorrelation-consistent (HAC)
covariance matrix estimator.
The DF, the modified DF, the PP, the modified PP, and the modified VR tests are
derived by specifying a data-generating process for the dependent variable and the
regressors. This specification allows the regressors to be endogenous as well as serially
correlated. Therefore, constructing the test statistics requires estimating the
contemporaneous and dynamic covariances between the regressors and the dependent
variable. The unadjusted DF and the unadjusted modified DF assume absence of serial
correlation and strictly exogenous covariates and do not require any adjustments in the
residuals.
Like the DF and PP tests, the ADF test examines whether ρ = 1. However, the ADF test uses
additional lags of the residuals to control for serial correlation instead of the Newey-
West nonparametric adjustments. The ADF regression is
ê_{it} = ρ ê_{i,t-1} + Σ_{j=1}^{p} α_{ij} Δê_{i,t-j} + w_{it}   (123)
or
ê_{it} = ρ_i ê_{i,t-1} + Σ_{j=1}^{p} α_{ij} Δê_{i,t-j} + w_{it}   (124)
where Δê_{i,t-j} is the jth lag of the first difference Δê_{i,t}, and p is the
number of lagged differences of the dependent variable in each test equation.
The VR tests are based on Phillips and Ouliaris (1990) and Breitung (2002), where the
test statistic is constructed as a ratio of variances. These tests do not require modeling
or accommodating serial correlation. VR tests also test for no cointegration by testing
for the presence of a unit root in the residuals. However, they do so using the ratio of
variances of the predicted residuals. The modified VR test removes estimated
conditional variances prior to computing the VR.
Now take some examples using the command xtcointtest. The dataset used in these
examples is xtcoint.dta, which can be downloaded from Stata-Press by using the
command webuse xtcoint [remember that it is used for Stata 15]. The balanced panel
dataset on 100 countries observed from 1973q3 to 2010q4 contains quarterly data on
the log of productivity (productivity), log of domestic R&D capital stock (rddomestic),
and log of foreign R&D (rdforeign). In these examples, we are interested in the long-
run effects of domestic research and development (R&D) and foreign R&D on an
economy’s productivity.
Kao tests
The cointegrating relationship is specified as:
productivity_{it} = γ_i + β_1 rddomestic_{it} + β_2 rdforeign_{it} + e_{it}
Here γ_i captures the panel-specific means, and the cointegrating parameters β_1 and β_2 are the
same across panels. We assume each series is I(1); this can be checked by using the panel
unit root tests discussed above (xtunitroot). Note that the Kao tests assume the same
AR coefficient across panels [i.e., the common-ρ equations (121') and (123)].
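As a minimal sketch of the corresponding command (following the syntax xtcointtest kao depvar varlist; the default includes panel-specific means):
webuse xtcoint
xtcointtest kao productivity rddomestic rdforeign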
The test result is as below:
Pedroni tests
The cointegrating relationship now allows the cointegrating vector to differ across panels:
productivity_{it} = γ_i + β_{1i} rddomestic_{it} + β_{2i} rdforeign_{it} + e_{it}
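A sketch of the corresponding command (the default allows panel-specific AR parameters, as in Table 13.8):
xtcointtest pedroni productivity rddomestic rdforeign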
Table 13.8: Pedroni test for cointegration with panel-specific AR parameter.
All test statistics reject the null hypothesis of no cointegration in favor of the alternative
hypothesis of the existence of a cointegrating relationship among productivity, domestic
R&D, and foreign R&D.
Westerlund tests
With the allpanels option, the Westerlund test uses the model in which the AR parameter is
the same across panels (alternative hypothesis: all panels are cointegrated), while the
default version allows panel-specific AR parameters (alternative hypothesis: some panels
are cointegrated).
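A sketch of the two corresponding commands (the default 'some panels' version first, then the allpanels version):
xtcointtest westerlund productivity rddomestic rdforeign
xtcointtest westerlund productivity rddomestic rdforeign, allpanels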
Table 13.10: Westerlund test for cointegration with some panels cointegrated.
Table 13.11: Westerlund test for cointegration with all panels cointegrated.
The VR statistics reject the null hypothesis of no cointegration. This implies that at least
some of the panels (default version) or all panels (allpanels version) are cointegrated.
14. SUGGESTED RESEARCH TOPICS
From previous studies, I would like to suggest the following topics that economics
students at UEH can consider for their research proposals.
Saving, Investment and Economic Development
▪ An analysis of the interaction among savings, investments and growth in Vietnam
▪ Are saving and investment cointegrated? The case of Vietnam
▪ Causal relationship between domestic savings and economic growth: Evidence from
Vietnam
▪ Does saving really matter for growth? Evidence from Vietnam
▪ The relationship between savings and growth: Cointegration and causality evidence
from Vietnam
▪ The saving and investment nexus for Vietnam: Evidence from cointegration tests
▪ Do foreign direct investment and gross domestic investment promote economic
growth?
▪ Foreign direct investment and economic growth in Vietnam: An empirical study of
causality and error correction mechanisms
▪ The interactions among foreign direct investment, economic growth, degree of
openness and unemployment in Vietnam
Trade and Economic Development
▪ How trade and foreign investment affect the growth: A case of Vietnam?
▪ Trade, foreign direct investment and economic growth in Vietnam
▪ A cointegration analysis of the long-run relationship between black and official
foreign exchange rates: The case of the Vietnam dong
▪ An empirical investigation of the causal relationship between openness and
economic growth in Vietnam
▪ Export and economic growth in Vietnam: A Granger causality analysis
▪ Export expansion and economic growth: Testing for cointegration and causality for
Vietnam
▪ Is the export-led growth hypothesis valid for Vietnam?
▪ Is there a long-run relationship between exports and imports in Vietnam?
▪ On economic growth, FDI and exports in Vietnam
▪ Trade liberalization and industrial growth in Vietnam: A cointegration analysis
Stock Market and Economic Development
▪ Causality between financial development and economic growth: An application of
vector error correction to Vietnam
▪ Financial development and the FDI growth nexus: The Vietnam case
▪ Macroeconomic environment and stock market: The Vietnam case
▪ The relationship between economic factors and equity market in Vietnam
▪ Modelling the linkages between the US and Vietnam stock markets
▪ The long-run relationship between stock returns and inflation in Vietnam
▪ The relationship between financial deepening and economic growth in Vietnam
▪ Testing the efficient market hypothesis: The Vietnam stock market
▪ Threshold adjustment in the long-run relationship between stock prices and
economic activity
Energy and the Economy
▪ The dynamic relationship between the GDP, imports and domestic production of
crude oil: Evidence from Vietnam
▪ Causal relationship between gas consumption and economic growth: A case of
Vietnam
▪ Causal relationship between energy consumption and economic growth: The case of
Vietnam
▪ Causality relationship between electricity consumption and GDP in Vietnam
▪ The causal relationship between electricity consumption and economic growth in
Vietnam
▪ A cointegration analysis of gasoline demand in Vietnam
▪ Cointegration and causality testing of the energy-GDP relationship: A case of
Vietnam
▪ Does more energy consumption bolster economic growth?
▪ Energy consumption and economic growth in Vietnam: Evidence from a
cointegration and error correction model
▪ The causality between energy consumption and economic growth in Vietnam
▪ The relationship between the price of oil and macroeconomic performance:
Empirical evidence for Vietnam
Fiscal Policy and Economic Development
▪ A causal relationship between government spending and economic development:
An empirical examination of the Vietnam economy
▪ Economic growth and government expenditure: Evidence from Vietnam
▪ Government revenue, government expenditure, and temporal causality: Evidence
from Vietnam
▪ The relationship between budget deficits and money demand: Evidence from
Vietnam
Monetary Policy and Economic Development
▪ Granger causality between money and income for the Vietnam economy
▪ Money, inflation and causality: Evidence from Vietnam
▪ Money-output Granger causality: An empirical analysis for Vietnam
▪ Time-varying parameter error correction models: The demand for money in
Vietnam
▪ Monetary transmission mechanism in Vietnam: A VAR analysis
Tourism and Economic Development
▪ Cointegration analysis of quarterly tourism demand by international tourists:
Evidence from Vietnam
▪ Does tourism influence economic growth? A dynamic panel data approach
▪ International tourism and economic development in Vietnam: A Granger causality
test
▪ Tourism demand modelling: Some issues regarding unit roots, co-integration and
diagnostic tests
▪ Tourism, trade and growth: the case of Vietnam
Agriculture and Economic Development
▪ Dynamics of rice prices and agricultural wages in Vietnam
▪ Macroeconomic factors and agricultural production linkages: A case of Vietnam
▪ Is agriculture the engine of growth?
▪ The causal relationship between fertilizer consumption and agricultural productivity
in Vietnam
▪ Macroeconomics and agriculture in Vietnam
Others
▪ Hypotheses testing concerning relationships between spot prices of various types of
coffee
▪ The relationship between wages and prices in Vietnam
▪ An error correction model of luxury goods expenditures: Evidence from Vietnam
▪ The relationship between macroeconomic variables and housing price index: A case
of Vietnam
▪ Explaining house prices in Vietnam
▪ Long-term trend and short-run dynamics of the Vietnam gold price: an error
correction modelling approach
▪ Macroeconomic adjustment and private manufacturing investment in Vietnam: A
time-series analysis
▪ Testing for the long run relationship between nominal interest rates and inflation
using cointegration techniques
▪ The long-run relationship between house prices and income: Evidence from
Vietnam housing markets
Note that empirical studies have increasingly used nonstationary panels, typically
involving panel unit root tests and panel cointegration tests. The above
topics can also be studied using this strand of models.
series characterized by a unit root is known as I(1), i.e., it becomes stationary after
taking the first difference.
In order to know whether a certain series is stationary or not, we can initially use visual
graphics such as time line plot or correlogram. However, the formal statistical tests are
always preferred. We introduced various tests for a unit root such as Dickey-Fuller,
Phillips-Perron, DF-GLS, and KPSS. We started discussing dynamic modeling by
firstly clarifying the short-run and long-run relationships between I(1) variables within
a single equation context. The key to a long-run relationship is cointegration
between or among variables. If two variables are cointegrated, we are able to investigate
both short-run and long-run effects through error correction mechanism (ECM) models.
Otherwise, we can only investigate the short-run relationship by regressing a model in
first differences. For a single equation, the most popular method for testing
cointegration is the Engle-Granger residual-based unit root test (EG approach). This
testing procedure is simply an application of standard unit root tests to the residuals
obtained from regression between or among variables of interest. If the residual is a
stationary series, we conclude that the variables used in such a regression are
cointegrated. The cointegrating equation represents the long-run or equilibrium
relationship between or among variables. Otherwise, we encounter the problem of
spurious regression. Thanks to cointegration, we can estimate the ECM model, in which
the speed of adjustment to equilibrium has practical implications for policy formulation.
If nonstationary variables are cointegrated, the conventional OLS regression models in
first differences are seriously mis-specified because they omit the error correction term.
Although the EG approach has made many useful contributions, it also has various
drawbacks, especially in multivariate analysis and in a multiple equation context.
The framework for analyzing multivariate relationships and multiple equation systems
is the vector autoregressive (VAR) model. VAR modeling provides a useful framework
for forecasting, causality analysis, and especially estimation of vector error
correction mechanism (VECM) models. Similar to the single equation case,
cointegration is also a topic of interest in the multiple equation approach. In such a situation,
the Johansen test for cointegration has dominated the literature over the last two
decades. However, the Johansen test requires that all the variables under study are
integrated of the same order [i.e., I(1)]. This is not always the case in practice. If we
suspect that the variables may be I(0), I(1), or mutually cointegrated, we can proceed
with the ARDL bounds testing approach. Note that the ARDL bounds test is used
for single equations, in either bivariate or multivariate models.
We end our discussion with nonstationary panels, which extend time series methods
to panel data whose time dimension is characterized by nonstationarity. This
offers opportunities for pursuing a new strand of research in macroeconomics and
especially energy economics. For Applied Economics students, this new package of
techniques is likely to be more complicated than traditional time series models because they
have not been officially introduced into the curriculum. However, everything has its
price: it is harder, but it gives you the chance to do promising research. I introduced
a brief summary of techniques for nonstationary panels, such as panel unit root tests and
panel cointegration tests. Other issues, such as causality analysis, dynamic OLS (i.e.,
DOLS), and fully modified OLS (FMOLS), are beyond the scope of this series of lectures45.
Therefore, if you are really interested, previous empirical studies and advanced
econometrics textbooks are indeed good references.
There are still things that I'd like to share with you on topics of time series econometrics,
but time is up. I hope the notes provide you with the basic knowledge, and it is now
time for you to take the key points learned so far and prepare a research project of your
interest with real data. My final words for you are as follows:
➢ Self-study
Stata Press provides very good documentation for self-study
(http://www.stata-press.com). Here you can find four sources of updated
materials: Books, eBooks, Stata documentation, and the Stata Journal. I am most
interested in the Stata documentation, because it provides (for free) every
syntax and interpretation in detail, including various examples that are extremely useful
for lifelong study. Two manuals that closely relate to our current discussion
are Time-Series and Longitudinal-data/Panel-data. To be effective, you should
download the datasets (in Supplemental materials), re-do the examples step-by-
step, look at the results, and carefully read the interpretations in these manuals.
In addition, you should prepare do-files for every exercise you do,
because this is a good way to review what you have learned in case you
forget. Furthermore, you can learn a great deal from others via Statalist
and the Stata Blog (in Support). There you can see the problems that one faces
when working with Stata.
➢ Literature review
Students often ask me and my colleagues a question like ‘where do the research
topics come from?’ They are from everywhere around us such as real life
observations, talking with others, and so on. But I think the most important
source is reading in the field of knowledge that you are most interested in.
45 A good example of nonstationary panel analysis is the study by Ouedraogo (2013) on the relationship between energy consumption and human development in 15 developing countries for the period 1988 to 2008. In this study, the author used all the panel unit root tests (LLC, Breitung, IPS, Fisher-type, and Hadri), the Pedroni tests for panel cointegration, FMOLS and DOLS for long-run elasticities using a panel error correction model, and panel causality analysis.
Reading previous studies makes you think as a researcher. Every research article
feeds you new ideas for further studies. Reading will show you the gaps that need
to be filled. For time series topics, you can find a lot of studies in
macroeconomics, financial economics, development economics, environmental
economics, energy economics, health economics, etc. It depends on the field
you pay most attention to. For students in developing countries, official access to
academic journals is not easy. But there are two ways around this: Google
Scholar and your supervisor. For cross-sectional research, it is
harder for a student to conduct expensive surveys. But for time series data, you can access
available databases much more easily, because the UEH Data Center has updated
data on various topics.
➢ Hard working
Doing research is never easy. Of course, it is actually a narrow door. It requires
a very strong passion, an ascetic spirit, and an open mind. You will face a lot of
difficulties from the beginning to the end. Finding a novel research idea is not
easy. Developing a feasible research proposal is not easy. Looking for funding
is not easy. Writing a complete manuscript is not easy. And publishing it is really
hard. Besides, successive failures are waiting for you at every step. But being an
economics student you should think like an economist. Trade-offs. Being an
economics student you should think of a research career, at least an analyst at a
local fund management company, not just a sales assistant at an MNC in an
empty suit with a couple of luxury smart phones. What I mean is you must work
harder than you think. Societies in developing countries like ours still prefer
money earners over knowledge creators. However, I believe things are changing.
Economics graduates will be publicly recognized if you and the next
generations work more seriously./.
REFERENCES
Acock, A. C. (2014). A gentle introduction to Stata, 4th Edition. Stata Press.
Adkins, L. C., and Hill, R. C. (2011). Using Stata for principles of econometrics, 4th
Edition. John Wiley & Sons.
Asteriou, D., and Hall, S.G. (2011). Applied econometrics, 2nd Edition. Palgrave
Macmillan.
Banerjee, A. (1999). Panel data unit roots and cointegration: An overview. Oxford
Bulletin of Economics and Statistics, Special Issue, 607-629.
Binh, P. T. (2011). Energy consumption and economic growth in Vietnam: Threshold
cointegration and causality analysis. International Journal of Energy Economics
and Policy, 1, 1-17.
Danish, Wang, B., and Wang, Z. (2018). Imported technology and CO2 emission in
China: Collecting evidence through bound testing and VECM approach. Renewable
and Sustainable Energy Reviews, 82, 4204-14.
Dickey, D.A. and Fuller, W.A. (1979). Distribution of the estimators for autoregressive
time series with a unit root. Journal of the American Statistical Association, 74,
427-431.
Dickey, D.A. and Fuller, W.A. (1981). Likelihood ratio statistics for autoregressive
time series with a unit root. Econometrica, 49, 1057-1072.
Engle, R.F., and Granger, C.W.J. (1987). Co-integration and error correction:
Representation, estimation, and testing. Econometrica, 55, 251-276.
Granger, C.W.J. (1981). Some properties of time series data and their use in
econometric model specification. Journal of Econometrics, 16, 121-130.
Granger, C.W.J. and Newbold, P. (1977). Spurious regressions in econometrics.
Journal of Econometrics, 2, 111-120.
Greene, W. H. (2008). Econometric analysis, 6th Edition. Pearson.
Gregory, A. W., and Hansen, B. E. (1996). Residual-based tests for cointegration in
models with regime shifts. Journal of Econometrics, 70, 461-70.
Gujarati, D. (2011). Econometrics by Example, 1st Edition, Palgrave Macmillan.
Gujarati, D., and Porter, D. (2009). Basic Econometrics, 5th Edition, McGraw-Hill.
Hamilton, L. C. (2013). Statistics with Stata: Updated for version 12. CENGAGE
Learning.
Hanke, J.E., and Wichern, D.W. (2005). Business Forecasting, 8th Edition. Pearson
Education.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in
Gaussian vector autoregressive models. Econometrica, 59, 1551-1580.
Johansen, S. and Juselius, K. (1990). Maximum likelihood estimation and inference on
cointegration, with applications to the demand for money. Oxford Bulletin of
Economics and Statistics, 52, 169-210.
Kripfganz, S., and Schneider, D. C. (2016). Ardl: Stata module to estimate
autoregressive distributed lag models. Stata Conference, Chicago.
Ljung, G.M. and Box, G.E.P. (1978). On a measure of lack of fit in time series models.
Biometrika, 65, 297-303.
Lumsdaine, R., and Papell, D. (1997). Multiple trend breaks and the unit root
hypothesis. Review of Economics and Statistics, 79, 212-18.
MacKinnon, J.G. (1991). Critical values for cointegration tests, in R.F. Engle and
C.W.J. Granger (eds), Long-run economic relationships: Readings in cointegration.
Oxford: Oxford University Press.
MacKinnon, J.G. (1996). Numerical distribution functions for unit root and
cointegration tests. Journal of Applied Econometrics, 11, 601-618.
Narayan, P. K. (2005). The saving and investment nexus for China: evidence from
cointegration tests. Applied Economics, 37, 1979-1990.
Narayan, P. K., and R. Smyth (2014). Applied econometrics and a decade of energy
economics research. Unpublished manuscript.
Nguyen Trong Hoai, Phung Thanh Binh, and Nguyen Khanh Duy. (2009). Forecasting
and data analysis in economics and finance, Statistical Publishing House.
Omri, A. (2014). An international literature survey on energy economic growth nexus:
Evidence from country-specific studies. MPRA Paper, No. 82452.
Ouedraogo, N. (2013). Energy consumption and human development: Evidence from
a panel cointegration and error correction model. Energy, 63, 28-41.
Ozturk, I. (2010). A literature survey on energy–growth nexus. Energy Policy 38, 340–
349.
Pesaran, H.M., Shin, Y., Smith, R.J. (2001). Bounds testing approaches to the analysis
of level relationships. Journal of Applied Econometrics 16, 289–326.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal
of Econometrics, 33, 311-340.
Phillips, P.C.B. (1987). Time series regression with a unit root. Econometrica, 55,
277-301.
Phillips, P.C.B. (1998). New tools for understanding spurious regressions.
Econometrica, 66, 1299-1325.
Phillips, P.C.B. and Perron, P. (1988). Testing for a unit root in time series regression.
Biometrika, 75, 335-346.
Rahman, M. M., and Kashem, M. A. (2017). Carbon emissions, energy consumption
and industrial growth in Bangladesh: Empirical evidence from ARDL cointegration
and Granger causality analysis. Energy Policy, 110, 600-8.
Rushdi, M., Kim, J. H., and Silvapulle, P. (2012). ARDL bounds tests and robust
inference for the long run relationship between real stock returns and inflation
in Australia. Economic Modelling, 29, 535-543.
Sims, C.A. (1980). Macroeconomics and reality. Econometrica, 48, 1-48.
StataCorp. (2015). Longitudinal data/panel data reference manual release 14:
xtunitroot. College Station, TX, Stata-Press.
StataCorp. (2017a). Longitudinal data/panel data reference manual release 15:
xtcointtest. College Station, TX, Stata-Press.
StataCorp. (2017b). Time series reference manual release 15. College Station, TX,
Stata-Press.
Stock, J.H., and Watson, M.W. (2015). Introduction to econometrics, 3rd Edition,
Pearson Education.
Studenmund, A.H. (2017). Using econometrics: A practical guide, 7th Edition, Pearson.
Toda, H.Y. and Yamamoto, T. (1995). Statistical inference in vector autoregressions
with possibly integrated processes. Journal of Econometrics, 66, 225-250.
Verbeek, M. (2004). A Guide to modern econometrics. 2nd Edition, John Wiley & Sons.
Westerlund, J., Thuraisamy, K., and Sharma, S. (2015). On the use of panel
cointegration tests in energy economics. Energy Economics, 50, 359-63.
Wooldridge, J. M. (2013). Introductory econometrics: A modern approach, 5th Edition,
South-Western CENGAGE Learning.
Zhang, H., Zhao, Q., Kuuluvainen, I., Wang, C., and Li, S. (2015). Determinants
of China’s lumber import: A bounds test for cointegration with monthly data.
Journal of Forest Economics, 21, 269-82.
Zivot, E., and Andrews, D. W. K. (1992). Further evidence on the great crash, the oil
price shock and the unit root hypothesis. Journal of Business and Economic
Statistics, 10, 251-70.