ARDL
[Note: For an important update of this post, relating to EViews 9, see my 2015 post,
here.]
Well, I finally got it done! Some of these posts take more time to prepare than you
might think.
The first part of this discussion was covered in a (sort of!) recent post, in which I
gave a brief description of Autoregressive Distributed Lag (ARDL) models, together
with some historical perspective. Now it's time for us to get down to business and
see how these models have come to play a very important role recently in the
modelling of non-stationary time-series data.
In particular, we'll see how they're used to implement the so-called "Bounds Tests",
to see if long-run relationships are present when we have a group of time-series,
some of which may be stationary, while others are not. A detailed worked example,
using EViews, is included.
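First, recall that the basic form of an ARDL regression model is:
$$y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \alpha_0 x_t + \alpha_1 x_{t-1} + \dots + \alpha_q x_{t-q} + \varepsilon_t \qquad (1)$$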
We're going to modify this model somewhat for our purposes here. Specifically, we'll
work with a mixture of differences and levels of the data. The reasons for this will
become apparent as we go along.
Let's suppose that we have a set of time-series variables, and we want to model the
relationship between them, taking into account any unit roots and/or cointegration
associated with the data. First, note that there are three straightforward situations
that we're going to put to one side, because they can be dealt with in standard
ways:
1. We know that all of the series are I(0), and hence stationary. In this case, we can
simply model the data in their levels, using OLS estimation, for example.
2. We know that all of the series are integrated of the same order (e.g., I(1)), but they
are not cointegrated. In this case, we can just (appropriately) difference each series,
and estimate a standard regression model using OLS.
3. We know that all of the series are integrated of the same order, and they are
cointegrated. In this case, we can estimate two types of models: (i) An OLS
regression model using the levels of the data. This will provide the long-run
equilibrating relationship between the variables. (ii) An error-correction model
(ECM), estimated by OLS. This model will represent the short-run dynamics of the
relationship between the variables.
Now, let's return to the more complicated situation mentioned above. Some of the
variables in question may be stationary, some may be I(1) or even fractionally
integrated, and there is also the possibility of cointegration among some of the I(1)
variables. In other words, things just aren't as "clear cut" as in the three situations
noted above.
What do we do in such cases if we want to model the data appropriately and extract
both long-run and short-run relationships? This is where the ARDL model enters the
picture.
The ARDL / Bounds Testing methodology of Pesaran and Shin (1999) and Pesaran et
al. (2001) has a number of features that many researchers feel give it some
advantages over conventional cointegration testing. For instance:
It can be used with a mixture of I(0) and I(1) data.
It involves just a single-equation set-up, making it simple to implement and
interpret.
Different variables can be assigned different lag-lengths as they enter the model.
We need a road map to help us. Here are the basic steps that we're going to follow
(with details to be added below):
1. Make sure that none of the variables are I(2), as such data will invalidate the
methodology.
2. Formulate an "unrestricted" error-correction model (ECM). This will be a particular
type of ARDL model.
3. Determine the appropriate lag structure for the model in step 2.
4. Make sure that the errors of this model are serially independent.
5. Make sure that the model is "dynamically stable".
6. Perform a "Bounds Test" to see if there is evidence of a long-run relationship
between the variables.
7. If the outcome at step 6 is positive, estimate a long-run "levels model", as well as a
separate "restricted" ECM.
8. Use the results of the models estimated in step 7 to measure short-run dynamic
effects, and the long-run equilibrating relationship between the variables.
We can see from the form of the generic ARDL model given in equation (1) above,
that such models are characterised by having lags of the dependent variable, as
well as lags (and perhaps the current value) of other variables, as the regressors.
Let's suppose that there are three variables that we're interested in modelling: a
dependent variable, y, and two other explanatory variables, x1 and x2. More
generally, there will be (k + 1) variables - a dependent variable, and k other
variables.
Before we start, let's recall what a conventional ECM for cointegrated data looks
like. It would be of the form:
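$$\Delta y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \sum_{j=0}^{q_1} \gamma_j \Delta x_{1,t-j} + \sum_{k=0}^{q_2} \delta_k \Delta x_{2,t-k} + \varphi z_{t-1} + e_t \qquad (2)$$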
Here, z, the "error-correction term", is the OLS residuals series from the long-run
"cointegrating regression",
$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + v_t \qquad (3)$$
Step 1:
We can use the ADF and KPSS tests to check that none of the series we're working
with are I(2).
Step 2:
Formulate the following model:
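$$\Delta y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \sum_{j=0}^{q_1} \gamma_j \Delta x_{1,t-j} + \sum_{k=0}^{q_2} \delta_k \Delta x_{2,t-k} + \theta_0 y_{t-1} + \theta_1 x_{1,t-1} + \theta_2 x_{2,t-1} + e_t \qquad (4)$$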
Notice that this is almost like a traditional ECM. The difference is that we've now
replaced the error-correction term, zt-1, with the terms yt-1, x1t-1, and x2t-1. From
(3), we can see that the lagged residuals series would be
zt-1 = (yt-1 - a0 - a1x1t-1 - a2x2t-1), where the a's are the OLS estimates of the
α's. So, what we're doing in equation (4) is including the same lagged levels as we
do in a regular ECM, but we're not restricting their coefficients.
This is why we might call equation (4) an "unrestricted ECM", or an "unconstrained
ECM". Pesaran et al. (2001) call this a "conditional ECM".
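To make the structure of the unrestricted ECM concrete, here is a minimal sketch in Python with NumPy (not the EViews workflow used in this post). It builds the regressors of equation (4) and fits them by OLS. The function names, the simulated data, and the lag choices p = q1 = q2 = 1 are all assumptions made purely for illustration.

```python
import numpy as np

def shift(x, lag):
    """Return the series lagged by `lag` periods, with NaN where no value exists."""
    out = np.full(len(x), np.nan)
    out[lag:] = x[:len(x) - lag]
    return out

def unrestricted_ecm(y, x1, x2, p=1, q1=1, q2=1):
    """OLS fit of the unrestricted ECM; returns (names, coefficients, residuals)."""
    y, x1, x2 = (np.asarray(s, dtype=float) for s in (y, x1, x2))
    # First differences, padded with NaN so they align with the levels
    dy, dx1, dx2 = (np.diff(s, prepend=np.nan) for s in (y, x1, x2))
    cols = {"const": np.ones(len(y))}
    for i in range(1, p + 1):          # lagged differences of y: 1..p
        cols[f"dy(-{i})"] = shift(dy, i)
    for j in range(0, q1 + 1):         # differences of x1: 0..q1
        cols[f"dx1(-{j})"] = shift(dx1, j)
    for k in range(0, q2 + 1):         # differences of x2: 0..q2
        cols[f"dx2(-{k})"] = shift(dx2, k)
    # Lagged levels, entered with unrestricted coefficients
    cols["y(-1)"] = shift(y, 1)
    cols["x1(-1)"] = shift(x1, 1)
    cols["x2(-1)"] = shift(x2, 1)
    X = np.column_stack(list(cols.values()))
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(dy)
    coef, *_ = np.linalg.lstsq(X[keep], dy[keep], rcond=None)
    resid = dy[keep] - X[keep] @ coef
    return list(cols), coef, resid

# Simulated (hypothetical) data: two random walks and a y that is tied to them
rng = np.random.default_rng(0)
T = 200
x1 = np.cumsum(rng.normal(size=T))
x2 = np.cumsum(rng.normal(size=T))
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=T)

names, coef, resid = unrestricted_ecm(y, x1, x2)
est = dict(zip(names, coef))
long_run_x1 = -est["x1(-1)"] / est["y(-1)"]
print(est["y(-1)"], long_run_x1)
```

In this artificial set-up the coefficient on y(-1) should come out negative, and the implied long-run coefficient on x1 should be close to the 0.5 used to generate the data.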
Step 3:
The ranges of summation in the various terms in (4) are from 1 to p, 0 to q1, and 0
to q2 respectively. We need to select the appropriate values for the maximum lags,
p, q1, and q2. Also, note that the "zero lags" on x1 and x2 may not necessarily be
needed. Usually, these maximum lags are determined by using one or more of the
"information criteria" - AIC, SC (BIC), HQ, etc. These criteria are based on a high
log-likelihood value, with a "penalty" for including more lags to achieve this. The
form of the penalty varies from one criterion to another. Each criterion starts with
-2log(L), and then penalizes, so the smaller the value of an information criterion the
better the result.
I generally use the Schwarz (Bayes) criterion (SC), as it's a consistent
model-selector. Some care has to be taken not to "over-select" the maximum lags, and I
usually also pay some attention to the (apparent) significance of the coefficients in
the model.
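As a rough illustration of how these criteria trade off fit against the number of parameters, here is a small Python sketch using the textbook (unscaled) definitions. Some packages divide each criterion by the sample size, which changes the reported numbers but not the ranking of models fitted to the same sample. The log-likelihood values below are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike criterion: -2 log(L) plus a penalty of 2 per parameter."""
    return -2.0 * loglik + 2.0 * k

def sc(loglik, k, n_obs):
    """Schwarz (Bayes) criterion: penalty of log(T) per parameter."""
    return -2.0 * loglik + k * math.log(n_obs)

def hq(loglik, k, n_obs):
    """Hannan-Quinn criterion: penalty of 2 log(log(T)) per parameter."""
    return -2.0 * loglik + 2.0 * k * math.log(math.log(n_obs))

# Two hypothetical candidate models fitted to the same T = 190 observations
T = 190
small = {"loglik": -100.0, "k": 5}
large = {"loglik": -98.5, "k": 7}   # slightly better fit, two extra lags

for name, m in (("small", small), ("large", large)):
    print(name, aic(m["loglik"], m["k"]), sc(m["loglik"], m["k"], T))
```

Here both criteria prefer the smaller model, and SC penalizes the two extra lags more heavily than AIC does, which is why SC tends to choose shorter lag lengths.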
Step 4:
A key assumption in the ARDL / Bounds Testing methodology of Pesaran et al.
(2001) is that the errors of equation (4) must be serially independent. As those
authors note (p.308), this requirement may also be influential in our final choice of
the maximum lags for the variables in the model.
Once an apparently suitable version of (4) has been estimated, we should use the
LM test to test the null hypothesis that the errors are serially independent, against
the alternative hypothesis that the errors are (either) AR(m) or MA(m), for m = 1, 2,
3, and so on.
Step 5:
We have a model with an autoregressive structure, so we have to be sure that the
model is "dynamically stable". For full details of what this means, see my recent
post, When is an Autoregressive Model Dynamically Stable? What we need to do is
to check that all of the inverse roots of the characteristic equation associated with
our model lie strictly inside the unit circle. That recent post of mine showed how to
trick EViews into giving us the information we want in order to check that this
condition is satisfied. I won't repeat that here.
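For readers working outside EViews, the stability check itself is easy to sketch. The Python snippet below assumes the autoregressive part of the model has been written in levels as y_t = a1 y_{t-1} + ... + ap y_{t-p} + (other terms), and it uses NumPy's polynomial root finder. The function names and coefficient values are illustrative assumptions, not taken from the model in this post.

```python
import numpy as np

def inverse_roots(ar_coefs):
    """Roots of z^p - a1 z^(p-1) - ... - ap = 0 for the AR coefficients a1..ap."""
    poly = np.concatenate(([1.0], -np.asarray(ar_coefs, dtype=float)))
    return np.roots(poly)

def is_dynamically_stable(ar_coefs):
    """True if every inverse root lies strictly inside the unit circle."""
    return bool(np.all(np.abs(inverse_roots(ar_coefs)) < 1.0))

print(is_dynamically_stable([0.5, 0.3]))    # two real roots, both inside
print(is_dynamically_stable([1.2, -0.1]))   # one root outside the unit circle
print(np.abs(inverse_roots([0.5, -0.5])))   # moduli of a complex conjugate pair
```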
Step 6:
Now we're ready to perform the "Bounds Testing"!
Here's equation (4), again:
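$$\Delta y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \sum_{j=0}^{q_1} \gamma_j \Delta x_{1,t-j} + \sum_{k=0}^{q_2} \delta_k \Delta x_{2,t-k} + \theta_0 y_{t-1} + \theta_1 x_{1,t-1} + \theta_2 x_{2,t-1} + e_t \qquad (4)$$
All that we're going to do is perform an F-test of the hypothesis H0: θ0 = θ1 = θ2 = 0,
that is, that the coefficients of the lagged levels yt-1, x1t-1, and x2t-1 are jointly
zero, so that there is no long-run relationship. The distribution of this statistic is
non-standard, so we compare it with the lower, I(0), and upper, I(1), bounds on the
critical values supplied by Pesaran et al. (2001). A value above the upper bound
points to a long-run relationship, a value below the lower bound does not, and a value
between the bounds is inconclusive.
As a cross-check, we can also look at the t-statistic for yt-1 in (4). If it is more
negative than the "I(1) bound" tabulated by Pesaran et al. (2001; pp.303-304), this
would support the conclusion that there is a long-run relationship between the
variables. If it is closer to zero than the "I(0) bound", we'd conclude that the data
are all stationary.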
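The test statistic for the bounds F-test is computed in the usual way for a joint zero restriction on the lagged levels; only the critical values are non-standard. A minimal Python sketch of that calculation, using the restricted and unrestricted residual sums of squares, is given below. The function name and the input numbers are hypothetical.

```python
def joint_f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_resid):
    """Standard F-statistic for linear restrictions.

    rss_restricted   : residual sum of squares with the lagged levels dropped
    rss_unrestricted : residual sum of squares of the full unrestricted ECM
    n_restrictions   : number of lagged-level coefficients set to zero
    df_resid         : observations minus parameters in the unrestricted model
    """
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator

# Hypothetical values, for illustration only
f_stat = joint_f_statistic(rss_restricted=120.0, rss_unrestricted=100.0,
                           n_restrictions=2, df_resid=50)
print(f_stat)
```

The resulting value is then compared with the tabulated bounds rather than with the usual F distribution.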
Step 7:
Assuming that the bounds test leads to the conclusion of cointegration, we can
meaningfully estimate the long-run equilibrium relationship between the variables:
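$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + v_t \qquad (5)$$
as well as the usual ECM:
$$\Delta y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \sum_{j=0}^{q_1} \gamma_j \Delta x_{1,t-j} + \sum_{k=0}^{q_2} \delta_k \Delta x_{2,t-k} + \varphi z_{t-1} + e_t \qquad (6)$$
where zt-1 = (yt-1 - a0 - a1x1t-1 - a2x2t-1), and the a's are the OLS estimates of the
α's in (5).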
Step 8:
We can "extract" long-run effects from the unrestricted ECM. Looking back at
equation (4), and noting that at a long-run equilibrium, Δyt = 0, Δx1t = Δx2t = 0,
we see that the long-run coefficients for x1 and x2 are -(θ1/θ0) and -(θ2/θ0)
respectively, where θ0, θ1, and θ2 are the coefficients of yt-1, x1t-1, and x2t-1.
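The algebra behind this result is short enough to write out. The symbols are assumed names: θ0, θ1, and θ2 for the coefficients of the lagged levels of y, x1, and x2 in the unrestricted ECM, and β0 for its intercept. Setting every differenced term and the error to zero, and dropping time subscripts in equilibrium, gives:

```latex
\begin{aligned}
0 &= \beta_0 + \theta_0\, y + \theta_1\, x_1 + \theta_2\, x_2 \\
y &= -\frac{\beta_0}{\theta_0} \;-\; \frac{\theta_1}{\theta_0}\, x_1 \;-\; \frac{\theta_2}{\theta_0}\, x_2
\end{aligned}
```

So the long-run response of y to a unit change in x1 is -(θ1/θ0), and similarly for x2. This requires θ0 to be non-zero, which is what the bounds test is checking.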
An Example:
Now we're ready to look at a very simple empirical example. I'm going to use the
data for U.S. and European natural gas prices that I made available as a second
example in my post, Testing for Granger Causality. I didn't go through the details of
testing for Granger causality with that set of data, but I mentioned it near the end of
the post, and the EViews file (which included a "read_me" object with comments
about the results) is there on the code page for this blog (dated 29 April, 2011).
If you look back at that earlier file, you'll find that I used the Toda-Yamamoto (1995)
testing procedure to determine that there is Granger causality running from the U.S.
series to the European series, but not vice versa.
A new EViews file that uses the same data for our ARDL modelling is available on
the code page, under the date for the current post. The data for the two time-series
we'll be using are also available on the data page for this blog. The data are
monthly, from 1995(01) to 2011(03). In terms of the notation that was introduced
earlier, we have (k + 1) = 2 variables, so k = 1 when it comes to the bounds
testing.
Here's a plot of the data we'll be using:
To complete Step 1, we need to check that neither of our time-series is I(2).
Applying the ADF test to the levels of EUR and US, the p-values are 0.53 and 0.10
respectively. Applying the test to the first-differences of the series, the p-values are
both 0.00. (The lag-lengths for the ADF regressions were chosen using the Schwarz
criterion, SC.) Clearly, neither series is I(2).
Applying the KPSS test we reject the null of stationarity, even at the 1%
significance level, for both EUR and US, but cannot reject the null of I(1) against I(2).
The p-value of 10% for the ADF test of I(1) vs. I(0) for the EUR series may leave us
wondering if that series is stationary, or not. You'll know that apparent "conflicts"
between the outcomes of tests such as these are very common in practice.
This is a great illustration of how the ARDL / Bounds Testing methodology can help
us. In order for standard cointegration testing (such as that of Engle and Granger, or
Johansen) to make any sense, we must be really sure that all of the series are
integrated of the same order. In this instance, you might not be feeling totally sure
that this is the case.
Step 2 is straightforward. Given that the Granger causality testing associated with
my earlier post suggested that there is causality from US to EUR (but not vice
versa), EUR is going to be the dependent variable in my unrestricted ECM:
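$$\Delta \mathrm{EUR}_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta \mathrm{EUR}_{t-i} + \sum_{j=0}^{q} \gamma_j \Delta \mathrm{US}_{t-j} + \theta_0 \mathrm{EUR}_{t-1} + \theta_1 \mathrm{US}_{t-1} + e_t \qquad (7)$$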
To implement the information criteria for selecting the lag-lengths in a
time-efficient way, I "tricked" EViews into providing lots of them at once by doing the
following. I estimated a 1-equation VAR model for ΔEURt, and I supplied the
intercept, EURt-1, USt-1, and a fixed number of lags of ΔUSt as exogenous
regressors. For example, when the fixed number of lags on ΔUSt was zero, here's
how I specified the VAR:
After estimating this model, I then chose VIEW, LAG STRUCTURE, LAG LENGTH
CRITERIA:
I then repeated this by adding ΔUSt-1 to the list of exogenous variables, and got
the following results:
I proceeded in this manner with additional lags of ΔUSt in the "exogenous" list. I
also considered cases such as:
Looking at the SC values in these three tables of results, we see that a maximum
lag of 4 is suggested for ΔEURt. (The AIC values suggest that 8 lags of ΔEURt may
be appropriate, but some experimentation with this was not fruitful.)
There is virtually no difference between the SC values for the case where the model
includes just ΔUSt as a regressor (0.8714), and the case where just ΔUSt-1 is included
(0.8718). To get some dynamics into the model, I'm going to go with the latter case.
With Step 3 completed, and with this lag specification in mind, let's now look at the
estimated unrestricted ECM:
Step 4 involves checking that the errors of this model are serially independent.
Selecting VIEW, RESIDUAL DIAGNOSTICS, SERIAL CORRELATION LM TEST, I get the
following results:
m     LM        p-value
1     0.079     0.779
2     2.878     0.237
3     5.380     0.146
4     11.753    0.019
O.K., we have a problem with serial correlation! To deal with it, I experimented with
one or two additional lags of the dependent variable as regressors, and ended up
with the following specification for the unrestricted ECM:
The serial independence results now look much more satisfactory:
m     LM        p-value
1     0.013     0.911
2     3.337     0.189
3     5.183     0.159
4     7.989     0.092
5     8.473     0.132
6     11.023    0.088
7     12.270    0.092
8     12.334    0.137
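The p-values in the LM tables above can be checked by hand. The sketch below assumes, as is standard for the Breusch-Godfrey form of the test, that the LM statistic is asymptotically chi-square with m degrees of freedom under the null of serial independence. It uses only the Python standard library, and the helper name chi2_sf is my own.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)                # closed form for df = 2
    else:
        k, q = 1, math.erfc(math.sqrt(half))     # closed form for df = 1
    while k < df:
        # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# LM statistics from the first specification, for m = 1, ..., 4
lm_stats = [0.079, 2.878, 5.380, 11.753]
p_values = [chi2_sf(lm, m) for m, lm in enumerate(lm_stats, start=1)]
print([round(p, 3) for p in p_values])
```

The rejection at m = 4 in the first specification is what prompted the extra lags of the dependent variable.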
Next, Step 5 involves checking the dynamic stability of this ARDL model. Here are
the inverse roots of the associated characteristic equation:
All seems to be well - these roots are all inside the unit circle.
Before proceeding to the Bounds Testing, let's take a look at the "fit" of our
unrestricted ECM. The "Actual / Fitted / Residuals" plot looks like this:
When we "unscramble" these results, and look at the fit of the model in terms of
explaining the level of EUR itself, rather than ΔEUR, things look pretty good:
We're now ready for Step 6 - the Bounds Test itself. We want to test if the
coefficients of both EUR(-1) and US(-1) are zero in our estimated model (repeated
below):
The value of our F-statistic is 5.827, and we have (k + 1) = 2 variables (EUR and US)
in our model. So, when we go to the Bounds Test tables of critical values, we have k
= 1.
Table CI (iii) on p.300 of Pesaran et al. (2001) is the relevant table for us to use here.
We haven't constrained the intercept of our model, and there is no linear trend term
included in the ECM. The lower and upper bounds for the F-test statistic at the 10%,
5%, and 1% significance levels are [4.04 , 4.78], [4.94 , 5.73], and [6.84 , 7.84]
respectively.
As the value of our F-statistic exceeds the upper bound at the 5% significance level,
we can conclude that there is evidence of a long-run relationship between the two
time-series (at this level of significance or greater).
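The decision rule is mechanical, so it can be sketched in a few lines of Python. The bounds and the F-statistic are the ones quoted above, while the function name and the wording of the outcomes are my own.

```python
def bounds_decision(stat, lower, upper):
    """Compare an F-statistic with the I(0) (lower) and I(1) (upper) bounds."""
    if stat > upper:
        return "long-run relationship"
    if stat < lower:
        return "no evidence of a long-run relationship"
    return "inconclusive"

# Bounds for k = 1, unrestricted intercept and no trend, as quoted in the text
f_bounds = {"10%": (4.04, 4.78), "5%": (4.94, 5.73), "1%": (6.84, 7.84)}
f_stat = 5.827

decisions = {level: bounds_decision(f_stat, lo, hi)
             for level, (lo, hi) in f_bounds.items()}
for level, outcome in decisions.items():
    print(level, outcome)
```

At the 10% and 5% levels the statistic is above the upper bound. At the 1% level it falls below the lower bound, so the evidence does not extend that far.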
In addition, the t-statistic on EUR(-1) is -2.926. When we look at Table CII (iii) on
p.303 of Pesaran et al. (2001), we find that the I(0) and I(1) bounds for the t-statistic
at the 10%, 5%, and 1% significance levels are [-2.57 , -2.91], [-2.86 , -3.22], and
[-3.43 , -3.82] respectively. At least at the 10% significance level, this result reinforces
our conclusion that there is a long-run relationship between EUR and US.
Applying the Step 8 formula to the estimated unrestricted ECM, we see that the
long-run multiplier between US and EUR is -(0.047134 / (-0.030804)) = 1.53. In the
long run, an increase of 1 unit in US will lead to an increase of 1.53 units in EUR.
If we estimate the levels model,
EURt = α0 + α1USt + vt
by OLS, and construct the residuals series, {zt}, we can fit a regular (restricted)
ECM:
Notice that the coefficient of the error-correction term, zt-1, is negative and very
significant. This is what we'd expect if there is cointegration between EUR and US.
The magnitude of this coefficient implies that nearly 3% of any disequilibrium
between EUR and US is corrected within one period (one month).
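To get a feel for what an adjustment speed of this size means, here is a back-of-the-envelope Python sketch. It treats the correction as simple geometric decay and ignores the other short-run terms in the ECM, so it is only a rough guide. The value -0.03 is an assumed round number standing in for "nearly 3%", not the exact estimate.

```python
import math

def share_remaining(adjustment_coef, periods):
    """Share of an initial disequilibrium left after `periods` periods."""
    return (1.0 + adjustment_coef) ** periods

def half_life(adjustment_coef):
    """Periods needed for half of a disequilibrium to be corrected."""
    if not -1.0 < adjustment_coef < 0.0:
        raise ValueError("expected an adjustment coefficient between -1 and 0")
    return math.log(0.5) / math.log(1.0 + adjustment_coef)

phi = -0.03  # assumed illustrative value for the error-correction coefficient
months_to_half = half_life(phi)
left_after_year = share_remaining(phi, 12)
print(months_to_half, left_after_year)
```

Under these simplifying assumptions, it takes a little under two years for half of any gap between EUR and its long-run level to close.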
As none of the roots lie on the X (real) axis, it's clear that we have three complex
conjugate pairs of roots. Accordingly, the short-run dynamics associated with the
model are quite complicated. This can be seen if we consider the impulse response
function associated with a "shock" of one (sample) standard deviation:
Finally, the within-sample fit (in terms of the levels of EUR) is exceptionally good:
In fact, the simple correlations between EUR and the "fitted" EUR series from the
unrestricted and regular ECMs are each 0.994, and the correlation between the two
fitted series is 0.9999.
References
Pesaran, M. H. and Y. Shin, 1999. An autoregressive distributed lag modelling
approach to cointegration analysis. Chapter 11 in S. Strom (ed.), Econometrics and
Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium.
Cambridge University Press, Cambridge. (Discussion Paper version.)
Pesaran, M. H., Y. Shin, and R. J. Smith, 2001. Bounds testing approaches to the
analysis of level relationships. Journal of Applied Econometrics, 16, 289-326.
Pesaran, M. H. and R. P. Smith, 1998. Structural analysis of cointegrating VARs.
Journal of Economic Surveys, 12, 471-505.
Toda, H. Y. and T. Yamamoto, 1995. Statistical inferences in vector autoregressions
with possibly integrated processes. Journal of Econometrics, 66, 225-250.