1 Preliminaries
1.1 Time Series Variables and Dynamic Models
For a time series variable $y_t$, the observations usually are indexed by a $t$ subscript instead of $i$. Unless stated otherwise, we assume that $y_t$ is observed at each period $t = 1, \dots, n$, and these periods are evenly spaced over time, e.g. years, months, or quarters. $y_t$ can be a flow variable (e.g. GDP, trading volume), a stock variable (e.g. capital stock), or a price or interest rate. For stock variables or prices, it can be important how they are defined, since they may vary within the period. A price at period $t$ might be defined as the price in the middle of the period, at the end of the period, the average price over the period, etc.
A model that describes how $y_t$ evolves over time is called a time series process, and a regression model that has terms from different time periods entering in the same equation is a dynamic model. An example of a dynamic model is:

$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + \beta_3 x_{t-1} + u_t$
Models with time series variables usually are dynamic models, but not necessarily. You might have

$y_t = \alpha + \beta x_t + u_t$

where $u_t$ is distributed independently of its past values. This is not a dynamic model, because there is nothing in it that links the different time periods.
1.2 Lag and First Difference Operators
The lag operator $L$ is defined as:

$L y_t = y_{t-1}$

$L^a y_t$ for some positive integer $a$ means lagging $y_t$ by $a$ periods. If $a = 2$ then

$L^2 y_t = L(L y_t) = L y_{t-1} = y_{t-2}$
Lag polynomials are notated as $A(L)$, $B(L)$, etc. An example is

$A(L) = 1 - \lambda L$

Then

$A(L) y_t = y_t - \lambda y_{t-1}$
Often a lag polynomial can be inverted. Let $A(L) = 1 - \rho L$. If $|\rho|$ is less than one, then $A(L)^{-1}$ expands like a geometric series,

$A(L)^{-1} = (1 - \rho L)^{-1} = 1 + \rho L + \rho^2 L^2 + \rho^3 L^3 + \rho^4 L^4 + \dots$

This expansion is used to obtain equation (2) in section 2.1.1.
The first difference, or change in $y_t$ compared to the previous period, will be denoted by the first difference operator $D$, where:

$D y_t = y_t - y_{t-1}$

(The symbol $\Delta$ is used later for describing the change in $y$ due to a change in $x$.)

Note that $D^2 y_t$, the second difference of $y_t$, does not equal $y_t - y_{t-2}$:

$D^2 y_t = D(D y_t) = D(y_t - y_{t-1}) = D y_t - D y_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$
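In pandas, the lag and difference operators correspond directly to the shift and diff methods; a minimal sketch (the series values are made up):

    import pandas as pd

    y = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])  # a short illustrative series

    lag1 = y.shift(1)        # L y_t   = y_{t-1}
    lag2 = y.shift(2)        # L^2 y_t = y_{t-2}
    d1 = y.diff(1)           # D y_t   = y_t - y_{t-1}
    d2 = y.diff(1).diff(1)   # D^2 y_t = y_t - 2 y_{t-1} + y_{t-2}

    # Confirm that the second difference is not y_t - y_{t-2}
    print((d2 - (y - y.shift(2))).dropna())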
1.3 White Noise
In dynamic models, the effects of the error terms are felt across time and across variables. To understand what the model is predicting, and to figure out how to estimate its coefficients and variances, models are commonly specified with error terms having no autocorrelation or heteroskedasticity. These are referred to as white noise errors or disturbances. Following Greene's textbook, we will represent a white noise variable as $u_t$. (In many other sources it is represented by $\varepsilon_t$.) So the term "white noise" is a short way to say that

$E u_t = 0$, $\mathrm{Var}(u_t) = \sigma_u^2$, for all $t$, and

$E u_t u_s = 0$, for all $t, s$, $t \neq s$
1.4 Stationarity

Stationarity usually means covariance stationarity, where the expected value, variance, and autocovariances of $y_t$ do not depend on $t$, that is, they do not change over time:

$E y_t$, $\mathrm{Var}(y_t)$, and $\mathrm{Cov}(y_t, y_{t-s})$ are not functions of $t$.
In a model, the phrase "$y_t$ is stationary" can mean "$y_t$ is assumed to follow a covariance stationary time series process." If $y_t$ is an observed time series, then "$y_t$ is stationary" is a (possibly over-confident) way of saying either "on the basis of some testing procedure, we cannot reject the null hypothesis that $y_t$ was generated by a covariance stationary process" or "we can reject the null hypothesis that $y_t$ was generated by a non-stationary process."
White noise is stationary, but stationarity does not imply white noise. An autocorrelated process
can be stationary but is not white noise.
1.5 Examples of nonstationary processes
(i) An example of a nonstationary process is

$y_t = \alpha + \beta t + u_t$

where $u_t$ is white noise. The mean of $y_t$, $\alpha + \beta t$, is a function of $t$. This process is called trend stationary, because apart from the trend, the rest of the process is stationary. That is, once $\beta t$ is accounted for by moving it to the left-hand side, the rest is stationary: $y_t - \beta t = \alpha + u_t$.
(ii) Another example of a nonstationary process is a random walk,

$y_t = y_{t-1} + u_t$

where $u_t$ is white noise. Taking the variance of both sides,

$\mathrm{Var}(y_t) = \mathrm{Var}(y_{t-1} + u_t)$
$\mathrm{Var}(y_t) = \mathrm{Var}(y_{t-1}) + \sigma_u^2$

Unless $\sigma_u^2 = 0$, the variance of this process increases with $t$, hence must depend on $t$, and the process is not stationary.
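A quick simulation illustrates the difference between these two cases; this is only a sketch, with the intercept and trend slope chosen arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 200, 2000
    u = rng.normal(size=(reps, n))              # white noise draws

    t = np.arange(1, n + 1)
    trend_stationary = 1.0 + 0.05 * t + u       # y_t = alpha + beta*t + u_t
    random_walk = u.cumsum(axis=1)              # y_t = y_{t-1} + u_t, y_0 = 0

    # Across replications, the trend-stationary series has roughly constant
    # variance at every t, while the random walk's variance grows with t.
    print(trend_stationary.var(axis=0)[[9, 49, 199]])   # all close to 1
    print(random_walk.var(axis=0)[[9, 49, 199]])        # close to 10, 50, 200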
2 Time Series Models

2.1 ARMA models
An important time series model in statistics is the autoregressive moving average (ARMA(p, q)) model

$y_t = \alpha + \rho_1 y_{t-1} + \dots + \rho_p y_{t-p} + u_t + \gamma_1 u_{t-1} + \dots + \gamma_q u_{t-q}$

where $u_t$ is white noise. Letting $A(L) = 1 - \rho_1 L - \dots - \rho_p L^p$ and $B(L) = 1 + \gamma_1 L + \dots + \gamma_q L^q$, the ARMA(p, q) model can be written compactly as

$A(L) y_t = \alpha + B(L) u_t$
The $A(L) y_t$ term is the autoregressive part and the $B(L) u_t$ term is the moving average part.

An ARMA(p, 0) process is more simply called an AR(p) process,

$A(L) y_t = \alpha + u_t$

This is a regression model where $y_t$ is regressed on its own lags.

Similarly, an ARMA(0, q) process is called an MA(q) process,

$y_t = \alpha + B(L) u_t$

which expresses $y_t$ as a weighted average of the current and past $q$ disturbances.
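To make the notation concrete, here is a minimal simulation sketch of an ARMA(1,1) process; the parameter values are arbitrary illustrations, not taken from the notes:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    alpha, rho1, gamma1 = 0.5, 0.7, 0.3   # illustrative ARMA(1,1) parameters

    u = rng.normal(size=n)                # white noise
    y = np.zeros(n)
    for t in range(1, n):
        # y_t = alpha + rho1*y_{t-1} + u_t + gamma1*u_{t-1}
        y[t] = alpha + rho1 * y[t - 1] + u[t] + gamma1 * u[t - 1]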
Some generalizations of ARMA models include

(i) the autoregressive integrated moving average (ARIMA(p, d, q)) model

$D^d y_t = \alpha + \rho_1 D^d y_{t-1} + \dots + \rho_p D^d y_{t-p} + u_t + \gamma_1 u_{t-1} + \dots + \gamma_q u_{t-q}$

which is an ARMA model applied to the $d^{th}$ difference of $y_t$, and

(ii) the ARMAX model

$y_t = \alpha + \rho_1 y_{t-1} + \dots + \rho_p y_{t-p} + x_t'\beta + u_t + \gamma_1 u_{t-1} + \dots + \gamma_q u_{t-q}$
which augments the ARMA model with $k$ other regressor variables through a $k \times 1$ coefficient vector $\beta$.

2.1.1 The AR(1) model

A case that will be used repeatedly is the AR(1) model

$y_t = \gamma + \rho y_{t-1} + u_t \qquad (1)$

where $u_t$ is white noise. Writing (1) as $(1 - \rho L) y_t = \gamma + u_t$ and, when $|\rho| < 1$, inverting the lag polynomial as in section 1.2 gives

$y_t = \frac{\gamma}{1 - \rho} + u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots \qquad (2)$

If instead $|\rho| > 1$, the magnitudes of the coefficients on the lag operators get larger as the lags go "further back". Roughly speaking, that would mean that what happened 100 periods ago would have a bigger effect on the present than what happened last period. This is an explosive time series process, which sounds kind of exciting, but is not considered very useful in time series econometrics. There is a great deal of interest in the borderline case where $|\rho| = 1$, specifically when $\rho = 1$ (not so much $\rho = -1$). This value is associated with unit root time series, which will be dealt with later.
2.2 Dynamically complete models
Dynamically complete models are time series regression models with no autocorrelation in the errors. Let $x_t$ and $z_t$ be two time series explanatory variables; then such a model is

$y_t = \alpha + \rho_1 y_{t-1} + \dots + \rho_p y_{t-p} + \beta_0 x_t + \dots + \beta_q x_{t-q} + \psi_0 z_t + \dots + \psi_r z_{t-r} + u_t \qquad (3)$

which can be written more compactly as

$A(L) y_t = \alpha + B(L) x_t + C(L) z_t + u_t$

where $u_t$ is white noise. There can be more right-hand side variables.

2.3 Models with autocorrelated errors vs. dynamically complete models
Some of the above models have an MA component in the error terms, and therefore have autocorrelated errors, while others have white noise errors. Suppose we have specified a model with white noise errors, but find evidence of autocorrelation in the residuals of the fitted model. (Tests for autocorrelation are discussed in section 4.2.2.) There are two main ways to adjust the model to deal with this. One is to model the autocorrelation in the errors, and the other is to include more lagged regressors until there no longer is evidence of such autocorrelation. This second approach (making the model dynamically complete (Wooldridge (2009, pp. 396-9))) has become the more popular one. In this section it is shown how these two approaches are related, and why the second approach has become more popular.
2.3.1 Autoregressive errors
Suppose the model is

$y_t = \alpha + \beta x_t + \varepsilon_t \qquad (4)$

and autocorrelation is suspected in the error term $\varepsilon_t$. Of the two ways of modeling autocorrelation that we have seen, AR and MA processes, AR is by far the more common. Next, we will argue why AR errors are more commonly used than MA errors. As usual, let $u_t$ represent white noise. If the model has mean-zero AR(p) errors, then

$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \dots + \rho_p \varepsilon_{t-p} + u_t$, or equivalently
$A(L)\varepsilon_t = u_t$, where $A(L) = 1 - \rho_1 L - \dots - \rho_p L^p$, or
$\varepsilon_t = A(L)^{-1} u_t$

Let $p = 1$, giving AR(1) or first-order autoregressive errors. This has been the most commonly employed version, and it simplifies notation for our purposes. Then (4) becomes

$y_t = \alpha + \beta x_t + (1 - \rho_1 L)^{-1} u_t \qquad (5)$
This AR error process implies a gradual decrease in the error autocorrelations as the lag increases: $E \varepsilon_t \varepsilon_{t-1} = E(\rho_1 \varepsilon_{t-1} + u_t)\varepsilon_{t-1} = \rho_1 \mathrm{Var}(\varepsilon_t)$, assuming $|\rho_1| < 1$ and therefore that the process is stationary. Continuing this approach would show that $E \varepsilon_t \varepsilon_{t-s} = \rho_1^s \mathrm{Var}(\varepsilon_t)$, and therefore

$\mathrm{Corr}(\varepsilon_t, \varepsilon_{t-s}) = \frac{\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-s})}{\sqrt{\mathrm{Var}(\varepsilon_t)}\sqrt{\mathrm{Var}(\varepsilon_{t-s})}} = \frac{\rho_1^s \mathrm{Var}(\varepsilon_t)}{\mathrm{Var}(\varepsilon_t)} = \rho_1^s$

If the residual autocorrelations die out gradually rather than suddenly, as would be implied by an MA(q) error process, then AR errors likely provide a better description of the autocorrelation pattern.
Estimation with AR errors
Both AR and MA errors can be estimated efficiently by Generalized Least Squares when there are no lagged $y$ regressors. The GLS estimator of the regression coefficients when there are AR(1) errors (e.g. GLS estimation of $\alpha$ and $\beta$ in (5)) is very similar to the following "quasi-differencing" procedure. First, estimate $\rho_1$. Many estimators have been proposed, a simple one being the sample correlation between the OLS residual $e_t$ and its first lag $e_{t-1}$. Then take the approximate model that results from substituting $\hat{\rho}_1$ for $\rho_1$ in (5) and multiply through by $(1 - \hat{\rho}_1 L)$,

$y_t = \alpha + \beta x_t + (1 - \hat{\rho}_1 L)^{-1} u_t$
$(1 - \hat{\rho}_1 L) y_t = (1 - \hat{\rho}_1 L)\alpha + \beta (1 - \hat{\rho}_1 L) x_t + u_t$
$\tilde{y}_t = \alpha^* + \beta \tilde{x}_t + u_t$

where $\tilde{y}_t = y_t - \hat{\rho}_1 y_{t-1}$ and $\tilde{x}_t = x_t - \hat{\rho}_1 x_{t-1}$ are quasi-differenced variables. OLS estimation of $\alpha^*$ and $\beta$ in the last equation is very similar to the GLS estimator. It is known as the Cochrane-Orcutt estimator and has been used since the 1940s.
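A small numpy sketch of this quasi-differencing procedure (the data are simulated and the parameter values are made up):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = 0.6 * eps[t - 1] + u[t]     # AR(1) errors
    y = 1.0 + 2.0 * x + eps                  # model (4) with alpha = 1, beta = 2

    # Step 1: OLS, then estimate rho_1 from the residuals
    X = np.column_stack([np.ones(n), x])
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    rho_hat = np.corrcoef(e[1:], e[:-1])[0, 1]

    # Step 2: quasi-difference and re-run OLS (Cochrane-Orcutt)
    y_tilde = y[1:] - rho_hat * y[:-1]
    x_tilde = x[1:] - rho_hat * x[:-1]
    X_tilde = np.column_stack([np.ones(n - 1), x_tilde])
    b_co = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]
    print(rho_hat, b_co)   # b_co[1] estimates beta; b_co[0] estimates alpha*(1 - rho_1)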
Disadvantages of AR errors
In current practice, models with AR errors largely have been replaced by models with more lagged explanatory variables and white noise errors. One disadvantage of AR errors is that the dynamic effects are difficult to interpret. When the dynamics are only in observed variables, a more direct description of "the effect of $x$ on $y$" over time is possible.

Another disadvantage of AR errors arises when there are lagged $y$ regressors. Consider the model

$y_t = \alpha + \gamma y_{t-1} + \beta x_t + \varepsilon_t$, where $\varepsilon_t = \rho \varepsilon_{t-1} + u_t \qquad (6)$

Here $y_{t-1}$ is an endogenous regressor. The model itself implies a correlation between $y_t$ and $\varepsilon_t$, and therefore between $y_{t-1}$ and $\varepsilon_{t-1}$. Unless $\rho = 0$, there must then be a correlation between the regressor $y_{t-1}$ and the error term $\varepsilon_t$, since $y_{t-1}$ depends on $\varepsilon_{t-1}$, which in turn is correlated with $\varepsilon_t$. Instrumental variable estimators have been proposed. Unlike many other endogenous-regressor problems, however, there is another way out, which is described next.
From AR errors to dynamically complete models
The once-popular time series model with AR errors has been largely replaced by dynamically complete models, which have more lags and no autocorrelation in the errors. The reasons for this will be illustrated here by converting from one to the other and pointing out the advantages of the latter.

First, rewrite model (6) with the autocorrelation removed by multiplying through by $(1 - \rho L)$,

$y_t = \alpha + \gamma y_{t-1} + \beta x_t + (1 - \rho L)^{-1} u_t$
$(1 - \rho L) y_t = (1 - \rho L)\alpha + \gamma (1 - \rho L) y_{t-1} + \beta (1 - \rho L) x_t + u_t$
$y_t = (1 - \rho)\alpha + \rho y_{t-1} + \gamma y_{t-1} - \rho\gamma y_{t-2} + \beta x_t - \rho\beta x_{t-1} + u_t \qquad (7)$

Model (7) has new lagged regressors which replace the dynamics that were in the error term of (6). Simplifying the notation for the regression coefficients, (7) can be written as

$y_t = \alpha^* + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \theta_3 x_t + \theta_4 x_{t-1} + u_t \qquad (8)$

Eliminating the autocorrelation in the errors has solved the endogenous regressor problem, although comparing (7) to (8) reveals a restriction which can be written as $\theta_1 \theta_3 \theta_4 = -\theta_4^2 + \theta_2 \theta_3^2$. This awkward-looking nonlinear restriction arises from having multiplied through the original model by the "common factor" $(1 - \rho L)$, and is referred to as a common factor restriction. In current practice, it is typically not imposed. Researchers' starting point is a model more like (8) than (6). Having estimated (8), it is not natural to ask whether $\theta_1 \theta_3 \theta_4 = -\theta_4^2 + \theta_2 \theta_3^2$. It is more natural to ask if $\theta_2 = 0$, or $\theta_4 = 0$, or whether $y_{t-3}$ and/or $x_{t-2}$ should be included as regressors, or whether the error term really is white noise. The goal is to specify a dynamically complete model without including lags unnecessarily.
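A simulation sketch of this conversion (parameters made up): data are generated from model (6), and OLS is applied both to the short regression in (6) and to the dynamically complete form (8).

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000
    alpha, gamma, beta, rho = 1.0, 0.5, 2.0, 0.6

    x = rng.normal(size=n)
    u = rng.normal(size=n)
    eps = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
        y[t] = alpha + gamma * y[t - 1] + beta * x[t] + eps[t]

    # OLS on (6): y_t on (1, y_{t-1}, x_t). y_{t-1} is endogenous, so the
    # coefficient on y_{t-1} does not settle near gamma = 0.5 even in large samples.
    X6 = np.column_stack([np.ones(n - 2), y[1:-1], x[2:]])
    print(np.linalg.lstsq(X6, y[2:], rcond=None)[0])

    # OLS on (8): y_t on (1, y_{t-1}, y_{t-2}, x_t, x_{t-1}). The error is white
    # noise, and the estimates approach [alpha(1-rho), rho+gamma, -rho*gamma, beta, -rho*beta].
    X8 = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2], x[2:], x[1:-1]])
    print(np.linalg.lstsq(X8, y[2:], rcond=None)[0])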
2.4 Means and Variances
2.4.1 Conditional and unconditional means
If, in Section 2.1.1, we assume that $E u_t = 0$ for all $t$ in model (1), then (2) makes it easy to see that $E y_t = \gamma/(1-\rho)$. This may seem to contradict (1), which says that $E y_t = \gamma + \rho y_{t-1}$. This apparent contradiction arises because the regression model (1) on the one hand shows the expected value of $y_t$ conditional on $y_{t-1}$. On the other hand, (2) does not have any regressor variables and has a zero-mean error term, so its constant term represents the unconditional mean of $y_t$. Summarizing with more careful expectation notation,

$E(y_t) = \gamma/(1-\rho)$ (unconditional mean)
$E(y_t \mid y_{t-1}) = \gamma + \rho y_{t-1}$ (conditional mean)

The unconditional mean does not depend on $t$. This is one of the necessary conditions for a process to be stationary. The unconditional mean of $y_t$ can be derived by using the assumed stationarity of $y_t$ to justify the restriction $E y_t = E y_{t-1} = y^*$, say. Take the expectation of both sides of the above conditional mean expression with respect to both $y_t$ and $y_{t-1}$ and then solve for $y^*$:

$E(y_t \mid y_{t-1}) = \gamma + \rho y_{t-1}$
$E(y_t) = \gamma + \rho E(y_{t-1}) = \gamma + \rho y^*$
$y^* = \gamma + \rho y^*$
$y^* = \frac{\gamma}{1 - \rho}$
2.4.2 Deriving unconditional moments of a stationary AR(1)
Consider $y_t$ where

$y_t = \delta + \lambda y_{t-1} + u_t$

and $u_t$ is white noise. Since it is not specified otherwise, and usually it is not, we assume there are no starting conditions. (An example of a starting condition is $y_0 = 0$.) When there are no starting conditions, we are assuming that we observe a segment of a process that started long enough ago to have settled into its steady-state behaviour, in which the variable fluctuates around a constant long-run unconditional mean. The above process can be written as

$(1 - \lambda L) y_t = \delta + u_t$
$y_t = (1 - \lambda L)^{-1}\delta + (1 - \lambda L)^{-1} u_t$
$= \frac{\delta}{1 - \lambda} + u_t + \lambda u_{t-1} + \lambda^2 u_{t-2} + \dots$
$= \frac{\delta}{1 - \lambda} + \sum_{j=0}^{\infty} \lambda^j u_{t-j}$

From the last expression we see that $E y_t = \delta/(1-\lambda)$. The white noise assumption leads to variance, covariance and correlation results

$\mathrm{Var}(y_t) = E\Big(\sum_{j=0}^{\infty} \lambda^j u_{t-j}\Big)^2 = \sum_{j=0}^{\infty} \lambda^{2j} E u_{t-j}^2 = \Big(\sum_{j=0}^{\infty} \lambda^{2j}\Big)\sigma_u^2 = \frac{\sigma_u^2}{1 - \lambda^2}$

$\mathrm{Cov}(y_t, y_{t-s}) = E\Big[\Big(\sum_{j=0}^{\infty} \lambda^j u_{t-j}\Big)\Big(\sum_{i=0}^{\infty} \lambda^i u_{t-s-i}\Big)\Big] = \sum_{i=0}^{\infty} \lambda^{s+i}\lambda^i E u_{t-s-i}^2 = \lambda^s \Big(\sum_{i=0}^{\infty} \lambda^{2i}\Big)\sigma_u^2 = \frac{\lambda^s \sigma_u^2}{1 - \lambda^2}$

(only the cross products with matching time subscripts have non-zero expectation), and

$\mathrm{Corr}(y_t, y_{t-s}) = \frac{\mathrm{Cov}(y_t, y_{t-s})}{\sqrt{\mathrm{Var}(y_t)}\sqrt{\mathrm{Var}(y_{t-s})}} = \frac{\mathrm{Cov}(y_t, y_{t-s})}{\mathrm{Var}(y_t)} = \lambda^s$

The covariances are not zero, so $y_t$ is not white noise. But because the mean, variance, and covariances do not change over time, $y_t$ is stationary.
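A quick simulation check of these formulas; the values of $\delta$, $\lambda$, and $\sigma_u$ are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(4)
    delta, lam, sigma = 2.0, 0.8, 1.0
    n, burn = 100_000, 1_000

    u = rng.normal(scale=sigma, size=n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = delta + lam * y[t - 1] + u[t]
    y = y[burn:]   # drop the burn-in so starting conditions do not matter

    print(y.mean(), delta / (1 - lam))               # unconditional mean, ~10
    print(y.var(), sigma**2 / (1 - lam**2))          # variance, ~2.78
    print(np.corrcoef(y[2:], y[:-2])[0, 1], lam**2)  # lag-2 autocorrelation, ~0.64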
2.4.3 Long-run expected values in dynamic regression models
The method of section 2.4.2 can be generalized to obtain the unconditional (on past $y$ values) mean of $y_t$, conditional on other explanatory variables, when the latter are assigned values that are fixed over time. Consider

$y_t = \alpha + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \dots + \rho_p y_{t-p} + \gamma_0 x_t + \gamma_1 x_{t-1} + \dots + \gamma_q x_{t-q} + u_t$, or
$A(L) y_t = \alpha + B(L) x_t + u_t$

where $A(L) = 1 - \rho_1 L - \dots - \rho_p L^p$ and $B(L) = \gamma_0 + \gamma_1 L + \dots + \gamma_q L^q$.

The long-run or steady-state value of $y$ when $x_t = x^*$ at all periods is found by letting the solution be $y^*$, say, then substituting these long-run values into the regression model, setting the disturbance term $u_t$ to zero, and solving for $y^*$, as follows:

$y^* = \alpha + \rho_1 y^* + \rho_2 y^* + \dots + \rho_p y^* + \gamma_0 x^* + \gamma_1 x^* + \dots + \gamma_q x^*$
$y^*(1 - \rho_1 - \rho_2 - \dots - \rho_p) = \alpha + (\gamma_0 + \gamma_1 + \dots + \gamma_q) x^*$
$y^* = \frac{\alpha}{1 - \rho_1 - \rho_2 - \dots - \rho_p} + \left(\frac{\gamma_0 + \gamma_1 + \dots + \gamma_q}{1 - \rho_1 - \rho_2 - \dots - \rho_p}\right) x^*$
A notational trick with the lag polynomials makes this manipulation easier to write. When a lag polynomial operates on something that is constant over time, the $L$ operator does not change its value, so this operation is equivalent to multiplying by one. In that case we can replace $A(L)$, for example, by $A(1) = 1 - \rho_1 - \dots - \rho_p$, since $A(L) y^* = (1 - \rho_1 - \rho_2 - \dots - \rho_p) y^* = A(1) y^*$. With this notation, the above derivation can be written as

$A(1) y^* = \alpha + B(1) x^*$
$y^* = \frac{\alpha}{A(1)} + \left(\frac{B(1)}{A(1)}\right) x^*$
This approach generalizes easily to several "$x$" variables. A regression model with $k$ regressor variables and their lags can be written compactly as

$A(L) y_t = \alpha + B_1(L) x_{1t} + B_2(L) x_{2t} + \dots + B_k(L) x_{kt} + u_t$

When each regressor variable is fixed at a constant value over time, $x_{jt} = x_j^*$, $j = 1, \dots, k$, then the long-run or steady-state mean of $y$ is

$y^* = \frac{\alpha}{A(1)} + \sum_{j=1}^{k}\left(\frac{B_j(1)}{A(1)}\right) x_j^*$

We require $A(1) > 0$, that is, $\sum_{i=1}^{p} \rho_i < 1$, which is similar to the requirement $\rho < 1$ seen earlier in the single-lag model.
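The long-run mean is easy to compute directly from estimated coefficients; a small sketch with illustrative coefficient values:

    import numpy as np

    alpha = 0.5
    rho = np.array([0.3, 0.1])         # coefficients on y_{t-1}, y_{t-2}
    gamma = np.array([4.2, 1.7, 0.0])  # coefficients on x_t, x_{t-1}, x_{t-2}
    x_star = 2.0                       # value at which x is held fixed

    A1 = 1 - rho.sum()                 # A(1)
    B1 = gamma.sum()                   # B(1)
    y_star = alpha / A1 + (B1 / A1) * x_star
    print(A1, B1, y_star)              # 0.6, 5.9, about 20.5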
3 Interpreting the Coefficients of Dynamic Models
Here again is the dynamic model (3) from Section 2.2.
$y_t = \alpha + \rho_1 y_{t-1} + \dots + \rho_p y_{t-p} + \beta_0 x_t + \dots + \beta_q x_{t-q} + \psi_0 z_t + \dots + \psi_r z_{t-r} + u_t$

The coefficients are not very informative in isolation. For example, the coefficient $\beta_2$ on the regressor $x_{t-2}$ indicates the change in $y_t$ resulting from a one-unit change in $x_{t-2}$, holding the other regressors constant. But the model itself implies that a one-unit change in $x_{t-2}$ would have changed $y_{t-1}$ and $y_{t-2}$ (unless $\beta_0 = \beta_1 = 0$). The holding-all-else-constant thought experiment is not very appealing for dynamic models.
A better way to interpret these coefficients is to use them to trace out the effect that a change in $x_t$ would have on $y_t$, $y_{t+1}$, $y_{t+2}$, etc. This can be done recursively as follows. Let $\Delta y_t$ represent the change in $y_t$ resulting from a change in $x$ at time $t$ only, denoted as $\Delta x_t$ (as distinct from the first difference operator $D$ introduced earlier). The effects of $\Delta x_t$ on $y_{t+1}$, $y_{t+2}$, etc. are derived recursively from the original model by adjusting the $t$ subscripts to $t+1$, $t+2$, etc. as required:

$\Delta y_t = \beta_0 \Delta x_t$
$\Delta y_{t+1} = \rho_1 \Delta y_t + \beta_1 \Delta x_t = \rho_1(\beta_0 \Delta x_t) + \beta_1 \Delta x_t = (\rho_1\beta_0 + \beta_1)\Delta x_t$
$\Delta y_{t+2} = \rho_1 \Delta y_{t+1} + \rho_2 \Delta y_t + \beta_2 \Delta x_t = \rho_1\big((\rho_1\beta_0 + \beta_1)\Delta x_t\big) + \rho_2(\beta_0 \Delta x_t) + \beta_2 \Delta x_t = (\rho_1^2\beta_0 + \rho_1\beta_1 + \rho_2\beta_0 + \beta_2)\Delta x_t$

and so on.

When the process is stationary, the long run effect of this one-time-only change in $x$ settles to zero, that is, $\lim_{s \to \infty} \frac{\Delta y_{t+s}}{\Delta x_t} = 0$.
Another way to think about the effect of $x$ on $y$ in this model is to trace out the effect of a "permanent" change in $x$ beginning at time $t$ that lasts into the future. That is, let $\Delta x_t = \Delta x_{t+1} = \Delta x_{t+2} = \dots = \Delta x$, say. In general, this has a non-zero long run effect on $y$. Applied to the above model, these effects are
$\Delta y_t = \beta_0 \Delta x$
$\Delta y_{t+1} = \rho_1 \Delta y_t + \beta_0 \Delta x + \beta_1 \Delta x = \rho_1(\beta_0 \Delta x) + (\beta_0 + \beta_1)\Delta x = (\rho_1\beta_0 + \beta_1 + \beta_0)\Delta x$
$\Delta y_{t+2} = \rho_1 \Delta y_{t+1} + \rho_2 \Delta y_t + \beta_0 \Delta x + \beta_1 \Delta x + \beta_2 \Delta x = \rho_1\big((\rho_1\beta_0 + \beta_1 + \beta_0)\Delta x\big) + \rho_2(\beta_0 \Delta x) + \beta_0 \Delta x + \beta_1 \Delta x + \beta_2 \Delta x = (\rho_1^2\beta_0 + \rho_1\beta_1 + \rho_1\beta_0 + \rho_2\beta_0 + \beta_2 + \beta_1 + \beta_0)\Delta x$

and so on, settling in the long run to

$\Delta y_{t+\infty} = \frac{B(1)}{A(1)}\Delta x = \frac{\beta_0 + \beta_1 + \dots + \beta_q}{1 - \rho_1 - \dots - \rho_p}\Delta x$
Numerical applications are not as notationally cumbersome. Suppose a fitted dynamic model is

$\hat{y}_t = 5 + .3 y_{t-1} + .1 y_{t-2} + 4.2 x_t + 1.7 x_{t-1} + 0 x_{t-2} + e_t$

The effects on $y$ of a one-time-only change in $x_t$ are

$\Delta y_t = 4.2\Delta x_t$
$\Delta y_{t+1} = .3\Delta y_t + 1.7\Delta x_t = .3(4.2\Delta x_t) + 1.7\Delta x_t = 2.96\Delta x_t$
$\Delta y_{t+2} = .3\Delta y_{t+1} + .1\Delta y_t + 0\Delta x_t = .3(2.96\Delta x_t) + .1(4.2\Delta x_t) + 0 = 1.308\Delta x_t$
$\Delta y_{t+3} = .3\Delta y_{t+2} + .1\Delta y_{t+1} = .3(1.308\Delta x_t) + .1(2.96\Delta x_t) = .6884\Delta x_t$
$\Delta y_{t+4} = (.3 \times .6884 + .1 \times 1.308)\Delta x_t = .33732\Delta x_t$
$\vdots$
$\Delta y_{t+\infty} = 0 \times \Delta x_t$
and the effects on $y$ of a permanent change in $x$ are

$\Delta y_t = 4.2\Delta x$
$\Delta y_{t+1} = .3\Delta y_t + 4.2\Delta x + 1.7\Delta x = .3(4.2\Delta x) + 4.2\Delta x + 1.7\Delta x = 7.16\Delta x$
$\Delta y_{t+2} = .3\Delta y_{t+1} + .1\Delta y_t + 4.2\Delta x + 1.7\Delta x + 0\Delta x = .3(7.16\Delta x) + .1(4.2\Delta x) + 4.2\Delta x + 1.7\Delta x + 0\Delta x = 8.468\Delta x$
$\vdots$
$\Delta y_{t+\infty} = \frac{4.2 + 1.7 + 0}{1 - .3 - .1}\Delta x \approx 9.833\Delta x$
These permanent-change effects are cumulative sums of the previously-derived one-time change effects. This is reflected in the Stata terminology of simple and cumulative IRFs. (IRFs are impulse response functions, which express these effects as a function of the time elapsed after the change in $x$. They usually are associated with VARs and VECMs, which are discussed in the next set of notes.)
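The recursion is easy to code; a short sketch using the fitted coefficients from this example (the horizon of 10 periods is arbitrary):

    import numpy as np

    rho = [0.3, 0.1]            # coefficients on y_{t-1}, y_{t-2}
    beta = [4.2, 1.7, 0.0]      # coefficients on x_t, x_{t-1}, x_{t-2}
    horizon = 10

    simple = np.zeros(horizon)  # effect of a one-time-only unit change in x
    for s in range(horizon):
        val = beta[s] if s < len(beta) else 0.0
        for i, r in enumerate(rho, start=1):
            if s - i >= 0:
                val += r * simple[s - i]
        simple[s] = val

    cumulative = simple.cumsum()       # effect of a permanent unit change in x
    print(simple[:5])                  # 4.2, 2.96, 1.308, 0.6884, 0.33732
    print(cumulative[:3])              # 4.2, 7.16, 8.468
    print(sum(beta) / (1 - sum(rho)))  # long-run effect of a permanent change, 9.833...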
4 Estimation and Testing
Models like (3) and (8) are estimated by OLS. There is no autocorrelation in $u_t$, so the lagged $y$ regressors are not correlated with $u_t$. As long as $x_t$ is exogenous and there is no autocorrelation in the errors, the regressor values at time $t$ and earlier are not correlated with $u_t$.
4.1 OLS: biased, but consistent
There is one other detail to consider, though, that does not arise in cross-section data. Let $z_t = [1 \;\; y_{t-1} \;\; y_{t-2} \;\; x_t \;\; x_{t-1}]'$ be the $5 \times 1$ vector of regressor values at time $t$, and $\theta$ the coefficient vector. Then (8) can be written as

$y_t = z_t'\theta + u_t$

and the OLS estimator is

$\hat{\theta} = \Big(\sum_t z_t z_t'\Big)^{-1} \sum_t z_t y_t$
The usual decomposition of $\hat{\theta}$ is

$\hat{\theta} = \Big(\sum_t z_t z_t'\Big)^{-1} \sum_t z_t (z_t'\theta + u_t) = \theta + \Big(\sum_t z_t z_t'\Big)^{-1} \sum_t z_t u_t$

Even though we assume that the elements of $z_t$ are not correlated with $u_t$, there will be non-zero correlations between $u_t$ and some elements of $z_{t+s}$, $s > 0$, because the lagged $y$ regressors appearing in $z_{t+s}$ are correlated with $u_t$. This causes a correlation between $\sum_t z_t u_t$ and some elements of $\big(\sum_t z_t z_t'\big)^{-1}$, so that

$E\Big[\Big(\sum_t z_t z_t'\Big)^{-1} \sum_t z_t u_t\Big] \neq 0$

which implies that $\hat{\theta}$ is biased. Despite this bias, the following argument shows that $\mathrm{plim}\,\hat{\theta} = \theta$, implying that the bias shrinks to zero as $n \to \infty$.
Since $E z_t u_t = 0$, then $E z_t(y_t - z_t'\theta) = 0$. Solving for $\theta$ gives $\theta = (E z_t z_t')^{-1} E z_t y_t$.

Now consider the plim of $\hat{\theta}$:

$\mathrm{plim}\,\hat{\theta} = \mathrm{plim}\Big[\Big(\sum_t z_t z_t'\Big)^{-1} \sum_t z_t y_t\Big] = \Big(\mathrm{plim}\; n^{-1}\sum_t z_t z_t'\Big)^{-1}\Big(\mathrm{plim}\; n^{-1}\sum_t z_t y_t\Big) = (E z_t z_t')^{-1}(E z_t y_t) = \theta$

where the last step follows from applying the LLN separately to each of the two sample means. Therefore $\hat{\theta}$ is a consistent estimator of $\theta$. The fact that $\hat{\theta}$ is biased in finite samples tends to be ignored in practice.
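A small Monte Carlo sketch of this point, with made-up parameters: in an AR(1), the OLS estimate of the lag coefficient is biased in short samples, and the bias shrinks as $n$ grows.

    import numpy as np

    rng = np.random.default_rng(5)
    rho_true, reps = 0.5, 2000

    def mean_ols_rho(n):
        """Average OLS estimate of rho in y_t = rho*y_{t-1} + u_t over many samples."""
        est = np.empty(reps)
        for r in range(reps):
            u = rng.normal(size=n)
            y = np.zeros(n)
            for t in range(1, n):
                y[t] = rho_true * y[t - 1] + u[t]
            est[r] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
        return est.mean()

    for n in (25, 100, 400):
        print(n, mean_ols_rho(n))   # approaches rho_true = 0.5 as n grows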
4.2 Specification Tests
The two main specification issues are: how many lags to apply to each of the regressor variables, and whether or not the errors are autocorrelated.
4.2.1 Choosing the number of lags

In practice, there may be many more explanatory variables than the one "$x$" that appears in (8). Which lags should be included for each of these variables?

One way to streamline this decision-making, given some maximum lag $p$, is to require that all of the lower-order lags be included. That is, if $p = 3$, include all of the variables $x_t$, $x_{t-1}$, $x_{t-2}$ and $x_{t-3}$. For example, suppose the coefficient on $x_{t-1}$ is not significant, and you remove this regressor. Even though its removal is 'supported' by the test, the restriction itself usually has little intuitive appeal. It restricts the pattern of the dynamic effects of $x$ on $y$ in a hard-to-describe and possibly nonintuitive way. One exception is when the data are quarterly or monthly and there is some seasonal pattern (e.g. a spike in consumption in December). Then you might specify one or two lags, followed by a gap, and another longer lag (e.g. lag 12 for monthly data) to capture the stochastic part of this seasonal pattern.
A way to further simplify this decision is to require that the same number of lags be used for each
variable. The following procedures can easily be adapted to this restriction.
Testing up and testing down
One way to select the maximum lag is to test whether the coefficient of the maximum lag of a given
model equals zero. If the test accepts, reduce the lag by one and repeat. If it rejects, stop. Starting
with a lot of lags and working down in this way is called testing-down.
Another approach is to start with a small number of lags. Add one lag, and if the new lag is not significant, then stop. If it is significant, then keep it, add another one, test it, etc. This is called testing-up.

Testing-down and testing-up both are prone to stopping at a bad number of lags due to unavoidable type I and type II errors. They are sensitive to two arbitrary choices: the number of lags you start at and the significance level of the test. Testing-down seems to be the more popular of the two.
Model selection criteria
Another approach to lag length selection is to estimate the model at many different lag lengths and compute a model selection criterion for each. Then pick the lag length that gave the best value of the criterion. These criteria are functions of the sum of squared residuals and the number of regressors, handling the tradeoff between model parsimony and goodness-of-fit. Of the many criteria that have been suggested, the most commonly-used one for lag selection is Akaike's information criterion (AIC):

$AIC = n \ln(RSS) + 2k$

where $n$ is the sample size, $RSS$ is the sum of squared residuals, and $k$ is the number of unrestricted regression coefficients. A smaller AIC is better.
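A sketch of AIC-based lag selection for an autoregression, using the $n\ln(RSS) + 2k$ form above (the series y is assumed to be a 1-D numpy array; for a strict comparison, each lag length should be fit on the same sample, e.g. dropping the first max-lag observations):

    import numpy as np

    def aic_for_lag(y, p):
        """Fit an AR(p) by OLS and return n*ln(RSS) + 2k."""
        T = len(y)
        Y = y[p:]
        X = np.column_stack([np.ones(T - p)] +
                            [y[p - j: T - j] for j in range(1, p + 1)])
        coef = np.linalg.lstsq(X, Y, rcond=None)[0]
        rss = ((Y - X @ coef) ** 2).sum()
        n, k = len(Y), X.shape[1]
        return n * np.log(rss) + 2 * k

    # Example use on some observed series y:
    # best_p = min(range(1, 9), key=lambda p: aic_for_lag(y, p))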
4.2.2 Testing for autocorrelation
The traditional test of the null hypothesis of no autocorrelation in the errors is the Durbin-Watson test. The original test is not applicable when there is a lagged $y$ regressor, but Durbin's $h$ statistic was developed for that situation. These tests still are used, but have been largely supplanted by the Box-Pierce, Ljung-Box, and Breusch-Godfrey tests.
The Box-Pierce test statistic is

$Q_{BP} = n \sum_{j=1}^{p} \hat{\rho}_j^2$

where $\hat{\rho}_j$ is the estimated $j^{th}$ order autocorrelation in the OLS residuals. The null hypothesis is $H_0: \rho_1 = \dots = \rho_p = 0$, where $\rho_j$ is the $j^{th}$ order autocorrelation of the disturbance process.

The Ljung-Box test statistic adjusts $Q_{BP}$ to more closely follow its asymptotic null distribution in finite samples,

$Q_{LB} = n(n+2)\sum_{j=1}^{p} \frac{\hat{\rho}_j^2}{n - j}$
The Breusch-Godfrey test statistic is $nR^2$, where $R^2$ is from the auxiliary regression

$e_t = z_t'\delta + \rho_1 e_{t-1} + \rho_2 e_{t-2} + \dots + \rho_p e_{t-p} + \text{error}$

where the errors in question are the $u_t$ from $y_t = z_t'\theta + u_t$, and the $e_t$ are the OLS residuals from that regression.
For each of these three tests, $n$ is the number of observations used in the regression (which depends on both the number of observations in the original data set and the number of lags). The null hypothesis is $H_0: \rho_1 = \dots = \rho_p = 0$, where $\rho_j$ is the $j^{th}$ order autocorrelation of the disturbance process. The asymptotic null distribution of all three test statistics is chi-square with $p$ d.f., and $p$ is chosen by the researcher. The rejection region is in the right tail.
These tests' accept/reject rules are simpler than the Durbin-Watson tests', they allow for more general lagged dependent variable specifications, and they enable testing for autocorrelations beyond the first order.
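For reference, a sketch of how these tests are commonly run with statsmodels (the simulated model and the lag choice $p = 4$ are arbitrary; the exact return format of these functions can differ across statsmodels versions):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey, acorr_ljungbox

    # Simulate y_t = 1 + 0.5*y_{t-1} + u_t and fit it by OLS (illustrative only)
    rng = np.random.default_rng(6)
    n = 300
    u = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 1.0 + 0.5 * y[t - 1] + u[t]

    X = sm.add_constant(y[:-1])            # regressors: constant and y_{t-1}
    results = sm.OLS(y[1:], X).fit()

    # Ljung-Box and Box-Pierce statistics on the residuals at lag p = 4
    print(acorr_ljungbox(results.resid, lags=[4], boxpierce=True))

    # Breusch-Godfrey: n*R^2 from the auxiliary regression, chi-square(p) under H0
    bg_stat, bg_pvalue, _, _ = acorr_breusch_godfrey(results, nlags=4)
    print(bg_stat, bg_pvalue)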