Time Series Class Notes
1 Preliminaries

1.1 Time Series Variables and Dynamic Models

For a time series variable $y_t$, the observations usually are indexed by a $t$ subscript instead of $i$. Unless stated otherwise, we assume that $y_t$ is observed at each period $t = 1, \ldots, n$, and that these periods are evenly spaced over time, e.g. years, months, or quarters. $y_t$ can be a flow variable (e.g. GDP, trading volume), a stock variable (e.g. capital stock), or a price or interest rate. For stock variables or prices, it can be important how they are defined, since they may vary within the period. A price at period $t$ might be defined as the price in the middle of the period, at the end of the period, the average price over the period, etc.

A model that describes how $y_t$ evolves over time is called a time series process, and a regression model that has terms from different time periods entering the same equation is a dynamic model. An example of a dynamic model is

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + \beta_3 x_{t-1} + u_t$$

Models with time series variables usually are dynamic models, but not necessarily. You might have

$$y_t = \alpha + \beta x_t + u_t$$

where $u_t$ is distributed independently of its past values. This is not a dynamic model, because there is nothing in it that links the different time periods.

1.2 Lag and First Difference Operators

The lag operator $L$ is defined as

$$L y_t = y_{t-1}$$

$L^a$ for some positive integer $a$ means lagging $y_t$ by $a$ periods. If $a = 2$ then

$$L^2 y_t = LLy_t = L(Ly_t) = Ly_{t-1} = y_{t-2}$$

Lag polynomials are notated as $A(L)$, $B(L)$, etc. For example, if $A(L) = 1 - \rho L$, then $A(L)y_t = y_t - \rho y_{t-1}$. Often a lag polynomial can be inverted. If $|\rho|$ is less than one, then $A(L)^{-1}$ expands like a geometric series,

$$A(L)^{-1} = (1 - \rho L)^{-1} = 1 + \rho L + \rho^2 L^2 + \rho^3 L^3 + \rho^4 L^4 + \ldots$$

This expansion is used to obtain equation (2) in section 2.1.1.

The first difference, or change in $y_t$ compared to the previous period, will be denoted by the first difference operator $D$, where

$$D y_t = y_t - y_{t-1}$$

(The symbol $\Delta$ is used later for describing the change in $y$ due to a change in $x$.) Note that $D^2 y_t$, the second difference of $y_t$, does not equal $y_t - y_{t-2}$:

$$D^2 y_t = DDy_t = D(Dy_t) = D(y_t - y_{t-1}) = Dy_t - Dy_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$$

1.3 White Noise

In dynamic models, the effects of the error terms are felt across time and across variables. To understand what the model is predicting, and to figure out how to estimate its coefficients and variances, models are commonly specified with error terms having no autocorrelation or heteroskedasticity. These are referred to as white noise errors or disturbances. Following Greene's textbook, we will represent a white noise variable as $u_t$. (In many other sources it is represented by $\varepsilon_t$.) So the term "white noise" is a short way to say that

$$E u_t = 0, \quad Var(u_t) = \sigma^2 \text{ for all } t, \quad \text{and} \quad E u_t u_s = 0 \text{ for all } t \neq s$$

1.4 Stationarity

Stationarity usually means covariance stationarity, where the expected value, variance, and autocovariances of $y_t$ do not depend on $t$; that is, they do not change over time and are not functions of $t$:

$$E y_t, \quad Var(y_t), \quad Cov(y_t, y_{t-s})$$

In a model, the phrase "$y_t$ is stationary" can mean "$y_t$ is assumed to follow a covariance stationary time series process." If $y_t$ is an observed time series, then "$y_t$ is stationary" is a (possibly over-confident) way of saying either "on the basis of some testing procedure, we cannot reject the null hypothesis that $y_t$ was generated by a covariance stationary process" or "we can reject the null hypothesis that $y_t$ was generated by a non-stationary process." White noise is stationary, but stationarity does not imply white noise: an autocorrelated process can be stationary but is not white noise.
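To make the operator algebra concrete, here is a minimal numpy sketch (not part of the original notes) that checks the second-difference identity $D^2 y_t = y_t - 2y_{t-1} + y_{t-2}$ on a simulated white noise series.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=200)          # white noise: mean zero, constant variance, no autocorrelation

d2_via_diff = np.diff(y, n=2)              # apply the difference operator D twice
d2_direct = y[2:] - 2 * y[1:-1] + y[:-2]   # y_t - 2*y_{t-1} + y_{t-2}

print(np.allclose(d2_via_diff, d2_direct))  # True: the two expressions coincide
```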
1.5 Examples of nonstationary processes

(i) An example of a nonstationary process is

$$y_t = \alpha + \beta t + u_t$$

where $u_t$ is white noise. The mean of $y_t$, $\alpha + \beta t$, is a function of $t$. This process is called trend stationary, because apart from the trend, the rest of the process is stationary. That is, once $\beta t$ is accounted for by moving it to the left-hand side, the rest is stationary: $y_t - \beta t = \alpha + u_t$.

(ii) Another example of a nonstationary process is a random walk,

$$y_t = y_{t-1} + u_t$$

where $u_t$ is white noise. Taking the variance of both sides,

$$Var(y_t) = Var(y_{t-1} + u_t) = Var(y_{t-1}) + \sigma^2$$

Unless $\sigma^2 = 0$, the variance of this process increases with $t$, hence must depend on $t$, and the process is not stationary.

2 Time Series Models

2.1 ARMA models

An important time series model in statistics is the autoregressive moving average (ARMA(p, q)) model

$$y_t = \alpha + \rho_1 y_{t-1} + \ldots + \rho_p y_{t-p} + u_t + \gamma_1 u_{t-1} + \ldots + \gamma_q u_{t-q}$$

where $u_t$ is white noise. Letting $A(L) = 1 - \rho_1 L - \ldots - \rho_p L^p$ and $B(L) = 1 + \gamma_1 L + \ldots + \gamma_q L^q$, the ARMA(p, q) model can be written compactly as

$$A(L) y_t = \alpha + B(L) u_t$$

The $A(L)y_t$ term is the autoregressive part and the $B(L)u_t$ term is the moving average part. An ARMA(p, 0) process is more simply called an AR(p) process,

$$A(L) y_t = \alpha + u_t$$

This is a regression model where $y_t$ is regressed on its own lags. Similarly, an ARMA(0, q) process is called an MA(q) process,

$$y_t = \alpha + B(L) u_t$$

which expresses $y_t$ as a weighted average of the current and past $q$ disturbances.

Some generalizations of ARMA models include (i) the autoregressive integrated moving average (ARIMA(p, d, q)) model

$$D^d y_t = \alpha + \rho_1 D^d y_{t-1} + \ldots + \rho_p D^d y_{t-p} + u_t + \gamma_1 u_{t-1} + \ldots + \gamma_q u_{t-q}$$

which is an ARMA model applied to the $d$-th difference of $y_t$, and (ii) the ARMAX model

$$y_t = \alpha + \rho_1 y_{t-1} + \ldots + \rho_p y_{t-p} + x_t'\beta + u_t + \gamma_1 u_{t-1} + \ldots + \gamma_q u_{t-q}$$

which augments the ARMA model with $k$ other regressor variables through a $k \times 1$ vector $x_t$.

2.1.1 The AR(1) model

The simplest autoregressive case is the AR(1) model

$$y_t = \gamma + \rho y_{t-1} + u_t \qquad (1)$$

or $(1 - \rho L) y_t = \gamma + u_t$. When $|\rho| < 1$, applying the geometric-series expansion of $(1 - \rho L)^{-1}$ from section 1.2 gives

$$y_t = (1 - \rho L)^{-1}(\gamma + u_t) = \frac{\gamma}{1 - \rho} + u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \ldots \qquad (2)$$

If $|\rho| > 1$, the magnitudes of the coefficients on the lag operators get larger as the lags go "further back". Roughly speaking, that would mean that what happened 100 periods ago would have a bigger effect on the present than what happened last period. This is an explosive time series process, which sounds kind of exciting, but is not considered very useful in time series econometrics. There is a great deal of interest in the borderline case where $|\rho| = 1$, specifically when $\rho = 1$ (not so much $\rho = -1$). This value is associated with unit root time series, which will be dealt with later.
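To see the random-walk variance result numerically, here is a small simulation sketch (the setup is illustrative, not from the notes): across many replications, the variance of a random walk at period $t$ grows roughly like $t\sigma^2$, while a stationary AR(1) with $|\rho| < 1$ settles down near $\sigma^2/(1-\rho^2)$.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n, rho = 5000, 200, 0.5

u = rng.normal(size=(reps, n))
rw = np.cumsum(u, axis=1)                 # random walk: y_t = y_{t-1} + u_t, starting at 0

ar1 = np.zeros((reps, n))
for t in range(1, n):                     # stationary AR(1): y_t = rho*y_{t-1} + u_t
    ar1[:, t] = rho * ar1[:, t - 1] + u[:, t]

for t in (50, 100, 200):
    print(t, rw[:, t - 1].var().round(1), ar1[:, t - 1].var().round(2))
# random-walk variance grows roughly like t; AR(1) variance levels off near 1/(1 - rho**2) = 1.33
```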
2.2 Dynamically complete models

Dynamically complete models are time series regression models with no autocorrelation in the errors. Let $x_t$ and $z_t$ be two time series explanatory variables; then such a model is

$$y_t = \alpha + \rho_1 y_{t-1} + \ldots + \rho_p y_{t-p} + \beta_0 x_t + \ldots + \beta_r x_{t-r} + \psi_0 z_t + \ldots + \psi_s z_{t-s} + u_t \qquad (3)$$

which can be written more compactly as

$$A(L) y_t = \alpha + B(L) x_t + C(L) z_t + u_t$$

where $u_t$ is white noise. There can be more right-hand side variables.

2.3 Models with autocorrelated errors vs. dynamically complete models

Some of the above models have an MA component in the error terms, and therefore have autocorrelated errors, while others have white noise errors. Suppose we have specified a model with white noise errors, but find evidence of autocorrelation in the residuals of the fitted model. (Tests for autocorrelation are discussed in section 4.2.2.) There are two main ways to adjust the model to deal with this. One is to model the autocorrelation in the errors, and the other is to include more lagged regressors until there no longer is evidence of such autocorrelation. This second approach (making the model dynamically complete (Wooldridge (2009, pp. 396-9))) has become the more popular one. In this section it is shown how these two approaches are related, and why the second approach has become more popular.

2.3.1 Autoregressive errors

Suppose the model is

$$y_t = \alpha + \beta x_t + \epsilon_t \qquad (4)$$

and autocorrelation is suspected in the error term $\epsilon_t$. Of the two ways of modeling autocorrelation that we have seen, AR and MA processes, AR is by far the more common. Next, we will argue why AR errors are more commonly used than MA errors. As usual, let $u_t$ represent white noise. If the model has mean-zero AR(p) errors, then

$$\epsilon_t = \rho_1 \epsilon_{t-1} + \ldots + \rho_p \epsilon_{t-p} + u_t$$

or equivalently $A(L)\epsilon_t = u_t$, where $A(L) = 1 - \rho_1 L - \ldots - \rho_p L^p$, or $\epsilon_t = A(L)^{-1} u_t$.

Let $p = 1$, giving AR(1) or first-order autoregressive errors. It has been the most commonly-employed version, and will simplify notation for our purposes. Then (4) becomes

$$y_t = \alpha + \beta x_t + (1 - \rho_1 L)^{-1} u_t \qquad (5)$$

This AR error process implies a gradual decrease in the error autocorrelations as the lag increases:

$$E \epsilon_t \epsilon_{t-1} = E(\rho_1 \epsilon_{t-1} + u_t)\epsilon_{t-1} = \rho_1 Var(\epsilon_t),$$

assuming $|\rho_1| < 1$ and therefore that the process is stationary. Continuing this approach would show that $E \epsilon_t \epsilon_{t-s} = \rho_1^s Var(\epsilon_t)$, and therefore

$$Corr(\epsilon_t, \epsilon_{t-s}) = \frac{Cov(\epsilon_t, \epsilon_{t-s})}{\sqrt{Var(\epsilon_t)Var(\epsilon_{t-s})}} = \frac{\rho_1^s Var(\epsilon_t)}{Var(\epsilon_t)} = \rho_1^s$$

If the residual autocorrelations die out gradually rather than suddenly, as would be implied by an MA(q) error process, then AR errors likely provide a better description of the autocorrelation pattern.

Estimation with AR errors

Both AR and MA errors can be estimated efficiently by Generalized Least Squares when there are no lagged $y$ regressors. The GLS estimator of the regression coefficients when there are AR(1) errors (e.g. GLS estimation of $\alpha$ and $\beta$ in (5)) is very similar to the following "quasi-differencing" procedure. First, estimate $\rho_1$. Many estimators have been proposed, a simple one being the sample correlation between the OLS residual $e_t$ and its first lag $e_{t-1}$. Then take the approximate model that results from substituting $\hat{\rho}_1$ for $\rho_1$ in (5) and multiply through by $(1 - \hat{\rho}_1 L)$:

$$y_t = \alpha + \beta x_t + (1 - \hat{\rho}_1 L)^{-1} u_t$$
$$(1 - \hat{\rho}_1 L)y_t = (1 - \hat{\rho}_1 L)\alpha + \beta(1 - \hat{\rho}_1 L)x_t + u_t$$
$$\tilde{y}_t = \alpha^* + \beta \tilde{x}_t + u_t$$

where $\tilde{y}_t = y_t - \hat{\rho}_1 y_{t-1}$ and $\tilde{x}_t = x_t - \hat{\rho}_1 x_{t-1}$ are quasi-differenced variables and $\alpha^* = (1 - \hat{\rho}_1)\alpha$. OLS estimation of $\alpha^*$ and $\beta$ in the last equation is very similar to the GLS estimator. It is known as the Cochrane-Orcutt estimator and has been used since the 1940s.
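The following numpy sketch walks through one pass of the quasi-differencing procedure just described (the simulated data and single non-iterated pass are assumptions made for illustration; applied versions of Cochrane-Orcutt typically iterate until $\hat{\rho}_1$ converges).

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, rho = 500, 1.0, 2.0, 0.6

# simulate y_t = alpha + beta*x_t + eps_t with AR(1) errors eps_t = rho*eps_{t-1} + u_t
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
y = alpha + beta * x + eps

# step 1: OLS, then estimate rho_1 as the correlation between residuals and their first lag
X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols
rho_hat = np.corrcoef(e[1:], e[:-1])[0, 1]

# step 2: quasi-difference both variables and re-run OLS (one Cochrane-Orcutt pass)
y_q = y[1:] - rho_hat * y[:-1]
x_q = x[1:] - rho_hat * x[:-1]
Xq = np.column_stack([np.ones(n - 1), x_q])
a_star, beta_hat = np.linalg.lstsq(Xq, y_q, rcond=None)[0]
print(rho_hat, beta_hat, a_star / (1 - rho_hat))  # estimates of rho_1, beta, alpha
```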
Disadvantages of AR errors

In current practice, models with AR errors largely have been replaced by models with more lagged explanatory variables and white noise errors. One disadvantage of AR errors is that the dynamic effects are difficult to interpret. When the dynamics are only in observed variables, a more direct description of "the effect of $x$ on $y$" over time is possible.

Another disadvantage of AR errors arises when there are lagged $y$ regressors. Consider the model

$$y_t = \alpha + \gamma y_{t-1} + \beta x_t + \epsilon_t, \quad \text{where } \epsilon_t = \rho \epsilon_{t-1} + u_t \qquad (6)$$

The model itself implies a correlation between $y_t$ and $\epsilon_t$, and therefore between $y_{t-1}$ and $\epsilon_{t-1}$. Unless $\rho = 0$, there must then be a correlation between the regressor $y_{t-1}$ and the error term $\epsilon_t$, since $y_{t-1}$ depends on $\epsilon_{t-1}$, which in turn feeds into $\epsilon_t$; that is, $y_{t-1}$ is an endogenous regressor. Instrumental variable estimators have been proposed. Unlike many other endogenous-regressor problems, however, there is another way out, which is described next.

From AR errors to dynamically complete models

The once-popular time series model with AR errors has been largely replaced by dynamically complete models, which have more lags and no autocorrelation in the errors. The reasons for this will be illustrated here by converting from one to the other and pointing out the advantages of the latter. First, rewrite model (6) with the autocorrelation removed by multiplying through by $(1 - \rho L)$:

$$(1 - \rho L)y_t = (1 - \rho L)\alpha + \gamma(1 - \rho L)y_{t-1} + \beta(1 - \rho L)x_t + (1 - \rho L)\epsilon_t$$
$$y_t = \alpha(1 - \rho) + \rho y_{t-1} + \gamma y_{t-1} - \gamma\rho y_{t-2} + \beta x_t - \beta\rho x_{t-1} + u_t \qquad (7)$$

Model (7) has new lagged regressors which replace the dynamics that were in the error term of (6). Simplifying the notation for the regression coefficients, (7) can be written as

$$y_t = \alpha^* + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \theta_3 x_t + \theta_4 x_{t-1} + u_t \qquad (8)$$

Eliminating the autocorrelation in the errors has solved the endogenous regressor problem, although comparing (7) to (8) reveals a restriction, which can be written as $\theta_1\theta_3\theta_4 = -\theta_4^2 + \theta_2\theta_3^2$. This awkward-looking nonlinear restriction arises from having multiplied through the original model by the "common factor" $(1 - \rho L)$, and is referred to as a common factor restriction. In current practice it is typically not imposed. Researchers' starting point is a model more like (8) than (6). Having estimated (8), it is not natural to ask whether $\theta_1\theta_3\theta_4 = -\theta_4^2 + \theta_2\theta_3^2$. It is more natural to ask whether $\theta_2 = 0$ or $\theta_4 = 0$, or whether further lags of $y$ and/or $x$ should be included as regressors, or whether the error term really is white noise. The goal is to specify a dynamically complete model without including lags unnecessarily.

2.4 Means and Variances

2.4.1 Conditional and unconditional means

If in Section 2.1.1 we assume that $Eu_t = 0$ for all $t$ in model (1), then (2) makes it easy to see that $Ey_t = \gamma/(1-\rho)$. This may seem to contradict (1), which says that $Ey_t = \gamma + \rho y_{t-1}$. This apparent contradiction arises because the regression model (1), on the one hand, shows the expected value of $y_t$ conditional on $y_{t-1}$. On the other hand, (2) does not have any regressor variables and has a zero-mean error term, so its constant term represents the unconditional mean of $y_t$. Summarizing with more careful expectation notation,

$$E y_t = \frac{\gamma}{1-\rho} \quad \text{(unconditional mean)}$$
$$E(y_t \mid y_{t-1}) = \gamma + \rho y_{t-1} \quad \text{(conditional mean)}$$

The unconditional mean does not depend on $t$. This is one of the necessary conditions for a process to be stationary. The unconditional mean of $y_t$ can also be derived by using the assumed stationarity of $y_t$ to justify the restriction $Ey_t = Ey_{t-1} = y^*$, say. Take the expectation of both sides of the above conditional mean expression with respect to both $y_t$ and $y_{t-1}$ and then solve for $y^*$:

$$E(y_t \mid y_{t-1}) = \gamma + \rho y_{t-1}$$
$$E(y_t) = \gamma + \rho y^*$$
$$y^* = \gamma + \rho y^* \quad \Rightarrow \quad y^* = \frac{\gamma}{1-\rho}$$
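A quick simulation check of the unconditional mean formula (an illustrative sketch, not from the notes): for an AR(1) with $\gamma = 2$ and $\rho = 0.8$, the sample mean of a long simulated series should settle near $\gamma/(1-\rho) = 10$.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, rho, n = 2.0, 0.8, 100_000

y = np.zeros(n)
y[0] = gamma / (1 - rho)          # start at the unconditional mean to mimic "no starting conditions"
for t in range(1, n):
    y[t] = gamma + rho * y[t - 1] + rng.normal()

print(y.mean(), gamma / (1 - rho))  # both close to 10
```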
2.4.2 Deriving unconditional moments of a stationary AR(1)

Consider $y_t$ where

$$y_t = \delta + \lambda y_{t-1} + u_t$$

and $u_t$ is white noise. Since it is not specified otherwise, and usually it is not, we assume there are no starting conditions. (An example of a starting condition is $y_0 = 0$.) When there are no starting conditions, we are assuming that we observe a segment of a process that started long enough ago to have settled into its steady-state behaviour, in which the variable fluctuates around a constant long-run unconditional mean. The above process can be written as

$$(1 - \lambda L)y_t = \delta + u_t$$
$$y_t = (1 - \lambda L)^{-1}\delta + (1 - \lambda L)^{-1}u_t = \frac{\delta}{1-\lambda} + \sum_{j=0}^{\infty} \lambda^j u_{t-j}$$

From the last expression we see that $Ey_t = \delta/(1-\lambda)$. The white noise assumption leads to the variance, covariance and correlation results

$$Var(y_t) = E\Big(\sum_{j=0}^{\infty} \lambda^j u_{t-j}\Big)^2 = \sum_{j=0}^{\infty} \lambda^{2j} Eu_{t-j}^2 = \Big(\sum_{j=0}^{\infty} \lambda^{2j}\Big)\sigma^2 = \frac{\sigma^2}{1-\lambda^2}$$

$$Cov(y_t, y_{t-s}) = E\Big(\sum_{j=0}^{\infty} \lambda^j u_{t-j}\Big)\Big(\sum_{j=0}^{\infty} \lambda^j u_{t-s-j}\Big) = \lambda^s \sum_{j=0}^{\infty} \lambda^{2j}\sigma^2 = \frac{\lambda^s \sigma^2}{1-\lambda^2}$$

$$Corr(y_t, y_{t-s}) = \frac{Cov(y_t, y_{t-s})}{\sqrt{Var(y_t)Var(y_{t-s})}} = \lambda^s$$

The covariances are not zero, so $y_t$ is not white noise. But because the mean, variance, and covariances do not change over time, $y_t$ is stationary.

2.4.3 Long-run expected values in dynamic regression models

The method of section 2.4.2 can be generalized to obtain the unconditional (on past $y$ values) mean of $y_t$, conditional on other explanatory variables, when the latter are assigned values that are fixed over time. Consider

$$y_t = \alpha + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \ldots + \rho_p y_{t-p} + \gamma_0 x_t + \gamma_1 x_{t-1} + \ldots + \gamma_q x_{t-q} + u_t$$

or $A(L)y_t = \alpha + B(L)x_t + u_t$, where $A(L) = 1 - \rho_1 L - \ldots - \rho_p L^p$ and $B(L) = \gamma_0 + \gamma_1 L + \ldots + \gamma_q L^q$.

The long-run or steady-state value of $y$ when $x_t = x^*$ at all periods is found by letting the solution be $y^*$, say, then substituting these long-run values into the regression model, setting the disturbance term $u_t$ to zero, and solving for $y^*$, as follows:

$$y^* = \alpha + \rho_1 y^* + \rho_2 y^* + \ldots + \rho_p y^* + \gamma_0 x^* + \gamma_1 x^* + \ldots + \gamma_q x^*$$
$$y^*(1 - \rho_1 - \rho_2 - \ldots - \rho_p) = \alpha + (\gamma_0 + \gamma_1 + \ldots + \gamma_q)x^*$$
$$y^* = \frac{\alpha}{1 - \rho_1 - \ldots - \rho_p} + \Big(\frac{\gamma_0 + \gamma_1 + \ldots + \gamma_q}{1 - \rho_1 - \ldots - \rho_p}\Big)x^*$$

A notational trick with the lag polynomials makes this manipulation easier to write. When a lag polynomial operates on something that is constant over time, the $L$ operator does not change its value, so applying $L$ is equivalent to multiplying by one. In that case we can replace $A(L)$, for example, by $A(1) = 1 - \rho_1 - \ldots - \rho_p$, since $A(L)y^* = (1 - \rho_1 - \rho_2 - \ldots - \rho_p)y^* = A(1)y^*$. With this notation, the above derivation can be written as

$$A(1)y^* = \alpha + B(1)x^*$$
$$y^* = \frac{\alpha}{A(1)} + \frac{B(1)}{A(1)}x^*$$

This approach generalizes easily to several "x" variables. A regression model with $k$ regressor variables and their lags can be written compactly as

$$A(L)y_t = \alpha + B_1(L)x_{1t} + B_2(L)x_{2t} + \ldots + B_k(L)x_{kt} + u_t$$

When each regressor variable is fixed at a constant value over time, $x_{jt} = x_j^*$, $j = 1, \ldots, k$, then the long-run or steady-state mean of $y$ is

$$y^* = \frac{\alpha}{A(1)} + \frac{B_1(1)}{A(1)}x_1^* + \ldots + \frac{B_k(1)}{A(1)}x_k^*$$

We require $A(1) > 0$, that is, $\sum_{j=1}^{p}\rho_j < 1$, which is similar to the requirement $\rho < 1$ seen earlier in the single-lag model.

3 Interpreting the Coefficients of Dynamic Models

Here again is the dynamic model (3) from Section 2.2:

$$y_t = \alpha + \rho_1 y_{t-1} + \ldots + \rho_p y_{t-p} + \beta_0 x_t + \ldots + \beta_r x_{t-r} + \psi_0 z_t + \ldots + \psi_s z_{t-s} + u_t$$

The coefficients are not very informative in isolation. For example, the coefficient $\beta_2$ on the regressor $x_{t-2}$ indicates the change in $y_t$ resulting from a one-unit change in $x_{t-2}$, holding the other regressors constant. But the model itself implies that a one-unit change in $x_{t-2}$ would have changed $y_{t-2}$ and $y_{t-1}$ (unless $\beta_0 = \beta_1 = 0$), and those are themselves regressors. The holding-all-else-constant thought experiment is not very appealing for dynamic models.

A better way to interpret these coefficients is to use them to trace out the effect that a change in $x$ would have on $y_t$, $y_{t+1}$, $y_{t+2}$, etc. This can be done recursively as follows. Let $\Delta y_t$ represent the change in $y_t$ resulting from a change in $x$ at time $t$ only, denoted $\Delta x_t$ ($\Delta$ as distinct from the first difference operator $D$ introduced earlier). The effects of $\Delta x_t$ on $y_{t+1}$, $y_{t+2}$, etc. are derived recursively from the original model by adjusting the $t$ subscripts to $t+1$, $t+2$, etc. as required:

$$\Delta y_t = \beta_0 \Delta x_t$$
$$\Delta y_{t+1} = \rho_1 \Delta y_t + \beta_1 \Delta x_t = \rho_1(\beta_0 \Delta x_t) + \beta_1 \Delta x_t = (\rho_1\beta_0 + \beta_1)\Delta x_t$$
$$\Delta y_{t+2} = \rho_1 \Delta y_{t+1} + \rho_2 \Delta y_t + \beta_2 \Delta x_t = \rho_1\big((\rho_1\beta_0 + \beta_1)\Delta x_t\big) + \rho_2(\beta_0 \Delta x_t) + \beta_2 \Delta x_t = (\rho_1^2\beta_0 + \rho_1\beta_1 + \rho_2\beta_0 + \beta_2)\Delta x_t$$

and so on. When the process is stationary, the long-run effect of this one-time-only change in $x$ settles to zero, that is, $\lim_{s \to \infty} \Delta y_{t+s}/\Delta x_t = 0$.
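The recursion just described is easy to code. The following numpy sketch (illustrative; the coefficient values are taken from the numerical example below) computes the one-time-only effects $\Delta y_{t+s}/\Delta x_t$ for a model with two $y$ lags and two $x$ lags.

```python
import numpy as np

def one_time_irf(rho, beta, horizons=10):
    """Effect on y_t, y_{t+1}, ... of a one-unit change in x at time t only."""
    rho, beta = np.asarray(rho, float), np.asarray(beta, float)
    dy = np.zeros(horizons)
    for s in range(horizons):
        # autoregressive part: rho_1*dy_{s-1} + rho_2*dy_{s-2} + ...
        ar_part = sum(rho[i] * dy[s - 1 - i] for i in range(len(rho)) if s - 1 - i >= 0)
        # distributed-lag part: beta_s enters only while s is within the x lag length
        dl_part = beta[s] if s < len(beta) else 0.0
        dy[s] = ar_part + dl_part
    return dy

# rho_1, rho_2 = .3, .1 and beta_0, beta_1, beta_2 = 4.2, 1.7, 0
print(one_time_irf([0.3, 0.1], [4.2, 1.7, 0.0], horizons=6))
# [4.2, 2.96, 1.308, 0.6884, 0.33732, ...]
```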
Another way to think about the effect of $x$ on $y$ in this model is to trace out the effect of a "permanent" change in $x$ beginning at time $t$ that lasts into the future. That is, let $\Delta x_t = \Delta x_{t+1} = \Delta x_{t+2} = \ldots = \Delta x$, say. In general, this has a non-zero long-run effect on $y$. Applied to the above model, these effects are

$$\Delta y_t = \beta_0 \Delta x$$
$$\Delta y_{t+1} = \rho_1 \Delta y_t + \beta_0 \Delta x + \beta_1 \Delta x = \rho_1(\beta_0 \Delta x) + (\beta_0 + \beta_1)\Delta x = (\rho_1\beta_0 + \beta_1 + \beta_0)\Delta x$$
$$\Delta y_{t+2} = \rho_1 \Delta y_{t+1} + \rho_2 \Delta y_t + \beta_0 \Delta x + \beta_1 \Delta x + \beta_2 \Delta x = \rho_1\big((\rho_1\beta_0 + \beta_1 + \beta_0)\Delta x\big) + \rho_2(\beta_0 \Delta x) + \beta_0 \Delta x + \beta_1 \Delta x + \beta_2 \Delta x$$
$$= (\rho_1^2\beta_0 + \rho_1\beta_1 + \rho_1\beta_0 + \rho_2\beta_0 + \beta_2 + \beta_1 + \beta_0)\Delta x$$
$$\vdots$$
$$\Delta y_{t+\infty} = \frac{B(1)}{A(1)}\Delta x = \frac{\beta_0 + \ldots + \beta_r}{1 - \rho_1 - \ldots - \rho_p}\Delta x$$

Numerical applications are not as notationally cumbersome. Suppose a fitted dynamic model is

$$y_t = 5 + .3y_{t-1} + .1y_{t-2} + 4.2x_t + 1.7x_{t-1} + 0x_{t-2} + e_t$$

The effects on $y$ of a one-time-only change in $x_t$ are

$$\Delta y_t = 4.2\Delta x_t$$
$$\Delta y_{t+1} = .3\Delta y_t + 1.7\Delta x_t = .3(4.2\Delta x_t) + 1.7\Delta x_t = 2.96\Delta x_t$$
$$\Delta y_{t+2} = .3\Delta y_{t+1} + .1\Delta y_t + 0\Delta x_t = .3(2.96\Delta x_t) + .1(4.2\Delta x_t) + 0 = 1.308\Delta x_t$$
$$\Delta y_{t+3} = .3\Delta y_{t+2} + .1\Delta y_{t+1} = .3(1.308\Delta x_t) + .1(2.96\Delta x_t) = .6884\Delta x_t$$
$$\Delta y_{t+4} = (.3 \times .6884 + .1 \times 1.308)\Delta x_t = .33732\Delta x_t$$
$$\vdots$$
$$\Delta y_{t+\infty} = 0 \times \Delta x_t$$

and the effects on $y$ of a permanent change in $x$ are

$$\Delta y_t = 4.2\Delta x$$
$$\Delta y_{t+1} = .3\Delta y_t + 4.2\Delta x + 1.7\Delta x = .3(4.2\Delta x) + 4.2\Delta x + 1.7\Delta x = 7.16\Delta x$$
$$\Delta y_{t+2} = .3\Delta y_{t+1} + .1\Delta y_t + 4.2\Delta x + 1.7\Delta x + 0\Delta x = .3(7.16\Delta x) + .1(4.2\Delta x) + 4.2\Delta x + 1.7\Delta x + 0\Delta x = 8.468\Delta x$$
$$\vdots$$
$$\Delta y_{t+\infty} = \frac{4.2 + 1.7 + 0}{1 - .3 - .1}\Delta x \approx 9.83\Delta x$$

These permanent-change effects are cumulative sums of the previously derived one-time-change effects. This is reflected in the Stata terminology of simple and cumulative IRFs. (IRFs are impulse response functions, which express these effects as a function of the time elapsed after the change in $x$. They usually are associated with VARs and VECMs, which are discussed in the next set of notes.)

4 Estimation and Testing

Models like (3) and (8) are estimated by OLS. There is no autocorrelation in $u_t$, so the lagged $y$ regressors are not correlated with $u_t$. As long as $x_t$ is exogenous and there is no autocorrelation in the errors, the regressor values at time $t$ and earlier are not correlated with $u_t$.

4.1 OLS: biased, but consistent

There is one other detail to consider, though, that does not arise in cross-section data. Let $z_t = [1 \; y_{t-1} \; y_{t-2} \; x_t \; x_{t-1}]'$ be the $5 \times 1$ vector of regressor values at time $t$, and let $\theta$ be the coefficient vector. Then (8) can be written as

$$y_t = z_t'\theta + u_t$$

and the OLS estimator is

$$\hat{\theta} = \Big(\sum_t z_t z_t'\Big)^{-1}\sum_t z_t y_t$$

The usual decomposition of $\hat{\theta}$ is

$$\hat{\theta} = \Big(\sum_t z_t z_t'\Big)^{-1}\sum_t z_t(z_t'\theta + u_t) = \theta + \Big(\sum_t z_t z_t'\Big)^{-1}\sum_t z_t u_t$$

Even though we assume that the elements of $z_t$ are not correlated with $u_t$, there will be non-zero correlations between $u_t$ and some elements of $z_{t+s}$, $s > 0$, because the lagged $y$ regressors appearing in $z_{t+s}$ are correlated with $u_t$. This causes a correlation between $\sum_t z_t u_t$ and some elements of $(\sum_t z_t z_t')^{-1}$, so that

$$E\Big[\Big(\sum_t z_t z_t'\Big)^{-1}\sum_t z_t u_t\Big] \neq 0$$

which implies that $\hat{\theta}$ is biased. Despite this bias, the following argument shows that $\text{plim}\,\hat{\theta} = \theta$, implying that this bias shrinks to zero as $n \to \infty$. Since $Ez_t u_t = 0$, then $Ez_t(y_t - z_t'\theta) = 0$. Solving for $\theta$ gives $\theta = (Ez_t z_t')^{-1}Ez_t y_t$. Now consider the plim of $\hat{\theta}$:

$$\text{plim}\,\hat{\theta} = \text{plim}\Big[\Big(n^{-1}\sum_t z_t z_t'\Big)^{-1}\Big(n^{-1}\sum_t z_t y_t\Big)\Big] = \Big(\text{plim}\,n^{-1}\sum_t z_t z_t'\Big)^{-1}\Big(\text{plim}\,n^{-1}\sum_t z_t y_t\Big) = (Ez_t z_t')^{-1}(Ez_t y_t) = \theta$$

where the last step follows from applying the LLN separately to each of the two sample means. Therefore $\hat{\theta}$ is a consistent estimator of $\theta$. The fact that $\hat{\theta}$ is biased in finite samples tends to be ignored in practice.
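A small Monte Carlo sketch of "biased, but consistent" (the zero-mean AR(1) setup and sample sizes are assumptions made for illustration): for $\rho = 0.5$, the average OLS estimate of $\rho$ sits below 0.5 in small samples, and the gap shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.5

def ols_rho(n):
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + rng.normal()
    ylag = y[:-1]
    return np.sum(ylag * y[1:]) / np.sum(ylag ** 2)   # OLS slope of y_t on y_{t-1}

for n in (25, 100, 1000):
    mean_est = np.mean([ols_rho(n) for _ in range(2000)])
    print(n, round(mean_est, 3))   # average estimate moves toward 0.5 as n grows
```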
4.2 Specification Tests

The two main specification issues are: how many lags to apply to each of the regressor variables, and whether or not the errors are autocorrelated.

4.2.1 Choosing the number of lags

In practice, there may be many more explanatory variables than the one "$x$" that appears in (8). Which lags should be included for each of these variables? One way to streamline this decision-making, given some maximum lag $p$, is to require that all of the lower-order lags be included. That is, if $p = 3$, include all of the variables $x_t$, $x_{t-1}$, $x_{t-2}$ and $x_{t-3}$. For example, suppose the coefficient on $x_{t-1}$ is not significant and you remove this regressor. Even though its removal is "supported" by the test, the restriction itself usually has little intuitive appeal: it restricts the pattern of the dynamic effects of $x$ on $y$ in a hard-to-describe and possibly nonintuitive way. One exception is when the data are quarterly or monthly and there is some seasonal pattern (e.g. a spike in consumption in December). Then you might specify one or two lags, followed by a gap, and another longer lag (e.g. lag 4 for quarterly data or lag 12 for monthly data) to capture the stochastic part of this seasonal pattern.

A way to further simplify this decision is to require that the same number of lags be used for each variable. The following procedures can easily be adapted to this restriction.

Testing up and testing down

One way to select the maximum lag is to test whether the coefficient on the maximum lag of a given model equals zero. If the test accepts, reduce the lag by one and repeat. If it rejects, stop. Starting with a lot of lags and working down in this way is called testing down. Another approach is to start with a small number of lags. Add one lag, and if the new lag is not significant, then stop. If it is significant, then keep it, add another one, test it, etc. This is called testing up. Both testing down and testing up are prone to stopping at a bad number of lags due to unavoidable type I and type II errors. They are sensitive to two arbitrary choices: the number of lags you start at and the significance level of the test. Testing down seems to be the more popular of the two.

Model selection criteria

Another approach to lag length selection is to estimate the model at many different lag lengths and compute a model selection criterion for each. Then pick the lag length that gave the best value of the criterion. These criteria are functions of the sum of squared residuals and the number of regressors, handling the tradeoff between model parsimony and goodness of fit. Of the many criteria that have been suggested, the most commonly used one for lag selection is Akaike's information criterion (AIC):

$$AIC = n\ln(RSS) + 2k$$

where $n$ is the sample size, $RSS$ is the sum of squared residuals, and $k$ is the number of unrestricted regression coefficients. A smaller AIC is better.
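A sketch of AIC-based lag selection for an AR(p) specification (the simulated AR(2) series and the OLS fitting are assumptions made for illustration; the criterion is $n\ln(RSS) + 2k$ as above, computed on the same estimation sample for every candidate lag so the values are comparable).

```python
import numpy as np

def aic_for_ar(y, p, max_p):
    """Fit an AR(p) by OLS on the sample common to all candidate lags; return the AIC."""
    n = len(y)
    Y = y[max_p:]                                          # common estimation sample
    X = np.column_stack([np.ones(n - max_p)] +
                        [y[max_p - j:n - j] for j in range(1, p + 1)])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - X @ b) ** 2)
    return len(Y) * np.log(rss) + 2 * X.shape[1]

rng = np.random.default_rng(4)
y = np.zeros(300)
for t in range(2, 300):                                    # simulate an AR(2) with white noise errors
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

max_p = 6
aics = {p: aic_for_ar(y, p, max_p) for p in range(1, max_p + 1)}
print(min(aics, key=aics.get), aics)                       # the lag length with the smallest AIC wins
```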
4.2.2 Testing for autocorrelation

The traditional test of the null hypothesis of no autocorrelation in the errors is the Durbin-Watson test. The original test is not applicable when there is a lagged $y$ regressor, but Durbin's $h$ statistic was developed for that situation. These tests are still used, but have been largely supplanted by the Box-Pierce, Ljung-Box, and Breusch-Godfrey tests.

The Box-Pierce test statistic is

$$Q_{BP} = n\sum_{j=1}^{p} r_j^2$$

where $r_j$ is the estimated $j$-th order autocorrelation in the OLS residuals. The Ljung-Box test statistic adjusts $Q_{BP}$ to more closely follow its asymptotic null distribution in finite samples:

$$Q_{LB} = n(n+2)\sum_{j=1}^{p} \frac{r_j^2}{n-j}$$

The Breusch-Godfrey test statistic is $nR^2$, where $R^2$ is from the regression

$$e_t = z_t'\theta + \rho_1 e_{t-1} + \rho_2 e_{t-2} + \ldots + \rho_p e_{t-p} + \text{error}$$

where the errors in question are the $u_t$ from $y_t = z_t'\theta + u_t$ and the $e_t$ are the OLS residuals from that regression.

For each of these three tests, $n$ is the number of observations used in the regression (which depends on both the number of observations in the original data set and the number of lags). The null hypothesis is $H_0: \rho_1 = \ldots = \rho_p = 0$, where $\rho_j$ is the $j$-th order autocorrelation of the disturbance process. The asymptotic null distribution of all three test statistics is chi-square with $p$ degrees of freedom, where $p$ is chosen by the researcher. The rejection region is in the right tail. These tests' accept/reject rules are simpler than the Durbin-Watson tests', they allow for more general lagged dependent variable specifications, and they enable testing for autocorrelations beyond the first order.
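A numpy/scipy sketch (the helper functions and the white-noise stand-in residuals are assumptions made for illustration) computing the Box-Pierce and Ljung-Box statistics from a residual series and comparing them with the right-tail chi-square critical value.

```python
import numpy as np
from scipy import stats

def residual_autocorr(e, j):
    """Sample j-th order autocorrelation of the residual series e."""
    e = e - e.mean()
    return np.sum(e[j:] * e[:-j]) / np.sum(e ** 2)

def box_pierce_ljung_box(e, p):
    n = len(e)
    r = np.array([residual_autocorr(e, j) for j in range(1, p + 1)])
    q_bp = n * np.sum(r ** 2)
    q_lb = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, p + 1)))
    return q_bp, q_lb

rng = np.random.default_rng(5)
e = rng.normal(size=300)           # stand-in for OLS residuals; white noise, so no rejection expected
p = 4
q_bp, q_lb = box_pierce_ljung_box(e, p)
crit = stats.chi2.ppf(0.95, df=p)  # 5% right-tail critical value, p degrees of freedom
print(q_bp, q_lb, crit)            # reject H0 of no autocorrelation if a statistic exceeds crit
```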
