09 - Regression With Time Series Data
Owing to the randomness of the process generating the observations of y, analogues of the OLS
properties that depend on random sampling still hold, under conditions developed below.
The econometrician’s job is to accurately model the stochastic process, both for the purpose of
inference as well as prediction.
◦ Prediction is an application for time series model estimates because knowing the process generating
new observations of y naturally enables you to estimate a future (“out of sample”) value.
Stationary and weakly dependent time series
Many time series processes can be viewed either as
◦ regressions on lagged (past) values with additive disturbances or
◦ as aggregations of a history of innovations.
In order to show this, we have to write down a model and make some assumptions about how
present values of y ($y_t$) are related to past values (e.g., $y_{t-1}$) and about the variance and
covariance structure of the disturbances.
For the sake of clarity, consider a univariate series that does not depend on values of other
variables—only on lagged values of itself.
Stationary and weakly dependent time series (continued)
$y_t = \rho_1 y_{t-1} + e_t; \quad E[e_t] = 0,\; E[e_t^2] = \sigma_e^2,\; E[e_t e_s] = 0 \text{ for } s \neq t$,
is a simple example of such a model.
◦ Specifically this is an autoregressive process of order 1—more commonly called “AR(1)” for brevity—
because y depends on exactly 1 lag of itself.
◦ In this instance we have also assumed that the disturbances have constant (zero) mean and variance
and are not correlated across time periods.
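To make the process concrete, here is a minimal Stata sketch that simulates such a series; the value $\rho_1 = 0.5$, the sample size, and the seed are assumptions chosen for illustration, not values from the text.

* Minimal sketch: simulate 500 draws of an AR(1) with assumed rho1 = 0.5
clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen e = rnormal(0, 1)             // disturbances: mean 0, constant variance
gen y = e in 1                    // initialize the series with the first shock
replace y = 0.5*L.y + e in 2/L    // y_t = rho1*y_{t-1} + e_t, built row by row

Because replace works through observations in order, L.y picks up the value just generated for the previous period.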
In order to make use of a series in regression analysis, it needs to have an expected value,
variance, and auto-covariance (covariance with lagged values of itself), and not all series
have these (at least not finite ones).
A series will have these properties if it is stationary.
Stationarity
The property of stationarity implies:
1. $E[y_t]$ is independent of $t$,
2. $Var(y_t)$ is a finite positive constant, independent of $t$,
3. $Cov(y_t, y_{t-s})$ is a finite function of $t - s$, but not of $t$ or $s$,
4. the distribution of $y_t$ is not changing over time.
For our purposes the 4th condition is unnecessary; a process that satisfies the first 3 is called
weakly stationary or covariance stationary.
Stationarity of AR(1) process
The AR(1) process, 𝑦𝑦𝑡𝑡 , is covariance stationary under specific conditions.
$E[y_t] = \rho_1 E[y_{t-1}] + E[e_t]$; imposing condition (1), $E[y_t] = E[y_{t-1}]$, so $E[y_t] = \rho_1 E[y_t] \Leftrightarrow E[y_t] = 0$ (provided $\rho_1 \neq 1$).
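The matching variance calculation, which the slides leave implicit, pins down the "specific conditions." Because $y_{t-1}$ is built only from disturbances dated $t-1$ and earlier, it is uncorrelated with $e_t$, so

$Var(y_t) = \rho_1^2\, Var(y_{t-1}) + \sigma_e^2 \;\Rightarrow\; \sigma_y^2 = \rho_1^2 \sigma_y^2 + \sigma_e^2 \;\Leftrightarrow\; \sigma_y^2 = \frac{\sigma_e^2}{1 - \rho_1^2}$,

which is finite and positive only when $|\rho_1| < 1$: the stationarity condition for the AR(1).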
Before moving on, let's summarize a couple more things about the iterative substitution of the
AR(1) $y_t$ process.
Autocorrelation concluded
The current period's value can be expressed neatly as an infinitely long summation of the past
disturbances ("history"):

$y_t = \sum_{s=0}^{\infty} \rho_1^s e_{t-s}$.
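The substitution steps behind this representation, filled in here since the slides skip them (valid when $|\rho_1| < 1$):

\begin{align*}
y_t &= \rho_1 y_{t-1} + e_t \\
    &= \rho_1 (\rho_1 y_{t-2} + e_{t-1}) + e_t = \rho_1^2 y_{t-2} + \rho_1 e_{t-1} + e_t \\
    &= \dots = \rho_1^k y_{t-k} + \sum_{s=0}^{k-1} \rho_1^s e_{t-s} \;\longrightarrow\; \sum_{s=0}^{\infty} \rho_1^s e_{t-s},
\end{align*}

since $\rho_1^k y_{t-k} \to 0$ as $k \to \infty$ when $|\rho_1| < 1$.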
But bringing the discussion of time series data back to familiar realms, consider a simple
example in which the dependent variable is a function of contemporaneous and past values of
the explanatory variable.
Models that exhibit this trait are called “finite distributed lag” (FDL) models.
Finite distributed lag models
This type is further differentiated by its order, i.e., how many lags are relevant for predicting y.
An FDL of order q is written:
$y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + \dots + \delta_q z_{t-q} + u_t$, or compactly as,

$y_t = \alpha_0 + \sum_{j=0}^{q} \delta_j z_{t-j} + u_t$.
For example, demand for gasoline is quite inelastic in the short run but much more elastic in the
long run,
◦ because consumers can change their vehicles, commuting habits, and locations if given enough time.
These effects could be estimated separately using an FDL model such as:
$fuel_t = \alpha_0 + \delta_0\, price_t + \delta_1\, price_{t-1} + \dots + \delta_q\, price_{t-q} + u_t$.
Finite distributed lag models (concluded)
The familiar prediction from microeconomics is that the static effect—sometimes called the
impact multiplier—is quite small or zero while the overall effect—the equilibrium multiplier—of
a price increase is large and negative.
Formally these could be stated:

impact multiplier $= \delta_0$; equilibrium (long-run) multiplier $= \sum_{j=0}^{q} \delta_j$.
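A sketch of how both multipliers could be estimated in Stata, assuming a tsset dataset with variables named fuel and price and a lag length of q = 4; all of these names and values are illustrative, not from the text.

* Hypothetical FDL estimation with q = 4 lags (requires tsset data)
reg fuel L(0/4).price
display _b[price]                         // impact multiplier: delta_0
lincom price + L.price + L2.price + L3.price + L4.price
                                          // equilibrium multiplier: sum of the deltas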
To model a time trend in y that increases it by a constant amount ($\alpha_1$) each period:

$y_t = \alpha_0 + \alpha_1 t + e_t \;\Rightarrow\; \Delta y_t = (\alpha_0 + \alpha_1 t + e_t) - (\alpha_0 + \alpha_1 (t-1) + e_{t-1}) = \alpha_1 + e_t - e_{t-1}$,
A researcher who ignores the trend estimates the model using only the x variables, thus generating biased estimates of their coefficients.
A detrending interpretation of regressions with a time trend
Were the researcher to “de-trend” all of the variables prior to performing the above regression,
however, the estimates would not be biased.
By "de-trending" what is meant is to regress each of the variables on time, subtract the fitted
values, and store the residuals.
◦ The following example will illustrate this.
Suppose one regressed a time series of employment observations for a local area (San Francisco,
CA) on a time series of the minimum wage.
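In Stata this naive regression would be run as below, borrowing the variable names that appear in the detrending code later in this section (l_emp for log employment, minimum_wage for the wage floor); the output shown on the original slide is not reproduced.

* Hypothetical: the naive regression, omitting the trend
reg l_emp minimum_wage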
Detrending (continued)
The results (regression output not reproduced here) show a positive coefficient on the minimum wage.
The positive coefficient estimate seems to contradict the usual theoretical prediction that a
wage floor decreases equilibrium employment.
However, both variables trend upward over time, as the following regressions demonstrate.
Detrending (continued)
[Output omitted: regressions of each variable on time, both showing upward trends.]
Controlling for the time trend yields support for the more familiar theoretical prediction: that
wage floors decrease equilibrium employment.
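A sketch of that trend-controlled regression in Stata, assuming a linear time index named time as in the detrending code below:

* Hypothetical: control for the time trend directly
reg l_emp minimum_wage time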
Detrending (concluded)
The following Stata code would enable you to obtain the same results using de-trended variables.
reg l_emp time                    // regress log employment on the time index
predict l_emp_detr, residuals     // store the de-trended employment series
reg minimum_wage time             // regress the minimum wage on the time index
predict mw_detr, residuals        // store the de-trended minimum wage series
Then you could regress the residuals on one another (without the time trend) to obtain the
estimates without the bias of the omitted time trend, as sketched below.
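A minimal sketch of that final step, which the slide does not show:

reg l_emp_detr mw_detr            // slope matches the trend-controlled estimate

By the Frisch-Waugh-Lovell theorem, the slope from this residual-on-residual regression equals the minimum-wage coefficient from the regression that includes the time trend.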
Seasonality
The same statements about time trends can be made about seasonal (calendar period-specific)
effects:
◦ variables that move in predictable ways through the calendar year.
◦ These can be accounted for using seasonal indicator variables, e.g.,
$month_t = \begin{cases} 1, & \text{month of observation is } March, \\ 0, & \text{otherwise.} \end{cases}$
Including a set of monthly (quarterly) indicators in a regression accomplishes the same thing as
including a time trend:
◦ controlling for a spurious variable that would otherwise bias the estimates.
The same process that works with trends works with seasonality: "de-seasonalizing" the
variables.
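A sketch of the indicator approach in Stata, assuming monthly data with a Stata monthly date variable named mdate and generic variables y and x (all names hypothetical):

* Hypothetical: seasonal indicators via factor-variable notation
gen month = month(dofm(mdate))    // calendar month, 1-12
reg y x i.month                   // monthly dummies absorb seasonal movement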
Conclusion (1)
This lesson has shown a representative sample of basic time series regression methods.
Time series analysis has been generalized to univariate processes that exhibit autoregressive and
moving average properties ("ARMA" models) and are most concisely represented by the Wold
Decomposition:

$y_t = \mu + \sum_{s=0}^{\infty} \psi_s e_{t-s}, \quad \psi_0 = 1$,

which writes any covariance stationary process as a (possibly infinite) moving average of
uncorrelated disturbances plus a deterministic term.
Optional: autocovariance of AR(1) process

$E[y_t\, y_{t+h}] = \rho_1^{h+2} \sigma_y^2 + 0 + 0 + E\left[ e_t \sum_{s=0}^{\infty} \rho_1^s e_{t+h-s} \right] = \rho_1^{h+2} \sigma_y^2 + \rho_1^h \sigma_e^2$,

where the 2nd equality uses the assumption that the disturbances are uncorrelated across time
periods:

$E[e_t\, e_{t+h-s}] = \begin{cases} \sigma_e^2, & s = h \\ 0, & s \neq h. \end{cases}$
Optional: autocovariance of AR(1) process (concluded)
Now the whole expression can be written in terms of the variance of y, substituting $\sigma_e^2 = (1 - \rho_1^2)\,\sigma_y^2$:

$E[y_t\, y_{t+h}] = \rho_1^{h+2} \sigma_y^2 + \rho_1^h \sigma_y^2 (1 - \rho_1^2) = \sigma_y^2 \rho_1^h (\rho_1^2 + 1 - \rho_1^2) = \sigma_y^2 \rho_1^h$.
With this result, it is possible to make some sense of how persistent y is.
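Dividing by $\sigma_y^2$ gives $Corr(y_t, y_{t+h}) = \rho_1^h$, which can be checked against the simulated series from the earlier sketch (assuming it is still in memory, with $\rho_1 = 0.5$):

* Hypothetical check: sample autocorrelations should decay like 0.5^h
corrgram y, lags(5)               // compare the AC column to .5, .25, .125, ...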
Optional: biasedness of OLS
The proof of this is fairly complex, but intuitively the bias comes from the fact that the present
period disturbance $e_t$ can be expressed as a function of lagged values of the dependent variable;
◦ these are correlated with the regressor, $y_{t-1}$, resulting in the bias in finite samples.
But the OLS bias with lagged dependent variables as regressors does disappear with large
samples,
◦ so at least OLS is consistent in these circumstances.
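Both claims can be illustrated with a Monte Carlo sketch in Stata; the sample size (25), true $\rho_1$ (0.9), and replication count are assumptions chosen for illustration.

* Hypothetical Monte Carlo: finite-sample bias of OLS in an AR(1)
capture program drop ar1bias
program define ar1bias, rclass
    clear
    set obs 25                        // small sample, where the bias is visible
    gen t = _n
    tsset t
    gen e = rnormal()
    gen y = e in 1
    replace y = 0.9*L.y + e in 2/L    // true rho1 = 0.9
    reg y L.y
    return scalar rho_hat = _b[L.y]
end
simulate rho_hat = r(rho_hat), reps(1000) nodots: ar1bias
summarize rho_hat                     // mean falls short of 0.9 in small samples

Raising set obs and re-running shows the average estimate approaching 0.9, consistent with the consistency of OLS here.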
Optional: biasedness of OLS (sketch of "proof")
The estimator of $\rho_1$ in an AR(1) regression is:

$\hat{\rho}_1 = \rho_1 + \dfrac{\sum_{t=1}^{n} y_{t-1} e_t}{\sum_{t=1}^{n} y_{t-1}^2}$.
$e_t$ can be written using a lag operator: $e_t = (1 - \rho_1 L)\, y_t = y_t - \rho_1 y_{t-1}$.
Optional: biasedness of OLS (sketch of "proof", concluded)
So,

$\hat{\rho}_1 = \rho_1 + \dfrac{\sum_{t=1}^{n} y_{t-1} (1 - \rho_1 L)\, y_t}{\sum_{t=1}^{n} y_{t-1}^2}$,

which is where the bias comes from: the numerator and denominator involve the same $y$ values, so the ratio's expectation is not zero in finite samples.