Time Series Regression
and Forecasting
14-1
Example #1 of time series data: US rate of price inflation, as
measured by the quarterly percentage change in the
Consumer Price Index (CPI), at an annual rate
14-2
Example #2: US rate of unemployment
14-3
Why use time series data?
14-4
Time series data raises new technical issues
Time lags
Correlation over time (serial correlation, a.k.a.
autocorrelation)
Forecasting models built on regression methods:
o autoregressive (AR) models
o autoregressive distributed lag (ADL) models
o need not (typically do not) have a causal interpretation
Conditions under which dynamic effects can be estimated,
and how to estimate them
Calculation of standard errors when the errors are serially
correlated
14-5
Using Regression Models for Forecasting
For forecasting,
o R² matters (a lot!)
o Omitted variable bias isn’t a problem!
o We will not worry about interpreting coefficients
in forecasting models
o External validity is paramount: the model
estimated using historical data must hold into the
(near) future
14-6
Introduction to Time Series Data
and Serial Correlation
14-7
We will transform time series variables using lags, first
differences, logarithms, & growth rates
14-8
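The slides use STATA later on; as a language-agnostic sketch, the same four transformations in Python, applied to a made-up quarterly series:

```python
import numpy as np

# A made-up quarterly series Y_t
y = np.array([100.0, 102.0, 101.0, 104.0, 106.0])

lag1 = y[:-1]                 # first lag: Y_{t-1}
first_diff = y[1:] - y[:-1]   # first difference: Y_t - Y_{t-1}
log_y = np.log(y)             # logarithm
# growth rate at an annual rate, for quarterly data:
# 400 * (ln Y_t - ln Y_{t-1})
growth_annual = 400.0 * np.diff(log_y)

print(first_diff)         # [ 2. -1.  3.  2.]
print(growth_annual[0])   # 400*ln(1.02), about 7.92
```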
Example: Quarterly rate of inflation at an annual rate (U.S.)
CPI = Consumer Price Index (Bureau of Labor Statistics)
CPI in the first quarter of 2004 (2004:I) = 186.57
CPI in the second quarter of 2004 (2004:II) = 188.60
Percentage change in CPI, 2004:I to 2004:II
14-9
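Plugging in the two CPI values above (a quick Python check; both the percent-change version and the log-difference approximation of the annualized rate are shown, and they nearly agree):

```python
import math

cpi_2004q1 = 186.57   # CPI, 2004:I (from the slide)
cpi_2004q2 = 188.60   # CPI, 2004:II

# Annualized quarterly inflation, two common (nearly equal) versions:
inf_pct = 400.0 * (cpi_2004q2 - cpi_2004q1) / cpi_2004q1   # percent change
inf_log = 400.0 * math.log(cpi_2004q2 / cpi_2004q1)        # log approximation

print(round(inf_pct, 2), round(inf_log, 2))  # about 4.35 and 4.33 percent
```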
Example: US CPI inflation – its first lag and its change
14-10
Autocorrelation
14-11
14-12
Sample autocorrelations
The jth sample autocorrelation is an estimate of the jth
population autocorrelation:

   ρ̂j = [sample cov(Yt, Yt–j)] / [sample var(Yt)]

where

   sample cov(Yt, Yt–j) = (1/T) Σt=j+1…T (Yt – Ȳj+1,T)(Yt–j – Ȳ1,T–j)

and Ȳj+1,T is the sample average of Yt computed
over observations t = j+1,…,T. NOTE:
o the summation is over t = j+1 to T (why?)
o The divisor is T, not T – j (this is the
  conventional definition used for time series data)
14-13
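A minimal Python implementation of this formula (note the divisor T and the two subsample means, exactly as defined above; the five-point series is just for illustration):

```python
import numpy as np

def sample_autocorrelation(y, j):
    """jth sample autocorrelation, time-series convention:
    divisor T (not T - j), full-sample mean in the variance."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # means over the two overlapping subsamples, as in the slide's formula
    mean_late = y[j:].mean()       # average of Y_t over t = j+1, ..., T
    mean_early = y[:T - j].mean()  # average of Y_{t-j} over the same t range
    cov_j = np.sum((y[j:] - mean_late) * (y[:T - j] - mean_early)) / T
    var = np.sum((y - y.mean()) ** 2) / T
    return cov_j / var

# Tiny example: a linearly increasing series is positively autocorrelated
print(sample_autocorrelation([1, 2, 3, 4, 5], 1))  # prints 0.5
```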
Example: Autocorrelations of:
(1) the quarterly rate of U.S. inflation
(2) the quarter-to-quarter change in the quarterly rate of
inflation
14-14
The inflation rate is highly serially correlated (ρ̂1 = 0.84)
Last quarter’s inflation rate contains much information
about this quarter’s inflation rate
The plot is dominated by multiyear swings
But there are still surprise movements!
14-15
Other economic time series:
14-16
Other economic time series, ctd:
14-17
Stationarity: a key requirement for external validity of
time series regression
Yt = β0 + β1Yt–1 + ut
14-20
Example: AR(1) model of the change in inflation

   Δ̂Inft = 0.017 – 0.238ΔInft–1,   R² = 0.05
           (0.126)  (0.096)

Is the lagged change in inflation a useful predictor of the
current change in inflation?

   t = –.238/.096 = –2.47 > 1.96 (in absolute value)

⇒ Reject H0: β1 = 0 at the 5% significance level
Yes, the lagged change in inflation is a useful predictor of
current change in inflation – but the R² is pretty low!
14-21
Example: AR(1) model of inflation – STATA
First, let STATA know you are using time series data
14-22
Example: AR(1) model of inflation – STATA, ctd.
. gen lcpi = log(cpi);                   variable cpi is already in memory
. gen inf = 400*(lcpi[_n]-lcpi[_n-1]);
This creates a new variable, inf, the “nth” observation of which is 400
times the difference between the nth observation on lcpi and the “n–1”th
observation on lcpi, that is, 400 times the first difference of lcpi
14-23
Example: AR(1) model of inflation – STATA, ctd
                                                F(  1,   170)
                                                Prob > F      =  0.0146
                                                R-squared     =  0.0564
                                                Root MSE      =  1.6639
------------------------------------------------------------------------------
             |               Robust
        dinf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dinf |
         L1. |  -.2380348   .0965034    -2.47   0.015    -.4285342   -.0475354
       _cons |   .0171013   .1268831     0.13   0.893    -.2333681    .2675707
------------------------------------------------------------------------------

. dis "Adjusted Rsquared = " _result(8);
Adjusted Rsquared = .05082278
14-24
Forecasts: terminology and notation
Predicted values are “in-sample” (the usual definition)
Forecasts are “out-of-sample” – in the future
Notation:
o YT+1|T = forecast of YT+1 based on YT,YT–1,…, using the
population (true unknown) coefficients
o ŶT+1|T = forecast of YT+1 based on YT,YT–1,…, using
  the estimated coefficients

For an AR(1):
o YT+1|T = β0 + β1YT
o ŶT+1|T = β̂0 + β̂1YT, where β̂0 and β̂1 are estimated
  using data through period T.
14-25
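A plug-in sketch in Python, using the β̂'s reported on the earlier slide for the AR(1) of ΔInf; the value of ΔInf at date T below is hypothetical:

```python
# Estimated AR(1) coefficients for dinf (from the earlier slide)
b0_hat = 0.017
b1_hat = -0.238

def ar1_forecast(y_T, b0, b1):
    """One-step-ahead AR(1) forecast: Yhat_{T+1|T} = b0 + b1 * Y_T."""
    return b0 + b1 * y_T

dinf_T = 0.5   # hypothetical last observed change in inflation
fcast = ar1_forecast(dinf_T, b0_hat, b1_hat)
print(fcast)   # 0.017 - 0.238*0.5 = -0.102
```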
Forecast errors
14-26
Example: forecasting inflation using an AR(1)
14-27
The AR(p) model: using multiple lags for forecasting

   Yt = β0 + β1Yt–1 + β2Yt–2 + … + βpYt–p + ut

Example: AR(4) model of the change in inflation, R² = 0.18:
| Robust
dinf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dinf |
L1. | -.2579205 .0925955 -2.79 0.006 -.4407291 -.0751119
L2. | -.3220302 .0805456 -4.00 0.000 -.481049 -.1630113
L3. | .1576116 .0841023 1.87 0.063 -.0084292 .3236523
L4. | -.0302685 .0930452 -0.33 0.745 -.2139649 .1534278
_cons | .0224294 .1176329 0.19 0.849 -.2098098 .2546685
NOTES
14-30
Example: AR(4) model of inflation – STATA, ctd.
. test L2.dinf L3.dinf L4.dinf; L2.dinf is the second lag of dinf, etc.
( 1) L2.dinf = 0.0
( 2) L3.dinf = 0.0
( 3) L4.dinf = 0.0
F( 3, 147) = 6.71
Prob > F = 0.0003
14-31
Digression: we used ΔInf, not Inf, in the AR’s. Why?

   ΔInft = β0 + β1ΔInft–1 + ut
or
   Inft – Inft–1 = β0 + β1(Inft–1 – Inft–2) + ut
or
   Inft = Inft–1 + β0 + β1Inft–1 – β1Inft–2 + ut
        = β0 + (1+β1)Inft–1 – β1Inft–2 + ut
14-32
So why use ΔInft, not Inft?

AR(1) model of ΔInf:  ΔInft = β0 + β1ΔInft–1 + ut
AR(2) model of Inf:   Inft = γ0 + γ1Inft–1 + γ2Inft–2 + vt

When Yt is strongly serially correlated, the OLS estimator of
the AR coefficient is biased towards zero.
In the extreme case that the AR coefficient = 1, Yt isn’t
stationary: the ut’s accumulate and Yt blows up.
If Yt isn’t stationary, the regression theory we are working
with here breaks down.
Here, Inft is strongly serially correlated – so to keep
ourselves in a framework we understand, the regressions are
specified using ΔInf.
More on this later…
14-33
Time Series Regression with Additional Predictors and
the Autoregressive Distributed Lag (ADL) Model
14-34
Example: inflation and unemployment
14-35
The empirical U.S. “Phillips Curve,” 1962 – 2004 (annual)

                                                F(  8,   163)
                                                Prob > F      =  0.0000
                                                R-squared     =  0.3663
                                                Root MSE      =  1.3926
------------------------------------------------------------------------------
             |               Robust
        dinf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dinf |
         L1. |  -.4198002   .0886973    -4.73   0.000    -.5949441   -.2446564
         L2. |  -.3666267   .0940369    -3.90   0.000    -.5523143   -.1809391
         L3. |   .0565723   .0847966     0.67   0.506    -.1108691    .2240138
         L4. |  -.0364739   .0835277    -0.44   0.663    -.2014098     .128462
        unem |
         L1. |  -2.635548   .4748106    -5.55   0.000    -3.573121   -1.697975
         L2. |   3.043123   .8797389     3.46   0.001     1.305969    4.780277
         L3. |  -.3774696   .9116437    -0.41   0.679    -2.177624    1.422685
         L4. |  -.2483774   .4605021    -0.54   0.590    -1.157696    .6609413
       _cons |   1.304271   .4515941     2.89   0.004     .4125424       2.196
------------------------------------------------------------------------------
14-38
Example: ADL(4,4) model of inflation – STATA, ctd.
. test L.unem L2.unem L3.unem L4.unem;

 ( 1)  L.unem = 0
 ( 2)  L2.unem = 0
 ( 3)  L3.unem = 0
 ( 4)  L4.unem = 0
14-39
The test of the joint hypothesis that none of the X’s is a useful
predictor, above and beyond lagged values of Y, is called a
Granger causality test
The forecast error is:

   forecast error = YT+1 – ŶT+1|T

14-41

The mean squared forecast error (MSFE) is

   MSFE = E(YT+1 – ŶT+1|T)²
        = E(uT+1)² + E[(β̂0 – β0) + (β̂1 – β1)YT]²   (for the AR(1))

14-42

The root mean squared forecast error (RMSFE) is

   RMSFE = √E[(YT+1 – ŶT+1|T)²]
14-43
Three ways to estimate the RMSFE

1. Use the approximation RMSFE ≈ σu, so estimate
   the RMSFE by the SER.
2. Use an actual forecast history for t = t1,…, T, then
   estimate by

      M̂SFE = [1/(T – t1 + 1)] Σt=t1–1…T–1 (Yt+1 – Ŷt+1|t)²

3. Use a simulated forecast history – “pseudo
   out-of-sample” forecasts (next).
14-44
The method of pseudo out-of-sample forecasting
Re-estimate your model every period, t = t1–1,…,T–1
Compute your pseudo out-of-sample “forecast” for date t+1,
using the model estimated through t. This is Ŷt+1|t.
Compute the poos forecast error, Yt+1 – Ŷt+1|t
Plug this forecast error into the MSFE formula,

   M̂SFE = [1/(T – t1 + 1)] Σt=t1–1…T–1 (Yt+1 – Ŷt+1|t)²

Why the term “pseudo out-of-sample forecasts”?
14-45
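The whole procedure can be sketched in Python (simulated AR(1) data with made-up coefficients stand in for ΔInf; at each date the model is re-estimated with data through t only, then used to forecast t+1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated AR(1) series standing in for dinf (coefficients are made up)
T = 120
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.1 - 0.3 * y[t - 1] + rng.normal()

def ols_ar1(series):
    """OLS of series_t on (1, series_{t-1}); returns (b0_hat, b1_hat)."""
    X = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    b, *_ = np.linalg.lstsq(X, series[1:], rcond=None)
    return b[0], b[1]

t1 = 90                              # date of the first pseudo forecast
errors = []
for t in range(t1 - 1, T - 1):
    b0, b1 = ols_ar1(y[: t + 1])     # re-estimate using data through t only
    y_hat = b0 + b1 * y[t]           # pseudo out-of-sample forecast of y[t+1]
    errors.append(y[t + 1] - y_hat)  # poos forecast error

rmsfe = np.sqrt(np.mean(np.square(errors)))
print(round(rmsfe, 3))
```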
Using the RMSFE to construct forecast intervals
A 95% forecast interval is

   ŶT+1|T ± 1.96 × (estimated RMSFE)

Note:
1. A 95% forecast interval is not a confidence interval (YT+1
isn’t a nonrandom coefficient, it is random!)
2. This interval is only valid if uT+1 is normal – but still
might be a reasonable approximation and is a commonly
used measure of forecast uncertainty
14-46
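A quick numerical sketch, using the AR(1) coefficients and Root MSE (the SER, here standing in for the RMSFE) reported in the earlier STATA output; the last observation ΔInf_T = 0.5 is hypothetical:

```python
# 95% forecast interval: Yhat_{T+1|T} +/- 1.96 * RMSFE
forecast = 0.017 - 0.238 * 0.5   # AR(1) forecast with hypothetical dinf_T
rmsfe = 1.6639                   # Root MSE (SER) from the STATA output

lower = forecast - 1.96 * rmsfe
upper = forecast + 1.96 * rmsfe
print(round(lower, 2), round(upper, 2))  # -3.36 3.16
```

The interval is wide relative to the forecast itself, which matches the slide's point that the R² of the AR(1) is low.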
Example #1: the Bank of England “Fan Chart”, 11/05
http://www.bankofengland.co.uk/publications/inflationreport/index.htm
14-47
Example #2: Monthly Bulletin of the European Central
Bank, Dec. 2005, Staff macroeconomic projections
http://www.ecb.int/pub/mb/html/index.en.html
14-48
Example #3: Fed, Semiannual Report to Congress, 7/04
Economic projections for 2004 and 2005 by Federal Reserve
Governors and Reserve Bank presidents (central tendency)
[table not reproduced]
14-50
The Bayes Information Criterion (BIC)
   BIC(p) = ln[SSR(p)/T] + (p+1) (ln T)/T
First term: always decreasing in p (larger p, better fit)
Second term: always increasing in p.
o The variance of the forecast due to estimation error
increases with p – so you don’t want a forecasting model
with too many coefficients – but what is “too many”?
o This term is a “penalty” for using more parameters –
and thus increasing the forecast variance.
Minimizing BIC(p) trades off bias and variance to determine
a “best” value of p for your forecast.
The Akaike Information Criterion (AIC)

   AIC(p) = ln[SSR(p)/T] + (p+1) (2/T)

versus

   BIC(p) = ln[SSR(p)/T] + (p+1) (ln T)/T

The penalty term is smaller for AIC than for BIC (2 < ln T)
o AIC estimates more lags (larger p) than the BIC
o This might be desirable if you think longer lags
might be important.
o However, the AIC estimator of p isn’t consistent –
it can overestimate p – the penalty isn’t big enough
14-52
Example: AR model of inflation, lags 0 – 6:

   # Lags     BIC      AIC      R²
      0      1.095    1.076    0.000
      1      1.067    1.030    0.056
      2      0.955    0.900    0.181
      3      0.957    0.884    0.203
      4      0.986    0.895    0.204
      5      1.016    0.906    0.204
      6      1.046    0.918    0.204
14-53
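A Python sketch of this lag-selection exercise on simulated data (the AR(2) coefficients and seed are made up; the BIC/AIC formulas are the ones above, and every AR(p) is fit on the same estimation sample so the SSR's are comparable):

```python
import numpy as np

def ic_for_ar(y, p_max):
    """BIC(p) and AIC(p) for AR(p), p = 0..p_max, using
    ln(SSR/T) + (p+1)*penalty/T.  All regressions drop the
    first p_max observations so T is common across p."""
    y = np.asarray(y, dtype=float)
    T = len(y) - p_max
    results = []
    for p in range(p_max + 1):
        Y = y[p_max:]
        X = np.column_stack([np.ones(T)] +
                            [y[p_max - j:len(y) - j] for j in range(1, p + 1)])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        ssr = np.sum((Y - X @ b) ** 2)
        bic = np.log(ssr / T) + (p + 1) * np.log(T) / T
        aic = np.log(ssr / T) + (p + 1) * 2.0 / T
        results.append((p, bic, aic))
    return results

# Simulated AR(2) data (made-up coefficients, fixed seed)
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

res = ic_for_ar(y, p_max=6)
best_bic = min(res, key=lambda r: r[1])[0]
print("BIC chooses p =", best_bic)
```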
Generalization of BIC to multivariate (ADL) models
   BIC(K) = ln[SSR(K)/T] + K (ln T)/T
Can compute this over all possible combinations of lags of
Y and lags of X (but this is a lot)!
In practice you might choose lags of Y by BIC, and decide
whether or not to include X using a Granger causality test
with a fixed number of lags (number depends on the data
and application)
14-54
Nonstationarity I: Trends (SW Section 14.6)
14-56
1. What is a trend?
A trend is a long-term movement or tendency in the data.
Trends need not be just a straight line!
Which of these series has a trend?
14-57
14-58
14-59
What is a trend, ctd.
14-61
Deterministic and stochastic trends, ctd.
YT+h|T = β0h + YT
14-64
Stochastic trends and unit autoregressive roots
AR(1): Yt = β0 + β1Yt–1 + ut

The AR(p) model can be rewritten in terms of ΔYt:

   ΔYt = β0 + δYt–1 + γ1ΔYt–1 + … + γp–1ΔYt–p+1 + ut

where
   δ = β1 + β2 + … + βp – 1
   γ1 = –(β2 + … + βp)
   γ2 = –(β3 + … + βp)
   …
   γp–1 = –βp
14-68
Unit roots in the AR(p) model, ctd.
where δ = β1 + β2 + … + βp – 1.
14-69
2. What problems are caused by
trends?
[Figure: log Japan GDP (lgdpjs, smooth line) and US inflation
(infs), both rescaled, 1965q1–1985q1]
14-71
Log Japan gdp (smooth line) and US inflation (both rescaled),
1982-1999
14-72
3. How do you detect trends?
   Yt = β0 + β1Yt–1 + ut
or
   ΔYt = β0 + δYt–1 + ut,  where δ = β1 – 1

H0: δ = 0 (that is, β1 = 1)  v.  H1: δ < 0
Test: compute the t-statistic testing δ = 0
Under H0, this t-statistic does not have a normal
distribution!!
You need to compare the t-statistic to the table of Dickey-
Fuller critical values. There are two cases:
14-77
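A hand-rolled sketch of the DF t-statistic in Python (simulated random-walk data; this computes the homoskedasticity-only t-statistic on Y_{t−1}, and in practice you would use a canned ADF routine and the Dickey-Fuller critical-value table, not the normal table):

```python
import numpy as np

def df_tstat(y, n_lags=0):
    """t-statistic on delta in:  dy_t = b0 + delta*y_{t-1} (+ lagged dy's).
    Compare to Dickey-Fuller critical values (e.g. -2.86 at 5%,
    intercept-only case), NOT to the normal table."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    start = n_lags                        # drop observations lost to lags
    Y = dy[start:]
    cols = [np.ones(len(Y)), y[start:len(y) - 1]]   # intercept, y_{t-1}
    for j in range(1, n_lags + 1):
        cols.append(dy[start - j:len(dy) - j])       # lagged differences
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    s2 = resid @ resid / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se                      # t-stat on the y_{t-1} coefficient

# A random walk has a unit root: t-stat should usually be above -2.86
rng = np.random.default_rng(2)
rw = np.cumsum(rng.normal(size=200))
print(df_tstat(rw, n_lags=4))   # compare to the DF critical values
```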
Example: Does U.S. inflation have a unit root?
14-78
Example: Does U.S. inflation have a unit root? ctd
DF test for a unit root in U.S. inflation – using p = 4 lags
. reg dinf L.inf L(1/4).dinf if tin(1962q1,2004q4);

      Source |       SS       df       MS          F(  5,   166)
-------------+------------------------------      Prob > F      = 0.0000
       Model |  118.197526     5  23.6395052      R-squared     = 0.2370
    Residual |  380.599255   166   2.2927666      Adj R-squared = 0.2140
-------------+------------------------------      Root MSE      = 1.5142
       Total |  498.796781   171  2.91694024
t = –2.69 rejects a unit root at 10% level but not the 5% level
Some evidence of a unit root – not clear cut.
This is a topic of debate – what does it mean for inflation
to have a unit root?
We model inflation as having a unit root.
14-81
Summary: detecting and addressing stochastic trends
So we will:
Go over two ways to detect changes in coefficients: tests
for a break, and pseudo out-of-sample forecast analysis
Work through an example: the U.S. Phillips curve
14-83
A. Tests for a break (change) in regression coefficients
Case I: The break date is known
Suppose the break is known to have occurred at date τ.
Stability of the coefficients can be tested by estimating a fully
interacted regression model. In the ADL(1,1) case:

   Yt = β0 + β1Yt–1 + δ1Xt–1
        + γ0Dt(τ) + γ1[Dt(τ)×Yt–1] + γ2[Dt(τ)×Xt–1] + ut

where Dt(τ) = 1 if t ≥ τ, and = 0 otherwise.
If γ0 = γ1 = γ2 = 0, then the coefficients are constant over
the full sample.
If at least one of γ0, γ1, or γ2 is nonzero, the regression
function changes at date τ.
14-84
   Yt = β0 + β1Yt–1 + δ1Xt–1
        + γ0Dt(τ) + γ1[Dt(τ)×Yt–1] + γ2[Dt(τ)×Xt–1] + ut

where Dt(τ) = 1 if t ≥ τ, and = 0 otherwise

   H0: γ0 = γ1 = γ2 = 0
   vs. H1: at least one of γ0, γ1, or γ2 is nonzero

The resulting F-statistic is the Chow test for a known
break date…
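A sketch of the known-break-date F-test in Python, on simulated ADL(1,1) data with a deliberate intercept break (homoskedasticity-only F for simplicity; the slides' STATA examples use heteroskedasticity-robust standard errors):

```python
import numpy as np

def chow_fstat(Y, Ylag, Xlag, tau):
    """Homoskedasticity-only F-statistic for a break at known date tau in
    Y_t = b0 + b1*Y_{t-1} + d1*X_{t-1} + u_t, via the fully interacted
    regression with D_t(tau) = 1{t >= tau}."""
    n = len(Y)
    D = (np.arange(n) >= tau).astype(float)
    ones = np.ones(n)
    X_r = np.column_stack([ones, Ylag, Xlag])                  # restricted
    X_u = np.column_stack([ones, Ylag, Xlag, D, D * Ylag, D * Xlag])
    ssr = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    q = 3                              # restrictions: g0 = g1 = g2 = 0
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - X_u.shape[1]))

# Made-up data with a deliberate intercept break halfway through
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    shift = 1.5 if t >= n // 2 else 0.0   # intercept break at t = 100
    y[t] = shift + 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()

F = chow_fstat(y[1:], y[:-1], x[:-1], tau=n // 2)
print(round(F, 2))   # large F: the break is detected
```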
Case II: The break date is unknown
14-86
The Quandt Likelihood Ratio (QLR) Statistic
(also called the “sup-Wald” statistic)
14-88
Get this: in large samples, QLR has the distribution

   max a≤s≤1–a  (1/q) Σi=1…q  Bi(s)² / [s(1–s)],

where {Bi(s)}, i = 1,…, q, are independent Brownian bridges.
14-89
Note that these critical values are larger than the Fq,∞ critical
values – for example, the F1,∞ 5% critical value is 3.84.
14-90
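The QLR statistic is just the maximum of the break-date F-statistics over the trimmed middle of the sample; a Python sketch on simulated data with a made-up intercept break (homoskedasticity-only F's, 15% trimming):

```python
import numpy as np

def qlr_stat(Y, X_full, interact_cols, trim=0.15):
    """Sup-Wald (QLR): largest Chow F-statistic over candidate break
    dates in the central (1 - 2*trim) of the sample.
    X_full: regressors including a constant;
    interact_cols: indices of the columns allowed to break."""
    n = len(Y)
    lo, hi = int(trim * n), int((1 - trim) * n)
    def fstat(tau):
        D = (np.arange(n) >= tau).astype(float)
        X_u = np.column_stack([X_full] +
                              [D * X_full[:, j] for j in interact_cols])
        ssr = lambda X: np.sum(
            (Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
        q = len(interact_cols)
        return ((ssr(X_full) - ssr(X_u)) / q) / (ssr(X_u) / (n - X_u.shape[1]))
    fs = [fstat(tau) for tau in range(lo, hi)]
    i = int(np.argmax(fs))
    return fs[i], lo + i       # QLR statistic and estimated break date

# AR(1) with a made-up intercept break; only the intercept may break (q = 1)
rng = np.random.default_rng(4)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = (2.0 if t >= 120 else 0.0) + 0.3 * y[t - 1] + rng.normal()

Y, Ylag = y[1:], y[:-1]
Xf = np.column_stack([np.ones(n - 1), Ylag])
qlr, tau_hat = qlr_stat(Y, Xf, interact_cols=[0])
print(round(qlr, 1), tau_hat)   # QLR well above the q = 1 critical values
```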
Has the postwar U.S. Phillips Curve been stable?
Has this model been stable over the full period 1962-2004?
14-91
QLR tests of the stability of the U.S. Phillips curve.
dependent variable: ΔInft
regressors: intercept, ΔInft–1,…, ΔInft–4,
Unempt–1,…, Unempt–4
test for constancy of intercept only (other coefficients are
assumed constant): QLR = 2.865 (q = 1).
o 10% critical value = 7.12 ⇒ don’t reject at 10% level
test for constancy of intercept and coefficients on
Unempt–1,…, Unempt–4 (coefficients on ΔInft–1,…, ΔInft–4
are assumed constant): QLR = 5.158 (q = 5)
o 1% critical value = 4.53 ⇒ reject at 1% level
o Break date estimate: maximal F occurs in 1981:IV
Conclude that there is a break in the inflation –
unemployment relation, with estimated date of 1981:IV
14-92
14-93
B. Assessing Model Stability using Pseudo Out-of-Sample
Forecasts
The QLR test does not work well towards the very end of
the sample – but this is usually the most interesting part
– it is the most recent history and you want to know if
the forecasting model still works in the very recent past.
There are some big forecast errors (in 2001) but they do not
appear to be getting bigger – the model isn’t deteriorating
14-96
poos forecasts using the Phillips curve, ctd.
14-98
Summary, ctd.