01B Linear Regression With Time Series Data
Econometrics II
Linear Regression with Time Series Data
Morten Nyboe Tabor
Learning Goals
• Derive the method of moments (MM) estimator and state the assumptions
used to derive the estimator. Estimate and interpret the parameters.
Outline
1 The Linear Regression Model
Definition, Interpretation, and Identification
How do we identify and interpret the parameters of the model?
2 Method of Moments (MM) Estimation
3 Properties of the Estimator
Consistency
Unbiasedness
Example: Bias in AR(1) Model
Asymptotic Distribution
4 Dynamic Completeness and Autocorrelation
A Dynamically Complete Model
Autocorrelation of the Error Term
Consequences of Autocorrelation
5 Model Formulation and Misspecification Testing
Model Formulation
Misspecification Testing
Model Formulation and Misspecification Testing in Practice
6 The Frisch-Waugh-Lovell Theorem
7 Recap: Linear Regression Model with Time Series Data
1. The Linear Regression Model
Identification
E(x_t ε_t) = 0,  (∗∗∗)
which has a unique solution for β when E(x_t x_t') is non-singular.
• The parameters in β are identified by (∗∗∗) and the non-singularity condition.
The latter is the well-known condition for no perfect multicollinearity.
• Note the two distinct conditions for OLS to converge to the true value:
1 The moment condition (∗∗∗) should be satisfied.
2 A law of large numbers should apply.
A central part of econometric analysis is to ensure that these conditions hold.
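To make the MM estimator explicit, replace (∗∗∗) by its sample analogue and solve for β; a standard derivation, written in the notation of the moment condition above:

```latex
\frac{1}{T}\sum_{t=1}^{T} x_t\bigl(y_t - x_t'\widehat{\beta}\bigr) = 0
\quad\Longrightarrow\quad
\widehat{\beta} = \Bigl(\sum_{t=1}^{T} x_t x_t'\Bigr)^{-1} \sum_{t=1}^{T} x_t y_t ,
```

where the inverse exists by the non-singularity condition; this MM estimator coincides with OLS.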
Main Assumption
Consider a time series y_t and the k × 1 vector time series x_t. We assume:
1 The process z_t = (y_t, x_t')' has a joint stationary distribution.
2 The process z_t is weakly dependent, so that z_t and z_{t+h} become approximately independent for h → ∞.
• Interpretation:
Think of (1) as replacing the identical-distributions part of the IID assumption.
Think of (2) as replacing the independence part of the IID assumption.
• Under the main assumption, most of the results for linear regression on
random samples carry over to the time series case.
Consistency
Result 1: Consistency
Let yt and xt obey the main assumption. If the regressors obey the moment
condition,
E(x_t ε_t) = 0,
then the OLS estimator is consistent, i.e., β̂ → β in probability as T → ∞.
Illustration of Consistency
Consider the regression model with a single explanatory variable, k = 1, and illustrate by simulation how the OLS estimator approaches the true value as T grows.
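A minimal simulation sketch of this (the DGP, the true value β = 2, and the grid of sample sizes are illustrative choices, not taken from the slides):

```python
# Minimal sketch: OLS in y_t = beta*x_t + eps_t (k = 1, no intercept)
# applied to simulated samples of increasing length T, illustrating
# beta_hat -> beta as T -> infinity (consistency).
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0   # illustrative true value

for T in (50, 200, 1000, 10000):
    x = rng.standard_normal(T)
    eps = rng.standard_normal(T)
    y = beta * x + eps
    beta_hat = (x @ y) / (x @ x)   # OLS with a single regressor
    print(T, beta_hat)             # approaches 2.0 as T grows
```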
Unbiasedness
• A stronger requirement for an estimator is unbiasedness: E(β̂) = β.
Result 2: Unbiasedness
Let yt and xt obey the main assumption. If the regressors are strictly
exogenous,
E(ε_t | x_1, x_2, ..., x_t, ..., x_T) = 0,
then the OLS estimator is unbiased, i.e., E(β̂ | x_1, x_2, ..., x_T) = β.
y_t = θ y_{t−1} + ε_t.
We are often interested in E(θ̂) to check for bias for a given T. This is typically difficult to derive analytically. Instead, we can simulate M data sets from the model, compute the estimate θ̂^(m) on each, m = 1, 2, ..., M, and measure the bias as
Bias(θ̂) = MEAN(θ̂) − θ.
Note that by the LLN (for independent observations, since the θ̂^(m)'s are independent):
MEAN(θ̂) = M⁻¹ Σ_{m=1}^{M} θ̂^(m) → E(θ̂) for M → ∞.
[Figures: simulated Monte Carlo densities of the AR(1) estimate (labelled Ya_1) for different true values and sample lengths, together with plots of MEAN(θ̂) ± 2·MCSD against T up to 1000.]
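A minimal sketch of the Monte Carlo bias computation described above (θ, T, M and the AR(1) DGP with standard normal innovations are illustrative choices):

```python
# Monte Carlo approximation of Bias(theta_hat) = MEAN(theta_hat) - theta
# for OLS in the AR(1) model y_t = theta*y_{t-1} + eps_t.
import numpy as np

rng = np.random.default_rng(42)
theta, T, M = 0.9, 100, 10000     # illustrative choices

estimates = np.empty(M)
for m in range(M):
    eps = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = theta * y[t - 1] + eps[t]
    estimates[m] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])  # OLS of y_t on y_{t-1}

print("MEAN(theta_hat):", estimates.mean())
print("Bias estimate  :", estimates.mean() - theta)  # negative for theta > 0
```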
Asymptotic Distribution
• To derive the asymptotic distribution we need a CLT, which requires additional restrictions on ε_t:
E(ε_t² | x_t) = σ²
E(ε_t ε_s | x_t, x_s) = 0 for all t ≠ s.
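Under the main assumption, the moment condition and these two restrictions, the standard asymptotic-normality result takes the form below (a sketch of the statement; Σ_xx is my shorthand for E(x_t x_t'), not notation from the slides):

```latex
\sqrt{T}\,\bigl(\widehat{\beta}-\beta\bigr)
\;\overset{d}{\longrightarrow}\;
N\bigl(0,\;\sigma^{2}\,\Sigma_{xx}^{-1}\bigr),
\qquad
\Sigma_{xx} = E\bigl(x_t x_t'\bigr),
```

so that in large samples β̂ is approximately normal with variance σ²(Σ_t x_t x_t')⁻¹.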
• Residual autocorrelation does not imply that the DGP has autocorrelated
errors. Typically, autocorrelation is taken as a signal of misspecification.
Different possibilities:
1 Autoregressive errors in the DGP.
2 Dynamic misspecification.
Consequences of Autocorrelation
y_t = θ y_{t−1} + ε_t
ε_t = ρ ε_{t−1} + v_t,  v_t ∼ IID(0, σ_v²).
• Even if OLS is consistent, the standard formula for the variance in Result 4 is no longer valid. It is, however, possible to derive a valid variance estimator, the so-called heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors.
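A minimal sketch of HAC (Newey-West) standard errors using statsmodels (the simulated DGP and the maxlags choice are illustrative):

```python
# OLS with conventional vs. HAC (Newey-West) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
eps = np.zeros(T)
for t in range(1, T):                       # AR(1) errors with rho = 0.5
    eps[t] = 0.5 * eps[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + eps

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # conventional standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})
print(ols.bse)  # invalid under autocorrelated errors
print(hac.bse)  # robust to heteroskedasticity and autocorrelation
```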
y_t = x_t'β + ε_t
ε_t = ρ ε_{t−1} + v_t,  v_t ∼ IID(0, σ_v²).
Quasi-differencing removes the error autocorrelation:
(y_t − ρ y_{t−1}) = (x_t − ρ x_{t−1})'β + (ε_t − ρ ε_{t−1}),
or, equivalently,
y_t = ρ y_{t−1} + x_t'β − x_{t−1}'ρβ + v_t.
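If one is willing to impose the AR(1) error structure, the quasi-differenced model can be estimated by feasible GLS; a sketch using statsmodels' GLSAR (the simulated data are illustrative):

```python
# Feasible GLS for a regression with AR(1) errors (Cochrane-Orcutt-style
# iteration between estimates of rho and beta) via statsmodels' GLSAR.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 500
x = rng.standard_normal(T)
eps = np.zeros(T)
for t in range(1, T):                       # AR(1) errors with rho = 0.5
    eps[t] = 0.5 * eps[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + eps

model = sm.GLSAR(y, sm.add_constant(x), rho=1)  # one AR lag in the errors
results = model.iterative_fit(maxiter=10)       # alternate rho- and beta-steps
print(model.rho)        # estimated error autocorrelation (close to 0.5)
print(results.params)   # GLS estimates of (constant, beta)
```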
y_t = x_1t·β_1 + u_t.  (♦♦)
• An example is if the DGP exhibits a level shift, e.g., (♦) includes the
dummy variable
x_2t = 0 for t < T₀ and x_2t = 1 for t ≥ T₀.
If x_2t is not included in (♦♦), the residuals will be systematic.
y_t = x_t'β + ε_t.  (∗∗)
• However, if (∗∗) does not satisfy the specific assumptions, we cannot use the results in (1)-(3) to interpret the estimated coefficients or to do statistical inference (e.g. test the hypothesis H₀: β_i = 0).
Misspecification Testing
• If all tests are passed, we may think of the model as representing the main features of the data, and we can use the results in (1)-(3), e.g. for testing hypotheses on estimated parameters.
ε̂_t = x_t'δ + γ ε̂_{t−1} + u_t,
LM = T·R² ∼ χ²(1).
Note that x_t and ε̂_t are orthogonal, so any explanatory power is due to ε̂_{t−1}.
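This auxiliary-regression LM test is the Breusch-Godfrey test; a minimal sketch with statsmodels (the simulated data are illustrative):

```python
# LM test for first-order residual autocorrelation (Breusch-Godfrey):
# regress residuals on x_t and the lagged residual, use LM = T*R^2 ~ chi2(1).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
T = 500
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + rng.standard_normal(T)

res = sm.OLS(y, sm.add_constant(x)).fit()
lm, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=1)
print(lm, lm_pval)   # no rejection expected: this DGP has IID errors
```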
• The Durbin-Watson (DW) test is derived for finite samples.
It is based on strict exogeneity and is therefore not valid in many models.
In the auxiliary regression
ε̂_t² = γ₀ + γ₁x_1t + ... + γ_k x_kt + δ₁x_1t² + ... + δ_k x_kt² + u_t,
the null hypothesis of homoskedasticity is
γ₁ = ... = γ_k = δ₁ = ... = δ_k = 0.
The alternative is that the variance of ε_t depends on x_it or the squares x_it² for some i = 1, 2, ..., k.
• The LM test: LM = T·R² ∼ χ²(2k).
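A minimal sketch of this LM test, computing the auxiliary regression by hand (the simulated homoskedastic DGP is illustrative; with k = 1 the statistic is compared with χ²(2)):

```python
# LM test for heteroskedasticity: regress squared OLS residuals on the
# regressors and their squares, then compare LM = T*R^2 with chi2(2k).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
T = 500
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + rng.standard_normal(T)

res = sm.OLS(y, sm.add_constant(x)).fit()
aux_X = sm.add_constant(np.column_stack([x, x**2]))  # levels and squares
aux = sm.OLS(res.resid**2, aux_X).fit()
LM = T * aux.rsquared
print(LM, stats.chi2.sf(LM, df=2))                   # df = 2k = 2 here
```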
• Skewness (S) and kurtosis (K) are the estimated third and fourth central moments of the standardized estimated residuals u_t = (ε̂_t − ε̄)/σ̂:
S = T⁻¹ Σ_{t=1}^{T} u_t³ and K = T⁻¹ Σ_{t=1}^{T} u_t⁴.
Under the null of normality,
ξ_S = (T/6)·S² → χ²(1)
ξ_K = (T/24)·(K − 3)² → χ²(1).
• It turns out that ξ_S and ξ_K are asymptotically independent, which gives the Jarque-Bera joint test of normality:
ξ_JB = ξ_S + ξ_K → χ²(2).
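A minimal sketch computing ξ_S, ξ_K and ξ_JB directly from these formulas (the stand-in residuals are simulated; in practice one would use the estimated residuals ε̂_t):

```python
# Skewness/kurtosis-based normality tests and the Jarque-Bera statistic,
# computed directly from the formulas above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
resid = rng.standard_normal(500)            # stand-in for estimated residuals

u = (resid - resid.mean()) / resid.std()    # standardized residuals
T = len(u)
S = np.mean(u**3)                           # skewness
K = np.mean(u**4)                           # kurtosis
xi_S = T / 6 * S**2
xi_K = T / 24 * (K - 3) ** 2
xi_JB = xi_S + xi_K
print(xi_JB, stats.chi2.sf(xi_JB, df=2))    # compare with chi2(2)
```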
• Often normality of the error terms is rejected due to the presence of a few
large outliers that the model cannot account for (i.e. they are captured by
the error term).
• Note that the results for the linear regression model hold without
assuming normality of εt . However, the normal distribution is a natural
benchmark and given normality:
• β̂ converges faster to the asymptotic normal distribution.
• The OLS estimator coincides with the maximum likelihood (ML) estimator.
Motivation
Consider the linear regression model
y_t = μ + β₁x_t + ε_t,  t = 1, 2, ..., T,  (1)
and the linear regression model for the de-meaned variables ỹ_t = y_t − ȳ and x̃_t = x_t − x̄, given by
ỹ_t = b₁x̃_t + u_t,  t = 1, 2, ..., T.  (2)
[Figure: (a) scatterplot of y_t against x_t; (b) scatterplot of the de-meaned ỹ_t = y_t − ȳ against x̃_t = x_t − x̄.]
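A minimal numerical check of the theorem in this de-meaning case (the DGP is illustrative): the slope estimate from (1) equals the slope estimate from (2).

```python
# Frisch-Waugh-Lovell, de-meaning case: the slope from a regression with
# an intercept equals the slope from the de-meaned regression without one.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x = 2.0 + rng.standard_normal(T)
y = 1.0 + 0.5 * x + rng.standard_normal(T)

full = sm.OLS(y, sm.add_constant(x)).fit()           # model (1)
demeaned = sm.OLS(y - y.mean(), x - x.mean()).fit()  # model (2), no constant
print(full.params[1], demeaned.params[0])            # identical slopes
```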
Model
Recap: y_t = x_t'β + ε_t
Main assumption: Cross-section vs. Time series
Cross-Section: independent and identically distributed
Time Series: weak dependence and stationarity
Both serve as the technical requirements for the application of a LLN and a CLT.