Chapter 17: Autocorrelation (Serial Correlation)
Chapter 17 Outline
• Review
o Regression Model
o Standard Ordinary Least Squares (OLS) Premises
o Estimation Procedures Embedded within the Ordinary Least
Squares (OLS) Estimation Procedure
o Covariance and Independence
• What Is Autocorrelation (Serial Correlation)?
• Autocorrelation and the Ordinary Least Squares (OLS) Estimation
Procedure: The Consequences
o The Mathematics
Ordinary Least Squares (OLS) Estimation Procedure for
the Coefficient Value
Ordinary Least Squares (OLS) Estimation Procedure for
the Variance of the Coefficient Estimate’s Probability
Distribution
o Our Suspicions
o Confirming Our Suspicions
• Accounting for Autocorrelation: An Example
• Justifying the Generalized Least Squares (GLS) Estimation
Procedure
• Robust Standard Errors
Review the algebra. What role, if any, did the second ordinary least squares
(OLS) premise, the error term/error term independence premise, play?
3. Suppose that two variables are positively correlated.
a. In words, what does this mean?
b. What type of graph do we use to illustrate their correlation? What does
the graph look like?
c. What can we say about their covariance and correlation coefficient?
4. Suppose that two variables are independent.
a. In words, what does this mean?
b. What type of graph do we use to illustrate their correlation? What does
the graph look like?
c. What can we say about their covariance and correlation coefficient?
5. Consider the following model and data:
ConsDurt = βConst + βIInct + et
Consumer Durable Data: Monthly time series data of consumer durable
production and income statistics, 2004 to 2009.
ConsDurt	Consumption of durables in month t (billions of chained 2005 dollars)
Const	Consumption in month t (billions of chained 2005 dollars)
Inct	Disposable income in month t (billions of chained 2005 dollars)
a. What is your theory concerning how disposable income should affect
the consumption of consumer durables? What does your theory
suggest about the sign of the income coefficient, βI?
b. Run the appropriate regression. Do the data support your theory?
6. Consider the following equations:
yt = βConst + βxxt + et
et = ρet−1 + vt
Estyt = bConst + bxxt
Rest = yt − Estyt
Start with the last equation, the equation for Rest. Using algebra and the
other equations, show that
Rest = (βConst−bConst) + (βx−bx)xt + ρet−1 + vt
7. Consider the following equations:
yt = βConst + βxxt + et
yt−1 = βConst + βxxt−1 + et−1
et = ρet−1 + vt
Multiply the yt−1 equation by ρ. Then, subtract it from the yt equation. Using
algebra and the et equation show that
(yt − ρyt−1) = (βConst − ρβConst) + βx(xt − ρxt−1) + vt
Review
Regression Model
We begin by reviewing the basic regression model:
yt = βConst + βxxt + et
yt = Dependent variable
xt = Explanatory variable
et = Error term
t = 1, 2, …, T
T = Sample size
The error term is a random variable that represents random influences:
Mean[et] = 0
The Standard Ordinary Least Squares (OLS) Premises
Again, we begin by focusing our attention on the standard ordinary least squares
(OLS) regression premises:
• Error Term Equal Variance Premise: The variance of the error term’s
probability distribution for each observation is the same; all the variances
equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
• Error Term/Error Term Independence Premise: The error terms are
independent: Cov[ei, ej] = 0 for i ≠ j.
Knowing the value of the error term from one observation does not
help us predict the value of the error term for any other observation.
• Explanatory Variable/Error Term Independence Premise: The
explanatory variables, the xt’s, and the error terms, the et’s, are not
correlated.
Knowing the value of an observation’s explanatory variable does not
help us predict the value of that observation’s error term.
bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)²  and  bConst = ȳ − bx·x̄   (sums over t = 1, …, T)

Cov[x, y] = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + … + (xN − x̄)(yN − ȳ)] / N
          = Σ(xt − x̄)(yt − ȳ) / N
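The covariance and correlation calculations can be sketched in a few lines (the growth-rate numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical monthly growth rates for two stock indexes
x = np.array([1.2, -0.5, 0.8, 2.1, -1.0, 0.4])  # e.g., Dow growth
y = np.array([0.9, -0.8, 1.1, 1.7, -1.3, 0.2])  # e.g., Nasdaq growth

N = len(x)
# Covariance: average product of deviations from the means
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / N
# Correlation coefficient: covariance scaled by the standard deviations
corr_xy = cov_xy / (x.std() * y.std())

print(round(cov_xy, 4), round(corr_xy, 4))
```

A positive covariance means the deviation products in quadrants I and III dominate; the correlation coefficient rescales this to lie between −1 and 1.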
(Figure: scatter diagram of (yi − ȳ) plotted against (xi − x̄), divided into four quadrants:)

Quadrant I: (xi − x̄) > 0 and (yi − ȳ) > 0, so (xi − x̄)(yi − ȳ) > 0
Quadrant II: (xi − x̄) < 0 and (yi − ȳ) > 0, so (xi − x̄)(yi − ȳ) < 0
Quadrant III: (xi − x̄) < 0 and (yi − ȳ) < 0, so (xi − x̄)(yi − ȳ) > 0
Quadrant IV: (xi − x̄) > 0 and (yi − ȳ) < 0, so (xi − x̄)(yi − ȳ) < 0
• First quadrant. Dow growth rate is greater than its mean and Nasdaq
growth is greater than its mean; the product of the deviations is positive in
the first quadrant:
(xt − x̄) > 0 and (yt − ȳ) > 0 → (xt − x̄)(yt − ȳ) > 0
• Second quadrant. Dow growth rate is less than its mean and Nasdaq
growth is greater than its mean; the product of the deviations is negative in
the second quadrant:
(xt − x̄) < 0 and (yt − ȳ) > 0 → (xt − x̄)(yt − ȳ) < 0
• Third quadrant. Dow growth rate is less than its mean and Nasdaq growth
is less than its mean; the product of the deviations is positive in the third
quadrant:
(xt − x̄) < 0 and (yt − ȳ) < 0 → (xt − x̄)(yt − ȳ) > 0
• Fourth quadrant. Dow growth rate is greater than its mean and Nasdaq
growth is less than its mean; the product of the deviations is negative in
the fourth quadrant:
(xt − x̄) > 0 and (yt − ȳ) < 0 → (xt − x̄)(yt − ȳ) < 0
Recall that we used precipitation in Amherst, the Nasdaq growth rate, and the
Dow Jones growth rate to illustrate independent and correlated variables in
Chapter 1:
diagram (Figure 17.3), suggesting that, for the most part, when et−1 is positive, et
will be positive also; alternatively, when et−1 is negative, et will be negative also.
This illustrates positive autocorrelation.
Figure 17.5: Scatter diagram of et against et−1 when ρ = 0
Figure 17.6: Scatter diagram of et against et−1 when ρ = .9
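The autocorrelation model et = ρet−1 + vt is easy to simulate. The sketch below (with made-up standard normal vt's) shows why ρ = .9 produces the pattern in Figure 17.6 while ρ = 0 produces the shapeless cloud of Figure 17.5:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_errors(rho, T=10_000):
    """Generate e_t = rho * e_{t-1} + v_t with independent v_t."""
    v = rng.normal(size=T)
    e = np.empty(T)
    e[0] = v[0]
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]
    return e

results = {}
for rho in (0.0, 0.9):
    e = simulate_errors(rho)
    # Sample correlation between e_t and e_{t-1}
    results[rho] = np.corrcoef(e[1:], e[:-1])[0, 1]
    print(f"rho = {rho}: sample correlation of e_t with e_t-1 = {results[rho]:.2f}")
```

With ρ = 0 the sample correlation between et and et−1 is essentially zero; with ρ = .9 it is close to .9, the inertia that makes a positive error term tend to follow a positive one.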
The Mathematics
Now, let us explore the consequences of autocorrelation. Just as with
heteroskedasticity, we shall focus on two of the three estimation procedures
embedded within the ordinary least squares (OLS) estimation procedure, the
procedures to estimate the:
• value of the coefficient.
• variance of the coefficient estimate’s probability distribution.
Question: Are these estimation procedures still unbiased when autocorrelation is
present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously, we showed that the
estimation procedure for the coefficient value was unbiased by
• applying the arithmetic of means;
and
• recognizing that the means of the error terms’ probability distributions
equal 0 (since the error terms represent random influences).
Let us quickly review. First, recall the arithmetic of means:
Mean of a constant plus a variable: Mean[c + x] = c + Mean[x]
Mean of a constant times a variable: Mean[cx] = c Mean[x]
Mean of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y]
To keep the algebra straightforward, we focused on a sample size of 3:
Equation for Coefficient Estimate:

bx = βx + Σ(xt − x̄)et / Σ(xt − x̄)²
   = βx + [(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
What is the critical point here? We have not relied on the error term/error
term independence premise to show that the estimation procedure for the
coefficient value is unbiased. Consequently, we suspect that the estimation
procedure for the coefficient value will continue to be unbiased in the presence of
autocorrelation.
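This suspicion can be illustrated with a small Monte Carlo sketch (all parameter values here are hypothetical): generate strongly autocorrelated error terms for a sample size of 3, compute bx many times, and check whether the estimates average out to βx.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_x, rho = 2.0, 0.9                  # hypothetical true values
x = np.array([1.0, 2.0, 3.0])           # sample size of 3; x's treated as constants

estimates = []
for _ in range(50_000):
    v = rng.normal(size=3)
    e = np.empty(3)
    e[0] = v[0]
    e[1] = rho * e[0] + v[1]            # autocorrelated error terms
    e[2] = rho * e[1] + v[2]
    y = 10.0 + beta_x * x + e           # 10.0 plays the role of beta_Const
    bx = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(bx)

print(f"average bx over repetitions: {np.mean(estimates):.3f} (true beta_x = {beta_x})")
```

The average of the bx's stays close to βx even though ρ = .9, consistent with the algebra: unbiasedness did not rely on the error term/error term independence premise.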
Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the
Coefficient Estimate’s Probability Distribution
Next, consider the estimation procedure for the variance of the coefficient
estimate’s probability distribution used by the ordinary least squares (OLS)
estimation procedure:
The strategy involves two steps:
• First, we used the adjusted variance to estimate the variance of the error
term's probability distribution: EstVar[e] = SSR / (Degrees of Freedom)
estimates Var[e].
• Second, we applied the equation relating the variance of the coefficient
estimate's probability distribution and the variance of the error term's
probability distribution: Var[bx] = Var[e] / Σ(xt − x̄)²

Step 1: Estimate the variance of the error term's probability distribution from
the available information – data from the first quiz:
	EstVar[e] = SSR / (Degrees of Freedom)
Step 2: Apply the relationship between the variances of the coefficient
estimate's and error term's probability distributions:
	Var[bx] = Var[e] / Σ(xt − x̄)²
Combining the two steps:
	EstVar[bx] = EstVar[e] / Σ(xt − x̄)²
Consequently, when calculating the variance of the sum of two variables that are
not independent, we cannot ignore their covariance:

Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]

x and y independent: Cov[x, y] = 0 → we can ignore the covariance → Var[x + y] = Var[x] + Var[y]
x and y not independent: Cov[x, y] ≠ 0 → we cannot ignore the covariance
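A quick numerical check of this variance arithmetic, using fabricated correlated variables (y is built to share a component with x):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # y correlated with x by construction

var_sum = np.var(x + y)
var_x, var_y = np.var(x), np.var(y)
cov_xy = np.cov(x, y, bias=True)[0, 1]

print(f"Var[x+y]               = {var_sum:.3f}")
print(f"Var[x] + Var[y]        = {var_x + var_y:.3f}   (ignores the covariance)")
print(f"Var[x] + Var[y] + 2Cov = {var_x + var_y + 2 * cov_xy:.3f}")
```

Ignoring the covariance understates the variance of the sum whenever the covariance is positive, which is exactly the trap autocorrelation sets for ordinary least squares (OLS).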
Next, apply this to the error terms when autocorrelation is absent and
when it is present:
When autocorrelation is absent: the error terms are independent, and we can
ignore the error term covariances.
When autocorrelation is present: the error terms are not independent, and we
cannot ignore the error term covariances.
Let us review the algebra to appreciate the critical role played by the error
term/error term independence premise. We began with the equation for the
coefficient estimate:
Equation for Coefficient Estimate:

bx = βx + Σ(xt − x̄)et / Σ(xt − x̄)²
   = βx + [(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

Calculating the variance of bx, applying the error term/error term independence
premise, and then applying the error term equal variance premise:

Var[bx] = Var[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²
        = [Var[(x1 − x̄)e1] + Var[(x2 − x̄)e2] + Var[(x3 − x̄)e3]] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²
        = [(x1 − x̄)²Var[e1] + (x2 − x̄)²Var[e2] + (x3 − x̄)²Var[e3]] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²

Simplifying

        = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

Generalizing

        = Var[e] / Σ(xt − x̄)²
Focus on the fourth step. When the error term/error term independence
premise is satisfied, that is, when the error terms are independent, we can ignore
the covariance terms when calculating the variance of a sum of variables.
= Var[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²
When autocorrelation is present, however, the error terms are not independent and
the covariance terms cannot be ignored. Therefore, when autocorrelation is
present the fourth step is invalid:
Var[bx] = [Var[(x1 − x̄)e1] + Var[(x2 − x̄)e2] + Var[(x3 − x̄)e3]] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²
The ordinary least squares (OLS) procedure for estimating the variance of the
coefficient estimate's probability distribution is flawed.
Step 1: Estimate the variance of the error term's probability distribution from
the available information – data from the first quiz:
	EstVar[e] = SSR / (Degrees of Freedom)
Step 2: Apply the relationship between the variances of the coefficient
estimate's and error term's probability distributions:
	Var[bx] = Var[e] / Σ(xt − x̄)²
Combining the two steps:
	EstVar[bx] = EstVar[e] / Σ(xt − x̄)²
The equation that the ordinary least squares (OLS) estimation procedure uses to
estimate the variance of the coefficient estimate’s probability distribution is
flawed when autocorrelation is present. Consequently, how can we have faith in
the variance estimate?
Our Suspicions
Let us summarize. After reviewing the algebra we suspect that when
autocorrelation is present the ordinary least squares (OLS) estimation procedure
for the
• coefficient value will still be unbiased.
• variance of the coefficient estimate’s probability distribution may be
biased.
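The second suspicion can be checked with a small Monte Carlo sketch (all numbers here are hypothetical, not the Econometrics Lab's): generate autocorrelated error terms, estimate the regression by ordinary least squares (OLS) many times, and compare the actual variance of the coefficient estimates with the average of the OLS variance estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_x, rho = 2.0, 0.9                   # hypothetical true values
x = np.arange(1.0, 31.0)                 # T = 30; x's treated as constants
T = len(x)
sxx = np.sum((x - x.mean()) ** 2)

bxs, est_vars = [], []
for _ in range(10_000):
    v = rng.normal(size=T)
    e = np.empty(T)
    e[0] = v[0]
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]     # autocorrelated error terms
    y = 10.0 + beta_x * x + e
    bx = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b_const = y.mean() - bx * x.mean()
    res = y - (b_const + bx * x)
    est_var_e = np.sum(res ** 2) / (T - 2)   # EstVar[e] = SSR / degrees of freedom
    bxs.append(bx)
    est_vars.append(est_var_e / sxx)         # OLS formula for EstVar[bx]

print(f"actual variance of the bx's     : {np.var(bxs):.4f}")
print(f"average OLS estimate of Var[bx] : {np.mean(est_vars):.4f}")
```

With ρ = .9 the actual variance of the bx's substantially exceeds the average OLS estimate, consistent with a biased variance estimation procedure.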
Econometrics Lab 17.2: The Ordinary Least Squares (OLS) Estimation Procedure
and Autocorrelation
Economic theory suggests that higher levels of disposable income increase the
consumption of consumer durables:
Theory: βI > 0. Higher disposable income increases the consumption of
durables.
There may be a problem with this, however. The equation used by the
ordinary least squares (OLS) estimation procedure to estimate the variance of the
coefficient estimate’s probability distribution assumes that the error term/error
term independence premise is satisfied. Our simulation revealed that when
autocorrelation is present and the error term/error term independence premise is
violated, the ordinary least squares (OLS) estimation procedure estimating the
variance of the coefficient estimate’s probability distribution can be flawed.
Recall that the standard error equals the square root of the estimated variance.
Consequently, if autocorrelation is present, we may have entered the wrong value
for the standard error into the Econometrics Lab when we calculated Prob[Results
IF H0 True]. When autocorrelation is present, the ordinary least squares (OLS)
estimation procedure bases its computations on a faulty premise, resulting in
flawed standard errors, t-Statistics, and tails probabilities. Consequently, we
should move on to the next step.
The residuals are plotted consecutively, one month after another. As we can easily
see, a positive residual is typically followed by another positive residual; a
negative residual is typically followed by a negative residual. “Switchovers” do
occur, but they are not frequent. This suggests that positive autocorrelation is
present. Most statistical software provides a very easy way to look at the
residuals.
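The "switchover" idea can be quantified by counting how often consecutive residuals have opposite signs. A sketch using fabricated positively autocorrelated residuals (six years of monthly data, as in the consumer durable example):

```python
import numpy as np

rng = np.random.default_rng(4)
res = np.empty(72)                  # 72 months of residuals (fabricated)
res[0] = rng.normal()
for t in range(1, 72):
    res[t] = 0.8 * res[t - 1] + rng.normal()   # positive autocorrelation

# A "switchover" is a pair of consecutive residuals with opposite signs
switchovers = int(np.sum(np.sign(res[1:]) != np.sign(res[:-1])))
print(f"{switchovers} switchovers in {len(res) - 1} consecutive pairs")
```

Under independence roughly half the consecutive pairs would switch sign; with positive autocorrelation far fewer do.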
Most of the scatter diagram points lie in the first and third quadrants. The
residuals are positively correlated.
Since the residual plots suggest that our fears are warranted, we now test
the autocorrelation model more formally. While there are many different
approaches, we shall focus on the Lagrange Multiplier (LM) approach which uses
an artificial regression to test for autocorrelation.3 We shall proceed by reviewing
a mathematical model of autocorrelation.
Autocorrelation Model: et = ρet−1 + vt vt‘s are independent
ρ = 0: et = vt → no autocorrelation
ρ ≠ 0: et depends on et−1 → autocorrelation present
In this case, we believe that ρ is positive. A positive rho provides the error term
with inertia. A positive error term tends to follow a positive error term and a
negative error term tends to follow a negative term. But also note that there is a
second term, vt. The vt‘s are independent; they represent random influences which
affect the error term also. It is the vt‘s that “switch” the sign of the error term.
Now, we combine the original model with the autocorrelation model:
Original Model: yt = βConst + βxxt + et et‘s are unobservable
Autocorrelation Model: et = ρet−1 + vt vt‘s are independent
Ordinary Least Squares (OLS) Estimate: Estyt = bConst + bxxt
Residuals: Rest = yt − Estyt Rest‘s are observable
Rest = yt − Estyt
⏐ Substituting for yt
⏐
↓ yt = βConst + βxxt + et
= βConst + βxxt + et − Estyt
⏐ Substituting for et
⏐
↓ et = ρet−1 + vt
= βConst + βxxt + ρet−1 + vt − Estyt
⏐ Substituting for Estyt
⏐
↓ Estyt = bConst + bxxt
= βConst + βxxt + ρet−1 + vt − (bConst + bxxt)
Rearranging terms
= (βConst − bConst) + (βx−bx)xt + ρet−1 + vt
⏐ Cannot observe et−1
⏐
↓ use Rest−1 instead
= (βConst − bConst) + (βx−bx)xt + ρRest−1 + vt
NB: Since the vt‘s are independent, we need not worry about
autocorrelation here.
Critical Result: The Resid(−1) coefficient estimate equals .8394. The positive
sign of the coefficient estimate suggests that an increase in last period’s
residual increases this period’s residual. This evidence suggests that
autocorrelation is present.
Now, we formulate the null and alternative hypotheses:
H0: ρ = 0 No autocorrelation present
H1: ρ > 0 Positive autocorrelation present
The null hypothesis challenges the evidence by asserting that no autocorrelation is
present. The alternative hypothesis is consistent with the evidence.
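The Lagrange Multiplier test's artificial regression can be sketched as follows (fabricated data, not the chapter's consumer durable regression; ρ = .8 is built in so autocorrelation really is present):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
e = np.empty(T)
e[0] = rng.normal()
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + rng.normal()       # autocorrelated error terms
y = 5.0 + 2.0 * x + e

# Step 1: ordinary least squares (OLS) regression of y on x; save the residuals
X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ b

# Step 2: artificial regression of Res_t on x_t and Res_(t-1)
Z = np.column_stack([np.ones(T - 1), x[1:], res[:-1]])
g = np.linalg.lstsq(Z, res[1:], rcond=None)[0]
fitted = Z @ g
r2 = 1 - np.sum((res[1:] - fitted) ** 2) / np.sum((res[1:] - res[1:].mean()) ** 2)
lm_stat = (T - 1) * r2                         # compare with a chi-square(1) critical value

print(f"Res(-1) coefficient estimate: {g[2]:.3f}, LM statistic: {lm_stat:.1f}")
```

A large Res(−1) coefficient and an LM statistic well above the chi-square critical value lead us to reject the null hypothesis of no autocorrelation.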
βI                               Coefficient Estimate  Standard Error  t-Statistic  Tails Probability
Ordinary Least Squares (OLS)     .087                  .016            5.37         <.0001
Generalized Least Squares (GLS)  .041                  .028            1.44         .1545
Table 17.6: Coefficient Estimate Comparison
The most striking difference is in the standard errors and the calculations that are
based on the estimated variance of the coefficient estimate's probability
distribution: the coefficient's standard error, t-Statistic, and tails probability. The
standard error nearly doubles when we account for autocorrelation. This is hardly
surprising. The ordinary least squares (OLS) regression calculations are based on
the premise that the error terms are independent. Our analysis suggests that this is
not true. The generalized least squares (GLS) regression accounts for error term
correlation. The standard error, t-Statistic, and tails probability in the generalized
least squares (GLS) regression differ substantially.
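The quasi-differencing behind the generalized least squares (GLS) procedure can be sketched as follows (fabricated data; ρ is assumed known here, whereas in practice it would be estimated, for example from the residual regression):

```python
import numpy as np

rng = np.random.default_rng(6)
T, rho = 200, 0.8                       # rho treated as known for this sketch
x = rng.normal(size=T).cumsum()         # a slowly evolving explanatory variable
e = np.empty(T)
e[0] = rng.normal()
for t in range(1, T):
    e[t] = rho * e[t - 1] + rng.normal()
y = 5.0 + 2.0 * x + e                   # true beta_Const = 5, beta_x = 2

# Quasi-difference both sides:
# (y_t - rho*y_(t-1)) = beta_Const*(1 - rho) + beta_x*(x_t - rho*x_(t-1)) + v_t
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]

X = np.column_stack([np.ones(T - 1), x_star])
b_gls = np.linalg.lstsq(X, y_star, rcond=None)[0]
print(f"GLS estimate of beta_x: {b_gls[1]:.3f} (true value 2.0)")
```

The vt's in the transformed equation are independent, so ordinary least squares applied to the starred variables satisfies the error term/error term independence premise.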
When the ordinary least squares (OLS) estimation procedure is used, the variance
of the estimated coefficient values equals about 1.11. Now, specify the
generalized least squares (GLS) estimation procedure by clicking GLS. Click
Start and then after many, many repetitions click Stop. When the generalized least
squares (GLS) estimation procedure is used, the variance of the estimated
coefficient values is less, 1.01. Consequently, the generalized least squares (GLS)
estimation procedure provides more reliable estimates.
1 Recall that to keep the algebra straightforward we assume that the explanatory
variables are constants. By doing so, we can apply the arithmetic of means easily.
Our results are unaffected by this assumption.
2 Recall that to keep the algebra straightforward we assume that the explanatory
variables are constants. By doing so, we can apply the arithmetic of variances
easily. Our results are unaffected by this assumption.
3 The Durbin-Watson statistic is the traditional method of testing for
autocorrelation. Unfortunately, the distribution of the Durbin-Watson statistic
depends on the distribution of the explanatory variable. This makes hypothesis
testing with the Durbin-Watson statistic more complicated than with the Lagrange
multiplier test. Consequently, we shall focus on the Lagrange multiplier test.
4 While it is beyond the scope of this textbook, it can be shown that while this
estimation procedure is biased, the magnitude of the bias diminishes and
approaches zero as the sample size approaches infinity.