
Chapter 17: Autocorrelation (Serial Correlation)

Chapter 17 Outline
• Review
  o Regression Model
  o Standard Ordinary Least Squares (OLS) Premises
  o Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure
  o Covariance and Independence
• What Is Autocorrelation (Serial Correlation)?
• Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The Consequences
  o The Mathematics
     Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
     Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient Estimate's Probability Distribution
  o Our Suspicions
  o Confirming Our Suspicions
• Accounting for Autocorrelation: An Example
• Justifying the Generalized Least Squares (GLS) Estimation Procedure
• Robust Standard Errors

Chapter 17 Prep Questions


1. What are the standard ordinary least squares (OLS) premises?
2. In Chapter 6 we showed that the ordinary least squares (OLS) estimation procedure for the coefficient value was unbiased; that is, we showed that
Mean[bx] = βx
Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
3. In Chapter 6 we showed that the variance of the coefficient estimate's probability distribution equals the variance of the error term's probability distribution divided by the sum of the squared x deviations; that is, we showed that

$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
4. Suppose that two variables are positively correlated.
   a. In words, what does this mean?
   b. What type of graph do we use to illustrate their correlation? What does the graph look like?
   c. What can we say about their covariance and correlation coefficient?
5. Suppose that two variables are independent.
   a. In words, what does this mean?
   b. What type of graph do we use to illustrate their independence? What does the graph look like?
   c. What can we say about their covariance and correlation coefficient?
6. Consider the following model and data:
ConsDurt = βConst + βIInct + et
Consumer Durable Data: Monthly time series data of consumer durable consumption and income statistics from 2004 to 2009.
ConsDurt   Consumption of durables in month t (billions of 2005 chained dollars)
Const      Consumption in month t (billions of 2005 chained dollars)
Inct       Disposable income in month t (billions of 2005 chained dollars)
   a. What is your theory concerning how disposable income should affect the consumption of consumer durables? What does your theory suggest about the sign of the income coefficient, βI?
   b. Run the appropriate regression. Do the data support your theory?

[Link to MIT-ConsDurDisInc-2004-2009.wf1 goes here.]

c. Graph the residuals.


Getting Started in EViews___________________________________________
• Run the regression; keep the Equation window open.
• Click View
• Click Actual, Fitted, Residual
• Click Residual Graph
__________________________________________________________________
d. If the residual is positive in one month, is it usually positive in the next
month?
e. If the residual is negative in one month, is it usually negative in the next
month?
7. Consider the following equations:
yt = βConst + βxxt + et
et = ρet−1 + vt
Estyt = bConst + bxxt
Rest = yt − Estyt
Start with the last equation, the equation for Rest. Using algebra and the other equations, show that
Rest = (βConst − bConst) + (βx − bx)xt + ρet−1 + vt
8. Consider the following equations:
yt = βConst + βxxt + et
yt−1 = βConst + βxxt−1 + et−1
et = ρet−1 + vt
Multiply the yt−1 equation by ρ. Then, subtract it from the yt equation. Using algebra and the et equation, show that
(yt − ρyt−1) = (βConst − ρβConst) + βx(xt − ρxt−1) + vt

Review
Regression Model
We begin by reviewing the basic regression model:
yt = βConst + βxxt + et
where
yt = Dependent variable        xt = Explanatory variable
et = Error term                t = 1, 2, …, T        T = Sample size
The error term is a random variable that represents random influences:
Mean[et] = 0
The Standard Ordinary Least Squares (OLS) Premises
Again, we begin by focusing our attention on the standard ordinary least squares
(OLS) regression premises:
• Error Term Equal Variance Premise: The variance of the error term’s
probability distribution for each observation is the same; all the variances
equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
• Error Term/Error Term Independence Premise: The error terms are independent: Cov[ei, ej] = 0 for i ≠ j.
Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.
• Explanatory Variable/Error Term Independence Premise: The
explanatory variables, the xt’s, and the error terms, the et’s, are not
correlated.
Knowing the value of an observation’s explanatory variable does not
help us predict the value of that observation’s error term.

Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation procedures. A procedure to estimate the:
• Values of the regression parameters, βx and βConst:

$$b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}$$

• Variance of the error term's probability distribution, Var[e]:

$$\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$$

• Variance of the coefficient estimate's probability distribution, Var[bx]:

$$\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure
does not systematically underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient
value is the best linear unbiased estimation procedure (BLUE).
Crucial Point: When the ordinary least squares (OLS) estimation procedure
performs its calculations, it implicitly assumes that the standard ordinary least
squares (OLS) regression premises are satisfied.
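
To make the three embedded procedures concrete, here is a minimal numerical sketch (ours, not the text's; the data and parameter values are illustrative assumptions):

```python
import numpy as np

# A minimal sketch of the three estimation procedures embedded within OLS,
# using simulated data (true beta_Const = 5, beta_x = 2 are our assumptions).
rng = np.random.default_rng(0)
T = 30
x = np.linspace(1, 10, T)
y = 5.0 + 2.0 * x + rng.normal(0, 2, T)

x_dev = x - x.mean()
b_x = np.sum((y - y.mean()) * x_dev) / np.sum(x_dev**2)   # coefficient estimate
b_const = y.mean() - b_x * x.mean()                        # intercept estimate

residuals = y - (b_const + b_x * x)
ssr = np.sum(residuals**2)
est_var_e = ssr / (T - 2)                  # EstVar[e] = SSR / Degrees of Freedom
est_var_bx = est_var_e / np.sum(x_dev**2)  # EstVar[b_x] = EstVar[e] / sum of squared x deviations

print(b_x, b_const, est_var_e, est_var_bx)
```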

In Chapter 16, we focused on the first standard ordinary least squares (OLS) premise. We shall now turn our attention to the second, the error term/error term independence premise. We begin by examining precisely what the premise means. Subsequently, we investigate what problems do and do not emerge when the premise is violated, and finally what can be done to address the problems that do arise.

Covariance and Independence

We introduced covariance to quantify the notions of correlation and independence. If two variables are correlated, their covariance is nonzero. On the other hand, if two variables are independent, their covariance is 0. A scatter diagram allows us to illustrate how covariance is related to independence and correlation. To appreciate why, consider the equation we use to calculate covariance:

$$\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_N - \bar{x})(y_N - \bar{y})}{N} = \frac{\sum_{t=1}^{N}(x_t - \bar{x})(y_t - \bar{y})}{N}$$

Focus on one term in the numerator, the covariance term $(x_t - \bar{x})(y_t - \bar{y})$; consider its sign in each of the four quadrants:


[Figure 17.1 is a scatter diagram whose axes are the deviations (xt − x̄) and (yt − ȳ); the product of the deviations is positive in Quadrants I and III and negative in Quadrants II and IV.]

Figure 17.1: Scatter Diagram and Covariance

• First quadrant. Dow growth rate is greater than its mean and Nasdaq growth is greater than its mean; the product of the deviations is positive in the first quadrant:
(xt − x̄) > 0 and (yt − ȳ) > 0 → (xt − x̄)(yt − ȳ) > 0
• Second quadrant. Dow growth rate is less than its mean and Nasdaq growth is greater than its mean; the product of the deviations is negative in the second quadrant:
(xt − x̄) < 0 and (yt − ȳ) > 0 → (xt − x̄)(yt − ȳ) < 0
• Third quadrant. Dow growth rate is less than its mean and Nasdaq growth is less than its mean; the product of the deviations is positive in the third quadrant:
(xt − x̄) < 0 and (yt − ȳ) < 0 → (xt − x̄)(yt − ȳ) > 0
• Fourth quadrant. Dow growth rate is greater than its mean and Nasdaq growth is less than its mean; the product of the deviations is negative in the fourth quadrant:
(xt − x̄) > 0 and (yt − ȳ) < 0 → (xt − x̄)(yt − ȳ) < 0
Recall that we used precipitation in Amherst, the Nasdaq growth rate, and the
Dow Jones growth rate to illustrate independent and correlated variables in
Chapter 1:

Figure 17.2: Precipitation versus Nasdaq Growth

Figure 17.3: Dow Jones Growth versus Nasdaq Growth

Precipitation in Amherst and the Nasdaq growth rate are independent; knowing one does not help us predict the other. Figure 17.2 shows that the scatter diagram points are distributed relatively evenly throughout the four quadrants, thereby suggesting that the covariance is approximately 0. On the other hand, the Dow Jones growth rate and the Nasdaq growth rate are not independent; they are correlated. Most points in Figure 17.3 are located in the first and third quadrants; consequently, most of the covariance terms are positive, resulting in a positive covariance.
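
To see these covariance calculations in action, here is a small Python sketch (ours, not the text's Econometrics Lab; the distributions standing in for the precipitation and growth-rate data are illustrative assumptions):

```python
import numpy as np

# Contrast independent and correlated variables through their sample covariance.
rng = np.random.default_rng(1)
N = 1000

precip = rng.normal(4, 1, N)              # stand-in for Amherst precipitation
nasdaq = rng.normal(1, 5, N)              # stand-in for Nasdaq growth
dow = 0.8 * nasdaq + rng.normal(0, 2, N)  # Dow growth built to track Nasdaq

# Cov[x, y] = sum of (x_t - xbar)(y_t - ybar) terms divided by N
def cov(x, y):
    return np.mean((x - x.mean()) * (y - y.mean()))

print(cov(precip, nasdaq))  # near 0: independent variables
print(cov(dow, nasdaq))     # clearly positive: correlated variables
```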

What Is Autocorrelation (Serial Correlation)?

Autocorrelation (serial correlation) is present whenever the value of one observation's error term allows us to predict the value of the next. When this occurs, one observation's error term is correlated with the next observation's; the error terms are correlated and the second premise, the error term/error term independence premise, is violated. The following equation models autocorrelation:
Autocorrelation Model: et = ρet−1 + vt        vt's are independent
The Greek letter rho (ρ) is the traditional symbol used to represent autocorrelation. When rho equals 0, the ρet−1 term disappears and the error terms, the et's, are independent because the vt's are independent; no autocorrelation is present. On the other hand, when rho does not equal 0, autocorrelation is present.
        ρ = 0                         ρ ≠ 0
          ↓                             ↓
        et = vt                et depends on et−1
          ↓                             ↓
  No autocorrelation       Autocorrelation present
We shall now turn to the Econometrics Lab to illustrate this.

Econometrics Lab 17.1: The Error Terms and Autocorrelation

[Link to MIT-Lab 17.1 goes here.]

Figure 17.4: Rho List

We can use a simulation to illustrate autocorrelation. We begin by selecting .0 in the "Rho" list. Focus on the et−1 versus et scatter diagram. You will observe that this scatter diagram looks very much like the Amherst Precipitation–Nasdaq scatter diagram (Figure 17.2), indicating that the two error terms are independent; that is, knowing et−1 does not help us predict et. Next, specify rho to equal .9. Now, the scatter diagram will look much more like the Dow Jones–Nasdaq scatter diagram (Figure 17.3), suggesting that, for the most part, when et−1 is positive, et will be positive also, or alternatively, when et−1 is negative, et will be negative also; this illustrates positive autocorrelation.

Figure 17.5: et−1 versus et scatter diagram, ρ = 0

Figure 17.6: et−1 versus et scatter diagram, ρ = .9
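
For readers working outside the Econometrics Lab, the following minimal Python sketch (ours; the sample size and error distribution are illustrative assumptions) generates the error terms behind such scatter diagrams:

```python
import numpy as np

# The autocorrelation model e_t = rho * e_{t-1} + v_t, with independent v_t's.
rng = np.random.default_rng(2)

def ar1_errors(rho, T=500):
    e = np.zeros(T)
    v = rng.normal(0, 1, T)      # independent random influences
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]
    return e

for rho in (0.0, 0.9):
    e = ar1_errors(rho)
    # correlation between e_t and e_{t-1}: near 0 when rho = 0,
    # strongly positive when rho = .9
    print(rho, np.corrcoef(e[1:], e[:-1])[0, 1])
```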

Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The Consequences

The Mathematics
Now, let us explore the consequences of autocorrelation. Just as with
heteroskedasticity, we shall focus on two of the three estimation procedures
embedded within the ordinary least squares (OLS) estimation procedure, the
procedures to estimate the:
• value of the coefficient.
• variance of the coefficient estimate’s probability distribution.
Question: Are these estimation procedures still unbiased when autocorrelation is
present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously, we showed that the
estimation procedure for the coefficient value was unbiased by
• applying the arithmetic of means;
and
• recognizing that the means of the error terms’ probability distributions
equal 0 (since the error terms represent random influences).
Let us quickly review. First, recall the arithmetic of means:
Mean of a constant plus a variable: Mean[c + x] = c + Mean[x]
Mean of a constant times a variable: Mean[cx] = c Mean[x]
Mean of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y]
To keep the algebra straightforward, we focused on a sample size of 3:
Equation for Coefficient Estimate:

$$b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t - \bar{x})e_t}{\sum_{t=1}^{T}(x_t - \bar{x})^2} = \beta_x + \frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}$$

Now, some algebra:¹

$$\text{Mean}[b_x] = \text{Mean}\left[\beta_x + \frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\right]$$

Applying Mean[c + x] = c + Mean[x]:

$$= \beta_x + \text{Mean}\left[\frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\right]$$

Rewriting the fraction as a product:

$$= \beta_x + \text{Mean}\left[\frac{1}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\bigl((x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\bigr)\right]$$

Applying Mean[cx] = c Mean[x]:

$$= \beta_x + \frac{1}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\,\text{Mean}\bigl[(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\bigr]$$

Applying Mean[x + y] = Mean[x] + Mean[y]:

$$= \beta_x + \frac{1}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\,\bigl(\text{Mean}[(x_1 - \bar{x})e_1] + \text{Mean}[(x_2 - \bar{x})e_2] + \text{Mean}[(x_3 - \bar{x})e_3]\bigr)$$

Applying Mean[cx] = c Mean[x]:

$$= \beta_x + \frac{1}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\,\bigl((x_1 - \bar{x})\text{Mean}[e_1] + (x_2 - \bar{x})\text{Mean}[e_2] + (x_3 - \bar{x})\text{Mean}[e_3]\bigr)$$

Since Mean[e1] = Mean[e2] = Mean[e3] = 0:

$$= \beta_x$$

What is the critical point here? We have not relied on the error term/error
term independence premise to show that the estimation procedure for the
coefficient value is unbiased. Consequently, we suspect that the estimation
procedure for the coefficient value will continue to be unbiased in the presence of
autocorrelation.

Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient Estimate's Probability Distribution
Next, consider the estimation procedure for the variance of the coefficient estimate's probability distribution used by the ordinary least squares (OLS) estimation procedure. The strategy involves two steps:
• First, we used the adjusted variance to estimate the variance of the error term's probability distribution: EstVar[e] = SSR/(Degrees of Freedom) estimates Var[e].
• Second, we applied the equation relating the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution:

$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

Step 1: Estimate the variance of the error term's probability distribution from the available information:

$$\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$$

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:

$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

Combining the two steps:

$$\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

Unfortunately, when autocorrelation is present, the second step is not justified. To understand why, recall the arithmetic of variances:
Variance of a constant times a variable: Var[cx] = c²Var[x]
Variance of the sum of a constant and a variable: Var[c + x] = Var[x]
Variance of the sum of two variables: Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]
Focus on the variance of the sum of two variables:
Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]
Since the covariance of independent variables equals 0, we can simply ignore the covariance term when calculating the variance of the sum of independent variables. On the other hand, if two variables are not independent, their covariance does not equal 0. Consequently, when calculating the variance of the sum of two variables that are not independent, we cannot ignore their covariance.

Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]
x and y independent                x and y not independent
        ↓                                   ↓
  Cov[x, y] = 0                       Cov[x, y] ≠ 0
        ↓                                   ↓
Can ignore covariance            Cannot ignore covariance
        ↓
Var[x + y] = Var[x] + Var[y]

Next, apply this to the error terms when autocorrelation is absent and when it is present:
When autocorrelation is absent        When autocorrelation is present
              ↓                                      ↓
The error terms are independent       The error terms are not independent
              ↓                                      ↓
We can ignore the error               We cannot ignore the error
term covariances                      term covariances
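
A quick numerical check (our sketch, with arbitrary illustrative distributions) of why the covariance term cannot be ignored:

```python
import numpy as np

# The covariance term matters when summing variables that are not independent.
rng = np.random.default_rng(3)
x = rng.normal(0, 1, 100_000)
y = 0.7 * x + rng.normal(0, 1, 100_000)   # y is correlated with x

lhs = np.var(x + y)
rhs_independent = np.var(x) + np.var(y)                # ignores Cov[x, y]
rhs_full = np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1]

print(lhs, rhs_independent, rhs_full)  # lhs matches rhs_full, not rhs_independent
```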

We shall now review our derivation of the relationship between the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution,

$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2},$$

to illustrate the critical role played by the error term/error term independence premise. We began with the equation for the coefficient estimate:
Equation for Coefficient Estimate:

$$b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t - \bar{x})e_t}{\sum_{t=1}^{T}(x_t - \bar{x})^2} = \beta_x + \frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}$$

Then, we applied a little algebra:²

$$\text{Var}[b_x] = \text{Var}\left[\beta_x + \frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\right]$$

Applying Var[c + x] = Var[x]:

$$= \text{Var}\left[\frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\right]$$

Rewriting the fraction as a product:

$$= \text{Var}\left[\frac{1}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}\bigl((x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\bigr)\right]$$

Applying Var[cx] = c²Var[x]:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\text{Var}\bigl[(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\bigr]$$

Error Term/Error Term Independence Premise — the error terms are independent, so Var[x + y] = Var[x] + Var[y]:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\bigl(\text{Var}[(x_1 - \bar{x})e_1] + \text{Var}[(x_2 - \bar{x})e_2] + \text{Var}[(x_3 - \bar{x})e_3]\bigr)$$

Applying Var[cx] = c²Var[x]:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\bigl((x_1 - \bar{x})^2\text{Var}[e_1] + (x_2 - \bar{x})^2\text{Var}[e_2] + (x_3 - \bar{x})^2\text{Var}[e_3]\bigr)$$

Error Term Equal Variance Premise — the error term variances are identical, Var[e1] = Var[e2] = Var[e3] = Var[e]:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\bigl((x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr)\text{Var}[e]$$

Simplifying:

$$= \frac{\text{Var}[e]}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}$$

Generalizing:

$$= \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$
Focus on the fourth step. When the error term/error term independence premise is satisfied, that is, when the error terms are independent, we can ignore the covariance terms when calculating the variance of a sum of variables:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\text{Var}\bigl[(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\bigr]$$

Error Term/Error Term Independence Premise — the error terms are independent, so Var[x + y] = Var[x] + Var[y]:

$$= \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\bigl(\text{Var}[(x_1 - \bar{x})e_1] + \text{Var}[(x_2 - \bar{x})e_2] + \text{Var}[(x_3 - \bar{x})e_3]\bigr)$$

When autocorrelation is present, however, the error terms are not independent and the covariance terms cannot be ignored. Therefore, when autocorrelation is present the fourth step is invalid:

$$\text{Var}[b_x] \neq \frac{1}{\bigl[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\bigr]^2}\,\bigl(\text{Var}[(x_1 - \bar{x})e_1] + \text{Var}[(x_2 - \bar{x})e_2] + \text{Var}[(x_3 - \bar{x})e_3]\bigr)$$

Consequently, in the presence of autocorrelation, the equation we used to describe the relationship between the variance of the error term's probability distribution and the variance of the coefficient estimate's probability distribution is no longer valid:

$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

The procedure used by ordinary least squares (OLS) to estimate the variance of the coefficient estimate's probability distribution is therefore flawed: Step 2, applying the relationship

$$\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2},$$

is no longer justified. The equation that the ordinary least squares (OLS) estimation procedure uses to estimate the variance of the coefficient estimate's probability distribution is flawed when autocorrelation is present. Consequently, how can we have faith in the variance estimate?

Our Suspicions
Let us summarize. After reviewing the algebra, we suspect that when autocorrelation is present, the ordinary least squares (OLS) estimation procedure for the
• coefficient value will still be unbiased.
• variance of the coefficient estimate's probability distribution may be biased.

Confirming Our Suspicions


We shall use a simulation to confirm our suspicions.

Econometrics Lab 17.2: The Ordinary Least Squares (OLS) Estimation Procedure and Autocorrelation

[Link to MIT-Lab 17.2 goes here.]

Is the OLS estimation procedure for the coefficient's value unbiased? Is the OLS estimation procedure for the variance of the coefficient estimate's probability distribution unbiased?

Sample Size: 30

                      Mean (Average) of    Variance of the         Average of the
           Actual     the Estimated        Estimated Coefficient   Estimated Variances,
     Estim Value      Values, bx, from     Values, bx, from        EstVar[bx], from
Rho  Proc  of βx      All Repetitions      All Repetitions         All Repetitions
0    OLS   2.0        ≈2.0                 ≈.22                    ≈.22
.6   OLS   2.0        ≈2.0                 ≈1.11                   ≈.28
Table 17.1: Autocorrelation Simulation Results

Autocorrelation Model: et = ρet−1 + vt        vt's are independent

Figure 17.7: Specifying Rho

As a benchmark, we begin by specifying rho to equal .0; consequently, no autocorrelation is present. Click Start and then, after many, many repetitions, click Stop. As we observed before, both the estimation procedure for the coefficient value and the estimation procedure for the variance of the coefficient estimate's probability distribution are unbiased. When the standard ordinary least squares (OLS) regression premises are met, all is well. But what happens when autocorrelation is present and the error term/error term independence premise is violated? To investigate this, we set rho to equal .6. Click Start and then, after many, many repetitions, click Stop. There is both good news and bad news:
• Good news: The ordinary least squares (OLS) estimation procedure for the coefficient value is still unbiased. The average of the estimated values equals the actual value, 2.
• Bad news: The ordinary least squares (OLS) estimation procedure for the variance of the coefficient estimate's probability distribution is biased. The actual variance of the estimated coefficient values equals 1.11 while the average of the estimated variances equals only .28.
Just as we feared, when autocorrelation is present, the ordinary least squares (OLS) calculations to estimate the variance of the coefficient estimates are flawed.
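
A minimal Monte Carlo sketch in Python (ours; the sample size, rho, and coefficient value mirror the lab, but the other settings are illustrative assumptions) makes the same point:

```python
import numpy as np

# With autocorrelated errors, the OLS coefficient estimate stays unbiased,
# but the usual OLS variance formula systematically understates the variance.
rng = np.random.default_rng(4)
T, reps, rho, beta_x = 30, 10_000, 0.6, 2.0
x = np.linspace(1, 10, T)
x_dev = x - x.mean()
ssx = np.sum(x_dev**2)

b_estimates, var_estimates = [], []
for _ in range(reps):
    e = np.zeros(T)
    v = rng.normal(0, 1, T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]           # autocorrelated error terms
    y = beta_x * x + e
    b_x = np.sum((y - y.mean()) * x_dev) / ssx
    resid = (y - y.mean()) - b_x * x_dev
    b_estimates.append(b_x)
    var_estimates.append(np.sum(resid**2) / (T - 2) / ssx)  # OLS EstVar[b_x]

print(np.mean(b_estimates))    # ≈ 2.0: coefficient estimate unbiased
print(np.var(b_estimates))     # actual variance of the estimates
print(np.mean(var_estimates))  # OLS's average estimate: too small
```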

When the estimation procedure for the variance of the coefficient


estimate’s probability distribution is biased, all calculations based on the estimate
of the variance will be flawed also; that is, the standard errors, t-statistics, and tail
probabilities appearing on the ordinary least squares (OLS) regression printout are
unreliable. Consequently, we shall use an example to explore how we account for
the presence of autocorrelation.

Accounting for Autocorrelation: An Example


We can account for autocorrelation by applying the following steps:
• Step 1: Apply the Ordinary Least Squares (OLS) Estimation Procedure
o Estimate the model’s parameters with the ordinary least squares
(OLS) estimation procedure.
• Step 2: Consider the Possibility of Autocorrelation
o Ask whether there is reason to suspect that autocorrelation may be
present.
o Use the ordinary least squares (OLS) regression results to “get a
sense” of whether autocorrelation is a problem by examining the
residuals.
o Use the Lagrange Multiplier approach by estimating an artificial
regression to test for the presence of autocorrelation.
o Estimate the value of the autocorrelation parameter, ρ.
• Step 3: Apply the Generalized Least Squares (GLS) Estimation Procedure
o Apply the model of autocorrelation and algebraically manipulate
the original model to derive a new, tweaked model in which the
error terms do not suffer from autocorrelation.
o Use the ordinary least squares (OLS) estimation procedure to
estimate the parameters of the tweaked model.

Time series data often exhibit autocorrelation. We shall consider monthly consumer durables data:
Consumer Durable Data: Monthly time series data of consumer durable consumption and income statistics from 2004 to 2009.
ConsDurt   Consumption of durables in month t (billions of 2005 chained dollars)
Const      Consumption in month t (billions of 2005 chained dollars)
Inct       Disposable income in month t (billions of 2005 chained dollars)
Project: Assess the effect of disposable income on the consumption of consumer durables.
These particular start and end dates were chosen to illustrate the autocorrelation phenomenon clearly.

[Link to MIT-ConsDurDisInc-2004-2009.wf1 goes here.]

We shall focus on a traditional Keynesian model to explain the consumption of consumer durables:
Model: ConsDurt = βConst + βIInct + et

Economic theory suggests that higher levels of disposable income increase the
consumption of consumer durables:
Theory: βI > 0. Higher disposable income increases the consumption of
durables.

Step 1: Apply the Ordinary Least Squares (OLS) Estimation Procedure

Ordinary Least Squares (OLS)

Dependent Variable: ConsDur
Explanatory Variable(s):    Estimate    SE          t-Statistic   Prob
Inc                         0.086525    0.016104    5.372763      0.0000
Const                       290.7887    155.4793    1.870273      0.0656
Number of Observations      72
Estimated Equation: EstConsDur = 290.8 + .087Inc
Interpretation of Estimates:
bInc = .087: A $1 increase in real disposable income increases the real consumption of durable goods by $.087.
Critical Result: The Inc coefficient estimate equals .087. This evidence, the positive sign of the coefficient estimate, suggests that higher disposable income increases the consumption of consumer durables, thereby supporting the theory.
Table 17.2: OLS Consumer Durable Regression Results

We now formulate the null and alternative hypotheses:


H0: βI = 0 Higher disposable income does not affect the consumption of durables
H1: βI > 0 Higher disposable income increases the consumption of durables
As always, the null hypothesis challenges the evidence; the alternative hypothesis
is consistent with the evidence. Next, we calculate Prob[Results IF H0 True].
Prob[Results IF H0 True]: What is the probability that the Inc estimate from one repetition of the experiment will be .087 or more, if H0 were true (that is, if disposable income has no effect on the consumption of durables, if βI actually equals 0)?
OLS estimation procedure unbiased and H0 true: Mean[bI] = βI = 0
SE[bI] = .0161        Number of observations = 72, number of parameters = 2: DF = 72 − 2 = 70

Econometrics Lab 17.3: Calculating Prob[Results IF H0 True]

[Link to MIT-Lab 17.3 goes here.]

To emphasize that the Prob[Results IF H0 True] depends on the standard error, we shall use the Econometrics Lab to calculate the probability. The following information has already been entered:
Mean = 0          Value = .087
Standard Error = .0161          Degrees of Freedom = 70
Click Calculate.
Prob[Results IF H0 True] < .0001
We use the standard error provided by the ordinary least squares (OLS) regression results to compute the Prob[Results IF H0 True].

We can also calculate the Prob[Results IF H0 True] by using the tails probability reported in the regression printout. Since this is a one-tailed test, we divide the tails probability by 2:
Prob[Results IF H0 True] = (<.0001)/2, which is still less than .0001
Based on the 1 percent significance level, we would reject the null hypothesis; that is, we would reject the hypothesis that disposable income has no effect on the consumption of consumer durables.
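
For readers who prefer to reproduce the lab's calculation directly, a quick sketch using scipy (our illustration, with the values reported above):

```python
from scipy.stats import t

# One-tailed Prob[Results IF H0 True] for the Inc coefficient,
# using Student's t distribution with 70 degrees of freedom.
estimate, se, df = 0.087, 0.0161, 70
t_stat = (estimate - 0) / se        # distance from the H0 value, in standard errors
p_one_tailed = t.sf(t_stat, df)     # upper-tail probability

print(t_stat, p_one_tailed)         # t ≈ 5.4, probability well below .0001
```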

There may be a problem with this, however. The equation used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution assumes that the error term/error term independence premise is satisfied. Our simulation revealed that when autocorrelation is present and the error term/error term independence premise is violated, the ordinary least squares (OLS) procedure for estimating the variance of the coefficient estimate's probability distribution can be flawed. Recall that the standard error equals the square root of the estimated variance. Consequently, if autocorrelation is present, we may have entered the wrong value for the standard error into the Econometrics Lab when we calculated Prob[Results IF H0 True]. When autocorrelation is present, the ordinary least squares (OLS) estimation procedure bases its computations on a faulty premise, resulting in flawed standard errors, t-statistics, and tails probabilities. Consequently, we should move on to the next step.

Step 2: Consider the Possibility of Autocorrelation

Unfortunately, there is reason to suspect that autocorrelation may be present. We would expect the consumption of durables to be influenced not only by disposable income, but also by the business cycle:
• When the economy is strong, consumer confidence tends to be high;
consumers spend more freely and purchase more than “usual.” When the
economy is strong the error term tends to be positive.
• When the economy is weak, consumer confidence tends to be low;
consumers spend less freely and purchase less than “usual.” When the
economy is weak the error term tends to be negative.
We know that business cycles tend to last for many months, if not years.
When the economy is strong, it remains strong for many consecutive months;
hence, when the economy is strong we would expect consumers to spend more
freely and for the error term to be positive for many consecutive months. On the
other hand, when the economy is weak, we would expect consumers to spend less
freely and the error term to be negative for many consecutive months.
Economy strong:                            Economy weak:
Consumer confidence was high last          Consumer confidence was low last
month; consumers spent more freely,        month; consumers spent less freely,
consumed more, last month → et−1 > 0       consumed less, last month → et−1 < 0
             ↓                                          ↓
Typically, consumer confidence will        Typically, consumer confidence will
continue to be high this month;            continue to be low this month;
consumers will spend more freely,          consumers will spend less freely,
consume more, this month → et > 0          consume less, this month → et < 0
As a consequence of the business cycle, we would expect the error term to exhibit some "inertia." Positive error terms tend to follow positive error terms; negative error terms tend to follow negative error terms. Consequently, we suspect that the error terms are not independent; instead, we suspect that the error terms will be positively correlated: positive autocorrelation. How can we "test" our suspicions?

Of course, we can never observe the error terms themselves. We can, however, use the residuals to estimate the error terms:

Error Term                            Residual
yt = βConst + βxxt + et               Rest = yt − Estyt
et = yt − (βConst + βxxt)             Rest = yt − (bConst + bxxt)

We can think of the residuals as the estimated errors. Since the residuals are observable, we use the residuals as proxies for the error terms. Figure 17.8 plots the residuals.

Figure 17.8: Plot of the Residuals

The residuals are plotted consecutively, one month after another. As we can easily see, a positive residual is typically followed by another positive residual; a negative residual is typically followed by a negative residual. "Switchovers" do occur, but they are not frequent. This suggests that positive autocorrelation is present. Most statistical software provides a very easy way to look at the residuals.

Getting Started in EViews___________________________________________


• First, run the regression.
• In the Equation window, click View
• Click Actual, Fitted, Residual
• Click Residual Graph
__________________________________________________________________

It is also instructive to construct a scatter diagram of the residuals versus the residuals lagged one month:

Figure 17.9: Scatter Diagram of the Residuals

Most of the scatter diagram points lie in the first and third quadrants. The residuals are positively correlated.
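
In Python, a comparable pair of diagnostic pictures can be drawn with a short sketch (ours; it assumes the residuals are available as a pandas Series named resid, a hypothetical name):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sketch: plot residuals over time and against their one-month lag
# (assumes `resid` is a pandas Series of regression residuals, indexed by month).
def residual_plots(resid: pd.Series):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(resid.values)                    # residuals month after month
    ax1.axhline(0, linewidth=0.5)
    ax1.set_title("Residuals over time")
    ax2.scatter(resid.shift(1), resid, s=10)  # residual vs. lagged residual
    ax2.set_xlabel("Res(t-1)")
    ax2.set_ylabel("Res(t)")
    ax2.set_title("Residual vs. lagged residual")
    plt.show()
```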

Since the residual plots suggest that our fears are warranted, we now test the autocorrelation model more formally. While there are many different approaches, we shall focus on the Lagrange Multiplier (LM) approach, which uses an artificial regression to test for autocorrelation.³ We shall proceed by reviewing a mathematical model of autocorrelation.
Autocorrelation Model: et = ρet−1 + vt        vt's are independent
        ρ = 0                         ρ ≠ 0
          ↓                             ↓
        et = vt                et depends on et−1
          ↓                             ↓
  No autocorrelation       Autocorrelation present
In this case, we believe that ρ is positive. A positive rho provides the error term with inertia: a positive error term tends to follow a positive error term, and a negative error term tends to follow a negative error term. But also note that there is a second term, vt. The vt's are independent; they represent random influences which affect the error term also. It is the vt's that "switch" the sign of the error term.
Now, we combine the original model with the autocorrelation model:
Original Model: yt = βConst + βxxt + et                    et's are unobservable
Autocorrelation Model: et = ρet−1 + vt                     vt's are independent
Ordinary Least Squares (OLS) Estimate: Estyt = bConst + bxxt
Residuals: Rest = yt − Estyt                               Rest's are observable

Rest = yt − Estyt
        Substituting for yt = βConst + βxxt + et:
     = βConst + βxxt + et − Estyt
        Substituting for et = ρet−1 + vt:
     = βConst + βxxt + ρet−1 + vt − Estyt
        Substituting for Estyt = bConst + bxxt:
     = βConst + βxxt + ρet−1 + vt − (bConst + bxxt)
        Rearranging terms:
     = (βConst − bConst) + (βx − bx)xt + ρet−1 + vt
        Since we cannot observe et−1, we use Rest−1 instead:
     = (βConst − bConst) + (βx − bx)xt + ρRest−1 + vt

NB: Since the vt's are independent, we need not worry about autocorrelation here.

Most statistical software allows us to assess this model easily.


Getting Started in EViews___________________________________________
• First, run the regression.
• In the Equation window, click View
• Click Residual Diagnostics
• Click Serial Correlation LM Test
• Change the number of Lags to include from 2 to 1.
__________________________________________________________________
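
In Python, one can run the same artificial regression directly. Here is a sketch (ours; it assumes y and x are numpy arrays of the dependent and explanatory variables), and statsmodels also packages this test as acorr_breusch_godfrey:

```python
import numpy as np
import statsmodels.api as sm

# Sketch of the LM artificial regression: regress the OLS residuals
# on the explanatory variable and the lagged residuals.
def lm_autocorrelation_test(y, x):
    X = sm.add_constant(x)
    resid = sm.OLS(y, X).fit().resid
    resid_lag = np.concatenate([[0.0], resid[:-1]])  # presample value set to zero
    artificial = sm.OLS(resid, np.column_stack([X, resid_lag])).fit()
    # estimate of rho and its (two-sided) tails probability
    return artificial.params[-1], artificial.pvalues[-1]
```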

Lagrange Multiplier (LM)

Dependent Variable: Resid
Explanatory Variable(s):    Estimate     SE          t-Statistic   Prob
Inc                         −0.002113    0.008915    −0.237055     0.8133
Const                       19.96027     86.07134    0.231904      0.8173
Resid(−1)                   0.839423     0.066468    12.62904      0.0000
Number of Observations      72
Presample missing value lagged residuals set to zero.
Table 17.3: Lagrange Multiplier Test Results

Critical Result: The Resid(−1) coefficient estimate equals .8394. The positive
sign of the coefficient estimate suggests that an increase in last period’s
residual increases this period’s residual. This evidence suggests that
autocorrelation is present.
Now, we formulate the null and alternative hypotheses:
H0: ρ = 0 No autocorrelation present
H1: ρ > 0 Positive autocorrelation present
The null hypothesis challenges the evidence by asserting that no autocorrelation is
present. The alternative hypothesis is consistent with the evidence.

Next, we calculate Prob[Results IF H0 True]:
Prob[Results IF H0 True]: What is the probability that the coefficient estimate from one regression would be .8394 or more, if H0 were true (that is, if no autocorrelation were actually present, if ρ actually equals 0)?
Using the tails probability reported in the regression printout:
Prob[Results IF H0 True] < .0001

Autocorrelation appears to be present; accordingly, we shall now return to the autocorrelation model to estimate the parameter ρ.
Autocorrelation Model: et = ρet−1 + vt        vt's are independent
        ρ = 0                         ρ ≠ 0
          ↓                             ↓
        et = vt                et depends on et−1
          ↓                             ↓
  No autocorrelation       Autocorrelation present

In practice, there are a variety of ways to estimate ρ. We shall discuss what is perhaps the most straightforward. Since the error terms are unobservable, we "replace" the error terms with the residuals:
Model: et = ρet−1 + vt        vt's are independent
            ↓
Rest = ρRest−1 + vt
NB: Note that there is no constant in this model.

Ordinary Least Squares (OLS)

Dependent Variable: Residual
Explanatory Variable(s):    Estimate    SE          t-Statistic   Prob
ResidualLag                 0.839023    0.064239    13.06089      0.0000
Number of Observations      71
Estimated Equation: Residual = .8390 ResidualLag
Critical Result: The ResidualLag coefficient estimate equals .8390; that is, the estimated value of ρ equals .8390.
Table 17.4: Regression Results – Estimating ρ

Estimate of ρ = Estρ = .8390


Getting Started in EViews___________________________________________
• Run the original regression; EViews automatically calculates the residuals
and places them in the variable resid.
• EViews automatically modifies Resid every time a regression is run.
Consequently, we shall now generate two new variables before running
the next regression to prevent a “clash:”
o residual = resid
o residuallag = residual(−1)
• Now, specify residual as the dependent variable and residuallag as the
explanatory variable; do not forget to “delete” the constant.
__________________________________________________________________
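
A sketch of the same no-constant regression in Python (ours; it assumes resid is a numpy array holding the original regression's residuals):

```python
import numpy as np

# Estimate rho by regressing the residual on its lag, with no constant term.
def estimate_rho(resid):
    r, r_lag = resid[1:], resid[:-1]
    # no-constant OLS slope: sum(r_lag * r) / sum(r_lag ** 2)
    return np.dot(r_lag, r) / np.dot(r_lag, r_lag)
```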

Step 3: Apply the Generalized Least Squares (GLS) Estimation Procedure

Strategy: Our strategy for dealing with autocorrelation will be similar to our strategy for dealing with heteroskedasticity: algebraically manipulate the original model so that the problem of autocorrelation is eliminated in the new model. That is, tweak the original model so that the error terms in the tweaked model are independent. We can accomplish this with a little algebra. We begin with the original model and then apply the autocorrelation model:
Original model: yt = βConst + βxxt + et
Autocorrelation model: et = ρet−1 + vt        vt's are independent

Original model for period t:
yt = βConst + βxxt + et
Original model for period t−1:
yt−1 = βConst + βxxt−1 + et−1
Multiplying by ρ:
ρyt−1 = ρβConst + ρβxxt−1 + ρet−1
Rewrite the equations for yt and ρyt−1:
yt = βConst + βxxt + et
ρyt−1 = ρβConst + ρβxxt−1 + ρet−1
Subtracting:
yt − ρyt−1 = βConst − ρβConst + βxxt − ρβxxt−1 + et − ρet−1
Factoring out βx:
yt − ρyt−1 = βConst − ρβConst + βx(xt − ρxt−1) + et − ρet−1
Substituting for et = ρet−1 + vt:
yt − ρyt−1 = βConst − ρβConst + βx(xt − ρxt−1) + ρet−1 + vt − ρet−1
Simplifying:
(yt − ρyt−1) = (βConst − ρβConst) + βx(xt − ρxt−1) + vt

In the tweaked model:
New dependent variable: yt − ρyt−1
New explanatory variable: xt − ρxt−1
Critical Point: In the tweaked model, the vt's are independent; hence, we need not be concerned about autocorrelation in the tweaked model.
Now, let us run the tweaked regression for our example; using the estimate of ρ, we generate two new variables:
New dependent variable: AdjConsDurt = ConsDurt − Estρ·ConsDurt−1 = ConsDurt − .8390ConsDurt−1
New explanatory variable: AdjInct = Inct − Estρ·Inct−1 = Inct − .8390Inct−1
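
Before turning to the regression results, note that in Python the quasi-differencing step might look like this (a sketch, assuming the data live in a pandas DataFrame named df with columns ConsDur and Inc, and that rho_hat holds the .8390 estimate; both names are ours):

```python
import pandas as pd
import statsmodels.api as sm

# Quasi-difference the data with the estimated rho, then estimate the
# tweaked model by OLS (the GLS procedure described above).
def gls_quasi_difference(df: pd.DataFrame, rho_hat: float):
    adj_consdur = df["ConsDur"] - rho_hat * df["ConsDur"].shift(1)
    adj_inc = df["Inc"] - rho_hat * df["Inc"].shift(1)
    data = pd.DataFrame({"AdjConsDur": adj_consdur, "AdjInc": adj_inc}).dropna()
    X = sm.add_constant(data["AdjInc"])
    return sm.OLS(data["AdjConsDur"], X).fit()   # tweaked-model estimates
```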

Ordinary Least Squares (OLS)

Dependent Variable: AdjConsDur
Explanatory Variable(s):    Estimate    SE          t-Statistic   Prob
AdjInc                      0.040713    0.028279    1.439692      0.1545
Const                       118.9134    44.43928    2.675861      0.0093
Number of Observations      71
Estimated Equation: EstAdjConsDur = 118.9 + .041AdjInc
Interpretation of Estimates:
bAdjInc = .041: A $1 increase in real disposable income increases the real consumption of durable goods by $.041.
Critical Result: The AdjInc coefficient estimate equals .041. This evidence, the positive sign of the coefficient estimate, suggests that higher disposable income increases the consumption of consumer durables, thereby supporting the theory.
Table 17.5: GLS Regression Results – Accounting for Autocorrelation

We now review our null and alternative hypotheses:
H0: βI = 0   Higher disposable income does not affect the consumption of durables
H1: βI > 0   Higher disposable income increases the consumption of durables
Then, using the tails probability, we calculate Prob[Results IF H0 True]:
Prob[Results IF H0 True] = .1545/2 ≈ .0772
After accounting for autocorrelation, we cannot reject the null hypothesis at the 1 or 5 percent significance levels.

Let us now compare the disposable income coefficient estimate in the last regression, the generalized least squares (GLS) regression that accounts for autocorrelation, with the disposable income coefficient estimate in the ordinary least squares (OLS) regression that does not account for autocorrelation:

                                  βI Coefficient   Standard                 Tails
                                  Estimate         Error      t-Statistic   Probability
Ordinary Least Squares (OLS)      .087             .016       5.37          <.0001
Generalized Least Squares (GLS)   .041             .028       1.44          .1545
Table 17.6: Coefficient Estimate Comparison

The most striking difference lies in the calculations based on the estimated variance of the coefficient estimate's probability distribution: the coefficient's standard error, t-statistic, and tails probability. The standard error nearly doubles when we account for autocorrelation. This is hardly surprising. The ordinary least squares (OLS) regression calculations are based on the premise that the error terms are independent; our analysis suggests that this is not true. The generalized least squares (GLS) regression accounts for the error term correlation. Consequently, the standard error, t-statistic, and tails probability in the generalized least squares (GLS) regression differ substantially.

Justifying the Generalized Least Squares (GLS) Estimation Procedure

We shall now use a simulation to illustrate that the generalized least squares (GLS) estimation procedure indeed provides "better" estimates than the ordinary least squares (OLS) estimation procedure. While both procedures provide unbiased estimates of the coefficient's value, only the generalized least squares (GLS) estimation procedure provides an unbiased estimate of the variance.
Econometrics Lab 17.4: Generalized Least Squares (GLS) Estimation Procedure
As before, choose a rho of .6; by default, the ordinary least squares (OLS) estimation procedure is chosen. Click Start and then, after many, many repetitions, click Stop.

[Link to MIT-Lab 17.4 goes here.]

When the ordinary least squares (OLS) estimation procedure is used, the variance
of the estimated coefficient values equals about 1.11. Now, specify the
generalized least squares (GLS) estimation procedure by clicking GLS. Click
Start and then after many, many repetitions click Stop. When the generalized least
squares (GLS) estimation procedure is used, the variance of the estimated
coefficient values is less, 1.01. Consequently, the generalized least squares (GLS)
estimation procedure provides more reliable estimates.

Sample Size: 30
                      Mean (Average) of    Variance of the
           Actual     the Estimated        Estimated Coefficient
     Estim Value      Values, bx, from     Values, bx, from
Rho  Proc  of βx      All Repetitions      All Repetitions
.6   OLS   2.0        ≈2.0                 ≈1.11
.6   GLS   2.0        ≈2.0                 ≈1.01
Table 17.7: Autocorrelation Simulation Results

Robust Standard Errors

As with heteroskedasticity, two issues emerge when autocorrelation is present:
• The standard error calculations made by the ordinary least squares (OLS) estimation procedure are flawed.
• While the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, it is not the best linear unbiased estimation procedure (BLUE).
As before, robust standard errors address the first issue arising when autocorrelation is present. Newey-West standard errors provide one such approach that is suitable for both autocorrelation and heteroskedasticity. This approach applies the same type of logic that we used to motivate the White approach for heteroskedasticity, but it is more complicated. Consequently, we shall not attempt to motivate the approach here. Statistical software makes it easy to compute Newey-West robust standard errors:⁴

[Link to MIT-ConsDurDisInc-2004-2009.wf1 goes here.]

Getting Started in EViews___________________________________________
• Run the ordinary least squares (OLS) regression.
• In the Equation window, click Estimate and then Options.
• In the Coefficient covariance matrix box, select HAC (Newey-West) from the drop-down list.
• Click OK.
__________________________________________________________________
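
In Python, statsmodels exposes the same estimator through its HAC covariance option. A sketch (ours; it assumes the data sit in a pandas DataFrame named df with columns ConsDur and Inc):

```python
import pandas as pd
import statsmodels.api as sm

# OLS coefficient estimates with Newey-West (HAC) robust standard errors.
def newey_west_ols(df: pd.DataFrame, maxlags: int = 4):
    X = sm.add_constant(df["Inc"])
    # coefficient estimates are unchanged; only the covariance estimator differs
    return sm.OLS(df["ConsDur"], X).fit(cov_type="HAC",
                                        cov_kwds={"maxlags": maxlags})
```

The lag length maxlags is a modeling choice; 4 here is purely illustrative, not the value EViews uses by default.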

Ordinary Least Squares (OLS)

Dependent Variable: ConsDur
Explanatory Variable(s):    Estimate    SE          t-Statistic   Prob
Inc                         0.086525    0.028371    3.049804      0.0032
Const                       290.7887    268.3294    1.083701      0.2822
Number of Observations      72
Estimated Equation: EstConsDur = 290.8 + .087Inc
Interpretation of Estimates:
bInc = .087: A $1 increase in real disposable income increases the real consumption of durable goods by $.087.
Table 17.8: OLS Regression Results – Robust Standard Errors

1. Recall that to keep the algebra straightforward we assume that the explanatory variables are constants. By doing so, we can apply the arithmetic of means easily. Our results are unaffected by this assumption.
2. Recall that to keep the algebra straightforward we assume that the explanatory variables are constants. By doing so, we can apply the arithmetic of variances easily. Our results are unaffected by this assumption.
3. The Durbin-Watson statistic is the traditional method of testing for autocorrelation. Unfortunately, the distribution of the Durbin-Watson statistic depends on the distribution of the explanatory variable. This makes hypothesis testing with the Durbin-Watson statistic more complicated than with the Lagrange multiplier test. Consequently, we shall focus on the Lagrange multiplier test.
4. While it is beyond the scope of this textbook, it can be shown that while this estimation procedure is biased, the magnitude of the bias diminishes and approaches zero as the sample size approaches infinity.
