
Econometrics [EM2008/EM2Q05]

Lecture 6
Autocorrelation

Irene Mammi

[email protected]

Academic Year 2018/2019

outline

I autocorrelation
I autocorrelated disturbances
I testing for autocorrelated disturbances
I estimation with autocorrelated disturbances
I autoregressive conditional heteroskedasticity

I References:
  I Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th Edition, McGraw-Hill, New York, Chapter 6.

autocorrelated disturbances
I heteroskedasticity affects the elements on the principal diagonal of var(u), but it is still assumed that E(u_t u_{t+s}) = 0 for all t and s ≠ 0
I when the errors are autocorrelated, this assumption no longer holds
I the pairwise autocovariances are

      γ_s = E(u_t u_{t+s})      s = 0, ±1, ±2, . . .

I when s = 0 we have

      γ_0 = E(u_t²) = σ_u²

I the autocorrelation coefficient at lag s is

      ρ_s = cov(u_t, u_{t+s}) / [var(u_t) var(u_{t+s})]^{1/2}

I under homoskedasticity, this reduces to

      ρ_s = γ_s / γ_0      s = 0, ±1, ±2, . . .
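
A minimal numerical sketch of these definitions, assuming a placeholder disturbance series (here just simulated white noise): compute the sample autocovariances and the autocorrelations ρ_s = γ_s/γ_0.

# a minimal sketch, assuming a simulated placeholder series u
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(200)                 # illustrative white-noise series

def sample_autocorr(u, max_lag):
    u = u - u.mean()
    n = len(u)
    gamma0 = np.sum(u * u) / n               # sample gamma_0 (variance)
    rho = []
    for s in range(1, max_lag + 1):
        gamma_s = np.sum(u[:-s] * u[s:]) / n # sample autocovariance at lag s
        rho.append(gamma_s / gamma0)
    return np.array(rho)

print(sample_autocorr(u, max_lag=5))         # close to zero for white noise
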
autocorrelated disturbances (cont.)

I the error variance-covariance matrix is

      var(u) = \begin{pmatrix} γ_0 & γ_1 & \cdots & γ_{n-1} \\ γ_1 & γ_0 & \cdots & γ_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ γ_{n-1} & γ_{n-2} & \cdots & γ_0 \end{pmatrix}
             = σ_u² \begin{pmatrix} 1 & ρ_1 & \cdots & ρ_{n-1} \\ ρ_1 & 1 & \cdots & ρ_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ ρ_{n-1} & ρ_{n-2} & \cdots & 1 \end{pmatrix}

I there are n + k unknown parameters and only n observations: need to assume some structure for the autocorrelation
autoregressive and moving average schemes

I the most common specification is the autoregressive AR(1) process

      u_t = ϕ u_{t-1} + e_t

  where {e_t} is a white noise process
I the necessary and sufficient condition for a stationary error process is

      |ϕ| < 1

I the constant expectation for {u_t} is

      E(u_t) = 0 for all t

I given stationarity, the constant variance is

      var(u_t) = σ_u² = σ_e² / (1 − ϕ²)
autoregressive and moving average schemes (cont.)
I the autocorrelation coefficients are

      ρ_s = ϕ^s      s = 0, 1, 2, . . .

I the autocorrelation coefficients start at ρ_0 = 1 and decline exponentially, but never reach zero
I the current disturbance u_t can be expressed as a weighted sum of the current and all previous shocks:

      u_t = e_t + ϕ e_{t-1} + ϕ² e_{t-2} + · · ·

I the variance-covariance matrix of the disturbances is

      var(u) = σ_u² \begin{pmatrix} 1 & ϕ & \cdots & ϕ^{n-1} \\ ϕ & 1 & \cdots & ϕ^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ ϕ^{n-1} & ϕ^{n-2} & \cdots & 1 \end{pmatrix}
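
As a quick check on these formulas, a short simulation sketch (the parameter values ϕ = 0.7 and σ_e = 1 are illustrative, not from the lecture):

# a minimal sketch: simulate u_t = phi*u_{t-1} + e_t and check
# var(u) ≈ sigma_e^2/(1 - phi^2) and rho_1 ≈ phi (illustrative values)
import numpy as np

rng = np.random.default_rng(1)
n, phi, sigma_e = 100_000, 0.7, 1.0
e = rng.normal(0.0, sigma_e, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + e[t]

print(u.var())                              # approx 1/(1 - 0.49) ≈ 1.96
print(np.corrcoef(u[:-1], u[1:])[0, 1])     # approx phi = 0.7
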
autoregressive and moving average schemes (cont.)
I now need to estimate only k + 2 parameters: a feasible GLS procedure exists
I a first-order moving average MA(1) process is defined as

      u_t = e_t + θ e_{t-1}

  where {e_t} is white noise
I the most relevant parameters of this process are

      σ_u² = σ_e² (1 + θ²)
      ρ_1 = θ / (1 + θ²)
      ρ_i = 0      i = 2, 3, . . .

I the MA process has a short finite memory, being affected only by current and prior values of e
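
A simulation sketch of these MA(1) moments (θ = 0.5 is an illustrative value):

# a minimal sketch: u_t = e_t + theta*e_{t-1} has rho_1 = theta/(1+theta^2)
# and rho_s = 0 for s >= 2 (short memory); theta below is illustrative
import numpy as np

rng = np.random.default_rng(2)
n, theta = 100_000, 0.5
e = rng.standard_normal(n + 1)
u = e[1:] + theta * e[:-1]

print(u.var())                              # approx 1 + theta^2 = 1.25
print(np.corrcoef(u[:-1], u[1:])[0, 1])     # approx 0.5/1.25 = 0.4
print(np.corrcoef(u[:-2], u[2:])[0, 1])     # approx 0
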
OLS and autocorrelated errors
I OLS estimates with nonstochastic X and autocorrelated errors are unbiased and consistent but inefficient; the usual inference procedures are invalid
I if lags of y appear in X, we have more serious problems
I assume, for example,

      y_t = β y_{t-1} + u_t      |β| < 1
      u_t = ϕ u_{t-1} + e_t      |ϕ| < 1

  where

      E(e) = 0 and E(ee′) = σ_e² I

I estimating β by OLS gives

      b = ∑ y_t y_{t-1} / ∑ y_{t-1}² = β + ∑ y_{t-1} u_t / ∑ y_{t-1}²

  thus,

      plim b = β + plim(n^{-1} ∑ y_{t-1} u_t) / plim(n^{-1} ∑ y_{t-1}²)
OLS and autocorrelated errors (cont.)

I the consistency of b depends on plim(∑ y_{t-1} u_t / n)
I since

      y_{t-1} = u_{t-1} + β u_{t-2} + β² u_{t-3} + · · ·

  we can show that

      plim n^{-1} ∑ y_{t-1} u_t = ϕσ_u² + βϕ²σ_u² + β²ϕ³σ_u² + · · · = ϕσ_u² / (1 − βϕ)

  so that the combination of a lagged dependent variable and autocorrelated errors renders OLS inconsistent
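
A small Monte Carlo sketch of this inconsistency (β = 0.5 and ϕ = 0.6 are assumed illustrative values):

# illustrative simulation: with y_{t-1} as regressor and AR(1) errors,
# the OLS slope b does not converge to beta (parameter values assumed)
import numpy as np

rng = np.random.default_rng(3)
n, beta, phi = 200_000, 0.5, 0.6
e = rng.standard_normal(n)
u = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + e[t]
    y[t] = beta * y[t - 1] + u[t]

b = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
print(b)    # noticeably above beta = 0.5, as the plim above implies
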
testing for autocorrelation
I suppose that in the model y = Xβ + u one suspects that the error follows an AR(1) process:

      u_t = ϕ u_{t-1} + e_t

I the null hypothesis of zero autocorrelation is then

      H0 : ϕ = 0

  and the alternative hypothesis is

      H1 : ϕ ≠ 0

I we look for a test using the OLS residuals, e = y − Xb: we have that e = Mu, thus the variance-covariance matrix of the e's is

      var(e) = E(ee′) = σ_u² M

  which implies that, even under the null, the OLS residuals will display some autocorrelation.
testing for autocorrelation (cont.)
Durbin-Watson test

I the Durbin-Watson (DW) test is computed from OLS residuals and is defined as

      d = ∑_{t=2}^{n} (e_t − e_{t-1})² / ∑_{t=1}^{n} e_t²

I the DW statistic will tend to be “small” for positively autocorrelated e's and “large” for negative autocorrelation
I the DW is closely related to the sample first-order correlation coefficient of the e's:

      d = [∑_{t=2}^{n} e_t² + ∑_{t=2}^{n} e_{t-1}² − 2 ∑_{t=2}^{n} e_t e_{t-1}] / ∑_{t=1}^{n} e_t²

I for large n, we have

      d ≃ 2(1 − ϕ̂)

  where ϕ̂ = ∑ e_t e_{t-1} / ∑ e_t² is the coefficient in the OLS regression of e_t on e_{t-1}
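
A minimal sketch computing d both directly and with statsmodels (the regression data are simulated placeholders):

# a minimal sketch: Durbin-Watson d from OLS residuals (placeholder data)
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
x = rng.standard_normal(100)
y = 1.0 + 2.0 * x + rng.standard_normal(100)
res = sm.OLS(y, sm.add_constant(x)).fit()

e = res.resid
d_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d_manual, durbin_watson(e))   # identical; near 2 when there is no autocorrelation
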
testing for autocorrelation (cont.)
I the range of d is from 0 to 4 and

      d < 2 for positive autocorrelation of the e's
      d > 2 for negative autocorrelation of the e's
      d ≃ 2 for zero autocorrelation of the e's

I for a random u series, we have

      E(d) = 2 + 2(k − 1) / (n − k)
I any computed d depends on X so there are no exact critical values:
there are upper (dU ) and lower (dL ) bounds for them
I the testing procedure is as follows
1. if d < dL , reject the hypothesis of nonautocorrelated u in favor of the
hypothesis of positive first-order autocorrelation
2. if d > dU , do not reject the null hypothesis
3. if dL < d < dU , the test is inconclusive
I the test requires a constant term in the regression and is strictly valid
only for nonstochastic X
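
A sketch of this decision rule; the bounds dL and dU must be taken from DW tables for the given n and k (the figures in the example call are purely illustrative):

# sketch of the DW decision rule; dL/dU come from tables and the
# values used in the example call are illustrative, not tabulated ones
def dw_decision(d, dL, dU):
    if d < dL:
        return "reject H0: evidence of positive first-order autocorrelation"
    if d > dU:
        return "do not reject H0"
    return "inconclusive"

print(dw_decision(1.2, dL=1.65, dU=1.69))
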
testing for autocorrelation (cont.)

Durbin-Watson for a regression containing lagged values of the dependent variable

I large-sample (asymptotic) test for the more general case of stochastic regressors
I consider the relation

      y_t = β_1 y_{t-1} + · · · + β_r y_{t-r} + β_{r+1} x_{1t} + · · · + β_{r+s} x_{st} + u_t

  with

      u_t = ϕ u_{t-1} + e_t      |ϕ| < 1 and e ∼ N(0, σ_e² I)
testing for autocorrelation (cont.)
I Durbin’s basic result is that under the null, H0 : ϕ = 0, the statistic is
r
n a
h = ϕ̂ ∼ N (0, 1)
1 − n · var(b1 )

where

n =sample size
var(b1 ) =estimated sampling variance of the coefficient
of yt −1 in the OLS fit of the complete model
n n
ϕ̂ = ∑ et et −1 / ∑ et2−1 , the estimate of ϕ from the regression
t =2 t =2
of et on et −1 , the e’s in turn being the residuals from the
OLS regression of the complete model

14 / 30
testing for autocorrelation (cont.)

I the test procedure is as follows:


1. fit the model by OLS and note var(b_1)
2. from the residuals compute ϕ̂ or, if DW has been computed, use the approximation ϕ̂ = 1 − d/2
3. compute h, and if h > 1.645, reject H0 at 5% in favor of the hypothesis of positive first-order autocorrelation
4. for negative h, a similar one-sided test for negative autocorrelation can be performed
I this test breaks down if n · var(b_1) ≥ 1

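
A minimal sketch of the h computation on simulated data (the model and parameter values below are assumptions made only for illustration):

# a minimal sketch of Durbin's h on simulated data (illustrative model)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.standard_normal()

X = sm.add_constant(np.column_stack([y[:-1], x[1:]]))   # column 1 is y_{t-1}
res = sm.OLS(y[1:], X).fit()
e = res.resid
phi_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
var_b1 = res.bse[1] ** 2                   # sampling variance of coeff on y_{t-1}
m = len(e)
if m * var_b1 < 1:
    h = phi_hat * np.sqrt(m / (1 - m * var_b1))
    print(h)                               # one-sided 5% test: reject H0 if h > 1.645
else:
    print("h undefined: n*var(b1) >= 1")
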
testing for autocorrelation (cont.)
I an asymptotically equivalent procedure is the following
1. estimate the model by OLS and obtain the residual e's
2. estimate the OLS regression of

      e_t on e_{t-1}, y_{t-1}, . . . , y_{t-r}, x_{1t}, . . . , x_{st}

3. if the coefficient of e_{t-1} in this regression is significantly different from zero by the usual t test, reject H0 : ϕ = 0
I Durbin indicates that this procedure can be extended to an AR(p) scheme by simply adding additional lagged e's to the second regression and testing the joint significance of the coefficients of the lagged residuals
I the AR(p) scheme is

      u_t = ϕ_1 u_{t-1} + ϕ_2 u_{t-2} + · · · + ϕ_p u_{t-p} + e_t

I the null hypothesis would be

      H0 : ϕ_1 = ϕ_2 = · · · = ϕ_p = 0
testing for autocorrelation (cont.)
Breusch-Godfrey test

I LM test against general AR or MA error processes
I consider a simple example

      y_t = β_1 + β_2 x_t + u_t

  with

      u_t = β_3 u_{t-1} + e_t

  where it is assumed that |β_3| < 1 and that the e's are i.i.d. N(0, σ_e²).
I substituting the second equation in the first one we have

      y_t = β_1 (1 − β_3) + β_2 x_t + β_3 y_{t-1} − β_2 β_3 x_{t-1} + e_t

I we want to test H0 : β_3 = 0
I the test is obtained in two steps
1. apply OLS to the model y_t = β_1 + β_2 x_t + u_t to obtain the residuals e_t
testing for autocorrelation (cont.)

2. regress e_t on (1, x_t, e_{t-1}) and compute R² from this regression
I under H0, nR² is asymptotically χ²(1)
I the second (auxiliary) regression is exactly the regression of the
Durbin procedure
I the procedure extends to testing for higher order autocorrelation:
simply add further-lagged OLS residuals to the second regression
I the Breusch-Godfrey test also tests against the alternative hypothesis
of an MA(p) process for the error
I the Durbin and Breusch-Godfrey procedures are asymptotically
equivalent

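
A minimal sketch using the statsmodels implementation on simulated placeholder data; the LM statistic it reports is the nR² value described above:

# a minimal sketch: Breusch-Godfrey LM test on simulated placeholder data
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(6)
n = 300
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
res = sm.OLS(y, sm.add_constant(x)).fit()

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=1)
print(lm_stat, lm_pval)     # nR^2 statistic, asymptotically chi^2(1) under H0
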
testing for autocorrelation (cont.)
Box-Pierce-Ljung statistic

I the Box-Pierce Q statistic is based on the squares of the first p autocorrelation coefficients of the OLS residuals. The statistic is defined as

      Q = n ∑_{j=1}^{p} r_j²

  where

      r_j = ∑_{t=j+1}^{n} e_t e_{t-j} / ∑_{t=1}^{n} e_t²
I the limiting distribution of Q was derived under the assumption that
the residuals come from an AR scheme (or ARMA) fitted to some
variable y
I under the hypothesis of zero autocorrelations for the residuals, Q will
have an asymptotic χ2 distribution, with df equal to p minus the
number of parameters estimated in fitting the ARMA model

testing for autocorrelation (cont.)

I an improved small-sample performance is expected from the revised Ljung-Box statistic

      Q′ = n(n + 2) ∑_{j=1}^{p} r_j² / (n − j)

I not appropriate if there are exogenous x variables in the model

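
A minimal sketch via statsmodels on a placeholder residual series (any adjustment of the χ² degrees of freedom for fitted ARMA parameters is left to the user):

# a minimal sketch: Box-Pierce Q and Ljung-Box Q' on a placeholder residual series
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
e = rng.standard_normal(200)                          # residuals; here just white noise
print(acorr_ljungbox(e, lags=[5], boxpierce=True))    # lb_* columns give Q', bp_* give Q
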
estimation of models with autocorrelated disturbances
I one possibility is to consider a joint specification of the relationship, y = Xβ + u, and an associated autocorrelation structure,

      Pu = e with E(ee′) = σ_e² I

  where P is some nonsingular n × n matrix that hopefully depends on a small number p of unknown parameters; then estimate jointly all k + p + 1 parameters
I alternatively, and a better way to proceed, check whether the autocorrelation is a sign of misspecification of the original model: if the errors are autocorrelated, there is some systematic behavior that is not being modeled
I suppose that the “correct” model is

      y_t = γ_1 + γ_2 x_t + γ_3 x_{t-1} + γ_4 y_{t-1} + e_t

  where the errors are white noise and that the estimated model only includes x_t: not surprisingly, significant autocorrelation is detected
estimation of models with autocorrelated disturbances
(cont.)
I suppose now that, to account for this autocorrelation, we specify

      y_t = β_1 + β_2 x_t + u_t and u_t = ϕ u_{t-1} + e_t

  and estimate it by GLS
I the correct model has 5 parameters; the new specification only 4: we are thus imposing a possibly invalid restriction on the parameters of the true model
I to see this, rewrite the estimated model as

      y_t = β_1 (1 − ϕ) + β_2 x_t − ϕβ_2 x_{t-1} + ϕ y_{t-1} + e_t

  so that it is clear that we are imposing the restriction

      γ_3 + γ_2 γ_4 = 0

  which is known as the common factor restriction
estimation of models with autocorrelated disturbances
(cont.)

GLS estimation

I assume that y = Xβ + u is a good model but we must allow for an autocorrelation structure as Pu = e with E(ee′) = σ_e² I. Some specific form of autocorrelation must be assumed: the most common is an AR(1) process. In that case, the variance-covariance matrix for u is as follows
estimation of models with autocorrelated disturbances
(cont.)
      var(u) = σ_u² \begin{pmatrix} 1 & ϕ & \cdots & ϕ^{n-1} \\ ϕ & 1 & \cdots & ϕ^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ ϕ^{n-1} & ϕ^{n-2} & \cdots & 1 \end{pmatrix}
             = \frac{σ_e²}{1 − ϕ²} \begin{pmatrix} 1 & ϕ & \cdots & ϕ^{n-1} \\ ϕ & 1 & \cdots & ϕ^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ ϕ^{n-1} & ϕ^{n-2} & \cdots & 1 \end{pmatrix}
             = σ_e² Ω

  where

      Ω = \frac{1}{1 − ϕ²} \begin{pmatrix} 1 & ϕ & \cdots & ϕ^{n-1} \\ ϕ & 1 & \cdots & ϕ^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ ϕ^{n-1} & ϕ^{n-2} & \cdots & 1 \end{pmatrix}
estimation of models with autocorrelated disturbances
(cont.)
I the inverse matrix is

      Ω^{-1} = \begin{pmatrix} 1 & −ϕ & 0 & \cdots & 0 & 0 & 0 \\ −ϕ & 1+ϕ² & −ϕ & \cdots & 0 & 0 & 0 \\ 0 & −ϕ & 1+ϕ² & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & −ϕ & 1+ϕ² & −ϕ \\ 0 & 0 & 0 & \cdots & 0 & −ϕ & 1 \end{pmatrix}

I it can be seen then that the matrix

      P = \begin{pmatrix} \sqrt{1−ϕ²} & 0 & 0 & \cdots & 0 & 0 \\ −ϕ & 1 & 0 & \cdots & 0 & 0 \\ 0 & −ϕ & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & −ϕ & 1 \end{pmatrix}

  satisfies Ω^{-1} = P′P
estimation of models with autocorrelated disturbances
(cont.)

I if ϕ were known, there would be two equivalent ways of deriving GLS estimates of the β vector:
1. substitute ϕ in Ω^{-1} and compute b_GLS = (X′Ω^{-1}X)^{-1} X′Ω^{-1}y directly
2. transform the data by premultiplication by the P matrix and then regress y∗ on X∗
I in practice, ϕ is an unknown parameter that has to be estimated
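
A minimal sketch of option 2 for a known ϕ, on simulated placeholder data (with ϕ unknown one would iterate, e.g. along the lines of Cochrane-Orcutt or Prais-Winsten):

# a minimal sketch: GLS via the P transformation above, assuming phi is known
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, phi = 200, 0.6
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u
X = sm.add_constant(x)

P = np.eye(n) - np.diag(np.full(n - 1, phi), k=-1)   # 1 on the diagonal, -phi below it
P[0, 0] = np.sqrt(1 - phi ** 2)                      # first row as in the P matrix above
b_gls = sm.OLS(P @ y, P @ X).fit().params            # regress y* = Py on X* = PX
print(b_gls)                                         # close to the true (1, 2)
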
autoregressive conditional heteroskedasticity (ARCH)
I Engle suggested that heteroskedasticity might also occur in a time series framework
I in exchange rate and stock market returns, large and small errors tend to occur in clusters
I Engle formulated the idea that the recent past might give information about the conditional disturbance variance and postulated the relation

      σ_t² = α_0 + α_1 u_{t-1}² + · · · + α_p u_{t-p}²

I the conditional error variance is the variance of u_t, conditional on information available at time t − 1; it may be expressed as

      σ_t² = var(u_t | u_{t-1}, . . . , u_{t-p})
           = E(u_t² | u_{t-1}, . . . , u_{t-p})
           = E_{t-1}(u_t²)

  where E_{t-1} indicates taking an expectation conditional on all information up to the end of period t − 1
autoregressive conditional heteroskedasticity (ARCH)
(cont.)
I we thus have that recent disturbances influence the variance of the current disturbance
I a variance such as σ_t² = α_0 + α_1 u_{t-1}² + · · · + α_p u_{t-p}² can arise from an error defined as

      u_t = e_t [α_0 + α_1 u_{t-1}² + · · · + α_p u_{t-p}²]^{1/2}

  where {e_t} is a white noise series with unit variance. This is an ARCH(p) process. The simplest case is an ARCH(1) process, u_t = e_t [α_0 + α_1 u_{t-1}²]^{1/2}.
I its properties are as follows:
1. The u_t have zero mean.
2. The conditional variance is given by σ_t² = α_0 + α_1 u_{t-1}².
3. The unconditional variance is σ² = α_0/(1 − α_1), which only exists if α_0 > 0 and |α_1| < 1.
4. The autocovariances are zero.
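
A simulation sketch of these properties (α_0 = 0.5 and α_1 = 0.4 are illustrative values):

# a minimal sketch: simulate ARCH(1) and check the mean, the unconditional
# variance alpha0/(1 - alpha1), and the (near-)zero autocovariance
import numpy as np

rng = np.random.default_rng(9)
n, a0, a1 = 200_000, 0.5, 0.4
e = rng.standard_normal(n)               # white noise with unit variance
u = np.zeros(n)
for t in range(1, n):
    u[t] = e[t] * np.sqrt(a0 + a1 * u[t - 1] ** 2)

print(u.mean(), u.var())                 # approx 0 and 0.5/0.6 ≈ 0.83
print(np.corrcoef(u[:-1], u[1:])[0, 1])  # approx 0: no autocorrelation in u itself
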
autoregressive conditional heteroskedasticity (ARCH)
(cont.)
testing for ARCH
I The obvious test is implied by σ_t² = α_0 + α_1 u_{t-1}² + · · · + α_p u_{t-p}²:
1. fit y to X by OLS and obtain the residuals {e_t}
2. compute the OLS regression, e_t² = α̂_0 + α̂_1 e_{t-1}² + · · · + α̂_p e_{t-p}² + error
3. test the joint significance of α̂_1, . . . , α̂_p
I if these coefficients are significantly different from zero, the assumption of conditionally homoskedastic disturbances is rejected in favor of ARCH disturbances (see the sketch below)
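
A sketch of this LM test done by hand, following steps 1-3 above and using nR² as a χ²(p) statistic under H0 (statsmodels also ships a prepackaged version of this test); the residual series below is a placeholder:

# a sketch of the ARCH LM test: regress squared residuals on their own lags
# and use nR^2 as a chi^2(p) statistic under H0 (placeholder residual series)
import numpy as np
import statsmodels.api as sm
from scipy import stats

def arch_lm_test(e, p):
    e2 = e ** 2
    y = e2[p:]                                                                # e_t^2
    lags = np.column_stack([e2[p - j:len(e2) - j] for j in range(1, p + 1)])  # e_{t-1}^2 ... e_{t-p}^2
    aux = sm.OLS(y, sm.add_constant(lags)).fit()
    lm = len(y) * aux.rsquared                                                # nR^2
    return lm, stats.chi2.sf(lm, p)                                           # statistic, p-value

rng = np.random.default_rng(10)
e = rng.standard_normal(500)            # residuals from some mean regression
print(arch_lm_test(e, p=4))             # large p-value here: no ARCH in white noise
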
estimation under ARCH
I to estimate the regression in stage 2, suitable restrictions on the α parameters need to be imposed, e.g.

      e_t² = α̂_0 + α̂ (0.4 e_{t-1}² + 0.3 e_{t-2}² + 0.2 e_{t-3}² + 0.1 e_{t-4}²) + error
autoregressive conditional heteroskedasticity (ARCH)
(cont.)
I a less restrictive specification is the GARCH(p, q) model

      σ_t² = α_0 + α_1 u_{t-1}² + · · · + α_p u_{t-p}² + γ_1 σ_{t-1}² + · · · + γ_q σ_{t-q}²

  which expresses the conditional variance as a linear function of p lagged squared disturbances and q lagged conditional variances
I the most frequent application is the GARCH(1, 1)

      σ_t² = α_0 + α_1 u_{t-1}² + γ_1 σ_{t-1}²

I substituting successively for the lagged conditional variances on the RHS gives

      σ_t² = α_0 / (1 − γ_1) + α_1 (u_{t-1}² + γ_1 u_{t-2}² + γ_1² u_{t-3}² + γ_1³ u_{t-4}² + · · · )

  which implies that the current variance depends on all previous squared errors: provided that γ_1 is a positive fraction, the weights decline exponentially
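
A minimal sketch of the GARCH(1,1) variance recursion with assumed parameter values (maximum-likelihood estimation of α_0, α_1, γ_1 is usually delegated to a dedicated package such as arch):

# a minimal sketch: GARCH(1,1) conditional-variance recursion with assumed
# parameter values; u is a placeholder disturbance series
import numpy as np

def garch11_variance(u, a0, a1, g1):
    sigma2 = np.empty_like(u, dtype=float)
    sigma2[0] = a0 / (1 - a1 - g1)          # initialise at the unconditional variance
    for t in range(1, len(u)):
        sigma2[t] = a0 + a1 * u[t - 1] ** 2 + g1 * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(11)
u = rng.standard_normal(1000)
print(garch11_variance(u, a0=0.1, a1=0.1, g1=0.8)[:5])
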
