Chapter 9 Heteroscedasticity
9 Heteroscedasticity
(June 13, 2016)
1 Introduction
Regression disturbances whose variances are not constant across observations are
heteroscedastic. There are several reasons why the variance of the disturbance
$\varepsilon_i$ may vary, some of which are as follows.$^1$
(a). Following the error-learning models, as people learn, their errors of behavior
become smaller over time.
(b). As income grows, people have more discretionary income and hence more scope
for choice about the disposition of their income. Hence, $\sigma_i^2$ is likely to
increase with income.
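$= \sigma^2 \Sigma$,
where
\[
\Sigma = \begin{bmatrix}
\omega_1 & 0 & \cdots & 0 \\
0 & \omega_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \omega_N
\end{bmatrix}. \tag{9-1}
\]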
This form is an arbitrary scaling which allows us to use the normalization
\[
\operatorname{tr}(\Sigma) = \sum_{i=1}^{N} \omega_i = N.
\]
(For example, $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2$.) This makes the
classical regression with homoscedastic disturbances a simple special case with
$\omega_i = 1$, $i = 1, 2, \ldots, N$.
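The normalization can be checked numerically. The following is a minimal sketch, assuming a small made-up list of variances $\sigma_i^2$; the function name `normalize_variances` is hypothetical and only illustrates the split of $\sigma_i^2$ into the scale $\sigma^2$ and the weights $\omega_i$.

```python
def normalize_variances(sigma2_i):
    """Split variances sigma_i^2 into a common scale sigma^2 and weights omega_i.

    The weights satisfy sum(omega_i) = N, i.e. tr(Sigma) = N.
    """
    n = len(sigma2_i)
    sigma2 = sum(sigma2_i) / n                  # sigma^2 = (1/N) * sum_i sigma_i^2
    omega = [s / sigma2 for s in sigma2_i]      # omega_i = sigma_i^2 / sigma^2
    return sigma2, omega


# Illustrative (made-up) variances for N = 4 observations.
sigma2, omega = normalize_variances([0.5, 1.0, 1.5, 3.0])
print(sigma2, omega, sum(omega))
```

With equal variances every $\omega_i$ equals 1, which is the homoscedastic special case mentioned above.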
Example.
See Figure 11.1 (p.216) of Greene 5th edition.
White (1980) addressed the case where nothing is known about the structure of the
heteroscedasticity other than that the heteroscedastic variances $\sigma_i^2$ are
uniformly bounded. It would be desirable to be able to test a general hypothesis of
the form:
\[
H_0 : \sigma_i^2 = \sigma^2 \ \text{for all } i, \qquad H_1 : \text{not } H_0.
\]
\[
\widehat{\mathrm{Var}}(\hat{\beta})_{HAC}
= (X'X)^{-1} \left( \sum_{t=1}^{T} e_t^2 x_t x_t' \right) (X'X)^{-1}.
\]
White derives a test for heteroscedasticity which consists of comparing the elements
of $N S_0 \; (= \sum_{i=1}^{N} e_i^2 x_i x_i')$ and
$s^2 (X'X) \; (= s^2 \sum_{i=1}^{N} x_i x_i')$, thus indicating whether or not
the usual OLS formula $s^2 (X'X)^{-1}$ is a consistent covariance estimator. Large
discrepancies between $N S_0$ and $s^2 (X'X)$ support the contention of
heteroscedasticity, while small discrepancies support homoscedasticity.
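The two matrices being compared can be computed directly from the OLS residuals. The sketch below assumes NumPy and simulated data whose disturbance standard deviation is proportional to the regressor; the function name `white_matrices` and the data-generating process are illustrative assumptions, and the code only computes the matrices, not White's formal test statistic.

```python
import numpy as np


def white_matrices(X, y):
    """Return OLS beta, N*S0, s^2*X'X, the White covariance and the usual OLS covariance."""
    N, k = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta                               # OLS residuals
    NS0 = (X * e[:, None] ** 2).T @ X              # sum_i e_i^2 x_i x_i'
    s2 = e @ e / (N - k)                           # usual OLS variance estimate
    XtX_inv = np.linalg.inv(XtX)
    V_white = XtX_inv @ NS0 @ XtX_inv              # (X'X)^{-1} N S0 (X'X)^{-1}
    V_ols = s2 * XtX_inv                           # s^2 (X'X)^{-1}
    return beta, NS0, s2 * XtX, V_white, V_ols


# Simulated data (assumption): sd of the disturbance is proportional to x.
rng = np.random.default_rng(0)
N = 500
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(N)

beta, NS0, s2XtX, V_white, V_ols = white_matrices(X, y)
print("N*S0:\n", NS0)
print("s^2 X'X:\n", s2XtX)
```

Printing the two matrices side by side gives an informal view of the discrepancy the test formalizes.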
A very popular test for heteroscedasticity that is monotonically related to an
exogenous variable, by which the observations on the dependent variable can be
ordered, is the Goldfeld-Quandt (1965) test.
For the Goldfeld-Quandt test, we assume that the observations can be divided into
two groups in such a way that under the hypothesis of homoscedasticity, the disturbance
variances would be the same in the two groups, whereas under the alternative, the
disturbance variances would differ systematically. The steps of this test are as follows:
(a). Order the observations (from "supposed" large to small variance) by the values
of the variable Z.
(b). Choose p central observations and omit them, provided that (N − p)/2 > k.
(c). Fit separate regressions by OLS to the two groups, with N1 and N2 observations,
respectively.
(d). Let SSE1 and SSE2 denote the sums of squared residuals from the group supposed
to have the large variance and the group supposed to have the small variance,
respectively.
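Steps (a) to (d) can be sketched in code as follows. This is a minimal illustration assuming NumPy and simulated data; the function names are hypothetical, and the ratio returned is the usual F-ratio of the two residual variances, which is an assumption here because the statistic itself is not reproduced in the text. No p-value is computed.

```python
import numpy as np


def sse(X, y):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def goldfeld_quandt(X, y, z, p):
    """Return (F ratio, numerator df, denominator df) after dropping p central observations."""
    N, k = X.shape
    order = np.argsort(z)[::-1]        # (a) largest z first: supposed large variance
    n1 = (N - p) // 2                  # (b) size of the supposed high-variance group
    n2 = N - p - n1                    #     size of the supposed low-variance group
    if min(n1, n2) <= k:
        raise ValueError("each group needs more than k observations")
    high, low = order[:n1], order[N - n2:]
    sse1 = sse(X[high], y[high])       # (c), (d) separate OLS fits
    sse2 = sse(X[low], y[low])
    F = (sse1 / (n1 - k)) / (sse2 / (n2 - k))
    return F, n1 - k, n2 - k


# Simulated data (assumption): disturbance sd proportional to x, ordering variable z = x.
rng = np.random.default_rng(0)
N = 200
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(N)

F, df1, df2 = goldfeld_quandt(X, y, z=x, p=40)
print(F, df1, df2)
```

A ratio well above one points toward a larger disturbance variance in the first group; the formal decision would compare it with an F critical value for the two degrees of freedom.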
Recall that $e_1'e_1 / \sigma_1^2 \sim \chi^2_{[N_1 - k]}$ and
$e_2'e_2 / \sigma_2^2 \sim \chi^2_{[N_2 - k]}$; then the statistic
The Goldfeld-Quandt test has been found to be reasonably powerful when we are able
to identify correctly the variable to use in the sample separation. This requirement
does limit its generality, however. Breusch and Pagan (1979) assume a broader class of
heteroscedasticity defined by
Exercise 1 .
Reproduce the results of Example 11.3 at p.224 of Greene 5th edition.
3 OLS Estimation
We showed in Section 8.2 that in the presence of heteroscedasticity, the OLS estimator
$\hat{\beta}$ is unbiased and consistent. However, it is inefficient relative to the GLS
estimator. White (1980) shows that under very general conditions, the matrix
\[
S_0 = \frac{1}{N} \sum_{i=1}^{N} e_i^2 x_i x_i',
\]
can be used to estimate the true variance of the OLS estimator, through
$(X'X)^{-1} N S_0 (X'X)^{-1}$. Inference concerning $\beta$ is therefore still possible
by means of the OLS estimator, even when the specific structure of $\Sigma$ is not
specified, because $\hat{\beta}$ is asymptotically normally distributed.
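More generally, White shows that, for tests of the general linear hypothesis
$R\beta = q$, under the null hypothesis the statistic
\[
(R\hat{\beta} - q)' \left[ R (X'X)^{-1} N S_0 (X'X)^{-1} R' \right]^{-1} (R\hat{\beta} - q)
\xrightarrow{L} \chi^2_m,
\]
where $m$ is the number of restrictions (rows of $R$).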
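The statistic above is straightforward to evaluate once the residuals are available. The following sketch assumes NumPy and simulated heteroscedastic data; `white_wald` is a hypothetical helper name, and the restriction tested (a single slope coefficient) is chosen only for illustration.

```python
import numpy as np


def white_wald(X, y, R, q):
    """Wald statistic for R beta = q using the White covariance estimator.

    Returns the statistic and the number of restrictions m (rows of R).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    NS0 = (X * e[:, None] ** 2).T @ X          # N * S0 = sum_i e_i^2 x_i x_i'
    V = XtX_inv @ NS0 @ XtX_inv                # (X'X)^{-1} N S0 (X'X)^{-1}
    d = R @ beta - q
    W = float(d @ np.linalg.solve(R @ V @ R.T, d))
    return W, R.shape[0]


# Simulated data (assumption): true slope is 2, disturbance sd proportional to x.
rng = np.random.default_rng(0)
N = 300
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(N)

R = np.array([[0.0, 1.0]])                     # picks out the slope coefficient
W_true, m = white_wald(X, y, R, np.array([2.0]))    # hypothesis: slope = 2
W_false, _ = white_wald(X, y, R, np.array([0.0]))   # hypothesis: slope = 0
print(W_true, W_false, m)
```

Each statistic would be compared with a $\chi^2_m$ critical value; here $m = 1$.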
Exercise 2 .
Reproduce the results at Table 11.1 on p. 221 of Greene 5th edition.
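Consider the most general case, $\sigma_i^2 = \sigma^2 \omega_i$. Then $\Sigma^{-1}$ is a
diagonal matrix whose $i$-th diagonal element is $1/\omega_i$ (see Eq. (9-1)), that is,
\[
\Sigma^{-1} =
\begin{bmatrix}
\frac{1}{\omega_1} & 0 & \cdots & 0 \\
0 & \frac{1}{\omega_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\omega_N}
\end{bmatrix}
=
\begin{bmatrix}
\frac{1}{\sqrt{\omega_1}} & 0 & \cdots & 0 \\
0 & \frac{1}{\sqrt{\omega_2}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sqrt{\omega_N}}
\end{bmatrix}
\times
\begin{bmatrix}
\frac{1}{\sqrt{\omega_1}} & 0 & \cdots & 0 \\
0 & \frac{1}{\sqrt{\omega_2}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sqrt{\omega_N}}
\end{bmatrix}
= P'P.
\]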
where $w_i = 1/\omega_i$.
The logic of the computation is that observations with smaller variances receive a
larger weight in the computations of the sums and therefore have greater influence in
the estimates obtained.
Example.
A common specification in the linear regression model with heteroscedasticity is that
the variance of the disturbances is proportional to one of the regressors or its square.
For example, if the model is
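where
\[
\sigma_i^2 = \sigma^2 X_{il}^2,
\]
then
\[
\Sigma^{-1} =
\begin{bmatrix}
\frac{1}{X_{1l}^2} & 0 & \cdots & 0 \\
0 & \frac{1}{X_{2l}^2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{X_{Nl}^2}
\end{bmatrix},
\]
and
\[
P =
\begin{bmatrix}
\frac{1}{X_{1l}} & 0 & \cdots & 0 \\
0 & \frac{1}{X_{2l}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{X_{Nl}}
\end{bmatrix}.
\]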
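Because $\Sigma^{-1} = P'P$, GLS amounts to OLS on data premultiplied by $P$. The sketch below, assuming NumPy and simulated data with $\omega_i = X_{il}^2$, checks that the transformed regression and the direct GLS formula give the same coefficients; both function names are hypothetical.

```python
import numpy as np


def wls_by_transformation(X, y, omega):
    """OLS on the transformed data P y and P X, where P = diag(1 / sqrt(omega_i))."""
    p = 1.0 / np.sqrt(omega)                    # diagonal of P
    beta, *_ = np.linalg.lstsq(X * p[:, None], y * p, rcond=None)
    return beta


def gls_direct(X, y, omega):
    """(X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y with Sigma = diag(omega_i)."""
    w = 1.0 / omega                             # diagonal of Sigma^{-1}
    XtWX = (X * w[:, None]).T @ X
    XtWy = (X * w[:, None]).T @ y
    return np.linalg.solve(XtWX, XtWy)


# Simulated data (assumption): variance proportional to the square of the regressor.
rng = np.random.default_rng(0)
N = 100
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(N)
omega = x ** 2                                  # omega_i = X_il^2, so P = diag(1 / X_il)

b_transformed = wls_by_transformation(X, y, omega)
b_direct = gls_direct(X, y, omega)
print(b_transformed, b_direct)
```

In this specification the transformation simply divides every variable, including the constant, by $X_{il}$.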
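The two-step estimator is computed by first obtaining estimates $\hat{\sigma}_i^2$,
usually using some function of the OLS residuals; the FGLS estimator is then
\[
\check{\beta} = \left[ \sum_{i=1}^{N} \frac{1}{\hat{\sigma}_i^2} x_i x_i' \right]^{-1}
\left[ \sum_{i=1}^{N} \frac{1}{\hat{\sigma}_i^2} x_i Y_i \right]. \tag{9-4}
\]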
The OLS estimator $\hat{\beta}$, although inefficient, is still consistent. As such,
statistics computed using the OLS residuals, $e_i = Y_i - x_i'\hat{\beta}$, will have the
same asymptotic properties as those computed using the true disturbances,
$\varepsilon_i = Y_i - x_i'\beta$.
Let
\[
\varepsilon_i^2 = \sigma_i^2 + v_i,
\]
where $v_i$ is just the difference between the random variable $\varepsilon_i^2$ and its
expectation. Since $\varepsilon_i$ is unobservable, we would use the OLS residual, for which
\[
e_i = \varepsilon_i - x_i'(\hat{\beta} - \beta) = \varepsilon_i + u_i.
\]
But in large samples, as $\hat{\beta} \xrightarrow{p} \beta$, terms in $u_i$ will become
negligible, so that at least approximately,
\[
e_i^2 = \sigma_i^2 + v_i^*.
\]
The procedure suggested is to treat the variance function as a regression and use
the squares of the OLS residuals as the dependent variable. For example, if
$\sigma_i^2 = z_i'\alpha$, then a consistent estimator of $\alpha$ will be OLS in the
model$^2$
\[
e_i^2 = z_i'\alpha + v_i^*, \quad i = 1, 2, \ldots, N. \tag{9-5}
\]
Having obtained the estimate $\hat{\alpha}$ in the first step from Eq. (9-5), we substitute
$\hat{\sigma}_i^2 = z_i'\hat{\alpha}$ into Eq. (9-4); this completes the second step, and the
FGLS estimator is thus obtained.
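The two-step estimator may be iterated by recomputing the residuals after computing
the FGLS estimate and then reentering the computation:
\[
\text{OLS} \to \hat{\beta} \to e
\xrightarrow{\text{Eq. (9-5)}} \hat{\alpha}^{(1)}
\xrightarrow{\text{Eq. (9-4)}} \check{\beta}^{(1)} \to \check{e}^{(1)}
\xrightarrow{\text{Eq. (9-5)}} \hat{\alpha}^{(2)}
\xrightarrow{\text{Eq. (9-4)}} \check{\beta}^{(2)} \to \check{e}^{(2)} \to \cdots,
\]
where
\[
\check{e}^{(1)} = y - X\check{\beta}^{(1)}.
\]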
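The two-step and iterated procedures can be sketched as a short loop. This is a minimal illustration assuming NumPy, simulated data, and the linear variance function $\sigma_i^2 = z_i'\alpha$ used in the text's example. The lower bound placed on the fitted variances is an ad hoc assumption added here, because a linear variance regression can produce non-positive fitted values; the function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def fgls(X, y, Z, n_iter=1, floor_frac=1e-3):
    """Feasible GLS with sigma_i^2 = z_i' alpha; n_iter=1 gives the two-step estimator."""
    beta = ols(X, y)                                   # start from OLS
    alpha = None
    for _ in range(n_iter):
        e2 = (y - X @ beta) ** 2                       # squared residuals
        alpha = ols(Z, e2)                             # variance regression, Eq. (9-5)
        # Ad hoc guard (assumption): keep fitted variances strictly positive.
        sigma2_hat = np.maximum(Z @ alpha, floor_frac * e2.mean())
        w = 1.0 / sigma2_hat
        XtWX = (X * w[:, None]).T @ X
        XtWy = (X * w[:, None]).T @ y
        beta = np.linalg.solve(XtWX, XtWy)             # weighted estimator, Eq. (9-4)
    return beta, alpha


# Simulated data (assumption): sigma_i^2 = 2 + x_i^2, true coefficients (1, 2).
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), x ** 2])
sigma2_true = Z @ np.array([2.0, 1.0])
y = X @ np.array([1.0, 2.0]) + np.sqrt(sigma2_true) * rng.standard_normal(N)

beta_two_step, alpha_hat = fgls(X, y, Z, n_iter=1)
beta_iterated, _ = fgls(X, y, Z, n_iter=5)
print(beta_two_step, beta_iterated, alpha_hat)
```

Setting `n_iter` above one repeats the residual, variance regression, and weighted regression steps in the order shown in the iteration scheme.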
$^2$ In this model, $v_i^*$ may be both heteroscedastic and autocorrelated, so $\hat{\alpha}$
is consistent but inefficient. But consistency is all that is required for asymptotically
efficient estimation of $\beta$ using $\Sigma(\hat{\alpha})$.
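For convenience in what follows, substitute $\varepsilon_i$ for $(Y_i - x_i'\beta)$, denote
$f_i(\alpha)$ as simply $f_i$, and denote the vector of derivatives
$\partial f_i(\alpha)/\partial \alpha$ as $g_i$. Then the derivatives of the
log-likelihood function are
\[
\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} x_i \frac{\varepsilon_i}{\sigma^2 f_i},
\]
\[
\frac{\partial \ln L}{\partial \sigma^2}
= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} \frac{\varepsilon_i^2}{f_i}
= \sum_{i=1}^{N} \frac{1}{2\sigma^2} \left( \frac{\varepsilon_i^2}{\sigma^2 f_i} - 1 \right),
\]
\[
\frac{\partial \ln L}{\partial \alpha}
= \sum_{i=1}^{N} \frac{1}{2} \left( \frac{\varepsilon_i^2}{\sigma^2 f_i} - 1 \right) \frac{1}{f_i} g_i.
\]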
The maximum likelihood estimators are those values of $\beta$, $\sigma^2$, and $\alpha$ that
simultaneously equate these derivatives to zero. The likelihood equations are generally
highly nonlinear and will usually require an iterative solution.
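Before handing such derivatives to an iterative optimizer, it is useful to check them against numerical derivatives. The sketch below assumes NumPy, a Gaussian log-likelihood with $\sigma_i^2 = \sigma^2 f_i(\alpha)$, and the specific choice $f_i(\alpha) = \exp(\alpha z_i)$ with scalar $\alpha$; that functional form, the simulated data, and the function names are all assumptions made for illustration.

```python
import numpy as np


def loglik(beta, sigma2, alpha, X, y, z):
    """Gaussian log-likelihood with sigma_i^2 = sigma2 * f_i, f_i = exp(alpha * z_i)."""
    f = np.exp(alpha * z)
    eps = y - X @ beta
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2 * f) + eps ** 2 / (sigma2 * f))


def score(beta, sigma2, alpha, X, y, z):
    """Analytic derivatives with respect to beta, sigma2 and alpha."""
    f = np.exp(alpha * z)
    g = z * f                                   # g_i = d f_i / d alpha
    eps = y - X @ beta
    r = eps ** 2 / (sigma2 * f) - 1.0
    d_beta = X.T @ (eps / (sigma2 * f))
    d_sigma2 = np.sum(r) / (2.0 * sigma2)
    d_alpha = 0.5 * np.sum(r * g / f)
    return d_beta, d_sigma2, d_alpha


# Simulated data and an arbitrary evaluation point (both assumptions).
rng = np.random.default_rng(1)
N = 50
z = rng.uniform(0.0, 1.0, N)
X = np.column_stack([np.ones(N), z])
y = X @ np.array([1.0, 2.0]) + np.exp(0.5 * z) * rng.standard_normal(N)
b0, s0, a0 = np.array([0.8, 2.1]), 1.3, 0.7

d_beta, d_sigma2, d_alpha = score(b0, s0, a0, X, y, z)

# Central finite differences for comparison.
h = 1e-6
step = np.array([h, 0.0])
num_beta0 = (loglik(b0 + step, s0, a0, X, y, z) - loglik(b0 - step, s0, a0, X, y, z)) / (2 * h)
num_sigma2 = (loglik(b0, s0 + h, a0, X, y, z) - loglik(b0, s0 - h, a0, X, y, z)) / (2 * h)
num_alpha = (loglik(b0, s0, a0 + h, X, y, z) - loglik(b0, s0, a0 - h, X, y, z)) / (2 * h)
print(d_beta[0], num_beta0, d_sigma2, num_sigma2, d_alpha, num_alpha)
```

Agreement between the analytic and numerical values at an arbitrary point is a quick sanity check on the derivative formulas.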
Exercise 3
Reproduce the results at Table 11.2 on p.231 of Greene 5th edition.
6 ARCH Model
Heteroscedasticity is often associated with cross-sectional data, whereas time series are
usually studied in the context of homoscedastic processes. In analyses of macroeco-
nomic data, Engle (1982, 1983) and Cragg (1982) found evidence that for some kinds
of data, the disturbance variances in time-series models were less stable than usually
assumed.
With time-series data, it is not uncommon to see the OLS residuals be quite
small in absolute value for a number of successive periods of time, then much larger for
a while, then smaller again, and so on. This phenomenon of time-varying volatility (that
is, large disturbances occurring in clusters) is often encountered in models for stock
returns, foreign exchange rates, and other series that are determined in financial
markets. Numerous models for dealing with this phenomenon have been proposed. One very
popular approach is based on the concept of autoregressive conditional
heteroscedasticity, or ARCH, which was introduced by Engle (1982). The basic idea of ARCH
models is that the variance of the disturbance at time t depends on the realized values
of squared disturbances in previous time periods.
A model which allows the conditional variance to depend on the past realization of
the series is considered in the following. Suppose that
\[
u_t = \sqrt{h_t}\, \varepsilon_t \tag{9-6}
\]
\[
h_t = \alpha_0 + \alpha_1 u_{t-1}^2, \tag{9-7}
\]
with $E(\varepsilon_t) = 0$ and $Var(\varepsilon_t) = 1$; then this is an example of what
will be called an autoregressive conditional heteroscedasticity (ARCH(1)) model.
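Equations (9-6) and (9-7) define a simple recursion that is easy to simulate. The sketch below assumes NumPy, Gaussian $\varepsilon_t$, and illustrative parameter values; starting the recursion at $\alpha_0 / (1 - \alpha_1)$ is an assumption made here to initialize $h_t$, and the function name is hypothetical.

```python
import numpy as np


def simulate_arch1(alpha0, alpha1, T, seed=0):
    """Simulate u_t = sqrt(h_t) * eps_t with h_t = alpha0 + alpha1 * u_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)               # assumption: Gaussian innovations
    u = np.empty(T)
    h = np.empty(T)
    h[0] = alpha0 / (1.0 - alpha1)             # assumed start value; needs 0 <= alpha1 < 1
    u[0] = np.sqrt(h[0]) * eps[0]
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * u[t - 1] ** 2
        u[t] = np.sqrt(h[t]) * eps[t]
    return u, h


alpha0, alpha1 = 0.2, 0.5                      # illustrative values
u, h = simulate_arch1(alpha0, alpha1, T=1000)
print(u[:5], h[:5])
```

A plot of `u` against time would typically show the alternating calm and turbulent stretches described above.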
Example.
See Figure 11.3 on p.239 of Greene 5th edition.
Let $\mathcal{F}_{t-1}$ denote the information set available at time $t - 1$. The
conditional mean of $u_t$ is
\[
E(u_t \mid \mathcal{F}_{t-1}) = \sqrt{h_t} \cdot E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0. \tag{9-8}
\]
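By assuming that $\varepsilon_t$ is a Gaussian variate, the conditional distribution of
$u_t$ given all the information up to $t - 1$ is
\[
u_t \mid \mathcal{F}_{t-1} = \sqrt{h_t}\, \varepsilon_t \sim N(0, h_t),
\]
that is, $f(u_t \mid \mathcal{F}_{t-1}) = (2\pi h_t)^{-1/2} \exp\left(-u_t^2 / (2 h_t)\right)$.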
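The conditional normality of $u_t$ gives a log-likelihood that can be accumulated one period at a time. The following is a minimal sketch in plain Python, assuming the ARCH(1) variance equation (9-7) and conditioning on the first observation (an assumption, so the sum starts at the second period); the function name is hypothetical.

```python
import math


def arch1_loglik(u, alpha0, alpha1):
    """Conditional Gaussian log-likelihood of u_2, ..., u_T given u_1 for an ARCH(1) model."""
    ll = 0.0
    for t in range(1, len(u)):
        h_t = alpha0 + alpha1 * u[t - 1] ** 2            # conditional variance, Eq. (9-7)
        ll += -0.5 * (math.log(2.0 * math.pi * h_t) + u[t] ** 2 / h_t)
    return ll


# With alpha1 = 0 the model is homoscedastic with variance alpha0.
ll = arch1_loglik([0.0, 1.0, -1.0], alpha0=1.0, alpha1=0.0)
print(ll)
```

Maximizing this function over $\alpha_0$ and $\alpha_1$ with a numerical optimizer would be one way to estimate the model.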