ETW2510 Lecture 8 Heteroskedasticity
Recap
I We have studied the multiple regression model and learnt that when:
1. the model is linear in parameters: $y = X\beta + u$
2. the conditional mean of the errors is zero: $E(u \mid X) = 0$
3. the columns of $X$ are linearly independent
⇒ then the OLS estimator $\hat{\beta}$ is an unbiased estimator of $\beta$
I If, in addition,
4. the sample is random and the errors are homoskedastic:
$\operatorname{Var}(u \mid X) = \sigma^2 I_n$,
⇒ then $\hat{\beta}$ is the BLUE and $\operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
I If, in addition to the above,
5. the errors are normally distributed,
⇒ then conditional on $X$, $\hat{\beta}$ is normally distributed, and we can use the
usual t and F tests to make inferences based on the OLS estimator
Lecture Outline
I Heteroskedasticity:
1. Definition of heteroskedasticity and its consequences for OLS
(textbook reference 8-1)
2. Testing for heteroskedasticity (textbook reference 8-3)
2.1 Breusch-Pagan test
2.2 White test
3. Heteroskedasticity robust standard errors (a simplified version
of 8-2)
4. Weighted least squares when heteroskedasticity is known up to
a multiplicative constant (textbook reference 8-4a)
I We will not cover heteroskedasticity robust LM tests (the last part
of section 8-2), Feasible GLS and the consequences of wrong
specification of the variance function (sections 8-4b and 8-4c), or the
linear probability model (section 8-5).
Heteroskedasticity
I Sometimes there is a good reason to doubt the assumption of equal
variance for all errors. Here are some examples:
I In the study of food consumption, income is an important
explanatory variable. It is unreasonable to assume that the variance
of food consumption is the same for poor and rich people
I In many cases we do not have individual data (for confidentiality
reasons), but we get information on averages over groups of
individuals.
I For example, we can get incidences of crime per 1000 people,
employment rate and income per capita in each district. These are
averages, but different districts have different populations, so there
is a good reason to believe that variances of these averages depend
inversely on the population of each district
I In finance, unpredicted news can increase the volatility of the
market (i.e. the variance of the market return), and this can last for
several days (a large part of financial econometrics, ETC3460, is
about modelling this phenomenon)
I A 3D graphical representation of heteroskedasticity:
Consequences of heteroskedasticity (HTSK) for OLS
I HTSK does not affect the first 3 assumptions on the Recap slide,
therefore the OLS estimator will remain unbiased
I HTSK violates assumption 4, therefore the OLS estimator will no
longer be BLUE and $\operatorname{Var}(\hat{\beta}) \neq \sigma^2 (X'X)^{-1}$. This means that the
default standard errors reported by the statistical package for the
OLS estimator will be incorrect
I Even if errors are Normally distributed, the t and F tests based on
the default OLS standard errors will be unreliable
I Fortunately, if we detect HTSK, we have ways to conduct reliable
inference based on the OLS estimator, or even obtain a more
efficient estimator than the OLS estimator
Detecting HTSK
I As always, step 1: think about the problem!
I If we only have one x, the scatter plot can give us a clue:
Testing for HTSK
I Since $E(u_i \mid x_{i1}, \dots, x_{ik}) = 0$,
$\operatorname{Var}(u_i \mid x_{i1}, \dots, x_{ik}) = E(u_i^2 \mid x_{i1}, \dots, x_{ik}) - [E(u_i \mid x_{i1}, \dots, x_{ik})]^2 = E(u_i^2 \mid x_{i1}, \dots, x_{ik})$
I If we suspect that the variance can change with some subset of the
independent variables, or even with some exogenous variables that do not
affect the mean but can affect the variance, then, if we had $u_i$, we
could square it and estimate the conditional expectation function of
$u_i^2$. But $u_i$ is unknown :-(
I Australian econometricians Trevor Breusch and Adrian Pagan, and
the American econometrician Hal White showed that we can use the
OLS residuals instead, and in large samples this will give us reliable
results :-)
Breusch-Pagan test
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i$ for $i = 1, \dots, n$
$H_0: E(u_i^2 \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2$ for $i = 1, \dots, n$
$H_1: E(u_i^2 \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \delta_0 + \delta_1 z_{i1} + \delta_2 z_{i2} + \cdots + \delta_q z_{iq}$
where $z_{i1}, \dots, z_{iq}$ are a subset of $x_{i1}, x_{i2}, \dots, x_{ik}$. In fact the z variables
can include some variables that do not appear in the conditional mean,
but may affect the variance (note the difference with the book)
1. Estimate the model by OLS as usual. Obtain $\hat{u}_i$ for $i = 1, \dots, n$ and
square them.
2. Regress $\hat{u}_i^2$ on a constant, $z_{i1}, \dots, z_{iq}$. Denote the $R^2$ of this
auxiliary regression by $R^2_{\hat{u}^2}$.
3. Under $H_0$, the statistic $n \times R^2_{\hat{u}^2}$ has a $\chi^2$ distribution with q degrees
of freedom in large samples. This statistic is called the Lagrange
multiplier (LM) statistic for HTSK.
4. Given the desired level of significance, we obtain the critical value
(cv) of the test from the $\chi^2$ table, and reject $H_0$ if the value of the
test statistic is larger than the cv.
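The four steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are simulated (the error standard deviation is made proportional to x1 on purpose), the function names are hypothetical, and the 5% critical value 5.991 is the $\chi^2$ table value for q = 2 degrees of freedom.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals; X must already contain a constant column."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def r_squared(X, y):
    """R^2 of the regression of y on X (X contains a constant)."""
    _, resid = ols(X, y)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def breusch_pagan_lm(X, y, Z):
    """Steps 1-3: LM = n * R^2 from regressing squared residuals on a constant and Z."""
    n = len(y)
    _, u_hat = ols(X, y)                      # step 1: OLS residuals
    Z1 = np.column_stack([np.ones(n), Z])     # step 2: auxiliary regressors
    r2 = r_squared(Z1, u_hat ** 2)
    return n * r2, r2                         # step 3: LM statistic

# Simulated (hypothetical) data with heteroskedastic errors
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
u = rng.normal(size=n) * x1                   # sd of u_i proportional to x_i1
y = 1 + 2 * x1 - x2 + u
X = np.column_stack([np.ones(n), x1, x2])

lm, r2 = breusch_pagan_lm(X, y, np.column_stack([x1, x2]))
CV_5PCT_Q2 = 5.991                            # chi-square(2) critical value at 5%
print("LM statistic:", lm)
print("Reject H0 at 5%:", lm > CV_5PCT_Q2)    # step 4
```

Here the z variables are simply all the x variables, so q = 2; passing a different `Z` changes q and therefore the critical value that applies.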
I Under $H_0$ the F test for the overall significance of the auxiliary
(second) regression has an $F_{q,\,n-q-1}$ distribution, and it can also be
used to test for HTSK.
I Example: Net financial wealth in $1,000s, predicted by age and
current income in $1,000s
I We can save the residuals and run the OLS regression of $\hat{u}^2$ on a
constant and all the explanatory variables, or we can use Eviews
I Choosing Breusch-Pagan test in Eviews produces:
White Test for HTSK
A Special Case of the White Test for HTSK
I A concern with the White test is that the auxiliary regression will have
$k + k(k+1)/2$ regressors, which is a very large number of
restrictions to test
I Recall that the fitted values from OLS, $\hat{y}$, are a function of all the x's
I Thus, $\hat{y}^2$ will be a function of the squares and cross products of all
the x's, and $\hat{y}$ and $\hat{y}^2$ can proxy for all of the x's, their squares
and cross products.
I A special form of the White test is to regress the squared residuals
on $\hat{y}$ and $\hat{y}^2$ and use the $R^2$ of this regression to form an F
or LM statistic
I Note that we are only testing 2 restrictions now, no matter how many
independent variables we have
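A minimal sketch of this special form, again in Python with NumPy and on simulated data. The function names, the data-generating process and its coefficients are all hypothetical; the F statistic uses the usual overall-significance formula with q = 2 restrictions and n - 3 denominator degrees of freedom.

```python
import numpy as np

def ols_fit(X, y):
    """Return fitted values and residuals from regressing y on X (X has a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    return fitted, y - fitted

def white_special(X, y):
    """Regress squared residuals on a constant, y_hat and y_hat^2; return (LM, F)."""
    n = len(y)
    y_hat, u_hat = ols_fit(X, y)
    u2 = u_hat ** 2
    A = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
    _, e = ols_fit(A, u2)
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2                                  # chi-square with 2 df under H0
    f = (r2 / 2) / ((1.0 - r2) / (n - 3))        # F with (2, n - 3) df under H0
    return lm, f

# Simulated (hypothetical) data: three regressors, variance driven by x1
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1 + 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n) * x1
X = np.column_stack([np.ones(n), x1, x2, x3])

lm, f = white_special(X, y)
print("LM statistic:", lm)
print("F statistic:", f)
print("Reject H0 at 5% (chi-square(2) cv = 5.991):", lm > 5.991)
```

The auxiliary regression has only two slope coefficients even though the model has three regressors, which is the point of the special form.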
I In the financial wealth example, choosing the White test in Eviews
produces:
I Eviews does not have a built-in command for the alternative form of
the White test, so we need to save the OLS residuals and OLS
predictions and then run the auxiliary regression:
Solution 1: Robust Standard Errors
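I Since the OLS estimator is still unbiased, we may be happy to live with
OLS even if it is not BLUE. But the real practical problem is
that t and F statistics based on the default OLS standard errors are unusable
I Recall the derivation of $\operatorname{Var}(\hat{\beta} \mid X)$: since $\hat{\beta} = \beta + (X'X)^{-1} X'u$,
$$\operatorname{Var}(\hat{\beta} \mid X) = (X'X)^{-1} X' \operatorname{Var}(u \mid X) X (X'X)^{-1}$$
I With homoskedasticity, $\operatorname{Var}(u \mid X) = \sigma^2 I_n$ and this simplifies to
$\sigma^2 (X'X)^{-1}$
I With HTSK,
$$\operatorname{Var}(u \mid X) =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}$$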
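I Therefore, with HTSK
$$\operatorname{Var}(\hat{\beta} \mid X) = (X'X)^{-1} X'
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}
X (X'X)^{-1}$$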
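The matrix in the middle contains the unknown $\sigma_i^2$. One standard way to make the formula usable, due to White, is to replace each $\sigma_i^2$ with the squared OLS residual $\hat{u}_i^2$. The sketch below assumes that plain version (often labelled HC0); a statistical package may apply an additional small-sample adjustment, so its numbers need not match exactly. The data and function name are hypothetical.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS estimates with default and heteroskedasticity-robust (HC0) standard errors."""
    n, p = X.shape                               # p counts the constant as well
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u_hat = y - X @ b

    # Default: sigma^2_hat * (X'X)^{-1}, valid only under homoskedasticity
    sigma2_hat = (u_hat @ u_hat) / (n - p)
    se_default = np.sqrt(np.diag(sigma2_hat * XtX_inv))

    # Robust: (X'X)^{-1} X' diag(u_hat_i^2) X (X'X)^{-1}
    meat = X.T @ (X * (u_hat ** 2)[:, None])     # X' diag(u_hat^2) X without an n x n matrix
    V_robust = XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(V_robust))
    return b, u_hat, se_default, se_robust, V_robust

# Simulated (hypothetical) data with error sd proportional to x
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
y = 1 + 2 * x + rng.normal(size=n) * x
X = np.column_stack([np.ones(n), x])

b, u_hat, se_default, se_robust, V_robust = ols_with_se(X, y)
print("coefficients:      ", b)
print("default std errors:", se_default)
print("robust std errors: ", se_robust)
```

The coefficient estimates are the ordinary OLS estimates in both cases; only the standard errors change.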
I Back to the example. The option of robust standard errors is under
the Options tab of the equation window:
I With this option, we get the following results:
Solution 2: Transform the Model
Weighted Least Squares
I Suppose the model
I The transformed (or “weighted”) model:
I In the financial wealth example, the auxiliary regression suggests
that the variance changes with income. Since income is positive for
all observations (why is this important?), we hypothesise that
$\operatorname{Var}(u_i \mid inc_i, age_i) = \sigma^2\, inc_i$
I We create $w_i = 1/\sqrt{inc_i}$ [Eviews command: series
w=1/@sqrt(inc)] and we run the weighted regression
I The standard errors are now reliable for inference and for forming
confidence intervals
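The weighted regression can be sketched as follows in Python with NumPy. The data are simulated to satisfy the hypothesised variance function, and the coefficients in the data-generating line are made up for illustration; they are not the estimates from the financial wealth example.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for the regression of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Simulated (hypothetical) data with Var(u | inc, age) = sigma^2 * inc, sigma^2 = 1
rng = np.random.default_rng(2)
n = 400
inc = rng.uniform(10, 100, n)                 # income must be strictly positive
age = rng.uniform(25, 65, n)
u = rng.normal(size=n) * np.sqrt(inc)
y = -20 + 0.8 * inc + 0.5 * age + u           # made-up coefficients
X = np.column_stack([np.ones(n), inc, age])

# Weight series, as in the Eviews command w=1/@sqrt(inc)
w = 1.0 / np.sqrt(inc)

# Weighted regression: multiply y and every regressor, including the constant, by w
b_wls = ols(X * w[:, None], y * w)

# Cross-check with the matrix form (X'WX)^{-1} X'Wy, where W = diag(1 / inc)
W = np.diag(1.0 / inc)
b_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print("WLS via transformed regression:", b_wls)
print("WLS via matrix formula:        ", b_gls)
```

Note that the transformed regression has no ordinary intercept: the column of ones becomes the column $w_i$, and its coefficient is the intercept of the original model. The square root in $w_i$ is also why income has to be positive for every observation.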
I Eviews also has a built-in WLS command under the Options tab of
the equation window. We need to enter the name of the weight
series.
I The only advantage of the Eviews built-in command is that it
produces a set of statistics for the original model