Chapter 8: Heteroskedasticity
Introduction
The homoskedasticity assumption states that the variance of the unobserved error, u, conditional on the explanatory variables, is constant.
In Chapters 4 and 5, we saw that homoskedasticity is needed to justify the usual t tests, F tests, and confidence intervals for OLS estimation of the linear regression model, even with large sample sizes.
The homoskedasticity assumption fails whenever the variance of the unobserved factors changes across different segments of the population; this is called heteroskedasticity.
In this chapter, we discuss the available remedies when heteroskedasticity occurs, and we also show how to test for its presence. We begin by briefly reviewing the consequences of heteroskedasticity for ordinary least squares estimation.
8.1 Consequences of heteroskedasticity for OLS

The unconditional error variance is unaffected by heteroskedasticity, which refers to the conditional error variance. The interpretation of the R-squared is therefore unchanged, since

$$R^2 \approx 1 - \frac{\sigma_u^2}{\sigma_y^2}$$

involves only the unconditional variances.
# Load packages
library(dplyr)
library(ggplot2)
library(wooldridge)

# Load the sample
data("saving")

# Only use positive values of saving that are smaller than income,
# and restrict the sample to incomes below 20,000
saving <- saving %>%
  filter(sav > 0,
         inc < 20000,
         sav < inc)

# Plot savings against income with a fitted OLS line
ggplot(saving, aes(x = inc, y = sav)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Annual income", y = "Annual savings")
8.2 Heteroskedasticity-robust inference after OLS estimation
Formulas for OLS standard errors and related statistics have been developed that are robust to heteroskedasticity of unknown form. These formulas are only valid in large samples.
Formula for the heteroskedasticity-robust OLS standard error (White/Huber/Eicker standard errors): it involves the squared residuals from the regression and from a regression of $x_j$ on all of the other explanatory variables:
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^{n} \hat{r}_{ij}^{2}\, \hat{u}_i^{2}}{\mathrm{SSR}_j^{2}}$$

where $\hat{r}_{ij}$ is the $i$-th residual from regressing $x_j$ on all other explanatory variables, and $\mathrm{SSR}_j$ is the sum of squared residuals from this regression.
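In practice these standard errors do not have to be computed by hand. Below is a minimal sketch using the sandwich and lmtest packages (an assumption: these packages are not used elsewhere in these slides); "HC0" corresponds to the formula above, while "HC1" adds a degrees-of-freedom correction.

library(lmtest)
library(sandwich)
data(hprice1, package = "wooldridge")

# OLS fit, then t tests with heteroskedasticity-robust standard errors
m <- lm(price ~ lotsize + sqrft + bdrms, hprice1)
coeftest(m, vcov = vcovHC(m, type = "HC0"))  # White/Huber/Eicker
coeftest(m, vcov = vcovHC(m, type = "HC1"))  # with df correction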
a) Heteroskedasticity-Robust LM statistic
Not all regression packages compute F statistics that are robust to heteroskedasticity. It is therefore sometimes convenient to have a way of obtaining a test of multiple exclusion restrictions that is robust to heteroskedasticity and does not require a particular kind of econometric software: the heteroskedasticity-robust LM statistic.
1. Obtain the residuals $\tilde{u}$ from the restricted model.
2. Regress each of the independent variables excluded under the null on all of the included independent variables; if there are q excluded variables, this leads to q sets of residuals ($\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_q$).
3. Find the products between each $\tilde{r}_j$ and $\tilde{u}$.
4. Run the regression of 1 on $\tilde{r}_1\tilde{u}, \tilde{r}_2\tilde{u}, \ldots, \tilde{r}_q\tilde{u}$, without an intercept. The heteroskedasticity-robust LM statistic is $LM = n - \mathrm{SSR}_1$, where $\mathrm{SSR}_1$ is just the usual sum of squared residuals from this final regression.
Under $H_0$, LM is distributed approximately as $\chi^2_q$; a code sketch of these steps follows.
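As a sketch of the procedure in base R, consider testing H0: the coefficients on sqrft and bdrms are both zero (q = 2) in the housing price equation; this choice of restrictions is illustrative, not from the slides.

# Heteroskedasticity-robust LM test of excluding sqrft and bdrms
data(hprice1, package = "wooldridge")
n <- nrow(hprice1)

# Step 1: residuals from the restricted model
utilde <- resid(lm(price ~ lotsize, hprice1))

# Step 2: regress each excluded variable on the included ones
r1 <- resid(lm(sqrft ~ lotsize, hprice1))
r2 <- resid(lm(bdrms ~ lotsize, hprice1))

# Steps 3-4: regress 1 on the products r_j * utilde, without an intercept
final <- lm(rep(1, n) ~ 0 + I(r1 * utilde) + I(r2 * utilde))
LM <- n - sum(resid(final)^2)   # LM = n - SSR1
pval <- 1 - pchisq(LM, df = 2)  # q = 2 restrictions
c(LM = LM, p.value = pval)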
8.3 Testing for Heteroskedasticity
Breusch-Pagan test for heteroskedasticity
$$H_0: \mathrm{Var}(u \mid x_1, x_2, \ldots, x_k) = \mathrm{Var}(u \mid \mathbf{x}) = \sigma^2$$

Since $E(u \mid \mathbf{x}) = 0$,

$$\mathrm{Var}(u \mid \mathbf{x}) = E(u^2 \mid \mathbf{x}) - \left[E(u \mid \mathbf{x})\right]^2 = E(u^2 \mid \mathbf{x})$$
$$\Rightarrow E(u^2 \mid x_1, \ldots, x_k) = E(u^2) = \sigma^2$$

for the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
Regress squared residuals on all independent variables and test whether this regression has explanatory power
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k + \text{error}$$
$$H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$$
A large test statistic (equivalently, a high R-squared in this auxiliary regression) is evidence against the null hypothesis that the expected value of $u^2$ is unrelated to the explanatory variables.
$$F = \frac{R^2_{\hat{u}^2} / k}{\left(1 - R^2_{\hat{u}^2}\right) / (n - k - 1)}$$
An alternative test statistic is the Lagrange multiplier (LM) statistic. Again, high values of the test statistic (a high R-squared) lead to rejection of the null hypothesis.
$$LM = n \cdot R^2_{\hat{u}^2} \sim \chi^2_k$$
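The same test is also available pre-packaged; a one-line sketch assuming the lmtest package. Note that bptest() reports the studentized (Koenker) variant by default, so its statistic can differ slightly from n times the R-squared.

library(lmtest)
data(hprice1, package = "wooldridge")

# Breusch-Pagan test: squared residuals regressed on all regressors
bptest(lm(price ~ lotsize + sqrft + bdrms, hprice1))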
Example: Heteroskedasticity in housing price equations
# Load packages and data
library(fixest)
library(modelsummary)
data(hprice1, package = 'wooldridge')

# Regression model for price
model_a <- feols(price ~ lotsize + sqrft + bdrms, hprice1)

# Clustered or robust uncertainty estimates are obtained by modifying the
# vcov argument, which accepts a string or a vector of strings
modelsummary(model_a, vcov = c("classical", "robust", "stata"), gof_omit = ".*")
[Table: the price regression reported three times, with classical, heteroskedasticity-robust, and Stata-style standard errors.]
# The %<>% assignment pipe comes from magrittr
library(magrittr)
data(hprice1, package = 'wooldridge')

# Regression model for price
model_0 <- lm(price ~ lotsize + sqrft + bdrms, hprice1)
modelsummary(model_0, output = "markdown")

# Store the OLS residuals for the diagnostic plots below
hprice1 %<>% mutate(uhat = resid(model_0))
# Graph of residuals against independent variable
ggplot(data = hprice1, mapping = aes(x = sqrft, y = uhat)) +
theme_bw() +
geom_point() +
geom_hline(yintercept = 0, col = 'red') +
labs(y = 'Residuals', x = 'Square feet, sqrft')
# Graph of residuals against fitted values
hprice1 %<>% mutate(yhat = fitted(model_0))
ggplot(data = hprice1, mapping = aes(x = yhat, y = uhat)) +
theme_bw() +
geom_point() +
geom_hline(yintercept = 0, col = 'red') +
labs(y = 'Residuals', x = 'Fitted values')
# Regression model for log(price)
model_1 <- lm(lprice ~ llotsize + lsqrft + bdrms, hprice1)
hprice1 %<>% mutate(uhat1 = resid(model_1))
#modelsummary(model_1, output = "markdown")
# Graph of residuals against independent variable
ggplot(hprice1) +
theme_bw() +
geom_point(aes(x = lsqrft, y = uhat1)) +
geom_hline(yintercept = 0, col = 'red') +
labs(y = 'Residuals', x = 'Log square feet, lsqrft')
# Graph of residuals against fitted values
hprice1 %<>% mutate(yhat1 = fitted(model_1))
ggplot(data = hprice1, mapping = aes(x = yhat1, y = uhat1)) +
theme_bw() +
geom_point() +
geom_hline(yintercept = 0, col = 'red') +
labs(y = 'Residuals', x = 'Fitted values')
data(hprice1, package = 'wooldridge')
model_0 <- lm(price ~ lotsize + sqrft + bdrms, hprice1)

# Get residuals (uhat) and predicted values (yhat), and square them
hprice1 %<>% mutate(uhat = resid(model_0), yhat = fitted(model_0))
hprice1 %<>% mutate(uhatsq = uhat^2, yhatsq = yhat^2)

# Regression for the Breusch-Pagan test
model_BP <- lm(uhatsq ~ lotsize + sqrft + bdrms, hprice1)
#summary(model_BP)

# Number of independent variables k1
(k1 <- model_BP$rank - 1)
## [1] 3
# F-test and LM-test for heteroscedasticity
(r2 <- summary(model_BP)$r.squared) # R-squared
## [1] 0.1601407
(n <- nobs(model_BP)) # Number of observations
## [1] 88
(F_stat <- (r2/k1) / ((1 - r2)/(n - k1 - 1))) # F statistic
## [1] 5.338919
(pF <- 1 - pf(F_stat, k1, n - k1 - 1)) # p-value of the F-test
## [1] 0.002047744
(LM <- n * r2) # LM statistic
## [1] 14.09239
(pLM <- 1 - pchisq(LM, k1)) # p-value of the LM-test
## [1] 0.00278206
a) The White Test for Heteroskedasticity
Regress squared residuals on all independent variables, their squares, and all interaction terms
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + \text{error}$$
The White test detects more general deviations from homoskedasticity than the Breusch-Pagan test.
$$H_0: \delta_1 = \delta_2 = \cdots = \delta_9 = 0$$
$$LM = n \cdot R^2_{\hat{u}^2} \sim \chi^2_9$$
Disadvantage: including all squares and interactions leads to a large number of estimated parameters (e.g., k = 6 leads to 27 parameters to be estimated).
# White test ----------------------------------------------------------------
# Generate squares and interactions of the independent variables
hprice1 %<>% mutate(lotsizesq = lotsize^2,
                    sqrftsq = sqrft^2,
                    bdrmssq = bdrms^2,
                    lotsizeXsqrft = lotsize * sqrft,
                    lotsizeXbdrms = lotsize * bdrms,
                    sqrftXbdrms = sqrft * bdrms)

# Regression for the White test
model_White <- lm(uhatsq ~ lotsize + sqrft + bdrms + lotsizesq + sqrftsq +
                    bdrmssq + lotsizeXsqrft + lotsizeXbdrms + sqrftXbdrms, hprice1)
#summary(model_White)

(k2 <- model_White$rank - 1) # Number of independent variables k2
## [1] 9
# F-test and LM-test for heteroscedasticity
(r2 <- summary(model_White)$r.squared) # R-squared
## [1] 0.3833143
(F_stat <- (r2/k2) / ((1 - r2)/(n - k2 - 1))) # F statistic
## [1] 5.386953
(pF <- 1 - pf(F_stat, k2, n - k2 - 1)) # p-value of the F-test
## [1] 1.012939e-05
(LM <- n * r2) # LM statistic
## [1] 33.73166
(pLM <- 1 - pchisq(LM, k2)) # p-value of the LM-test
## [1] 9.95294e-05
Alternative form of the White test
$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + \text{error}$$
This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of y and its square implicitly contain all of these terms.
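Before the manual computation below, note that this form of the test can also be run in one line with bptest() by supplying the fitted values and their squares as the variance regressors; a sketch assuming the lmtest package (again, the default is the studentized variant):

library(lmtest)

# Alternative (special-case) White test via the varformula argument
bptest(model_0, ~ fitted(model_0) + I(fitted(model_0)^2), data = hprice1)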
# Alternative White test -----------------------------------------------------
# Regression for the alternative White test
model_Alt <- lm(uhatsq ~ yhat + yhatsq, hprice1)

# Get residuals and predicted values of this regression, and square them
hprice1 %<>% mutate(uhat1 = resid(model_Alt),
                    yhat1 = fitted(model_Alt),
                    uhat1sq = uhat1^2,
                    yhat1sq = yhat1^2)

(k3 <- model_Alt$rank - 1) # Number of independent variables k3
## [1] 2
# F-test and LM-test for heteroscedasticity
(r2 <- summary(model_Alt)$r.squared) # R-squared
## [1] 0.1848684
(n <- nobs(model_Alt)) # Number of observations
## [1] 88
(F_stat <- (r2/k3) / ((1 - r2)/(n - k3 - 1))) # F statistic
## [1] 9.638819
(pF <- 1 - pf(F_stat, k3, n - k3 - 1)) # p-value of the F-test
## [1] 0.0001687248
(LM <- n * r2) # LM statistic
## [1] 16.26842
(pLM <- 1 - pchisq(LM, k3)) # p-value of the LM-test
## [1] 0.0002933311
price <- list(
  "Model for price" = model_0,
  "Breusch-Pagan Test (uhatsq)" = model_BP,
  "White Test (uhatsq)" = model_White,
  "Alternative White Test (uhatsq)" = model_Alt)

# Create one table
modelsummary(price, stars = TRUE, fmt = 2,
             gof_omit = "R2|R2 Within|AIC|BIC|Log.Lik.|R2 Pseudo")
|             | Model for price | Breusch-Pagan Test (uhatsq) | White Test (uhatsq) | Alternative White Test (uhatsq) |
|-------------|-----------------|-----------------------------|---------------------|---------------------------------|
| (Intercept) | −21.77 (29.48)  | −5522.79+ (3259.48)         | 15626.24 (11369.41) | 19071.59* (8876.23)             |
| lotsize     | 0.00** (0.00)   | 0.20** (0.07)               | −1.86** (0.64)      |                                 |
| sqrft       | 0.12*** (0.01)  | 1.69 (1.46)                 | −2.67 (8.66)        |                                 |
| bdrms       | 13.85 (9.01)    | 1041.76 (996.38)            | −1982.84 (5438.48)  |                                 |
| lotsizesq   |                 |                             | 0.00 (0.00)         |                                 |
| sqrftsq     |                 |                             | 0.00 (0.00)         |                                 |
| bdrmssq     |                 |                             | 289.75 (758.83)     |                                 |
| yhat        |                 |                             |                     | −119.66* (53.32)                |
| yhatsq      |                 |                             |                     | 0.21** (0.07)                   |
| Num.Obs.    | 88              | 88                          | 88                  | 88                              |
| R2          |                 |                             |                     | 0.185                           |
| F           |                 |                             |                     | 9.639                           |
| RMSE        |                 |                             |                     | 6480.06                         |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Heteroskedasticity tests summary for the price regressions
All tests indicate heteroskedasticity in the equation for price; the regression for price needs a correction for heteroskedasticity.
Heteroskedasticity tests summary for the log(price) regressions
None of the tests rejects homoskedasticity for log(price); the regression for log(price) does not need a correction for heteroskedasticity.
8.4 Weighted Least Squares Estimation
Before the development of heteroskedasticity-robust statistics, the response to a finding of heteroskedasticity was to specify its form and use a weighted least squares method, which we develop in this section.
If we have correctly specified the form of the variance (as a function of the explanatory variables), then weighted least squares (WLS) is more efficient than OLS, and WLS leads to new t and F statistics that have t and F distributions.
We will also discuss the implications of using the wrong form of the variance in the WLS procedure.
a) The Heteroskedasticity Is Known Up to a Multiplicative Constant
$$\mathrm{Var}(u_i \mid \mathbf{x}_i) = \sigma^2 h(\mathbf{x}_i), \qquad h(\mathbf{x}_i) = h_i > 0$$
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i$$

Dividing the equation by $\sqrt{h_i}$ gives

$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \cdots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$

Transformed model:

$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \cdots + \beta_k x_{ik}^* + u_i^*$$
Example: Savings and income
$$sav_i = \beta_0 + \beta_1 inc_i + u_i, \qquad \mathrm{Var}(u_i \mid inc_i) = \sigma^2 inc_i$$
$$\frac{sav_i}{\sqrt{inc_i}} = \beta_0 \frac{1}{\sqrt{inc_i}} + \beta_1 \frac{inc_i}{\sqrt{inc_i}} + u_i^*$$

The transformed error is homoskedastic:

$$E(u_i^{*2} \mid \mathbf{x}_i) = E\!\left[\left(\frac{u_i}{\sqrt{h_i}}\right)^{2} \,\middle|\, \mathbf{x}_i\right] = \frac{E(u_i^2 \mid \mathbf{x}_i)}{h_i} = \frac{\sigma^2 h_i}{h_i} = \sigma^2$$
If the other Gauss-Markov assumptions hold as well, OLS applied to the transformed model is the best linear
unbiased estimator.
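A minimal WLS sketch for this example, reusing the saving data loaded at the start of the chapter; the weights 1/inc follow from the assumed variance function.

# WLS for the savings equation: lm() minimizes the sum of w_i * residual_i^2,
# so weights = 1/inc downweights high-income (high-variance) observations
wls_sav <- lm(sav ~ inc, data = saving, weights = 1/inc)
summary(wls_sav)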
OLS in the transformed model is weighted least squares (WLS)
$$\min_{b_0, \ldots, b_k} \sum_{i=1}^{n} \left( \frac{y_i}{\sqrt{h_i}} - b_0 \frac{1}{\sqrt{h_i}} - b_1 \frac{x_{i1}}{\sqrt{h_i}} - \cdots - b_k \frac{x_{ik}}{\sqrt{h_i}} \right)^{2}$$
$$= \min_{b_0, \ldots, b_k} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik} \right)^{2} / h_i$$
Observations with a large variance get a smaller weight in the optimization problem: they are less informative than observations with a small variance and therefore should receive less weight.
# When the heteroscedasticity form is known, var(u|x) = (sigma^2)*(sqrft),
# use WLS with weights = 1/sqrft

# WLS: estimate the model with weights = 1/sqrft
model_WLS1 <- lm(price ~ lotsize + sqrft + bdrms,
                 data = hprice1, weights = 1/sqrft)

# Equivalently, multiply all variables and the constant by 1/sqrt(sqrft)
hprice1 %<>% mutate(pricestar = price/sqrt(sqrft),
                    lotsizestar = lotsize/sqrt(sqrft),
                    sqrftstar = sqrft/sqrt(sqrft),
                    bdrmsstar = bdrms/sqrt(sqrft),
                    constantstar = 1/sqrt(sqrft))

# WLS: estimate the model with the transformed variables by OLS (no intercept)
model_WLS2 <- lm(pricestar ~ 0 + constantstar + lotsizestar + sqrftstar + bdrmsstar,
                 hprice1)
|              | Model 1          | Model 2          |
|--------------|------------------|------------------|
| (Intercept)  | 4.199 (29.698)   |                  |
| lotsize      | 0.002** (0.001)  |                  |
| sqrft        | 0.118*** (0.014) |                  |
| bdrms        | 10.607 (8.659)   |                  |
| constantstar |                  | 4.199 (29.698)   |
| lotsizestar  |                  | 0.002** (0.001)  |
| sqrftstar    |                  | 0.118*** (0.014) |
| bdrmsstar    |                  | 10.607 (8.659)   |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Important special case of heteroskedasticity
If the observations are reported as averages at the city/county/state/country/firm level, they should be weighted by the size of the unit.
If errors are homoskedastic at the individual level, WLS with weights equal to firm size $m_i$ should be used.
If the assumption of homoskedasticity at the individual level is not exactly right, one can calculate robust standard errors after WLS (i.e., for the transformed model), as in the sketch below.
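A hypothetical sketch with simulated data (all names are illustrative, not from the chapter's data sets): WLS with weights equal to group size, followed by robust standard errors for the transformed model.

library(lmtest)
library(sandwich)
set.seed(42)

# Simulated city-level averages: averaging over m_i individuals shrinks
# the error variance to sigma^2 / m_i
citydata <- data.frame(m = sample(50:500, 40, replace = TRUE))
citydata$xbar <- rnorm(40)
citydata$ybar <- 1 + 2 * citydata$xbar + rnorm(40, sd = 1 / sqrt(citydata$m))

# WLS with weights equal to group size m_i
wls <- lm(ybar ~ xbar, data = citydata, weights = m)

# Robust SEs in case individual-level homoskedasticity is not exactly right
coeftest(wls, vcov = vcovHC(wls, type = "HC1"))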
b) The heteroskedasticity function must be estimated: Feasible GLS
In many cases it is difficult to specify the function h(x), so we use the data to estimate it, obtaining $\hat{h}(\mathbf{x})$. Assuming $\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k)$ leads to a linear model for the log of the squared error:

$$\Rightarrow \log(u^2) = \alpha_0 + \delta_1 x_1 + \cdots + \delta_k x_k + e$$

Replacing the unobserved errors with the OLS residuals, estimate

$$\log(\hat{u}^2) = \hat{\alpha}_0 + \hat{\delta}_1 x_1 + \cdots + \hat{\delta}_k x_k + \text{error}$$
$$\Rightarrow \hat{h}_i = \exp(\hat{\alpha}_0 + \hat{\delta}_1 x_{i1} + \cdots + \hat{\delta}_k x_{ik})$$
Steps to compute Feasible GLS
1. Run the regression of y on $x_1, x_2, \ldots, x_k$ and obtain the residuals, $\hat{u}$.
2. Create $\log(\hat{u}^2)$ by first squaring the OLS residuals and then taking the natural log.
3. Run the regression of $\log(\hat{u}^2)$ on $x_1, x_2, \ldots, x_k$ and obtain the fitted values, $\hat{g}$.
4. Exponentiate the fitted values: $\hat{h} = \exp(\hat{g})$.
5. Estimate the original equation by WLS, using weights $1/\hat{h}$.

If instead we first transform all variables and run OLS, each variable gets multiplied by $1/\sqrt{\hat{h}_i}$, including the intercept.
Example: Demand for cigarettes
Estimation by OLS
Example: Demand for cigarettes
Estimation by FGLS
Discussion:
The income elasticity is now statistically significant; other coefficients are also more precisely estimated
(without changing qualitative results).
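A hedged sketch of this FGLS estimation, assuming the example uses the smoke data set from the wooldridge package (variable names as in that data set):

# OLS for cigarette demand
data(smoke, package = "wooldridge")
ols <- lm(cigs ~ lincome + lcigpric + educ + age + agesq + restaurn, smoke)

# FGLS: regress log(uhat^2) on the regressors, then weight by 1/hhat
g <- log(resid(ols)^2)
ghat <- fitted(lm(g ~ lincome + lcigpric + educ + age + agesq + restaurn, smoke))
fgls <- lm(cigs ~ lincome + lcigpric + educ + age + agesq + restaurn,
           data = smoke, weights = 1 / exp(ghat))
summary(fgls)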
# Feasible GLS (FGLS) --------------------------------------------------------
# When the heteroscedasticity form is not known, assume
# var(u|x) = sigma^2 * exp(delta0 + delta1*lotsize + delta2*sqrft + delta3*bdrms),
# estimate hhat, and use WLS with weights = 1/hhat

# Estimate the variance function to obtain hhat
# model_0 <- lm(price ~ lotsize + sqrft + bdrms, hprice1)
summary(model_0)
hprice1 %<>% mutate(u = resid(model_0),
                    g = log(u^2))
model_g <- lm(g ~ lotsize + sqrft + bdrms, hprice1)
hprice1 %<>% mutate(ghat = fitted(model_g),
                    hhat = exp(ghat))

# FGLS: estimate the model using WLS with weights = 1/hhat
model_FGLS1 <- lm(price ~ lotsize + sqrft + bdrms,
                  data = hprice1,
                  weights = 1/hhat)
#summary(model_FGLS1)
##
## Call:
## lm(formula = price ~ lotsize + sqrft + bdrms, data = hprice1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -120.026 -38.530 -6.555 32.323 209.376
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.177e+01 2.948e+01 -0.739 0.46221
## lotsize 2.068e-03 6.421e-04 3.220 0.00182 **
## sqrft 1.228e-01 1.324e-02 9.275 1.66e-14 ***
## bdrms 1.385e+01 9.010e+00 1.537 0.12795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.83 on 84 degrees of freedom
## Multiple R-squared: 0.6724, Adjusted R-squared: 0.6607
## F-statistic: 57.46 on 3 and 84 DF, p-value: < 2.2e-16
# Multiply all variables and the constant by 1/sqrt(hhat)
hprice1 %<>% mutate(pricestar1 = price/sqrt(hhat),
                    lotsizestar1 = lotsize/sqrt(hhat),
                    sqrftstar1 = sqrft/sqrt(hhat),
                    bdrmsstar1 = bdrms/sqrt(hhat),
                    constantstar1 = 1/sqrt(hhat))

# FGLS: estimate the model with the transformed variables by OLS
model_FGLS2 <- lm(pricestar1 ~ 0 + constantstar1 + lotsizestar1 + sqrftstar1 +
                    bdrmsstar1, hprice1)
#summary(model_FGLS2)
fgls <- list(model_FGLS1, model_FGLS2)
modelsummary(fgls, stars = TRUE, fmt = 3,
             gof_omit = "R2|R2 Within|AIC|BIC|Log.Lik.|R2 Pseudo")
|               | Model 1          | Model 2          |
|---------------|------------------|------------------|
| (Intercept)   | 45.912 (30.824)  |                  |
| lotsize       | 0.004** (0.001)  |                  |
| sqrft         | 0.092*** (0.015) |                  |
| bdrms         | 6.175 (8.894)    |                  |
| constantstar1 |                  | 45.912 (30.824)  |
| lotsizestar1  |                  | 0.004** (0.001)  |
| sqrftstar1    |                  | 0.092*** (0.015) |
| bdrmsstar1    |                  | 6.175 (8.894)    |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
c) What if the assumed heteroskedasticity function is wrong?
If the heteroskedasticity function is misspecified, WLS is still consistent under MLR.1–MLR.4, but robust standard errors should be computed (see the sketch below).
If OLS and WLS produce very different estimates, this typically indicates that some other assumption (e.g., MLR.4) is wrong.
If there is strong heteroskedasticity, it is still often better to use a wrong form of heteroskedasticity in order to increase efficiency.
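A short sketch of robust inference after WLS/FGLS, assuming the sandwich and lmtest packages: this recomputes the FGLS t statistics with standard errors that remain valid even if hhat is misspecified.

library(lmtest)
library(sandwich)

# Robust standard errors for the (possibly misspecified) FGLS fit
coeftest(model_FGLS1, vcov = vcovHC(model_FGLS1, type = "HC1"))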