ECON 413 Introductory Econometrics
Chapter 12 – Heteroskedasticity
November 21, 2011
This Lecture
We are going to relax the assumption that the variance of the error term is
constant (homoskedasticity), i.e.
var(u|x1, x2, ..., xk) = σ²
Notice that the assumption of homoskedasticity plays no role in the
unbiasedness and consistency of the OLS estimator
However, it does affect statistical inference:
the t tests, F tests, and confidence intervals are all constructed based on
the assumption of homoskedasticity
Also, if heteroskedasticity is present, the OLS estimator is no longer the
best linear unbiased estimator (BLUE)
Testing for Heteroskedasticity
Consider the following linear model:
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
We would like to test the null hypothesis that:
var (u|x1 , x2 ...xk ) = σ 2
In order to test for a violation of the homoskedasticity assumption, we test
whether u² is related, in expectation, to one or more of the explanatory
variables.
A simple way is to assume a linear function:
u 2 = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + v
We are testing the null hypothesis of:
δ1 = δ2 = ... = δk = 0
Testing for Heteroskedasticity
We never observe the actual errors in the population model, but we do have
estimates of them: the OLS residual ûi is an estimate of the error ui for
observation i.
Thus, we can estimate the equation:
ûi² = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + error
Let R²û² be the R-squared from estimating the above equation.
Then the F statistic is:
F = (R²û² / k) / [(1 − R²û²) / (n − k − 1)] ~ F(k, n−k−1)
The above procedure is called the Breusch-Pagan Test for heteroskedasticity
(BP test)
The Breusch-Pagan Test
1 Estimate the model by OLS as usual. Obtain the squared OLS residuals, û²
2 Run the regression of û² on the independent variables (x1, x2, ..., xk). Keep
the R-squared from this regression, R²û²
3 Form the F statistic and compute the p-value (using the Fk,n−k−1 distribution)
4 If the p-value is sufficiently small, that is, below the chosen significance level,
then we reject the null hypothesis of homoskedasticity
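The four steps above can be sketched in a few lines of Python. This is an illustrative simulation with made-up data (not the slides' dataset); the variable names and the data-generating process are invented for the sketch:

```python
# Breusch-Pagan test, following the four steps above, on simulated data
# where the error variance deliberately grows with the first regressor.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.uniform(1, 5, size=(n, k))
u = rng.normal(scale=np.sqrt(X[:, 0]))       # heteroskedastic errors
y = 1.0 + X @ np.array([2.0, -1.0]) + u      # arbitrary true coefficients

def ols_resid_r2(y, X):
    """OLS with an intercept; return residuals and R-squared."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return resid, r2

uhat, _ = ols_resid_r2(y, X)                 # step 1: OLS residuals
_, r2_aux = ols_resid_r2(uhat**2, X)         # step 2: auxiliary R-squared
F = (r2_aux / k) / ((1 - r2_aux) / (n - k - 1))   # step 3: F statistic
pval = stats.f.sf(F, k, n - k - 1)           # step 4: p-value from F(k, n-k-1)
```

Because the simulated errors are heteroskedastic by construction, the p-value will typically be small here.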
STATA Example
We use the housing price data in HPRICE1.DTA to test for heteroskedasticity in
a simple house price equation
We first estimate the equation by using the levels of all variables:
. reg price lotsize sqrft bdrms
Source SS df MS Number of obs = 88
F( 3, 84) = 57.46
Model 617130.701 3 205710.234 Prob > F = 0.0000
Residual 300723.805 84 3580.0453 R-squared = 0.6724
Adj R-squared = 0.6607
Total 917854.506 87 10550.0518 Root MSE = 59.833
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
lotsize .0020677 .0006421 3.22 0.002 .0007908 .0033446
sqrft .1227782 .0132374 9.28 0.000 .0964541 .1491022
bdrms 13.85252 9.010145 1.54 0.128 -4.065141 31.77018
_cons -21.77031 29.47504 -0.74 0.462 -80.38466 36.84405
STATA Example
To detect heteroskedasticity, we can first inspect the residuals plot to see if the
variance of the error terms change in any systematic ways with the fitted values
The residual plot can be generated by using the command: rvfplot
[Figure: residual plot of residuals against fitted values]
We can see in the residual plot that the errors become more dispersed as the
fitted values grow larger
We should then carry out the Breusch-Pagan test for heteroskedasticity
STATA Example
To test for heteroskedasticity using the Breusch-Pagan test, we first obtain
the residuals from our regression using the command predict res1, residual
Then we obtain the squared residuals using the command gen res1sq = res1*res1
We then regress the squared residuals on the independent variables
The result shows that the p-value for the F statistic is 0.0020, which is
strong evidence against the null hypothesis of homoskedasticity
. predict res1, residual
. gen res1sq = res1*res1
. reg res1sq lotsize sqrft bdrms
Source SS df MS Number of obs = 88
F( 3, 84) = 5.34
Model 701213780 3 233737927 Prob > F = 0.0020
Residual 3.6775e+09 84 43780003.5 R-squared = 0.1601
Adj R-squared = 0.1301
Total 4.3787e+09 87 50330276.7 Root MSE = 6616.6
res1sq Coef. Std. Err. t P>|t| [95% Conf. Interval]
lotsize .2015209 .0710091 2.84 0.006 .0603116 .3427302
sqrft 1.691037 1.46385 1.16 0.251 -1.219989 4.602063
bdrms 1041.76 996.381 1.05 0.299 -939.6526 3023.173
_cons -5522.795 3259.478 -1.69 0.094 -12004.62 959.0348
STATA Example
One way to reduce the problem of heteroskedasticity is to use logarithmic
functional forms for the dependent and independent variables
Consider the same example, but now instead of running the regression against
the levels, we estimate the logged version of the regression:
. reg lprice llotsize lsqrft bdrms
Source SS df MS Number of obs = 88
F( 3, 84) = 50.42
Model 5.15504028 3 1.71834676 Prob > F = 0.0000
Residual 2.86256324 84 .034078134 R-squared = 0.6430
Adj R-squared = 0.6302
Total 8.01760352 87 .092156362 Root MSE = .1846
lprice Coef. Std. Err. t P>|t| [95% Conf. Interval]
llotsize .1679667 .0382812 4.39 0.000 .0918404 .244093
lsqrft .7002324 .0928652 7.54 0.000 .5155597 .8849051
bdrms .0369584 .0275313 1.34 0.183 -.0177906 .0917074
_cons -1.297042 .6512836 -1.99 0.050 -2.592191 -.001893
STATA Example
We can inspect the residual plot:
[Figure: residual plot of residuals against fitted values for the logged
regression]
The heteroskedasticity appears to be much reduced
STATA Example
We can conduct the Breusch-Pagan Test:
. predict res2,residual
. gen res2sq = res2*res2
. reg res2sq llotsize lsqrft bdrms
Source SS df MS Number of obs = 88
F( 3, 84) = 1.41
Model .022620168 3 .007540056 Prob > F = 0.2451
Residual .448717194 84 .005341871 R-squared = 0.0480
Adj R-squared = 0.0140
Total .471337362 87 .005417671 Root MSE = .07309
res2sq Coef. Std. Err. t P>|t| [95% Conf. Interval]
llotsize -.0070156 .0151563 -0.46 0.645 -.0371556 .0231244
lsqrft -.0627368 .0367673 -1.71 0.092 -.1358526 .0103791
bdrms .0168407 .0109002 1.54 0.126 -.0048356 .038517
_cons .509994 .257857 1.98 0.051 -.0027829 1.022771
The p-value of the F test is 0.245, so we fail to reject the null hypothesis
of homoskedasticity in the model with logarithmic functional forms
The White Test for Heteroskedasticity
The assumption of homoskedasticity can in fact be replaced by the weaker
assumption that the squared error u² is uncorrelated with all the independent
variables (xj), their squares (xj²), and all the cross products (xj xh for
j ≠ h)
This is known as the White test for heteroskedasticity
To test a model with k = 3 independent variables, the White test is based on an
estimation of
û² = δ0 + δ1 x1 + δ2 x2 + δ3 x3 + δ4 x1² + δ5 x2² +
δ6 x3² + δ7 x1 x2 + δ8 x1 x3 + δ9 x2 x3 + error
The difference between the White test and the Breusch-Pagan test is that the
former includes the squares and cross products of the independent variables
The White Test
The formulation of the White test above seems to be very complicated
However, we can preserve the spirit of the White test while conserving on the
degrees of freedom by using the OLS fitted values in the test
Recall that the fitted values are defined as:
ŷi = β̂0 + β̂1 x1i + β̂2 x2i + ... + β̂k xki
To obtain a function of all the squares and cross products of the independent
variables, we can square the fitted values
This suggests testing for heteroskedasticity by estimating the equation:
û² = δ0 + δ1 ŷ + δ2 ŷ² + error
where ŷ stands for the fitted values
It is important not to confuse ŷ and y in this equation
We use the fitted values because they are functions of the independent
variables; using y itself does not produce a valid test for heteroskedasticity
The White Test
We can use the F statistic for the null hypothesis H0 : δ1 = 0, δ2 = 0
We can view the above test as a special case of the White test
The procedure for this special case of the White test for heteroskedasticity:
1 Estimate the model by OLS as usual. Obtain the OLS residuals û and the
fitted values ŷ. Compute the squared OLS residuals û² and the squared
fitted values ŷ²
2 Run the regression of û² on ŷ and ŷ²
3 Keep the R-squared from this regression, R²û²
4 Form the F statistic and compute the p-value (using the F(2, n−3)
distribution)
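This special case of the White test can be sketched the same way as the Breusch-Pagan test; again the data below are simulated purely for illustration, not taken from the slides:

```python
# Special-case White test: regress squared residuals on the fitted
# values and their squares, then form the F(2, n-3) statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 300
x1 = rng.uniform(1, 4, n)
x2 = rng.uniform(0, 2, n)
u = rng.normal(scale=0.5 + 0.5 * x1)         # heteroskedastic by construction
y = 1.0 + 0.8 * x1 - 0.3 * x2 + u            # arbitrary true coefficients

# Original regression: keep fitted values and squared residuals
Z = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
yhat = Z @ beta
uhat2 = (y - yhat) ** 2

# Auxiliary regression: uhat^2 on yhat and yhat^2
W = np.column_stack([np.ones(n), yhat, yhat**2])
g, *_ = np.linalg.lstsq(W, uhat2, rcond=None)
e = uhat2 - W @ g
r2 = 1.0 - e @ e / ((uhat2 - uhat2.mean()) @ (uhat2 - uhat2.mean()))

F = (r2 / 2) / ((1 - r2) / (n - 3))          # H0: delta1 = delta2 = 0
pval = stats.f.sf(F, 2, n - 3)
```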
Weighted Least Squares Estimation
If we can detect heteroskedasticity of a specific form, we can use a weighted
least squares method to correct for the non-constant variance
Suppose the heteroskedasticity is known up to a multiplicative constant, i.e.
var(u|x) = σ² h(x)
where h(x) is some function of the explanatory variables that determines the
heteroskedasticity
Since a variance must be positive, h(x) > 0 for all possible values of the
independent variables
Suppose the heteroskedasticity takes the following form:
σi² = var(ui|xi) = σ² h(xi) = σ² hi
Consider the following simple savings function:
savi = β0 + β1 inci + ui
var (ui |inci ) = σ 2 inci
Here, h(x) = h(inc) = inc
i.e. the variance of the error is proportional to the level of income
This means that, as income increases, the variability in savings increases
How can we use the information in the form of heteroskedasticity to estimate
the βj ?
Consider the original equation:
yi = β0 + β1 x1i + β2 x2i + ... + βk xki + ui
Since var(ui|xi) = E(ui²|xi) = σ² hi,
the variance of ui/√hi is σ², i.e.
E[(ui/√hi)²] = E(ui²)/hi = (σ² hi)/hi = σ²
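This step is easy to verify numerically. The sketch below simulates errors with variance σ²hi (the values σ² = 4 and the range of hi are made up for illustration) and checks that dividing by √hi yields a sample variance close to σ²:

```python
# Numerical check: if var(u_i) = sigma^2 * h_i, then u_i / sqrt(h_i)
# has constant variance sigma^2 (here sigma^2 = 4, h_i drawn at random).
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0
h = rng.uniform(0.5, 5.0, size=100_000)      # h(x_i) > 0 as required
u = rng.normal(scale=np.sqrt(sigma2 * h))    # var(u_i) = sigma^2 * h_i
u_star = u / np.sqrt(h)                      # transformed errors

sample_var = u_star.var()                    # should be close to sigma2
```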
We can divide the original equation by √hi to get
yi/√hi = β0/√hi + β1 (x1i/√hi) + β2 (x2i/√hi) + ... + βk (xki/√hi) + (ui/√hi)
or
yi* = β0 xi0* + β1 xi1* + ... + βk xik* + ui*
where xi0* = 1/√hi and the other starred variables denote the corresponding
original variables divided by √hi
Consider our simple savings function; the transformed equation is:
savi/√inci = β0 (1/√inci) + β1 √inci + ui*
Now, with this transformed equation, the assumption of homoskedasticity is
satisfied
The OLS estimators from this transformed equation are the best linear
unbiased estimators (BLUE)
These estimators β0*, β1*, ..., βk* will in general differ from the OLS
estimators in the original equation
They are referred to as weighted least squares (WLS) estimators, a special
case of generalized least squares (GLS)
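As a sketch of the savings example, the transformed no-intercept regression can be estimated on simulated data; the true coefficients (β0 = 5, β1 = 0.1) and the income range here are invented for illustration:

```python
# WLS for sav_i = b0 + b1*inc_i + u_i with var(u|inc) = sigma^2 * inc:
# divide every term by sqrt(inc) and run OLS with no constant.
import numpy as np

rng = np.random.default_rng(3)
n = 500
inc = rng.uniform(10, 100, n)
u = rng.normal(scale=np.sqrt(inc))           # error variance proportional to inc
sav = 5.0 + 0.1 * inc + u                    # invented true b0 = 5, b1 = 0.1

w = np.sqrt(inc)
Xs = np.column_stack([1 / w, w])             # regressors 1/sqrt(inc), sqrt(inc)
beta_wls, *_ = np.linalg.lstsq(Xs, sav / w, rcond=None)
b0, b1 = beta_wls                            # WLS estimates of b0 and b1
```

With the transformed errors homoskedastic, the no-constant OLS on the starred variables recovers β0 and β1.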
STATA Example
In this example, we estimate equations that explain net total financial wealth in
terms of income
nettfa = β0 + β1 inc + u
We use the data on single people (fsize=1 ) for the above regression:
. keep if fsize ==1
(7258 observations deleted)
. reg nettfa inc
Source SS df MS Number of obs = 2017
F( 1, 2015) = 181.60
Model 377482.064 1 377482.064 Prob > F = 0.0000
Residual 4188482.98 2015 2078.6516 R-squared = 0.0827
Adj R-squared = 0.0822
Total 4565965.05 2016 2264.86361 Root MSE = 45.592
nettfa Coef. Std. Err. t P>|t| [95% Conf. Interval]
inc .8206815 .0609 13.48 0.000 .7012479 .940115
_cons -10.57095 2.060678 -5.13 0.000 -14.61223 -6.529671
STATA Example
Suppose we assume the error variance takes the form var(u|inc) = σ² inc
We can then run the following WLS regression:
nettfa/√inc = β0 (1/√inc) + β1 √inc + u/√inc
. gen nettfa1 = nettfa/sqrt(inc)
. gen invincsq = 1/sqrt(inc)
. gen inc1 = sqrt(inc)
. reg nettfa1 invincsq inc1, noc
Source SS df MS Number of obs = 2017
F( 2, 2015) = 138.24
Model 14410.242 2 7205.12101 Prob > F = 0.0000
Residual 105019.471 2015 52.1188444 R-squared = 0.1207
Adj R-squared = 0.1198
Total 119429.713 2017 59.2115585 Root MSE = 7.2193
nettfa1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
invincsq -9.580702 1.653284 -5.79 0.000 -12.82303 -6.338378
inc1 .7870523 .0634814 12.40 0.000 .6625562 .9115484