Chapter 8 Heteroskedasticity
What is heteroskedasticity?
Variability in the variance of the random error across observations is known as heteroskedasticity. One of the assumptions of least squares estimation is a constant error variance $\sigma^2$; if this assumption is violated, the ordinary least squares estimator no longer has all of its desirable properties.
Further understanding
Recall the example of household income and food expenditure. Different income levels lead to different food expenditures. Food expenditures are assumed to scatter around the mean expenditure with a certain variance. If this variance is the same at every level of income, the data satisfy the property of homoskedasticity. If, on the other hand, the uncertainty in food expenditure varies with income, the corresponding distributions of food expenditure have different variances; this situation is characterized as heteroskedasticity.
Implication of heteroskedasticity
As an intuition, think of the possibility of more variability in the food expenditure of high-income people relative to the low-income group. The low-income group's expenditure is largely determined by its income, but high-income people's expenditure is attributable to many factors other than income. Naturally, the expenditures of the high-income group would not be tightly clustered around the mean expenditure, although the low-income group's expenditure is.
Heteroskedasticity implications
• The existence of heteroskedasticity means that the probability of obtaining larger absolute values of $e_i$ is higher at high incomes than at low incomes.
• High-income people's expenditure is influenced by factors other than income, which leads to larger $e_i$ at high incomes; this eventually means a higher variance of $e_i$ at high income levels.
The above statements reflect the fact that the variance of $e_i$, and therefore of $y_i$, depends on $x_i$.
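To make this intuition concrete, here is a minimal simulation sketch (not from the text; all parameter values are hypothetical) in which the error variance grows with income, so the spread of the errors is visibly larger for the high-income group:

```python
# A minimal simulation (not from the text; parameter values are hypothetical)
# of food-expenditure data whose error variance grows with income x.
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(5, 35, n)                   # income levels
beta1, beta2, sigma2 = 80.0, 10.0, 4.0      # hypothetical parameters
e = rng.normal(0.0, np.sqrt(sigma2 * x))    # sd rises with x: heteroskedastic
y = beta1 + beta2 * x + e                   # food expenditure

# The spread of the errors is wider at high incomes than at low incomes:
print(e[x < 15].std(), e[x > 25].std())     # high-income std is larger
```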
Formal treatment of heteroskedasticity
Let $e_i$ be the difference between individual $i$'s food expenditure and the mean expenditure, i.e., $e_i = y_i - E(y_i)$, where

$E(y_i) = \beta_1 + \beta_2 x_i, \qquad y_i = \beta_1 + \beta_2 x_i + e_i \qquad \dots (8.3)$

[Figure: conditional pdfs $f(y|x_1)$ and $f(y|x_2)$ around the regression line $E(y) = \beta_1 + \beta_2 x$.]
Graph of heteroskedasticity
[Figure: conditional pdfs $f(y|x_1)$ and $f(y|x_2)$ around the regression line $E(y) = \beta_1 + \beta_2 x$, with the spread increasing in $x$.]
Implications
[Figure: conditional pdfs $f(y_1|x_1)$ and $f(y_2|x_2)$ around the regression line $E(y) = \beta_1 + \beta_2 x$.]
• At $x_1$, the probability density function $f(y_1|x_1)$ is such that $y_1$ will be close to $E(y_1)$ with high probability.
• For the higher value $x_2$, the pdf $f(y_2|x_2)$ is more spread out; we are less certain about where $y_2$ might fall.
• When homoskedasticity exists, the probability density function for the errors does not change as $x$ changes.
Questions to be addressed
• What are the consequences of
heteroskedasticity?
• Is there a better estimation technique?
• How to detect the existence of
heteroskedasticity?
[Figure: scatter of food expenditure (0–400) against INCOME (0–35) with fitted line.]
The observations are more spread out at higher incomes, i.e., mean $y$ is further apart from individual $y$ at higher income levels. The variance of $y$, hence of $e$, is higher when we take account of the higher-income group; this characterizes heteroskedasticity.
Consequences of heteroskedasticity for
least squares estimation
1. Under heteroskedasticity, the standard
errors usually computed for the LSE are
incorrect. Confidence intervals and
hypothesis tests that use these standard
errors may be misleading.
But for the simple linear regression model $y_i = \beta_1 + \beta_2 x_i + e_i$, where $\mathrm{var}(e_i) = \sigma_i^2$, econometrician Hal White proposed

$\mathrm{var}(\hat\beta_2) = \sum_i w_i^2 \sigma_i^2 = \dfrac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2} \qquad \dots (8.8)$

where $w_i = \dfrac{x_i - \bar{x}}{\sum_i (x_i - \bar{x})^2}$.

The resulting standard errors have become known as White's heteroskedasticity-consistent standard errors, or heteroskedasticity-robust, or simply robust standard errors.
White's standard error
To obtain the White standard error for $\hat\beta_2$ corresponding to (8.8), we obtain the least squares residuals $\hat{e}_i = y_i - \hat\beta_1 - \hat\beta_2 x_i$ and replace $\sigma_i^2$ in (8.8) with the squared least squares residuals. The White variance estimator is given by

$\widehat{\mathrm{var}}(\hat\beta_2) = \sum_i w_i^2 \hat{e}_i^2 = \dfrac{\sum_i (x_i - \bar{x})^2 \hat{e}_i^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2} \qquad \dots (8.9)$
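As an illustration, here is a sketch of how (8.9) could be computed; the function name `white_se_slope` and the data arrays `x`, `y` are mine, and statsmodels' `HC0` covariance implements the same estimator for this simple model:

```python
# A sketch of White's variance estimator (8.9) for the slope of a simple
# regression (numpy arrays x, y assumed), with a library cross-check.
import numpy as np
import statsmodels.api as sm

def white_se_slope(x, y):
    """White heteroskedasticity-robust standard error of beta2-hat."""
    X = sm.add_constant(x)
    ehat = sm.OLS(y, X).fit().resid          # least squares residuals
    dx = x - x.mean()
    var_b2 = np.sum(dx**2 * ehat**2) / np.sum(dx**2) ** 2   # equation (8.9)
    return np.sqrt(var_b2)

# Cross-check: statsmodels' HC0 robust covariance gives the same slope s.e.
# sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC0").bse
```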
• Manipulation:

$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\dfrac{e_i}{\sqrt{x_i}}\right) = \dfrac{1}{x_i}\,\mathrm{var}(e_i) = \dfrac{1}{x_i}\,\sigma^2 x_i = \sigma^2$
Estimate the transformed model
You are now ready to apply the LS technique to obtain estimates of $\beta_1$ and $\beta_2$ using the observable asterisked variables; these are the generalized least squares (GLS) estimators.
Remember!
1. The model no longer contains an intercept term.
2. The transformed model is linear in the unknown parameters $\beta_1$ and $\beta_2$. These are the original parameters we are interested in; the transformation does not affect them in any way.
3. The transformed model satisfies the conditions of the Gauss-Markov theorem, and the LS estimators defined in terms of the transformed variables are BLUE.
GLS are WLS
The GLS estimator can be viewed as a weighted least squares (WLS) estimator. The least squares estimator yields the values of $\beta_1$ and $\beta_2$ that minimize the sum of squared errors. In this case, we are minimizing the sum of squared transformed errors, given by

$\sum_i e_i^{*2} = \sum_i \dfrac{e_i^2}{x_i} = \sum_i \left(x_i^{-1/2} e_i\right)^2$

The errors are weighted by $x_i^{-1/2}$, the reciprocal of $\sqrt{x_i}$.

Note!
- When $x_i$ is small, the data contain more information about the regression function and the observations are weighted heavily.
- When $x_i$ is large, the data contain less information about the regression function and the observations are weighted lightly.
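A minimal sketch of this weighting, assuming $\mathrm{var}(e_i) = \sigma^2 x_i$ and data arrays `x`, `y` (names are mine): transforming by $1/\sqrt{x_i}$ and running OLS is equivalent to WLS with weights $1/x_i$.

```python
# A sketch of GLS-as-WLS under var(e_i) = sigma^2 * x_i (arrays x, y assumed):
# dividing by sqrt(x_i) and running OLS equals WLS with weights 1/x_i.
import numpy as np
import statsmodels.api as sm

def gls_via_transform(x, y):
    rx = np.sqrt(x)
    ystar = y / rx                            # y_i* = y_i / sqrt(x_i)
    Xstar = np.column_stack((1.0 / rx,        # x_i1* = 1 / sqrt(x_i)
                             x / rx))         # x_i2* = sqrt(x_i)
    return sm.OLS(ystar, Xstar).fit()         # no intercept after transforming

# Equivalently: sm.WLS(y, sm.add_constant(x), weights=1.0 / x).fit()
```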
We have $\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_i \qquad \dots (8.17)$

Dividing the model by $\sqrt{x_i}$ gives

$\dfrac{y_i}{\sqrt{x_i}} = \beta_1 \dfrac{1}{\sqrt{x_i}} + \beta_2 \dfrac{x_i}{\sqrt{x_i}} + \dfrac{e_i}{\sqrt{x_i}}$

or $y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^*$, where

$y_i^* = \dfrac{y_i}{\sqrt{x_i}}, \quad x_{i1}^* = \dfrac{1}{\sqrt{x_i}}, \quad x_{i2}^* = \dfrac{x_i}{\sqrt{x_i}}, \quad e_i^* = \dfrac{e_i}{\sqrt{x_i}}$

More generally, one would divide by $\sigma_i$ itself. But $\sigma_i$ is unknown; therefore, you need an estimate of $\sigma_i$.
We specified $\ln \sigma_i^2 = \alpha_1 + \alpha_2 z_i$. Therefore,

$\sigma_i^2 = e^{\alpha_1 + \alpha_2 z_i}$
Variance function
• Meanwhile, we obtained the estimates $\hat\alpha_1$ and $\hat\alpha_2$, and estimating the transformed model gives
Y1_STAR = 76.0538706374*X11_STAR + 10.6334817282*X22_STAR
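A sketch of the full feasible GLS procedure under $\ln \sigma_i^2 = \alpha_1 + \alpha_2 z_i$ (variable names are illustrative, not the text's; this simple version omits any bias correction for the log-residual regression):

```python
# A sketch of feasible GLS under ln(sigma_i^2) = alpha_1 + alpha_2 * z_i
# (numpy arrays x, y, z assumed).
import numpy as np
import statsmodels.api as sm

def fgls(x, y, z):
    X = sm.add_constant(x)
    ehat = sm.OLS(y, X).fit().resid                    # step 1: LS residuals
    aux = sm.OLS(np.log(ehat ** 2),
                 sm.add_constant(z)).fit()             # step 2: estimate alphas
    sig2_hat = np.exp(aux.fittedvalues)                # sigma_i^2-hat
    return sm.WLS(y, X, weights=1.0 / sig2_hat).fit()  # step 3: weighted LS
```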
Observe
There is a considerable drop in the standard errors compared to the GLS estimates that used the variance function $\mathrm{var}(e_i) = \sigma^2 x_i$, but the estimates of $\beta_1$ and $\beta_2$ have not changed a great deal. Under the previous specification the standard errors of $\hat\beta_1$ and $\hat\beta_2$ were 23.79 and 1.39, respectively, whereas they are now 9.71 and 0.97. These improved results are attributable to better modeling, i.e., a more general variance specification, and better estimation.
How does mean wage vary if the worker resides in a metropolitan area? Introducing the heteroskedastic partition.
Using the ch8_cps2 data we obtain the following regression output and residual plots.
Residuals against education
[Figure: residuals (RESIDUAL, −20 to 40) plotted against EDUC (0–20).]
Residuals against experience
[Figure: residuals (−20 to 50) plotted against EXPER (0–60).]
Residuals against y_cap
[Figure: residuals (−20 to 50) plotted against fitted wages WAGE_CAP (−5 to 20).]
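The plots above could be reproduced along these lines (a sketch; a wage equation on EDUC and EXPER and a DataFrame `df` with those columns are assumed):

```python
# A sketch of the diagnostic residual plots (assumes a DataFrame df with
# WAGE, EDUC, EXPER columns as in the ch8_cps2 data).
import matplotlib.pyplot as plt
import statsmodels.api as sm

def residual_plots(df):
    X = sm.add_constant(df[["EDUC", "EXPER"]])
    res = sm.OLS(df["WAGE"], X).fit()
    for label, values in (("EDUC", df["EDUC"]),
                          ("EXPER", df["EXPER"]),
                          ("WAGE_CAP", res.fittedvalues)):
        plt.scatter(values, res.resid, s=8)   # residuals against each variable
        plt.xlabel(label)
        plt.ylabel("RESIDUAL")
        plt.show()
```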
2. The Goldfeld-Quandt test
This test is specially designed for the circumstance of a heteroskedastic partition. Consider the variance difference between metropolitan and rural areas:

$H_0: \sigma_M^2 = \sigma_R^2 \qquad H_1: \sigma_M^2 \neq \sigma_R^2$

Test statistic: $F = \dfrac{\hat\sigma_M^2/\sigma_M^2}{\hat\sigma_R^2/\sigma_R^2} \sim F(N_M - K_M,\ N_R - K_R)$

When the null is true, the computed value is $F = \dfrac{\hat\sigma_M^2}{\hat\sigma_R^2} \sim F(808-3,\ 192-3)$

Computed $F = 31.284/15.243 = 2.09$
$F_{Lc} = F(0.025, 805, 189) = 0.81$; $F_{Uc} = F(0.975, 805, 189) = 1.26$
Since $2.09 > 1.26$, reject the null and conclude that the wage variances for the rural and metropolitan regions are not equal.
Note: for a one-tail test the critical value changes.
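The two-tail critical values quoted above could be obtained with scipy (a sketch; the numeric values 0.81 and 1.26 are the text's):

```python
# A sketch of the two-tail 5% critical values for F(805, 189).
from scipy import stats

FLc = stats.f.ppf(0.025, 805, 189)   # lower critical value, approx. 0.81
FUc = stats.f.ppf(0.975, 805, 189)   # upper critical value, approx. 1.26
print(FLc, FUc)                      # computed F = 2.09 > FUc, so reject H0
```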
Goldfeld-Quandt test under one explanatory variable
Apart from the heteroskedastic partition, the above test can be applied if the variance is a function of a single explanatory variable. To perform the test, the observations are ordered according to that explanatory variable so that, if heteroskedasticity exists, the first half of the sample corresponds to observations with lower variances and the last half to observations with higher variances. Split the observations into two approximately equal halves and carry out two separate least squares regressions that yield the variance estimates $\hat\sigma_1^2$ and $\hat\sigma_2^2$.

Use the food data: using the first 20 and second 20 observations we obtain

$\hat\sigma_1^2 = 3574.8, \qquad \hat\sigma_2^2 = 12921.9, \qquad F = \dfrac{\hat\sigma_2^2}{\hat\sigma_1^2} = \dfrac{12921.9}{3574.8} = 3.61$

Believing that the variances could increase with income, use a one-tail test with 5% critical value $F(0.95, 18, 18) = 2.22$. Since $3.61 > 2.22$, reject the null and conclude that the variance increases with income.
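The whole procedure could be sketched as follows (numpy arrays `x`, `y` and an even sample size are assumed; the function name is mine):

```python
# A sketch of the Goldfeld-Quandt test when the variance may rise with a
# single regressor (numpy arrays x, y assumed; even sample size assumed).
import numpy as np
import statsmodels.api as sm
from scipy import stats

def goldfeld_quandt(x, y):
    order = np.argsort(x)                      # order observations by x
    x, y = x[order], y[order]
    n = len(y) // 2
    sig2 = []
    for xs, ys in ((x[:n], y[:n]), (x[n:], y[n:])):
        res = sm.OLS(ys, sm.add_constant(xs)).fit()
        sig2.append(res.ssr / res.df_resid)    # sigma-hat^2 = SSE / (N - K)
    F = sig2[1] / sig2[0]                      # high-variance half on top
    pval = 1 - stats.f.cdf(F, n - 2, n - 2)    # one-tail p-value
    return F, pval

# For the food data (20 obs per half), the 5% one-tail critical value is
# stats.f.ppf(0.95, 18, 18), approximately 2.22 as in the text.
```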
On the basis of the food data, can you conclude that variance increases with income?
3. Testing heteroskedasticity on the basis of the variance function
The variance function is relevant when heteroskedasticity is a possibility. Under homoskedasticity, however, the variance is not written as a function of explanatory variables, since it is constant.
A general form of the variance function is

$\mathrm{var}(y_i) = \sigma_i^2 = E(e_i^2) = h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) \qquad \dots (8.37)$

One specific form of the variance function is the exponential function

$h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) = e^{\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}}$

One special case is $h(\alpha_1 + \alpha_2 z_i) = e^{\ln \sigma^2 + \alpha_2 \ln x_i} = \sigma^2 x_i^{\alpha_2}$, obtained by setting $\alpha_1 = \ln \sigma^2$ and $z_i = \ln x_i$.

Another example is a linear one:

$h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} \qquad \dots (8.38)$
Testing procedure
If $\alpha_2 = \alpha_3 = \dots = \alpha_S = 0$ then the variance is constant, and homoskedasticity is ensured. Therefore, testing for heteroskedasticity means testing the null hypothesis

$H_0: \alpha_2 = \alpha_3 = \dots = \alpha_S = 0$

against the alternative

$H_1:$ not all the $\alpha_s$ in $H_0$ are zero.

To obtain a test statistic, reconsider the linear variance function

$\mathrm{var}(y_i) = \sigma_i^2 = E(e_i^2) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}$

Let $v_i = e_i^2 - E(e_i^2)$ be the difference between a squared error and its mean; then

$e_i^2 = E(e_i^2) + v_i = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i \qquad \dots (8.41)$

One might think of estimating this function just as we regress $y_i$ on $x_i$, but while the $z$'s are observable, the $e_i$ are not.
Test procedure cont.
Get rid of the problem by replacing the squared errors $e_i^2$ with the least squares residuals $\hat{e}_i^2$:

$\hat{e}_i^2 = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i \qquad \dots (8.42)$

Our interest is to investigate whether $z_{i2}, z_{i3}, \dots, z_{iS}$ help explain the variation in $\hat{e}_i^2$. Since the $R^2$ goodness-of-fit statistic from (8.42) measures the proportion of variation in $\hat{e}_i^2$ explained by the $z$'s, it is a natural candidate for a test statistic. It can be shown that, when the null is true, $R^2$ times the sample size has a chi-square distribution with $S-1$ degrees of freedom, i.e.,

$\chi^2 = N R^2 \sim \chi^2_{(S-1)} \qquad \dots (8.43)$

Because a large $R^2$ value provides evidence against the null hypothesis (it suggests the $z$ variables explain changes in the variance), the rejection region for the statistic in (8.43) is in the right tail of the distribution. Thus, for a 5% significance level, reject the null and conclude that heteroskedasticity exists when $\chi^2 \geq \chi^2_{(0.95,\, S-1)}$.
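This $N R^2$ (Lagrange multiplier) test could be sketched as follows; the function name and argument layout are mine, and statsmodels also ships the test as `het_breuschpagan`:

```python
# A sketch of the N*R^2 variance-function test (8.42)-(8.43); y is the
# dependent variable, X the mean regressors, Z the variance regressors,
# both including a constant column (numpy arrays assumed).
import statsmodels.api as sm
from scipy import stats

def variance_function_test(y, X, Z):
    ehat2 = sm.OLS(y, X).fit().resid ** 2      # squared LS residuals
    aux = sm.OLS(ehat2, Z).fit()               # auxiliary regression (8.42)
    chi2 = len(y) * aux.rsquared               # N * R^2, equation (8.43)
    S = Z.shape[1]                             # alphas, including alpha_1
    pval = 1 - stats.chi2.cdf(chi2, S - 1)
    return chi2, pval

# statsmodels also provides this test directly:
# from statsmodels.stats.diagnostic import het_breuschpagan
# lm, lm_pval, f_stat, f_pval = het_breuschpagan(resid, Z)
```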
Features of this test
1. It is a large sample test