
Chapter 8

HETEROSKEDASTICITY

What is heteroskedasticity?
Heteroskedasticity refers to non-constant variance of the random error term. One of the assumptions of least squares estimation is a constant error variance $\sigma^2$; if this assumption is violated, the ordinary least squares procedure no longer has all of its desirable properties.
Further understanding
Recall the example of household income and food expenditure. Different income levels lead to different food expenditures, which scatter around the mean expenditure with a certain variance. If this variance is the same at every income level, the data satisfy the property of homoskedasticity. If, on the other hand, the uncertainty in food expenditure varies with income, the corresponding distributions of food expenditure have different variances, a situation characterized as heteroskedasticity.
Implication of heteroskedasticity
Intuitively, expect more variability in the food expenditure of high-income people relative to the low-income group. A low-income household's expenditure is largely determined by its income, whereas a high-income household's expenditure depends on many factors other than income. Naturally, expenditures of the high-income group will not be as tightly clustered around the mean expenditure as those of the low-income group.
Heteroskedasticity implications
• The existence of heteroskedasticity means that the probability of obtaining larger absolute values of $e_i$ is higher for high incomes than for low incomes.
• High-income people's expenditure is influenced by factors other than income, leading to larger $e_i$ at high incomes, which in turn means a higher variance of $e_i$ at high income levels.
These statements capture the fact that the variance of $e_i$, and therefore of $y_i$, depends on $x_i$.
Formal treatment of heteroskedasticity
Let $e_i$ be the difference between individual $i$'s food expenditure and mean expenditure, i.e., $e_i = y_i - E(y_i)$, where

$E(y_i) = \beta_1 + \beta_2 x_i, \qquad y_i = \beta_1 + \beta_2 x_i + e_i \quad \dots (8.3)$

One can view $E(y_i) = \beta_1 + \beta_2 x_i$ as the part of food expenditure explained by income $x_i$, and $e_i$ as the part explained by other factors. The concept of heteroskedasticity underlies the question of whether $E(y_i) = \beta_1 + \beta_2 x_i$ is better at explaining expenditure on food for low-income households than it is for high-income households. If the answer to this question is yes, heteroskedasticity exists in the data. Indeed, a high-income group's food spending is influenced by factors other than income alone.
Formal treatment cont.
The above story signifies one fact: the probability of getting large positive or negative values of $e_i$ is higher for high incomes than for low incomes. Factors other than income can have a larger impact on food expenditure when household income is high. A random variable, in this case $e_i$, has a higher probability of taking on large values if its variance is high. This means $\mathrm{var}(e_i)$ depends directly on income $x_i$. The formal statement is that $\mathrm{var}(y_i)$ increases as $x_i$ increases: food expenditure $y_i$ can deviate further from its mean $E(y_i) = \beta_1 + \beta_2 x_i$ when $x_i$ is large. In such a case, when the variances are not the same for all observations, heteroskedasticity exists.
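To make this concrete, here is a minimal simulation sketch (not the chapter's food data; all parameter values are illustrative assumptions) of an error whose variance grows with $x$:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 200
x = rng.uniform(1, 30, size=n)            # "income" levels (illustrative)
beta1, beta2, sigma2 = 80.0, 10.0, 40.0   # assumed, not estimated, values

# Heteroskedastic errors: var(e_i) = sigma2 * x_i grows with income,
# so e_i = sqrt(sigma2 * x_i) * z_i with z_i ~ N(0, 1).
e = np.sqrt(sigma2 * x) * rng.standard_normal(n)
y = beta1 + beta2 * x + e                 # "food expenditure"

# Error spread in the bottom vs. top third of income:
lo, hi = x < np.quantile(x, 1 / 3), x > np.quantile(x, 2 / 3)
print("sd(e | low x): %.1f   sd(e | high x): %.1f" % (e[lo].std(), e[hi].std()))
```

The printed standard deviations differ sharply between the two income groups, which is exactly the pattern described above.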
Graph of homoskedasticity
[Figure: the conditional pdfs $f(y|x_1)$ and $f(y|x_2)$ have the same spread around the regression line $E(y) = \beta_1 + \beta_2 x$.]
Graph of heteroskedasticity
[Figure: the conditional pdf $f(y|x_2)$ at the higher income $x_2$ is more spread out than $f(y|x_1)$ around the regression line $E(y) = \beta_1 + \beta_2 x$.]
Graphical illustration of heteroskedasticity: implications
[Figure: conditional pdfs $f(y_1|x_1)$ and $f(y_2|x_2)$ around the line $E(y) = \beta_1 + \beta_2 x$.]
• At $x_1$, the probability density function $f(y_1|x_1)$ is such that $y_1$ will be close to $E(y_1)$ with high probability.
• For higher $x_2$, the pdf $f(y_2|x_2)$ is more spread out; we are less certain about where $y_2$ might fall.
• When homoskedasticity exists, the probability density function for the errors does not change as $x$ changes.
Questions to be addressed
• What are the consequences of heteroskedasticity?
• Is there a better estimation technique?
• How can we detect the existence of heteroskedasticity?

The assumptions surrounding $e_i$ (e.g., zero mean and zero covariance) remain unaltered, except for the constancy of the error variance. Instead, the variance is assumed to depend on $x_i$:

$\mathrm{var}(y_i) = \mathrm{var}(e_i) = h(x_i)$

Heteroskedasticity is more likely in cross-sectional data.
Informal way of detecting heteroskedasticity
[Figure: scatter of FOOD_EXP (roughly 0–600) against INCOME (roughly 0–35).] The figure demonstrates the higher variability of expenditures at higher incomes, i.e., individual $y$ lies further from mean $y$ at higher income levels. The variance of $y$, and hence of $e$, is higher for the higher-income group; this characterizes heteroskedasticity.
Consequences of heteroskedasticity for least squares estimation
1. Under heteroskedasticity, the standard errors usually computed for the least squares estimator are incorrect. Confidence intervals and hypothesis tests that use these standard errors may be misleading.
2. The least squares estimator is still a linear and unbiased estimator, but it is no longer best: there is another estimator with a smaller variance.
Impact on Standard Error
For the simple regression model under the homoskedasticity assumption, the variance of $\hat\beta_2$ is

$\mathrm{var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum_i (x_i - \bar x)^2}$

But for the simple linear regression model $y_i = \beta_1 + \beta_2 x_i + e_i$ where $\mathrm{var}(e_i) = \sigma_i^2$, econometrician Hal White proposed

$\mathrm{var}(\hat\beta_2) = \sum_i w_i^2 \sigma_i^2 = \dfrac{\sum_i (x_i - \bar x)^2 \sigma_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2} \quad \dots (8.8)$

where $w_i = \dfrac{x_i - \bar x}{\sum_i (x_i - \bar x)^2}$

The resulting standard errors have become known as White's heteroskedasticity-consistent standard errors, heteroskedasticity-robust standard errors, or simply robust standard errors.
White's standard error
To obtain the White standard error for $\hat\beta_2$ corresponding to (8.8), we obtain the least squares residuals $\hat e_i = y_i - \hat\beta_1 - \hat\beta_2 x_i$ and replace $\sigma_i^2$ in (8.8) with the squared least squares residuals. The White variance estimator is given by

$\widehat{\mathrm{var}}(\hat\beta_2) = \sum_i w_i^2 \hat e_i^2 = \dfrac{\sum_i (x_i - \bar x)^2 \hat e_i^2}{\left[\sum_i (x_i - \bar x)^2\right]^2} \quad \dots (8.9)$

The standard error is the square root of the estimated variance above.

Go back to the EViews file in Chapter 2 and check that White's standard errors are smaller than the incorrect ones, while the coefficient estimates are the same. The procedure allows us to compute a correct confidence interval, which here is narrower than before.
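As a sketch, (8.9) can be computed directly with numpy; the simulated data below stand in for the food-expenditure file (an illustrative assumption, not the chapter's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 30, n)
e = np.sqrt(40.0 * x) * rng.standard_normal(n)   # heteroskedastic errors (assumed DGP)
y = 80.0 + 10.0 * x + e

# OLS slope and residuals for y = b1 + b2*x + e
xbar, ybar = x.mean(), y.mean()
b2 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
b1 = ybar - b2 * xbar
ehat = y - b1 - b2 * x

sxx = ((x - xbar) ** 2).sum()

# Conventional (homoskedasticity-only) variance: sigma2_hat / Sxx
sigma2_hat = (ehat ** 2).sum() / (n - 2)
se_conventional = np.sqrt(sigma2_hat / sxx)

# White estimator (8.9): sum((x_i - xbar)^2 * ehat_i^2) / Sxx^2
se_white = np.sqrt((((x - xbar) ** 2) * ehat ** 2).sum() / sxx ** 2)

print("b2 = %.3f, conventional se = %.3f, White se = %.3f"
      % (b2, se_conventional, se_white))
```

In practice, statsmodels reports the same White (HC0) standard errors via `fit(cov_type="HC0")`.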
Alternative estimation, GLS
The generalized least squares estimator is best linear unbiased in the presence of heteroskedasticity, resting on an assumption about how the variances $\sigma_i^2$ change with each observation. Three alternative structures for $\sigma_i^2$ are considered:
1. The error variance $\sigma_i^2$ increases linearly with $x_i$.
2. The error variance $\sigma_i^2$ varies nonlinearly with $x_i$.
3. The error variance $\sigma_i^2$ follows a heteroskedastic partition.
1. Linear variation of $\sigma_i^2$
Assume the simple regression model $y_i = \beta_1 + \beta_2 x_i + e_i$ with, under heteroskedasticity,

$E(e_i) = 0, \qquad \mathrm{cov}(e_i, e_j) = 0, \qquad \mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_i \quad \dots (8.11)$

(8.11) signifies that $\mathrm{var}(e_i)$ is proportional to income. This assumption clarifies that, for low levels of income, food expenditure will be clustered around the mean of $y$: at low levels of income food expenditure is mostly explained by income, but at high levels of income food expenditure depends on factors other than income.

The LS estimator loses its property of being best linear unbiased in the presence of heteroskedastic variance. What can be done?
Transform the model into one with homoskedastic errors

If the errors are transformed into homoskedastic errors, then the least squares procedure gives a best linear unbiased estimator. The trick is to divide both sides of the regression model by $\sqrt{x_i}$. The transformed model is

$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \quad \dots (8.14)$

where $y_i^* = \dfrac{y_i}{\sqrt{x_i}}, \quad x_{i1}^* = \dfrac{1}{\sqrt{x_i}}, \quad x_{i2}^* = \dfrac{x_i}{\sqrt{x_i}} = \sqrt{x_i}, \quad e_i^* = \dfrac{e_i}{\sqrt{x_i}}$

The transformed error term retains the properties of zero mean and zero correlation between different observations. We had $\mathrm{var}(e_i) = \sigma^2 x_i$.

• Manipulation 2: checking the variance of the transformed error, a constant factors out of the variance squared, so

$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\dfrac{e_i}{\sqrt{x_i}}\right) = \left(\dfrac{1}{\sqrt{x_i}}\right)^2 \mathrm{var}(e_i) = \dfrac{1}{x_i}\,\sigma^2 x_i = \sigma^2$
Estimate the transformed model
You are now ready to apply the LS technique to obtain estimates of $\beta_1$ and $\beta_2$ using the observable starred variables; these are the generalized least squares estimators.

Remember!
1. The model no longer contains an intercept term.
2. The transformed model is linear in the unknown parameters $\beta_1$ and $\beta_2$. These are the original parameters we are interested in; the transformation does not affect them.
3. The transformed model satisfies the conditions of the Gauss-Markov theorem, and the LS estimators defined in terms of the transformed variables are BLUE.
GLS are WLS
The GLS estimator can be viewed as a weighted least squares estimator. Least squares yields the values of $\beta_1$ and $\beta_2$ that minimize the sum of squared errors; in this case we are minimizing the sum of squared transformed errors,

$\sum_i e_i^{*2} = \sum_i \dfrac{e_i^2}{x_i} = \sum_i \left(x_i^{-1/2} e_i\right)^2$

The errors are weighted by $x_i^{-1/2}$, the reciprocal of $\sqrt{x_i}$.
Note!
- When $x_i$ is small, the data contain more information about the regression function and the observations are weighted heavily.
- When $x_i$ is large, the data contain less information about the regression function and the observations are weighted lightly.

In this way we take advantage of the heteroskedasticity to improve parameter estimation (a sketch follows below).
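A minimal sketch of this weighted estimation, assuming $\mathrm{var}(e_i) = \sigma^2 x_i$ and using simulated data (illustrative values, not the food data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 30, n)
y = 80.0 + 10.0 * x + np.sqrt(40.0 * x) * rng.standard_normal(n)  # assumed DGP

# Transform each observation by 1/sqrt(x_i), per (8.14):
# y*_i = beta1 * x*_{i1} + beta2 * x*_{i2} + e*_i, with no intercept.
w = 1.0 / np.sqrt(x)
ystar = y * w
Xstar = np.column_stack([w, x * w])   # x*_{i1} = 1/sqrt(x_i), x*_{i2} = sqrt(x_i)

# OLS on the transformed model (no added intercept!) gives the GLS estimates.
bhat, *_ = np.linalg.lstsq(Xstar, ystar, rcond=None)
print("GLS estimates: b1 = %.2f, b2 = %.2f" % (bhat[0], bhat[1]))
```

Equivalently, a weighted least squares routine with weights $1/x_i$ on the untransformed model would produce the same estimates.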
EViews
I have transformed all the variables of the food expenditure model to obtain the GLS estimates:
Y_STAR = 78.6840801834*X1_STAR + 10.4510090573*X2_STAR
Few observations!
1. The GLS estimates are somewhat different from the OLS estimates that do not take account of heteroskedasticity.
2. The interpretation of the estimates is the same as for the original model. The transformation of the variables should be regarded as a device for converting a heteroskedastic error model into a homoskedastic error model, not as something that changes the meaning of the coefficients.
3. The standard errors are lower than the OLS ones, exemplifying the superiority of GLS over OLS. However, this improvement rests on a restrictive assumption about the variance, $\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_i$.
4. The smaller standard errors have the advantage of producing narrower, more informative confidence intervals.
2. Non-linear/general variance function
Earlier we assumed the error variance $\mathrm{var}(e_i) = \sigma^2 x_i$, but there is no reason to restrict ourselves to such a conservative dependence of the variance on $x_i$; it could equally be $\sigma^2 x_i^2$ or $\sigma^2 \sqrt{x_i}$, etc. In general we can propose

$\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_i^{\gamma} \quad \dots (8.17)$

Dividing the $i$-th observation by $x_i^{\gamma/2}$ turns the heteroskedastic error into a homoskedastic one, since $\mathrm{var}(e_i / x_i^{\gamma/2}) = \sigma^2 x_i^{\gamma} / x_i^{\gamma} = \sigma^2$. Before making this transformation, we need an estimate of gamma, $\hat\gamma$.
Obtain $\hat\gamma$
Taking logs of (8.17),

$\ln(\sigma_i^2) = \ln(\sigma^2) + \gamma \ln x_i$

$\ln(\sigma_i^2) = \alpha_1 + \alpha_2 z_i \quad \dots (8.20)$

Here $\alpha_1 \equiv \ln(\sigma^2)$, $\alpha_2 \equiv \gamma$, and $z_i = \ln x_i$. The question is how to estimate $\alpha_1$ and $\alpha_2$. Applying the same technique used to estimate $\beta_1$ and $\beta_2$ from the regression function $y_i = \beta_1 + \beta_2 x_i + e_i$, we can follow a similar strategy for estimating the variance function, using the squared least squares residuals $\hat e_i^2$ as our observations.
Obtain $\hat\gamma$
The regression function for the variance:

$\ln(\hat e_i^2) = \ln(\sigma_i^2) + v_i = \alpha_1 + \alpha_2 z_i + v_i \quad \dots (8.21)$

Regressing $\ln(\hat e_i^2)$ on a constant and $z_i$ yields the least squares estimates of $\alpha_1$ and $\alpha_2$. Note that the new error term $v_i$ does not have zero mean, and the $v_i$ are correlated and heteroskedastic; still, the least squares estimator of $\alpha_2$ is unbiased in large samples. From the food expenditure model,

$\ln(\hat e_i^2) = 0.9378 + 2.329\,\ln(\text{INCOME}) = 0.9378 + 2.329\,z_i$

so the estimate of gamma is $\hat\gamma = 2.329$. The next step is to transform the observations in such a way that the transformed model has a constant error variance. This can be done by dividing both sides of $y_i = \beta_1 + \beta_2 x_i + e_i$ by $x_i^{\hat\gamma/2}$.
You can avoid this tedious division
• Instead of dividing both sides by $x_i^{\hat\gamma/2}$, we can apply an easier technique to convert the heteroskedastic error into a homoskedastic one.
Our regression model is $y_i = \beta_1 + \beta_2 x_i + e_i$, where $\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_i^{\gamma}$ ... (8.17). Divide both sides of the regression model by $\sigma_i$, which results in the
Simplification

$\dfrac{y_i}{\sigma_i} = \beta_1 \dfrac{1}{\sigma_i} + \beta_2 \dfrac{x_i}{\sigma_i} + \dfrac{e_i}{\sigma_i}$

or $y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^*$, where

$y_i^* = \dfrac{y_i}{\sigma_i}, \quad x_{i1}^* = \dfrac{1}{\sigma_i}, \quad x_{i2}^* = \dfrac{x_i}{\sigma_i}, \quad e_i^* = \dfrac{e_i}{\sigma_i}$

• What about the variance of $e_i^*$? Given $e_i^* = e_i/\sigma_i$,

$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\dfrac{e_i}{\sigma_i}\right) = \dfrac{1}{\sigma_i^2}\,\mathrm{var}(e_i) = \dfrac{1}{\sigma_i^2}\,\sigma_i^2 = 1$

• The variance of $e_i^*$ is constant, hence homoskedastic.
How did we convert the heteroskedastic error into a homoskedastic error?
• By dividing by $\sigma_i$.
• But $\sigma_i$ is unknown; therefore you need an estimate of $\sigma_i$.
We had $\ln(\sigma_i^2) = \alpha_1 + \alpha_2 z_i$; therefore

$\sigma_i^2 = e^{\alpha_1 + \alpha_2 z_i}$
Variance function
• Meanwhile, we have obtained the estimates $\hat\alpha_1$ and $\hat\alpha_2$.
• Use them to obtain the variance estimates.
Transforming the model
In line with the more general specifications in (8.19), we can obtain variance estimates from

$\hat\sigma_i^2 = e^{\hat\alpha_1 + \hat\alpha_2 z_i}$

and then divide both sides of the equation by $\hat\sigma_i$ to get the generalized least squares estimates of $\beta_1$ and $\beta_2$. The transformed model is

$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \quad \dots (8.24)$

where $y_i^* = \dfrac{y_i}{\hat\sigma_i}, \quad x_{i1}^* = \dfrac{1}{\hat\sigma_i}, \quad x_{i2}^* = \dfrac{x_i}{\hat\sigma_i}, \quad e_i^* = \dfrac{e_i}{\hat\sigma_i}$

Now you are ready to estimate the coefficients. The same trick can be applied to a more general model with many explanatory variables.
Summary
1. Estimate the original model and save the residuals.
2. Regress $\ln(\hat e_i^2)$ on a constant and $z_i$ to obtain the estimates of $\alpha_1$ and $\alpha_2$.
3. Obtain the variance estimates $\hat\sigma_i^2$.
4. Transform the model by dividing each observation by $\hat\sigma_i$.
5. Apply the least squares process to the transformed model to obtain estimates of the $\beta$s. (A sketch of these five steps follows.)
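A numpy sketch of the five steps above, under the assumed variance function $\sigma_i^2 = \sigma^2 x_i^{\gamma}$ with illustrative simulated data (not the food file):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1, 30, n)
gamma_true = 2.0                                   # assumed DGP value
y = 80.0 + 10.0 * x + np.sqrt(2.0 * x ** gamma_true) * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])

# Step 1: OLS on the original model, save residuals.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ehat = y - X @ b_ols

# Step 2: regress ln(ehat^2) on a constant and z_i = ln(x_i)  -- eq. (8.21).
Z = np.column_stack([np.ones(n), np.log(x)])
alpha, *_ = np.linalg.lstsq(Z, np.log(ehat ** 2), rcond=None)
gamma_hat = alpha[1]

# Step 3: variance estimates sigma_i^2 = exp(alpha1_hat + alpha2_hat * z_i).
sigma_hat = np.sqrt(np.exp(Z @ alpha))

# Steps 4-5: divide each observation by sigma_hat_i and re-run OLS.
b_gls, *_ = np.linalg.lstsq(X / sigma_hat[:, None], y / sigma_hat, rcond=None)

print("gamma_hat = %.2f, GLS: b1 = %.2f, b2 = %.2f"
      % (gamma_hat, b_gls[0], b_gls[1]))
```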
Done

Y1_STAR = 76.0538706374*X11_STAR + 10.6334817282*X22_STAR
Observe
There is a considerable drop in the standard errors compared to the GLS estimates using the variance function $\mathrm{var}(e_i) = \sigma^2 x_i$, but the estimates of $\beta_1$ and $\beta_2$ have not changed a great deal. Under the previous specification the standard errors of $\hat\beta_1$ and $\hat\beta_2$ were 23.79 and 1.39 respectively, whereas they are now 9.71 and 0.97. These improved results are attributable to better modeling, i.e., a more general variance specification, and better estimation.
How does mean wage vary if the worker resides in a metropolitan area? Introducing the heteroskedastic partition.
Using the ch8_cps2 data we obtain the following regression output:

WAGE = -9.914 + 1.234*EDUC + 0.133*EXPER + 1.524*METRO

This means that workers in a metropolitan area have a mean hourly wage 1.524 dollars higher than workers who live in a rural area.
3. Heteroskedastic Partition
A heteroskedastic partition arises when the error variances of two subsets of observations are unequal. As an illustration, suppose wage is determined by education and experience. If we have data on wage, education, and experience for two different regions, e.g., metropolitan and rural areas, then unequal error variances across the two subsets characterize a heteroskedastic partition.
Example: take a look at the EViews file chp8_cps2. Two separate regressions of wage on EDUC and EXPER (one for the metropolitan and one for the rural area) yield two different error variances; there are 808 metropolitan observations and 192 rural observations. The variance estimates are

$\hat\sigma_M^2 = 31.824, \qquad \hat\sigma_R^2 = 15.243$

If we can statistically establish that $\sigma_M^2 \neq \sigma_R^2$, the heteroskedastic partition becomes evident.

Implication: one might expect that the greater range of different types of jobs in a metropolitan area leads to city wages having a higher variance.
Obtaining GLS
The strategy for obtaining generalized least squares estimates is the same as before. The variables are transformed by dividing each observation by the standard deviation of the corresponding error term: in the heteroskedastic partition example, the metropolitan observations are divided by $\hat\sigma_M$ and the rural ones by $\hat\sigma_R$.
Application of least squares to the complete set of transformed observations yields best linear unbiased estimators. (A sketch follows.)
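A sketch of the partition-based GLS, with simulated wage data standing in for ch8_cps2 (group sizes, coefficients, and error standard deviations are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n_m, n_r = 300, 100                      # metro / rural sizes (illustrative)
educ = rng.uniform(8, 20, n_m + n_r)
exper = rng.uniform(0, 40, n_m + n_r)
metro = np.r_[np.ones(n_m), np.zeros(n_r)]
sd = np.where(metro == 1, 6.0, 4.0)      # assumed unequal error s.d. by group
wage = (-9.9 + 1.2 * educ + 0.13 * exper + 1.5 * metro
        + sd * rng.standard_normal(n_m + n_r))

X = np.column_stack([np.ones_like(wage), educ, exper, metro])

def ols_resid_var(Xg, yg):
    """OLS residual variance SSE / (N - K)."""
    b, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
    r = yg - Xg @ b
    return (r ** 2).sum() / (len(yg) - Xg.shape[1])

# Separate regressions on each partition give sigma_M^2 and sigma_R^2.
m = metro == 1
var_m = ols_resid_var(X[m][:, :3], wage[m])    # metro dummy is constant within a group
var_r = ols_resid_var(X[~m][:, :3], wage[~m])

# GLS: divide each observation by its group's estimated s.d., then pool.
s = np.where(m, np.sqrt(var_m), np.sqrt(var_r))
b_gls, *_ = np.linalg.lstsq(X / s[:, None], wage / s, rcond=None)
print("sigma2_M = %.2f, sigma2_R = %.2f, GLS coefs:" % (var_m, var_r),
      b_gls.round(3))
```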
How to detect Heteroskedasticity?
There are three ways to investigate:
1. Informal use of residual plots
2. The Goldfeld-Quandt test
3. Variance function test
1. Residual plots
1. Estimate the regression model.
2. Save the residuals.
3. Look at the residual plots.
If the errors are homoskedastic, there should be no patterns of any sort in the residuals. If the errors are heteroskedastic, they may tend to exhibit greater variation in some systematic way; for example, as income increases, the absolute values of the residuals tend to increase.
This method can be applied to any simple regression. In a multiple regression, the residuals may be plotted against each explanatory variable or against $\hat y$ to see whether they vary in a systematic way (see the plotting sketch after the figures below).
Plot of residuals
[Figure: residuals (roughly -20 to 50) against education, EDUC (0–20).]
[Figure: residuals against experience, EXPER (0–60).]
[Figure: residuals against fitted values, WAGE_CAP (-5 to 20).]
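For reference, a minimal plotting sketch of this residual check (matplotlib, with simulated data as an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(1, 30, 200)
y = 80 + 10 * x + np.sqrt(40 * x) * rng.standard_normal(200)   # assumed DGP

# Simple OLS fit, then residuals.
b2 = np.cov(x, y, bias=True)[0, 1] / x.var()
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x

# A fanning-out of residuals as x grows is the informal signature of
# heteroskedasticity; a homoskedastic model shows a patternless band.
plt.scatter(x, resid, s=10)
plt.axhline(0, color="k", lw=1)
plt.xlabel("x (e.g. income)")
plt.ylabel("residual")
plt.show()
```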
2. The Goldfeld-Quandt test
This test is specially designed for the circumstance of a heteroskedastic partition. Consider the variance difference between the metropolitan and rural areas:

$H_0: \sigma_M^2 = \sigma_R^2 \qquad H_1: \sigma_M^2 \neq \sigma_R^2$

Test statistic:

$F = \dfrac{\hat\sigma_M^2 / \sigma_M^2}{\hat\sigma_R^2 / \sigma_R^2} \sim F(N_M - K_M,\; N_R - K_R)$

Under the null, the computed value is

$F = \dfrac{\hat\sigma_M^2}{\hat\sigma_R^2} \sim F(808 - 3,\; 192 - 3)$

Computed F = 31.824/15.243 = 2.09. The critical values are $F_{Lc} = F(0.025, 805, 189) = 0.81$ and $F_{Uc} = F(0.975, 805, 189) = 1.26$. Since 2.09 falls outside this interval, reject the null and conclude that the wage variances for the rural and metropolitan regions are not equal.
Note: for a one-tail test the critical value changes.
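A computational sketch of the test (simulated two-group data as an assumption; scipy supplies the F critical values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def resid_var(X, y):
    """OLS residual variance sigma_hat^2 = SSE / (N - K), plus its df."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return (r ** 2).sum() / (len(y) - X.shape[1]), len(y) - X.shape[1]

# Illustrative two-group data with unequal error variances (assumed DGP).
x1, x2 = rng.uniform(1, 30, 808), rng.uniform(1, 30, 192)
y1 = 5 + 2 * x1 + 6.0 * rng.standard_normal(808)   # "metropolitan"
y2 = 5 + 2 * x2 + 4.0 * rng.standard_normal(192)   # "rural"

v1, df1 = resid_var(np.column_stack([np.ones_like(x1), x1]), y1)
v2, df2 = resid_var(np.column_stack([np.ones_like(x2), x2]), y2)

F = v1 / v2
# Two-tail 5% test of H0: equal variances.
lo = stats.f.ppf(0.025, df1, df2)
hi = stats.f.ppf(0.975, df1, df2)
print("F = %.2f, reject H0: %s" % (F, F < lo or F > hi))
```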
Goldfeld-Quandt test under one explanatory variable
Apart from the heteroskedastic partition, the above test can be applied if the variance is a function of a single explanatory variable. To perform the test, order the observations according to that explanatory variable so that, if heteroskedasticity exists, the first half of the sample corresponds to observations with lower variances and the last half to observations with higher variances. Split the observations into two approximately equal halves and carry out two separate least squares regressions that yield the variance estimates $\hat\sigma_1^2$ and $\hat\sigma_2^2$.
Using the food data, with the first 20 and second 20 observations we obtain $\hat\sigma_1^2 = 3574.8$ and $\hat\sigma_2^2 = 12921.9$, so

$F = \dfrac{\hat\sigma_2^2}{\hat\sigma_1^2} = \dfrac{12921.9}{3574.8} = 3.61$

Believing that the variance could increase with income, use a one-tail test with the 5% critical value $F(0.95, 18, 18) = 2.22$. Reject the null and conclude that the variance increases with income.
On the basis of the food data, can you conclude that variance increases with income?
3. Testing heteroskedasticity on the basis of a variance function
The variance function is relevant when heteroskedasticity is a possibility. Under homoskedasticity there is no need to write the variance as a function, since it is simply a constant.
A general form of the variance function is

$\mathrm{var}(y_i) = \sigma_i^2 = E(e_i^2) = h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) \quad \dots (8.37)$

One specific form of the variance function is the exponential

$h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) = e^{\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}}$

with the special case $h(\alpha_1 + \alpha_2 z_i) = e^{\ln \sigma^2 + \gamma \ln x_i}$. Another example is the linear function

$h(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} \quad \dots (8.38)$
Testing procedure
If $\alpha_2 = \alpha_3 = \dots = \alpha_S = 0$, the variance is constant and homoskedasticity is ensured. Therefore testing for heteroskedasticity means testing the null hypothesis

$H_0: \alpha_2 = \alpha_3 = \dots = \alpha_S = 0$

against the alternative

$H_1:$ not all of the $\alpha_s$ in $H_0$ are zero.

To obtain a test statistic, reconsider the linear variance function

$\mathrm{var}(y_i) = \sigma_i^2 = E(e_i^2) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS}$

Let $v_i = e_i^2 - E(e_i^2)$ be the difference between a squared error and its mean; then

$e_i^2 = E(e_i^2) + v_i = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i \quad \dots (8.41)$

One might think of estimating this function the way we regress $y_i$ on $x_i$, but while the $y$s are observable, the $e$s are not.
Test procedure cont.
Get around the problem by replacing $e_i^2$ with the residuals $\hat e_i^2$:

$\hat e_i^2 = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i \quad \dots (8.42)$

Our interest is in whether $z_{i2}, z_{i3}, \dots, z_{iS}$ help explain the variation in $\hat e_i^2$. Since the $R^2$ goodness-of-fit statistic from (8.42) measures the proportion of variation in $\hat e_i^2$ explained by the $z$s, it is a natural candidate for a test statistic. It can be shown that $R^2$ times the sample size has a chi-square distribution with $S - 1$ degrees of freedom when the null is true, i.e.,

$\chi^2 = N \times R^2 \sim \chi^2_{(S-1)} \quad \dots (8.43)$

Because a large $R^2$ value provides evidence against the null hypothesis (it suggests the $z$ variables explain changes in the variance), the rejection region for the statistic in (8.43) is in the right tail of the distribution. Thus, for a 5% significance level, reject the null and conclude that heteroskedasticity exists when $\chi^2 > \chi^2_{(0.95, S-1)}$.
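A sketch of the $N \times R^2$ computation with $z_i = x_i$, using simulated data (an illustrative assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 400
x = rng.uniform(1, 30, n)
y = 80 + 10 * x + np.sqrt(40 * x) * rng.standard_normal(n)   # assumed DGP

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ehat2 = (y - X @ b) ** 2

# Auxiliary regression (8.42): ehat^2 on a constant and z (here z = x).
Z = X
g, *_ = np.linalg.lstsq(Z, ehat2, rcond=None)
fitted = Z @ g
R2 = 1 - ((ehat2 - fitted) ** 2).sum() / ((ehat2 - ehat2.mean()) ** 2).sum()

LM = n * R2                         # chi-square with S-1 df under H0
pval = stats.chi2.sf(LM, df=Z.shape[1] - 1)
print("LM = N*R^2 = %.2f, p-value = %.4f" % (LM, pval))
```

For real work, `statsmodels.stats.diagnostic.het_breuschpagan` implements the same test.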
Features of this test
1. It is a large-sample test.
2. The test is referred to as the Lagrange multiplier (LM) test or the Breusch-Pagan test.
3. The test can be carried out for any form of the variance function, although above it is conducted using the linear function.
The White test
The LM test presupposes that we know which variables appear in the variance function, i.e., we assumed $z_2, z_3, \dots, z_S$ can be specified, but in reality we may not have precise knowledge of the relevant variables. This understanding motivated econometrician Hal White to define the $z$'s as equal to the $x$'s, the squares of the $x$'s, and their cross-products. For example, with mean function

$E(y) = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3}$

the White test specifies $z_2 = x_2$, $z_3 = x_3$, $z_4 = x_2^2$, $z_5 = x_3^2$ (and the cross-product $x_2 x_3$ may be included as well).
The White test is performed as an F-test or a chi-square test.
Example
In the food expenditure model,

$\sigma_i^2 = h(\alpha_1 + \alpha_2 x_i)$

Begin by estimating the function $\hat e_i^2 = \alpha_1 + \alpha_2 x_i + v_i$ using least squares. With SST = 4,610,749,441 and SSE = 3,759,556,169,

$R^2 = 1 - \mathrm{SSE}/\mathrm{SST} = 0.1846, \qquad \chi^2 = N \times R^2 = 40 \times 0.1846 = 7.38$

Since there is only one parameter in the null hypothesis, the $\chi^2$ test has one degree of freedom. The 5% critical value is 3.84, so reject the null and conclude that the variance depends on income.
White version: estimate the equation $\hat e_i^2 = \alpha_1 + \alpha_2 x_i + \alpha_3 x_i^2 + v_i$ and test

$H_0: \alpha_2 = \alpha_3 = 0 \qquad H_1: \alpha_2 \neq 0 \text{ or } \alpha_3 \neq 0$

$\chi^2 = N \times R^2 = 40 \times 0.1888 = 7.555, \qquad \chi^2_{(0.95, 2)} = 5.99$

Conclude that heteroskedasticity exists.
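A sketch of the White version, augmenting the auxiliary regression with $x^2$; the data are simulated as before (an assumption, not the food file):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(1, 30, n)
y = 80 + 10 * x + np.sqrt(40 * x) * rng.standard_normal(n)   # assumed DGP

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ehat2 = (y - X @ b) ** 2

# White test: regress ehat^2 on a constant, x, and x^2.
Z = np.column_stack([np.ones(n), x, x ** 2])
g, *_ = np.linalg.lstsq(Z, ehat2, rcond=None)
R2 = 1 - ((ehat2 - Z @ g) ** 2).sum() / ((ehat2 - ehat2.mean()) ** 2).sum()

chi2_stat = n * R2                            # ~ chi-square(2) under H0
print("N*R^2 = %.2f, 5%% critical value = %.2f"
      % (chi2_stat, stats.chi2.ppf(0.95, 2)))
```

`statsmodels.stats.diagnostic.het_white` offers a packaged version of this test.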
