Prac 11 Heteroscedasticity
This practical exercise focuses on the methods of detection and remedies for heteroscedasticity
in multiple regression estimation as discussed in Session 11. Heteroscedasticity is most often
encountered when working with cross-sectional data.
1. Start a log file and ensure that it is a text file (i.e. choose the .log option).
2. Download the dataset hetero1.dta from the Moodle site for the course and save it to a
convenient location. Click on the ‘open’ icon in Stata, and then navigate to the location
where you saved the dataset and open it.
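Steps 1 and 2 can also be carried out from the command line. The following is only a sketch: the log file name prac11.log is made up for illustration, and the use command assumes that hetero1.dta has been saved in Stata's current working directory.

```stata
* Open a plain-text log; the file name prac11.log is illustrative only
log using prac11.log, text replace
* Load the dataset, assuming it sits in the current working directory
use hetero1.dta, clear
```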
This dataset contains cross-sectional data from 85 countries. It contains the
following variables:
y     life expectancy in years
x1    per capita income in US dollars
x2    an index of access to healthcare (0 = lowest, 100 = highest)
(Source: Exercise 13.21 from Gujarati, D. N. (2006) “Essentials of Econometrics”, 3rd
Edition, McGraw-Hill International Edition)
To confirm this, type:
desc
We would like to estimate a model explaining life expectancy and test the regression for
heteroscedasticity. We propose that life expectancy depends on income and access to healthcare.
3. Regress life expectancy on per capita income and the index of access to healthcare. Type:
reg y x1 x2
predict yfit
predict e, resid
A TESTING PROCEDURES
4. Graph the residuals against yfit:
scatter e yfit
scatter e yfit, yline(0)
Do you think there is heteroscedasticity?
5. Conduct the PARK test for heteroscedasticity (see the class notes):
Step 1: Estimate the regression. You have already carried this out.
Step 2: Obtain the residuals, square them and take their logs. Enter the commands:
gen esq=e^2
gen lnesq= log(esq)
Step 3: Generate logged values of the explanatory variables from your regression:
gen lnx1=log(x1)
gen lnx2=log(x2)
gen lnyfit=log(yfit)
Step 5: Follow the rest of your notes (page 9) to establish if your data has
heteroscedasticity. Is heteroscedasticity present? Which of the variable(s) do you
think is ‘responsible’ for the heteroscedasticity?
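The variables created in Steps 2 and 3 are the ingredients of the Park auxiliary regressions. The sketch below assumes the standard form of the test, in which the log of the squared residuals is regressed on the log of one candidate variable at a time. The notes (page 9) may specify a different set of regressions, so treat this as an illustration only.

```stata
* Park test sketch: one auxiliary regression per candidate variable
reg lnesq lnx1
reg lnesq lnx2
reg lnesq lnyfit
* A statistically significant slope coefficient suggests that the error
* variance changes systematically with that variable
```

Comparing the t-statistics on the slope coefficients across the three regressions gives a first indication of which variable is associated with the changing variance.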
6. Conduct the GLEJSER test for heteroscedasticity (as outlined in the notes, p.10):
Step 1: Generate absolute values of the residuals from the original regression as follows:
gen eabs=abs(e)
Step 2: Generate the square root and inverse values of x1 and x2 as follows:
gen x1sqrt=sqrt(x1)
gen x2sqrt=sqrt(x2)
gen x2inv=1/x2
gen x1inv=1/x1
Step 3: Then run the regressions as outlined on page 10 of your notes (i.e. equations (2),
(3) and (4)).
reg eabs x1
reg eabs x2
reg eabs x1sqrt
reg eabs x2sqrt
reg eabs x1inv
reg eabs x2inv
What do you conclude? Do your results support the conclusions you arrived at from
the Park test?
7. Conduct the White test by running the auxiliary regression outlined in the notes (p.11). You
will first need to generate variables in addition to those already created above before you
run the regression:
gen x1sq= x1^2
gen x2sq= x2^2
gen x1x2= x1*x2
reg esq x1 x2 x1sq x2sq x1x2
Thereafter, you need to calculate the chi-squared test statistic, which is nR2, and its
probability. Use n=85 (the size of our sample) and the R2 value from the above
regression.
Type:
di 85*{insert here the R2 value from the above regression}
Since we have five explanatory variables in the last regression, the test statistic has five
degrees of freedom, and we calculate its probability (p-value) by typing:
di chiprob(5, {insert chi2 value you calculated using the previous command})
Looking at the first part of the output, do the results agree with your manual test above?
Is the conclusion the same? Can you understand why you get different results?
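The manual calculation can also be done without retyping numbers, using the results that regress stores after estimation. This is a sketch under two assumptions that are not in the handout: it uses chi2tail(), Stata's documented upper-tail chi-squared function, and it adds a cross-check with the built-in postestimation command estat imtest, which must follow the original regression rather than the auxiliary one.

```stata
* Re-run the auxiliary regression so that its stored results are available
reg esq x1 x2 x1sq x2sq x1x2
* nR2 from the stored sample size e(N) and R-squared e(r2)
scalar nR2 = e(N)*e(r2)
di nR2
* Upper-tail probability with e(df_m) degrees of freedom (5 regressors here)
di chi2tail(e(df_m), nR2)
* Built-in White test, run after the ORIGINAL regression
reg y x1 x2
estat imtest, white
```

Using e(N) and e(r2) avoids rounding error from copying the R2 value by hand, which is one possible reason for small differences between a manual and a built-in result.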
10. Examine the pattern of heteroscedasticity by graphing the residuals (e) against x1 and x2
separately. What do you observe?
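A minimal sketch of the two graphs, assuming the residuals e from step 3 are still in memory. The yline(0) option is borrowed from step 4 to make the spread around zero easier to judge.

```stata
* Residuals against each explanatory variable, with a reference line at zero
scatter e x1, yline(0)
scatter e x2, yline(0)
```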
B REMEDIAL MEASURES
11. Carry out a weighted least squares regression as described below. Recall that if the error
variance is unknown, you need to make an assumption about the pattern of
heteroscedasticity. First, we assume that the error variance is proportional to x1. We try the
weighting suggested in the notes (p. 14), i.e. we assume that Var(u_i) = σ²·x1_i. We transform the
variables by dividing each one of them by the square root of x1. We also generate a new
variable x0t = 1/x1sqrt.
gen yt=y/x1sqrt
gen x1t=x1/x1sqrt
gen x2t=x2/x1sqrt
gen x0t=1/x1sqrt
Now regress the transformed variables without an intercept. [Recall that to perform a
regression without an intercept, you have to add the option “, noconstant” after the last
variable.]
reg yt x0t x1t x2t, noconstant
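As a cross-check on the manual transformation, the same weighting can be requested through Stata's analytic weights. This sketch is not part of the handout. It assumes, as above, that the error variance is proportional to x1, so each observation is weighted by 1/x1.

```stata
* Weighted least squares using analytic weights inversely proportional to x1
reg y x1 x2 [aweight=1/x1]
```

If the transformation was done correctly, the coefficients should match those of the transformed regression: the coefficient on x0t corresponds to the constant here, and those on x1t and x2t correspond to x1 and x2.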
12. Retrieve the residuals. (You will have to use a different name - i.e. not “e”, but e1, for
example.) Plot them against x1 and x2 separately. Do you think you have solved the
heteroscedasticity problem?
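One way to carry out this step is sketched below. The regression is re-run first so that predict is certain to use the transformed model. The name e1 follows the suggestion in the text.

```stata
* Re-estimate the transformed model so that predict refers to it
reg yt x0t x1t x2t, noconstant
* Residuals of the transformed regression, stored under a new name
predict e1, resid
scatter e1 x1, yline(0)
scatter e1 x2, yline(0)
```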
13. We now assume that the error variance is proportional to (x1)². Referring to page 16 of the
notes, try the following weighting. Generate a transformed y by dividing it by x1, and a
transformed x2 by dividing it by x1, and generate a new variable x0w=1/x1.
Then regress the transformed y on x0w and the transformed x2 with an intercept.
Retrieve the residuals (again, you will have to use a different name e.g. ew) and plot them
against x1 and x2 separately. Do you think that you have now solved the heteroscedasticity
problem?
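The steps described in words above can be sketched as follows. The names x0w and ew come from the text, while yw and x2w are illustrative names chosen here for the transformed y and x2.

```stata
* Divide through by x1 (yw and x2w are illustrative variable names)
gen yw = y/x1
gen x2w = x2/x1
gen x0w = 1/x1
* The intercept is kept: after dividing by x1 it estimates the original
* coefficient on x1, while the coefficient on x0w estimates the original constant
reg yw x0w x2w
* Residuals of the weighted regression, plotted against each variable
predict ew, resid
scatter ew x1, yline(0)
scatter ew x2, yline(0)
```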
Note the differences in your results (comparing them with your original regression results),
particularly the sizes of the coefficient standard errors. Are the changes in these as expected?