Detecting and Resolving Heteroskedasticity in STATA-1
(STATA Commands)
Econometric model used: price = β1 + β2 rooms + β3 sqfeet + u
Data File: hprice.dta
Figure 2
. twoway (scatter utsq sqfeet) (lfit utsq sqfeet)
Figure 3
There is clear evidence of heteroscedasticity in all three figures: the variation of the squared residuals is not constant. The variable rooms is the stronger case of heteroscedasticity, while sqfeet is a relatively weak case.
2) Formal Methods
i) Breusch-Pagan (BP) Test: This test regresses the squared residuals (utsq) on the independent
variables (rooms and sqfeet in this case). A significant F-value or LM statistic from this
auxiliary regression indicates heteroscedasticity.
. reg utsq rooms sqfeet
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
Since the chi-square value is highly significant (p<0.05), we reject the null hypothesis that there
is no heteroscedasticity in the model.
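The decision rule above can be checked by hand outside Stata: when the auxiliary regression has two slope terms (2 degrees of freedom), the χ2 upper-tail probability has the closed form exp(-x/2), which reproduces Stata's chi2tail() and invchi2tail() values exactly. A minimal sketch follows; the n and R2 values are made-up illustrative numbers, not output from hprice.dta.

```python
import math

# For 2 degrees of freedom, the chi-square survival function is exp(-x/2),
# so Stata's invchi2tail(2, .05) and chi2tail(2, x) can be reproduced exactly.
def chi2tail_df2(x):
    return math.exp(-x / 2)

def invchi2tail_df2(p):
    return -2 * math.log(p)

n, r2 = 88, 0.16               # hypothetical auxiliary-regression output
lm = n * r2                    # LM statistic = n * R-squared
crit = invchi2tail_df2(0.05)
print(round(crit, 2))          # 5.99, the critical value used in the text
print(lm > crit)               # True: reject H0 of homoskedasticity
print(chi2tail_df2(lm))        # the corresponding p-value
```

The same closed form gives the p-values behind each "p<0.05" statement in the sections below, as long as the auxiliary regression has two slope terms.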
ii) Glejser LM Test
This test regresses the absolute residuals on the independent variables. Therefore,
we first need to generate the absolute residuals (absut) with the following command.
. g absut=abs(ut) // absolute values of the residuals
*Now run the auxiliary regression as follows:
. reg absut rooms sqfeet
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM statistic (13.13) is greater than the χ2 critical value (5.99),
we reject the null hypothesis of no heteroscedasticity.
iii) Harvey-Godfrey Test
This test regresses the log of the squared residuals on the independent variables.
Therefore, we need to generate the log of the squared residuals (lutsq) with the following command.
. g lutsq=log(utsq)
*Now run the auxiliary regression as follows:
. reg lutsq rooms sqfeet
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM statistic (8.65) is greater than the χ2 critical value (5.99),
we reject the null hypothesis of no heteroscedasticity.
iv) Park LM Test
This test regresses the log of the squared residuals on the logs of the independent variables.
Therefore, we need to generate the log of each independent variable with the following commands.
. g lrooms=log(rooms)
. g lsqfeet=log(sqfeet)
*Now run the auxiliary regression as follows:
. reg lutsq lrooms lsqfeet
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM statistic (7.41) is greater than the χ2 critical value (5.99),
we reject the null hypothesis of no heteroscedasticity.
v) White Test
*No cross-products: in this version, we regress the squared residuals on the independent
variables (IVs) and their squared (quadratic) terms.
* Generate the quadratic terms of IVs
. g rooms2=rooms^2
. g sqfeet2=sqfeet^2
*Now run the auxiliary regression as follows. The quadratic terms are included because the
error variance may be a nonlinear function of the regressors:
. reg utsq rooms sqfeet rooms2 sqfeet2
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM statistic (16.20) is greater than the χ2 critical value (9.49,
for the 4 degrees of freedom of this auxiliary regression), we reject the null hypothesis of no heteroscedasticity.
*With cross-products: in this version, we regress the squared residuals on the independent
variables (IVs), their quadratic terms, and their interaction (cross-product) terms.
*Generate the interaction term of the IVs. Note that the quadratic terms were already
generated for the first version of the White test.
. g roomsXsqfeet=rooms*sqfeet
*Now run the auxiliary regression as follows:
. reg utsq rooms sqfeet rooms2 sqfeet2 roomsXsqfeet
Since the F-value is highly significant (>2.5; p<0.05), there is strong evidence of heteroscedasticity.
Another way is to calculate the LM statistic (which follows a χ2 distribution with k-1 degrees of
freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary
regression.
*The following command computes the LM statistic by multiplying the number of
observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM statistic (17.23) is greater than the χ2 critical value (11.07,
for the 5 degrees of freedom of this auxiliary regression), we reject the null hypothesis of no heteroscedasticity.
*Now test for heteroscedasticity again with the BP test to check whether the problem has
been removed.
Since the chi-square value of the BP test is now insignificant, the heteroscedasticity problem
has been removed from the given model.
3) Applying White's Heteroscedasticity-Consistent (Robust) Standard Errors Method
*If nothing else works, re-estimate the model with robust standard errors:
. reg price rooms sqfeet, robust
Remember: this method only corrects the standard errors for the heteroscedasticity problem;
it does not remove the heteroscedasticity from the data.
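The effect of robust standard errors can be illustrated outside Stata. The sketch below (plain Python with synthetic heteroskedastic data, not the hprice.dta model) fits a simple regression by OLS and computes both the conventional and the White HC0 standard error for the slope. The coefficient and the residuals are identical either way; only the standard error changes, which is exactly the point made above.

```python
import math
import random

# Synthetic data whose error variance grows with x: classic heteroskedasticity.
random.seed(1)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, xi) for xi in x]

# OLS slope and intercept for a simple regression y = a + b*x + u.
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
u = [yi - a - b * xi for xi, yi in zip(x, y)]

# Conventional OLS standard error: assumes a constant error variance.
s2 = sum(ui ** 2 for ui in u) / (n - 2)
se_ols = math.sqrt(s2 / sxx)

# White HC0 robust standard error: lets each observation keep its own
# squared residual, var(b) = sum((xi - xbar)^2 * ui^2) / sxx^2.
se_hc0 = math.sqrt(
    sum(((xi - xbar) ** 2) * ui ** 2 for xi, ui in zip(x, u)) / sxx ** 2
)

print(b)                 # the slope is the same under both methods
print(se_ols, se_hc0)    # only the standard errors differ
```

Note that the residuals u are computed once and reused by both formulas: the robust method changes the inference, not the fit, so the heteroskedasticity remains in the data.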