Detecting and Resolving Heteroskedasticity in STATA

The document outlines methods for detecting and resolving heteroscedasticity in a regression model using STATA commands. It includes both informal graphical methods and formal statistical tests (the Breusch-Pagan test, Glejser LM test, Harvey-Godfrey test, Park LM test, and White test), all indicating significant evidence of heteroscedasticity. Solutions for addressing heteroscedasticity, including generalized least squares, taking logarithms of variables, and applying robust standard errors, are also discussed.


Detecting Heteroscedasticity

(STATA Commands)
Econometric model used: price = β1 + β2·rooms + β3·sqfeet + u
Data file: hprice.dta
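Assuming hprice.dta is in the current working directory, the data can be loaded with:
. use hprice.dta, clear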

1) Informal or Graphical Method


To generate the scatter plots between the squared residuals and the estimated price, and between the squared residuals and the independent variables (rooms and sqfeet), we first run the regression and generate the estimated price (priceht), the residuals (ut), and the squared residuals (utsq) as follows:
Note: lines that start with * are comments added for guidance to the STATA commands.
*regress price on rooms and sqfeet
. reg price rooms sqfeet
*predict the estimated values of price (priceht)
. predict priceht
*predict the residuals (ut) of our regression model
. predict ut, residual
*generate (g) the squared residuals (utsq) of our regression model
. g utsq=ut^2
*now generate the scatter diagrams to detect heteroscedasticity
. twoway (scatter utsq priceht) (lfit utsq priceht)
Figure 1
. twoway (scatter utsq rooms) (lfit utsq rooms)

Figure 2
. twoway (scatter utsq sqfeet) (lfit utsq sqfeet)

Figure 3
There is clear evidence of heteroscedasticity in all three figures (the variation of the squared residuals is not constant). The rooms variable is the strongest case of heteroscedasticity, while sqfeet is a relatively weak case.

2) Formal Methods
i) Breusch-Pagan (BP) Test: This test regresses the squared residuals (utsq) on the independent variables (rooms and sqfeet in this case). A significant F-value or LM statistic from this auxiliary regression indicates heteroscedasticity.
. reg utsq rooms sqfeet

Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
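As a quick illustrative check of the critical value used below (assuming k − 1 = 2 degrees of freedom, i.e. the two slope coefficients of this auxiliary regression), the 5% cut-off of the χ2 distribution can be displayed directly:
. display invchi2tail(2, 0.05)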
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)

*The following command generates the 5% critical value of χ2 distribution.


. scalar chi2critical=invchi2tail(e(df_m), 0.05)

*The following command generates the p-value of χ2 distribution.


. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list
Interpretation: Since the calculated LM value (10.58) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.
*The BP test can also be implemented directly with the following command after running the main regression:
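A minimal sketch of such a command, assuming STATA's standard post-estimation BP test (by default it tests against the fitted values; the rhs option uses the regressors instead):
. estat hettest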

Since the chi-square value is highly significant (p < 0.05), we reject the null hypothesis that there is no heteroscedasticity in the model.
ii) Glejser LM Test
This test regresses the absolute residuals on the independent variables. Therefore, we need to generate the absolute residuals (absut) with the following command.
. g absut=abs(ut)
*Now run the auxiliary regression as follows:
. reg absut rooms sqfeet

Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list

Interpretation: Since the calculated LM value (13.13) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.
iii) Harvey-Godfrey Test
This test regresses the log of the squared residuals on the independent variables. Therefore, we need to generate the log squared residuals (lutsq) with the following command.
. g lutsq=log(utsq)
*Now run the auxiliary regression as follows:
. reg lutsq rooms sqfeet
Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list

Interpretation: Since the calculated LM value (8.65) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.
iv) Park LM Test
This test regresses the log of the squared residuals on the logs of the independent variables. Therefore, we need to generate the log independent variables with the following commands.
. g lrooms=log(rooms)
. g lsqfeet=log(sqfeet)
*Now run the auxiliary regression as follows:
. reg lutsq lrooms lsqfeet
Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list

Interpretation: Since the calculated LM value (7.41) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.
v) White Test
*Without cross-products: in this case, we regress the squared residuals on the independent variables (IVs) and their squared (quadratic) terms.
* Generate the quadratic terms of IVs
. g rooms2=rooms^2
. g sqfeet2=sqfeet^2
*Now run the auxiliary regression as follows (the squared terms are included because the error variance may be a nonlinear function of the regressors):
. reg utsq rooms sqfeet rooms2 sqfeet2
Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list

Interpretation: Since the calculated LM value (16.20) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.
*With cross-products: in this case, we regress the squared residuals on the independent variables (IVs), their squared (quadratic) terms, and their interaction (cross-product) terms.
*Generate the cross-product term of the IVs. Note that the quadratic terms have already been generated in the first version of the White test.
. g roomsXsqfeet=rooms*sqfeet
*Now run the auxiliary regression as follows:
. reg utsq rooms sqfeet rooms2 sqfeet2 roomsXsqfeet
Since the F-value is highly significant (> 2.5; p < 0.05), there is strong evidence of heteroscedasticity. Another way is to calculate the LM statistic (which follows a χ2 distribution with k − 1 degrees of freedom) by multiplying the number of observations (n) by the R-squared (R2) of the above auxiliary regression.
*The following command generates the LM statistic by multiplying the number of observations (e(N)) by the R-squared (e(r2)).
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
. scalar list

Interpretation: Since the calculated LM value (17.23) is greater than the χ2 critical value (5.99), we reject the null hypothesis of no heteroscedasticity.

*The White test can also be implemented directly with the following command after running the main regression:
. estat imtest, white
Resolving Heteroscedasticity
1) Generalized Least Squares (GLS) / Weighted Least Squares (WLS)
In this method, we divide the regression equation through by the standard deviation (or a function) of the independent variable that is mainly causing the heteroscedasticity problem, so that the transformed error has constant variance. For instance, looking at the scatter diagrams above, the rooms variable is the strongest case of heteroscedasticity.
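As an illustration of the logic (a sketch, assuming for example that Var(u_i) = σ2·rooms_i), dividing the whole equation by √rooms_i gives
price_i/√rooms_i = β1·(1/√rooms_i) + β2·√rooms_i + β3·(sqfeet_i/√rooms_i) + u_i/√rooms_i,
and the transformed error u_i/√rooms_i now has constant variance σ2. In STATA this reweighting is done through the aweight option, whose analytic weights are treated as inversely proportional to the variance of an observation.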
*Run the following regression command, using rooms as the analytic weight (aweight) in the regression equation. The scatter diagrams tell us which regressor is causing the heteroscedasticity, and that is the variable whose weights STATA will use.
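A minimal sketch of such a command (the exact weight specification is an assumption; here rooms is supplied directly as the analytic weight, as described above):
. reg price rooms sqfeet [aweight=rooms]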

*Now re-test for heteroscedasticity with the BP test to check whether the heteroscedasticity has been removed. If the test is insignificant, the heteroscedasticity has been removed. Note that WLS/GLS can be applied only if we know the source of the heteroscedasticity (i.e. which variable is causing it).
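A minimal manual re-check (a sketch using hypothetical variable names ut_w and ut_wsq for the residuals of the weighted regression):
. predict ut_w, residual
. g ut_wsq=ut_w^2
. reg ut_wsq rooms sqfeet
. scalar nR2_w=e(N)*e(r2)
*nR2_w follows a χ2 distribution, as before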
Interpretation: Since the chi-square value is still significant, the heteroscedasticity problem remains in the data.
Note: You can try the same method with the priceht and sqfeet variables.
2) Taking Logs of the Variables
. g lprice=log(price)
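A minimal sketch of the re-estimated model and its BP test, assuming the dependent variable is simply replaced by its log (the log regressors lrooms and lsqfeet generated for the Park test could equally be used):
. reg lprice rooms sqfeet
. estat hettest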

Since the chi-square value of the BP test is now insignificant, the heteroscedasticity problem has been removed from the given model.
3) Applying White's Heteroscedasticity-consistent (Robust) Standard Errors
Remember that this method only corrects the standard errors for the heteroscedasticity problem; it does not remove the heteroscedasticity from the data. It is typically used when nothing else works.
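A minimal sketch, assuming robust standard errors are requested directly in the main regression:
. reg price rooms sqfeet, vce(robust)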
