
Detecting and Resolving Model Specification Errors in STATA

(STATA Commands, Results and Interpretations)


Used Econometric Model: wage = β1 + β2 educ + β3 exper + β4 tenure + β5 IQ
Data File: WAGE2
1) Detecting Data Normality
*Run the regression based on above econometric model as follows:
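A minimal version of this command, assuming the model and the WAGE2 variable names above, is:
. regress wage educ exper tenure IQ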

*predict the residuals of model with the following STATA command


. predict res01, residual
*draw the histogram with normality curve
. histogram res01, normal
[Figure: histogram of res01 with normal density curve overlaid; x-axis: Residuals, y-axis: Density]

Interpretation: The data is somewhat positively skewed as its right tail is longer.
*You can also check the data normality with the following STATA command
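The JB statistic reported below comes from a Jarque–Bera routine; a sketch using a community-contributed jb command (locate it with findit jb), with Stata's built-in sktest as an alternative normality check:
. jb res01
. sktest res01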

Interpretation: Since the JB test statistic is highly significant (its p-value < 0.05), the data is non-normal and likely contains outliers as well.
Resolving Data Non-Normality Issue
i) Application of log-lin or double-log model: In this approach, you run a log model to smooth and normalize the data to a certain extent.
*Run the following STATA command for the log-lin model of wage to address the data normality issue:
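A sketch of this step, assuming lwage = ln(wage) is not already present in the data set:
. gen lwage = ln(wage)
. regress lwage educ exper tenure IQ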

*Now predict the residuals of this log-lin model and apply JB test again to check data
normality.
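A sketch, where the residual name res02 is an assumption:
. predict res02, residual
. jb res02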

Interpretation: Compared to the previous results, the JB test statistic has dropped sharply from 738.4 to 42.89, and the data has been normalized to a large extent, though it is still not perfectly normal.
ii) Winsorization of Data: This statistical method replaces outliers with the nearest retained values at chosen percentiles (e.g., the 1st and 99th percentiles).
*First install winsor2 command in STATA as follows:
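The winsor2 package is available from SSC:
. ssc install winsor2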

*Now run the winsor2 command to address outliers.


. winsor2 lwage educ exper tenure IQ
*Now run the regression model with the new winsorized variables created by STATA with the _w suffix:
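A sketch, assuming winsor2 was run with its default settings so the new variables carry the _w suffix:
. regress lwage_w educ_w exper_w tenure_w IQ_w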

*Now predict the residuals of this winsorized log-lin model and apply the JB test again to check data normality.
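A sketch, where the residual name res03 is an assumption:
. predict res03, residual
. jb res03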

Interpretation: Bravo! The data has been completely normalized and the outliers have been removed as well. Now our hypothesis testing will be valid, as the t and F tests require the normality condition.
2) Model Specification Tests (Detecting Omitted Variable Bias)
i) Ramsey’s RESET Test: The first test is Ramsey’s RESET test, which is commonly used to detect model specification errors by including quadratic and cubic powers of the fitted values of the Y variable (in this case, wage in log form).
*Run the following STATA command using our final model with winsorized variables.
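This is the same winsorized regression as before; a sketch:
. regress lwage_w educ_w exper_w tenure_w IQ_w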

* After running the above regression, now predict the fitted values of lwage_w, and generate
(g) the quadratic and cubic values of lwage_w as follows:
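A sketch, where the fitted-value name lwage_w_hat and the quadratic/cubic variable names are assumptions:
. predict lwage_w_hat
. g lwage_w_hat2 = lwage_w_hat^2
. g lwage_w_hat3 = lwage_w_hat^3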

* Now run Ramsey’s RESET test by regressing lwage_w on the regressors and the quadratic and cubic terms of the fitted values of lwage_w with the following STATA command.
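A sketch of the auxiliary (unrestricted) regression, with the joint F-test on the added terms obtained through the test command:
. regress lwage_w educ_w exper_w tenure_w IQ_w lwage_w_hat2 lwage_w_hat3
. test lwage_w_hat2 lwage_w_hat3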
Interpretation: Since the F-test on the added quadratic and cubic terms is highly significant (its p-value < 0.05), we conclude that the model is misspecified.
ii) Lagrange Multiplier (LM) Test: In this test we regress the residuals of our model on the quadratic and cubic terms of the fitted (estimated) values of Y.
*Run the following STATA command
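A sketch, using the residuals (res03) and the powered fitted values generated earlier:
. regress res03 lwage_w_hat2 lwage_w_hat3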

*The following STATA command generates the LM test value by multiplying the number of observations (e(N)) by the R-squared (e(r2)) of this auxiliary regression.
. scalar nR2=e(N)*e(r2)
*The following command generates the 5% critical value of χ2 distribution.
. scalar chi2critical=invchi2tail(e(df_m), 0.05)
*The following command generates the p-value of χ2 distribution.
. scalar p_value=chi2tail(e(df_m), nR2)
* Now list all the scalar values generated previously with the following command.
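All stored scalars can be listed with:
. scalar list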

Interpretation: Since the calculated value of the LM test (586.65) is much greater than the χ² critical value (5.99), we reject the null hypothesis of no specification error.
3) Detecting the Right Functional Form: If we want to compare two competing models with the same dependent variable, we run the regressions in STATA and choose the model with the highest adjusted R² and the lower AIC or BIC values. However, a problem arises when the models have different DVs. For instance, you may want to compare two models in which the DV is wage in one model and lwage in the other.
wage = β1 + β2 educ + β3 exper + β4 tenure + β5 IQ (1)
lwage = β1 + β2 leduc + β3 lexper + β4 tenure + β5 lIQ (2)
In this case, we use the following Box-Cox transformation procedure to choose the right functional
form.
*Step 1. Find out the geometric mean of wage variable with the following STATA command:
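One built-in option is ameans, which reports the arithmetic, geometric, and harmonic means:
. ameans wage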

*Step 2. Now divide wage variable by its geometric mean to create new variable ‘wagestar’
with the following STATA command
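The extract does not show the numerical geometric mean, so a sketch that uses the identity GM(wage) = exp(mean of ln wage) is:
. egen double mean_lnwage = mean(ln(wage))
. gen wagestar = wage/exp(mean_lnwage)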

*Step 3. Now regress both models 1 and 2 with newly created common variable ‘wagestar’
with the following STATA commands:
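A sketch, assuming the logged regressors of model (2) must first be generated (this presumes educ, exper and IQ are strictly positive):
. gen lwagestar = ln(wagestar)
. g leduc = ln(educ)
. g lexper = ln(exper)
. g lIQ = ln(IQ)
. regress wagestar educ exper tenure IQ
. regress lwagestar leduc lexper tenure lIQ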
*Step 4. Now calculate the Box-Cox statistic as follows: -

B-Cox stat = (n/2) × ln(RSS2/RSS1), where RSS1 and RSS2 represent the residual sums of squares of the two models.

Note: keep the higher RSS in the numerator, which is RSS2 of the log model in this case. Moreover, the B-Cox stat follows the chi-square distribution with k−1 degrees of freedom, where k is the number of coefficients.
B-Cox Stat = 0.5 × 935 × ln(152.02/150.92) = 3.395

*Now calculate the p-value of the Box-Cox statistic with k−1 (5−1 = 4) degrees of freedom; there are four IVs in our model.
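A sketch, where the scalar names are placeholders:
. scalar bcstat = 0.5*935*ln(152.02/150.92)
. scalar bc_pvalue = chi2tail(4, bcstat)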

*Now list the calculated p-value with the following STATA command
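Using the placeholder name from the previous sketch:
. scalar list bc_pvalue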

Interpretation: Since the test statistic is insignificant (its p-value is greater than 0.05), we cannot conclude that the log model is superior to the linear model.
