Prac 12-Model Selection
1. Download the datasets model select 1.dta and model select 2.dta from
the Moodle site for the course and save them to a convenient location. Open Stata
and then open the dataset model select 1.dta.
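The dataset can also be opened from the Command window instead of the menus. The lines below are a minimal sketch; the folder path is hypothetical and should be replaced with wherever you saved the files.

```stata
* Hypothetical folder: change this to the location where you saved the .dta files
cd "C:/prac12"

* The file name contains spaces, so it must be enclosed in double quotes
use "model select 1.dta", clear

* Quick check of the variables and their ranges
describe
summarize
```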
2. The dataset gives data on the real gross domestic product (y), labour input (x2), and
real capital input (x3) in the manufacturing sector for a developing country for the
years 1958 to 1972. Suppose that the theoretically correct production function that
we can estimate using these data is of the Cobb-Douglas type. Our model can be
specified as follows:
$$\ln Y_t = B_1 + B_2 \ln X_{2t} + B_3 \ln X_{3t} + u_t$$
$$\ln Y_t = A_1 + A_2 X_{2t} + v_t$$
where $v_t$ = error term.
Run the above regression and examine the consequences by referring to the Note
on Omitted Variable Bias uploaded on Moodle.
What difference(s) do you note with regard to the estimated coefficient values (i.e.
elasticity values), the standard errors and the $R^2$ values?
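One possible way to set the two sets of estimates side by side is sketched below. The variable names y, x2 and x3 follow the dataset description; lny, lnx2, lnx3 and the stored-estimate names full and short are names chosen here for illustration. The second regression is typed as the equation is written above, with X2 in levels; if a log-log form is intended (the question refers to elasticities), use lnx2 in place of x2.

```stata
* Logged variables (names chosen here, not given in the prac)
gen lny  = ln(y)
gen lnx2 = ln(x2)
gen lnx3 = ln(x3)

* Full Cobb-Douglas model with labour and capital
regress lny lnx2 lnx3
estimates store full

* Model without capital, as written above (X2 in levels);
* replace x2 with lnx2 if a log-log form is intended
regress lny x2
estimates store short

* Coefficients, standard errors and R-squared in one table
estimates table full short, se stats(r2 N)
```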
7. Now suppose that data on labour (i.e. $X_2$) were not initially available and therefore
you estimated the following production function:
$$\ln Y_t = B_1 + B_2 \ln X_{3t} + w_t$$
where $w_t$ = error term.
Run the above regression and again examine the consequences. What difference(s)
do you note with regard to the estimated coefficient values (i.e. elasticity values), the
standard errors and the $R^2$ values?
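The standard textbook result on omitted variables may help when interpreting the slope here. As a sketch, assume the three-variable Cobb-Douglas model is the true one, and write $\tilde{B}_2$ (a symbol introduced here, to avoid a clash with the labour coefficient $B_2$ of the full model) for the OLS slope on $\ln X_{3t}$ in the two-variable regression:

```latex
E(\tilde{B}_2) = B_3 + B_2 \, b_{23}
```

Here $B_2$ and $B_3$ are the labour and capital elasticities of the full model, and $b_{23}$ (also notation introduced here) is the slope from regressing $\ln X_{2t}$ on $\ln X_{3t}$. The estimated capital elasticity therefore absorbs part of the labour effect whenever the two inputs are correlated.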
Comment on this by running the regression with the trend variable in the model,
examining and commenting on the statistical significance of all the variables, on why
they may have possibly changed (compared to your original model) and how you
would interpret these changes.
This data set contains information on U.S. expenditure on imported goods (y),
personal disposable income (x) and the trend variable (t) for the period 1968
to 1987.
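A minimal sketch for getting started with this second dataset follows. It assumes the file is in Stata's current working directory, and it assumes that the residuals examined in the next question come from a regression of y on x alone (the later question adds t to that model).

```stata
* Assumes "model select 2.dta" is in the current working directory
use "model select 2.dta", clear
describe

* Assumption: the starting model regresses y on x only, without the trend
regress y x
```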
12. Conduct an examination of the residuals plotted against the period of the study (i.e.
“year” variable).
predict e, resid
twoway connected e year, yline(0)
Do the residuals look randomly distributed or do they reveal any kind of systematic
pattern? If they do not appear to be randomly distributed, provide one or more
possible reasons.
13. Now regress y on x and t. Again, examine the residuals (call the variable for the
residuals e2) to see whether they now appear to be randomly distributed. What do
you conclude?
reg y x t
predict e2, resid
twoway connected e2 year, yline(0)
14. Given the above, let us now test whether a log-linear specification would have
been more appropriate than a linear specification. Since both models may look
equally good in terms of the usual criteria, we can test for the “better” model
using the MWD (MacKinnon-White-Davidson) test as follows:
Step 1: Estimate the linear model (which you have already done) and obtain the
estimated Y values, i.e. $\hat{Y}_i$.
predict yfit
Step 2: Generate the logged value of the estimated $\hat{Y}_i$ (above) to yield $\ln \hat{Y}_i$:
gen lnyfitted = log(yfit)
Step 4: Obtain $Z_{1i} = \ln \hat{Y}_i - \widehat{\ln Y}_i$.
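The Z1 step can be sketched in Stata as follows. This assumes yfit and lnyfitted already exist from Steps 1 and 2, and that lny, lnx and lnyfit (names that appear later in this prac) refer to the logged variables and the fitted values from the log-linear model; z1 is a name chosen here. The capture prefix simply skips a line if the variable has already been created.

```stata
* Log-linear model and its fitted values (skipped if already created)
capture gen lny = ln(y)
capture gen lnx = ln(x)
regress lny lnx t
capture predict lnyfit

* Z1 = log of the linear-model fit minus the log-linear-model fit
gen z1 = lnyfitted - lnyfit

* Add Z1 to the linear model: a statistically significant coefficient
* on z1 is evidence against the linear specification
regress y x t z1
```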
15. To test whether the log-linear model is appropriate, continue the MWD test as
explained on pp. 11-12 of your notes.
Step 6: Obtain $Z_{2i} = \text{antilog}(\widehat{\ln Y}_i) - \hat{Y}_i$
gen z2 = exp(lnyfit) - yfit
Step 7: Regress lnY on the X’s or logs of X’s and Z2.
reg lny lnx t z2
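The decision in this last step rests on the coefficient on z2: the t statistic and p-value appear in the regression output, and a statistically significant coefficient is evidence against the log-linear specification. As a small optional sketch, the same hypothesis can be tested explicitly straight after the regression above:

```stata
* Run immediately after: reg lny lnx t z2
* Tests H0: the coefficient on z2 is zero (equivalent to the t test in the output)
test z2
```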