Exercise 1 (Week 37)
1. The file CEO.dat contains data on 447 chief executive officers and can be used to
examine the effects of firm performance on CEO salary.
(i) Estimate a model relating annual salary to firm sales and asset value. Make the model
of the constant elasticity variety for both independent variables. Write the results out in
equation form.
(ii) Add profits to the model from part (i). Why can this variable not be included in
logarithmic form? Would you say that these firm performance variables explain most of
the variation in CEO salaries?
(iii) Add the variable tenure to the model in part (ii). What is the estimated percentage
return for another year of CEO tenure, holding other factors fixed?
(iv) Find the sample correlation coefficient between the variables log(sales) and profits.
Are these variables highly correlated? What does this say about the OLS estimators?
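Since CEO.dat is not reproduced here, the mechanics of parts (i) and (iv) can be sketched on simulated data; the variable names and the "true" coefficients below are assumptions for illustration, not values from the actual dataset.

```python
# Sketch of parts (i) and (iv) on simulated data: estimate a constant-
# elasticity (log-log) model by OLS and compute a sample correlation.
# CEO.dat is not loaded here; all names and coefficients are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 447
log_sales = rng.normal(8.0, 1.0, n)
log_assets = rng.normal(7.0, 1.2, n)
# Profits move with sales but can be negative, which is why profits
# cannot enter the model in logarithmic form.
profits = 50 * (np.exp(log_sales - 8.0) - 0.8) + rng.normal(0, 30, n)
# Assumed model: log(salary) = 4.5 + 0.25 log(sales) + 0.10 log(assets) + u
log_salary = 4.5 + 0.25 * log_sales + 0.10 * log_assets + rng.normal(0, 0.5, n)

# OLS on the design matrix [1, log(sales), log(assets)]
X = np.column_stack([np.ones(n), log_sales, log_assets])
beta = np.linalg.lstsq(X, log_salary, rcond=None)[0]
print(beta)  # (intercept, sales elasticity, assets elasticity)

# Part (iv): sample correlation between log(sales) and profits
r = np.corrcoef(log_sales, profits)[0, 1]
print(r)
```

With the real file, the same two least-squares lines apply after taking logs of salary, sales, and asset value.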
The following equation describes the median housing price in a community in terms of
the amount of pollution (nox for nitrous oxide) and the average number of rooms in
houses in the community (rooms):

log(price) = β0 + β1 log(nox) + β2 rooms + u
(i) What are the probable signs of β1 and β2 ? What is the interpretation of β1 ? Explain.
(ii) Why might nox [or more precisely, log(nox)] and rooms be negatively correlated? If
this is the case, does the simple regression of log(price) on log(nox) produce an upward
or a downward biased estimator of β1 ?
(iii) Using the data in HPRICE2.RAW, the following equations were estimated:

log(price)^ = 11.71 − 1.043 log(nox),  n = 506, R² = 0.264
log(price)^ = 9.23 − 0.718 log(nox) + 0.306 rooms,  n = 506, R² = 0.514

Is the relationship between the simple and multiple regression estimates of the elasticity
of price with respect to nox what you would have predicted, given your answer in part
(ii)? Does this mean that −0.718 is definitely closer to the true elasticity than −1.043?
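The direction of the bias in parts (ii) and (iii) can be seen on simulated data in which log(nox) and rooms are negatively correlated and rooms has a positive effect; the coefficients below are assumptions chosen only to mimic the signs of the fitted equations, not estimates from HPRICE2.RAW.

```python
# Simulated illustration: omitting a positively-signed regressor that is
# negatively correlated with log(nox) biases the simple-regression slope
# on log(nox) downward (makes it more negative).
import numpy as np

rng = np.random.default_rng(1)
n = 506
log_nox = rng.normal(1.7, 0.2, n)
rooms = 6.0 - 2.0 * log_nox + rng.normal(0, 0.5, n)   # negatively correlated
log_price = 9.0 - 0.7 * log_nox + 0.3 * rooms + rng.normal(0, 0.2, n)

# Simple regression: log(price) on log(nox) only
b_simple = np.linalg.lstsq(np.column_stack([np.ones(n), log_nox]),
                           log_price, rcond=None)[0][1]
# Multiple regression: log(price) on log(nox) and rooms
b_multiple = np.linalg.lstsq(np.column_stack([np.ones(n), log_nox, rooms]),
                             log_price, rcond=None)[0][1]
print(b_simple, b_multiple)  # the simple slope is more negative
```

The gap between the two slopes is approximately the product of the rooms coefficient and the slope from regressing rooms on log(nox), which is the omitted-variable-bias formula.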
4. The file Kc_house_data.dat contains house sale prices for King County, which includes
Seattle, covering homes sold between May 2014 and May 2015.
(i) Confirm the partialling out interpretation of the OLS estimates by explicitly doing the
partialling out. Regress log(price) on log(sqft_living), log(sqft_lot), and floors.
(ii) Regress log(sqft_living) on log(sqft_lot) and floors, and save the residual, which we
will call r̃1.
(iii) Now regress log(price) on r̃1. Can you confirm that the coefficient on r̃1 is the same
as the one obtained for log(sqft_living) in the regression in (i)? What about the standard
errors in this regression: are they the same as in (i)?
(iv) Run a regression that also gives you the same standard errors. (Hint: you need to
remove the proportion of variance coming from log(sqft_lot) and floors from the dependent
variable as well.)
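The four steps above can be sketched as follows. Since Kc_house_data.dat is not loaded here, the sketch uses simulated stand-ins for the variables; the coefficient equalities it checks follow from the Frisch-Waugh-Lovell theorem, not from the particular numbers chosen.

```python
# Partialling-out (Frisch-Waugh-Lovell) demonstration on simulated data.
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X (X already contains a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 1000
const = np.ones(n)
log_lot = rng.normal(9.0, 0.9, n)
floors = rng.integers(1, 4, n).astype(float)
log_living = 2.0 + 0.4 * log_lot + 0.2 * floors + rng.normal(0, 0.3, n)
log_price = (7.0 + 0.6 * log_living + 0.1 * log_lot + 0.05 * floors
             + rng.normal(0, 0.25, n))

Z = np.column_stack([const, log_lot, floors])   # the "other" regressors

# (i) full regression of log(price) on all three regressors
b_full = ols(np.column_stack([const, log_living, log_lot, floors]), log_price)

# (ii) residualize log(sqft_living) on the other regressors -> r1
r1 = log_living - Z @ ols(Z, log_living)

# (iii) log(price) on r1 reproduces the slope from (i) exactly
b_r1 = ols(np.column_stack([const, r1]), log_price)

# (iv) residualizing log(price) as well reproduces the standard error too
ry = log_price - Z @ ols(Z, log_price)
b_both = ols(np.column_stack([const, r1]), ry)

print(b_full[1], b_r1[1], b_both[1])  # three identical slopes
```

The slope in (iii) matches (i) because r1 is orthogonal to the other regressors; the residual variance, and hence the standard error, only matches once the dependent variable is residualized as well (up to a degrees-of-freedom adjustment).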
5. Use Kc_house_data.dat again; this time we look at omitted variable bias.
(i) Run a simple regression of log(sqft_above) on log(sqft_living) to obtain the slope
coefficient δ̃1.
(ii) Run a simple regression of log(price) on log(sqft_living) to obtain the slope
coefficient β̃1.
(iii) Run a regression of log(price) on log(sqft_living) and log(sqft_above) to obtain the
slope coefficients β̂1 and β̂2.
(iv) Verify that β̃1 = β̂1 + β̂2 δ̃1.
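The relation in (iv) is an algebraic identity, so it can be verified on any sample. The sketch below uses simulated stand-ins for the King County variables (the coefficients are assumptions, not estimates from the file); note that δ̃1 is the slope from regressing the omitted regressor on the included one, which is the direction the identity requires.

```python
# Omitted-variable-bias identity:
#   beta1_tilde = beta1_hat + beta2_hat * delta1_tilde.
import numpy as np

rng = np.random.default_rng(3)
n = 500
log_living = rng.normal(7.6, 0.4, n)
log_above = 0.3 + 0.85 * log_living + rng.normal(0, 0.2, n)
log_price = 6.0 + 0.5 * log_living + 0.3 * log_above + rng.normal(0, 0.3, n)

def simple_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

delta1 = simple_slope(log_living, log_above)       # (i) omitted on included
beta1_tilde = simple_slope(log_living, log_price)  # (ii) short regression

# (iii) long regression with both regressors
X = np.column_stack([np.ones(n), log_living, log_above])
_, beta1_hat, beta2_hat = np.linalg.lstsq(X, log_price, rcond=None)[0]

# (iv) the identity holds exactly in-sample, not just approximately
print(beta1_tilde, beta1_hat + beta2_hat * delta1)
```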
A problem of interest to health officials (and others) is to determine the effects of smoking
during pregnancy on infant health. One measure of infant health is birth weight; a birth
weight that is too low can put an infant at risk for contracting various illnesses. Since
factors other than cigarette smoking that affect birth weight are likely to be correlated
with smoking, we should take those factors into account. For example, higher income
generally results in access to better prenatal care, as well as better nutrition for the mother.
An equation that recognizes this is:

bwght = β0 + β1 cigs + β2 faminc + u
(i) Let β̂0 and β̂1 be the intercept and slope from the regression of yi on xi , using n ob-
servations. Let c1 and c2, with c2 ≠ 0, be constants. Let β̃0 and β̃1 be the intercept and
slope from the regression of c1 yi on c2 xi . Show that β̃1 = (c1 /c2 )β̂1 (typo in the book) and
β̃0 = c1 β̂0 , thereby verifying the claims on units of measurement in Section 2-4. [Hint: To
obtain β̃1 , plug the scaled versions of x and y into (2.19). Then, use (2.17) for β̃0 , being
sure to plug in the scaled x and y and the correct slope.]
Equation (2.17):

β̂0 = ȳ − β̂1 x̄

Equation (2.19):

β̂1 = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ) / ∑_{i=1}^{n} (xi − x̄)²
(ii) Now, let β̃0 and β̃1 be from the regression of (c1 + yi ) on (c2 + xi ) (with no restriction
on c1 or c2 ). Show that β̃1 = β̂1 and β̃0 = β̂0 + c1 − c2 β̂1 .
(iii) Now, let β̂0 and β̂1 be the OLS estimates from the regression log(yi ) on xi , where we
must assume yi > 0 for all i. For c1 > 0, let β̃0 and β̃1 be the intercept and slope from the
regression of log(c1 yi ) on xi . Show that β̃1 = β̂1 and β̃0 = log(c1 ) + β̂0 .
(iv) Now, assuming that xi > 0 for all i, let β̃0 and β̃1 be the intercept and slope from the
regression of yi on log(c2 xi ). How do β̃0 and β̃1 compare with the intercept and slope from
the regression yi on log(xi )?
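All four results can be checked numerically. The sketch below uses arbitrary simulated data and constants (c1 = 2.5 and c2 = 4.0 are assumptions), fitting each regression with the closed-form simple-OLS formulas (2.17) and (2.19).

```python
# Numerical check of the scaling and shifting results in parts (i)-(iv).
import numpy as np

def fit(x, y):
    """(intercept, slope) of the simple OLS regression of y on x."""
    xc = x - x.mean()
    slope = xc @ (y - y.mean()) / (xc @ xc)   # equation (2.19)
    return y.mean() - slope * x.mean(), slope  # intercept via (2.17)

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1.0, 5.0, n)                       # x > 0 so log(x) exists
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, n))  # y > 0 so log(y) exists
c1, c2 = 2.5, 4.0

b0, b1 = fit(x, y)                  # baseline regression of y on x

a0, a1 = fit(c2 * x, c1 * y)        # (i) rescale both variables
s0, s1 = fit(c2 + x, c1 + y)        # (ii) shift both variables
l0, l1 = fit(x, np.log(y))          # (iii) baseline log regression
m0, m1 = fit(x, np.log(c1 * y))     # (iii) rescale y inside the log
g0, g1 = fit(np.log(x), y)          # (iv) baseline regression on log(x)
h0, h1 = fit(np.log(c2 * x), y)     # (iv) rescale x inside the log

print(np.isclose(a1, (c1 / c2) * b1), np.isclose(a0, c1 * b0))    # (i)
print(np.isclose(s1, b1), np.isclose(s0, b0 + c1 - c2 * b1))      # (ii)
print(np.isclose(m1, l1), np.isclose(m0, np.log(c1) + l0))        # (iii)
print(np.isclose(h1, g1), np.isclose(h0, g0 - g1 * np.log(c2)))   # (iv)
```

In (iv) the slope is unchanged because log(c2 x) = log(c2) + log(x) merely shifts the regressor, so only the intercept moves, by −β̃1 log(c2).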