Assignment EMET8005
xlsx contains data on 1070 houses sold in Baton Rouge, Louisiana, during
mid-2005. The data include sale price, the house size in square feet, its age, whether it has a
pool or a fireplace or is on a waterfront. Also included is an indicator variable
TRADITIONAL indicating whether the house style is traditional or not.
(a) Plot the house price against house size for houses with traditional style.
(b) For the traditional-style houses estimate the linear regression model:
PRICE = β₁ + β₂ SQFT + e. Interpret the estimates. Draw a sketch of the fitted line. Test
the null hypothesis that the slope is zero against the alternative that it is positive, using
α = 0.05.
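As a rough illustration of the arithmetic behind part (b), the sketch below fits the simple regression by least squares and forms the t-statistic for the one-sided test of the slope. The five data points are made up for illustration and are not taken from the assignment data; the critical value is read from a t-table for the toy sample's degrees of freedom.

```python
import math

def simple_ols(x, y):
    """Least squares fit of y = b1 + b2*x + e.

    Returns (b1, b2, se_b2, sse), where se_b2 is the usual standard error
    of the slope and sse is the sum of squared residuals.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)          # error variance estimate, N - 2 df
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return b1, b2, se_b2, sse

# Made-up data (NOT the assignment data set).
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70000, 95000, 130000, 150000, 185000]

b1, b2, se_b2, sse = simple_ols(sqft, price)
t_stat = b2 / se_b2                     # H0: beta2 = 0 against H1: beta2 > 0
t_crit = 2.353                          # t-table value, alpha = 0.05, 3 df (toy sample only)
reject_h0 = t_stat > t_crit             # one-sided rejection rule
print(b1, b2, se_b2, t_stat, reject_h0)
```

With the real sample the degrees of freedom are much larger, so the critical value would be close to the standard normal value rather than the one used here.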
(c) For the traditional-style houses estimate the quadratic regression model:
PRICE = β₁ + β₂ SQFT² + e. Compute the marginal effect of an additional square foot
of living area in a home with 2000 square feet of living space. Compute the elasticity of
PRICE with respect to SQFT for a home with 2000 square feet of living space. Graph
the fitted line.
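For the quadratic model PRICE = β₁ + β₂ SQFT², the slope is dPRICE/dSQFT = 2 β₂ SQFT and the elasticity is that slope multiplied by SQFT / PRICE, evaluated at the fitted price. The sketch below shows the calculation; the coefficient values are hypothetical placeholders, not estimates from the assignment data.

```python
def quadratic_effects(b1, b2, sqft):
    """Fitted price, marginal effect and elasticity for PRICE = b1 + b2*SQFT^2."""
    price_hat = b1 + b2 * sqft ** 2
    slope = 2 * b2 * sqft                  # dPRICE/dSQFT
    elasticity = slope * sqft / price_hat  # slope * SQFT / fitted PRICE
    return price_hat, slope, elasticity

# Hypothetical coefficient values, for illustration only.
price_hat, slope, elasticity = quadratic_effects(b1=55000.0, b2=0.015, sqft=2000)
print(price_hat, slope, elasticity)
```

Because the slope depends on SQFT, both quantities change with house size, which is why the question fixes the evaluation point at 2000 square feet.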
(d) For regressions in (b) and (c) compute the least squares residuals and plot them
against SQFT. Do any of the regression assumptions appear violated?
(e) One basis for choosing between these two specifications is how well the data are fit
by the model. Compare the sum of squared residuals (SSE) from the models in (b) and
(c). Which model has a lower SSE? How does having a lower SSE indicate a better
fitting model?
(f) For traditional-style houses estimate the log-linear regression model:
ln(PRICE) = β₁ + β₂ SQFT + e. Interpret the estimates. Graph the fitted line and
sketch the tangent line to the curve for a house with 2000 square feet of living area.
(g) How would you compute the sum of squared residuals for the model in (f) to make it
comparable to those from models in (b) and (c)? Compare this sum of squared
residuals to the SSE from the linear and quadratic specifications. Which model fits
the data best?
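One way to make the log-linear SSE comparable is to convert the fitted values back to price units before computing residuals. The sketch below uses the "natural" predictor exp(fitted ln(PRICE)); a variant that multiplies by exp(σ̂²/2) is also sometimes used and is not shown. The data are made up and the helper names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for y = b1 + b2*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

def sse_linear(sqft, price):
    """SSE of the linear model, in squared price units."""
    b1, b2 = fit_line(sqft, price)
    return sum((p - (b1 + b2 * x)) ** 2 for x, p in zip(sqft, price))

def sse_log_linear_in_price_units(sqft, price):
    """Fit ln(PRICE) on SQFT, then measure residuals as PRICE - exp(fitted)."""
    g1, g2 = fit_line(sqft, [math.log(p) for p in price])
    fitted = [math.exp(g1 + g2 * x) for x in sqft]
    return sum((p - f) ** 2 for p, f in zip(price, fitted))

# Made-up data (NOT the assignment data set).
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70000, 95000, 130000, 150000, 185000]

sse_lin = sse_linear(sqft, price)
sse_log = sse_log_linear_in_price_units(sqft, price)
print(sse_lin, sse_log)
```

Both numbers are now sums of squared dollar errors, so they can be compared directly; the SSE reported by software for the log-linear regression is in squared log units and cannot.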
(h) Using the linear model in (b), test the null hypothesis that the expected price of a house
of 2000 square feet is less than or equal to $120,000. What is the appropriate
alternative hypothesis? Use α = 0.05 level of significance. Obtain the p-value of the
test. What is your conclusion?
(i) Based on the estimated results from part (b), construct a 95% interval estimate of the
expected price of a house of 2000 square feet.
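Parts (h) and (i) both rest on the estimated expected price b₁ + b₂·2000 and its standard error, sqrt(σ̂² (1/N + (x₀ − x̄)² / Σ(xᵢ − x̄)²)). The sketch below computes both for made-up data, then forms the t-statistic for (h) and the interval for (i). The critical values are t-table values for the toy sample's 3 degrees of freedom and would differ for the real sample.

```python
import math

def expected_value_estimate(x, y, x0):
    """Estimate E(y | x = x0) from y = b1 + b2*x + e, with its standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)
    fit = b1 + b2 * x0
    se_fit = math.sqrt(sigma2_hat * (1.0 / n + (x0 - x_bar) ** 2 / sxx))
    return fit, se_fit

# Made-up data (NOT the assignment data set).
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70000, 95000, 130000, 150000, 185000]

fit, se_fit = expected_value_estimate(sqft, price, x0=2000)

# Part (h): H0: E(PRICE | 2000) <= 120000 against H1: E(PRICE | 2000) > 120000.
t_stat = (fit - 120000) / se_fit
reject_h0 = t_stat > 2.353              # one-sided 5% t-table value, 3 df (toy sample)

# Part (i): 95% interval estimate, using the two-sided t-table value for 3 df.
t_c = 3.182
interval = (fit - t_c * se_fit, fit + t_c * se_fit)
print(fit, se_fit, t_stat, reject_h0, interval)
```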
Q2. The data file cps4_small.xls contains 1000 observations on hourly wage rates, education, and
other variables from the 2008 Current Population Survey (CPS). The variables this exercise will
use are years of education EDUC, years of experience EXPER and hours worked per week
HRSWK.
(a) Estimate the linear regression WAGE = β₁ + β₂ EDUC + e and the log-linear regression
ln(WAGE) = β₁ + β₂ EDUC + e.
(b) What is the estimated return to education in each model? That is, for an additional
year of education, what percent increase in wage can the average worker expect?
(c) Construct histograms of the residuals from the linear and log-linear models in (a), and
carry out the Jarque-Bera test for normality.
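The Jarque-Bera statistic is JB = (N/6) (S² + (K − 3)²/4), where S is the skewness and K the kurtosis of the residuals; under normality it is approximately chi-square with 2 degrees of freedom. A minimal sketch with made-up residuals follows. The p-value line uses the fact that the upper-tail probability of a chi-square(2) variable at x is exp(−x/2).

```python
import math

def jarque_bera(resid):
    """Skewness, kurtosis, JB statistic and chi-square(2) p-value for residuals."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n   # second central moment
    m3 = sum((e - mean) ** 3 for e in resid) / n   # third central moment
    m4 = sum((e - mean) ** 4 for e in resid) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)                  # exact upper tail of chi-square(2)
    return skew, kurt, jb, p_value

# Made-up residuals, for illustration only.
skew, kurt, jb, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt, jb, p_value)
```

Normality is rejected at the 5% level when JB exceeds 5.99, the chi-square(2) critical value, or equivalently when the p-value is below 0.05.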
(d) Compare the R² of the linear model to the generalised R² for the log-linear model.
Which model fits the data better?
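A common definition of the generalised R² is the squared correlation between the observed WAGE and the wage predicted by the log-linear model, with the prediction converted back from logs. The sketch below assumes that definition and uses the natural predictor exp(b₁ + b₂ EDUC); the data and the coefficient values are made up.

```python
import math

def squared_correlation(y, y_hat):
    """Squared sample correlation between observed and predicted values."""
    n = len(y)
    y_bar = sum(y) / n
    h_bar = sum(y_hat) / n
    syh = sum((a - y_bar) * (b - h_bar) for a, b in zip(y, y_hat))
    syy = sum((a - y_bar) ** 2 for a in y)
    shh = sum((b - h_bar) ** 2 for b in y_hat)
    return syh ** 2 / (syy * shh)

# Made-up observations and hypothetical log-linear estimates (illustration only).
wage = [10.0, 14.0, 18.0, 30.0]
educ = [10, 12, 14, 18]
g1, g2 = 1.5, 0.1

wage_hat = [math.exp(g1 + g2 * e) for e in educ]   # predictions in wage units
r2_generalised = squared_correlation(wage, wage_hat)
print(r2_generalised)
```

For the linear model the same function applied to its fitted values reproduces the usual R², which is what makes the two numbers comparable.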
(e) Estimate the following wage equation:
ln(WAGE) = β₁ + β₂ EDUC + β₃ EXPER + β₄ HRSWK + e. Report the results and
interpret the estimates for β₂, β₃ and β₄. Are these estimates different from zero?
(f) Find a 90% interval estimate for the percentage increase in wage from working an
additional hour per week.
(g) Re-estimate the model with the additional variables EDUC × EXPER, EDUC² and EXPER²
(that is, estimate the equation in (i) below). Report the results. Are the estimated
coefficients significantly different from zero?
(h) For the model in (g), estimate the marginal effect ∂ln(WAGE)/∂EDUC for two workers, Jill
and Wendy; Jill has 16 years of education and 10 years of experience, while Wendy
has 12 years of education and 10 years of experience. What can you say about the
marginal effect of education as education increases?
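As a starting point for part (h), differentiating the specification given in (i) with respect to EDUC gives the expression below. The coefficient numbering follows the equation in (i), where β₃ multiplies EDUC² and β₆ multiplies EDUC × EXPER.

```latex
\frac{\partial \ln(WAGE)}{\partial EDUC}
  = \beta_2 + 2\beta_3\,EDUC + \beta_6\,EXPER
```

Evaluated at EXPER = 10, this is β₂ + 32β₃ + 10β₆ for EDUC = 16 and β₂ + 24β₃ + 10β₆ for EDUC = 12, so the two marginal effects differ by 8β₃.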
(i) For the specification:
ln(WAGE) = β₁ + β₂ EDUC + β₃ EDUC² + β₄ EXPER + β₅ EXPER² +
β₆ EDUC × EXPER + β₇ HRSWK + e
test whether β₄, β₅ and β₆ are jointly insignificant.
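A joint test of this kind is usually carried out with the F-statistic F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)], where SSE_R comes from the model with the J restrictions imposed and SSE_U from the unrestricted model with K coefficients. The sketch below shows the calculation with J = 3, N = 1000 and K = 7, as implied by the question; the two SSE values are hypothetical placeholders.

```python
def f_statistic(sse_restricted, sse_unrestricted, n_restrictions, n_obs, n_params):
    """F-statistic for J linear restrictions in a model with K coefficients."""
    numerator = (sse_restricted - sse_unrestricted) / n_restrictions
    denominator = sse_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Hypothetical SSE values, for illustration only.
f_stat = f_statistic(
    sse_restricted=250.0,    # model without EXPER, EXPER^2 and EDUC x EXPER
    sse_unrestricted=240.0,  # full model from (i)
    n_restrictions=3,        # H0: beta4 = beta5 = beta6 = 0
    n_obs=1000,
    n_params=7,
)
print(f_stat)
```

The null hypothesis is rejected when the statistic exceeds the critical value of the F distribution with (3, 993) degrees of freedom at the chosen significance level.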
Q3. The data file br2.xls contains data on 1080 house sales in Baton Rouge, Louisiana,
during July and August 2005. The variables are PRICE ($), SQFT (total square feet),
BEDROOMS (number), BATHS (number), AGE (years), OWNER (=1 if occupied by owner;
zero if vacant or rented), POOL (=1 if present), TRADITIONAL (=1 if traditional style; 0 if
other style), FIREPLACE (=1 if present), and WATERFRONT (=1 if on waterfront).
(a) Compute the data summary statistics and comment. In particular, construct a
histogram of PRICE. What do you observe?
(b) Estimate a regression model explaining ln (PRICE/100) as a function of the remaining
variables. Divide the variable SQFT by 100 prior to estimation. Comment on how
well the model fits the data. Discuss the signs and statistical significance of the
estimated coefficients. Are the signs what you expect? Give an exact interpretation
of the coefficient of WATERFRONT.
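In a model with a logged dependent variable, the exact percentage effect of an indicator variable with coefficient δ is 100 (exp(δ) − 1) percent, which differs from the rough approximation 100 δ when δ is not small. A short sketch of the calculation, using a hypothetical coefficient value:

```python
import math

def exact_percentage_effect(delta):
    """Exact % change in the level of y when a dummy switches from 0 to 1
    in a regression with ln(y) as the dependent variable."""
    return 100.0 * (math.exp(delta) - 1.0)

# Hypothetical coefficient value, for illustration only.
delta_hat = 0.10
exact = exact_percentage_effect(delta_hat)
approximate = 100.0 * delta_hat
print(exact, approximate)
```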
(c) Create a variable that is the product of WATERFRONT and TRADITIONAL. Add this
variable to the model and re-estimate. What is the effect of adding this variable?
Interpret the coefficient of this interaction variable, and discuss its sign and statistical
significance.
(d) It is arguable that the traditional-style homes may have a different regression function
from the diverse set of non-traditional styles. Carry out a Chow test of the equivalence
of the regression models for traditional versus non-traditional styles. What do you
conclude?
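One common way to compute a Chow test is to compare the SSE of a pooled regression with the combined SSE of separate regressions on the two groups. The sketch below assumes that approach, with K the number of coefficients in each group's regression (TRADITIONAL itself cannot appear there, since it is constant within a group) and the pooled regression restricted to have the same K coefficients for both groups. All numbers other than N = 1080 are hypothetical placeholders.

```python
def chow_f_statistic(sse_pooled, sse_group_a, sse_group_b, n_obs, k_per_group):
    """Chow F-statistic for equality of all K coefficients across two groups."""
    sse_separate = sse_group_a + sse_group_b
    numerator = (sse_pooled - sse_separate) / k_per_group
    denominator = sse_separate / (n_obs - 2 * k_per_group)
    return numerator / denominator

# Hypothetical SSE values and K, for illustration only.
chow_f = chow_f_statistic(
    sse_pooled=80.0,     # one regression on all houses
    sse_group_a=40.0,    # traditional-style houses only
    sse_group_b=35.0,    # non-traditional houses only
    n_obs=1080,
    k_per_group=9,
)
print(chow_f)
```

The statistic is compared with the F critical value with (K, N − 2K) degrees of freedom. An equivalent route is to interact TRADITIONAL with every regressor in a single equation and test the indicator and interaction terms jointly.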
Q4.
(a) Using the data in cps4_small.xls, estimate the following wage equation with least
squares and heteroskedasticity-robust standard errors:
Note: For all the tests above, carefully state (i) the null and alternative hypotheses, (ii) the test
statistic and the corresponding critical value, (iii) the rejection region, (iv) the decision rule and (v)
the conclusion.