Problem Set 4 With Solutions
Question 1
(a) $u_i$ represents factors other than nap time that influence the
student's performance on the exam, including amount of time studying,
aptitude for the material, and so forth. Some students will
have studied more than average, others less; some students will
have higher than average aptitude for the subject, others lower,
and so forth.
i.
ii.
0.17 × 5 = 0.85
Question 2
where c is a number. For example, in part (a) c = 100, and in part (b)
c = 1/1000.
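The OLS estimator obtained from the regression of $Y$ on $X^{*}$ is

$$\hat\beta_1^{*} = \frac{\sum_{i=1}^{n}(Y_i-\bar Y)(X_i^{*}-\bar X^{*})}{\sum_{i=1}^{n}(X_i^{*}-\bar X^{*})^{2}} = \frac{c\sum_{i=1}^{n}(Y_i-\bar Y)(X_i-\bar X)}{c^{2}\sum_{i=1}^{n}(X_i-\bar X)^{2}} = \frac{1}{c}\hat\beta_1,$$

where $\hat\beta_1$ is the OLS estimator obtained from the regression of $Y$ on $X$. For the intercept,

$$\hat\beta_0^{*} = \bar Y - \hat\beta_1^{*}\bar X^{*} = \bar Y - \frac{1}{c}\hat\beta_1\, c\bar X = \bar Y - \hat\beta_1\bar X = \hat\beta_0,$$

because $\bar X^{*} = c\bar X$.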
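The rescaling result can be checked numerically. The following is a minimal sketch, not part of the original solution: the data values and the helper name `ols` are made up for illustration, and `c = 100` is taken from part (a).

```python
# Sketch: multiplying the regressor by c divides the OLS slope by c
# and leaves the OLS intercept unchanged.
# The data are hypothetical and serve only to illustrate the algebra.

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical regressor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical outcomes
c = 100                          # scale factor, as in part (a)

b0, b1 = ols(x, y)                                # regression of Y on X
b0_star, b1_star = ols([c * xi for xi in x], y)   # regression of Y on cX

print("slope on cX:", b1_star, " slope on X divided by c:", b1 / c)
print("intercept on cX:", b0_star, " intercept on X:", b0)
```

The two printed slopes should agree up to floating-point error, and likewise the two intercepts, whatever data are plugged in.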
Question 3
(to four decimal places). The p-value is less than 0.01, so we can
reject the null hypothesis that there is no gender gap at a 1%
significance level.
(c) The 95% confidence interval for the gender gap is $\{1.78 \pm 1.96 \times 0.29\}$,
that is, $1.21 \le \beta_1 \le 2.35$.
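As a quick check of the arithmetic, the sketch below recomputes the t-statistic, a two-sided p-value, and the 95% interval from the estimate 1.78 and standard error 0.29 quoted in the text. It assumes the large-sample standard normal approximation; the variable names are illustrative.

```python
import math

estimate = 1.78   # estimated gender gap, from the text
se = 0.29         # standard error of the estimate, from the text

t_stat = estimate / se

# Two-sided p-value under the standard normal approximation:
# 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

# 95% confidence interval using the 1.96 critical value
lower = estimate - 1.96 * se
upper = estimate + 1.96 * se

print(f"t = {t_stat:.2f}, p-value = {p_value:.2e}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```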
(d) The sample average wage of women is $\hat\beta_0 = 10.73$. The sample
average wage of men is $\hat\beta_0 + \hat\beta_1 = 10.73 + 1.78 = 12.51$.
(e) The binary variable regression model relating wages to gender can
be written as either $Wage = \beta_0 + \beta_1 Male + u$, or as
$Wage = \gamma_0 + \gamma_1 Female + v$. In the first regression equation, Male equals
1 for men and 0 for women; $\beta_0$ is the population mean of wages
for women and $\beta_0 + \beta_1$ is the population mean of wages for men.
In the second regression equation, Female equals 1 for women
and 0 for men; $\gamma_0$ is the population mean of wages for men and
$\gamma_0 + \gamma_1$ is the population mean of wages for women. This implies
the following relationship for the coefficients in the two regression
equations:
$$\gamma_0 = \beta_0 + \beta_1$$
$$\gamma_0 + \gamma_1 = \beta_0$$
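Given the coefficient estimates $\hat\beta_0$ and $\hat\beta_1$, we have
$$\hat\gamma_0 = \hat\beta_0 + \hat\beta_1 = 10.73 + 1.78 = 12.51$$
$$\hat\gamma_1 = \hat\beta_0 - \hat\gamma_0 = 10.73 - 12.51 = -1.78$$
so the estimated regression is
$$\widehat{Wage} = 12.51 - 1.78\,Female, \quad R^2 = 0.09, \quad SER = 3.8$$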
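The link between the two dummy-variable regressions can be illustrated with a small numerical sketch. The wage data and the helper name `ols` below are hypothetical and not from the problem set. The point is only that regressing on Female instead of Male moves the intercept to the men's mean and flips the sign of the slope.

```python
# Sketch: regressing wages on a Male dummy versus a Female dummy.
# The six wage observations are made up for illustration.

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

male = [1, 1, 1, 0, 0, 0]                     # 1 for men, 0 for women
wage = [12.0, 13.5, 12.0, 10.0, 11.5, 10.5]   # hypothetical hourly wages
female = [1 - m for m in male]                # 1 for women, 0 for men

beta0, beta1 = ols(male, wage)      # Wage on Male
gamma0, gamma1 = ols(female, wage)  # Wage on Female

print("beta0 + beta1 =", beta0 + beta1, " gamma0 =", gamma0)
print("-beta1 =", -beta1, " gamma1 =", gamma1)
```

Because the fitted values are the two group means in either specification, the residuals are identical across the two regressions.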
Question 4
(a) The estimated regression is
= 512.7 + 707.7×Height
(3379.9) (50.4)
The 95% confidence interval for the slope coefficient is 707.7 ± 1.96×50.4, or
$608.9 \le \beta_1 \le 806.5$. This interval does not include $\beta_1 = 0$, so the estimated slope is
significantly different from 0 at the 5% level. Alternatively, the t-statistic is 707.7/50.4 ≈
14.0, which is greater in absolute value than the 5% critical value of 1.96. Finally,
the p-value for the t-statistic is approximately 0.000, which is smaller than 0.05.
(b) For women, the estimated regression is
= 12650 + 511.2×Height
(6299) (97.6)
The 95% confidence interval for the slope coefficient is 511.2 ± 1.96×97.6, or
$319.9 \le \beta_{1,Female} \le 702.5$. This interval does not include $\beta_{1,Female} = 0$, so the estimated
slope is significantly different from 0 at the 5% level.
(c) For men, the estimated regression is
= -43130 + 1306.9×Height
(6925) (98.9)
The 95% confidence interval for the slope coefficient is 1306.9 ± 1.96×98.9, or
$1113.1 \le \beta_{1,Male} \le 1500.6$. This interval does not include $\beta_{1,Male} = 0$, so the estimated
slope is significantly different from 0 at the 5% level.
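(d) The estimate of $\beta_{1,Male} - \beta_{1,Female}$ is $\hat\beta_{1,Male} - \hat\beta_{1,Female}$, and its standard error is

$$SE(\hat\beta_{1,Male} - \hat\beta_{1,Female}) = \sqrt{var(\hat\beta_{1,Male}) + var(\hat\beta_{1,Female})} = \sqrt{SE(\hat\beta_{1,Male})^2 + SE(\hat\beta_{1,Female})^2},$$

which treats the two estimates as uncorrelated, since they are computed from separate samples of men and women. Using
the estimated regressions in (b) and (c): $\hat\beta_{1,Male} - \hat\beta_{1,Female} = 1306.9 - 511.2 = 795.7$, and

$$SE(\hat\beta_{1,Male} - \hat\beta_{1,Female}) = \sqrt{98.9^2 + 97.6^2} = 138.9.$$

The 95% confidence interval for $\beta_{1,Male} - \beta_{1,Female}$ is 795.7 ± 1.96 × 138.9, or
$523.5 \le \beta_{1,Male} - \beta_{1,Female} \le 1{,}067.8$. This interval does not include $\beta_{1,Male} - \beta_{1,Female} = 0$, so
the estimated difference in the slopes is significantly different from 0 at the 5% level.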
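The calculation in (d) can be reproduced with a few lines of code. This is a sketch under the same assumption used above, namely that the male and female slope estimates are uncorrelated. It starts from the rounded coefficients and standard errors reported in (b) and (c), so the interval endpoints may differ in the last digit from those in the text. Variable names are illustrative.

```python
import math

# Slope estimates and standard errors as reported in parts (b) and (c)
b_female, se_female = 511.2, 97.6
b_male, se_male = 1306.9, 98.9

# Difference in slopes and its standard error, assuming the two
# estimates are uncorrelated (separate samples of men and women)
diff = b_male - b_female
se_diff = math.sqrt(se_male ** 2 + se_female ** 2)

# 95% confidence interval and t-statistic for the difference
lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff
t_stat = diff / se_diff

print(f"difference = {diff:.1f}, SE = {se_diff:.1f}, t = {t_stat:.2f}")
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
print("significant at 5% level:", abs(t_stat) > 1.96)
```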