Problem Set 3 - SOLUTIONS

Question 1
Suppose that a researcher, using data on class size (CS) and average test
scores from 50 third-grade classes, estimates the OLS regression:

$$\widehat{TestScore} = 640.3 - 4.93 \times CS, \qquad R^2 = 0.11, \qquad SER = 8.7$$

(a) A classroom has 25 students. What is the regression's prediction for that classroom's average test score?
The predicted test score is
$$\widehat{TestScore} = 640.3 - 4.93 \times 25 = 517.05$$
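
The prediction in (a) is the estimated line evaluated at CS = 25. A minimal Python sketch using the rounded coefficients reported above (the function name `predict_test_score` is ours, for illustration only):

```python
# Rounded OLS coefficients reported in Question 1
BETA0 = 640.3   # intercept
BETA1 = -4.93   # slope: change in predicted score per additional student

def predict_test_score(cs: float) -> float:
    """Predicted average test score for a class with cs students."""
    return BETA0 + BETA1 * cs

score_25 = predict_test_score(25)
print(f"{score_25:.2f}")  # 517.05
```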

(b) Last year a classroom had 24 students, and this year it has 21
students. What is the regression’s prediction for the change in
the classroom’s average test score?

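The intercept cancels, so with ∆CS = 21 − 24 = −3:
$$\Delta\widehat{TestScore} = (-4.93 \times 21) - (-4.93 \times 24) = -4.93 \times \Delta CS = -4.93 \times (-3) = 14.79$$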
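
Because the intercept drops out of a predicted change, only the slope and the change in class size matter. A small sketch of that calculation (the helper name `predicted_change` is hypothetical):

```python
BETA1 = -4.93  # rounded slope on class size from Question 1

def predicted_change(cs_before: float, cs_after: float) -> float:
    """Predicted change in average test score when class size goes
    from cs_before to cs_after. The intercept cancels, so only the
    slope and the change in class size matter."""
    delta_cs = cs_after - cs_before
    return BETA1 * delta_cs

change = predicted_change(24, 21)
print(f"{change:.2f}")  # 14.79
```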

(c) The sample average class size across the 50 classrooms is 22.8. What is the sample average of the test scores across the 50 classrooms? (Hint: review the formulas for the OLS estimators.)
Using the formula for β̂₀, we know that:
$$\hat{\beta}_0 = \overline{TestScore} - \hat{\beta}_1 \times \overline{CS}$$
Solving for the sample average of test scores, we get
$$\overline{TestScore} = \hat{\beta}_0 + \hat{\beta}_1 \times \overline{CS} = 640.3 - 4.93 \times 22.8 = 527.90$$
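
The intercept formula implies that the OLS line passes through the point of sample means, so plugging the average class size into the fitted line returns the average test score. A minimal sketch with the rounded coefficients (constant names are ours):

```python
BETA0 = 640.3       # rounded intercept from Question 1
BETA1 = -4.93       # rounded slope from Question 1
MEAN_CS = 22.8      # sample average class size

# beta0_hat = mean(TestScore) - beta1_hat * mean(CS), so the fitted line
# passes through the point of sample means; rearranging gives mean(TestScore).
mean_test_score = BETA0 + BETA1 * MEAN_CS
print(f"{mean_test_score:.2f}")  # 527.90
```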
(d) What is the sample standard deviation of test scores across the 50 classrooms? (Hint: review the formulas for the R² and SER.)
The sample variance of Y is
$$s_Y^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}.$$
We need to find ∑(Yᵢ − Ȳ)² = TSS, also called the Total Sum of Squares. TSS appears in the formula R² = ESS/TSS. Rewrite R² as
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - SSR}{TSS} = 1 - \frac{SSR}{TSS}.$$
Since R² is known, to find TSS we now need SSR. We use the formula for the Standard Error of the Regression (SER) to get the sum of squared residuals, with n − 2 = 48:
$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2} \quad\Rightarrow\quad SSR = \sum_{i=1}^{n} \hat{u}_i^2 = (8.7)^2 \times 48 = 3633.12$$
Now rewrite R² as
$$R^2 = 1 - \frac{SSR}{TSS} \quad\Rightarrow\quad TSS = \frac{SSR}{1 - R^2} = \frac{3633.12}{1 - 0.11} \approx 4082.16$$
from which it follows that the sample variance of Y is
$$s_Y^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \frac{4082.16}{49} \approx 83.31$$
and the standard deviation is
$$s_Y = \sqrt{83.31} \approx 9.127.$$
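
The chain SER → SSR → TSS → s_Y in (d) can be traced step by step. A minimal sketch using only the reported n, R², and SER (variable names are ours):

```python
import math

n = 50        # number of classrooms
r2 = 0.11     # reported R^2
ser = 8.7     # reported standard error of the regression

ssr = ser**2 * (n - 2)        # SER^2 = SSR / (n - 2)
tss = ssr / (1 - r2)          # R^2 = 1 - SSR / TSS
var_y = tss / (n - 1)         # sample variance of Y
sd_y = math.sqrt(var_y)       # sample standard deviation of Y

print(f"SSR={ssr:.2f} TSS={tss:.2f} s_Y^2={var_y:.3f} s_Y={sd_y:.3f}")
# SSR=3633.12 TSS=4082.16 s_Y^2=83.309 s_Y=9.127
```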

Question 2
This question explores the relationship between earnings and height using the data file Earnings and Height, which contains data on earnings, height, and other characteristics of a random sample of U.S. workers. A detailed description is given in Earnings and Height Description.¹ Both files are on the course's website.

¹ These data were used by Professors Anne Case (Princeton University) and Christina Paxson (Brown University) in their paper "Stature and Status: Height, Ability, and Labor Market Outcomes," Journal of Political Economy, 2008, 116(3): 499–532.

(a) The scatterplot below plots annual earnings (Earnings on the Y-axis) against height (Height on the X-axis). Notice that the points on the plot fall along horizontal lines. (There are only 23 distinct values of Earnings.) Why? (Hint: Carefully read the detailed data description.)
The data documentation reports that individual earnings were reported in 23 brackets, and a single average value is reported for all earnings in the same bracket. Thus, the dataset contains only 23 distinct values of earnings, and each value appears as a horizontal line of points in the scatterplot.
(b) A regression of Earnings on Height gives the following results:
$$\widehat{Earnings} = -512.7 + 707.7 \times Height, \qquad R^2 = 0.011$$
i. What is the estimated slope?
The estimated slope is 707.7 (dollars per year per inch of height).
ii. Use the estimated regression to predict earnings for a worker
who is 67 inches tall, for a worker who is 70 inches tall, and
for a worker who is 65 inches tall.

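$$\widehat{Earnings}\big|_{Height=67} = -512.7 + 707.7 \times 67 \approx 46{,}903 \text{ (dollars per year)}$$
$$\widehat{Earnings}\big|_{Height=70} = -512.7 + 707.7 \times 70 \approx 49{,}026 \text{ (dollars per year)}$$
$$\widehat{Earnings}\big|_{Height=65} = -512.7 + 707.7 \times 65 \approx 45{,}488 \text{ (dollars per year)}$$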
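
The three predictions in (b)ii come from evaluating the fitted line at each height. A minimal sketch using the rounded coefficients shown above (the function name `predict_earnings` is hypothetical):

```python
BETA0 = -512.7  # rounded intercept, Earnings on Height (inches)
BETA1 = 707.7   # rounded slope, dollars per year per inch

def predict_earnings(height_in: float) -> float:
    """Predicted annual earnings (dollars) for a worker of the given height in inches."""
    return BETA0 + BETA1 * height_in

predictions = {h: predict_earnings(h) for h in (67, 70, 65)}
for h, e in predictions.items():
    print(f"height={h} in: predicted earnings = {e:,.0f} dollars/year")
```

Because these use the rounded coefficients, they can differ by a few dollars from predictions computed with unrounded estimates.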

(c) Suppose height were measured in centimeters instead of inches. Answer the following questions about the Earnings on Height (in cm) regression.
i. What is the estimated slope of the regression?
Recall that 1 cm = 0.394 inches. The estimated regression (if we make units explicit) is
$$\widehat{Earnings}(\$) = -512.7(\$) + 707.7(\$/\text{inch}) \times Height(\text{inches})$$
Note that
$$707.7(\$/\text{inch}) \times Height(\text{inches}) = 707.7(\$/\text{inch}) \times (0.394\ \text{inch/cm}) \times Height(\text{cm})$$
So the new slope is 707.7 × 0.394 = 278.8 ($/cm).
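
The unit change in (c)i amounts to multiplying the slope by the number of inches per centimeter. The sketch below uses the rounded factor 0.394 from the solution and, for comparison, the exact factor 1/2.54 (1 inch is exactly 2.54 cm); constant names are ours:

```python
SLOPE_PER_INCH = 707.7        # dollars per year per inch
INCH_PER_CM_ROUNDED = 0.394   # conversion factor used in the solution
INCH_PER_CM_EXACT = 1 / 2.54  # 1 inch is exactly 2.54 cm

# Height(inches) = Height(cm) * (inches per cm), so the slope per cm is
# the slope per inch multiplied by inches per cm.
slope_per_cm = SLOPE_PER_INCH * INCH_PER_CM_ROUNDED
slope_per_cm_exact = SLOPE_PER_INCH * INCH_PER_CM_EXACT
print(f"{slope_per_cm:.1f} {slope_per_cm_exact:.1f}")  # 278.8 278.6
```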


ii. What is the estimated intercept?
The intercept is not affected: it is measured in dollars, not inches, and Height = 0 means the same thing in either unit.
iii. What is the R²?
R² is unit-free, and rescaling Height leaves the fitted values unchanged, so R² does not change.
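iv. What is the standard error of the regression?
Changing the units of Height changes neither the fitted values nor the residuals, so the SER (which is measured in dollars) is the same as in the regression with height in inches. Its numerical value, however, cannot be computed from the information given: the SER of the original regression is not reported, and computing it would require the data (for example, the sample size and the residuals).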
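
The answers to i through iv can be checked numerically. The sketch below fits the same simple regression twice on simulated data, once with height in inches and once in centimeters. The data are hypothetical (generated with Python's `random` module, not the Earnings and Height file), `ols` is a small helper written for this illustration, and it uses the exact conversion 1 inch = 2.54 cm rather than the rounded 0.394.

```python
import math
import random

def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (b0, b1, r2, ser)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in resid)
    tss = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ssr / tss
    ser = math.sqrt(ssr / (n - 2))
    return b0, b1, r2, ser

# Hypothetical data (NOT the Earnings and Height file)
rng = random.Random(0)
height_in = [rng.gauss(67, 4) for _ in range(500)]
earnings = [-512.7 + 707.7 * h + rng.gauss(0, 25000) for h in height_in]
height_cm = [2.54 * h for h in height_in]  # exact conversion: 1 inch = 2.54 cm

fit_in = ols(height_in, earnings)
fit_cm = ols(height_cm, earnings)
for name, a, b in zip(["intercept", "slope", "R2", "SER"], fit_in, fit_cm):
    print(f"{name:9s} inches={a:14.4f} cm={b:14.4f}")
```

The intercept, R², and SER agree up to floating-point error, while the slope per centimeter equals the slope per inch divided by 2.54.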
(d) A regression of Earnings on Height, using data for female workers only, gives the following results:
$$\widehat{Earnings} = 12560 + 511.2 \times Height, \qquad R^2 = 0.003$$
i. What is the estimated slope?
The estimated slope is 511.2 (dollars per year per inch of height).
ii. A randomly selected woman is 1 inch taller than the average
woman in the sample. Would you predict her earnings to be
higher or lower than the average earnings for women in the
sample? By how much?
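A woman who is one inch taller than average is predicted to have earnings that are $511.2 per year higher than average. Because the OLS line passes through the point of sample means, the prediction at one inch above average height exceeds average earnings by exactly the slope.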
(e) A regression of Earnings on Height, using data for male workers only, gives the following results:
$$\widehat{Earnings} = -43130 + 1306.9 \times Height, \qquad R^2 = 0.021$$
i. What is the estimated slope?
The estimated slope is 1306.9 (dollars per year per inch of height).
ii. A randomly selected man is 1 inch taller than the average man in the sample. Would you predict his earnings to be higher or lower than the average earnings for men in the sample? By how much?
A man who is one inch taller than average is predicted to
have earnings that are $1306.9 per year higher than average.
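
The answers to (d)ii and (e)ii rely on the fact that an OLS line with an intercept passes through the point of sample means, so the prediction one inch above average height minus average earnings equals the slope. A minimal sketch with the rounded coefficients from (d) and (e); the mean heights passed in are hypothetical placeholders (the solutions do not report them), and the function name is ours.

```python
# Rounded coefficients from parts (d) and (e); slopes in dollars per year per inch
SLOPES = {"women": 511.2, "men": 1306.9}
INTERCEPTS = {"women": 12560.0, "men": -43130.0}

def predicted_gap_one_inch(group: str, mean_height: float) -> float:
    """Predicted earnings one inch above the mean height minus predicted
    earnings at the mean height (which, for OLS with an intercept, equals
    the sample mean of earnings)."""
    b0, b1 = INTERCEPTS[group], SLOPES[group]
    return (b0 + b1 * (mean_height + 1)) - (b0 + b1 * mean_height)

# Hypothetical mean heights; the gap does not depend on them.
gaps = {
    "women": predicted_gap_one_inch("women", 64.0),
    "men": predicted_gap_one_inch("men", 70.0),
}
print(gaps)
```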
(f) Do you think that height is uncorrelated with other factors that cause earnings? That is, do you think that the regression error term, say u, has a conditional mean of zero, given Height (X)?
Height may be correlated with other factors that cause earnings. For example, height may be correlated with "strength," and in some occupations, stronger workers may be more productive. If so, the error term u contains factors that are correlated with Height, so E(u | Height) ≠ 0 and the OLS slope partly captures the effect of those omitted factors.
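
A small simulation can make the argument in (f) concrete. The sketch below uses a made-up data-generating process (all numbers are hypothetical, not estimates from the Earnings and Height data) in which height has no direct effect on earnings but is correlated with an omitted "strength" variable that does raise earnings. `ols_slope` is a helper written for this illustration.

```python
import random

def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data-generating process (illustrative numbers, NOT estimates):
# height has NO direct effect on earnings, but it is correlated with
# "strength", which does raise earnings. Strength is left in the error u.
rng = random.Random(42)
n = 20_000
height = [rng.gauss(67, 4) for _ in range(n)]
strength = [0.5 * (h - 67) + rng.gauss(0, 1) for h in height]
earnings = [30_000 + 2_000 * s + rng.gauss(0, 5_000) for s in strength]

# u = 2000 * strength + noise is correlated with height, so E(u | Height) != 0.
slope = ols_slope(height, earnings)
print(f"estimated slope on height: {slope:.1f} dollars per inch (true direct effect: 0)")
```

In this setup the slope comes out near 2,000 × 0.5 = 1,000 dollars per inch even though the direct effect of height is zero, because the omitted strength variable sits in u and moves with height.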
