Econometric Methods


II Internal Assessment: April 11, 2019

Econometric Methods
Max Marks: 20 Max Time: 1.5 Hrs

1. Consider the following regressions of average hourly earnings on gender and education and other
characteristics. (12 Marks)

Table 1
Regressor                               (1)        (2)        (3)
College (X1)                            5.46       5.48       5.44
                                       (0.21)     (0.21)     (0.21)
Female (X2)                            −2.64      −2.62      −2.62
                                       (0.20)     (0.20)     (0.20)
Age (X3)                                           0.29       0.29
                                                  (0.04)     (0.04)
North (X4)                                                    0.69
                                                             (0.30)
West (X5)                                                     0.60
                                                             (0.28)
South (X6)                                                   −0.27
                                                             (0.26)
Intercept                              12.69       4.40       3.75
                                       (0.14)     (1.05)     (1.06)

F-statistic for regional effects = 0                          6.10

SER                                     6.27       6.22       6.21
R²                                      0.176      0.190      0.194
N                                       4000       4000       4000
Note: Standard errors in parentheses.

where the dependent variable is average hourly earnings (in 1998 dollars); College = binary variable
(1 if college, 0 if high school); Female = binary variable (1 if female, 0 if male); Age = age
(in years); North = binary variable (1 if Region = North, 0 otherwise); West = binary variable (1
if Region = West, 0 otherwise); South = binary variable (1 if Region = South, 0 otherwise); East =
binary variable (1 if Region = East, 0 otherwise). East is the omitted category in column (3).
a) Using the regression results in column (1):
(i) Is the college-high school earnings difference estimated from this regression statistically
significant at the 5% level? Construct a 95% confidence interval of the difference.
For the two-tailed test
H0: β1 = 0
H1: β1 ≠ 0
The t-statistic is 5.46/0.21 = 26.0 > 1.96 (the 5% critical value). So, the coefficient is
statistically significant at the 5% level. The 95% CI is 5.46 ± 1.96×0.21, i.e. (5.05, 5.87).
(ii) Is the male-female earnings difference estimated from this regression statistically signif-
icant at the 5% level? Construct a 95% confidence interval for the difference.
For the two-tailed test
H0: β2 = 0
H1: β2 ≠ 0
The t-statistic is −2.64/0.20 = −13.2 and |t| = 13.2 > 1.96. So, the coefficient is
statistically significant at the 5% level. The 95% CI is −2.64 ± 1.96×0.20, i.e. (−3.03, −2.25).
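The arithmetic in part (a) can be checked with a few lines of Python. This is only a sketch under the large-sample normal approximation used above; the helper name `t_stat_and_ci` is made up for illustration and is not part of the exam.

```python
def t_stat_and_ci(coef, se, null_value=0.0, z_crit=1.96):
    """Return (t, lower, upper) for a two-sided large-sample test of coef == null_value.

    z_crit = 1.96 is the 5% two-sided critical value of the standard normal.
    """
    t = (coef - null_value) / se
    half_width = z_crit * se
    return t, coef - half_width, coef + half_width


# College coefficient and standard error from column (1) of Table 1.
t, lo, hi = t_stat_and_ci(5.46, 0.21)
print(f"t = {t:.1f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The same call with the Female coefficient and its standard error gives the interval for part (ii).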
b) Using the regression results in column (2):
(i) Is age an important determinant of earnings? Use an appropriate statistical test and/or
confidence interval to explain your answer.
Yes, age is an important determinant of earnings. Using a t-test, the t-statistic is 0.29/0.04 =
7.25, with a p-value of 4.2×10^−13, implying that the coefficient on age is statistically
significant at the 1% level. The 95% CI is 0.29 ± 1.96×0.04, i.e. (0.21, 0.37). The coefficient on
age is interpreted as the expected increase in hourly earnings for one additional year of age,
holding college and gender fixed. The lower and upper limits of the CI are interpreted the same way.
(ii) Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college
graduate. Construct a 95% confidence interval for the expected difference between their
earnings.
Here ΔAge = 5. The CI for the effect on earnings of a 1-year increase in age is
0.29 ± 1.96×0.04. Thus, for a 5-year increase in age it is ΔAge × [0.29 ± 1.96 × 0.04] =
1.45 ± 1.96 × 0.20 = $1.06 to $1.84. Because Sally and Betsy have the same gender and education,
only the age term differs between them.
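The scaling step above, where both the estimate and its standard error are multiplied by the age gap, can be sketched as follows. The function name `scaled_ci` is an illustrative assumption, not something defined in the exam.

```python
def scaled_ci(coef, se, delta, z_crit=1.96):
    """95% CI for delta * beta, given the estimate and SE of beta.

    The standard error of delta * beta_hat is |delta| * SE(beta_hat).
    """
    point = delta * coef
    half_width = z_crit * abs(delta) * se
    return point - half_width, point + half_width


# Age coefficient and SE from column (2); Betsy is 5 years older than Sally.
lo, hi = scaled_ci(0.29, 0.04, delta=5)
print(f"${lo:.2f} to ${hi:.2f}")
```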
c) Using the regression results in column (3):
(i) Do there appear to be important regional differences? Use an appropriate hypothesis
test to explain your answer.
The F-statistic testing the hypothesis that the coefficients on the regional regressors are all
zero is 6.10. The 1% critical value (from the F3,∞ distribution) is 3.78. Because 6.10 > 3.78, the
regional effects are significant at the 1% level.
(ii) Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old
female college graduate from the East. Jennifer is a 28-year-old female college graduate
from the West.
(A) Construct a 95% confidence interval for the difference in expected earnings between
Juanita and Molly.
The expected difference between Juanita and Molly is (X6,Juanita − X6,Molly) × β6 =
β6. Thus, a 95% confidence interval is −0.27 ± 1.96 × 0.26, i.e. (−0.78, 0.24).
(B) Explain how you would construct a 95% confidence interval for the difference in
expected earnings between Juanita and Jennifer. (Hint: What would happen if
you included East and excluded West from the regression?)
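The expected difference between Juanita and Jennifer is (X5,Juanita − X5,Jennifer) ×
β5 + (X6,Juanita − X6,Jennifer) × β6 = −β5 + β6. A 95% confidence interval for −β5 + β6 could
be constructed using the general methods discussed for a single restriction involving two
coefficients. Following the hint, re-estimating the regression with East included and West
excluded makes the coefficient on South equal to β6 − β5, so its reported standard error gives
the interval directly.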
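As a sketch of what the interval for −β5 + β6 involves: Table 1 does not report the covariance between the two regional coefficient estimates, so only the point estimate can be computed from the numbers given.

```latex
\begin{align*}
\hat\beta_6 - \hat\beta_5 &= -0.27 - 0.60 = -0.87 \\
SE(\hat\beta_6 - \hat\beta_5) &= \sqrt{\widehat{\mathrm{Var}}(\hat\beta_5) + \widehat{\mathrm{Var}}(\hat\beta_6)
    - 2\,\widehat{\mathrm{Cov}}(\hat\beta_5, \hat\beta_6)} \\
\text{95\% CI:}\quad & (\hat\beta_6 - \hat\beta_5) \pm 1.96 \times SE(\hat\beta_6 - \hat\beta_5)
\end{align*}
```

Re-estimating with West as the omitted region sidesteps the covariance term, because the software then reports the standard error of the South coefficient, which is the same quantity.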
d) Evaluate the following statement: “In all of the regressions, the coefficient on Female is
negative, large, and statistically significant. This provides strong statistical evidence of gender
discrimination in the labor market.”
In isolation, these results do not imply gender discrimination. Gender discrimination means that
two workers, identical in every way but gender, are paid different wages. Thus, it is also
important to control for characteristics of the workers that may affect their productivity
(education, years of experience, etc.). If these characteristics are systematically different between
men and women, then they may be responsible for the difference in mean wages. (If this were
true, it would raise an interesting and important question of why women tend to have less
education or less experience than men, but that is a question about something other than gender
discrimination.) These are potentially important omitted variables in the regression that will
lead to bias in the OLS coefficient estimator for Female. Since these characteristics were not
controlled for in the statistical analysis, it is premature to reach a conclusion about gender
discrimination.
(1% critical value from F3,∞ distribution = 3.78)
2. Consider the following model (5 Marks)

\[ Y = b_0 + b_1 \chi + u \]

and
\[ X = \chi + \nu \]
with the assumptions $\chi \sim N(\bar\chi, \sigma_\chi^2)$, $u \sim N(0, \sigma_u^2)$, $\nu \sim N(0, \sigma_\nu^2)$,
$E(\chi u) = 0$, $E(\chi\nu) = 0$, $E(u\nu) = 0$. That is, the errors are normally distributed with zero
means and constant variances, and are independent of $\chi$ and of each other. Under these assumptions
show that $\mathrm{Var}(X) = \sigma_\chi^2 + \sigma_\nu^2$ and $\mathrm{Var}(Y) = b_1^2 \sigma_\chi^2 + \sigma_u^2$.

Proof. This is the case of error in the explanatory variable. Here we can treat χ as the true (but
unobservable) value of the explanatory variable. Instead what is observed is X which is related to χ in
the following manner:
X =χ+ν
where ν is the random variable representing the errors of measurement in X. Let us derive formally the
mean and variance of the distributions of X and Y .
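(a) $\mathrm{Var}(X) = \sigma_\chi^2 + \sigma_\nu^2$
Because $X = \chi + \nu$, we have $E(X) = E(\chi + \nu) = E(\chi) + E(\nu) = \bar\chi + 0 = \bar\chi$.
Now $\mathrm{Var}(X) = E[X - E(X)]^2$, and we just saw that $E(X) = \bar\chi$. Hence,

\begin{align*}
\mathrm{Var}(X) &= E[X - E(X)]^2 \\
 &= E(X - \bar\chi)^2 \\
 &= E(\chi + \nu - \bar\chi)^2 \\
 &= E(\chi^2 + \nu^2 + \bar\chi^2 + 2\chi\nu - 2\chi\bar\chi - 2\nu\bar\chi) \\
 &= E(\chi^2 + \nu^2 + \bar\chi^2 - 2\chi\bar\chi) && (\because E(\chi\nu) = 0 \text{ and } E(\nu) = 0) \\
 &= E(\chi^2) + E(\nu^2) + \bar\chi^2 - 2\bar\chi E(\chi) \\
 &= E(\chi^2) + \sigma_\nu^2 - \bar\chi^2 && (\because E(\nu^2) = \sigma_\nu^2 \text{ and } E(\chi) = \bar\chi) \\
 &= E(\chi^2) - [E(\chi)]^2 + \sigma_\nu^2 \\
 &= \mathrm{Var}(\chi) + \sigma_\nu^2 \\
 &= \sigma_\chi^2 + \sigma_\nu^2
\end{align*}
Hence proved.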

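Proof. (b) $\mathrm{Var}(Y) = b_1^2 \sigma_\chi^2 + \sigma_u^2$

The mean of $Y$ is $E(Y) = E(b_0 + b_1\chi + u) = E(b_0) + E(b_1\chi) + E(u) = b_0 + b_1 E(\chi) + 0 = b_0 + b_1\bar\chi$.
We need to show that the variance of $Y$ is given by $\sigma_Y^2 = b_1^2 \sigma_\chi^2 + \sigma_u^2$.
\begin{align*}
\mathrm{Var}(Y) &= E[Y - E(Y)]^2 \\
 &= E[Y - b_0 - b_1\bar\chi]^2 \\
 &= E[b_0 + b_1\chi + u - b_0 - b_1\bar\chi]^2 \\
 &= E[b_1(\chi - \bar\chi) + u]^2 \\
 &= E[b_1^2(\chi - \bar\chi)^2 + u^2 + 2b_1(\chi - \bar\chi)u] \\
 &= b_1^2 E(\chi - \bar\chi)^2 + E(u^2) + 2b_1 E[(\chi - \bar\chi)u] \\
 &= b_1^2 E[\chi - E(\chi)]^2 + \mathrm{Var}(u) && (\because \chi \text{ and } u \text{ are independent, so } E[(\chi - \bar\chi)u] = 0) \\
 &= b_1^2 \mathrm{Var}(\chi) + \mathrm{Var}(u) \\
 &= b_1^2 \sigma_\chi^2 + \sigma_u^2
\end{align*}
Hence proved.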
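Both results can be sanity-checked by simulation. The sketch below draws from the model with arbitrary illustrative parameter values (b0 = 1, b1 = 2, χ̄ = 5, σχ = 1.5, σu = 1, σν = 0.5 are assumptions chosen here, not values from the exam) and compares sample variances with the two formulas.

```python
import random
import statistics


def simulate_variances(b0, b1, chi_mean, sd_chi, sd_u, sd_nu, n=50_000, seed=0):
    """Draw n observations from Y = b0 + b1*chi + u and X = chi + nu.

    Returns the sample variances of X and Y.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        chi = rng.gauss(chi_mean, sd_chi)  # true, unobserved regressor
        nu = rng.gauss(0.0, sd_nu)         # measurement error in X
        u = rng.gauss(0.0, sd_u)           # regression error
        xs.append(chi + nu)
        ys.append(b0 + b1 * chi + u)
    return statistics.variance(xs), statistics.variance(ys)


# Illustrative parameter values (assumed, not from the exam).
var_x, var_y = simulate_variances(b0=1.0, b1=2.0, chi_mean=5.0,
                                  sd_chi=1.5, sd_u=1.0, sd_nu=0.5)
theory_x = 1.5**2 + 0.5**2           # sigma_chi^2 + sigma_nu^2 = 2.5
theory_y = 2.0**2 * 1.5**2 + 1.0**2  # b1^2 * sigma_chi^2 + sigma_u^2 = 10.0
print(f"Var(X): simulated {var_x:.3f}, theoretical {theory_x:.3f}")
print(f"Var(Y): simulated {var_y:.3f}, theoretical {theory_y:.3f}")
```

With tens of thousands of draws the simulated values should sit within a few percent of the theoretical ones. The proof above establishes the exact result.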

3. Consider the following model (3 Marks)


\[
\widehat{\log(price)} = \underset{(0.570)}{0.264} + \underset{(0.151)}{1.043}\,\log(assess)
  + \underset{(0.0386)}{0.0074}\,\log(lotsize) - \underset{(0.1384)}{0.1032}\,\log(sqrft)
  + \underset{(0.0221)}{0.0338}\,bdrms
\]

n = 88, SSR = 1822, and R2 = 0.773


where price = house price, assess = the assessed housing value (before the house was sold),
lotsize = size of the lot in feet, sqrft = square footage, and bdrms = number of bedrooms. Construct
an appropriate test to see if the assessed housing price is a rational valuation. That is, a 1% change
in assess should be associated with a 1% change in price. (F-crit value (4,83) = 2.50).
Ans: There are two possible ways to think about this answer:
a) To test if the included variables jointly significantly determine house prices.

The null hypothesis then becomes

H0 : β1 = 0, β2 = 0, β3 = 0, β4 = 0
H1 : At least one is not zero

We can test this out by constructing an F -test using the R2 value for the given (unrestricted)
model (since R2 value for the restricted model - with the restrictions hypothesized in the H0
- is zero).
\[
F_{4,\,n-4-1} = \frac{R^2/q}{(1 - R^2)/(n - k - 1)} = \frac{0.773/4}{(1 - 0.773)/(88 - 4 - 1)}
  = \frac{0.1933}{0.002735} \approx 70.7 > 2.5
\]
Hence we reject the null hypothesis at 5% level of significance suggesting that the parameters
jointly are important. However, the proposed test here is not relevant from the question’s
standpoint.
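The overall-significance F-statistic in a) depends only on R², n, k and q, so it is easy to recompute. A minimal sketch, with `f_stat_from_r2` as an illustrative helper name and assuming the restricted model has R² = 0 as stated above:

```python
def f_stat_from_r2(r2, n, k, q):
    """F-statistic for H0: all q slope coefficients are zero (restricted R^2 = 0).

    r2 is the unrestricted R^2, n the sample size, k the number of regressors.
    """
    return (r2 / q) / ((1.0 - r2) / (n - k - 1))


f = f_stat_from_r2(r2=0.773, n=88, k=4, q=4)
print(f"F = {f:.2f}, reject at 5%: {f > 2.50}")
```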
b) To test if the assessed housing price is a rational valuation (that is, to see if the price
assessed earlier is a perfect predictor of the price at which the house was actually sold,
i.e., the elasticity is 1). In addition, lotsize, sqrft, and bdrms should not help to explain
log(price) once the assessed value has been controlled for. Then the hypothesis to be tested becomes

H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0
H1: At least one of the restrictions in H0 does not hold
If we impose these restrictions in the original model, we get
\[ \widehat{\log(price)} = \beta_0 + 1 \cdot \log(assess) \]

This can also be written as

\[ \widehat{\log(price)} - \log(assess) = \beta_0 \]

However, this is a different model altogether, with a different dependent variable than the
unrestricted model given to us. To test the joint hypothesis, we would therefore need the SSR of
the restricted model (the R² from the restricted model is of no use because the dependent
variables of the two models are not the same).
Hence, given the limited information provided in the question, the only test that we can
perform is to check whether the elasticity is 1, that is, H0: β1 = 1 against the alternative
H1: β1 ≠ 1 (or a one-sided alternative). The t-statistic is
\[
t = \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} = \frac{1.043 - 1}{0.151} \approx 0.285 < 1.96
\]
Hence, we fail to reject the null hypothesis at the 5% level: the data provide no evidence
against a unit elasticity, which is consistent with the assessment being a rational valuation.
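The two tests discussed in b) can be sketched in Python. The t-test uses only numbers given in the question. The joint F-test needs the restricted SSR, which the question does not provide, so the value `SSR_RESTRICTED` below is purely hypothetical and serves only to show the mechanics. Both function names are illustrative.

```python
def t_stat(estimate, se, null_value):
    """t-statistic for H0: coefficient == null_value."""
    return (estimate - null_value) / se


def f_stat_from_ssr(ssr_r, ssr_ur, q, n, k):
    """Homoskedasticity-only F-statistic from restricted and unrestricted SSRs."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


# Test of unit elasticity, H0: beta_1 = 1, using the reported estimate and SE.
t = t_stat(1.043, 0.151, null_value=1.0)
print(f"t = {t:.3f}, reject at 5%: {abs(t) > 1.96}")

# HYPOTHETICAL restricted SSR: not given in the question, chosen for illustration only.
SSR_RESTRICTED = 1880.0
f = f_stat_from_ssr(SSR_RESTRICTED, ssr_ur=1822.0, q=4, n=88, k=4)
print(f"F = {f:.3f}, reject at 5%: {f > 2.50}")
```

The F value printed here has no bearing on the exam answer. It only shows how the joint test would proceed once the restricted regression of log(price) − log(assess) on a constant has been run.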
