Final AK (Spring 2024)

Department of Ag and Resource Economics UC Davis

Name:

ARE 106 Final Exam


Spring 2024

You have 120 minutes to complete this exam. You may not use any help apart from a
calculator and the formulas on the last page of the exam. You should have 17 pages of
questions and one page of tables. Please answer these questions in the space provided.
You can use the back of the sheet if you run out of room.

Total Available Points: 100

Question A (15 points; 3 points per problem)

Circle the best answer

1. The first step in doing an econometric study is:


a) Define the purpose of your study
b) Load the data into your econometric software
c) Tweet about it
d) Collect your data

2. The key assumption for OLS to be unbiased is


a) CR3: Uncorrelated errors
b) CR2: Homoskedasticity
c) CR4: Normally distributed errors
d) CR1: Representative sample

3. Model: Yi = b0 + b1X1i + b2X2i + ei

Which of the following is the correct interpretation of b1?

a) If you change the value of X1 by one unit and do not change X2, then we predict Y
will increase by b1 units
b) If observations A and B have the same value of X2, and observation A has a one-
unit larger value of X1 than observation B, then we predict that Y is b1 units
higher for observation A than observation B
c) If you change the value of X1 by one unit, then we predict Y will increase by b1
units
d) If observation A has a one unit larger value of X1 than observation B, then we
predict that Y is b1 units higher for observation A than observation B

4. When we use White’s standard error formula in place of the usual standard error
formula,
a) the coefficient estimates do not change and the standard error estimates do
change
b) the coefficient estimates change and the standard error estimates do not change
c) the coefficient estimates and the standard error estimates both change
d) the coefficient estimates and the standard error estimates both stay the same

5. Causation in a regression context means


a) if you tell me the value of X, then I can make a prediction of Y in a different
population
b) if you change the value of X, then Y would not change
c) if you change the value of X, then I can make a prediction of how Y would change
d) if you tell me the value of X, then I can make a prediction of Y in that population

Question B (20 points)

1. Consider the linear regression model: Yi = β0 + β1Xi + εi, cov(Xi, εi) = 0.
Is this a population model or a sample model? How do you know? (5 points)

Population, because of the Greek letters for the betas and the covariance condition.

Deduct zero points if they do not mention the covariance condition.

2. State CR 4 mathematically and explain in words what it means and why we usually
do not need it. (5 points)

εi has a normal distribution. We usually do not need this assumption because, by the Central Limit Theorem, the sampling distribution of the OLS estimators is approximately normal in large samples regardless of the error distribution.

Students could write εi ~ N(0, σ2), but we rarely used this notation in class, so
writing it this way is not required.

3. Suppose you are working on an econometric model that satisfies all the assumptions.
Then, due to a computer glitch, you lose all the observations for which Xi < 100. Will
ordinary least squares still provide the best linear unbiased estimator of β1 if applied
to the remaining data? Explain. (5 points)

Typically, yes. Selecting the sample based on the X variable does not bias the
regression slope because it does not induce a correlation between the X variable and
the error. See the example below:

[Figure 10.5. Subpopulation with FLE < 50%. Scatter plot of API Score (500 to 1000) against FLE (%) (0 to 100). Fitted lines: Low FLE Subpopulation: y = -1.81x + 925.7; Population: y = -1.80x + 925.3.]

It is possible to argue both sides. Accept sensible arguments. For example, if the
relationship between Y and X is nonlinear, then the slope for the observations with X <
100 may differ from the overall slope.

4. Which of the five assumptions are required to justify using the usual standard error
formula when doing hypothesis tests? (5 points)

CR1, CR2, CR3

Question C (40 points)

You have annual data from 1952 through 2019 on global GDP per capita and global
average temperature.

Consider the three regression models


Model 1: log_gdpt = β0 + β1tempt + εt
Model 2: log_gdpt = β0 + β1tempt + β2yeart + εt
Model 3: log_gdpt = β0 + β1tempt + β2yeart + β3log_gdpt-1 + β4tempt-1 + εt

Here are the variable definitions:


1. log_gdp log of real global gross domestic product per capita
2. temp average temperature on earth (degrees Celsius)
3. year year (ranges from 1952 through 2019)

Here is your R output:

> reg1 <- lm(data=df,formula=log_gdp~Temp)


> summary(reg1)

Call:
lm(formula = log_gdp ~ Temp, data = df)

Residuals:
Min 1Q Median 3Q Max
-0.48909 -0.08811 0.00353 0.12535 0.37544

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.76803 0.02864 306.18 <2e-16 ***
Temp 0.71885 0.04513 15.93 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1758 on 66 degrees of freedom


Multiple R-squared: 0.7936, Adjusted R-squared: 0.7904
F-statistic: 253.7 on 1 and 66 DF, p-value: < 2.2e-16

> coeftest(reg1,vcov = NeweyWest(reg1,lag=3,prewhite=F))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 8.768029 0.054521 160.818 < 2.2e-16 ***
Temp 0.718855 0.089693 8.0146 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> reg2 <- lm(data=df,formula=log_gdp~Temp+Year)


> summary(reg2)

Call:
lm(formula = log_gdp ~ Year + Temp, data = df)

Residuals:
Min 1Q Median 3Q Max
-0.06008 -0.02846 -0.00083 0.01878 0.09285

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -31.46816 0.95774 -32.857 <2e-16 ***
Temp -0.04986 0.02021 -2.468 0.0162 *
Year 20.42904 0.48626 42.012 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03339 on 65 degrees of freedom


Multiple R-squared: 0.9927, Adjusted R-squared: 0.9924
F-statistic: 4400 on 2 and 65 DF, p-value: < 2.2e-16

> coeftest(reg2,vcov = NeweyWest(reg2,lag=3,prewhite=F))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) -31.468159 1.103769 -28.5097 < 2e-16 ***
Temp -0.049860 0.023274 -2.1423 0.03592 *
Year 20.429036 0.560683 36.4360 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> df <- mutate(df,e2=reg2$residuals)


> reg_bg2 <-lm(data=df,formula=e2~lag(e2)+Year+Temp)
> summary(reg_bg2)

Call:
lm(formula = e2 ~ lag(e2) + Temp + Year, data = df)

Residuals:
Min 1Q Median 3Q Max
-0.046258 -0.009924 0.000421 0.008793 0.049779

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.63886 0.55102 2.974 0.00416 **
lag(e2) 0.86754 0.06959 12.467 < 2e-16 ***
Temp 0.03662 0.01151 3.182 0.00227 **
Year -0.83300 0.27972 -2.978 0.00412 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01817 on 63 degrees of freedom


(1 observation deleted due to missingness)
Multiple R-squared: 0.7117, Adjusted R-squared: 0.6979
F-statistic: 51.83 on 3 and 63 DF, p-value: < 2.2e-16

> bg_r2 <- summary(reg_bg2)$r.squared


> bg2 <- bg_r2*nobs(reg_bg2)
> bg2
[1] 47.68059

> reg3 <- lm(data=df,formula=log_gdp~Year+lag(log_gdp)+Temp+lag(Temp))


> summary(reg3)

Call:
lm(formula = log_gdp ~ Temp + Year + lag(log_gdp) + lag(Temp),
data = df)

Residuals:
Min 1Q Median 3Q Max
-0.034084 -0.007723 0.000021 0.008797 0.039097

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.465444 1.924301 -1.801 0.0766 .
Temp 0.002018 0.009650 0.209 0.8350
Year 2.260280 1.219843 1.853 0.0687 .
lag(log_gdp) 0.889612 0.056329 15.793 <2e-16 ***
lag(Temp) -0.010282 0.009526 -1.079 0.2846
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01468 on 62 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.99860, Adjusted R-squared: 0.9985
F-statistic: 1.091e+04 on 4 and 62 DF, p-value: < 2.2e-16

> coeftest(reg3,vcov = NeweyWest(reg3,lag=3,prewhite=F))

t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.4654437 1.8001434 -1.9251 0.05881 .
Temp 0.0020183 0.0092277 0.2187 0.82758
Year 2.2602800 1.1430097 1.9775 0.05244 .
lag(log_gdp) 0.8896123 0.0536230 16.5901 < 2e-16 ***
lag(Temp) -0.0102824 0.0090954 -1.1305 0.26262
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> df <- mutate(df,e3=c(0,reg3$residuals))


> reg_bg3 <-lm(data=df,formula=e3~lag(e3)+Year+lag(log_gdp)+Temp+lag(Temp))
> summary(reg_bg3)

Call:
lm(formula = e3 ~ lag(e3) + Temp + Year + lag(log_gdp) + lag(Temp),
data = df)

Residuals:
Min 1Q Median 3Q Max
-0.032828 -0.007332 -0.000374 0.006514 0.039849

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.540929 2.085083 -0.739 0.4627
lag(e3) 0.244884 0.139084 1.761 0.0833 .
Temp -0.001491 0.009528 -0.157 0.8761
Year 0.992783 1.325597 0.749 0.4568
lag(log_gdp) -0.047364 0.061585 -0.769 0.4448
lag(Temp) -0.002086 0.009443 -0.221 0.8259
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01444 on 61 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.04836, Adjusted R-squared: -0.02964
F-statistic: 0.62 on 5 and 61 DF, p-value: 0.685

> bg_r2 <- summary(reg_bg3)$r.squared


> bg3 <- bg_r2*nobs(reg_bg3)
> bg3
[1] 3.240266

> reg4 <- lm(data=df,formula=log_gdp~Year+lag(log_gdp))


> summary(reg4)

Call:
lm(formula = log_gdp ~ Year + lag(log_gdp), data = df)

Residuals:
Min 1Q Median 3Q Max
-0.035710 -0.006691 -0.000086 0.009070 0.041218

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.71053 1.53316 -1.768 0.0818 .
Year 1.81517 1.00719 1.802 0.0762 .
lag(log_gdp) 0.90349 0.05182 17.434 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01458 on 64 degrees of freedom


(1 observation deleted due to missingness)
Multiple R-squared: 0.99855, Adjusted R-squared: 0.9985
F-statistic: 2.21e+04 on 2 and 64 DF, p-value: < 2.2e-16

> coeftest(reg4,vcov = NeweyWest(reg4,lag=3,prewhite=F))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) -2.710534 1.573182 -1.7230 0.08972 .
Year 1.815171 1.034226 1.7551 0.08403 .
lag(log_gdp) 0.903485 0.053383 16.9245 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

1. Interpret your estimate of β1 in Model 1. (5 points)

If the temperature were one degree higher in one year than another year, we
predict that log GDP would be β1 units higher.

Alternative: If the temperature were one degree higher in one year than another
year, we predict that GDP would be approximately 100β1% higher. Deduct two points
if students say β1% rather than 100β1%
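The two interpretations above can be checked numerically. As a side note (not part of the required answer), the 100β1% figure is the usual approximation; the exact percent change implied by a log difference of β1 is 100(e^β1 - 1)%, which differs noticeably here because β1 is large:

```python
import math

# Model 1 estimate from the R output above: a one-degree temperature
# difference predicts a 0.71885 higher log GDP.
beta1 = 0.71885

# Rule-of-thumb percent change: 100 * beta1
approx_pct = 100 * beta1                    # about 72%

# Exact percent change implied by the log difference
exact_pct = 100 * (math.exp(beta1) - 1)     # about 105%

print(round(approx_pct, 1), round(exact_pct, 1))
```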

2. Does your estimate of β1 in Model 1 imply that higher temperatures cause GDP to
increase? Explain. (5 points)

No. It implies temperature and GDP are positively correlated, but there are many
other variables not in the model that also predict higher GDP.

3. Test the null hypothesis that β1 = 1 in Model 1 (5 points)

Use the Newey West corrected standard errors because we may have correlated
errors.

H0: β1=1 vs HA: β1≠1


t statistic: t = (0.718855 - 1)/0.089693 = -3.13

The 5% two-tail critical value is ±1.96. We reject the null hypothesis because
|-3.13| > 1.96.
Deduct 0.5 if they do not state the null hypothesis. Deduct 0.5 if they do not state
significance level. Deduct 0.5 if they do not state correct critical value. Deduct 1 if
they do not state the conclusion of the test as reject the null.
Deduct 1 if they do not correctly justify their choice of standard error.
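The hand computation above can be verified with a short script (estimate and Newey-West standard error taken from the coeftest output for reg1):

```python
# Test H0: beta1 = 1 in Model 1 using the Newey-West standard error.
b1, se, b1_null = 0.718855, 0.089693, 1.0

t = (b1 - b1_null) / se
crit = 1.96  # 5% two-tailed critical value (large df)

print(round(t, 2), abs(t) > crit)  # reject if |t| exceeds the critical value
```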

4. Based on the plot, do you expect there to be correlated errors in Model 2? Explain.
(5 points)

Yes, because positive errors tend to follow positive errors and negative errors tend
to follow negative errors. Put another way, the errors tend to run in multiple
consecutive observations with the same sign.

5. Conduct a Breusch-Godfrey test for correlated errors in Model 2. (5 points)

Using the regression named reg_bg2

H0: cov(εt, εt-s)=0, H1: cov(εt, εt-s)≠0



Our test statistic is (N-1)*R2 = 67*0.7117 = 47.68 > 3.84 (chi-squared, df=1 at alpha=5%)

We reject the null of uncorrelated errors at 5% significance.

Deduct 0.5 if they do not state significance level. Deduct 0.5 if they do not state correct
critical value. Deduct 1 if they do not state the conclusion of the test as reject the null.
Deduct 0.5 if they compute the BG stat incorrectly, but otherwise do the test correctly.
Deduct 0 if they correctly use the test statistic 47.68 (which is printed in the code output)
without stating the formula (N-1)*R2.
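The test statistic computed above is just the usable sample size times the auxiliary R-squared:

```python
# Breusch-Godfrey statistic for Model 2: (usable obs) * R^2 of the
# auxiliary regression reg_bg2, compared with the chi-squared
# critical value with 1 degree of freedom at the 5% level.
n_used = 67          # 68 years minus 1 observation lost to the lag
r2_aux = 0.7117      # R-squared from reg_bg2

bg = n_used * r2_aux
crit = 3.84

print(round(bg, 2), bg > crit)  # statistic exceeds the critical value, so reject
```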

6. Compute a 95% confidence interval for β1 in Model 2. (5 points)

The Breusch-Godfrey test in the previous question found correlated errors in Model 2.

Therefore, I will use the Newey-West corrected standard errors.

CI = -0.049860 ± 1.96*0.023274 = [-0.095, -0.004]

Deduct 0.5 if use incorrect critical value. Deduct 1 if they use incorrect standard
error. Deduct 0.5 if they do not justify their choice of standard error, which they can
do with BG test or referring to C4. Significance level is stated in question so not
required in answer.

7. Test the null hypothesis that β1=0 and β4=0 in Model 3. (5 points)

Use a Wald test.


H0: β1=0 and β4=0 vs HA: not H0
W = (R2alt - R2null) / [(1 - R2alt)/(N - K - 1)] = (0.99860 - 0.99855) / [(1 - 0.99860)/62] = 2.21

Compare to χ2 with 2 degrees of freedom. The number of degrees of freedom is the
number of = signs in the null hypothesis.

5% critical value is 5.99. We cannot reject the null hypothesis; the data are
consistent with both coefficients being zero.

Deduct 2 if they compute Wald statistic incorrectly. Deduct 1 if they do not state
significance level. Deduct 1 if they do not state correct critical value. Deduct 1 if they
do not state the conclusion of the test as cannot reject the null.
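The Wald statistic above comes from comparing the R-squared of Model 3 (the alternative, reg3) with that of the restricted regression dropping Temp and lag(Temp) (reg4 in the output):

```python
# Wald statistic from the R-squared comparison form, with the
# residual degrees of freedom of the unrestricted model (reg3).
r2_alt, r2_null = 0.99860, 0.99855
n_minus_k_minus_1 = 62

w = (r2_alt - r2_null) / ((1 - r2_alt) / n_minus_k_minus_1)
crit = 5.99  # chi-squared, 2 df, 5% level

print(round(w, 2), w > crit)  # statistic is below the critical value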

8. What do you conclude about the relationship between global GDP and
temperature? (5 points)

The distributed lag model (Model 3) shows that there is no statistically significant
relationship between GDP and temperature (Wald test in question 7).

Using "trick one," the long-run coefficient on temperature is -0.05 (which is similar to
the estimate in Model 2). This estimate suggests a 5% GDP loss per degree of
warming, but the lack of statistical significance means we cannot rule out no effect.

Model 2 shows a statistically significant negative relationship of 4.9% of GDP loss per
degree of warming, although the relationship is not strongly significant. From the
answer to question 6, the upper bound of the confidence interval is -0.004 (i.e., 0.4%
of GDP loss per degree of warming).

There are also potential omitted variables that are related to temperature and affect
GDP. In that case, the coefficients in the above models are biased for the causal
effect of temperature on GDP.

It is not necessary to include all of the above arguments, but they must answer based
on the regression results. Give points for a sensible answer.

Question D (25 points)

You are interested in the relationship between education and wages. You have four
variables:
1. age age in years
2. female =1 if female and 0 otherwise
3. educ years of schooling
4. earnings weekly earnings

Consider the three regression models


Model 1: log(earnings)i = β0 + β1educi + εi
Model 2: log(earnings)i = β0 + β1educi + β2agei + β3agei2 + εi
Model 3: log(earnings)i = β0 + β1educi + β2agei + β3agei2 + β4femalei + β5(femalei × educi) + εi

Here is your R code:


> cps <- mutate(cps,log_earnings=log(earnings))
> cps <- mutate(cps,age2=age^2)
> reg1<-lm(formula=log_earnings~education, data=cps)
> summary(reg1)

Call:
lm(formula = log_earnings ~ education, data = cps)

Residuals:
Min 1Q Median 3Q Max
-6.7575 -0.3368 0.0081 0.3557 2.4706

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.941065 0.123349 72.49 0.0002 ***
educ 0.121765 0.008634 14.10 0.0002 ***
---
Residual standard error: 0.6579 on 823 degrees of freedom
Multiple R-squared: 0.1946, Adjusted R-squared: 0.1937
F-statistic: 198.9 on 1 and 823 DF, p-value: < 0.00000000000000022

> coeftest(reg1,vcov = vcovHC(reg1, type = "HC0"))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 8.9410646 0.1374916 65.030 0.0000 ***
educ 0.1217645 0.0094372 12.903 0.0000 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> # Add residuals to data frame


> cps$e2 <- (reg1$residuals)^2

> reg_bp <-lm(data=cps,formula=e2~education)


> summary(reg_bp)

Call:
lm(formula = e2 ~ education, data = cps)

Residuals:
Min 1Q Median 3Q Max
-0.627 -0.402 -0.306 -0.066 45.134

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77319 0.34406 2.247 0.0249 *
educ -0.02432 0.02408 -1.010 0.3129
---
Residual standard error: 1.835 on 823 degrees of freedom
Multiple R-squared: 0.001238, Adjusted R-squared: 2.408e-05
F-statistic: 1.02 on 1 and 823 DF, p-value: 0.3129

> reg2<-lm(formula=log_earnings~education+age+age2, data=cps)


> summary(reg2)

Call:
lm(formula = log_earnings ~ education + age + age2, data = cps)

Residuals:
Min 1Q Median 3Q Max
-6.8375 -0.3363 0.0153 0.3292 2.4681

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.5960988 0.2908583 26.116 < 0.0000000000000002 ***
educ 0.1150854 0.0085494 13.461 < 0.0000000000000002 ***
age 0.0633995 0.0136130 4.657 0.00000374 ***
age2 -0.0006460 0.0001577 -4.096 0.00004613 ***

---
Residual standard error: 0.6455 on 821 degrees of freedom
Multiple R-squared: 0.2266, Adjusted R-squared: 0.2238
F-statistic: 80.2 on 3 and 821 DF, p-value: < 0.00000000000000022

> reg3<-lm(formula=log_earnings~education+female + female*education + age+age2,


data=cps)
> summary(reg3)

Call:
lm(formula = log_earnings ~ education + female + female * education +
age + age2, data = cps)

Residuals:
Min 1Q Median 3Q Max
-7.0330 -0.2896 -0.0005 0.3284 2.2855

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.6399735 0.2859595 26.717 0.0000 ***
educ 0.1068435 0.0105056 10.170 0.0000 ***
age 0.0743070 0.0131279 5.660 0.0000 ***
age2 -0.0007637 0.0001519 -5.028 0.0000 ***
female -0.7917855 0.2414095 -3.280 0.0010 **
education:female 0.0296351 0.0168202 1.762 0.0784 .
---
Residual standard error: 0.618 on 819 degrees of freedom
Multiple R-squared: 0.2928, Adjusted R-squared: 0.2885
F-statistic: 67.82 on 5 and 819 DF, p-value: < 0.00000000000000022

1. Suppose hourly wages are determined by the model: hrwagei = A × educi^β × e^(εi). How
could you estimate this model by OLS? How would you interpret the coefficient β? (5
points)

Take logs of both sides. Then the model becomes

log(hrwagei) = β0 + βlog(educi) + εi, where β0 = log(A),

which can be estimated by OLS.
β is the elasticity of wages with respect to education. (Alternate answer: if a person
has 1% more education than another person, then we predict they have β% higher
wages.)
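A quick numeric sketch confirms that the log transformation linearizes this model. The values A = 3 and β = 0.8 are made up for illustration (they are not estimates from the exam data); with no error term, the least-squares slope of log wages on log education recovers β exactly:

```python
import math

# Hypothetical parameters for the multiplicative model hrwage = A * educ^beta
A, beta = 3.0, 0.8
educ = [8, 10, 12, 14, 16, 18]
hrwage = [A * e ** beta for e in educ]

# Regress log(hrwage) on log(educ) by the simple OLS slope formula
x = [math.log(e) for e in educ]
y = [math.log(w) for w in hrwage]
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)

print(round(slope, 6))  # recovers beta = 0.8
```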

2. Conduct a Breusch-Pagan test for heteroskedasticity in Model 1. (5 points)

Using the regression reg_bp in the output.

H0: var(εi)=constant, H1: var(εi) is not a constant and varies with education

Our test statistic is N*R2 = 825*0.001238 = 1.02 < 3.84 (chi-squared, df=1 at
alpha=5%)

We cannot reject the null of homoskedasticity at 5% significance.

Deduct 0.5 if they do not state significance level. Deduct 0.5 if they do not state correct
critical value. Deduct 1 if they do not state the conclusion of the test as cannot reject
the null. Deduct 0.5 if they compute the BP stat incorrectly, but otherwise do the test
correctly.

3. At what age does predicted earnings peak, holding education constant? (5 points)

From Model 2, the concave age profile is 0.0634*age - 0.000646*age2. Taking the
derivative with respect to age and setting it to zero gives age = 0.0634/(2 × 0.000646),
so predicted earnings peak at age 49.1.
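The first-order condition above can be checked directly with the coefficients from reg2:

```python
# Peak of the quadratic age profile in Model 2:
# d/dage (b_age*age + b_age2*age^2) = b_age + 2*b_age2*age = 0
# => age* = -b_age / (2 * b_age2)
b_age, b_age2 = 0.0633995, -0.0006460

age_star = -b_age / (2 * b_age2)
print(round(age_star, 1))  # earnings peak near age 49
```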

4. For a given age, test the hypothesis that the marginal effect of education on wages
varies by gender (female)? (5 points)

In Model 3, test H0: β5=0 vs HA: β5≠0

We have no way to test for heteroskedasticity in this model, but we found no
heteroskedasticity in Model 1, so the regular standard error is likely valid.
t statistic: t = 0.0296351/0.0168202 = 1.76

The 5% two tail critical value is +-1.96. We cannot reject the null hypothesis because
1.76<1.96.
Deduct 0.5 if they do not state the null hypothesis. Deduct 0.5 if they do not state
significance level. Deduct 0.5 if they do not state correct critical value. Deduct 1 if
they do not state the conclusion of the test as cannot reject the null.
Deduct 0.5 if they do not correctly justify their choice of standard error.
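The t statistic for the interaction term can be checked against the reg3 output:

```python
# t test of H0: beta5 = 0 (no gender difference in the education
# slope) in Model 3, using the usual standard errors.
b5, se = 0.0296351, 0.0168202

t = b5 / se
crit = 1.96  # 5% two-tailed critical value (large df)

print(round(t, 2), abs(t) > crit)  # |t| is below the critical value
```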

5. How much does an extra year of education cause earnings to increase? Justify your
answer. (5 points)

Model 2 implies that an extra year of education is associated with approximately
12% higher wages, holding age constant.

However, there are also potential omitted variables that are related to education
and affect wages. In that case, the coefficients in the above models are biased for
the causal effect of education on wages. Model 3 accounts for one possible omitted
variable (gender), but others exist, especially the person’s natural ability to be
successful in the labor market.

It is not necessary to include all of the above arguments, but they must answer based
on the regression results. Give points for a sensible answer.

Student’s t-Distribution

Significance Level
Degrees of Freedom (2-Tailed Test)
0.10 0.05 0.01
1 6.31 12.71 63.66
2 2.92 4.30 9.93
3 2.35 3.18 5.84
4 2.13 2.78 4.60
5 2.02 2.57 4.03
6 1.94 2.45 3.71
7 1.90 2.37 3.50
8 1.86 2.31 3.36
9 1.83 2.26 3.25
10 1.81 2.23 3.17
15 1.75 2.13 2.95
20 1.73 2.09 2.85
25 1.71 2.06 2.79
30 1.70 2.04 2.75
40 1.68 2.02 2.70
50 1.68 2.01 2.68
60 1.67 2.00 2.66
80 1.66 1.99 2.64
100 1.66 1.98 2.63
150 1.66 1.98 2.61
∞(Z) 1.65 1.96 2.58

Chi-Square Distribution

Significance Level
Degrees of Freedom
0.10 0.05 0.01
1 2.71 3.84 6.63
2 4.61 5.99 9.21
3 6.25 7.81 11.34
4 7.78 9.49 13.28
5 9.24 11.07 15.09
