Final AK (Spring 2024)
Name:
You have 120 minutes to complete this exam. You may not use any help apart from a
calculator and the formulas on the last page of the exam. You should have 17 pages of
questions and one page of tables. Please answer these questions in the space provided.
You can use the back of the sheet if you run out of room.
a) If you change the value of X1 by one unit and do not change X2, then we predict Y
will increase by b1 units
b) If observations A and B have the same value of X2, and observation A has a one-
unit larger value of X1 than observation B, then we predict that Y is b1 units
higher for observation A than observation B
c) If you change the value of X1 by one unit, then we predict Y will increase by b1
units
d) If observation A has a one-unit larger value of X1 than observation B, then we
predict that Y is b1 units higher for observation A than observation B
Department of Ag and Resource Economics UC Davis
4. When we use White’s standard error formula in place of the usual standard error
formula,
a) the coefficient estimates do not change and the standard error estimates do
change
b) the coefficient estimates change and the standard error estimates do not change
c) the coefficient estimates and the standard error estimates both change
d) the coefficient estimates and the standard error estimates both stay the same
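The snippet below shows why the coefficient estimates stay the same while the standard errors change. It computes a simple-regression slope with both the usual standard error and White's heteroskedasticity-robust standard error. It uses the original HC0 form; software packages often apply small-sample adjustments on top of it. The toy data are made up for illustration. The slope is computed once and shared by both calculations, so only the variance formula differs.

```python
import math

def simple_ols_with_ses(xs, ys):
    """Simple regression y = b0 + b1*x + e.

    Returns the slope, its usual (homoskedastic) SE, and White's HC0 SE.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    # The slope is the same no matter which standard error formula is used
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]

    # Usual formula: s^2 / Sxx, with s^2 = SSR / (n - 2)
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_usual = math.sqrt(s2 / sxx)

    # White (HC0): sum((x - xbar)^2 * e^2) / Sxx^2
    white_var = sum((x - xbar) ** 2 * e ** 2 for x, e in zip(xs, resid)) / sxx ** 2
    se_white = math.sqrt(white_var)
    return b1, se_usual, se_white

# Hypothetical toy data, for illustration only
b1, se_usual, se_white = simple_ols_with_ses([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"slope = {b1:.2f}, usual SE = {se_usual:.4f}, White SE = {se_white:.4f}")
```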
1. Consider the linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\mathrm{cov}(X_i, \varepsilon_i) = 0$.
Is this a population model or a sample model? How do you know? (5 points)
Population. The betas are written as Greek letters, which denote unknown population parameters, and the covariance condition restricts the unobserved population error ε.
2. State CR 4 mathematically and explain in words what it means and why we usually
do not need it. (5 points)
Students could write $\varepsilon_i \sim N(0, \sigma^2)$, but we rarely used this notation in class, so
writing it this way is not required.
3. Suppose you are working on an econometric model that satisfies all the assumptions.
Then, due to a computer glitch, you lose all the observations for which Xi < 100. Will
ordinary least squares still provide the best linear unbiased estimator of β1 if applied
to the remaining data? Explain. (5 points)
Typically, yes. Selecting the sample based on the X variable does not bias the
regression slope because it does not induce a correlation between the X variable and
the error. See the example below:
[Figure: scatter plot of API Score against FLE (%)]
It is possible to argue both sides. Accept sensible arguments. For example, if the
relationship between Y and X is nonlinear, then the slope for the observations with X <
100 may differ from the overall slope.
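A small simulation can illustrate the "typically, yes" answer. The sketch assumes a linear model whose normal errors are drawn independently of X. It uses hypothetical values: β0 = 2, β1 = 0.5, and X uniform on 0 to 200. It compares the average OLS slope on the full sample with the average slope after dropping every observation with X < 100. The simulation speaks only to unbiasedness. It does not cover the nonlinear case, because the simulated relationship is linear by construction.

```python
import random

def ols_slope(xs, ys):
    """OLS slope of ys on xs in a simple regression with an intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def simulate(n_reps=500, n_obs=400, beta0=2.0, beta1=0.5, cutoff=100.0, seed=1):
    """Average slope on the full sample and on the X >= cutoff subsample.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    full, kept = [], []
    for _ in range(n_reps):
        xs = [rng.uniform(0, 200) for _ in range(n_obs)]
        # Errors are independent of X, so the usual assumptions hold
        ys = [beta0 + beta1 * x + rng.gauss(0, 10) for x in xs]
        full.append(ols_slope(xs, ys))
        # The "glitch": lose every observation with X < cutoff
        sub = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
        kept.append(ols_slope([x for x, _ in sub], [y for _, y in sub]))
    return sum(full) / n_reps, sum(kept) / n_reps

full_avg, kept_avg = simulate()
print(f"mean slope, full sample:   {full_avg:.4f}")
print(f"mean slope, X >= 100 only: {kept_avg:.4f}")
```

Both averages should sit close to the true slope of 0.5. The truncated sample has less variation in X, so its individual estimates are noisier.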
4. Which of the five assumptions are required to justify using the usual standard error
formula when doing hypothesis tests? (5 points)
You have annual data from 1952 through 2019 on global GDP per capita and global
average temperature.
Call:
lm(formula = log_gdp ~ Temp, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.48909 -0.08811 0.00353 0.12535 0.37544
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.76803 0.02864 306.18 <2e-16 ***
Temp 0.71885 0.04513 15.93 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
t test of coefficients:
Call:
lm(formula = log_gdp ~ Year + Temp, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.06008 -0.02846 -0.00083 0.01878 0.09285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -31.46816 0.95774 -32.857 <2e-16 ***
Temp -0.04986 0.02021 -2.468 0.0162 *
Year 20.42904 0.48626 42.012 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
t test of coefficients:
Call:
lm(formula = e2 ~ lag(e2) + Temp + Year, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.046258 -0.009924 0.000421 0.008793 0.049779
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.63886 0.55102 2.974 0.00416 **
lag(e2) 0.86754 0.06959 12.467 < 2e-16 ***
Temp 0.03662 0.01151 3.182 0.00227 **
Year -0.83300 0.27972 -2.978 0.00412 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call:
lm(formula = log_gdp ~ Temp + Year + lag(log_gdp) + lag(Temp),
data = df)
Residuals:
Min 1Q Median 3Q Max
-0.034084 -0.007723 0.000021 0.008797 0.039097
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.465444 1.924301 -1.801 0.0766 .
Temp 0.002018 0.009650 0.209 0.8350
Year 2.260280 1.219843 1.853 0.0687 .
lag(log_gdp) 0.889612 0.056329 15.793 <2e-16 ***
lag(Temp) -0.010282 0.009526 -1.079 0.2846
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01468 on 62 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.99860, Adjusted R-squared: 0.9985
F-statistic: 1.091e+04 on 4 and 62 DF, p-value: < 2.2e-16
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.4654437 1.8001434 -1.9251 0.05881 .
Temp 0.0020183 0.0092277 0.2187 0.82758
Year 2.2602800 1.1430097 1.9775 0.05244 .
lag(log_gdp) 0.8896123 0.0536230 16.5901 < 2e-16 ***
lag(Temp) -0.0102824 0.0090954 -1.1305 0.26262
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call:
lm(formula = e3 ~ lag(e3) + Temp + Year + lag(log_gdp) + lag(Temp),
data = df)
Residuals:
Min 1Q Median 3Q Max
-0.032828 -0.007332 -0.000374 0.006514 0.039849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.540929 2.085083 -0.739 0.4627
lag(e3) 0.244884 0.139084 1.761 0.0833 .
Temp -0.001491 0.009528 -0.157 0.8761
Year 0.992783 1.325597 0.749 0.4568
lag(log_gdp) -0.047364 0.061585 -0.769 0.4448
lag(Temp) -0.002086 0.009443 -0.221 0.8259
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01444 on 61 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.04836, Adjusted R-squared: -0.02964
F-statistic: 0.62 on 5 and 61 DF, p-value: 0.685
Call:
lm(formula = log_gdp ~ Year + lag(log_gdp), data = df)
Residuals:
Min 1Q Median 3Q Max
-0.035710 -0.006691 -0.000086 0.009070 0.041218
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.71053 1.53316 -1.768 0.0818 .
Year 1.81517 1.00719 1.802 0.0762 .
lag(log_gdp) 0.90349 0.05182 17.434 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
t test of coefficients:
If the temperature were one degree higher in one year than another year, we
predict that log GDP would be β1 units higher.
Alternative: If the temperature were one degree higher in one year than another
year, we predict that GDP would be approximately 100β1% higher. Deduct two points
if students say β1% rather than 100β1%
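The 100β1% reading is an approximation that works well only when β1 is small. The exact percentage change implied by a log model is 100(e^β1 - 1)%. A quick check with the Model 1 estimate (0.71885) shows how far apart the two can be.

```python
import math

# Temp coefficient from Model 1 (log_gdp ~ Temp)
b1 = 0.71885

approx_pct = 100 * b1                 # 100*b1 % rule of thumb
exact_pct = 100 * (math.exp(b1) - 1)  # exact % change implied by the log model

print(f"approximate: {approx_pct:.1f}%  exact: {exact_pct:.1f}%")
```

Here the approximation gives about 72% while the exact figure is about 105%. For a coefficient this large, the exact formula is the safer interpretation.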
2. Does your estimate of β1 in Model 1 imply that higher temperatures cause GDP to
increase? Explain. (5 points)
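No. It implies that temperature and GDP are positively correlated. However, many variables not in the model are correlated with temperature and also affect GDP. For example, anything that trends over time matters: adding Year in Model 2 flips the sign of the temperature coefficient. So β1 does not measure a causal effect.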
Use the Newey-West corrected standard errors because the errors may be correlated over time.
The 5% two-tailed critical value is ±1.96. We reject the null hypothesis because
3.13 > 1.96.
Deduct 0.5 if they do not state the null hypothesis. Deduct 0.5 if they do not state
significance level. Deduct 0.5 if they do not state correct critical value. Deduct 1 if
they do not state the conclusion of the test as reject the null.
Deduct 1 if they do not correctly justify their choice of standard error.
4. Based on the plot, do you expect there to be correlated errors in Model 2? Explain.
(5 points)
Yes, because positive errors tend to follow positive errors and negative errors tend
to follow negative errors. Put another way, the errors tend to come in runs of several
consecutive values with the same sign.
Our test statistic is (N-1)*R2 = 67*0.7117 = 47.68 > 3.84 (chi-squared, df=1 at alpha=5%)
Deduct 0.5 if they do not state significance level. Deduct 0.5 if they do not state correct
critical value. Deduct 1 if they do not state the conclusion of the test as reject the null.
Deduct 0.5 if they compute the BG stat incorrectly, but otherwise do the test correctly.
Deduct 0 if they correctly use the test statistic 47.68 (which is printed in the code output)
without stating the formula (N-1)*R2.
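The arithmetic of the Breusch-Godfrey test above can be written out as a short sketch. The inputs are the values quoted in the key: 67 usable observations after one is lost to the lag, and an auxiliary R² of 0.7117. The critical value is the 5% chi-squared value with one degree of freedom from the table.

```python
# Breusch-Godfrey statistic: (N - 1) * R^2 from the auxiliary regression
# of the residuals on their first lag and the original regressors
n_minus_1 = 67              # 68 years (1952-2019) minus one lost to the lag
r2_aux = 0.7117             # auxiliary-regression R-squared quoted in the key
chi2_crit_5pct_df1 = 3.84   # chi-square table, df = 1, alpha = 0.05

bg_stat = n_minus_1 * r2_aux
reject = bg_stat > chi2_crit_5pct_df1
print(f"BG = {bg_stat:.2f}; reject H0 of no serial correlation: {reject}")
```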
CI = -0.049860 ± 1.96*0.023274 = [-0.095, -0.004]
Deduct 0.5 if they use an incorrect critical value. Deduct 1 if they use an incorrect standard
error. Deduct 0.5 if they do not justify their choice of standard error, which they can
do with BG test or referring to C4. Significance level is stated in question so not
required in answer.
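The interval above can be reproduced from two inputs. The first is the Model 2 temperature coefficient. The second is the 0.023274 standard error the key uses, which is the corrected standard error and does not appear in the output reproduced above.

```python
b = -0.049860      # Temp coefficient in Model 2
se = 0.023274      # corrected standard error quoted in the key
z = 1.96           # 5% two-tailed critical value

lower, upper = b - z * se, b + z * se
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```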
7. Test the null hypothesis that β1=0 and β4=0 in Model 3. (5 points)
The 5% critical value (chi-squared, df=2) is 5.99. We cannot reject the null hypothesis,
so we do not have evidence that either coefficient is non-zero.
Deduct 2 if they compute Wald statistic incorrectly. Deduct 1 if they do not state
significance level. Deduct 1 if they do not state correct critical value. Deduct 1 if they
do not state the conclusion of the test as cannot reject the null.
8. What do you conclude about the relationship between global GDP and
temperature? (5 points)
The distributed lag model (Model 3) shows that there is no statistically significant
relationship between GDP and temperature (Wald test in question 7).
Using “trick one”, the long-run coefficient on temperature is -0.05 (which is similar to
the estimate in Model 2). This estimate suggests a 5% GDP loss per degree of
warming, but the lack of statistical significance means we cannot rule out no effect.
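For comparison, one standard way to get a long-run effect from Model 3 is to divide two quantities. The numerator is the sum of the current and lagged temperature coefficients, and the denominator is one minus the coefficient on lagged log GDP. The sketch assumes that formula, which is not necessarily how "trick one" was defined in class. With these estimates it gives roughly -0.075 rather than -0.05. Either way, the Wald test in question 7 does not reject that both temperature coefficients are zero.

```python
# Model 3 estimates: log_gdp ~ Temp + Year + lag(log_gdp) + lag(Temp)
b_temp = 0.0020183
b_lag_temp = -0.0102824
b_lag_gdp = 0.8896123

# Assumed long-run formula for a model with a lagged dependent variable:
# effect of a permanent 1-degree change once log GDP settles at its new level
long_run = (b_temp + b_lag_temp) / (1 - b_lag_gdp)
print(f"long-run coefficient: {long_run:.3f}")
```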
Model 2 shows a statistically significant negative relationship of about 5% of GDP lost per
degree of warming, although it is significant only at the 5% level, not the 1% level. From the
answer to question 6, the upper bound of the confidence interval is -0.004 (i.e., 0.4%
of GDP loss per degree of warming).
There are also potential omitted variables that are related to temperature and affect
GDP. In that case, the coefficients in the above models are biased for the causal
effect of temperature on GDP.
It is not necessary to include all of the above arguments, but they must answer based
on the regression results. Give points for a sensible answer.
You are interested in the relationship between education and wages. You have six
variables:
1. age age in years
2. female =1 if female and 0 otherwise
3. educ years of schooling
4. earnings weekly earnings per hour
Call:
lm(formula = log_earnings ~ education, data = cps)
Residuals:
Min 1Q Median 3Q Max
-6.7575 -0.3368 0.0081 0.3557 2.4706
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.941065 0.123349 72.49 0.0002 ***
educ 0.121765 0.008634 14.10 0.0002 ***
---
Residual standard error: 0.6579 on 823 degrees of freedom
Multiple R-squared: 0.1946, Adjusted R-squared: 0.1937
F-statistic: 198.9 on 1 and 823 DF, p-value: < 0.00000000000000022
t test of coefficients:
Call:
lm(formula = e2 ~ education, data = cps)
Residuals:
Min 1Q Median 3Q Max
-0.627 -0.402 -0.306 -0.066 45.134
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77319 0.34406 2.247 0.0249 *
educ -0.02432 0.02408 -1.010 0.3129
---
Residual standard error: 1.835 on 823 degrees of freedom
Multiple R-squared: 0.001238, Adjusted R-squared: 2.408e-05
F-statistic: 1.02 on 1 and 823 DF, p-value: 0.3129
Call:
lm(formula = log_earnings ~ education + age + age2, data = cps)
Residuals:
Min 1Q Median 3Q Max
-6.8375 -0.3363 0.0153 0.3292 2.4681
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.5960988 0.2908583 26.116 < 0.0000000000000002 ***
educ 0.1150854 0.0085494 13.461 < 0.0000000000000002 ***
age 0.0633995 0.0136130 4.657 0.00000374 ***
age2 -0.0006460 0.0001577 -4.096 0.00004613 ***
---
Residual standard error: 0.6455 on 821 degrees of freedom
Multiple R-squared: 0.2266, Adjusted R-squared: 0.2238
F-statistic: 80.2 on 3 and 821 DF, p-value: < 0.00000000000000022
Call:
lm(formula = log_earnings ~ education + female + female * education +
age + age2, data = cps)
Residuals:
Min 1Q Median 3Q Max
-7.0330 -0.2896 -0.0005 0.3284 2.2855
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.6399735 0.2859595 26.717 0.0000 ***
educ 0.1068435 0.0105056 10.170 0.0000 ***
age 0.0743070 0.0131279 5.660 0.0000 ***
age2 -0.0007637 0.0001519 -5.028 0.0000 ***
female -0.7917855 0.2414095 -3.280 0.0010 **
education:female 0.0296351 0.0168202 1.762 0.0784 .
---
Residual standard error: 0.618 on 819 degrees of freedom
Multiple R-squared: 0.2928, Adjusted R-squared: 0.2885
F-statistic: 67.82 on 5 and 819 DF, p-value: < 0.00000000000000022
1. Suppose hourly wages are determined by the model: $\text{hrwage}_i = A\,\text{educ}_i^{\beta} e^{\varepsilon_i}$. How
could you estimate this model by OLS? How would you interpret the coefficient β? (5
points)
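The usual approach is to take natural logs. This turns the multiplicative model into one that is linear in the parameters, so OLS can be applied to the transformed variables. It requires educ_i > 0 for every observation.

```latex
\text{hrwage}_i = A\,\text{educ}_i^{\beta} e^{\varepsilon_i}
\;\Longrightarrow\;
\ln(\text{hrwage}_i) = \underbrace{\ln A}_{\beta_0} + \beta \ln(\text{educ}_i) + \varepsilon_i
```

Regressing ln(hrwage) on ln(educ) with an intercept estimates ln A and β. Here β is an elasticity: a 1% increase in years of schooling is associated with approximately a β% increase in hourly wages.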
H0: var(εi)=constant, H1: var(εi) is not a constant and varies with education
Our test statistic is N*R2 = 825*0.001238 = 1.02 < 3.84 (chi-squared, df=1 at
alpha=5%), so we cannot reject the null hypothesis of homoskedasticity.
Deduct 0.5 if they do not state significance level. Deduct 0.5 if they do not state correct
critical value. Deduct 1 if they do not state the conclusion of the test as cannot reject the null.
Deduct 0.5 if they compute the test statistic incorrectly, but otherwise do the test
correctly.
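The heteroskedasticity test above uses the same N·R² logic. The sketch uses the key's numbers: N = 825 and an auxiliary R² of 0.001238 from the regression of e2 on educ. The critical value is the 5% chi-squared value with one degree of freedom.

```python
n = 825            # observations (823 residual df + 2 estimated coefficients)
r2_aux = 0.001238  # R-squared from the regression of e2 on educ
crit = 3.84        # chi-square critical value, df = 1, alpha = 0.05

lm_stat = n * r2_aux
print(f"N*R^2 = {lm_stat:.2f}; reject homoskedasticity: {lm_stat > crit}")
```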
3. At what age do predicted earnings peak, holding education constant? (5 points)
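One standard approach works as follows. With log earnings quadratic in age, setting the derivative β_age + 2β_age2·age to zero gives a peak at age = -β_age / (2β_age2). Because the log is monotonic, log earnings and earnings peak at the same age. The snippet plugs in both the Model 2 and Model 3 estimates, since which model the question intends is an assumption here.

```python
def peak_age(b_age, b_age2):
    """Age maximizing b_age*age + b_age2*age**2; a peak needs b_age2 < 0."""
    if b_age2 >= 0:
        raise ValueError("no interior maximum unless the age-squared coefficient is negative")
    return -b_age / (2 * b_age2)

# Model 2: log_earnings ~ educ + age + age2
model2_peak = peak_age(0.0633995, -0.0006460)
# Model 3: adds female and the education:female interaction
model3_peak = peak_age(0.0743070, -0.0007637)

print(f"Model 2 peak age: {model2_peak:.1f}")
print(f"Model 3 peak age: {model3_peak:.1f}")
```

Both specifications put the peak just under age 50.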
4. For a given age, test the hypothesis that the marginal effect of education on wages
varies by gender (female). (5 points)
The 5% two-tailed critical value is ±1.96. We cannot reject the null hypothesis because
1.76 < 1.96.
Deduct 0.5 if they do not state the null hypothesis. Deduct 0.5 if they do not state
significance level. Deduct 0.5 if they do not state correct critical value. Deduct 1 if
they do not state the conclusion of the test as cannot reject the null.
Deduct 0.5 if they do not correctly justify their choice of standard error.
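The 1.76 above is a t statistic for H0: the education:female coefficient equals zero. It is the interaction coefficient divided by its standard error, using the usual standard error printed in the Model 3 output.

```python
b_inter = 0.0296351   # education:female coefficient in Model 3
se_inter = 0.0168202  # its standard error from the output
crit = 1.96           # 5% two-tailed critical value

t_stat = b_inter / se_inter
print(f"t = {t_stat:.2f}; reject H0: {abs(t_stat) > crit}")
```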
5. How much does an extra year of education cause earnings to increase? Justify your
answer. (5 points)
However, there are also potential omitted variables that are related to education
and affect wages. In that case, the coefficients in the above models are biased for
the causal effect of education on wages. Model 3 accounts for one possible omitted
variable (gender), but others exist, especially the person’s natural ability to be
successful in the labor market.
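The sketch below converts each log-earnings coefficient on education into percentage terms. It uses both the 100β% approximation and the exact 100(e^β - 1)% formula. In Model 3 the return for women is the educ coefficient plus the education:female interaction. These figures are associations implied by the estimates and, as discussed above, not necessarily causal effects.

```python
import math

# Education coefficients from the log-earnings models above
coefs = {
    "Model 1": 0.121765,
    "Model 2": 0.1150854,
    "Model 3, men": 0.1068435,
    "Model 3, women": 0.1068435 + 0.0296351,  # educ + education:female
}

for label, b in coefs.items():
    approx = 100 * b                  # 100*b % rule of thumb
    exact = 100 * (math.exp(b) - 1)   # exact % change in earnings
    print(f"{label}: approx {approx:.1f}%, exact {exact:.1f}%")
```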
It is not necessary to include all of the above arguments, but they must answer based
on the regression results. Give points for a sensible answer.
Student’s t-Distribution (critical values, 2-tailed test)
Degrees of        Significance Level
Freedom       0.10    0.05    0.01
1             6.31   12.71   63.66
2             2.92    4.30    9.93
3             2.35    3.18    5.84
4             2.13    2.78    4.60
5             2.02    2.57    4.03
6             1.94    2.45    3.71
7             1.90    2.37    3.50
8             1.86    2.31    3.36
9             1.83    2.26    3.25
10            1.81    2.23    3.17
15            1.75    2.13    2.95
20            1.73    2.09    2.85
25            1.71    2.06    2.79
30            1.70    2.04    2.75
40            1.68    2.02    2.70
50            1.68    2.01    2.68
60            1.67    2.00    2.66
80            1.66    1.99    2.64
100           1.66    1.98    2.63
150           1.66    1.98    2.61
∞ (Z)         1.65    1.96    2.58
Chi-Square Distribution (critical values)
Degrees of        Significance Level
Freedom       0.10    0.05    0.01
1             2.71    3.84    6.63
2             4.61    5.99    9.21
3             6.25    7.81   11.34
4             7.78    9.49   13.28
5             9.24   11.07   15.09