ESB2021 Resit With Solution
Question 1 – MCQ (25 Marks, 2.5 points for each MCQ Question)
c. Women – on average – weigh 10% less than men for a given age
2. For the model from the previous question: suppose you find that β2 = 0.01
and β3 = −0.0001. What does it tell you about the age at which we expect
people to be heaviest?
a. At 25 years
b. At 50 years
c. At 40 years
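As a sketch of the reasoning, assuming the model from the previous question (not reproduced here) is quadratic in age, with β2 on age and β3 on age squared: expected weight peaks where the derivative β2 + 2·β3·age equals zero. The variable names below are illustrative only.

```r
# Assumption: the model contains the terms beta2 * age + beta3 * age^2
beta2 <- 0.01
beta3 <- -0.0001

# First-order condition: beta2 + 2 * beta3 * age = 0
peak_age <- -beta2 / (2 * beta3)

# beta3 < 0, so the turning point is a maximum rather than a minimum
is_maximum <- beta3 < 0

peak_age
```

Under that assumption the turning point is at roughly 50 years.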
a. Yes
b. No
4. The demand for a new drug is known to be linear and downward sloping; i.e.
a higher price means a lower demand. A researcher provided an estimate of
this demand curve but suspects that a confounding factor led to a downward
bias. This means that the estimated curve is
c. Upward sloping
a. Equal to zero.
b. 50%
6. The R output below shows a regression of COVID-19 cases per student among
US universities in Fall 2020. The variable partyrank ranks universities
according to the quality of the local party scene (i.e. the university with the
best party scene has rank 1). What does the regression suggest about the
relationship between party rank and COVID-19 cases?
# simple regression of cases per student on partyrank
library(magrittr)  # provides the %>% pipe
lm(casesOstudent ~ partyrank, datafinal2) %>% summary()
Call:
Residuals:
Min 1Q Median 3Q Max
Coefficients:
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
8. The following R output provides results using a dataset from the UK Health
and Lifestyle Survey (1984-85). In this survey, several thousand people in the
UK were asked questions about their health and lifestyle. The variable
bmi records the body mass index (BMI) of the respondents. The BMI uses
weight and height to work out whether a weight is healthy or if someone is
overweight. A value between 18.5 and 24.9 indicates a healthy weight. The
variable region is a categorical variable recording in which region a
respondent is based. According to the output provided, which region is the
least overweight region (on average)?
a) London
b) Scotland
c) Wales
d) South East
summary(halsx$bmi)
table(halsx$region)
##
##          wales          north     north west   yorks/humber  west midlands
##            498            540           1092            808            823
##  east midlands    east anglia     south west     south east greater london
##            682            333            720           1607            943
##       scotland
##            925
##
## Call:
## lm(formula = bmi ~ region, data = halsx)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.3808  -2.8505  -0.5398   2.2378  30.3695
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           25.2405     0.2071 121.860  < 2e-16 ***
## regionnorth           -0.5668     0.2840  -1.996  0.04598 *
## regionnorth west      -0.5400     0.2484  -2.174  0.02973 *
## regionyorks/humber    -0.6353     0.2608  -2.436  0.01487 *
## regionwest midlands   -0.7341     0.2626  -2.796  0.00519 **
## regioneast midlands   -0.5497     0.2694  -2.040  0.04135 *
## regioneast anglia     -0.6755     0.3183  -2.122  0.03385 *
## regionsouth west      -0.4772     0.2676  -1.783  0.07455 .
## regionsouth east      -1.1507     0.2349  -4.899 9.82e-07 ***
## regiongreater london  -1.2294     0.2561  -4.801 1.61e-06 ***
## regionscotland        -0.3269     0.2560  -1.277  0.20161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.08 on 7260 degrees of freedom
## (1700 observations deleted due to missingness)
## Multiple R-squared: 0.006982, Adjusted R-squared: 0.005614
## F-statistic: 5.105 on 10 and 7260 DF, p-value: 1.825e-07
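One way to read the output above: Wales has no coefficient of its own, so it is the omitted reference category and the intercept is its average BMI. Every other coefficient is that region's difference from Wales. The sketch below simply copies the reported estimates into a vector (the object names are illustrative, not from the original script) and recovers the implied regional averages.

```r
# Estimates copied from the regression output; wales is the reference category
intercept <- 25.2405
region_effects <- c(
  "wales"          =  0,
  "north"          = -0.5668,
  "north west"     = -0.5400,
  "yorks/humber"   = -0.6353,
  "west midlands"  = -0.7341,
  "east midlands"  = -0.5497,
  "east anglia"    = -0.6755,
  "south west"     = -0.4772,
  "south east"     = -1.1507,
  "greater london" = -1.2294,
  "scotland"       = -0.3269
)

# Implied average BMI per region = intercept + region coefficient
mean_bmi <- intercept + region_effects
sort(mean_bmi)

# Region with the lowest implied average BMI
lowest_region <- names(which.min(mean_bmi))
lowest_region
```

The most negative coefficient belongs to greater london, which therefore has the lowest implied average BMI (about 24.01).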
9. Suppose you have estimated the following equation describing the
relationship between a wind turbine’s monthly electricity output (in MWh) and
the age of a turbine:
E = 1000 + 30 AGE − AGE² + ε
Based on this, at what age would we expect the highest output?
a) 15 years
b) 5 years
c) 30 years
d) There is not enough information to tell.
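The turning point of the estimated equation can be checked directly. The sketch below sets the derivative 30 − 2·AGE to zero and then confirms the result numerically; the search interval of 0 to 40 years is an assumption made only for the numerical check.

```r
# Expected output (MWh) as a function of age, using the estimated coefficients
expected_output <- function(age) 1000 + 30 * age - age^2

# Analytic turning point: d/dAGE = 30 - 2 * AGE = 0
peak_age <- 30 / 2

# Numerical check over an assumed age range of 0 to 40 years
numeric_peak <- optimize(expected_output, interval = c(0, 40), maximum = TRUE)

peak_age
numeric_peak$maximum
expected_output(peak_age)
```

Because the coefficient on AGE² is negative, the turning point at 15 years is a maximum.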
e) Now include distance squared as an additional explanatory variable. Assume you can
interpret this regression causally. What does it tell you about the relationship
between prices and distance? What is the impact of an additional km of distance on
price 2km from the centre? Can you identify a distance from the centre at which
distance has no more impact on price?
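With a quadratic specification, the effect of one more km is no longer constant: it equals b1 + 2·b2·distance, and it is zero at distance = −b1 / (2·b2). The sketch below illustrates the calculation on simulated data, since the exam's housing data is not reproduced here; the variable names and the true coefficients (−30 and 1) are hypothetical.

```r
set.seed(1)

# Hypothetical data, NOT the exam data set
n <- 500
distance <- runif(n, 0, 20)  # km from the centre
price <- 500 - 30 * distance + 1 * distance^2 + rnorm(n, sd = 10)

# Regression with distance and distance squared
fit <- lm(price ~ distance + I(distance^2))
b <- coef(fit)

# Marginal effect of one extra km at a given distance d
marginal_effect <- function(d) {
  unname(b["distance"] + 2 * b["I(distance^2)"] * d)
}

effect_at_2km <- marginal_effect(2)

# Distance at which the marginal effect is zero
flat_distance <- unname(-b["distance"] / (2 * b["I(distance^2)"]))

effect_at_2km
flat_distance
```

In this simulation the estimates should land near −26 at 2 km and near 15 km for the flat point; with the actual data, the same two formulas apply to the estimated coefficients.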
(a) Run a regression of mort on imm. Provide an interpretation of the parameter related
to imm.
(b) Would you say that the regression reported above provides a causal estimate of the
impact of immunization? Can you suggest a reason why there might be a bias? Discuss
the possible direction of the bias.
(c) Now include year and country fixed effects in the regression from part (a). Discuss the
merits (or lack thereof) of this specification for establishing the causal effect of
immunization on mortality.
(d) What do the results from part (c) suggest about the worldwide trend in child mortality
from 1999 onwards? How much lower or higher is child mortality in 2007 compared to
1999?
(e) Add GDP per capita (gdppc) as an additional explanatory variable to the specification
from part (c). Discuss why this might be a good idea. Could there also be reasons why
it is problematic? Discuss the results shown below. How does this affect the
coefficient for imm?
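The specifications in parts (a) and (c) can be sketched as follows. The exam data is not reproduced here, so the example uses a simulated panel in which an unobserved country characteristic raises immunization and lowers mortality. The names mort and imm follow the question; the data frame name panel and all numbers are hypothetical.

```r
set.seed(42)

# Hypothetical panel, NOT the exam data: 30 countries observed 1999-2007
panel <- expand.grid(country = 1:30, year = 1999:2007)

# Unobserved country characteristic that confounds the pooled regression
u <- rnorm(30, sd = 10)[panel$country]

panel$imm  <- 70 + u + rnorm(nrow(panel), sd = 5)
panel$mort <- 80 - 0.5 * panel$imm - 2 * u -
  1.5 * (panel$year - 1999) + rnorm(nrow(panel), sd = 3)

# Part (a): pooled regression of mort on imm
pooled <- lm(mort ~ imm, data = panel)

# Part (c): add country and year fixed effects as dummy variables
fe <- lm(mort ~ imm + factor(country) + factor(year), data = panel)

coef(pooled)["imm"]           # confounded by the country characteristic
coef(fe)["imm"]               # close to the simulated effect of -0.5
coef(fe)["factor(year)2007"]  # mortality in 2007 relative to 1999
```

The year dummies are measured relative to the omitted base year (1999 here), which is how the coefficient on the 2007 dummy answers part (d). Country fixed effects remove only time-invariant confounders; anything that varies within a country over time can still bias the estimate.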
(a) Consider column 1. How can we interpret the regression coefficient reported there?
(b) Can you propose a mechanism that would lead to a causal effect from air quality to
crime?
(c) Columns 3 to 5 include a variety of fixed effects as control variables. Namely: Ward,
Day of week (DOW) and year-month fixed effects. Explain why these might help in
getting a better estimate of the causal effect of pollution. Can you also discuss at
least one confounding factor that is not addressed by these control variables?
(d) The authors propose to use the wind direction on a particular day in a particular
ward interacted with broad city location (central, north, south, east, west) as an
instrumental variable to deal with any remaining confounding factors that might exist
even after including all the fixed effects discussed in part (c). Explain why this might
help. Can you also discuss potential issues that might invalidate this instrumental
variable strategy?
(e) Columns 3 to 6 provide results from an instrumental variable estimation using wind
speed. Discuss this result. If wind speed is indeed a valid instrument, what do the
results suggest about the direction of the bias in the original regression (repeated in
columns 1 and 2)? Which confounding mechanism would be consistent with this kind
of bias?
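The logic of comparing OLS with an instrumental variable estimate can be illustrated with a small simulation. This is only a sketch: the data, the variable names, and the direction of the confounding (an unobserved factor that raises pollution and lowers crime) are assumptions chosen for illustration, not the paper's results. Two-stage least squares is done by hand with two lm calls.

```r
set.seed(7)
n <- 5000

# Hypothetical data, NOT the paper's data
wind       <- rnorm(n)  # instrument: shifts pollution, no direct effect on crime
confounder <- rnorm(n)  # unobserved factor affecting both pollution and crime

pollution <- 1 * wind + 1 * confounder + rnorm(n)
crime     <- 0.3 * pollution - 1 * confounder + rnorm(n)  # true effect is 0.3

# Naive OLS: biased because the confounder is omitted
ols <- coef(lm(crime ~ pollution))["pollution"]

# Stage 1: predict pollution from the instrument
first_stage   <- lm(pollution ~ wind)
pollution_hat <- fitted(first_stage)

# Stage 2: regress crime on predicted pollution
# (point estimate only; standard errors from this second lm are not valid)
iv <- coef(lm(crime ~ pollution_hat))["pollution_hat"]

ols
iv
```

Here the IV estimate is larger than the OLS estimate, which indicates a downward bias in OLS, consistent with a confounder that moves pollution and crime in opposite directions. If the instrument also affected crime directly, the IV estimate would be biased as well.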