CC655 Final 2021 Key
Department of Economics
Wilfrid Laurier University
Fall 2021
Final Exam
Structure:
Answer 4 of the following 6 questions
If you answer all 6 questions, I will grade only the first 4.
Each question has equal weight. The weights of subquestions are noted in each question.
It is open book and open internet
You are prohibited from collaborating in any way with other people on the exam. This means
no discussing the questions in person, by phone, using chat services, question boards, email, etc.
If you draw your answer from another source (e.g. the textbook, the internet), you must reference
it. The normal rules of plagiarism apply to this exam.
Instructor help on the test will be limited to clarification questions only
Submission Instructions:
Submit your exam to Gradescope when complete.
You are required to hand-write your responses to each question on paper (i.e. do not use a word
processor unless you have accommodations through ALC)
Upload your hand-written responses to Gradescope by taking a photo of your response to each
question separately, and then uploading each image.
a) For questions you do not answer, upload an image of that question that says “did not
answer.”
If you run into technical problems or otherwise cannot get your answers posted to Gradescope, I
will collect your exam and post them from my office.
Questions
1) (15 points) You are interested in the effect of library fines on whether long-overdue library books get
returned. You find a local library that decided to forgive all library fines owed by people who owed
below $50 in total, and want to use regression discontinuity. However, the people who work at the
library say that a small handful of library patrons insisted on repaying their fines, and wanted them
reinstated. Also, a small handful of patrons with fines in the $51-55 range called the library many
times until they got theirs removed too. Assume the true effect of canceling fines is that it increases
return rates by 10%.
a. (5 points) If you estimated this using a sharp regression discontinuity estimator, how would
the people who called to get their fines reinstated or cancelled affect your results, and why?
A sharp regression discontinuity here estimates the reduced-form effect of being below
the $50 cutoff on returns, not the effect of actually having fines cancelled. Patrons
below the cutoff who had their fines reinstated are probably less likely to return their
books, and patrons above the cutoff who had their fines cancelled anyway are probably
more likely to return theirs. Both kinds of non-compliance shrink the gap in return
rates at the cutoff, so the reduced-form effect is probably below the true causal effect
of 10%.
b. (5 points) Outline how you would use a fuzzy regression discontinuity estimator to correct
for the problem in (a).
A fuzzy RD corrects for the fact that having fines below $50 does not raise the
probability of having fines cancelled from 0 to 1, but by something less than that. If
you take the reduced-form estimate from (a) and divide it by a first-stage estimate of
the change in the probability of treatment at the cutoff, you get back to the 10%
estimate of the true effect.
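The ratio described above can be written out as a sketch. Here Y is the return indicator, D indicates having fines cancelled, and X is the fine amount. This notation and the first-stage jump of 0.8 are illustrative assumptions, not values from the question.

```latex
\[
\hat{\tau}_{\text{fuzzy}}
  = \frac{\lim_{x \uparrow 50} E[Y \mid X = x] - \lim_{x \downarrow 50} E[Y \mid X = x]}
         {\lim_{x \uparrow 50} E[D \mid X = x] - \lim_{x \downarrow 50} E[D \mid X = x]}
\]
% Hypothetical numbers: if the probability of cancellation jumps by 0.8 at the
% cutoff and the true effect is 0.10, the reduced form is 0.10 * 0.8 = 0.08, so
\[
\hat{\tau}_{\text{fuzzy}} = \frac{0.08}{0.8} = 0.10
\]
```

The numerator is the sharp RD estimate from (a), and the denominator is the same discontinuity estimated with treatment status as the outcome.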
c. (5 points) When looking at the data, you notice there are a disproportionate number of people
with fines of exactly $49. What, if any, issues would this create for your analysis?
This would suggest that some people have manipulated their fine amount, which could
signal that people on the left of the discontinuity are somehow different from the people
on the right. It could be, for example, that people know they’ll get their fines cancelled
if they keep them below $50, so they return their books before their fines grow any
larger. If this is a non-random group of people, the RD estimator would fail because
y0 (the return rate for people who don’t get the treatment) would jump at the cutoff.
2) (15 points) You are investigating the effect of being eligible for food assistance on your kids’
academic performance. You have data on income and children’s average test scores. Assume
everyone becomes eligible for food assistance if their income is below $25,000 per year. You take
all the data and estimate the effect by OLS, regressing test scores on (Income – 25000), Below (an
indicator for income being below $25,000 per year), and (Income – 25000) x Below. You get the
below regression table.
                 Test Scores
(Intercept)      74.999***
                 (0.022)
                 (0.000)
Below            -4.978***
                 (0.031)
                 (0.000)
Num.Obs.         16132
R2               0.996
* p < 0.1, ** p < 0.05, *** p < 0.01
a. (5 points) Draw a graph with the predicted values from this regression on the y axis, and
(income-25000) on the horizontal axis. Indicate the regression discontinuity estimate of the
effect of food assistance on the graph (including the actual value of the effect).
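As a starting point for the graph, the two reported coefficients pin down the fitted values on either side of the cutoff. The slope coefficients are not shown in the table, so only the intercepts are computed here. The symbol x for (Income - 25000) is introduced for illustration.

```latex
% x = Income - 25000; Below = 1 when x < 0
\[
\lim_{x \downarrow 0} \hat{y} = 74.999
\qquad
\lim_{x \uparrow 0} \hat{y} = 74.999 - 4.978 = 70.021
\]
% RD estimate: the jump at x = 0 when moving from just above to just below the cutoff
\[
\hat{\tau}_{RD} = 70.021 - 74.999 = -4.978
\]
```

On the graph, this is the vertical gap between the two fitted lines at x = 0, with the line for incomes below the cutoff ending about 4.98 points lower.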
b. (5 points) Why did we subtract 25000 from Income before running the model?
In the RD model, we want to measure the effect of being below the cutoff at the cutoff
point itself, which is $25,000. Because the model allows the slope to differ on each side
of the cutoff, using income without subtracting $25,000 would make the coefficient on
“below” measure the intercept change at income = $0, which is not what we want. By
recentering income at $25,000 we measure the intercept shift at the cutoff point.
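The recentering argument can be checked with a short derivation. The coefficient names below are assumptions introduced for illustration, with c = 25000 denoting the cutoff.

```latex
% Centered model
\[
y = \beta_0 + \beta_1 (\text{Inc} - c) + \beta_2\,\text{Below}
    + \beta_3 (\text{Inc} - c)\,\text{Below} + e
\]
% Gap between the two fitted lines at Inc = c: both slope terms vanish
\[
E[y \mid \text{Inc} = c, \text{Below} = 1] - E[y \mid \text{Inc} = c, \text{Below} = 0] = \beta_2
\]
% Uncentered model
\[
y = \alpha_0 + \alpha_1\,\text{Inc} + \alpha_2\,\text{Below}
    + \alpha_3\,\text{Inc}\cdot\text{Below} + e
\]
% Gap at Inc = c now involves the interaction term
\[
E[y \mid \text{Inc} = c, \text{Below} = 1] - E[y \mid \text{Inc} = c, \text{Below} = 0]
  = \alpha_2 + \alpha_3 c = \beta_2
\]
```

In the uncentered model, the coefficient on Below alone is the gap at Inc = 0, and it equals the gap at the cutoff only if the slopes are the same on both sides.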
c. (5 points) Explain the requirements for the RD method to identify the local (at the cutoff)
treatment effect of the food assistance program on test scores.
What must be true is that y0, which is test scores in the absence of the food program,
has to be continuous at the cutoff so that people on each side of the cutoff are
comparable to each other at the baseline. This effectively means that being on either
side of the cutoff is as good as random, which mimics random assignment in an
experiment.
3) (15 points) Consider the below graph showing the average outcome for treated and control groups in
the leadup to treatment (indicated by the dashed line), and also after treatment.
a. (5 points) Based on the prior trend, does it seem likely that the constant trends assumption holds
in this instance?
It looks like the untreated group is trending upward faster than the treated group, so
this assumption does not appear to hold.
If you actually estimate this difference in differences, it will produce a big negative
result. It will probably underestimate the causal effect (make it seem more negative
than it actually is). If you look only at the black line, clearly the treated group is
already trending up, so we will want to correct for that. But using a group that is
trending up faster even before the treatment will tend to over-correct for the trend, and
make the treatment effect seem smaller than it actually is.
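The direction of the bias can be sketched for a single pre-period and a single post-period. The symbols are assumptions for illustration: the true effect is written as tau, and g_T and g_C are the per-period trends of the treated and control groups in the absence of treatment.

```latex
\[
\widehat{DiD}
  = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}})
  - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})
  = (\tau + g_T) - g_C
  = \tau + (g_T - g_C)
\]
% If the control group trends up faster than the treated group, g_C > g_T, so
\[
\widehat{DiD} < \tau
\]
```

Under constant trends, g_T = g_C and the second term drops out, which is why the pre-period comparison in part (a) matters.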
c. (5 points) Does it matter that there are more time periods prior to the treatment compared to
after the treatment? Explain your reasoning.
It doesn’t matter in terms of identifying an effect, and in fact having several time
periods before the treatment is good because you can check the common trends
assumption. Having fewer time periods after treatment just limits you in that you can
only measure the very short term effect.
Generally speaking, you don’t want to have too many time periods either before or after
the treatment, because you run the risk of measuring the effects of other things that
happen in those time periods that are not the treatment itself.
4) (15 points) Suppose you are interested in the demand for tickets to hockey games, and you believe
that quantity of tickets demanded is a linear function of price and other factors such as the popularity
of the home and visiting teams and income in the city. You have data across a number
of hockey arenas located in different cities around North America (note: this is not necessarily the
NHL, so assume there are many of them). One issue with your data is that sometimes the games are
sold out, and so the quantity demanded is capped from above at arena capacity.
a. (5 points) Imagine that, on average, unobserved factors that determine ticket demand are
unrelated to price once we add the other variables listed above to the model. Would an OLS
regression of ticket demand on price and the other listed variables estimate the average
treatment effect on the treated? Explain.
b. (5 points) Suppose you motivate a Tobit model with an equation for “latent” demand for
hockey tickets, $y^* = x\beta - \eta$, where $x$ are the explanatory variables including price, $\eta$ is a
normally distributed error, and $y^*$ is continuous. The observed $y$ equals 0 if $y^* \le 0$ and equals
arena capacity if $y^*$ exceeds capacity (for simplicity, assume all arenas have the same
capacity). Otherwise it equals actual ticket demand. How would the slope from the OLS
regression in (a) compare to the slope from a Tobit model with the same variables? Explain.
c. (5 points) In your view, is it more relevant to study the marginal effect of price on $E[y \mid x]$
or of price on $E[y^* \mid x]$? Explain.
5) (15 points) Suppose you are interested in analyzing the factors that determine the likelihood of
getting a job interview for a series of applicants. Below are estimates of the parameters from a
linear probability model (LPM), Probit, and Logit of a binary variable equal to 1 if the applicant gets
a call back (call_back) on years of work experience (yearsexp) and a dummy variable equal to 1 if the
person has a bachelor’s degree or higher (bachelor). Also below is a table of summary statistics for
each of those variables.
a. (5 points) Explain why the Probit and Logit parameter estimates are so much higher than the
linear probability model.
The reason is that the probit and logit parameters measure different things than the
LPM parameters. In the LPM, each coefficient is itself the marginal effect on the
call-back probability, whereas the logit and probit coefficients are not marginal
effects. To compare all three, you need to take the derivative of the predicted
probability with respect to each variable, holding the independent variables constant
at the mean or median. The resulting marginal effects will look much more similar.
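The derivatives referred to above can be written out for a continuous regressor such as yearsexp. The notation is standard, with the logistic CDF, the standard normal density, and the predicted probability p. The value p = 0.08 is purely illustrative.

```latex
% LPM: the coefficient is the marginal effect
\[
\frac{\partial P(y = 1 \mid x)}{\partial x_j} = \beta_j^{LPM}
\]
% Logit: coefficient scaled by p(1 - p), which is at most 1/4
\[
\frac{\partial P(y = 1 \mid x)}{\partial x_j}
  = \beta_j^{logit}\,\Lambda(x\beta)\,[1 - \Lambda(x\beta)]
\]
% Probit: coefficient scaled by the normal density, which is at most about 0.399
\[
\frac{\partial P(y = 1 \mid x)}{\partial x_j} = \beta_j^{probit}\,\phi(x\beta)
\]
% Illustration: at p = 0.08 the logit scale factor is 0.08 * 0.92 = 0.0736
```

Because the scale factors are well below 1, and smaller still when the predicted probability is far from 0.5, the raw logit and probit coefficients are much larger than the LPM coefficient even when the implied marginal effects are close.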
b. (5 points) Draw a graph of the predicted values of the LPM and the Logit at various levels of
experience for people with a bachelor’s degree (to draw this graph, pick at least 4 values of
experience to evaluate the model).
[Graph: predicted values (vertical axis, 0 to 0.08) against Experience (horizontal axis, 5 to 20)]
c. (5 points) Suppose you are interested in the causal effect (the Average Treatment Effect on
the Treated) of having a bachelor’s degree on getting a callback. Do you think any of these
models identify the ATT? Explain.
None of them is likely to identify the causal effect, because there are omitted variables that
probably create selection bias. For example, natural ability is probably correlated with
having a bachelor’s degree and getting a callback, so omitting it will bias the estimates
upward. You could make the same argument with several other possible omitted
variables. Thus none of these regressions will identify the causal effect.
6) (15 points) The code below creates a dataset based on the following data generating process
$y_{it} = 2 + 4 w_{it} + a_i + u_{it}$
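clear all
local N = 5000
local T = 5
local obs = `N'*`T'
set obs `obs'
* Panel identifiers: N units, each observed for T periods
gen id = floor((_n-1)/`T') +1
bysort id: gen t = _n
xtset id t
* Assumption: a is a unit-level effect and eta an idiosyncratic error, both
* drawn as standard normal; these draws are not shown in the listing
gen temp = rnormal()
bysort id: gen a = temp[1]
drop temp
gen eta = rnormal()
* Treatment rule, as quoted in part (c)
gen w = t>=3 & eta>0
* Potential outcomes and the observed outcome
gen y0 = 2 + a + eta
gen y1 = y0 + 4
gen y = y0 + (y1-y0)*w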
a. (5 points) Briefly outline the key differences between a Pooled OLS model (which ignores
the panel structure), fixed effects, and random effects.
Pooled OLS completely ignores the fact that observations come from the same people
over time, and basically treats the panel as a bigger sample. For the purposes of
computing standard errors, it also ignores any common unobserved effect across
observations for the same people. For this to work, you would need to treat any
unobserved effect as unrelated to the variable of interest to measure treatment effects.
Fixed effects recognizes the panel structure of the data, and corrects for it explicitly by
taking a within transformation or adding dummy variables for each cross-sectional
unit. You would use this method if you think the common error component is
correlated with your variable of interest, and also fixed across time for each person.
Random effects recognizes the panel structure, but treats the common error component
as unrelated to the variable of interest. The main difference with Pooled OLS is that
this method explicitly accounts for the special error correlation structure caused by the
common error component for each person, and uses a GLS transformation to compute
the estimator.
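The estimators described above could be run on the simulated panel roughly as follows. This is a sketch that assumes the variables y, w, id and t from the exam code are in memory and that xtset id t has been run. The first-difference regression is included because it appears in part (b).

```stata
* Pooled OLS: treats the panel as one large cross-section
regress y w

* Fixed effects: the within transformation removes the time-invariant a
xtreg y w, fe

* Random effects: GLS, treating a as uncorrelated with w
xtreg y w, re

* First differences: differencing within each id also removes a
regress D.y D.w
```

Comparing the coefficient on w across the four regressions with the true value of 4 shows which assumptions matter for this data generating process.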
b. (5 points) Suppose you only observe w and y, and you want to estimate the causal effect
(Average Treatment Effect) of w on y. Would you use Pooled OLS, Fixed Effects, Random
effects, or a First-Difference regression? Explain.
All of these will produce a biased estimate of the treatment effect because w is
correlated with eta, the time-varying error component. Given this, you might as well
use an estimator that has a lower variance, which would be either pooled OLS or
random effects.
c. (5 points) Now imagine you delete the line gen w = t>=3 & eta>0 and replace it with
sum a, d
gen w = (a <= r(p25) | a >= r(p75)) & t>=3
This generates w equal to 1 if a is in the bottom or top quartile (and time is period 3 or
later). How would this affect your choice of model from part (b)?
Now the treatment w is no longer correlated with eta, the time-varying error term.
Treatment does depend on a, but the unobserved effect a is still mean independent of
treatment, because both the treatment and control groups have the same mean value of a
(given that a is symmetrically distributed). This means that the treatment and a are
uncorrelated and you could use any of the panel methods. It would make the most sense
to use Random Effects, since it would be the most efficient.
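One way to check this reasoning on the simulated data is sketched below. It assumes a, w, t and y exist as in the exam code with the modified definition of w, and that the data are xtset. The stored estimate names fe and re are arbitrary labels.

```stata
* Mean of a by treatment status in the post-treatment periods
tabstat a if t>=3, by(w) statistics(mean sd n)

* If a is uncorrelated with w, FE and RE should give similar coefficients on w
xtreg y w, fe
estimates store fe
xtreg y w, re
estimates store re

* Hausman test comparing the consistent (FE) and efficient (RE) estimators
hausman fe re
```

Similar group means of a, together with FE and RE coefficients that are both close to 4, would be consistent with the argument for Random Effects.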