Problem Set III: Part 1a: Theoretical Exercise
Ursula Mello
Yi = β0 + β1 Di + β2 Di Xi + f(Xi) + ui    (1)
where Di is a treatment indicator that takes value 1 if i participates in the program we want to
evaluate, Xi is an observable continuous variable, f(x) is a differentiable function that does not depend
on any other variable or parameter of the model, and ui is unobservable. While ui | X follows an unknown
distribution, we know that E[ui | X] and E[ui | D, X] depend continuously on X. Furthermore,
participation in the program is determined according to the following rule:
(a) Derive the effect of the program for i, τi, as a function of the terms and parameters in (1). Using
τi, write the expression for τATT as a function of the parameters of the model.
(b) Write and explain the discontinuity identification condition on D and the assumption about local
continuity on Y that are needed for identifying a treatment effect with the Regression Discontinuity
Method. Are those conditions satisfied in the model described in this exercise? Would
they be satisfied if γ1 were equal to 0?
(c) Calculate the effect of the program identified by the RD estimator, τRD, as a function of β1, β2 and
x. Calculate it also for the case γ2 = 0. If you wish, you may use the variable Zi = 1{Xi ≥ x} to define
τRD. And if β2 = 0, what effect is identified?
(d) Explain a way to estimate τRD when the assumptions needed for RD are satisfied but
the functional forms in (1) and (2) are unknown.
Part 1b: Dell, 2010. The persistent effects of Peru’s mining mita. Econometrica.
1. Explain what the mita system was and how it creates a discontinuity in the paper’s context. Is that a
sharp or fuzzy RD design, and why? Explain what sharpness and fuzziness mean in this context.
2. Explain the role of Table 1 in the paper. For which of the RD design's crucial assumptions does it
provide evidence? Why is that condition crucial for interpreting RD estimates as causal?
3. Suppose we are working in a homogeneous-effects, sharp RD framework, so that Yi = αDi + Y0i. Use
the fact that lim_{z→z0+} E[Y0i | Zi = z] = lim_{z→z0−} E[Y0i | Zi = z] to demonstrate formally that α can be
identified as α = lim_{z→z0+} E[Yi | Zi = z] − lim_{z→z0−} E[Yi | Zi = z].
Part 2: Angrist, J.D. and Lavy, V., 1999. Using Maimonides’ rule to estimate the effect of class size on
scholastic achievement. The Quarterly Journal of Economics.
1. Explain what Maimonides’ rule is and how it creates a discontinuity in the assignment of class size to
children. Is that a sharp or fuzzy RD, and why?
2. The data file angrist1999.dta contains the data used in that paper. Use it for the following points:
(a) Reproduce Figure 2, Panel A, but using math scores instead of reading scores. Which preliminary
conclusions can you draw about the relationship between class size and school performance?
(b) Now run an OLS regression of average math score on average class size, controlling for the percentage
of disadvantaged kids and enrollment. Is the estimate for average class size causal? Is its
sign consistent with your discussion in item (a)?
(c) Implement an IV estimation of average math test scores on average class size by instrumenting
it with Maimonides’ rule. Control for the percentage of disadvantaged kids and enrollment.
Comment on the estimate for class size: is it causal, and how does it compare to what was found in
item (b)? If it is causal, which assumptions do we need to make for that to hold?
3. Let us now give a LATE-type interpretation to our problem. Disregard enrollment cohorts with
more than 50 students. Suppose we are interested in the local effect in a neighbourhood
of 5 students around the class size cutoff (i.e. z0 = 40, h = 5).
(a) What extra assumption is needed for this estimation, and how is it formally stated? How
would it be interpreted in the current context?
(b) Estimate a LATE-like αRD for math scores as the outcome variable. That is, estimate¹
α̂RD = (E[Yi | Wi = 1, Si = 1] − E[Yi | Wi = 0, Si = 1]) / (E[Di | Wi = 1, Si = 1] − E[Di | Wi = 0, Si = 1])    (3)
¹ You will be looking at the cohorts with z0 ± h pupils. Si is a dummy for belonging to that group, and Wi is a
dummy for belonging to an enrollment cohort above the class size cutoff.
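
As a rough illustration of how the ratio in (3) can be computed, the following Python sketch builds the two dummies and takes the ratio of differences in means. It is only a sketch under stated assumptions: the data are synthetic and noiseless (they are not angrist1999.dta), the function names maimonides_class_size and wald_rd are hypothetical, Di is taken to be class size, "above the cutoff" is read as enrollment strictly greater than z0, and the outcome equation with slope −0.3 is invented for the example.

```python
import numpy as np


def maimonides_class_size(enrollment, cap=40):
    """Predicted class size when a cohort is split evenly once it exceeds `cap`."""
    enrollment = np.asarray(enrollment, dtype=float)
    n_classes = np.floor((enrollment - 1) / cap) + 1
    return enrollment / n_classes


def wald_rd(y, d, z, cutoff=40, h=5):
    """Ratio of the jump in mean outcome to the jump in mean treatment, as in (3)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    s = (z >= cutoff - h) & (z <= cutoff + h)  # S_i: cohort lies within z0 +/- h
    w = z > cutoff                             # W_i: cohort above the cutoff (assumed strict)
    above, below = s & w, s & ~w
    numerator = y[above].mean() - y[below].mean()
    denominator = d[above].mean() - d[below].mean()
    return numerator / denominator


# Synthetic, noiseless cohorts of 20 to 50 pupils (NOT the angrist1999.dta data).
enrollment = np.arange(20, 51)
class_size = maimonides_class_size(enrollment)
score = 70.0 - 0.3 * class_size  # hypothetical outcome equation, slope chosen arbitrarily
estimate = wald_rd(score, class_size, enrollment)
print(estimate)
```

Because the synthetic scores are an exact linear function of class size, the ratio recovers the assumed slope of −0.3 up to floating-point error. With the real data, the same two dummies would be built from the enrollment variable, and the sample means would be noisy estimates of the conditional expectations in (3).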