Section13 PDF
∗
GSIs: Caroline, Chris, Jimmy, Kaushiki, Leah
1 Experiments
1.1 Basic model
• $Y_i = \beta_0 + \beta_1 X_i + u_i$, where:
– The OLS estimator $\hat{\beta}_1$ from the regression of $Y_i$ on $X_i$ is then called the differences estimator.
∗ $\hat{\beta}_1 = \bar{Y}^{treatment} - \bar{Y}^{control}$
– “Treatment effect” means the causal effect of a treatment on some outcome of interest in an ideal
randomized controlled experiment.
– The term “causal effect” comes from this setting.
• The OLS estimator $\hat{\beta}_1$ from the regression of $Y_i$ on $X_i, W_{1i}, \ldots, W_{ri}$ is then called the differences
estimator with additional regressors.
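As a rough check of the claim that the differences estimator equals the difference between the treatment and control sample means, the sketch below simulates a hypothetical experiment and compares the two numbers. The sample size, the coefficients, and the seed are assumptions made up for illustration; they do not come from the notes.

```python
import random
from statistics import mean

# Hypothetical simulated experiment: every number here is made up.
random.seed(0)
n = 200
x = [random.randint(0, 1) for _ in range(n)]            # X_i = 1 if treated, 0 if control
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]   # true treatment effect set to 2.0

# OLS slope from regressing Y on a constant and X: sample cov(X, Y) / sample var(X).
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))

# Difference between the treatment and control sample means.
treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
diff_in_means = mean(treated) - mean(control)

print(slope, diff_in_means)  # the two values agree up to floating-point error
```

The equality is algebraic, so it holds for any sample in which both groups are non-empty, not only for this simulated one.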
∗ Thanks to previous GSIs for sharing their notes.
1.3 Basic model with panel data
• $\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
– $\Delta Y_i = Y_i^{after} - Y_i^{before}$, which assumes that we have observations on the same subjects before
and after the treatment (panel data).
• The OLS estimator $\hat{\beta}_1$ from the regression of $\Delta Y_i$ on $X_i$ is then called the differences-in-differences
estimator, because:
• We can also estimate the treatment effect in this setting using an OLS regression on the following
model:
• $\hat{\beta}_1$ (or $\hat{\beta}_3$ for the second equation) is called the differences-in-differences estimator with additional
regressors.
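The name "differences-in-differences" can be motivated by a short derivation. The sketch below assumes that $X_i$ is a binary treatment indicator, so that the OLS slope from regressing $\Delta Y_i$ on $X_i$ is the difference in group means of $\Delta Y_i$, by the same logic as for the differences estimator. The superscript labels are notation chosen here for illustration.

```latex
\begin{align*}
\hat{\beta}_1
  &= \overline{\Delta Y}^{\,treatment} - \overline{\Delta Y}^{\,control} \\
  &= \left(\bar{Y}^{\,treatment,\,after} - \bar{Y}^{\,treatment,\,before}\right)
   - \left(\bar{Y}^{\,control,\,after} - \bar{Y}^{\,control,\,before}\right)
\end{align*}
```

The estimator is therefore a difference (across groups) of differences (over time).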
1. failure to randomize
– example: treatment assigned based on characteristics or preferences of the subject
2. failure to follow treatment protocol
– example: individuals selected to participate in a job training program (treatment) do not show
up for training sessions
3. attrition: subject drops out of the study after being randomly assigned to the treatment or control
group
– example: a treated individual moves to a place where continuing the
treatment is not possible
4. Hawthorne effect: subjects change their behavior simply because they are participating in the experiment
– example: the excitement created by or the attention resulting from being in an experiment
induces extra effort that can affect outcomes
5. small samples
– example: high cost and/or ethical concerns keep the number of subjects small
1. nonrepresentative sample
2. nonrepresentative program or policy
3. general equilibrium effects
– example: students receiving extra classes (treatment) can help other students in their
neighborhood, expanding the effect of the treatment
4. treatment vs eligibility effects
– example: people can participate in a job training program (treatment) if their income is at
most 40,000/year (eligibility)
3 Exercises
Stock and Watson, Exercise 13.3. Suppose that, in a randomized controlled experiment of the effect
of an SAT preparatory course on SAT scores, the following results are reported:
Final Exam Spring 2014, Question 3. You are hired by the Government of Ghana to study the impact
of income on the level of education. Using data on rural villages, you estimate the following population
regression using OLS:
where $Educ_i$ is average years of formal education in the village, $Income_i$ is average annual income per capita
in the village, $Pop_i$ is the number of village residents, $School_i$ is the number of schools in the village, and
$Age_i$ is the average age of the village population.
(a) (5 points) Explain what econometric problem is likely to arise that leads to biased and inconsistent
estimates as a result of including Income as a regressor in the education regression as is done above.
You learn from Ghana’s Minister of Agriculture that the country’s citizens derive the bulk of their
income from agriculture. As a result, you cleverly infer that average annual rainfall ($Rainfall$) may be
a good instrument for income.
(b) (5 points) You recall from your econometrics course that an instrument can be used in a procedure called
Two Stage Least Squares that is designed to solve this econometric problem. Describe carefully the first
of the two stages and why TSLS will generate a consistent estimate of $\beta_1$.
You want to check the Minister’s suggestion that rainfall has an impact on incomes in Ghana. You
have information on average annual incomes in 1996 and 1997 for two regions: the “coastal region,”
which had the same precipitation level in both years, and the “hill region,” which experienced a 30%
increase in rainfall. Comparing 1996 and 1997, income in the coastal region fell from 124 to 104, while
income in the hill region fell from 98 to 96. You also recall from your econometrics course that this
situation might represent a “natural” or “quasi” experiment, allowing you to estimate the “treatment
effect” of rainfall.
(c) (8 points) Perform a difference in differences analysis of the effect of rainfall on average income.
Summarize the analysis in a table.
(d) (6 points) Describe a multivariate regression that when estimated using OLS will generate exactly the
same estimate of the effect of rainfall on income as was generated by the analysis in part (c).
(e) (6 points) Describe in detail one threat to the internal validity of the OLS estimates when treating these
data as a quasi-experiment, and how it would bias the coefficient estimate.
Final Exam Spring 2011, Question 5. In 1980, due to a temporary easing of Cuban emigration rules,
there was a huge influx of Cuban immigrants into the state of Florida. As a result of this so-called “Mariel
boatlift,” the low-skilled labor force of Miami increased by 7%. David Card compared the average hourly
wages in Miami and comparison cities (Atlanta, Houston, Los Angeles, and Tampa-St. Petersburg). The
average hourly wages, expressed in logarithms, are given in the table below:
             Cities
Year    Miami    Comparison
1979    1.85     1.93
1981    1.85     1.91
(a) (10 points) Calculate the percentage change in average hourly wages in the treatment group and in the
control group, and use those changes to compute the differences-in-differences (“DiD”) estimate. Is the sign of
the DiD estimate what would be predicted by economic theory? Explain.
(b) (10 points) Give an example of a relevant variable that is omitted from the DiD estimation, and predict
the likely bias it would cause.
(c) (12 points) To accommodate other determinants of metropolitan wage rates, you suggest including a
measure of the size of the metropolitan manufacturing sector Mi since it might reflect ability to absorb
low-skilled workers. Write down a linear regression that generates a DiD estimate while incorporating
this control variable. Why would you believe that this regression approach would change your estimate
of the effect of the Mariel boatlift from (a)?
Exercise Solutions
Stock and Watson, Exercise 13.3.
(a) The estimated average treatment effect is $\bar{X}^{Treatment} - \bar{X}^{Control} = 1241 - 1201 = 40$ points.
(b) There would be nonrandom assignment if men and women had different probabilities of being assigned to
the treatment and control groups. Let $p_M$ denote the probability that a male is assigned to the treatment
group, and let $p_W$ denote the probability that a female is assigned to the treatment group. Random
assignment means $p_M = p_W$ (i.e., probability of assignment does not depend on gender). Testing this
null hypothesis results in a t-statistic of
$$t = \frac{\hat{p}_M - \hat{p}_W}{\sqrt{\frac{\hat{p}_M(1-\hat{p}_M)}{n_M} + \frac{\hat{p}_W(1-\hat{p}_W)}{n_W}}} = \frac{0.55 - 0.45}{\sqrt{\frac{0.55 \cdot 0.45}{100} + \frac{0.45 \cdot 0.55}{100}}} = 1.42,$$
so that the null of random assignment cannot be rejected at all common significance levels.
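The arithmetic behind this t-statistic can be reproduced with a few lines of code. The sketch below uses the proportions (0.55 and 0.45) and group sizes (100 each) that appear in the calculation above. The variable names are chosen here for illustration, and 1.645 is the usual two-sided 10% critical value of the standard normal distribution.

```python
import math

# Sample proportions assigned to treatment and group sizes, as used in the solution.
p_m, p_w = 0.55, 0.45
n_m, n_w = 100, 100

# Standard error of the difference between two independent sample proportions.
se = math.sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
t_stat = (p_m - p_w) / se

print(round(t_stat, 2))      # 1.42
print(abs(t_stat) < 1.645)   # True: not rejected even at the 10% level
```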
Using the OLS estimates from this regression, compute the fitted values $\widehat{Income}_i$. These
fitted values are highly correlated with $Income_i$ if the instrument is relevant, and if the instrument is exogenous
then they are uncorrelated with the population error term. More simply, the fitted values measure
that portion of the endogenous regressor which is correlated with the instrument and uncorrelated
with the error term.
(c) Formally, the D-in-D impact of a 30% increase in rainfall is β = [Income(Hill, 1997)−Income(Hill, 1996)]−
[Income(Coast, 1997) − Income(Coast, 1996)] = (96 − 98) − (104 − 124) = −2 + 20 = 18. Income in the Hill region rose
by 18 units relative to the Coast, which the analysis attributes to the 30% increase in rainfall.
(d) Let $G_i = 1$ if the village is in the Hills and $G_i = 0$ if the village is on the Coast; $D_t = 1$ if the year is 1997 and $D_t = 0$ if
the year is 1996. Consider the OLS regression $I_{it} = \beta_0 + \beta_1 G_i + \beta_2 D_t + \beta_3 (G_i \times D_t) + u_{it}$, where $I_{it}$ is the income
of village $i$ in year $t$. The differences-in-differences estimate of the rainfall effect is the OLS estimate of $\beta_3$.
(e) Failure of randomization: the villages may not have been randomly chosen in the two regions; in fact,
rainfall is likely not uniform throughout a region, so, for example, a dry part of the Hill region could be no
different from the Coast. Failure of compliance: this should not be an issue here, since rainfall cannot easily be
controlled. Attrition: movement of people, especially between the two regions, would affect the results. Hawthorne
effect: this depends on whether the villages were informed that a researcher was collecting data on income, education, and
other information.
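Parts (c) and (d) can be checked together with a small script. The sketch below builds the differences-in-differences table from the four regional averages used in the solution to part (c), then confirms that a regression on a region dummy, a year dummy, and their interaction fits those four averages exactly with the interaction coefficient equal to the DiD estimate. It works with the regional averages rather than village-level data, which is an assumption made here for illustration, and the dictionary layout and function name are hypothetical.

```python
# Average income by region and year, as used in the solution to part (c).
income = {
    ("hill", 1996): 98, ("hill", 1997): 96,
    ("coast", 1996): 124, ("coast", 1997): 104,
}

# Differences-in-differences: change within each region, then the gap between changes.
change_hill = income[("hill", 1997)] - income[("hill", 1996)]      # -2
change_coast = income[("coast", 1997)] - income[("coast", 1996)]   # -20
did = change_hill - change_coast                                   # 18

print(f"{'Region':<8}{'1996':>6}{'1997':>6}{'Change':>8}")
for region, change in (("hill", change_hill), ("coast", change_coast)):
    print(f"{region:<8}{income[(region, 1996)]:>6}{income[(region, 1997)]:>6}{change:>8}")
print(f"DiD estimate: {did}")

# Saturated regression I = b0 + b1*G + b2*D + b3*G*D, with G = 1 for hill, D = 1 for 1997.
# Four cell means and four coefficients give an exact fit, so the OLS
# coefficients can be read off the cell means directly.
b0 = income[("coast", 1996)]
b1 = income[("hill", 1996)] - b0
b2 = income[("coast", 1997)] - b0
b3 = did

def fitted(g, d):
    """Fitted value of the saturated regression for dummy values g and d."""
    return b0 + b1 * g + b2 * d + b3 * g * d

# Every fitted value reproduces its cell mean, so all residuals are zero.
for (region, year), value in income.items():
    g, d = int(region == "hill"), int(year == 1997)
    assert fitted(g, d) == value
```

Because the residuals are all zero, no other coefficients can give a smaller sum of squared residuals, so the interaction coefficient from OLS is the DiD estimate of 18.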
(b) Cuban emigrants likely come to Miami, where other Cubans who settled in earlier years now offer
them job opportunities that are not available in the other cities. This would bias the DiD estimate upward.
(c) Two specifications are possible: one in differences, $\Delta Y_i = Y_i^{after} - Y_i^{before} = \beta_0 + \beta_1 X_i + \beta_2 M_i + u_i$, and a
second in levels, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 G_i + \beta_3 D_i + \beta_4 M_i + u_i$, with the usual definitions of the dummies. The estimate can differ
from (a) because it stems from using the multiple regression model rather than the regression with a single
regressor. In that case, $\hat{\beta}_1$ is consistent (as long as we have conditional mean independence). Intuitively,
by including the additional controls, the differences estimator controls for the fact that the treatment
probability can depend on their values. The inclusion of the characteristics also allows for testing for
random receipt of treatment and random assignment using the usual F-statistic in auxiliary regressions.
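For reference, the arithmetic requested in part (a) of the Mariel boatlift question can be sketched from the table of log wages given in the question. The sketch treats a change in logs as an approximate percentage change, which is a standard approximation for small changes. The dictionary layout and variable names are chosen here for illustration.

```python
# Average log hourly wages from the table in the question.
log_wage = {
    ("miami", 1979): 1.85, ("miami", 1981): 1.85,
    ("comparison", 1979): 1.93, ("comparison", 1981): 1.91,
}

# A change in logs approximates a proportional change for small changes.
change_miami = log_wage[("miami", 1981)] - log_wage[("miami", 1979)]
change_comparison = log_wage[("comparison", 1981)] - log_wage[("comparison", 1979)]
did = change_miami - change_comparison

print(f"Miami change: {100 * change_miami:.0f}%")
print(f"Comparison change: {100 * change_comparison:.0f}%")
print(f"DiD estimate: {100 * did:.0f} percentage points")
```

A positive DiD value means that wages in Miami rose relative to the comparison cities over the period.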