Section13 PDF

This document provides an overview of experimental and quasi-experimental methods for estimating causal treatment effects in economics. It discusses the basic models for randomized experiments, including differences-in-differences estimators with and without additional control variables. It also covers threats to internal and external validity and provides an example of a quasi-experiment using a natural experiment design. The document concludes with exercises on analyzing results from a randomized controlled experiment and interpreting a differences-in-differences regression.

Econ 140 (Spring 2018) - Section 13


GSIs: Caroline, Chris, Jimmy, Kaushiki, Leah

1 Experiments
1.1 Basic model
• $Y_i = \beta_0 + \beta_1 X_i + u_i$, where:

– $Y_i$ is the dependent variable


– $X_i$ is a treatment dummy
∗ $X_i = 1$ if individual $i$ was randomly assigned to the treatment group
∗ $X_i = 0$ if individual $i$ was randomly assigned to the control group
– $u_i$ is the error term
∗ $E[u_i \mid X_i] = 0$: conditional mean-zero assumption

• Treatment effect: $E[Y_i \mid X_i = 1] - E[Y_i \mid X_i = 0] = (\beta_0 + \beta_1) - \beta_0 = \beta_1$

– The OLS estimator $\hat{\beta}_1$ from the regression of $Y_i$ on $X_i$ is then called the differences estimator.
∗ $\hat{\beta}_1 = \bar{Y}^{treatment} - \bar{Y}^{control}$
– “Treatment effect” means the causal effect of a treatment on some outcome of interest in an ideal
randomized controlled experiment.
– The term “causal effect” comes from this setting.
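A quick numerical check, as a sketch on simulated data. The data-generating process is an illustrative assumption and is not part of the notes: intercept 10, true effect 5, normal errors, 2,000 subjects, and treatment with probability 1/2. The check shows that with a binary regressor the OLS slope equals the difference in group means exactly, and that under random assignment both land near $\beta_1$.

```python
import random
import statistics

random.seed(0)
n, true_effect = 2000, 5.0  # assumed sample size and true beta1 (illustration only)

# Random assignment: each subject is treated with probability 1/2
x = [1 if random.random() < 0.5 else 0 for _ in range(n)]
# Assumed outcome model: Y_i = 10 + beta1 * X_i + u_i, with u_i ~ N(0, 4^2)
y = [10.0 + true_effect * xi + random.gauss(0, 4) for xi in x]

# Differences estimator: mean outcome of the treated minus mean outcome of the controls
mean_t = statistics.fmean(yi for yi, xi in zip(y, x) if xi == 1)
mean_c = statistics.fmean(yi for yi, xi in zip(y, x) if xi == 0)
diff_in_means = mean_t - mean_c

# OLS slope of Y on X: sample cov(X, Y) / sample var(X)
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))

print(f"difference in means: {diff_in_means:.3f}")
print(f"OLS slope:           {slope:.3f}")
```

The two printed numbers agree up to rounding error, because OLS on a single dummy regressor fits the two group means.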

1.2 General model


• What if the treatment and control groups differ in observable characteristics?
• $Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$

– The $W$'s are additional regressors representing observable characteristics of the subjects.


∗ Example: If the randomization is done separately for each level of education and for each gender, the probability of being treated can differ across these groups, so we must include education and gender in the regression to estimate $\beta_1$ consistently.
– $E[u_i \mid X_i, W_{1i}, \dots, W_{ri}] = E[u_i \mid W_{1i}, \dots, W_{ri}]$: conditional mean-independence assumption
∗ Interpretation: after conditioning on the observable characteristics, the treatment variable ($X_i$) is uncorrelated with the error term (⇒ consistency of the estimator defined below).
∗ If it holds, the coefficient on $X_i$ (the variable of interest) has a causal interpretation, but the coefficients on $W_{1i}, \dots, W_{ri}$ (the control variables) in general do not!

• The OLS estimator $\hat{\beta}_1$ from the regression of $Y_i$ on $X_i, W_{1i}, \dots, W_{ri}$ is then called the differences estimator with additional regressors.

Acknowledgment: Thanks to previous GSIs for sharing their notes.
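To illustrate why the controls matter in the education-and-gender example, here is a sketch on simulated data with assumed numbers. A binary characteristic $W_{1i}$ (think gender) shifts the outcome by 8, and randomization is done within each group with treatment probability 0.7 or 0.3. The helper `ols` is written here for illustration only: it solves the normal equations directly and is not a library function. Omitting $W_{1i}$ biases $\hat{\beta}_1$, while including it recovers the assumed effect of 5.

```python
import random


def ols(rows, y):
    """Hypothetical helper: OLS coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(1)
n = 5000
rows_short, rows_long, y = [], [], []
for _ in range(n):
    w = 1 if random.random() < 0.5 else 0   # observable characteristic (e.g. gender)
    p_treat = 0.7 if w == 1 else 0.3        # separate randomization within each group
    x = 1 if random.random() < p_treat else 0
    # Assumed DGP: beta0 = 10, beta1 = 5 (treatment), beta2 = 8 (characteristic)
    y.append(10.0 + 5.0 * x + 8.0 * w + random.gauss(0, 4))
    rows_short.append([1.0, x])
    rows_long.append([1.0, x, w])

b_short = ols(rows_short, y)   # regression of Y on X only
b_long = ols(rows_long, y)     # regression of Y on X and W
print(f"without W: beta1_hat = {b_short[1]:.2f}")
print(f"with W:    beta1_hat = {b_long[1]:.2f}")
```

With these assumed numbers the treated group has a larger share of $W_{1i} = 1$ subjects (0.7 versus 0.3). The short regression therefore attributes part of the effect of $W_{1i}$ to the treatment, with a bias of about $8 \times 0.4 = 3.2$.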

1.3 Basic model with panel data
• $\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$

– $\Delta Y_i = Y_i^{after} - Y_i^{before}$, which assumes that we observe the same subjects before and after the treatment (panel data).

• Treatment effect: $E[\Delta Y_i \mid X_i = 1] - E[\Delta Y_i \mid X_i = 0] = (\beta_0 + \beta_1) - \beta_0 = \beta_1$

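• The OLS estimator $\hat{\beta}_1$ from the regression of $\Delta Y_i$ on $X_i$ is then called the differences-in-differences estimator, because:

– $\hat{\beta}_1 = \Delta\bar{Y}^{treatment} - \Delta\bar{Y}^{control} = \left(\bar{Y}^{treatment,after} - \bar{Y}^{treatment,before}\right) - \left(\bar{Y}^{control,after} - \bar{Y}^{control,before}\right)$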
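The equivalence between the regression on $\Delta Y_i$ and the four group means can be checked numerically. This is a sketch on simulated panel data under assumed values: subject-specific levels, a common trend of 2, and a true effect of 3. None of these numbers come from the notes.

```python
import random
import statistics

random.seed(2)
n, true_effect, common_trend = 1000, 3.0, 2.0   # assumed values (illustration only)

x = [1 if random.random() < 0.5 else 0 for _ in range(n)]
level = [random.gauss(50, 10) for _ in range(n)]   # subject-specific level, removed by differencing
y_before = [a + random.gauss(0, 2) for a in level]
y_after = [a + common_trend + true_effect * xi + random.gauss(0, 2)
           for a, xi in zip(level, x)]
dy = [ya - yb for ya, yb in zip(y_after, y_before)]   # Delta Y_i = Y_i^after - Y_i^before


def group_mean(values, treated):
    """Mean of `values` over subjects with X_i == treated."""
    return statistics.fmean(v for v, xi in zip(values, x) if xi == treated)


# Differences-in-differences from the four group means
did = ((group_mean(y_after, 1) - group_mean(y_before, 1))
       - (group_mean(y_after, 0) - group_mean(y_before, 0)))

# OLS slope from regressing Delta Y_i on X_i: cov(X, dY) / var(X)
x_bar, dy_bar = statistics.fmean(x), statistics.fmean(dy)
slope = (sum((xi - x_bar) * (d - dy_bar) for xi, d in zip(x, dy))
         / sum((xi - x_bar) ** 2 for xi in x))
print(f"DiD from group means: {did:.3f}   OLS slope on Delta Y: {slope:.3f}")
```

Differencing removes the subject-specific level and the common trend, so the slope on $X_i$ picks up only the assumed treatment effect.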

• We can also estimate the treatment effect in this setting by running an OLS regression on the following model:

– $Y_{it} = \beta_0 + \beta_1 X_i + \beta_2 After_t + \beta_3 (X_i \times After_t) + u_{it}$


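∗ $X_i$ is a treatment dummy.
∗ $After_t$ is a period dummy.
∗ $\hat{\beta}_3$ is the differences-in-differences estimator here, because:

$$
\begin{array}{l|ccc}
 & \text{After} & \text{Before} & \text{Difference} \\ \hline
\text{Treatment} & \bar{Y}^{treatment,after} & \bar{Y}^{treatment,before} & \Delta\bar{Y}^{treatment} = \bar{Y}^{treatment,after} - \bar{Y}^{treatment,before} \\
\text{Control} & \bar{Y}^{control,after} & \bar{Y}^{control,before} & \Delta\bar{Y}^{control} = \bar{Y}^{control,after} - \bar{Y}^{control,before} \\ \hline
\text{Differences-in-differences estimator} & & & \Delta\bar{Y}^{treatment} - \Delta\bar{Y}^{control}
\end{array}
$$

or, in terms of the $\hat{\beta}$'s:

$$
\begin{array}{l|ccc}
 & \text{After} & \text{Before} & \text{Difference} \\ \hline
\text{Treatment} & \hat{\beta}_0 + \hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3 & \hat{\beta}_0 + \hat{\beta}_1 & \hat{\beta}_2 + \hat{\beta}_3 \\
\text{Control} & \hat{\beta}_0 + \hat{\beta}_2 & \hat{\beta}_0 & \hat{\beta}_2 \\ \hline
\text{Differences-in-differences estimator} & & & \hat{\beta}_3
\end{array}
$$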
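The second table can be checked numerically with the following sketch. It uses simulated two-period data, and the coefficients 10, 4, 2, 3 and the noise level are assumptions chosen for illustration. Regressing $Y_{it}$ on $X_i$, $After_t$ and $X_i \times After_t$ reproduces the four cell means, so $\hat{\beta}_3$ equals the difference-in-differences of those means. The `ols` helper is a hypothetical illustration that solves the normal equations; it is not a library routine.

```python
import random
import statistics


def ols(rows, y):
    """Hypothetical helper: OLS coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(3)
rows, y, cells = [], [], {}
for _ in range(1000):                          # 1,000 simulated subjects
    treated = 1 if random.random() < 0.5 else 0
    for after in (0, 1):                       # each subject observed before and after
        # Assumed DGP: beta0 = 10, beta1 = 4, beta2 = 2, beta3 = 3
        y_it = (10 + 4 * treated + 2 * after + 3 * treated * after
                + random.gauss(0, 2))
        rows.append([1.0, treated, after, treated * after])
        y.append(y_it)
        cells.setdefault((treated, after), []).append(y_it)

b0, b1, b2, b3 = ols(rows, y)
m = {cell: statistics.fmean(vals) for cell, vals in cells.items()}
did = (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])
print(f"beta3_hat = {b3:.4f}, DiD of cell means = {did:.4f}")
```

The model has one parameter per cell of the 2×2 table (it is saturated), so its fitted values are exactly the cell means. That is why $\hat{\beta}_3$ matches the difference-in-differences up to rounding error, not just approximately.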

1.4 General model with panel data


• Include the $W$'s:

– $\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$, or


– $Y_{it} = \beta_0 + \beta_1 X_i + \beta_2 After_t + \beta_3 (X_i \times After_t) + \beta_4 W_{1it} + \dots + \beta_{3+r} W_{rit} + u_{it}$

• $\hat{\beta}_1$ (or $\hat{\beta}_3$ for the second equation) is called the differences-in-differences estimator with additional regressors.

1.5 Threats to validity


• Threats to internal validity:

1. failure to randomize
– example: treatment assigned based on characteristics or preferences of the subject
2. failure to follow treatment protocol
– example: individuals selected to participate in a job training program (treatment) do not show up for the training sessions

3. attrition: subject drops out of the study after being randomly assigned to the treatment or control
group
– example: a treated individual moves to a place where it is not possible to continue the treatment
4. Hawthorne effect: subjects change their behavior simply because they are participating in the experiment
– example: the excitement created by or the attention resulting from being in an experiment
induces extra effort that can affect outcomes
5. small samples
– experiments are often small because of high cost and/or ethical concerns

• Threats to external validity:

1. nonrepresentative sample
2. nonrepresentative program or policy
3. general equilibrium effects
– example: students receiving extra classes (treatment) can help other students in their neighborhood, spreading the effect of the treatment beyond the treated group
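4. treatment vs eligibility effects
– example: people can participate in a job training program (treatment) only if their income is at most 40,000 dollars per year (eligibility); the effect of being eligible is not the same as the effect of actually participating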

• Practice problem: SW 13.5


(see the solution online at http://wps.aw.com/aw_stock_ie_3/178/45691/11696965.cw/index.html)

2 Quasi-Experiments (or “Natural Experiments”)


• In the social sciences, because of ethical concerns and/or the high cost of experiments, researchers often use natural experiments to estimate causal (“treatment”) effects. Here, randomness is introduced by variations in individual circumstances that make it appear as if the treatment were randomly assigned. These variations might arise from vagaries in legal institutions, location, the timing of policy or program implementation, natural randomness such as birth dates, rainfall, or earthquakes, or other factors that are unrelated to the causal effect under study.

– Example 1: Effect of class size on test scores


∗ Earthquake destroys schools ⇒ students go to the closest remaining schools ⇒ increase in class sizes (“treatment”)
– Example 2: Effect of immigration on native wages
∗ “Mariel boatlift” (Cuban immigration to Miami in 1980, resulting from the temporary lifting of restrictions on emigration from Cuba) ⇒ increase in the immigrant population in Miami (“treatment”) ⇒ increase in the Miami labor force

• The econometric models are the same as for experiments: differences-in-differences in the examples above.


• Key threat to internal validity: “as if” randomization may not be a good proxy for true randomization.
• Key threat to external validity: there may be something special about the setting (e.g., the Florida labor market, the time period, or the Cuban immigrants in the Mariel boatlift example) that limits the generalizability of the results.

3 Exercises
Stock and Watson, Exercise 13.3. Suppose that, in a randomized controlled experiment of the effect
of an SAT preparatory course on SAT scores, the following results are reported:

                                               Treatment Group    Control Group
Average SAT score ($\bar{X}$)                           1241             1201
Standard deviation of SAT score ($s_X$)                 93.2             97.1
Number of women                                           55               45
Number of men                                             45               55

(a) Estimate the average treatment effect on test scores.


(b) Is there evidence of non-random assignment? Explain.

Final Exam Spring 2014, Question 3. You are hired by the Government of Ghana to study the impact
of income on the level of education. Using data on rural villages, you estimate the following population
regression using OLS:

$$Educ_i = \beta_0 + \beta_1 Income_i + \beta_2 Pop_i + \beta_3 School_i + \beta_4 Age_i + u_i$$

where $Educ_i$ is the average years of formal education in the village, $Income_i$ is average annual income per capita in the village, $Pop_i$ is the number of village residents, $School_i$ is the number of schools in the village, and $Age_i$ is the average age of the village population.

(a) (5 points) Explain what econometric problem is likely to arise that leads to biased and inconsistent
estimates as a result of including Income as a regressor in the education regression as is done above.

You learn from Ghana’s Minister of Agriculture that the country’s citizens derive the bulk of their income from agriculture. As a result, you cleverly infer that average annual rainfall ($Rainfall$) may be a good instrument for income.
(b) (5 points) You recall from your econometrics course that an instrument can be used in a procedure called
Two Stage Least Squares that is designed to solve this econometric problem. Describe carefully the first
of the two stages and why TSLS will generate a consistent estimate of β1 .

You want to check the Minister’s suggestion that rainfall has an impact on incomes in Ghana. You
have information on average annual incomes in 1996 and 1997 for two regions: the “coastal region,”
which had the same precipitation level in both years, and the “hill region,” which experienced a 30%
increase in rainfall. Comparing 1996 and 1997, income in the coastal region fell from 124 to 104, while
income in the hill region fell from 98 to 96. You also recall from your econometrics course that this
situation might represent a “natural” or “quasi” experiment, allowing you to estimate the “treatment
effect” of rainfall.
(c) (8 points) Perform a differences-in-differences analysis of the effect of rainfall on average income. Summarize the analysis in a table.
(d) (6 points) Describe a multivariate regression that when estimated using OLS will generate exactly the
same estimate of the effect of rainfall on income as was generated by the analysis in part (c).
(e) (6 points) Describe in detail one threat to the internal validity of the OLS estimates when treating these
data as a quasi-experiment, and how it would bias the coefficient estimate.

Final Exam Spring 2011, Question 5. In 1980, due to a temporary easing of Cuban emigration rules,
there was a huge influx of Cuban immigrants into the state of Florida. As a result of this so-called “Mariel
boatlift,” the low-skilled labor force of Miami increased by 7%. David Card compared the average hourly
wages in Miami and comparison cities (Atlanta, Houston, Los Angeles, and Tampa-St. Petersburg). The
average hourly wages, expressed in logarithms, are given in the table below:

                     Cities
Year       Miami     Comparison
1979       1.85      1.93
1981       1.85      1.91

(a) (10 points) Calculate the percentage change in average hourly wages in the treatment group and in the control group, and use those changes to compute the differences-in-differences (“DiD”) estimate. Is the sign of the DiD estimate what economic theory would predict? Explain.

(b) (10 points) Give an example of a relevant variable that is omitted from the DiD estimation, and predict
the likely bias it would cause.
(c) (12 points) To accommodate other determinants of metropolitan wage rates, you suggest including a measure of the size of the metropolitan manufacturing sector, $M_i$, since it might reflect the ability to absorb
low-skilled workers. Write down a linear regression that generates a DiD estimate while incorporating
this control variable. Why would you believe that this regression approach would change your estimate
of the effect of the Mariel boatlift from (a)?

Exercise Solutions
Stock and Watson, Exercise 13.3.
(a) The estimated average treatment effect is $\bar{X}^{Treatment} - \bar{X}^{Control} = 1241 - 1201 = 40$ points.
(b) There would be nonrandom assignment if men and women had different probabilities of being assigned to the treatment group. Let $p_M$ denote the probability that a man is assigned to the treatment group, and let $p_W$ denote the probability that a woman is assigned to the treatment group. Random assignment means $p_M = p_W$ (i.e., the probability of assignment does not depend on gender). From the table, $\hat{p}_M = 45/100 = 0.45$ and $\hat{p}_W = 55/100 = 0.55$, with $n_M = n_W = 100$. Testing this null hypothesis gives

$$t = \frac{\hat{p}_M - \hat{p}_W}{\sqrt{\frac{\hat{p}_M(1-\hat{p}_M)}{n_M} + \frac{\hat{p}_W(1-\hat{p}_W)}{n_W}}} = \frac{0.45 - 0.55}{\sqrt{\frac{0.45 \cdot 0.55}{100} + \frac{0.55 \cdot 0.45}{100}}} \approx -1.42,$$

so $|t| \approx 1.42 < 1.645$, and the null of random assignment cannot be rejected at any of the usual significance levels (10%, 5%, 1%).
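The arithmetic in (a) and (b) can be reproduced in a few lines. This sketch only recomputes the numbers from the table, using the same unpooled standard error as in the formula above.

```python
import math

# Summary statistics from the exercise table
mean_t, mean_c = 1241, 1201
women_t, women_c = 55, 45
men_t, men_c = 45, 55

# (a) Differences estimator of the average treatment effect
ate = mean_t - mean_c

# (b) Share of each gender assigned to treatment, and the two-sample t-statistic
n_m, n_w = men_t + men_c, women_t + women_c
p_m, p_w = men_t / n_m, women_t / n_w
se = math.sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
t_stat = (p_m - p_w) / se
print(ate, round(p_m, 2), round(p_w, 2), round(t_stat, 2))
```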

Final Exam Spring 2014, Question 3.


(a) Income is likely to be endogenous: not only does village income affect education, but average education may also affect income (simultaneous causality). If we ignore this endogeneity, OLS will give a biased and inconsistent estimate of $\beta_1$. Omitted variables that are correlated with both income and education, such as the religious or sectarian composition of the population, are another source of endogeneity that biases the estimate of $\beta_1$.
(b) Using TSLS, we first run the first-stage regression of the endogenous regressor on the instrument and the controls:

$$Income_i = \pi_0 + \pi_1 Rainfall_i + \pi_2 Pop_i + \pi_3 School_i + \pi_4 Age_i + v_i.$$

Using the OLS estimates from this regression, compute the fitted values $\widehat{Income}_i$. If the instrument is relevant ($\pi_1 \neq 0$), these fitted values are correlated with $Income_i$; if the instrument is exogenous, they are uncorrelated with the population error term $u_i$. In other words, the fitted values isolate the part of the variation in the endogenous regressor that is uncorrelated with the error term. In the second stage, $Educ_i$ is regressed by OLS on $\widehat{Income}_i$ and the controls; because $\widehat{Income}_i$ is uncorrelated with $u_i$, the resulting TSLS estimator of $\beta_1$ is consistent.
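The two stages can be sketched on simulated data. Everything about the data is an assumption for illustration: one instrument, no control variables, a true $\beta_1$ of 0.5, and an unobserved factor that enters both income and education, which is what makes income endogenous. The second-stage coefficient equals the TSLS point estimate. The standard errors from a manual second-stage OLS are not the correct TSLS standard errors, so this sketch is only useful for the point estimate.

```python
import random
import statistics


def slope_intercept(x, y):
    """Simple-regression OLS of y on x: returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


random.seed(4)
n, beta1 = 5000, 0.5   # assumed sample size and true effect of income on education
rain, income, educ = [], [], []
for _ in range(n):
    z = random.gauss(0, 1)                         # simulated rainfall (the instrument)
    u = random.gauss(0, 1)                         # unobserved factor in the education equation
    inc = 2.0 * z + 1.5 * u + random.gauss(0, 1)   # income depends on u, so it is endogenous
    rain.append(z)
    income.append(inc)
    educ.append(1.0 + beta1 * inc + u)

_, b_ols = slope_intercept(income, educ)           # naive OLS, biased upward here

# First stage: regress the endogenous regressor on the instrument, keep fitted values
pi0, pi1 = slope_intercept(rain, income)
income_hat = [pi0 + pi1 * z for z in rain]
# Second stage: regress the outcome on the first-stage fitted values
_, b_tsls = slope_intercept(income_hat, educ)
print(f"OLS: {b_ols:.3f}  TSLS: {b_tsls:.3f}  (true beta1 = {beta1})")
```

With one instrument and no controls, the second-stage slope simplifies to $\widehat{\mathrm{cov}}(Rainfall, Educ) / \widehat{\mathrm{cov}}(Rainfall, Income)$, the usual IV ratio.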
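(c) Formally, the DiD estimate of the impact of a 30% increase in rainfall is $\hat{\beta} = [\text{Income}(\text{Hill}, 1997) - \text{Income}(\text{Hill}, 1996)] - [\text{Income}(\text{Coast}, 1997) - \text{Income}(\text{Coast}, 1996)] = (96 - 98) - (104 - 124) = -2 + 20 = 18$. Income rose by 18 units relative to the control region, which the DiD analysis attributes to the 30% increase in rainfall.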

Region              Rainfall     1996    1997    Change (1997 − 1996)
Coast (control)     No change     124     104     −20
Hill (treatment)    +30%           98      96      −2
Differences-in-differences: −2 − (−20) = 18

(d) Let $G_i = 1$ if the village is in the Hills and $G_i = 0$ if the village is on the Coast; $D_t = 1$ if the year is 1997 and $D_t = 0$ if the year is 1996. Consider the OLS regression $I_{it} = \beta_0 + \beta_1 G_i + \beta_2 D_t + \beta_3 (G_i \times D_t) + u_{it}$, where $I_{it}$ is the income of village $i$ in year $t$. The differences-in-differences estimate of the rainfall effect is the OLS estimate of $\beta_3$.
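Parts (c) and (d) can be checked together. This sketch encodes the four average incomes from the table, treated as one observation per region-year cell, and fits the regression in (d) with a small hypothetical `ols` helper that solves the normal equations. The helper is an illustration, not a library API. With four observations and four coefficients the fit is exact, so $\hat{\beta}_3$ reproduces the DiD of 18.

```python
def ols(rows, y):
    """Hypothetical helper: OLS coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Average income by (region, year), from the exercise
income = {("coast", 1996): 124, ("coast", 1997): 104,
          ("hill", 1996): 98, ("hill", 1997): 96}

# (c) Differences-in-differences from the table
did = ((income["hill", 1997] - income["hill", 1996])
       - (income["coast", 1997] - income["coast", 1996]))

# (d) The same estimate from OLS of I on G, D and G x D
rows, y = [], []
for (region, year), value in income.items():
    g = 1 if region == "hill" else 0    # G_i: Hill region dummy
    d = 1 if year == 1997 else 0        # D_t: 1997 dummy
    rows.append([1.0, g, d, g * d])
    y.append(value)
b0, b1, b2, b3 = ols(rows, y)
print(did, round(b3, 6))
```

In this saturated regression, $\hat{\beta}_0 = 124$ (Coast, 1996), $\hat{\beta}_1 = -26$ (Hill minus Coast in 1996), $\hat{\beta}_2 = -20$ (change on the Coast), and $\hat{\beta}_3 = 18$.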
(e) Failure to randomize: the villages may not have been randomly chosen in the two regions; in fact, rainfall is likely not uniform throughout a region, so, e.g., a dry part of the Hill region could be no different from the Coast. Failure to follow the treatment protocol should not be an issue here, since no one can easily control rainfall. Attrition: movement of people, especially between the two regions, would affect the results. Hawthorne effect: whether this matters depends on whether villagers are informed that a researcher is collecting data on income, education, and other information.

Final Exam Spring 2011, Question 5.


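(a) Change in the treatment group (Miami): 1.85 − 1.85 = 0, i.e., 0%. Change in the control group: 1.91 − 1.93 = −0.02, i.e., about −2% (keeping in mind that a 0.01 change in the log of the wage corresponds approximately to a 1% change in the wage). The DiD estimate of the effect of the increase in labor supply on average hourly wages is 0% − (−2%) = +2%. Standard economic theory predicts that an increase in labor supply lowers wages, so it suggests a negative, not a positive, change.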
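Here is a short sketch of the arithmetic in (a), using the log wages from the table. The last line is an addition for comparison: it converts the log-point difference into an exact percentage change, which shows how close the "log points ≈ percent" approximation is here.

```python
import math

# Average log hourly wages from the table
log_wage = {("miami", 1979): 1.85, ("miami", 1981): 1.85,
            ("comparison", 1979): 1.93, ("comparison", 1981): 1.91}

change_miami = log_wage["miami", 1981] - log_wage["miami", 1979]            # treatment group
change_comp = log_wage["comparison", 1981] - log_wage["comparison", 1979]   # control group
did = change_miami - change_comp   # in log points; x100 is approximately a percent

print(f"Miami: {100 * change_miami:.1f}%, comparison: {100 * change_comp:.1f}%, "
      f"DiD: {100 * did:+.1f}%")
# Exact percentage change implied by a log difference of `did`
print(f"exact percent change implied by the DiD: {100 * (math.exp(did) - 1):.2f}%")
```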

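(b) One omitted variable is the size of the established Cuban community in each city. Cuban emigrants likely came to Miami because Cubans who had settled there in earlier years could offer them job opportunities that were not available in the comparison cities. This would bias the DiD estimate upward.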
(c) There are two possible specifications. One is in differences, $\Delta Y_i = Y_i^{after} - Y_i^{before} = \beta_0 + \beta_1 X_i + \beta_2 M_i + u_i$. The other is in levels, $Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 G_i + \beta_3 D_t + \beta_4 M_i + u_{it}$, with the usual definitions of the dummies ($G_i = 1$ for Miami, $D_t = 1$ for 1981, and $X_{it} = G_i \times D_t$). The estimate changes because it now comes from a multiple regression rather than a regression with a single regressor. If $M_i$ is related both to treatment and to wage changes, omitting it biases the DiD estimate, whereas with $M_i$ included $\hat{\beta}_1$ is consistent (as long as conditional mean independence holds). Intuitively, by including the additional controls, the differences estimator accounts for the fact that the treatment probability can depend on their values. Including the characteristics also allows testing for random receipt of treatment and random assignment using the usual F-statistic in auxiliary regressions.
