Difference in Difference Estimation in R and Stata
DID estimation uses four data points to deduce the impact of a policy change or some other
shock (a.k.a. treatment) on the treated population: the effect of the treatment on the treated. The
design assumes that the treatment group and control group have similar characteristics and are
trending in the same way over time. The counterfactual (the unobserved scenario) is therefore
that, had the treated group not received treatment, its mean would be the same distance from
the control group's mean in the second period as in the first. See the diagram below; the
four data points are the observed means of each group in each period. These are the only data
points necessary to calculate the effect of the treatment on the treated. The dotted lines
represent the trend that is not observed by the researcher. Notice that although the group means
are different, both groups have the same time trend (i.e. slope).
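The counterfactual argument can be written out in symbols. This is only a sketch in notation assumed here for illustration: $\bar{y}_{g,t}$ is the observed mean of group $g$ ($T$ for treated, $C$ for control) in period $t$ (0 before, 1 after), and $\bar{y}^{\,cf}_{T,1}$ is the unobserved counterfactual mean of the treated group in the second period.

```latex
\begin{align*}
\bar{y}^{\,cf}_{T,1} &= \bar{y}_{T,0} + (\bar{y}_{C,1} - \bar{y}_{C,0}) \\
\widehat{\text{DID}} &= \bar{y}_{T,1} - \bar{y}^{\,cf}_{T,1}
  = (\bar{y}_{T,1} - \bar{y}_{T,0}) - (\bar{y}_{C,1} - \bar{y}_{C,0}) \\
  &= (\bar{y}_{T,1} - \bar{y}_{C,1}) - (\bar{y}_{T,0} - \bar{y}_{C,0})
\end{align*}
```

The last line shows that the estimate can equally be read as the gap between the groups after treatment minus the gap before treatment.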
For a more thorough walk-through of the effect of the Earned Income Tax Credit on female
employment, see an earlier post of mine.
Calculate the D-I-D Estimate of the Treatment Effect
We will now use R and Stata to calculate the unconditional difference-in-difference estimates
of the effect of the 1993 EITC expansion on employment of single women.
R:
# Load the foreign package (provides read.dta)
library(foreign)

# First download the file eitc.dta from this link:
# https://docs.google.com/open?id=0B0iAUHM7ljQ1cUZvRWxjUmpfVXM
# Then import it from your hard drive:
eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")

# Create two additional dummy variables to indicate before/after
# and treatment/control groups.

# The EITC expansion went into effect in 1994
eitc$post93 = as.numeric(eitc$year >= 1994)

# The EITC only affects women with at least one child, so the
# treatment group will be all women with children.
eitc$anykids = as.numeric(eitc$children >= 1)

# Compute the four data points needed in the DID calculation:
a = sapply(subset(eitc, post93 == 0 & anykids == 0, select=work), mean)
b = sapply(subset(eitc, post93 == 0 & anykids == 1, select=work), mean)
c = sapply(subset(eitc, post93 == 1 & anykids == 0, select=work), mean)
d = sapply(subset(eitc, post93 == 1 & anykids == 1, select=work), mean)

# Compute the effect of the EITC on the employment of women with
# children:
(d-c)-(b-a)

The result is the width of the “shift” shown in the diagram above.
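As an alternative to four separate subset() calls, the four group means can be tabulated in one step with tapply(). The following is a minimal sketch on a small made-up data frame: toy and its eight rows are hypothetical and are not the EITC data.

```r
# Hypothetical toy data: NOT the EITC data, just eight made-up rows
toy = data.frame(
  work    = c(0, 1, 0, 0, 0, 1, 0, 1),
  post93  = c(0, 0, 0, 0, 1, 1, 1, 1),
  anykids = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# 2 x 2 table of group means: rows are post93 (0/1), columns are anykids (0/1)
m = tapply(toy$work, list(post93 = toy$post93, anykids = toy$anykids), mean)

# Difference-in-difference: the treated-minus-control gap after,
# minus the same gap before
did = (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
did
```

With the real data, replacing toy by eitc should give the same number as (d-c)-(b-a).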
STATA:
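cd "C:\DATA\Econ 562\homework"
use eitc, clear
* Dummy variables for after/before and treatment/control
gen post93 = (year >= 1994)
gen anykids = (children >= 1)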
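The same estimate can be obtained from a regression of work on the two dummy variables and
their interaction:

$$\text{work} = \beta_0 + \beta_1\,\text{post93} + \beta_2\,\text{anykids} + \delta\,(\text{post93} \times \text{anykids}) + \varepsilon$$

where $\varepsilon$ is the white noise error term and $\delta$ is the effect of the treatment on the
treated, i.e. the shift shown in the diagram. To be clear, the coefficient on the interaction
term $\text{post93} \times \text{anykids}$ is the value we are interested in (i.e., $\delta$).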
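To see why the interaction coefficient is the difference-in-difference, it helps to take the expected value of work in each of the four cells. The notation is assumed here for illustration: $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients on post93 and anykids, and $\delta$ is the coefficient on their interaction. The four lines correspond to a, b, c and d in the R code above.

```latex
\begin{align*}
E[\text{work} \mid \text{post93}=0,\ \text{anykids}=0] &= \beta_0 \\
E[\text{work} \mid \text{post93}=0,\ \text{anykids}=1] &= \beta_0 + \beta_2 \\
E[\text{work} \mid \text{post93}=1,\ \text{anykids}=0] &= \beta_0 + \beta_1 \\
E[\text{work} \mid \text{post93}=1,\ \text{anykids}=1] &= \beta_0 + \beta_1 + \beta_2 + \delta \\[4pt]
\big[(\beta_0+\beta_1+\beta_2+\delta) - (\beta_0+\beta_1)\big]
  - \big[(\beta_0+\beta_2) - \beta_0\big] &= \delta
\end{align*}
```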
R:
eitc$p93kids.interaction = eitc$post93*eitc$anykids
reg1 = lm(work ~ post93 + anykids + p93kids.interaction, data = eitc)
summary(reg1)
STATA:
gen interaction = post93*anykids
reg work post93 anykids interaction
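Because the regression contains only the two dummies and their interaction, the interaction coefficient should reproduce the difference in group means exactly. Here is a quick check in R on simulated data. The data frame sim, the sample size and the built-in 0.10 effect are all made up for illustration and say nothing about the EITC.

```r
set.seed(1)
n = 400

# Simulated design: 100 observations in each of the four cells
sim = data.frame(
  post93  = rep(c(0, 1), each = n / 2),
  anykids = rep(c(0, 1), times = n / 2)
)

# Simulated binary outcome with an assumed treatment effect of 0.10
p = 0.55 - 0.10 * sim$anykids + 0.02 * sim$post93 +
    0.10 * sim$post93 * sim$anykids
sim$work = rbinom(n, 1, p)

# DID from the four group means (rows: post93, columns: anykids)
m = tapply(sim$work, list(sim$post93, sim$anykids), mean)
did.means = (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# DID from the regression; post93 * anykids expands to both main
# effects plus the interaction term post93:anykids
fit = lm(work ~ post93 * anykids, data = sim)
did.reg = unname(coef(fit)["post93:anykids"])

# The two estimates should agree up to floating-point error
all.equal(did.means, did.reg)
```

The last line should return TRUE. The regression adds standard errors for the estimate, which the four-means calculation does not provide.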