Difference in Difference Estimation in R and Stata

This document discusses difference-in-differences (DID) estimation and provides code examples to calculate unconditional and conditional DID estimates using R and Stata. It uses data on the 1993 expansion of the Earned Income Tax Credit (EITC) and its impact on employment for single women. The document shows how to define treatment and control groups, calculate the four data points needed for the DID calculation, and run a simple regression with an interaction term to estimate the conditional treatment effect of the EITC on employment for women with children.

Uploaded by

renita1901

Differences-in-Differences estimation in R and Stata

{ a.k.a. Difference-in-Difference, Difference-in-Differences, DD, DID, D-I-D. }

DID estimation uses four data points to deduce the impact of a policy change or some other
shock (a.k.a. treatment) on the treated population: the effect of the treatment on the treated. The
design assumes that the treatment group and control group have similar characteristics and would
have followed the same trend over time in the absence of treatment (the parallel trends
assumption). The counterfactual (unobserved scenario) is therefore that, had the treated group not
received treatment, its mean value would be the same distance from the control group's mean in
the second period as in the first. See the diagram below; the four data points are the observed
means (averages) of each group in each period. These are the only data points necessary to
calculate the effect of the treatment on the treated. The dotted lines represent the trend that is
not observed by the researcher. Notice that although the means differ in level, both groups have
the same time trend (i.e. slope).
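
In symbols, the calculation described above can be sketched as follows. The notation is chosen here for illustration and is not from the original post: $\bar{y}_{g,t}$ is the observed mean of group $g$ ($T$ for treated, $C$ for control) in period $t$ ($0$ for before, $1$ for after).

```latex
% Counterfactual mean for the treated group in the second period,
% assuming it would have followed the control group's trend:
\bar{y}^{\,cf}_{T,1} = \bar{y}_{T,0} + \left(\bar{y}_{C,1} - \bar{y}_{C,0}\right)

% DID estimate: observed treated mean minus its counterfactual,
% which rearranges into a difference of two differences:
\hat{\delta}_{DID} = \bar{y}_{T,1} - \bar{y}^{\,cf}_{T,1}
                   = \left(\bar{y}_{T,1} - \bar{y}_{C,1}\right)
                   - \left(\bar{y}_{T,0} - \bar{y}_{C,0}\right)
```

The second line makes the role of the common-trend assumption explicit: the estimate is only as credible as the claim that the control group's change over time is a good stand-in for what the treated group would have experienced without treatment.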

For a more thorough walk-through of the effect of the Earned Income Tax Credit on female
employment, see an earlier post of mine.

Calculate the D-I-D Estimate of the Treatment Effect

We will now use R and Stata to calculate the unconditional difference-in-difference estimates
of the effect of the 1993 EITC expansion on employment of single women.

R:
# Load the foreign package (provides read.dta for Stata files)
library(foreign)

# First download the file eitc.dta from this link:
# https://docs.google.com/open?id=0B0iAUHM7ljQ1cUZvRWxjUmpfVXM
# Then import it from your hard drive:
eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")

# Create two additional dummy variables to indicate before/after
# and treatment/control groups.

# The EITC expansion went into effect in the year 1994
eitc$post93 = as.numeric(eitc$year >= 1994)

# The EITC only affects women with at least one child, so the
# treatment group will be all women with children.
eitc$anykids = as.numeric(eitc$children >= 1)

# Compute the four data points needed in the DID calculation:
a = sapply(subset(eitc, post93 == 0 & anykids == 0, select=work), mean)
b = sapply(subset(eitc, post93 == 0 & anykids == 1, select=work), mean)
c = sapply(subset(eitc, post93 == 1 & anykids == 0, select=work), mean)
d = sapply(subset(eitc, post93 == 1 & anykids == 1, select=work), mean)

# Compute the effect of the EITC on the employment of women with
# children:
(d-c)-(b-a)

The result is the width of the “shift” shown in the diagram above.
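
As an aside, the four group means can also be produced in one step with tapply. The sketch below is only an illustration: it uses a small simulated data frame called toy (a hypothetical stand-in whose variable names mirror the ones above, with made-up numbers that are not the EITC data), so it runs without downloading anything.

```r
# Simulated stand-in for the EITC data (hypothetical values, not real results)
set.seed(1)
n <- 2000
toy <- data.frame(
  post93  = rbinom(n, 1, 0.5),
  anykids = rbinom(n, 1, 0.5)
)

# Assumed employment probabilities: a baseline, a gap for mothers, a common
# time effect, and an assumed treatment effect of 0.05 for mothers after 1993
p <- 0.55 - 0.10 * toy$anykids - 0.02 * toy$post93 +
  0.05 * toy$anykids * toy$post93
toy$work <- rbinom(n, 1, p)

# 2 x 2 table of group means: rows are post93 (0/1), columns are anykids (0/1)
means <- with(toy, tapply(work, list(post93 = post93, anykids = anykids), mean))
means

# DID: change over time for the treated minus change over time for controls
did <- (means["1", "1"] - means["0", "1"]) - (means["1", "0"] - means["0", "0"])
did
```

Here the differences are taken over time within each group first, which is algebraically the same quantity as (d-c)-(b-a).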

STATA:
cd "C:\DATA\Econ 562\homework"
use eitc, clear

gen anykids = (children >= 1)
gen post93 = (year >= 1994)

mean work if post93==0 & anykids==0 /* value 1 */
mean work if post93==0 & anykids==1 /* value 2 */
mean work if post93==1 & anykids==0 /* value 3 */
mean work if post93==1 & anykids==1 /* value 4 */

Then you must do the calculation by hand (shown on the last line of the R code):
(value 4 - value 3) - (value 2 - value 1)

Run a simple D-I-D Regression

Now we will run a regression to estimate the conditional difference-in-difference estimate of
the effect of the Earned Income Tax Credit on “work”, using all women with children as the
treatment group. This is exactly the same as what we did manually above, now using ordinary
least squares. The regression equation is as follows:

$$work = \beta_0 + \delta_0 \, post93 + \beta_1 \, anykids + \delta_1 \, (anykids \times post93) + \varepsilon$$

Where $\varepsilon$ is the white noise error term, and $\delta_1$ is the effect of the treatment on the treated (the
shift shown in the diagram). To be clear, the coefficient on $(anykids \times post93)$ is the value we
are interested in (i.e., $\delta_1$).
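
To see why the interaction coefficient reproduces the manual calculation, it may help to write out the expected value of work in each of the four cells. The sketch below assumes the regression is written with intercept $\beta_0$, coefficient $\delta_0$ on post93, coefficient $\beta_1$ on anykids, and coefficient $\delta_1$ on the interaction (symbols chosen for illustration), and labels the cells a, b, c, d as in the R code above.

```latex
\begin{aligned}
a &= E[\text{work} \mid \text{post93}=0,\ \text{anykids}=0] = \beta_0 \\
b &= E[\text{work} \mid \text{post93}=0,\ \text{anykids}=1] = \beta_0 + \beta_1 \\
c &= E[\text{work} \mid \text{post93}=1,\ \text{anykids}=0] = \beta_0 + \delta_0 \\
d &= E[\text{work} \mid \text{post93}=1,\ \text{anykids}=1] = \beta_0 + \delta_0 + \beta_1 + \delta_1 \\[4pt]
(d - c) - (b - a) &= (\beta_1 + \delta_1) - \beta_1 = \delta_1
\end{aligned}
```

Because the model has four parameters for four cells, ordinary least squares fits each cell mean exactly, so the estimated interaction coefficient equals the difference of differences in the sample means.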

R:
eitc$p93kids.interaction = eitc$post93*eitc$anykids
reg1 = lm(work ~ post93 + anykids + p93kids.interaction, data = eitc)
summary(reg1)

The coefficient estimate on p93kids.interaction should match the value calculated
manually above.
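
That equivalence can be checked without the EITC file. The following is a minimal sketch on simulated data: the data frame toy, the helper grp_mean, and all the probabilities are hypothetical and chosen only for illustration. It uses R's formula shorthand post93 * anykids, which expands to both main effects plus the post93:anykids interaction, instead of building the interaction column by hand.

```r
# Simulated stand-in for the EITC data (hypothetical values, not real results)
set.seed(42)
n <- 5000
toy <- data.frame(
  post93  = rbinom(n, 1, 0.5),
  anykids = rbinom(n, 1, 0.5)
)
toy$work <- rbinom(n, 1, 0.5 - 0.1 * toy$anykids +
                     0.05 * toy$post93 * toy$anykids)

# Manual DID from the four group means
grp_mean <- function(p, k) mean(toy$work[toy$post93 == p & toy$anykids == k])
manual <- (grp_mean(1, 1) - grp_mean(1, 0)) - (grp_mean(0, 1) - grp_mean(0, 0))

# Regression DID: coefficient on the interaction term
fit <- lm(work ~ post93 * anykids, data = toy)
reg <- unname(coef(fit)["post93:anykids"])

# The two estimates should agree up to floating-point tolerance
all.equal(manual, reg)
```

The regression route has the practical advantage that summary(fit) also reports a standard error for the estimate, which the manual calculation does not.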

STATA:
gen interaction = post93*anykids
reg work post93 anykids interaction
