Difference-in-Differences Analysis
Hsueh-Sheng Wu
CFDR Workshop Series
June 15, 2020
Outline of Presentation
• What is Difference-in-Differences (DID)
analysis
• Threats to internal and external validity
• Compare and contrast three different research
designs
• Graphic presentation of the DID analysis
• Link between regression and DID
• Stata -diff- module
• Sample Stata code
• Conclusions
What Is Difference-in-Differences Analysis
• Difference-in-Differences (DID) analysis is a statistical technique
that analyzes data from a nonequivalent control group design
and makes a causal inference about the effect of an independent
variable (e.g., an event, treatment, or policy) on an outcome
variable (a numeric example appears below)
• A nonequivalent control group design establishes the
temporal order of the independent variable and the dependent
variable, and thus which variable is the cause and which one is
the effect
• A nonequivalent control group design does not randomly
assign respondents to the treatment or control group, so the
treatment and control groups may not be equivalent in their
characteristics and reactions to the treatment
• DID is commonly used to evaluate the outcomes of policies or
natural events (such as COVID-19)
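A hypothetical numeric example (all numbers made up): suppose the treatment group's mean outcome rises from 10 to 18 between the pre-test and the post-test, while the control group's rises from 10 to 14. The control group's change (+4) estimates what would have happened to the treatment group without the treatment, so the DID estimate of the treatment effect is (18 - 10) - (14 - 10) = 4.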
Internal and External Validity
• When designing an experiment, researchers need to
consider how extraneous variables may threaten the
internal validity and external validity of an experiment
Threats to Internal Validity
• History: events that happen in respondents' lives during the
course of the experiment
• Maturation: physiological and/or psychological changes in
respondents during the course of the experiment
• Testing: respondents perform better on a similar test when they take it
the second time
• Instrumentation: different measuring procedures or measurements
are used in the pre-test and the post-test
• Regression toward the mean: extreme pre-test scores tend to move
toward the mean at the post-test (e.g., ceiling or floor effects)
• Selection: the treatment and control groups are not equivalent
groups in the first place, which contributes to differences in the
outcome variable later
• Attrition: the treatment and control groups differ in their likelihood of
dropping out, leading to differences in the outcome variable later
Threats to External Validity
• Reactive effects of experimental arrangements: unique
features of the experimental setting lead respondents to change
on the outcome variable in ways that would not occur outside
the experiment
Compare and Contrast Three Different Research Designs
Table 1. Comparisons of an Experiment, a Quasi-Experiment, and a Survey

Sample designs (Y = outcome, X = treatment; subscripts t/c = treatment/control group, 1/2 = pre-test/post-test; R = random assignment; dashed line = nonequivalent groups):

  Pretest-Posttest Control Group Design:
      R   Yt1   X   Yt2
          Yc1       Yc2

  Nonequivalent Control Group Design*:
          Yt1   X   Yt2
          --------------
          Yc1       Yc2

  Two-Wave Panel Survey** (X is measured, not manipulated):
          Xt1       Xt2
          Yt1       Yt2
          --------------
          Xc1       Xc2
          Yc1       Yc2

                                                    Pretest-Posttest   Nonequivalent   Two-Wave
                                                    Control Group      Control Group   Panel
                                                    Design             Design*         Survey**
Design Characteristics
  Randomization                                            ✓                ✗              ✗
  Manipulation of X                                        ✓                ✓              ✗
Control for Internal Validity Threats
  History                                                  ✓                ✓              ?
  Instrumentation                                          ✓                ✓              ?
  Testing                                                  ✓                ✓              ?
  Regression toward the mean                               ✓                ✓              ?
  Maturation                                               ✓                ✓              ?
  Attrition                                                ✓                ✓              ?
  Selection                                                ✓                ✗              ?
  Interactions between selection and other threats         ✓                ✗              ?
Control for External Validity Threats
  Reactive effects of experimental arrangements            ✗                ✗              ✓
  Reactive or interaction effect of testing                ✗                ✗              ✗
  Interaction effects of selection biases and the
    experimental variable                                  ?                ✗              ?
  Multiple treatment interference                          ?                ?              ✓

* Difference-in-Differences analysis usually uses data collected from this design.
** Surveys generally rely on statistical methods, rather than research design, to control for threats to internal validity.
Graphic Presentation of the DID Analysis
[Figure: mean outcomes over time, from t0 to t1, for the treatment group (Yt1 rising to Yt2) and the control group (Yc1 rising to Yc2). Yt2* marks the counterfactual outcome the treatment group would have reached at t1 had it followed the control group's trend; the DID estimate is the gap between Yt2 and Yt2*.]
Link between Regression and DID
• From the perspective of regression analysis, DID estimates the
interaction term of time and treatment:
  Y = B0 + B1*Treatment + B2*Time + B3*(Treatment x Time) + e,
  where Treatment = 1 for the treatment group and Time = 1 for the
  post-treatment period
• Because the observed Yt2 = B0 + B1 + B2 + B3 and the counterfactual
Yt2* = B0 + B1 + B2, DID estimates the difference between Yt2 and Yt2*:
(B0 + B1 + B2 + B3) - (B0 + B1 + B2) = B3 (see the Stata sketch below)
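A minimal sketch of this regression in Stata, assuming a pooled two-period data set with hypothetical variables y (outcome), treat (1 = treatment group), post (1 = post-treatment period), and id (unit identifier):

* DID as a regression: the coefficient on 1.treat#1.post is B3
regress y i.treat##i.post, vce(cluster id)

The ## operator includes the main effects of treat and post along with their interaction, so the reported interaction coefficient is the DID estimate of the treatment effect.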
Strengths and Weaknesses of DID
Strengths:
• DID is intuitive and can be easily understood within the framework of regression
• DID uses a nonequivalent control group design to establish the temporal order
of the independent variable and the outcome variable, which is crucial in
identifying the causal direction of variables
• The incorporation of a control group eliminates many threats to internal validity,
except selection bias, so researchers do not need to statistically control for every
confounding variable in the analysis
Weaknesses:
• In a natural experiment setting, it is difficult to know which characteristics
of the experiment lead to change
• It may also be unclear how closely the experiment resembles the event in real life,
which raises questions about the external validity of the findings
• The lack of equivalence between the treatment and control groups (i.e., selection
bias) prevents researchers from making valid causal inferences about the treatment
and the outcome variable. However, statistical controls (e.g., propensity score
matching) can be used along with DID to reduce this problem.
Stata -diff- Module
• Dr. Juan Villa wrote the Stata -diff- module. Users can install this
module by typing "ssc install diff" in the Stata command window
(sample commands appear after this list).
• This module allows researchers to incorporate additional covariates of the
outcome to adjust for differences in the outcome between the
treatment and control groups
• This module allows researchers to reduce the selection bias problem
by calculating kernel propensity scores and using them to match the
treatment and control groups. In addition, this module can test
whether these two groups are equivalent on the covariates after matching
is performed.
• This module can estimate quantile difference-in-differences for the outcome variable
• This module conducts triple difference-in-differences analysis
• This module has a bootstrap option to obtain a better estimate of the
variance of the parameters
• This module can be used to analyze data from a repeated cross-sectional
research design
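A minimal sketch of -diff- syntax, using the same hypothetical variables y, treat, post, and id from the previous slide plus made-up covariates age and educ (see "help diff" for the full set of options):

* simple difference-in-differences
diff y, treated(treat) period(post)
* DID with covariates, plus a balance test of the covariates
diff y, treated(treat) period(post) cov(age educ) test
* kernel propensity-score matching DID (requires the unit identifier)
diff y, treated(treat) period(post) cov(age educ) kernel id(id)
* bootstrapped standard errors
diff y, treated(treat) period(post) bs reps(500)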
Examples of DID Analysis
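An illustrative sketch: simulate a pooled two-period data set with a built-in treatment effect of 4 and recover it with both approaches (all variable names and numbers are made up):

* simulate 200 units, half of them treated, observed in two periods
clear
set seed 12345
set obs 200
gen id = _n
gen treat = (id > 100)
expand 2
bysort id: gen post = (_n == 2)
* outcome: baseline 10, group gap 1, common trend 2, treatment effect 4
gen y = 10 + 1*treat + 2*post + 4*treat*post + rnormal()
* DID via regression: the coefficient on 1.treat#1.post should be near 4
regress y i.treat##i.post, vce(cluster id)
* the same estimate via the -diff- module
diff y, treated(treat) period(post)

Both commands estimate the same DID parameter, B3 from the previous slide.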
Conclusions
• Difference-in-Differences (DID) analysis is a useful statistical technique that
analyzes data from a nonequivalent control group design and makes a causal
inference about the effect of an independent variable (e.g., an event, treatment,
or policy) on an outcome variable
• The analytic concept of DID is easy to comprehend within the framework
of regression
• Selection bias is the most common threat to DID analysis. Researchers can
reduce this problem by incorporating covariates that may contribute to the
outcome variable or by using propensity score matching to make the treatment and
control groups equivalent.
• The findings of a DID analysis may not generalize to other settings, depending
on what the experiment is, how much the experiment mimics the event in real
life, and how respondents react to the experiment.
• Sociologists are interested in some constructs that should not or cannot be
manipulated for ethical reasons (e.g., change in people's marital status, the
occurrence of a pandemic disease or a natural disaster). Thus, if data happen to
have been collected before and after such an event, researchers can use DID to
analyze the data and gain a better understanding of the relation between the
event and the outcome variable.