Did Functional - Form

This document discusses the Difference-in-Differences (DiD) method for causal inference, emphasizing the importance of the parallel trends assumption and its sensitivity to functional form. It explores how parallel trends can be maintained across different transformations of outcomes, such as levels and logs, and presents a framework for testing the validity of these assumptions in empirical studies. The lecture also includes an empirical illustration using state-level minimum wage changes to demonstrate the application of these concepts.

Causal Inference using Difference-in-Differences

Lecture 4: Parallel Trends and Functional Form

Pedro H. C. Sant’Anna
Emory University

January 2025
Introduction

■ Difference-in-differences (DiD) is one of the most popular strategies for estimating
causal effects in non-experimental contexts.

■ The reliability of DiD methods depends on the parallel trends assumption.

■ Random assignment of treatment (unconfoundedness) is not necessary for parallel
trends to hold.

What does parallel trends impose if treatment is not randomly assigned?

Introduction

■ There are potentially many ways of tackling this question.

■ A natural one focuses on the extent to which the validity of DiD depends on
functional form restrictions.

■ Following Athey and Imbens (2006), we will say parallel trends is insensitive to
functional form if, whenever it holds for the potential outcomes Y(∞), it also holds
for s(Y(∞)) for any strictly monotonic transformation s.

■ Intuitively, this says that parallel trends holds regardless of the units in which one
measures the outcome.

Why study sensitivity to functional form

■ Studying sensitivity to functional form helps clarify the different ways that a
researcher can justify the validity of a DiD design:

▶ Can verify conditions that ensure PT holds for all functional forms.

▶ If sensitive to functional form, can justify the particular choice.

Why study sensitivity to functional form

■ It often may not be clear from subject-specific knowledge what is the “right”
transformation for PT to hold.

■ Example: different labor market studies have measured earnings in levels, logs, or
percentiles relative to national wage distribution.

■ The choice of transformation may be motivated by which ATT is “most relevant”, but
it is not always obvious that the policy variation will generate PT for that same
transformation.

■ Moreover, we might want to use the same policy variation to study the ATT for
multiple transformations of the same outcome.

■ We will use Meyer, Viscusi and Durbin (1995) as a running example in the next slides:
interested in studying whether changes in weekly benefit amounts affected the
duration of time out of work in Michigan and Kentucky.
Parallel Trends in levels

■ Parallel trends assumption (in levels):

$$E[Y_{i,t=2}(\infty) \mid G_i = 2] - E[Y_{i,t=1}(\infty) \mid G_i = 2] = E[Y_{i,t=2}(\infty) \mid G_i = \infty] - E[Y_{i,t=1}(\infty) \mid G_i = \infty]$$


■ Suppose Y is the duration of claims measured in weeks, and the treatment is an
increase in the benefit cap (PT in levels):
▶ PT says that the average untreated claim duration among workers affected by the
increased cap would evolve in the same way as the average untreated claim duration
among workers not affected by the change in the cap.
▶ If the average change in untreated claim duration among workers not affected by the
change in the cap is 0.05 weeks, this serves as the counterfactual change in the
average untreated claim duration among workers affected by the change in the cap.
▶ The ATT gives the average treatment effect (in weeks) among workers affected by the
change in the cap.
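
The logic of the bullets above can be sketched in a few lines of Python. The group-by-period means below are invented for illustration (only the 0.05-week change for the comparison group echoes the example); they are not taken from Meyer, Viscusi and Durbin (1995).

```python
# Hypothetical mean claim durations in weeks, by (group, period). Illustrative only.
mean_duration = {
    ("treated", 1): 6.00,
    ("treated", 2): 6.30,
    ("comparison", 1): 5.00,
    ("comparison", 2): 5.05,
}


def did_levels(means):
    """2x2 DiD in levels: change for treated minus change for comparison."""
    change_treated = means[("treated", 2)] - means[("treated", 1)]
    change_comparison = means[("comparison", 2)] - means[("comparison", 1)]
    return change_treated - change_comparison


# Under PT in levels, the treated group's untreated mean would have moved by
# the comparison group's change, so the counterfactual mean is pre + that change.
comparison_change = mean_duration[("comparison", 2)] - mean_duration[("comparison", 1)]
counterfactual_treated = mean_duration[("treated", 1)] + comparison_change
att_weeks = did_levels(mean_duration)

print(round(counterfactual_treated, 2), round(att_weeks, 2))
```

With these made-up numbers the counterfactual mean is about 6.05 weeks and the implied ATT is about 0.25 weeks.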
Parallel Trends in logs

■ Parallel trends assumption (in logs):


       
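$$E[\ln Y_{i,t=2}(\infty) \mid G_i = 2] - E[\ln Y_{i,t=1}(\infty) \mid G_i = 2] = E[\ln Y_{i,t=2}(\infty) \mid G_i = \infty] - E[\ln Y_{i,t=1}(\infty) \mid G_i = \infty]$$

$$E[\ln Y_{i,t=2}(\infty) - \ln Y_{i,t=1}(\infty) \mid G_i = 2] = E[\ln Y_{i,t=2}(\infty) - \ln Y_{i,t=1}(\infty) \mid G_i = \infty]$$

$$E\left[\ln \frac{Y_{i,t=2}(\infty)}{Y_{i,t=1}(\infty)} \,\middle|\, G_i = 2\right] = E\left[\ln \frac{Y_{i,t=2}(\infty)}{Y_{i,t=1}(\infty)} \,\middle|\, G_i = \infty\right]$$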

■ Under parallel trends (in logs), the ATT takes the form:

$$ATT = E[\ln Y_{i,t=2}(2) - \ln Y_{i,t=2}(\infty) \mid G_i = 2] = E\left[\ln \frac{Y_{i,t=2}(2)}{Y_{i,t=2}(\infty)} \,\middle|\, G_i = 2\right].$$

■ ATT is measured in relative terms when you have PT in logs.
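
A small sketch of a DiD in logs, assuming hypothetical unit-level durations observed in both periods (the numbers are invented). It also shows one way to read the result: `exp(ATT) - 1` is the effect on the geometric mean of the ratio, which is not the same as an average percentage change.

```python
import math

# Hypothetical claim durations (weeks) for the same units in both periods.
treated_pre = [4.0, 8.0, 10.0]
treated_post = [5.0, 10.0, 12.5]
comparison_pre = [2.0, 4.0, 8.0]
comparison_post = [2.0, 4.0, 8.0]


def mean_log_growth(pre, post):
    """Average of ln(post / pre) across units."""
    return sum(math.log(b / a) for a, b in zip(pre, post)) / len(pre)


# DiD in logs: average log growth of treated minus that of comparison units.
att_log = mean_log_growth(treated_pre, treated_post) - mean_log_growth(
    comparison_pre, comparison_post
)

# exp(ATT) - 1 is a geometric-mean ratio effect, not an average of ratios.
relative_effect = math.exp(att_log) - 1
print(att_log, relative_effect)
```

Here every treated unit grows by 25% and the comparison units do not change, so the log-scale estimate equals ln(1.25).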


Parallel Trends in logs

■ Parallel trends assumption (in logs):

$$E\left[\ln \frac{Y_{i,t=2}(\infty)}{Y_{i,t=1}(\infty)} \,\middle|\, G_i = 2\right] = E\left[\ln \frac{Y_{i,t=2}(\infty)}{Y_{i,t=1}(\infty)} \,\middle|\, G_i = \infty\right]$$

■ Suppose Y is the duration of claims measured in weeks, and the treatment is an
increase in the benefit cap (PT in logs):
▶ PT says that the average log relative growth of untreated claim duration among
treated workers would be the same as the average log relative growth of untreated
claim duration among untreated workers.
▶ If the average log relative growth of untreated claim duration among untreated
workers is 0.008, this serves as the counterfactual average log relative growth of
untreated claim duration for treated workers.
▶ The ATT gives the average treatment effect (in relative terms) among treated
workers.
Which PT should we pick?

What if we take other transformations?
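
To make the question concrete, here is a minimal sketch with invented untreated potential outcomes, chosen so that parallel trends holds exactly in levels. The same data are then checked under a log transformation. All values are hypothetical.

```python
import math

# Hypothetical untreated potential outcomes Y(∞): equally likely values per
# (group, period), chosen so that PT holds exactly in levels.
y_untreated = {
    ("treated", 1): [8.0, 12.0],
    ("treated", 2): [10.0, 14.0],
    ("comparison", 1): [1.0, 3.0],
    ("comparison", 2): [3.0, 5.0],
}


def trend_gap(s):
    """Treated trend minus comparison trend in E[s(Y(∞))]; zero means PT holds."""

    def mean_s(group, period):
        values = y_untreated[(group, period)]
        return sum(s(v) for v in values) / len(values)

    treated_trend = mean_s("treated", 2) - mean_s("treated", 1)
    comparison_trend = mean_s("comparison", 2) - mean_s("comparison", 1)
    return treated_trend - comparison_trend


gap_levels = trend_gap(lambda y: y)
gap_logs = trend_gap(math.log)
print(gap_levels, gap_logs)
```

Both groups gain 2 units on average, so the gap in levels is zero, but the same 2-unit gain is a much larger proportional change for the comparison group, so the gap in logs is far from zero.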

The rest of the lecture will build on Roth and Sant’Anna (2023), but with different
notation.

Setup
Model setup

■ We consider the 2x2 DiD setup:

▶ 2 time periods: t = 1 (before treatment) and t = 2 (after treatment);

▶ 2 groups: G = 2 (treated at period 2) and G = ∞ (untreated by period 2);

■ Potential outcomes: $Y_{i,t}(2)$, $Y_{i,t}(\infty)$. We observe $Y_{i,t} = 1\{G_i = 2\}\, Y_{i,t}(2) + 1\{G_i = \infty\}\, Y_{i,t}(\infty)$.

■ Let’s assume no anticipation: $Y_{i,t=1}(2) = Y_{i,t=1}(\infty)$.

■ Target parameter is the ATT in period t = 2,

$$ATT = E[Y_{i,t=2}(2) - Y_{i,t=2}(\infty) \mid G_i = 2].$$
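
For reference, a sketch of the standard 2x2 identification argument, combining the observation rule, PT in levels as stated earlier in the lecture, and no anticipation. The step labels are ours, not the slides'.

```latex
\begin{aligned}
ATT &= E[Y_{i,t=2}(2) \mid G_i = 2] - E[Y_{i,t=2}(\infty) \mid G_i = 2] \\
    &= E[Y_{i,t=2} \mid G_i = 2]
       - \Big( E[Y_{i,t=1}(\infty) \mid G_i = 2]
       + E[Y_{i,t=2}(\infty) - Y_{i,t=1}(\infty) \mid G_i = \infty] \Big)
       && \text{(PT in levels)} \\
    &= \big( E[Y_{i,t=2} \mid G_i = 2] - E[Y_{i,t=1} \mid G_i = 2] \big)
       - \big( E[Y_{i,t=2} \mid G_i = \infty] - E[Y_{i,t=1} \mid G_i = \infty] \big)
       && \text{(no anticipation)}
\end{aligned}
```

The last line contains only observed outcomes: no anticipation replaces $Y_{i,t=1}(\infty)$ with the observed $Y_{i,t=1}$ for the treated group, and the comparison group's observed outcomes equal $Y_{i,t}(\infty)$ in both periods.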

More general models

■ We consider a 2-period, 2-group model for expositional simplicity.

■ More recent papers have considered settings with multiple periods and staggered
adoption.
▶ Typically impose a version of the 2-group, 2-period parallel trends assumption for many
periods/groups
(de Chaisemartin and D’Haultfœuille, 2020; Callaway and Sant’Anna, 2021; Sun and Abraham, 2021;
Borusyak, Jaravel and Spiess, 2024; Wooldridge, 2021).
▶ Thus, 2x2 results have immediate implications for the generalized PT assumption in the
staggered case.

■ The following results remain valid if all probability statements are implicitly
conditional on X, as when one assumes conditional parallel trends
(Heckman, Ichimura and Todd, 1997; Abadie, 2005; Sant’Anna and Zhao, 2020; Callaway and Sant’Anna,
2021).
Parallel Trends for all transformations of Y(∞)
Parallel Trends and Insensitivity to Functional Form

■ Following the definition in Athey and Imbens (2006), we say parallel trends is
insensitive to functional form (a.k.a. invariant to transformations) if
   
$$E[s(Y_{i,t=2}(\infty)) \mid G_i = 2] - E[s(Y_{i,t=1}(\infty)) \mid G_i = 2] = E[s(Y_{i,t=2}(\infty)) \mid G_i = \infty] - E[s(Y_{i,t=1}(\infty)) \mid G_i = \infty]$$

for all strictly monotonic s such that the expectations exist and are finite.

▶ s could be levels, logs, percentiles of a reference distribution, etc.

Insensitivity of Parallel Trends

Roth and Sant’Anna (2023) established the following characterization relating PT and
functional form.

Proposition (PT and functional form)


Parallel trends is insensitive to functional form if and only if parallel trends of CDFs is
satisfied, i.e.

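$$\underbrace{F_{Y_{i,t=2}(\infty) \mid G_i = 2}(y) - F_{Y_{i,t=1}(\infty) \mid G_i = 2}(y)}_{\text{Change in CDF for treated group}} = \underbrace{F_{Y_{i,t=2}(\infty) \mid G_i = \infty}(y) - F_{Y_{i,t=1}(\infty) \mid G_i = \infty}(y)}_{\text{Change in CDF for comparison group}} \quad \text{for all } y \in \mathbb{R}, \tag{1}$$

where $F_{Y_{i,t}(\infty) \mid G_i = g}$ is the cumulative distribution function of $Y_{i,t}(\infty) \mid G_i = g$.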

Note that if Y(∞) is continuous (discrete), this is equivalent to parallel trends of PDFs (PMFs).
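
A minimal numerical check of the "if" direction of the proposition for a discrete outcome. The PMFs are hypothetical and constructed so that both groups experience the same change in mass at every support point. The sketch then confirms that the trend gap in E[s(Y(∞))] is zero for a few transformations s.

```python
import math
from itertools import accumulate

# Hypothetical PMFs of Y(∞) on a common support, by (group, period).
support = [1.0, 2.0, 3.0, 4.0]
pmf = {
    ("treated", 1): [0.4, 0.3, 0.2, 0.1],
    ("treated", 2): [0.3, 0.3, 0.2, 0.2],
    ("comparison", 1): [0.1, 0.2, 0.3, 0.4],
    ("comparison", 2): [0.0, 0.2, 0.3, 0.5],
}


def cdf_change(group):
    """Change in the CDF between period 1 and period 2 at each support point."""
    cdf_pre = list(accumulate(pmf[(group, 1)]))
    cdf_post = list(accumulate(pmf[(group, 2)]))
    return [post - pre for pre, post in zip(cdf_pre, cdf_post)]


def pt_of_cdfs_holds(tol=1e-12):
    """True if the CDF changes coincide across groups at every support point."""
    pairs = zip(cdf_change("treated"), cdf_change("comparison"))
    return all(abs(a - b) <= tol for a, b in pairs)


def trend_gap(s):
    """Treated trend minus comparison trend in E[s(Y(∞))]."""

    def mean_s(group, period):
        return sum(p * s(y) for p, y in zip(pmf[(group, period)], support))

    treated = mean_s("treated", 2) - mean_s("treated", 1)
    comparison = mean_s("comparison", 2) - mean_s("comparison", 1)
    return treated - comparison


transformations = {"levels": lambda y: y, "logs": math.log, "squares": lambda y: y**2}
gaps = {name: trend_gap(s) for name, s in transformations.items()}
print(pt_of_cdfs_holds(), gaps)
```

Because E[s(Y)] is linear in the PMF, equal changes in mass across groups force equal changes in E[s(Y)] for every s, which is what the zero gaps reflect.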
What Generates PT of CDFs?

■ Under minor regularity conditions, Roth and Sant’Anna (2023) show that parallel
trends of CDFs holds if and only if

$$F_{Y_{i,t}(\infty) \mid G_i = g}(y) = \theta J_t(y) + (1 - \theta) H_g(y) \quad \text{for all } y \in \mathbb{R} \text{ and } (g, t) \in \{2, \infty\} \times \{1, 2\}, \tag{2}$$

for some θ ∈ [0, 1] and CDFs Jt (y) and Hg (y) depending only on time and group,
respectively.

■ This says that the distribution of Y(∞) for group g in period t is a mixture of a
time-dependent distribution (not depending on g) and a group-dependent
distribution (not depending on t).
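
The "if" direction of this characterization can be verified in one line: under the mixture form, the group-specific component cancels when differencing over time.

```latex
\begin{aligned}
F_{Y_{i,t=2}(\infty) \mid G_i = g}(y) - F_{Y_{i,t=1}(\infty) \mid G_i = g}(y)
  &= \big[\theta J_2(y) + (1-\theta) H_g(y)\big] - \big[\theta J_1(y) + (1-\theta) H_g(y)\big] \\
  &= \theta \big( J_2(y) - J_1(y) \big).
\end{aligned}
```

The right-hand side does not depend on g, so the change in the CDF is the same for both groups, which is exactly parallel trends of CDFs.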

Cases

This implies that PT is insensitive to functional form if and only if we are in one of
the following three cases:
■ Case 1: (As-If) Randomized Treatment (θ = 1). The distribution of Yi,t (∞)|G = g is
the same for both groups (g = 2, ∞).

■ Case 2: Stationary Y(∞) (θ = 0). For each group, the distribution of Yi,t (∞)|G = g
does not depend on t.

■ Case 3: A hybrid (θ ∈ (0, 1)).
▶ A fraction θ of the population is as-if randomized between treatment and control.
▶ A fraction 1 − θ of the population is non-randomized between treatment and control
but has stationary Y(∞) distributions (conditional on group).
▶ Perhaps plausible if there is effectively an experiment among a sub-population with
time trends (e.g. younger workers), and endogenous selection into treatment among
sub-populations with stable earnings over time (e.g. older workers).
Numerical Illustration of Case 3

■ θ = 1/2 (e.g. share of younger workers)

■ Jt ∼ lognormal(1 + t, 1) (e.g. wages of younger workers in period t)

■ Hg ∼ lognormal(3 + 1{g=2} , 1) (e.g. wages of older workers in state g)

■ Yi,t (∞)|Gi = g ∼ θJt + (1 − θ )Hg
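
A sketch that checks this illustration numerically, reading θ as 1/2 and the second lognormal parameter as a log-scale standard deviation of 1 (both are our reading of the slide). It uses the closed-form lognormal moments E[Y] = exp(μ + σ²/2) and E[ln Y] = μ rather than simulation.

```python
import math
from statistics import NormalDist

THETA = 0.5  # assumed mixture weight (share of "younger workers")
SIGMA = 1.0  # assumed log-scale standard deviation


def mu_time(t):
    """Log-scale mean of J_t ~ lognormal(1 + t, 1)."""
    return 1.0 + t


def mu_group(g):
    """Log-scale mean of H_g ~ lognormal(3 + 1{g = 2}, 1)."""
    return 3.0 + (1.0 if g == 2 else 0.0)


def lognormal_cdf(y, mu):
    return NormalDist(mu, SIGMA).cdf(math.log(y))


def mixture_cdf(y, g, t):
    """CDF of Y_t(∞) | G = g under the mixture θ J_t + (1 − θ) H_g."""
    return THETA * lognormal_cdf(y, mu_time(t)) + (1 - THETA) * lognormal_cdf(y, mu_group(g))


def mixture_mean(g, t, scale):
    """E[Y] (scale='levels') or E[ln Y] (scale='logs') under the mixture."""
    if scale == "levels":
        component_mean = lambda mu: math.exp(mu + SIGMA**2 / 2)
    else:
        component_mean = lambda mu: mu
    return THETA * component_mean(mu_time(t)) + (1 - THETA) * component_mean(mu_group(g))


def trend(g, scale):
    return mixture_mean(g, 2, scale) - mixture_mean(g, 1, scale)


treated, comparison = 2, math.inf
gap_levels = trend(treated, "levels") - trend(comparison, "levels")
gap_logs = trend(treated, "logs") - trend(comparison, "logs")

# Largest difference in CDF changes across groups over a grid of y values.
grid = [0.5, 1.0, 5.0, 20.0, 50.0, 200.0]
max_cdf_gap = max(
    abs(
        (mixture_cdf(y, treated, 2) - mixture_cdf(y, treated, 1))
        - (mixture_cdf(y, comparison, 2) - mixture_cdf(y, comparison, 1))
    )
    for y in grid
)
print(gap_levels, gap_logs, max_cdf_gap)
```

The two groups have different outcome distributions in every period, yet the trend gaps in levels and in logs are both zero (up to floating-point error), as are the gaps in CDF changes.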

Can we test PT in CDFs?
Testable Implications

■ The parallel trends of CDFs condition implies that

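$$\underbrace{F_{Y_{i,t=2}(\infty) \mid G_i = 2}(y)}_{\text{Counterfactual}} = \underbrace{F_{Y_{i,t=1}(\infty) \mid G_i = 2}(y) + F_{Y_{i,t=2}(\infty) \mid G_i = \infty}(y) - F_{Y_{i,t=1}(\infty) \mid G_i = \infty}(y)}_{\text{Identified}} \quad \text{for all } y \in \mathbb{R} \tag{3}$$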

■ A (sharp) testable implication of PT of CDFs is that the RHS of (3) is monotonically
increasing in y.

■ If the RHS is non-monotonic, then there is no possible counterfactual distribution
of Yi,t=2 (∞)|Gi = 2 such that parallel trends is insensitive to functional form!

■ Roth and Sant’Anna (2023) show that we can use this to test for cases where it is
clear from the data that we need to justify the particular choice of functional form.
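
A population-level sketch of this check, using hypothetical PMFs on three support points. It builds the implied counterfactual CDF from the three identified CDFs and tests whether the result is a non-decreasing path starting from zero. Function names are ours.

```python
from itertools import accumulate


def to_cdf(pmf):
    """Cumulative sums of a PMF over an ordered support."""
    return list(accumulate(pmf))


def implied_counterfactual_cdf(cdf_treated_pre, cdf_comp_pre, cdf_comp_post):
    """Treated pre-period CDF plus the comparison group's change in CDF."""
    triples = zip(cdf_treated_pre, cdf_comp_pre, cdf_comp_post)
    return [tp + cpost - cpre for tp, cpre, cpost in triples]


def is_nondecreasing(cdf_values, tol=1e-12):
    """A valid CDF path starts at 0 and never decreases."""
    path = [0.0] + list(cdf_values)
    return all(later >= earlier - tol for earlier, later in zip(path, path[1:]))


# Hypothetical PMFs: the comparison group loses 0.2 of mass at the middle
# support point, but the treated group only has 0.1 of mass there.
pmf_treated_pre = [0.3, 0.1, 0.6]
pmf_comp_pre = [0.2, 0.4, 0.4]
pmf_comp_post = [0.2, 0.2, 0.6]

implied = implied_counterfactual_cdf(
    to_cdf(pmf_treated_pre), to_cdf(pmf_comp_pre), to_cdf(pmf_comp_post)
)
violates = not is_nondecreasing(implied)

# If the comparison distribution does not change, the implied CDF is just the
# treated pre-period CDF, which is always valid.
stable = implied_counterfactual_cdf(
    to_cdf(pmf_treated_pre), to_cdf(pmf_comp_pre), to_cdf(pmf_comp_pre)
)
print(implied, violates, is_nondecreasing(stable))
```

In the first case the implied CDF falls between the first and second support points, so no valid counterfactual distribution is compatible with PT of CDFs.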
Testing in Practice

■ Consider the case where Y(∞) has finite support.

■ Then, testing that the implied CDF is increasing is equivalent to testing that the
implied mass is non-negative at all support points, i.e.

$$f_{Y_{i,t=1} \mid G_i = 2}(y) + f_{Y_{i,t=2} \mid G_i = \infty}(y) - f_{Y_{i,t=1} \mid G_i = \infty}(y) \ge 0 \quad \text{for all } y,$$

where $f_{Y_{i,t} \mid G_i = g}(y)$ is the probability mass function of $Y_{i,t} \mid G_i = g$.

■ To test, we can replace the mass functions with sample analogs $\hat{f}$ and apply
tools from the moment inequality literature to test that

$$E\big[\hat{f}_{Y_{i,t=1} \mid G_i = 2}(y) + \hat{f}_{Y_{i,t=2} \mid G_i = \infty}(y) - \hat{f}_{Y_{i,t=1} \mid G_i = \infty}(y)\big] \ge 0 \quad \text{for all } y.$$

■ With continuous support, can likewise use methods for testing a continuum of
inequalities (e.g. Andrews and Shi (2013)).
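
A rough sketch of the finite-support case with sample analogs. It assumes three independent samples and uses a Bonferroni-adjusted normal critical value as a simple conservative stand-in. This is not the moment inequality procedure of Roth and Sant’Anna (2023) or the didFF implementation, and the data are invented.

```python
import math
from collections import Counter
from statistics import NormalDist


def pmf_hat(sample, support):
    """Sample analog of the PMF: share of observations at each support point."""
    counts = Counter(sample)
    return [counts[y] / len(sample) for y in support]


def implied_mass_t_stats(treated_pre, comp_pre, comp_post, support):
    """Estimate f_pre,treated + f_post,comp - f_pre,comp and studentize it.

    Assumes the three samples are independent (an assumption of this sketch).
    """
    samples = (treated_pre, comp_post, comp_pre)
    signs = (1.0, 1.0, -1.0)
    pmfs = [pmf_hat(s, support) for s in samples]
    rows = []
    for k, y in enumerate(support):
        estimate = sum(sign * p[k] for sign, p in zip(signs, pmfs))
        # Variance of a sum of independent sample proportions.
        variance = sum(p[k] * (1 - p[k]) / len(s) for p, s in zip(pmfs, samples))
        if variance > 0:
            t_stat = estimate / math.sqrt(variance)
        else:
            t_stat = math.copysign(math.inf, estimate)
        rows.append((y, estimate, t_stat))
    return rows


# Hypothetical samples on the support {1, 2, 3}.
support = [1, 2, 3]
treated_pre = [1] * 300 + [2] * 100 + [3] * 600
comp_pre = [1] * 200 + [2] * 400 + [3] * 400
comp_post = [1] * 200 + [2] * 200 + [3] * 600

rows = implied_mass_t_stats(treated_pre, comp_pre, comp_post, support)
min_t = min(t for _, _, t in rows)

# Stand-in decision rule: one-sided Bonferroni bound across support points.
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / len(support))
reject = min_t < -critical
print(rows, reject)
```

The implied mass at the middle support point is about -0.1, and with 1,000 observations per sample the minimum studentized moment is far enough below zero that this conservative rule rejects.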
Caveats

■ These tests may be useful for detecting when parallel trends is sensitive to
functional form.

■ But failure to reject does not mean that we don’t need to worry about functional
form!

■ PT of CDFs is falsifiable but not verifiable:


▶ Null is that there is some possible distribution for Yi,t=2 (∞)|Gi = 2 such that it holds.

■ Like tests of pre-trends, such pre-tests may be underpowered, and relying on them
can introduce distortions from pre-testing (Roth, 2022).

Empirical Illustration

■ Stylized analysis of the impact of state-level minimum wage (MW) changes on the wage
distribution.

■ Testing PT of CDFs is interesting both because it determines whether PT is sensitive
to functional form and because DiD has been used to estimate distributional
impacts in this context.

■ Set-up:
▶ The pre-period is either 2007 or 2010. The post-period is 2015.

▶ Treatment is whether the state raised the MW between the pre- and post-periods.

Empirical Illustration

■ Panel data from Cengiz, Dube, Lindner and Zipperer (2019) with state-level MW
changes and employment-to-population ratios for 25-cent wage bins (in 2016 dollars)
at the state level.

■ If Wi is person i’s wage if employed and 0 otherwise, then the employment-to-population
ratio at wage w is the density of Wi at w.

■ Estimate the counterfactual employment-to-population ratio in bin w under PT of CDFs
as

$$\hat{f}_{post,D=1}(w) = \hat{f}_{pre,D=1}(w) + \hat{f}_{post,D=0}(w) - \hat{f}_{pre,D=0}(w),$$

weighting states by population size, where D = 1 denotes states that raised the MW.

■ Conduct moment inequality tests by comparing the minimum studentized moment
to “least-favorable” critical values (assuming all moments have mean zero).
Results: Pre = 2007, Post = 2015


■ The implied density is negative for wages of roughly $5-7.

■ Intuitively, employment declines in control states are larger than initial levels in
treatment states (likely because of differential effects of the change in the federal MW).

Results: Pre = 2010, Post = 2015

R package

■ Jon Roth and I have prepared the R package didFF to help you use these tests.

■ The package covers a variety of setups:

▶ Multiple time periods;

▶ Staggered treatment adoption;

▶ PT plausible only after conditioning on covariates.

■ Please check it out at https://github.com/pedrohcgs/didFF

References
Abadie, Alberto, “Semiparametric Difference-in-Differences Estimators,” The Review of
Economic Studies, 2005, 72 (1), 1–19.
Andrews, Donald W. K. and Xiaoxia Shi, “Inference Based on Conditional Moment
Inequalities,” Econometrica, 2013, 81 (2), 609–666.
Athey, Susan and Guido Imbens, “Identification and Inference in Nonlinear
Difference-in-Differences Models,” Econometrica, 2006, 74 (2), 431–497.
Borusyak, Kirill, Xavier Jaravel, and Jann Spiess, “Revisiting Event Study Designs: Robust
and Efficient Estimation,” Review of Economic Studies, 2024, Forthcoming.
Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences with Multiple
Time Periods,” Journal of Econometrics, 2021, 225 (2), 200–230.
Cengiz, Doruk, Arindrajit Dube, Attila Lindner, and Ben Zipperer, “The Effect of Minimum
Wages on Low-Wage Jobs,” The Quarterly Journal of Economics, August 2019, 134 (3),
1405–1454.
de Chaisemartin, Clément and Xavier D’Haultfœuille, “Two-Way Fixed Effects Estimators
with Heterogeneous Treatment Effects,” American Economic Review, 2020, 110 (9),
2964–2996.
Heckman, James J., Hidehiko Ichimura, and Petra E. Todd, “Matching As An Econometric
Evaluation Estimator: Evidence from Evaluating a Job Training Programme,” The Review
of Economic Studies, October 1997, 64 (4), 605–654.
Meyer, Bruce D., W. Kip Viscusi, and David L. Durbin, “Workers’ Compensation and Injury
Duration: Evidence from a Natural Experiment,” The American Economic Review, 1995,
85 (3), 322–340.
Roth, Jonathan, “Pre-test with Caution: Event-study Estimates After Testing for Parallel
Trends,” American Economic Review: Insights, 2022, Forthcoming.
Roth, Jonathan and Pedro H. C. Sant’Anna, “When Is Parallel Trends Sensitive to Functional Form?,”
Econometrica, 2023, 91 (2), 737–747.
Sant’Anna, Pedro H. C. and Jun Zhao, “Doubly robust difference-in-differences estimators,”
Journal of Econometrics, November 2020, 219 (1), 101–122.
Sun, Liyang and Sarah Abraham, “Estimating Dynamic Treatment Effects in Event Studies
with Heterogeneous Treatment Effects,” Journal of Econometrics, 2021, 225 (2).
Wooldridge, Jeffrey M, “Two-Way Fixed Effects, the Two-Way Mundlak Regression, and
Difference-in-Differences Estimators,” Working Paper, 2021, pp. 1–89.
