
Econometrics 2: 1. Repeated Cross Section: Difference in Differences

The document discusses the Difference in Differences (DiD) method in econometrics, focusing on its application to repeated cross-sectional data and policy evaluation. It outlines the motivation for measuring treatment effects over time, provides examples such as the impact of minimum wage changes, and explains the common trends condition necessary for valid results. Additionally, it explores extensions of the DiD approach, including dynamic specifications and the inclusion of control variables.

Econometrics 2

1. Repeated Cross Section: Difference in differences

Laurent Davezies & Elia Lapenta

ENSAE, 2024/2025

Outline

Introduction

The difference in differences method

Generalizations

Longitudinal data

▶ Repeated cross sections: data measuring the same variables at different periods, on different units (units = individuals, households, firms, areas...).
  ▶ Examples:
    ▶ Traditional national surveys: Consumer Expenditure Survey, Health Survey, Housing survey, Time-use survey...
    ▶ Administrative data without individual identifier.
▶ Panels (next lectures): data measuring the same variables at different periods on the same units:
  ▶ The US PSID and NLSY, administrative data including an identifier...
  ▶ Rotating panels such as the Insee Labor Force Survey.
▶ We consider here methods applying to both types of data.

Motivation

▶ Measuring the evolution over time of an outcome of interest:
  ▶ Wages, income;
  ▶ Consumption (share of certain expenditures, durable goods);
  ▶ Fertility...
▶ Separating the explained and unexplained parts of this evolution.
▶ Measuring the evolution of the effect of treatments/particular covariates:
  ▶ gender wage gap;
  ▶ returns to schooling...

Evolution of returns to schooling and gender wage gap
▶ Interact the covariate of interest with time.

▶ The example below is obtained on US data between 1978 and 1985:

Difference in Differences

▶ Evaluate treatment effects: an alternative to the instrumental variable approach.
▶ One of the main empirical methods used in policy evaluation: Difference in Differences (DiD).
▶ A large number of scientific papers and policy analyses use DiD or one of its extensions/generalizations.
▶ Basic idea: compare the evolution of a group that enters treatment with the evolution of a group that does not.
▶ Design of the data: repeated cross sections.
▶ With panel data (see next lectures): same identification assumption but more precise estimates.

Outline

Introduction

The difference in differences method

Generalizations

Motivation

▶ Two repeated cross sections, at $t = 0$ and $t = 1$.
▶ We consider a binary treatment $D_t \in \{0, 1\}$ at period $t$.
▶ $Y_t(0)$ (resp. $Y_t(1)$) is the potential outcome associated with no treatment (resp. treatment) at period $t$. We only observe $Y_t := Y_t(D_t)$.
▶ Two groups (stable over the two periods): the “control group” ($G = 0$) and the “treatment group” ($G = 1$).
▶ The control group remains untreated in both periods, whereas the treatment group receives the treatment in the second period. Then $D_t = G \times t$.
▶ We seek to identify the average treatment effect on the treated, viz.

$$\delta^T = E(Y_1(1) - Y_1(0) \mid D_1 = 1) = E(Y_1(1) - Y_1(0) \mid G = 1).$$

Basic set-up

▶ Often, the assumption of no selection is unrealistic (i.e. $\mathrm{Cov}(D_1, Y_1(0)) \neq 0$), even if we include regressors (i.e. $\mathrm{Cov}(D_1, Y_1(0) \mid X) \neq 0$).
▶ It is not always possible to find a valid instrument.
▶ Another idea: exploit spatial and temporal variations in the treatment.
▶ Examples:
  ▶ Effect of the minimum wage on employment: use variations between US states and temporal variations in the minimum wage.
  ▶ Effect of taxes on consumption.
  ▶ Effect of the presence of Seveso plants or green parks on housing prices.

Example
▶ Effect of minimum wage on employment (Card and Krueger, 1994)?
▶ In April 1992, New Jersey increases its minimum wage, from $4.25 to
$5.05. Pennsylvania keeps its minimum wage at $4.25.
▶ Card and Krueger focus on fast-food restaurants.
▶ They gather data on around 400 such restaurants in the two states, before
and after the reform.

First strategy: control-treated comparison

▶ A first idea would be to simply compare the control and treatment groups after the introduction of the treatment:

$$\beta_{CS} = E(Y_1 \mid G = 1) - E(Y_1 \mid G = 0).$$

▶ But to obtain $\beta_{CS} = \delta^T$, one would require:

$$E(Y_1(0) \mid G = 1) = E(Y_1(0) \mid G = 0).$$

▶ This condition is often unrealistic. We can check it informally by looking at the first period:

$$E(Y_0(0) \mid G = 1) = E(Y_0(0) \mid G = 0) \iff E(Y_0 \mid G = 1) = E(Y_0 \mid G = 0). \tag{1}$$

▶ In C & K: $\widehat{E}(Y_0 \mid G = 1) \simeq 20.4$, $\widehat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (1) at the 5% level.
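
To see where this requirement comes from, $\beta_{CS}$ can be decomposed using $D_t = G \times t$: at $t = 1$ we observe $Y_1(1)$ in the treatment group and $Y_1(0)$ in the control group. The step below is a sketch using only the definitions of this section; the label "selection bias" is added here for readability.

```latex
\begin{aligned}
\beta_{CS} &= E(Y_1(1) \mid G = 1) - E(Y_1(0) \mid G = 0) \\
           &= \underbrace{E(Y_1(1) - Y_1(0) \mid G = 1)}_{\delta^T}
            + \underbrace{E(Y_1(0) \mid G = 1) - E(Y_1(0) \mid G = 0)}_{\text{selection bias}}.
\end{aligned}
```

The second term is zero exactly when the two groups would have had the same average outcome at $t = 1$ without treatment.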

Second strategy: before-after comparison

▶ A second idea would be to measure the evolution of $Y$ in the treatment group:

$$\beta_{BA} = E(Y_1 \mid G = 1) - E(Y_0 \mid G = 1).$$

▶ But for $\beta_{BA} = \delta^T$ to hold, one would need

$$E(Y_1(0) \mid G = 1) = E(Y_0(0) \mid G = 1).$$

▶ This condition is often unrealistic. We can test it informally by checking whether, in the control group,

$$E(Y_1(0) \mid G = 0) = E(Y_0(0) \mid G = 0) \iff E(Y_1 \mid G = 0) = E(Y_0 \mid G = 0). \tag{2}$$

▶ In C & K: $\widehat{E}(Y_1 \mid G = 0) \simeq 21.2$ and $\widehat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (2) at the 10% level.
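
The same kind of decomposition applies to $\beta_{BA}$. In the treatment group we observe $Y_1(1)$ at $t = 1$ and $Y_0(0)$ at $t = 0$, so, as a sketch based on the definitions above (the label "time trend" is added here):

```latex
\begin{aligned}
\beta_{BA} &= E(Y_1(1) \mid G = 1) - E(Y_0(0) \mid G = 1) \\
           &= \underbrace{E(Y_1(1) - Y_1(0) \mid G = 1)}_{\delta^T}
            + \underbrace{E(Y_1(0) \mid G = 1) - E(Y_0(0) \mid G = 1)}_{\text{time trend absent treatment}}.
\end{aligned}
```

The second term is zero only if the untreated outcome of the treatment group would not have changed between the two periods.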

Third strategy: difference in differences
▶ We now combine the two previous ideas by considering the difference in differences:

$$\beta_{DID} = [E(Y_1 \mid G = 1) - E(Y_0 \mid G = 1)] - [E(Y_1 \mid G = 0) - E(Y_0 \mid G = 0)].$$

Theorem 1
Let us suppose that the following common trends condition holds:

$$E(Y_1(0) \mid G = 1) - E(Y_0(0) \mid G = 1) = E(Y_1(0) \mid G = 0) - E(Y_0(0) \mid G = 0).$$

Then $\beta_{DID} = \delta^T$.

Proof: since $D_t = G \times t$, we have

$$\begin{aligned}
\beta_{DID} &= E(Y_1(1) - Y_1(0) \mid G = 1) \\
&\quad + [E(Y_1(0) \mid G = 1) - E(Y_0(0) \mid G = 1)] \\
&\quad - [E(Y_1(0) \mid G = 0) - E(Y_0(0) \mid G = 0)] \\
&= \delta^T. \qquad \square
\end{aligned}$$
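
As a numerical illustration of Theorem 1, the sketch below builds population cell means that satisfy the common trends condition but violate the conditions required by the first two strategies, then compares the three estimands. All numbers (`y0_mean`, `att`) are hypothetical values chosen for this example; they are not Card and Krueger's figures.

```python
# Hypothetical untreated means E(Y_t(0) | G = g), keyed by (g, t).
# Both groups share the same trend (-2.0), so common trends holds by construction,
# while levels differ across groups and change over time.
y0_mean = {(0, 0): 23.0, (0, 1): 21.0, (1, 0): 20.0, (1, 1): 18.0}
att = 3.0  # assumed value of delta^T = E(Y_1(1) - Y_1(0) | G = 1)


def observed_mean(g, t):
    """E(Y_t | G = g): only the cell (g, t) = (1, 1) is treated, since D_t = G x t."""
    return y0_mean[(g, t)] + att * g * t


# First strategy: control-treated comparison after the treatment.
beta_cs = observed_mean(1, 1) - observed_mean(0, 1)
# Second strategy: before-after comparison in the treatment group.
beta_ba = observed_mean(1, 1) - observed_mean(1, 0)
# Third strategy: difference in differences.
beta_did = beta_ba - (observed_mean(0, 1) - observed_mean(0, 0))

print(beta_cs, beta_ba, beta_did)  # prints: 0.0 1.0 3.0
```

In this constructed example only the difference in differences equals the assumed effect of 3.0; the other two comparisons absorb the level gap between groups and the common time trend, respectively.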

Graphical interpretation

(taken from “Mostly Harmless Econometrics” by J. Angrist and S. Pischke)

Example: Card and Krueger (1994)

▶ C & K obtain the results below (partially reproduced from their Table 3).

⇒ Positive and significant effect! The sign is the opposite of the classical microeconomic prediction.
▶ Explanation of C & K: this can be the case if restaurants are local monopsonies on the labor market.

The common trends condition

▶ Key condition, which is not testable.
▶ However, if we observe the outcome at several periods (0, −1, −2, etc.) before the treatment, we can test closely related conditions:

$$E[Y_t(0) \mid G = 1] - E[Y_{t-1}(0) \mid G = 1] = E[Y_t(0) \mid G = 0] - E[Y_{t-1}(0) \mid G = 0], \quad t \leq 0.$$

▶ These conditions are testable because $Y_t = Y_t(0)$ when $t \leq 0$.
▶ We simply test that $Y$ follows a parallel trend in the two groups before the introduction of the policy.
▶ Further, assume that $t \mapsto E(Y_t(1) - Y_t(0) \mid G = 1)$ is constant.
▶ Then we can also “test” the common trends condition by testing that $t \mapsto E(Y_t \mid G = 1) - E(Y_t \mid G = 0)$ is constant for $t > 1$.
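
The pre-treatment check can be sketched as follows: for each period $t \leq 0$ with a predecessor, compute the change in the mean of $Y$ in each group and compare the two changes. The cell means in `mean_y` are hypothetical; with real data one would use sample means and a formal test with standard errors rather than an exact comparison.

```python
# Hypothetical sample means of Y by (group, period); all periods t <= 0 are pre-treatment,
# so Y_t = Y_t(0) in both groups.
mean_y = {
    (0, -2): 24.0, (0, -1): 23.5, (0, 0): 23.0,
    (1, -2): 21.0, (1, -1): 20.5, (1, 0): 20.0,
}
pre_periods = [-1, 0]  # periods t <= 0 for which period t - 1 is also observed


def trend(g, t):
    """Change in the mean outcome of group g between periods t - 1 and t."""
    return mean_y[(g, t)] - mean_y[(g, t - 1)]


# Difference between the treatment-group and control-group trends, period by period.
# Parallel pre-trends correspond to gaps equal (or statistically close) to zero.
pre_trend_gaps = {t: trend(1, t) - trend(0, t) for t in pre_periods}
print(pre_trend_gaps)  # prints: {-1: 0.0, 0: 0.0}
```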

Example of Card & Krueger
▶ The graph below is taken from Card and Krueger (2000), who, after their 1994 paper, obtained administrative data over a longer period.
▶ Conclusion?

Example of Pischke (2007)
▶ What is the effect of the length of a school year on students’ achievement?

▶ To answer this, Pischke uses the fact that in 1967, West Germany except
Bavaria moved the start of the school year from Spring to Fall.

⇒ the school year was shortened in 1967 and 1968, from 37 to 24 weeks.

(graph taken from “Mostly Harmless Econometrics” by J. Angrist and S. Pischke)

Outline

Introduction

The difference in differences method

Generalizations

Difference-in-differences and regressions

▶ We can compute the DID by a regression, see below.
▶ 1st benefit: we can easily compute the standard errors of $\widehat{\beta}_{DID}$.
▶ 2nd benefit of the regression view: including control variables.
▶ Pooling the two periods in a dataset: define time $T$ as a random variable on the pooled sample, with

$$D := D_T = G \times T, \qquad Y := Y_T = Y_T(D_T).$$

Difference-in-differences and regressions

▶ 1st benefit: we can easily compute the standard errors of $\widehat{\beta}_{DID}$.

Proposition 1
$\beta_{DID}$ (resp. $\widehat{\beta}_{DID}$) can be obtained as the coefficient of $D = G \times T$ in the theoretical regression (resp. regression on the data) of $Y$ on $G$, $T$ and $D$.

Proof: let $\beta$ denote the coefficient of the theoretical regression of $Y$ on $X$. Recall that:

$$\beta = \arg\min_b E\left[\left(E(Y \mid X) - X'b\right)^2\right].$$

Here $X = (1, G, T, G \times T)'$. We have 4 coefficients ($\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$) for 4 values ($E(Y \mid G = g, T = t)$ with $(g, t) \in \{0, 1\}^2$). Thus $E(Y \mid X) = X'\beta$ and:

$$\begin{aligned}
\beta_{DID} &= E(Y \mid G = 1, T = 1) - E(Y \mid G = 1, T = 0) \\
&\quad - [E(Y \mid G = 0, T = 1) - E(Y \mid G = 0, T = 0)] \\
&= (\beta_1 + \beta_2 + \beta_3 + \beta_4) - (\beta_1 + \beta_2) - [(\beta_1 + \beta_3) - \beta_1] = \beta_4.
\end{aligned}$$

We reason similarly for $\widehat{\beta}_{DID}$, just replacing $E(\cdot)$ by $\widehat{E}(\cdot)$. □
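
Proposition 1 can be checked numerically. The sketch below simulates a pooled sample, computes $\widehat{\beta}_{DID}$ from the four cell means, and compares it with the coefficient of $G \times T$ in an OLS regression of $Y$ on $(1, G, T, G \times T)$. The data-generating process (coefficients 20, 3, −2, 2.5 and unit-variance noise) is an assumption made for this example, and `ols` is a small helper written here to keep the example free of external libraries.

```python
import random


def ols(X, y):
    """OLS coefficients obtained by solving the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(0)
rows, y = [], []
for _ in range(2000):
    g, t = random.randint(0, 1), random.randint(0, 1)
    rows.append([1.0, g, t, g * t])  # X = (1, G, T, G x T)'
    # Hypothetical outcome equation; 2.5 plays the role of the treatment effect.
    y.append(20.0 + 3.0 * g - 2.0 * t + 2.5 * g * t + random.gauss(0.0, 1.0))


def cell_mean(g, t):
    """Sample mean of Y in the cell (G = g, T = t)."""
    values = [yi for row, yi in zip(rows, y) if row[1] == g and row[2] == t]
    return sum(values) / len(values)


did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
did_reg = ols(rows, y)[3]  # coefficient of D = G x T
print(did_means, did_reg)  # the two numbers agree up to floating-point error
```

Because the regression is saturated (four coefficients for four cells), the fitted values coincide with the cell means, which is why the two computations agree.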

Adding control variables
▶ 2nd benefit of the regression view: including control variables.
▶ The common trends condition corresponds to:

$$Y_T(d) = \beta_{01} + G\beta_{02} + T\beta_{03} + d\delta^T + \varepsilon, \qquad E(\varepsilon \mid G, T) = 0.$$

▶ We can extend this model by assuming:

$$Y_T(d) = \beta_{01} + G\beta_{02} + T\beta_{03} + W'\gamma + d\delta^T + \varepsilon, \qquad E(\varepsilon \mid G, T) = 0, \quad E(W\varepsilon) = 0,$$

where $W$ corresponds to a vector of control variables.
▶ Remark: $(\beta_1, \beta_2, \beta_3, \beta_4)$ in the previous slide are coefficients of a linear projection but $(\beta_{01}, \beta_{02}, \beta_{03}, \delta^T)$ are causal parameters.
▶ Then, we can identify and estimate $\delta^T$ by a regression of $Y$ on $G$, $T$, $W$ and $D$.
▶ Motivation: common trends are assumed to hold for $Y(0) - W'\gamma$ rather than $Y(0)$, which may be more plausible.
▶ Example: effect of a Seveso plant on housing prices.

▶ The different geographical areas may have different housing prices absent the plant because of, e.g., different evolutions of average income.

Multiple groups and periods, non-binary treatment

▶ We can also consider the case with several groups ($g$), multiple periods ($t$) and a non-binary treatment.
▶ Example 1: $D$ = minimum wage, $G$ = US state, $T$ = year.
▶ Example 2: $D$ = cigarette tax rate, $G$ = US state, $T$ = year.
▶ In such cases, we still assume:

$$Y(d) = \alpha + \sum_{g=1}^{\bar{g}} 1\{G = g\}\beta_g + \sum_{t=1}^{\bar{t}} 1\{T = t\}\gamma_t + d\delta^T + \varepsilon,$$

to which we can also add covariates $X$.
▶ Equivalently, for a unit $i$ in group $g$ and date $t$:

$$Y_{i,g,t} = \alpha + \beta_g + \gamma_t + D_{g,t}\delta^T + \varepsilon_{i,g,t}. \tag{3}$$
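
As a consistency check, specification (3) nests the two-group, two-period case. The sketch below assumes $g, t \in \{0, 1\}$, $D_{g,t} = g \times t$ and $E(\varepsilon_{i,g,t}) = 0$ in each cell (assumptions stated here for the illustration), and writes $m_{g,t}$ for the mean of $Y_{i,g,t}$ in cell $(g, t)$, a notation introduced only for this example:

```latex
\begin{aligned}
m_{g,t} &= \alpha + \beta_g + \gamma_t + g\,t\,\delta^T, \\
(m_{1,1} - m_{1,0}) - (m_{0,1} - m_{0,0})
  &= (\gamma_1 - \gamma_0 + \delta^T) - (\gamma_1 - \gamma_0) = \delta^T.
\end{aligned}
```

The group effects $\beta_g$ cancel within each before-after difference and the period effects $\gamma_t$ cancel across groups, which is the difference in differences of Theorem 1.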

Dynamic specifications
▶ A policy may take some time to be effective. Sometimes, it is also anticipated by agents.
▶ We can estimate such dynamic effects by specifications generalizing (3).
▶ For simplicity, assume that $D_{g,t} = 1\{t \geq F_g\}$: treatment is binary and monotonic over time. $F_g$ is the time period in which group $g$ starts to be treated, with $F_g = +\infty$ if $g$ remains untreated.
▶ Then, fix $\tau_1, \tau_2 > 0$ and define

$$\begin{aligned}
D^{-\tau_1}_{g,t} &= 1\{t \leq F_g - \tau_1\}, \\
D^{k}_{g,t} &= 1\{t = F_g + k\}, \quad \text{if } k \in \{-\tau_1 + 1, \dots, \tau_2 - 1\}, \\
D^{\tau_2}_{g,t} &= 1\{t \geq F_g + \tau_2\}.
\end{aligned}$$

▶ We consider the following specification:

$$Y_{i,g,t} = \alpha + \beta_g + \gamma_t + \sum_{k=-\tau_1}^{\tau_2} D^{k}_{g,t}\delta_k + \varepsilon_{i,g,t}.$$
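
The event-time dummies defined above can be built mechanically. The function below is a minimal sketch that follows these definitions; the name `event_dummies` and the example values ($\tau_1 = 2$, $\tau_2 = 3$, first treatment dates 3, 5 and $+\infty$) are hypothetical.

```python
import math


def event_dummies(t, F_g, tau1, tau2):
    """Return {k: D^k_{g,t}} for k = -tau1, ..., tau2, given period t and first treatment date F_g."""
    dummies = {}
    for k in range(-tau1, tau2 + 1):
        if k == -tau1:
            dummies[k] = int(t <= F_g - tau1)  # binned: tau1 or more periods before treatment
        elif k == tau2:
            dummies[k] = int(t >= F_g + tau2)  # binned: tau2 or more periods after treatment
        else:
            dummies[k] = int(t == F_g + k)     # exactly k periods relative to treatment
    return dummies


# One period before a group treated from F_g = 5 onwards: only D^{-1} equals 1.
print(event_dummies(4, 5, 2, 3))  # prints: {-2: 0, -1: 1, 0: 0, 1: 0, 2: 0, 3: 0}

# A never-treated group (F_g = +infinity) always falls in the first bin, k = -tau1.
print(event_dummies(4, math.inf, 2, 3)[-2])  # prints: 1
```

With these definitions exactly one dummy equals 1 in every cell $(g, t)$, so the full set of dummies sums to the constant; this is one reason why a restriction on the $(\delta_k)_k$ is needed for estimation.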

Dynamic specifications

▶ Since $D_{g,t} = 0$ when $t < F_g$, we can test the common trends condition by testing $\delta_k = 0$ for $k < 0$.
▶ A violation of $\delta_k = 0$ for $k < 0$ can also be due to the anticipation of the treatment/policy.
▶ If $k \mapsto \delta_k$ is monotonic on $\{0, \dots, \tau_2\}$, the treatment takes some time to reach its full (positive or negative) effect.
▶ To identify the $\delta_k$, $F_g$ should not be constant, namely, the treatment should not be introduced simultaneously in all groups.
▶ Even so, we need additional restrictions on the $(\delta_k)_k$. Often: $\delta_{-1} = 0$.

Example: unilateral divorce laws and divorce rates
▶ In the 1970s, several US states introduced unilateral divorce laws. Were these laws responsible for the rise in the divorce rate?
▶ Wolfers exploits variations in the timing of the introduction of these laws to answer this question.
▶ Conclusion from the plot of $t \mapsto \widehat{\delta}_t$ (obtained using Wolfers’ data)?

Computation of standard errors

▶ Possible aggregate shocks at the $(g, t)$ level: for instance, economic shocks specific to state $g$ and year $t$.
▶ In such a case, we must compute standard errors clustered at the $g \times t$ level.
▶ But Bertrand et al. (2004) also underline the importance of temporal dependence.
▶ Using real data, they show that ignoring such dependence leads to a severe underestimation of the true standard errors.
▶ Idea: they create fictitious laws and estimate their effects. They find that ≃ 45% of the effects are significant, instead of ≃ 5%!
▶ Possible solution: clustering at the group level. This requires “many” groups (≥ 50).

Summary

▶ Idea behind difference in differences:
  1. A treatment group becomes treated whereas a control group remains untreated.
  2. We then compare the evolutions of their average outcomes.
▶ Works under the common trends condition.
▶ “Test” of this condition using the trends prior to the policy introduction.
▶ Link with regressions.
▶ Generalizations: inclusion of control variables, multiple groups and periods of time, dynamic specifications...
▶ Computation of standard errors.
