Econometrics 2: 1. Repeated Cross Section: Difference in Differences
ENSAE, 2024/2025
Outline
Introduction
Generalizations
Longitudinal data
Motivation
Evolution of returns to schooling and gender wage gap
▶ Interact the covariate of interest with time.
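As a minimal sketch of this idea, the snippet below simulates a two-wave repeated cross section and regresses log wages on schooling, a time dummy, and their interaction. The data-generating process, the sample size, and all variable names are illustrative assumptions; the coefficient on the interaction measures how the return to schooling changed between the two waves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated repeated cross section (all numbers are illustrative assumptions).
year = rng.integers(0, 2, size=n)                    # 0 = first wave, 1 = second wave
school = rng.integers(8, 19, size=n).astype(float)   # years of schooling, 8 to 18

# Assumed data-generating process: the return to schooling rises from 0.06 to 0.09.
log_wage = (1.0 + 0.06 * school + 0.10 * year
            + 0.03 * school * year + rng.normal(scale=0.3, size=n))

# Regress log wage on schooling, the time dummy, and their interaction.
X = np.column_stack([np.ones(n), school, year, school * year])
coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)

return_first_wave = coef[1]   # return to schooling in the first wave
change_in_return = coef[3]    # coefficient on schooling x time

print(f"return in first wave: {return_first_wave:.3f}")
print(f"change in return:     {change_in_return:.3f}")
```

The same pattern applies to the gender wage gap: replace schooling with a gender dummy and read the change in the gap off the interaction coefficient.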
Difference in Differences
Motivation
Basic set-up
▶ Examples:
▶ Effect of minimum wage on employment: use variations between US
states and temporal variations in the minimum wage.
▶ Effect of taxes on consumption.
▶ Effect of the presence of Seveso plants or green parks on housing
prices.
Example
▶ Effect of minimum wage on employment (Card and Krueger, 1994)?
▶ In April 1992, New Jersey increases its minimum wage, from $4.25 to
$5.05. Pennsylvania keeps its minimum wage at $4.25.
▶ Card and Krueger focus on fast-food restaurants.
▶ They gather data on around 400 such restaurants in the two states, before
and after the reform.
First strategy: control-treated comparison
▶ A first idea would be to simply compare the control and treatment group,
after the introduction of the treatment:
▶ In C & K: $\widehat{E}(Y_0 \mid G = 1) \simeq 20.4$, $\widehat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (1) at the
5% level.
Second strategy: before-after comparison
▶ In C & K: $\widehat{E}(Y_1 \mid G = 0) \simeq 21.2$ and $\widehat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (2) at
the 10% level.
Third strategy: difference in differences
▶ We now combine the two previous ideas by considering the difference in
differences:
Theorem 1
Let us suppose that the following common trends condition holds:
Then $\beta_{DID} = \delta^T$.
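For reference, here is a sketch of the usual statement and proof in potential-outcome notation. It assumes two periods $t \in \{0, 1\}$, that only group $G = 1$ is treated and only at $t = 1$ (so $Y_t = Y_t(0)$ everywhere except $Y_1 = Y_1(1)$ when $G = 1$), and that $\delta^T$ denotes the average effect on the treated at $t = 1$. These definitions are standard but are assumptions here; the exact form used in the theorem may differ.

```latex
% Assumed definitions (standard textbook form).
\[
\beta_{DID} = E(Y_1 \mid G=1) - E(Y_0 \mid G=1)
            - \left[ E(Y_1 \mid G=0) - E(Y_0 \mid G=0) \right],
\qquad
\delta^T = E\left( Y_1(1) - Y_1(0) \mid G=1 \right).
\]

% Common trends: absent the treatment, both groups would have evolved in parallel.
\[
E\left( Y_1(0) - Y_0(0) \mid G=1 \right) = E\left( Y_1(0) - Y_0(0) \mid G=0 \right).
\]

% Proof sketch: add and subtract E(Y_1(0) | G=1), then apply common trends.
\begin{align*}
\beta_{DID}
  &= E\left( Y_1(1) - Y_0(0) \mid G=1 \right) - E\left( Y_1(0) - Y_0(0) \mid G=0 \right) \\
  &= E\left( Y_1(1) - Y_1(0) \mid G=1 \right)
     + E\left( Y_1(0) - Y_0(0) \mid G=1 \right) - E\left( Y_1(0) - Y_0(0) \mid G=0 \right) \\
  &= \delta^T .
\end{align*}
```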
Graphical interpretation
Example: Card and Krueger (1994)
▶ C & K obtain the results below (partially reproduced from their Table 3).
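The arithmetic behind the three strategies can be sketched with four group means. Three of them (20.4, 23.3, 21.2) are the rounded values quoted earlier in these slides; the New Jersey post-reform mean of about 21.0 is an assumption taken from Card and Krueger's Table 3, and the outcome is average full-time-equivalent employment per restaurant.

```python
# Rounded average employment per restaurant, by state and period.
mean_fte = {
    ("NJ", "before"): 20.4,
    ("NJ", "after"): 21.0,   # assumption: not quoted in the slides, from C & K's Table 3
    ("PA", "before"): 23.3,
    ("PA", "after"): 21.2,
}

# First strategy: treated vs control, after the reform.
control_treated = mean_fte[("NJ", "after")] - mean_fte[("PA", "after")]

# Second strategy: before vs after, in the treated state.
before_after = mean_fte[("NJ", "after")] - mean_fte[("NJ", "before")]

# Third strategy: change in NJ minus change in PA.
change_control = mean_fte[("PA", "after")] - mean_fte[("PA", "before")]
did = before_after - change_control

print(f"control-treated: {control_treated:+.1f}")
print(f"before-after:    {before_after:+.1f}")
print(f"DID:             {did:+.1f}")
```

With these rounded inputs the three strategies give different answers, and only the last one nets out both the permanent gap between the two states and the common change over time.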
The common trends condition
▶ We simply test that Y follows a parallel trend in the two groups before
the introduction of the policy.
▶ Further, assume that $t \mapsto E(Y_t(1) - Y_t(0) \mid G = 1)$ is constant.
▶ Then we can also “test” the common trends condition by testing that
$t \mapsto E(Y_t \mid G = 1) - E(Y_t \mid G = 0)$ is constant for $t > 1$.
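A minimal sketch of the pre-trend check, on simulated data: compute the treated-minus-control gap in mean outcomes period by period before the policy and see whether it stays flat. The five pre-policy periods, the data-generating process, and the helper `group_gap` are illustrative assumptions; a formal test would also need standard errors for the gaps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated pre-policy observations (illustrative assumption): five periods.
pre_periods = np.arange(-5, 0)
t = rng.choice(pre_periods, size=n)
G = rng.integers(0, 2, size=n)

# Assumed DGP with parallel trends: level gap of -3, common slope of 0.5.
Y = 20.0 - 3.0 * G + 0.5 * t + rng.normal(size=n)

def group_gap(period):
    """Treated-minus-control difference in mean outcomes in one period."""
    in_period = t == period
    return Y[in_period & (G == 1)].mean() - Y[in_period & (G == 0)].mean()

gaps = np.array([group_gap(p) for p in pre_periods])
spread = gaps.max() - gaps.min()

for p, gap in zip(pre_periods, gaps):
    print(f"t = {p}: gap = {gap:.2f}")
print(f"spread of the gaps: {spread:.2f}")
```

Under parallel pre-trends the gaps fluctuate around a common level (here the assumed -3); a gap that drifts with t would point against the common trends condition.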
Example of Card & Krueger
▶ The graph below is taken from Card and Krueger (2000), who, after their
1994 paper, obtained administrative data covering a longer period.
▶ Conclusion?
Example of Pischke (2007)
▶ What is the effect of the length of a school year on students’ achievement?
▶ To answer this, Pischke uses the fact that in 1967, West Germany except
Bavaria moved the start of the school year from Spring to Fall.
⇒ the school year was shortened in 1967 and 1968, from 37 to 24 weeks.
Difference-in-differences and regressions
Proposition 1
$\beta_{DID}$ (resp. $\widehat{\beta}_{DID}$) can be obtained as the coefficient of $D = G \times T$ in the
theoretical regression (resp. regression on the data) of $Y$ on $G$, $T$ and $D$.
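Proof: let $\beta$ denote the coefficient of the theoretical regression of $Y$ on $X$. Recall that:
\[
\beta = \arg\min_b E\left[ \left( E(Y \mid X) - X'b \right)^2 \right].
\]
Here $X = (1, G, T, D)'$ takes only four values, so the regression is saturated and
$E(Y \mid X) = X'\beta$ with $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$. Hence:
\begin{align*}
\beta_{DID} &= E(Y \mid G = 1, T = 1) - E(Y \mid G = 1, T = 0) \\
&\quad - \left[ E(Y \mid G = 0, T = 1) - E(Y \mid G = 0, T = 0) \right] \\
&= (\beta_1 + \beta_2 + \beta_3 + \beta_4) - (\beta_1 + \beta_2) - \left[ (\beta_1 + \beta_3) - \beta_1 \right] = \beta_4.
\end{align*}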
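The sample version of this identity can be checked numerically. The sketch below simulates data (the coefficients and sample size are arbitrary assumptions), runs OLS of Y on a constant, G, T and D = G × T, and compares the coefficient on D with the difference in differences of the four cell means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated group and period indicators (illustrative assumption).
G = rng.integers(0, 2, size=n)
T = rng.integers(0, 2, size=n)
D = G * T
Y = 20.0 - 3.0 * G - 2.0 * T + 2.7 * D + rng.normal(size=n)

# OLS of Y on (1, G, T, D): a saturated regression with four cells.
X = np.column_stack([np.ones(n), G, T, D])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

def cell_mean(g, t):
    """Sample mean of Y in the cell G = g, T = t."""
    return Y[(G == g) & (T == t)].mean()

did_from_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"coefficient on D: {beta[3]:.6f}")
print(f"DID of means:     {did_from_means:.6f}")
```

The two numbers agree up to floating-point error, whatever the simulated values, because the regression fits the four cell means exactly.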
Adding control variables
▶ Second benefit of the regression view: it makes it easy to include control variables.
▶ The different geographical areas may have different housing prices absent
the plant because of, e.g., different trends in average income.
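A minimal simulated sketch of why a control can matter in the housing-price example. All numbers are assumptions: average income is taken to evolve differently in the treated area for reasons unrelated to the plant, and to raise prices, so the plain DID regression mixes the plant effect with the income effect. Adding income to the regression separates the two. The helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

G = rng.integers(0, 2, size=n)   # 1 = area near the plant
T = rng.integers(0, 2, size=n)   # 1 = after the plant opens
D = G * T

# Assumption: income grows faster in the treated area, independently of the plant.
income = 1.0 + 0.5 * G + 0.3 * T + 0.8 * D + rng.normal(size=n)
# Assumption: true effect of the plant on prices is -2; income raises prices.
price = 10.0 + 1.0 * G + 1.0 * T - 2.0 * D + 1.5 * income + rng.normal(size=n)

def ols(columns, y):
    """OLS coefficients of y on a constant and the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

did_plain = ols([G, T, D], price)[3]                  # no control
did_with_control = ols([G, T, D, income], price)[3]   # controlling for income

print(f"DID without control: {did_plain:.2f}")
print(f"DID with control:    {did_with_control:.2f}")
```

In this design the uncontrolled coefficient is pulled toward -2 + 1.5 × 0.8 = -0.8, while the controlled one is close to the assumed effect of -2. This only works because income is assumed not to be affected by the plant itself.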
Multiple groups and periods, non-binary treatment
▶ We can also consider the case with several groups (g), multiple periods
(t) and a non-binary treatment.
▶ Example 1: D = minimum wage, G = US state, T = year.
\[
Y(d) = \alpha + \sum_{g=1}^{\overline{g}} 1\{G = g\}\beta_g + \sum_{t=1}^{\overline{t}} 1\{T = t\}\gamma_t + d\,\delta^T + \varepsilon,
\]
Dynamic specifications
▶ A policy may take some time to be effective. Sometimes, it is also
anticipated by agents.
▶ We can estimate such dynamic effects by specifications generalizing (3).
▶ Since $D_{g,t} = 0$ when $t < F_g$, we can test the common trends condition by
testing $\delta_k = 0$ for $k < 0$.
▶ A violation of $\delta_k = 0$ for $k < 0$ can also be due to the anticipation of the
treatment/policy.
▶ If $k \mapsto \delta_k$ is monotonic on $\{0, ..., \tau_2\}$, the treatment takes some time to
reach its full (positive or negative) effect.
▶ To identify the $\delta_k$, $F_g$ should not be constant, namely, the treatment
should not be introduced simultaneously in all groups.
▶ Even so, we need additional restrictions on the $(\delta_k)_k$. Often: $\delta_{-1} = 0$.
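The points above can be illustrated with a small event-study regression. This is a sketch under assumptions: the specification is taken to include group effects, period effects, and one dummy per relative time k = t - F_g, with the dummy for k = -1 omitted as the normalization. The adoption dates, the never-treated groups, and the effect path are hypothetical, and the data are noise-free so that the coefficients are recovered exactly.

```python
import numpy as np

n_groups, n_periods = 12, 8
# Hypothetical adoption dates F_g; np.inf marks never-treated groups.
adoption = np.array([3, 3, 3, 4, 4, 4, 5, 5, 5, np.inf, np.inf, np.inf])

# Assumed true effects delta_k by relative time k = t - F_g.
true_delta = {-5: 0.0, -4: 0.0, -3: 0.0, -2: 0.0, -1: 0.0,
              0: 1.0, 1: 1.5, 2: 2.0, 3: 2.0, 4: 2.0}
# delta_{-1} = 0 is imposed by leaving k = -1 out of the dummies.
rel_times = [k for k in sorted(true_delta) if k != -1]

rng = np.random.default_rng(0)
group_fe = rng.normal(size=n_groups)
time_fe = rng.normal(size=n_periods)

# Columns: constant, group dummies (g >= 1), period dummies (t >= 1), relative-time dummies.
n_cols = 1 + (n_groups - 1) + (n_periods - 1) + len(rel_times)
rows, outcomes = [], []
for g in range(n_groups):
    for t in range(n_periods):
        x = np.zeros(n_cols)
        x[0] = 1.0
        if g > 0:
            x[g] = 1.0
        if t > 0:
            x[n_groups - 1 + t] = 1.0
        effect = 0.0
        if np.isfinite(adoption[g]):
            k = t - int(adoption[g])
            effect = true_delta[k]
            if k != -1:
                x[n_groups + n_periods - 1 + rel_times.index(k)] = 1.0
        rows.append(x)
        outcomes.append(group_fe[g] + time_fe[t] + effect)   # noise-free outcome

X = np.array(rows)
Y = np.array(outcomes)
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
delta_hat = dict(zip(rel_times, coef[-len(rel_times):]))

for k in rel_times:
    print(f"k = {k:+d}: delta_hat = {delta_hat[k]:.3f}")
```

The estimates for k < 0 are zero here because the simulated data satisfy common trends with no anticipation. If every treated group shared the same F_g, each relative-time dummy would coincide with a period dummy interacted with treatment status, which is why variation in adoption dates matters.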
Example: unilateral divorce laws and divorce rates
▶ In the 1970s, several US states introduced unilateral divorce laws. Were
these laws responsible for the rise in the divorce rate?
▶ Wolfers exploits variations in the timing of the introduction of these laws
to answer this question.
▶ Conclusion from the plot of $t \mapsto \widehat{\delta}_t$ (obtained using Wolfers’ data)?
Computation of standard errors
Summary
▶ “Test” of the common trends condition using the trends prior to the policy introduction.