ECON-C4210 - Econometrics II: Capstone: Lecture 3: Difference in Difference Regression
Otto Toivanen
• Let’s assume a discrete treatment (i.e., everyone who receives the treatment receives the same treatment):
• In economics, a randomized controlled trial (RCT) proceeds as follows:
1 The researcher decides on what the experiment is.
2 The researcher decides what the population of interest is.
3 The researcher draws a random sample.
4 Individuals in the random sample are randomly allocated into control and treatment groups.
$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$
Q3: What if individuals are truly randomized and the researcher observes other characteristics besides $Y_i$ and $T_i$?
• Treatment: $E[Y \mid T = 1] = \beta_0 + \beta_1 \times 1$
• Control: $E[Y \mid T = 0] = \beta_0 + \beta_1 \times 0$
• Difference: $E[\Delta Y] = \beta_1$
• This is why a t-test on the difference in Y between the treatment and control groups is often sufficient.
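To see why a simple comparison of means suffices under randomization, here is a minimal Python sketch (standard library only) under an assumed data-generating process: normal errors, a 50/50 coin-flip assignment and illustrative values $\beta_0 = 1$, $\beta_1 = 0.5$. The helpers `simulate_rct`, `ols_slope_on_dummy` and `welch_t` are hypothetical names introduced for this example. It checks that the OLS slope on the treatment dummy equals the difference in group means and computes an unequal-variance (Welch) t-statistic for that difference.

```python
import random
import statistics


def simulate_rct(n, beta0, beta1, sd, seed):
    """Simulate Y_i = beta0 + beta1 * T_i + eps_i with T_i assigned by a fair coin flip
    (hypothetical DGP with normal errors)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0           # random allocation
        y = beta0 + beta1 * t + rng.gauss(0.0, sd)   # outcome with normal noise
        (treated if t else control).append(y)
    return treated, control


def ols_slope_on_dummy(treated, control):
    """OLS slope from regressing Y on a constant and the treatment dummy T."""
    ys = treated + control
    ts = [1.0] * len(treated) + [0.0] * len(control)
    t_bar, y_bar = statistics.fmean(ts), statistics.fmean(ys)
    cov = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    var = sum((t - t_bar) ** 2 for t in ts)
    return cov / var


def welch_t(treated, control):
    """Two-sample t-statistic for the difference in means (unequal variances)."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff / se


treated, control = simulate_rct(n=4000, beta0=1.0, beta1=0.5, sd=1.0, seed=42)
diff_means = statistics.fmean(treated) - statistics.fmean(control)
beta1_hat = ols_slope_on_dummy(treated, control)
t_stat = welch_t(treated, control)
print(f"difference in means = {diff_means:.3f}, OLS beta1 = {beta1_hat:.3f}, t = {t_stat:.1f}")
```

With a binary regressor the OLS slope is the difference in group means by construction; the Welch form is used here only because it does not assume equal variances in the two groups.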
Two possibilities:
2 Some get the treatment (treatment group), some don’t (control group) →
Difference-in-difference (DiD) setup.
• We concentrate on DiD.
Bloom, N., Liang, J., Roberts, J. & Ying, Z. J. (2015). Does working from home work?
Evidence from a Chinese experiment. The Quarterly Journal of Economics, 130(1),
165–218
2 Difference in the change in performance between workers who shift to WFH and those who stay in the office.
Gil, R. (2015). Does vertical integration decrease prices? Evidence from the Paramount antitrust case of 1948. American Economic Journal: Economic Policy, 7(2), 162–91
2 Difference in the change in ticket prices between vertically integrated (VI) and non-VI theatres due to the removal of VI.
Aghion, P., Akcigit, U., Hyytinen, A. & Toivanen, O. (2022). A year older, a year wiser
(and farther from frontier): Invention rents and human capital depreciation. Review of
Economics and Statistics, forthcoming
2 Difference in the change in wages from before to after an invention between individuals in inventing and non-inventing firms.
• Notice that with individual-level data we could replace $\alpha_0 + \beta_{group}$ with an individual-specific constant $\alpha_i$. Like $\alpha_0 + \beta_{group}$, $\alpha_i$ would vanish when differencing over time, in both the control and the treatment group.
3 DiD: $E[\Delta\Delta Y] = \beta_1$
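The differencing argument can be written out for two periods. This is a sketch under an assumed specification, not the lecture's own model: $D_i = 1$ marks treatment group units, $P_t = 1$ in the post-treatment period $t = 1$ and $0$ in $t = 0$, $\gamma$ is a period effect common to both groups, and $T_{it} = D_i P_t$ ($D_i$, $P_t$ and $\gamma$ are notation introduced here).

```latex
\begin{aligned}
Y_{it} &= \alpha_i + \gamma P_t + \beta_1 T_{it} + \varepsilon_{it}, \qquad T_{it} = D_i P_t, \quad t \in \{0, 1\} \\
\Delta Y_i &\equiv Y_{i1} - Y_{i0} = \gamma + \beta_1 D_i + \Delta\varepsilon_i \qquad (\alpha_i \text{ differences out}) \\
E[\Delta Y \mid D = 1] &= \gamma + \beta_1, \qquad E[\Delta Y \mid D = 0] = \gamma \\
E[\Delta\Delta Y] &= E[\Delta Y \mid D = 1] - E[\Delta Y \mid D = 0] = \beta_1
\end{aligned}
```

The step from the second to the third line requires $E[\Delta\varepsilon_i \mid D_i] = 0$, which is exactly the common trends assumption.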
• $\beta_1$ is the Average Treatment Effect (ATE), as it measures the average change in $Y_{it}$ due to the treatment.
• An RCT also delivers (an estimate of) ATE.
• DiD allows for individual-specific constants if you have data on the same individuals before and after the treatment.
• → DiD doesn’t necessitate randomization.
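A small simulation illustrates why DiD does not require randomization when selection runs through a time-invariant $\alpha_i$. In this hypothetical two-period DGP (standard-library Python, illustrative parameters), units with $\alpha_i > 0$ select into treatment. The post-period comparison of treated and control means is then badly biased, while the difference in mean changes removes $\alpha_i$.

```python
import random
import statistics


def simulate_panel(n, beta1, gamma, seed):
    """Two-period panel (hypothetical DGP) with individual effects alpha_i.
    Units with alpha_i > 0 select into treatment, so assignment is NOT random."""
    rng = random.Random(seed)
    rows = []  # (D_i, Y_i0, Y_i1)
    for _ in range(n):
        alpha_i = rng.gauss(0.0, 2.0)
        d_i = 1 if alpha_i > 0 else 0                              # selection on a time-invariant trait
        y0 = alpha_i + rng.gauss(0.0, 1.0)                         # before treatment
        y1 = alpha_i + gamma + beta1 * d_i + rng.gauss(0.0, 1.0)   # after treatment
        rows.append((d_i, y0, y1))
    return rows


def group_mean(rows, d, col):
    """Mean of column col (1 = before, 2 = after) among units with D_i == d."""
    return statistics.fmean([r[col] for r in rows if r[0] == d])


rows = simulate_panel(n=10_000, beta1=0.5, gamma=1.0, seed=7)
# Post-period comparison only: contaminated by E[alpha | D=1] - E[alpha | D=0]
naive = group_mean(rows, 1, 2) - group_mean(rows, 0, 2)
# DiD: difference in mean changes; alpha_i cancels within each unit
did = ((group_mean(rows, 1, 2) - group_mean(rows, 1, 1))
       - (group_mean(rows, 0, 2) - group_mean(rows, 0, 1)))
print(f"post-period difference = {naive:.2f}, DiD = {did:.2f}")
```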
1 Identifying assumption #1: common trends: The outcome variable would have
developed similarly in the treatment group as it did in the control group, had the
treatment group not received the treatment.
Common trends.
• Substantively, what could make common trends fail?
• Example #1: Bloom et al., 2015
• Workers who know that their productivity is (permanently) declining choose to work from home (or in the office).
• Example #2: Gil, 2015
• Think of the effect of hiring a new CEO on firm performance. A firm observes that its performance is (permanently) declining compared to its peers and therefore hires a new CEO.
• Example #3: Aghion et al., 2018
• The treatment firms are in growing markets where within-firm human capital is important. Their trend growth of wages is therefore different from that of control group firms.
→ Selection into treatment can depend on individual-specific “things”, as long as they are constant over the periods.
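The role of common trends can be seen with four noise-free group means (hypothetical numbers). In this sketch, `group_means` is an illustrative helper: the two groups may start at different levels, and the DiD recovers $\beta_1$ only when their untreated trends coincide. Any gap in trends is absorbed one-for-one into the estimate.

```python
def did(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """2x2 difference-in-differences from four group means."""
    return (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)


def group_means(level, trend, beta1, treated):
    """Noise-free group means under a hypothetical DGP:
    pre = level, post = level + trend + beta1 * treated."""
    return level, level + trend + beta1 * treated


beta1 = 0.5
c_pre, c_post = group_means(level=1.0, trend=1.0, beta1=beta1, treated=0)

# Common trends hold: the treated group starts higher but shares the trend of 1.0
t_pre, t_post = group_means(level=3.0, trend=1.0, beta1=beta1, treated=1)
did_common = did(t_pre, t_post, c_pre, c_post)

# Common trends fail: absent treatment, the treated group would have grown by 1.3
t_pre, t_post = group_means(level=3.0, trend=1.3, beta1=beta1, treated=1)
did_diverging = did(t_pre, t_post, c_pre, c_post)

print(round(did_common, 6), round(did_diverging, 6))
```

The level difference (3.0 vs 1.0) does no harm, in line with selection on time-invariant characteristics; it is the extra 0.3 of trend growth, as in Example #3, that biases the estimate.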
• Even a two-period DiD allows for control variables.
• Controls may be more important than in an RCT, both to reduce residual variation and to remove omitted variable bias.
• Technically, the “shock” in the first period leads somebody to (not) choose the treatment.
• One can allow for more flexible models (e.g., introducing time/period dummies, or testing for common trends using treatment-group × time-period dummy interactions).
• BUT: notice that this stretches what $\alpha_i$ captures: with more periods, it must stay constant over a longer time span.
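A common-trends check with treatment-group × period interactions can be sketched by comparing the treatment-control gap in each period with the gap in a base period. Assumptions: a balanced panel from a hypothetical DGP in which treatment starts in period 3 and common trends hold by construction, and no further controls. In this saturated two-group setting each estimate equals the corresponding interaction coefficient; pre-treatment estimates close to zero are consistent with (but do not prove) common trends.

```python
import random
import statistics


def simulate(n_per_group, periods, start, beta1, seed):
    """Balanced panel from a hypothetical DGP:
    y_it = alpha_i + 0.2 * t + beta1 * D_i * 1[t >= start] + eps_it (common trend 0.2 per period)."""
    rng = random.Random(seed)
    cells = {}  # (D, period) -> outcomes in that group-period cell
    for d in (0, 1):
        for _ in range(n_per_group):
            alpha_i = rng.gauss(1.0 * d, 1.0)   # treated units may differ in levels
            for t in range(periods):
                y = alpha_i + 0.2 * t + beta1 * d * (t >= start) + rng.gauss(0.0, 1.0)
                cells.setdefault((d, t), []).append(y)
    return cells


def group_by_period_did(cells, periods, base):
    """Treatment-control gap in each period minus the gap in the base period.
    These match the treatment-group x period interaction coefficients of a
    saturated regression on group, period and group x period dummies."""
    gap = {t: statistics.fmean(cells[(1, t)]) - statistics.fmean(cells[(0, t)])
           for t in range(periods)}
    return {t: gap[t] - gap[base] for t in range(periods) if t != base}


cells = simulate(n_per_group=2000, periods=6, start=3, beta1=0.5, seed=1)
effects = group_by_period_did(cells, periods=6, base=2)   # base = last pre-treatment period
for t, est in sorted(effects.items()):
    label = "pre " if t < 3 else "post"
    print(label, t, round(est, 3))
```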
• What have we assumed about the effect of treatment on the control group?
• An important but difficult topic. We will neglect it, as is all too often done in the literature.
• Example #1: a merger affects the prices of all firms (products) in the market, not just
those of the merging parties.
• Example #2: a wholesale education reform (think of the Finnish reform making secondary education compulsory) affects not only the wages of those whose education changes because of the reform, but also the wages of those who compete with them in the job market.
• Example #3: a regulatory reform affects the prices of all pharmaceuticals based on the
same molecule. Kortelainen et al. (2023) study reforms in Nordic pharmaceutical markets
and define prices both at the
1 market-level and the
2 package-level.
Kortelainen, M., Markkanen, J., Siikanen, M. & Toivanen, O. (2023). The effects of price
regulation on pharmaceutical expenditure and availability [Unpublished manuscript].
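If the treatment also moves the control group's outcome, DiD compares the treated units with a contaminated counterfactual. This noise-free sketch uses hypothetical numbers for the merger example: prices of the merging parties rise by 0.10 and, in the second scenario, rivals' prices also rise by 0.04. The helper `mean_prices` is illustrative only. The DiD then recovers the direct effect minus the spillover, not the direct effect.

```python
def did(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """2x2 difference-in-differences from four group means."""
    return (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)


def mean_prices(trend, direct_effect, spillover):
    """Noise-free mean prices (hypothetical numbers): treated units (e.g. merging
    parties) get the direct effect, control units (rivals) get the spillover."""
    treated = (2.0, 2.0 + trend + direct_effect)
    control = (1.5, 1.5 + trend + spillover)
    return treated, control


(t0, t1), (c0, c1) = mean_prices(trend=0.02, direct_effect=0.10, spillover=0.0)
did_no_spillover = did(t0, t1, c0, c1)

(t0, t1), (c0, c1) = mean_prices(trend=0.02, direct_effect=0.10, spillover=0.04)
did_spillover = did(t0, t1, c0, c1)

print(round(did_no_spillover, 6), round(did_spillover, 6))
```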
Choosing the comparison group
• A large body of literature has demonstrated that the key to success in using DiD (more generally, in identifying causal effects) is the choice of the control group.
• Control group observation units should be “as similar as possible” to treatment group observation units.
→ conditional DiD.
• Conditional = first choose carefully which observation units to include in the control
group.
• When done correctly, this helps a great deal.
3 Go through potential control group observation units and choose one or more units that are as similar as possible to treatment group observation unit #1. There are many different technical solutions to implement this (e.g., matching on observables).
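One simple way to implement the unit-by-unit search is nearest-neighbour matching on a pre-treatment characteristic, followed by DiD on the matched pairs. This is a minimal sketch, not the procedure of any paper cited here. In the hypothetical DGP, units with larger $x$ are both more likely to be treated and have faster outcome growth, so common trends fail in the full sample but hold conditional on $x$. Matching is 1-nearest-neighbour with replacement on a single covariate, chosen for brevity; applications often match on several covariates or on a propensity score.

```python
import random
import statistics


def simulate(n_treat, n_ctrl, beta1, seed):
    """Hypothetical DGP: units with larger x are more often treated AND grow faster,
    so trends differ unconditionally but are common conditional on x."""
    rng = random.Random(seed)
    units = []  # (D_i, x_i, change in Y_i)
    for d, n, mu in ((1, n_treat, 1.0), (0, n_ctrl, 0.0)):
        for _ in range(n):
            x = rng.gauss(mu, 1.0)                                # pre-treatment characteristic
            y_pre = x + rng.gauss(0.0, 0.3)
            y_post = 2.0 * x + beta1 * d + rng.gauss(0.0, 0.3)    # outcome growth depends on x
            units.append((d, x, y_post - y_pre))
    return units


def did_all_controls(units):
    """Plain DiD: mean change among treated minus mean change among all controls."""
    treated = [dy for d, _, dy in units if d == 1]
    control = [dy for d, _, dy in units if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def did_matched(units):
    """Conditional DiD: match each treated unit to the control with the closest x
    (1-nearest neighbour, with replacement), then average the differences in changes."""
    controls = [(x, dy) for d, x, dy in units if d == 0]
    diffs = []
    for d, x, dy in units:
        if d == 1:
            _, dy_match = min(controls, key=lambda c: abs(c[0] - x))
            diffs.append(dy - dy_match)
    return statistics.fmean(diffs)


units = simulate(n_treat=300, n_ctrl=3000, beta1=0.5, seed=3)
est_all = did_all_controls(units)
est_matched = did_matched(units)
print(f"all controls: {est_all:.2f}, matched: {est_matched:.2f}")
```

The estimate using all controls picks up the difference in trend growth, while the matched estimate lands close to the assumed effect of 0.5. Because each treated unit gets its own comparison, the matched estimate is an average effect for the treated units.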
• Aghion, P., Akcigit, U., Hyytinen, A. & Toivanen, O. (2018). On the returns to invention within firms: Evidence from Finland. AEA Papers and Proceedings, 108, 208–12
• We study what happens to the wages of individuals after an invention.
1 Inventors
2 Entrepreneurs
3 White-collar workers
4 Blue-collar workers
• Those in the treatment group work in the same firm as the inventor in the year of the
patent application.
• Different units get the treatment at different times (at the extreme, all units eventually
get the treatment, leading to an event study setting).
• Different units get different treatments. Example: Finnish cost subsidies to firms during the COVID-19 crisis vary from 2 000€ to 500 000€.
• The effect of the treatment is different for different treatment units, possibly conditional
on observables.
• DiD methods have developed rapidly in the last few years with regard to all these issues. It is now well understood that the basic two-way fixed effects (TWFE) DiD may produce biased results in settings that are even slightly more complicated than the textbook setting.
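A tiny deterministic example shows how the basic two-way FE estimator can go wrong with staggered timing and effects that grow with exposure. Assumptions: two units, three periods, no never-treated unit, no noise, and made-up unit effects, period effects and treatment effects; `twfe_beta` is a hypothetical helper. The coefficient is computed by two-way demeaning of the treatment dummy, which in a balanced panel gives the same OLS coefficient on $D_{it}$ as including unit and period dummies (Frisch-Waugh-Lovell). Every treated observation has an effect of at least 1, yet the TWFE estimate is negative.

```python
def twfe_beta(y, d):
    """OLS coefficient on D from a regression of Y on D plus unit and period dummies,
    for a balanced panel, via two-way demeaning of D (Frisch-Waugh-Lovell).
    The demeaned D is orthogonal to the dummies, so Y need not be demeaned."""
    n_units, n_periods = len(d), len(d[0])
    row = [sum(d[i]) / n_periods for i in range(n_units)]                             # unit means
    col = [sum(d[i][t] for i in range(n_units)) / n_units for t in range(n_periods)]  # period means
    grand = sum(row) / n_units
    d_tilde = [[d[i][t] - row[i] - col[t] + grand for t in range(n_periods)]
               for i in range(n_units)]
    num = sum(d_tilde[i][t] * y[i][t] for i in range(n_units) for t in range(n_periods))
    den = sum(d_tilde[i][t] ** 2 for i in range(n_units) for t in range(n_periods))
    return num / den


# Hypothetical staggered design: unit 0 treated from period 2, unit 1 from period 3.
d = [[0, 1, 1],
     [0, 0, 1]]
# Effects grow with exposure: 1 in the first treated period, 4 in the second.
tau = [[0, 1, 4],
       [0, 0, 1]]
alpha = [2.0, 0.0]            # unit fixed effects
lam = [0.0, 0.5, 1.0]         # period fixed effects
y = [[alpha[i] + lam[t] + tau[i][t] for t in range(3)] for i in range(2)]

beta = twfe_beta(y, d)
treated_effects = [tau[i][t] for i in range(2) for t in range(3) if d[i][t] == 1]
print(round(beta, 6), sum(treated_effects) / len(treated_effects))
```

Between periods 2 and 3 the early-treated unit serves as a control for the late-treated unit. Because its own effect is still growing (from 1 to 4), that comparison yields $1.5 - 3.5 = -2$; the clean early-vs-late comparison in periods 1 to 2 yields 1, and with equal weights the two average to -0.5.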