The document discusses the concept of difference in differences (DID) as a method for estimating the impact of a treatment by comparing outcomes between a treatment group and a control group before and after an intervention. It emphasizes the importance of the parallel trends assumption, which posits that in the absence of treatment, both groups would have followed similar trends over time. Additionally, it outlines the necessary conditions for valid inference using DID, including the stability of group composition and the absence of spillover effects.


2023 S2 ECOS 3997 (Stream 3)

Week 8: Difference in differences

Ellen Stuart
School of Economics
The University of Sydney
21 September 2023
Reminder

Reminders:

• No office hours next week (midsemester break)


• Two more weeks of lecture (Weeks 9 and 10)
Review

Our goal: to empirically estimate the revenue-maximizing linear income tax rate

What have we done so far?

• Wk1: Used a simple model to tell us what we need out of the data (e)
• Wk2: Considered the pros and cons of different sources of data
• Wk3-5: Initial/exploratory data analysis (data cleaning, data visualization,
summary statistics)
• Wk6-7: Prediction/inference (linear regression, instrumental variables)

What are we going to do today?

• More inference: difference in differences & our 2nd paper estimating the ETI

1/91
Where we left off

Recall our linear regression model. The true population values are:

Yi = β0 + β1 Xi + ui

What we estimate:
Yi = β̂0 + β̂1 Xi + ûi

To make claims about the population, we need (at a minimum):

• β̂1 to be a good (unbiased) estimate of β1 : E[β̂1 ] = β1
• A confidence interval around β̂1
• Exogeneity of the error term u

2/91
Where we left off

Reminder

Exogeneity ⇒ there is NO systematic relationship between the independent variable X and the error term u

Formally, Cov[X , u] = 0 (i.e., there is no covariance between X and u)

If there is a systematic relationship between X and u, we have an endogeneity problem (and we say that “X is endogenous”)

Broadly, endogeneity is when (one of) the independent variable(s) is correlated with the error term

3/91
Where we left off

Reminder

Example: If we run the regression:


WAMi = β0 + β1 (hours of studyi ) + ui

We might worry that one (or more) of the following omitted variables are
correlated with hours of study:
• Inherent ability
• Class attendance
• Quality of instruction
• Difficulty of other courses taken concurrently
Because these are all in u, we would have X correlated with u and therefore we
would be concerned about endogeneity

4/91
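To make the endogeneity concern concrete, the following is a minimal simulation sketch. The data-generating process, the variable names, and every number in it are assumptions chosen for illustration, not estimates from any real WAM data. It shows that when an unobserved factor such as ability drives both study hours and WAM, the regression that omits it overstates the coefficient on hours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (all numbers are assumptions).
ability = rng.normal(size=n)                    # unobserved, so it ends up in u
hours = 10 + 2 * ability + rng.normal(size=n)   # study hours depend on ability
wam = 50 + 1.0 * hours + 5 * ability + rng.normal(size=n)  # true beta1 = 1.0


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


beta_short = ols(wam, hours)            # ability omitted: X correlated with u
beta_long = ols(wam, hours, ability)    # ability included

print(f"slope on hours, ability omitted:  {beta_short[1]:.2f}")
print(f"slope on hours, ability included: {beta_long[1]:.2f}")
```

In this setup the short regression slope converges to 1 + 5 × Cov(hours, ability) / Var(hours) = 3, so the bias comes entirely from the omitted variable.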
Where we left off

Reminder

Last week we discussed one strategy empirical economists use when faced with
endogeneity: instrumental variables

Intuition for the approach:


• Find a variable that (1) is correlated with X , (2) is NOT correlated with u,
and (3) is only correlated with Y through its correlation with X
Two notes:
• Finding a variable that meets these three criteria is very difficult
• Even if we have a valid instrument, our coefficient has a slightly different
interpretation (i.e., it is the average treatment effect on the compliers)
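The intuition can be sketched with a small simulation. Everything here is an assumption for illustration: the instrument z, the coefficients, and the sample size are invented, and the treatment effect is the same for everyone, so the complier-specific interpretation noted above does not show up in this toy setting. The IV estimate is computed as the ratio Cov(z, y) / Cov(z, x).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data-generating process (all numbers are assumptions).
z = rng.normal(size=n)                        # instrument: moves x, unrelated to u
u = rng.normal(size=n)                        # error term
x = 1.0 * z + 0.8 * u + rng.normal(size=n)    # x is endogenous (depends on u)
y = 2.0 + 1.5 * x + u                         # true slope on x is 1.5

# OLS slope: Cov(x, y) / Var(x). Biased because Cov(x, u) != 0.
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV slope: Cov(z, y) / Cov(z, x). Uses only the variation in x driven by z.
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS slope: {ols_slope:.2f}")
print(f"IV slope:  {iv_slope:.2f}")
```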
5/91
Difference in differences (diff-in-diff): Definition

Another strategy empirical economists use when faced with endogeneity:


difference in differences (or diff-in-diff, DnD, DID...)

Intuition for the approach:

• Find two groups of people that are “the same”


• One group receives a treatment (the “treatment group”)
• One group doesn’t receive a treatment (the “control group”)

Because the two groups are “the same”, we can compare (what happens to the
treatment group) to (what happens in the control group) and assume the
difference is due to the treatment

6/91
Diff-in-diff: Visual/example

Imagine there is some population of interest:

Population of
interest

The mean of some variable of interest Y is 10 for this population

7/91
Diff-in-diff: Visual/example

A bolt of lightning (or maybe some policy intervention) strikes the population,
randomly breaking it into two groups:

[Figure: the population of interest splits into two groups]

• Control group: population that does not receive treatment
• Treatment group: population that receives treatment

8/91
Diff-in-diff: Visual/example

Because the policy intervention was random, the mean value of Y is ≈ 10 for both
the treatment group and the control group

[Figure: mean of Y by group over time. Pre-intervention: treatment group 10.2, control group 10.1. Post-intervention: treatment group 15.6, control group 11.7.]

9/91


Diff-in-diff: Visual/example

The policy intervention has some effect on the value of Y so that the mean value
of Y is different between the treatment and control groups after the intervention

[Figure: mean of Y by group over time. Pre-intervention: treatment group 10.2, control group 10.1. Post-intervention: treatment group 15.6, control group 11.7.]

10/91


Diff-in-diff: Visual/example

The difference in differences estimator first takes the difference between the post-
and pre-intervention for both groups:

[Figure: mean of Y by group over time, annotated with each group's change]

Treatment group: 15.6 − 10.2 = 5.4
Control group: 11.7 − 10.1 = 1.6

11/91


Diff-in-diff: Visual/example

The difference in differences estimator then takes the difference in those differences
(thus the name):

[Figure: mean of Y by group over time, annotated with each group's change and the difference between the two changes]

Treatment group: 15.6 − 10.2 = 5.4
Control group: 11.7 − 10.1 = 1.6
Difference in differences: 5.4 − 1.6 = 3.8

12/91


Diff-in-diff: Visual/example

We can also see this as a table:

             Pre     Post    Difference
Treatment    10.2    15.6    5.4
Control      10.1    11.7    1.6
Difference    0.1     3.9    3.8

The difference in differences is (15.6 − 10.2) − (11.7 − 10.1) = 3.8
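The same arithmetic can be written as a short sketch. The four group means are the ones from the table; the dictionary layout and the function name are illustrative choices, not part of the lecture.

```python
# Group means from the toy example: keys are (group, period).
means = {
    ("treatment", "pre"): 10.2,
    ("treatment", "post"): 15.6,
    ("control", "pre"): 10.1,
    ("control", "post"): 11.7,
}


def diff_in_diff(m):
    """(post - pre) for the treatment group minus (post - pre) for the control group."""
    change_treatment = m[("treatment", "post")] - m[("treatment", "pre")]
    change_control = m[("control", "post")] - m[("control", "pre")]
    return change_treatment - change_control


did = diff_in_diff(means)
print(round(did, 1))  # 3.8
```

Taking the differences in the other order (post-period gap minus pre-period gap, 3.9 − 0.1) gives the same number, which is why the bottom row and the right-hand column of the table agree.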

13/91
Diff-in-diff: Intuition

Why take both differences? Why not compare the treated group before and after?
• We want to know how the intervention affected the mean of Y
• We might worry that some other thing also impacted Y at the same time as
the treatment
• If we just looked at the treated group before and after, the effect of that other
thing would be incorrectly attributed to our policy intervention
• Looking at the control group before and after the intervention allows us to
estimate the average impact of that other thing
• Because the two groups are otherwise “the same” apart from the treatment,
we might expect that the impact of that other thing to be the same for the
control group and the treatment group
• We can then remove the impact from that other thing from our estimate of
the impact of the intervention

14/91
Diff-in-diff: Estimation

Formally, we are estimating 3 coefficients from the following model:

Y = β0 + β1 (Treatment = 1) + β2 (Post = 1) + β3 (Treatment x Post = 1) + u

Y : outcome variable

Treatment: variable that = 1 if the row in the data is from the treatment group

Post: variable that = 1 if the row in the data is from the post-intervention period

Treatment x Post: variable that = 1 if the row in the data is BOTH from the
treatment group and from the post-intervention period

15/91
Diff-in-diff: Estimation

Our data might look something like this:

Outcome    Group    Period    Group x Period
10         0        0         0
18         1        0         0
11         0        0         0
24         1        0         0
18         0        1         0
21         1        1         1
13         0        1         0
17         1        1         1

Outcome is our “Y” variable

Group is treatment status (1 = treatment group, 0 = control group)

Period is time period (1 = post-intervention, 0 = pre-intervention)

Group x Period is the interaction of Group and Period (1 = treatment group AND post-intervention, 0 otherwise)
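As a rough sketch of how the model could be estimated on these eight rows, the snippet below builds the three dummy variables and fits the regression by ordinary least squares with NumPy. Using `numpy.linalg.lstsq` is an assumption made for illustration; the slides do not say which software is used. Standard errors are not computed here.

```python
import numpy as np

# The eight example rows: Outcome, Group (1 = treatment), Period (1 = post).
outcome = np.array([10, 18, 11, 24, 18, 21, 13, 17], dtype=float)
group = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
period = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Columns: intercept, Treatment, Post, Treatment x Post.
X = np.column_stack([np.ones_like(outcome), group, period, group * period])

beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
b0, b1, b2, b3 = beta

print(f"beta0 (control, pre mean):     {b0:.1f}")
print(f"beta1 (pre-period group gap):  {b1:.1f}")
print(f"beta2 (control group change):  {b2:.1f}")
print(f"beta3 (diff-in-diff estimate): {b3:.1f}")
```

Because the model has one parameter per group-by-period cell, the fitted coefficients reproduce the cell means exactly: the control group averages 10.5 before and 15.5 after, the treatment group 21 before and 19 after, so the diff-in-diff estimate is (19 − 21) − (15.5 − 10.5) = −7.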

16/91
Diff-in-diff: Estimation

Y = β0 + β1 (Treatment = 1) + β2 (Post = 1) + β3 (Treatment x Post = 1) + u

β0 is the mean of Y in the control group pre-intervention (i.e., β0 = mean of Ycontrol,pre )

Treatment = 0 and Post = 0 ⇒ Treatment x Post = 0

[Figure: mean of Y by group over time (treatment group: 10.2 pre, 15.6 post; control group: 10.1 pre, 11.7 post), with β0 marked at the control group's pre-intervention mean]

17/91
Diff-in-diff: Estimation

Y = β0 + β1 (Treatment = 1) + β2 (Post = 1) + β3 (Treatment x Post = 1) + u

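β1 = (mean of Ytreated,pre ) − (mean of Ycontrol,pre ), where (mean of Ycontrol,pre ) = β0

Treatment = 1 and Post = 0 ⇒ Treatment x Post = 0

[Figure: mean of Y by group over time (treatment group: 10.2 pre, 15.6 post; control group: 10.1 pre, 11.7 post), with β0 marked at the control group's pre-intervention mean and β0 + β1 at the treatment group's pre-intervention mean]

18/91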
Diff-in-diff: Estimation

Y = β0 + β1 (Treatment = 1) + β2 (Post = 1) + β3 (Treatment x Post = 1) + u

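β2 = (mean of Ycontrol,post ) − (mean of Ycontrol,pre ), where (mean of Ycontrol,pre ) = β0

Treatment = 0 and Post = 1 ⇒ Treatment x Post = 0

[Figure: mean of Y by group over time (treatment group: 10.2 pre, 15.6 post; control group: 10.1 pre, 11.7 post), with β0 marked at the control group's pre-intervention mean and β0 + β2 at the control group's post-intervention mean]

19/91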
Diff-in-diff: Estimation

Y = β0 + β1 (Treatment = 1) + β2 (Post = 1) + β3 (Treatment x Post = 1) + u

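β3 = (mean of Ytreated,post ) − β0 − β1 − β2

where β0 = (mean of Ycontrol,pre ), β1 = (mean of Ytreated,pre ) − (mean of Ycontrol,pre ), and β2 = (mean of Ycontrol,post ) − (mean of Ycontrol,pre )

Equivalently: β3 = [(mean of Ytreated,post ) − (mean of Ytreated,pre )] − [(mean of Ycontrol,post ) − (mean of Ycontrol,pre )]

Treatment = 1 and Post = 1 ⇒ Treatment x Post = 1

[Figure: mean of Y by group over time (treatment group: 10.2 pre, 15.6 post; control group: 10.1 pre, 11.7 post), with β0 + β1 + β2 + β3 marked at the treatment group's post-intervention mean]

20/91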
Diff-in-diff: Estimation

Each coefficient represents either (1) a group mean or (2) a difference between
means:

• β0 = mean of Y in the control group pre-intervention


• β1 = difference in mean of Y in control and treatment groups pre-intervention
• β2 = difference in mean of Y in control group pre- and post-intervention
– This is the time trend–the change in the outcome over time that was
independent of the intervention
• β3 = difference in mean of Y in treatment group post-intervention compared
to estimate of what would have happened absent the intervention
– This is the ATT: Average Treatment Effect on the Treated – the change in
the outcome for the treatment group that we can attribute to the intervention
– This is the key parameter of interest
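One way to see where these interpretations come from is to write out the expected outcome in each of the four group-by-period cells. The sketch below assumes the error term has mean zero within each cell; that assumption is added here for the derivation and is not stated on the slide.

```latex
\begin{aligned}
E[Y \mid \text{Treatment}=0,\ \text{Post}=0] &= \beta_0 \\
E[Y \mid \text{Treatment}=1,\ \text{Post}=0] &= \beta_0 + \beta_1 \\
E[Y \mid \text{Treatment}=0,\ \text{Post}=1] &= \beta_0 + \beta_2 \\
E[Y \mid \text{Treatment}=1,\ \text{Post}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[6pt]
\beta_3 &= \bigl(E[Y \mid 1, 1] - E[Y \mid 1, 0]\bigr) - \bigl(E[Y \mid 0, 1] - E[Y \mid 0, 0]\bigr)
\end{aligned}
```

In the last line, the conditioning arguments are (Treatment, Post). Subtracting the control group's change removes β2, and subtracting the treatment group's own pre-period mean removes β0 + β1, leaving only β3.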

21/91
Diff-in-diff: Estimation

What is the counterfactual? I.e., what would have happened to the treatment group if
the policy intervention had not occurred?

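[Figure: mean of Y by group over time, with the counterfactual path for the treatment group]

Control group, pre-intervention: 10.1
Treatment group, pre-intervention: 10.2
Control group, post-intervention: 11.7 = β0 + β2
Treatment group, post-intervention (counterfactual): 11.8 = β0 + β1 + β2
Treatment group, post-intervention (observed): 15.6 = β0 + β1 + β2 + β3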
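The counterfactual value can be reproduced from the four group means of the toy example. This is a small sketch; the variable names are illustrative.

```python
# Coefficients implied by the toy example's group means.
beta0 = 10.1                              # control group, pre-intervention
beta1 = 10.2 - 10.1                       # pre-intervention gap between groups
beta2 = 11.7 - 10.1                       # change in the control group over time
beta3 = (15.6 - 10.2) - (11.7 - 10.1)     # diff-in-diff estimate

# Counterfactual: treatment group's starting point plus the control group's trend.
counterfactual_treated_post = beta0 + beta1 + beta2
observed_treated_post = counterfactual_treated_post + beta3

print(round(counterfactual_treated_post, 1))  # 11.8
print(round(observed_treated_post, 1))        # 15.6
```

The gap between the observed value and the counterfactual is exactly β3, which is why the estimate is only as credible as the assumed counterfactual path.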
22/91
Diff-in-diff: Assumptions

Parallel trends assumption: Critical assumption of diff-in-diff

If the intervention had not occurred (i.e., in the absence of the treatment), the
control and treatment groups would have had a similar trend over time

This assumption allows us to use β2 to net out changes in the outcome that are
not due to the treatment

Don’t trust a study that uses diff-in-diff but doesn’t graphically show these trends!
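Alongside a graph, a simple numerical check is to see whether the gap between the two groups was stable before the intervention. The sketch below uses made-up group means for three pre-intervention periods and an arbitrary tolerance; both are assumptions for illustration. Stable pre-period gaps are supportive evidence only: they cannot prove what would have happened after the intervention.

```python
import numpy as np

# Hypothetical group means for three pre-intervention periods (illustrative only).
treatment_pre = np.array([9.8, 10.0, 10.2])
control_pre = np.array([9.7, 9.9, 10.1])

# Gap between groups in each pre-period.
gap = treatment_pre - control_pre

# Under parallel pre-trends the gap should not change from period to period.
gap_changes = np.diff(gap)
tolerance = 0.05  # arbitrary threshold chosen for this sketch
pre_trends_look_parallel = bool(np.all(np.abs(gap_changes) < tolerance))

print("gap by period:", np.round(gap, 2))
print("pre-trends look parallel:", pre_trends_look_parallel)
```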

23/91
Diff-in-diff: Assumptions

Classic example of failure of parallel trends: Ashenfelter’s Dip


Intuition:
• Training group had low income
in 1964 likely due to being in
training–not a problem
• Individuals in training group
experienced a “bad year” in
1963 (and maybe more likely to
enter training as a result)
• Comparing 1963 to 1965 may
overestimate returns to training
Dip among treated ∼ Ashenfelter (REStat 1978)
24/91
Source: https://pages.uoregon.edu/waddell/metrics/difference-in-differences.html
Diff-in-diff: Assumptions

Other assumptions for inference:

The allocation of the policy intervention was not determined by the outcome of
interest

The composition of the control and treatment groups is stable (true for panel data;
must be checked for repeated cross sections)

No spillover effects (i.e., the effect on the treatment group does not indirectly
affect the control group)

25/91
Diff-in-diff: Seven more things

Seven more things. First:

Difference in differences can handle different “cases”

Our toy example had a difference across all four dimensions:

• Pre-intervention control & treatment groups
• Post-intervention control & treatment groups
• Pre- and post-intervention control group
• Pre- and post-intervention treatment group

26/91
Diff-in-diff: Seven more things

Possible alternative: no initial differences (⇒ β1 = 0)

[Figure: control and treatment groups share the same pre-intervention mean (10.1). Post-intervention: treatment group 15.6, control group 11.7.]

27/91
Diff-in-diff: Seven more things

Possible alternative: no trend in the control group (⇒ β2 = 0)

[Figure: treatment group: 10.2 pre-intervention, 15.6 post-intervention. Control group: 10.1 pre-intervention, 10.1 post-intervention.]

28/91
Diff-in-diff: Seven more things

Possible alternative: no effect of the intervention (⇒ β3 = 0)

[Figure: treatment group: 10.2 pre-intervention, 11.8 post-intervention. Control group: 10.1 pre-intervention, 11.7 post-intervention.]

29/91
Diff-in-diff: Seven more things

Second thing.

Researchers sometimes estimate a “triple difference” (DDD)

We saw an example of this in Week 2 when discussing the use of field experiments
in economics

Chetty, Looney, and Kroft (2009) look at differences:

• Before and after the intervention


• Between treated and control products
• Between treated and control stores

30/91
Diff-in-diff: Seven more things

Third thing.

Our toy example split our population randomly

With most natural experiments, this is not the case

The validity of a difference in differences approach depends on how comparable the treatment and control groups are

As a result, diff-in-diff is often combined with matching methods (and appropriate adjustments to standard errors)

31/91
Diff-in-diff: Seven more things

Fourth thing.

In a difference in differences framework, we are always comparing a treatment and a control group in a pre-period and a post-period

⇒ Diff-in-diff requires data from the pre- and post-periods (like panel data or
repeated cross sections); a single cross section won’t work

32/91
Diff-in-diff: Seven more things

Fifth thing.

The coefficients of the regression model presented earlier represent means and
differences in means–there is no slope term included

We could instead apply the diff-in-diff design to coefficient estimates from some
other regression (rather than to variables)

Broadly, we would run a regression for the pre-periods and a regression for the
post-periods

This allows us to examine the trend in post-treatment coefficients (i.e., are they
increasing? Decreasing?)

33/91
Diff-in-diff: Seven more things

Sixth thing.

Difference in differences estimation can handle:


• More than two groups
– If treatment rolled out over time −→ staggered diff-in-diff
• More than two periods (multiple pre-periods and multiple post-periods)

The second point in particular is increasingly popular, as it provides (1) additional evidence on whether the parallel trends assumption is plausible and (2) evidence of how the effect evolves over time

Note, when using many years of data, standard errors likely need to be adjusted for
autocorrelation (correlation across time)
34/91
Diff-in-diff: Seven more things

The generalized regression is:


Yigt = αg + γt + βIgt + δXigt + uigt

• Yigt : outcome Y for individual i of group g at time t


• αg : group-specific fixed effect for group g (will be different for every g of n groups)
• γt : time-specific fixed effect for time t (will be different for every t of m periods)
• β: diff-in-diff effect
• Igt : interaction term
– One for each of n x m group x period dummies
– Can capture effect heterogeneity over time

This specification is called two-way fixed effects (TWFE) estimation because of the
group and time fixed effects
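A minimal sketch of a TWFE regression on simulated data follows. It makes several simplifying assumptions that are not in the slides: one observation per group-period cell, no covariates (the δX term is dropped), a single treated-and-post indicator in place of the full set of interaction dummies, a common treatment start date, and a constant treatment effect. All numbers are invented.

```python
import numpy as np

# Hypothetical panel: 4 groups observed over 6 periods (one observation per cell).
n_groups, n_periods = 4, 6
treated_groups = [2, 3]      # assumption: groups 2 and 3 receive the treatment
first_post_period = 3        # assumption: treatment starts in period 3 for both
true_effect = 2.0            # assumption: constant treatment effect

rng = np.random.default_rng(42)
alpha = rng.normal(size=n_groups)     # group fixed effects (alpha_g)
gamma = rng.normal(size=n_periods)    # time fixed effects (gamma_t)

# Long-format indices: one row per (group, period) pair.
g_idx, t_idx = np.meshgrid(np.arange(n_groups), np.arange(n_periods), indexing="ij")
g_idx, t_idx = g_idx.ravel(), t_idx.ravel()

# Treatment indicator: 1 for treated groups in post-treatment periods.
treated_post = (np.isin(g_idx, treated_groups) & (t_idx >= first_post_period)).astype(float)

noise = rng.normal(scale=0.1, size=g_idx.size)
y = alpha[g_idx] + gamma[t_idx] + true_effect * treated_post + noise

# Design matrix: all group dummies, time dummies (first period dropped to
# avoid perfect collinearity), and the treatment indicator.
group_dummies = (g_idx[:, None] == np.arange(n_groups)).astype(float)
time_dummies = (t_idx[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([group_dummies, time_dummies, treated_post])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat = coef[-1]
print(f"estimated beta: {beta_hat:.2f} (true effect: {true_effect})")
```

With common timing and a constant effect, this regression recovers the simulated effect up to noise. It does not compute standard errors, and it says nothing about settings with staggered timing or heterogeneous effects.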
35/91
Diff-in-diff: Seven more things

Last thing.

Difference in differences estimation is undergoing a bit of a revolution:


“Several DiD innovations came out simultaneously in 2020 and 2021 with
some staggered roll-outs in 2022. At the heart of this new DiD literature
is the premise that the classic Two-way Fixed Effects (TWFE) model
can give wrong estimates. This is especially true if the treatments are
heterogeneous (differential treatment timings, different treatment sizes,
different treatment statuses over time) that can result in ‘negative weights’
contaminating the ATE.” (Asjad Naqvi, Austrian Institute for Economic
Research)
Source: https://asjadnaqvi.github.io/DiD/
36/91
Diff-in-diff: Seven more things

The new diff-in-diff methods provide estimation techniques that “correct” for the
biases in TWFE
Three critical papers in this space:

1. Liyang Sun and Sarah Abraham (2021). “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, 225(2): 175-199.
2. Brantly Callaway and Pedro H.C. Sant’Anna (2021). “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, 225(2): 200-230.
3. Andrew Goodman-Bacon (2021). “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics, 225(2): 254-277.

37/91
Diff-in-diff: Seven more things

In short:

Correctly implementing diff-in-diff in a given setting requires understanding what modifications to the standard errors are required

Without correct standard errors, we cannot make correct inference

38/91
Diff-in-diff: More examples

Paper: Raj Chetty, Adam Looney, and Kory Kroft (2009). “Salience and Taxation: Theory and Evidence.” American Economic Review 99(4): 1145-1177

Research question: How salient are price-exclusive sales taxes?
39/91
Diff-in-diff: More examples

Paper: William C. Boning, Nathaniel Hendren, Ben Sprung-Keyser, and Ellen Stuart (2023). “A Welfare Analysis of Tax Audits Across the Income Distribution.” NBER Working Paper.

Research question: How do the returns to tax audits of individuals vary across the income distribution, and what are the implications for welfare analysis?

40/91
Diff-in-diff: More examples

Paper: Steven Hamilton, Geoffrey Liu, Jorge Miranda-Pinto, and Tristram Sainsbury (2023). “Early Pension Withdrawal as Stimulus.” Working Paper.

Research question: What was the marginal propensity to spend early superannuation withdrawals made during the COVID-19 pandemic?

41/91
Diff-in-diff: More examples

Paper: Douglas Almond, Kenneth Y. Chay, and Michael Greenstone (2006). “Civil Rights, the War on Poverty, and Black-White Convergence in Infant Mortality in the Rural South and Mississippi.” MIT Working Paper 07-04.

Research question: How did increased access to hospitals in the rural South after the passage of the Civil Rights Act affect infant mortality for Black children?
42/91
Diff-in-diff: More examples

Paper: Owen Kay and Michael Ricks (2023). “Time-Limited Subsidies: Optimal Taxation with Implications for Renewable Energy Subsidies.” Working Paper.

Research question: What is the optimal subsidy policy when the subsidy will be in effect for a limited period of time?

43/91
Diff-in-diff: More examples

Paper: Esther Duflo (2001). “Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.” American Economic Review, 91(4): 795-813.

Research question: How did the expansion in the number of schools constructed in Indonesia between 1973 and 1978 affect years of schooling?

44/91
Paper discussion
Feldstein (1995)

Martin Feldstein (1995). “The Effect of Marginal Tax Rates on Taxable Income: A
Panel Study of the 1986 Tax Reform Act.” Journal of Political Economy, 103(3):
551-572.

Abstract: This paper uses a Treasury Department panel of more than 4,000
taxpayers to estimate the sensitivity of taxable income to changes in tax rates on
the basis of a comparison of the tax returns of the same individual taxpayers
before and after the 1986 reform. The analysis emphasizes that the response of
taxable income involves much more than a change in the traditional measures of
labor supply. The evidence shows an elasticity of taxable income with respect to
the marginal net-of-tax rate that is at least one and could be substantially higher.
The implications for recent tax rate changes are discussed.
45/91
Feldstein (1995): Context

What is the context?

Tax Reform of 1986 (United States) – TRA86

• Reduced marginal income tax rates (MTR) for high-income individuals


– Highest MTR fell from 50% to 28%
– Net-of-tax rate rose from 50% to 72% (a 44% increase)
• Bigger tax base (e.g., capital gains taxed as ordinary income)
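The 44% figure is the proportional change in the net-of-tax rate (one minus the marginal tax rate) for the top bracket, worked out below from the two rates quoted above.

```latex
\frac{(1 - 0.28) - (1 - 0.50)}{1 - 0.50}
  = \frac{0.72 - 0.50}{0.50}
  = 0.44
```

Note that the same reform is a 44% fall in the marginal tax rate itself (from 0.50 to 0.28), so the two percentage changes happen to coincide here only because the starting rate is 50%.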

46/91
Feldstein (1995): Context

Additional context (not provided by the paper)

Marginal tax rates faced by married-filing-jointly returns in 1985 versus 1988:

            1985                  1988
Bracket     > AGI       MTR       > AGI      MTR
1           $0           0.0%     $0         11%
2           $3,670      11.0%     $3,000     15%
3           $5,940      12.0%     $30,950    28%
4           $8,200      14.0%     $45,000    35%
5           $12,840     16.0%     $90,000    39%
6           $17,270     18.0%
7           $21,800     22.0%
8           $26,550     25.0%
9           $32,270     28.0%
10          $37,980     33.0%
11          $49,420     38.0%
12          $64,750     42.0%
13          $92,370     45.0%
14          $118,050    49.0%
15          $175,250    50.0%

47/91
Source: https://taxfoundation.org/data/all/federal/historical-income-tax-rates-brackets/
Feldstein (1995): Context

“The Tax Reform Act of 1986 combined sharp reductions in high marginal tax
rates with base-broadening changes in tax rules. The combination was
designed to be approximately revenue neutral and distributionally neutral if
there were no behavioral response to the tax changes.”

“To increase the political appeal of the tax proposal, the tax changes were
actually structured so that tax revenue would decline in each broad income
class (assuming no behavioral response) and so that the resulting revenue
shortfall would be made up by an increase in the corporate income
tax.”

48/91
Feldstein (1995): Context

⇒ tax reform likely disproportionately affected high-income individuals

49/91
Feldstein (1995): Data

What data do they use?

“This is the first time in which panel data have been used to estimate the
sensitivity of taxable income to marginal tax rates.”

• Nonstratified random sample of all tax returns


• Includes all information from Form 1040 and associated Schedules
– Individual Income Tax Return in the U.S.
• Data from 1985 and 1988
– “before the 1986 reductions were enacted or widely anticipated”

50/91
Feldstein (1995): Data

Focus on “the largest marital status subgroup, those taxpayers who were married
and filed a joint return in both 1985 and 1988.”

• “The income of a taxpaying unit can be substantially affected by changes in marital status through marriage, divorce, or the death of a member of the couple.”

“Since retirement also causes a substantial change in income, the analysis excludes
taxpayers who were over age 65 in 1988.”

51/91
Feldstein (1995): Data

Paper does not tell us how many observations there were before restricting to the
final analysis sample. For example:
• We don’t know what impact, e.g., dropping all taxpayers who adopted
subchapter S corporation between 1985 and 1988 had on the final sample size
(more on that in a minute)
• We don’t know by what margin married filing jointly is “the largest marital
status subgroup”
Analysis sample includes:
• 3,538 medium-income taxpayers (1985 MTR between 22-38%)
• 197 high-income taxpayers (1985 MTR between 42-45%)
• 57 highest-income taxpayers (1985 MTR 49% or 50%)
52/91
Feldstein (1995): Data

“The analysis excludes taxpayers with 1985 marginal tax rates below 22 percent for
two reasons.”

• Many low-income taxpayers became nontaxable post TRA86 and were no longer in the data
• Concerns about mean reversion (individuals who experienced a negative income shock in 1985 may have recovered to “normal” level in 1988)

FN 16 acknowledges that mean reversion may be a problem for the included individuals, too

53/91
Feldstein (1995): Data

What are some benefits of this context and data?

Many of the advantages are relative to what was available before:

• Panel data −→ same individuals in before and after period


• Allows for response of taxable income as a whole (as opposed to labor force
participation or working hours, as had previously been used via survey data)

54/91
Feldstein (1995): Data

What are some drawbacks of using this context and data?

• Nonstratified sample −→ number of high-income taxpayers is relatively small


• Attrition from the panel
– “There is also some attrition in the sample over time as some lower-income
individuals become nontaxable and as some single individuals who marry cease to
be the primary taxpayer on the return. Although this unusual type of panel data
attrition is nonrandom, it is likely to have relatively little effect on the middle-
and upper-income married taxpayers who are the focus of this study.”
• How taxable income was calculated in 1985 was different to how it was
calculated in 1988 (more on this soon)
– To what extent do we believe the various assumptions made to make taxable
income in 1985 comparable to that in 1988?

55/91
Feldstein (1995): Empirical strategy

The empirical strategy is to conduct a simple difference in differences analysis (remember, this was long before the recent innovations in diff-in-diff)

The paper doesn’t show us the equation, but this is what is estimated:
• First difference: (post-pre) for each group
– Percent change in net of tax rate between 1985-1988
– Percent change in taxable income between 1985-1988
• Second difference: difference of differences for both net of tax rate and
taxable income for each group:
– High-income minus medium-income
– Highest-income minus high-income
– Highest-income minus medium-income

56/91
Feldstein (1995): Empirical strategy

This gives us:

A: The difference in the percent change in taxable income between 1985 and 1988
B: The difference in the percent change in net-of-tax rate between 1985 and 1988

The implied elasticity of taxable income is then calculated as

e = A / B
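The calculation can be sketched as a small function. The function name and the input numbers are hypothetical, chosen only so the arithmetic is easy to follow; they are not figures from the paper.

```python
def implied_elasticity(pct_income_treated, pct_income_control,
                       pct_net_of_tax_treated, pct_net_of_tax_control):
    """Implied elasticity e = A / B from group-level percent changes.

    A: difference between groups in the percent change in taxable income.
    B: difference between groups in the percent change in the net-of-tax rate.
    """
    a = pct_income_treated - pct_income_control
    b = pct_net_of_tax_treated - pct_net_of_tax_control
    if b == 0:
        raise ValueError("groups had the same change in net-of-tax rate; e is undefined")
    return a / b


# Hypothetical percent changes between the pre- and post-period:
# taxable income rose 30% vs 10%; the net-of-tax rate rose 25% vs 5%.
e = implied_elasticity(30.0, 10.0, 25.0, 5.0)
print(e)  # 1.0
```

Because e is a ratio of two differences, it becomes very sensitive to small errors in A when the two groups experienced similar changes in their net-of-tax rates (B close to zero).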

57/91
Feldstein (1995): Empirical strategy

To do this, we need to address one flaw in the data:

The definition of taxable income changed between 1985 and 1988

If no adjustments were made, we would not be comparing apples to apples

58/91
Feldstein (1995): Empirical strategy

“The changes in the tax rules that accompanied the tax rate reductions mean that
precautions must be taken in comparing incomes in 1985 and 1988. Four such
changes are noteworthy.”

1. Adjusted gross income (AGI) in 1985 excluded 60% of realized capital gains;
exclusion was eliminated by TRA86

Strategy: Paper shows comparisons for both all of AGI and AGI excluding capital
gains

59/91
Feldstein (1995): Empirical strategy

2. TRA86 changed the incentives to use subchapter C corporations, “which permitted them to pay lower rates of tax than the individual income tax, especially on profits below $100,000”

• Many people switched to subchapter S corporations after TRA86, “causing previously excluded corporate income to appear on their personal tax returns”
• Could −→ overestimates of increases in income after tax reform

Strategy: Paper has no way to obtain 1985 subchapter C incomes; drops all
taxpayers who adopted subchapter S corporation between 1985 and 1988

60/91
Feldstein (1995): Empirical strategy

3. Certain “passive losses” (e.g., on real estate investments) could no longer be used to offset other income

• Observe a sharp decline in these investments after 1986
• Unclear if decline is because of inability to offset or re-structuring of income because of reduced MTR

Strategy: Assume two extremes:

• Assume reduction entirely due to lower MTR (no adjustment for losses)
• Assume reduction entirely due to new offsetting rules (add losses to taxable income in both 1985 and 1988)

61/91
Feldstein (1995): Empirical strategy

62/91
Feldstein (1995): Empirical strategy

4. Changes to personal exemptions, the effective zero bracket amount, and the
definition of taxable income (specifically, that taxable income was defined net of
the zero bracket amount and personal exemptions)

Strategy: adjust 1985 taxable income to be based on 1988 definitions

63/91
Feldstein (1995): Empirical strategy

“One final adjustment is necessary to make modified taxable income for 1985
comparable to the taxable income that the taxpayer would report in 1988 if the
taxpayer did not change his behavior.”

1985 taxable income is increased as follows:

• Consider a taxpayer’s AGI less capital gains


• Assume this amount rose “at the same rate as nominal personal income per
capita (17.4 percent)” over this time period
• The amount of this increase is added to 1985 taxable income

Intuition: even without the tax reform, nominal wages would have increased, on
average, as a result of inflation, promotions, etc.
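A rough sketch of this adjustment, as described in the bullets above, is given below. The function name and the example taxpayer are hypothetical; only the 17.4 percent growth rate comes from the text.

```python
NOMINAL_INCOME_GROWTH = 0.174  # growth in nominal personal income per capita, from the text


def adjust_1985_taxable_income(taxable_income_1985, agi_1985, capital_gains_1985,
                               growth=NOMINAL_INCOME_GROWTH):
    """Scale up 1985 taxable income for nominal growth absent any behavioural change.

    The assumed increase is growth * (AGI less capital gains), and that amount
    is added to 1985 taxable income.
    """
    base = agi_1985 - capital_gains_1985
    return taxable_income_1985 + growth * base


# Hypothetical taxpayer: taxable income $40,000, AGI $50,000, capital gains $5,000.
adjusted = adjust_1985_taxable_income(40_000.0, 50_000.0, 5_000.0)
print(round(adjusted, 2))  # 47830.0
```

Applying one growth rate to every taxpayer is the key simplification: if incomes at the top would have grown faster than 17.4 percent even without the reform, the adjustment understates counterfactual income for the high-income groups.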
64/91
Feldstein (1995): Empirical strategy

The 17.4 percent figure is the average for all of the US.

Do we believe it would have been uniform across the population?

65/91
Feldstein (1995): Empirical strategy

“With these adjustments, the differences among taxpayer groups in the


change in taxable income between 1985 and 1988 should reflect changes in
marginal tax rates, changes in individuals’ market opportunities, and other
nontax sources of change in taxpayer behavior, but not the changes in tax
rules as such.14 Moreover, the observed behavior should reflect the way in
which tax rate changes alter behavior under the post-1986 tax rules with
limited opportunities for tax sheltering.”

66/91
Feldstein (1995): Empirical strategy

Aside: what’s in footnote 14?

“There are of course some additional small changes in tax rules that have not been
taken into account. Three deserve special mention.”
• Rules about who was eligible for a tax-benefited retirement savings account
−→ would increase taxable income more for lowest-income group in study; bias
estimates downward
• Increase in Social Security tax rates and tax base
−→ Small increase that is somewhat offset by future benefits
• Changes to the “alternate minimum tax”
−→ Some people experienced smaller reductions in tax rate, may not have as
large an income response as would have otherwise

67/91
Feldstein (1995): Empirical strategy

Even if we believe the author’s claim that income in 1985 has been sufficiently
adjusted so that it is comparable to income in 1988, we need to think about our
critical assumption for diff-in-diff.

Specifically, for difference in differences to produce correct estimates, we need the parallel trends assumption to hold:

If the intervention had not occurred (i.e., in the absence of the treatment),
the control and treatment groups would have had a similar trend over time

68/91
Feldstein (1995): Empirical strategy

What would that mean in this context?

The numerator of our elasticity is “the difference in the percent change in taxable
income between 1985 and 1988”

For parallel trends to hold in this context, that would mean that if the tax reform
had not occurred, the percent change in taxable income between 1985 and 1988
would be the same in the treatment and control groups

69/91
Feldstein (1995): Empirical strategy

Do we have any evidence that that is, or is not, true?

70/91
Feldstein (1995): Empirical strategy

However:

Source: Figure 16 in Emmanuel Saez (2017). “Income and Wealth Inequality: Evidence and Policy Implications.” Contemporary Economic Policy, 35(1): 7-25

71/91
Feldstein (1995): Results

Recall the estimation strategy:


The elasticity of taxable income is calculated as

e = A / B

where:

A: The difference in the percent change in taxable income between 1985 and 1988
B: The difference in the percent change in net-of-tax rate between 1985 and 1988

72/91
Feldstein (1995): Results

73/91
Feldstein (1995): Results

Some observations from Table 1:

• “Medium” income group has average 1985 AGI of $30.7k-$67.5k


• “High” income group has average 1985 AGI of $94.3k-$126.9k
• “Highest” income group has average 1985 AGI of $177.7k-$479.0k

We know that income is very skewed–what is the maximum income in the sample?
The paper notes “Because the sample sizes are relatively small for the top tax rate
groups, calculations are presented in the lower part of the table that combine
several individual 1985 marginal tax rate groups with the appropriate sample
weights.”

Feldstein (1995): Results

“Because the Tax Reform Act of 1986 did not reduce marginal tax rates on capital
gains in the same way that it did for other income, to study the effect of lowering
marginal tax rates it is appropriate to focus on income excluding capital gains.”
(footnote 18)

FN 18: “Although in the long run individuals might be able to substitute
compensation in the form of capital gains for some ordinary income, this was
unlikely to be a significant factor just two years after the Tax Reform Act of 1986
was passed.”

Feldstein (1995): Results

Feldstein (1995): Results

Some observations from Table 2:


• Estimated e ranges from 1.04 to 2.14
– These are much larger than what we saw with Kleven and Schultz (2014)
• Adding partnership losses ↓ estimates (intuition: taxable income smaller ⇒
numerator smaller)
• No standard errors!

Feldstein (1995): Results

Revisiting the sample selection decisions:

Feldstein (1995): Results

The paper applies the estimated elasticities to predict the revenue effect of the
1993 increase in marginal income tax rates:

“With no behavioral response, the TAXSIM model implies that the tax
rate changes enacted in 1993 would raise tax liabilities by $25.0 billion at
1993 income levels. If, however, taxable income declines by 12 percent
for individuals with incomes between $140,000 and $250,000 and by 16.5
percent for individuals with incomes over $250,000 (i.e., by the amounts
implied by the lowest estimated elasticity [1.04] of taxable income to
net-of-tax rates), tax revenue would increase by only $1.6 billion.”

Feldstein (1995): Discussion

Feldstein (1995) uses a nonstratified panel of individual U.S. tax returns from 1985 and
1988 to estimate the elasticity of taxable income using a difference in differences approach

The paper was one of the first (ever) in empirical economics to use panel data

The identifying tax variation comes from a single tax reform, which significantly decreased
top marginal tax rates and broadened the tax base

The paper finds:

• Large taxable income elasticities ranging between 1 and 3


• Larger elasticities if partnership losses were excluded–depends on assumption about
why there was a big change in business income

“If the long-run response to a change in marginal tax rates is greater than the short-run
response...this analysis...may understate the long-run sensitivity of taxable income to
changes in tax rates.”
Feldstein (1995): Discussion

One final point.

Emmanuel Saez, Joel Slemrod, and Seth Giertz* make the following important
observation:
• “[N]ote that if the control group faces a tax change, difference-in-differences
estimates will be consistent only if the elasticities are the same for the two
groups.”
In all of the comparisons in Feldstein (1995), the control group also faced tax
changes, which means the estimates are only consistent if we assume the two
groups have the same elasticity

*Emmanuel Saez, Joel Slemrod, and Seth H. Giertz (2012). “The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical
Review.” Journal of Economic Literature, 50(1): 3-50.
Feldstein (1995): Discussion

To see this, remember that the estimate of the ETI with this empirical strategy is:

$$e = \frac{(\%\Delta \text{ taxable income, TG}) - (\%\Delta \text{ taxable income, CG})}{(\%\Delta \text{ net-of-tax rate, TG}) - (\%\Delta \text{ net-of-tax rate, CG})}$$

(TG = treatment group, CG = control group)


Assume:

• Both groups face a tax change, for example:
– % ∆ in net-of-tax rate, CG = 0.5 * (% ∆ in net-of-tax rate, TG)
• The two groups have different elasticities, for example:
– eT > 0, eC = 0

Feldstein (1995): Discussion

Then:
• eT = (% ∆ in taxable income, TG)/(% ∆ in net-of-tax rate, TG)
• % ∆ in taxable income, CG = 0 (because eC = 0)

$$e = \frac{(\%\Delta \text{ taxable income, TG})}{(\%\Delta \text{ net-of-tax rate, TG}) - 0.5\,(\%\Delta \text{ net-of-tax rate, TG})} = \frac{(\%\Delta \text{ taxable income, TG})}{0.5\,(\%\Delta \text{ net-of-tax rate, TG})} = 2e_T$$

−→ the estimated elasticity is twice the size of the true elasticity for the treatment
group
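A quick numerical check of this result, as a sketch under stated assumptions: the helper `did_estimate` and all numbers are hypothetical, and each group's income response is assumed to equal its own elasticity times its own percent change in the net-of-tax rate, with no other shocks.

```python
def did_estimate(e_tg: float, e_cg: float, d_ntr_tg: float, d_ntr_cg: float) -> float:
    """Diff-in-diff ETI estimate when each group responds with its own elasticity.

    e_tg, e_cg:         true elasticities of the treatment and control groups
    d_ntr_tg, d_ntr_cg: percent changes in the net-of-tax rate (as fractions)
    """
    d_inc_tg = e_tg * d_ntr_tg  # implied percent change in taxable income, TG
    d_inc_cg = e_cg * d_ntr_cg  # implied percent change in taxable income, CG
    return (d_inc_tg - d_inc_cg) / (d_ntr_tg - d_ntr_cg)


# Hypothetical values: the control group's net-of-tax rate changes by half as
# much as the treatment group's, and the control group does not respond at all.
true_e_tg = 0.4
biased = did_estimate(e_tg=true_e_tg, e_cg=0.0, d_ntr_tg=0.20, d_ntr_cg=0.10)

# If both groups share the same elasticity, the estimate recovers it.
unbiased = did_estimate(e_tg=true_e_tg, e_cg=true_e_tg, d_ntr_tg=0.20, d_ntr_cg=0.10)

print(biased, unbiased)
```

In the first case the estimate is twice the treatment group's true elasticity; in the second case, with equal elasticities, the estimate matches the true value, which is the consistency condition quoted from Saez, Slemrod, and Giertz (2012).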
Feldstein (1995): Discussion

All of the groups considered in Feldstein (1995) faced tax rate changes

⇒ for the elasticity estimates to be consistent, we need to assume that all of the
income groups have the same elasticity

Is this a reasonable assumption?

Feldstein (1995): Discussion

Figure 1: Top Income Shares and Marginal Tax Rates, 1960-2006

*Emmanuel Saez, Joel Slemrod, and Seth H. Giertz (2012). “The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical
Review.” Journal of Economic Literature, 50(1): 3-50.
Feldstein (1995): Discussion

What is the implied answer to our research question?

We asked: what is the revenue-maximizing linear tax rate?

We used a simple model to derive the following relationship:


$$\tau^* = \frac{1}{1+e}$$

where e is the elasticity of taxable income
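For reference, one standard way to obtain this relationship is sketched below. It assumes a linear tax at rate τ, taxable income z that depends only on the net-of-tax rate 1 − τ, and a constant elasticity e; the notation R and z is introduced here for illustration and may differ from the model used earlier in the course.

```latex
% Revenue as a function of the tax rate
R(\tau) = \tau \, z(1-\tau)

% First-order condition for a revenue maximum
\frac{dR}{d\tau} = z(1-\tau) - \tau \, z'(1-\tau) = 0

% Definition of the elasticity of taxable income
e = \frac{1-\tau}{z} \cdot \frac{dz}{d(1-\tau)}
\quad\Longrightarrow\quad
z'(1-\tau) = \frac{e \, z}{1-\tau}

% Substitute into the first-order condition and solve for \tau
z - \frac{\tau \, e \, z}{1-\tau} = 0
\quad\Longrightarrow\quad
1 - \tau = \tau e
\quad\Longrightarrow\quad
\tau^* = \frac{1}{1+e}
```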

Feldstein (1995): Discussion

Taxpayer groups (classified by 1985 marginal rate):

• Medium (22-38): M
• High (42-45): H
• Highest (49-50): HH

Measures of taxable income: Adjusted taxable income (ATI) and ATI + gross loss
(ATI+GL)

T. group   C. group   Income   e      Implied τ∗
H          M          ATI      1.10   47.6%
H          M          ATI+GL   1.04   49.0%
HH         H          ATI      3.05   24.7%
HH         H          ATI+GL   1.48   40.3%
HH         M          ATI      2.14   31.8%
HH         M          ATI+GL   1.25   44.4%
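The "Implied τ∗" column can be reproduced from the elasticities with the formula τ∗ = 1/(1+e). A minimal sketch, where the function name `revenue_maximizing_rate` is hypothetical and the elasticities are the six values listed in the table:

```python
def revenue_maximizing_rate(e: float) -> float:
    """Revenue-maximizing linear tax rate, tau* = 1 / (1 + e)."""
    return 1.0 / (1.0 + e)


# Elasticity estimates as listed in the table, in row order.
estimates = [1.10, 1.04, 3.05, 1.48, 2.14, 1.25]

# Implied tau* in percent, rounded to one decimal place.
implied = [round(100 * revenue_maximizing_rate(e), 1) for e in estimates]
print(implied)  # [47.6, 49.0, 24.7, 40.3, 31.8, 44.4]
```

Larger elasticities imply lower revenue-maximizing rates, which is why the choice of treatment and control group matters so much for the answer to the research question.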
Wrapping up

Today:

• Inference using difference in differences

Key take-aways:

• Difference in differences is very cool!


• Diff-in-diff is hard to do “right”

Wrapping up

How does this relate to our research question?

• We need to estimate the elasticity of taxable income


• To do this, we need a causal estimate of how taxable income changes when
the net-of-tax rate changes
• One way to (maybe) get causality is to use difference in differences
• Feldstein (1995) provides us with several estimates

Wrapping up

Next time (Week 9):


• Lecture: More inference (specifically, regression discontinuity)
• Tutorial: Prompt 4 (next slide)
• Media assignment (slide after that) – due in Week 13
References:
• A Guide on Data Analysis, Mike Nguyen (2020)
(https://bookdown.org/mike/data_analysis/difference-in-differences.html)
• Data Science for Public Service, Program Evaluation III
(https://ds4ps.org/PROG-EVAL-III/index.html)
• Difference-in-Difference Estimation, Columbia Mailman School of Public Health
(https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation)
Prompt 4

Consider the first two papers we’ve read that try to estimate the elasticity of
taxable income (e): Kleven and Schultz (2014) and Feldstein (1995).
For each paper, outline the following in about 150-175 words:

• What is the empirical strategy used?


• What are the critical assumptions necessary for the empirical strategy to
provide a causal estimate?
• What evidence does the paper provide that these assumptions hold?
• Do you believe the arguments presented?

Conclude with the following: note which of the two papers you find more credible
and, in 4-5 sentences, explain the main reason(s) why.
Media assignment

Find a data visualization in a news article (or other media source). In 3 minutes, discuss:
• Where is the visualization from? Was it created by the news source, or did they copy
a visualization from somewhere else?
• What is the visualization trying to show? Is it successful?
• Does the surrounding article discuss the underlying data? Is it mentioned in any notes
of the figure?
– If yes, what is the data source? How do you feel about that data?
– If no, does it change the way you feel about the figure to not know where the
underlying data is from? How so?
• Is there anything that is misleading about the figure?
Your recording should be structured so that both you and the visualization are visible. One
way to do this would be to put your visualization on a slide and share your screen for the
recording.
