Statistical Models For Causal Analysis - Causal Inference - Notes

This document summarizes key concepts in statistical causal inference models. It defines causal effects as the difference between potential outcomes under treatment versus no treatment. The fundamental challenge is that we can never observe both potential outcomes for the same unit. The average treatment effect is the average causal effect across all units. Key assumptions like SUTVA are needed to identify causal effects from observational data. Naive comparisons of treatment and control groups are biased if treatment selection is related to potential outcomes. Randomized experiments best address this bias.

Uploaded by

Miriam


Chapter 1: Statistical Models for Causal Analysis

MIT 17.802: Quantitative Research Methods II

2023-03-19

Lecture 1: Statistical Models for Causal Analysis

Basic concepts and definitions

Causal inference is inference about counterfactuals, so we need a statistical model that explicitly distinguishes factuals from counterfactuals.
Treatment ($D_i$): indicator of treatment intake for unit $i$, where $i = 1, \ldots, N$.
Observed outcome ($Y_i$): the variable of interest, whose value may be affected by the treatment:
$$Y_i = Y_{D_i i} = D_i Y_{1i} + (1 - D_i) Y_{0i}$$
That is, if $D_i = 1$ then $Y_i = Y_{1i}$, and if $D_i = 0$ then $Y_i = Y_{0i}$.
Potential outcomes ($Y_{di}$): the value of the outcome that would be realized if unit $i$ received treatment $d$, where $d \in \{0, 1\}$.
$Y_{1i}$: potential outcome for unit $i$ with treatment.
$Y_{0i}$: potential outcome for unit $i$ without treatment.
Causal effect / unit treatment effect: $\tau_i = Y_{1i} - Y_{0i}$.
The fundamental problem of causal inference is that we can never observe both $Y_{1i}$ and $Y_{0i}$ for the same $i$. This makes $\tau_i$ unidentifiable without further assumptions.
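To make the switching equation concrete, here is a minimal Python sketch with made-up potential outcomes for six hypothetical units. Because the data are invented, both $Y_{1i}$ and $Y_{0i}$ are known, which is exactly what real data never provide. The names `Y1`, `Y0`, `D`, and `observed_outcome` are illustrative assumptions, not part of the notes.

```python
# Hypothetical potential outcomes for six units (invented numbers).
# Real data never reveal both lists for the same unit.
Y1 = [9, 3, 7, 3, 9, 2]  # outcome each unit would have WITH treatment
Y0 = [6, 2, 5, 3, 7, 1]  # outcome each unit would have WITHOUT treatment
D = [1, 0, 1, 0, 1, 0]   # treatment actually received

def observed_outcome(d, y1, y0):
    """Switching equation: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i."""
    return d * y1 + (1 - d) * y0

Y = [observed_outcome(d, a, b) for d, a, b in zip(D, Y1, Y0)]
tau = [a - b for a, b in zip(Y1, Y0)]  # unit effects: knowable only in a simulation

# What a researcher actually sees: one realized outcome per unit;
# the other potential outcome is missing (None).
for i, (d, y) in enumerate(zip(D, Y), start=1):
    seen_y1 = y if d == 1 else None
    seen_y0 = y if d == 0 else None
    print(f"unit {i}: D={d}  Y={y}  Y1={seen_y1}  Y0={seen_y0}")
```

Every printed row has exactly one `None`, which is the fundamental problem of causal inference in miniature: `tau` can be computed here only because the potential outcomes were made up.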

Key assumptions

SUTVA: Stable Unit Treatment Value Assumption

$$Y_{(D_1, D_2, \ldots, D_N)i} = Y_{(D_1', D_2', \ldots, D_N')i} \quad \text{if } D_i = D_i'$$

This means:
1. No interference between units (no spillover effects, contagion, dilution, etc.).
2. Stability of treatment across units (no different versions of the treatment).
Without SUTVA, even with only two units, unit 1 has four potential outcomes: $Y_{(0,0)1}$, $Y_{(1,0)1}$, $Y_{(0,1)1}$, $Y_{(1,1)1}$. This gives six distinct causal effects for unit 1, one for each pair of these potential outcomes (one minus the other).
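A short sketch counting potential outcomes without SUTVA, assuming a binary treatment and that unit 1's outcome may depend on every unit's treatment. The helper `count_without_sutva` is hypothetical; it enumerates treatment vectors with `itertools.product` and pairs of potential outcomes with `itertools.combinations`.

```python
from itertools import combinations, product

def count_without_sutva(n_units):
    """Count unit 1's potential outcomes and pairwise causal effects when
    its outcome may depend on the full treatment vector (D_1, ..., D_N)."""
    vectors = list(product([0, 1], repeat=n_units))  # every possible assignment
    pairs = list(combinations(vectors, 2))            # one contrast per pair
    return len(vectors), len(pairs)

# Two units: Y_(0, 0)1, Y_(0, 1)1, Y_(1, 0)1, Y_(1, 1)1
for v in product([0, 1], repeat=2):
    print(f"Y_{v}1")
print(count_without_sutva(2))
print(count_without_sutva(3))
```

With two units this gives 4 potential outcomes and 6 contrasts; with three units it is already 8 and 28, since there are $2^N$ treatment vectors and $2^N(2^N - 1)/2$ pairs. SUTVA collapses all of this to the two potential outcomes $Y_{1i}$ and $Y_{0i}$.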

Key estimands

Since unit-level causal effects are fundamentally unobservable, in most situations we focus on averages instead.
The average treatment effect (ATE) is still not identified without further assumptions; throughout this course we will consider various assumptions under which it can be identified from the observed information.

$$\tau_{ATE} = \frac{1}{N} \sum_{i=1}^{N} [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i}]$$
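If both potential outcomes were known (possible only in a simulation), the ATE would simply be the mean of the unit effects. The sketch below uses a hypothetical six-unit table with invented numbers; `ate` is an illustrative helper. It also checks that, by linearity of expectation, $E[Y_{1i} - Y_{0i}] = E[Y_{1i}] - E[Y_{0i}]$.

```python
# Hypothetical potential outcomes for six units (invented for illustration).
Y1 = [9, 3, 7, 3, 9, 2]
Y0 = [6, 2, 5, 3, 7, 1]

def ate(y1, y0):
    """tau_ATE = (1/N) * sum over i of (Y_1i - Y_0i)."""
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)

tau_ate = ate(Y1, Y0)
# Linearity of expectation: mean of differences equals difference of means.
diff_of_means = sum(Y1) / len(Y1) - sum(Y0) / len(Y0)
print(tau_ate, diff_of_means)
```

Both quantities equal 1.5 for these made-up numbers. The second form matters later: it is why an estimate of the ATE only needs good estimates of $E[Y_{1i}]$ and $E[Y_{0i}]$ separately.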

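The average treatment effect on the treated (ATT) is the average effect among treated units. It can differ from the ATE when $D_i$ and $Y_{di}$ are associated, specifically when treatment intake is related to the size of the unit effects $\tau_i$.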

$$\tau_{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} D_i [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$

where $N_1 = \sum_{i=1}^{N} D_i$ is the number of treated units.
By extension, the average treatment effect on the controls (ATC) is:
$$\tau_{ATC} = \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) [Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid D_i = 0]$$
where $N_0 = N - N_1$ is the number of control units.
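Splitting the ATE sum into treated and control units gives $\tau_{ATE} = \frac{N_1}{N}\tau_{ATT} + \frac{N_0}{N}\tau_{ATC}$. The sketch below checks this on invented data in which treatment happened to go to units with larger effects; `att` and `atc` are hypothetical helpers that mirror the two formulas term by term.

```python
# Hypothetical data (invented): treatment went to units with larger effects.
Y1 = [9, 3, 7, 3, 9, 2]
Y0 = [6, 2, 5, 3, 7, 1]
D = [1, 0, 1, 0, 1, 0]

def att(y1, y0, d):
    """tau_ATT = (1/N_1) * sum_i D_i * (Y_1i - Y_0i)."""
    n1 = sum(d)
    return sum(di * (a - b) for a, b, di in zip(y1, y0, d)) / n1

def atc(y1, y0, d):
    """tau_ATC = (1/N_0) * sum_i (1 - D_i) * (Y_1i - Y_0i)."""
    n0 = len(d) - sum(d)
    return sum((1 - di) * (a - b) for a, b, di in zip(y1, y0, d)) / n0

tau_att = att(Y1, Y0, D)
tau_atc = atc(Y1, Y0, D)

# ATE as the share-weighted average of ATT and ATC.
n, n1 = len(D), sum(D)
tau_ate = (n1 / n) * tau_att + ((n - n1) / n) * tau_atc
print(f"ATT={tau_att:.3f}  ATC={tau_atc:.3f}  ATE={tau_ate:.3f}")
```

Here the ATT (about 2.33) is well above the ATC (about 0.67) because $D_i$ is associated with $\tau_i$; the weighted average still recovers the ATE of 1.5.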

The conditional average treatment effect (CATE) is a subgroup effect: the average treatment effect among units that have particular characteristics $x$.

$$\tau_{CATE}(x) = E[Y_{1i} - Y_{0i} \mid X_i = x]$$

where $X_i$ is a pre-treatment covariate for unit $i$.
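A minimal sketch of the CATE as a within-subgroup average of unit effects, assuming a hypothetical binary covariate `X` and the same kind of invented potential outcomes as above. The `cate` helper is illustrative and groups units with `collections.defaultdict`.

```python
from collections import defaultdict

Y1 = [9, 3, 7, 3, 9, 2]
Y0 = [6, 2, 5, 3, 7, 1]
X = [1, 1, 1, 0, 0, 0]  # hypothetical binary pre-treatment covariate

def cate(y1, y0, x):
    """Return {x: E[Y_1i - Y_0i | X_i = x]} for each observed covariate value."""
    effects = defaultdict(list)
    for a, b, xi in zip(y1, y0, x):
        effects[xi].append(a - b)
    return {xi: sum(v) / len(v) for xi, v in effects.items()}

print(cate(Y1, Y0, X))
```

The two subgroup effects (2.0 and 1.0 here) differ, and because the subgroups are the same size their simple average equals the overall ATE of 1.5 for these numbers; with unequal subgroup sizes the ATE is the size-weighted average.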

The most common naive estimator is a comparison of observed outcomes for the treated and untreated:

$$\tilde{\tau} = \frac{1}{N_1} \sum_{i=1}^{N} D_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) Y_i = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$
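The naive estimator needs only observed data. The sketch below computes it on invented potential outcomes in which treated units also have higher untreated outcomes, and compares it with the true ATE, which is available only because the data are made up. `naive_difference` is a hypothetical helper.

```python
# Hypothetical potential outcomes (invented): treated units also have
# higher untreated outcomes Y_0i than control units.
Y1 = [9, 3, 7, 3, 9, 2]
Y0 = [6, 2, 5, 3, 7, 1]
D = [1, 0, 1, 0, 1, 0]
Y = [d * a + (1 - d) * b for d, a, b in zip(D, Y1, Y0)]  # observed outcomes only

def naive_difference(y, d):
    """Mean observed outcome among treated minus mean among untreated."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

tau_naive = naive_difference(Y, D)  # uses observed data only
true_ate = sum(a - b for a, b in zip(Y1, Y0)) / len(Y1)  # needs both potential outcomes
print(f"naive={tau_naive:.2f}  true ATE={true_ate:.2f}")
```

With these numbers the naive difference is about 6.33 while the true ATE is 1.5: most of the gap comes from treated units starting out with higher $Y_{0i}$, not from the treatment.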

Unfortunately, this estimator is biased if selection into treatment is associated with potential outcomes.
Proof (drawing also from Recitation 2, slide 5)
This is what we start out with:
$$\tilde{\tau} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$

Because $Y_i = Y_{1i}$ for treated units and $Y_i = Y_{0i}$ for control units, this equals:

$$\tilde{\tau} = E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$$

Now add and subtract $E[Y_{0i} \mid D_i = 1]$, the expected outcome that treatment-group units would have had without treatment. The two new terms cancel, so $\tilde{\tau}$ is unchanged.

Rearranging the terms gives:

$$\tilde{\tau} = E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$$

Here, the first half of the rearranged $\tilde{\tau}$ is the average treatment effect on the treated:

$$E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] = \tau_{ATT}$$

The second half is the selection bias, because it represents differences in expected outcomes between the treatment and control groups that are not driven by the treatment:

$$E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] = \text{Selection bias}$$
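The decomposition $\tilde{\tau} = \tau_{ATT} + \text{Selection bias}$ can be checked numerically. This sketch uses invented potential outcomes (so the counterfactual term $E[Y_{0i} \mid D_i = 1]$ is computable, unlike in real data); the variable names are illustrative.

```python
# Hypothetical potential outcomes (invented for illustration).
Y1 = [9, 3, 7, 3, 9, 2]
Y0 = [6, 2, 5, 3, 7, 1]
D = [1, 0, 1, 0, 1, 0]

def mean(values):
    return sum(values) / len(values)

treated = [i for i, d in enumerate(D) if d == 1]
control = [i for i, d in enumerate(D) if d == 0]

# Observable: E[Y_1i | D_i = 1] - E[Y_0i | D_i = 0]
naive = mean([Y1[i] for i in treated]) - mean([Y0[i] for i in control])
# E[Y_1i | D_i = 1] - E[Y_0i | D_i = 1]
att = mean([Y1[i] - Y0[i] for i in treated])
# E[Y_0i | D_i = 1] - E[Y_0i | D_i = 0]  (uses the unobservable Y_0i of treated units)
selection_bias = mean([Y0[i] for i in treated]) - mean([Y0[i] for i in control])

print(naive, att, selection_bias, att + selection_bias)
assert abs(naive - (att + selection_bias)) < 1e-9
```

For these numbers the ATT is about 2.33 and the selection bias is 4.0, which together reproduce the naive difference of about 6.33.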

The naive estimator therefore identifies the ATT only when the selection bias is zero, which happens only when the expected untreated outcome for the treatment group equals the expected untreated outcome for the control group:

$$E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0] = E[Y_{0i}]$$

Meanwhile (still following the recitation), the naive estimator identifies the ATE when both

$$E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0] = E[Y_{0i}]$$

AND

$$E[Y_{1i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 0] = E[Y_{1i}]$$

Research design can make it more likely that these conditions are met:
Best: Researcher randomizes the treatment.
Next best: Treatment assignment process is quasi-random and well understood (“natural experiments”)
Not so great: Treatment is “as if” random after statistical control (regression, matching)
Worst: Treatment is self-selected and no plausible control is available.
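A small simulation contrasting the "Best" and "Worst" designs above, under an assumed data-generating process: standard-normal untreated outcomes and a constant effect of 2 for every unit (so ATE = ATT = 2). Under self-selection, units with $Y_{0i} > 0$ take the treatment; under randomization, a fair coin decides, independently of the potential outcomes. The process, the seed, and the helper `naive` are illustrative assumptions; the middle designs are not simulated.

```python
import random
import statistics

random.seed(0)
N = 100_000

# Hypothetical data-generating process (an assumption for illustration):
# untreated outcomes vary across units and every unit's effect is exactly 2.
y0 = [random.gauss(0.0, 1.0) for _ in range(N)]
y1 = [v + 2.0 for v in y0]
TRUE_ATE = 2.0

def naive(d):
    """Difference in mean observed outcomes between treated and untreated."""
    treated = [a for a, di in zip(y1, d) if di == 1]
    control = [b for b, di in zip(y0, d) if di == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Self-selection: units with a high untreated outcome opt into treatment,
# so E[Y_0i | D_i = 1] > E[Y_0i | D_i = 0].
self_selected = [1 if v > 0 else 0 for v in y0]

# Randomization: a fair coin flip, independent of (Y_0i, Y_1i).
randomized = [random.randint(0, 1) for _ in range(N)]

print(f"true ATE:              {TRUE_ATE:.3f}")
print(f"naive, self-selected:  {naive(self_selected):.3f}")
print(f"naive, randomized:     {naive(randomized):.3f}")
```

Under self-selection the naive estimate is inflated by roughly $2\sqrt{2/\pi} \approx 1.6$ (the gap in mean $Y_{0i}$ between the two halves of a standard normal), while under randomization both identification conditions hold in expectation and the estimate lands near 2, up to sampling noise.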