Matching and The Propensity Score Handout
Alberto Abadie
MIT
Alternatives to regression:
Subclassification
Matching
Definition (Outcomes)
Those variables, $Y$, that are (possibly) not predetermined are called outcomes (for some individual $i$, $Y_{0i} \neq Y_{1i}$).
In general, one should not condition on outcomes, because this
may induce bias
Randomization implies $(Y_1, Y_0) \perp D$. Therefore:
$$\begin{aligned}
E[Y \mid D = 1] - E[Y \mid D = 0] &= E[Y_1 \mid D = 1] - E[Y_0 \mid D = 0] \\
&= E[Y_1] - E[Y_0] \\
&= E[Y_1 - Y_0].
\end{aligned}$$
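As a numerical check, here is a minimal simulation sketch (not part of the handout; all names and values are illustrative): when assignment is independent of the potential outcomes, the raw difference in means recovers $E[Y_1 - Y_0]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y0 = rng.normal(0.0, 1.0, n)       # potential outcome without treatment
y1 = y0 + 2.0                      # potential outcome with treatment; true ATE = 2
d = rng.binomial(1, 0.5, n)        # randomized assignment, independent of (y0, y1)
y = np.where(d == 1, y1, y0)       # observed outcome

# difference in means ~ E[Y1 - Y0] = 2
print(y[d == 1].mean() - y[d == 0].mean())
```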
[DAG: treatment $D$, outcome $Y$, and covariates $V$, $X$, $U$; $X$ is a common cause of $D$ and $Y$]
X is a confounder, V and U are not.
Conditional on X there is no confounding.
Correlation between Y and D conditional on X is reflective of
the effect of D on Y . That is:
$$(Y_1, Y_0) \perp D \mid X.$$
Identification Result
Given selection on observables, we have
$$\begin{aligned}
E[Y_1 - Y_0 \mid X] &= E[Y_1 - Y_0 \mid X, D = 1] \\
&= E[Y \mid X, D = 1] - E[Y \mid X, D = 0].
\end{aligned}$$
Identification Result
Similarly,
$$\begin{aligned}
\alpha_{ATET} &= E[Y_1 - Y_0 \mid D = 1] \\
&= \int \big( E[Y \mid X, D = 1] - E[Y \mid X, D = 0] \big)\, dP(X \mid D = 1).
\end{aligned}$$
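With a discrete covariate, the ATET formula above can be computed by subclassification: average the within-cell mean differences over the distribution of $X$ among the treated. A hedged sketch (pandas; the function and column names are assumptions for illustration):

```python
import pandas as pd

def atet_by_subclassification(df: pd.DataFrame) -> float:
    """df has columns 'y' (outcome), 'd' (0/1 treatment), 'x' (discrete covariate)."""
    # E[Y | X, D = 1] - E[Y | X, D = 0] within each cell of x
    cell_means = df.groupby(["x", "d"])["y"].mean().unstack("d")
    diff = cell_means[1] - cell_means[0]
    # weights: P(X | D = 1), the covariate distribution among the treated
    weights = df.loc[df["d"] == 1, "x"].value_counts(normalize=True)
    # cells with no controls are dropped; in practice this signals poor overlap
    return float((diff * weights).dropna().sum())
```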
Matching
[Figure: histograms of age (20–60) by group. Panel A: Trainees; Panel B: Non-Trainees]
[Figure: a second set of histograms of age (20–60) by group. Panel A: Trainees; Panel B: Non-Trainees]
Treatment effect estimates
Before matching:
After matching:
Matching
Works well when we can find good matches for each treated unit, so the number of matches per treated unit, M, is usually small (typically, M = 1 or M = 2).
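A minimal sketch of one-to-one nearest-neighbor matching (M = 1, with replacement) for the ATET, assuming numpy arrays; this is illustrative, not the handout's implementation:

```python
import numpy as np

def atet_nn_matching(y: np.ndarray, d: np.ndarray, x: np.ndarray) -> float:
    """y: outcomes; d: 0/1 treatment; x: (n, k) covariate matrix."""
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    effects = []
    for i in treated:
        # nearest control in Euclidean distance (see the appendix on metrics)
        dists = np.linalg.norm(x[controls] - x[i], axis=1)
        j = controls[np.argmin(dists)]
        effects.append(y[i] - y[j])
    return float(np.mean(effects))
```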
Matching
Then, writing $\mu_d(x) = E[Y_d \mid X = x]$, $\varepsilon_i = Y_i - \mu_{D_i}(X_i)$, $j(i)$ for the control matched to treated unit $i$, and $N_1$ for the number of treated units,
$$\begin{aligned}
\hat\alpha_{ATET} - \alpha_{ATET} ={}& \frac{1}{N_1} \sum_{D_i = 1} \big( \mu_1(X_i) - \mu_0(X_i) - \alpha_{ATET} \big) \\
&+ \frac{1}{N_1} \sum_{D_i = 1} \big( \varepsilon_i - \varepsilon_{j(i)} \big) \\
&+ \frac{1}{N_1} \sum_{D_i = 1} \big( \mu_0(X_i) - \mu_0(X_{j(i)}) \big).
\end{aligned}$$
Matching: Bias
Now, if k (the dimension of X) is large:
⇒ The difference between $X_i$ and $X_{j(i)}$ converges to zero very slowly.
⇒ The difference $\mu_0(X_i) - \mu_0(X_{j(i)})$ converges to zero very slowly.
⇒ $E[\sqrt{N_1}\,(\mu_0(X_i) - \mu_0(X_{j(i)})) \mid D = 1]$ may not converge to zero!
⇒ $E[\sqrt{N_1}\,(\hat\alpha_{ATET} - \alpha_{ATET})]$ may not converge to zero!
The matching discrepancies $\mu_0(X_i) - \mu_0(X_{j(i)})$ contribute to the bias.
Bias-corrected matching:
$$\hat\alpha_{ATET}^{BC} = \frac{1}{N_1} \sum_{D_i = 1} \Big[ \big( Y_i - Y_{j(i)} \big) - \big( \hat\mu_0(X_i) - \hat\mu_0(X_{j(i)}) \big) \Big]$$
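A hedged sketch of this correction, with $\hat\mu_0$ estimated by a linear regression on the control sample (function and variable names are illustrative assumptions):

```python
import numpy as np

def atet_matching_bias_corrected(y: np.ndarray, d: np.ndarray, x: np.ndarray) -> float:
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    # linear model for mu_0, fit on controls: mu0_hat(v) = [1, v] @ beta
    design = np.column_stack([np.ones(len(controls)), x[controls]])
    beta, *_ = np.linalg.lstsq(design, y[controls], rcond=None)
    mu0 = lambda v: np.concatenate(([1.0], np.atleast_1d(v))) @ beta
    effects = []
    for i in treated:
        j = controls[np.argmin(np.linalg.norm(x[controls] - x[i], axis=1))]
        # matched difference minus the estimated matching discrepancy
        effects.append((y[i] - y[j]) - (mu0(x[i]) - mu0(x[j])))
    return float(np.mean(effects))
```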
Propensity score
Because of the Rosenbaum–Rubin result, if $(Y_1, Y_0) \perp D \mid X$, then
$$(Y_1, Y_0) \perp D \mid p(X).$$
And the results follow from integration over P(X ) and P(X |D = 1).
$$\alpha_{ATE} = E\left[ Y \, \frac{D - p(X)}{p(X)\big(1 - p(X)\big)} \right]$$
$$\alpha_{ATET} = \frac{1}{P(D = 1)} \, E\left[ Y \, \frac{D - p(X)}{1 - p(X)} \right]$$
$$\hat\alpha_{ATE} = \frac{1}{N} \sum_{i=1}^{N} Y_i \, \frac{D_i - \hat p(X_i)}{\hat p(X_i)\big(1 - \hat p(X_i)\big)}$$
$$\hat\alpha_{ATET} = \frac{1}{N_1} \sum_{i=1}^{N} Y_i \, \frac{D_i - \hat p(X_i)}{1 - \hat p(X_i)}$$
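A sketch of these plug-in estimators with $\hat p(X)$ from a logistic regression (scikit-learn; names are illustrative, and note that sklearn's default fit includes an L2 penalty):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_estimates(y: np.ndarray, d: np.ndarray, x: np.ndarray):
    # estimated propensity score p_hat(X_i)
    p_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
    ate = np.mean(y * (d - p_hat) / (p_hat * (1.0 - p_hat)))
    atet = np.sum(y * (d - p_hat) / (1.0 - p_hat)) / d.sum()
    return ate, atet
```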
In the DAG below ($Z \to X \to Y$, with $U \to X$ and $U \to Y$):
U is a parent of X and Y .
X and Y are descendants of Z .
There is a directed path from Z to Y .
There are two paths from Z to U (but no directed path).
X is a collider of the path Z → X ← U.
X is a noncollider of the path Z → X → Y .
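These relationships can be verified mechanically; a sketch using networkx (a library choice of ours, not the handout's) with the DAG as read off above:

```python
import networkx as nx

# Z -> X -> Y, with U -> X and U -> Y
G = nx.DiGraph([("Z", "X"), ("X", "Y"), ("U", "X"), ("U", "Y")])

print(list(G.predecessors("X")))   # parents of X: Z and U
print(nx.descendants(G, "Z"))      # descendants of Z: {'X', 'Y'}
print(nx.has_path(G, "Z", "Y"))    # directed path from Z to Y: True
print(nx.has_path(G, "Z", "U"))    # no directed path from Z to U: False
```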
Confounding
Confounding arises when the treatment and the outcome have
common causes.
[DAG: $X \to D$, $X \to Y$, and $D \to Y$]
The association between D and Y then reflects more than the causal effect of D on Y.
Confounding creates backdoor paths, that is, paths from D to Y that start with an arrow pointing into D. In the DAG we can see a backdoor path from D to Y ($D \leftarrow X \to Y$).
However, once we “block” the backdoor path by conditioning
on the common cause, X , the association between D and Y is
only reflective of the effect of D on Y .
[DAG: the same graph, now conditioning on X]
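A numerical sketch of this blocking (illustrative names and values, not from the handout): X confounds D and Y, the raw contrast is biased, and conditioning on X recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, n)                   # confounder
d = rng.binomial(1, 0.2 + 0.6 * x)            # treatment depends on x
y = 1.0 * d + 2.0 * x + rng.normal(size=n)    # true effect of d is 1.0

naive = y[d == 1].mean() - y[d == 0].mean()   # biased upward by the backdoor path
within = np.mean([y[(d == 1) & (x == v)].mean() - y[(d == 0) & (x == v)].mean()
                  for v in (0, 1)])            # ~1.0 once we condition on x
print(naive, within)
```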
Blocked paths
A path is blocked if and only if:
It contains a noncollider that has been conditioned on,
Or, it contains a collider that has not been conditioned on and
has no descendants that have been conditioned on.
Examples:
1. Conditioning on a noncollider blocks a path:
[DAG: path $X$ – $Z$ – $Y$ with the noncollider $Z$ conditioned on]
2. Conditioning on a collider opens a path:
[DAG: $Z \to X \leftarrow Y$ with the collider $X$ conditioned on]
3. Not conditioning on a collider (or its descendants) leaves a path blocked:
[DAG: $Z \to X \leftarrow Y$ with $X$ not conditioned on]
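Example 2 in numbers (a sketch; all values are illustrative): Z and Y are independent, and $X = Z + Y + \text{noise}$ is a collider. Including X in a regression of Y on Z opens the path.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
y = rng.normal(size=n)                 # independent of z
x = z + y + rng.normal(size=n)         # collider: z -> x <- y

print(np.corrcoef(z, y)[0, 1])         # ~0: the path is blocked
# include the collider x as a regressor: the coefficient on z becomes ~ -0.5
design = np.column_stack([np.ones(n), z, x])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coefs[1])
```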
Backdoor criterion
Suppose that:
D is a treatment,
Y is an outcome,
$X_1, \ldots, X_k$ is a set of covariates.
The set $X_1, \ldots, X_k$ satisfies the backdoor criterion if (i) no $X_j$ is a descendant of $D$, and (ii) the $X_j$ block every backdoor path from $D$ to $Y$. Conditioning on such a set identifies the causal effect of $D$ on $Y$.
[Example DAGs with covariates $X_1$, $X_2$, and an unobservable $U$]
Conditioning on X1 blocks the backdoor path. Conditioning
on X2 would open a path!
Matching on all pretreatment covariates is not always the answer:
[DAG: $U_1 \to D$, $U_1 \to X$, $U_2 \to X$, $U_2 \to Y$]
There is one backdoor path, $D \leftarrow U_1 \to X \leftarrow U_2 \to Y$, and it is closed. No confounding. Conditioning on X would open a path!
Implications for practice (cont.)
[Two example DAGs with covariates $X_2$ and $X_3$]
Appendix:
Matching Distance Metric
Matching: Distance metric
When the vector of matching covariates, $X = (X_1, X_2, \ldots, X_k)'$, contains more than one variable, matches are typically constructed with the normalized Euclidean distance,
$$\|X_i - X_j\| = \sqrt{(X_i - X_j)' \, \hat V^{-1} (X_i - X_j)},$$
where
$$\hat V = \begin{pmatrix}
\hat\sigma_1^2 & 0 & \cdots & 0 \\
0 & \hat\sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \hat\sigma_k^2
\end{pmatrix}.$$
Notice that the normalized Euclidean distance is equal to
$$\|X_i - X_j\| = \sqrt{\sum_{n=1}^{k} \frac{(X_{ni} - X_{nj})^2}{\hat\sigma_n^2}}.$$
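A small sketch of this formula (the function name is an assumption): each covariate is scaled by its estimated standard deviation before taking distances.

```python
import numpy as np

def normalized_euclidean(xi: np.ndarray, xj: np.ndarray, x_sample: np.ndarray) -> float:
    """x_sample is the (n, k) data used to estimate the variances sigma_n^2."""
    var = x_sample.var(axis=0, ddof=1)          # diagonal of V-hat
    return float(np.sqrt(np.sum((xi - xj) ** 2 / var)))
```

Scaling by $\hat\sigma_n$ keeps covariates measured in large units (e.g., earnings) from dominating covariates measured in small units (e.g., years of schooling).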