Propensity Scores
[Figure: Surgery-Drug Treatment Effect]
Ceteris Paribus
Other Things Being Equal
• Suppose we are comparing a treated group to a
control group, and we want to know if the
treatment has a causal effect on the outcome.
• To be fair, we must compare treated and control subjects
who are similar in terms of everything that affects
the outcome, except for the receipt of treatment.
• How do we, as statisticians, typically recommend
that people do this?
Importance of Randomization
[Figure omitted: diagram of subjects labeled 1 and 2.]
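Randomization balances covariates between groups in expectation, whether or not we measured them. The following minimal simulation sketch makes that concrete. It assumes a hypothetical binary prognostic covariate with 30% prevalence and, for the self-selected scenario, assumed treatment probabilities of 60% (covariate present) versus 20% (absent). Balance is measured with the standardized difference for binary variables used in the aspirin example.

```python
import math
import random


def std_diff_binary(x_treated, x_control):
    """Standardized difference (%) for a 0/1 covariate, averaged-variance form."""
    p_t = sum(x_treated) / len(x_treated)
    p_c = sum(x_control) / len(x_control)
    pooled_sd = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    return 100 * (p_t - p_c) / pooled_sd


def simulate(n=10_000, p_x=0.3, seed=1):
    rng = random.Random(seed)
    # Hypothetical binary covariate (e.g. prior disease) with prevalence p_x.
    x = [1 if rng.random() < p_x else 0 for _ in range(n)]

    # Randomized: a fair coin assigns treatment, ignoring x entirely.
    coin = [rng.random() < 0.5 for _ in range(n)]
    rand_t = [xi for xi, t in zip(x, coin) if t]
    rand_c = [xi for xi, t in zip(x, coin) if not t]

    # Self-selected (assumed probabilities): x = 1 subjects are treated
    # 60% of the time, x = 0 subjects only 20% of the time.
    chosen = [rng.random() < (0.6 if xi else 0.2) for xi in x]
    sel_t = [xi for xi, t in zip(x, chosen) if t]
    sel_c = [xi for xi, t in zip(x, chosen) if not t]

    return std_diff_binary(rand_t, rand_c), std_diff_binary(sel_t, sel_c)


d_random, d_selected = simulate()
print(f"randomized:    {d_random:6.1f}%")
print(f"self-selected: {d_selected:6.1f}%")
```

Under these assumptions the randomized comparison lands within a few percent of zero. The self-selected one lands near 87%, far past the 10% imbalance threshold.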
How Are Experiments Designed?
[Figure: three panels, each comparing Not Treated and Treated subjects.]
Aspirin Use and Mortality
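$$
d = \frac{100\,(p_T - p_C)}{\sqrt{\dfrac{p_T(1 - p_T) + p_C(1 - p_C)}{2}}} \quad \text{for binary variables}
$$
where p_T = p_Treatment and p_C = p_Control are the proportions in the treated and control groups.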
|Standardized Differences| > 10%
Indicate Serious Imbalance
Before Match:
– 811/2310 (35.1%) Aspirin users used β-blockers
– 550/3864 (14.2%) non-Aspirin users used β-blockers
– Standardized Difference is 49.9%
– P value for difference is < .001
After Match:
– 352/1351 (26.1%) Aspirin users used β-blockers
– 358/1351 (26.5%) non-Aspirin users used β-blockers
– Standardized Difference is –1.0%
– P value for difference is .79
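The β-blocker figures above can be checked directly against the standardized-difference formula on this slide. The sketch below recomputes both standardized differences. The slides do not say which test produced the p-values, so here a pooled two-proportion z-test is an assumption. It gives p well below .001 before matching and about .79 after. For matched pairs, a paired test such as McNemar's would arguably be more appropriate.

```python
import math


def std_diff_binary(p_t, p_c):
    """Standardized difference (%) between two proportions."""
    return 100 * (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)


def two_prop_p_value(x_t, n_t, x_c, n_c):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    pooled = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Beta-blocker users: (aspirin users, n aspirin, non-users, n non-aspirin)
counts = {
    "before": (811, 2310, 550, 3864),
    "after": (352, 1351, 358, 1351),
}

results = {}
for label, (x_t, n_t, x_c, n_c) in counts.items():
    d = std_diff_binary(x_t / n_t, x_c / n_c)
    p = two_prop_p_value(x_t, n_t, x_c, n_c)
    results[label] = (d, p)
    flag = "serious imbalance" if abs(d) > 10 else "ok"
    print(f"{label:>6}: d = {d:5.1f}%  p = {p:.3g}  ({flag})")
```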
Covariate Balance for Aspirin Study
[Figure: Asp - NoAsp standardized difference (%) before matching, for each covariate: PriorCAD, PriorPCInterv, PriorCABG, LipidLow, MayoRisk, Age, BetaBl, %Men, StressIsche, EchoLV, PriorQMI, Diltiazem, Hypertension, Ischemic, Diabetes, RSysBP, Nifedipine, ACEinh, Digoxin, CongestHF, ChestPain, AtrialFib, Fitness, Tobacco, RDiasBP, BMI, HRRecov, PeakExMen, PeakExWom, RestHrtRate, EjectionFrac; x-axis from -50 to 100.]
Covariate Balance for Aspirin Study
[Figure: Asp - NoAsp standardized difference (%) before and after matching, for the same covariates; x-axis from -50 to 100. Annotation: HRRecov Stdzd Diff = 16%.]
Absolute Standardized Differences
[Figure: Asp - NoAsp absolute standardized difference (%) before and after matching, covariates listed from PriorCAD (top) to AtrialFib (bottom); x-axis from 0 to 120.]
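A plot like this amounts to a balance table. Covariates are ranked by their absolute before-match difference, and any that still exceed 10% after matching are flagged. The following is a minimal sketch of that bookkeeping, assuming the standardized differences have already been computed. The BetaBl values come from the β-blocker slide. "CovariateX" and its numbers are purely hypothetical placeholders, not study data.

```python
def balance_report(std_diffs, threshold=10.0):
    """Rank covariates by |before-match standardized difference| (largest
    first) and list those whose |after-match difference| exceeds threshold.

    std_diffs maps covariate name -> (before %, after %).
    """
    ranked = sorted(std_diffs, key=lambda name: abs(std_diffs[name][0]), reverse=True)
    still_imbalanced = [name for name in ranked if abs(std_diffs[name][1]) > threshold]
    return ranked, still_imbalanced


# BetaBl is from the beta-blocker slide; CovariateX is hypothetical.
diffs = {
    "BetaBl": (49.9, -1.0),
    "CovariateX": (30.0, 16.0),
}
ranked, flagged = balance_report(diffs)
for name in ranked:
    before, after = diffs[name]
    print(f"{name:<12} before {abs(before):5.1f}%  after {abs(after):5.1f}%")
print("Still above 10% after matching:", flagged)
```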
Incomplete vs. Inexact Matching
• Trade-off between
– Failing to match all treated subjects (incomplete)
– Matching dissimilar subjects (inexact matching)
• There can be a severe bias due to incomplete matching
– it’s often better to match all treated subjects, then
follow with analytical adjustments for residual
imbalances in the covariates.
• In practice, most of the concern has been with inexactness.
• It is certainly worthwhile to define the comparison group
carefully and to explore why some subjects match and others do not.
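The slides do not specify a matching algorithm, so the following is only a sketch under assumptions. It uses greedy 1:1 nearest-neighbour matching on the propensity score, without replacement, with a caliper (a maximum allowed score gap). The caliper is the knob in the incomplete-versus-inexact trade-off: a narrow caliper leaves some treated subjects unmatched, and a wide one matches everyone but accepts dissimilar pairs. The function name, the highest-score-first ordering, and the example scores are all hypothetical.

```python
def greedy_caliper_match(ps_treated, ps_control, caliper):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. A treated subject stays unmatched when no
    remaining control lies within `caliper` of its score.

    Returns (pairs, unmatched); pairs holds (treated_idx, control_idx).
    """
    available = set(range(len(ps_control)))
    pairs, unmatched = [], []
    # Assumed heuristic: handle the highest (hardest-to-match) scores first.
    order = sorted(range(len(ps_treated)), key=lambda i: -ps_treated[i])
    for t in order:
        best = min(sorted(available),
                   key=lambda c: abs(ps_control[c] - ps_treated[t]),
                   default=None)
        if best is not None and abs(ps_control[best] - ps_treated[t]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
        else:
            unmatched.append(t)
    return pairs, unmatched


# Hypothetical propensity scores.
treated = [0.90, 0.60, 0.42]
controls = [0.57, 0.45, 0.30, 0.10]
for caliper in (0.05, 1.0):
    pairs, unmatched = greedy_caliper_match(treated, controls, caliper)
    worst = max((abs(treated[t] - controls[c]) for t, c in pairs), default=0.0)
    print(f"caliper={caliper}: matched={len(pairs)}, unmatched={unmatched}, worst gap={worst:.2f}")
```

With the narrow caliper, the treated subject at 0.90 goes unmatched (incomplete). With the wide one, all three are matched, but one pair differs by 0.33 in propensity score (inexact).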
What if Treated and Untreated Groups
Don’t Overlap Completely?
[Figure: propensity score (0 to 1) for Not Treated and Treated groups.]
• Inferences for the causal effects
of treatment on the subjects
with no overlap cannot be
drawn without heroic
modeling assumptions.
• Usually, we would exclude these
treated subjects and discuss them
separately.
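One simple way to find the non-overlapping treated subjects is to keep only those whose propensity scores fall within the range of the control group's scores. This min/max rule is an assumed convention for illustration; other trimming rules exist. The function name and scores below are hypothetical.

```python
def split_by_overlap(ps_treated, ps_control):
    """Split treated subjects into those inside the range of control
    propensity scores (common support) and those outside it."""
    lo, hi = min(ps_control), max(ps_control)
    inside = [i for i, p in enumerate(ps_treated) if lo <= p <= hi]
    outside = [i for i, p in enumerate(ps_treated) if not lo <= p <= hi]
    return inside, outside


# Hypothetical propensity scores.
treated = [0.35, 0.62, 0.88, 0.97]
controls = [0.05, 0.20, 0.41, 0.70]
inside, outside = split_by_overlap(treated, controls)
print("analyze:", inside)
print("describe separately:", outside)
```

Here the two treated subjects scoring above the highest control score (0.70) fall outside the overlap.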
Which Aspirin Users Get Matched?
• Generally, characteristics of unmatched aspirin users
tend to indicate high propensity scores.
– Overall, 37% of patients were taking aspirin.
– The rate was much higher in some populations:
67% of Prior CAD patients were taking aspirin.
– So prior CAD patients had higher propensity scores for
aspirin use.
– Of the unmatched aspirin users, 99.8% (957/959) had prior
coronary artery disease.
– So it’s likely that the unmatched users tended towards larger
propensity scores than the matched users.
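A deliberately simplified calculation shows why high-propensity aspirin users are hard to match. Assume prior CAD were the only covariate. Then a saturated propensity model gives every patient in that stratum the stratum's aspirin rate as their score. Assume also that matching is 1:1 within the stratum, without replacement. At a 67% rate, treated patients outnumber controls about 2 to 1, so at most about half of them can be matched. The 100-patient stratum is hypothetical, and the real study used many covariates.

```python
def max_matched_fraction(n_treated, n_control):
    """Largest fraction of treated subjects that can receive a distinct
    control when 1:1 matching is confined to a single stratum."""
    return min(n_treated, n_control) / n_treated


# Hypothetical stratum of 100 prior-CAD patients at the reported 67% rate.
n_treated, n_control = 67, 33
ps = n_treated / (n_treated + n_control)  # stratum propensity score
print(f"propensity score: {ps:.2f}")
print(f"at most {max_matched_fraction(n_treated, n_control):.0%} of these aspirin users can be matched")
```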
Who’s Getting Matched Here?
Where Do The Propensity Scores Overlap?
[Figure: propensity to use aspirin.]
Caveat: This simulation depicts what often happens.
[Figure: relative risk (axis 1.2 to 2.0) for Crude, Severity, Selection, and Both adjustments.]
• Effect of rehabilitation on discharge to home
from nursing home
– Murray et al. (2003) Arch Phys Med Rehab.
• Severity – risk-adjusted
• Selection – PS-adjusted
• Selection bias blunts the rehab effect size.
On Planning an Observational Study
(Rosenbaum, 2002)
• A convincing OS is the result of active observation,
a search for those rare circumstances in which
tangible evidence may be obtained to distinguish
treatment effects from the most plausible biases.
• Experimental control is replaced in a good OS by
careful choice of environment. Design is crucial!
– Options narrow as an investigation proceeds.
– Propensity score methods are reasonable with large samples,
especially if we have a good selection model using
multiple covariates.
What should always be done in an OS …
and often isn’t?
• http://www.chrp.org/propensity
• Free issue of Health Services and Outcomes Research
Methodology (December 2001, vol. 2, issues 3-4) – go to
http://www.kluweronline.com/issn/1387-3741
– I especially recommend the article by DB Rubin (using PS
in designing a complex observational study) and the article
by Landrum and Ayanian (PS vs. Instrumental Variables)
• Rosenbaum PR (2002) Observational Studies, Springer.
• Email me at [email protected]