Propensity Scores: What Do They Do, How Should I Use Them, and Why Should I Care?
Thomas E. Love, PhD
Center for Health Care Research & Policy, Case Western Reserve University
ASA Cleveland Chapter, December 2003
Ceteris Paribus
Importance of Randomization
We usually assign treatments at random. Except sometimes, we can't. Or we just don't. And sometimes, these are the situations we care most about. So how can we assess causal effects in observational studies?
Observational Studies
In an observational study, the researcher does not randomly allocate the treatments.
CHD risk for patients using HRT was 1.29 times as high as the risk for patients not using HRT.
1. Thou Shalt Value Parsimony
2. Thou Shalt Examine Thy Predictors For Collinearity
3. Thou Shalt Test All Thy Predictors For Statistical Significance
4. Thou Shalt Have Ten Times As Many Subjects As Predictors
5. Thou Shalt Carefully Examine Thy Regression Coefficients (Beta Weights)
6. Thou Shalt Perform Bootstrap Analyses To Assess Shrinkage
7. Thou Shalt Perform Regression Diagnostics and Examine Residuals With Care
8. Thou Shalt Hold Out A Sample of Thy Data for Cross-Validation
9. Thou Shalt Perform External Validation on a New Sample of Data
10. Thou Shalt Ignore Commandments 1 through 9 And Instead Simply Ensure That The Model Adequately Balances The Covariates
Apologies to Joe Schafer
There is no reason that absence of significance implies the imbalance is small enough to be ignored. Doesn't consider the covariate-to-outcome relationship. This process considers covariates one at a time, while the PS adjustments will control the covariates simultaneously.
If those who receive treatment don't overlap (in terms of covariates) with those who receive the control, we've got nothing to compare. Modeling, no matter how sophisticated, can't help us to develop information out of thin air.
Not much help. The information available to infer treatment effect will reside almost entirely in the few patients who overlap. Need to think hard about whether useful inferences will be possible.
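One way to look at overlap before modeling is to compare the range of estimated propensity scores in the two groups. The sketch below is only an illustration under stated assumptions: the min/max "common support" rule is one common convention, not necessarily the one used in this talk, and the function names and scores are hypothetical.

```python
def common_support(ps_treated, ps_control):
    """Return (low, high): the score range covered by BOTH groups.

    Assumption: overlap is judged by the simple min/max rule.
    If low > high, the two groups do not overlap at all.
    """
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return low, high


def share_in_support(scores, low, high):
    """Fraction of subjects whose score lies inside [low, high]."""
    return sum(low <= s <= high for s in scores) / len(scores)


# Hypothetical propensity scores, made up for illustration only.
ps_treated = [0.35, 0.50, 0.62, 0.80, 0.93]
ps_control = [0.05, 0.12, 0.30, 0.41, 0.66]

low, high = common_support(ps_treated, ps_control)
print(low, high)                                    # 0.35 0.66
print(share_in_support(ps_treated, low, high))      # 0.6
```

If only a small share of treated subjects falls inside the common range, nearly all the information about the treatment effect sits in those few subjects.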
Baseline characteristics appear very dissimilar: 25 of 31 covariates have p < .001, 28 of 31 have p < .05. Aspirin user covariates indicate higher mortality risk.
Baseline characteristics similar in matched users and non-users. 30 of 31 covariates show NS difference between matched users and non-users. [Peak exercise capacity for men is p = .01]
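$$d = \frac{100\,(\bar{x}_{Treatment} - \bar{x}_{Control})}{\sqrt{\dfrac{s^2_{Treatment} + s^2_{Control}}{2}}} \quad \text{for continuous variables}$$

$$d = \frac{100\,(p_{Treatment} - p_{Control})}{\sqrt{\dfrac{p_T(1 - p_T) + p_C(1 - p_C)}{2}}} \quad \text{for binary variables}$$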
Using Standardized Differences to Measure Covariate Balance
Standardized differences greater than 10% in absolute value indicate serious imbalance.
After Match:
352/1351 (26.1%) Aspirin users used β-blockers
358/1351 (26.5%) non-Aspirin users used β-blockers
Standardized Difference is 1.0% (in absolute value)
P value for difference is .79
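The standardized difference formulas can be checked directly against the β-blocker counts above. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the sign convention (treated minus control) is an assumption taken from the order of terms in the formula.

```python
import math
from statistics import mean, variance


def std_diff_continuous(treated, control):
    """Standardized difference (%) for a continuous covariate."""
    # Pool the two sample variances by simple averaging, as in the formula.
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return 100 * (mean(treated) - mean(control)) / pooled_sd


def std_diff_binary(p_treated, p_control):
    """Standardized difference (%) for a binary covariate, from proportions."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    return 100 * (p_treated - p_control) / pooled_sd


# Beta-blocker use after matching: 352/1351 users vs. 358/1351 non-users.
d = std_diff_binary(352 / 1351, 358 / 1351)
print(f"{d:.1f}%")  # prints -1.0%
```

The result is about 1.0% in absolute value, well under the 10% threshold for serious imbalance.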
[Figure: Asp - No Asp Standardized Difference (%), panel "Before Match"]
There can be a severe bias due to incomplete matching. It's often better to match all treated subjects, then follow with analytical adjustments for residual imbalances in the covariates. In practice, the concern has been inexactness. It is certainly worthwhile to define the comparison group and carefully explore why subjects match.
Inferences for the causal effects of treatment on the subjects with no overlap cannot be drawn without heroic modeling assumptions. Usually, we'd exclude these treated subjects, and explain separately.
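The trade-off between incomplete and inexact matching can be made concrete with a small sketch. The code below is not the matching procedure used in the studies discussed here. It is a hypothetical greedy 1:1 nearest-neighbor match on the propensity score with a caliper, and the function name, caliper value, and scores are all assumptions for illustration.

```python
def greedy_match(ps_treated, ps_control, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (pairs, unmatched): pairs is a list of
    (treated_index, control_index); unmatched lists treated indices
    with no available control within the caliper.
    """
    available = dict(enumerate(ps_control))  # controls not yet used
    pairs, unmatched = [], []
    for t_idx, t_score in enumerate(ps_treated):
        if not available:
            unmatched.append(t_idx)
            continue
        # Closest remaining control on the propensity score.
        c_idx = min(available, key=lambda j: abs(available[j] - t_score))
        if abs(available[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]
        else:
            unmatched.append(t_idx)
    return pairs, unmatched


# Hypothetical scores, made up for illustration only.
ps_treated = [0.30, 0.55, 0.95]
ps_control = [0.10, 0.32, 0.50, 0.58]

pairs, unmatched = greedy_match(ps_treated, ps_control, caliper=0.05)
print(pairs)      # [(0, 1), (1, 3)]
print(unmatched)  # [2]
```

A tighter caliper gives more exact matches but leaves more treated subjects unmatched; a looser one matches more subjects less exactly. The unmatched treated subjects are the ones to set aside and explain separately.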
[Figure: comparison of Non-users and Users]
During follow-up, 153 (6%) of the 2702 propensity score-matched patients died. Aspirin use was associated with a lower risk of death in the matched group (4% vs. 8%, p = .002).
[Figure: Relative risk of death by 30 days for patients with pneumonia vs. patients without. Bars: Crude, Severity (risk-adjusted), Selection (PS-adjusted), Both.]
[Figure: Relative risk. Bars: Crude, Severity (risk-adjusted), Selection (PS-adjusted), Both.]
Selection bias blunts the rehab effect size.