Missing Data
Example: Longitudinal Data with Dropout (Hedeker and Gibbons, 1997)
Bias when ignoring subjects with missing data: a simulation study
• True model:
  X ~ N(0, 1)
  logit[Pr(E=1|X)] = 0.5 + X
  logit[Pr(D=1|E,X)] = 0.25 + 0.5X + 1.1E
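A minimal sketch of this data-generating model in Python (the sample size n = 1000 is an assumption; the slide does not state one):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # assumed; not given on the slide

x = rng.normal(0.0, 1.0, n)                                          # X ~ N(0, 1)
e = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x))))                    # logit Pr(E=1|X) = 0.5 + X
d = rng.binomial(1, 1 / (1 + np.exp(-(0.25 + 0.5 * x + 1.1 * e))))   # logit Pr(D=1|E,X)
```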
Missing-Data Mechanism
• D and E: completely observed
• X: sometimes missing
• Values of X in each (D, E) cell are set to missing with the following underlying probabilities:
  D=0, E=0: p00 = 0.19
  D=0, E=1: p01 = 0.09
  D=1, E=0: p10 = 0.015
  D=1, E=1: p11 = 0.055
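Continuing the sketch above, the deletion step with these cell probabilities (d, e, x, rng, n come from the previous block):

```python
# probability that X is missing, by (D, E) cell
p_miss = {(0, 0): 0.19, (0, 1): 0.09, (1, 0): 0.015, (1, 1): 0.055}

cell_p = np.array([p_miss[(di, ei)] for di, ei in zip(d, e)])
x_obs = np.where(rng.uniform(size=n) < cell_p, np.nan, x)   # X with values deleted
```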
Before Deletion Estimates
• Histogram of 5000 estimates before deleting values of X
• Logistic model: logit Pr(D=1|E,X) = b0 + b1E + b2X
Complete-Case Estimates
• Histogram of complete-case analysis estimates
• Delete subjects with missing X values, then fit the same logistic model
• True value = 1.1; serious negative bias
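Continuing the sketch, a hedged comparison of the before-deletion and complete-case estimates of the E coefficient, using statsmodels (the helper name is illustrative):

```python
import numpy as np
import statsmodels.api as sm

def e_coefficient(d, e, x):
    """Coefficient of E in logit Pr(D=1|E,X) = b0 + b1*E + b2*X,
    fit only on rows where x is observed."""
    keep = ~np.isnan(x)
    X = sm.add_constant(np.column_stack([e[keep], x[keep]]))
    return sm.Logit(d[keep], X).fit(disp=0).params[1]

b_before = e_coefficient(d, e, x)       # before-deletion estimate
b_cc = e_coefficient(d, e, x_obs)       # complete-case estimate; tends to fall below 1.1
```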
Patterns of Missing Data
• General pattern
[Figure: data matrix with cases as rows and variables as columns; missing entries scattered in a general (non-monotone) pattern]
Some Examples of Missingness Mechanisms
More Examples
• MCAR:
– whether a patient's weight was measured was determined by a coin flip.
• MAR:
– patients with high blood pressure had their
weight measured.
• NMAR:
– overweight patients had their weight measured.
What Mechanism to Assume
• MCAR:
– Simplest mechanism; strongest assumption;
usually not the true mechanism in practice
• NMAR:
– Most complex mechanism; weakest assumption;
likely the true mechanism in practice
• MAR:
– A mechanism between MCAR and NMAR;
oftentimes a good approximation to the truth;
easy to work with
General Strategies
[Figure: data matrix; complete cases retained, incomplete cases discarded in complete-case analysis]
Imputation and Multiple Imputation
Problem
[Figure: data matrix Y1, Y2, …, Yp; complete cases on top, cases with some missing values below]
• Imputation: draws from Pr(Dmiss | Dobs)
• Important issues:
  – imputations are not real values
  – uncertainties associated with imputes
Features of Imputation
[Figure: complete cases plus imputed values filling in a rectangular file]

Good                          Bad
Rectangular file              Naïve methods can be bad
Retains observed data         Invents data
Handles missing data once     Understates uncertainty
Exploits incomplete cases
A Bivariate Example: Continuous Case
• Imputations are random draws from a predictive distribution for the missing values
• For each case i with Y2 missing:
  ŷ_i2 = Ê(y_i2 | y_i1) + r_i
  where r_i ~ N(0, s²₂.₁), s²₂.₁ = residual variance, or
  r_i = residual from a randomly selected complete case
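A minimal sketch of this scheme, assuming a simple linear fit of y2 on y1 (function and variable names are illustrative):

```python
import numpy as np

def impute_continuous(y1, y2, rng):
    """Stochastic regression imputation of missing y2 given y1."""
    obs = ~np.isnan(y2)
    b1, b0 = np.polyfit(y1[obs], y2[obs], 1)     # least-squares fit of E(y2 | y1)
    resid = y2[obs] - (b0 + b1 * y1[obs])
    s = resid.std(ddof=2)                        # residual sd, s_{2.1}
    out = y2.copy()
    mis = ~obs
    # predicted mean plus N(0, s^2) noise (a drawn residual would also work)
    out[mis] = b0 + b1 * y1[mis] + rng.normal(0.0, s, mis.sum())
    return out
```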
A Bivariate Example: Binary Case
• For binary (0-1) data, impute 1 with probability equal to the predicted probability of a one given observed covariates:
  p̂_i2 = Pr(y_i2 = 1 | y_i1) (e.g. from a logistic regression)
  y_i2 = 1 with probability p̂_i2, 0 with probability 1 − p̂_i2
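A corresponding sketch for the binary case, using a logistic fit from statsmodels (names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def impute_binary(y1, y2, rng):
    """Impute missing binary y2 as Bernoulli(p_hat) from a logistic fit on y1."""
    obs = ~np.isnan(y2)
    fit = sm.Logit(y2[obs].astype(int), sm.add_constant(y1[obs])).fit(disp=0)
    p_hat = fit.predict(sm.add_constant(y1[~obs]))   # predicted Pr(y2 = 1 | y1)
    out = y2.copy()
    out[~obs] = rng.binomial(1, p_hat)               # 1 with prob p_hat, else 0
    return out
```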
Example: Should Imputations be conditional on all observed variables?
BLS Simulation Example
• BLS researchers:
  – created a population by accumulating complete cases over several years
  – drew 200 random samples of size 500 each (before-deletion data sets)
  – created missing data on income in each data set
  – supplied the 200 data sets, along with 55 covariates, to the University of Michigan
BLS Example (Continued)
• UM did not know how income values were deleted (except that some or all of the 55 covariates were used in specifying the missing-data mechanism)
• UM created two sets of imputations:
  – using Expenditure
  – not using Expenditure
BLS Imputations
• Imputations were created by drawing values from the posterior predictive distribution of income under an explicit model
• One model included expenditure as a conditioning variable; the other did not
• The two sets of imputed data sets and the actual data sets were analyzed by UM and BLS, respectively
BLS Models of Interest
• OLS model:
  Food-At-Home = b0 + b1·Income + covariates
• Tobit model:
  Food-Away-From-Home = g0 + g1·Income + covariates
[Figure: estimated regression coefficients of income from undeleted and imputed data sets, OLS model]
[Figure: estimated regression coefficients of income from undeleted and imputed data sets, Tobit model]
What should imputes condition on?
• In general, imputations should condition on all observed variables, including any (like Expenditure) that will appear in subsequent analyses; omitting one biases analyses that involve it
Key Problem of Single Imputation
• A single set of imputations treats imputed values as if they were observed, so uncertainty is understated
• Remedy: multiple imputation
Multiple Imputation
• Create D sets of imputations, each set a draw
from the predictive distribution of the missing
values
– e.g. D=5
Each missing value gets D imputed values, one per imputation d:

  d:   1     2     3     4     5
       2.1   2.7   1.9   2.5   2.3
       4.5   5.1   5.8   3.9   4.2
       1     1     2     1     2
       24    31    32    18    25
Multiple Imputation Inference
• D completed data sets (e.g. D = 5)
• Analyze each completed data set
• Combine the results in a simple way to produce the multiple-imputation inference
• Particularly useful for public-use datasets
  – the data provider creates imputes for multiple users, who can then analyze the data with complete-data methods
MI Inference for a Scalar Estimand
• θ = estimand of interest
• From each completed data set d = 1, …, D, obtain the estimate θ̂_d and its squared standard error W_d:

  Dataset (d)   Estimate   (se²)
  1             θ̂_1        W_1
  …             …          …
  D             θ̂_D        W_D
Summary of MI Inferences
• θ̄_D = average of the D estimates θ̂_d (MI point estimate)
• W_D = average within-imputation variance
• B_D = between-imputation variance of the θ̂_d
• T_D = W_D + ((D+1)/D)·B_D = W_D + (6/5)·B_D for D = 5 (total variance)
• γ̂_D = 1.2·B_D / (1.2·B_D + W_D) for D = 5 (estimated fraction of missing information)
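A minimal sketch of these combining rules for a scalar estimand (the function name is illustrative):

```python
import numpy as np

def combine(estimates, variances):
    """Rubin's combining rules for D completed-data estimates and variances."""
    est = np.asarray(estimates, dtype=float)   # theta_hat_d, d = 1..D
    W = np.asarray(variances, dtype=float)     # se_d^2
    D = len(est)
    theta_bar = est.mean()                     # MI point estimate
    W_D = W.mean()                             # within-imputation variance
    B_D = est.var(ddof=1)                      # between-imputation variance
    T_D = W_D + (1 + 1 / D) * B_D              # total variance ((6/5) B_D when D = 5)
    gamma = (1 + 1 / D) * B_D / T_D            # est. fraction of missing information
    return theta_bar, T_D, gamma
```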
Imputation for monotone patterns
• U fully observed; Y1, Y2, …, Yp with a monotone missingness pattern
(a) regress Y1 on U; impute missing values of Y1
(b) regress Y2 on Y1 and U; impute missing values of Y2 (with imputes for missing Y1 from (a))
(c) continue in this way through Yp, regressing each Yj on U, Y1, …, Y(j−1); see the sketch below
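A rough sketch of this sequential scheme, assuming U includes a constant column and reusing stochastic linear-regression imputes as in the bivariate example (names are illustrative):

```python
import numpy as np

def impute_monotone(U, Y, rng):
    """U: fully observed n x k covariates (first column all ones);
    Y: n x p with a monotone missing pattern, Y1 = Y[:, 0], etc."""
    Y = Y.copy()
    preds = U
    for j in range(Y.shape[1]):
        mis = np.isnan(Y[:, j])
        # regress Y_j on U, Y_1, ..., Y_{j-1} (earlier imputes already filled in)
        beta, *_ = np.linalg.lstsq(preds[~mis], Y[~mis, j], rcond=None)
        resid = Y[~mis, j] - preds[~mis] @ beta
        s = resid.std(ddof=preds.shape[1])
        Y[mis, j] = preds[mis] @ beta + rng.normal(0.0, s, mis.sum())
        preds = np.column_stack([preds, Y[:, j]])  # condition later Yj on earlier imputes
    return Y
```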
Weighted Complete-Case Analysis
Weighted CC Analysis
[Figure: complete cases retained and assigned weights w1, w2, w3, …; incomplete cases discarded]
Propensity Score Weighting
• A widely used alternative to imputation
• Avoids modeling the data distribution
• A fundamental concept in both missing data and causal inference
• May not be stable if some propensity scores are close to zero
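A minimal sketch of inverse-probability weighting of complete cases, assuming the response propensity is modeled by logistic regression on fully observed covariates Z (names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def ipw_weights(Z, complete):
    """Z: fully observed covariates; complete: 1 if the case is complete."""
    complete = np.asarray(complete, dtype=bool)
    fit = sm.Logit(complete.astype(int), sm.add_constant(Z)).fit(disp=0)
    p = fit.predict(sm.add_constant(Z))     # estimated response propensities
    return 1.0 / p[complete]                # unstable if some p are near zero
```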
Likelihood Methods
Likelihood methods
• Statistical model + data → likelihood
• Two general approaches based on
likelihood
– maximum likelihood inference for large
samples
– Bayesian inference for small samples:
log(likelihood) + log(prior) = log(posterior)
• Methods use all available data
– do not require rectangular data sets
Parametric Likelihood
• Data Y
• A statistical model yields a probability density f(Y | θ) for Y, with unknown parameters θ
• The likelihood function is then the density viewed as a function of θ:
  L(θ | Y) = const × f(Y | θ)
• The loglikelihood is often easier to work with:
  ℓ(θ | Y) = log L(θ | Y)
Example: Normal sample
• Univariate iid normal sample
  Data Y = (y1, …, yn)
• Normal density:
  f(Y | μ, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − μ)² )
• Likelihood:
  L(μ, σ² | Y) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − μ)² )
Maximum Likelihood Estimate
• The maximum likelihood (ML) estimate θ̂ of θ maximizes the likelihood:
  L(θ̂ | Y) ≥ L(θ | Y) for all θ
Computing the ML estimate
• Solve the score equation ∂ℓ(θ | Y)/∂θ = 0, analytically where possible, otherwise numerically (e.g. Newton-type methods, or EM with missing data); a sketch follows
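A small sketch for the normal example, comparing the closed-form ML estimates with direct numerical maximization via scipy (toy data; names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

y = np.random.default_rng(0).normal(5.0, 2.0, 100)   # toy sample

# closed form: mu_hat = sample mean, sigma2_hat = mean squared deviation
mu_hat = y.mean()
s2_hat = ((y - mu_hat) ** 2).mean()

def negloglik(theta):
    mu, log_s2 = theta                   # optimize log sigma^2 to keep sigma^2 > 0
    s2 = np.exp(log_s2)
    return 0.5 * (len(y) * np.log(2 * np.pi * s2) + ((y - mu) ** 2).sum() / s2)

res = minimize(negloglik, x0=np.array([0.0, 0.0]))
# res.x is approximately (mu_hat, log(s2_hat))
```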
Properties of ML estimates
• Under assumed model, ML estimate is:
– Consistent
– Efficient for large samples
– not necessarily the best for small samples
• ML estimate is transformation invariant:
  – if θ̂ is the ML estimate of θ, then g(θ̂) is the ML estimate of g(θ)
Likelihood methods with incomplete data
• Statistical models needed for:
– data without missing values
– missing-data mechanism
• Model for mechanism not needed if it is ignorable (to be
defined later)
• With likelihood, proceed as before:
– ML estimates, large sample standard errors
– Bayes posterior distribution
The Observed Data
• Data Y1, Y2, Y3 with missing entries (?) and missing-data indicators M1, M2, M3 (Mij = 1 if Yij is missing):

  Y1 Y2 Y3    M1 M2 M3
  ?  *  *     1  0  0
  *  *  *     0  0  0
  *  *  *     0  0  0
  *  *  ?     0  0  1
  *  *  ?     0  0  1
  *  ?  ?     0  1  1

  (* = observed value)
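With missing entries coded as NaN, the indicator matrix M is mechanical to construct; a tiny sketch:

```python
import numpy as np

Y = np.array([[np.nan, 2.0, 3.0],     # toy rows; NaN marks a missing entry
              [1.0, 2.0, np.nan]])
M = np.isnan(Y).astype(int)            # M_ij = 1 if Y_ij is missing, else 0
```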
The likelihood
• The likelihood should involve a model for M:
  f(Yobs, M | θ, ψ) = ∫ f(Yobs, Ymis | θ) f(M | Yobs, Ymis, ψ) dYmis
Ignoring the missing-data mechanism
• The mechanism can be ignored for likelihood inference when two conditions hold:
(A) MAR: f(M | Yobs, Ymis, ψ) = f(M | Yobs, ψ) for all Ymis
(B) Distinctness: θ and ψ have distinct parameter spaces
    (Bayes: the prior distributions of θ and ψ are independent)
Proof:
  f(Yobs, M | θ, ψ) = ∫ f(Yobs, Ymis | θ) f(M | Yobs, Ymis, ψ) dYmis
                    = f(M | Yobs, ψ) ∫ f(Yobs, Ymis | θ) dYmis    (by MAR)
                    = f(M | Yobs, ψ) f(Yobs | θ)
Some Discussion and Take-Home Messages
A Few Notes
• Any missing data method involves modeling
assumptions
Missing data methods -- history
1. Before the EM algorithm (pre-1970’s)
– Ad-hoc adjustments (simple imputation)
– ML for simple problems (Anderson 1957)
– ML for complex problems too hard
2. ML era (1970’s – mid 1980’s)
– Rubin formulates model for missing data
mechanism, defines MAR (1976)
– EM and extensions facilitate ML for complex
problems
– ML for more flexible models, beyond multivariate normal (see e.g. Little and Rubin 1987)
Missing data methods -- history
3. Bayes and Multiple Imputation (mid 1980’s –
present)
– Rubin proposes MI, justified via Bayes (1977, 1987)
– MCMC facilitates Bayes as an alternative to ML, with
better small sample properties (see e.g. Little and
Rubin 2019)
4. Robustness concerns (1990’s – present)
– Robins et al. propose doubly robust methods for missing data, based on a semiparametric approach
– Robust Bayesian models, more attention to model
checks
A Great Textbook on Missing Data
• Little, R.J.A. and Rubin, D.B., Statistical Analysis with Missing Data (3rd ed., 2019)
Summary