Meth 2024 Part 2: Missing data methods
Missingness
Valentin Patilea
[Figure: data matrices of n individuals with observed entries (•) and missing entries (·); the columns are grouped into an always-observed block U_O and a possibly-missing block U_M]
3 / 40
Agenda
Missing data
Types of missingness
4 / 40
Destructive or non-destructive?
▶ Alternative title: Ignorable or non-ignorable?
5 / 40
Why is ’destructive’ missingness hopeless?
▶ Let Y ∈ R, and ∆ ∈ {0, 1}
▶ Let
p(y ; θ) = P(∆ = 1 | Y = y ) = E(∆ | Y = y ), θ ∈ Θ, (6)
where p(y ; θ) is some parametric (say) probability function
▶ The complete data likelihood for one observation is then
L(θ) = p(Y; θ)^∆ {1 − p(Y; θ)}^{1−∆}, (7)
and θ cannot be estimated consistently using only the Y values with ∆ = 1, unless p(·; θ) is supposed constant!
▶ If p(Y; θ) does not depend on Y, we get the Bernoulli likelihood for ∆; since ∆ is always observed, consistent estimation is possible
▶ If y ↦ p(y; θ) is not a constant function, there is little hope of being able to estimate functionals of Y (mean, variance, quantiles, ...)
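A minimal simulation sketch of this point (not from the slides; the logistic form of p(y; θ) and all numerical values are assumptions): when p(·; θ) varies with y, the mean over the observed Y values alone is biased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical destructive-missingness model: Y ~ N(0, 1) and a logistic
# observation probability p(y; theta) that increases with y, so large values
# of Y are observed more often than small ones.
n, theta = 100_000, 1.5
Y = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-theta * Y))   # p(y; theta) = P(Delta = 1 | Y = y)
Delta = rng.binomial(1, p)             # Delta = 1 <=> Y is observed

print("true mean of Y:           ", Y.mean())              # ~ 0.0
print("mean over observed Y only:", Y[Delta == 1].mean())  # clearly biased upward
```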
6 / 40
▶ Consider that we want to estimate E[ϕ(Y)] for some function ϕ(·) of interest (e.g., ϕ(y) = y, ϕ(y) = y^2, ...)
▶ With the notation (6), we can write
E[∆ϕ(Y)] = E[E(∆ | Y) ϕ(Y)] = E[p(Y; θ) ϕ(Y)] (8)
▶ The good surrogate Y value, for which (8) holds true, depends on
p(y ; θ), or requires the missing values of Y ; it also depends on ϕ(·)
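A quick numerical check of identity (8) (a sketch reusing the hypothetical logistic p(·; θ) above; ϕ(y) = y^2 is an arbitrary choice): the observed-data average targets E[p(Y; θ)ϕ(Y)], not E[ϕ(Y)].

```python
import numpy as np

rng = np.random.default_rng(1)

n, theta = 200_000, 1.5
Y = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-theta * Y))   # assumed logistic p(y; theta)
Delta = rng.binomial(1, p)

phi = lambda y: y**2                   # an arbitrary functional of interest

# Both sides of (8) agree, and both differ from the target E[phi(Y)]
# unless p(.; theta) is constant.
print("E[Delta * phi(Y)]  ~", np.mean(Delta * phi(Y)))
print("E[p(Y) * phi(Y)]   ~", np.mean(p * phi(Y)))
print("E[phi(Y)] (target) ~", np.mean(phi(Y)))
```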
8 / 40
What do we learn?
³² This corresponds to setting ϕ(Y) = 0 on slide 8, in equation (8).
9 / 40
What else do we learn?
▶ There is one case where the problems can be avoided: when the event {∆ = 1} has a constant probability³³
▶ In fact, it is not the constant nature of this probability which
matters!
▶ It is the fact that it can be estimated consistently under the
missingness!!!
▶ This is the case for a constant probability: it is enough to count the missing observations (see the sketch below)
³⁴ Sometimes called ignorable missingness
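A tiny sketch of this counting argument (the sample size and the constant probability are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# When P(Delta = 1) = p is constant, the Y values are irrelevant for
# estimating p: counting the observed (equivalently, the missing)
# indicators already gives a consistent estimator.
n, p_true = 10_000, 0.7
Delta = rng.binomial(1, p_true, size=n)
print("p estimated by counting:", Delta.mean())  # close to 0.7
```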
11 / 40
Agenda
Missing data
12 / 40
▶ Let Y ∈ R^q be a vector of interest
▶ when we consider q = 1, we write Y instead
∆ ⊥ Y (9)
Proposition 5
Let ϕ(Y) ∈ R be a function of Y and define³⁵ ∆ϕ(∆Y) := 0 when ∆ = 0.
Under the condition (9),
E[∆ϕ(∆Y)] = p_∆ E[ϕ(Y)], where p_∆ = P(∆ = 1).
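A simulation check of Proposition 5 (a sketch; the law of Y, the value of p_∆ and the choice ϕ(y) = y^2 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

n, p_Delta = 200_000, 0.6
Y = rng.normal(loc=2.0, size=n)
Delta = rng.binomial(1, p_Delta, size=n)  # MCAR: Delta drawn independently of Y

phi = lambda y: y**2
# Note: when Delta = 0, Delta * phi(Delta * Y) = 0, matching the convention.
lhs = np.mean(Delta * phi(Delta * Y))
rhs = p_Delta * np.mean(phi(Y))
print(lhs, rhs)  # both ~ p_Delta * E[phi(Y)] = 0.6 * 5 = 3
```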
³⁸ The explicit calculation of the variance is not obvious. Admit that for a binomial variable S ∼ B(n, p_S), it holds E[(1 + S)^{−1}] = {1 − q_S^{n+1}}/{(n + 1) p_S}, where q_S = 1 − p_S. This can be used to derive a bound for the variance.
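The admitted identity can be checked exactly by enumerating the binomial distribution (n = 20 and p_S = 0.3 are arbitrary):

```python
from math import comb

# Exact check of E[(1 + S)^{-1}] = (1 - q**(n + 1)) / ((n + 1) * p)
# for S ~ B(n, p), by direct enumeration.
n, p = 20, 0.3
q = 1.0 - p
lhs = sum(comb(n, s) * p**s * q**(n - s) / (1 + s) for s in range(n + 1))
rhs = (1 - q**(n + 1)) / ((n + 1) * p)
print(lhs, rhs)  # equal up to floating-point error
```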
16 / 40
Consistent estimators under the MCAR condition (3/3)
▶ Consider a loss function³⁹ ℓ(y, c) and define
R(c) = E[ℓ(Y, c)] and R̂(c) = {Σ_{i=1}^n ∆_i ℓ(Y_i, c)}/{Σ_{i=1}^n ∆_i}
Proposition 6
Under some mild conditions on the loss and if p_∆ > 0, it holds⁴⁰
sup_{c∈C} |R̂(c) − R(c)| → 0, in probability.
⁴¹ In other words, discard all the individuals with missing components.
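A sketch of Proposition 6 in the simplest instance (an assumption: the squared loss ℓ(y, c) = (y − c)^2, whose risk is minimized at c = E[Y]); the complete-case empirical risk of footnote 41 is minimized over a grid.

```python
import numpy as np

rng = np.random.default_rng(4)

n, p_Delta = 50_000, 0.5
Y = rng.normal(loc=3.0, size=n)
Delta = rng.binomial(1, p_Delta, size=n)   # MCAR

grid = np.linspace(0.0, 6.0, 601)          # candidate values c in C
# Complete-case empirical risk: average the loss over observed Y only.
risk_hat = np.array([np.mean((Y[Delta == 1] - c)**2) for c in grid])
print("argmin of complete-case risk:", grid[np.argmin(risk_hat)])  # ~ E[Y] = 3
```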
18 / 40
Agenda
Missing data
19 / 40
▶ Let Y ∈ R and X a vector of covariates (predictors)
20 / 40
▶ Let U = (Y, X^⊤)^⊤ = (U_M^⊤, U_O^⊤)^⊤ and let U_i = (U_{M,i}^⊤, U_{O,i}^⊤)^⊤, 1 ≤ i ≤ n, be an IID sample of size n
▶ three situations: (a) U_M = Y; (b) U_M is a (sub)vector of X; or (c) U_M contains Y and a (sub)vector of X
▶ Let ∆_i ∈ {0, 1}, 1 ≤ i ≤ n, be indicators of non-missingness (availability)
▶ ∆_i = 1 if U_{M,i} is observed; ∆_i = 0 if U_{M,i} is NOT observed
⁴² In the restrictive sense
22 / 40
Agenda
Missing data
23 / 40
The data
▶ Consider a vector U = (U_M^⊤, U_O^⊤)^⊤ and the complete IID sample U_i = (U_{M,i}^⊤, U_{O,i}^⊤)^⊤, 1 ≤ i ≤ n
▶ For each individual i, one observes either (U_{M,i}^⊤, U_{O,i}^⊤)^⊤ and ∆_i = 1, or ("NA"^⊤, U_{O,i}^⊤)^⊤ and ∆_i = 0
24 / 40
Missing At Random (MAR)
∆ ⊥ U_M | U_O, or equivalently ∆ ⊥ U | U_O
25 / 40
Understanding the MAR definition
▶ The probability
p(U_O) = P(∆ = 1 | U_O)
is called the propensity score
Proposition 7
Under the MAR condition, if⁴⁴ p(U_O) ≥ c > 0, it holds
E[ω(∆, U_O) ϕ(∆U)] = E[ϕ(U)], with the weight ω(∆, U_O) := ∆/p(U_O). (11)
⁴⁴ This is the meaning of "the propensity score is bounded away from zero"
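A Monte Carlo check of (11) (a sketch; the joint law of (U_M, U_O) is hypothetical and the logistic propensity score is taken as known):

```python
import numpy as np

rng = np.random.default_rng(5)

# MAR by construction: the missingness of U_M depends on U_O only.
n = 300_000
U_O = rng.normal(size=n)
U_M = U_O + rng.normal(size=n)                        # U_M correlated with U_O
p = np.clip(1.0 / (1.0 + np.exp(-U_O)), 0.05, None)   # bounded away from 0 (footnote 44)
Delta = rng.binomial(1, p)

# Take phi(U) = U_M, so the target functional is E[U_M].
target = U_M.mean()
naive = U_M[Delta == 1].mean()             # complete-case mean: biased here
ipw = np.mean(Delta * U_M / p)             # the IPW mean from identity (11)
print("target:", target, "| complete-case:", naive, "| IPW:", ipw)
```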
27 / 40
▶ Proposition 7 introduces the Inverse Probability Weighting
(IPW), which applies to many situations (see also the next slides)
28 / 40
How to get the propensity score?
⁴⁵ The crucial point is to know that the propensity score depends only on Z!
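The slide's own answer is not reproduced above; one common approach, shown here as an assumption rather than as the slides' method, is to posit a parametric (e.g., logistic) model for the propensity score and fit it by regressing ∆ on the always-observed Z, which is feasible because both are fully observed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Hypothetical example: the true propensity score is logistic in Z.
n = 20_000
Z = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.5 + Z[:, 0] - 0.8 * Z[:, 1])))
Delta = rng.binomial(1, p_true)

# Delta and Z are always observed, so the fit uses all n individuals.
fit = LogisticRegression().fit(Z, Delta)
p_hat = fit.predict_proba(Z)[:, 1]         # estimated propensity scores
print("mean |p_hat - p_true|:", np.mean(np.abs(p_hat - p_true)))
```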
29 / 40
Agenda
Missing data
30 / 40
▶ Simplify and set q = 2 and let Y = (Y^{(1)}, Y^{(2)})^⊤ be the variables of interest, and Z a vector of auxiliary variables
▶ Assume that Y^{(2)} is always observed (available), and Y^{(1)} is possibly missing. Moreover, Y^{(1)}, Y^{(2)} and Z are not necessarily independent.
▶ Let ∆ be the missing indicator and assume the MAR condition
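A hedged simulation sketch of this setup (all data-generating equations below are hypothetical): the complete-case mean of Y^{(1)} is biased, while IPW with a propensity score estimated from the always-observed (Y^{(2)}, Z) recovers E[Y^{(1)}].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Y1 possibly missing; Y2 and Z always observed; all three dependent.
n = 100_000
Z = rng.normal(size=n)
Y2 = Z + rng.normal(size=n)
Y1 = Y2 + 0.5 * Z + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(Y2 + Z)))        # MAR: Delta depends on (Y2, Z) only
Delta = rng.binomial(1, p)

W = np.column_stack([Y2, Z])               # always-observed conditioning variables
p_hat = LogisticRegression().fit(W, Delta).predict_proba(W)[:, 1]

print("target E[Y1]      :", Y1.mean())               # ~ 0
print("complete-case mean:", Y1[Delta == 1].mean())   # biased
print("IPW estimate      :", np.mean(Delta * Y1 / p_hat))
```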
32 / 40
Agenda
Missing data
33 / 40
▶ Let U = (Y, X^⊤, Z^⊤)^⊤, a vector gathering a response Y, covariates X and auxiliary variables Z
▶ when there is no auxiliary variable, Z is an empty vector
▶ Suppose U = (U_M^⊤, U_O^⊤)^⊤, and let the complete IID sample be U_i, 1 ≤ i ≤ n. Here, the always observed vector U_O can be any of the ones enumerated on slide 28!
⁴⁷ Such as mean regression, quantile regression, ...
34 / 40
Regression with missing responses under MAR (1/4)
▶ We now elaborate on the different situations on slide 28, first
the case of possibly missing Y and always observed X, and
some auxiliary variables Z are possibly available
▶ thus, here U_O = (X^⊤, Z^⊤)^⊤, and U_M = Y
▶ Exercise:
  ▶ Derive the expressions of β̂₁, β̂₂ and β̂₃ in the case of the linear model
  ▶ Calculate the bias and the variance of β̂₁, β̂₂ and β̂₃ under the MAR condition
  ▶ Deduce that β̂₁, β̂₂ and β̂₃ are consistent
  ▶ Compare the variances of β̂₁, β̂₂ and β̂₃
⁴⁹ See also Proposition 6.
37 / 40
Regression with missing responses under MAR (4/4)
▶ The mean regression⁵⁰ with missing responses, under the MAR condition, is a particular⁵¹ situation where several consistent estimators can be constructed in addition to β̂₁, suggested by Proposition 7
▶ β̂₂ is based on the idea of a surrogate (or synthetic) response that can be calculated with the available data, and has exactly the same conditional expectation given the covariates as the response Y
▶ β̂₃ corresponds to the idea of considering only the complete observations
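A hedged sketch of the three estimators in a simulated linear model. The exact definitions of β̂₁, β̂₂ and β̂₃ appear on slides not reproduced here; the constructions below are one natural reading of the descriptions above (IPW-weighted least squares, least squares on the surrogate response ∆Y/p(X), and complete-case least squares), with the propensity score taken as known and depending on X only.

```python
import numpy as np

rng = np.random.default_rng(8)

# Linear model Y = X beta + eps with responses missing at random given X.
n, beta = 100_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ beta + rng.normal(size=n)
p = np.clip(1.0 / (1.0 + np.exp(-X[:, 1])), 0.05, None)  # known propensity p(X)
Delta = rng.binomial(1, p)

# beta1: IPW-weighted least squares (weights Delta_i / p(X_i)).
w = Delta / p
beta1 = np.linalg.lstsq(X * np.sqrt(w)[:, None], np.sqrt(w) * Y, rcond=None)[0]

# beta2: least squares on the surrogate response Y* = Delta * Y / p(X),
# which satisfies E[Y* | X] = E[Y | X] under MAR.
Ystar = Delta * Y / p
beta2 = np.linalg.lstsq(X, Ystar, rcond=None)[0]

# beta3: plain least squares on the complete cases only.
cc = Delta == 1
beta3 = np.linalg.lstsq(X[cc], Y[cc], rcond=None)[0]

print("beta1 (IPW)      :", beta1)   # all three close to (1, -2)
print("beta2 (surrogate):", beta2)
print("beta3 (CC)       :", beta3)
```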