
Incomplete Data Methods
Missingness

Valentin Patilea

Ensai 2A, Oct-Dec 2024


This version: October 15, 2024
Some notation
▶ In the following, the complete data are realizations of one of the
following types of variables/vectors
▶ Y (or the vector Y) and Z – when interested in the marginal distribution
characteristics (mean, quantiles, ...), where some vector (possibly
empty30) of auxiliary variables Z influences the missingness
▶ (Y, X⊤)⊤ and Z – when interested in the conditional
distribution characteristics, where some vector (possibly empty)
of auxiliary variables Z influences the missingness
▶ Sometimes we will gather everything in a vector U, and the
complete sample is denoted Ui , 1 ≤ i ≤ n
▶ Always observed (available) components of a vector will receive the
subscript O. For example, we write XO or UO
▶ Possibly missing (non available) components of a vector will receive
the subscript M. For example, we write XM or UM
▶ If all the components of a vector are always observed (available), we
drop the subscript O and simply write, say, X instead of XO
30
Convention: ’Z is an empty vector’ means there is no auxiliary variable
2 / 40
What is missing? What pattern is NOT considered?
The situations we study

UO UM
[Illustration: each row is an individual; the UO block is fully observed (•)
for every row, while the UM block is either fully observed (•••) or fully
missing (· · ·)]

The situations we do NOT study

U
[Illustration: observed (•) and missing (·) entries scattered arbitrarily
across the components of U, row by row]

3 / 40
Agenda

Missing data
Types of missingness

Missing Completely at Random (MCAR)

Missing at Random (MAR)

4 / 40
Destructive or non-destructive?
▶ Alternative title

Ignorable or non-ignorable?

▶ The question of ‘destructive’ vs. ‘non-destructive’ concerns
the possibility of building consistent estimators
▶ with non-destructive missingness it is still possible to construct
consistent estimators when data are incomplete
▶ with destructive (non-ignorable) missingness, this is no longer true

▶ With non-destructive missingness, the consistent estimators
are less precise than in the complete-data case!
▶ The optimality bounds (e.g., Cramér-Rao bound) are expected
to be larger

5 / 40
Why is ‘destructive’ missingness hopeless?
▶ Let Y ∈ R, and ∆ ∈ {0, 1}

▶ Assume that Y is observed (available) iff ∆ = 1

▶ Let
p(y; θ) = P(∆ = 1 | Y = y) = E(∆ | Y = y), θ ∈ Θ, (6)
where p(y; θ) is some parametric (say) probability function
▶ The complete-data likelihood for one observation is then
L(θ) = p(Y; θ)^∆ {1 − p(Y; θ)}^(1−∆), (7)
and θ cannot be estimated consistently using only the Y values with
∆ = 1, unless p(·; θ) is assumed constant!
▶ if p(Y ; θ) does not depend on Y , we get the Bernoulli
likelihood for ∆; since ∆ is always observed, consistent
estimation is possible
▶ If y 7→ p(y ; θ) is not a constant function, there is little hope to be
able to estimate functionals of Y (mean, variance, quantiles...)
6 / 40
▶ Consider that we want to estimate E[ϕ(Y)] for some function
ϕ(·) of interest (e.g., ϕ(y) = y, ϕ(y) = y², ...)
▶ With the notation (6), we can write

E[ϕ(Y )] = E[∆ϕ(Y )] + E[(1 − ∆)ϕ(Y )]

▶ The expectation E[(1 − ∆)ϕ(Y )] cannot be estimated


consistently with the incomplete data! Because Y is not
available if ∆ = 0!
▶ An idea can be to use the identity

E[ϕ(Y )] = E[{∆/p(Y ; θ)}ϕ(Y )],

but, in general31 , p(y ; θ) cannot be estimated (see slide 6)


▶ Several remedies for destructive missingness can be tried...
31
This means without additional conditions/restrictions
7 / 40
(Bad) remedies to destructive missingness
▶ Replace the missing values by an imputed value, say, Ỹ
▶ some users would choose Ỹ = E[{∆/E[∆]}Y], or the median
of {∆/E[∆]}Y, or...
▶ We then get
E[ϕ(Y)] ≟ E[∆ϕ(Y)] + ϕ(Ỹ) E[1 − ∆], (8)
and now both expectations on the RHS can be consistently
estimated with the incomplete data
▶ However, there is no guarantee that the equality in (8) still holds!
▶ equality (8) holds iff
ϕ(Ỹ) = E[(1 − ∆)ϕ(Y)] / E[1 − ∆],
and the numerator cannot be estimated consistently!
▶ Caution with the usual imputation methods!

▶ The good surrogate value Ỹ, for which (8) holds true, depends on
p(y; θ) or requires the missing values of Y; it also depends on ϕ(·)
8 / 40
What do we learn?

▶ Destructive missingness is hopeless. Very little can be done!

▶ If the probability of the event {∆ = 1} depends on Y (is not


constant), there is practically no way to build (asymptotically)
unbiased estimators for quantities like E[ϕ(Y )]
▶ The usual practical remedies, replacing missing values by some
summaries (mean, median, ...) computed with the observed
values, are most likely wrong (biased)!
▶ Dropping the missing values32 most likely leads to biased
estimators, even asymptotically

32
This corresponds to setting ϕ(Ỹ) = 0 in equation (8) on slide 8.
9 / 40
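The following is a minimal simulation sketch (not from the slides) of the point above: under a hypothetical MNAR mechanism where P(∆ = 1 | Y = y) increases with y, both the complete-case mean and naive mean imputation are biased for E[Y]. The Gaussian Y and the logistic missingness probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(loc=1.0, scale=1.0, size=n)             # complete data, true E[Y] = 1
p_obs = 1.0 / (1.0 + np.exp(-(y - 1.0)))                # assumed MNAR: P(Delta=1 | Y) depends on Y
delta = rng.binomial(1, p_obs)                          # 1 = observed, 0 = missing

complete_case_mean = y[delta == 1].mean()               # dropping the missing values
imputed = np.where(delta == 1, y, complete_case_mean)   # naive mean imputation
print("true mean             : 1.000")
print(f"complete-case mean    : {complete_case_mean:.3f}")   # biased upward
print(f"mean-imputation mean  : {imputed.mean():.3f}")        # carries the same bias
```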
What else do we learn?
▶ There is one case where the problems can be avoided: when
the event {∆ = 1} has a constant probability33
▶ In fact, it is not the constant nature of this probability which
matters!
▶ It is the fact that it can be estimated consistently under the
missingness!!!
▶ This is the case for a constant probability: it is enough to count
the missing observations.

▶ Thus, in a setup where the probability of {∆ = 1}, given what


is observed, can be estimated, the setup is no longer hopeless!

▶ We then move to the non-destructive situation!

▶ We are now ready to elaborate our road map!


33
Which means ∆ ⊥ Y
10 / 40
Road map for the study of the missingness

▶ Define types of missingness which allow unbiased/consistent


estimation (non destructive34 missingness)
▶ Missing Completely at Random (MCAR)
▶ the probability of {∆ = 1} is constant
▶ Missing at Random (MAR)
▶ the conditional probability of {∆ = 1} depends only on
variables which are always observed (without missing values)
▶ The remaining case (destructive) will be called Missing Not at
Random (MNAR) and is beyond the scope of the lectures
▶ The setups considered:
▶ marginal distribution aspects (moments, quantiles,...)
▶ conditional distribution aspects (mean, quantile regressions,...)

▶ A glimpse on the imputation methods

34
Sometimes called ignorable missingness
11 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)


Marginal distribution aspects
Conditional distribution aspects (regressions)

Missing at Random (MAR)

12 / 40
▶ Let Y ∈ R^q be a vector of interest
▶ when we consider q = 1, we write Y instead

▶ Let Yi, 1 ≤ i ≤ n, be an independent sample of the vector of
interest Y (thus, the sample is Yi, 1 ≤ i ≤ n, when q = 1)
▶ Let ∆i ∈ {0, 1}, 1 ≤ i ≤ n, be indicators of non-missingness
(availability)
▶ ∆i = 1 if Yi is observed
▶ ∆i = 0 if Yi is NOT observed (a vector of "NA" is recorded)

▶ P(∆i = 1) = p > 0 (Herein, this is always assumed to hold)

▶ Notation/convention: it is common to denote the available


data by (∆i , ∆i Yi⊤ )⊤ , an independent sample of (∆, ∆Y⊤ )⊤
▶ Strictly speaking, ∆i Yi = Yi if Yi observed, else ∆i Yi ="NA"
▶ However, in calculations, when there is no possible confusion,
it is convenient to consider ∆i Yi = 0 when Yi is not observed
13 / 40
MCAR condition
▶ The missing completely at random (MCAR) condition is

∆⊥Y

▶ Equivalently, MCAR condition is

P(∆ = 1 | Y) = P(∆ = 1) =: p∆ (9)

Proposition 5
Let ϕ(Y) ∈ R be a function of Y and define35 ∆ϕ(∆Y) = 0 when ∆ = 0.
Under the condition (9),

E[∆ϕ(∆Y)] = p∆ E[ϕ(Y)].

Moreover, if ℓ(y, ·) is a loss function, it holds

E[∆ℓ(∆Y, ·)] = p∆ E[ℓ(Y, ·)].


35
This convention handles the case ∆ = 0, when ϕ("NA") has no meaning.
14 / 40
Consistent estimators under the MCAR condition (1/3)
▶ In view of Proposition 5, when p∆ > 0, a consistent estimator36 of
mY = E[Y] is
m̂Y = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i Yi ,   where   p̂∆ = (1/n) ∑_{i=1}^n ∆i

▶ Exercise: Assume E[ϕ(Y)] exists, and set ∆ϕ(∆Y) = 0. If p∆ > 0,

(1/p̂∆) × (1/n) ∑_{i=1}^n ∆i ϕ(∆i Yi) → E[ϕ(Y)], in probability

▶ Exercise: Assume that the variance of Y exists and p > 0.
▶ Show that m̂Y is consistent and √n-asymptotically normal.
▶ What is the asymptotic variance of m̂Y ?
▶ Is m̂Y an unbiased estimator37 of mY ?
36
Recall the rule 0/0=0 and the convention ∆i Yi = 0 if ∆i = 0.
37
Get some hint from a simulation exercise.
15 / 40
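A minimal numpy sketch of the estimator m̂Y above on simulated MCAR data; the data-generating values (Gaussian Y, observation probability 0.7) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
y = rng.normal(loc=2.0, scale=1.5, size=n)       # complete data, E[Y] = 2
delta = rng.binomial(1, 0.7, size=n)             # MCAR: P(Delta = 1) = 0.7, independent of Y

y_obs = np.where(delta == 1, y, 0.0)             # convention: Delta_i * Y_i = 0 when missing
p_hat = delta.mean()                             # estimate of p_Delta
m_hat = (delta * y_obs).sum() / (p_hat * n)      # equals the mean of the observed Y values

print(f"m_hat = {m_hat:.3f}   (target E[Y] = 2.0)")
```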
Consistent estimators under the MCAR condition (2/3)

▶ Consider the distribution function estimation problem in the


case Y ∈ R (and p∆ = P(∆ = 1) > 0)
▶ With the rules 0/0 = 0 and ∆i 1{· ≤ ·} = 0 if ∆i = 0, let

F̂Y(y) = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i 1{∆i Yi ≤ y}

▶ Exercise: Show that F̂Y(y) is an unbiased estimator of the df
FY. Compute its variance38 and deduce the consistency.

38
The explicit calculation of the variance is not obvious. Admit that for a
binomial variable S ∼ B(n, pS), it holds E[(1 + S)^(−1)] = {1 − qS^(n+1)}/{(n + 1)pS},
where qS = 1 − pS. This can be used to derive a bound for the variance.
16 / 40
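A minimal sketch of the estimator F̂Y above on simulated MCAR data; the standard normal Y and observation probability 0.6 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
y = rng.normal(size=n)                   # illustrative complete data, Y ~ N(0, 1)
delta = rng.binomial(1, 0.6, size=n)     # MCAR indicators, P(Delta = 1) = 0.6

def f_hat(t):
    """F_hat(t) = (1/p_hat) * (1/n) * sum_i Delta_i * 1{Y_i <= t}."""
    return (delta * (y <= t)).mean() / delta.mean()

for t in (-1.0, 0.0, 1.0):
    print(f"F_hat({t:+.1f}) = {f_hat(t):.3f}")   # compare with the N(0,1) df: 0.159, 0.500, 0.841
```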
Consistent estimators under the MCAR condition (3/3)
▶ Consider a loss function39 ℓ(y , c) and define

R(c) = E[ℓ(Y , c)], for c in some compact interval C ⊂ R

▶ With the rule 0/0 = 0 and ∆i ℓ(·, ·) = 0 if ∆i = 0, let

R̂(c) = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i ℓ(∆i Yi, c)

Proposition 6
Under some mild conditions on the loss and if p∆ > 0, it holds40

sup_{c∈C} |R̂(c) − R(c)| → 0, in probability.

As a consequence, the minimizer of R̂(c) converges in probability to
the (unique) minimizer of R(c).
39
E.g., the check function ℓ(y , c) = (y −c)[τ −1{y −c < 0}], with τ ∈ (0, 1).
40
The proof of the uniform conv. in prob. is not required for the exam.
17 / 40
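A minimal sketch of minimizing R̂(c) with the check loss of footnote 39, which targets the τ-th quantile of Y under MCAR; the simulated data and the use of scipy's bounded scalar minimizer are illustrative choices, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n = 50_000
y = rng.normal(size=n)                   # illustrative complete data
delta = rng.binomial(1, 0.7, size=n)     # MCAR indicators
tau = 0.75

def r_hat(c):
    """R_hat(c) = (1/p_hat)(1/n) * sum_i Delta_i * check-loss(Y_i - c)."""
    u = y - c
    loss = u * (tau - (u < 0))           # check function (y - c)[tau - 1{y - c < 0}]
    return (delta * loss).mean() / delta.mean()

res = minimize_scalar(r_hat, bounds=(-3.0, 3.0), method="bounded")
print(f"estimated {tau}-quantile: {res.x:.3f}   full-sample quantile: {np.quantile(y, tau):.3f}")
```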
▶ As a consequence of Proposition 6, under the MCAR
condition, the mean, the variance, the quantiles,..., can be
computed very simply
▶ Consider the sample of vectors Yi having all the components
available41 (observed), and simply compute the mean, the
variance, the quantiles,..., with that sample.
▶ Ignore the existence of missing values, and work with the
sample of complete Yi, which has the (random) size ∑_{i=1}^n ∆i

41
In other words, discard all the individuals with missing components.
18 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)


Marginal distribution aspects
Conditional distribution aspects (regressions)

Missing at Random (MAR)

19 / 40
▶ Let Y ∈ R and X a vector of covariates (predictors)

▶ Consider a loss function ℓ(y , c)

▶ Let M = {mβ (·) : β ∈ B} be a predictive model, that is a set


of real-valued functions mβ (x), for some parameter set B
▶ keep in mind the linear model, where mβ (x) = x⊤ β

▶ With complete data (Yi, Xi⊤)⊤, 1 ≤ i ≤ n, the estimator
associated with the loss function ℓ(·, ·) is

β̂ = arg min_{β∈B} ∑_{i=1}^n ℓ(Yi, mβ(Xi))

▶ The framework includes the mean regression, quantile


regression, etc

20 / 40
▶ Let U = (Y, X⊤)⊤ = (UM⊤, UO⊤)⊤ and Ui = (UM,i⊤, UO,i⊤)⊤, 1 ≤ i ≤ n, be an IID
sample of size n
▶ three situations: (a) UM = Y; (b) UM is a (sub)vector of X; or
(c) UM contains Y and a (sub)vector of X
▶ Let ∆i ∈ {0, 1}, 1 ≤ i ≤ n, be indicators of non-missingness
(availability)
▶ ∆i = 1 if UM,i is observed; ∆i = 0 if UM,i is NOT observed

▶ Assume the MCAR condition


∆ ⊥ U, and assume P(∆ = 1) > 0

▶ Consider a loss function ℓ(·, ·), and a predictive/regression model


M = {mβ (·) : β ∈ B}
▶ With the data (∆i, ∆i Ui⊤)⊤, 1 ≤ i ≤ n, the estimator of β becomes

β̂ = arg min_{β∈B} ∑_{i=1}^n ∆i ℓ(Yi, mβ(Xi))

▶ The asymptotic behavior of β̂ is expected to be that of the same
estimator with a complete sample of size n × P(∆ = 1)
21 / 40
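A minimal sketch of the complete-case estimator above for the quadratic loss and a linear model: ordinary least squares on the fully observed rows. The data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta_true = 20_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
y = X @ beta_true + rng.normal(size=n)
delta = rng.binomial(1, 0.6, size=n)                    # MCAR, independent of (Y, X)

mask = delta == 1
beta_cc, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)   # OLS on complete cases
print("complete-case estimate:", beta_cc)   # close to [1, -2]; precision comparable to an
                                            # OLS fit on roughly n * P(Delta = 1) observations
```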
Some conclusions on the estimators under MCAR

▶ The MCAR is a strong42 condition which can be handled


quite easily
▶ Basically, the usual statistical procedures designed for
complete data can be applied with the completely observed
sample, ignoring the missing observations (or observations
with missing components)
▶ The theoretical properties of the estimators obtained after
discarding the observations that are missing or have missing
components are very similar or identical to those obtained
with a complete sample of size n × P(∆ = 1)
▶ In particular, the variance of the estimators is expected to be
larger than with complete data

42
In the sense of being restrictive
22 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

23 / 40
The data
▶ Consider a vector U = (UM⊤, UO⊤)⊤ and the complete IID
sample Ui = (UM,i⊤, UO,i⊤)⊤, 1 ≤ i ≤ n

▶ The vector UO can include


▶ components of the vector of interest when the marginal distribution
is studied
▶ components of the covariates and/or the response in regression
(predictive) models
▶ auxiliary variables Z

▶ Let ∆i , 1 ≤ i ≤ n, be a sample of the indicator ∆ ∈ {0, 1} of


non-missingness (availability)

▶ The observations (available data) are (∆i, ∆i Ui), that is,
(UM,i⊤, UO,i⊤)⊤ when ∆i = 1, or ("NA"⊤, UO,i⊤)⊤ when ∆i = 0

24 / 40
Missing At Random (MAR)

▶ The Missing At Random (MAR) condition is

∆ ⊥ UM | UO

▶ Equivalent formulation of the MAR condition (exercise!) :

∆ ⊥ U | UO

▶ Another equivalent formulation of the MAR condition :

P(∆ = 1 | U) = P(∆ = 1 | UO ) =: p(UO )

25 / 40
Understanding the MAR definition
▶ The probability

p(UO ) = P(∆ = 1 | UO ),

is usually called the propensity score


▶ When UO is an empty vector or the propensity score is a
constant function, we recover the MCAR definition
▶ Consider that UO = (VO⊤, WO⊤)⊤. The MAR definition
includes43 the case where the propensity score depends only
on the subvector WO, i.e., p(UO) = p(WO)
▶ The vector UO can include
▶ variables from the statistical analysis (e.g., the response
and/or covariates in the case of a predictive analysis);
▶ auxiliary variables not considered in the modeling, serving just
for handling the missingness
43
Recall that ∆ ⊥ U | {VO , WO } implies ∆ ⊥ U | WO
26 / 40
The weighting under MAR
▶ Recall that we focus on the unifying problem of constructing
(asymptotically) unbiased estimators for expectations
▶ We thus look for a weight function ω(∆, UO ) such that, for
any integrable functional of interest ϕ(U), we have

E [ω(∆, UO ) × ϕ(∆U)] = E [ϕ(U)] (10)

where, by convention, we set ϕ(∆U) = 0 when ∆ = 0

Proposition 7
Under the MAR condition, if44 p(UO) ≥ c > 0, it holds

E[ (∆ / p(UO)) × ϕ(∆U) ] = E[ϕ(U)] (11)

where the weight is ω(∆, UO) = ∆ / p(UO)

44
This is the meaning of "the propensity score is bounded away from zero"
27 / 40
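A minimal sketch of the identity (11): with a known propensity score, averaging (∆i/p(UO,i)) ϕ(Ui) over the sample estimates E[ϕ(U)] (here ϕ(U) = Y). The MAR mechanism below, with UM = Y and UO = Z, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)                            # always-observed auxiliary variable
y = 1.0 + 2.0 * z + rng.normal(size=n)            # variable of interest, E[Y] = 1
p_z = 1.0 / (1.0 + np.exp(-(0.5 + z)))            # known propensity score p(Z)
delta = rng.binomial(1, p_z)                      # MAR: missingness depends only on Z

naive = y[delta == 1].mean()                      # biased: Y and Delta both depend on Z
ipw = (delta / p_z * np.where(delta == 1, y, 0.0)).mean()   # IPW estimator of E[Y]
print(f"complete-case mean: {naive:.3f}   IPW mean: {ipw:.3f}   target: 1.000")
```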
▶ Proposition 7 introduces the Inverse Probability Weighting
(IPW), which applies to many situations (see also the next slides)

▶ Study of marginal distribution aspects of a vector Y, with auxiliary
information (if any) gathered in a vector Z
▶ Imagine Y = (YM⊤, YO⊤)⊤ and that Z is always observed. Then
U = (UM⊤, UO⊤)⊤ with UM = YM and UO = (YO⊤, Z⊤)⊤

▶ Regression with missing responses: let Y be the response, X the covariate
vector, Z a vector of auxiliary variables (possibly empty)
▶ Then UM = Y and UO = (X⊤, Z⊤)⊤

▶ Regression with missing covariates: let Y be the response, X = (XM⊤, XO⊤)⊤
the covariate vector, Z a vector of auxiliary variables (possibly empty)
▶ Then UM = XM and UO = (Y, XO⊤, Z⊤)⊤

▶ Regression with missing responses and covariates: let Y be the response,
X = (XM⊤, XO⊤)⊤ the covariates, Z the auxiliary variables (possibly empty)
▶ Then UM = (Y, XM⊤)⊤ and UO = (XO⊤, Z⊤)⊤

28 / 40
How to get the propensity score?

▶ The propensity score is the conditional probability of a 0-1


variable (here, ∆) given a set of predictors (here, UO )
▶ Both ∆ and UO are always observed!
▶ The propensity score can be estimated by usual methods in
statistics and ML, such as
▶ logistic regression,
▶ random forests, boosting, etc

▶ When UO includes auxiliary variables Z, and p(·) depends45


only on Z, the propensity score may be estimated using (also)
external data!
▶ ‘Big Data’ can help !!!

45
The crucial point is to know that the propensity score depends only on Z!
29 / 40
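A minimal sketch of the point above: ∆ and UO are always observed, so the propensity score can be fitted like any binary regression. Here a scikit-learn logistic regression is used (an illustrative choice), and the estimated scores are plugged into the IPW mean estimator; the data-generating mechanism is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 50_000
z = rng.normal(size=n)                                         # U_O = Z, always observed
y = 1.0 + 2.0 * z + rng.normal(size=n)                         # possibly missing, E[Y] = 1
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + z))))      # true MAR mechanism (unknown in practice)

clf = LogisticRegression().fit(z.reshape(-1, 1), delta)        # binary regression of Delta on U_O
p_hat = clf.predict_proba(z.reshape(-1, 1))[:, 1]              # estimated propensity scores

ipw_mean = (delta / p_hat * np.where(delta == 1, y, 0.0)).mean()
print(f"IPW mean with estimated propensity: {ipw_mean:.3f}   (target E[Y] = 1.0)")
```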
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

30 / 40
▶ Simplify and set q = 2 and let Y = (Y (1) , Y (2) )⊤ be the
variables of interest, and Z a vector of auxiliary variables
▶ Assume that Y (2) is always observed (available), and Y (1) is
possibly missing. Moreover, Y (1) , Y (2) and Z are not
necessarily independent.
▶ Let ∆ be the missing indicator and assume the MAR condition

P(∆ = 1 | Y (1) , Y (2) , Z) = P(∆ = 1 | Z) =: p(Z) ≥ c > 0

▶ Let (∆i , ∆i Yi(1) , Yi(2) , Zi )⊤ , 1 ≤ i ≤ n, be the IID data

▶ The propensity score p(·) will be


▶ either assumed given;
▶ or estimated by a logistic model, in which case Z is assumed
bounded and Var(Z) is assumed46 positive definite.
46
Implicitly, Z does not have constant components, which makes sense for
the auxiliary variables.
31 / 40
▶ Exercise: Assume the propensity score is given. Construct an
unbiased estimator for the df of Y (1) , and compute its variance.
Show its pointwise consistency and asymptotic normality. Comment
on the case where Y (1) ⊥ Z.

▶ Exercise: Assume the propensity score is given. Construct unbiased
estimators for the mean and the second-order moment of Y (1) , and
compute their variances. Show they are consistent and asymptotically
normal. Build also an unbiased estimator for E[Y (1) Y (2) ], and
compute its variance.

▶ Exercise: Assume the propensity score is estimated by logistic


regression. Construct consistent estimators for the mean, the
second order moment of Y (1) and E[Y (1) Y (2) ]. Are they
asymptotically normal?

▶ Exercise: Assume the propensity score is given. Propose an


estimator for the τ −th order quantile of Y (1) . Assuming a result as
in Proposition 6, discuss the consistency of your estimator.

32 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

33 / 40
▶ Let U = (Y , X⊤ , Z⊤ )⊤ , a vector gathering a response Y , covariates
X and auxiliary variables Z
▶ when there is no auxiliary variable, Z is an empty vector

▶ Suppose U = (UM⊤, UO⊤)⊤, and let the complete IID sample be Ui,
1 ≤ i ≤ n. Here, the always observed vector UO can be any of the
ones enumerated on slide 28!

▶ Let ∆i be the missingness indicators, and p(UO,i) the propensity scores

▶ Consider a loss function ℓ(·, ·), and a predictive47 model


M = {mβ (·) : β ∈ B}, where mβ (·) are functions of X
▶ for example mβ (x) = x⊤ β

▶ With the data (∆i, ∆i Ui⊤)⊤, 1 ≤ i ≤ n, under the MAR condition,
the estimator of β inspired by Proposition 7 is

β̂ = arg min_{β∈B} ∑_{i=1}^n (∆i / p(UO,i)) ℓ(Yi, mβ(Xi))

47
Such as mean regression, quantile regression,...
34 / 40
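A minimal sketch of the weighted estimator above for the quadratic loss and the linear model mβ(x) = x⊤β: a weighted least-squares fit with weights ∆i/p(UO,i). The data-generating choices (missing responses, propensity depending on X only) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta_true = 50_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.3 + X[:, 1])))      # propensity depends only on the observed X
delta = rng.binomial(1, p)                      # response missing when delta = 0

w = delta / p                                   # IPW weights (zero for missing responses)
y0 = np.where(delta == 1, y, 0.0)               # convention Delta_i Y_i = 0 when missing
# Weighted least squares: solve (X' W X) beta = X' W y
beta_ipw = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y0))
print("IPW regression estimate:", beta_ipw)     # close to [1, -2]
```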
Regression with missing responses under MAR (1/4)
▶ We now elaborate on the different situations on slide 28, starting with
the case of a possibly missing Y and an always observed X, where
some auxiliary variables Z are possibly available
▶ thus, here UO = (X⊤, Z⊤)⊤, and UM = Y

▶ Assume the propensity score p(UO ) is given and bounded


away from zero
▶ Let

β0 = arg min_{β∈B} E[ℓ(Y, mβ(X))]

be the ‘true’ value of the parameter (β0 is supposed to be the
unique solution of the minimization problem)
▶ We first focus on the mean regression, and next move to more
general case (quantile,...)
▶ The focus is also on the case where mβ (x) = x⊤ β
35 / 40
Regression with missing responses under MAR (2/4)
▶ Consider the mean regression model
Y = mβ (X) + ε, with E(ε | X) = 0, Var(ε | X) ≤ C ,
and assume48 that ε ⊥ Z | X. Let β 0 be the true value of the
parameter, and assume that it is identifiable.
▶ Consider the following least-squares estimators

β̂1 = arg min_{β∈B} (1/n) ∑_{i=1}^n (∆i / p(UO,i)) {Yi − mβ(Xi)}²

β̂2 = arg min_{β∈B} (1/n) ∑_{i=1}^n { (∆i Yi / p(UO,i)) − mβ(Xi) }²

β̂3 = arg min_{β∈B} (1/n) ∑_{i=1}^n ∆i {Yi − mβ(Xi)}²
48
It means that, given the covariates X, the auxiliary variables Z, if any
involved, do not bring additional information for explaining the response
36 / 40
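A minimal sketch comparing β̂1, β̂2 and β̂3 in the linear model with missing responses under MAR; the data-generating mechanism (known propensity depending only on X) and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta_true = 50_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.3 + X[:, 1])))     # known propensity p(U_O), with U_O = X here
delta = rng.binomial(1, p)
y0 = np.where(delta == 1, y, 0.0)              # convention Delta_i Y_i = 0 when missing

def wls(w, target):
    """Closed-form argmin of sum_i w_i (target_i - x_i' beta)^2."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * target))

beta1 = wls(delta / p, y0)                     # IPW-weighted least squares
beta2 = wls(np.ones(n), delta * y0 / p)        # surrogate response Delta*Y/p, fitted on all rows
beta3 = wls(delta.astype(float), y0)           # complete-case least squares
print("beta1:", beta1, "\nbeta2:", beta2, "\nbeta3:", beta3)
```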
Regression with missing responses under MAR (3/4)
▶ Exercise: Show that under MAR, each of the following
expectations is minimized at β = β0 :

E[ (∆i / p(UO,i)) {Yi − mβ(Xi)}² ],  E[ { (∆i Yi / p(UO,i)) − mβ(Xi) }² ],  E[ ∆i {Yi − mβ(Xi)}² ]

Admitting that the convergence of the sample means on slide 36 to
their expectations is uniform49 with respect to β, deduce that β̂1,
β̂2 and β̂3 are consistent in probability!

▶ Exercise:
▶ Derive the expression of β̂1, β̂2 and β̂3 in the case of the
linear model
▶ Calculate the bias and the variance of β̂1, β̂2 and β̂3 under the
MAR condition
▶ Deduce that β̂1, β̂2 and β̂3 are consistent
▶ Compare the variances of β̂1, β̂2 and β̂3
49
See also Proposition 6.
37 / 40
Regression with missing responses under MAR (4/4)
▶ The mean regression50 with missing responses, under the MAR
condition, is a particular51 situation where several consistent
estimators can be constructed in addition to β̂1, suggested by
Proposition 7

▶ β̂2 is based on the idea of a surrogate (or synthetic) response
that can be calculated with the available data, and has exactly the
same conditional expectation given the covariates as the response Y

▶ β̂3 corresponds to the idea of considering only the complete
observations

▶ Exercise: Propose an estimator for β using the pinball loss (check


function), that is consider a quantile regression model and propose
an estimator when the response is possibly missing, under the MAR
condition. Justify your choice.
50
That is, the predictive model M with the quadratic loss ℓ(y , c) = (y − c)2
51
Proceed to some simulations in the quantile regression case
38 / 40
Regression with missing covariates under MAR
▶ Consider the case where X = (XO⊤, XM⊤)⊤ and the subvector XM is
possibly missing, while Y, XO, and possibly some auxiliary variables
Z, are always observed
▶ thus, here UO = (Y, XO⊤, Z⊤)⊤, and UM = XM

▶ In the case of the mean regression, under MAR, consider the
following least-squares estimator

β̂1 = arg min_{β∈B} (1/n) ∑_{i=1}^n (∆i / p(UO,i)) {Yi − mβ(Xi)}²

▶ Exercise: Justify the definition of β̂1. Show that, in general, β̂2
and β̂3 on slide 36 cannot be justified in this setup of missingness

▶ Remark: The case of missing response and some covariates, i.e.
UO = (XO⊤, Z⊤)⊤ and UM = (Y, XM⊤)⊤, is similar to the case of
missing covariates
39 / 40
Take away
▶ The missing at random (MAR) mechanism is a weaker constraint
than missing completely at random (MCAR)
▶ The usual statistical procedures can be adapted to missingness
under MAR by using the IPW (Inverse Probability Weighting)
▶ IPW is for missingness what IPCW is for right-censoring

▶ The IPW involves a propensity score, a conditional probability


depending only on the variables always observed, possibly also on
variables not considered for study (auxiliary variables)
▶ The propensity score can be estimated by any existing stat/ML
approach for binary regression/classification
▶ In some regression/predictive approaches, when only the response is
possibly missing, discarding the individuals with missing components
still allows building consistent estimators (see β̂3), but the variance is
expected to be larger compared to the IPW-based estimators
▶ The MAR condition is not a testable hypothesis!
40 / 40
