
Incomplete Data Methods
Missingness

Valentin Patilea

Ensai 2A, Oct-Dec 2024


This version: October 15, 2024
Some notation
▶ In the following, the complete data are realizations of one of the
following types of variables/vectors
▶ Y (or the vector Y) and Z – when interested in the marginal distribution
characteristics (mean, quantiles, ...), where some vector (possibly
empty30) of auxiliary variables Z influences the missingness
▶ (Y, X⊤)⊤ and Z – when interested in the conditional
distribution characteristics, where some vector (possibly empty)
of auxiliary variables Z influences the missingness
▶ Sometimes we will gather everything in a vector U, and the
complete sample is denoted Ui , 1 ≤ i ≤ n
▶ Always observed (available) components of a vector will receive the
subscript O. For example, we write XO or UO
▶ Possibly missing (non available) components of a vector will receive
the subscript M. For example, we write XM or UM
▶ If all the components of a vector are always observed (available), we
drop the subscript O and simply write, say, X instead of XO
30
Convention: ’Z is an empty vector’ means there is no auxiliary variable
2 / 40
What is missing? What pattern is NOT considered?
The situations we study

UO UM
[Illustration: each row is an individual; the UO block is fully observed (•)
for every row, while the UM block is either fully observed (•••) or fully
missing (· · ·)]

The situations we do NOT study

U
[Illustration: observed (•) and missing (·) entries scattered arbitrarily
across the components of U, row by row]

3 / 40
Agenda

Missing data
Types of missingness

Missing Completely at Random (MCAR)

Missing at Random (MAR)

4 / 40
Destructive or non-destructive?
▶ Alternative title

Ignorable or non-ignorable?

▶ The question of ‘destructive’ vs. ‘non-destructive’ concerns
the possibility of building consistent estimators
▶ with non-destructive missingness it is still possible to construct
consistent estimators when data are incomplete
▶ with destructive (non-ignorable) missingness, this is no longer true

▶ With non-destructive missingness, the consistent estimators
are less precise than in the complete-data case!
▶ The optimality bounds (e.g., Cramér-Rao bound) are expected
to be larger

5 / 40
Why is ‘destructive’ missingness hopeless?
▶ Let Y ∈ R, and ∆ ∈ {0, 1}

▶ Assume that Y is observed (available) iff ∆ = 1

▶ Let
p(y; θ) = P(∆ = 1 | Y = y) = E(∆ | Y = y), θ ∈ Θ, (6)
where p(y; θ) is some parametric (say) probability function
▶ The complete-data likelihood for one observation is then
L(θ) = p(Y; θ)^∆ {1 − p(Y; θ)}^(1−∆), (7)
and θ cannot be estimated consistently using only the Y values with
∆ = 1, unless p(·; θ) is assumed constant!
▶ if p(Y ; θ) does not depend on Y , we get the Bernoulli
likelihood for ∆; since ∆ is always observed, consistent
estimation is possible
▶ If y 7→ p(y ; θ) is not a constant function, there is little hope to be
able to estimate functionals of Y (mean, variance, quantiles...)
6 / 40
▶ Consider that we want to estimate E[ϕ(Y)] for some function
ϕ(·) of interest (e.g., ϕ(y) = y, ϕ(y) = y², ...)
▶ With the notation (6), we can write

E[ϕ(Y )] = E[∆ϕ(Y )] + E[(1 − ∆)ϕ(Y )]

▶ The expectation E[(1 − ∆)ϕ(Y )] cannot be estimated


consistently with the incomplete data! Because Y is not
available if ∆ = 0!
▶ An idea can be to use the identity

E[ϕ(Y )] = E[{∆/p(Y ; θ)}ϕ(Y )],

but, in general31 , p(y ; θ) cannot be estimated (see slide 6)


▶ Several remedies for destructive missingness can be tried...
31
This means without additional conditions/restrictions
7 / 40
(Bad) remedies to destructive missingness
▶ Replace the missing values by an imputed value, say, Ỹ
▶ some users would choose Ỹ = E[{∆/E[∆]}Y], or the median
of {∆/E[∆]}Y, or...
▶ We then get
E[ϕ(Y)] ≟ E[∆ϕ(Y)] + ϕ(Ỹ) E[1 − ∆], (8)
and now both expectations on the RHS can be consistently
estimated with the incomplete data
▶ However, there is no guarantee that the equality in (8) still holds!
▶ equality (8) holds iff
ϕ(Ỹ) = E[(1 − ∆)ϕ(Y)] / E[1 − ∆],
and the numerator cannot be estimated consistently!
▶ Caution with the usual imputation methods!

▶ The good surrogate value Ỹ, for which (8) holds true, depends on
p(y; θ) or requires the missing values of Y; it also depends on ϕ(·)
8 / 40
What do we learn?

▶ Destructive missingness is hopeless. Very little can be done!

▶ If the probability of the event {∆ = 1} depends on Y (is not


constant), there is practically no way to build (asymptotically)
unbiased estimators for quantities like E[ϕ(Y )]
▶ The usual practical remedies, replacing missing values by some
summaries (mean, median, ...) computed with the observed
values, are most likely wrong (biased)!
▶ Dropping the missing values32 most likely leads to biased
estimators, even asymptotically

32
This corresponds to setting ϕ(Ỹ) = 0 in equation (8) on slide 8.
9 / 40
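The following is a minimal simulation sketch (not from the slides) of the point above: under a hypothetical MNAR mechanism where P(∆ = 1 | Y = y) increases with y, both the complete-case mean and naive mean imputation are biased for E[Y]. The Gaussian Y and the logistic missingness probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(loc=1.0, scale=1.0, size=n)             # complete data, true E[Y] = 1
p_obs = 1.0 / (1.0 + np.exp(-(y - 1.0)))                # assumed MNAR: P(Delta=1 | Y) depends on Y
delta = rng.binomial(1, p_obs)                          # 1 = observed, 0 = missing

complete_case_mean = y[delta == 1].mean()               # dropping the missing values
imputed = np.where(delta == 1, y, complete_case_mean)   # naive mean imputation
print("true mean             : 1.000")
print(f"complete-case mean    : {complete_case_mean:.3f}")   # biased upward
print(f"mean-imputation mean  : {imputed.mean():.3f}")        # carries the same bias
```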
What else do we learn?
▶ There is one case where the problems can be avoided: when
the event {∆ = 1} has a constant probability33
▶ In fact, it is not the constant nature of this probability which
matters!
▶ It is the fact that it can be estimated consistently under the
missingness!!!
▶ This is the case for a constant probability: it is enough to count
the missing observations.

▶ Thus, in a setup where the probability of {∆ = 1}, given what


is observed, can be estimated, the setup is no longer hopeless!

▶ We then move to the non-destructive situation!

▶ We are now ready to elaborate our road map!


33
Which means ∆ ⊥ Y
10 / 40
Road map for the study of the missingness

▶ Define types of missingness which allow unbiased/consistent


estimation (non destructive34 missingness)
▶ Missing Completely at Random (MCAR)
▶ the probability of {∆ = 1} is constant
▶ Missing at Random (MAR)
▶ the conditional probability of {∆ = 1} depends only on
variables which are always observed (without missing values)
▶ The remaining case (destructive) will be called Missing Not at
Random (MNAR) and is beyond the scope of the lectures
▶ The setups considered:
▶ marginal distribution aspects (moments, quantiles,...)
▶ conditional distribution aspects (mean, quantile regressions,...)

▶ A glimpse on the imputation methods

34
Sometimes called ignorable missingness
11 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)


Marginal distribution aspects
Conditional distribution aspects (regressions)

Missing at Random (MAR)

12 / 40
▶ Let Y ∈ R^q be a vector of interest
▶ when we consider q = 1, we write Y instead

▶ Let Yi, 1 ≤ i ≤ n, be an independent sample of the vector of
interest Y (thus, the sample is Yi, 1 ≤ i ≤ n, when q = 1)
▶ Let ∆i ∈ {0, 1}, 1 ≤ i ≤ n, be indicators of non-missingness
(availability)
▶ ∆i = 1 if Yi is observed
▶ ∆i = 0 if Yi is NOT observed (a vector of "NA" is recorded)

▶ P(∆i = 1) = p > 0 (Herein, this is always assumed to hold)

▶ Notation/convention: it is common to denote the available


data by (∆i , ∆i Yi⊤ )⊤ , an independent sample of (∆, ∆Y⊤ )⊤
▶ Strictly speaking, ∆i Yi = Yi if Yi observed, else ∆i Yi ="NA"
▶ However, in calculations, when there is no possible confusion,
it is convenient to consider ∆i Yi = 0 when Yi is not observed
13 / 40
MCAR condition
▶ The missing completely at random (MCAR) condition is

∆⊥Y

▶ Equivalently, MCAR condition is

P(∆ = 1 | Y) = P(∆ = 1) =: p∆ (9)

Proposition 5
Let ϕ(Y) ∈ R be a function of Y and define35 ∆ϕ(∆Y) = 0 when ∆ = 0.
Under the condition (9),

E[∆ϕ(∆Y)] = p∆ E[ϕ(Y)].

Moreover, if ℓ(y, ·) is a loss function, it holds

E[∆ℓ(∆Y, ·)] = p∆ E[ℓ(Y, ·)].


35
This convention handles the case ∆ = 0, when ϕ("NA") has no meaning.
14 / 40
Consistent estimators under the MCAR condition (1/3)
▶ In view of Proposition 5, when p∆ > 0, a consistent estimator36 of
mY = E[Y] is
m̂Y = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i Yi ,   where   p̂∆ = (1/n) ∑_{i=1}^n ∆i

▶ Exercise: Assume E[ϕ(Y)] exists, and set ∆ϕ(∆Y) = 0. If p∆ > 0,

(1/p̂∆) × (1/n) ∑_{i=1}^n ∆i ϕ(∆i Yi) → E[ϕ(Y)], in probability

▶ Exercise: Assume that the variance of Y exists and p > 0.
▶ Show that m̂Y is consistent and √n-asymptotically normal.
▶ What is the asymptotic variance of m̂Y ?
▶ Is m̂Y an unbiased estimator37 of mY ?
36
Recall the rule 0/0=0 and the convention ∆i Yi = 0 if ∆i = 0.
37
Get some hint from a simulation exercise.
15 / 40
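A minimal numpy sketch of the estimator m̂Y above on simulated MCAR data; the data-generating values (Gaussian Y, observation probability 0.7) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
y = rng.normal(loc=2.0, scale=1.5, size=n)       # complete data, E[Y] = 2
delta = rng.binomial(1, 0.7, size=n)             # MCAR: P(Delta = 1) = 0.7, independent of Y

y_obs = np.where(delta == 1, y, 0.0)             # convention: Delta_i * Y_i = 0 when missing
p_hat = delta.mean()                             # estimate of p_Delta
m_hat = (delta * y_obs).sum() / (p_hat * n)      # equals the mean of the observed Y values

print(f"m_hat = {m_hat:.3f}   (target E[Y] = 2.0)")
```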
Consistent estimators under the MCAR condition (2/3)

▶ Consider the distribution function estimation problem in the


case Y ∈ R (and p∆ = P(∆ = 1) > 0)
▶ With the rules 0/0 = 0 and ∆i 1{· ≤ ·} = 0 if ∆i = 0, let

F̂Y(y) = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i 1{∆i Yi ≤ y}

▶ Exercise: Show that F̂Y(y) is an unbiased estimator of the df
FY. Compute its variance38 and deduce the consistency.

38
The explicit calculation of the variance is not obvious. Admit that for a
binomial variable S ∼ B(n, pS), it holds E[(1 + S)^(−1)] = {1 − qS^(n+1)}/{(n + 1)pS},
where qS = 1 − pS. This can be used to derive a bound for the variance.
16 / 40
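A minimal sketch of the estimator F̂Y above on simulated MCAR data; the standard normal Y and observation probability 0.6 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
y = rng.normal(size=n)                   # illustrative complete data, Y ~ N(0, 1)
delta = rng.binomial(1, 0.6, size=n)     # MCAR indicators, P(Delta = 1) = 0.6

def f_hat(t):
    """F_hat(t) = (1/p_hat) * (1/n) * sum_i Delta_i * 1{Y_i <= t}."""
    return (delta * (y <= t)).mean() / delta.mean()

for t in (-1.0, 0.0, 1.0):
    print(f"F_hat({t:+.1f}) = {f_hat(t):.3f}")   # compare with the N(0,1) df: 0.159, 0.500, 0.841
```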
Consistent estimators under the MCAR condition (3/3)
▶ Consider a loss function39 ℓ(y , c) and define

R(c) = E[ℓ(Y , c)], for c in some compact interval C ⊂ R

▶ With the rule 0/0 = 0 and ∆i ℓ(·, ·) = 0 if ∆i = 0, let

R̂(c) = (1/p̂∆) × (1/n) ∑_{i=1}^n ∆i ℓ(∆i Yi, c)

Proposition 6
Under some mild conditions on the loss and if p∆ > 0, it holds40

sup_{c∈C} |R̂(c) − R(c)| → 0, in probability.

As a consequence, the minimizer of R̂(c) converges in probability to
the (unique) minimizer of R(c).
39
E.g., the check function ℓ(y , c) = (y −c)[τ −1{y −c < 0}], with τ ∈ (0, 1).
40
The proof of the uniform conv. in prob. is not required for the exam.
17 / 40
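A minimal sketch of minimizing R̂(c) with the check loss of footnote 39, which targets the τ-th quantile of Y under MCAR; the simulated data and the use of scipy's bounded scalar minimizer are illustrative choices, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n = 50_000
y = rng.normal(size=n)                   # illustrative complete data
delta = rng.binomial(1, 0.7, size=n)     # MCAR indicators
tau = 0.75

def r_hat(c):
    """R_hat(c) = (1/p_hat)(1/n) * sum_i Delta_i * check-loss(Y_i - c)."""
    u = y - c
    loss = u * (tau - (u < 0))           # check function (y - c)[tau - 1{y - c < 0}]
    return (delta * loss).mean() / delta.mean()

res = minimize_scalar(r_hat, bounds=(-3.0, 3.0), method="bounded")
print(f"estimated {tau}-quantile: {res.x:.3f}   full-sample quantile: {np.quantile(y, tau):.3f}")
```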
▶ As a consequence of Proposition 6, under the MCAR
condition, the mean, the variance, the quantiles,..., can be
computed very simply
▶ Consider the sample of vectors Yi having all the components
available41 (observed), and simply compute the mean, the
variance, the quantiles,..., with that sample.
▶ Ignore the existence of missing values, and work with the
sample of complete Yi, which has the (random) size ∑_{i=1}^n ∆i

41
In other words, discard all the individuals with missing components.
18 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)


Marginal distribution aspects
Conditional distribution aspects (regressions)

Missing at Random (MAR)

19 / 40
▶ Let Y ∈ R and X a vector of covariates (predictors)

▶ Consider a loss function ℓ(y , c)

▶ Let M = {mβ (·) : β ∈ B} be a predictive model, that is a set


of real-valued functions mβ (x), for some parameter set B
▶ keep in mind the linear model, where mβ (x) = x⊤ β

▶ With complete data (Yi, Xi⊤)⊤, 1 ≤ i ≤ n, the estimator
associated with the loss function ℓ(·, ·) is

β̂ = arg min_{β∈B} ∑_{i=1}^n ℓ(Yi, mβ(Xi))

▶ The framework includes the mean regression, quantile


regression, etc

20 / 40
▶ Let U = (Y, X⊤)⊤ = (UM⊤, UO⊤)⊤ and Ui = (UM,i⊤, UO,i⊤)⊤, 1 ≤ i ≤ n, be an IID
sample of size n
▶ three situations: (a) UM = Y; (b) UM is a (sub)vector of X; or
(c) UM contains Y and a (sub)vector of X
▶ Let ∆i ∈ {0, 1}, 1 ≤ i ≤ n, be indicators of non-missingness
(availability)
▶ ∆i = 1 if UM,i is observed; ∆i = 0 if UM,i is NOT observed

▶ Assume the MCAR condition


∆ ⊥ U, and assume P(∆ = 1) > 0

▶ Consider a loss function ℓ(·, ·), and a predictive/regression model


M = {mβ (·) : β ∈ B}
▶ With the data (∆i, ∆i Ui⊤)⊤, 1 ≤ i ≤ n, the estimator of β becomes

β̂ = arg min_{β∈B} ∑_{i=1}^n ∆i ℓ(Yi, mβ(Xi))

▶ The asymptotic behavior of β̂ is expected to be that of the same
estimator with a complete sample of size n × P(∆ = 1)
21 / 40
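A minimal sketch of the complete-case estimator above for the quadratic loss and a linear model: ordinary least squares on the fully observed rows. The data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta_true = 20_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
y = X @ beta_true + rng.normal(size=n)
delta = rng.binomial(1, 0.6, size=n)                    # MCAR, independent of (Y, X)

mask = delta == 1
beta_cc, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)   # OLS on complete cases
print("complete-case estimate:", beta_cc)   # close to [1, -2]; precision comparable to an
                                            # OLS fit on roughly n * P(Delta = 1) observations
```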
Some conclusions on the estimators under MCAR

▶ The MCAR is a strong42 condition which can be handled


quite easily
▶ Basically, the usual statistical procedures designed for
complete data can be applied with the completely observed
sample, ignoring the missing observations (or observations
with missing components)
▶ The theoretical properties of the estimators obtained after
discarding the observations that are missing or have missing
components are very similar or identical to those obtained
with a complete sample of size n × P(∆ = 1)
▶ In particular, the variance of the estimators is expected to be
larger than with complete data

42
In the sense of being restrictive
22 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

23 / 40
The data
▶ Consider a vector U = (UM⊤, UO⊤)⊤ and the complete IID
sample Ui = (UM,i⊤, UO,i⊤)⊤, 1 ≤ i ≤ n

▶ The vector UO can include


▶ components of the vector of interest when the marginal distribution
is studied
▶ components of the covariates and/or the response in regression
(predictive) models
▶ auxiliary variables Z

▶ Let ∆i , 1 ≤ i ≤ n, be a sample of the indicator ∆ ∈ {0, 1} of


non-missingness (availability)

▶ The observations (available data) are (∆i, ∆i Ui), that is,
(UM,i⊤, UO,i⊤)⊤ when ∆i = 1, or ("NA"⊤, UO,i⊤)⊤ when ∆i = 0

24 / 40
Missing At Random (MAR)

▶ The Missing At Random (MAR) condition is

∆ ⊥ UM | UO

▶ Equivalent formulation of the MAR condition (exercise!) :

∆ ⊥ U | UO

▶ Another equivalent formulation of the MAR condition :

P(∆ = 1 | U) = P(∆ = 1 | UO ) =: p(UO )

25 / 40
Understanding the MAR definition
▶ The probability

p(UO ) = P(∆ = 1 | UO ),

is usually called the propensity score


▶ When UO is an empty vector or the propensity score is a
constant function, we recover the MCAR definition
▶ Consider that UO = (VO⊤, WO⊤)⊤. The MAR definition
includes43 the case where the propensity score depends only
on the subvector WO, i.e., p(UO) = p(WO)
▶ The vector UO can include
▶ variables from the statistical analysis (e.g., the response
and/or covariates in the case of a predictive analysis);
▶ auxiliary variables not considered in the modeling, serving just
for handling the missingness
43
Recall that ∆ ⊥ U | {VO , WO } implies ∆ ⊥ U | WO
26 / 40
The weighting under MAR
▶ Recall that we focus on the unifying problem of constructing
(asymptotically) unbiased estimators for expectations
▶ We thus look for a weight function ω(∆, UO ) such that, for
any integrable functional of interest ϕ(U), we have

E [ω(∆, UO ) × ϕ(∆U)] = E [ϕ(U)] (10)

where, by convention, we set ϕ(∆U) = 0 when ∆ = 0

Proposition 7
Under the MAR condition, if44 p(UO) ≥ c > 0, it holds

E[ (∆ / p(UO)) × ϕ(∆U) ] = E[ϕ(U)] (11)

where the weight is ω(∆, UO) = ∆ / p(UO)

44
This is the meaning of "the propensity score is bounded away from zero"
27 / 40
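A minimal sketch of the identity (11): with a known propensity score, averaging (∆i/p(UO,i)) ϕ(Ui) over the sample estimates E[ϕ(U)] (here ϕ(U) = Y). The MAR mechanism below, with UM = Y and UO = Z, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)                            # always-observed auxiliary variable
y = 1.0 + 2.0 * z + rng.normal(size=n)            # variable of interest, E[Y] = 1
p_z = 1.0 / (1.0 + np.exp(-(0.5 + z)))            # known propensity score p(Z)
delta = rng.binomial(1, p_z)                      # MAR: missingness depends only on Z

naive = y[delta == 1].mean()                      # biased: Y and Delta both depend on Z
ipw = (delta / p_z * np.where(delta == 1, y, 0.0)).mean()   # IPW estimator of E[Y]
print(f"complete-case mean: {naive:.3f}   IPW mean: {ipw:.3f}   target: 1.000")
```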
▶ Proposition 7 introduces the Inverse Probability Weighting
(IPW), which applies to many situations (see also the next slides)

▶ Study of marginal distribution aspects of a vector Y, with auxiliary
information (if any) gathered in a vector Z
▶ Imagine Y = (YM⊤, YO⊤)⊤ and that Z is always observed. Then
U = (UM⊤, UO⊤)⊤ with UM = YM and UO = (YO⊤, Z⊤)⊤

▶ Regression with missing responses: let Y be the response, X the covariate
vector, Z a vector of auxiliary variables (possibly empty)
▶ Then UM = Y and UO = (X⊤, Z⊤)⊤

▶ Regression with missing covariates: let Y be the response, X = (XM⊤, XO⊤)⊤
the covariate vector, Z a vector of auxiliary variables (possibly empty)
▶ Then UM = XM and UO = (Y, XO⊤, Z⊤)⊤

▶ Regression with missing responses and covariates: let Y be the response,
X = (XM⊤, XO⊤)⊤ the covariates, Z the auxiliary variables (possibly empty)
▶ Then UM = (Y, XM⊤)⊤ and UO = (XO⊤, Z⊤)⊤

28 / 40
How to get the propensity score?

▶ The propensity score is the conditional probability of a 0-1


variable (here, ∆) given a set of predictors (here, UO )
▶ Both ∆ and UO are always observed!
▶ The propensity score can be estimated by usual methods in
statistics and ML, such as
▶ logistic regression,
▶ random forests, boosting, etc

▶ When UO includes auxiliary variables Z, and p(·) depends45


only on Z, the propensity score may be estimated using (also)
external data!
▶ ‘Big Data’ can help !!!

45
The crucial point is to know that the propensity score depends only on Z!
29 / 40
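A minimal sketch of the point above: ∆ and UO are always observed, so the propensity score can be fitted like any binary regression. Here a scikit-learn logistic regression is used (an illustrative choice), and the estimated scores are plugged into the IPW mean estimator; the data-generating mechanism is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 50_000
z = rng.normal(size=n)                                         # U_O = Z, always observed
y = 1.0 + 2.0 * z + rng.normal(size=n)                         # possibly missing, E[Y] = 1
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + z))))      # true MAR mechanism (unknown in practice)

clf = LogisticRegression().fit(z.reshape(-1, 1), delta)        # binary regression of Delta on U_O
p_hat = clf.predict_proba(z.reshape(-1, 1))[:, 1]              # estimated propensity scores

ipw_mean = (delta / p_hat * np.where(delta == 1, y, 0.0)).mean()
print(f"IPW mean with estimated propensity: {ipw_mean:.3f}   (target E[Y] = 1.0)")
```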
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

30 / 40
▶ Simplify and set q = 2 and let Y = (Y (1) , Y (2) )⊤ be the
variables of interest, and Z a vector of auxiliary variables
▶ Assume that Y (2) is always observed (available), and Y (1) is
possibly missing. Moreover, Y (1) , Y (2) and Z are not
necessarily independent.
▶ Let ∆ be the missing indicator and assume the MAR condition

P(∆ = 1 | Y (1) , Y (2) , Z) = P(∆ = 1 | Z) =: p(Z) ≥ c > 0

▶ Let (∆i , ∆i Yi(1) , Yi(2) , Zi )⊤ , 1 ≤ i ≤ n, be the IID data

▶ The propensity score p(·) will be


▶ either assumed given;
▶ or estimated by a logistic model, in which case Z is assumed
bounded and Var(Z) is assumed46 positive definite.
46
Implicitly, Z does not have constant components, which makes sense for
the auxiliary variables.
31 / 40
▶ Exercise: Assume the propensity score is given. Construct an
unbiased estimator for the df of Y (1) , and compute its variance.
Show its pointwise consistency and asymptotic normality. Comment
on the case where Y (1) ⊥ Z.

▶ Exercise: Assume the propensity score is given. Construct unbiased
estimators for the mean and the second-order moment of Y (1) , and
compute their variances. Show they are consistent and asymptotically
normal. Build also an unbiased estimator for E[Y (1) Y (2) ], and
compute its variance.

▶ Exercise: Assume the propensity score is estimated by logistic


regression. Construct consistent estimators for the mean, the
second order moment of Y (1) and E[Y (1) Y (2) ]. Are they
asymptotically normal?

▶ Exercise: Assume the propensity score is given. Propose an


estimator for the τ −th order quantile of Y (1) . Assuming a result as
in Proposition 6, discuss the consistency of your estimator.

32 / 40
Agenda

Missing data

Missing Completely at Random (MCAR)

Missing at Random (MAR)


Definition
Marginal distribution aspects
Conditional distribution aspects (regressions)

33 / 40
▶ Let U = (Y , X⊤ , Z⊤ )⊤ , a vector gathering a response Y , covariates
X and auxiliary variables Z
▶ when there is no auxiliary variable, Z is an empty vector

▶ Suppose U = (UM⊤, UO⊤)⊤, and let the complete IID sample be Ui,
1 ≤ i ≤ n. Here, the always observed vector UO can be any of the
ones enumerated on slide 28!

▶ Let ∆i be the missingness indicators, and p(UO,i) the propensity scores

▶ Consider a loss function ℓ(·, ·), and a predictive47 model


M = {mβ (·) : β ∈ B}, where mβ (·) are functions of X
▶ for example mβ (x) = x⊤ β

▶ With the data (∆i, ∆i Ui⊤)⊤, 1 ≤ i ≤ n, under the MAR condition,
the estimator of β inspired by Proposition 7 is

β̂ = arg min_{β∈B} ∑_{i=1}^n (∆i / p(UO,i)) ℓ(Yi, mβ(Xi))

47
Such as mean regression, quantile regression,...
34 / 40
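A minimal sketch of the weighted estimator above for the quadratic loss and the linear model mβ(x) = x⊤β: a weighted least-squares fit with weights ∆i/p(UO,i). The data-generating choices (missing responses, propensity depending on X only) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta_true = 50_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.3 + X[:, 1])))      # propensity depends only on the observed X
delta = rng.binomial(1, p)                      # response missing when delta = 0

w = delta / p                                   # IPW weights (zero for missing responses)
y0 = np.where(delta == 1, y, 0.0)               # convention Delta_i Y_i = 0 when missing
# Weighted least squares: solve (X' W X) beta = X' W y
beta_ipw = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y0))
print("IPW regression estimate:", beta_ipw)     # close to [1, -2]
```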
Regression with missing responses under MAR (1/4)
▶ We now elaborate on the different situations on slide 28, starting with
the case of a possibly missing Y and an always observed X, where
some auxiliary variables Z are possibly available
▶ thus, here UO = (X⊤, Z⊤)⊤, and UM = Y

▶ Assume the propensity score p(UO ) is given and bounded


away from zero
▶ Let

β0 = arg min_{β∈B} E[ℓ(Y, mβ(X))]

be the ‘true’ value of the parameter (β0 is supposed to be the
unique solution of the minimization problem)
▶ We first focus on the mean regression, and next move to more
general case (quantile,...)
▶ The focus is also on the case where mβ (x) = x⊤ β
35 / 40
Regression with missing responses under MAR (2/4)
▶ Consider the mean regression model
Y = mβ (X) + ε, with E(ε | X) = 0, Var(ε | X) ≤ C ,
and assume48 that ε ⊥ Z | X. Let β 0 be the true value of the
parameter, and assume that it is identifiable.
▶ Consider the following least-squares estimators

β̂1 = arg min_{β∈B} (1/n) ∑_{i=1}^n (∆i / p(UO,i)) {Yi − mβ(Xi)}²

β̂2 = arg min_{β∈B} (1/n) ∑_{i=1}^n { (∆i Yi / p(UO,i)) − mβ(Xi) }²

β̂3 = arg min_{β∈B} (1/n) ∑_{i=1}^n ∆i {Yi − mβ(Xi)}²
48
It means that, given the covariates X, the auxiliary variables Z, if any
involved, do not bring additional information for explaining the response
36 / 40
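A minimal sketch comparing β̂1, β̂2 and β̂3 in the linear model with missing responses under MAR; the data-generating mechanism (known propensity depending only on X) and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta_true = 50_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.3 + X[:, 1])))     # known propensity p(U_O), with U_O = X here
delta = rng.binomial(1, p)
y0 = np.where(delta == 1, y, 0.0)              # convention Delta_i Y_i = 0 when missing

def wls(w, target):
    """Closed-form argmin of sum_i w_i (target_i - x_i' beta)^2."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * target))

beta1 = wls(delta / p, y0)                     # IPW-weighted least squares
beta2 = wls(np.ones(n), delta * y0 / p)        # surrogate response Delta*Y/p, fitted on all rows
beta3 = wls(delta.astype(float), y0)           # complete-case least squares
print("beta1:", beta1, "\nbeta2:", beta2, "\nbeta3:", beta3)
```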
Regression with missing responses under MAR (3/4)
▶ Exercise: Show that under MAR, each of the following
expectations is minimized at β = β0 :

E[ (∆i / p(UO,i)) {Yi − mβ(Xi)}² ],  E[ { (∆i Yi / p(UO,i)) − mβ(Xi) }² ],  E[ ∆i {Yi − mβ(Xi)}² ]

Admitting that the convergence of the sample means on slide 36 to
their expectations is uniform49 with respect to β, deduce that β̂1,
β̂2 and β̂3 are consistent in probability!

▶ Exercise:
▶ Derive the expression of β̂1, β̂2 and β̂3 in the case of the
linear model
▶ Calculate the bias and the variance of β̂1, β̂2 and β̂3 under the
MAR condition
▶ Deduce that β̂1, β̂2 and β̂3 are consistent
▶ Compare the variances of β̂1, β̂2 and β̂3
49
See also Proposition 6.
37 / 40
Regression with missing responses under MAR (4/4)
▶ The mean regression50 with missing responses, under the MAR
condition, is a particular51 situation where several consistent
estimators can be constructed in addition to β̂1, suggested by
Proposition 7

▶ β̂2 is based on the idea of a surrogate (or synthetic) response
that can be calculated with the available data, and has exactly the
same conditional expectation given the covariates as the response Y

▶ β̂3 corresponds to the idea of considering only the complete
observations

▶ Exercise: Propose an estimator for β using the pinball loss (check


function), that is consider a quantile regression model and propose
an estimator when the response is possibly missing, under the MAR
condition. Justify your choice.
50
That is, the predictive model M with the quadratic loss ℓ(y , c) = (y − c)2
51
Proceed to some simulations in the quantile regression case
38 / 40
Regression with missing covariates under MAR
▶ Consider the case where X = (XO⊤, XM⊤)⊤ and the subvector XM is
possibly missing, while Y, XO, and possibly some auxiliary variables
Z, are always observed
▶ thus, here UO = (Y, XO⊤, Z⊤)⊤, and UM = XM

▶ In the case of the mean regression, under MAR, consider the
following least-squares estimator

β̂1 = arg min_{β∈B} (1/n) ∑_{i=1}^n (∆i / p(UO,i)) {Yi − mβ(Xi)}²

▶ Exercise: Justify the definition of β̂1. Show that, in general, β̂2
and β̂3 on slide 36 cannot be justified in this setup of missingness

▶ Remark: The case of missing response and some covariates, i.e.
UO = (XO⊤, Z⊤)⊤ and UM = (Y, XM⊤)⊤, is similar to the case of
missing covariates
39 / 40
Take away
▶ The missing at random (MAR) mechanism is a weaker constraint
than missing completely at random (MCAR)
▶ The usual statistical procedures can be adapted to missingness
under MAR by using the IPW (Inverse Probability Weighting)
▶ IPW is for missingness what IPCW is for right-censoring

▶ The IPW involves a propensity score, a conditional probability


depending only on the variables always observed, possibly also on
variables not considered for study (auxiliary variables)
▶ The propensity score can be estimated by any existing stat/ML
approach for binary regression/classification
▶ In some regression/predictive approaches, when only the response is
possibly missing, discarding the individuals with missing components
still allows building consistent estimators (see β̂3), but the variance is
expected to be larger compared to the IPW-based estimators
▶ The MAR condition is not a testable hypothesis!
40 / 40
