Full Information Multiple Imputation for Linear Regression Model with Missing Response Variable
Abstract—Linear regression models are commonly used to determine the quantitative relationships between variables and to use the resulting regression equations for prediction. This paper proposes a full information multiple imputation method based on a linear regression model with a missing response variable, which uses all observable data to obtain estimates of the regression coefficients and thereby predicted values of the missing response variable. This not only provides a good explanation of the relationship between the response variable and the independent variables, but also effectively enhances the imputation accuracy for the response variable. The stability and sensitivity of the fiMI method are evaluated through a simulation study. The proposed method is then applied to two real data sets, the admission prediction data set and the goalkeeper data set, and the results are discussed and analyzed.

Index Terms—linear regression models, missing response variables, full information, multiple imputation.

Manuscript received July 1, 2023; revised November 10, 2023. This work was supported by a grant from the National Social Science Foundation Project under project ID 23BTJ059, a grant from the Natural Science Foundation of Shandong under project ID ZR2020MA022, and a grant from the National Statistical Research Program under project ID 2022LY016.
Limin Song is a postgraduate student of Mathematics and Statistics, Shandong University of Technology, Zibo, China (e-mail: [email protected]).
Guangbao Guo is a professor of Mathematics and Statistics, Shandong University of Technology, Zibo, China (corresponding author; phone: 15269366362; e-mail: [email protected]).

I. INTRODUCTION

We consider the following linear regression model

Y = Xβ + ε,  (1)

where X = (Xij) ∈ R^(n×p) is the matrix of independent variables, Xi. = (Xi1, Xi2, ..., Xip) represents the i-th row of X (i = 1, ..., n), X.j = (X1j, X2j, ..., Xnj)⊤ represents the j-th column of X (j = 1, ..., p), β = (β1, β2, ..., βp)⊤ ∈ R^(p×1) is a vector of unknown parameters, Y = (Y1, Y2, ..., Yn)⊤ ∈ R^(n×1) is the response variable, and ε = (ε1, ε2, ..., εn)⊤ ∈ R^(n×1) is the residual vector, with ε ~ N(0, σ²In), i.e. the εi are independent and identically N(0, σ²) distributed.

Suppose there are incompletely observed, independent and identically distributed samples {(Xi., Yi, δi), 1 ≤ i ≤ n}, where {Xi., 1 ≤ i ≤ n} is fully observable, {Yi, 1 ≤ i ≤ n} may be missing, and δi is the indicator of whether Yi is missing, i.e.

δi = 0, if Yi is missing;
δi = 1, if Yi is not missing.

Assume that Y satisfies the MAR mechanism, i.e.

P(δi = 1 | Xi., Yi) = P(δi = 1 | Xi.),

i.e. given Xi., Yi is conditionally independent of δi.

The number of cells in the response variable Y with no missing data and the number of cells with missing data are denoted nOB = Σ_{i=1}^{n} δi and nNA = n − nOB, respectively. Define the observed and missing values in the response variable Y to be denoted Yobs and Ymis, respectively, and the parts of the matrix X corresponding to Yobs and Ymis to be denoted Xobs and Xmis, respectively.

For addressing the imputation problem of missing response variables in linear regression models, the most common methods are mean imputation and regression imputation, but these approaches have some disadvantages. For instance, mean imputation can reduce the correlation between variables, while regression imputation can artificially increase this correlation. Wang et al. [1] (2009) used the expectation-maximization (EM) method to calculate the asymptotic variances and standard errors of the maximum likelihood estimator (MLE) for linear models with a missing response variable. However, the standard deviation can only be calculated after the iterations have converged and cannot be obtained directly. Liu (2012) proposed a new expectation recursive least squares (ERLS) method based on the EM algorithm for linear regression models, avoiding the difficulty of inverting the correlation matrix of high-dimensional data. However, the calculation of the regression coefficients requires several iterations, which increases the computational time.

Methods for dealing with missing data have gone through two main stages: single imputation and multiple imputation. The emergence of multiple imputation addressed the shortcomings of single imputation. Rubin [4] (1987) proposed a multiple imputation procedure that replaces each missing data point with a range of plausible values (thus also reflecting the uncertainty associated with the imputed values); the multiply imputed data sets are then analyzed using standard procedures applicable to complete data sets, and the findings from these analyses are finally pooled. Buuren et al. [2] (2011) used the R package mice to impute incomplete multivariate data using chained equations, providing a practical step-by-step approach to addressing missing data in applications. The mice package is commonly used to impute missing response variables under linear regression models, the most commonly used methods being the predictive mean matching multiple imputation (PMMMI) method, the Bayesian multiple imputation (BayesMI) method, and the bootstrap multiple imputation (bootstrapMI) method. Rubin [6] (1999) and Schafer [7] (1997) conducted a series of studies on Bayesian multiple imputation methods, whose imputation accuracy is strongly influenced by the missing data mechanism. Little [8] (1988), Morris et al. [9] (2015), and Buuren [10] (2018) further discussed predictive mean matching multiple imputation methods and found that the missing data mechanism has a small impact on their imputation accuracy. Chang et al. [5] (2020) studied the problem of missing data for independent variables in a distributed environment and developed an efficient distributed multiple imputation method for horizontally partitioned incomplete data. However, no solution is provided when the response variable is missing.
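To make the setup of model (1) with a MAR response concrete, the following Python sketch generates such data; the logistic form of the observation probability and all constants are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model (1): Y = X beta + eps, with eps_i ~ N(0, sigma^2) i.i.d.
n, p, sigma = 1000, 5, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
Y = X @ beta + sigma * rng.normal(size=n)

# MAR: P(delta_i = 1 | X_i., Y_i) = P(delta_i = 1 | X_i.) -- the chance of
# observing Y_i depends on the covariates only (logistic form assumed here).
prob_obs = 1.0 / (1.0 + np.exp(-X[:, 0]))
delta = (rng.random(n) < prob_obs).astype(int)   # 1 = observed, 0 = missing

n_ob = int(delta.sum())                          # nOB = sum of delta_i
n_na = n - n_ob                                  # nNA = n - nOB
X_obs, Y_obs = X[delta == 1], Y[delta == 1]      # rows with observed response
X_mis = X[delta == 0]                            # design rows with Y missing
```

The arrays `X_obs`, `Y_obs`, and `X_mis` correspond to the quantities Xobs, Yobs, and Xmis defined above.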
II. FULL INFORMATION MULTIPLE IMPUTATION

Multiple imputation (MI) is arguably the most popular method for dealing with missing data. The MI method replaces each missing value with a sample from its posterior predictive distribution. The predictive imputation model is estimated from the observed data and does not use the missing values. The missing values are imputed multiple times in order to account for the uncertainty of the imputation, and each imputed data set is then used to fit an analysis model. The parameter estimates β from these analyses are combined to produce a final estimate from the multiply imputed data sets. This method yields estimates that are more robust than those obtained by using a single value to fill in the missing data.

A straightforward way to analyze such data is to aggregate information from all observable data, so that the imputation draws on everything that was observed. We refer to this as the full information (fi) method, and next we extend it to the full information multiple imputation (fiMI) method. In linear regression models with a missing response variable, general linear regression imputation requires only Xobs⊤ Xobs and Xobs⊤ Yobs to obtain the least squares estimate of the regression coefficients, as can be seen from the following equation:

β̂ = (Xobs⊤ Xobs)^{-1} (Xobs⊤ Yobs).  (2)

However, the regression coefficients estimated in equation (2) may suffer from overfitting, leading to inaccurate predictions. To address this issue, we propose to fit the linear regression imputation model using the fi method, which can be interpreted as fitting the imputation model using all observable data. By passing the imputed model parameters to the fully observable data set, it is expected to achieve the best computational performance because it fully exploits all available information.

According to (1), it follows that Yi ~ N(Xi. β, σ²). With priors

π(σ²) ∝ IG(1/2, 1/2),
β | σ² ~ N(0, σ² λ^{-1} I),

where IG and N denote the inverse gamma and multivariate Gaussian distributions, respectively, the posterior distribution of (σ², β) is given by

σ² | Xobs ~ IG((nOB + 1)/2, (SSE + 1)/2),
β | σ², Xobs ~ N((Xobs⊤ Xobs + λI)^{-1} Xobs⊤ Yobs, σ² (Xobs⊤ Xobs + λI)^{-1}),  (3)

where

SSE = ||Yobs − Xobs β*||₂²,

and the specific representation of β* will be given later. The fiMI method samples (σ², β) from (3), imputes the missing values of the response variable from (1), and fits the analytical linear regression model using the estimated complete data. This process is repeated m times. To avoid extraneous complexity, we assume that nOB > p.

First, we calculate the matrix

A = Xobs⊤ Xobs + λIp×p,

where λ is the regularization parameter, which provides a limited solution to the overfitting problem in (2). The regression weights

β* = A^{-1} Xobs⊤ Yobs

are obtained with reference to (2) and the matrix A. Next, the Cholesky decomposition of the positive definite matrix A yields the matrix CA, i.e.

A = CA⊤ CA,

where CA is an upper triangular matrix. We obtain estimates of the regression coefficients as follows:

β̂ = β* + σ (CA)^{-1} g,  (4)

where g = (g1, g2, ..., gp)⊤ is a p-dimensional variable whose components gi ~ N(0, 1) are mutually independent. At this point

β̂fi = β̂,

Cov(β̂fi) = 1/(p − 1) Σ_{i=1}^{p} ((β̂fi)i − mean(β̂fi))².

According to the sufficient statistics β̂fi and Cov(β̂fi) of the normal distribution, samples β1, ..., βM are obtained, independent of each other and obeying N(β̂fi, Cov(β̂fi)). The multiple regression coefficients β1, ..., βM are sent to the imputation model, and the multiple imputation results are integrated using Rubin's rule to obtain β̂ and Cov(β̂). Based on the final β̂, the missing values of the response variable are imputed as Ŷmis = Xmis β̂ and expanded to obtain Ŷ.

III. NUMERICAL ANALYSIS

A. Evaluation indicators

1) Mean square error of Ŷ: The mean square error (MSE) measures the difference between the imputed values and the original true values,

MSE(Ŷ) = (1/n) Σ_{i=1}^{n} (Yi − Ŷi)²,

where Yi and Ŷi denote the original true value and the imputed value, respectively.

2) Mean absolute error of Ŷ: The mean absolute error (MAE) is the average of the absolute differences between each imputed value and the corresponding actual value,

MAE(Ŷ) = (1/n) Σ_{i=1}^{n} |Ŷi − Yi|.

The smaller the difference between the imputed and true values, the better the imputation.
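Before turning to the simulations, the fiMI steps of Section II (ridge weights β*, the Cholesky factor of A, coefficient draws via (4), and Rubin's-rule pooling) can be sketched in Python. This is our reading of the procedure, not the authors' code; λ, M, and the assumption that σ² is redrawn from its inverse-gamma posterior for every draw are illustrative.

```python
import numpy as np

def fimi_impute(X_obs, Y_obs, X_mis, lam=1.0, M=20, rng=None):
    """Sketch of the fiMI imputation step (assumed reading of the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    n_ob, p = X_obs.shape
    A = X_obs.T @ X_obs + lam * np.eye(p)               # A = Xobs' Xobs + lambda*I
    beta_star = np.linalg.solve(A, X_obs.T @ Y_obs)     # regression weights beta*
    sse = np.sum((Y_obs - X_obs @ beta_star) ** 2)      # SSE = ||Yobs - Xobs beta*||^2
    C = np.linalg.cholesky(A).T                         # upper triangular, A = C'C
    draws = []
    for _ in range(M):
        # sigma^2 | Xobs ~ IG((nOB+1)/2, (SSE+1)/2), drawn via a chi-square.
        sigma2 = (sse + 1.0) / rng.chisquare(n_ob + 1)
        g = rng.standard_normal(p)                      # g_i ~ N(0, 1), i.i.d.
        beta_hat = beta_star + np.sqrt(sigma2) * np.linalg.solve(C, g)  # eq. (4)
        draws.append(X_mis @ beta_hat)                  # impute Ymis = Xmis beta_hat
    # Rubin's rule for the point estimate reduces to averaging the M draws.
    return np.mean(np.stack(draws), axis=0)
```

Given the observed rows of (X, Y) and the design rows for the missing responses, `fimi_impute` returns one pooled vector Ŷmis.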
B. Simulation

Firstly, the initial parameters are fixed at (n, p, MR) = (1000, 5, 10%), and the values of MSE(Ŷ) and MAE(Ŷ) are calculated for the fiMI method and the comparison methods under a missing response variable. According to Table I, when there is a missing response variable in the linear regression model, the fiMI method has the lowest values for both MSE and MAE. Overall, for imputation in linear regression models with missing response variables, the fiMI method has the highest imputation accuracy for the parameter combination (n, p, MR) = (1000, 5, 10%), meaning that the imputed values from the fiMI method are closest to the true values.

TABLE I
MSE AND MAE VALUES OF FIMI METHOD AND COMPARISON METHODS IN SIMULATED DATA

Indicators     fiMI     ERLS     EMRE     PMMMI    BayesMI   bootstrapMI
MSE (10^-4)    1.0472   1.2040   1.2240   1.7151   2.3004    2.1565
MAE (10^-2)    8.0541   8.5226   8.6269   1.0401   1.2393    1.1368

Next, the parameters (n, p, MR) are varied to examine the MSE and MAE values of the fiMI method and the comparison methods under different sample sizes, numbers of variables, and missing ratios for sensitivity and stability analysis.

Case 1. Varying n with fixed (p, MR)
The parameter values are set as (p, MR) = (5, 10%) and n = (300, 500, 1000, 1500, 2000). The comparison results are shown in Fig. 1 and Fig. 2.

Fig. 1. Results of MSE and MAE values obtained by the fiMI method and multiple comparison methods in simulated data with different n values (case 1)

Fig. 2. Results of MSE and MAE values obtained by fiMI, ERLS, and EMRE methods in simulated data with different n values

Case 2. Varying p with fixed (n, MR)
The parameter values are set as (n, MR) = (1000, 10%) and p = (3, 5, 10, 15, 20). The comparison results are shown in Fig. 3 and Fig. 4.
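A minimal version of one simulation replicate — fix (n, p, MR), generate data from model (1), delete a fraction MR of the responses, impute, and score the imputations with the MSE and MAE indicators of Section III-A — might look like the following; the single ridge-type fit stands in for the full fiMI draw, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def run_once(n=1000, p=5, mr=0.10, lam=1.0):
    # Generate (X, Y) from model (1) and delete a fraction mr of the responses.
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)
    Y = X @ beta + 0.5 * rng.normal(size=n)
    mis = rng.random(n) < mr
    # Ridge-type regression imputation fitted on the observed rows only.
    A = X[~mis].T @ X[~mis] + lam * np.eye(p)
    b = np.linalg.solve(A, X[~mis].T @ Y[~mis])
    Y_hat = X[mis] @ b
    # Evaluation indicators of Section III-A, restricted to the deleted cells.
    mse = np.mean((Y[mis] - Y_hat) ** 2)
    mae = np.mean(np.abs(Y_hat - Y[mis]))
    return mse, mae

for mr in (0.10, 0.30, 0.50):
    mse, mae = run_once(mr=mr)
    print(f"MR={mr:.0%}: MSE={mse:.4f}, MAE={mae:.4f}")
```

Repeating `run_once` over a grid of (n, p, MR) values reproduces the shape of the sensitivity study, with the imputation step swapped for each competing method.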
The MAE value of the fiMI method fluctuates within the range of 8.04380e-02 to 8.3516e-02, the smallest range of fluctuation; the MAE values of the other methods are higher than those of the fiMI method.

Case 3. Varying MR with fixed (n, p)
The parameter values are set as (n, p) = (1000, 5) and MR = (10%, 20%, 30%, 40%, 50%). The results are shown in Fig. 5 and Fig. 6.

Fig. 5. Results of MSE and MAE values obtained by fiMI method and multiple comparison methods in simulated data with different MR values (case 3)

Fig. 6. Results of MSE and MAE values obtained by fiMI, ERLS, and EMRE methods in simulated data with different MR values

Upon observing Fig. 5(a) and Fig. 6(a), it can be seen that the MSE values of both the fiMI method and the other comparative methods show an overall increasing trend as MR increases for the fixed parameters (n, p), with the MSE value of the fiMI method fluctuating within the range of 1.0472e-04 to 5.1215e-04; the other imputation methods have higher MSE values than the fiMI method. Observing Fig. 5(b) and Fig. 6(b) reveals that the MAE values of both the fiMI method and the other comparative methods increase roughly linearly with MR, but the MAE value of the fiMI method is the smallest among all the imputation methods, and its fluctuation range is also the smallest.

C. Real Data Analysis

In this section, two real data sets are selected: the admission prediction data set and the goalkeeper data set; the data for this empirical study are obtained from a third-party data science community, the Heywhale. The response variable in the admission prediction data set is the chance of admission. Firstly, correlation analysis is done for each variable of the admission prediction data set, as shown in Table II:

TABLE II
THE CORRELATION COEFFICIENT AND P-VALUE BETWEEN THE INDEPENDENT VARIABLES AND RESPONSE VARIABLE IN ADMISSION PREDICT DATA SET

Statistical tests   GRE Score   TOEFL Score   University Rating   SOP       LOR       CGPA
CC                  0.803       0.792         0.711               0.676     0.670     0.873
P-value             2.2e-16     2.2e-16       2.2e-16             2.2e-16   2.2e-16   2.2e-16

The correlation and significance test analysis show that these six characteristic variables are all highly correlated with the response variable chance of admission, so the above six characteristic variables are selected as independent variables:

Yi = Σ_{j=1}^{6} Xij βj + εi,  i = 1, 2, ..., 400.

For the admission prediction data set, we set the MR of admission chances to 50%, then impute with the fiMI method and the comparison methods, and finally compare the imputation methods in terms of imputation accuracy.

Fig. 7. MSE and MAE values obtained by fiMI method and comparison methods in admission prediction data set

It can be seen from Fig. 7 that the fiMI method has the lowest MSE and MAE values for the response variable admission chances at MR = 50%, indicating that the fiMI method has the best imputation effect. Overall, for the admission prediction data set with a large ratio of missing values, the fiMI method has the highest imputation accuracy, followed by the ERLS and EMRE methods.

The second data set for the empirical study is the goalkeeper player data set, in which rating is the response variable. Firstly, we perform correlation analysis on each variable of the goalkeeper data set; the correlation coefficients and p-values between each characteristic variable and the response variable are calculated as shown in Table III below:

TABLE III
THE CORRELATION COEFFICIENT AND P-VALUE BETWEEN THE INDEPENDENT VARIABLES AND RESPONSE VARIABLE IN GK DATA SET

Statistical tests   Positioning   Diving      Kicking     Handling    Reflexes
CC                  0.923319      0.9217224   0.7543833   0.9113288   0.9262662
P-value             2.2e-16       2.2e-16     2.2e-16     2.2e-16     2.2e-16
The correlation and significance test analysis show that these five characteristic variables are all strongly correlated with rating, so the above five characteristic variables are selected as independent variables. The goalkeeper data set is suitable for multiple linear regression modeling, so p = 5 and our regression model is as follows:

Yi = Σ_{j=1}^{5} Xij βj + εi,  i = 1, 2, ..., 2003.

For the goalkeeper data set, we still consider the case of a large percentage of missing response values and set the missing ratio of the response variable rating to MR = 50%, then impute with the fiMI method and the comparison methods, and finally compare the imputation methods in terms of imputation accuracy.

Fig. 8. MSE and MAE values obtained by fiMI method and comparison methods in GK data set

IV. CONCLUSION

Big data statistical analysis has become one of the mainstream areas of current statistical research. As missing data are objectively unavoidable in statistical analysis, techniques for dealing with missing data have received much attention from the statistical community, and imputation methods for missing data have been widely used in many fields. To address this issue, this paper investigates imputation methods for handling missing response variables in linear regression models to better improve the imputation accuracy while imputing missing data. The work accomplished is as follows: the six methods are compared in terms of method steps, the advantages and disadvantages of the six methods are summarised, and the six methods are compared in terms of computational performance.

For the problem of imputation accuracy, the sensitivity and stability of the method are investigated through simulation, and real data analysis is carried out to verify the performance of the method. It is found that the proposed method has higher imputation accuracy and is more effective in dealing with data with high missing ratios.

The performance of the imputation methods discussed in this paper is mainly verified through a large number of simulation experiments and real data, i.e. mainly from practice and applications. In future work, more attention will be paid to theoretical support, and the proof of theorems for each imputation method will be studied, which can also serve as a direction for us in the future.

REFERENCES

[1] J. X. Wang and M. Yu, "Note on the EM algorithm in linear regression model," International Mathematical Forum, vol. 4, no. 38, pp. 1883-1889, 2009.
[2] S. V. Buuren and K. G. Oudshoorn, "mice: Multivariate imputation by chained equations in R," Journal of Statistical Software, vol. 45, no. 3, pp. 1-67, 2011.
[3] G. Guo, Y. Sun and X. Jiang, "A partitioned quasi-likelihood for distributed statistical inference," Computational Statistics, vol. 35, no. 4, pp. 1577-1596, 2020.
[4] G. Guo, H. Song, and L. Zhu, "ISR: The Iterated Score Regression-Based Estimation Algorithm," 2022.
[5] G. Guo, C. Wei, and G. Qian, "Sparse online principal component analysis of incomplete data in distributed health data networks," Nature Communications, vol. 11, no. 1, pp. 5467-547, 2020.
[7] D. B. Rubin, Multiple Imputation for Nonresponse in Surveys. John Wiley, 1999.
[8] J. L. Schafer, Analysis of Incomplete Multivariate Data. London: Chapman & Hall, 1997.
[9] R. J. A. Little, "Missing-data adjustments in large surveys," Journal of Business & Economic Statistics, vol. 6, no. 3, pp. 287-296, 1988.
[10] T. P. Morris, I. R. White and P. Royston, "Tuning multiple imputation by predictive mean matching and local residual draws," BMC Medical Research Methodology, vol. 14, no. 1, pp. 1-13, 2015.
[11] S. V. Buuren, Flexible Imputation of Missing Data, 2nd ed. Chapman & Hall/CRC, 2018.