JSS Journal of Statistical Software

April 2020, Volume 93, Book Review 1. doi: 10.18637/jss.v093.b01

Reviewer: Abdolvahab Khademi

University of Massachusetts

Flexible Imputation of Missing Data (2nd Edition)

Stef van Buuren

Chapman & Hall/CRC, Boca Raton, 2018.
ISBN 9781138588318. xxvii+416 pp. USD 91.95 (H).
https://www.crcpress.com/9781138588318

Occurrence of missing data can cause serious issues, including decreased sample size, biased
estimates, and algorithmic problems. Therefore, proper treatment of missing data is a signif-
icant part of data analysis in statistics, especially in clinical and experimental studies.
Treatment of missing data is usually included in a section of its own in most textbooks, pre-
senting best or most convenient practices, based on the methods presented and the statistical
sophistication of the audience. However, the contexts, complexity, and severity of missing
data are so diverse and complicated that its treatment warrants a volume of its own. There
are currently several books on missing data, ranging from practical to more theoretical. Flex-
ible Imputation of Missing Data (2nd Edition) is an updated addition to the literature on
missing data, which combines practice, theory, and applications using the R programming
language.
The twelve chapters of the book are grouped in four sections: the basics, advanced techniques,
case studies, and extensions. Exposition in each chapter is accompanied by plenty of graphs,
code, examples, and exercises. The code and the entire book are available online at the
author’s personal website. The programming language of the book is R and the treatment of
missing data is performed by the package mice, which was created by the author.
Chapter 1, introduction, motivates the treatment of missing data and covers how missing data
occur (e.g., by nonresponse or attrition), current practice in the literature, categories of
missing data (MCAR, MAR, MNAR), and common fixes with their advantages and drawbacks.
Common practices, such as listwise and pairwise deletion, mean imputation, (stochastic)
regression imputation, last/baseline observation carried forward (LOCF and BOCF), and the
indicator method (popular in public health), are discussed. The largest portion
of the chapter is dedicated to multiple imputation. According to the author, the emphasis of
the entire book is on multiple imputation because of its efficiency and statistical properties
compared to other methods.
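One drawback of mean imputation can be seen in a few lines. The sketch below is not taken from
the book, whose examples use R and mice; it is a minimal Python illustration on simulated data,
and the function name mean_impute and all numbers in it are assumptions made for this example.
It shows that filling every gap with the observed mean leaves the mean unchanged but shrinks
the variance.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical data: 1000 draws from a normal distribution (mean 50, sd 10).
rng = random.Random(1)
complete = [rng.gauss(50, 10) for _ in range(1000)]

# Delete about 30% of the values completely at random.
incomplete = [None if rng.random() < 0.3 else v for v in complete]

observed = [v for v in incomplete if v is not None]
imputed = mean_impute(incomplete)

var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

print(f"observed values: n={len(observed)}, variance={var_observed:.1f}")
print(f"after mean imputation: n={len(imputed)}, variance={var_imputed:.1f}")
```

The imputed values add nothing to the sum of squared deviations while the sample size grows,
so the variance of the completed data is necessarily smaller than that of the observed values.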
The entire Chapter 2, multiple imputation, is devoted to the method of multiple imputa-
tion (MI), starting with a historical sketch of the concept and practice of MI. The historical
section is not only anecdotal, but also very informative due to its reference to some works
that shaped the foundation of the study of missing data. In the following sections, topics in
incomplete data are elaborated, including incomplete-data perspective (which in my opinion
is an excellent point by the author), causes of missing data, notation in the book, rigorous
definitions of MCAR, MAR, MNAR (and how to simulate them), and ignorable and nonig-
norable missing data models. Other topics presented in this chapter include the goal of MI,
sources of variation in MI (sampling, missing values, simulation), characteristics of proper
imputation (unbiased population estimand, unbiased sampling variance, confidence-valid
estimate of variance due to missing data), variance ratio, and degrees of freedom for testing
MI. Once an MI is performed, procedures are needed to evaluate the performance of the MI.
These procedures are explained next in this chapter along with an example and R code. This
chapter ends with discussion on imputation versus prediction, when not to use MI, and how
to determine how many imputations are needed in a given missing data analysis scenario.
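The three missingness mechanisms can be made concrete with a small simulation. The following
Python sketch is an illustration under stated assumptions, not the book's own R code: the
data-generating model, the logistic missingness probabilities, and the function name
observed_mean_of_y are all hypothetical. A variable y is deleted with a probability that is
constant (MCAR), depends only on a fully observed x (MAR), or depends on y itself (MNAR).

```python
import math
import random
import statistics


def logistic(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_mean_of_y(mechanism, n=20000, seed=2020):
    """Simulate (x, y), delete y under the given mechanism, return (true mean, observed mean)."""
    rng = random.Random(seed)
    all_y, kept_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)              # always observed
        y = 0.8 * x + rng.gauss(0, 0.6)  # subject to missingness
        if mechanism == "MCAR":
            p_missing = 0.5              # unrelated to the data
        elif mechanism == "MAR":
            p_missing = logistic(2 * x)  # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = logistic(2 * y)  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() >= p_missing:
            kept_y.append(y)
    return statistics.fmean(all_y), statistics.fmean(kept_y)


results = {m: observed_mean_of_y(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, (true_mean, observed_mean) in results.items():
    print(f"{mechanism}: true mean {true_mean:+.3f}, mean of observed y {observed_mean:+.3f}")
```

In this setup the mean of the observed y values stays close to the true mean under MCAR and
is pulled downward under MAR and MNAR, because large values are more likely to be deleted.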
Applications of MI in common statistical methods are introduced in Chapter 3, univariate
missing data. A motivating example is introduced at the beginning of the chapter, upon which
the author conceptually builds five methods of imputing the missing data. These foundation
methods are then taken up and implemented in the rest of the chapter on different inferential
methods, including normal linear regression, t-distribution based methods, classification and
regression trees, general linear models, count data, semi-continuous data, censored data, and
data with nonignorable missingness. Imputation methods are clearly explained in each section
with the algorithm and the R code.
In situations where there is more than one variable with missing data, which is very common
in real world research and data analysis, different methods and tools are needed. Chapter 4,
multivariate missing data, introduces imputation techniques for such cases. The author begins
with different patterns of missing data in multivariate research, followed by some illustrative
plots and R code and output on a dataset. In addition, some useful indices are presented that
show the relationship between the variables in the dataset and the potential contribution of
those variables in imputing the missing data (inbound and outbound indices, and influx and
outflux coefficients). The rest of the chapter focuses on more nuanced issues, theory, and
algorithms. Imputation of monotone missing data patterns, joint modeling of continuous and
discrete data, and fully conditional specification (FCS) are discussed. Numerical issues in
the FCS modeling are discussed in technical depth. Familiarity with stochastic processes and
iterative numerical methods is assumed in this section. The chapter ends with an example
using the MICE algorithm. The theoretical discussion in this chapter will help the audience
when setting the parameters in the mice package. Therefore, understanding of the technical
aspects, though probably daunting for the beginner, is recommended.
The multiple imputation workflow comprises three phases: imputation of the missing data m
times, analysis of the m imputed datasets, and pooling of the parameters across the m analyses.
Chapter 5, analysis of imputed data, exclusively focuses on phase two of the MI workflow.
The author's workflow of choice is introduced first (using the mice package), followed
by some common practices that the author does not recommend, such as parameter averaging
and data stacking. Next, pooling methods based on normal distribution are presented. In the
multiple-parameter cases (such as when a categorical variable is encoded as dummy variables),
different statistics are introduced, such as the multivariate Wald test (D1), the combined test
statistic (D2), and the likelihood ratio test (D3). Stepwise model selection is specifically treated for the
potential problems it raises in imputation. To eliminate sampling error, the bootstrap method
in the context of MI is introduced, followed by a short discussion of parallel processing, in
which each imputation is run on a separate thread or core.
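The pooling phase of the workflow is commonly carried out with Rubin's rules, and the sketch
below assumes that this is the kind of pooling the chapter refers to. It is a minimal Python
illustration for a single scalar parameter, not the mice implementation; the function name
pool_rubin and the input numbers are hypothetical. The pooled estimate is the average of the
m estimates, and the total variance adds the average within-imputation variance to the
between-imputation variance inflated by the factor (1 + 1/m).

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two analyses and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b           # total variance of the pooled estimate
    return {"estimate": q_bar, "within": u_bar, "between": b, "total": t}


# Hypothetical results from m = 3 analyses of the same parameter.
pooled = pool_rubin(estimates=[10.0, 11.0, 12.0], variances=[1.0, 1.0, 1.0])
print(pooled)
```

The between-imputation term is what distinguishes this from simply averaging the standard
errors: it carries the extra uncertainty caused by the missing data into the final inference.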
In section II of the book (advanced techniques), the author discusses MI techniques in mul-
tilevel modeling and causal inference. This section comprises three chapters. In Chapter 6,
imputation in practice, issues arising in real world data analyses are discussed, primarily in
relation to the algorithm and package mice. This chapter provides practical solutions to
several key points the practitioner needs to address before and during the execution of MI.
The points addressed include type of missingness, form of imputation model, set of predictors
to include in the imputation process, imputing variables that are functions of other variables,
setting up imputation and the number of iterations, and the number of MI datasets. Technical
considerations, such as visit sequence, convergence of algorithm, and diagnostics are clearly
discussed. The author provides practical advice from experience and literature, accompa-
nied by examples, code, and illustrative graphs. For the practitioner, this chapter provides a
wealth of practical information that will be immediately applicable.
Imputation of missing data in more complex statistical models, such as random effects, mixed
effects, and hierarchical models, is discussed in Chapter 7, multilevel multiple imputation.
The chapter opens with a brief introduction to multilevel modeling, contrasted with single-level
statistics, followed by the notation used in the chapter and the complications involved in imputing
missing data in multilevel models. Joint modeling and fully conditional specification methods are
revisited in this chapter for multilevel data. In the rest of the chapter, the author walks
through an example using the mice package on different variations of multilevel data, such
as intercept-only model, random intercepts, random intercepts with interaction, and random
slopes. Emphasis is on level 2 missing data because they are more challenging to impute.
Overall, those who use this book to apply the techniques in their own work and research will
find this chapter very practical and straightforward with excellent extended examples, plots,
code, and helpful interpretation of the results.
Causal inference in experimental studies, such as in medical, psychological, and educational
research, is usually discussed at aggregate level. However, in more specific studies, causal
effects are of interest at the experimental unit level, such as the individual level. Chapter 8,
individual causal effects, discusses methods and issues in imputation of missing data in the
framework of causal effects when the effect is studied at the unit level (heterogeneity in
treatment effect). The first few pages of this chapter are devoted to introducing the concept
of individual causal effects (ICE). Next, the author presents imputation methods in the FCS
framework (naive FCS and FCS with a prior). Extensions to the framework, especially the
use of control variates, are also discussed in this brief chapter. Most of the contents of the
chapter comprise examples, plots, and code in R, together with extensive discussion of the
output. The rich plots in this chapter show how informative plots can be in communication
of statistical results.
The third section of the book presents case studies in which real research and data are pre-
sented. In Chapter 9, measurement issues, real data are used to illustrate how to deal with
too many independent variables, sensitivity analysis, self-reported data, and dependence in
observations. Chapter 10, selection issues, presents MI challenges when rows (cases) are miss-
ing, such as in drop-out situations in panel designs or non-response. Chapter 11, longitudinal
data, presents example data from studies in which measurements are repeated over a longer period.
Section four of the book (extensions) includes only Chapter 12, conclusion. In this chapter,
the author provides tips and advice regarding practical issues such as reporting, file manage-
ment, and ideas for future research on missing data analysis.
Flexible Imputation of Missing Data (2nd Edition) will definitely appeal to practitioners who
analyze real world data with missing values, particularly clinical and health data. The book
covers all types of missing data and missing data patterns. There are several aspects of the
book that make it accessible and distinguished from other volumes currently available. The
most prominent feature of this book is the clarity of exposition achieved by presenting clear
description, examples, plots, and code. In addition, the example data sets used in the book are
very familiar to researchers and practitioners in clinical and health data analysis, creating a
tangible connection between the text and the practice. Students can use the example datasets
to understand very clearly the process of missing data management and analysis. Another
great feature of the book is the use of the R programming language and focus on one package.
This structure gives the book coherence in terms of methods and tools used. In addition, on
the theory side there is enough information, challenging questions, and reference to literature
that make this book a rich resource for theoretical researchers. The intended audience of
this book comprises practitioners in data analysis (especially biostatisticians), advanced
graduate students, and theoretical researchers.

Reviewer:
Abdolvahab Khademi
University of Massachusetts
Department of Mathematics and Statistics
Amherst MA 01002, United States of America
E-mail: [email protected]

Journal of Statistical Software http://www.jstatsoft.org/

published by the Foundation for Open Access Statistics http://www.foastat.org/
April 2020, Volume 93, Book Review 1 Published: 2020-04-18
doi:10.18637/jss.v093.b01
