Handling missing data

The document discusses the challenges and techniques for handling missing data, emphasizing the importance of imputation methods to preserve data integrity and improve predictive accuracy. It highlights multiple imputation using Bayesian methods, specifically the MICE (Multiple Imputation by Chained Equations) approach, which accounts for uncertainty in missing data. The document also outlines various imputation techniques and their comparative advantages and disadvantages, providing insights into practical implementation using statistical programming tools like R and Python.


Presented for the ICEAA 2021 Online Workshop - www.iceaaonline.com

Dealing with Missing Data: The Art and Science of Imputation
May 2021

For the International Cost Estimating and Analysis Association Conference, May 2021

IMPUTATION
FILLING IN HOLES IN DATASETS

THE PROBLEM OF MISSING DATA

A significant problem, especially for small datasets
Often dealt with by removing observations with missing data

TECHNIQUES FOR HANDLING MISSING DATA

A variety of techniques exist for filling in missing data, though
some perform better than others

FILLING IN HOLES WITH STATISTICS

Recognizing the inherent uncertainty in missing data, we
adopt and advocate the method of multiple imputation
using Bayesian methods (“chained equations”)


Why Imputation?
Is it worth it?

Preserves Data
Imputation prevents the reduction of sample size due to missing values. This helps to preserve all responses in the sample

Fooled by Randomness
Having more data prevents us from falling prey to overly optimistic models that are fit to more noise than signals

Preserves Structure of Data
When we remove data points, we could be missing important patterns in the data, which can cause our analysis to distort true patterns within the data

Predictive Accuracy
Reducible uncertainty can be reduced by increasing sample size. This helps to improve predictive accuracy

Impute and Assess Risk!


DATA
Foundation of All Analyses

The goal is to turn data into information, and information into insight.
-Carly Fiorina

How Should We Handle It?
The bulk of the time in analytics should be spent on collecting, normalizing and verifying data. In defense and aerospace applications, datasets are small. Data should be preserved when possible

IMPUTATION
To impute or not to impute, that is the question

01 Understand the available data
02 Determine variables that would benefit from imputation
03 Know when blanks are intentional

Imputation is a powerful method that is useful for filling blanks when they are missing within a dataset
An analyst must understand the data intimately to know if a blank means that the factor is not applicable for that data point
Sometimes a blank does not reflect a nonresponse and should be observed “as is”

Is the response missing at random?

The US Census Bureau deals with missing data all the time. If no response is provided for the name of Person 7 on the Census form from the household of six members, this missing value is not an omission; the response is “Not Applicable”


ISSUES WITH DATA GAPS

What can go wrong?

Fewer Degrees of Freedom
Removing observations with missing values results in fewer degrees of freedom in models

Reduction of Predictive Power
Predictive power is diminished when degrees of freedom are small

Inability to Use Advanced Methods
Certain Machine Learning methods cannot be applied when missing values are prevalent


METHODS ALLOWING
MISSING DATA
Complete-Case Analysis
Approach that excludes any records with missing data.
Disadvantage: bias is introduced into the analysis because the removed data may provide insight into the population

Available-Case Analysis
Approach that allows the analysis of subsets of the complete dataset so that multiple aspects of a problem can be studied.
Disadvantage: bias is again introduced if data are missing in a pattern

Alternative to Allowing Missingness

Though methods exist to continue with analysis upon removal
of missing data, better alternatives exist for filling data gaps
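
The difference between the two deletion approaches can be sketched in a few lines of Python. This is only an illustration: the rows are transcribed from the engine dataset that appears later in this deck (with `None` standing in for a blank cell), and the helper names `complete_cases` and `available_cases` are made up for this example.

```python
# Engine dataset transcribed from the slides; None marks a missing value.
COLUMNS = ["ID", "bHP", "EngSP", "CYL", "DryWGT", "DISP", "UC"]
ROWS = [
    (1, 290, 2600, 6, None, 7.2, 40079),
    (2, 330, 2400, 6, 1296, 7.2, 40927),
    (3, 330, 2200, 6, 1905, 8.8, 29563),
    (4, 515, 1500, 6, 3090, 15.2, 63931),
    (5, 675, 2101, 8, None, 14.8, 111976),
    (6, 675, 2101, 8, None, 14.8, 120661),
    (7, 500, 2100, 8, None, 12.1, 47873),
    (8, 362, 2300, None, 3230, 12.1, None),
    (9, 340, None, 8, 912, 6.6, 40661),
]


def complete_cases(rows):
    """Complete-case analysis: keep rows with no missing value anywhere."""
    return [row for row in rows if all(value is not None for value in row)]


def available_cases(rows, columns, needed):
    """Available-case analysis: keep rows complete on the needed columns only."""
    positions = [columns.index(name) for name in needed]
    return [row for row in rows if all(row[i] is not None for i in positions)]


cc = complete_cases(ROWS)
ac = available_cases(ROWS, COLUMNS, ["bHP", "UC"])
print(len(cc), len(ac))  # prints: 3 8
```

On this small dataset, complete-case deletion leaves three of nine engines, while an analysis of UC against bHP alone can still use eight.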


IMPUTATION METHODS

Mean Imputation
Filling missing values with the mean of the observed values

Imputing using Related Observations
Filling missing values with responses from related observations

Regression Imputation
Replacing missing values with a predicted value based on the results of fitting a regression line to the available data

Expectation Maximization
Replacing missing values by exploring the covariation among variables in order to infer values for the missing data

To retain as much of the precious gold (data) as possible, we should consider using imputation
methods. There are several methods you can choose to make a best statistical inference at a
response that will close a data gap
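
As a rough illustration of how two of these methods differ, the sketch below fills one missing unit cost first with the observed mean and then with a least-squares prediction. The (bHP, UC) pairs are transcribed from the engine dataset used later in this deck; the function names are hypothetical and the code is a minimal sketch, not the presenters' implementation.

```python
# (bHP, UC) pairs transcribed from the slides' engine data; None marks the
# missing unit cost for engine 8.
PAIRS = [(290, 40079), (330, 40927), (330, 29563), (515, 63931),
         (675, 111976), (675, 120661), (500, 47873), (362, None),
         (340, 40661)]


def mean_impute(pairs):
    """Fill each missing y with the mean of the observed y values."""
    observed = [y for _, y in pairs if y is not None]
    fill = sum(observed) / len(observed)
    return [(x, fill if y is None else y) for x, y in pairs]


def regression_impute(pairs):
    """Fill each missing y with a least-squares prediction from x."""
    complete = [(x, y) for x, y in pairs if y is not None]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [(x, intercept + slope * x if y is None else y) for x, y in pairs]


mean_filled = mean_impute(PAIRS)[7][1]
regression_filled = regression_impute(PAIRS)[7][1]
print(round(mean_filled), round(regression_filled))
```

The mean fill ignores horsepower entirely, whereas the regression fill places the engine on the fitted UC versus bHP line, which is why the two values can differ substantially for a below-average engine.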


IMPUTATION METHODS
How do they compare?

Mean Imputation
This method helps to restrict the variability of the data
Disadvantage: it weakens covariances and correlations among features

Related Observations
This method also helps to restrict variability in the data
Disadvantage: Introduces measurement error

Regression Imputation
This method uses regression to predict missing values. MICE is a regression imputation method
Advantage: Produces unbiased estimates with data that are Missing At Random (MAR)

Expectation Maximization
This method uses the maximum likelihood method to estimate missing values
Advantage: Increases precision and decreases parameter bias

Tools for Imputation

R
R is a language and environment for statistical computing and graphics. It is an integrated suite of software facilities for data manipulation, calculation and graphical display

Python
Python is a high-level programming language with dynamic semantics. Like R, Python supports modules and packages to help with analysis


MICE

MULTIPLE IMPUTATION BY
CHAINED EQUATIONS
MICE

Method
This method creates multiple imputations for a missing value that account for the statistical uncertainty in the imputation

Assumptions
This method operates under the assumption that the missing data is MAR. MAR occurs when a data gap is fully accounted for by variables where there is complete information

Iterations
Multiple regression models are conducted and each variable
with missing data is modeled conditionally on the responses
of the other variables within the dataset. With this method,
each variable is modeled according to its own distribution


HOW MICE FILLS GAPS

Several imputed versions of the data are created using plausible data values

01 Multiple imputation is a series of stochastic regression imputations
02 The first step is an imputation step (I-step) that fills data gaps using stochastic regression
03 The number of iterations, m, is specified for the number of imputations that are conducted in the I-step
04 In the posterior step (P-step), the mean and covariance distributions are calculated from the filled-in data
05 The P-step proceeds by taking a random draw from the mean and covariance distributions, which are used to calculate regression coefficients
06 The coefficients of the individual equation are averaged using a simple, unweighted mean. Goodness-of-fit measures are calculated using the pooled results
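
A deliberately simplified sketch of the I-step idea follows: a regression line is fitted to the observed pairs, and each of the m imputations is the prediction plus a random residual draw. It omits the P-step (no parameters are redrawn between imputations), so it is stochastic regression imputation rather than full MICE. The data are the complete (bHP, UC) pairs from the engine dataset in this deck, and the function names are assumptions for illustration.

```python
import random

# Complete (bHP, UC) pairs from the slides' engine data; engine 8 (bHP = 362)
# has no unit cost and is the value being imputed.
BHP = [290, 330, 330, 515, 675, 675, 500, 340]
UC = [40079, 40927, 29563, 63931, 111976, 120661, 47873, 40661]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, plus the residual std. dev."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = (sse / (n - 2)) ** 0.5
    return intercept, slope, sigma


def stochastic_imputations(xs, ys, x_new, m=5, seed=23109):
    """Draw m plausible values: fitted prediction plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    intercept, slope, sigma = fit_line(xs, ys)
    return [intercept + slope * x_new + rng.gauss(0.0, sigma)
            for _ in range(m)]


draws = stochastic_imputations(BHP, UC, x_new=362)
for j, value in enumerate(draws, start=1):
    print(f"imputation {j}: {value:,.0f}")
```

Each draw would populate one completed copy of the dataset; the spread across the draws is what carries the imputation uncertainty into the pooled results.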

THE MICE PROCESS

Given the multiple imputations, the coefficients of the individual equation are averaged (using a
simple, unweighted mean). The other parameters, including the degrees of freedom, standard
errors, and R2s are combined using what is known as Rubin’s Rules, after the statistician who
developed them
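
The pooling step can be sketched as follows for a single coefficient. The formulas are the standard form of Rubin's Rules (average the estimates, then combine within-imputation and between-imputation variance); the degrees of freedom shown are Rubin's original large-sample version, and software may apply a small-sample adjustment instead. The slope estimates and standard errors below are hypothetical placeholders, not results from the presentation.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's Rules.

    estimates: the m point estimates (one per imputed dataset)
    variances: the m squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                      # simple, unweighted mean
    within = sum(variances) / m                     # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    if between == 0:
        df = math.inf                               # imputations all agree
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "se": math.sqrt(total), "df": df}


# Hypothetical slope estimates and standard errors from m = 5 regressions.
slopes = [205.1, 209.8, 207.3, 211.0, 204.6]
std_errors = [31.2, 29.8, 30.5, 32.1, 30.9]
pooled = pool_rubin(slopes, [se ** 2 for se in std_errors])
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty caused by the missing values.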


UNDERSTANDING THE DATA

Exploring engine data

Dataset
The data used for analysis is a Wheeled and Tracked Vehicle
Engine dataset. The dataset is small, which makes the use of
imputation very important

Included Features
Identification (ID), Brake Horsepower (bHP), Displacement
(DISP), Engine Speed (EngSP), Cylinders (CYL), Unit Cost in
Dollars (UC), Dry Weight (DryWGT)

Missing Counts
Of the seven features included in the dataset, four of those
seven have missing values.
N=9


Dataset Example
Four variables have missing data

ID  bHP  EngSP  CYL  DryWGT  DISP  UC
1   290  2600   6    NA      7.2   $40,079
2   330  2400   6    1296    7.2   $40,927
3   330  2200   6    1905    8.8   $29,563
4   515  1500   6    3090    15.2  $63,931
5   675  2101   8    NA      14.8  $111,976
6   675  2101   8    NA      14.8  $120,661
7   500  2100   8    NA      12.1  $47,873
8   362  2300   NA   3230    12.1  NA
9   340  NA     8    912     6.6   $40,661

NA marks a missing value.
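
A quick way to confirm which variables have gaps is to count the blanks per column. The sketch below stores the table column-wise with `None` for each blank cell; the function name `missing_counts` is an assumption made for this example.

```python
# Engine dataset transcribed from the slides, stored column-wise;
# None marks a blank cell.
DATA = {
    "ID":     [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bHP":    [290, 330, 330, 515, 675, 675, 500, 362, 340],
    "EngSP":  [2600, 2400, 2200, 1500, 2101, 2101, 2100, 2300, None],
    "CYL":    [6, 6, 6, 6, 8, 8, 8, None, 8],
    "DryWGT": [None, 1296, 1905, 3090, None, None, None, 3230, 912],
    "DISP":   [7.2, 7.2, 8.8, 15.2, 14.8, 14.8, 12.1, 12.1, 6.6],
    "UC":     [40079, 40927, 29563, 63931, 111976, 120661, 47873, None, 40661],
}


def missing_counts(data):
    """Return the number of missing values in each column."""
    return {name: sum(value is None for value in values)
            for name, values in data.items()}


counts = missing_counts(DATA)
with_gaps = {name: k for name, k in counts.items() if k > 0}
print(with_gaps)  # prints: {'EngSP': 1, 'CYL': 1, 'DryWGT': 4, 'UC': 1}
```

Dry weight accounts for four of the seven blanks, so it is the variable where the imputed values deserve the closest scrutiny.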


IMPLEMENTING MICE

01 We used the statistical programming platform R and the ‘mice’ package to calculate imputed data
R code:
install.packages("mice")
library(mice)
data <- read.csv("Example.csv")
imputedata <- mice(data, m = 5, method = "pmm", seed = 23109)
Fixed seed to ensure the analysis is repeatable
The default in mice is m=5. This parameter will need to be included if another number of imputations is desired

02 Conduct linear regression on each of the five imputed datasets
To view each of the imputed datasets, we use the complete() function:
R code:
completedData <- complete(imputedata, 1)
The number one in the complete function indicates that you want to see the first imputed dataset. To see datasets 2 through 5, call complete() again with the corresponding index

03 Pooling Results
Combining the results of these separate analyses is referred to as pooling
The pooled regression equation has coefficients that are the arithmetic means of the coefficients for the five individual regressions
Let $m$ denote the number of imputed datasets, $\beta_i$ denote the ith coefficient, and $\beta_{ij}$ denote the ith coefficient for the jth imputed dataset; then:
$$\beta_i = \frac{\sum_{j=1}^{m} \beta_{ij}}{m}$$

IMPLEMENTING MICE

04 Pooling Results - 2
To fit a linear model to a dataset, use the lm() function. Then, pool the m estimates $\hat{Q}^{(1)}, \ldots, \hat{Q}^{(m)}$ into one model $\bar{Q}$.
R code:
Fit1 <- with(imputedata, lm(UC ~ bHP))
summary(pool(Fit1))

05 Goodness-of-Fit Statistics
Unlike the coefficients, you cannot simply average the R2 values, standard errors, the F-stats, etc., in order to calculate the goodness-of-fit statistics
R code:
pool.r.squared(Fit1, adjusted = FALSE)
# mi.anova() is provided by the miceadds package
poolF <- mi.anova(mi.res = imputedata, formula = "UC~bHP")

06 Compare Results
Compare the results from the imputed dataset to the original dataset with missing values removed


ANALYZING RESULTS
Creating plots to determine reasonableness of imputations

Scatterplot Analysis
There is a linear relationship
between UC and bHP. The pattern
of the relationship seems plausible
for the imputed values (pink) as
compared to the observed values
(blue)

Density Plot Analysis

Density plots provide a visual into
the shapes of each imputation. The
plot is useful to determine outlier
imputations and works for variables
with two or more missing values


MICE Results
Cells with several values list the imputations drawn for a missing entry.

ID  bHP  EngSP                         CYL         DryWGT                       DISP  UC
1   290  2600                          6           3090, 1296, 1905, 1905, 912  7.2   $40,079
2   330  2400                          6           1296                         7.2   $40,927
3   330  2200                          6           1905                         8.8   $29,563
4   515  1500                          6           3090                         15.2  $63,931
5   675  2101                          8           912, 3230, 1296, 3090, 1905  14.8  $111,976
6   675  2101                          8           3090, 1905, 3090, 912, 912   14.8  $120,661
7   500  2100                          8           912, 3090, 1296, 3090, 912   12.1  $47,873
8   362  2300                          8, 8, 8, 6  3230                         12.1  $47,873, $47,873, $40,079, $40,927, $111,976
9   340  2400, 2400, 2300, 2300, 2400  8           912                          6.6   $40,661

FIT RESULTS
Comparing results from the original dataset to the imputed (pooled) dataset

Linear Model
The model is a solid one with a statistically significant p-value less than alpha = 0.05 and an R2 equal to 87.5%. One data point was removed due to missing a unit cost value

MICE Imputed Model
Though the R2 statistic is lower than the original dataset, we gained some degrees of freedom with the use of imputation with the creation of this statistically significant model. The model does not gain a full degree of freedom since the iterations are pooled


EXPECTATION
MAXIMIZATION

Expectation
Maximization
Imputing by optimizing

Maximum Likelihood
The maximum likelihood method is used to impute missing values.
This method uses available data to impute a value and then checks
to determine the reasonableness of the guess

Covariance
The covariation among variables is used to infer probable values for
the missing data

Two-Step Process
The method follows a two-step process to fill in missing data


EM TWO-STEP PROCESS
How EM fills data gaps

EM is an iterative process used to fill data gaps

STEP #01: First Pass at Filling Gaps
The algorithm begins by filling the gaps with the conditional mean of the missing values.

STEP #02: Iterative Process
The maximum likelihood estimates of the mean vector and covariance matrix are calculated. The covariance matrix is then used to derive regression equations for the next iteration and the cycle continues until the difference between the covariance matrices in subsequent runs falls below the convergence criteria


IMPLEMENTING EM

01 Show missingness patterns
The function prelim.norm is used on a matrix of the x (bHP) and y (cost) variables to sort rows according to the missingness patterns
Fixed seed to ensure the analysis is repeatable
R code:
library(norm)  # provides prelim.norm(), em.norm() and getparam.norm()
a <- prelim.norm(cbind(y, x))

02 Performing maximum likelihood estimation using EM algorithm
R code:
b <- em.norm(a)
c1 <- getparam.norm(a, b)
This function produces a vector which can then be used to return a list of parameters

03 Pooling Results
The average of the imputations is calculated for the variable with missing values
R code:
c1$mu[1]
The coefficients of the model are then estimated
b.est <- c(c1$mu[1] - (c1$sigma[1,2] / c1$sigma[2,2]) * c1$mu[2],
           c1$sigma[1,2] / c1$sigma[2,2])

The model can then be used to calculate the missing values for the dataset
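
To make the two EM steps concrete, here is a minimal pure-Python sketch for the two-variable case in these slides, where bHP is complete and one UC value is missing. It is an illustration under a bivariate normal assumption, not the `norm` package's implementation; the function name `em_bivariate` and the relative convergence tolerance are assumptions. The last two lines mirror the intercept and slope formula used for `b.est`.

```python
# (bHP, UC) values transcribed from the slides; None marks the missing cost.
BHP = [290, 330, 330, 515, 675, 675, 500, 362, 340]
UC = [40079, 40927, 29563, 63931, 111976, 120661, 47873, None, 40661]


def em_bivariate(x, y, tol=1e-10, max_iter=1000):
    """EM for a bivariate normal with x complete and some y missing.

    Returns maximum likelihood estimates (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(x)
    observed = [v for v in y if v is not None]
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    # Starting values taken from the observed y; covariance starts at zero.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((v - mu_y) ** 2 for v in observed) / len(observed)
    s_xy = 0.0
    for _ in range(max_iter):
        slope = s_xy / s_xx
        resid_var = s_yy - slope * s_xy
        # E-step: expected y and y^2 for each row given current parameters.
        e_y, e_yy = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                cond_mean = mu_y + slope * (xi - mu_x)
                e_y.append(cond_mean)
                e_yy.append(cond_mean ** 2 + resid_var)
            else:
                e_y.append(yi)
                e_yy.append(yi ** 2)
        # M-step: re-estimate the mean and covariance from the expectations.
        new_mu_y = sum(e_y) / n
        new_s_yy = sum(e_yy) / n - new_mu_y ** 2
        new_s_xy = sum(xi * ei for xi, ei in zip(x, e_y)) / n - mu_x * new_mu_y
        shift = max(abs(new - old) / max(1.0, abs(old))
                    for new, old in ((new_mu_y, mu_y), (new_s_yy, s_yy),
                                     (new_s_xy, s_xy)))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if shift < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy


mu_x, mu_y, s_xx, s_xy, s_yy = em_bivariate(BHP, UC)
slope = s_xy / s_xx                 # sigma[1,2] / sigma[2,2] in the R code
intercept = mu_y - slope * mu_x     # mu[1] - slope * mu[2] in the R code
print(round(mu_y), round(intercept, 1), round(slope, 3))
```

On these data the estimated mean of UC should land close to the $59,771 that the EM table reports for engine 8, which appears to correspond to the `c1$mu[1]` value read off in step 03.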


EM
ID bHP UC
1 290 $40,079
2 330 $40,927
3 330 $29,563
4 515 $63,931
5 675 $111,976
6 675 $120,661
7 500 $47,873
8 362 $59,771
9 340 $40,661


FIT RESULTS - 2
Comparing results from the original dataset to the EM imputed dataset

Linear Model
The model is a solid one with a statistically significant p-value less than alpha = 0.05 and an R2 equal to 87.5%. One data point was removed due to missing a unit cost value

EM Imputed Model
Compared to the results produced from removing the data points with missing values, this is a better performing model. A degree of freedom was gained and the R2 metric increased while the model retained statistical significance


EXPECTATION MAXIMIZATION
Why choose EM?

ADVANTAGES
EM preserves the relationship with other variables, unlike mean imputation

DISADVANTAGES
EM can sometimes underestimate standard error


COMPARING METHODS
MICE VERSUS EM

MICE and EM are based on similar assumptions and in practice they often produce similar results. The Bayesian estimation in MICE is asymptotically equivalent to the maximum likelihood estimates in EM, so for large data sets the two methods should provide similar results

For small data sets, it is wise to run both and compare the results, as small differences in the methods could have an outsized impact when the number of data points is limited

There are multiple methods which can be used to impute data. Two of the strongest techniques, MICE
and EM, should be considered first as they preserve relationships between independent and
dependent variables and estimate error more accurately.

The MICE method for imputation has an edge over EM since MICE calculates multiple imputations for
the missing values instead of one single estimate.


Q&A


Presenters

Kimberly Roye, Senior Data Scientist
Christian Smart, Chief Scientist
Dustin Hilton, Senior Cost Analyst
