Lecture 8 Handling Missing Values
What are missing values?
Why is Missing Data Treatment Required?
• Missing data in the training set can reduce the statistical power and fit of a model.
• It can bias the model, because the behavior of a variable and its
relationships with other variables are not analyzed correctly.
• It can lead to wrong predictions or classifications.
• In some cases, a given algorithm or method cannot be applied at all if
the data contains missing values.
Why Does My Data Have Missing Values?
Treating Missing Data
Deleting Missing Values
Missing Value Imputation
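
As a rough illustration of imputation, the sketch below fills a numeric column with its mean and a categorical column with its most frequent value (the mode). The column names SQ_FT and OWN_OCCUPIED and all values are hypothetical, chosen only for the example. Which statistic is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are assumptions for illustration.
df = pd.DataFrame({
    "SQ_FT": [1000.0, np.nan, 850.0, 700.0],
    "OWN_OCCUPIED": ["Y", "N", None, "Y"],
})

# Numeric column: fill gaps with the mean of the observed values.
sq_ft_mean = df["SQ_FT"].mean()
df["SQ_FT"] = df["SQ_FT"].fillna(sq_ft_mean)

# Categorical column: fill gaps with the most frequent value.
# mode() returns a Series (there can be ties), so [0] takes the first entry.
most_common = df["OWN_OCCUPIED"].mode()[0]
df["OWN_OCCUPIED"] = df["OWN_OCCUPIED"].fillna(most_common)

print(df)
```

Mean imputation is sensitive to outliers, so the median is often a safer default for skewed numeric columns.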
Handling Missing Values in Python
A Sample Use Case
Missing Values Recognized by Pandas
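
A minimal sketch of which raw entries pandas treats as missing when reading a CSV with default settings. The CSV content is made up for this example, and ST_NUM is a hypothetical column name. Empty cells and common markers such as NA and n/a are parsed as NaN, while an ad hoc marker like -- is kept as an ordinary string.

```python
import io
import pandas as pd

# Hypothetical CSV text; the rows are assumptions for illustration.
csv_text = """ST_NUM,NUM_BEDROOMS
104,3
197,
,NA
201,n/a
203,--
"""

df = pd.read_csv(io.StringIO(csv_text))

# True where pandas parsed the cell as a missing value.
recognized = df["NUM_BEDROOMS"].isnull()
print(recognized.tolist())
```

Because the unrecognized "--" entry survives as text, the whole column is read as strings rather than numbers, which is a common sign that non-standard missing markers are present.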
How many Missing Values are there?
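
One way to count missing values is to sum the boolean mask returned by isnull(). The sketch below uses a small made-up DataFrame, and NUM_BATH is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "NUM_BEDROOMS": [3, np.nan, 1, np.nan],
    "NUM_BATH": [1, 1.5, np.nan, 2],
})

# Missing values per column (True counts as 1 when summed).
per_column = df.isnull().sum()

# Total number of missing values in the whole DataFrame.
total = int(df.isnull().sum().sum())

# Quick yes/no check: is anything missing at all?
any_missing = bool(df.isnull().values.any())

print(per_column)
print(total, any_missing)
```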
Defining the Missing Values
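
When a dataset uses its own markers for missing entries, they can be declared at load time through the na_values parameter of read_csv. The sketch below assumes the markers are "--" and "na". Both markers and the CSV content are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV text; "--" and "na" are assumed custom missing markers.
csv_text = """ST_NUM,NUM_BEDROOMS
104,3
197,--
201,na
203,2
"""

missing_markers = ["--", "na"]

# na_values adds these strings to the markers pandas already recognizes.
df = pd.read_csv(io.StringIO(csv_text), na_values=missing_markers)

print(df["NUM_BEDROOMS"])
```

With the markers declared, the column is parsed as numeric and the two placeholder entries become NaN.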
Checking Individual Columns for Null Values
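
A short sketch of checking a single column, here NUM_BEDROOMS, on made-up data (ST_NUM is a hypothetical column name). The boolean mask can be used to test for, count, and select the affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "ST_NUM": [104, 197, np.nan, 201],
    "NUM_BEDROOMS": [3, np.nan, 1, np.nan],
})

# Boolean mask: True where NUM_BEDROOMS is missing.
mask = df["NUM_BEDROOMS"].isnull()

has_missing = bool(mask.any())   # does the column contain any nulls?
n_missing = int(mask.sum())      # how many?
rows_with_missing = df[mask]     # which rows?

print(has_missing, n_missing)
print(rows_with_missing)
```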
Converting invalid entries to NaN
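
One possible approach for a column that should be numeric is pd.to_numeric with errors="coerce", which turns anything that cannot be parsed as a number into NaN. This is a sketch on made-up values, not necessarily the method used in the lecture.

```python
import pandas as pd

# Hypothetical data: two entries are text where a number is expected.
df = pd.DataFrame({"NUM_BEDROOMS": ["3", "two", "1", "HURLEY"]})

# errors="coerce" replaces unparseable entries with NaN instead of raising.
df["NUM_BEDROOMS"] = pd.to_numeric(df["NUM_BEDROOMS"], errors="coerce")

print(df["NUM_BEDROOMS"])
```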
Other Useful and Efficient Methods
• Pandas replace() method
• Lambda Functions
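
Both methods can be sketched as follows. replace() maps known placeholder strings to NaN, and a lambda passed to apply() keeps only values from an expected set. The placeholders, the OWN_OCCUPIED column, and the allowed values "Y" and "N" are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "NUM_BEDROOMS": ["3", "--", "1", "na"],
    "OWN_OCCUPIED": ["Y", "N", "12", "Y"],
})

# replace(): map known placeholder strings to NaN in one call.
df["NUM_BEDROOMS"] = df["NUM_BEDROOMS"].replace({"--": np.nan, "na": np.nan})

# Lambda: keep a value only if it is one of the expected categories.
allowed = ("Y", "N")
df["OWN_OCCUPIED"] = df["OWN_OCCUPIED"].apply(
    lambda v: v if v in allowed else np.nan
)

print(df)
```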
Dealing With Missing Values
Replacing with Median – NUM_BEDROOMS
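
A minimal sketch of median replacement for NUM_BEDROOMS, using made-up values. median() skips NaN by default, so it is computed from the observed entries only.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({"NUM_BEDROOMS": [3, np.nan, 1, 4, np.nan]})

# Median of the observed values (NaN entries are ignored).
median = df["NUM_BEDROOMS"].median()

# Fill the gaps with that median.
df["NUM_BEDROOMS"] = df["NUM_BEDROOMS"].fillna(median)

print(median)
print(df["NUM_BEDROOMS"].tolist())
```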
Deleting Rows with NULL Values
• In some cases, we can simply delete every row that contains a null value.
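
Row deletion can be sketched with dropna(). The data and the ST_NUM column name are hypothetical. The subset argument restricts the check to chosen columns, which drops fewer rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "ST_NUM": [104, 197, np.nan, 201],
    "NUM_BEDROOMS": [3, np.nan, 1, 2],
})

# Drop every row that has a null in any column.
complete_rows = df.dropna()

# Drop only rows where NUM_BEDROOMS is null.
bedrooms_known = df.dropna(subset=["NUM_BEDROOMS"])

print(len(df), len(complete_rows), len(bedrooms_known))
```

dropna() returns a new DataFrame and leaves df unchanged, so the original data stays available for comparison.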
Functions to Explore
• isna()
• isnull()
• fillna()
• dropna()
• replace()
• mean()
• median()
• mode(), mode()[0]