Lecture 8 Handling Missing Values

Uploaded by

Fatima Chaudhry

Application Development (CSF 510)

Handling Missing Values using Pandas


Lecture Contents
• What are missing values?
• Why is missing value treatment required?
• Why are there missing values in data?
• Detecting missing values
• Configuring missing values
• Handling missing values

What are missing values?

Why is Missing Data Treatment Required?
• Missing data in the training data set can reduce the power / fit of a model.
• It can lead to a biased model, because the behavior of a variable and its relationship with other variables cannot be analyzed correctly.
• It can lead to wrong predictions or classifications.
• In some cases, we may not be able to apply a given algorithm / method at all if there are missing values in the data.

Why Does My Data Have Missing Values?

• Data extraction problems
• Data imported from different data sources, etc.
• Problems at the data collection level
• The data is not available (non-availability)
• The data is not mandatory for business operations, so it is not recorded
• Wrong data is also considered missing data

Treating Missing Data

• We can handle missing values by:
• Dropping null or missing values altogether
• Filling missing values using imputation (mean, median, mode)
• Predicting missing values with a machine learning algorithm

Deleting Missing Values

• This is the fastest and easiest way to handle missing values.
• It can reduce the quality of our model, because it reduces the sample size.
• If no more than about 5% of a large dataset is missing, we can delete the records with missing values:
• e.g., deleting 500 records out of 10,000 records
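The 5% rule of thumb can be checked in code before any rows are dropped. The sketch below is only an illustration: the `price` column and the generated data are hypothetical, and the 5% cutoff is a guideline rather than a hard limit.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10,000 records, 500 of which have a missing price.
df = pd.DataFrame({"price": np.arange(10_000, dtype=float)})
df.loc[:499, "price"] = np.nan  # label-based slice, rows 0..499 inclusive

# Number and share of rows that contain at least one missing value.
n_missing = int(df.isnull().any(axis=1).sum())
missing_share = n_missing / len(df)

# Drop the incomplete rows only when they are a small share of the data.
if missing_share <= 0.05:
    df_clean = df.dropna()
else:
    df_clean = df

print(n_missing, missing_share, len(df_clean))
```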

Missing Values Imputation

• We can impute the missing values:
• Using the mean or median for numerical attributes
• The mean is sensitive to outliers
• Using the mode for categorical attributes
• Using a machine learning algorithm like KNN
• It gives the closest estimate, but
• It can be compute-intensive
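A minimal sketch of mean, median and mode imputation with pandas follows. The `rooms` and `color` columns are made up for illustration; the outlier (900) is there to show why the median is often preferred over the mean. KNN-based imputation is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one outlier (900) and some missing entries.
df = pd.DataFrame({
    "rooms": [3.0, 4.0, np.nan, 3.0, 900.0],
    "color": ["red", None, "red", "blue", "red"],
})

mean_rooms = df["rooms"].mean()      # pulled up by the outlier
median_rooms = df["rooms"].median()  # robust to the outlier

# Numerical attribute: fill with the median.
df["rooms"] = df["rooms"].fillna(median_rooms)

# Categorical attribute: fill with the mode (most frequent value).
# mode() returns a Series, so [0] picks the first mode.
df["color"] = df["color"].fillna(df["color"].mode()[0])

print(mean_rooms, median_rooms)
print(df)
```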

Handling Missing Values in Python

A Sample Use Case

Missing Values Recognized by Pandas

• When reading data, Pandas recognizes the following values as missing by default:
• NA
• NaN
• n/a
• N/A
• Blank field
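This can be verified with a small experiment. The CSV text below is hypothetical (it is not the lecture's data set); each row of column `b` holds one of the markers listed above, and `read_csv` is used with its default settings.

```python
import io
import pandas as pd

# Hypothetical CSV text: column "b" holds NA, NaN, n/a, N/A, a blank
# field, and finally one real value.
csv_text = "a,b\n1,NA\n2,NaN\n3,n/a\n4,N/A\n5,\n6,7\n"

df = pd.read_csv(io.StringIO(csv_text))

# True where pandas treated the entry as missing.
recognized = df["b"].isnull().tolist()
print(recognized)
```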

How many Missing Values are there?
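One common way to count missing values is to combine `isnull()` with `sum()`. The DataFrame below is a made-up stand-in for the lecture's data; the `PRICE` column is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "NUM_BEDROOMS": [3, np.nan, 2, np.nan],
    "PRICE": [100.0, 150.0, np.nan, 200.0],
})

per_column = df.isnull().sum()          # missing values in each column
total = int(df.isnull().sum().sum())    # missing values in the whole frame
any_missing = bool(df.isnull().values.any())  # is anything missing at all?

print(per_column)
print(total, any_missing)
```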

Defining the Missing Values
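Markers that pandas does not recognize by default can be declared with the `na_values` parameter of `read_csv`. In this sketch the CSV text and the markers `--` and `na` are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text with two non-standard missing-value markers.
csv_text = "PID,NUM_BEDROOMS\n1,3\n2,--\n3,na\n4,2\n"

# Without extra markers, "--" and "na" are read as ordinary strings.
raw = pd.read_csv(io.StringIO(csv_text))

# Extra strings to treat as missing, in addition to the defaults.
missing_markers = ["--", "na"]
df = pd.read_csv(io.StringIO(csv_text), na_values=missing_markers)

print(raw["NUM_BEDROOMS"].isnull().sum())  # missing count before
print(df["NUM_BEDROOMS"].isnull().sum())   # missing count after
```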

Checking Individual Columns for Null Values
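A single column can be checked by selecting it first and then calling `isnull()`. The data below is hypothetical, including the `OWNER` column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "NUM_BEDROOMS": [3, np.nan, 2],
    "OWNER": ["Y", "N", None],
})

mask = df["NUM_BEDROOMS"].isnull()  # True where the entry is missing
n_missing = int(mask.sum())         # how many entries are missing
rows_missing = df[mask]             # the rows that have the gap

print(mask.tolist(), n_missing)
print(rows_missing)
```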

Converting invalid entries to NaN
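One possible approach for a column that should be numeric is `pd.to_numeric` with `errors="coerce"`, which turns every entry that cannot be parsed as a number into NaN. This is a sketch on made-up values, not necessarily the method used in the lecture.

```python
import pandas as pd

# Hypothetical column that should be numeric but contains text entries.
df = pd.DataFrame({"NUM_BEDROOMS": ["3", "two", "4", "unknown"]})

# Entries that cannot be parsed as numbers become NaN.
df["NUM_BEDROOMS"] = pd.to_numeric(df["NUM_BEDROOMS"], errors="coerce")

print(df["NUM_BEDROOMS"].tolist())
```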

Other Useful and Efficient Methods
• Pandas replace() method

• NumPy where() method

• Lambda functions

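The three methods can be compared on the same task. In this sketch the sentinel value -1 is assumed to mark an invalid entry in a hypothetical column; each approach converts it to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -1 is assumed to be a placeholder for "unknown".
df = pd.DataFrame({"NUM_BEDROOMS": [3, -1, 2, -1]})
col = df["NUM_BEDROOMS"]

# 1. replace(): map a known bad value to NaN.
by_replace = col.replace(-1, np.nan)

# 2. np.where(): NaN where the condition holds, original value elsewhere.
by_where = pd.Series(np.where(col < 0, np.nan, col), index=df.index)

# 3. Lambda with apply(): the same rule, written per element.
by_lambda = col.apply(lambda v: np.nan if v < 0 else v)

print(by_replace.tolist())
```

All three produce the same result here; `replace()` suits fixed bad values, while `np.where()` and a lambda allow arbitrary conditions.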
Dealing With Missing Values

• Replace with a given Value
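A minimal sketch of both variants, filling a whole column or overwriting a single cell, on hypothetical data. The fill values 1 and 4 are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"NUM_BEDROOMS": [3.0, np.nan, 2.0, np.nan]})

# Fill every missing entry in the column with one chosen value.
filled = df["NUM_BEDROOMS"].fillna(1)

# Or overwrite a single cell, addressed by row label and column name.
df.loc[1, "NUM_BEDROOMS"] = 4

print(filled.tolist())
print(df["NUM_BEDROOMS"].tolist())
```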

Replacing with Median – NUM_BEDROOMS
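A sketch of median replacement for the NUM_BEDROOMS column, using made-up values. `median()` skips NaN entries by default, so it can be computed before filling.

```python
import numpy as np
import pandas as pd

# Hypothetical values for the NUM_BEDROOMS column.
df = pd.DataFrame({"NUM_BEDROOMS": [3.0, np.nan, 2.0, 4.0, np.nan]})

# Median of the observed values (NaN entries are skipped).
median = df["NUM_BEDROOMS"].median()

# Assign the result back instead of filling in place.
df["NUM_BEDROOMS"] = df["NUM_BEDROOMS"].fillna(median)

print(median, df["NUM_BEDROOMS"].tolist())
```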

Deleting Rows with NULL Values

• In some cases we can delete all the rows with Null Values
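`dropna()` covers this case and has a few useful variants, sketched below on hypothetical data (the `PRICE` column is made up).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "NUM_BEDROOMS": [3, np.nan, 2, np.nan],
    "PRICE": [100.0, 150.0, np.nan, np.nan],
})

any_dropped = df.dropna()                            # rows with at least one NaN removed
all_dropped = df.dropna(how="all")                   # only rows that are entirely NaN removed
subset_dropped = df.dropna(subset=["NUM_BEDROOMS"])  # only this column is checked

print(len(any_dropped), len(all_dropped), len(subset_dropped))
```

`dropna()` returns a new DataFrame, so the original `df` is left unchanged unless the result is assigned back.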

Functions to Explore
• isna()
• isnull()
• fillna()
• dropna()
• replace()
• mean()
• median()
• mode() , mode()[0]
