
Lec9 Dealing With Missing Values

The document discusses approaches for dealing with missing values in data. It identifies missing values in Pandas dataframes and describes methods for filling missing values in both numerical and categorical variables, including using mean, median, and mode values. The document provides examples of using functions like fillna() to impute missing values.

Uploaded by

aniket786611

Dealing with missing values

In this lecture
• Identifying missing values

• Approaches to fill the missing values

Python for Data Science 2


Importing data into Spyder
• Importing necessary libraries
‘os’ library to change the working directory
‘pandas’ library to work with dataframes

• Changing the working directory
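
A minimal sketch of these two steps. The folder used here is an assumption: a temporary directory stands in for the folder that holds the data file.

```python
import os
import tempfile

import pandas as pd  # used in later steps to work with dataframes

# Hypothetical working directory: a temporary folder stands in for
# the folder where your data file is stored.
work_dir = tempfile.mkdtemp()

# Change the working directory, then confirm the change.
os.chdir(work_dir)
print(os.getcwd())
```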



Importing data into Spyder

• Importing data

• Creating copies of original data
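
A sketch of both steps, under stated assumptions: the inline CSV text, its values, and the "??" placeholder for missing entries are made up for illustration and stand in for a real file on disk. The names data and data2 are also hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; "??" and empty cells mark missing entries.
csv_text = """Price,Age,KM,FuelType,HP,MetColor
13500,23,46986,Diesel,90,1
13750,,72937,Diesel,??,1
13950,24,??,,90,
14950,26,48000,Petrol,90,0
"""

# na_values tells pandas which extra strings to read as NaN.
data = pd.read_csv(io.StringIO(csv_text), na_values=["??"])

# Deep copy: changes made to data2 do not affect the original data.
data2 = data.copy(deep=True)
print(data2)
```

Working on the copy keeps the original dataframe available for comparison after imputation.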



Identifying missing values
• In Pandas dataframes, missing data is represented by NaN (an acronym for Not a Number)

• To check for null values in Pandas dataframes, isnull() and isna() are used

• These functions return a dataframe of Boolean values which are True for NaN values
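
A small illustration of the Boolean mask these functions return. The dataframe and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in each column.
df = pd.DataFrame({
    "Age": [23.0, np.nan, 24.0],
    "FuelType": ["Diesel", "Petrol", None],
})

# True marks a missing entry; isna() and isnull() give the same result.
mask = df.isna()
print(mask)
print(mask.equals(df.isnull()))
```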



Identifying missing values
DataFrame.isna().sum() (or) DataFrame.isnull().sum()
• To check the count of missing values present in each column
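
A sketch of the per-column count on made-up data. Summing the Boolean mask counts the True entries, so either call gives the number of missing values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Age has 1 missing value, KM has 2, HP has none.
df = pd.DataFrame({
    "Age": [23.0, np.nan, 24.0, 26.0],
    "KM": [46986.0, np.nan, np.nan, 48000.0],
    "HP": [90, 90, 110, 90],
})

counts = df.isna().sum()
print(counts)

# isnull() is an alias of isna(), so the counts are identical.
print(counts.equals(df.isnull().sum()))
```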



Identifying missing values

• Subsetting the rows that have one or more missing values
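
One possible way to do the subsetting, shown on made-up data: any(axis=1) flags each row that contains at least one True in the missing-value mask. The name missing_rows is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 1 and 2 each contain a missing value.
df = pd.DataFrame({
    "Age": [23.0, np.nan, 24.0, 26.0],
    "FuelType": ["Diesel", "Petrol", None, "Petrol"],
})

# Keep only the rows with one or more missing values.
missing_rows = df[df.isnull().any(axis=1)]
print(missing_rows)
```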





Approaches to fill the missing values

Two ways of approach

• Fill the missing values by mean / median, in case of numerical variable

• Fill the missing values with the class which has maximum count, in case of categorical variable



Imputing missing values
• Look at the description to know whether numerical
variables should be imputed with mean or median

DataFrame.describe()

• Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values
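
A sketch of reading the summary on made-up data. A common rule of thumb, stated here as an assumption rather than as the lecture's own wording, is to prefer the median when the mean is pulled far from it by extreme values.

```python
import numpy as np
import pandas as pd

# Hypothetical KM values: one very large reading and one missing value.
df = pd.DataFrame({"KM": [10000.0, 20000.0, 30000.0, 200000.0, np.nan]})

summary = df.describe()
print(summary)

# NaN is excluded, so count is 4; the 50% row is the median.
print(summary.loc["mean", "KM"], summary.loc["50%", "KM"])
```

In this sample the mean sits well above the median, which suggests that the median would be the safer fill value for this column.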



Statistical summary of data



Imputing missing values of ‘Age’

• Calculating the mean value of the Age variable

• To fill NA/NaN values using the specified value


DataFrame.fillna()
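
A minimal sketch of both steps on made-up Age values. The result of fillna() is assigned back to the column, which avoids relying on in-place modification of a column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical Age values with one missing entry.
data2 = pd.DataFrame({"Age": [20.0, np.nan, 30.0, 40.0]})

# mean() skips NaN by default.
age_mean = data2["Age"].mean()

# Replace the missing entries with the mean.
data2["Age"] = data2["Age"].fillna(age_mean)
print(data2)
```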



Imputing missing values of ‘KM’

• Calculating the median value of the KM variable

• To fill NA/NaN values using the specified value


DataFrame.fillna()
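
The same pattern with the median, sketched on made-up KM values that include one extreme reading.

```python
import numpy as np
import pandas as pd

# Hypothetical KM values: one missing entry and one extreme reading.
data2 = pd.DataFrame({"KM": [10000.0, np.nan, 30000.0, 200000.0]})

# median() skips NaN by default.
km_median = data2["KM"].median()

# Replace the missing entries with the median.
data2["KM"] = data2["KM"].fillna(km_median)
print(data2)
```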



Imputing missing values of ‘HP’
• Calculating the mean value of the HP variable

• To fill NA/NaN values using the specified value


DataFrame.fillna()



Imputing missing values of ‘HP’
• Check for missing data after filling values
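
A sketch of the check on made-up HP values: after filling with the mean, the missing-value count for the column should be zero.

```python
import numpy as np
import pandas as pd

# Hypothetical HP values with one missing entry.
data2 = pd.DataFrame({"HP": [90.0, np.nan, 110.0, 100.0]})

# Fill the missing HP values with the column mean.
data2["HP"] = data2["HP"].fillna(data2["HP"].mean())

# Recount the missing values per column after filling.
remaining = data2.isnull().sum()
print(remaining)
```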



Imputing missing values of ‘FuelType’
Series.value_counts()
• Returns a Series containing counts of unique values
• The values will be in descending order so that the
first element is the most frequently-occurring
element
• Excludes NA values by default
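
A short illustration of these three properties. The FuelType values are made up for the example.

```python
import pandas as pd

# Hypothetical FuelType values with one missing entry.
fuel = pd.Series(["Petrol", "Diesel", "Petrol", None, "CNG", "Petrol", "Diesel"])

# Counts are sorted in descending order; the missing entry is not counted.
counts = fuel.value_counts()
print(counts)

# The first index label is the most frequent category.
print(counts.index[0])
```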



Imputing missing values of ‘FuelType’
Series.value_counts()
• To get the mode value of FuelType

• To fill NA/NaN values using the specified value


DataFrame.fillna()
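
A sketch on made-up data: the mode is taken as the first index label of value_counts(), then passed to fillna(). The name fuel_mode is hypothetical.

```python
import pandas as pd

# Hypothetical FuelType values with one missing entry.
data2 = pd.DataFrame({"FuelType": ["Petrol", "Diesel", None, "Petrol"]})

# Most frequent category: first label of the descending counts.
fuel_mode = data2["FuelType"].value_counts().index[0]

# Replace the missing entries with that category.
data2["FuelType"] = data2["FuelType"].fillna(fuel_mode)
print(data2)
```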



Imputing missing values of ‘MetColor’
Series.value_counts()
• To get the mode value of MetColor

• To fill NA/NaN values using the specified value


DataFrame.fillna()
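
An alternative sketch, offered as an assumption rather than as the lecture's own code: Series.mode() returns the most frequent value(s) directly, and its first element can be used as the fill value. The MetColor values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical MetColor flags (1 = metallic, 0 = not) with one missing entry.
data2 = pd.DataFrame({"MetColor": [1.0, 0.0, 1.0, np.nan, 1.0]})

# mode() returns a Series because ties are possible; take the first value.
met_mode = data2["MetColor"].mode()[0]

# Replace the missing entries with the mode.
data2["MetColor"] = data2["MetColor"].fillna(met_mode)
print(data2)
```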



Checking for missing values

• Check for missing data after filling values



Imputing missing values using lambda functions

• To fill the NA/NaN values in both numerical and categorical variables in a single step

• Check for missing data after filling values
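
One possible form of such a lambda, sketched on made-up data. The choices here are assumptions: numeric columns are detected with pd.api.types.is_numeric_dtype and filled with their mean, and all other columns are filled with their most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in numeric and categorical columns.
df = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "KM": [10000.0, 30000.0, np.nan],
    "FuelType": ["Petrol", None, "Petrol"],
})

# apply() passes each column to the lambda as a Series.
filled = df.apply(
    lambda col: col.fillna(col.mean())
    if pd.api.types.is_numeric_dtype(col)
    else col.fillna(col.value_counts().index[0])
)

# Check for missing data after filling values.
print(filled.isnull().sum())
```

Using the mean for every numeric column is a simplification; a column with extreme values may be better served by the median, filled separately.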



Summary
• Identifying missing values

• Approaches to fill the missing values



THANK YOU
