Pandas - Dataframe - Handling Missing Nan Values

The document provides a comprehensive guide on handling missing or NaN values in Pandas, covering methods such as isna(), isnull(), dropna(), and fillna(). It explains how to create DataFrames from CSV files, check for missing values, count them, and handle them by dropping or filling with specific values. Various code examples illustrate the use of these methods for practical data manipulation.

Uploaded by dheerajsai01

Data Science – Pandas – Handling Missing or NaN values

12. PANDAS – Handling missing or NaN values

Contents
1. NaN Value
2. Creating a DataFrame by loading csv file
3. isna() and isnull() method – Checking NaN values
4. notnull() method – Checking NaN values
5. Counting NaN values column-wise
6. dropna() method – Handling missing values
7. dropna(inplace = True) method – Handling missing values
8. fillna() method – Handling missing values


1. NaN Value

• NaN stands for "Not a Number".
• pandas uses NaN to represent missing values in data.
• NaN is a floating-point value, so its data type is float.
• When a CSV file is loaded, empty fields are read as NaN values.
• During data analysis we need to handle these NaN values.
  o For example, in a survey some users may choose not to share their income and others may choose not to share their address; the unanswered fields appear as missing values in the dataset.
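
To see the float claim in action, here is a minimal sketch using only NumPy and pandas (the Series values are made up for illustration). It shows that numpy.nan is a Python float and that a column of whole numbers is stored as float64 once a single NaN appears in it.

```python
import numpy as np
import pandas as pd

# numpy.nan is an ordinary Python float object
print(type(np.nan))            # <class 'float'>

# A column of whole numbers with no missing values keeps an integer dtype
complete = pd.Series([10, 20, 30])
print(complete.dtype)

# One missing value forces pandas to upcast the column to float64,
# because NaN cannot be stored in a plain integer column
with_gap = pd.Series([10, np.nan, 30])
print(with_gap.dtype)          # float64
print(with_gap.tolist())       # [10.0, nan, 30.0]
```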

None and NaN

• None: Python's built-in object that represents the absence of a value.
• NaN: a special floating-point value (available as numpy.nan) that pandas uses to represent missing data.
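
The difference between None and NaN can be checked directly. This minimal sketch (values are illustrative) shows that NaN never compares equal to itself, which is why pandas provides isna() instead of relying on ==, and that pandas converts None to NaN in a numeric column.

```python
import numpy as np
import pandas as pd

# None is Python's "no value" object; NaN is a float value
print(type(None))      # <class 'NoneType'>
print(type(np.nan))    # <class 'float'>

# NaN is not equal to anything, not even itself,
# so == cannot be used to detect it
print(np.nan == np.nan)    # False

# pandas treats both as missing
print(pd.isna(None), pd.isna(np.nan))   # True True

# In a numeric column pandas converts None to NaN
s = pd.Series([1, None, 3])
print(s.dtype)        # float64
print(s.tolist())     # [1.0, nan, 3.0]
```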


2. Creating a DataFrame by loading csv file

• We can create a DataFrame by loading a CSV file with pd.read_csv().
• The given fruits1.csv file contains missing values.
• Observe the missing (NaN) values in the resulting DataFrame.
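
Because fruits1.csv itself is not reproduced here, the sketch below reads a small hypothetical CSV text through io.StringIO so it runs on its own. The column names and numbers are invented. With read_csv's default settings, both empty fields and the string NA are read as NaN, and every column that receives a NaN is stored as float64.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for fruits1.csv (not the real file).
# Row 1 has an empty Mango field, row 2 an empty Apple field,
# and row 3 uses the string NA for Banana.
csv_text = """Apple,Banana,Mango
10,5,
,8,12
7,NA,3
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)          # each column shows one NaN
print(df.dtypes)   # all three columns are float64
```

To load the real file, pass the path "fruits1.csv" to read_csv instead of io.StringIO(csv_text), as in demo1.py.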

Program Loading fruits csv file


Name demo1.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")

print(df1)

Output


3. isna() and isnull() method – Checking NaN values

• isna() and isnull() are predefined DataFrame methods.
• We call these methods on a DataFrame object.
• They check whether each value in the DataFrame is missing.
• They return a DataFrame of the same shape with True where a value is missing and False where it is present.
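
A self-contained sketch of isna() on a small hypothetical DataFrame (standing in for fruits1.csv, whose contents are not shown). Chaining any() onto the Boolean mask is one common way to answer the yes/no question "does this DataFrame contain missing values?".

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame in place of fruits1.csv
df1 = pd.DataFrame({
    "Apple": [10, np.nan, 7],
    "Banana": [5, 8, np.nan],
})

# isna() returns a Boolean DataFrame of the same shape:
# True where the value is missing, False where it is present
mask = df1.isna()
print(mask)

# any() collapses the mask: first per column, then overall
print(mask.any())          # which columns contain at least one NaN
print(mask.any().any())    # True if the DataFrame has any NaN at all
```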

Program isna() method


Name demo2.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.isna()

print(df1.head())
print()
print(df2.head())

Output


Program isnull() method


Name demo3.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.isnull()

print(df1.head())
print()
print(df2.head())

Output

Make a note

• isnull() is an alias of isna(), so both methods work in the same way.
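
A quick way to confirm that the two methods are interchangeable, using a small hypothetical DataFrame: the masks returned by isna() and isnull() compare equal. None in a text column is also reported as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN in a numeric column, one None in a text column
df1 = pd.DataFrame({"Age": [26, np.nan, 45], "Name": ["Rajan", None, "Veeru"]})

# isnull() is an alias of isna(), so both masks are identical
print(df1.isna().equals(df1.isnull()))   # True

# Both methods count one missing value per column here
print(df1.isnull().sum())
```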


4. notnull() method – Checking NaN values

• notnull() is a predefined DataFrame method.
• We call this method on a DataFrame object.
• It is the opposite of isna(): it checks whether each value in the DataFrame is present.
• It returns True where a value is present and False where it is missing.
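
The following sketch, on a small hypothetical DataFrame, checks that notnull() is the element-wise opposite of isna() and that notna() is another name for it. The last line shows a typical use: filtering a DataFrame down to the rows where one column has a value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Age": [26, np.nan, 45], "Salary": [40000, 20000, np.nan]})

present = df1.notnull()
print(present)

# notnull() is the element-wise negation of isna()
print(present.equals(~df1.isna()))     # True

# notna() is another name for notnull()
print(present.equals(df1.notna()))     # True

# Keep only the rows where Age is present (rows 0 and 2)
print(df1[df1["Age"].notnull()])
```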

Program notnull() method


Name demo4.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.notnull()

print(df1.head())
print()
print(df2.head())

Output


5. Counting NaN values column-wise

• We can count the number of missing values in a DataFrame.
• Calling isna() followed by sum() counts the missing values in each column, because sum() treats True as 1 and False as 0.
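
The sketch below repeats the counting on a small hypothetical DataFrame so the numbers can be verified by hand. It also shows isna().mean() * 100 as a shortcut for the percentage formula (s * 100) / len(df1) used in demo6.py: the mean of a Boolean column is the fraction of True values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with 2, 1 and 0 missing values per column
df1 = pd.DataFrame({
    "Apple": [10, np.nan, 7, np.nan],
    "Banana": [5, 8, np.nan, 2],
    "Mango": [1, 2, 3, 4],
})

# True counts as 1 and False as 0, so summing the mask counts NaNs per column
counts = df1.isna().sum()
print(counts)        # Apple 2, Banana 1, Mango 0

# The mean of a Boolean column is the fraction of True values,
# which gives the same percentage as (counts * 100) / len(df1)
percent = df1.isna().mean() * 100
print(percent)       # Apple 50.0, Banana 25.0, Mango 0.0

# Total number of missing values in the whole DataFrame
print(int(counts.sum()))   # 3
```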

Program Counting the missing values in each column


Name demo5.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv('fruits1.csv')
s = df1.isna().sum()

print(s)

Output


Program Counting the missing values in each column with percentage


Name demo6.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv('fruits1.csv')
s = df1.isna().sum()
per = (s * 100) / len(df1)

print(per)

Output


6. dropna() method – Handling missing values

• dropna() is a predefined DataFrame method.
• We call dropna() on a DataFrame object.
• By default it drops every row in which at least one value is missing and returns a new DataFrame, leaving the original unchanged.
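
dropna() also accepts parameters that control which rows are dropped. This sketch uses its how, subset and thresh parameters on a small hypothetical DataFrame; the row labels in the comments follow from the data shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 2 is entirely missing
df1 = pd.DataFrame({
    "Name": ["Rajan", "Daniel", None, "Venkat", "Shafi"],
    "Age": [26, np.nan, np.nan, 30, np.nan],
    "Salary": [40000, 20000, np.nan, np.nan, np.nan],
})

# Default how="any": drop a row if it has at least one missing value
print(df1.dropna())                  # keeps row 0

# how="all": drop a row only if every value in it is missing
print(df1.dropna(how="all"))         # keeps rows 0, 1, 3, 4

# subset: only look for missing values in the listed columns
print(df1.dropna(subset=["Age"]))    # keeps rows 0, 3

# thresh: keep rows with at least this many non-missing values
print(df1.dropna(thresh=2))          # keeps rows 0, 1, 3

# None of these calls changed df1 itself
print(len(df1))                      # 5
```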

Program Dropping rows that contain NaN values


Name demo7.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.dropna()

print(df2)

Output


Program Dropping rows that contain NaN values and counting the remaining NaN values


Name demo8.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.dropna()
s = df2.isna().sum()

print(s)

Output


Program Converting float columns to the int data type


Name demo9.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv('fruits1.csv')
df2 = df1.dropna()
df3 = df2.astype(int)

print(df2.head())
print()
print(df3.head())

Output


7. dropna(inplace = True) method – Handling missing values

• dropna(inplace = True) calls the dropna() method with the inplace parameter set to True.
• We call this method on a DataFrame object.
• It drops the rows that contain missing values directly from the existing DataFrame instead of returning a new one; the call itself returns None.
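
A minimal sketch of the inplace behaviour on a small hypothetical DataFrame. Because the call returns None, writing df1 = df1.dropna(inplace = True) would replace the DataFrame with None; reassigning the result of a plain dropna() call gives the same data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Age": [26, np.nan, 45], "Salary": [40000, 20000, np.nan]})

# With inplace=True the rows are removed from df1 itself
# and the method returns None
result = df1.dropna(inplace=True)
print(result)          # None
print(df1)             # only the first row remains

# Equivalent non-inplace form: reassign the returned copy
df2 = pd.DataFrame({"Age": [26, np.nan, 45], "Salary": [40000, 20000, np.nan]})
df2 = df2.dropna()
print(df1.equals(df2))   # True
```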

Program Dropping NaN values by using inplace parameter


Name demo10.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df1.dropna(inplace = True)

print(df1)

Output


8. fillna() method – Handling missing values

• fillna() is a predefined DataFrame method.
• We call this method on a DataFrame object.
• It fills missing (NaN) values with a specified value and returns a new DataFrame.
  o fillna(0) -> fills NaN values with zero
  o fillna(number) -> fills NaN values with the given number
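
fillna() is not limited to a single number. In this sketch (the data is hypothetical) a dict supplies a different fill value per column, and a Series of column means fills each numeric column with its own average, extending the approach of demo13.py to several columns at once.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Rajan", None, "Veeru"],
    "Age": [26, np.nan, 45],
    "Salary": [40000, 20000, np.nan],
})

# One value for every column: all missing entries become 0
print(df1.fillna(0))

# A dict fills each column with its own value;
# any column left out of the dict would keep its missing values
filled = df1.fillna({"Name": "Unknown", "Age": 22, "Salary": 0})
print(filled)

# A Series of column means fills each numeric column with its own mean;
# Name has no entry in the Series, so its None stays unfilled
means = df1.mean(numeric_only=True)
print(df1.fillna(means))
```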

Program Filling NaN values with zero


Name demo11.py
Input file fruits1.csv

import pandas as pd

df1 = pd.read_csv("fruits1.csv")
df2 = df1.fillna(0)

print(df1.head())
print()
print(df2.head())

Output


Program Filling NaN values with a specific value


Name demo12.py

import pandas as pd
import numpy as np

data = [
["Rajan", 26, 40000],
["Daniel", 16, 20000],
["Veeru", 45, 90000],
["Venkat", np.nan, 45000],
["Sumanth", 20, 95000],
["Shafi", np.nan, 97000]
]

df1 = pd.DataFrame(data, columns = ['Name', 'Age', 'Salary'])


df2 = df1.fillna(22)

print(df1)
print()
print(df2)

Output


Program Filling NaN values with the column mean


Name demo13.py

import pandas as pd
import numpy as np

data = [
["Shahid", 26, 40000],
["Daniel", 16, 20000],
["Karteek", np.nan, 90000],
["Venkat", np.nan, 45000],
["Veeru", 24, 95000],
["Shafi", np.nan, 97000]
]

df1 = pd.DataFrame(data, columns = ['Name', 'Age', 'Salary'])

print(df1)
m = df1['Age'].mean()
df1['Age'] = df1['Age'].fillna(m)
print()
print(df1)

Output


Program Creating a DataFrame and replacing NaN values with a specific value


Name demo14.py

import pandas as pd
import numpy as np

data = [
['Shahid', np.nan, 40000],
['Daniel', 16, 20000],
['Veeru', 45, 90000],
['Sumanth', 20, 95000]
]

df1 = pd.DataFrame(data, columns = ['Name', 'Age', 'Salary'])

print(df1)

df2 = df1.replace(np.nan, 0)

print()
print(df2)

Output
