Working with Missing Data in Pandas
Last Updated: 02 Jun, 2025
In Pandas, missing data refers to values that are absent from a dataset, for example because they were never collected or were recorded incorrectly. Missing values are represented as:
- None: A Python object that typically marks missing values in object-type arrays.
- NaN: A special floating-point value (available as numpy.nan) defined by the IEEE 754 floating-point standard.
In this article we see how to detect, handle and fill missing values in a DataFrame to keep the data clean and ready for analysis.
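The two markers described above behave slightly differently in practice. The following is a minimal sketch, with made-up values, of how Pandas treats None and NaN and why equality checks are not a reliable way to find them:

```python
import numpy as np
import pandas as pd

# None in a numeric Series is converted to NaN, and the dtype becomes float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)

# In an object Series, None is stored as None.
text = pd.Series(["a", None, "c"], dtype=object)
print(text[1] is None)

# NaN never compares equal to itself, so use isna()/isnull() instead of ==.
print(np.nan == np.nan)
print(numeric.isna().tolist())
print(text.isna().tolist())
```

Both markers are reported as missing by isna() and isnull(), so the rest of the article treats them the same way.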
Checking Missing Values in Pandas
Pandas provides two functions, isnull() and notnull(), that detect missing values in a DataFrame or Series, which makes data cleaning and preprocessing easier:
1. Using isnull()
isnull() returns an object of the same shape as its input, filled with Boolean values where True marks a missing value (None or NaN). It is a convenient first step when we want to find missing data before filling or dropping it.
Example 1: Finding Missing Values in a DataFrame
We will use the NumPy and Pandas libraries for this example.
Python
import pandas as pd
import numpy as np
d = {'First Score': [100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)
mv = df.isnull()
print(mv)
Output
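A Boolean mask is often more useful once it is summarised. As a short sketch built on the same data, summing the mask counts the missing values per column, and any(axis=1) selects the rows that contain at least one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# True counts as 1, so summing the mask counts missing values per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

# Rows in which at least one column is missing.
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print("Total missing:", total_missing)
print("Rows with a missing value:", rows_with_missing.index.tolist())
```

Each column here has exactly one missing value, and rows 0, 2 and 3 each contain one.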
Example 2: Filtering Data Based on Missing Values
This example uses a sample employee dataset stored in employees.csv. isnull() is applied to the "Gender" column to build a Boolean mask, which is then used to filter and print the rows where the gender is missing.
Python
import pandas as pd
d = pd.read_csv("/content/employees.csv")
bool_series = pd.isnull(d["Gender"])
missing_gender_data = d[bool_series]
print(missing_gender_data)
Output
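If the CSV file is not at hand, the same filtering pattern can be tried on a small in-memory frame. This is only a sketch: the names and values below are hypothetical stand-ins, and only the "Gender" column name comes from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (values invented for illustration).
employees = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dara"],
    "Gender": ["Female", np.nan, "Male", None],
})

# Boolean mask: True where Gender is missing (NaN or None).
mask = employees["Gender"].isnull()
missing_gender = employees[mask]
print(missing_gender)
```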

2. Checking for Non-Missing Values Using notnull()
notnull() returns an object of the same shape as its input, filled with Boolean values where True indicates non-missing (valid) data. It is useful when we want to focus only on the rows that have valid values.
Example 1: Identifying Non-Missing Values in a DataFrame
Python
import pandas as pd
import numpy as np
d = {'First Score': [100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)
nmv = df.notnull()
print(nmv)
Output
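Since notnull() is the element-wise opposite of isnull(), the two masks can be checked against each other. A small sketch on the same data, which also counts the valid entries per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# notnull() is the negation of isnull().
masks_agree = df.notnull().equals(~df.isnull())
print(masks_agree)

# Summing the mask counts valid entries; count() reports the same numbers.
valid_per_column = df.notnull().sum()
print(valid_per_column)
print(df.count())
```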

Example 2: Filtering Data with Non-Missing Values
notnull() is applied to the "Gender" column to filter and print the rows where the gender is present (not missing).
Python
import pandas as pd
d = pd.read_csv("/content/employees.csv")
nmg = pd.notnull(d["Gender"])
nmgd = d[nmg]
print(nmgd)
Output

Filling Missing Values in Pandas
The following functions replace missing values with a specified value or estimate them by interpolation.
1. Using fillna()
fillna() replaces missing values (NaN) with a given value. Let's look at several examples.
Example 1: Fill Missing Values with Zero
Python
import pandas as pd
import numpy as np
d = {'First Score': [100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)
df.fillna(0)
Output
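Zero is not always a sensible replacement. As a sketch of two common alternatives, fillna() also accepts a Series of per-column values (such as the column means) or a dictionary mapping column names to fill values; the specific fill values chosen below are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Fill each column with its own mean (NaN is skipped when computing the mean).
filled_mean = df.fillna(df.mean())

# Fill only selected columns, each with its own value.
filled_custom = df.fillna({'First Score': 0, 'Third Score': -1})

print(filled_mean)
print(filled_custom)
```

fillna() returns a new DataFrame, so df itself still contains its original missing values after these calls.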

Example 2: Fill with Previous Value (Forward Fill)
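Forward fill replaces each missing value with the last valid value above it in the same column. Use ffill() for this; the older fillna(method='pad') form is deprecated in recent pandas releases.
Python
df.ffill()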
Output
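To make the effect of forward filling concrete, here is a self-contained sketch on the scores data used earlier. Note that a missing value in the first row has no previous value to copy, so it stays NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Propagate the last valid value downward within each column.
forward = df.ffill()
print(forward)
```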

Example 3: Fill with Next Value (Backward Fill)
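Backward fill replaces each missing value with the next valid value below it in the same column. Use bfill() for this; the older fillna(method='bfill') form is deprecated in recent pandas releases.
Python
df.bfill()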
Output
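Backward filling mirrors forward filling: a missing value in the last row has no following value to copy and stays NaN. The sketch below, on the same scores data, also chains the two directions, which is one possible way to cover both edges.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Propagate the next valid value upward within each column.
backward = df.bfill()
print(backward)

# Forward fill first, then backward fill whatever is still missing.
both = df.ffill().bfill()
print(both)
```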

Example 4: Fill NaN Values with 'No Gender'
Python
import pandas as pd
import numpy as np
d = pd.read_csv("/content/employees.csv")
d[10:25]
Output

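Now we are going to fill all the null values in the Gender column with "No Gender". Assigning the result back to the column avoids calling fillna() with inplace=True on a column selection, which may not update the original DataFrame in recent pandas versions.
Python
d["Gender"] = d["Gender"].fillna('No Gender')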
d[10:25]
Output
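The same column-level fill can be tried without the CSV file. This sketch uses a hypothetical in-memory frame (the names and values are invented; only the "Gender" column name and the "No Gender" label come from the example above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv.
employees = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dara"],
    "Gender": ["Female", np.nan, "Male", None],
})

# Fill the missing entries and assign the result back to the column.
employees["Gender"] = employees["Gender"].fillna("No Gender")
print(employees)
```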

2. Using replace()
The replace() function can also replace NaN values with a specific value.
Example
Python
import pandas as pd
import numpy as np
data = pd.read_csv("/content/employees.csv")
data[10:25]
Output

Now we are going to replace all NaN values in the DataFrame with -99.
Python
data.replace(to_replace=np.nan, value=-99)
Output
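replace() works in both directions, which helps when a dataset uses a sentinel number for missing data. A minimal sketch on hypothetical numeric columns (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with gaps.
data = pd.DataFrame({"Salary": [50000.0, np.nan, 62000.0],
                     "Bonus %": [np.nan, 4.5, 6.0]})

# NaN -> sentinel value.
replaced = data.replace(to_replace=np.nan, value=-99)
print(replaced)

# Sentinel value -> NaN, so Pandas treats it as missing again.
restored = replaced.replace(-99, np.nan)
print(restored)
```

A sentinel such as -99 takes part in calculations like mean() as an ordinary number, so converting it back to NaN before analysis is usually the safer choice.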

3. Using interpolate()
The interpolate() function fills missing values using interpolation techniques such as the linear method.
Example
Python
import pandas as pd
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
"B": [None, 2, 54, 3, None],
"C": [20, 16, None, 3, 8],
"D": [14, 3, None, None, 6]})
print(df)
Output

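Let's interpolate the missing values using the linear method. This method ignores the index and treats the values as equally spaced. With limit_direction='forward', a missing value at the start of a column (such as the first value of B) has no earlier point to interpolate from and remains NaN.
Python
df.interpolate(method='linear', limit_direction='forward')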
Output
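Linear interpolation can be verified by hand on this data. For example, the gap in A lies midway between 5 and 1, so it becomes 3.0, and the two gaps in D are spread evenly between 3 and 6. A self-contained sketch of that check:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

result = df.interpolate(method='linear', limit_direction='forward')
print(result)

# Values that were filled in, by position.
print(result.loc[3, "A"])           # midway between 5 and 1
print(result.loc[2, "C"])           # midway between 16 and 3
print(result.loc[[2, 3], "D"].tolist())
```

The leading NaN in B is left untouched, while the trailing NaN in B is filled with the last valid value.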

Dropping Missing Values in Pandas
The dropna() function removes rows or columns that contain NaN values. Its parameters control which rows or columns are dropped.
1. Dropping Rows with At Least One Null Value
Remove rows that contain at least one missing value.
Example
Python
import pandas as pd
import numpy as np
# Named "scores" so the built-in dict type is not shadowed.
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)
df.dropna()
Output
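Dropping every row that has any gap can discard a lot of data; in the frame above only one row survives. As a sketch of two gentler options, the subset parameter restricts the check to chosen columns, and thresh keeps rows that have at least a given number of non-missing values. The column choice and threshold below are illustrative.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

# Default: drop rows with at least one missing value.
complete_rows = df.dropna()

# Only consider the first two columns when deciding what to drop.
by_subset = df.dropna(subset=['First Score', 'Second Score'])

# Keep rows that have at least 3 non-missing values.
by_thresh = df.dropna(thresh=3)

print(complete_rows.index.tolist())
print(by_subset.index.tolist())
print(by_thresh.index.tolist())
```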
2. Dropping Rows with All Null Values
We can drop rows where all values are missing using dropna(how='all').
Example
Python
# Named "scores" so the built-in dict type is not shadowed.
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)
df.dropna(how='all')
Output
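With how='all', a row is removed only when every value in it is missing. A self-contained sketch on the same data, where only the second row (index 1) is entirely empty:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

# Rows that are partly filled are kept; only fully empty rows are dropped.
kept = df.dropna(how='all')
print(kept.index.tolist())
print(len(df) - len(kept), "row(s) dropped")
```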

3. Dropping Columns with At Least One Null Value
To remove columns that contain at least one missing value we use dropna(axis=1).
Example
Python
# Named "scores" so the built-in dict type is not shadowed.
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(scores)
df.dropna(axis=1)
Output
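Column-wise dropping follows the same rules as row-wise dropping. In this sketch on the same data, axis=1 alone keeps only the fully populated column, while adding thresh (an illustrative value of 3 here) keeps any column with at least that many non-missing values.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(scores)

# Drop every column that contains at least one missing value.
complete_columns = df.dropna(axis=1)

# Keep columns that have at least 3 non-missing values.
mostly_complete = df.dropna(axis=1, thresh=3)

print(complete_columns.columns.tolist())
print(mostly_complete.columns.tolist())
```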

4. Dropping Rows with Missing Values in CSV Files
When working with CSV files, we can drop rows with missing values using dropna().
Example
Python
import pandas as pd
d = pd.read_csv("/content/employees.csv")
nd = d.dropna(axis=0, how='any')
print("Old data frame length:", len(d))
print("New data frame length:", len(nd))
print("Rows with at least one missing value:", (len(d) - len(nd)))
Output:
Since the difference is 236, there were 236 rows with at least one null value in some column. Using these functions, we can easily detect, fill and drop missing values.