Pandas DataFrame duplicated() Method - Python
Last Updated: 07 Jun, 2025
Pandas is a widely used Python library for cleaning, analyzing and transforming data. One important part of cleaning data is identifying and handling duplicate rows, which can lead to incorrect results if left unchecked.
The duplicated() method in Pandas helps us find these duplicates quickly. It returns a Boolean Series that, by default, is True for every row that repeats an earlier row and False otherwise. This makes it a simple way to clean up a dataset before analysis. In this article, we'll see how the duplicated() method works with easy examples.
Let's see an example:
Python
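import pandas as pd
# Rows 0 and 2 hold identical values in every column
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 25, 37]
})
# duplicated() returns a Boolean mask; indexing with it keeps only the flagged rows
duplicates = df[df.duplicated()]
print(duplicates)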
Output:
    Name  Age
2  Alice   25
Syntax
DataFrame.duplicated(subset=None, keep='first')
Parameters:
1. subset: (Optional) A column label or list of column labels to consider when identifying duplicates. By default, all columns are used.
2. keep: Determines which duplicates to mark as True:
- 'first' (default): Marks every occurrence except the first as True.
- 'last': Marks every occurrence except the last as True.
- False: Marks all occurrences of duplicated rows as True.
Returns: A Boolean Series with the same index as the DataFrame, where True means the row is marked as a duplicate under the chosen keep rule and False means it is not.
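To make the difference between the keep values concrete, here is a minimal sketch that applies each of them to the same small DataFrame and places the masks side by side. The names and ages are made up for illustration, and subset=["Name"] restricts the comparison to the Name column only.

```python
import pandas as pd

# Illustrative data: "Alice" appears three times, "Bob" twice
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Alice", "Bob"],
    "Age": [25, 32, 26, 37, 25, 32],
})

# Compare only the Name column, once per keep option
masks = pd.DataFrame({
    "keep_first": df.duplicated(subset=["Name"], keep="first"),
    "keep_last": df.duplicated(subset=["Name"], keep="last"),
    "keep_False": df.duplicated(subset=["Name"], keep=False),
})
print(pd.concat([df, masks], axis=1))

# Without subset, whole rows are compared: only rows 4 and 5 repeat an earlier row
print(df.duplicated().tolist())  # [False, False, False, False, True, True]
```

Note that row 2 (Alice, 26) counts as a duplicate when only Name is compared, but not when all columns are compared, because its Age differs.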
Let's look at some more examples of using duplicated() to identify duplicate entries. Here we will use a custom dataset, employees.csv, which you can download by clicking here.
Example 1: Returning a Boolean Series
In this example we will identify duplicate values in the First Name column using the default keep='first'. The first occurrence of each name is marked False and every later occurrence is marked True.
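Calling duplicated() on data["First Name"] uses the Series version of the method. As a quick check on a small made-up table (the names and the Team column are hypothetical, not taken from employees.csv), this sketch shows that the column-level call produces the same mask as DataFrame.duplicated() with subset set to that column.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv
data = pd.DataFrame({
    "First Name": ["Maria", "Jerry", "Maria", "Dennis", "Jerry"],
    "Team": ["Finance", "Legal", "Sales", "Legal", "Finance"],
})

# Series.duplicated() on a single column...
col_mask = data["First Name"].duplicated()
# ...flags the same rows as DataFrame.duplicated() restricted to that column
frame_mask = data.duplicated(subset=["First Name"])
print(col_mask.tolist() == frame_mask.tolist())  # True

# Rows whose first name already appeared earlier
print(data[col_mask])
```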
Python
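import pandas as pd
data = pd.read_csv("/content/employees.csv")
# Sort by first name so repeated names sit next to each other
data.sort_values("First Name", inplace=True)
# True for every repeat of a first name after its first appearance
bool_series = data["First Name"].duplicated()
# Preview the sorted data, then show only the rows flagged as duplicates
print(data.head())
print(data[bool_series])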
Example 2: Removing duplicates
In this example we'll remove every row whose First Name is repeated. Setting keep=False marks all occurrences of a repeated name as True, so inverting the mask with ~ keeps only the names that appear exactly once.
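A minimal sketch of the same keep=False pattern on a small, made-up table (the names and salaries are hypothetical). It also shows that DataFrame.drop_duplicates(), a related Pandas method that accepts the same subset and keep parameters, removes exactly the same rows in one call.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv
data = pd.DataFrame({
    "First Name": ["Maria", "Jerry", "Maria", "Dennis", "Jerry", "Lois"],
    "Salary": [70000, 52000, 91000, 64000, 48000, 60000],
})

# keep=False flags every row whose first name occurs more than once
mask = data["First Name"].duplicated(keep=False)
# ~ inverts the mask: only Dennis and Lois survive
unique_only = data[~mask]
print(unique_only)

# drop_duplicates with the same settings removes the same rows
same = data.drop_duplicates(subset=["First Name"], keep=False)
print(unique_only.equals(same))  # True
```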
Python
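import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
# keep=False marks every occurrence of a repeated first name as True
bool_series = data["First Name"].duplicated(keep=False)
print(bool_series)
# ~ inverts the mask, so only rows with a unique first name remain
data = data[~bool_series]
data.info()
print(data)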
Example 3: Keeping the Last Occurrence of Duplicates
In this example, we will keep the last occurrence of each first name and mark the earlier occurrences as duplicates. This is done using the keep='last' argument.
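Because keep='last' keeps whichever row comes last in the current order, the sort decides which row survives. The sketch below uses a small made-up table (the Salary values are hypothetical) and sorts by a second column as well, so that keep='last' retains the highest-salary row for each first name.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv with repeated first names
data = pd.DataFrame({
    "First Name": ["Maria", "Jerry", "Maria", "Jerry", "Lois"],
    "Salary": [70000, 52000, 91000, 48000, 60000],
})

# Sort by name, then salary, so each name's highest salary comes last
ordered = data.sort_values(["First Name", "Salary"])
# keep='last' leaves only the final row of each name unmarked
top_paid = ordered[~ordered["First Name"].duplicated(keep="last")]
print(top_paid)
```

Sorting on Salary in descending order instead would make the same keep='last' call retain the lowest salary per name.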
Python
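import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
# keep='last' marks every occurrence except the last one as True
bool_series_last = data["First Name"].duplicated(keep='last')
# Keep the unmarked rows: the last row for each first name
data_last = data[~bool_series_last]
data_last.info()
print(data_last)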
Mastering the duplicated() method helps us clean data effectively: it flags repeated entries so that we can keep only the unique, meaningful rows for analysis.