Pandas DataFrame duplicated() Method - Python

Last Updated : 26 Jul, 2025

The duplicated() method in Pandas finds duplicate rows in a DataFrame quickly. It returns True for rows that repeat another row and False for the rest. It is typically used to clean a dataset before analysis. In this article, we'll see how the duplicated() method works with some examples.

Let's see an example:

import pandas as pd
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 25, 37]
})
duplicates = df[df.duplicated()]
print(duplicates)

 

Output:

    Name  Age
2  Alice   25

Syntax:

DataFrame.duplicated(subset=None, keep='first')

Parameters:

1. subset: (Optional) Specifies which columns to check for duplicates. By default, it checks all columns.

2. keep: Determines which occurrences of a duplicate are marked as True:

'first' (default): Marks every duplicate except the first occurrence as True.
'last': Marks every duplicate except the last occurrence as True.
False: Marks all occurrences of duplicates as True.

Returns: A Boolean Series with one value per row: True if the row is marked as a duplicate under the chosen keep setting, False otherwise.
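To make the parameters concrete, here is a minimal sketch that applies each keep option, plus subset, to the small DataFrame from the introductory example. The variable names (first_mask, last_mask and so on) are chosen here for illustration only.

```python
import pandas as pd

# Same small DataFrame as in the introductory example.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 25, 37]
})

# keep='first' (default): only the later copy (row 2) is flagged.
first_mask = df.duplicated(keep='first')

# keep='last': only the earlier copy (row 0) is flagged.
last_mask = df.duplicated(keep='last')

# keep=False: every row that has a duplicate is flagged (rows 0 and 2).
all_mask = df.duplicated(keep=False)

# subset restricts the comparison to the listed columns.
name_mask = df.duplicated(subset=['Name'])

print(first_mask.tolist())  # [False, False, True, False]
print(last_mask.tolist())   # [True, False, False, False]
print(all_mask.tolist())    # [True, False, True, False]
print(name_mask.tolist())   # [False, False, True, False]

# Because the result is Boolean, summing it counts the flagged rows.
print(first_mask.sum())     # 1
```

Since the returned Series shares the DataFrame's index, it can be used directly as a filter, as in df[first_mask].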

Let's look at some examples of the duplicated() method in the Pandas library, which is used to identify duplicated rows in a DataFrame. Here we will be using a custom dataset.

You can download the dataset from Here.

Example 1: Returning a Boolean Series

In this example we will identify duplicate values in the First Name column using the default keep='first' parameter. The first occurrence of each name is treated as the original (False) and every later occurrence is marked as a duplicate (True).

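import pandas as pd

# Load the dataset and sort it so that repeated names are adjacent
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)

# True for every repeat of a name after its first occurrence
bool_series = data["First Name"].duplicated()

# Preview the sorted data, then show only the rows flagged as duplicates
print(data.head())
print(data[bool_series])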

Output:
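For readers without the employees.csv file at hand, the same steps can be tried on a small in-memory DataFrame. This is only a sketch: the names and the "Team" column below are hypothetical stand-ins, not values taken from the actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv; the rows and the "Team"
# column are assumptions made for illustration only.
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Thomas", "Douglas", "Angela"],
    "Team": ["Finance", "Marketing", "Sales", "Product", "Legal", "Finance"],
})

# Sort so that repeated names sit next to each other. A stable sort
# keeps rows with the same name in their original relative order.
data = data.sort_values("First Name", kind="stable")

# True for every repeat of a name after its first appearance.
bool_series = data["First Name"].duplicated()

print(bool_series)
print(data[bool_series])
```

With this sample data, the second "Douglas" (index 4) and the second "Maria" (index 2) are the rows flagged as duplicates.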

Example 2: Removing duplicates

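In this example we'll remove every row whose First Name is duplicated. Setting keep=False marks all occurrences of a duplicate as True, so filtering with the inverted mask (~bool_series) leaves only the names that appear exactly once.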

import pandas as pd

data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)

# Flag every row whose first name appears more than once
bool_series = data["First Name"].duplicated(keep=False)
print(bool_series)

# Keep only the rows that were not flagged
data = data[~bool_series]
data.info()
print(data)

Output:
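As a self-contained sketch of the same idea, the snippet below uses a small hypothetical DataFrame (the rows are invented, not taken from employees.csv). It also compares the result with drop_duplicates(), which should give the same rows when called with the same subset and keep=False.

```python
import pandas as pd

# Hypothetical data standing in for employees.csv.
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Thomas", "Angela"],
    "Team": ["Finance", "Marketing", "Sales", "Product", "Finance"],
})

# Flag every row whose first name appears more than once.
mask = data["First Name"].duplicated(keep=False)

# Inverting the mask keeps only names that occur exactly once.
unique_only = data[~mask]

# drop_duplicates with the same subset and keep=False selects the same rows.
same_result = data.drop_duplicates(subset=["First Name"], keep=False)

print(unique_only)
print(unique_only.equals(same_result))
```

The mask-based form is handy when the flagged rows themselves need inspecting first; drop_duplicates() is shorter when only the cleaned result matters.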

Example 3: Keeping the Last Occurrence of Duplicates

In this example, we will keep the last occurrence of each duplicate and mark the earlier ones as duplicates. This is done using the keep='last' argument.

import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
bool_series_last = data["First Name"].duplicated(keep='last')
data_last = data[~bool_series_last]
data_last.info()
print(data_last)

 

Output:
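The effect of keep='last' can be checked on a small hypothetical DataFrame as well. The "Salary" column and all values below are invented for illustration, and the reset_index() step is an optional extra that is not part of the original example.

```python
import pandas as pd

# Hypothetical data; the "Salary" column and its values are assumptions.
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Douglas", "Angela"],
    "Salary": [50000, 61000, 52000, 64000, 58000],
})

# Flag every occurrence of a name except its last one.
mask_last = data["First Name"].duplicated(keep="last")

# Keep the unflagged rows: the last entry for each name.
data_last = data[~mask_last]
print(data_last)

# The surviving rows keep their original index labels (2, 3, 4 here);
# reset_index(drop=True) renumbers them from 0 if that is preferred.
print(data_last.reset_index(drop=True))
```

Here the later "Maria" and "Douglas" rows survive, so the salaries retained are the ones from their last entries.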

Kartikaybhutani
