Delete duplicates in a Pandas Dataframe based on two columns

Last Updated: 21 Mar, 2024
Author: rohanchopra96

A dataframe is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). It can contain duplicate entries, and there are several ways to delete them. The dataframe used here (loaded from super.csv) contains rows that repeat the same values in the columns order_id and customer_id. Below are two methods to remove duplicate rows from a dataframe based on two columns.

Method 1: using drop_duplicates()

Approach:
- Drop duplicate rows based on two columns.
- Let those columns be 'order_id' and 'customer_id'.
- Keep only the last occurrence of each pair. This is the latest entry only if the rows are ordered from oldest to newest.
- Reset the index of the dataframe.

Below is the Python code for the above approach.

```python
# import pandas library
import pandas as pd

# load data
df1 = pd.read_csv("super.csv")

# drop rows which have the same order_id and customer_id,
# keeping the last occurrence of each pair, then renumber the index
newdf = df1.drop_duplicates(
    subset=['order_id', 'customer_id'],
    keep='last').reset_index(drop=True)

# print the de-duplicated dataframe
# (display(newdf) also works inside a Jupyter notebook)
print(newdf)
```

Method 2: using groupby()

Approach:
- Group rows based on two columns.
- Let those columns be 'order_id' and 'customer_id'.
- Keep the first entry of each group only.

The Python code for the above approach is given below.

```python
# import pandas library
import pandas as pd

# read data
df1 = pd.read_csv("super.csv")

# group data over columns 'order_id' and 'customer_id'
# and keep the first entry of each group;
# as_index=False keeps the two key columns as ordinary columns
newdf1 = df1.groupby(['order_id', 'customer_id'], as_index=False).first()

# print new dataframe
print(newdf1)
```

Note that first() returns the first non-null value of each column within a group, so when a group contains missing values the result can combine values from different rows. groupby() also sorts the result by the group keys and, by default, drops rows whose key is missing. When whole rows must be preserved exactly, drop_duplicates() is the safer choice.
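The difference between the two methods is easiest to see on a small in-memory dataframe. The following is a minimal sketch, not the article's dataset: the sample rows and the amount and status columns are invented for illustration, since the contents of super.csv are not shown. It assumes only that pandas is installed.

```python
import pandas as pd

# Hypothetical sample data standing in for super.csv (values are made up).
df = pd.DataFrame({
    "order_id":    [101, 101, 102, 103, 103],
    "customer_id": ["C1", "C1", "C2", "C3", "C4"],
    "amount":      [None, 12.5, 7.0, 3.0, 4.0],
    "status":      ["new", "updated", "new", "new", "new"],
})

keys = ["order_id", "customer_id"]

# Method 1: keep the last row of each (order_id, customer_id) pair.
# Rows 3 and 4 share order_id 103 but differ in customer_id,
# so neither of them counts as a duplicate.
last_rows = df.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)

# Same method, but keeping the first row of each pair instead.
first_kept = df.drop_duplicates(subset=keys, keep="first").reset_index(drop=True)

# Method 2: group on the two columns and take the first entry per column.
# first() skips missing values, so the (101, "C1") group takes
# status from row 0 but amount from row 1.
first_rows = df.groupby(keys, as_index=False).first()

print(last_rows)
print(first_kept)
print(first_rows)
```

In this sketch, drop_duplicates() always returns complete original rows, while groupby().first() fills the missing amount of the first (101, C1) row with the value from the second one. Which behavior is preferable depends on whether the goal is to pick one existing row or to merge the available values of a group.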