
Pandas DataFrame drop() Method

Last Updated : 04 Dec, 2024

The drop() method in Pandas removes specified labels from the rows or columns of a DataFrame. By default it returns a new DataFrame and leaves the original unchanged; with inplace=True it modifies the original instead. Because drop() works on both rows and columns, you have to say which axis you mean: the default, axis=0, drops rows, and axis=1 drops columns. The index and columns keyword arguments are an alternative to passing axis. For example, consider a sample DataFrame from which we drop a column and then a row:

Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2],'B': [4, 5],'C': [7, 8]})
print("Original DataFrame:\n", df)

df_dropped_col = df.drop(columns=['B']) # Dropping a column
print("\nDataFrame after dropping column 'B': \n", df_dropped_col)

df_dropped_row = df.drop(index=1) # Dropping a row
print("\nDataFrame after dropping row with index 1:\n", df_dropped_row)

Output
Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8

DataFrame after dropping column 'B':
    A  C
0  1  7
1  2  8

DataFrame after dropping row with index 1:
    A  B  C
0  1  4  7

Understanding the Syntax and Parameters of drop() Method

The drop() method has a straightforward syntax that provides flexibility in specifying what to remove:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Here:

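  • labels: a single label or a list of labels to drop, interpreted as row or column labels according to axis
  • axis: 0 (or 'index') drops rows and 1 (or 'columns') drops columns
  • index: row labels to drop; index=labels is equivalent to labels, axis=0
  • columns: column labels to drop; columns=labels is equivalent to labels, axis=1
  • level: for a MultiIndex, the level (given by position or name) from which the labels are removed
  • inplace: if True, the original DataFrame is modified and None is returned; if False (the default), a new DataFrame is returned
  • errors: either 'raise' (the default) or 'ignore'. With 'ignore', labels that do not exist are skipped instead of raising a KeyError. This applies to row labels as well as column names.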
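The parameters above can be illustrated with a minimal sketch. The DataFrame and the missing column name 'Z' are made up for the example; it shows that labels with axis=1 and the columns keyword select the same thing, and how errors='ignore' changes the handling of a label that does not exist.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [4, 5], 'C': [7, 8]})

# labels + axis=1 and columns= are two spellings of the same operation
by_axis = df.drop(labels=['B'], axis=1)
by_keyword = df.drop(columns=['B'])
print(by_axis.equals(by_keyword))

# With the default errors='raise', a missing label raises KeyError
try:
    df.drop(columns=['Z'])
except KeyError as exc:
    print("KeyError:", exc)

# With errors='ignore', the missing label 'Z' is skipped and 'B' is still dropped
ignored = df.drop(columns=['B', 'Z'], errors='ignore')
print(list(ignored.columns))
```

Using errors='ignore' is convenient in pipelines where a column may or may not be present, but it can also hide typos in label names, so the default is usually the safer choice.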

How to Drop Rows Using Index Labels?

Dropping rows by index label is convenient when you know exactly which rows need to go, for example irrelevant or erroneous entries. Only the rows you name are removed; all other rows and all columns stay as they are.

Rows are identified by their index labels, so we pass drop() a single label or a list of the labels to remove.

Python
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_dropped = df.drop(0, axis=0)
print(df_dropped)

Output
   A  B  C
1  2  5  8
2  3  6  9

This code drops the row with index label 0 and returns the result as a new DataFrame. The original df is left unchanged because inplace defaults to False.
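To drop several rows at once, a list of index labels can be passed. The following is a small sketch using the same sample data; it also checks that the original DataFrame keeps all of its rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Pass a list of index labels to drop several rows in one call
df_dropped = df.drop([0, 2])
print(df_dropped)

# drop() returned a new object; the original still has all three rows
print(len(df), len(df_dropped))
```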

Dropping the Columns by Label

Here we drop columns by passing their names as the labels argument together with axis=1. labels accepts either a single label or a list of labels.

Python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

df_dropped = df.drop(['B','C'], axis=1)
print(df_dropped)

Output
   A
0  1
1  2
2  3
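When only one column has to go, a single label can be passed without a list. A brief sketch, assuming the same sample data; it also uses the string alias axis='columns', which pandas accepts in place of axis=1.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# A single label does not need to be wrapped in a list
without_b = df.drop('B', axis=1)

# axis='columns' is an alias for axis=1
without_c = df.drop('C', axis='columns')

print(list(without_b.columns))
print(list(without_c.columns))
```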

How to Drop Columns Using Column Names?

Consider a DataFrame with four columns from which we want to remove two. The columns keyword takes the list of column names to drop. The inplace parameter determines whether the change is applied directly to the original DataFrame (inplace=True) or a new, modified DataFrame is returned (inplace=False, the default).

Python
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'],'Age': [25, 30],'City': ['New York', 'Los Angeles'],'Country': ['USA', 'USA']})

# Drop the columns 'Age' and 'Country'
df.drop(columns=['Age', 'Country'],inplace=True)
print(df)

Output
    Name         City
0  Alice     New York
1    Bob  Los Angeles

As the output shows, the two columns Age and Country have been dropped from the DataFrame. Because inplace=True was used, the original DataFrame itself was modified.
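One consequence of inplace=True is that the call returns None, so its result should not be assigned back to the variable. A minimal sketch with a made-up two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With inplace=True the DataFrame is changed and the call returns None
result = df.drop(columns=['Age'], inplace=True)
print(result)
print(list(df.columns))

# Writing df = df.drop(..., inplace=True) would therefore replace df with None
```

Without inplace=True, the usual pattern is to assign the returned DataFrame, for example df = df.drop(columns=['Age']).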

Dropping Rows from a MultiIndex DataFrame

A MultiIndex DataFrame has more than one level of indexing and is used to represent hierarchical data. To drop one specific row, pass the full index tuple that identifies it, as in the example below. The level parameter, which takes a level position or level name, instead removes every row that carries a given label at that level.

Python
import pandas as pd
# Create a MultiIndex DataFrame
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['letter', 'number'])
df = pd.DataFrame({'X': [1, 2, 3, 4],'Y': [5, 6, 7, 8]}, index=index)

df_dropped = df.drop(('B', 'one'))
print("\nDataFrame after dropping ('B', 'one'):")
print(df_dropped)

Output
DataFrame after dropping ('B', 'one'):
               X  Y
letter number      
A      one     1  5
       two     2  6
B      two     4  8
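The level parameter can be sketched on the same kind of data. The example below rebuilds the MultiIndex DataFrame (using pd.MultiIndex.from_arrays for brevity) and drops by a label at a single level rather than by a full tuple.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=['letter', 'number'],
)
df = pd.DataFrame({'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}, index=index)

# Drop every row whose 'number' level equals 'one'
no_ones = df.drop('one', level='number')
print(no_ones)

# Drop every row under letter 'B'
only_a = df.drop('B', level='letter')
print(only_a)
```

Here no_ones keeps the rows ('A', 'two') and ('B', 'two'), while only_a keeps both rows under 'A'.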
