Pandas DataFrame drop() Method
Last Updated: 04 Dec, 2024
The drop() method in Pandas removes specified labels from the rows or columns of a DataFrame. It can either return a new DataFrame with the changes or, with inplace=True, modify the original DataFrame directly. Because drop() works on both rows and columns, the axis decides which one is affected: the default is axis=0, which removes rows, and axis=1 removes columns. The index and columns keywords can be used instead of axis. For example, consider a sample DataFrame and drop a column and a row from it:
Python
import pandas as pd
df = pd.DataFrame({'A': [1, 2],'B': [4, 5],'C': [7, 8]})
print("Original DataFrame:\n", df)
df_dropped_col = df.drop(columns=['B']) # Dropping a column
print("\nDataFrame after dropping column 'B': \n", df_dropped_col)
df_dropped_row = df.drop(index=1) # Dropping a row
print("\nDataFrame after dropping row with index 1:\n", df_dropped_row)
Output:
Original DataFrame:
   A  B  C
0  1  4  7
1  2  5  8

DataFrame after dropping column 'B':
   A  C
0  1  7
1  2  8

DataFrame after dropping row with index 1:
   A  B  C
0  1  4  7
Understanding the Syntax and Parameters of drop() Method
The drop() method has a straightforward syntax that provides flexibility in specifying what to remove:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Here:
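- labels is a single label or a list of row or column labels to drop; it is used together with axis.
- axis=0 (or 'index') targets rows and axis=1 (or 'columns') targets columns.
- index takes the row labels to drop; it is equivalent to passing labels with axis=0.
- columns takes the column labels to drop; it is equivalent to passing labels with axis=1.
- level is used with MultiIndex DataFrames and names the level (by position or name) from which the labels are removed.
- inplace=True modifies the original DataFrame and returns None; the default, inplace=False, returns a new DataFrame.
- errors can be either 'raise' (the default) or 'ignore'. It controls what happens when a label does not exist: 'raise' raises a KeyError, while 'ignore' skips the missing labels and drops only the ones that exist.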
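The effect of the errors parameter can be sketched as follows. This is a minimal illustration, and the column name 'Z' is a hypothetical label chosen only because it does not exist in the sample DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [4, 5], 'C': [7, 8]})

# 'Z' is a hypothetical column name that is not present in df.
# With the default errors='raise', asking to drop it raises a KeyError.
try:
    df.drop(columns=['B', 'Z'])
    raised = False
except KeyError:
    raised = True

# With errors='ignore', the missing label is skipped and 'B' is still dropped.
result = df.drop(columns=['B', 'Z'], errors='ignore')

print("KeyError raised with errors='raise':", raised)
print("Columns with errors='ignore':", list(result.columns))
```

Using errors='ignore' is convenient when the set of columns varies between input files, but it can also hide typos in label names, so the default is usually the safer choice.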
How to Drop Rows Using Index Labels?
Dropping rows by index label is useful when you know exactly which rows need to be removed: only the specified rows are removed and the rest of the DataFrame is left as it is. This helps in refining datasets by eliminating irrelevant or erroneous entries without altering the overall structure.
Rows of a DataFrame are generally accessed through their index values, so we pass a single index label or a list of index labels that are to be dropped.
Python
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_dropped = df.drop(0, axis=0)
print(df_dropped)
Output:
   A  B  C
1  2  5  8
2  3  6  9
This code drops the row with index label 0. Because inplace is left at its default of False, the result is returned as a new DataFrame and the original df is unchanged.
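To drop several rows at once, a list of index labels can be passed. The short sketch below reuses the same sample data and also checks that the original DataFrame keeps all of its rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop the rows labelled 0 and 2; axis=0 is the default, so it can be omitted.
df_dropped = df.drop([0, 2])

print(df_dropped)
print("Rows left in the original DataFrame:", len(df))
```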
Dropping the Columns by Label
Here we drop columns by passing their names to the labels parameter together with axis=1. Note that labels accepts either a single value or multiple values in a list.
Python
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
df_dropped = df.drop(['B','C'], axis=1)
print(df_dropped)
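Passing labels with axis=1 and passing the columns keyword are two ways of writing the same operation. A small sketch, using the same sample data, that compares the two results:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Two equivalent ways of dropping columns 'B' and 'C'.
by_axis = df.drop(['B', 'C'], axis=1)
by_keyword = df.drop(columns=['B', 'C'])

print(by_axis)
print("Same result:", by_axis.equals(by_keyword))
```

The columns keyword is often easier to read because the intent does not depend on remembering which axis number refers to columns.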
How to Drop Columns Using Column Names?
Let us consider a DataFrame with four columns from which we wish to remove two. The inplace parameter determines whether the change is applied directly to the original DataFrame (inplace=True) or a new, modified DataFrame is returned (inplace=False, the default).
Python
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'],'Age': [25, 30],'City': ['New York', 'Los Angeles'],'Country': ['USA', 'USA']})
# Drop the columns 'Age' and 'Country'
df.drop(columns=['Age', 'Country'],inplace=True)
print(df)
Output:
    Name         City
0  Alice     New York
1    Bob  Los Angeles
As we can see from the output, the two columns Age and Country have been dropped from the DataFrame. Because inplace=True was used, the original DataFrame itself was modified.
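One detail worth keeping in mind is that drop() returns None when inplace=True, so its result should not be assigned back to the variable. The sketch below uses a shortened version of the sample data to show this.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With inplace=True the DataFrame is modified directly and None is returned.
result = df.drop(columns=['Age'], inplace=True)

print(result)            # None
print(list(df.columns))  # ['Name']
```

Writing df = df.drop(..., inplace=True) would therefore replace the DataFrame with None.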
Dropping rows from a Multi index dataframe
Multi-index DataFrames are those that have more than one level of indexing and are used to handle hierarchical data. A single row can be removed by passing its full index tuple, as in the example below. Alternatively, the level parameter, which takes a level name or position, removes a label from one specific level of the hierarchy.
Python
import pandas as pd
# Create a MultiIndex DataFrame
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['letter', 'number'])
df = pd.DataFrame({'X': [1, 2, 3, 4],'Y': [5, 6, 7, 8]}, index=index)
df_dropped = df.drop(('B', 'one'))
print("\nDataFrame after dropping ('B', 'one'):")
print(df_dropped)
Output:
DataFrame after dropping ('B', 'one'):
               X  Y
letter number
A      one     1  5
       two     2  6
B      two     4  8
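The example above drops one row by its full index tuple. The level parameter can be sketched in a similar way: here the label 'one' is removed from the 'number' level, which drops every row carrying that label regardless of its 'letter' value. The sample data is the same as above.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
df = pd.DataFrame({'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}, index=index)

# Drop every row whose 'number' level equals 'one'.
df_no_one = df.drop('one', level='number')

print(df_no_one)
```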