When working with large datasets, there are often columns that are irrelevant or redundant. Pandas provides an efficient way to remove these unnecessary columns using the `drop()` function. In this article, we will cover various methods to drop columns from a DataFrame.
Python
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
df = df.drop('B', axis=1)
print(df)
Output:
A C
0 1 7
1 2 8
2 3 9
In this example, the DataFrame df now contains only columns 'A' and 'C'.
For the next examples, we load a sample dataset (data.csv) into a Pandas DataFrame and explore different ways to drop columns.
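The contents of data.csv are not reproduced here, so the following is only a minimal stand-in for running the Method 1 examples. The 'Age' and 'Gender' columns come from the examples below; the 'Name' and 'Salary' columns and every value are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: all values are invented for illustration.
# With the real file you would load it instead: df = pd.read_csv('data.csv')
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen', 'Dina', 'Eli', 'Fay'],
    'Age': [25, 32, 41, 29, 36, 23],
    'Gender': ['F', 'M', 'M', 'F', 'M', 'F'],
    'Salary': [50000, 62000, 58000, 71000, 66000, 48000],
})

# Show the first five rows
print(df.head())
```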
[Figure: Displaying the first five rows of the dataset]
Method 1: Dropping Columns by Name
The most common way to remove columns is by specifying the column name. The drop() function lets us remove one or more columns.
A) Call the drop() method with the column name to drop.
Python
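# Drop the 'Gender' column; the result goes into a new variable so that
# df keeps all its columns for the next example
df_without_gender = df.drop('Gender', axis=1)
print(df_without_gender)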
Output:
[Figure: Dropping by Names]
In this example, the Gender column is dropped by specifying its name. The axis=1 argument tells Pandas to drop a column (axis=0 refers to rows).
B) If we need to drop multiple columns, we can pass a list of column names to the drop() function.
Python
# Drop both 'Age' and 'Gender' columns
df = df.drop(['Age', 'Gender'], axis=1)
print(df)
Output:
[Figure: Dropping multiple columns]
This method removes both the Age and Gender columns from the DataFrame. By passing a list of column names, you can drop multiple columns at once.
For a more detailed explanation, refer to this article: How to drop one or multiple columns in pandas dataframe
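As a short sketch of two related options of drop(): the columns= keyword can be used instead of axis=1, and errors='ignore' skips labels that are not present. The small DataFrame here is a hypothetical stand-in, not the article's dataset.

```python
import pandas as pd

# Hypothetical stand-in data with the 'Age' and 'Gender' columns used above
df = pd.DataFrame({'Name': ['Asha', 'Ben'], 'Age': [25, 32], 'Gender': ['F', 'M']})

# columns= is equivalent to passing the labels together with axis=1
by_axis = df.drop(['Age', 'Gender'], axis=1)
by_keyword = df.drop(columns=['Age', 'Gender'])
print(by_axis.equals(by_keyword))  # True

# errors='ignore' skips labels that do not exist instead of raising KeyError
result = df.drop(columns=['Gender', 'NotAColumn'], errors='ignore')
print(list(result.columns))  # ['Name', 'Age']
```

The columns= form can be easier to read because it does not rely on remembering which axis number refers to columns.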
Method 2: Dropping Columns Inplace
If you want to modify the original DataFrame directly, you can use the inplace=True argument. The change is applied to the existing DataFrame, so you do not need to assign the result back to a variable.
- Use inplace=True to remove columns directly.
Python
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Drop column 'B' inplace
df.drop('B', axis=1, inplace=True)
print(df)
Output:
A C
0 1 7
1 2 8
2 3 9
Using inplace=True removes column 'B' from the DataFrame without assigning the result to a new variable. The change is applied directly to the original DataFrame.
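One detail worth keeping in mind is that drop() returns None when inplace=True, so writing df = df.drop(..., inplace=True) would replace the DataFrame with None. A minimal sketch using the same data as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# With inplace=True, drop() modifies df itself and returns None
returned = df.drop('B', axis=1, inplace=True)
print(returned)          # None
print(list(df.columns))  # ['A', 'C']
```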
Method 3: Dropping Columns Using del
Another way to drop a column from a DataFrame is by using the `del` statement. This method directly deletes the column from the DataFrame.
- Use del to remove a column.
Python
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
# Drop column 'C' using del
del df['C']
print(df)
Output:
A B
0 10 40
1 20 50
2 30 60
The del statement is a simple and direct way to remove a column from the DataFrame. It directly deletes the column, and no new DataFrame is created.
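If the removed column is still needed afterwards, DataFrame.pop() is a closely related alternative: like del, it removes a single column in place, but it also returns that column as a Series. This is a side note rather than part of the method described above.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# pop() removes column 'C' from df in place and returns it as a Series
removed = df.pop('C')
print(list(df.columns))  # ['A', 'B']
print(removed.tolist())  # [70, 80, 90]
```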
Method 4: Dropping Columns with Missing Values
You can also drop columns that contain missing values using the dropna() method. This is useful when you want to clean the dataset by removing columns with NaN values.
- Use dropna(axis=1) to remove columns with missing values.
Python
import pandas as pd
data = {'A': [10, None, 30], 'B': [None, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
# Drop columns with missing values
df.dropna(axis=1, inplace=True)
print(df)
This removes any column that contains at least one missing value (NaN). Here only column 'C' remains, because 'A' and 'B' each contain a missing value. You can also pass thresh=n to keep only the columns that have at least n non-null values.
For a more detailed explanation, refer to this article: Working with Missing Data in Pandas
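The thresh and how parameters of dropna() can be sketched as follows. The data is a variation of the example above in which column 'B' is made entirely empty, an assumption chosen to show the difference between the two options.

```python
import pandas as pd

# Variation of the example data: column 'B' is entirely missing
data = {'A': [10, None, 30], 'B': [None, None, None], 'C': [70, 80, 90]}
df = pd.DataFrame(data)

# how='all' drops only the columns in which every value is missing
print(list(df.dropna(axis=1, how='all').columns))  # ['A', 'C']

# thresh=3 keeps only the columns with at least 3 non-null values
print(list(df.dropna(axis=1, thresh=3).columns))   # ['C']
```

Neither call uses inplace=True, so df itself is left unchanged and each call returns a new DataFrame.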
Recommendation: For general column removal, using drop() is the most straightforward method. For cleaning data with missing values, dropna() is ideal. If you prefer to modify the original DataFrame without creating a new one, use inplace=True.