
Drop rows from Pandas dataframe with missing values or NaN in columns

Last Updated : 09 Apr, 2025

We are given a Pandas DataFrame that may contain missing values, also known as NaN (Not a Number), in one or more columns. Our task is to remove the rows that have these missing values to ensure cleaner and more accurate data for analysis. For example, if a row contains NaN in any specified column, that entire row should be dropped. Let’s explore some simple and efficient ways to achieve this.
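Before dropping anything, it can help to see which rows would be affected. The sketch below builds a boolean mask with isna() on the same sample DataFrame used in the examples that follow. The names has_nan and per_row are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data used throughout this article
df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})

# True for every row that contains at least one NaN
has_nan = df.isna().any(axis=1)

# Number of missing values in each row
per_row = df.isna().sum(axis=1)

print(has_nan)
print(per_row)
```

For this data, rows 0, 1 and 2 contain at least one NaN (1, 4 and 2 missing values respectively), and only row 3 is complete.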

Using dropna()

The dropna() method is the most direct and most commonly used way to remove missing values from a DataFrame. It drops rows or columns containing NaN values depending on the options you pass: dropna() removes rows with at least one NaN, dropna(how='all') removes rows where all values are NaN, and dropna(axis=1) removes columns that contain any NaN.

Python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})

print("Original:\n", df)

print("\nDrop rows with at least 1 NaN:\n", df.dropna())  
print("\nDrop rows where all values are NaN:\n", df.dropna(how='all'))
print("\nDrop columns with at least 1 NaN:\n", df.dropna(axis=1))  

Output:

[Output image: Using dropna()]

Explanation: First, df.dropna() removes any row that contains at least one NaN, so only completely filled rows remain. Next, df.dropna(how='all') removes only the rows where every value is NaN, keeping partially filled rows. Finally, df.dropna(axis=1) removes columns that contain any missing values, leaving only fully populated ones; in this sample every column has at least one NaN, so the result has no columns.
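dropna() also accepts a subset parameter, which restricts the NaN check to the listed columns (the "any specified column" case from the introduction), and a thresh parameter, which keeps only rows that have at least that many non-NaN values. This is a minimal sketch on the same sample data; the chosen columns and the threshold of 2 are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})

# subset: only NaNs in columns 'A' or 'B' cause a row to be dropped;
# NaNs in 'C' and 'D' are ignored
by_subset = df.dropna(subset=['A', 'B'])

# thresh: keep rows that have at least 2 non-NaN values
by_thresh = df.dropna(thresh=2)

print(by_subset)
print(by_thresh)
```

Here by_subset keeps rows 0 and 3, because rows 1 and 2 are missing a value in A. by_thresh keeps rows 0, 2 and 3, since row 1 has no values at all.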

Using notna()

The notna() method returns a boolean DataFrame of the same shape, where True marks a non-null value. Indexing with this mask selects the data you want to keep instead of calling dropna(): df[df.notna().all(axis=1)] keeps only rows with no NaNs, while df.loc[:, df.notna().all()] keeps only columns with no missing values. Because the mask is an ordinary boolean object, it can be combined with other conditions, which gives more control than a single dropna() call.

Python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})
print("Original:\n", df)

print("\nDrop rows with at least 1 NaN:\n", df[df.notna().all(axis=1)])
print("\nDrop rows where all values are NaN:\n", df[df.notna().any(axis=1)])
print("\nDrop columns with at least 1 NaN:\n", df.loc[:, df.notna().all()])

Output:

[Output image: Using notna()]

Explanation: First, df[df.notna().all(axis=1)] removes any row that contains at least one NaN, ensuring only completely filled rows remain. Next, df[df.notna().any(axis=1)] eliminates rows where all values are NaN, retaining partially filled rows. Finally, df.loc[:, df.notna().all()] removes columns that contain any missing values, leaving only those that are fully populated.
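Since the notna() mask is just a boolean Series once it is reduced across a row, it can be joined with a value condition in one filter. The sketch below assumes we want rows where both A and B are present and B is greater than 40; the condition is an arbitrary example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})

# True where both 'A' and 'B' hold a value
complete_ab = df[['A', 'B']].notna().all(axis=1)

# Combine with an ordinary value condition using & (element-wise AND)
res = df[complete_ab & (df['B'] > 40)]
print(res)
```

Only row 3 passes: row 0 has B equal to 30, and rows 1 and 2 are missing A.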

Using query()

The query() method filters rows using a boolean expression written as a string. Since NaN is not equal to itself (NaN != NaN), an expression such as "A == A and B == B" is False for any row where A or B is missing, so those rows are dropped. This approach is readable and works well for a few columns, but every column has to be named in the expression, which becomes unwieldy for wide DataFrames.

Python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})
print("Original:\n", df)

print("\nDrop rows with at least 1 NaN:\n", df.query("A == A and B == B and C == C and D == D"))

Output:

[Output image: Using query()]

Explanation: df.query("A == A and B == B and C == C and D == D") keeps only the rows where every column equals itself. Because NaN is not equal to itself (NaN != NaN), any row with at least one NaN fails the condition and is filtered out, so only rows with all valid (non-null) values are retained.
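Typing every column by hand does not scale, so one option is to build the query string from df.columns. This is a sketch of a manual workaround, not a built-in pandas feature; the backticks quote column names, which pandas query() supports so that names containing spaces would also work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})

# NaN is the only value that is not equal to itself
print(np.nan == np.nan)  # False

# Build "`A` == `A` and `B` == `B` and ..." from the column names
expr = " and ".join(f"`{col}` == `{col}`" for col in df.columns)
print(expr)

# Rows with a NaN in any column fail their own "col == col" test
res = df.query(expr)
print(res)
```

As with the hand-written expression, only row 3 survives, because it is the only row without a NaN.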

Using isna() with mask()

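The isna() method returns a boolean DataFrame that is True wherever a value is missing, and mask() replaces the values where a condition is True (with NaN by default, or with the value passed as other). Combining the two is a way to pre-process data before calling dropna(). In the example below, the condition is df.isna() and no replacement is given, so mask() replaces NaN with NaN and the data is unchanged; the pattern only becomes useful when the condition or the replacement value is changed. For plain NaN removal, calling dropna() directly is simpler and avoids creating an extra copy.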

Python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [100, np.nan, np.nan, 95],
    'B': [30, np.nan, 45, 56],
    'C': [52, np.nan, 80, 98],
    'D': [np.nan, np.nan, np.nan, 65]
})
print("Original:\n", df)

res = df.mask(df.isna())
print("\nDrop rows with at least 1 NaN:\n", res.dropna())
print("\nDrop columns with at least 1 NaN:\n", res.dropna(axis=1))

Output:

[Output image: Using isna() with mask()]

Explanation: df.mask(df.isna()) replaces every missing value with NaN, so res holds the same values as df. res.dropna() then removes rows with any NaN, keeping only complete ones, and res.dropna(axis=1) drops columns with any NaN, retaining only fully populated columns (none, for this sample).
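A case where mask() does change the data is a sentinel value used as a placeholder for missing data. The sketch below assumes a hypothetical dataset in which -1 means "missing"; mask() turns those entries into NaN, so isna() counts them and dropna() removes them together with the real NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical data where -1 was stored as a placeholder for "missing"
df = pd.DataFrame({
    'A': [100, -1, 90, 95],
    'B': [30, 45, np.nan, 56],
})

# mask() replaces values where the condition is True with NaN (its default)
cleaned = df.mask(df == -1)

# isna() now counts both the real NaN and the former sentinel
print(cleaned.isna().sum())

# Drop rows that contain any NaN
res = cleaned.dropna()
print(res)
```

Row 1 (the sentinel in A) and row 2 (the real NaN in B) are dropped, leaving rows 0 and 3.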


