Open In App

Select Rows From List of Values in Pandas DataFrame

Last Updated : 10 Dec, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Let's learn how to select rows from a list of values in Pandas DataFrame using isin() method.

Using isin() to Select Rows from a List of Values

The isin() function is one of the most commonly used methods for filtering data based on a list of values. Let’s walk through a simple example to illustrate this.

Python
import pandas as pd
data = {
    
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'IT', 'Finance']
}
df = pd.DataFrame(data)
departments = ['HR', 'Finance']
filtered_df = df[df['Department'].isin(departments)]
print(filtered_df)

Output
    Name Department
0  Alice         HR
3  David    Finance

In this example, the isin() function filters rows where the Department column matches 'HR' or 'Finance'. This is a simple and effective way to filter data in Pandas.

Using isin() to Filter Based on Multiple Conditions

You can also apply multiple conditions using isin() along with logical operators like & (AND) or | (OR). Let us consider a dataframe. Here we will filter out those rows using isin() that satisfy the column named 'Status' as 'Pending' or 'Completed'.

Python
import pandas as pd
data = {
    'OrderID': [1011, 1021, 1052],
    'Customer': ['Alice', 'Bob', 'Eve'],
    'Status': ['Completed', 'Pending', 'Canceled']
}
df = pd.DataFrame(data)
status = ['Completed','Pending']
filtered_df = df[df['Status'].isin(status)]
print(filtered_df)

Output
   OrderID Customer     Status
0     1011    Alice  Completed
1     1021      Bob    Pending

Using Negation with isin() to Exclude Rows

You can exclude rows from the DataFrame that match certain values using the negation operator ~ with isin(). Let us consider the following example:

Python
import pandas as pd

# Sample DataFrame
data = {
    'Item': ['Carrot', 'Banana', 'Broccoli', 'Orange'],
    'Category': ['Vegetable', 'Fruit', 'Vegetable', 'Fruit']
}
df = pd.DataFrame(data)

# Exclude fruits
filtered_df = df[~df['Category'].isin(['Fruit'])]

print(filtered_df)

Output
       Item   Category
0    Carrot  Vegetable
2  Broccoli  Vegetable

Using multiple isin() along with conditional operators

You can filter rows by using multiple isin() methods in combination with the AND (&) operator. This is especially helpful when you want to check multiple columns against different lists of values.

Python
import pandas as pd

# Sample DataFrame
data = {
    'Fruit': ['Apple', 'Carrot', 'Banana', 'Grapes'],
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit'],
}
df = pd.DataFrame(data)

# Conditional filtering with AND
filtered_df = df[df['Fruit'].isin(['Apple', 'Banana', 'Carrot']) & df['Category'].isin(['Fruit'])]

print(filtered_df)

Output
    Fruit Category
0   Apple    Fruit
2  Banana    Fruit

In this example, we apply the AND condition to ensure that only fruits from the ['Apple', 'Banana', 'Carrot'] list in the Fruit column are included, and that their Category is Fruit. The isin() checks if the value in each column belongs to the specified list, and the logical AND (&) ensures both conditions must be true for the row to be included.

Using query() for Advanced Filtering

The query() method in Pandas is similar to SQL queries. It allows you to filter rows using an in operator, making it ideal for filtering based on a list of values. Let's consider following example, where we filter rows based on specific grades.

Python
import pandas as pd

# Sample DataFrame with student data
data = {
    'Student': ['John', 'Emily', 'Sarah', 'Michael'],
    'Age': [18, 19, 17, 20],
    'Grade': ['A', 'B', 'A', 'C'],
}

df = pd.DataFrame(data)

# Select rows where 'Student' is in a tuple
filtered_df = df.query('Grade in ("A", "C")')

print(filtered_df)

Output
   Student  Age Grade
0     John   18     A
2    Sarah   17     A
3  Michael   20     C

In this example, query() filters the DataFrame to only include students with grades 'A' or 'C'. This approach is clean and efficient, especially when dealing with more complex queries involving multiple columns.

Using apply() and Lambda Functions for Complex Filters

For more complex filtering, apply() combined with lambda functions is a versatile option. This method is especially useful when you need to check conditions that cannot be easily captured by isin().

Python
import pandas as pd

# Sample DataFrame with employee data
data = {
    'Employee': ['John','Michael', 'David', 'Sophia'],
    'Department': ['HR', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 65000, 70000, 62000]
}
df = pd.DataFrame(data)

# Select rows where 'Department' is either 'HR' or 'Finance' using apply
filtered_df = df[df['Department'].apply(lambda x: x in ['HR', 'Finance'])]

print(filtered_df)

Output
  Employee Department  Salary
0     John         HR   50000
1  Michael         HR   65000
2    David    Finance   70000

In this case, apply() along with a lambda function checks if each value in the Department column matches either 'HR' or 'Finance'. This approach is particularly useful when the filtering condition is complex or involves custom logic that cannot be directly captured by simple methods like isin().

Conclusion

In this article, we explored several techniques for filtering rows in a Pandas DataFrame using a list of values. The most common and efficient approach is the isin() method, which works well for basic filtering. For more complex conditions, methods like query() and apply() with lambda functions provide greater flexibility.


Next Article

Similar Reads