Select rows that contain specific text using Pandas

Last Updated : 04 Apr, 2025

While preprocessing data in a pandas DataFrame, you may need to find the rows that contain specific text. This article shows several ways to select the rows whose column values contain a given string.

Dataset in use:

job      Age_Range    Salary   Credit-Rating  Savings  Buys_Hone
Own      Middle-aged  High     Fair           10000    Yes
Govt     Young        Low      Fair           15000    No
Private  Senior       Average  Excellent      20000    Yes
Own      Middle-aged  High     Fair           13000    No
Own      Young        Low      Excellent      17000    Yes
Private  Senior       Average  Fair           18000    No
Govt     Young        Average  Fair           11000    No
Private  Middle-aged  Low      Excellent      9000     No
Govt     Senior       High     Excellent      14000    Yes
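The examples in this article read the data from "Assignment.csv". If that file is not at hand, the same table can be built in memory. The sketch below simply copies the values from the table above; the variable name data is an arbitrary choice.

```python
import pandas as pd

# In-memory copy of the dataset table, so the examples can run without Assignment.csv
data = {
    "job": ["Own", "Govt", "Private", "Own", "Own",
            "Private", "Govt", "Private", "Govt"],
    "Age_Range": ["Middle-aged", "Young", "Senior", "Middle-aged", "Young",
                  "Senior", "Young", "Middle-aged", "Senior"],
    "Salary": ["High", "Low", "Average", "High", "Low",
               "Average", "Average", "Low", "High"],
    "Credit-Rating": ["Fair", "Fair", "Excellent", "Fair", "Excellent",
                      "Fair", "Fair", "Excellent", "Excellent"],
    "Savings": [10000, 15000, 20000, 13000, 17000,
                18000, 11000, 9000, 14000],
    "Buys_Hone": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "No", "Yes"],
}
df = pd.DataFrame(data)

print(df.shape)  # (9, 6)
```

With df built this way, the pd.read_csv() line in each example can be skipped.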

Using contains()

The str.contains() method filters rows by a substring or pattern. Here the rows are filtered on the ‘Credit-Rating’ column: the .str accessor exposes string methods on the column, and contains() tests each value against the pattern passed as its argument (interpreted as a regular expression by default). The result is a boolean Series that is used to select the matching rows. Example:

Python
import pandas as pd

# reading csv file
df = pd.read_csv("Assignment.csv")

# filtering the rows where Credit-Rating is Fair
df = df[df['Credit-Rating'].str.contains('Fair')]
print(df)

Output

Rows where Credit-Rating contains Fair

Explanation: This code reads data from a CSV file (“Assignment.csv”) into a Pandas DataFrame. It then filters the rows where the Credit-Rating column contains the string ‘Fair’ using the str.contains() method. This method checks if ‘Fair’ is present in the Credit-Rating column for each row. The filtered DataFrame is then printed, displaying only the rows with a ‘Fair’ credit rating.
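In practice a text column may have inconsistent casing or missing values, which the basic call does not handle. The following is a minimal sketch, assuming a small made-up DataFrame (the lowercase ‘fair’ and the missing value are not in the original dataset), of the case, na, and regex parameters of str.contains().

```python
import pandas as pd

# Hypothetical sample: mixed casing and a missing value, for illustration only
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Own"],
    "Credit-Rating": ["Fair", "fair", None, "Excellent"],
})

# case=False  -> ignore upper/lower case
# na=False    -> treat missing values as "no match" instead of propagating NaN
# regex=False -> treat the pattern as plain text, not a regular expression
mask = df["Credit-Rating"].str.contains("fair", case=False, na=False, regex=False)

fair_rows = df[mask]
print(fair_rows)
```

Passing na=False keeps the mask strictly boolean, so it can be used directly for row selection even when the column has gaps.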

Using itertuples()

itertuples() iterates over the rows, and the string method find() checks each row for the desired text. The itertuples() method returns an iterator that yields a named tuple for each row in the DataFrame. It is generally faster than the iterrows() method of pandas. Example:

Python
import pandas as pd

# reading csv file
df = pd.read_csv("Assignment.csv")

# filtering the rows where Age_Range contains Young
# (x[0] is the row index, so x[2] is the second column, Age_Range)
for x in df.itertuples():
    if x[2].find('Young') != -1:
        print(x)

Output

Rows with Age_Range as Young

Explanation: This code reads the CSV file “Assignment.csv” into a Pandas DataFrame, then filters and prints rows where the Age_Range column contains the string ‘Young’. It uses itertuples() to iterate over the rows and find() to check if ‘Young’ is present in the Age_Range column.
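The named tuples produced by itertuples() can also be read by field name, and the matching rows can be collected instead of printed. The sketch below assumes a reduced, in-memory version of the dataset; the names matches and young_df are illustrative.

```python
import pandas as pd

# Reduced sample with two of the dataset's columns
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Govt"],
    "Age_Range": ["Middle-aged", "Young", "Senior", "Young"],
})

# index=False drops the index from each tuple; fields are read by column name
matches = [row for row in df.itertuples(index=False) if "Young" in row.Age_Range]

# Rebuild a DataFrame from the collected named tuples
young_df = pd.DataFrame(matches)
print(young_df)
```

Attribute access only works for column names that are valid Python identifiers. A name such as ‘Credit-Rating’ is replaced with a positional name by itertuples(), so that column is easier to reach by position.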

Using iterrows()

iterrows() iterates over the rows, and the in operator checks each row for the desired text. The iterrows() function returns an iterator that yields each index value along with a Series containing the data in that row. It is slower than itertuples() because it builds a Series object for every row. Example:

Python
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("Assignment.csv")

# Filtering the rows where job is 'Govt'
for index, row in df.iterrows():
    if 'Govt' in row['job']:
        print(
            index,
            row['job'],
            row['Age_Range'],
            row['Salary'],
            row['Savings'],
            row['Credit-Rating']
        )

Output

Rows with job as Govt

Explanation: This code reads the CSV file “Assignment.csv” into a Pandas DataFrame and filters the rows where the job column contains ‘Govt’. It uses iterrows() to iterate over the rows, and for each row, it checks if ‘Govt’ is in the job column. If the condition is met, it prints the index, job, Age_Range, Salary, Savings, and Credit-Rating of that row.
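Printing inside the loop does not leave a filtered DataFrame to work with afterwards. One possible variation, sketched here on a reduced in-memory sample (the names govt_labels and govt_df are illustrative), is to collect the matching index labels and select them with df.loc.

```python
import pandas as pd

# Reduced sample with two of the dataset's columns
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Govt"],
    "Savings": [10000, 15000, 20000, 11000],
})

# Collect the index label of every row whose job contains 'Govt'
govt_labels = [index for index, row in df.iterrows() if "Govt" in row["job"]]

# Select those rows from the original DataFrame
govt_df = df.loc[govt_labels]
print(govt_df)
```

Selecting from the original DataFrame keeps each column's dtype intact, which a per-row Series does not guarantee.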

Using regular expressions

Regular expressions can also be used to find the rows with the desired text. search() is a function of the re module: re.search(pattern, string) is similar to re.match(), but it looks for a match anywhere in the string rather than only at the beginning. The code iterates over every row and tests the job value at each index against the pattern ‘Govt’ to select only the matching rows. Example:

Python
from re import search
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv("Assignment.csv")

# Iterate over the index and print relevant information for rows where job matches 'Govt'
for ind in df.index:
    job = df.loc[ind, 'job']
    if search('Govt', job):
        print(
            job,
            df.loc[ind, 'Savings'],
            df.loc[ind, 'Age_Range'],
            df.loc[ind, 'Credit-Rating']
        )

Output

Rows where job is Govt

Explanation: This code reads the “Assignment.csv” file into a Pandas DataFrame. It then iterates over each row using df.index and checks if the string ‘Govt’ is present in the job column using search() from the re (regular expression) module. If the condition is met, it prints the job, Savings, Age_Range, and Credit-Rating of that row.
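To make the difference between re.search() and re.match() concrete, the sketch below runs both on a small sample. The value ‘Semi-Govt’ is made up for illustration and does not appear in the original dataset. The last step shows one way to apply the same pattern without an explicit loop, using str.contains() with regex=True.

```python
import re
import pandas as pd

# 'Semi-Govt' is a hypothetical value added to show search() vs match()
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Semi-Govt"],
    "Savings": [10000, 15000, 20000, 12000],
})

pattern = re.compile(r"Govt")

# search() finds the pattern anywhere in the string
search_hits = [job for job in df["job"] if pattern.search(job)]

# match() only succeeds when the pattern starts at the beginning of the string
match_hits = [job for job in df["job"] if pattern.match(job)]

# Same selection as the search() loop, without iterating in Python
govt_df = df[df["job"].str.contains(r"Govt", regex=True)]

print(search_hits)
print(match_hits)
print(govt_df)
```

For larger DataFrames the str.contains() form is usually preferable, since the matching is done by pandas rather than in a Python-level loop.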


