Check For A Substring In A Pandas Dataframe Column
Last Updated: 28 Apr, 2025
Pandas is a data analysis library for Python that has grown rapidly in popularity in recent years. It provides in-memory, table-like data structures with SQL-like operations, basic statistical and analytical support, and plotting capabilities. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.
In this article, we will explore the ways by which we can check for a substring in a Pandas DataFrame column.
Check for a Substring in a DataFrame Column
Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:
- Using the str.contains() method
- Using regular expressions
- Using the apply() function
- Using a list comprehension with the 'in' operator
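As a rough preview, the four approaches can be sketched side by side on a small Series. The `names` data and the `mask_*` variable names are illustrative assumptions. For a plain-text substring and a column without missing values, all four should produce the same boolean mask.

```python
import pandas as pd

# Hypothetical sample data, used only for this comparison
names = pd.Series(['Aman', 'Bhavna', 'Madhav', 'Rohan'])
substring = 'an'

# 1. Vectorized string method, treating the substring as literal text
mask_contains = names.str.contains(substring, regex=False)

# 2. Same method, treating the argument as a regular expression
mask_regex = names.str.contains(r'an', regex=True)

# 3. apply() with a lambda that uses Python's 'in' operator
mask_apply = names.apply(lambda x: substring in x)

# 4. List comprehension with the 'in' operator, wrapped back into a Series
mask_listcomp = pd.Series([substring in x for x in names], index=names.index)

print(mask_contains.tolist())
```

Each approach is covered in more detail in the sections that follow.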
Check For a Substring in a Pandas Dataframe using str.contains() method
In this example, a pandas DataFrame is created with employee information. A new column, 'NameContainsSubstring', is added, indicating whether the substring 'an' is present in each 'Name' entry using the str.contains method. The match is case-sensitive by default.
Python3
import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID   Name  Department  Salary  NameContainsSubstring
0         101   Aman          HR   60000                   True
3         104  Rohan   Marketing   65000                   True
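Real-world columns often contain mixed case and missing values. The sketch below, built on made-up data, shows two optional parameters of str.contains that address this: `case=False` for case-insensitive matching and `na=False` to treat missing entries as non-matches. The sample names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with mixed case and a missing name
df = pd.DataFrame({'Name': ['Aman', 'ANITA', None, 'Rohan']})

# case=False ignores letter case; na=False treats missing values as non-matches
mask = df['Name'].str.contains('an', case=False, na=False)

# The mask can be used directly for filtering, without adding a column
print(df[mask])
```

Passing `na=False` keeps missing values out of the mask, so it can be used directly for boolean indexing.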
Check For A Substring In A Pandas Dataframe Using Regular Expressions
In this example, a pandas DataFrame is created with employee information. A new column, 'NameContainsPattern', is added, indicating whether each 'Name' entry matches the regular expression 'ma(?!$)'.
The str.contains method is used with the regex=True parameter (which is also the default) to interpret the pattern as a regular expression. The negative lookahead (?!$) ensures that 'ma' is not immediately followed by the end of the string, so an occurrence of 'ma' at the very end of a name does not count as a match.
Python3
import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# regular expression pattern with negative lookahead
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsPattern
0         101    aman          HR   60000                 True
2         103  madhav     Finance   90000                 True
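Because str.contains interprets its argument as a regular expression by default, characters such as '.' or '+' carry special meaning. The following sketch, using made-up strings, contrasts regex matching with literal matching via `regex=False`, and shows `re.escape` as one possible way to neutralize metacharacters.

```python
import re
import pandas as pd

# Hypothetical data containing regex metacharacters
s = pd.Series(['a.b', 'axb', 'c+d'])

# As a regex, '.' matches any character, so 'axb' also matches
as_regex = s.str.contains('a.b')

# regex=False searches for the literal text 'a.b'
as_literal = s.str.contains('a.b', regex=False)

# re.escape() is an alternative when the literal is part of a larger pattern
escaped = s.str.contains(re.escape('c+d'))

print(as_regex.tolist(), as_literal.tolist(), escaped.tolist())
```

When the search text comes from user input, literal matching is usually the safer choice.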
Check For A Substring In A Pandas Dataframe Using apply() function
In this example, a pandas DataFrame is created with employee information, including 'EmployeeID', 'Name', 'Department', and 'Salary'. A new column, 'NameContainsSubstring', is added, indicating whether the substring 'av' is present in each 'Name' entry. The check is performed by the apply() method with a lambda function that uses Python's 'in' operator.
Python3
import pandas as pd
# Creating a relevant 4-column DataFrame
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsSubstring
1         102  Bhavna          IT   75000                   True
2         103  Madhav     Finance   90000                   True
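One caveat with the apply() approach is that the 'in' operator raises a TypeError when an entry is missing (None or NaN) rather than a string. A minimal sketch of one way to guard against this, assuming missing names should count as non-matches, is shown below. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical column with a missing value
names = pd.Series(['Bhavna', None, 'Madhav'])
substring = 'av'

# Guard with isinstance() so non-string entries count as non-matches
mask = names.apply(lambda x: isinstance(x, str) and substring in x)

print(mask.tolist())
```

The same guard can be used inside a list comprehension.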
Check For A Substring In A Pandas Dataframe Using List Comprehension with 'in' Operator
In this example, a list comprehension with the 'in' operator checks whether the substring 'Finance' is present in each entry of the 'Department' column.
Python3
import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for substring 'Finance' in the 'Department' column
substring = 'Finance'
# The new column is named after the column being searched
df['DepartmentContainsSubstring'] = [substring in dept for dept in df['Department']]
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  DepartmentContainsSubstring
2         103  Madhav     Finance   90000                         True
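The list comprehension approach extends naturally to checking for any of several substrings. The sketch below combines it with Python's built-in any(). The `substrings` list and the trimmed-down DataFrame are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
})

# Hypothetical list of substrings to look for
substrings = ['Fin', 'Mark']

# True when at least one substring occurs in the department name
mask = [any(sub in dept for sub in substrings) for dept in df['Department']]

# A plain list of booleans can be used directly for row filtering
filtered_df = df[mask]
print(filtered_df)
```

Here the rows for 'Finance' and 'Marketing' are kept, since each contains one of the listed substrings.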