Filtering a row in PySpark DataFrame based on matching values from a list
Last Updated :
28 Jul, 2021
In this article, we are going to filter rows in a PySpark dataframe based on matching values from a list, using the isin() method.
isin(): returns a boolean Column that is True for rows whose column value matches one of the given elements, so it can be used as a filtering condition.
Syntax: isin([element1, element2, ..., element n])
Create Dataframe for demonstration:
Python3
# importing module
import pyspark
# importing sparksession
from pyspark.sql import SparkSession
# creating sparksession
# and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [[1, "sravan", "vignan"],
[2, "ramya", "vvit"],
[3, "rohith", "klu"],
[4, "sridevi", "vignan"],
[5, "gnanesh", "iit"]]
# specify column names
columns = ['ID', 'NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
dataframe.show()
Output:
+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  2|  ramya|   vvit|
|  3| rohith|    klu|
|  4|sridevi| vignan|
|  5|gnanesh|    iit|
+---+-------+-------+

Method 1: Using filter() method
filter() checks the given condition and returns only the rows that satisfy it; it behaves exactly like where(), described in Method 2.
Syntax: dataframe.filter(condition)
where, condition is the filtering condition.
Combined with isin(), the overall syntax is:
Syntax: dataframe.filter((dataframe.column_name).isin([list_of_elements])).show()
where,
- column_name is the column to match against
- list_of_elements are the values to look for in that column
- show() displays the resulting dataframe
Example 1: Get particular IDs with the filter() clause.
Python3
# keep rows whose ID is 1, 2 or 3
dataframe.filter(dataframe.ID.isin([1, 2, 3])).show()
Output:
+---+------+-------+
| ID|  NAME|college|
+---+------+-------+
|  1|sravan| vignan|
|  2| ramya|   vvit|
|  3|rohith|    klu|
+---+------+-------+
Example 2: Get IDs not present in the list [1, 3].
Python3
# keep rows whose ID is not 1 or 3; ~ negates the condition
dataframe.filter(~(dataframe.ID).isin([1, 3])).show()
Output:
+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  2|  ramya|   vvit|
|  4|sridevi| vignan|
|  5|gnanesh|    iit|
+---+-------+-------+
Example 3: Get rows by NAME.
Python3
# keep rows where NAME is sravan
dataframe.filter(dataframe.NAME.isin(['sravan'])).show()
Output:
+---+------+-------+
| ID|  NAME|college|
+---+------+-------+
|  1|sravan| vignan|
+---+------+-------+

Method 2: Using where() method
where() checks the given condition and returns only the rows that satisfy it; it is an alias of filter().
Syntax: dataframe.where(condition)
where, condition is the filtering condition.
Overall syntax with the where() clause:
dataframe.where((dataframe.column_name).isin([list_of_elements])).show()
where,
- column_name is the column to match against
- list_of_elements are the values to look for in that column
- show() displays the resulting dataframe
Example: Get particular colleges with the where() clause.
Python3
# keep rows where college is vignan
dataframe.where(dataframe.college.isin(['vignan'])).show()
Output:
+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  4|sridevi| vignan|
+---+-------+-------+