Filtering a row in PySpark DataFrame based on matching values from a list

Last Updated : 28 Jul, 2021

In this article, we are going to filter the rows of a PySpark dataframe based on matching values from a list by using isin().

isin(): checks whether each value of a column is present in the given list of elements and returns a boolean result that can be used as a filter condition.

Syntax: isin([element1, element2, ..., element n])

Create dataframe for demonstration:

Python3

# importing module
import pyspark

# importing sparksession
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [[1, "sravan", "vignan"],
        [2, "ramya", "vvit"],
        [3, "rohith", "klu"],
        [4, "sridevi", "vignan"],
        [5, "gnanesh", "iit"]]

# specify column names
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()

Output:

Method 1: Using filter() method

filter() checks the given condition and returns the matching rows; filter() and where() behave in the same way.

Syntax: dataframe.filter(condition)

where condition is the dataframe condition.

Overall syntax with filter() and isin():

dataframe.filter((dataframe.column_name).isin([list_of_elements])).show()

where,
- column_name is the column to match on
- list_of_elements are the values to look for in the column
- show() is used to display the resultant dataframe

Example 1: Get rows with particular IDs using the filter() clause.

Python3

# get the rows with ID 1, 2 or 3 from the dataframe
dataframe.filter((dataframe.ID).isin([1, 2, 3])).show()

Output:

Example 2: Get rows whose ID is not 1 or 3.

Python3

# get the rows whose ID is not in [1, 3]
dataframe.filter(~(dataframe.ID).isin([1, 3])).show()

Output:

Example 3: Get rows with a particular name.

Python3

# get the rows where NAME is sravan
dataframe.filter((dataframe.NAME).isin(['sravan'])).show()

Output:

Method 2: Using where() method

where() checks the given condition and returns the matching rows.

Syntax: dataframe.where(condition)

where condition is the dataframe condition.

Overall syntax with where() and isin():

dataframe.where((dataframe.column_name).isin([list_of_elements])).show()

where,
- column_name is the column to match on
- list_of_elements are the values to look for in the column
- show() is used to display the resultant dataframe

Example: Get rows with particular colleges using the where() clause.

Python3

# get the rows where college is vignan
dataframe.where((dataframe.college).isin(['vignan'])).show()

Output:
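The examples above hard-code the value lists inline. As a variation, here is a minimal sketch of filtering against a value list held in a Python variable, using col() from pyspark.sql.functions instead of the dataframe.column attribute style; the list name colleges_to_keep and its chosen values are illustrative assumptions, not part of the original examples.

Python3

# a minimal sketch: filter against a list stored in a variable
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

data = [[1, "sravan", "vignan"],
        [2, "ramya", "vvit"],
        [3, "rohith", "klu"],
        [4, "sridevi", "vignan"],
        [5, "gnanesh", "iit"]]
columns = ['ID', 'NAME', 'college']
dataframe = spark.createDataFrame(data, columns)

# hypothetical list of values to match against
colleges_to_keep = ['vignan', 'iit']

# keep rows whose college appears in the list
# (filter() and where() are interchangeable here)
dataframe.filter(col('college').isin(colleges_to_keep)).show()

# exclude rows whose college appears in the list
dataframe.where(~col('college').isin(colleges_to_keep)).show()

Passing the list variable directly to isin() keeps the matching values in one place, which is convenient when the list is built elsewhere in the program rather than typed inline.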