Delete rows in PySpark dataframe based on multiple conditions

Last Updated : 29 Jun, 2021

In this article, we are going to see how to delete rows in a PySpark dataframe based on multiple conditions.

Method 1: Using a logical expression

Here we use a logical expression to filter the rows. The filter() function filters rows from an RDD/DataFrame based on the given condition or SQL expression.

Syntax: filter(condition)

Parameters:
condition: Logical condition or SQL expression

Example 1:

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "Amit", "DU"],
        ["2", "Mohit", "DU"],
        ["3", "rohith", "BHU"],
        ["4", "sridevi", "LPU"],
        ["1", "sravan", "KLMP"],
        ["5", "gnanesh", "IIT"]]

# specify column names
columns = ['student_ID', 'student_NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# delete the rows where college is "IIT"
dataframe = dataframe.filter(dataframe.college != "IIT")
dataframe.show()

Output:

Example 2:

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "Amit", "DU"],
        ["2", "Mohit", "DU"],
        ["3", "rohith", "BHU"],
        ["4", "sridevi", "LPU"],
        ["1", "sravan", "KLMP"],
        ["5", "gnanesh", "IIT"]]

# specify column names
columns = ['student_ID', 'student_NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# delete the rows where college is "DU" or student_ID is "3"
dataframe = dataframe.filter((dataframe.college != "DU") &
                             (dataframe.student_ID != "3"))
dataframe.show()

Output:
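As a side note that goes beyond the examples above, filter() also accepts a SQL expression string, and column conditions can be combined with | (OR) and negated with ~ (NOT). The following is a minimal sketch, assuming the same dataframe created in Example 2, showing two equivalent ways to delete the rows where college is "DU" or student_ID is "3":

Python3

# A minimal sketch (not part of the original examples), assuming the
# dataframe created in Example 2 above.

# 1. filter() with a SQL expression string
dataframe.filter("college != 'DU' AND student_ID != '3'").show()

# 2. Negating a combined condition with ~ : delete rows where college is
#    "DU" OR student_ID is "3" by keeping every other row
dataframe.filter(~((dataframe.college == "DU") |
                   (dataframe.student_ID == "3"))).show()

Both forms keep the same rows as Example 2; which one to use is mostly a matter of readability.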
Method 2: Using the when() method

when() evaluates a list of conditions and returns one of several possible result values. By passing the conditions and the value to return when a row satisfies them, we can mark the rows we want to keep in a helper column and then filter on that marker.

Syntax: when(condition, value)

Parameters:
condition: Boolean column expression
value: Literal value

Example:

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# importing when from pyspark.sql.functions
from pyspark.sql.functions import when

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "Amit", "DU"],
        ["2", "Mohit", "DU"],
        ["3", "rohith", "BHU"],
        ["4", "sridevi", "LPU"],
        ["1", "sravan", "KLMP"],
        ["5", "gnanesh", "IIT"]]

# specify column names
columns = ['student_ID', 'student_NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# mark the rows to keep in a helper column, keep only the marked rows,
# then drop the helper column
dataframe.withColumn('New_col', when(dataframe.student_ID != '5', "True")
                     .when(dataframe.student_NAME != 'gnanesh', "True")
                     ).filter("New_col == True").drop("New_col").show()

Output:
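For comparison, and as an aside not shown in the original article, the when() example above removes only the row where student_ID is '5' and student_NAME is 'gnanesh'. A minimal sketch of the same deletion written directly with filter(), assuming the same dataframe as above, would be:

Python3

# A sketch (not from the original article): delete the row where BOTH
# conditions hold by keeping every row where the combined condition is false
dataframe.filter(~((dataframe.student_ID == '5') &
                   (dataframe.student_NAME == 'gnanesh'))).show()

In both approaches the original dataframe itself is unchanged; to keep the result, assign it back to a variable as in the earlier examples.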