Drop rows containing specific value in PySpark dataframe

Last Updated : 30 Jun, 2021

In this article, we are going to drop the rows that contain a specific value in a PySpark dataframe.

Creating dataframe for demonstration:

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["6", "ravi", "vrs"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print('Actual data in dataframe')
dataframe.show()

Output:

+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  2| ojaswi|   vvit|
|  3| rohith|   vvit|
|  4|sridevi| vignan|
|  6|   ravi|    vrs|
|  5|gnanesh|    iit|
+---+-------+-------+

Method 1: Using where() function

where() evaluates a boolean condition against every row and returns only the rows for which the condition is true. To drop rows containing a specific value, write a condition that excludes that value.

Syntax: dataframe.where(condition)

Example 1: Python program to drop rows with college = vrs.

Python3

# drop rows with college = vrs
dataframe.where(dataframe.college != 'vrs').show()

Output:

+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  2| ojaswi|   vvit|
|  3| rohith|   vvit|
|  4|sridevi| vignan|
|  5|gnanesh|    iit|
+---+-------+-------+

Example 2: Python program to drop rows with ID = 1.

Python3

# drop rows with ID = 1
dataframe.where(dataframe.ID != '1').show()

Output:

+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  2| ojaswi|   vvit|
|  3| rohith|   vvit|
|  4|sridevi| vignan|
|  6|   ravi|    vrs|
|  5|gnanesh|    iit|
+---+-------+-------+

Method 2: Using filter() function

filter() checks the condition and returns the matching rows in exactly the same way; in PySpark, where() is an alias for filter(), so the two are interchangeable.

Syntax: dataframe.filter(condition)

Example: Python program to drop rows with NAME = ravi.

Python3

# drop rows with NAME = ravi
dataframe.filter(dataframe.NAME != 'ravi').show()

Output:

+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  2| ojaswi|   vvit|
|  3| rohith|   vvit|
|  4|sridevi| vignan|
|  5|gnanesh|    iit|
+---+-------+-------+
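Both examples above exclude a single value. If you need to drop rows matching any of several values, Column.isin() combined with the ~ (negation) operator handles it in one condition. The sketch below reuses the dataframe built earlier; the values 'vrs' and 'iit' are just illustrative choices, not part of the original article.

Python3

# drop rows whose college matches any value in a list;
# col() references the column and ~ negates the isin() test
from pyspark.sql.functions import col

dataframe.filter(~col('college').isin('vrs', 'iit')).show()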
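One caveat worth noting when dropping by value: a condition like dataframe.NAME != 'ravi' also drops rows where NAME is NULL, because comparing anything with NULL yields NULL and filter() discards such rows. If NULL rows should survive the filter, PySpark's null-safe equality test eqNullSafe() can be negated instead. A minimal sketch, again reusing the dataframe from above:

Python3

# eqNullSafe() treats NULL as an ordinary comparable value, so
# negating it keeps NULL rows while still dropping NAME = 'ravi'
dataframe.filter(~dataframe.NAME.eqNullSafe('ravi')).show()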