Drop rows containing specific value in PySpark dataframe

Last Updated : 30 Jun, 2021

In this article, we are going to drop the rows that contain a specific value in a PySpark dataframe.

Creating dataframe for demonstration:

Python3

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["6", "ravi", "vrs"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print('Actual data in dataframe')
dataframe.show()

Output:

Method 1: Using where() function

The where() function evaluates a condition and returns only the rows that satisfy it, so rows whose column value matches the unwanted value are dropped.

Syntax: dataframe.where(condition)

Example 1: Python program to drop rows with college = 'vrs'.

Python3

# drop rows with college = 'vrs'
dataframe.where(dataframe.college != 'vrs').show()

Output:

Example 2: Python program to drop rows with ID = 1.

Python3

# drop rows with ID = 1
dataframe.where(dataframe.ID != '1').show()

Output:

Method 2: Using filter() function

The filter() function likewise evaluates a condition and returns only the rows that satisfy it, dropping the rest; it behaves the same way as where().

Syntax: dataframe.filter(condition)

Example: Python program to drop rows with NAME = 'ravi'.

Python3

# drop rows with NAME = 'ravi'
dataframe.filter(dataframe.NAME != 'ravi').show()

Output:
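The same approach extends to dropping rows that match any of several values. A minimal sketch, assuming the dataframe created above; the list of colleges passed to isin() is chosen just for illustration:

Python3

from pyspark.sql.functions import col

# ~ negates the isin() membership test, so rows whose
# college appears in the list are dropped and the rest are kept
dataframe.filter(~col('college').isin('vrs', 'iit')).show()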
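Note that where() is an alias of filter() in PySpark, so the two methods are interchangeable. Conditions can also be combined to drop rows matching several values at once; a short sketch, again assuming the dataframe from above:

Python3

# drop rows with college = 'vrs' and also rows with ID = '1';
# each comparison is parenthesised because & binds more tightly
# than != in Python
dataframe.filter((dataframe.college != 'vrs') & (dataframe.ID != '1')).show()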