Python PySpark - Drop columns based on column names or String condition

Last Updated : 27 Mar, 2023

In this article, we will look at the step-wise approach to dropping columns based on column names or a string condition in PySpark.

Stepwise Implementation

Step 1: Create a CSV

In this step, we simply create a CSV file with three rows and columns.

CSV Used:

Step 2: Import the PySpark Library

In this step, we import the PySpark package so that we can use its functionality, using the syntax below:

import pyspark

Step 3: Start a SparkSession

In this step, we simply start our Spark session using the SparkSession.builder.appName() function.

Python3

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName(
    'GeeksForGeeks').getOrCreate()  # You can use any appName
print(spark)

Output:

Step 4: Read our CSV

To read our CSV we use spark.read.csv(). We pass it two parameters:

header=True - uses the first row of the CSV as the column names.
inferSchema=True - infers the right data types for the column elements.

Python3

df = spark.read.csv('book1.csv', header=True, inferSchema=True)
df.show()

Output:

Step 5: Drop a Column Based on the Column Name

Finally, we can see how simple it is to drop a column based on its name. To drop a column we use DataFrame.drop(). Looking at the result, we can see that the Gender column is no longer part of the DataFrame. (Sketches covering multiple columns and the string-condition case from the title follow below.)

Python3

df = df.drop("Gender")
df.show()

Output:
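DataFrame.drop() also accepts several column names at once, so multiple columns can be removed in a single call. Below is a minimal sketch; the column name 'Age' is a hypothetical example, since only the Gender column is confirmed in the CSV above. Note that drop() silently ignores names that are not present in the DataFrame.

Python3

# Minimal sketch: drop several columns in one call.
# Assumption: 'Age' is a hypothetical column used for illustration;
# only 'Gender' is confirmed in the CSV used above.
df = df.drop("Age", "Gender")
df.show()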
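The title also mentions dropping columns based on a string condition. The DataFrame API has no built-in "drop by substring" helper, but df.columns returns the column names as plain Python strings, so a common approach is to build the list of matching names ourselves and unpack it into drop(). A minimal sketch, assuming we want to drop every column whose name contains the substring 'name' (the substring is a hypothetical example; replace it with whatever condition fits your data):

Python3

# Minimal sketch: drop every column whose name contains a substring.
# Assumption: the substring 'name' is a hypothetical condition,
# chosen here only for illustration.
substring = 'name'
cols_to_drop = [c for c in df.columns if substring in c.lower()]

# drop() accepts multiple names, so unpack the matching list.
df = df.drop(*cols_to_drop)
df.show()

Because drop() ignores column names that do not exist, the list comprehension needs no extra guards even when no column matches the condition.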