Python PySpark - Drop columns based on column names or String condition
In this article, we will take a step-by-step approach to dropping columns from a PySpark DataFrame based on column names or a string condition.
Stepwise Implementation
Step 1: Create CSV
In this step, we simply create a CSV file with three rows and three columns.
CSV Used:

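The CSV image is not reproduced here, but a file of the same shape can be generated with Python's standard library. The column names and values below are assumptions for illustration only; the original article implies just that one column is named Gender (it is dropped in Step 5):

```python
import csv

# Hypothetical data matching the article's shape: three rows, three columns.
# Names and values are assumptions; only the "Gender" column is implied by Step 5.
rows = [
    {"Name": "Alice", "Age": 23, "Gender": "F"},
    {"Name": "Bob", "Age": 27, "Gender": "M"},
    {"Name": "Cara", "Age": 25, "Gender": "F"},
]

# Write book1.csv, the file name read in Step 4.
with open("book1.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Age", "Gender"])
    writer.writeheader()
    writer.writerows(rows)
```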
Step 2: Import PySpark Library
In this step, we import the PySpark package so that we can use its functionality, using the below syntax:
import pyspark
Step 3: Start a SparkSession
In this step, we start our Spark session by chaining SparkSession.builder.appName() with getOrCreate().
from pyspark.sql import SparkSession
# The appName can be any string you like
spark = SparkSession.builder.appName('GeeksForGeeks').getOrCreate()
print(spark)
Output:

Step 4: Read our CSV
To read our CSV we use spark.read.csv(), passing it two options:
- header = True [Treats the first row of the CSV as the column names]
- inferSchema = True [Infers the right data type for each column instead of reading everything as strings]
df = spark.read.csv('book1.csv', header=True, inferSchema=True)
df.show()
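To make header=True concrete: it tells Spark to treat the first CSV row as column names rather than data, much like the standard library's csv.DictReader does. A small sketch with hypothetical values:

```python
import csv
import io

# Hypothetical CSV content; the first row supplies the column names.
raw = "Name,Age,Gender\nAlice,23,F\nBob,27,M\n"

reader = csv.DictReader(io.StringIO(raw))
fieldnames = reader.fieldnames   # column names taken from the first row
records = list(reader)           # remaining rows become data records

# Note: DictReader leaves every value as a string ("23", not 23).
# inferSchema=True is what makes Spark convert such columns to real
# numeric types instead.
```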
Output:

Step 5: Drop Column based on Column Name
Finally, we can see how simple it is to drop a column based on its name.
To drop a column we use DataFrame.drop(). Looking at the result, we can see that the Gender column is no longer part of the DataFrame.
df = df.drop("Gender")
df.show()
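The title also mentions dropping columns based on a string condition, while the step above matches an exact name. One way to sketch that case (an assumption, not shown in the original) is to filter df.columns with a substring test and pass the matches to drop(); the selection logic itself is plain Python:

```python
# Hypothetical column names; in PySpark these would come from df.columns.
columns = ["Name", "Age", "Gender"]

# Keep only the columns whose name contains the substring "Gen".
to_drop = [c for c in columns if "Gen" in c]

# With a live SparkSession this would be:
#   df = df.drop(*to_drop)
# which drops every matching column in a single call.
```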
