Python PySpark - Drop columns based on column names or String condition

Last Updated : 27 Mar, 2023 | by ayushmankumar7

In this article, we will look at a step-wise approach to dropping columns based on column names or a string condition in PySpark.

Stepwise Implementation

Step 1: Create a CSV

In this step, we simply create a CSV file with three rows and columns.

CSV Used:

Step 2: Import the PySpark Library

Next, we import the PySpark package so that we can use its functionality:

import pyspark

Step 3: Start a SparkSession

In this step we start our Spark session using the SparkSession.builder.appName() function.

Python3

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName(
    'GeeksForGeeks').getOrCreate()  # You can use any appName
print(spark)

Output:

Step 4: Read our CSV

To read our CSV we use spark.read.csv(). It takes two keyword parameters:

header=True (uses the first row of the CSV as the column names)
inferSchema=True (infers the right data type for each column's elements)

Python3

df = spark.read.csv('book1.csv', header=True, inferSchema=True)
df.show()

Output:

Step 5: Drop a Column Based on the Column Name

Finally, we can see how simple it is to drop a column by its name. To drop a column we use DataFrame.drop(). After running it, we will see that the Gender column is no longer part of the DataFrame.

Python3

df = df.drop("Gender")
df.show()
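The title also promises dropping columns by a string condition, so here is a minimal sketch of that idea: build a list of column names that satisfy a Python string test, then unpack the list into DataFrame.drop(), which accepts multiple names. The sample rows and the substring "Gen" are illustrative assumptions, not data from the article's CSV.

Python3

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('GeeksForGeeks').getOrCreate()

# Hypothetical data standing in for the article's book1.csv
df = spark.createDataFrame(
    [("Arun", 21, "Male"), ("Meena", 22, "Female"), ("Ravi", 23, "Male")],
    ["Name", "Age", "Gender"],
)

# Collect every column whose name contains the substring "Gen";
# any string test works here: startswith(), endswith(), `in`, or a regex.
cols_to_drop = [c for c in df.columns if "Gen" in c]

# DataFrame.drop() accepts several column names, so unpack the list.
df = df.drop(*cols_to_drop)
df.show()

Because df.columns is a plain Python list of strings, the filtering step is ordinary Python and needs no Spark functions at all; only the final drop() touches the DataFrame.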