How to Rename Multiple PySpark DataFrame Columns
Last Updated: 29 Jun, 2021
In this article, we will discuss how to rename multiple columns in a PySpark dataframe. For this we will use the withColumnRenamed() and toDF() functions.
Creating Dataframe for demonstration:
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of student records, some with missing values
# missing (null) values are written as None
data = [[None, "sravan", "vignan"],
        ["2", None, "vvit"],
        ["3", "rohith", None],
        ["4", "sridevi", "vignan"],
        ["1", None, None],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['ID', 'NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
# show columns
print(dataframe.columns)
# display dataframe
dataframe.show()
Output:
Method 1: Using withColumnRenamed()
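This method renames a single column and returns a new dataframe; the original dataframe is not modified. If the dataframe has no column with the given old name, the call does nothing.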
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name")
where
- dataframe is the pyspark dataframe
- old_column_name is the existing column name
- new_column_name is the new column name
To rename multiple columns, chain one withColumnRenamed() call per column using the "." operator.
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name").
withColumnRenamed("old_column_name", "new_column_name")
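When there are many columns to rename, writing the chain by hand gets long. As a minimal sketch, the same chain can be built from a dictionary of old-to-new names. The helper name rename_columns and the mapping new_names are illustrative assumptions, not part of PySpark.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once for every old -> new pair in mapping.

    df is expected to be a PySpark DataFrame; rename_columns is a
    hypothetical helper, not a PySpark function.
    """
    # reduce() feeds the result of each rename into the next one,
    # which is equivalent to chaining the calls with the "." operator.
    return reduce(
        lambda current, pair: current.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )


# Hypothetical mapping for the demonstration dataframe used in this article.
new_names = {"college": "university", "ID": "student_id"}
```

Calling rename_columns(dataframe, new_names) on the demonstration dataframe should give the same columns as chaining the two withColumnRenamed() calls by hand.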
Example 1: Python program to change the column name for two columns
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)
# change the college column name to university
# and ID to student_id
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed("ID", "student_id")
# display modified columns
print("modified columns: ", dataframe.columns)
# final dataframe
dataframe.show()
Output:
Example 2: Rename all columns
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)
# rename college to university, ID to student_id
# and NAME to student_name
# (a rename whose old column name is not present is simply skipped)
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed(
    "ID", "student_id").withColumnRenamed("NAME", "student_name")
# display modified columns
print("modified columns: ", dataframe.columns)
# final dataframe
dataframe.show()
Output:
Method 2: Using toDF()
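This method returns a new dataframe in which all the columns are renamed at once.
Syntax: dataframe.toDF(*("column 1", "column 2", "column n"))
where "column 1" to "column n" are the new column names, listed in the order of the existing columns. The number of names must match the number of columns in the dataframe.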
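Because toDF() takes the complete list of new names, that list can be computed from dataframe.columns instead of being typed out. The following is a minimal sketch, assuming a PySpark dataframe is passed in; the helper name with_prefix and its prefix argument are hypothetical.

```python
def with_prefix(df, prefix):
    """Return df with prefix added in front of every column name.

    df is expected to be a PySpark DataFrame; with_prefix is a
    hypothetical helper, not a PySpark function.
    """
    # Build one new name per existing column, keeping the original order.
    new_names = [prefix + name for name in df.columns]
    # toDF() needs exactly as many names as the dataframe has columns.
    return df.toDF(*new_names)
```

For example, with_prefix(dataframe, "student_") on the demonstration dataframe would rename ID, NAME and college to student_ID, student_NAME and student_college.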
Example: Python program to change the column names
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)
# change column names to A,B,C
dataframe = dataframe.toDF(*("A", "B", "C"))
# display new columns
print("New columns: ", dataframe.columns)
# display dataframe
dataframe.show()
Output: