Converting a PySpark DataFrame Column to a Python List
Last Updated :
01 Dec, 2021
In this article, we will discuss how to convert a PySpark DataFrame column to a Python list.
Creating dataframe for demonstration:
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [["1", "sravan", "vignan", 67, 89],
["2", "ojaswi", "vvit", 78, 89],
["3", "rohith", "vvit", 100, 80],
["4", "sridevi", "vignan", 78, 80],
["1", "sravan", "vignan", 89, 98],
["5", "gnanesh", "iit", 94, 98]]
# specify column names
columns = ['student ID', 'student NAME',
'college', 'subject1', 'subject2']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
# display dataframe
dataframe.show()
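Output:
+----------+------------+-------+--------+--------+
|student ID|student NAME|college|subject1|subject2|
+----------+------------+-------+--------+--------+
|         1|      sravan| vignan|      67|      89|
|         2|      ojaswi|   vvit|      78|      89|
|         3|      rohith|   vvit|     100|      80|
|         4|     sridevi| vignan|      78|      80|
|         1|      sravan| vignan|      89|      98|
|         5|     gnanesh|    iit|      94|      98|
+----------+------------+-------+--------+--------+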
Method 1: Using flatMap()
This method selects the column, accesses the underlying RDD through the rdd attribute, and flattens each single-field row into its value, so that collect() returns a plain Python list.
Syntax: dataframe.select('Column_Name').rdd.flatMap(lambda x: x).collect()
where,
- dataframe is the pyspark dataframe
- Column_Name is the column to be converted into the list
- flatMap() is an RDD method that applies the given function to each row and flattens the results; with lambda x: x, every field of every row becomes one element of the output
- collect() returns all the elements to the driver as a Python list
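To see why flatMap(lambda x: x) yields a flat list, it can help to mimic the behaviour without Spark. The sketch below is only a plain-Python analogy: rows is a hypothetical stand-in for the single-field Row objects that select() produces, and itertools.chain.from_iterable plays the role of flatMap().

```python
from itertools import chain

# Hypothetical stand-in for the rows of dataframe.select('college'):
# each row behaves like a one-element tuple.
rows = [("vignan",), ("vvit",), ("vvit",)]

# flatMap(lambda x: x) returns each row unchanged and then flattens
# the rows, so every field becomes one element of the result.
flattened = list(chain.from_iterable(rows))

print(flattened)  # ['vignan', 'vvit', 'vvit']
```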
Example 1: Python code to convert particular column to list using flatMap
Python3
# convert student NAME to list using
# flatMap
print(dataframe.select('student NAME').
      rdd.flatMap(lambda x: x).collect())
# convert student ID to list using
# flatMap
print(dataframe.select('student ID').
rdd.flatMap(lambda x: x).collect())
Output:
['sravan', 'ojaswi', 'rohith', 'sridevi', 'sravan', 'gnanesh']
['1', '2', '3', '4', '1', '5']
Example 2: Convert multiple columns to list. flatMap() emits every field of every row, so the values of the selected columns are interleaved row by row.
Python3
# convert multiple columns to list using flatMap
print(dataframe.select(['student ID',
                        'student NAME',
                        'college']).
      rdd.flatMap(lambda x: x).collect())
Output:
['1', 'sravan', 'vignan', '2', 'ojaswi', 'vvit', '3', 'rohith', 'vvit', '4', 'sridevi', 'vignan', '1', 'sravan', 'vignan', '5', 'gnanesh', 'iit']
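Because a flat list built from several columns holds the values row by row, the per-column lists can be recovered with slicing. A minimal sketch in plain Python, assuming the flat list came from three selected columns; the names flat, n_columns, and per_column are illustrative only.

```python
# Flat list as returned by flatMap over three selected columns
# (illustrative values; the layout is row1col1, row1col2, row1col3, row2col1, ...).
flat = ["1", "sravan", "vignan", "2", "ojaswi", "vvit"]
n_columns = 3

# Every n_columns-th element, starting at offset i, belongs to column i.
per_column = [flat[i::n_columns] for i in range(n_columns)]

print(per_column)
# [['1', '2'], ['sravan', 'ojaswi'], ['vignan', 'vvit']]
```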
Method 2: Using map()
This method maps each row of the selected column to its first field, which yields a list of the column's values.
Syntax: dataframe.select('Column_Name').rdd.map(lambda x : x[0]).collect()
where,
- dataframe is the pyspark dataframe
- Column_Name is the column to be converted into the list
- map() is an RDD method that applies the given function to each row and returns exactly one result per row; lambda x: x[0] picks the first field of the row
- collect() returns all the elements to the driver as a Python list
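Since map() produces exactly one output element per row, it is convenient when the values of several columns should stay grouped together. The helper below is a sketch, not part of PySpark: the name columns_to_tuples is hypothetical, and it assumes df is a PySpark DataFrame that exposes the rdd attribute.

```python
def columns_to_tuples(df, *column_names):
    """Return one plain tuple per row for the selected columns.

    Hypothetical helper; `df` is assumed to be a PySpark DataFrame.
    """
    # map() keeps a one-to-one mapping between rows and output elements,
    # so the values of each row stay grouped together.
    return df.select(*column_names).rdd.map(lambda row: tuple(row)).collect()
```

For the demonstration dataframe, columns_to_tuples(dataframe, 'student ID', 'college') should return a list that starts with ('1', 'vignan'), ('2', 'vvit').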
Example: Python code to convert pyspark dataframe column to list using the map function.
Python3
# convert student NAME to list using map
print(dataframe.select('student NAME').
      rdd.map(lambda x : x[0]).collect())
# convert student ID to list using map
print(dataframe.select('student ID').
rdd.map(lambda x : x[0]).collect())
# convert student college to list using
# map
print(dataframe.select('college').
rdd.map(lambda x : x[0]).collect())
Output:
['sravan', 'ojaswi', 'rohith', 'sridevi', 'sravan', 'gnanesh']
['1', '2', '3', '4', '1', '5']
['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
Method 3: Using collect()
collect() returns the rows of the dataframe to the driver as a list of Row objects. A list comprehension then extracts the first field of each row to build the Python list.
Syntax: [data[0] for data in dataframe.select('column_name').collect()]
Where,
- dataframe is the pyspark dataframe
- data is each Row object returned by collect()
- column_name is the column in the dataframe
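The comprehension can be wrapped in a small reusable function. This is a sketch under the assumption that df is a PySpark DataFrame; column_to_list is a hypothetical name. Because collect() brings every row to the driver, the approach suits columns that fit comfortably in driver memory.

```python
def column_to_list(df, column_name):
    """Collect one column of `df` into a Python list (hypothetical helper)."""
    # collect() returns a list of single-field Row objects.
    rows = df.select(column_name).collect()
    # Take the first (and only) field of each Row.
    return [row[0] for row in rows]
```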
Example: Python code to convert dataframe columns to list using collect() method
Python3
# display college column in
# the list format using comprehension
print([data[0] for data in dataframe.
select('college').collect()])
# display student ID column in the
# list format using comprehension
print([data[0] for data in dataframe.
select('student ID').collect()])
# display subject1 column in the list
# format using comprehension
print([data[0] for data in dataframe.
select('subject1').collect()])
# display subject2 column in the
# list format using comprehension
print([data[0] for data in dataframe.
select('subject2').collect()])
Output:
['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
['1', '2', '3', '4', '1', '5']
[67, 78, 100, 78, 89, 94]
[89, 89, 80, 80, 98, 98]
Method 4: Using toLocalIterator()
toLocalIterator() returns an iterator over the rows of the dataframe. A list comprehension then extracts the first field of each row to build the Python list.
Syntax: [data[0] for data in dataframe.select('column_name').toLocalIterator()]
Where,
- dataframe is the pyspark dataframe
- data is each Row object yielded by the iterator
- column_name is the column in the dataframe
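Because toLocalIterator() yields rows lazily, it combines well with itertools.islice when only the first few values are needed. The function below is a sketch; first_values is a hypothetical name and df is assumed to be a PySpark DataFrame.

```python
from itertools import islice


def first_values(df, column_name, n):
    """Return up to `n` values of a column without building the full list.

    Hypothetical helper; `df` is assumed to be a PySpark DataFrame.
    """
    row_iterator = df.select(column_name).toLocalIterator()
    # islice stops pulling rows from the iterator after n items.
    return [row[0] for row in islice(row_iterator, n)]
```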
Example: Convert pyspark dataframe columns to list using toLocalIterator() method
Python3
# display college column in the list
# format using comprehension
print([data[0] for data in dataframe.
       select('college').toLocalIterator()])
# display student ID column in the
# list format using comprehension
print([data[0] for data in dataframe.
select('student ID').toLocalIterator()])
# display subject1 column in the list
# format using comprehension
print([data[0] for data in dataframe.
select('subject1').toLocalIterator()])
# display subject2 column in the
# list format using comprehension
print([data[0] for data in dataframe.
select('subject2').toLocalIterator()])
Output:
['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
['1', '2', '3', '4', '1', '5']
[67, 78, 100, 78, 89, 94]
[89, 89, 80, 80, 98, 98]
Method 5: Using toPandas()
toPandas() converts the selected column to a pandas DataFrame, whose column can then be converted into a list.
Syntax: list(dataframe.select('column_name').toPandas()['column_name'])
Where,
- toPandas() converts the selected PySpark column to a pandas DataFrame
- column_name is the column in the pyspark dataframe
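One detail worth knowing: calling list() on a numeric pandas column yields NumPy scalar types, whereas Series.tolist() converts the values to built-in Python types. The helper below is a sketch, assuming df is a PySpark DataFrame, pandas is installed, and column_name is spelled exactly as in the dataframe; column_to_list_via_pandas is a hypothetical name.

```python
def column_to_list_via_pandas(df, column_name):
    """Convert one column to a list of built-in Python values.

    Hypothetical helper; `df` is assumed to be a PySpark DataFrame.
    """
    pandas_df = df.select(column_name).toPandas()
    # tolist() returns Python scalars (int, float, str) rather than
    # NumPy types such as numpy.int64.
    return pandas_df[column_name].tolist()
```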
Example: Convert pyspark dataframe columns to list using toPandas() method
Python3
# display college column in
# the list format using toPandas
print(list(dataframe.select('college').
toPandas()['college']))
# display student NAME column in
# the list format using toPandas
print(list(dataframe.select('student NAME').
toPandas()['student NAME']))
# display subject1 column in
# the list format using toPandas
print(list(dataframe.select('subject1').
toPandas()['subject1']))
# display subject2 column
# in the list format using toPandas
print(list(dataframe.select('subject2').
toPandas()['subject2']))
Output:
['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
['sravan', 'ojaswi', 'rohith', 'sridevi', 'sravan', 'gnanesh']
[67, 78, 100, 78, 89, 94]
[89, 89, 80, 80, 98, 98]