How to create a PySpark dataframe from multiple lists?

Last Updated: 30 May, 2021

In this article, we will discuss how to create a PySpark dataframe from multiple lists.

Approach

- Put the data in several lists and put the column names in another list.
- Combine the data lists element-wise with the zip() method: zip(list1, list2, ..., listn)
- Pass the zipped data to the spark.createDataFrame() method: dataframe = spark.createDataFrame(data, columns)

Examples

Example 1: Python program to create two lists and create the dataframe using these two lists

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# college data as two lists of three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# specify column names
columns = ['ID', 'NAME']

# creating a dataframe by zipping the two lists
dataframe = spark.createDataFrame(zip(data, data1), columns)

# show the dataframe
dataframe.show()
```

Output:

Example 2: Python program to create four lists and create the dataframe

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# college data as four lists of three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]
data2 = ["iit-k", "iit-mumbai", "vignan university"]
data3 = ["AP", "TS", "UP"]

# specify column names
columns = ['ID', 'NAME', 'COLLEGE', 'ADDRESS']

# creating a dataframe by zipping the four lists
dataframe = spark.createDataFrame(
    zip(data, data1, data2, data3), columns)

# show the dataframe
dataframe.show()
```

Output:
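For reference, the two dataframe.show() calls above print tables along these lines (Example 1 first, then Example 2; exact spacing depends on Spark's column-width formatting):

```
+---+------+
| ID|  NAME|
+---+------+
|  1|sravan|
|  2| bobby|
|  3|ojaswi|
+---+------+

+---+------+-----------------+-------+
| ID|  NAME|          COLLEGE|ADDRESS|
+---+------+-----------------+-------+
|  1|sravan|            iit-k|     AP|
|  2| bobby|       iit-mumbai|     TS|
|  3|ojaswi|vignan university|     UP|
+---+------+-----------------+-------+
```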
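Note that zip() stops at the shortest input, so if the lists have unequal lengths the extra elements are silently dropped. Also, passing a list of column names lets Spark infer the column types from the data; if you want to pin the types down yourself, you can pass an explicit schema instead. A minimal sketch of that variant (the field names mirror Example 1; the chosen types and nullability flags are assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# explicit schema: fixes each column's type up front
schema = StructType([
    StructField('ID', IntegerType(), True),   # nullable=True is an assumption
    StructField('NAME', StringType(), True),
])

dataframe = spark.createDataFrame(zip(data, data1), schema)
dataframe.printSchema()
dataframe.show()
```

With the explicit schema, printSchema() reports ID as integer, whereas schema inference would have typed the Python ints as long.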