Convert Python Dictionary List to PySpark DataFrame
Last Updated: 18 Jul, 2021
In this article, we will discuss how to convert a Python dictionary list to a PySpark DataFrame.
This can be done in the following ways:
- Using an inferred schema
- Using an explicit schema
- Using a SQL expression
Method 1: Infer schema from the dictionary
We pass the dictionary list directly to the createDataFrame() method and let Spark infer the column names and types from the data.
Syntax: spark.createDataFrame(data)
Example: Python code to create a PySpark DataFrame from a dictionary list using this method.
Python3
# import the modules
from pyspark.sql import SparkSession

# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create a data frame from the dictionary list
df = spark.createDataFrame(data)

# display
df.show()
Output:
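Because the schema is inferred here, it is worth checking which types Spark assigned. A minimal sketch, assuming the df created above (the exact types and column order can vary by Spark version, since the columns come from the dictionary keys):
Python3
# inspect the schema that Spark inferred from the dictionary list
df.printSchema()
# typically prints something like:
# root
#  |-- ID: long (nullable = true)
#  |-- Name: string (nullable = true)
#  |-- Percentage: double (nullable = true)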
Method 2: Using Explicit schema
Here we are going to create a schema and pass it along with the data to the createDataFrame() method.
Schema structure:
schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)])
Here column_1 and column_2 are the dictionary keys to be included as DataFrame columns, DataType() is the Spark data type of that column, and the third argument indicates whether the column can contain null values.
Syntax: spark.createDataFrame(data, schema)
Where,
- data is the dictionary list
- schema is the schema of the dataframe
Example: Python program to create a PySpark DataFrame from the dictionary list using this method.
Python3
# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)

# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# specify the schema
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])

# create a data frame from the
# dictionary list using the schema
df = spark.createDataFrame(data, schema)

# display
df.show()
Output:
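As a side note, createDataFrame() also accepts a DDL-formatted schema string instead of a StructType object. A minimal sketch reusing the data list above (df2 is just a new variable name for the result; this shorthand does not carry the nullable=False flags used above, so all columns are nullable):
Python3
# same conversion with a DDL-formatted schema string
# instead of building a StructType by hand
df2 = spark.createDataFrame(data, "Name string, ID int, Percentage float")
df2.show()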
Method 3: Using SQL Expression
Here we use the Row class from pyspark.sql to convert the Python dictionary list to a PySpark DataFrame: each dictionary is unpacked into a Row object, and the resulting list of Rows is passed to createDataFrame().
Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])
where:
- createDataFrame() is the method to create the dataframe
- Row(**iterator) unpacks each dictionary in the list into a Row object.
- data is the dictionary list
Example: Python code to convert the dictionary list to a PySpark DataFrame.
Python3
# import the modules
from pyspark.sql import SparkSession, Row

# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create the dataframe by unpacking each
# dictionary into a Row object
dataframe = spark.createDataFrame([Row(**variable)
                                   for variable in data])

# display
dataframe.show()
Output:
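To see what Row(**iterator) does for a single element, you can unpack one dictionary by hand. A small illustrative snippet (the sample variable is only for demonstration):
Python3
from pyspark.sql import Row

# unpacking a dictionary turns its keys into Row field names
sample = {"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29}
row = Row(**sample)
print(row)       # Row(Name='sravan kumar', ID=1, Percentage=94.29)
print(row.Name)  # fields are accessible as attributes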