Convert PySpark DataFrame to Dictionary in Python
Last Updated: 17 Jun, 2021
In this article, we will see how to convert a PySpark data frame to a dictionary in which the keys are column names and the values are lists of column values.
Before starting, we will create a sample data frame:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession
# Create a spark session
spark = SparkSession.builder.appName('DF_to_dict').getOrCreate()
# Create data in dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]
# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]
# Create the spark dataframe
df = spark.createDataFrame(data=data,
                           schema=columns)
# Print the dataframe
df.show()
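Output :
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
+------+----------+------+------+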
Method 1: Using df.toPandas()
Convert the PySpark data frame to a Pandas data frame using df.toPandas(). This collects every row to the driver, so it is suitable only when the data fits in driver memory.
Syntax: DataFrame.toPandas()
Return type: Returns a Pandas data frame with the same content as the PySpark data frame.
Iterate over each column and add its list of values to the dictionary, with the column name as the key.
Python3
# Declare an empty dictionary
# (named result to avoid shadowing the built-in dict)
result = {}
# Convert PySpark DataFrame to Pandas
# DataFrame, keeping df as the Spark DataFrame
pandas_df = df.toPandas()
# Traverse through each column
for column in pandas_df.columns:
    # Add key as column_name and
    # value as list of column values
    result[column] = pandas_df[column].values.tolist()
# Print the dictionary
print(result)
Output :
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': [3000, 4000, 4000, 4000, 1200]}
Method 2: Using df.collect()
Use df.collect() to retrieve all the records of the PySpark data frame as a list of rows, then build the dictionary column by column.
Syntax: DataFrame.collect()
Return type: Returns all the records of the data frame as a list of rows.
Python3
# Collect the dataframe as a list
# of Row objects
rows = df.collect()
# Declare an empty dictionary
result = {}
# Go through each column
for i, column in enumerate(df.columns):
    # Add ith column as values in dict
    # with key as ith column_name; indexing
    # each Row directly keeps the original types
    result[column] = [row[i] for row in rows]
# Print the dictionary
print(result)
Output :
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': [3000, 4000, 4000, 4000, 1200]}
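Because each Row returned by collect() behaves like a tuple, the row-to-column pivot can also be written with zip. The following is a minimal sketch rather than part of the method above: plain tuples stand in for Row objects so that it runs without a Spark session, and the names rows, columns and result are illustrative.

```python
# Plain tuples stand in for the Row objects that df.collect() returns
# (an assumption made so the sketch runs without Spark).
rows = [
    ("Ram", "1991-04-01", "M", 3000),
    ("Mike", "2000-05-19", "M", 4000),
    ("Rohini", "1978-09-05", "M", 4000),
]
# Stand-in for df.columns
columns = ["Name", "DOB", "Gender", "salary"]

# zip(*rows) regroups the values by position, giving one tuple per column
result = {name: list(values) for name, values in zip(columns, zip(*rows))}

print(result)
```

Note that with an empty list of rows, zip(*rows) yields nothing, so the result is an empty dictionary rather than a dictionary of empty lists.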
Method 3: Using pandas.DataFrame.to_dict()
A Pandas data frame can be converted directly into a dictionary using the to_dict() method.
Syntax: DataFrame.to_dict(orient='dict')
Parameters:
- orient: Determines the structure of the returned dictionary. It takes values such as {'dict', 'list', 'series', 'split', 'records', 'index'}. With 'list', each column name maps to a list of that column's values.
Return type: Returns the dictionary corresponding to the data frame.
Code:
Python3
# Convert PySpark dataframe to pandas
# dataframe, keeping df as the Spark DataFrame
pandas_df = df.toPandas()
# Convert the dataframe into
# dictionary
result = pandas_df.to_dict(orient='list')
# Print the dictionary
print(result)
Output :
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': [3000, 4000, 4000, 4000, 1200]}
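To make the effect of the orient parameter more concrete, the sketch below rebuilds three of the result shapes by hand in plain Python. It does not call Pandas; it only mimics the structure that to_dict() returns for 'list', 'records' and the default 'dict' on a data frame with the default 0-based index. The variable names are illustrative, and only two rows are used to keep it short.

```python
# Shape returned by orient='list': column name -> list of values
as_list = {
    "Name": ["Ram", "Mike"],
    "salary": [3000, 4000],
}

# Shape returned by orient='records': one dictionary per row
as_records = [dict(zip(as_list, values)) for values in zip(*as_list.values())]

# Shape returned by orient='dict' (the default):
# column name -> {row index -> value}
as_dict = {column: dict(enumerate(values)) for column, values in as_list.items()}

print(as_records)
print(as_dict)
```

The 'list' shape is the one used throughout this article because it maps directly onto "column name as key, column values as value".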
The same approach works for a data frame with two columns. Here we create a data frame with the columns 'Location' and 'House_price' and convert it to a dictionary:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession
# Create a spark session
spark = SparkSession.builder.appName('DF_to_dict').getOrCreate()
# Create data in dataframe
data = [('Hyderabad', 120000),
        ('Delhi', 124000),
        ('Mumbai', 344000),
        ('Guntur', 454000),
        ('Bandra', 111200)]
# Column names in dataframe
columns = ["Location", 'House_price']
# Create the spark dataframe
df = spark.createDataFrame(data=data, schema=columns)
# Print the dataframe
print('Dataframe : ')
df.show()
# Convert PySpark dataframe to
# pandas dataframe
pandas_df = df.toPandas()
# Convert the dataframe into
# dictionary
result = pandas_df.to_dict(orient='list')
# Print the dictionary
print('Dictionary :')
print(result)
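Output :
Dataframe :
+---------+-----------+
| Location|House_price|
+---------+-----------+
|Hyderabad|     120000|
|    Delhi|     124000|
|   Mumbai|     344000|
|   Guntur|     454000|
|   Bandra|     111200|
+---------+-----------+

Dictionary :
{'Location': ['Hyderabad', 'Delhi', 'Mumbai', 'Guntur', 'Bandra'], 'House_price': [120000, 124000, 344000, 454000, 111200]}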
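With exactly two columns, it is sometimes more convenient to have a single mapping from one column's values to the other's. This goes one step beyond what the example above does; as a hedged sketch, the column-oriented dictionary is written out literally here so the snippet runs on its own, and the names columns_dict and price_by_location are illustrative.

```python
# Column-oriented dictionary in the shape produced by to_dict(orient='list')
columns_dict = {
    "Location": ["Hyderabad", "Delhi", "Mumbai", "Guntur", "Bandra"],
    "House_price": [120000, 124000, 344000, 454000, 111200],
}

# Pair each location with the price at the same position
price_by_location = dict(
    zip(columns_dict["Location"], columns_dict["House_price"])
)

print(price_by_location["Mumbai"])
```

If the key column contains duplicates, later rows overwrite earlier ones, so this form only suits columns whose values are unique.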