How to create a PySpark dataframe from multiple lists?

Last Updated: 30 May, 2021

In this article, we will discuss how to create a PySpark dataframe from multiple lists.

Approach

- Put the data in several lists and put the column names in another list.
- Combine the data lists element-wise with the zip() method: zip(list1, list2, ..., listn)
- Pass the zipped data to the spark.createDataFrame() method: dataframe = spark.createDataFrame(data, columns)

Examples

Example 1: Python program to create two lists and create the dataframe using these two lists

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# college data as two lists of three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# specify column names
columns = ['ID', 'NAME']

# creating a dataframe by zipping the two lists
dataframe = spark.createDataFrame(zip(data, data1), columns)

# show the dataframe
dataframe.show()
```

Output:

Example 2: Python program to create four lists and create the dataframe

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# college data as four lists of three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]
data2 = ["iit-k", "iit-mumbai", "vignan university"]
data3 = ["AP", "TS", "UP"]

# specify column names
columns = ['ID', 'NAME', 'COLLEGE', 'ADDRESS']

# creating a dataframe by zipping the four lists
dataframe = spark.createDataFrame(
    zip(data, data1, data2, data3), columns)

# show the dataframe
dataframe.show()
```

Output:
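For reference, the two dataframe.show() calls above print tables along these lines (Example 1 first, then Example 2; exact spacing depends on Spark's column-width formatting):

```
+---+------+
| ID|  NAME|
+---+------+
|  1|sravan|
|  2| bobby|
|  3|ojaswi|
+---+------+

+---+------+-----------------+-------+
| ID|  NAME|          COLLEGE|ADDRESS|
+---+------+-----------------+-------+
|  1|sravan|            iit-k|     AP|
|  2| bobby|       iit-mumbai|     TS|
|  3|ojaswi|vignan university|     UP|
+---+------+-----------------+-------+
```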
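Note that zip() stops at the shortest input, so if the lists have unequal lengths the extra elements are silently dropped. Also, passing a list of column names lets Spark infer the column types from the data; if you want to pin the types down yourself, you can pass an explicit schema instead. A minimal sketch of that variant (the field names mirror Example 1; the chosen types and nullability flags are assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# explicit schema: fixes each column's type up front
schema = StructType([
    StructField('ID', IntegerType(), True),   # nullable=True is an assumption
    StructField('NAME', StringType(), True),
])

dataframe = spark.createDataFrame(zip(data, data1), schema)
dataframe.printSchema()
dataframe.show()
```

With the explicit schema, printSchema() reports ID as integer, whereas schema inference would have typed the Python ints as long.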