PySpark - Create DataFrame from List

Last Updated : 30 May, 2021
By sravankumar_171fa07058

In this article, we are going to discuss how to create a PySpark dataframe from a list. To do this, first create a list of row data and a list of column names, then pass both to the spark.createDataFrame() method, which builds the DataFrame. The data argument is the list of rows and the columns argument is the list of column names:

dataframe = spark.createDataFrame(data, columns)

Example 1: Python code to create a PySpark student dataframe from two lists.

Python3

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of college data with two row lists
data = [["java", "dbms", "python"],
        ["OOPS", "SQL", "Machine Learning"]]

# column names for the dataframe
columns = ["Subject 1", "Subject 2", "Subject 3"]

# creating the dataframe
dataframe = spark.createDataFrame(data, columns)

# show the dataframe
dataframe.show()
```

Output:

Example 2: Create a dataframe from 4 lists.

Python3

```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of college data with four row lists
data = [["node.js", "dbms", "integration"],
        ["jsp", "SQL", "trigonometry"],
        ["php", "oracle", "statistics"],
        [".net", "db2", "Machine Learning"]]

# column names for the dataframe
columns = ["Web Technologies", "Data bases", "Maths"]

# creating the dataframe
dataframe = spark.createDataFrame(data, columns)

# show the dataframe
dataframe.show()
```

Output: