
Create A DataFrame

This document shows code that creates a Spark DataFrame from Row objects, displays it, and writes it out in Parquet format. The code imports SparkSession to start a Spark session and imports Row from Spark SQL. It defines a Row template with four passenger fields (Name, age, source, destination), builds two records from that template, and collects them in a list. It then creates a DataFrame from the list, displays its contents, and coalesces the DataFrame to a single partition before writing it to a Parquet output named PassengerData, which Spark creates as a directory holding one part file.

Uploaded by

Arpita Das
Copyright
© All Rights Reserved

# Put your code here

from pyspark.sql import Row, SparkSession

# Start (or reuse) a Spark session for this application.
spark = SparkSession \
    .builder \
    .appName("Data Frame PASSENGER") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Row template naming the four passenger fields; calling it builds one record.
passenger = Row("Name", "age", "source", "destination")
data1 = passenger("David", "22", "London", "Paris")
data2 = passenger("Steve", "22", "New York", "Sydney")
passengerData = [data1, data2]

# Column names come from the Row fields; every column is a string here.
df = spark.createDataFrame(passengerData)
df.show()

# Don't Remove this line


df.coalesce(1).write.parquet("PassengerData")
