Spark Read/Write Cheat Sheet

This document provides a cheat sheet for reading and writing data in different file formats using Spark SQL and DataFrames. It shows how to read and write CSV, Parquet, JSON, and Delta Lake files. Key functions covered include spark.read to read files, df.write to write files, and Spark SQL commands to read from and write to Delta Lake tables.
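
All of the snippets below assume an active SparkSession bound to the name spark, which notebooks and the PySpark shell provide automatically. A minimal sketch for creating one in a standalone script (the application name here is an arbitrary placeholder):

>>> from pyspark.sql import SparkSession
>>> # Reuse an existing session if one is running, otherwise start one
>>> spark = SparkSession.builder.appName("cheatsheet").getOrCreate()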

Read CSV

>>> df = spark.read.format("csv").option("header", "true").load(filePath)

Infer Schema

>>> df = spark.read.format("csv").option("inferSchema", "true").load(filePath)

Custom Schema

>>> from pyspark.sql.types import StructType, StructField, IntegerType
>>> csvSchema = StructType([StructField("id", IntegerType(), False)])
>>> df = spark.read.format("csv").schema(csvSchema).load(filePath)

Write CSV

>>> df.write.format("csv").mode("overwrite").save("outputPath/file.csv")

Read Parquet

>>> df = spark.read.format("parquet").load(parquetDirectory)

OR

>>> df = spark.read.parquet(parquetDirectory)

Write Parquet

>>> df.write.format("parquet").mode("overwrite").save("outputPath")

Write Parquet Partition By

>>> df.write.format("parquet").partitionBy("keyColumn").save("outputPath")


Read JSON

>>> df = spark.read.format("json").option("inferSchema", "true").load(filePath)

Write JSON

>>> df.write.format("json").mode("overwrite").save("outputPath/file.json")

Read Delta (Spark SQL)

>>> df = spark.sql("SELECT * FROM delta.`/path/to/delta_directory`")

Spark SQL Unmanaged Table

>>> spark.sql("""DROP TABLE IF EXISTS delta_table_name""")
>>> spark.sql("""CREATE TABLE delta_table_name USING DELTA LOCATION '{}'""".format(pathToDelta))

Write Delta

>>> someDataFrame.write.format("delta").partitionBy("someColumn").save(path)
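
A Delta directory written this way can also be read back through the DataFrame API instead of Spark SQL. A minimal sketch, reusing path and delta_table_name from above and assuming the delta-spark package is configured on the cluster:

>>> # Read the Delta files directly by path
>>> df = spark.read.format("delta").load(path)
>>> # Or, once the unmanaged table has been registered, read it by name
>>> df = spark.table("delta_table_name")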
