De Mod 0 Get Started With Pyspark Programming

The document discusses Spark SQL, which is a module for structured data processing in Spark. It provides an overview of Spark SQL and shows how SQL queries can be expressed using either SQL syntax or the DataFrame API in Python. Spark SQL optimizes queries before executing them on Resilient Distributed Datasets (RDDs).

Uploaded by

Jaya Bharathi

Get Started with PySpark Programming

©2022 Databricks Inc. — All rights reserved 1


Module Agenda
Get Started with PySpark Programming

Spark SQL Overview


DE 0.1 - Spark SQL
DE 0.2L - Spark SQL Lab
DE 0.3 - DataFrame & Column
DE 0.4L - Purchase Revenues Lab
DE 0.5 - Aggregation
DE 0.6L - Revenue by Traffic Lab



Spark SQL Overview



Spark SQL is a module for structured data processing with multiple interfaces

SQL
DataFrame API (Python, Scala, Java, R)



The same Spark SQL query can be expressed with SQL and the DataFrame API

SQL:

    SELECT id, result
    FROM exams
    WHERE result > 70
    ORDER BY result

DataFrame API (Python; the parentheses let the method chain span several lines):

    (spark.table("exams")
        .select("id", "result")
        .where("result > 70")
        .orderBy("result"))
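To make the comparison concrete, here is a minimal sketch that wraps both forms of the query in functions. The slide only names the table `exams` and the columns `id` and `result`; the sample rows and the helper names `register_exams`, `query_with_sql`, and `query_with_dataframe_api` are hypothetical, and each function assumes it receives an active `SparkSession`.

```python
# Hypothetical sample data: the slide names only the table "exams"
# and its columns "id" and "result".
SAMPLE_ROWS = [(1, 65), (2, 88), (3, 71), (4, 93)]

SQL_QUERY = """
SELECT id, result
FROM exams
WHERE result > 70
ORDER BY result
"""


def register_exams(spark):
    """Create a temporary view named "exams" from the sample rows."""
    df = spark.createDataFrame(SAMPLE_ROWS, schema=["id", "result"])
    df.createOrReplaceTempView("exams")


def query_with_sql(spark):
    """Run the query through the SQL interface; spark.sql returns a DataFrame."""
    return spark.sql(SQL_QUERY)


def query_with_dataframe_api(spark):
    """Express the same query with DataFrame methods."""
    return (
        spark.table("exams")
        .select("id", "result")
        .where("result > 70")
        .orderBy("result")
    )
```

In a notebook where `spark` is already defined, calling `register_exams(spark)` and then `.collect()` on the result of each query function should return the same rows in the same order.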



Spark SQL executes all queries on the same engine

[Diagram: SQL Queries, Python DataFrame API, Scala DataFrame API -> Query Plans -> RDDs -> Execution]



Spark SQL optimizes queries before execution

[Diagram: Query Plan -> Optimized Query Plan -> RDDs -> Execution]
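One way to look at the plans Spark builds for a query is `DataFrame.explain`. The slides do not mention it, so treat the following as an illustrative sketch: the helper names `build_query` and `show_plans` are hypothetical, and the code assumes an active `SparkSession` with a table or view named `exams` already registered.

```python
def build_query(spark):
    """Build the exams query lazily; nothing is executed at this point."""
    return (
        spark.table("exams")
        .select("id", "result")
        .where("result > 70")
        .orderBy("result")
    )


def show_plans(df):
    """Print the logical plans (including the optimized one) and the physical plan."""
    # Passing True requests the extended output rather than only the physical plan.
    df.explain(True)
```

Calling `show_plans(build_query(spark))` prints the plans without running the query, which makes it possible to compare the plan as written with the optimized plan that Spark would execute.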
