A Deep Dive Into Query Execution Engine of Spark SQL

This deck gives an overview of query execution in Spark SQL, covering:
• Spark SQL engine components such as Catalyst optimization and Tungsten execution
• Physical planning, which transforms logical operators into physical operators and chooses between execution alternatives
• Whole-stage code generation, which fuses operators into single nodes that run generated code
• Implementation details such as single-dependency pipelines inside whole-stage code generation nodes and the operator code-generation interfaces


Deep Dive:

Query Execution of Spark SQL

Maryann Xue, Xingbo Jiang, Kris Mok


Apr. 2019
1
About Us
Software Engineers

• Maryann Xue
PMC of Apache Calcite & Apache Phoenix @maryannxue
• Xingbo Jiang
Apache Spark Committer @jiangxb1987
• Kris Mok
OpenJDK Committer @rednaxelafx

2
Databricks Unified Analytics Platform
DATABRICKS WORKSPACE
Notebooks, Jobs, Models, APIs, Dashboards
End-to-end ML lifecycle

DATABRICKS RUNTIME
Databricks Delta (Reliable & Scalable), ML Frameworks (Simple & Integrated)

DATABRICKS CLOUD SERVICE


Databricks Customers Across Industries
Financial Services Healthcare & Pharma Media & Entertainment Data & Analytics Services Technology

Public Sector Retail & CPG Consumer Services Marketing & AdTech Energy & Industrial IoT
Apache Spark 3.x
Spark SQL | Spark Streaming | Spark ML | Graph | 3rd-party Libraries

SparkSession / DataFrame / Dataset APIs

Catalyst Optimization & Tungsten Execution

Data Source Connectors | Spark Core

5
Apache Spark 3.x
Spark SQL | Spark Streaming | Spark ML | Graph | 3rd-party Libraries

SparkSession / DataFrame / Dataset APIs

Catalyst Optimization & Tungsten Execution

Data Source Connectors | Spark Core

6
Spark SQL Engine
Analysis -> Logical Optimization -> Physical Planning -> Code Generation -> Execution

Runtime

7
Spark SQL Engine - Front End
Analysis -> Logical Optimization -> Physical Planning -> Code Generation -> Execution

Runtime
Reference: A Deep Dive into Spark SQL’s Catalyst Optimizer,
Yin Huai, Spark Summit 2017
8
Spark SQL Engine - Back End
Analysis -> Logical Optimization -> Physical Planning -> Code Generation -> Execution

Runtime

9
Agenda
• Physical Planning
• Code Generation
• RDDs (DAGs)
• Memory Management
• Vectorized Reader
• UDF
• PySpark
• Python/Pandas UDF

10
Agenda

Physical
Planning

11
Physical Planning
• Transform logical operators into physical operators
• Choose between different physical alternatives
  - e.g., broadcast-hash-join vs. sort-merge-join (see the explain() sketch after this slide)
• Includes physical traits of the execution engine
  - e.g., partitioning & ordering
• Some ops may be mapped into multiple physical nodes
  - e.g., partial agg -> shuffle -> final agg
1
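As a rough illustration of the planner's choice, here is a minimal Scala sketch that runs the query from the next slide and prints the selected physical plan (it assumes A and B are registered as tables or views):

  val df = spark.sql("""
    SELECT a1, sum(b1)
    FROM A JOIN B ON A.key = B.key
    WHERE b1 < 1000
    GROUP BY a1
  """)
  df.explain()        // physical plan only, e.g. BroadcastHashJoin plus partial/final HashAggregate
  // df.explain(true) // also shows the parsed, analyzed and optimized logical plans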
A Physical Plan Example

SELECT a1, sum(b1)
FROM A JOIN B ON A.key = B.key
WHERE b1 < 1000
GROUP BY a1

Logical plan:  Scan A, Scan B, Filter, Join, Aggregate
Physical plan: Scan B -> Filter -> BroadcastExchange
               Scan A -> BroadcastHashJoin -> HashAggregate (partial) -> ShuffleExchange -> HashAggregate (final)

1
Scheduling a Physical Plan
• Scalar subquery / Broadcast exchange:
  - Executed as separate jobs
• Partition-local ops:
  - Executed in the same stage
• Shuffle:
  - The stage boundary
  - A sync barrier across all nodes

Job 1, Stage 1: Scan B -> Filter -> BroadcastExchange
Job 2, Stage 1: Scan A -> BroadcastHashJoin -> HashAggregate -> ShuffleExchange
Job 2, Stage 2: HashAggregate

1
Agenda

Code
Generation

15
Execution, Old: Volcano Iterator Model
• Volcano iterator model
- All ops implement the same interface, e.g., next()
- next() on final op -> pull input from child op by calling child.next() -> goes on
and on, ending up with a propagation of next() calls
• Pros: Good abstraction; Easy to implement
• Cons: Virtual function calls —> less efficient
Pull flow: Result Iterator (iterate) -> next() on Project -> next() on Filter -> next() on Scan

1
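A minimal Scala sketch of the idea (not Spark's actual classes): every operator implements the same pull-based interface, and each row costs a virtual next() call.

  object VolcanoSketch {
    type Row = Array[Any]

    // The single shared interface every operator implements
    trait Operator { def next(): Option[Row] }

    // Filter pulls rows from its child until the predicate passes: one virtual call
    // per row, which is exactly the overhead whole-stage code generation removes
    class FilterOp(child: Operator, predicate: Row => Boolean) extends Operator {
      def next(): Option[Row] = {
        var row = child.next()
        while (row.isDefined && !predicate(row.get)) row = child.next()
        row
      }
    }
  }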
Execution, New: Whole-Stage Code Generation
• Inspired by Thomas Neumann’s paper
• Fuse a string of operators (oftentimes the entire stage) into one WSCG op that runs the generated code.
• A general-purpose execution engine just like the Volcano model but without Volcano’s performance downsides:
  - No virtual function calls
  - Data in CPU registers
  - Loop unrolling & SIMD

Fused operators: Scan -> Filter -> Project -> Aggregate
Generated code:
  long count = 0;
  for (item in sales) {
    if (price < 100) {
      count += 1;
    }
  }
Execution Models: Old vs. New
• Volcano iterator model: Pull model; Driven by the final operator
  Result Iterator (iterate) -> next() on Project -> next() on Filter -> next() on Scan
• WSCG model: Push model; Driven by the head/source operator
  Scan -> Filter -> Project -> Result Iterator (iterate), with a single next() interface at the WSCG boundary

1
A Physical Plan Example - WSCG
Job 1, Stage 1: WSCG [Scan B -> Filter] -> BroadcastExchange
Job 2, Stage 1: WSCG [Scan A -> BroadcastHashJoin -> HashAggregate] -> ShuffleExchange
Job 2, Stage 2: WSCG [HashAggregate]
Implementation
• The top node WholeStageCodegenExec implements the iterator interface to interop with other code-gen or non-code-gen physical ops.
• All underlying operators implement a code-generation interface: doProduce() & doConsume()
• Dump the generated code: df.queryExecution.debug.codegen
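For example, a minimal Scala sketch (the DataFrame here is arbitrary):

  val df = spark.range(1000).selectExpr("id % 10 AS k").groupBy("k").count()
  df.queryExecution.debug.codegen()   // prints each WholeStageCodegen subtree with its generated Java source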


Single dependency
• A WSCG node contains a linear list of physical operators that support code generation.
• No multi dependency between enclosed ops.
• A WSCG node may consist of one or more pipelines.

WSCG node: Op1 -> Op2 -> Op3 -> Op4 -> Op5, grouped into Pipeline 1 and Pipeline 2


A Single Pipeline in WSCG
• A string of non-blocking operators form a pipeline in WSCG
• The head/source:
- Implement doProduce() - the driving loop producing source data.
• The rest:
- doProduce() - fall through to head of the pipeline.
- Implement doConsume() for its own processing logic.
produce: WSCG -> Op3 -> Op2 -> Op1 (falls through to the head of the pipeline)
consume: Op1 -> Op2 -> Op3 -> WSCG (each step contributes to the generated code)
A Single Pipeline Example
SELECT sid FROM emps WHERE age < 36

Pipeline: Scan -> Filter -> Project, fused into one WholeStageCodegen node.
produce falls through from WholeStageCodegen (START) to Scan; each operator's consume adds its logic to the loop.

Generated for RowIterator:
  while (table.hasNext()) {
    InternalRow row = table.next();        // Scan: the driving loop over the source
    if (row.getInt(2) < 36) {              // Filter: consume adds the predicate
      String sid = row.getString(0);       // Project: consume writes the projected column
      rowWriter.write(0, sid);
      ret = rowWriter.getRow();
      if (shouldStop()) return;            // WholeStageCodegen: yield control back to the iterator
    }
  }
Multiple Pipelines in WSCG
• Head (source) operator:
  - The source, w/ or w/o input RDDs
  - e.g., Scan, SortMergeJoin
• Non-blocking operators:
  - In the middle of the pipeline
  - e.g., Filter, Project
• Blocking operators:
  - End of the previous pipeline
  - Start of a new pipeline
  - e.g., HashAggregate, Sort
• End (sink): RowIterator
  - Pulls result from the last pipeline

WSCG node: Op1 (source) -> Op2 (non-blocking) -> Op3 (blocking) -> Op4 (non-blocking) -> Op5 (sink) -> RowIterator
Pipeline 1: Op1 to Op3; Pipeline 2: Op3 to Op5
Blocking Operators in WSCG
• A blocking operator, e.g., HashAggregateExec or SortExec, breaks the
pipeline, so there may be multiple pipelines in one WSCG node.
• A blocking operator's doConsume():
  - Implements the callback that builds the intermediate result.
• A blocking operator's doProduce():
  - Consumes the entire output from upstream to finish building
    the intermediate result.
  - Starts a new loop and produces output for downstream based
    on the intermediate result.
A Blocking Operator Example - HashAgg
SELECT age, count(*) FROM emps GROUP BY age

HashAggregate's doProduce() first drives the child (Scan) via child.produce() to build the hash map,
then starts a new pipeline that produces output from the map:

  // consume the entire child output to build the intermediate result
  while (table.hasNext()) {
    InternalRow row = table.next();
    int age = row.getInt(2);
    hashMap.insertOrIncrement(age);
  }
  // start a new pipeline: produce output from the intermediate result
  while (hashMapIter.hasNext()) {
    Entry e = hashMapIter.next();
    rowWriter.write(0, e.getKey());
    rowWriter.write(1, e.getValue());
    ret = rowWriter.getRow();
    if (shouldStop()) return;
  }
WSCG: BHJ vs. SMJ
• BHJ (broadcast-hash-join) is a pipelined operator.
• BHJ executes the build side job first, the same way as in non-WSCG.
• BHJ is fused together with the probe side plan (i.e., streaming plan) in WSCG.

Job 1: WSCG [Scan B -> Filter] -> BroadcastExchange
Job 2: WSCG [Scan A -> BroadcastHashJoin -> HashAggregate] -> ShuffleExchange -> WSCG [HashAggregate]
WSCG: BHJ vs. SMJ
• SMJ (sort-merge-join) is NOT fused with either child plan for WSCG. Child plans are separate WSCG nodes.
• Thus, SMJ must be the head operator of a WSCG node.

Job 1: WSCG [Scan A] -> ShuffleExchange -> WSCG [Sort]
       WSCG [Scan B -> Filter] -> ShuffleExchange -> WSCG [Sort]
       WSCG [SortMergeJoin -> HashAggregate] -> ShuffleExchange -> WSCG [HashAggregate]
WSCG Limitations
• Problems:
- No JIT compilation for bytecode size over 8000 bytes (*).
- Over 64KB methods NOT allowed by Java Class format.

• Solutions:
- Fallback - spark.sql.codegen.fallback; spark.sql.codegen.hugeMethodLimit
- Move blocking loops into separate methods, e.g. hash-map building in
HashAgg and sort buffer building in Sort.
- Split consume() into individual methods for each operator -
spark.sql.codegen.splitConsumeFuncByOperator
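A sketch of the knobs named above, in Scala (values are illustrative, not recommendations):

  spark.conf.set("spark.sql.codegen.fallback", "true")                    // fall back to non-codegen execution on failure
  spark.conf.set("spark.sql.codegen.hugeMethodLimit", "65535")            // skip WSCG when generated bytecode exceeds this size
  spark.conf.set("spark.sql.codegen.splitConsumeFuncByOperator", "true")  // split consume() into per-operator methods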
About Us
Software Engineers

• Maryann Xue
PMC of Apache Calcite & Apache Phoenix @maryannxue
• Xingbo Jiang
Apache Spark Committer @jiangxb1987
• Kris Mok
OpenJDK Committer @rednaxelafx

34
Agenda

RDDs
(DAGs)

35
A Physical Plan Example

SELECT a1, sum(b1)
FROM A JOIN B ON A.key = B.key
WHERE b1 < 1000
GROUP BY a1

Physical plan: Scan B -> Filter -> BroadcastExchange
               Scan A -> BroadcastHashJoin -> HashAggregate -> ShuffleExchange -> HashAggregate
RDD and Partitions
RDD (Resilient Distributed Dataset) represents an immutable, partitioned collection of elements that can be operated on in parallel. Each partition of the RDD is placed on a node (Node1, Node2, Node3, ...) and processed there.
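A minimal Scala sketch: create an RDD with an explicit number of partitions and check it.

  val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 4)  // 4 partitions spread across the executors
  println(rdd.getNumPartitions)                                      // 4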
Physical Operator
Volcano iterator model - e.g., Filter over one partition of the input RDD:

  while (iter.hasNext()) {
    val tmpVal = iter.next()
    if (condition(tmpVal)) {
      return tmpVal
    }
  }

The operator is applied partition by partition, producing the output RDD's partitions.
A Physical Plan Example - Scheduling
Job 1, Stage 1: Scan B -> Filter -> BroadcastExchange
Job 2, Stage 1: Scan A -> BroadcastHashJoin -> HashAggregate -> ShuffleExchange
Job 2, Stage 2: HashAggregate
Stage Execution
Stage 1 (Scan A -> BroadcastHashJoin -> HashAggregate) is submitted as TaskSet0, with one task per partition (Partition0, Partition1, Partition2, Partition3).


Stage Execution
The stage's partitions (0-3) are mapped one-to-one to tasks (Task0, Task1, ..., Task3), which are grouped into TaskSets (TaskSet1, TaskSet2) and scheduled for execution.
How to run a Task
With spark.task.cpus=1 and spark.executor.cores=5, an executor runs up to 5 tasks concurrently (Task0-Task4); the remaining tasks (Task5-Task7) wait for a free slot.
Fault Tolerance
● MPP-like analytics engines (e.g., Teradata, Presto, Impala):
○ Coarser-grained recovery model
○ Retry an entire query if any machine fails
○ Short/simple queries
● Spark SQL:
○ Mid-query recovery model
○ RDDs track the series of transformations used to build
them (the lineage) to recompute lost partitions.
○ Long/complex queries [e.g., complex UDFs]
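As a small sketch of the lineage idea, an RDD's chain of transformations can be printed with toDebugString:

  val rdd = spark.sparkContext
    .parallelize(1 to 100)
    .map(_ * 2)
    .filter(_ % 3 == 0)
  println(rdd.toDebugString)   // shows the transformation chain Spark would replay to recompute lost partitions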
Handling Task Failures
Task Failure:
● Record the failure count of the task
● Retry the task if failure count < maxTaskFailures
● Abort the stage and corresponding jobs if count >= maxTaskFailures

Fetch Failure:
● Don't count the failure into task failure count
● Retry the stage if stage failure < maxStageFailures
● Abort the stage and corresponding jobs if stage failure >= maxStageFailures
● Mark executor/host as lost (optional)
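A sketch of the corresponding settings in Scala (spark.task.maxFailures is the task limit; spark.stage.maxConsecutiveAttempts is assumed here to be the stage-level counterpart; values shown are the usual defaults):

  val spark = org.apache.spark.sql.SparkSession.builder()
    .appName("retry-config-sketch")
    .config("spark.task.maxFailures", "4")               // maxTaskFailures
    .config("spark.stage.maxConsecutiveAttempts", "4")   // maxStageFailures (assumption: stage retry limit)
    .getOrCreate()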
Agenda

Memory
Management

46
Memory Consumption in Executor JVM
Spark uses memory for:
• RDD Storage [e.g., call cache()].
• Execution memory [e.g., Shuffle
and aggregation buffers]
• User code [e.g., allocate large
arrays]
Challenges:
• Tasks run in a shared-memory environment.
• Memory resource is not enough!
Execution Memory
• Buffer intermediate results
• Normally short lived

(Executor memory layout: Execution Memory / Storage Memory / User Memory / Reserved Memory)
Storage Memory
• Reuse data for future computation
• Cached data can be long-lived
• LRU eviction for spill data

(Executor memory layout: Execution Memory / Storage Memory / User Memory / Reserved Memory)
Unified Memory Manager
• Express execution and storage memory as one single unified region
• Keep acquiring execution memory and evict storage as you need more execution memory

Region sizes:
  Execution Memory: (1.0 - spark.memory.storageFraction) * USABLE_MEMORY
  Storage Memory:   spark.memory.storageFraction * USABLE_MEMORY
  User Memory:      (1.0 - spark.memory.fraction) * (SYSTEM_MEMORY - RESERVED_MEMORY)
  Reserved Memory:  RESERVED_SYSTEM_MEMORY_BYTES (300MB)
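A worked example in Scala, assuming the default fractions (spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and a 4 GiB executor heap:

  val systemMemory = 4L * 1024 * 1024 * 1024                   // spark.executor.memory
  val reserved     = 300L * 1024 * 1024                        // RESERVED_SYSTEM_MEMORY_BYTES
  val usable       = ((systemMemory - reserved) * 0.6).toLong  // spark.memory.fraction * (SYSTEM - RESERVED)
  val storage      = (usable * 0.5).toLong                     // spark.memory.storageFraction * USABLE_MEMORY
  val execution    = usable - storage                          // the rest of the unified region
  val user         = systemMemory - reserved - usable          // (1.0 - spark.memory.fraction) * (SYSTEM - RESERVED)
  // ~2278 MB unified region (~1139 MB storage + ~1139 MB execution), ~1518 MB user, 300 MB reserved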
Dynamic occupancy mechanism
• If one side's space is insufficient and the other side has free space, it borrows from the other side.
• If both sides are short of space, evict storage memory using the LRU mechanism.
• spark.memory.storageFraction sets the boundary: storage memory below this fraction is protected from eviction.
One problem remains...
• The memory resource is not enough!

Executor process memory:
  On-Heap Memory: inside the JVM, managed by GC
  Off-Heap Memory: outside the JVM, not managed by GC
Off-Heap Memory
• Enabled by spark.memory.offHeap.enabled
• Memory size controlled by spark.memory.offHeap.size

(Off-heap space is again split into Execution Memory and Storage Memory.)
Off-Heap Memory
• Pros
• Speed: Off-Heap Memory > Disk
• Not bound by GC
• Cons
• Manually manage memory allocation/release
Tuning Data Structures
In Spark applications:
• Prefer arrays of objects instead of collection classes
(e.g., HashMap)
• Avoid nested structures with a lot of small objects and
pointers when possible
• Use numeric IDs or enumeration objects instead of strings
for keys
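A tiny Scala sketch of the advice above (the names are made up for illustration): a flat case class with numeric IDs instead of nested collections keyed by strings.

  // Instead of e.g. Map[String, Map[String, AnyRef]] with many small nested objects:
  case class Event(userId: Int, kind: Int, value: Double)              // numeric IDs, flat structure
  val events: Array[Event] = Array(Event(1, 0, 3.5), Event(2, 1, 7.0)) // array of objects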
Tuning Memory Config

spark.memory.fraction
• More execution and storage memory
• Higher risk of OOM
spark.memory.storageFraction
• Increase storage memory to cache more data
• Less execution memory may lead to tasks spilling more often
Tuning Memory Config
spark.memory.offHeap.enabled
spark.memory.offHeap.size
• Off-Heap memory not bound by GC
• On-Heap + Off-Heap memory must fit in total executor
memory (spark.executor.memory)
spark.shuffle.file.buffer
spark.unsafe.sorter.spill.reader.buffer.size
• Buffer shuffle file to amortize disk I/O
• More execution memory consumption
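A sketch of how these settings might be supplied when building a session (illustrative values only, not recommendations):

  val spark = org.apache.spark.sql.SparkSession.builder()
    .appName("memory-config-sketch")
    .config("spark.memory.fraction", "0.6")
    .config("spark.memory.storageFraction", "0.5")
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "2g")
    .config("spark.shuffle.file.buffer", "64k")
    .getOrCreate()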
About Us
Software Engineers

• Maryann Xue
PMC of Apache Calcite & Apache Phoenix @maryannxue
• Xingbo Jiang
Apache Spark Committer @jiangxb1987
• Kris Mok
OpenJDK Committer @rednaxelafx

58
Agenda

Vectorized
Reader

59
Vectorized Readers
Read columnar format data as-is without converting to row
format.
• Apache Parquet
• Apache ORC
• Apache Arrow
• ...

60
Vectorized Readers
Parquet vectorized reader is 9 times faster than the non-
vectorized one.

See blog post

61
Vectorized Readers
Supported built-in data sources:
• Parquet
• ORC

Arrow is used for intermediate data in PySpark.

62
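A sketch of the switches behind the built-in vectorized readers (these flags default to on in recent releases; the Arrow flag name shown is the Spark 3.x one and is an assumption for older versions):

  spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
  spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")
  spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")   // Arrow-based transfer for PySpark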
Implement DataSource
DataSource v2 API provides the way to implement your own
vectorized reader.
• PartitionReaderFactory
• supportColumnarReads(...) to return true
• createColumnarReader(...) to return
PartitionReader[ColumnarBatch]

• [SPARK-25186] Stabilize Data Source V2 API

63
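A minimal Scala sketch against the Spark 3.x DataSource V2 interfaces (the actual batch source is left out; Iterator.empty stands in for real ColumnarBatch data):

  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
  import org.apache.spark.sql.vectorized.ColumnarBatch

  class MyColumnarReaderFactory extends PartitionReaderFactory {
    override def supportColumnarReads(partition: InputPartition): Boolean = true

    // Row-based path is unused when supportColumnarReads returns true for the partition
    override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
      throw new UnsupportedOperationException("row-based reads not supported in this sketch")

    override def createColumnarReader(partition: InputPartition): PartitionReader[ColumnarBatch] =
      new PartitionReader[ColumnarBatch] {
        private val batches: Iterator[ColumnarBatch] = Iterator.empty  // placeholder for a real batch reader
        override def next(): Boolean = batches.hasNext
        override def get(): ColumnarBatch = batches.next()
        override def close(): Unit = ()
      }
  }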
Delta Lake
• Full ACID transactions
• Schema management
• Scalable metadata handling
• Data versioning and time travel
• Unified batch/streaming support
• Record update and deletion
• Data expectation

Delta Lake: https://delta.io/
Documentation: https://docs.delta.io
For details, refer to the blog: https://tinyurl.com/yxhbe2lg
Agenda

UDF

65
What’s behind foo(x) in Spark SQL?
What looks like a function call can be a lot of things:
• upper(str): Built-in function
• max(val): Aggregate function
• max(val) over …: Window function
• explode(arr): Generator
• myudf(x): User-defined function
• myudaf(x): User-defined aggregate function
• transform(arr, x -> x + 1): Higher-order function
• range(10): Table-value function
Functions in Spark SQL
• Builtin Scalar Function: scope 1 row; fed scalar expressions; runs in the same JVM; implemented at the Expression level; internal data types
• Java / Scala UDF: scope 1 row; fed scalar expressions; same JVM; Expression level; external data types
• Python UDF (*): scope 1 row; fed batches of data; runs in a Python worker process; Physical Operator level; external data types
• Aggregate / Window Function: scope whole table; fed scalar expressions + aggregate buffer; same JVM; Physical Operator level; internal data types
• Higher-order Function: scope 1 row; fed an expression of complex type; same JVM; Expression level; internal data types

(*): and all other non-Java user-defined functions


UDF execution
User Defined Functions:
• Java/Scala UDFs
• Hive UDFs
• when Hive support enabled
Also we have:
• Python/Pandas UDFs
• covered later in the PySpark execution section

68
Java/Scala UDFs
• UDF: User Defined Function
• Java/Scala lambdas or method references can be used.

• UDAF: User Defined Aggregate Function


• Need to implement UserDefinedAggregateFunction.

69
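A small Scala sketch of both styles (the column and function names are arbitrary):

  import org.apache.spark.sql.functions.{col, udf}

  val plusOne = udf((x: Long) => x + 1)                          // Scala lambda as a UDF
  spark.range(5).select(plusOne(col("id")).as("id_plus_one")).show()

  spark.udf.register("plus_one", (x: Long) => x + 1)             // usable from SQL as plus_one(...)
  spark.sql("SELECT plus_one(id) FROM range(5)").show()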
UDAF
Implement UserDefinedAggregateFunction
• def initialize(...)
• def update(...)
• def merge(...)
• def evaluate(...)
• ...

70
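A minimal sketch of the API named above (a simple long sum; in Spark 3.x the newer Aggregator-based API is the recommended replacement):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
  import org.apache.spark.sql.types._

  class LongSum extends UserDefinedAggregateFunction {
    override def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
    override def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
    override def dataType: DataType = LongType
    override def deterministic: Boolean = true

    override def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
    override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
      if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
    override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
      buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    override def evaluate(buffer: Row): Any = buffer.getLong(0)   // final result from the buffer
  }

  // Usage sketch: spark.udf.register("long_sum", new LongSum)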
Hive UDFs
Available when Hive support enabled.
• Register using create function command
• Use in HiveQL

71
Hive UDFs
Provides wrapper expressions for each UDF type:
• HiveSimpleUDF: UDF
• HiveGenericUDF: GenericUDF
• HiveUDAFFunction: UDAF
• HiveGenericUDTF: GenericUDTF

72
UDF execution
1. Before invoking UDFs, convert arguments from the internal data
format to objects suitable for each UDF type.
• Java/Scala UDF: Java/Scala objects
• Hive UDF: ObjectInspector
2. Invoke the UDF.
3. After invocation, convert the returned values back to internal
data format.

73
Agenda

PySpark

74
PySpark
PySpark is a set of Python bindings for Spark APIs.
• RDD
• DataFrame
• other libraries based on RDDs, DataFrames.
• MLlib, Structured Streaming, ...

Also, SparkR: R bindings for Spark APIs

75
PySpark
RDD vs. DataFrame:
• RDD invokes Python functions on Python worker
• DataFrame just constructs queries, and executes it on the
JVM.
• except for Python/Pandas UDFs

76
PySpark execution
Python script drives Spark on the JVM via Py4J.
Executors run Python workers.

Python driver script -> (Py4J) -> JVM Driver -> Executors, each paired with a Python Worker

77
PySpark and Pandas
Ease of interop: PySpark can convert data between PySpark
DataFrame and Pandas DataFrame.
• pdf = df.toPandas()
• df = spark.createDataFrame(pdf)

Note: df.toPandas() triggers the execution of the PySpark


DataFrame, similar to df.collect()

78
PySpark and Pandas (cont’d)
New way of interop: Koalas brings the Pandas API to Apache Spark
(https://github.com/databricks/koalas)

import databricks.koalas as ks
import pandas as pd

pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})

# Create a Koalas DataFrame from pandas DataFrame
df = ks.from_pandas(pdf)

# Rename the columns
df.columns = ['x', 'y', 'z1']

# Do some operations in place
df['x2'] = df.x * df.x
79
Agenda

Python/Pandas
UDF

80
Python UDF and Pandas UDF
from pyspark.sql.functions import pandas_udf, udf, PandasUDFType

@udf('double')
def plus_one(v):
    return v + 1

@pandas_udf('double', PandasUDFType.SCALAR)
def pandas_plus_one(vs):
    return vs + 1

81
Python/Pandas UDF execution
Physical Operator -> batch of data -> Serializer -> PythonRunner (invoke UDF in the Python worker) -> Deserializer -> batch of data -> Physical Operator

82
Python UDF execution
Physical Operator -> batch of rows -> Serializer -> PythonUDFRunner (invoke UDF) -> Deserializer -> batch of rows -> Physical Operator

83
Pandas UDF execution
Physical Operator -> batch of columns -> Serializer -> ArrowPythonRunner (invoke UDF) -> Deserializer -> batch of columns -> Physical Operator

84
Python/Pandas UDFs
Python UDF
• Serialize/Deserialize data with Pickle
• Fetch data in blocks, but invoke UDF row by row
Pandas UDF
• Serialize/Deserialize data with Arrow
• Fetch data in blocks, and invoke UDF block by block

85
Python/Pandas UDFs
Pandas UDF perform much better than row-at-a-time Python
UDFs.
• 3x to over 100x

See blog post

86
Further Reading
This Spark+AI Summit:
• Understanding Query Plans and Spark
Previous Spark Summits:
• A Deep Dive into Spark SQL’s Catalyst Optimizer
• Deep Dive into Project Tungsten: Bringing Spark Closer to
Bare Metal
• Improving Python and Spark Performance and
Interoperability with Apache Arrow
Thank you
Maryann Xue ([email protected])
Xingbo Jiang ([email protected])
Kris Mok ([email protected])

88
