Duckdb Parallelism

Mark Raasveldt

Parallel Quacking

▸ When building DuckDB we have mostly focused on building a functional system
▸ Avoid premature optimization
▸ Avoid adding optimizations that prevent adding features
Parallel Quacking

▸ Suddenly people are benchmarking our system

▸ Including benchmarks in research papers
▸ Yikes!

▸ We haven’t exactly spent a lot of time optimizing…
Parallel Quacking

▸ We are now pretty happy with functionality

▸ Window functions, subqueries, collations,
(recursive) CTEs, Parquet/Pandas/CSV
readers, …
▸ Maybe we should start optimizing!
Parallel Quacking

▸ DuckDB is currently single-threaded

▸ Parallelism is an obvious performance boost

▸ More importantly: parallelism requires a structural change to the code
▸ Optimizations need to account for parallelism
▸ Optimizing a single-threaded HT is pointless if we have to throw it away once we add parallelism!
Parallel Quacking

▸ Parallelism is actually our oldest open issue!

▸ Created one month after the initial commit

▸ So it’s about time :)

DBMS Parallelism

▸ Short intro to DBMS parallelism

▸ DBMSs have two types of parallelism
▸ Inter-query and intra-query parallelism

▸ Inter-query: multiple different queries can be executed in parallel
▸ Intra-query: a single query can be parallelized
DBMS Parallelism

▸ Most systems have inter-query

▸ We already had this

▸ Most useful for OLTP systems

▸ Many concurrent client requests, etc.
DBMS Parallelism

▸ Intra-query is not part of most OLTP systems
▸ e.g. MySQL/PostgreSQL/SQLite
▸ Not useful for small queries

▸ Only useful for complex queries

▸ Aka OLAP systems
DBMS Parallelism

▸ Exchange operator: original way of doing parallelism
▸ Parallelism is encapsulated in the exchange operator
▸ All other ops are unaware of parallelism
▸ Easy to bolt onto existing systems

[1993] Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution

Goetz Graefe et al.
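
The exchange idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not Graefe's design or any system's code: the names `exchange` and `partial_sum` are hypothetical, and a thread pool stands in for the worker processes. The point is that the operator itself never learns that it runs on a partition.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    # A parallelism-unaware operator: it only ever sees its own partition.
    return sum(partition)

def exchange(rows, op, merge, n_partitions=4):
    # Hypothetical exchange step: partition the input, run the unaware
    # operator on every partition in parallel, then merge the partial results.
    parts = [rows[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(op, parts))
    return merge(partials)

total = exchange(list(range(1000)), partial_sum, sum)
```

The partitioning and the final merge are the extra work that the operator-level code does not see, which is where the overhead of this approach comes from.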

DBMS Parallelism

▸ MonetDB uses a system similar to the exchange operator
▸ Individual ops are parallelism-unaware

▸ Data is partitioned by mitosis (mergetable?)

▸ Ops execute sequentially on partitions
▸ Result is combined by mat.pack
DBMS Parallelism

▸ Exchange operator works to parallelize queries

▸ It is nice to bolt on to an existing system
▸ Don’t need to change any operators!

▸ But has partitioning/merging overhead…

▸ Works well for certain queries*, not for many others
* Ungrouped aggregates or aggregates with a small number of groups
Morsel-Driven Parallelism

▸ Alternative: Morsel-driven parallelism

▸ Parallelism-aware operators
▸ Query is divided into pipelines
▸ Those pipelines are executed in parallel

[2014] Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age

Viktor Leis et al.

Morsel-Driven Parallelism

SELECT …
FROM S
JOIN R USING (A)
JOIN T USING (B);

1: HT Build “T”

2: HT Build “S”

3: Probe HTs and output result (depends on 1 and 2)
Morsel-Driven Parallelism

SELECT …
FROM S
JOIN R USING (A)
JOIN T USING (B);

HT Build “T”

HT Build “S”

▸ HT builds of S and T can be trivially parallelized

▸ No shared data
▸ Limited parallelizability: depends on Q complexity…
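
A minimal sketch of this inter-pipeline parallelism, assuming two small in-memory tables and a hypothetical `build_hash_table` helper: each build runs on its own thread and touches only its own dictionary, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor

def build_hash_table(rows, key):
    # Sequential build: one thread owns this table, so nothing is shared.
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table

# Made-up sample data for the two build sides.
S = [{"A": 1, "s": "x"}, {"A": 2, "s": "y"}]
T = [{"B": 7, "t": "p"}, {"B": 7, "t": "q"}]

with ThreadPoolExecutor(max_workers=2) as pool:
    future_s = pool.submit(build_hash_table, S, "A")
    future_t = pool.submit(build_hash_table, T, "B")
    ht_s, ht_t = future_s.result(), future_t.result()
```

With only two builds there are only two units of work, which is the limit the last bullet refers to: the available parallelism is bounded by the number of independent pipelines in the query.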
Morsel-Driven Parallelism

▸ Need to parallelize inside a pipeline

▸ How to do that?
▸ Contention happens at endpoints
▸ Scan of T
▸ HT build at join (HT Build “T”)

▸ Use parallelism-aware operators at endpoints

▸ The rest of the operators (HT probe, projection,
filter, etc…) don't need to be aware
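
The split between aware endpoints and unaware middle operators can be sketched as follows. This is an illustration under assumptions, not the HyPer or DuckDB code: `MorselSource`, `MORSEL_SIZE` and the shared `results` list are hypothetical. Only the source (handing out morsels) and the sink (merging results) take a lock; the filter in between is plain sequential code.

```python
import threading

MORSEL_SIZE = 1000  # illustrative value, not taken from the slides

class MorselSource:
    """Parallelism-aware scan: hands out disjoint row ranges to threads."""
    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.next_row = 0
        self.lock = threading.Lock()

    def next_morsel(self):
        with self.lock:  # contention point 1: the source
            if self.next_row >= self.n_rows:
                return None
            start = self.next_row
            self.next_row = min(start + MORSEL_SIZE, self.n_rows)
            return range(start, self.next_row)

def is_even(x):
    # Middle-of-pipeline operator: unaware of any parallelism.
    return x % 2 == 0

def worker(source, results, results_lock):
    local = []
    while (morsel := source.next_morsel()) is not None:
        local.extend(x for x in morsel if is_even(x))
    with results_lock:  # contention point 2: the sink
        results.extend(local)

source = MorselSource(10_000)
results, results_lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(source, results, results_lock))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```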
Morsel-Driven Parallelism

[Figure: TPC-H SF100, 32 cores (64 hardware threads)]

[2014] Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age

Viktor Leis et al.

Morsel-Driven Vegetable Soup

▸ Morsel-driven parallelism seems like the way to go

▸ How can we add it to our vegetable soup?

Parallelism in DuckDB

▸ DuckDB uses a pull-based volcano execution model

▸ “Vector Volcano”

▸ Every operator implements a GetChunk method

▸ Recursively calls GetChunk on children
▸ Until we reach a data source (e.g. table scan)
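
A rough Python sketch of this pull-based, chunk-at-a-time model. The class names, the `get_chunk` spelling and the chunk size of 1024 are assumptions for illustration; DuckDB's actual operators are C++ and differ in detail.

```python
VECTOR_SIZE = 1024  # chunk size assumed for illustration

class TableScan:
    """Data source: returns the next chunk of rows, or [] when exhausted."""
    def __init__(self, rows):
        self.rows, self.pos = rows, 0

    def get_chunk(self):
        chunk = self.rows[self.pos:self.pos + VECTOR_SIZE]
        self.pos += len(chunk)
        return chunk

class Filter:
    """Pulls chunks from its child and keeps the rows that match."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def get_chunk(self):
        # Keep pulling until something survives or the child is exhausted.
        while True:
            chunk = self.child.get_chunk()
            if not chunk:
                return []
            out = [x for x in chunk if self.predicate(x)]
            if out:
                return out

plan = Filter(TableScan(list(range(5000))), lambda x: x % 1000 == 0)
result = []
while (chunk := plan.get_chunk()):
    result.extend(chunk)
```

The root of the plan drives everything: each call to `get_chunk` on the filter triggers one or more calls on the scan below it.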
Parallelism in DuckDB

▸ BuildHashTable: pull everything from RHS (build-side)

▸ ProbeHashTable: pull single chunk from LHS (probe side)
Parallelism in DuckDB

▸ Have to split up building from probing

▸ Create individual pipelines
▸ Design interface that allows for parallel-aware execution
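
One way to picture the split, as a sketch with a hypothetical `HashJoin` class: the build becomes a sink that is fed chunk by chunk by its own pipeline, and the probe becomes an ordinary operator that may only run once the build is complete.

```python
class HashJoin:
    """Hypothetical join with separate build (sink) and probe entry points."""
    def __init__(self):
        self.table = {}

    def sink(self, chunk):
        # Build pipeline: called for every chunk of the build side (RHS).
        for key, payload in chunk:
            self.table.setdefault(key, []).append(payload)

    def probe(self, chunk):
        # Probe pipeline: runs only after the build pipeline has finished.
        return [(key, left, right)
                for key, left in chunk
                for right in self.table.get(key, [])]

join = HashJoin()
for build_chunk in [[(1, "r1"), (2, "r2")], [(1, "r3")]]:
    join.sink(build_chunk)
joined = join.probe([(1, "l1"), (3, "l2")])
```

In the pull model both steps hide behind a single call on the join operator; making them separate entry points is what lets a scheduler run the build as its own pipeline.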
Parallelism in DuckDB

▸ Contention is in the source and sink of a pipeline

▸ Most difficult contention is in the sink
▸ Splitting up a scan is relatively simple
Parallelism in DuckDB

▸ Sink Interface
▸ Sink has two states
▸ Global state: single state per sink
▸ Local state: single state per thread
▸ Actual content depends on the operator
Parallelism in DuckDB

▸ Sink Interface
▸ Sink takes as input the two states + a DataChunk
▸ Called repeatedly until the source data is exhausted
Parallelism in DuckDB

▸ Sink Interface
▸ Combine is called after source of a single thread is exhausted

▸ Combine is the final chance to merge any changes in the local sink state to the global state
Parallelism in DuckDB

▸ Sink Interface
▸ Finalize is called after all tasks related to the sink
are completed
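
The call order of the three methods can be sketched with a small driver. Everything here is hypothetical (the `CountingSink` operator, the dictionary-based states, `run_pipeline`); it only mirrors the contract described on the slides: Sink once per chunk, Combine once per thread, Finalize once after all tasks are done.

```python
import threading

class CountingSink:
    """Hypothetical sink that counts rows; used only to show the call order."""
    def make_global_state(self):
        return {"count": 0, "lock": threading.Lock(), "finalized": False}

    def make_local_state(self):
        return {"count": 0}

    def sink(self, gstate, lstate, chunk):
        lstate["count"] += len(chunk)  # thread-local, so no locking

    def combine(self, gstate, lstate):
        with gstate["lock"]:
            gstate["count"] += lstate["count"]

    def finalize(self, gstate):
        gstate["finalized"] = True

def run_task(op, gstate, chunks):
    lstate = op.make_local_state()
    for chunk in chunks:           # Sink: repeated until the source is exhausted
        op.sink(gstate, lstate, chunk)
    op.combine(gstate, lstate)     # Combine: once per thread

def run_pipeline(op, per_thread_chunks):
    gstate = op.make_global_state()
    threads = [threading.Thread(target=run_task, args=(op, gstate, chunks))
               for chunks in per_thread_chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    op.finalize(gstate)            # Finalize: once, after all tasks are done
    return gstate

state = run_pipeline(CountingSink(), [[[1, 2], [3]], [[4, 5, 6]]])
```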
Parallelism in DuckDB

▸ Example: Ungrouped Aggregate

▸ Global state holds the aggregate result, and a lock
Parallelism in DuckDB

▸ Example: Ungrouped Aggregate

▸ Local state holds a thread-local aggregate, and
some intermediates
Parallelism in DuckDB

▸ Example: Ungrouped Aggregate

▸ Sink: Aggregate into thread-local aggregation
Parallelism in DuckDB

▸ Example: Ungrouped Aggregate

▸ Combine: Merge local state into global state
Parallelism in DuckDB

▸ Example: Ungrouped Aggregate

▸ Finalize: Nothing, we are done
▸ (both Combine and Finalize are optional)
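
The ungrouped aggregate example, written out as a sketch for a SUM. The class and function names are assumptions, not DuckDB's; the structure follows the slides: a global state with the result and a lock, a thread-local running sum, a lock-free Sink, and a Combine that merges under the lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class GlobalState:
    def __init__(self):
        self.total = 0                # the aggregate result
        self.lock = threading.Lock()  # protects total during Combine

class LocalState:
    def __init__(self):
        self.total = 0                # thread-local aggregate

def sink(gstate, lstate, chunk):
    lstate.total += sum(chunk)        # no lock: only this thread writes here

def combine(gstate, lstate):
    with gstate.lock:                 # one lock acquisition per thread
        gstate.total += lstate.total

def run_thread(gstate, chunks):
    lstate = LocalState()
    for chunk in chunks:
        sink(gstate, lstate, chunk)
    combine(gstate, lstate)

gstate = GlobalState()
data = list(range(100_000))
chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]
parts = [chunks[t::4] for t in range(4)]  # each thread gets a share of chunks
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda part: run_thread(gstate, part), parts))
```

Because the lock is taken once per thread rather than once per chunk, contention on the global state stays low no matter how much data flows through Sink.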
Parallelism in DuckDB

▸ Splitting up scans
▸ Splitting up scans is generally not very difficult
▸ But we have multiple types of scans
▸ Base table, parquet, CSV, aggregate HT, etc…
▸ How to split up depends on scan type
Parallelism in DuckDB

▸ Interface for parallel scans:

▸ One task is created for every invoked callback

▸ Implementation is optional
▸ No implementation -> scan will not be parallelized
Parallelism in DuckDB

▸ Currently only implemented for base table

▸ One task for every 100 vectors (102,400 tuples)

▸ Parquet/Pandas is not very complicated

▸ CSV can also benefit…
▸ Future work!
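
The task granularity above works out to 100 × 1,024 = 102,400 tuples, which implies 1,024 tuples per vector. A minimal sketch of splitting a base table scan into such tasks, with a hypothetical `split_scan` helper:

```python
VECTOR_SIZE = 1024        # implied by 102,400 tuples / 100 vectors
VECTORS_PER_TASK = 100
TUPLES_PER_TASK = VECTOR_SIZE * VECTORS_PER_TASK

def split_scan(row_count):
    """Return one (start, end) row range per scan task; end is exclusive."""
    return [(start, min(start + TUPLES_PER_TASK, row_count))
            for start in range(0, row_count, TUPLES_PER_TASK)]

tasks = split_scan(250_000)
```

For a table of 250,000 rows this yields three tasks, the last one covering the shorter remainder.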
Parallelism in DuckDB

▸ Creating the pipelines

▸ Created by a single traversal of the query tree
▸ Encounter a pipeline breaker: create a new
pipeline
Parallelism in DuckDB

SELECT …
FROM S
JOIN R USING (A)
JOIN T USING (B);

Encounter hash join: create build pipeline in RHS* and create a dependency in main pipeline

* This image is taken from HyPer, which builds on the LHS; we build on the RHS. Is there a standard? Should we switch this? Is it even important?
Parallelism in DuckDB

SELECT …
FROM S
JOIN R USING (A)
JOIN T USING (B);

Another hash join: create another build pipeline and dependency
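
The traversal can be sketched as a single recursive walk over the plan. This is an assumption-laden illustration: the `Scan`, `HashJoin` and `Pipeline` classes and the pipeline naming are hypothetical, and the plan shape assumes each join builds on its right-hand side, so for the query above R and T end up in build pipelines.

```python
from dataclasses import dataclass, field

@dataclass
class Scan:
    table: str

@dataclass
class HashJoin:
    probe: object  # LHS
    build: object  # RHS

@dataclass
class Pipeline:
    name: str
    operators: list = field(default_factory=list)  # listed sink first, source last
    depends_on: list = field(default_factory=list)

def build_pipelines(node, current, all_pipelines):
    if isinstance(node, Scan):
        current.operators.append(f"scan {node.table}")
    elif isinstance(node, HashJoin):
        # Pipeline breaker: the build side gets its own pipeline ...
        build = Pipeline(name=f"P{len(all_pipelines) + 1}")
        all_pipelines.append(build)
        build.operators.append("hash build")
        build_pipelines(node.build, build, all_pipelines)
        # ... and the current pipeline depends on it before it can probe.
        current.depends_on.append(build.name)
        current.operators.append("hash probe")
        build_pipelines(node.probe, current, all_pipelines)

# FROM S JOIN R USING (A) JOIN T USING (B), building on the RHS of each join.
plan = HashJoin(probe=HashJoin(probe=Scan("S"), build=Scan("R")), build=Scan("T"))
main = Pipeline(name="P1")
all_pipelines = [main]
build_pipelines(plan, main, all_pipelines)
```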
Parallelism in DuckDB

TPC-H Q1

P1 (depends on P2)
Scans the aggregate HT!

This 0 is a bug in our profiler with parallel execution atm, TODO

Parallelism in DuckDB

▸ Notes on parallelism
▸ The final pipeline (i.e. the one that outputs results) is not parallelized

▸ Doesn’t matter for TPC-H (there is always a Top-N or ORDER BY…)

▸ But can definitely matter for other queries!

▸ We can push a “materialize” operator that materializes in parallel

▸ Future work!
Parallelism in DuckDB

▸ Notes on load balancing

▸ Pipelines are split into tasks
▸ Tasks are scheduled in a concurrent queue
▸ Worker threads work on these tasks in scheduled order

▸ Except the calling thread: this thread works on its own query

▸ Short queries will not have to wait for long queries

▸ Every query has at least one thread working on it
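
A sketch of this scheduling policy, under assumptions: the `Scheduler` and `Query` classes are hypothetical, and Python's `queue.Queue` stands in for the concurrent queue. Workers take tasks from the shared queue in scheduled order, while the thread that submitted a query only ever runs tasks of that query, so the query always makes progress even when the workers are busy elsewhere.

```python
import queue
import threading

class Query:
    def __init__(self, tasks):
        self.tasks = queue.Queue()
        for task in tasks:
            self.tasks.put(task)
        self.results = []
        self.lock = threading.Lock()

    def run_one(self):
        """Run one pending task of this query; return False if none is left."""
        try:
            task = self.tasks.get_nowait()
        except queue.Empty:
            return False
        value = task()
        with self.lock:
            self.results.append(value)
        self.tasks.task_done()
        return True

class Scheduler:
    def __init__(self, n_workers):
        self.scheduled = queue.Queue()  # shared FIFO across all queries
        for _ in range(n_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            query = self.scheduled.get()  # next task in scheduled order
            query.run_one()               # no-op if the caller already ran it

    def execute(self, tasks):
        query = Query(tasks)
        for _ in tasks:
            self.scheduled.put(query)
        while query.run_one():            # calling thread: own query only
            pass
        query.tasks.join()                # wait for tasks taken by workers
        return query.results

scheduler = Scheduler(n_workers=2)
results = scheduler.execute([lambda i=i: i * i for i in range(10)])
```

The sketch does not handle tasks that raise exceptions; a real scheduler would need to propagate errors back to the calling thread.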
Parallelism in DuckDB

▸ NUMA Awareness
▸ TODO :)
Preliminary Results

▸ Results
▸ Before we implemented splitting of scans we were curious

▸ How much does TPC-H benefit from inter-pipeline parallelization?
Preliminary Results

▸ Small speedup in some queries

▸ Most queries are dominated by a single pipeline!

* Actually 3 threads, due to an off-by-one :)

Preliminary Results

▸ Preliminary results (including splitting of pipelines)

▸ Notes
▸ We did not implement a good aggregate HT yet!
▸ Currently a global HT that is locked on every sink

▸ Join HT/scan also have a (low) amount of contention

▸ Did not have much time to look at it yet
▸ This was all finished last Thursday :)
Preliminary Results

▸ Preliminary results
Preliminary Results
▸ Q1 (figure panels: Parallel, Sequential)
Preliminary Results
▸ Q18 Sequential
Preliminary Results
▸ Q18 Parallel
Future Work

▸ Future Work
▸ Rework aggregate hash table
▸ More profiling of contention (specifically in scans)
▸ Parallel window functions, ORDER BY, Top N…
▸ Parallel Parquet/CSV/Pandas scans
▸ Expand profiler to better display parallelism/pipelines
