02 Distributed DBMS: Storage
Systems@TUDa http://tuda.systems/
Announcements (29/10/2024)
Slides on Lab 0 tasks have been updated.
• Clarification on next_free and free_list
READING LIST
Silberschatz et al.: Database System Concepts, 7th Edition (Chapter 20 to Chapter 21)
DISTRIBUTED DBMSs
[Figure: four servers (Server 1 to Server 4) connected by a communication network. Two architectures are sketched: a coordinator/worker setup (e.g., Oracle) and a peer-to-peer setup (e.g., SAP, MSQL).]
FOCUS OF THIS & NEXT LECTURES
Distributed DBMS within a Data Center (aka Parallel DBMSs)
• Shared-nothing: multiple workers connected by a network, each with its own compute (CPU, RAM) and storage; every worker holds a private portion of the data and runs its share of a query on that private data.
• Shared storage: multiple compute servers (CPU, RAM) that access the shared data on separate storage over the network; any compute server can run queries on any part of the data.
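A toy sketch (Python; the partition layout and the scheduler are assumptions, not part of the lecture) contrasting the two designs: with shared-nothing a worker can only scan the partition it owns, while with shared storage any compute server can be assigned any partition.

# Hypothetical partitioned table: partition id -> rows stored in it.
partitions = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}

def shared_nothing_scan(worker_id):
    # Shared-nothing: worker i owns exactly partition i and scans only that.
    return partitions[worker_id]

def shared_storage_scan(assigned_partitions):
    # Shared storage: a compute server reads whichever partitions the
    # scheduler assigns to it over the network.
    return [row for p in assigned_partitions for row in partitions[p]]

print(shared_nothing_scan(2))        # worker 2 -> its private rows [5, 6]
print(shared_storage_scan([0, 3]))   # any server can read partitions 0 and 3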
Inter-query parallelism
▪ Different queries are run in parallel across machines of a cluster
[Figure: the Orders table is split horizontally in the storage layer into Orders0 to Orders3, stored as Partition 0 to Partition 3. Three assignment schemes are shown: a plain four-way split, hash partitioning with h(oid) = oid % 4 (a tuple with h(oid) = i goes to Partition i), and range partitioning on the Time attribute (orders from Q1 to Q4 go to Partition 0 to Partition 3).]
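A minimal sketch (Python, not part of the slides; treating the plain split as round-robin is an assumption) of how a tuple's target partition could be computed under each scheme:

NUM_PARTITIONS = 4

def round_robin_partition(row_position):
    # Plain split: spread tuples evenly, independent of their values (assumed round-robin).
    return row_position % NUM_PARTITIONS

def hash_partition(oid):
    # Hash partitioning with the slide's hash function h(oid) = oid % 4.
    return oid % NUM_PARTITIONS

def range_partition(quarter):
    # Range partitioning on the Time attribute: Q1..Q4 -> Partition 0..3.
    return quarter - 1

print(hash_partition(7))    # h(7) = 3 -> Partition 3
print(range_partition(2))   # an order from Q2 -> Partition 1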
[Figure: motivating join example c ⨝ o = ?. A Customer tuple (cid 1, Smith) and Orders tuples such as (oid 1, total 100, cid 1), (oid 3, 99, 2), and (oid 4, 199, 1) are placed on nodes such as N2; without co-partitioning, matching Customer and Orders tuples are not guaranteed to sit on the same node.]
Range-Partitioning:
• Pro: Pruning for key-lookups (year=2017) and range queries (year>=2010)
• Con: Sensitive to partition skew
Hash-Partitioning:
• Pro: Pruning for key-lookups, avoids partitioning skew (if hash function is good)
• Con: Pruning does not work for range queries
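To make the pruning argument concrete, here is a small sketch (Python; the partition boundaries and the hash function on year are assumptions) of which partitions must be scanned for a key lookup versus a range predicate:

NUM_PARTITIONS = 4

def prune_hash(year_eq=None):
    # Hash partitioning (assumed h(year) = year % 4): only an equality
    # predicate can be mapped to a single partition.
    if year_eq is not None:
        return {year_eq % NUM_PARTITIONS}
    return set(range(NUM_PARTITIONS))          # range predicate: scan all partitions

def prune_range(year_ge, boundaries=(2000, 2010, 2020)):
    # Range partitioning with assumed boundaries: partition 0 holds year < 2000,
    # partition 1 holds 2000 <= year < 2010, and so on.
    first = sum(1 for b in boundaries if year_ge >= b)
    return set(range(first, NUM_PARTITIONS))   # scan only partitions >= first

print(prune_hash(year_eq=2017))   # key lookup -> exactly one partition
print(prune_hash())               # year >= 2010 -> all partitions, no pruning
print(prune_range(2010))          # year >= 2010 -> partitions {2, 3}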
RECAP: CO-PARTITION & DISTRIBUTED JOINS
Co-partitioning on the join key: Customer and Orders are both hash-partitioned on cid with cid % 3, so the join c.cid = o.cid can be computed locally on every node.
[Figure: Customer0 and Orders0 reside on node 0; Customer1, Orders1, and Part1 on node 1; Customer2, Orders2, and Part2 on node 2. Customer(cid, cname, cage) and Orders(oid, total, cid, part) are hash-partitioned on cid (cid % 3); Part(part, name) with rows such as (1, Phone), (2, TV), (3, Comp.) is stored alongside but apparently not co-partitioned with Orders on the part key (marked "?").]
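A minimal sketch (Python; the data values loosely follow the figure, the code itself is not from the lecture) of why co-partitioning both tables on cid with cid % 3 lets every node join purely locally:

NUM_NODES = 3

def node_of(cid):
    # The same partitioning function is used for both tables.
    return cid % NUM_NODES

customer = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F")]   # (cid, cname)
orders   = [(5, 199, 3), (19, 55, 6), (77, 77, 1), (45, 100, 2)]          # (oid, total, cid)

# Both tables are split with the same function on the join key cid ...
cust_parts  = {n: [c for c in customer if node_of(c[0]) == n] for n in range(NUM_NODES)}
order_parts = {n: [o for o in orders   if node_of(o[2]) == n] for n in range(NUM_NODES)}

# ... so each node can join its local partitions without shipping any data.
for n in range(NUM_NODES):
    local_join = [(c, o) for c in cust_parts[n] for o in order_parts[n] if c[0] == o[2]]
    print(f"node {n}: {local_join}")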
PREDICATE-BASED REFERENCE PARTITIONING
Based on paper: https://dl.acm.org/doi/10.1145/2723372.2723718
[Figure: schema graph with foreign-key (fk) edges CUSTOMER C - ORDERS O - LINEITEM L, with NATION and SUPPLIER S attached via further fk edges; the schema graph is shown twice. The partitioning scheme follows such fk join paths.]
Workload-driven (WD): Use workload to derive join paths
SELECT D3.city,
SUM(F.price)
FROM F, D2, D3, D4
WHERE F.d2 = D2.id
AND F.d3 = D3.id
AND F.d4 = D4.id
AND D2.quarter = 4
AND D2.year = 2019
AND D3.country='Germany'
AND D4.product='Lenovo T61'
GROUP BY D3.city
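The workload-driven idea can be illustrated with a rough sketch (Python, purely illustrative; the paper's actual analysis is more involved): collect the equi-join predicates of the workload queries to obtain the join paths along which tables should be co-partitioned.

import re

query = """
SELECT D3.city, SUM(F.price)
FROM F, D2, D3, D4
WHERE F.d2 = D2.id AND F.d3 = D3.id AND F.d4 = D4.id
  AND D2.quarter = 4 AND D2.year = 2019
  AND D3.country = 'Germany' AND D4.product = 'Lenovo T61'
GROUP BY D3.city
"""

# Equi-join predicates of the form <table>.<col> = <table>.<col>.
join_edges = re.findall(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", query)
join_edges = [e for e in join_edges if e[0] != e[2]]   # keep cross-table edges only
for lt, lc, rt, rc in join_edges:
    print(f"join path: {lt}.{lc} -> {rt}.{rc}")
# -> F.d2 -> D2.id, F.d3 -> D3.id, F.d4 -> D4.id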
EXAMPLE: STAR SCHEMA
Step 1: Fact table and large dimension tables are partitioned
[Figure: the fact table F (Sales) is split into partitions Sales0 to Sales3; the large dimension D4: Product is split into Product0 to Product3; D1: Payment is a further dimension table of the schema.]
EXAMPLE: STAR SCHEMA
Step 2: Partitions of dimension and fact table are co-located on nodes
[Figure: Node 1 holds Sales0 and D4: Product0, Node 2 holds Sales1 and D4: Product1, Node 3 holds Sales2 and D4: Product2, Node 4 holds Sales3 and D4: Product3. In a second diagram, the small dimension tables D1: Payment, D2: Time, and D3: Location are additionally replicated to the nodes.]
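A small sketch (Python; the node numbering follows the figure, the replication list is an assumption based on it) of the resulting placement: partition i of Sales and of D4: Product are co-located on the same node, and the small dimension tables are replicated to every node.

NUM_NODES = 4
REPLICATED_DIMS = ["D1:Payment", "D2:Time", "D3:Location"]   # small dimensions

# Node i+1 gets Sales_i, the co-partitioned Product_i, and copies of the small dims.
placement = {
    f"Node {n + 1}": [f"Sales{n}", f"D4:Product{n}"] + REPLICATED_DIMS
    for n in range(NUM_NODES)
}

for node, tables in placement.items():
    print(node, tables)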
ISSUE: MANY LARGE DIMENSION TABLES
Problem: There are multiple large dimension tables that do not all fit on one node, so they cannot be fully replicated.
Solution 1:
• Partition all dimension tables
• ... but then co-partitioning all dimension tables with the fact table is not possible → a distributed join with the fact table is needed (see next lecture)
OLTP PARTITIONING: SCHISM
Based on paper: https://dspace.mit.edu/handle/1721.1/73347
1. Build a graph from a workload trace
▪ Nodes: Tuples accessed by the transactions (txns) in the trace
▪ Edges: Connect tuples that are accessed by the same txn
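A minimal sketch of step 1 (Python, using networkx and a hypothetical trace; Schism itself does not prescribe a library): every tuple accessed in the trace becomes a node, and each pair of tuples touched by the same transaction is connected by an edge whose weight counts the co-accesses.

import itertools
import networkx as nx

# Hypothetical workload trace: each transaction lists the tuple ids it accesses.
trace = [
    {"c1", "o1", "o2"},   # txn 1
    {"c1", "o3"},         # txn 2
    {"c2", "o3", "o4"},   # txn 3
]

g = nx.Graph()
for txn in trace:
    for u, v in itertools.combinations(sorted(txn), 2):
        # Increase the edge weight each time two tuples are co-accessed.
        w = g.get_edge_data(u, v, default={"weight": 0})["weight"]
        g.add_edge(u, v, weight=w + 1)

print(list(g.edges(data=True)))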
OLTP PARTITIONING: SCHISM
2. Partitioning should minimize distributed txns
• Idea: a min-cut of the graph minimizes the number of cross-partition edges, i.e., distributed txns
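A rough sketch of step 2 under the same assumptions (Python with networkx; Kernighan-Lin is used here to approximate a balanced min-cut, whereas the Schism paper uses METIS): the cut edges correspond to tuple pairs that would be accessed by distributed transactions.

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

g = nx.Graph()
g.add_weighted_edges_from([
    ("c1", "o1", 3), ("c1", "o2", 3), ("o1", "o2", 3),   # frequently co-accessed group
    ("c2", "o3", 2), ("c2", "o4", 2), ("o3", "o4", 2),   # second co-accessed group
    ("o2", "o3", 1),                                     # rare cross access
])

# Kernighan-Lin bisection approximates a balanced 2-way min-cut.
part_a, part_b = kernighan_lin_bisection(g, weight="weight")
cut = [(u, v) for u, v in g.edges if (u in part_a) != (v in part_a)]
print(part_a, part_b)        # tuples grouped per partition
print("cut edges:", cut)     # each cut edge ~ a potentially distributed txn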
EXAMPLE: BUILDING A GRAPH