10-08-2024
Overview of distributed query processing
Dario Della Monica
These slides are a modified version of the slides provided with the book
Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
The original version of the slides is available at: extras.springer.com
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/1
Outline (distributed DB)
• Introduction (Ch. 1) ⋆
• Distributed Database Design (Ch. 3) ⋆
• Distributed Query Processing (Ch. 6-8) ⋆
➡ Overview (Ch. 6) ⋆
➡ Query decomposition and data localization (Ch. 7) ⋆
➡ Distributed query optimization (Ch. 8) ⋆
• Distributed Transaction Management (Ch. 10-12) ⋆
⋆ Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
Query Processing in a D-DBMS
high-level user query → query processor → low-level data manipulation commands for D-DBMS
Selecting Alternatives
SELECT *
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"
Alternative 1: EMP ⋈ENO (σRESP=“Manager”(ASG))
Alternative 2: σRESP=“Manager”(EMP ⋈ENO ASG)
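The two alternatives are equivalent but build intermediate results of different sizes. The following is a minimal sketch on toy data, not taken from the slides: the tuples and the helper functions `select_resp` and `join_on_eno` are hypothetical, and the join is a naive nested loop chosen only for readability.

```python
# Toy relations (hypothetical data): EMP(ENO, ENAME) and ASG(ENO, PNO, RESP).
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller")]
ASG = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
       ("E2", "P2", "Analyst"), ("E3", "P3", "Consultant"),
       ("E4", "P2", "Manager")]

def select_resp(rows, resp, resp_idx):
    """Selection: keep the rows whose attribute at position resp_idx equals resp."""
    return [r for r in rows if r[resp_idx] == resp]

def join_on_eno(emp, asg):
    """Naive nested-loop equijoin on ENO (first attribute of both relations)."""
    return [e + a[1:] for e in emp for a in asg if e[0] == a[0]]

# Alternative 1: select the managers first, then join.
asg_mgr = select_resp(ASG, "Manager", 2)
plan1 = join_on_eno(EMP, asg_mgr)

# Alternative 2: join first, then select (RESP is at index 3 in the joined tuple).
joined = join_on_eno(EMP, ASG)
plan2 = select_resp(joined, "Manager", 3)

print(sorted(plan1) == sorted(plan2))   # same answer
print(len(asg_mgr), len(joined))        # sizes of the intermediate results
```

On this toy input the selection-first plan joins against 2 tuples, while the join-first plan materializes 5 joined tuples before filtering.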
What are the Additional Problems?
• More parameters
➡ Replication of fragments
➡ Data exchange alternatives/multiple sites
• Data localization: a global query on the relations of a distributed DB (seen as a single DB by the user) must be transformed into local queries on fragments stored in several local DBs
• The QEP (query execution plan) must include information on communications (data transfers among sites) and on the sites where operations are performed
• Semijoins are used to reduce the amount of data transferred among sites
➡ The optimizer focuses on selecting the best order for join and semijoin operations
• Centralized vs. distributed optimization
• Cost to minimize
➡ Centralized DB: CPU and I/O costs only (in practice, only I/O is considered)
➡ Distributed DB: communication costs as well
➡ Communication costs usually dominate (although this may no longer hold as network speed increases, especially within a Local Area Network)
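The semijoin idea mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the optimizer's actual procedure: the data, the two sites X and Y, and the helper `join_on_eno` are hypothetical, and "shipping" is modeled simply by counting tuples.

```python
# Hypothetical setting: EMP is stored at site X, ASG at site Y,
# and EMP joined with ASG on ENO must be computed at site X.
EMP = [("E1", "J. Doe"), ("E2", "M. Smith")]
ASG = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
       ("E5", "P2", "Analyst"), ("E6", "P3", "Consultant"),
       ("E7", "P2", "Manager")]

def join_on_eno(emp, asg):
    """Naive nested-loop equijoin on ENO (first attribute of both relations)."""
    return [e + a[1:] for e in emp for a in asg if e[0] == a[0]]

# Plain join: ship all of ASG from Y to X.
shipped_plain = len(ASG)
result_plain = join_on_eno(EMP, ASG)

# Semijoin-based program:
# 1. ship the projection of EMP on ENO from X to Y,
# 2. reduce ASG at Y to the tuples that have a matching ENO (ASG semijoin EMP),
# 3. ship only the reduced ASG back to X and join there.
eno_values = {e[0] for e in EMP}
asg_reduced = [a for a in ASG if a[0] in eno_values]
shipped_semijoin = len(asg_reduced)
result_semijoin = join_on_eno(EMP, asg_reduced)

print(result_plain == result_semijoin)        # same join result
print(shipped_plain, shipped_semijoin)        # ASG tuples shipped from Y to X
print(len(eno_values))                        # extra ENO values shipped from X to Y
```

The semijoin program also ships the ENO projection in the other direction, so it pays off only when the reduction of ASG outweighs that extra transfer.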
What are the Additional Problems?
– Example
• Global query: EMP ⋈ENO (σRESP=“Manager”(ASG))
• Fragmentation and allocation
➡ ASG1 = σENO≤“E3”(ASG) (site 1)
➡ ASG2 = σENO>“E3”(ASG) (site 2)
➡ EMP1 = σENO≤“E3”(EMP) (site 3)
➡ EMP2 = σENO>“E3”(EMP) (site 4)
➡ Query result (site 5)
• Relational algebra must be extended to model exchanging data between sites
• Strategy A
➡ Site 1: ASG’1 = σRESP=“Manager”(ASG1), sent to site 3
➡ Site 2: ASG’2 = σRESP=“Manager”(ASG2), sent to site 4
➡ Site 3: EMP’1 = EMP1 ⋈ENO ASG’1, sent to site 5
➡ Site 4: EMP’2 = EMP2 ⋈ENO ASG’2, sent to site 5
➡ Site 5: result = EMP’1 ∪ EMP’2
• Strategy B
➡ Sites 1, 2, 3 and 4 send ASG1, ASG2, EMP1 and EMP2 to site 5
➡ Site 5: result = (EMP1 ∪ EMP2) ⋈ENO (σRESP=“Manager”(ASG1 ∪ ASG2))
• Assume
➡ card(EMP) = 400
➡ card(ASG) = 1000
➡ 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ access cost per tuple = 1 unit
➡ network transfer cost per tuple = 10 units
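The fragmentation and the two strategies above can be simulated on a small scale. The sketch below is only an illustration: the tuples are hypothetical toy data (not the 400/1000-tuple relations of the example), the helpers `managers` and `join_on_eno` are made up for this purpose, and inter-site transfers are modeled by counting the tuples that would be shipped.

```python
# Toy relations (hypothetical data): EMP(ENO, ENAME) and ASG(ENO, PNO, RESP).
EMP = [(f"E{i}", f"name{i}") for i in range(1, 7)]
ASG = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
       ("E3", "P2", "Consultant"), ("E4", "P2", "Manager"),
       ("E5", "P3", "Engineer"), ("E6", "P3", "Manager"),
       ("E2", "P2", "Analyst")]

# Horizontal fragmentation on ENO, as in the example.
ASG1 = [a for a in ASG if a[0] <= "E3"]   # site 1
ASG2 = [a for a in ASG if a[0] > "E3"]    # site 2
EMP1 = [e for e in EMP if e[0] <= "E3"]   # site 3
EMP2 = [e for e in EMP if e[0] > "E3"]    # site 4

def managers(asg):
    """Selection RESP = "Manager"."""
    return [a for a in asg if a[2] == "Manager"]

def join_on_eno(emp, asg):
    """Naive nested-loop equijoin on ENO (first attribute of both relations)."""
    return [e + a[1:] for e in emp for a in asg if e[0] == a[0]]

# Strategy A: select at sites 1-2, join at sites 3-4, union at site 5.
asg1p, asg2p = managers(ASG1), managers(ASG2)
emp1p, emp2p = join_on_eno(EMP1, asg1p), join_on_eno(EMP2, asg2p)
result_a = emp1p + emp2p
shipped_a = len(asg1p) + len(asg2p) + len(emp1p) + len(emp2p)

# Strategy B: ship every fragment to site 5 and evaluate the query there.
result_b = join_on_eno(EMP1 + EMP2, managers(ASG1 + ASG2))
shipped_b = len(EMP1) + len(EMP2) + len(ASG1) + len(ASG2)

print(sorted(result_a) == sorted(result_b))   # both strategies give the same answer
print(shipped_a, shipped_b)                   # tuples shipped between sites
```

Strategy A gives the correct result here because EMP and ASG are fragmented on the same ENO ranges, so matching tuples always sit in corresponding fragments.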
Cost of Alternatives
• Assume
➡ card(EMP) = 400, card(ASG) = 1000, 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ tuple access cost = 1 unit; tuple transfer cost = 10 units
• Strategy A
➡ produce ASG': (10+10) × tuple access cost = 20
➡ transfer ASG' to the sites of EMP: (10+10) × tuple transfer cost = 200
➡ produce EMP': (10+10) × 2 × tuple access cost = 40
➡ transfer EMP' to result site: (10+10) × tuple transfer cost = 200
➡ Total cost = 460
• Strategy B
➡ transfer EMP to site 5: 400 × tuple transfer cost = 4,000
➡ transfer ASG to site 5: 1000 × tuple transfer cost = 10,000
➡ produce ASG': 1000 × tuple access cost = 1,000
➡ join EMP and ASG': 400 × 20 × tuple access cost = 8,000
➡ Total cost = 23,000
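The arithmetic above can be checked with a few lines of code. This is a sketch of the slide's own cost formulas, nothing more: the constant and variable names are made up here, and the even split of the 20 managers (10 per ASG fragment) is read off the "(10+10)" terms rather than stated explicitly in the assumptions.

```python
# Unit costs and cardinalities from the example.
TUPLE_ACCESS = 1       # cost units per tuple accessed
TUPLE_TRANSFER = 10    # cost units per tuple shipped between sites
CARD_EMP, CARD_ASG, MANAGERS = 400, 1000, 20

# Assumption taken from the "(10+10)" terms: managers are split evenly
# between the two ASG fragments.
m1 = m2 = MANAGERS // 2

# Strategy A: local selections and joins, only small results are shipped.
cost_a = (
    (m1 + m2) * TUPLE_ACCESS         # produce ASG' using the index on ASG.RESP
    + (m1 + m2) * TUPLE_TRANSFER     # transfer ASG' to the sites of EMP
    + (m1 + m2) * 2 * TUPLE_ACCESS   # produce EMP' using the index on EMP.ENO
    + (m1 + m2) * TUPLE_TRANSFER     # transfer EMP' to the result site
)

# Strategy B: ship everything to site 5 and compute there (no indexes available).
cost_b = (
    CARD_EMP * TUPLE_TRANSFER                # transfer EMP to site 5
    + CARD_ASG * TUPLE_TRANSFER              # transfer ASG to site 5
    + CARD_ASG * TUPLE_ACCESS                # produce ASG' by scanning ASG
    + CARD_EMP * MANAGERS * TUPLE_ACCESS     # nested-loop join of EMP and ASG'
)

print(cost_a, cost_b, cost_b / cost_a)
```

With these figures Strategy B costs 50 times as much as Strategy A, and 14,000 of its 23,000 units are transfer costs.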
Distributed Query Processing
Methodology
Calculus Query on Distributed Relations
➡ Query Decomposition (control site; uses GLOBAL SCHEMA)
Algebraic Query on Distributed Relations
➡ Data Localization (control site; uses FRAGMENT SCHEMA)
Fragment Query
➡ Global Optimization (control site; uses STATS ON FRAGMENTS)
Optimized Fragment Query with Communication Operations
➡ Local Optimization (local sites; uses LOCAL SCHEMAS)
Optimized Local Queries