L1 Distributed Query Processing
These slides are a modified version of the slides provided with the book
Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
The original version of the slides is available at: extras.springer.com
10-08-2024
Selecting Alternatives
SELECT *
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"
• The query execution plan (QEP) must specify the communication operations (data transfers among sites) and the site at which each operation is performed
• Semijoins can be used to reduce the amount of data transferred among sites
➡ The optimizer focuses on selecting the best order for join and semijoin operations
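As a minimal sketch of how a semijoin reduces data transfer, the Python snippet below reduces EMP at its own site before anything is shipped. The relation contents, the function name `semijoin`, and the tuple layout are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical toy relations; the join attribute ENO is column 0 in both.
EMP = [
    ("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"),
    ("E4", "J. Miller"), ("E5", "B. Casey"), ("E6", "L. Chu"),
]
ASG = [("E1", "P1", "Manager"), ("E3", "P2", "Manager"), ("E3", "P3", "Analyst")]


def semijoin(left, right_keys):
    """Return the rows of `left` whose join key (column 0) occurs in `right_keys`."""
    return [row for row in left if row[0] in right_keys]


# Step 1 (ASG site): project the join attribute; only these values are shipped.
asg_enos = {eno for (eno, _pno, _resp) in ASG}

# Step 2 (EMP site): keep only the EMP tuples that can participate in the join.
emp_reduced = semijoin(EMP, asg_enos)

# Step 3: compare the number of items shipped with and without the semijoin.
shipped_without = len(EMP)                     # ship all of EMP to the ASG site
shipped_with = len(asg_enos) + len(emp_reduced)  # ship join values, then reduced EMP

print(emp_reduced, shipped_without, shipped_with)
```

The saving depends on how selective the semijoin is: when most EMP tuples have a match in ASG, shipping the join values first may cost more than it saves, which is why the ordering decision is left to the optimizer.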
Strategy A (figure)
➡ Site 1: ASG'1 = σ RESP="Manager" (ASG1); Site 2: ASG'2 = σ RESP="Manager" (ASG2)
➡ ASG'1 is sent to Site 3 and ASG'2 is sent to Site 4
➡ Site 3: EMP'1 = EMP1 ⋈ENO ASG'1; Site 4: EMP'2 = EMP2 ⋈ENO ASG'2
➡ EMP'1 and EMP'2 are sent to the result site
Strategy B (figure)
➡ ASG1 (Site 1), ASG2 (Site 2), EMP1 (Site 3) and EMP2 (Site 4) are sent to the result site
Assume
➡ card(EMP) = 400
➡ card(ASG) = 1000
➡ 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ access cost per tuple = 1 unit
➡ network transfer cost per tuple = 10 units
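The two alternatives can be sketched in Python to check that they return the same answer. The fragment contents and the helper names `select_managers` and `join_on_eno` are hypothetical. The sketch also assumes that EMP and ASG are fragmented on ENO in the same way, so that tuples of ASG1 only join with EMP1 and tuples of ASG2 only with EMP2.

```python
# Hypothetical fragments: EMP rows are (ENO, ENAME), ASG rows are (ENO, PNO, RESP).
EMP1 = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee")]      # stored at site 3
EMP2 = [("E4", "J. Miller"), ("E5", "B. Casey")]                      # stored at site 4
ASG1 = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
        ("E3", "P3", "Consultant")]                                   # stored at site 1
ASG2 = [("E4", "P2", "Programmer"), ("E5", "P2", "Manager")]          # stored at site 2


def select_managers(asg):
    """Selection RESP = "Manager" on an ASG fragment."""
    return [t for t in asg if t[2] == "Manager"]


def join_on_eno(emp, asg):
    """Join on ENO; each result row is the EMP row followed by PNO and RESP."""
    return [e + a[1:] for e in emp for a in asg if e[0] == a[0]]


def strategy_a():
    asg1_sel = select_managers(ASG1)         # at site 1, then shipped to site 3
    asg2_sel = select_managers(ASG2)         # at site 2, then shipped to site 4
    emp1_join = join_on_eno(EMP1, asg1_sel)  # at site 3
    emp2_join = join_on_eno(EMP2, asg2_sel)  # at site 4
    return emp1_join + emp2_join             # union at the result site


def strategy_b():
    emp = EMP1 + EMP2                        # all fragments shipped to the result site
    asg = ASG1 + ASG2
    return join_on_eno(emp, select_managers(asg))


result_a = strategy_a()
result_b = strategy_b()
print(sorted(result_a) == sorted(result_b))
```

Both functions compute the same relation; they differ only in where the work happens and how many tuples cross the network.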
Cost of Alternatives
• Assume
➡ card (EMP) = 400, card(ASG) = 1000, 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ tuple access cost = 1 unit; tuple transfer cost = 10 units
• Strategy A
➡ produce ASG': (10+10) × tuple access cost = 20
➡ transfer ASG' to the sites of EMP: (10+10) × tuple transfer cost = 200
➡ produce EMP': (10+10) × 2 × tuple access cost = 40
➡ transfer EMP' to result site: (10+10) × tuple transfer cost = 200
Total Cost = 460
• Strategy B
➡ transfer EMP to site 5: 400 × tuple transfer cost = 4,000
➡ transfer ASG to site 5: 1000 × tuple transfer cost = 10,000
➡ produce ASG': 1000 × tuple access cost = 1,000
➡ join EMP and ASG': 400 × 20 × tuple access cost = 8,000
Total Cost = 23,000
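The arithmetic above can be reproduced with a short Python sketch. The constant and function names are hypothetical. The even split of the 20 managers into 10 per ASG fragment is inferred from the (10+10) terms. Reading the 400 × 20 term as a nested-loop join over the shipped relations is an interpretation, not something stated on the slide.

```python
TUPLE_ACCESS_COST = 1      # units per tuple accessed
TUPLE_TRANSFER_COST = 10   # units per tuple transferred
CARD_EMP = 400
CARD_ASG = 1000
MANAGERS_PER_FRAGMENT = 10  # assumption: 20 managers split evenly over ASG1 and ASG2


def cost_strategy_a():
    managers = MANAGERS_PER_FRAGMENT + MANAGERS_PER_FRAGMENT
    produce_asg = managers * TUPLE_ACCESS_COST        # selection via the index on ASG.RESP
    transfer_asg = managers * TUPLE_TRANSFER_COST     # ship ASG' to the EMP sites
    produce_emp = managers * 2 * TUPLE_ACCESS_COST    # factor 2 taken from the slide
    transfer_emp = managers * TUPLE_TRANSFER_COST     # ship EMP' to the result site
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def cost_strategy_b():
    managers = MANAGERS_PER_FRAGMENT + MANAGERS_PER_FRAGMENT
    transfer_emp = CARD_EMP * TUPLE_TRANSFER_COST     # ship all of EMP to site 5
    transfer_asg = CARD_ASG * TUPLE_TRANSFER_COST     # ship all of ASG to site 5
    produce_asg = CARD_ASG * TUPLE_ACCESS_COST        # scan ASG to select managers
    join = CARD_EMP * managers * TUPLE_ACCESS_COST    # compare every EMP tuple with every ASG' tuple
    return transfer_emp + transfer_asg + produce_asg + join


print(cost_strategy_a(), cost_strategy_b())
```

Under these assumptions Strategy B costs 50 times as much as Strategy A, and 14,000 of its 23,000 units come from shipping the unreduced relations.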
[Figure: query processing layers]
➡ Query Decomposition, using the GLOBAL SCHEMA
➡ Fragment Query
➡ Global Optimization, using STATS ON FRAGMENTS