
L1 Distributed Query Processing


10-08-2024

Overview of distributed query processing
Dario Della Monica

These slides are a modified version of the slides provided with the book
Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
The original version of the slides is available at: extras.springer.com

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/1

Outline (distributed DB)


• Introduction (Ch. 1) ⋆

• Distributed Database Design (Ch. 3) ⋆

• Distributed Query Processing (Ch. 6-8) ⋆


➡ Overview (Ch. 6) ⋆

➡ Query decomposition and data localization (Ch. 7) ⋆


➡ Distributed query optimization (Ch. 8) ⋆

• Distributed Transaction Management (Ch. 10-12) ⋆

⋆ Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011


Query Processing in a D-DBMS


High-level user query
➡ query processor
➡ low-level data manipulation commands for D-DBMS


Selecting Alternatives
SELECT *
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"

EMP ⋈ENO (RESP=“Manager” (ASG))

RESP=“Manager” (EMP ⋈ENO (ASG))
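The two algebraic expressions above return the same tuples; they differ only in when the selection on RESP is applied. A minimal Python sketch of that equivalence follows. The sample rows and the helper names `select` and `join` are illustrative assumptions, not part of the slides, and the nested-loop join is only one possible join method.

```python
# Toy relations as lists of dicts; the rows are invented for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe"},
    {"ENO": "E2", "ENAME": "M. Smith"},
    {"ENO": "E3", "ENAME": "A. Lee"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager"},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst"},
    {"ENO": "E3", "PNO": "P2", "RESP": "Manager"},
]


def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def join(left, right, attr):
    """Nested-loop equijoin on attr; also returns the number of pairs compared."""
    out, compared = [], 0
    for l in left:
        for r in right:
            compared += 1
            if l[attr] == r[attr]:
                out.append({**l, **r})
    return out, compared


def is_manager(t):
    return t["RESP"] == "Manager"


# Alternative 1: select on ASG first, then join with EMP.
plan1, compared1 = join(EMP, select(ASG, is_manager), "ENO")

# Alternative 2: join EMP and ASG first, then select.
joined, compared2 = join(EMP, ASG, "ENO")
plan2 = select(joined, is_manager)

print(plan1 == plan2, compared1, compared2)
```

On this toy data both alternatives produce the same result, but selecting first compares fewer tuple pairs during the join, which is why the choice between equivalent expressions matters to the query processor.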


What are the Additional Problems?


• More parameters
➡ Replication of fragments
➡ Data exchange alternatives/multiple sites

• A global query on the relations of a distributed DB (seen by the user as a single DB) must be transformed into local queries on fragments stored in several local DBs (data localization)

• The query execution plan (QEP) must include information on communication (data transfers among sites) and on the sites where operations are performed
• Semijoins are used to reduce the amount of data transferred among sites
➡ The optimizer focuses on selecting the best order for join and semijoin operations

• Centralized vs. distributed optimization

• Cost to minimize
➡ Centralized DB: CPU and I/O cost only (in practice, only I/O)
➡ Distributed DB: also communication costs
➡ Communication costs are usually the dominant ones (though this may no longer hold with faster networks, especially within a Local Area Network)
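To illustrate how a semijoin can reduce the data moved between sites, here is a minimal sketch. The two-site setup, the sample rows, and the helper name `semijoin` are assumptions made for illustration. Counting every shipped key value or tuple as one unit is a simplification; a key value is normally smaller than a full tuple, so the real saving would tend to be larger.

```python
def semijoin(rel, keys, attr):
    """rel ⋉ keys: keep the tuples of rel whose attr value appears in keys."""
    return [t for t in rel if t[attr] in keys]


# Hypothetical setup: site X holds EMP, site Y holds ASG (invented rows).
EMP = [{"ENO": f"E{i}", "ENAME": f"emp{i}"} for i in range(1, 9)]
ASG = [
    {"ENO": "E2", "PNO": "P1", "RESP": "Manager"},
    {"ENO": "E5", "PNO": "P2", "RESP": "Manager"},
]

# Without a semijoin: ship all of EMP to site Y and join there.
shipped_plain = len(EMP)

# With a semijoin:
# 1. site Y sends the distinct ENO values of ASG to site X;
keys = {t["ENO"] for t in ASG}
# 2. site X computes EMP ⋉ ASG and sends back only the matching tuples;
emp_reduced = semijoin(EMP, keys, "ENO")
shipped_semijoin = len(keys) + len(emp_reduced)
# 3. site Y joins the reduced EMP with ASG.
result = [{**e, **a} for e in emp_reduced for a in ASG if e["ENO"] == a["ENO"]]

print(shipped_plain, shipped_semijoin, len(result))
```

The semijoin pays off here because few EMP tuples participate in the join; when most tuples match, the extra round trip for the key values can cost more than it saves.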


What are the Additional Problems? – Example

• Global query: EMP ⋈ENO (σRESP=“Manager”(ASG))
• Fragmentation and allocation
➡ ASG1 = σENO≤“E3”(ASG) (site 1)
➡ ASG2 = σENO>“E3”(ASG) (site 2)
➡ EMP1 = σENO≤“E3”(EMP) (site 3)
➡ EMP2 = σENO>“E3”(EMP) (site 4)
➡ Query result (site 5)
• Relational algebra must be extended to model exchanging data between sites
• Strategy A
➡ Site 1: ASG’1 = σRESP=“Manager”(ASG1); site 2: ASG’2 = σRESP=“Manager”(ASG2)
➡ ASG’1 is sent to site 3, ASG’2 to site 4
➡ Site 3: EMP’1 = EMP1 ⋈ENO ASG’1; site 4: EMP’2 = EMP2 ⋈ENO ASG’2
➡ EMP’1 and EMP’2 are sent to site 5
➡ Site 5: result = EMP’1 ∪ EMP’2
• Strategy B
➡ ASG1, ASG2, EMP1, EMP2 are sent from sites 1-4 to site 5
➡ Site 5: result = (EMP1 ∪ EMP2) ⋈ENO (σRESP=“Manager”(ASG1 ∪ ASG2))
• Assume
➡ card(EMP) = 400
➡ card(ASG) = 1000
➡ 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ access cost per tuple = 1 unit
➡ network transfer cost per tuple = 10 units
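The following sketch runs both strategies on a small invented instance to check that they compute the same result while shipping different numbers of tuples. The sample rows and the helper names `sigma` and `join` are assumptions for illustration; only the fragmentation predicates and the two strategies come from the slide.

```python
def sigma(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def join(left, right, attr):
    """Equijoin on attr (nested loops)."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Invented rows; single-digit employee numbers so string comparison orders them.
EMP = [{"ENO": f"E{i}", "ENAME": f"emp{i}"} for i in range(1, 7)]
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager"},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst"},
    {"ENO": "E4", "PNO": "P2", "RESP": "Manager"},
    {"ENO": "E6", "PNO": "P3", "RESP": "Programmer"},
]


def low(t):
    return t["ENO"] <= "E3"


def high(t):
    return t["ENO"] > "E3"


def is_manager(t):
    return t["RESP"] == "Manager"


# Horizontal fragments as allocated on the slide (sites 1 to 4).
ASG1, ASG2 = sigma(ASG, low), sigma(ASG, high)
EMP1, EMP2 = sigma(EMP, low), sigma(EMP, high)

# Strategy A: select at sites 1-2, join at sites 3-4, union at site 5.
ASG1p, ASG2p = sigma(ASG1, is_manager), sigma(ASG2, is_manager)
EMP1p, EMP2p = join(EMP1, ASG1p, "ENO"), join(EMP2, ASG2p, "ENO")
result_a = EMP1p + EMP2p
shipped_a = len(ASG1p) + len(ASG2p) + len(EMP1p) + len(EMP2p)

# Strategy B: ship every fragment to site 5 and evaluate the query there.
result_b = join(EMP1 + EMP2, sigma(ASG1 + ASG2, is_manager), "ENO")
shipped_b = len(EMP1) + len(EMP2) + len(ASG1) + len(ASG2)

print(result_a == result_b, shipped_a, shipped_b)
```

Strategy A works fragment by fragment because EMP and ASG are fragmented on the same ENO predicate, so each EMP fragment only needs to be joined with the matching ASG fragment.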


Cost of Alternatives
• Assume
➡ card(EMP) = 400, card(ASG) = 1000, 20 managers in ASG
➡ indexes on ASG.RESP and EMP.ENO
➡ tuple access cost = 1 unit; tuple transfer cost = 10 units
• Strategy A
➡ produce ASG': (10+10) × tuple access cost = 20
➡ transfer ASG' to the sites of EMP: (10+10) × tuple transfer cost = 200
➡ produce EMP': (10+10) × 2 × tuple access cost = 40
➡ transfer EMP' to result site: (10+10) × tuple transfer cost = 200
➡ Total cost: 460
• Strategy B
➡ transfer EMP to site 5: 400 × tuple transfer cost = 4,000
➡ transfer ASG to site 5: 1000 × tuple transfer cost = 10,000
➡ produce ASG': 1000 × tuple access cost = 1,000
➡ join EMP and ASG': 400 × 20 × tuple access cost = 8,000
➡ Total cost: 23,000
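The arithmetic above can be reproduced with a short script. This is a sketch of the slide's cost model only; the constant and function names are assumptions, and the comments explaining the factor 2 and the full scan are a plausible reading of the figures rather than something the slide states.

```python
ACCESS = 1       # tuple access cost (units), from the slide
TRANSFER = 10    # tuple transfer cost (units), from the slide
CARD_EMP = 400
CARD_ASG = 1000
MANAGERS = 20    # manager tuples in ASG, split 10 + 10 across the two fragments


def cost_strategy_a():
    produce_asg = MANAGERS * ACCESS        # index on ASG.RESP: only manager tuples are read
    ship_asg = MANAGERS * TRANSFER         # ASG' to the sites of EMP
    # Factor 2 as on the slide; plausibly one access to each ASG' tuple plus
    # one indexed access to the matching EMP tuple (an assumption).
    produce_emp = MANAGERS * 2 * ACCESS
    ship_emp = MANAGERS * TRANSFER         # EMP' to the result site
    return produce_asg + ship_asg + produce_emp + ship_emp


def cost_strategy_b():
    ship_emp = CARD_EMP * TRANSFER         # all of EMP to site 5
    ship_asg = CARD_ASG * TRANSFER         # all of ASG to site 5
    produce_asg = CARD_ASG * ACCESS        # the figures correspond to a full scan of ASG
    join = CARD_EMP * MANAGERS * ACCESS    # and to a nested-loop join of EMP with ASG'
    return ship_emp + ship_asg + produce_asg + join


print(cost_strategy_a(), cost_strategy_b())
```

Under these assumptions Strategy B costs 50 times as much as Strategy A, and 14,000 of its 23,000 units are transfer costs.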


Distributed Query Processing Methodology

Calculus Query on Distributed Relations
➡ Query Decomposition (control site; uses GLOBAL SCHEMA)
Algebraic Query on Distributed Relations
➡ Data Localization (control site; uses FRAGMENT SCHEMA)
Fragment Query
➡ Global Optimization (control site; uses STATS ON FRAGMENTS)
Optimized Fragment Query with Communication Operations
➡ Local Optimization (local sites; uses LOCAL SCHEMAS)
Optimized Local Queries
