8 Query Optimization
• Introduction
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
• Search algorithm
➡ How do we move inside the solution space?
➡ Exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic, …)
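As a rough illustration of how a heuristic search moves inside the solution space, the sketch below applies iterative improvement to left-deep join orders. The cardinalities in `CARD`, the single `SELECTIVITY` value, the toy `plan_cost` function, and the swap-based `neighbors` move are all assumptions made for this example, not part of the slides.

```python
import random

# Hypothetical cardinalities and one assumed join selectivity (illustration only).
CARD = {"R1": 1000, "R2": 50, "R3": 200, "R4": 10}
SELECTIVITY = 0.01


def plan_cost(order):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    size = CARD[order[0]]
    total = 0.0
    for rel in order[1:]:
        size = size * CARD[rel] * SELECTIVITY
        total += size
    return total


def neighbors(order):
    """All join orders reachable by swapping two relations."""
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            swapped = list(order)
            swapped[i], swapped[j] = swapped[j], swapped[i]
            yield tuple(swapped)


def iterative_improvement(relations, restarts=5, seed=0):
    """Random start, descend to a local minimum, repeat, keep the best plan."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        current = list(relations)
        rng.shuffle(current)
        current = tuple(current)
        while True:
            candidate = min(neighbors(current), key=plan_cost)
            if plan_cost(candidate) >= plan_cost(current):
                break  # no neighbor is cheaper: local minimum reached
            current = candidate
        if best is None or plan_cost(current) < plan_cost(best):
            best = current
    return best
```

Simulated annealing would differ mainly in occasionally accepting a more expensive neighbor, which lets the search leave a local minimum.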
[Figure: query optimization process, from input query to equivalent QEPs to best QEP]
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.8/5
Search Space
• Restrict by means of heuristics
➡ Perform unary operations before binary operations
➡ …
[Figure: alternative join trees over R1, R2, R3, R4]
• Randomized
[Figure: two join trees over R1, R2, R3 that differ by exchanging R2 and R3]
q': SELECT V1.A1 INTO R1'
FROM R1 V1
WHERE P1(V1.A1)
q12: SELECT ASG.ENO INTO GVAR
FROM ASG,JVAR
WHERE ASG.PNO=JVAR.PNO
• Merge join
➡ Sort relations
➡ Merge relations
➡ Complexity: n1 + n2 if the relations are already sorted and the join is an equijoin
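The merge step can be sketched as follows, assuming both inputs are lists of tuples already sorted on the join attribute. The function name `merge_join` and the positional key arguments are choices made for this example. Each input is scanned once, which is where the n1 + n2 figure comes from; building the output for duplicate keys adds work on top of that.

```python
def merge_join(left, right, key_left, key_right):
    """Equijoin of two lists of tuples, both already sorted on the join key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of tuples with this key on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            # Pair every left tuple carrying this key with that run.
            while i < len(left) and left[i][key_left] == lk:
                for r in right[j:j_end]:
                    result.append(left[i] + r)
                i += 1
            j = j_end
    return result


# Hypothetical sample data, sorted on ENO (position 0 in both relations).
EMP = [("E1", "Doe"), ("E2", "Smith")]
ASG = [("E1", "P1"), ("E1", "P2"), ("E3", "P1")]
joined = merge_join(EMP, ASG, 0, 0)
```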
[Figure: join graph with ASG linked to EMP on ENO and to PROJ on PNO]
[Figure: alternative join orders EMP ⋈ ASG, EMP × PROJ, ASG ⋈ EMP, ASG ⋈ PROJ, PROJ ⋈ ASG, PROJ × EMP, four of which are marked as pruned]
➡ System R*
➡ Two-step
• Semijoin ordering
➡ SDD-1
Consider
PROJ ⋈_PNO ASG ⋈_ENO EMP
[Figure: join graph across sites, with EMP at Site 1, ASG at Site 2, and PROJ at Site 3; ASG is linked to EMP on ENO and to PROJ on PNO]
5. EMP → Site 2
   PROJ → Site 2
Semijoin Algorithms
• Consider the join of two relations:
➡ R[A] (located at site 1)
➡ S[A] (located at site 2)
• Alternatives:
1. Do the join R ⋈_A S
2. Perform one of the semijoin equivalents
➡ S' = Π_A(S)
➡ S' → Site 1
➡ Site 1 computes R' = R ⋉_A S'
➡ R' → Site 2
➡ Site 2 computes R' ⋈_A S
Semijoin is better if
size(Π_A(S)) + size(R ⋉_A S) < size(R)
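The comparison above can be checked on small data with the sketch below. Relations are modeled as lists of dicts, and `size_of` counts one unit per attribute value transferred; that size unit, the helper names, and the sample relations are assumptions for illustration only.

```python
def size_of(tuples):
    """Assumed size unit: one per attribute value transferred."""
    return sum(len(t) for t in tuples)


def project(rel, attr):
    """Projection of rel on attr, duplicates removed, as one-attribute tuples."""
    return [{attr: v} for v in sorted({t[attr] for t in rel})]


def semijoin(r, s_proj, attr):
    """Tuples of r whose attr value appears in the projected relation s_proj."""
    keys = {t[attr] for t in s_proj}
    return [t for t in r if t[attr] in keys]


def semijoin_is_better(r, s, attr):
    """Compare shipping the projection plus the reduced R against shipping R whole."""
    s_proj = project(s, attr)
    reduced = semijoin(r, s_proj, attr)
    return size_of(s_proj) + size_of(reduced) < size_of(r)


# Hypothetical relations: only one of four R tuples has a match in S.
R = [{"A": a, "B": "x", "C": "p"} for a in (1, 5, 6, 7)]
S = [{"A": 1, "D": "u"}, {"A": 2, "D": "v"}]
selective = semijoin_is_better(R, S, "A")
```

When every tuple of R has a match in S, the semijoin removes nothing and the projection transfer is pure overhead, so the direct join is preferable.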
Distributed Dynamic Algorithm
1. Execute all monorelation queries (e.g., selection, projection)
2. Reduce the multirelation query to produce irreducible subqueries q1 → q2 → … → qn such that there is only one relation in common between qi and qi+1
3. Choose the qi involving the smallest fragments to execute (call it MRQ')
4. Find the best execution strategy for MRQ'
a) Determine processing site
b) Determine fragments to move
5. Repeat steps 3 and 4
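The loop formed by steps 3 to 5 can be sketched as below. The slides do not say how the processing site is chosen, so this example assumes a simple rule: pick the site that already holds the most data for the subquery and move every other fragment there. The data layout (`name`, `fragments`), the function names, and the sizes are hypothetical, and the sketch ignores the fact that executing one subquery changes the inputs of the next.

```python
def total_size(subquery):
    """Total size of all fragments referenced by a subquery."""
    return sum(size
               for frags in subquery["fragments"].values()
               for size in frags.values())


def plan_subquery(subquery):
    """Assumed strategy: process at the site holding the most data, move the rest."""
    per_site = {}
    for frags in subquery["fragments"].values():
        for site, size in frags.items():
            per_site[site] = per_site.get(site, 0) + size
    site = max(sorted(per_site), key=per_site.get)
    moves = [(rel, s)
             for rel, frags in sorted(subquery["fragments"].items())
             for s in sorted(frags) if s != site]
    return site, moves


def schedule(subqueries):
    """Steps 3 to 5: repeatedly take the subquery with the smallest fragments."""
    pending = list(subqueries)
    plan = []
    while pending:
        mrq = min(pending, key=total_size)  # step 3
        pending.remove(mrq)
        site, moves = plan_subquery(mrq)    # step 4
        plan.append((mrq["name"], site, moves))
    return plan


# Hypothetical subqueries with fragment sizes per site.
QUERIES = [
    {"name": "q1", "fragments": {"EMP": {"s1": 100, "s2": 300}, "ASG": {"s2": 50}}},
    {"name": "q2", "fragments": {"ASG": {"s2": 50}, "PROJ": {"s3": 20}}},
]
plan = schedule(QUERIES)
```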
• Fetch as needed
➡ Number of messages = O(cardinality of external relation)
➡ Data transfer per message is minimal
➡ Better if relations are large and the join is selective (few tuples match)
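To make the message count concrete, the sketch below simulates fetch as needed on in-memory lists: for each tuple of the external (outer) relation, one request carries the join value and one reply carries only the matching inner tuples. The function name, the request/reply accounting, and the sample data are assumptions for this example; no real network is involved.

```python
def fetch_as_needed_join(outer, inner, outer_key, inner_key):
    """Join in which matching inner tuples are requested per outer tuple."""
    # Index that the (simulated) inner site uses to answer requests.
    index = {}
    for t in inner:
        index.setdefault(t[inner_key], []).append(t)

    messages = 0
    result = []
    for o in outer:
        messages += 1  # request: one join attribute value
        matches = index.get(o[outer_key], [])
        messages += 1  # reply: only the matching inner tuples
        result.extend(o + m for m in matches)
    return result, messages


# Hypothetical data: three outer tuples, so six messages in this accounting.
OUTER = [("E1",), ("E2",), ("E3",)]
INNER = [("E1", "P1"), ("E3", "P2"), ("E3", "P3")]
rows, msg_count = fetch_as_needed_join(OUTER, INNER, 0, 0)
```

The message count grows linearly with the cardinality of the outer relation, while each reply stays small when few inner tuples match.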
* avg. inner tuple size/msg. size
Static Approach – Vertical Partitioning & Joins
3. Move both inner and outer relations to another site
• Hybrid optimization
➡ Choose-plan approach can be used
➡ 2-step approach is simpler
➡ A query Q = {q1, q2, q3, q4} such that each subquery qi is the maximum processing unit that accesses one relation and communicates with its neighboring subqueries
➡ For each qi in Q, a feasible allocation set of sites Sq = {s1, s2, …, sk} where each site stores a copy of the relation in qi
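The feasible allocation sets can be derived from a replica catalog, as in the minimal sketch below. The `CATALOG` contents, the relation and site names, and the mapping of each subquery to a single relation are hypothetical values chosen for illustration.

```python
# Hypothetical replica catalog: relation -> sites storing a copy.
CATALOG = {
    "R1": {"s1", "s3"},
    "R2": {"s2"},
    "R3": {"s1", "s2", "s3"},
    "R4": {"s3"},
}

# Each subquery accesses exactly one relation (assumed mapping).
QUERY = {"q1": "R1", "q2": "R2", "q3": "R3", "q4": "R4"}


def feasible_allocations(query, catalog):
    """For each subquery, the sorted sites that store a copy of its relation."""
    return {name: sorted(catalog[rel]) for name, rel in query.items()}


def allocation_count(allocations):
    """Number of distinct ways to assign one site to every subquery."""
    count = 1
    for sites in allocations.values():
        count *= len(sites)
    return count


allocations = feasible_allocations(QUERY, CATALOG)
```

A run-time optimizer would then pick one site from each set, for example according to the current load of the candidate sites.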