Query Processing
Systems
M. Tamer Özsu
Patrick Valduriez
[Figure: query processor]
• Query optimization
➡ How do we determine the “best” execution plan?
Strategy 1
Π_ENAME(σ_{RESP=“Manager” ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Strategy 2
Π_ENAME(EMP ⋈ENO (σ_{RESP=“Manager”}(ASG)))
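The two strategies can be compared on a toy instance. The sketch below is only an illustration: the rows are invented, and the function names `strategy_1` and `strategy_2` are hypothetical. It evaluates both expressions and also reports the size of the intermediate relation each one builds before the projection.

```python
# Toy relations: attribute names follow the slides, the rows are made up.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe"},
    {"ENO": "E2", "ENAME": "M. Smith"},
    {"ENO": "E3", "ENAME": "A. Lee"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager"},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst"},
    {"ENO": "E3", "PNO": "P2", "RESP": "Manager"},
]


def strategy_1(emp, asg):
    """Cartesian product first, then selection, then projection on ENAME."""
    product = [(e, a) for e in emp for a in asg]
    selected = [
        (e, a)
        for e, a in product
        if a["RESP"] == "Manager" and e["ENO"] == a["ENO"]
    ]
    return {e["ENAME"] for e, _ in selected}, len(product)


def strategy_2(emp, asg):
    """Selection on ASG first, then join on ENO, then projection on ENAME."""
    managers = [a for a in asg if a["RESP"] == "Manager"]
    # Index the selected ASG tuples by ENO so the join is a lookup.
    by_eno = {}
    for a in managers:
        by_eno.setdefault(a["ENO"], []).append(a)
    joined = [(e, a) for e in emp for a in by_eno.get(e["ENO"], [])]
    return {e["ENAME"] for e, _ in joined}, len(managers)


names_1, intermediate_1 = strategy_1(EMP, ASG)
names_2, intermediate_2 = strategy_2(EMP, ASG)
```

Both calls return the same set of names, but Strategy 1 materializes the full product (card(EMP) × card(ASG) tuples) while Strategy 2 only carries the selected ASG tuples into the join.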
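Distributed execution, alternative 1 (selections and joins spread over the sites):
➡ Site 1: ASG1' = σ_{RESP=“Manager”}(ASG1), sent to Site 3
➡ Site 2: ASG2' = σ_{RESP=“Manager”}(ASG2), sent to Site 4
➡ Site 3: EMP1' = EMP1 ⋈ENO ASG1', sent to Site 5
➡ Site 4: EMP2' = EMP2 ⋈ENO ASG2', sent to Site 5
➡ Site 5: result = EMP1' ∪ EMP2'
Distributed execution, alternative 2 (all fragments shipped to Site 5):
➡ Sites 1 to 4 send ASG1, ASG2, EMP1, and EMP2, respectively, to Site 5
➡ Site 5: result = (EMP1 ∪ EMP2) ⋈ENO σ_{RESP=“Manager”}(ASG1 ∪ ASG2)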
• Assume
➡ relations of cardinality n
➡ sequential scan

Operation                                  Complexity
Select                                     O(n)
Project (without duplicate elimination)    O(n)
Project (with duplicate elimination)       O(n log n)
Group                                      O(n log n)
Join                                       O(n log n)
Semi-join                                  O(n log n)
Division                                   O(n log n)
Set Operators                              O(n log n)
➡ Optimal
• Heuristics
➡ Not optimal
[Figure: layers of query processing; surviving labels: Query Decomposition, GLOBAL SCHEMA, Fragment Query, Global Optimization, STATS ON FRAGMENTS]
[Figure: query graph and join graph over EMP, ASG, PROJ, and RESULT; edge labels EMP.ENO=ASG.ENO and ASG.PNO=PROJ.PNO; other labels TITLE=“Programmer”, RESP, ENAME, and PNAME=“CAD/CAM”]
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO=EMP.ENO
AND ASG.PNO=PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PROJ.PNAME="CAD/CAM"
AND (DUR=12 OR DUR=24)
[Figure: operator tree for this query, with Select nodes (PNAME=“CAD/CAM”, ENAME≠“J. DOE”) and Join nodes (⋈PNO, ⋈ENO)]
[Figure: equivalent operator trees; surviving labels: ⋈PNO,ENO, ⋈PNO, PNO,ENAME, ⋈ENO]
...
[Figure: operator trees containing the selection σ_{ENO=“E5”}]
SELECT ENAME
FROM EMP
[Figure: operator trees with Π_ENAME, ⋈ENO, and σ_{TITLE=“Mech. Eng.”} over the fragments ASG2 and EMP2]
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/46
Reduction for Hybrid Fragmentation
• Combine the rules already specified:
➡ Remove empty relations generated by contradicting selections on
horizontal fragments;
➡ Remove useless relations generated by projections on vertical fragments;
➡ Distribute joins over unions in order to isolate and remove useless joins.
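The first rule can be sketched in a few lines. The example below is an assumption-laden illustration, not the slides' algorithm: it supposes horizontal fragments defined by ranges on a numeric employee number, and the fragment boundaries, the class `HorizontalFragment`, and the function `reduce_for_selection` are all hypothetical. A selection on a single value keeps only the fragments whose defining predicate does not contradict it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HorizontalFragment:
    """A fragment defined by low < ENO <= high (hypothetical boundaries)."""
    name: str
    low: float   # exclusive lower bound on the fragmentation attribute
    high: float  # inclusive upper bound on the fragmentation attribute

    def may_contain(self, value):
        # The fragment predicate and the selection ENO = value are
        # compatible only if value falls inside the fragment's range.
        return self.low < value <= self.high


def reduce_for_selection(fragments, value):
    """Drop fragments whose predicate contradicts the selection ENO = value."""
    return [f.name for f in fragments if f.may_contain(value)]


fragments = [
    HorizontalFragment("EMP1", float("-inf"), 3),
    HorizontalFragment("EMP2", 3, 6),
    HorizontalFragment("EMP3", 6, float("inf")),
]

# A selection on employee number 5 only needs to touch one fragment.
remaining = reduce_for_selection(fragments, 5)
```

With these assumed boundaries, only EMP2 survives; the other two fragments would produce empty relations and can be removed from the localized query.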
[Figure: operator tree involving a Cartesian product (×) over ASG, PROJ, and EMP]
Cost-Based Optimization
• Solution space
➡ The set of equivalent algebra expressions (query trees).
• Cost function (in terms of time)
➡ I/O cost + CPU cost + communication cost
➡ These might have different weights in different distributed
environments (LAN vs WAN).
➡ Can also maximize throughput
• Search algorithm
➡ How do we move inside the solution space?
➡ Exhaustive search, heuristic algorithms (iterative improvement,
simulated annealing, genetic,…)
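The three ingredients above can be put together in a small sketch. Everything concrete here is an assumption made for illustration: the solution space is just a hand-written list of two plans, the cost estimates and the LAN/WAN weights are invented numbers, and the search algorithm is plain exhaustive search over that list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Relative weights of the cost components (hypothetical values)."""
    io: float
    cpu: float
    comm: float


@dataclass(frozen=True)
class PlanEstimate:
    """Estimated resource usage of one candidate execution plan."""
    name: str
    io_cost: float
    cpu_cost: float
    comm_cost: float


def total_cost(plan, w):
    """Weighted sum: I/O cost + CPU cost + communication cost."""
    return w.io * plan.io_cost + w.cpu * plan.cpu_cost + w.comm * plan.comm_cost


def best_plan(plans, w):
    """Exhaustive search: evaluate every plan and keep the cheapest."""
    return min(plans, key=lambda p: total_cost(p, w))


# Communication is assumed to dominate in a WAN and to be comparable
# to I/O in a LAN.
WAN = CostWeights(io=1.0, cpu=0.5, comm=10.0)
LAN = CostWeights(io=1.0, cpu=0.5, comm=1.0)

plans = [
    PlanEstimate("ship_whole_relations", io_cost=100, cpu_cost=50, comm_cost=200),
    PlanEstimate("filter_locally_first", io_cost=400, cpu_cost=80, comm_cost=20),
]
```

Under these made-up numbers, `best_plan(plans, WAN)` picks the plan that filters locally, while `best_plan(plans, LAN)` picks the plan that ships whole relations, which is the sense in which the weights change the outcome across environments.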
[Figure: query optimization process: Input Query → Equivalent QEP → Best QEP]
[Figure: alternative join trees over R1, R2, R3, and R4]
• Randomized
[Figure: join trees (R1 ⋈ R2) ⋈ R3 and (R1 ⋈ R3) ⋈ R2]
• Response Time
➡ Do as many things as possible in parallel
➡ May increase total time because of increased total activity
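The difference between the two measures can be sketched with a simple transfer-only model. The model is an assumption made for illustration: several sites each send some number of data units to one destination in parallel, every transfer pays a fixed per-message cost `t_msg` plus `t_tr` per unit, and local processing is ignored. The function names and the numbers are hypothetical.

```python
def transfer_time(units, t_msg, t_tr):
    """Time for one transfer: fixed message cost plus per-unit cost."""
    return t_msg + t_tr * units


def total_time(transfers, t_msg, t_tr):
    """Total time adds up the work of every transfer."""
    return sum(transfer_time(u, t_msg, t_tr) for u in transfers)


def response_time(transfers, t_msg, t_tr):
    """Response time, with all transfers in parallel, is the slowest one."""
    return max(transfer_time(u, t_msg, t_tr) for u in transfers)


# Two sites send x and y units to a third site (made-up sizes and costs).
x, y = 100, 300
total = total_time([x, y], t_msg=10, t_tr=2)
response = response_time([x, y], t_msg=10, t_tr=2)
```

In this model the response time only sees the larger transfer, while the total time charges for both, so a plan that adds parallel activity can lower the first while raising the second.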
[Figure: example in which Site 2 transfers y units]
SF(A = value) = 1 / card(Π_A(R))
SF(A > value) = (max(A) – value) / (max(A) – min(A))
SF(A < value) = (value – min(A)) / (max(A) – min(A))
SF(p(Ai) ∧ p(Aj)) = SF(p(Ai)) × SF(p(Aj))
SF(p(Ai) ∨ p(Aj)) = SF(p(Ai)) + SF(p(Aj)) – (SF(p(Ai)) × SF(p(Aj)))
SF(A ∈ {values}) = SF(A = value) × card({values})
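The selectivity factor formulas translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, values are assumed to be uniformly distributed between min(A) and max(A), and predicates on different attributes are assumed independent. For A < value it uses the fraction of the [min(A), max(A)] range that lies below the value, (value – min(A)) / (max(A) – min(A)).

```python
def sf_eq(distinct_values):
    """SF(A = value) = 1 / card(Π_A(R)), with card(Π_A(R)) distinct values."""
    return 1.0 / distinct_values


def sf_gt(value, min_a, max_a):
    """SF(A > value): share of the value range above `value`."""
    return (max_a - value) / (max_a - min_a)


def sf_lt(value, min_a, max_a):
    """SF(A < value): share of the value range below `value`."""
    return (value - min_a) / (max_a - min_a)


def sf_and(sf_i, sf_j):
    """Conjunction of two independent predicates."""
    return sf_i * sf_j


def sf_or(sf_i, sf_j):
    """Disjunction of two independent predicates."""
    return sf_i + sf_j - sf_i * sf_j


def sf_in(distinct_values, n_values):
    """SF(A ∈ {values}) = SF(A = value) × card({values})."""
    return sf_eq(distinct_values) * n_values
```

For an attribute ranging over [0, 100], `sf_gt(75, 0, 100)` and `sf_lt(75, 0, 100)` evaluate to 0.25 and 0.75, which add up to 1 as expected for complementary range predicates on a continuous range.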
q': SELECT V1.A1 INTO R1'
FROM R1 V1
WHERE P1(V1.A1)
q12: SELECT ASG.ENO INTO GVAR
FROM ASG,JVAR
WHERE ASG.PNO=JVAR.PNO
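[Figure: join graph with edges EMP-ASG (on ENO) and ASG-PROJ (on PNO)]
[Figure: two-relation alternatives and pruning]
➡ EMP ⋈ ASG: pruned
➡ EMP × PROJ: pruned
➡ ASG ⋈ EMP
➡ ASG ⋈ PROJ: pruned
➡ PROJ ⋈ ASG
➡ PROJ × EMP: pruned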
Consider
PROJ ⋈PNO ASG ⋈ENO EMP
[Figure: join graph with EMP at Site 1, ASG at Site 2, and PROJ at Site 3; edges EMP-ASG (on ENO) and ASG-PROJ (on PNO)]
5. EMP → Site 2
PROJ → Site 2
Site 2 computes EMP ⋈ PROJ ⋈ ASG
➡ S' = Π_A(S)
➡ S' → Site 1
➡ Site 1 computes R' = R ⋉A S'
➡ R' → Site 2
➡ Site 2 computes R' ⋈A S
Semijoin is better if
size(Π_A(S)) + size(R ⋉A S) < size(R)
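The semijoin program and its cost condition can be traced on a toy instance. The sketch below rests on several assumptions made for illustration: R (at Site 1) and S (at Site 2) are lists of two-attribute tuples whose first component is the join attribute A, size is measured as the number of attribute values transferred, and all names and rows are invented.

```python
def size(tuples):
    """Size as the number of attribute values (an assumed measure)."""
    return sum(len(t) for t in tuples)


def project_a(rel):
    """Π_A: distinct values of the join attribute, as one-attribute tuples."""
    return sorted({(t[0],) for t in rel})


def semijoin(r, s_keys):
    """R ⋉A S': tuples of R whose A value appears in the projected keys."""
    keys = {k[0] for k in s_keys}
    return [t for t in r if t[0] in keys]


def join(r, s):
    """R ⋈A S as a nested-loop join on the first attribute."""
    return [(a, b, c) for (a, b) in r for (a2, c) in s if a == a2]


R = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4"), (5, "r5"), (6, "r6")]  # Site 1
S = [(2, "s2"), (4, "s4"), (4, "s4b")]                                  # Site 2

s_prime = project_a(S)            # computed at Site 2, sent to Site 1
r_prime = semijoin(R, s_prime)    # computed at Site 1, sent to Site 2
result = join(r_prime, S)         # computed at Site 2

semijoin_transfer = size(s_prime) + size(r_prime)
join_transfer = size(R)           # alternative: ship all of R to Site 2
semijoin_is_better = semijoin_transfer < join_transfer
```

On this instance the semijoin program ships 6 values instead of 12 and yields the same join result; with a less selective join attribute the extra transfer of the projected keys could make shipping R directly the cheaper choice.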