DDB Lec5
Step 3: Global Query Optimization
Problem of Global Query Optimization
Three Join Tree Examples
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO AND ASG.PNO=PROJ.PNO
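[Figure: three equivalent join trees for this query. (a) The final join is on PNO. (b) The final join is on ENO. (c) PROJ × EMP is joined with ASG on ENO, PNO.]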
Restricting the Size of Search Space
[Figure: join tree (c), which joins PROJ × EMP with ASG on ENO, PNO.]
Restricting the Size of Search Space (cont.)
[Figure: two join trees over R1, R2, R3, R4: a linear tree ((R1 ⋈ R2) ⋈ R3) ⋈ R4 and a bushy tree (R1 ⋈ R2) ⋈ (R3 ⋈ R4).]
Search Strategy
How to move in the search space?
Deterministic and randomized
Deterministic
Start from the base relations and join one more relation at each step until complete plans are obtained
Dynamic programming, the most popular search strategy, builds all possible plans breadth-first before it chooses the “best” plan
Greedy algorithm builds only one plan, depth-first
[Figure: a plan built step by step: R1 ⋈ R2, then (R1 ⋈ R2) ⋈ R3, then ((R1 ⋈ R2) ⋈ R3) ⋈ R4.]
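The two deterministic strategies can be contrasted with a minimal sketch. Everything numeric here is an assumption made for illustration: the cardinalities in `CARD`, the selectivity factors in `SF`, and the cost model (the sum of estimated intermediate result sizes) are hypothetical and not taken from the lecture. The dynamic programming variant keeps only the cheapest linear plan per subset of relations, which is a common pruned form of "build all plans breadth-first".

```python
from itertools import combinations

# Hypothetical statistics (assumptions, not from the lecture).
CARD = {"R1": 1000, "R2": 100, "R3": 10, "R4": 500}
SF = {frozenset(("R1", "R2")): 0.01,
      frozenset(("R2", "R3")): 0.1,
      frozenset(("R3", "R4")): 0.02}


def result_card(rels):
    """Estimated cardinality of joining all relations in `rels`."""
    rels = sorted(rels)                      # fixed order, deterministic floats
    card = 1.0
    for r in rels:
        card *= CARD[r]
    for a, b in combinations(rels, 2):
        card *= SF.get(frozenset((a, b)), 1.0)   # 1.0 means Cartesian product
    return card


def dp_linear(relations):
    """Dynamic programming: cheapest linear plan for every subset, by size."""
    best = {frozenset([r]): (0.0, r) for r in relations}  # subset -> (cost, plan)
    for size in range(2, len(relations) + 1):             # breadth-first by size
        for subset in map(frozenset, combinations(relations, size)):
            for r in subset:                               # r is joined last
                cost, plan = best[subset - {r}]
                cost += result_card(subset)                # size of this result
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, f"({plan} JOIN {r})")
    return best[frozenset(relations)]


def greedy_linear(relations):
    """Greedy: build a single plan, always adding the cheapest next relation."""
    start = min(relations, key=lambda r: CARD[r])
    joined, plan, cost = {start}, start, 0.0
    while len(joined) < len(relations):
        r = min((x for x in relations if x not in joined),
                key=lambda x: result_card(joined | {x}))
        joined.add(r)
        cost += result_card(joined)
        plan = f"({plan} JOIN {r})"
    return cost, plan


dp_cost, dp_plan = dp_linear(list(CARD))
greedy_cost, greedy_plan = greedy_linear(list(CARD))
print("dynamic programming:", dp_cost, dp_plan)
print("greedy:             ", greedy_cost, greedy_plan)
```

Under this cost model the dynamic programming result is never worse than the greedy one over linear plans, but it examines every subset of relations, while the greedy variant follows a single path.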
Search Strategy (cont.)
Randomized
Trade optimization time for execution time
Better when the query involves more than 5-6 relations
Do not guarantee that the best solution is obtained, but avoid the high cost of optimization in terms of memory and time
Search for an optimum around a particular starting point, using iterative improvement or simulated annealing
[Figure: a transformation between neighboring plans that exchanges R2 and R3: (R1 ⋈ R2) ⋈ R3 becomes (R1 ⋈ R3) ⋈ R2.]
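Iterative improvement can be sketched as a local search over join orders. This is a minimal illustration under stated assumptions: the statistics in `CARD` and `SF`, the cost model (sum of intermediate result sizes for a linear plan), and the neighbor move (swap two relations in the order) are all hypothetical choices, not the lecture's definitions.

```python
import random
from itertools import combinations

# Hypothetical statistics (assumptions, not from the lecture).
CARD = {"R1": 1000, "R2": 100, "R3": 10, "R4": 500}
SF = {frozenset(("R1", "R2")): 0.01,
      frozenset(("R2", "R3")): 0.1,
      frozenset(("R3", "R4")): 0.02}


def plan_cost(order):
    """Cost of the linear plan that joins relations in the given order."""
    cost = 0.0
    for k in range(2, len(order) + 1):
        prefix = sorted(order[:k])
        card = 1.0
        for r in prefix:
            card *= CARD[r]
        for a, b in combinations(prefix, 2):
            card *= SF.get(frozenset((a, b)), 1.0)
        cost += card                     # pay for each intermediate result
    return cost


def iterative_improvement(start, rng, max_tries=200):
    """Accept a random swap of two relations only when it lowers the cost."""
    best, best_cost = list(start), plan_cost(start)
    for _ in range(max_tries):
        i, j = rng.sample(range(len(best)), 2)
        neighbor = best[:]
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        cost = plan_cost(neighbor)
        if cost < best_cost:             # downhill moves only
            best, best_cost = neighbor, cost
    return best, best_cost


start = ["R1", "R4", "R2", "R3"]
final, final_cost = iterative_improvement(start, random.Random(0))
print(plan_cost(start), "->", final_cost, final)
```

Simulated annealing differs in that it would also accept some cost-increasing moves, with a probability that shrinks over time, in order to escape local minima.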
Cost Functions
Total time
the sum of all time (also referred to as cost) components
Response Time
the elapsed time from the initiation to the completion of the
query
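The difference between the two measures shows up as soon as work runs in parallel. The sketch below assumes a hypothetical plan whose work is spread over three sites that run fully in parallel with no dependencies between them; the site names and durations are invented for illustration.

```python
# Hypothetical per-site work items (time units) for one distributed plan.
site_work = {
    "site1": [4.0, 1.5],
    "site2": [3.0],
    "site3": [2.0, 2.0],
}


def total_time(work):
    """Sum of all time components, regardless of where they run."""
    return sum(sum(items) for items in work.values())


def response_time(work):
    """Elapsed time when sites run in parallel: the slowest site decides."""
    return max(sum(items) for items in work.values())


print(total_time(site_work))     # 12.5
print(response_time(site_work))  # 5.5
```

Minimizing total time favors plans that do less work overall, while minimizing response time favors plans that spread work across sites, even if the sum grows.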
Total Cost
Total Cost Factors
Example
$$SF_{\bowtie}(R, S) = \frac{card(R \bowtie S)}{card(R) * card(S)}$$
Intermediate Relation Size
Selection
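$$card(\sigma_F(R)) = SF_{\sigma}(F) * card(R)$$
$$SF_{\sigma}(A = value) = \frac{1}{card(\Pi_A(R))}$$
$$SF_{\sigma}(A > value) = \frac{max(A) - value}{max(A) - min(A)}$$
$$SF_{\sigma}(A < value) = \frac{value - min(A)}{max(A) - min(A)}$$
$$SF_{\sigma}(P(A_i) \wedge P(A_j)) = SF_{\sigma}(P(A_i)) * SF_{\sigma}(P(A_j))$$
$$SF_{\sigma}(P(A_i) \vee P(A_j)) = SF_{\sigma}(P(A_i)) + SF_{\sigma}(P(A_j)) - SF_{\sigma}(P(A_i)) * SF_{\sigma}(P(A_j))$$
$$SF_{\sigma}(A \in \{values\}) = SF_{\sigma}(A = value) * card(\{values\})$$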
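The selection formulas translate directly into small functions. This is a sketch only: the function names are made up for illustration, and the sample statistics (400 tuples, attribute range 1000 to 5000) are hypothetical.

```python
def sf_eq(distinct_values):
    """SF(A = value) = 1 / card(PI_A(R))."""
    return 1.0 / distinct_values


def sf_gt(value, min_a, max_a):
    """SF(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (max_a - value) / (max_a - min_a)


def sf_lt(value, min_a, max_a):
    """SF(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - min_a) / (max_a - min_a)


def sf_and(sf_i, sf_j):
    """Conjunction: product of the two selectivity factors."""
    return sf_i * sf_j


def sf_or(sf_i, sf_j):
    """Disjunction: sum minus product."""
    return sf_i + sf_j - sf_i * sf_j


def sf_in(distinct_values, n_values):
    """SF(A in {values}) = SF(A = value) * card({values})."""
    return sf_eq(distinct_values) * n_values


def card_selection(sf, card_r):
    """card(sigma_F(R)) = SF(F) * card(R)."""
    return sf * card_r


# Hypothetical relation: 400 tuples, attribute A ranging over [1000, 5000].
high = sf_gt(4000, 1000, 5000)   # fraction of tuples with A > 4000
low = sf_lt(2000, 1000, 5000)    # fraction of tuples with A < 2000
print(card_selection(high, 400))
print(card_selection(sf_or(high, low), 400))
```

All of these formulas assume values are uniformly distributed and, for the conjunction and disjunction, that the two predicates are independent.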
Intermediate Relation Size (cont.)
Projection
Intermediate Relation Size (cont.)
Cartesian product
$$card(R \times S) = card(R) * card(S)$$
Union
Intermediate Relation Size (cont.)
Join
There is no general way to calculate it. Some systems use the upper bound card(R × S) instead. Estimates are available for simple cases.
Special case: A is a key of R and B is a foreign key of S
$$card(R \bowtie_{A=B} S) = card(S)$$
More general:
$$card(R \bowtie_{A=B} S) = SF_{\bowtie}(R, S) * card(R) * card(S)$$
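A short worked sketch shows how the special case and the general formula agree. The cardinalities are hypothetical (R has 64 tuples with key A, S has 256 tuples with foreign key B), and the function names are invented for illustration.

```python
import math


def card_key_fk_join(card_s):
    """A is a key of R, B a foreign key of S: each S tuple matches one R tuple."""
    return card_s


def join_selectivity(card_join, card_r, card_s):
    """SF_join(R, S) = card(R join S) / (card(R) * card(S))."""
    return card_join / (card_r * card_s)


def card_join_general(sf, card_r, card_s):
    """card(R join S) = SF_join(R, S) * card(R) * card(S)."""
    return sf * card_r * card_s


card_r, card_s = 64, 256                       # hypothetical statistics
special = card_key_fk_join(card_s)             # result size in the key/FK case
sf = join_selectivity(special, card_r, card_s)
general = card_join_general(sf, card_r, card_s)
upper_bound = card_r * card_s                  # size of the Cartesian product
print(special, sf, general, upper_bound)
```

In the key and foreign key case the selectivity factor works out to 1 / card(R), which is why the estimate is so far below the Cartesian product upper bound.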
Intermediate Relation Sizes (cont.)
Semijoin
$$card(R \ltimes_A S) = SF_{\ltimes}(S.A) * card(R)$$
where
$$SF_{\ltimes}(R \ltimes_A S) = SF_{\ltimes}(S.A) = \frac{card(\Pi_A(S))}{card(dom[A])}$$
Centralized Query Optimization
INGRES Language: QUEL
One-variable query
Queries containing a single variable.
Multivariable query
Queries containing more than one variable.
QUEL queries can be translated into equivalent SQL, so the examples here are written in SQL.
INGRES – Detachment (cont.)
q: SELECT V2.A2, V3.A3, …, Vn.An
FROM R1 V1, R2 V2, …, Rn Vn
WHERE P1(V1.A1) AND P2(V1.A1, V2.A2, …, Vn.An)
q’ - one variable query generated by the single
variable predicate P1:
SELECT V1.A1 INTO R1’
FROM R1 V1
WHERE P1(V1.A1)
INGRES – Detachment Example
Original query q1
SELECT E.ENAME
FROM EMP E, ASG G, PROJ J
WHERE E.ENO=G.ENO AND
J.PNO=G.PNO AND
J.PNAME='CAD/CAM'
INGRES – Detachment Example (cont.)
First use the one-variable predicate to get q11 and q’ such that q1 = q11 → q’ (q11 runs first and its result feeds q’)
q11:
SELECT J.PNO INTO JVAR
FROM PROJ J
WHERE J.PNAME='CAD/CAM'
q’:
SELECT E.ENAME
FROM EMP E, ASG G, JVAR
WHERE E.ENO=G.ENO
AND G.PNO=JVAR.PNO
INGRES – Detachment Example (cont.)
Then q’ is further decomposed into q12 → q13
q13:
SELECT E.ENAME
FROM EMP E, GVAR
WHERE E.ENO=GVAR.ENO
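The detachment chain can be checked end to end on a toy database. This is a sketch under several assumptions: the sample rows are invented; SQLite does not support `SELECT ... INTO`, so `CREATE TABLE ... AS SELECT` stands in for it; and the text of q12 does not survive in these notes, so the version below is a reconstruction inferred from q’ and q13 (it must produce GVAR with an ENO column from ASG and JVAR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP  (ENO TEXT, ENAME TEXT);
CREATE TABLE ASG  (ENO TEXT, PNO TEXT);
CREATE TABLE PROJ (PNO TEXT, PNAME TEXT);
INSERT INTO EMP  VALUES ('E1', 'Ann'), ('E2', 'Bob'), ('E3', 'Cy');
INSERT INTO ASG  VALUES ('E1', 'P1'), ('E2', 'P2'), ('E3', 'P1');
INSERT INTO PROJ VALUES ('P1', 'CAD/CAM'), ('P2', 'Database');
""")

# Original query q1, evaluated directly.
q1 = """SELECT E.ENAME
        FROM EMP E, ASG G, PROJ J
        WHERE E.ENO = G.ENO AND J.PNO = G.PNO AND J.PNAME = 'CAD/CAM'"""
direct = sorted(row[0] for row in conn.execute(q1))

# q11: detach the one-variable subquery on PROJ into JVAR.
conn.execute("""CREATE TABLE JVAR AS
                SELECT J.PNO AS PNO FROM PROJ J WHERE J.PNAME = 'CAD/CAM'""")

# q12 (assumed reconstruction): reduce ASG using JVAR, keep ENO in GVAR.
conn.execute("""CREATE TABLE GVAR AS
                SELECT G.ENO AS ENO FROM ASG G, JVAR WHERE G.PNO = JVAR.PNO""")

# q13: the final query over EMP and GVAR.
q13 = "SELECT E.ENAME FROM EMP E, GVAR WHERE E.ENO = GVAR.ENO"
decomposed = sorted(row[0] for row in conn.execute(q13))

print(direct, decomposed)
```

Each step touches at most two relations, and the intermediate relations JVAR and GVAR are smaller than the base relations they replace.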
System R
Static query optimization based on exhaustive search of the solution space
Simple (i.e., mono-relation) queries are executed according to the best access path
Execute joins
Determine the possible ordering of joins
Determine the cost of each ordering
Choose the join ordering with minimal cost
System R Algorithm
System R Algorithm - Example
Find names of employees working on the CAD/CAM project.
Assume
EMP has an index on ENO
ASG has an index on PNO
PROJ has an index on PNO and an index on PNAME
[Figure: join graph EMP -(ENO)- ASG -(PNO)- PROJ]
System R Example (cont.)
Choose the best access paths to each relation
EMP: sequential scan (no selection on EMP)
ASG: sequential scan (no selection on ASG)
PROJ: index on PNAME (there is a selection on PROJ based on
PNAME)
Determine the best join ordering
EMP ⋈ ASG ⋈ PROJ
ASG ⋈ PROJ ⋈ EMP
PROJ ⋈ ASG ⋈ EMP
ASG ⋈ EMP ⋈ PROJ
EMP × PROJ ⋈ ASG
PROJ × EMP ⋈ ASG
Select the best ordering based on the join costs evaluated according to the two join methods
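The six orderings can be generated mechanically from the join graph of the example (EMP and ASG share ENO, ASG and PROJ share PNO). The sketch below is illustrative only: `render` is a hypothetical helper that writes ⋈ when the next relation shares a join predicate with something already joined and × otherwise. Setting aside the orderings that contain a Cartesian product is a common heuristic and is shown here as an assumption.

```python
from itertools import permutations

# Join graph of the example: EMP -(ENO)- ASG -(PNO)- PROJ.
JOIN_EDGES = {frozenset(("EMP", "ASG")), frozenset(("ASG", "PROJ"))}


def render(order):
    """Write an ordering with ⋈ where a join predicate applies, × otherwise."""
    text, joined = order[0], {order[0]}
    for rel in order[1:]:
        connected = any(frozenset((rel, j)) in JOIN_EDGES for j in joined)
        text += (" ⋈ " if connected else " × ") + rel
        joined.add(rel)
    return text


orderings = [render(p) for p in permutations(("EMP", "ASG", "PROJ"))]
without_product = [o for o in orderings if "×" not in o]

for o in orderings:
    print(o)
```

Only the two orderings that start by pairing EMP with PROJ need a Cartesian product, because those relations share no join attribute.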
System R Example (cont.)
Alternative joins from each starting relation:
EMP: EMP ⋈ ASG, EMP × PROJ
ASG: ASG ⋈ EMP, ASG ⋈ PROJ
PROJ: PROJ ⋈ ASG, PROJ × EMP
Final plan:
select PROJ using index on PNAME
then join with ASG using index on PNO
then join with EMP using index on ENO