Lecture09 Optimization
Data Management
Lecture 9
Query Optimization (Part 1)
[Figure: Query → Select Logical Plan → logical plan → Select Physical Plan → physical plan → Query Execution → Disk; the two plan-selection steps make up query optimization]
Query Optimization
Goal:
• Why difficult:
– Need to explore a large number of plans
– Need to estimate the cost of each plan
Query Optimization
Three major components:
2. Search space
• Independence
• Containment of values
• Preservation of values
CSEP 544 - Spring 2021 9
Size Estimation
Selectivity Factors
T(σ_{pred}(R)) = θ_{pred} * T(R)
Uniformity assumption
Equality: σ_{A=c}(R)
• θ_{A=c} = 1/V(R,A)
Range: σ_{c1<A<c2}(R)
• θ_{c1<A<c2} = (c2 – c1)/(max(R,A) – min(R,A))
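The equality and range formulas under the uniformity assumption can be sketched in a few lines of Python. The function names and the sample numbers (T(R) = 1000, V(R,A) = 20, A ranging over 0..100) are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def eq_selectivity(v_r_a):
    """θ_{A=c} = 1 / V(R,A): every distinct value is assumed equally frequent."""
    return Fraction(1, v_r_a)

def range_selectivity(c1, c2, min_a, max_a):
    """θ_{c1<A<c2} = (c2 - c1) / (max(R,A) - min(R,A))."""
    if max_a <= min_a:
        raise ValueError("max(R,A) must exceed min(R,A)")
    return Fraction(c2 - c1, max_a - min_a)

def selection_size(t_r, theta):
    """T(σ_pred(R)) = θ_pred * T(R)."""
    return theta * t_r

# Hypothetical relation: T(R) = 1000, V(R,A) = 20, A between 0 and 100.
size_eq = selection_size(1000, eq_selectivity(20))
size_range = selection_size(1000, range_selectivity(10, 30, 0, 100))
```

Exact fractions are used so the estimates do not pick up floating-point noise; a real optimizer would typically work with floats.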
Selectivity Factors
Join: R ⋈_{R.A=S.B} S
• θ_{R.A=S.B} = 1 / max(V(R,A), V(S,B))
Containment of values: if V(R,A) ≤ V(S,B), then
the set of A values of R is included in the set of
B values of S
Assume V(R,A) ≤ V(S,B)
• Tuple t in R joins with T(S)/V(S,B) tuples in S
• Hence T(R ⋈_{A=B} S) = T(R) T(S) / V(S,B)
In general:
• T(R ⋈_{A=B} S) = T(R) T(S) / max(V(R,A), V(S,B))
• θ_{R.A=S.B} = 1 / max(V(R,A), V(S,B))
• Examples next...
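The general join-size formula translates directly into code. This is a minimal sketch; the function names and the sample statistics are hypothetical and chosen only to exercise the formula.

```python
from fractions import Fraction

def join_selectivity(v_r_a, v_s_b):
    """θ_{R.A=S.B} = 1 / max(V(R,A), V(S,B)), relying on containment of values."""
    return Fraction(1, max(v_r_a, v_s_b))

def join_size(t_r, t_s, v_r_a, v_s_b):
    """T(R ⋈_{A=B} S) = T(R) * T(S) / max(V(R,A), V(S,B))."""
    return t_r * t_s * join_selectivity(v_r_a, v_s_b)

# Hypothetical statistics: T(R) = 1000, T(S) = 2000, V(R,A) = 50, V(S,B) = 100.
estimate = join_size(1000, 2000, 50, 100)
```

Because the formula takes the maximum of the two distinct counts, swapping R and S gives the same estimate.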
[Figure: join tree Supply ⋈ Supplier on sid = sid]
Supplier(sid, sname, scity, sstate)
Supply(sid, pno, quantity)
T(Supply) = 10000, B(Supply) = 100, V(Supply, pno) = 2500
T(Supplier) = 1000, B(Supplier) = 100, V(Supplier, scity) = 20, V(Supplier, sstate) = 10
M = 11
Logical Query Plan 1
SELECT sname
FROM Supplier x, Supply y
WHERE x.sid = y.sid
and y.pno = 2
and x.scity = ‘Seattle’
and x.sstate = ‘WA’
[Figure: plan tree, top to bottom, with estimated cardinalities]
π_{sname}
σ_{pno=2 ∧ scity=‘Seattle’ ∧ sstate=‘WA’} (estimated T < 1; why?)
⋈_{sid = sid} (T = 10000)
Supply, Supplier
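One way to see why the estimate after the selection drops below 1: start from the join estimate T = 10000 and multiply by the selectivity of each equality predicate, assuming the three predicates are independent. The sketch below uses the V values from the example statistics; the variable and function names are illustrative.

```python
from fractions import Fraction

T_JOIN = 10000  # estimated size of Supply ⋈ Supplier, as given in the plan
V = {"pno": 2500, "scity": 20, "sstate": 10}  # distinct-value counts

def conjunction_selectivity(distinct_counts):
    """Product of 1/V for each equality predicate (independence assumption)."""
    theta = Fraction(1)
    for v in distinct_counts:
        theta *= Fraction(1, v)
    return theta

theta = conjunction_selectivity(V.values())
estimate = T_JOIN * theta  # 10000 / (2500 * 20 * 10)
```

The result is 1/50 of a tuple, so the optimizer expects the selection to return fewer than one row.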
[Figure: logical plan with the selections pushed below the join: σ_{pno=2}(Supply) ⋈_{sid = sid} σ_{scity=‘Seattle’ ∧ sstate=‘WA’}(Supplier)]
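With the selections pushed down, each join input can be estimated on its own from the example statistics. A minimal sketch, assuming uniformity and independence of scity and sstate; the helper name is hypothetical.

```python
from fractions import Fraction

def eq_selection_size(t_r, *distinct_counts):
    """T(R) times the product of 1/V for each equality predicate on R."""
    size = Fraction(t_r)
    for v in distinct_counts:
        size *= Fraction(1, v)
    return size

# σ_{pno=2}(Supply): T(Supply) = 10000, V(Supply, pno) = 2500
supply_side = eq_selection_size(10000, 2500)
# σ_{scity='Seattle' ∧ sstate='WA'}(Supplier): T = 1000, V(scity) = 20, V(sstate) = 10
supplier_side = eq_selection_size(1000, 20, 10)
```

The join now sees about 4 and 5 input tuples instead of 10000 and 1000.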
Physical Plan 1
[Figure: plan tree, top to bottom]
π_{sname}
σ_{pno=2 ∧ scity=‘Seattle’ ∧ sstate=‘WA’} (T < 1)
⋈_{sid = sid}, block nested loop join (T = 10000)
Scan Supply, Scan Supplier
Total cost: B(Supply) + B(Supply) * B(Supplier) / (M – 1) = 100 + 100*100/10 = 1100
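The block nested loop cost above can be reproduced with a small helper. This sketch assumes M – 1 buffer pages hold a chunk of the outer relation and one page streams the inner relation, which matches the division by 10 for M = 11; the function name is illustrative.

```python
import math

def block_nested_loop_cost(b_outer, b_inner, m):
    """I/O cost: read the outer once, plus one full inner scan per outer chunk."""
    if m < 2:
        raise ValueError("need at least 2 buffer pages")
    chunks = math.ceil(b_outer / (m - 1))  # number of (M-1)-page outer chunks
    return b_outer + chunks * b_inner

# B(Supply) = 100, B(Supplier) = 100, M = 11
cost = block_nested_loop_cost(100, 100, 11)
```

Both relations have 100 blocks here, so the choice of outer relation does not change the cost.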
Physical Plan 2
[Figure: plan tree, top to bottom]
π_{sname} (T = 4)
⋈_{sid = sid}, main memory join
Left input: σ_{pno=2} on Supply, unclustered index lookup Supply(pno), T = 4
Right input: σ_{sstate=‘WA’} (T = 5) over σ_{scity=‘Seattle’} on Supplier, unclustered index lookup Supplier(scity), T = 50
Cost of Supply(pno) = 4
Cost of Supplier(scity) = 50
Total cost: 54
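The two lookup costs follow from charging one I/O per matching tuple on an unclustered index, i.e. T(R)/V(R,A). The sketch below makes that assumption explicit and ignores the cost of traversing the index itself; the function name is hypothetical.

```python
from fractions import Fraction

def unclustered_lookup_cost(t_r, v_r_a):
    """Unclustered index on A: about one I/O per matching tuple, T(R)/V(R,A)."""
    return Fraction(t_r, v_r_a)

cost_supply = unclustered_lookup_cost(10000, 2500)  # Supply(pno)
cost_supplier = unclustered_lookup_cost(1000, 20)   # Supplier(scity)
total = cost_supply + cost_supplier                 # join and σ_sstate run in memory

# The remaining filter σ_{sstate='WA'} keeps 1/V(Supplier, sstate) of the 50 tuples.
after_sstate = cost_supplier * Fraction(1, 10)
```

The join and the sstate filter add no I/O because both inputs fit in memory.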
Physical Plan 3
πsname
T=4
Assume V = 10
Histograms
Employee(ssn, name, age)
T(Employee) = 25000, V(Employee, age) = 50
T(σ_{age=48}(Employee)) = ?
Estimate: T(Employee) / V(Employee,age) = 500
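To contrast the uniform estimate with a histogram-based one, here is a minimal sketch. The bucket boundaries and counts are hypothetical illustration data (chosen to sum to 25000), not values from the slides, and the estimate assumes one distinct age per integer within a bucket.

```python
from fractions import Fraction

def uniform_estimate(t_r, v_r_a):
    """T(R) / V(R,A): every distinct value assumed equally frequent."""
    return Fraction(t_r, v_r_a)

# Hypothetical eq-width style histogram on age: (low inclusive, high exclusive, count)
BUCKETS = [(0, 20, 200), (20, 30, 800), (30, 40, 5000),
           (40, 50, 12000), (50, 60, 6500), (60, 70, 500)]

def histogram_estimate(buckets, value):
    """Bucket count divided by bucket width (uniformity within the bucket only)."""
    for low, high, count in buckets:
        if low <= value < high:
            return Fraction(count, high - low)
    return Fraction(0)  # value falls outside every bucket

uniform = uniform_estimate(25000, 50)
with_histogram = histogram_estimate(BUCKETS, 48)
```

With these made-up buckets the histogram estimate for age = 48 is more than twice the uniform one, which is the kind of skew histograms are meant to capture.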
V= 3 10 7 6 5 4
• Eq-Depth
• V-Optimal histograms
Employee(ssn, name, age)
Histograms
Eq-width:
Eq-depth: