CSE 444: Database Internals: Section 4: Query Optimizer
Plan for Today
• Problem 1A, 1B: Estimating cost of a plan
– You try to compute the cost for 5 mins
– We go over the solution together
Query:
SELECT S.name
FROM Student S, Book B, Checkout C
WHERE S.sid = C.sid
AND B.bid = C.bid
AND B.author = 'Olden Fames'
AND S.age > 12
AND S.age < 20
S(sid,name,age,addr)
B(bid,title,author)
C(sid,bid,date)
Assumptions
• Student: S, Book: B, Checkout: C
Q. Compute
1. the cost and cardinality in steps (a) to (g)
2. the total cost
Assumptions:
• Unclustered B+tree index on B.author
• Clustered B+tree index on C.bid
• All index pages are in memory
• Unlimited memory
Query plan (bottom to top):
(a) Selection author = 'Olden Fames' on Book B (Index scan)
(b) Projection on bid (On the fly)
(c) Join on bid with Checkout C (Index scan) (Indexed-nested loop, B outer, C inner)
(d) Projection on sid (On the fly)
(e) Join on sid with Student S (File scan) (Block nested loop, S inner)
(f) Selection 12 < age < 20 (On the fly)
(g) Projection on name (On the fly)
Schema and statistics:
• S(sid,name,age,addr): T(S)=10,000, B(S)=1,000, 7 <= age <= 24
• B(bid,title,author): Unclustered B+ tree on author, T(B)=50,000, B(B)=5,000, V(B,author)=500
• C(sid,bid,date): Clustered B+ tree on bid, T(C)=300,000, B(C)=15,000
Solution – 1B
(a) Index scan on B.author (unclustered):
cost = T(B) / V(B, author) = 50,000/500 = 100
cardinality = 100
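The step (a) estimate can be checked with a short Python sketch. The function name index_selection_cost and its parameters are illustrative and do not come from the slides. It assumes uniformly distributed values and that index pages are in memory (as the problem states), so only data-page reads are counted.

```python
def index_selection_cost(num_tuples, num_pages, distinct_values, clustered):
    """Estimate (I/O cost, cardinality) of an equality selection via an index.

    Assumes uniformly distributed values and index pages already in memory,
    so only data-page reads are counted.
    """
    cardinality = num_tuples / distinct_values
    if clustered:
        # Matching tuples sit on adjacent pages: read a fraction of the file.
        cost = num_pages / distinct_values
    else:
        # Worst case for an unclustered index: one page read per matching tuple.
        cost = cardinality
    return cost, cardinality


# Step (a): unclustered B+ tree on B.author,
# T(B) = 50,000, B(B) = 5,000, V(B, author) = 500.
cost_a, card_a = index_selection_cost(50_000, 5_000, 500, clustered=False)
print(cost_a, card_a)  # 100.0 100.0
```

Passing clustered=True for the same statistics would give B(B) / V(B, author) = 10 page reads, which is why the clustering of the index matters for the estimate.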
Query:
SELECT S.sid, R.rname
FROM Sailors S, Boats B, Reserves R
WHERE S.sid = R.sid
AND B.bid = R.bid
AND B.color = 'red'
Example is from the Ramakrishnan book
S (sid, sname, srating, age)
B (bid, bname, color)
R (sid, bid, date, rname)
Available Indexes
• Sailors: S, Boats: B, Reserves: R
First Pass
• Where to start?
– How to access each relation, assuming it would be the first
relation being read
– File scan is also available!
• Sailors?
– No selection matching an index, use File Scan (no overhead)
• Reserves?
– Same as Sailors
• Boats?
– Hash index on color, matches B.color = 'red'
– B+ tree also matches the predicate, but the hash index is cheaper for an equality predicate
• B+ tree would be cheaper for range queries
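The first-pass reasoning above can be sketched in Python. This is a minimal illustration, not the optimizer's actual logic: the INDEXES and SELECTIONS dictionaries and the function first_pass_access_path are hypothetical names, and the rule "prefer a hash index for equality, otherwise a B+ tree, otherwise a file scan" stands in for a real cost comparison.

```python
# Hypothetical catalog: relation -> list of (index kind, indexed attribute).
INDEXES = {
    "Sailors": [("btree", "sid"), ("hash", "sid")],
    "Boats": [("btree", "color"), ("hash", "color")],
    "Reserves": [("btree", "sid"), ("clustered_btree", "bid")],
}

# Single-relation selection predicates in the query:
# relation -> (attribute, comparison operator).
SELECTIONS = {"Boats": ("color", "=")}


def first_pass_access_path(relation):
    """Pick an access path for `relation`, assuming it is read first."""
    if relation not in SELECTIONS:
        # No selection to match against an index: plain file scan.
        return "file scan"
    attr, op = SELECTIONS[relation]
    kinds = {kind for kind, indexed_attr in INDEXES[relation] if indexed_attr == attr}
    if op == "=" and "hash" in kinds:
        # Rule of thumb from the slides: hash index is cheaper for equality.
        return f"hash index on {attr}"
    if "btree" in kinds or "clustered_btree" in kinds:
        return f"B+ tree index on {attr}"
    return "file scan"


for rel in ("Sailors", "Reserves", "Boats"):
    print(rel, "->", first_pass_access_path(rel))
```

Changing the Boats predicate operator to a range comparison such as "<" would make the sketch fall through to the B+ tree, matching the remark about range queries.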
Available indexes:
• S (sid, sname, srating, age): 1. B+tree on sid, 2. hash index on sid
• B (bid, bname, color): 1. B+tree on color, 2. hash index on color
• R (sid, bid, date, rname): 1. B+tree on sid, 2. clustered B+tree on bid
Second Pass
• What next?
– For each of the plans retained in Pass 1, taken as outer, consider joining
another relation as inner
• What are the combinations? How many new options?
Outer           Inner  Option 1   Option 2     Option 3
R (file scan)   B      B+-color   hash color   File scan
R (file scan)   S      B+-sid     hash sid     File scan
S (file scan)   B      B+-color   hash color   File scan
S (file scan)   R      B+-sid     Cl. B+ bid   File scan
B (hash index)  R      B+-sid     Cl. B+ bid   File scan
B (hash index)  S      B+-sid     hash sid     File scan
• Which outer-inner combinations can be discarded?
– B, S and S, B: Cartesian product!
• Outer R (file scan), inner B
– (B+-color): Not useful
– (hash color): Consider all methods; select those tuples where B.color = 'red' using the
color index (note: no index on bid)
• Outer S (file scan), inner R
– (B+-sid): Consider all methods
– (Cl. B+ bid): Not useful
• Outer B (hash index), inner R
– (B+-sid): Not useful
– (Cl. B+ bid):
2A. Index nested loop join (no hash index on bid)
2B. Sort-merge join (clustered, index sorted on bid, produces outputs in sorted order by
bid, retained if cheaper)
• Keep the least cost plan between
– (R, S) and (S, R)
– (R, B) and (B, R)
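The enumeration and the Cartesian-product pruning above can be sketched in Python. The names RELATIONS, JOIN_EDGES and second_pass_pairs are illustrative. The sketch only decides which outer-inner pairs survive and does not compare costs.

```python
from itertools import permutations

RELATIONS = ["S", "B", "R"]

# Join predicates from the query: S.sid = R.sid and B.bid = R.bid.
JOIN_EDGES = {frozenset({"S", "R"}), frozenset({"B", "R"})}


def second_pass_pairs():
    """Split all ordered (outer, inner) pairs into kept and discarded lists."""
    kept, discarded = [], []
    for outer, inner in permutations(RELATIONS, 2):
        if frozenset({outer, inner}) in JOIN_EDGES:
            kept.append((outer, inner))
        else:
            # No join predicate connects the two: Cartesian product.
            discarded.append((outer, inner))
    return kept, discarded


kept, discarded = second_pass_pairs()
print(kept)       # [('S', 'R'), ('B', 'R'), ('R', 'S'), ('R', 'B')]
print(discarded)  # [('S', 'B'), ('B', 'S')]
```

A real optimizer would then cost each kept pair under every applicable join method and keep the cheapest plan per pair of relations.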
Third Pass
• For each option retained in Pass 2, join with the third
relation
• E.g.
– Boats (B+tree on color) – sort-merge join – Reserves
(B+tree on bid)
– Join the result with Sailors (B+ tree on sid) using sort-merge join
• Need to sort (B join R) by sid; it was sorted on bid before
• Outputs tuples sorted by sid
• Not useful here, but would be useful if we had GROUP BY on sid
• In general, a higher-cost plan with an “interesting” order may be retained (e.g.
sort operator at root, grouping attribute in a later GROUP BY, join
attribute in a later join)
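The idea of retaining a more expensive plan because of its output order can be sketched as follows. This is a simplified illustration under stated assumptions: the function prune, the tuple layout, and the cost numbers are all made up for the example, and every sort order is treated as interesting, whereas a real optimizer would first check whether the order is needed later.

```python
def prune(plans):
    """Keep the cheapest plan per (relation set, output order).

    `plans` is a list of (relations, order, cost) tuples, where `order` is the
    attribute the output is sorted on, or None if the output is unsorted.
    """
    best = {}
    for relations, order, cost in plans:
        key = (frozenset(relations), order)
        if key not in best or cost < best[key][2]:
            best[key] = (relations, order, cost)
    return sorted(best.values(), key=lambda plan: plan[2])


# Made-up costs, for illustration only.
candidates = [
    (("B", "R"), None, 900),    # e.g. index nested loop join, unsorted output
    (("B", "R"), "bid", 1200),  # e.g. sort-merge join, output sorted on bid
    (("B", "R"), "bid", 1500),  # costlier plan with the same output order
]

for plan in prune(candidates):
    print(plan)
```

With these numbers the sketch keeps both the cheapest unsorted plan (cost 900) and the cheapest plan sorted on bid (cost 1200), and drops only the plan that is dominated by another with the same order.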
Tomorrow, Lecture 12
• Pseudocode for Selinger optimization as
dynamic programming
• Complexity of the algorithm