0% found this document useful (0 votes)
107 views77 pages

Query Optimization in Relational Database Systems

Query optimization in relational database systems involves finding the most efficient way to evaluate a query by considering different execution plans composed of relational algebra operations. The optimizer represents each possible execution strategy as a query evaluation plan (QEP) and aims to select the best plan to minimize the cost of computing the query results. Some techniques used in optimization include pushing selections down, reordering joins and applying indexes to improve performance.

Uploaded by

Gilbert Heß
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
107 views77 pages

Query Optimization in Relational Database Systems

Query optimization in relational database systems involves finding the most efficient way to evaluate a query by considering different execution plans composed of relational algebra operations. The optimizer represents each possible execution strategy as a query evaluation plan (QEP) and aims to select the best plan to minimize the cost of computing the query results. Some techniques used in optimization include pushing selections down, reordering joins and applying indexes to improve performance.

Uploaded by

Gilbert Heß
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 77

Query Optimization in

Relational Database Systems


It is safer to accept any chance
that offers itself, and extemporize
a procedure to fit it, than to get a
good plan matured, and wait
for a chance of using it.

Thomas Hardy (1874)


in Far from the Madding Crowd

CS3223: Query Optimization 1


Query Optimization
• Since each relational op returns a relation, ops can be
composed!
• Queries that require multiple ops to be composed may
be composed in different ways - thus optimization is
necessary for good performance, e.g. A B C D can
be evaluated as follows:
• (((A B) C) D)
• ((A B) (C D))
• ((B A) (D C))
• …

CS3223: Query Optimization 2


Query Optimization
• Each strategy can be represented as a query evaluation plan
(QEP) - Tree of R.A. ops, with choice of algorithms for each
op.
SM HJ
NL
D
NL HJ INL
C

A B A B C D

(((A B) C) D) ((A B) (C D))

• Goal of optimization: To find the “best” plan that compute the


same answer (to avoid “bad” plans)
CS3223: Query Optimization 3
Motivating Examples

Sailors (sid: integer, sname: string, rating: integer, age: real)


Reserves (sid: integer, bid: integer, day: dates, rname: string)

• Reserves:
• Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
• Sailors:
• Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

CS3223: Query Optimization 4


SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
Example
R.bid=100 AND S.rating>5

sname
sname
sname
rating > 5

bid=100 rating > 5

sid=sid
sid=sid

bid=100 Sailors bid=100 rating > 5


sid=sid

Reserves Reserves Sailors


Sailors Reserves
Logical plans
CS3223: Query Optimization 5
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
Example (Cont)
R.bid=100 AND S.rating>5
Query Evaluation Plan:
• Cost: 500+500*1000 I/Os (On-the-fly)
sname
• Memory: 3

bid=100 rating > 5 (On-the-fly)

(Page Nested Loops)


sid=sid

Sailors Reserves

Physical plan
CS3223: Query Optimization 6
Alternative Plan 1 (No Indexes)
• Main difference: push selections down
• Assume 5 buffers, T1 = 10 pages (100 boats,
uniform distribution), T2 = 250 pages (10 ratings,
(On-the-fly)
uniform distribution) sname

• Cost of plan:
• Scan Reserves (1000) + write temp T1 (10 pages, if we (Sort-Merge)
have 100 boats, uniform distribution) sid=sid
• Scan Sailors (500) + write temp T2 (250 pages, if we
(T1) (T2)
have 10 ratings) bid=100 rating > 5
• Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)
• Total: 4060 page I/Os Reserves Sailors

CS3223: Query Optimization 7


Alternative Plan 2 (With Indexes)
• Clustered index on bid of Reserves
• 100,000/100 = 1000 tuples on 1000/100 = 10 pages
• Hash index on sid (format 2). Join column sid is a key sname
(On-the-fly)
for Sailors
• INL with pipelining (outer is not materialized) rating > 5 (On-the-fly)
• Project out unnecessary fields from outer doesn’t help
• At most one matching tuple, unclustered index (INL
sid=sid with pipelining )
on sid OK
• Did not push “rating>5” before the join. Why? bid=100 Sailors
• Cost? (Use hash
index; do
not write
• Selection of Reserves tuples (10 I/Os); for each, must Reserves result to
temp)
get matching Sailors tuple (1000*2.2); total 2210 I/Os

CS3223: Query Optimization 8


Overview of Query Optimization
SQL query
parse
parse tree
convert
answer
logical query plan
execute
apply laws
Pi
“improved” l.q.p
estimate result sizes pick best
l.q.p. +sizes {P1,C1>...}
consider physical plans estimate costs
{P1,P2,…..}
CS3223: Query Optimization 9
Example: SQL query

SELECT sname
FROM Sailors
WHERE sid IN (
SELECT sid
FROM Reserves
WHERE rname LIKE ‘Tan%’
);

(Find names of sailors whose reservation is made by


someone whose name begins with “Tan”)
CS3223: Query Optimization 10
Example: Parse Tree
<Query>

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Tuple> IN <Query>

sname Sailors <Attribute> ( <Query> )

sid <SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Attribute> LIKE <Pattern>

sid Reserves rname ‘Tan%’


CS3223: Query Optimization 11
Example: Improved Logical
Logical Query Plan Query Plan
sname sname
sid=sid
sid=sid

Sailors sid Sailors sid
rname LIKE ‘Tan%’ rname LIKE ‘TAN%’
Reserves Can we improve further?
Reserves
e.g. Push project to Sailors?
CS3223: Query Optimization 12
Example: Estimate Result Sizes

Need expected size

Sailors 

Reserves

CS3223: Query Optimization 13


Example: One Physical Plan

Hash join Parameters: join order,


memory size, project attributes,...

SEQ scan index scan


Parameters:
Select Condition,...

Sailors Reserves

CS3223: Query Optimization 14


Example: Estimate Plan Cost to
Find Optimal QEP
QEPs

P1 P2 …. Pn

C1 C2 …. Cn

Pick best!
CS3223: Query Optimization 15
Relational Algebra Equivalences
What about p1p2 … pn(R)?

CS3223: Query Optimization 16


Relational Algebra Equivalences

CS3223: Query Optimization 17


Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
RS=?

• Option 1 SUM
R  S = {a,a,b,b,b,b,b,c,c,c,d}
• Option 2 MAX
R  S = {a,a,b,b,b,c,c,d}

CS3223: Query Optimization 18


“SUM” is implemented
• Use “SUM” option for bag unions
• Some rules cannot be used for bags
• e.g. A s (B s C) = (A s B) s (A s C)

Let A, B and C be {x}


B B C = {x, x} A B (B B C) = {x}
A B B = {x} A B C = {x} (A B B) B (A B C) = {x, x}

CS3223: Query Optimization 19


Query Optimizer
• Find the “best” plan (more often avoid the bad plans)
• Comprises the following
• Plan space
• huge number of alternative, semantically equivalent plans
• computationally expensive to examine all
• Conventional wisdom: avoid bad plans
• need to include plans that have low cost
• Enumeration algorithm (Search space)
• search strategy (optimization algorithm) that searches through the plan
space
• has to be efficient (low optimization overhead)
• Cost model
• facilitate comparisons of alternative plans
• has to be “accurate”
CS3223: Query Optimization 20
Join Plan Notation

CS3223: Query Optimization 21


Plan Space
• Left-deep trees: right child has to be a base table
• Right-deep trees: left child has to be a base table
• Deep trees: one of the two children is a base table
• Bushy tree: unrestricted
space of bushy tree is the largest,
it also contains left/right/deep trees

D D

C C

A B C D A B A B
Bushy tree Left-deep tree Deep tree
in general, good plans can be found in left/deep tree
CS3223: Query Optimization 22
Query Plan Space for R S T

This has not accounted for the algorithms!


CS3223: Query Optimization 23
Search Algorithms (and Search Space)
• Exhaustive (Complete space)
• enumerate each possible plan, and pick the best
• Greedy Techniques (Very small – polynomial)
• smallest relation next
• smallest result next
• typically polynomial time complexity
• Randomized/Transformation Techniques (Large space – can be
complete if you run the algorithms indefinitely)
• System R approach (Almost complete)
• Dynamic Programming with Pruning

CS3223: Query Optimization 24


Multi-Join Queries
• Focus on multi-join queries first means dont consider selection/projection

• Join is the most expensive operations


• Selections and projections can be pushed down as early as
possible
• Query
• A query graph whose nodes are relations and edges represent a
join condition between the two nodes

CS3223: Query Optimization 25


Greedy Algorithm (Example)
• Heuristic 1: Smallest relation next
• Suppose Ri < Rk for i < k
R1

All plans must begin with R1


R2
small relation --> table will be small
R1 R3 --> intermediate result will be small

R5 R4 R2
R3 R4 R5
Join Graph All plans beginning with R2-R5 have been pruned!

CS3223: Query Optimization 26


Greedy Algorithm (Example)
• Smallest relation next
• What if R1 < R5 < R3 < R2 < R4???
using "smallest relation next" will result in large cross products and large
intermediate table

R2
Another heuristic:
R1 R3
Smallest result next?

R5 R4

CS3223: Query Optimization 27


plan space
algorithm

Cost Models
cost model

• Typically, a combination of CPU and I/O costs


• Objective is to be able to rank plans
• exact value is not necessary
• Relies on
• statistics on relations and indexes
• formulas to estimate CPU and I/O cost
• formulas to estimate selectivities of operators and intermediate
results

CS3223: Query Optimization 28


Cost Estimation
• For each plan considered, must estimate cost:
• Must estimate cost of each operation in plan tree
• Depends on input cardinalities
• Depends on buffer size, availability of indexes, algorithms used,
etc. ?
• We’ve already discussed how to estimate the cost of
operations (sequential scan, index scan, joins, etc.) ? R4

• Must estimate size of result for each operation in ?


R3
tree!
R1 R2
• Use information about the input relations
• Typical assumptions like uniform distribution of data and
independence of predicates can simplify size estimation but is
error prone

CS3223: Query Optimization 29


Statistics and Catalogs
• Need information about the relations and indexes involved
Catalogs typically contain at least:
• # tuples of R (||R||), #bytes in each R tuple (S(R))
• # blocks/pages to hold all R tuples (|R|)
• # distinct values in R for attribute A (V(R,A))
• NPages for each index
• Index height, low/high key values (Low/High) for each tree index
• Catalogs updated periodically
• Updating whenever data changes is too expensive; lots of
approximation anyway, so slight inconsistency ok

CS3223: Query Optimization 30


Estimation Assumptions
• Uniformity assumption
• Uniform distribution of attribute values
• Independence assumption
• Independent distribution of values in different attributes
• Inclusion assumption
• For R R.A=S.B S, if V(R, A) ≤ V(R, B), then A(R)  B(S)
• V(R, A) is the number of distinct values of R.A
assume key-foreign key relationship

CS3223: Query Optimization 31


Example
R A B C D A: 20 byte string
cat 1 10 a B: 4 byte integer
cat 1 20 b
dog 1 30 a
C: 8 byte string
dog 1 40 c D: 5 byte string
bat 1 50 d

||R|| = 5 S(R) = 37
V(R,A) = 3 V(R,C) = 5
V(R,B) = 1 V(R,D) = 4
CS3223: Query Optimization 32
Size estimate for W = Z=val (R)
||R||
R A B C D V(R,A)=3 ||W|| =
V(R,Z)
cat 1 10 a V(R,B)=1
cat 1 20 b
V(R,C)=5 S(W) = S(R)
dog 1 30 a
dog 1 40 c V(R,D)=4
bat 1 50 d
Assumption:
Values in select expression Z = val are uniformly
distributed over possible V(R,Z) values
Alternative assumption: use DOM(R,Z)
CS3223: Query Optimization 33
What about W = z  val (R)?
Solution: Estimate values in range
R Z
Min=1 V(R,Z)=10

W= z  15 (R)
Max=20

20-15+1
f (fraction of range) = = 6 ||W|| = f  ||R||
20-1+1 20

Alternative: (Max(Z)-value)/(Max(Z)-Min(Z))
CS3223: Query Optimization 34
W = R1 R2
R1 A B C R2 A D

Assumption:
V(R1,A)  V(R2,A)  Every A value in R1 is in R2
V(R2,A)  V(R1,A)  Every A value in R2 is in R1

“containment of value sets”

CS3223: Query Optimization 35


Computing T(W) when V(R1,A)  V(R2,A)

R1 A B C R2 A D
Take
1 tuple Match

1 tuple of R1 matches with ||R2|| tuples of R2


V(R2,A)
||R2||
so ||W|| =||R1|| 
V(R2,A)
||R2||  ||R1||
If V(R2,A)  V(R1,A) ||W|| =
V(R1,A)
CS3223: Query Optimization 36
For complex expressions, need
intermediate T,S,V results
E.g. W = [A=a (R1) ] R2

Treat as relation U

||U|| = ||R1||/V(R1,A) S(U) = S(R1)

Also need V (U, *) !!


CS3223: Query Optimization 37
Example
R1 A B C D V(R1,A)=3
cat 1 10 10 V(R1,B)=1
cat 1 20 20 V(R1,C)=5
dog 1 30 10 V(R1,D)=3
dog 1 40 30 U =  A=a (R1)
bat 1 50 10
||R1||
V(U,A) = 1? V(U, B) = 1? (= V(R, B)) V(U,C) = ?
V(R1,A)

V(D,U) … somewhere in between V(U,B) and V(U,C)


CS3223: Query Optimization 38
For Joins U = R1(A,B) R2(A,C)
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)

(Assumption: Preservation of value sets)

CS3223: Query Optimization 39


Example

Z = R1(A,B) R2(B,C) R3(C,D)

R1 ||R1|| = 1000 V(R1,A)=50 V(R1,B)=100


R2 ||R2|| = 2000 V(R2,B)=200 V(R2,C)=300
R3 ||R3|| = 3000 V(R3,C)=90 V(R3,D)=500

CS3223: Query Optimization 40


Z = R1(A,B) R2(B,C) R3(C,D)
||R1|| = 1000 V(R1,A)=50 V(R1,B)=100
Partial Result: U = R1 R2 ||R2|| = 2000 V(R2,B)=200 V(R2,C)=300
||R3|| = 3000 V(R3,C)=90 V(R3,D)=500

||U|| = 10002000 V(U,A) = 50


200 V(U,B) = 100
V(U,C) = 300

Z=U R3
V(Z,A) = 50
||Z|| = 100020003000 V(Z,B) = 100
200300 V(Z,C) = 90
V(Z,D) = 500
CS3223: Query Optimization 41
Errors in Estimating Size of Plan
• Errors
• source include uniformity assumption, and inability to
capture correlation, accuracy of cost model, statistics, etc.
• propagated to other operators at the higher level of the plan R4
tree
R3
• Dealing with errors
R1 R2
• Maintain more detailed statistics (at finer granularity)
• During runtime, may need to sample the actual intermediate
results
• dynamic query optimization

CS3223: Query Optimization 42


Statistical Summaries of Data
• More detailed information are sometimes stored e.g.,
histograms of the values in some attributes
• a histogram divides the values on a column into k buckets
• k is predetermined or computed based on space allocation
• several choices for “bucketization’’ of values
• If a table has n records, an equi-depth histogram divides the set of
values on a column into k ranges such that each range has
approximately the same number of records, i.e., n/k
• Equi-width histogram – each bucket has (almost) equal number of
values
• Witihin each bucket, records are uniformly distributed across the range
of the bucket
• Frequently occurring values may be placed in singleton buckets
CS3223: Query Optimization 43
Histograms
15 records are uniformly
distributed

3.5
3.0
2.25

CS3223: Query Optimization 44


Estimations with Histograms

3.5
3.0
2.25

Query Q:  A=6 (R) Query Q:  A=10 (R)

Actual value, ||Q|| = 3 Actual value, ||Q|| = 0


Without histogram, ||Q|| = 45/15 = 3 Without histogram, ||Q|| = 45/15 = 3
Equiwidth histogram, ||Q|| = 15/3 = 5 Equiwidth histogram, ||Q|| = 1
Equidepth histogram, ||Q|| = 14/4 = 3.5 Equidepth histogram, ||Q|| = 1.75

CS3223: Query Optimization 45


Query Evaluation

CS3223: Query Optimization 46


Query Evaluation Approaches
• Materialization evaluation
• An operator is evaluated only when each of its operands
has been completely evaluated or materialized
• Intermediate results are materialized to disk
• Pipelining evaluation
• The output produced by an operator is passed directly to
its parent operator
• Execution of operators is interleaved

CS3223: Query Optimization 47


Materialization Evaluation

CS3223: Query Optimization 48


Pipeline Evaluation: Iterator Model
consumer
Open()

Open()

Open() Open()

CS3223: Query Optimization 49


Pipeline Evaluation: Iterator Model
consumer
GetNext() t

All operators are


GetNext() t
running simultaneously

GetNext() GetNext()
tm ta

CS3223: Query Optimization 50


Pipeline Evaluation with Partial Materialization

Plan 1
Plan 2
CS3223: Query Optimization 51
Randomized Techniques
• Employ randomized/transformation techniques for query
optimization
• State space -- space of plans, State -- plan
• Each state has a cost associated with it
• determined by some cost model
• A move is a perturbation applied to a state to get to another
state
• a move set is the set of moves available to go from one state to another
• any one move is chosen from this move set randomly
• each move set has a probability associated to indicate the probability of selecting the move
• Two states are neighboring states if one move suffices to go
from one state to the other
CS3223: Query Optimization 52
Randomized Algorithm (Example)
State/QEP 1
R4 R4 Neighbor
R3 R3 of QEP 1

R1 R2 R2 R1

A move

Neighbor R4 R4
of QEP 1
R3 R3
R1 R2 R2 R1

CS3223: Query Optimization 53


More on Randomized Techniques
• A local minimum in the state space is a state such that its
cost is lower than that of all neighboring states
• A global minimum is a state which has the lowest cost
among all local minima
• at most one global minimum
• A move that takes one state to another state with a lower
cost is called a downward move; otherwise it is an upward
move
• in a local/global minimum, all moves are upward moves

CS3223: Query Optimization 54


Local Optimization
Repeat until
a near-optimal S = initialize() // initial plan
minimum
Is reached
minS = S // cost of plan S – currently the best
repeat {
repeat {
By doing so newS = move(S) // move to a new plan
repeatedly,
a local minimum if (cost(newS) < cost(S)) A move is accepted if it is a
downward move, i.e., has
can S = newS a lower cost
be reached } until (“local minimum reached”)
if (cost(S) < cost(minS))
minS = S
newStart(S); // iterate with a different initial plan
} until (“stopping condition satisfied”)
return (minS);
CS3223: Query Optimization 55
Issues on Local Optimization
• How is the start state obtained?
Run: sequence of
• The state in which we start a run
moves to a local
• The start state of the first run is the initial state minimum from the
• All start states should be different start state
• Should be obtained quickly
• Random
• greedy heuristics
• making a number of moves from the local minimum, except that this time each move
is accepted irrespective of whether it increases or decreases the cost
• How is the local minimum detected? need to move to a plan with a higher
cost in order to jump out of the local
minimum
• How is the stopping criterion detected?

CS3223: Query Optimization 56


Issues on Local Optimization (Cont)
• How is the local minimum detected?
• Not practical to examine all neighbors to verify that one has
reached a local minimum
• Based on random sampling
• examine a sufficiently large number of neighbors
• if any one is lower, we move to that state, and repeat the process
• if no tested neighbor is of lower cost, the current state can be considered a local
minimum
• the number of neighbors to examine can be specified as a parameter,
and is called the sequence length
• Can also be time-based

CS3223: Query Optimization 57


Issues in Local Optimization (Cont)

• How is the stopping criterion detected?


• Determines the number of times that the outer loop is
executed
• Can be fixed and is given by sizeFactor*N, where
sizeFactor is a parameter, N is the number of relations
• Why N? Can it be a constant?
if you have a 4 table join you might not need 500 initial plans.
if you have a 500 table join, 100 initial plans likely will not find an optimal plan

CS3223: Query Optimization 58


What about the MOVEs?
Transformation Rules
• Restricted to left-deep trees
• all possible permutations of the N relations
• let S be the current state, S = (… i … j … k …)
• swap
• select two relations, say i and j at random. Swapped i and j to get the new
state newS = ( … j … i … k … ) might result in a cross product being removed, hence
reducing the cost
• 3Cycle
• select three relations, say i and j and k at random. The move consists of
cycling i, j and k: i is moved to the position of j, j is moved to the position of k
and k is moved to the position of i. The resultant new state newS = ( … k …
i…j…)
• Other methods (e.g., join methods)? Bushy trees?

CS3223: Query Optimization 59


Comparison between Exhaustive,
Greedy and Randomized Algorithms
randomised algorithm might have high overhead
if it keeps repeating the same few plans.
• Search space
• Plan quality
• Optimization overhead

CS3223: Query Optimization 60


Dynamic Programming (Left-Deep Trees)
(System R)
• The algorithm proceeds by considering increasingly larger
subsets of the set of all relations
• Builds a plan bottom-up (beginning from 1 table, then to 2, and so on)
• Plans for a set of cardinality i are constructed as extensions of the
best plan for a set of cardinality i-1
• For each set of cardinality i, we only keep ONE best plan

CS3223: Query Optimization 61


Dynamic Programming (Cont)
{}

{1} {2} {3} {4}

{1 2} {1 3} {1 4} {2 3} {2 4} {3 4}

{1 2 3} {1 2 4} {2 3 4} {1 3 4}

{1 2 3 4}
CS3223: Query Optimization 62
Dynamic Programming (Left-Deep Trees)
(System R)
• The algorithm proceeds by considering increasingly larger
subsets of the set of all relations
• Builds a plan bottom-up (beginning from 1 table, then to 2, and so on)
• Plans for a set of cardinality i are constructed as extensions of the best
plan for a set of cardinality i-1
• Keep only ONE best plan for each set of cardinality i
• Search space can be pruned based on the principal of optimality
• if two plans differ only in a subplan, then the plan with the better subplan is
also the better plan

CS3223: Query Optimization 63


Principle of Optimality
{}

{1} {2} {3} {4}

{1 2} {1 3} {1 4} {2 3} {2 4} {3 4}

{1 2 3} {1 2 4} {2 3 4} {1 3 4}

{1 2 3 4}

CS3223: Query Optimization 64


Dynamic Programming (Left-Deep Trees)
(System R)
• The algorithm proceeds by considering increasingly larger subsets of
the set of all relations
• Builds a plan bottom-up (beginning from 1 table, then to 2, and so on)
• Plans for a set of cardinality i are constructed as extensions of the best plan for a set of
cardinality i-1
• Keep only ONE best plan for each set of cardinality i
• Search space can be pruned based on the principal of optimality
• if two plans differ only in a subplan, then the plan with the better subplan is also the better
plan
• Computation overhead reduced due to overlapping subproblems
• Multiple sets of cardinality i uses same set at cardinality (i-1)

CS3223: Query Optimization 65


Dynamic Programming (Left-Deep Trees)
• accessPlan(R) produces the best plan for relation (single
table) R
• joinPlan(p1,R) extends the (partial) join plan p1 into another
plan p2 in which the result of p1 is joined with R in the best
possible way
• p1 = R1 JOIN R2 JOIN R3
• p2 = joinPlan(p1, R) = (R1 JOIN R2 JOIN R3) JOIN R4
• Optimal plans for subsets are stored in optplan() array and
are reused rather than recomputed
CS3223: Query Optimization 66
Dynamic Programming (Cont)
for i = 1 to N
optPlan({Ri}) = accessPlan(Ri)
for i = 2 to N {
forall S subset of {R1, R2, … Rn} such that |S|=i {
bestPlan = dummy plan with infinite cost
forall Rj, Sj, |Sj| = i-1 such that S = {Rj} U Sj {
p = joinPlan(optPlan(Sj), Rj) {}

if cost(p) < cost(bestPlan) {1} {2} {3} {4}

bestPlan = p
} {1 2} {1 3} {1 4} {2 3} {2 4} {3 4}

optPlan(S) = bestPlan
} {1 2 3} {1 2 4} {2 3 4} {1 3 4}

}
Popt = optPlan{R1, R2, … Rn} {1 2 3 4}

CS3223: Query Optimization 67


Dynamic Programming: A Concrete Example

CS3223: Query Optimization 68


Enumeration of Single-relation Plans

CS3223: Query Optimization 69


Enumeration of Two-Relation Plans
s and t not considered because we want to avoid cross products

CS3223: Query Optimization 70


Enumeration of Three-Relation Plans

CS3223: Query Optimization 71


Optimal Plan

CS3223: Query Optimization 72


Dynamic Programming (Cont)
• Time & Space complexity
• For k relations, for left-deep trees, 2k – 1 entries!
DP is not used when k exceeds some value
• For bushy trees, O(3k) (about 15-20)

• Is DP (as presented) optimal?


not optimal if we also consider bushy trees, since DP doesnt include bushy trees

not optimal even within the subset of left-deep trees

CS3223: Query Optimization 73


Dynamic Programming (Cont)
• Time & Space complexity
• For k relations, for left-deep trees, 2k – 1 entries!
• For bushy trees, O(3k)
• Is DP optimal?
• DP may maintain multiple plans per subset of
relations
plans with properties that are beneficial in the future
• Interesting orders might be pruned away because they are not optimal now

plans that are sorted with a certain ordering, for example.


queries that have ordering
and group by. we might want to
keep plans that result in these
relations being sorted. more computataion overhead.

CS3223: Query Optimization 74


CS3223: Query Optimization 75
Dynamic Programming (Cont)
• Time & Space complexity
• For k relations, for left-deep trees, 2k – 1 entries!
• For bushy trees, O(3k)
• Is DP optimal?
• DP may maintain multiple plans per subset of
relations
• Interesting orders
• Is DP with interesting orders optimal?
gives the optimal query with respect to total cost, but not for other
cost models. but will still be a good solution.
CS3223: Query Optimization 76
Summary
• Query optimization is NP-hard
• Instead of finding the best, the objective is largely to
avoid the bad plans
• Many different optimization strategies have been
proposed
• greedy heuristics are fast but may generate plans that are
far from optimal
• dynamic programming is effective at the expense of high
optimization overhead

CS3223: Query Optimization 77

You might also like