DBT 2
Query Execution
Department of Computer Science and Engineering
List of Contents
- Query Execution
- Query Compilation
- Physical Query Plan - Operators
- The Computation Model for Physical Operators
DATABASE TECHNOLOGIES
Query Compilation
Query Compilation - Example
Physical query plans are built from operators, each of which implements one step of the plan.
Scanning - read the contents of a relation R. There are two approaches to locating the tuples of a
relation R:
1. Table scan – Read the blocks containing the tuples of R one by one from secondary storage.
2. Index scan – If there is an index on any attribute of R, use this index to get all the tuples of R.
Sorting
The physical-query-plan operator sort-scan takes a relation R and a specification of the attributes
on which the sort is to be made, and produces R in that sorted order.
The Computation Model for Physical Operators
• Open(): This method starts the process of getting tuples, but does not get a tuple.
• It initializes any data structures needed to perform the operation and calls Open() for
any arguments of the operation.
• Close(): This method ends the iteration after all tuples, or all tuples that
the consumer wanted, have been obtained.
Open()
{
R.Open();
CurRel := R;
}
Close()
{
R.Close();
S.Close();
}
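The Open() and Close() fragments above fit the usual iterator pattern, in which each operator also offers a method that returns the next tuple. The following is a minimal Python sketch of that pattern for a bag union of R and S, not the slides' implementation. The class names TableScan and BagUnion, the method name get_next, the in-memory lists standing in for disk blocks, and the use of None as the "no more tuples" signal are all assumptions made for illustration.

```python
class TableScan:
    """Iterator over an in-memory list that stands in for the blocks of a relation."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = None

    def open(self):
        # Initialise state only; no tuple is produced here.
        self.pos = 0

    def get_next(self):
        # Return the next tuple, or None once the relation is exhausted (assumed convention).
        if self.pos is None or self.pos >= len(self.tuples):
            return None
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        self.pos = None


class BagUnion:
    """Bag union of R and S: all tuples of R, followed by all tuples of S."""

    def __init__(self, r, s):
        self.r, self.s = r, s
        self.cur = None

    def open(self):
        # Mirrors the Open() fragment: open R and make it the current relation.
        self.r.open()
        self.cur = self.r

    def get_next(self):
        if self.cur is None:
            return None
        t = self.cur.get_next()
        if t is None and self.cur is self.r:
            # R is exhausted: switch to S and continue from there.
            self.s.open()
            self.cur = self.s
            t = self.cur.get_next()
        return t

    def close(self):
        # Mirrors the Close() fragment: close both arguments.
        self.r.close()
        self.s.close()
```

A consumer would call open() once, call get_next() until it returns None, and then call close().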
Database Technologies
Query Processing and Optimization
List of Contents
- One-Pass Algorithms for Tuple-at-a-Time Operations
- One-Pass Algorithms for Unary, Full-Relation
Operations (Duplicate Elimination, Grouping)
- One-Pass Algorithms for Binary Operations (Set Union,
Set intersection, Set Difference, Product, Natural Join)
Main Memory and Disk I/O Requirements for One-Pass Algorithms for Different Operations
List of Contents
- Two Pass Algorithms based on Sorting (Two-Phase Multiway
Merge-Sort, Duplicate Elimination, Grouping & Aggregation,
Union, Intersection, Difference, Join, Merge-Join)
- Two Pass Algorithms based on Hashing (Partitioning Relations,
Duplicate Elimination, Grouping & Aggregation, Union,
Intersection and Difference)
- Saving some disk I/Os
Query Execution – Two pass algorithms
• Phase 2 :
• Merge the sorted sublists into one sorted list with all the records as
follows.
• Find the smallest key among the first remaining elements of all the lists.
• Move the smallest element to the first available position of the output block
• If the output block is full, write it to disk and reinitialize the same buffer in main
memory to hold the next output block
• If the block from which the smallest element was just taken is now exhausted of
records, read the next block from the same sorted sublist into the same buffer
that was used for the block just exhausted. If no blocks remain, then leave its
buffer empty and do not consider elements from that list in any further
competition for smallest remaining elements.
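The merge step described above can be sketched in Python. This is an illustrative sketch only: it works on in-memory lists of keys rather than disk blocks, it uses a heap to find the smallest first remaining element, and the function names make_sorted_sublists and merge_sublists are assumptions. Phase 1 is not shown on this slide; the sketch assumes it sorts runs of at most m records each.

```python
import heapq


def make_sorted_sublists(records, m):
    """Phase 1 (assumed): cut the input into runs of at most m records and sort each run."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


def merge_sublists(sublists):
    """Phase 2: repeatedly move the smallest first remaining element to the output."""
    # Each heap entry is (key, sublist index, position inside that sublist).
    heap = [(sub[0], i, 0) for i, sub in enumerate(sublists) if sub]
    heapq.heapify(heap)
    output = []
    while heap:
        key, i, pos = heapq.heappop(heap)
        output.append(key)
        if pos + 1 < len(sublists[i]):
            # Advance within the same sublist, like refilling its buffer.
            heapq.heappush(heap, (sublists[i][pos + 1], i, pos + 1))
        # An exhausted sublist is not pushed back, so it leaves the competition.
    return output
```

For example, merge_sublists(make_sorted_sublists(list("FTHGKUAZB"), 3)) returns the nine letters in sorted order.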
Sort-Based Intersection
• Create sorted sublists from both R and S
• Use one main-memory buffer for each sublist of R and S. Initialize each with the first block
from the corresponding sublist
• Repeatedly find the first remaining tuple t among all the buffers.
• Copy t to the output if it appears in both R and S, then remove all copies of t from the buffers.
• The cost is 3 * (B(R) + B(S)) disk I/Os.
• The total size of the two relations must not exceed M². That is, B(R) + B(S) ≤ M².
Sort-Based Difference
• Create sorted sublists from both R and S
• Use one main-memory buffer for each sublist of R and S. Initialize each with the first block
from the corresponding sublist
• Repeatedly find the first remaining tuple t among all the buffers.
• To compute R − S, copy t to the output if it appears in R but not in S, then remove all
copies of t from the buffers.
• The cost is 3 * (B(R) + B(S)) disk I/Os.
• The total size of the two relations must not exceed M². That is, B(R) + B(S) ≤ M².
Sort-Based Join
• Consider relations R(X,Y ) and S(Y,Z) to join and M blocks of main memory for buffers
• Sort R and S using 2PMMS with Y as the sort key
• Merge the sorted R and S using only two buffers: one for the current block of R and the other for the current block of S
by repeating the following
• Find the least value y of the join attributes Y that is currently at the front of the blocks for R and S.
• If y does not appear at the front of the other relation, then remove the tuple(s) with sort key y.
• Otherwise, identify all the tuples from both relations having sort key y. If necessary, read blocks from the sorted R
and/or S, until we are sure there are no more y’s in either relation. As many as M buffers are available for this
purpose.
• Output all the tuples that can be formed by joining tuples from R and S that have a common Y -value y.
• If either relation has no more unconsidered tuples in main memory, reload the buffer for that relation
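The merge phase of the sort-based join can be sketched as follows. This is a simplified illustration that assumes both inputs are already sorted on Y and held as in-memory Python lists, so the buffer reloading steps are not modelled. The function name merge_join and the tuple layouts (x, y) for R and (y, z) for S are assumptions.

```python
def merge_join(r, s):
    """Join R(X, Y) and S(Y, Z); r holds (x, y) and s holds (y, z), both sorted on y."""
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        y_r, y_s = r[i][1], s[j][0]
        if y_r < y_s:
            i += 1  # the least y occurs only in R: skip it
        elif y_r > y_s:
            j += 1  # the least y occurs only in S: skip it
        else:
            y = y_r
            # Collect every tuple with this y-value from both relations.
            i_end = i
            while i_end < len(r) and r[i_end][1] == y:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == y:
                j_end += 1
            # Output all combinations that share the common y-value.
            for x, _ in r[i:i_end]:
                for _, z in s[j:j_end]:
                    out.append((x, y, z))
            i, j = i_end, j_end
    return out
```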
Analysis of “sort-merge-join”
• The number of disk I/O’s is 3 * (B(R) + B(S))
• The sizes of the sorted sublists are M blocks and there can be at most M of them among the two lists
• B(R) + B(S) ≤ M²
Example of “sort-merge-join”
• Consider joining relations R and S of sizes 1000 and 500 blocks respectively using 101 buffers
• Divide R into 10 sublists and S into 5 sublists each of length 100 and sort them
• Use 15 buffers to hold the current blocks of each of the sublists. Use the remaining 86 buffers to store tuples in case
there are many tuples with a fixed Y value
• We need to do three disk I/O’s per block of data. Two to create the sorted sublists and one for the block of every
sorted sublist that is read into main memory one more time in the multiway merging process.
• The total number of disk I/O’s = 3 * (1000 + 500) = 4500
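The arithmetic of this example can be checked with a short Python sketch. The function name sort_merge_join_plan and its return layout are assumptions; the sublist length is passed in explicitly because the example uses sublists of 100 blocks with 101 buffers.

```python
import math


def sort_merge_join_plan(b_r, b_s, m, sublist_len):
    """Return (number of sublists, spare buffers, disk I/Os) for a sort-merge-join."""
    n_sublists = math.ceil(b_r / sublist_len) + math.ceil(b_s / sublist_len)
    if sublist_len > m or n_sublists > m:
        raise ValueError("relations too large for a two-pass sort-merge-join")
    spare_buffers = m - n_sublists      # left over for tuples sharing one Y-value
    disk_ios = 3 * (b_r + b_s)          # two I/Os per block to build sublists, one to merge
    return n_sublists, spare_buffers, disk_ios


print(sort_merge_join_plan(1000, 500, 101, 100))
```

With the figures from the example this gives 15 sublists, 86 spare buffers and 4500 disk I/Os.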
• Consider relations R and S of 1000 and 500 blocks respectively using M = 101
• For hybrid hash-join, k = ⌈500 / 101⌉ = 5
• On average, each bucket will have 100 blocks of S
• To fit one of these buckets and four extra blocks for the other four buckets, we
need 104 blocks of main memory, which exceeds M = 101. Therefore, the in-memory
bucket may overflow
• So choose k = 6
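The choice of k can be reproduced with a small Python sketch. It assumes the rule implied by the example: one bucket of S, about B(S)/k blocks, stays in memory, and each of the other k − 1 buckets needs one buffer block, so B(S)/k + (k − 1) must not exceed M. The function name choose_k is hypothetical.

```python
import math


def choose_k(b_s, m):
    """Smallest k, starting from ceil(B(S)/M), with B(S)/k + (k - 1) <= M."""
    k = math.ceil(b_s / m)
    while b_s / k + (k - 1) > m:
        k += 1
    return k


print(choose_k(500, 101))
```

For B(S) = 500 and M = 101, k = 5 would need 100 + 4 = 104 blocks, so the loop moves on to k = 6, where roughly 83 + 5 blocks fit in memory.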
List of Contents
- Buffer Management
- Buffer Management Strategies (Least Recently Used, First In
First Out, Clock Algorithm)
- Relationship Between Physical Operator Selection and Buffer
Management
- Index Scan
- Clustering and Non Clustering Indexes
• First-In-First-Out (FIFO)
• When a buffer is needed, the buffer that has been occupied the longest by the
same block is emptied and used for the new block
• The buffer manager needs to know only the time at which the block currently
occupying a buffer was loaded into that buffer. An entry into a table is made
when the block is read from disk, and there is no need to modify the table when
the block is accessed
• It requires less maintenance than LRU, but it can make more mistakes. A block
that is used repeatedly, say the root block of a B-tree index, will eventually
become the oldest block in a buffer. It will be written back to disk, only to be
reread shortly thereafter into another buffer.
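The difference between FIFO and LRU on a repeatedly used block can be illustrated with a small simulation. This is a sketch under simplifying assumptions: blocks are plain strings, the pool holds a fixed number of buffers, and only misses (disk reads) are counted. The function name count_misses and the reference string are made up for illustration.

```python
from collections import OrderedDict


def count_misses(refs, n_buffers, policy):
    """Count disk reads for a sequence of block references under FIFO or LRU."""
    if policy not in ("FIFO", "LRU"):
        raise ValueError("policy must be 'FIFO' or 'LRU'")
    pool = OrderedDict()  # insertion order doubles as the eviction order
    misses = 0
    for block in refs:
        if block in pool:
            if policy == "LRU":
                pool.move_to_end(block)  # LRU updates its table on every access
            continue                     # FIFO leaves the load order untouched
        misses += 1
        if len(pool) >= n_buffers:
            pool.popitem(last=False)     # evict the oldest entry
        pool[block] = True
    return misses


# A "root" block is touched between accesses to other blocks (hypothetical trace).
refs = ["root", "a", "root", "b", "root", "c", "root", "d"]
fifo_misses = count_misses(refs, 2, "FIFO")
lru_misses = count_misses(refs, 2, "LRU")
```

On this trace with two buffers, FIFO evicts the root block even though it is in constant use, so it incurs more misses than LRU.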
Query Execution – Buffer Management
• Hash-Based algorithms
• We can reduce the number of buckets if M shrinks, as long as the buckets do not then become so large
that they do not fit in allotted main memory. However, these algorithms cannot respond to changes in
M while the algorithm executes.
Query Execution – Indexes
Index Scan
If there is an index on any attribute of R, we may be able to use this index to get
all the tuples of R.
For example, a sparse index on R can be used to lead us to all the blocks holding R,
even if we don’t know otherwise which blocks these are.
This operation is called Index-scan.
The important observation is that the index can be used not only to get all the tuples
of the relation it indexes, but also to get only those tuples that have a particular value
(or sometimes a particular range of values) in the attribute or attributes that form
the search key for the index.
Figure: A clustering index has all tuples with a fixed value packed into (close to) the minimum
possible number of blocks
Query Compiler
List of Contents
- Query Parsing and Preprocessing
- Syntax Analysis and Parse Trees
- Grammar for a Simple Subset of SQL
- Preprocessor
- Preprocessing Queries Involving Views
The Query Compiler – Parser
Example:
• Consider the following relations
StarsIn (movieTitle, movieYear, starName)
SELECT movieTitle
FROM StarsIn
WHERE starName IN
( SELECT name FROM MovieStar
WHERE birthdate LIKE '%1960' );
In the parse tree, syntactic categories are the intermediate nodes and atoms are the leaf nodes.
The Query Compiler – Preprocessor
Preprocessor
• The preprocessor is also responsible for semantic checking
• Semantic rules
1. Check relation uses - Every relation mentioned in a FROM-clause must be a relation or
view in the current schema
2. Check and resolve attribute uses - Every attribute that is mentioned in the SELECT or
WHERE clause must be an attribute of some relation in the current scope.
3. Check types - All attributes must be of a type appropriate to their use. Operators are
checked to see that they apply to values of appropriate and compatible types. For example, attribute
birthdate can be treated as a string, so using it with LIKE is valid.
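The first two semantic rules can be sketched in Python. This is a simplified illustration, not a real preprocessor: the schema is a plain dictionary, the MovieStar entry lists only the two attributes that the example query uses, and the function name check_uses and its error messages are assumptions. The ambiguity check is an added assumption about how attribute uses might be resolved.

```python
# Hypothetical catalogue: relation name -> set of attribute names.
SCHEMA = {
    "StarsIn": {"movieTitle", "movieYear", "starName"},
    "MovieStar": {"name", "birthdate"},  # only the attributes used in the example query
}


def check_uses(schema, from_list, attributes):
    """Return a list of error messages for relation and attribute uses."""
    errors = []
    # Rule 1: every relation in the FROM-clause must exist in the schema.
    for rel in from_list:
        if rel not in schema:
            errors.append(f"unknown relation: {rel}")
    in_scope = [rel for rel in from_list if rel in schema]
    # Rule 2: every attribute must belong to some relation in the current scope.
    for attr in attributes:
        owners = [rel for rel in in_scope if attr in schema[rel]]
        if not owners:
            errors.append(f"unknown attribute: {attr}")
        elif len(owners) > 1:
            errors.append(f"ambiguous attribute: {attr}")
    return errors
```

For instance, check_uses(SCHEMA, ["StarsIn"], ["movieTitle", "starName"]) reports no errors, while a misspelled relation name is reported together with any attributes that can no longer be resolved.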
Query:
SELECT title
FROM ParamountMovies
WHERE year = 1979;
List of Contents
- Laws for Selection
- Laws for Projection
- Laws for Joins and Product
- Laws for Duplicate Elimination
- Laws for Grouping and Aggregation
Query Execution – Query Compiler
• One of the most important rules of efficient query processing is to push selections
down the expression tree as far as they will go without changing what the expression computes
1) σ_{C1 AND C2}(R) = σ_{C1}(σ_{C2}(R))
2) σ_{C1 OR C2}(R) = σ_{C1}(R) ∪ σ_{C2}(R) (R must be a set and ∪ is set union)
3) σ_{C1}(σ_{C2}(R)) = σ_{C2}(σ_{C1}(R))
4) σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)
5) σ_C(R − S) = σ_C(R) − S
6) σ_C(R − S) = σ_C(R) − σ_C(S)
7) σ_C(R ⋈ S) = σ_C(R) ⋈ S
8) σ_C(R ⋈_D S) = σ_C(R) ⋈_D S
9) σ_C(R ∩ S) = σ_C(R) ∩ S
10) σ_C(R × S) = R × σ_C(S)
11) σ_C(R × S) = σ_C(R) × S
12) σ_C(R ⋈ S) = σ_C(R) ⋈ σ_C(S)
Laws 7, 8 and 11 apply when C refers only to attributes of R, law 10 when C refers only to
attributes of S, and law 12 when every attribute in C appears in both R and S.
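The effect of pushing a selection below a join can be checked on a small example. The following Python sketch represents tuples as dictionaries and verifies that selecting after the join gives the same result as selecting on R first, for a condition that mentions only attributes of R. The helper names select and natural_join and the sample relations are assumptions made for illustration.

```python
def select(rel, pred):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def natural_join(r, s):
    """⋈: combine tuples that agree on all attributes they have in common."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


# Hypothetical relations R(a, b) and S(b, c).
R = [{"a": 1, "b": 10}, {"a": 5, "b": 20}]
S = [{"b": 10, "c": "x"}, {"b": 20, "c": "y"}]
cond = lambda t: t["a"] > 2          # mentions only an attribute of R

late = select(natural_join(R, S), cond)    # select after joining
early = natural_join(select(R, cond), S)   # selection pushed down to R
```

Both plans produce the same tuples, but the second one joins a smaller input, which is the reason for pushing the selection down.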
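• R ⋈_C S = σ_C(R × S)
• R ⋈ S = π_L(σ_C(R × S)), where C equates each pair of attributes of R and S that share a
name, and L keeps one copy of each equated pair along with all other attributes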
Laws for Duplicate Elimination
Query:
SELECT movieYear, MAX(birthdate)
FROM MovieStar, StarsIn
WHERE name = starName
GROUP BY movieYear;
List of Contents
- Conversion to Relational Algebra
- Removing Subqueries From Conditions
- Improving the Logical Query Plan
- Improving the Logical Query Plan - most commonly used
optimization techniques
- Grouping Associative/Commutative Operators
1. Conversion to Relational Algebra - Transform SQL parse trees to algebraic logical query plans.
2. Improving the Logical Query Plan – Rewrite the logical plan using algebraic laws
i. The product of all the relations mentioned in the <FromList>, which is the argument of:
ii. A selection σC , where C is the expression in the construct being replaced, which in turn
SELECT movieTitle
FROM StarsIn
WHERE starName IN
( SELECT name
FROM MovieStar
WHERE birthdate LIKE '%1960');
Improving the Logical Query Plan - most commonly used optimization techniques
• Selections can be pushed down the expression tree as far as they can go.
• If a selection condition is the AND of several conditions, then we can split the condition and push
each piece down the tree separately. This strategy is probably the most effective improvement
technique.
• Projections can be pushed down the tree, or new projections can be added.
• Duplicate eliminations can sometimes be removed, or moved to a more convenient position in
the tree
• Certain selections can be combined with a product below to turn the pair of operations into an
equijoin, which is generally much more efficient to evaluate than are the two operations
separately.
Example:
SELECT movieTitle
FROM StarsIn
WHERE starName IN
( SELECT name
FROM MovieStar
WHERE birthdate LIKE '%1960');
List of Contents
- Introduction to Cost-Based Plan Selection
- Obtaining Estimates for Size Parameters
- Computation of Statistics
- Heuristics for Reducing the Cost of Logical Query Plans
- Approaches to Enumerating Physical Plans
Query Compiler
Example of Obtaining Estimates for Size Parameters using the Equal-Width Histogram
• SELECT Jan.day, July.day FROM Jan, July WHERE Jan.temp = July.temp;
• If two corresponding bands have T1 and T2 tuples respectively and the number of values
in a band is V, then the estimate for the number of tuples in the join of those bands is
T1*T2/V
• Many of these products are 0, because one or the other of T1 and T2 is 0. The only bands
for which neither is 0 are 40–49 and 50–59.
• Estimate for 40-49 = 10 * 5 / 10 = 5
• Estimate for 50-59 = 5 * 20 / 10 = 10
• Estimate for the size of this join is 5 + 10 = 15 tuples
• Estimate using the simpler method discussed earlier = 245 * 245 / 100 ≈ 600 tuples
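The band-by-band estimate can be written as a short Python sketch. Only the two bands quoted above are included, since every other band contributes 0. The dictionary layout and the function name histogram_join_estimate are assumptions.

```python
def histogram_join_estimate(hist1, hist2, values_per_band):
    """Sum T1 * T2 / V over the bands that are present in both histograms."""
    return sum(
        hist1[band] * hist2[band] / values_per_band
        for band in hist1.keys() & hist2.keys()
    )


# Tuple counts for the two bands where neither relation is empty.
jan = {"40-49": 10, "50-59": 5}
july = {"40-49": 5, "50-59": 20}

estimate = histogram_join_estimate(jan, july, 10)
```

Here the two bands contribute 5 and 10 tuples, matching the estimate of 15 tuples.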
Computation of Statistics
• Statistics normally are computed periodically.
• The re-computation of statistics might be triggered automatically after some period of time or after
some number of updates
• Computing statistics for an entire relation R can be very expensive, particularly if we compute V (R,a)
for each attribute a in the relation
• One common approach is to compute approximate statistics by sampling only a fraction of the data
• In a small sample of R, say 1% of its tuples, if we find that most of the a-values we see are different,
then it is likely that V (R,a) is close to T(R).
• If we find that the sample has very few different values of a, then it is likely that we have seen most of
the a-values
• Top-down: Work down the tree of the logical query plan from the root. For each possible
implementation of the operation at the root, consider each possible way to evaluate its
argument(s) and compute the cost of each combination, taking the best
• Bottom-up: For each subexpression of the logical-query-plan tree, compute the costs of
all possible ways to compute that subexpression. The possibilities and costs for a
subexpression E are computed by considering the options for the subexpressions of E and
combining them in all possible ways with implementations for the root operator of E
Query Compiler- Approaches to Enumerating Physical Plans
Heuristic Enumeration
• Use the same approach to selecting a physical plan that is generally used for selecting a logical plan. That is, make a
sequence of choices based on heuristics.
• Most commonly used heuristic approaches:
i. If the logical plan calls for a selection σ_{A=c}(R) and relation R has an index on attribute A, then perform an
index-scan to obtain only the tuples of R with A-value equal to c.
ii. If the selection involves one condition like A = c and other conditions as well, implement the selection by an
index scan followed by a further selection on the tuples, which shall be represented by the physical operator
filter.
iii. If an argument of a join has an index on the join attribute(s), then use an index-join with that relation in the
inner loop
iv. If one argument of a join is sorted on the join attribute(s), then prefer a sort-join to a hash-join, although not
necessarily to an index-join if one is possible.
v. When computing union or intersection of three or more relations, group the smallest relations first
Hill Climbing
• Start with a heuristically selected physical plan.
• Make small changes to the plan, e.g., replacing one method for executing an operator by
another, or reordering joins by using the associative and/or commutative laws, to find
“nearby” plans that have lower cost.
• If you find a plan such that no small modification yields a plan of lower cost, choose that
plan as the physical query plan.
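The hill-climbing loop can be sketched in Python. This is a toy illustration under loud assumptions: a plan is just a left-deep join order, a "small change" swaps two adjacent relations, and the cost is the sum of intermediate sizes with every join treated as a product. The names hill_climb, adjacent_swaps, toy_cost and the SIZES table are hypothetical.

```python
def hill_climb(start, neighbours, cost):
    """Move to the cheapest nearby plan until no neighbour has a lower cost."""
    current, current_cost = start, cost(start)
    while True:
        candidates = list(neighbours(current))
        if not candidates:
            return current, current_cost
        best = min(candidates, key=cost)
        best_cost = cost(best)
        if best_cost >= current_cost:
            return current, current_cost  # no small modification improves the plan
        current, current_cost = best, best_cost


def adjacent_swaps(order):
    """Nearby plans: swap two adjacent relations in the join order."""
    for i in range(len(order) - 1):
        swapped = list(order)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield tuple(swapped)


SIZES = {"R": 1000, "S": 10, "T": 100}  # hypothetical relation sizes


def toy_cost(order):
    """Sum of intermediate-result sizes; the final result is not counted."""
    total, size = 0, SIZES[order[0]]
    for name in order[1:-1]:
        size *= SIZES[name]  # toy assumption: each join behaves like a product
        total += size
    return total


plan, plan_cost = hill_climb(("R", "T", "S"), adjacent_swaps, toy_cost)
```

Starting from (R, T, S), the search settles on (R, S, T), which is only a local minimum in this toy model: the order (S, T, R) is cheaper but is not reachable through a single improving swap.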
15
DATABASE TECHNOLOGIES
Query Compiler- Approaches to Enumerating Physical Plans
Dynamic Programming
• This is a variation of the general bottom-up strategy
• For each subexpression only the plan with least cost is considered
• As we work up the tree, consider possible implementations of each node, assuming the
best plan for each subexpression
16
DATABASE TECHNOLOGIES
Query Compiler- Approaches to Enumerating Physical Plans
Selinger-style Optimization
• This approach improves upon the dynamic-programming approach by keeping for each
subexpression not only the plan of least cost, but certain other plans that have higher
cost, yet produce a result that is sorted in an order that may be useful higher up in the
expression tree
• If we take the cost of a plan to be the sum of the sizes of the intermediate relations, then
there appears to be no advantage to having an argument sorted
• If we use the more accurate measure disk I/O’s as the cost, then the advantage of having
an argument sorted becomes clear if we can use one of the sort-based algorithms and
save the work of the first pass for the argument that is sorted already
List of Contents
- Significance of Left and Right Join Arguments
- Join Trees
- Left-Deep, Right-Deep & Bushy Join Trees
- Dynamic Programming to Select a Join Order and Grouping
- A Greedy Algorithm for Selecting a Join Order
• A critical problem in cost-based optimization is selecting an order for the (natural) join of three or more
relations;
• Join methods are asymmetric. The roles played by the two argument relations are different and the cost of
the join depends on which relation plays which role;
• In a one-pass join, read one relation, preferably the smaller, into main memory, creating a structure such as a
hash table to facilitate matching of tuples from the other relation. Then read the other relation one block
at a time to join its tuples with the tuples stored in memory;
• Hash join: Assume the left argument of the join is the smaller relation and store it in a main-memory data
structure. This relation is called the build relation. The right argument of the join, called the probe relation,
is read a block at a time and its tuples are matched in main memory with those of the build relation;
• Nested-loop join: Assume the left argument is the relation of the outer loop.
• Index-join: Assume the right argument has the index.
Query Compiler- Choosing an Order for Joins
Join Trees
• When two relations are to be joined, there are only two choices for a join tree— take either of the
two relations to be the left argument
• When the join involves more than two relations, the number of possible join trees grows rapidly.
There are n! ways to order n relations
• Consider joining four relations R, S, T and U
Left Deep Join Tree
• A binary tree is Left-deep if all the right children are leaves
• Left-deep trees for joins interact well with common join algorithms, nested-loop joins and one-
pass joins in particular.
• Query plans based on left-deep trees plus these join implementations will tend to be more efficient
than the same algorithms used with non-left-deep trees.
• The “leaves” in a left or right-deep join tree can actually be interior nodes with operators other than a join
• The total number of tree shapes
• If one-pass joins are used and the build relation is on the left, then the amount of memory needed at any
one time tends to be smaller than if we used a right-deep tree or a bushy tree for the same relations.
• If we use nested-loop joins with the relation of the outer loop on the left, then we avoid constructing any
intermediate relation more than once
1. Consider the pairs of relations. The cost for each is 0, since there are still no intermediate relations
in a join of two relations
Consider {R,S,T}. We must consider each of the three pairs {R, S}, {R, T}, {S, T} in turn.
Cost of {R, S} = 5,000, {R, T} = 1,000,000 and {S, T} = 2,000. Pick {S, T} since it has the lowest cost
5. Consider joining all four relations
1. Consider the pairs of relations. The pair {T, U} has the least size
6. NOTE: In the two examples solved above, the tree resulting from the greedy
algorithm is the same as that selected by the dynamic programming algorithm.
7. However, there are examples where the greedy algorithm fails to find the best