Overview of Query Evaluation: R&G Chapter 12
Lecture 13
Administrivia
Exams graded
HW2 due in a week
No Office Hours Today
Review: Storage
A DBMS has layers
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Now to Midterm 2
Review
We studied Relational Algebra
Many equivalent queries produce the same result
Which expression is most efficient?
We studied file organizations
Hash files, Sorted files, Clustered &
Unclustered Indexes
Compared scans, sorting, searches, insert,
delete
Today: costs to implement relational
operations
Thurs, Tues: Sorting, Joins
Overview (cont)
Query Evaluation involves:
Choosing an Access Path to get at each
table
Evaluating different algorithms for each
relational operator
Choosing the order to apply the relational
operators
These choices are interrelated
Overview (cont)
Overall goal: minimize I/Os
Algorithms for evaluating relational
operators use simple ideas extensively:
Indexing: Can use WHERE conditions to
retrieve small set of tuples (selections, joins)
Iteration: Sometimes it is faster to scan all tuples
even if there is an index. (Sometimes we can scan the
data entries in an index instead of the table
itself.)
Partitioning: By using sorting or hashing, we
can partition the input tuples and replace an
expensive operation by similar operations on
smaller inputs.
* Watch for these techniques as we discuss query evaluation!
Intermission: a preview of
sorting
Data can only be sorted when in memory
But tables often *much* bigger than
memory
One solution: merge sort
Everyone stand up
Go to the aisle by the windows
I will take 10 people at a time onto the
stage
I will sort each group of 10 on last name
from A to Z
Groups will then be merged
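Each pass reads and writes every page of the file
For N pages, number of passes: $\lceil \log_2 N \rceil + 1$
So total cost is: $2N(\lceil \log_2 N \rceil + 1)$
Example: two-way merge sort of a 7-page file (runs separated by |)
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9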
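The pass structure and I/O count of two-way merge sort can be checked with a small simulation. This is a minimal sketch, not the textbook's code: pages are modeled as Python lists, the function name `two_way_external_sort` is made up for illustration, and every pass is assumed to read and write each page exactly once.

```python
import heapq
from math import ceil, log2

def two_way_external_sort(pages):
    """Simulate two-way external merge sort over a list of pages.

    Returns (history, io_count): the runs produced by each pass, and the
    total page I/Os, counting one read and one write per page per pass.
    """
    n_pages = len(pages)
    # Pass 0: sort each page on its own, producing 1-page runs.
    runs = [sorted(page) for page in pages]
    history = [runs]
    io_count = 2 * n_pages
    # Later passes: merge runs two at a time until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        history.append(runs)
        io_count += 2 * n_pages
    return history, io_count

# The 7-page input file from the slide.
input_file = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
history, io_count = two_way_external_sort(input_file)
n = len(input_file)

print(len(history))                        # 4 passes
print(io_count)                            # 56
print(2 * n * (ceil(log2(n)) + 1))         # 56, matching the formula
```

The simulation holds everything in memory, so it only illustrates the counting argument; a real implementation would stream pages through a small number of buffers.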
Example 1
SELECT sname, bid FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND S.age > 99
Several possible rel. algebra queries:
$\pi_{sname,bid}(\sigma_{S.age>99}(S \bowtie R))$
$\pi_{sname,bid}((\sigma_{S.age>99} S) \bowtie R)$
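One way to see that applying the age selection before or after the join gives the same answer is to run both orders on a toy instance. The rows below are hypothetical sample data (not from the lecture), and the helper names `join` and `project` are illustrative only.

```python
# Hypothetical sample data, chosen only for illustration.
sailors = [
    {"sid": 1, "sname": "ann", "age": 101},
    {"sid": 2, "sname": "bob", "age": 35},
    {"sid": 3, "sname": "cy", "age": 100},
]
reserves = [
    {"sid": 1, "bid": 10},
    {"sid": 2, "bid": 11},
    {"sid": 3, "bid": 10},
    {"sid": 3, "bid": 12},
]

def join(left, right):
    """Nested-loops equijoin on sid; also reports how many pairs were compared."""
    out = [{**s, **r} for s in left for r in right if s["sid"] == r["sid"]]
    return out, len(left) * len(right)

def project(rows):
    """Keep (sname, bid), sorted so the two plans can be compared directly."""
    return sorted((row["sname"], row["bid"]) for row in rows)

# Plan A: join first, then select on age.
joined_a, pairs_a = join(sailors, reserves)
plan_a = project([row for row in joined_a if row["age"] > 99])

# Plan B: select on age first, then join.
old_sailors = [s for s in sailors if s["age"] > 99]
joined_b, pairs_b = join(old_sailors, reserves)
plan_b = project(joined_b)

print(plan_a == plan_b)   # True
print(pairs_a, pairs_b)   # 12 8
```

Both plans return the same rows, but the second one feeds fewer tuples into the join, which is the kind of difference an optimizer looks for.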
Projection
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
Expensive part is removing duplicates.
SQL systems don't remove duplicates unless the
keyword DISTINCT is specified in a query.
Sorting Approach: Sort on <sid, bid> and remove
duplicates. (Can optimize this by dropping unwanted
information while sorting.)
Hashing Approach: Hash on <sid, bid> to create
partitions. Load partitions into memory one at a time,
build in-memory hash structure, and eliminate
duplicates.
If there is an index with both R.sid and R.bid in the
search key, may be cheaper to sort data entries!
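The sorting and hashing approaches to duplicate elimination can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions: the input is already projected to (sid, bid) pairs, partitions are plain Python lists rather than disk files, and the function names and the partition count of 4 are made up for the example.

```python
from itertools import groupby

def distinct_by_sorting(rows):
    """Sort on (sid, bid); equal rows become adjacent, so keep one per group."""
    return [key for key, _group in groupby(sorted(rows))]

def distinct_by_hashing(rows, n_partitions=4):
    """Hash rows into partitions, then deduplicate one partition at a time."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row) % n_partitions].append(row)
    result = []
    for part in partitions:
        seen = set()  # in-memory hash structure for this partition only
        for row in part:
            if row not in seen:
                seen.add(row)
                result.append(row)
    return result

# Sample (sid, bid) pairs containing duplicates.
pairs = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]

print(distinct_by_sorting(pairs))
print(sorted(distinct_by_hashing(pairs)))
```

Duplicates always hash to the same partition, which is why each partition can be cleaned up independently of the others.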
Join: Sort-Merge ($R \bowtie_{i=j} S$)
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin
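A sort-merge join sorts both inputs on the join column and then advances through them in step, pairing up groups of rows with equal keys. The sketch below is a minimal in-memory version: the Reserves rows are the ones shown on the slide, while the Sailors rows and the function name `sort_merge_join` are assumptions added for illustration.

```python
def sort_merge_join(left, right, key_left, key_right):
    """Equijoin two lists of tuples by sorting both and merging matching groups."""
    left = sorted(left, key=key_left)
    right = sorted(right, key=key_right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1          # left key too small: advance left
        elif kl > kr:
            j += 1          # right key too small: advance right
        else:
            # Find the end of the group of equal keys on each side.
            i_end = i
            while i_end < len(left) and key_left(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Emit every pairing within the two matching groups.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out

# Assumed sample Sailors rows: (sid, sname).
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"),
           (44, "guppy"), (58, "rusty")]
# Reserves rows from the slide: (sid, bid, day, rname).
reserves = [
    (28, 103, "12/4/96", "guppy"), (28, 103, "11/3/96", "yuppy"),
    (31, 101, "10/10/96", "dustin"), (31, 102, "10/12/96", "lubber"),
    (31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin"),
]

joined = sort_merge_join(sailors, reserves, lambda s: s[0], lambda r: r[0])
for row in joined:
    print(row)
```

Sailors 22 and 44 have no reservations, so the merge skips past them without producing output.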
Cost Estimation
For each plan considered, must estimate
cost:
Must estimate cost of each operation in plan
tree.
Depends on input cardinalities.
We've already discussed how to estimate the cost
of operations (sequential scan, index scan, joins,
etc.)
Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5
RA Tree (root first):
$\pi_{sname}$
$\sigma_{bid=100 \wedge rating>5}$
$\bowtie_{sid=sid}$
Sailors, Reserves
Plan (root first):
$\pi_{sname}$ (On-the-fly)
$\sigma_{bid=100 \wedge rating>5}$ (On-the-fly)
$\bowtie_{sid=sid}$ (Simple Nested Loops)
Reserves, Sailors
Cost: 500+500*1000 I/Os
By no means the worst plan!
Misses several opportunities: selections could have been
"pushed" earlier, no use is made of any available indexes, etc.
Goal of optimization: To find more efficient plans that
compute the same answer.
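The arithmetic behind the cost figure 500+500*1000 can be spelled out as a small helper. This is a sketch under an assumption: the slide gives only the numbers, and the pattern used here (scan the outer input once, and scan the inner input once per outer page) is inferred from them. The function name is illustrative.

```python
def nested_loops_cost(outer_pages, inner_pages):
    """Page I/Os under the assumed pattern: outer once, inner once per outer page."""
    return outer_pages + outer_pages * inner_pages

cost = nested_loops_cost(500, 1000)
print(cost)  # 500500
```

At roughly half a million I/Os, the inner scans dominate the cost, which is why alternative plans try to shrink or avoid them.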
Alternative Plans 1
(No Indexes)
Plan (root first):
$\pi_{sname}$
$\bowtie_{sid=sid}$ (Sort-Merge Join), with inputs:
$\sigma_{bid=100}$ (Scan; write to temp T1)
$\sigma_{rating>5}$ (Scan; write to temp T2)
Alternative Plans 2
With Indexes
sname
(On-the-fly)
Summary
There are several alternative evaluation algorithms for
each relational operator.
A query is evaluated by converting it to a tree of operators
and evaluating the operators in the tree.
Must understand query optimization in order to fully
understand the performance impact of a given database
design (relations, indexes) on a workload (set of queries).
Two parts to optimizing a query:
Consider a set of alternative plans.
Must prune search space; typically, left-deep plans only.