QEII
Introduction to Database Systems
Query Evaluation II
1.1
Cost-based Query Sub-System
[Figure: queries such as "Select * From Blah B Where B.blah = blah" enter the Query Parser, then the Query Optimizer (which consults the Schema and Statistics), then the Query Plan Evaluator.]
1.2
Review - Relational Operations
Join ( ⋈ ) Allows us to combine two relations.
Set-difference ( - ) Tuples in reln. 1, but not in reln. 2.
Union ( ∪ ) Tuples in reln. 1 or in reln. 2.
Also: Aggregation (SUM, MIN, etc.) and GROUP BY
Since each op returns a relation, ops can be composed! After we cover
the operations, we will discuss how to optimize queries formed by
composing them.
1.3
Selection (filter) Operators
1.4
Schema for Examples
1.5
Simple Selections
SELECT *
FROM Reserves R
WHERE R.date > '1/1/2015'
Of the form σ_{R.attr op value}(R)
Question: how best to perform? Depends on:
what indexes/access paths are available
what is the expected size of the result (in terms of number of tuples and/or
number of pages)
Size of result approximated as (size of R) * selectivity
1.6
Alternatives for Simple Selections
With no index, unsorted:
Must essentially scan the whole relation
cost is M (#pages in R). For “Reserves” = 1000 I/Os.
With no index, sorted on day:
cost of binary search + number of pages containing results.
For “Reserves” = 10 I/Os (about log2 of 1000 pages) + selectivity*1000 I/Os.
With an index on selection attribute:
Use index to find qualifying data entries,
then retrieve corresponding data records.
(Hash index useful only for equality selections.)
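The no-index estimates above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and the 10% selectivity are assumptions for illustration, and the 1000-page figure is the one quoted for Reserves.

```python
import math

def selection_costs(num_pages, selectivity):
    """Rough I/O estimates for a simple selection when no index is available."""
    # Unsorted file: every page must be read.
    unsorted_scan = num_pages
    # Sorted file: binary search for the first match, then read the
    # pages that hold qualifying tuples.
    binary_search = math.ceil(math.log2(num_pages))
    sorted_file = binary_search + round(selectivity * num_pages)
    return {"unsorted_scan": unsorted_scan, "sorted_file": sorted_file}

# Assumed example: 1000 pages, 10% of the tuples qualify.
costs = selection_costs(num_pages=1000, selectivity=0.10)
print(costs)
```

With a low selectivity the sorted file wins by a wide margin; as selectivity approaches 1 the two costs converge.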
1.7
Using an Index for Selections
1.8
Selections using Index (cont)
Important refinement for unclustered indexes:
1.10
Two Approaches to General Selections
1.11
Most Selective Index - Example
Consider day < 8/9/94 AND bid=5 AND sid=3.
A B+ tree index on day can be used;
then, bid=5 and sid=3 must be checked
for each retrieved tuple.
Similarly, a hash index on <bid, sid> could be used;
Then, day<8/9/94 must be checked.
How about a B+tree on <rname,day>?
How about a B+tree on <day, rname>?
How about a Hash index on <day, rname>?
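One way to work through the three questions is the usual textbook matching rule, stated here as an assumption since the slide text does not spell it out: a hash index is usable only if every search-key attribute has an equality term, while a B+ tree is usable if the terms cover a prefix of its search key. The function name and the dictionary encoding of terms are hypothetical.

```python
def index_matches(kind, search_key, terms):
    """terms maps attribute -> operator for the ANDed selection terms."""
    if kind == "hash":
        # A hash index needs an equality term on every search-key attribute.
        return all(terms.get(attr) == "=" for attr in search_key)
    if kind == "btree":
        # A tree index needs terms on a non-empty prefix of the search key,
        # which means at least its first attribute must be constrained.
        return search_key[0] in terms
    raise ValueError(f"unknown index kind: {kind}")

# day < 8/9/94 AND bid = 5 AND sid = 3
terms = {"day": "<", "bid": "=", "sid": "="}

for kind, key in [("btree", ["day"]), ("hash", ["bid", "sid"]),
                  ("btree", ["rname", "day"]), ("btree", ["day", "rname"]),
                  ("hash", ["day", "rname"])]:
    print(kind, key, index_matches(kind, key, terms))
```

Under this rule only the B+ tree on <day, rname> among the three "how about" indexes is usable, and only for the day term.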
1.12
Intersection of Rids
Second approach: if we have 2 or more matching indexes
(w/Alternatives (2) or (3) for data entries):
Get sets of rids of data records using each matching index.
Then intersect these sets of rids.
Retrieve the records and apply any remaining terms.
Consider day<8/9/94 AND bid=5 AND sid=3. With a B+ tree
index on day and an index on sid, we can retrieve rids of
records satisfying day<8/9/94 using the first, rids of recs
satisfying sid=3 using the second, intersect, retrieve records
and check bid=5.
Note: commercial systems use various tricks to do this:
bit maps, bloom filters, index joins
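The rid-intersection approach can be sketched with Python sets. The tiny table and the two lookup helpers are made up for illustration; the helpers stand in for real index lookups that would return rids without touching the data records.

```python
# Hypothetical Reserves records: rid -> (sid, bid, day as an ISO date string).
reserves = {
    1: (3, 5, "1994-07-01"),
    2: (3, 7, "1994-06-15"),
    3: (4, 5, "1994-05-20"),
    4: (3, 5, "1994-09-30"),
}

def rids_with_day_before(cutoff):
    # Stand-in for a range lookup in the B+ tree index on day.
    return {rid for rid, (_, _, day) in reserves.items() if day < cutoff}

def rids_with_sid(sid):
    # Stand-in for an equality lookup in the index on sid.
    return {rid for rid, (s, _, _) in reserves.items() if s == sid}

# Step 1: get a rid set from each matching index, then intersect.
candidates = rids_with_day_before("1994-08-09") & rids_with_sid(3)

# Step 2: retrieve only the surviving records and check the remaining term.
result = [rid for rid in sorted(candidates) if reserves[rid][1] == 5]
print(result)  # [1]
```

Only the records whose rids survive the intersection are fetched, which is the point of the technique.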
1.13
Join Operators
1.15
Equality Joins With One Join Column
SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid
1.16
Simple Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S do
    if ri == sj then add <r, s> to result
For each tuple in the outer relation R, we scan the entire inner relation S.
How much does this Cost?
(pR * M) * N + M = 100,000*500 + 1000 I/Os (about 50M I/Os!)
At 10ms/IO, Total: ???
What if the smaller relation (S) was the outer?
(pS * N) * M + N = 40,000*1000 + 500 I/Os (better: about 40M I/Os)
Prohibitively expensive…
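The arithmetic, including the elapsed time at 10 ms per I/O, can be reproduced as below. The parameter values are inferred from the figures above (M = 1000 pages and pR = 100 tuples per page for R, N = 500 pages and pS = 80 tuples per page for S); the function name is hypothetical.

```python
M, p_R = 1000, 100   # R: pages, tuples per page (inferred from the slide)
N, p_S = 500, 80     # S: pages, tuples per page (inferred from the slide)

def simple_nlj_cost(outer_pages, outer_tuples_per_page, inner_pages):
    # Scan the outer once, and scan the whole inner once per outer tuple.
    return outer_tuples_per_page * outer_pages * inner_pages + outer_pages

cost_r_outer = simple_nlj_cost(M, p_R, N)   # 50,001,000 I/Os
cost_s_outer = simple_nlj_cost(N, p_S, M)   # 40,000,500 I/Os

# At 10 ms per I/O, convert the R-outer cost to hours.
hours_r_outer = cost_r_outer * 0.010 / 3600
print(cost_r_outer, cost_s_outer, round(hours_r_outer))
```

With R as the outer, the join takes roughly 139 hours of I/O time, which is why the plan is called prohibitively expensive.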
1.17
Page-Oriented Nested Loops Join
foreach page bR in R do
  foreach page bS in S do
    foreach tuple r in bR do
      foreach tuple s in bS do
        if ri == sj then add <r, s> to result
For each page of R, get each page of S, and write out matching pairs of
tuples <r, s>, where r is in R-page and s is in S-page.
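A small runnable version of the page-oriented loop, with a counter for page reads, is sketched below. Pages are modeled as Python lists of tuples and the sample data is invented; the counter comes out to M*N + M for M outer pages and N inner pages.

```python
def page_nested_loops_join(r_pages, s_pages, r_key=0, s_key=0):
    """Join two relations stored as lists of pages; also count page reads."""
    result = []
    page_reads = 0
    for r_page in r_pages:
        page_reads += 1                      # read one page of the outer
        for s_page in s_pages:
            page_reads += 1                  # read one page of the inner
            for r in r_page:
                for s in s_page:
                    if r[r_key] == s[s_key]:
                        result.append((r, s))
    return result, page_reads

# Hypothetical data: Reserves as (sid, bid), Sailors as (sid, name).
r_pages = [[(1, 101), (2, 102)], [(1, 103), (3, 104)]]
s_pages = [[(1, "Ann")], [(2, "Bob")], [(4, "Cy")]]

pairs, reads = page_nested_loops_join(r_pages, s_pages)
print(len(pairs), reads)  # 3 8
```

The inner relation is scanned once per outer page rather than once per outer tuple, so the tuples-per-page factor drops out of the cost.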
1.18
Block Nested Loops Join
Page-oriented NL doesn’t exploit extra buffers.
Alternative approach: Use one page as an input buffer for scanning
the inner S, one page as the output buffer, and use all remaining
pages to hold a “block” of outer R.
For each matching tuple r in R-block, s in S-page, add <r, s> to
result. Then read next R-block, scan S, etc.
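The block scheme can be sketched as follows, assuming B buffer pages of which B-2 hold the outer block. Building an in-memory hash table over each block is an added convenience for finding matches, not something the text above requires. The page-read counter follows the usual estimate M + ceil(M / (B-2)) * N.

```python
import math
from collections import defaultdict

def block_nested_loops_join(r_pages, s_pages, num_buffers, r_key=0, s_key=0):
    block_size = num_buffers - 2   # one page for S input, one for output
    if block_size < 1:
        raise ValueError("need at least 3 buffer pages")
    result, page_reads = [], 0
    for start in range(0, len(r_pages), block_size):
        block = r_pages[start:start + block_size]
        page_reads += len(block)             # read the next R-block
        table = defaultdict(list)            # hash the block on the join key
        for page in block:
            for r in page:
                table[r[r_key]].append(r)
        for s_page in s_pages:               # one full scan of S per block
            page_reads += 1
            for s in s_page:
                for r in table.get(s[s_key], []):
                    result.append((r, s))
    return result, page_reads

# Hypothetical data: 4 pages of R, 3 pages of S, B = 4 buffers.
r_pages = [[(1, "r1")], [(2, "r2")], [(3, "r3")], [(1, "r4")]]
s_pages = [[(1, "s1")], [(3, "s2")], [(5, "s3")]]
pairs, reads = block_nested_loops_join(r_pages, s_pages, num_buffers=4)
print(len(pairs), reads)  # 3 10
```

More buffers mean fewer blocks and therefore fewer scans of S, which is exactly the memory that page-oriented nested loops leaves unused.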
1.19
Examples of Block Nested Loops
Cost:
1.20
Index Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S where ri == sj do
    add <r, s> to result
If there is an index on the join column of one relation (say S), can make it
the inner and exploit the index.
Cost: M + ( (M*pR) * cost of finding matching S tuples)
For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4
for B+ tree.
Cost of then finding S tuples (assuming Alt. (2) or (3) for data entries)
depends on clustering.
Clustered index: 1 I/O per page of matching S tuples.
Unclustered: up to 1 I/O per matching S tuple.
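The cost formula can be turned into a small estimator. The scenario below is an assumption for illustration: R is the outer with 1000 pages and 100 tuples per page, S has a hash index on the join column, and each R tuple matches exactly one S tuple, so fetching it costs one more I/O.

```python
def index_nlj_cost(outer_pages, outer_tuples_per_page, probe_cost, fetch_cost):
    """Estimated I/Os: scan the outer once, then probe the index per outer tuple."""
    outer_tuples = outer_pages * outer_tuples_per_page
    return outer_pages + outer_tuples * (probe_cost + fetch_cost)

# Assumed numbers: about 1.2 I/Os per hash-index probe, 1 I/O to fetch the
# single matching S tuple.
cost = index_nlj_cost(1000, 100, probe_cost=1.2, fetch_cost=1.0)
print(f"about {cost:,.0f} I/Os")
```

If an outer tuple matched many inner tuples, fetch_cost would grow, and the gap between a clustered and an unclustered index would start to dominate the estimate.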
1.21
Examples of Index Nested Loops
1.22
Sort-Merge Join (R ⋈ S)
Sort R and S on the join column, then scan them to do a “merge” (on join
col.), and output result tuples.
Particularly useful if
one or both inputs are already sorted on join attribute(s)
output is required to be sorted on join attribute(s)
“Merge” phase can require some backtracking if duplicate values appear in
join column
R is scanned once; each S group is scanned once per matching R tuple.
(Multiple scans of an S group will probably find needed pages in buffer.)
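The merge step, including the backtracking over an S group when R has duplicates, can be sketched in memory as below. A real system would sort externally and stream pages; here Python's sorted stands in for both sorts, and the sample tuples are invented.

```python
def sort_merge_join(r, s, r_key=0, s_key=0):
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda t: t[s_key])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][r_key] < s[j][s_key]:
            i += 1
        elif r[i][r_key] > s[j][s_key]:
            j += 1
        else:
            key = r[i][r_key]
            group_start = j                  # start of the matching S group
            while i < len(r) and r[i][r_key] == key:
                j = group_start              # back up: rescan the S group
                while j < len(s) and s[j][s_key] == key:
                    result.append((r[i], s[j]))
                    j += 1
                i += 1
    return result

# Hypothetical data with duplicate join values on both sides.
r = [(28, "a"), (22, "b"), (28, "c"), (31, "d")]
s = [(28, "x"), (28, "y"), (31, "z"), (58, "w")]
pairs = sort_merge_join(r, s)
print(len(pairs))  # 5
```

Each S group is rescanned once per matching R tuple, which mirrors the scan pattern described above.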
1.23
Example of Sort-Merge Join
1.25
Refinement of Sort-Merge Join
We can combine the merging phases in the sorting of R and S with the
merging required for the join.
Pass 0 as before, but apply to both R then S before merge.
If B > √L, where L is the size of the larger relation, using the sorting
refinement that produces runs of length 2B in Pass 0, #runs of each relation is
< B/2.
In “Merge” phase: Allocate 1 page per run of each relation, and “merge” while
checking the join condition.
Cost: read+write each relation in Pass 0 + read each relation in (only) merging
pass (+ writing of result tuples).
In example, cost goes down from 7500 to 4500 I/Os for B=300.
In practice, the I/O cost of sort-merge join, like the cost of external sorting,
is linear.
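The drop from 7500 to 4500 I/Os can be reproduced under stated assumptions: R has 1000 pages, S has 500 pages, and with B = 300 each relation sorts in two passes. The unrefined cost is then two read-and-write passes per relation plus one read of each in the merge; the refined cost is one read-and-write pass plus one read.

```python
import math

M, N, B = 1000, 500, 300  # pages of R, pages of S, buffer pages (assumed)

def two_pass_sort_cost(pages):
    # Each of the two passes reads and writes every page once.
    return 2 * 2 * pages

# Unrefined: sort both relations fully, then read both once to merge.
basic = two_pass_sort_cost(M) + two_pass_sort_cost(N) + (M + N)

# Refined: Pass 0 (read + write) on both, then one merging pass (read only).
refined = 3 * (M + N)

# The refinement needs B larger than the square root of the larger relation.
refinement_applies = B > math.sqrt(max(M, N))
print(basic, refined, refinement_applies)  # 7500 4500 True
```

The cost of writing result tuples is left out of both figures, as in the text.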
1.26
Impact of Buffering
1.27
Hash-Join
Partition both relations on the join attributes using hash function h.
R tuples in partition Ri will only match S tuples in partition Si.
[Figure: the original relation is read from disk through one INPUT buffer, hashed by h into B-1 OUTPUT buffers (B main memory buffers in total), and written to disk as partitions 1 ... B-1.]
For i = 1 to #partitions:
  Read in partition Ri and hash it using h2 (not h).
  Scan partition Si and probe hash table.
[Figure: partitions of R & S on disk; an in-memory hash table for partition Ri (k < B-1 pages) built with hash fn h2, an input buffer for Si, and an output buffer for the join result; B main memory buffers in total.]
1.28
Observations on Hash-Join
If the hash function does not partition uniformly, one or more R partitions
may not fit in memory. Can apply hash-join technique recursively to do
the join of this R-partition with corresponding S-partition.
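The two phases of hash-join can be sketched in memory as follows. Lists stand in for the on-disk partition files, Python's built-in hash modulo the partition count plays the role of h, and a dict plays the role of the second hash function h2. The recursive handling of oversized partitions is not shown.

```python
def hash_join(r, s, num_partitions, r_key=0, s_key=0):
    def partition(tuples, key):
        # Partitioning phase: route each tuple by h(join value).
        parts = [[] for _ in range(num_partitions)]
        for t in tuples:
            parts[hash(t[key]) % num_partitions].append(t)
        return parts

    r_parts = partition(r, r_key)
    s_parts = partition(s, s_key)

    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        # Build an in-memory table on partition Ri (the dict acts as h2).
        table = {}
        for t in r_part:
            table.setdefault(t[r_key], []).append(t)
        # Scan partition Si and probe the table.
        for t in s_part:
            for match in table.get(t[s_key], []):
                result.append((match, t))
    return result

# Hypothetical data.
r = [(1, "r1"), (2, "r2"), (1, "r3"), (7, "r4")]
s = [(1, "s1"), (2, "s2"), (2, "s3"), (9, "s4")]
pairs = hash_join(r, s, num_partitions=3)
print(len(pairs))  # 4
```

Because tuples with equal join values always land in the same partition number, only Ri and Si ever need to be compared.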
1.29
Cost of Hash-Join
1.30
Set Operations
Intersection and cross-product special cases of join.
Union (Distinct) and Except similar; we’ll do union.
1.31
General Join Conditions
1.32
Review
Implementation of Relational Operations as Iterators
Focus largely on External algorithms (sorting/hashing)
Choices depend on indexes, memory, stats,…
Joins
Blocked nested loops:
simple, exploits extra memory
Indexed nested loops:
best if 1 rel small and one indexed
Sort/Merge Join
good with small amount of memory, bad with duplicates
Hash Join
fast (if enough memory), bad with skewed data
Relatively easy to parallelize
1.33
Aggregation Operators
1.34
Schema for Examples
1.35
Aggregate Operations (AVG, MIN, etc.)
Without grouping:
In general, requires scanning the relation.
Given a tree index whose search key includes all attributes in the SELECT or
WHERE clauses, can do index-only scan.
With grouping:
Sort on group-by attributes, then scan relation and compute aggregate for
each group. (Better: combine sorting and aggregate computation.)
Similar approach based on hashing on group-by attributes.
Given a tree index whose search key includes all attributes in SELECT,
WHERE and GROUP BY clauses, can do index-only scan; if group-by
attributes form prefix of search key, can retrieve data entries/tuples in
group-by order.
1.36
Sort GROUP BY: Naïve Solution
[Plan: an Aggregate iterator on top of a Sort iterator.]
The Sort iterator naturally permutes its input so that all tuples
are output in sequence
The Aggregate iterator keeps running info (“transition values” or
“transVals”) on agg functions in the SELECT list, per group:
Example transVals:
For COUNT, it keeps count-so-far
For SUM, it keeps sum-so-far
For AVERAGE it keeps sum-so-far and count-so-far
As soon as the Aggregate iterator sees a tuple from a new group:
  It produces an output for the old group based on the agg function
    E.g. for AVERAGE it returns (sum-so-far/count-so-far)
  It resets its running info.
  It updates the running info with the new tuple’s info
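The Aggregate iterator's behavior for AVERAGE can be sketched as a Python generator. Python's sorted stands in for the Sort iterator, and the (group, value) tuple layout is an assumption for illustration.

```python
def sort_group_by_avg(tuples):
    """Yield (group, average) for tuples of the form (group, value)."""
    current = None
    total = count = 0            # transVals for AVERAGE: sum and count so far
    for group, value in sorted(tuples, key=lambda t: t[0]):
        if count and group != current:
            # New group seen: output the old group, then reset.
            yield current, total / count
            total = count = 0
        current = group
        total += value           # update running info with the new tuple
        count += 1
    if count:
        yield current, total / count   # flush the final group

averages = list(sort_group_by_avg([("A", 2), ("B", 4), ("A", 4), ("C", 1)]))
print(averages)  # [('A', 3.0), ('B', 4.0), ('C', 1.0)]
```

Only one group's transVals are held at a time, which works because the sorted input delivers each group contiguously.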
1.37
Sort GROUP BY: Naïve Solution
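[Figure: worked COUNT example. The unsorted input A, C, A, A, B, D, B passes through Sort; Aggregate then keeps a running transVal per group (<A, 1>, <A, 2>, <A, 3>, <B, 1>, <B, 2>, ...) and outputs A, 3; B, 2; C, 1; D, 1.]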
1.38
Hash GROUP BY: Naïve Solution (similar to the Sort GROUP BY)
[Plan: an Aggregate iterator on top of a Hash iterator.]
The Hash iterator permutes its input so that all tuples are output in
groups.
The Aggregate iterator keeps running info (“transition values” or
“transVals”) on agg functions in the SELECT list, per group
E.g., for COUNT, it keeps count-so-far
For SUM, it keeps sum-so-far
For AVERAGE it keeps sum-so-far and count-so-far
When the Aggregate iterator sees a tuple from a new group:
  It produces an output for the old group based on the agg function
    E.g. for AVERAGE it returns (sum-so-far/count-so-far)
  It resets its running info.
  It updates the running info with the new tuple’s info
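A minimal in-memory sketch of the two iterators for SUM follows. The Hash iterator here buckets tuples in a dict and emits them bucket by bucket, so groups come out contiguous but not in sorted order; a real implementation would spill partitions to disk. The names and sample tuples are hypothetical.

```python
def hash_iterator(tuples):
    """Permute (group, value) tuples so each group's tuples are adjacent."""
    buckets = {}
    for t in tuples:
        buckets.setdefault(t[0], []).append(t)
    for group_tuples in buckets.values():
        yield from group_tuples

def aggregate_sum(grouped_tuples):
    """Yield (group, sum) from input whose groups arrive contiguously."""
    current, seen_any, total = None, False, 0
    for group, value in grouped_tuples:
        if seen_any and group != current:
            yield current, total     # output the old group
            total = 0                # reset the transVal
        current, seen_any = group, True
        total += value               # update sum-so-far
    if seen_any:
        yield current, total

data = [("A", 1), ("C", 5), ("A", 2), ("B", 7), ("A", 3)]
sums = list(aggregate_sum(hash_iterator(data)))
print(sums)  # [('A', 6), ('C', 5), ('B', 7)]
```

The Aggregate side is unchanged from the sort-based plan; only the operator that makes groups contiguous differs.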
1.39
External Hashing
Partition:
Each group will be in a single disk-based partition file. But those files have
many groups inter-mixed.
[Figure: the original relation is read through one INPUT buffer, hashed by hash function hp into B-1 OUTPUT buffers (B main memory buffers in total), and written to disk as partitions 1 ... B-1.]
Rehash:
1.41
Projection
SELECT DISTINCT R.sid, R.bid
1.42
DupElim & Indexes
If an index on the relation contains all wanted attributes in its search key,
can do index-only scan.
Apply projection techniques to data entries (much smaller!)
If an ordered (i.e., tree) index contains all wanted attributes as prefix of
search key, can do even better:
Retrieve data entries in order (index-only scan), discard unwanted fields,
compare adjacent tuples to check for duplicates.
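The ordered-index case can be sketched as a single pass that compares each projected entry with the previous one. The list below stands in for data entries returned in search-key order by an index-only scan on a hypothetical key <sid, bid, day>; the wanted attributes (sid, bid) are a prefix of that key.

```python
def distinct_from_sorted_entries(data_entries, wanted):
    """Project each entry onto the wanted positions and drop adjacent repeats."""
    previous = None
    for entry in data_entries:
        projected = tuple(entry[i] for i in wanted)   # discard unwanted fields
        if projected != previous:                     # compare with neighbor
            yield projected
            previous = projected

# Hypothetical data entries, already in <sid, bid, day> order.
entries = [(1, 101, "d1"), (1, 101, "d2"), (1, 102, "d1"),
           (2, 101, "d3"), (2, 101, "d4")]
rows = list(distinct_from_sorted_entries(entries, wanted=(0, 1)))
print(rows)  # [(1, 101), (1, 102), (2, 101)]
```

No sorting or hashing is needed because the prefix property guarantees that duplicates of the projected pair are adjacent.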
1.43
Summary of Query Evaluation
Queries are composed of a few basic operators;
The implementation of these operators can be carefully tuned (and it is
important to do this!).
Operators are “plug-and-play” due to the Iterator model.
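The “plug-and-play” property can be illustrated with Python generators, where each operator pulls tuples from its child on demand. The operator names, tuple layout, and sample rows are assumptions for illustration, not a specific system's interface.

```python
def scan(table):
    # Leaf operator: produce the stored tuples one at a time.
    yield from table

def select(child, predicate):
    # Pass through only the tuples that satisfy the predicate.
    return (t for t in child if predicate(t))

def project(child, columns):
    # Keep only the requested column positions.
    return (tuple(t[c] for c in columns) for t in child)

# Hypothetical Reserves tuples: (sid, bid, day).
reserves = [(22, 101, "1996-10-10"), (58, 103, "1996-11-12"),
            (22, 102, "1996-10-11")]

# Operators compose freely because each consumes and produces a tuple stream.
plan = project(select(scan(reserves), lambda t: t[0] == 22), columns=(1,))
rows = list(plan)
print(rows)  # [(101,), (102,)]
```

Swapping one operator for another implementation changes nothing for its parent, as long as it still yields tuples.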
1.44