L10-Query Evaluation Operators
Background
• Data pages must be read into memory to be processed
• The memory area where a data page is stored for processing is
called a buffer page (or simply a buffer)
• More available buffers usually speed up processing
• To speed up query evaluation, a data file (i.e., a relation) often
needs to be sorted on the search key
• Suppose a file contains M pages and B buffers are available in
memory. Then (see Section 13.3: External Merge Sort)
– Sorting the file requires $2 \times M \times (\lceil \log_{B-1} \lceil M/B \rceil \rceil + 1)$ page accesses
– Example: a file contains 1000 pages and 20 buffer pages are available. Then
sorting the file requires $2 \times 1000 \times (\lceil \log_{19} (1000/20) \rceil + 1)$ =
$2 \times 1000 \times (2 + 1)$ = 6000 page accesses
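The sorting cost formula can be checked with a short sketch. The function name `external_sort_cost` is illustrative, not from the course material. It counts merge passes with integer arithmetic instead of evaluating the logarithm, which gives the same number of passes without floating-point rounding.

```python
from math import ceil

def external_sort_cost(num_pages: int, num_buffers: int) -> int:
    """Page accesses for external merge sort: 2 * M * (number of passes)."""
    if num_buffers < 3:
        raise ValueError("external merge sort needs at least 3 buffers")
    # Pass 0 reads B pages at a time and writes ceil(M/B) sorted runs.
    runs = ceil(num_pages / num_buffers)
    merge_passes = 0
    # Each merge pass combines up to B-1 runs into one.
    while runs > 1:
        runs = ceil(runs / (num_buffers - 1))
        merge_passes += 1
    # Every pass reads and writes each page once.
    return 2 * num_pages * (merge_passes + 1)

print(external_sort_cost(1000, 20))  # 6000, as in the example above
```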
Access Paths
• An access path is a method of retrieving tuples:
– File scan, or index that matches a selection (in the
query)
• Selectivity of an access path:
– # of pages retrieved (index pages + data pages)
• For a single operation, there may be several access paths
with different selectivities
• Most selective access path for an operation:
– The one that retrieves the fewest pages
• Using the most selective access path minimizes the cost
of the operation
Examples and Cost Calculations
• Given the following schema:
–Sailors(sid: integer, sname: string, rating: integer, age: real)
–Reserves(sid: integer, bid: integer, day: dates, rname: string)
• rname is the name of the person who has made the reservation
• sid is the id of the person on whose behalf the reservation was made
• Thus, rname and sid may refer to different persons
• Assuming the following sizes
–Sailors: 500 pages, 80 tuples/page, 50 bytes/tuple
–Reserves: 1000 pages, 100 tuples/page, 40 bytes/tuple
• We consider only I/O cost: number of pages that are read/written
• If alternatives involve the same cost for writing pages, we ignore
these when doing the comparison
A Motivating Example
• Consider the following simple query
SELECT *
FROM Reserves R
WHERE R.rname = ‘Joe’
• If no index is created on rname, the most selective
access path (also the only one) is a file scan:
1. Scan the entire Reserves relation, reading the pages
one after another
2. For each page scanned, check the condition on each tuple
3. If the condition is met, add the tuple to the result
Cost: 1000 I/Os
• If an index is available on rname, the query can be answered much faster
Selection: $\sigma_{R.attr\ op\ value}(R)$
• No index, unsorted (R contains M pages; same below)
– Most selective access path is a file scan
– Cost: M I/Os. Let R be Reserves. Then M = 1000
• No index, R sorted on R.attr. Most selective access path:
– Binary search on R.attr for value to locate the first tuple that satisfies the
condition
– Starting at this position, scan the relation until the condition becomes false
– Cost of binary search: $\log_2 M$ I/Os
– Cost of scanning after binary search: depends on the # of tuples satisfying
the condition; it can vary from zero to M pages
– Example: Let R be Reserves and the condition be ‘Chan’ < rname < ‘Lin’. Assume 10% of
the tuples satisfy the condition.
• Cost of binary search: $\log_2 1000 \approx 10$ I/Os
• Cost of scanning after binary search: 100 I/Os (10% of 1000 pages)
• Total cost: 110 I/Os
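As a rough check of the sorted-file example, the sketch below adds the binary search cost to the number of pages holding qualifying tuples. The function name and the choice to pass the matching page count directly are assumptions made for illustration.

```python
from math import ceil, log2

def sorted_selection_cost(num_pages: int, matching_pages: int) -> int:
    """I/O cost of a range selection on a file sorted on the selection attribute."""
    search = ceil(log2(num_pages))  # binary search for the first qualifying tuple
    scan = matching_pages           # sequential scan over the qualifying pages
    return search + scan

# Reserves: 1000 pages, 10% of the tuples (about 100 pages) qualify.
print(sorted_selection_cost(1000, 100))  # 110
```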
Selection: $\sigma_{R.attr\ op\ value}(R)$ (cont.)
• B+ tree index on R.attr (assuming op is not ≠)
1. Search the tree to find the first data entry that points to a qualifying
tuple of R
2. Scan the leaf pages to retrieve all the data entries in which the key value
satisfies the selection condition (not needed for a clustered index)
3. For each data entry retrieved, follow the pointer to get the
corresponding tuple of R
Cost
Step 1: height of the tree, usually 2 or 3 I/Os
Step 2: depends on the number of such data entries
Step 3: let N be the number of qualifying tuples and P be the
number of tuples that can be stored in a page.
Index is clustered: N/P + 1 (in reality, most likely N < P, so the cost is at most 2)
Index is not clustered: N (since in the worst case, each qualifying tuple
could be on a different page)
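The three cost steps can be combined in a small sketch. It assumes that N/P is rounded up to whole pages and that the number of leaf pages scanned in step 2 is supplied by the caller; the function and parameter names are hypothetical.

```python
from math import ceil

def btree_selection_cost(height: int, n_qualifying: int, tuples_per_page: int,
                         clustered: bool, leaf_pages_scanned: int = 0) -> int:
    """Estimated I/Os for a selection through a B+ tree index."""
    if clustered:
        # Step 1 + step 3; step 2 is not needed for a clustered index.
        return height + ceil(n_qualifying / tuples_per_page) + 1
    # Step 1 + step 2 + step 3 (worst case: one page per qualifying tuple).
    return height + leaf_pages_scanned + n_qualifying

print(btree_selection_cost(3, 50, 100, clustered=True))                         # 5
print(btree_selection_cost(3, 50, 100, clustered=False, leaf_pages_scanned=1))  # 54
```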
Example Sel.: B+ Tree Clustered Index
Assume the search key is sid. Consider the selection condition:
19 < sid < 29
[Figure, shown twice with different data file orderings: a B+ tree on sid with root key 17,
second-level keys 5, 13 and 24, 30, and leaf data entries
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 30* 33* 34* 38* 39*
pointing into the data file.]
Example Selection: Hash Index Clustered
• Selection condition: age = 7
• h(7) = 7 mod 32 = 7 = $111_2$
• Cost
– Calculate h(7): 0
– Get directory entry: 0
(assume directory is in
main memory)
– Get bucket page: 1
– Get qualifying tuples: 2
[Figure: a directory with entries 00, 01, 10, 11 pointing to bucket pages containing
(4* 12* 32* 48*), (1* 5* 21* 25*), (10*) and (3* 7* 19*), and a data file with columns sid, sname, age.]
Example Selection: Hash Index Unclustered
• Selection condition: age = 7
• h(7) = 7 mod 32 = 7 = $111_2$
• Cost
– Get directory entry: 0
(assume directory is in
main memory)
– Get bucket page: 1
– Get qualifying tuples: 3
[Figure: the same directory and bucket pages as in the clustered example, with the qualifying tuples spread over different pages of the data file.]
Projection
• Consider the example:
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
• General method
– Scan relation R and discard unwanted attributes
– Eliminate duplicates (this is an expensive operation)
• Projection based on sorting
1. Scan R, write sid and bid of each tuple to a temporary file T
2. Sort T on both sid and bid
3. Scan the sorted file, compare adjacent tuples, and discard duplicates
• Cost, assuming T has 250 pages and 20 buffers are available
– Step 1: 1000 + 250 = 1250 I/Os
– Step 2: 2 × 250 × 2 = 1000 I/Os (see the external merge sort formula under Background)
– Step 3: 250 I/Os
– Total: 2500 I/Os
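The three steps of sort-based projection can be sketched in memory as follows. This is only an illustration: a real system would use an external sort on the temporary file, and the sample rows and the name `project_distinct` are made up for the example.

```python
def project_distinct(tuples, columns):
    """Sort-based projection with duplicate elimination (in-memory sketch)."""
    # Step 1: scan the input and keep only the wanted attributes.
    temp = [tuple(t[c] for c in columns) for t in tuples]
    # Step 2: sort on all projected attributes (stand-in for an external sort).
    temp.sort()
    # Step 3: scan the sorted rows and drop each row equal to its predecessor.
    result = []
    for row in temp:
        if not result or result[-1] != row:
            result.append(row)
    return result

# Hypothetical Reserves rows; only sid and bid matter here.
reserves = [
    {"sid": 12, "bid": 300, "rname": "a"},
    {"sid": 2, "bid": 400, "rname": "b"},
    {"sid": 10, "bid": 300, "rname": "c"},
    {"sid": 2, "bid": 400, "rname": "d"},
    {"sid": 10, "bid": 300, "rname": "e"},
]
print(project_distinct(reserves, ("sid", "bid")))  # [(2, 400), (10, 300), (12, 300)]
```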
Example: $\pi_{sid, bid}(Reserves)$ based on sorting
[Figure: Reserves (sid, bid, day, rname) is projected onto (sid, bid), the projected tuples are sorted on (sid, bid), and adjacent duplicates are discarded.]
The Join Operation
• Example:
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
(R has 1000 pages, 100 tuples/page; S has 500 pages, 80 tuples/page)
• Nested Loops Join
foreach tuple r in R do (R is called the outer relation)
  foreach tuple s in S do (S is called the inner relation)
    if r.sid = s.sid then add <r, s> to result
Cost
– Scan R: 1000 I/Os
– S is scanned once for each tuple of R: 1000 × 100 × 500 I/Os
– Total: 1000 × 100 × 500 + 1000 = 50,001,000 I/Os
– With R and S switched, the total is: 500 × 80 × 1000 + 500 = 40,000,500 I/Os
– If each I/O takes 10 ms, the total time is over 100 hours!
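The cost figures for tuple-at-a-time nested loops can be reproduced with a few lines. The helper names below are illustrative, and the 10 ms per I/O figure is the one assumed on the slide.

```python
def tuple_nested_loops_cost(outer_pages: int, outer_tuples_per_page: int,
                            inner_pages: int) -> int:
    """Scan the outer once; scan the whole inner once per outer tuple."""
    return outer_pages + outer_pages * outer_tuples_per_page * inner_pages

def io_hours(num_ios: int, ms_per_io: float = 10.0) -> float:
    """Convert a number of I/Os to hours at a fixed time per I/O."""
    return num_ios * ms_per_io / 1000 / 3600

r_outer = tuple_nested_loops_cost(1000, 100, 500)   # Reserves as outer
s_outer = tuple_nested_loops_cost(500, 80, 1000)    # Sailors as outer
print(r_outer, s_outer)           # 50001000 40000500
print(round(io_hours(s_outer)))   # about 111 hours even for the cheaper order
```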
Join Operation (cont.)
• Refinement to Nested Loops Join: a page at a time
for each page p of R
  for each page q of S
    output all <r, s> with r ∈ p and s ∈ q such that r.sid = s.sid
• Cost
– Scan R: 1000 I/Os
– For each page of R, read 500 pages of S
– Total: 1000 + 1000 × 500 = 501,000 I/Os
– Amount of time: 1.4 hours (at 10 ms per I/O)
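A minimal sketch of the page-at-a-time variant, assuming each relation is represented as a list of pages and each page as a list of tuples whose first field is sid. The I/O counter simply counts page reads, and the tiny relations are made up for illustration.

```python
def page_nested_loops_join(outer_pages, inner_pages):
    """Join on the first field of each tuple, counting page reads."""
    results, ios = [], 0
    for p in outer_pages:
        ios += 1                      # read one page of the outer relation
        for q in inner_pages:
            ios += 1                  # read one page of the inner relation
            for r in p:
                for s in q:
                    if r[0] == s[0]:  # r.sid = s.sid
                        results.append((r, s))
    return results, ios

# Hypothetical data: 2 pages of Reserves (sid, bid), 2 pages of Sailors (sid, name).
reserves_pages = [[(1, 100), (2, 200)], [(2, 100)]]
sailors_pages = [[(1, "Lin"), (2, "Wu")], [(3, "Du")]]
rows, ios = page_nested_loops_join(reserves_pages, sailors_pages)
print(len(rows), ios)  # 3 matching pairs, 2 + 2 * 2 = 6 page reads
```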
• Block Nested Loops Join
– Suppose we have enough buffers to hold B+2 pages;
then read B pages of Reserves at a time. (One page is reserved
to read Sailors and the other page is reserved for output.)
for each block P of Reserves
  for each page q of Sailors
    for each r ∈ P and s ∈ q such that r.sid = s.sid
      add <r, s> to the result
• Cost (assuming B = 100; thus Reserves contains 10 blocks).
Note that the definition of B is a little bit different from that
in your textbook.
– Scan Reserves: 1000 I/Os
– For each block, scan Sailors, for 500 I/Os
– Total: 1000 + 10 × 500 = 6000 I/Os
– If we choose Sailors to be the outer relation, the cost is:
500 + 5 × 1000 = 5500 I/Os
• If the buffer is large enough to hold the smaller relation
(+ 2 more pages), the cost is 1000 + 500 = 1500 I/Os
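The block nested loops costs follow one formula: scan the outer relation once, and scan the inner relation once per block of the outer. A small sketch, with `block_size` playing the role of B as defined on this slide (the function name is illustrative):

```python
from math import ceil

def block_nested_loops_cost(outer_pages: int, inner_pages: int,
                            block_size: int) -> int:
    """I/O cost when block_size pages of the outer are held in memory at a time."""
    num_blocks = ceil(outer_pages / block_size)
    return outer_pages + num_blocks * inner_pages

print(block_nested_loops_cost(1000, 500, 100))  # 6000, Reserves as outer
print(block_nested_loops_cost(500, 1000, 100))  # 5500, Sailors as outer
print(block_nested_loops_cost(500, 1000, 500))  # 1500, Sailors fits in memory
```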
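Diagrammatic View for Block Nested Loops Join
[Figure: the relations Sailors and Reserves are read from disk into the input buffers, one of which is used to scan Sailors; result tuples are collected in the output buffer and written to disk.]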
• Index Nested Loops Join
– Assume an index on sid of the Sailors relation,
[Figure: a Reserves tuple with sid = 16 is hashed with h(16) to probe the directory and buckets of the index on Sailors; the matching bucket holds the data entries 16*, 24*.]
Idea for Sort-Merge Join ($R \bowtie_{i=j} S$)
• Sort R and S on the join column, then scan them to do a
"merge" (on the join column), and output result tuples.
– Advance the scan of R until current R-tuple >= current S-tuple, then
advance the scan of S until current S-tuple >= current R-tuple; do this
until current R-tuple = current S-tuple.
– At this point, all R tuples with the same value in Ri (current R
group) and all S tuples with the same value in Sj (current S group)
match; output <r, s> for all pairs of such tuples.
– Then resume scanning R and S.
• R is scanned once; each S group is scanned once per
matching R tuple. (Multiple scans of an S group are likely
to find the needed pages in the buffer.)
Example of Sort-Merge Join
Sailors:
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Reserves:
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin
• Cost
– Cost of sorting the two relations (use the external merge sort formula)
– Cost of joining the two relations: if one of the join attributes is a primary key, then
this cost is M+N, where M and N are the sizes (in pages) of the two relations
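The merge phase can be sketched on the example instances, assuming both inputs are already sorted on sid and fit in memory. The function `merge_join` is an illustrative in-memory version, not the course's implementation; it handles groups of equal keys on both sides.

```python
def merge_join(r_sorted, s_sorted, r_key=0, s_key=0):
    """Merge phase of sort-merge join over two lists sorted on the join key."""
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rv, sv = r_sorted[i][r_key], s_sorted[j][s_key]
        if rv < sv:
            i += 1                    # advance R until R-tuple >= S-tuple
        elif rv > sv:
            j += 1                    # advance S until S-tuple >= R-tuple
        else:
            # Find the end of the current R group and S group.
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end][r_key] == rv:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][s_key] == rv:
                j_end += 1
            # Output every pair from the two matching groups.
            for r in r_sorted[i:i_end]:
                for s in s_sorted[j:j_end]:
                    result.append(r + s)
            i, j = i_end, j_end
    return result

sailors = [(22, "dustin", 7, 45.0), (28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
           (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
reserves = [(28, 103, "12/4/96", "guppy"), (28, 103, "11/3/96", "yuppy"),
            (31, 101, "10/10/96", "dustin"), (31, 102, "10/12/96", "lubber"),
            (31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin")]
joined = merge_join(sailors, reserves)
print([row[0] for row in joined])  # [28, 28, 31, 31, 31, 58]
```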
Size of the Join Result
• Terminology
– $n_r$: number of tuples in relation r.
– $f_r$: blocking factor of relation r, i.e., the number of
tuples of relation r that fit into one block (page).
– $b_r$: number of blocks (pages) containing tuples of r.
– If tuples of relation r are stored together physically,
$b_r = \lceil n_r / f_r \rceil$
• Example
– $n_{customer} = 10{,}000$.
– $f_{customer} = 25 \Rightarrow b_{customer} = 10000/25 = 400$.
– $n_{depositor} = 5000$.
– $f_{depositor} = 50 \Rightarrow b_{depositor} = 5000/50 = 100$.
– V(customer-name, depositor) = 2500, where V(A, r) is the number of distinct values of attribute A in r
• On average each customer has two accounts
– Assume customer-name in depositor is a
foreign key on customer.
• The Cartesian product $r \times s$ contains $n_r n_s$ tuples.
• Each tuple of $r \times s$ occupies $s_r + s_s$ bytes.
• Size of natural join
– Let r(R) and s(S) be relations.
– If $R \cap S = \emptyset$:
• $r \bowtie s$ is the same as $r \times s$.
– If $R \cap S$ is a key of r:
• A tuple of s will join with at most one tuple from r.
• The number of tuples in $r \bowtie s$ is no greater than the
number of tuples in s.
• If $R \cap S$ is a foreign key of s referencing r, then
the number of tuples in $r \bowtie s$ is exactly the same as
the number of tuples in s.
• Example: $depositor \bowtie customer$
» customer-name in depositor is a foreign key referring
to customer-name in customer.
– The size of the result is $n_{depositor} = 5000$.
– If $R \cap S$ is a key for neither r nor s:
• Let $R \cap S = \{A\}$.
• Assume each value appears with equal probability.
• We estimate that each tuple t in r produces $n_s / V(A, s)$ tuples in $r \bowtie s$.
• The total number of tuples in $r \bowtie s$ is estimated to be $n_r n_s / V(A, s)$.
– Similarly, if we reverse the roles of r and s, the total
number of tuples is estimated to be $n_r n_s / V(A, r)$.
– These two estimates differ if $V(A, r) \neq V(A, s)$.
– The lower of the two estimates is probably the better
one, since there are likely to be some dangling tuples.
– Example without using information about foreign keys:
• V(customer-name, depositor) = 2500
– Size = 5000 × 10000 / 2500 = 20,000
• V(customer-name, customer) = 10000
– Size = 5000 × 10000 / 10000 = 5000
• The lower one is 5000.
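The two estimates and the choice of the lower one can be written as a short sketch. The function name `estimate_join_size` and its parameter names are illustrative.

```python
def estimate_join_size(n_r: int, n_s: int, v_a_r: int, v_a_s: int) -> float:
    """Estimate |r join s| on a common attribute A that is a key of neither relation."""
    estimate_using_s = n_r * n_s / v_a_s  # each r tuple matches n_s / V(A, s) tuples
    estimate_using_r = n_r * n_s / v_a_r  # roles of r and s reversed
    return min(estimate_using_r, estimate_using_s)

# r = depositor (5000 tuples, 2500 distinct names),
# s = customer (10000 tuples, 10000 distinct names).
print(estimate_join_size(5000, 10000, 2500, 10000))  # 5000.0
```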
Selection Operation
• Linear Search
– All blocks have to be read: $b_r$
– Selection on a key attribute: $b_r / 2$ on average
• Binary Search
– Locating the first tuple: $\log_2(b_r)$
• Selections involving comparisons
– Let the lowest and highest values be min(A, r) and
max(A, r).
– Assume the values are uniformly distributed.
– Number of records that will satisfy $A \le v$:
• 0, if $v < \min(A, r)$
• $n_r$, if $v \ge \max(A, r)$
• otherwise, $n_r \cdot (v - \min(A, r)) / (\max(A, r) - \min(A, r))$
• Clustered index, comparison
– Let the number of records that satisfy the condition
be c.
– Cost = index access + $c / f_r$
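A small sketch of the estimate and the clustered-index cost, assuming the condition has the form A ≤ v and that c / f_r is rounded up to whole pages. Both function names and the sample numbers are made up for illustration.

```python
from math import ceil

def estimate_leq(n_r: int, v: float, min_a: float, max_a: float) -> float:
    """Estimated number of records with A <= v under a uniform distribution."""
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)

def clustered_comparison_cost(index_access: int, c: float, f_r: int) -> int:
    """Index access plus the pages holding the c qualifying records."""
    return index_access + ceil(c / f_r)

c = estimate_leq(1000, 50, 0, 100)           # half of 1000 records
print(c)                                     # 500.0
print(clustered_comparison_cost(3, c, 100))  # 3 + 5 = 8
```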
• Unclustered index, comparison
– Cost = index access + c
– E.g., assume one-half of the records satisfy the
condition
• Cost = index access + $n_r / 2$
• Suppose a B+ tree is used
– Cost = number of levels + (number of leaf nodes)/2 − 1 + $n_r / 2$
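Complex Selections
• Conjunctive selection: $\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n}(r)$
– Let $s_i$ be the size of $\sigma_{\theta_i}(r)$
– The probability of satisfying condition $\theta_i$ is $s_i / n_r$
– Estimated size = $n_r \cdot (s_1 \cdot s_2 \cdots s_n) / n_r^n$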
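The conjunctive size estimate multiplies the individual selectivities, which assumes the conditions are independent. A minimal sketch with hypothetical names and numbers:

```python
from math import prod

def estimate_conjunction_size(n_r: int, sizes: list) -> float:
    """n_r * (s_1 * s_2 * ... * s_n) / n_r^n, assuming independent conditions."""
    return n_r * prod(sizes) / n_r ** len(sizes)

# 1000 tuples; one condition keeps 100 of them, the other keeps 500.
print(estimate_conjunction_size(1000, [100, 500]))  # 50.0
```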
• Conjunctive selection using one index
– Use a selection algorithm (one of those discussed before) to retrieve the
records.
– Complete the operation by testing, in the memory buffer (on the
fly), whether the remaining conditions are satisfied.
• Conjunctive selection using a composite index
– As discussed before.
• Conjunctive selection by intersection of identifiers
– Each index is scanned for pointers to tuples that satisfy an
individual condition.
– Intersect all the retrieved pointers.
– Use the resulting pointers to retrieve the actual records.
– If indices are not available on all the individual conditions, then
the retrieved records are tested against the remaining conditions.
• Disjunctive selection by union of identifiers
– Each index is scanned for pointers to tuples that satisfy an
individual condition.
– Take the union of all the retrieved pointers.
– Use the resulting pointers to retrieve the actual records.
– If even one of the conditions does not have an access path, we
have to perform a linear scan of the relation.
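The identifier-based strategies for conjunctive and disjunctive selections reduce to set operations on record identifiers. The sketch below assumes each index scan returns a non-empty list of (page, slot) pairs; the identifiers and function names are hypothetical.

```python
def conjunctive_rids(rid_lists):
    """Pointers satisfying every condition: intersection of the index results."""
    return set.intersection(*map(set, rid_lists))

def disjunctive_rids(rid_lists):
    """Pointers satisfying at least one condition: union of the index results."""
    return set.union(*map(set, rid_lists))

# Hypothetical (page, slot) identifiers returned by two index scans.
from_index_1 = [(1, 0), (1, 3), (4, 2)]
from_index_2 = [(1, 3), (4, 2), (7, 1)]
print(sorted(conjunctive_rids([from_index_1, from_index_2])))  # [(1, 3), (4, 2)]
print(sorted(disjunctive_rids([from_index_1, from_index_2])))  # [(1, 0), (1, 3), (4, 2), (7, 1)]
```

Sorting the resulting identifiers by page before fetching records is a common way to avoid reading the same page twice.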
More Examples
• Reserves:
– Each tuple is 40 bytes long, 100 tuples per page, 1000
pages.
• Sailors:
– Each tuple is 50 bytes long, 80 tuples per page, 500
pages.
RA Tree
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
• Translating to a relational algebra expression:
$\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$
• Tree representation, from root to leaves:
$\pi_{sname}$
$\sigma_{bid=100 \wedge rating>5}$
$\bowtie_{sid=sid}$
Reserves, Sailors
– Nodes are relational operators
– Edges point to where the input comes from
– In general, we must also add an access path to each node
• Plan: $\pi_{sname}$ (on-the-fly), $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly), $\bowtie_{sid=sid}$ (simple nested loops) over Sailors and Reserves
• Cost: 500 + 500 × 1000 = 500,500 I/Os (page at a time)
• Misses several opportunities: selections could have
been "pushed" earlier, no use is made of any available
indexes, etc.
• Goal of optimization: to find more efficient plans
that compute the same answer.
On-the-Fly vs Materialized Evaluations
• Let op1 and op2 be two relational algebra operations, where
op1 is performed on the result of op2
• The evaluation of op1 is on-the-fly if the result of op2 is
sent directly to op1 (i.e., not stored in a temporary file)
• It is materialized if that result is stored in a temporary file
first
• On-the-fly evaluation is also called pipelined evaluation
• On-the-fly can be more efficient than materialized
– Example: $\sigma_{bid=100}(Reserves \bowtie_{sid=sid} Sailors)$
• Whether on-the-fly should be used depends on the case
– Example: $(\sigma_{age<25} Sailors) \bowtie_{sid=sid} (\sigma_{bid=100} Reserves)$
– What if only one sailor has ever reserved bid=100, and only once? (i.e., only
one tuple in Reserves has bid=100)
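The difference between the two evaluation styles can be illustrated with Python generators. This is only an analogy under simplifying assumptions: a list stands in for a temporary file, and the tiny Reserves and Sailors instances are illustrative.

```python
reserves = [(1, 100), (2, 200), (2, 100)]     # (sid, bid)
sailors = [(1, "Lin"), (2, "Wu"), (3, "Du")]  # (sid, name)

def join(reserves, sailors):
    """Nested loops join on sid that yields result tuples one at a time."""
    for sid, bid in reserves:
        for s_sid, name in sailors:
            if sid == s_sid:
                yield (sid, bid, name)

def select_bid(rows, bid):
    """Selection bid = <bid> that consumes its input one tuple at a time."""
    for row in rows:
        if row[1] == bid:
            yield row

# On-the-fly (pipelined): each join tuple goes straight into the selection.
pipelined = list(select_bid(join(reserves, sailors), 100))

# Materialized: the whole join result is stored first, then selected from.
temp = list(join(reserves, sailors))          # stand-in for a temporary file
materialized = list(select_bid(temp, 100))

print(pipelined)     # [(1, 100, 'Lin'), (2, 100, 'Wu')]
print(len(temp))     # 3 tuples had to be stored in the materialized case
```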
On-the-Fly vs. Materialized Evaluations: example
$\sigma_{bid=100}(Reserves \bowtie_{sid=sid} Sailors)$
Reserves (on disk):
sid  bid
1    100
2    200
2    100
Sailors (on disk):
sid  name
1    Lin
2    Wu
3    Du
[Figure, shown in several animation steps: in the materialized evaluation, join result tuples such as (1, 100, Lin) and (2, 200, Wu) are collected in a RAM buffer for the temporary result and stored before the selection is applied; in the on-the-fly evaluation, each join result tuple is passed directly to the selection.]
Alternative Plans 1 (No Indexes)
• Plan: $\sigma_{bid=100}$ on Reserves (scan; write to temp T1) and $\sigma_{rating>5}$ on Sailors (scan; write to temp T2), followed by a sort-merge join of T1 and T2 on sid
• Main difference: push selects.
• With 5 buffer pages, cost of plan:
– Scan Reserves (1000) + write temp T1 (10 pages, assuming 100 boats and a uniform
distribution: there are 100,000 tuples in Reserves, so 1000 tuples per boat, stored in
10 pages)
– Scan Sailors (500) + write temp T2 (250 pages, assuming 10 ratings and a uniform
distribution: there are 40,000 tuples in Sailors, so 4000 tuples per rating; thus 20,000
tuples have rating > 5, stored in 20000/80 = 250 pages)
– Sort T1 (2*10*2), sort T2 (2*250*4), merge (10+250)
– Total: 4060 page I/Os.
• If we used BNL join, join cost = 10+4*250, total cost = 1010 + 750 + 1010 = 2770.
(Note that for BNL we do not sort T1 and T2.)
• Furthermore, we can push projections (next slide)
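The plan totals can be re-derived step by step. The sketch below assumes the external merge sort cost of 2 × M × (number of passes) and, for BNL, that 3 of the 5 buffers hold the outer block (one buffer for the inner relation, one for output). The helper names are illustrative.

```python
from math import ceil

def external_sort_cost(num_pages: int, num_buffers: int) -> int:
    """2 * M * (number of passes) for external merge sort."""
    runs, merge_passes = ceil(num_pages / num_buffers), 0
    while runs > 1:
        runs = ceil(runs / (num_buffers - 1))
        merge_passes += 1
    return 2 * num_pages * (merge_passes + 1)

BUFFERS = 5
T1_PAGES, T2_PAGES = 10, 250

select_reserves = 1000 + T1_PAGES   # scan Reserves, write T1
select_sailors = 500 + T2_PAGES     # scan Sailors, write T2

# Sort-merge join: sort both temporaries, then one merging scan of each.
sort_merge = (external_sort_cost(T1_PAGES, BUFFERS)
              + external_sort_cost(T2_PAGES, BUFFERS)
              + T1_PAGES + T2_PAGES)
print(select_reserves + select_sailors + sort_merge)  # 4060

# Block nested loops with T1 as outer: ceil(10 / 3) = 4 blocks.
bnl = T1_PAGES + ceil(T1_PAGES / (BUFFERS - 2)) * T2_PAGES
print(select_reserves + select_sailors + bnl)         # 2770
```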
Alternative Plans 1 (cont.)
(No Indexes)
[Figure: plan tree with $\bowtie_{sid=sid}$ evaluated using BNL; the cost shown on the slide is 1906.]
Alternative Plans 2
[Figure: plan tree with $\pi_{sname}$ (on-the-fly) at the root; the join inputs are $\sigma_{bid=100}$ on Reserves (use hash index; do not write result to temp) and Sailors.]
… join. Why?
– If we perform the selection
before the join, we must
scan Sailors, with an
extra cost of 500 I/Os.
– More importantly, once the selection is performed, we no longer have an index on the sid field
of the result.
Note: the result is a brand new file. Its location is independent of the location of
Sailors.
– We would have to build a new index solely for the subsequent join operation. Is this
worthwhile?
Alternative Plans 2 (cont.)
[Figure: two plan trees over Reserves and Sailors, each with $\pi_{sname}$ (on-the-fly) at the root; the lower one uses a merge join on sid=sid with $\sigma_{rating>5}$ evaluated on-the-fly.]
• The cost for the upper plan
– 2 + 10 + 10 × 500 = 5012
• The cost for the lower plan
– 2 + 10 (select) + 10 (write) + 2 × 3 × 10 (sort) +
(10 + 500) (sort-merge join) = 592
• This implies that on-the-fly evaluation is
not always the best
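The two totals can be re-added term by term. The labels in the comments follow the slide where it gives them; reading the leading 2 as the index lookup and 10 × 500 as one scan of Sailors per selected Reserves page is an assumption.

```python
# Plan with everything evaluated on-the-fly after the index selection.
index_lookup = 2           # assumed: cost of probing the index on bid
selected_pages = 10        # pages of Reserves tuples with bid = 100
sailors_pages = 500
on_the_fly_plan = index_lookup + selected_pages + selected_pages * sailors_pages
print(on_the_fly_plan)     # 5012

# Plan that writes the selection result, sorts it, and merge-joins with Sailors.
write_temp = 10
sort_temp = 2 * 3 * 10     # "2x3x10 (sort)" on the slide
merge_join = selected_pages + sailors_pages
materialized_plan = index_lookup + selected_pages + write_temp + sort_temp + merge_join
print(materialized_plan)   # 592
```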