L10-Query Evaluaion

Database Query Evaluation

Uploaded by

Jason Wong

Evaluating Relational Operators

1
Background
• Data pages must be read into memory to be processed
• The memory area where a data page is stored for processing is
called a buffer page (or simply a buffer)
• More available buffers can usually speed up processing
• To speed up query evaluation, a data file (i.e., a relation) often
needs to be sorted on the search key
• Suppose a file contains M pages and B buffers are available in
memory. Then (see Section 13.3: External Merge Sort)
– Sorting the file requires $2M(\lceil \log_{B-1} \lceil M/B \rceil \rceil + 1)$ page accesses
– Example: a file contains 1000 pages and 20 buffer pages are available, so
sorting the file requires $2 \times 1000 \times (\lceil \log_{19} \lceil 1000/20 \rceil \rceil + 1) = 2 \times 1000 \times (2 + 1) = 6000$ page accesses

2
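The external merge sort cost formula from the Background slide can be sketched in Python. This is only an illustration, not part of the original slides: the name `external_sort_cost` is hypothetical, and the passes are counted with integer arithmetic instead of evaluating the logarithm directly.

```python
from math import ceil


def external_sort_cost(num_pages: int, num_buffers: int) -> int:
    """Page I/Os for external merge sort: 2 * M * (number of passes)."""
    if num_buffers < 3:
        raise ValueError("external merge sort needs at least 3 buffer pages")
    # Pass 0 uses all B buffers and produces ceil(M / B) sorted runs.
    runs = ceil(num_pages / num_buffers)
    passes = 1
    # Each later pass merges up to B - 1 runs at a time.
    while runs > 1:
        runs = ceil(runs / (num_buffers - 1))
        passes += 1
    # Every pass reads and writes each page once.
    return 2 * num_pages * passes


print(external_sort_cost(1000, 20))  # the slide's example: 6000
```

The same function reproduces the sort costs used on later slides, for example 250 pages with 20 buffers.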
Access Paths
• An access path is a method of retrieving tuples:
– File scan, or index that matches a selection (in the
query)
• Selectivity of an access path:
– # of pages retrieved (index pages + data pages)
• For a single operation, there may be different access paths
with different selectivity
• Most selective access path for an operation:
– The one with the lowest selectivity
• Using the most selective access path minimizes the cost
of the operation

3
Examples and Cost Calculations
• Given the following schema:
–Sailors(sid: integer, sname: string, rating: integer, age: real)
–Reserves(sid: integer, bid: integer, day: dates, rname: string)
• rname is the name of the person who has made the reservation
• sid is the id of the person on whose behalf the reservation was made
• Thus, rname and sid may refer to different persons
• Assuming the following sizes
–Sailors: 500 pages, 80 tuples/page, 50 bytes/tuple
–Reserves: 1000 pages, 100 tuples/page, 40 bytes/tuple
• We consider only I/O cost: number of pages that are read/written
• If alternatives involve the same cost for writing pages, we ignore
these when doing the comparison
4
A Motivating Example
• Consider the following simple query
SELECT *
FROM Reserves R
WHERE R.rname = 'Joe'
• If no index is created on rname, the most selective
access path (also the only one) is:
1. Scan the entire Reserves relation, reading the pages
one after another
2. For each page scanned, check the condition on each tuple
3. If the condition is met, add the tuple to the result
→ Cost: 1000 I/Os
• If an index is available on rname, we can do it much faster

5
Selection: R.attr op value (R)
• No Index, Unsorted (R contains M pages, same below)
– Most selective access path is file scan
– Cost: M I/Os. Let R be Reserves. Then M = 1000
• No Index, R sorted on R.attr. Most selective access path:
– Binary search on R.attr for value to locate the first tuple that satisfies the
condition
– Start at this position, scan the relation until the condition becomes untrue
– Cost of binary search: log2(M)
– Cost of scanning after binary search: depends on the # of tuples satisfying
the condition, can vary from zero to M.
– Example: Let R be Reserves and 'Chan' < rname < 'Lin'. Assume 10% of
the tuples satisfy the condition.
• Cost of binary search: log2(1000) ≈ 10
• Cost of scanning after binary search: 10% of 1000 pages = 100
• Total cost: 110
6
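The sorted-file cost (binary search plus a scan of the qualifying pages) can be written as a small helper. This is a sketch under the slide's assumptions; the function name and parameters are hypothetical.

```python
from math import ceil, log2


def sorted_file_selection_cost(num_pages: int, qualifying_pages: int) -> int:
    """I/O cost of a selection on a file sorted on the selection attribute."""
    # Binary search over the pages to locate the first qualifying tuple.
    search = ceil(log2(num_pages))
    # Then read the pages that hold qualifying tuples, in order.
    return search + qualifying_pages


# Reserves has 1000 pages; 10% of them (100 pages) hold qualifying tuples.
print(sorted_file_selection_cost(1000, 100))  # 110
```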
Selection: R.attr op value (R)
• B+ Tree Index on R.attr (assuming op is not  )
1. Search the tree to find the first data entry that points to a qualifying
tuple of R
2. Scan the leaf page to retrieve all the data entries in which the key value
satisfies the selection condition (not needed for clustered index)
3. For each data entry retrieved, follow the pointer to get the
corresponding tuple of R
 Cost
 Step 1: height of the tree, usually 2 or 3 I/Os
 Step 2: depends on the number of such data entries
 Step 3: let N be the number of the qualifying tuples and P be the
number of tuples that can be stored in a page.
 Index is clustered:  N/P+1 (in reality, most likely N < P, so cost  2)
 Index is not clustered: N (since in the worst case, each qualifying tuple
could be in a different page.

7
Example Sel. : B+ Tree Clustered Index
Assume the search key is sid. Consider the selection condition:
19 < sid < 29

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 30* 33* 34* 38* 39*

[Figure: the leaf data entries point into the data file; the page layout of the data file was lost in extraction]

• Step 1 (searching the tree): 3 pages
• Step 2 (no need to scan the leaf nodes): 0 pages
• Step 3 (retrieving the qualifying tuples): 2 pages
8
Example Sel. : B+ Tree Unclustered Index
Assume the search key is sid. Consider the selection condition:
19 < sid < 29

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 30* 33* 34* 38* 39*

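[Figure: the leaf data entries point into an unclustered data file; the page layout of the data file was lost in extraction]

• Searching the tree: 3 pages
• Scanning the leaf nodes: 2 pages
• Retrieving the qualifying tuples: 4 pages
• Total = 3+2-1+4 = 8 (the first leaf page is counted in both the tree search and the leaf scan, hence the -1)
9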
Sample DB Selection Cost for B+ Tree Index
• Consider the selection 'Chan' < rname < 'Lin', and assume 10% of
the tuples satisfy the condition
– Search B+ tree: a few I/Os, say 3
– Clustered index:
• Scan the data file: 1000 × 10% = 100
– Unclustered index:
• Search B+ tree: 3
• Scan the leaf nodes: assume the total number of leaf pages is 1/10 of
the number of data pages. The cost is 1000 × 1/10 × 1/10 = 10
• Retrieve the qualifying tuples:
– Total number of tuples in Reserves: 100,000
– Total number of qualifying tuples: 100,000 × 1/10 = 10,000
– Cost: 10,000 I/Os (since in the worst case we need to read a page for
each tuple. Note that a page containing more than one qualifying tuple may
be read repeatedly)
• Thus, an unclustered B+ tree index is not appropriate for a
range search that matches many tuples: here it costs about 10,000 I/Os, versus 1000 for a full file scan
10
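The clustered and unclustered B+ tree costs for the sample database can be compared with a short sketch. The function name and parameters are hypothetical, and the sketch assumes the simplified accounting of the slide above; it ignores the one-page overlap between the tree search and the leaf scan.

```python
from math import ceil


def btree_range_selection_cost(tree_height: int, leaf_pages_scanned: int,
                               qualifying_tuples: int, tuples_per_page: int,
                               clustered: bool) -> int:
    """Rough I/O cost of a range selection through a B+ tree index."""
    if clustered:
        # Qualifying tuples are stored contiguously, so no leaf scan is charged.
        return tree_height + ceil(qualifying_tuples / tuples_per_page)
    # Worst case for an unclustered index: one data-page read per qualifying tuple.
    return tree_height + leaf_pages_scanned + qualifying_tuples


# Reserves: 100 tuples/page, 10,000 qualifying tuples, tree height 3, 10 leaf pages.
print(btree_range_selection_cost(3, 10, 10_000, 100, clustered=True))   # 103
print(btree_range_selection_cost(3, 10, 10_000, 100, clustered=False))  # 10013
```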
Selection: R.attr op value (R) (cont.)
• Hash index on R.attr (assuming op is = )
1. Calculate the hash value for value. (The hash value identifies the
directory entry.)
2. Get the directory entry identified by the hash value
3. Retrieve the bucket page(s) pointed by the directory entry
4. For each data entry in the bucket, retrieve the qualifying tuple
 Cost
 Step 1: 0
 Step 2: 1 if directory does not fit in the memory, 0 otherwise
 Step 3: typically 1.2 (Recall we may go through overflow pages)
 Step 4: if R.attr is a key for R, then 1, otherwise, depends on the
number of qualifying tuples and whether or not it is a clustered index.

11
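The four hash-index steps can be turned into a rough cost estimate. This is a sketch, not the slides' own formula: the function name is hypothetical, and the clustered case assumes the matching tuples are packed into ceil(N/P) pages.

```python
from math import ceil


def hash_equality_selection_cost(matching_tuples: int, tuples_per_page: int,
                                 clustered: bool,
                                 directory_in_memory: bool = True,
                                 bucket_ios: float = 1.2) -> float:
    """Estimated I/Os for an equality selection through a hash index."""
    # Step 1 (computing the hash value) costs no I/O.
    directory = 0 if directory_in_memory else 1   # step 2
    if clustered:
        data = ceil(matching_tuples / tuples_per_page)  # assumption: tuples are packed
    else:
        data = matching_tuples                    # worst case: one page per tuple
    return directory + bucket_ios + data          # steps 2 + 3 + 4


# Equality on a key: one matching tuple, directory in memory.
print(hash_equality_selection_cost(1, 100, clustered=False))
```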
Example Selection: Hash Index Clustered
• Selection condition: age = 7
• h(7) = 7 mod 32 = 7 = $111_2$
• Cost
– Calculate h(7): 0
– Get directory entry: 0 (assume directory is in main memory)
– Get bucket page: 1
– Get qualifying tuples: 2
[Figure: hash index with a four-entry directory (00, 01, 10, 11) and bucket pages 4* 12* 32* 48*, 1* 5* 21* 25*, 10*, and 3* 7* 19*, pointing into a data file with columns sid, sname, age]

Example Selection: Hash Index Unclustered
• Selection condition: age = 7
• h(7) = 7 mod 32 = 7 = $111_2$
• Cost
– Get directory entry: 0 (assume directory is in main memory)
– Get bucket page: 1
– Get qualifying tuples: 3
[Figure: the same directory and bucket pages, over an unclustered data file]

13
Projection
• Consider example:
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
• General method
– Scan relation R and discard unwanted attributes
– Eliminate duplicates (This is an expensive operation.)
• Projection based on sorting
1. Scan R, write sid and bid of each tuple to a temporary file T
2. Sort T on both sid and bid
3. Scan the sorted file, compare adjacent tuples and discard duplicates
• Cost, assuming T has 250 pages and 20 buffers are available
– Step 1: 1000 + 250 = 1250 I/Os
– Step 2: 2 × 250 × 2 = 1000 I/Os (refer to the formula on p. 2)
– Step 3: 250 I/Os
– Total: 2500 I/Os

14
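The three projection steps add up as follows. This is a sketch with hypothetical function names; the sort cost helper counts merge passes as in the Background slide.

```python
from math import ceil


def external_sort_cost(num_pages: int, num_buffers: int) -> int:
    """2 * M * (number of passes) for external merge sort."""
    runs = ceil(num_pages / num_buffers)   # sorted runs after pass 0
    passes = 1
    while runs > 1:                        # merge B - 1 runs at a time
        runs = ceil(runs / (num_buffers - 1))
        passes += 1
    return 2 * num_pages * passes


def sort_based_projection_cost(input_pages: int, temp_pages: int,
                               num_buffers: int) -> int:
    scan_and_write = input_pages + temp_pages               # step 1
    sort = external_sort_cost(temp_pages, num_buffers)      # step 2
    final_scan = temp_pages                                 # step 3
    return scan_and_write + sort + final_scan


print(sort_based_projection_cost(1000, 250, 20))  # 2500
```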
Example: $\pi_{sid, bid}(Reserves)$ based on sorting
[Figure: Reserves (sid, bid, day, rname) → discard unwanted attributes → sort on sid and bid → discard duplicates. In the sorted list, the adjacent pairs (2, 400), (2, 400) and (14, 100), (14, 100) each collapse to a single tuple.]
15
The Join Operation
• Example (R has 1000 pages, 100 tuples/page; S has 500 pages, 80 tuples/page):
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
• Nested Loops Join (R is called the outer relation, S the inner relation)
foreach tuple r in R do
    foreach tuple s in S do
        if r.sid = s.sid then add <r, s> to result
• Cost
– Scan R: 1000 I/Os
– S is scanned once for each tuple of R: 1000 × 100 × 500
– Total: 1000 × 100 × 500 + 1000 = 50,001,000 I/Os
– Switching R and S, the total is: 500 × 80 × 1000 + 500 = 40,000,500 I/Os
– If each I/O takes 10 ms, the total time is over 100 hours!
16
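The tuple-at-a-time cost and the resulting running time can be checked with a few lines of Python. The function names are hypothetical, and the 10 ms per I/O figure is the one assumed on the slide.

```python
def tuple_nested_loops_cost(outer_pages: int, outer_tuples_per_page: int,
                            inner_pages: int) -> int:
    """Scan the outer relation once; scan the inner once per outer tuple."""
    return outer_pages + outer_pages * outer_tuples_per_page * inner_pages


def io_hours(num_ios: int, ms_per_io: float = 10.0) -> float:
    """Convert an I/O count into hours at a fixed cost per I/O."""
    return num_ios * ms_per_io / 1000 / 3600


reserves_outer = tuple_nested_loops_cost(1000, 100, 500)
sailors_outer = tuple_nested_loops_cost(500, 80, 1000)
print(reserves_outer, sailors_outer)
print(round(io_hours(reserves_outer), 1), "hours")
```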
Join Operation (cont.)
• Refinement to Nested Loops Join: a page at a time
for each page p of R
    for each page q of S
        output all r ∈ p and s ∈ q such that r.sid = s.sid
• Cost
– Scan R: 1000 I/Os
– For each page of R, read 500 pages of S
– Total: 1000 + 1000 × 500 = 501,000
– Amount of time: 1.4 hours (at 10 ms per I/O)

17
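A runnable sketch of the page-at-a-time refinement, using Python lists as stand-ins for disk pages. The function name and the toy data are hypothetical; the counter shows that the number of page reads is M + M × N.

```python
def page_nested_loops_join(r_pages, s_pages, r_key=0, s_key=0):
    """Join two relations stored as lists of pages (lists of tuples)."""
    result = []
    page_reads = 0
    for p in r_pages:
        page_reads += 1          # read one page of the outer relation
        for q in s_pages:
            page_reads += 1      # re-read each inner page for this outer page
            for r in p:
                for s in q:
                    if r[r_key] == s[s_key]:
                        result.append((r, s))
    return result, page_reads


# Toy data: 3 pages of (sid, bid) and 2 pages of (sid, sname).
reserves_pages = [[(28, 103), (31, 101)], [(31, 102), (58, 103)], [(22, 104), (44, 105)]]
sailors_pages = [[(22, "dustin"), (28, "yuppy"), (31, "lubber")], [(44, "guppy"), (58, "rusty")]]
rows, reads = page_nested_loops_join(reserves_pages, sailors_pages)
print(len(rows), reads)  # 6 result tuples, 3 + 3 * 2 = 9 page reads
```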
• Block Nested Loops Join
– Suppose we have enough buffers to hold B+2 pages; then read B pages of Reserves at a time.
(One page is reserved to read Sailors and the other page is reserved for output.)
for each block P of Reserves
    for each page q of Sailors
        for each r ∈ P and s ∈ q such that r.sid = s.sid
            add <r, s> to the result
(Note that the definition of B is a little bit different from that in your textbook.)
• Cost (assuming B = 100; thus Reserves contains 10 blocks)
– Scan Reserves: 1000 I/Os
– For each block, scan Sailors, for 500 I/Os
– Total: 1000 + 10 × 500 = 6000
– If we choose Sailors to be the outer relation, the cost is:
500 + 5 × 1000 = 5500
• If the buffer is large enough to hold the smaller relation
(+ 2 more pages), the cost is 1000 + 500 = 1500
18
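The block nested loops cost reduces to one formula: outer pages plus (number of outer blocks × inner pages). A minimal sketch, with a hypothetical function name; `block_pages` is the slide's B, the number of buffer pages holding the outer block.

```python
from math import ceil


def block_nested_loops_cost(outer_pages: int, inner_pages: int,
                            block_pages: int) -> int:
    """Read the outer relation once; scan the inner once per outer block."""
    num_blocks = ceil(outer_pages / block_pages)
    return outer_pages + num_blocks * inner_pages


print(block_nested_loops_cost(1000, 500, 100))  # Reserves outer: 6000
print(block_nested_loops_cost(500, 1000, 100))  # Sailors outer: 5500
print(block_nested_loops_cost(500, 1000, 500))  # Sailors fits in the buffer: 1500
```

With `block_pages=1` the same function gives the page-at-a-time cost of 501,000.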
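Diagrammatic View for Block Nested Loops Join
[Figure: relations Sailors and Reserves are read from disk into memory, where B pages hold the current block of Reserves, one input buffer scans Sailors, and one output buffer collects the result, which is written back to disk]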

19
• Index Nested Loops Join
– Assume an index on sid of the Sailors relation:
for each r ∈ Reserves do
    for each s ∈ Sailors where r.sid = s.sid (use index)
        add <r, s> to the result
• Note: for each tuple in Reserves, we use the index to find the
matching tuples in Sailors
• Cost, assuming a hash index (assume the directory is in main memory)
– Scan Reserves: 1000 I/Os
– For each tuple in Reserves, an average of 1.2 I/Os to get to the bucket
page containing the matching Sailors data entry
– For each matching Sailors data entry, retrieve the Sailors tuple for 1
I/O (note: sid is the primary key of the Sailors relation)
– Each page of Reserves contains 100 tuples.
– Total: 1000 + 100 × 1000 × (1 + 1.2) = 221,000 I/Os
20
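The index nested loops algorithm can be sketched with a Python dictionary standing in for the hash index on the inner relation. This is an in-memory illustration only; the function name and the sample tuples are hypothetical.

```python
from collections import defaultdict


def index_nested_loops_join(outer, inner, outer_key=0, inner_key=0):
    """Probe an index on the inner relation once per outer tuple."""
    # Stand-in for the hash index on the inner join column.
    index = defaultdict(list)
    for s in inner:
        index[s[inner_key]].append(s)
    result = []
    for r in outer:
        # .get avoids creating empty entries for keys with no match.
        for s in index.get(r[outer_key], []):
            result.append((r, s))
    return result


reserves = [(28, 103), (31, 101), (31, 102), (99, 104)]   # (sid, bid)
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]  # (sid, sname)
joined = index_nested_loops_join(reserves, sailors)
print(joined)
```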
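Index on sid of Sailors
[Figure: a Reserves tuple with sid 16 is hashed with h(16); the directory entry leads to the bucket holding 16*, 24*, whose data entry 16* points to the Sailors tuple with sid 16. Sailors has columns sid, age, level, name; Reserves has columns sid, bid, day, rname.]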
21
• Now assume an index on sid of the Reserves relation; we use
the following algorithm:
for each s ∈ Sailors do
    for each r ∈ Reserves where s.sid = r.sid (use index)
        add <r, s> to the result
• Cost, assuming a hash index (assume the directory is in main memory)
– Scan Sailors: 500 I/Os
– For each tuple in Sailors, an average of 1.2 I/Os to get to the bucket
page containing the matching Reserves data entry
– For each matching Reserves data entry, retrieve the Reserves tuples.
Estimation: 100,000 reservations for 40,000 sailors, so each sailor
makes 2.5 reservations on average
• Clustered index: the 2.5 reservations are likely on the same page, 40000 × 1 I/Os.
Total cost: 500 + 40000 × 1.2 + 40000 × 1 = 88,500 I/Os
• Unclustered index: the 2.5 reservations are not likely on the same page, 40000 × 2.5
I/Os. Total cost: 500 + 40000 × 1.2 + 40000 × 2.5 = 148,500 I/Os
22
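The index nested loops totals follow one pattern: scan the outer relation, then pay a probe cost plus a data-retrieval cost per outer tuple. A sketch with hypothetical names; results are rounded because the 1.2 I/O figure is a floating-point average.

```python
def index_nested_loops_cost(outer_pages: int, outer_tuples: int,
                            probe_ios: float,
                            data_ios_per_outer_tuple: float) -> int:
    """Outer scan + per-tuple (index probe + matching tuple retrieval)."""
    total = outer_pages + outer_tuples * (probe_ios + data_ios_per_outer_tuple)
    return round(total)


# Reserves outer, hash index on Sailors.sid (a key: 1 I/O per match).
print(index_nested_loops_cost(1000, 100_000, 1.2, 1))    # 221000
# Sailors outer, clustered hash index on Reserves.sid.
print(index_nested_loops_cost(500, 40_000, 1.2, 1))      # 88500
# Sailors outer, unclustered index: 2.5 reservations on different pages.
print(index_nested_loops_cost(500, 40_000, 1.2, 2.5))    # 148500
```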
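Unclustered Index on sid of Reserves
[Figure: a Sailors tuple with sid 16 is hashed with h(16); the directory entry leads to the bucket holding 16*, 24*, whose data entries point to the Reserves tuples with sid 16. Reserves has columns sid, bid, day, rname; Sailors has columns sid, age, level, name.]
23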
Idea for Sort-Merge Join ($R \bowtie_{i=j} S$)
• Sort R and S on the join column, then scan them to do a
"merge" (on the join column), and output result tuples.
– Advance scan of R until current R-tuple >= current S tuple, then
advance scan of S until current S-tuple >= current R tuple; do this
until current R tuple = current S tuple.
– At this point, all R tuples with the same value in Ri (current R
group) and all S tuples with the same value in Sj (current S group)
match; output <r, s> for all pairs of such tuples.
– Then resume scanning R and S.
• R is scanned once; each S group is scanned once per
matching R tuple. (Multiple scans of an S group are likely
to find needed pages in buffer.)
24
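The merge step described above can be sketched in Python over in-memory lists. The function name is hypothetical and the sketch ignores paging and buffering; it only illustrates how the R and S scans advance and how matching groups are paired.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples by sorting and merging on the join key."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1                      # advance the scan of R
        elif kr > ks:
            j += 1                      # advance the scan of S
        else:
            # Find the end of the current S group (same join value).
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Pair every tuple of the current R group with the S group.
            while i < len(r) and key_r(r[i]) == kr:
                for t in s[j:j_end]:
                    out.append((r[i], t))
                i += 1
            j = j_end                   # resume scanning after the S group
    return out


sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (44, "guppy"), (58, "rusty")]
reserves = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]
pairs = sort_merge_join(sailors, reserves, key_r=lambda t: t[0], key_s=lambda t: t[0])
print(len(pairs))  # 6
```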
Example of Sort-Merge Join
Sailors
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Reserves
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin

• Cost
– Cost of sorting the two relations (use the formula)
– Cost of joining the two relations: if one of the join attributes is a primary key, then
this cost is M+N, where M and N are the sizes (in pages) of the two relations

25
Size of the Join result
• Terminology
– $n_r$: number of tuples in relation r.
– $f_r$: blocking factor of relation r, i.e., the number of
tuples of relation r that fit into one block (page).
– If tuples of relation r are stored together physically,
the number of blocks is $b_r = \lceil n_r / f_r \rceil$
– $s_r$: the size of a record (tuple) of relation r.
– V(A,r): the number of distinct values that appear in
relation r for attribute A.
• The size of $\pi_A(r)$.
• V(A,r) = $n_r$ if A is a key for relation r.
26
• SC(A,r): the selection cardinality of
attribute A of relation r, i.e., the average number
of records that satisfy an equality condition
on attribute A.
– SC(A,r) = 1, if A is a key of r.
– Assuming distinct values are distributed evenly
and A is not a key, $SC(A,r) = n_r / V(A,r)$

27
• Example
– $n_{customer}$ = 10,000.
– $f_{customer}$ = 25 ⇒ $b_{customer}$ = 10000/25 = 400.
– $n_{depositor}$ = 5000
– $f_{depositor}$ = 50 ⇒ $b_{depositor}$ = 5000/50 = 100.
– V(customer-name, depositor) = 2500
• On average each customer has two accounts
– Assume customer-name in depositor is a
foreign key referencing customer.
28
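The catalog statistics in this example can be recomputed with two small helpers. The function names are hypothetical; the numbers are the ones given on the slide.

```python
from math import ceil


def num_blocks(n_r: int, f_r: int) -> int:
    """b_r = ceil(n_r / f_r), assuming the tuples are stored together."""
    return ceil(n_r / f_r)


def selection_cardinality(n_r: int, distinct_values: int,
                          is_key: bool = False) -> float:
    """SC(A, r): average number of records matching an equality on A."""
    return 1 if is_key else n_r / distinct_values


print(num_blocks(10_000, 25))             # b_customer = 400
print(num_blocks(5000, 50))               # b_depositor = 100
print(selection_cardinality(5000, 2500))  # 2.0 accounts per customer
```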
• $r \times s$ contains $n_r n_s$ tuples.
• Each tuple of $r \times s$ occupies $s_r + s_s$ bytes.
• Size of natural join
– Let r(R) and s(S) be relations.
– If $R \cap S = \emptyset$:
• $r \bowtie s$ is the same as $r \times s$.
– If $R \cap S$ is a key of r:
• A tuple of s will join with at most one tuple from r.
• The number of tuples in $r \bowtie s$ is no greater than the
number of tuples in s.
• If $R \cap S$ is a foreign key of s referencing r, then
the number of tuples in $r \bowtie s$ is exactly the same as
the number of tuples in s.
29
• Example, $depositor \bowtie customer$
» customer-name in depositor is a foreign key referring
to customer-name in customer.
– The size of the result is $n_{depositor}$ = 5000.
– If $R \cap S$ is a key for neither r nor s:
• Let $R \cap S = \{A\}$.
• Assume each value appears with equal probability.
• We estimate that each tuple t in r produces $n_s / V(A,s)$ tuples in $r \bowtie s$
• Total number of tuples in $r \bowtie s$ is estimated to be $n_r n_s / V(A,s)$
30
– Similarly, if we reverse the roles of r and s, the total
number of tuples is estimated to be $n_r n_s / V(A,r)$
– These two estimates differ if $V(A,r) \ne V(A,s)$.
– The lower of the two estimates is probably the better
one, since there are likely to be some dangling tuples.
– Example without using information about foreign keys:
• V(customer-name, depositor) = 2500
– Size = 5000 × 10000 / 2500 = 20000
• V(customer-name, customer) = 10000
– Size = 5000 × 10000 / 10000 = 5000
• The lower one is 5000.

31
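The two join-size estimates and the rule of taking the lower one can be written as a short function. The name `natural_join_size_estimate` is hypothetical; the inputs are the depositor and customer statistics from the example.

```python
def natural_join_size_estimate(n_r: int, n_s: int,
                               v_a_r: int, v_a_s: int) -> float:
    """Estimate |r join s| on a shared attribute A that is a key of neither."""
    estimate_using_s = n_r * n_s / v_a_s   # each r tuple matches n_s / V(A, s)
    estimate_using_r = n_r * n_s / v_a_r   # roles of r and s reversed
    # The lower estimate is preferred because of dangling tuples.
    return min(estimate_using_s, estimate_using_r)


# r = depositor (5000 tuples, 2500 names), s = customer (10000 tuples, 10000 names)
print(natural_join_size_estimate(5000, 10_000, 2500, 10_000))  # 5000.0
```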
Selection Operation
• Linear Search
– All blocks have to be read: $b_r$
– Selection on a key attribute: $b_r / 2$ (on average)
• Binary Search
– Locating the first tuple: $\lceil \log_2(b_r) \rceil$
– Assume the total number of records that will satisfy the
selection is SC(A,r).
– These records will occupy $\lceil SC(A,r) / f_r \rceil$ blocks
– Total block accesses: $\lceil \log_2(b_r) \rceil + \lceil SC(A,r) / f_r \rceil - 1$
– If the equality condition is on a key attribute
• SC(A,r) = 1
• Total cost = $\lceil \log_2(b_r) \rceil$
32
• Clustered index, equality on key
– Cost = index access + 1
• Clustered index, equality on nonkey
– SC(A,r) tuples will satisfy an equality condition.
– Block accesses to retrieve the records: $\lceil SC(A,r) / f_r \rceil$
– Total: index access + $\lceil SC(A,r) / f_r \rceil$
• Unclustered index, equality
– SC(A,r) tuples will satisfy an equality condition.
– Worst case scenario: each matching record resides on a
different block.
– Cost = index access + SC(A,r)
– For key indexing attribute: index access + 1

33
• Selections involving comparisons
– Let the lowest and highest values be min(A,r) and
max(A,r).
– Assume the values are uniformly distributed.
– Number of records that will satisfy $A \le v$:
• 0, if $v < \min(A,r)$
• $n_r$, if $v \ge \max(A,r)$
• otherwise, $n_r \cdot (v - \min(A,r)) / (\max(A,r) - \min(A,r))$
• Clustered index, comparison
– Let the number of records that satisfy the condition
be c.
– Cost = index access + $\lceil c / f_r \rceil$
34
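The uniform-distribution estimate for a comparison A ≤ v is a linear interpolation between min(A,r) and max(A,r). A sketch with a hypothetical function name and made-up sample values:

```python
def records_satisfying_le(n_r: int, v: float, min_a: float, max_a: float) -> float:
    """Estimated number of records with A <= v, assuming uniform values."""
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)


# Hypothetical relation: 10,000 records with A uniformly spread over [0, 100].
print(records_satisfying_le(10_000, 50, 0, 100))   # 5000.0
print(records_satisfying_le(10_000, -1, 0, 100))   # 0
print(records_satisfying_le(10_000, 250, 0, 100))  # 10000
```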
• Unclustered index, comparison
– Cost = index access + c
– E.g., assume one-half of the records satisfy the
condition
• Cost = index access + $n_r / 2$
• Suppose a B+ tree is used
– Cost = number of levels + (number of leaf nodes)/2 - 1 + $n_r / 2$

35
Complex Selections
• Conjunction selection $\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n}(r)$
– Let $s_i$ be the size of $\sigma_{\theta_i}(r)$
– Probability of satisfying condition $\theta_i$ is $s_i / n_r$
– Estimated size = $n_r \cdot (s_1 \cdot s_2 \cdots s_n) / n_r^n$
• Disjunction selection $\sigma_{\theta_1 \vee \theta_2 \vee \cdots \vee \theta_n}(r)$
– Estimated size = $n_r \cdot (1 - (1 - s_1/n_r)(1 - s_2/n_r) \cdots (1 - s_n/n_r))$
• Negation $\sigma_{\neg\theta}(r)$
– Estimated size = $size(r) - size(\sigma_{\theta}(r))$

36
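The three size estimates for complex selections can be sketched directly from the formulas, treating the individual conditions as independent. The function names and the sample numbers are hypothetical.

```python
from math import prod


def conjunction_size(n_r: int, sizes: list) -> float:
    """n_r times the product of the individual selectivities s_i / n_r."""
    return n_r * prod(s / n_r for s in sizes)


def disjunction_size(n_r: int, sizes: list) -> float:
    """n_r times the probability that at least one condition holds."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))


def negation_size(n_r: int, size: float) -> float:
    """Tuples that do not satisfy the condition."""
    return n_r - size


# Hypothetical relation with 1000 tuples; two conditions match 100 and 200 tuples.
print(conjunction_size(1000, [100, 200]))
print(disjunction_size(1000, [100, 200]))
print(negation_size(1000, 100))
```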
• Conjunctive selection using one index
– Use a selection algorithm (those discussed before) to retrieve the
records.
– Complete the operation by testing, in the memory buffer (on the
fly), whether the remaining conditions are satisfied.
• Conjunctive selection using composite index
– As discussed before.
• Conjunctive Selection by intersection of identifiers
– Each index is scanned for pointers to tuples that satisfy an
individual condition.
– Intersect all the retrieved pointers
– Use the resulting pointers to retrieve the actual records.
– If indices are not available on all the individual conditions, then
the retrieved records are tested against the remaining conditions.

37
• Disjunctive selection by union of identifiers
– Each index is scanned for pointers to tuples that satisfy an
individual condition.
– Take the union of all the retrieved pointers
– Use the resulting pointers to retrieve the actual records.
– If even one of the conditions does not have an access path, we will
have to perform a linear scan on the relation.

38
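Selection by intersection or union of identifiers maps naturally onto set operations. In this sketch each index scan is represented by a set of record identifiers; the function names and the sample identifiers are hypothetical.

```python
def conjunctive_rids(*rid_sets):
    """Record ids that satisfy every indexed condition."""
    return set.intersection(*map(set, rid_sets))


def disjunctive_rids(*rid_sets):
    """Record ids that satisfy at least one indexed condition."""
    return set.union(*map(set, rid_sets))


# Record ids returned by two hypothetical index scans.
from_index_on_bid = {3, 7, 12, 19}
from_index_on_day = {7, 8, 19, 25}
print(sorted(conjunctive_rids(from_index_on_bid, from_index_on_day)))  # [7, 19]
print(sorted(disjunctive_rids(from_index_on_bid, from_index_on_day)))
```

Only the records behind the resulting identifiers are then fetched; any condition without an index is tested on the fetched records.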
More Examples

Sailors (sid: integer, sname: string, rating: integer, age: real)

Reserves (sid: integer, bid: integer, day: dates, rname: string)

• Reserves:
– Each tuple is 40 bytes long, 100 tuples per page, 1000
pages.
• Sailors:
– Each tuple is 50 bytes long, 80 tuples per page, 500
pages.
39
RA Tree
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5
• Translating to a relational algebra expression:
$\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$
• Tree representation
– Nodes are relational operators
– Edges point to where the input comes from
– In general, we must also add an access path to each node
[Figure, RA tree: $\pi_{sname}$ over $\sigma_{bid=100 \wedge rating>5}$ over $\bowtie_{sid=sid}$, with inputs Reserves and Sailors]
[Figure, plan: $\pi_{sname}$ (on-the-fly) over $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly) over $\bowtie_{sid=sid}$ (Simple Nested Loops), with inputs Reserves and Sailors]
• Cost: 500+500*1000=500500 I/Os (page at a time)
• Misses several opportunities: selections could have
been `pushed' earlier, didn't use any available
indexes, etc.
• Goal of optimization: to find more efficient plans
that compute the same answer.
40
On-the-Fly vs Materialized Evaluations
• Let op1 and op2 be two relational algebra operations, where
op1 is performed on the result of op2
• The evaluation of op1 is on-the-fly if the result of op2 is
directly sent to op1 (i.e., not stored in a temporary file)
• It is materialized if that result is stored in a temporary file
first
• On-the-fly evaluation is also called pipelined evaluation
• On-the-fly can be more efficient than materialized
– Example: $\sigma_{bid=100}(Reserves \bowtie_{sid=sid} Sailors)$
• Whether on-the-fly should be used depends on the case
– Example: $(\sigma_{age<25} Sailors) \bowtie_{sid=sid} (\sigma_{bid=100} Reserves)$
– What if only one sailor has ever reserved bid=100 once? (i.e., only
one tuple in Reserves has bid=100)

41
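The difference between on-the-fly and materialized evaluation can be imitated with Python generators. This is an analogy, not a model of real buffer management: a generator passes each joined tuple straight to the selection, while a list stands in for the temporary file. All names here are hypothetical.

```python
def join(reserves, sailors):
    """Nested loops join on sid, yielding one result tuple at a time."""
    for r in reserves:
        for s in sailors:
            if r["sid"] == s["sid"]:
                yield {**r, "name": s["name"]}


def select_bid(tuples, bid):
    """Selection on bid, consuming its input one tuple at a time."""
    for t in tuples:
        if t["bid"] == bid:
            yield t


reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 200}, {"sid": 2, "bid": 100}]
sailors = [{"sid": 1, "name": "Lin"}, {"sid": 2, "name": "Wu"}, {"sid": 3, "name": "Du"}]

# On-the-fly: each joined tuple is tested immediately and never stored.
pipelined = list(select_bid(join(reserves, sailors), 100))

# Materialized: the whole join result is stored first (the list plays the temp file).
temp = list(join(reserves, sailors))
materialized = list(select_bid(temp, 100))

print(pipelined == materialized, len(temp), len(pipelined))
```

Both strategies return the same answer; they differ only in whether the intermediate result is stored.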
On-the-Fly vs. Materialized Evaluations: example
$\sigma_{bid=100}(Reserves \bowtie_{sid=sid} Sailors)$

Reserves
sid  bid
1    100
2    200
2    100

Sailors
sid  name
1    Lin
2    Wu
3    Du

[Figure, four animation frames comparing the two strategies side by side. Materialized: the join result (1, 100, Lin), (2, 200, Wu), (2, 100, Wu) is written from the buffer for the temporary result to disk and read back before bid=100 is applied. On-the-fly: each joined tuple is tested against bid=100 in the buffer as soon as it is produced, so only (1, 100, Lin) and (2, 100, Wu) are kept.]
42-45
Alternative Plans 1 (No Indexes)
[Figure, plan: $\pi_{sname}$ (on-the-fly) over $\bowtie_{sid=sid}$ (Sort-Merge Join) over $\sigma_{bid=100}$(Reserves) (scan; write to temp T1) and $\sigma_{rating>5}$(Sailors) (scan; write to temp T2)]
• Main difference: push selects.
• With 5 buffer pages, cost of plan:
– Scan Reserves (1000) + write temp T1 (10 pages, assuming 100 boats, uniform
distribution. There are 100,000 tuples in Reserves, with 1000 tuples per boat, stored in
10 pages.)
– Scan Sailors (500) + write temp T2 (250 pages, assuming 10 ratings, uniform
distribution. There are 40,000 tuples in Sailors, with 4000 tuples per rating. Thus 20,000
tuples have ratings > 5, stored in 20000/80 = 250 pages.)
– Sort T1 (2*10*2), sort T2 (2*250*4), merge (10+250)
– Total: 4060 page I/Os.
• If we used BNL join, join cost = 10+4*250, total cost = 1010 + 750 + 1010 = 2770.
(Note that for BNL we do not sort T1 and T2.)
• Furthermore, we can push projections (next slide)
46
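The totals for the sort-merge and BNL variants of this plan can be recomputed as follows. This is a sketch with hypothetical function names; it assumes, as the slide's arithmetic implies, that BNL uses T1 as the outer relation and reads it in blocks of (buffers - 2) pages.

```python
from math import ceil


def external_sort_cost(num_pages: int, num_buffers: int) -> int:
    """2 * M * (number of passes) for external merge sort."""
    runs = ceil(num_pages / num_buffers)
    passes = 1
    while runs > 1:
        runs = ceil(runs / (num_buffers - 1))
        passes += 1
    return 2 * num_pages * passes


def pushed_selection_plan_costs(t1_pages: int = 10, t2_pages: int = 250,
                                buffers: int = 5):
    """Return (sort-merge total, BNL total) for the plan with pushed selections."""
    # Scan Reserves (1000) and Sailors (500), writing temps T1 and T2.
    build_temps = (1000 + t1_pages) + (500 + t2_pages)
    sort_merge = (external_sort_cost(t1_pages, buffers)
                  + external_sort_cost(t2_pages, buffers)
                  + t1_pages + t2_pages)             # final merge pass
    bnl = t1_pages + ceil(t1_pages / (buffers - 2)) * t2_pages
    return build_temps + sort_merge, build_temps + bnl


print(pushed_selection_plan_costs())  # (4060, 2770)
```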
Alternative Plans 1 (cont.) (No Indexes)
[Figure, plan: $\bowtie_{sid=sid}$ (BNL) over $\pi_{sid}(\sigma_{bid=100}$(Reserves)) (file scan; on-the-fly: write to T1) and $\pi_{sname,sid}(\sigma_{rating>5}$(Sailors)) (file scan; on-the-fly: write to T2)]
• When we `push' projections as well, T1 has only sid, T2 only sid
and sname:
– Assume T1 fits in 3 pages, and T2 fits in 200 pages.
– Join cost of BNL: 3 + 1 × 200 = 203
– Total cost: 1000+3+500+200+203 = 1906

47
Alternative Plans 2 with Indexes
[Figure, plan: $\pi_{sname}$ (on-the-fly) over $\sigma_{rating>5}$ (on-the-fly) over $\bowtie_{sid=sid}$ (Index Nested Loops, with pipelining) over $\sigma_{bid=100}$(Reserves) (use hash index; do not write result to temp) and Sailors]
Assumption: clustered hash index on bid of Reserves, and hash index on sid of Sailors
1. Hash on 100, get all Reserves tuples with bid = 100
2. Use index nested loops with pipelining (outer is not materialized)
– For each tuple obtained at step 1, hash its sid value to get the matching Sailors tuple
(Note: sid is the key of Sailors, so at most one matching tuple exists; an unclustered index on sid of Sailors is OK)
– Join these two tuples, select if rating > 5
– If selected, project out its sname
• Cost, assuming 100 boats uniformly distributed in Reserves
• We get 100,000/100 = 1000 tuples with a bid of 100. They are in 10 pages (note:
Reserves is sorted on bid). For each such tuple, use 2.2 I/Os (0 [directory in main memory]
+ 1.2 [with overflow buckets] + 1 [sid is the key in Sailors]) to get the matching tuple in
Sailors, for a total of 1000 × 2.2 = 2200 I/Os
• Total cost of the query: 2 + 10 + 2200 = 2212 I/Os (assume the index cost of Reserves is 2,
i.e., directory is on disk, no overflow bucket)
48
Alternative Plans 2 With Indexes (cont.)
[Figure: the same index nested loops plan as on the previous slide]
• Note: we did not push the selection on rating before the join. Why?
– If we perform the selection before the join, we must scan Sailors, with an
extra cost of 500 I/Os.
– More importantly, once the selection is performed, we have no index on the sid field
of the result any more
• Note: the result is a brand new file. Its location is independent of the location of
Sailors.
• We have to build a new index solely for the subsequent join operation. Is this
worthwhile?

49
Alternative Plans 2 With Indexes (cont.)
[Figure, upper plan: $\pi_{sname}$ (on-the-fly) over $\sigma_{rating>5}$ (on-the-fly) over $\bowtie_{sid=sid}$ (Nested Loops, page at a time) over $\sigma_{bid=100}$(Reserves) (use hash index; do not write result to temp relation) and Sailors]
[Figure, lower plan: $\pi_{sname}$ (on-the-fly) over $\sigma_{rating>5}$ (on-the-fly) over $\bowtie_{sid=sid}$ (Merge join) over $\sigma_{bid=100}$(Reserves) (use hash index; write result to T1, then sort it on sid) and Sailors]
• Assume: Sailors sorted on sid, clustered hash index on bid of Reserves, 3 buffer
pages available and 1000 tuples with bid=100 in Reserves (i.e., in 10 pages)
• The cost for the upper plan:
– 2 + 10 + 10 × 500 = 5012
• The cost for the lower plan:
– 2 + 10 (select) + 10 (write) + 2 × 3 × 10 (sort) + (10+500) (sort-merge join) = 592
• This implies that on-the-fly evaluation is not always the best
50

Thesis
80 pages
SQL Profiler - 2
No ratings yet
SQL Profiler - 2
22 pages
C5 Database Commands - Viva Clipper !
No ratings yet
C5 Database Commands - Viva Clipper !
9 pages
Learn C++
From Everand
Learn C++
Durgesh
4.5/5 (9)
Factoring and Algebra - A Selection of Classic Mathematical Articles Containing Examples and Exercises on the Subject of Algebra (Mathematics Series)
From Everand
Factoring and Algebra - A Selection of Classic Mathematical Articles Containing Examples and Exercises on the Subject of Algebra (Mathematics Series)
CSPacademic
No ratings yet
Search Tree: Fundamentals and Applications
From Everand
Search Tree: Fundamentals and Applications
Fouad Sabry
No ratings yet


