Chapter 13

sorting

Uploaded by

Vijaya Goel

Silberschatz, Korth and Sudarshan, Database System Concepts - 5th Edition, Aug 27, 2005.
Sorting
We may build an index on the relation, and then use the index to read the relation in sorted order. This may lead to one disk block access for each tuple.
For relations that fit in memory, techniques like quicksort can be used.
For relations that don't fit in memory, external sort-merge is a good choice.
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
   Repeatedly do the following till the end of the relation:
   (a) Read M blocks of relation into memory
   (b) Sort the in-memory blocks
   (c) Write sorted data to run $R_i$; increment i.
   Let the final value of i be N
2. Merge the runs (next slide).
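The run-creation step can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: in-memory lists stand in for disk blocks and run files, and the names `create_sorted_runs` and `records_per_block` are assumptions introduced here.

```python
from typing import Iterable, List

def create_sorted_runs(records: Iterable[int],
                       records_per_block: int,
                       M: int) -> List[List[int]]:
    """Cut the input into sorted runs, each holding at most M blocks of records."""
    capacity = M * records_per_block      # records that fit in memory at once
    runs: List[List[int]] = []
    buffer: List[int] = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) == capacity:       # memory is full: sort and emit run R_i
            runs.append(sorted(buffer))
            buffer = []
    if buffer:                            # last run may be shorter than M blocks
        runs.append(sorted(buffer))
    return runs

# Hypothetical input: 12 records, one record per block, M = 3 blocks of memory.
runs = create_sorted_runs([24, 19, 31, 33, 14, 16, 16, 21, 3, 2, 7, 4],
                          records_per_block=1, M=3)
N = len(runs)                             # final value of i
```

With these made-up inputs the relation is cut into N = 4 runs of three records each.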
External Sort-Merge (Cont.)
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page.
   2. repeat
      1. Select the first record (in sort order) among all buffer pages.
      2. Write the record to the output buffer. If the output buffer is full, write it to disk.
      3. Delete the record from its input buffer page. If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty
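A small Python sketch of the N-way merge is given here. It simplifies the slide in two ways that should be read as assumptions: each run is an in-memory list rather than a file read block by block, and a heap (`heapq`) is used to pick the smallest head record instead of scanning all buffer pages.

```python
import heapq
from typing import Iterator, List

def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Merge already-sorted runs into one sorted stream."""
    iters = [iter(run) for run in runs]
    heap = []
    # "Read the first block of each run": here, the first record of each run.
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, i))
    heapq.heapify(heap)
    while heap:                            # until all input buffers are empty
        value, i = heapq.heappop(heap)     # first record in sort order
        yield value                        # "write the record to the output buffer"
        nxt = next(iters[i], None)         # refill from the same run, if any is left
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))

merged = list(merge_runs([[19, 24, 31], [14, 16, 33], [3, 16, 21], [2, 4, 7]]))
```

The run index `i` stored next to each value lets the merge refill from the run that just supplied a record.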
External Sort-Merge (Cont.)
If N ≥ M, several merge passes are required.
  In each pass, contiguous groups of M - 1 runs are merged.
  A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
    E.g. if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
  Repeated passes are performed till all runs have been merged into one.
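The effect of repeated passes can be checked with a few lines of Python. The helper name `runs_after_each_pass` is an assumption made for this sketch; it only counts runs and does not move any data.

```python
def runs_after_each_pass(num_runs: int, M: int) -> list:
    """Return the number of runs before the first pass and after every pass."""
    if M < 3:
        raise ValueError("need at least 2 input buffers and 1 output buffer")
    counts = [num_runs]
    while counts[-1] > 1:
        # Each pass merges groups of M - 1 runs (ceiling division).
        counts.append(-(-counts[-1] // (M - 1)))
    return counts

history = runs_after_each_pass(90, 11)   # the slide's example: M = 11, 90 runs
```

For the example above the run counts go 90, 9, 1, so two merge passes are needed.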
Example: External Sorting Using Sort-Merge
External Merge Sort (Cont.)
Cost analysis:
  Total number of merge passes required: $\lceil \log_{M-1}(b_r / M) \rceil$.
  Block transfers for initial run creation as well as in each pass is $2 b_r$
    for final pass, we don't count write cost
      we ignore final write cost for all operations since the output of an operation may be sent to the parent operation without being written to disk
  Thus total number of block transfers for external sorting:
    $b_r (2 \lceil \log_{M-1}(b_r / M) \rceil + 1)$
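As a worked check of the cost formula, the sketch below evaluates it with integer arithmetic (repeated ceiling division) rather than a floating-point logarithm. The function names are assumptions; the inputs reuse a 400-block relation with an assumed memory size of M = 11 blocks.

```python
def merge_passes(b_r: int, M: int) -> int:
    """ceil(log_{M-1}(b_r / M)), computed by repeated ceiling division."""
    runs = -(-b_r // M)                   # number of initial runs
    passes = 0
    while runs > 1:
        runs = -(-runs // (M - 1))        # one (M-1)-way merge pass
        passes += 1
    return passes

def external_sort_transfers(b_r: int, M: int) -> int:
    """b_r * (2 * passes + 1): the final write is not counted."""
    return b_r * (2 * merge_passes(b_r, M) + 1)

passes = merge_passes(400, 11)             # 37 runs -> 4 -> 1
cost = external_sort_transfers(400, 11)
```

Here 400 blocks give 37 initial runs, two merge passes, and 400 * (2 * 2 + 1) = 2000 block transfers.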
Join Operation
Several different algorithms to implement joins
Nested-loop join
Block nested-loop join
Indexed nested-loop join
Merge-join
Hash-join
Choice based on cost estimate
Examples use the following information
Number of records of customer: 10,000 depositor: 5000
Number of blocks of customer: 400 depositor: 100
Nested-Loop Join
To compute the theta join $r \bowtie_\theta s$:
  for each tuple $t_r$ in r do begin
    for each tuple $t_s$ in s do begin
      test pair ($t_r$, $t_s$) to see if they satisfy the join condition $\theta$
      if they do, add $t_r \cdot t_s$ to the result.
    end
  end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations.
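The pseudocode above translates almost line for line into Python. In this sketch tuples are Python tuples, the join condition is passed as a function `theta`, and the sample rows are made up for illustration.

```python
from typing import Callable, List, Tuple

def nested_loop_join(r: List[Tuple], s: List[Tuple],
                     theta: Callable[[Tuple, Tuple], bool]) -> List[Tuple]:
    """Theta join of r (outer) and s (inner) by examining every pair."""
    result = []
    for t_r in r:                      # outer relation
        for t_s in s:                  # inner relation
            if theta(t_r, t_s):        # test the join condition
                result.append(t_r + t_s)   # concatenate the two tuples
    return result

# Hypothetical rows: depositor(customer_name, account_number),
# customer(customer_name, street, city).
depositor = [("Hayes", "A-102"), ("Jones", "A-217")]
customer = [("Hayes", "Main", "Harrison"), ("Smith", "North", "Rye")]
joined = nested_loop_join(depositor, customer, lambda d, c: d[0] == c[0])
```

Because `theta` is an arbitrary predicate, the same function also handles non-equality conditions such as `lambda d, c: d[0] < c[0]`.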
Nested-Loop Join (Cont.)
In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is
  $n_r * b_s + b_r$ block transfers
If the smaller relation fits entirely in memory, use that as the inner relation.
  Reduces cost to $b_r + b_s$ block transfers
Assuming worst case memory availability, the cost estimate is
  with depositor as the outer relation:
    5000 * 400 + 100 = 2,000,100 block transfers,
  with customer as the outer relation:
    10000 * 100 + 400 = 1,000,400 block transfers
If the smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.
Block nested-loops algorithm (next slide) is preferable.
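The arithmetic on this slide can be reproduced with a short Python check. The function name `nested_loop_cost` is an assumption; the numbers are the customer and depositor statistics given earlier.

```python
def nested_loop_cost(n_outer: int, b_outer: int, b_inner: int) -> int:
    """Worst case: n_r * b_s + b_r block transfers (r is the outer relation)."""
    return n_outer * b_inner + b_outer

# customer: 10,000 records in 400 blocks; depositor: 5000 records in 100 blocks.
depositor_outer = nested_loop_cost(n_outer=5000, b_outer=100, b_inner=400)
customer_outer = nested_loop_cost(n_outer=10000, b_outer=400, b_inner=100)
inner_fits_in_memory = 400 + 100       # b_r + b_s
```

The three values come out as 2,000,100, 1,000,400 and 500 block transfers, matching the slide.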
Block Nested-Loop Join
Variant of nested-loop join in which every block of inner relation is
paired with every block of outer relation.
for each block $B_r$ of r do begin
  for each block $B_s$ of s do begin
    for each tuple $t_r$ in $B_r$ do begin
      for each tuple $t_s$ in $B_s$ do begin
        Check if ($t_r$, $t_s$) satisfy the join condition
        if they do, add $t_r \cdot t_s$ to the result.
      end
    end
  end
end
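A Python sketch of the same loop structure follows. Blocks are modelled as lists of tuples, and a counter of simulated block reads is added purely for illustration; both are assumptions of this sketch, not part of the slide.

```python
from typing import Callable, List, Tuple

Block = List[Tuple]

def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block],
                           theta: Callable[[Tuple, Tuple], bool]):
    """Pair every block of r with every block of s; count simulated block reads."""
    result = []
    block_reads = 0
    for B_r in r_blocks:
        block_reads += 1                   # read one block of the outer relation
        for B_s in s_blocks:
            block_reads += 1               # read one block of the inner relation
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)
    return result, block_reads

# Hypothetical data: r has 2 blocks, s has 3 blocks.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
s_blocks = [[(1, "x")], [(3, "y")], [(4, "z")]]
rows, reads = block_nested_loop_join(r_blocks, s_blocks,
                                     lambda t_r, t_s: t_r[0] == t_s[0])
```

With 2 outer and 3 inner blocks the counter reaches 2 * 3 + 2 = 8 reads, one inner scan per outer block rather than per outer tuple.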
Block Nested-Loop Join (Cont.)
Worst case estimate: $b_r * b_s + b_r$ block transfers
  Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation).
Best case: $b_r + b_s$ block transfers.
Improvements to nested loop and block nested loop algorithms:
  In block nested-loop, use M - 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output
    Cost = $\lceil b_r / (M-2) \rceil * b_s + b_r$ block transfers
  If equi-join attribute forms a key on inner relation, stop inner loop on first match
  Scan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)
  Use index on inner relation if available (next slide)
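To see how the blocking improvement changes the estimate, the sketch below evaluates the cost formula for a few memory sizes. The function name and the chosen values of M are assumptions; the block counts are those of depositor (outer, 100 blocks) and customer (inner, 400 blocks).

```python
def block_nested_loop_cost(b_outer: int, b_inner: int, M: int = 3) -> int:
    """ceil(b_r / (M - 2)) * b_s + b_r block transfers, with M blocks of memory."""
    if M < 3:
        raise ValueError("need at least 3 blocks: outer chunk, inner, output")
    outer_chunks = -(-b_outer // (M - 2))    # ceiling division
    return outer_chunks * b_inner + b_outer

worst = block_nested_loop_cost(100, 400, M=3)      # one outer block at a time
medium = block_nested_loop_cost(100, 400, M=52)    # 50 outer blocks per chunk
best = block_nested_loop_cost(100, 400, M=102)     # whole outer relation fits
```

With M = 3 the formula reduces to the worst case (40,100 transfers); once the whole outer relation fits in M - 2 blocks it reaches the best case of 500.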
Indexed Nested-Loop Join
Index lookups can replace file scans if
  join is an equi-join or natural join and
  an index is available on the inner relation's join attribute
    Can construct an index just to compute a join.
For each tuple $t_r$ in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple $t_r$. (Equivalent to selection on s.)
Worst case: buffer has space for only one page of r and one page of index. For each tuple in r, we perform an index lookup on s.
Cost of the join: $b_r + n_r * c$
  Where c is the cost of traversing index and fetching all matching s tuples for one tuple of r
  c can be estimated as cost of a single selection on s using the join condition.
If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation.
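The idea can be sketched in Python with a dictionary standing in for the index on the inner relation. This is an in-memory simplification chosen for illustration (a real system would typically use a B+-tree or hash index on disk); the function and parameter names are assumptions.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Tuple

def indexed_nested_loop_join(r: List[Tuple], s: List[Tuple],
                             r_key: Callable[[Tuple], Hashable],
                             s_key: Callable[[Tuple], Hashable]) -> List[Tuple]:
    """Equi-join: for each outer tuple, look up matches through an index on s."""
    # Build the index on the inner relation's join attribute
    # ("can construct an index just to compute a join").
    index = defaultdict(list)
    for t_s in s:
        index[s_key(t_s)].append(t_s)
    result = []
    for t_r in r:                                   # one lookup per outer tuple
        for t_s in index.get(r_key(t_r), []):
            result.append(t_r + t_s)
    return result

# Hypothetical rows for depositor (outer) and customer (inner).
depositor = [("Hayes", "A-102"), ("Jones", "A-217"), ("Hayes", "A-305")]
customer = [("Hayes", "Main"), ("Smith", "North")]
joined = indexed_nested_loop_join(depositor, customer,
                                  r_key=lambda d: d[0], s_key=lambda c: c[0])
```

Each outer tuple triggers one lookup, which is the $n_r * c$ term of the cost formula.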
Example of Nested-Loop Join Costs
Compute depositor $\bowtie$ customer, with depositor as the outer relation.
Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.
Since customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
depositor has 5000 tuples
Cost of block nested loops join
  400*100 + 100 = 40,100 block transfers
    assuming worst case memory
  may be significantly less with more memory
Cost of indexed nested loops join
  100 + 5000 * 5 = 25,100 block transfers.
  CPU cost likely to be less than that for block nested loops join
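The numbers in this example can be re-derived with a few lines of Python. The `btree_height` helper is a rough estimate introduced here: it assumes every index node holds exactly 20 entries, which is a simplification; the slide itself simply states that the height is 4.

```python
def btree_height(num_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= num_keys (assumes fully packed nodes)."""
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height

height = btree_height(10_000, 20)        # 20 -> 400 -> 8000 -> 160000
c = height + 1                           # tree traversal plus one data access
indexed_cost = 100 + 5000 * c            # b_r + n_r * c, depositor as outer
block_nl_cost = 400 * 100 + 100          # b_r * b_s + b_r, worst case memory
```

Under these assumptions the index lookup costs c = 5 block accesses per depositor tuple, giving 25,100 transfers against 40,100 for block nested loops.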
Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join attributes).
2. Merge the sorted relations to join them
   1. Join step is similar to the merge stage of the sort-merge algorithm.
   2. Main difference is handling of duplicate values in join attribute: every pair with same value on join attribute must be matched
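A compact Python sketch of merge-join, including the duplicate handling described in step 2, might look like this. It works on in-memory lists and sorts them with `sorted`; function and parameter names are assumptions.

```python
from typing import Callable, List, Tuple

def merge_join(r: List[Tuple], s: List[Tuple],
               key_r: Callable, key_s: Callable) -> List[Tuple]:
    """Equi-join by sorting both inputs and merging them."""
    r = sorted(r, key=key_r)               # step 1: sort on the join attribute
    s = sorted(s, key=key_s)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):       # step 2: merge
        k_r, k_s = key_r(r[i]), key_s(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Find the group of s tuples sharing this join value ...
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == k_r:
                j_end += 1
            # ... and match it with every r tuple that has the same value.
            while i < len(r) and key_r(r[i]) == k_r:
                for t_s in s[j:j_end]:
                    result.append(r[i] + t_s)
                i += 1
            j = j_end
    return result

joined = merge_join([(2, "c"), (1, "a"), (2, "b")],
                    [(3, "z"), (2, "x"), (2, "y")],
                    key_r=lambda t: t[0], key_s=lambda t: t[0])
```

The inner group scan is what distinguishes this from a plain sort-merge: two r tuples and two s tuples with value 2 yield four result tuples.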
Merge-Join (Cont.)
Can be used only for equi-joins and natural joins
Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory)
Thus the cost of merge join is:
  $b_r + b_s$ block transfers
  + the cost of sorting if relations are unsorted.
Hybrid merge-join: If one relation is sorted, and the other has a secondary B+-tree index on the join attribute
  Merge the sorted relation with the leaf entries of the B+-tree.
  Sort the result on the addresses of the unsorted relation's tuples
  Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples
    Sequential scan more efficient than random lookup
Hash-Join
Applicable for equi-joins and natural joins.
A hash function h is used to partition tuples of both relations
  h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
  $r_0, r_1, \dots, r_n$ denote partitions of r tuples
    Each tuple $t_r \in r$ is put in partition $r_i$ where $i = h(t_r[JoinAttrs])$.
  $s_0, s_1, \dots, s_n$ denote partitions of s tuples
    Each tuple $t_s \in s$ is put in partition $s_i$, where $i = h(t_s[JoinAttrs])$.
Note: In book, $r_i$ is denoted as $H_{r_i}$, $s_i$ is denoted as $H_{s_i}$ and n is denoted as $n_h$.

Hash-Join (Cont.)
r tuples in $r_i$ need only to be compared with s tuples in $s_i$. They need not be compared with s tuples in any other partition, since:
  an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.
  If that value is hashed to some value i, the r tuple has to be in $r_i$ and the s tuple in $s_i$.
Hash-Join Algorithm
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h.
2. Partition r similarly.
3. For each i:
   (a) Load $s_i$ into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.
   (b) Read the tuples in $r_i$ from the disk one by one. For each tuple $t_r$ locate each matching tuple $t_s$ in $s_i$ using the in-memory hash index. Output the concatenation of their attributes.
Relation s is called the build input and r is called the probe input.
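The three steps can be sketched in Python as below. Several simplifications are assumptions of this sketch: partitions are in-memory lists rather than disk files, the partitioning function h is Python's built-in `hash` modulo the partition count, and a `dict` plays the role of the in-memory hash index (so it does not literally use a second, independent hash function).

```python
from typing import Callable, List, Tuple

def hash_join(r: List[Tuple], s: List[Tuple],
              key_r: Callable, key_s: Callable,
              num_partitions: int = 4) -> List[Tuple]:
    """Equi-join of probe input r and build input s via hash partitioning."""
    def h(value) -> int:
        return hash(value) % num_partitions

    # Steps 1 and 2: partition s and r with the same function h.
    s_parts = [[] for _ in range(num_partitions)]
    r_parts = [[] for _ in range(num_partitions)]
    for t_s in s:
        s_parts[h(key_s(t_s))].append(t_s)
    for t_r in r:
        r_parts[h(key_r(t_r))].append(t_r)

    result = []
    for i in range(num_partitions):          # step 3: one partition pair at a time
        index = {}                           # (a) build index on s_i
        for t_s in s_parts[i]:
            index.setdefault(key_s(t_s), []).append(t_s)
        for t_r in r_parts[i]:               # (b) probe with tuples of r_i
            for t_s in index.get(key_r(t_r), []):
                result.append(t_r + t_s)
    return result

joined = hash_join([(1, "a"), (2, "b"), (5, "c")],
                   [(1, "x"), (5, "y"), (5, "z"), (7, "w")],
                   key_r=lambda t: t[0], key_s=lambda t: t[0])
```

Only tuples that land in the same partition number are ever compared, which is the property the partitioning step is meant to guarantee.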
Complex Joins
Join with a conjunctive condition:
  $r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$
  Either use nested loops/block nested loops, or
  Compute the result of one of the simpler joins $r \bowtie_{\theta_i} s$
    final result comprises those tuples in the intermediate result that satisfy the remaining conditions
    $\theta_1 \wedge \dots \wedge \theta_{i-1} \wedge \theta_{i+1} \wedge \dots \wedge \theta_n$
Join with a disjunctive condition:
  $r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$
  Either use nested loops/block nested loops, or
  Compute as the union of the records in individual joins $r \bowtie_{\theta_i} s$:
    $(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \dots \cup (r \bowtie_{\theta_n} s)$
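Both strategies can be illustrated with a short Python sketch. For brevity the "simpler join" is done with plain nested loops here, whereas in practice one would pick whichever $\theta_i$ admits an efficient algorithm (index, merge or hash join); the function names and sample data are assumptions.

```python
from typing import Callable, List, Set, Tuple

Cond = Callable[[Tuple, Tuple], bool]

def conjunctive_join(r: List[Tuple], s: List[Tuple],
                     thetas: List[Cond], i: int = 0) -> List[Tuple]:
    """Join on theta_i first, then filter by the remaining conditions."""
    pairs = [(t_r, t_s) for t_r in r for t_s in s if thetas[i](t_r, t_s)]
    rest = thetas[:i] + thetas[i + 1:]
    return [t_r + t_s for t_r, t_s in pairs
            if all(th(t_r, t_s) for th in rest)]

def disjunctive_join(r: List[Tuple], s: List[Tuple],
                     thetas: List[Cond]) -> Set[Tuple]:
    """Union of the individual joins, one per condition (set semantics)."""
    result: Set[Tuple] = set()
    for th in thetas:
        result |= {t_r + t_s for t_r in r for t_s in s if th(t_r, t_s)}
    return result

r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 5), (2, 25), (4, 40)]
same_key: Cond = lambda t_r, t_s: t_r[0] == t_s[0]
bigger: Cond = lambda t_r, t_s: t_r[1] > t_s[1]
both = conjunctive_join(r, s, [same_key, bigger])
either = disjunctive_join(r, s, [same_key, bigger])
```

Using a set for the disjunctive case removes pairs that satisfy more than one condition, which is what the union in the formula requires.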

Evaluation of Expressions
So far: we have seen algorithms for individual operations
Alternatives for evaluating an entire expression tree
Materialization: generate results of an expression whose inputs
are relations or are already computed, materialize (store) it on
disk. Repeat.
Pipelining: pass on tuples to parent operations even as an
operation is being executed
We study above alternatives in more detail
Materialization
Materialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.
E.g., in the expression tree for $\Pi_{customer\text{-}name}(\sigma_{balance<2500}(account) \bowtie customer)$, compute and store $\sigma_{balance<2500}(account)$, then compute and store its join with customer, and finally compute the projection on customer-name.
Materialization (Cont.)
Materialized evaluation is always applicable
Cost of writing results to disk and reading them back can be quite high
Our cost formulas for operations ignore cost of writing results to
disk, so
Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
Double buffering: use two output buffers for each operation, when one
is full write it to disk while the other is getting filled
Allows overlap of disk writes with computation and reduces
execution time
Pipelining
Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.
E.g., in previous expression tree, don't store result of $\sigma_{balance<2500}(account)$; instead, pass tuples directly to the join. Similarly, don't store result of join, pass tuples directly to projection.
Much cheaper than materialization: no need to store a temporary relation to disk.
Pipelining may not always be possible, e.g., for sort and hash-join.
For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation.
Pipelines can be executed in two ways: demand driven and producer driven
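Demand-driven pipelining maps naturally onto Python generators: each operator pulls tuples from its child only when its own consumer asks for the next result. The sketch below is an illustration under assumptions, not the slides' implementation: rows are dictionaries, the operator names are made up, and the toy `account` rows carry a `customer_name` column so that the join has an attribute to match on.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def select(pred: Callable[[Row], bool], rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        if pred(row):
            yield row                        # passed on without being stored

def join(rows: Iterable[Row], inner: List[Row], key: str) -> Iterator[Row]:
    for row in rows:                         # pulls from the selection on demand
        for other in inner:
            if row[key] == other[key]:
                yield {**row, **other}

def project(cols: List[str], rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield {c: row[c] for c in cols}

# Hypothetical data for the expression on the slide.
account = [
    {"account_number": "A-101", "customer_name": "Hayes", "balance": 500},
    {"account_number": "A-215", "customer_name": "Smith", "balance": 3000},
]
customer = [
    {"customer_name": "Hayes", "city": "Harrison"},
    {"customer_name": "Smith", "city": "Rye"},
]
plan = project(["customer_name"],
               join(select(lambda a: a["balance"] < 2500, account),
                    customer, "customer_name"))
names = list(plan)                           # tuples flow only when requested here
```

Nothing is computed when `plan` is built; work happens only as `list(plan)` requests tuples from the top operator, and no intermediate relation is ever materialized.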
