
Spring 2017

EXTERNAL SORTING
(CH. 13 IN THE COW BOOK)

2/7/17 CS 564: Database Management Systems; (c) Jignesh M. Patel, 2013 1


Motivation for External Sort
• We often have a large file (larger than the available
main memory) that we need to sort.
• Why are we sorting:
– Query processing: e.g. there are sort-based join and
aggregate algorithms
– Bulkload B+-tree: recall you had to sort the data
entries in the leaf level for this.
– One can specify ORDER BY in SQL, which sorts the
output of the query
–…
Problem Statement
• Given M memory pages, and a relation R of size N pages,
where N > M, sort R on a sort key, to produce an output
relation R’ that is sorted on the sort key.
• Example: Sort the following table on zipcode
CREATE TABLE Tweets (
uniqueMsgID INTEGER, -- unique message id
tstamp TIMESTAMP, -- when was the tweet posted
uid INTEGER, -- unique id of the user
msg VARCHAR (140), -- the actual message
zip INTEGER, -- zipcode when posted
retweet BOOLEAN -- retweeted?
);

• Another example:
SELECT * FROM Tweets
WHERE tstamp = TODAY
ORDER BY zip

Note: the sort key can be composite.
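A composite sort key compares records field by field. The Python sketch below is only an illustration: the rows are made-up sample values for three columns of the Tweets table, and the choice of (zip, tstamp) as the composite key is an assumption.

```python
from operator import itemgetter

# Hypothetical sample rows: (uniqueMsgID, tstamp, zip). Values are made up.
rows = [
    (1, "2017-02-07 10:00", 53706),
    (2, "2017-02-07 09:00", 53703),
    (3, "2017-02-07 08:00", 53706),
]

# Composite sort key: zip first, then tstamp to break ties within a zip.
by_zip_then_time = sorted(rows, key=itemgetter(2, 1))
ids = [row[0] for row in by_zip_then_time]
```

Rows 3 and 1 share a zip, so the second key field decides their relative order.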
Goal of a good sort algorithm
• Sort efficiently!
• Sort well!
– Able to sort large relations with “small” amounts of
main memory
– Where does the memory come from?
• What does sort efficiently mean:
– Minimize the number of disk I/Os
– Try using sequential I/Os rather than random I/Os
– Minimize the CPU costs
– Overlap I/O operations with CPU operations
Quick note: Sorting is very important in MapReduce. The reducer
expects data to arrive in sorted order from the mappers.
2-Way Sort: Requires 3 Buffers
• Pass 0: Read a page, sort it, write it (a run).
– only one buffer page is used
– Algorithms for sorting in memory?
• Pass 1, 2, …, etc.: merge pairs of runs into runs twice as long.
– three buffer pages used.

[Figure: INPUT 1 and INPUT 2 buffers feed one OUTPUT buffer. Data flows from disk, through the three main memory buffers, back to disk.]



Two-Way External Merge Sort
• Read & write entire file in each pass
• N pages, # passes = ⌈log₂ N⌉ + 1
• So total cost is: 2N (⌈log₂ N⌉ + 1)
• Divide and conquer

Example (pages separated by |, runs separated by ||):
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0, 1-page runs:   3,4 || 2,6 || 4,9 || 7,8 || 5,6 || 1,3 || 2
PASS 1, 2-page runs:   2,3 | 4,6 || 4,7 | 8,9 || 1,3 | 5,6 || 2
PASS 2, 4-page runs:   2,3 | 4,4 | 6,7 | 8,9 || 1,2 | 3,5 | 6
PASS 3, 8-page runs:   1,2 | 2,3 | 3,4 | 4,5 | 6,6 | 7,8 | 9

How can we utilize more than three buffer pages?
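The passes in this example can be simulated in a few lines of Python. This is a minimal in-memory sketch, not a disk-based implementation: pages are plain lists, `merge_two` stands in for the two-input, one-output buffer merge, and both function names are made up for illustration. It assumes at least one page of input.

```python
def merge_two(a, b):
    """Merge two sorted lists (stands in for the 3-buffer merge step)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def two_way_external_sort(pages):
    """Return (sorted_records, number_of_passes) for a non-empty list of pages."""
    # Pass 0: sort each page on its own, producing 1-page runs.
    runs = [sorted(page) for page in pages]
    passes = 1
    # Pass 1, 2, ...: merge runs pairwise until a single run remains.
    while len(runs) > 1:
        runs = [
            merge_two(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
            for i in range(0, len(runs), 2)
        ]
        passes += 1
    return runs[0], passes


pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_external_sort(pages)
```

For the 7-page input file, the simulation takes 4 passes, which agrees with ⌈log₂ 7⌉ + 1.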
General External Merge Sort
• Sort a file with N pages using B buffer pages:
– Pass 0: use B buffer pages (run size = B pgs).
Produce ⌈N/B⌉ sorted runs of B pages each.
– Pass 1, 2, …: merge B-1 runs.

[Figure: B-1 way merge. Input buffers INPUT 1, INPUT 2, …, INPUT B-1 and one OUTPUT buffer sit in main memory between the source disk and the destination disk. Total buffer pages: B.]

Where are the main memory buffer pages allocated?
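A minimal in-memory sketch of the general algorithm, assuming pages are Python lists and using the standard library's `heapq.merge` for the (B-1)-way merge. The function name `external_merge_sort` is made up for illustration, and no real disk I/O or buffer management is modeled.

```python
import heapq


def external_merge_sort(pages, B):
    """Sort a non-empty list of pages using B buffer pages.

    Returns (sorted_records, number_of_passes).
    """
    if B < 3 or not pages:
        raise ValueError("need B >= 3 and at least one page")
    # Pass 0: read B pages at a time, sort them in memory, write one run.
    runs = []
    for i in range(0, len(pages), B):
        chunk = [rec for page in pages[i:i + B] for rec in page]
        runs.append(sorted(chunk))
    passes = 1
    # Later passes: (B-1)-way merge; one buffer page is reserved for output.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + B - 1]))
            for i in range(0, len(runs), B - 1)
        ]
        passes += 1
    return runs[0], passes


# 1000 one-record pages in reverse order, sorted with 11 buffer pages.
pages = [[k] for k in range(1000, 0, -1)]
records, passes = external_merge_sort(pages, B=11)
```

With N = 1000 and B = 11 the sketch needs 3 passes: 91 runs after pass 0, 10 runs after the first merge pass, and one run after the second.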
Cost of External Sort Merge
• # passes = 1 + ⌈log_(B-1) ⌈N/B⌉⌉
• I/O Cost = # passes * 2N
• Consider sorting a file with 1000 pages, using 11
buffer pages.
– At the end of the first pass, we have ⌈1000/11⌉ = 91 runs of
size 11 pages (the last run has only 10 pages)
– Next pass produces ⌈91/10⌉ = 10 runs
of size 110 pages each (the last run has only 10 pages)
– The next pass produces the fully sorted file
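The pass count in this example can be checked by repeatedly dividing the number of runs by the merge fan-in, B-1. The sketch below uses integer arithmetic to avoid floating-point logarithms; the helper names `ceil_div`, `num_passes` and `io_cost` are made up for illustration.

```python
def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def num_passes(N, B):
    """Pass 0 plus the number of (B-1)-way merge passes for N pages."""
    runs = ceil_div(N, B)          # sorted runs produced by pass 0
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, B - 1)
        passes += 1
    return passes


def io_cost(N, B):
    """Each pass reads and writes every page once: 2N I/Os per pass."""
    return 2 * N * num_passes(N, B)


example_passes = num_passes(1000, 11)
example_cost = io_cost(1000, 11)
```

For 1000 pages and 11 buffer pages this gives 3 passes and 2 * 1000 * 3 = 6000 page I/Os.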



Number of Passes of External Sort
N (# of pages)    B=3   B=17   B=257
100                 7      2       1
10,000             13      4       2
1,000,000          20      5       3
10,000,000         23      6       3
100,000,000        26      7       4
1,000,000,000      30      8       4

Last row: with a 32K page size this is a 32TB relation. At 1ms per read,
4 passes over 10^9 pages take about 1111 hours = 46 days!
Internal Sort Algorithm: Replacement Sort
Size of the buffer pool?
Example: M = 2 pages, 2 tuples per page.
Input Sequence: 10, 20, 30, 40, 25, 35, 9, 8, 7, 6, 5, …
1. In-memory: 10, 20, 30, 40
2. Read 25, Output 10. In-memory: 20, 25, 30, 40
3. Read 35, Output 20. In-memory: 25, 30, 35, 40
4. Read 9, Output 25. In-memory: 9, 30, 35, 40
5. Read 8, Output 30. In-memory: 8, 9, 35, 40
6. Read 7, Output 35. In-memory: 7, 8, 9, 40
7. Read 6, Output 40. In-memory: 6, 7, 8, 9
8. Read 5, Flush output, Start new run. In-memory: …
On Disk: 10, 20, 25, 30, 35, 40

Average length of a run in replacement sort is 2M


Internal Sort Algorithm
• Quicksort is a fast way to sort in memory.
• An alternative is replacement sort, which is also called tournament
sort or heapsort
– Top: Read in M pages of the relation R
– Output: move smallest record to output buffer
– Read in a new record r
– insert r into “sorted heap”
– if r not smallest, then GOTO Output
– else remove r from “heap”
– output “heap” in order; GOTO Top
• Worst-Case: What is min length of a run? How does this arise?
• Best-Case: What is max length of a run? How does this arise?
• Quicksort is faster, but longer runs often mean fewer passes!
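Run generation can be sketched in Python with the standard `heapq` module. Note that this is the common textbook variant of replacement selection, which is an assumption on my part and differs slightly from the GOTO steps above. Records too small for the current run are tagged for the next run and stay in the heap, rather than triggering a flush and restart. `capacity` is the number of records that fit in memory (M pages times tuples per page), and the function name is made up.

```python
import heapq
from itertools import islice


def replacement_selection(records, capacity):
    """Split `records` into sorted runs using a heap of `capacity` records."""
    it = iter(records)
    # Heap entries are (run_number, key): records tagged for a later run
    # sort after every record of the current run.
    heap = [(0, rec) for rec in islice(it, capacity)]
    heapq.heapify(heap)
    end = object()                       # sentinel: input exhausted
    runs = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no == len(runs):          # first record of a new run
            runs.append([])
        runs[run_no].append(key)
        nxt = next(it, end)
        if nxt is not end:
            # A record smaller than the last output must wait for the next run.
            heapq.heappush(heap, (run_no if nxt >= key else run_no + 1, nxt))
    return runs


# M = 2 pages, 2 tuples per page, so 4 records fit in memory.
runs = replacement_selection([10, 20, 30, 40, 25, 35, 9, 8, 7, 6, 5], capacity=4)
```

On the example input, the first run comes out as 10, 20, 25, 30, 35, 40: six records from a memory that holds only four.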
Blocked I/Os
• So far we have been reading/writing one page at a time, but we
know that reading a block of pages sequentially is faster.
• Make each buffer (input/output) be a block of pgs.
– Will reduce fan-out during merge passes! Side-effect?
– Reduces per page I/O cost.
– First Pass: Each run 2B pages, ⌈N/2B⌉ runs (where B is the size
of the buffer pool in #pages)
– Which internal sort algorithm are we using?
– Merge Tree Fanout: F = ⌊B/b⌋ - 1, b is block size
– # passes: ⌈log_F …⌉ + 1
– In practice, buffer pools are large, so most files are sorted in 2-3
passes
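The effect of the block size on fan-out and pass count can be explored with a small calculation. The argument of the logarithm is elided on the slide, so this sketch assumes it is the number of initial runs, ⌈N/2B⌉. The sample values of N, B and b are hypothetical, and the function name is made up.

```python
def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def blocked_sort_passes(N, B, b):
    """Return (fan_out, passes) for N pages, B buffer pages, blocks of b pages."""
    fan_out = B // b - 1                 # F = floor(B/b) - 1
    if fan_out < 2:
        raise ValueError("block size too large for a merge")
    # Assumption: the first pass yields ceil(N / 2B) runs of about 2B pages.
    runs = ceil_div(N, 2 * B)
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, fan_out)
        passes += 1
    return fan_out, passes


# Hypothetical sizes: 1,000,000 pages, 5,000 buffer pages.
small_blocks = blocked_sort_passes(1_000_000, 5_000, b=32)
large_blocks = blocked_sort_passes(1_000_000, 5_000, b=1_000)
```

With these numbers, 32-page blocks give a fan-out of 155 and 2 passes, while 1,000-page blocks cut the fan-out to 4 and need 5 passes. This is the trade-off behind the "Side-effect?" question.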
Double Buffering
• Overlap CPU and IO processing
• Prefetch into shadow block.
– Potentially, more passes; in practice, 2-3 passes.
• Reduces response time. What about throughput?

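[Figure: k-way merge with B main memory buffers. Each input buffer (INPUT 1, INPUT 2, …, INPUT k) and the OUTPUT buffer has a shadow copy (INPUT 1', INPUT 2', …, INPUT k', OUTPUT'). Each buffer is a block of b pages, between the source disk and the destination disk.]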
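The prefetching idea can be illustrated with a background reader thread and a bounded queue from the Python standard library. This is only a sketch of the overlap, under the assumption that iterating over `blocks` stands in for disk reads. The name `prefetching` is made up, and errors raised in the reader thread are not propagated.

```python
import queue
import threading


def prefetching(blocks, depth=1):
    """Yield blocks while a background thread reads ahead.

    depth=1 models a single shadow block per input.
    """
    q = queue.Queue(maxsize=depth)
    done = object()                      # sentinel marking end of input

    def reader():
        for block in blocks:             # stands in for disk reads
            q.put(block)                 # blocks while the shadow buffer is full
        q.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        block = q.get()
        if block is done:
            return
        yield block                      # consumer works while reader refills


# The consumer sums each block while the next one is being fetched.
total = sum(sum(block) for block in prefetching(iter([[1, 2], [3, 4], [5]])))
```

The bounded queue is the design point: the reader may run at most `depth` blocks ahead, which mirrors the fixed number of shadow buffers.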


Using B+ Trees for Sorting
• Scenario: Table to be sorted has B+ tree index on
sorting column(s).
• Idea: Can retrieve records in order by traversing leaf
pages.
• Is this a good idea?
• Cases to consider:
– B+ tree is clustered: Good idea!
– B+ tree is not clustered: Could be a very bad idea!



Clustered B+ Tree Used for Sorting
• Go to the left-most leaf, then retrieve all leaf pages
• If data entry has records, then we are done!
• If the data entries have rids, each data page is
fetched just once (since this is a clustered index)

Faster than external sorting! Why not scan the data file directly?

[Figure: Index (directs search) on top, Data Entries ("sequence set") below it, Data Records at the bottom.]



Unclustered B+ Tree Used for Sorting
• Unclustered B+-trees only have rids in the data entries
• So, in general, one I/O per data record!

When can this be useful?

[Figure: Index (directs search) on top, Data Entries ("sequence set") below it, Data Records at the bottom.]
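A rough comparison shows why one I/O per record can be very bad. The sketch below counts only data-page I/Os and ignores the cost of reading the index leaf pages. The file size and records-per-page values are hypothetical, and the 4 passes assume B = 17 buffer pages for a 10,000-page file.

```python
def external_sort_ios(n_pages, passes):
    """Each pass reads and writes every page once."""
    return 2 * n_pages * passes


def unclustered_index_ios(n_pages, recs_per_page):
    """In general, one data-page I/O per record when following rids."""
    return n_pages * recs_per_page


# Hypothetical file: 10,000 pages with 100 records per page.
n_pages, recs_per_page = 10_000, 100
sort_cost = external_sort_ios(n_pages, passes=4)
index_cost = unclustered_index_ios(n_pages, recs_per_page)
```

Under these assumptions the unclustered index costs 1,000,000 I/Os against 80,000 for the external sort, and most of the index I/Os are random rather than sequential.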



Sorting Records!
• Sorting is a competitive sport!
• See http://sortbenchmark.org/
– Task is to sort 100 byte records.
– Different flavors of metrics that people compete on.
– Sort a trillion records as fast as you can,
• using general purpose sorting code (Daytona) or
• code specialized just for the benchmark (Indy)

