Data Structures and Algorithms For External Storage: External Sorting. Index Files


DSA - lecture 12 - T.U.Cluj-Napoca - M.

Joldos 1
Data Structures and Algorithms for External Storage
External sorting. Index files.
External Storage
Secondary memory
Typically organized in blocks
Basic operations involve buffers
Cost measure
Disk: seek time, latency time
Block accesses
Data typically stored in files
Files
Sequential access
Direct access
Files
All algorithms so far assumed that all
elements of a (large) array can be accessed
randomly.
If the array is too large to fit in main memory,
it has to be kept on a secondary storage
device.
Typically, the data is then organized as sequential
files, which guarantee (on average) constant
access time only for strictly sequential read
and write operations.
Storing Information in Files
Typical operations on files:
insert a particular record into a particular file.
delete from a particular file all records having a
designated key value in each of a designated set
of fields.
modify all records in a particular file by setting to
designated values certain fields in those records
that have a designated value in each of another
set of fields.
retrieve all records having designated values in
each of a designated set of fields.
External Sorting
External sorting: sorting data stored on
secondary memory (typically as files)
Cost measures:
Number of block accesses
(The number of steps required to sort n records)
(The number of comparisons between keys
needed to sort n records (if the comparison is
expensive))
(The number of times the records must be
moved)
Note that the items in parentheses refer to main
memory
Merge Sort
Idea: organize file into progressively larger runs
run: sequence of records $r_1, \dots, r_k$, where $key(r_1) \le key(r_2) \le \dots \le key(r_k)$
length of run
tail
Example



Merge Sort for Files
Begin with two files, say $f_1$ and $f_2$, organized into runs of length k
Assume that:
The numbers of runs, including tails, on $f_1$ and $f_2$ differ by at most one,
At most one of $f_1$ and $f_2$ has a tail, and
The one with a tail has at least as many runs as the other.
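A minimal Python sketch of the file merge sort may help here. It assumes the two "files" are plain Python lists held in memory (a real implementation would stream blocks from disk), and the names `merge_runs`, `merge_pass` and `merge_sort_files` are hypothetical, chosen only for this illustration. Each pass merges runs of length k from the two inputs into runs of length 2k, written alternately to two output lists, so the three assumptions on runs and tails keep holding.

```python
def merge_runs(a, b):
    """Merge two sorted runs into one sorted run."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_pass(f1, f2, k):
    """Merge runs of length k from f1 and f2 into runs of length 2k on g1, g2."""
    g = ([], [])
    pos, turn = 0, 0
    while pos < len(f1) or pos < len(f2):
        # The slices are the next run of each file (possibly a tail or empty).
        g[turn].extend(merge_runs(f1[pos:pos + k], f2[pos:pos + k]))
        pos += k
        turn = 1 - turn  # output files receive runs alternately
    return g


def merge_sort_files(records):
    """Sort by repeated merge passes, starting from runs of length 1."""
    f1, f2 = records[0::2], records[1::2]
    k = 1
    while k < len(records):
        f1, f2 = merge_pass(f1, f2, k)
        k *= 2
    return f1
```

For example, `merge_sort_files([5, 2, 9, 1, 7, 3, 8])` needs three passes (k = 1, 2, 4) and leaves the whole sorted sequence on the first output list.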
Mergesort Example
Speed up Mergesort
Begin with a pass that:
reads k records into memory,
sorts them there (e.g., with quicksort),
writes them back as a run of length k,
then merge
Use more channels to secondary memory
to make efficient use of processor speed
If runs are much larger than the block size,
carefully select which run to replenish next,
based on the last keys compared
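The initial pass can be sketched as follows. This is only an illustration under simplifying assumptions: the input is a Python list rather than a file, the function name `make_initial_runs` is hypothetical, and Python's built-in `sorted` (Timsort) stands in for the quicksort mentioned on the slide.

```python
def make_initial_runs(records, k):
    """Cut the input into chunks of k records and sort each chunk in memory.

    Each returned list is one sorted run; the last one may be a shorter tail.
    """
    return [sorted(records[i:i + k]) for i in range(0, len(records), k)]
```

Starting the merge phase from runs of length k instead of length 1 saves about log2(k) merge passes.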
Speed up Mergesort Example
Multiway Merge
If reading and writing between main and secondary memory is the
bottleneck, perhaps we could save time if we had more than one data
channel. Suppose that
We have 2m disk units, each with its own channel. We could place
m files, $f_1, f_2, \dots, f_m$, on m of the disk units, say organized as runs of
length k.
We can read m runs, one from each file, and merge them into one
run of length mk. This run is placed on one of m output files
$g_1, g_2, \dots, g_m$, each getting a run in turn.
The merging process in main memory can be carried out in $O(\log m)$
steps per record if we organize the candidate records into a heap
If we have n records, and the length of runs is multiplied by m with
each pass, then after i passes runs will be of length $m^i$ (starting from runs of length 1).
If $m^i \ge n$, that is, after $i = \log_m n$ passes, the entire list will be
sorted. As $\log_m n = \log_2 n / \log_2 m$, we save by a factor of $\log_2 m$ in
the number of times we read each record, compared with two-way merging
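The heap-based merge step can be sketched in Python with the standard `heapq` module. The function name `multiway_merge` is hypothetical and the runs are in-memory lists, so this shows only the main-memory part of the technique, not the disk channels.

```python
import heapq


def multiway_merge(runs):
    """Merge m sorted runs into one sorted run.

    The heap holds one candidate per run as (key, run index, position),
    so each output record costs O(log m) heap work.
    """
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        key, r, i = heapq.heappop(heap)
        out.append(key)
        if i + 1 < len(runs[r]):
            # Replenish the heap from the run that just lost its candidate.
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out
```

The standard library also offers `heapq.merge(*iterables)`, which performs the same kind of merge lazily.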
Polyphase Sort
We can perform an m-way merge sort with only
m+1 files, as an alternative to the 2m-file strategy:
In one pass, when runs from each of m files are merged
into runs of the (m+1)st file, we need not use all the runs on
each of the m input files. Rather, each file, when it becomes
the output file, is filled with runs of a certain length. It uses
some of these runs to help fill each of the other m files
when it is their turn to be the output file.
Each pass produces runs of a different length. Since each of
the files loaded with runs on the previous m passes
contributes to the runs of the current pass, the run length on
one pass is the sum of the lengths of the runs produced on
the previous m passes. (If fewer than m passes have taken
place, regard passes prior to the first as having produced
runs of length 1.)
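The rule for run lengths can be checked with a few lines of Python. The function name `polyphase_run_lengths` is hypothetical; the sketch only computes the lengths described above and does not move any records.

```python
def polyphase_run_lengths(m, passes):
    """Run length produced on each pass of an m-way polyphase sort.

    The length on a pass is the sum of the lengths produced on the previous
    m passes, where passes before the first count as producing length 1.
    """
    history = [1] * m          # the m imaginary passes before the first one
    lengths = []
    for _ in range(passes):
        current = sum(history[-m:])
        lengths.append(current)
        history.append(current)
    return lengths
```

For m = 2 the lengths 2, 3, 5, 8, 13 are Fibonacci numbers, so run lengths grow by a factor of about 1.6 per pass rather than 2.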
Polyphase Sort Example
Alternative File Organizations
Many alternatives exist, each ideal for some
situations and not so good in others:
Heap files: Suitable when typical access is a file
scan retrieving all records.
Sorted files: Best if records must be retrieved in
some order, or only a range of records is needed.
Hashed files: Good for equality selections.
File is a collection of buckets. Bucket = primary
page plus zero or more overflow pages.
Hashing function h: h(r) = bucket in which
record r belongs. h looks at only some of the
fields of r, called the search fields.
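A toy hashed file might look like the sketch below. Everything in it is an assumption made for illustration: the class name `HashedFile`, the tiny `PAGE_CAPACITY`, pages modelled as Python lists, and Python's built-in `hash` applied to the search field as the function h.

```python
PAGE_CAPACITY = 2  # records per page; deliberately tiny (hypothetical value)


class HashedFile:
    def __init__(self, num_buckets):
        # Each bucket is a list of pages; page 0 is the primary page,
        # any further pages are overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _h(self, key):
        """Map the search field of a record to its bucket number."""
        return hash(key) % len(self.buckets)

    def insert(self, key, record):
        pages = self.buckets[self._h(key)]
        if len(pages[-1]) == PAGE_CAPACITY:
            pages.append([])  # chain a new overflow page
        pages[-1].append((key, record))

    def lookup(self, key):
        """Equality selection: scan only the pages of one bucket."""
        pages = self.buckets[self._h(key)]
        return [rec for page in pages for k, rec in page if k == key]
```

An equality selection touches only the pages of one bucket, which is why hashed files suit this kind of query; a range selection would still have to scan every bucket.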
Indexes
An index on a file speeds up selections on
the search key field(s)
Search key = any subset of the fields of a
record
Search key is not the same as key (minimal set of
fields that uniquely identify a record).
Entries in an index: (k, r), where:
k = the key
r = the record OR record id OR record ids

Index Classification
Clustered/unclustered
Clustered = records sorted in the key order
Unclustered = records not sorted in the key order
Dense/sparse
Dense = each record has an entry in the index
Sparse = only some records have an entry in the index
Primary/secondary
Primary = on the primary key
Secondary = on any key
Some books interpret these differently
B+ tree / Hash table / ...
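The dense/sparse distinction can be illustrated on a small sorted (clustered) file. The data, the page layout and the names `pages`, `dense_index`, `sparse_index` and `sparse_lookup` are all made up for this sketch; the sparse index keeps one entry per page (its first key) and relies on the file being sorted.

```python
from bisect import bisect_right

# A sorted file stored as pages of (key, record) pairs (hypothetical data).
pages = [
    [(3, "a"), (8, "b")],
    [(12, "c"), (17, "d")],
    [(21, "e"), (30, "f")],
]

# Dense index: one entry per record, key -> (page number, slot).
dense_index = {k: (p, s) for p, page in enumerate(pages)
               for s, (k, _) in enumerate(page)}

# Sparse index: one entry per page, holding the first key on that page.
sparse_index = [page[0][0] for page in pages]


def sparse_lookup(key):
    """Find the page that could hold key, then scan that single page."""
    p = bisect_right(sparse_index, key) - 1
    if p < 0:
        return None  # key is smaller than every key in the file
    for k, rec in pages[p]:
        if k == key:
            return rec
    return None
```

The sparse index has 3 entries against 6 for the dense one, at the price of a short scan inside the page.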
Clustered vs. Unclustered Index
Multiway Search Trees
Multiway Search Trees (MWSTs) are a generalization of
BSTs
MWST of order n:
Each node has n or fewer sub-trees: $S_1, S_2, \dots, S_m$, $m \le n$
Each node has n − 1 or fewer keys
$k_1, k_2, \dots, k_{m-1}$: m − 1 keys in ascending order
$k(S_i) \le k_i \le k(S_{i+1})$, $k(S_{m-1}) < k(S_m)$, where $k(S_i)$ stands for any key in sub-tree $S_i$
Suitable for disks:
Nodes correspond to disk pages
Pros:
tree height is low for large n
fewer disk accesses
Cons:
low space utilization if non-full
MWSTs are non-balanced in general!
MWST Example
Example: 4000 keys, n = 5
At least 4000/(5 − 1) = 1000 nodes (pages)
1st level (root): 1 node, 4 keys, 5 sub-trees
+ 2nd level: 5 nodes, 20 keys, 25 sub-trees
+ 3rd level: 25 nodes, 100 keys, 125 sub-trees
+ 4th level: 125 nodes, 500 keys, 625 sub-trees
+ 5th level: 625 nodes, 2500 keys, 3125 sub-trees
+ 6th level: 3125 nodes, 12500 keys
Five levels hold at most 4 + 20 + 100 + 500 + 2500 = 3124 keys, so
tree height = 6 (including root)
If n = 11: at least 4000/(11 − 1) = 400 nodes
Three levels hold at most 10 + 110 + 1210 = 1330 keys, so
tree height = 4 (including root)
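The level-by-level count can be reproduced with a short Python function. The name `min_levels` is hypothetical; the sketch assumes every node is completely full (n − 1 keys and n sub-trees) and counts the root as level 1.

```python
def min_levels(num_keys, n):
    """Smallest number of levels of an order-n MWST that can hold num_keys.

    Level 1 is the root; each full node holds n - 1 keys and has n sub-trees.
    """
    levels, capacity, nodes = 0, 0, 1
    while capacity < num_keys:
        capacity += nodes * (n - 1)  # keys added by this level
        nodes *= n                   # nodes on the next level
        levels += 1
    return levels
```

With these assumptions `min_levels(4000, 5)` gives 6, and `min_levels(4000, 11)` gives 4, because three full levels of an order-11 tree hold only 11^3 − 1 = 1330 keys.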
Operations and Issues on MWSTs
Operations
Search: returns pointer to node containing the key and
position of key in the node
Insert: new key, if not already in the tree
Delete: existing key
Important Issues
Keep MWST balanced after insertions or deletions
Balanced MWSTs: B-trees, B+-trees
Reduce number of disk accesses
Data storage: two alternatives
1. inside the nodes: fewer keys and sub-trees fit in each node
2. pointers from the nodes to data pages
B Trees
So far search trees were limited to main memory
structures
Assumption: the dataset organized in a search tree fits in
main memory (including the tree overhead)
Counter-example: transaction data of a bank > 1 GB per day
use secondary storage media (punch cards, hard disks,
magnetic tapes, etc.)
Consequence: make a search tree structure
secondary-storage-enabled
B-trees: proposed by R. Bayer and E. M.
McCreight in 1972.
B-tree Definitions
Node x has fields
n[x]: the number of keys stored in the node
$key_1[x] \le \dots \le key_{n[x]}[x]$: the keys in ascending order
leaf[x]: true if leaf node, false if internal node
if internal node, then $c_1[x], \dots, c_{n[x]+1}[x]$: pointers to children
Keys separate the ranges of keys in the subtrees. If $k_i$ is an
arbitrary key in the subtree $c_i[x]$, then $k_i \le key_i[x] \le k_{i+1}$
Every leaf has the same depth
In a B-tree of degree t, all nodes except the root node
have between t and 2t children (i.e., between t − 1 and 2t − 1
keys).
The root node has between 0 and 2t children (i.e., between
0 and 2t − 1 keys)
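The definitions translate into a small checker. This is a sketch under assumptions: nodes are in-memory Python objects rather than disk pages, the names `BTreeNode` and `leaf_depth_if_valid` are hypothetical, and the key-range test compares each child's own keys with the separating keys of its parent (it does not re-check whole subtrees against distant ancestors).

```python
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self):
        return not self.children


def leaf_depth_if_valid(node, t, is_root=True):
    """Return the common leaf depth below node, or None if a rule is broken."""
    n = len(node.keys)
    if n > 2 * t - 1 or (not is_root and n < t - 1):
        return None                      # key-count bounds
    if node.keys != sorted(node.keys):
        return None                      # keys must be ascending
    if node.leaf:
        return 0
    if len(node.children) != n + 1:
        return None                      # n keys separate n + 1 children
    depths = set()
    for i, child in enumerate(node.children):
        depth = leaf_depth_if_valid(child, t, is_root=False)
        if depth is None:
            return None
        low = node.keys[i - 1] if i > 0 else None
        high = node.keys[i] if i < n else None
        for k in child.keys:
            if (low is not None and k < low) or (high is not None and k > high):
                return None              # child key outside its range
        depths.add(depth)
    # Every leaf must have the same depth.
    return depths.pop() + 1 if len(depths) == 1 else None
```

A root with keys G, M and three leaves of two keys each passes the check for t = 3; shrinking one leaf to a single key violates the lower bound of t − 1 keys.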
B Tree Examples
n=3
n=5
n=7
n is the number of keys
stored in a node
Binary-trees vs. B-trees
Size of B-tree nodes: determined by the page size.
One page = one node.
A B-tree of height 2 may contain > 1 billion keys!
Heights of Binary-tree and B-tree are logarithmic
Binary-tree: logarithm of base 2
B-tree: logarithm of base, e.g., 1000

[Figure: B-tree with 1000 keys per node and 1001 children per internal node]
root: 1 node, 1000 keys
level 1: 1001 nodes, 1,001,000 keys
level 2: 1,002,001 nodes, 1,002,001,000 keys
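Height of a B-tree
For a B-tree T of height h, containing $n \ge 1$ keys and with
minimum degree $t \ge 2$, the following restriction on
the height holds:
$$h \le \log_t \frac{n+1}{2}$$
This follows from counting the keys of the sparsest possible tree:
$$n \ge 1 + (t-1) \sum_{i=1}^{h} 2t^{i-1} = 2t^h - 1$$
[Figure: sparsest B-tree; the root holds 1 key, every other node holds t − 1 keys]
depth 0: 1 node
depth 1: 2 nodes
depth 2: 2t nodes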
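The height bound can be evaluated without floating-point logarithms. In the sketch below the function name `max_height` is hypothetical; it returns the largest h for which the sparsest tree of that height, with 2t^h − 1 keys, still fits into n keys.

```python
def max_height(n, t):
    """Largest height h of a B-tree with n >= 1 keys and minimum degree t >= 2.

    Uses the bound n >= 2 * t**h - 1 with exact integer arithmetic.
    """
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h
```

For instance, with minimum degree t = 1001 even a billion keys give a height of at most 2, so a search reads at most three nodes.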
B-tree Operations
An implementation needs to support the
following B-tree operations

Searching (simple)
Creating an empty tree (trivial)
Insertion (complex)
Deletion (complex)
Creating an Empty Tree. Searching
Creating:
Empty B-tree = create a root & write it to disk!
Searching:
Straightforward generalization of a binary tree search

BTreeCreate(T)
  x ← AllocateNode()
  leaf[x] ← TRUE
  n[x] ← 0
  DiskWrite(x)
  root[T] ← x

BTreeSearch(x, k)
  i ← 1
  while i ≤ n[x] and k > key_i[x]
    i ← i + 1
  if i ≤ n[x] and k = key_i[x] then
    return (x, i)
  if leaf[x] then
    return NIL
  else DiskRead(c_i[x])
    return BTreeSearch(c_i[x], k)
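The search procedure can be sketched in Python as follows. The `Node` class and the function name `btree_search` are assumptions made for this illustration: nodes live in memory, so the `DiskRead` step becomes a plain attribute access, and indices are 0-based rather than 1-based as in the pseudocode.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def btree_search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:          # leaf: k is not in the tree
        return None
    # In an external-memory version the child page would be read here.
    return btree_search(x.children[i], k)
```

Each recursive call descends one level, so the number of nodes visited is at most the height of the tree plus one.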
Splitting Nodes (1)
[Figure: splitting the full node y = c_i[x] with keys P Q R S T V W and sub-trees T_1 ... T_8 (t = 4). Before: x = ... N W ... After: x = ... N S W ..., y = c_i[x] = P Q R, z = c_{i+1}[x] = T V W]
Nodes fill up and reach their maximum capacity of 2t − 1 keys
Before we can insert a new key, we have to make room,
i.e., split nodes
Result: the median key of y moves up into the parent x, leaving 2 nodes with
t − 1 keys each
x: parent node
y: node to be split and child of x
i: index in x
z: new node
Splitting Nodes (2)
BTreeSplitChild(x, i, y)
  z ← AllocateNode()
  leaf[z] ← leaf[y]
  n[z] ← t − 1
  for j ← 1 to t − 1
    key_j[z] ← key_{j+t}[y]
  if not leaf[y] then
    for j ← 1 to t
      c_j[z] ← c_{j+t}[y]
  n[y] ← t − 1
  for j ← n[x] + 1 downto i + 1
    c_{j+1}[x] ← c_j[x]
  c_{i+1}[x] ← z
  for j ← n[x] downto i
    key_{j+1}[x] ← key_j[x]
  key_i[x] ← key_t[y]
  n[x] ← n[x] + 1
  DiskWrite(y)
  DiskWrite(z)
  DiskWrite(x)

Running Time:
A local operation that does not traverse the tree
O(t) CPU time, since each loop runs O(t) times
3 I/Os
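A Python sketch of the split is shown below. It assumes in-memory nodes (the three `DiskWrite` calls are omitted), 0-based indices, and hypothetical names `Node` and `split_child`; list slicing replaces the copy loops of the pseudocode. The keys in the usage example are chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf


def split_child(x, i, t):
    """Split the full child y = x.children[i], which holds 2t - 1 keys.

    The median key of y moves up into x; the upper t - 1 keys (and the
    upper t children, if any) move into a new node z placed right after y.
    """
    y = x.children[i]
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)
```

With t = 4, splitting a child that holds P Q R S T V W moves S into the parent and leaves P Q R and T V W as two sibling nodes.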
Inserting Keys
Done recursively, by starting from the root
and recursively traversing down the tree to
the leaf level
Before descending to a lower level in the
tree, make sure that the node contains fewer
than 2t − 1 keys:
so that if we split a node in a lower level we will
have space to include a new key
Inserting Keys (2)
Special case: root is full (BTreeInsert)

BTreeInsert(T, k)
  r ← root[T]
  if n[r] = 2t − 1 then
    s ← AllocateNode()
    root[T] ← s
    leaf[s] ← FALSE
    n[s] ← 0
    c_1[s] ← r
    BTreeSplitChild(s, 1, r)
    BTreeInsertNonFull(s, k)
  else BTreeInsertNonFull(r, k)
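Splitting the Root
Splitting the root requires the creation of a
new root
The tree grows at the top instead of the
bottom
[Figure: t = 4. Before: root[T] = r with keys A D F H L N P and sub-trees T_1 ... T_8. After: root[T] = s with the single key H; its children are r with keys A D F and a new node with keys L N P]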
Inserting Keys
BTreeInsertNonFull tries to insert a key k into a
node x, which is assumed to be non-full
when the procedure is called
BTreeInsert and the recursion in
BTreeInsertNonFull guarantee that this
assumption is true!
Inserting Keys
BTreeInsertNonFull(x, k)
  i ← n[x]
  if leaf[x] then                       // leaf insertion
    while i ≥ 1 and k < key_i[x]
      key_{i+1}[x] ← key_i[x]
      i ← i − 1
    key_{i+1}[x] ← k
    n[x] ← n[x] + 1
    DiskWrite(x)
  else                                  // internal node: traversing tree
    while i ≥ 1 and k < key_i[x]
      i ← i − 1
    i ← i + 1
    DiskRead(c_i[x])
    if n[c_i[x]] = 2t − 1 then
      BTreeSplitChild(x, i, c_i[x])
      if k > key_i[x] then
        i ← i + 1
    BTreeInsertNonFull(c_i[x], k)
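The whole insertion path can be sketched in Python. As before, this assumes in-memory nodes with 0-based indices and no disk I/O; `Node`, `split_child`, `insert_nonfull` and `insert` are hypothetical names, and `insert` returns the (possibly new) root instead of updating a tree object.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf


def split_child(x, i, t):
    """Split the full child x.children[i]; its median key moves up into x."""
    y = x.children[i]
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


def insert_nonfull(x, k, t):
    """Insert k into the subtree rooted at x, which must not be full."""
    i = len(x.keys) - 1
    if not x.children:                      # leaf insertion
        x.keys.append(None)                 # make room for one more key
        while i >= 0 and k < x.keys[i]:
            x.keys[i + 1] = x.keys[i]       # shift larger keys to the right
            i -= 1
        x.keys[i + 1] = k
    else:                                   # internal node: traverse the tree
        while i >= 0 and k < x.keys[i]:
            i -= 1
        i += 1
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)            # never descend into a full node
            if k > x.keys[i]:
                i += 1
        insert_nonfull(x.children[i], k, t)


def insert(root, k, t):
    """Insert k and return the root, which changes when the old root splits."""
    if len(root.keys) == 2 * t - 1:
        new_root = Node(children=[root])
        split_child(new_root, 0, t)
        root = new_root
    insert_nonfull(root, k, t)
    return root
```

Because full nodes are split on the way down, the insertion never has to back up: one downward pass suffices.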
Insertion: Example
Initial tree (t = 3): root [G M P X]; leaves [A C D E] [J K] [N O] [R S T U V] [Y Z]
B inserted: root [G M P X]; leaves [A B C D E] [J K] [N O] [R S T U V] [Y Z]
Q inserted: root [G M P T X]; leaves [A B C D E] [J K] [N O] [Q R S] [U V] [Y Z]
Insertion: Example (2)
L inserted: root [P]; children [G M] [T X]; leaves [A B C D E] [J K L] [N O] [Q R S] [U V] [Y Z]
F inserted: root [P]; children [C G M] [T X]; leaves [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]
Insertion: Running Time
Disk I/O: O(h), since only O(1) disk accesses
are performed during recursive calls of
BTreeInsertNonFull
CPU: $O(th) = O(t \log_t n)$
At any given time there are O(1) number of
disk pages in main memory
Deleting Keys
Done recursively, by starting from the root and
recursively traversing down the tree to the leaf level
Before descending to a lower level in the tree,
make sure that the node contains at least t keys (cf.
insertion: fewer than 2t − 1 keys)
BtreeDelete distinguishes three different
stages/scenarios for deletion
Case 1: key k found in leaf node
Case 2: key k found in internal node
Case 3: key k suspected in lower level node









Deleting Keys (2)
Case 1: If the key k is in node x, and x is a leaf,
delete k from x
Initial tree: root [P]; children [C G M] [T X]; leaves [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]
F deleted (case 1): root [P]; children [C G M] [T X]; leaves [A B] [D E] [J K L] [N O] [Q R S] [U V] [Y Z]
Deleting Keys (3)
Case 2: If the key k is in node x, and x is not a leaf,
delete k from x
a) If the child y that precedes k in node x has at least t keys,
then find the predecessor k' of k in the sub-tree rooted at y.
Recursively delete k', and replace k with k' in x.
b) Symmetrically for the child z that follows k in node x, using the successor of k
M deleted (case 2a): root [P]; children [C G L] [T X]; leaves [A B] [D E] [J K] [N O] [Q R S] [U V] [Y Z]
Deleting Keys (4)
c) If both y and z have only t − 1 keys, merge k and
the contents of z into y, so that x loses both k and
the pointer to z, and y now contains 2t − 1 keys.
Free z and recursively delete k from y.
G deleted (case 2c): root [P]; children [C L] [T X]; leaves [A B] [D E J K] [N O] [Q R S] [U V] [Y Z]
In the figure: x loses k; y becomes y + k + z, and k is then deleted from y
Deleting Keys - Distribution
Descending down the tree: if k is not found in the
current node x, find the sub-tree $c_i[x]$ that has to
contain k.
If $c_i[x]$ has only t − 1 keys, take action to ensure
that we descend to a node of size at least t.
Case 1 (two cases exist): if $c_i[x]$ has only t − 1 keys,
but a sibling with at least t keys, give $c_i[x]$ an extra
key by:
moving a key from x down to $c_i[x]$,
moving a key from $c_i[x]$'s immediate left or right
sibling up into x, and
moving the appropriate child from the sibling into
$c_i[x]$ (distribution)
Deleting Keys Distribution(2)
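[Figure, t = 3: delete B; c_i[x] = [A B] has only t − 1 keys, its sibling [E J K] has t keys]
Before: x = root [C L P T X]; leaves [A B] [E J K] [N O] [Q R S] [U V] [Y Z]
C moves down from x into c_i[x]; E moves up from the sibling into x
B deleted: root [E L P T X]; leaves [A C] [J K] [N O] [Q R S] [U V] [Y Z]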
Deleting Keys - Merging
If $c_i[x]$ and both of $c_i[x]$'s siblings have t − 1
keys, merge $c_i[x]$ with one sibling:
moving a key from x down into the new merged
node to become the median key for that node
[Figure: x = ... l k m ... with children A and B on either side of k becomes x = ... l m ... with a single merged child that holds the keys of A, then k, then the keys of B]
Deleting Keys Merging (2)
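[Figure, t = 3: delete D]
Before: root [P]; children [C L] (= c_i[x]) and [T X] (sibling); leaves [A B] [D E J K] [N O] [Q R S] [U V] [Y Z]
D deleted: root [C L P T X]; leaves [A B] [E J K] [N O] [Q R S] [U V] [Y Z]
tree shrinks in height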
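Distribution and merging can be combined into one helper that is called before descending into a child. The sketch below is an illustration under assumptions: nodes are in-memory objects with 0-based indices, `Node` and `fill_child` are hypothetical names, and the helper only repairs the child; the actual key removal and the replacement of an emptied root are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def fill_child(x, i, t):
    """Make sure x.children[i] has at least t keys before descending.

    Returns the index of the child to descend into (it changes only when
    the child is merged into its left sibling).
    """
    child = x.children[i]
    if len(child.keys) >= t:
        return i
    left = x.children[i - 1] if i > 0 else None
    right = x.children[i + 1] if i + 1 < len(x.children) else None
    if left is not None and len(left.keys) >= t:
        # Distribution from the left: separator moves down, left's last key up.
        child.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
        return i
    if right is not None and len(right.keys) >= t:
        # Distribution from the right: separator moves down, right's first key up.
        child.keys.append(x.keys[i])
        x.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
        return i
    if right is not None:
        # Merge child + separator + right sibling into child.
        child.keys += [x.keys.pop(i)] + right.keys
        child.children += right.children
        del x.children[i + 1]
        return i
    # Last child: merge left sibling + separator + child into the left sibling.
    left.keys += [x.keys.pop(i - 1)] + child.keys
    left.children += child.children
    del x.children[i]
    return i - 1
```

If a merge takes the last key out of the root, the merged child becomes the new root, which is how the tree shrinks in height.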
Deletion: Running Time
Most of the keys are in the leaves, thus deletion most
often occurs there!
In this case deletion happens in one downward
pass to the leaf level of the tree
Case 2: Deletion from an internal node might
require backing up
Running time:
Disk I/O: O(h), since only O(1) disk operations are
performed during each recursive call
CPU: $O(th) = O(t \log_t n)$
Two-pass Operations
Simpler, practical versions of algorithms use
two passes (down and up the tree):
Down: find the node where deletion or insertion
should occur
Up: if needed, split, merge, or distribute;
propagate splits, merges, or distributions up the tree
To avoid reading the same nodes twice, use
a buffer of nodes

B-Tree / B+Tree animations
B-Tree
http://slady.net/java/bt/view.php
http://www.youtube.com/watch?v=coRJrcIYbF4
http://ats.oka.nu/b-tree.en.html
http://www.cs.auckland.ac.nz/software/AlgAnim/n_ary_trees.html
B+tree
http://www.seanster.com/BplusTree/BplusTree.html

Reading
AHU, chapter 11
CLR, chapter 19; CLRS, chapter 18
