External Searching: B-Trees: Dr. Jicheng Fu

The document discusses B-trees, which are multiway search trees used to store data on hard disks. B-trees are designed to minimize disk accesses by keeping the tree short and balanced. The key points are: 1) B-trees allow multiple children per node, with each node filling an entire disk block for efficient access. 2) Searching and insertion in B-trees are similar to those in binary search trees, but consider multiple children at each node. 3) Insertion may cause nodes to split, with the median key propagating upward to keep the tree balanced and of minimum height.

External Searching: B-Trees

Dr. Jicheng Fu

Department of Computer Science


University of Central Oklahoma
Objectives (Section 11.3)

- The difference between internal and external searches
- Multiway search trees
- The definition of B-trees
- The search algorithm of B-trees
- Insertion algorithms of B-trees
- Deletion algorithms of B-trees
Motivation

- Internal search
  - The data structure is kept in RAM
  - E.g., binary search trees and AVL trees
- External search
  - Locate and retrieve records stored on hard disks
  - B-trees are designed for external searches
  - B-trees reside on the hard disk
- Differences between RAM and hard disks
  - RAM
    - Access time: nanoseconds
    - Access unit: words (typically 4 or 8 bytes)
  - Hard disks
    - Access time: milliseconds
    - Access unit: pages or blocks (usually 1 KB or more)
- Goal in external searching
  - Minimize the number of disk accesses, because disk access time is much longer than internal computation time
  - A disk block may have room for several records, so a multiway decision can be made within one block
  - Multiway trees are therefore more appropriate: they reduce the tree height, and therefore the number of disk accesses
Multiway Search Trees

- An m-way search tree is a tree in which, for some integer m called the order of the tree, each node has at most m children
- If k ≤ m is the number of children, then the node contains exactly k - 1 keys, which partition all the keys into k subsets consisting of
  - all the keys less than the first key in the node,
  - all the keys between a pair of keys in the node, and
  - all the keys greater than the largest key in the node
- A B-tree is an m-way tree
- (Figure: A 5-way search tree)
- More about m
  - m is decided by
    - Disk block size (s)
    - Record size (r)
    - Pointer size (p)
  - A node of order m holds m pointers and m - 1 records, and it must fit in one block:
    $$m p + (m - 1) r \le s \quad\Longrightarrow\quad m \le \left\lfloor \frac{s + r}{p + r} \right\rfloor$$
Balanced Multiway Tree: The B-Tree

- Motivation (minimizing disk accesses)
  - Reduce the height of the tree as much as possible
- Solution
  - No empty subtrees appear above the leaves
  - The division of keys into subsets is efficient
  - All leaves stay on the same level, which guarantees the worst-case performance
  - Every internal (non-leaf) node has at least some minimal number of children
- Definition: a B-tree of order m is an m-way search tree in which
  - All leaves are on the same level
  - All internal nodes except the root have at most m nonempty children and at least ⌈m/2⌉ nonempty children (no overflow or underflow)
  - The number of keys in each internal node is one less than the number of its nonempty children, and these keys partition the keys in the children in the fashion of a search tree
  - The root has at most m children, but may have as few as two if it is not a leaf, or none if the tree consists of the root alone
- (Figure: A B-tree of order 5)
C++ Implementation
- For simplicity, the algorithms assume that the B-tree is constructed in RAM
  - Pointers are used to describe its structure
- In real applications,
  - the addresses of disk blocks are used in place of the pointers,
  - taking a pointer reference becomes making a disk access, and
  - each tree node corresponds to a disk block
  - Auxiliary disk I/O methods are needed
- Declaration of the B_tree class
  - Users are allowed to specify:
    - the type of records to store in a B-tree
    - the order of a B-tree (m)
template <class Record, int order>
class B_tree {
public:    // Add public methods.
private:   // data members
    B_node<Record, order>* root;
    // Add private auxiliary functions here.
};
- Declaration of the B_node class
template <class Record, int order>
struct B_node {
    // data members:
    int count;
    Record data[order - 1];
    B_node<Record, order>* branch[order];
    // constructor:
    B_node();
};
- Contiguous arrays are used to implement the entry (record) list and the children list (why?)
- count gives the number of records in the B_node
  - If count is nonzero, then the node has count + 1 children
- About branch[]
  - branch[0] points to the subtree containing all records with keys less than that in data[0]
  - For 1 ≤ position ≤ count - 1, branch[position] points to the subtree with keys strictly between those of data[position - 1] and data[position]
  - branch[count] points to the subtree with keys greater than that of data[count - 1]
- Constructor
  - Creates an empty node (e.g., a new root)
  - Emptiness is implemented by setting count to 0 in the newly created node
Searching Algorithm
- The search process is very similar to that of a binary search tree
- Public search_tree method
template <class Record, int order>
Error_code B_tree<Record, order>::search_tree(Record &target)
/* Post: If there is an entry in the B-tree whose key matches that in
         target, the parameter target is replaced by the corresponding
         Record from the B-tree and a code of success is returned.
         Otherwise a code of not_present is returned.
   Uses: recursive_search_tree */
{
    return recursive_search_tree(root, target);
}
- Auxiliary function recursive_search_tree
template <class Record, int order>
Error_code B_tree<Record, order>::recursive_search_tree(
    B_node<Record, order> *current, Record &target)
/* Pre:  current is either NULL or points to a subtree of the B_tree.
   Post: If target key is not in the subtree, a code of not_present is
         returned. Otherwise, a code of success is returned and target is set
         to the corresponding Record of the subtree.
   Uses: recursive_search_tree recursively and search_node */
{
    Error_code result = not_present;
    int position;
    if (current != NULL) {
        result = search_node(current, target, position);
        if (result == not_present)
            result = recursive_search_tree(current->branch[position], target);
        else
            target = current->data[position];
    }
    return result;
}
- A search starts from the root and walks down through the tree
  - Similar to a search through a binary search tree
  - In a B-tree, each node is examined to find which branch to take next (search_node)
- search_node seeks a target among the records stored in the current node
  - search_node uses an output parameter position, which receives
    - the index of the target if it is found within the current node, or
    - the index of the branch on which to continue the search
- Auxiliary function search_node
template <class Record, int order>
Error_code B_tree<Record, order>::search_node(
    B_node<Record, order> *current, const Record &target, int &position)
/* Pre:  current points to a node of a B_tree.
   Post: If the Key of target is found in *current, then success is returned
         and the parameter position is set to the index of target (the caller
         copies the corresponding Record). Otherwise, not_present is returned,
         and position is set to the branch index on which to continue the
         search.
   Uses: Methods of class Record. */
{
    position = 0;
    while (position < current->count && target > current->data[position])
        position++;  // Perform a sequential search through the keys.
    if (position < current->count && target == current->data[position])
        return success;
    else
        return not_present;
}
- Sequential search is used in search_node
  - If the record size is large in some applications, the order of the B-tree will be relatively small, and sequential search within the node is appropriate
- For B-trees of large order, binary search may be used in search_node
- Linked binary search trees may also be used instead of contiguous arrays
  - e.g., red-black trees
B-tree Insertion
- B-trees grow in a bottom-up fashion
  - Due to the condition that all leaves be on the same level
  - Different from binary search trees
- General idea
  - Search the tree for the new key
    - If the key is truly new, this search will terminate in failure at a leaf
  - Insert the new key into the leaf node
    - If the node was not previously full, then the insertion is finished
- General idea (cont'd)
  - If the leaf node is full, the node splits into two nodes, side by side on the same level
  - When a node splits, insert the median key of the old leaf into its parent node
    - Repeat the splitting process in the parent node if necessary
  - When a key is added to a full root, the root splits in two
    - The median key sent upward becomes a new root
    - This is the only time when the B-tree grows in height
- (Figure: Example of B-tree insertion, order 5)
- Discussion
  - One splitting prepares the way for several simple insertions
    - When a node splits, it produces two half-full nodes
    - Later insertions are less likely to split nodes for a while
  - No matter in what order the keys arrive, the tree is always balanced
Insertion Algorithms

- Insertion into a B-tree can be naturally implemented as a recursive function
  - After insertion in a subtree, a (median) record may need to be reinserted higher in the tree
  - Recursion can keep track of the position within the tree and move back up the tree
  - An explicit auxiliary stack is not necessary
- Besides new_entry, the recursive function push_down needs three additional parameters
  - current: the root of the current subtree under consideration
  - If *current splits to accommodate new_entry, push_down returns overflow, and all three parameters are used:
    - The old node *current contains the left half of the entries
    - median gives the median record
    - right_branch points to a new node containing the right half of the former *current
- Public insertion method: insert
  - Needs only one parameter: new_entry
  - It calls the recursive method push_down
  - If the outermost call to push_down returns overflow, then one record, median, remains to be reinserted into the B-tree
    - A new root must then be created to hold median
    - The height of the entire B-tree will increase
    - This is the only way that the B-tree grows in height
template <class Record, int order>
Error_code B_tree<Record, order>::insert(const Record &new_entry)
/* Post: If the Key of new_entry is already in the B-tree, a code of
         duplicate_error is returned. Otherwise, a code of success is
         returned and the Record new_entry is inserted into the B-tree in
         such a way that the properties of a B-tree are preserved.
   Uses: Methods of struct B_node and the auxiliary function push_down. */
{
    Record median;
    B_node<Record, order> *right_branch, *new_root;
    Error_code result = push_down(root, new_entry, median, right_branch);

    if (result == overflow) {  // The whole tree grows in height.
        // Make a brand new root for the whole B-tree.
        new_root = new B_node<Record, order>;
        new_root->count = 1;
        new_root->data[0] = median;
        new_root->branch[0] = root;
        new_root->branch[1] = right_branch;
        root = new_root;
        result = success;
    }
    return result;
}
- Auxiliary recursive function: push_down
  - Uses a parameter current to point to the root of the subtree being searched
  - The condition current == NULL is used to terminate the recursion
    - Continue moving down the tree searching for new_entry until we hit an empty subtree
    - new_entry is not immediately inserted: overflow is returned and the new record is sent back up (as median) for later insertion
    - This is because the B-tree does not grow by adding new leaves
  - When a recursive call returns overflow, the record median needs to be inserted in the current node
    - If there is room, then insertion is finished
    - Otherwise, the node *current splits into *current and *right_branch, and a new median is sent up the tree
  - The function uses three auxiliary functions
    - search_node (the same procedure used by searching)
    - push_in puts the median record into node *current, provided that there is room
    - split_node splits a full node *current into two nodes that will be siblings on the same level in the B-tree
template <class Record, int order>
Error_code B_tree<Record, order>::push_down(
    B_node<Record, order> *current, const Record &new_entry,
    Record &median, B_node<Record, order> * &right_branch)
/* Pre:  current is either NULL or points to a node of a B_tree.
   Post: If new_entry is found in the subtree to which current points,
         duplicate_error is returned. Otherwise, new_entry is inserted into
         the subtree: If this causes the height of the subtree to grow,
         overflow is returned, and the Record median is extracted to be
         reinserted higher in the B-tree, together with the subtree
         right_branch on its right. If the height does not grow, success is
         returned.
   Uses: push_down (recursive), search_node, split_node, and push_in. */
{
    Error_code result;
    int position;
    if (current == NULL) {
        // Since we cannot insert in an empty tree, the recursion terminates.
        median = new_entry;
        right_branch = NULL;
        result = overflow;
    }
    else {  // Search the current node.
        if (search_node(current, new_entry, position) == success)
            result = duplicate_error;
        else {
            Record extra_entry;
            B_node<Record, order> *extra_branch;
            result = push_down(current->branch[position], new_entry,
                               extra_entry, extra_branch);
            if (result == overflow) {  // extra_entry must be added to current
                if (current->count < order - 1) {
                    result = success;
                    push_in(current, extra_entry, extra_branch, position);
                }
                else
                    split_node(current, extra_entry, extra_branch, position,
                               right_branch, median);
                // Record median and its right_branch will go up to a higher node.
            }
        }
    }
    return result;
}
- Auxiliary function: push_in
template <class Record, int order>
void B_tree<Record, order>::push_in(B_node<Record, order> *current,
    const Record &entry, B_node<Record, order> *right_branch, int position)
/* Pre:  current points to a node of a B_tree. The node *current is not full
         and entry belongs in *current at index position.
   Post: entry has been inserted along with its right-hand branch
         right_branch into *current at index position. */
{
    for (int i = current->count; i > position; i--) {
        // Shift all later data to the right.
        current->data[i] = current->data[i - 1];
        current->branch[i + 1] = current->branch[i];
    }
    current->data[position] = entry;
    current->branch[position + 1] = right_branch;
    current->count++;
}
- Auxiliary function: split_node
  - Outline
    - Insert a record extra_entry with subtree pointer extra_branch into the full node *current
    - Split the right half off as a new node *right_half
    - Remove the median record and send it upward for reinsertion later
  - We cannot insert extra_entry directly into the full node
    - First determine whether extra_entry belongs in the left or right half and divide the node accordingly
    - Insert extra_entry into the appropriate half
    - Divide the node so that median is the largest entry in the left half, then remove median from the left half
template <class Record, int order>
void B_tree<Record, order>::split_node(
    B_node<Record, order> *current,       // node to be split
    const Record &extra_entry,            // new entry to insert
    B_node<Record, order> *extra_branch,  // subtree on right of extra_entry
    int position,                         // index in node where extra_entry goes
    B_node<Record, order> * &right_half,  // new node for right half of entries
    Record &median)                       // median entry (in neither half)
/* Pre:  current points to a node of a B_tree. The node *current is full, but
         if there were room, the record extra_entry with its right-hand
         pointer extra_branch would belong in *current at position,
         0 <= position < order.
   Post: The node *current with extra_entry and pointer extra_branch at
         position are divided into nodes *current and *right_half separated
         by a Record median.
   Uses: Methods of struct B_node, function push_in. */
{
    right_half = new B_node<Record, order>;
    int mid = order / 2;  // The entries from mid on will go to right_half.
    if (position <= mid) {  // First case: extra_entry belongs in left half.
        for (int i = mid; i < order - 1; i++) {  // Move entries to right_half.
            right_half->data[i - mid] = current->data[i];
            right_half->branch[i + 1 - mid] = current->branch[i + 1];
        }
        current->count = mid;
        right_half->count = order - 1 - mid;
        push_in(current, extra_entry, extra_branch, position);
    }
    else {  // Second case: extra_entry belongs in right half.
        mid++;  // Temporarily leave the median in left half.
        for (int i = mid; i < order - 1; i++) {  // Move entries to right_half.
            right_half->data[i - mid] = current->data[i];
            right_half->branch[i + 1 - mid] = current->branch[i + 1];
        }
        current->count = mid;
        right_half->count = order - 1 - mid;
        push_in(right_half, extra_entry, extra_branch, position - mid);
    }
    median = current->data[current->count - 1];
    // Remove median from left half.
    right_half->branch[0] = current->branch[current->count];
    current->count--;
}
Deletion from a B-Tree

- General method
  - If the entry to be deleted is not in a leaf, then its immediate predecessor (or successor) is guaranteed to be in a leaf
    - The immediate predecessor (or successor) is promoted into the position of the deleted entry, and that entry is then deleted from the leaf
  - If the leaf contains more than the minimum number of entries, then one of them can be deleted with no further action
  - If the leaf contains just the minimum number of entries, deleting one causes underflow. We then look at the two sibling leaves (or, for a node on the outside, the one sibling leaf) that are immediately adjacent to it and are children of the same node
    - If one of these has more than the minimum number of entries, then one of its entries can be moved into the parent node, and the entry from the parent moved into the leaf where the deletion is occurring
    - If the adjacent leaf has only the minimum number of entries, then the two leaves and the parent entry that separates them can all be combined into one new leaf, which will contain no more than the maximum number of entries allowed
  - If this step leaves the parent node with too few entries, then the process propagates upward. In the extreme case, the last entry is removed from the root, and then the height of the tree decreases
Deletion Algorithms
- Recursion is employed in the implementation of the deletion algorithm
- If underflow occurs in a node,
  - we do not pull an entry down from a parent node during an inner recursive call, and
  - the recursive function is allowed to return even though there are too few entries in the node
  - The outer call will then detect this underflow and move entries as required
- When the last entry is removed from the root, the empty node is deleted and the height of the B-tree shrinks
- Public deletion method: remove
template <class Record, int order>
Error_code B_tree<Record, order>::remove(const Record &target)
/* Post: If a Record with Key matching that of target belongs to the B-tree,
         success is returned and the corresponding entry is removed from the
         B-tree. Otherwise, not_present is returned.
   Uses: Function recursive_remove */
{
    Error_code result;
    result = recursive_remove(root, target);
    if (root != NULL && root->count == 0) {  // root is now empty.
        B_node<Record, order> *old_root = root;
        root = root->branch[0];
        delete old_root;
    }
    return result;
}
- Auxiliary recursive function: recursive_remove
  - It first searches the current node for target
  - If target is found and the current node is not a leaf,
    - the immediate predecessor of target is located and placed in the current node, and
    - the recursive process continues, BUT the entry to be deleted is now the immediate predecessor, not target
  - Deletion from a leaf is straightforward; otherwise the process continues by recursion
  - When a recursive call returns, the function checks whether enough entries remain in the appropriate node
    - If not, it moves entries as required
template <class Record, int order>
Error_code B_tree<Record, order>::recursive_remove(
    B_node<Record, order> *current, const Record &target)
/* Pre:  current is either NULL or points to the root node of a subtree of a
         B_tree.
   Post: If a Record with Key matching that of target belongs to the subtree,
         a code of success is returned and the corresponding entry is removed
         from the subtree so that the properties of a B-tree are maintained.
         Otherwise, a code of not_present is returned.
   Uses: Functions search_node, copy_in_predecessor, recursive_remove
         (recursively), remove_data, and restore. */
{
    Error_code result;
    int position;
    if (current == NULL) result = not_present;
    else {
        if (search_node(current, target, position) == success) {
            // The target is in the current node.
            result = success;
            if (current->branch[position] != NULL) {  // not at a leaf node
                copy_in_predecessor(current, position);
                recursive_remove(current->branch[position],
                                 current->data[position]);
            }
            else remove_data(current, position);  // Remove from a leaf node.
        }
        else result = recursive_remove(current->branch[position], target);
        if (current->branch[position] != NULL)
            if (current->branch[position]->count < (order - 1) / 2)
                restore(current, position);
    }
    return result;
}
- Auxiliary function: remove_data
  - Deletes an entry in a leaf node

template <class Record, int order>
void B_tree<Record, order>::remove_data(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a leaf node in a B-tree with an entry at position.
   Post: This entry is removed from *current. */
{
    for (int i = position; i < current->count - 1; i++)
        current->data[i] = current->data[i + 1];
    current->count--;
}
- Auxiliary function: copy_in_predecessor
  - Invoked when an entry must be deleted from a non-leaf node
  - The immediate predecessor is found and used to replace the entry to be deleted

template <class Record, int order>
void B_tree<Record, order>::copy_in_predecessor(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a non-leaf node in a B-tree with an entry at
         position.
   Post: This entry is replaced by its immediate predecessor. */
{
    B_node<Record, order> *leaf = current->branch[position];
    // First go left from the current entry.
    while (leaf->branch[leaf->count] != NULL)
        leaf = leaf->branch[leaf->count];  // Move as far rightward as possible.
    current->data[position] = leaf->data[leaf->count - 1];
}
- Auxiliary function: restore
  - Restores current->branch[position] to the required minimum number of entries if a recursive call has reduced its count below this minimum
  - The function is biased to the left: it looks first to the left sibling to take an entry and uses the right sibling only when there are no entries to spare in the left one
template <class Record, int order>
void B_tree<Record, order>::restore(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a non-leaf node in a B-tree; the node to which
         current->branch[position] points has one too few entries.
   Post: An entry is taken from elsewhere to restore the minimum number of
         entries in the node to which current->branch[position] points.
   Uses: move_left, move_right, combine. */
{
    if (position == current->count) {  // case: rightmost branch
        if (current->branch[position - 1]->count > (order - 1) / 2)
            move_right(current, position - 1);
        else
            combine(current, position);
    }
    else if (position == 0) {  // case: leftmost branch
        if (current->branch[1]->count > (order - 1) / 2)
            move_left(current, 1);
        else
            combine(current, 1);
    }
    else {  // remaining cases: intermediate branches
        if (current->branch[position - 1]->count > (order - 1) / 2)
            move_right(current, position - 1);
        else if (current->branch[position + 1]->count > (order - 1) / 2)
            move_left(current, position + 1);
        else
            combine(current, position);
    }
}
- Auxiliary function: move_left
template <class Record, int order>
void B_tree<Record, order>::move_left(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a node in a B-tree with more than the minimum
         number of entries in branch position and one too few entries in
         branch position - 1.
   Post: The leftmost entry from branch position has moved into current,
         which has sent an entry into the branch position - 1. */
{
    B_node<Record, order> *left_branch = current->branch[position - 1],
                          *right_branch = current->branch[position];
    // Take entry from the parent.
    left_branch->data[left_branch->count] = current->data[position - 1];
    left_branch->branch[++left_branch->count] = right_branch->branch[0];

    // Add the right-hand entry to the parent.
    current->data[position - 1] = right_branch->data[0];
    right_branch->count--;
    for (int i = 0; i < right_branch->count; i++) {
        // Move right-hand entries to fill the hole.
        right_branch->data[i] = right_branch->data[i + 1];
        right_branch->branch[i] = right_branch->branch[i + 1];
    }
    right_branch->branch[right_branch->count] =
        right_branch->branch[right_branch->count + 1];
}
- Auxiliary function: move_right
template <class Record, int order>
void B_tree<Record, order>::move_right(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a node in a B-tree with more than the minimum
         number of entries in branch position and one too few entries in
         branch position + 1.
   Post: The rightmost entry from branch position has moved into current,
         which has sent an entry into the branch position + 1. */
{
    B_node<Record, order> *right_branch = current->branch[position + 1],
                          *left_branch = current->branch[position];
    right_branch->branch[right_branch->count + 1] =
        right_branch->branch[right_branch->count];

    for (int i = right_branch->count; i > 0; i--) {  // Make room for new entry.
        right_branch->data[i] = right_branch->data[i - 1];
        right_branch->branch[i] = right_branch->branch[i - 1];
    }
    right_branch->count++;
    right_branch->data[0] = current->data[position];  // Take entry from parent.
    right_branch->branch[0] = left_branch->branch[left_branch->count--];
    current->data[position] = left_branch->data[left_branch->count];
}
- Auxiliary function: combine
template <class Record, int order>
void B_tree<Record, order>::combine(
    B_node<Record, order> *current, int position)
/* Pre:  current points to a node in a B-tree with entries in the branches
         position and position - 1, with too few to move entries.
   Post: The nodes at branches position - 1 and position have been combined
         into one node, which also includes the entry formerly in current at
         index position - 1. */
{
    int i;
    B_node<Record, order> *left_branch = current->branch[position - 1],
                          *right_branch = current->branch[position];
    left_branch->data[left_branch->count] = current->data[position - 1];
    left_branch->branch[++left_branch->count] = right_branch->branch[0];

    for (i = 0; i < right_branch->count; i++) {
        left_branch->data[left_branch->count] = right_branch->data[i];
        left_branch->branch[++left_branch->count] = right_branch->branch[i + 1];
    }
    current->count--;
    for (i = position - 1; i < current->count; i++) {
        current->data[i] = current->data[i + 1];
        current->branch[i + 1] = current->branch[i + 2];
    }
    delete right_branch;
}
