External Searching: B-Trees: Dr. Jicheng Fu
Internal search
The data structure is kept in RAM
E.g., binary search trees and AVL trees
External search
Locate and retrieve records stored in hard disks
B-trees are designed for external searches
B-trees stay in the hard disk
Differences between RAM and hard disks
RAM
Access time: Nanoseconds (tens to hundreds)
Access unit: Words (usually 4 bytes)
Hard disks
Access time: Milliseconds
Access units: Pages or blocks (usually 1 KB or more)
Goal in external searching
Minimize the number of disk accesses
Disk access time is much longer than internal
computation time
A disk block may have room for several records
A multiway decision can be used in a block
Multiway trees are more appropriate
Reduce the tree height, therefore reduce disk accesses
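To make the height argument concrete, here is a minimal sketch that counts how many levels a search must descend to tell apart a given number of records. The function name levels_needed and the simplifying assumptions (every node is full, and each level costs one disk access) are illustrative only and do not come from the slides.

```cpp
#include <cstdint>

// Hypothetical helper: the smallest number of levels h with fanout^h >= records,
// i.e. how many multiway decisions are needed to single out one record.
// Assumes fanout >= 2 and that every node is completely full.
int levels_needed(std::uint64_t records, std::uint64_t fanout)
{
    int levels = 0;
    std::uint64_t reachable = 1;            // records distinguishable so far
    while (reachable < records) {
        if (reachable > records / fanout) { // next multiply would pass records
            ++levels;                       // (also avoids integer overflow)
            break;
        }
        reachable *= fanout;
        ++levels;
    }
    return levels;
}
```

Under these assumptions, one million records need 20 levels with two-way branching but only 3 levels with 100-way branching, which is the reduction in disk accesses the slide refers to.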
Multiway Search Trees
General method of deletion
If the entry to be deleted is not in a leaf, then its
immediate predecessor (or successor) is
guaranteed to be in a leaf
The immediate predecessor (or successor) is
promoted into the position of the deleted entry, and
the entry is deleted from the leaf
If the leaf contains more than the minimum number
of entries, then one of them can be deleted with no
further action
If the leaf contains just the minimum number of
entries, deleting one causes underflow, so we first
look at the two sibling leaves (or, for a leaf at either
end, the one sibling) that are immediately adjacent to
it and are children of the same parent node
If one of these has more than the minimum number of
entries, then one of its entries can be moved into the parent
node, and the entry from the parent moved into the leaf
where the deletion is occurring
If the adjacent leaf has only the minimum number of
entries, then the two leaves and the median entry from
the parent can all be combined as one new leaf, which
will contain no more than the maximum number of
entries allowed
If this step leaves the parent node with too few
entries, then the process propagates upward. In
the extreme case, the last entry is removed from
the root, and then the height of the tree decreases
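The case analysis above can be sketched as a small decision function. This is only an illustration: the names Repair and choose_repair are hypothetical, a missing sibling is represented by a count of -1, and trying the left sibling before the right one is an assumption the slides do not state. The minimum of (order - 1)/2 entries is the same threshold tested in recursive_remove.

```cpp
// Hypothetical names for the possible outcomes after deleting from a leaf.
enum class Repair { none, borrow_from_left, borrow_from_right, combine };

// count:       entries left in the leaf after the deletion
// left_count:  entries in the left sibling, or -1 if there is none
// right_count: entries in the right sibling, or -1 if there is none
Repair choose_repair(int order, int count, int left_count, int right_count)
{
    const int min_entries = (order - 1) / 2;   // minimum for a non-root node
    if (count >= min_entries)
        return Repair::none;                   // no underflow, nothing to do
    if (left_count > min_entries)
        return Repair::borrow_from_left;       // rotate an entry through the parent
    if (right_count > min_entries)
        return Repair::borrow_from_right;
    return Repair::combine;                    // merge with a sibling and a parent entry
}
```

For order 5 the minimum is 2 and the maximum is 4, so combining an underfull leaf (1 entry), a minimal sibling (2 entries) and one parent entry yields 4 entries, which still fits in a single node.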
Deletion Algorithms
Recursion is employed in the implementation of the
deletion algorithm
If underflow occurs in a node,
we do not pull an entry down from a parent node during an
inner recursive call, and
the recursive function is allowed to return even though
there are too few entries in the node.
The outer call will then detect this underflow and move
entries as required
When the last entry is removed from the root, then the
empty node is deleted and the height of the B-tree shrinks
Public deletion method: remove
template <class Record, int order>
Error_code B_tree<Record, order> :: remove(const Record &target)
/* Post: If a Record with Key matching that of target belongs to the
         B_tree, a code of success is returned and the corresponding
         entry is removed from the B_tree. Otherwise, a code of
         not_present is returned.
   Uses: Function recursive_remove */
{
    Error_code result;
    result = recursive_remove(root, target);
    if (root != NULL && root->count == 0) {  // root is now empty.
        B_node<Record, order> *old_root = root;
        root = root->branch[0];              // its only child becomes the root.
        delete old_root;
    }
    return result;
}
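The listing relies on a node type with the members count and branch, and the later code also uses data. A plausible layout, inferred only from how those members are used here, might look like the sketch below. The array sizes and the constructor are assumptions, not the course's definition.

```cpp
// Assumed node layout for a B-tree of the given order:
// at most order - 1 entries and order branches per node.
template <class Record, int order>
struct B_node {
    int count = 0;                          // number of entries currently in use
    Record data[order - 1];                 // entries, kept in key order
    B_node<Record, order> *branch[order];   // branch[i] holds keys below data[i]

    B_node()
    {
        for (int i = 0; i < order; i++)
            branch[i] = nullptr;            // a new node starts out as a leaf
    }
};
```

With this layout, a root whose count has dropped to 0 still has one valid child in branch[0], which is why remove can promote that child to be the new root.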
Auxiliary recursive function: recursive_remove
It first searches the current node for target
If target is found and the current node is not a leaf,
the immediate predecessor of target is located and is placed
in the current node, and
the recursive process continues, but the entry to be deleted
is now the immediate predecessor rather than target
Deletion from a leaf is straightforward, and otherwise the
process continues by recursion
When a recursive call returns, the function checks to see if
enough entries remain in the appropriate node
If not, it moves entries as required
template <class Record, int order>
Error_code B_tree<Record, order> :: recursive_remove(
    B_node<Record, order> *current, const Record &target)
/* Pre:  current is either NULL or points to the root node of a subtree
         of a B_tree.
   Post: If a Record with Key matching that of target belongs to the
         subtree, a code of success is returned and the corresponding
         entry is removed from the subtree so that the properties of a
         B-tree are maintained. Otherwise, a code of not_present is
         returned.
   Uses: Functions search_node, copy_in_predecessor,
         recursive_remove (recursively), remove_data, and restore. */
{
    Error_code result;
    int position;
    if (current == NULL) result = not_present;
    else {
        if (search_node(current, target, position) == success) {
            // The target is in the current node.
            result = success;
            if (current->branch[position] != NULL) {  // not at a leaf node
                copy_in_predecessor(current, position);
                // Now delete the copied predecessor from its leaf.
                recursive_remove(current->branch[position],
                                 current->data[position]);
            }
            else remove_data(current, position);  // Remove from a leaf node.
        }
        else result = recursive_remove(current->branch[position], target);
        // Repair any underflow the recursive call left in the child.
        if (current->branch[position] != NULL)
            if (current->branch[position]->count < (order - 1)/2)
                restore(current, position);
    }
    return result;
}
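The helper copy_in_predecessor is called above but not shown. A minimal sketch of what it might do follows, assuming the immediate predecessor is the last entry of the rightmost leaf in the subtree branch[position]. The simplified non-template Node type, the int keys and the constant kOrder are hypothetical stand-ins for B_node<Record, order>.

```cpp
const int kOrder = 5;               // hypothetical order used by this sketch

struct Node {                       // simplified stand-in for B_node
    int count = 0;                  // number of entries in use
    int data[kOrder - 1] = {};      // entries in key order
    Node *branch[kOrder] = {};      // all null in a leaf
};

// Overwrite data[position] of *current with its immediate predecessor.
// Assumes current is not a leaf, so branch[position] is non-null.
void copy_in_predecessor(Node *current, int position)
{
    Node *leaf = current->branch[position];
    while (leaf->branch[leaf->count] != nullptr)
        leaf = leaf->branch[leaf->count];        // keep following the rightmost branch
    current->data[position] = leaf->data[leaf->count - 1];
}
```

After this call the predecessor appears twice in the tree, once in the internal node and once in its leaf, which is why recursive_remove then recurses with the copied entry as the new target.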
Auxiliary function: remove_data
Delete an entry in a leaf node
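Since a leaf has no branches to maintain, deleting an entry amounts to closing the gap in its data array. The sketch below shows one way this could be written. The Leaf type, the int keys and the constant kMaxEntries are hypothetical simplifications of B_node<Record, order>.

```cpp
#include <cassert>

const int kMaxEntries = 4;          // hypothetical: a node of order 5 holds up to 4 entries

struct Leaf {                       // simplified stand-in for a leaf B_node
    int count = 0;                  // number of entries in use
    int data[kMaxEntries] = {};     // entries in key order
};

// Remove the entry at index position by shifting later entries one slot left.
void remove_data(Leaf *current, int position)
{
    assert(position >= 0 && position < current->count);
    for (int i = position; i < current->count - 1; i++)
        current->data[i] = current->data[i + 1];
    current->count--;
}
```

The function deliberately does not check for underflow. As described in the deletion algorithm, the calling level detects a node with too few entries and repairs it.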