Unit V
• Indexing minimizes the number of disk accesses required when a query is processed.
• The first column is the search key. It contains a copy of the primary key or a candidate key of the table. The values in this
column need not be sorted, but when they are, the corresponding data can be located faster (for example, by binary search).
• Second column is the Data reference or Pointer. It contains the address of the disk block where we can find the
• It is a two-level indexing technique used to reduce the mapping size of the primary index.
• The secondary index points to a certain location where the data is to be found but the actual data is not
sorted like in the primary indexing.
• Secondary Indexing is also known as non-clustered Indexing.
• In dense indexing, the index table contains records for every search key value of the database.
• It is like primary indexing but contains a record for every search key.
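A dense index can be pictured as a sorted list of (search key, block) pairs that is searched with binary search. The sketch below is only an illustration under that assumption; the names IndexEntry and denseLookup and the use of plain integers for keys and block numbers are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// One index entry: search key plus the (hypothetical) disk block number
// that holds the matching record.
using IndexEntry = std::pair<int, int>;

// Dense index: one entry per search key, kept sorted by key.
// Returns the block number for `key`, or std::nullopt if the key is absent.
std::optional<int> denseLookup(const std::vector<IndexEntry>& index, int key)
{
    assert(std::is_sorted(index.begin(), index.end()));
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, int k) { return e.first < k; });
    if (it != index.end() && it->first == key)
        return it->second;
    return std::nullopt;
}
```

Because every key has its own entry, a miss in the index means the record does not exist, so no data block has to be read for a failed lookup.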
Sparse Indexing
3. Some indexes use sorted, unique keys, which makes queries that need sorted output even faster.
5. As index tables are smaller than the data they describe, they can often be kept in main memory.
6. Since CPU speed and secondary memory speed have a large difference, the CPU uses this main memory
We will describe a B-tree of order 5 using a C++ structure. The declaration of B-tree node is given in Fig.
• The maximum number of items in a B-tree of order m and height h is shown in Table
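As a rough sketch of where such a bound comes from, assume the root is at level 0 and the leaves are at level h. A completely full tree then has 1 + m + ... + m^h nodes, each holding m - 1 keys, which gives m^(h+1) - 1 keys in total. The function name maxKeys is hypothetical, and a text that counts height differently will differ by one level.

```cpp
#include <cassert>
#include <cstdint>

// Maximum number of keys in a B-tree of order m (at most m children per node)
// whose root is at level 0 and whose leaves are at level h.
// Assumed convention: height h counts edges from the root to a leaf.
std::uint64_t maxKeys(unsigned m, unsigned h)
{
    assert(m >= 2);
    std::uint64_t nodes = 0;      // total nodes when every level is full
    std::uint64_t levelNodes = 1; // nodes on the current level (m^level)
    for (unsigned level = 0; level <= h; ++level) {
        nodes += levelNodes;
        levelNodes *= m;
    }
    return nodes * (m - 1);       // every node holds m - 1 keys
}
```

For example, under this convention an order-5 tree with a root and one level of children holds at most 6 nodes of 4 keys each.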
B tree of order 3 is shown in the following image.
Search Operation in B-Tree
• The search operation in B-Tree is similar to the search operation in Binary Search Tree.
• In a Binary search tree, the search process starts from the root node and we make a 2-way decision every time
(we go to either left subtree or right subtree).
• In B-Tree also search process starts from the root node but here we make an n-way decision every time.
Where 'n' is the total number of children the node has.
• In a B-Tree, the search operation is performed with O(log N) time complexity, where N is the number of keys in the tree
(not the per-node child count n above).
• The search operation is performed as follows...
• Step 1 - Read the search element from the user.
• Step 2 - Compare the search element with first key value of root node in the tree.
• Step 3 - If both are matched, then display "Given node is found!!!" and terminate the function
• Step 4 - If both are not matched, then check whether search element is smaller or larger than that key
value.
• Step 5 - If search element is smaller, then continue the search process in left subtree.
• Step 6 - If search element is larger, then compare the search element with the next key value in the same
node and repeat steps 3, 4, 5 and 6 until we find the exact match or until the search element is
compared with the last key value in the leaf node.
• Step 7 - If the last key value in the leaf node is also not matched then display "Element is not found"
and terminate the function.
Insert Operation on B Tree in Data Structure
• Binary search trees grow at their leaves, but the B-trees grow at the root.
1. First, the new key is searched in the tree. If the new key is not found, then the search terminates at a
leaf.
3. If the leaf node is not full, then the new key is added to it and the insertion is finished.
4. If the leaf node is full, then it splits into two nodes on the same level, except that the median key is sent
5. If this would result in the parent becoming too big, split the parent into two, promoting the middle key.
6. This strategy might have to be repeated all the way to the top.
7. If necessary, the root is split into two and the middle key is promoted to a new root, making the tree one
level higher.
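The split step above can be sketched on a plain sorted list of keys. This is a minimal illustration, not a full node implementation: SplitResult and splitAtMedian are hypothetical names, and child pointers are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of splitting an overfull, sorted key list.
struct SplitResult {
    std::vector<int> left;   // keys that stay in the original node
    int median;              // key promoted to the parent
    std::vector<int> right;  // keys moved to the new sibling
};

// Splits `keys` (sorted, at least 3 entries) around its middle key.
// For an even count this picks the upper of the two middle keys; that is
// an assumption, the lower one is an equally valid choice.
SplitResult splitAtMedian(const std::vector<int>& keys)
{
    assert(keys.size() >= 3);
    const std::size_t mid = keys.size() / 2;
    SplitResult r;
    r.left.assign(keys.begin(), keys.begin() + mid);
    r.median = keys[mid];
    r.right.assign(keys.begin() + mid + 1, keys.end());
    return r;
}
```

Splitting the overfull list 10, 15, 20, 30 this way leaves 10 and 15 in the left node, promotes 20, and moves 30 to the new right node.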
Steps to insert an element in B Tree in Data Structure
1. Calculate the maximum number of keys in the node based on the order of the B tree.
2. If the tree is empty, a root node is allocated and the key is inserted and acts as the root node.
2. If the elements are greater than the maximum number of keys, split at the median.
3. Push the median key upwards and split the left and right keys as left and right child respectively.
key.
The time complexity of insertion in a B Tree depends on the height of the tree, which is logarithmic in the number of keys n,
and is thus O(log n).
Construct B-tree of order 4 by inserting the following data one at a time.
20, 10, 30, 15, 12, 40, 50
Insert 12
• Goes to left child [10, 15]
• Inserted in order → [10, 12, 15]
Insert 40
• Goes to right child [30] → [30, 40]
Insert 50
• Goes to right child [30, 40] → [30, 40, 50]
Insert the keys to a 5-way B-tree:
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 08, 19, 04, 31, 35, 56
Insertion
1) Initialize x as root.
1. Find the child of x that is going to be traversed next. Let the child be y.
3. If y is full, split it and change x to point to one of the two parts of y. If k is smaller than mid key in y,
then set x as the first part of y. Else second part of y. When we split y, we move a key from y to its
parent x.
3) The loop in step 2 stops when x is leaf. x must have space for 1 extra key as we have been splitting all
• During insertion, we had to ensure that the number of keys in the node doesn't cross a maximum.
• Similarly, during deletion, we need to ensure that the number of keys in the node after deletion doesn't go
1.1 If the node has more than MIN keys - Deletion of key does not violate any property and thus the
key can be deleted easily by shifting other keys of the node, if required.
1.2 If the node has exactly MIN keys - Deleting the key would leave the node with fewer than MIN keys, which violates a
property of the B tree. If the left sibling has more than MIN keys, a key is borrowed from there. Otherwise, if the right
sibling has more than MIN keys, a key is borrowed from there. If neither of these holds true, then a
2.1 In this case, the successor key (smallest key in the right subtree) is copied at the place of the key to
be deleted and then the successor is deleted. This case further reduces to Case 1, i.e. deletion from a
leaf node.
• Let's understand deletion using an example. Consider the following B tree of order 5, from which the keys 5, 12,
32 and 53 are to be deleted in the given order.
• Since the order of the B tree is 5, the minimum and maximum number of keys in a node are 2 and 4
respectively.
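These limits follow from the order: a node holds at most m - 1 keys and, except for the root, at least ⌈m/2⌉ - 1 keys. A small sketch of that arithmetic is given below; KeyLimits and keyLimits are hypothetical names, and the definition of order as the maximum number of children is assumed.

```cpp
#include <cassert>

// Key-count limits for a non-root node of a B-tree of order m,
// using the definition in which a node has at most m children.
struct KeyLimits {
    int minKeys; // ceil(m / 2) - 1
    int maxKeys; // m - 1
};

KeyLimits keyLimits(int order)
{
    assert(order >= 3);
    // (order + 1) / 2 is ceil(order / 2) in integer arithmetic.
    return KeyLimits{(order + 1) / 2 - 1, order - 1};
}
```

For order 5 this gives MIN = 2 and a maximum of 4, matching the values used in the steps that follow.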
Step 1 - Deleting 5
Since 5 is a key in a leaf node with keys>MIN, this would be Case 1.1. A simple deletion with key shift would be done.
Keys 9 and 10 are shifted left to fill the gap created by the deletion of 5.
Step 2 - Deleting 12
• Here, key 12 is to be deleted from node [12,17]. Since this node has only MIN keys, we will try to borrow from its
left sibling [2,9,10] which has more than MIN keys. The parent of these nodes [11,21] contains the separator key 11.
So, the last key of left sibling (10) is moved to the place of the separator key and the separator key is moved to the
underflow node (the node where deletion took place). The resulting tree after deletion can be found as follows:
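The borrow in this step is a rotation through the parent. The sketch below shows it at leaf level only, with the nodes reduced to key lists; the function name borrowFromLeftLeaf is hypothetical and child pointers are not handled.

```cpp
#include <cassert>
#include <vector>

// Rotates one key from the left sibling through the parent into `node`.
// `separator` is the parent key that sits between the two siblings.
// Leaf-level sketch only: child pointers are not handled here.
void borrowFromLeftLeaf(std::vector<int>& left, int& separator, std::vector<int>& node)
{
    assert(!left.empty());
    node.insert(node.begin(), separator); // separator moves down into the underflow node
    separator = left.back();              // last key of the left sibling moves up
    left.pop_back();
}
```

With left sibling [2,9,10], separator 11 and underflow node [17], the call leaves [2,9], separator 10 and [11,17].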
Step 3 - Deleting 32
• Here, key 32 is to be deleted from node [32, 41]. Since this node has only MIN keys and does not have a left sibling,
we will try to borrow from its right sibling [53, 61, 64] which has more than MIN keys. The parent of these nodes
[51, 67] contains the separator key 51. So, the first key of right sibling (53) is moved to the place of the separator key
and the separator key is moved to the underflow node (the node where deletion took place). The resulting tree after
deletion can be found as follows:
Step 4 - Deleting 53
Here key 53 is to be deleted from node [53, 67] which is a non-leaf node. In such a case, the successor key (61) will be
copied in place of 53 and now the task reduces to deletion of 61 from the leaf node. Since this node would have less than
MIN keys, we check for the left sibling. Since the left sibling has only MIN keys, we move to the right sibling. The
leftmost key of the right sibling (68) moves to the parent node and replaces the separator (67) while the separator shifts
to the underflow node making it [64,67]. The resulting tree after deletion can be found as follows:
• A key deleted from an internal node is replaced by its inorder successor if the right child has more than the
minimum number of keys.
• If neither child has more than the minimum number of keys, merge the left and the right children.
• If, after merging, the parent node has fewer than the minimum number of keys, look for the siblings as in Case I.
B-tree Insertion Algorithm
Procedure Insert(key):
Otherwise:
3. Split the current root and move the median key to the new parent.
• Recursively insert the key into the B-tree starting from the root using the InsertNonFull function.
Procedure InsertNonFull(node, key):
• Find the appropriate position to insert the key in the node's keys array.
• Insert the key into the appropriate position in the node's keys array.
• Decide which child of the current node to descend into based on the value of the key.
• Recursively call InsertNonFull on the selected child node with the key.
Procedure Split(parent, childIndex):
• Copy the upper half of the keys from the child node to the newNode.
• Insert the median key from the child node into the parent's keys array at the appropriate position.
• Insert the newNode into the parent's children array at the appropriate position.
B-tree Deletion Algorithm
Procedure Remove(key):
• If the root is empty, print "Tree is empty" and return.
• Call the Remove function starting from the root with the given key.
• If the root becomes empty after deletion, adjust the root.
Procedure Remove(node, key):
• Search for the key in the node:
• If found:
• If the node is a leaf, remove the key from the node's keys array.
• If the node is not a leaf:
• Find the predecessor of the key from the left subtree (or successor from the right
subtree).
• Replace the key with its predecessor (or successor).
• Recursively call Remove on the child node from which the predecessor (or successor)
was obtained.
• If not found:
• If the node is a leaf, print "Key not found" and return.
• If the node is not a leaf:
• If necessary, borrow a key from a sibling or merge with a sibling.
• Recursively call Remove on the appropriate child node.
Procedure BorrowFromLeft(node, index):
• Get the child node (child) at index from the parent node.
• Get the left sibling node (sibling) of the child node from the parent node.
• Move the rightmost key from the sibling node to the left of the child node.
• If the child node is not a leaf, adjust the corresponding child pointers.
Procedure BorrowFromRight(node, index):
• Get the child node (child) at index from the parent node.
• Get the right sibling node (sibling) of the child node from the parent node.
• Move the leftmost key from the sibling node to the right of the child node.
• If the child node is not a leaf, adjust the corresponding child pointers.
Procedure Merge(node, index):
• Get the child node (child) at index from the parent node.
• Get the sibling node (sibling) next to the child node.
• Merge the keys and children of the sibling node into the child node.
• Remove the key from the parent node.
• Remove the sibling node from the parent node and free its memory.
B+Tree
• A B+ Tree is a balanced multiway (m-way) search tree in which all data is stored in the leaf nodes, while the
internal nodes store just the indices.
• B+ tree is an extension of the B tree.
• The difference between a B+ tree and a B tree is that in a B tree the keys and records can be stored in internal as
well as leaf nodes, whereas in a B+ tree the records are stored only in leaf nodes and the internal nodes hold
only keys.
• Each leaf is at the same height and all leaf nodes have links to the other leaf nodes.
• The root node has a minimum of two children, unless it is itself a leaf.
Properties of B+ Trees
1. All data is stored in the leaf nodes, while the internal nodes store just the indices.
2. Each leaf is at the same height.
3. All leaf nodes have links to the other leaf nodes.
4. The root node has a minimum of two children, unless it is itself a leaf.
5. Each node except root can have a maximum of m children and a minimum of ⌈m/2⌉ children.
6. Each node can contain a maximum of m-1 keys and a minimum of ⌈m/2⌉ - 1 keys.
Difference Between B Tree and B+ Tree
Insertion Operation on B+ Tree
• Insert the key into the leaf node in increasing order if the leaf isn't full.
• Step 1: Insert the new node as a leaf node, in the increasing order. Now since the leaf node was already full,
• Step 4: In case the parent node is already full, just repeat the steps 2 and 3.
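A leaf split in a B+ tree differs from a B-tree split in one detail: the separator is copied into the parent and also stays in the leaf, because every key must remain reachable at leaf level. The sketch below illustrates this on a plain key list. LeafSplit and splitLeaf are hypothetical names, and the choice of split point is an assumption that varies between textbooks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LeafSplit {
    std::vector<int> left;   // keys that stay in the original leaf
    std::vector<int> right;  // keys moved to the new leaf
    int separator;           // copy of right.front(), inserted into the parent
};

// Splits an overfull, sorted B+ tree leaf. Unlike a B-tree split, the
// separator is copied up: it stays in the right leaf as well.
LeafSplit splitLeaf(const std::vector<int>& keys)
{
    assert(keys.size() >= 2);
    const std::size_t mid = keys.size() / 2;
    LeafSplit s;
    s.left.assign(keys.begin(), keys.begin() + mid);
    s.right.assign(keys.begin() + mid, keys.end());
    s.separator = s.right.front();
    return s;
}
```

Under this split rule, an overfull leaf holding 5, 15, 25 becomes the leaves [5] and [15, 25], with a copy of 15 sent to the parent.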
Insertion Example
The elements to be inserted are 5, 15, 25, 35, 45.
• Insert 5.
• Insert 15.
• Insert 25.
• Insert 35.
• Insert 45.
Search Operation on B+ Tree
Deletion from a B+ Tree
Deletion Operation
Before going through the steps below, one must know these facts about a B+ tree of degree m (the values in parentheses
are for m = 3).
• A node can have a maximum of m children. (i.e. 3)
• A node can contain a maximum of m - 1 keys. (i.e. 2)
• A node should have a minimum of ⌈m/2⌉ children. (i.e. 2)
• A node (except root node) should contain a minimum of ⌈m/2⌉ - 1 keys. (i.e. 1)
Example - Deletion from a B+ Tree
• While deleting a key, we have to take care of the keys present in the internal nodes (i.e. indexes) as well because
the values are redundant in a B+ tree. Search for the key to be deleted, then follow the steps below.
• Case I - The key to be deleted is present only at the leaf node, not in the indexes (or internal nodes). There are
two cases for it:
1. There is more than the minimum number of keys in the node. Simply delete the key.
2. There is an exact minimum number of keys in the node. Delete the key and borrow a key from the immediate sibling.
Add the median key of the sibling node to the parent.
Case II
• The key to be deleted is present in the internal nodes as well.
Then we have to remove them from the internal nodes as well.
There are the following cases for this situation.
1. If there is more than the minimum number of keys in
the node, simply delete the key from the leaf node and
delete the key from the internal node as well. Fill the
empty space in the internal node with the inorder
successor.
2. If there is an exact minimum number of keys in the node, then delete the key and borrow a key from its immediate
sibling (through the parent). Fill the empty space created in the index (internal node) with the borrowed key.
3. This case is similar to Case II(1) but here, empty space is generated above the immediate parent node. After
deleting the key, merge the empty space with its sibling. Fill the empty space in the grandparent node with the inorder
successor.
Case III -
• In this case, the height of the tree shrinks. It is a little complicated. Deleting 55 from the tree below leads to
this condition. It can be understood in the illustrations below.
Algorithm - Basic operations associated with B+ Tree
1. The height of a B+ tree always stays balanced and is usually smaller than that of a comparable B tree.
4. Because records are stored only in the leaf nodes, internal nodes can hold more keys, which keeps the tree shallow
and makes search queries faster.
Applications of B+ Tree
6. B+ trees in DBMS play a useful role by supporting equality and range search.
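Range search benefits from the links between leaf nodes: once the first matching leaf is found, the remaining keys are read by following the chain instead of going back through the root. The sketch below assumes a hypothetical Leaf layout with sorted integer keys and a next pointer; rangeScan is likewise an illustrative name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical leaf layout: sorted keys plus a link to the next leaf.
struct Leaf {
    std::vector<int> keys;
    Leaf* next = nullptr;
};

// Collects every key in [lo, hi], starting from the leaf that a normal
// root-to-leaf search for `lo` would reach (passed in as `start`).
std::vector<int> rangeScan(const Leaf* start, int lo, int hi)
{
    assert(lo <= hi);
    std::vector<int> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        for (int k : leaf->keys) {
            if (k > hi) return out;   // keys are sorted, so nothing further can match
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```

An equality search is the special case lo == hi, which touches a single leaf.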
• Trie data structure is an advanced data structure used for storing and searching strings efficiently.
• Trie comes from the word reTRIEval which means to find or get something back.
• Dictionaries can be implemented efficiently using a Trie data structure and Tries are also used for the
• For string keys, a trie lookup takes time proportional to the length of the key rather than the number of stored keys,
so it can be faster than a binary search tree and, in some cases, a hash table.
• Trie data structure is also known as a Prefix Tree or a Digital Tree. We can do prefix-based searching easily
1. In tries the keys are searched using common prefixes. Hence it is faster. The lookup of keys depends
2. Tries take less space when they contain a large number of short strings. As nodes are shared
3. Tries help with longest prefix matching, when we want to find the key.
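The prefix sharing described above can be sketched with a small trie. This is a minimal illustration, not a tuned implementation: the class and member names are hypothetical, children are kept in a std::map for brevity, and the empty string is not stored.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Minimal trie sketch: each node maps a character to a child node.
class Trie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        bool isEndOfWord = false;
    };
    Node root;

    // Follows `s` from the root; returns nullptr if the path does not exist.
    const Node* walk(const std::string& s) const {
        const Node* cur = &root;
        for (char c : s) {
            auto it = cur->children.find(c);
            if (it == cur->children.end()) return nullptr;
            cur = it->second.get();
        }
        return cur;
    }

public:
    void insert(const std::string& word) {
        assert(!word.empty()); // this sketch does not store the empty string
        Node* cur = &root;
        for (char c : word) {
            auto& child = cur->children[c];
            if (!child) child = std::make_unique<Node>();
            cur = child.get();
        }
        cur->isEndOfWord = true;
    }
    bool contains(const std::string& word) const {
        const Node* n = walk(word);
        return n != nullptr && n->isEndOfWord;
    }
    bool startsWith(const std::string& prefix) const {
        return walk(prefix) != nullptr;
    }
};
```

After inserting "car" and "card", both words share the nodes for c, a and r, and a prefix query for "ca" succeeds even though "ca" itself was never inserted as a word.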
Applications of tries
1. Tries support inserting, deleting and searching for entries. Hence they are used in building
if (!child->isLeaf) {
newNode->children.assign(child->children.begin() + ORDER/2 + 1, child->children.end());
child->children.resize(ORDER/2 + 1);
}
}
void insertNonFull(Node* node, const T& key)
{
    int i = node->keys.size() - 1;
    if (node->isLeaf) {
        node->keys.push_back(T()); // Make space for the new key
        while (i >= 0 && key < node->keys[i]) {
            node->keys[i + 1] = node->keys[i];
            i--;
        }
        node->keys[i + 1] = key;
    } else {
        while (i >= 0 && key < node->keys[i]) {
            i--;
        }
        i++;
        if (node->children[i]->keys.size() == ORDER - 1) {
            split(node, i);
            if (key > node->keys[i])
                i++;
        }
        insertNonFull(node->children[i], key);
    }
}
bool search(Node* node, const T& key)
{
    int i = 0;
    while (i < node->keys.size() && key > node->keys[i])
    {
        i++;
    }
    if (i < node->keys.size() && key == node->keys[i])
    {
        return true;
    }
    if (node->isLeaf)
    {
        return false;
    }
    return search(node->children[i], key);
}
void remove(Node* node, const T& key) {
    int i = 0;
    while (i < node->keys.size() && key > node->keys[i]) {
        i++;
    }
    if (i < node->keys.size() && key == node->keys[i]) {
        // Key found in this node
        if (node->isLeaf) {
            node->keys.erase(node->keys.begin() + i);
        } else {
            // Replace key with predecessor or successor
            // Here, I'm just taking the predecessor for simplicity
            T predecessor = findPredecessor(node->children[i]);
            node->keys[i] = predecessor;
            remove(node->children[i], predecessor);
        }
    }
    else {
        if (node->isLeaf) {
            std::cerr << "Key not found\n";
            return;
        }
        bool isLastChild = (i == node->keys.size());
        if (node->children[i]->keys.size() < ORDER/2) {
            // Borrow from left or right sibling or merge
            if (i > 0 && node->children[i-1]->keys.size() >= ORDER/2) {
                // Borrow from left sibling
                borrowFromLeft(node, i);
            } else if (i < node->keys.size() && node->children[i+1]->keys.size() >= ORDER/2) {
                // Borrow from right sibling
                borrowFromRight(node, i);
            } else {
                // Merge with a sibling
                if (isLastChild) {
                    merge(node, i - 1);
                    i--;
                } else {
                    merge(node, i);
                }
            }
        }
        remove(node->children[i], key);
    }
}
T findPredecessor(Node* node)
{
    // The predecessor is the rightmost key in the rightmost leaf of this subtree
    while (!node->isLeaf) {
        node = node->children.back();
    }
    return node->keys.back();
}
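void borrowFromRight(Node* node, int index)
{
    // child and sibling follow the BorrowFromRight procedure: the child at index and its right sibling
    Node* child = node->children[index];
    Node* sibling = node->children[index + 1];
    child->keys.push_back(node->keys[index]);
    if (!child->isLeaf) {
        child->children.push_back(sibling->children.front());
        sibling->children.erase(sibling->children.begin());
    }
    node->keys[index] = sibling->keys.front();
    sibling->keys.erase(sibling->keys.begin());
}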
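void merge(Node* node, int index)
{
    // Follows the Merge procedure: pull the separator down, append the right sibling, then unlink it
    Node* child = node->children[index];
    Node* sibling = node->children[index + 1];
    child->keys.push_back(node->keys[index]);
    child->keys.insert(child->keys.end(), sibling->keys.begin(), sibling->keys.end());
    if (!child->isLeaf) {
        child->children.insert(child->children.end(),sibling->children.begin(),sibling->children.end());
    }
    node->keys.erase(node->keys.begin() + index);
    node->children.erase(node->children.begin() + index + 1);
    delete sibling;
}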
BTree() : root(nullptr) {}
tree.insert(10);
tree.insert(20);
tree.insert(5);
tree.insert(6);
tree.insert(12);
tree.insert(30);
tree.insert(7);
tree.insert(17);
tree.remove(20);
tree.remove(30);
return 0;
}