Ds&aa (Unit-1)
1. Introduction to Algorithm Analysis, AVL Trees, B-Trees
What is an Algorithm?
Characteristics of an Algorithm
o Input: An algorithm takes zero or more input values.
o Output: An algorithm produces one or more outputs when it finishes.
o Unambiguity: An algorithm should be unambiguous, which means that each
instruction should be clear and have only one interpretation.
o Finiteness: An algorithm should be finite: it should contain a limited
(countable) number of instructions and terminate after a finite number of
steps.
o Effectiveness: An algorithm should be effective: each instruction should be
basic enough to be carried out and should contribute to the overall process.
o Language independent: An algorithm must be language-independent, so that
its instructions can be implemented in any programming language with the
same output.
ADVANCED DATA STRUCTURES & ALGORITHM ANALYSIS
Dataflow of an Algorithm
Step 2: Squeeze the lemon as much as you can and collect its juice in a container.
Step 5: When the sugar has dissolved, add some water and ice to it.
The above real-world example can be directly compared to the definition of an
algorithm. We cannot perform step 3 before step 2; we need to follow a specific
order to make lemon juice. Likewise, an algorithm requires that every
instruction be followed in a specific order to perform a specific task.
The following are the steps required to add two numbers entered by the user:
Step 1: Start
Step 4: Add the values of a and b and store the result in the sum variable, i.e.,
sum=a+b.
Step 6: Stop
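As a minimal sketch, the central step (step 4) could be written in C as
follows. The function name add is a hypothetical choice for illustration, and
integer overflow is not handled.

```c
/* Hypothetical helper: adds the values of a and b and returns the result,
   mirroring "sum = a + b" in step 4. Overflow is not checked. */
int add(int a, int b)
{
    int sum = a + b;
    return sum;
}
```

Reading a and b from the user and printing sum, the steps around step 4, could
be done with scanf and printf from <stdio.h>.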
Factors of an Algorithm
The following are the factors that we need to consider for designing an algorithm:
o Modularity: If a given problem can be broken down into small modules or
small steps, which is the basic idea of an algorithm, then the algorithm has
been designed with modularity.
o Correctness: An algorithm is correct when the given inputs produce the
desired output, which means that the algorithm has been designed and
analyzed correctly.
o Maintainability: Here, maintainability means that the algorithm should be
designed in a simple, structured way so that redefining the algorithm does
not require major changes.
o Functionality: It considers the various logical steps needed to solve the
real-world problem.
o Robustness: Robustness refers to how clearly an algorithm can define our
problem.
o User-friendly: If the algorithm is not user-friendly, the designer will not
be able to explain it to the programmer.
o Simplicity: If the algorithm is simple, it is easy to understand.
Importance of Algorithms
Issues of Algorithms
The following are the issues that come while designing an algorithm:
Approaches of Algorithm
The following are the approaches used after considering both the theoretical and
practical importance of designing an algorithm:
Algorithm Analysis
An algorithm can be analyzed at two levels: first before it is implemented,
and second after it is implemented. The following are the two types of
analysis of an algorithm:
Algorithm Complexity
sum = 0;
// Suppose we have to calculate the sum of n numbers.
for i = 1 to n
    sum = sum + i;
// When the loop ends, sum holds the sum of the n numbers.
return sum;
In the above code, the loop runs n times, so its time complexity is
proportional to n; as the value of n increases, the time taken also increases.
The statement return sum, on the other hand, takes constant time, because it
does not depend on the value of n and produces the result in one step. We
generally consider the worst-case time complexity, as it is the maximum time
taken for any input of a given size.
Auxiliary space: The extra space required by the algorithm, excluding the
space taken by the input, is known as auxiliary space. Space complexity
considers both, i.e., the auxiliary space and the space used by the input.
So,
Space complexity = Auxiliary space + Input size
Types of Algorithms
o Search Algorithm
o Sort Algorithm
Search Algorithm
We search for something every day. Similarly, a computer stores a huge amount
of data, and whenever the user asks for some data, the computer searches for it
in memory and provides it to the user. There are mainly two techniques
available to search the data in an array:
o Linear search
o Binary search
Linear Search
Linear search is a very simple algorithm that searches for an element or value
from the beginning of an array until the required element is found or the end
of the array is reached. It compares the element to be searched with each
element of the array in turn; if a match is found, it returns the index of the
element, otherwise it returns -1. This algorithm can be used on an unsorted
list.
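The description above can be sketched in C as follows. The function name and
signature are illustrative assumptions, not taken from the text.

```c
#include <stddef.h>

/* Scans arr[0..n) from the beginning and returns the index of the first
   element equal to key, or -1 if no element matches. */
int linear_search(const int *arr, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (arr[i] == key)
            return (int)i;
    }
    return -1;
}
```

In the worst case (the key is absent or in the last position) all n elements
are compared, so the running time grows linearly with n.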
Binary Search
Binary search is a simple algorithm that finds an element very quickly. It is
used to search for an element in a sorted list: the elements must be stored in
sorted order for binary search to work, and it cannot be applied if the
elements are stored in random order. It works by comparing the target with the
middle element of the list and then continuing the search in the half that can
still contain the target.
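A minimal sketch of binary search in C, assuming the array is sorted in
ascending order (the function name and the half-open range convention are
choices made for this illustration):

```c
#include <stddef.h>

/* Searches the sorted array arr[0..n) for key.
   Returns the index of a matching element, or -1 if key is absent. */
int binary_search(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;                  /* candidates lie in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;    /* middle of the current range */
        if (arr[mid] == key)
            return (int)mid;
        if (arr[mid] < key)
            lo = mid + 1;                   /* discard the left half */
        else
            hi = mid;                       /* discard the right half */
    }
    return -1;
}
```

Each iteration halves the remaining range, which is why the search takes a
number of steps proportional to log n rather than n.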
Sorting Algorithms
Sorting algorithms are used to rearrange the elements in an array or a given data
structure either in an ascending or descending order. The comparison operator
decides the new order of the elements.
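To illustrate how the comparison operator decides the order, the sketch below
sorts the same array ascending or descending with the C standard library
function qsort, changing only the comparison function. The names
cmp_ascending, cmp_descending, and sort_ints are hypothetical.

```c
#include <stdlib.h>

/* Returns a negative, zero, or positive value when *a is less than, equal to,
   or greater than *b, as qsort requires. */
static int cmp_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Reversing the arguments reverses the resulting order. */
static int cmp_descending(const void *a, const void *b)
{
    return cmp_ascending(b, a);
}

/* Sorts arr[0..n) in place; a non-zero descending flag selects descending order. */
void sort_ints(int *arr, size_t n, int descending)
{
    qsort(arr, n, sizeof arr[0], descending ? cmp_descending : cmp_ascending);
}
```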
Time complexity is defined in terms of how many operations an algorithm
performs, based on the length of the input. Time complexity is not a
measurement of how much time it takes to execute a particular algorithm,
because that time also depends on factors such as programming language,
operating system, and processing power.
A good algorithm executes quickly and saves space in the process. Ideally you
find a balance between space and time (space and time complexity), but you can
settle for the average. Now, take a look at a simple algorithm for calculating
the product ("mul") of two numbers.
Step 1: Start.
Step 5: Store the product of 'a' and 'b' in a variable named 'mul' -> Output
Step 6: End.
You will now see how significant space and time complexity is after
understanding what they are.
The input size has a strong relationship with time complexity in data structure.
As the size of the input increases, so does the runtime, or the amount of time it
takes the algorithm to run.
Here is an example.
Assume you have a set of numbers S= (10, 50, 20, 15, 30)
There are numerous algorithms for sorting the given numbers. However, not all
of them are effective. To determine which is the most effective, you must perform
computational analysis on each algorithm.
Here are some of the most critical findings from the graph:
Before you can run an analysis on any algorithm, you must first determine
its stability. Understanding your data is the most important aspect of
conducting a successful analysis.
You can't reliably compare two algorithms head to head by timing them. The
result is heavily influenced by the tools and hardware you use for the
comparison, such as the operating system, CPU model, processor generation, and
so on. Even if you measure time and space usage for two algorithms running on
the same system, subtle changes in the system environment may affect the
measurements.
As a result, you compare space and time complexity using asymptotic analysis.
It compares two algorithms based on changes in their performance as the input
size is increased or decreased.
O(g(n)) = { f(n) : there exist positive constants c and n0 such that
0 <= f(n) <= c*g(n) for all n >= n0 }
Here g(n) denotes the upper bound. If a function is O(n), it is also O(n^2)
and O(n^3).
Big-O is the most widely used notation for asymptotic analysis. It specifies
the upper bound of a function, i.e., the maximum time required by an algorithm,
or the worst-case time complexity. In other words, it bounds the highest
possible growth of the running time for a given input size.
Big-Omega is an Asymptotic Notation for the best case or a floor growth rate for
a given function. It gives you an asymptotic lower bound on the growth rate of
an algorithm's runtime.
Big-Theta defines both a lower and an upper bound for a function, i.e., it
bounds the growth rate from above and from below for a given input size.
From the definition: f(n) is Θ(g(n)) if there exist positive constants c1, c2,
and N such that c1*g(n) <= f(n) <= c2*g(n) for all n >= N.
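As a worked illustration of this definition, the constants can be exhibited
explicitly for one function. The choice f(n) = 3n + 2 is an assumption made
purely for the example; it does not come from the text.

```latex
\begin{align*}
f(n) &= 3n + 2, \qquad g(n) = n \\
3n &\le 3n + 2 \le 4n \quad \text{for all } n \ge 2 \\
&\Rightarrow\ c_1 = 3,\quad c_2 = 4,\quad N = 2, \quad \text{so } f(n) = \Theta(n)
\end{align*}
```

The right-hand inequality alone shows that f(n) is O(n), and the left-hand
inequality alone shows that f(n) is Ω(n).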
Average Case: In the average case, you add the running times for every possible
input and take the average. Here, the execution time serves as both a lower and
an upper bound on the algorithm's time complexity.
You will now see how to calculate space and time complexity after grasping the
significance of space and time complexity.
The best algorithm/program should have a low level of space complexity. The
less space required, the faster it executes.
To calculate time complexity, you must consider each line of code in the program.
Consider the multiplication function as an example. Now, calculate the time
complexity of the multiply function:
1. mul <- 1
2. i <- 1
3. while i <= n do
4. mul <- mul * i
5. i <- i + 1
6. end while
Let T(n) be a function of the algorithm's time complexity. Lines 1 and 2 have a
time complexity of O(1). Line 3 represents a loop, so lines 4 and 5 are
repeated n times. As a result, the time complexity of lines 4 and 5 is O(n).
Finally, adding the time complexity of all the lines yields the overall time
complexity of the multiply function: T(n) = O(n).
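The line-by-line count can be checked with a small C sketch. It assumes that
line 4 is meant to accumulate the product mul * i, and it adds an iterations
counter that is not part of the algorithm, only a way to observe how often the
loop body runs. The function name is hypothetical.

```c
/* Computes 1 * 2 * ... * n and reports how many times the loop body ran.
   The comments map each statement to the numbered pseudocode lines. */
unsigned long long product_to_n(unsigned int n, unsigned int *iterations)
{
    unsigned long long mul = 1;     /* line 1: O(1) */
    unsigned int i = 1;             /* line 2: O(1) */
    *iterations = 0;
    while (i <= n) {                /* line 3: loop condition */
        mul = mul * i;              /* line 4: executed once per iteration */
        i = i + 1;                  /* line 5: executed once per iteration */
        (*iterations)++;            /* instrumentation only */
    }
    return mul;
}
```

For any n the counter ends at exactly n, which is the O(n) term in T(n). The
product itself overflows for large n; that does not affect the count.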
The iterative method gets its name because it calculates an iterative
algorithm's time complexity by going through it line by line and adding up the
complexities.
Aside from the iterative method, several other techniques are used in various
cases. The recursive method, for example, is a good way to calculate time
complexity for recursive solutions, using recursion trees or substitution. The
master theorem is another popular method for calculating time complexity.
This section goes over how to calculate space complexity with an example:
computing the product of array elements.
1. int mul, i
2. while i <= n do
3. mul <- mul * array[i]
4. i <- i + 1
5. end while
6. return mul
Let S(n) denote the algorithm's space complexity. In most systems, an integer
occupies 4 bytes of memory. As a result, the number of allocated bytes would be
the space complexity.
Line 1 allocates memory space for two integers, resulting in S(n) = 4 bytes
multiplied by 2 = 8 bytes. Line 2 represents a loop. Lines 3 and 4 assign a
value to an already existing variable, so there is no need to set aside any
space. The return statement in line 6 allocates one more memory cell. As a
result, S(n) = 4 times 2 + 4 = 12 bytes.
Because the algorithm also uses an array of n integers, which occupies 4n
bytes, the final space complexity is S(n) = 4n + 12 = O(n).
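A hedged C sketch of the same example follows. The function names are
hypothetical, mul is initialised to 1 (an assumption, since the pseudocode
leaves it unset), and counted_bytes simply restates the byte count of the
analysis using sizeof(int) instead of assuming 4 bytes.

```c
#include <stddef.h>

/* Returns the product of array[0..n); the empty product is 1. */
long long array_product(const int *array, size_t n)
{
    long long mul = 1;
    for (size_t i = 0; i < n; i++)
        mul *= array[i];
    return mul;
}

/* Bytes counted by the analysis: n array elements plus three integers
   (mul, i, and the returned value). Grows linearly with n. */
size_t counted_bytes(size_t n)
{
    return n * sizeof(int) + 3 * sizeof(int);
}
```

Only the array term depends on n, so the auxiliary space is constant while the
total space is O(n).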
As you progress through this tutorial, you will see some differences between
space and time complexity.
You now understand space and time complexity fundamentals and how to
calculate it for an algorithm or program. In this section, you will summarise all
previous discussions and list the key differences in a table.
Time complexity: The size of the input data is the primary determinant.
Space complexity: Primarily determined by the auxiliary variable size.
An AVL tree can be defined as a height-balanced binary search tree in which
each node is associated with a balance factor, calculated by subtracting the
height of its right sub-tree from that of its left sub-tree.
If the balance factor of a node is 1, the left sub-tree is one level higher
than the right sub-tree.
If the balance factor of a node is 0, the left sub-tree and the right sub-tree
have equal height.
If the balance factor of a node is -1, the left sub-tree is one level lower
than the right sub-tree.
An AVL tree is given in the following figure. We can see that the balance
factor associated with each node is between -1 and +1; therefore, it is an
example of an AVL tree.
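The balance factor defined above can be computed directly from the sub-tree
heights. The sketch below is illustrative only: the struct layout and function
names are assumptions, heights are recomputed recursively rather than stored,
and an empty sub-tree is given height -1 by convention.

```c
#include <stddef.h>

struct avl_node {
    int key;
    struct avl_node *left, *right;
};

/* Height in edges; an empty sub-tree has height -1 (assumed convention). */
int height(const struct avl_node *t)
{
    if (t == NULL)
        return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height(left sub-tree) - height(right sub-tree). */
int balance_factor(const struct avl_node *t)
{
    return t == NULL ? 0 : height(t->left) - height(t->right);
}

/* Returns 1 if every node has a balance factor of -1, 0, or +1.
   The BST ordering of the keys is not checked here. */
int is_height_balanced(const struct avl_node *t)
{
    if (t == NULL)
        return 1;
    int bf = balance_factor(t);
    return bf >= -1 && bf <= 1
        && is_height_balanced(t->left) && is_height_balanced(t->right);
}
```

A practical implementation would usually store the height in each node so that
the balance factor is available in constant time.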
Complexity
Because an AVL tree is also a binary search tree, all operations are performed
in the same way as in a binary search tree. Searching and traversing do not
violate the AVL property. However, insertion and deletion can violate this
property, and therefore they need to be revisited.
SN Operation Description
An AVL tree controls the height of the binary search tree by not letting it
become skewed. The time taken for all operations in a binary search tree of
height h is O(h). However, this can grow to O(n) if the BST becomes skewed
(i.e., the worst case). By limiting the height to O(log n), an AVL tree imposes
an upper bound of O(log n) on each operation, where n is the number of nodes.
AVL Rotations
We perform a rotation in an AVL tree only when the balance factor of a node is
other than -1, 0, or 1. There are basically four types of rotations, which are
as follows:
Where node A is the node whose balance factor is other than -1, 0, 1.
The first two rotations, LL and RR, are single rotations, and the next two, LR
and RL, are double rotations. For a tree to be unbalanced, its height must be
at least 2. Let us understand each rotation.
1. RR Rotation
When the BST becomes unbalanced because a node is inserted into the right
subtree of the right subtree of A, we perform an RR rotation. RR rotation is an
anticlockwise rotation, applied on the edge below a node having balance
factor -2.
2. LL Rotation
When the BST becomes unbalanced because a node is inserted into the left
subtree of the left subtree of A, we perform an LL rotation. LL rotation is a
clockwise rotation, applied on the edge below a node having balance factor 2.
3. LR Rotation
Double rotations are a bit tougher than the single rotations explained above.
LR rotation = RR rotation + LL rotation, i.e., first an RR rotation is
performed on the subtree and then an LL rotation is performed on the full tree.
By full tree we mean the first node on the path from the inserted node whose
balance factor is other than -1, 0, or 1.
4. RL Rotation
As already discussed, double rotations are a bit tougher than the single
rotations explained above. RL rotation = LL rotation + RR rotation, i.e., first
an LL rotation is performed on the subtree and then an RR rotation is performed
on the full tree. By full tree we mean the first node on the path from the
inserted node whose balance factor is other than -1, 0, or 1.
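The four rotations can be sketched in C as pure pointer manipulations. The
struct and function names below are hypothetical, and height or balance-factor
bookkeeping is deliberately left out so that the shape of each rotation stays
visible. Each function returns the new root of the rotated sub-tree.

```c
#include <stddef.h>

struct node {
    char key;
    struct node *left, *right;
};

/* RR rotation: anticlockwise (left) rotation around a. */
struct node *rotate_rr(struct node *a)
{
    struct node *b = a->right;
    a->right = b->left;     /* b's left sub-tree moves under a */
    b->left = a;
    return b;
}

/* LL rotation: clockwise (right) rotation around a. */
struct node *rotate_ll(struct node *a)
{
    struct node *b = a->left;
    a->left = b->right;     /* b's right sub-tree moves under a */
    b->right = a;
    return b;
}

/* LR rotation = RR rotation on the left sub-tree, then LL rotation on a. */
struct node *rotate_lr(struct node *a)
{
    a->left = rotate_rr(a->left);
    return rotate_ll(a);
}

/* RL rotation = LL rotation on the right sub-tree, then RR rotation on a. */
struct node *rotate_rl(struct node *a)
{
    a->right = rotate_ll(a->right);
    return rotate_rr(a);
}
```

The caller is expected to store the returned pointer in place of the old
sub-tree root, for example in the parent's child pointer.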
Example: construct an AVL tree by inserting the following elements in order:
H, I, J, B, A, E, C, F, D, G, K, L
1. Insert H, I, J
On inserting the above elements, the BST becomes unbalanced at H, whose balance
factor becomes -2. Since the BST is right-skewed, we perform an RR rotation on
node H.
2. Insert B, A
3. Insert E
4. Insert C, F, D
5. Insert G
6. Insert K
7. Insert L
On inserting L, the tree is still balanced, as the balance factor of each node
is -1, 0, or +1. Hence the tree is a balanced AVL tree.
The first step of inserting a node into an AVL tree is a normal insertion using
the BST insertion algorithm. In addition, however, we have to rebalance the
tree if an imbalance occurs. As discussed in the last module, an imbalance
occurs if a node's balance factor changes from -1 to -2 or from +1 to +2.
Rebalancing is done at the deepest (lowest) unbalanced ancestor of the inserted
node.
2. Same side (left-left or right-right) insertion that causes an imbalance.
This type of insertion requires a single rotation to rebalance.
Insertion Algorithm
The first step of the insertion algorithm for AVL trees is the same as
insertion into a binary search tree: we first find the place for the value and
then insert it. The next step is to search back up from the inserted node,
looking for an imbalance. If the imbalance arises because the new element was
added as an outside grandchild, that is, the left grandchild of a left child
(left-left) or the right grandchild of a right child (right-right)
(Figure 24.1 (a)), we perform a single rotation and exit. When the new item is
added as an inside grandchild (Figure 24.1 (b)), the imbalance is fixed with a
double rotation. As discussed in the previous module, an inside grandchild is
the left grandchild of a right child (right-left) or the right grandchild of a
left child (left-right).
As discussed in the previous module, the insert operation may cause the balance
factor of some node to become +2 or -2. Only nodes on the path from the
insertion point to the root can have changed in height. Therefore, after
insertion, we go back up to the root node by node, updating heights as we go.
If a new balance factor (the difference h_left - h_right) is +2 or -2, we
adjust the balance of the tree by a rotation around that node. Now let us
consider a valid AVL sub-tree (Figure 24.2). Before the insertion into the
subtree M the tree was a valid AVL tree, but after the insertion the AVL
property is violated at node j, where the balance factor has become +2.
Rebalancing
Rebalancing is done at the deepest unbalanced ancestor of the inserted node. Now
let the node that needs rebalancing be denoted as X. The rebalancing is performed
through four separate types of rotations as given below:
Let us consider the first case: insertion into the left sub-tree of the left
child of X (Figure 24.3). Here we have inserted 7 into the left sub-tree of the
left child (8) of X (9). The balance is violated at X, whose balance factor
becomes +2, so node X is the pivot. The imbalance occurred when the new element
(7) was added as the outside left grandchild, so this is the case of a single
right rotation.
Now we carry out a single right rotation of 8 about 9. Here 8 becomes the right
child of the parent (6) of 9, while 9 becomes the right child of 8. The BST
property of the AVL tree is maintained.
Let us consider the second case: insertion into the right sub-tree of the right
child of X (Figure 24.4). Here we have inserted 45 into the right sub-tree of
the right child (40) of X (35). The balance is violated at X, whose balance
factor becomes -2, so node X is the pivot. The imbalance occurred when the new
element (45) was added as the outside right grandchild, so this is the case of
a single left rotation. Now we carry out a single left rotation of 40 about 35.
Here 40 becomes the right child of the parent (30) of 35, while 35 becomes the
left child of 40. The BST property of the AVL tree is maintained.
Let us consider the third case: insertion into the left sub-tree of the right
child of X (Figure 24.5). Here we have inserted 34 into the left sub-tree of
the right child (40) of X (30). The balance is violated at X, whose balance
factor becomes -2. Now we carry out two rotations, a right rotation followed by
a left rotation:
The first is a single right rotation of 35 about the first pivot (40). Here 35
becomes the new right child of X, while 40 becomes the right child of 35.
Next we carry out a single left rotation of 35 about the second pivot, node X
(30). In this case 35 becomes the right child of the parent (20) of 30, while
30 becomes the new left child of 35. The BST property of the AVL tree is
maintained.
Let us consider the fourth case: insertion into the right sub-tree of the left
child of X (Figure 24.6). Here we have inserted 7 into the right sub-tree of
the left child (5) of X (10). The balance is violated at X, whose balance
factor becomes +2. When a new item (7) is added to the sub-tree of the inside
grandchild, the imbalance is fixed with a double rotation. Now we carry out two
rotations, a left rotation followed by a right rotation:
The first is a single left rotation of 6 about the first pivot (5). Here 6
becomes the new left child of the parent (10) of 5, while 5 becomes the left
child of 6.
Next we carry out a single right rotation of 6 about the second pivot, node X
(10). In this case 6 becomes the left child of the parent (13) of X (10), while
10 becomes the new right child of 6. The BST property of the AVL tree is
maintained.
    if( T == NULL )
        ...
        else {
            T->Element = X; T->Height = 0;
            ...
    T->Left = Insert( X, T->Left );          /* Insertion at the left */
    if( Height( T->Left ) - Height( T->Right ) == 2 )
        if( X < T->Left->Element )
            T = SingleRotateWithLeft( T );   /* LL */
        else
            ...
            T = SingleRotateWithRight( T );  /* RR */
        else
            T = DoubleRotateWithRight( T );  /* RL */
    ...
    return T;
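The insertion procedure described in this section can be sketched as one
self-contained C routine. This is an illustration under stated assumptions,
not the routine of the original module: the names avl, avl_insert,
rotate_left, and rotate_right are hypothetical, keys are single characters,
duplicates are ignored, and a leaf has height 0.

```c
#include <stdlib.h>

struct avl {
    char key;
    int height;                          /* height in edges; a leaf has 0 */
    struct avl *left, *right;
};

static int height(const struct avl *t) { return t ? t->height : -1; }

static void update(struct avl *t)
{
    int l = height(t->left), r = height(t->right);
    t->height = 1 + (l > r ? l : r);
}

/* Single right rotation: the left child becomes the sub-tree root. */
static struct avl *rotate_right(struct avl *x)
{
    struct avl *y = x->left;
    x->left = y->right;
    y->right = x;
    update(x);
    update(y);
    return y;
}

/* Single left rotation: the right child becomes the sub-tree root. */
static struct avl *rotate_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Inserts key as in a BST, then rebalances on the way back up.
   Returns the (possibly new) root of the sub-tree. */
struct avl *avl_insert(struct avl *t, char key)
{
    if (t == NULL) {
        struct avl *n = malloc(sizeof *n);
        if (n == NULL)
            abort();                     /* out of memory */
        n->key = key;
        n->height = 0;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = avl_insert(t->left, key);
    else if (key > t->key)
        t->right = avl_insert(t->right, key);
    else
        return t;                        /* key already present */

    update(t);
    int bf = height(t->left) - height(t->right);
    if (bf == 2) {                       /* left side too tall */
        if (key > t->left->key)          /* inside grandchild: double rotation */
            t->left = rotate_left(t->left);
        return rotate_right(t);
    }
    if (bf == -2) {                      /* right side too tall */
        if (key < t->right->key)         /* inside grandchild: double rotation */
            t->right = rotate_right(t->right);
        return rotate_left(t);
    }
    return t;
}
```

A tree is built by starting from NULL and reassigning the root after every
call, for example root = avl_insert(root, 'H'). Nodes are never freed in this
sketch.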
Deletion
Let us consider the simplest case of deletion, that is, deletion which causes
no imbalance. We first consider a current node p that has sub-trees T1 and T2
with equal heights, so the balance factor of p is zero (Figure 24.8). When a
deletion occurs, say in the left sub-tree T1, its height is decreased but the
height of p remains unchanged. The balance factor of p becomes -1. This is
allowed, so there is no imbalance. This is illustrated by the example in
Figure 24.9: the deletion of 14 causes the balance factor of 15 to change from
0 to -1, but its height remains the same as before the deletion, and hence no
rotation is needed.
We next consider a current node p whose balance factor is not 0. Let us see the
case where the balance factor of p is +1 (Figure 24.10) and the taller subtree
(here T1) was shortened. Now the balance factor of p becomes 0; the height of
the tree is reduced, but there is no imbalance and hence no need for rotations.
Now let us consider the case shown in Figure 24.11. The balance factor of q is
0. Before deletion the left sub-tree of p had height h and the right sub-tree
of p had height h+1. Now we delete a node from the left sub-tree such that its
height becomes h-1. The balance factor at p changes from -1 to -2 and the AVL
condition is violated, so we need to rebalance. In this case we carry out a
single left rotation of q about p, where p becomes the left child of q and q's
left child becomes p's right child.
Now let us consider the case shown in Figure 24.12. The balance factor of q is
equal to that of p. After a deletion from the left sub-tree T1 of p, the
balance factor of p becomes (h-1) - (h+1) = -2, and hence there is an
imbalance. Now we do a single left rotation of q about p, where p becomes the
left sub-tree of q and T2 becomes the right sub-tree of p. The overall height
of the tree is reduced.
Now let us consider an example of deletion which causes an imbalance and leads
to a single right rotation (Figure 24.13). Here we assume that 40 is deleted
(from T3 of Figure 24.12). This causes an imbalance at node 35 (q). To
rebalance, we carry out a single right rotation of 32 (T2) with 35 (q) as the
pivot. Now 32 becomes the right child of the parent (30) of 35, while 35
becomes the right child of 32.
Now let us consider an example of deletion which causes an imbalance and leads
to a single left rotation (Figure 24.14). Here we assume that 32 is deleted.
This causes an imbalance at the root node 44. To rebalance, we carry out a
single left rotation of 62 with 44 as the pivot. Now 62 becomes the new root
node, while 44 becomes the left child of 62. The left sub-tree of 62, rooted at
50, now becomes the right sub-tree of 44.
In case the balance factors of p and q are opposite, we need to apply a double
rotation (Figure 24.15). The balance factor at q will be 0 or 1. The balance
factor of node p will be 0 or -1. Now let us assume p's left child has height h
before deletion. Let us assume that the right child q of p has a left sub-tree
rooted at r of height h-1+1 = h or h-2+1 = h-1. The left sub-tree of q has
height h-1. Now let us assume that a node is deleted from the left sub-tree of
p, making its height h-1. We first right rotate r about q and then left rotate
r about p. Then we set the balance factor of the new root r to 0.
For the case of deletion with two or more rotations, let us consider the
example given in Figure 24.16. The deletion of 40 first causes an imbalance at
35. We carry out a right rotation of 32 with 35 as the pivot. This causes 32 to
become the right child of the parent (30) of 35, and 35 itself becomes the
right child of 32. However, this rotation causes an imbalance at the root node
30. This requires another right rotation, of 20 about the pivot 30. This causes
20 to become the new root, and the right sub-tree of 20 becomes the left
sub-tree of 30. Now the tree is balanced. Please note that in the case of
deletion we need to keep checking for imbalance and rebalancing until the tree
is balanced. This may require more than two rotations in some situations.
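The deletion cases above can be sketched in C with a single rebalance step that
is applied at every node on the way back up, which is what allows several
rotations per deletion. All names are hypothetical, the in-order successor is
used when the deleted node has two children (a choice made for this sketch),
and avl_ok is only a checker for experiments.

```c
#include <stdlib.h>

struct avl {
    int key, height;                      /* height in edges; a leaf has 0 */
    struct avl *left, *right;
};

static int ht(const struct avl *t) { return t ? t->height : -1; }
static int bf(const struct avl *t) { return ht(t->left) - ht(t->right); }
static void fix(struct avl *t)
{
    int l = ht(t->left), r = ht(t->right);
    t->height = 1 + (l > r ? l : r);
}

static struct avl *rot_right(struct avl *x)
{
    struct avl *y = x->left;
    x->left = y->right; y->right = x;
    fix(x); fix(y);
    return y;
}

static struct avl *rot_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left; y->left = x;
    fix(x); fix(y);
    return y;
}

/* Restores the AVL property at t, assuming both sub-trees already satisfy it. */
static struct avl *rebalance(struct avl *t)
{
    fix(t);
    if (bf(t) == 2) {
        if (bf(t->left) < 0)              /* opposite signs: double rotation */
            t->left = rot_left(t->left);
        return rot_right(t);
    }
    if (bf(t) == -2) {
        if (bf(t->right) > 0)             /* opposite signs: double rotation */
            t->right = rot_right(t->right);
        return rot_left(t);
    }
    return t;
}

struct avl *avl_insert(struct avl *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL) abort();           /* out of memory */
        t->key = key; t->height = 0; t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) t->left = avl_insert(t->left, key);
    else if (key > t->key) t->right = avl_insert(t->right, key);
    return rebalance(t);
}

struct avl *avl_delete(struct avl *t, int key)
{
    if (t == NULL) return NULL;           /* key not present */
    if (key < t->key) t->left = avl_delete(t->left, key);
    else if (key > t->key) t->right = avl_delete(t->right, key);
    else if (t->left && t->right) {
        struct avl *m = t->right;         /* in-order successor */
        while (m->left) m = m->left;
        t->key = m->key;
        t->right = avl_delete(t->right, m->key);
    } else {
        struct avl *child = t->left ? t->left : t->right;
        free(t);
        return child;
    }
    return rebalance(t);                  /* runs at every ancestor in turn */
}

/* Returns 1 if stored heights are correct and every balance factor is -1, 0, or 1. */
int avl_ok(const struct avl *t)
{
    if (t == NULL) return 1;
    int l = ht(t->left), r = ht(t->right);
    return t->height == 1 + (l > r ? l : r) && bf(t) >= -1 && bf(t) <= 1
        && avl_ok(t->left) && avl_ok(t->right);
}
```

Unlike insertion, the choice between a single and a double rotation is made
from the balance factor of the taller child, not from the deleted key.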
            T = SingleRotateWithRight( T );  /* RR */
        else
            T = DoubleRotateWithRight( T );  /* RL */
    ...
            T = SingleRotateWithLeft( T );   /* LL */
        else
            T = DoubleRotateWithLeft( T );   /* LR */
    ...
    T->Element = TmpCell->Element;
    ...
            T = SingleRotateWithLeft( T );
        else
            T = DoubleRotateWithLeft( T );   /* LR */
    ...
    TmpCell = T;
B Tree
A B tree is a specialized m-way tree that is widely used for disk access. Each
node of a B tree of order m can have at most m-1 keys and m children. One of
the main reasons for using a B tree is its capability to store a large number
of keys in a single node, which keeps the height of the tree relatively small
even for a large number of keys.
It is not necessary that all the nodes contain the same number of children,
but each internal node other than the root must have at least ceil(m/2)
children.
1. Compare item 49 with the root node 78. Since 49 < 78, move to its left
sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare with 49.
4. Match found; return.
Searching in a B tree depends upon the height of the tree. The search algorithm
takes O(log n) time to search for any element in a B tree.
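The search walk above can be sketched in C as follows. The node layout, the
order ORDER = 5, and the function name are assumptions made for this
illustration; a linear scan is used inside each node for simplicity.

```c
#include <stddef.h>

#define ORDER 5   /* m: hypothetical order chosen for this sketch */

struct btree_node {
    int nkeys;                          /* keys in use, at most ORDER - 1 */
    int keys[ORDER - 1];                /* keys in increasing order */
    struct btree_node *child[ORDER];    /* child[i] holds keys below keys[i];
                                           all NULL in a leaf */
};

/* Returns the node containing key and stores its position in *pos,
   or returns NULL if key is not in the tree. */
const struct btree_node *btree_search(const struct btree_node *node,
                                      int key, int *pos)
{
    while (node != NULL) {
        int i = 0;
        while (i < node->nkeys && key > node->keys[i])
            i++;                        /* skip keys smaller than key */
        if (i < node->nkeys && key == node->keys[i]) {
            *pos = i;
            return node;                /* match found */
        }
        node = node->child[i];          /* descend between keys[i-1] and keys[i] */
    }
    return NULL;
}
```

One node is visited per level, so the number of node visits is bounded by the
height of the tree.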
Inserting
Insertions are done at the leaf node level. The following algorithm needs to be
followed in order to insert an item into a B tree.
1. Traverse the B tree to find the appropriate leaf node at which the key can
be inserted.
2. If the leaf node contains fewer than m-1 keys, insert the element in
increasing order.
3. Else, if the leaf node contains m-1 keys, follow these steps.
o Insert the new element in increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, split it too by following the
same steps.
Example:
Insert the key 8 into the B tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than the maximum of 5 - 1 = 4
keys. Therefore, split the node at the median, i.e., 8, and push it up to its
parent node, shown as follows.
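The leaf-level part of this procedure (insert in increasing order, then split
at the median) can be sketched on plain key arrays. This is a partial
illustration only: it does not link nodes or push the median into a parent,
and the function names are hypothetical.

```c
#include <string.h>

#define ORDER 5                     /* m, as in the example */

/* Inserts key into keys[0..*n) keeping increasing order. The caller must
   leave room for one extra key, so a full node can temporarily hold ORDER keys. */
void insert_sorted(int *keys, int *n, int key)
{
    int i = *n;
    while (i > 0 && keys[i - 1] > key) {
        keys[i] = keys[i - 1];      /* shift larger keys one slot right */
        i--;
    }
    keys[i] = key;
    (*n)++;
}

/* Splits an overfull node at the median. Keys left of the median stay in
   left (its count *nleft is updated), keys right of it are copied to right,
   and the median is returned so it can be pushed up to the parent. */
int split_at_median(int *left, int *nleft, int *right, int *nright)
{
    int mid = *nleft / 2;
    int median = left[mid];
    *nright = *nleft - mid - 1;
    memcpy(right, left + mid + 1, (size_t)*nright * sizeof left[0]);
    *nleft = mid;
    return median;
}
```

For instance, with a hypothetical full leaf holding 5, 6, 9, 10, inserting 8
gives five keys, and the split returns 8 as the median with 5, 6 on the left
and 9, 10 on the right.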
Deletion
Deletion is also performed at the leaf nodes. The key to be deleted can be in
either a leaf node or an internal node. The following algorithm needs to be
followed in order to delete a key from a B tree. Here the minimum number of
keys in a node other than the root is ceil(m/2) - 1.
1. Locate the leaf node.
2. If there are more than the minimum number of keys in the leaf node, delete
the desired key from the node.
3. If the leaf node would be left with fewer than the minimum number of keys,
complete the keys by taking an element from the right or left sibling.
o If the left sibling contains more than the minimum number of elements,
push its largest element up to its parent and move the intervening element
down to the node where the key is deleted.
o If the right sibling contains more than the minimum number of elements,
push its smallest element up to the parent and move the intervening element
down to the node where the key is deleted.
4. If neither sibling contains more than the minimum number of elements,
create a new leaf node by joining the two leaf nodes and the intervening
element of the parent node.
5. If the parent is left with fewer than the minimum number of keys, apply the
above process to the parent too.
If the key to be deleted is in an internal node, replace it with its in-order
successor or predecessor. Since the successor or predecessor will always be in
a leaf node, the process is then the same as deleting a key from a leaf node.
Example 1
Delete the key 53 from the B tree of order 5 shown in the following figure.
Now 57 is the only element left in the node, while the minimum number of
elements that must be present in a node of a B tree of order 5 is 2. Since the
node has fewer than that, and its left and right siblings also do not have
enough elements to lend one, merge it with the left sibling and the intervening
element of the parent, i.e., 49.
Application of B tree
A B tree is used to index data and provides fast access to the actual data
stored on disk, since accessing a value stored in a large database on a disk is
a very time-consuming process.