Ds&aa (Unit-1)

The document provides an overview of algorithms, including their definitions, characteristics, and importance in problem-solving. It discusses various types of algorithms, such as sorting and searching algorithms, and emphasizes the significance of time and space complexity in algorithm analysis. Additionally, it outlines different approaches to algorithm design and the factors that contribute to an effective algorithm.

Uploaded by

SRIKANTH KETHA
ADVANCED DATA STRUCTURES & ALGORITHM ANALYSIS

Unit 1: Introduction to Algorithm Analysis, AVL Trees, B-Trees

What is an Algorithm?

An algorithm is a process or a set of rules for performing calculations or other
problem-solving operations, especially by a computer. Formally, an algorithm is
a finite set of instructions carried out in a specific order to perform a
specific task. It is not a complete program or code; it is the solution (logic)
to a problem, which can be represented informally as a flowchart or as
pseudocode.

Characteristics of an Algorithm

The following are the characteristics of an algorithm:

o Input: An algorithm takes zero or more input values.
o Output: An algorithm produces one or more outputs when it finishes.
o Unambiguity: An algorithm should be unambiguous, which means that each
instruction should be clear and have only one meaning.
o Finiteness: An algorithm should be finite. It should contain a limited
(countable) number of instructions and terminate after a finite number of
steps.
o Effectiveness: An algorithm should be effective. Each instruction should be
basic enough to be carried out and should contribute to the overall process.
o Language independence: An algorithm must be language-independent, so that
its instructions can be implemented in any programming language with the
same output.


Dataflow of an Algorithm

o Problem: A problem can be a real-world problem, or an instance of one, for
which we need to create a program or a set of instructions. The set of
instructions is known as an algorithm.
o Algorithm: A step-by-step procedure is designed to solve the problem.
o Input: After the algorithm is designed, the required inputs are provided to
it.
o Processing unit: The input is given to the processing unit, which produces
the desired output.
o Output: The output is the outcome or result of the program.

Why do we need Algorithms?

We need algorithms because of the following reasons:

o Scalability: Algorithms help us understand scalability. When we have a big
real-world problem, we break it down into small steps so that it is easier
to analyze.
o Performance: Real-world problems are not always easy to break down into
smaller steps. If a problem can be broken down into smaller steps easily,
it is feasible to solve.

Let's understand the algorithm through a real-world example. Suppose we want
to make lemon juice. The steps required are:

Step 1: First, cut the lemon in half.

Step 2: Squeeze the lemon as much as you can and collect its juice in a container.

Step 3: Add two tablespoons of sugar to it.

Step 4: Stir the container until the sugar dissolves.

Step 5: When the sugar has dissolved, add some water and ice.

Step 6: Store the juice in a fridge for 5 to minutes.

Step 7: Now, it's ready to drink.

The above real-world example can be compared directly to the definition of an
algorithm. We cannot perform step 3 before step 2; we must follow the specific
order to make lemon juice. Likewise, an algorithm requires that every
instruction be followed in a specific order to perform a specific task.

Now we will look at an example of an algorithm in programming.

We will write an algorithm to add two numbers entered by the user.

The following are the steps required to add two numbers entered by the user:

Step 1: Start

Step 2: Declare three variables a, b, and sum.

Step 3: Enter the values of a and b.

Step 4: Add the values of a and b and store the result in the sum variable, i.e.,
sum=a+b.

Step 5: Print sum

Step 6: Stop
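
The steps above can be sketched in Python as follows. This is only a minimal illustration: the function name add_two_numbers and the sample values 2 and 3 are assumptions made for the example, and the variable total plays the role of sum from Step 2.

```python
def add_two_numbers(a, b):
    """Steps 2 to 4: take the two values and store their sum."""
    total = a + b  # Step 4: sum = a + b
    return total


# Step 3 would normally read a and b from the user; fixed values are used here.
print(add_two_numbers(2, 3))  # Step 5: prints 5
```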

Factors of an Algorithm

The following are the factors that we need to consider for designing an algorithm:

o Modularity: If a given problem can be broken into small modules or small
steps, which is the basic idea of an algorithm, then the algorithm has
been designed with modularity.
o Correctness: An algorithm is correct when the given inputs produce the
desired output, which means that the algorithm has been designed and
analyzed correctly.
o Maintainability: The algorithm should be designed in a simple, structured
way, so that redefining it later does not require major changes.
o Functionality: It considers the various logical steps needed to solve the
real-world problem.
o Robustness: Robustness refers to how clearly the algorithm defines our
problem.
o User-friendly: If the algorithm is not user-friendly, the designer will
not be able to explain it to the programmer.
o Simplicity: If the algorithm is simple, it is easy to understand.
o Extensibility: If another algorithm designer or programmer wants to use
your algorithm, it should be extensible.

Importance of Algorithms

1. Theoretical importance: When a real-world problem is given to us, we break
it into small modules. To break the problem down, we need to know all the
relevant theoretical aspects.
2. Practical importance: Theory is incomplete without practical
implementation. So, the importance of algorithms is both theoretical and
practical.

Issues of Algorithms

The following are the issues that come while designing an algorithm:

o How to design algorithms: Since an algorithm is a step-by-step procedure,
we must follow some steps to design one.
o How to analyze algorithm efficiency.

Approaches of Algorithm

The following are the approaches used after considering both the theoretical and
practical importance of designing an algorithm:

o Brute force algorithm: The general logic structure is applied to design the
algorithm. It is also known as an exhaustive search algorithm because it
searches all the possibilities to provide the required solution. Such
algorithms are of two types:
o Optimizing: Finds all the solutions of a problem and then takes the
best one; if the value of the best solution is known in advance, it
terminates as soon as that solution is found.
o Sacrificing: Stops as soon as a satisfactory solution is found, without
examining the remaining possibilities.
o Divide and conquer: It breaks the problem down into smaller subproblems,
solves each subproblem (often recursively), and combines the results. Each
subproblem produces a valid output for its valid input, and this output is
passed on to another function to build the overall solution.
o Greedy algorithm: An algorithmic paradigm that makes the locally optimal
choice at each iteration in the hope of reaching the best overall solution.
It is easy to implement and usually runs fast, but it yields the optimal
solution only for certain problems.


o Dynamic programming: It makes the algorithm more efficient by storing
intermediate results. It follows five steps to find the optimal solution
to a problem:
o It breaks the problem down into subproblems.
o It finds the optimal solution to each of these subproblems.
o It stores the results of the subproblems; this is known as
memoization.
o It reuses the stored results so that the same subproblem is not
computed again.
o Finally, it computes the result of the complete problem.
o Branch and Bound Algorithm: The branch and bound approach is commonly
applied to optimization problems such as integer programming. It divides
the set of feasible solutions into smaller subsets. These subsets are
further evaluated to find the best solution.
o Randomized Algorithm: A regular algorithm has a predefined input and a
required output. Algorithms that have a defined set of inputs and a
required output, and that follow fixed steps, are known as deterministic
algorithms. In a randomized algorithm, some random bits are generated by
the algorithm and used together with the input, so the output or the
running time may vary from run to run. Randomized algorithms are often
simpler, and sometimes more efficient, than their deterministic
counterparts.
o Backtracking: Backtracking is an algorithmic technique that builds a
solution recursively and abandons a partial solution as soon as it fails
to satisfy the constraints of the problem.
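
To make the dynamic programming steps above concrete, here is a minimal sketch of memoization. The Fibonacci numbers are used purely as an illustrative problem (they are not discussed in these notes), and the function name fib and the memo dictionary are assumptions of the example.

```python
def fib(n, memo=None):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if memo is None:
        memo = {}  # stores the results of subproblems already solved
    if n < 2:
        return n  # the smallest subproblems are answered directly
    if n not in memo:
        # Solve the two subproblems once and store the combined result.
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]  # reuse the stored result instead of recomputing it


print(fib(10))  # prints 55
```

Without the memo dictionary the same subproblems would be solved repeatedly; with it, each value of n is computed only once.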

The major categories of algorithms are given below:

o Sort: Algorithm developed for sorting the items in a certain order.


o Search: Algorithm developed for searching the items inside a data
structure.
o Delete: Algorithm developed for deleting the existing element from the
data structure.
o Insert: Algorithm developed for inserting an item inside a data structure.
o Update: Algorithm developed for updating the existing element inside a
data structure.


Algorithm Analysis

An algorithm can be analyzed at two levels: before it is implemented and after
it is implemented. The two kinds of analysis are:

o A priori analysis: The theoretical analysis of an algorithm, done before
implementing it. Factors such as processor speed are assumed to be
constant and to have no effect on the result.
o A posteriori analysis: The practical analysis of an algorithm, achieved by
implementing it in a programming language. This analysis measures how
much running time and space the algorithm takes.

Algorithm Complexity

The performance of an algorithm can be measured by two factors:

o Time complexity: The time complexity of an algorithm is the amount of
time required to complete its execution. It is usually expressed in big O
notation, an asymptotic notation, and is mainly calculated by counting the
number of steps needed to finish the execution. Let's understand the time
complexity through an example.

    sum = 0
    // Suppose we have to calculate the sum of the first n numbers.
    for i = 1 to n
        sum = sum + i
    // When the loop ends, sum holds the sum of the n numbers.
    return sum

In the above code, the loop statement takes time proportional to n, so as n
increases, the running time increases as well. The statement return sum, in
contrast, takes constant time, because it does not depend on n and produces
its result in one step. We generally consider the worst-case time complexity,
as it is the maximum time taken for any input of a given size.
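
The step counting described above can be made visible with a small sketch. The function name sum_to_n and the steps counter are assumptions added for illustration; the counter simply records how many times the loop body runs.

```python
def sum_to_n(n):
    """Sum the numbers 1..n and count how many loop iterations were needed."""
    total = 0
    steps = 0
    for i in range(1, n + 1):
        total += i
        steps += 1  # one unit of work per iteration
    return total, steps


for n in (10, 100, 1000):
    total, steps = sum_to_n(n)
    print(n, total, steps)  # steps always equals n, i.e. the loop is O(n)
```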

o Space complexity: An algorithm's space complexity is the amount of space
required to solve a problem and produce an output. Similar to the time
complexity, space complexity is also expressed in big O notation.


For an algorithm, the space is required for the following purposes:

1. To store program instructions


2. To store constant values
3. To store variable values
4. To track the function calls, jumping statements, etc.

Auxiliary space: The extra space required by the algorithm, excluding the input
size, is known as an auxiliary space. The space complexity considers both the
spaces, i.e., auxiliary space, and space used by the input.

So,

Space complexity = Auxiliary space + Input size.

Types of Algorithms

The following are the types of algorithm:

o Search Algorithm
o Sort Algorithm

Search Algorithm

We search for something almost every day. Similarly, a computer stores a huge
amount of data, and whenever the user asks for some data, the computer searches
for it in memory and returns it to the user. There are mainly two techniques
for searching data in an array:

o Linear search
o Binary search

Linear Search

Linear search is a very simple algorithm that starts at the beginning of an
array and examines each element in turn until the required element is found or
the end of the array is reached. It compares the element being searched for
with the elements of the array one by one; if a match is found, it returns the
index of the element, otherwise it returns -1. This algorithm can be used on an
unsorted list.
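
A minimal sketch of linear search as described above. The function name linear_search and the sample list are assumptions chosen for the example.

```python
def linear_search(items, target):
    """Return the index of the first occurrence of target, or -1 if absent."""
    for index, value in enumerate(items):
        if value == target:
            return index  # match found: report its position
    return -1  # reached the end without finding the target


numbers = [10, 50, 20, 15, 30]  # the list does not need to be sorted
print(linear_search(numbers, 15))  # prints 3
print(linear_search(numbers, 99))  # prints -1
```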


Binary Search

Binary search is a simple algorithm that finds an element very quickly. It
works only on a sorted list: the elements must be stored in sorted order for
binary search to be applicable, and it cannot be used if the elements are
stored in random order. At each step it compares the target with the middle
element of the current range and discards the half that cannot contain the
target.
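
A minimal sketch of binary search on a list sorted in ascending order. The function name binary_search and the sample data are assumptions made for illustration.

```python
def binary_search(sorted_items, target):
    """Return the index of target in an ascending list, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2  # middle element of the current range
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


numbers = [10, 15, 20, 30, 50]
print(binary_search(numbers, 30))  # prints 3
print(binary_search(numbers, 25))  # prints -1
```

Because the range is halved on every iteration, the loop runs at most about log2(n) + 1 times for a list of n elements.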

Sorting Algorithms

Sorting algorithms are used to rearrange the elements in an array or a given data
structure either in an ascending or descending order. The comparison operator
decides the new order of the elements.

Why do we need a sorting algorithm?

o An efficient sorting algorithm is required for optimizing the efficiency of


other algorithms like binary search algorithm as a binary search algorithm
requires an array to be sorted in a particular order, mainly in ascending
order.
o It produces information in a sorted order, which is a human-readable
format.
o Searching a particular element in a sorted list is faster than the unsorted
list.

SPACE AND TIME COMPLEXITY ANALYSIS


What Is Time Complexity?

Time complexity is defined in terms of how many operations an algorithm
performs, as a function of the length of the input. It is not a measurement of
how much time it takes to execute a particular algorithm, because the actual
running time also depends on factors such as programming language, operating
system, and processing power.

Time complexity is a type of computational complexity that describes the time
required to execute an algorithm. It is obtained by adding up the time taken by
each statement, counting how often each one is executed. As a result, it is
highly dependent on the size of the processed data. It also aids in defining an
algorithm's effectiveness and evaluating its performance.


What Is Space Complexity?

When an algorithm is run on a computer, it requires a certain amount of memory
space. The amount of memory a program uses while executing is represented by
its space complexity. Because a program requires memory to store input data and
temporary values while running, the space complexity is the sum of the
auxiliary space and the input space.

What Does It Take To Develop a Good Algorithm?

A good algorithm executes quickly and uses little space. You should aim for a
balance between space and time (space and time complexity), although in
practice an average of the two is often acceptable. Now, take a look at a
simple algorithm for calculating the product ("mul") of two numbers.

Step 1: Start.

Step 2: Create two variables (a & b).

Step 3: Store integer values in ‘a’ and ‘b.’ -> Input

Step 4: Create a variable named 'mul'

Step 5: Store the product of 'a' and 'b' in the variable 'mul'. -> Output

Step 6: End.

Now that you understand what space and time complexity are, you will see how
significant they are.


How Significant Are Space and Time Complexity?

Significant in Terms of Time Complexity

Time complexity is strongly related to the input size. As the size of the
input increases, so does the runtime, or the amount of time it takes the
algorithm to run.

Here is an example.

Assume you have a set of numbers S= (10, 50, 20, 15, 30)

There are numerous algorithms for sorting the given numbers. However, not all
of them are effective. To determine which is the most effective, you must perform
computational analysis on each algorithm.

Here are some of the most critical findings from the graph:

o The test compared the following sorting algorithms: Quicksort, Insertion
sort, Bubble sort, and Heapsort.
o Python is the programming language used to complete the task, and the
input size ranges from 50 to 500 characters.
o The results were as follows: "Heap Sort algorithms performed well despite
the length of the lists; on the other hand, you discovered that Insertion sort
and Bubble sort algorithms performed far worse, significantly increasing
computing time." See the graph above for the results.
o Before you can run an analysis on any algorithm, you must first determine
its stability. Understanding your data is the most important aspect of
conducting a successful analysis.
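
The benchmark itself is not reproduced in these notes. As a rough, hedged illustration of why insertion sort slows down on longer lists, the sketch below counts comparisons instead of measuring time. The function name insertion_sort and the comparison counter are assumptions of the example, not part of the original test.

```python
def insertion_sort(values):
    """Return a sorted copy of values and the number of comparisons made."""
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break  # key is already in the right place
            items[j + 1] = items[j]  # shift the larger element right
            j -= 1
        items[j + 1] = key
    return items, comparisons


S = [10, 50, 20, 15, 30]
print(insertion_sort(S)[0])  # prints [10, 15, 20, 30, 50]

# Worst case (reverse-sorted input): comparisons grow as n * (n - 1) / 2.
for n in (50, 500):
    print(n, insertion_sort(range(n, 0, -1))[1])
```

On reverse-sorted input, a tenfold increase in length costs roughly a hundredfold increase in comparisons, which is consistent with the quadratic behaviour the quoted results attribute to insertion sort.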

What Are Asymptotic Notations?

Asymptotic notations are mathematical notations that allow you to analyze an
algorithm's running time by describing its behavior as its input size grows.
This is also referred to as an algorithm's growth rate. When the input size
increases, does the algorithm become incredibly slow? Is it able to maintain
its fast run time as the input size grows? You can answer these questions
thanks to asymptotic notation.

You can't reliably compare two algorithms head to head by timing them. The
result is heavily influenced by the tools and hardware you use for the
comparison, such as the operating system, CPU model, processor generation, and
so on. Even if you measure the time and space used by two algorithms running on
the same system, subtle changes in the system environment may affect the
measurements.

As a result, you compare space and time complexity using asymptotic analysis.
It compares two algorithms based on changes in their performance as the input
size is increased or decreased.

Asymptotic notations are classified into three types:

1. Big-Oh (O) notation


2. Big Omega ( Ω ) notation
3. Big Theta ( Θ ) notation

Now, go over each of these notations one by one.

1. Big-Oh (O) Notation

Paul Bachmann invented the big-O notation in 1894. He inadvertently introduced
this notation in his discussion of function approximation.

From the definition:

O(g(n)) = { f(n) : there exist positive constants c and n0 such that
            0 <= f(n) <= c*g(n) for all n >= n0 }

Here g(n) is an asymptotic upper bound on f(n). The bound need not be tight:
if a function is O(n), it is also O(n^2) and O(n^3).

It is the most widely used notation for asymptotic analysis. It specifies an
upper bound on a function and is most often used to state the maximum time
required by an algorithm, i.e., its worst-case time complexity. In other words,
it bounds how fast the running time can grow as the input size grows.

2. Big-Omega (Ω) notation

Big-Omega is an asymptotic notation for a lower bound (a floor) on the growth
rate of a given function; it is often used to describe the best case. It gives
you an asymptotic lower bound on the growth rate of an algorithm's runtime.

From the definition: The function f(n) is Ω(g(n)) if there exist positive
constants c and N such that f(n) >= c*g(n) for all n >= N.


3. Big-Theta (Θ) notation

Big-Theta bounds a function from both above and below, i.e., it gives both the
upper and the lower bound on its growth for a given input size.

From the definition: f(n) is Θ(g(n)) if there exist positive constants c1, c2,
and N such that c1*g(n) <= f(n) <= c2*g(n) for all n >= N.
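
The three definitions can be tried out numerically. The sketch below uses an assumed example, f(n) = 3n + 2 with g(n) = n, and checks the Theta inequality for chosen constants over a finite range. The names f, g, and holds_for_range are hypothetical, and a finite check like this only illustrates the definition; it is not a proof.

```python
def f(n):
    return 3 * n + 2  # assumed running-time function for the example


def g(n):
    return n  # candidate growth rate


def holds_for_range(c1, c2, n0, upto=10_000):
    """Check c1*g(n) <= f(n) <= c2*g(n) for every n from n0 to upto."""
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, upto + 1))


# With c1 = 3, c2 = 4, n0 = 2 the inequality holds: 3n <= 3n + 2 <= 4n for n >= 2.
print(holds_for_range(3, 4, 2))  # prints True
# With c2 = 3 the upper bound fails, since 3n + 2 > 3n.
print(holds_for_range(3, 3, 2))  # prints False
```

The left inequality alone corresponds to the Big-Omega bound and the right inequality alone to the Big-Oh bound, so in this example f(n) is Ω(n), O(n), and therefore Θ(n).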


Best Case, Worst Case, and Average Case in Asymptotic Analysis

Best Case: It is defined as the condition under which an algorithm completes
execution in the shortest amount of time. In this case, the execution time
serves as a lower bound on the algorithm's time complexity.

Average Case: You add the running times for each possible input and take the
average. Here, the execution time describes the algorithm's expected behavior,
which lies between the best and the worst case.

Worst Case: It is defined as the condition under which an algorithm takes the
longest amount of time to complete execution. In this case, the execution time
serves as an upper bound on the algorithm's time complexity.

You will now see how to calculate space and time complexity after grasping the
significance of space and time complexity.

Significant in Terms of Space Complexity

Space complexity refers to the total amount of memory space used by an
algorithm/program, including the space taken by the input values. To determine
space complexity, calculate the space occupied by the variables in the
algorithm/program.

However, people frequently confuse space complexity with auxiliary space.
Auxiliary space is only the extra or temporary space, and it is not the same as
space complexity. To put it another way,

Auxiliary space + space used by input values = Space Complexity

The best algorithm/program should have a low space complexity. Using less
memory often, though not always, also helps it run faster.

Method for Calculating Space and Time Complexity

Methods for Calculating Time Complexity

To calculate time complexity, you must consider each line of code in the program.
Consider the multiplication function as an example. Now, calculate the time
complexity of the multiply function:

1. mul <- 1
2. i <- 1
3. while i <= n do
4.     mul <- mul * i
5.     i <- i + 1
6. end while

Let T(n) be the function describing the algorithm's time complexity. Lines 1
and 2 each have a time complexity of O(1). Line 3 is a loop whose body, lines 4
and 5, is executed n times. As a result, the time complexity of lines 4 and 5
is O(n).

Finally, adding the time complexity of all the lines yields the overall time
complexity of the multiply function: T(n) = O(n).

The iterative method gets its name because it calculates an iterative
algorithm's time complexity by going through it line by line and adding up the
complexity.

Aside from the iterative method, several other techniques are used in various
cases. For recursive solutions, for example, the time complexity can be
calculated from a recurrence using recursion trees or substitution. The master
theorem is another popular method for calculating time complexity.
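
As a brief worked illustration of the master theorem, consider an assumed recurrence of the kind produced by a divide-and-conquer algorithm that splits its input into two halves and does linear work to combine them. The recurrence and the constant c are assumptions for this example and do not come from the notes.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn, \qquad a = 2,\quad b = 2,\quad f(n) = cn \\
n^{\log_b a} &= n^{\log_2 2} = n \\
f(n) &= \Theta\!\left(n^{\log_b a}\right)
  \;\Longrightarrow\; T(n) = \Theta(n \log n)
\end{align*}
```

Here f(n) grows at the same rate as n^(log_b a), which is the case of the theorem that adds a logarithmic factor.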

Methods for Calculating Space Complexity

With an example, you will go over how to calculate space complexity in this
section. Here is an example of computing the multiplication of array elements:

1. int mul <- 1, i <- 1
2. while i <= n do
3.     mul <- mul * array[i]
4.     i <- i + 1
5. end while
6. return mul

Let S(n) denote the algorithm's space complexity. In most systems, an integer
occupies 4 bytes of memory. The space complexity is then the number of bytes
allocated.

Line 1 allocates memory for two integers, so S(n) = 4 bytes * 2 = 8 bytes.
Line 2 is a loop. Lines 3 and 4 assign values to variables that already exist,
so no additional space is needed. The return statement in line 6 allocates one
more integer. As a result, S(n) = 4 * 2 + 4 = 12 bytes.

Because the algorithm also uses an array of n integers, which occupies 4n
bytes, the final space complexity is S(n) = 4n + 12 bytes = O(n).
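
The byte counting above can be written as a small helper. This is only a sketch under the text's assumption of 4 bytes per integer; the names INT_SIZE_BYTES and estimated_space_bytes are hypothetical, and real memory use depends on the language and platform.

```python
INT_SIZE_BYTES = 4  # assumption taken from the text: one integer = 4 bytes


def estimated_space_bytes(n):
    """Estimate the bytes used by the array-product algorithm for n elements."""
    fixed = 2 * INT_SIZE_BYTES + INT_SIZE_BYTES  # mul and i, plus the returned value
    array = n * INT_SIZE_BYTES                   # the input array of n integers
    return fixed + array


for n in (0, 10, 100):
    print(n, estimated_space_bytes(n))  # grows linearly with n, i.e. O(n)
```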

As you progress through this tutorial, you will see some differences between
space and time complexity.

Time Complexity vs. Space Complexity

You now understand space and time complexity fundamentals and how to
calculate it for an algorithm or program. In this section, you will summarise all
previous discussions and list the key differences in a table.

Time Complexity                        | Space Complexity
---------------------------------------+---------------------------------------
Calculates the time required           | Estimates the memory space required
Time is counted for all statements     | Memory space is counted for all
                                       | variables, inputs, and outputs
The size of the input data is the      | Primarily determined by the auxiliary
primary determinant                    | variable size
More crucial in terms of solution      | More essential in terms of solution
optimization                           | optimization


AVL TREES – CREATION


The AVL tree was invented by G. M. Adelson-Velsky and E. M. Landis in 1962. The
tree is named AVL in honour of its inventors.

An AVL tree can be defined as a height-balanced binary search tree in which
each node is associated with a balance factor, calculated by subtracting the
height of its right sub-tree from the height of its left sub-tree.

The tree is said to be balanced if the balance factor of every node is between
-1 and 1; otherwise, the tree is unbalanced and needs to be rebalanced.

Balance Factor (k) = height (left(k)) - height (right(k))

If the balance factor of a node is 1, the left sub-tree is one level higher
than the right sub-tree.

If the balance factor of a node is 0, the left sub-tree and the right sub-tree
have equal height.

If the balance factor of a node is -1, the left sub-tree is one level lower
than the right sub-tree.
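
The balance factor formula can be sketched directly in code. The Node class and the function names below are assumptions made for illustration; height is counted in nodes here, so an empty sub-tree has height 0 and a leaf has height 1.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in nodes: 0 for an empty sub-tree, 1 for a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Balance Factor (k) = height(left(k)) - height(right(k))."""
    return height(node.left) - height(node.right)


def is_avl_balanced(node):
    """True if every node has a balance factor of -1, 0, or 1."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))


balanced = Node("B", Node("A"), Node("C"))
skewed = Node("A", right=Node("B", right=Node("C")))
print(balance_factor(balanced), is_avl_balanced(balanced))  # prints 0 True
print(balance_factor(skewed), is_avl_balanced(skewed))      # prints -2 False
```

Recomputing heights recursively like this is simple but slow; practical implementations usually store the height in each node.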

An AVL tree is given in the following figure. We can see that the balance
factor associated with each node is between -1 and +1; therefore, it is an
example of an AVL tree.


Complexity

Algorithm   Average case   Worst case
Space       O(n)           O(n)
Search      O(log n)       O(log n)
Insert      O(log n)       O(log n)
Delete      O(log n)       O(log n)

Operations on AVL tree

Because an AVL tree is also a binary search tree, all operations are performed
in the same way as in a binary search tree. Searching and traversing do not
violate the AVL tree property. However, insertion and deletion can violate this
property, and therefore they need to be revisited.

SN  Operation  Description

1   Insertion  Insertion in an AVL tree is performed in the same way as in a
               binary search tree. However, it may violate the AVL tree
               property, so the tree may need rebalancing. The tree can be
               rebalanced by applying rotations.

2   Deletion   Deletion is also performed in the same way as in a binary
               search tree. Deletion may also disturb the balance of the tree,
               so various types of rotations are used to rebalance it.

Why AVL Tree?

An AVL tree controls the height of the binary search tree by not letting it
become skewed. The time taken for all operations in a binary search tree of
height h is O(h). However, this can degrade to O(n) if the BST becomes skewed
(i.e., the worst case). By limiting the height to O(log n), an AVL tree imposes
an upper bound of O(log n) on each operation, where n is the number of nodes.


AVL Rotations

We perform a rotation in an AVL tree only when the balance factor of some node
is other than -1, 0, or 1. There are basically four types of rotations:

1. LL rotation: the inserted node is in the left subtree of the left subtree of A
2. RR rotation: the inserted node is in the right subtree of the right subtree of A
3. LR rotation: the inserted node is in the right subtree of the left subtree of A
4. RL rotation: the inserted node is in the left subtree of the right subtree of A

where node A is the node whose balance factor is other than -1, 0, or 1.

The first two rotations, LL and RR, are single rotations, and the next two, LR
and RL, are double rotations. For a tree to be unbalanced, its height must be
at least 2. Let us understand each rotation.

1. RR Rotation

When the BST becomes unbalanced because a node is inserted into the right
subtree of the right subtree of A, we perform an RR rotation. An RR rotation is
an anticlockwise rotation, applied on the edge below a node having balance
factor -2.

In the above example, node A has balance factor -2 because a node C is inserted
in the right subtree of A's right subtree. We perform the RR rotation on the
edge below A.

2. LL Rotation

When the BST becomes unbalanced because a node is inserted into the left
subtree of the left subtree of C, we perform an LL rotation. An LL rotation is
a clockwise rotation, applied on the edge below a node having balance factor 2.

In the above example, node C has balance factor 2 because a node A is inserted
in the left subtree of C's left subtree. We perform the LL rotation on the edge
below C.
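
The two single rotations can be sketched as pointer updates on a minimal node type. The Node class and the function names rotate_left and rotate_right are assumptions for this illustration; each function returns the new root of the rotated subtree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(a):
    """RR case: anticlockwise rotation around a; a's right child moves up."""
    b = a.right
    a.right = b.left  # b's left subtree becomes a's right subtree
    b.left = a
    return b


def rotate_right(c):
    """LL case: clockwise rotation around c; c's left child moves up."""
    b = c.left
    c.left = b.right  # b's right subtree becomes c's left subtree
    b.right = c
    return b


# RR case: A -> B -> C chained to the right, so A has balance factor -2.
root = rotate_left(Node("A", right=Node("B", right=Node("C"))))
print(root.key, root.left.key, root.right.key)  # prints B A C

# LL case: C -> B -> A chained to the left, so C has balance factor 2.
root = rotate_right(Node("C", left=Node("B", left=Node("A"))))
print(root.key, root.left.key, root.right.key)  # prints B A C
```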

3. LR Rotation

Double rotations are a bit tougher than the single rotations explained above.
LR rotation = RR rotation + LL rotation, i.e., an RR rotation is first
performed on the subtree and then an LL rotation is performed on the full tree.
By full tree we mean the first node on the path from the inserted node whose
balance factor is other than -1, 0, or 1.

Let us understand each and every step very clearly:

Step  Action

1.    A node B has been inserted into the right subtree of A, the left child of
      C, because of which C has become an unbalanced node with balance factor
      2. This is the LR case: the inserted node is in the right subtree of the
      left subtree of C.

2.    As LR rotation = RR + LL rotation, an RR (anticlockwise) rotation is
      performed first on the subtree rooted at A. After the RR rotation, node A
      has become the left subtree of B.

3.    After the RR rotation, node C is still unbalanced, i.e., it has balance
      factor 2, because node A is now in the left subtree of the left subtree
      of C.

4.    Now we perform an LL (clockwise) rotation on the full tree, i.e., on node
      C. Node C has now become the right subtree of node B, and A is the left
      subtree of B.

5.    The balance factor of each node is now -1, 0, or 1, i.e., the BST is
      balanced.

4. RL Rotation

As already discussed, double rotations are a bit tougher than the single
rotations explained above. RL rotation = LL rotation + RR rotation, i.e., an LL
rotation is first performed on the subtree and then an RR rotation is performed
on the full tree. By full tree we mean the first node on the path from the
inserted node whose balance factor is other than -1, 0, or 1.

State Action

A node B has been inserted into the left subtree of C, the
right subtree of A, because of which A has become an
unbalanced node with balance factor -2. This is the RL
rotation case: the inserted node is in the left subtree of
the right subtree of A.


As RL rotation = LL rotation + RR rotation, the LL
(clockwise) rotation on the subtree rooted at C is performed
first. After the LL rotation, node C has become the right
subtree of B.

After performing the LL rotation, node A is still unbalanced,
i.e. it has balance factor -2, because of the right subtree
of the right subtree of node A.

Now we perform the RR (anticlockwise) rotation on the
full tree, i.e. on node A. Node C has now become the right
subtree of node B, and node A has become the left subtree
of B.

The balance factor of each node is now either -1, 0, or 1, i.e.,
the BST is balanced now.

Q: Construct an AVL tree having the following elements

H, I, J, B, A, E, C, F, D, G, K, L

1. Insert H, I, J


On inserting the above elements, the BST becomes unbalanced at H, whose
balance factor is -2. Since the BST is right-skewed, we perform an RR rotation
on node H.

The resultant balanced tree is:

2. Insert B, A


On inserting the above elements, especially A, the BST becomes unbalanced as
the balance factor of both H and I is 2. We consider the first unbalanced node on
the path up from the last inserted node, i.e. H. Since the subtree rooted at H is
left-skewed, we perform an LL rotation on node H.

The resultant balanced tree is:

3. Insert E

On inserting E, the BST becomes unbalanced as the balance factor of I is 2. If
we travel from E to I, we find that E is inserted in the right subtree of the left
subtree of I, so we perform an LR rotation on node I. LR = RR + LL rotation.

3 a) We first perform RR rotation on node B

The resultant tree after RR rotation is:


3 b) We then perform LL rotation on node I

The resultant balanced tree after LL rotation is:

4. Insert C, F, D


On inserting C, F, D, the BST becomes unbalanced as the balance factor of B
is -2 (H also becomes unbalanced, with balance factor 2, but B is the deepest
unbalanced node). If we travel from D to B, we find that D is inserted in the left
subtree of the right subtree of B, so we perform an RL rotation on node B.
RL = LL + RR rotation.

4a) We first perform LL rotation on node E

The resultant tree after LL rotation is:

4b) We then perform RR rotation on node B

The resultant balanced tree after RR rotation is:


5. Insert G

On inserting G, the BST becomes unbalanced as the balance factor of H is 2. If
we travel from G to H, we find that G is inserted in the right subtree of the left
subtree of H, so we perform an LR rotation on node H. LR = RR + LL rotation.

5 a) We first perform RR rotation on node C

The resultant tree after RR rotation is:


5 b) We then perform LL rotation on node H

The resultant balanced tree after LL rotation is:

6. Insert K


On inserting K, the BST becomes unbalanced as the balance factor of I is -2.
Since the BST is right-skewed from I to K, we perform an RR rotation on
node I.

The resultant balanced tree after RR rotation is:

7. Insert L

On inserting L, the tree is still balanced, as the balance factor of each node is
either -1, 0, or +1. Hence the tree is a balanced AVL tree.
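The whole exercise can be replayed with a short C program. The sketch below is an illustration under stated assumptions, not the notes' own listing: the type `struct Node` and the functions `height`, `update`, `rotate_left`, `rotate_right`, `insert` and `build_example` are names invented here, and the height of an empty tree is taken to be -1.

```c
#include <stdlib.h>

struct Node {
    char key;
    int height;                 /* leaf = 0, empty tree = -1 */
    struct Node *left, *right;
};

int height(const struct Node *n) { return n ? n->height : -1; }

void update(struct Node *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = (hl > hr ? hl : hr) + 1;
}

struct Node *rotate_right(struct Node *x)   /* LL rotation */
{
    struct Node *y = x->left;
    x->left = y->right;
    y->right = x;
    update(x);
    update(y);
    return y;
}

struct Node *rotate_left(struct Node *x)    /* RR rotation */
{
    struct Node *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

struct Node *insert(struct Node *t, char key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            abort();
        t->key = key;
        t->height = 0;
        t->left = t->right = NULL;
        return t;
    }
    if (key < t->key)
        t->left = insert(t->left, key);
    else if (key > t->key)
        t->right = insert(t->right, key);
    else
        return t;                       /* duplicate: nothing to do */
    update(t);

    int bf = height(t->left) - height(t->right);
    if (bf == 2) {
        if (key > t->left->key)         /* LR: RR on the left child first */
            t->left = rotate_left(t->left);
        return rotate_right(t);         /* LL */
    }
    if (bf == -2) {
        if (key < t->right->key)        /* RL: LL on the right child first */
            t->right = rotate_right(t->right);
        return rotate_left(t);          /* RR */
    }
    return t;
}

/* Insert the exercise's keys in the given order. */
struct Node *build_example(void)
{
    const char *keys = "HIJBAECFDGKL";
    struct Node *root = NULL;
    for (const char *p = keys; *p; ++p)
        root = insert(root, *p);
    return root;
}
```

With this sketch the final tree has E at the root, C and H as its children, B and D under C, and F and J under H, which matches the rotations worked through in steps 1 to 7.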


INSERTION, DELETION OPERATIONS AND APPLICATIONS


Insertion into AVL Trees

The first step of insertion of a node into the AVL tree is a normal insertion using
the BST insertion algorithm. In addition, we have to rebalance the tree if
an imbalance occurs. As discussed in the last module, an imbalance
occurs if a node's balance factor changes from -1 to -2 or from +1 to
+2. Rebalancing is done at the deepest (lowest) unbalanced ancestor of the
inserted node.

There are three cases that can happen during insertion:

1. Insertion that does not cause an imbalance.

2. Same side (left-left or right-right) insertion that causes an imbalance. This type
of insertion requires a single rotation to rebalance.

3. Opposite side (left-right or right-left) insertion that causes an imbalance. This
type of insertion requires a double rotation to rebalance.

Insertion Algorithm

The first step of the insertion algorithm for AVL trees is the same as insertion
into a binary search tree: we first find a place for the value to be inserted,
and then insert it. The next step is to search back from the inserted node
looking for an imbalance. If there is an imbalance when a new element is added
as an outside grandchild, that is, the left grandchild of a left child (left-left)
or the right grandchild of a right child (right-right) (Figure 24.1 (a)), we
perform a single rotation and exit. When a new item is added as an inside
grandchild (Figure 24.1 (b)), the imbalance is fixed with a double rotation.
As already discussed in the previous module, an inside grandchild is the right
grandchild of a left child (left-right) or the left grandchild of a right child
(right-left).

As discussed in the previous module, the insert operation may cause the balance
factor of some node to become +2 or -2. Only nodes on the path from the
insertion point to the root node can have changed in height. Therefore, after
insertion, we go back up to the root node by node, updating heights as we go.
If a new balance factor (the difference h_left - h_right) is +2 or -2, we need to
adjust the balance of the tree by a rotation around the node. Now let us
consider a valid AVL sub-tree (Figure 24.2). Before the insertion into the subtree
M the tree was a valid AVL tree, but after the insertion the AVL property is
violated at node j, where the balance factor has become +2.
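The walk back up the insertion path can be illustrated with a small C sketch. It is only an illustration: the names `struct Node`, `height`, `balance_factor` and `deepest_unbalanced` are assumptions made here, heights are recomputed recursively instead of being stored (simple, but slower than the stored heights used later in these notes), and an empty tree is given height -1.

```c
#include <stddef.h>

struct Node { int key; struct Node *left, *right; };

/* Height of an empty tree is -1, of a single node 0. */
int height(const struct Node *n)
{
    if (n == NULL)
        return -1;
    int hl = height(n->left), hr = height(n->right);
    return (hl > hr ? hl : hr) + 1;
}

/* Balance factor = h_left - h_right. */
int balance_factor(const struct Node *n)
{
    return height(n->left) - height(n->right);
}

/* Deepest node on the search path to `key` whose balance factor
   is +2 or -2, or NULL if every node on the path is balanced. */
const struct Node *deepest_unbalanced(const struct Node *root, int key)
{
    const struct Node *found = NULL;
    while (root != NULL) {
        int bf = balance_factor(root);
        if (bf == 2 || bf == -2)
            found = root;            /* a deeper hit overwrites a shallower one */
        if (key == root->key)
            break;
        root = key < root->key ? root->left : root->right;
    }
    return found;
}
```

For a tree in which 7 has just been inserted below 8, 8 is the left child of 9, and 9 is the right child of 6 (with a hypothetical left child 4), `deepest_unbalanced(root, 7)` returns the node 9, which is where the rotation is applied.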

Rebalancing

Rebalancing is done at the deepest unbalanced ancestor of the inserted node. Now
let the node that needs rebalancing be denoted as X. The rebalancing is performed
through four separate types of rotations as given below:

Outside Cases (require single rotation) :

1. Insertion into left subtree of left child of X – (Single Right Rotation)

2. Insertion into right subtree of right child of X – (Single Left Rotation)

Inside Cases (require double rotation) :

3. Insertion into right subtree of left child of X – (Left-Right Rotation)

4. Insertion into left subtree of right child of X – (Right-Left Rotation)
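The four cases can be told apart by comparing the inserted key with the key of X and of the relevant child of X. The C sketch below shows this decision only; the type `struct Node`, the enum `Case` and the function `classify` are hypothetical names, and the function assumes X is already known to be the unbalanced node, so the child on the insertion side exists.

```c
#include <stddef.h>

struct Node { int key; struct Node *left, *right; };

enum Case { SINGLE_RIGHT, SINGLE_LEFT, LEFT_RIGHT, RIGHT_LEFT };

/* x is the deepest unbalanced node after inserting `key` below it. */
enum Case classify(const struct Node *x, int key)
{
    if (key < x->key)                     /* went into the left child of X */
        return key < x->left->key ? SINGLE_RIGHT   /* case 1: left-left  */
                                  : LEFT_RIGHT;    /* case 3: left-right */
    return key > x->right->key ? SINGLE_LEFT       /* case 2: right-right */
                               : RIGHT_LEFT;       /* case 4: right-left  */
}
```

Using the key values from the examples that follow: inserting 7 below X = 9 with left child 8 gives `SINGLE_RIGHT`, and inserting 34 below X = 30 with right child 40 gives `RIGHT_LEFT`.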

Insertion into left sub-tree of left child of X

Let us consider the first case of insertion into the left sub-tree of the left child
of X (Figure 24.3). Here we have inserted 7 in the left sub-tree of the left child (8)
of X (9). It is at X that the balance is violated, with the balance factor of X
becoming +2, so node X is the pivot. The imbalance occurred when the new
element (7) was added as the outside left grandchild, so this is the case of a
single right rotation.

Now we carry out a single right rotation of 8 about 9. In this case 8 becomes the
right child of the parent (6) of 9, while 9 becomes the right child of 8. The tree
is balanced again and the BST property is maintained.

Insertion into right sub-tree of right child of X

Let us consider the second case of insertion into the right sub-tree of the right
child of X (Figure 24.4). Here we have inserted 45 in the right sub-tree of the
right child (40) of X (35). It is at X that the balance is violated, with the balance
factor of X becoming -2, so node X is the pivot. The imbalance occurred when
the new element (45) was added as the outside right grandchild, so this is the
case of a single left rotation. Now we carry out a single left rotation of 40 about
35. In this case 40 becomes the right child of the parent (30) of 35, while 35
becomes the left child of 40. The tree is balanced again and the BST property is
maintained.

Insertion into left sub-tree of right child of X

Let us consider the third case of insertion into the left sub-tree of the right child
of X (Figure 24.5). Here we have inserted 34 in the left sub-tree of the right child
(40) of X (30). Balance is violated at X, with its balance factor becoming -2.
Now we carry out two rotations, that is, a right rotation followed by a left
rotation:

• The first is a single right rotation of 35 about the first pivot (40). Here 35
becomes the new right child of X, while 40 becomes the right child of 35.
• Next we carry out a single left rotation of 35 about the second pivot, node X
(30). In this case 35 takes the place of 30 as the right child of the parent (20)
of 30, while 30 becomes the new left child of 35. The tree is balanced again
and the BST property is maintained.

Insertion into right sub-tree of left child of X

Let us consider the fourth case of insertion into the right sub-tree of the left
child of X (Figure 24.6). Here we have inserted 7 in the right sub-tree of the left
child (5) of X (10). Balance is violated at X, with its balance factor becoming +2.
When a new item (7) is added to the sub-tree of the inside grandchild, the
imbalance is fixed with a double rotation. Now we carry out two rotations, that
is, a left rotation followed by a right rotation:

• The first is a single left rotation of 6 about the first pivot (5). Here 6 becomes
the new left child of the parent (10) of 5, while 5 becomes the left child of 6.
• Next we carry out a single right rotation of 6 about the second pivot, node X
(10). In this case 6 becomes the left child of the parent (13) of X (10), while 10
becomes the new right child of 6. The tree is balanced again and the BST
property is maintained.

The detailed algorithm for insertion is given in Figure 24.7.

AvlTree Insert( ElementType X, AvlTree T )
{
    if( T == NULL )
    {
        /* Create and return a one-node tree */
        T = malloc( sizeof( struct AvlNode ) );
        if( T == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            T->Element = X; T->Height = 0;
            T->Left = T->Right = NULL;
        }
    }
    else if( X < T->Element )
    {
        T->Left = Insert( X, T->Left );      /* Insertion at the left */
        if( Height( T->Left ) - Height( T->Right ) == 2 )
            if( X < T->Left->Element )
                T = SingleRotateWithLeft( T );   /* LL */
            else
                T = DoubleRotateWithLeft( T );   /* LR */
    }
    else if( X > T->Element )
    {
        T->Right = Insert( X, T->Right );    /* Insertion at the right */
        if( Height( T->Right ) - Height( T->Left ) == 2 )
            if( X > T->Right->Element )
                T = SingleRotateWithRight( T );  /* RR */
            else
                T = DoubleRotateWithRight( T );  /* RL */
    }
    /* Else X is in the tree already; we'll do nothing */

    T->Height = Max( Height( T->Left ), Height( T->Right ) ) + 1;
    return T;
}
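The insertion routine relies on helpers such as Height, Max and the four rotation functions, which are not listed in these notes. One possible way to write them is sketched below. The type definitions (`ElementType` as `int`, the `AvlNode` layout, `Position`) are assumptions chosen to fit the calls made by Insert, and the height of an empty tree is taken to be -1 so that a new leaf gets height 0.

```c
#include <stdlib.h>

/* Assumed declarations; the notes do not show them. */
typedef int ElementType;
struct AvlNode;
typedef struct AvlNode *Position;
typedef struct AvlNode *AvlTree;

struct AvlNode {
    ElementType Element;
    AvlTree Left, Right;
    int Height;
};

int Height( Position P ) { return P == NULL ? -1 : P->Height; }
int Max( int A, int B ) { return A > B ? A : B; }

/* LL case: K2's left child K1 moves up (single right rotation). */
Position SingleRotateWithLeft( Position K2 )
{
    Position K1 = K2->Left;
    K2->Left = K1->Right;
    K1->Right = K2;
    K2->Height = Max( Height( K2->Left ), Height( K2->Right ) ) + 1;
    K1->Height = Max( Height( K1->Left ), K2->Height ) + 1;
    return K1;                      /* new root of this subtree */
}

/* RR case: K1's right child K2 moves up (single left rotation). */
Position SingleRotateWithRight( Position K1 )
{
    Position K2 = K1->Right;
    K1->Right = K2->Left;
    K2->Left = K1;
    K1->Height = Max( Height( K1->Left ), Height( K1->Right ) ) + 1;
    K2->Height = Max( K1->Height, Height( K2->Right ) ) + 1;
    return K2;
}

/* LR case: rotate the left child left, then K3 right. */
Position DoubleRotateWithLeft( Position K3 )
{
    K3->Left = SingleRotateWithRight( K3->Left );
    return SingleRotateWithLeft( K3 );
}

/* RL case: rotate the right child right, then K1 left. */
Position DoubleRotateWithRight( Position K1 )
{
    K1->Right = SingleRotateWithLeft( K1->Right );
    return SingleRotateWithRight( K1 );
}
```

Each function returns the new root of the rotated subtree, which is why Insert assigns the result back to T.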


Deletion

The first step in deletion is deleting the node X as in an ordinary binary
search tree. Note that whatever the case of deletion from the binary search
tree, the node that is physically removed is a leaf or has a single child. The
deletion may result in an imbalance. We therefore follow the path from the
position of the removed node towards the root. For each node X encountered,
we check whether the heights of left(X) and right(X) differ by at most 1. If yes,
we proceed to parent(X), which becomes the new X. If not, that is, the heights
differ by more than 1, we need to perform an appropriate rotation at X. There
are 4 cases, as in the case of insertion. In the case of deletion, after we perform
a rotation at X, we may have to perform a rotation at some ancestor of X. Thus,
we must continue to trace the path until we reach the root. We will now discuss
the case of deletion where no balancing is needed and the cases where
rebalancing of the tree is needed because an imbalance occurs after deletion.

We can consider three cases for deletion:

1. Deletion that does not cause an imbalance.

2. Deletion that requires a single rotation to rebalance.

3. Deletion that requires two or more rotations to rebalance.

Deletion with no Imbalance

Let us consider the simplest case of deletion, that is, deletion which causes no
imbalance. We first consider a current node p that has sub-trees T1 and T2 with
equal heights, so the balance factor of p is zero (Figure 24.8). When a deletion
occurs, say in the left sub-tree T1, its height is decreased but the height of p
remains unchanged. The balance factor of p becomes -1. This is allowed, so there
is no imbalance. This is illustrated using the example given in Figure 24.9. The
deletion of 14 causes the balance factor of 15 to change from 0 to -1, but its height
remains the same as before deletion and hence no rotation is needed.

We next consider a current node p whose balance factor is not 0. Let us see the
case where the balance factor of p is +1 (Figure 24.10) and the taller subtree
(here T1) was shortened. Now the balance factor of p becomes 0; the height of
the tree is reduced, but there is no imbalance and hence no need for rotations.


Deletion with Single Rotation

Now let us consider the case shown in Figure 24.11. The balance factor
of q is 0. Before deletion, the left sub-tree of p had height h and the right sub-tree
of p had height h+1. Now we delete a node from the left sub-tree so that its height
becomes h-1. The balance factor at p changes from -1 to -2 and the AVL
condition is violated, so we need to rebalance. In this case we carry out a single
left rotation of q about p, where p becomes the left child of q and q's left child
becomes p's right child.

Now let us consider the case shown in Figure 24.12. The balance factor
of q is equal to that of p. After deletion from the left sub-tree T1 of p, the balance
factor of p becomes (h-1) - (h+1) = -2 and hence there is an imbalance. We do a
single left rotation of q about p, where p becomes the left sub-tree of q and T2
becomes the right sub-tree of p. The overall height of the tree is reduced.

Deletion with Single Right Rotation

Now let us consider the example of deletion which causes an imbalance and leads
to single right rotation (Figure 24.13). Here we assume that 40 is deleted (From
T3 of Figure 24.12). This causes an imbalance at node 35 (q). Now to rebalance
we carry out single right rotation of 32 (T2) with 35 (q) as the pivot. Now 32
becomes the right child of the parent (30) of 35 while 35 becomes the right child
of 32.

Deletion with Single Left Rotation

Now let us consider the example of deletion which causes an imbalance and leads
to a single left rotation (Figure 24.14). Here we assume that 32 is deleted. This
causes an imbalance at the root node 44. To rebalance, we carry out a single
left rotation of 62 with 44 as the pivot. Now 62 becomes the new root node while
44 becomes the left child of 62. The left sub-tree of 62, rooted at 50, now
becomes the right sub-tree of 44.
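The rebalancing step applied at each node on the way up after a deletion can be sketched in C as below. This is an illustration under assumptions, not the notes' own routine: `struct Node`, `height`, `update`, `rotate_left`, `rotate_right` and `rebalance` are names chosen here. Unlike insertion, the choice between single and double rotation is made by comparing the heights of the grandchildren, because the child on the taller side may have balance factor 0.

```c
#include <stddef.h>

struct Node { int key; int height; struct Node *left, *right; };

int height(const struct Node *n) { return n ? n->height : -1; }

void update(struct Node *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = (hl > hr ? hl : hr) + 1;
}

struct Node *rotate_left(struct Node *x)
{
    struct Node *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

struct Node *rotate_right(struct Node *x)
{
    struct Node *y = x->left;
    x->left = y->right;
    y->right = x;
    update(x);
    update(y);
    return y;
}

/* Restore balance at x after a deletion below it; returns the new subtree root. */
struct Node *rebalance(struct Node *x)
{
    update(x);
    int bf = height(x->left) - height(x->right);
    if (bf == -2) {                           /* right side too tall */
        struct Node *q = x->right;
        if (height(q->left) > height(q->right))
            x->right = rotate_right(q);       /* opposite signs: double rotation */
        return rotate_left(x);
    }
    if (bf == 2) {                            /* left side too tall */
        struct Node *q = x->left;
        if (height(q->right) > height(q->left))
            x->left = rotate_left(q);         /* opposite signs: double rotation */
        return rotate_right(x);
    }
    return x;                                 /* already balanced */
}
```

As a simplified version of the example above (the right child 78 of 62 is an assumed value, since the figure is not reproduced here): once 32 has been removed from the tree 44 with right child 62 and grandchildren 50 and 78, `rebalance` on 44 returns 62 as the new root, with 44 as its left child and 50 as the right child of 44.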

Deletion with Double Rotation

If the balance factors of p and q have opposite signs, we need to apply a double
rotation (Figure 24.15). The balance factor at q will be 0 or 1. The balance factor
of node p will be 0 or -1. Let us assume p's left child has height h before
deletion, and that the right child q of p has a left sub-tree rooted at r of height
h-1+1=h or h-2+1=h-1. The right sub-tree of q has height h-1. Now let us assume
that a node is deleted from the left sub-tree of p, making its height h-1. We first
right rotate r about q and then left rotate r about p. Then we set the balance
factor of the new root r to 0.

For the case of deletion with two or more rotations, let us consider the example
given in Figure 24.16. The deletion of 40 first causes an imbalance at 35.
We carry out a right rotation of 32 with 35 as the pivot. This causes 32 to become
the right child of the parent (30) of 35, and 35 itself becomes the right child of 32.
However, this rotation causes an imbalance at the root node 30. This requires
another right rotation, of 20 about the pivot 30. This causes 20 to become the new
root, 30 to become the right child of 20, and the former right sub-tree of 20 to
become the left sub-tree of 30. Now the tree is balanced. Please note that in the
case of deletion we need to keep checking for imbalance and rebalancing until we
reach the root. This may require more than two rotations in some situations.

AvlTree Delete( ElementType X, AvlTree T )
{
    Position TmpCell;

    if( T == NULL )
        Error( "Item not found" );
    else if( X < T->Element )
    {
        T->Left = Delete( X, T->Left );
        /* Left side got shorter: imbalance due to deletion */
        if( Height( T->Right ) - Height( T->Left ) == 2 )
            if( Height( T->Right->Right ) >= Height( T->Right->Left ) )
                T = SingleRotateWithRight( T );  /* RR */
            else
                T = DoubleRotateWithRight( T );  /* RL */
    }
    else if( X > T->Element )
    {
        T->Right = Delete( X, T->Right );
        /* Right side got shorter: imbalance due to deletion */
        if( Height( T->Left ) - Height( T->Right ) == 2 )
            if( Height( T->Left->Left ) >= Height( T->Left->Right ) )
                T = SingleRotateWithLeft( T );   /* LL */
            else
                T = DoubleRotateWithLeft( T );   /* LR */
    }
    else if( T->Left && T->Right )  /* Found with two children */
    {
        /* Replace with smallest in right subtree */
        TmpCell = FindMin( T->Right );
        T->Element = TmpCell->Element;
        T->Right = Delete( T->Element, T->Right );
        if( Height( T->Left ) - Height( T->Right ) == 2 )
            if( Height( T->Left->Left ) >= Height( T->Left->Right ) )
                T = SingleRotateWithLeft( T );   /* LL */
            else
                T = DoubleRotateWithLeft( T );   /* LR */
    }
    else  /* Found with one or zero children */
    {
        TmpCell = T;
        T = T->Left ? T->Left : T->Right;  /* Also handles 0 children */
        free( TmpCell );
    }

    if( T != NULL )
        T->Height = Max( Height( T->Left ), Height( T->Right ) ) + 1;
    return T;
}

B-TREES – CREATION, INSERTION, DELETION OPERATIONS AND APPLICATIONS


B Tree
A B Tree is a specialized m-way tree that is widely used for disk access. Each
node of a B-Tree of order m can have at most m-1 keys and m children. One of
the main reasons for using a B tree is its capability to store a large number of
keys and large key values in a single node while keeping the height of the tree
relatively small.

A B tree of order m has all the properties of an m-way tree. In addition, it
has the following properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree except the root node and the leaf nodes contains at
least ⌈m/2⌉ children.
3. The root node must have at least 2 children, unless it is a leaf.
4. All leaf nodes must be at the same level.

It is not necessary that all the nodes contain the same number of children, but
each internal node other than the root must have at least ⌈m/2⌉ children.
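The limits implied by these properties can be written as small C helpers. The function names below are assumptions made for illustration; `(m + 1) / 2` is integer arithmetic for ⌈m/2⌉.

```c
/* Per-node limits for a B tree of order m (m >= 3 assumed). */
int max_children(int m) { return m; }
int max_keys(int m)     { return m - 1; }

/* ceil(m/2): minimum children of an internal node other than the root. */
int min_children(int m) { return (m + 1) / 2; }

/* Minimum keys in any node other than the root. */
int min_keys(int m)     { return min_children(m) - 1; }
```

For order 5 this gives at most 4 keys and at least 2 keys per non-root node, which are the values used in the insertion and deletion examples later in this section.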

A B tree of order 4 is shown in the following image.

While performing operations on a B Tree, a property of the B Tree may be
violated, such as the minimum number of children a node can have. To maintain
the properties of the B Tree, nodes may be split or joined.
Operations
Searching :

Searching in a B Tree is similar to searching in a binary search tree. For
example, suppose we search for the item 49 in the following B Tree. The process
will be something like the following:


1. Compare item 49 with the root node 78. Since 49 < 78, move to its left
sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare with 49.
4. Match found; return.

Searching in a B tree depends upon the height of the tree. The search algorithm
takes O(log n) time to search for any element in a B tree.
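The search procedure can be sketched in C as follows. The node layout (`struct BTreeNode` with a key count, a key array and a child array), the constant `ORDER` and the function `btree_search` are assumptions made for this illustration; a leaf is represented by NULL child pointers.

```c
#include <stdbool.h>
#include <stddef.h>

#define ORDER 4                        /* hypothetical order m */

struct BTreeNode {
    int nkeys;                         /* number of keys in use */
    int keys[ORDER - 1];               /* kept in increasing order */
    struct BTreeNode *child[ORDER];    /* child[i] holds keys below keys[i] */
};

bool btree_search(const struct BTreeNode *node, int key)
{
    while (node != NULL) {
        int i = 0;
        /* Skip keys smaller than the one we want. */
        while (i < node->nkeys && key > node->keys[i])
            ++i;
        if (i < node->nkeys && key == node->keys[i])
            return true;               /* match found */
        node = node->child[i];         /* descend; NULL below a leaf */
    }
    return false;
}
```

Each loop iteration moves one level down, so the number of nodes visited is bounded by the height of the tree.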

Inserting

Insertions are done at the leaf node level. The following algorithm needs to be
followed in order to insert an item into a B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the
key can be inserted.
2. If the leaf node contains fewer than m-1 keys, then insert the element in
increasing order.
3. Else, if the leaf node already contains m-1 keys, then follow these steps.
o Insert the new element in the increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, then split it too
by following the same steps.
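Steps 2 and 3 can be illustrated on a single leaf with a short C sketch. It covers only the leaf-level work (ordered insert and split at the median) and leaves out the parent update; `struct Leaf`, `ORDER`, `leaf_insert` and `leaf_split` are hypothetical names, and the key array has one spare slot so that the overflowing key can be stored before the split.

```c
#define ORDER 5                 /* hypothetical order m */

struct Leaf {
    int nkeys;
    int keys[ORDER];            /* m-1 keys plus one spare slot for overflow */
};

/* Insert key in increasing order; caller guarantees nkeys < ORDER. */
void leaf_insert(struct Leaf *leaf, int key)
{
    int i = leaf->nkeys;
    while (i > 0 && leaf->keys[i - 1] > key) {
        leaf->keys[i] = leaf->keys[i - 1];   /* shift larger keys right */
        --i;
    }
    leaf->keys[i] = key;
    leaf->nkeys++;
}

/* Split an overflowing leaf at the median. Keys above the median move
   to `right`; the median is returned so it can be pushed up. */
int leaf_split(struct Leaf *left, struct Leaf *right)
{
    int mid = left->nkeys / 2;
    int median = left->keys[mid];
    right->nkeys = 0;
    for (int i = mid + 1; i < left->nkeys; ++i)
        right->keys[right->nkeys++] = left->keys[i];
    left->nkeys = mid;
    return median;
}
```

With assumed leaf contents 3, 5, 12, 15, inserting 8 gives five keys, and the split returns 8 as the median, leaving 3, 5 in the left node and 12, 15 in the right node.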

Example:

Insert the node 8 into the B Tree of order 5 shown in the following image.


8 will be inserted to the right of 5; therefore, insert 8.

The node now contains 5 keys, which is greater than the maximum of 5 - 1 = 4
keys. Therefore, split the node at the median, i.e. 8, and push it up to its parent
node, shown as follows.

Deletion

Deletion is also performed at the leaf nodes. The key to be deleted can be in
either a leaf node or an internal node. The following algorithm needs to be
followed in order to delete a key from a B tree.
1. Locate the leaf node.
2. If there are more than the minimum number of keys (⌈m/2⌉ - 1) in the leaf
node, then delete the desired key from the node.
3. If the leaf node would be left with fewer than the minimum number of keys,
then complete the keys by taking an element from the right or left sibling.
o If the left sibling contains more than the minimum number of keys,
then push its largest element up to its parent and move the
intervening element down to the node where the key is deleted.
o If the right sibling contains more than the minimum number of keys,
then push its smallest element up to the parent and move the
intervening element down to the node where the key is deleted.
4. If neither sibling contains more than the minimum number of keys, then
create a new leaf node by joining the two leaf nodes and the intervening
element of the parent node.
5. If the parent is left with fewer than the minimum number of keys, then apply
the above process to the parent too.
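The borrowing step for the left sibling can be sketched in C as below. This is a simplified illustration on key arrays only, with the parent's intervening key passed in by pointer; `struct Leaf`, `ORDER`, `MIN_KEYS` and `borrow_from_left` are names assumed here, not part of the notes.

```c
#define ORDER 5                                /* hypothetical order m */
#define MIN_KEYS ((ORDER + 1) / 2 - 1)         /* ceil(m/2) - 1 */

struct Leaf {
    int nkeys;
    int keys[ORDER - 1];
};

/* Try to refill `node` from its left sibling through the parent's
   intervening key `*separator`. Returns 1 on success, 0 if the left
   sibling has no key to spare (the caller then tries the right
   sibling or merges). */
int borrow_from_left(struct Leaf *left, int *separator, struct Leaf *node)
{
    if (left->nkeys <= MIN_KEYS)
        return 0;
    for (int i = node->nkeys; i > 0; --i)      /* make room at the front */
        node->keys[i] = node->keys[i - 1];
    node->keys[0] = *separator;                /* intervening key moves down */
    node->nkeys++;
    *separator = left->keys[--left->nkeys];    /* largest left key moves up */
    return 1;
}
```

For example, with a left sibling holding 10, 20, 30, an intervening key 40 and a node left with only 50, the call leaves 10, 20 in the sibling, 30 in the parent and 40, 50 in the node.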

If the key to be deleted is in an internal node, then replace it with its
in-order successor or predecessor. Since the successor or predecessor will
always be in a leaf node, the process is then the same as deleting a key from
a leaf node.

Example 1

Delete the node 53 from the B Tree of order 5 shown in the following figure.

53 is present in the right child of element 49. Delete it.

Now 57 is the only element left in the node. The minimum number of elements
that must be present in a node of a B tree of order 5 is 2, so the node has too
few. The elements in its left and right siblings are also not sufficient to lend
one; therefore, merge it with the left sibling and the intervening element of the
parent, i.e. 49.

The final B tree is shown as follows.


Application of B tree

A B tree is used to index data and provides fast access to the actual data stored
on disk, since accessing a value stored in a large database on disk is a very
time-consuming process.

Searching an un-indexed and unsorted database containing n key values needs
O(n) running time in the worst case. However, if we use a B Tree to index this
database, it can be searched in O(log n) time in the worst case.
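The O(log n) bound follows from a standard height argument, sketched here under the assumption that the order m is at least 3. The symbols t = ⌈m/2⌉ (the minimum number of children of a non-root internal node) and h (the height, with the root at depth 0) are introduced here and do not appear in the text above. A tree with n ≥ 1 keys has at least one key in the root and at least 2t^(i-1) nodes at depth i, each holding at least t - 1 keys:

```latex
n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2\,t^{\,i-1} \;=\; 2t^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_{t}\frac{n+1}{2}.
```

So the number of nodes read during a search grows only logarithmically in n, and a larger order m gives a larger base t and a shallower tree.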
