Adsanotes
Adsanotes
UNIT 1
Introduction to Algorithm Analysis
What is an algorithm?
An algorithm is a step by step procedure to solve a problem. In normal language, the algorithm is
defined as a sequence of statements which are used to perform a task. In computer science, an
algorithm can be defined as follows...
An algorithm is a sequence of unambiguous instructions used for solving a problem, which can
be implemented (as a program) on a computer.
Algorithms are used to convert our problem solution into step by step statements. These statements can
be converted into computer programming instructions which form a program. This program is
executed by a computer to produce a solution. Here, the program takes required data as input,
processes data according to the program instructions and finally produces a result as shown in the
following picture.
Specifications of Algorithms
Every algorithm must satisfy the following specifications...
Input - Every algorithm must take zero or more number of input values from external.
Output - Every algorithm must produce an output as result.
Definiteness - Every statement/instruction in an algorithm must be clear and unambiguous (only
one interpretation).
Finiteness - For all different cases, the algorithm must produce result within a finite number of
steps.
Effectiveness - Every instruction must be basic enough to be carried out and it also must be
feasible.
Example for an Algorithm
Let us consider the following problem for finding the largest value in a given list of values.
Problem Statement : Find the largest number in the given list of numbers?
Input : A list of positive integer numbers. (List must contain at least one number).
Output : The largest number in the given list of positive integer numbers.
Consider the given list of numbers as 'L' (input), and the largest number as 'max' (Output).
Algorithm
1. Step 1: Define a variable 'max' and initialize with '0'.
2. Step 2: Compare first number (say 'x') in the list 'L' with 'max', if 'x' is larger than 'max', set
'max' to 'x'.
3. Step 3: Repeat step 2 for all numbers in the list 'L'.
4. Step 4: Display the value of 'max' as a result.
Algorithm Performance Analysis
What is Performance Analysis of an algorithm?
If we want to go from city "A" to city "B", there can be many ways of doing this. We can go by
flight, by bus, by train and also by bicycle. Depending on the availability and convenience, we
choose the one which suits us. Similarly, in computer science, there are multiple algorithms to
solve a problem. When we have more than one algorithm to solve a problem, we need to select the
best one. Performance analysis helps us to select the best algorithm from multiple algorithms to
solve a problem.
When there are multiple alternative algorithms to solve a problem, we analyze them and pick the
one which is best suitable for our requirements. The formal definition is as follows...
Performance of an algorithm is a process of making evaluative judgement about algorithms.
It can also be defined as follows...
Performance of an algorithm means predicting the resources which are required to an
algorithm to perform its task.
That means when we have multiple algorithms to solve a problem, we need to select a suitable
algorithm to solve that problem.
We compare algorithms with each other which are solving the same problem, to select the best
algorithm. To compare algorithms, we use a set of parameters or set of elements like memory required
by that algorithm, the execution speed of that algorithm, easy to understand, easy to implement, etc.,
Generally, the performance of an algorithm depends on the following elements...
1. Whether that algorithm is providing the exact solution for the problem?
2. Whether it is easy to understand?
3. Whether it is easy to implement?
4. How much space (memory) it requires to solve the problem?
5. How much time it takes to solve the problem? Etc.,
2
Space and Time Complexity analysis
When we want to analyse an algorithm, we consider only the space and time required by that particular
algorithm and we ignore all the remaining elements.
Based on this information, performance analysis of an algorithm can also be defined as follows...
Performance analysis of an algorithm is the process of calculating space and time required by
that algorithm.
Generally, when a program is under execution it uses the computer memory for THREE reasons. They
are as follows...
1. Instruction Space: It is the amount of memory used to store compiled version of instructions.
2. Environmental Stack: It is the amount of memory used to store information of partially
executed functions at the time of function call.
3. Data Space: It is the amount of memory used to store all the variables and constants.
Note - When we want to perform analysis of an algorithm based on its Space complexity, we consider
only Data Space and ignore Instruction Space as well as Environmental Stack.
That means we calculate only the memory required to store Variables, Constants, Structures, etc.,
To calculate the space complexity, we must know the memory required to store different data type
values (according to the compiler). For example, the C Programming Language compiler requires the
following...
1. 2 bytes to store Integer value.
2. 4 bytes to store Floating Point value.
3. 1 byte to store Character value.
4. 6 (OR) 8 bytes to store double value.
b) Time Complexity
What is Time complexity?
Every algorithm requires some amount of computer time to execute its instruction to perform the task.
This computer time required is called time complexity.
The time complexity of an algorithm can be defined as follows...
The time complexity of an algorithm is the total amount of time required by an algorithm to
complete its execution.
Generally, the running time of an algorithm depends upon the following...
1. Whether it is running on Single processor machine or Multi processor machine.
2. Whether it is a 32 bit machine or 64 bit machine.
3. Read and Write speed of the machine.
4. The amount of time required by an algorithm to
perform Arithmetic operations, logical operations, return value and assignment operations
etc.,
5. Input data
Note - When we calculate time complexity of an algorithm, we consider only input data and ignore
the remaining things, as they are machine dependent. We check only, how our program is behaving for
the different input values to perform all the operations like Arithmetic, Logical, Return value and
Assignment etc.,
Calculating Time Complexity of an algorithm based on the system configuration is a very difficult task
because the configuration changes from one system to another system. To solve this problem, we must
assume a model machine with a specific configuration. So that, we can able to calculate generalized
time complexity according to that model machine.
To calculate the time complexity of an algorithm, we need to define a model machine. Let us assume a
machine with following configuration...
1. It is a Single processor machine
2. It is a 32 bit Operating System machine
5
3. It performs sequential execution
4. It requires 1 unit of time for Arithmetic and Logical operations
5. It requires 1 unit of time for Assignment and Return value
6. It requires 1 unit of time for Read and Write operations
Now, we calculate the time complexity of following example code by using the above-defined model
machine...
Consider the following piece of code...
Example 1
int sum(int a, int b)
{
return a+b;
}
In the above sample code, it requires 1 unit of time to calculate a+b and 1 unit of time to return the
value. That means, totally it takes 2 units of time to complete its execution. And it does not change
based on the input values of a and b. That means for all input values, it requires the same amount of
time i.e. 2 units.
If any program requires a fixed amount of time for all input values then its time complexity is
said to be Constant Time Complexity.
Consider the following piece of code...
Example 2
int sum(int A[], int n)
{
int sum = 0, i;
for(i = 0; i < n; i++)
sum = sum + A[i];
return sum;
}
For the above code, time complexity can be calculated as follows...
6
In above calculation
Cost is the amount of computer time required for a single operation in each line.
Repeatation is the amount of computer time required by each operation for all its repeatations.
Total is the amount of computer time required by each operation to execute.
So above code requires '4n+4' Units of computer time to complete the task. Here the exact time is not
fixed. And it changes based on the n value. If we increase the n value then the time required also
increases linearly.
Totally it takes '4n+4' units of time to complete its execution and it is Linear Time Complexity.
If the amount of time required by an algorithm is increased with the increase of input value then
that time complexity is said to be Linear Time Complexity.
Asymptotic Notations
What is Asymptotic Notation?
Whenever we want to perform analysis of an algorithm, we need to calculate the complexity of that
algorithm. But when we calculate the complexity of an algorithm it does not provide the exact amount
of resource required. So instead of taking the exact amount of resource, we represent that complexity
in a general form (Notation) which produces the basic nature of that algorithm. We use that general
form (Notation) for analysis process.
Asymptotic notation of an algorithm is a mathematical representation of its complexity.
In asymptotic notation, when we want to represent the complexity of an algorithm, we use only the
most significant terms in the complexity of that algorithm and ignore least significant terms in the
complexity of that algorithm (Here complexity can be Space Complexity or Time Complexity).
For example, consider the following time complexities of two algorithms...
Algorithm 1 : 5n2 + 2n + 1
Algorithm 2 : 10n2 + 8n + 3
Generally, when we analyze an algorithm, we consider the time complexity for larger values of input
data (i.e. 'n' value). In above two time complexities, for larger value of 'n' the term '2n + 1' in
algorithm 1 has least significance than the term '5n2', and the term '8n + 3' in algorithm 2 has least
significance than the term '10n2'.
Here, for larger value of 'n' the value of most significant terms ( 5n2 and 10n2 ) is very larger than the
value of least significant terms ( 2n + 1 and 8n + 3 ). So for larger value of 'n' we ignore the least
significant terms to represent overall time required by an algorithm. In asymptotic notation, we use
only the most significant terms to represent the time complexity of an algorithm.
7
Majorly, we use THREE types of Asymptotic Notations and those are as follows...
1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)
Big - Oh Notation (O)
Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity.
That means Big - Oh notation always indicates the maximum time required by an algorithm for all
input values. That means Big - Oh notation describes the worst case of an algorithm time complexity.
Big - Oh Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term.
If f(n) <= C g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as O(g(n)).
f(n) = O(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and
time required is on Y-Axis
In above graph after a particular input value n0, always C g(n) is greater than f(n) which indicates the
algorithm's upper bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)) then it must satisfy f(n) <= C g(n) for all values of C >
0 and n0>= 1
f(n) <= C g(n)
⇒3n + 2 <= C n
Above condition is always TRUE for all values of C = 4 and n >= 2.
By using Big - Oh notation we can represent the time complexity as follows...
3n + 2 = O(n)
8
Big - Omege Notation (Ω)
Big - Omega notation is used to define the lower bound of an algorithm in terms of Time Complexity.
That means Big-Omega notation always indicates the minimum time required by an algorithm for all
input values. That means Big-Omega notation describes the best case of an algorithm time complexity.
Big - Omega Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term.
If f(n) >= C g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as Ω(g(n)).
f(n) = Ω(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and
time required is on Y-Axis
In above graph after a particular input value n0, always C g(n) is less than f(n) which indicates the
algorithm's lower bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for all values of C >
0 and n0>= 1
f(n) >= C g(n)
⇒3n + 2 >= C n
Above condition is always TRUE for all values of C = 1 and n >= 1.
By using Big - Omega notation we can represent the time complexity as follows...
3n + 2 = Ω(n)
9
Big - Theta Notation (Θ)
Big - Theta notation is used to define the average bound of an algorithm in terms of Time
Complexity.
That means Big - Theta notation always indicates the average time required by an algorithm for all
input values. That means Big - Theta notation describes the average case of an algorithm time
complexity.
Big - Theta Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term.
If C1 g(n) <= f(n) <= C2 g(n) for all n >= n0, C1 > 0, C2 > 0 and n0 >= 1. Then we can represent
f(n) as Θ(g(n)).
f(n) = Θ(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and
time required is on Y-Axis
In above graph after a particular input value n0, always C1 g(n) is less than f(n) and C2 g(n) is greater
than f(n) which indicates the algorithm's average bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for all values
of C1 > 0, C2 > 0 and n0>= 1
C1 g(n) <= f(n) <= C2 g(n)
⇒C1 n <= 3n + 2 <= C2 n
Above condition is always TRUE for all values of C1 = 1, C2 = 4 and n >= 2.
By using Big - Theta notation we can represent the time compexity as follows...
3n + 2 = Θ(n)
10
AVL Trees
AVL Tree is invented by GM Adelson - Velsky and EM Landis in 1962. The tree is named AVL in
honour of its inventors.
AVL Tree can be defined as height balanced binary search tree in which each node is associated with a
balance factor which is calculated by subtracting the height of its right sub-tree from that of its left
sub-tree.
Tree is said to be balanced if balance factor of each node is in between -1 to 1, otherwise, the tree will
be unbalanced and need to be balanced.
Balance Factor (k) = height (left(k)) - height (right(k))
If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right sub-
tree.
If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal height.
If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right sub-
tree.
An AVL tree is given in the following figure. We can see that, balance factor associated with each
node is in between -1 and +1. therefore, it is an example of AVL tree.
Complexity
11
Operations on AVL tree
Due to the fact that, AVL tree is also a binary search tree therefore, all the operations are performed in
the same way as they are performed in a binary search tree. Searching and traversing do not lead to the
violation in property of AVL tree. However, insertion and deletion are the operations which can violate
this property and therefore, they need to be revisited.
2 Deletion Deletion can also be performed in the same way as it is performed in a binary
search tree. Deletion may also disturb the balance of the tree therefore,
various types of rotations are used to rebalance the tree.
12
1. RR Rotation
When BST becomes unbalanced, due to a node is inserted into the right subtree of the right subtree of
A, then we perform RR rotation, RR rotation is an anticlockwise rotation, which is applied on the edge
below a node having balance factor -2
In above example, node A has balance factor -2 because a node C is inserted in the right subtree of A
right subtree. We perform the RR rotation on the edge below A.
2. LL Rotation
When BST becomes unbalanced, due to a node is inserted into the left subtree of the left subtree of C,
then we perform LL rotation, LL rotation is clockwise rotation, which is applied on the edge below a
node having balance factor 2.
In above example, node C has balance factor 2 because a node A is inserted in the left subtree of C left
subtree. We perform the LL rotation on the edge below A.
3. LR Rotation
Double rotations are bit tougher than single rotation which has already explained above. LR rotation =
RR rotation + LL rotation, i.e., first RR rotation is performed on subtree and then LL rotation is
performed on full tree, by full tree we mean the first node from the path of inserted node whose
balance factor is other than -1, 0, or 1.
Let us understand each and every step very clearly:
State Action
A node B has been inserted into the right subtree of A the left subtree
of C, because of which C has become an unbalanced node having
balance factor 2. This case is L R rotation where: Inserted node is in
the right subtree of left subtree of C
14
4. RL Rotation
As already discussed, that double rotations are bit tougher than single rotation which has already
explained above. R L rotation = LL rotation + RR rotation, i.e., first LL rotation is performed on
subtree and then RR rotation is performed on full tree, by full tree we mean the first node from the path
of inserted node whose balance factor is other than -1, 0, or 1.
State Action
A node B has been inserted into the left subtree of C the right subtree
of A, because of which A has become an unbalanced node having
balance factor - 2. This case is RL rotation where: Inserted node is in the
left subtree of right subtree of A
Balance factor of each node is now either -1, 0, or 1, i.e., BST is balanced
now.
15
Example Construct an AVL tree having the following elements
H, I, J, B, A, E, C, F, D, G, K, L
1. Insert H, I, J
On inserting the above elements, especially in the case of H, the BST becomes unbalanced as the
Balance Factor of H is -2. Since the BST is right-skewed, we will perform RR Rotation on node H.
2. Insert B, A
On inserting the above elements, especially in case of A, the BST becomes unbalanced as the Balance
Factor of H and I is 2, we consider the first node from the last inserted node i.e. H. Since the BST from
H is left-skewed, we will perform LL Rotation on node H.
The resultant balance tree is:
16
3b) We first perform LL rotation on the node I
The resultant balanced tree after LL rotation is:
17
4. Insert C, F, D
On inserting C, F, D, BST becomes unbalanced as the Balance Factor of B and H is -2, since if we
travel from D to B we find that it is inserted in the right subtree of left subtree of B, we will perform
RL Rotation on node I. RL = LL + RR rotation.
4a) We first perform LL rotation on node E
The resultant tree after LL rotation is:
18
5. Insert G
On inserting G, BST become unbalanced as the Balance Factor of H is 2, since if we travel from G to
H, we find that it is inserted in the left subtree of right subtree of H, we will perform LR Rotation on
node I. LR = RR + LL rotation.
5 a) We first perform RR rotation on node C
The resultant tree after RR rotation is:
19
6. Insert K
On inserting K, BST becomes unbalanced as the Balance Factor of I is -2. Since the BST is right-
skewed from I to K, hence we will perform RR Rotation on the node I.
The resultant balanced tree after RR rotation is:
7. Insert L
On inserting the L tree is still balanced as the Balance Factor of each node is now either, -1, 0, +1.
Hence the tree is a Balanced AVL tree
20
Applications of AVL Tree:
1. It is used to index huge records in a database and also to efficiently search in that.
2. For all types of in-memory collections, including sets and dictionaries, AVL Trees are used.
3. Database applications, where insertions and deletions are less common but frequent data
lookups are necessary
4. Software that needs optimized search.
5. It is applied in corporate areas and storyline games.
B Trees
B Tree is a specialized m-way tree that can be widely used for disk access. A B-Tree of order m can
have at most m-1 keys and m children. One of the main reason of using B tree is its capability to store
large number of keys in a single node and large key values by keeping the height of the tree relatively
small.
A B tree of order m contains all the properties of an M way tree. In addition, it contains the following
properties.
Every node in a B-Tree contains at most m children.
Every node in a B-Tree except the root node and the leaf node contain at least m/2 children.
The root nodes must have at least 2 nodes.
All leaf nodes must be at the same level.
It is not necessary that, all the nodes contain the same number of children but, each node must have
m/2 number of nodes.
A B tree of order 4 is shown in the following image.
While performing some operations on B Tree, any property of B Tree may violate such as number of
minimum children a node can have. To maintain the properties of B Tree, the tree may split or join.
B Tree Operations
Searching:
Searching in B Trees is similar to that in Binary search tree. For example, if we search for an item 49
in the following B Tree. The process will something like following :
Compare item 49 with root node 78. since 49 < 78 hence, move to its left sub-tree.
21
Since, 40<49<56, traverse right sub-tree of 40.
49>45, move to right. Compare 49.
match found, return.
Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log n) time to
search any element in a B tree.
Inserting:
Insertions are done at the leaf node level. The following algorithm needs to be followed in order to
insert an item into B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the node can be inserted.
2. If the leaf node contain less than m-1 keys then insert the element in the increasing order.
3. Else, if the leaf node contains m-1 keys, then follow the following steps.
o Insert the new element in the increasing order of elements.
o Split the node into the two nodes at the median.
o Push the median element upto its parent node.
o If the parent node also contain m-1 number of keys, then split it too by following the
same steps.
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
22
The node, now contain 5 keys which is greater than (5 -1 = 4 ) keys. Therefore split the node from the
median i.e. 8 and push it up to its parent node shown as follows.
Deletion:
Deletion is also performed at the leaf nodes. The node which is to be deleted can either be a leaf node
or an internal node. Following algorithm needs to be followed in order to delete a node from a B tree.
1. Locate the leaf node.
2. If there are more than m/2 keys in the leaf node then delete the desired key from the node.
3. If the leaf node doesn't contain m/2 keys then complete the keys by taking the element from
eight or left sibling.
o If the left sibling contains more than m/2 elements then push its largest element up to its
parent and move the intervening element down to the node where the key is deleted.
o If the right sibling contains more than m/2 elements then push its smallest element up to
the parent and move intervening element down to the node where the key is deleted.
4. If neither of the sibling contain more than m/2 elements then create a new leaf node by joining
two leaf nodes and the intervening element of the parent node.
5. If parent is left with less than m/2 nodes then, apply the above process on the parent too.
If the the node which is to be deleted is an internal node, then replace the node with its in-order
successor or predecessor. Since, successor or predecessor will always be on the leaf node hence, the
process will be similar as the node is being deleted from the leaf node.
Example 1
Delete the node 53 from the B Tree of order 5 shown in the following figure.
23
Now, 57 is the only element which is left in the node, the minimum number of elements that must be
present in a B tree of order 5, is 2. it is less than that, the elements in its left and right sub-tree are also
not sufficient therefore, merge it with the left sibling and intervening element of parent i.e. 49.
The final B tree is shown as follows.
Applications of B Tree:
o Using the B-Tree, finding data in a data set can be done in a great deal less time.
o Multilevel indexing is possible with the indexing feature.
o Other applications of B-Trees include encryption, computer networks, and natural language
processing.
o Since accessing values stored in a large database that is stored on a disc takes a long time, B
trees are used to index the data and provide quick access to the actual data stored on the disks.
o In the worst case, it takes O(n) running time to search a database with n key values that is not
sorted or indexed. However, if we use B Tree to index this database, it will be searched in O
(log n) time in worst case.