Data Structures Unit 1 and 2
SYLLABUS
Binary Search Trees: Basics – Querying a Binary search tree – Insertion and Deletion- Red
Black trees: Properties of Red-Black Trees – Rotations – Insertion – Deletion -B-Trees: Definition of
B -trees – Basic operations on B-Trees – Deleting a key from a B-Tree- Heap – Heap
Implementation – Disjoint Sets - Fibonacci Heaps: structure – Merge able- heap operations
Decreasing a key and deleting a node-Bounding the maximum degree.
ALGORITHMS:
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in
a certain order to get the desired output.
From the data structure point of view, the following are some important categories of
algorithms − search, sort, insert, update, and delete.
Characteristics of an Algorithm:
Not all procedures can be called algorithms. An algorithm should have the following
characteristics:
i. Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or phases),
and their inputs/outputs should be clear and must lead to only one meaning.
ii. Input − An algorithm should have 0 or more well-defined inputs.
iii. Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
iv. Finiteness − Algorithms must terminate after a finite number of steps.
v. Feasibility − Should be feasible with the available resources.
vi. Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.
There are no well-defined standards for writing algorithms. Rather, it is problem and
resource dependent. Algorithms are never written to support a particular programming code.
All programming languages share basic code constructs such as loops (do, for, while) and
flow control (if-else), and these common constructs can be used to write an algorithm.
We write algorithms in a step-by-step manner, but that is not always the case. Algorithm
writing is a process that is performed after the problem domain is well defined. That is, we should
know the problem domain for which we are designing a solution.
Example:
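As an illustrative sketch (the original example was not preserved), consider the problem of finding the largest of n numbers. The steps are written out in plain language and then translated directly into code:

```python
# Hypothetical example: an algorithm to find the largest element in a list.
# Step 1: Assume the first element is the largest.
# Step 2: Examine every remaining element in turn.
# Step 3: If an element is larger than the current largest, remember it.
# Step 4: After all elements are examined, return the largest.
def find_largest(items):
    largest = items[0]          # Step 1
    for value in items[1:]:     # Step 2
        if value > largest:     # Step 3
            largest = value
    return largest              # Step 4

print(find_largest([12, 45, 7, 89, 3]))  # 89
```

Note that the steps use only the common constructs mentioned above (a loop and a conditional), so the same algorithm could be written in any programming language.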
ALGORITHM AS TECHNOLOGY
There can be different solutions or algorithms for the same coding problem and these solutions
may differ in terms of efficiency.
These differences can be much more significant than differences due to hardware and software.
So, the system performance depends on choosing efficient algorithms as much as choosing fast
hardware.
Even applications that do not require algorithms directly at the application level rely heavily
upon algorithms.
For example:
1. Does the application require fast hardware? The hardware design uses algorithms.
2. Does the application depend upon the user interface? The design of the user interface relies on
algorithms.
3. Does the application rely on fast networking? Networking relies heavily on routing algorithms.
Overall, algorithms are at the core of almost all computer applications. Just as rapid innovations
are being made in other computer technologies, they are also being made in algorithms.
ALGORITHM ANALYSIS
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. They are the following −
1. A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an algorithm is
measured by assuming that all other factors, for example, processor speed, are constant and have
no effect on the implementation.
2. A Posterior Analysis − This is an empirical analysis of an algorithm. The selected algorithm is
implemented using a programming language and executed on a target computer, and actual
statistics such as running time and space required are collected.
Algorithm analysis deals with the execution or running time of various operations involved.
The running time of an operation can be defined as the number of computer instructions
executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the algorithm
X are the two main factors, which decide the efficiency of X.
I. Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
II. Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle.
The space required by an algorithm is equal to the sum of the following two components −
A) A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and constants
used, program size, etc.
B) A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + S_P(I), where C is the fixed part and S_P(I) is the
variable part of the algorithm, which depends on instance characteristic I.
Example:
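As an illustrative sketch (the original example was not preserved), the function below makes the two components visible: a fixed part C (a handful of scalar variables) and a variable part S_P(I) (an output list whose size grows with the input size n):

```python
# Illustrative sketch of space complexity: the fixed part C is the scalar
# variables, while the variable part S_P(I) grows with the input size n.
def prefix_sums(values):
    total = 0                 # fixed part: one scalar, independent of n
    sums = []                 # variable part: grows to n elements, O(n) space
    for v in values:
        total += v
        sums.append(total)
    return sums

print(prefix_sums([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

Here S(P) = C + n, so the space complexity is O(n).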
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run
to completion.
Time requirements can be defined as a numerical function T(n), where T(n) can be measured as
the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps.
Consequently, the total computational time is T(n) = c ∗ n, where c is the time taken for the
addition of two bits.
Here, we observe that T(n) grows linearly as the input size increases.
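The relation T(n) = c ∗ n can be made concrete by counting the key operations of a linear scan (an illustrative sketch, not from the original text):

```python
# Illustrative sketch: counting the basic operations of a linear scan
# shows that T(n) grows as c * n, with a constant cost c per element.
def count_steps(n):
    steps = 0
    total = 0
    for i in range(n):  # the loop body runs exactly n times
        total += i      # one "key operation" per iteration
        steps += 1
    return steps

print(count_steps(10))   # 10
print(count_steps(100))  # 100: doubling n doubles T(n), i.e. linear growth
```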
ASYMPTOTIC ANALYSIS
ASYMPTOTIC NOTATIONS
Following are the commonly used asymptotic notations to calculate the running time complexity
of an algorithm.
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst case time complexity or the longest amount of time an algorithm can possibly
take to complete.
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It
measures the best case time complexity or the best amount of time an algorithm can possibly
take to complete.
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −
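The three notations can be stated formally as follows (these are the standard set-based definitions, supplied here because the original formulas were not preserved):

```latex
O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le f(n) \le c\, g(n) \ \text{for all } n \ge n_0 \,\}
\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le c\, g(n) \le f(n) \ \text{for all } n \ge n_0 \,\}
\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \text{for all } n \ge n_0 \,\}
```

Note that f(n) = Θ(g(n)) holds exactly when both f(n) = O(g(n)) and f(n) = Ω(g(n)) hold.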
To compare algorithms, we typically ask the following questions:
1. Does the algorithm provide the exact solution for the problem?
2. Is it easy to understand?
3. Is it easy to implement?
4. How much space (memory) does it require to solve the problem?
5. How much time does it take to solve the problem?
When we want to analyze an algorithm, we consider only the space and time required by that
particular algorithm, and we ignore all the remaining elements.
Performance analysis of an algorithm is performed by using the following measures...
1. Space required to complete the task of that algorithm (Space Complexity). It includes
program space and data space
2. Time required to complete the task of that algorithm (Time Complexity)
RECURRENCE RELATIONS
A recurrence is an equation or inequality that describes a function in terms of its values on smaller
inputs. To solve a recurrence relation means to obtain a function defined on the natural numbers
that satisfies the recurrence.
For example, the worst-case running time T(n) of the MERGE-SORT procedure is described by
the recurrence
T(n) = Θ(1) if n = 1, and T(n) = 2T(n/2) + Θ(n) if n > 1,
whose solution is T(n) = Θ(n log n).
There are four methods for solving recurrence relations:
1. Substitution Method
2. Iteration Method
3. Recursion Tree Method
4. Master Method
1. SUBSTITUTION METHOD:
In the substitution method, we guess the form of the solution and then use mathematical
induction to show that the guess is correct.
Example 1: Solve the recurrence by the substitution method. We have to show that it is
asymptotically bounded by O(log n).
Solution:
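The recurrence for this example was not preserved; as an illustrative sketch, assume the recurrence is T(n) = T(n/2) + 1 with T(1) = 1. The substitution method then proceeds as follows:

```latex
\text{Guess: } T(n) = O(\log n), \text{ i.e. } T(n) \le c \log n \text{ for some constant } c > 0.\\
\text{Inductive hypothesis: the bound holds for } n/2, \text{ so } T(n/2) \le c \log(n/2).\\
\text{Substituting into the recurrence:}\quad
T(n) \le c \log(n/2) + 1 = c \log n - c \log 2 + 1 \le c \log n \quad \text{whenever } c \ge 1/\log 2.
```

Since the inductive step closes for a suitable constant c, the guess T(n) = O(log n) is verified.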
Example 2:
Solution:
2. ITERATION METHOD:
The iteration method expands the recurrence and expresses it as a summation of terms of n and
the initial condition.
Example 1:
Solution:
Example 2:
Solution:
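The recurrences for these examples were not preserved; as an illustrative sketch, assume T(n) = T(n − 1) + 1 with T(1) = 1. Expanding the recurrence term by term:

```latex
T(n) = T(n-1) + 1 = \big(T(n-2) + 1\big) + 1 = T(n-2) + 2 = \cdots = T(n-k) + k.\\
\text{Setting } k = n-1 \text{ reaches the initial condition: } T(n) = T(1) + (n-1) = n.\\
\text{Hence } T(n) = O(n).
```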
3. RECURSION TREE METHOD:
1. The Recursion Tree Method is a pictorial representation of the iteration method, in the form
of a tree where the nodes at each level are expanded.
2. It is sometimes difficult to come up with a good guess. In a recursion tree, each node
represents the cost of a single subproblem.
3. We sum the costs within each level of the tree to obtain a set of per-level
costs, and then sum all the per-level costs to determine the total cost of all levels of the
recursion.
4. A recursion tree is best used to generate a good guess, which can then be verified by
the substitution method.
Example 1:
Solution:
Example 2:
Solution:
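The recurrences for these examples were not preserved; as an illustrative sketch, consider T(n) = 2T(n/2) + n. Each node of the recursion tree holds the non-recursive cost of one subproblem, and summing level by level gives:

```latex
\text{Level } 0:\ n, \qquad \text{Level } 1:\ 2 \cdot \tfrac{n}{2} = n, \qquad \text{Level } i:\ 2^i \cdot \tfrac{n}{2^i} = n.\\
\text{There are } \log_2 n + 1 \text{ levels, each with total cost } n,\ \text{so } T(n) = n(\log_2 n + 1) = \Theta(n \log n).
```

This matches the guess one would then verify by the substitution method.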
4. MASTER METHOD
The Master Method is used for solving recurrences of the form
T(n) = a T(n/b) + f(n), where a ≥ 1 and b > 1 are constants and f(n) is an asymptotically
positive function.
In the analysis of a recursive algorithm, the constants and function take on the following
significance: a is the number of subproblems generated at each step, n/b is the size of each
subproblem, and f(n) is the cost of dividing the problem and combining the results.
MASTER THEOREM:
It is possible to compute an asymptotically tight bound in these three cases:
Case 1: If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
Case 2: If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n).
Case 3: If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some
constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
EXAMPLE:
SOLUTION:
Since this equation holds, the first case of the master theorem applies to the given recurrence
relation, thus resulting in the conclusion:
1. Searching: The TREE-SEARCH (x, k) algorithm searches the subtree rooted at x for a node whose
key equals k. It returns a pointer to the node if it exists, and NIL otherwise.
Clearly, this algorithm runs in O(h) time, where h is the height of the tree. The iterative version of
the algorithm is also very easy to implement.
2. Minimum and Maximum: An item in a binary search tree whose key is a minimum can always
be found by following left child pointers from the root until a NIL is encountered. The following
procedure returns a pointer to the minimum element in the subtree rooted at a given node x.
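The two procedures above can be sketched as follows. This is an illustrative Python version, not the original pseudocode; the Node class and its field names (key, left, right) are assumptions made for the sketch:

```python
# Minimal BST node; the field names (key, left, right) are illustrative.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(x, k):
    """Iterative TREE-SEARCH: walk down from x; runs in O(h) time."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x  # the node whose key is k, or None (NIL) if absent

def tree_minimum(x):
    """TREE-MINIMUM: follow left children until NIL is encountered."""
    while x.left is not None:
        x = x.left
    return x

# Usage on a small BST with root 15:
root = Node(15, Node(6, Node(3), Node(7)), Node(18, Node(17), Node(20)))
print(tree_search(root, 7).key)   # 7
print(tree_minimum(root).key)     # 3
```

The symmetric TREE-MAXIMUM would follow right-child pointers instead.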
3. Successor and predecessor: Given a node in a binary search tree, we sometimes need to find its
successor in the sorted order determined by an inorder tree walk. If all keys are distinct, the
successor of a node x is the node with the smallest key greater than key[x]. The structure of a
binary search tree allows us to determine the successor of a node without ever comparing keys. The
following procedure returns the successor of a node x in a binary search tree if it exists, and NIL if x
has the greatest key in the tree:
The code for TREE-SUCCESSOR is broken into two cases. If the right subtree of node x is nonempty,
then the successor of x is just the leftmost node in the right subtree, which we find in line 2 by
calling TREE-MINIMUM (right [x]). On the other hand, if the right subtree of node x is empty and x
has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. To
find y, we quickly go up the tree from x until we encounter a node that is the left child of its
parent; lines 3-7 of TREE-SUCCESSOR handle this case.
The running time of TREE-SUCCESSOR on a tree of height h is O (h) since we either follow a simple
path up the tree or follow a simple path down the tree. The procedure TREE-PREDECESSOR, which
is symmetric to TREE-SUCCESSOR, also runs in time O (h).
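The two cases of TREE-SUCCESSOR can be sketched as follows. This is an illustrative Python version; the parent pointer on each node and the attach helper are assumptions made so the sketch is self-contained:

```python
# Illustrative TREE-SUCCESSOR; nodes carry a parent pointer (an assumption).
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.parent = key, None, None, None

def attach(parent, child, side):
    """Helper (hypothetical): link child under parent on 'left' or 'right'."""
    setattr(parent, side, child)
    child.parent = parent

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:                 # case 1: leftmost node of the
        return tree_minimum(x.right)        # nonempty right subtree
    y = x.parent                            # case 2: go up until we leave
    while y is not None and x is y.right:   # a left child behind
        x, y = y, y.parent
    return y                                # None (NIL) if x has the max key

# Usage: keys 6 < 7 < 15, so the successor of 7 is 15.
n15, n6, n7 = Node(15), Node(6), Node(7)
attach(n15, n6, "left"); attach(n6, n7, "right")
print(tree_successor(n7).key)  # 15
print(tree_successor(n6).key)  # 7
```

Both branches follow a single path up or down the tree, which is why the running time is O(h).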
4. Insertion in Binary Search Tree: To insert a new node z, we walk down from the root, comparing
keys, until we reach the NIL position where z belongs. Now our node z will be either the left or the
right child of its parent (y).
5. Deletion in Binary Search Tree: When deleting a node from a tree, it is essential
that any relationships implicit in the tree are maintained. The deletion of nodes
from a binary search tree will be considered in three cases:
1. Nodes with no children: This case is trivial. Simply set the parent's pointer to
the node to be deleted to nil and delete the node.
2. Nodes with one child: When z has no left child, we replace z by its right
child, which may or may not be NIL. And when z has no right child, we
replace z with its left child.
3. Nodes with both children: When z has both a left and a right child, we find z's
successor y, which lies in z's right subtree and has no left child (the
successor of z is the node with the minimum value in z's right subtree, and so it
has no left child).
o If y is z's right child, then we replace z by y directly.
o Otherwise, y lies within z's right subtree but is not z's right child. In this
case, we first replace y by its own right child and then replace z by y.
For Example: Deleting a node z from a binary search tree. Node z may be the root, a
left child of node q, or a right child of q.
A Red Black Tree is a category of the self-balancing binary search tree. It was created
in 1972 by Rudolf Bayer who termed them "symmetric binary B-trees."
A red-black tree is a binary tree where every node carries a color as an extra
attribute, either red or black, and the following properties hold:
1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both its children are black.
5. For each node, all simple paths from the node to descendant leaves contain the
same number of black nodes.
By constraining the node colors on any simple path from the
root to a leaf, red-black trees ensure that no such path is more than twice as long as
any other, so that the tree is approximately balanced.
A tree T is an almost red-black tree (ARB tree) if the root is red but all the other conditions
above hold.
Operations on RB Trees:
1. Rotation:
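A rotation is a local restructuring that preserves the binary-search-tree ordering. The sketch below shows LEFT-ROTATE on plain nodes; it is illustrative only, and omits the parent-pointer and color updates that a full red-black implementation would also perform:

```python
# Illustrative LEFT-ROTATE sketch (parent pointers and colors omitted).
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def left_rotate(x):
    """Rotate the subtree rooted at x to the left; return the new root."""
    y = x.right            # y moves up to become the subtree root
    x.right = y.left       # y's left subtree becomes x's right subtree
    y.left = x             # x becomes y's left child
    return y

# Usage: rotating x(10) with right child y(20) promotes 20 over 10,
# and y's old left subtree (15) is re-attached as x's right child.
x = Node(10, Node(5), Node(20, Node(15), Node(25)))
root = left_rotate(x)
print(root.key, root.left.key, root.left.right.key)  # 20 10 15
```

RIGHT-ROTATE is the mirror image, exchanging the roles of left and right.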
Example: Draw the complete binary tree of height 3 on the keys {1, 2, 3... 15}. Add the NIL leaves
and color the nodes in three different ways such that the black heights of the resulting trees are: 2,
3 and 4.
Solution:
2. Insertion:
o Insert the new node the way it is done in Binary Search Trees.
o Color the node red
o If an inconsistency arises for the red-black tree, fix the tree according to the type of
discrepancy.
A discrepancy can arise from a parent and a child both having a red color. The type of
discrepancy is determined by the location of the node with respect to its grandparent, and by the
color of the sibling of the parent.
After inserting the new node, coloring this new node black may violate the black-height condition,
while coloring it red may violate the coloring conditions, i.e., that the root is black and a red node
has no red children. We know that black-height violations are hard to fix, so we color the node red.
After this, if there is any color violation, we correct it with the RB-INSERT-FIXUP procedure.
Example: Show the red-black trees that result after successively inserting the keys
41,38,31,12,19,8 into an initially empty red-black tree.
Solution:
Insert 41
Insert 19
o If the element to be deleted is in a node with only a left child, swap this node
with the one containing the largest element in the left subtree. (This node has no
right child.)
o If the element to be deleted is in a node with only a right child, swap this node
with the one containing the smallest element in the right subtree. (This node
has no left child.)
o If the element to be deleted is in a node with both a left child and a right
child, then swap in any of the above two ways. While swapping, swap only
the keys but not the colors.
o The node to be deleted now has only a left child or only a right child.
Replace this node with its sole child. This may violate the red constraint or the black
constraint. Violations of the red constraint can be easily fixed.
o If the deleted node is black, the black constraint is violated. The elimination
of a black node y causes any path that contained y to have one fewer black
node.
o Two cases arise:
o The replacing node is red, in which case we merely color it black to
make up for the loss of one black node.
o The replacing node is black.
The procedure RB-DELETE is a minor modification of the TREE-DELETE procedure. After splicing out a
node, it calls an auxiliary procedure RB-DELETE-FIXUP that changes colors and performs rotations
to restore the red-black properties.
B - TREES
B-Trees maintain balance by ensuring that each node has a minimum number of keys, so the
tree is always balanced. This balance guarantees that the time complexity for operations such as
insertion, deletion, and searching is always O(log n), regardless of the initial shape of the tree.
Properties of B-Tree:
All leaves are at the same level.
B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends
upon disk block size.
Every node except the root must contain at least t-1 keys. The root may
contain a minimum of 1 key.
All nodes (including root) may contain at most (2*t – 1) keys.
Number of children of a node is equal to the number of keys in it plus 1.
All keys of a node are sorted in increasing order. The child between two
keys k1 and k2 contains all keys in the range from k1 to k2.
B-Tree grows and shrinks from the root which is unlike Binary Search Tree.
Binary Search Trees grow downward and also shrink from downward.
Like other balanced Binary Search Trees, the time complexity to search,
insert and delete is O(log n).
Insertion of a Node in B-Tree happens only at Leaf Node.
Following is an example of a B-Tree of minimum order 5
Note that in practical B-Trees, the value of the minimum order is much greater than
5.
We can see in the above diagram that all the leaf nodes are at the same level, and all non-leaf
nodes have no empty subtrees and have one fewer key than their number of children.
Traversal in B-Tree:
Traversal is also similar to Inorder traversal of Binary Tree. We start from the leftmost child,
recursively print the leftmost child, then repeat the same process for the remaining children and
keys. In the end, recursively print the rightmost child.
Search Operation in B-Tree:
Search is similar to the search in a Binary Search Tree. Let the key to be searched be k.
Start from the root and recursively traverse down.
For every visited non-leaf node,
If the node has the key, we simply return the node.
Otherwise, we recur down to the appropriate child (The child which is just before
the first greater key) of the node.
If we reach a leaf node and don’t find k in the leaf node, then return NULL.
Searching a B-Tree is similar to searching a binary tree. The algorithm is similar and goes with
recursion. At each level, the search is optimized as if the key value is not present in the range of the
parent then the key is present in another branch. As these values limit the search they are also
known as limiting values or separation values. If we reach a leaf node and don’t find the desired key
then it will display NULL.
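The search steps above can be sketched as follows. This is an illustrative Python version; the node layout (a sorted keys list, a children list, and a leaf flag) is an assumption made for the sketch:

```python
# Illustrative B-tree search; node layout (keys, children, leaf) is assumed.
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                      # sorted keys of this node
        self.children = children or []        # len(children) == len(keys) + 1
        self.leaf = not self.children

def btree_search(node, k):
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1                                # find first key >= k
    if i < len(node.keys) and node.keys[i] == k:
        return node                           # k found in this node
    if node.leaf:
        return None                           # reached a leaf: k is absent
    return btree_search(node.children[i], k)  # recurse into the child just
                                              # before the first greater key

# Usage: a small B-tree with root keys [10, 20] and three leaf children.
root = BTreeNode([10, 20],
                 [BTreeNode([3, 7]), BTreeNode([13, 17]), BTreeNode([25])])
print(btree_search(root, 17) is not None)  # True
print(btree_search(root, 8))               # None
```

The separation values 10 and 20 limit the search: key 17 lies between them, so only the middle child is visited.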
Example:
Solution:
Applications of B-Trees:
It is used in large databases to access data stored on the disk
Searching for data in a data set can be achieved in significantly less time using the B-
Tree
With the indexing feature, multilevel indexing can be achieved.
Most of the servers also use the B-tree approach.
B-Trees are used in CAD systems to organize and search geometric data.
B-Trees are also used in other areas such as natural language processing, computer
networks, and cryptography.
A heap is a complete binary tree; a binary tree is a tree in which each node can
have at most two children. Before learning more about the heap data structure, we
should know about the complete binary tree.
A complete binary tree is a binary tree in which all the levels, except possibly the last level,
are completely filled, and the nodes in the last level are as far left as possible (left-justified).
In the above figure, we can observe that all the levels are completely filled except the
last, whose nodes are left-justified; therefore, we can say that the above tree is a complete
binary tree.
o Min Heap
o Max heap
Min Heap: The value of the parent node should be less than or equal to either of its
children.
Or
In other words, the min-heap can be defined as: for every node i other than the root, the value of
node i is greater than or equal to its parent's value. Mathematically, it
can be defined as:
A[parent(i)] <= A[i]
Example:
11 is the root node, and the value of the root node is less than the value of all the
other nodes (left child or a right child).
Max Heap: The value of the parent node is greater than or equal to its children.
Or
In other words, the max heap can be defined as: for every node i other than the root, the value of
node i is less than or equal to its parent's value. Mathematically, it can
be defined as:
A[parent(i)] >= A[i]
The above tree is a max heap tree as it satisfies the property of the max heap. Now, let's see the
array representation of the max heap.
The total number of comparisons required in the max heap depends on the height
of the tree. The height of a complete binary tree is always log n; therefore, the
time complexity would also be O(log n).
Suppose we want to create the max heap tree. To create the max heap tree, we
need to consider the following two cases:
o First, we have to insert the element in such a way that the property of the
complete binary tree must be maintained.
o Secondly, the value of the parent node should be greater than or equal to
either of its children.
Step 2: The next element is 33. As we know that insertion in a binary tree always starts from the
left side, 33 will be added at the left of 44.
Step 3: The next element is 77 and it will be added to the right of the 44
As we can observe in the above tree that it does not satisfy the max heap property, i.e., parent
node 44 is less than the child 77. So, we will swap these two values
Step 4: The next element is 11. The node 11 is added to the left of 33
Step 5: The next element is 55. To make it a complete binary tree, we will add the node 55 to the
right of 33
As we can observe in the above figure that it does not satisfy the property of the max heap
because 33<55, so we will swap these two values as shown below:
Step 6: The next element is 88. The left subtree is completed so we will add 88 to the left of 44
As we can observe in the above figure that it does not satisfy the property of the
max heap because 44<88, so we will swap these two values as shown below:
Again, it is violating the max heap property because 88>77 so we will swap these two
values as shown below:
Step 7: The next element is 66. To make a complete binary tree, we will add the 66
element to the right side of 77 as shown below:
In the above figure, we can observe that the tree satisfies the property of max heap;
therefore, it is a heap tree.
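The insertion walkthrough above can be sketched with an array-based max heap, where the element is appended at the end (keeping the tree complete) and then sifted up until its parent is larger. This is an illustrative sketch; one possible array layout of the resulting heap is shown:

```python
# Array-based max-heap insertion (sift-up): append at the end to keep the
# tree complete, then swap upward until the parent is at least as large.
def heap_insert(heap, value):
    heap.append(value)                 # case 1: maintain completeness
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:    # case 2: parent already larger
            break
        heap[i], heap[parent] = heap[parent], heap[i]  # swap upward
        i = parent

# Usage: inserting the elements from the walkthrough above.
heap = []
for v in [44, 33, 77, 11, 55, 88, 66]:
    heap_insert(heap, v)
print(heap)  # [88, 55, 77, 11, 33, 44, 66]
```

Each insertion performs at most one swap per level, so it costs O(log n), matching the complexity stated earlier.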
In deletion in a heap tree, the root node is always deleted, and it is replaced with
the last element.
Step 1: In the above tree, the root node 30 is deleted from the tree and replaced
with the element 15, as shown below:
Now we will heapify the tree. We will check whether 15 is greater than either of
its children or not. Since 15 is less than 20, we will swap these two values as shown below:
Again, we will compare 15 with its child. Since 15 is greater than 10, no swapping
will occur.
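The deletion steps above can be sketched with the same array representation (an illustrative sketch; the starting array is one layout consistent with the walkthrough, where the root is 30 and the last element is 15):

```python
# Max-heap root deletion (sift-down): move the last element to the root,
# then repeatedly swap it with its larger child until the heap is restored.
def heap_delete_root(heap):
    heap[0] = heap[-1]                 # replace the root with the last element
    heap.pop()
    i, n = 0, len(heap) - 1 + 1
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:               # both children are smaller: done
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

# Usage: root 30 is removed, 15 takes its place, 15 < 20 forces one swap.
heap = [30, 20, 10, 15]
heap_delete_root(heap)
print(heap)  # [20, 15, 10]
```

As with insertion, sifting down visits at most one node per level, so deletion also runs in O(log n).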