Advanced Data Structure
A tree is a collection of elements called nodes, where each node can have an
arbitrary number of children.
Field: Description
Root: A special node in a tree. The entire tree is referenced through it; it does not have a parent.
Path: A sequence of successive edges from a source node to a destination node.
Height of Node: The number of edges on the longest path between that node and a leaf.
Depth of Node: The number of edges from the tree's root node to the node.
Edge: A connection between one node and another; it is a line between two nodes.
Introduction to Trees
In the above figure, D, F, H, G are leaves, and B and C are siblings. Each node
except the root is connected by a direct edge from exactly one other node, its
parent (parent → child).
Level of a node
The level of a node is the number of connections between the node and
the root; it represents the node's generation. If the root node is at level 0, its
children are at level 1, its grandchildren at level 2, and so on. The levels of a
tree can be shown as follows:
Note:
- Nodes that are not leaves are called internal nodes. An internal node has
at least one child.
- A tree can be empty (no nodes), or it can consist of a single node, the
root.
Height of a Node
In the above figure, A, B, C, and D have nonzero heights; a leaf's height is 0,
since no downward path starts from a leaf. Node A's height is the number of
edges on the path to K, not to D, so its height is 3.
Note:
- The height of a node is the length of the longest path from the node to a leaf.
Depth of a Node
Height measures a node's distance downward to the leaves, whereas depth
measures its distance upward to the root; that is why it is called the depth of a node.
In the above figure, node G's depth is 2. For the depth of a node, we simply count
the edges between the target node and the root, ignoring directions.
Advantages of Trees
- A tree reflects structural relationships in the data.
- It is used to represent hierarchies.
- It provides efficient insertion and searching operations.
- Trees are flexible: subtrees can be moved around with minimal effort.
Binary Tree: Representation in Memory
A binary tree is a non-linear data structure for maintaining binary relationships among
elements. Binary trees are special trees in which a node can have at most two child
nodes; these sit on the left and right side of a given node and are therefore called the
left child and right child. Such trees are well suited to storing decision trees, which
represent decisions involving yes or no, true or false, or 0 or 1. They are frequently
used in gaming applications where a player can take only one of two moves; the tree
stores the various states that may be reached after each move.
Memory Representation: Array
A small, almost complete binary tree can easily be stored in a linear array; a sparse or
deep tree wastes array slots, so array storage suits trees where most of the nodes have
two child nodes ("almost complete").
To store a binary tree in a linear array, you need to consider the positional indexes of
the nodes. This indexing starts with 1 at the root node and proceeds from left to right
as you go down from one level to the next.
These rules are used to store the tree of the above example in an array.
If a binary tree contains few elements but is deep in structure, memory
underutilization is a major issue.
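The parent-child index arithmetic behind this array layout can be sketched as follows (a minimal illustration, assuming the 1-based indexing described above; the function names are hypothetical):

```c
#include <assert.h>

/* 1-based array layout of a binary tree:
   the node stored at index i has its left child at index 2*i,
   its right child at index 2*i + 1, and its parent at index i / 2. */
static int left_child(int i)  { return 2 * i; }
static int right_child(int i) { return 2 * i + 1; }
static int parent(int i)      { return i / 2; }
```

With this mapping, a complete tree fills indexes 1..n with no gaps; a deep, sparse tree leaves most slots empty, which is exactly the memory underutilization mentioned above.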
o What type of data needs to be stored?: A certain data structure may be the best
fit for a particular kind of data.
o Cost of operations: We want to minimize the cost of the most frequently performed
operations. For example, if we have a simple list on which we mostly perform search,
we can store the elements in sorted order in an array and use binary search. Binary
search works very fast on such a list because it halves the search space at every step.
o Memory usage: Sometimes we want a data structure that uses less memory.
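The binary-search idea mentioned above can be sketched in C (a minimal version over a sorted int array; the function name and signature are illustrative):

```c
#include <assert.h>

/* Binary search over a sorted array: halves the search space on
   each step, so it needs O(log n) comparisons.
   Returns the index of key, or -1 if the key is absent. */
static int binary_search(const int *a, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo + hi) */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

Note that this only works because the array is kept sorted; that is the trade-off the bullet above describes.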
A tree is also one of the data structures that represent hierarchical data. Suppose we
want to show employees and their positions in hierarchical form; this can be
represented as shown below:
The above tree shows the organization hierarchy of some company. In this structure,
John is the CEO of the company, and John has two direct reports named Steve and
Rohan. Steve, a manager, has three direct reports named Lee, Bob, and Ella. Bob has
two direct reports named Sal and Emma. Emma has two direct reports named Tom and
Raj. Tom has one direct report named Bill. This logical structure is known as a tree;
its shape resembles a real tree, hence the name. In this structure, the root is at the
top, and its branches grow downward. We can therefore say that the tree data
structure is an efficient way of storing data hierarchically.
o A tree data structure is defined as a collection of objects or entities, known as nodes,
that are linked together to represent or simulate a hierarchy.
o A tree is a non-linear data structure because its elements are not stored sequentially.
It is a hierarchical structure, with elements arranged in multiple levels.
o In the tree data structure, the topmost node is known as the root node. Each node
contains some data, which can be of any type. In the above tree structure, the nodes
contain employee names, so the data type is a string.
o Each node contains some data together with links or references to other nodes,
called its children.
In the above structure, each node is labeled with some number. Each arrow shown in the
above figure is known as a link between the two nodes.
o Root: The root node is the topmost node in the tree hierarchy. In other words, the root
node is the one that doesn't have any parent. In the above structure, node numbered 1
is the root node of the tree. If a node is directly linked to some other node, it would be
called a parent-child relationship.
o Child node: If the node is a descendant of any node, then the node is known as a child
node.
o Parent: If the node contains any sub-node, then that node is said to be the parent of
that sub-node.
o Sibling: The nodes that have the same parent are known as siblings.
o Leaf Node: A node of the tree that doesn't have any child node is called a leaf
node. A leaf node is a bottom-most node of the tree. There can be any number of leaf
nodes in a general tree. Leaf nodes are also called external nodes.
o Internal nodes: A node that has at least one child node is known as an internal node.
o Ancestor node: An ancestor of a node is any predecessor node on the path from the root
to that node. The root node doesn't have any ancestors. In the tree shown in the above
image, nodes 1, 2, and 5 are the ancestors of node 10.
o Descendant: A descendant of a node is any node reachable from it going downward,
i.e., its children, its children's children, and so on. In the above figure, 10 is a
descendant of node 5.
o Number of edges: If there are n nodes, then there are n-1 edges. Each arrow in the
structure represents a link or path. Each node, except the root node, has exactly one
incoming link, known as an edge, since there is one link per parent-child relationship.
o Depth of node x: The depth of node x can be defined as the length of the path from the
root to the node x. One edge contributes one-unit length in the path. So, the depth of
node x can also be defined as the number of edges between the root node and the node
x. The root node has 0 depth.
o Height of node x: The height of node x can be defined as the length of the longest
path from node x to a leaf node.
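These two definitions can be made concrete in code. The sketch below (a minimal illustration with hypothetical helper names, assuming a binary tree node) computes height in edges, so a leaf has height 0 and an empty subtree is conventionally given height -1 to make the recursion work out:

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Height in edges: a leaf has height 0; an empty tree is
   treated as -1 so that 1 + max(child heights) is correct. */
static int height(const struct node *n)
{
    if (n == NULL)
        return -1;
    int lh = height(n->left);
    int rh = height(n->right);
    return 1 + (lh > rh ? lh : rh);
}

static struct node *new_node(int data)
{
    struct node *n = malloc(sizeof *n);
    n->data = data;
    n->left = n->right = NULL;
    return n;
}
```

Depth, by contrast, is not a property the node can compute locally; it is the number of edges walked from the root to reach the node.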
Based on the properties of the Tree data structure, trees are classified into various
categories.
Implementation of Tree
The tree data structure can be created by creating the nodes dynamically with the help
of the pointers. The tree in the memory can be represented as shown below:
The above figure shows the representation of the tree data structure in the memory. In
the above structure, the node contains three fields. The second field stores the data; the
first field stores the address of the left child, and the third field stores the address of the
right child.
struct node
{
    int data;
    struct node *left;
    struct node *right;
};
The above structure can only be used for binary trees, because a binary tree can
have at most two children, while generic trees can have more than two. The node
structure for generic trees would therefore differ from that of a binary tree.
Applications of trees
The following are the applications of trees:
o Storing naturally hierarchical data: Trees are used to store the data in the hierarchical
structure. For example, the file system. The file system stored on the disc drive, the file
and folder are in the form of the naturally hierarchical data and stored in the form of
trees.
o Organize data: Trees are used to organize data for efficient insertion, deletion, and
searching. For example, a binary search tree offers O(log n) average-case search time.
o Trie: It is a special kind of tree that is used to store the dictionary. It is a fast and efficient
way for dynamic spell checking.
o Heap: It is also a tree data structure implemented using arrays. It is used to implement
priority queues.
o B-Tree and B+Tree: B-Tree and B+Tree are the tree data structures used to implement
indexing in databases.
o Routing table: The tree data structure is also used to store the data in routing tables in
the routers.
o General tree: The general tree is one of the types of tree data structures. In a
general tree, a node can have any number of children, from 0 up to n; no restriction is
imposed on the degree of a node (the number of children a node can have). The
topmost node in a general tree is known as the root node, and the trees rooted at the
children of a node are known as its subtrees.
There can be n subtrees in a general tree. In a general tree, the subtrees are
unordered, since the nodes of the subtrees cannot be ordered.
Every non-leaf node has downward edges, and these edges connect it to its child
nodes. The root node is labeled with level 0. The nodes that have the same parent
are known as siblings.
o Binary tree: Here, the name binary itself suggests two. In a binary tree, each node
can have at most two child nodes, i.e., a node has 0, 1, or 2 children.
To know more about the binary tree, click on the link given below:
https://www.javatpoint.com/binary-tree
o Binary Search tree: A binary search tree is a non-linear, node-based data structure.
A node in a binary search tree has three fields: a data part, a left child, and a right
child. A node can be connected to at most two child nodes, so the node contains two
pointers (a left child and a right child pointer).
Every node in the left subtree must contain a value less than the value of the root node,
and the value of each node in the right subtree must be greater than the value of the
root node.
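The ordering rule above can be verified mechanically. A minimal sketch (the helper names `is_bst` and `mk` are illustrative, not from the original text) checks that every node's value lies inside the bounds imposed by its ancestors:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* A tree is a BST iff every node's value lies strictly between
   the bounds inherited from its ancestors. */
static int is_bst_in(const struct node *n, int lo, int hi)
{
    if (n == NULL)
        return 1;
    if (n->data <= lo || n->data >= hi)
        return 0;
    return is_bst_in(n->left, lo, n->data) &&
           is_bst_in(n->right, n->data, hi);
}

static int is_bst(const struct node *n)
{
    return is_bst_in(n, INT_MIN, INT_MAX);
}

/* Small helper to build test trees. */
static struct node *mk(int v, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    n->data = v; n->left = l; n->right = r;
    return n;
}
```

Passing bounds down the recursion, rather than comparing each node only to its parent, is what catches violations deeper in a subtree.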
A node can be created with the help of a user-defined data type known as struct, as
shown below:
struct node
{
    int data;
    struct node *left;
    struct node *right;
};
The above node structure has three fields: the data field, the left pointer of node
type, and the right pointer of node type.
To know more about the binary search tree, click on the link given below:
https://www.javatpoint.com/binary-search-tree
o AVL tree
It is a variant of the binary search tree. An AVL tree satisfies the properties of a
binary tree as well as those of a binary search tree. It is a self-balancing binary
search tree invented by Adelson-Velsky and Landis. Here, self-balancing means
keeping the heights of the left and right subtrees balanced; this balance is measured
in terms of the balance factor.
We can consider a tree an AVL tree if it obeys the binary search tree property as well
as the balance-factor constraint. The balance factor is defined as the difference
between the height of the left subtree and the height of the right subtree; its value
must be 0, -1, or 1, so every node in an AVL tree must have a balance factor of
0, -1, or 1.
To know more about the AVL tree, click on the link given below:
https://www.javatpoint.com/avl-tree
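The balance factor described above can be computed directly from subtree heights. A minimal sketch (hypothetical helper names; heights counted in edges with an empty subtree as -1):

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Height in edges; an empty subtree counts as -1. */
static int height(const struct node *n)
{
    if (n == NULL)
        return -1;
    int lh = height(n->left);
    int rh = height(n->right);
    return 1 + (lh > rh ? lh : rh);
}

/* Balance factor = height(left subtree) - height(right subtree).
   A node is AVL-balanced when this is -1, 0, or 1. */
static int balance_factor(const struct node *n)
{
    return height(n->left) - height(n->right);
}

/* Small helper to build test trees. */
static struct node *mk(int v, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    n->data = v; n->left = l; n->right = r;
    return n;
}
```

(Real AVL implementations cache the height in each node instead of recomputing it, so the factor can be read in O(1); this recomputing version is only for illustration.)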
o Red-Black Tree
The Red-Black tree is a kind of binary search tree, so its prerequisite is knowing the
binary search tree. In a binary search tree, the values in the left subtree of a node
are less than the node's value, and the values in the right subtree are greater.
Searching such a tree takes O(log n) time in the average case, O(1) in the best case,
and O(n) in the worst case.
When operations are performed on the tree, we want it to stay balanced so that all
operations like searching, insertion, and deletion take O(log n) time.
The Red-Black tree is a self-balancing binary search tree. Since the AVL tree is also
a height-balancing binary search tree, why do we need the Red-Black tree? In an AVL
tree, we do not know in advance how many rotations are required to rebalance the
tree, whereas in a Red-Black tree at most two rotations are needed to rebalance
after an insertion. Each node carries one extra bit representing its color, red or
black, which is used to keep the tree balanced.
o Splay tree
The splay tree is also a binary search tree, in which a recently accessed element is
moved to the root of the tree by performing rotation operations. This moving of an
accessed node to the root is called splaying. It is a self-adjusting binary search tree
with no explicit balance condition like the AVL tree's.
The height of a splay tree may be unbalanced, i.e., the heights of the left and right
subtrees may differ, but operations on a splay tree take O(log n) amortized time,
where n is the number of nodes. A splay tree is therefore balanced only in this
amortized sense: the rotations performed after each operation keep it efficient, but
it is not a height-balanced tree.
o Treap
The treap data structure combines the tree and heap data structures, so it has
properties of both. In a binary search tree, each node in the left subtree must be less
than or equal to the root's value, and each node in the right subtree must be greater
than or equal to it. In a min-heap, both the left and right subtrees contain larger keys
than the root; therefore, the root node contains the lowest value.
In a treap, each node has both a key and a priority, where the keys obey the binary
search tree property and the priorities obey the heap property.
The treap data structure follows two properties, given below:
o The right child of a node is >= the current node and the left child is <= the current
node (binary search tree property on keys).
o The children of any node must have priorities greater than the node's priority
(min-heap property on priorities).
o B-tree
A B-tree is a balanced m-way tree, where m defines the order of the tree. Until now,
the nodes we saw held only one key, but a B-tree node can have more than one key and
more than 2 children. It always maintains the sorted data. In a binary tree, leaf
nodes can be at different levels, but in a B-tree, all the leaf nodes must be at the
same level.
The root node must contain at least 1 key, and every other node must contain at
least ⌈m/2⌉ - 1 keys.
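As a quick arithmetic check, the minimum-key rule can be written out (a tiny sketch; the integer ceiling of m/2 is computed as (m + 1) / 2, and the function name is illustrative):

```c
#include <assert.h>

/* Minimum keys in a non-root node of an order-m B-tree:
   ceil(m/2) - 1. For positive ints, ceil(m/2) == (m + 1) / 2. */
static int min_keys(int m)
{
    return (m + 1) / 2 - 1;
}
```

So for an order-5 B-tree, every non-root node holds at least ⌈5/2⌉ - 1 = 2 keys.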
Binary Tree
A binary tree is a tree in which a node can have at most two children; the name
binary itself suggests two, so each node can have either 0, 1, or 2 children.
The above tree is a binary tree because each node has at most two children. The
logical representation of the above tree is given below:
In the above tree, node 1 contains two pointers, a left and a right pointer, pointing
to its left and right children respectively. Node 2 has both children, so it too has two
pointers (left and right). Nodes 3, 5, and 6 are leaf nodes, so they contain NULL
pointers in both their left and right parts.
As we know, the maximum number of nodes in a binary tree of height h is:
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
log2(n + 1) = log2(2^(h+1))
log2(n + 1) = h + 1
h = log2(n + 1) - 1
And the minimum number of nodes in a binary tree of height h is h + 1, so:
n = h + 1
h = n - 1
The full binary tree is also known as a strict binary tree. A tree can only be
considered a full binary tree if each node has either 0 or 2 children; equivalently, it
is a tree in which every node except the leaf nodes has 2 children.
Its properties (counting height h in edges) are:
o The number of leaf nodes is equal to the number of internal nodes plus 1. In the
above example, the number of internal nodes is 5; therefore, the number of leaf
nodes is 6.
o The maximum number of nodes is the same as in any binary tree, i.e.,
2^(h+1) - 1.
o The minimum number of nodes in the full binary tree is 2h + 1.
o The minimum height of the full binary tree is log2(n+1) - 1.
o The maximum height of the full binary tree follows from the minimum node count:
n = 2h + 1
n - 1 = 2h
h = (n - 1) / 2
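These bounds can be checked numerically. A small sketch, assuming height h is counted in edges: a perfect binary tree of height h has 2^(h+1) - 1 nodes (the maximum), while the sparsest full binary tree of height h has 2h + 1 nodes (the minimum):

```c
#include <assert.h>

/* For height h counted in edges:
   - a perfect binary tree has 2^(h+1) - 1 nodes (the maximum);
   - the sparsest full binary tree has 2*h + 1 nodes (the minimum). */
static int max_nodes(int h)      { return (1 << (h + 1)) - 1; }
static int min_nodes_full(int h) { return 2 * h + 1; }
```

At h = 0 both formulas give 1 (a single root), and they diverge quickly as h grows.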
The above tree is a complete binary tree because all levels are completely filled
except possibly the last, and the nodes in the last level are filled from the left.
A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf
nodes are at the same level.
Let's look at a simple example of a perfect binary tree.
The below tree is not a perfect binary tree because all the leaf nodes are not at the same
level.
Note: All perfect binary trees are complete binary trees as well as full binary
trees, but the converse is not true: not every complete binary tree or full binary tree
is a perfect binary tree.
A degenerate binary tree is a tree in which every internal node has only one
child.
A balanced binary tree is a tree in which, at every node, the heights of the left and
right subtrees differ by at most 1. For example, AVL and Red-Black trees are balanced
binary trees.
A binary tree is implemented with the help of pointers. The first node in the tree is
represented by the root pointer. Each node in the tree consists of three parts: data,
a left pointer, and a right pointer. To create a binary tree, we first need to define the
node as a user-defined structure, as shown below:
struct node
{
    int data;
    struct node *left, *right;
};
In the above structure, data is the value, left pointer contains the address of the left
node, and right pointer contains the address of the right node.
#include <stdio.h>
#include <stdlib.h>

struct node
{
    int data;
    struct node *left, *right;
};

struct node *create(void);

int main()
{
    struct node *root;
    root = create();
    return 0;
}

struct node *create(void)
{
    struct node *temp;
    int data, choice;
    printf("Press 0 to exit");
    printf("\nPress 1 for new node");
    printf("\nEnter your choice : ");
    scanf("%d", &choice);
    if (choice == 0)
    {
        return NULL;
    }
    else
    {
        temp = (struct node *)malloc(sizeof(struct node));
        printf("Enter the data:");
        scanf("%d", &data);
        temp->data = data;
        printf("Enter the left child of %d", data);
        temp->left = create();
        printf("Enter the right child of %d", data);
        temp->right = create();
        return temp;
    }
}
The above code is calling the create() function recursively and creating new node on
each recursive call. When all the nodes are created, then it forms a binary tree structure.
The process of visiting the nodes is known as tree traversal. There are three types
of traversals used to visit a node:
o Inorder traversal
o Preorder traversal
o Postorder traversal
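The three traversal orders can be sketched as recursive functions (a minimal illustration assuming the node structure defined earlier; here each visit appends to a caller-supplied array so the order can be inspected):

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Each traversal visits the left subtree, the node itself, and the
   right subtree in a different order; out/n collect the sequence. */
static void inorder(const struct node *t, int *out, int *n)
{
    if (!t) return;
    inorder(t->left, out, n);      /* left  */
    out[(*n)++] = t->data;         /* node  */
    inorder(t->right, out, n);     /* right */
}

static void preorder(const struct node *t, int *out, int *n)
{
    if (!t) return;
    out[(*n)++] = t->data;         /* node  */
    preorder(t->left, out, n);     /* left  */
    preorder(t->right, out, n);    /* right */
}

static void postorder(const struct node *t, int *out, int *n)
{
    if (!t) return;
    postorder(t->left, out, n);    /* left  */
    postorder(t->right, out, n);   /* right */
    out[(*n)++] = t->data;         /* node  */
}

/* Small helper to build test trees. */
static struct node *mk(int v, struct node *l, struct node *r)
{
    struct node *nd = malloc(sizeof *nd);
    nd->data = v; nd->left = l; nd->right = r;
    return nd;
}
```

On a binary search tree, the inorder traversal yields the keys in ascending order, a fact used later to verify insertions and deletions.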
Searching
Before moving directly to the binary search tree, let's first see a brief
description of the tree.
What is a tree?
A tree is a kind of data structure that is used to represent data in
hierarchical form. It can be defined as a collection of objects or entities,
called nodes, that are linked together to simulate a hierarchy. A tree is a
non-linear data structure, as its data is not stored linearly or
sequentially.
Similarly, the left child of the root node is greater than its own left child
and smaller than its right child, so it also satisfies the property of a binary
search tree. Therefore, we can say that the tree in the above image is a
binary search tree.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert
it as the root of the left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as
the root of the right subtree.
Now, let's see the process of creating the Binary search tree using the given
data element. The process of creating the BST is shown below -
Step 1 - Insert 45 as the root of the tree.
Step 2 - Insert 15. As 15 is smaller than 45, insert it as the root node of the left subtree.
Step 3 - Insert 79. As 79 is greater than 45, insert it as the root node of the right subtree.
Step 4 - Insert 90. 90 is greater than 45 and 79, so it will be inserted as the right
subtree of 79.
Step 5 - Insert 10. 10 is smaller than 45 and 15, so it will be inserted as the left
subtree of 15.
Step 6 - Insert 55. 55 is greater than 45 but smaller than 79, so it will be inserted as
the left subtree of 79.
Step 7 - Insert 12. 12 is smaller than 45 and 15 but greater than 10, so it will be
inserted as the right subtree of 10.
Step 8 - Insert 20. 20 is smaller than 45 but greater than 15, so it will be inserted as
the right subtree of 15.
Step 9 - Insert 50. 50 is greater than 45 but smaller than 79 and 55, so it will be
inserted as the left subtree of 55.
Now, the creation of binary search tree is completed. After that, let's move
towards the operations that can be performed on Binary search tree.
We can perform insert, delete and search operations on the binary search
tree.
1. First, compare the element to be searched with the root element of the
tree.
2. If it matches the root element, return the root's location.
3. If it is not matched, then check whether the item is less than the root
element; if it is smaller than the root element, then move to the left
subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above steps recursively on the chosen subtree.
6. If the element is not found or not present in the tree, then return
NULL.
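The steps above translate directly into a recursive search (a minimal sketch assuming the usual node structure; it returns NULL when the item is absent):

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Recursive BST search following the comparison steps above:
   match at the root, else descend left or right by comparison. */
static struct node *bst_search(struct node *root, int item)
{
    if (root == NULL || root->data == item)
        return root;                       /* found, or not present */
    if (item < root->data)
        return bst_search(root->left, item);
    return bst_search(root->right, item);
}

/* Small helper to build test trees. */
static struct node *mk(int v, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    n->data = v; n->left = l; n->right = r;
    return n;
}
```

Each comparison discards one whole subtree, which is why search time is proportional to the height of the tree.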
Now, let's see the algorithm to search an element in the Binary search tree.
When the node to be deleted is a leaf node
This is the simplest case of deleting a node in a BST: we replace the leaf
node with NULL and simply free the allocated space.
We can see the process of deleting a leaf node from a BST in the below
image. Suppose we have to delete node 90; as the node to be deleted is a
leaf node, it is replaced with NULL, and its allocated space is freed.
When the node to be deleted has only one child
In this case, we replace the target node's value with its child's value and
then delete the child node. After the replacement, the child node holds the
value to be deleted, so we simply replace the child node with NULL and
free up the allocated space.
We can see the process of deleting a node with one child from BST in the
below image. In the below image, suppose we have to delete the node 79,
as the node to be deleted has only one child, so it will be replaced with its
child 55.
So, the replaced node 79 will now be a leaf node that can be easily deleted.
When the node to be deleted has two children
o First, find the inorder successor of the node to be deleted.
o After that, replace the node's value with the inorder successor's,
repeating until the target value is placed at a leaf of the tree.
o And at last, replace that leaf with NULL and free up the allocated
space.
The inorder successor is needed when the right child of the node is not
empty. We can obtain the inorder successor by finding the minimum
element in the right subtree of the node.
We can see the process of deleting a node with two children from BST in
the below image. In the below image, suppose we have to delete node 45
that is the root node, as the node to be deleted has two children, so it will
be replaced with its inorder successor. Now, node 45 will be at the leaf of
the tree so that it can be deleted easily.
Now, let's see the process of inserting a node into BST using an example.
The complexity of the Binary Search tree
Let's see the time and space complexity of the Binary search tree. We will
see the time complexity for insertion, deletion, and searching operations in
best case, average case, and worst case.
1. Time Complexity
Operations   Best case time complexity   Average case time complexity   Worst case time complexity
Insertion    O(log n)                    O(log n)                       O(n)
Deletion     O(log n)                    O(log n)                       O(n)
Search       O(log n)                    O(log n)                       O(n)
Here, we will see the inorder traversal of the tree to check whether the
nodes of the tree are in their proper location or not. We know that the
inorder traversal always gives us the data in ascending order. So, after
performing the insertion and deletion operations, we perform the inorder
traversal, and after traversing, if we get data in ascending order, then it is
clear that the nodes are in their proper location.
#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left;
    Node *right;
};

Node* create(int item)
{
    Node* node = new Node;
    node->data = item;
    node->left = node->right = NULL;
    return node;
}

/* Inorder traversal of the tree formed */
void inorder(Node *root)
{
    if (root == NULL)
        return;
    inorder(root->left);           // traverse left subtree
    cout << root->data << " ";     // visit root node
    inorder(root->right);          // traverse right subtree
}

Node* findMinimum(Node* cur)       /* To find the inorder successor */
{
    while (cur->left != NULL) {
        cur = cur->left;
    }
    return cur;
}

Node* insertion(Node* root, int item)   /* Insert a node */
{
    if (root == NULL)
        return create(item);       /* return new node if tree is empty */
    if (item < root->data)
        root->left = insertion(root->left, item);
    else
        root->right = insertion(root->right, item);
    return root;
}

void search(Node* &cur, int item, Node* &parent)
{
    while (cur != NULL && cur->data != item)
    {
        parent = cur;
        if (item < cur->data)
            cur = cur->left;
        else
            cur = cur->right;
    }
}

void deletion(Node*& root, int item)    /* function to delete a node */
{
    Node* parent = NULL;
    Node* cur = root;
    search(cur, item, parent);     /* find the node to be deleted */
    if (cur == NULL)
        return;
    if (cur->left == NULL && cur->right == NULL)   /* when node has no children */
    {
        if (cur != root)
        {
            if (parent->left == cur)
                parent->left = NULL;
            else
                parent->right = NULL;
        }
        else
            root = NULL;
        delete cur;
    }
    else if (cur->left && cur->right)
    {
        Node* succ = findMinimum(cur->right);
        int val = succ->data;
        deletion(root, succ->data);
        cur->data = val;
    }
    else
    {
        Node* child = (cur->left) ? cur->left : cur->right;
        if (cur != root)
        {
            if (cur == parent->left)
                parent->left = child;
            else
                parent->right = child;
        }
        else
            root = child;
        delete cur;
    }
}

int main()
{
    Node* root = NULL;
    root = insertion(root, 45);
    root = insertion(root, 30);
    root = insertion(root, 50);
    root = insertion(root, 25);
    root = insertion(root, 35);
    root = insertion(root, 45);
    root = insertion(root, 60);
    root = insertion(root, 4);
    cout << "The inorder traversal of the given binary tree is - \n";
    inorder(root);
    deletion(root, 25);
    cout << "\nAfter deleting node 25, the inorder traversal of the given binary tree is - \n";
    inorder(root);
    insertion(root, 2);
    cout << "\nAfter inserting node 2, the inorder traversal of the given binary tree is - \n";
    inorder(root);
    return 0;
}
Output

The inorder traversal of the given binary tree is - 
4 25 30 35 45 45 50 60
After deleting node 25, the inorder traversal of the given binary tree is - 
4 30 35 45 45 50 60
After inserting node 2, the inorder traversal of the given binary tree is - 
2 4 30 35 45 45 50 60
For example, in a family tree, each person has only one biological mother
and father, but they may have multiple grandparents, aunts, uncles, and
so on. In computer science and software programming, trees are often
used to represent the structure of HTML documents or file systems. They
can also be used to store data such as DNA sequences or mathematical
expressions. Trees are often implemented using pointers in programming
languages such as C++.
Types of Trees
After introducing trees in the data structure, we know they are used for
different purposes. Here is an overview of some of the most popular
types of trees in the data structure.
1. General Tree
A general tree is the most basic type of tree. It is made up of nodes that
can have any number of child nodes. There is no specific relationship
between the nodes; they can be traversed in any order. General trees are
used when the relationship between the nodes is not important.
2. Binary tree
A binary tree is a special type of tree where each node can
have no more than two child nodes. In the binary search tree variant, the
left child node is always less than the parent node, and the right child
node is always greater than or equal to the parent node. Binary trees are
used when the relationship between the nodes is important and needs to
be kept in order.
3. Binary Search Tree
A binary search tree (BST) is a binary tree where every node has a value
greater than all the values in its left subtree and less than all the values in
its right subtree. BSTs are used when quickly searching for a value in a
large dataset is important.
4. AVL Tree
5. Red-Black Tree
6. N-ary Tree
An n-ary tree (or k-ary tree) generalizes binary trees by allowing each
node to have up to k children instead of just 2. N-ary trees are used when
fast search times are important and the data does not fit well into a
traditional binary tree structure (i.e., when k > 2).
Root: The root is the topmost node in a tree. It does not have a parent and
typically has zero or more child nodes.
Child Node: A child node is any node that has a parent. Child nodes can have
their own children (sub-nodes), which makes them parent nodes as well.
Parent: A parent is a node that has at least one child node. Parent nodes can
also have their own parents (super-nodes), making them child nodes.
Sibling: Siblings are nodes that share the same parent node. They can be
thought of as "brothers and sisters" within the tree structure.
Leaf Node: A leaf node is any node with no child nodes. Leaf nodes are
typically the "end" of a tree branch.
Internal Nodes: An internal node is a node that has at least one child node.
Internal nodes are typically found "in-between" other nodes in a tree structure.
Ancestor Node: An ancestor node is any node that is on the path from the root
to the current node. Ancestor nodes can be thought of as "parents, grandparents,
etc."
Descendant: A descendant is a node that is a child, grandchild, great-
grandchild, etc., of the current node. In other words, a descendant is any node
that is "below" the current node in the tree structure.
Height of a Node: The height of a node is the number of edges from the node to
its deepest leaf descendant. To put it another way, it is the "distance" from the
node to the bottom of the tree.
Depth of a Node: The depth of a node is the number of edges from the root to
the node. Therefore, it is the "distance" from the root to the node.
Height of a Tree: The height of a tree is the height of its root node.
Applications
In computer science, the tree data structure can be used to store
hierarchical data. A tree traversal is a process of visiting each node in a
tree. This can be done in different ways like pre-order, post-order, or in-
order. Trees are also used to store data that naturally have hierarchical
relationships, like the file system on our computers. Besides that, trees
are also used in several applications like heaps, tries, and suffix trees.
Let's take a look at some of these applications:
2) Organize Data
Trees keep data organized for efficient insertion, deletion, and searching; a
balanced binary search tree, for example, supports these operations in O(log N) time.
3) Trie
A trie stores strings character by character, so that strings with a common
prefix share nodes. It provides an efficient way to retrieve strings from a
dataset with a time complexity of O(M), where M is the length of the string
retrieved. This makes it suitable for dictionary operations like autocomplete
and spell check, which have become very popular with internet users all over
the world.
4) Heap
A heap is a special type of binary tree that is complete (every level is filled
except possibly the last, which is filled from left to right), and in which
every node satisfies one heap property: the min-heap or the max-heap property.
The min-heap property states that every parent node must have a value less
than or equal to the values of its child nodes, while the max-heap property
specifies that the parent node's value must be greater than or equal to the
values of its child nodes.
5) B-Trees and T-Trees
B-trees and T-trees are two types of trees in the data structure that are
used to efficiently store large amounts of data. These trees are often
used in databases because they allow for quick insertion and deletion of
records while still maintaining fast access times.
6) Routing Table
Routers store their routing tables in tree structures, which allow fast
lookup of the next hop for a destination address.
It is worth distinguishing the plain binary tree from the binary search tree.
A binary tree is any tree in which each node has zero, one, or two child
nodes. A binary search tree is a binary tree with an additional ordering
constraint: for every node, the keys in its left subtree are smaller than the
node's key and the keys in its right subtree are larger.
This ordering is what makes binary search trees efficient for searching: at
each node, half of the remaining tree can be discarded, so search, insertion,
and deletion take O(log N) time when the tree is balanced. A plain binary
tree offers no such guarantee, and locating a value may require visiting
every node.
Both kinds of tree can be traversed in either a breadth-first or a depth-first
manner. A binary search tree has the extra property that an in-order
(depth-first) traversal visits the keys in sorted order.
1. Faster Operations
Trees offer quicker search, insertion, and deletion than linear data
structures such as linked lists because of their smaller depth. For example,
to delete an element from a linked list, you must traverse the list until you
find the element you want to delete, which takes O(n) time whether or not the
list is sorted. In a balanced tree, deleting a known value takes only
O(log n) time, since you can search for it along a single root-to-leaf path
and then remove it.
2. Flexibility
Trees do not have a fixed size like arrays, so they can grow and shrink as
needed, which makes them very flexible, especially when dealing with dynamic
data sets. For example, say you have an array of integers that can hold 100
elements, and you want to add a 101st element. There is no space left in the
array, so you have to create a larger array big enough to hold all 101
elements and then copy every element from the old array into the new one,
which is inefficient. A tree, in contrast, has no fixed size: you simply
allocate a new node and link it in, with no reallocation or copying, which
makes trees more flexible than arrays.
3. Space Efficiency
Trees only require extra space for pointers since each node only needs to
store the address or reference of its child nodes, unlike arrays which
require extra space for every single element even if some of those
elements are not used yet. For example, let's say we have an array of
integers that can hold 1000 elements, but we only store 500 values. Then
half of our array's memory would be wasted, which is not very space
efficient.
In contrast, with trees, since each node only needs extra space for the
address or reference of its child nodes, we don't waste any memory even
if some parts of our tree are empty.
Tree Data Structure
We have read about linear data structures such as arrays, linked lists, stacks and
queues, in which all the elements are arranged sequentially. Different data
structures are used for different kinds of data.
o Cost of operations: We want to minimize the cost of the most frequently
performed operations. For example, if we have a simple list on which we mainly
perform the search operation, we can store the elements in sorted order in an
array and use binary search. Binary search works very fast on such a list
because it divides the search space into half at every step.
o Memory usage: Sometimes, we want a data structure that utilizes less memory.
A tree is also one of the data structures that represent hierarchical data. Suppose we
want to show the employees and their positions in the hierarchical form then it can be
represented as shown below:
The above tree shows the organization hierarchy of some company. In this
structure, John is the CEO of the company, and John has two direct reports
named Steve and Rohan. Steve, a manager, has three direct reports named Lee,
Bob, and Ella. Bob has two direct reports named Sal and Emma. Emma has two
direct reports named Tom and Raj. Tom has one direct report named Bill. This particular
logical structure is known as a Tree. Its structure is similar to the real tree, so it is named
a Tree. In this structure, the root is at the top, and its branches are moving in a
downward direction. Therefore, we can say that the Tree data structure is an efficient
way of storing the data in a hierarchical way.
o A tree data structure is a non-linear data structure because it does not store in a
sequential manner. It is a hierarchical structure as elements in a Tree are arranged
in multiple levels.
o In the Tree data structure, the topmost node is known as a root node. Each node
contains some data, and data can be of any type. In the above tree structure, the
node contains the name of the employee, so the type of data would be a string.
o Each node contains some data and the link or reference of other nodes that can
be called children.
In the above structure, each node is labeled with some number. Each arrow shown in the
above figure is known as a link between the two nodes.
o Root: The root node is the topmost node in the tree hierarchy. In other words,
the root node is the one that doesn't have any parent. In the above structure,
node numbered 1 is the root node of the tree. If a node is directly linked to
some other node, it would be called a parent-child relationship.
o Child node: If the node is a descendant of any node, then the node is known as a
child node.
o Parent: If the node contains any sub-node, then that node is said to be the
parent of that sub-node.
o Sibling: The nodes that have the same parent are known as siblings.
o Leaf Node:- The node of the tree, which doesn't have any child node, is called a
leaf node. A leaf node is the bottom-most node of the tree. There can be any
number of leaf nodes present in a general tree. Leaf nodes can also be called
external nodes.
o Internal nodes: A node that has at least one child node is known as an internal node.
o Number of edges: If there are n nodes, then there will be n-1 edges. Each arrow
in the structure represents a link or path. Each node, except the root node,
has exactly one incoming link, known as an edge, from its parent; there is one
link for each parent-child relationship.
o Depth of node x: The depth of node x can be defined as the length of the path
from the root to the node x. One edge contributes one-unit length in the path.
So, the depth of node x can also be defined as the number of edges between the
root node and the node x. The root node has 0 depth.
o Height of node x: The height of node x can be defined as the longest path from
the node x to the leaf node.
Based on the properties of the Tree data structure, trees are classified into various
categories.
Implementation of Tree
The tree data structure can be created by creating the nodes dynamically with the help
of the pointers. The tree in the memory can be represented as shown below:
The above figure shows the representation of the tree data structure in the memory. In
the above structure, the node contains three fields. The second field stores the data; the
first field stores the address of the left child, and the third field stores the address of the
right child.
struct node
{
    int data;
    struct node *left;
    struct node *right;
};
The above structure can only be defined for binary trees, because a binary tree
can have at most two children while a generic tree can have more than two. The
node structure for a generic tree would therefore be different from that of a
binary tree.
Applications of trees
The following are the applications of trees:
o Storing naturally hierarchical data: Trees are used to store the data in the
hierarchical structure. For example, the file system. The file system stored on the
disc drive, the file and folder are in the form of the naturally hierarchical data and
stored in the form of trees.
o Organize data: It is used to organize data for efficient insertion, deletion and
searching. For example, a balanced binary search tree supports searching for an
element in O(log N) time.
o Trie: It is a special kind of tree that is used to store the dictionary. It is a fast and
efficient way for dynamic spell checking.
o B-Tree and B+Tree: B-Tree and B+Tree are the tree data structures used to
implement indexing in databases.
o Routing table: The tree data structure is also used to store the data in routing
tables in the routers.
o General tree: The general tree is one of the types of tree data structure. In a
general tree, a node can have any number of children, from 0 up to n; no
restriction is imposed on the degree of a node (the number of children that a
node can have). The topmost node in a general tree is known as the root node,
and the children of each node form its subtrees.
There can be n subtrees in a general tree. In a general tree, the subtrees are
unordered: the children of a node have no defined left-to-right order.
Every non-empty tree has a downward edge, and these edges are connected to
the nodes known as child nodes. The root node is labeled with level 0. The nodes
that have the same parent are known as siblings.
o Binary tree: The name binary itself suggests two. In a binary tree, each node
can have at most two child nodes; that is, a node has 0, 1, or 2 children.
To know more about the binary tree, click on the link given below:
https://www.javatpoint.com/binary-tree
o Binary Search tree: A binary search tree is a non-linear, node-based data
structure. A node in a binary search tree has three fields, i.e., the data part,
the left child, and the right child. A node can be connected to at most two
child nodes, so the node contains two pointers (a left child and a right child
pointer).
Every node in the left subtree must contain a value less than the value of the root
node, and the value of each node in the right subtree must be bigger than the
value of the root node.
A node can be created with the help of a user-defined data type known as struct, as
shown below:
struct node
{
    int data;
    struct node *left;
    struct node *right;
};
The above is the node structure with three fields: the data field, the left
pointer of node type, and the right pointer of node type.
To know more about the binary search tree, click on the link given below:
https://www.javatpoint.com/binary-search-tree
o AVL tree
It is one of the types of the binary tree, or we can say that it is a variant of the binary
search tree. AVL tree satisfies the property of the binary tree as well as of the binary
search tree. It is a self-balancing binary search tree that was invented by
Adelson-Velsky and Landis. Here, self-balancing means keeping the heights of
the left subtree and right subtree balanced. This balancing is measured in
terms of the balancing factor.
We can consider a tree as an AVL tree if the tree obeys the binary search tree as well as
a balancing factor. The balancing factor can be defined as the difference between the
height of the left subtree and the height of the right subtree. The balancing factor's
value must be either 0, -1, or 1; therefore, each node in the AVL tree should have the
value of the balancing factor either as 0, -1, or 1.
To know more about the AVL tree, click on the link given below:
https://www.javatpoint.com/avl-tree
o Red-Black Tree
The Red-Black tree is a binary search tree, so the prerequisite for the
Red-Black tree is knowing about binary search trees. In a binary search tree,
the values in the left subtree of a node are less than the value of that node,
and the values in the right subtree are greater than the value of that node.
Search in a binary search tree takes O(log n) time in the average case, O(1) in
the best case, and O(n) in the worst case, when the tree degenerates into a list.
When any operation is performed on the tree, we want our tree to be balanced so that
all the operations like searching, insertion, deletion, etc., take less time, and all these
operations will have the time complexity of log2n.
The red-black tree is a self-balancing binary search tree. The AVL tree is also
a height-balancing binary search tree, so why do we need a Red-Black tree? In
an AVL tree, we do not know in advance how many rotations will be required to
balance the tree, whereas in a Red-Black tree, a maximum of 2 rotations are
required to restore balance after an insertion. Each node carries one extra bit
representing the red or black color of the node, which is used to ensure the
balancing of the tree.
o Splay tree
The splay tree data structure is also a binary search tree, in which the most
recently accessed element is moved to the root position of the tree by
performing rotation operations. This moving of an accessed node to the root is
called splaying. It is a self-balancing binary search tree with no explicit
balance condition like the AVL tree's balance factor.
The height of a splay tree may not be balanced, i.e., the heights of the left
and right subtrees may differ, but the operations on a splay tree take
O(log n) amortized time, where n is the number of nodes. A splay tree is
balanced in this amortized sense rather than height-balanced: the rotations
performed after each access keep the tree efficient over a sequence of
operations.
o Treap
The Treap data structure combines the Tree and Heap data structures, so it
comprises the properties of both. In a binary search tree, each node in the
left subtree must have a value less than or equal to the value of the root
node, and each node in the right subtree must have a value greater than or
equal to it. In a min-heap, both the right and left subtrees contain larger
keys than the root; therefore, the root node contains the lowest value.
In the treap data structure, each node has both a key and a priority, where
the key obeys the binary search tree property and the priority obeys the heap
property.
The Treap data structure follows two properties, which are given below:
o Right child of a node >= current node and left child of a node <= current node
(binary search tree property on the keys)
o Each node's priority satisfies the heap property with respect to its
children's priorities (heap property on the priorities)
o B-tree
B-tree is a balanced m-way tree where m defines the order of the tree. Till now, we read
that the node contains only one key but b-tree can have more than one key, and more
than 2 children. It always maintains the sorted data. In binary tree, it is possible that leaf
nodes can be at different levels, but in b-tree, all the leaf nodes must be at the same
level.
o A leaf node has 0 children; the root node has a minimum of 2 children; and an
internal node has a minimum of ceiling(m/2) children. For example, if the
value of m is 5, a node can have at most 5 children, and an internal node
must have at least 3 children.
The root node must contain a minimum of 1 key, and all other nodes must
contain at least ceiling(m/2) - 1 keys.
Graph
A graph can be defined as a group of vertices and the edges that are used to
connect these vertices. Unlike a tree, a graph may contain cycles, and its
vertices (nodes) can maintain any complex relationship among them instead of a
strict parent-child relationship.
Definition
A graph G can be defined as an ordered set G(V, E) where V(G) represents the set of
vertices and E(G) represents the set of edges which are used to connect these vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E), (E,D), (D,B),
(D,A)) is shown in the following figure.
In a directed graph, edges form an ordered pair. Edges represent a specific path from
some vertex A to another vertex B. Node A is called initial node while node B is called
terminal node.
Closed Path
A path is called a closed path if its initial node is the same as its terminal
node, that is, if V0 = VN.
Simple Path
If all the nodes of the path are distinct, with the possible exception of
V0 = VN, then the path P is called a simple path (a closed simple path when
V0 = VN).
Cycle
A cycle can be defined as the path which has no repeated edges or vertices except the
first and last vertices.
Connected Graph
A connected graph is the one in which some path exists between every two vertices (u,
v) in V. There are no isolated nodes in connected graph.
Complete Graph
A complete graph is one in which every node is connected to every other node. A
complete graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned some data, such as a length or
weight. The weight of an edge e is written w(e); it is a positive value
indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph in which each edge of the graph is associated with some
direction and the traversing can be done only in the specified direction.
Loop
An edge whose two endpoints are the same node is called a loop.
Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and v are called as
neighbours or adjacent nodes.
A graph is a data structure that consists of a set of vertices (called nodes)
and a set of edges. There are two ways to store a graph in the computer's
memory: sequential representation (using an adjacency matrix) and linked
representation (using an adjacency list).
Now, let's start discussing the ways of representing a graph in the data structure.
Sequential representation
In sequential representation, there is a use of an adjacency matrix to represent the
mapping between vertices and edges of the graph. We can use an adjacency matrix to
represent the undirected graph, directed graph, weighted directed graph, and weighted
undirected graph.
If adj[i][j] = w, it means that there exists an edge from vertex i to vertex j
with weight w. For an unweighted graph, the entries of the adjacency matrix
are defined as:
aij = 1 {if there is an edge from vertex i to vertex j}
aij = 0 {Otherwise}
In the above figure, an image shows the mapping among the vertices (A, B, C, D, E), and
this mapping is represented by using the adjacency matrix.
There exist different adjacency matrices for the directed and undirected graph. In a
directed graph, an entry Aij will be 1 only when there is an edge directed from Vi to Vj.
Consider the below-directed graph and try to construct the adjacency matrix of it.
In the above graph, we can see there is no self-loop, so the diagonal entries
of the adjacency matrix are 0.
In the above image, we can see that the adjacency matrix representation of the
weighted directed graph is different from other representations. It is because, in this
representation, the non-zero values are replaced by the actual weight assigned to the
edges.
An adjacency matrix is easier to implement and follow. An adjacency matrix can
be used when the graph is dense and the number of edges is large.
In the above figure, we can see that there is a linked list, or adjacency list,
for every node of the graph. From vertex A, there are edges to vertex B and
vertex D, so these nodes are linked to node A in the given adjacency list.
An adjacency list is maintained for each node present in the graph, which stores the
node value and a pointer to the next adjacent node to the respective node. If all the
adjacent nodes are traversed, then store the NULL in the pointer field of the last node of
the list.
The sum of the lengths of adjacency lists is equal to twice the number of edges present
in an undirected graph.
Now, consider the directed graph, and let's see the adjacency list representation of that
graph.
For a directed graph, the sum of the lengths of adjacency lists is equal to the number of
edges present in the graph.
Now, consider the weighted directed graph, and let's see the adjacency list
representation of that graph.
In the case of a weighted directed graph, each list node contains an extra
field that stores the weight of the edge.
In an adjacency list, it is easy to add a vertex. Because of using the linked list, it also
saves space.
Here, there are four vertices and five edges in the graph that are non-directed.
Introduction to Graphs – Data Structure and
Algorithm
A Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes
also referred to as nodes and the edges are lines or arcs that connect any two nodes in the graph.
More formally a Graph is composed of a set of vertices( V ) and a set of edges( E ). The graph is
denoted by G(V, E).
Graph data structures are a powerful tool for representing and analyzing complex relationships
between objects or entities. They are particularly useful in fields such as social network analysis,
recommendation systems, and computer networks. In the field of sports data science, graph data
structures can be used to analyze and understand the dynamics of team performance and player
interactions on the field.
Components of a Graph
Vertices: Vertices are the fundamental units of the graph; they are also known
as nodes. Every node/vertex can be labelled or unlabelled.
Edges: Edges are drawn or used to connect two nodes of the graph. In a directed
graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in
any possible way; there are no rules. Sometimes, edges are also known as arcs.
Every edge can be labelled or unlabelled.
Types Of Graph
1. Null Graph
A graph is known as a null graph if there are no edges in the graph.
2. Trivial Graph
A graph having only a single vertex; it is also the smallest graph possible.
3. Undirected Graph
A graph in which edges do not have any direction; that is, the nodes are
unordered pairs in the definition of every edge.
4. Directed Graph
A graph in which every edge has a direction; that is, the nodes are ordered
pairs in the definition of every edge.
5. Connected Graph
A graph in which, starting from any one node, we can visit every other node is
known as a connected graph.
6. Disconnected Graph
A graph in which at least one node is not reachable from some other node is
known as a disconnected graph.
7. Regular Graph
A graph in which the degree of every vertex is equal to K is called a K-regular graph.
8. Complete Graph
A graph in which there is an edge from each node to every other node.
9. Cycle Graph
A graph that itself forms a cycle; the degree of each vertex is 2.
10. Cyclic Graph
A graph containing at least one cycle is known as a Cyclic graph.
11. Directed Acyclic Graph
A Directed Graph that does not contain any cycle.
12. Bipartite Graph
A graph whose vertices can be divided into two sets such that no edge connects
two vertices within the same set.
Adjacency List
In this representation, the graph is stored as a collection of linked lists.
There is an array of pointers, where each entry points to the list of vertices
adjacent to that vertex.
Advantages:
1. Graphs are a versatile data structure that can be used to represent a wide range
of relationships and data structures.
2. They can be used to model and solve a wide range of problems, including
pathfinding, data clustering, network analysis, and machine learning.
3. Graph algorithms are often very efficient and can be used to solve complex
problems quickly and effectively.
4. Graphs can be used to represent complex data structures in a simple and intuitive
way, making them easier to understand and analyze.
Disadvantages:
1. Graphs can be complex and difficult to understand, especially for people who are
not familiar with graph theory or related algorithms.
2. Creating and manipulating graphs can be computationally expensive, especially
for very large or complex graphs.
3. Graph algorithms can be difficult to design and implement correctly, and can be
prone to bugs and errors.
4. Graphs can be difficult to visualize and analyze, especially for very large or
complex graphs, which can make it challenging to extract meaningful insights
from the data.
Applications of Graphs in Data Structure
Graphs data structures have a variety of applications. Some of the most popular
applications are:
What if you are provided with a graph of nodes where every node is linked to
several other nodes with varying distances, and you begin from one of the
nodes in the graph: how do you reach another node by the shortest route?
Simply explained, an algorithm that is used for finding the shortest distance,
or path, from a starting node to a target node in a weighted graph is known as
Dijkstra's Algorithm.
This algorithm makes a tree of shortest paths from the starting node, the
source, to all other nodes in the graph. Dijkstra's algorithm makes use of the
weights of the edges to find the path that minimizes the total distance
(weight) between the source node and all other nodes.
Dijkstra's algorithm is an iterative process that provides us with the
shortest path from one specific starting node to all other nodes of a graph.
It is different from the minimum spanning tree, as the shortest path between
two vertices might not include all the vertices of the graph. The edge weights
must be positive because, during the execution, the weights of the edges are
added up to find the shortest path.
Therefore, if any of the weights on the edges of the graph are negative, the
algorithm will not work properly. However, some algorithms, like the
Bellman-Ford algorithm, can handle negative edge weights.
It is also a known fact that breadth-first search (BFS) can be used to
calculate the shortest path for an unweighted graph, or for a weighted graph
in which every edge has the same cost. But if the weighted graph has unequal
costs on its edges, then BFS generalizes to uniform-cost search: instead of
expanding nodes in order of their depth from the root, uniform-cost search
expands the nodes in order of their cost from the root. Dijkstra's algorithm
can be seen as a variant of this idea, in which an overestimate of the
accurate distance of each node is steadily replaced by more suitable values
until the shortest distance is reached; each node's tentative distance is
replaced by the minimum of its previous value and the length of a newly
determined path.
It uses a priority queue to greedily choose the nearest node that has not been
visited yet.
Example Involved
For example, an individual wants to calculate the shortest distance between the source, A,
and the destination, D, while calculating a subpath which is also the shortest path
between its source and destination. Let’s see here how Dijkstra’s algorithm works;
It works on the fact that any subpath, say a subpath from B to D, of the
shortest path between vertices A and D is also the shortest path between
vertices B and D; i.e., every subpath of a shortest path is itself a shortest
path.
Dijkstra's algorithm uses this property in the reverse direction: while
determining distances, we overestimate the distance of each vertex from the
starting vertex, then inspect each node and its neighbours to detect the
shortest subpath to those neighbours.
In this way the algorithm deploys a greedy approach: it searches for the next
plausible step and expects that the end result will be the appropriate
solution for the entire problem.
Before proceeding to the step-by-step process for implementing the algorithm,
let us look at how it works in outline.
Dijkstra's algorithm begins from the selected node, the source node, and
examines the entire graph to determine the shortest path between that node
and all the other nodes.
The algorithm keeps track of the currently known shortest distance from each
node to the source node and updates these values if it identifies a shorter
path.
Once the algorithm has determined the shortest path between the source node
and another node, that node is marked as "visited" and added to the path.
This process continues until all the nodes in the graph have been added to the
path; in this way, a path is created that connects the source node to all the
other nodes.
1. Pick the starting node (the source),
2. Mark the picked starting node with a current distance of 0 and the rest of
the nodes with infinity,
3. Set the starting node as the current node,
4. For the current node, analyse all of its unvisited neighbours and measure
their distances by adding the current distance of the current node to the
weight of the connecting edge,
5. Compare the recently measured distance with the current distance assigned
to the neighbouring node and make the smaller one the new current distance
of the neighbouring node,
6. After all of the unvisited neighbours of the current node have been
considered, mark the current node as visited,
7. If the destination node has been marked visited, then stop; the algorithm
has ended, and
8. Else, choose the unvisited node that is marked with the least distance, set
it as the new current node, and repeat the process again from step 4.
In the above section, you have seen the step-by-step process of Dijkstra's
algorithm; now let us trace it on an example, starting from node C.
1. During the execution of the algorithm, each node will be marked with its
minimum distance from the starting node. In this case, the minimum distance
is 0 for node C itself. For the rest of the nodes, as we don't yet know
this distance, they will be marked as infinity (∞), except node C, which is
set as the current node.
2. Now the neighbours of node C will be checked, i.e., nodes A, B, and D. We
start with B: we add the minimum distance of the current node (0) to the
weight of the edge (7) that links node C to node B and get 0 + 7 = 7.
Now, this value will be compared with the minimum distance of B (infinity);
the least value is the one that remains the minimum distance of B, which in
this case is 7.
3. The same is done for node A: we add 0 to the weight of the edge that
connects node C to A, and get 1. Again, 1 is compared with the minimum
distance of A (infinity), and the smaller value, 1, is kept.
Since all the neighbours of node C have been checked, node C is marked as
visited.
4. Now, we will select the new current node such that the node must be
unvisited with the lowest minimum distance. Here, node A is the unvisited
node with minimum distance 1, so it is marked as the current node.
We repeat the algorithm, checking the neighbours of the current node: adding
A's distance to the weight of the edge from A to B, we obtain 4. This
value, 4, will be compared with the minimum distance of B, 7, and the
smaller value, 4, is kept.
5. After this, node A is marked as visited with a green check mark. The
current node is selected as node D, as it is unvisited and has the smallest
recent distance. We repeat the process for its neighbour E, compare the
result with E's minimum distance, which is infinity, and mark the smallest
value for node E as 9. Node D is then marked as visited.
6. The current node is set as node B; here we need to check only node E, as it
is unvisited and node D is visited. We obtain 4 + 1 = 5, compare it with
E's minimum distance, 9, and keep the smaller value, 5. We mark B as a
visited node with a green check mark, and node E is set as the current
node.
7. Since node E doesn't have any unvisited neighbours, there is no requirement
to check anything further.
So, we are done, as no unvisited node is left. The minimum distance of each
node now represents the shortest distance from the source node C.
Dijkstra's algorithm is one such algorithm that can help us in real-world applications.
For example, if a person wants to travel from city A to city B, where both cities are
connected by various routes, which route should they choose?
Undoubtedly, we would adopt the route through which we could reach the destination
with the least distance or time. As discussed, the algorithm has various real-world use
cases; some of the applications are the following:
In map applications such as Google Maps, it is widely deployed for measuring the least possible
distance and directions between two geographical regions, with map
locations corresponding to the vertices of a graph, and for calculating traffic and delay timing.
In telephone networks, it is also extensively implemented for routing data in the
networking and telecommunication domains and for reducing the obstacles to
transmission.
Wherever there is a need for shortest path solutions, the
algorithm is applied.
Besides that, other applications include accounting for road conditions and for road closures and construction.
1. One of its main advantages is its low time complexity, which is nearly linear:
O(E log V) with a binary heap.
2. It can be used to calculate the shortest path from a single source node to all other
nodes, or from a single source node to a single destination node by stopping the
algorithm once the shortest distance is achieved for the destination node.
3. It works only for weighted graphs, and all edges must have non-
negative values.
4. It cannot handle negative edge weights, so it may fail to obtain the accurate
shortest path in graphs that contain them.
In data structures,
Shortest path problem is a problem of finding the shortest path(s) between
vertices of a given graph.
Shortest path between two vertices is a path that has the least cost as
compared to all other existing paths.
Shortest path algorithms are a family of algorithms used for solving the
shortest path problem.
Applications-
Single-Pair Shortest Path Problem-
It is a shortest path problem where the shortest path between a given pair of
vertices is computed.
A* Search Algorithm is a famous algorithm used for solving the single-pair
shortest path problem.
Single-Destination Shortest Path Problem-
It is a shortest path problem where the shortest path from all the vertices to
a single destination vertex is computed.
By reversing the direction of each edge in the graph, this problem reduces
to the single-source shortest path problem.
Dijkstra's Algorithm is a famous algorithm adapted for solving the single-
destination shortest path problem.
All Pairs Shortest Path Problem-
It is a shortest path problem where the shortest path between every pair of
vertices is computed.
Floyd-Warshall Algorithm and Johnson's Algorithm are the famous
algorithms used for solving the all pairs shortest path problem.
The shortest path problem is the problem of finding a path between two vertices (or
nodes) in a graph such that the sum of the weights of its constituent edges is
minimized. The shortest path between any two nodes of the graph can be found
using many algorithms, such as Dijkstra's algorithm, the Bellman-Ford algorithm, and the
Floyd-Warshall algorithm. There are some properties of finding the shortest paths based on
which the algorithm to find the shortest path works:
1. Optimal Substructure Property
All the sub-paths of the shortest path must also be the shortest paths.
If the shortest path between two nodes U and V passes through a node X, then
the portions of that path from U to X and from X to V must themselves be
shortest paths between those pairs of nodes.
All the algorithms listed above work based on this property.
For example, let P1 be a sub-path from (X → Y) of the shortest path (S →X
→Y → V) of graph G. And let P2 be any other path (X → Y) in graph G.
Then, the cost of P1 must be less than or equal to the cost of P2.
Otherwise, the path (S →X →Y → V) will not be the shortest path between
nodes S and V.
Graph G
2. Triangle Inequality
Let d(a, b) be the length of the shortest path from a to b in graph G. Then, for any vertex x,
d(a, b) ≤ d(a, x) + d(x, b)
Dijkstra Algorithm-
Conditions-
Dijkstra's algorithm works only for connected, weighted graphs in which no edge has a
negative weight. It works on directed as well as undirected graphs.
Dijkstra Algorithm-
dist[S] ← 0 // The distance to the source vertex S is set to 0
for all v ∈ V - {S} // For all other vertices
do dist[v] ← ∞ // Their distance is initially unknown, i.e. infinity
S ← ∅ // The set of vertices that have been visited 'S' is initially empty
Q ← V // The queue 'Q' initially contains all the vertices of the graph
while Q ≠ ∅ // While the queue is not empty
do u ← mindistance (Q, dist) // A vertex from Q with the least distance is selected
S ← S ∪ {u} // Vertex 'u' is added to 'S' list of vertices that have been visited
for all v ∈ neighbors[u] // For all the neighboring vertices of vertex 'u'
do if dist[v] > dist[u] + w(u,v) // If there is a shorter path to 'v' through 'u'
then dist[v] ← dist[u] + w(u,v) // The new value of the shortest path is selected
return dist
Implementation-
Step-01:
One set contains all those vertices which have been included in the shortest path tree.
In the beginning, this set is empty.
Other set contains all those vertices which are still left to be included in the shortest path tree.
In the beginning, this set contains all the vertices of the given graph.
Step-02:
For each vertex of the given graph, two variables are defined as-
Π[v], which denotes the predecessor of vertex 'v' on the shortest path from the source, and
d[v], which denotes the shortest path estimate of vertex 'v' from the source vertex.
Initially-
The value of variable ‘Π’ for each vertex is set to NIL i.e. Π[v] = NIL
The value of variable ‘d’ for the source vertex is set to 0 i.e. d[S] = 0
The value of variable ‘d’ for the remaining vertices is set to ∞ i.e. d[v] = ∞
Step-03:
The following procedure is repeated until all the vertices of the graph are processed-
Among unprocessed vertices, a vertex with minimum value of variable ‘d’ is chosen.
Its outgoing edges are relaxed.
After relaxing the edges for that vertex, the sets created in step-01 are updated.
Here, d[a] and d[b] denote the shortest path estimates for vertices a and b respectively from the
source vertex ‘S’, and w(a,b) denotes the weight of the edge from a to b.
Now, two cases arise while relaxing the edge (a,b)-
Case-01:
Here, d[b] ≤ d[a] + w(a,b), so the estimate d[b] is already at least as good and no change is made.
Case-02:
Here, d[b] > d[a] + w(a,b), so d[b] is updated to d[a] + w(a,b) and Π[b] is set to a.
With adjacency list representation, all vertices of the graph can be traversed using BFS in
O(V+E) time.
In min heap, operations like extract-min and decrease-key value takes O(logV) time.
So, overall time complexity becomes O(E+V) x O(logV) which is O((E + V) x logV) = O(ElogV)
This time complexity can be reduced to O(E+VlogV) using Fibonacci heap.
Problem-
Using Dijkstra’s Algorithm, find the shortest distance from source vertex ‘S’ to remaining vertices
in the following graph-
Also, write the order in which the vertices are visited.
Solution-
Step-01:
Unvisited set : {S , a , b , c , d , e}
Visited set : { }
Step-02:
The two variables Π and d are created for each vertex and initialized as-
Step-03:
Now,
d[S] + 1 = 0 + 1 = 1 < ∞
∴ d[a] = 1 and Π[a] = S
d[S] + 5 = 0 + 5 = 5 < ∞
∴ d[b] = 5 and Π[b] = S
Unvisited set : {a , b , c , d , e}
Visited set : {S}
Step-04:
d[a] + 2 = 1 + 2 = 3 < ∞
∴ d[c] = 3 and Π[c] = a
d[a] + 1 = 1 + 1 = 2 < ∞
∴ d[d] = 2 and Π[d] = a
d[a] + 2 = 1 + 2 = 3 < 5
∴ d[b] = 3 and Π[b] = a
Unvisited set : {b , c , d , e}
Visited set : {S , a}
Step-05:
d[d] + 2 = 2 + 2 = 4 < ∞
∴ d[e] = 4 and Π[e] = d
Unvisited set : {b , c , e}
Visited set : {S , a , d}
Step-06:
Now,
d[b] + 2 = 3 + 2 = 5 > 2
∴ No change
After edge relaxation, our shortest path tree remains the same as in Step-05.
Unvisited set : {c , e}
Visited set : {S , a , d , b}
Step-07:
d[c] + 1 = 3 + 1 = 4 = 4
∴ No change
After edge relaxation, our shortest path tree remains the same as in Step-05.
Step-08:
Unvisited set : { }
Visited set : {S , a , d , b , c , e}
Now,
All vertices of the graph are processed.
Our final shortest path tree is as shown below.
It represents the shortest path from source vertex ‘S’ to all other remaining vertices.
S , a , d , b , c , e.
What is Sorting?
Sorting is a process of ordering or placing a list of elements from a collection in
some kind of order. It is nothing but storage of data in sorted order. Sorting can be
done in ascending and descending order. It arranges the data in a sequence which
makes searching easier.
Employee No.
Employee Name
Employee Salary
Department Name
Here, employee no. can be taken as the key for sorting the records in ascending or
descending order. Now, suppose we have to search for an employee with employee no. 116;
then we don't need to search the complete set of records. We can simply search among the
employees with employee no. 100 to 120.
Sorting Techniques
The choice of sorting technique depends on the situation, in particular on two parameters.
1. Execution time, i.e., the time taken for the execution of the program.
2. Space, i.e., the memory taken by the program.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Heap Sort
1. Bubble Sort
Bubble sort is a simple comparison-based sorting technique.
It is used for sorting 'n' (number of items) elements.
It compares adjacent elements one by one and swaps them based on their values.
The above diagram represents how bubble sort actually works. This sort takes
O(n²) time. It starts with the first two elements and sorts them in ascending order.
Bubble sort starts with the first two elements and compares them to check which
one is greater.
In the above diagram, element 40 is greater than 10, so these values must be
swapped. This operation continues until the array is sorted in ascending order.
Example: Program for Bubble Sort
#include <stdio.h>
void bubble_sort(long [], long);
int main()
{
long array[100], n, c;
printf("Enter number of elements\n");
scanf("%ld", &n);
printf("Enter %ld integers\n", n);
for (c = 0; c < n; c++)
scanf("%ld", &array[c]);
bubble_sort(array, n);
printf("Sorted list in ascending order:\n");
for ( c = 0 ; c < n ; c++ )
printf("%ld\n", array[c]);
return 0;
}
void bubble_sort(long list[], long n)
{
long c, d, t;
for (c = 0 ; c < ( n - 1 ); c++)
{
for (d = 0 ; d < n - c - 1; d++)
{
if (list[d] > list[d+1])
{
/* Swapping */
t = list[d];
list[d] = list[d+1];
list[d+1] = t;
}
}
}
}
Output:
2. Insertion Sort
Insertion sort is a simple sorting algorithm.
This sorting method sorts the array by shifting elements one by one.
It builds the final sorted array one item at a time.
Insertion sort has one of the simplest implementations.
This sort is efficient for smaller data sets but inefficient for larger lists.
Like bubble sort, it has low space complexity.
It requires only a single additional memory space.
Insertion sort is stable, i.e. it does not change the relative order of elements
with equal keys.
The above diagram represents how insertion sort works. Insertion sort works
the way we sort playing cards in our hands. It always starts with the second
element as the key. The key is compared with the elements before it and is put
in the right place.
In the above figure, 40 has nothing before it. Element 10 is compared to 40 and is
inserted before 40. Element 9 is smaller than 40 and 10, so it is inserted before 10
and this operation continues until the array is sorted in ascending order.
Example: Program for Insertion Sort
#include <stdio.h>
int main()
{
int n, array[1000], c, d, t;
printf("Enter number of elements\n");
scanf("%d", &n);
printf("Enter %d integers\n", n);
for (c = 0; c < n; c++)
{
scanf("%d", &array[c]);
}
for (c = 1 ; c <= n - 1; c++)
{
d = c;
while ( d > 0 && array[d] < array[d-1])
{
t = array[d];
array[d] = array[d-1];
array[d-1] = t;
d--;
}
}
printf("Sorted list in ascending order:\n");
for (c = 0; c <= n - 1; c++)
{
printf("%d\n", array[c]);
}
return 0;
}
Output:
Selection Sort
Selection sort is a simple sorting algorithm which finds the smallest element in the
array and exchanges it with the element in the first position. It then finds the second
smallest element and exchanges it with the element in the second position, and
continues in this way until the entire array is sorted.
In the above diagram, the smallest element, 9, is found in the first pass and is
placed at the first position. In the second pass, the smallest element is searched for
among the rest of the elements, excluding the first. Selection sort keeps doing this until
the array is sorted.
Example: Program for Selection Sort
#include <stdio.h>
int main()
{
int array[100], n, c, d, position, swap;
printf("Enter number of elements\n");
scanf("%d", &n);
printf("Enter %d integers\n", n);
for ( c = 0 ; c < n ; c++ )
scanf("%d", &array[c]);
for ( c = 0 ; c < ( n - 1 ) ; c++ )
{
position = c;
for ( d = c + 1 ; d < n ; d++ )
{
if ( array[position] > array[d] )
position = d;
}
if ( position != c )
{
swap = array[c];
array[c] = array[position];
array[position] = swap;
}
}
printf("Sorted list in ascending order:\n");
for ( c = 0 ; c < n ; c++ )
printf("%d\n", array[c]);
return 0;
}
Output:
Quick Sort
Quick sort is also known as Partition-exchange sort based on the rule of Divide
and Conquer.
It is a highly efficient sorting algorithm.
On average, quick sort is among the quickest comparison-based sorting algorithms.
It is very fast and requires little additional space: only O(log n) auxiliary stack space on average.
Quick sort picks an element as pivot and partitions the array around the picked
pivot.
There are different versions of quick sort which choose the pivot in
different ways:
1. First element as pivot
2. Last element as pivot
3. Random element as pivot
4. Median as pivot
Step 1: Choose a value from the list as the pivot.
Step 2: Take two variables to point left and right of the list excluding pivot.
Step 3: Left points to the low index.
Step 4: Right points to the high index.
Step 5: While value at left < (Less than) pivot move right.
Step 6: While value at right > (Greater than) pivot move left.
Step 7: If both Step 5 and Step 6 do not match, swap left and right.
Step 8: If left ≥ (Greater than or Equal to) right, the point where they met is the new
pivot.
The above diagram represents how to find the pivot value in an array. As we can see,
the pivot value divides the list into two parts (partitions), and then each part is
processed recursively by quick sort, calling the partition
function again.
Output:
Heap Sort
1. Shape Property
2. Heap Property
1. The shape property says that all levels of the tree are fully filled, except
possibly the last, which is filled from left to right. A heap data structure is
therefore a complete binary tree.
2. Heap property is a binary tree with special characteristics. It can be classified
into two types:
I. Max-Heap
II. Min Heap
I. Max Heap: If the parent nodes are greater than their child nodes, it is called
a Max-Heap.
II. Min Heap: If the parent nodes are smaller than their child nodes, it is called
a Min-Heap.
#include <stdio.h>
int main()
{
int heap[10], no, i, j, c, root, temp;
printf("\n Enter no of elements :");
scanf("%d", &no);
printf("\n Enter the nos : ");
for (i = 0; i < no; i++)
scanf("%d", &heap[i]);
for (i = 1; i < no; i++)
{
c = i;
do
{
root = (c - 1) / 2;
if (heap[root] < heap[c]) /* to create MAX heap array */
{
temp = heap[root];
heap[root] = heap[c];
heap[c] = temp;
}
c = root;
} while (c != 0);
}
printf("Heap array : ");
for (i = 0; i < no; i++)
printf("%d\t ", heap[i]);
for (j = no - 1; j >= 0; j--)
{
temp = heap[0];
heap[0] = heap[j]; /* swap max element with rightmost leaf element */
heap[j] = temp;
root = 0;
do
{
c = 2 * root + 1; /* left node of root element */
if (c < j - 1 && heap[c] < heap[c + 1]) /* pick the larger child, checking bounds first */
c++;
if (c < j && heap[root] < heap[c]) /* again rearrange to max heap array */
{
temp = heap[root];
heap[root] = heap[c];
heap[c] = temp;
}
root = c;
} while (c < j);
}
printf("\n The sorted array is : ");
for (i = 0; i < no; i++)
printf("\t %d", heap[i]);
printf("\n Complexity : \n Best case = Avg case = Worst case = O(n logn) \n");
}
Output:
Sorting: Internal & external sorting
Floyd Warshall Algorithm-
Advantages-
Algorithm-
Create a |V| x |V| matrix // It represents the distance between every pair of vertices as given
For each cell (i,j) in M do-
if i == j
M[ i ][ j ] = 0 // For all diagonal elements, value = 0
if (i , j) is an edge in E
M[ i ][ j ] = weight(i,j) // If there exists a direct edge between the vertices, value = weight of edge
else
M[ i ][ j ] = infinity // If there is no direct edge between the vertices, value = ∞
for k from 1 to |V|
for i from 1 to |V|
for j from 1 to |V|
if M[ i ][ j ] > M[ i ][ k ] + M[ k ][ j ]
M[ i ][ j ] = M[ i ][ k ] + M[ k ][ j ]
Time Complexity-
The Floyd-Warshall algorithm consists of three nested loops over all the nodes.
The innermost loop consists of only constant-complexity operations.
Hence, the asymptotic complexity of the Floyd-Warshall algorithm is O(n³).
Here, n is the number of nodes in the given graph.
Problem-
Solution-
Step-01:
Remove all the self loops and parallel edges (keeping the lowest weight
edge) from the graph.
In the given graph, there are neither self loops nor parallel edges.
Step-02:
Remember-
The process of radix sort works similarly to the sorting of students' names
in alphabetical order. In this case, 26 groups are formed, corresponding
to the 26 letters of the English alphabet. In the first pass, the names of students are
grouped according to the ascending order of the first letter of their names.
After that, in the second pass, their names are grouped according to the
ascending order of the second letter of their name. And the process continues
until we find the sorted list.
Algorithm
radixSort(arr)
    max = largest element in the given array
    d = number of digits in the largest element (or, max)
    create buckets for the digits 0 - 9
    for i -> 0 to d
        sort the array elements using counting sort (or any stable sort)
        according to the digits at the ith place
The steps used in the sorting of radix sort are listed as follows -
o First, we have to find the largest element (say, max) in the given
array. Let 'x' be the number of digits in max. The 'x' is calculated
because we need to go through the significant places of all elements.
o After that, go through one by one each significant place. Here, we have
to use any stable sorting algorithm to sort the digits of each significant
place.
Now let's see the working of radix sort in detail by using an example. To
understand it more clearly, let's take an unsorted array and try to sort it using
radix sort. It will make the explanation clearer and easier.
In the given array, the largest element is 736, which has 3 digits. So, the
loop will run up to three times (i.e., to the hundreds place). That means three
passes are required to sort the array.
Now, first sort the elements on the basis of unit place digits (i.e., x = 0). Here,
we are using the counting sort algorithm to sort the elements.
Pass 1:
In the first pass, the list is sorted on the basis of the digits at 0's place.
After the first pass, the array elements are -
Pass 2:
In this pass, the list is sorted on the basis of the next significant digits (i.e.,
the digits at the tens place).
After the second pass, the array elements are -
Pass 3:
In this pass, the list is sorted on the basis of the next significant digits (i.e.,
the digits at the hundreds place).
After the third pass, the array elements are -
1. Time Complexity: The time complexity of radix sort is O(d × (n + b)), where n is the
number of elements, d is the number of digits in the largest element, and b is the base
(10 for decimal digits).
2. Space Complexity: The space complexity of radix sort is O(n + b).
Stable: YES
#include <stdio.h>

int getMax(int a[], int n) {
    int max = a[0];
    for (int i = 1; i < n; i++) {
        if (a[i] > max)
            max = a[i];
    }
    return max; // maximum element from the array
}

void countingSort(int a[], int n, int place) // function to implement counting sort
{
    int output[n + 1];
    int count[10] = {0};

    // Calculate count of elements
    for (int i = 0; i < n; i++)
        count[(a[i] / place) % 10]++;

    // Calculate cumulative frequency
    for (int i = 1; i < 10; i++)
        count[i] += count[i - 1];

    // Place the elements in sorted order
    for (int i = n - 1; i >= 0; i--) {
        output[count[(a[i] / place) % 10] - 1] = a[i];
        count[(a[i] / place) % 10]--;
    }

    for (int i = 0; i < n; i++)
        a[i] = output[i];
}

// function to implement radix sort
void radixsort(int a[], int n) {

    // get maximum element from array
    int max = getMax(a, n);

    // Apply counting sort to sort elements based on place value
    for (int place = 1; max / place > 0; place *= 10)
        countingSort(a, n, place);
}

// function to print array elements
void printArray(int a[], int n) {
    for (int i = 0; i < n; ++i) {
        printf("%d ", a[i]);
    }
    printf("\n");
}

int main() {
    int a[] = {181, 289, 390, 121, 145, 736, 514, 888, 122};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArray(a, n);
    radixsort(a, n);
    printf("After applying Radix sort, the array elements are - \n");
    printArray(a, n);
    return 0;
}
Output:
Before knowing more about the heap sort, let's first see a brief description
of Heap.
What is a heap?
A heap is a complete binary tree, and a binary tree is a tree in which each
node can have at most two children. A complete binary tree is a binary tree
in which all the levels except the last are completely filled, and the nodes
in the last level are as far left as possible.
Algorithm
HeapSort(arr)
    BuildMaxHeap(arr)
    for i = length(arr) to 2
        swap arr[1] with arr[i]
        heap_size[arr] = heap_size[arr] - 1
        MaxHeapify(arr,1)
End
BuildMaxHeap(arr)
    heap_size(arr) = length(arr)
    for i = length(arr)/2 to 1
        MaxHeapify(arr,i)
End
MaxHeapify(arr,i)
    L = left(i)
    R = right(i)
    if L ≤ heap_size[arr] and arr[L] > arr[i]
        largest = L
    else
        largest = i
    if R ≤ heap_size[arr] and arr[R] > arr[largest]
        largest = R
    if largest != i
        swap arr[i] with arr[largest]
        MaxHeapify(arr,largest)
End
o The first step includes the creation of a heap by adjusting the elements
of the array.
o After the creation of the heap, remove the root element of the heap
repeatedly by shifting it to the end of the array, and then restore the heap
structure with the remaining elements.
Now let's see the working of heap sort in detail by using an example. To
understand it more clearly, let's take an unsorted array and try to sort it using
heap sort. It will make the explanation clearer and easier.
First, we have to construct a heap from the given array and convert it into max
heap.
After converting the given heap into max heap, the array elements are -
Next, we have to delete the root element (89) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (11). After deleting the
root element, we again have to heapify it to convert it into max heap.
After swapping the array element 89 with 11, and converting the heap into
max-heap, the elements of array are -
In the next step, again, we have to delete the root element (81) from the max
heap. To delete this node, we have to swap it with the last node, i.e. (54). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 81 with 54 and converting the heap into
max-heap, the elements of array are -
In the next step, we have to delete the root element (76) from the max heap
again. To delete this node, we have to swap it with the last node, i.e. (9). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 76 with 9 and converting the heap into
max-heap, the elements of array are -
In the next step, again we have to delete the root element (54) from the max
heap. To delete this node, we have to swap it with the last node, i.e. (14). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 54 with 14 and converting the heap into
max-heap, the elements of array are -
In the next step, again we have to delete the root element (22) from the max
heap. To delete this node, we have to swap it with the last node, i.e. (11). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 22 with 11 and converting the heap into
max-heap, the elements of array are -
In the next step, again we have to delete the root element (14) from the max
heap. To delete this node, we have to swap it with the last node, i.e. (9). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 14 with 9 and converting the heap into
max-heap, the elements of array are -
In the next step, again we have to delete the root element (11) from the max
heap. To delete this node, we have to swap it with the last node, i.e. (9). After
deleting the root element, we again have to heapify it to convert it into max
heap.
After swapping the array element 11 with 9, the elements of array are -
Now, heap has only one element left. After deleting it, heap will be empty.
After completion of sorting, the array elements are -
1. Time Complexity
The time complexity of heap sort is O(n logn) in all three cases (best case,
average case, and worst case). The height of a complete binary tree having n
elements is logn.
2. Space Complexity
The space complexity of heap sort is O(1), as it sorts in place.
Stable: NO
Implementation of Heapsort
Now, let's see the programs of Heap sort in different programming languages.
#include <stdio.h>
/* function to heapify a subtree. Here 'i' is the
index of the root node in array a[], and 'n' is the size of the heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;          // Initialize largest as root
    int left = 2 * i + 1;     // left child
    int right = 2 * i + 2;    // right child
    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;
    // If right child is larger than root
    if (right < n && a[right] > a[largest])
        largest = right;
    // If root is not largest
    if (largest != i) {
        // swap a[i] with a[largest]
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;

        heapify(a, n, largest);
    }
}
/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    // One by one extract an element from heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end */
        // swap a[0] with a[i]
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;

        heapify(a, i, 0);
    }
}
/* function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
    {
        printf("%d", arr[i]);
        printf(" ");
    }
}
int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    heapSort(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
Output
Comparison Between Various Sorting Algorithms
Time Complexities of all Sorting Algorithms
The efficiency of an algorithm depends on two parameters:
1. Time Complexity
2. Space Complexity
Time Complexity: Time complexity is defined as the number of times a
particular instruction set is executed rather than the total time taken. This is
because the total time taken also depends on some external factors like
the compiler used, the processor's speed, etc.
Space Complexity: Space Complexity is the total memory space
required by the program for its execution.
Both are calculated as the function of input size(n).
One important thing here is that, apart from these parameters, the
efficiency of an algorithm also depends upon the nature and size
of the input.
Types Of Time Complexity :
1. Best Time Complexity: Defined by the input for which the algorithm takes
the least or minimum time. In the best case we calculate the lower bound
of an algorithm. Example: in linear search, when the search data is
present at the first location of large data, the best case occurs.
2. Average Time Complexity: In the average case we take all random
inputs and calculate the computation time for all inputs.
We then divide it by the total number of inputs.
3. Worst Time Complexity: Defined by the input for which the algorithm takes the
longest or maximum time. In the worst case we calculate the upper bound of
an algorithm. Example: in linear search, when the search data is
present at the last location of large data, the worst case occurs.
Following is a quick revision sheet that you may refer to at the last
minute
The collection of files is known as a Directory. The collection of directories at the different
levels is known as the File System.
1.Name
Every file carries a name by which the file is recognized in the file system. One directory
cannot have two files with the same name.
2.Identifier
Along with the name, Each File has its own extension which identifies the type of the file.
For example, a text file has the extension .txt, A video file can have the extension .mp4.
3.Type
In a File System, the Files are classified in different types such as video files, audio files,
text files, executable files, etc.
4.Location
In the File System, there are several locations on which, the files can be stored. Each file
carries its location as its attribute.
5.Size
The size of the file is one of its most important attributes. By size of the file, we mean the
number of bytes occupied by the file in the memory.
6.Protection
The Admin of the computer may want different protections for different files.
Therefore each file carries its own set of permissions for the different groups of users.
7.Time and Date
Every file carries a time stamp which contains the time and date at which the file was last
modified.
1. Create operation:
This operation is used to create a file in the file system. It is the most widely used
operation performed on the file system. To create a new file of a particular type, the
associated application program calls the file system. The file system then allocates space to
the file. As the file system knows the format of the directory structure, an entry for this new
file is made in the appropriate directory.
2. Open operation:
This operation is the common operation performed on the file. Once the file is created,
it must be opened before performing the file processing operations. When the user
wants to open a file, it provides a file name to open the particular file in the file system.
It tells the operating system to invoke the open system call and passes the file name to
the file system.
3. Write operation:
This operation is used to write information into a file. A system call, write, is issued
that specifies the name of the file and the length of the data to be written to the file.
The file length is increased by the specified value and the file pointer is
repositioned after the last byte written.
4. Read operation:
This operation reads the contents from a file. A Read pointer is maintained by the OS,
pointing to the position up to which the data has been read.
5. Re-position or Seek operation:
The seek system call re-positions the file pointers from the current position to a specific
place in the file i.e. forward or backward depending upon the user's requirement. This
operation is generally performed with those file management systems that support
direct access files.
6. Delete operation:
Deleting the file not only deletes all the data stored inside the file, it also frees the
disk space occupied by it. In order to delete the specified file, the directory
is searched. When the directory entry is located, all the associated file space and the
directory entry are released.
7. Truncate operation:
Truncating is deleting a file's contents without deleting its attributes. The file is not
completely deleted, although the information stored inside the file is removed.
8. Close operation:
When the processing of the file is complete, it should be closed so that all the changes
made become permanent and all the resources occupied are released. On closing, the
OS deallocates all the internal descriptors that were created when the file was opened.
9. Append operation:
Sorting Algorithms
A sorting algorithm is used to rearrange a given array or list of elements according to a
comparison operator on the elements. The comparison operator is used to decide the
new order of the elements in the respective data structure.
Selection Sort
The selection sort algorithm sorts an array by repeatedly finding the minimum element
(considering ascending order) from unsorted part and putting it at the beginning. The
algorithm maintains two subarrays in a given array.
1) The subarray which is already sorted.
2) Remaining subarray which is unsorted.
In every iteration of selection sort, the minimum element (considering ascending order)
from the unsorted subarray is picked and moved to the sorted subarray.
Bubble Sort
Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping
adjacent elements if they are in the wrong order.
Example:
First Pass:
( 5 1 4 2 8 ) –> ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and
swaps since 5 > 1.
( 1 5 4 2 8 ) –> ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) –> ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) –> ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5),
algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) –> ( 1 4 2 5 8 )
( 1 4 2 5 8 ) –> ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it is completed. The
algorithm needs one whole pass without any swap to know it is sorted.
Third Pass:
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
Insertion Sort
Insertion sort builds the sorted array one element at a time, taking each element and inserting it into its correct position among the already-sorted elements to its left.
Example:
12, 11, 13, 5, 6
Let us loop for i = 1 (second element of the array) to 4 (last element of the array)
i = 1. Since 11 is smaller than 12, move 12 and insert 11 before 12
11, 12, 13, 5, 6
i = 2. 13 will remain at its position as all elements in A[0..i-1] are smaller than 13
11, 12, 13, 5, 6
i = 3. 5 will move to the beginning and all other elements from 11 to 13 will move one position ahead
5, 11, 12, 13, 6
i = 4. 6 will move to the position after 5, and elements from 11 to 13 will move one position ahead
5, 6, 11, 12, 13
HeapSort
Heap sort is a comparison-based sorting technique based on the Binary Heap data structure. It is similar to selection sort: we first find the maximum element and place it at the end, then repeat the same process for the remaining elements.
What is Binary Heap?
Let us first define a Complete Binary Tree. A complete binary tree is a binary tree in
which every level, except possibly the last, is completely filled, and all nodes are as far
left as possible
A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the value in a parent node is greater (or smaller) than the values in its two children nodes. The former is called a max heap and the latter a min heap. The heap can be represented by a binary tree or an array.
Merge Sort
Like QuickSort, Merge Sort is a Divide and Conquer algorithm. It divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves. The merge() function is used for merging two halves: merge(arr, l, m, r) is the key process that assumes arr[l..m] and arr[m+1..r] are sorted and merges the two sorted subarrays into one.
MergeSort(arr[], l, r)
If r > l
    1. Find the middle point: m = l + (r - l) / 2
    2. Call MergeSort for the first half: MergeSort(arr, l, m)
    3. Call MergeSort for the second half: MergeSort(arr, m + 1, r)
    4. Merge the two sorted halves: merge(arr, l, m, r)
Searching Algorithms
Searching Algorithms are designed to check for an element or retrieve an element from
any data structure where it is stored. Based on the type of search operation, these
algorithms are generally classified into two categories:
1. Sequential Search: In this, the list or array is traversed sequentially and every
element is checked. For example: Linear Search.
2. Interval Search: These algorithms are specifically designed for searching in sorted data structures. They are much more efficient than Linear Search as they repeatedly target the center of the search structure and divide the search space in half. For example: Binary Search.
Files: Computers are used for storing information permanently; files store a user's data for a long period of time. A file can contain any type of information: text, images, or data in any other format. There must therefore be some mechanism for storing information, accessing it, and performing operations on files.
Every file has its own name and its own type. When we store a file in the system, we must specify both. The name can be any valid name, and the type indicates the application with which the file is linked.
Thus every file belongs to some type of application software. When we give a file a name, we also specify its extension, because the system uses the extension to open the file's contents in the corresponding application. For example, a file that contains a painting will be opened in the Paint software.
1) Ordinary Files (Simple Files): An ordinary file may belong to any type of application, for example Notepad, Paint, a C program, or songs. All files created by a user are ordinary files. Ordinary files store information about the user's programs and may contain text, a database, an image, or any other type of information.
2) Directory Files: Files that are stored inside a particular directory or folder are directory files, because they belong to that directory. For example, all the files inside a folder named Songs are directory files of that folder.
3) Special Files: Special files are not created by the user; they are created by the system and are necessary to run it. All the files of an operating system such as Windows are special files. There are many types of special files: system files, Windows files, and input/output files. System files are stored with the .sys extension.
4) FIFO Files: First In, First Out files are used by the system for executing processes in order: the requests that arrive first are executed first, and the system maintains that sequence. When users request services from the system, the requests are arranged in such files, and the system performs them in the order in which they were received. This is called First In, First Out (FIFO) order.
1) Read Operation: read the information stored in the file.
2) Write Operation: insert new contents into a file.
3) Rename: change the name of a file.
4) Copy: copy the file from one location to another.
5) Sort: arrange the contents of a file.
6) Move or Cut: move the file from one place to another.
7) Delete: remove a file.
8) Execute: run the file and display its output.
We can also link a file with another file. These are called symbolic links; in a symbolic link, files are linked by using some text or an alias.
What is File?
A file is a collection of records related to each other. The file size is limited by the size of memory and the storage medium.
Two characteristics of a file are:
1. File Activity
2. File Volatility
File activity specifies the percentage of actual records processed in a single run.
File Organization
File organization ensures that records are available for processing. It is used to
determine an efficient file organization for each base relation.
A sequential file search starts from the beginning of the file, and records can be added only at the end of the file.
In a sequential file, it is not possible to add a record in the middle of the file without rewriting the file.
Advantages of direct access file organization
A direct access file helps in online transaction processing systems (OLTP) such as an online railway reservation system.
In a direct access file, sorting of the records is not required.
It accesses the desired records immediately.
It updates several files quickly.
It has better control over record allocation.
Advantages of indexed sequential access file organization
In an indexed sequential access file, both sequential and random access are possible.
It accesses the records very fast if the index table is properly organized.
Records can be inserted in the middle of the file.
It provides quick access for sequential and direct processing.
It reduces the degree of the sequential search.
• Physical Files:
A collection of bits stored in the secondary storage device
• Logical File:
A channel (stream) that connects the program to the physical file.
An example:
FILE *out;
out = fopen("sample.txt", "w");
Here out is the logical file and sample.txt is the physical file.
BASIC OPERATIONS IN FILES
• The records are stored based on their relative position with respect to the first record.
• A record with key 50 is placed at location 50.
• The search complexity is O(1).
• The disadvantage is that a lot of memory can be wasted.
• For example, if no record has key 100, position 100 is wasted.
FILE ORGANIZATION AND STRUCTURE
• "File organization" refers to the logical relationships among the various records that constitute the file, particularly with respect to the means of identification and access to any specific record. "File structure" refers to the format of the label and data blocks and of any logical record control information.
• The organization of a given file may be sequential, relative, or indexed.
• Sequential Files
• A sequential file is organized such that each record in the file except the first has a unique predecessor record and each record
except the last has a unique successor record. These predecessor-successor relationships are established by the order in which
the records are written when the file is created. Once established, the predecessor-successor relationships do not change except
in the case where records are added to the end of the file.
• A file that is organized sequentially must be accessed sequentially.
• Variable- or Fixed-Length Sequential Files
• Sequential files may be recorded in variable-length or fixed-length record form. If a file consists of variable-length records, each logical record is preceded by control information that indicates the size of the logical record. The control information is recorded when the logical record is written, based on the size of the internal record specified in the WRITE statement, and is subsequently used by the input-output control system to determine the location of successive logical records. If a file consists of fixed-length records, the record size is established at the time the file is opened and is the same for every logical record on the file. Therefore, there is no need to record any control information with the logical record.
• Relative Files
• A relative file, which must be allocated to random mass storage file space in the execution activity, is organized such that each record location is uniquely identified by an integer value greater than zero which specifies ordinal position on the file. In the RELATIVE KEY phrase of the SELECT clause, the source program specifies a numeric integer data item as the relative key item.
• Indexed Files
• An indexed file, which must be allocated in the execution activity to two or more random mass storage files (one
for the index, and one or more for the data), is organized such that each record is uniquely identified by the value
of a key within the record. In the RECORD KEY phrase of the SELECT clause, the source program specifies one
of the data items within one of the records associated with the file as the record key data item. Each attempt to
access a record based on the record key item causes a search of the index file for a key that matches the current
contents of the record key data item in the file record area. The matching index record in turn points to the
location of the associated data record.
FILES
• File: A file is a collection of related data that is treated as a single unit on a peripheral device, for example a text document in word processing.
• Types OF FILES:
• Master file: contains records of permanent data types. Master files are created at the time you set up your business. If you wish to computerise your company, you need to create master files, which can be done by taking your manual file folders and keying the data onto storage devices, for example the name of a customer, date of birth, gender, etc. These are permanent data types.
• Transaction file: contains data that is used to update the records of the master file, for example the address of a customer. A transaction file is a collection of transaction records. The data in transaction files is used to update the master files, which contain the data about the subjects of the organization (customers, employees, vendors, etc.). Transaction files also serve as audit trails and history for the organization. Where before they were transferred to offline storage after some period of time, they are increasingly being kept online for routine analyses.
• A report is a textual work (usually of writing, speech, television, or film) made with the specific intention of relaying information or
recounting certain events in a widely presentable form.
• Written reports are documents which present focused, salient content to a specific audience. Reports are often used to display the
result of an experiment, investigation, or inquiry. The audience may be public or private, an individual or the public in general.
Reports are used in government, business, education, science, and other fields.
• A report file is a file that describes how a report is printed.
A work file is a temporary file containing documents, drafts, records, rough notes, and sketches employed in the analysis or preparation of plans, projects, or other documents.
Program Files is a folder in Microsoft Windows operating systems where applications that are not part of
the operating system are installed by default.
A text file (sometimes spelled "textfile": an old alternate name is "flatfile") is a kind of computer file that is
structured as a sequence of lines of electronic text. A text file exists within a computer file system. The end
of a text file is often denoted by placing one or more special characters, known as an end-of-file marker,
after the last line in a text file.
"Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain
plain text, but they are not limited to such.
At a generic level of description, there are two kinds of computer files: text files and binary files
Binary Tree Traversing
1. Preorder traversal
2. Inorder traversal
3. Postorder traversal
1) Preorder traversal
To traverse a binary tree in preorder, the following operations are carried out: visit the root, traverse the left subtree, then traverse the right subtree.
Algorithm:
Algorithm preorder(t)
/* t is a binary tree. Each node of t has three fields:
   lchild, data, and rchild. */
{
    if (t != 0)
    {
        visit(t);
        preorder(t->lchild);
        preorder(t->rchild);
    }
}
2) Inorder traversal
To traverse a binary tree in inorder, the following operations are carried out: traverse the left subtree, visit the root, then traverse the right subtree.
Algorithm:
Algorithm inorder(t)
{
    if (t != 0)
    {
        inorder(t->lchild);
        visit(t);
        inorder(t->rchild);
    }
}
3) Postorder traversal
To traverse a binary tree in postorder, the following operations are carried out: traverse the left subtree, traverse the right subtree, then visit the root.
Algorithm:
Algorithm postorder(t)
{
    if (t != 0)
    {
        postorder(t->lchild);
        postorder(t->rchild);
        visit(t);
    }
}
BINARY TREE
1. A binary tree has a root node. It may not have any child nodes (0 child nodes, a NULL tree).
2. A root node may have one or two child nodes. Each node forms a binary tree itself.
In a complete binary tree, every internal node has exactly two children and all leaf nodes are at the same level.
For example, at level 2 there must be 2^2 = 4 nodes, and at level 3 there must be 2^3 = 8 nodes.
3. Skewed Binary Tree
A tree dominated by left child nodes or by right child nodes is said to be a skewed binary tree.
In a skewed binary tree, all nodes except one have only one child node. The remaining node has no child.
In a left skewed tree, most of the nodes have the left child without
corresponding right child.
In a right skewed tree, most of the nodes have the right child without
corresponding left child.
4. Extended Binary Tree
An extended binary tree is obtained by replacing every null subtree of the original tree with special nodes.
An empty circle represents an internal node and a filled circle represents an external node.
The nodes from the original tree are internal nodes and the special nodes are external nodes.
Every internal node in the extended binary tree has exactly two children and every external node is a leaf. The result is a complete binary tree.
The above tree is an AVL tree because the difference between the heights of the left and right subtrees for every node is less than or equal to 1.
Huffman Tree Construction (figure): starting from symbols A, H, C, E, I with frequencies 3, 2, 5, 8, 7, the two lowest-frequency nodes are merged repeatedly: A(3) + H(2) = 5, then 5 + C(5) = 10, then I(7) + E(8) = 15, and finally 10 + 15 = 25 (the root). Labelling each left edge 1 and each right edge 0, as in the figure, gives the codes:
E = 01
I = 00
C = 10
A = 111
H = 110
Exercise: try to build the Huffman tree for the following symbols and frequencies:
Symbol:    A  B  C  D  E  F  G  H
Frequency: 22 5  11 19 2  11 25 5
Binary Search Tree
For a binary tree to be a binary search tree, the data of all the nodes in the left sub-tree of the root
node should be ≤ the data of the root. The data of all the nodes in the right subtree of the root node
should be > the data of the root.
Example
Also, considering the root node with data=5, its children also satisfy the specified ordering. Similarly, the root node with data=19 also satisfies this ordering. Applied recursively, all subtrees satisfy the left and right subtree ordering.
Pre-order traversal
void preorder(struct node *root)
{
    if (root)
    {
        printf("%d ", root->data);   //Print root->data
        preorder(root->left);        //Go to left subtree
        preorder(root->right);       //Go to right subtree
    }
}
Post-order traversal
In post-order traversal, the left subtree is processed first, then the right subtree, and the root is processed last.
In-order traversal
The 'inorder( )' procedure is called with root equal to node with data=10
Since the node has a left subtree, 'inorder( )' is called with root equal to node with data=5
Again, the node has a left subtree, so 'inorder( )' is called with root=1
Node with data=1 does not have a left subtree. Hence, this node is processed.
Node with data=1 does not have a right subtree. Hence, nothing is done.
inorder(1) gets completed and this function call is popped from the call stack.
Left subtree of node with data=5 is completely processed. Hence, this node gets processed.
Right subtree of this node with data=5 is non-empty. Hence, the right subtree gets processed
now. 'inorder(6)' is then called.
Note
'inorder(6)' is only equivalent to saying inorder(pointer to node with data=6). The notation has been
used for brevity.
Again, the node with data=6 has no left subtree, so it can be processed; it also has no right subtree, so 'inorder(6)' is then completed.
Both the left and right subtrees of node with data=5 have been completely processed.
Hence, inorder(5) is then completed.
The order in which BST in Fig. 1 is visited is: 1, 5, 6, 10, 17, 19. The in-order traversal of a BST
gives a sorted ordering of the data elements that are present in the BST. This is an important
property of a BST.
Insertion in BST
Algorithm
1. If the data of the root node is greater, and a left subtree exists, repeat step 1 with root = root of the left subtree. Else, insert the element as the left child of the current root.
2. If the data of the root node is smaller, and a right subtree exists, repeat step 2 with root = root of the right subtree. Else, insert the element as the right child of the current root.
Implementation
struct node* insert(struct node* root, int data)
{
    if (root == NULL)                 //If the tree is empty, return a new, single node
        return newNode(data);
    else
    {
        //Otherwise, recur down the tree
        if (data <= root->data)
            root->left = insert(root->left, data);
        else
            root->right = insert(root->right, data);
        //return the (unchanged) root pointer
        return root;
    }
}