Data Structures and Algorithms (DSA) - Cropped
Source: Programiz. I recommend visiting their site.
For any value of n, the running time of an algorithm does not exceed the time given by O(g(n)).
Since Big-O gives the worst-case running time of an algorithm, it is widely used to analyze algorithms, as we are usually interested in the worst-case scenario.
For any value of n, the minimum time required by the algorithm is given by
Omega Ω(g(n)).
If a function f(n) lies anywhere in between c1·g(n) and c2·g(n) for all n ≥ n0, then g(n) is said to be an asymptotically tight bound of f(n), written f(n) = Θ(g(n)).
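For example, take f(n) = 3n + 2. Since 3n ≤ 3n + 2 ≤ 5n for all n ≥ 1, the constants c1 = 3, c2 = 5 and n0 = 1 satisfy the definition, so f(n) = Θ(n).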
Master Theorem
The master method is a formula for solving recurrence relations of the form:
T(n) = aT(n/b) + f(n),
where,
n = size of input
a = number of subproblems in the recursion
n/b = size of each subproblem. All subproblems are assumed
to have the same size.
f(n) = cost of the work done outside the recursive call,
which includes the cost of dividing the problem and
cost of merging the solutions
Here, a ≥ 1 and b > 1 are constants, and f(n) is an asymptotically positive function.
Master Theorem
If a ≥ 1 and b > 1 are constants and f(n) is an asymptotically positive function, then the
time complexity of a recursive relation is given by
T(n) = aT(n/b) + f(n)
where, T(n) has the following asymptotic bounds:
1. If f(n) = O(n^(log_b a − ϵ)), then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) * log n).
3. If f(n) = Ω(n^(log_b a + ϵ)) and a·f(n/b) ≤ c·f(n) for some c < 1 and sufficiently large n, then T(n) = Θ(f(n)).
Here, ϵ > 0 is a constant.
Each of the above conditions can be interpreted as:
1. If the cost of solving the subproblems at each level increases by a certain factor, the value of f(n) will become polynomially smaller than n^(log_b a). Thus, the time complexity is dominated by the cost of the last level, i.e. n^(log_b a).
2. If the cost of solving the subproblem at each level is nearly equal, then the value of f(n) will be n^(log_b a). Thus, the time complexity will be f(n) times the total number of levels, i.e. n^(log_b a) * log n.
3. If the cost of solving the subproblems at each level decreases by a certain factor, the value of f(n) will become polynomially larger than n^(log_b a). Thus, the time complexity is dominated by the cost of f(n).
Note that the master theorem cannot be applied if a < 1, i.e. if the problem cannot be divided into at least one subproblem.
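As a standard worked example: merge sort's recurrence is T(n) = 2T(n/2) + n, so a = 2, b = 2 and n^(log_b a) = n^(log_2 2) = n. Since f(n) = n = Θ(n^(log_b a)), case 2 applies and T(n) = Θ(n log n).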
Stacks
You can think of the stack data structure as the pile of plates on top of another.
And, if you want the plate at the bottom, you must first remove all the plates on top.
This is exactly how the stack data structure works.
We can implement a stack in any programming language like C, C++, Java, Python or
C#, but the specification is pretty much the same.
1. A pointer called TOP is used to keep track of the top element in the stack.
2. When initializing the stack, we set its value to -1 so that we can check if the stack is empty by comparing TOP == -1.
3. On pushing an element, we increase the value of TOP and place the new element in
the position pointed to by TOP.
4. On popping an element, we return the element pointed to by TOP and reduce its
value.
def create_stack():
    stack = []
    return stack

def check_empty(stack):
    return len(stack) == 0

# Add an item to the top of the stack
def push(stack, item):
    stack.append(item)
    print("pushed item: " + item)

# Remove the item pointed to by the top of the stack
def pop(stack):
    if check_empty(stack):
        return "stack is empty"
    return stack.pop()
stack = create_stack()
push(stack, str(1))
push(stack, str(2))
push(stack, str(3))
push(stack, str(4))
print("popped item: " + pop(stack))
print("stack after popping an element: " + str(stack))
Queue
A queue is a useful data structure in programming. It is similar to the ticket queue outside a cinema hall, where the first person entering the queue is the first person who gets the ticket.
Types of Queues
• Simple Queue
• Circular Queue
• Priority Queue
Simple Queue
In a simple queue, insertion takes place at the rear and removal occurs at the front.
It strictly follows the FIFO (First in First out) rule.
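A minimal FIFO sketch in Python (the function names are illustrative, not from the original text):

queue = []

def enqueue(queue, item):
    # Insertion takes place at the rear
    queue.append(item)

def dequeue(queue):
    # Removal occurs at the front
    if not queue:
        return "queue is empty"
    return queue.pop(0)

enqueue(queue, 1)
enqueue(queue, 2)
print(dequeue(queue))  # prints 1: the first element in is the first out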
Circular Queue
In a circular queue, the last element points to the first element making a circular link.
Circular Queue Representation
The main advantage of a circular queue over a simple queue is better memory
utilization. If the last position is full and the first position is empty, we can insert an
element in the first position. This action is not possible in a simple queue.
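A rough sketch of the circular wrap-around using modular arithmetic (a fixed-size list; the names are illustrative):

class CircularQueue:
    def __init__(self, size):
        self.size = size
        self.items = [None] * size
        self.front = self.rear = -1

    def enqueue(self, item):
        if (self.rear + 1) % self.size == self.front:
            return "queue is full"
        if self.front == -1:
            self.front = 0
        # Wrap around to reuse the first position when the last one is full
        self.rear = (self.rear + 1) % self.size
        self.items[self.rear] = item

    def dequeue(self):
        if self.front == -1:
            return "queue is empty"
        item = self.items[self.front]
        if self.front == self.rear:
            self.front = self.rear = -1   # queue becomes empty
        else:
            self.front = (self.front + 1) % self.size
        return item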
Priority Queue
A priority queue is a special type of queue in which each element is associated with a
priority and is served according to its priority. If elements with the same priority
occur, they are served according to their order in the queue.
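Python's heapq module already behaves like a min-priority queue; a tiny sketch (the values double as priorities here):

import heapq

pq = []
heapq.heappush(pq, 4)
heapq.heappush(pq, 1)
heapq.heappush(pq, 7)

print(heapq.heappop(pq))  # prints 1: the element with the smallest value/priority is served first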
Linked lists can be of multiple types: singly, doubly, and circular linked list. In this
article, we will focus on the singly linked list. To learn about other types, visit Types
of Linked List.
Note: You might have played the game Treasure Hunt, where each clue includes the
information about the next clue. That is how the linked list operates.
Each node of a linked list holds two things:
• A data item
• An address (reference) to the next node
We wrap both the data item and the next node reference in a struct as:

struct node {
    int data;
    struct node *next;
};
Understanding the structure of a linked list node is the key to having a grasp on it.
Each struct node has a data item and a pointer to another struct node. Let us create
a simple Linked List with three items to understand how this works.
If you didn't understand any of the lines above, all you need is a refresher
on pointers and structs.
In just a few steps, we have created a simple linked list with three nodes.
• Point its next pointer to the struct node containing 2 as the data value
Doing something similar in an array would have required shifting the positions of all
the subsequent elements.
In Python and Java, the linked list can be implemented using classes as shown in the code below.
Apart from that, linked lists are a great way to learn how pointers work. By practicing
how to manipulate linked lists, you can prepare yourself to learn more advanced
data structures like graphs and trees.
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
if __name__ == '__main__':
linked_list = LinkedList()
# Assign item values
linked_list.head = Node(1)
second = Node(2)
third = Node(3)
# Connect nodes
linked_list.head.next = second
second.next = third
# Print the linked list item
while linked_list.head != None:
print(linked_list.head.item, end=" ")
linked_list.head = linked_list.head.next
Space Complexity: O(n)
• Implemented in stack and queue
• In the undo functionality of software
• Hash tables, Graphs
Linked List Operations
Here's a list of basic linked list operations that we will cover in this article.
• Traversal - access each element of the linked list
• Insertion - adds a new element to the linked list
• Deletion - removes the existing elements
• Search - finds a node in the linked list
• Sort - sorts the nodes of the linked list
Before you learn about linked list operations in detail, make sure to know
about Linked List first.
Things to Remember about Linked List
• head points to the first node of the linked list
In all of the examples, we will assume that the linked list has three nodes 1 -> 2 -> 3 with the node structure as below:
struct node {
    int data;
    struct node *next;
};
Traverse a Linked List
Displaying the contents of a linked list is very simple. We keep moving the temp node to the next one and display its contents:

struct node *temp = head;
printf("List elements are: ");
while (temp != NULL) {
    printf("%d --> ", temp->data);
    temp = temp->next;
}
1. Insert at the beginning
• Allocate memory for new node
• Store data
• Change next of new node to point to head
• Change head to point to recently created node
struct node *newNode;
newNode = malloc(sizeof(struct node));
newNode->data = 4;
newNode->next = head;
head = newNode;
2. Insert at the End
• Allocate memory for new node
• Store data
• Traverse to last node
• Change next of last node to recently created node
struct node *newNode;
newNode = malloc(sizeof(struct node));
newNode->data = 4;
newNode->next = NULL;

struct node *temp = head;
while (temp->next != NULL) {
    temp = temp->next;
}
temp->next = newNode;
3. Insert at the Middle
• Allocate memory and store data for new node
• Traverse to node just before the required position of new node
• Change next pointers to include new node in between
struct node *newNode;
newNode = malloc(sizeof(struct node));
newNode->data = 4;

struct node *temp = head;
for (int i = 2; i < position; i++) {
    if (temp->next != NULL) {
        temp = temp->next;
    }
}
newNode->next = temp->next;
temp->next = newNode;
1. Delete from beginning
• Point head to the second node
head = head->next;
2. Delete from end
• Traverse to second last element
• Change its next pointer to null
struct node *temp = head;
while (temp->next->next != NULL) {
    temp = temp->next;
}
temp->next = NULL;
3. Delete from middle
• Traverse to element before the element to be deleted
• Change next pointers to exclude the node from the chain
struct node *temp = head;
for (int i = 2; i < position; i++) {
    if (temp->next != NULL) {
        temp = temp->next;
    }
}
temp->next = temp->next->next;
• In each iteration, check if the key of the node is equal to item. If the key matches the item, return true; otherwise, return false.
// Search a node
bool searchNode(struct Node** head_ref, int key) {
    struct Node* current = *head_ref;
    while (current != NULL) {
        if (current->data == key) return true;
        current = current->next;
    }
    return false;
}
1. Make head the current node and create another node index for later use.
2. If head is null, return.
3. Else, run a loop till the last node (i.e. NULL).
4. In each iteration, follow steps 5-6.
5. Store the node next to current in index.
6. Check if the data of the current node is greater than the next node. If it is greater, swap current and index.
// Sort the linked list using bubble sort
void sortLinkedList(struct Node** head_ref) {
    struct Node *current = *head_ref, *index = NULL;
    int temp;
    if (head_ref == NULL) {
        return;
    }
    while (current != NULL) {
        // index points to the node next to current
        index = current->next;
        while (index != NULL) {
            if (current->data > index->data) {
                temp = current->data;
                current->data = index->data;
                index->data = temp;
            }
            index = index->next;
        }
        current = current->next;
    }
}
# Create a node
class Node:
def __init__(self, data):
self.data = data
self.next = None
class LinkedList:
def __init__(self):
self.head = None
# Insert at the beginning
def insertAtBeginning(self, new_data):
new_node = Node(new_data)
new_node.next = self.head
self.head = new_node
# Insert after a node
def insertAfter(self, prev_node, new_data):
if prev_node is None:
print("The given previous node must inLinkedList.")
return
new_node = Node(new_data)
new_node.next = prev_node.next
prev_node.next = new_node
# Insert at the end
def insertAtEnd(self, new_data):
new_node = Node(new_data)
if self.head is None:
self.head = new_node
return
last = self.head
while (last.next):
last = last.next
last.next = new_node
# Deleting a node
def deleteNode(self, position):
if self.head is None:
return
temp = self.head
if position == 0:
self.head = temp.next
temp = None
return
# Find the key to be deleted
for i in range(position - 1):
temp = temp.next
if temp is None:
break
if temp.next is None:
return
next = temp.next.next
temp.next = None
temp.next = next
    # Search an element
    def search(self, key):
        current = self.head
        while current is not None:
            if current.data == key:
                return True
            current = current.next
        return False
    # Sort the linked list using bubble sort
    def sortLinkedList(self, head):
        current = head
        index = Node(None)
        if head is None:
            return
        else:
            while current is not None:
                # index points to the node next to current
                index = current.next
                while index is not None:
                    if current.data > index.data:
                        current.data, index.data = index.data, current.data
                    index = index.next
                current = current.next

    # Print the linked list
    def printList(self):
        temp = self.head
        while temp:
            print(str(temp.data), end=" ")
            temp = temp.next
if __name__ == '__main__':
llist = LinkedList()
llist.insertAtEnd(1)
llist.insertAtBeginning(2)
llist.insertAtBeginning(3)
llist.insertAtEnd(4)
llist.insertAfter(llist.head.next, 5)
print('linked list:')
llist.printList()
print()
item_to_find = 3
if llist.search(item_to_find):
print(str(item_to_find) + " is found")
else:
print(str(item_to_find) + " is not found")
llist.sortLinkedList(llist.head)
print("Sorted List: ")
llist.printList()
Types of Linked Lists
two->next = three;
two->prev = one;
three->next = NULL;
three->prev = two;
If you want to learn more about it, please visit doubly linked list and operations on it.
• for singly linked list, next pointer of last item points to the first item
• In the doubly linked list, prev pointer of the first item points to the last item as well.
If you want to learn more about it, please visit circular linked list and operations on
it.
Hash Tables
Hash Table
The hash table data structure stores elements in key-value pairs, where the key is a unique integer used for indexing the values, and the value is the data associated with the key.
Here, h(k) will give us a new index to store the element linked with k.
Hash table Representation
To learn more, visit Hashing.
Hash Collision
When the hash function generates the same index for multiple keys, there will be a
conflict (what value to be stored in that index). This is called a hash collision.
We can resolve the hash collision using one of the following techniques.
If j is the slot for multiple elements, it contains a pointer to the head of the list of
elements. If no element is present, j contains NIL.
Collision Resolution using chaining
Pseudocode for operations
chainedHashSearch(T, k)
    return T[h(k)]

chainedHashInsert(T, x)
    T[h(x.key)] = x   // insert at the head

chainedHashDelete(T, x)
    T[h(x.key)] = NIL
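A small Python sketch of collision resolution by chaining (the table size and hash function are arbitrary choices for illustration):

# Each slot holds a list (chain) of (key, value) pairs
TABLE_SIZE = 10
table = [[] for _ in range(TABLE_SIZE)]

def h(key):
    return key % TABLE_SIZE

def insert(key, value):
    table[h(key)].append((key, value))   # a collision simply extends the chain

def search(key):
    for k, v in table[h(key)]:
        if k == key:
            return v
    return None

insert(12, "apple")
insert(22, "banana")   # 12 and 22 both hash to slot 2 and share a chain
print(search(22))      # banana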
2. Open Addressing
Unlike chaining, open addressing doesn't store multiple elements into the same slot.
Here, each slot is either filled with a single key or left NIL.
i. Linear Probing
In linear probing, a collision is resolved by probing the next slot:
h(k, i) = (h′(k) + i) mod m
where
• i = {0, 1, ….}
• h′(k) is a new hash function
If a collision occurs at h(k, 0), then h(k, 1) is checked. In this way, the value of i is
incremented linearly.
The problem with linear probing is that a cluster of adjacent slots is filled. When
inserting a new element, the entire cluster must be traversed. This adds to the time
required to perform operations on the hash table.
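A rough sketch of linear-probing insertion in Python (illustrative names, with h′(k) = k for simplicity):

TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key, value):
    for i in range(TABLE_SIZE):
        slot = (key + i) % TABLE_SIZE          # h(k, i) = (h'(k) + i) mod m
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise Exception("hash table is full")

insert(12, "a")
print(insert(22, "b"))   # 12 and 22 collide at slot 2, so 22 lands in slot 3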
ii. Quadratic Probing
It works similar to linear probing, but the spacing between the slots is increased (greater than one) by using the following relation:
h(k, i) = (h′(k) + c1·i + c2·i²) mod m
where,
• c1 and c2 are positive auxiliary constants
• i = {0, 1, ….}
iii. Double Hashing
If a collision occurs after applying a hash function h(k), then another hash function is calculated for finding the next slot:
h(k, i) = (h1(k) + i·h2(k)) mod m
Here, we will look into different methods to find a good hash function
1. Division Method
If k is a key and m is the size of the hash table, the hash function h() is calculated as:
h(k) = k mod m
For example, If the size of a hash table is 10 and k = 112 then h(k) = 112 mod 10 = 2. The
value of m must not be the powers of 2. This is because the powers of 2 in binary
format are 10, 100, 1000, …. When we find k mod m, we will always get the lower order p
bits.
if m = 2^2, k = 17, then h(k) = 17 mod 2^2 = 10001 mod 100 = 01
if m = 2^3, k = 17, then h(k) = 17 mod 2^3 = 10001 mod 1000 = 001
if m = 2^4, k = 17, then h(k) = 17 mod 2^4 = 10001 mod 10000 = 0001
If m = 2^p, then h(k) gives only the p lowest-order bits of k.
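In Python the division method is a one-liner (the table size and key below are arbitrary):

m = 10          # size of the hash table
k = 112         # key
print(k % m)    # 2, the index given by h(k) = k mod m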
2. Multiplication Method
h(k) = ⌊m (kA mod 1)⌋
where,
• kA mod 1 gives the fractional part of kA,
• ⌊ ⌋ gives the floor value,
• A is any constant. The value of A lies between 0 and 1. But, an optimal choice will be ≈ (√5 − 1)/2, as suggested by Knuth.
3. Universal Hashing
In universal hashing, the hash function is chosen at random, independent of the keys.
Hash tables are used wherever constant-time lookup and insertion are required, for example in database indexing and cryptographic applications.
Heap Data Structure
A heap is a complete binary tree that satisfies the heap property, where any given node is
• always greater than its child node(s), and the key of the root node is the largest among all other nodes. This property is also called max heap property.
• always smaller than its child node(s), and the key of the root node is the smallest among all other nodes. This property is also called min heap property.
Maxheap
Minheap
This type of data structure is also called a binary heap.
Heap Operations
Some of the important operations performed on a heap are described below along
with their algorithms.
Heapify
Heapify is the process of creating a heap data structure from a binary tree. It is used
to create a MinHeap or a MaxHeap.
Initial Array
1. Let the input array be the array to be heapified.
2. Create a complete binary tree from the array.
3. Start from the first index of the non-leaf nodes, whose index is given by n/2 - 1.
4. Set the current element i as largest.
5. The index of the left child is given by 2i + 1 and the right child is given by 2i + 2.
If leftChild is greater than currentElement (i.e. the element at the ith index), set leftChildIndex as largest.
If rightChild is greater than the element in largest, set rightChildIndex as largest.
6. Swap largest with currentElement.
7. Repeat steps 3-7 until the subtrees are also heapified.
Swap if necessary
Algorithm
Heapify(array, size, i)
    set i as largest
    leftChildIndex = 2i + 1
    rightChildIndex = 2i + 2

    if leftChild > array[largest]
        set leftChildIndex as largest
    if rightChild > array[largest]
        set rightChildIndex as largest

    swap array[i] and array[largest]
To create a MaxHeap:
MaxHeap(array, size)
    loop from the first index of the non-leaf nodes down to zero
        call heapify
For MinHeap, both leftChild and rightChild must be smaller than the parent for all
nodes.
Peek (Find max/min)
Peek operation returns the maximum element from Max Heap or minimum element
from Min Heap without deleting the node.
For both Max heap and Min Heap
return rootNode
ExtractMax/Min
ExtractMax returns the node with maximum value after removing it from a Max
Heap whereas ExtractMin returns the node with minimum after removing it from
Min Heap.
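A minimal sketch of Extract-Max on a max-heap stored as a Python list (the helper names sift_down and extract_max are illustrative, separate from the full program below):

def sift_down(arr, n, i):
    # Move arr[i] down until the max-heap property holds again
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        sift_down(arr, n, largest)

def extract_max(arr):
    # Swap the root with the last element, remove it, then restore the heap
    arr[0], arr[-1] = arr[-1], arr[0]
    maximum = arr.pop()
    sift_down(arr, len(arr), 0)
    return maximum

heap = [9, 5, 4, 1, 3]      # already a valid max-heap
print(extract_max(heap))    # 9
print(heap)                 # the rest is still a max-heap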
# Max-Heap data structure in Python

def heapify(arr, n, i):
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2

    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r

    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def insert(array, newNum):
    size = len(array)
    if size == 0:
        array.append(newNum)
    else:
        array.append(newNum)
        for i in range((size // 2) - 1, -1, -1):
            heapify(array, size, i)

def deleteNode(array, num):
    size = len(array)
    i = 0
    for i in range(0, size):
        if num == array[i]:
            break

    array[i], array[size - 1] = array[size - 1], array[i]
    array.remove(num)

    for i in range((len(array) // 2) - 1, -1, -1):
        heapify(array, len(array), i)
arr = []
insert(arr, 3)
insert(arr, 4)
insert(arr, 9)
insert(arr, 5)
insert(arr, 2)
print("MaxHeap array: "+ str(arr))
deleteNode(arr, 4)
print("After deleting an element: "+ str(arr))
Fibonacci Heap
Fibonacci Heap
A fibonacci heap is a data structure that consists of a collection of trees which follow
min heap or max heap property. We have already discussed min heap and max heap
property in the Heap Data Structure article. These two properties are the
characteristics of the trees present on a fibonacci heap.
In a fibonacci heap, a node can have more than two children or no children at all.
Also, it has more efficient heap operations than that supported by the binomial and
binary heaps.
Fibonacci Heap
The roots of all the trees are linked together using a circular doubly linked list. There are two main advantages of using a circular doubly linked list: a node can be removed from it in O(1) time, and two such lists can be concatenated in O(1) time.
Algorithm
insert(H, x)
degree[x] = 0
p[x] = NIL
child[x] = NIL
left[x] = x
right[x] = x
mark[x] = FALSE
concatenate the root list containing x with root list H
if min[H] == NIL or key[x] < key[min[H]]
then min[H] = x
n[H] = n[H] + 1
Inserting a node into an already existing heap follows the steps below.
1. Create a new node for the element.
2. Check if the heap is empty.
3. If the heap is empty, set the new node as a root node and mark it min.
4. Else, insert the node into the root list and update min.
Insertion Example
Find Min
Union
The union of two fibonacci heaps consists of the following steps.
1. Concatenate the roots of both heaps.
2. Update min by selecting the minimum key from the new root list.
Extract Min
It is the most important operation on a fibonacci heap. In this operation, the node with the minimum value is removed from the heap and the tree is readjusted.
3. Create an array of size equal to the maximum degree of the trees in the heap before
deletion.
4. Do the following (steps 5-7) until there are no multiple roots with the same degree.
5. Map the degree of current root (minpointer) to the degree in the array.
6. Map the degree of next root to the degree in array.
7. If there are more than two mappings for the same degree, then apply union
operation to those roots such that the minheap property is maintained (i.e. the
minimum is at the root).
Fibonacci Heap
2. Delete the min node, add all its child nodes to the root list and set the minpointer to
the next root in the root list.
Create an array
These are the most important operations which are discussed in Decrease Key and
Delete Node Operations.
# Fibonacci Heap in Python
import math

# Creating fibonacci tree
class FibonacciTree:
    def __init__(self, value):
        self.value = value
        self.child = []
        self.order = 0

    # Adding tree at the end of the tree
    def add_at_end(self, t):
        self.child.append(t)
        self.order = self.order + 1

# Creating Fibonacci heap
class FibonacciHeap:
    def __init__(self):
        self.trees = []
        self.least = None
        self.count = 0

    # Insert a node
    def insert_node(self, value):
        new_tree = FibonacciTree(value)
        self.trees.append(new_tree)
        if self.least is None or value < self.least.value:
            self.least = new_tree
        self.count = self.count + 1

    # Get minimum value
    def get_min(self):
        if self.least is None:
            return None
        return self.least.value

    # Extract the minimum value
    def extract_min(self):
        smallest = self.least
        if smallest is not None:
            for child in smallest.child:
                self.trees.append(child)
            self.trees.remove(smallest)
            if self.trees == []:
                self.least = None
            else:
                self.least = self.trees[0]
                self.consolidate()
            self.count = self.count - 1
            return smallest.value

    # Consolidate the tree
    def consolidate(self):
        aux = (floor_log(self.count) + 1) * [None]
        while self.trees != []:
            x = self.trees[0]
            order = x.order
            self.trees.remove(x)
            while aux[order] is not None:
                y = aux[order]
                if x.value > y.value:
                    x, y = y, x
                x.add_at_end(y)
                aux[order] = None
                order = order + 1
            aux[order] = x
        self.least = None
        for k in aux:
            if k is not None:
                self.trees.append(k)
                if self.least is None or k.value < self.least.value:
                    self.least = k

def floor_log(x):
    return math.frexp(x)[1] - 1

fibonacci_heap = FibonacciHeap()
fibonacci_heap.insert_node(7)
fibonacci_heap.insert_node(3)
fibonacci_heap.insert_node(17)
fibonacci_heap.insert_node(24)
print('the minimum value of the fibonacci heap: {}'.format(fibonacci_heap.get_min()))
print('the minimum value removed: {}'.format(fibonacci_heap.extract_min()))
Complexities
Insertion O(1)
Find Min O(1)
Union O(1)
Extract Min O(log n)
Decrease Key O(1)
Delete Node O(log n)
1. Decrease a key: decreases the value of a key to any lower value
2. Delete a node: deletes the given node
Decreasing a Key
In decreasing a key operation, the value of a key is decreased to a lower value.
DecreaseKey
1. Select the node to be decreased, x, and change its value to the new value k.
2. If the parent of x, y, is not null and the key of the parent is greater than k, then call Cut(x) and Cascading-Cut(y) subsequently.
3. If the key of x is smaller than the key of min, then mark x as min.
Cut
1. Remove x from the current position and add it to the root list.
CascadingCut
2. Cut part: Since 24 ≠ nil and 15 < its parent, cut it and add it to the root list. Cascading-Cut part: mark 24.
Example: Decreasing 35 to 5
Decrease 35 to 5
2. Cut part: Since 26 ≠ nil and 5 < its parent, cut it and add it to the root list.
3. Cascading-Cut part: Since 26 is marked, the flow goes to Cut and Cascading-Cut.
Cut(26): Cut 26, add it to the root list and mark it as false.
Deleting a Node
This process makes use of decrease-key and extract-min operations. The following steps are followed for deleting a node.
1. Let k be the node to be deleted.
2. Apply the decrease-key operation to decrease the value of k to the lowest possible value (i.e. -∞).
3. Apply the extract-min operation to remove this node.
Complexities
Decrease Key O(1)
Delete Node O(log n)
Tree Basic
A Tree
Different tree data structures allow quicker and easier access to the data as it is a
nonlinear data structure.
Tree Terminologies
Node
A node is an entity that contains a key or value and pointers to its child nodes.
The last nodes of each path are called leaf nodes or external nodes that do not
contain a link/pointer to child nodes.
Edge
An edge is the link between any two nodes.
Height of a Node
The height of a node is the number of edges from the node to the deepest leaf (ie.
the longest path from the node to a leaf node).
Depth of a Node
The depth of a node is the number of edges from the root to the node.
Height of a Tree
The height of a Tree is the height of the root node or the depth of the deepest node.
Height and depth of each node in a tree
Degree of a Node
The degree of a node is the total number of branches of that node.
Forest
A collection of disjoint trees is called a forest.
Types of Tree
1. Binary Tree
2. Binary Search Tree
3. AVL Tree
4. B-Tree
Tree Traversal
In order to perform any operation on a tree, you need to reach to the specific node.
The tree traversal algorithm helps in visiting a required node in the tree.
Tree Applications
• Binary Search Trees(BSTs) are used to quickly check whether an element is present in
a set or not.
• A modified version of a tree called Tries is used in modern routers to store routing
information.
• Most popular databases use B-Trees and T-Trees, which are variants of the tree structure we learned above, to store their data.
• Compilers use a syntax tree to validate the syntax of every program you write.
Tree Traversal
Linear data structures like arrays, stacks, queues, and linked list have only one way to
read the data. But a hierarchical data structure like a tree can be traversed in
different ways.
Tree traversal
Let's think about how we can read the elements of the tree in the image shown
above.
Instead, we use traversal methods that take into account the basic structure of a tree
i.e.
struct node {
    int data;
    struct node* left;
    struct node* right;
};
The struct node pointed to by left and right might have other left and right children so
we should think of them as subtrees instead of subnodes.
• A node carrying data
• Two subtrees (left and right)
Depending on the order in which we do this, there can be three types of traversal.
Inorder traversal
1. First, visit all the nodes in the left subtree
2. Then the root node
3. Visit all the nodes in the right subtree
Preorder traversal
1. Visit root node
2. Visit all the nodes in the left subtree
3. Visit all the nodes in the right subtree
Postorder traversal
1. Visit all the nodes in the left subtree
2. Visit all the nodes in the right subtree
3. Visit the root node
Stack
Now we traverse to the subtree pointed on the TOP of the stack.
Again, we follow the same rule of inorder
Left subtree -> root -> right subtree
Final Stack
Since the node "5" doesn't have any subtrees, we print it directly. After that we print
its parent "12" and then the right child "6".
Putting everything on a stack was helpful because now that the leftsubtree of the
root node has been traversed, we can print it and go to the right subtree.
After going through all the elements, we get the inorder traversal as
5 -> 12 -> 6 -> 1 -> 9
We don't have to create the stack ourselves because recursion maintains the correct
order for us.
class Node:
def __init__(self, item):
self.left = None
self.right = None
self.val = item
def inorder(root):
if root:
# Traverse left
inorder(root.left)
# Traverse root
print(str(root.val) + ">", end='')
# Traverse right
inorder(root.right)
def postorder(root):
if root:
# Traverse left
postorder(root.left)
# Traverse right
postorder(root.right)
# Traverse root
print(str(root.val) + ">", end='')
def preorder(root):
if root:
# Traverse root
print(str(root.val) + ">", end='')
# Traverse left
preorder(root.left)
# Traverse right
preorder(root.right)
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)

print("Inorder traversal: ", end='')
inorder(root)

print("\nPreorder traversal: ", end='')
preorder(root)

print("\nPostorder traversal: ", end='')
postorder(root)
Binary Tree
A binary tree is a tree data structure in which each parent node can have at most
two children. Each node of a binary tree consists of three items:
• data item
• address of left child
• address of right child
Binary Tree
A full Binary tree is a special type of binary tree in which every parent node/internal
node has either two or no children.
Full Binary Tree
To learn more, please visit full binary tree.
A perfect binary tree is a type of binary tree in which every internal node has exactly
two child nodes and all the leaf nodes are at the same level.
A complete binary tree is just like a full binary tree, but with two major differences:
1. All the leaf elements must lean towards the left.
2. The last leaf element might not have a right sibling, i.e. a complete binary tree doesn't have to be a full binary tree.
A degenerate or pathological tree is the tree having a single child either left or right.
It is a type of binary tree in which the difference between the height of the left and
the right subtree for each node is either 0 or 1.
class Node:
def __init__(self, key):
self.left = None
self.right = None
self.val = key
# Traverse preorder
def traversePreOrder(self):
print(self.val, end=' ')
if self.left:
self.left.traversePreOrder()
if self.right:
self.right.traversePreOrder()
# Traverse inorder
def traverseInOrder(self):
if self.left:
self.left.traverseInOrder()
print(self.val, end=' ')
if self.right:
self.right.traverseInOrder()
# Traverse postorder
def traversePostOrder(self):
if self.left:
self.left.traversePostOrder()
if self.right:
self.right.traversePostOrder()
print(self.val, end=' ')
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)

print("Pre order Traversal: ", end="")
root.traversePreOrder()
print("\nIn order Traversal: ", end="")
root.traverseInOrder()
print("\nPost order Traversal: ", end="")
root.traversePostOrder()
# Creating a node
class Node:
    def __init__(self, item):
        self.item = item
        self.leftChild = None
        self.rightChild = None

# Checking whether the tree is a full binary tree
def isFullTree(root):
    if root is None:
        return True
    if root.leftChild is None and root.rightChild is None:
        return True
    if root.leftChild is not None and root.rightChild is not None:
        return isFullTree(root.leftChild) and isFullTree(root.rightChild)
    return False
root = Node(1)
root.rightChild = Node(3)
root.leftChild = Node(2)
root.leftChild.leftChild = Node(4)
root.leftChild.rightChild = Node(5)
root.leftChild.rightChild.leftChild = Node(6)
root.leftChild.rightChild.rightChild = Node(7)
if isFullTree(root):
print("The tree is a full binary tree")
else:
print("The tree is not a full binary tree")
Perfect Binary Tree
1. If a single node has no children, it is a perfect binary tree of height h = 0.
2. If a node has h > 0, it is a perfect binary tree if both of its subtrees are of height h - 1 and are non-overlapping.
Perfect Binary Tree (Recursive Representation)
class newNode:
def __init__(self, k):
self.key = k
self.right = self.left = None
root = None
root = newNode(1)
root.left = newNode(2)
root.right = newNode(3)
root.left.left = newNode(4)
root.left.right = newNode(5)
if (is_perfect(root, calculateDepth(root))):
print("The tree is a perfect binary tree")
else:
print("The tree is not a perfect binary tree")
To learn more about the height of a tree/node, visit Tree Data Structure.Following
are the conditions for a heightbalanced binary tree:
1. difference between the left and the right subtree for any node is not more than one
class Node:
    def __init__(self, data):
        self.data = data
        self.left = self.right = None

class Height:
    def __init__(self):
        self.height = 0

def isHeightBalanced(root, height):
    left_height = Height()
    right_height = Height()
    if root is None:
        return True
    l = isHeightBalanced(root.left, left_height)
    r = isHeightBalanced(root.right, right_height)
    height.height = max(left_height.height, right_height.height) + 1
    if abs(left_height.height - right_height.height) <= 1:
        return l and r
    return False
Binary Search Tree
• It is called a binary tree because each tree node has a maximum of two children.
• It is called a search tree because it can be used to search for the presence of a
number in O(log(n)) time.
The properties that separate a binary search tree from a regular binary tree are:
1. All nodes of left subtree are less than the root node
2. All nodes of right subtree are more than the root node
3. Both subtrees of each node are also BSTs i.e. they have the above two properties
A tree having a right subtree with one value smaller than the root is shown to demonstrate that it is not
a valid binary search tree
The binary tree on the right isn't a binary search tree because the right subtree of
the node "3" contains a value smaller than it.
There are two basic operations that you can perform on a binary search tree:
Search Operation
The algorithm depends on the property of a BST that each left subtree has values below the root and each right subtree has values above the root.
If the value is below the root, we can say for sure that the value is not in the right
subtree; we need to only search in the left subtree and if the value is above the root,
we can say for sure that the value is not in the left subtree; we need to only search in
the right subtree.
Algorithm:
if root == NULL
    return NULL;
if number == root->data
    return root->data;
if number < root->data
    return search(root->left)
if number > root->data
    return search(root->right)
4 is found
If the value is found, we return the value so that it gets propagated in each recursion
step as shown in the image below.
As you might have noticed, we have called return search(struct node*) four times.
When we return either the new node or NULL, the value gets returned again and
again until search(root) returns the final result.
If the value is found in any of the subtrees, it is propagated up so that in the end it is returned,
otherwise null is returned
If the value is not found, we eventually reach the left or right child of a leaf node
which is NULL and it gets propagated and returned.
Insert Operation
Inserting a value in the correct position is similar to searching because we try to
maintain the rule that the left subtree is lesser than root and the right subtree is
larger than root.
We keep going to either right subtree or left subtree depending on the value and
when we reach a point left or right subtree is null, we put the new node there.
Algorithm:
if node == NULL
    return createNode(data)
if (data < node->data)
    node->left = insert(node->left, data);
else if (data > node->data)
    node->right = insert(node->right, data);
return node;
The algorithm isn't as simple as it looks. Let's try to visualize how we add a number
to an existing BST.
4 < 8 so, traverse through the left child of 8
Image showing the importance of returning the root element at the end so that the elements don't lose
their position during the upward recursion step.
Deletion Operation
There are three cases for deleting a node from a binary search tree.
Case I
In the first case, the node to be deleted is the leaf node. In such a case, simply delete
the node from the tree.
4 is to be deleted
Case II
In the second case, the node to be deleted has a single child node. In such a case, follow the steps below:
copy the value of its child to the node and delete the child
Final tree
Case III
In the third case, the node to be deleted has two children. In such a case follow the
steps below:
3 is to be deleted
Copy the value of the inorder successor (4) to the node
# Create a node
class Node:
def __init__(self, key):
self.key = key
self.left = None
self.right = None
# Inorder traversal
def inorder(root):
if root is not None:
# Traverse left
inorder(root.left)
# Traverse root
print(str(root.key) + ">", end=' ')
# Traverse right
inorder(root.right)
# Insert a node
def insert(node, key):
    # Return a new node if the tree is empty
    if node is None:
        return Node(key)
    # Traverse to the right place and insert the node
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

# Find the inorder successor
def minValueNode(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

# Deleting a node
def deleteNode(root, key):
    if root is None:
        return root
    # Find the node to be deleted
    if key < root.key:
        root.left = deleteNode(root.left, key)
    elif key > root.key:
        root.right = deleteNode(root.right, key)
    else:
        # Node with only one child or no child
        if root.left is None:
            temp = root.right
            root = None
            return temp
        elif root.right is None:
            temp = root.left
            root = None
            return temp
        # Node with two children: place the inorder successor
        # in the position of the node to be deleted
        temp = minValueNode(root.right)
        root.key = temp.key
        root.right = deleteNode(root.right, temp.key)
    return root
root = None
root = insert(root, 8)
root = insert(root, 3)
root = insert(root, 1)
root = insert(root, 6)
root = insert(root, 7)
root = insert(root, 10)
root = insert(root, 14)
root = insert(root, 4)
print("\nDelete 10")
root = deleteNode(root, 10)
print("Inorder traversal: ", end=' ')
inorder(root)
AVL Tree
AVL tree is a self-balancing binary search tree in which each node maintains extra information called a balance factor, whose value is either -1, 0 or +1.
AVL tree got its name after its inventors Georgy Adelson-Velsky and Landis.
Balance Factor
Balance factor of a node in an AVL tree is the difference between the height of the
left subtree and that of the right subtree of that node.
Balance Factor = (Height of Left Subtree - Height of Right Subtree) or (Height of Right Subtree - Height of Left Subtree)
The self-balancing property of an AVL tree is maintained by the balance factor. The value of the balance factor should always be -1, 0 or +1.
Avl tree
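A tiny sketch of computing the balance factor recursively (the Node class and helper names here are illustrative, separate from the full AVLTree class further below):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Height of the left subtree minus height of the right subtree
    return height(node.left) - height(node.right)

root = Node(33, Node(13, Node(9)), Node(52))
print(balance_factor(root))   # 1: the left subtree is one level taller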
Left Rotate
In left-rotation, the arrangement of the nodes on the right is transformed into the arrangement on the left node.
Algorithm
Right Rotate
In right-rotation, the arrangement of the nodes on the left is transformed into the arrangement on the right node.
4. Else if y is the right child of its parent p, make x as the right child of p.
In left-right rotation, the arrangements are first shifted to the left and then to the right.
In right-left rotation, the arrangements are first shifted to the right and then to the left.
New node
2. Go to the appropriate leaf node to insert a newNode using the following recursive
steps. Compare newKey with rootKey of the current tree.
a. If newKey < rootKey, call insertion algorithm on the left subtree of the current node until
the leaf node is reached.
b. Else if newKey > rootKey, call insertion algorithm on the right subtree of current node
until the leaf node is reached.
a. If balanceFactor > 1, it means the height of the left subtree is greater than that of the right subtree. So, do a right rotation or left-right rotation.
b. If balanceFactor < -1, it means the height of the right subtree is greater than that of the left subtree. So, do a left rotation or right-left rotation.
a. If nodeToBeDeleted is the leaf node (ie. does not have any child), then
remove nodeToBeDeleted.
b. If nodeToBeDeleted has one child, then substitute the contents of nodeToBeDeleted with
that of the child. Remove the child.
c. If nodeToBeDeleted has two children, find the inorder successor w of nodeToBeDeleted (ie.
node with a minimum value of key in the right subtree).
Update bf
4. Rebalance the tree if the balance factor of any of the nodes is not equal to -1, 0 or 1.
import sys

# Create a tree node
class TreeNode(object):
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

class AVLTree(object):

    # Insert a node and rebalance the tree
    def insert_node(self, root, key):
        # Find the correct location and insert the node
        if not root:
            return TreeNode(key)
        elif key < root.key:
            root.left = self.insert_node(root.left, key)
        else:
            root.right = self.insert_node(root.right, key)

        root.height = 1 + max(self.getHeight(root.left),
                              self.getHeight(root.right))

        # Balance the tree after insertion
        balanceFactor = self.getBalance(root)
        if balanceFactor > 1:
            if key < root.left.key:
                return self.rightRotate(root)
            else:
                root.left = self.leftRotate(root.left)
                return self.rightRotate(root)
        if balanceFactor < -1:
            if key > root.right.key:
                return self.leftRotate(root)
            else:
                root.right = self.rightRotate(root.right)
                return self.leftRotate(root)
        return root

    # Delete a node and rebalance the tree
    def delete_node(self, root, key):
        # Find the node to be deleted and remove it
        if not root:
            return root
        elif key < root.key:
            root.left = self.delete_node(root.left, key)
        elif key > root.key:
            root.right = self.delete_node(root.right, key)
        else:
            if root.left is None:
                temp = root.right
                root = None
                return temp
            elif root.right is None:
                temp = root.left
                root = None
                return temp
            temp = self.getMinValueNode(root.right)
            root.key = temp.key
            root.right = self.delete_node(root.right, temp.key)
        if root is None:
            return root

        # Update the balance factor of nodes
        root.height = 1 + max(self.getHeight(root.left),
                              self.getHeight(root.right))
        balanceFactor = self.getBalance(root)
# Balance the tree
if balanceFactor > 1:
if self.getBalance(root.left) >= 0:
return self.rightRotate(root)
else:
root.left = self.leftRotate(root.left)
return self.rightRotate(root)
        if balanceFactor < -1:
if self.getBalance(root.right) <= 0:
return self.leftRotate(root)
else:
root.right = self.rightRotate(root.right)
return self.leftRotate(root)
return root
# Function to perform left rotation
def leftRotate(self, z):
y = z.right
T2 = y.left
y.left = z
z.right = T2
z.height = 1 + max(self.getHeight(z.left),
self.getHeight(z.right))
y.height = 1 + max(self.getHeight(y.left),
self.getHeight(y.right))
return y
# Function to perform right rotation
def rightRotate(self, z):
y = z.left
T3 = y.right
y.right = z
z.left = T3
z.height = 1 + max(self.getHeight(z.left),
self.getHeight(z.right))
y.height = 1 + max(self.getHeight(y.left),
self.getHeight(y.right))
return y
# Get the height of the node
def getHeight(self, root):
if not root:
return 0
return root.height
    # Get balance factor of the node
    def getBalance(self, root):
        if not root:
            return 0
        return self.getHeight(root.left) - self.getHeight(root.right)
def getMinValueNode(self, root):
if root is None or root.left is None:
return root
return self.getMinValueNode(root.left)
def preOrder(self, root):
if not root:
return
print("{0} ".format(root.key), end="")
self.preOrder(root.left)
self.preOrder(root.right)
# Print the tree
def printHelper(self, currPtr, indent, last):
if currPtr != None:
sys.stdout.write(indent)
if last:
sys.stdout.write("R")
indent += " "
else:
sys.stdout.write("L")
indent += "| "
print(currPtr.key)
self.printHelper(currPtr.left, indent, False)
self.printHelper(currPtr.right, indent, True)
myTree = AVLTree()
root = None
nums = [33, 13, 52, 9, 21, 61, 8, 11]
for num in nums:
root = myTree.insert_node(root, num)
myTree.printHelper(root, "", True)
key = 13
root = myTree.delete_node(root, key)
print("After Deletion: ")
myTree.printHelper(root, "", True)
AVL Tree Applications
• For indexing large records in databases
• For searching in large databases
B-tree
B-tree is a special type of self-balancing search tree in which each node can contain more than one key and can have more than two children. It is a generalized form of the binary search tree.
B-tree
Other data structures such as a binary search tree, AVL tree, red-black tree, etc. can store only one key in one node. If you have to store a large number of keys, then the height of such trees becomes very large and the access time increases.
However, a B-tree can store many keys in a single node and can have multiple child nodes. This decreases the height significantly, allowing faster disk accesses.
B-tree Properties
1. For each node x, the keys are stored in increasing order.
3. If n is the order of the tree, each internal node can contain at most n - 1 keys along with a pointer to each child.
4. Each node except root can have at most n children and at least n/2 children.
5. All leaves have the same depth (i.e. the height of the tree).
7. If n ≥ 1, then for any n-key B-tree of height h and minimum degree t ≥ 2, h ≤ log_t((n+1)/2).
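For example, with minimum degree t = 2 and n = 7 keys, the height is at most log_2((7+1)/2) = 2, so any key can be reached in at most two descents from the root.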
Operations on a B-tree
Searching an element in a Btree
1. Starting from the root node, compare k with the first key of the node.
If k = the first key of the node, return the node and the index.
3. If k < the first key of the root node, search the left child of this key recursively.
4. If there is more than one key in the current node and k > the first key, compare k with
the next key in the node.
If k < next key, search the left child of this key (ie. k lies in between the first and the
second keys).
Else, search the right child of the key.
Searching Example
1. Let us search key k = 17 in the tree below of degree 3.
Btree
2. k is not found in the root so, compare it with the root key.
4. Compare k with 16. Since k > 16, compare k with the next key 18.
Compare with the keys from left to right
5. Since k < 18, k lies between 16 and 18. Search in the right child of 16 or the left child of
18.
6. k is found.
k is found
# Create node
class BTreeNode:
def __init__(self, leaf=False):
self.leaf = leaf
self.keys = []
self.child = []
class BTree:
def __init__(self, t):
self.root = BTreeNode(True)
self.t = t
# Search key
def search_key(self, k, x=None):
if x is not None:
i=0
while i < len(x.keys) and k > x.keys[i][0]:
i += 1
if i < len(x.keys) and k == x.keys[i][0]:
return (x, i)
elif x.leaf:
return None
else:
return self.search_key(k, x.child[i])
else:
return self.search_key(k, self.root)
    # Split a full child node
    def split_child(self, x, i):
        t = self.t
        y = x.child[i]
        z = BTreeNode(y.leaf)
        x.child.insert(i + 1, z)
        x.keys.insert(i, y.keys[t - 1])
        z.keys = y.keys[t: (2 * t) - 1]
        y.keys = y.keys[0: t - 1]
        if not y.leaf:
            z.child = y.child[t: 2 * t]
            y.child = y.child[0: t]
def main():
B = BTree(3)
for i in range(10):
B.insert_key((i, 2 * i))
B.print_tree(B.root)
B Tree Applications
• databases and file systems
• multilevel indexing
Insertion into a BTree
Insertion Operation
1. If the tree is empty, allocate a root node and insert the key.
2. Update the allowed number of keys in the node.
3. Search the appropriate node for insertion.
4. If the node is full, follow steps 5 to 7; otherwise simply insert the key in increasing order.
5. Insert the element in increasing order.
6. Now, there are elements greater than its limit. So, split at the median.
7. Push the median key upwards and make the left keys a left child and the right keys a right child.
Insertion Example
Let us understand the insertion operation with the illustrations below.
The elements to be inserted are 8, 9, 10, 11, 15, 16, 17, 18, 20, 23.
Inserting elements into a Btree
# Create a node
class BTreeNode:
def __init__(self, leaf=False):
self.leaf = leaf
self.keys = []
self.child = []
# Tree
class BTree:
def __init__(self, t):
self.root = BTreeNode(True)
self.t = t
# Insert node
def insert(self, k):
root = self.root
if len(root.keys) == (2 * self.t) - 1:
temp = BTreeNode()
self.root = temp
temp.child.insert(0, root)
self.split_child(temp, 0)
self.insert_non_full(temp, k)
else:
self.insert_non_full(root, k)
    # Insert into a non-full node
    def insert_non_full(self, x, k):
        i = len(x.keys) - 1
        if x.leaf:
            x.keys.append((None, None))
            while i >= 0 and k[0] < x.keys[i][0]:
                x.keys[i + 1] = x.keys[i]
                i -= 1
            x.keys[i + 1] = k
        else:
            while i >= 0 and k[0] < x.keys[i][0]:
                i -= 1
            i += 1
            if len(x.child[i].keys) == (2 * self.t) - 1:
                self.split_child(x, i)
                if k[0] > x.keys[i][0]:
                    i += 1
            self.insert_non_full(x.child[i], k)
def main():
B = BTree(3)
for i in range(10):
B.insert((i, 2 * i))
B.print_tree(B.root)
if __name__ == '__main__':
main()
Deletion from a BTree
While deleting from a B-tree, a condition called underflow may occur. Underflow occurs when a node contains less than the minimum number of keys it should hold.
1. Inorder Predecessor
The largest key on the left child of a node is called its inorder predecessor.
2. Inorder Successor
The smallest key on the right child of a node is called its inorder successor.
Deletion Operation
Before going through the steps below, one must know these facts about a B tree of
degree m.
4. A node (except the root node) should contain a minimum of ⌈m/2⌉ - 1 keys. (i.e. 1)
Case I
The key to be deleted lies in the leaf. There are two cases for it.
1. The deletion of the key does not violate the property of the minimum number of
keys a node should hold.
In the tree below, deleting 32 does not violate the above properties.
2. The deletion of the key violates the property of the minimum number of keys a node
should hold. In this case, we borrow a key from its immediate neighboring sibling
node in the order of left to right.
First, visit the immediate left sibling. If the left sibling node has more than a
minimum number of keys, then borrow a key from this node.
In the tree below, deleting 31 results in the above condition. Let us borrow a key
from the left sibling node.
Deleting a leaf key (31)If both the immediate sibling nodes already have a minimum
number of keys, then merge the node with either the left sibling node or the right
sibling node. This merging is done through the parent node.
Case II
If the key to be deleted lies in the internal node, the following cases occur.
1. The internal node, which is deleted, is replaced by an inorder predecessor if the left
child has more than the minimum number of keys.
3. If either child has exactly a minimum number of keys then, merge the left and the
right children.
Deleting an internal node (30)After merging if the parent node has less than the
minimum number of keys then, look for the siblings as in Case I.
Case III
In this case, the height of the tree shrinks. If the target key lies in an internal node,
and the deletion of the key leads to a fewer number of keys in the node (i.e. less
than the minimum required), then look for the inorder predecessor and the inorder
successor. If both the children contain a minimum number of keys then, borrowing
cannot take place. This leads to Case II(3) i.e. merging the children.
Again, look for the sibling to borrow a key. But, if the sibling also has only a minimum
number of keys then, merge the node with the sibling along with the parent. Arrange
the children accordingly (increasing order).
# Btree node
class BTreeNode:
def __init__(self, leaf=False):
self.leaf = leaf
self.keys = []
self.child = []
class BTree:
def __init__(self, t):
self.root = BTreeNode(True)
self.t = t
# Insert a key
def insert(self, k):
root = self.root
if len(root.keys) == (2 * self.t) - 1:
temp = BTreeNode()
self.root = temp
temp.child.insert(0, root)
self.split_child(temp, 0)
self.insert_non_full(temp, k)
else:
self.insert_non_full(root, k)
    # Delete a key from the tree rooted at node x
    def delete(self, x, k):
        t = self.t
        i = 0
        while i < len(x.keys) and k[0] > x.keys[i][0]:
            i += 1
        if x.leaf:
            if i < len(x.keys) and x.keys[i][0] == k[0]:
                x.keys.pop(i)
            return
        if i < len(x.keys) and x.keys[i][0] == k[0]:
            return self.delete_internal_node(x, k, i)
elif len(x.child[i].keys) >= t:
self.delete(x.child[i], k)
else:
            if i != 0 and i + 2 < len(x.child):
                if len(x.child[i - 1].keys) >= t:
                    self.delete_sibling(x, i, i - 1)
elif len(x.child[i + 1].keys) >= t:
self.delete_sibling(x, i, i + 1)
else:
self.delete_merge(x, i, i + 1)
elif i == 0:
if len(x.child[i + 1].keys) >= t:
self.delete_sibling(x, i, i + 1)
else:
self.delete_merge(x, i, i + 1)
elif i + 1 == len(x.child):
                if len(x.child[i - 1].keys) >= t:
                    self.delete_sibling(x, i, i - 1)
                else:
                    self.delete_merge(x, i, i - 1)
self.delete(x.child[i], k)
# Delete internal node
def delete_internal_node(self, x, k, i):
t = self.t
if x.leaf:
if x.keys[i][0] == k[0]:
x.keys.pop(i)
return
return
if len(x.child[i].keys) >= t:
x.keys[i] = self.delete_predecessor(x.child[i])
return
elif len(x.child[i + 1].keys) >= t:
x.keys[i] = self.delete_successor(x.child[i + 1])
return
else:
self.delete_merge(x, i, i + 1)
self.delete_internal_node(x.child[i], k, self.t - 1)
# Delete the predecessor
def delete_predecessor(self, x):
        if x.leaf:
            return x.keys.pop()
        n = len(x.keys) - 1
if len(x.child[n].keys) >= self.t:
self.delete_sibling(x, n + 1, n)
else:
self.delete_merge(x, n, n + 1)
self.delete_predecessor(x.child[n])
# Delete the successor
def delete_successor(self, x):
if x.leaf:
return x.keys.pop(0)
if len(x.child[1].keys) >= self.t:
self.delete_sibling(x, 0, 1)
else:
self.delete_merge(x, 0, 1)
self.delete_successor(x.child[0])
# Delete resolution
def delete_merge(self, x, i, j):
cnode = x.child[i]
if j > i:
rsnode = x.child[j]
cnode.keys.append(x.keys[i])
for k in range(len(rsnode.keys)):
cnode.keys.append(rsnode.keys[k])
if len(rsnode.child) > 0:
cnode.child.append(rsnode.child[k])
if len(rsnode.child) > 0:
cnode.child.append(rsnode.child.pop())
new = cnode
x.keys.pop(i)
x.child.pop(j)
else:
lsnode = x.child[j]
lsnode.keys.append(x.keys[j])
for i in range(len(cnode.keys)):
lsnode.keys.append(cnode.keys[i])
if len(lsnode.child) > 0:
lsnode.child.append(cnode.child[i])
if len(lsnode.child) > 0:
lsnode.child.append(cnode.child.pop())
new = lsnode
x.keys.pop(j)
x.child.pop(i)
if x == self.root and len(x.keys) == 0:
self.root = new
B = BTree(3)
for i in range(10):
B.insert((i, 2 * i))
B.print_tree(B.root)
B.delete(B.root, (8,))
print("\n")
B.print_tree(B.root)
Deletion Complexity
Best case Time complexity: Θ(log n)
B+ Tree
A B+ tree is an advanced form of a self-balancing tree in which all the values are present at the leaf level.
Properties of a B+ Tree
1. All leaves are at the same level.
3. Each node except root can have a maximum of m children and at least m/2 children.
4. Each node can contain a maximum of m - 1 keys and a minimum of ⌈m/2⌉ - 1 keys.
Comparison between a B-tree and a B+ Tree
B-tree
B+ tree
The data pointers are present only at the leaf nodes of a B+ tree, whereas the data pointers are present in the internal, leaf or root nodes of a B-tree.
The leaves are not connected with each other in a B-tree, whereas they are connected in a B+ tree.
Searching on a B+ Tree
The following steps are followed to search for data in a B+ Tree of order m. Let the
data to be searched be k.
1. Start from the root node. Compare k with the keys at the root node [k1, k2, k3, ... k(m-1)].
2. If k < k1, go to the left child of the root node.
3. Else if k == k1, compare k2. If k < k2, k lies between k1 and k2. So, search in the left child
B+ tree
1. Compare k with the root node.
k not found
go to the right
5. k is found.
k is found
# B+ tree in Python
import math
# Node creation
class Node:
def __init__(self, order):
self.order = order
self.values = []
self.keys = []
self.nextKey = None
self.parent = None
self.check_leaf = False
# B plus tree
class BplusTree:
def __init__(self, order):
self.root = Node(order)
self.root.check_leaf = True
# Insert operation
def insert(self, value, key):
value = str(value)
old_node = self.search(value)
old_node.insert_at_leaf(old_node, value, key)
if (len(old_node.values) == old_node.order):
node1 = Node(old_node.order)
node1.check_leaf = True
node1.parent = old_node.parent
mid = int(math.ceil(old_node.order / 2)) - 1
node1.values = old_node.values[mid + 1:]
node1.keys = old_node.keys[mid + 1:]
node1.nextKey = old_node.nextKey
old_node.values = old_node.values[:mid + 1]
old_node.keys = old_node.keys[:mid + 1]
old_node.nextKey = node1
self.insert_in_parent(old_node, node1.values[0], node1)
parentNode = n.parent
temp3 = parentNode.keys
for i in range(len(temp3)):
if (temp3[i] == n):
parentNode.values = parentNode.values[:i] + \
[value] + parentNode.values[i:]
parentNode.keys = parentNode.keys[:i +
1] + [ndash] + parentNode.keys[i + 1:]
if (len(parentNode.keys) > parentNode.order):
parentdash = Node(parentNode.order)
parentdash.parent = parentNode.parent
mid = int(math.ceil(parentNode.order / 2)) - 1
parentdash.values = parentNode.values[mid + 1:]
parentdash.keys = parentNode.keys[mid + 1:]
value_ = parentNode.values[mid]
if (mid == 0):
parentNode.values = parentNode.values[:mid + 1]
else:
parentNode.values = parentNode.values[:mid]
parentNode.keys = parentNode.keys[:mid + 1]
for j in parentNode.keys:
j.parent = parentNode
for j in parentdash.keys:
j.parent = parentdash
self.insert_in_parent(parentNode, value_, parentdash)
# Delete a node
def delete(self, value, key):
node_ = self.search(value)
temp = 0
for i, item in enumerate(node_.values):
if item == value:
temp = 1
if key in node_.keys[i]:
if len(node_.keys[i]) > 1:
node_.keys[i].pop(node_.keys[i].index(key))
elif node_ == self.root:
node_.values.pop(i)
node_.keys.pop(i)
else:
node_.keys[i].pop(node_.keys[i].index(key))
del node_.keys[i]
node_.values.pop(node_.values.index(value))
self.deleteEntry(node_, value, key)
else:
print("Value not in Key")
return
if temp == 0:
print("Value not in Tree")
return
# Delete an entry
def deleteEntry(self, node_, value, key):
if not node_.check_leaf:
for i, item in enumerate(node_.keys):
if item == key:
node_.keys.pop(i)
break
for i, item in enumerate(node_.values):
if item == value:
node_.values.pop(i)
break
is_predecessor = 0
parentNode = node_.parent
            PrevNode = -1
            NextNode = -1
            PrevK = -1
            PostK = -1
            for i, item in enumerate(parentNode.keys):
                if item == node_:
                    if i > 0:
                        PrevNode = parentNode.keys[i - 1]
                        PrevK = parentNode.values[i - 1]
                    if i < len(parentNode.keys) - 1:
                        NextNode = parentNode.keys[i + 1]
                        PostK = parentNode.values[i]
            if PrevNode == -1:
                ndash = NextNode
                value_ = PostK
            elif NextNode == -1:
is_predecessor = 1
ndash = PrevNode
value_ = PrevK
else:
if len(node_.values) + len(NextNode.values) < node_.order:
ndash = NextNode
value_ = PostK
else:
is_predecessor = 1
ndash = PrevNode
value_ = PrevK
if not ndash.check_leaf:
for j in ndash.keys:
j.parent = ndash
self.deleteEntry(node_.parent, value_, node_)
del node_
else:
            if is_predecessor == 1:
                if not node_.check_leaf:
                    ndashpm = ndash.keys.pop(-1)
                    ndashkm_1 = ndash.values.pop(-1)
                    node_.keys = [ndashpm] + node_.keys
                    node_.values = [value_] + node_.values
                    parentNode = node_.parent
                    for i, item in enumerate(parentNode.values):
                        if item == value_:
                            parentNode.values[i] = ndashkm_1
                            break
                else:
                    ndashpm = ndash.keys.pop(-1)
                    ndashkm = ndash.values.pop(-1)
                    node_.keys = [ndashpm] + node_.keys
                    node_.values = [ndashkm] + node_.values
                    parentNode = node_.parent
                    for i, item in enumerate(parentNode.values):
                        if item == value_:
                            parentNode.values[i] = ndashkm
                            break
else:
if not node_.check_leaf:
ndashp0 = ndash.keys.pop(0)
ndashk0 = ndash.values.pop(0)
node_.keys = node_.keys + [ndashp0]
node_.values = node_.values + [value_]
parentNode = node_.parent
for i, item in enumerate(parentNode.values):
if item == value_:
parentNode.values[i] = ndashk0
break
else:
ndashp0 = ndash.keys.pop(0)
ndashk0 = ndash.values.pop(0)
node_.keys = node_.keys + [ndashp0]
node_.values = node_.values + [ndashk0]
parentNode = node_.parent
for i, item in enumerate(parentNode.values):
if item == value_:
parentNode.values[i] = ndash.values[0]
break
if not ndash.check_leaf:
for j in ndash.keys:
j.parent = ndash
if not node_.check_leaf:
for j in node_.keys:
j.parent = node_
if not parentNode.check_leaf:
for j in parentNode.keys:
j.parent = parentNode
# Print the tree
def printTree(tree):
    lst = [tree.root]
    level = [0]
    leaf = None
    flag = 0
    lev_leaf = 0
    node1 = Node(str(level[0]) + str(tree.root.values))
while (len(lst) != 0):
x = lst.pop(0)
lev = level.pop(0)
if (x.check_leaf == False):
for i, item in enumerate(x.keys):
print(item.values)
else:
for i, item in enumerate(x.keys):
print(item.values)
if (flag == 0):
lev_leaf = lev
leaf = x
flag = 1
record_len = 3
bplustree = BplusTree(record_len)
bplustree.insert('5', '33')
bplustree.insert('15', '21')
bplustree.insert('25', '31')
bplustree.insert('35', '41')
bplustree.insert('45', '10')
printTree(bplustree)
if(bplustree.find('5', '34')):
print("Found")
else:
print("Not found")
Search Complexity
Time Complexity: O(log n)
B+ Tree Applications
• Multilevel Indexing
• Faster operations on the tree (insertion, deletion, search)
• Database indexing
Insertion on a B+ Tree
Insertion on a B+ Tree
Inserting an element into a B+ tree consists of three main events: searching the
appropriate leaf, inserting the element and balancing/splitting the tree.
Let us understand these events below.
Insertion Operation
Before inserting an element into a B+ tree, these properties must be kept in mind.
• The root has at least two children.
The following steps are followed for inserting an element.
1. Since every element is inserted into the leaf node, go to the appropriate leaf node.
2. Insert the key into the leaf node.
Case I
1. If the leaf is not full, insert the key into the leaf node in increasing order.
Case II
1. If the leaf is full, insert the key into the leaf node in increasing order and balance the tree in the following way.
2. Break the node at the m/2th position.
3. Add the m/2th key to the parent node as well.
4. If the parent node is already full, follow steps 2 to 3.
Insertion Example
Let us understand the insertion operation with the illustrations below.
The elements to be inserted are 5,15, 25, 35, 45.
1. Insert 5.
Insert 5
2. Insert 15.
Insert 15
3. Insert 25.
Insert 25
4. Insert 35.
Insert 35
5. Insert 45.
Insert 45
# B+ tree in Python
import math
# Node creation
class Node:
def __init__(self, order):
self.order = order
self.values = []
self.keys = []
self.nextKey = None
self.parent = None
self.check_leaf = False
# B plus tree
class BplusTree:
def __init__(self, order):
self.root = Node(order)
self.root.check_leaf = True
# Insert operation
def insert(self, value, key):
value = str(value)
old_node = self.search(value)
old_node.insert_at_leaf(old_node, value, key)
if (len(old_node.values) == old_node.order):
node1 = Node(old_node.order)
node1.check_leaf = True
node1.parent = old_node.parent
mid = int(math.ceil(old_node.order / 2)) - 1
node1.values = old_node.values[mid + 1:]
node1.keys = old_node.keys[mid + 1:]
node1.nextKey = old_node.nextKey
old_node.values = old_node.values[:mid + 1]
old_node.keys = old_node.keys[:mid + 1]
old_node.nextKey = node1
self.insert_in_parent(old_node, node1.values[0], node1)
# Inserting at the parent
def insert_in_parent(self, n, value, ndash):
if (self.root == n):
rootNode = Node(n.order)
rootNode.values = [value]
rootNode.keys = [n, ndash]
self.root = rootNode
n.parent = rootNode
ndash.parent = rootNode
return
parentNode = n.parent
temp3 = parentNode.keys
for i in range(len(temp3)):
if (temp3[i] == n):
parentNode.values = parentNode.values[:i] + \
[value] + parentNode.values[i:]
parentNode.keys = parentNode.keys[:i +
1] + [ndash] + parentNode.keys[i + 1:]
if (len(parentNode.keys) > parentNode.order):
parentdash = Node(parentNode.order)
parentdash.parent = parentNode.parent
mid = int(math.ceil(parentNode.order / 2)) - 1
parentdash.values = parentNode.values[mid + 1:]
parentdash.keys = parentNode.keys[mid + 1:]
value_ = parentNode.values[mid]
if (mid == 0):
parentNode.values = parentNode.values[:mid + 1]
else:
parentNode.values = parentNode.values[:mid]
parentNode.keys = parentNode.keys[:mid + 1]
for j in parentNode.keys:
j.parent = parentNode
for j in parentdash.keys:
j.parent = parentdash
self.insert_in_parent(parentNode, value_, parentdash)
# Print the tree
def printTree(tree):
lst = [tree.root]
level = [0]
leaf = None
flag = 0
lev_leaf = 0
node1 = Node(str(level[0]) + str(tree.root.values))
while (len(lst) != 0):
x = lst.pop(0)
lev = level.pop(0)
if (x.check_leaf == False):
for i, item in enumerate(x.keys):
print(item.values)
else:
for i, item in enumerate(x.keys):
print(item.values)
if (flag == 0):
lev_leaf = lev
leaf = x
flag = 1
record_len = 3
bplustree = BplusTree(record_len)
bplustree.insert('5', '33')
bplustree.insert('15', '21')
bplustree.insert('25', '31')
bplustree.insert('35', '41')
bplustree.insert('45', '10')
printTree(bplustree)
if(bplustree.find('5', '34')):
print("Found")
else:
print("Not found")
Insertion Complexity
Time complexity: Θ(t.logt n)
Deletion Operation
Before going through the steps below, one must know these facts about a B+ tree of
degree m.
1. A node can have a maximum of m children. (i.e. 3)
While deleting a key, we have to take care of the keys present in the internal nodes
(i.e. indexes) as well because the values are redundant in a B+ tree. Search the key to
be deleted then follow the following steps.
Case I
The key to be deleted is present only at the leaf node not in the indexes (or internal
nodes). There are two cases for it:
1. There is more than the minimum number of keys in the node. Simply delete the key.
Deleting 40 from Btree
2. There is an exact minimum number of keys in the node. Delete the key and borrow a
key from the immediate sibling. Add the median key of the sibling node to the
parent.
Deleting 5 from Btree
Case II
The key to be deleted is present in the internal nodes as well. Then we have to
remove them from the internal nodes as well. There are the following cases for this
situation.
1. If there is more than the minimum number of keys in the node, simply delete the key
from the leaf node and delete the key from the internal node as well.
Fill the empty space in the internal node with the inorder successor.
2. If there is an exact minimum number of keys in the node, then delete the key and
borrow a key from its immediate sibling (through the parent).
Fill the empty space created in the index (internal node) with the borrowed key.
Deleting 35 from Btree
3. This case is similar to Case II(1) but here, empty space is generated above the
immediate parent node.
After deleting the key, merge the empty space with its sibling.
Fill the empty space in the grandparent node with the inorder successor.
Deleting 25 from Btree
Case III
In this case, the height of the tree gets shrinked. It is a little complicated.Deleting 55
from the tree below leads to this condition. It can be understood in the illustrations
below.
Deleting 55 from Btree
# B+ tree in Python
import math
# Node creation
class Node:
def __init__(self, order):
self.order = order
self.values = []
self.keys = []
self.nextKey = None
self.parent = None
self.check_leaf = False
# B plus tree
class BplusTree:
def __init__(self, order):
self.root = Node(order)
self.root.check_leaf = True
# Insert operation
def insert(self, value, key):
value = str(value)
old_node = self.search(value)
old_node.insert_at_leaf(old_node, value, key)
if (len(old_node.values) == old_node.order):
node1 = Node(old_node.order)
node1.check_leaf = True
node1.parent = old_node.parent
mid = int(math.ceil(old_node.order / 2)) - 1
node1.values = old_node.values[mid + 1:]
node1.keys = old_node.keys[mid + 1:]
node1.nextKey = old_node.nextKey
old_node.values = old_node.values[:mid + 1]
old_node.keys = old_node.keys[:mid + 1]
old_node.nextKey = node1
self.insert_in_parent(old_node, node1.values[0], node1)
parentNode = n.parent
temp3 = parentNode.keys
for i in range(len(temp3)):
if (temp3[i] == n):
parentNode.values = parentNode.values[:i] + \
[value] + parentNode.values[i:]
parentNode.keys = parentNode.keys[:i +
1] + [ndash] + parentNode.keys[i + 1:]
if (len(parentNode.keys) > parentNode.order):
parentdash = Node(parentNode.order)
parentdash.parent = parentNode.parent
mid = int(math.ceil(parentNode.order / 2)) - 1
parentdash.values = parentNode.values[mid + 1:]
parentdash.keys = parentNode.keys[mid + 1:]
value_ = parentNode.values[mid]
if (mid == 0):
parentNode.values = parentNode.values[:mid + 1]
else:
parentNode.values = parentNode.values[:mid]
parentNode.keys = parentNode.keys[:mid + 1]
for j in parentNode.keys:
j.parent = parentNode
for j in parentdash.keys:
j.parent = parentdash
self.insert_in_parent(parentNode, value_, parentdash)
# Delete a node
def delete(self, value, key):
node_ = self.search(value)
temp = 0
for i, item in enumerate(node_.values):
if item == value:
temp = 1
if key in node_.keys[i]:
if len(node_.keys[i]) > 1:
node_.keys[i].pop(node_.keys[i].index(key))
elif node_ == self.root:
node_.values.pop(i)
node_.keys.pop(i)
else:
node_.keys[i].pop(node_.keys[i].index(key))
del node_.keys[i]
node_.values.pop(node_.values.index(value))
self.deleteEntry(node_, value, key)
else:
print("Value not in Key")
return
if temp == 0:
print("Value not in Tree")
return
# Delete an entry
def deleteEntry(self, node_, value, key):
if not node_.check_leaf:
for i, item in enumerate(node_.keys):
if item == key:
node_.keys.pop(i)
break
for i, item in enumerate(node_.values):
if item == value:
node_.values.pop(i)
break
is_predecessor = 0
parentNode = node_.parent
PrevNode = -1
NextNode = -1
PrevK = -1
PostK = -1
for i, item in enumerate(parentNode.keys):
if item == node_:
if i > 0:
PrevNode = parentNode.keys[i - 1]
PrevK = parentNode.values[i - 1]
if i < len(parentNode.keys) - 1:
NextNode = parentNode.keys[i + 1]
PostK = parentNode.values[i]
if PrevNode == -1:
ndash = NextNode
value_ = PostK
elif NextNode == -1:
is_predecessor = 1
ndash = PrevNode
value_ = PrevK
else:
if len(node_.values) + len(NextNode.values) < node_.order:
ndash = NextNode
value_ = PostK
else:
is_predecessor = 1
ndash = PrevNode
value_ = PrevK
if not ndash.check_leaf:
for j in ndash.keys:
j.parent = ndash
if not ndash.check_leaf:
for j in ndash.keys:
j.parent = ndash
if not node_.check_leaf:
for j in node_.keys:
j.parent = node_
if not parentNode.check_leaf:
for j in parentNode.keys:
j.parent = parentNode
# Print the tree
def printTree(tree):
lst = [tree.root]
level = [0]
leaf = None
flag = 0
lev_leaf = 0
node1 = Node(str(level[0]) + str(tree.root.values))
while (len(lst) != 0):
x = lst.pop(0)
lev = level.pop(0)
if (x.check_leaf == False):
for i, item in enumerate(x.keys):
print(item.values)
else:
for i, item in enumerate(x.keys):
print(item.values)
if (flag == 0):
lev_leaf = lev
leaf = x
flag = 1
record_len = 3
bplustree = BplusTree(record_len)
bplustree.insert('5', '33')
bplustree.insert('15', '21')
bplustree.insert('25', '31')
bplustree.insert('35', '41')
bplustree.insert('45', '10')
printTree(bplustree)
if(bplustree.find('5', '34')):
print("Found")
else:
print("Not found")
Red-Black Tree
A red-black tree is a self-balancing binary search tree in which each node contains an
extra bit for denoting the color of the node, either red or black.
A red-black tree satisfies the following properties:
• Every node is colored either red or black.
• The root of the tree is always black.
• A red node cannot have a red parent or a red child (no two adjacent red nodes).
• Every path from a node to any of its descendant NULL leaves contains the same
number of black nodes.
An example of a redblack tree is:
Red Black Tree
Each node has the following attributes:
• color
• key
• leftChild
• rightChild
• parent (except root node)
The red-black color is meant for balancing the tree.
The limitations put on the node colors ensure that any simple path from the root to a
leaf is not more than twice as long as any other such path. This helps in maintaining
the self-balancing property of the red-black tree.
Rotating the subtrees in a Red-Black Tree
In the rotation operation, the positions of the nodes of a subtree are interchanged.
The rotation operation is used for maintaining the properties of a red-black tree when
they are violated by other operations such as insertion and deletion.
There are two types of rotations:
Left Rotate
In left-rotation, the arrangement of the nodes on the right is transformed into the
arrangement of the nodes on the left.
Algorithm
1. Let the initial tree be:
Initial tree
Right Rotate
In right-rotation, the arrangement of the nodes on the left is transformed into the
arrangement of the nodes on the right.
Initial Tree
4. Else if y is the right child of its parent p, make x as the right child of p.
In left-right rotation, the arrangements are first shifted to the left and then to the
right.
In right-left rotation, the arrangements are first shifted to the right and then to the
left.
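The pointer rearrangement described above can be sketched on a bare binary-tree node. This is only an illustration; the Node class and function names below are assumptions and are separate from the red-black tree implementation shown later.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

# Left-rotate the subtree rooted at x and return the new subtree root
def left_rotate(x):
    y = x.right          # y moves up
    x.right = y.left     # y's left subtree becomes x's right subtree
    y.left = x           # x becomes y's left child
    return y

# Right-rotate the subtree rooted at y and return the new subtree root
def right_rotate(y):
    x = y.left           # x moves up
    y.left = x.right     # x's right subtree becomes y's left subtree
    x.right = y          # y becomes x's right child
    return x
In the red-black tree code further down, the same rotations additionally update parent pointers and, when needed, the root of the whole tree.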
While inserting a new node, the new node is always inserted as a RED node. After
insertion of a new node, if the tree is violating the properties of the red-black tree,
then we do the following operations.
1. Recolor
2. Rotation
Following steps are followed for inserting a new element into a redblack tree:
1. Let y be the leaf (ie. NIL) and x be the root of the tree.
2. Check if the tree is empty (ie. whether x is NIL). If yes, insert newNode as a root node
and color it black.
This is because inserting a red node does not violate the depth property of a
red-black tree.
If you attach a red node to a red node, then the rule is violated, but it is easier to fix
this problem than the problem introduced by violating the depth property.
This algorithm is used for maintaining the property of a red-black tree if the insertion
of a newNode violates this property.
a. If the color of the right child of gP of z is RED, set the color of both the children
of gP as BLACK and the color of gP as RED.
b. Assign gP to newNode.
Case-II:
d. LeftRotate newNode.
Case-III:
f. RightRotate gP.
a. If the color of the left child of gP of z is RED, set the color of both the children of gP as
BLACK and the color of gP as RED.
b. Assign gP to newNode.
c. Else if newNode is the left child of p then, assign p to newNode and Right
Rotate newNode.
e. LeftRotate gP.
4. Set the root of the tree as BLACK.
This operation removes a node from the tree. After deleting a node, the red-black
property is maintained again.
4. Else
This violation is corrected by assuming that node x (which is occupying y's original
position) has an extra black. This makes node x neither red nor black. It is either
doubly black or black-and-red, which violates the red-black properties.
However, the color attribute of x is not changed; rather, the extra black is represented
by x's pointer to the node.
1. Do the following until the x is not the root of the tree and the color of x is BLACK
iii. RightRotate w.
3. Else the same as above with right changed to left and vice versa.
import sys
# Node creation
class Node():
def __init__(self, item):
self.item = item
self.parent = None
self.left = None
self.right = None
self.color = 1
class RedBlackTree():
def __init__(self):
self.TNULL = Node(0)
self.TNULL.color = 0
self.TNULL.left = None
self.TNULL.right = None
self.root = self.TNULL
# Preorder
def pre_order_helper(self, node):
if node != self.TNULL:
sys.stdout.write(node.item + " ")
self.pre_order_helper(node.left)
self.pre_order_helper(node.right)
# Inorder
def in_order_helper(self, node):
if node != self.TNULL:
self.in_order_helper(node.left)
sys.stdout.write(node.item + " ")
self.in_order_helper(node.right)
# Postorder
def post_order_helper(self, node):
if node != self.TNULL:
self.post_order_helper(node.left)
self.post_order_helper(node.right)
sys.stdout.write(node.item + " ")
# Search the tree
def search_tree_helper(self, node, key):
if node == self.TNULL or key == node.item:
return node
if key < node.item:
return self.search_tree_helper(node.left, key)
return self.search_tree_helper(node.right, key)
# Balancing the tree after deletion
def delete_fix(self, x):
while x != self.root and x.color == 0:
if x == x.parent.left:
s = x.parent.right
if s.color == 1:
s.color = 0
x.parent.color = 1
self.left_rotate(x.parent)
s = x.parent.right
if s.left.color == 0 and s.right.color == 0:
s.color = 1
x = x.parent
else:
if s.right.color == 0:
s.left.color = 0
s.color = 1
self.right_rotate(s)
s = x.parent.right
s.color = x.parent.color
x.parent.color = 0
s.right.color = 0
self.left_rotate(x.parent)
x = self.root
else:
s = x.parent.left
if s.color == 1:
s.color = 0
x.parent.color = 1
self.right_rotate(x.parent)
s = x.parent.left
if s.right.color == 0 and s.left.color == 0:
s.color = 1
x = x.parent
else:
if s.left.color == 0:
s.right.color = 0
s.color = 1
self.left_rotate(s)
s = x.parent.left
s.color = x.parent.color
x.parent.color = 0
s.left.color = 0
self.right_rotate(x.parent)
x = self.root
x.color = 0
def __rb_transplant(self, u, v):
if u.parent == None:
self.root = v
elif u == u.parent.left:
u.parent.left = v
else:
u.parent.right = v
v.parent = u.parent
# Node deletion
def delete_node_helper(self, node, key):
z = self.TNULL
while node != self.TNULL:
if node.item == key:
z = node
if node.item <= key:
node = node.right
else:
node = node.left
if z == self.TNULL:
print("Cannot find key in the tree")
return
y = z
y_original_color = y.color
if z.left == self.TNULL:
x = z.right
self.__rb_transplant(z, z.right)
elif (z.right == self.TNULL):
x = z.left
self.__rb_transplant(z, z.left)
else:
y = self.minimum(z.right)
y_original_color = y.color
x = y.right
if y.parent == z:
x.parent = y
else:
self.__rb_transplant(y, y.right)
y.right = z.right
y.right.parent = y
self.__rb_transplant(z, y)
y.left = z.left
y.left.parent = y
y.color = z.color
if y_original_color == 0:
self.delete_fix(x)
# Balance the tree after insertion
def fix_insert(self, k):
while k.parent.color == 1:
if k.parent == k.parent.parent.right:
u = k.parent.parent.left
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.left:
k = k.parent
self.right_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.left_rotate(k.parent.parent)
else:
u = k.parent.parent.right
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.right:
k = k.parent
self.left_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.right_rotate(k.parent.parent)
if k == self.root:
break
self.root.color = 0
# Printing the tree
def __print_helper(self, node, indent, last):
if node != self.TNULL:
sys.stdout.write(indent)
if last:
sys.stdout.write("R")
indent += " "
else:
sys.stdout.write("L")
indent += "| "
s_color = "RED" if node.color == 1 else "BLACK"
print(str(node.item) + "(" + s_color + ")")
self.__print_helper(node.left, indent, False)
self.__print_helper(node.right, indent, True)
def preorder(self):
self.pre_order_helper(self.root)
def inorder(self):
self.in_order_helper(self.root)
def postorder(self):
self.post_order_helper(self.root)
def searchTree(self, k):
return self.search_tree_helper(self.root, k)
def minimum(self, node):
while node.left != self.TNULL:
node = node.left
return node
def maximum(self, node):
while node.right != self.TNULL:
node = node.right
return node
def successor(self, x):
if x.right != self.TNULL:
return self.minimum(x.right)
y = x.parent
while y != self.TNULL and x == y.right:
x = y
y = y.parent
return y
def predecessor(self, x):
if (x.left != self.TNULL):
return self.maximum(x.left)
y = x.parent
while y != self.TNULL and x == y.left:
x = y
y = y.parent
return y
def left_rotate(self, x):
y = x.right
x.right = y.left
if y.left != self.TNULL:
y.left.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.left:
x.parent.left = y
else:
x.parent.right = y
y.left = x
x.parent = y
def right_rotate(self, x):
y = x.left
x.left = y.right
if y.right != self.TNULL:
y.right.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.right:
x.parent.right = y
else:
x.parent.left = y
y.right = x
x.parent = y
def insert(self, key):
node = Node(key)
node.parent = None
node.item = key
node.left = self.TNULL
node.right = self.TNULL
node.color = 1
y = None
x = self.root
while x != self.TNULL:
y = x
if node.item < x.item:
x = x.left
else:
x = x.right
node.parent = y
if y == None:
self.root = node
elif node.item < y.item:
y.left = node
else:
y.right = node
if node.parent == None:
node.color = 0
return
if node.parent.parent == None:
return
self.fix_insert(node)
def get_root(self):
return self.root
def delete_node(self, item):
self.delete_node_helper(self.root, item)
def print_tree(self):
self.__print_helper(self.root, "", True)
if __name__ == "__main__":
bst = RedBlackTree()
bst.insert(55)
bst.insert(40)
bst.insert(65)
bst.insert(60)
bst.insert(75)
bst.insert(57)
bst.print_tree()
Before reading this article, please refer to the article on red-black tree.
While inserting a new node, the new node is always inserted as a RED node. After
insertion of a new node, if the tree is violating the properties of the redblack tree
then, we do the following operations.
1. Recolor
2. Rotation
New node
2. Let y be the leaf (ie. NIL) and x be the root of the tree. The new node is inserted in the
following tree.
Initial tree
3. Check if the tree is empty (ie. whether x is NIL). If yes, insert newNode as a root node
and color it black.
Set the color of the newNode red and assign null to the children
This is because inserting a red node does not violate the depth property of a red
black tree.
If you attach a red node to a red node, then the rule is violated but it is easier to fix
this problem than the problem introduced by violating the depth property.
a. If the color of the right child of gP of newNode is RED, set the color of both the children
of gP as BLACK and the color of gP as RED.
Color change
b. Assign gP to newNode.
Reassigning newNode
Case-II:
c. (Before moving on to this step, while loop is checked. If conditions are not satisfied,
it the loop is broken.)
Else if newNode is the right child of p then, assign p to newNode.
Assigning parent of newNode as newNode
d. LeftRotate newNode.
Left Rotate
Case-III:
e. (Before moving on to this step, while loop is checked. If conditions are not satisfied,
it the loop is broken.)
Set color of p as BLACK and color of gP as RED.
Color change
f. RightRotate gP.
Right Rotate
a. If the color of the left child of gP of z is RED, set the color of both the children of gP as
BLACK and the color of gP as RED.
b. Assign gP to newNode.
c. Else if newNode is the left child of p then, assign p to newNode and RightRotate newNode.
e. LeftRotate gP.
4. (This step is performed after coming out of the while loop.)
Set the root of the tree as BLACK.
Final tree
import sys
# Node creation
class Node():
def __init__(self, item):
self.item = item
self.parent = None
self.left = None
self.right = None
self.color = 1
class RedBlackTree():
def __init__(self):
self.TNULL = Node(0)
self.TNULL.color = 0
self.TNULL.left = None
self.TNULL.right = None
self.root = self.TNULL
# Preorder
def pre_order_helper(self, node):
if node != self.TNULL:
sys.stdout.write(node.item + " ")
self.pre_order_helper(node.left)
self.pre_order_helper(node.right)
# Inorder
def in_order_helper(self, node):
if node != self.TNULL:
self.in_order_helper(node.left)
sys.stdout.write(node.item + " ")
self.in_order_helper(node.right)
# Postorder
def post_order_helper(self, node):
if node != self.TNULL:
self.post_order_helper(node.left)
self.post_order_helper(node.right)
sys.stdout.write(node.item + " ")
# Search the tree
def search_tree_helper(self, node, key):
if node == self.TNULL or key == node.item:
return node
if key < node.item:
return self.search_tree_helper(node.left, key)
return self.search_tree_helper(node.right, key)
# Balance the tree after insertion
def fix_insert(self, k):
while k.parent.color == 1:
if k.parent == k.parent.parent.right:
u = k.parent.parent.left
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.left:
k = k.parent
self.right_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.left_rotate(k.parent.parent)
else:
u = k.parent.parent.right
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.right:
k = k.parent
self.left_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.right_rotate(k.parent.parent)
if k == self.root:
break
self.root.color = 0
# Printing the tree
def __print_helper(self, node, indent, last):
if node != self.TNULL:
sys.stdout.write(indent)
if last:
sys.stdout.write("R")
indent += " "
else:
sys.stdout.write("L")
indent += "| "
s_color = "RED" if node.color == 1 else "BLACK"
print(str(node.item) + "(" + s_color + ")")
self.__print_helper(node.left, indent, False)
self.__print_helper(node.right, indent, True)
def preorder(self):
self.pre_order_helper(self.root)
def inorder(self):
self.in_order_helper(self.root)
def postorder(self):
self.post_order_helper(self.root)
def searchTree(self, k):
return self.search_tree_helper(self.root, k)
def minimum(self, node):
while node.left != self.TNULL:
node = node.left
return node
def maximum(self, node):
while node.right != self.TNULL:
node = node.right
return node
def successor(self, x):
if x.right != self.TNULL:
return self.minimum(x.right)
y = x.parent
while y != self.TNULL and x == y.right:
x = y
y = y.parent
return y
def predecessor(self, x):
if (x.left != self.TNULL):
return self.maximum(x.left)
y = x.parent
while y != self.TNULL and x == y.left:
x = y
y = y.parent
return y
def left_rotate(self, x):
y = x.right
x.right = y.left
if y.left != self.TNULL:
y.left.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.left:
x.parent.left = y
else:
x.parent.right = y
y.left = x
x.parent = y
def right_rotate(self, x):
y = x.left
x.left = y.right
if y.right != self.TNULL:
y.right.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.right:
x.parent.right = y
else:
x.parent.left = y
y.right = x
x.parent = y
def insert(self, key):
node = Node(key)
node.parent = None
node.item = key
node.left = self.TNULL
node.right = self.TNULL
node.color = 1
y = None
x = self.root
while x != self.TNULL:
y = x
if node.item < x.item:
x = x.left
else:
x = x.right
node.parent = y
if y == None:
self.root = node
elif node.item < y.item:
y.left = node
else:
y.right = node
if node.parent == None:
node.color = 0
return
if node.parent.parent == None:
return
self.fix_insert(node)
def get_root(self):
return self.root
def print_tree(self):
self.__print_helper(self.root, "", True)
if __name__ == "__main__":
bst = RedBlackTree()
bst.insert(55)
bst.insert(40)
bst.insert(65)
bst.insert(60)
bst.insert(75)
bst.insert(57)
bst.print_tree()
Red-Black Tree Deletion
Before reading this article, please refer to the article on red-black tree.
Deleting a node may or may not disrupt the red-black properties of a red-black tree.
If this action violates the red-black properties, then a fixing algorithm is used to
regain the red-black properties.
Node to be deleted
Assign x to the rightChild
Transplant nodeToBeDeleted with x
This violation is corrected by assuming that node x (which is occupying y's original
position) has an extra black. This makes node x neither red nor black. It is either
doubly black or blackandred. This violates the redblack properties.
However, the color attribute of x is not changed rather the extra black is represented
in x's pointing to the node.
1. Do the following until the x is not the root of the tree and the color of x is BLACK
2. If x is the left child of its parent then,
Assigning w
Color change
iii. LeftRotate the parent of x.
Leftrotate
Reassign w
Color change
iii. RightRotate w.
Right rotate
iv. Assign the rightChild of the parent of x to w.
Reassign w
Color change
iv. LeftRotate the parent of x.
Leftrotate
Set x as root
3. Else same as above with right changed to left and vice versa.
The workflow of the above cases can be understood with the help of the flowchart
below.
Flowchart for deletion operation
# Implementing RedBlack Tree in Python
import sys
# Node creation
class Node():
def __init__(self, item):
self.item = item
self.parent = None
self.left = None
self.right = None
self.color = 1
class RedBlackTree():
def __init__(self):
self.TNULL = Node(0)
self.TNULL.color = 0
self.TNULL.left = None
self.TNULL.right = None
self.root = self.TNULL
# Preorder
def pre_order_helper(self, node):
if node != self.TNULL:
sys.stdout.write(node.item + " ")
self.pre_order_helper(node.left)
self.pre_order_helper(node.right)
# Inorder
def in_order_helper(self, node):
if node != self.TNULL:
self.in_order_helper(node.left)
sys.stdout.write(node.item + " ")
self.in_order_helper(node.right)
# Postorder
def post_order_helper(self, node):
if node != self.TNULL:
self.post_order_helper(node.left)
self.post_order_helper(node.right)
sys.stdout.write(node.item + " ")
# Search the tree
def search_tree_helper(self, node, key):
if node == self.TNULL or key == node.item:
return node
if key < node.item:
return self.search_tree_helper(node.left, key)
return self.search_tree_helper(node.right, key)
# Balancing the tree after deletion
def delete_fix(self, x):
while x != self.root and x.color == 0:
if x == x.parent.left:
s = x.parent.right
if s.color == 1:
s.color = 0
x.parent.color = 1
self.left_rotate(x.parent)
s = x.parent.right
if s.left.color == 0 and s.right.color == 0:
s.color = 1
x = x.parent
else:
if s.right.color == 0:
s.left.color = 0
s.color = 1
self.right_rotate(s)
s = x.parent.right
s.color = x.parent.color
x.parent.color = 0
s.right.color = 0
self.left_rotate(x.parent)
x = self.root
else:
s = x.parent.left
if s.color == 1:
s.color = 0
x.parent.color = 1
self.right_rotate(x.parent)
s = x.parent.left
if s.right.color == 0 and s.left.color == 0:
s.color = 1
x = x.parent
else:
if s.left.color == 0:
s.right.color = 0
s.color = 1
self.left_rotate(s)
s = x.parent.left
s.color = x.parent.color
x.parent.color = 0
s.left.color = 0
self.right_rotate(x.parent)
x = self.root
x.color = 0
def __rb_transplant(self, u, v):
if u.parent == None:
self.root = v
elif u == u.parent.left:
u.parent.left = v
else:
u.parent.right = v
v.parent = u.parent
# Node deletion
def delete_node_helper(self, node, key):
z = self.TNULL
while node != self.TNULL:
if node.item == key:
z = node
if node.item <= key:
node = node.right
else:
node = node.left
if z == self.TNULL:
print("Cannot find key in the tree")
return
y = z
y_original_color = y.color
if z.left == self.TNULL:
x = z.right
self.__rb_transplant(z, z.right)
elif (z.right == self.TNULL):
x = z.left
self.__rb_transplant(z, z.left)
else:
y = self.minimum(z.right)
y_original_color = y.color
x = y.right
if y.parent == z:
x.parent = y
else:
self.__rb_transplant(y, y.right)
y.right = z.right
y.right.parent = y
self.__rb_transplant(z, y)
y.left = z.left
y.left.parent = y
y.color = z.color
if y_original_color == 0:
self.delete_fix(x)
# Balance the tree after insertion
def fix_insert(self, k):
while k.parent.color == 1:
if k.parent == k.parent.parent.right:
u = k.parent.parent.left
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.left:
k = k.parent
self.right_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.left_rotate(k.parent.parent)
else:
u = k.parent.parent.right
if u.color == 1:
u.color = 0
k.parent.color = 0
k.parent.parent.color = 1
k = k.parent.parent
else:
if k == k.parent.right:
k = k.parent
self.left_rotate(k)
k.parent.color = 0
k.parent.parent.color = 1
self.right_rotate(k.parent.parent)
if k == self.root:
break
self.root.color = 0
# Printing the tree
def __print_helper(self, node, indent, last):
if node != self.TNULL:
sys.stdout.write(indent)
if last:
sys.stdout.write("R")
indent += " "
else:
sys.stdout.write("L")
indent += "| "
s_color = "RED" if node.color == 1 else "BLACK"
print(str(node.item) + "(" + s_color + ")")
self.__print_helper(node.left, indent, False)
self.__print_helper(node.right, indent, True)
def preorder(self):
self.pre_order_helper(self.root)
def inorder(self):
self.in_order_helper(self.root)
def postorder(self):
self.post_order_helper(self.root)
def searchTree(self, k):
return self.search_tree_helper(self.root, k)
def minimum(self, node):
while node.left != self.TNULL:
node = node.left
return node
def maximum(self, node):
while node.right != self.TNULL:
node = node.right
return node
def successor(self, x):
if x.right != self.TNULL:
return self.minimum(x.right)
y = x.parent
while y != self.TNULL and x == y.right:
x = y
y = y.parent
return y
def predecessor(self, x):
if (x.left != self.TNULL):
return self.maximum(x.left)
y = x.parent
while y != self.TNULL and x == y.left:
x = y
y = y.parent
return y
def left_rotate(self, x):
y = x.right
x.right = y.left
if y.left != self.TNULL:
y.left.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.left:
x.parent.left = y
else:
x.parent.right = y
y.left = x
x.parent = y
def right_rotate(self, x):
y = x.left
x.left = y.right
if y.right != self.TNULL:
y.right.parent = x
y.parent = x.parent
if x.parent == None:
self.root = y
elif x == x.parent.right:
x.parent.right = y
else:
x.parent.left = y
y.right = x
x.parent = y
def insert(self, key):
node = Node(key)
node.parent = None
node.item = key
node.left = self.TNULL
node.right = self.TNULL
node.color = 1
y = None
x = self.root
while x != self.TNULL:
y = x
if node.item < x.item:
x = x.left
else:
x = x.right
node.parent = y
if y == None:
self.root = node
elif node.item < y.item:
y.left = node
else:
y.right = node
if node.parent == None:
node.color = 0
return
if node.parent.parent == None:
return
self.fix_insert(node)
def get_root(self):
return self.root
def delete_node(self, item):
self.delete_node_helper(self.root, item)
def print_tree(self):
self.__print_helper(self.root, "", True)
if __name__ == "__main__":
bst = RedBlackTree()
bst.insert(55)
bst.insert(40)
bst.insert(65)
bst.insert(60)
bst.insert(75)
bst.insert(57)
bst.print_tree()
print("\nAfter deleting an element")
bst.delete_node(40)
bst.print_tree()
Graph DS
Let's try to understand this through an example. On facebook, everything is a node.
That includes User, Photo, Album, Event, Group, Page, Comment, Story, Video, Link,
Note...anything that has data is a node.
Every relationship is an edge from one node to another. Whether you post a photo,
join a group, like a page, etc., a new edge is created for that relationship.
Example of graph data structure
All of Facebook is then a collection of these nodes and edges. This is because
Facebook uses a graph data structure to store its data.
More precisely, a graph is a data structure (V, E) that consists of
• A collection of vertices V
• A collection of edges E, represented as ordered pairs of vertices (u,v)
Vertices and edges
In the graph,
V = {0, 1, 2, 3}
E = {(0,1), (0,2), (0,3), (1,2)}
G = {V, E}
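The definition above maps directly onto Python data types. A minimal sketch follows; the adjacent() helper is purely illustrative.
# Vertex set and edge set of the example graph G = (V, E)
V = {0, 1, 2, 3}
E = {(0, 1), (0, 2), (0, 3), (1, 2)}

# Two vertices are adjacent if some edge connects them
def adjacent(u, v):
    return (u, v) in E or (v, u) in E

print(adjacent(0, 1))  # True
print(adjacent(2, 3))  # False, vertices 2 and 3 share no edge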
Graph Terminology
• Adjacency: A vertex is said to be adjacent to another vertex if there is an edge
connecting them. Vertices 2 and 3 are not adjacent because there is no edge
between them.
• Path: A sequence of edges that allows you to go from vertex A to vertex B is called a
path. 0-1, 1-2 and 0-2 are paths from vertex 0 to vertex 2.
• Directed Graph: A graph in which an edge (u,v) doesn't necessarily mean that there
is an edge (v, u) as well. The edges in such a graph are represented by arrows to
show the direction of the edge.
Graph Representation
Graphs are commonly represented in two ways:
1. Adjacency Matrix
An adjacency matrix is a 2D array of V x V vertices. Each row and column represent a
vertex.
If the value of any element a[i][j] is 1, it represents that there is an edge connecting
vertex i and vertex j.
2. Adjacency List
The index of the array represents a vertex and each element in its linked list
represents the other vertices that form an edge with the vertex.
The adjacency list for the graph we made in the first example is as follows:
Adjacency list representation
An adjacency list is efficient in terms of storage because we only need to store the
values for the edges. For a graph with millions of vertices, this can mean a lot of
saved space.
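For the small example graph introduced earlier, both representations can be written out directly; this sketch is only meant to make the two layouts concrete.
# Adjacency matrix for V = {0, 1, 2, 3}, E = {(0,1), (0,2), (0,3), (1,2)}
adj_matrix = [
    [0, 1, 1, 1],  # vertex 0 is connected to 1, 2 and 3
    [1, 0, 1, 0],  # vertex 1 is connected to 0 and 2
    [1, 1, 0, 0],  # vertex 2 is connected to 0 and 1
    [1, 0, 0, 0],  # vertex 3 is connected to 0 only
]

# Adjacency list for the same graph
adj_list = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1],
    3: [0],
}

print(adj_matrix[1][2])  # 1, there is an edge between vertices 1 and 2
print(adj_list[3])       # [0], the only neighbour of vertex 3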
Graph Operations
The most common graph operations are:
• Graph Traversal
An undirected graph is a graph in which the edges do not point in any direction (ie.
the edges are bidirectional).
Undirected Graph
A connected graph is a graph in which there is always a path from a vertex to any
other vertex.
Connected Graph
Spanning tree
A spanning tree is a subgraph of an undirected connected graph, which includes all
the vertices of the graph with a minimum possible number of edges. If a vertex is
missed, then it is not a spanning tree.
The total number of spanning trees with n vertices that can be created from a
complete graph is equal to n^(n-2).
If we have n = 4, the maximum number of possible spanning trees is equal to
4^(4-2) = 16. Thus, 16 spanning trees can be formed from a complete graph with 4 vertices.
Normal graph
Some of the possible spanning trees that can be created from the above graph are:
A spanning tree
A spanning tree
A spanning tree
A spanning tree
A spanning tree
A spanning tree
Weighted graph
The possible spanning trees from the above graph are:
Minimum spanning tree 1
1. Prim's Algorithm
2. Kruskal's Algorithm
• Cluster Analysis
For example:
Initial graph
The strongly connected components of the above graph are:
Let us start from vertex0, visit all of its child vertices, and mark the visited vertices
as done. If a vertex leads to an already visited vertex, then push this vertex to the
stack.
For example: Starting from vertex0, go to vertex1, vertex2, and then to vertex3.
Vertex3 leads to already visited vertex0, so push the source vertex (ie. vertex3)
into the stack.
Stacking
Similarly, a final stack is created.
Final Stack
Start from the top vertex of the stack. Traverse through all of its child vertices. Once
the already visited vertex is reached, one strongly connected component is formed.
For example: Pop vertex0 from the stack. Starting from vertex0, traverse through
its child vertices (vertex0, vertex1, vertex2, vertex3 in sequence) and mark them
as visited. The child of vertex3 is already visited, so these visited vertices form one
strongly connected component.
Start from the top and traverse through all the vertices
Go to the stack and pop the top vertex if already visited. Otherwise, choose the top
vertex from the stack and traverse through its child vertices as presented above.
from collections import defaultdict

class Graph:
    def __init__(self, vertex):
        self.V = vertex
        self.graph = defaultdict(list)

    # Add an edge to the graph
    def add_edge(self, s, d):
        self.graph[s].append(d)

    # dfs
    def dfs(self, d, visited_vertex):
        visited_vertex[d] = True
        print(d, end=' ')
        for i in self.graph[d]:
            if not visited_vertex[i]:
                self.dfs(i, visited_vertex)

    # Push vertices onto the stack in order of finishing time
    def fill_order(self, d, visited_vertex, stack):
        visited_vertex[d] = True
        for i in self.graph[d]:
            if not visited_vertex[i]:
                self.fill_order(i, visited_vertex, stack)
        stack.append(d)

    # Transpose the graph (reverse every edge)
    def transpose(self):
        g = Graph(self.V)
        for i in self.graph:
            for j in self.graph[i]:
                g.add_edge(j, i)
        return g

    # Print the strongly connected components
    def print_scc(self):
        stack = []
        visited_vertex = [False] * self.V
        for i in range(self.V):
            if not visited_vertex[i]:
                self.fill_order(i, visited_vertex, stack)
        gr = self.transpose()
        visited_vertex = [False] * self.V
        while stack:
            i = stack.pop()
            if not visited_vertex[i]:
                gr.dfs(i, visited_vertex)
                print("")
g = Graph(8)
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(2, 3)
g.add_edge(2, 4)
g.add_edge(3, 0)
g.add_edge(4, 5)
g.add_edge(5, 6)
g.add_edge(6, 4)
g.add_edge(6, 7)
print("Strongly Connected Components:")
g.print_scc()
• Maps
Adjacency Matrix
An adjacency matrix is a way of representing a graph as a matrix of booleans (0's and
1's). A finite graph can be represented in the form of a square matrix on a computer,
where the boolean value of the matrix indicates if there is a direct path between two
vertices.
An undirected graph
We can represent this graph in matrix form like below.
Matrix representation of the graph
Each cell in the above table/matrix is represented as Aij, where i and j are vertices.
The value of Aij is either 1 or 0 depending on whether there is an edge from
vertex i to vertex j.
If there is a path from i to j, then the value of Aij is 1 otherwise its 0. For instance,
there is a path from vertex 1 to vertex 2, so A12 is 1 and there is no path from vertex 1
to 3, so A13 is 0.
In the case of undirected graphs, the matrix is symmetric about the diagonal because
for every edge (i,j), there is also an edge (j,i).
• If the graph is dense and the number of edges is large, an adjacency matrix should be
the first choice. Even if the graph and the adjacency matrix is sparse, we can
represent it using data structures for sparse matrices.
• The biggest advantage, however, comes from the use of matrices. The recent
advances in hardware enable us to perform even expensive matrix operations on the
GPU.
• By performing operations on the adjacency matrix, we can get important insights into
the nature of the graph and the relationship between its vertices.
• While basic operations are easy, operations like inEdges and outEdges are expensive
when using the adjacency matrix representation.
class Graph(object):
    # Initialize the adjacency matrix with zeros
    def __init__(self, size):
        self.adjMatrix = []
        for i in range(size):
            self.adjMatrix.append([0 for i in range(size)])
        self.size = size

    # Add edges
    def add_edge(self, v1, v2):
        if v1 == v2:
            print("Same vertex %d and %d" % (v1, v2))
        self.adjMatrix[v1][v2] = 1
        self.adjMatrix[v2][v1] = 1

    # Remove edges
    def remove_edge(self, v1, v2):
        if self.adjMatrix[v1][v2] == 0:
            print("No edge between %d and %d" % (v1, v2))
            return
        self.adjMatrix[v1][v2] = 0
        self.adjMatrix[v2][v1] = 0

    def __len__(self):
        return self.size

    # Print the matrix row by row
    def print_matrix(self):
        for row in self.adjMatrix:
            print(' '.join(str(val) for val in row))

def main():
    g = Graph(5)
    g.add_edge(0, 1)
    g.add_edge(0, 2)
    g.add_edge(1, 2)
    g.add_edge(2, 0)
    g.add_edge(2, 3)
    g.print_matrix()

if __name__ == '__main__':
    main()
• Navigation tasks
Adjacency List
Adjacency List
An adjacency list represents a graph as an array of linked lists. The index of the array
represents a vertex and each element in its linked list represents the other vertices
that form an edge with the vertex.
An undirected graph
We can represent this graph in the form of a linked list on a computer as shown
below.
Linked list representation of the graph
Here, 0, 1, 2, 3 are the vertices and each of them forms a linked list with all of its
adjacent vertices. For instance, vertex 1 has two adjacent vertices 0 and 2. Therefore,
1 is linked with 0 and 2 in the figure above.
We stay close to the basic definition of a graph: a collection of vertices and edges {V,
E}. For simplicity, we use an unlabeled graph as opposed to a labeled one.
All we are saying is we want to store a pointer to struct node*. This is because we don't
know how many vertices the graph will have and so we cannot create an array of
Linked Lists at compile time.
From <https://www.programiz.com/dsa/graph-adjacency-list>
class AdjNode:
    def __init__(self, value):
        self.vertex = value
        self.next = None

class Graph:
    def __init__(self, num):
        self.V = num
        self.graph = [None] * self.V

    # Add edges (undirected, so insert the node at the head of both lists)
    def add_edge(self, s, d):
        node = AdjNode(d)
        node.next = self.graph[s]
        self.graph[s] = node

        node = AdjNode(s)
        node.next = self.graph[d]
        self.graph[d] = node

    # Print the adjacency list of every vertex
    def print_agraph(self):
        for i in range(self.V):
            print("Vertex " + str(i) + ":", end="")
            temp = self.graph[i]
            while temp:
                print(" -> {}".format(temp.vertex), end="")
                temp = temp.next
            print("")

if __name__ == "__main__":
    V = 5
    graph = Graph(V)
    # Edges of the example graph used earlier
    graph.add_edge(0, 1)
    graph.add_edge(0, 2)
    graph.add_edge(0, 3)
    graph.add_edge(1, 2)
    graph.print_agraph()
A standard DFS implementation puts each vertex of the graph into one of two
categories:
1. Visited
2. Not Visited
The purpose of the algorithm is to mark each vertex as visited while avoiding cycles.
2. Take the top item of the stack and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in the visited
list to the top of the stack.
Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the top of the stack
and visit it.
After we visit the last element 3, it doesn't have any unvisited adjacent nodes, so we
have completed the Depth First Traversal of the graph.
init() {
For each u ∈ G
u.visited = false
For each u ∈ G
DFS(G, u)
}
# DFS algorithm
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    print(start)
    # Recur for every adjacent vertex that has not been visited yet
    for neighbour in graph[start] - visited:
        dfs(graph, neighbour, visited)
    return visited

# Sample graph written as a dictionary of adjacency sets (illustrative)
graph = {'0': set(['1', '2']),
         '1': set(['0', '3', '4']),
         '2': set(['0']),
         '3': set(['1']),
         '4': set(['2', '3'])}
dfs(graph, '0')
BFS algorithm
A standard BFS implementation puts each vertex of the graph into one of two
categories:
1. Visited
2. Not Visited
The purpose of the algorithm is to mark each vertex as visited while avoiding cycles.
1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in the visited
list to the back of the queue.
The graph might have two different disconnected parts so to make sure that we
cover every vertex, we can also run the BFS algorithm on every node
BFS example
Let's see how the Breadth First Search algorithm works with an example. We use an
undirected graph with 5 vertices.
Visit the last remaining item in the queue to check if it has unvisited neighbors
Since the queue is empty, we have completed the Breadth First Traversal of the
graph.
BFS pseudocode
create a queue Q
mark v as visited and put v into Q
while Q is nonempty
remove the head u of Q
mark and enqueue all (unvisited) neighbours of u
import collections

# BFS algorithm
def bfs(graph, root):
    visited, queue = set(), collections.deque([root])
    visited.add(root)
    while queue:
        # Dequeue a vertex and print it
        vertex = queue.popleft()
        print(str(vertex) + " ", end="")
        # Enqueue all unvisited neighbours of the dequeued vertex
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

if __name__ == '__main__':
    graph = {0: [1, 2], 1: [2], 2: [3], 3: [1, 2]}
    print("Following is Breadth First Traversal: ")
    bfs(graph, 0)
Bellman Ford's Algorithm
It is similar to Dijkstra's algorithm but it can work with graphs in which edges can
have negative weights.
Why would one ever have edges with negative weights in real
life?
Negative weight edges might seem useless at first but they can explain a lot of
phenomena like cashflow, the heat released/absorbed in a chemical reaction, etc.
For instance, if there are different ways to reach from one chemical A to another
chemical B, each method will have subreactions involving both heat dissipation and
absorption.
If we want to find the set of reactions where minimum energy is required, then we
will need to be able to factor in the heat absorption as negative weights and heat
dissipation as positive weights.
Negative weight cycles can give an incorrect result when trying to find out the shortest path
Shortest path algorithms like Dijkstra's Algorithm that aren't able to detect such a
cycle can give an incorrect result because they can go through a negative weight
cycle and reduce the path length.
By doing this repeatedly for all vertices, we can guarantee that the result is
optimized.
Step1 for Bellman Ford's algorithm
We also want to be able to get the shortest path, not only know the length of the
shortest path. For this, we map each vertex to the vertex that last updated its path
length.
Once the algorithm is over, we can backtrack from the destination vertex to the
source vertex to find the path.
function bellmanFord(G, S)
  for each vertex V in G
    distance[V] <- infinite
    previous[V] <- NULL
  distance[S] <- 0
  for each vertex V in G
    for each edge (U,V) in G
      tempDistance <- distance[U] + edge_weight(U, V)
      if tempDistance < distance[V]
        distance[V] <- tempDistance
        previous[V] <- U
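Once Bellman-Ford has filled the distance and previous maps, the shortest path itself can be recovered by walking backwards from the destination, as described above. The small sketch below assumes the same variable names as the pseudocode; the example previous map is made up for illustration.
# Reconstruct the shortest path from source S to destination D
# using the previous[] map produced by Bellman-Ford
def build_path(previous, S, D):
    path = []
    node = D
    while node is not None:
        path.append(node)
        if node == S:
            break
        node = previous[node]
    path.reverse()
    return path

# Example: previous map for a tiny graph where 0 -> 1 -> 3 is the shortest route
previous = {0: None, 1: 0, 2: 0, 3: 1}
print(build_path(previous, 0, 3))  # [0, 1, 3]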
Bellman Ford's Algorithm vs Dijkstra's Algorithm
# Bellman Ford Algorithm in Python
class Graph:
def __init__(self, vertices):
self.V = vertices # Total number of vertices in the graph
self.graph = [] # Array of edges
# Add edges
def add_edge(self, s, d, w):
self.graph.append([s, d, w])
# Print the solution
def print_solution(self, dist):
print("Vertex Distance from Source")
for i in range(self.V):
print("{0}\t\t{1}".format(i, dist[i]))
def bellman_ford(self, src):
# Step 1: fill the distance array and predecessor array
dist = [float("Inf")] * self.V
# Mark the source vertex
dist[src] = 0
# Step 2: relax edges |V| - 1 times
for _ in range(self.V - 1):
for s, d, w in self.graph:
if dist[s] != float("Inf") and dist[s] + w < dist[d]:
dist[d] = dist[s] + w
# Step 3: detect negative cycle
# if value changes then we have a negative cycle in the graph
# and we cannot find the shortest distances
for s, d, w in self.graph:
if dist[s] != float("Inf") and dist[s] + w < dist[d]:
print("Graph contains negative weight cycle")
return
# No negative weight cycle found!
# Print the distance and predecessor array
self.print_solution(dist)
g = Graph(5)
g.add_edge(0, 1, 5)
g.add_edge(0, 2, 4)
g.add_edge(1, 3, 3)
g.add_edge(2, 1, 6)
g.add_edge(3, 2, 2)
g.bellman_ford(0)
Space Complexity
And, the space complexity is O(V).
2. For finding the shortest path
Bubble Sort
Bubble Sort
Bubble sort is a sorting algorithm that compares two adjacent elements and swaps
them until they are not in the intended order.
Just like the movement of air bubbles in the water that rise up to the surface, each
element of the array move to the end in each iteration. Therefore, it is called a
bubble sort.
1. Starting from the first index, compare the first and the second elements.
2. If the first element is greater than the second element, they are swapped.
3. Now, compare the second and the third elements. Swap them if they are not in
order.
4. The above process goes on until the last element.
Compare the Adjacent Elements
2. Remaining Iteration
The same process goes on for the remaining iterations.
After each iteration, the largest element among the unsorted elements is placed at
the end.
Put the largest element at the end
In each iteration, the comparison takes place up to the last unsorted element.
Compare the adjacent elements
The array is sorted when all the unsorted elements are placed at their correct
positions.
The array is sorted if all elements are kept in the right order
def bubbleSort(array):
swapped = True
bubbleSort(data)
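A complete version of the optimized bubble sort described above might look like the following; the sample data values are only an illustration.
def bubbleSort(array):
    # loop over each element of the array
    for i in range(len(array)):
        # track whether any swap happened in this pass
        swapped = False
        # compare adjacent elements of the unsorted part
        for j in range(0, len(array) - i - 1):
            if array[j] > array[j + 1]:
                # swap elements that are not in the intended order
                array[j], array[j + 1] = array[j + 1], array[j]
                swapped = True
        # if no swap happened, the array is already sorted
        if not swapped:
            break

data = [-2, 45, 0, 11, -9]
bubbleSort(data)
print('Sorted Array in Ascending Order:')
print(data)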
Complexity in Detail
Bubble sort compares adjacent elements, so the total number of comparisons is
(n - 1) + (n - 2) + ... + 1 = n(n - 1)/2, which nearly equals n².
Also, if we observe the code, bubble sort requires two loops. Hence, the complexity
is n*n = n².
1. Time Complexities
2. Space Complexity
• In the optimized bubble sort algorithm, two extra variables are used. Hence, the
space complexity will be O(2).
2. Compare minimum with the second element. If the second element is smaller
than minimum, assign the second element as minimum.
Compare minimum with the third element. Again, if the third element is smaller, then
assign minimum to the third element otherwise do nothing. The process goes on until
the last element.
Compare minimum with the remaining elements
3. After each iteration, minimum is placed in the front of the unsorted list.
4. For each iteration, indexing starts from the first unsorted element. Step 1 to 3 are
repeated until all the elements are placed at their correct positions.
The first iteration
The second iteration
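The steps above translate directly into code. A minimal selection sort sketch follows; the sample data values are illustrative.
def selectionSort(array):
    size = len(array)
    for step in range(size):
        # assume the first unsorted element is the minimum
        min_idx = step
        for i in range(step + 1, size):
            # select the smallest element of the unsorted part
            if array[i] < array[min_idx]:
                min_idx = i
        # put the minimum at the front of the unsorted list
        array[step], array[min_idx] = array[min_idx], array[step]

data = [20, 12, 10, 15, 2]
selectionSort(data)
print('Sorted Array in Ascending Order:')
print(data)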
Complexity = O(n²)
Also, we can analyze the complexity by simply observing the number of loops. There
are 2 loops so the complexity is n*n = n².
Time Complexities:
The time complexity of the selection sort is the same in all cases. At every step, you
have to find the minimum element and put it in the right place. The minimum
element is not known until the end of the array is reached.
Space Complexity:
Insertion sort works similarly as we sort cards in our hand in a card game.
We assume that the first card is already sorted then, we select an unsorted card. If
the unsorted card is greater than the card in hand, it is placed on the right otherwise,
to the left. In the same way, other unsorted cards are taken and put in their right
place.
Initial array
1. The first element in the array is assumed to be sorted. Take the second element and
store it separately in key.
Compare key with the first element. If the first element is greater than key, then key is
placed in front of the first element.
If the first element is greater than key, then key is placed in front of the first
element.
Take the third element and compare it with the elements on the left of it. Placed it
just behind the element smaller than it. If there is no element smaller than it, then
place it at the beginning of the array.
Place 1 at the beginning
def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1
        # Compare key with each element on the left of it until an element smaller than it is found
        # For descending order, change key<array[j] to key>array[j].
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1
        # Place key just after the element smaller than it
        array[j + 1] = key
data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Time Complexities
Each element has to be compared with each of the other elements, so for every nth
element, (n - 1) comparisons are made.
Divide
If q is the halfway point between p and r, then we can split the subarray A[p..r] into
two arrays A[p..q] and A[q+1..r].
Conquer
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r]. If we haven't
yet reached the base case, we again divide both these subarrays and try to sort
them.
Combine
When the conquer step reaches the base step and we get two sorted
subarrays A[p..q] and A[q+1..r] for array A[p..r], we combine the results by creating a
sorted array A[p..r] from two sorted subarrays A[p..q] and A[q+1..r].
MergeSort Algorithm
The MergeSort function repeatedly divides the array into two halves until we reach a
stage where we try to perform MergeSort on a subarray of size 1 i.e. p == r.
After that, the merge function comes into play and combines the sorted arrays into
larger arrays until the whole array is merged.
MergeSort(A, p, r):
  if p >= r
    return
  q = (p+r)/2
  mergeSort(A, p, q)
  mergeSort(A, q+1, r)
  merge(A, p, q, r)
As shown in the image below, the merge sort algorithm recursively divides the array
into halves until we reach the base case of array with 1 element. After that, the
merge function picks up the sorted subarrays and merges them to gradually sort the
entire array.
Every recursive algorithm is dependent on a base case and the ability to combine the
results from base cases. Merge sort is no different. The most important part of the
merge sort algorithm is, you guessed it, merge step.
The merge step is the solution to the simple problem of merging two sorted
lists(arrays) to build one large sorted list(array).
The algorithm maintains three pointers, one for each of the two arrays and one for
maintaining the current index of the final sorted array.
Have we reached the end of any of the arrays?
No:
Compare current elements of both arrays
Copy smaller element into sorted array
Move pointer of element containing smaller element
Yes:
Copy all remaining elements of nonempty array
Merge step
This is why we only need the array, the first position, the last index of the first
subarray(we can calculate the first index of the second subarray) and the last index
of the second subarray.
Our task is to merge two subarrays A[p..q] and A[q+1..r] to create a sorted array A[p..r].
So the inputs to the function are A, p, q and r
3. Until we reach the end of either L or M, pick the smaller among the current elements
of L and M and place it in the correct position of A[p..r]
int i, j, k;
i = 0;
j = 0;
k = p;
# MergeSort in Python
def mergeSort(array):
if len(array) > 1:
i=j=k=0
# Driver program
if __name__ == '__main__':
array = [6, 5, 12, 10, 9, 1]
mergeSort(array)
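For reference, one compact way to write the divide, conquer and combine steps described above is shown below; it is a sketch rather than the exact original listing.
def mergeSort(array):
    if len(array) > 1:
        # Divide: split the array into two halves L and M
        r = len(array) // 2
        L = array[:r]
        M = array[r:]

        # Conquer: sort each half recursively
        mergeSort(L)
        mergeSort(M)

        # Combine: merge the two sorted halves back into array
        i = j = k = 0
        while i < len(L) and j < len(M):
            if L[i] < M[j]:
                array[k] = L[i]
                i += 1
            else:
                array[k] = M[j]
                j += 1
            k += 1

        # Copy any remaining elements of the non-empty half
        while i < len(L):
            array[k] = L[i]
            i += 1
            k += 1
        while j < len(M):
            array[k] = M[j]
            j += 1
            k += 1
With this definition in place, the driver code above sorts [6, 5, 12, 10, 9, 1] in place.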
Time Complexity
Space Complexity
• External sorting
• Ecommerce applications
Quick Sort
Quicksort Algorithm
Quicksort is a sorting algorithm based on the divide and conquer approach where
While dividing the array, the pivot element should be positioned in such a way that
elements less than pivot are kept on the left side and elements greater than pivot
are on the right side of the pivot.
2. The left and right subarrays are also divided using the same approach. This process
continues until each subarray contains a single element.
3. At this point, elements are already sorted. Finally, elements are combined to form a
sorted array.
There are different variations of quicksort where the pivot element is selected from
different positions. Here, we will be selecting the rightmost element of the array as
the pivot element.
Now the elements of the array are rearranged so that elements that are smaller than
the pivot are put on the left and the elements greater than the pivot are put on the
right.
Put all the smaller elements on the left and greater on the right of pivot element
Here's how we rearrange the array:
a. A pointer is fixed at the pivot element. The pivot element is compared with the
elements beginning from the first index.
Comparison of pivot element with element beginning from the first index
b. If the element is greater than the pivot element, a second pointer is set for that
element.
If the element is greater than the pivot element, a second pointer is set for that
element.
c. Now, pivot is compared with other elements. If an element smaller than the pivot
element is reached, the smaller element is swapped with the greater element found
earlier.
d. Again, the process is repeated to set the next greater element as the second pointer.
And, swap it with another smaller element.
The process is repeated to set the next greater element as the second pointer.
Pivot elements are again chosen for the left and the right subparts separately.
And, step 2 is repeated.
Select pivot element of in each half and put at correct place using recursion
The subarrays are divided until each subarray is formed of a single element. At this
point, the array is already sorted.
You can understand the working of quicksort algorithm with the help of the
illustrations below.
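Putting the scheme above into code, with the rightmost element as the pivot, a sketch of the partition helper and the recursive quickSort used by the driver code below might look like this.
# Partition the array using the rightmost element as the pivot
def partition(array, low, high):
    pivot = array[high]      # rightmost element chosen as pivot
    i = low - 1              # boundary of the "smaller than pivot" region
    for j in range(low, high):
        if array[j] <= pivot:
            # element smaller than the pivot: grow the boundary and swap
            i = i + 1
            array[i], array[j] = array[j], array[i]
    # place the pivot just after the last smaller element
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1

# Recursively sort the sub-arrays on either side of the pivot
def quickSort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quickSort(array, low, pi - 1)
        quickSort(array, pi + 1, high)
After the call in the driver code below, data holds the sorted values.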
data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array")
print(data)
size = len(data)
quickSort(data, 0, size 1)
Quicksort Complexity
Time Complexity
Best O(n*log n)
Worst O(n²)
Average O(n*log n)
Space Complexity O(log n)
Stability No
1. Time Complexities
This condition leads to the case in which the pivot element lies in an extreme end of
the sorted array. One subarray is always empty and another subarray contains n - 1
elements. Thus, quicksort is called only on this subarray.
However, the quicksort algorithm has better performance for scattered pivots.
2. Space Complexity
Quicksort Applications
Quicksort algorithm is used when
Given array
2. Initialize an array of length max+1 with all elements 0. This array is used for storing the
count of the elements in the array.
Count array
3. Store the count of each element at their respective index in count array
For example: if the count of element 3 is 2 then, 2 is stored in the 3rd position
of count array. If element "5" is not present in the array, then 0 is stored in 5th
position.
Count of each element stored
4. Store cumulative sum of the elements of the count array. It helps in placing the
elements into the correct index of the sorted array.
Cumulative count
5. Find the index of each element of the original array in the count array. This gives the
cumulative count. Place the element at the index calculated as shown in figure
below.
Counting sort
6. After placing each element at its correct position, decrease its count by one.
def countingSort(array):
    size = len(array)
    output = [0] * size

    # Initialize the count array (assumes elements are in the range 0..9)
    count = [0] * 10

    # Store the count of each element at its index in the count array
    for m in range(size):
        count[array[m]] += 1

    # Store the cumulative count
    for m in range(1, 10):
        count[m] += count[m - 1]

    # Find the index of each element of the original array in count array
    # place the elements in output array
    i = size - 1
    while i >= 0:
        output[count[array[i]] - 1] = array[i]
        count[array[i]] -= 1
        i -= 1

    # Copy the sorted elements back into the original array
    for m in range(size):
        array[m] = output[m]
data = [4, 2, 2, 8, 3, 3, 1]
countingSort(data)
print("Sorted Array in Ascending Order: ")
print(data)
Complexity
Time Complexity
Best O(n+k)
Worst O(n+k)
Average O(n+k)
Space Complexity O(max)
Stability Yes
Time Complexities
There are mainly four main loops. (Finding the greatest value can be done outside
the function.)
In all the above cases, the complexity is the same because no matter how the
elements are placed in the array, the algorithm goes through n+k times.
Space Complexity
The space complexity of Counting Sort is O(max). Larger the range of elements, larger
is the space complexity.
Suppose, we have an array of 8 elements. First, we will sort elements based on the
value of the unit place. Then, we will sort elements based on the value of the tenth
place. This process goes on until the last significant place.
Let the initial array be [121, 432, 564, 23, 1, 45, 788]. It is sorted according to radix sort as
shown in the figure below.
In this array [121, 432, 564, 23, 1, 45, 788], we have the largest number 788. It has 3 digits.
Therefore, the loop should go up to hundreds place (3 times).
Use any stable sorting technique to sort the digits at each significant place. We have
used counting sort for this.
# Using counting sort to sort the elements in the basis of significant places
def countingSort(array, place):
size = len(array)
output = [0] * size
count = [0] * 10
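Continuing the counting pass started above, a complete sketch of radix sort is shown below; the helper name countingSortByPlace is illustrative, and the data values are the example array used earlier.
# Counting sort on the digit at the given place value (1, 10, 100, ...)
def countingSortByPlace(array, place):
    size = len(array)
    output = [0] * size
    count = [0] * 10

    # Count occurrences of each digit at this place
    for i in range(size):
        digit = (array[i] // place) % 10
        count[digit] += 1

    # Cumulative count gives final positions
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build the output array from the end to keep the sort stable
    i = size - 1
    while i >= 0:
        digit = (array[i] // place) % 10
        output[count[digit] - 1] = array[i]
        count[digit] -= 1
        i -= 1

    for i in range(size):
        array[i] = output[i]

# Radix sort: sort by each digit place, least significant first
def radixSort(array):
    max_element = max(array)
    place = 1
    while max_element // place > 0:
        countingSortByPlace(array, place)
        place *= 10

data = [121, 432, 564, 23, 1, 45, 788]
radixSort(data)
print(data)  # [1, 23, 45, 121, 432, 564, 788]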
For the radix sort that uses counting sort as an intermediate stable sort, the time
complexity is O(d(n+k)).
Here, d is the number of cycles (one counting pass per digit place) and O(n+k) is the time complexity of counting sort.
Thus, radix sort has linear time complexity which is better than O(nlog n) of
comparative sorting algorithms.
If we take very large digit numbers or the number of other bases like 32bit and 64
bit numbers then it can perform in linear time however the intermediate sort takes
large space.
This makes radix sort space inefficient. This is the reason why this sort is not used in
software libraries.
Finally, the sorted buckets are combined to form a final sorted array.
Input array
Create an array of size 10. Each slot of this array is used as a bucket for storing
elements.
If we take integer numbers as input, we have to divide it by the interval (10 here)
to get the floor value.
Insert all the elements into the buckets from the array
3. The elements of each bucket are sorted using any of the stable sorting algorithms.
Here, we have used quicksort (inbuilt function).
Sort the elements in each bucket
It is done by iterating through the bucket and inserting an individual element into the
original array in each cycle. The element from the bucket is erased once it is copied
into the original array.
def bucketSort(array):
bucket = []
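A sketch of the bucket sort just described, assuming the input values lie in the range [0, 1) so that each value maps to one of 10 buckets; Python's built-in sort stands in for the per-bucket sort, and the sample data is illustrative.
def bucketSort(array):
    # Create 10 empty buckets
    bucket = [[] for _ in range(10)]

    # Scatter: put each element into its bucket
    for value in array:
        index = int(10 * value)   # floor value for an interval of 0.1
        bucket[index].append(value)

    # Sort the elements of each bucket
    for b in bucket:
        b.sort()

    # Gather: copy the buckets back into the original array in order
    k = 0
    for b in bucket:
        for value in b:
            array[k] = value
            k += 1

data = [0.42, 0.32, 0.33, 0.52, 0.37, 0.47, 0.51]
bucketSort(data)
print("Sorted Array:", data)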
The initial set of numbers that we want to sort is stored in an array e.g. [10, 3, 76, 34, 23,
32] and after sorting, we get a sorted array [3,10,23,32,34,76].
Heap sort works by visualizing the elements of the array as a special kind of complete
binary tree called a heap.
Note: As a prerequisite, you must know about a complete binary tree and heap data
structure.
If the index of any element in the array is i, the element at index 2i+1 will become
the left child and the element at index 2i+2 will become the right child. Also, the parent of
any element at index i is given by the lower bound of (i-1)/2.
Similarly,
Left child of 12 (index 1)
= element in (2*1+1) index
= element in 3 index
= 5
Right child of 12
= element in (2*1+2) index
= element in 4 index
= 6
Let us also confirm that the rules hold for finding the parent of any node:
Parent of 9 (position 2)
= (2-1)/2
= 1/2
= 0.5
~ 0 index
= 1
Parent of 12 (position 1)
= (1-1)/2
= 0 index
= 1
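These index rules are easy to check directly. A tiny sketch, assuming the example array implied by the figures is [1, 12, 9, 5, 6]:
def left(i): return 2 * i + 1        # index of the left child
def right(i): return 2 * i + 2       # index of the right child
def parent(i): return (i - 1) // 2   # index of the parent (lower bound of (i-1)/2)

arr = [1, 12, 9, 5, 6]  # assumed example array
print(arr[left(1)], arr[right(1)])      # 5 6  -> children of 12
print(arr[parent(2)], arr[parent(1)])   # 1 1  -> parent of 9 and parent of 12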
• All nodes in the tree follow the property that they are greater than their children, i.e.
the largest element is at the root and both its children are smaller than the root, and
so on. Such a heap is called a max-heap. If instead all nodes are smaller than their
children, it is called a min-heap.
Max Heap and Min Heap
To learn more about it, please visit Heap Data Structure.
Since heapify uses recursion, it can be difficult to grasp. So let's first think about how
you would heapify a tree with just three elements.
heapify(array)
    Root = array[0]
    Largest = largest(array[0], array[2*0 + 1], array[2*0 + 2])
    if (Root != Largest)
        Swap(Root, Largest)
Heapify base cases
The example above shows two scenarios: one in which the root is the largest
element and we don't need to do anything, and another in which the root had a
larger element as a child and we needed to swap to maintain the max-heap property.
If you've worked with recursive algorithms before, you've probably identified that
this must be the base case.
Now let's think of another scenario in which there is more than one level.
How to heapify root element when its subtrees are already max heaps
The top element isn't in max-heap order, but all the subtrees are max-heaps.
To maintain the max-heap property for the entire tree, we will have to keep pushing
2 downwards until it reaches its correct position.
How to heapify root element when its subtrees are max-heaps
Thus, to maintain the max-heap property in a tree where both subtrees are max-heaps,
we need to run heapify on the root element repeatedly until it is larger than
its children or it becomes a leaf node.
We can combine both these conditions in one heapify function as
void heapify(int arr[], int n, int i) {
  // Find largest among root, left child and right child
  int largest = i;
  int left = 2 * i + 1;
  int right = 2 * i + 2;

  if (left < n && arr[left] > arr[largest])
    largest = left;

  if (right < n && arr[right] > arr[largest])
    largest = right;

  // Swap and continue heapifying if root is not largest
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}
This function works for both the base case and for a tree of any size. We can thus
move the root element to the correct position to maintain the max-heap status for
any tree size as long as the subtrees are max-heaps.
Build max-heap
To build a max-heap from any tree, we can thus start heapifying each subtree from
the bottom up and end up with a max-heap after the function is applied to all the
elements including the root element.
So, we can build a maximum heap as
// Build heap (rearrange array)
for (int i = n / 2 - 1; i >= 0; i--)
    heapify(arr, n, i);
Create array and calculate i
Steps to build max heap for heap sort
As shown in the above diagram, we start by heapifying the smallest trees at the
bottom and gradually move up until we reach the root element.
If you've understood everything till here, congratulations, you are on your way to
mastering the Heap sort.
1. Since the tree satisfies the max-heap property, the largest item is stored at the root node.
2. Swap: Remove the root element and put it at the end of the array (the nth position). Put
the last item of the tree (heap) at the vacant place.
3. Remove: Reduce the size of the heap by 1.
4. Heapify: Heapify the root element again so that we have the highest element at
root.
5. The process is repeated until all the items of the list are sorted.
Swap, Remove, and Heapify
The code below shows the operation.
// Heap sort
for (int i = n - 1; i >= 0; i--) {
    swap(&arr[0], &arr[i]);

    // Heapify root element to get highest element at root again
    heapify(arr, i, 0);
}
# Heap Sort in python

def heapify(arr, n, i):
    # Find largest among root and children
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2

    if l < n and arr[i] < arr[l]:
        largest = l

    if r < n and arr[largest] < arr[r]:
        largest = r

    # If root is not largest, swap with largest and continue heapifying
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heapSort(arr):
    n = len(arr)

    # Build max heap
    for i in range(n//2, -1, -1):
        heapify(arr, n, i)

    for i in range(n-1, 0, -1):
        # Swap
        arr[i], arr[0] = arr[0], arr[i]

        # Heapify root element
        heapify(arr, i, 0)
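A short driver for the functions above, using the example array from the start of this section:
arr = [10, 3, 76, 34, 23, 32]
heapSort(arr)
print("Sorted array is:", arr)  # [3, 10, 23, 32, 34, 76]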
Heap Sort has O(nlog n) time complexities for all the cases ( best case, average case,
and worst case).
Let us understand the reason why. The height of a complete binary tree containing n
elements is log n
As we have seen earlier, to fully heapify an element whose subtrees are already max
heaps, we need to keep comparing the element with its left and right children and
pushing it downwards until it reaches a point where both its children are smaller
than it.
In the worst case scenario, we will need to move an element from the root to the
leaf node making a multiple of log(n) comparisons and swaps.
During the build_max_heap stage, we do that for n/2 elements so the worst case
complexity of the build_heap step is n/2*log n ~ nlog n.
During the sorting step, we exchange the root element with the last element and
heapify the root element. For each element, this again takes log n worst time because
we might have to bring the element all the way from the root to the leaf. Since we
repeat this n times, the heap_sort step is also nlog n.
Also since the build_max_heap and heap_sort steps are executed one after another, the
algorithmic complexity is not multiplied and it remains in the order of nlog n.
Also it performs sorting in O(1) space complexity. Compared with Quick Sort, it has a
better worst case ( O(nlog n) ). Quick Sort has complexity O(n^2) for worst case. But in
other cases, Quick Sort is fast. Introsort is an alternative to heapsort that combines
quicksort and heapsort to retain advantages of both: worst case speed of heapsort
and average speed of quicksort.
Although Heap Sort has O(n log n) time complexity even for the worst case, it is not
as widely used as other sorting algorithms such as Quick Sort and Merge Sort.
However, its underlying data structure, the heap, can be used efficiently when we
want to extract the smallest (or largest) item from a list without the overhead of
keeping the remaining items sorted, e.g. in priority queues.
Shell Sort
Shell sort is a generalized version of the insertion sort algorithm. It first sorts
elements that are far apart from each other and successively reduces the interval
between the elements to be sorted.
The interval between the elements is reduced based on the sequence used. Some of
the optimal sequences that can be used in the shell sort algorithm are Shell's original
sequence (N/2, N/4, ..., 1), Knuth's increments, and Hibbard's increments.
Note: The performance of the shell sort depends on the type of sequence used for a
given input array.
Initial array
2. We are using the shell's original sequence (N/2, N/4, ...1) as intervals in our algorithm.
In the first loop, if the array size is N = 8 then, the elements lying at the interval of N/2
= 4 are compared and swapped if they are not in order.
b. If the 0th element is greater than the 4th one then, the 4th element is first stored
in temp variable and the 0th element (ie. greater element) is stored in the 4th position
and the element stored in temp is stored in the 0th position.
3. In the second loop, an interval of N/4 = 8/4 = 2 is taken and again the elements lying at
these intervals are sorted.
Rearrange the elements at n/4 interval
You might get confused at this point.
All the elements in the array lying at the current interval are compared.
The elements at 4th and 2nd position are compared. The elements
at 2nd and 0th position are also compared. All the elements in the array lying at the
current interval are compared.
5. Finally, when the interval is N/8 = 8/8 =1 then the array elements lying at the interval of
1 are sorted. The array is now completely sorted.
Rearrange the elements at n/8 interval
array[j] = temp
interval //= 2
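# (The two lines above are the tail of the shellSort function whose definition
# was lost; the complete sketch below follows the description above, using
# shell's original sequence N/2, N/4, ..., 1, so that the driver that follows runs.)
def shellSort(array, n):
    # Rearrange elements at each N/2, N/4, ..., 1 interval
    interval = n // 2
    while interval > 0:
        for i in range(interval, n):
            temp = array[i]
            j = i
            # Shift earlier interval-spaced elements greater than temp to the right
            while j >= interval and array[j - interval] > temp:
                array[j] = array[j - interval]
                j -= interval
            array[j] = temp
        interval //= 2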
data = [9, 8, 3, 7, 5, 6, 4, 1]
size = len(data)
shellSort(data, size)
print('Sorted Array in Ascending Order:')
print(data)
Shell sort is an unstable sorting algorithm because this algorithm does not examine
the elements lying in between the intervals.
Time Complexity
According to Poonen's theorem, the worst case complexity for shell sort is
Θ(N(log N)^2 / (log log N)^2), or Θ(N(log N)^2 / log log N), or Θ(N(log N)^2), or
something in between.
The complexity depends on the interval chosen. The above complexities differ for
different increment sequences chosen. The best increment sequence is unknown.
Space Complexity:
The space complexity for shell sort is O(1).
• Insertion sort does not perform well when close elements are far apart. Shell sort
helps in reducing the distance between the close elements. Thus, there will be fewer
swaps to perform.
Linear Search
Linear search is a sequential searching algorithm where we start from one end and
check every element of the list until the desired element is found. It is the simplest
searching algorithm.
Element found
def linearSearch(array, n, x):
    # Go through the array sequentially
    for i in range(n):
        if array[i] == x:
            return i
    return -1

array = [2, 4, 0, 1, 9]
x = 1
n = len(array)
result = linearSearch(array, n, x)
if result == -1:
    print("Element not found")
else:
    print("Element found at index: ", result)
Binary Search
Binary Search is a searching algorithm for finding an element's position in a sorted
array.
Binary search can be implemented only on a sorted list of items. If the elements are
not sorted already, we need to sort them first.
1. Iterative Method
2. Recursive Method
Initial array
Let x = 4 be the element to be searched.
2. Set two pointers low and high at the lowest and the highest positions respectively.
Setting pointers
3. Find the middle element mid of the array ie. arr[(low + high)/2] = 6.
Mid element
5. If x > mid, compare x with the middle element of the elements on the right side of mid.
This is done by setting low to low = mid + 1.
6. Else, compare x with the middle element of the elements on the left side of mid. This
is done by setting high to high = mid - 1.
8. x = 4 is found.
Found
def binarySearch(array, x, low, high):
    # Repeat until the pointers low and high meet each other
    while low <= high:
        mid = low + (high - low)//2
        if array[mid] == x:
            return mid
        elif array[mid] < x:
            low = mid + 1
        else:
            high = mid - 1
    return -1

array = [3, 4, 5, 6, 7, 8, 9]
x = 4
result = binarySearch(array, x, 0, len(array)-1)
if result != -1:
    print("Element is present at index " + str(result))
else:
    print("Not found")
# Binary Search in python
def binarySearch(array, x, low, high):
    if high >= low:
        mid = low + (high - low)//2
        # If found at mid, then return it
        if array[mid] == x:
            return mid
        # Search the left half
        elif array[mid] > x:
            return binarySearch(array, x, low, mid - 1)
        # Search the right half
        else:
            return binarySearch(array, x, mid + 1, high)
    else:
        return -1

array = [3, 4, 5, 6, 7, 8, 9]
x = 4
result = binarySearch(array, x, 0, len(array)-1)
if result != -1:
    print("Element is present at index " + str(result))
else:
    print("Not found")
Space Complexity
The space complexity of the binary search is O(1).
• While debugging, the binary search is used to pinpoint the place where the error
happens.
Greedy Algorithm
A greedy algorithm is an approach for solving a problem by selecting the best option
available at the moment. It doesn't worry whether the current best result will bring
the overall optimal result.
The algorithm never reverses the earlier decision even if the choice is wrong. It
works in a top-down approach.
This algorithm may not produce the best result for all the problems. It's because it
always goes for the local best choice to produce the global best result.
However, we can determine if the algorithm can be used with any problem if the
problem has the following properties:
1. Greedy Choice Property
If an optimal solution to the problem can be found by choosing the best choice at
each step without reconsidering the previous steps once chosen, the problem can be
solved using a greedy approach. This property is called greedy choice property.
2. Optimal Substructure
If the optimal overall solution to the problem corresponds to the optimal solution to
its subproblems, then the problem can be solved using a greedy approach. This
property is called optimal substructure.
For example, suppose we want to find the longest path in the graph below from root
to leaf. Let's use the greedy algorithm here.
Apply greedy approach to this tree to find the longest route
Greedy Approach
2. Our problem is to find the largest path. And, the optimal solution at the moment
is 3. So, the greedy algorithm will choose 3.
However, it is not the optimal solution. There is another path that carries more
weight (20 + 2 + 10 = 32) as shown in the image below.
Longest path
Therefore, greedy algorithms do not always give an optimal/feasible solution.
Greedy Algorithm
1. To begin with, the solution set (containing answers) is empty.
2. At each step, an item is added to the solution set until a solution is reached.
3. If the solution set is feasible, the current item is kept.
4. Else, the item is rejected and never considered again.
Let's now use this algorithm to solve a problem.
Solution:
3. Always select the coin with the largest value (i.e. 5) until the sum > 18. (When we
select the largest value at each step, we hope to reach the destination faster. This
concept is called greedy choice property.)
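As a small illustration of this greedy choice in code, here is a minimal Python sketch; the coin value 5 and the amount 18 come from the example above, while the remaining coin values {2, 1} and the function name greedy_coin_change are assumptions for illustration.
def greedy_coin_change(amount, coins=(5, 2, 1)):
    # Repeatedly pick the largest coin that does not overshoot the remaining amount
    solution = []
    remaining = amount
    for coin in sorted(coins, reverse=True):
        while remaining >= coin:
            solution.append(coin)   # keep the item: it is still feasible
            remaining -= coin
    return solution

print(greedy_coin_change(18))  # [5, 5, 5, 2, 1]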
Ford-Fulkerson Algorithm
Ford-Fulkerson algorithm is a greedy approach for calculating the maximum possible
flow in a network or a graph.
A term, flow network, is used to describe a network of vertices and edges with a
source (S) and a sink (T). Each vertex, except S and T, can receive and send an equal
amount of stuff through it. S can only send and T can only receive stuff.
We can visualize the understanding of the algorithm using a flow of liquid inside a
network of pipes of different capacities. Each pipe has a certain capacity of liquid it
can transfer at an instance. For this algorithm, we are going to find how much liquid
can flow from the source to the sink at an instance using the network.
Terminologies Used
Augmenting Path
It is a path available in the flow network along which more flow can still be sent.
Residual Graph
It represents the flow network with the additional flow that is still possible on each edge.
Residual Capacity
It is the capacity of the edge after subtracting the flow from the maximum capacity.
2. While there is an augmenting path between the source and the sink, add this path to
the flow.
Ford-Fulkerson Example
The flow of all the edges is 0 at the beginning.
2. Select another path S-D-C-T. The minimum capacity among these edges is 3 (S-D).
3. Now, let us consider the reverse path B-D as well. Selecting path S-A-B-D-C-T, the
minimum residual capacity among the edges is 1 (D-C).
4. Adding all the flows = 2 + 3 + 1 = 6, which is the maximum possible flow on the flow
network.
Note that if the capacity for any edge is full, then that path cannot be used.
# Ford-Fulkerson algorithm in Python

from collections import defaultdict

class Graph:

    def __init__(self, graph):
        self.graph = graph
        self.ROW = len(graph)

    # Using BFS as a searching algorithm
    def searching_algo_BFS(self, s, t, parent):
        visited = [False] * (self.ROW)
        queue = []

        queue.append(s)
        visited[s] = True

        while queue:
            u = queue.pop(0)
            for ind, val in enumerate(self.graph[u]):
                if visited[ind] == False and val > 0:
                    queue.append(ind)
                    visited[ind] = True
                    parent[ind] = u

        return True if visited[t] else False

    # Applying Ford-Fulkerson algorithm
    def ford_fulkerson(self, source, sink):
        parent = [-1] * (self.ROW)
        max_flow = 0

        while self.searching_algo_BFS(source, sink, parent):
            path_flow = float("Inf")
            s = sink
            while s != source:
                path_flow = min(path_flow, self.graph[parent[s]][s])
                s = parent[s]

            # Adding the path flows
            max_flow += path_flow

            # Updating the residual values of edges
            v = sink
            while v != source:
                u = parent[v]
                self.graph[u][v] -= path_flow
                self.graph[v][u] += path_flow
                v = parent[v]

        return max_flow
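# The capacity matrix below is an assumption added for illustration; the
# original text does not show it. Entry graph[u][v] is the capacity of the
# edge u -> v (0 means no edge). With these values the driver prints a
# maximum flow of 6, matching the example discussed above.
graph = [[0, 8, 0, 0, 3, 0],
         [0, 0, 9, 0, 0, 0],
         [0, 0, 0, 0, 7, 2],
         [0, 0, 0, 0, 0, 5],
         [0, 0, 7, 4, 0, 0],
         [0, 0, 0, 0, 0, 0]]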
g = Graph(graph)
source = 0
sink = 5

print("Max Flow: %d" % g.ford_fulkerson(source, sink))
Dijkstra's Algorithm
Dijkstra's algorithm allows us to find the shortest path between any two vertices of a graph.
It differs from the minimum spanning tree because the shortest distance between
two vertices might not include all the vertices of the graph.
The algorithm uses a greedy approach in the sense that we find the next best
solution hoping that the end result is the best solution for the whole problem.
Choose a starting vertex and assign infinity path values to all other devices
Go to each vertex and update its path length
If the path length of the adjacent vertex is lesser than new path length, don't update it
Notice how the rightmost vertex has its path length updated twice
Repeat until all the vertices have been visited
We also want to be able to get the shortest path, not only know the length of the
shortest path. For this, we map each vertex to the vertex that last updated its path
length.
Once the algorithm is over, we can backtrack from the destination vertex to the
source vertex to find the path.
A minimum priority queue can be used to efficiently receive the vertex with least
path distance.
function dijkstra(G, S)
    for each vertex V in G
        distance[V] <- infinite
        previous[V] <- NULL
        If V != S, add V to Priority Queue Q
    distance[S] <- 0

    while Q IS NOT EMPTY
        U <- Extract MIN from Q
        for each unvisited neighbour V of U
            tempDistance <- distance[U] + edge_weight(U, V)
            if tempDistance < distance[V]
                distance[V] <- tempDistance
                previous[V] <- U
    return distance[], previous[]
# Dijkstra's Algorithm in Python
import sys
# Providing the graph
vertices = [[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 1, 1, 0, 0],
[1, 0, 1, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 1, 0],
[0, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 1, 0]]
edges = [[0, 0, 1, 2, 0, 0, 0],
[0, 0, 2, 0, 0, 3, 0],
[1, 2, 0, 1, 3, 0, 0],
[2, 0, 1, 0, 0, 0, 1],
[0, 0, 3, 0, 0, 2, 0],
[0, 3, 0, 0, 2, 0, 1],
[0, 0, 0, 1, 0, 1, 0]]
# Find which vertex is to be visited next
def to_be_visited():
    global visited_and_distance
    v = -10
    # Choose the unvisited vertex with the smallest distance so far
    for index in range(num_of_vertices):
        if visited_and_distance[index][0] == 0 \
            and (v < 0 or visited_and_distance[index][1] <=
                 visited_and_distance[v][1]):
            v = index
    return v

num_of_vertices = len(vertices[0])

visited_and_distance = [[0, 0]]
for i in range(num_of_vertices - 1):
    visited_and_distance.append([0, sys.maxsize])

for vertex in range(num_of_vertices):
    # Find next vertex to be visited
    to_visit = to_be_visited()
    for neighbor_index in range(num_of_vertices):
        # Updating new distances
        if vertices[to_visit][neighbor_index] == 1 and \
                visited_and_distance[neighbor_index][0] == 0:
            new_distance = visited_and_distance[to_visit][1] \
                + edges[to_visit][neighbor_index]
            if visited_and_distance[neighbor_index][1] > new_distance:
                visited_and_distance[neighbor_index][1] = new_distance
        # Mark the chosen vertex as visited
        visited_and_distance[to_visit][0] = 1

i = 0
# Printing the distance
for distance in visited_and_distance:
    print("Distance of ", chr(ord('a') + i),
          " from source vertex: ", distance[1])
    i = i + 1
Time Complexity: O(E log V)
where E is the number of edges and V is the number of vertices.
Space Complexity: O(V)
• In social networking applications
• In a telephone network
• To find the locations in the map
Kruskal's
Kruskal's Algorithm
Kruskal's algorithm is a minimum spanning tree algorithm that takes a graph as input
and finds the subset of the edges of that graph which
• form a tree that includes every vertex
• has the minimum sum of weights among all the trees that can be formed from the
graph
We start from the edges with the lowest weight and keep adding edges until we
reach our goal.
The steps for implementing Kruskal's algorithm are as follows:
1. Sort all the edges from low weight to high
2. Take the edge with the lowest weight and add it to the spanning tree. If adding the
edge created a cycle, then reject this edge.
3. Keep adding edges until we reach all vertices.
Choose the edge with the least weight; if there is more than one, choose any one
Choose the next shortest edge and add it
Choose the next shortest edge that doesn't create a cycle and add it
Choose the next shortest edge that doesn't create a cycle and add it
Repeat until you have a spanning tree
The most common way to find this out is an algorithm called Union-Find. The Union-Find
algorithm divides the vertices into clusters and allows us to check if two vertices
belong to the same cluster or not and hence decide whether adding an edge creates
a cycle.
KRUSKAL(G):
A = ∅
For each vertex v ∈ G.V:
    MAKE-SET(v)
For each edge (u, v) ∈ G.E ordered by increasing order by weight(u, v):
    if FIND-SET(u) ≠ FIND-SET(v):
        A = A ∪ {(u, v)}
        UNION(u, v)
return A
class Graph:
    def __init__(self, vertices):
        self.V = vertices
        self.graph = []

    def add_edge(self, u, v, w):
        self.graph.append([u, v, w])

    # Search function: find the root of the set that element i belongs to
    def find(self, parent, i):
        if parent[i] == i:
            return i
        return self.find(parent, parent[i])

    # Merge the sets containing x and y, by rank
    def apply_union(self, parent, rank, x, y):
        xroot = self.find(parent, x)
        yroot = self.find(parent, y)
        if rank[xroot] < rank[yroot]:
            parent[xroot] = yroot
        elif rank[xroot] > rank[yroot]:
            parent[yroot] = xroot
        else:
            parent[yroot] = xroot
            rank[xroot] += 1

    # Applying Kruskal algorithm
    def kruskal_algo(self):
        result = []
        i, e = 0, 0
        self.graph = sorted(self.graph, key=lambda item: item[2])
        parent = []
        rank = []
        for node in range(self.V):
            parent.append(node)
            rank.append(0)
        while e < self.V - 1:
            u, v, w = self.graph[i]
            i = i + 1
            x = self.find(parent, u)
            y = self.find(parent, v)
            if x != y:
                e = e + 1
                result.append([u, v, w])
                self.apply_union(parent, rank, x, y)
        for u, v, weight in result:
            print("%d - %d: %d" % (u, v, weight))
g = Graph(6)
g.add_edge(0, 1, 4)
g.add_edge(0, 2, 4)
g.add_edge(1, 2, 2)
g.add_edge(1, 0, 4)
g.add_edge(2, 0, 4)
g.add_edge(2, 1, 2)
g.add_edge(2, 3, 3)
g.add_edge(2, 5, 2)
g.add_edge(2, 4, 4)
g.add_edge(3, 2, 3)
g.add_edge(3, 4, 3)
g.add_edge(4, 2, 4)
g.add_edge(4, 3, 3)
g.add_edge(5, 2, 2)
g.add_edge(5, 4, 3)
g.kruskal_algo()
• In computer network (LAN connection)
Prim's
Prim's Algorithm
Prim's algorithm is a minimum spanning tree algorithm that takes a graph as input
and finds the subset of the edges of that graph which
• form a tree that includes every vertex
• has the minimum sum of weights among all the trees that can be formed from the
graph
We start from one vertex and keep adding edges with the lowest weight until we
reach our goal.
The steps for implementing Prim's algorithm are as follows:
1. Initialize the minimum spanning tree with a vertex chosen at random.
2. Find all the edges that connect the tree to new vertices, find the minimum and add it
to the tree
3. Keep repeating step 2 until we get a minimum spanning tree
Choose the nearest edge not yet in the solution, if there are multiple choices, choose one at random
Repeat until you have a spanning tree
INF = 9999999
# number of vertices in graph
V = 5
# create a 2d array of size 5x5
# for adjacency matrix to represent graph
G = [[0, 9, 75, 0, 0],
     [9, 0, 95, 19, 42],
     [75, 95, 0, 51, 66],
     [0, 19, 51, 0, 31],
     [0, 42, 66, 31, 0]]
# create an array to track selected vertices
# selected will become true otherwise false
selected = [0, 0, 0, 0, 0]
# set number of edges to 0
no_edge = 0
# the number of edges in a minimum spanning tree will be
# always less than (V - 1), where V is the number of vertices in
# the graph
# choose 0th vertex and make it true
selected[0] = True
# print for edge and weight
print("Edge : Weight\n")
while (no_edge < V - 1):
    # For every vertex in the set S, find all adjacent vertices,
    # and calculate the distance from the vertex selected at step 1.
    # If the vertex is already in the set S, discard it; otherwise
    # choose another vertex nearest to a selected vertex.
    minimum = INF
    x = 0
    y = 0
    for i in range(V):
        if selected[i]:
            for j in range(V):
                if ((not selected[j]) and G[i][j]):
                    # not in selected and there is an edge
                    if minimum > G[i][j]:
                        minimum = G[i][j]
                        x = i
                        y = j
    print(str(x) + "-" + str(y) + ":" + str(G[x][y]))
    selected[y] = True
    no_edge += 1
• In network design
Huffman Coding
Huffman Coding is a technique of compressing data to reduce its size without losing
any of the details. It was first developed by David Huffman.
Huffman Coding is generally useful to compress the data in which there are
frequently occurring characters.
Initial string
Each character occupies 8 bits. There are a total of 15 characters in the above string.
Thus, a total of 8 * 15 = 120 bits are required to send this string.
Using the Huffman Coding technique, we can compress the string to a smaller size.
Huffman coding first creates a tree using the frequencies of the characters and then
generates code for each character.
Once the data is encoded, it has to be decoded. Decoding is done using the same
tree.
Huffman Coding prevents any ambiguity in the decoding process using the concept
of prefix code ie. a code associated with a character should not be present in the
prefix of any other code. The tree created above helps in maintaining the property.
Frequency of string
2. Sort the characters in increasing order of the frequency. These are stored in a
priority queue Q.
4. Create an empty node z. Assign the minimum frequency to the left child of z and
assign the second minimum frequency to the right child of z. Set the value of the z as
the sum of the above two minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of
frequencies (* denote the internal nodes in the figure above).
6. Insert node z into the tree.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
Assign 0 to the left edge and 1 to the right edge
For sending the above string over a network, we have to send the tree as well as the
above compressed code. The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding the size is
reduced to 32 + 15 + 28 = 75.
Let 101 be the code to be decoded; we can traverse from the root as in the figure below.
Decoding
# Huffman Coding in python
string = 'BCAADDDCCACACAC'

# Creating tree nodes
class NodeTree(object):
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def children(self):
        return (self.left, self.right)

    def nodes(self):
        return (self.left, self.right)

    def __str__(self):
        return '%s_%s' % (self.left, self.right)

# Main function implementing huffman coding
def huffman_code_tree(node, left=True, binString=''):
    if type(node) is str:
        return {node: binString}
    (l, r) = node.children()
    d = dict()
    d.update(huffman_code_tree(l, True, binString + '0'))
    d.update(huffman_code_tree(r, False, binString + '1'))
    return d

# Calculating frequency
freq = {}
for c in string:
    if c in freq:
        freq[c] += 1
    else:
        freq[c] = 1

# Sort characters by decreasing frequency, then repeatedly merge the two
# least frequent nodes until a single tree remains
nodes = sorted(freq.items(), key=lambda x: x[1], reverse=True)
while len(nodes) > 1:
    (key1, c1) = nodes[-1]
    (key2, c2) = nodes[-2]
    nodes = nodes[:-2]
    node = NodeTree(key1, key2)
    nodes.append((node, c1 + c2))
    nodes = sorted(nodes, key=lambda x: x[1], reverse=True)

huffmanCode = huffman_code_tree(nodes[0][0])
for (char, frequency) in freq.items():
    print(' %-4r |%12s' % (char, huffmanCode[char]))
Dynamic Programming
Dynamic Programming is a technique in computer programming that helps to
efficiently solve a class of problems that have overlapping subproblems and optimal
substructure property.
If any problem can be divided into subproblems, which in turn are divided into
smaller subproblems, and if there are overlapping among these subproblems, then
the solutions to these subproblems can be saved for future reference. In this way,
efficiency of the CPU can be enhanced. This method of solving a solution is referred
to as dynamic programming.
Such problems involve repeatedly calculating the value of the same subproblems to
find the optimum solution.
Algorithm
Let n be the number of terms.
1. If n <= 1, return n.
2. Else, return the sum of two preceding numbers.
3. The third term is sum of 0 (from step 1) and 1(from step 2), which is 1.
4. The fourth term is the sum of the third term (from step 3) and second term (from
step 2) i.e. 1 + 1 = 2.
5. The fifth term is the sum of the fourth term (from step 4) and third term (from step
3) i.e. 2 + 1 = 3.
Hence, we have the sequence 0,1,1, 2, 3. Here, we have used the results of the
previous steps as shown below. This is called a dynamic programming approach.
F(0) = 0
F(1) = 1
F(2) = F(1) + F(0)
F(3) = F(2) + F(1)
F(4) = F(3) + F(2)
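A minimal sketch of this idea in Python: each term is computed once, stored, and then reused instead of being recomputed.
def fib(n):
    terms = [0, 1]  # F(0) and F(1)
    for i in range(2, n + 1):
        terms.append(terms[i - 1] + terms[i - 2])  # reuse the saved results
    return terms[n]

print([fib(i) for i in range(5)])  # [0, 1, 1, 2, 3]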
But not all problems that use recursion can use Dynamic Programming. Unless there
is a presence of overlapping subproblems like in the fibonacci sequence problem, a
recursion can only reach the solution using a divide and conquer approach.
That is the reason why a recursive algorithm like Merge Sort cannot use Dynamic
Programming, because the subproblems are not overlapping in any way.
However, greedy algorithms look for locally optimum solutions, or in other words, a
greedy choice, in the hope of finding a global optimum. Hence greedy algorithms
can make a guess that looks optimal at the time but becomes costly down the line,
and they do not guarantee a globally optimal solution.
Dynamic programming, on the other hand, finds the optimal solution to subproblems
and then makes an informed choice to combine the results of those subproblems to
find the most optimal solution.
Floyd Warshall
Floyd-Warshall Algorithm
Floyd-Warshall Algorithm is an algorithm for finding the shortest path between all
the pairs of vertices in a weighted graph. This algorithm works for both directed
and undirected weighted graphs. But, it does not work for graphs with negative
cycles (where the sum of the edges in a cycle is negative).
A weighted graph is a graph in which each edge has a numerical value associated
with it.
This algorithm follows the dynamic programming approach to find the shortest
paths.
Initial graph
Follow the steps below to find the shortest path between all the pairs of vertices.
1. Create a matrix A0 of dimension n*n where n is the number of vertices. The row and
the column are indexed as i and j respectively. i and j are the vertices of the graph.
Each cell A[i][j] is filled with the distance from the ith vertex to the jth vertex. If there is
no path from ith vertex to jth vertex, the cell is left as infinity.
Fill each cell with the distance between ith and jth vertex
2. Now, create a matrix A1 using matrix A0. The elements in the first column and the first
row are left as they are. The remaining cells are filled in the following way.
Let k be the intermediate vertex in the shortest path from source to destination. In
this step, k is the first vertex. A[i][j] is filled with (A[i][k] + A[k][j]) if (A[i][j] > A[i][k] + A[k][j]).
That is, if the direct distance from the source to the destination is greater than the
path through the vertex k, then the cell is filled with A[i][k] + A[k][j].
In this step, k is vertex 1. We calculate the distance from source vertex to destination
vertex through this vertex k.
Calculate the distance from the source vertex to destination vertex through this
vertex k
For example: For A1[2, 4], the direct distance from vertex 2 to 4 is 4, and the sum of the
distance from vertex 2 to 4 through vertex 1 (i.e. from vertex 2 to 1 and from vertex 1
to 4) is 7. Since 4 < 7, A1[2, 4] is filled with 4.
3. Similarly, A2 is created using A1. The elements in the second column and the second
row are left as they are.
In this step, k is the second vertex (i.e. vertex 2). The remaining steps are the same as
in step 2.
Calculate the distance from the source vertex to destination vertex through this
vertex 2
Calculate the distance from the source vertex to destination vertex through this
vertex 3
Calculate the distance from the source vertex to destination vertex through this
vertex 4
5. A4 gives the shortest path between each pair of vertices.
Floyd-Warshall Algorithm
n = no of vertices
A = matrix of dimension n*n
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            Ak[i, j] = min(Ak-1[i, j], Ak-1[i, k] + Ak-1[k, j])
return A
INF = 999

# Algorithm implementation
def floyd_warshall(G):
    distance = list(map(lambda i: list(map(lambda j: j, i)), G))
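    # (Continuation sketch: the rest of this function and the example call
    # below are not in the original text; they follow the pseudocode above,
    # and the example matrix is an assumption added for illustration.)
    n = len(G)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                distance[i][j] = min(distance[i][j],
                                     distance[i][k] + distance[k][j])

    # Printing the shortest distance between every pair of vertices
    for i in range(n):
        for j in range(n):
            print("INF" if distance[i][j] == INF else distance[i][j], end=" ")
        print()

G = [[0, 3, INF, 5],
     [2, 0, INF, 4],
     [INF, 1, 0, INF],
     [INF, INF, 2, 0]]
floyd_warshall(G)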
There are three nested loops, each running n times, and the work inside the
innermost loop is constant. So, the time complexity of the Floyd-Warshall algorithm
is O(n^3).
Space Complexity
The space complexity of the Floyd-Warshall algorithm is O(n^2).
Longest Common Subsequence
If S1 and S2 are the two given sequences then, Z is the common subsequence
of S1 and S2 if Z is a subsequence of both S1 and S2. Furthermore, Z must be a strictly
increasing sequence of the indices of both S1 and S2.
In a strictly increasing sequence, the indices of the elements chosen from the original
sequences must be in ascending order in Z.
If
S1 = {B, C, D, A, A, C, D}
Then, {A, D, B} cannot be a subsequence of S1 as the order of the elements is not the
same (ie. not strictly increasing sequence).
If
S1 = {B, C, D, A, A, C, D}
S2 = {A, C, D, B, A, C}
Then, common subsequences are {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...
Before proceeding further, if you do not already know about dynamic programming,
please go through dynamic programming.
Second Sequence
The following steps are followed for finding the longest common subsequence.
Initialise a table
3. If the characters corresponding to the current row and current column are matching,
then fill the current cell by adding one to the diagonal element. Point an arrow to the
diagonal cell.
4. Else take the maximum value from the previous column and previous row element
for filling the current cell. Point an arrow to the cell with maximum value. If they are
equal, point to any of them.
6. The value in the last row and the last column is the length of the longest common
subsequence.
7. In order to find the longest common subsequence, start from the last element and
follow the direction of the arrow. The elements corresponding to the diagonal-arrow
symbol form the longest common subsequence.
The method of dynamic programming reduces the number of function calls. It stores
the result of each function call so that it can be used in future calls without the need
for redundant calls.
In the above dynamic algorithm, the results obtained from each comparison
between elements of X and the elements of Y are stored in a table so that they can be
used in future computations.
So, the time taken by the dynamic approach is the time taken to fill the table (i.e.
O(mn)). Whereas, the recursion algorithm has a complexity of 2^max(m, n).
def lcs_algo(S1, S2, m, n):
    # Build the DP table L in a bottom-up way
    L = [[0 for x in range(n + 1)] for x in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                L[i][j] = 0
            elif S1[i - 1] == S2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back through the table to recover the subsequence
    index = L[m][n]
    lcs_algo = [""] * (index + 1)
    i = m
    j = n
    while i > 0 and j > 0:
        if S1[i - 1] == S2[j - 1]:
            lcs_algo[index - 1] = S1[i - 1]
            i -= 1
            j -= 1
            index -= 1
        elif L[i - 1][j] > L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    print("LCS: " + "".join(lcs_algo))

S1 = "ACADB"
S2 = "CBDA"
m = len(S1)
n = len(S2)
lcs_algo(S1, S2, m, n)
Backtracking Algorithm
A backtracking algorithm is a problem-solving algorithm that uses a brute force
approach for finding the desired output.
The Brute force approach tries out all the possible solutions and chooses the
desired/best solutions.
The term backtracking suggests that if the current solution is not suitable, then
backtrack and try other solutions. Thus, recursion is used in this approach.
This approach is used to solve problems that have multiple solutions. If you want an
optimal solution, you must go for dynamic programming.
Solution: There are a total of 3! = 6 possibilities. We will try all the possibilities and
get the possible solutions. We recursively try all the possibilities.
All the possibilities are:
All the possibilities
The following state space tree shows the possible solutions.
State tree with all the solutions
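A minimal generic backtracking sketch in Python (this is not the original benches example): it enumerates all 3! = 6 arrangements by choosing an item, exploring, and then un-choosing; a real problem would also reject a partial arrangement that violates a constraint before recursing.
def backtrack(items, current, solutions):
    if not items:                    # every item placed: record one arrangement
        solutions.append(current[:])
        return
    for i, item in enumerate(items):
        current.append(item)         # choose
        backtrack(items[:i] + items[i + 1:], current, solutions)  # explore
        current.pop()                # un-choose (backtrack)

solutions = []
backtrack(['A', 'B', 'C'], [], solutions)
print(len(solutions), "possibilities:", solutions)  # 6 possibilities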
Rabin Karp Algo
Rabin-Karp Algorithm
Rabin-Karp algorithm is an algorithm used for searching/matching patterns in the
text using a hash function. Unlike the naive string matching algorithm, it does not travel
through every character in the initial phase; rather it filters out the characters that do not
match and then performs the comparison.
A hash function is a tool to map a larger input value to a smaller output value. This
output value is called the hash value.
Text
And the string to be searched in the above text be:
Pattern
2. Let us assign a numerical value(v)/weight for the characters we will be using in the
problem. Here, we have taken first ten alphabets only (i.e. A to J).
Text Weights
3. Let m be the length of the pattern and n be the length of the text. Here, m = 3 and n = 10.
Let d be the number of characters in the input set. Here, we have taken the input set {A,
B, C, ..., J}. So, d = 10. You can assume any suitable value for d.
7. We calculate the hash value of the next window by subtracting the first term and
adding the next term as shown below.
t = ((((1 * 10^2) + (2 * 10^1) + (3 * 10^0)) - (1 * 10^2)) * 10 + (3 * 10^0)) mod 13
= 233 mod 13
= 12
In order to optimize this process, we make use of the previous hash value in the
following way.
t = (d * (t - v[character to be removed] * h) + v[character to be added]) mod 13
= (10 * (6 - 1 * 9) + 3) mod 13
= 12
Where h = d^(m-1) = 10^(3-1) = 100, and 9 is h mod 13.
8. For BCC, t = 12 (≠6). Therefore, go for the next window.
After a few searches, we will get the match for the window CDA in the text.
Algorithm
n = t.length
m = p.length
h = d^(m-1) mod q
p = 0
t0 = 0
for i = 1 to m
    p = (d*p + p[i]) mod q
    t0 = (d*t0 + t[i]) mod q
for s = 0 to n - m
    if p == ts
        if p[1.....m] == t[s + 1..... s + m]
            print "pattern found at position" s
    if s < n - m
        ts+1 = (d * (ts - t[s + 1] * h) + t[s + m + 1]) mod q
# Rabin-Karp algorithm in python

d = 10

def search(pattern, text, q):
    m = len(pattern)
    n = len(text)
    p = 0
    t = 0
    h = 1
    i = 0
    j = 0

    # h = d^(m-1) mod q
    for i in range(m - 1):
        h = (h * d) % q

    # Calculate hash value for pattern and the first window of text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    # Find the match by sliding the window over the text
    for i in range(n - m + 1):
        if p == t:
            # On a hash match, verify the characters one by one
            for j in range(m):
                if text[i + j] != pattern[j]:
                    break
                j += 1
            if j == m:
                print("Pattern is found at position: " + str(i + 1))

        # Compute the hash of the next window from the current one
        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
            if t < 0:
                t = t + q

text = "ABCCDDAEFG"
pattern = "CDD"
q = 13
search(pattern, text, q)
When the hash value of the pattern matches with the hash value of a window of the
text but the window is not the actual pattern then it is called a spurious hit.
Spurious hit increases the time complexity of the algorithm. In order to minimize
spurious hit, we use modulus. It greatly reduces the spurious hit.
The worst-case complexity occurs when spurious hits occur repeatedly for all the
windows.
• For searching a string in a bigger text