Data Structure and Algorithms
Learning Objectives
● To understand the concepts of ADTs
● To learn linear data structures-lists, stacks, queues
● To learn Tree structures and application of trees
● To learn graph structures and application of graphs
● To understand various sorting and searching techniques
Unit I: Abstract Data Types (ADTs) - List ADT - Array-based implementation - Linked list implementation - Singly linked lists - Circular linked lists - Doubly linked lists - Applications of lists - Polynomial Manipulation - All operations: Insertion, Deletion, Merge, Traversal.
Unit IV: Definition - Representation of Graph - Types of graph - Breadth first traversal - Depth first traversal - Topological sort - Bi-connectivity - Cut vertex - Euler circuits - Applications of graphs.
Text Book
● Mark Allen Weiss, "Data Structures and Algorithm Analysis in C++", Pearson Education 2014, 4th Edition.
● Reema Thareja, "Data Structures Using C", Oxford University Press 2014, 2nd Edition.
Reference Books
● Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", McGraw Hill 2009, 3rd Edition.
● Aho, Hopcroft and Ullman, "Data Structures and Algorithms", Pearson Education 2003.
Web Resources
● https://programiz.com/dsa
UNIT I
Unit I: Abstract Data Types (ADTs) - List ADT - Array-based implementation - Linked list implementation - Singly linked lists - Circular linked lists - Doubly linked lists - Applications of lists - Polynomial Manipulation - All operations: Insertion, Deletion, Merge, Traversal.
DSA
DSA is a combination of two separate yet interrelated topics: Data Structures and Algorithms. It is one of the most important skills for a computer science student, and strong command of both is expected in technical interviews at most major software companies.
Data Structures
A data structure is a way of storing and organizing data on a computer so that it can be accessed and updated efficiently. A data structure is not only used for organizing data; it is also used for processing, retrieving, and storing it.
Algorithms
An algorithm is a process or set of well-defined instructions, typically used to solve a particular class of problems or perform a specific type of computation.
Static and Dynamic Data Structures
A static data structure has a fixed memory size, which makes it easier to access its elements.
Example: array.
In a dynamic data structure, the size is not fixed; it can grow or shrink during runtime, which can be more efficient with respect to the memory (space) complexity of the code.
Example: queue, stack, etc.
Non-Linear Data Structure
Data structures whose elements are not placed sequentially or linearly are called non-linear data structures. In a non-linear data structure, we cannot traverse all the elements in a single run.
Examples: Trees and Graphs.
Abstract Data Types (ADTs)
Abstract Data Types (ADTs) are a way of organizing and storing data that provides specific functionality without specifying implementation details. An ADT is a blueprint for a data structure: it defines the behavior and interface of the structure without saying how it is implemented.
An ADT in the data structure can be thought of as a set of operations that can be
performed on a set of values.
In Python, ADTs are typically implemented using classes. Some examples of ADT are
Stack, Queue, List etc.
A stack is an abstract data type (ADT) that follows the Last In, First Out (LIFO)
principle. In a stack, elements are added and removed from the same end, typically called the
"top" of the stack. The last element added is the first one to be removed.
A queue is an abstract data type (ADT) that follows the First In, First Out (FIFO) principle.
In a queue, elements are added at the rear (enqueue) and removed from the front (dequeue).
Queues are commonly used in scenarios where the order of elements matters, such as task
scheduling, breadth-first search, and handling requests in a system.
In Python, lists are a built-in data type that can be considered a form of Abstract Data Type (ADT). The list described here is a linear data structure that holds data in a non-contiguous structure. The list is made up of data storage containers known as "nodes". These nodes are linked to one another: each node contains the address of the next block, so all the nodes are connected to one another via these links.
Array
An array is a group of data items of the same type stored at contiguous memory locations. In simple words, in computer programming, arrays are generally used to organize data of the same type.
Representation of an Array
Arrays can be represented in several ways, depending on the language, but they always store values of a single type. Elements of an array can be initialized at the time of declaration using the language's array syntax. A simple array-backed list class in Python:
class ArrayList:
    def __init__(self):
        self._data = []

    def append(self, value):
        self._data.append(value)         # add at the end

    def insert(self, index, value):
        self._data.insert(index, value)  # add at a given index

    def remove(self, value):
        self._data.remove(value)         # remove the first occurrence

    def is_empty(self):
        return len(self._data) == 0

    def length(self):
        return len(self._data)

    def display(self):
        print(self._data)
# Example usage:
my_list = ArrayList()
my_list.append(1)
my_list.append(2)
my_list.append(3)
print("Original list:")
my_list.display()
my_list.insert(1, 4)
print("\nList after inserting 4 at index 1:")
my_list.display()
my_list.remove(2)
print("\nList after removing 2:")
my_list.display()
Linked List
A Linked List is a data structure, where data nodes are linked together forming a
chain or list.
A Python linked list is an abstract data type in Python that allows users to organize
information in nodes, which then link to another node in the list. This makes it easier to insert
and remove information without changing the index of other items in the list.
● The starting point of the linked list is known as the head of the list. It is not a separate node; it simply refers to the first node.
● The next pointer of the last node is NULL (None in Python), marking the end of the list.
Creation of Node and Declaration of Linked Lists
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
Singly Linked List
It is the most manageable type of linked list, in which every node includes some data and an address part, i.e., a pointer to the next node in the series. In a singly linked list, we can perform operations like insertion, deletion, and traversal.
# Node class
class Node:
    # Function to initialize the node object
    def __init__(self, data):
        self.data = data  # Assign data
        self.next = None  # Initialize next as null

class LinkedList:
    def __init__(self):
        self.head = None  # The list starts out empty
Doubly Linked List
When a node holds a data part and two addresses, it is known as a doubly linked list. The two addresses are a pointer to the previous node and a pointer to the next node.
Circular Linked List
In a circular linked list, the last node of the series contains the address of the first node, forming a circular chain.
class Node:
def __init__(self, data):
self.data = data
self.next = None
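A circular list can be illustrated with the same Node class. The sketch below (the helper names `make_circular_list` and `traverse_circular` are ours, not from the text) builds a three-node circle and shows that traversal wraps back around to the head:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def make_circular_list(values):
    """Build a circular singly linked list and return its head."""
    head = None
    tail = None
    for value in values:
        node = Node(value)
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    if tail is not None:
        tail.next = head  # the last node points back to the first
    return head

def traverse_circular(head, max_nodes):
    """Collect values, stopping once we loop back to the head."""
    result = []
    current = head
    while current is not None and len(result) < max_nodes:
        result.append(current.data)
        current = current.next
        if current is head:
            break
    return result

head = make_circular_list([1, 2, 3])
print(traverse_circular(head, 10))  # [1, 2, 3]
```

Note the explicit `current is head` check: without it, the loop would circle forever, which is the key difference from traversing an ordinary singly linked list.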
Basic Operations in Linked List
Polynomial Manipulation
Polynomial manipulations are one of the most important applications of linked lists.
Polynomials are an important part of mathematics not inherently supported as a data type by
most languages. A polynomial is a collection of different terms, each comprising coefficients,
and exponents. It can be represented using a linked list. This representation makes polynomial
manipulation efficient.
Representation of a Polynomial
A polynomial is an expression made up of one or more terms, where each term consists of a coefficient and an exponent. An example of a polynomial is
P(x) = 4x^3 + 6x^2 + 7x + 9
A polynomial may be represented using arrays or linked lists. The array representation assumes that the exponents of the given expression run from 0 up to the highest value (the degree), with the exponent given by the array subscript beginning at 0. The coefficient of each exponent is placed at the corresponding index. For the polynomial above, this gives the array [9, 7, 6, 4]: index 0 holds the coefficient of x^0 and index 3 the coefficient of x^3.
A polynomial may also be represented using a linked list. A structure may be defined
such that it contains two parts- one is the coefficient and second is the corresponding exponent.
The structure definition may be given as shown below:
class Polynomial:
def __init__(self, coefficient, exponent):
self.coefficient = coefficient
self.exponent = exponent
self.next = None
Thus the polynomial above may be represented as the linked list (4, 3) → (6, 2) → (7, 1) → (9, 0), where each node stores a (coefficient, exponent) pair.
Adding two polynomials stored in arrays is straightforward: the arrays are added element-wise from index 0 to n-1, giving the sum of the two polynomials. Adding two polynomials stored in linked lists requires comparing exponents: wherever the exponents match, the coefficients are added, while a term whose exponent appears in only one polynomial is copied into the result unchanged. A program to add two polynomials follows.
# Example (here index 0 holds the highest-degree coefficient)
def add_polynomials(poly1, poly2):
    # Pad the shorter array with leading zeros so degrees line up
    n = max(len(poly1), len(poly2))
    p1 = [0] * (n - len(poly1)) + poly1
    p2 = [0] * (n - len(poly2)) + poly2
    return [a + b for a, b in zip(p1, p2)]

poly1 = [3, 0, 2]  # 3x^2 + 2
poly2 = [1, 4]     # x + 4
result = add_polynomials(poly1, poly2)
print(result)
In this example, poly1 represents the polynomial 3x^2 + 2, and poly2 represents x + 4.
The add_polynomials function adds these two polynomials and returns the result as a new
array. The output will be [3, 1, 6], representing the polynomial 3x^2 + x + 6.
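The linked-list addition just described can be sketched as follows. The node class from earlier is reproduced so the example is self-contained; the `add_poly_lists` and `build_poly` helper names are ours, and terms are assumed to be stored in decreasing order of exponent:

```python
class Polynomial:
    def __init__(self, coefficient, exponent):
        self.coefficient = coefficient
        self.exponent = exponent
        self.next = None

def build_poly(terms):
    """Build a list from (coefficient, exponent) pairs, highest exponent first."""
    head = None
    for coefficient, exponent in reversed(terms):
        node = Polynomial(coefficient, exponent)
        node.next = head
        head = node
    return head

def add_poly_lists(p1, p2):
    """Add two polynomial linked lists sorted by decreasing exponent."""
    dummy = Polynomial(0, 0)
    tail = dummy
    while p1 and p2:
        if p1.exponent == p2.exponent:       # matching exponents: add coefficients
            tail.next = Polynomial(p1.coefficient + p2.coefficient, p1.exponent)
            p1, p2 = p1.next, p2.next
        elif p1.exponent > p2.exponent:      # term only in p1: copy it over
            tail.next = Polynomial(p1.coefficient, p1.exponent)
            p1 = p1.next
        else:                                # term only in p2: copy it over
            tail.next = Polynomial(p2.coefficient, p2.exponent)
            p2 = p2.next
        tail = tail.next
    rest = p1 or p2
    while rest:                              # copy any remaining terms
        tail.next = Polynomial(rest.coefficient, rest.exponent)
        tail, rest = tail.next, rest.next
    return dummy.next

# (3x^2 + 2) + (x + 4) = 3x^2 + x + 6
result = add_poly_lists(build_poly([(3, 2), (2, 0)]), build_poly([(1, 1), (4, 0)]))
```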
Multiplying two polynomials involves distributing each term of one polynomial across
all terms of the other polynomial and then combining like terms. This process can be
implemented using different data structures. Here's an example in Python using arrays to
represent polynomials:
def multiply_polynomials(poly1, poly2):
    degree1 = len(poly1) - 1
    degree2 = len(poly2) - 1
    result_degree = degree1 + degree2
    result = [0] * (result_degree + 1)
    # Distribute each term of poly1 over each term of poly2
    for i in range(len(poly1)):
        for j in range(len(poly2)):
            result[i + j] += poly1[i] * poly2[j]
    return result

# Example
poly1 = [3, 2, 5]  # 3x^2 + 2x + 5
poly2 = [1, 4]     # x + 4
result = multiply_polynomials(poly1, poly2)
print(result)      # [3, 14, 13, 20], i.e. 3x^3 + 14x^2 + 13x + 20
This implementation uses nested loops to iterate over each term of both polynomials
and multiply the corresponding coefficients. The result is then accumulated into the appropriate
position in the result array based on the sum of the exponents.
To insert a new node between two nodes, first point the new node to the node on the right, and then make the node on the left point to the new node:
NewNode.next -> RightNode;
LeftNode.next -> NewNode;
This will put the new node in the middle of the two.
Insertion in linked list can be done in three different ways. They are explained as
follows
Insertion at Beginning
In this operation, we are adding an element at the beginning of the list.
Algorithm
1. START
2. Create a node to store the data
3. Check if the list is empty
4. If the list is empty, add the data to the node and assign the head pointer to it.
5. If the list is not empty, add the data to a node and link to the current head. Assign the
head to the newly added node.
6. END
Example
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_beginning(self, data):
        node = Node(data)
        node.next = self.head  # link the new node to the current head
        self.head = node       # the new node becomes the head
Insertion at Ending
In this operation, we are adding an element at the end of the list.
Algorithm
1. START
2. Create a new node and assign the data
3. Find the last node
4. Point the last node to new node
5. END
Example
class LinkedList:
# ... (previous code)
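A minimal self-contained sketch of the algorithm above (the `insert_at_end` and `to_list` method names are illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_end(self, data):
        node = Node(data)
        if self.head is None:             # empty list: new node becomes the head
            self.head = node
            return
        last = self.head
        while last.next is not None:      # walk to the last node
            last = last.next
        last.next = node                  # point the last node to the new node

    def to_list(self):
        items, current = [], self.head
        while current is not None:
            items.append(current.data)
            current = current.next
        return items

ll = LinkedList()
for value in (1, 2, 3):
    ll.insert_at_end(value)
print(ll.to_list())  # [1, 2, 3]
```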
Deletion Operation
Deletion is also a more than one step process. We shall learn with pictorial
representation. First, locate the target node to be removed, by using searching algorithms.
The left (previous) node of the target node now should point to the next node of the
target node
LeftNode.next -> TargetNode.next;
This will remove the link that was pointing to the target node. Now, using the following
code, we will remove what the target node is pointing at.
TargetNode.next -> NULL;
If the deleted node is needed again, it can be kept in memory; otherwise, we can simply deallocate its memory and wipe off the target node completely.
Similar steps are taken when the node to be deleted is at the beginning of the list (the head is simply moved to the second node) or at the end (the second-to-last node is made to point to NULL).
Deletion in linked lists is also performed in three different ways. They are as follows
Deletion at Beginning
In this deletion operation of the linked list, we delete an element from the beginning of the list. For this, we point the head to the second node.
Algorithm
1. START
2. Assign the head pointer to the next node in the list
3. END
Example
class Node:
def __init__(self, data):
self.data = data
self.next = None
def delete_at_beginning(head):
if head is not None:
head = head.next
return head
Deletion at Ending
In this deletion operation of the linked list, we delete an element from the end of the list.
Algorithm
1. START
2. Iterate until we find the second last element in the list.
3. Assign NULL to the second last element in the list.
4. END
Example
def delete_at_end(head):
if head is None:
return None
if head.next is None:
return None
temp = head
while temp.next.next is not None:
temp = temp.next
temp.next = None
return head
Traversal Operation
The traversal operation walks through all the elements of the list in an order and
displays the elements in that order.
Algorithm
1. START
2. While the list is not empty and did not reach the end of the list,
print the data in each node
3. END
Example
class Node:
def __init__(self, data):
self.data = data
self.next = None
def print_linked_list(head):
current = head
while current is not None:
print(current.data, end=" -> ")
current = current.next
print("None")
Merge Operation
The merge operation in a linked list typically involves combining two sorted linked lists
into a single sorted linked list.
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def merge_sorted_lists(l1, l2):
    dummy = Node(0)           # dummy head simplifies the edge cases
    current = dummy
    while l1 is not None and l2 is not None:
        if l1.data <= l2.data:
            current.next = l1
            l1 = l1.next
        else:
            current.next = l2
            l2 = l2.next
        current = current.next
    current.next = l1 if l1 is not None else l2  # attach the leftover nodes
    return dummy.next
Stack ADT
A stack is a linear data structure where elements are stored in the LIFO (Last In First
Out) principle where the last element inserted would be the first element to be deleted. A stack
is an Abstract Data Type (ADT), that is popularly used in most programming languages.
Stack Representation
A stack allows all data operations at one end only. At any given time, we can only
access the top element of a stack.
A stack can be implemented by means of Array, Structure, Pointer, and Linked List.
Stack can either be a fixed size one or it may have a sense of dynamic resizing. We can perform
the two operations in the stack - PUSH and POP. The insert and delete operations are often
called push and pop.
Stack Operations
There are various stack operations that are applicable on a stack. Stack operations are
generally used to extract information and data from a stack data structure.
1. push()
Push is a function in stack definition which is used to insert data at the stack's top.
Algorithm
1. Checks if the stack is full.
2. If the stack is full, produces an error and exit.
3. If the stack is not full, increments top to point next
empty space.
4. Adds data element to the stack location, where top
is pointing.
5. Returns success.
Example
# Example usage: a Python list serves as the stack; append() acts as push
my_stack = []
my_stack.append(10)
my_stack.append(20)  # 20 is now at the top of the stack
2. pop()
Pop is a function in the stack definition which is used to remove data from the stack's
top.
Algorithm
def pop(stack):
if not stack:
print("Stack is empty. Cannot pop.")
return None
return stack.pop()
# Example usage:
my_stack = [10, 20, 30]
popped_element = pop(my_stack)
3. topElement() / peek()
TopElement / Peek is a function in the stack which is used to extract the element present
at the stack top.
Algorithm
1. START
2. return the element at the top of the stack
3. END
4. isEmpty()
isEmpty is a boolean function in stack definition which is used to check whether the
stack is empty or not. It returns true if the stack is empty. Otherwise, it returns false.
Algorithm
1. START
2. If the top value is -1, the stack is empty. Return 1.
3. Otherwise, return 0.
4. END
5. isFull()
The isFull() operation checks whether the stack is full. This operation is used to check
the status of the stack with the help of top pointer.
Algorithm
1. START
2. If the size of the stack is equal to the top position of the stack,
the stack is full. Return 1.
3. Otherwise, return 0.
4. END
6. size()
Size is a function in stack definition which is used to find out the number of elements
that are present inside the stack.
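The six operations described above can be collected into a single array-based class. The following is a sketch, with a fixed `capacity` parameter added (our choice, not from the text) so that isFull() is meaningful:

```python
class Stack:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.items = []

    def is_full(self):
        return len(self.items) == self.capacity

    def is_empty(self):
        return len(self.items) == 0

    def push(self, item):
        if self.is_full():
            raise OverflowError("stack is full")
        self.items.append(item)       # insert at the top

    def pop(self):
        if self.is_empty():
            raise IndexError("stack is empty")
        return self.items.pop()       # remove from the top (LIFO)

    def peek(self):
        if self.is_empty():
            raise IndexError("stack is empty")
        return self.items[-1]         # top element, without removing it

    def size(self):
        return len(self.items)

s = Stack()
s.push(10)
s.push(20)
print(s.peek())  # 20
print(s.pop())   # 20
print(s.size())  # 1
```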
Application of the Stack: Expression Conversion and Evaluation
Example: A + (B – C)
The normal precedence rules for arithmetic expressions must be understood in order to evaluate them. The precedence rules for the five fundamental arithmetic operators are:
● Exponentiation (^) has the highest precedence.
● Multiplication (*) and division (/) come next.
● Addition (+) and subtraction (–) have the lowest precedence.
Infix Notation
Each operator is positioned between its operands in an expression written using infix notation. Depending on the requirements of the task, infix expressions may or may not be parenthesized.
Example: A + B, (C – D) etc.
Because the operator appears between the operands, all of these expressions are
written in infix notation.
Prefix Notation
The operator is listed before the operands in prefix notation. Since the Polish mathematician Jan Łukasiewicz invented this system, it is frequently referred to as Polish notation.
Example: + A B, – C D
These expressions are in prefix notation because the operator occurs before the operands.
Postfix Notation
The operator is listed after the operands in postfix notation. Polish notation is simply reversed here, so it is also referred to as Reverse Polish notation.
Example: A B +, C D –
These expressions are in postfix notation because the operator comes after the operands.
Postfix expression: The expression of the form “a b operator” (ab+) i.e., When every pair of
operands is followed by an operator.
1. Infix Expression: A + B * C - D / E
2. Scan the expression from left to right:
a. Operand A: Output A.
b. Operator +: The stack is empty, so push + onto the stack.
c. Operand B: Output B.
d. Operator *: Precedence of * is higher than + on the stack, so push * onto the stack.
e. Operand C: Output C.
f. Operator -: Pop * and output it, then pop + and output it (both have precedence greater than or equal to -). Push - onto the stack.
g. Operand D: Output D.
h. Operator /: Precedence of / is higher than - on the stack, so push / onto the stack.
i. Operand E: Output E.
3. Pop and output the remaining operators from the stack: pop / and output it, then pop - and output it.
4. The postfix expression is: A B C * + D E / -
So, the infix expression A + B * C - D / E is converted to the postfix expression A B C * + D E / -.
def infix_to_postfix(infix_expression):
    precedence = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
    postfix = []
    stack = []
    for char in infix_expression:
        if char == ' ':
            continue
        elif char == '(':
            stack.append(char)
        elif char == ')':
            while stack[-1] != '(':
                postfix.append(stack.pop())
            stack.pop()  # discard the '('
        elif char in precedence:  # operator
            while stack and stack[-1] != '(' and precedence[stack[-1]] >= precedence[char]:
                postfix.append(stack.pop())
            stack.append(char)
        else:  # operand
            postfix.append(char)
    while stack:
        postfix.append(stack.pop())
    return ''.join(postfix)
# Example usage:
infix_expression = "a + b * (c - d) / e"
postfix_expression = infix_to_postfix(infix_expression)
print("Infix Expression:", infix_expression)
print("Postfix Expression:", postfix_expression)
Queue ADT
A Queue is an abstract linear data structure serving as a collection of elements that are
inserted (enqueue operation) and removed (dequeue operation) according to the First in First
Out (FIFO) approach.
Insertion happens at the rear end of the queue whereas deletion happens at the front end
of the queue. The front of the queue is returned using the peek operation.
A queue of people waiting for their turn and a queue of airplanes waiting for landing instructions are real-life examples of the queue data structure.
Queue Representation
A Queue in data structure can be accessed from both of its ends: the front (for deletion) and the rear (for insertion).
A Queue can be implemented using arrays, linked lists, or vectors. For the sake of simplicity, we will implement a queue using a one-dimensional array.
Working of Queue
We can use a queue to perform its two main operations, Enqueue and Dequeue; the other operations are Peek, isEmpty, and isFull.
Queue operations
Enqueue
The Enqueue operation is used to add an element at the rear of the queue.
Dequeue
The Dequeue operation is used to remove an element from the front of the queue.
Peek
The Peek operation is used to return the front-most element of the queue without removing it.
Steps of the algorithm
1. Check if the queue is empty; if it is, report an error.
2. Otherwise, return the element at the front of the queue.
isFull
The isFull operation is used to check if the queue is full or not.
1. Check if the number of elements in the queue (size) is equal to the capacity, if yes,
return True.
2. Return False.
isEmpty
The isEmpty operation is used to check if the queue is empty or not.
1. Check if the number of elements in the queue (size) is equal to 0, if yes, return True.
2. Return False.
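The queue operations above can be sketched as a small fixed-capacity class built on a Python list (the class and method names are our choices for illustration):

```python
class Queue:
    def __init__(self, capacity=5):
        self.capacity = capacity
        self.items = []

    def is_full(self):
        return len(self.items) == self.capacity

    def is_empty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        self.items.append(item)   # insert at the rear

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        return self.items.pop(0)  # remove from the front (FIFO)

    def peek(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        return self.items[0]      # front-most element

q = Queue()
q.enqueue('a')
q.enqueue('b')
print(q.dequeue())  # 'a' — first in, first out
print(q.peek())     # 'b'
```

Note that `pop(0)` shifts every remaining element, so each dequeue is O(n) here; the circular-queue variant below avoids this cost.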
Types of Queues in Data Structure
There are four different types of queues in data structures:
● Simple Queue
● Circular Queue
● Priority Queue
● Double-Ended Queue (Deque)
Simple Queue
Simple Queue is a linear data structure that follows the First-In-First-Out (FIFO)
principle, where elements are added to the rear (back) and removed from the front (head).
Circular Queue
A circular queue is a special case of a simple queue in which the last member is linked
to the first. As a result, a circle-like structure is formed.
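A circular queue is commonly realized with modular index arithmetic over a fixed-size array, so the slot freed by a dequeue at the front can be reused by a later enqueue. A minimal sketch (class and attribute names are illustrative):

```python
class CircularQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = [None] * capacity
        self.front = 0
        self.size = 0

    def enqueue(self, item):
        if self.size == self.capacity:
            raise OverflowError("queue is full")
        rear = (self.front + self.size) % self.capacity  # wrap around the array
        self.items[rear] = item
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue is empty")
        item = self.items[self.front]
        self.front = (self.front + 1) % self.capacity    # advance front circularly
        self.size -= 1
        return item

cq = CircularQueue(3)
for v in (1, 2, 3):
    cq.enqueue(v)
print(cq.dequeue())  # 1
cq.enqueue(4)        # reuses the slot freed at the front
print(cq.dequeue())  # 2
```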
Priority Queue
In a priority queue, each node has a predefined priority. The node with the least priority value will be the first to be removed from the queue; among nodes of equal priority, insertion takes place in the order of arrival.
Deque (Double Ended Queue)
In a double-ended queue, insertion and deletion can occur at both the queue's front
and rear ends.
Input restricted deque - As the name implies, in input restricted queue, insertion operation
can be performed at only one end, while deletion can be performed from both ends.
Output restricted deque - As the name implies, in output restricted queue, deletion
operation can be performed at only one end, while insertion can be performed from both
ends.
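Python's standard library already provides a double-ended queue, `collections.deque`, which supports insertion and deletion at both ends; an input-restricted or output-restricted deque simply refrains from using one of these operations at one end:

```python
from collections import deque

d = deque()
d.append(1)         # insert at the rear
d.append(2)
d.appendleft(0)     # insert at the front
print(list(d))      # [0, 1, 2]
print(d.pop())      # 2 — removed from the rear
print(d.popleft())  # 0 — removed from the front
```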
Applications of queue
● Task Scheduling
● Resource Allocation
● Batch Processing
● Message Buffering
● Event Handling
● Traffic Management
● Operating systems
● Network protocols
● Printer queues
● Web servers
● Breadth-first search algorithm
UNIT III
Unit III: Tree ADT-tree traversals-Binary Tree ADT-expression trees-applications of
trees-binary search tree ADT- Threaded Binary Trees-AVL Trees- B Tree- B+ Tree –
Heap-Applications of heap.
Tree ADT
A Tree is a widely used abstract data type (ADT) in computer science and data
structures. It is a hierarchical data structure that consists of nodes connected by edges. Each
node in a tree has a parent-child relationship, except for the topmost node, which is called the
root and has no parent. The nodes with no children are called leaves.
The Tree Abstract Data Type (Tree ADT) typically includes various operations that can
be performed on a tree.
Representation of Node
class TreeNode:
def __init__(self, data):
self.data = data # Information stored in the node
self.children = [] # References to child nodes
# Example usage:
root_node = TreeNode(10)
child1 = TreeNode(5)
child2 = TreeNode(15)
root_node.children.append(child1)
root_node.children.append(child2)
Some basic terms used in Tree data structure.
In a tree data structure, there are several basic terms that are commonly used to describe
its components and relationships. Here are some fundamental terms:
● Node: A fundamental building block of a tree that stores data. Each node has zero or
more child nodes, except for the topmost node called the root, which has no parent.
● Root: The topmost node in a tree. It is the starting point for traversing the tree and has
no parent.
● Parent: A node in a tree that has one or more child nodes. The node directly above a
given node is its parent.
● Child: A node in a tree that is a descendant of another node. The node directly below
a given node is its child.
● Sibling: Nodes that share the same parent in a tree are called siblings. They are at the
same level of the hierarchy.
● Leaf: A node in a tree that has no children, i.e., it is a node without any descendants.
● Subtree: A tree formed by a node and all its descendants.
● Ancestor: A node that is on the path from the root to another node, including the node
itself.
● Descendant: A node that is reached by moving down the tree from another node,
including the node itself.
● Level: The level of a node in a tree is its distance from the root. The root is at level 0,
its children are at level 1, and so on.
● Depth: The depth of a node is the length of the path from the root to that node. The
depth of the root is 0.
● Height: The height of a node is the length of the longest path from the node to a leaf.
The height of the tree is the height of the root.
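Height as defined above can be computed with a short recursion over the TreeNode class shown earlier (the `height` function is our illustration):

```python
class TreeNode:
    def __init__(self, data):
        self.data = data
        self.children = []

def height(node):
    """Length of the longest downward path from node to a leaf."""
    if not node.children:
        return 0  # a leaf has height 0
    return 1 + max(height(child) for child in node.children)

# A -> B -> D, and A -> C
root = TreeNode('A')
b, c, d = TreeNode('B'), TreeNode('C'), TreeNode('D')
root.children = [b, c]
b.children = [d]
print(height(root))  # 2 — the path A -> B -> D
print(height(d))     # 0 — D is a leaf
```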
Types of Trees
Here are some common tree variants built on the Tree ADT:
● Binary Tree
● Binary Search Tree (BST)
● Threaded Binary Trees
● AVL Tree
● B-Tree
● B+ Tree
● Heap
Tree Traversal
Traversal is a process to visit all the nodes of a tree, and it may print their values too. Because all nodes are connected via edges (links), we always start from the root (head) node; that is, we cannot randomly access a node in a tree. There are three ways to traverse a tree −
● In-order Traversal
● Pre-order Traversal
● Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to
print all the values it contains.
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right
sub-tree. We should always remember that every node may represent a subtree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an
ascending order.
We start from A and, following in-order traversal, we move to its left subtree B. B is also traversed in-order. The process goes on until all the nodes are visited. The output of in-order traversal of this tree will be −
D → B → E → A → F → C → G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse the left subtree.
Step 2 − Visit the root node.
Step 3 − Recursively traverse the right subtree.
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally
the right subtree.
We start from A, and following pre-order traversal, we first visit A itself and then move
to its left subtree B. B is also traversed pre-order. The process goes on until all the nodes are
visited. The output of pre-order traversal of this tree will be −
A → B → D → E → C → F → G
Algorithm
Until all nodes are traversed −
Step 1 − Visit the root node.
Step 2 − Recursively traverse the left subtree.
Step 3 − Recursively traverse the right subtree.
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse
the left subtree, then the right subtree and finally the root node.
We start from A and, following post-order traversal, we first visit the left subtree B. B is also traversed post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this tree will be −
D → E → B → F → G → C → A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse the left subtree.
Step 2 − Recursively traverse the right subtree.
Step 3 − Visit the root node.
class Node:
    def __init__(self, key):
        self.leftChild = None
        self.rightChild = None
        self.data = key

def InorderTraversal(root):
    if root:
        InorderTraversal(root.leftChild)   # left subtree first
        print(root.data, end=" ")          # then the root
        InorderTraversal(root.rightChild)  # then the right subtree

def PreorderTraversal(root):
    if root:
        print(root.data, end=" ")          # root first
        PreorderTraversal(root.leftChild)
        PreorderTraversal(root.rightChild)

def PostorderTraversal(root):
    if root:
        PostorderTraversal(root.leftChild)
        PostorderTraversal(root.rightChild)
        print(root.data, end=" ")          # root last

# Main
if __name__ == "__main__":
    root = Node(3)
    root.leftChild = Node(26)
    root.rightChild = Node(42)
    root.leftChild.leftChild = Node(54)
    root.leftChild.rightChild = Node(65)
    root.rightChild.leftChild = Node(12)

    # Function calls
    print("Inorder traversal of binary tree is")
    InorderTraversal(root)
    print("\nPreorder traversal of binary tree is")
    PreorderTraversal(root)
    print("\nPostorder traversal of binary tree is")
    PostorderTraversal(root)
Binary Tree ADT
A binary tree is a tree in which no node can have more than two children; the maximum degree of any node is two, so every node has degree zero, one, or two.
A binary tree consists of a root and two subtrees, Tl and Tr. All nodes to the left of a node form its left subtree, and all nodes to the right form its right subtree.
Implementation
Since a binary tree node has at most two children, we can keep direct pointers to them. The declaration of tree nodes is similar in structure to that for doubly linked lists: a node is a structure consisting of the key information plus two pointers (left and right) to other nodes.
class BinaryTreeNode:
def __init__(self, data):
self.data = data # Information stored in the node
self.left = None # Reference to the left child
self.right = None # Reference to the right child
# Example usage:
# Creating a binary tree with nodes 10, 5, and 15
root_node = BinaryTreeNode(10)
root_node.left = BinaryTreeNode(5)
root_node.right = BinaryTreeNode(15)
Strictly binary tree
A strictly binary tree is a binary tree in which every node has either zero or two children; no node has exactly one child.
Skew tree
A skew tree is a binary tree in which every node except the leaf has only one child node.
There are two types of skew tree, they are left skewed binary tree and right skewed binary tree.
A left skew tree has node with only the left child. It is a binary tree with only left
subtrees.
A right skew tree has node with only the right child. It is a binary tree with only right
subtrees.
Full binary tree or proper binary tree
A binary tree is a full binary tree if all leaves are at the same level and every non-leaf node has exactly two children, so it contains the maximum possible number of nodes at every level. A full binary tree of height h has 2^(h+1) − 1 nodes.
Complete binary tree
In a complete binary tree, every non-leaf node has exactly two children, but all leaves are not necessarily at the same level: all levels except possibly the last have the maximum number of nodes, and the last level is filled from left to right.
An almost complete binary tree is a tree in which each node that has a right child also has a left child; having a left child does not require a node to have a right child.
Application of trees
Expression Tree
Expression trees are useful in evaluating expressions by traversing the tree in a specific
order, such as post-order or in-order. They are also employed in compilers and interpreters for
parsing and optimizing expressions in programming languages.
In an expression tree
● Nodes: Each node in the tree represents an operand or an operator. Operand nodes
typically contain the values or variables, while operator nodes represent operations
such as addition, subtraction, multiplication, division, etc.
● Edges: The edges between nodes represent the relationships between operands and
operators. For example, an edge connecting an operator node to its operand nodes
signifies that the operation should be performed on those operands.
● Leaves: The nodes without any children are called leaves. Leaves typically contain
the operands, such as constants or variables.
Here's a simple example of an expression tree for the mathematical expression "3 + 4 * 5":
In this tree, the root node represents the addition operator, and its two children are the
operand nodes (3) and the multiplication operator. The multiplication operator has two
children, which are the operands (4 and 5).
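The expression tree for "3 + 4 * 5" can be built and evaluated with a post-order walk, as described above. A sketch (the `ExprNode` class and `evaluate` function are illustrative names, not from the text):

```python
class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def evaluate(node):
    """Post-order evaluation: evaluate the children first, then apply the operator."""
    if node.left is None and node.right is None:
        return node.value                 # operand leaf
    left = evaluate(node.left)
    right = evaluate(node.right)
    if node.value == '+':
        return left + right
    if node.value == '-':
        return left - right
    if node.value == '*':
        return left * right
    return left / right                   # '/'

# 3 + 4 * 5: '+' at the root, '*' as its right child
tree = ExprNode('+', ExprNode(3), ExprNode('*', ExprNode(4), ExprNode(5)))
print(evaluate(tree))  # 23
```

Because multiplication sits below addition in the tree, it is evaluated first; the tree's shape encodes operator precedence without any parentheses.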
Binary Search Tree ADT
A binary search tree (BST) is a binary tree with the following properties:
● The left subtree of a node contains only nodes with keys lesser than the node's key.
● The right subtree of a node contains only nodes with keys greater than the node's key.
● The left and right subtrees must each also be a binary search tree.
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.left = None
self.right = None
Insertion
Algorithm: start at the root and compare the new key with the current node's key; move left if it is smaller and right if it is larger, and insert a new node when an empty position is reached.
Example
class TreeNode:
    def __init__(self, key):
        self.val = key
        self.left = None
        self.right = None

def insert_bst(root, key):
    # Base Case: empty subtree, the new node goes here
    if root is None:
        return TreeNode(key)
    # Recursive Case: Insert into the left or right subtree based on the key
    if key < root.val:
        root.left = insert_bst(root.left, key)
    elif key > root.val:
        root.right = insert_bst(root.right, key)
    return root
Search
● Finding a specific key value within the BST.
● Start from the root node and compare the target key with the current node's key.
● Move left or right in the tree based on the comparison until finding a match or
reaching a leaf node.
Algorithm
Example
class TreeNode:
    def __init__(self, key):
        self.val = key
        self.left = None
        self.right = None

def search_bst(root, target):
    # Base Case: empty subtree, or the target found at the root
    if root is None or root.val == target:
        return root
    # If the target is less than the root's value, search in the left subtree
    if target < root.val:
        return search_bst(root.left, target)
    # If the target is greater than the root's value, search in the right subtree
    return search_bst(root.right, target)
Deletion
● Removing a node with a specific key value from the BST.
● Handle different scenarios: a node has no children, a node has one child, or a node has
two children.
● Reorganize the tree while maintaining the BST property.
Algorithm
class TreeNode:
    def __init__(self, key):
        self.val = key
        self.left = None
        self.right = None

def find_in_order_successor(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

def delete_node_bst(root, key):
    if root is None:
        return root
    if key < root.val:
        root.left = delete_node_bst(root.left, key)
    elif key > root.val:
        root.right = delete_node_bst(root.right, key)
    else:
        # Node with no child or one child
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Node with two children: replace with the in-order successor
        in_order_successor = find_in_order_successor(root.right)
        root.val = in_order_successor.val
        root.right = delete_node_bst(root.right, in_order_successor.val)
    return root
Binary search trees are an essential data structure in computer science, with applications across many domains. Their efficient search, insertion, and deletion operations make them valuable for solving many problems. Here are some common applications of binary search trees:
1. Searching and Retrieval: BSTs are mainly utilized for efficient data retrieval and searching operations. For a balanced tree, the binary search property guarantees that a search can be completed in O(log n) time, where n is the number of nodes in the tree.
2. Database Systems: Binary search trees are used in various databases to search and index large data sets. For example, the names in a phonebook can be stored using a BST structure.
3. Auto-Complete and Spell Check: A binary search tree can implement auto-
complete functionality at various places, like search engines. It can quickly suggest
completions while typing based on the prefix entered. They are also used in spell-
checking algorithms to suggest corrections for incorrectly spelled words.
4. File Systems: Some file systems use binary search trees (or related tree structures) to organize and look up files in directories.
5. Priority Queues: They can also be used to implement priority queues. The key of
each element represents its priority, and we can efficiently extract the element with the
highest (or lowest) priority.
6. Compressing Data: BSTs can also help in compressing extensive data and optimizing space, and can be used in applications that handle images, files, audio, video, etc.
Threaded Binary Tree
A threaded binary tree replaces the NULL pointers of an ordinary binary tree with "threads" that point to a node's in-order predecessor or successor. Implementing threaded binary trees involves careful management of the threads during insertion, deletion, and other tree operations to ensure correctness and efficiency.
Singly Threaded Binary Tree
In a singly threaded binary tree, each node is linked to its in-order successor (or
predecessor) using a thread. The thread essentially serves as a shortcut, allowing us to traverse
the tree without having to follow the left or right child pointers in certain cases.
Traversal in a singly threaded binary tree can be done without recursion or a stack,
making it more memory-efficient.
Node Structure of Single-Threaded Binary Trees
# Sample usage
root = ThreadedTreeNode(10)
root.left_child = ThreadedTreeNode(5)
root.right_child = ThreadedTreeNode(15)
Doubly Threaded Binary Tree
A doubly threaded binary tree is an extension of the singly threaded tree. In this case, each node is linked to both its in-order successor and predecessor, using right and left threads respectively. This allows for more efficient backward traversal as well.
With both left and right threads, we can navigate both forward (in-order successor) and
backward (in-order predecessor) in the tree without the need for recursion or a stack.
Node Structure of Double-Threaded Binary Trees
class DoubleThreadedTreeNode:
    def __init__(self, data):
        self.data = data                   # Data stored in the node
        self.left_child = None             # Pointer to the left child
        self.right_child = None            # Pointer to the right child
        self.left_thread = False           # True when the left pointer is a thread rather than a child
        self.right_thread = False          # True when the right pointer is a thread rather than a child
        self.in_order_predecessor = None   # In-order predecessor (used when left_thread is True)
        self.in_order_successor = None     # In-order successor (used when right_thread is True)
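With this node structure, in-order traversal can follow the threads instead of using a stack or recursion. A small sketch (the helper names `leftmost` and `in_order` are illustrative; the class is repeated so the example is self-contained):

```python
class DoubleThreadedTreeNode:
    def __init__(self, data):
        self.data = data
        self.left_child = None
        self.right_child = None
        self.left_thread = False
        self.right_thread = False
        self.in_order_predecessor = None
        self.in_order_successor = None

def leftmost(node):
    # Follow real child pointers (not threads) to the leftmost node
    while node.left_child is not None and not node.left_thread:
        node = node.left_child
    return node

def in_order(root):
    result = []
    current = leftmost(root)
    while current is not None:
        result.append(current.data)
        if current.right_thread:
            current = current.in_order_successor   # jump via the thread
        elif current.right_child is not None:
            current = leftmost(current.right_child)
        else:
            current = None
    return result

# Build a tiny threaded tree: 5 <- 10 -> 15, with the leaves threaded to the root
root = DoubleThreadedTreeNode(10)
left = DoubleThreadedTreeNode(5)
right = DoubleThreadedTreeNode(15)
root.left_child, root.right_child = left, right
left.right_thread = True
left.in_order_successor = root
right.left_thread = True
right.in_order_predecessor = root
print(in_order(root))
```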
Advantages
● Space Efficiency
● Efficient Traversal
Disadvantages
● Complexity
● Limited to In-Order Traversal
AVL Trees
AVL trees are a type of self-balancing binary search tree (BST). In a binary search tree,
each node has at most two children, and for each node, all elements in its left subtree are less
than the node, and all elements in its right subtree are greater than the node.
The AVL tree was named after its inventors Adelson-Velsky and Landis. The key
feature of AVL trees is that they maintain balance during insertions and deletions, ensuring
that the tree remains relatively balanced, and the height difference between the left and right
subtrees of any node (called the balance factor) is at most 1.
The balance factor of a node in an AVL tree is the height of its left subtree minus the
height of its right subtree. The balance factor can be -1, 0, or 1 for each node in the tree.
The above tree is AVL because the differences between the heights of left and right
subtrees for every node are less than or equal to 1.
To maintain balance during insertions and deletions, AVL trees use rotations. There are four types of rotations. They are
Right Rotation (LL Case)
This rotation is performed when the balance factor of a node becomes greater than 1, indicating that the left subtree is too deep.
Left Rotation (RR Case)
This rotation is performed when the balance factor becomes less than -1, indicating that the right subtree is too deep.
Left-Right Rotation (LR Case)
This is a combination of left and right rotations. It is performed when the balance factor of the left child of a node is less than 0.
Right-Left Rotation (RL Case)
This is a combination of right and left rotations. It is performed when the balance factor of the right child of a node is greater than 0.
The rotations help to restore the balance of the tree and maintain the AVL property.
The time complexity of basic operations (insertion, deletion, and search) in an AVL
tree is O(log n), where n is the number of nodes in the tree.
AVL trees are widely used in scenarios where efficient search, insertion, and deletion
operations are required, and the tree needs to remain balanced to ensure optimal performance.
AVL trees support the standard binary search tree operations, including
● Insertion
● Deletion
● Searching
Insertion
A newNode is always inserted as a leaf node with a balance factor equal to 0. After
each insertion, the ancestors of the newly inserted node are examined because the insertion
only affects their heights, potentially inducing an imbalance. This process of traversing the
ancestors to find the unbalanced node is called retracing.
Step 1: START
Step 2: Insert the node using BST insertion logic.
Step 3: Calculate and check the balance factor of each node.
Step 4: If the balance factor follows the AVL criterion, go to step 6.
Step 5: Else, perform tree rotations according to the insertion done. Once the tree is balanced
go to step 6.
Step 6: END
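The steps above can be sketched in Python; the names `AVLNode`, `avl_insert`, and the rotation helpers are illustrative, not from the text:

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def balance(node):
    return height(node.left) - height(node.right) if node else 0

def right_rotate(y):
    x, t2 = y.left, y.left.right
    x.right, y.left = y, t2
    y.height = 1 + max(height(y.left), height(y.right))
    x.height = 1 + max(height(x.left), height(x.right))
    return x

def left_rotate(x):
    y, t2 = x.right, x.right.left
    y.left, x.right = x, t2
    x.height = 1 + max(height(x.left), height(x.right))
    y.height = 1 + max(height(y.left), height(y.right))
    return y

def avl_insert(root, key):
    # Step 2: ordinary BST insertion as a leaf
    if root is None:
        return AVLNode(key)
    if key < root.key:
        root.left = avl_insert(root.left, key)
    else:
        root.right = avl_insert(root.right, key)
    # Steps 3-5: retrace, check the balance factor, rotate if needed
    root.height = 1 + max(height(root.left), height(root.right))
    b = balance(root)
    if b > 1 and key < root.left.key:        # LL case
        return right_rotate(root)
    if b < -1 and key > root.right.key:      # RR case
        return left_rotate(root)
    if b > 1 and key > root.left.key:        # LR case
        root.left = left_rotate(root.left)
        return right_rotate(root)
    if b < -1 and key < root.right.key:      # RL case
        root.right = right_rotate(root.right)
        return left_rotate(root)
    return root

root = None
for k in [10, 20, 30, 40, 50, 25]:
    root = avl_insert(root, k)
print(root.key)
```

Inserting the keys in ascending order would leave a plain BST completely unbalanced; here the rotations keep the height logarithmic.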
Deletion
A node is always deleted as a leaf node. After deleting a node, the balance factors of
the nodes get changed. To rebalance the balance factor, suitable rotations are performed.
Step 1: START
Step 2: Find the node in the tree. If the element is not found, go to step 7.
Step 3: Delete the node using BST deletion logic.
Step 4: Calculate and check the balance factor of each node.
Step 5: If the balance factor follows the AVL criterion, go to step 7.
Step 6: Else, perform tree rotations to balance the unbalanced nodes. Once the tree is
balanced go to step 7.
Step 7: END
Search
Perform a standard BST search. The AVL property ensures that the search operation
takes O(log n) time, where n is the number of nodes in the tree.
Algorithm
Step 1: START
Step 2: If the root node is NULL, return false.
Step 3: Check if the current node’s value is equal to the value of the node to be searched. If
yes, return true.
Step 4: If the current node’s value is less than the searched key then recur to the right
subtree.
Step 5: If the current node’s value is greater than the searched key then recur to the left
subtree.
Step 6: END
B Tree
A B-tree is a type of self-balancing search tree in which each node can have more than two children and hold multiple keys. It is a generalization of the binary search tree, and is also commonly called a height-balanced m-way tree.
B-trees are widely used for disk-based storage, since their low height minimizes the number of disk accesses.
A B-tree of order m has all the properties of an m-way tree. In addition, it has the following properties.
● Every node in a B-tree, except the root node and the leaf nodes, contains at least ⌈m/2⌉ children.
Insertion operation
The insertion operation for a B-tree is similar to that of a binary search tree, but elements are inserted into the same node until the maximum number of keys is reached. The insertion is done using the following procedure −
Step 1 − Calculate the maximum (m−1) and minimum (⌈m/2⌉−1) number of keys a node can hold, where m is the order of the B-tree.
Step 2 − The data is inserted into the tree using the binary search insertion and once the keys
reach the maximum number, the node is split into half and the median key becomes the internal
node while the left and right keys become its children.
An overflow occurs during the insertion of 11, so the node is split and the median is shifted to the parent.
While inserting 16, the node is split into two parts, but the parent node also overflows as it has reached the maximum number of keys. Hence, the parent node is split first and its median key becomes the root. Then the leaf node is split in half, and the median of the leaf node is shifted to its parent.
The final B tree after inserting all the elements is achieved.
Deletion operation
The deletion operation in a B tree is slightly different from the deletion operation of a
Binary Search Tree. The procedure to delete a node from a B tree is as follows −
Case 1 − If the key to be deleted is in a leaf node and the deletion does not violate the minimum
key property, just delete the node.
Case 2 − If the key to be deleted is in a leaf node but the deletion violates the minimum key property, borrow a key from either its left sibling or right sibling. If both siblings have exactly the minimum number of keys, merge the node with either of them.
Case 3 − If the key to be deleted is in an internal node, it is replaced by a key in either left child
or right child based on which child has more keys. But if both child nodes have a minimum
number of keys, they’re merged together.
Case 4 − If the key to be deleted is in an internal node violating the minimum keys property,
and both its children and sibling have a minimum number of keys, merge the children. Then
merge its sibling with its parent.
Searching
Searching in a B-tree is similar to searching in a binary search tree. For example, suppose we search for the item 49 in the following B-tree. The process is as follows:
1. Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
Searching in a B tree depends upon the height of the tree. The search algorithm takes
O(log n) time to search any element in a B tree.
B+ Tree
A B+ tree is a type of self-balancing tree data structure that maintains sorted data and
allows searches, insertions, and deletions in logarithmic time. It is commonly used in database
systems and file systems to organize and manage large amounts of data efficiently.
The "B" in B+ tree is often said to stand for "balanced," and the "plus" indicates that the tree is an extension of the original B-tree structure.
Balanced Structure: B+ trees are self-balancing, meaning that after each insertion or
deletion operation, the tree is automatically adjusted to maintain balance. This ensures
that the height of the tree remains logarithmic, resulting in efficient search operations.
Node Structure: The nodes of a B+ tree have a specific structure. In a B+ tree, all keys
are present at the leaves, and internal nodes only contain keys for navigation purposes
(not for data storage). This allows for efficient range queries and sequential access.
Sorted Order: The keys in each node are stored in sorted order. This property
facilitates binary search, making search operations more efficient.
Non-Leaf Nodes: Internal nodes in a B+ tree do not store actual data. They contain
keys to guide the search process. All actual data is stored in the leaves.
Sequential Access: The leaf nodes of a B+ tree are linked together in a sequential order,
making it easy to perform range queries and sequential access.
Fan-out: The number of children for each internal node (excluding the root) is known
as the "fan-out." A higher fan-out reduces the height of the tree, leading to more
efficient search operations.
Operations in B+ tree
A B+ tree supports various operations to manage and manipulate data efficiently. The
main operations include:
Search
● To find a key, start at the root and follow the navigation keys in the internal nodes down to the appropriate leaf node.
● Since all data is stored in the leaves, every search travels the full height of the tree and takes O(log n) time.
Insertion
● To add a new key and its associated data into the B+ tree.
● The insertion operation begins with a search to find the appropriate leaf node where
the new key should be inserted.
● If the leaf node has enough space, the key is inserted directly. If the leaf is full, it may
trigger a split operation to maintain balance.
● After insertion, the tree is adjusted to ensure it remains balanced.
Deletion
● Similar to insertion, the deletion operation starts with a search to locate the leaf node
containing the key to be deleted.
● If deleting a key causes an underflow in the leaf node (the node has too few keys), it
may trigger redistribution or merging of nodes.
● After deletion, the tree is adjusted to maintain balance.
Difference Between B Tree and B+ Tree
● B tree: data is stored in internal nodes as well as leaf nodes. B+ tree: data is stored only in leaf nodes.
● B tree: operations such as searching, insertion and deletion are comparatively slower. B+ tree: these operations are comparatively faster.
● B tree: leaf nodes are not linked together. B+ tree: leaf nodes are linked together as a linked list.
● B tree: less commonly used for database indexing. B+ tree: because of their efficiency, they find wide application in DBMS.
Heap
A Heap is a special Tree-based data structure in which the tree is a complete binary
tree.
1. Max-Heap: In a Max-Heap, the key present at the root node must be the greatest among the keys present at all of its children. The same property must be recursively true for all sub-trees in that binary tree.
2. Min-Heap: In a Min-Heap, the key present at the root node must be the minimum among the keys present at all of its children. The same property must be recursively true for all sub-trees in that binary tree.
Heapify
● Heapify is the process of converting a binary tree (or an array) into a heap, either in
the form of a max heap or a min heap.
● There are two types of heapify operations: "bottom-up" heapify and "top-down"
heapify.
● Bottom-up heapify is typically used during the construction of a heap, starting from
the bottom of the tree and ensuring that the heap property is satisfied at each step.
● Top-down heapify is often used after removing the root element in order to maintain
the heap property.
● It takes O(log N) to balance the tree.
Insertion
● Insertion involves adding a new element to the heap while maintaining the heap
property.
● The typical approach is to add the new element to the end of the heap (or array
representation) and then perform a "heapify-up" operation to restore the heap
property.
● This operation also takes O(log N) time.
Deletion
● Deletion involves removing an element from the heap while maintaining the heap
property.
● In a min heap, the minimum element (root) is removed; in a max heap, the maximum
element is removed.
● The typical approach is to swap the element to be deleted with the last element,
remove the last element, and then perform a "heapify-down" operation to restore the
heap property.
● The standard deletion on Heap is to delete the element present at the root node of the
heap.
● It takes O(log N) time.
Peek
● Peek, or Find-Min/Find-Max, involves returning the minimum (or maximum) element
in the heap without removing it.
● In a min heap, the root contains the minimum element; in a max heap, the root contains
the maximum element.
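The heapify-up and heapify-down operations described above can be combined into a small array-based min-heap. A minimal sketch (the class name `MinHeap` and its method names are illustrative):

```python
class MinHeap:
    def __init__(self):
        self.data = []

    def peek(self):
        # The minimum element always sits at index 0
        return self.data[0]

    def insert(self, value):
        # Append at the end, then heapify-up: O(log n)
        self.data.append(value)
        i = len(self.data) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.data[i] < self.data[parent]:
                self.data[i], self.data[parent] = self.data[parent], self.data[i]
                i = parent
            else:
                break

    def delete_min(self):
        # Swap the root with the last element, remove it, then heapify-down: O(log n)
        top = self.data[0]
        last = self.data.pop()
        if self.data:
            self.data[0] = last
            i, n = 0, len(self.data)
            while True:
                smallest, l, r = i, 2 * i + 1, 2 * i + 2
                if l < n and self.data[l] < self.data[smallest]:
                    smallest = l
                if r < n and self.data[r] < self.data[smallest]:
                    smallest = r
                if smallest == i:
                    break
                self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
                i = smallest
        return top
```

A max-heap is obtained by flipping the comparisons (or by storing negated keys).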
Applications of Heap
Heaps are versatile data structures with various applications in computer science and
programming. Some common applications include:
Priority Queues
One of the most common applications of heaps is in implementing priority queues.
Heaps allow for efficient insertion and extraction of elements with the highest (or lowest)
priority.
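Python's standard library `heapq` module provides a ready-made min-heap that works well as a priority queue; entries with the smallest priority value are extracted first:

```python
import heapq

# Each entry is a (priority, task) pair; heapq keeps the smallest pair first
tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))

priority, task = heapq.heappop(tasks)  # extracts the lowest-numbered priority
print(priority, task)
```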
Heap Sort
Heap Sort is a sorting algorithm that uses a binary heap to sort elements in ascending
or descending order. It has a time complexity of O(n log n) and is an in-place sorting algorithm.
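Heap Sort can be sketched with the same `heapq` module: build a heap in O(n), then pop the minimum n times for O(n log n) overall. For brevity this returns a new list rather than sorting in place as the textbook version does:

```python
import heapq

def heap_sort(values):
    heap = list(values)
    heapq.heapify(heap)  # builds a min-heap in O(n)
    # Popping the minimum n times yields the elements in ascending order
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([64, 25, 12, 22, 11]))
```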
Huffman Coding
Huffman coding, a technique for lossless data compression, uses a binary heap to
efficiently construct a binary tree representing variable-length codes for each character in a
text.
Job Scheduling
Heaps can be used in job scheduling algorithms where tasks or jobs have different
priorities. The tasks with higher priority can be efficiently extracted from the heap for
execution.
Graph
A Graph is a non-linear data structure consisting of vertices and edges. The vertices are
sometimes also referred to as nodes and the edges are lines or arcs that connect any two nodes
in the graph. More formally, a Graph is composed of a set of vertices (V) and a set of edges (E). The graph is denoted by G(V, E).
Components of a Graph
● Vertices: Vertices are the fundamental units of the graph; they are also known as nodes. Every vertex can be labelled or unlabelled.
● Edges: Edges are drawn or used to connect two nodes of the graph. In a directed graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in any possible way; there are no rules. Edges are also known as arcs, and every edge can be labelled or unlabelled.
Types of graphs
Graphs are a fundamental data structure used to model relationships between objects.
There are two main types of graphs. They are
Undirected Graph
In an undirected graph, nodes are connected by edges that are all bidirectional. For
example if an edge connects node 1 and 2, we can traverse from node 1 to node 2, and from
node 2 to 1.
Directed Graph
In a directed graph, nodes are connected by directed edges – they only go in one
direction. For example, if an edge connects node 1 and 2, but the arrow head points towards 2,
we can only traverse from node 1 to node 2 – not in the opposite direction.
● Weighted Graph: Each edge has a weight or cost associated with it, representing some
measure such as distance, time, or cost.
● Disconnected Graph: There are at least two nodes for which there is no path between
them.
Graph Representation
A graph representation is a technique to store a graph in the memory of a computer. We can represent a graph in many ways.
The following two are the most commonly used representations of a graph.
1. Adjacency Matrix
2. Adjacency List
Adjacency Matrix
● A two-dimensional array where each cell at the intersection of row i and column j
represents whether there is an edge between node i and node j. It's suitable for dense
graphs.
● A slot matrix[i][j] = 1 indicates that there is an edge from node i to node j. In a weighted graph, the weight or cost of each edge is stored in the corresponding matrix cell instead of 1.
Adjacency List
● A collection of lists or arrays where each list represents the neighbors of a particular
node. It's suitable for sparse graphs.
● To create an Adjacency list, an array of lists is used. The size of the array is equal to
the number of nodes.
● A single index, array[i] represents the list of nodes adjacent to the ith node.
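The two representations can be built side by side for the same small graph; the variable names below are illustrative:

```python
# An undirected graph with 4 nodes and edges (0-1), (0-2), (1-3)
n = 4
edges = [(0, 1), (0, 2), (1, 3)]

# Adjacency matrix: matrix[i][j] = 1 when an edge connects i and j
matrix = [[0] * n for _ in range(n)]
# Adjacency list: adj_list[i] holds the neighbours of node i
adj_list = [[] for _ in range(n)]

for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1   # symmetric, since the graph is undirected
    adj_list[u].append(v)
    adj_list[v].append(u)

print(matrix[0])    # row of node 0
print(adj_list[1])  # neighbours of node 1
```

The matrix uses O(n²) space regardless of edge count, while the list uses space proportional to the number of edges, which is why the list suits sparse graphs.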
Graph Traversal in Data Structure
Breadth First Search (BFS)
Breadth-first search visits the vertices of a graph level by level: starting from a source vertex, it visits all the neighbours of a vertex before moving on to the neighbours of those neighbours. A queue keeps track of the vertices to be visited next.
Pseudo Code
BFS(graph, start)
    create an empty Queue
    mark start as visited and enqueue it
    while Queue is not empty
        node <- dequeue from the Queue
        process node
        for each unvisited adjacent vertex of node
            mark it as visited and enqueue it
end BFS
In the above diagram, the full way of traversing is shown using arrows.
● Step 1: Create a Queue with the same size as the total number of vertices in the graph.
● Step 2: Choose 12 as your beginning point for the traversal. Visit 12 and add it to the
Queue.
● Step 3: Insert all the unvisited adjacent vertices of the vertex at the front of the Queue. So far, we have 5, 23, and 3.
● Step 4: Delete the vertex at the front of the Queue when there are no new vertices to visit from that vertex. We now remove 12 from the Queue.
● Step 5: Continue steps 3 and 4 until the queue is empty.
● Step 6: When the queue is empty, generate the final spanning tree by eliminating
unnecessary graph edges.
Example
from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        print(node, end=' ')
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
bfs(graph, 'A')
When traversing a graph, the DFS method goes as far as it can before turning around.
This algorithm explores the graph in depth-first order, starting with a given source node and
then recursively visiting all of its surrounding vertices before backtracking. DFS will analyze
the deepest vertices in a branch of the graph before moving on to other branches. To implement
DFS, either recursion or an explicit stack might be utilized.
Graph Traversal: DFS Algorithm
Pseudo Code
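A sketch of the DFS pseudo code, written here in the same style as the other pseudo code in this unit (an iterative, stack-based form):

DFS(graph, start)
    create an empty Stack
    push start onto the Stack
    while Stack is not empty
        node <- pop from the Stack
        if node has not been visited
            mark node as visited
            push all unvisited adjacent vertices of node onto the Stack
end DFS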
The entire path of traversal is depicted in the diagram above with arrows.
● Step 1: Create a Stack with the total number of vertices in the graph as its size.
● Step 2: Choose 12 as your beginning point for the traversal. Go to that vertex and
place it on the Stack.
● Step 3: Push any of the adjacent vertices of the vertex at the top of the stack that has
not been visited onto the stack. As a result, we push 5
● Step 4: Repeat step 3 until there are no new vertices to visit from the stack’s top
vertex.
● Step 5: Use backtracking to pop one vertex from the stack when there is no new
vertex to visit.
● Step 6: Repeat steps 3, 4, and 5.
● Step 7: When the stack is empty, generate the final spanning tree by eliminating
unnecessary graph edges.
Example
def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    print(node, end=' ')
    for neighbour in graph[node]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited)

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
dfs(graph, 'A')
Topological Sort
Topological sort is a technique used in graph theory to order the vertices of a directed
acyclic graph (DAG). It ensures that for every directed edge from vertex A to vertex B, vertex
A comes before vertex B in the ordering. This is useful in scheduling problems, where tasks
depend on the completion of other tasks.
The algorithm begins by selecting a vertex with no incoming edges, adding it to the
ordering, and removing all outgoing edges from the vertex. This process is repeated until all
vertices are visited, and the resulting ordering is a topological sort of the DAG.
There are multiple algorithms for topological sorting, including Depth-First Search
(DFS) and Breadth-First Search (BFS). DFS-based algorithms are more commonly used for
topological sorting.
Topological sorting using Depth First Search (DFS) works as follows: run a DFS over the graph; when the DFS call for a vertex finishes (all of its neighbours have been visited), push the vertex onto a stack; once every vertex has been processed, reversing the stack yields a topological ordering.
Example
class Graph:
    def __init__(self, vertices):
        self.vertices = vertices
        self.adj_list = {v: [] for v in range(vertices)}

    def add_edge(self, u, v):
        self.adj_list[u].append(v)

def topological_sort(graph):
    stack = []
    visited = [False] * graph.vertices

    def dfs(vertex):
        visited[vertex] = True
        for neighbor in graph.adj_list[vertex]:
            if not visited[neighbor]:
                dfs(neighbor)
        stack.append(vertex)

    for v in range(graph.vertices):
        if not visited[v]:
            dfs(v)

    # Vertices finish in reverse topological order, so reverse the stack
    return stack[::-1]

# Example usage:
# Create a graph with 6 vertices and 6 directed edges
g = Graph(6)
g.add_edge(5, 2)
g.add_edge(5, 0)
g.add_edge(4, 0)
g.add_edge(4, 1)
g.add_edge(2, 3)
g.add_edge(3, 1)
print(topological_sort(g))
Example
In the above graph, vertices 3 and 4 are articulation points (cut vertices), since the removal of vertex 3 (or 4) along with its associated edges makes the graph disconnected.
Eulerian Path and Circuit
An Eulerian Path is a path in a graph that visits every edge exactly once. An Eulerian Circuit is an Eulerian Path that starts and ends on the same vertex.
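For a connected undirected graph, an Eulerian circuit exists exactly when every vertex has even degree, and an Eulerian path exists when exactly zero or two vertices have odd degree. A small sketch (the function name `euler_status` is illustrative, and connectivity is assumed rather than checked):

```python
def euler_status(adj):
    # adj: dict mapping each vertex to the list of its neighbours
    # (the graph is assumed connected and undirected)
    odd = sum(1 for v in adj if len(adj[v]) % 2 == 1)
    if odd == 0:
        return 'Eulerian circuit'
    if odd == 2:
        return 'Eulerian path'
    return 'neither'

# A 4-cycle: every vertex has even degree, so an Eulerian circuit exists
cycle = {'A': ['B', 'D'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C', 'A']}
print(euler_status(cycle))
```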
Applications of Graphs in Data Structures
Graphs are regarded as an excellent modeling tool that can be used to represent relationships in many physical situations, and they are useful for illustrating a variety of real-world problems. Some significant uses of graphs are listed below:
Social Networks: Graphs model users as vertices and their relationships (such as friendships or follows) as edges.
Web Graphs: Web pages reference one another through URLs; in other words, the web itself forms a huge graph and is a great source of network data.
Neural Networks: Neural networks are large graphs that artificially link neurons with synapses. There are numerous varieties of neural networks, and the primary distinction among them is how their graphs are formed.
Map Networks: Applications like Uber, Apple Maps, Google Maps, and Waze model navigation problems as graph problems. Consider the travelling salesman problem, shortest-path problems, Hamiltonian paths, etc.
Blockchains: Each block is a vertex that can contain numerous transactions, and the edges link successive blocks. The current consensus on historical transactions is the longest branch from the first (genesis) block.
Searching Algorithm
Based on the type of search operation, these algorithms are generally classified into two
categories:
Sequential Search: In this, the list or array is traversed sequentially and every
element is checked. For example: Linear Search.
Interval Search: These algorithms are specifically designed for searching in sorted data structures. For example: Binary Search.
Linear Search
Linear search, also known as sequential search, is a simple algorithm used to locate a specific value within a list. It sequentially checks each element of the list until a match is found or the entire list has been searched. Once a match is found, the index of the matching target element is returned. If the element is not found, a sentinel value such as -1 is returned.
Following is a step-by-step approach employed to perform Linear Search Algorithm.
Here's a basic implementation of the linear search algorithm in Python:
def linear_search(arr, target):
    for index, element in enumerate(arr):
        if element == target:
            return index
    return -1

# Example usage:
my_list = [1, 5, 9, 12, 3, 7]
target_element = 12
result = linear_search(my_list, target_element)
if result != -1:
    print(f"Element {target_element} found at index {result}.")
else:
    print(f"Element {target_element} not found in the list.")
In this example, the linear_search function iterates through each element of the list
(arr) and compares it with the target element (target). If a match is found, the function returns
the index of that element; otherwise, it returns -1 to indicate that the target element is not
present in the list.
Binary Search
The Binary Search algorithm is a fast technique that works efficiently on a sorted list. Thus, it is important to make sure that the list from which the element is to be searched is sorted.
Binary search works on the divide and conquer approach, i.e. the list from which the
search is to be done is divided into two halves, and then the searched element is compared
with the middle element in the array. If the element is found, then the index of the middle
element is returned. Otherwise, the search will keep going in either of the halves according
to the result generated through the match.
Here is a simple implementation of binary search in Python:
def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
# Example usage:
sorted_array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target_value = 7
result = binary_search(sorted_array, target_value)
if result != -1:
print(f"Element {target_value} found at index {result}")
else:
print(f"Element {target_value} not found in the array")
This code defines a function binary_search that takes a sorted array (arr) and a target
value (target). It returns the index of the target value in the array or -1 if the target is not
present. The example usage demonstrates how to use this function with a sorted array and a
target value.
Sorting
Sorting refers to rearrangement of a given array or list of elements according to a
comparison operator on the elements. The comparison operator is used to decide the new
order of elements in the respective data structure.
Types of Sorting Techniques
Various sorting algorithms are used in data structures. They can be broadly classified into the following two types:
Comparison-based: the elements are compared with one another in a comparison-based sorting algorithm (for example, Bubble Sort).
Non-comparison-based: the elements are not compared with one another in a non-comparison-based sorting algorithm (for example, Radix Sort).
Bubble Sort
Bubble Sort is an algorithm that sorts an array from the lowest value to the highest
value.
Bubble sort is a sorting algorithm that compares two adjacent elements and swaps
them until they are in the intended order.
In each iteration, the comparison takes place up to the last unsorted element.
The array is sorted if all elements are kept in the right order
bubbleSort(array)
    for i <- 1 to sizeOfArray - 1
        for j <- 1 to indexOfLastUnsortedElement - 1
            if leftElement > rightElement
                swap leftElement and rightElement
end bubbleSort
Example
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

# Example usage:
my_list = [64, 25, 12, 22, 11]
bubble_sort(my_list)
print(my_list)
Selection Sort
Selection sort is an algorithm that, in each iteration, selects the smallest element from the unsorted part of the list and places it at the beginning of the unsorted part.
Set the first element as minimum. Compare minimum with the second element. If the second element is smaller than minimum, assign the second element as minimum.
Compare minimum with the third element. Again, if the third element is
smaller, then assign minimum to the third element otherwise do nothing. The
process goes on until the last element.
After each iteration, minimum is placed in the front of the unsorted list.
Swap the first with minimum
For each iteration, indexing starts from the first unsorted element. Step 1 to 3
are repeated until all the elements are placed at their correct positions.
Example
def selectionSort(array, size):
    for step in range(size):
        min_idx = step
        for i in range(step + 1, size):
            if array[i] < array[min_idx]:
                min_idx = i
        array[step], array[min_idx] = array[min_idx], array[step]
Insertion Sort
1. The first element in the array is assumed to be sorted. Take the second element and store it separately in key.
2. Compare key with the first element. If the first element is greater than key, then key is placed in front of the first element.
Example
def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1
        # Shift elements greater than key one position to the right
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1
        array[j + 1] = key

data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Shell Sort
Shell sort is a generalized version of insertion sort. It first sorts elements that are far apart from each other and successively reduces the interval between the elements to be compared.
2. We are using Shell's original sequence (N/2, N/4, ..., 1) as intervals in our algorithm.
In the first loop, if the array size is N = 8 then, the elements lying at the interval
of N/2 = 4 are compared and swapped if they are not in order.
a. The 0th element is compared with the 4th element.
b. If the 0th element is greater than the 4th one then, the 4th element is first
stored in temp variable and the 0th element (ie. greater element) is stored in
the 4th position and the element stored in temp is stored in the 0th position.
Rearrange the elements at n/2 interval
3. In the second loop, an interval of N/4 = 8/4 = 2 is taken and again the elements lying
at these intervals are sorted.
Rearrange the elements at n/4 interval
All the elements in the array lying at the current interval are compared: the elements at the 4th and 2nd positions are compared, and the elements at the 2nd and 0th positions are also compared.
4. The same process goes on for remaining elements.
Rearrange all the elements at n/4 interval
5. Finally, when the interval is N/8 = 8/8 =1 then the array elements lying at the interval
of 1 are sorted. The array is now completely sorted.
Rearrange the elements at n/8 interval
Shell Sort Algorithm
shellSort(array, size)
for interval i <- size/2 down to 1
for each interval "i" in array
sort all the elements at interval "i"
end shellSort
Example
def shellSort(array, n):
    # Rearrange elements at each n/2, n/4, n/8, ... intervals
    interval = n // 2
    while interval > 0:
        for i in range(interval, n):
            temp = array[i]
            j = i
            while j >= interval and array[j - interval] > temp:
                array[j] = array[j - interval]
                j -= interval
            array[j] = temp
        interval //= 2

data = [9, 8, 3, 7, 5, 6, 4, 1]
size = len(data)
shellSort(data, size)
print('Sorted Array in Ascending Order:')
print(data)
In the array [121, 432, 564, 23, 1, 45, 788], the largest number is 788, which has 3
digits. Therefore, the loop should go up to the hundreds place (3 passes).
Use any stable sorting technique to sort the digits at each significant place. We have
used counting sort for this.
Example
def countingSort(array, place):
    size = len(array)
    output = [0] * size
    count = [0] * 10

    # Calculate count of elements
    for i in range(0, size):
        index = array[i] // place
        count[index % 10] += 1

    # Calculate cumulative count
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Place the elements in sorted order (traverse from the end for stability)
    i = size - 1
    while i >= 0:
        index = array[i] // place
        output[count[index % 10] - 1] = array[i]
        count[index % 10] -= 1
        i -= 1

    for i in range(0, size):
        array[i] = output[i]
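The per-digit pass above is invoked once per place value by the radix sort driver. The sketch below repeats a compact counting-sort pass so it runs standalone; the function names are illustrative:

```python
def countingSortByPlace(array, place):
    # stable counting sort on the digit at `place` (1, 10, 100, ...)
    count = [0] * 10
    for x in array:
        count[(x // place) % 10] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    output = [0] * len(array)
    for x in reversed(array):
        d = (x // place) % 10
        output[count[d] - 1] = x
        count[d] -= 1
    array[:] = output

def radixSort(array):
    # run one stable pass per digit of the largest element
    place = 1
    while max(array) // place > 0:
        countingSortByPlace(array, place)
        place *= 10

data = [121, 432, 564, 23, 1, 45, 788]
radixSort(data)
print(data)
```

Because each pass is stable, sorting units, then tens, then hundreds leaves the array fully sorted.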
Components of Hashing
There are majorly three components of hashing:
Key: A key can be anything, a string or an integer, that is fed as input to the hash
function, the technique that determines an index or location for storing an item in a
data structure.
Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. The index is known as the hash index.
Hash Table: Hash table is a data structure that maps keys to values using a special
function called a hash function. Hash stores the data in an associative manner in an
array where each data value has its own unique index.
Collision
The hashing process generates a small number for a big key, so there is a possibility
that two keys could produce the same value. The situation where a newly inserted key maps
to an already occupied slot is called a collision, and it must be handled using some
collision-handling technique.
Hash functions are essential components of hashing techniques used in data structures
and algorithms. They take an input (or key) and produce a fixed-size hash value or hash code.
The hash value is used to efficiently index or locate data in hash tables or other data structures.
Types of hash function
There are many hash functions that use numeric or alphanumeric keys. They are
Division Method.
Mid Square Method.
Folding Method.
Multiplication Method.
Division method
This method involves dividing the key by the table size and taking the remainder as
the hash value.
Formula: hashValue = key % tableSize
For example, if the table size is 10 and the key is 23, the hash value would be 3 (23 %
10 = 3).
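The division method is a one-line function; a minimal sketch (the function name is illustrative):

```python
def division_hash(key, table_size):
    # hashValue = key % tableSize
    return key % table_size

print(division_hash(23, 10))  # 3
```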
Multiplication method
This method involves multiplying the key by a constant and taking the fractional part
of the product as the hash value.
Formula: hashValue = floor(tableSize * ((key * A) % 1))
For example, if the key is 23 and the constant A is 0.618, the hash value would be 2
(floor(10 * ((23 * 0.618) % 1)) = floor(10 * 0.214) = floor(2.14) = 2).
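The multiplication method translates directly into code; a minimal sketch, with the constant A = 0.618 taken from the worked example above:

```python
import math

A = 0.618  # constant in (0, 1), as in the example above

def multiplication_hash(key, table_size):
    # hashValue = floor(tableSize * ((key * A) % 1))
    return math.floor(table_size * ((key * A) % 1))

print(multiplication_hash(23, 10))  # 2
```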
Folding Method
Folding hashing involves dividing the key value into equal-sized parts and then
performing some arithmetic operation (such as addition or XOR) on those parts to obtain the
hash value.
Formula: Divide the key into equal-sized parts and perform an operation (e.g., addition
or XOR) on those parts to obtain the hash value.
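One common way to fold a numeric key is to split its decimal digits into fixed-size parts and add them; a sketch under that assumption (the part size and function name are illustrative):

```python
def folding_hash(key, table_size, part_size=2):
    # split the key's digits into equal-sized parts and add the parts
    digits = str(key)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sum(parts) % table_size

# 123456 -> 12 + 34 + 56 = 102 -> 102 % 100 = 2
print(folding_hash(123456, 100))  # 2
```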
Mid-Square Method
This technique involves squaring the key value and extracting a portion of the
resulting digits as the hash value. The extracted portion can be taken from the middle or any
other fixed position.
Formula: hashValue = extractDigits((key^2), startPos, size)
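A sketch of the mid-square method, extracting digits from the middle of the squared key (the extraction position and names are illustrative):

```python
def mid_square_hash(key, size=2):
    # square the key and extract `size` digits from the middle
    squared = str(key * key)
    start = (len(squared) - size) // 2
    return int(squared[start:start + size])

# 96 * 96 = 9216; the middle two digits give 21
print(mid_square_hash(96))  # 21
```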
Separate chaining
Separate chaining is a technique used in hash tables, a common data structure, to handle
collisions. In hash tables, collisions occur when two or more keys hash to the same index in
the table. Separate chaining resolves these collisions by allowing multiple elements to exist at
each index of the hash table.
Working of separate chaining
Hashing: When a key-value pair is inserted into the hash table, a hash function is
applied to the key to determine its index in the table. The hash function should ideally
distribute keys evenly across the table.
Collision Handling: If two or more keys hash to the same index, a collision occurs.
Instead of overwriting the existing value, separate chaining allows multiple values to
be stored at the same index.
Linked Lists or other data structures: At each index in the hash table, a linked list or
another data structure (like an array, tree, or even another hash table) is used to store
the collided key-value pairs.
Insertion and Retrieval: When inserting a new key-value pair, the pair is added to the
linked list at the corresponding index. When retrieving a value for a key, the hash
function is used to find the index, and then the linked list at that index is traversed to
find the key-value pair.
Collision Resolution: If the number of elements in any linked list grows too large, it
can lead to performance issues. To mitigate this, techniques such as resizing the hash
table and rehashing all the elements into a larger table may be employed.
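The steps above can be sketched as a hash table that keeps a Python list (the chain) at each index; class and method names are illustrative:

```python
class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per index

    def _index(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing key in place
                return
        bucket.append((key, value))       # collision: append to the chain

    def get(self, key):
        # traverse the chain at the hashed index to find the key
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None
```

For example, with size 4 the keys 1 and 5 both hash to index 1, so both pairs live in the same chain and `get` distinguishes them by comparing keys.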
Open Addressing
Open addressing is a method for handling collisions. In Open Addressing, all elements are
stored in the hash table itself. So at any point, the size of the table must be greater than or equal
to the total number of keys. This approach is also known as closed hashing. This entire
procedure is based upon probing.
Insert Operation
Hash function is used to compute the hash value for a key to be inserted.
Hash value is then used as an index to store the key in the hash table.
In case of collision,
Probing is performed until an empty bucket is found.
Once an empty bucket is found, the key is inserted.
Probing is performed in accordance with the technique used for open addressing.
Search Operation
To search any particular key,
Its hash value is obtained using the hash function used.
Using the hash value, that bucket of the hash table is checked.
If the required key is found, the search is successful.
Otherwise, the subsequent buckets are checked until the required key or an
empty bucket is found.
The empty bucket indicates that the key is not present in the hash table.
Delete Operation
The key is first searched and then deleted.
After deleting the key, that particular bucket is marked as “deleted”.
1. Linear Probing
In linear probing,
When collision occurs, we linearly probe for the next bucket.
We keep probing until an empty bucket is found.
Advantage
It is easy to compute.
Disadvantage
The main problem with linear probing is clustering.
Many consecutive elements form groups.
Then, it takes time to search an element or to find an empty bucket.
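The insert, search, and delete operations under linear probing can be sketched as below. This is a minimal illustration for integer keys that assumes the table never becomes completely full; the "deleted" tombstone keeps search working after deletions:

```python
class LinearProbingTable:
    DELETED = object()  # tombstone marking a deleted bucket

    def __init__(self, size=10):
        self.size = size
        self.slots = [None] * size

    def insert(self, key):
        idx = key % self.size
        # linearly probe until an empty or deleted bucket is found
        while self.slots[idx] not in (None, self.DELETED):
            idx = (idx + 1) % self.size
        self.slots[idx] = key

    def search(self, key):
        idx = key % self.size
        for _ in range(self.size):
            if self.slots[idx] is None:
                return -1  # empty bucket: key is not present
            if self.slots[idx] == key:
                return idx
            idx = (idx + 1) % self.size
        return -1

    def delete(self, key):
        idx = self.search(key)
        if idx != -1:
            self.slots[idx] = self.DELETED  # mark the bucket as "deleted"
```

With a table of size 7, keys 10 and 17 both hash to index 3; 17 is probed forward to index 4, and deleting 10 later does not break the search for 17.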
2. Quadratic Probing
In quadratic probing,
When a collision occurs, we probe for the i²-th bucket in the i-th iteration.
We keep probing until an empty bucket is found.
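The quadratic probe sequence can be written as a small helper that returns the bucket to try in the i-th iteration (the function name is illustrative):

```python
def quadratic_probe(key, i, table_size):
    # in the i-th iteration, probe the (h(key) + i^2)-th bucket
    return (key % table_size + i * i) % table_size

# key 10 in a table of size 7 hashes to 3; the probe sequence is 3, 4, 0, ...
print([quadratic_probe(10, i, 7) for i in range(3)])
```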
3. Double Hashing
In double hashing,
We use another hash function hash2(x) and look for the (i * hash2(x))-th bucket in
the i-th iteration.
It requires more computation time as two hash functions need to be computed.
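A common choice for the second hash is PRIME - (key % PRIME) with PRIME smaller than the table size, which guarantees a non-zero step; the sketch below assumes that choice:

```python
PRIME = 7  # a prime smaller than the table size (illustrative choice)

def double_hash_probe(key, i, table_size):
    hash1 = key % table_size
    hash2 = PRIME - (key % PRIME)  # second hash; never evaluates to 0
    # in the i-th iteration, probe (hash1 + i * hash2)-th bucket
    return (hash1 + i * hash2) % table_size

# key 23 in a table of size 10: hash1 = 3, hash2 = 5, sequence 3, 8, 3, ...
print([double_hash_probe(23, i, 10) for i in range(3)])
```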
Rehashing
Rehashing is a technique in which the table is resized, i.e., the size of the table is
doubled by creating a new table.
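Rehashing can be sketched as follows, assuming integer keys stored in an open-addressed table with linear probing (the function name is illustrative):

```python
def rehash(old_slots):
    # double the table size and re-insert every key with the new modulus
    new_size = len(old_slots) * 2
    new_slots = [None] * new_size
    for key in old_slots:
        if key is None:
            continue
        idx = key % new_size
        while new_slots[idx] is not None:  # linear probing in the new table
            idx = (idx + 1) % new_size
        new_slots[idx] = key
    return new_slots
```

Note that every key must be re-inserted because its index depends on the table size: a key at index `key % 4` generally lands somewhere else under `key % 8`.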
Extendible Hashing
Extendible hashing is a dynamic approach to managing data. In this hashing method,
flexibility is a crucial factor. This method caters to flexibility so that even the hashing function
dynamically changes according to the situation and data type.
Algorithm
The following illustration represents the initial phases of our hashtable:
Directories and buckets are two key terms in this algorithm. Buckets are the holders of hashed
data, while directories are the holders of pointers pointing towards these buckets. Each
directory has a unique ID.
The following points explain how the algorithm works:
1. Initialize the bucket depths and the global depth of the directories.
2. Convert data into a binary representation.
3. Consider the "global depth" number of the least significant bits (LSBs) of data.
4. Map the data according to the ID of a directory.
5. Check for the following conditions if a bucket overflows (if the number of elements in
a bucket exceeds the set limit):
a) Global depth == bucket depth: Split the bucket into two and increment the global
depth and the buckets' depth. Re-hash the elements that were present in the split
bucket.
b) Global depth > bucket depth: Split the bucket into two and increment the bucket
depth only. Re-hash the elements that were present in the split bucket.
6. Repeat the steps above for each element.
By implementing the steps above, it will be evident why this method is considered so flexible
and dynamic.
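Step 3 above, mapping a key to a directory by its least significant bits, can be sketched with a bit mask (the function name is illustrative):

```python
def directory_id(key, global_depth):
    # keep only the "global depth" least significant bits of the key
    return key & ((1 << global_depth) - 1)

# 28 = 11100: with global depth 2 the LSBs are 00, so it maps to directory 0
print(directory_id(28, 2))  # 0
```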
Example
Let's take the following example to see how this hashing method works where:
Data = {28,4,19,1,22,16,12,0,5,7}
Bucket limit = 3
Convert the data into binary representation:
28 = 11100
4 = 00100
19 = 10011
1 = 00001
22 = 10110
16 = 10000
12 = 01100
0 = 00000
5 = 00101
7 = 00111
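The binary representations above can be generated directly with Python's binary format specifier:

```python
data = [28, 4, 19, 1, 22, 16, 12, 0, 5, 7]
for x in data:
    # 05b pads each value to 5 binary digits, matching the list above
    print(f'{x} = {x:05b}')
```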