Greedy Algorithms
Global optima, on the other hand, are points where the function
reaches its highest (or lowest) value across the entire problem space. In
the same terrain example, the highest peak represents the global
optimum because it's the highest point in the entire landscape.
In essence, local optima are "hills" in the solution space, while the
global optimum is the tallest "mountain" across the entire landscape.
// C++ program for the fractional knapsack problem,
// solved greedily by profit/weight ratio
#include <bits/stdc++.h>
using namespace std;

struct Item {
    int profit, weight;

    // Constructor
    Item(int profit, int weight)
    {
        this->profit = profit;
        this->weight = weight;
    }
};

// Sort items by decreasing profit/weight ratio
static bool cmp(const Item& a, const Item& b)
{
    double r1 = (double)a.profit / a.weight;
    double r2 = (double)b.profit / b.weight;
    return r1 > r2;
}

// Greedily fill capacity W, taking a fraction of the
// last item if it does not fit completely
double fractionalKnapsack(int W, Item arr[], int N)
{
    sort(arr, arr + N, cmp);

    double finalValue = 0.0;
    for (int i = 0; i < N; i++) {
        if (arr[i].weight <= W) {
            W -= arr[i].weight;
            finalValue += arr[i].profit;
        }
        else {
            finalValue += arr[i].profit
                          * ((double)W / arr[i].weight);
            break;
        }
    }
    return finalValue;
}

// Driver code
int main()
{
    int W = 50;
    Item arr[] = { { 60, 10 }, { 100, 20 }, { 120, 30 } };
    int N = sizeof(arr) / sizeof(arr[0]);

    // Function call
    cout << fractionalKnapsack(W, arr, N);
    return 0;
}
Huffman Coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters.
The variable-length codes assigned to input characters are Prefix Codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of the code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is the prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”.
There are mainly two major parts in Huffman Coding
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.
Algorithm:
Q = C                       // min-priority queue of all leaf nodes, keyed on frequency
for i <- 1 to n-1
do
    allocate a new node z
    z.left  = x = EXTRACT-MIN(Q)
    z.right = y = EXTRACT-MIN(Q)
    z.freq  = x.freq + y.freq
    INSERT(Q, z)
return EXTRACT-MIN(Q)       // root of the Huffman tree
Step 1: Build a min heap that contains 6 nodes (here, the characters a, b, c, d, e and f with frequencies 5, 9, 12, 13, 16 and 45), where each node represents the root of a tree with a single node.
Step 2: Extract the two minimum frequency nodes from the min heap. Add a new internal node with frequency 5 + 9 = 14.
Illustration of step 2
Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements:
Character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract two minimum frequency nodes from heap. Add a new
internal node with frequency 12 + 13 = 25
Illustration of step 3
Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and two heap nodes are roots of trees with more than one node:
Character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4: Extract two minimum frequency nodes. Add a new internal node
with frequency 14 + 16 = 30
Illustration of step 4
Step 5: Extract two minimum frequency nodes. Add a new internal node
with frequency 25 + 30 = 55
Illustration of step 5
Step 6: Extract two minimum frequency nodes. Add a new internal node
with frequency 45 + 55 = 100
Illustration of step 6
Since the heap contains only one node, the algorithm stops here.
Steps to print codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary
array. While moving to the left child, write 0 to the array. While moving
to the right child, write 1 to the array. Print the array when a leaf node is
encountered.
// C++ program for Huffman Coding; std::priority_queue
// is used as the min heap
#include <bits/stdc++.h>
using namespace std;

#define MAX_TREE_HT 100

// A Huffman tree node
struct MinHeapNode {
    char data;                 // an input character ('$' for internal nodes)
    unsigned freq;             // frequency of the character
    MinHeapNode *left, *right; // left and right children

    MinHeapNode(char data, unsigned freq)
        : data(data), freq(freq), left(NULL), right(NULL)
    {
    }
};

// Ordering for the min heap: smaller frequency first
struct compare {
    bool operator()(MinHeapNode* l, MinHeapNode* r)
    {
        return l->freq > r->freq;
    }
};

// Prints Huffman codes from the root of the Huffman tree;
// arr[] stores the bits along the current path
void printCodes(MinHeapNode* root, int arr[], int top)
{
    // Assign 0 to the left edge and recur
    if (root->left) {
        arr[top] = 0;
        printCodes(root->left, arr, top + 1);
    }

    // Assign 1 to the right edge and recur
    if (root->right) {
        arr[top] = 1;
        printCodes(root->right, arr, top + 1);
    }

    // Print the code when a leaf node is encountered
    if (!root->left && !root->right) {
        cout << root->data << ": ";
        for (int i = 0; i < top; ++i)
            cout << arr[i];
        cout << endl;
    }
}

// Builds the Huffman tree and prints the codes
void HuffmanCodes(char data[], int freq[], int size)
{
    priority_queue<MinHeapNode*, vector<MinHeapNode*>, compare> minHeap;

    // One single-node tree per input character
    for (int i = 0; i < size; ++i)
        minHeap.push(new MinHeapNode(data[i], freq[i]));

    // Repeat until only the root of the Huffman tree remains
    while (minHeap.size() != 1) {
        // Extract the two minimum frequency nodes
        MinHeapNode* left = minHeap.top();
        minHeap.pop();
        MinHeapNode* right = minHeap.top();
        minHeap.pop();

        // New internal node with the summed frequency
        MinHeapNode* top = new MinHeapNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;
        minHeap.push(top);
    }

    // Traverse the tree and print the codes
    int arr[MAX_TREE_HT], top = 0;
    printCodes(minHeap.top(), arr, top);
}

// Driver code
int main()
{
    char data[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
    int freq[] = { 5, 9, 12, 13, 16, 45 };
    int size = sizeof(data) / sizeof(data[0]);

    // Construct Huffman Tree and print the codes
    HuffmanCodes(data, freq, size);
    return 0;
}
The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will have (9 – 1) = 8 edges.
After sorting:
Weight Source Destination
1 7 6
2 8 2
2 6 5
4 0 1
4 2 5
6 8 6
7 2 3
7 7 8
8 0 7
8 1 2
9 3 4
10 5 4
11 1 7
14 3 5
Now pick all edges one by one from the sorted list of edges.
Step 1: Pick edge 7-6. No cycle is formed, include it.
Add edge 7-6 in the MST
Step 2: Pick edge 8-2. No cycle is formed, include it.
Step 3: Pick edge 6-5. No cycle is formed, include it.
Step 4: Pick edge 0-1. No cycle is formed, include it.
Step 5: Pick edge 2-5. No cycle is formed, include it.
Step 6: Pick edge 8-6. Since including this edge results in a cycle, discard it. Pick edge 2-3: No cycle is formed, include it.
Add edge 2-3 in the MST
Step 7: Pick edge 7-8. Since including this edge results in a cycle, discard it. Pick edge 0-7. No cycle is formed, include it.
Add edge 0-7 in the MST
Step 8: Pick edge 1-2. Since including this edge results in a cycle, discard it. Pick edge 3-4. No cycle is formed, include it.
Note: Since the number of edges included in the MST equals (V – 1), the algorithm stops here.
Below is the implementation of the above approach:
// C++ program for the above approach
#include <bits/stdc++.h>
using namespace std;

// Disjoint Set Union (Union-Find) data structure
class DSU {
    int* parent;
    int* rank;

public:
    DSU(int n)
    {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;
            rank[i] = 1;
        }
    }

    // Find function with path compression
    int find(int i)
    {
        if (parent[i] == -1)
            return i;
        return parent[i] = find(parent[i]);
    }

    // Union function by rank
    void unite(int x, int y)
    {
        int s1 = find(x);
        int s2 = find(y);
        if (s1 != s2) {
            if (rank[s1] < rank[s2]) {
                parent[s1] = s2;
            }
            else if (rank[s1] > rank[s2]) {
                parent[s2] = s1;
            }
            else {
                parent[s2] = s1;
                rank[s1] += 1;
            }
        }
    }
};

class Graph {
    vector<vector<int> > edgelist;
    int V;

public:
    Graph(int V) { this->V = V; }

    // Edges are stored as {weight, x, y} so that sorting
    // the list orders them by weight
    void addEdge(int x, int y, int w)
    {
        edgelist.push_back({ w, x, y });
    }

    void kruskals_mst()
    {
        // Sort all edges
        sort(edgelist.begin(), edgelist.end());

        // Initialize the DSU
        DSU s(V);
        int ans = 0;
        cout << "Following are the edges in the constructed MST" << endl;
        for (auto edge : edgelist) {
            int w = edge[0];
            int x = edge[1];
            int y = edge[2];

            // Take this edge if it does not form a cycle
            if (s.find(x) != s.find(y)) {
                s.unite(x, y);
                ans += w;
                cout << x << " -- " << y << " == " << w << endl;
            }
        }
        cout << "Minimum Cost Spanning Tree: " << ans << endl;
    }
};

// Driver code
int main()
{
    Graph g(4);
    g.addEdge(0, 1, 10);
    g.addEdge(1, 3, 15);
    g.addEdge(2, 3, 4);
    g.addEdge(2, 0, 6);
    g.addEdge(0, 3, 5);

    // Function call
    g.kruskals_mst();
    return 0;
}
Kruskal(G):
Initialize an empty set A to store the edges of the minimum spanning tree
Sort the edges of G in non-decreasing order of their weights
Initialize a disjoint-set data structure to keep track of connected components
For each edge (u, v) taken in sorted order:
    if Find(u) ≠ Find(v):
        add (u, v) to A
        Union(u, v)
Return A
Find(u):
// Find the representative of the set containing u
if u ≠ parent[u]:
parent[u] = Find(parent[u])
return parent[u]
Union(u, v):
// Merge the sets containing u and v
root_u = Find(u)
root_v = Find(v)
parent[root_u] = root_v
Example of a graph
Step 1: Firstly, we select an arbitrary vertex that acts as the starting
vertex of the Minimum Spanning Tree. Here we have selected vertex 0 as
the starting vertex.
Step 2: All the edges connecting the incomplete MST and other vertices are the edges {0, 1} and {0, 7}. Between these two the edge with minimum weight is {0, 1}. So include the edge and vertex 1 in the MST.
Step 3: The edges connecting the incomplete MST to other vertices are {0, 7}, {1, 7} and {1, 2}. Among these the minimum weight is 8, which is shared by the edges {0, 7} and {1, 2}. Here we include the edge {0, 7} and the vertex 7 in the MST.
Step 4: The edges that connect the incomplete MST with the fringe
vertices are {1, 2}, {7, 6} and {7, 8}. Add the edge {7, 6} and the vertex 6
in the MST as it has the least weight (i.e., 1).
6 is added in the MST
Step 5: The connecting edges now are {7, 8}, {1, 2}, {6, 8} and {6, 5}. Include edge {6, 5} and vertex 5 in the MST as the edge has the minimum weight (i.e., 2) among them.
Step 6: Among the current connecting edges, the edge {5, 2} has the minimum weight (i.e., 4). So include that edge and the vertex 2 in the MST.
Step 7: The connecting edges between the incomplete MST and the
other edges are {2, 8}, {2, 3}, {5, 3} and {5, 4}. The edge with minimum
weight is edge {2, 8} which has weight 2. So include this edge and the
vertex 8 in the MST.
Note: If we had selected the edge {1, 2} in the third step then the MST
would look like the following.
Structure of the alternate MST if we had selected edge {1, 2} in the MST
How to implement Prim’s Algorithm?
Follow the given steps to utilize Prim’s Algorithm mentioned above for finding the MST of a graph:
• Create a set mstSet that keeps track of vertices already included
in MST.
• Assign a key value to all vertices in the input graph. Initialize all
key values as INFINITE. Assign the key value as 0 for the first
vertex so that it is picked first.
• While mstSet doesn’t include all vertices
• Pick a vertex u that is not there in mstSet and has a
minimum key value.
• Include u in the mstSet.
• Update the key value of all adjacent vertices of u. To
update the key values, iterate through all adjacent
vertices.
• For every adjacent vertex v, if the weight of
edge u-v is less than the previous key value
of v, update the key value as the weight of u-v.
The idea of using key values is to pick the minimum weight edge from the cut. The key values are used only for vertices that are not yet included in the MST; the key value for these vertices indicates the minimum weight edge connecting them to the set of vertices already included in the MST.
Below is the implementation of the approach:
// C++ program for Prim's MST using an adjacency matrix
#include <bits/stdc++.h>
using namespace std;

// Number of vertices in the graph
#define V 5

// Find the vertex with minimum key value from the set of
// vertices not yet included in the MST
int minKey(int key[], bool mstSet[])
{
    int min = INT_MAX, min_index = 0;
    for (int v = 0; v < V; v++)
        if (mstSet[v] == false && key[v] < min)
            min = key[v], min_index = v;
    return min_index;
}

// Construct and print the MST for a graph represented
// using an adjacency matrix
void primMST(int graph[V][V])
{
    int parent[V];  // constructed MST
    int key[V];     // key values used to pick the min weight edge
    bool mstSet[V]; // vertices already included in the MST

    for (int i = 0; i < V; i++)
        key[i] = INT_MAX, mstSet[i] = false;

    key[0] = 0;     // pick the first vertex first
    parent[0] = -1; // root of the MST

    for (int count = 0; count < V - 1; count++) {
        int u = minKey(key, mstSet);
        mstSet[u] = true;

        // Update key values of adjacent vertices not yet in MST
        for (int v = 0; v < V; v++)
            if (graph[u][v] && mstSet[v] == false
                && graph[u][v] < key[v])
                parent[v] = u, key[v] = graph[u][v];
    }

    cout << "Edge \tWeight\n";
    for (int i = 1; i < V; i++)
        cout << parent[i] << " - " << i << " \t"
             << graph[i][parent[i]] << "\n";
}

// Driver's code
int main()
{
    int graph[V][V] = { { 0, 2, 0, 6, 0 },
                        { 2, 0, 3, 8, 5 },
                        { 0, 3, 0, 0, 7 },
                        { 6, 8, 0, 0, 9 },
                        { 0, 5, 7, 9, 0 } };

    // Function call
    primMST(graph);
    return 0;
}
Output
Edge Weight
0 - 1 2
1 - 2 3
0 - 3 6
1 - 4 5
Complexity Analysis of Prim’s Algorithm:
Time Complexity: O(V^2). If the input graph is represented using an adjacency list, then the time complexity of Prim’s algorithm can be reduced to O(E * logV) with the help of a binary heap. In this implementation, we always consider the spanning tree to start from the root of the graph.
Auxiliary Space: O(V)
Bellman–Ford Algorithm
Imagine you have a map with different cities connected by roads, each road
having a certain distance. The Bellman–Ford algorithm is like a guide that
helps you find the shortest path from one city to all other cities, even if some
roads have negative lengths. It’s like a GPS for computers, useful for figuring
out the quickest way to get from one point to another in a network. In this
article, we’ll take a closer look at how this algorithm works and why it’s so
handy in solving everyday problems.
Bellman-Ford Algorithm
Bellman-Ford is a single source shortest path algorithm that determines
the shortest path between a given source vertex and every other vertex
in a graph. This algorithm can be used on
both weighted and unweighted graphs.
Like Dijkstra’s algorithm, the Bellman-Ford algorithm is also guaranteed to find the shortest path in a graph. Although Bellman-Ford is slower
than Dijkstra’s algorithm, it is capable of handling graphs with negative
edge weights, which makes it more versatile. The shortest path cannot
be found if there exists a negative cycle in the graph. If we continue to go
around the negative cycle an infinite number of times, then the cost of
the path will continue to decrease (even though the length of the path is
increasing). As a result, Bellman-Ford is also capable of
detecting negative cycles, which is an important feature.
The idea behind Bellman Ford Algorithm:
The Bellman-Ford algorithm’s primary principle is that it starts with a
single source and calculates the distance to each node. The distance is
initially unknown and assumed to be infinite, but as time goes on, the
algorithm relaxes those paths by identifying a few shorter paths. Hence it
is said that Bellman-Ford is based on “Principle of Relaxation“.
Principle of Relaxation of Edges for Bellman-Ford:
• It states that for the graph having N vertices, all the edges
should be relaxed N-1 times to compute the single source
shortest path.
• In order to detect whether a negative cycle exists or not, relax all the edges one more time; if the shortest distance for any node decreases, then we can say that a negative cycle exists. In short, if we relax the edges N times and there is any change in the shortest distance of any node between the (N-1)th and Nth relaxation, then a negative cycle exists; otherwise it does not.
Why Relaxing Edges N-1 times, gives us Single Source Shortest Path?
In the worst-case scenario, a shortest path between two vertices can
have at most N-1 edges, where N is the number of vertices. This is
because a simple path in a graph with N vertices can have at most N-
1 edges, as it’s impossible to form a closed loop without revisiting a
vertex.
By relaxing edges N-1 times, the Bellman-Ford algorithm ensures that
the distance estimates for all vertices have been updated to their optimal
values, assuming the graph doesn’t contain any negative-weight cycles
reachable from the source vertex. If a graph contains a negative-weight
cycle reachable from the source vertex, the algorithm can detect it
after N-1 iterations, since the negative cycle disrupts the shortest path
lengths.
In summary, relaxing edges N-1 times in the Bellman-Ford algorithm
guarantees that the algorithm has explored all possible paths of length
up to N-1, which is the maximum possible length of a shortest path in a
graph with N vertices. This allows the algorithm to correctly calculate the
shortest paths from the source vertex to all other vertices, given that
there are no negative-weight cycles.
Why Does the Reduction of Distance in the N’th Relaxation Indicate the Existence of a Negative Cycle?
As previously discussed, achieving the single source shortest paths to all
other nodes takes N-1 relaxations. If the N’th relaxation further reduces
the shortest distance for any node, it implies that a certain edge with
negative weight has been traversed once more. It is important to note
that during the N-1 relaxations, we presumed that each vertex is
traversed only once. However, the reduction of distance during the N’th
relaxation indicates revisiting a vertex.
Working of Bellman-Ford Algorithm to Detect the Negative cycle in the
graph:
Let’s suppose we have a graph which is given below and we want to find
whether there exists a negative cycle or not using Bellman-Ford.
Initial Graph
Step 1: Initialize a distance array Dist[] to store the shortest distance for
each vertex from the source vertex. Initially distance of source will be 0
and Distance of other vertices will be INFINITY.
Initialize a distance array
1st Relaxation
2nd Relaxation
4th Relaxation
Step 7: Now the final relaxation, i.e. the 6th relaxation, should indicate the presence of a negative cycle if there are any changes in the distance array from the 5th relaxation.
During the 6th relaxation, following changes can be seen:
• Current Distance of E > (Distance of F) + (Weight of F to E) i.e. 6
> 8 + (-3)
• Dist[E]=5
• Current Distance of F > (Distance of D ) + (Weight of D to F) i.e.
8>5+2
• Dist[F]=7
Since we observe changes in the distance array, we can conclude the presence of a negative cycle in the graph.
6th Relaxation
// C++ program for Bellman-Ford's single source
// shortest path algorithm
#include <bits/stdc++.h>
using namespace std;

// An edge: u -> v with weight w
struct Edge {
    int u, v, w;
};

struct Graph {
    int V, E;
    vector<Edge> edges;
};

// Creates the example graph used by the driver below
struct Graph* createGraph(int V, int E)
{
    struct Graph* graph = new Graph;
    graph->V = V;
    graph->E = E;
    graph->edges = { { 0, 1, -1 }, { 0, 2, 4 }, { 1, 2, 3 },
                     { 1, 3, 2 },  { 1, 4, 2 }, { 3, 2, 5 },
                     { 3, 1, 1 },  { 4, 3, -3 } };
    return graph;
}

// Prints the distance of every vertex from the source
void printArr(vector<int>& dist, int V)
{
    cout << "Vertex Distance from Source" << endl;
    for (int i = 0; i < V; ++i)
        cout << i << "\t\t" << dist[i] << endl;
}

void BellmanFord(struct Graph* graph, int src)
{
    vector<int> dist(graph->V, INT_MAX);
    dist[src] = 0;

    // Relax all edges V-1 times
    for (int i = 1; i <= graph->V - 1; i++)
        for (const Edge& e : graph->edges)
            if (dist[e.u] != INT_MAX && dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;

    // A V'th improvement would mean a negative cycle
    for (const Edge& e : graph->edges)
        if (dist[e.u] != INT_MAX && dist[e.u] + e.w < dist[e.v]) {
            cout << "Graph contains negative weight cycle" << endl;
            return;
        }

    printArr(dist, graph->V);
    return;
}

// Driver's code
int main()
{
    /* Let us create the graph given in above example */
    int V = 5; // Number of vertices in graph
    int E = 8; // Number of edges in graph
    struct Graph* graph = createGraph(V, E);

    // Function call
    BellmanFord(graph, 0);
    return 0;
}
Output
Vertex Distance from Source
0 0
1 -1
2 2
3 -2
4 1
Complexity Analysis of Bellman-Ford Algorithm:
• Time Complexity when graph is connected:
• Best Case: O(E), when the distance array is the same after the 1st and 2nd relaxations, we can simply stop further processing
• Average Case: O(V*E)
• Worst Case: O(V*E)
• Time Complexity when graph is disconnected:
• All the cases: O(E*(V^2))
• Auxiliary Space: O(V), where V is the number of vertices in the
graph.
Bellman Ford’s Algorithm Applications:
• Network Routing: Bellman-Ford is used in computer networking
to find the shortest paths in routing tables, helping data packets
navigate efficiently across networks.
• GPS Navigation: GPS devices use Bellman-Ford to calculate the
shortest or fastest routes between locations, aiding navigation
apps and devices.
• Transportation and Logistics: Bellman-Ford’s algorithm can be
applied to determine the optimal paths for vehicles in
transportation and logistics, minimizing fuel consumption and
travel time.
• Game Development: Bellman-Ford can be used to model
movement and navigation within virtual worlds in game
development, where different paths may have varying costs or
obstacles.
• Robotics and Autonomous Vehicles: The algorithm aids in path
planning for robots or autonomous vehicles, considering
obstacles, terrain, and energy consumption.
Drawback of Bellman Ford’s Algorithm:
• The Bellman-Ford algorithm cannot compute correct shortest paths if the graph contains a negative edge cycle; it can only detect that such a cycle exists.
What is Dijkstra’s Algorithm? | Introduction
to Dijkstra’s Shortest Path Algorithm
Dijkstra’s Algorithm:
Dijkstra’s algorithm is a popular algorithm for solving single-source shortest path problems on graphs with non-negative edge weights, i.e., it finds the shortest distance from a source vertex to the other vertices of a graph. It was conceived by Dutch computer scientist Edsger W. Dijkstra in 1956.
The algorithm maintains a set of visited vertices and a set of unvisited
vertices. It starts at the source vertex and iteratively selects the unvisited
vertex with the smallest tentative distance from the source. It then visits
the neighbors of this vertex and updates their tentative distances if a
shorter path is found. This process continues until the destination vertex
is reached, or all reachable vertices have been visited.
Need for Dijkstra’s Algorithm (Purpose and Use-Cases)
The need for Dijkstra’s algorithm arises in many applications where
finding the shortest path between two points is crucial.
For example, It can be used in the routing protocols for computer
networks and also used by map systems to find the shortest path
between starting point and the Destination (as explained in How does
Google Maps work?)
Can Dijkstra’s Algorithm work on both Directed and Undirected graphs?
Yes, Dijkstra’s algorithm can work on both directed and undirected graphs, as it is designed to work on any type of graph as long as the graph has non-negative edge weights.
• In a directed graph, each edge has a direction, indicating the
direction of travel between the vertices connected by the edge.
In this case, the algorithm follows the direction of the edges
when searching for the shortest path.
• In an undirected graph, the edges have no direction, and the
algorithm can traverse both forward and backward along the
edges when searching for the shortest path.
Algorithm for Dijkstra’s Algorithm:
1. Mark the source node with a current distance of 0 and the rest
with infinity.
2. Set the non-visited node with the smallest current distance as
the current node.
3. For each neighbor N of the current node: add the current distance of the current node to the weight of the edge connecting it to N. If this sum is smaller than the current distance of N, set it as the new current distance of N.
4. Mark the current node as visited.
5. Go to step 2 if there are any unvisited nodes.
How does Dijkstra’s Algorithm work?
Let’s see how Dijkstra’s Algorithm works with an example given below:
Dijkstra’s Algorithm will generate the shortest path from Node 0 to all
other Nodes in the graph.
Consider the below graph:
Dijkstra’s Algorithm
For this graph, we will assume that the weight of the edges
represents the distance between two nodes.
As we can see, we have the shortest path from
Node 0 to Node 1, from
Node 0 to Node 2, from
Node 0 to Node 3, from
Node 0 to Node 4, and from
Node 0 to Node 6.
Initially we have a set of resources given below:
• The distance from the source node to itself is 0. In this example the source node is 0.
• The distance from the source node to all other nodes is unknown, so we mark all of them as infinity.
Example: 0 -> 0, 1 -> ∞, 2 -> ∞, 3 -> ∞, 4 -> ∞, 5 -> ∞, 6 -> ∞.
• We’ll also have an array of unvisited elements that will keep track of unvisited or unmarked nodes.
• The algorithm completes when all the nodes have been marked as visited and their distances added to the path. Unvisited Nodes: 0 1 2 3 4 5 6.
Step 1: Start from Node 0 and mark it as visited (in the image below, a visited node is marked red).
Dijkstra’s Algorithm
Step 2: Check for adjacent nodes. Now we have two choices (either Node 1 with distance 2 or Node 2 with distance 6); choose the node with the minimum distance, i.e. Node 1, mark it as visited and add up the distance.
Distance: Node 0 -> Node 1 = 2
Step 3: Then move forward and check for the adjacent node, which is Node 3; mark it as visited and add up the distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 = 2 + 5 = 7
Dijkstra’s Algorithm
Step 4: Again we have two choices of adjacent nodes (either Node 4 with distance 10 or Node 5 with distance 15), so choose the node with the minimum distance. In this step Node 4 is the minimum-distance adjacent node, so mark it as visited and add up the distance.
Distance: Node 0 -> Node 1 -> Node 3 -> Node 4 = 2 + 5 + 10 = 17
Dijkstra’s Algorithm
Step 5: Again, move forward and check for the adjacent node, which is Node 6; mark it as visited and add up the distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 -> Node 4 -> Node 6 = 2 + 5 +
10 + 2 = 19
Dijkstra’s Algorithm
So, the shortest distance from the source vertex to Node 6 is 19, which is the optimal one.
Pseudo Code for Dijkstra’s Algorithm
function Dijkstra(Graph, source):
    // Initialize distances to all nodes as infinity, except for the source node.
    distances = map infinity to all nodes
    distances[source] = 0

    // Initialize an empty set of visited nodes and a priority queue to keep track of the nodes to visit.
    visited = empty set
    queue = new PriorityQueue()
    queue.enqueue(source, 0)

    // Loop until all nodes have been visited.
    while queue is not empty:
        // Dequeue the node with the smallest distance from the priority queue.
        current = queue.dequeue()

        // If the node has already been visited, skip it.
        if current in visited:
            continue

        // Mark the node as visited.
        visited.add(current)

        // Check all neighboring nodes to see if their distances need to be updated.
        for neighbor in Graph.neighbors(current):
            // Calculate the tentative distance to the neighbor through the current node.
            tentative_distance = distances[current] + Graph.distance(current, neighbor)

            // If the tentative distance is smaller than the current distance to the neighbor, update the distance.
            if tentative_distance < distances[neighbor]:
                distances[neighbor] = tentative_distance

                // Enqueue the neighbor with its new distance to be considered for visitation in the future.
                queue.enqueue(neighbor, distances[neighbor])

    // Return the calculated distances from the source to all other nodes in the graph.
    return distances
Example
Output:
Vertex Distance from Source
0 -> 0
1 -> 2
2 -> 6
3 -> 7
4 -> 17
5 -> 22
6 -> 19
Below is the algorithm based on the above idea:
• Initialize the distance values and priority queue.
• Insert the source node into the priority queue with distance 0.
• While the priority queue is not empty:
• Extract the node with the minimum distance from the
priority queue.
• Update the distances of its neighbors if a shorter path
is found.
• Insert updated neighbors into the priority queue.
Below is the C++ Implementation of the above approach:
// Program to find Dijkstra's shortest path using
// priority_queue in STL
#include <bits/stdc++.h>
using namespace std;
#define INF 0x3f3f3f3f

// iPair ==> Integer Pair
typedef pair<int, int> iPair;

class Graph {
    int V; // No. of vertices
    // adjacency list of (vertex, weight) pairs
    list<iPair>* adj;

public:
    Graph(int V);
    void addEdge(int u, int v, int w);
    void shortestPath(int src);
};

Graph::Graph(int V)
{
    this->V = V;
    adj = new list<iPair>[V];
}

void Graph::addEdge(int u, int v, int w)
{
    adj[u].push_back(make_pair(v, w));
    adj[v].push_back(make_pair(u, w));
}

// Prints shortest paths from src to all other vertices
void Graph::shortestPath(int src)
{
    // Min heap of (distance, vertex) pairs, see
    // https://www.geeksforgeeks.org/implement-min-heap-using-stl/
    priority_queue<iPair, vector<iPair>, greater<iPair> > pq;

    vector<int> dist(V, INF);

    // Insert the source itself in the heap and initialize
    // its distance as 0.
    pq.push(make_pair(0, src));
    dist[src] = 0;

    while (!pq.empty()) {
        // Extract minimum distance vertex (second element
        // in pair)
        int u = pq.top().second;
        pq.pop();

        // Traverse all adjacent of u.
        for (auto i = adj[u].begin(); i != adj[u].end(); ++i) {
            int v = (*i).first;
            int weight = (*i).second;

            // Updating distance of v if there is a shorter
            // path to v through u
            if (dist[v] > dist[u] + weight) {
                dist[v] = dist[u] + weight;
                pq.push(make_pair(dist[v], v));
            }
        }
    }

    cout << "Vertex Distance from Source" << endl;
    for (int i = 0; i < V; ++i)
        cout << i << " -> " << dist[i] << endl;
}

// Driver code
int main()
{
    int V = 7;
    Graph g(V);

    g.addEdge(0, 1, 2);
    g.addEdge(0, 2, 6);
    g.addEdge(1, 3, 5);
    g.addEdge(2, 3, 8);
    g.addEdge(3, 4, 10);
    g.addEdge(3, 5, 15);
    g.addEdge(4, 6, 2);
    g.addEdge(5, 6, 6);

    g.shortestPath(0);
    return 0;
}
Dijkstra’s algorithm compared with other shortest-path algorithms:

Optimization:
• Dijkstra’s algorithm is optimized for finding the shortest path between a single source node and all other nodes in a graph with non-negative edge weights.
• The Bellman-Ford algorithm is optimized for finding the shortest path between a single source node and all other nodes in a graph with negative edge weights.

Negative Weights:
• Dijkstra’s algorithm does not work with graphs that have negative edge weights, as it assumes that all edge weights are non-negative.
• The Bellman-Ford algorithm can handle negative edge weights and can detect negative-weight cycles in the graph.
• The Floyd-Warshall algorithm, on the other hand, is an all-pairs shortest path algorithm that uses dynamic programming to calculate the shortest path between all pairs of nodes in the graph.

Heuristic Function:
• Dijkstra’s algorithm does not use any heuristic function and considers all the nodes in the graph.
• The A* algorithm uses a heuristic function that estimates the distance from the current node to the goal node. This heuristic function is admissible, meaning that it never overestimates the actual distance to the goal node.
Naive Algorithm for Pattern Searching

// C++ program for the Naive pattern searching algorithm
#include <bits/stdc++.h>
using namespace std;

// Slides pat over txt one position at a time and checks
// for a match at every shift
void search(string pat, string txt)
{
    int M = pat.size();
    int N = txt.size();

    for (int i = 0; i <= N - M; i++) {
        int j;
        // For the current shift i, check for pattern match
        for (j = 0; j < M; j++)
            if (txt[i + j] != pat[j])
                break;

        if (j == M) // pat[0..M-1] matches txt[i..i+M-1]
            cout << "Pattern found at index " << i << endl;
    }
}

// Driver's Code
int main() {
    // Example 1
    string txt1 = "AABAACAADAABAABA";
    string pat1 = "AABA";
    cout << "Example 1: " << endl;
    search(pat1, txt1);

    // Example 2
    string txt2 = "agd";
    string pat2 = "g";
    cout << "\nExample 2: " << endl;
    search(pat2, txt2);
    return 0;
}
Output
Example 1: 
Pattern found at index 0
Pattern found at index 9
Pattern found at index 12

Example 2: 
Pattern found at index 1
Time Complexity: O(N^2)
Auxiliary Space: O(1)
Complexity Analysis of Naive algorithm for Pattern Searching:
Best Case: O(n)
• When the pattern is found at the very beginning of the text (or very early on).
• The algorithm then performs on the order of O(n) comparisons overall, where n is the length of the text.
Worst Case: O(n^2)
• When the pattern doesn’t appear in the text at all or appears only at the
very end.
• The algorithm will perform O((n-m+1)*m) comparisons, where n is the
length of the text and m is the length of the pattern.
• In the worst case, for each position in the text, the algorithm may need
to compare the entire pattern against the text.
Rabin-Karp Algorithm for Pattern Searching
Given a text T[0. . .n-1] and a pattern P[0. . .m-1], write a function search(char P[],
char T[]) that prints all occurrences of P[] present in T[] using Rabin Karp
algorithm. You may assume that n > m.
Examples:
Input: T[] = “THIS IS A TEST TEXT”, P[] = “TEST”
Output: Pattern found at index 10
Input: T[] = “AABAACAADAABAABA”, P[] = “AABA”
Output: Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
Rabin-Karp Algorithm:
In the Naive String Matching algorithm, we check whether every
substring of the text of the pattern’s size is equal to the pattern or not
one by one.
Like the Naive Algorithm, the Rabin-Karp algorithm also checks every substring. But unlike the Naive algorithm, the Rabin-Karp algorithm matches the hash value of the pattern with the hash value of the current substring of text, and only if the hash values match does it start matching individual characters. So the Rabin-Karp algorithm needs to calculate hash values for the following strings:
• Pattern itself
• All the substrings of the text of length m which is the size of
pattern.
How is Hash Value calculated in Rabin-Karp?
Hash value is used to efficiently check for potential matches between
a pattern and substrings of a larger text. The hash value is calculated
using a rolling hash function, which allows you to update the hash value
for a new substring by efficiently removing the contribution of the old
character and adding the contribution of the new character. This makes it
possible to slide the pattern over the text and calculate the hash value
for each substring without recalculating the entire hash from scratch.
Here’s how the hash value is typically calculated in Rabin-Karp:
Step 1: Choose a suitable base and a modulus:
• Select a prime number ‘p‘ as the modulus. This choice helps
avoid overflow issues and ensures a good distribution of hash
values.
• Choose a base ‘b‘ (usually a prime number as well), which is
often the size of the character set (e.g., 256 for ASCII
characters).
Step 2: Initialize the hash value:
• Set an initial hash value ‘hash‘ to 0.
Step 3: Calculate the initial hash value for the pattern:
• Iterate over each character in the pattern from left to right.
• For each character ‘c’ at position ‘i’, calculate its contribution to the hash value as ‘c * b^(pattern_length - i - 1) % p’ and add it to ‘hash‘.
• This gives you the hash value for the entire pattern.
Step 4: Slide the pattern over the text:
• Start by calculating the hash value for the first substring of
the text that is the same length as the pattern.
Step 5: Update the hash value for each subsequent substring:
• To slide the pattern one position to the right, you remove the
contribution of the leftmost character and add the contribution
of the new character on the right.
• The formula for updating the hash value when moving the window from position ‘i’ to ‘i+1’ is:
hash = ((hash - text[i] * b^(pattern_length - 1)) * b + text[i + pattern_length]) % p
Step 6: Compare hash values:
• When the hash value of a substring in the text matches the
hash value of the pattern, it’s a potential match.
• If the hash values match, we should perform a character-by-
character comparison to confirm the match, as hash
collisions can occur.
Below is the Illustration of above algorithm:
Step-by-step approach:
• Initially calculate the hash value of the pattern.
• Start iterating from the starting of the string:
• Calculate the hash value of the current substring having
length m.
• If the hash value of the current substring and the
pattern are same check if the substring is same as the
pattern.
• If they are same, store the starting index as a valid
answer. Otherwise, continue for the next substrings.
• Return the starting indices as the required answer.
Below is the implementation of the above approach:
/* C++ program of the Rabin-Karp algorithm */
#include <bits/stdc++.h>
using namespace std;

// d is the number of characters in the input alphabet
#define d 256

/* pat -> pattern, txt -> text, q -> a prime number */
void search(char pat[], char txt[], int q)
{
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for text
    int h = 1;

    // The value of h would be "pow(d, M-1) % q"
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;

    // Calculate the hash value of the pattern and the first
    // window of text
    for (i = 0; i < M; i++) {
        p = (d * p + pat[i]) % q;
        t = (d * t + txt[i]) % q;
    }

    // Slide the pattern over the text one by one
    for (i = 0; i <= N - M; i++) {
        // Check the hash values of the current window of
        // text and the pattern; if they match, then check
        // characters one by one
        if (p == t) {
            for (j = 0; j < M; j++) {
                if (txt[i + j] != pat[j]) {
                    break;
                }
            }

            // p == t and pat[0...M-1] = txt[i, i+1,
            // ...i+M-1]
            if (j == M)
                cout << "Pattern found at index " << i
                     << endl;
        }

        // Calculate hash value for the next window of text:
        // remove the leading digit, add the trailing digit
        if (i < N - M) {
            t = (d * (t - txt[i] * h) + txt[i + M]) % q;

            // We might get a negative value of t, converting
            // it to positive
            if (t < 0)
                t = (t + q);
        }
    }
}

/* Driver code */
int main()
{
    char txt[] = "GEEKS FOR GEEKS";
    char pat[] = "GEEK";

    // A small prime number (a huge modulus such as INT_MAX
    // would overflow the int arithmetic above)
    int q = 101;

    // Function Call
    search(pat, txt, q);
    return 0;
}
Output
Pattern found at index 0
Pattern found at index 10
Time Complexity:
• The average and best-case running time of the Rabin-Karp
algorithm is O(n+m), but its worst-case time is O(nm).
• The worst case of the Rabin-Karp algorithm occurs when all characters of the pattern and text are the same, as then the hash values of all the substrings of T[] match the hash value of P[].
Auxiliary Space: O(1)
Limitations of Rabin-Karp Algorithm
Spurious Hit: When the hash value of the pattern matches with the hash
value of a window of the text but the window is not the actual pattern
then it is called a spurious hit. Spurious hit increases the time complexity
of the algorithm. In order to minimize spurious hit, we use good hash
function. It greatly reduces the spurious hit.
Finite Automata algorithm for Pattern
Searching
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a
function search(char pat[], char txt[]) that prints all occurrences
of pat[] in txt[]. You may assume that n > m.
Examples:
Input: txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output: Pattern found at index 10
Why it is efficient?
These string matching automaton are very efficient because they
examine each text character exactly once, taking constant time per text
character. The matching time used is O(n) where n is the length of Text
string.
But the preprocessing time i.e. the time taken to build the finite
automaton can be large if ? is large.
Before we discuss Finite Automaton construction, let us take a look at
the following Finite Automaton for pattern ACACAGA.
The above diagrams represent graphical and tabular representations of
pattern ACACAGA.
Number of states in Finite Automaton will be M+1 where M is length of
the pattern. The main thing to construct Finite Automaton is to get the
next state from the current state for every possible character.
Given a character x and a state k, we can get the next state by
considering the string “pat[0..k-1]x” which is basically concatenation of
pattern characters pat[0], pat[1] …pat[k-1] and the character x. The idea
is to get length of the longest prefix of the given pattern such that the
prefix is also suffix of “pat[0..k-1]x”. The value of length gives us the next
state.
For example, let us see how to get the next state from current state 5
and character ‘C’ in the above diagram. We need to consider the string,
“pat[0..4]C” which is “ACACAC”. The length of the longest prefix of the
pattern such that the prefix is a suffix of “ACACAC” is 4 (“ACAC”). So the
next state (from state 5) is 4 for character ‘C’.
In the following code, computeTF() constructs the Finite Automaton. The
time complexity of the computeTF() is O(m^3*NO_OF_CHARS) where m
is length of the pattern and NO_OF_CHARS is size of alphabet (total
number of possible characters in pattern and text). The implementation
tries all possible prefixes starting from the longest possible that can be a
suffix of “pat[0..k-1]x”. There are better implementations to construct
Finite Automaton in O(m*NO_OF_CHARS) (Hint: we can use something
like lps array construction in KMP algorithm).
// CPP program for Finite Automata Pattern searching
// Algorithm
#include <bits/stdc++.h>
using namespace std;
#define NO_OF_CHARS 256

/* Prints all occurrences of pat in txt. Assumes computeTF(),
   built as described above, fills the transition table TF */
void search(const string& pat, const string& txt)
{
    int M = pat.size();
    int N = txt.size();

    int TF[M + 1][NO_OF_CHARS];
    computeTF(pat, M, TF);

    // Process txt over the finite automaton
    int state = 0;
    for (int i = 0; i < N; i++) {
        state = TF[state][txt[i]];
        if (state == M)
            cout << "Pattern found at index " << i - M + 1 << endl;
    }
}

// Driver code
int main()
{
    string txt = "THIS IS A TEST TEXT";
    string pat = "TEST";
    search(pat, txt);
    return 0;
}
KMP Algorithm for Pattern Searching

// C++ program for implementation of the KMP pattern
// searching algorithm
#include <bits/stdc++.h>
using namespace std;

// Fills lps[] so that lps[i] is the length of the longest
// proper prefix of pat[0..i] which is also a suffix of it
void computeLPSArray(char* pat, int M, int* lps)
{
    int len = 0; // length of the previous longest prefix suffix
    lps[0] = 0;  // lps[0] is always 0

    int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else {
            if (len != 0) {
                // Fall back; do not increment i here
                len = lps[len - 1];
            }
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Prints occurrences of pat[] in txt[]
void KMPSearch(char* pat, char* txt)
{
    int M = strlen(pat);
    int N = strlen(txt);

    // Preprocess the pattern
    int lps[M];
    computeLPSArray(pat, M, lps);

    int i = 0; // index for txt[]
    int j = 0; // index for pat[]
    while (i < N) {
        if (pat[j] == txt[i]) {
            j++;
            i++;
        }

        if (j == M) {
            printf("Found pattern at index %d ", i - j);
            j = lps[j - 1];
        }
        // Mismatch after j matches
        else if (i < N && pat[j] != txt[i]) {
            // Do not match lps[0..lps[j-1]] characters,
            // they will match anyway
            if (j != 0)
                j = lps[j - 1];
            else
                i = i + 1;
        }
    }
}

// Driver code
int main()
{
    char txt[] = "ABABDABACDABABCABAB";
    char pat[] = "ABABCABAB";
    KMPSearch(pat, txt);
    return 0;
}
Output
Found pattern at index 10
Time Complexity: O(N+M) where N is the length of the text and M is the
length of the pattern to be found.
Auxiliary Space: O(M)