Implementation of Rabin Karp Algorithm in C++
Last Updated: 30 Jul, 2024
The Rabin-Karp Algorithm is a string-searching algorithm that efficiently finds a pattern within a text using hashing. It is particularly useful for finding multiple patterns in the same text or for searching in streams of data. In this article, we will learn how to implement the Rabin-Karp Algorithm in C++.
Example:
Input:
T[] = “THIS IS A TEST TEXT”, P[] = “TEST”
Output:
Pattern found at index 10
Rabin-Karp computes a hash value for the pattern and compares it with the hash value of each window of the text, that is, each substring with the same length as the pattern. Only when the two hash values match does it perform a character-by-character comparison to confirm the match. Because most mismatching windows are rejected by the hash comparison alone, the algorithm runs in O(n+m) time in the average and best cases, where n is the length of the text and m is the length of the pattern.
Algorithm
1. Initialize the following variables:
   - m: length of the pattern.
   - n: length of the text.
   - d: number of characters in the input alphabet (256 in this case).
   - q: a prime number used for hash calculation.
   - p: hash value for the pattern.
   - t: hash value for the current window of text.
   - h: helper value for rolling hash calculation, equal to (d^(m-1)) % q.
2. Calculate initial hash values:
   - Compute the hash value p for the pattern.
   - Compute the hash value t for the first window of the text (first m characters).
3. Slide the pattern over the text. For each position i in the text (from 0 to n-m):
   - If the hash values of the pattern and current text window match (p == t):
     - Perform a character-by-character comparison of the pattern with the current text window.
     - If all characters match, report a pattern occurrence at index i.
   - If i < n-m (not at the end of the text):
     - Calculate the hash value for the next text window using the rolling hash technique:
       - Remove the contribution of the first character of the current window.
       - Add the contribution of the next character after the current window.
       - Ensure the new hash value is positive by adding q if necessary.
4. Repeat step 3 until the end of the text is reached.
The hash calculation and the rolling update used in these steps are described in more detail below.
Hash Calculation
The hash function used in this algorithm is:
H = (d * H + ASCII value of next character) % q
Here,
- H is the current hash value.
- d is the number of characters in the input alphabet.
- q is a prime number.
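As a small sketch of the formula above, the helper below applies the update once per character, starting from H = 0. The name `polyHash` is ours and does not appear in the article's program; d is fixed to 256 as in the rest of this article. With q = 101 it hashes "TEST" to 49.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (name chosen for illustration):
// hash of s computed with H = (d * H + c) % q for each character c.
int polyHash(const std::string& s, int q)
{
    assert(q > 0);
    const int d = 256; // alphabet size used in this article
    long long H = 0;
    // Iterating as unsigned char keeps bytes above 127 non-negative
    for (unsigned char c : s)
        H = (d * H + c) % q;
    return static_cast<int>(H);
}
```

Calling `polyHash("TEST", 101)` walks through the intermediate values 84, 60, 91 and finally 49.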
Rolling Hash
The algorithm uses a rolling hash technique to efficiently calculate hash values for subsequent windows:
- Remove the contribution of the first character of the current window: t = t - text[i] * h
- Shift the remaining hash value: t = t * d
- Add the contribution of the new last character: t = t + text[i+m]
- Take the modulus with q: t = t % q (in C++ the result can be negative, in which case q is added)
This allows for O(1) time complexity when sliding the window, making the overall average-case time complexity O(n+m), where n is the length of the text and m is the length of the pattern.
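The rolling update can be checked against a from-scratch hash of each window. The sketch below is an illustration under our own naming (`leadingWeight`, `windowHash` and `roll` are hypothetical helpers, not part of the article's program) and uses 64-bit arithmetic so that larger values of q do not overflow.

```cpp
#include <cassert>
#include <string>

const int D = 256; // alphabet size, called d in the text

// h = d^(m-1) % q: the weight of the leftmost character of a window of length m
long long leadingWeight(int m, long long q)
{
    assert(m >= 1 && q > 0);
    long long h = 1 % q;
    for (int i = 0; i < m - 1; i++)
        h = (h * D) % q;
    return h;
}

// Hash of text[start .. start+m-1], computed from scratch in O(m)
long long windowHash(const std::string& text, int start, int m, long long q)
{
    long long t = 0;
    for (int i = 0; i < m; i++)
        t = (D * t + static_cast<unsigned char>(text[start + i])) % q;
    return t;
}

// One O(1) rolling step: drop `outgoing`, shift by d, append `incoming`
long long roll(long long t, unsigned char outgoing, unsigned char incoming,
               long long h, long long q)
{
    t = (t - outgoing * h) % q; // remove the first character's contribution
    if (t < 0)
        t += q;                 // keep the value in [0, q)
    return (t * D + incoming) % q;
}
```

For the pattern length m = 4 and q = 101, `leadingWeight(4, 101)` is 5, and rolling from one window of "THIS IS A TEST TEXT" to the next gives the same value as calling `windowHash` on the new window directly.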
C++ Program to Implement Rabin Karp Algorithm
The following program demonstrates the implementation of Rabin Karp Algorithm:
C++
// C++ Program for Implementation of Rabin-Karp Algorithm
#include <iostream>
#include <string>
using namespace std;

// Number of characters in the input alphabet
#define d 256

// Prints every index at which pattern occurs in text.
// q is the prime modulus used for hashing. With int arithmetic the
// rolling update stays in range only for moderate q (roughly below 30000).
void rabinKarp(const string& pattern, const string& text, int q)
{
    int m = pattern.length();
    int n = text.length();
    int i, j;

    // Nothing to report for an empty pattern or one longer than the text
    if (m == 0 || m > n)
        return;

    // Hash value for pattern
    int p = 0;
    // Hash value for the current window of text
    int t = 0;
    int h = 1;

    // The value of h would be "pow(d, m-1)%q"
    for (i = 0; i < m - 1; i++)
        h = (h * d) % q;

    // Calculate the hash value of pattern and first window of text.
    // Characters are cast to unsigned char so that bytes above 127
    // do not contribute negative values.
    for (i = 0; i < m; i++) {
        p = (d * p + (unsigned char)pattern[i]) % q;
        t = (d * t + (unsigned char)text[i]) % q;
    }

    // Slide the pattern over text one by one
    for (i = 0; i <= n - m; i++) {
        // Check the hash values of current window of text
        // and pattern
        if (p == t) {
            // Check for characters one by one
            for (j = 0; j < m; j++) {
                if (text[i + j] != pattern[j])
                    break;
            }
            if (j == m)
                cout << "Pattern found at index " << i << endl;
        }

        // Calculate hash value for next window of text
        if (i < n - m) {
            t = (d * (t - (unsigned char)text[i] * h)
                 + (unsigned char)text[i + m]) % q;
            // % can yield a negative result in C++; bring it back into [0, q)
            if (t < 0)
                t = (t + q);
        }
    }
}

// Driver Code
int main()
{
    string text = "GEEKSFORGEEKS";
    string pattern = "GEEKS";
    int q = 101;
    rabinKarp(pattern, text, q);
    return 0;
}
Output
Pattern found at index 0
Pattern found at index 8
Time Complexity:
- The average and best-case running time of the Rabin-Karp algorithm is O(n+m), but its worst-case time is O(nm).
- The worst case occurs when the hash value of every window of T[] matches the hash value of P[], for example when all characters of the pattern and the text are the same. Every window then requires a full character-by-character comparison.
Auxiliary Space: O(1)
Limitations of Rabin-Karp Algorithm
A spurious hit occurs when the hash value of the pattern matches the hash value of a text window, but the window is not actually equal to the pattern. Each spurious hit costs an extra character-by-character comparison, so a large number of them increases the running time of the algorithm. Using a good hash function, in particular a sufficiently large prime q, greatly reduces the number of spurious hits.
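To make spurious hits and the worst case measurable, the sketch below instruments the search with three counters. `SearchStats` and `searchWithStats` are hypothetical names introduced for this illustration; the logic follows the steps described in this article but uses 64-bit arithmetic so that larger moduli can be tried.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instrumented search (not part of the article's program)
struct SearchStats {
    std::vector<int> matches;   // indices where the pattern occurs
    long long spuriousHits = 0; // hash matched but the window differed
    long long comparisons = 0;  // character comparisons during verification
};

SearchStats searchWithStats(const std::string& pattern,
                            const std::string& text, long long q)
{
    assert(q > 0);
    const long long d = 256;
    const int m = static_cast<int>(pattern.size());
    const int n = static_cast<int>(text.size());
    SearchStats stats;
    if (m == 0 || m > n)
        return stats;

    long long h = 1 % q, p = 0, t = 0;
    for (int i = 0; i < m - 1; i++)
        h = (h * d) % q;
    for (int i = 0; i < m; i++) {
        p = (d * p + static_cast<unsigned char>(pattern[i])) % q;
        t = (d * t + static_cast<unsigned char>(text[i])) % q;
    }

    for (int i = 0; i <= n - m; i++) {
        if (p == t) {
            // Verify the candidate window character by character
            int j = 0;
            while (j < m) {
                stats.comparisons++;
                if (text[i + j] != pattern[j])
                    break;
                j++;
            }
            if (j == m)
                stats.matches.push_back(i);
            else
                stats.spuriousHits++;
        }
        if (i < n - m) {
            // Rolling update for the next window
            t = (t - static_cast<unsigned char>(text[i]) * h) % q;
            if (t < 0)
                t += q;
            t = (d * t + static_cast<unsigned char>(text[i + m])) % q;
        }
    }
    return stats;
}
```

Searching for "AAA" in a text of ten 'A' characters with q = 101 verifies all 8 windows in full, for 24 character comparisons, which is the O(nm) behaviour described above. With the deliberately degenerate modulus q = 1 (not a prime, used here only to force collisions) every window hashes to 0, so searching for "TEST" in "THIS IS A TEST TEXT" still reports index 10 but also records 15 spurious hits, one for each non-matching window.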