Rabin-Karp Algorithm
#string #pattern-matching
The Rabin-Karp Algorithm is a string search algorithm that uses hashing to find
occurrences of a pattern in a text. It is particularly efficient when there are multiple
patterns to search for or when checking a large text for multiple occurrences of a pattern.
Key Idea
The main idea behind Rabin-Karp is to compute a hash value for the pattern and
compare it with the hash values of substrings of the text. Instead of recalculating the
hash of each substring from scratch, a rolling hash technique is used to update the hash
as the window slides over the text.
Rolling Hash
To compute the hash of a string S of length m, we use the formula:
$$\text{hash}(S) = \left(S[0] \cdot d^{m-1} + S[1] \cdot d^{m-2} + \cdots + S[m-1]\right) \bmod q$$
Where:
d is the base, typically the size of the alphabet (for example, 256 for 8-bit characters).
q is a large prime modulus, chosen to keep hash values small while making collisions unlikely.
S[i] represents the i-th character of the string S.
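The formula above can be evaluated without computing any powers of d explicitly by using Horner's rule. The following is a minimal sketch, not a reference implementation: the function name `poly_hash`, the default modulus 1_000_000_007, and the use of `ord()` as the character value are all assumptions made for illustration.

```python
def poly_hash(s: str, d: int = 256, q: int = 1_000_000_007) -> int:
    """Return (s[0]*d^(m-1) + s[1]*d^(m-2) + ... + s[m-1]) mod q."""
    h = 0
    for ch in s:
        # Multiplying the running value by d raises every earlier
        # character by one power of d before the new one is added.
        h = (h * d + ord(ch)) % q
    return h


print(poly_hash("ab", d=256, q=101))  # (97*256 + 98) mod 101
```

Reducing modulo q at every step keeps the intermediate values below d ⋅ q, so the result is the same as reducing once at the end but without large intermediate numbers.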
To update the hash for the next substring of length m, we use the rolling hash formula:
$$\text{newHash} = \left(d \cdot \left(\text{oldHash} - S[i] \cdot d^{m-1}\right) + S[i+m]\right) \bmod q$$
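Where:
S is the text being scanned and i is the start index of the current window.
oldHash is the hash of the current window S[i..i+m−1].
S[i] is the character leaving the window and S[i + m] is the character entering it.
newHash is the hash of the next window S[i+1..i+m].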
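As a quick check of the rolling update, the sketch below slides a window of length 3 one position to the right and compares the result with a hash recomputed from the definition. The helper name `roll_hash`, the small modulus q = 101, and the sample text are illustrative assumptions.

```python
def roll_hash(old_hash: int, out_ch: str, in_ch: str, high: int, d: int, q: int) -> int:
    """Slide the window by one: drop out_ch on the left, append in_ch on the right.

    high must equal d^(m-1) mod q for a window of length m.
    """
    return (d * (old_hash - ord(out_ch) * high) + ord(in_ch)) % q


d, q, m = 256, 101, 3
text = "abcd"
high = pow(d, m - 1, q)  # weight of the leading character

# Hash of the first window "abc", computed directly from the definition.
h0 = sum(ord(c) * d ** (m - 1 - i) for i, c in enumerate(text[:m])) % q
# Hash of the second window "bcd", obtained in O(1) from h0.
h1 = roll_hash(h0, text[0], text[m], high, d, q)

# The rolled value agrees with a from-scratch hash of "bcd".
direct = sum(ord(c) * d ** (m - 1 - i) for i, c in enumerate(text[1:1 + m])) % q
print(h1 == direct)
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction is safe here. In languages where the remainder can be negative (C, Java, Go), q would typically be added before the final reduction.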
Time Complexity
Average Case:
The average time complexity is O(n + m), where n is the length of the text and m
is the length of the pattern. Computing the two initial hashes costs O(m), and each
later window hash is obtained in O(1) from the previous one by the rolling hash.
Worst Case:
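The worst-case time complexity is O(n ⋅ m). Every hash match has to be verified by
comparing the actual characters, which costs O(m). If most windows match the pattern's
hash, either through collisions or because the pattern really occurs almost everywhere
(for example, a pattern of all a's in a text of all a's), that check runs at nearly
every position.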
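Putting the pieces together, a single-pattern search might look like the sketch below. It is one possible arrangement under stated assumptions, not the canonical implementation: the function name `rabin_karp_search`, the default base and modulus, and the choice to return an empty list for an empty pattern are all illustrative.

```python
def rabin_karp_search(text: str, pattern: str, d: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading character

    # Initial hashes of the pattern and of the first window: O(m).
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * d + ord(pattern[i])) % q
        w_hash = (w_hash * d + ord(text[i])) % q

    matches = []
    for i in range(n - m + 1):
        # Equal hashes only suggest a match; the O(m) slice comparison
        # rules out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Roll the window one position to the right in O(1).
            w_hash = (d * (w_hash - ord(text[i]) * high) + ord(text[i + m])) % q
    return matches


print(rabin_karp_search("abracadabra", "abra"))  # [0, 7]
```

Passing `q=1` forces every window to collide with the pattern, which reproduces the worst case: the result stays correct, but the character comparison runs at every position.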
Space Complexity
The space complexity is O(1) beyond the text and pattern themselves: the search keeps
only the pattern hash, the current window hash, and the value d^(m−1) mod q. However,
if we need to store multiple hashes (for multiple patterns), the space complexity
becomes O(k), where k is the number of patterns.
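The multi-pattern case can be sketched by storing one hash per pattern in a table and looking up each window hash in it. The version below assumes all patterns have the same length, which keeps a single rolling window sufficient; the function name `rabin_karp_multi` and its return shape are hypothetical choices for illustration.

```python
def rabin_karp_multi(text: str, patterns: list[str], d: int = 256, q: int = 1_000_000_007) -> dict[str, list[int]]:
    """Find every occurrence of each pattern; all patterns must share one length."""
    found = {p: [] for p in patterns}
    if not patterns:
        return found
    m, n = len(patterns[0]), len(text)
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes patterns of equal length")
    if m == 0 or m > n:
        return found

    def poly_hash(s: str) -> int:
        h = 0
        for ch in s:
            h = (h * d + ord(ch)) % q
        return h

    # One stored hash per distinct pattern: this table is the O(k) extra space.
    by_hash: dict[int, list[str]] = {}
    for p in found:
        by_hash.setdefault(poly_hash(p), []).append(p)

    high = pow(d, m - 1, q)
    w_hash = poly_hash(text[:m])
    for i in range(n - m + 1):
        # Only patterns whose hash equals the window hash are compared.
        for p in by_hash.get(w_hash, ()):
            if text[i:i + m] == p:
                found[p].append(i)
        if i + m < n:
            w_hash = (d * (w_hash - ord(text[i]) * high) + ord(text[i + m])) % q
    return found


print(rabin_karp_multi("abracadabra", ["abr", "cad", "xyz"]))
```

Patterns of different lengths would need one window (and one table) per distinct length, which this sketch deliberately leaves out.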