Rabin-Karp Algorithm

An explanation of the Rabin-Karp algorithm in an outstanding way

Uploaded by

Andrej Mikuš

A naive string matching algorithm compares the given pattern against all positions
in the given text. Each comparison takes time proportional to the length of the
pattern, and the number of positions is proportional to the length of the text.
Therefore, the worst-case time for such a method is proportional to the product of
the two lengths. In many practical cases, this time can be significantly reduced by
cutting short the comparison at each position as soon as a mismatch is found, but
this idea cannot guarantee any speedup.
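
As a rough illustration of the naive method described above, here is a minimal Python sketch. The function name naive_search is an assumption made for this example, and it uses Python's 0-based indexing, returning -1 when there is no match.

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the 0-based index of the first match, or -1 if there is none."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        # Compare character by character, cutting the comparison short
        # at the first mismatch.
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1
```

Stopping early helps on typical inputs, but a text such as "aaaa...a" searched for "aa...ab" still forces nearly m character comparisons at every position.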

Several string-matching algorithms, including the Knuth–Morris–Pratt algorithm and
the Boyer–Moore string-search algorithm, reduce the worst-case time for string
matching by extracting more information from each mismatch, allowing them to skip
over positions of the text that are guaranteed not to match the pattern. The Rabin–
Karp algorithm instead achieves its speedup by using a hash function to quickly
perform an approximate check for each position, and then only performing an exact
comparison at the positions that pass this approximate check.

A hash function is a function which converts every string into a numeric value,
called its hash value; for example, we might have hash("hello")=5. If two strings
are equal, their hash values are also equal. For a well-designed hash function, the
inverse is true, in an approximate sense: strings that are unequal are very
unlikely to have equal hash values. The Rabin–Karp algorithm proceeds by computing,
at each position of the text, the hash value of a string starting at that position
with the same length as the pattern. If this hash value equals the hash value of
the pattern, it performs a full comparison at that position.
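
The filtering step can be sketched in Python as follows. The hash used here simply sums character codes; it is a deliberately weak choice made for illustration, and the names toy_hash and candidate_positions are assumptions of this example rather than anything from the text.

```python
def toy_hash(s: str) -> int:
    # Illustrative hash only: the sum of the character codes.
    # Equal strings always get equal values, but collisions are common.
    return sum(ord(c) for c in s)


def candidate_positions(text: str, pattern: str) -> list[int]:
    """Return the 0-based positions whose window hash equals the pattern hash."""
    m = len(pattern)
    target = toy_hash(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if toy_hash(text[i:i + m]) == target
    ]
```

For the text "abba" and the pattern "ab", positions 0 and 2 pass the hash check, but only position 0 is a real match. Position 2 ("ba") has the same hash value and would be rejected by the full comparison.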

In order for this to work well, the hash function should be selected randomly from
a family of hash functions that are unlikely to produce many false positives, that
is, positions of the text which have the same hash value as the pattern but do not
actually match the pattern. These positions contribute to the running time of the
algorithm unnecessarily, without producing a match. Additionally, the hash function
used should be a rolling hash, a hash function whose value can be quickly updated
from each position of the text to the next. Recomputing the hash function from
scratch at each position would be too slow.

The algorithm

The algorithm is as shown:

function RabinKarp(string s[1..n], string pattern[1..m])
    hpattern := hash(pattern[1..m]);
    for i from 1 to n-m+1
        hs := hash(s[i..i+m-1])
        if hs = hpattern
            if s[i..i+m-1] = pattern[1..m]
                return i
    return not found
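
For readers who prefer runnable code, the pseudocode might be translated to Python roughly as below. This is a sketch under a few assumptions: it uses Python's built-in hash() as a stand-in for the unspecified hash function, 0-based indices instead of 1-based ones, and -1 for "not found". The name rabin_karp_naive_hash is hypothetical.

```python
def rabin_karp_naive_hash(s: str, pattern: str) -> int:
    """Return the 0-based index of the first match, or -1 if not found."""
    n, m = len(s), len(pattern)
    hpattern = hash(pattern)             # O(m), executed once
    for i in range(n - m + 1):
        hs = hash(s[i:i + m])            # O(m) on every iteration
        if hs == hpattern:               # constant-time comparison of numbers
            if s[i:i + m] == pattern:    # O(m), only when the hashes agree
                return i
    return -1
```

Like the pseudocode, this version recomputes the window hash from scratch on every iteration, so it does not yet gain anything over naive matching.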

Lines 2, 4, and 6 each require O(m) time. However, line 2 is only executed once,
and line 6 is only executed if the hash values match, which is unlikely to happen
more than a few times. Line 5 is executed O(n) times, but each comparison only
requires constant time, so its impact is O(n). The issue is line 4.

Naively computing the hash value for the substring s[i+1..i+m] requires O(m) time
because each character is examined. Since the hash computation is done on each
loop, the algorithm with a naive hash computation requires O(mn) time, the same
complexity as a straightforward string matching algorithm. For speed, the hash must
be computed in constant time. The trick is that the variable hs already contains the
previous hash value of s[i..i+m-1]. If that value can be used to compute the next
hash value in constant time, then computing successive hash values will be fast.
The trick can be exploited using a rolling hash. A rolling hash is a hash function
specially designed to enable this operation. A trivial (but not very good) rolling
hash function just adds the values of each character in the substring. This rolling
hash formula can compute the next hash value from the previous value in constant
time:

hash(s[i+1..i+m]) = hash(s[i..i+m-1]) - s[i] + s[i+m]

This simple function works, but will result in line 6 being executed more
often than with more sophisticated rolling hash functions such as those discussed
in the next section.
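
A minimal Python sketch of the algorithm with this additive rolling hash is given below. It assumes character values are taken with ord(), uses 0-based indices, and returns -1 when nothing is found. The function name rabin_karp_additive is made up for this example.

```python
def rabin_karp_additive(s: str, pattern: str) -> int:
    """Return the 0-based index of the first match, or -1 if not found."""
    n, m = len(s), len(pattern)
    if m > n:
        return -1
    hpattern = sum(ord(c) for c in pattern)
    hs = sum(ord(c) for c in s[:m])      # hash of the first window, O(m)
    for i in range(n - m + 1):
        # Full comparison only when the cheap hash check passes.
        if hs == hpattern and s[i:i + m] == pattern:
            return i
        if i + m < n:
            # Roll the window in constant time:
            # drop s[i] on the left, add s[i+m] on the right.
            hs += ord(s[i + m]) - ord(s[i])
    return -1
```

Because the sum ignores character order, any window that is a rearrangement of the pattern passes the hash check, which is one reason this hash triggers many unnecessary full comparisons.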

Good performance requires a good hashing function for the encountered data. If the
hashing is poor (such as producing the same hash value for every input), then line
6 would be executed O(n) times (i.e. on every iteration of the loop). Because
character-by-character comparison of strings with length m takes O(m) time, the
whole algorithm then takes a worst-case O(mn) time.
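
The effect of a poor hash can be made concrete with a small Python experiment. The sketch below counts how many positions pass the hash check, that is, how many times the full comparison would run on a text that contains no match. The helper names (count_full_comparisons, constant_hash, additive_hash) are assumptions made for this illustration.

```python
from typing import Callable


def count_full_comparisons(
    s: str, pattern: str, hash_fn: Callable[[str], int]
) -> int:
    """Count the positions where the window hash equals the pattern hash."""
    m = len(pattern)
    hpattern = hash_fn(pattern)
    count = 0
    for i in range(len(s) - m + 1):
        if hash_fn(s[i:i + m]) == hpattern:
            count += 1  # a full O(m) comparison would be needed here
    return count


def constant_hash(window: str) -> int:
    # Worst possible hash: every input gets the same value.
    return 0


def additive_hash(window: str) -> int:
    # Simple sum of character codes.
    return sum(ord(c) for c in window)
```

With a text of twenty "a" characters and the pattern "aab", the constant hash passes the check at all 18 positions, while the additive hash passes it at none of them.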
