String Matching Algorithms
● Introduction
● String Matching
● Basic Classification
● Naive Algorithm
● Rabin-Karp Algorithm
○ String Hashing
○ Hash value for substrings
● Knuth-Morris-Pratt Algorithm
○ Prefix Function
○ KMP Matcher
● Summary
INTRODUCTION
String S = a b c a b a a b c a b a c
Pattern P = a b a a
ILLUSTRATION
abaa
Since a mismatch is detected, shift P one position to the right and
perform steps analogous to those from step 1 to step 3. Whenever a
mismatch is detected at the current position, shift P one position to
the right and repeat the matching procedure.
ILLUSTRATION
Finally, a match is found after shifting P three times to the right,
i.e. P occurs in S starting at index 3 (0-based).
abcabaabcabac
   abaa
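The shift-and-compare procedure illustrated above can be sketched in C++ roughly as follows. The function name naive_search and its 0-based return convention are assumptions made for this illustration, not part of the original slides.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every 0-based index at which pattern p occurs in s.
// Worst-case running time is O(|s| * |p|).
std::vector<int> naive_search(const std::string& s, const std::string& p) {
    std::vector<int> occurrences;
    if (p.empty() || p.size() > s.size()) return occurrences;
    // Try every alignment ("shift") of p against s, moving one position right each time.
    for (std::size_t shift = 0; shift + p.size() <= s.size(); ++shift) {
        std::size_t k = 0;
        // Compare character by character until a mismatch or the end of p.
        while (k < p.size() && s[shift + k] == p[k]) ++k;
        if (k == p.size()) occurrences.push_back(static_cast<int>(shift));
    }
    return occurrences;
}
```

For S = abcabaabcabac and P = abaa, the first three alignments fail and the fourth succeeds, so the call naive_search("abcabaabcabac", "abaa") reports the single occurrence at index 3.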
RABIN-KARP ALGORITHM
Algorithm:
1. Calculate the hash of the pattern P.
2. Calculate the hash values of all prefixes of the text T.
3. Compare each substring of T of length |P| with P in constant
time using the precomputed hashes.
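Problem: Given a string S and indices i and j, find the hash value
of S[i..j].
Solution:
By definition (polynomial hash with base p and modulus m) we have,
$$\mathrm{hash}(S[i \dots j]) = \sum_{k=i}^{j} S[k] \cdot p^{k-i} \bmod m$$
Multiplying by p^i gives,
$$\mathrm{hash}(S[i \dots j]) \cdot p^{i} = \sum_{k=i}^{j} S[k] \cdot p^{k} \bmod m = \big(\mathrm{hash}(S[0 \dots j]) - \mathrm{hash}(S[0 \dots i-1])\big) \bmod m$$
So the hash of any substring, scaled by p^i, is the difference of two prefix hashes.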
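vector<int> rabin_karp(const string& s, const string& t) {  // s = pattern, t = text; the lines down to h_s are reconstructed (assumed)
    const int p = 31, m = 1e9 + 9;                           // assumed base and modulus; lowercase letters only
    int S = s.size(), T = t.size();
    vector<long long> p_pow(max(S, T) + 1, 1), h(T + 1, 0);
    for (size_t k = 1; k < p_pow.size(); k++) p_pow[k] = p_pow[k-1] * p % m;         // p_pow[k] = p^k mod m
    for (int i = 0; i < T; i++) h[i+1] = (h[i] + (t[i] - 'a' + 1) * p_pow[i]) % m;   // h[k] = hash of t[0..k-1]
    long long h_s = 0;
    for (int i = 0; i < S; i++) h_s = (h_s + (s[i] - 'a' + 1) * p_pow[i]) % m;       // hash of the pattern
    vector<int> occurrences;
    for (int i = 0; i + S - 1 < T; i++) {
        long long cur_h = (h[i+S] + m - h[i]) % m;   // hash(t[i..i+S-1]) * p^i mod m
        if (cur_h == h_s * p_pow[i] % m)             // compare after scaling the pattern hash by p^i
            occurrences.push_back(i);
    }
    return occurrences;
}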
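The same prefix-hash idea used in the matcher above also lets two substrings of one string be compared in constant time. The following is a minimal sketch under stated assumptions: the struct name PrefixHash, the constants kBase = 31 and kMod = 10^9 + 9, and the restriction to lowercase letters are all illustrative choices, not taken from the slides. Equal hashes only indicate equality with high probability, since collisions are possible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const long long kBase = 31;          // assumed hash base
const long long kMod = 1000000009;   // assumed modulus

// Hypothetical helper: prefix hashes of a lowercase string.
struct PrefixHash {
    std::vector<long long> h;      // h[k] = hash of the first k characters
    std::vector<long long> p_pow;  // p_pow[k] = kBase^k mod kMod

    explicit PrefixHash(const std::string& s)
        : h(s.size() + 1, 0), p_pow(s.size() + 1, 1) {
        for (std::size_t k = 0; k < s.size(); ++k) {
            p_pow[k + 1] = p_pow[k] * kBase % kMod;
            h[k + 1] = (h[k] + (s[k] - 'a' + 1) * p_pow[k]) % kMod;
        }
    }

    // Returns hash(s[i..j]) * p^i mod kMod, computed as h[j+1] - h[i].
    long long shifted(std::size_t i, std::size_t j) const {
        return (h[j + 1] + kMod - h[i]) % kMod;
    }

    // Compares s[i1..i1+len-1] with s[i2..i2+len-1] by hash.
    // Each side is multiplied by the other's power so both carry p^(i1+i2).
    bool equal(std::size_t i1, std::size_t i2, std::size_t len) const {
        if (len == 0) return true;
        long long a = shifted(i1, i1 + len - 1) * p_pow[i2] % kMod;
        long long b = shifted(i2, i2 + len - 1) * p_pow[i1] % kMod;
        return a == b;
    }
};
```

For the string abcabaabcabac, PrefixHash(...).equal(0, 6, 5) compares the two copies of abcab and holds, while equal(0, 3, 3) compares abc with aba and does not. Cross-multiplying by the powers avoids modular division.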
KNUTH-MORRIS-PRATT ALGORITHM
Knuth, Morris and Pratt proposed a linear time algorithm for the
string matching problem.
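It is built on the prefix function π of a string S: π[i] is the length
of the longest proper prefix of S[0..i] that is also a suffix of S[0..i].
Mathematically,
$$\pi[i] = \max_{k = 0 \dots i} \{\, k : S[0 \dots k-1] = S[i-(k-1) \dots i] \,\}$$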
EXAMPLE
S = “aabaaab”
PREFIX     π[i]
a          0
aa         1
aab        0
aaba       1
aabaa      2
aabaaa     2
aabaaab    3
ALGORITHM TO COMPUTE PREFIX FUNCTION
● If j reaches 0 and there is still no match, assign π[i] = 0
and go to the next index i+1.
PSEUDOCODE
Runtime - O(n)
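As a hedged illustration of the computation described above, the prefix function could be written in C++ along these lines. The function name prefix_function is an assumption for this sketch. The variable j holds the length of the current candidate border and falls back through π[j-1] on a mismatch.

```cpp
#include <string>
#include <vector>

// Sketch of the prefix function: pi[i] is the length of the longest proper
// prefix of s[0..i] that is also a suffix of s[0..i]. Runs in O(n).
std::vector<int> prefix_function(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = pi[i - 1];                 // best border length for s[0..i-1]
        while (j > 0 && s[i] != s[j])      // mismatch: fall back to a shorter border
            j = pi[j - 1];
        if (s[i] == s[j]) ++j;             // the border can be extended by s[i]
        pi[i] = j;                         // stays 0 if j reached 0 without a match
    }
    return pi;
}
```

For S = "aabaaab" this yields 0, 1, 0, 1, 2, 2, 3, which agrees with the example table.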
KMP MATCHER
S = “aba”
T = “aababac”
Generated string(G) = “aba#aababac”
Index (i)   PREFIX     π[i]
4           a          1
5           aa         1
6           aab        2
7           aaba       3
8           aabab      2
9           aababa     3
10          aababac    0
Runtime: O(n+m)
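The matcher can be sketched in C++ as below: build G = S + '#' + T, compute its prefix function, and report every index where π[i] equals |S|. The function name kmp_search and the sep parameter are assumptions for illustration. The sketch also assumes the separator character occurs in neither S nor T.

```cpp
#include <string>
#include <vector>

// Sketch of a KMP matcher: returns the 0-based start positions of pattern s in text t.
// Assumes the separator sep appears in neither s nor t.
std::vector<int> kmp_search(const std::string& s, const std::string& t, char sep = '#') {
    std::vector<int> occurrences;
    if (s.empty()) return occurrences;
    const std::string g = s + sep + t;           // generated string G
    const int m = static_cast<int>(s.size());
    const int n = static_cast<int>(g.size());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = pi[i - 1];
        while (j > 0 && g[i] != g[j]) j = pi[j - 1];
        if (g[i] == g[j]) ++j;
        pi[i] = j;
        // pi[i] == |S| means an occurrence of S ends at g[i];
        // its start inside T is i - 2*|S|.
        if (j == m) occurrences.push_back(i - 2 * m);
    }
    return occurrences;
}
```

With S = "aba" and T = "aababac", π reaches 3 at indices 7 and 9 of G, so the reported positions in T are 1 and 3, consistent with the table above.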
SUMMARY