String Matching
Overview
Text T:    A B C A B A A C A B
Pattern P:       A B A A
The pattern occurs in the text with shift s = 3.
EXAMPLE
Suppose T = 1011101110 and P = 111. Find all valid shifts.
T = 1 0 1 1 1 0 1 1 1 0 (n = 10), P = 1 1 1 (m = 3)
s = 0: window 1 0 1, no match
s = 1: window 0 1 1, no match
s = 2: window 1 1 1, match
s = 3: window 1 1 0, no match
s = 4: window 1 0 1, no match
s = 5: window 0 1 1, no match
s = 6: window 1 1 1, match
s = 7: window 1 1 0, no match
Valid shifts: s = 2 and s = 6.
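The shift-by-shift comparison in this example can be sketched in Python. This is a minimal illustration, not code from the slides: the function name `valid_shifts` is hypothetical, and shifts are 0-based as in the S=0, S=1, ... labels.

```python
def valid_shifts(text, pattern):
    """Return every 0-based shift s at which pattern equals text[s:s+m]."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible alignments of the pattern.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(valid_shifts("1011101110", "111"))   # [2, 6]
print(valid_shifts("ABCABAACAB", "ABAA"))  # [3]
```

The second call reproduces the SHIFT=3 alignment from the overview.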
Algorithm Analysis
• The naive string matcher takes Θ((n-m+1)m) time in the worst case.
• For each of the n-m+1 possible shifts s, the comparison on line 4 may execute up to m times. Hence the worst-case running time is Θ((n-m+1)m), which is Θ(n^2) when m = ⌊n/2⌋.
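To make the bound concrete, the sketch below counts character comparisons explicitly. It assumes the matcher stops comparing at the first mismatch within a shift; the function name `count_comparisons` is hypothetical. A text and pattern made of a single repeated character never mismatch, so every shift costs the full m comparisons.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a naive matcher that stops
    at the first mismatch within each shift."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break  # mismatch: move on to the next shift
    return comparisons


n, m = 10, 3
worst = count_comparisons("a" * n, "a" * m)
print(worst, (n - m + 1) * m)  # 24 24
```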
THE RABIN-KARP ALGORITHM
• Rabin and Karp proposed a string matching algorithm that performs well in practice and that also generalizes to other algorithms for related problems, such as two-dimensional pattern matching.
ALGORITHM
• RABIN-KARP-MATCHER(T, P, d, q)
    n = T.length
    m = P.length
    h = d^(m-1) mod q
    p = 0
    t_0 = 0
    for i = 1 to m                      // preprocessing
        p = (d·p + P[i]) mod q
        t_0 = (d·t_0 + T[i]) mod q
    for s = 0 to n - m                  // matching
        if p == t_s
            if P[1..m] == T[s+1..s+m]
                print "Pattern occurs with shift" s
        if s < n - m
            t_(s+1) = (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q
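A minimal Python sketch of the pseudocode above, using 0-based indexing. The default values `d=256` and `q=101`, the use of `ord` to turn characters into digits, and the choice to return an empty list for an empty or overlong pattern are assumptions made for illustration; they are not taken from the slides.

```python
def rabin_karp(text, pattern, d=256, q=101):
    """Return all 0-based shifts where pattern occurs in text.

    d is the radix and q the modulus; both defaults are illustrative.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # assumed convention for degenerate inputs
    h = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading digit
    p = t = 0
    # Preprocessing: hash the pattern and the first window of the text.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes are only a candidate; verify to rule out spurious hits.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Rolling update: drop text[s], append text[s + m].
            # Python's % returns a non-negative result for positive q.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("31415926535", "26"))  # [6]
```

Because every hash hit is verified character by character, the result does not depend on the choice of `d` and `q`; those only affect how many spurious hits have to be checked.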
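Working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 31415926535 when looking for the pattern P = 26?
Pattern hash: 26 mod 11 = 4
T = 3 1 4 1 5 9 2 6 5 3 5
s = 0: 31 mod 11 = 9, not equal to 4
s = 1: 14 mod 11 = 3, not equal to 4
s = 2: 41 mod 11 = 8, not equal to 4
s = 3: 15 mod 11 = 4, spurious hit
s = 4: 59 mod 11 = 4, spurious hit
s = 5: 92 mod 11 = 4, spurious hit
s = 6: 26 mod 11 = 4, exact match
s = 7: 65 mod 11 = 10, not equal to 4
s = 8: 53 mod 11 = 9, not equal to 4
s = 9: 35 mod 11 = 2, not equal to 4
Result: 3 spurious hits (s = 3, 4, 5) and 1 exact match (s = 6).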
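The window hashes in this example can be checked with a short Python sketch. It assumes decimal digits and the modulus q = 11 used in the example, and for clarity it recomputes each window's value directly instead of using the rolling update. The function name `classify_windows` is hypothetical.

```python
def classify_windows(text, pattern, q=11):
    """For each 0-based shift, return (shift, window, hash, kind) where kind
    is 'match', 'spurious', or 'no hit'. Inputs are strings of decimal digits."""
    n, m = len(text), len(pattern)
    p = int(pattern) % q
    rows = []
    for s in range(n - m + 1):
        window = text[s:s + m]
        t = int(window) % q
        if t != p:
            kind = "no hit"
        elif window == pattern:
            kind = "match"
        else:
            kind = "spurious"  # same hash, different digits
        rows.append((s, window, t, kind))
    return rows


rows = classify_windows("31415926535", "26")
print([s for s, _, _, kind in rows if kind == "spurious"])  # [3, 4, 5]
print([s for s, _, _, kind in rows if kind == "match"])     # [6]
```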