Talgo STR CMP
1 String searching
2 Karp-Rabin fingerprint algorithm
3 Knuth-Morris-Pratt algorithm
[Logo: Indian Institute of Technology Kharagpur, 1951]
Section outline
1 String searching
String search
Brute force approach
String search
Given a pattern string p, find its first match (occurrence) in a text t
N: number of characters in the text
M: number of characters in the pattern
The pattern is short compared to the text (N ≫ M)
The pattern can be pre-processed
The text cannot be pre-processed
Example
Search Text, N = 21
n n e e n l e d e n e e n e e d l e n l d
Search Pattern, M = 6
n e e d l e
Successful search
n n e e n l e d e n e e [n e e d l e] n l d
(the pattern matches at positions 13 to 18 of the text, counting from 1)
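A minimal Python sketch of the brute force approach named in the outline. The slides' own brute-force procedure is not part of this extract, so this is an assumed rendering: try every shift of the pattern against the text and compare characters left to right. The name brute_force_search is illustrative.

```python
def brute_force_search(text: str, pattern: str) -> int:
    """Return the 0-based index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for s in range(n - m + 1):          # every possible shift of the pattern
        k = 0
        while k < m and text[s + k] == pattern[k]:
            k += 1                      # extend the current match by one character
        if k == m:                      # all M characters matched at shift s
            return s
    return -1


text = "nneenledeneeneedlenld"          # the example's search text, spaces removed
print(brute_force_search(text, "needle"))  # 12, i.e. position 13 counting from 1
```

Each of the N − M + 1 shifts may compare up to M characters, so the worst case is O(NM) comparisons.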
Section outline
2 Karp-Rabin fingerprint algorithm
Example
Search pattern
5 9 2 6 5        59265 % 97 = 95
Search text
3 1 4 1 5 9 2 6 5 3 5 8 9 7 3 3 4 6
Hash of each length-5 window of the text (mod 97):
3 1 4 1 5        31415 % 97 = 84
1 4 1 5 9        14159 % 97 = 94
4 1 5 9 2        41592 % 97 = 76
1 5 9 2 6        15926 % 97 = 18
...
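The window fingerprints in this example can be reproduced with a short Python sketch. It assumes, as in the slide, that text and pattern are decimal digit strings, and it recomputes each window's value mod 97 from scratch with Horner's rule, which costs O(M) per window. The helper name fingerprint is hypothetical.

```python
def fingerprint(digits: str, q: int = 97) -> int:
    """Value of a decimal digit string, reduced mod q (Horner's rule)."""
    value = 0
    for d in digits:
        value = (value * 10 + int(d)) % q  # reduce as we go to keep numbers small
    return value


text = "314159265358973346"   # the slide's search text, digits concatenated
pattern = "59265"
m = len(pattern)

target = fingerprint(pattern)
# Recompute every window independently: O(M) work per window.
window_hashes = [fingerprint(text[s:s + m]) for s in range(len(text) - m + 1)]
print(target, window_hashes[:5])   # 95 [84, 94, 76, 18, 95]
```

The fifth window (5 9 2 6 5) is the first whose fingerprint equals the pattern's.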
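Pre-compute: 10000 % 97 = 9   (10000 = 10^(M−1) for M = 5)
First hash: 31415 % 97 = 84
...
Previous hash: 41592 % 97 = 76
Efficient computation of the next hash, 15926 % 97 = 18:
15926 % 97 = ((41592 − 4×10000)×10 + 6) % 97
           = ((76 − 4×9)×10 + 6) % 97
           = 406 % 97
           = 18
Only the previous hash, the outgoing digit 4, the incoming digit 6 and the pre-computed 9 are needed, so each update takes O(1) time.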
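A sketch of a complete Karp-Rabin search built on the incremental update above. To match the example it assumes decimal-digit strings, radix 10 and modulus q = 97; the names next_hash and karp_rabin are illustrative. Equal fingerprints can be a collision, so a candidate window is confirmed by direct comparison before it is reported.

```python
def next_hash(prev: int, out_digit: int, in_digit: int, high: int,
              q: int = 97, radix: int = 10) -> int:
    """Slide the window one digit: drop out_digit, append in_digit (mod q)."""
    # Python's % always returns a value in [0, q), so a negative
    # intermediate such as (prev - out_digit * high) is handled correctly.
    return ((prev - out_digit * high) * radix + in_digit) % q


def karp_rabin(text: str, pattern: str, q: int = 97, radix: int = 10) -> int:
    """0-based index of the first occurrence of pattern in text, or -1.

    Assumes both strings consist of decimal digits, as in the example.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(radix, m - 1, q)          # radix^(M-1) % q, e.g. 10000 % 97 = 9
    p_hash = t_hash = 0
    for k in range(m):                   # hashes of the pattern and first window
        p_hash = (p_hash * radix + int(pattern[k])) % q
        t_hash = (t_hash * radix + int(text[k])) % q
    for s in range(n - m + 1):
        # A fingerprint hit may be spurious, so verify character by character.
        if t_hash == p_hash and text[s:s + m] == pattern:
            return s
        if s + m < n:                    # roll the window forward by one digit
            t_hash = next_hash(t_hash, int(text[s]), int(text[s + m]), high, q, radix)
    return -1


print(next_hash(76, 4, 6, 9))                     # 18: from 41592 to 15926
print(karp_rabin("314159265358973346", "59265"))  # 4 (0-based)
```

With a modulus as small as 97, collisions are common; practical versions typically use a large, often randomly chosen, prime so that spurious hits are rare and the expected running time is O(N + M).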
pattern-matching algorithms
Section outline
3 Knuth-Morris-Pratt algorithm
Optimised pattern matching with KMP
KMP failure function
KMP algorithm
KMP failure function computation
Overall complexity of KMP algorithm
Optimised failure function computation
[Figure: failure at state 2]
The point of resumption after a failure at a certain point is given by the failure function.
CM (IIT Kharagpur), Algorithms, April 16, 2024
KMP algorithm
Example
Search text
a a c a a a a b c a a b
Search pattern
i         1 2 3 4 5 6
P[i]      a a c a a b
fail[i]   0 1 2 1 2 3
[Figure: failure function for "aacaab" drawn as an automaton with states 1 to 7]
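The slides' KMP search procedure itself is not in this extract, so the following is an assumed Python sketch of how a scan can use fail[] values in this convention: state j means P[1..j−1] has matched and P[j] is compared next; on a mismatch the scan resumes at state fail[j], and state 0 means the current text character is given up on. kmp_search and its signature are hypothetical.

```python
def kmp_search(text: str, pattern: str, fail: list[int]) -> int:
    """0-based start of the first match of pattern in text, or -1.

    fail[k - 1] holds fail[k] for pattern position k (1-indexed),
    e.g. [0, 1, 2, 1, 2, 3] for "aacaab".
    """
    m = len(pattern)
    if m == 0:
        return 0
    f = [0] + list(fail)      # shift to 1-indexed so f[j] is fail[j]
    j = 1                     # state j: P[1..j-1] matched, compare P[j] next
    for i, c in enumerate(text):
        # On a mismatch, resume at fail[j]; j == 0 means "skip this character".
        while j > 0 and c != pattern[j - 1]:
            j = f[j]
        j += 1                # P[j] matched, or restart at state 1 when j was 0
        if j == m + 1:        # final state M+1 reached: full match ends at i
            return i - m + 1
    return -1


print(kmp_search("aacaaaabcaab", "aacaab", [0, 1, 2, 1, 2, 3]))  # -1
print(kmp_search("xxaacaabyy", "aacaab", [0, 1, 2, 1, 2, 3]))    # 2
```

j rises by at most one per text character and every step of the while loop lowers it, so for a given fail[] the scan does O(N) work. On the example search text the sketch finds no occurrence of "aacaab".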
[Figure: automaton for "cadcacad" with states 1 to 9]
KMPCompFail(P[1..M])
1  j ← 0
2  for i ← 1 to M                  // scan through all M positions
3      fail[i] ← j
       // next, prepare for fail[i+1]
4      while (j > 0 and P[i] ≠ P[j]) do
5          j ← fail[j]
6      done
7      j ← j + 1
8  endfor

Example (FF for "cadcacad")
i          1  2  3  4  5  6    7  8
P[i]       c  a  d  c  a  c    a  d
fail[i]    0  1  1  1  2  3    2  3
P[j₄]      –  c  c  c  a  d,c  a  d
j₅         –  0  0  –  –  1    –  –
j₇         1  1  1  2  3  2    3  4
Here jₖ is the value of j at line k; – means no comparison is made or the line is not executed.
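A direct Python transcription of KMPCompFail, offered as a sketch: the pattern is padded with a dummy character so indices match the slide's 1-based P[1..M], and the comments point back to the numbered pseudocode lines. The function name kmp_comp_fail is illustrative.

```python
def kmp_comp_fail(pattern: str) -> list[int]:
    """Failure function of pattern; result[k - 1] is fail[k]."""
    m = len(pattern)
    P = " " + pattern            # pad so that P[1..M] matches the 1-based slides
    fail = [0] * (m + 1)         # fail[0] is unused
    j = 0                                    # line 1
    for i in range(1, m + 1):                # line 2
        fail[i] = j                          # line 3
        while j > 0 and P[i] != P[j]:        # line 4 (here 1 <= j < i)
            j = fail[j]                      # line 5
        j += 1                               # line 7
    return fail[1:]              # drop the unused slot


print(kmp_comp_fail("cadcacad"))  # [0, 1, 1, 1, 2, 3, 2, 3]
print(kmp_comp_fail("aacaab"))    # [0, 1, 2, 1, 2, 3]
```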
Failure function
Knuth, Morris, Pratt, SIAM JoC, v6, n2, June 1977
[Figure: optimised failure function for "aacaab" drawn on states 1 to 7]
Consider the failure at P[5] = a, where fail[5] = 2.
But P[2] = a, so after failing to match a at P[5], failure is guaranteed at P[2] as well (and likewise at P[1] = a).
This definite failure can be remedied by going all the way back to fail³[5] = fail[fail[fail[5]]] = 0.
Function KMPOptFail does the required post-processing:

KMPOptFail(P[1..M], fail[1..M])
1  for i ← 2 to M                    // bottom-up DP
2      if P[i] = P[fail[i]]          // definite failure
3          fail[i] ← fail[fail[i]]   // fail all the way back via DP
4  endfor

Example (Opt FF for "aacaab")
i         1 2 3 4 5 6
P[i]      a a c a a b
fail[i]   0 0 2 0 0 3
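KMPOptFail can likewise be sketched in Python (kmp_opt_fail is an illustrative name). It works on a 1-indexed copy so the caller's list is left untouched. The f[i] > 0 guard is an addition not present in the pseudocode; it never fires on output of KMPCompFail, which gives fail[i] ≥ 1 for every i ≥ 2.

```python
def kmp_opt_fail(pattern: str, fail: list[int]) -> list[int]:
    """Optimised failure function; fail[k - 1] is the plain fail[k]."""
    m = len(pattern)
    P = " " + pattern            # 1-indexed view P[1..M]
    f = [0] + list(fail)         # 1-indexed copy of fail[1..M]
    for i in range(2, m + 1):    # bottom-up: f[k] for every k < i is already final
        # If P[i] equals the character we would resume at, that comparison
        # is a definite failure, so jump straight to where it would resume.
        if f[i] > 0 and P[i] == P[f[i]]:
            f[i] = f[f[i]]
    return f[1:]


print(kmp_opt_fail("aacaab", [0, 1, 2, 1, 2, 3]))  # [0, 0, 2, 0, 0, 3]
```

Because f[f[i]] has already been optimised when position i is processed, a single lookup replaces the whole chain fail[fail[...]] of definite failures.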