Talgo STR CMP

The document discusses string searching algorithms, focusing on the Brute Force approach, Karp-Rabin fingerprint algorithm, and Knuth-Morris-Pratt algorithm. It outlines the mechanics of each algorithm, including their efficiency and worst-case scenarios. The document is part of an educational resource from IIT Kharagpur, dated April 16, 2024.

Uploaded by

manishbojja56
Contents

1 String searching
2 Karp-Rabin fingerprint algorithm
3 Knuth-Morris-Pratt algorithm


CM (IIT Kharagpur) Algorithms April 16, 2024 1 / 17


String searching

Section outline

1 String searching
    String search
    Brute force approach

String search
Given a pattern string p, find the first match in text t
N: # characters in text
M: # characters in pattern
Length of pattern is small compared to the length of the text (N ≫ M)
Pattern can be pre-processed
Text cannot be pre-processed

Example
Search Text, N = 21
n n e e n l e d e n e e n e e d l e n l d
Search Pattern, M = 6
n e e d l e
Successful search (match at text positions 13 to 18)
n n e e n l e d e n e e [n e e d l e] n l d

Brute force approach

Check for pattern starting at every text position


Running time depends on pattern and text
Worst case: MN comparisons, in practice almost linear
Slow if M and N are large, and have lots of repetition

Example (Worst case of brute force approach)

Search Pattern
a a a a a b
Search Text
a a a a a a a a a a a a a a a a a a b
a a a a a b
  a a a a a b
    a a a a a b
      a a a a a b
...
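The brute force check can be sketched in Python as below. This is an illustrative sketch, not code from the slides: the function name `brute_force_search`, the 0-based result, and the comparison counter are assumptions added here, and the worst-case text is taken to be 18 a's followed by b.

```python
def brute_force_search(text, pattern):
    """Return (start, comparisons): the 0-based start of the first match
    (or -1 if there is none) and the number of character comparisons made."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):          # try every text position
        k = 0
        while k < m:
            comparisons += 1
            if text[start + k] != pattern[k]:
                break                        # mismatch: move to next position
            k += 1
        if k == m:                           # all M characters matched
            return start, comparisons
    return -1, comparisons


# "needle" example: match starts at 0-based index 12 (text position 13).
print(brute_force_search("nneenledeneeneedlenld", "needle")[0])   # 12

# Worst case: every alignment compares all M = 6 characters,
# giving (N - M + 1) * M = 14 * 6 = 84 comparisons.
print(brute_force_search("a" * 18 + "b", "aaaaab"))               # (13, 84)
```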


Karp-Rabin fingerprint algorithm

Section outline

2 Karp-Rabin fingerprint algorithm
    String matching using hashing
    Efficient hash computation


String matching using hashing

Example
Search pattern
5 9 2 6 5                              59265 % 97 = 95
Search Text
3 1 4 1 5 9 2 6 5 3 5 8 9 7 3 3 4 6
3 1 4 1 5                              31415 % 97 = 84
  1 4 1 5 9                            14159 % 97 = 94
    4 1 5 9 2                          41592 % 97 = 76
      1 5 9 2 6                        15926 % 97 = 18
...

A match is not possible unless the hash of the text substring under consideration equals the hash of the search pattern
When the hashes are equal, an actual comparison is still needed to confirm the match
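A minimal Python sketch of the fingerprint idea follows, assuming digit strings and the modulus 97 from the example. The helper names `fingerprint` and `candidates_and_confirmed` are hypothetical, and each window hash is recomputed from scratch here, so this version is no faster than brute force.

```python
Q = 97  # small modulus from the example; a practical search would use a large prime


def fingerprint(digits, q=Q):
    """Hash a decimal-digit string as its integer value modulo q."""
    return int(digits) % q


def candidates_and_confirmed(text, pattern, q=Q):
    """Return (candidates, confirmed): window starts (0-based) whose hash
    equals the pattern hash, and those that also match character by character."""
    m = len(pattern)
    target = fingerprint(pattern, q)
    candidates, confirmed = [], []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        if fingerprint(window, q) == target:   # hash filter
            candidates.append(start)
            if window == pattern:              # confirm with a real comparison
                confirmed.append(start)
    return candidates, confirmed


text = "314159265358973346"
print(fingerprint("59265"))                                 # 95
print([fingerprint(text[i:i + 5]) for i in range(4)])       # [84, 94, 76, 18]
print(candidates_and_confirmed(text, "59265")[1])           # [4]
```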

Efficient hash computation


Example
Pre-compute: 10000 % 97 = 9
First hash: 31415 % 97 = 84
...
Previous hash: 41592 % 97 = 76
Efficient next hash computation of 15926 % 97 = 18:

15926 % 97 = ((41592 − 4×10000) × 10 + 6) % 97
           = ((76 − 4×9) × 10 + 6) % 97
           = 406 % 97
           = 18

Choose the modulus to be a large prime (q)
The hash of each length-M window of the text is expected to be uniformly distributed in [0, q − 1]
Expected running time is $O\left(N + M \cdot \underbrace{\tfrac{N-M}{q}}_{A}\right)$, where A is the expected number of windows whose hash equals the pattern hash H(M), each triggering a full comparison with the pattern
Worst case: Θ(MN). When? Possible if the computed hash matches every time and confirmational matching is needed

Published in 1987 as "Efficient randomized pattern-matching algorithms"
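The rolling update above can be folded into a complete search. The following is a sketch under stated assumptions rather than the published algorithm's exact form: it handles decimal-digit strings only, uses the toy modulus 97, and the name `rabin_karp_digits` is made up for this example.

```python
def rabin_karp_digits(text, pattern, q=97):
    """First 0-based match of a decimal-digit pattern in a digit text, or -1.
    The window hash is updated in O(1) per shift instead of being recomputed."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    high = pow(10, m - 1, q)          # weight of the leading digit, e.g. 10000 % 97 = 9
    p_hash = int(pattern) % q
    t_hash = int(text[:m]) % q        # first window, computed directly
    for start in range(n - m + 1):
        # Equal hashes only flag a candidate; compare characters to confirm.
        if t_hash == p_hash and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Roll: drop the leading digit, shift left, append the next digit.
            t_hash = ((t_hash - int(text[start]) * high) * 10
                      + int(text[start + m])) % q
    return -1


print(((76 - 4 * 9) * 10 + 6) % 97)                        # 18, the step worked above
print(rabin_karp_digits("314159265358973346", "59265"))   # 4
```

Python's `%` returns a non-negative result for a positive modulus, so the subtraction inside the update needs no extra correction; in languages where `%` can return negative values, adding `q` before reducing would be required.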


Knuth-Morris-Pratt algorithm

Section outline

3 Knuth-Morris-Pratt algorithm
    Optimised pattern matching with KMP
    KMP algorithm
    KMP failure function computation
    KMP failure function algorithm
    Overall complexity of KMP
    Optimised failure function computation



Optimised pattern matching with KMP


Example
Search text
a a c a a a a b c a a b
Search pattern
ı         1 2 3 4 5 6
P[ı]      a a c a a b
fail[ı]   0 1 2 1 2 3    (failure function)
Matching automaton: states 1 2 3 4 5 6 7, with forward transitions labelled a a c a a b; the failure transition out of state 3 is taken on ¬c

Suppose aacaa is received; the current state will be 6 and b will be expected
If b is not received, the pattern will have to be moved forward
Instead of moving forward by one position (brute force approach), it is better to align the prefix aa with the suffix aa at the point of failure; this amounts to resuming comparison at state 3
We want the longest prefix (aa) that is a suffix at the point of failure (state 6)
Similarly, if another a is not received after aaca, the failure is at state 5 and comparison may be resumed from state 2
Failure transitions are meant to step back in the pattern string; a transition that stays at the same state leads to an ∞-loop, so on failure at state 3 (on ¬c), matching is resumed at state 2
The point of resumption for failure at a certain point is the failure function
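To make the "longest proper prefix that is a suffix" rule concrete, the sketch below computes the failure function directly from that definition. It is deliberately naive (cubic time) and is not the method developed later in these slides; the name `fail_by_definition` is an assumption of this example.

```python
def fail_by_definition(pattern):
    """Return [fail[1], ..., fail[M]] using the definition directly:
    on a mismatch at state j, resume at 1 + (length of the longest proper
    prefix of P[1..j-1] that is also a suffix of P[1..j-1])."""
    m = len(pattern)
    if m == 0:
        return []
    fail = [0]                                   # fail[1] = 0: start over
    for j in range(2, m + 1):
        seen = pattern[:j - 1]                   # characters matched before state j
        border = 0
        # Try proper prefixes from longest to shortest.
        for length in range(len(seen) - 1, 0, -1):
            if seen[:length] == seen[-length:]:
                border = length
                break
        fail.append(border + 1)
    return fail


print(fail_by_definition("aacaab"))   # [0, 1, 2, 1, 2, 3]
```

Requiring the prefix to be proper (strictly shorter than what has been matched) is what rules out the self-loop at state 3: the whole of aa is a prefix and a suffix of itself, but only the proper prefix a is admissible.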

KMP algorithm
Example
Search text
a a c a a a a b c a a b
Search pattern
ı         1 2 3 4 5 6
P[ı]      a a c a a b
fail[ı]   0 1 2 1 2 3    (failure function)
Automaton states: 1 2 3 4 5 6 7

Use knowledge of search pattern
Build automaton from pattern
Run automaton on text
On failure, go back to the longest proper prefix that is a suffix at the point of the last match; the prefix must be proper to avoid looping (at state 3, for example)

j←1                             // start of Pattern
for i←1 to N                    // scan through Text
    while j > 0 and T[i]≠P[j]
        j←fail[j]               // fail while no match
    if j=M return i−M+1         // terminate (success)
    j←j+1                       // move forward in pattern
return NoMatch                  // terminate (failure)
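The matching loop translates almost line for line into Python. The sketch below keeps the 1-based indexing of the slides by padding the strings and the failure table with a dummy entry at index 0; the function name `kmp_search` and the use of `None` for NoMatch are choices made for this example only.

```python
def kmp_search(text, pattern, fail):
    """Return the 1-based position of the first match, or None.
    fail[j] (for j = 1..M; fail[0] is an unused dummy) is the state to resume
    from after a mismatch at pattern position j, with fail[1] = 0."""
    n, m = len(text), len(pattern)
    if m == 0:
        return None
    T = " " + text          # pad so that T[1] is the first text character
    P = " " + pattern
    j = 1                   # start of pattern
    for i in range(1, n + 1):
        while j > 0 and T[i] != P[j]:
            j = fail[j]     # fall back while there is no match
        if j == m:
            return i - m + 1
        j += 1              # move forward in the pattern
    return None


fail_aacaab = [None, 0, 1, 2, 1, 2, 3]     # table from the example
print(kmp_search("aaaacaab", "aacaab", fail_aacaab))        # 3
print(kmp_search("aacaaaabcaab", "aacaab", fail_aacaab))    # None
```

The second call uses the search text from the example, which does not actually contain aacaab, so the scan runs to the end and reports no match.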



KMP failure function computation

Base case: fail[1] = 0, need to start over again ✓
Need to compute fail[ı], assuming fail[ȷ], ȷ < ı are available
Here fail^{k}[ı − 1] denotes fail applied k times to ı − 1
Inductive cases, starting with k = 1 (i.e. fail[ı − 1] and α′ = ϵ)
    fail^{k}[ı − 1] indicates the longest proper prefix (say α) that is a suffix at P[ı − 1]; α = ϵ if fail^{k}[ı − 1] = 0
    Exit case where fail^{k}[ı − 1] = 0: no prefix, so resume matching at the beginning, fail[ı] = 1 ✓
    Case P[ı − 1] = P[fail^{k}[ı − 1]]:
        α  a  ?  ...  α′  α  a  ...
        (the first a is at position fail^{k}[ı − 1]; the second a is at position ı − 1, followed by position ı)
        Thus, α · P[ı − 1] is the longest proper prefix that is also a suffix at P[ı], so fail[ı] = fail^{k}[ı − 1] + 1 ✓

KMP failure function computation (contd.)


Inductive cases, starting with k = 1 (when α′ = ϵ, contd.)
    fail^{k}[ı − 1] indicates the longest proper prefix (say α) that is a suffix at P[ı − 1]; α = ϵ if fail^{k}[ı − 1] = 0
    Case P[ı − 1] ≠ P[fail^{k}[ı − 1]]:
        α  b  ...  α′  α  a  ...
        (b is at position fail^{k}[ı − 1]; a is at position ı − 1, followed by position ı)
        Now, α · P[ı − 1] is not an admissible suffix at P[ı], as P[ı − 1] ≠ P[fail^{k}[ı − 1]]
        But fail[fail^{k}[ı − 1]] = fail^{k+1}[ı − 1] indicates the longest proper prefix (say β) that is a suffix at P[fail^{k}[ı − 1]]
        β  a?  ?  ...  β  b  ...  α′  α  a  ...
        (a? is at position fail^{k+1}[ı − 1]; b is at position fail^{k}[ı − 1]; a is at position ı − 1, followed by position ı)
        Now, β is the longest proper prefix of P that is also a suffix of α
        Thus, if P[fail^{k+1}[ı − 1]] = P[ı − 1], β · P[ı − 1] is the longest proper prefix that is a suffix at P[ı], so fail[ı] = fail^{k+1}[ı − 1] + 1 ✓
        Otherwise, continue induction with k ← k + 1

KMP failure function computation example



Example (Some steps of failure function computation for “aacaab”)

a a c a a b
1 2 3 4 5 6 7

Consider failure at P[ı] (say, P[6]=b)
We would like to identify the longest proper prefix that is a suffix at P[ı]
The longest proper prefix that is a suffix at P[ı − 1] is denoted by fail[ı − 1]
So, if P[ı − 1]=P[fail[ı − 1]], then fail[ı]=fail[ı − 1]+1
    P[5]=a; P[fail[5]]=P[2]=a; so fail[6]=fail[5]+1=2+1=3
If P[ı − 1]≠P[fail[ı − 1]], then continue checking from P[fail[fail[ı − 1]]]=P[fail^{2}[ı − 1]], and so on, but stopping at P[1]
    While computing fail[4], we find P[3]=c and P[fail[3]]=P[2]=a; P[3]≠P[2], so go further back to fail^{2}[3]=1 and stop there (at P[1]=a); since P[3]≠P[1] as well and fail[1]=0, fail[4]=1
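The inductive step can be traced in code by recording, for each ı, the chain fail[ı − 1], fail^2[ı − 1], ... that is inspected before the loop exits. This is an illustrative sketch only: the function name `fail_inductive` and the `chains` dictionary are introduced here and do not appear in the slides.

```python
def fail_inductive(pattern):
    """Return (fail, chains): fail is [fail[1], ..., fail[M]], and chains[i]
    lists the values fail^1[i-1], fail^2[i-1], ... examined while computing fail[i]."""
    m = len(pattern)
    P = " " + pattern                 # pad so that P[1] is the first character
    fail = [None] * (m + 1)
    chains = {}
    if m >= 1:
        fail[1] = 0                   # base case
    for i in range(2, m + 1):
        f = fail[i - 1]               # k = 1
        chain = [f]
        # Keep following fail until the exit case (0) or a matching character.
        while f != 0 and P[i - 1] != P[f]:
            f = fail[f]               # k <- k + 1
            chain.append(f)
        fail[i] = f + 1
        chains[i] = chain
    return fail[1:], chains


fail, chains = fail_inductive("aacaab")
print(fail)          # [0, 1, 2, 1, 2, 3]
print(chains[6])     # [2]        P[5] = P[2], so fail[6] = 2 + 1
print(chains[4])     # [2, 1, 0]  no match along the chain, so fail[4] = 0 + 1
```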



KMP failure function computation algorithm

a a c a a b
1 2 3 4 5 6 7

KMPCompFail(P[1..M])
1 j←0
2 for i←1 to M                       // scan through the M pattern positions
3     fail[i]←j                      // next prepare for fail[i+1]
4     while (j>0 and P[i]≠P[j]) do
5         j←fail[j]
6     done
7     j←j+1
8 endfor

Example (FF for “aacaab”)
Search pattern
ı               1  2  3     4  5  6
P[ı]            a  a  c     a  a  b
fail[ı]         0  1  2     1  2  3
P[ȷ] at line 4  –  a  a,a   a  a  c,a,a
ȷ after line 5  –  –  1,0   –  –  2,1,0
ȷ after line 7  1  2  1     2  3  1

KMP failure function algorithm (contd.)

c a d c a c a d
1 2 3 4 5 6 7 8 9

KMPCompFail(P[1..M]) is the same procedure as listed above

Example (FF for “cadcacad”)
Search pattern
ı               1  2  3  4  5  6     7  8
P[ı]            c  a  d  c  a  c     a  d
fail[ı]         0  1  1  1  2  3     2  3
P[ȷ] at line 4  –  c  c  c  a  d,c   a  d
ȷ after line 5  –  0  0  –  –  1     –  –
ȷ after line 7  1  1  1  2  3  2     3  4
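A direct Python transcription of KMPCompFail is sketched below. The only liberties taken are the function name `kmp_comp_fail` and the dummy entry at index 0, which keeps the 1-based indices of the pseudocode.

```python
def kmp_comp_fail(pattern):
    """Return the failure table as a list with a dummy at index 0,
    so that fail[i] for i = 1..M matches the pseudocode's indexing."""
    m = len(pattern)
    P = " " + pattern                 # pad so that P[1] is the first character
    fail = [None] * (m + 1)
    j = 0
    for i in range(1, m + 1):         # scan through the pattern
        fail[i] = j
        # Prepare j for fail[i + 1]: fall back until P[i] can extend a prefix.
        while j > 0 and P[i] != P[j]:
            j = fail[j]
        j += 1
    return fail


print(kmp_comp_fail("aacaab")[1:])      # [0, 1, 2, 1, 2, 3]
print(kmp_comp_fail("cadcacad")[1:])    # [0, 1, 1, 1, 2, 3, 2, 3]
```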



Complexity of computing fail[ı] and running KMP


KMPCompFail(P[1..M])
1 j←0
2 for i←1 to M                       // scan through the M pattern positions
3     fail[i]←j
4     while (j>0 and P[i]≠P[j])
5         j←fail[j]
6     done
7     j←j+1
8 endfor

Note that fail[j]<j
In L5, j is decreased by at least 1
Overall, j can go back in L5 only as much as it has progressed in L7
L7 is executed M times
Complexity of KMPCompFail is O(M) (from at most 2M steps)
Using similar reasoning, the complexity of the KMP algorithm is O(N) (from 2N)
Overall complexity: $\underbrace{O(M)}_{\text{make Fail}} + \underbrace{O(N)}_{\text{do KMP}}$

Example (FF for “aacaab”)
Search pattern
ı         1 2 3 4 5 6
P[ı]      a a c a a b
fail[ı]   0 1 2 1 2 3    (failure function)

Publication: Fast pattern matching in strings, D E Knuth, J H Morris, V R Pratt, SIAM JoC, v6, n2, June 1977
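The counting argument can be checked empirically by instrumenting the loop. In this sketch (the counter names and the function `comp_fail_with_counts` are assumptions for illustration), line 7 runs exactly M times and line 5 never runs more often than line 7, which is where the 2M bound comes from.

```python
def comp_fail_with_counts(pattern):
    """Run KMPCompFail and return (fail[1..M], back_steps, forward_steps),
    where back_steps counts executions of line 5 and forward_steps of line 7."""
    m = len(pattern)
    P = " " + pattern
    fail = [None] * (m + 1)
    j = 0
    back_steps = forward_steps = 0
    for i in range(1, m + 1):
        fail[i] = j
        while j > 0 and P[i] != P[j]:
            j = fail[j]               # line 5: j strictly decreases
            back_steps += 1
        j += 1                        # line 7: j increases by exactly 1
        forward_steps += 1
    return fail[1:], back_steps, forward_steps


print(comp_fail_with_counts("aacaab"))   # ([0, 1, 2, 1, 2, 3], 5, 6)
```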



Optimised failure function computation

a a c a a b
1 2 3 4 5 6 7

Consider the failure at P[5]=a; fail[5]=2
But P[2]=a, so after failing to match a at P[5], failure is guaranteed at P[2]
This definite failure could be remedied by going all the way back to fail^{3}[5]=0
Function KMPOptFail does the required post-processing, employing dynamic programming

KMPOptFail(P[1..M], fail[1..M])
for i←2 to M                      // bottom-up DP
    if P[i]=P[fail[i]]            // definite failure
        fail[i]←fail[fail[i]]     // fail all the way back via DP
endfor

Example (Opt FF for “aacaab”)
Search pattern
ı         1 2 3 4 5 6
P[ı]      a a c a a b
fail[ı]   0 0 2 0 0 3    (optimised failure fn)
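The post-processing pass can be sketched in Python as follows. The function name `kmp_opt_fail` is an assumption, the table again carries a dummy entry at index 0, and the input is copied rather than updated in place (a choice made here so the original table stays available for comparison).

```python
def kmp_opt_fail(pattern, fail):
    """Return an optimised copy of a 1-indexed failure table (dummy at index 0).
    If P[i] equals P[fail[i]], a mismatch at i is certain to repeat at fail[i],
    so the entry is redirected to the already-optimised fail[fail[i]]."""
    P = " " + pattern
    opt = list(fail)
    for i in range(2, len(pattern) + 1):      # bottom-up: earlier entries are final
        if P[i] == P[opt[i]]:                 # definite failure at fail[i]
            opt[i] = opt[opt[i]]
    return opt


plain = [None, 0, 1, 2, 1, 2, 3]              # failure function for "aacaab"
print(kmp_opt_fail("aacaab", plain)[1:])      # [0, 0, 2, 0, 0, 3]
```

Because entries are processed in increasing order, each redirection lands on a value that has already been optimised, so a single pass suffices.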
