The Knuth Morris Pratt Algorithm

Uploaded by

Shreyasis Roy
The Knuth-Morris-Pratt Algorithm
String matching algorithm
KMP Algorithm
• Due to Knuth, Morris, and Pratt.
• Time complexity: Θ(n+m).
• Uses an auxiliary function π[1…m], precomputed from the pattern in O(m) time.
• Avoids computing the transition function δ of the string-matching automaton.
• For any state q = 0, 1, …, m and any character a ∈ Σ, π[q] contains the information, independent of a, that is needed to compute δ(q, a).
• π has only m entries, whereas δ has O(m|Σ|) entries.
• So we save a factor of |Σ| in preprocessing.
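To make the size comparison concrete, here is a minimal sketch that builds δ directly from its definition and counts its entries. The function name `transition_table`, the example pattern `ababaca`, and the alphabet `abc` are illustrative assumptions, and the brute-force construction is chosen for clarity, not efficiency.

```python
def transition_table(pattern: str, alphabet: str) -> dict:
    """Build delta(q, a) by brute force (hypothetical helper, for illustration).

    delta(q, a) is the length of the longest prefix of the pattern that is a
    suffix of pattern[:q] + a.
    """
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of s (k = 0 always works).
            while not s.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


delta = transition_table("ababaca", "abc")
print(len(delta))  # (m + 1) * |alphabet| entries, versus m entries for pi
```

For this pattern the table has (7 + 1) · 3 = 24 entries, while π needs only 7.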
PREFIX Function for a Pattern
• Encapsulates knowledge about how the pattern matches against shifts of itself.
• This information helps to avoid testing useless shifts in the naïve algorithm, or precomputing δ for the string-matching automaton.
• Example: T = b a c b a b a b a b b c b a b, P = a b a b a c a.
• At shift s = 4, q = 5 characters of P (a b a b a) have matched T[5…9], so s+q = 9.
• Shift s+1 is an invalid shift, but s+2 may be possible.
• At the new shift s' = s+2 = 6, the first k = 3 characters of P (a b a) already match T[7…9].
Question ??
• Given a pattern P[1…m] and a text T[1…n]:
Given that P[1…q] matches the text characters T[s+1…s+q], what is the least shift
s' > s such that, for some k < q, P[1…k] = T[s'+1…s'+k],
where s'+k = s+q?
• Such a shift s' is the first shift of interest after s.
• Best case: k = 0, so s' = s+q and all shifts s+1, s+2, …, s+q−1 are rejected.
• In any case, at the new shift s' we don't need to compare the first k
characters of P with the corresponding characters of T.
• The answer can be precomputed by comparing the pattern against itself.
• Note: since T[s'+1…s'+k] is part of the known portion of the text,
it is a suffix of P_q (where P_k denotes the prefix P[1…k]). So ask the question in a different way:
Find the largest k < q such that P_k ⊐ P_q (P_k is a suffix of P_q);
then s' = s + (q−k) is the next potentially valid shift.
• So store the number k of matching characters at the new shift s', rather than the shift amount s' − s.
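The question above can be answered by brute force for a single (s, q) pair, which may help in checking the example by hand. This is a sketch only: the function name `next_shift` is a hypothetical helper, and it compares the pattern against itself directly instead of using a precomputed table.

```python
def next_shift(pattern: str, s: int, q: int) -> tuple:
    """Return (s_prime, k) for the largest k < q with P_k a suffix of P_q.

    Hypothetical helper: assumes q >= 1 characters have matched at shift s.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    matched = pattern[:q]  # P_q, the part of the pattern known to match
    for k in range(q - 1, -1, -1):
        # The first hit is the largest k; k = 0 (empty prefix) always succeeds.
        if matched.endswith(pattern[:k]):
            return s + (q - k), k


print(next_shift("ababaca", 4, 5))  # the example with s = 4, q = 5
```

With P = ababaca, s = 4 and q = 5 this yields s' = 6 and k = 3.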
• Prefix function:
π : {1, 2, …, m} → {0, 1, …, m−1}
such that
π[q] = max{k : k < q and P_k ⊐ P_q}
• So π[q] is the length of the longest prefix of P that is a proper suffix
of P_q.
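The definition can be evaluated literally, which is slow but useful as a reference for checking a faster implementation. The sketch below assumes a hypothetical name `prefix_function_naive` and keeps index 0 unused so that `pi[q]` lines up with the 1-based π[q] used here.

```python
def prefix_function_naive(pattern: str) -> list:
    """Evaluate pi[q] = max{k : k < q and P_k is a suffix of P_q} directly."""
    m = len(pattern)
    pi = [0] * (m + 1)  # pi[0] is unused padding for 1-based indexing
    for q in range(1, m + 1):
        p_q = pattern[:q]
        # k = 0 always qualifies, so max() never sees an empty sequence.
        pi[q] = max(k for k in range(q) if p_q.endswith(pattern[:k]))
    return pi


print(prefix_function_naive("ababaca")[1:])  # pi[1..7] for P = ababaca
```

For P = ababaca this gives π = 0, 0, 1, 2, 3, 0, 1.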
COMPUTE-PREFIX-FUNCTION(P)

m ← length[P]
π[1] ← 0
k ← 0
for q ← 2 to m
    do while k > 0 and P[k+1] ≠ P[q]
           do k ← π[k]
       if P[k+1] = P[q]
           then k ← k+1
       π[q] ← k
return π
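The pseudocode above can be transcribed into Python roughly as follows. This is a sketch under one stated assumption: indices are shifted to 0-based, so `pi[q - 1]` in the code holds the value written π[q] in the slides.

```python
def compute_prefix_function(pattern: str) -> list:
    """0-based transcription of COMPUTE-PREFIX-FUNCTION; pi[i] is the slides' pi[i + 1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the current longest prefix that is also a proper suffix
    for q in range(1, m):
        # Fall back to shorter candidate prefixes until one can be extended.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))
```

For P = ababaca the result is [0, 0, 1, 2, 3, 0, 1], which agrees with evaluating the definition of π directly.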
KMP-MATCHER(T, P)

n ← length[T]
m ← length[P]
π ← COMPUTE-PREFIX-FUNCTION(P)
q ← 0
for i ← 1 to n
    do while q > 0 and P[q+1] ≠ T[i]
           do q ← π[q]
       if P[q+1] = T[i]
           then q ← q+1
       if q = m
           then print "Pattern occurs with shift" i−m
                q ← π[q]
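A runnable counterpart of the matcher might look like the sketch below. It is self-contained, so it repeats a 0-based prefix-function helper; returning a list of shifts instead of printing, and returning no matches for an empty pattern, are assumptions made for this example.

```python
def compute_prefix_function(pattern: str) -> list:
    """0-based prefix function: pi[i] is the slides' pi[i + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list:
    """Return every shift s at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # next character does not match: fall back
        if pattern[q] == ch:
            q += 1
        if q == m:
            shifts.append(i - m + 1)  # 0-based i, so this equals the slides' i - m
            q = pi[q - 1]  # look for the next, possibly overlapping, match
    return shifts


print(kmp_matcher("abababab", "abab"))
```

On the text abababab with pattern abab this reports the overlapping shifts 0, 2 and 4.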
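Time complexity = O(m+n): Θ(m) to compute π plus Θ(n) for matching. In the matcher, q increases by at most 1 per text character and every iteration of the while loop decreases it, so the while loop runs at most n times in total.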
