32.4 The Knuth-Morris-Pratt Algorithm: Either

Uploaded by

tripathiaryashi


★ 32.3-5
Given two patterns $P$ and $P'$, describe how to construct a finite automaton that
determines all occurrences of either pattern. Try to minimize the number of states
in your automaton.

32.3-6
Given a pattern $P$ containing gap characters (see Exercise 32.1-4), show how to
build a finite automaton that can find an occurrence of $P$ in a text $T$ in $O(n)$
matching time, where $n = |T|$.

★ 32.4 The Knuth-Morris-Pratt algorithm

Knuth, Morris, and Pratt developed a linear-time string matching algorithm that
avoids computing the transition function $\delta$ altogether. Instead, the KMP
algorithm uses an auxiliary function $\pi$, which it precomputes from the pattern
in $\Theta(m)$ time and stores in an array $\pi[1:m]$. The array $\pi$ allows the
algorithm to compute the transition function $\delta$ efficiently (in an amortized
sense) "on the fly" as needed. Loosely speaking, for any state $q = 0, 1, \ldots, m$
and any character $a \in \Sigma$, the value $\pi[q]$ contains the information needed
to compute $\delta(q, a)$ but that does not depend on $a$. Since the array $\pi$ has
only $m$ entries, whereas $\delta$ has $\Theta(m |\Sigma|)$ entries, the KMP
algorithm saves a factor of $|\Sigma|$ in the preprocessing time by computing $\pi$
rather than $\delta$. Like the procedure FINITE-AUTOMATON-MATCHER, once
preprocessing has completed, the KMP algorithm uses $\Theta(n)$ matching time.

The prefix function for a pattern

The prefix function $\pi$ for a pattern encapsulates knowledge about how the pattern
matches against shifts of itself. The KMP algorithm takes advantage of this
information to avoid testing useless shifts in the naive pattern-matching algorithm
and to avoid precomputing the full transition function $\delta$ for a
string-matching automaton.

Consider the operation of the naive string matcher. Figure 32.9(a) shows a
particular shift $s$ of a template containing the pattern $P$ = ababaca against a
text $T$. For this example, $q = 5$ of the characters have matched successfully, but
the 6th pattern character fails to match the corresponding text character. The
information that $q$ characters have matched successfully determines the
corresponding text characters. Because these $q$ text characters match, certain
shifts must be invalid. In the example of the figure, the shift $s + 1$ is
necessarily invalid, since the first pattern character (a) would be aligned with a
text character that does not match the first pattern character, but does match the
second pattern character (b). The shift $s' = s + 2$ shown in part (b) of the
figure, however, aligns the first three pattern characters with three text
characters that necessarily match.
More generally, suppose that you know that $P[:q] \sqsupset T[:s+q]$ or,
equivalently, that $P[1:q] = T[s+1:s+q]$. You want to shift $P$ so that some shorter
prefix $P[:k]$ of $P$ matches a suffix of $T[:s+q]$, if possible. You might have more
than one choice for how much to shift, however. In Figure 32.9(b), shifting $P$ by 2
positions works, so that $P[:3] \sqsupset T[:s+q]$, but so does shifting $P$ by 4
positions, so that $P[:1] \sqsupset T[:s+q]$ in Figure 32.9(c). If more than one
shift amount works, you should choose the smallest shift amount so that you do not
miss any potential matches. Put more precisely, you want to answer this question:

    Given that pattern characters $P[1:q]$ match text characters $T[s+1:s+q]$
    (that is, $P[:q] \sqsupset T[:s+q]$), what is the least shift $s' > s$ such
    that for some $k < q$,

    $$P[1:k] = T[s'+1:s'+k] \tag{32.6}$$

    (that is, $P[:k] \sqsupset T[:s'+k]$), where $s' + k = s + q$?

Here's another way to look at this question. If you know $P[:q] \sqsupset T[:s+q]$,
then how do you find the longest proper prefix $P[:k]$ of $P[:q]$ that is also a
suffix of $T[:s+q]$? These questions are equivalent because given $s$ and $q$,
requiring $s' + k = s + q$ means that finding the smallest shift $s'$ (2 in
Figure 32.9(b)) is tantamount to finding the longest prefix length $k$ (3 in
Figure 32.9(b)). If you add the difference $q - k$ in the lengths of these prefixes
of $P$ to the shift $s$, you get the new shift $s'$, so that $s' = s + (q - k)$. In
the best case, $k = 0$, so that $s' = s + q$, immediately ruling out shifts
$s + 1, s + 2, \ldots, s + q - 1$. In any case, at the new shift $s'$, it is
redundant to compare the first $k$ characters of $P$ with the corresponding
characters of $T$, since equation (32.6) guarantees that they match.
As Figure 32.9(d) demonstrates, you can precompute the necessary information
by comparing the pattern against itself. Since $T[s'+1:s'+k]$ is part of the
matched portion of the text, it is a suffix of the string $P[:q]$. Therefore, think
of equation (32.6) as asking for the greatest $k < q$ such that
$P[:k] \sqsupset P[:q]$. Then, the new shift $s' = s + (q - k)$ is the next
potentially valid shift. It will be convenient to store, for each value of $q$, the
number $k$ of matching characters at the new shift $s'$, rather than storing, say,
the amount $s' - s$ to shift by.
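The argument above says that the text can be taken out of the picture: the longest usable prefix length k is the same whether it is found against the text seen so far or against the matched prefix of the pattern. The following Python sketch checks this by brute force on the example of Figure 32.9. The function names are illustrative, strings are 0-indexed as usual in Python, and the shift s = 4 is an assumption read off the figure's alignment.

```python
def longest_k_against_text(T: str, P: str, s: int, q: int) -> int:
    """Largest k < q such that P[:k] is a suffix of the text seen so far, T[:s+q]."""
    seen = T[:s + q]
    for k in range(q - 1, -1, -1):
        if seen.endswith(P[:k]):   # k = 0 always succeeds (empty prefix)
            return k
    return 0                       # only reached when q == 0


def longest_k_against_pattern(P: str, q: int) -> int:
    """Largest k < q such that P[:k] is a suffix of P[:q]; the text is not used."""
    matched = P[:q]
    for k in range(q - 1, -1, -1):
        if matched.endswith(P[:k]):
            return k
    return 0


T = "bacbababaabcbab"   # text from Figure 32.9
P = "ababaca"
s, q = 4, 5             # assumed shift; q = 5 characters have matched

assert T[s:s + q] == P[:q]                   # precondition of the question
k_text = longest_k_against_text(T, P, s, q)
k_pattern = longest_k_against_pattern(P, q)
new_shift = s + (q - k_pattern)              # s' = s + (q - k)
print(k_text, k_pattern, new_shift)
```

Both searches return the same k because the last q characters of T[:s+q] are exactly P[:q], which is the observation that makes precomputation from the pattern alone possible.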
[Figure 32.9, panels (a)-(d): the text $T$ = bacbababaabcbab with the pattern
$P$ = ababaca aligned at shifts $s$, $s' = s + 2$, and $s + 4$, and the prefix
$P[:k]$ aligned under $P[:q]$.]

Figure 32.9  The prefix function $\pi$. (a) The pattern $P$ = ababaca aligns with a
text $T$ so that the first $q = 5$ characters match. Matching characters, in blue,
are connected by blue lines. (b) Knowing these particular 5 matched characters
($P[:5]$) suffices to deduce that a shift of $s + 1$ is invalid, but that a shift of
$s' = s + 2$ is consistent with everything known about the text and therefore is
potentially valid. The prefix $P[:k]$, where $k = 3$, aligns with the text seen so
far. (c) A shift of $s + 4$ is also potentially valid, but it leaves only the prefix
$P[:1]$ aligned with the text seen so far. (d) To precompute useful information for
such deductions, compare the pattern with itself. Here, the longest prefix of $P$
that is also a proper suffix of $P[:5]$ is $P[:3]$. The array $\pi$ represents this
precomputed information, so that $\pi[5] = 3$. Given that $q$ characters have
matched successfully at shift $s$, the next potentially valid shift is at
$s' = s + (q - \pi[q])$ as shown in part (b).

Let's look at the precomputed information a little more formally. For a given
pattern $P[1:m]$, the prefix function for $P$ is the function
$\pi : \{1, 2, \ldots, m\} \to \{0, 1, \ldots, m-1\}$ such that

$$\pi[q] = \max \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}.$$

That is, $\pi[q]$ is the length of the longest prefix of $P$ that is a proper suffix
of $P[:q]$. Here is the complete prefix function $\pi$ for the pattern ababaca:

    $i$         1  2  3  4  5  6  7
    $P[i]$      a  b  a  b  a  c  a
    $\pi[i]$    0  0  1  2  3  0  1
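The definition of $\pi$ can be evaluated literally, without any cleverness, which is a useful cross-check on the table. The sketch below is a deliberately slow brute-force evaluation of the definition, not the book's COMPUTE-PREFIX-FUNCTION. The function name is illustrative, and the returned list is 0-indexed, so entry q - 1 holds $\pi[q]$.

```python
def prefix_function_bruteforce(P: str) -> list[int]:
    """Evaluate pi[q] = max{k : k < q and P[:k] is a suffix of P[:q]} directly.

    Returns a 0-indexed list: result[q - 1] is the book's pi[q].
    Takes cubic time in the worst case; intended only as a reference check.
    """
    m = len(P)
    pi = []
    for q in range(1, m + 1):
        # k = 0 always qualifies, so the max is over a nonempty set.
        pi.append(max(k for k in range(q) if P[:q].endswith(P[:k])))
    return pi


print(prefix_function_bruteforce("ababaca"))
```

For the pattern ababaca this reproduces the row of $\pi$ values in the table.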

The procedure KMP-MATCHER below gives the Knuth-Morris-Pratt matching algorithm.
The procedure follows from FINITE-AUTOMATON-MATCHER for the most part. To compute
$\pi$, KMP-MATCHER calls the auxiliary procedure COMPUTE-PREFIX-FUNCTION. These two
procedures have much in common, because both match a string against the pattern
$P$: KMP-MATCHER matches the text $T$ against $P$, and COMPUTE-PREFIX-FUNCTION
matches $P$ against itself.
Next, let's analyze the running times of these procedures. Then we'll prove them
correct, which will be more complicated.

KMP-MATCHER(T, P, n, m)

 1  π = COMPUTE-PREFIX-FUNCTION(P, m)
 2  q = 0                                // number of characters matched
 3  for i = 1 to n                       // scan the text from left to right
 4      while q > 0 and P[q + 1] ≠ T[i]
 5          q = π[q]                     // next character does not match
 6      if P[q + 1] == T[i]
 7          q = q + 1                    // next character matches
 8      if q == m                        // is all of P matched?
 9          print "Pattern occurs with shift" i - m
10          q = π[q]                     // look for the next match

COMPUTE-PREFIX-FUNCTION(P, m)

 1  let π[1:m] be a new array
 2  π[1] = 0
 3  k = 0
 4  for q = 2 to m
 5      while k > 0 and P[k + 1] ≠ P[q]
 6          k = π[k]
 7      if P[k + 1] == P[q]
 8          k = k + 1
 9      π[q] = k
10  return π

Running-time analysis
The running time of COMPUTE-PREFIX-FUNCTION is $\Theta(m)$, which we show by
using the aggregate method of amortized analysis (see Section 16.1). The only
tricky part is showing that the while loop of lines 5-6 executes $O(m)$ times
altogether. Starting with some observations about $k$, we'll show that it makes at
most $m - 1$ iterations. First, line 3 starts $k$ at 0, and the only way that $k$
increases is by the increment operation in line 8, which executes at most once per
iteration of the for loop of lines 4-9. Thus, the total increase in $k$ is at most
$m - 1$. Second, since $k < q$ upon entering the for loop and each iteration of the
loop increments $q$, we always have $k < q$. Therefore, the assignments in lines 2
and 9 ensure that $\pi[q] < q$ for all $q = 1, 2, \ldots, m$, which means that each
iteration of the while loop decreases $k$. Third, $k$ never becomes negative.
Putting these facts together, we see that the total decrease in $k$ from the while
loop is bounded from above by the total increase in $k$ over all iterations of the
for loop, which is $m - 1$. Thus, the while loop iterates at most $m - 1$ times in
all, and COMPUTE-PREFIX-FUNCTION runs in $\Theta(m)$ time.
Exercise 32.4-4 asks you to show, by a similar aggregate analysis, that the
matching time of KMP-MATCHER is $\Theta(n)$.
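For readers who want to run the two procedures, here is one possible Python transcription. It is a sketch under stated assumptions rather than the book's code: strings and arrays are 0-indexed, so pi[q - 1] holds the book's $\pi[q]$ and the test P[q + 1] becomes P[q]; shifts are returned in a list instead of printed; and an empty pattern is rejected, since the pseudocode assumes m >= 1.

```python
def compute_prefix_function(P: str) -> list[int]:
    """0-indexed transcription of COMPUTE-PREFIX-FUNCTION: pi[q - 1] is pi[q]."""
    m = len(P)
    pi = [0] * m                   # covers pi[1] = 0 (line 2)
    k = 0                          # line 3
    for q in range(1, m):          # book's q = 2..m
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]          # fall back to the next shorter candidate
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(T: str, P: str) -> list[int]:
    """Return every shift s (0-based start index) at which P occurs in T."""
    if not P:
        raise ValueError("pattern must be non-empty")  # assumption: m >= 1
    m = len(P)
    pi = compute_prefix_function(P)
    shifts = []
    q = 0                          # number of characters matched
    for i, ch in enumerate(T):     # scan the text from left to right
        while q > 0 and P[q] != ch:
            q = pi[q - 1]          # next character does not match
        if P[q] == ch:
            q += 1                 # next character matches
        if q == m:                 # all of P matched, ending at index i
            shifts.append(i - m + 1)
            q = pi[q - 1]          # look for the next match
    return shifts


print(compute_prefix_function("ababaca"))
print(kmp_matcher("abababacaba", "ababaca"))
```

With 0-based indexing, the start index i - m + 1 of a match ending at index i is the same number as the book's shift i - m computed from its 1-based i.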
[Figure 32.10, panels (a)-(b): (a) the table of $\pi$ values for ababaca, with
$\pi[5] = 3$, $\pi[3] = 1$, and $\pi[1] = 0$ marked; (b) the pattern $P$ slid to the
right in rows labeled $P_5$, $P_3$, $P_1$, and $P_0 = \varepsilon$.]

Figure 32.10  An illustration of Lemma 32.5 for the pattern $P$ = ababaca and
$q = 5$. (a) The $\pi$ function for the given pattern. Since $\pi[5] = 3$,
$\pi[3] = 1$, and $\pi[1] = 0$, iterating $\pi$ gives $\pi^*[5] = \{3, 1, 0\}$.
(b) Sliding the template containing the pattern $P$ to the right and noting when
some prefix $P[:k]$ of $P$ matches up with some proper suffix of $P[:5]$. Matches
occur when $k = 3$, 1, and 0. In the figure, the first row gives $P$, and the
vertical red line is drawn just after $P[:5]$. Successive rows show all the shifts
of $P$ that cause some prefix $P[:k]$ of $P$ to match some suffix of $P[:5]$.
Successfully matched characters are shown in blue. Blue lines connect aligned
matching characters. Thus, $\{k : k < 5 \text{ and } P[:k] \sqsupset P[:5]\} =
\{3, 1, 0\}$. Lemma 32.5 claims that
$\pi^*[q] = \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$ for all $q$.

Compared with FINITE-AUTOMATON-MATCHER, by using $\pi$ rather than $\delta$, the
KMP algorithm reduces the time for preprocessing the pattern from $O(m |\Sigma|)$
to $\Theta(m)$, while keeping the actual matching time bounded by $\Theta(n)$.

Correctness of the prefix-function computation

We'll see a little later that the prefix function $\pi$ helps to simulate the
transition function $\delta$ in a string-matching automaton. But first, we need to
prove that the procedure COMPUTE-PREFIX-FUNCTION does indeed compute the prefix
function correctly. Doing so requires finding all prefixes $P[:k]$ that are proper
suffixes of a given prefix $P[:q]$. The value of $\pi[q]$ gives us the length of the
longest such prefix, but the following lemma, illustrated in Figure 32.10, shows
that iterating the prefix function $\pi$ generates all the prefixes $P[:k]$ that are
proper suffixes of $P[:q]$. Let

$$\pi^*[q] = \{\pi[q], \pi^{(2)}[q], \pi^{(3)}[q], \ldots, \pi^{(t)}[q]\},$$

where $\pi^{(i)}[q]$ is defined in terms of functional iteration, so that
$\pi^{(0)}[q] = q$ and $\pi^{(i)}[q] = \pi[\pi^{(i-1)}[q]]$ for $i \ge 1$ (so that
$\pi[q] = \pi^{(1)}[q]$), and where the sequence in $\pi^*[q]$ stops upon reaching
$\pi^{(t)}[q] = 0$ for some $t \ge 1$.
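The functional iteration that defines $\pi^*[q]$ is a short loop in practice. The sketch below is illustrative only: the helper name pi_star is made up here, and the table of $\pi$ values for ababaca is copied from the text and padded with a dummy entry at index 0 so that PI[q] matches the book's 1-based $\pi[q]$.

```python
# pi values for P = "ababaca", 1-indexed as in the text (index 0 is unused).
PI = [None, 0, 0, 1, 2, 3, 0, 1]


def pi_star(pi: list, q: int) -> list[int]:
    """Return pi[q], pi^(2)[q], ..., stopping once the value 0 is reached.

    Assumes 1 <= q <= m and that pi is 1-indexed with pi[q] < q.
    """
    values = []
    while q > 0:
        q = pi[q]          # one more application of pi
        values.append(q)
    return values


print(pi_star(PI, 5))
```

The loop terminates because each application of $\pi$ strictly decreases the state, and the values come out in decreasing order, which is the order in which KMP visits them.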

Lemma 32.5 (Prefix-function iteration lemma)

Let $P$ be a pattern of length $m$ with prefix function $\pi$. Then, for
$q = 1, 2, \ldots, m$, we have
$\pi^*[q] = \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$.

Proof  We first prove that
$\pi^*[q] \subseteq \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$ or,
equivalently,

$$i \in \pi^*[q] \text{ implies } P[:i] \sqsupset P[:q]. \tag{32.7}$$

If $i \in \pi^*[q]$, then $i = \pi^{(u)}[q]$ for some $u > 0$. We prove
equation (32.7) by induction on $u$. For $u = 1$, we have $i = \pi[q]$, and the
claim follows since $i < q$ and $P[:\pi[q]] \sqsupset P[:q]$ by the definition of
$\pi$. Now consider some $u \ge 1$ such that both $\pi^{(u)}[q]$ and
$\pi^{(u+1)}[q]$ belong to $\pi^*[q]$. Let $i = \pi^{(u)}[q]$, so that
$\pi[i] = \pi^{(u+1)}[q]$. The inductive hypothesis is that
$P[:i] \sqsupset P[:q]$. Because the relations $<$ and $\sqsupset$ are transitive,
we have $\pi[i] < i < q$ and $P[:\pi[i]] \sqsupset P[:i] \sqsupset P[:q]$, which
establishes equation (32.7) for all $i$ in $\pi^*[q]$. Therefore,
$\pi^*[q] \subseteq \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$.

We now prove that
$\{k : k < q \text{ and } P[:k] \sqsupset P[:q]\} \subseteq \pi^*[q]$ by
contradiction. Suppose to the contrary that the set
$\{k : k < q \text{ and } P[:k] \sqsupset P[:q]\} - \pi^*[q]$ is nonempty, and let
$j$ be the largest number in the set. Because $\pi[q]$ is the largest value in
$\{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$ and $\pi[q] \in \pi^*[q]$, it
must be the case that $j < \pi[q]$. Having established that $\pi^*[q]$ contains at
least one integer greater than $j$, let $j'$ denote the smallest such integer. (We
can choose $j' = \pi[q]$ if no other number in $\pi^*[q]$ is greater than $j$.) We
have $P[:j] \sqsupset P[:q]$ because
$j \in \{k : k < q \text{ and } P[:k] \sqsupset P[:q]\}$, and from
$j' \in \pi^*[q]$ and equation (32.7), we have $P[:j'] \sqsupset P[:q]$. Thus,
$P[:j] \sqsupset P[:j']$ by Lemma 32.1, and $j$ is the largest value less than $j'$
with this property. Therefore, we must have $\pi[j'] = j$ and, since
$j' \in \pi^*[q]$, we must have $j \in \pi^*[q]$ as well. This contradiction proves
the lemma.

The algorithm COMPUTE-PREFIX-FUNCTION computes $\pi[q]$, in order, for
$q = 1, 2, \ldots, m$. Setting $\pi[1]$ to 0 in line 2 of COMPUTE-PREFIX-FUNCTION is
certainly correct, since $\pi[q] < q$ for all $q$. We'll use the following lemma and
its corollary to prove that COMPUTE-PREFIX-FUNCTION computes $\pi[q]$ correctly
for $q > 1$.

Lemma 32.6
Let $P$ be a pattern of length $m$, and let $\pi$ be the prefix function for $P$.
For $q = 1, 2, \ldots, m$, if $\pi[q] > 0$, then $\pi[q] - 1 \in \pi^*[q - 1]$.

Proof  Let $r = \pi[q] > 0$, so that $r < q$ and $P[:r] \sqsupset P[:q]$, and thus,
$r - 1 < q - 1$ and $P[:r-1] \sqsupset P[:q-1]$ (by dropping the last character from
$P[:r]$ and $P[:q]$, which we can do because $r > 0$). By Lemma 32.5, therefore,
$r - 1 \in \pi^*[q - 1]$. Thus, we have $\pi[q] - 1 = r - 1 \in \pi^*[q - 1]$.
For $q = 2, 3, \ldots, m$, define the subset $E_{q-1} \subseteq \pi^*[q - 1]$ by

$$\begin{aligned}
E_{q-1} &= \{k \in \pi^*[q-1] : P[k+1] = P[q]\} \\
        &= \{k : k < q-1 \text{ and } P[:k] \sqsupset P[:q-1] \text{ and } P[k+1] = P[q]\}
           \quad \text{(by Lemma 32.5)} \\
        &= \{k : k < q-1 \text{ and } P[:k+1] \sqsupset P[:q]\}.
\end{aligned}$$

The set $E_{q-1}$ consists of the values $k < q - 1$ for which
$P[:k] \sqsupset P[:q-1]$ and for which, because $P[k+1] = P[q]$, we have
$P[:k+1] \sqsupset P[:q]$. Thus, $E_{q-1}$ consists of those values
$k \in \pi^*[q-1]$ such that extending $P[:k]$ to $P[:k+1]$ produces a proper suffix
of $P[:q]$.

Corollary 32.7
Let $P$ be a pattern of length $m$, and let $\pi$ be the prefix function for $P$.
Then, for $q = 2, 3, \ldots, m$,

$$\pi[q] = \begin{cases}
0 & \text{if } E_{q-1} = \emptyset, \\
1 + \max E_{q-1} & \text{if } E_{q-1} \ne \emptyset.
\end{cases}$$

Proof  If $E_{q-1}$ is empty, there is no $k \in \pi^*[q-1]$ (including $k = 0$)
such that extending $P[:k]$ to $P[:k+1]$ produces a proper suffix of $P[:q]$.
Therefore, $\pi[q] = 0$.

If, instead, $E_{q-1}$ is nonempty, then for each $k \in E_{q-1}$, we have
$k + 1 < q$ and $P[:k+1] \sqsupset P[:q]$. Therefore, the definition of $\pi[q]$
gives

$$\pi[q] \ge 1 + \max E_{q-1}. \tag{32.8}$$

Note that $\pi[q] > 0$. Let $r = \pi[q] - 1$, so that $r + 1 = \pi[q] > 0$, and
therefore $P[:r+1] \sqsupset P[:q]$. If a nonempty string is a suffix of another,
then the two strings must have the same last character. Since $r + 1 > 0$, the
prefix $P[:r+1]$ is nonempty, and so $P[r+1] = P[q]$. Furthermore,
$r \in \pi^*[q-1]$ by Lemma 32.6. Therefore, $r \in E_{q-1}$, and so
$\pi[q] - 1 = r \le \max E_{q-1}$ or, equivalently,

$$\pi[q] \le 1 + \max E_{q-1}. \tag{32.9}$$

Combining equations (32.8) and (32.9) completes the proof.
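Corollary 32.7 can be read as a recipe for $\pi[q]$, and a brute-force rendering makes the role of $E_{q-1}$ concrete. In the sketch below, which is only an illustration, the helper names are made up, $\pi^*[q-1]$ is obtained from its characterization in Lemma 32.5 rather than by iterating $\pi$, and Python's 0-based indexing turns the book's test P[k+1] = P[q] into P[k] == P[q - 1].

```python
def proper_border_lengths(P: str, q: int) -> set[int]:
    """{k : k < q and P[:k] is a suffix of P[:q]}, which is pi*[q] by Lemma 32.5."""
    return {k for k in range(q) if P[:q].endswith(P[:k])}


def pi_via_corollary(P: str) -> list[int]:
    """Compute pi[1..m] from Corollary 32.7; result[q - 1] is the book's pi[q]."""
    m = len(P)
    pi = [0] if m > 0 else []                 # pi[1] = 0
    for q in range(2, m + 1):
        # E_{q-1}: candidates k in pi*[q-1] that can be extended by P[q].
        E = {k for k in proper_border_lengths(P, q - 1) if P[k] == P[q - 1]}
        pi.append(1 + max(E) if E else 0)
    return pi


print(pi_via_corollary("ababaca"))
```

COMPUTE-PREFIX-FUNCTION finds the same maximum without materializing the set, by walking $\pi^*[q-1]$ in decreasing order and stopping at the first extendable k.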

We now finish the proof that COMPUTE-PREFIX-FUNCTION computes $\pi$ correctly. The
key is to combine the definition of $E_{q-1}$ with the statement of
Corollary 32.7, so that $\pi[q]$ equals 1 plus the greatest value of $k$ in
$\pi^*[q-1]$ such that $P[k+1] = P[q]$. First, in COMPUTE-PREFIX-FUNCTION,
$k = \pi[q-1]$ at the start of each iteration of the for loop of lines 4-9. This
condition is enforced by lines 2 and 3 when the loop is first entered, and it
remains true in each successive iteration because of line 9. Lines 5-8 adjust $k$
so that it becomes the correct value of $\pi[q]$. The while loop of lines 5-6
searches through all values $k \in \pi^*[q-1]$ in decreasing order to find the value
of $\pi[q]$. The loop terminates either because $k$ reaches 0 or
$P[k+1] = P[q]$. Because the "and" operator short-circuits, if the loop terminates
because $P[k+1] = P[q]$, then $k$ must have also been positive, and so $k$ is the
greatest value in $E_{q-1}$. In this case, lines 7-9 set $\pi[q]$ to $k + 1$,
according to Corollary 32.7. If, instead, the while loop terminates because
$k = 0$, then there are two possibilities. If $P[1] = P[q]$, then
$E_{q-1} = \{0\}$, and lines 7-9 set both $k$ and $\pi[q]$ to 1. If $k = 0$ and
$P[1] \ne P[q]$, however, then $E_{q-1} = \emptyset$. In this case, line 9 sets
$\pi[q]$ to 0, again according to Corollary 32.7, which completes the proof of the
correctness of COMPUTE-PREFIX-FUNCTION.

Correctness of the Knuth-Morris-Pratt algorithm

You can think of the procedure KMP-MATCHER as a reimplemented version of the
procedure FINITE-AUTOMATON-MATCHER, but using the prefix function $\pi$ to compute
state transitions. Specifically, we'll prove that in the $i$th iteration of the for
loops of both KMP-MATCHER and FINITE-AUTOMATON-MATCHER, the state $q$ has the same
value upon testing for equality with $m$ (at line 8 in KMP-MATCHER and at line 4 in
FINITE-AUTOMATON-MATCHER). Once we have argued that KMP-MATCHER simulates the
behavior of FINITE-AUTOMATON-MATCHER, the correctness of KMP-MATCHER follows from
the correctness of FINITE-AUTOMATON-MATCHER (though we'll see a little later why
line 10 in KMP-MATCHER is necessary).
Before formally proving that KMP-MATCHER correctly simulates
FINITE-AUTOMATON-MATCHER, let's take a moment to understand how the prefix function
$\pi$ replaces the $\delta$ transition function. Recall that when a string-matching
automaton is in state $q$ and it scans a character $a = T[i]$, it moves to a new
state $\delta(q, a)$. If $a = P[q+1]$, so that $a$ continues to match the pattern,
then the state number is incremented: $\delta(q, a) = q + 1$. Otherwise,
$a \ne P[q+1]$, so that $a$ does not continue to match the pattern, and the state
number does not increase: $0 \le \delta(q, a) \le q$. In the first case, when $a$
continues to match, KMP-MATCHER moves to state $q + 1$ without referring to the
$\pi$ function: the while loop test in line 4 immediately comes up false, the test
in line 6 comes up true, and line 7 increments $q$.
The $\pi$ function comes into play when the character $a$ does not continue to match
the pattern, so that the new state $\delta(q, a)$ is either $q$ or to the left of
$q$ along the spine of the automaton. The while loop of lines 4-5 in KMP-MATCHER
iterates through the states in $\pi^*[q]$, stopping either when it arrives in a
state, say $q'$, such that $a$ matches $P[q'+1]$ or $q'$ has gone all the way down
to 0. If $a$ matches $P[q'+1]$, then line 7 sets the new state to $q' + 1$, which
should equal $\delta(q, a)$ for the simulation to work correctly. In other words,
the new state $\delta(q, a)$ should be either state 0 or a state numbered 1 more
than some state in $\pi^*[q]$.
Let's look at the example in Figures 32.6 and 32.10, which are for the pattern
$P$ = ababaca. Suppose that the automaton is in state $q = 5$, having matched
ababa. The states in $\pi^*[5]$ are, in descending order, 3, 1, and 0. If the next
character scanned is c, then you can see that the automaton moves to state
$\delta(5, \text{c}) = 6$ in both FINITE-AUTOMATON-MATCHER (line 3) and KMP-MATCHER
(line 7). Now suppose that the next character scanned is instead b, so that the
automaton should move to state $\delta(5, \text{b}) = 4$. The while loop in
KMP-MATCHER exits after executing line 5 once, and the automaton arrives in state
$q' = \pi[5] = 3$. Since $P[q'+1] = P[4] = \text{b}$, the test in line 6 comes up
true, and the automaton moves to the new state
$q' + 1 = 4 = \delta(5, \text{b})$. Finally, suppose that the next character scanned
is instead a, so that the automaton should move to state
$\delta(5, \text{a}) = 1$. The first three times that the test in line 4 executes,
the test comes up true. The first time finds that $P[6] = \text{c} \ne \text{a}$,
and the automaton moves to state $\pi[5] = 3$ (the first state in $\pi^*[5]$). The
second time finds that $P[4] = \text{b} \ne \text{a}$, and the automaton moves to
state $\pi[3] = 1$ (the second state in $\pi^*[5]$). The third time finds that
$P[2] = \text{b} \ne \text{a}$, and the automaton moves to state $\pi[1] = 0$ (the
last state in $\pi^*[5]$). The while loop exits once it arrives in state $q' = 0$.
Now line 6 finds that $P[q'+1] = P[1] = \text{a}$, and line 7 moves the automaton to
the new state $q' + 1 = 1 = \delta(5, \text{a})$.
Thus, the intuition is that KMP-MATCHER iterates through the states in $\pi^*[q]$
in decreasing order, stopping at some state $q'$ and then possibly moving to state
$q' + 1$. Although that might seem like a lot of work just to simulate computing
$\delta(q, a)$, bear in mind that asymptotically, KMP-MATCHER is no slower than
FINITE-AUTOMATON-MATCHER.
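The three transitions traced above can be replayed with a small function that computes one value of $\delta(q, a)$ from $\pi$ alone. This is a sketch with assumed names: delta_via_pi is hypothetical, the $\pi$ table for ababaca is copied from the text with a dummy entry at index 0, and the state q = m is handled by first falling back to $\pi[m]$, mirroring line 10 of KMP-MATCHER. A non-empty pattern is assumed.

```python
P = "ababaca"
PI = [None, 0, 0, 1, 2, 3, 0, 1]   # PI[q] is the book's pi[q]; index 0 unused


def delta_via_pi(P: str, pi: list, q: int, a: str) -> int:
    """Simulate the automaton transition delta(q, a) using only the array pi."""
    if q == len(P):
        q = pi[q]                  # as in line 10: P[m + 1] does not exist
    # Python's P[q] is the book's P[q + 1].
    while q > 0 and P[q] != a:
        q = pi[q]                  # walk down through the states in pi*[q]
    if P[q] == a:
        q += 1
    return q


for ch in "cba":
    print(ch, delta_via_pi(P, PI, 5, ch))
```

A single call can take several fallback steps, as the character a does from state 5, but the aggregate analysis bounds the total number of such steps over a whole text.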
We are now ready to formally prove the correctness of the Knuth-Morris-Pratt
algorithm. By Theorem 32.4, we have that $q = \sigma(T[:i])$ after each time line 3
of FINITE-AUTOMATON-MATCHER executes. Therefore, it suffices to show that the same
property holds with regard to the for loop in KMP-MATCHER. The proof proceeds by
induction on the number of loop iterations. Initially, both procedures set $q$ to 0
as they enter their respective for loops for the first time. Consider iteration $i$
of the for loop in KMP-MATCHER. By the inductive hypothesis, the state number $q$
equals $\sigma(T[:i-1])$ at the start of the loop iteration. We need to show that
when line 8 is reached, the new value of $q$ is $\sigma(T[:i])$. (Again, we'll
handle line 10 separately.)
Considering $q$ to be the state number at the start of the for loop iteration, when
KMP-MATCHER considers the character $T[i]$, the longest prefix of $P$ that is a
suffix of $T[:i]$ is either $P[:q+1]$ (if $P[q+1] = T[i]$) or some prefix (not
necessarily proper, and possibly empty) of $P[:q]$. We consider separately the
three cases in which $\sigma(T[:i]) = 0$, $\sigma(T[:i]) = q + 1$, and
$0 < \sigma(T[:i]) \le q$.

• If $\sigma(T[:i]) = 0$, then $P[:0] = \varepsilon$ is the only prefix of $P$ that
  is a suffix of $T[:i]$. The while loop of lines 4-5 iterates through each value
  $q'$ in $\pi^*[q]$, but although $P[:q'] \sqsupset P[:q] \sqsupset T[:i-1]$ for
  every $q' \in \pi^*[q]$ (because $<$ and $\sqsupset$ are transitive relations),
  the loop never finds a $q'$ such that $P[q'+1] = T[i]$. The loop terminates when
  $q$ reaches 0, and of course line 7 does not execute. Therefore, $q = 0$ at
  line 8, so that now $q = \sigma(T[:i])$.

• If $\sigma(T[:i]) = q + 1$, then $P[q+1] = T[i]$, and the while loop test in
  line 4 fails the first time through. Line 7 executes, incrementing the state
  number to $q + 1$, which equals $\sigma(T[:i])$.

• If $0 < \sigma(T[:i]) \le q$, then the while loop of lines 4-5 iterates at least
  once, checking in decreasing order each value in $\pi^*[q]$ until it stops at
  some $q' < q$. Thus, $P[:q']$ is the longest prefix of $P[:q]$ for which
  $P[q'+1] = T[i]$, so that when the while loop terminates,
  $q' + 1 = \sigma(P[:q]\,T[i])$. Since $q = \sigma(T[:i-1])$, Lemma 32.3 implies
  that $\sigma(T[:i-1]\,T[i]) = \sigma(P[:q]\,T[i])$. Thus we have

  $$\begin{aligned}
  q' + 1 &= \sigma(P[:q]\,T[i]) \\
         &= \sigma(T[:i-1]\,T[i]) \\
         &= \sigma(T[:i])
  \end{aligned}$$

  when the while loop terminates. After line 7 increments $q$, the new state
  number $q$ equals $\sigma(T[:i])$.
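The invariant established by the three cases, namely that $q = \sigma(T[:i])$ whenever line 8 is reached, can be spot-checked by running the matcher alongside a brute-force suffix function. The sketch below is a test harness under stated assumptions, not part of the proof: the names sigma, prefix_function, and invariant_holds are made up, indexing is 0-based, and the pattern is assumed non-empty.

```python
def sigma(P: str, x: str) -> int:
    """Suffix function: length of the longest prefix of P that is a suffix of x."""
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):
            return k
    return 0


def prefix_function(P: str) -> list[int]:
    """0-indexed prefix function: pi[q - 1] is the book's pi[q]."""
    pi, k = [0] * len(P), 0
    for q in range(1, len(P)):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi


def invariant_holds(T: str, P: str) -> bool:
    """Check q == sigma(T[:i]) at the point corresponding to line 8, for every i."""
    pi, q = prefix_function(P), 0
    for i, ch in enumerate(T, start=1):    # i is the book's 1-based index
        while q > 0 and P[q] != ch:
            q = pi[q - 1]
        if P[q] == ch:
            q += 1
        if q != sigma(P, T[:i]):           # compare before line 10 resets q
            return False
        if q == len(P):
            q = pi[q - 1]
    return True


print(invariant_holds("bacbababaabcbab", "ababaca"))
```

The comparison is made before the reset of line 10, which is why that line needs the separate argument given next.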
Line 10 is necessary in KMP-MATCHER, because otherwise, line 4 might try to
reference $P[m+1]$ after finding an occurrence of $P$. (The argument that
$q = \sigma(T[:i-1])$ upon the next execution of line 4 remains valid by the hint
given in Exercise 32.4-8: that $\delta(m, a) = \delta(\pi[m], a)$ or, equivalently,
$\sigma(Pa) = \sigma(P[:\pi[m]]\,a)$ for any $a \in \Sigma$.) The remaining
argument for the correctness of the Knuth-Morris-Pratt algorithm follows from the
correctness of FINITE-AUTOMATON-MATCHER, since we have shown that KMP-MATCHER
simulates the behavior of FINITE-AUTOMATON-MATCHER.

Exercises

32.4-1
Compute the prefix function $\pi$ for the pattern ababbabbabbababbabb.

32.4-2
Give an upper bound on the size of $\pi^*[q]$ as a function of $q$. Give an example
to show that your bound is tight.
