KMP Algorithm
a. An O(mn) approach
One of the most obvious approaches to the string matching problem is to compare the first element of the pattern p with the first element of the string S in which p is to be located. If the first element of p matches the first element of S, compare the second element of p with the second element of S, and continue in this way until either all of p has been matched or a mismatch is found. If a mismatch is found at any position, shift p one position to the right and repeat the comparisons, starting again from the first element of p.
Figure: pattern p = a b a a compared element by element against string S = a b c a b a a b c a b a c; a mismatch occurs at the third element.
Since a mismatch is detected, shift p one position to the right and repeat the matching procedure.
Finally, a match is found after p has been shifted three times to the right.
Drawbacks of this approach: if m is the length of the pattern p and n the length of the string S, the matching time is of the order O(mn), which makes this a slow algorithm for long strings. What makes it so slow is that elements of S that have already been compared are compared again and again in later iterations. For example (counting indices from 0), the first mismatch is detected when p[2] = a is compared with S[2] = c. Pattern p is then moved one position to the right and matching resumes, so the first comparison is between p[0] = a and S[1] = b. But S[1] = b had already been compared with p[1] in step 2; this is a repeated use of S[1] in another comparison. It is these repeated comparisons that lead to the O(mn) running time.
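The brute-force procedure described above can be sketched in Python as follows. This is a minimal illustration, not code from these notes: the function name naive_search and the comparison counter are assumptions added to make the O(mn) cost visible, and indices are 0-based.

```python
# A minimal brute-force matcher (illustrative; naive_search is a hypothetical name).
def naive_search(s: str, p: str) -> tuple[list[int], int]:
    """Return (shifts, comparisons) for the O(mn) approach.

    shifts lists every 0-based position where p occurs in s;
    comparisons counts the character comparisons performed.
    """
    n, m = len(s), len(p)
    shifts: list[int] = []
    comparisons = 0
    for shift in range(n - m + 1):       # try every alignment of p against s
        j = 0
        while j < m:                     # compare p[j] with s[shift + j]
            comparisons += 1
            if p[j] != s[shift + j]:
                break                    # mismatch: move p one position right
            j += 1
        if j == m:                       # every element of p matched
            shifts.append(shift)
    return shifts, comparisons


if __name__ == "__main__":
    print(naive_search("abcabaabcabac", "abaa"))   # p found at shift 3
    print(naive_search("a" * 20, "aaab"))          # worst case: 17 shifts x 4 comparisons
```

On s = "a" * 20 with p = "aaab", each of the n − m + 1 = 17 shifts costs m = 4 comparisons before the final b fails, 68 in total: the (n − m + 1)·m pattern behind the O(mn) bound.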
Step 4: q = 5, k = 2, π[5] = 3
q  1 2 3 4 5 6 7
p  a b a b a c a
π  0 0 1 2 3
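Step 5: q = 6, k = 3, π[6] = 0
Since p[4] = b ≠ c, k falls back to π[3] = 1; since p[2] = b ≠ c, k falls back to π[1] = 0; p[1] = a ≠ c as well, so π[6] = 0.
q  1 2 3 4 5 6 7
p  a b a b a c a
π  0 0 1 2 3 0
Step 6: q = 7, k = 0, π[7] = 1
Here p[1] = a matches p[7] = a, so k becomes 1 and π[7] = 1.
q  1 2 3 4 5 6 7
p  a b a b a c a
π  0 0 1 2 3 0 1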
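The table-filling steps can be reproduced with a short Python sketch of the prefix-function computation. It is an illustration under stated assumptions: compute_prefix_function and trace are hypothetical names, the list is 0-based so pi[q - 1] holds π[q], and each trace entry records (q, k before backtracking, π[q]), so entry j corresponds to step j + 1 of the walkthrough.

```python
# Prefix function with a step log (illustrative sketch; names are hypothetical).
def compute_prefix_function(p: str) -> tuple[list[int], list[tuple[int, int, int]]]:
    """Return (pi, trace) for pattern p.

    pi is 0-based, so pi[q - 1] holds the table value pi[q].
    trace[j] = (q, k, pi[q]) for step j + 1, where k is the value carried
    over from the previous step, before any backtracking.
    """
    m = len(p)
    pi = [0] * m                     # pi[1] = 0 in 1-based terms
    trace: list[tuple[int, int, int]] = []
    k = 0
    for q in range(2, m + 1):        # q is 1-based, as in the tables
        k_before = k
        # 1-based p[k+1] is p[k] here; 1-based p[q] is p[q - 1].
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k - 1]            # fall back: k <- pi[k]
        if p[k] == p[q - 1]:
            k += 1                   # the matched prefix grows by one character
        pi[q - 1] = k
        trace.append((q, k_before, k))
    return pi, trace


if __name__ == "__main__":
    pi, trace = compute_prefix_function("ababaca")
    for step, (q, k, value) in enumerate(trace, start=1):
        print(f"Step {step}: q = {q}, k = {k}, pi[{q}] = {value}")
    print("pi =", pi)
```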
Note: KMP finds every occurrence of p in S. That is why KMP does not terminate in step 12; rather, it searches the remainder of S for further occurrences of p.
Matching p = a b a b a c a against S = b a c b a b a b a b a c a a b (indices counted from 1):
Step 1: i = 1, q = 0, comparing p[1] with S[1]
p[1] does not match S[1], so p is shifted one position to the right.
Step 2: i = 2, q = 0, comparing p[1] with S[2]
p[1] matches S[2]. Since there is a match, p is not shifted.
Step 3: i = 3, q = 1, comparing p[2] with S[3]
p[2] does not match S[3]. Backtracking on p (q = π[1] = 0), comparing p[1] with S[3]; p[1] does not match S[3] either.
Step 4: i = 4, q = 0, comparing p[1] with S[4]
p[1] does not match S[4].
Step 5: i = 5, q = 0, comparing p[1] with S[5]
p[1] matches S[5].
Step 6: i = 6, q = 1, comparing p[2] with S[6]
p[2] matches S[6].
Step 7: i = 7, q = 2, comparing p[3] with S[7]
p[3] matches S[7].
Step 8: i = 8, q = 3, comparing p[4] with S[8]
p[4] matches S[8].
Step 9: i = 9, q = 4, comparing p[5] with S[9]
p[5] matches S[9].
Step 10: i = 10, q = 5, comparing p[6] with S[10]
p[6] does not match S[10].
Backtracking on p: after the mismatch, q = π[5] = 3, so p[4] is compared with S[10]. p[4] matches S[10], and q becomes 4.
Step 11: i = 11, q = 4, comparing p[5] with S[11]
p[5] matches S[11].
Step 12: i = 12, q = 5, comparing p[6] with S[12]
p[6] matches S[12].
Step 13: i = 13, q = 6, comparing p[7] with S[13]
p[7] matches S[13].
Pattern p has been found to occur completely in string S. The total number of shifts that took place before the match was found is i − m = 13 − 7 = 6.
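To check the walkthrough mechanically, the sketch below replays the matching procedure on the same S and p and logs each comparison using the 1-based indices of the steps. The function names prefix_function and kmp_trace and the log format are assumptions made for this illustration, and p is assumed to be non-empty.

```python
# Replays the worked example step by step (illustrative; names are hypothetical).
def prefix_function(p: str) -> list[int]:
    """0-based prefix function: pi[j] is the table value pi[j + 1]."""
    pi = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[k] != p[j]:
            k = pi[k - 1]
        if p[k] == p[j]:
            k += 1
        pi[j] = k
    return pi


def kmp_trace(s: str, p: str) -> tuple[list[str], list[int]]:
    """Replay the matcher, logging comparisons with 1-based indices (p non-empty)."""
    pi, m = prefix_function(p), len(p)
    log: list[str] = []
    shifts: list[int] = []
    q = 0
    for i in range(1, len(s) + 1):
        log.append(f"Step {i}: i = {i}, q = {q}, comparing p[{q + 1}] with S[{i}]")
        while q > 0 and p[q] != s[i - 1]:     # 1-based: p[q+1] != S[i]
            q = pi[q - 1]                     # backtrack: q <- pi[q]
            log.append(f"  backtracking: q = {q}, comparing p[{q + 1}] with S[{i}]")
        if p[q] == s[i - 1]:
            q += 1
        if q == m:
            shifts.append(i - m)
            log.append(f"  match with shift {i - m}")
            q = pi[q - 1]                     # keep looking for later matches
    return log, shifts


if __name__ == "__main__":
    log, shifts = kmp_trace("bacbabababacaab", "ababaca")
    print("\n".join(log))
    print("shifts:", shifts)
```

The log continues past step 13 to step 15, since the search does not stop at the first match.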
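KMP-Matcher(S, p)
1  n ← length[S]
2  m ← length[p]
3  π ← Compute-Prefix-Function(p)
4  q ← 0                              ▷ number of characters matched
5  for i ← 1 to n                     ▷ scan S from left to right
6      do while q > 0 and p[q+1] ≠ S[i]
7             do q ← π[q]             ▷ next character does not match
8         if p[q+1] = S[i]
9            then q ← q + 1           ▷ next character matches
10        if q = m                    ▷ is all of p matched?
11           then print "Pattern occurs with shift" i − m
12                q ← π[q]            ▷ look for the next match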
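A direct Python translation of KMP-Matcher might look like this. It is a sketch, assuming 0-based Python strings: the 1-based test p[q+1] becomes p[q], π[q] is stored as pi[q - 1], and i is kept 1-based via enumerate so that the reported shift is still i − m. The names kmp_matcher and compute_prefix_function are hypothetical, and rejecting an empty pattern is a design choice not taken from these notes.

```python
# Python translation of KMP-Matcher (sketch; names are hypothetical).
def compute_prefix_function(p: str) -> list[int]:
    """0-based prefix function: pi[q - 1] holds pi[q] of the notes."""
    pi = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[k] != p[j]:
            k = pi[k - 1]
        if p[k] == p[j]:
            k += 1
        pi[j] = k
    return pi


def kmp_matcher(s: str, p: str) -> list[int]:
    """Return every shift at which p occurs in s (shift = 0-based start)."""
    if not p:
        raise ValueError("pattern must be non-empty")   # design choice, not from the notes
    m = len(p)                          # step 2 (n is implicit in the loop)
    pi = compute_prefix_function(p)     # step 3
    q = 0                               # step 4: number of characters matched
    shifts: list[int] = []
    for i, ch in enumerate(s, start=1): # step 5: i runs from 1 to n
        while q > 0 and p[q] != ch:     # step 6: p[q+1] != S[i]
            q = pi[q - 1]               # step 7: q <- pi[q]
        if p[q] == ch:                  # step 8: p[q+1] = S[i]
            q += 1                      # step 9
        if q == m:                      # step 10: all of p matched
            shifts.append(i - m)        # step 11: report shift i - m
            q = pi[q - 1]               # step 12: continue searching
    return shifts


if __name__ == "__main__":
    print(kmp_matcher("bacbabababacaab", "ababaca"))
    print(kmp_matcher("abababa", "aba"))
```

Because step 12 resets q to π[q] instead of stopping, overlapping occurrences are reported too: kmp_matcher("abababa", "aba") returns [0, 2, 4].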
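In the pseudocode for computing the prefix function, steps 1 to 3 take constant time and the for loop from step 4 to step 10 runs m − 1 times, once for each q from 2 to m. The inner while loop may run several times within a single iteration, but each of its iterations strictly decreases k (because π[k] < k), while each iteration of the for loop increases k by at most 1; since k never drops below 0, the while loop runs at most m − 1 times in total. Hence the running time of Compute-Prefix-Function is Θ(m).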
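The for loop beginning in step 5 runs n times, once for each character of S. Steps 1, 2 and 4 take constant time, and step 3, the call to Compute-Prefix-Function, takes Θ(m). The while loop in steps 6 and 7 runs at most n times in total, because each of its iterations decreases q while q increases by at most 1 per iteration of the for loop. Thus the running time of the matching loop is Θ(n), and Θ(m + n) including the prefix-function computation.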
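The Θ(n) bound for the matching loop follows from a telescoping sum. In the derivation below, q_i (the value of q after iteration i of the for loop, with q_0 = 0) and w_i (the number of while-loop iterations during iteration i) are notation introduced here for illustration.

```latex
% Per iteration i of the for loop (steps 5-12 of KMP-Matcher):
%   each pass through step 7 lowers q by at least 1, because \pi[q] < q;
%   step 9 raises q by at most 1; step 12 can only lower q.
q_i \le q_{i-1} - w_i + 1
\quad\Longrightarrow\quad
\sum_{i=1}^{n} w_i \;\le\; \sum_{i=1}^{n} \bigl(q_{i-1} - q_i + 1\bigr) \;=\; n + q_0 - q_n \;\le\; n
```

Since the for loop itself runs n times, the total work is Θ(n). Replacing q by k and the n iterations by the m − 1 iterations of the prefix-function loop gives the corresponding bound of at most m − 1 while-loop iterations for Compute-Prefix-Function.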