Knuth-Morris-Pratt (KMP) Algorithm For Pattern Matching
[top pattern]
[bottom pattern]
Comparing the above two patterns is done systematically. We first take 1 bit of the top pattern and bottom
pattern into consideration, then 2 bits of the top and bottom patterns, and so on.
If we consider only one bit in each of the patterns for the comparison, we don't need most of the bits shown
above and can rewrite the above as:
1xxxxxxx     [top pattern]
 1xxxxxxx    [bottom pattern]
Now with only the first bits in both patterns in mind (i.e. forget about all the x's), we look for an overlap
between the top pattern and the bottom pattern. In the above case, there is no such overlap. Therefore, we
say that when j = 1, next[j] = 0. By j = 1, we mean considering the first bit of each pattern. By next[j] = 0
we mean 0 bits matched.
We've so far compared the first bit of the pattern with itself. Now we compare the first two bits of the
pattern with itself (i.e. for j = 2) as follows:
11xxxxxx     [top pattern]
 11xxxxxx    [bottom pattern]
We look for an overlap between the relevant bits of the above patterns (i.e. the bits that are not x's). We
see that the first bit of the bottom pattern overlaps with the second bit of the top pattern, and this is the
only relevant bit in the overlapping portion of the two patterns. Both bits are 1, so they match. So, when
j = 2, next[j] = 1 (because only 1 bit matched).
Similarly, we check for j = 3:
110xxxxx      [top pattern]
 110xxxxx     [centre pattern]
  110xxxxx    [bottom pattern]
Here we consider the first three bits of the patterns. There are three rows now because we need to make the
first bit in the last row (i.e. the bottom pattern) coincide with bit number j (i.e. 3) in the top pattern.
So now we look at the overlapping bits of the top pattern with each of the other patterns. We can see that
the second and third bits of the top pattern (i.e. the 1 0) overlap with the first two bits of the centre
pattern (i.e. the 1 1). But for a match, ALL these bits have to match, and 1 0 does not match
1 1. Next we check the bottom pattern against the top pattern. Here we see that the third bit of the
top pattern (i.e. 0) overlaps with the first bit of the bottom pattern (i.e. 1). Again this is not a match. So
when j = 3, next[j] = 0 (it's 0 because we couldn't find a match).
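The row-by-row comparison can be sketched in a few lines of Python. This is only an illustration, not part of the original method description: the function name `overlaps` and the choice to store bits as a string of '0' and '1' characters are assumptions made for the example.

```python
def overlaps(prefix):
    """List the overlapping bits for every shifted copy of `prefix`.

    Each entry is (shift, bits_from_top, bits_from_shifted_copy).
    A shift of 1 corresponds to the row directly below the top pattern.
    """
    j = len(prefix)
    result = []
    for shift in range(1, j):
        top_bits = prefix[shift:]        # tail of the top pattern
        lower_bits = prefix[:j - shift]  # head of the shifted copy
        result.append((shift, top_bits, lower_bits))
    return result


# The j = 3 case from the text: neither shifted row matches.
for shift, top_bits, lower_bits in overlaps("110"):
    verdict = "match" if top_bits == lower_bits else "no match"
    print(shift, top_bits, lower_bits, verdict)
```

For "110" this lists the pair 10 / 11 for the centre row and 0 / 1 for the bottom row, neither of which is a match.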
Similarly, for j = 4, we compare the first four bits of all the patterns.
1101xxxx       [top pattern]
 1101xxxx      [second pattern]
  1101xxxx     [third pattern]
   1101xxxx    [bottom pattern]
First, check the overlapping bits of the top pattern with the second pattern. We see that 1 0 1 of the top
pattern overlaps with 1 1 0 of the second pattern. Although the first bits match, we do not consider it a
proper match unless ALL three bits match. We move on to compare the top pattern with the third
pattern. We see that 0 1 of the top pattern overlaps with 1 1 of the third pattern. Still no match, so we
consider the bottom pattern and the top pattern. Here, the last bit 1 of the top pattern matches the
first bit 1 of the bottom pattern. Since there is 1 matching bit, for j = 4, next[j] = 1.
Again, for j = 5, we have:
11011xxx        [top pattern]
 11011xxx       [second pattern]
  11011xxx      [third pattern]
   11011xxx     [fourth pattern]
    11011xxx    [bottom pattern]
Compare top and second: 1 0 1 1 and 1 1 0 1 is not a match. Compare top and third: 0 1 1 and 1 1 0,
still no match. Compare top and fourth: 1 1 and 1 1, alrighty then, we've found a match of 2 bits. If
we continue and compare the top and bottom patterns, we will find a match of 1 bit. But since we already
found a match of 2 bits between the top and fourth patterns, we can ignore the match of 1 bit between the
top and bottom patterns. So for j = 5, next[j] = 2.
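The rule of keeping the longest match can be written as a small brute-force function. This is a sketch under stated assumptions rather than a definitive implementation: the name `next_value` and the string representation of the bits are hypothetical, chosen only for this example. It tries the rows from top to bottom, so the first row that matches gives the longest overlap and the remaining rows can be skipped.

```python
def next_value(pattern, j):
    """Return next[j] for the first j bits of `pattern` by trying every shift."""
    prefix = pattern[:j]
    for shift in range(1, j):
        # Smaller shifts give longer overlaps, so the first match is the longest.
        if prefix[shift:] == prefix[:j - shift]:
            return j - shift
    return 0  # no shifted copy matched


# The five bits worked through in the text.
print([next_value("11011", j) for j in range(1, 6)])  # prints [0, 1, 0, 1, 2]
```

The printed values agree with the results derived by hand for j = 1 to j = 5.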
Do the same for the next 2 steps (i.e. up to j = 7, not j = 8). We can create our next[j] table based on the
above results to get:
j         0   1   2   3   4   5   6   7
next[j]  -1   0   1   0   1   2   2   3
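As a cross-check, the whole table can be produced by a short Python sketch. Two things here are assumptions and do not come from the text. First, the bit string "1101110": its first five bits are the ones used in the walkthrough, and the last two are chosen so that the output agrees with the table above. Second, the function `build_next` uses the standard incremental construction (reusing earlier table entries to fall back) rather than the row-by-row comparison described above. The entry next[0] = -1 is treated as a marker meaning that no bits have been considered yet.

```python
def build_next(pattern):
    """Build the next table for `pattern`, with next[0] = -1 as in the table."""
    n = len(pattern)
    table = [-1] * (n + 1)
    k = -1  # length of the overlap being extended (-1 before any bit is read)
    for j in range(n):
        # Fall back to shorter overlaps until bit j extends one of them.
        while k >= 0 and pattern[k] != pattern[j]:
            k = table[k]
        k += 1
        table[j + 1] = k
    return table


# "1101110" is an assumed bit string, chosen to be consistent with the table.
print(build_next("1101110"))  # prints [-1, 0, 1, 0, 1, 2, 2, 3]
```

Each entry is computed from the previous ones, so the table is built in a single pass over the pattern instead of re-comparing every shifted row for every j.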