String Matching Algorithms L1
• String matching algorithms try to find one or more indices where one or several
strings (patterns) are found in a larger string (text).
This means:
• There is a string array T (the text) whose number of characters is larger than the
number of characters in the string array P.
• P is said to be the pattern array because it contains the pattern of characters to be
searched for in the larger array T.
Naïve Algorithm
• The Naïve Algorithm is also known as the brute-force algorithm.
• It is the simplest of the pattern searching algorithms.
• It compares the pattern (P) with the main string (T) character by character, starting
from every possible position in T.
• This algorithm is useful for smaller texts.
• It does not need any pre-processing phase.
• The algorithm is space efficient: it needs no extra space beyond a few index variables.
• The time complexity of the Naïve Pattern Search method is O(m*n), where m is the size
of the pattern and n is the size of the main string.
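The bullets above can be sketched in a few lines. This is a minimal illustration, assuming Python and 0-based indices (the slides count from 1); the function name naive_search is made up for this example.

```python
def naive_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):      # s is the candidate starting position in T
        j = 0
        # Compare P with T character by character from position s.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                  # every character of P matched
            matches.append(s)
    return matches


print(naive_search("abcdabcabcdf", "abcdf"))  # [7], i.e. index 8 when counting from 1
```

The two nested loops are where the O(m*n) bound comes from: up to n - m + 1 starting positions, each costing up to m comparisons.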
Naïve Algorithm Example Cont.
Index i   1  2  3  4  5  6  7  8  9 10 11 12
String    a  b  c  d  a  b  c  a  b  c  d  f
Pattern:  a  b  c  d  f
Step 1: i = 1, j = 1, pattern value = a
Step 2: i = 2, j = 2, pattern value = b
Step 3: i = 3, j = 3, pattern value = c
Step 4: i = 4, j = 4, pattern value = d
Step 5: i = 5, j = 5, pattern value = f: mismatch
• Because of the mismatch, move j back to index 1 and move i to index 2. In other words,
reset both indices i and j.
Naïve Algorithm Example Cont.
Index i   1  2  3  4  5  6  7  8  9 10 11 12
String    a  b  c  d  a  b  c  a  b  c  d  f
Pattern:  a  b  c  d  f
Step 6: j = 1, pattern value = a: mismatch
Step 7: j = 1, pattern value = a: mismatch
• Move j and i forward together until there is a mismatch.
• In case of a mismatch:
• shift j back to the starting point of the pattern
• i restarts one position after the start of the failed attempt (index 2 after the first attempt)
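To follow the movement of i and j mechanically, the sketch below records every comparison for the same string and pattern. It assumes the standard brute-force order of comparisons and reports 1-based indices like the slides; the name naive_trace is hypothetical.

```python
def naive_trace(text, pattern):
    """Record each comparison as (i, j, matched), using 1-based i and j."""
    n, m = len(text), len(pattern)
    trace = []
    for start in range(n - m + 1):
        for j in range(m):
            matched = text[start + j] == pattern[j]
            trace.append((start + j + 1, j + 1, matched))
            if not matched:
                break               # mismatch: reset j, restart i one position later
        else:
            return trace, start + 1  # inner loop never broke: full match
    return trace, None


trace, position = naive_trace("abcdabcabcdf", "abcdf")
for i, j, matched in trace[:7]:
    print(f"i={i} j={j} {'match' if matched else 'mismatch'}")
print("comparisons:", len(trace), "pattern found at index", position)
```

For this input the pattern is found at index 8 after 19 comparisons; the first five comparisons are the ones walked through in the example.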
Naïve Algorithm
• The naive string-matcher is inefficient because it entirely ignores the information
gained about the text for one value of the shift s (the position in T where the pattern
is aligned) when it considers other values of s.
• What will be the time complexity of Naïve Algorithm?
• What will be the pseudo code for this?
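One way to see the wasted work concretely is to count character comparisons on an unfavourable input. The sketch below is only an illustration; the function name count_comparisons and the test strings are assumptions chosen for this example.

```python
def count_comparisons(text, pattern):
    """Count the character comparisons made by the brute-force matcher."""
    n, m = len(text), len(pattern)
    count = 0
    for s in range(n - m + 1):
        for j in range(m):
            count += 1
            if text[s + j] != pattern[j]:
                break               # everything learned at this shift is thrown away
    return count


# Ten a's searched for "aaab": every shift matches three a's and then fails on b.
print(count_comparisons("a" * 10, "aaab"))  # 28 = 4 comparisons * 7 shifts
```

Each of the 7 shifts re-reads characters that the previous shift had already matched, which is exactly the information the naive method ignores.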
Knuth Morris Pratt (KMP) Algorithm
• This algorithm was conceived by Donald Knuth and Vaughan Pratt and,
independently, by James H. Morris; the three published it jointly in 1977.
• Knuth, Morris and Pratt discovered the first linear-time string-matching algorithm by
analysing the naïve algorithm.
• It keeps the information that the naive approach wastes, namely the information
gathered during the scan of the text.
• By avoiding this waste of information, it achieves a running time of O(n) for the
search itself (O(m + n) including the pre-processing of the pattern).
• The implementation of the Knuth-Morris-Pratt algorithm is efficient because it
minimizes the total number of comparisons of the pattern against the input
string.
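A minimal sketch of the idea, assuming Python and 0-based indices (the slides count from 1). The names prefix_function and kmp_search are illustrative and not taken from the slides.

```python
def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]           # fall back to a shorter matching prefix
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches = []
    j = 0                           # number of pattern characters matched so far
    for i, ch in enumerate(text):   # i only ever moves forward
        while j > 0 and pattern[j] != ch:
            j = pi[j - 1]           # reuse what is already known about the text
        if pattern[j] == ch:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = pi[j - 1]
    return matches


print(prefix_function("ababd"))                   # [0, 0, 1, 2, 0]
print(kmp_search("ababcabcabababd", "ababd"))     # [10], i.e. index 11 when counting from 1
```

The text index i is never decreased; only j falls back, and it does so using the table computed from the pattern alone.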
Knuth Morris Pratt (KMP) Algorithm
Index i   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Array T   a  b  a  b  c  a  b  c  a  b  a  b  a  b  d
Index j   0  1  2  3  4  5
P[j]         a  b  a  b  d
Pi[j]        0  0  1  2  0
Steps for every iteration:
1- Compare T[i] with P[j+1].
2- If they match, move j to j+1 and move i to i+1.
3- If they mismatch and j is not at index 0, move j to the Pi value below the letter at
index j, then go to step 1 without incrementing i.
4- If they mismatch and j is already at index 0, increment i and go to step 1.
Iteration 1 (initial state): i=1, j=0
• Compare T[1] = a with P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 2).
Please note: after this step j=1 and i=2
Iteration 2: i=2, j=1
• Compare T[2] = b with P[2] = b: match.
• Move j (j moves to index 2, as it was on index 1 and we compared j+1).
• Move i (i moves to index 3).
Please note: after this step j=2 and i=3
Iteration 3: i=3, j=2
• Compare T[3] = a with P[3] = a: match.
• Move j (j moves to index 3, as it was on index 2 and we compared j+1).
• Move i (i moves to index 4).
Please note: after this step j=3 and i=4
Iteration 4: i=4, j=3
• Compare T[4] = b with P[4] = b: match.
• Move j (j moves to index 4, as it was on index 3 and we compared j+1).
• Move i (i moves to index 5).
Please note: after this step j=4 and i=5
Knuth Morris Pratt (KMP) Algorithm
Iteration 5: i=5, j=4
• Compare T[5] = c with P[5] = d: mismatch.
• Move j to the Pi value below the letter at index j (here the value below the letter b
at index 4 is 2), so j moves to index 2.
• Go to step 1 and compare again.
Please note: after this step j=2 and i=5; we did not increment i
Iteration 6: i=5, j=2
• Compare T[5] = c with P[3] = a: mismatch (again).
• Move j to the Pi value below the letter at index j (here the value below the letter b
at index 2 is 0), so j moves to index 0.
• Go to step 1 and compare again.
Please note: after this step j=0 and i=5
Iteration 7: i=5, j=0
• Compare T[5] = c with P[1] = a: mismatch (again).
• j is already on index 0, so it cannot move back any further.
• Increment i, then go to step 1 and compare.
Please note: after this step j=0 and i=6; we incremented i
Knuth Morris Pratt (KMP) Algorithm
Iteration 8: i=6, j=0
• Compare T[6] = a with P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 7).
Please note: after this step j=1 and i=7
Iteration 9: i=7, j=1
• Compare T[7] = b with P[2] = b: match.
• Move j (j moves to index 2, as it was on index 1 and we compared j+1).
• Move i (i moves to index 8).
Please note: after this step j=2 and i=8
Knuth Morris Pratt (KMP) Algorithm
Iteration 10: i=8, j=2
• Compare T[8] = c with P[3] = a: mismatch.
• Move j to the Pi value below the letter at index j (here the value below the letter b
at index 2 is 0), so j moves to index 0.
• Go to step 1 and compare again.
Please note: after this step j=0 and i=8; i is not incremented
Iteration 11: i=8, j=0
• Compare T[8] = c with P[1] = a: mismatch.
• j is already on index 0, so it cannot move back any further.
• Increment i, then go to step 1 and compare.
Please note: after this step j=0 and i=9; i is incremented
Knuth Morris Pratt (KMP) Algorithm
Iteration 12: i=9, j=0
• Compare T[9] = a with P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 10).
Please note: after this step j=1 and i=10
Iteration 13: i=10, j=1
• Compare T[10] = b with P[2] = b: match.
• Move j (j moves to index 2).
• Move i (i moves to index 11).
Please note: after this step j=2 and i=11
Iteration 14: i=11, j=2
• Compare T[11] = a with P[3] = a: match.
• Move j (j moves to index 3).
• Move i (i moves to index 12).
Please note: after this step j=3 and i=12
Iteration 15: i=12, j=3
• Compare T[12] = b with P[4] = b: match.
• Move j (j moves to index 4).
• Move i (i moves to index 13).
Please note: after this step j=4 and i=13
Knuth Morris Pratt (KMP) Algorithm
Iteration 16: i=13, j=4
• Compare T[13] = a with P[5] = d: mismatch.
• Move j to the Pi value below the letter at index j (here the value below the letter b
at index 4 is 2), so j moves to index 2.
• Go to step 1 and compare again.
Please note: after this step j=2 and i=13; i is not incremented
Iteration 17: i=13, j=2
• Compare T[13] = a with P[3] = a: match.
• Move j (j moves to index 3).
• Move i (i moves to index 14).
Please note: after this step j=3 and i=14
Iteration 18: i=14, j=3
• Compare T[14] = b with P[4] = b: match.
• Move j (j moves to index 4).
• Move i (i moves to index 15).
Please note: after this step j=4 and i=15
Iteration 19: i=15, j=4
• Compare T[15] = d with P[5] = d: match.
• Move j (j moves to index 5, the last index of the pattern, so the whole pattern has
been found in T at indices 11 to 15).
• Check the end conditions: i has reached its maximum, so the program ends (apply
appropriate boundary conditions).
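The walkthrough can be replayed in code. The sketch below assumes the 1-based convention of the slides (T[i] is compared with P[j+1], and a mismatch sends j to Pi[j]); the name kmp_trace is hypothetical, and the Pi table is entered by hand with a padding entry at index 0.

```python
def kmp_trace(text, pattern, pi):
    """Return the (i, j) pair at the start of every iteration, plus the
    1-based index where the pattern was found (None if it was not)."""
    T = " " + text                  # pad so that T[1] is the first character
    P = " " + pattern
    n, m = len(text), len(pattern)
    i, j = 1, 0
    states = []
    while i <= n:
        states.append((i, j))
        if T[i] == P[j + 1]:        # match: move j and i
            i += 1
            j += 1
            if j == m:              # j reached the last index of the pattern
                return states, i - m
        elif j > 0:                 # mismatch: j falls back, i stays
            j = pi[j]
        else:                       # mismatch with j already at 0: increment i
            i += 1
    return states, None


PI = [0, 0, 0, 1, 2, 0]             # index 0 is padding; PI[1..5] = 0 0 1 2 0
states, found_at = kmp_trace("ababcabcabababd", "ababd", PI)
for number, (i, j) in enumerate(states, start=1):
    print(f"Iteration {number}: i={i}, j={j}")
print("pattern found at index", found_at)
```

Under these assumptions the trace has 19 iterations and reports the pattern at index 11, and i never moves backwards.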
Knuth Morris Pratt (KMP) Algorithm
• Advantages
• The running time of the KMP algorithm is O(m + n), which is very fast; the only extra
space it needs is O(m) for the Pi table.
• O(m) - It is to compute the Pi values for the pattern (array P in example).
• O(n) - It is to compare the pattern to the text (array T in example).
• The algorithm never needs to move backwards in the input text T. This makes the algorithm
good for processing very large files.
• Note why it is said that KMP achieves O(n).
• Disadvantages
• It does not work as well as the size of the alphabet increases, because the chance of a
mismatch becomes higher.
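As a rough check of the O(n) search cost, the sketch below counts the character comparisons made during the search phase. It is an illustration only, written with 0-based indices; the name kmp_comparisons is made up for this example.

```python
def kmp_comparisons(text, pattern):
    """Count the character comparisons made by the KMP search phase."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # O(m) pre-processing: build the table from the pattern alone.
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    # O(n) search: every comparison either advances i or shrinks j.
    i = j = count = 0
    while i < n:
        count += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                j = pi[j - 1]       # keep searching for further occurrences
        elif j > 0:
            j = pi[j - 1]
        else:
            i += 1
    return count


print(kmp_comparisons("ababcabcabababd", "ababd"))  # 19, with n = 15
print(kmp_comparisons("a" * 10, "aaab"))            # 17, with n = 10
```

Since i only increases and j can fall back no more often than it has advanced, the count stays within 2n comparisons for a text of length n.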
Home Assignment
• What are prefix and suffix in the KMP algorithm?
• What is Pi?