
StringMatchingAlgorithmsL1

Uploaded by

Ayman Asif
Copyright
© All Rights Reserved

String Matching

CS-209: Design and Analysis of Algorithm


Instructor: Dr. Maria Anjum
Contents
• Naïve Algorithm
• Knuth Morris Pratt (KMP) Algorithm
• Rabin-Karp Algorithm
• Finite Automata
String Matching Algorithms

• String matching algorithms try to find one or more indices where one or several strings (patterns) occur in a larger string (text).

• Uses of string matching algorithms:
   • They can greatly aid the responsiveness of a text-editing program.
   • They search for particular patterns in DNA sequences.
   • Internet search engines use them to find Web pages relevant to queries.
   • Plagiarism checking in documents
   • Bioinformatics
String Matching Algorithms

- Formal Definition of String Matching Problem

- Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m ≤ n.

This means:
• there is a string array T that contains at least as many characters as string array P.
• P is said to be the pattern array because it contains a pattern of characters to be searched for in the larger array T.
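The definition can be read as a test on one candidate position (a shift s) of P inside T. The following is a minimal Python sketch, not taken from the slides: the helper name occurs_at and the sample strings are assumptions made for illustration, and Python counts from 0 where the slides count from 1.

```python
def occurs_at(text: str, pattern: str, s: int) -> bool:
    """Return True if pattern occurs in text with shift s (0 <= s <= n - m)."""
    n, m = len(text), len(pattern)
    if s < 0 or s + m > n:
        return False                      # the pattern would run off the text
    return text[s:s + m] == pattern       # compare T[s+1..s+m] with P[1..m]


# Hypothetical sample data, chosen only for this illustration.
text = "abcabc"
pattern = "bc"
valid_shifts = [s for s in range(len(text) - len(pattern) + 1)
                if occurs_at(text, pattern, s)]
print(valid_shifts)  # [1, 4]
```

Solving the string matching problem then amounts to finding every shift for which this test succeeds.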
Naïve Algorithm
• The Naïve Algorithm is also known as the brute-force algorithm.
• It is the simplest of the pattern searching algorithms.
• It checks every position of the main string (T) against the pattern (P), character by character.
• This algorithm is useful for smaller texts.
• It does not need any pre-processing phase.
• The algorithm is space efficient: it takes no extra space.
• The worst-case time complexity of the naïve pattern search is O(m*n), where m is the size of the pattern and n is the size of the main string.
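The points above can be sketched in a few lines of Python. This is one possible reading of the brute-force method, not the instructor's pseudo code: the function name naive_search and the sample strings are assumptions, and the result is converted to the 1-based indices used on the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 1-based index of text at which pattern starts."""
    n, m = len(text), len(pattern)
    matches = []
    for start in range(n - m + 1):        # try every alignment of the pattern
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1                        # characters agree: keep going
        if j == m:                        # the whole pattern matched
            matches.append(start + 1)     # report a 1-based index
    return matches


# Hypothetical sample data, chosen only for this illustration.
print(naive_search("aabaab", "aab"))  # [1, 4]
```

The outer loop runs n - m + 1 times and the inner loop up to m times, which is where the O(m*n) bound comes from; no table or other extra storage is built beforehand.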
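Naïve Algorithm Example
Text (string), scanned with index i:

Index  1  2  3  4  5  6  7  8  9 10 11 12
T[i]   a  b  c  d  a  b  c  a  b  c  d  f

Pattern, scanned with index j: a b c d f

• Move j and i until there is a mismatch.
• In case of a mismatch:
   • shift j back to the starting point of the pattern;
   • i restarts one position after the start of the current attempt (index 2 after the first attempt).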
Naïve Algorithm Example Cont.
Step 1: i = 1, j = 1
• T[1] = a and P[1] = a: no mismatch, therefore move i and j, i.e. i++ and j++.
Naïve Algorithm Example Cont.
Step 2: i = 2, j = 2
• T[2] = b and P[2] = b: no mismatch, therefore move i and j, i.e. i++ and j++.
Naïve Algorithm Example Cont.
Step 3: i = 3, j = 3
• T[3] = c and P[3] = c: no mismatch, therefore move i and j, i.e. i++ and j++.
Naïve Algorithm Example Cont.
Step 4: i = 4, j = 4
• T[4] = d and P[4] = d: no mismatch, therefore move i and j, i.e. i++ and j++.
Naïve Algorithm Example Cont.
Step 5: i = 5, j = 5
• T[5] = a and P[5] = f: mismatch.
• Move j to index 1 and move i to index 2. In other words, reset index i and j.
Naïve Algorithm Example Cont.
Step 6: i = 2, j = 1
• T[2] = b and P[1] = a: mismatch.
• Move i to the next index. Guess what will be the index for i?
• Move j to index 1. In other words, reset index i and j.
Naïve Algorithm Example Cont.
Step 7: i = 3, j = 1
• T[3] = c and P[1] = a: mismatch.
• Move i to the next index. Guess what will be the index for i?
• Move j to index 1. In other words, reset index i and j.
• Please complete the rest of the iterations.
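To check a hand-completed set of iterations, the walkthrough can be replayed in code. The sketch below is an illustration under assumptions: the generator name naive_trace is invented here, it uses the 1-based i and j of the slides, and it stops at the first full match.

```python
def naive_trace(text, pattern):
    """Yield (i, j, matched) for each character comparison, 1-based as on the slides."""
    n, m = len(text), len(pattern)
    for start in range(1, n - m + 2):     # start = text index aligned with P[1]
        for j in range(1, m + 1):
            i = start + j - 1
            matched = text[i - 1] == pattern[j - 1]
            yield i, j, matched
            if not matched:
                break                     # reset j, retry from the next start
        else:
            return                        # no break: the whole pattern matched


steps = list(naive_trace("abcdabcabcdf", "abcdf"))
for i, j, matched in steps:
    print(f"i={i:2d} j={j} {'match' if matched else 'mismatch'}")
```

The first seven tuples agree with steps 1 to 7 above. In total the loop makes 19 comparisons and finishes with i = 12, j = 5, so the pattern is found starting at index 8 of the text.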
Naïve Algorithm
• The naive string-matcher is inefficient because it entirely ignores information gained about the text for one value of the shift s (the position at which the pattern is aligned against the text) when it considers other values of s.
• What will be the time complexity of the Naïve Algorithm?
• What will be the pseudo code for this?
Knuth Morris Pratt (KMP) Algorithm

• This algorithm was conceived by Donald Knuth and Vaughan Pratt and, independently, by James H. Morris; the three published it jointly in 1977.
• Knuth, Morris and Pratt discovered the first linear-time string-matching algorithm by analysing the naïve algorithm.
• It keeps the information gathered during the scan of the text, which the naïve approach throws away.
• By avoiding this waste of information, it achieves a running time of O(n) for the scan of the text, after preprocessing the pattern.
• The implementation of the Knuth-Morris-Pratt algorithm is efficient because it minimizes the total number of comparisons of the pattern against the input string.
Knuth Morris Pratt (KMP) Algorithm

• Compares from left to right.
• Can shift the pattern by more than one position.
• Preprocesses the pattern to avoid trivial comparisons.
• Avoids recomputing matches.
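The saving in comparisons can be made concrete by counting them. The sketch below is an illustration under assumptions rather than a benchmark: the function names and the sample input (a text of twenty a's and the pattern aaab) are invented here, the pattern is assumed non-empty, and only text-against-pattern comparisons are counted, not the preprocessing.

```python
def naive_comparisons(text, pattern):
    """Count character comparisons made by the brute-force matcher."""
    n, m = len(text), len(pattern)
    count = 0
    for start in range(n - m + 1):
        for j in range(m):
            count += 1
            if text[start + j] != pattern[j]:
                break
    return count


def kmp_comparisons(text, pattern):
    """Count text-vs-pattern comparisons made by a standard KMP scan."""
    m = len(pattern)
    pi = [0] * m                          # prefix table, 0-based
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    count = 0
    j = 0                                 # number of pattern characters matched
    for ch in text:                       # the text index never moves backwards
        while True:
            count += 1
            if pattern[j] == ch:
                j += 1
                break
            if j == 0:
                break
            j = pi[j - 1]                 # fall back inside the pattern only
        if j == m:
            j = pi[j - 1]                 # full match: keep scanning
    return count


text, pattern = "a" * 20, "aaab"          # hypothetical worst-case style input
print(naive_comparisons(text, pattern))   # 68
print(kmp_comparisons(text, pattern))     # 37
```

On this input the brute-force count is (n - m + 1) * m = 68, while the KMP scan stays below 2n = 40 comparisons.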
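Knuth Morris Pratt (KMP) Algorithm
Text (array T), scanned with index i:

Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
T[i]   a  b  a  b  c  a  b  c  a  b  a  b  a  b  d

Pattern (array P) and its prefix table Pi, scanned with index j (initially j = 0):

Index j  0  1  2  3  4  5
P[j]        a  b  a  b  d
Pi[j]       0  0  1  2  0

Steps:
1. Compare T[i] with P[j+1].
2. If they match, then
3. move i and
4. move j.
5. Repeat steps 1-4 until there is a mismatch.
6. On a mismatch, move j to the index written below the letter it is on, i.e. j = Pi[j], and go to step 1.
• Repeat until the next mismatch, then move j to the index below the letter again.
• If j has reached zero and cannot go back, move i.

Index 0 is not assigned to any letter of the pattern. The first a and the first b have Pi = 0 because neither letter appeared at any previous index.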
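The Pi row can be computed from the pattern alone. The following Python sketch uses the standard prefix-function construction; the function name prefix_table is an assumption, and the list is 0-based, so pi[q] in the code corresponds to Pi[q+1] on the slide.

```python
def prefix_table(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of pattern[:q+1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                 # length of the current matched prefix
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                 # fall back to a shorter prefix
        if pattern[k] == pattern[q]:
            k += 1                        # the prefix extends by one character
        pi[q] = k
    return pi


print(prefix_table("ababd"))  # [0, 0, 1, 2, 0]
```

The output reproduces the Pi row shown above for the pattern a b a b d.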
Knuth Morris Pratt (KMP) Algorithm
Iteration 1 (initial state): i = 1, j = 0
• Compare T[i] = T[1] = a with P[j+1] = P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 2).
Please note: after this step j = 1 and i = 2.
Knuth Morris Pratt (KMP) Algorithm
Iteration 2: i = 2, j = 1
• Compare T[i] = T[2] = b with P[j+1] = P[2] = b: match.
• Move j (j moves to index 2, as it was on index 1 and we compared j+1).
• Move i (i moves to index 3).
Please note: after this step j = 2 and i = 3.
Knuth Morris Pratt (KMP) Algorithm
Iteration 3: i = 3, j = 2
• Compare T[i] = T[3] = a with P[j+1] = P[3] = a: match.
• Move j (j moves to index 3, as it was on index 2 and we compared j+1).
• Move i (i moves to index 4).
Please note: after this step j = 3 and i = 4.
Knuth Morris Pratt (KMP) Algorithm
Iteration 4: i = 4, j = 3
• Compare T[i] = T[4] = b with P[j+1] = P[4] = b: match.
• Move j (j moves to index 4, as it was on index 3 and we compared j+1).
• Move i (i moves to index 5).
Please note: after this step j = 4 and i = 5.
Knuth Morris Pratt (KMP) Algorithm
Iteration 5: i = 5, j = 4
• Compare T[i] = T[5] = c with P[j+1] = P[5] = d: mismatch.
• Move j to the index below the letter it is on: j is on P[4] = b and the value below it is Pi[4] = 2, so j moves to index 2.
• Go to step 1 and compare again.
Please note: after this step j = 2 and i = 5; we did not increment i.
Knuth Morris Pratt (KMP) Algorithm
Iteration 6: i = 5, j = 2
• Compare T[i] = T[5] = c with P[j+1] = P[3] = a: mismatch (again).
• Move j to the index below the letter it is on: j is on P[2] = b and the value below it is Pi[2] = 0, so j moves to index 0.
• Go to step 1 and compare again.
Please note: after this step j = 0 and i = 5.
Knuth Morris Pratt (KMP) Algorithm
Iteration 7: i = 5, j = 0
• Compare T[i] = T[5] = c with P[j+1] = P[1] = a: mismatch (again).
• j is already on index 0 and cannot go back any further.
• Increment i, then go to step 1 and compare.
Please note: after this step j = 0 and i = 6; we incremented i.
Knuth Morris Pratt (KMP) Algorithm
Iteration 8: i = 6, j = 0
• Compare T[i] = T[6] = a with P[j+1] = P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 7).
Please note: after this step j = 1 and i = 7.
Knuth Morris Pratt (KMP) Algorithm
Iteration 9: i = 7, j = 1
• Compare T[i] = T[7] = b with P[j+1] = P[2] = b: match.
• Move j (j moves to index 2, as it was on index 1 and we compared j+1).
• Move i (i moves to index 8).
Please note: after this step j = 2 and i = 8.
Knuth Morris Pratt (KMP) Algorithm
Iteration 10: i = 8, j = 2
• Compare T[i] = T[8] = c with P[j+1] = P[3] = a: mismatch.
• Move j to the index below the letter it is on: j is on P[2] = b and the value below it is Pi[2] = 0, so j moves to index 0.
• Go to step 1 and compare again.
Please note: after this step j = 0 and i = 8.
Knuth Morris Pratt (KMP) Algorithm
Iteration 11: i = 8, j = 0
• Compare T[i] = T[8] = c with P[j+1] = P[1] = a: mismatch.
• j is already on index 0 and cannot go back any further.
• Increment i, then go to step 1 and compare.
Please note: after this step j = 0 and i = 9; i was incremented.
Knuth Morris Pratt (KMP) Algorithm
Iteration 12: i = 9, j = 0
• Compare T[i] = T[9] = a with P[j+1] = P[1] = a: match.
• Move j (j moves to index 1, as it was on index 0 and we compared j+1).
• Move i (i moves to index 10).
Please note: after this step j = 1 and i = 10.
Knuth Morris Pratt (KMP) Algorithm
Iteration 13: i = 10, j = 1
• Compare T[i] = T[10] = b with P[j+1] = P[2] = b: match.
• Move j (j moves to index 2).
• Move i (i moves to index 11).
Please note: after this step j = 2 and i = 11.
Knuth Morris Pratt (KMP) Algorithm
Iteration 14: i = 11, j = 2
• Compare T[i] = T[11] = a with P[j+1] = P[3] = a: match.
• Move j (j moves to index 3).
• Move i (i moves to index 12).
Please note: after this step j = 3 and i = 12.
Knuth Morris Pratt (KMP) Algorithm
Iteration 15: i = 12, j = 3
• Compare T[i] = T[12] = b with P[j+1] = P[4] = b: match.
• Move j (j moves to index 4).
• Move i (i moves to index 13).
Please note: after this step j = 4 and i = 13.
Knuth Morris Pratt (KMP) Algorithm
Iteration 16: i = 13, j = 4
• Compare T[i] = T[13] = a with P[j+1] = P[5] = d: mismatch.
• Move j to the index below the letter it is on: j is on P[4] = b and the value below it is Pi[4] = 2, so j moves to index 2.
• Go to step 1 and compare again.
Please note: after this step j = 2 and i = 13.
Knuth Morris Pratt (KMP) Algorithm
Iteration 17: i = 13, j = 2
• Compare T[i] = T[13] = a with P[j+1] = P[3] = a: match.
• Move j (j moves to index 3).
• Move i (i moves to index 14).
Please note: after this step j = 3 and i = 14.
Knuth Morris Pratt (KMP) Algorithm
Iteration 18: i = 14, j = 3
• Compare T[i] = T[14] = b with P[j+1] = P[4] = b: match.
• Move j (j moves to index 4).
• Move i (i moves to index 15).
Please note: after this step j = 4 and i = 15.
Knuth Morris Pratt (KMP) Algorithm
Iteration 19: i = 15, j = 4
• Compare T[i] = T[15] = d with P[j+1] = P[5] = d: match.
• Move j (j moves to index 5) and move i.
• j has reached the end of the pattern, so the pattern is found in T at indices 11 to 15.
• Check the end conditions: i has reached its maximum, so the program ends (apply appropriate boundary conditions).
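The whole walkthrough can be replayed in code. This is a sketch under assumptions, not the book's pseudo code: the name kmp_trace is invented here, the Pi values are copied from the slides into a list with an unused entry at index 0 so that the slide indices apply directly, and the loop stops at the first full match.

```python
PI = [0, 0, 0, 1, 2, 0]   # PI[1..5] = 0 0 1 2 0 from the slides; PI[0] is unused


def kmp_trace(text, pattern, pi):
    """Return (steps, start): every (i, j, matched) comparison and the
    1-based start of the first match, or None if there is none."""
    n, m = len(text), len(pattern)
    i, j = 1, 0                           # 1-based i, j as on the slides
    steps = []
    while i <= n:
        matched = text[i - 1] == pattern[j]   # compares T[i] with P[j+1]
        steps.append((i, j, matched))
        if matched:
            i += 1
            j += 1
            if j == m:
                return steps, i - m       # whole pattern matched
        elif j > 0:
            j = pi[j]                     # move j to the index below its letter
        else:
            i += 1                        # j is at 0 and cannot go back
    return steps, None


steps, start = kmp_trace("ababcabcabababd", "ababd", PI)
for number, (i, j, matched) in enumerate(steps, 1):
    print(f"iteration {number:2d}: i={i:2d} j={j} {'match' if matched else 'mismatch'}")
print("pattern starts at index", start)
```

The run takes 19 comparisons, one per iteration above, and reports the match starting at index 11. Note that i never decreases, which is why the text never has to be re-read.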
Knuth Morris Pratt (KMP) Algorithm

• Advantages
   • The running time of the KMP algorithm is optimal, O(m + n), which is very fast; the only extra space is the O(m) prefix table.
   • O(m): time to compute the prefix values Pi from the pattern (array P in the example).
   • O(n): time to compare the pattern against the text (array T in the example).
   • The algorithm never needs to move backwards in the input text T, which makes it good for processing very large files.
   • Note why KMP is said to achieve O(n): since m ≤ n, O(m + n) = O(n).

• Disadvantages
   • It does not work as well when the size of the alphabet increases, because there are more chances of a mismatch.
• What are prefix and suffix in the KMP algorithm?
• What is pi?
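One way to explore the last two questions is to compute pi directly from its usual definition: for each prefix of the pattern, the length of the longest proper prefix that is also a suffix. The sketch below is a deliberately slow, definition-level illustration; the helper name longest_border is an assumption.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for length in range(len(s) - 1, 0, -1):   # try the longest candidates first
        if s[:length] == s[-length:]:
            return length
    return 0


pattern = "ababd"
pi = [longest_border(pattern[:k]) for k in range(1, len(pattern) + 1)]
print(pi)  # [0, 0, 1, 2, 0]
```

For example, the prefix "abab" has the proper prefix "ab" that is also its suffix, which gives the value 2 under the second b in the walkthrough table.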
Home Assignment

• What will be the time complexity of the Naïve Algorithm and the KMP algorithm?
• What will be the pseudo code for these algorithms?
• Book exercise 32.1-1.
• Book example for KMP algorithm.
References
• Book: Introduction to Algorithms, 3rd edition, chapter on String Matching
• https://home.cse.ust.hk/~dekai/271/notes/L16/L16.pdf
• https://www.youtube.com/watch?v=V5-7GzOfADQ
• http://cs.indstate.edu/~kmandumula/abstract.pdf
• https://www.youtube.com/watch?v=qQ8vS2btsxI (check for collusion)
