
StringMatchingAlgorithmsL1

Uploaded by

Ayman Asif

String Matching

CS-209: Design and Analysis of Algorithm


Instructor: Dr. Maria Anjum
Contents
• Naïve Algorithm
• Knuth Morris Pratt (KMP) Algorithm
• Rabin-Karp Algorithm
• Finite Automata
String Matching Algorithms

• String matching algorithms try to find one or more indices where one or several
strings (patterns) occur in a larger string (text).

• Uses of string matching algorithms
  • They can greatly improve the responsiveness of text-editing programs.
  • They search for particular patterns in DNA sequences.
  • Internet search engines use them to find Web pages relevant to queries.
  • Plagiarism checking in documents
  • Bioinformatics
String Matching Algorithms

- Formal definition of the string matching problem

- Assume the text is an array T[1..n] of length n and the pattern is an array
P[1..m] of length m ≤ n.

This means:
• there is a string array T that contains at least as many characters as
string array P.
• P is said to be the pattern array because it contains a pattern of characters to be
searched for in the larger array T.
• P occurs with shift s in T if T[s+1..s+m] = P[1..m], where 0 ≤ s ≤ n - m.
Naïve Algorithm
• The naïve algorithm is also known as the brute-force algorithm.
• It is the simplest of the pattern searching algorithms.
• It compares the pattern (P) against the main string (T) at every possible starting position.
• This algorithm is useful for smaller texts.
• It does not need any pre-processing phase.
• The algorithm is space efficient: it needs no extra space beyond a few index variables.
• The time complexity of the naïve pattern search method is O(m*n), where m is the size
of the pattern and n is the size of the main string.
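The bullets above can be sketched in Python as follows. This is a minimal illustration, not the course's official pseudocode: the function name `naive_search` is an assumption, and the result is converted to the 1-based indexing used on the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 1-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every possible starting position (shift) s = 0 .. n - m.
    for s in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the pattern is exhausted.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(s + 1)  # convert to the slides' 1-based indexing
    return matches


print(naive_search("abcdabcabcdf", "abcdf"))  # [8]
```

The two nested loops are where the O(m*n) bound comes from: up to n - m + 1 starting positions, each costing at most m comparisons.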
Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f

Pattern: a b c d f
i is the index into the string and j is the index into the pattern; both start at index 1.

• Move j and i until there is a mismatch.
• In case of mismatch
  • shift j to the starting point
  • i will start from index 2
Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 1
Index i = 1, string value = a
Index j = 1, pattern value = a

No mismatch, therefore move i and j, i.e. i++ and j++


Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 2
Index i = 2, string value = b
Index j = 2, pattern value = b

No mismatch, therefore move i and j, i.e. i++ and j++


Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 3
Index i = 3, string value = c
Index j = 3, pattern value = c

No mismatch, therefore move i and j, i.e. i++ and j++


Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 4
Index i = 4, string value = d
Index j = 4, pattern value = d

No mismatch, therefore move i and j, i.e. i++ and j++


Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 5
Index i = 5, string value = a
Index j = 5, pattern value = f

Mismatch, therefore reset indices i and j:
• move j to index 1
• move i to index 2
Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 6
Index i = 2, string value = b
Index j = 1, pattern value = a

Mismatch, therefore reset indices i and j:
• move j to index 1
• move i to the next index. Guess what will be the index for i?
Naïve Algorithm Example Cont.
Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f
Pattern: a b c d f

Step 7
Index i = 3, string value = c
Index j = 1, pattern value = a

Mismatch, therefore reset indices i and j:
• move j to index 1
• move i to the next index. Guess what will be the index for i?

Please complete the rest of the iterations.
Naïve Algorithm Example

Index:   1 2 3 4 5 6 7 8 9 10 11 12
String:  a b c d a b c a b c  d  f

Pattern: a b c d f

• Move j and i until there is a mismatch.
• In case of mismatch
  • shift j to the starting point
  • i will start from index 2
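For readers who want to check the remaining iterations, the i/j bookkeeping of this example can be replayed in a few lines of Python. This is only a sketch: the name `naive_trace` and the helper variable `start` (the index where the current attempt began) are assumptions, and a non-empty pattern is assumed.

```python
def naive_trace(text: str, pattern: str):
    """Replay the slides' i/j moves; return (1-based match index, comparisons)."""
    n, m = len(text), len(pattern)  # assumes m >= 1
    start = 1        # index in the text where the current attempt began
    i, j = 1, 1      # i walks the text, j walks the pattern (both 1-based)
    comparisons = 0
    while start <= n - m + 1:
        comparisons += 1
        if text[i - 1] == pattern[j - 1]:
            if j == m:                 # last pattern character matched
                return start, comparisons
            i += 1                     # no mismatch: i++ and j++
            j += 1
        else:
            start += 1                 # mismatch: reset j, restart i one further
            i, j = start, 1
    return None, comparisons


print(naive_trace("abcdabcabcdf", "abcdf"))  # (8, 19)
```

On the example string the pattern is found starting at index 8 after 19 character comparisons.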
Naïve Algorithm
• The naïve string-matcher is inefficient because it entirely ignores information
gained about the text for one value of the shift s when it considers other values of s.
• What will be the time complexity of the naïve algorithm?
• What will be the pseudocode for this?
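One way to explore the time complexity question is to count character comparisons on an adversarial input. The sketch below assumes a hypothetical helper `count_comparisons` and a worst-case input made of repeated `a` characters with a pattern that fails only on its last character; it counts comparisons over all shifts rather than stopping at the first match.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons the naive matcher makes over all shifts."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break  # mismatch: give up on this shift
    return comparisons


n, m = 12, 5
worst = count_comparisons("a" * n, "a" * (m - 1) + "b")
print(worst, (n - m + 1) * m)  # 40 40
```

Every one of the n - m + 1 shifts costs m comparisons here, which is the O(m*n) behaviour stated earlier.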
Knuth Morris Pratt (KMP) Algorithm

• This algorithm was conceived by Donald Knuth and Vaughan Pratt and
independently by James H. Morris; the three published it jointly in 1977.
• Knuth, Morris and Pratt discovered the first linear-time string-matching algorithm by
analysing the naïve algorithm.
• It keeps the information gathered during the scan of the text, which the naïve
approach wastes.
• By avoiding this waste of information, it achieves a matching time of O(n), after O(m)
preprocessing of the pattern.
• The Knuth-Morris-Pratt algorithm is efficient because it keeps the total number of
comparisons of the pattern against the input string linear in n.
Knuth Morris Pratt (KMP) Algorithm

• Compares from left to right.
• Can shift the pattern by more than one position.
• Preprocesses the pattern to avoid redundant comparisons.
• Avoids recomputing matches.
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

j = 0
1. Compare T[i] with P[j+1]
2. If they match then
3. Move i
4. Move j
5. Repeat steps 1-4 until there is a mismatch
6. On a mismatch, move j to the index written below the letter at position j (the Pi value)
• Go to step 1
• Repeat until mismatch
• Move j to the index below the letter
• If j has reached zero and cannot go back, move i

Index 0 is not assigned to any letter, and Pi is 0 for the first a and b because
neither letter appeared at any previous index.
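The slide gives the Pi row for the pattern but not how it is obtained. The sketch below uses the standard textbook construction, which may differ from how the table was built in class. The name `prefix_function` and the padding entry at index 0 (added so that list positions line up with the 1-based table) are assumptions.

```python
def prefix_function(pattern: str) -> list[int]:
    """Return Pi with a dummy entry at index 0, so pi[q] matches the 1-based table."""
    m = len(pattern)
    pi = [0] * (m + 1)   # pi[0] is unused padding
    k = 0                # length of the prefix matched so far
    for q in range(2, m + 1):
        # Fall back while the next pattern character does not extend the prefix.
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


print(prefix_function("ababd")[1:])  # [0, 0, 1, 2, 0]
```

The printed row agrees with the Pi values in the table above. Each value says how many leading pattern characters are still known to match after a mismatch, which is why j can jump back without moving i.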
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 1 (initial state)
i = 1, j = 0
1- Compare T[i] with P[j+1]
   i = 1, value = a
   j+1 = 1, value = a
2- Match
3- Move j (j moves to index 1, as it was on index 0 and we compared j+1)
4- Move i (i moves to index 2)

Please note: after this step j = 1 and i = 2
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 2
i = 2, j = 1
1- Compare T[i] with P[j+1]
   i = 2, value = b
   j+1 = 2, value = b
2- Match
3- Move j (j moves to index 2, as it was on index 1 and we compared j+1)
4- Move i (i moves to index 3)

Please note: after this step j = 2 and i = 3
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 3
i = 3, j = 2
1- Compare T[i] with P[j+1]
   i = 3, value = a
   j+1 = 3, value = a
2- Match
3- Move j (j moves to index 3, as it was on index 2 and we compared j+1)
4- Move i (i moves to index 4)

Please note: after this step j = 3 and i = 4
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 4
i = 4, j = 3
1- Compare T[i] with P[j+1]
   i = 4, value = b
   j+1 = 4, value = b
2- Match
3- Move j (j moves to index 4, as it was on index 3 and we compared j+1)
4- Move i (i moves to index 5)

Please note: after this step j = 4 and i = 5
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 5
i = 5, j = 4
1- Compare T[i] with P[j+1]
   i = 5, value = c
   j+1 = 5, value = d
5- Mismatch
6- Move j to the index below the letter (the value below letter b at index 4 is 2,
   so j moves to index 2)
7- Go to step 1 and compare

Please note: after this step j = 2 and i = 5; we did not increment i
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 6
i = 5, j = 2
1- Compare T[i] with P[j+1]
   i = 5, value = c
   j+1 = 3, value = a
5- Mismatch (again)
6- Move j to the index below the letter (the value below letter b at index 2 is 0,
   so j moves to index 0)
7- Go to step 1 and compare

Please note: after this step j = 0 and i = 5
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 7
i = 5, j = 0
1- Compare T[i] with P[j+1]
   i = 5, value = c
   j+1 = 1, value = a
5- Mismatch (again)
6- Move j to the index below the letter (j is already on index 0, so it cannot go
   back any further)
8- Increment i
9- Go to step 1 and compare

Please note: after this step j = 0 and i = 6; we incremented i
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 8
i = 6, j = 0
1- Compare T[i] with P[j+1]
   i = 6, value = a
   j+1 = 1, value = a
2- Match
3- Move j (j moves to index 1, as it was on index 0 and we compared j+1)
4- Move i (i moves to index 7)

Please note: after this step j = 1 and i = 7
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 9
i = 7, j = 1
1- Compare T[i] with P[j+1]
   i = 7, value = b
   j+1 = 2, value = b
2- Match
3- Move j (j moves to index 2, as it was on index 1 and we compared j+1)
4- Move i (i moves to index 8)

Please note: after this step j = 2 and i = 8
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 10
i = 8, j = 2
1- Compare T[i] with P[j+1]
   i = 8, value = c
   j+1 = 3, value = a
5- Mismatch
6- Move j to the index below the letter (the value below letter b at index 2 is 0,
   so j moves to index 0)
7- Go to step 1 and compare

Please note: after this step j = 0 and i = 8
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 11
i = 8, j = 0
1- Compare T[i] with P[j+1]
   i = 8, value = c
   j+1 = 1, value = a
5- Mismatch
6- Move j to the index below the letter (j is already at index 0, so it stays there)
8- Increment i
9- Go to step 1 and compare

Please note: after this step j = 0 and i = 9; i was incremented
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 12
i = 9, j = 0
1- Compare T[i] with P[j+1]
   i = 9, value = a
   j+1 = 1, value = a
2- Match
3- Move j (j moves to index 1, as it was on index 0 and we compared j+1)
4- Move i (i moves to index 10)

Please note: after this step j = 1 and i = 10
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 13
i = 10, j = 1
1- Compare T[i] with P[j+1]
   i = 10, value = b
   j+1 = 2, value = b
2- Match
3- Move j (j moves to index 2)
4- Move i (i moves to index 11)

Please note: after this step j = 2 and i = 11
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 14
i = 11, j = 2
1- Compare T[i] with P[j+1]
   i = 11, value = a
   j+1 = 3, value = a
2- Match
3- Move j (j moves to index 3)
4- Move i (i moves to index 12)

Please note: after this step j = 3 and i = 12
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 15
i = 12, j = 3
1- Compare T[i] with P[j+1]
   i = 12, value = b
   j+1 = 4, value = b
2- Match
3- Move j (j moves to index 4)
4- Move i (i moves to index 13)

Please note: after this step j = 4 and i = 13
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 16
i = 13, j = 4
1- Compare T[i] with P[j+1]
   i = 13, value = a
   j+1 = 5, value = d
5- Mismatch
6- Move j to the index below the letter (the value below letter b at index 4 is 2,
   so j moves to index 2)
7- Go to step 1 and compare

Please note: after this step j = 2 and i = 13
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 17
i = 13, j = 2
1- Compare T[i] with P[j+1]
   i = 13, value = a
   j+1 = 3, value = a
2- Match
3- Move j (j moves to index 3)
4- Move i (i moves to index 14)

Please note: after this step j = 3 and i = 14
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 18
i = 14, j = 3
1- Compare T[i] with P[j+1]
   i = 14, value = b
   j+1 = 4, value = b
2- Match
3- Move j (j moves to index 4)
4- Move i (i moves to index 15)

Please note: after this step j = 4 and i = 15
Knuth Morris Pratt (KMP) Algorithm
Index i:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Array T:  a b a b c a b c a b  a  b  a  b  d

Pattern:  a b a b d
Index j:  0 1 2 3 4 5
P[j]:       a b a b d
Pi[j]:      0 0 1 2 0

Iteration 19
i = 15, j = 4
1- Compare T[i] with P[j+1]
   i = 15, value = d
   j+1 = 5, value = d
2- Match
3- Move j (j moves to index 5)
4- Move i (i moves past the end of the text)
10- Check the end conditions: j has reached 5, the length of the pattern, so the pattern
    is found at T[11..15]; i has reached its maximum, so the program ends (apply
    appropriate boundary conditions).
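The walkthrough above can be replayed with the following Python sketch. It follows the slides' convention (i is the 1-based text index, j is the number of pattern characters matched so far), but the function name `kmp_search`, the comparison counter, and the choice to continue searching after a match are assumptions made for illustration. A non-empty pattern is assumed.

```python
def kmp_search(text: str, pattern: str):
    """Return (1-based match positions, number of character comparisons)."""
    n, m = len(text), len(pattern)  # assumes m >= 1
    # Build the Pi table; index 0 is padding so indices match the slides.
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k

    matches, comparisons = [], 0
    i, j = 1, 0                        # i: text index, j: characters matched so far
    while i <= n:
        comparisons += 1
        if text[i - 1] == pattern[j]:  # compare T[i] with P[j+1]
            i += 1
            j += 1
            if j == m:                 # whole pattern matched
                matches.append(i - m)  # 1-based start of the match
                j = pi[j]
        elif j > 0:
            j = pi[j]                  # fall back in the pattern; i does not move
        else:
            i += 1                     # j is already 0, so advance i
    return matches, comparisons


print(kmp_search("ababcabcabababd", "ababd"))  # ([11], 19)
```

The pattern is reported at index 11, and the 19 comparisons correspond to the 19 iterations of the walkthrough. Note that i never decreases, which is why the text is scanned only once.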
Knuth Morris Pratt (KMP) Algorithm

• Advantages
  • The running time of the KMP algorithm is optimal, O(m + n), i.e. linear in the input
  size; the extra space is O(m) for the Pi table.
    • O(m) - to compute the Pi values from the pattern (array P in the example).
    • O(n) - to compare the pattern to the text (array T in the example).
  • The algorithm never needs to move backwards in the input text T. This makes the
  algorithm good for processing very large files.
  • Note why KMP is said to achieve O(n): since m ≤ n, O(m + n) = O(n).

• Disadvantages
  • It does not work as well as the size of the alphabet increases, because the chance
  of a mismatch grows.
• What are prefix and suffix in the KMP algorithm?
• What is pi?
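As a starting point for the last two questions, the sketch below computes Pi straight from its usual textbook definition: Pi[q] is the length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q]. The name `pi_by_definition` is an assumption, and this brute-force version is meant only to illustrate the definition, not to be efficient.

```python
def pi_by_definition(pattern: str) -> list[int]:
    """Pi[q] = length of the longest proper prefix of P[1..q] that is also its suffix."""
    result = []
    for q in range(1, len(pattern) + 1):
        head = pattern[:q]            # P[1..q]
        best = 0
        for k in range(1, q):         # proper prefix: strictly shorter than head
            if head[:k] == head[-k:]: # prefix of length k equals suffix of length k
                best = k
        result.append(best)
    return result


print(pi_by_definition("ababd"))  # [0, 0, 1, 2, 0]
```

For example, for P[1..4] = abab the prefix ab is also a suffix, so Pi[4] = 2, which is the value used when j falls back from index 4 in the walkthrough.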
Home Assignment

• What will be the time complexity of the naïve algorithm and the KMP algorithm?
• What will be the pseudocode for these algorithms?
• Book exercise 32.1-1.
• Book example for KMP algorithm.
References
• Book: Introduction to Algorithms, 3rd edition, chapter "String Matching"
• https://home.cse.ust.hk/~dekai/271/notes/L16/L16.pdf
• https://www.youtube.com/watch?v=V5-7GzOfADQ
• http://cs.indstate.edu/~kmandumula/abstract.pdf
• https://www.youtube.com/watch?v=qQ8vS2btsxI (check for collusion)
