
String Matching 2019

The document discusses string matching algorithms. It begins by formalizing the string matching problem, where the text is an array T of characters and the pattern is an array P of characters. It then describes the naive string matching algorithm, which checks the pattern against every substring of the text and takes O(n^2) time in the worst case. Next, it introduces the Rabin-Karp algorithm, which compares the hash of the pattern with the hash of each text substring, updating each substring hash in constant time, and so improves average-case performance. Finally, it explains the Knuth-Morris-Pratt (KMP) algorithm, which preprocesses the pattern into a lookup table that lets it shift the pattern efficiently when a mismatch occurs.


String Matching Algorithms

Advanced Algorithms and Data Structure

Sirwe Saeedi
Spring 2019

Applications

• Bioinformatics
• DNA sequencing
• Web search engines


Formalizing the String Matching Problem

• The text is an array of characters T[1..n]
• The pattern is an array of characters P[1..m], with m <= n
• The characters of T and P are drawn from a finite alphabet Σ
T[1..15] L O L O E L L O H E L L O

P[1..5] H E L L O
String Matching Problem

A shift s is valid when T[s+1..s+m] = P[1..m]. For the text and pattern above:

T[11] = P[1]
T[12] = P[2]
T[13] = P[3]
T[14] = P[4]
T[15] = P[5]

So s = 10 is a valid shift: the first occurrence of the pattern in the text.
Naive String Matching Algorithm

Check P against the substring of T at every possible shift s = 0, 1, ..., n - m:

T[1..15] L O L O E L L O H E L L O

P[1..5] H E L L O

for s = 0, test T[1..5] = P[1..5]
for s = 1, test T[2..6] = P[1..5]
for s = 2, test T[3..7] = P[1..5]
...
for s = 10, test T[11..15] = P[1..5] (match)

Naive String Matching Algorithm

Pseudocode figure (source: Introduction to Algorithms, https://labs.xjtudlc.com/labs/wldmt/reading%20list/books/Algorithms%20and%20optimization/Introduction%20to%20Algorithms.pdf)
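A minimal Python sketch of the naive matcher, written for illustration rather than copied from the referenced pseudocode. It uses 0-based string indexing, so shift s compares text[s:s+m] with the pattern, which corresponds to T[s+1..s+m] in the slides' 1-based notation. The function name is our own.

```python
def naive_string_matcher(text: str, pattern: str):
    """Return every valid shift s (0-based) with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):              # all n - m + 1 possible shifts
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                          # all m characters matched
            shifts.append(s)
    return shifts


print(naive_string_matcher("ABXABABXAB", "ABXAB"))  # [0, 5]
```

Stopping at the first mismatch does not change the worst case, since some inputs force all m comparisons at every shift.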
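Naive String Matching Algorithm: Time Complexity

Matching time in the worst case: O(m(n - m + 1)), which is O(n^2) when m is a constant fraction of n (for example, m = n/2).

Worst-case input:
Text = a^n (n copies of 'a')
Pattern = a^m (m copies of 'a')

Every shift is valid, so each of the n - m + 1 shifts compares all m characters.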
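To see where the m(n - m + 1) bound comes from, the sketch below counts the character comparisons the naive matcher makes on the worst-case input above. The function name and the sizes n = 20, m = 5 are illustrative choices.

```python
def count_naive_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                       # mismatch: move to the next shift
    return comparisons


n, m = 20, 5
print(count_naive_comparisons("a" * n, "a" * m))  # 80, which equals m * (n - m + 1)
```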
Rabin-Karp String Matching Algorithm

• The Rabin-Karp algorithm computes a hash value for the pattern and for each M-character substring of the text (here M = m, the pattern length).

• If the hash values differ, the algorithm moves on and computes the hash value of the next M-character substring.

• If the hash values are equal, the algorithm compares the pattern with that M-character substring character by character.

• In this way, there is only one hash comparison per text position, and character-by-character matching is needed only when the hash values match.
Some mathematics

• Treat an M-character sequence as an M-digit number in base b, where b is the number of letters in the alphabet and each character is read as a digit 0..b-1. The substring t[i..i+M-1] is mapped to the number:

x(i) = t[i]*b^(M-1) + t[i+1]*b^(M-2) + … + t[i+M-1]

• Furthermore, given x(i), we can compute x(i+1) for the next substring t[i+1..i+M] in constant time. By definition:

x(i+1) = t[i+1]*b^(M-1) + t[i+2]*b^(M-2) + … + t[i+M]


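Some mathematics

• x(i+1) = x(i)*b - t[i]*b^M + t[i+M]
   x(i)*b: shift left one digit
   - t[i]*b^M: subtract the leftmost digit
   + t[i+M]: add the new rightmost digit

• We adjust the existing value when we move over by one character, instead of recomputing it from scratch.

• After the first window (O(M) time), the M-digit number of each subsequent M-character substring is computed in constant time, assuming b^M is precomputed.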
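A small sketch of the rolling update in exact integers (no mod yet). It assumes a three-letter alphabet 'abc' with digits a = 0, b = 1, c = 2, so b = 3; these choices, and the helper names, are illustrative and not from the slides. It applies the O(1) update across the text 'aabbcaba' and checks it against the direct O(M) formula for x(i).

```python
ALPHABET = "abc"                                  # assumed alphabet (illustrative)
B = len(ALPHABET)                                 # base b = number of letters
DIGIT = {ch: k for k, ch in enumerate(ALPHABET)}  # assumed mapping a->0, b->1, c->2


def window_value(t: str, i: int, M: int) -> int:
    """x(i) = t[i]*b^(M-1) + ... + t[i+M-1], computed directly in O(M)."""
    x = 0
    for ch in t[i:i + M]:
        x = x * B + DIGIT[ch]                     # Horner's rule
    return x


def roll(x: int, t: str, i: int, M: int, bM: int) -> int:
    """x(i+1) = x(i)*b - t[i]*b^M + t[i+M], in O(1) given bM = b^M."""
    return x * B - DIGIT[t[i]] * bM + DIGIT[t[i + M]]


text, M = "aabbcaba", 3
bM = B ** M                                       # b^M, precomputed once
x = window_value(text, 0, M)
values = [x]
for i in range(len(text) - M):
    x = roll(x, text, i, M, bM)
    values.append(x)
print(values)  # [1, 4, 14, 15, 19, 3]
```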
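Some mathematics

• We hash the value by taking it mod a prime number q.

The mod function is useful here because:
1. [(x mod q) + (y mod q)] mod q = (x + y) mod q
2. [(x mod q) * (y mod q)] mod q = (x * y) mod q
3. (x mod q) mod q = x mod q

• For these reasons:

hash(x(i)) = ((t[i]*b^(M-1) mod q) + (t[i+1]*b^(M-2) mod q) + … + (t[i+M-1] mod q)) mod q

• So:

hash(x(i+1)) = (hash(x(i))*b - t[i]*(b^M mod q) + t[i+M]) mod q

In languages where % can return a negative value (such as C or Java), add q to a negative result to bring it back into the range 0..q-1.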
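The same rolling update, now reduced mod q. This sketch assumes the alphabet 'abc' (a = 0, b = 1, c = 2, so b = 3) and a small prime q = 13; both are illustrative choices, as are the helper names. It checks that rolling the hash gives the same values as hashing each window from scratch.

```python
ALPHABET = "abc"                                  # assumed alphabet (illustrative)
B = len(ALPHABET)                                 # base b = number of letters
DIGIT = {ch: k for k, ch in enumerate(ALPHABET)}
Q = 13                                            # assumed small prime modulus q


def direct_hash(t: str, i: int, M: int) -> int:
    """hash(x(i)) computed from scratch, reducing mod q after every step."""
    h = 0
    for ch in t[i:i + M]:
        h = (h * B + DIGIT[ch]) % Q
    return h


def roll_hash(h: int, t: str, i: int, M: int, bM_mod: int) -> int:
    """hash(x(i+1)) = (hash(x(i))*b - t[i]*(b^M mod q) + t[i+M]) mod q."""
    # Python's % returns a value in 0..Q-1 for Q > 0, even when the
    # intermediate value is negative, so no "+ q" correction is needed here.
    return (h * B - DIGIT[t[i]] * bM_mod + DIGIT[t[i + M]]) % Q


text, M = "aabbcaba", 3
bM_mod = pow(B, M, Q)                             # b^M mod q, precomputed once
h = direct_hash(text, 0, M)
rolled = [h]
for i in range(len(text) - M):
    h = roll_hash(h, text, i, M, bM_mod)
    rolled.append(h)
print(rolled)  # [1, 4, 1, 2, 6, 3]
```

With these assumed parameters 'aab' and 'bbc' both hash to 1: a collision, which is why equal hashes must still be confirmed by comparing characters.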

Rabin-Karp String Matching Algorithm

Pseudocode figure (source: Introduction to Algorithms, https://labs.xjtudlc.com/labs/wldmt/reading%20list/books/Algorithms%20and%20optimization/Introduction%20to%20Algorithms.pdf)
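A minimal Python sketch of the whole matcher, written for illustration rather than copied from the referenced pseudocode. It assumes each character's code point (ord) is its digit, a radix d = 256 and a prime q = 101 by default; these defaults and the function name are our own choices. Shifts are 0-based.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 101):
    """Return all valid shifts (0-based) of pattern in text."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    bM = pow(d, m, q)                     # d^M mod q, used to drop the leftmost digit
    p = t = 0
    for k in range(m):                    # preprocessing: O(m)
        p = (p * d + ord(pattern[k])) % q
        t = (t * d + ord(text[k])) % q
    shifts = []
    for s in range(n - m + 1):
        # Compare characters only when the hashes agree (guards against collisions).
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:                     # roll the hash to the next window
            t = (t * d - ord(text[s]) * bM + ord(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("aabbcaba", "cab"))  # [4]
```

Shift 4 corresponds to T[5..7] in the slides' 1-based notation. Even with a tiny modulus such as q = 3, where collisions are frequent, the result stays correct because every hash hit is verified; only the running time suffers.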
Rabin-Karp Algorithm Example

Text = ‘aabbcaba’
Pattern = ‘cab’, hash(‘cab’) = 0

Slide a 3-character window across the text:
• hash(‘aab’) = 3: differs from hash(‘cab’), so move on
• hash(‘abb’) = 0: equal hashes, but the characters differ (collision)
• hash(‘bbc’) = 3: move on
• hash(‘bca’) = 0: equal hashes, but the characters differ (collision)
• hash(‘cab’) = 0: equal hashes and the characters match, so the pattern occurs at shift s = 4
• hash(‘aba’) = 0: equal hashes, but the characters differ (collision)

Collisions happen in hashing, but the algorithm handles them: whenever the hashes are equal it compares the characters, so a collision costs extra work but never produces a false match.
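The walk-through above can be replayed in a few lines. The slides do not say which hash function produced their values, so this sketch hard-codes the hash values listed in the example; the dictionary and function names are illustrative.

```python
# Hash values copied from the example; the hash function itself is not given.
SLIDE_HASH = {"aab": 3, "abb": 0, "bbc": 3, "bca": 0, "cab": 0, "aba": 0}


def walk_example(text: str, pattern: str, hash_of: dict):
    """Return (valid shifts, shifts that were collisions), both 0-based."""
    m = len(pattern)
    hp = hash_of[pattern]
    matches, collisions = [], []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if hash_of[window] != hp:
            continue                      # hashes differ: skip without comparing characters
        if window == pattern:
            matches.append(s)             # hash hit confirmed by character comparison
        else:
            collisions.append(s)          # hash hit, but the characters differ
    return matches, collisions


print(walk_example("aabbcaba", "cab", SLIDE_HASH))  # ([4], [1, 3, 5])
```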
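Rabin-Karp Algorithm: Time Complexity

Preprocessing time: O(m)

Matching time in the worst case: O(m(n - m + 1)), which is O(n^2) when m is a constant fraction of n. For example, with text a^n and pattern a^m every window's hash matches, so every window is also compared character by character.

It performs better in the average case: with few valid shifts and a large enough prime q, the expected matching time is O(n + m).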
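KMP String Matching Algorithm

• Knuth-Morris-Pratt (KMP) algorithm
• Improves the worst-case matching time to O(n), after O(m) preprocessing of the pattern
• Uses the degenerating property of the pattern: the same sub-pattern can appear more than once in the pattern (for example, a prefix that is also a suffix)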
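KMP Algorithm Example

Text:    A A A A A B A A A B A
Pattern: A A A A

• Initial position: the pattern is compared with the first four characters of the text and matches.
• Pattern shifted one position: the naive algorithm would compare all four characters again, but the first three characters of the shifted pattern (A A A) are already known to match, because they were just matched as the last three characters of the previous window. Only the fourth character needs to be compared.

To exploit this, the algorithm needs preprocessing of the pattern.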


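KMP Algorithm Preprocessing

• text = T[1..n]
• pattern = P[1..m]
• LPS = an array of length m, one entry per pattern character (the examples below index it from 0, as LPS[0..m-1])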
KMP Algorithm Preprocessing

• pattern[] A B X A B
  index     0 1 2 3 4

LPS[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]
KMP Algorithm Preprocessing

• pattern[] A B X A B
• LPS[]     0 0 0 1 2
  index     0 1 2 3 4

LPS[0] = 0 (a single character has no proper prefix)
LPS[1] = 0 (‘AB’: ‘A’ ≠ ‘B’)
LPS[2] = 0 (‘ABX’: ‘A’ ≠ ‘X’ and ‘AB’ ≠ ‘BX’)
LPS[3] = 1 (‘ABXA’: ‘A’ is both a prefix and a suffix)
LPS[4] = 2 (‘ABXAB’: ‘AB’ is both a prefix and a suffix)
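A minimal Python sketch of the LPS preprocessing, following the usual linear-time construction; the function name is our own. The key step is the fallback: when pattern[i] does not extend the current prefix-suffix, the next shorter candidate is LPS[length - 1], and i does not advance.

```python
def compute_lps(pattern: str):
    """LPS[i] = length of the longest proper prefix of pattern[0..i]
    that is also a suffix of pattern[0..i]."""
    lps = [0] * len(pattern)
    length = 0                       # length of the previous longest prefix-suffix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1              # extend the current prefix-suffix by one
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]     # try the next shorter candidate; i stays put
        else:
            lps[i] = 0               # no proper prefix of pattern[0..i] is a suffix
            i += 1
    return lps


print(compute_lps("ABXAB"))  # [0, 0, 0, 1, 2]
```

Each step either advances i or shrinks length, and length grows by at most one per advance of i, so the construction runs in O(m) time.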
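KMP Algorithm Searching the Pattern

• To search for the pattern in the text, use the LPS array.

• After a mismatch at pattern[j], the last LPS[j-1] text characters read are already known to match pattern[0..LPS[j-1]-1], so comparison continues at pattern[LPS[j-1]] and the text position never moves backward.

• The idea is to avoid re-comparing characters that we already know will match.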
KMP Algorithm Searching the Pattern

• Text[]    A B X A B A B X A B
  index     0 1 2 3 4 5 6 7 8 9
• pattern[] A B X A B
• LPS[]     0 0 0 1 2

• Text[0..4] matches pattern[0..4], so the pattern occurs at shift 0.
• Instead of restarting at Text[1], continue with the current character Text[5]. LPS[4] = 2 means the substring behind the current character, Text[3..4] = ‘AB’, already matches pattern[0..1] = ‘AB’, so Text[5] is compared with pattern[2].
• Text[5] = ‘A’ ≠ pattern[2] = ‘X’, so fall back to LPS[1] = 0 and compare Text[5] with pattern[0]: they match.
• Text[6..9] then match pattern[1..4], so the pattern occurs again at shift 5.
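A minimal Python sketch of the KMP search loop described above, with an LPS helper included so the block runs on its own; the function names are our own. The text index i only moves forward, which gives the O(n) matching time.

```python
def compute_lps(pattern: str):
    """Longest proper prefix of pattern[0..i] that is also a suffix, for each i."""
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1                          # lps[i] stays 0
    return lps


def kmp_search(text: str, pattern: str):
    """Return all valid shifts (0-based) of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))           # empty pattern matches at every shift
    lps = compute_lps(pattern)
    shifts = []
    j = 0                                   # number of pattern characters currently matched
    for i in range(n):                      # i never moves backward in the text
        while j > 0 and text[i] != pattern[j]:
            j = lps[j - 1]                  # reuse the matched prefix instead of re-comparing it
        if text[i] == pattern[j]:
            j += 1
        if j == m:                          # full match ending at text[i]
            shifts.append(i - m + 1)
            j = lps[j - 1]                  # continue, allowing overlapping matches
    return shifts


print(kmp_search("ABXABABXAB", "ABXAB"))    # [0, 5]
print(kmp_search("AAAAABAAABA", "AAAA"))    # [0, 1]
```

The second call uses the earlier KMP example text; after the match at shift 0, LPS[3] = 3 lets the search confirm the match at shift 1 with a single new comparison.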
References

• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms, Third Edition.
• https://www.ics.uci.edu/~eppstein/161/960227.html
• https://www.nayuki.io/
Thank you. Any questions?
