String Matching 2019
Algorithms
Sirwe Saeedi
Spring 2019
Applications 1
• BioInformatics
• DNA sequencing
Applications 2
T[1..15] L O L O E L L O H E L L O
P[1..5] H E L L O
Formalize String
Matching Problem
T[1..15] L O L O E L L O H E L L O
P[1..5] H E L L O
String Matching
Problem
T[1..15] L O L O E L L O H E L L O
P[1..5] H E L L O
T[11] = P[1]
T[12] = P[2]
T[13] = P[3]
T[14] = P[4]
T[15] = P[5]
s = 10 (first occurrence of the pattern)
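The condition illustrated above can be written compactly. This is a sketch in the usual textbook notation, assuming n and m denote the lengths of T and P (here n = 15, m = 5): a shift s is valid when

```latex
0 \le s \le n - m
\quad\text{and}\quad
T[s+1 \,..\, s+m] = P[1 \,..\, m],
\qquad\text{i.e.}\qquad
T[s+j] = P[j] \ \text{ for } j = 1, \dots, m .
```

With s = 10 this gives exactly T[11..15] = P[1..5].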
Naive String
Matching Algorithm
T[1..15] L O L O E L L O H E L L O
P[1..5] H E L L O
https://labs.xjtudlc.com/labs/wldmt/reading%20list/books/Algorithms%20and%20optimization/Introduction%20to%20Algorithms.pdf
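A minimal sketch of the naive approach, assuming 0-based indexing (the slides use 1-based T[1..n]). The function name `naive_search` and the sample strings are illustrative, not taken from the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try every possible shift of the pattern along the text.
    for s in range(n - m + 1):
        j = 0
        # Compare the window text[s : s + m] with the pattern, left to right.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            shifts.append(s)
    return shifts


print(naive_search("ABXABABXAB", "ABXAB"))  # [0, 5]
```

The pattern is simply slid one position at a time; nothing learned from a failed window is reused.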
Naive String
Matching Algorithm
Time Complexity
Text = a^n: a a a a a a a a a . . . a a a
Pattern = a^m: a a a a a
Worst case: each of the n - m + 1 shifts costs m comparisons, O((n - m + 1) m) in total.
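The worst case can be checked by counting character comparisons directly. This is a small sketch; `count_comparisons` and the sizes n = 20, m = 5 are illustrative choices.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break  # mismatch: give up on this shift
    return comparisons


n, m = 20, 5
print(count_comparisons("a" * n, "a" * m))  # (n - m + 1) * m = 80
```

Because every window matches in full, no comparison is ever cut short.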
Rabin-Karp String
Matching Algorithm
• If the hash values are unequal, the algorithm computes the hash value of the next M-character sequence.
• If the hash values are equal, the algorithm compares the pattern and the M-character sequence character by character.
• In this way, there is only one hash comparison per text subsequence, and character matching is needed only when the hash values match.
Some mathematics
• Furthermore, given x(i), we can compute x(i+1) for the next subsequence t[i+1..i+M] in constant time.
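The update formula did not survive in the slide text. One common form is sketched below, assuming a polynomial hash with base b and modulus q (both symbols are assumptions, not taken from the slides) and x(i) as the hash of t[i..i+M-1]:

```latex
x(i) = \Bigl( \sum_{k=0}^{M-1} t[i+k]\, b^{\,M-1-k} \Bigr) \bmod q ,
\qquad
x(i+1) = \Bigl( b \,\bigl( x(i) - t[i]\, b^{\,M-1} \bigr) + t[i+M] \Bigr) \bmod q .
```

The leading character t[i] is removed, the remaining terms are shifted up by one power of b, and the new character t[i+M] is appended. With b^{M-1} mod q precomputed, this takes a constant number of operations.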
https://labs.xjtudlc.com/labs/wldmt/reading%20list/books/Algorithms%20and%20optimization/Introduction%20to%20Algorithms.pdf
Rabin-Karp Algorithm
Example
hash(‘aab’) = 3
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
hash(‘abb’) = 0
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
Rabin-Karp Algorithm
Example
hash(‘bbc’) = 3
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
hash(‘bca’) = 0
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
Rabin-Karp Algorithm
Example
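hash(‘cab’) = 0
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
Equal hashes alone do not prove a match: a collision in hashing can happen.
The algorithm handles it by comparing the characters whenever the hashes are equal.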
hash(‘aba’) = 0
Text = ‘aabbcaba’ a a b b c a b a
Pattern = ‘cab’ c a b
hash(‘cab’) = 0
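The walkthrough above can be reproduced with a short sketch. The hash function here is an assumption (a polynomial rolling hash with `base` and `mod` chosen for illustration), so its hash values will not equal the ones on the slides. The function also counts spurious hits, that is, windows whose hash equals the pattern's although the strings differ.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 101):
    """Return (0-based match positions, number of spurious hash hits)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = w_hash = 0
    for k in range(m):  # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    matches, spurious = [], 0
    for i in range(n - m + 1):
        if w_hash == p_hash:
            # Hashes agree: confirm character by character.
            if text[i:i + m] == pattern:
                matches.append(i)
            else:
                spurious += 1
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches, spurious


print(rabin_karp("aabbcaba", "cab")[0])     # [4]
print(rabin_karp("aabbcaba", "cab", mod=1))  # ([4], 5)
```

The second call uses the degenerate modulus 1 so that every window collides with the pattern. The match at position 4 is still reported correctly, and the other five windows are rejected by the character check.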
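Time Complexity
Preprocessing: O(m)
Matching: O(n + m) on average when collisions are rare, O((n - m + 1) m) in the worst case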
KMP String
Matching Algorithm
• Knuth-Morris-Pratt Algorithm
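• Improves the worst-case time complexity to O(n + m): O(m) preprocessing plus O(n) searching
• Uses the degenerating property of the pattern (the same sub-patterns appear more than once in the pattern)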
KMP Algorithm
Example
A A A A A B A A A B A
A A A A (initial position)
KMP Algorithm
Example
A A A A A B A A A B A
A A A A A B A A A B A
• text = T[1..n]
• pattern = P[1..m]
• LPS[1..m]
KMP Algorithm
Preprocessing
• pattern[] A B X A B
• LPS[]
0 1 2 3 4
LPS[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]
KMP Algorithm
Preprocessing
• pattern[] A B X A B
• LPS[] 0 0 0 1 2
0 1 2 3 4
LPS[0] = 0
LPS[1] = 0
LPS[2] = 0
LPS[3] = 1
LPS[4] = 2
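The table above can be computed in O(m) time. The following is a sketch of the standard construction, with 0-based indices as on the slide. The name `compute_lps` is illustrative.

```python
def compute_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[0..i]
    that is also a suffix of pattern[0..i]."""
    lps = [0] * len(pattern)
    length = 0  # length of the prefix currently being extended
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The current prefix/suffix grows by one character.
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter candidate; i stays put.
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


print(compute_lps("ABXAB"))  # [0, 0, 0, 1, 2]
```

The fallback step reuses entries that were already computed, which is what keeps the construction linear in the pattern length.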
KMP Algorithm
Searching the Pattern
• Text[] A B X A B A B X A B
• pattern[] A B X A B
• LPS[] 0 0 0 1 2
0 1 2 3 4
KMP Algorithm
Searching the Pattern
• Text[] A B X A B A B X A B
• pattern[] A B X A B
• LPS[] 0 0 0 1 2
0 1 2 3 4
Current Character
KMP Algorithm
Searching the Pattern
• Text[] A B X A B A B X A B
• pattern[] A B X A B
• LPS[] 0 0 0 1 2
0 1 2 3 4
Substring behind the current character: pattern[0..1] = ‘AB’
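The search traced on these slides can be sketched as follows. The block is self-contained, so it repeats a compact LPS construction. The names `compute_lps` and `kmp_search` are illustrative, and positions are 0-based.

```python
def compute_lps(pattern: str) -> list[int]:
    """Longest proper prefix that is also a suffix, for each pattern[0..i]."""
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based position where pattern occurs in text."""
    if not pattern:
        return []
    lps = compute_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, keep the longest prefix that still fits
        # behind the current character; the text index never moves back.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # continue, allowing overlapping matches
    return matches


print(kmp_search("ABXABABXAB", "ABXAB"))  # [0, 5]
```

After the first match, j falls back to LPS[4] = 2, so the prefix 'AB' behind the current character is kept instead of restarting from the beginning of the pattern.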
References
• https://www.ics.uci.edu/~eppstein/161/960227.html
• https://www.nayuki.io/
Thank you
Any questions?
Back up