Week14 Chap7 String Algorithms
CONTENTS
BOYER MOORE ALGORITHM
T: a b a c b a c a c b a c
P: a c b a
• Slide the sample string P from left to right
• Match: right to left
• Use preprocessing information to skip as many characters as possible
• Preprocessing the sample string P
• Last[x]: the rightmost position at which the letter x appears in P
  For P = a c b a: Last[a] = 4, Last[b] = 3, Last[c] = 2
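A minimal Python sketch of the Last[x] preprocessing, assuming 1-based positions as on the slide. The function name `last_occurrence` is hypothetical. Letters that do not occur in P are simply absent from the dictionary; a search can treat them as Last[x] = 0 (an assumed convention, not stated on the slide).

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Return Last[x]: the rightmost 1-based position of each letter x in pattern."""
    last: dict[str, int] = {}
    for position, letter in enumerate(pattern, start=1):
        last[letter] = position  # a later occurrence overwrites an earlier one
    return last


print(last_occurrence("acba"))  # {'a': 4, 'c': 2, 'b': 3}
```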
• When a mismatch occurs with the bad character x (a character of T), P is slid to the right by
  max{j - Last[x], 1} positions,
  where j is the index in P at which the mismatch occurs while matching characters from right to left
• Example: with P aligned under T[1..4], the first comparison, P[4] = a against T[4] = c, fails, so j = 4 and the bad character is c; P slides max{4 - Last[c], 1} = max{4 - 2, 1} = 2 positions
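The bad-character rule above can be sketched as a complete search in Python. This is a simplified Boyer-Moore that uses only the Last table (no good-suffix rule), and the names `last_occurrence` and `boyer_moore_bad_character` are hypothetical. Assumptions: positions are reported 1-based, letters absent from P get Last[x] = 0, and after a full match the sketch simply shifts P by 1.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Last[x]: rightmost 1-based position of letter x in pattern."""
    last: dict[str, int] = {}
    for position, letter in enumerate(pattern, start=1):
        last[letter] = position
    return last


def boyer_moore_bad_character(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    last = last_occurrence(pattern)
    matches: list[int] = []
    s = 0  # P is currently aligned with T[s+1 .. s+m] (1-based)
    while s <= n - m:
        j = m  # 1-based index into P; compare from right to left
        while j >= 1 and pattern[j - 1] == text[s + j - 1]:
            j -= 1
        if j == 0:
            matches.append(s + 1)
            s += 1  # simplification: plain shift by 1 after a full match
        else:
            bad = text[s + j - 1]  # the bad character x, taken from T
            s += max(j - last.get(bad, 0), 1)  # absent letters: Last[x] = 0
    return matches


print(boyer_moore_bad_character("abacbacacbac", "acba"))  # [3, 8]
```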
RABIN KARP ALGORITHM
• The Rabin-Karp algorithm converts strings to non-negative integers
• Each letter in the alphabet is represented by a non-negative integer less than d
• Convert the string P[1..M] to a non-negative integer
  $p = P[1]\,d^{M-1} + P[2]\,d^{M-2} + \cdots + P[M]\,d^{0}$
• Match patterns by comparing the 2 corresponding code values:
  • If the two codes are different, the two corresponding strings are different
  • If the two codes are equal, we proceed to match each character
• Use Horner's scheme to speed up computing the encodings of the substrings of T
• With sliding position s, convert the substring T[s+1 .. s+M] to the number
  $t_s = T[s+1]\,d^{M-1} + T[s+2]\,d^{M-2} + \cdots + T[s+M]\,d^{0}$
• Disadvantage
  • When M is large, converting strings to numbers takes considerable time
  • The values can overflow the basic data types of the programming language
• Solution: divide by Q and work with the remainder
  • When the 2 remainders are different, the 2 numeric values are different, so the 2 corresponding strings are also different
  • When the two remainders are equal, match each character in the traditional way (equal remainders do not guarantee equal strings)
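A sketch of Rabin-Karp with remainders modulo Q, in Python. Assumptions not taken from the slides: each letter is encoded by its code point `ord(ch)`, d = 256, Q = 1,000,003, positions are reported 1-based, and an empty pattern returns no matches; `rabin_karp` is a hypothetical name. The encoding satisfies the slide's "less than d" requirement only for code points below 256, but the final character-by-character check keeps the results correct either way. Horner's scheme computes p and the first window's code t_0, and each later window is obtained in constant time from the previous one:
$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,d^{M-1}) + T[s+M+1]\bigr) \bmod Q$

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 1_000_003) -> list[int]:
    """Return 1-based start positions of pattern in text, hashing modulo q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(d, m - 1, q)  # d^(M-1) mod q: weight of the window's first letter
    p = t = 0
    for i in range(m):  # Horner's scheme for p and for the first window t_0
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    matches: list[int] = []
    for s in range(n - m + 1):  # s is the 0-based shift
        # Equal remainders may be a spurious hit, so confirm letter by letter
        if p == t and text[s:s + m] == pattern:
            matches.append(s + 1)
        if s < n - m:
            # Roll the window: drop T[s+1], append T[s+M+1] (Python's % is non-negative)
            t = (d * (t - ord(text[s]) * high) + ord(text[s + m])) % q
    return matches


print(rabin_karp("abacbacacbac", "acba"))  # [3, 8]
```

A small modulus such as q = 7 produces many spurious hits, but the letter-by-letter check still filters them out.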
KMP ALGORITHM
a b a c b a c a c b a c
a c b a
• Preprocessing:
  • π[q]: length of the longest prefix of P that is also a strict (proper) suffix of the string P[1..q]
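The π table can be computed in O(M) time with the standard incremental method, sketched in Python below. The name `prefix_function` is hypothetical, and the returned list is 0-based: `pi[q - 1]` holds π[q] for the prefix P[1..q].

```python
def prefix_function(pattern: str) -> list[int]:
    """Return the KMP prefix function; pi[q - 1] is pi[q] for P[1..q]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the prefix of P matched so far (current border length)
    for q in range(1, m):  # q is the 0-based index of the newly added letter
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(prefix_function("acba"))  # [0, 0, 0, 1]
```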
• Slide the sample string P from left to right over T
π: 0 0 1 2 3 4 0 1
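A Python sketch of the sliding phase: P moves over T from left to right, and after a mismatch with q letters matched, the comparison resumes with π[q] letters matched instead of restarting, so T is never re-read backwards. `kmp_search` and `prefix_function` are hypothetical names; matches are reported as 1-based start positions, and an empty pattern returns no matches (a choice of this sketch).

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q - 1] = length of the longest prefix of P that is a strict suffix of P[1..q]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    matches: list[int] = []
    q = 0  # number of letters of P currently matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # slide P right; keep the longest matched border
        if pattern[q] == ch:
            q += 1
        if q == m:
            matches.append(i - m + 2)  # match ends at 0-based i
            q = pi[q - 1]  # continue searching for overlapping matches
    return matches


print(kmp_search("abacbacacbac", "acba"))  # [3, 8]
```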
THANK YOU!