
Week14 Chap7 String Algorithms

APPLIED ALGORITHMS
STRING PROCESSING ALGORITHM

CONTENTS

• Boyer Moore algorithm
• Rabin Karp algorithm
• KMP algorithm

BOYER MOORE ALGORITHM

T: a b a c b a c a c b a c
P: a c b a

• Slide the sample string P (the pattern) over the text T from left to right
• Match: right to left at each alignment
• Use preprocessing information to skip as many characters as possible
• Preprocessing the sample string P
  • Last[x]: the rightmost position at which the letter x appears in P (positions counted from 1)
    Last[a] = 4, Last[b] = 3, Last[c] = 2
• When a mismatch occurs with the bad character x (a character of T), P is slid to the right by
  max{j - Last[x], 1} positions,
  where j is the current index on P (where the mismatch occurs) when matching characters from right to left
• Example: at the first alignment, P[4] = a is compared with T[4] = c. The bad character is c and the
  mismatch position is j = 4, so P is slid max{4 - Last[c], 1} = max{4 - 2, 1} = 2 positions to the right.
  P is then aligned with T[3..6] = a c b a, which is a match.
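The Last[x] table and the shift rule can be checked on the slide's example with a small Python sketch. The function names `last_table` and `bad_character_shift` are illustrative and not from the slides. Treating a character that does not occur in P as Last[x] = 0 is an assumption made here for the 1-based convention.

```python
def last_table(pattern: str) -> dict[str, int]:
    """Map each character of `pattern` to its rightmost 1-based position."""
    last = {}
    for pos, ch in enumerate(pattern, start=1):
        last[ch] = pos  # a later (more rightward) position overwrites an earlier one
    return last


def bad_character_shift(last: dict[str, int], j: int, bad_char: str) -> int:
    """Shift max{j - Last[x], 1}; absent characters are assumed to have Last[x] = 0."""
    return max(j - last.get(bad_char, 0), 1)


last = last_table("acba")
print(last)                               # {'a': 4, 'c': 2, 'b': 3}
print(bad_character_shift(last, 4, "c"))  # 2
```

The second call reproduces the first step of the example: mismatch at j = 4 on the bad character c gives a shift of 2.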

BOYER MOORE ALGORITHM

computeLast(p){
  // indices start at 0 here; -1 marks characters that do not occur in p
  for c = 0 to 255 do last[c] = -1;
  k = p.length();
  // scanning left to right leaves the rightmost occurrence in last[]
  for i = 0 to k-1 do last[p[i]] = i;
}

boyerMoore(P, T){
  computeLast(P);
  s = 0; cnt = 0;
  N = T.length(); M = P.length();
  while s <= N-M do {
    j = M-1;
    // compare from right to left
    while j >= 0 && T[j+s] == P[j] do
      j = j - 1;
    if j == -1 then {
      cnt++; s = s + 1;
    }else{
      k = last[T[j+s]];
      s = s + (j - k > 1 ? j - k : 1);
    }
  }
  return cnt;
}
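A runnable counterpart of the pseudocode, as a minimal Python sketch. It uses only the bad-character rule, as on the slides, and 0-based indices. The name `boyer_moore_search` and the choice to return match positions instead of a count are assumptions made for illustration.

```python
def boyer_moore_search(pattern: str, text: str) -> list[int]:
    """Return 0-based start positions of `pattern` in `text` (bad-character rule only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightmost 0-based index of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    matches = []
    s = 0  # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            k = last.get(text[s + j], -1)  # -1 if the bad character is not in the pattern
            s += max(j - k, 1)
    return matches


print(boyer_moore_search("acba", "abacbacacbac"))  # [2, 7]
```

On the slide's example the pattern is found at 0-based positions 2 and 7, that is, T[3..6] and T[8..11] in the 1-based numbering.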
RABIN KARP ALGORITHM

• The Rabin-Karp algorithm converts the sample strings to non-negative integers
• Each letter in the alphabet is represented by a non-negative integer less than d
• Convert the string P[1..M] to a non-negative integer:

  $p = P[1] \cdot d^{M-1} + P[2] \cdot d^{M-2} + \dots + P[M] \cdot d^{0}$

• Match patterns by comparing the 2 corresponding code values:
  • If the two codes are different, the two corresponding strings are different
  • If the two codes are equal, we proceed to match each character
• Use the Horner scheme to speed up calculating the encoding of substrings in T
• With sliding position s, convert the substring T[s+1 .. s+M] to a number:

  $T_s = T[s+1] \cdot d^{M-1} + T[s+2] \cdot d^{M-2} + \dots + T[s+M] \cdot d^{0}$

• With sliding position s+1, $T_{s+1}$ can be calculated efficiently from $T_s$ (previously calculated):

  $T_{s+1} = (T_s - T[s+1] \cdot d^{M-1}) \cdot d + T[s+M+1]$

RABIN KARP ALGORITHM

• Disadvantage
  • When M is large, converting strings to numbers takes considerable time
  • The numbers can overflow the basic data types of the programming language
• Solution: divide by Q and keep the remainder
  • When the 2 remainders are different, the 2 numeric values are different, so the 2 corresponding strings are also different
  • When the two remainders are equal, match each character in the traditional way
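The sliding update can be checked numerically with a short Python sketch. It works without the modulus Q, relying on Python's arbitrary-precision integers. The names `encode` and `roll`, the use of `ord()` as the character code, and d = 256 are assumptions made for illustration.

```python
def encode(chars: str, d: int = 256) -> int:
    """Horner scheme: (((c1 * d) + c2) * d + c3) ... over the characters of `chars`."""
    code = 0
    for ch in chars:
        code = code * d + ord(ch)
    return code


def roll(code: int, outgoing: str, incoming: str, m: int, d: int = 256) -> int:
    """Sliding update: drop the leading character, shift by d, append the new character."""
    return (code - ord(outgoing) * d ** (m - 1)) * d + ord(incoming)


text, m = "abacbacacbac", 4
code = encode(text[:m])
for s in range(len(text) - m):
    code = roll(code, text[s], text[s + m], m)
    # The rolled value equals the code computed directly from the next window.
    assert code == encode(text[s + 1 : s + 1 + m])
print("rolling update agrees with direct encoding")
```

Each update costs a constant number of arithmetic operations, instead of the M multiplications needed to encode a window from scratch.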

RABIN KARP ALGORITHM

// d: size of the alphabet (256 for one-byte characters); Q: the modulus
hashCode(p){
  c = 0;
  for i = 0 to p.length()-1 do {
    c = c*d + p[i];
    c = c%Q;
  }
  return c;
}
hashCode(s, start, end){
  c = 0;
  for i = start to end do {
    c = c*d + s[i];
    c = c%Q;
  }
  return c;
}

rabinKarp(P, T){
  cnt = 0; N = T.length(); M = P.length();
  e = 1;
  for i = 1 to M-1 do e = (e*d)%Q;   // e = d^(M-1) mod Q, computed without overflow
  codeP = hashCode(P); codeT = hashCode(T,0,M-1);
  for s = 0 to N-M do {
    if(codeP == codeT){
      ok = true;
      for j = 0 to M-1 do if P[j] != T[j + s] then {
        ok = false; break;
      }
      if ok then cnt++;
    }
    if s < N-M then {                // T[s+M] exists only before the last position
      t = (T[s]*e)%Q;
      t = (codeT - t + Q)%Q;         // add Q so the difference is never negative
      codeT = (t*d + T[s+M])%Q;
    }
  }
  return cnt;
}
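The same procedure as a minimal Python sketch with remainders modulo Q. The name `rabin_karp_search`, the default modulus 1_000_000_007, and returning 0-based positions instead of a count are assumptions; the slides do not fix a value for Q.

```python
def rabin_karp_search(pattern: str, text: str, d: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return 0-based start positions of `pattern` in `text` using a rolling hash mod q."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    e = pow(d, m - 1, q)  # d^(M-1) mod q, weight of the leading character
    code_p = code_t = 0
    for i in range(m):    # Horner scheme for the pattern and the first window
        code_p = (code_p * d + ord(pattern[i])) % q
        code_t = (code_t * d + ord(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        # Equal remainders only suggest a match; confirm character by character.
        if code_p == code_t and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Python's % always returns a non-negative result for a positive modulus.
            code_t = ((code_t - ord(text[s]) * e) * d + ord(text[s + m])) % q
    return matches


print(rabin_karp_search("acba", "abacbacacbac"))        # [2, 7]
print(rabin_karp_search("acba", "abacbacacbac", q=13))  # [2, 7]
```

The second call uses a deliberately small modulus, which makes equal remainders for different strings likely; the character-by-character check keeps the result the same.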
KMP ALGORITHM

T: a b a c b a c a c b a c
P: a c b a

• Slide the sample string P from left to right
• Match: left to right
• Use preprocessing information to skip as many characters as possible
• Preprocessing:
  • π[q]: length of the longest prefix of P[1..q] that is also a strict suffix of P[1..q]

  q  1 2 3 4 5 6 7 8
  P  a b a b a b c a
  π  0 0 1 2 3 4 0 1

computePI(P){
  pi[1] = 0;
  k = 0;
  for q = 2 to M do {
    while(k > 0 && P[k+1] != P[q])
      k = pi[k];
    if P[k+1] == P[q] then
      k = k + 1;
    pi[q] = k;
  }
}

• Slide the sample string P from left to right over T

kmp(P, T){
  q = 0;
  for i = 1..N do {
    while q > 0 && P[q+1] != T[i]
      q = pi[q];
    if P[q+1] == T[i]
      q = q + 1;
    if(q == M){
      output(i-M+1);
      q = pi[q];
    }
  }
}
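The preprocessing step can be reproduced with a short Python sketch. It stores the table 0-based, so `pi[q - 1]` holds the value the slides write as π[q]. The name `prefix_function` is illustrative.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q - 1] = length of the longest prefix of pattern[:q] that is also a strict suffix of it."""
    pi = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter candidate prefix
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(prefix_function("abababca"))  # [0, 0, 1, 2, 3, 4, 0, 1]
```

The printed list agrees with the π row of the table for P = a b a b a b c a.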
KMP ALGORITHM

• Slide the sample string P from left to right over T

  i  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
  T  a  a  a  b  a  b  a  b  c  a  c  a  b  a  b  a  b  a  b  c  a

  q  1 2 3 4 5 6 7 8
  P  a b a b a b c a
  π  0 0 1 2 3 4 0 1

  Init: q = 0
  i = 1, T[1] = P[0+1] → q = 1
  i = 2, T[2] ≠ P[1+1] → q = π[1] = 0; T[2] = P[0+1] → q = q + 1 = 1
  i = 3, T[3] ≠ P[1+1] → q = π[1] = 0; T[3] = P[0+1] → q = q + 1 = 1
  i = 4, T[4] = P[1+1] → q = q + 1 = 2
  i = 5, T[5] = P[2+1] → q = q + 1 = 3
  i = 6, T[6] = P[3+1] → q = q + 1 = 4
  i = 7, T[7] = P[4+1] → q = q + 1 = 5
  i = 8, T[8] = P[5+1] → q = q + 1 = 6
  i = 9, T[9] = P[6+1] → q = q + 1 = 7
  i = 10, T[10] = P[7+1] → q = q + 1 = 8 FOUND, q = π[q] = π[8] = 1
  i = 11, T[11] ≠ P[1+1] → q = π[q] = π[1] = 0; T[11] ≠ P[0+1], so q stays 0

KMP ALGORITHM

computePi(p){
  // p already carries the dummy first character, so indices start at 1
  pi[1] = 0;
  int k = 0;
  for q = 2 to p.length()-1 do {
    while(k > 0 && p[k+1] != p[q]) do
      k = pi[k];
    if (p[k+1] == p[q]) then k = k + 1;
    pi[q] = k;
  }
}

kmp(P, T){
  // prepend a dummy character so that both strings are indexed from 1
  P = "-" + P; T = "-" + T;
  computePi(P);
  cnt = 0;
  N = T.length()-1;
  M = P.length()-1;
  q = 0;
  for i = 1 to N do {
    while(q > 0 and P[q+1] != T[i]) do
      q = pi[q];
    if(P[q+1] == T[i]) then
      q = q + 1;
    if(q == M) then {
      cnt += 1; q = pi[q];
    }
  }
  return cnt;
}
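The search can be run on the example text with a minimal Python sketch. It uses 0-based indices, so a reported position p corresponds to position p + 1 in the slides' numbering. The name `kmp_search` and returning positions instead of a count are assumptions made for illustration.

```python
def kmp_search(pattern: str, text: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `pattern` in `text`."""
    m = len(pattern)
    if m == 0:
        return []
    # Prefix function, stored 0-based: pi[q - 1] is the slides' pi[q].
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    matches = []
    q = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # reuse the longest border instead of restarting
        if pattern[q] == ch:
            q += 1
        if q == m:
            matches.append(i - m + 1)
            q = pi[q - 1]  # continue searching, allowing overlapping matches
    return matches


text = "aaabababcacababababca"  # the 21-character text T from the slides
print(kmp_search("abababca", text))  # [2, 13]
```

The first reported position, 2, is the occurrence found at i = 10 in the trace (output i - M + 1 = 3 in 1-based numbering); the second lies beyond the part of the text the trace covers.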
THANK YOU !
