Pattern Matching 2
Semester 1, 2006-2007
Pattern Matching
Dr. Andrew Davison
WiG Lab (teachers room), CoE
[email protected]
Figure: the text T = "abacaab" and the pattern P = "abacab", with the character comparisons numbered.
Applications:
– text editors, Web search engines (e.g. Google), image analysis
Figure: the pattern P = "rew" compared against the text T = "andrew", shown at two alignments.
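The brute-force search pictured above tries every alignment of P against T and compares characters left to right. A minimal Java sketch, assuming 0-based indices and the hypothetical names `BruteForce` and `bruteMatch` (not taken from the slides):

```java
class BruteForce {
    // Return the 0-based index where pattern first occurs in text, or -1.
    static int bruteMatch(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        // Try every alignment of P that fits inside T.
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            // Compare left to right until a mismatch or a full match.
            while (j < m && text.charAt(i + j) == pattern.charAt(j))
                j++;
            if (j == m)
                return i;   // P matches T starting at T[i]
        }
        return -1;   // no alignment matched
    }

    public static void main(String[] args) {
        System.out.println(bruteMatch("andrew", "rew"));   // prints 3
    }
}
```

In the worst case this makes O(nm) comparisons, which is the cost that KMP and Boyer-Moore try to reduce.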
Figure: a KMP shift of P after a mismatch at j = 6; matching resumes at jnew = 3.
b(5) means
– find the size of the largest prefix of P[1..5] that
is also a suffix of P[2..5]
= find the size of the largest prefix of "abaab" that
is also a suffix of "baab"
= find the size of "ab"
= 2
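The definition of b(k) can be checked directly by trying every candidate prefix length, longest first. This is only a sketch for verifying small examples by hand, not an efficient method; `FailureByDefinition` and `b` are hypothetical names, and the code follows the 1-based notation used here, so P[1..k] is `pattern.substring(0, k)` and P[2..k] is `pattern.substring(1, k)`.

```java
class FailureByDefinition {
    // b(k), 1 <= k <= pattern length: the size of the largest prefix of
    // P[1..k] that is also a suffix of P[2..k] (1-based positions).
    static int b(String pattern, int k) {
        String prefixPart = pattern.substring(0, k);   // P[1..k]
        String suffixPart = pattern.substring(1, k);   // P[2..k], k-1 chars
        // Try the longest possible prefix first.
        for (int len = k - 1; len > 0; len--) {
            if (suffixPart.endsWith(prefixPart.substring(0, len)))
                return len;
        }
        return 0;   // only the empty prefix fits
    }

    public static void main(String[] args) {
        System.out.println(b("abaaba", 5));   // prints 2 ("ab")
        System.out.println(b("abacab", 5));   // prints 1 ("a")
    }
}
```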
int i=0;   // position in the text T
int j=0;   // position in the pattern P
:
int m = pattern.length();
int j = 0;   // length of the prefix of P matched so far
int i = 1;   // next position in P whose failure value is computed
:
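The two fragments above come from a KMP search and its failure-function builder. A complete version might look like the following sketch; the class `KMP` and the methods `kmpMatch` and `computeFail` are assumed names. It uses Java's 0-based indexing, so `fail[k]` holds b(k+1) in the 1-based notation used for b above.

```java
class KMP {
    // fail[k] = size of the largest prefix of P[0..k] that is also a
    // suffix of P[1..k] (0-based), i.e. b(k+1) in 1-based notation.
    static int[] computeFail(String pattern) {
        int m = pattern.length();
        int[] fail = new int[m];   // fail[0] is 0 by default
        int j = 0;   // length of the prefix of P matched so far
        int i = 1;   // next position in P whose failure value is computed
        while (i < m) {
            if (pattern.charAt(j) == pattern.charAt(i)) {   // j+1 chars match
                fail[i] = j + 1;
                i++;
                j++;
            } else if (j > 0) {   // fall back to a shorter matching prefix
                j = fail[j - 1];
            } else {              // no prefix matches at position i
                fail[i] = 0;
                i++;
            }
        }
        return fail;
    }

    // Return the 0-based index where pattern first occurs in text, or -1.
    static int kmpMatch(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        int[] fail = computeFail(pattern);
        int i = 0;   // position in the text T
        int j = 0;   // position in the pattern P
        while (i < n) {
            if (pattern.charAt(j) == text.charAt(i)) {
                if (j == m - 1)
                    return i - m + 1;   // whole pattern matched
                i++;
                j++;
            } else if (j > 0) {
                j = fail[j - 1];   // reuse the matched prefix; i never moves back
            } else {
                i++;
            }
        }
        return -1;   // no match
    }
}
```

For P = "abaaba" this computes fail = {0, 0, 1, 1, 2, 3}, so fail[4] = 2 agrees with b(5) = 2; for P = "abacab", fail[4] = 1 agrees with b(5) = 1.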
b(5) means
– find the size of the largest prefix of P[1..5] that
is also a suffix of P[2..5]
= find the size of the largest prefix of "abaca" that
is also a suffix of "baca"
= find the size of "a"
= 1
Figure: the text T = "abaabx..." and the pattern P = "abaaba", shown before and after a shift.
Basic KMP does not do this.
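There are 3 possible cases, tried in order.
Figure: a mismatch between the text character T[i] = x and the pattern character P[j] = b; the characters after them (a) already match.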
240-301 Comp. Eng. Lab III (Software), Pattern Matching 31
Case 1
If P contains x somewhere, then try to
shift P right to align the last occurrence
of x in P with T[i], and move i and
j right, so j is at the end of P.
Figure: T[i] = x is aligned with the last x in P = "x c b a"; i becomes inew and j becomes jnew.
Case 2
If P contains x somewhere, but a shift right
to the last occurrence is not possible, then
shift P right by 1 character to T[i+1], and
move i and j right, so j is at the end of P.
Figure: in P = "c w a x" the last x is after the j position, so aligning it with T[i] would move P left.
Case 3
If cases 1 and 2 do not apply (there is no x in P), then shift P to
align P[1] with T[i+1], and move i and
j right, so j is at the end of P.
Figure: P = "d c b a" contains no x; after the shift, P[1] is under T[i+1] and j is reset to jnew at the end of P.
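The three cases can be folded into a single update of i, with j then reset to the end of P. A sketch, assuming 0-based indices (so the slides' P[1] is P[0] here) and a hypothetical helper `nextI`; `lastOcc` stands for the index of the last x in P, or -1 when x does not occur in P. In each case the new i is the text position under the last character of the shifted pattern.

```java
class CharacterJump {
    // After a mismatch between T[i] = x and P[j] (0-based, pattern length m),
    // return the new i; the caller then sets j = m - 1.
    // lastOcc = index of the last x in P, or -1 if x is not in P.
    static int nextI(int i, int j, int m, int lastOcc) {
        if (lastOcc >= 0 && lastOcc < j)   // Case 1: align last x in P with T[i]
            return i + m - (1 + lastOcc);
        if (lastOcc > j)                   // Case 2: last x is after j, shift by 1
            return i + m - j;
        return i + m;                      // Case 3: no x in P, P[0] under T[i+1]
    }

    // The same rule written as one expression.
    static int nextIFormula(int i, int j, int m, int lastOcc) {
        return i + m - Math.min(j, 1 + lastOcc);
    }
}
```

The branches agree with the single formula i + m - min(j, 1 + lastOcc): in case 1, 1 + lastOcc <= j; in case 2, 1 + lastOcc > j; in case 3, 1 + lastOcc = 0. This is why a Boyer-Moore loop can use one expression for all three cases.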
Boyer-Moore Example (1)
Figure: searching for P = "rithm" in T = "a pattern matching algorithm"; the 11 character comparisons are numbered in order, and the last five (7 to 11) find the match.
x      a  b  c  d
L(x)   5  6  4  -1
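The last-occurrence function L(x) can be stored as an array indexed by character code. This sketch assumes ASCII characters (codes 0 to 127) and 0-based positions, and uses the hypothetical pattern P = "abacab", chosen because its 1-based positions reproduce the a, b and c entries of the table; the 0-based values it prints are one less. The class name `LastOccurrence` is also an assumption.

```java
import java.util.Arrays;

class LastOccurrence {
    // last[c] = 0-based index of the last occurrence of character c in
    // pattern, or -1 if c does not occur. Assumes ASCII (codes 0..127).
    static int[] buildLast(String pattern) {
        int[] last = new int[128];
        Arrays.fill(last, -1);   // -1 means "not in P"
        for (int i = 0; i < pattern.length(); i++)
            last[pattern.charAt(i)] = i;   // later occurrences overwrite earlier ones
        return last;
    }

    public static void main(String[] args) {
        int[] last = buildLast("abacab");
        for (char x : "abcd".toCharArray())
            System.out.print(x + "=" + last[x] + " ");
        System.out.println();   // prints a=4 b=5 c=3 d=-1
    }
}
```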
Boyer-Moore in Java
Return index where pattern starts, or -1
if (i > n-1)
return -1; // no match if pattern is
// longer than text
:
return last;
} // end of buildLast()
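A complete Boyer-Moore matcher, combining right-to-left (looking-glass) comparison with the character-jump rule i = i + m - min(j, 1 + L(x)), might look like the sketch below. It assumes ASCII characters (codes 0 to 127) and 0-based indices; `buildLast` and the pattern-longer-than-text check are the parts visible in the fragment above, while the class name `BoyerMoore` and the method name `bmMatch` are assumptions.

```java
import java.util.Arrays;

class BoyerMoore {
    // Last-occurrence table: last[c] = index of the last c in pattern, or -1.
    static int[] buildLast(String pattern) {
        int[] last = new int[128];   // ASCII character set
        Arrays.fill(last, -1);
        for (int i = 0; i < pattern.length(); i++)
            last[pattern.charAt(i)] = i;
        return last;
    }

    // Return the 0-based index where pattern starts in text, or -1.
    static int bmMatch(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        int[] last = buildLast(pattern);
        int i = m - 1;
        if (i > n - 1)
            return -1;   // no match if pattern is longer than text
        int j = m - 1;
        do {
            if (pattern.charAt(j) == text.charAt(i)) {
                if (j == 0)
                    return i;   // match
                i--;            // looking-glass: move right to left
                j--;
            } else {            // character jump
                int lo = last[text.charAt(i)];
                i = i + m - Math.min(j, 1 + lo);
                j = m - 1;
            }
        } while (i <= n - 1);
        return -1;   // no match
    }

    public static void main(String[] args) {
        System.out.println(bmMatch("a pattern matching algorithm", "rithm"));   // prints 23
    }
}
```

On the example text this returns 23, the 0-based start of "rithm", after the 11 character comparisons of Boyer-Moore Example (1).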