Moore Algorithm
NO_OF_CHARS = 256
def badCharHeuristic(string, size):
    '''
    The preprocessing function for
    Boyer Moore's bad character heuristic
    '''
            '''
            Shift the pattern so that the next character in text
            aligns with the last occurrence of it in pattern.
            The condition s+m < n is necessary for the case when
            pattern occurs at the end of text
            '''
            s += (m-badChar[ord(txt[s+m])] if s+m < n else 1)
        else:
            '''
            Shift the pattern so that the bad character in text
            aligns with the last occurrence of it in pattern. The
            max function is used to make sure that we get a positive
            shift. We may get a negative shift if the last occurrence
            of bad character in pattern is on the right side of the
            current character.
            '''
            s += max(1, j-badChar[ord(txt[s+j])])
if __name__ == '__main__':
    main()
Output
pattern occurs at shift = 4
Time Complexity : O(m*n)
Auxiliary Space: O(1)
The Bad Character Heuristic may take O(m*n) time in the worst case. The worst case occurs when all characters of the text and pattern are the same. For example, txt[] = “AAAAAAAAAAAAAAAAAA” and pat[] = “AAAAA”. The Bad Character Heuristic may take O(n/m) in the best case. The best case occurs when none of the characters of the text appear in the pattern, so every mismatch shifts the pattern by its full length m.
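Only parts of the listing above are preserved, so the following is a minimal, self-contained sketch of a bad character search rather than the original program. The function names and the sample text “ABAAABCD” with pattern “ABC” are assumptions, chosen because they reproduce the shift of 4 shown in the output. The sketch also assumes every character code is below NO_OF_CHARS.

```python
NO_OF_CHARS = 256  # assumption: all character codes are below 256


def bad_char_table(pattern):
    """Return the last index of every character code in pattern (-1 if absent)."""
    last = [-1] * NO_OF_CHARS
    for i, ch in enumerate(pattern):
        last[ord(ch)] = i
    return last


def bad_char_search(txt, pat):
    """Return every shift at which pat occurs in txt (bad character rule only)."""
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pat)
    shifts = []
    s = 0  # current shift of the pattern relative to the text
    while s <= n - m:
        j = m - 1
        # Compare right to left while characters keep matching.
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1
        if j < 0:
            shifts.append(s)
            # Align the character just past the window with its last
            # occurrence in the pattern; at the end of the text move by 1.
            s += (m - last[ord(txt[s + m])]) if s + m < n else 1
        else:
            # Align the bad character; max() keeps the shift positive.
            s += max(1, j - last[ord(txt[s + j])])
    return shifts


print(bad_char_search("ABAAABCD", "ABC"))  # [4]
```

Returning a list instead of printing each shift is a design choice of this sketch; it makes the worst case easy to observe, since the all-“A” input from the paragraph above reports a match at every shift.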
The good suffix heuristic is another rule for pattern searching. Just like the bad character heuristic, it relies on a preprocessing table generated from the pattern.
Good Suffix Heuristic
Let t be the substring of text T which is matched with a substring of pattern P. Now we shift the pattern until one of the following holds:
1) Another occurrence of t in P is matched with t in T.
2) A prefix of P matches a suffix of t.
3) P moves past t.
Case 1: Another occurrence of t in P matched with t in T
Pattern P might contain a few more occurrences of t. In such a case, we will try to shift the pattern to align that occurrence with t in text T. For example –
Explanation:
In the above example, a substring t of text T is matched with pattern P (in green) before the mismatch at index 2. We now search for an occurrence of t (“AB”) in P. We find an occurrence starting at position 1 (in yellow background), so we shift the pattern right by 2 positions to align t in P with t in T. This is the weak rule of the original Boyer Moore and is not very effective; we will discuss a Strong Good Suffix rule shortly.
Case 2: A prefix of P, which matches with suffix of t in T
We will not always find another occurrence of t in P. Sometimes there is no occurrence at all. In such cases we can sometimes search for some suffix of t matching some prefix of P and try to align them by shifting P. For example –
Explanation:
In the above example, t (“BAB”) is matched with P (in green) at index 2-4 before the mismatch. Because there is no other occurrence of t in P, we search for some prefix of P which matches some suffix of t. We find the prefix “AB” (in the yellow background) starting at index 0, which matches not the whole of t but the suffix “AB” of t starting at index 3. So we shift the pattern by 3 positions to align the prefix with the suffix.
Case 3: P moves past t
If the above two cases are not satisfied, we will shift the pattern past t. For example –
Explanation:
In the above example, there is no other occurrence of t (“AB”) in P and there is also no prefix of P which matches a suffix of t. In that case we can never find a perfect match before index 4, so we shift P past t, i.e. to index 5.
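The three cases above can be sketched as one brute-force function. This is only an illustration of the weak rule, not the table-driven preprocessing discussed later. The function name and the three patterns “CABAB”, “ABBAB” and “CBAAB” are assumptions: the figures are not preserved, so the patterns were chosen to agree with the indices and shifts given in the three explanations.

```python
def weak_good_suffix_shift(pat, j):
    """Shift for a mismatch at pat[j] when the suffix t = pat[j+1:] has matched.

    Brute-force illustration of the weak good suffix rule.
    """
    m = len(pat)
    t = pat[j + 1:]

    # Case 1: closest other occurrence of t to the left of the matched suffix.
    for start in range(j, -1, -1):
        if pat[start:start + len(t)] == t:
            return (j + 1) - start

    # Case 2: longest prefix of pat that equals a suffix of t.
    for k in range(len(t) - 1, 0, -1):
        if pat[:k] == t[-k:]:
            return m - k

    # Case 3: nothing can be reused, move the pattern past t.
    return m


print(weak_good_suffix_shift("CABAB", 2))  # case 1 -> 2
print(weak_good_suffix_shift("ABBAB", 1))  # case 2 -> 3
print(weak_good_suffix_shift("CBAAB", 2))  # case 3 -> 5
```

Scanning the pattern on every mismatch costs O(m^2) per call, which is why a practical implementation precomputes the shifts once.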
Strong Good Suffix Heuristic
Suppose substring q = P[i to n] got matched with t in T and c = P[i-1] is the mismatching character. Now, unlike case 1, we will search for an occurrence of t in P which is not preceded by character c. The closest such occurrence is then aligned with t in T by shifting pattern P. For example –
Explanation:
In the above example, q = P[7 to 8] got matched with t in T. The mismatching character c is “C” at position P[6]. If we start searching for t in P, we get the first occurrence of t starting at position 4. But this occurrence is preceded by “C”, which is equal to c, so we skip it and carry on searching. At position 1 we get another occurrence of t (in the yellow background). This occurrence is preceded by “A” (in blue), which is not equal to c. So we shift pattern P by 6 positions to align this occurrence with t in T. We do this because we already know that character c = “C” causes the mismatch. Any occurrence of t preceded by c would again cause a mismatch when aligned with t, so it is better to skip it.
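The search described in the explanation can be sketched directly. The function name and the pattern “AACCACCAC” are assumptions: the figure is not preserved, and this pattern was chosen because it has t = “AC” at positions 7 to 8, “C” at position 6, an occurrence of t at position 4 preceded by “C”, and one at position 1 preceded by “A”. Accepting an occurrence at position 0, which has no preceding character, is also a choice made for this sketch.

```python
def strong_good_suffix_shift(pat, j):
    """Shift to the closest occurrence of t = pat[j+1:] not preceded by c = pat[j].

    Returns None when no such occurrence exists, in which case the prefix
    rule (case 2) or a full shift (case 3) would have to be used instead.
    """
    t = pat[j + 1:]
    c = pat[j]
    for start in range(j, -1, -1):
        if pat[start:start + len(t)] != t:
            continue
        # Skip occurrences preceded by the character that caused the mismatch.
        if start == 0 or pat[start - 1] != c:
            return (j + 1) - start
    return None


print(strong_good_suffix_shift("AACCACCAC", 6))  # 6
```

With the same assumed pattern the weak rule would stop at the occurrence at position 4 and shift by only 3, after which the pattern would mismatch on “C” again.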
Preprocessing for Good Suffix Heuristic
As a part of preprocessing, an array shift is created. Each entry shift[i] contains the distance the pattern will shift if a mismatch occurs at position i-1. That is, the suffix of the pattern starting at position i is matched and a mismatch occurs at position i-1. Preprocessing is done separately for the strong good suffix and for case 2 discussed above.
1) Preprocessing for Strong Good Suffix
Before discussing preprocessing, let us first discuss the idea of a border. A border is a substring which is both a proper suffix and a proper prefix. For example, in the string “ccacc”, “c” is a border and “cc” is a border because each appears at both ends of the string, but “cca” is not a border.
As a part of preprocessing, an array bpos (border position) is calculated. Each entry bpos[i] contains the starting index of the widest border of the suffix starting at index i in the given pattern P. The empty suffix beginning at position m has no border, so bpos[m] is set to m+1, where m is the length of the pattern. The shift position is obtained from the borders which cannot be extended to the left. Following is the code for preprocessing –
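void preprocess_strong_suffix(int *shift, int *bpos,
                              char *pat, int m)
{
    // m is the length of the pattern
    int i = m, j = m+1;
    bpos[i] = j;
    while(i > 0)
    {
        // on a mismatch, keep searching to the right for a border
        while(j <= m && pat[i-1] != pat[j-1])
        {
            // this border cannot be extended to the left:
            // record the shift from i to j if none is stored yet
            if (shift[j] == 0)
                shift[j] = j-i;
            // move on to the next border
            j = bpos[j];
        }
        // border found (or none exists and j > m): store its start
        i--; j--;
        bpos[i] = j;
    }
}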
Explanation:
Consider pattern P = “ABBABAB”, m = 7.
The widest border of suffix “AB” beginning at position i = 5 is the empty string (nothing), starting at position 7, so bpos[5] = 7.
At position i = 2 the suffix is “BABAB”. The widest border for this suffix is “BAB” starting at position 4, so j = bpos[2] = 4.
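The two bpos values quoted above can be checked straight from the definition of a border. The helper names below are hypothetical, and the brute-force scan is only meant for checking small examples; it is not the linear-time preprocessing shown in the C code. It covers 0 <= i < m only, since bpos[m] = m+1 is a sentinel set separately.

```python
def is_border(s, b):
    """True if b is a non-empty proper prefix and proper suffix of s."""
    return 0 < len(b) < len(s) and s.startswith(b) and s.endswith(b)


def widest_border_start(pat, i):
    """Index in pat where the widest border of the suffix pat[i:] starts.

    Returns len(pat) when the suffix has only the empty border.
    """
    m = len(pat)
    suffix = pat[i:]
    # Try the longest proper candidate first.
    for k in range(len(suffix) - 1, 0, -1):
        if is_border(suffix, suffix[:k]):
            return m - k
    return m


print(is_border("ccacc", "cc"), is_border("ccacc", "cca"))  # True False
print(widest_border_start("ABBABAB", 5))  # 7
print(widest_border_start("ABBABAB", 2))  # 4
```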
We can understand bpos[i] = j using the following example –
If character # at position i-1 is equal to character ? at position j-1, we know that the border will be ? + border of suffix at position i, which starts at position j. This is equivalent to saying that the border of the suffix at i-1 begins at j-1, or bpos[ i-1 ] = j-1, or in the code –
i--;
j--;
bpos[ i ] = j
But if character # at position i-1 does not match character ? at position j-1, then we continue our search to the right. Now we know that –
    while i > 0:
    bpos = [0] * (m + 1)
    # do preprocessing
    preprocess_strong_suffix(shift, bpos, pat, m)
    preprocess_case2(shift, bpos, pat, m)
    while s <= n - m:
        j = m - 1
# Driver Code
if __name__ == "__main__":
    text = "ABAAAABAACD"
    pat = "ABA"
    search(text, pat)
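Since the listing above survives only in fragments, here is a minimal sketch of a complete good suffix search in Python. preprocess_strong_suffix is a direct port of the C function shown earlier. The body of preprocess_case2 is an assumption based on the usual formulation of the prefix rule, because its explanation is not part of the preserved text. Having search return a list of shifts instead of printing them is also a choice made for this sketch.

```python
def preprocess_strong_suffix(shift, bpos, pat, m):
    """Fill bpos and the strong good suffix entries of shift (port of the C code)."""
    i, j = m, m + 1
    bpos[i] = j
    while i > 0:
        # On a mismatch, keep searching to the right for a border.
        while j <= m and pat[i - 1] != pat[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = bpos[j]
        i -= 1
        j -= 1
        bpos[i] = j


def preprocess_case2(shift, bpos, pat, m):
    """Fill the remaining shift entries using the widest border of the pattern.

    Assumed formulation of case 2 (a prefix of P matches a suffix of t).
    """
    j = bpos[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        # Once the suffix is shorter than the current border, use the next one.
        if i == j:
            j = bpos[j]


def search(text, pat):
    """Return every shift at which pat occurs in text (good suffix rule only)."""
    m, n = len(pat), len(text)
    if m == 0 or m > n:
        return []
    bpos = [0] * (m + 1)
    shift = [0] * (m + 1)
    preprocess_strong_suffix(shift, bpos, pat, m)
    preprocess_case2(shift, bpos, pat, m)

    found = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == text[s + j]:
            j -= 1
        if j < 0:
            found.append(s)
            s += shift[0]
        else:
            # Mismatch at j, so the suffix starting at j+1 matched.
            s += shift[j + 1]
    return found


print(search("ABAAAABAACD", "ABA"))  # [0, 5]
```

For the driver inputs above, the pattern “ABA” is found at shifts 0 and 5.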