Boyer-Moore Algorithm

explanation of boyer-moore in scientific view

Uploaded by

Andrej Mikuš
Copyright
© © All Rights Reserved

Definitions

Boyer-moore algorithm

A N P A N M A N -
P A N - - - - - -
- P A N - - - - -
- - P A N - - - -
- - - P A N - - -
- - - - P A N - -
- - - - - P A N -
Alignments of pattern PAN to text ANPANMAN,
from k=3 to k=8. A match occurs at k=5.

T denotes the input text to be searched. Its length is n.
P denotes the string to be searched for, called the pattern. Its length is m.
S[i] denotes the character at index i of string S, counting from 1.
S[i..j] denotes the substring of string S starting at index i and ending at j,
inclusive.
A prefix of S is a substring S[1..i] for some i in range [1, l], where l is the
length of S.
A suffix of S is a substring S[i..l] for some i in range [1, l], where l is the
length of S.
An alignment of P to T is an index k in T such that the last character of P is
aligned with index k of T.
A match or occurrence of P occurs at an alignment k if P is equivalent to
T[(k-m+1)..k].
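
The definitions above can be checked against the ANPANMAN example with a minimal
Python sketch. The function name matches_at is hypothetical, and the only
subtlety is the translation from the 1-based inclusive notation T[(k-m+1)..k] to
Python's 0-based, end-exclusive slice text[k-m:k].

```python
def matches_at(text: str, pattern: str, k: int) -> bool:
    """Return True if pattern occurs at alignment k.

    k is 1-based and names the index of T that the last character of P
    is aligned with, as in the definitions above.
    """
    m = len(pattern)
    if k < m or k > len(text):
        return False  # the pattern would hang off one end of the text
    # T[(k-m+1)..k] (1-based, inclusive) is text[k-m:k] in Python.
    return text[k - m:k] == pattern


text, pattern = "ANPANMAN", "PAN"
# Try every alignment from k = m to k = n.
matching = [k for k in range(len(pattern), len(text) + 1)
            if matches_at(text, pattern, k)]
print(matching)
```

For this text and pattern the only alignment reported is k=5, in agreement with
the figure.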

Description

The Boyer–Moore algorithm searches for occurrences of P in T by performing explicit
character comparisons at different alignments. Instead of a brute-force search of
all alignments (of which there are n − m + 1), Boyer–Moore uses information gained
by preprocessing P to skip as many alignments as possible.

Before the introduction of this algorithm, the usual way to search within text was
to scan the text character by character for the first character of the pattern.
Once that was found, the subsequent characters of the text were compared to the
remaining characters of the pattern. If the comparison failed, the scan resumed
from the next character of the text. With this approach almost every character in
the text has to be examined at least once.
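
For comparison, the earlier approach can be sketched as follows. This is an
illustrative brute-force search, not code from any particular library. The name
naive_search is hypothetical, a non-empty pattern is assumed, and results are
reported as 1-based alignments k to stay consistent with the definitions.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Brute-force search: try every alignment, comparing left to right."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):  # 0-based start of each candidate window
        j = 0
        # Compare the pattern to the text from the first character onward.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            # The last pattern character sits at 1-based text index start + m.
            hits.append(start + m)
    return hits


print(naive_search("ANPANMAN", "PAN"))
```

Every one of the n − m + 1 alignments is visited, which is the cost Boyer–Moore
tries to avoid.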

The key insight of this algorithm is that if the pattern is compared to the text
starting from the end of the pattern, then jumps along the text can be made rather
than checking every character of the text. When the pattern is lined up against
the text, the last character of the pattern is compared to the corresponding
character in the text. If the characters do not match, there is no need to
continue comparing backward at this alignment. If the character in the text does
not occur anywhere in the pattern, then no alignment that covers that character
can be a match, so the next character in the text to check is located m characters
farther along the text, where m is the length of the pattern. If the character in
the text does occur in the pattern, then the pattern is shifted along the text by
a smaller amount, so that the matching pattern character lines up with it, and the
process is repeated. Jumping along the text in this way, rather than checking
every character, decreases the number of comparisons that have to be made, which
is the key to the efficiency of the algorithm.
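
The jumping idea described above can be sketched in a simplified form. Note that
this is not the full Boyer–Moore algorithm: it always shifts according to the
text character aligned with the last pattern position, which is the
simplification usually known as the Horspool variant. The name horspool_search
and the dictionary-based shift table are choices made for this illustration.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Simplified last-character skipping (Horspool-style sketch)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # For each character in P[1..m-1], the distance from its rightmost
    # occurrence to the end of the pattern. Later entries overwrite earlier
    # ones, so the rightmost occurrence (smallest shift) wins.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    k = m  # 1-based alignment: last character of P sits at T[k]
    while k <= n:
        # Compare backward from the end of the pattern.
        j = m - 1
        while j >= 0 and pattern[j] == text[k - m + j]:
            j -= 1
        if j < 0:
            hits.append(k)
        # Characters absent from the table do not occur in P[1..m-1],
        # so the pattern can jump by its full length m.
        k += shift.get(text[k - 1], m)
    return hits


print(horspool_search("ANPANMAN", "PAN"))
```

On ANPANMAN with pattern PAN this sketch visits only the alignments k=3, k=5 and
k=8, instead of all six.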

More formally, the algorithm begins at alignment k = m, so the start of P is
aligned with the start of T. Characters in P and T are then compared starting at
index m in P and k in T, moving backward from the end of P toward its start. The
comparisons continue until either the beginning of P is reached (which means there
is a match) or a mismatch occurs, upon which the alignment is shifted forward (to
the right) according to the maximum value permitted by a number of rules. The
comparisons are performed again at the new alignment, and the process repeats
until the alignment is shifted past the end of T, which means no further matches
will be found.
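
The loop just described can be sketched as a skeleton in which the shift rules
are left pluggable. Everything about the interface here is an assumption made for
illustration: the name search, the rules parameter, and the convention that a
rule receives the 1-based mismatch position j in P and the mismatched text
character. After a match the sketch simply shifts by one, which is a
simplification. With no rules supplied it degenerates to a brute-force search
that compares right to left.

```python
from typing import Callable, Sequence

# Hypothetical rule interface: (j, c) -> proposed shift, where j is the
# 1-based index in P at which the mismatch occurred and c is the text
# character that failed to match.
ShiftRule = Callable[[int, str], int]


def search(text: str, pattern: str,
           rules: Sequence[ShiftRule] = ()) -> list[int]:
    n, m = len(text), len(pattern)
    hits: list[int] = []
    if m == 0:
        return hits
    k = m  # start of P aligned with start of T
    while k <= n:
        j, i = m, k  # 1-based indices into P and T
        while j >= 1 and pattern[j - 1] == text[i - 1]:
            j -= 1
            i -= 1
        if j == 0:
            hits.append(k)  # beginning of P reached: a match
            k += 1          # simplification: minimal shift after a match
        else:
            c = text[i - 1]
            # Take the largest shift any rule proposes, but at least 1.
            k += max([1] + [rule(j, c) for rule in rules])
    return hits


print(search("ANPANMAN", "PAN"))
```

Taking the maximum over the rules is safe only if each rule proposes shifts that
cannot skip over a match, which is a property the rules themselves must
guarantee.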

The shift rules are implemented as constant-time table lookups, using tables
generated during the preprocessing of P.
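
As an illustration of such a table, the sketch below preprocesses P into a
last-occurrence table and uses it for a simplified bad-character shift. The
source text does not name the individual rules, so this is one commonly used
rule shown under that assumption. The function names are hypothetical, and a
Python dict stands in for the table (lookups are constant time on average).

```python
def build_last_occurrence(pattern: str) -> dict[str, int]:
    """Map each character to the 1-based index of its rightmost
    occurrence in the pattern. Built once, in O(m) time."""
    return {c: i for i, c in enumerate(pattern, start=1)}


def bad_character_shift(last: dict[str, int], j: int, c: str) -> int:
    """Shift proposed after a mismatch at 1-based pattern index j
    against text character c. Characters absent from the pattern are
    treated as occurring at index 0; the shift is never less than 1."""
    return max(1, j - last.get(c, 0))


last = build_last_occurrence("PAN")
print(last)
print(bad_character_shift(last, 3, "P"))  # 'P' is 2 positions left of index 3
print(bad_character_shift(last, 3, "M"))  # 'M' is absent: shift by j = 3
```

The table depends only on P, so it can be built once and reused for any text.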
