
String Matching Class

The document summarizes the Boyer-Moore string-matching algorithm. Unlike the Knuth-Morris-Pratt algorithm, it compares the pattern against the text from right to left. It uses two heuristics, the bad-character heuristic and the good-suffix heuristic, to determine how far to shift the pattern. The worst-case running time is O((n - m + 1)m + |Σ|), where n is the text length, m is the pattern length, and |Σ| is the alphabet size.

B.H. Manjunatha Kumar @ S.S.I.T
The Boyer-Moore Algorithm
“If the pattern P is relatively long and the alphabet Σ is reasonably large, then [this algorithm] is likely to be the most efficient string-matching algorithm.”
At each shift it compares the pattern with the text from right to left, unlike KMP, which compares from left to right.

Boyer-Moore-Matcher(T, P, Σ)
1.  n <- length[T]
2.  m <- length[P]
3.  λ <- Compute-Last-Occurrence-Function(P, m, Σ)
4.  γ <- Compute-Good-Suffix-Function(P, m)
5.  s <- 0
6.  while s <= n - m
7.      do j <- m
8.         while j > 0 and P[j] = T[s+j]
9.             do j <- j - 1
10.        if j = 0
11.           then print “Pattern occurs at shift” s
12.                s <- s + γ[0]
13.           else s <- s + max(γ[j], j - λ[T[s+j]])
The function Boyer-Moore-Matcher(T, P, Σ) “looks remarkably like the naive string-matching algorithm.” Indeed, commenting out lines 3-4 and changing the updates of s on lines 12-13 to s <- s + 1 results in a version of the naive string-matching algorithm.
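A minimal Python sketch of the naive matcher obtained this way (the name `naive_right_to_left` is illustrative, not from the slides). It keeps the pseudocode's 1-based j by indexing `pattern[j - 1]` and `text[s + j - 1]`, and collects the valid shifts as 0-based offsets instead of printing them.

```python
def naive_right_to_left(text: str, pattern: str) -> list[int]:
    """Naive matcher in the shape of Boyer-Moore-Matcher:
    lines 3-4 dropped and both shift updates replaced by s <- s + 1."""
    n, m = len(text), len(pattern)
    shifts = []
    s = 0
    while s <= n - m:
        j = m  # 1-based position in P, as in the pseudocode
        # Compare right to left; P[j] is pattern[j - 1] and T[s+j] is text[s + j - 1].
        while j > 0 and pattern[j - 1] == text[s + j - 1]:
            j -= 1
        if j == 0:
            shifts.append(s)  # pattern occurs at shift s
        s += 1  # no heuristics: always advance by one
    return shifts


print(naive_right_to_left("abracadabra", "abra"))  # [0, 7]
```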
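The Boyer-Moore algorithm shifts by the larger of the amounts proposed by its two heuristics.
The first heuristic is the bad-character heuristic. In general, it works as follows.
Suppose P[j] != T[s+j] for some j, where 1 <= j <= m, while P[j+1..m] = T[s+j+1..s+m] (j is the rightmost mismatch).
Let k be the largest index in the range 1 <= k <= m such that T[s+j] = P[k], if any such k exists. Otherwise let k = 0. (This k is λ[T[s+j]].)
We can safely increase s by j - k. There are three cases:
Case 1. k = 0: the bad character T[s+j] does not occur in P, so the pattern can be moved completely past it, a shift of j.
Case 2. k < j: the rightmost occurrence of T[s+j] in P lies to the left of position j, and a shift of j - k > 0 aligns it with T[s+j].
Case 3. k > j, resulting in a negative shift; the algorithm ignores this proposal, since the good-suffix heuristic always proposes a positive shift.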
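The rightmost-occurrence index k is exactly what Compute-Last-Occurrence-Function tabulates as λ. A sketch under stated assumptions: Python strings for P and T, the alphabet Σ passed as a string, and hypothetical helper names `compute_last_occurrence` and `bad_character_shift`. The three calls reproduce the three cases for P = abcab.

```python
def compute_last_occurrence(pattern: str, alphabet: str) -> dict[str, int]:
    """lam[a] = largest k (1-based) with P[k] == a, or 0 if a does not occur in P."""
    lam = {a: 0 for a in alphabet}  # O(|Σ|) initialisation
    for k, ch in enumerate(pattern, start=1):  # O(m); later positions overwrite earlier ones
        lam[ch] = k
    return lam


def bad_character_shift(lam: dict[str, int], text: str, s: int, j: int) -> int:
    """Shift j - k proposed after the mismatch P[j] != T[s+j] (1-based j).
    Characters missing from lam are treated as not occurring in P (k = 0)."""
    k = lam.get(text[s + j - 1], 0)
    return j - k


lam = compute_last_occurrence("abcab", "abcd")
print(lam)                                      # {'a': 4, 'b': 5, 'c': 3, 'd': 0}
print(bad_character_shift(lam, "abdab", 0, 3))  # 3  (Case 1: 'd' does not occur in P)
print(bad_character_shift(lam, "xxxxa", 0, 5))  # 1  (Case 2: k = 4 < j = 5)
print(bad_character_shift(lam, "aacab", 0, 2))  # -2 (Case 3: k = 4 > j = 2)
```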
Good Suffix Heuristic
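Define the relation Q ~ R (“Q is similar to R”) for strings Q and R to mean that Q ⊐ R or R ⊐ Q, where Q ⊐ R means that Q is a suffix of R.
If two strings are similar, then we can align them with their rightmost characters matched, and no pair of aligned characters will disagree.
The relation “~” is symmetric: Q ~ R if and only if R ~ Q.
Also, Q ⊐ R and S ⊐ R imply Q ~ S (two suffixes of the same string are similar).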
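The similarity relation fits in two short Python functions (illustrative names `is_suffix` and `similar`). The last line shows why the implication about a common string R needs the suffix relation ⊐ rather than ~ itself: ~ is not transitive.

```python
def is_suffix(q: str, r: str) -> bool:
    """Q ⊐ R: Q is a suffix of R."""
    return r.endswith(q)


def similar(q: str, r: str) -> bool:
    """Q ~ R: one string is a suffix of the other, so right-aligning them
    leaves no pair of disagreeing characters."""
    return is_suffix(q, r) or is_suffix(r, q)


print(similar("ab", "cab"), similar("cab", "ab"))  # True True (symmetric)
# Q ⊐ R and S ⊐ R imply Q ~ S:
print(is_suffix("b", "cab"), is_suffix("ab", "cab"), similar("b", "ab"))  # True True True
# ~ alone is not transitive: "ab" ~ "b" and "cb" ~ "b", yet "ab" and "cb" are not similar.
print(similar("ab", "b"), similar("cb", "b"), similar("ab", "cb"))  # True True False
```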
“If P[j] != T[s+j], where j < m, then the good-suffix heuristic says that we can safely advance by
γ[j] = m - max{k : 0 <= k < m and P[j+1..m] ~ P_k}”
where P_k = P[1..k] denotes the k-character prefix of P (P_0 is the empty string).
“γ[j] is the least amount we can advance s and not cause any characters in the “good suffix” T[s + j + 1..s + m] to
be mismatched against the new alignment of the pattern.”
Since k < m in the definition, γ[j] >= 1 for all j = 0, 1, ..., m, which ensures that the algorithm always makes progress.
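To make the definition concrete, here is a direct and deliberately slow (O(m^3)) transcription into Python, assuming a non-empty pattern. `good_suffix_by_definition` is a hypothetical name; P[j+1..m] and P_k become the slices `pattern[j:]` and `pattern[:k]`. Every entry it returns is at least 1.

```python
def good_suffix_by_definition(pattern: str) -> list[int]:
    """gamma[j] = m - max{k : 0 <= k < m and P[j+1..m] ~ P_k} for j = 0..m.
    A literal transcription of the definition; assumes m >= 1."""
    m = len(pattern)
    gamma = []
    for j in range(m + 1):
        good_suffix = pattern[j:]  # P[j+1..m] (empty when j = m)
        # k = 0 always qualifies because the empty prefix P_0 is a suffix of every string.
        best = max(
            k for k in range(m)
            if good_suffix.endswith(pattern[:k]) or pattern[:k].endswith(good_suffix)
        )
        gamma.append(m - best)
    return gamma


print(good_suffix_by_definition("abcab"))  # [3, 3, 3, 3, 3, 1]
```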
Example: Computing the Good Suffix Function
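The body of Compute-Good-Suffix-Function is not shown here. One known O(m) construction, sketched below under the assumption that the KMP prefix function π is available, combines π for P with π' for the reversed pattern; `compute_prefix_function` and `compute_good_suffix` are illustrative names. For P = abcbab it produces the table shown; for instance γ[5] = 2 because the good suffix “b” reappears at P[4], two positions to the left.

```python
def compute_prefix_function(p: str) -> list[int]:
    """KMP prefix function, 1-based: pi[q] = length of the longest proper prefix
    of P[1..q] that is also a suffix of P[1..q]; pi[0] is unused."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:  # P[k+1] != P[q]
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def compute_good_suffix(p: str) -> list[int]:
    """O(m) good-suffix table gamma[0..m] from the prefix functions of P and reverse(P)."""
    m = len(p)
    pi = compute_prefix_function(p)
    pi_rev = compute_prefix_function(p[::-1])
    # Upper bound for every j: shift P so that its longest proper border lines up.
    gamma = [m - pi[m]] * (m + 1)
    # A border of length b = pi_rev[ell] in reverse(P)[1..ell] means the suffix P[m-b+1..m]
    # reoccurs ell - b positions further left: a candidate shift for j = m - b.
    for ell in range(1, m + 1):
        j = m - pi_rev[ell]
        gamma[j] = min(gamma[j], ell - pi_rev[ell])
    return gamma


print(compute_good_suffix("abcbab"))  # [4, 4, 4, 4, 4, 2, 1]
```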
Analysis
The worst-case running time is O((n - m + 1)m + |Σ|), because:
Compute-Last-Occurrence-Function takes time O(m + |Σ|).
Compute-Good-Suffix-Function takes time O(m).
The main loop may spend O(m) time validating each valid shift s, and there are at most n - m + 1 shifts.
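Putting the pieces together, a minimal runnable sketch of Boyer-Moore-Matcher in Python. Assumptions: 0-based strings, with the pseudocode's 1-based j kept via `j - 1` offsets; λ stored as a dict whose missing keys count as 0, so the explicit O(|Σ|) table initialisation is skipped; γ built with a prefix-function construction. All helper names are hypothetical.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """lambda[a] = rightmost 1-based position of a in P; absent characters are looked up as 0."""
    return {ch: k for k, ch in enumerate(pattern, start=1)}  # later k overwrite earlier ones


def prefix_function(p: str) -> list[int]:
    """1-based KMP prefix function; pi[0] is unused."""
    pi = [0] * (len(p) + 1)
    k = 0
    for q in range(2, len(p) + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def good_suffix(p: str) -> list[int]:
    """gamma[0..m] via the prefix functions of P and reverse(P), O(m) time."""
    m = len(p)
    pi, pi_rev = prefix_function(p), prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)
    for ell in range(1, m + 1):
        j = m - pi_rev[ell]
        gamma[j] = min(gamma[j], ell - pi_rev[ell])
    return gamma


def boyer_moore_matcher(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))  # the empty pattern occurs at every shift
    lam = last_occurrence(pattern)
    gamma = good_suffix(pattern)
    shifts = []
    s = 0
    while s <= n - m:
        j = m
        # Compare right to left; P[j] is pattern[j - 1], T[s+j] is text[s + j - 1].
        while j > 0 and pattern[j - 1] == text[s + j - 1]:
            j -= 1
        if j == 0:
            shifts.append(s)  # "Pattern occurs at shift s"
            s += gamma[0]
        else:
            bad_char = j - lam.get(text[s + j - 1], 0)  # negative in Case 3
            s += max(gamma[j], bad_char)
    return shifts


print(boyer_moore_matcher("abracadabra", "abra"))  # [0, 7]
```

On inputs such as T = a^n and P = a^m, every shift is valid, γ[0] = 1, and each shift costs m comparisons, which is the (n - m + 1)m term of the worst case.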

Note: For example problems, refer to the class notes.
