Boyer-Moore Algorithm for Pattern Searching in C++
Last Updated: 05 Aug, 2024
The Boyer-Moore algorithm is an efficient string searching algorithm used to find occurrences of a pattern within a text. It preprocesses the pattern and uses that information to skip sections of the text, which in practice often makes it much faster than simpler algorithms such as the naive approach.
In this article, we will learn the Boyer-Moore algorithm and its implementation in C++.
Example
Input:
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output:
Pattern found at index 10
The Boyer-Moore algorithm is a pattern matching algorithm that uses two heuristics to improve its performance: the bad character heuristic and the good suffix heuristic. These heuristics allow the algorithm to skip over sections of the text that cannot contain the pattern, thus reducing the number of comparisons needed.
How does Boyer-Moore Algorithm work in C++?
The algorithm works by aligning the pattern against the text and then attempting to match it from right to left. If a mismatch is found, the algorithm uses the bad character and good suffix heuristics to determine how far to shift the pattern to the right before attempting the next match.
Case 1: Mismatch Becomes Match
If a mismatch occurs at some position, take the text character that caused the mismatch and look up its last occurrence in the pattern. Shift the pattern so that this occurrence in the pattern aligns with the mismatched character in the text. This allows us to skip alignments that cannot match.
Example:
- Mismatch at position 3 with character 'A'.
- Last occurrence of 'A' in the pattern is at position 1.
- Shift the pattern right by 2 positions to align 'A' in the pattern with 'A' in the text.
Case 2: Pattern Moves Past the Mismatch Character
If the mismatched character does not exist in the pattern, shift the pattern past the mismatched character. This ensures that any future alignments do not involve the same mismatched character.
Example:
- Mismatch at position 7 with character 'C'.
- 'C' does not exist in the pattern before position 7.
- Shift the pattern right past position 7, resulting in a perfect match of the pattern in the text.
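The two cases above can be sketched as a single shift computation. This is a minimal illustration rather than the article's own code: the helper names `lastOccurrence` and `badCharacterShift` are hypothetical, positions are assumed to be 0-based, and characters are assumed to fit in 8 bits.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical helper: last index of every byte value in the pattern,
// or -1 if the byte does not occur in the pattern at all.
std::array<int, 256> lastOccurrence(const std::string& pattern)
{
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < static_cast<int>(pattern.size()); ++i)
        last[static_cast<unsigned char>(pattern[i])] = i;
    return last;
}

// Hypothetical helper: shift suggested by the bad character rule when
// pattern[j] mismatches the text character textChar.
int badCharacterShift(const std::string& pattern, int j, char textChar)
{
    const std::array<int, 256> last = lastOccurrence(pattern);
    // Case 1: textChar occurs in the pattern, so align its last occurrence.
    // Case 2: textChar is absent (last == -1), so the shift is j + 1 and the
    //         pattern moves completely past the mismatched character.
    // std::max keeps the shift positive when the last occurrence lies to
    // the right of position j.
    return std::max(1, j - last[static_cast<unsigned char>(textChar)]);
}
```

With a made-up pattern "XAYZ", a mismatch at position 3 against 'A' gives a shift of 2 (Case 1), and a mismatch at position 3 against 'C' gives a shift of 4 (Case 2).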
Steps of Boyer-Moore Algorithm to Implement in C++
To implement the Boyer-Moore algorithm in C++, follow these steps:
1. Preprocess the pattern to create the bad character table. This table stores the last occurrence of each character in the pattern. If a character is not present in the pattern, its value is set to -1.
2. Preprocess the pattern to create the good suffix table. This table determines how far to shift the pattern when a mismatch occurs after a partial match.
3. Initialize the shift of the pattern to 0, so that the pattern is aligned with the beginning of the text.
4. Compare the pattern with the text from right to left. Start with the last character of the pattern and move towards the first character.
5. If a mismatch is found, use the bad character table and the good suffix table to calculate the shift distance. Shift the pattern to the right by the larger of the two suggested shifts.
6. If a complete match is found, report the starting index of the match in the text and shift the pattern to the right to continue searching.
7. Repeat steps 4 to 6 until the pattern has moved past the last position at which it still fits inside the text.
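The steps above, including the good suffix table, can be sketched as follows. This is one common way to build the table (the "strong good suffix" construction based on borders), offered as an illustration under assumptions and not as the only possible implementation. The function name `boyerMooreSearch` and the arrays `border` and `shift` are hypothetical names, and characters are assumed to fit in 8 bits.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: returns the 0-based start index of every match.
std::vector<int> boyerMooreSearch(const std::string& text, const std::string& pattern)
{
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    std::vector<int> matches;
    if (m == 0 || m > n)
        return matches;

    // Step 1: bad character table (last index of each byte, or -1).
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i)
        last[static_cast<unsigned char>(pattern[i])] = i;

    // Step 2: good suffix table. border[i] is the start of the widest
    // border of the suffix pattern[i..m-1]; shift[i] is the shift to apply
    // when that suffix matched and the mismatch happened at index i - 1.
    std::vector<int> border(m + 1), shift(m + 1, 0);
    int i = m, j = m + 1;
    border[i] = j;
    while (i > 0) {
        while (j <= m && pattern[i - 1] != pattern[j - 1]) {
            if (shift[j] == 0)
                shift[j] = j - i;
            j = border[j];
        }
        --i;
        --j;
        border[i] = j;
    }
    // Remaining entries: only a prefix of the pattern can line up with
    // the end of the matched suffix.
    j = border[0];
    for (i = 0; i <= m; ++i) {
        if (shift[i] == 0)
            shift[i] = j;
        if (i == j)
            j = border[j];
    }

    // Steps 3 to 7: slide the pattern over the text.
    int s = 0;
    while (s <= n - m) {
        int k = m - 1;
        while (k >= 0 && pattern[k] == text[s + k])
            --k; // compare right to left
        if (k < 0) {
            matches.push_back(s);
            s += shift[0];
        } else {
            const int bad = k - last[static_cast<unsigned char>(text[s + k])];
            s += std::max(shift[k + 1], bad); // larger of the two heuristics
        }
    }
    return matches;
}
```

Calling `boyerMooreSearch("THIS IS A TEST TEXT", "TEST")` returns a vector containing the single index 10, matching the example at the top of the article. Returning the indices instead of printing them is a design choice made here so that the result is easy to check.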
C++ Program to Implement Boyer-Moore Algorithm
Below is a C++ program that implements pattern searching with the bad character heuristic of the Boyer-Moore algorithm. The good suffix heuristic is not used in this program:
C++
// C++ Program for Bad Character Heuristic of Boyer Moore String Matching Algorithm
#include <algorithm>
#include <iostream>
#include <string>
#define NO_OF_CHARS 256
using namespace std;

// The preprocessing function for Boyer Moore's
// bad character heuristic
void badCharHeuristic(const string &str, int size, int badchar[NO_OF_CHARS])
{
    // Initialize all occurrences as -1
    for (int i = 0; i < NO_OF_CHARS; i++)
        badchar[i] = -1;

    // Fill the actual value of last occurrence of a character.
    // The cast to unsigned char keeps the index in 0..255 even
    // on platforms where plain char is signed.
    for (int i = 0; i < size; i++)
        badchar[(unsigned char)str[i]] = i;
}

/* A pattern searching function that uses Bad
Character Heuristic of Boyer Moore Algorithm */
void search(const string &txt, const string &pat)
{
    int m = pat.size();
    int n = txt.size();
    int badchar[NO_OF_CHARS];

    /* Fill the bad character array by calling
    the preprocessing function badCharHeuristic()
    for the given pattern */
    badCharHeuristic(pat, m, badchar);

    int s = 0; // s is the shift of the pattern with
               // respect to text
    while (s <= (n - m))
    {
        int j = m - 1;

        /* Keep reducing index j of the pattern while
        characters of the pattern and text are
        matching at this shift s */
        while (j >= 0 && pat[j] == txt[s + j])
            j--;

        /* If the pattern is present at the current
        shift, then index j will become -1 after
        the above loop */
        if (j < 0)
        {
            cout << "Pattern occurs at shift = " << s << endl;

            /* Shift the pattern so that the next
            character in the text aligns with the last
            occurrence of it in the pattern.
            The condition s+m < n is necessary for
            the case when the pattern occurs at the end
            of the text */
            s += (s + m < n) ? m - badchar[(unsigned char)txt[s + m]] : 1;
        }
        else
        {
            /* Shift the pattern so that the bad character
            in the text aligns with the last occurrence of
            it in the pattern. The max function is used to
            make sure that we get a positive shift.
            We may get a negative shift if the last
            occurrence of the bad character in the pattern
            is on the right side of the current
            character. */
            s += max(1, j - badchar[(unsigned char)txt[s + j]]);
        }
    }
}

/* Driver code */
int main()
{
    string txt = "ABAAABCD";
    string pat = "ABC";
    search(txt, pat);
    return 0;
}
Output
Pattern occurs at shift = 4
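Time Complexity: O(m*n) in the worst case
Auxiliary Space: O(1), since the bad character table has a fixed size of 256 entries regardless of the input
The bad character heuristic on its own may take O(m*n) time in the worst case. The worst case occurs when all characters of the text and pattern are the same, for example txt[] = "AAAAAAAAAAAAAAAAAA" and pat[] = "AAAAA". In the best case it takes O(n/m) time. This happens when the text character aligned with the end of the pattern never occurs in the pattern, so every alignment fails after a single comparison and the pattern shifts by its full length m.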
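To make the gap between the worst and best case concrete, the sketch below counts character comparisons for a search that uses only the bad character heuristic. It follows the same shifting rules as the program above, but the function name `countComparisons` is a hypothetical helper introduced for this illustration, and characters are assumed to fit in 8 bits.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical helper: number of character comparisons performed by a
// bad-character-only Boyer-Moore search over the whole text.
long countComparisons(const std::string& text, const std::string& pattern)
{
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > n)
        return 0;

    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i)
        last[static_cast<unsigned char>(pattern[i])] = i;

    long comparisons = 0;
    int s = 0;
    while (s <= n - m) {
        int j = m - 1;
        while (j >= 0) {
            ++comparisons; // one pattern/text character comparison
            if (pattern[j] != text[s + j])
                break;
            --j;
        }
        if (j < 0) {
            // Full match: same shift rule as after a reported occurrence.
            s += (s + m < n)
                     ? m - last[static_cast<unsigned char>(text[s + m])]
                     : 1;
        } else {
            s += std::max(1, j - last[static_cast<unsigned char>(text[s + j])]);
        }
    }
    return comparisons;
}
```

For a text of 18 'A' characters and the pattern "AAAAA", this counts 70 comparisons (14 alignments of 5 comparisons each). For a text of 18 'B' characters and the same pattern, it counts only 3, because each alignment fails on the first comparison and the pattern jumps 5 positions.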
Boyer-Moore vs Traditional Pattern Searching Algorithms
The Naive algorithm checks each position in the text one by one, whereas Boyer-Moore can often skip large sections of the text, which results in faster performance. The Knuth-Morris-Pratt (KMP) algorithm improves on the Naive algorithm by preprocessing the pattern into a partial match table, which lets it avoid re-examining text characters, but it still inspects every character of the text. Boyer-Moore is often faster than KMP in practice, particularly for longer patterns and larger alphabets, because its heuristics let it skip text characters without examining them at all.
Applications of Boyer-Moore Algorithm
- Used in text editors and search engines to find occurrences of a word or phrase.
- Used in DNA sequence analysis to find patterns within genetic data.
- Used in algorithms for data compression to find repeating patterns.
- Used in intrusion detection systems to find patterns of malicious activity within network traffic.
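For applications like these, hand-written code is not always necessary. Since C++17 the standard library provides `std::boyer_moore_searcher` in `<functional>`, which can be passed to `std::search`. The sketch below assumes a C++17 compiler; the wrapper name `findAllWithStdSearcher` is a hypothetical helper introduced for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: collects the start index of every (possibly
// overlapping) occurrence using the standard library searcher.
std::vector<std::size_t> findAllWithStdSearcher(const std::string& text,
                                                const std::string& pattern)
{
    std::vector<std::size_t> hits;
    if (pattern.empty() || pattern.size() > text.size())
        return hits;

    // The searcher preprocesses the pattern once and is reused below.
    const std::boyer_moore_searcher searcher(pattern.begin(), pattern.end());

    auto it = text.begin();
    while (true) {
        it = std::search(it, text.end(), searcher);
        if (it == text.end())
            break;
        hits.push_back(static_cast<std::size_t>(it - text.begin()));
        ++it; // resume one position later to allow overlapping matches
    }
    return hits;
}
```

Resuming the search one position after each hit is a choice made here so that overlapping occurrences are reported; advancing by the pattern length instead would report only non-overlapping ones.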