Boyer-Moore Algorithm for Pattern Searching in C++
Last Updated :
05 Aug, 2024
The Boyer-Moore algorithm is an efficient string searching algorithm that is used to find occurrences of a pattern within a text. This algorithm preprocesses the pattern and uses this information to skip sections of the text, making it much faster than simpler algorithms like the naive approach.
In this article, we will learn the Boyer-Moore algorithm and its implementation in C++.
Example
Input:
txt[] = “THIS IS A TEST TEXT”
pat[] = “TEST”
Output:
Pattern found at index 10
The Boyer-Moore algorithm is a pattern matching algorithm that uses two heuristics to improve its performance: the bad character heuristic and the good suffix heuristic. These heuristics allow the algorithm to skip over sections of the text that cannot contain the pattern, thus reducing the number of comparisons needed.
Boyer-Moore in C++How does Boyer-Moore Algorithm work in C++?
The algorithm works by aligning the pattern against the text and then attempting to match it from right to left. If a mismatch is found, the algorithm uses the bad character and good suffix heuristics to determine how far to shift the pattern to the right before attempting the next match.
Case 1: Mismatch Becomes Match
If a mismatch occurs at a position, look up the last occurrence of the mismatched character in the pattern. Shift the pattern such that this character in the pattern aligns with the mismatched character in the text. This allows us to skip unnecessary comparisons.
Example:
- Mismatch at position 3 with character 'A'.
- Last occurrence of 'A' in the pattern is at position 1.
- Shift the pattern right by 2 positions to align 'A' in the pattern with 'A' in the text.
Case 2: Pattern Moves Past the Mismatch Character
If the mismatched character does not exist in the pattern, shift the pattern past the mismatched character. This ensures that any future alignments do not involve the same mismatched character.
Example:
- Mismatch at position 7 with character 'C'.
- 'C' does not exist in the pattern before position 7.
- Shift the pattern right past position 7, resulting in a perfect match of the pattern in the text.
Steps of Boyer-Moore Algorithm to Implement in C++
To implement the Boyer-Moore algorithm in C++, follow these steps:
- Preprocess the pattern to create the bad character table. This table stores the last occurrence of each character in the pattern. If a character is not present in the pattern, its value is set to -1.
- Preprocess the pattern to create the good suffix table. This table helps determine how far to shift the pattern when a mismatch occurs after a partial match.
- Initialize the shift of the pattern to the beginning of the text.
- Compare the pattern with the text from right to left. Start with the last character of the pattern and move towards the first character.
- If a mismatch is found, use the bad character table and good suffix table to calculate the shift distance. Shift the pattern to the right by the maximum value suggested by either heuristic.
- If a complete match is found, print the starting index of the match in the text and shift the pattern to the right to continue searching.
- Repeat steps 4 to 6 until the pattern has been aligned with the end of the text.
C++ Program to Implement Boyer-Moore Algorithm
Below is a C++ program that implements the Boyer-Moore algorithm for pattern searching:
C++
// C++ Program for Bad Character Heuristic of Boyer Moore String Matching Algorithm
#include <algorithm>
#include <iostream>
#include <string>
#define NO_OF_CHARS 256
using namespace std;
// The preprocessing function for Boyer Moore's
// bad character heuristic
void badCharHeuristic(const string &str, int size, int badchar[NO_OF_CHARS])
{
// Initialize all occurrences as -1
for (int i = 0; i < NO_OF_CHARS; i++)
badchar[i] = -1;
// Fill the actual value of last occurrence
// of a character
for (int i = 0; i < size; i++)
badchar[(int)str[i]] = i;
}
/* A pattern searching function that uses Bad
Character Heuristic of Boyer Moore Algorithm */
void search(const string &txt, const string &pat)
{
int m = pat.size();
int n = txt.size();
int badchar[NO_OF_CHARS];
/* Fill the bad character array by calling
the preprocessing function badCharHeuristic()
for the given pattern */
badCharHeuristic(pat, m, badchar);
int s = 0; // s is the shift of the pattern with
// respect to text
while (s <= (n - m))
{
int j = m - 1;
/* Keep reducing index j of the pattern while
characters of the pattern and text are
matching at this shift s */
while (j >= 0 && pat[j] == txt[s + j])
j--;
/* If the pattern is present at the current
shift, then index j will become -1 after
the above loop */
if (j < 0)
{
cout << "Pattern occurs at shift = " << s << endl;
/* Shift the pattern so that the next
character in the text aligns with the last
occurrence of it in the pattern.
The condition s+m < n is necessary for
the case when the pattern occurs at the end
of the text */
s += (s + m < n) ? m - badchar[txt[s + m]] : 1;
}
else
{
/* Shift the pattern so that the bad character
in the text aligns with the last occurrence of
it in the pattern. The max function is used to
make sure that we get a positive shift.
We may get a negative shift if the last
occurrence of the bad character in the pattern
is on the right side of the current
character. */
s += max(1, j - badchar[txt[s + j]]);
}
}
}
/* Driver code */
int main()
{
string txt = "ABAAABCD";
string pat = "ABC";
search(txt, pat);
return 0;
}
OutputPattern occurs at shift = 4
Time Complexity: O(m*n)
Auxiliary Space: O(1)
The Bad Character Heuristic may take O(m*n) time in worst case. The worst case occurs when all characters of the text and pattern are same. For example, txt[] = “AAAAAAAAAAAAAAAAAA” and pat[] = “AAAAA”. The Bad Character Heuristic may take O(n/m) in the best case. The best case occurs when all the characters of the text and pattern are different.
Boyer-Moore vs Traditional Pattern Searching Algorithms
Compared to traditional algorithms like the Naive algorithm, which checks each position in the text one by one, Boyer-Moore often skips large sections of the text, resulting in faster performance. The Knuth-Morris-Pratt (KMP) algorithm is more efficient than the Naive algorithm by preprocessing the pattern to create a partial match table, allowing it to skip unnecessary comparisons, but Boyer-Moore generally outperforms KMP in practical applications due to its heuristics.
Applications of Boyer-Moore Algorithm
- Used in text editors and search engines to find occurrences of a word or phrase.
- Used in DNA sequence analysis to find patterns within genetic data.
- Used in algorithms for data compression to find repeating patterns.
- Used in intrusion detection systems to find patterns of malicious activity within network traffic.
Similar Reads
Boyer-Moore Algorithm for Pattern Searching in C
In this article, we will learn the Boyer-Moore Algorithm, a powerful technique for pattern searching in strings using C programming language. What is the Boyer-Moore Algorithm?The Boyer-Moore Algorithm is a pattern searching algorithm that efficiently finds occurrences of a pattern within a text. It
7 min read
KMP (Knuth-Morris-Pratt) Algorithm for Pattern Searching in C
The KMP (Knuth-Morris-Pratt) algorithm is an efficient string searching algorithm used to find occurrences of a pattern within a text. Unlike simpler algorithms, KMP preprocesses the pattern to create a partial match table, known as the "lps" (Longest Prefix Suffix) array, which helps in skipping un
4 min read
Boyer Moore Algorithm in Python
Boyer-Moore algorithm is an efficient string search algorithm that is particularly useful for large-scale searches. Unlike some other string search algorithms, the Boyer-Moore does not require preprocessing, making it ideal where the sample is relatively large relative to the data being searched. Wh
3 min read
Searching Algorithms in Python
Searching algorithms are fundamental techniques used to find an element or a value within a collection of data. In this tutorial, we'll explore some of the most commonly used searching algorithms in Python. These algorithms include Linear Search, Binary Search, Interpolation Search, and Jump Search.
6 min read
Floyd-Warshall Algorithm in C++
The Floyd-Warshall algorithm is a dynamic programming technique used to find the shortest paths between all pairs of vertices in a weighted graph. This algorithm is particularly useful for graphs with dense connections and can handle both positive and negative edge weights, though it cannot handle n
4 min read
Tarjanâs Algorithm in C++
In this post, we will see the implementation of Tarjanâs Algorithm in C++ language.What is Tarjanâs Algorithm?Tarjanâs Algorithm is a well-known algorithm used for finding strongly connected components (SCCs) in a directed graph. An SCC is a maximal subgraph where every vertex is reachable from ever
6 min read
Kahnâs Algorithm in C++
In this post, we will see the implementation of Kahnâs Algorithm in C++.What is Kahnâs Algorithm?Kahnâs Algorithm is a classic algorithm used for topological sorting of a directed acyclic graph (DAG). Topological sorting is a linear ordering of vertices such that for every directed edge u -> v, v
4 min read
Algorithm Library Functions in C++ STL
Non-modifying sequence operations std :: all_of : Test condition on all elements in rangestd :: any_of : Test if any element in range fulfills conditionstd :: none_of : Test if no elements fulfill conditionstd :: for_each : Apply function to rangestd :: find : Find value in rangestd :: find_if : Fin
4 min read
Kosarajuâs Algorithm in C++
In this post, we will see the implementation of Kosarajuâs Algorithm in C++.What is Kosarajuâs Algorithm?Kosarajuâs Algorithm is a classic algorithm used for finding strongly connected components (SCCs) in a directed graph. An SCC is a maximal subgraph where every vertex is reachable from every othe
4 min read
Johnson Algorithm in C++
Johnsonâs Algorithm is an algorithm used to find the shortest paths between all pairs of vertices in a weighted graph. It is especially useful for sparse graphs and can handle negative weights, provided there are no negative weight cycles. This algorithm uses both Bellman-Ford and Dijkstra's algorit
8 min read