CSES Solution - Finding Patterns
Last Updated :
09 Apr, 2024
Prerequisite: Aho-Corasick algorithm
Given a string and patterns, check for each pattern if it appears in the string.
Example:
Input: s = "aybabtu", patterns = {"bab", "abc", "ayba"}
Output: YES
NO
YES
Input: s = "bybabtu", patterns = {"bat", "tu"}
Output:
NO
YES
Approach:
The solution uses a method called the Aho-Corasick algorithm. Let's see how it works:
- Building a Trie: We first create a special kind of tree called a ‘trie’ from the patterns we’re looking for. Each pattern is a path in the trie from the root to a node.
- Failure Links: We then add "failure links" to the trie. These links help us jump back efficiently in the trie when a character doesn’t match.
- Searching: We go through the given string, character by character, moving along the paths in the trie. If we reach the end of a path (a pattern), we know that pattern exists in the string.
- Counting: While searching, we keep track of how many times we reach each node (pattern). This tells us how many times each pattern appears in the string.
This method is fast because it checks all patterns at the same time, instead of one by one.
Steps-by-step approach:
- Implement a insertPattern() function to insert each pattern into the trie structure.
- Build failure links in the trie using the buildFailureLinks() function to prepare for pattern matching.
- Run the Aho-Corasick algorithm on the main string s using the runAhoCorasick() function to identify occurrences of patterns.
- Perform a Depth-First Search (DFS) on the trie structure to propagate occurrences down and update pattern matches using the dfs function.
- In the main function, input and insert patterns into the trie, build failure links, run the Aho-Corasick algorithm, and propagate occurrences using DFS.
- Output the pattern matches by checking the patternFound array for each pattern index and print "YES" if a match is found, otherwise print "NO".
Below is the implementation of the above approach:
C++
#include <algorithm>
#include <iostream>
#include <queue>
using namespace std;
#define pb push_back
// The main string and the number of patterns
string s;
int n, nodeIndex = 1, patternFound[500005];
// Adjacency list for the trie nodes
vector<int> trieAdjList[500005];
// Structure for each node in the trie
struct TrieNode {
int failureLink, child[26] = {}, occurrences = 0;
vector<int> patternIndices;
} trie[500005];
// Function to insert a pattern into the trie
void insertPattern(string pattern, int index)
{
// Start at the root node
int node = 1;
// Traverse the pattern character by character
for (int i = 0; i < pattern.size(); i++) {
// If the current character's child node doesn't
// exist, create it
if (trie[node].child[pattern[i] - 'a'] == 0)
trie[node].child[pattern[i] - 'a']
= ++nodeIndex;
// Move to the child node
node = trie[node].child[pattern[i] - 'a'];
}
// After traversing the pattern, add the pattern index
// to the current node
trie[node].patternIndices.pb(index);
}
// Function to build the failure links for the Aho-Corasick
// algorithm
void buildFailureLinks()
{
// Initialize a queue for BFS traversal
queue<int> nodeQueue;
// Start from the root node
int node = 1;
// Set failure link of root node to itself
trie[1].failureLink = 1;
// Push all child nodes of the root to the queue
for (int i = 0; i < 26; i++) {
if (trie[node].child[i])
trie[trie[node].child[i]].failureLink = node,
nodeQueue.push(trie[node].child[i]);
else
trie[node].child[i]
= 1; // If child node doesn't exist, point
// to root
}
// Perform BFS traversal to set failure links
while (!nodeQueue.empty()) {
node = nodeQueue.front();
nodeQueue.pop();
for (int i = 0; i < 26; i++) {
if (trie[node].child[i])
trie[trie[node].child[i]].failureLink
= trie[trie[node].failureLink].child[i],
nodeQueue.push(trie[node].child[i]);
else
trie[node].child[i]
= trie[trie[node].failureLink].child[i];
}
}
// Create adjacency list based on failure links
for (int i = 2; i <= nodeIndex; i++)
trieAdjList[trie[i].failureLink].pb(i);
}
// Function to run the Aho-Corasick algorithm on the main
// string
void runAhoCorasick(string s)
{
// Start from the root node
for (int i = 0, node = 1; i < s.size(); i++) {
// Traverse the trie based on characters in the main
// string
node = trie[node].child[s[i] - 'a'];
// Increment occurrences at each node
trie[node].occurrences++;
}
}
// DFS function to propagate the occurrences down the trie
int dfs(int u)
{
// Initialize total occurrences at current node
int totalOccurrences = trie[u].occurrences;
// Traverse child nodes recursively
for (int v : trieAdjList[u])
totalOccurrences += dfs(v);
// Propagate occurrences to pattern indices
for (int patternIndex : trie[u].patternIndices)
patternFound[patternIndex] = totalOccurrences;
// Return total occurrences
return totalOccurrences;
}
// Main function
int main()
{
string s = "aybabtu";
int n = 3;
vector<string> patterns = { "bab", "abc", "ayba" };
// Input and insert patterns into trie
for (int i = 0; i < n; i++) {
insertPattern(patterns[i], i);
}
// Build failure links for Aho-Corasick algorithm
buildFailureLinks();
// Run Aho-Corasick algorithm on main string
runAhoCorasick(s);
// Propagate occurrences down the trie using DFS
dfs(1);
// Output pattern matches
for (int i = 0; i < n; i++)
cout << (patternFound[i] ? "YES" : "NO") << '\n';
}
Java
/*code by flutterfly */
import java.util.*;
public class AhoCorasick {
static String s;
static int n, nodeIndex = 1;
static int[] patternFound = new int[500005];
static List<Integer>[] trieAdjList = new List[500005];
// Structure for each node in the trie
static class TrieNode {
int failureLink;
int[] child = new int[26];
int occurrences = 0;
List<Integer> patternIndices = new ArrayList<>();
}
static TrieNode[] trie = new TrieNode[500005];
// Function to insert a pattern into the trie
static void insertPattern(String pattern, int index) {
// Start at the root node
int node = 1;
// Traverse the pattern character by character
for (int i = 0; i < pattern.length(); i++) {
// If the current character's child node doesn't exist, create it
if (trie[node].child[pattern.charAt(i) - 'a'] == 0)
trie[node].child[pattern.charAt(i) - 'a'] = ++nodeIndex;
// Move to the child node
node = trie[node].child[pattern.charAt(i) - 'a'];
}
// After traversing the pattern, add the pattern index to the current node
trie[node].patternIndices.add(index);
}
// Function to build the failure links for the Aho-Corasick algorithm
static void buildFailureLinks() {
// Initialize a queue for BFS traversal
Queue<Integer> nodeQueue = new LinkedList<>();
// Start from the root node
int node = 1;
// Set failure link of root node to itself
trie[1].failureLink = 1;
// Push all child nodes of the root to the queue
for (int i = 0; i < 26; i++) {
if (trie[node].child[i] != 0) {
trie[trie[node].child[i]].failureLink = node;
nodeQueue.add(trie[node].child[i]);
} else {
trie[node].child[i] = 1; // If child node doesn't exist, point to root
}
}
// Perform BFS traversal to set failure links
while (!nodeQueue.isEmpty()) {
node = nodeQueue.poll();
for (int i = 0; i < 26; i++) {
if (trie[node].child[i] != 0) {
trie[trie[node].child[i]].failureLink = trie[trie[node].failureLink].child[i];
nodeQueue.add(trie[node].child[i]);
} else {
trie[node].child[i] = trie[trie[node].failureLink].child[i];
}
}
}
// Create adjacency list based on failure links
for (int i = 2; i <= nodeIndex; i++)
trieAdjList[trie[i].failureLink].add(i);
}
// Function to run the Aho-Corasick algorithm on the main string
static void runAhoCorasick(String s) {
// Start from the root node
for (int i = 0, node = 1; i < s.length(); i++) {
// Traverse the trie based on characters in the main string
node = trie[node].child[s.charAt(i) - 'a'];
// Increment occurrences at each node
trie[node].occurrences++;
}
}
// DFS function to propagate the occurrences down the trie
static int dfs(int u) {
// Initialize total occurrences at current node
int totalOccurrences = trie[u].occurrences;
// Traverse child nodes recursively
for (int v : trieAdjList[u])
totalOccurrences += dfs(v);
// Propagate occurrences to pattern indices
for (int patternIndex : trie[u].patternIndices)
patternFound[patternIndex] = totalOccurrences;
// Return total occurrences
return totalOccurrences;
}
// Main function
public static void main(String[] args) {
s = "aybabtu";
n = 3;
String[] patterns = { "bab", "abc", "ayba" };
// Initialize trie nodes
for (int i = 0; i <= 500004; i++) {
trie[i] = new TrieNode();
trieAdjList[i] = new ArrayList<>();
}
// Input and insert patterns into trie
for (int i = 0; i < n; i++) {
insertPattern(patterns[i], i);
}
// Build failure links for Aho-Corasick algorithm
buildFailureLinks();
// Run Aho-Corasick algorithm on main string
runAhoCorasick(s);
// Propagate occurrences down the trie using DFS
dfs(1);
// Output pattern matches
for (int i = 0; i < n; i++)
System.out.println((patternFound[i] != 0 ? "YES" : "NO"));
}
}
Python3
from collections import deque
class TrieNode:
def __init__(self):
self.failureLink = 0
self.child = [0] * 26
self.occurrences = 0
self.patternIndices = []
def insert_pattern(pattern, index):
"""Inserts a pattern into the trie."""
global nodeIndex
node = 1
# Traverse the trie to insert the pattern character by character
for char in pattern:
char_index = ord(char) - ord('a')
if trie[node].child[char_index] == 0:
nodeIndex += 1
trie[node].child[char_index] = nodeIndex
node = trie[node].child[char_index]
# Add the pattern index to the current node
trie[node].patternIndices.append(index)
def build_failure_links():
"""Builds failure links for the Aho-Corasick algorithm."""
nodeQueue = deque()
node = 1
trie[1].failureLink = 1
# Initialize failure links for child nodes of the root
for i in range(26):
if trie[node].child[i] != 0:
trie[trie[node].child[i]].failureLink = node
nodeQueue.append(trie[node].child[i])
else:
trie[node].child[i] = 1
# Perform BFS traversal to set failure links
while nodeQueue:
node = nodeQueue.popleft()
for i in range(26):
if trie[node].child[i] != 0:
trie[trie[node].child[i]].failureLink = trie[trie[node].failureLink].child[i]
nodeQueue.append(trie[node].child[i])
else:
trie[node].child[i] = trie[trie[node].failureLink].child[i]
# Create adjacency list based on failure links
for i in range(2, nodeIndex + 1):
trie_adj_list[trie[i].failureLink].append(i)
def run_aho_corasick(s):
"""Runs the Aho-Corasick algorithm on the main string."""
node = 1
for char in s:
# Traverse the trie based on characters in the main string
node = trie[node].child[ord(char) - ord('a')]
# Increment occurrences at each node
trie[node].occurrences += 1
def dfs(u):
"""DFS function to propagate the occurrences down the trie."""
total_occurrences = trie[u].occurrences
# Traverse child nodes recursively
for v in trie_adj_list[u]:
total_occurrences += dfs(v)
# Propagate occurrences to pattern indices
for pattern_index in trie[u].patternIndices:
pattern_found[pattern_index] = total_occurrences
# Return total occurrences
return total_occurrences
# Sample Input
s = "aybabtu"
n = 3
patterns = ["bab", "abc", "ayba"]
# Initialize variables
nodeIndex = 1
trie = [TrieNode() for _ in range(500005)]
trie_adj_list = [[] for _ in range(500005)]
pattern_found = [0] * 500005
# Insert patterns into the trie
for i in range(n):
insert_pattern(patterns[i], i)
# Build failure links for Aho-Corasick algorithm
build_failure_links()
# Run Aho-Corasick algorithm on the main string
run_aho_corasick(s)
# Propagate occurrences down the trie using DFS
dfs(1)
# Output pattern matches
for i in range(n):
print("YES" if pattern_found[i] != 0 else "NO")
C#
//code by flutterfly
using System;
using System.Collections.Generic;
public class Program
{
// Structure for each node in the trie
public class TrieNode
{
public int failureLink;
public int[] child = new int[26];
public int occurrences = 0;
public List<int> patternIndices = new List<int>();
}
static string s;
static int n, nodeIndex = 1;
static int[] patternFound = new int[500005];
static List<int>[] trieAdjList = new List<int>[500005];
static TrieNode[] trie = new TrieNode[500005];
// Function to insert a pattern into the trie
static void InsertPattern(string pattern, int index)
{
// Start at the root node
int node = 1;
// Traverse the pattern character by character
foreach (char c in pattern)
{
int idx = c - 'a';
// If the current character's child node doesn't exist, create it
if (trie[node].child[idx] == 0)
trie[node].child[idx] = ++nodeIndex;
// Move to the child node
node = trie[node].child[idx];
}
// After traversing the pattern, add the pattern index to the current node
trie[node].patternIndices.Add(index);
}
// Function to build the failure links for the Aho-Corasick algorithm
static void BuildFailureLinks()
{
// Initialize a queue for BFS traversal
Queue<int> nodeQueue = new Queue<int>();
// Start from the root node
int node = 1;
// Set failure link of root node to itself
trie[1].failureLink = 1;
// Push all child nodes of the root to the queue
for (int i = 0; i < 26; i++)
{
if (trie[node].child[i] != 0)
{
trie[trie[node].child[i]].failureLink = node;
nodeQueue.Enqueue(trie[node].child[i]);
}
else
{
trie[node].child[i] = 1; // If child node doesn't exist, point to root
}
}
// Perform BFS traversal to set failure links
while (nodeQueue.Count > 0)
{
node = nodeQueue.Dequeue();
for (int i = 0; i < 26; i++)
{
if (trie[node].child[i] != 0)
{
trie[trie[node].child[i]].failureLink = trie[trie[node].failureLink].child[i];
nodeQueue.Enqueue(trie[node].child[i]);
}
else
{
trie[node].child[i] = trie[trie[node].failureLink].child[i];
}
}
}
// Create adjacency list based on failure links
for (int i = 0; i <= nodeIndex; i++)
trieAdjList[i] = new List<int>();
for (int i = 2; i <= nodeIndex; i++)
trieAdjList[trie[i].failureLink].Add(i);
}
// Function to run the Aho-Corasick algorithm on the main string
static void RunAhoCorasick(string s)
{
// Start from the root node
int node = 1;
// Traverse the main string character by character
foreach (char c in s)
{
int idx = c - 'a';
// Traverse the trie based on characters in the main string
node = trie[node].child[idx];
// Increment occurrences at each node
trie[node].occurrences++;
}
}
// DFS function to propagate the occurrences down the trie
static int DFS(int u)
{
// Initialize total occurrences at current node
int totalOccurrences = trie[u].occurrences;
// Traverse child nodes recursively
foreach (int v in trieAdjList[u])
totalOccurrences += DFS(v);
// Propagate occurrences to pattern indices
foreach (int patternIndex in trie[u].patternIndices)
patternFound[patternIndex] = totalOccurrences;
// Return total occurrences
return totalOccurrences;
}
// Main function
public static void Main(string[] args)
{
s = "aybabtu";
n = 3;
string[] patterns = { "bab", "abc", "ayba" };
// Input and insert patterns into trie
for (int i = 0; i < 500005; i++)
trie[i] = new TrieNode();
for (int i = 0; i < n; i++)
InsertPattern(patterns[i], i);
// Build failure links for Aho-Corasick algorithm
BuildFailureLinks();
// Run Aho-Corasick algorithm on main string
RunAhoCorasick(s);
// Propagate occurrences down the trie using DFS
DFS(1);
// Output pattern matches
for (int i = 0; i < n; i++)
Console.WriteLine((patternFound[i] != 0 ? "YES" : "NO"));
}
}
JavaScript
//code by flutterfly
// The main string and the number of patterns
let s;
let n;
let nodeIndex = 1;
let patternFound = new Array(500005).fill(0);
// Adjacency list for the trie nodes
let trieAdjList = new Array(500005).fill().map(() => []);
// Structure for each node in the trie
class TrieNode {
constructor() {
this.failureLink = 0;
this.child = new Array(26).fill(0);
this.occurrences = 0;
this.patternIndices = [];
}
}
let trie = new Array(500005).fill().map(() => new TrieNode());
// Function to insert a pattern into the trie
function insertPattern(pattern, index) {
// Start at the root node
let node = 1;
// Traverse the pattern character by character
for (let i = 0; i < pattern.length; i++) {
// If the current character's child node doesn't exist, create it
if (trie[node].child[pattern.charCodeAt(i) - 'a'.charCodeAt(0)] === 0) {
trie[node].child[pattern.charCodeAt(i) - 'a'.charCodeAt(0)] = ++nodeIndex;
}
// Move to the child node
node = trie[node].child[pattern.charCodeAt(i) - 'a'.charCodeAt(0)];
}
// After traversing the pattern, add the pattern index to the current node
trie[node].patternIndices.push(index);
}
// Function to build the failure links for the Aho-Corasick algorithm
function buildFailureLinks() {
// Initialize a queue for BFS traversal
let nodeQueue = [];
// Start from the root node
let node = 1;
// Set failure link of root node to itself
trie[1].failureLink = 1;
// Push all child nodes of the root to the queue
for (let i = 0; i < 26; i++) {
if (trie[node].child[i] !== 0) {
trie[trie[node].child[i]].failureLink = node;
nodeQueue.push(trie[node].child[i]);
} else {
trie[node].child[i] = 1; // If child node doesn't exist, point to root
}
}
// Perform BFS traversal to set failure links
while (nodeQueue.length > 0) {
node = nodeQueue.shift();
for (let i = 0; i < 26; i++) {
if (trie[node].child[i] !== 0) {
trie[trie[node].child[i]].failureLink = trie[trie[node].failureLink].child[i];
nodeQueue.push(trie[node].child[i]);
} else {
trie[node].child[i] = trie[trie[node].failureLink].child[i];
}
}
}
// Create adjacency list based on failure links
for (let i = 2; i <= nodeIndex; i++) {
trieAdjList[trie[i].failureLink].push(i);
}
}
// Function to run the Aho-Corasick algorithm on the main string
function runAhoCorasick(s) {
// Start from the root node
let node = 1;
// Traverse the main string character by character
for (let i = 0; i < s.length; i++) {
// Traverse the trie based on characters in the main string
node = trie[node].child[s.charCodeAt(i) - 'a'.charCodeAt(0)];
// Increment occurrences at each node
trie[node].occurrences++;
}
}
// DFS function to propagate the occurrences down the trie
function dfs(u) {
// Initialize total occurrences at current node
let totalOccurrences = trie[u].occurrences;
// Traverse child nodes recursively
for (let v of trieAdjList[u]) {
totalOccurrences += dfs(v);
}
// Propagate occurrences to pattern indices
for (let patternIndex of trie[u].patternIndices) {
patternFound[patternIndex] = totalOccurrences;
}
// Return total occurrences
return totalOccurrences;
}
// Main function
function main() {
s = "aybabtu";
n = 3;
let patterns = ["bab", "abc", "ayba"];
// Input and insert patterns into trie
for (let i = 0; i < n; i++) {
insertPattern(patterns[i], i);
}
// Build failure links for Aho-Corasick algorithm
buildFailureLinks();
// Run Aho-Corasick algorithm on main string
runAhoCorasick(s);
// Propagate occurrences down the trie using DFS
dfs(1);
// Output pattern matches
for (let i = 0; i < n; i++) {
console.log(patternFound[i] ? "YES" : "NO");
}
}
main();
Time complexity: O(numPatterns * m + n + s), where: numPatterns is the number of patterns, m is the average length of the patterns, n is the total number of nodes in the trie, s is the length of the main string.
Auxiliary Space: O(n + numPatterns)
Similar Reads
DSA Tutorial - Learn Data Structures and Algorithms DSA (Data Structures and Algorithms) is the study of organizing data efficiently using data structures like arrays, stacks, and trees, paired with step-by-step procedures (or algorithms) to solve problems effectively. Data structures manage how data is stored and accessed, while algorithms focus on
7 min read
Non-linear Components In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
Quick Sort QuickSort is a sorting algorithm based on the Divide and Conquer that picks an element as a pivot and partitions the given array around the picked pivot by placing the pivot in its correct position in the sorted array. It works on the principle of divide and conquer, breaking down the problem into s
12 min read
Merge Sort - Data Structure and Algorithms Tutorials Merge sort is a popular sorting algorithm known for its efficiency and stability. It follows the divide-and-conquer approach. It works by recursively dividing the input array into two halves, recursively sorting the two halves and finally merging them back together to obtain the sorted array. Merge
14 min read
Bubble Sort Algorithm Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in the wrong order. This algorithm is not suitable for large data sets as its average and worst-case time complexity are quite high.We sort the array using multiple passes. After the fir
8 min read
Data Structures Tutorial Data structures are the fundamental building blocks of computer programming. They define how data is organized, stored, and manipulated within a program. Understanding data structures is very important for developing efficient and effective algorithms. What is Data Structure?A data structure is a st
2 min read
Breadth First Search or BFS for a Graph Given a undirected graph represented by an adjacency list adj, where each adj[i] represents the list of vertices connected to vertex i. Perform a Breadth First Search (BFS) traversal starting from vertex 0, visiting vertices from left to right according to the adjacency list, and return a list conta
15+ min read
Binary Search Algorithm - Iterative and Recursive Implementation Binary Search Algorithm is a searching algorithm used in a sorted array by repeatedly dividing the search interval in half. The idea of binary search is to use the information that the array is sorted and reduce the time complexity to O(log N). Binary Search AlgorithmConditions to apply Binary Searc
15 min read
Insertion Sort Algorithm Insertion sort is a simple sorting algorithm that works by iteratively inserting each element of an unsorted list into its correct position in a sorted portion of the list. It is like sorting playing cards in your hands. You split the cards into two groups: the sorted cards and the unsorted cards. T
9 min read
Dijkstra's Algorithm to find Shortest Paths from a Source to all Given a weighted undirected graph represented as an edge list and a source vertex src, find the shortest path distances from the source vertex to all other vertices in the graph. The graph contains V vertices, numbered from 0 to V - 1.Note: The given graph does not contain any negative edge. Example
12 min read