CSES Solutions - String Matching
Last Updated: 02 Apr, 2024
Given a string S and a pattern P, your task is to count the number of positions where the pattern occurs in the string.
Examples:
Input: S = "saippuakauppias", P = "pp"
Output: 2
Explanation: "pp" appears 2 times in S.
Input: S = "aaaa", P = "aa"
Output: 3
Explanation: "aa" appears 3 times in S.
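The second example shows that occurrences may overlap: "aa" is counted at positions 0, 1 and 2 of "aaaa". A naive checker makes this explicit and is handy as a reference when testing a faster solution. The sketch below is only an illustration (the name `count_naive` is ours, and treating an empty pattern as having no occurrences is an assumption); it runs in O(N * M) time, so it is likely too slow for large inputs.

```python
def count_naive(s: str, p: str) -> int:
    """Count the positions i where p occurs in s starting at i (overlaps allowed)."""
    if not p:
        return 0  # assumption: an empty pattern has no occurrences
    # Try every possible start position and compare the pattern directly.
    return sum(1 for i in range(len(s) - len(p) + 1) if s.startswith(p, i))


print(count_naive("saippuakauppias", "pp"))  # 2
print(count_naive("aaaa", "aa"))  # 3
```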
Approach: To solve the problem, follow the idea below:
To find all occurrences of a pattern in a text we can use various String-Matching algorithms. The Knuth-Morris-Pratt (KMP) algorithm is a suitable choice for this problem. KMP is an efficient string-matching algorithm that can find all occurrences of a pattern in a string in linear time.
Concatenate the Pattern and Text: The first step is to concatenate the pattern and the text with a separator character # in between. The separator must be a character that occurs in neither the pattern nor the text. It guarantees that no prefix function value can exceed the length of the pattern, so a match of the pattern can never extend across the boundary between the pattern and the text.
Compute the Prefix Function: The computePrefix function is used to compute the prefix function of the concatenated string. The prefix function at position i is the length of the longest proper prefix of the substring ending at position i that is also a suffix of that substring. This function is the key part of the KMP algorithm.
Count the Occurrences: After the prefix function is computed, the next step is to count the number of occurrences of the pattern in the text. This is done by iterating over the prefix function array and checking how many times the pattern length appears in the array. Each time the pattern length appears in the array, it means an occurrence of the pattern has been found in the text.
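To make the three steps concrete, the following sketch computes the prefix function of "aa#aaaa" (the second example) and converts every index whose value equals the pattern length into a 0-based start position in the text. The names `prefix_function` and `match_positions` are illustrative, and the sketch assumes that # occurs in neither string.

```python
def prefix_function(s: str) -> list:
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        # Fall back to shorter borders until the next character matches.
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


def match_positions(s: str, p: str, sep: str = "#") -> list:
    """Return the 0-based start positions of p in s (assumes sep is in neither string)."""
    m = len(p)
    if m == 0:
        return []  # assumption: an empty pattern has no occurrences
    pi = prefix_function(p + sep + s)
    # Index i of the combined string is index i - (m + 1) of the text, and a
    # match ending there starts m - 1 characters earlier, i.e. at i - 2 * m.
    return [i - 2 * m for i in range(m + 1, len(pi)) if pi[i] == m]


print(prefix_function("aa#aaaa"))  # [0, 1, 0, 1, 2, 2, 2]
print(match_positions("aaaa", "aa"))  # [0, 1, 2]
```

The value 2 (the pattern length) appears three times in the array, which matches the expected answer of 3.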
Step-by-step algorithm:
- Declare the prefix function array pi[], and the count of occurrences.
- Concatenate the pattern and the text, with a separator character (#) in between: combined = P + "#" + S.
- Compute the prefix function of the concatenated string and store it in the pi[] array. For each position i, start with j = pi[i - 1]. While j > 0 and the character at position i differs from the character at position j, fall back to j = pi[j - 1]. If the two characters match, increment j. Then set pi[i] = j.
- Iterate over the pi[] array. Each time a value equals the length of the pattern, an occurrence of the pattern ends at that position, so increment the count of occurrences.
- After the entire text has been scanned, print the count of occurrences.
Below is the implementation of the algorithm:
C++
#include <bits/stdc++.h>
using namespace std;

// Function to compute the prefix function of a string for
// KMP algorithm
vector<int> computePrefix(string S)
{
    int N = S.length();
    vector<int> pi(N);
    for (int i = 1; i < N; i++) {
        int j = pi[i - 1];
        // Find the longest proper prefix which is also a
        // suffix
        while (j > 0 && S[i] != S[j])
            j = pi[j - 1];
        if (S[i] == S[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

// Function to count the number of occurrences of a pattern
// in a text using KMP algorithm
int countOccurrences(string S, string P)
{
    // Concatenate pattern and text with a special character
    // in between
    string combined = P + "#" + S;
    // Compute the prefix function
    vector<int> prefixArray = computePrefix(combined);
    int count = 0;
    // Count the number of times the pattern appears in the
    // text
    for (int i = 0; i < prefixArray.size(); i++) {
        if (prefixArray[i] == P.size())
            count++;
    }
    return count;
}

// Driver code
int main()
{
    string S = "saippuakauppias";
    string P = "pp";
    cout << countOccurrences(S, P) << "\n";
    return 0;
}
Java
import java.util.*;

public class KMPAlgorithm {
    // Function to compute the prefix function of a string for KMP algorithm
    static List<Integer> computePrefix(String S) {
        int N = S.length();
        List<Integer> pi = new ArrayList<>(Collections.nCopies(N, 0));
        for (int i = 1; i < N; i++) {
            int j = pi.get(i - 1);
            // Find the longest proper prefix which is also a suffix
            while (j > 0 && S.charAt(i) != S.charAt(j))
                j = pi.get(j - 1);
            if (S.charAt(i) == S.charAt(j))
                j++;
            pi.set(i, j);
        }
        return pi;
    }

    // Function to count the number of occurrences of a pattern in a text using KMP algorithm
    static int countOccurrences(String S, String P) {
        // Concatenate pattern and text with a special character in between
        String combined = P + "#" + S;
        // Compute the prefix function
        List<Integer> prefixArray = computePrefix(combined);
        int count = 0;
        // Count the number of times the pattern appears in the text
        for (int i = 0; i < prefixArray.size(); i++) {
            if (prefixArray.get(i) == P.length())
                count++;
        }
        return count;
    }

    // Driver code
    public static void main(String[] args) {
        String S = "saippuakauppias";
        String P = "pp";
        System.out.println(countOccurrences(S, P));
    }
}
Python
# Function to compute the prefix function of a string for
# KMP algorithm
def compute_prefix(s):
    n = len(s)
    pi = [0] * n
    j = 0
    for i in range(1, n):
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


# Function to count the number of occurrences of a pattern
# in a text using KMP algorithm
def count_occurrences(s, p):
    # Concatenate pattern and text with a special character
    # in between
    combined = p + "#" + s
    # Compute the prefix function
    prefix_array = compute_prefix(combined)
    count = 0
    # Count the number of times the pattern appears in the
    # text
    for pi in prefix_array:
        if pi == len(p):
            count += 1
    return count


# Driver code
if __name__ == "__main__":
    S = "saippuakauppias"
    P = "pp"
    print(count_occurrences(S, P))
C#
using System;
using System.Collections.Generic;

public class KMPAlgorithm
{
    // Function to compute the prefix function of a string for KMP algorithm
    static List<int> ComputePrefix(string S)
    {
        int N = S.Length;
        List<int> pi = new List<int>(new int[N]);
        for (int i = 1; i < N; i++)
        {
            int j = pi[i - 1];
            // Find the longest proper prefix which is also a suffix
            while (j > 0 && S[i] != S[j])
                j = pi[j - 1];
            if (S[i] == S[j])
                j++;
            pi[i] = j;
        }
        return pi;
    }

    // Function to count the number of occurrences of a pattern in a text using KMP algorithm
    static int CountOccurrences(string S, string P)
    {
        // Concatenate pattern and text with a special character in between
        string combined = P + "#" + S;
        // Compute the prefix function
        List<int> prefixArray = ComputePrefix(combined);
        int count = 0;
        // Count the number of times the pattern appears in the text
        for (int i = 0; i < prefixArray.Count; i++)
        {
            if (prefixArray[i] == P.Length)
                count++;
        }
        return count;
    }

    // Driver code
    public static void Main(string[] args)
    {
        string S = "saippuakauppias";
        string P = "pp";
        Console.WriteLine(CountOccurrences(S, P));
    }
}
JavaScript
// Function to compute the prefix function of a string for
// KMP algorithm
function computePrefix(S) {
    let N = S.length;
    let pi = new Array(N).fill(0);
    for (let i = 1; i < N; i++) {
        let j = pi[i - 1];
        // Find the longest proper prefix which is also a
        // suffix
        while (j > 0 && S[i] != S[j])
            j = pi[j - 1];
        if (S[i] == S[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

// Function to count the number of occurrences of a pattern
// in a text using KMP algorithm
function countOccurrences(S, P) {
    // Concatenate pattern and text with a special character
    // in between
    let combined = P + "#" + S;
    // Compute the prefix function
    let prefixArray = computePrefix(combined);
    let count = 0;
    // Count the number of times the pattern appears in the
    // text
    for (let i = 0; i < prefixArray.length; i++) {
        if (prefixArray[i] == P.length)
            count++;
    }
    return count;
}

// Driver code
let S = "saippuakauppias";
let P = "pp";
console.log(countOccurrences(S, P));
Time Complexity: O(N+M) where N is the length of the text and M is the length of the pattern to be found.
Auxiliary Space: O(N+M), because both the concatenated string and its prefix function array have length N+M+1.
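If memory matters, the concatenation can be avoided. The sketch below is a variant, not the approach implemented above: it computes the prefix function of the pattern only and then scans the text one character at a time, so the extra memory is O(M) while the running time stays O(N+M). The function name `count_occurrences_streaming` is ours, and treating an empty pattern as having no occurrences is an assumption.

```python
def count_occurrences_streaming(s: str, p: str) -> int:
    """Count occurrences of p in s using only the pattern's prefix function."""
    m = len(p)
    if m == 0:
        return 0  # assumption: an empty pattern has no occurrences

    # Prefix function of the pattern alone.
    pi = [0] * m
    for i in range(1, m):
        j = pi[i - 1]
        while j > 0 and p[i] != p[j]:
            j = pi[j - 1]
        if p[i] == p[j]:
            j += 1
        pi[i] = j

    count = 0
    j = 0  # number of pattern characters currently matched
    for ch in s:
        # On a mismatch, fall back to the next shorter border of the match.
        while j > 0 and ch != p[j]:
            j = pi[j - 1]
        if ch == p[j]:
            j += 1
        if j == m:
            count += 1
            j = pi[j - 1]  # keep the longest border so overlapping matches are found
    return count


print(count_occurrences_streaming("saippuakauppias", "pp"))  # 2
print(count_occurrences_streaming("aaaa", "aa"))  # 3
```

This version also needs no separator character, so it places no restriction on which characters the strings may contain.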