Count distinct substrings of a string using Rabin Karp algorithm
Last Updated :
04 Oct, 2023
Given a string, return the number of distinct substrings using Rabin Karp Algorithm.
Examples:
Input : str = “aba”
Output : 5
Explanation :
Total number of distinct substring are 5 - "a", "ab", "aba", "b" ,"ba"
Input : str = “abcd”
Output : 10
Explanation :
Total number of distinct substring are 10 - "a", "ab", "abc", "abcd", "b", "bc", "bcd", "c", "cd", "d"
Approach:
Prerequisite: Rabin-Karp Algorithm for Pattern Searching
Calculate the current hash value of the current character and store
in a dictionary/map to avoid repetition.
To compute the hash (rolling hash) as done in Rabin-Karp algorithm follow:
The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is the numeric value of a string. For example, if all possible characters are from 1 to 10, the numeric value of “122” will be 122. The number of possible characters is higher than 10 (256 in general) and the pattern length can be large. So the numeric values cannot be practically stored as an integer. Therefore, the numeric value is calculated using modular arithmetic to make sure that the hash values can be stored in an integer variable (can fit in memory words). To do rehashing, we need to take off the most significant digit and add the new least significant digit in the hash value. Rehashing is done using the following formula.
hash( txt[s+1 .. s+m] ) = ( d ( hash( txt[s .. s+m-1]) – txt[s]*h ) + txt[s + m] ) mod q
hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ): Hash value at next shift (or shift s+1)
d: Number of characters in the alphabet
q: A prime number
h: d^(m-1)
The idea is similar as we evaluate a mathematical expression. For example, we have a string of "1234" let us compute the value of the substring "12" as 12 and we want to compute the value of the substring "123" this can be calculated as ((12)*10+3)=123, similar logic is applied here.
C++
#include <bits/stdc++.h>
using namespace std;
// Driver code
int main()
{
int t = 1;
// store prime to reduce overflow
long long mod = 9007199254740881;
for(int i = 0; i < t; i++)
{
// string to check number of distinct substring
string s = "abcd";
// to store substrings
vector<vector<long long>>l;
// to store hash values by Rabin Karp algorithm
unordered_map<long long,int>d;
for(int i=0;i<s.length();i++){
int suma = 0;
long long pre = 0;
// Number of input alphabets
long long D = 256;
for(int j=i;j<s.length();j++){
// calculate new hash value by adding next element
pre = (pre*D+s[j]) % mod;
// store string length if non repeat
if(d.find(pre) == d.end())
l.push_back({i, j});
d[pre] = 1;
}
}
// resulting length
cout<<l.size()<<endl;
// resulting distinct substrings
for(int i = 0; i < l.size(); i++)
cout << s.substr(l[i][0],l[i][1]+1-l[i][0]) << " ";
}
}
// This code is contributed by shinjanpatra
Java
import java.util.*;
public class Main {
public static void main(String[] args) {
int t = 1;
// store prime to reduce overflow
long mod = 9007199254740881L;
for (int i = 0; i < t; i++) {
// string to check number of distinct substring
String s = "abcd";
// to store substrings
List<List<Integer>> l = new ArrayList<>();
// to store hash values by Rabin Karp algorithm
Map<Long, Integer> d = new HashMap<>();
for (int j = 0; j < s.length(); j++) {
long suma = 0;
long pre = 0;
// Number of input alphabets
int D = 256;
for (int k = j; k < s.length(); k++) {
// calculate new hash value by adding next element
pre = (pre*D + (long)s.charAt(k)) % mod;
// store string length if non repeat
if (!d.containsKey(pre)) {
List<Integer> sublist = new ArrayList<>();
sublist.add(j);
sublist.add(k);
l.add(sublist);
}
d.put(pre, 1);
}
}
// resulting length
System.out.println(l.size());
// resulting distinct substrings
for (int j = 0; j < l.size(); j++) {
int start = l.get(j).get(0);
int end = l.get(j).get(1);
System.out.print(s.substring(start, end+1) + " ");
}
}
}
}
Python3
# importing libraries
import sys
import math as mt
t = 1
# store prime to reduce overflow
mod = 9007199254740881
for ___ in range(t):
# string to check number of distinct substring
s = 'abcd'
# to store substrings
l = []
# to store hash values by Rabin Karp algorithm
d = {}
for i in range(len(s)):
suma = 0
pre = 0
# Number of input alphabets
D = 256
for j in range(i, len(s)):
# calculate new hash value by adding next element
pre = (pre*D+ord(s[j])) % mod
# store string length if non repeat
if d.get(pre, -1) == -1:
l.append([i, j])
d[pre] = 1
# resulting length
print(len(l))
# resulting distinct substrings
for i in range(len(l)):
print(s[l[i][0]:l[i][1]+1], end=" ")
C#
using System;
using System.Collections.Generic;
class GFG {
static void Main()
{
int t = 1;
// store prime to reduce overflow
long mod = 9007199254740881;
for (int i = 0; i < t; i++)
{
// string to check number of distinct substring
string s = "abcd";
// to store substrings
List<List<long>> l = new List<List<long>>();
// to store hash values by Rabin Karp algorithm
Dictionary<long, int> d = new Dictionary<long, int>();
for (int j = 0; j < s.Length; j++)
{
int suma = 0;
long pre = 0;
// Number of input alphabets
long D = 256;
for (int k = j; k < s.Length; k++)
{
// calculate new hash value by adding next element
pre = (pre * D + s[k]) % mod;
// store string length if non repeat
if (!d.ContainsKey(pre))
{
List<long> sub = new List<long>();
sub.Add(j);
sub.Add(k);
l.Add(sub);
}
d[pre] = 1;
}
}
// resulting length
Console.WriteLine(l.Count);
// resulting distinct substrings
for (int j = 0; j < l.Count; j++)
{
Console.Write(s.Substring((int)l[j][0], (int)l[j][1] + 1 - (int)l[j][0]) + " ");
}
}
}
}
//This code is contributed by rudra1807raj
JavaScript
<script>
let t = 1
// store prime to reduce overflow
let mod = 9007199254740881
for(let i = 0; i < t; i++){
// string to check number of distinct substring
let s = 'abcd'
// to store substrings
let l = []
// to store hash values by Rabin Karp algorithm
let d = new Map()
for(let i=0;i<s.length;i++){
let suma = 0
let pre = 0
// Number of input alphabets
let D = 256
for(let j=i;j<s.length;j++){
// calculate new hash value by adding next element
pre = (pre*D+s.charCodeAt(j)) % mod
// store string length if non repeat
if(d.has([pre, -1]) == false)
l.push([i, j])
d.set(pre , 1)
}
}
// resulting length
document.write(l.length,"</br>")
// resulting distinct substrings
for(let i = 0; i < l.length; i++)
document.write(s.substring(l[i][0],l[i][1]+1)," ")
}
// This code is contributed by shinjanpatra
</script>
Output10
a ab abc abcd b bc bcd c cd d
Time Complexity: O(N2), N is the length of the string
Auxiliary Space: O(N*2) => O(N)
Another Approach:
- Define an input string "s".
- Create an empty unordered set "substrings" to store the distinct substrings.
- Use two nested loops to iterate over all possible substrings of "s". The outer loop iterates over the starting index of each substring.
The inner loop iterates over the ending index of each substring, starting from the starting index. - Use the "substr" method of the input string "s" to extract each substring and insert it into the "substrings" set.
The "substr" method takes two arguments: the starting index of the substring and the length of the substring.
The length of the substring is computed as "j - i + 1". - After all substrings have been added to the set, output the size of the set to get the number of distinct substrings.
Below is the implementation of the above approach:
C++
#include <iostream>
#include <unordered_set>
#include <string>
using namespace std;
int main() {
// Input string
string s = "abcd";
// Set to store distinct substrings
unordered_set<string> substrings;
// Iterate over all possible substrings and add them to the set
for (int i = 0; i < s.size(); i++) {
for (int j = i; j < s.size(); j++) {
substrings.insert(s.substr(i, j - i + 1));
}
}
//This code is contributed rudra1807raj
// Output the number of distinct substrings
cout << substrings.size() << endl;
}
Java
import java.util.*;
public class GFG {
public static void main(String[] args) {
// Input string
String s = "abcd";
// Set to store distinct substrings
Set<String> substrings = new HashSet<>();
// Iterate over all possible substrings and add them to the set
for (int i = 0; i < s.length(); i++) {
for (int j = i; j < s.length(); j++) {
substrings.add(s.substring(i, j + 1));
}
}
// Output the number of distinct substrings
System.out.println(substrings.size());
}
}
// This code is contributed by rudra1807raj
Python
def main():
# Input string
s = "abcd"
# Set to store distinct substrings
substrings = set()
# Iterate over all possible substrings and add them to the set
for i in range(len(s)):
for j in range(i, len(s)):
substrings.add(s[i:j + 1])
# Output the number of distinct substrings
print(len(substrings))
if __name__ == "__main__":
main()
C#
using System;
using System.Collections.Generic;
class GFG
{
static void Main(string[] args)
{
// Input string
string s = "abcd";
// Set to store distinct substrings
HashSet<string> substrings = new HashSet<string>();
// Iterate over all possible substrings and add them to the set
for (int i = 0; i < s.Length; i++){
for (int j = i; j < s.Length; j++){
substrings.Add(s.Substring(i, j - i + 1));
}
}
// Output the number of distinct substrings
Console.WriteLine(substrings.Count);
}
}
JavaScript
<script>
function countDistinctSubstrings(s) {
const substrings = new Set();
// Iterate over all possible substrings and add them to the set
for (let i = 0; i < s.length; i++) {
for (let j = i; j < s.length; j++) {
substrings.add(s.substring(i, j + 1));
}
}
// Output the number of distinct substrings
return substrings.size;
}
// Input string
const s = "abcd";
// Get the number of distinct substrings
const distinctSubstringsCount = countDistinctSubstrings(s);
document.write(distinctSubstringsCount);
</script>
Time complexity: O(n^3), where n is the length of the input string "s".
Auxiliary Space: O(n^2), where n is the length of the input string "s". The space complexity is dominated by the number of distinct substrings that are stored in the unordered_set.
Similar Reads
DSA Tutorial - Learn Data Structures and Algorithms DSA (Data Structures and Algorithms) is the study of organizing data efficiently using data structures like arrays, stacks, and trees, paired with step-by-step procedures (or algorithms) to solve problems effectively. Data structures manage how data is stored and accessed, while algorithms focus on
7 min read
Quick Sort QuickSort is a sorting algorithm based on the Divide and Conquer that picks an element as a pivot and partitions the given array around the picked pivot by placing the pivot in its correct position in the sorted array. It works on the principle of divide and conquer, breaking down the problem into s
12 min read
Merge Sort - Data Structure and Algorithms Tutorials Merge sort is a popular sorting algorithm known for its efficiency and stability. It follows the divide-and-conquer approach. It works by recursively dividing the input array into two halves, recursively sorting the two halves and finally merging them back together to obtain the sorted array. Merge
14 min read
Bubble Sort Algorithm Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in the wrong order. This algorithm is not suitable for large data sets as its average and worst-case time complexity are quite high.We sort the array using multiple passes. After the fir
8 min read
Data Structures Tutorial Data structures are the fundamental building blocks of computer programming. They define how data is organized, stored, and manipulated within a program. Understanding data structures is very important for developing efficient and effective algorithms. What is Data Structure?A data structure is a st
2 min read
Breadth First Search or BFS for a Graph Given a undirected graph represented by an adjacency list adj, where each adj[i] represents the list of vertices connected to vertex i. Perform a Breadth First Search (BFS) traversal starting from vertex 0, visiting vertices from left to right according to the adjacency list, and return a list conta
15+ min read
Binary Search Algorithm - Iterative and Recursive Implementation Binary Search Algorithm is a searching algorithm used in a sorted array by repeatedly dividing the search interval in half. The idea of binary search is to use the information that the array is sorted and reduce the time complexity to O(log N). Binary Search AlgorithmConditions to apply Binary Searc
15 min read
Insertion Sort Algorithm Insertion sort is a simple sorting algorithm that works by iteratively inserting each element of an unsorted list into its correct position in a sorted portion of the list. It is like sorting playing cards in your hands. You split the cards into two groups: the sorted cards and the unsorted cards. T
9 min read
Dijkstra's Algorithm to find Shortest Paths from a Source to all Given a weighted undirected graph represented as an edge list and a source vertex src, find the shortest path distances from the source vertex to all other vertices in the graph. The graph contains V vertices, numbered from 0 to V - 1.Note: The given graph does not contain any negative edge. Example
12 min read
Selection Sort Selection Sort is a comparison-based sorting algorithm. It sorts an array by repeatedly selecting the smallest (or largest) element from the unsorted portion and swapping it with the first unsorted element. This process continues until the entire array is sorted.First we find the smallest element an
8 min read