Python - Maximum occurring Substring from list
Last Updated: 08 May, 2023
Sometimes, while working with Python strings, we need to find which substring from a given list occurs most often in a string. This has applications in areas such as DNA sequence analysis in biology. Let's discuss several ways in which this task can be performed.
Method 1: Using re.findall() + groupby() + max() + lambda
The combination of the above functions can be used to solve this problem. First, re.findall() extracts every occurrence of any listed substring from the string. The matches are then grouped by value with groupby() so that each substring gets a count. The last step picks the group with the largest count using max() along with a lambda function.
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + groupby() + max() + lambda
import re
import itertools

# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']

# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))

# Maximum occurring Substring from list
# extract every match of any listed substring, in order of appearance
seqs = re.findall('|'.join(test_list), test_str)

# groupby() merges only adjacent equal items, so sort first to get total counts
grps = [(key, len(list(j))) for key, j in itertools.groupby(sorted(seqs))]
res = max(grps, key=lambda ele: ele[1])

# printing result
print("Maximum frequency substring : " + str(res[0]))
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time complexity: O(n + k log k), where n is the length of the input string and k is the number of matches found. For a short list of substrings, re.findall() scans the string once, and sorting the matches dominates the grouping step.
Auxiliary space: O(k), for the list of matches and the grouped counts.
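One detail worth keeping in mind with this method is that itertools.groupby() merges only adjacent equal items, so it yields total counts only when the input is sorted. The following is a minimal sketch of that behaviour; the `matches` list is hypothetical sample data, not the output of the example above.

```python
from itertools import groupby

# Hypothetical sample data, chosen so that "gfg" appears in separate runs.
matches = ["gfg", "is", "gfg", "gfg", "best", "gfg"]

# Unsorted input: "gfg" is split into runs of length 1, 2 and 1.
runs = [(key, len(list(group))) for key, group in groupby(matches)]

# Sorted input: equal items are adjacent, so each key appears once with its total.
totals = [(key, len(list(group))) for key, group in groupby(sorted(matches))]

print(runs)    # [('gfg', 1), ('is', 1), ('gfg', 2), ('best', 1), ('gfg', 1)]
print(totals)  # [('best', 1), ('gfg', 4), ('is', 1)]
```

Taking max() over `runs` would report the longest consecutive run, while taking it over `totals` reports the most frequent item overall.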
Method 2: Using count() and max() methods
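The string method count() returns the number of non-overlapping occurrences of a substring in a string. We collect the count for each substring in the list, take the largest count with max(), and use its index to look up the corresponding substring.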
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
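# count the occurrences of each substring in the string
res = []
for i in test_list:
    res.append(test_str.count(i))

# the index of the largest count identifies the substring
x = max(res)
result = test_list[res.index(x)]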
# printing result
print("Maximum frequency substring : " + str(result))
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
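Time Complexity: O(n*m), where n is the length of the string and m is the number of substrings in the list, since each count() call scans the string.
Auxiliary Space: O(m), for the list of counts.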
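The same idea can be written more compactly by passing the string's count method as the key function of max(). This is only a sketch; the function name `most_frequent` is a hypothetical helper introduced for illustration.

```python
def most_frequent(text, substrings):
    """Return the substring with the highest non-overlapping count in text."""
    # max() calls text.count(sub) for each candidate.
    # On a tie, the earliest entry in substrings is returned.
    return max(substrings, key=text.count)


print(most_frequent("gfgisbestgfg", ["gfg", "is", "best"]))  # gfg
```

Note that max() raises ValueError when the list of substrings is empty, so a caller may want to check for that case first.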
Method 3: Using re.findall() + Counter
This is an alternate approach that uses re.findall() and the Counter class. In this, we extract the matches using re.findall() and count the occurrences of each one using Counter from the collections module. Its most_common(1) method returns the most frequent match together with its count.
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + Counter
# importing modules
import collections
import re
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
# Maximum occurring Substring from list
# Using re.findall() + Counter
seqs = re.findall(str.join('|', test_list), test_str)
res = collections.Counter(seqs).most_common(1)[0][0]
# printing result
print("Maximum frequency substring : " + str(res))
# This code is contributed by Edula Vinay Kumar Reddy
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
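Time Complexity: O(n), where n is the length of the string, assuming a short, fixed list of substrings.
Auxiliary Space: O(k), where k is the number of matches stored by re.findall().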
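When the substrings may contain regex metacharacters such as "." or "+", joining them directly into a pattern changes their meaning. A possible variation, sketched below, escapes each substring with re.escape() first. The function name `most_common_literal` and the sample text are hypothetical and used only for illustration.

```python
import re
from collections import Counter


def most_common_literal(text, substrings):
    """Return the most frequent literal substring, or None if nothing matches."""
    # re.escape() makes characters such as "." and "+" match themselves.
    pattern = "|".join(re.escape(sub) for sub in substrings)
    counts = Counter(re.findall(pattern, text))
    top = counts.most_common(1)
    return top[0][0] if top else None


sample = "a.b a.b axb c+ c+ c+"
print(most_common_literal(sample, ["a.b", "c+"]))  # c+
```

Without escaping, the "." in "a.b" would match any character, so "axb" would be counted as well.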
Method 4 : Using operator.countOf() and max() methods
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
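import operator

res = []
for sub in test_list:
    # countOf() compares whole elements, and iterating a string yields single
    # characters, so build every window of len(sub) characters and count the
    # windows equal to sub (overlapping occurrences are counted)
    size = len(sub)
    windows = [test_str[j:j + size] for j in range(len(test_str) - size + 1)]
    res.append(operator.countOf(windows, sub))

# the index of the largest count identifies the substring
x = max(res)
result = test_list[res.index(x)]
# printing result
print("Maximum frequency substring : " + str(result))
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time Complexity: O(n*m*s), where n is the length of the string, m is the number of substrings in the list, and s is the maximum substring length.
Auxiliary Space: O(n*s), for the list of windows built for each substring.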
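operator.countOf(a, b) counts the elements of a that are equal to b. Because iterating over a string yields single characters, a multi-character target never matches when a is a plain string. The short sketch below illustrates this with a hypothetical sample string.

```python
import operator

text = "gfgisgfg"

# Each element of text is one character, so only one-character targets can match.
single = operator.countOf(text, "g")
multi = operator.countOf(text, "gfg")

# On a list of strings, whole elements are compared, so the count is meaningful.
pieces = ["gfg", "is", "gfg"]
in_list = operator.countOf(pieces, "gfg")

print(single)   # 4
print(multi)    # 0
print(in_list)  # 2
```

This is why countOf() needs a sequence of candidate substrings rather than the raw string when counting multi-character substrings.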
Method 5: Using a dictionary to count occurrences
In this approach, we use a dictionary to count the occurrences of each substring in the list. For each substring in the list, we count its occurrences in the string with count() and store the result in the dictionary under that substring. Finally, we pick the key with the maximum count.
Approach:
- Initialize an empty dictionary to hold the count of each substring.
- Iterate over the substrings in the list using a for loop.
- For each substring, find its number of occurrences in the string using the count() method and store it in the dictionary.
- Find the key with the maximum count using max() with key=count_dict.get.
- Print the maximum frequency substring.
Example:
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using dictionary
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
# Maximum occurring Substring from list
# Using dictionary
count_dict = {}
for sub in test_list:
    count_dict[sub] = test_str.count(sub)
res = max(count_dict, key=count_dict.get)
# printing result
print("Maximum frequency substring : " + str(res))
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time complexity: O(n*m), where n is the length of the string and m is the total number of substrings in the list.
Auxiliary space: O(m), where m is the total number of substrings in the list.
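The count() method counts non-overlapping occurrences only, which can matter for repetitive data such as DNA sequences. If overlapping occurrences should be counted as well, one option is a zero-width lookahead, as in the sketch below. The function name `count_overlapping` is a hypothetical helper, and it assumes a non-empty substring.

```python
import re


def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    # A zero-width lookahead matches at every start position without consuming text.
    return len(re.findall(f"(?={re.escape(sub)})", text))


dna = "AAAA"
print(dna.count("AA"))               # 2
print(count_overlapping(dna, "AA"))  # 3
```

Replacing test_str.count(sub) with such a helper in the dictionary approach would rank the substrings by overlapping counts instead.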
Method 6: Using itertools.product() and count()
- Import the itertools module, which provides the product() function.
- For each substring sub in test_list, use product() to generate every string of length len(sub) that can be built from the characters of sub. This set of candidates includes sub itself.
- Count the number of occurrences of each candidate in the string using the count() method.
- Initialize a variable max_count to 0 and a variable max_substring to an empty string.
- Loop through the candidates and their counts.
- If the current count is greater than max_count, update max_count and set max_substring to the corresponding candidate.
- Print the maximum occurring substring.
Example:
Python3
# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using itertools.product() and count()
import itertools
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
# Maximum occurring Substring from list
# Using itertools.product() and count()
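max_count = 0
max_substring = ""
for sub in test_list:
    # product() yields every arrangement (with repetition) of sub's characters,
    # so a candidate is not necessarily a member of test_list
    for chars in itertools.product(*[sub]*len(sub)):
        candidate = ''.join(chars)
        count = test_str.count(candidate)
        if count > max_count:
            max_count = count
            max_substring = candidate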
# printing result
print("Maximum frequency substring : " + str(max_substring))
Output:
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
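Time complexity: O(k * m^m * n), where k is the number of substrings in test_list, m is the maximum length of a substring, and n is the length of test_str. product() yields m^m candidates per substring, and each count() call scans the string. Because the candidates include arrangements that are not in test_list, the result is not guaranteed to come from the list, so this method is best treated as a demonstration.
Auxiliary space: O(m)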
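To see what product() actually generates here, the following minimal sketch lists the candidates for a two-character substring and counts them for a four-character one. The variable names are illustrative only.

```python
import itertools

sub = "is"

# product(*[sub] * len(sub)) is equivalent to product(sub, repeat=len(sub)).
candidates = ["".join(chars) for chars in itertools.product(sub, repeat=len(sub))]
print(candidates)  # ['ii', 'is', 'si', 'ss']

# The number of candidates grows as len(sub) ** len(sub).
num_for_best = len(list(itertools.product("best", repeat=4)))
print(num_for_best)  # 256
```

Only one of the four candidates for "is" is the original substring, which shows why the number of count() calls grows so quickly with substring length.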