Python | Remove Redundant Substrings from Strings List
Last Updated : 07 Apr, 2023
Given a list of strings, the task is to remove every string that is a substring of another string in the list.
Input : test_list = ["Gfg", "Gfg is best", "Geeks", "for", "Gfg is for Geeks"]
Output : ['Gfg is best', 'Gfg is for Geeks']
Explanation : "Gfg", "for" and "Geeks" are present as substrings in other strings.
Input : test_list = ["Gfg", "Geeks", "for", "Gfg is for Geeks"]
Output : ['Gfg is for Geeks']
Explanation : "Gfg", "for" and "Geeks" are present as substrings in other strings.
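Before looking at the individual methods, the task itself can be pinned down with a direct, order-preserving check. This is only a minimal sketch: the function name remove_redundant is hypothetical, and dropping exact duplicates first (so that two equal strings do not eliminate each other) is an assumption that the problem statement does not spell out.

```python
def remove_redundant(strings):
    # Assumption: exact duplicates are collapsed to their first occurrence,
    # so that two equal strings do not eliminate each other.
    unique = list(dict.fromkeys(strings))
    # Keep a string only if no *other* string contains it.
    return [s for s in unique if not any(s != t and s in t for t in unique)]


print(remove_redundant(["Gfg", "Gfg is best", "Geeks", "for", "Gfg is for Geeks"]))
# ['Gfg is best', 'Gfg is for Geeks']
print(remove_redundant(["Gfg", "Geeks", "for", "Gfg is for Geeks"]))
# ['Gfg is for Geeks']
```

Unlike the methods below, this sketch does not sort the list, so the surviving strings keep their original relative order.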
Method #1 : Using enumerate() + join() + sort()
The combination of the above functions can be used to solve this problem. The list is first sorted by length, and then each string is checked against the comma-joined concatenation of all the strings that follow it. If it occurs there as a substring, it is excluded from the filtered result.
Python3
# Python3 code to demonstrate working of
# Remove Redundant Substrings from Strings List
# Using enumerate() + join() + sort()

# initializing list
test_list = ["Gfg", "Gfg is best", "Geeks", "Gfg is for Geeks"]

# printing original list
print("The original list : " + str(test_list))

# sorting by length so that any containing string comes later
test_list.sort(key=len)

# using loop to iterate over each string
res = []
for idx, val in enumerate(test_list):

    # concatenating all next values and checking for existence
    if val not in ', '.join(test_list[idx + 1:]):
        res.append(val)

# printing result
print("The filtered list : " + str(res))
Output
The original list : ['Gfg', 'Gfg is best', 'Geeks', 'Gfg is for Geeks']
The filtered list : ['Gfg is best', 'Gfg is for Geeks']
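Time complexity: roughly O(n log n + n * L), where n is the length of test_list and L is the total number of characters in it. The sort takes O(n log n), and each of the n loop iterations joins and searches all of the remaining strings, so the loop usually dominates.
Auxiliary Space: O(n + L), for the result list and the joined string built in each iteration.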
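One caveat of joining the remaining strings is that a match can straddle the ', ' separator, so a string may be dropped even though no single string contains it. The sketch below contrasts the join-based check with a pairwise check using any(); both function names are hypothetical and the sample list is made up for illustration.

```python
def filter_with_join(strings):
    # Same idea as the method above, but on a sorted copy of the input.
    ordered = sorted(strings, key=len)
    return [val for idx, val in enumerate(ordered)
            if val not in ', '.join(ordered[idx + 1:])]


def filter_pairwise(strings):
    # Check each later string individually instead of joining them.
    ordered = sorted(strings, key=len)
    return [val for idx, val in enumerate(ordered)
            if not any(val in other for other in ordered[idx + 1:])]


sample = ["b, c", "xxab", "cdxx"]
print(filter_with_join(sample))  # ['xxab', 'cdxx']
print(filter_pairwise(sample))   # ['b, c', 'xxab', 'cdxx']
```

Here "b, c" is not a substring of either other string, but it does appear in the joined text "xxab, cdxx", which is why the join-based version removes it.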
Method #2 : Using list comprehension + join() + enumerate()
The combination of the above functions can be used to solve this problem. The task is performed in the same way as in Method #1, but more compactly, in a single list comprehension.
Python3
# Python3 code to demonstrate working of
# Remove Redundant Substrings from Strings List
# Using list comprehension + join() + enumerate()
# initializing list
test_list = ["Gfg", "Gfg is best", "Geeks", "Gfg is for Geeks"]
# printing original list
print("The original list : " + str(test_list))
# using list comprehension to iterate for each string
# and perform join in one liner
test_list.sort(key = len)
res = [val for idx, val in enumerate(test_list) if val not in ', '.join(test_list[idx + 1:])]
# printing result
print("The filtered list : " + str(res))
Output
The original list : ['Gfg', 'Gfg is best', 'Geeks', 'Gfg is for Geeks']
The filtered list : ['Gfg is best', 'Gfg is for Geeks']
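The list comprehension performs the same sort, join and substring checks as the explicit loop, so the time and space complexity match those of Method #1:
Time Complexity: roughly O(n log n + n * L), where n is the number of strings and L is the total number of characters in the list
Space Complexity: O(n + L)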
Method #3 : Using recursion
Algorithm
- Sort the list of strings by length.
- Initialize an empty result list.
- For each string in the sorted list:
  a. Check whether the string is a substring of any of the remaining strings (i.e., any string that comes after it in the sorted list). If it is, skip the string and move on to the next one.
  b. If the string is not redundant, add it to the result list.
- Return the result list.
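The steps above can be sketched directly as a loop before turning to the recursive version. This is an illustrative sketch, not the article's implementation: the function name remove_redundant_iterative is hypothetical, and working on a sorted copy (so the caller's list is left unchanged) is a design choice made here.

```python
def remove_redundant_iterative(strings):
    # Step 1: sort a copy by length, leaving the caller's list untouched.
    ordered = sorted(strings, key=len)
    # Step 2: start with an empty result list.
    result = []
    # Step 3: compare each string with the strings that come after it.
    for idx, current in enumerate(ordered):
        for other in ordered[idx + 1:]:
            if current in other:
                break  # 3a: redundant, skip it
        else:
            result.append(current)  # 3b: not redundant, keep it
    # Step 4: return the result list.
    return result


words = ["Gfg", "Gfg is best", "Geeks", "Gfg is for Geeks"]
print(remove_redundant_iterative(words))  # ['Gfg is best', 'Gfg is for Geeks']
print(words)  # the input list is unchanged
```

The for/else construct runs the else branch only when the inner loop finishes without hitting break, i.e. when no later string contains the current one.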
Python3
def remove_redundant_substrings(strings):
    # Base case: if the list is empty or has only one element, return it
    if len(strings) <= 1:
        return strings

    # Sort the list by length to simplify the recursion
    strings.sort(key=len)

    # Take the first string and remove it from the list
    current_string = strings.pop(0)

    # Recursively remove redundant substrings from the rest of the list
    remaining_strings = remove_redundant_substrings(strings)

    # Check if the current string is a redundant substring of any of the remaining strings
    for string in remaining_strings:
        if current_string in string:
            return remaining_strings

    # If the current string is not redundant, add it back to the list and return it
    remaining_strings.append(current_string)
    return remaining_strings

test_list = ["Gfg", "Gfg is best", "Geeks", "Gfg is for Geeks"]
print("The original list : " + str(test_list))
res = remove_redundant_substrings(test_list)
print("The filtered list : " + str(res))
Output
The original list : ['Gfg', 'Gfg is best', 'Geeks', 'Gfg is for Geeks']
The filtered list : ['Gfg is for Geeks', 'Gfg is best']
The time complexity of this algorithm is O(n^2 * m), where n is the number of strings in the input list and m is the maximum length of a string in the list. The worst case occurs when all the strings are unique and none of them is a substring of any other, so each string has to be checked against every string that follows it. The initial sort takes O(n log n) time, and each substring check is taken to cost O(m), so the O(n^2) checks dominate and the overall time complexity is O(n^2 * m).
The auxiliary space of this algorithm is O(n). The function sorts and modifies the input list in place without copying any strings, but the recursion goes one level deeper for each string, so the call stack holds up to n frames, and a very long list can exceed Python's default recursion limit. The returned list holds up to n strings (O(n * m) characters in total) if all the strings are unique and none of them is a redundant substring of any other.
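The quadratic number of checks in the worst case can be confirmed with a small counter. This is a sketch under stated assumptions: count_checks is a hypothetical helper that counts substring tests in a loop-based form of the algorithm, and the worst-case list of equal-length, distinct strings is made up for illustration.

```python
def count_checks(strings):
    # Count how many `in` checks are made when each string is compared
    # with the strings that follow it in length order.
    ordered = sorted(strings, key=len)
    checks = 0
    for idx, current in enumerate(ordered):
        for other in ordered[idx + 1:]:
            checks += 1
            if current in other:
                break
    return checks


n = 6
worst_case = [f"s{i}" for i in range(n)]  # no string contains another
print(count_checks(worst_case))  # 15
print(n * (n - 1) // 2)          # 15
```

With no redundant strings, every pair is checked exactly once, which gives n * (n - 1) / 2 checks in total.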