Python - Remove Rows for similar Kth column element

Last Updated : 14 Apr, 2023

Given a matrix, remove a row if its Kth-column element has already occurred in the Kth column of an earlier row.

Input : test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]], K = 2
Output : [[3, 4, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
Explanation : [2, 3, 5] is removed because an earlier row, [3, 4, 5], already has 5 at index K = 2.

Input : test_list = [[3, 4, 5], [2, 3, 3], [10, 4, 3], [7, 8, 9], [9, 3, 6]], K = 2
Output : [[3, 4, 5], [2, 3, 3], [7, 8, 9], [9, 3, 6]]
Explanation : [10, 4, 3] is removed because an earlier row, [2, 3, 3], already has 3 at index K = 2.
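The two examples above can be checked with a small helper. This is only a sketch: the function name `remove_rows_by_kth` and its parameter names are made up for illustration, and K is assumed to be a 0-based index, as in the examples.

```python
def remove_rows_by_kth(rows, k):
    """Keep only the first row for each distinct value found at index k."""
    seen = set()   # Kth-column values met so far
    kept = []      # rows that survive the filter, in input order
    for row in rows:
        if row[k] not in seen:
            seen.add(row[k])
            kept.append(row)
    return kept


# The two inputs from the examples above, both with K = 2
first = remove_rows_by_kth([[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]], 2)
second = remove_rows_by_kth([[3, 4, 5], [2, 3, 3], [10, 4, 3], [7, 8, 9], [9, 3, 6]], 2)

print(first)   # [[3, 4, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
print(second)  # [[3, 4, 5], [2, 3, 3], [7, 8, 9], [9, 3, 6]]
```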

Method 1: Using loop

In this, we maintain a memo list that keeps track of the Kth-column elements seen so far. If a row's Kth-column element is already present in it, that row is omitted from the result.

Python3

# Python3 code to demonstrate working of 
# Remove Rows for similar Kth column element
# Using loop

# initializing list
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]

# printing original list
print("The original list is : " + str(test_list))

# initializing K 
K = 1

res = []
memo = []
for sub in test_list:
    
    # "not in" checks that the Kth element has not been seen yet
    if sub[K] not in memo:
        res.append(sub)
        memo.append(sub[K])

# printing result 
print("The filtered Matrix : " + str(res))

Output

The original list is : [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
The filtered Matrix : [[3, 4, 5], [2, 3, 5], [7, 8, 9]]

Time complexity: O(n^2) in the worst case, where n is the number of rows, because each "in" check scans the memo list.
Auxiliary space: O(n) for the memo list and the result list.

Method 2: Using list comprehension.

Use a list comprehension to build a new list that contains only the rows whose Kth element has not appeared in the Kth column of any earlier row.

Step-by-step approach:

Initialize the original list.
Initialize the value of K.
Use list comprehension to create a new list that keeps a row only if its Kth element does not appear in the Kth column of the rows before it.

Below is the implementation of the above approach:

Python3

# Python3 code to demonstrate working of 
# Remove Rows for similar Kth column element
# Using list comprehension

# initializing list
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]

# printing original list
print("The original list is : " + str(test_list))

# initializing K 
K = 1

# using list comprehension to filter the list
res = [sub for i, sub in enumerate(test_list) if sub[K] not in [sub2[K] for sub2 in test_list[:i]]]

# printing result 
print("The filtered Matrix : " + str(res))

Output

The original list is : [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
The filtered Matrix : [[3, 4, 5], [2, 3, 5], [7, 8, 9]]

Time complexity: O(n^2), because for every row the comprehension rebuilds and scans the list of Kth elements of all earlier rows.
Auxiliary space: O(n) for the result list and the temporary list of earlier Kth elements.
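The quadratic cost comes from rebuilding the list of earlier Kth elements for every row. As a hedged variation (not part of the original method), the comprehension can consult a set instead. The sketch below relies on set.add() returning None, so the "or" branch both records the value and evaluates as falsy. The name `seen` is an assumption introduced here.

```python
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
K = 1

seen = set()  # Kth-column values met so far

# If sub[K] is already in seen, the condition is False and the row is skipped.
# Otherwise seen.add(...) runs, returns None, and "not None" keeps the row.
res = [sub for sub in test_list if not (sub[K] in seen or seen.add(sub[K]))]

print("The filtered Matrix : " + str(res))
```

This keeps the one-expression form while doing a single pass over the rows, at the price of a side effect inside the comprehension, which some readers find harder to follow than the explicit loop.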

Method 3: Using recursion

Algorithm:
1. If `memo` is `None`, initialize it as an empty list. It holds the Kth-column elements seen so far.
2. If `test_list` is empty, return an empty list (base case).
3. If the Kth element of the first sub-list is already in `memo`, skip that sub-list and recurse on the rest of `test_list`.
4. Otherwise, keep the first sub-list, add its Kth element to `memo`, and recurse on the rest of `test_list`.
5. The kept sub-lists, joined in order, form the filtered list that is returned.

Python3

def remove_similar_rows(test_list, K, memo=None):
    if memo is None:
        memo = []
    if not test_list:
        return []
    if test_list[0][K] in memo:
        return remove_similar_rows(test_list[1:], K, memo)
    return [test_list[0]] + remove_similar_rows(test_list[1:], K, memo + [test_list[0][K]]) 

# initializing list
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]

# printing original list
print("The original list is : " + str(test_list))

# initializing K 
K = 1

# using recursive function to remove rows
res = remove_similar_rows(test_list, K)

# printing result 
print("The filtered Matrix : " + str(res))

Output

The original list is : [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
The filtered Matrix : [[3, 4, 5], [2, 3, 5], [7, 8, 9]]

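The time complexity of this code is O(N^2), where N is the number of sub-lists in the `test_list`, because every call copies the remaining rows with `test_list[1:]` and scans `memo`.

The space complexity of this code is also O(N^2), because each pending recursive call holds its own slice of `test_list` and its own copy of `memo`. The recursion depth grows with N, so very long lists can exceed Python's recursion limit.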
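One way to avoid the repeated copying is to recurse on an index and share a single set and output list across calls. This is a sketch under assumptions: the function name `remove_similar_rows_idx` and the extra parameters `i`, `seen` and `out` are hypothetical and do not appear in the original code.

```python
def remove_similar_rows_idx(rows, k, i=0, seen=None, out=None):
    """Recursive filter that walks rows by index instead of slicing."""
    if seen is None:
        # First call: create the shared set of seen values and the output list
        seen, out = set(), []
    if i == len(rows):
        # Base case: every row has been examined
        return out
    if rows[i][k] not in seen:
        seen.add(rows[i][k])
        out.append(rows[i])
    # Move on to the next row, passing the same set and list along
    return remove_similar_rows_idx(rows, k, i + 1, seen, out)


test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
res = remove_similar_rows_idx(test_list, 1)
print("The filtered Matrix : " + str(res))
```

The recursion depth still equals the number of rows, so this version remains bound by Python's recursion limit. For long inputs an iterative loop is the safer choice.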

Method 4: Using the Set data structure:

Initialize an empty set unique_elements.
Initialize an empty list result.
Loop through each sub-list in the given list test_list and check if the element at index K of the current sub-list is present in the unique_elements set or not. If it is not present, add it to the unique_elements set and append the current sub-list to the result list.
Return the result list.

Below is the implementation of the above approach:

Python3

# Python program for the above approach

# Driver Code
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
K = 1

unique_elements = set()
result = []

for sub in test_list:
    if sub[K] not in unique_elements:
        unique_elements.add(sub[K])
        result.append(sub)

print("The filtered Matrix : " + str(result))

Output

The filtered Matrix : [[3, 4, 5], [2, 3, 5], [7, 8, 9]]

The time complexity of this approach is O(N) on average, where N is the number of sub-lists in the given list, since each set membership check takes O(1) time on average.

The space complexity is also O(N), as we are storing at most N unique elements in the unique_elements set and at most N sub-lists in the result list.
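A closely related variation, shown here only as a sketch, stores the first row for each Kth value in a dict. It assumes Python 3.7 or later, where dicts preserve insertion order. The name `first_rows` is introduced for illustration.

```python
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
K = 1

first_rows = {}  # maps each Kth-column value to the first row that carried it
for sub in test_list:
    # setdefault stores sub only if the key is absent, so later duplicates are ignored
    first_rows.setdefault(sub[K], sub)

# Values come back in insertion order, i.e. the order of first occurrence
result = list(first_rows.values())
print("The filtered Matrix : " + str(result))
```

Like the set version, this requires the Kth-column elements to be hashable.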

Method 5: Using heapq:

Algorithm:

Initialize an empty list res to store the filtered rows.
Initialize an empty set seen to keep track of the values of column K that have already been seen.
Iterate over each row in the input test_list:
a. If the value of column K for the current row is not in seen, then add the row to the res list using heapq.heappush() and add the value of column K to seen.
Return the filtered list res. Because res is built as a heap, its rows come back in heap order rather than in their original input order.

Python3

import heapq

def remove_similar_rows(test_list, K):
    res = []
    seen = set()
    for row in test_list:
        if row[K] not in seen:
            heapq.heappush(res, row)
            seen.add(row[K])
    return res

# initializing list
test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]

# printing original list
print("The original list is : " + str(test_list))

# initializing K
K = 1

# using heapq to remove rows
res = remove_similar_rows(test_list, K)

# printing result
print("The filtered Matrix : " + str(res))
#This code is contributed by Rayudu.

Output

The original list is : [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
The filtered Matrix : [[2, 3, 5], [3, 4, 5], [7, 8, 9]]

Time complexity: O(n log n) in the worst case, where n is the number of rows in the input test_list. This is because each heapq.heappush() operation takes O(log n) time and it may be performed for every row in the input list.

Space complexity: O(n) where n is the number of rows in the input test_list. This is because the res list and the seen set each store at most n elements.
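The output above lists [2, 3, 5] before [3, 4, 5], which differs from the other methods. The sketch below illustrates why: the heap arranges rows by value, and popping them with heapq.heappop() yields ascending order. The names `heap`, `in_order` and `sorted_rows` are assumptions introduced for this illustration.

```python
import heapq

test_list = [[3, 4, 5], [2, 3, 5], [10, 4, 3], [7, 8, 9], [9, 3, 6]]
K = 1

heap = []      # kept rows, arranged as a min-heap
in_order = []  # the same kept rows, in their original input order
seen = set()

for row in test_list:
    if row[K] not in seen:
        seen.add(row[K])
        heapq.heappush(heap, row)
        in_order.append(row)

# Popping from the heap returns the rows smallest first (this empties the heap)
sorted_rows = [heapq.heappop(heap) for _ in range(len(heap))]

print("Input order  : " + str(in_order))
print("Sorted order : " + str(sorted_rows))
```

If the original row order matters, the plain list from the set-based method is the better fit. The heap is only worth its extra cost when the filtered rows are wanted in sorted order.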