Python - Group list by first character of string

Last Updated : 23 Mar, 2023

Sometimes, we have a use case in which we need to perform the grouping of strings by various factors, like first letter or any other factor. These types of problems are typical to database queries and hence can occur in web development while programming. This article focuses on one such grouping by the first letter of the string. Let’s discuss certain ways in which this can be performed.

Method #1: Using next() + lambda + loop

The combination of the above 3 functions is used to solve this particular problem by the naive method. The lambda function performs the task of finding like initial character, and the next function helps in forwarding iteration.

Python3

# Python3 code to demonstrate
# Initial Character Case Categorization
# using next() + lambda + loop

# initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

# printing original list
print("The original list : " + str(test_list))

# using next() + lambda + loop
# Initial Character Case Categorization


def util_func(x, y): return x[0] == y[0]


res = []
for sub in test_list:
    ele = next((x for x in res if util_func(sub, x[0])), [])
    if ele == []:
        res.append(ele)
    ele.append(sub)

# print result
print("The list after Categorization : " + str(res))

Output :

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time Complexity: O(n^2), where n is the number of elements in the list.
Auxiliary Space: O(n), where n is the number of elements in the list.

Method #2: Using sorted() + groupby()

This particular task can also be solved using the groupby function which offers a convenient method to solve this problem. The sorted function sorts the elements by initial character to be feed to groupby for the relevant grouping.

Python3

# Python3 code to demonstrate
# Initial Character Case Categorization
# using sorted() + groupby()
from itertools import groupby

# initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

# printing original list
print("The original list : " + str(test_list))

# using sorted() + groupby()
# Initial Character Case Categorization


def util_func(x): return x[0]


temp = sorted(test_list, key=util_func)
res = [list(ele) for i, ele in groupby(temp, util_func)]

# print result
print("The list after Categorization : " + str(res))

Output :

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time complexity: O(nlogn), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.

Method #3: Using for loop

Python3

# Python3 code to demonstrate
# Initial Character Case Categorization

# initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

# printing original list
print("The original list : " + str(test_list))

res = []
x=[]
for i in test_list:
    if i[0] not in x:
        x.append(i[0])
for i in x:
    p=[]
    for j in test_list:
        if j[0]==i:
            p.append(j)
    res.append(p)
            
# print result
print("The list after Categorization : " + str(res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time complexity: O(n^2), where n is the length of the input list.

Auxiliary space: O(n), where n is the length of the input list.

Approach 4: Using defaultdict

Python3

from collections import defaultdict

#Initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

#printing original list
print("The original list : " + str(test_list))

#Using defaultdict
res = defaultdict(list)
for i in test_list:
    res[i[0]].append(i)

#print result
print("The list after Categorization : " + str(list(res.values())))
#This code is contributed by Edula Vinay Kumar Reddy

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time Complexity: O(n) where n is the number of elements in test_list
Auxiliary Space: O(n) as we use a defaultdict to store the result
Explanation:
We use a defaultdict to store the result where the key is the first character of each string in the list and value is the list of all strings with the same first character.
The defaultdict automatically initializes the value as an empty list if the key is not present. So, we don't have to check if the key is present or not.
We iterate through the test_list and add the elements to the corresponding key in the defaultdict.
Finally, we convert the defaultdict values to a list to get the final result.\

Method #5: Using dictionary comprehension

Use dictionary comprehension to categorize the words based on their first character.

First, we create a set of unique first characters in the list using set comprehension: set([word[0] for word in test_list]).
Next, we create a dictionary comprehension where the keys are the first characters and the values are the list of words starting with that character: {char: [word for word in test_list if word.startswith(char)] for char in set([word[0] for word in test_list])}.
Finally, we print the result.

Python3

# initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

# printing original list
print("The original list : " + str(test_list))

# using dictionary comprehension
# Initial Character Case Categorization
res = {char: [word for word in test_list if word.startswith(char)] for char in set([word[0] for word in test_list])}

# print result
print("The list after Categorization : " + str(res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : {'g': ['geek', 'g'], 'f': ['for', 'free'], 'a': ['an', 'a']}

Time complexity: O(n^2), where n is the length of the input list. This is because we use the startswith() method inside the list comprehension, which has a time complexity of O(n).

Auxiliary space: O(n), where n is the length of the input list. This is because we create a dictionary where each key has a list of words starting with that character.

Method #6: Using itertools.groupby() with sorted()

Use the itertools.groupby() function in combination with sorted() to group the words based on their first character.

Step-by-step approach:

First, sort the input list using sorted().
Then, use itertools.groupby() to group the words based on their first character. groupby() returns an iterator of (key, group) pairs, where key is the first character and group is an iterator of words starting with that character.
Iterate over the (key, group) pairs, convert the group iterator to a list using list(), and append it to the result list res.
Finally, print the result.

Below is the implementation of the above approach:

Python3

import itertools

# initializing list
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']

# printing original list
print("The original list : " + str(test_list))

# using itertools.groupby() with sorted()
# Initial Character Case Categorization
res = []
for k, g in itertools.groupby(sorted(test_list), key=lambda x: x[0]):
    res.append(list(g))

# print result
print("The list after Categorization : " + str(res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['a', 'an'], ['for', 'free'], ['g', 'geek']]

Time complexity: O(n log n), where n is the length of the input list. This is because we use sorted() which has a time complexity of O(n log n).
Auxiliary space: O(n), where n is the length of the input list. This is because we create a list of lists, where each inner list contains the words starting with the same character.