
Find Frequency of Common Elements in a List of Strings
This article shows how to find the frequency of the elements that occur in a list of strings using Python. The list to be analyzed is sometimes stored in an Excel file, and the openpyxl module is used here to read it. Three examples are given. Example 1 finds the frequency of the characters that are common to all the strings in the list. Examples 2 and 3 find the frequency of each word in the list of strings, which shows which words the strings share. In all three examples, the list of strings is fetched from a column of an Excel file.
Preprocessing Steps
Step 1: Log in with a Google account. Go to Google Colab, open a new Colab notebook, and write the Python code in it.
Step 2: Upload the Excel file "oldrecord5.xlsx" to Google Colab.
Step 3: Import "openpyxl".
Step 4: Use the openpyxl.load_workbook function to load the Excel file.
Step 5: Store the active sheet in a variable called myxlsxsheet.
Step 6: Load the sheet into a pandas DataFrame and select the string column.
Step 7: Convert the selected column to a list. Call this list "title_list".
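Steps 6 and 7 can also be sketched without pandas, working directly on the row tuples that a worksheet's values attribute yields. In the sketch below, the helper name titles_from_rows, its parameters, and every cell other than the three titles are illustrative placeholders and are not taken from the Excel file. The column index 3 and the keyword "Discussion" follow the examples in this article.

```python
def titles_from_rows(rows, column_index=3, keyword="Discussion"):
    """Return the cells in one column whose text contains the keyword."""
    titles = []
    for row in rows:
        # Skip short rows so the index lookup cannot fail.
        if len(row) <= column_index:
            continue
        cell = row[column_index]
        # Empty cells come back as None; only text cells are searched.
        if isinstance(cell, str) and keyword in cell:
            titles.append(cell)
    return titles


# Hypothetical rows standing in for myxlsxsheet.values. Only the three
# titles in the last column come from the article's output; the other
# cells are invented placeholders.
sample_rows = [
    ("No", "Date", "Unit", "Title"),
    (1, "d1", "u1", "Types of Environment - Class Discussion"),
    (2, "d2", "u1", None),
    (3, "d3", "u2", "Management Structure and Nature - Class Discussion"),
    (4, "d4", "u3", "Macro- Demography, Natural, Legal & Political - Class Discussion"),
]

title_list = titles_from_rows(sample_rows)
print(title_list)
```

With the workbook loaded as in the steps above, the same helper could be called as titles_from_rows(myxlsxsheet.values).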
The Excel file content used for these examples
Fig: Showing the Excel file oldrecord5.xlsx used in the examples
Upload the Excel file to Colab
Fig: Uploading oldrecord5.xlsx in Google Colab
Example 1: Get the frequency of the characters that are found in a list of strings using the reduce function
This approach uses the reduce function to intersect the character counts of all the strings.
Step 1: Use the list "title_list" from the preprocessing steps above.
Step 2: Use reduce, lambda, and Counter to find the frequency of the characters that are common to all of these strings.
Step 3: Show the results in the form of a dictionary.
Write the following code in a code cell of the Google Colab notebook.
import openpyxl
from openpyxl import load_workbook
import pandas as pd
from functools import reduce
from collections import Counter

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert to DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select those rows that contain the "Discussion" string
df1 = df[df.iloc[:, 3].str.contains('Discussion')]

# Select only the titles' column
df2 = df1.iloc[:, 3]
title_list = df2.values.tolist()
print(title_list)

# Intersect the character counts of all titles
itemFreq = reduce(lambda m, n: m & n, (Counter(elem) for elem in title_list[1:]), Counter(title_list[0]))
print("Common Characters and their occurrence : ", str(dict(itemFreq)))
Viewing The Result
Press the play button on the code cells to see the results.
['Types of Environment - Class Discussion', 'Management Structure and Nature - Class Discussion', 'Macro- Demography, Natural, Legal & Political - Class Discussion']
Common Characters and their occurrence : {'e': 2, 's': 5, ' ': 5, 'o': 1, 'n': 1, 'i': 2, 'r': 1, 'm': 1, 't': 1, '-': 1, 'C': 1, 'l': 1, 'a': 1, 'D': 1, 'c': 1, 'u': 1}
Fig 1: Showing the results using Google Colab.
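The & operator on two Counter objects keeps only the keys present in both and takes the smaller of the two counts. This is why the result above reports, for each character, how many times it is guaranteed to occur in every title. A minimal sketch on three short made-up strings (not taken from the Excel file) shows the same reduce pattern in isolation:

```python
from collections import Counter
from functools import reduce

# Made-up sample strings, used only to illustrate the intersection.
words = ["banana", "bandana", "cabana"]

# Intersect the per-string character counts pairwise, left to right.
common = reduce(lambda left, right: left & right, (Counter(w) for w in words))
print(dict(common))  # {'b': 1, 'a': 3, 'n': 1}
```

The letters "d" and "c" are dropped because they do not occur in every string, and "n" is reported once because "cabana" contains it only once.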
Example 2: Get the frequency of the words that are found in a list of strings by combining and sorting lists
This approach uses the following steps:
Step 1: Use the list "title_list" from the preprocessing steps above.
Step 2: Use split on the individual list items to separate them into words, and then combine these words into a single list.
Step 3: Print the combined list in sorted order and find the frequency of each word using Counter. Show the results in the form of a dictionary.
Write the following code in a code cell of the Google Colab notebook.
from collections import Counter
import openpyxl
from openpyxl import load_workbook
import pandas as pd

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert to DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select those rows that contain the "Discussion" string
df1 = df[df.iloc[:, 3].str.contains('Discussion')]

# Select only the titles' column
df2 = df1.iloc[:, 3]
title_list = df2.values.tolist()
print(title_list)

# Split each title into words
lst1 = title_list[0].split()
lst2 = title_list[1].split()
lst3 = title_list[2].split()

# Combine the words into one list
combinedlist = [*lst1, *lst2, *lst3]

# Print output
print("Concatenated List: ", combinedlist)

for elem in sorted(combinedlist):
    print(elem)

frequencyofelements = Counter(combinedlist)
print("frequency of elements: ", frequencyofelements)
Output of Example 2
To see the results in colab, press the play button on the code cells.
['Types of Environment - Class Discussion', 'Management Structure and Nature - Class Discussion', 'Macro- Demography, Natural, Legal & Political - Class Discussion']
Concatenated List: ['Types', 'of', 'Environment', '-', 'Class', 'Discussion', 'Management', 'Structure', 'and', 'Nature', '-', 'Class', 'Discussion', 'Macro-', 'Demography,', 'Natural,', 'Legal', '&', 'Political', '-', 'Class', 'Discussion']
&
-
-
-
Class
Class
Class
Demography,
Discussion
Discussion
Discussion
Environment
Legal
Macro-
Management
Natural,
Nature
Political
Structure
Types
and
of
frequency of elements: Counter({'-': 3, 'Class': 3, 'Discussion': 3, 'Types': 1, 'of': 1, 'Environment': 1, 'Management': 1, 'Structure': 1, 'and': 1, 'Nature': 1, 'Macro-': 1, 'Demography,': 1, 'Natural,': 1, 'Legal': 1, '&': 1, 'Political': 1})
Fig 2: Showing the results using Google Colab.
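The code in Example 2 splits exactly three titles by index. For a list of any length, the words can be gathered in a single pass. The sketch below hard-codes the three titles from the output so that it runs without the Excel file; the names titles and word_counts are illustrative.

```python
from collections import Counter

# The three titles shown in the output of Example 2.
titles = [
    "Types of Environment - Class Discussion",
    "Management Structure and Nature - Class Discussion",
    "Macro- Demography, Natural, Legal & Political - Class Discussion",
]

# Split every title on whitespace and count all words in one pass.
word_counts = Counter(word for title in titles for word in title.split())

# most_common(3) lists the three most frequent words, highest count first.
print(word_counts.most_common(3))
```

Because the generator loops over every title, the same code works unchanged if more rows match the "Discussion" filter.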
Example 3: Get the frequency of the words that are found in a list of strings using pandas functions
This approach uses the following steps:
Step 1: Use the list "title_list" from the preprocessing steps above.
Step 2: Use split on the individual list items to separate them into words, and then combine these words into a single list.
Step 3: Create a pandas Series from the combined list and use the value_counts() function to calculate the frequency of each word. Show the output.
Write the following code in a code cell of the Google Colab notebook.
import openpyxl
from openpyxl import load_workbook
import pandas as pd

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert to DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select those rows that contain the "Discussion" string
df1 = df[df.iloc[:, 3].str.contains('Discussion')]

# Select only the titles' column
df2 = df1.iloc[:, 3]
title_list = df2.values.tolist()
print(title_list)

# Split each title into words
lst1 = title_list[0].split()
lst2 = title_list[1].split()
lst3 = title_list[2].split()

# Combine the words into one list
combinedlist = [*lst1, *lst2, *lst3]

# Print output
print("Concatenated List: ", combinedlist)

# Count how often each word occurs
frequencyofelements = pd.Series(combinedlist).value_counts()
print("frequency of elements: ")
print(frequencyofelements)
Viewing The Result
Press the play button on the code cells to see the results
['Types of Environment - Class Discussion', 'Management Structure and Nature - Class Discussion', 'Macro- Demography, Natural, Legal & Political - Class Discussion']
Concatenated List: ['Types', 'of', 'Environment', '-', 'Class', 'Discussion', 'Management', 'Structure', 'and', 'Nature', '-', 'Class', 'Discussion', 'Macro-', 'Demography,', 'Natural,', 'Legal', '&', 'Political', '-', 'Class', 'Discussion']
frequency of elements:
-              3
Class          3
Discussion     3
Types          1
of             1
Environment    1
Management     1
Structure      1
and            1
Nature         1
Macro-         1
Demography,    1
Natural,       1
Legal          1
&              1
Political      1
dtype: int64
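Examples 2 and 3 count every word, including words that occur in only one title. If only the words present in every title are wanted, the Counter intersection from Example 1 can be applied to words instead of characters. This is a sketch of that variation rather than part of the original examples; the titles are hard-coded from the output above, and the variable names are illustrative.

```python
from collections import Counter
from functools import reduce

# The three titles shown in the output of the examples.
titles = [
    "Types of Environment - Class Discussion",
    "Management Structure and Nature - Class Discussion",
    "Macro- Demography, Natural, Legal & Political - Class Discussion",
]

# Count the words of each title separately, then intersect the counters.
per_title = [Counter(title.split()) for title in titles]
common_words = reduce(lambda left, right: left & right, per_title)
print(dict(common_words))  # {'-': 1, 'Class': 1, 'Discussion': 1}

# Total occurrences of each shared word across the whole list.
overall = Counter(word for title in titles for word in title.split())
print({word: overall[word] for word in common_words})
# {'-': 3, 'Class': 3, 'Discussion': 3}
```

The first dictionary gives the minimum count per title, and the second gives the total count across all titles, restricted to the shared words.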
This article used three examples to show how to find the frequency of the elements in a list of strings. The first example treats the individual characters of the strings as the elements. Examples 2 and 3 first split the strings into individual words and then use those words as the elements whose frequency is counted.