Similarity Metrics of Strings - Python

Last Updated: 17 Jan, 2025

In Python, we often need to measure the similarity between two strings. For example, given the strings "geeks" and "geeky", we might want to know how closely they match, whether for comparing user inputs or finding duplicate entries. Let's explore different methods to compute string similarity.

Using SequenceMatcher() from difflib

The SequenceMatcher class in the difflib module provides a simple way to measure string similarity based on the matching subsequences that two strings share.

    from difflib import SequenceMatcher

    s1 = "geeks"
    s2 = "geeky"

    # Calculating similarity ratio
    res = SequenceMatcher(None, s1, s2).ratio()
    print(res)

Output:

    0.8

Explanation:

- SequenceMatcher() compares the two strings and finds their matching blocks of characters.
- ratio() returns a float between 0 and 1, computed as 2 * M / T, where M is the number of matching characters and T is the total number of characters in both strings. Here M = 4 and T = 10, which gives 0.8.
- This method is simple to use, needs only the standard library, and works well for general string similarity tasks.

Let's explore some more methods for finding similarity metrics of strings.

Table of Content

- Using Levenshtein distance (edit distance)
- Using Jaccard similarity
- Using Cosine similarity
- Using Hamming distance

Using Levenshtein distance (edit distance)

Levenshtein distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) needed to convert one string into another. The example below uses the third-party Levenshtein package, which is not part of the standard library and must be installed separately (for example, with pip install Levenshtein).

    import Levenshtein

    s1 = "geeks"
    s2 = "geeky"

    # Calculating similarity ratio
    res = Levenshtein.ratio(s1, s2)
    print(res)

Explanation:

- Levenshtein.ratio() calculates a similarity score between 0 and 1 based on the edit distance.
- It suits cases where one string is a slightly edited version of the other, such as typos or small spelling variations.
- This method is widely used in text processing and is efficient for moderate string lengths.

Using Jaccard similarity

Jaccard similarity is the ratio of the number of elements that two sets have in common to the number of elements in their union.
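Written as a formula, the definition above and its value for "geeks" and "geeky" work out as shown here. The symbols J, A, and B are notation introduced for this illustration, with A and B standing for the sets of distinct characters in each string.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}

A = \{g, e, k, s\}, \quad B = \{g, e, k, y\}

J(A, B) = \frac{|\{g, e, k\}|}{|\{g, e, k, s, y\}|} = \frac{3}{5} = 0.6
```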
    s1 = "geeks"
    s2 = "geeky"

    # Converting strings to sets of characters
    set1 = set(s1)
    set2 = set(s2)

    # Calculating Jaccard similarity
    res = len(set1 & set2) / len(set1 | set2)
    print(res)

Output:

    0.6

Explanation:

- The strings are converted into sets of characters, so repeated characters are counted once and character order is ignored.
- The size of the intersection divided by the size of the union gives the similarity ratio.
- This method is easy to implement and is effective when only the presence of characters matters, not their order or frequency.

Using Cosine similarity

Cosine similarity measures the cosine of the angle between two vectors in a multidimensional space, where each string is represented as a vector of character counts.

    from collections import Counter
    from math import sqrt

    s1 = "geeks"
    s2 = "geeky"

    # Convert strings to character frequency vectors
    vec1 = Counter(s1)
    vec2 = Counter(s2)

    # Calculating cosine similarity
    # (a Counter returns 0 for characters it has not seen)
    dot_product = sum(vec1[ch] * vec2[ch] for ch in vec1)
    magnitude1 = sqrt(sum(count ** 2 for count in vec1.values()))
    magnitude2 = sqrt(sum(count ** 2 for count in vec2.values()))
    res = dot_product / (magnitude1 * magnitude2)
    print(res)

Output:

    0.857142857142857

Explanation:

- The strings are represented as frequency vectors using the Counter class.
- The dot product of the vectors divided by the product of their magnitudes gives the similarity.
- Unlike Jaccard similarity, this method takes into account how often each character occurs, although it still ignores character order.

Using Hamming distance

Hamming distance measures the number of differing characters at corresponding positions in two strings of equal length. Unlike the scores above, it is a distance: 0 means the strings are identical, and larger values mean they differ more.

    s1 = "geeks"
    s2 = "geeky"

    # Calculating Hamming distance (defined only for equal-length strings)
    if len(s1) == len(s2):
        res = sum(c1 != c2 for c1, c2 in zip(s1, s2))
    else:
        res = "Strings must be of equal length"
    print(res)

Output:

    1

Explanation:

- The zip() function pairs characters from both strings for comparison.
- A generator expression counts the positions where the characters differ.
- This method requires strings of equal length and runs in time proportional to that length.
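For readers who cannot install the third-party package used in the Levenshtein section, the edit distance described there can be sketched in pure Python with the usual dynamic-programming approach. This is a minimal illustration, not the package's implementation: the function names levenshtein_distance and normalized_similarity are hypothetical, and dividing by the longer string's length is an assumed normalization that will not necessarily match Levenshtein.ratio() for other inputs.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("geeks", "geeky"))   # 1
print(normalized_similarity("geeks", "geeky"))  # 0.8
```

Only one substitution (s to y) separates the two strings, so the distance is 1 and the normalized score is 1 - 1/5. Storing just two rows keeps memory use proportional to the shorter string, while running time stays proportional to the product of the two lengths.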
Author: manjeet_04
Article Tags: Python, Python Programs, Python string-programs