
Find All Close Matches of Input String from a List in Java
Finding close matches to a string from a list of words is a common problem in string manipulation and pattern recognition. This article demonstrates two effective approaches for solving this problem in Java. The first approach utilizes string encoding to identify similar patterns, while the second approach leverages the Levenshtein Distance algorithm to find approximate matches.
String Encoding Approach
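The string encoding approach transforms each string into an encoded form that captures its pattern of repeated characters: every character is replaced by the order in which it first appears, so "ddcc" is encoded as the sequence 0, 0, 1, 1. Strings with the same encoded pattern are considered matches.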
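To make the idea concrete, here is a minimal sketch of the same encoding that returns a List<Integer> instead of a String. The class name PatternEncodingSketch and the method encode are hypothetical names chosen for illustration. Returning a list keeps indices of 10 or more unambiguous, because each index stays a separate element rather than being concatenated as digits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatternEncodingSketch {
    // Hypothetical helper: replace each character with the order in which it
    // first appears, e.g. "ddcc" -> [0, 0, 1, 1].
    static List<Integer> encode(String s) {
        Map<Character, Integer> firstSeen = new HashMap<>();
        List<Integer> code = new ArrayList<>();
        for (char ch : s.toCharArray()) {
            if (!firstSeen.containsKey(ch)) {
                // The next unused index equals the number of characters seen so far
                firstSeen.put(ch, firstSeen.size());
            }
            code.add(firstSeen.get(ch));
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(encode("ddcc")); // [0, 0, 1, 1]
        System.out.println(encode("xxyy")); // [0, 0, 1, 1]
        System.out.println(encode("abab")); // [0, 1, 0, 1]
        // Two strings share a pattern exactly when their encodings are equal
        System.out.println(encode("ddcc").equals(encode("xxyy"))); // true
        System.out.println(encode("ddcc").equals(encode("abab"))); // false
    }
}
```

"ddcc" and "xxyy" have the same shape (two of one character followed by two of another), while "abab" alternates, so its encoding differs.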
HashMap
A HashMap in Java is a collection that stores key-value pairs: each key is unique and maps to exactly one value. The encoding uses three of its methods:
- containsKey(Object key): Checks whether a specific key exists in the map.
- put(K key, V value): Associates a key with a value in the map.
- get(Object key): Retrieves the value associated with a given key.
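The short sketch below exercises these three methods the way the encoding does: containsKey guards against reassigning a character that was already seen, put records a new character, and get reads the stored index back. The class HashMapMethodsDemo and the method firstIndexMap are hypothetical names for illustration. Note that get returns null for a missing key, and put on an existing key replaces the old value and returns it, which is why the containsKey check matters.

```java
import java.util.HashMap;

public class HashMapMethodsDemo {
    // Hypothetical helper: map each distinct character to the order of its first appearance
    static HashMap<Character, Integer> firstIndexMap(String word) {
        HashMap<Character, Integer> map = new HashMap<>();
        int next = 0;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (!map.containsKey(ch)) { // containsKey: has this character been seen?
                map.put(ch, next++);    // put: associate the character with a new index
            }
        }
        return map;
    }

    public static void main(String[] args) {
        HashMap<Character, Integer> map = firstIndexMap("ddcc");
        System.out.println(map.get('d'));         // 0
        System.out.println(map.get('c'));         // 1
        System.out.println(map.containsKey('x')); // false
        System.out.println(map.get('x'));         // null (no mapping for 'x')
        // put on an existing key overwrites it and returns the previous value
        System.out.println(map.put('d', 5));      // 0
        System.out.println(map.get('d'));         // 5
    }
}
```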
Example
Below is an example that finds all close matches of an input string from a list using the string encoding approach:
import java.util.HashMap;

public class Demo {
    // Encode a string by replacing each character with the order in which it
    // first appears, e.g. "ddcc" -> "0,0,1,1,". The comma separator keeps
    // indices of 10 or more from running together and colliding.
    static String string_encoding(String str) {
        HashMap<Character, Integer> my_map = new HashMap<>();
        StringBuilder result = new StringBuilder();
        int i = 0;
        for (int j = 0; j < str.length(); j++) {
            char ch = str.charAt(j);
            if (!my_map.containsKey(ch))
                my_map.put(ch, i++);
            result.append(my_map.get(ch)).append(',');
        }
        return result.toString();
    }

    // Print every word with the same length and encoding as the pattern
    static void match_words(String[] my_arr, String my_pattern) {
        int len = my_pattern.length();
        String hash_val = string_encoding(my_pattern);
        for (String word : my_arr) {
            if (word.length() == len && string_encoding(word).equals(hash_val))
                System.out.print(word + " ");
        }
    }

    public static void main(String args[]) {
        String[] my_arr = { "mno", "aabb", "pqr", "xxyy", "mmnn" };
        String my_pattern = "ddcc";
        System.out.println("The patterns similar to ddcc in the array are :");
        match_words(my_arr, my_pattern);
    }
}
Output
The patterns similar to ddcc in the array are :
aabb xxyy mmnn
Time Complexity
String Encoding: O(m), where m is the length of the string.
Pattern Matching: O(n × m), where n is the number of words in the list and m is the length of each word.
Space Complexity
HashMap Storage: O(k), where k is the number of unique characters in a string.
Using Levenshtein Distance
The Levenshtein Distance algorithm calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. This approach is ideal for finding approximate matches rather than exact pattern matches.
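- Dynamic Programming (DP) is a problem-solving technique that breaks a problem into smaller overlapping subproblems, solving each once and storing its result for reuse. Here, dp[i][j] stores the edit distance between the first i characters of one string and the first j characters of the other, so for strings of lengths m and n the table has (m + 1) × (n + 1) cells:
int[][] dp = new int[m + 1][n + 1];
- Find Edit Distance: once the table is filled, the bottom-right cell holds the distance between the two complete strings:
int editDistance = dp[m][n];
- Compare Distances to Identify Matches: a candidate is kept when its distance to the input string is within a chosen threshold:
if (editDistance <= threshold) closeMatches.add(candidateString);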
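The table is filled with the standard Levenshtein recurrence, written here with 1-indexed characters (s_1[i] corresponds to s1.charAt(i - 1) in Java). The first two cases are the empty-prefix base cases, and the three terms inside the minimum correspond to a substitution, a deletion, and an insertion respectively:

```latex
dp[i][j] =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
dp[i-1][j-1] & \text{if } s_1[i] = s_2[j] \\
1 + \min\bigl(dp[i-1][j-1],\ dp[i-1][j],\ dp[i][j-1]\bigr) & \text{otherwise}
\end{cases}
```

For example, with s_1 = "aab" and s_2 = "aabb", every character of "aab" matches along the diagonal, so dp[3][3] = 0 and the final cell is dp[3][4] = 1 + dp[3][3] = 1 (one insertion).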
Example
Below is an example that finds all close matches of the input string from a list using Levenshtein distance:
public class LevenshteinDemo {
    // dp[i][j] = edit distance between the first i characters of s1
    // and the first j characters of s2
    static int levenshteinDistance(String s1, String s2) {
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0) {
                    dp[i][j] = j; // insert all j characters
                } else if (j == 0) {
                    dp[i][j] = i; // delete all i characters
                } else if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1]; // characters match: no edit needed
                } else {
                    // 1 + cheapest of substitution, deletion, insertion
                    dp[i][j] = 1 + Math.min(dp[i - 1][j - 1],
                            Math.min(dp[i - 1][j], dp[i][j - 1]));
                }
            }
        }
        return dp[s1.length()][s2.length()];
    }

    // Print every word within `threshold` edits of the target
    static void findCloseMatches(String[] words, String target, int threshold) {
        System.out.println("Close matches for \"" + target + "\" with distance <= " + threshold + ":");
        for (String word : words) {
            if (levenshteinDistance(word, target) <= threshold) {
                System.out.print(word + " ");
            }
        }
    }

    public static void main(String[] args) {
        String[] words = { "mno", "aabb", "pqr", "xxyy", "mmnn" };
        String target = "aabb";
        findCloseMatches(words, target, 2);
    }
}
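Output
Close matches for "aabb" with distance <= 2:
aabb
Only "aabb" itself is within two edits of the target. "xxyy" and "mmnn" follow the same character pattern, which is why the string encoding approach matched them, but they share no characters with "aabb", so each is 4 edits away.
Time Complexity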
Levenshtein Distance Calculation: O(p × q), where p and q are the lengths of the two strings.
Pattern Matching: O(n × p × q), where n is the number of words, p is the average length of the words, and q is the length of the target string.
Space Complexity
DP Table Storage: O(p × q), the size of the table allocated for a single comparison.
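Because each row of the DP table depends only on the row above it, the full table is not strictly necessary. The sketch below uses the well-known two-row variant, which reduces extra space to O(q), and collects matches into a closeMatches list as in the step-by-step snippet earlier instead of printing them. The class TwoRowLevenshteinSketch, its methods, and the two extra words "aab" and "abbb" are hypothetical, added only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoRowLevenshteinSketch {
    // Two-row variant of the Levenshtein DP: keeps only the previous and
    // current rows, so extra space is O(q) where q = s2.length().
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // row i = 0: j insertions
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // column j = 0: i deletions
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    curr[j] = prev[j - 1];
                } else {
                    curr[j] = 1 + Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
                }
            }
            // Swap rows so `prev` always holds the most recently completed row
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    // Collect (rather than print) every candidate within `threshold` edits
    static List<String> closeMatches(String[] words, String target, int threshold) {
        List<String> closeMatches = new ArrayList<>();
        for (String candidateString : words) {
            if (distance(candidateString, target) <= threshold) {
                closeMatches.add(candidateString);
            }
        }
        return closeMatches;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        String[] words = { "mno", "aabb", "pqr", "xxyy", "mmnn", "aab", "abbb" };
        System.out.println(closeMatches(words, "aabb", 2)); // [aabb, aab, abbb]
    }
}
```

"aab" is one insertion away from "aabb" and "abbb" is one substitution away, so both pass the threshold alongside the exact match. Returning a list also makes it easy to sort the results or report the distance for each match.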