Find All Close Matches of Input String from a List in Java



Finding close matches to a string from a list of words is a common problem in string manipulation and pattern recognition. This article demonstrates two effective approaches for solving this problem in Java. The first approach utilizes string encoding to identify similar patterns, while the second approach leverages the Levenshtein Distance algorithm to find approximate matches.

String Encoding Approach

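The string encoding approach transforms each string into an encoded form that captures its pattern of repeated characters: every character is replaced by a number that records the order in which it first appears. Strings with the same encoding are considered matches. For example, ddcc and aabb both follow the pattern 0, 0, 1, 1, so they match.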

HashMap

A HashMap in Java is a collection that stores key-value pairs, where each key is unique and maps to a single value. The example uses the following methods:

  • containsKey(Object key): Checks whether a specific key exists in the map.
  • put(K key, V value): Associates a key with a value in the map.
  • get(Object key): Retrieves the value associated with a given key.
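
The three methods above can be tried out in isolation. The following is a minimal sketch, not part of the original example: the class name HashMapBasics and the helper firstSeenOrder are hypothetical names chosen for illustration. It records, for each distinct character, the order in which it was first seen.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasics {
    // Hypothetical helper: maps each distinct character of s to the order
    // in which it first appears (0 for the first distinct character, 1 for the next, ...).
    static Map<Character, Integer> firstSeenOrder(String s) {
        Map<Character, Integer> order = new HashMap<>();
        int next = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (!order.containsKey(ch)) {   // key not stored yet
                order.put(ch, next++);      // associate the key with the next index
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Character, Integer> order = firstSeenOrder("ddcc");
        System.out.println(order.get('d'));          // prints 0
        System.out.println(order.get('c'));          // prints 1
        System.out.println(order.containsKey('x'));  // prints false
    }
}
```

Because containsKey is checked before put, a repeated character keeps the index assigned at its first occurrence.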

Example

Below is an example that finds all close matches of an input string in a list using the string encoding approach:

import java.util.HashMap;

public class Demo{
   // Encodes a string by replacing each character with the index of its first
   // occurrence among the distinct characters, e.g. "ddcc" becomes "0,0,1,1,".
   static String string_encoding(String str){
      HashMap<Character, Integer> my_map = new HashMap<>();
      StringBuilder result = new StringBuilder();
      int i = 0;
      for (int j = 0; j < str.length(); j++) {
         char ch = str.charAt(j);
         if (!my_map.containsKey(ch)) {
            my_map.put(ch, i++);
         }
         // The separator keeps multi-digit indices unambiguous (1 then 0 versus 10).
         result.append(my_map.get(ch)).append(',');
      }
      return result.toString();
   }
   // Prints every word that has the same length and encoding as the pattern.
   static void match_words(String[] my_arr, String my_pattern){
      int len = my_pattern.length();
      String hash_val = string_encoding(my_pattern);
      for (String word : my_arr){
         if (word.length() == len && string_encoding(word).equals(hash_val)) {
            System.out.print(word + " ");
         }
      }
   }
   public static void main(String[] args){
      String[] my_arr = { "mno", "aabb", "pqr", "xxyy", "mmnn" };
      String my_pattern = "ddcc";
      System.out.println("The patterns similar to ddcc in the array are :");
      match_words(my_arr, my_pattern);
   }
}

Output

The patterns similar to ddcc in the array are :
aabb xxyy mmnn

Time Complexity

  • String Encoding: O(m), where m is the length of the string.
  • Pattern Matching: O(n×m), where n is the number of words in the list.

Space Complexity

  • HashMap Storage: O(k), where k is the number of unique characters in a string.

Using Levenshtein Distance

The Levenshtein Distance algorithm calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. This approach is ideal for finding approximate matches rather than exact pattern matches.

The algorithm relies on Dynamic Programming (DP), a problem-solving technique that breaks a problem into smaller overlapping subproblems, solving each once and storing the results for reuse. The main steps are:

Initialize the DP table, where m and n are the lengths of the two strings:
int[][] dp = new int[m + 1][n + 1];
Read the edit distance from the last cell once the table is filled:
int editDistance = dp[m][n];
Compare the distance with a threshold to identify matches:
if (editDistance <= threshold) closeMatches.add(candidateString);
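
The steps above can be put together in a short sketch that collects the matches into a list instead of printing them. The class name CloseMatchSketch, the method names, and the candidate words are hypothetical and chosen only for illustration. The candidates are picked so that some of them really are within one edit of the target.

```java
import java.util.ArrayList;
import java.util.List;

class CloseMatchSketch {
    // Fills the DP table: dp[i][j] is the edit distance between the
    // first i characters of a and the first j characters of b.
    static int editDistance(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) dp[i][0] = i;  // delete all i characters
        for (int j = 0; j <= n; j++) dp[0][j] = j;  // insert all j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(dp[i - 1][j - 1] + cost,       // match or substitute
                           Math.min(dp[i - 1][j] + 1,              // delete
                                    dp[i][j - 1] + 1));            // insert
            }
        }
        return dp[m][n];
    }

    // Returns the candidates whose distance to the target is at most threshold.
    static List<String> closeMatches(String[] candidates, String target, int threshold) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (editDistance(candidate, target) <= threshold) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] candidates = { "aabb", "aab", "abb", "aabc", "xxyy" };
        // prints [aabb, aab, abb, aabc]
        System.out.println(closeMatches(candidates, "aabb", 1));
    }
}
```

Here aab and abb are each one insertion away from aabb, aabc is one substitution away, and xxyy needs four substitutions, so it is excluded.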

Example

Below is an example that finds all close matches of the input string in a list using Levenshtein distance:

public class LevenshteinDemo {
    static int levenshteinDistance(String s1, String s2) {
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];

        for (int i = 0; i <= s1.length(); i++) {
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0) {
                    dp[i][j] = j;
                } else if (j == 0) {
                    dp[i][j] = i;
                } else if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + Math.min(dp[i - 1][j - 1],
                               Math.min(dp[i - 1][j], dp[i][j - 1]));
                }
            }
        }

        return dp[s1.length()][s2.length()];
    }

    // Prints every word whose edit distance to the target is at most the threshold.
    static void findCloseMatches(String[] words, String target, int threshold) {
        System.out.println("Close matches for \"" + target + "\" with distance <= " + threshold + ":");

        for (String word : words) {
            if (levenshteinDistance(word, target) <= threshold) {
                System.out.print(word + " ");
            }
        }
    }

    public static void main(String[] args) {
        String[] words = { "mno", "aabb", "pqr", "xxyy", "mmnn" };
        String target = "aabb";

        findCloseMatches(words, target, 2);
    }
}

Output

Close matches for "aabb" with distance <= 2:
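aabb

Only aabb itself is within distance 2 of the target. Every other word in the list shares no characters with aabb, so each of them is four edits away.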

Time Complexity

  • Levenshtein Distance Calculation: O(p×q), where p and q are the lengths of the two strings.
  • Pattern Matching: O(n×p×q), where n is the number of words, p is the average word length, and q is the length of the target string.

Space Complexity

  • DP Table Storage: O(p×q) for each comparison.
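
The table storage can be reduced, because each row of the DP table depends only on the row directly above it. The following is a minimal sketch of that well-known two-row variant, which is not used in the example above. The class name TwoRowLevenshtein is hypothetical. It keeps two arrays of length q + 1 instead of the full table.

```java
class TwoRowLevenshtein {
    // Computes the same edit distance as the full-table version,
    // but stores only the previous and the current row.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;  // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;  // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                          Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            // The current row becomes the previous row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(distance("aabb", "xxyy"));      // prints 4
    }
}
```

The running time is unchanged, while the extra space drops to O(q), where q is the length of the second string.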

Alshifa Hasnain

Updated on: 2024-12-17T03:34:20+05:30
