The CANON_EQ field of the Pattern class enables canonical equivalence. When you pass it as a flag value to the compile() method, two characters are considered to match if and only if their full canonical decompositions are equal.
Canonical decomposition is the operation behind the Unicode normalization form NFD: a precomposed character is broken down into its base character followed by its combining marks. For example, "\u1E03" (b with dot above) decomposes to "b\u0307".
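To make canonical decomposition concrete, the following sketch uses java.text.Normalizer, which is not part of the original examples, to decompose two strings to NFD and compare the results. The class and method names are illustrative assumptions, not part of any library.

```java
import java.text.Normalizer;

public class CanonicalDecompositionSketch {
    // Returns the full canonical decomposition (NFD) of the given string.
    public static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Treats two strings as canonically equal when their NFD forms are identical.
    public static boolean canonicallyEqual(String a, String b) {
        return decompose(a).equals(decompose(b));
    }

    public static void main(String[] args) {
        String precomposed = "\u1E03"; // single code point: b with dot above
        String decomposed = "b\u0307"; // letter b followed by a combining dot above

        // Plain string equality compares code units, so the two forms differ.
        System.out.println(precomposed.equals(decomposed));            // false
        // After decomposition both strings consist of the same two code points.
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
        System.out.println(decompose(precomposed).length());           // 2
    }
}
```

This is the comparison that CANON_EQ asks the regex engine to honor, so a pattern written in one form can match input written in the other.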
Example 1
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CANON_EQ_Example {
   public static void main(String[] args) {
      String regex = "b\u0307";
      //Compiling the regular expression
      Pattern pattern = Pattern.compile(regex, Pattern.CANON_EQ);
      //Retrieving the matcher object
      Matcher matcher = pattern.matcher("\u1E03");
      if (matcher.matches()) {
         System.out.println("Match found");
      } else {
         System.out.println("Match not found");
      }
   }
}
Output
Match found
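To see what the flag contributes, the sketch below runs the same regular expression and input as Example 1 twice, once with no flags and once with Pattern.CANON_EQ. The class name and the matches() helper are hypothetical and introduced only for illustration.

```java
import java.util.regex.Pattern;

public class CanonEqFlagComparison {
    // Returns true when the whole input matches the regex under the given flags.
    public static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "b\u0307"; // decomposed form used as the pattern
        String input = "\u1E03";  // precomposed form used as the input

        // With no flags the engine compares code points literally.
        System.out.println("Without CANON_EQ: " + matches(regex, input, 0));
        // With CANON_EQ the engine takes canonical equivalence into account.
        System.out.println("With CANON_EQ: " + matches(regex, input, Pattern.CANON_EQ));
    }
}
```

Without the flag the match fails, because the pattern expects two characters and the input contains one; with the flag it succeeds, as in Example 1.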
Example 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CANON_EQ_Example {
   public static void main(String[] args) {
      String regex = "a\u030A";
      //Compiling the regular expression
      Pattern pattern = Pattern.compile(regex, Pattern.CANON_EQ);
      //Input strings to test against the pattern
      String[] input = {"\u00E5", "a\u0311", "a\u0325", "a\u030A", "a\u1E03", "a\uFB03"};
      for (String ele : input) {
         //Retrieving the matcher object for the current input
         Matcher matcher = pattern.matcher(ele);
         if (matcher.matches()) {
            System.out.println(ele + " is a match for " + regex);
         } else {
            System.out.println(ele + " is not a match for " + regex);
         }
      }
   }
}
Output
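å is a match for a?
a? is not a match for a?
a? is not a match for a?
a? is a match for a?
a? is not a match for a?
a? is not a match for a?
Note: the ? characters stand in for combining marks and other characters that the console used here could not display.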