
Determine a Character's Unicode Block in Java
In this article, we will learn how to determine the Unicode block that contains a given character in Java. Unicode provides a standardized way to represent characters from writing systems across the world. Every character belongs to at most one Unicode block, and blocks group characters by script, symbol set, or other special purpose.
Understanding Unicode Blocks
A Unicode Block is a range of Unicode characters grouped together based on similar properties.
For example:
- Basic Latin (U+0000 to U+007F) contains English letters and symbols.
- CJK Unified Ideographs (U+4E00 to U+9FFF) contains Chinese, Japanese, and Korean characters.
- Arabic (U+0600 to U+06FF) contains Arabic script characters.
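Because a block is a contiguous range, two code points fall in the same block only if they lie within the same range. The following is a minimal sketch of that idea; the class name BlockBoundaries and the helper sameBlock are hypothetical names chosen for illustration.

```java
import java.util.Objects;

public class BlockBoundaries {
    // Hypothetical helper: true when both code points map to the same block.
    // Note: two code points that both have no block (null) also compare equal.
    static boolean sameBlock(int first, int second) {
        return Objects.equals(Character.UnicodeBlock.of(first),
                              Character.UnicodeBlock.of(second));
    }

    public static void main(String[] args) {
        // 'A' (U+0041) and U+007F both lie inside Basic Latin.
        System.out.println(sameBlock(0x0041, 0x007F));
        // U+0080 is the first code point of the next block.
        System.out.println(sameBlock(0x007F, 0x0080));
        // First and last code points of CJK Unified Ideographs.
        System.out.println(sameBlock(0x4E00, 0x9FFF));
    }
}
```

The first and third checks compare code points inside one range, while the second crosses the boundary between Basic Latin and the block that follows it.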
Different Approaches
The following are two different approaches to determine the Unicode block containing a given character in Java:
Using Character.UnicodeBlock.of(char ch)
To determine a character's Unicode block, use the Character.UnicodeBlock.of() method. It returns the object representing the Unicode block that contains the given character, or null if the character is not a member of any defined block. The method is overloaded: one version takes a char and the other takes an int code point.
Following are the steps to determine the Unicode block containing the given character using the Character.UnicodeBlock.of() method:
- The program looks up the Unicode block of '\u5639' (a Chinese character) and prints CJK_UNIFIED_IDEOGRAPHS.
- It then prints the Unicode blocks for other characters: a space (BASIC_LATIN), an arrow (ARROWS), and the code point 1565, which is U+061D in the Arabic block (ARABIC).
- Character.UnicodeBlock.of() returns a block object, and printing that object shows the block's name.
Character.UnicodeBlock block = Character.UnicodeBlock.of(ch);
Example
Below is an example that shows how to determine the Unicode block containing the given character using the Character.UnicodeBlock.of() method:
public class Demo {
    public static void main(String[] args) {
        // A CJK ideograph in the Basic Multilingual Plane
        char ch = '\u5639';
        System.out.println(ch);

        // Look up the block that contains the character
        Character.UnicodeBlock block = Character.UnicodeBlock.of(ch);
        System.out.println(block);

        // A space, an arrow, and an int code point (1565 is U+061D)
        System.out.println(Character.UnicodeBlock.of(' '));
        System.out.println(Character.UnicodeBlock.of('\u21ac'));
        System.out.println(Character.UnicodeBlock.of(1565));
    }
}
Time Complexity: O(1); each lookup runs in constant time because the set of blocks is fixed.
Space Complexity: O(1); the program uses only a few variables.
Output
嘹
CJK_UNIFIED_IDEOGRAPHS
BASIC_LATIN
ARROWS
ARABIC
If the console encoding cannot display the character, the first line is printed as ? instead.
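Since Character.UnicodeBlock.of() can return null, and its int overload throws IllegalArgumentException for values that are not valid code points, callers may want to guard the lookup. The following is a minimal sketch under those assumptions; SafeBlockLookup and findBlock are hypothetical names, and U+2FE0 is used as a sample code point that lies in a gap between two defined blocks.

```java
import java.util.Optional;

public class SafeBlockLookup {
    // Hypothetical helper: wraps the lookup so callers never see null
    // and never trigger IllegalArgumentException for invalid input.
    static Optional<Character.UnicodeBlock> findBlock(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return Optional.empty();
        }
        return Optional.ofNullable(Character.UnicodeBlock.of(codePoint));
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0x2FE0, -1};
        for (int cp : samples) {
            String name = findBlock(cp)
                    .map(Character.UnicodeBlock::toString)
                    .orElse("no block");
            System.out.println(cp + " -> " + name);
        }
    }
}
```

Returning an Optional is one design choice among several; a plain null check after the call works equally well when the input is known to be a valid code point.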
Using Unicode Code Points
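Characters outside the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) do not fit in a single 16-bit char. Java stores each of them as a surrogate pair of two char values, so for these supplementary characters we work with int code points instead of char. The String.codePointAt() method (or the static Character.codePointAt() method) combines a surrogate pair into a single code point.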
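To see why the char overload is not enough for supplementary characters, the sketch below compares both lookups on the same string. The class SurrogateDemo and its two helpers are hypothetical names, and the sample character U+1D11E (musical symbol G clef) is chosen here only for illustration.

```java
public class SurrogateDemo {
    // Hypothetical helper: block of the first 16-bit char only.
    static Character.UnicodeBlock blockOfFirstChar(String text) {
        return Character.UnicodeBlock.of(text.charAt(0));
    }

    // Hypothetical helper: block of the first full code point.
    static Character.UnicodeBlock blockOfFirstCodePoint(String text) {
        return Character.UnicodeBlock.of(text.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+1D11E is stored as the surrogate pair D834 DD1E.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                // 2 chars, 1 code point
        System.out.println(blockOfFirstChar(clef));       // HIGH_SURROGATES
        System.out.println(blockOfFirstCodePoint(clef));  // MUSICAL_SYMBOLS
    }
}
```

Looking at a single char reports the surrogate's own block, which says nothing about the character the pair encodes.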
Following are the steps to determine the Unicode block containing the given character using Unicode code points:
- The program determines the Unicode block of a musical symbol, such as 𝄞 (U+1D11E), which lies outside the BMP.
- text.codePointAt(0) retrieves the full Unicode code point from the surrogate pair.
- Character.UnicodeBlock.of(codePoint) then identifies the correct block.
int codePoint = text.codePointAt(0); // Get Unicode code point
Example
Below is an example that determines the Unicode block containing the given character using Unicode code points:
public class UnicodeBlockFinder {
    public static void main(String[] args) {
        // A musical symbol stored as a surrogate pair (U+1D11E, G clef)
        String text = "\uD834\uDD1E";

        // Combine the surrogate pair into one code point
        int codePoint = text.codePointAt(0);

        // Look up the block with the int overload
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);

        System.out.println("Character: " + text);
        System.out.println("Unicode Block: " + block);
    }
}
Output
Character: 𝄞
Unicode Block: MUSICAL_SYMBOLS
Time Complexity: O(1); retrieving the code point and looking up its Unicode block are both constant-time operations.
Space Complexity: O(1); the program stores a single string and an integer.
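The same code point approach extends to whole strings. The following is a minimal sketch that walks a string with String.codePoints() and counts how many characters fall in each block; the class BlocksInString, the helper countBlocks, and the "NO_BLOCK" label are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlocksInString {
    // Hypothetical helper: counts code points per block, in first-seen order.
    static Map<String, Integer> countBlocks(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // codePoints() yields one int per character, joining surrogate pairs.
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            String name = (block == null) ? "NO_BLOCK" : block.toString();
            counts.merge(name, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        // Latin letters, an arrow (U+21AC), and a G clef (U+1D11E)
        String text = "Hi \u21AC \uD834\uDD1E";
        System.out.println(countBlocks(text));
        // prints {BASIC_LATIN=4, ARROWS=1, MUSICAL_SYMBOLS=1}
    }
}
```

Iterating with charAt() instead would count the two surrogate halves separately, so codePoints() is the safer choice for text that may contain supplementary characters.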
Conclusion
Unicode blocks categorize characters by script, symbol set, or other shared properties. We covered two approaches:
- Using Character.UnicodeBlock.of(char ch): this method retrieves the Unicode block of a given character and works well for characters within the Basic Multilingual Plane (BMP).
- Using Unicode code points: for characters outside the BMP, we use codePointAt() to combine the surrogate pair into a single code point and then pass that code point to Character.UnicodeBlock.of().