
Bits Used to Represent Unicode, ASCII, UTF-16, and UTF-8 Characters in Java
In general, data is stored in a computer in the form of bits (1 or 0). Various encoding schemes specify the sequence of bits used to represent each character.
ASCII − Stands for American Standard Code for Information Interchange. It was developed by the American Standards Association and is one of the most widely used encoding schemes. It represents characters using 7 bits and defines 128 characters: the uppercase and lowercase Latin alphabet, the digits 0-9, punctuation, and control characters.
Unicode − A character set developed by The Unicode Consortium that assigns a unique code point to each character across many writing systems. If you want to create documents that use characters from multiple character sets, you can do so with a single Unicode encoding. Its encodings are called Unicode Transformation Formats (UTF), and there are three of them.
- UTF-8 − It comes in 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
- UTF-16 − It comes in 16-bit units. A character can be 1 or 2 units long (2 or 4 bytes), making UTF-16 variable width.
- UTF-32 − It comes in 32-bit units. It is a fixed-width format: every character takes exactly one 32-bit unit (4 bytes).
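The widths listed above can be checked by encoding a few sample characters and counting the resulting bytes. The following is a minimal sketch, not a definitive implementation: the class name `EncodingSizes` and the helper `byteCount` are made up for illustration, and the lookup of `UTF-32BE` by name assumes a JDK that ships that charset (it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // Hypothetical helper: number of bytes the text occupies in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Written as escapes so the source file's own encoding does not matter:
        // U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};

        // Assumption: UTF-32BE is available in this JDK.
        Charset utf32 = Charset.forName("UTF-32BE");

        for (String s : samples) {
            // The BE variants are used so that no byte order mark is added to the count.
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE),
                    byteCount(s, utf32));
        }
    }
}
```

The big-endian variants are chosen deliberately: `StandardCharsets.UTF_16` writes a 2-byte byte order mark when encoding, which would inflate every count.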
Representation in Java
The following table lists the number of bits used in Java to represent various coding standards.
Representation | Bits used |
---|---|
ASCII | 7 bits (stored as 8 bits) |
UTF-8 | 8, 16, 24, or 32 bits |
UTF-16 | 16 or 32 bits |
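Java's own `char` type is a 16-bit UTF-16 code unit, so a `String` may use one or two `char` values per character. The sketch below illustrates this using only standard `String` and `Character` methods; the class name `Utf16Units` and its two helper methods are hypothetical names chosen for this example.

```java
public class Utf16Units {

    // Number of 16-bit UTF-16 code units (Java char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                      // U+0041 fits in one 16-bit unit
        String supplementary = "\uD83D\uDE00"; // U+1F600 needs a surrogate pair

        System.out.println("Bits in a Java char: " + Character.SIZE);
        System.out.println("U+0041:  units=" + codeUnits(bmp)
                + ", code points=" + codePoints(bmp));
        System.out.println("U+1F600: units=" + codeUnits(supplementary)
                + ", code points=" + codePoints(supplementary));
    }
}
```

In practice this means `String.length()` counts code units rather than characters, so code that must handle characters outside the 16-bit range is usually safer iterating by code point.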