Bits Used to Represent Unicode, ASCII, UTF-16, and UTF-8 Characters in Java


In general, data is stored in a computer in the form of bits (1 or 0). Various character encoding schemes specify the bit pattern used to represent each character.

ASCII − Stands for American Standard Code for Information Interchange. It was developed by the American Standards Association and is one of the most widely used coding systems. It represents characters using 7 bits and includes 128 characters: the uppercase and lowercase Latin alphabet, the digits 0-9, punctuation marks, and control characters.
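The 7-bit limit can be checked directly in Java. The following is a minimal sketch, not a definitive implementation; the class name `AsciiCheck` and the method `isAscii` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // True when every character fits in 7 bits (code value 0 to 127).
    static boolean isAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello, 123"));   // prints true
        System.out.println(isAscii("caf\u00E9"));    // prints false (U+00E9 is outside ASCII)

        // Each ASCII character is stored in one 8-bit byte.
        byte[] bytes = "Hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes.length);            // prints 2
    }
}
```

Although ASCII needs only 7 bits per character, the encoded array holds one full byte per character, which is why the extra high bit is always 0.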

Unicode (UTF) − UTF stands for Unicode Transformation Format. Unicode is developed by The Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using the single Unicode character set. It provides 3 types of encodings.

UTF-8 − It comes in 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
UTF-16 − It comes in 16-bit units (shorts). A character can be 1 or 2 units long, making UTF-16 variable width.
UTF-32 − It comes in 32-bit units (the size of a Java int, not a Java long, which is 64 bits). It is a fixed-width format and every character is always exactly 1 unit long.
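The unit counts described above can be observed from Java. The sketch below is illustrative only; the class `EncodingWidths` and its methods are hypothetical names. It assumes that `String.length()` is used as the UTF-16 unit count and that the UTF-32 unit count equals the number of Unicode code points.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Number of 8-bit units (bytes) the text occupies in UTF-8.
    static int utf8Units(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit units in UTF-16; String.length() counts UTF-16 code units.
    static int utf16Units(String text) {
        return text.length();
    }

    // Number of 32-bit units in UTF-32: one per Unicode code point.
    static int utf32Units(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d%n",
                    s.codePointAt(0), utf8Units(s), utf16Units(s), utf32Units(s));
        }
    }
}
```

For these four samples the UTF-8 count grows from 1 to 4 bytes, the UTF-16 count is 1 for the first three and 2 for the emoji, and the UTF-32 count is always 1.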

Representation in Java

The following table lists the number of bits used to represent a single character in each coding standard.

Representation	Bits used
ASCII	7 bits (stored as 8 bits).
UTF-8	8, 16, 24, or 32 bits.
UTF-16	16 or 32 bits.
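Java's `char` type is a single 16-bit UTF-16 unit, so characters that need 32 bits are held as a pair of `char` values (a surrogate pair). The following sketch shows this using standard `Character` methods; the class `CharBits` and the method `utf16Bits` are hypothetical names used only for illustration.

```java
class CharBits {
    // Bits needed to hold one code point in Java's UTF-16 representation.
    static int utf16Bits(int codePoint) {
        return Character.charCount(codePoint) * Character.SIZE;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);       // prints 16, the size of a char in bits
        System.out.println(utf16Bits('A'));       // prints 16
        System.out.println(utf16Bits(0x1F600));   // prints 32

        // A code point above U+FFFF is split into a high and a low surrogate.
        char[] pair = Character.toChars(0x1F600);
        System.out.println(Character.isHighSurrogate(pair[0])
                && Character.isLowSurrogate(pair[1]));   // prints true
    }
}
```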

Venkata Sai

Updated on: 2019-07-30T22:30:26+05:30
