Text Encoding

ASCII is a 7-bit character encoding standard that can represent 128 characters, including letters, numbers, and symbols. This limited character set is insufficient to represent all languages and scripts. Extended ASCII schemes were developed but varied between countries. Unicode was created as a universal international encoding standard, using more bits per character (8, 16, or 32) to support many scripts and over 100,000 characters, including emoji. Numeric digits can be represented as characters with distinct binary codes rather than their numeric binary representation.

Uploaded by hujass99n

Text Encoding
ASCII
• ASCII stands for American Standard Code for Information Interchange. It is a character encoding that lets
computers and other devices process letters, numbers and symbols.
• Each character is represented by a 7-bit binary number, that is, a pattern of seven 0s and 1s.
• Seven bits give 2^7 = 128 possible codes, so ASCII can represent 128 characters. The markup characters of
HTML (HyperText Markup Language), for example, are all ASCII characters.
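As a rough illustration of the 7-bit codes described above, the following Python sketch prints the code and bit pattern of a few characters. The helper name `ascii_bits` is made up for this example and is not part of any standard library.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary pattern of a single ASCII character.

    ascii_bits is a hypothetical helper written for this example.
    """
    code = ord(ch)  # numeric character code
    if code > 127:
        # Codes above 127 do not fit in 7 bits, so they are not ASCII.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # zero-padded to 7 binary digits


for ch in "Aa0":
    print(ch, ord(ch), ascii_bits(ch))
```

The letter 'A' has code 65, which the helper renders as 1000001.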
E-ASCII

• There are also non-standard extensions to ASCII, sometimes referred to as extended ASCII.
• These are schemes in which the 128 additional codes made available by an 8-bit system were
allocated to represent additional characters.
• However, such schemes varied from country to country, so they were not very useful for global
communications. Modern coding schemes retain only the first 128 codes, which keeps them
compatible with the original ASCII coding scheme.
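To see why country-specific extensions caused trouble, the sketch below decodes the same byte under three 8-bit code pages that ship with Python. The choice of code pages and the helper name `decode_everywhere` are assumptions made for illustration only.

```python
# Three 8-bit code pages chosen as examples: Western European,
# Cyrillic (Windows) and Greek.
CODE_PAGES = ["latin-1", "cp1251", "iso8859_7"]


def decode_everywhere(data: bytes) -> dict:
    """Decode the same bytes under each example code page."""
    return {name: data.decode(name) for name in CODE_PAGES}


# A code above 127 means something different in every scheme...
print(decode_everywhere(bytes([0xE9])))
# ...while a code below 128 is plain ASCII in all of them.
print(decode_everywhere(b"A"))
```

The byte 0xE9 comes out as a different letter in each scheme, whereas the byte for 'A' decodes identically everywhere.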
Problem with ASCII
• The problem with ASCII is that it only allows you to represent a small number of characters (128 for standard 7-bit
ASCII).
• This might be enough to represent the characters in the English alphabet, but it is not sufficient to represent all of the
languages and scripts in the world, and all of the possible numbers and symbols.
• For example, neither 7-bit ASCII nor an 8-bit extension can possibly store the many thousands of characters used across the scripts below.
• Chinese characters 汉字
• Japanese characters 漢字
• Cyrillic Кириллица
• Gujarati ગુજરાતી
• Urdu اردو
• Greek ελληνικά
• Nepali
• Moreover, the widespread use of the World Wide Web made it more important to have a universal international
coding system, as the range of platforms and programs has increased dramatically, with more developers from
around the world using a much wider range of characters.
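The limitation can be checked directly: in the Python sketch below, text in non-English scripts cannot be encoded as ASCII. The helper name `fits_in_ascii` is hypothetical, and the sample strings are taken from the list above.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character has a code above 127.
        return False


for sample in ["English", "汉字", "ελληνικά", "اردو"]:
    print(sample, fits_in_ascii(sample))
```

Only the English sample passes; the others need a larger character set.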
Unicode
• The character set that is most commonly used instead is Unicode.
• Unicode also includes emoji, so every emoji can be used like any other Unicode character. There were 2623 emojis
in Unicode when this was written.
• Each Unicode character can be encoded on a computer with one of three encoding standards, which differ in
the minimum number of bits used per character:

Name      Description

UTF-8     The most common Unicode encoding. Characters can use as few as 8 bits, maximizing compatibility with ASCII. UTF-8 is a variable-width encoding, expanding to 16, 24, or 32 bits for characters outside the ASCII range.

UTF-16    Like UTF-8, UTF-16 is a variable-width encoding: characters use 16 bits, expanding to 32 bits where needed.

UTF-32    Each character uses exactly 32 bits; this is an example of fixed-width encoding.
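The difference between variable-width and fixed-width encoding can be measured with Python's built-in codecs. This is only a sketch: the helper name `encoded_bits` is made up for the example, and the little-endian codec variants are used so that no byte-order mark is counted.

```python
# "-le" variants are used so that no byte-order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_bits(ch: str) -> dict:
    """Return how many bits one character occupies in each encoding."""
    return {enc: len(ch.encode(enc)) * 8 for enc in ENCODINGS}


# An ASCII letter, an accented letter, a Chinese character, an emoji.
for ch in ["A", "é", "汉", "😀"]:
    print(ch, encoded_bits(ch))
```

UTF-8 and UTF-16 grow with the character, while UTF-32 always reports 32 bits.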
Character code for numeric digits
• A number can be represented as a set of characters.
• For example, the number 35 can be represented as the characters '3' and '5'. When a denary digit
(from 0 to 9) is processed as a character, the computer uses the binary pattern of its character code,
instead of the binary representation of that digit.
• For example, the binary representation of the number 35 using 8 bits is 00100011₂, but the binary
pattern for '35' is 0011001100110101₂. This is because the character code for '3' using 8-bit ASCII is
51₁₀ = 00110011₂ and the character code for '5' is 53₁₀ = 00110101₂.
• Therefore, it is important that we can tell the difference between the binary representation of a denary
number, and the (different) binary pattern for that number when it is stored as a set of characters.
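The worked example above can be reproduced in a few lines of Python. The helper names `number_bits` and `text_bits` are hypothetical and exist only for this sketch.

```python
def number_bits(n: int, width: int = 8) -> str:
    """Binary representation of the number itself, zero-padded."""
    return format(n, f"0{width}b")


def text_bits(s: str) -> str:
    """Binary pattern of a string: one 8-bit character code per character."""
    return "".join(format(ord(c), "08b") for c in s)


print(number_bits(35))   # the number 35
print(text_bits("35"))   # the characters '3' and '5'
print(ord("3"), ord("5"))  # character codes of the two digits
```

The first line prints 00100011 and the second prints 0011001100110101, matching the two bit patterns in the example.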
