The document discusses ASCII and Unicode character encoding systems, detailing how ASCII uses 7 bits to represent 128 characters and how Unicode provides a unique code point for each character across multiple languages. It also explains the concept of parity bits for error detection in data communication and introduces Gray codes to minimize errors in binary counting. The document highlights the importance of these encoding systems in ensuring accurate data representation and communication.
Ch01 - Lecture 5 ASCII and UNI Codes
ASCII and UNICODES
Dr. Khursheed Aurangzeb
ASCII Character Code
The standard binary code for the alphanumeric characters is called ASCII (American Standard Code for Information Interchange).
It uses seven bits to code 128 characters, as shown in Table 1-5. The seven bits of the code are designated by B1 through B7, with B7 being the most significant bit.
Note that the most significant three bits of the code determine the column of the table and the least significant four bits the row of the table.
The letter A, for example, is represented in ASCII as 1000001 (column 100, row 0001).
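The column/row split can be checked with a few lines of Python. This is only a sketch; the helper name `ascii_column_row` is made up for illustration and does not come from the slides.

```python
def ascii_column_row(ch):
    """Return (column bits B7..B5, row bits B4..B1) for a 7-bit ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")  # seven binary digits, B7 first
    return bits[:3], bits[3:]


column, row = ascii_column_row("A")
print(column, row)  # 100 0001
```

The three leading bits select the table column and the four trailing bits select the row, matching the description above.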
The ASCII code contains 94 characters that can be printed and 34 nonprinting characters used for various control functions.
The printing characters consist of the 26 uppercase letters, the 26 lowercase letters, the 10 numerals, and 32 special printable characters such as %, @, and $.
[Table: ASCII Codes]
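The character counts can be tallied with Python's standard `string` module. This is a quick check rather than part of the lecture; note that arriving at 34 nonprinting characters means counting the space and DEL alongside the 32 control codes.

```python
import string

# The four groups of printing characters named above.
groups = {
    "uppercase letters": string.ascii_uppercase,
    "lowercase letters": string.ascii_lowercase,
    "numerals": string.digits,
    "special characters": string.punctuation,
}

for name, chars in groups.items():
    print(f"{name}: {len(chars)}")

printable_count = sum(len(chars) for chars in groups.values())
nonprinting_count = 128 - printable_count  # everything else in the 7-bit code
print(printable_count, nonprinting_count)  # 94 34
```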
ASCII is a 7-bit code, but most computers manipulate an 8-bit quantity as a single unit called a byte.
Therefore, ASCII characters are most often stored one per byte, with the most significant bit set to 0.
The extra bit is sometimes used for specific purposes. For example, some printers recognize an additional 128 8-bit characters with the most significant bit set to 1, which lets the printer produce additional symbols such as those from the Greek alphabet or characters with accent marks used in languages other than English.
Unicode
Unicode was developed as an industry standard for providing a common representation of symbols and ideographs for most of the world’s languages.
By providing a standard representation for different languages, Unicode removes the need to convert between different character sets and eliminates the conflicts that arise from using the same numbers for different character sets.
Unicode provides a unique number called a code point for each character, as well as a unique name.
There are several standard encodings of the code points that range from 8 to 32 bits (1 to 4 bytes).
For example, UTF-8 (UCS Transformation Format, where UCS stands for Universal Character Set) is a variable-length encoding that uses from 1 to 4 bytes for each code point.
UTF-16 is a variable-length encoding that uses either 2 or 4 bytes for each code point, while UTF-32 is a fixed-length encoding that uses 4 bytes for every code point.
[Table: UTF-8 Encoding for Unicode Code Points]
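Since the table itself is not reproduced here, the following Python sketch spells out the standard UTF-8 byte layouts (0xxxxxxx, 110xxxxx 10xxxxxx, and so on) and compares the result with Python's built-in codec. The function names `utf8_encode` and `as_bits` and the sample code points are chosen for illustration only.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 bytes (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:      # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def as_bits(data):
    """Show a byte string as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in data)


for cp in (0x54, 0xB1, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    utf16_len = len(chr(cp).encode("utf-16-be"))
    utf32_len = len(chr(cp).encode("utf-32-be"))
    print(f"U+{cp:04X}: UTF-8 {as_bits(encoded)} | "
          f"UTF-16 {utf16_len} bytes | UTF-32 {utf32_len} bytes")
```

The loop also prints the UTF-16 and UTF-32 sizes, which come out as 2 or 4 bytes and always 4 bytes respectively, in line with the description above.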
A common notation for referring to a code point is the characters “U+” followed by the four to six hexadecimal digits of the code point. For example, U+0030 is the character “0”, named Digit Zero. The first 128 code points of Unicode, from U+0000 to U+007F, correspond to the ASCII characters. The Unicode codespace has room for over a million code points (U+0000 to U+10FFFF), and the standard covers more than a hundred scripts worldwide.
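As a small illustration, Python's standard `unicodedata` module can produce both the “U+” notation and the character name. The helper `describe` is a hypothetical name used only for this sketch.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME' using at least four hex digits."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("0"))  # U+0030 DIGIT ZERO

# The first 128 code points encode to the same single byte values as ASCII.
assert all(chr(cp).encode("ascii") == bytes([cp]) for cp in range(128))
```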
To illustrate the UTF-8 encoding, consider a couple of examples. The code point U+0054, Latin capital letter T, “T”, is in the range of U+0000 0000 to U+0000 007F, so it is encoded with one byte with a value of (01010100)₂. The code point U+00B1, plus-minus sign, “±”, is in the range of U+0000 0080 to U+0000 07FF, so it is encoded with two bytes of the form 110xxxxx 10xxxxxx: its 11-bit value 00010110001 splits into 00010 and 110001, giving (11000010 10110001)₂.
Parity Bit
To detect errors in data communication and processing, an additional bit is sometimes added to a binary code word to define its parity.
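A minimal Python sketch of the idea follows. It assumes the parity bit is placed in the leftmost position, which the text does not specify, and the function names `add_parity` and `check_parity` are invented for this example. Even parity makes the total count of 1s even; odd parity makes it odd.

```python
def add_parity(code_word, odd=False):
    """Prefix a bit string with a parity bit (leftmost position is an assumption)."""
    parity = code_word.count("1") % 2  # 1 if the count of 1s is currently odd
    if odd:
        parity ^= 1                    # flip the bit to make the total odd instead
    return str(parity) + code_word


def check_parity(word, odd=False):
    """Return True if the received word has the expected parity."""
    return word.count("1") % 2 == (1 if odd else 0)


ascii_t = "1010100"                      # 7-bit ASCII code for T (three 1s)
print(add_parity(ascii_t))               # 11010100 (even parity)
print(add_parity(ascii_t, odd=True))     # 01010100 (odd parity)

received = "11010101"                    # last bit flipped in transit
print(check_parity(received))            # False: the error is detected
```

A single parity bit detects any odd number of bit errors but cannot locate them, and an even number of errors goes unnoticed.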
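A parity bit is the extra bit included to make the total number of 1s in the resulting code word either even or odd. For example, the 7-bit ASCII code for A, 1000001, contains two 1s, so if the parity bit is placed in the leftmost position the code word becomes 01000001 under even parity and 11000001 under odd parity.
Binary Codes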
As we count up or down using binary codes, the number of bits that change from one binary value to the next varies.
This is illustrated by the binary code for the octal digits on the left in Table 1-7.
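The variation can be reproduced with a short Python sketch that counts the differing bits between consecutive 3-bit values, including the rollover. The helper `bit_changes` is a name chosen here for illustration.

```python
def bit_changes(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


WIDTH = 3
SIZE = 2 ** WIDTH

changes = []
for value in range(SIZE):
    nxt = (value + 1) % SIZE  # wraps from 111 back to 000
    changes.append(bit_changes(value, nxt))
    print(format(value, "03b"), "->", format(nxt, "03b"), ":", changes[-1])

print(changes)  # [1, 2, 1, 3, 1, 2, 1, 3]
```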
As we count from 000 up to 111 and “roll over” to 000, the number of bits that change between the binary values ranges from 1 to 3.
Problem with Binary Codes
For many applications, multiple bit changes as the circuit counts are not a problem. There are applications, however, in which a change of more than one bit when counting up or down can cause serious problems. This is illustrated by the binary code for the octal digits in the table on the next slide. One such problem is illustrated by the optical shaft-angle encoder shown in the figure on the coming slides.
Gray Codes
[Figure: Optical Shaft-Angle Encoder]
The encoder is a disk attached to a rotating shaft for measurement of the rotational position of the shaft. The disk contains areas that are clear for binary 1 and opaque for binary 0. An illumination source is placed on one side of the disk, and optical sensors, one for each of the bits to be encoded, are placed on the other side of the disk. When a clear region lies between the source and a sensor, the sensor responds to the light with a binary 1 output.
When an opaque region lies between the source and the sensor, the sensor responds to the dark with a binary 0. The rotating shaft, however, can be in any angular position. For example, suppose that the shaft and disk are positioned so that the sensors lie right at the boundary between 011 and 100. In this case, sensors in positions B2, B1, and B0 have the light partially blocked.
In such a situation, it is unclear whether the three sensors will see light or dark. As a consequence, each sensor may produce either a 1 or a 0. Thus, the resulting encoded binary number for a value between 3 and 4 may be 000, 001, 010, 011, 100, 101, 110, or 111. Either 011 or 100 will be satisfactory in this case, but the other six values are clearly erroneous!
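The set of possible readings at a boundary can be enumerated with a small Python sketch. It assumes that every sensor whose bit differs between the two neighboring codes may independently read 0 or 1, while the other sensors read their common value; `possible_readings` is a hypothetical helper name.

```python
from itertools import product


def possible_readings(before, after, width=3):
    """All codes that might be read at the boundary between two binary values."""
    a = format(before, f"0{width}b")
    b = format(after, f"0{width}b")
    # A bit that is the same on both sides is certain; a differing bit may be 0 or 1.
    options = [x if x == y else "01" for x, y in zip(a, b)]
    return sorted("".join(bits) for bits in product(*options))


print(possible_readings(3, 4))  # all eight 3-bit codes
print(possible_readings(2, 3))  # ['010', '011']
```

At the boundary between 3 and 4 all three bits differ, so any of the eight codes may appear; at the boundary between 2 and 3 only one bit differs, so both possible readings are acceptable.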
To see the solution to this problem, notice that in those cases in which only a single bit changes when going from one value to the next or previous value, this problem cannot occur. For example, if the sensors lie on the boundary between 2 and 3, the resulting code is either 010 or 011, either of which is satisfactory. If we change the encoding of the values 0 through 7 such that only one bit value changes as we count up or down (including rollover from 7 to 0), then the encoding will be satisfactory for all positions.
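One encoding with this property is the reflected binary Gray code, which can be computed as `n XOR (n >> 1)`. The sketch below uses that standard construction as an assumption, since the table from the slides is not reproduced in the text, and then verifies that every step, including the rollover from 7 to 0, changes exactly one bit.

```python
def to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


WIDTH = 3
SIZE = 2 ** WIDTH

codes = [format(to_gray(n), "03b") for n in range(SIZE)]
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']

# Count the bits that differ between each code and its successor (with rollover).
changes = [bin(to_gray(n) ^ to_gray((n + 1) % SIZE)).count("1") for n in range(SIZE)]
print(changes)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

With this encoding, a sensor sitting on any boundary can only produce one of the two neighboring codes, so every reading is satisfactory.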