
Chapter One - Number System

We enter data into a computer, and review (see) output from it, using the letters of the
alphabet, various special symbols, and the numerals of the decimal number system. But since a
computer is an electronic device that understands only electrical flow (signals), there are no
letters, symbols, or numbers inside the computer.

A computer works with binary numbers: a semiconductor is either conducting or not conducting; a
switch is either closed or open. Data are therefore represented in the form of a code that has a
corresponding electrical signal.

A number system is a set of symbols used for counting. There are various number systems; some
of them are discussed below.

THE DECIMAL SYSTEM

In everyday life we use a system based on the decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to
represent numbers, and refer to the system as the decimal system. Consider what the number
83 means. It means eight tens plus three:

83 = (8 * 10) + 3

The number 4728 means four thousands, seven hundreds, two tens, plus eight:
4728 = (4 * 1000) + (7 * 100) + (2 * 10) + 8
The decimal system is said to have a base, or radix, of 10. This means that each digit in the
number is multiplied by 10 raised to a power corresponding to that digit’s position:

83 = (8 * 10^1) + (3 * 10^0)
4728 = (4 * 10^3) + (7 * 10^2) + (2 * 10^1) + (8 * 10^0)
In any number, the leftmost digit is referred to as the most significant digit, because it carries the
highest value. The rightmost digit is called the least significant digit.

THE BINARY SYSTEM
In the decimal system, 10 different digits are used to represent numbers with a base of 10.
In the binary system, we have only two digits, 1 and 0. Thus, numbers in the binary system
are represented to base 2.

To avoid confusion, we will sometimes put a subscript on a number to indicate its base. For
example, 83₁₀ and 4728₁₀ are numbers represented in decimal notation or, more briefly,
decimal numbers. The digits 1 and 0 in binary notation have the same meaning as in
decimal notation:

0₂ = 0₁₀

1₂ = 1₁₀

To represent larger numbers, as with decimal notation, each digit in a binary number has
a value depending on its position:

10₂ = (1 * 2^1) + (0 * 2^0) = 2₁₀

11₂ = (1 * 2^1) + (1 * 2^0) = 3₁₀

100₂ = (1 * 2^2) + (0 * 2^1) + (0 * 2^0) = 4₁₀

Therefore, to convert a number from binary notation to decimal notation, all that is required is to
multiply each binary digit by the appropriate power of 2 and add the results.
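For instance, the rule can be sketched in a few lines of Python (the function name binary_to_decimal is our own, purely for illustration):

# Convert a binary string to decimal by summing digit * 2^position.
def binary_to_decimal(bits: str) -> int:
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * (2 ** position)
    return total

print(binary_to_decimal("100"))  # 4
print(binary_to_decimal("11"))   # 3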

To convert a decimal integer N into binary form, we use repeated division by 2. The remainders,
read in order of increasing significance (last remainder first), give us the binary digits of N.
For example, to convert 11: 11 / 2 = 5 remainder 1; 5 / 2 = 2 remainder 1; 2 / 2 = 1 remainder 0;
1 / 2 = 0 remainder 1. Reading the remainders from last to first gives 1011₂.
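A minimal Python sketch of the repeated-division method (again, the function name is our own):

# Convert a decimal integer to binary by repeated division by 2.
# Remainders arrive least significant digit first, so they are reversed at the end.
def decimal_to_binary(n: int) -> str:
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(digits))

print(decimal_to_binary(11))  # 1011
print(decimal_to_binary(21))  # 10101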

HEXADECIMAL NOTATION
Because of the inherent binary nature of digital computer components, all forms of data
within computers are represented by various binary codes. However, no matter how
convenient the binary system is for computers, it is exceedingly cumbersome for human
beings. Consequently, most computer professionals who must spend time working with the
actual raw data in the computer prefer a more compact notation.

What notation to use? One possibility is the decimal notation. This is certainly more compact
than binary notation, but it is awkward because of the tediousness of converting between
base 2 and base 10.

Instead, a notation known as hexadecimal has been adopted. Binary digits are grouped
into sets of four bits, called a nibble. Each possible combination of four binary digits is
given a symbol, as follows:

0000 = 0    0100 = 4    1000 = 8    1100 = C
0001 = 1    0101 = 5    1001 = 9    1101 = D
0010 = 2    0110 = 6    1010 = A    1110 = E
0011 = 3    0111 = 7    1011 = B    1111 = F

Because 16 symbols are used, the notation is called hexadecimal, and the 16 symbols are the
hexadecimal digits.

A sequence of hexadecimal digits can be thought of as representing an integer in base 16. Thus,

2C₁₆ = (2₁₆ * 16^1) + (C₁₆ * 16^0)
     = (2₁₀ * 16^1) + (12₁₀ * 16^0) = 44
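The same positional rule works in code. A hedged Python sketch (the lookup string HEX_DIGITS is our own illustrative name):

# Evaluate a hexadecimal string positionally: each digit times 16^position.
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(hex_str: str) -> int:
    total = 0
    for position, digit in enumerate(reversed(hex_str.upper())):
        total += HEX_DIGITS.index(digit) * (16 ** position)
    return total

print(hex_to_decimal("2C"))  # 44
print(hex_to_decimal("FF"))  # 255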

Decimal (base 10) Binary (base 2) Hexadecimal (base 16)
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
10 1010 A
11 1011 B
12 1100 C
13 1101 D
14 1110 E
15 1111 F
16 0001 0000 10
17 0001 0001 11
18 0001 0010 12
31 0001 1111 1F
100 0110 0100 64
255 1111 1111 FF
256 0001 0000 0000 100

Decimal, Binary, and Hexadecimal

Hexadecimal notation is used not only for representing integers but also as a concise
notation for representing any sequence of binary digits, whether they represent text,
numbers, or some other type of data. The reasons for using hexadecimal notation are as
follows:

1. It is more compact than binary notation.

2. In most computers, binary data occupy some multiple of 4 bits, and hence some
multiple of a single hexadecimal digit.
3. It is extremely easy to convert between binary and hexadecimal notation.

As an example of the last point, consider the binary string 110111100001. Grouping it into
nibbles gives:

1101 1110 0001 = DE1₁₆
  D    E    1

This process is performed so naturally that an experienced programmer can mentally convert
visual representations of binary data to their hexadecimal equivalent without written effort.
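The grouping itself is mechanical, as this short Python sketch shows (the table name NIBBLE_TO_HEX is our own):

# Convert a binary string to hexadecimal by grouping bits into nibbles.
NIBBLE_TO_HEX = {format(i, "04b"): "0123456789ABCDEF"[i] for i in range(16)}

def binary_to_hex(bits: str) -> str:
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # left-pad to a multiple of 4 bits
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(NIBBLE_TO_HEX[nibble] for nibble in nibbles)

print(binary_to_hex("110111100001"))  # DE1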

CHARACTER CODES
Each computer has a set of characters that it uses. As a bare minimum, this set includes the 26
uppercase letters, the 26 lowercase letters, the digits 0 through 9, and a set of special symbols,
such as space, period, minus sign, comma, and carriage return.

In order to transfer these characters into the computer, each one is assigned a number: for
example, a = 1, b = 2, ..., z = 26, + = 27, - = 28. The mapping of characters onto integers is called
a character code. It is essential that communicating computers use the same code or they will
not be able to understand one another. For this reason, standards have been developed. Below
we will examine three of the most important ones.

ASCII
One widely used code is called ASCII (American Standard Code for Information
Interchange). ASCII is a 7-bit code with 128 values. Codes 0 through 31 are control characters
(such as carriage return); the printing characters are straightforward and include the uppercase
and lowercase letters, digits, punctuation marks, and a few math symbols. For example, 'A'
through 'Z' occupy codes 65 through 90, 'a' through 'z' codes 97 through 122, and the digits
'0' through '9' codes 48 through 57.
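Python's built-in ord() and chr() functions expose these code values directly; a quick illustrative check:

# ord() maps a character to its code; chr() maps a code back to its character.
for ch in ["A", "a", "0", " "]:
    print(repr(ch), "->", ord(ch))  # 'A' -> 65, 'a' -> 97, '0' -> 48, ' ' -> 32

print(chr(66))  # B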

Unicode
The computer industry grew up mostly in the U.S., which led to the ASCII character set. ASCII
is fine for English but less fine for other languages. French needs accents (e.g., système);
German needs diacritical marks (e.g., für), and so on. Some European languages have a few
letters not found in ASCII, such as the German ß and the Danish ø. Some languages have entirely
different alphabets (e.g., Russian and Arabic), and a few languages have no alphabet at all (e.g.,
Chinese). As computers spread to the four corners of the globe and software vendors wanted to
sell products in countries where most users do not speak English, a different character set
was needed.

The first attempt at extending ASCII was IS 646, which added another 128 characters to ASCII,
making it an 8-bit code called Latin-1. The additional characters were mostly Latin letters with
accents and diacritical marks. The next attempt was IS 8859, which introduced the concept of
a code page, a set of 256 characters for a particular language or group of languages. IS 8859-1
is Latin-1. IS 8859-2 handles the Latin-based Slavic languages (e.g., Czech, Polish, and
Hungarian). IS 8859-3 contains the characters needed for Turkish, Maltese, Esperanto, and
Galician, and so on. The trouble with the code-page approach is that the software has to keep
track of which page it is currently on, it is impossible to mix languages across pages, and the
scheme does not cover Japanese and Chinese at all.

A group of computer companies decided to solve this problem by forming a consortium to create
a new system, called Unicode, and getting it proclaimed an International Standard (IS 10646).
Unicode is now supported by programming languages (e.g., Java), operating systems (e.g.,
Windows), and many applications.

The idea behind Unicode is to assign every character and symbol a unique 16-bit value, called
a code point. No multibyte characters or escape sequences are used. Having every symbol be
16 bits makes writing software simpler.

With 16-bit symbols, Unicode has 65,536 code points. Since the world's languages collectively
use about 200,000 symbols, code points are a scarce resource that must be allocated with great
care. To speed the acceptance of Unicode, the consortium cleverly used Latin-1 as code points
0 to 255, making conversion between ASCII and Unicode easy. To avoid wasting code points,
each diacritical mark has its own code point. It is up to software to combine diacritical marks
with their neighbors to form new characters. While this puts more work on the software, it
saves precious code points.
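Python's standard unicodedata module makes this visible: a base letter followed by a separate combining-mark code point can be composed into a single precomposed character (a small sketch, not tied to any particular application):

import unicodedata

# U+0301 is COMBINING ACUTE ACCENT; NFC normalization composes the pair
# "e" + accent into the single precomposed code point U+00E9.
decomposed = "e" + "\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed, len(decomposed))  # é 2  (two code points)
print(composed, len(composed))      # é 1  (one code point, U+00E9)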

To allow users to invent special characters for special purposes, 6400 code points have been
allocated for local use.

While Unicode solves many problems associated with internationalization, it does not (attempt
to) solve all the world's problems. For example, while the Latin alphabet is in order, the Han
ideographs are not in dictionary order. As a consequence, an English program can examine
"cat" and "dog" and sort them alphabetically by simply comparing the Unicode value of their
first character. A Japanese program needs external tables to figure out which of two symbols
comes before the other in the dictionary.

Another issue is that new words are popping up all the time. Fifty years ago nobody talked about
apps, chatrooms, cyberspace, emoticons, gigabytes, lasers, modems, smileys, or videotapes.
Adding new words in English does not require new code points. Adding them in Japanese does.

In addition to new technical words, there is a demand for adding at least 20,000 new (mostly
Chinese) personal and place names. Blind people think Braille should be in there, and special
interest groups of all kinds want what they perceive as their rightful code points. The Unicode
consortium reviews and decides on all new proposals.

The 65,536 code points turned out not to be enough to satisfy everyone, so in 1996 an additional
16 planes of 16 bits each were added, expanding the total number of characters to 1,114,112.

UTF-8
Although better than ASCII, Unicode eventually ran out of code points, and it also requires 16
bits per character to represent pure ASCII text, which is wasteful. Consequently, another coding
scheme was developed to address these concerns. It is called UTF-8 (UCS Transformation
Format), where UCS stands for Universal Character Set, which is essentially Unicode. UTF-8
codes are variable length, from 1 to 4 bytes, and can code about two billion characters. It is the
dominant character set used on the World Wide Web.

One of the nice properties of UTF-8 is that codes 0 to 127 are the ASCII characters, allowing
them to be expressed in 1 byte (versus 2 bytes in Unicode). For characters not in ASCII, the
high-order bit of the first byte is set to 1, indicating that 1 or more additional bytes follow. In
all, six different formats are used, as illustrated in the figure below. The bits marked "d" are
data bits.

Bits   Byte 1     Byte 2     Byte 3     Byte 4     Byte 5     Byte 6

7      0ddddddd
11     110ddddd   10dddddd
16     1110dddd   10dddddd   10dddddd
21     11110ddd   10dddddd   10dddddd   10dddddd
26     111110dd   10dddddd   10dddddd   10dddddd   10dddddd
31     1111110x   10dddddd   10dddddd   10dddddd   10dddddd   10dddddd

The UTF-8 encoding scheme.

UTF-8 has a number of advantages over Unicode and other schemes. First, if a program or
document uses only characters that are in the ASCII character set, each can be represented in 8
bits. Second, the first byte of every UTF-8 character uniquely determines the number of bytes
in the character. Third, the continuation bytes in a UTF-8 character always start with 10,
whereas the initial byte never does, making the code self-synchronizing. In particular, in the
event of a communication or memory error, it is always possible to go forward and find the start
of the next character (assuming it has not been damaged).
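These properties are easy to observe with Python's built-in str.encode; the sample characters below are our own choice:

# The first byte of each UTF-8 sequence reveals its length, and every
# continuation byte starts with the bits 10.
for ch in ["A", "é", "€"]:
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), "byte(s):",
          " ".join(format(b, "08b") for b in encoded))
# 'A' 1 byte(s): 01000001
# 'é' 2 byte(s): 11000011 10101001
# '€' 3 byte(s): 11100010 10000010 10101100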
