A Tutorial - Data Representation
Integers, Floating-point Numbers,
and Characters
1. Number Systems
Human beings use decimal (base 10) and duodecimal (base 12) number systems for counting and
measurements (probably because we have 10 fingers and two big toes). Computers use the binary (base
2) number system, as they are made from binary digital components (known as transistors) operating
in two states - on and off. In computing, we also use the hexadecimal (base 16) or octal (base 8) number
systems, as a compact form for representing binary numbers.
1.1 Decimal (Base 10) Number System
The decimal number system has ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, called digits. It uses positional
notation. That is, the least-significant digit (right-most digit) is of the order of 10^0 (units or ones),
the second right-most digit is of the order of 10^1 (tens), the third right-most digit is of the order
of 10^2 (hundreds), and so on. For example, 735 = 7×10^2 + 3×10^1 + 5×10^0.
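1.2 Binary (Base 2) Number System
The binary number system has two symbols: 0 and 1. It is also a positional notation, with each
position weighted by a power of 2.
We shall denote a binary number with a suffix B. Some programming languages denote binary
numbers with prefix 0b (e.g., 0b1001000), or prefix b with the bits quoted (e.g., b'10001111').
A binary digit is called a bit. Eight bits are called a byte (why an 8-bit unit? Probably because 8 = 2^3).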
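As a quick illustration of the 0b prefix, here is a minimal Python sketch. Python is only one of the languages that use this prefix, and the helper name to_bits is hypothetical, introduced just for this example.

```python
# Binary literals use the 0b prefix in Python.
value = 0b1001000            # the same number as decimal 72


def to_bits(n, width=8):
    """Return n as a binary string padded to `width` bits (hypothetical helper)."""
    return format(n, "0{}b".format(width))


print(value)                 # 72
print(to_bits(value))        # 01001000
print(2 ** 8)                # 256 distinct values fit in one 8-bit byte
```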
1.3 Hexadecimal (Base 16) Number System
The hexadecimal number system uses 16 symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F, called hex
digits. It is a positional notation, for example, A3EH = 10×16^2 + 3×16^1 + 14×16^0.
We shall denote a hexadecimal number (in short, hex) with a suffix H. Some programming languages
denote hex numbers with prefix 0x (e.g., 0x1A3C5F), or prefix x with the hex digits quoted
(e.g., x'C3A4D98B').
Most programming languages accept lowercase 'a' to 'f' as well as uppercase 'A' to 'F'.
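The 0x prefix and the case-insensitivity of hex digits can be checked with a short Python sketch. The variable names are arbitrary, and other languages may behave differently.

```python
# Hex literals use the 0x prefix; letter case does not matter.
upper = 0x1A3C5F
lower = 0x1a3c5f
print(upper == lower)          # True

# int() with base 16 parses a string of hex digits (no prefix needed).
parsed = int("1A3C5F", 16)
print(parsed == upper)         # True

# format(..., "X") writes a number back out as uppercase hex digits.
print(format(upper, "X"))      # 1A3C5F
```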
Computers use the binary system in their internal operations, as they are built from binary digital
electronic components. However, writing or reading a long sequence of binary bits is cumbersome
and error-prone. The hexadecimal system is used as a compact form or shorthand for binary bits. Each
hex digit is equivalent to 4 binary bits, as follows:
Hex    Binary    Decimal
0H     0000B     0D
1H     0001B     1D
2H     0010B     2D
3H     0011B     3D
4H     0100B     4D
5H     0101B     5D
6H     0110B     6D
7H     0111B     7D
8H     1000B     8D
9H     1001B     9D
AH     1010B     10D
BH     1011B     11D
CH     1100B     12D
DH     1101B     13D
EH     1110B     14D
FH     1111B     15D
1.4 Conversion from Hexadecimal to Binary
Replace each hex digit by the 4 equivalent bits, for example, A3C5H = 1010 0011 1100 0101B.
It is important to note that a hexadecimal number provides a compact form or shorthand for
representing binary bits.
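The digit-by-digit replacement can be sketched in Python as follows. The names HEX_TO_BITS and hex_to_binary are hypothetical, chosen only for this illustration.

```python
# Lookup table: each hex digit maps to its 4-bit pattern.
HEX_TO_BITS = {format(i, "X"): format(i, "04b") for i in range(16)}


def hex_to_binary(hex_digits):
    """Replace each hex digit by its 4 equivalent bits (hypothetical helper)."""
    return " ".join(HEX_TO_BITS[d] for d in hex_digits.upper())


print(hex_to_binary("A3C5"))   # 1010 0011 1100 0101
```

Because every hex digit always expands to exactly 4 bits, no arithmetic is needed, only a table lookup per digit.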
1.6 Conversion from Base r to Decimal (Base 10)
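Given an n-digit base r number: d_{n-1} d_{n-2} d_{n-3} ... d_3 d_2 d_1 d_0 (base r), the decimal equivalent is
given by:
d_{n-1}×r^{n-1} + d_{n-2}×r^{n-2} + ... + d_1×r^1 + d_0×r^0
For example,
A1C2H = 10×16^3 + 1×16^2 + 12×16^1 + 2×16^0 = 41410 (base 10)
10110B = 1×2^4 + 0×2^3 + 1×2^2 + 1×2^1 + 0×2^0 = 22 (base 10)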
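The positional rule for an n-digit base r number can be sketched in Python as a weighted sum. The names DIGITS and to_decimal are hypothetical, and the sketch assumes digits above 9 are written as the letters A to Z.

```python
# Digit symbols in order of value; index in this string = digit value.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_decimal(digits, r):
    """Sum d_i * r^i over all digits of a base-r string (hypothetical helper)."""
    n = len(digits)
    total = 0
    for i, ch in enumerate(digits.upper()):
        d = DIGITS.index(ch)               # value of this digit
        if d >= r:
            raise ValueError("digit %r is not valid in base %d" % (ch, r))
        total += d * r ** (n - 1 - i)      # left-most digit has weight r^(n-1)
    return total


print(to_decimal("10110", 2))    # 22
print(to_decimal("A1C2", 16))    # 41410
```

Python's built-in int(text, base) performs the same conversion; the explicit loop is shown only to mirror the formula.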
The above procedure is actually applicable to conversion between any two base systems.
The moral of the story is that unless you know the encoding scheme, there is no way you can decode
the data.
3. Integer Representation
Integers are whole numbers or fixed-point numbers with the radix point fixed after the least-significant
bit. They contrast with real numbers or floating-point numbers, where the position of the
radix point varies. It is important to note that integers and floating-point numbers are treated
differently in computers. They have different representations and are processed differently (e.g.,
floating-point numbers are processed in a so-called floating-point processor). Floating-point
numbers will be discussed later.
Computers use a fixed number of bits to represent an integer. The commonly used bit-lengths for
integers are 8-bit, 16-bit, 32-bit and 64-bit. Besides bit-lengths, there are two representation schemes
for integers:
1. Unsigned Integers: can represent zero and positive integers.
2. Signed Integers: can represent zero, positive and negative integers. Three representation
schemes have been proposed for signed integers:
a. Sign-Magnitude representation
b. 1's Complement representation
c. 2's Complement representation
You, as the programmer, need to decide on the bit-length and representation scheme for your
integers, depending on your application's requirements. For example, if you need a counter for
counting a small quantity from 0 up to 200, you might choose the 8-bit unsigned integer scheme, as
there are no negative numbers involved.
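To make the choice of bit-length concrete, the following Python sketch lists the values each scheme can hold, assuming an n-bit unsigned integer covers 0 to 2^n - 1 and an n-bit 2's complement integer covers -2^(n-1) to 2^(n-1) - 1. The function names are hypothetical.

```python
def unsigned_range(n):
    """Smallest and largest value of an n-bit unsigned integer."""
    return 0, 2 ** n - 1


def twos_complement_range(n):
    """Smallest and largest value of an n-bit 2's complement signed integer."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


def fits(value, value_range):
    """True if value lies inside the (low, high) range."""
    low, high = value_range
    return low <= value <= high


for n in (8, 16, 32, 64):
    print(n, unsigned_range(n), twos_complement_range(n))

# A counter from 0 up to 200 fits in 8-bit unsigned, but not 8-bit signed.
print(fits(200, unsigned_range(8)))          # True
print(fits(200, twos_complement_range(8)))   # False
```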
Because of the fixed precision (i.e., fixed number of bits), an n-bit 2's complement signed integer can
represent only the range -2^(n-1) to +2^(n-1)-1. For example, for n=8, the range of 2's complement
signed integers is -128 to +127. During addition (and subtraction), it is important to check whether
the result exceeds this range, in other words, whether overflow or underflow has occurred.
Example 4: Overflow: Suppose that n=8, 127D + 2D = 129D (overflow - beyond the range)
   127D →    0111 1111B
     2D → +) 0000 0010B
             1000 0001B → -127D (wrong)
Example 5: Underflow: Suppose that n=8, -125D - 5D = -130D (underflow - below the
range)
  -125D →    1000 0011B
    -5D → +) 1111 1011B
             0111 1110B → +126D (wrong)
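The overflow and underflow cases in Examples 4 and 5 can be reproduced with a small Python sketch. Python integers are not fixed-width, so the sketch simulates an 8-bit register by masking; the function names wrap_twos_complement and add_checked are hypothetical.

```python
def wrap_twos_complement(value, n=8):
    """Interpret the low n bits of value as a 2's complement signed integer."""
    value &= (1 << n) - 1            # keep only the low n bits
    if value >= 1 << (n - 1):        # sign bit set: the pattern is negative
        value -= 1 << n
    return value


def add_checked(a, b, n=8):
    """Return (stored result, out_of_range flag) for an n-bit signed addition."""
    true_sum = a + b
    stored = wrap_twos_complement(true_sum, n)
    return stored, stored != true_sum


print(add_checked(127, 2))      # (-127, True)   overflow, as in Example 4
print(add_checked(-125, -5))    # (126, True)    underflow, as in Example 5
print(add_checked(100, 27))     # (127, False)   result is within range
```

Comparing the stored value with the true sum is only one way to detect the problem; hardware typically uses carry and sign bits instead.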
https://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html