Data Representation
Data Representation
1
UniCode
• Bit means “binary digit” and is the smallest unit of computerized data.
• A bit is a 2-base number, i.e. it has either the value of 0 or 1.
• A byte is an amount of memory, a certain collection of bits, originally variable in size
but now almost always eight bits.
2
UniCode
3
ASCII
4
UniCode
• ASCII was an American-developed standard, so it only defined unaccented characters.
There was an ‘e’, but no ‘é’ or ‘Í’.
• This meant that languages which required accented characters couldn’t be
represented in ASCII.
• Unicode started out using 16-bit characters instead of 8-bit characters. 16 bits means
you have 2^16 = 65,536 distinct values available.
• This made it possible to represent many different characters from many different
alphabets
5
UniCode
• UTF-8 will encode a character with a single byte.
• UTF-16 will encode a character with a two bytes
•
6
UniCode
• A string of ASCII text is also valid UTF-8 text.
• UTF-8 uses the following rules:
• If the code point is < 128, it’s represented by the corresponding byte value.
• If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes,
where each byte of the sequence is between 128 and 255.
• Latin-1, also known as ISO-8859-1, is a similar encoding.
• Unicode code points 0–255 are identical to the Latin-1 values, so converting to this
encoding simply requires converting code points to byte values; if a code point
larger than 255 is encountered, the string can’t be encoded into Latin-1.
7
Detecting Encodings
• Chardet library:
• Character encoding auto-detection in Python.
8
Integer representation
• Unsigned Integers:
• Unsigned integers can represent zero and positive integers, but not negative integers.
9
Integer representation
• Signed Integers:
• Signed integers can represent zero, positive integers, as well as negative integers.
• Three representation types:
Ø Sign-Magnitude representation
Ø 1's Complement representation
Ø 2's Complement representation : Modern method
• the most significant bit (msb) is the sign bit, with value of 0 representing positive
integers and 1 representing negative integers.
The remaining n-1 bits represents the magnitude of the integer, as follows:
• for positive integers, the absolute value of the integer is equal to "the magnitude of
the (n-1)-bit binary pattern".
• for negative integers, the absolute value of the integer is equal to "the magnitude of
the complement of the (n-1)-bit binary pattern plus one" (hence called 2's
complement).
10
Integer representation
• 2's Complement representation
• Example 1: Suppose that n=8 and the binary representation 0 100 0001B
• Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65
• Example 2: Suppose that n=8 and the binary representation 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B plus 1, i.e., 111 1110B + 1B = 127D
Hence, the integer is -127D
• Example 3: Suppose that n=8 and the binary representation 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D
• Example 4: Suppose that n=8 and the binary representation 1 111 1111B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 111 1111B plus 1, i.e., 000 0000B + 1B = 1D
Hence, the integer is -1D
11
Floating Point representation
Normalized form
•S = 1 (negative or positive)
•E = 1000 0001
•F = 011 0000 0000 0000 0000 0000
12
Floating Point representation
De-Normalized form: In normalized form implicit leading 1 for the fraction, it
cannot represent the number zero!
We can also represent very small positive and negative numbers in de-normalized form with E=0
For example, if S=1, E=0, and F=011 0000 0000 0000 0000 0000.
The actual fraction is 0.011=1×2^-2+1×2^-3=0.375D.
Since S=1, it is a negative number.
With E=0, the actual exponent is -126.
Hence the number is -0.375×2^-126 = -4.4×10^-39, which is an extremely small negative
number (close to zero).
13
Floating Point representation
In summary:
For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127).
For E = 255, it represents special values, such as ±INF (positive and negative infinity) and
NaN (not a number).
14
Floating Point representation
Example:
0 10000000 110 0000 0000 0000 0000 0000
15
Floating Point representation
Example:
1 01111110 100 0000 0000 0000 0000 0000
16
Floating Point representation
Example:
1 00000000 000 0000 0000 0000 0000 0001
17
64-bit Double-Precision Floating-Point Numbers
• The most significant bit is the sign bit (S), with 0 for positive numbers and 1
for negative numbers.
• The following 11 bits represent exponent (E).
• The remaining 52 bits represents fraction (F).
18
Floating-Point Numbers Representations
19