
Data Representation


Data representations in computer

UniCode
• Bit is short for “binary digit”; it is the smallest unit of computerized data.
• A bit is a base-2 digit, i.e. its value is either 0 or 1.
• A byte is a unit of memory made up of a group of bits; its size originally varied
between machines, but it is now almost always eight bits.

Some example bytes could be 00000001 or 11111111 or 01010011.

The decimal equivalent of a binary value is the sum of each bit multiplied by its power of
two (the rightmost bit has weight 2^0, the leftmost 2^7):

• 00000001B = 1×2^0 = 1D
• 11111111B = 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255D
• 01010011B = 2^6 + 2^4 + 2^1 + 2^0 = 64 + 16 + 2 + 1 = 83D
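The positional calculation for the three example bytes can be sketched in Python as follows. The helper name `binary_to_decimal` is made up for this illustration; Python's built-in `int(text, 2)` does the same job and is used here only as a cross-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power of two; the rightmost bit has weight 2**0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


for byte in ("00000001", "11111111", "01010011"):
    # int(byte, 2) is the built-in equivalent of the manual sum above.
    assert binary_to_decimal(byte) == int(byte, 2)
    print(byte, "=", binary_to_decimal(byte))
```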

ASCII

• ASCII stands for American Standard Code for Information Interchange.


• It is a standard that assigns numerical values to the letters of the Roman alphabet,
digits, and typographic characters.
• The ASCII character set can be represented by 7 bits. This gives 2^7 = 128 different
values, i.e. characters.
• As ASCII uses only 7 of the 8 bits of a byte, the most significant (first) bit is always 0:
0xxxxxxx.
• The first 32 characters (codes 0–31) are control characters.
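As a small illustration of these points, the sketch below prints the ASCII code and 8-bit pattern of a few characters and checks that the leading bit is always 0. The helper `is_control` is a hypothetical name introduced only for this example.

```python
def is_control(code: int) -> bool:
    """True for the first 32 ASCII codes (0-31), the control characters."""
    return 0 <= code < 32


for ch in ("A", "a", "0", "\n"):
    code = ord(ch)
    kind = "control" if is_control(code) else "printable"
    print(repr(ch), code, format(code, "08b"), kind)

# Every ASCII byte is below 2**7 = 128, so its 8-bit pattern starts with 0.
ascii_bytes = "Hello, world!".encode("ascii")
assert all(b < 128 for b in ascii_bytes)
assert all(format(b, "08b").startswith("0") for b in ascii_bytes)
```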

UniCode
• ASCII was an American-developed standard, so it only defined unaccented characters.
There was an ‘e’, but no ‘é’ or ‘Í’.
• This meant that languages which required accented characters couldn’t be
represented in ASCII.
• Unicode started out using 16-bit characters instead of 8-bit characters. 16 bits gives
you 2^16 = 65,536 distinct values.
• This made it possible to represent many different characters from many different
alphabets.

UniCode
• UTF-8 encodes an ASCII character with a single byte; other characters take two to four bytes.
• UTF-16 encodes a character with two bytes (four bytes for code points above U+FFFF).

• The same binary data is interpreted differently by different encodings.
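A short Python sketch of both points: the same character takes a different number of bytes in UTF-8 and UTF-16, and the same two bytes decode to different text depending on the encoding assumed. The sample character and byte values are chosen only for illustration, and the big-endian `utf-16-be` codec is used so that no byte-order mark is added.

```python
# One character, two encodings.
print("A".encode("utf-8"), "A".encode("utf-16-be"))   # 1 byte vs 2 bytes

text = "é"                          # code point U+00E9
utf8 = text.encode("utf-8")         # two bytes in UTF-8
utf16 = text.encode("utf-16-be")    # two bytes in UTF-16

# The same two bytes, three interpretations.
data = b"\xc3\xa9"
as_utf8 = data.decode("utf-8")        # one character: é
as_latin1 = data.decode("latin-1")    # two characters: Ã©
as_utf16 = data.decode("utf-16-be")   # one character: U+C3A9
print(repr(as_utf8), repr(as_latin1), repr(as_utf16))
```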

UniCode
• A string of ASCII text is also valid UTF-8 text.
• UTF-8 uses the following rules:
• If the code point is < 128, it’s represented by the corresponding byte value.
• If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes,
where each byte of the sequence is between 128 and 255.
• Latin-1, also known as ISO-8859-1, is a similar encoding.
• Unicode code points 0–255 are identical to the Latin-1 values, so converting to this
encoding simply requires converting code points to byte values; if a code point
larger than 255 is encountered, the string can’t be encoded into Latin-1.

• One-character Unicode strings can also be created with the chr() built-in function,
which takes an integer and returns a Unicode string of length 1 that contains
the corresponding code point.
• The reverse operation is the built-in ord() function, which takes a one-character
Unicode string and returns the code point value.
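The UTF-8 rules, the Latin-1 conversion, and `chr()`/`ord()` can be tried out together. In this sketch `utf8_bytes` and `to_latin1` are hypothetical helper names; `to_latin1` spells out the "code point becomes byte value" rule by hand and is compared against Python's own `latin-1` codec.

```python
assert chr(233) == "é"      # integer -> one-character string
assert ord("é") == 233      # one-character string -> code point


def utf8_bytes(code_point: int) -> bytes:
    return chr(code_point).encode("utf-8")


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_bytes(cp)
    if cp < 128:
        # Code points below 128 are stored as the byte of the same value.
        assert encoded == bytes([cp])
    else:
        # Otherwise: two to four bytes, each between 128 and 255.
        assert 2 <= len(encoded) <= 4
        assert all(128 <= b <= 255 for b in encoded)
    print(f"U+{cp:04X} -> {encoded.hex()}")


def to_latin1(text: str) -> bytes:
    # bytes() raises ValueError if any code point is larger than 255.
    return bytes(ord(ch) for ch in text)


assert to_latin1("café") == "café".encode("latin-1")
```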

Detecting Encodings
• Chardet library:
• Character encoding auto-detection in Python.

• Latin-1 is also known as ISO-8859-1
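chardet is a third-party package and is not used in the sketch below. As a much cruder stand-in that needs only the standard library, one can try a list of candidate encodings in order and keep the first one that decodes without error. The function `guess_encoding` and its candidate list are assumptions made for this illustration; unlike chardet it does no statistical analysis. Latin-1 must come last because every byte sequence decodes under it.

```python
from typing import Optional


def guess_encoding(data: bytes,
                   candidates=("ascii", "utf-8", "latin-1")) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding(b"plain text"))
print(guess_encoding("é".encode("utf-8")))
print(guess_encoding("é".encode("latin-1")))
```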

Integer representation
• Unsigned Integers:
• Unsigned integers can represent zero and positive integers, but not negative integers.
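For illustration, an n-bit unsigned integer covers the values 0 to 2^n - 1. The sketch below shows this for one byte using Python's `int.to_bytes` and `int.from_bytes`; the helper `unsigned_range` is a made-up name.

```python
def unsigned_range(n_bits: int) -> tuple:
    """Smallest and largest value of an n-bit unsigned integer."""
    return 0, 2 ** n_bits - 1


print(unsigned_range(8))     # one byte
print(unsigned_range(16))    # two bytes

# Round trip through a single unsigned byte.
raw = (200).to_bytes(1, "big")
assert int.from_bytes(raw, "big") == 200

# Values outside the range cannot be stored as an unsigned byte.
for bad in (-1, 256):
    try:
        bad.to_bytes(1, "big")
    except OverflowError as exc:
        print(bad, "rejected:", exc)
```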

Integer representation
• Signed Integers:
• Signed integers can represent zero, positive integers, as well as negative integers.
• Three representation schemes:
Ø Sign-Magnitude representation
Ø 1's Complement representation
Ø 2's Complement representation: the method used by modern computers
• In 2's complement representation, the most significant bit (msb) is the sign bit, with
a value of 0 representing positive integers and 1 representing negative integers.
The remaining n-1 bits represent the magnitude of the integer, as follows:
• for positive integers, the absolute value of the integer is equal to "the magnitude of
the (n-1)-bit binary pattern".
• for negative integers, the absolute value of the integer is equal to "the magnitude of
the complement of the (n-1)-bit binary pattern plus one" (hence called 2's
complement).

Integer representation
• 2's Complement representation
• Example 1: Suppose that n=8 and the binary representation 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D

• Example 2: Suppose that n=8 and the binary representation 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B plus 1, i.e., 111 1110B + 1B = 127D
Hence, the integer is -127D

• Example 3: Suppose that n=8 and the binary representation 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D

• Example 4: Suppose that n=8 and the binary representation 1 111 1111B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 111 1111B plus 1, i.e., 000 0000B + 1B = 1D
Hence, the integer is -1D
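The four examples can be reproduced with a small decoder that follows the rule literally: read the sign bit, then either take the remaining bits as they are or complement them and add one. The function name `decode_twos_complement` is hypothetical; `int.from_bytes(..., signed=True)` is used only as an independent cross-check.

```python
def decode_twos_complement(bits: str) -> int:
    """Decode an n-bit 2's complement pattern given as a string of 0s and 1s."""
    bits = bits.replace(" ", "")
    sign, rest = bits[0], bits[1:]
    if sign == "0":
        # Positive: the magnitude is the (n-1)-bit pattern itself.
        return int(rest, 2)
    # Negative: the magnitude is the complement of the (n-1)-bit pattern plus one.
    complement = "".join("1" if b == "0" else "0" for b in rest)
    return -(int(complement, 2) + 1)


for pattern in ("0 100 0001", "1 000 0001", "0 000 0000", "1 111 1111"):
    value = decode_twos_complement(pattern)
    raw = bytes([int(pattern.replace(" ", ""), 2)])
    assert value == int.from_bytes(raw, "big", signed=True)
    print(pattern, "->", value)
```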
Floating Point representation
Normalized form

1 1000 0001 011 0000 0000 0000 0000 0000

• S = 1 ⇒ negative number
• E = 1000 0001
• F = 011 0000 0000 0000 0000 0000

• N = (-1)^S × 1.F × 2^(E-127)
• The exponent is stored in excess-127 form because we need to represent both positive
and negative exponents. With an 8-bit E, ranging from 0 to 255, the excess-127 scheme
could provide actual exponents of -127 to 128.

Fraction part: 1.011 0000 0000 0000 0000 0000B. The "1" at the beginning is implicit.
Fraction: 1 + 1×2^-2 + 1×2^-3 = 1.375D.
Exponent part: 1000 0001B = 129D, so the actual exponent is 129 - 127 = 2.
So the number is -1.375×2^2 = -5.5D
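The same decoding can be followed step by step in Python. The variable names below mirror S, E and F from the text; the final check uses the standard `struct` module to reinterpret the same 32 bits as a single-precision float, which should agree with the manual result.

```python
import struct

pattern = "1 1000 0001 011 0000 0000 0000 0000 0000".replace(" ", "")

S = int(pattern[0], 2)       # sign bit
E = int(pattern[1:9], 2)     # 8-bit biased exponent
F = int(pattern[9:], 2)      # 23-bit fraction field

fraction = 1 + F / 2 ** 23   # implicit leading 1
value = (-1) ** S * fraction * 2 ** (E - 127)
print(S, E, fraction, value)

# Cross-check: reinterpret the same 32 bits as a big-endian float32.
hardware = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
assert value == hardware
```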

Floating Point representation
De-normalized form: Because the normalized form has an implicit leading 1 for the fraction, it
cannot represent the number zero!

• For E=0, the numbers are in the de-normalized form.
• An implicit leading 0 (instead of 1) is used for the fraction, and the actual exponent is
always -126. Hence, the number zero can be represented with E=0 and F=0
(because 0.0×2^-126=0).

We can also represent very small positive and negative numbers in de-normalized form with E=0.

For example, suppose S=1, E=0, and F=011 0000 0000 0000 0000 0000.
The actual fraction is 0.011B = 1×2^-2 + 1×2^-3 = 0.375D.
Since S=1, it is a negative number.
With E=0, the actual exponent is -126.
Hence the number is -0.375×2^-126 ≈ -4.4×10^-39, which is an extremely small negative
number (close to zero).
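A brief sketch of the de-normalized example, again cross-checked with `struct`. The bit pattern is the one from the text (S=1, E=0, F=011 followed by zeros); the variable names are chosen for this illustration.

```python
import struct

pattern = "1" + "00000000" + "011" + "0" * 20   # S, E, F

F = int(pattern[9:], 2)
fraction = F / 2 ** 23                # implicit leading 0, so 0.F
value = -fraction * 2.0 ** -126       # the exponent is fixed at -126 when E = 0
print(fraction, value)

hardware = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
assert value == hardware

# E = 0 and F = 0 gives zero.
zero = struct.unpack(">f", (0).to_bytes(4, "big"))[0]
assert zero == 0.0
```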

Floating Point representation

In summary:
For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127).
• These numbers are in the so-called normalized form.
• The sign bit represents the sign of the number.
• The fractional part (1.F) is normalized with an implicit leading 1.
• The exponent is biased by (in excess of) 127, so as to represent both positive and
negative exponents.
• The range of the actual exponent is -126 to +127.

For E = 0, N = (-1)^S × 0.F × 2^(-126).
• These numbers are in the so-called denormalized form.
• The factor 2^-126 evaluates to a very small number.
• Denormalized form is needed to represent zero (with F=0 and E=0).
• It can also represent very small positive and negative numbers close to zero.

For E = 255, the pattern represents special values, such as ±INF (positive and negative
infinity) and NaN (not a number).
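The three cases of the summary can be collected into one decoder. This is a minimal sketch; the function name `decode_float32` is hypothetical. One detail goes beyond the text above: for E = 255 the code treats F = 0 as infinity and any other F as NaN, which is how IEEE 754 distinguishes the two.

```python
import math


def decode_float32(bits: str) -> float:
    """Decode a 32-bit single-precision pattern written as 0s and 1s."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    E = int(bits[1:9], 2)
    F = int(bits[9:], 2)
    if E == 255:
        # Special values: infinity when F is 0, NaN otherwise.
        return sign * math.inf if F == 0 else math.nan
    if E == 0:
        # Denormalized: implicit leading 0, exponent fixed at -126.
        return sign * (F / 2 ** 23) * 2.0 ** -126
    # Normalized: implicit leading 1, exponent in excess-127.
    return sign * (1 + F / 2 ** 23) * 2.0 ** (E - 127)


print(decode_float32("1 1000 0001 011 0000 0000 0000 0000 0000"))
print(decode_float32("0 0000 0000 000 0000 0000 0000 0000 0000"))
print(decode_float32("0 1111 1111 000 0000 0000 0000 0000 0000"))
```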
Floating Point representation

Example:
0 10000000 110 0000 0000 0000 0000 0000

• Sign bit S = 0 ⇒ positive number


• E = 1000 0000B = 128D (in normalized form)
• Fraction is 1.11B (with an implicit leading 1) = 1 + 1×2^-1 + 1×2^-2 = 1.75D
• The number is +1.75 × 2^(128-127) = +3.5D

Floating Point representation

Example:
1 01111110 100 0000 0000 0000 0000 0000

• Sign bit S = 1 ⇒ Negative number


• E = 01111110B = 126D (in normalized form)
• Fraction is 1.1B (with an implicit leading 1) = 1 + 2^-1 = 1.5D
• The number is -1.5 × 2^(126-127) = -0.75D

Floating Point representation

Example:
1 00000000 000 0000 0000 0000 0000 0001

• Sign bit S = 1 ⇒ negative number
• E = 0 (in de-normalized form)
• Fraction is 0.000 0000 0000 0000 0000 0001B (with an implicit leading 0) = 1×2^-23
• The number is -2^-23 × 2^(-126) = -2^(-149) ≈ -1.4×10^-45
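The worked examples can also be checked in the opposite direction, by packing the decimal value into a float32 and printing its three fields. The helper `float32_fields` is an illustrative name; it relies only on the standard `struct` module.

```python
import struct


def float32_fields(x: float) -> str:
    """Return the float32 bit pattern of x as 'S EEEEEEEE F...F'."""
    as_int = int.from_bytes(struct.pack(">f", x), "big")
    bits = format(as_int, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


for x in (3.5, -0.75, -(2.0 ** -149)):
    print(x, "->", float32_fields(x))
```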

64-bit Double-Precision Floating-Point Numbers

• The most significant bit is the sign bit (S), with 0 for positive numbers and 1
for negative numbers.
• The following 11 bits represent the exponent (E).
• The remaining 52 bits represent the fraction (F).

• Normalized form: For 1 ≤ E ≤ 2046, N = (-1)^S × 1.F × 2^(E-1023).
• Denormalized form: For E = 0, N = (-1)^S × 0.F × 2^(-1022).
• For E = 2047, N represents special values, such as ±INF (infinity) and NaN (not
a number).
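Python's own `float` is a 64-bit double, so the layout can be inspected directly. In this sketch `split_double` and `rebuild_double` are hypothetical helper names: the first extracts S, E and F from a float, and the second applies the formulas above to get the value back. Treating F = 0 as infinity and other F values as NaN when E = 2047 follows IEEE 754 and is not spelled out in the text.

```python
import math
import struct


def split_double(x: float) -> tuple:
    """Return (S, E, F) for the 64-bit pattern of x."""
    as_int = int.from_bytes(struct.pack(">d", x), "big")
    S = as_int >> 63                 # 1 sign bit
    E = (as_int >> 52) & 0x7FF       # 11 exponent bits
    F = as_int & ((1 << 52) - 1)     # 52 fraction bits
    return S, E, F


def rebuild_double(S: int, E: int, F: int) -> float:
    sign = -1.0 if S else 1.0
    if E == 2047:
        return sign * math.inf if F == 0 else math.nan
    if E == 0:
        return sign * (F / 2 ** 52) * 2.0 ** -1022
    return sign * (1 + F / 2 ** 52) * 2.0 ** (E - 1023)


for x in (-5.5, 3.5, 0.0, 2.0 ** -1074):
    fields = split_double(x)
    assert rebuild_double(*fields) == x
    print(x, "->", fields)
```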

Floating-Point Numbers Representations
