CH03 Data II

This document covers the representation of fractional numbers in binary, focusing on the IEEE floating point standard and its properties. It discusses the limitations of precision in both fractional decimal and binary numbers, as well as computation over floating point numbers. Additionally, it explains normalized, denormalized, and special values in floating point representation, along with the distribution of encoded values and rounding issues.

CH03

Data Representation II
COMP1411: Introduction to Computer Systems

Dr. Mingsong LYU ( 呂鳴松 )


Department of Computing,
The Hong Kong Polytechnic University
Spring 2025

Acknowledgement: These slides are based on the textbook (Computer Systems: A Programmer's Perspective) and its accompanying slides.
These slides are intended for internal use only. Do not publish them anywhere without permission.
Overview
• Representing fractional numbers with binary
• IEEE floating point standard
• Examples and properties
• Computation over floating point numbers
Rethinking fractional decimal numbers

425.367 = 4*10^2 + 2*10^1 + 5*10^0 + 3*10^-1 + 6*10^-2 + 7*10^-3

• Limitation on precision
• Given a fixed number of digits in the fractional part, precision is fixed. In the above example, 3 digits in the fractional part means we can only represent a unit as small as 0.001 --- a precision limitation
• If we want arbitrary precision, we need an infinite number of digits

• Limitation of representations
• Given 6 digits in total to represent a (fractional) number:

Option  | Max     | Precision
X.XXXXX | 9.99999 | 0.00001
XX.XXXX | 99.9999 | 0.0001
XXX.XXX | 999.999 | 0.001
XXXX.XX | 9999.99 | 0.01
XXXXX.X | 99999.9 | 0.1
Fractional binary numbers

1011.101 (base 2) = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3

Binary number | Decimal value
0.001         | 0.125
0.010         | 0.25
0.011         | 0.375
0.100         | 0.5
0.101         | 0.625
0.110         | 0.75
0.111         | 0.875

Precision limitation: 2^-3 (1/8)
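The positional expansion above can be checked with a short script. A minimal sketch (the helper name `bin_frac_to_decimal` is ours, not part of the slides):

```python
def bin_frac_to_decimal(s: str) -> float:
    """Convert a fractional binary string such as '1011.101' to decimal."""
    int_part, _, frac_part = s.partition('.')
    # Read all bits as one integer, then divide by 2^(number of fraction bits)
    return int(int_part + frac_part, 2) / 2 ** len(frac_part)

print(bin_frac_to_decimal('1011.101'))  # 11.625 = 8 + 2 + 1 + 1/2 + 1/8
print(bin_frac_to_decimal('0.111'))     # 0.875, the last row of the table
```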


Fractional binary numbers

In general, a binary number

b_i b_(i-1) ... b_2 b_1 b_0 . b_(-1) b_(-2) b_(-3) ... b_(-j)

has the value

sum of b_k * 2^k, for k from -j up to i

Bits to the left of the binary point carry weights 2^0 = 1, 2^1 = 2, 2^2 = 4, ..., 2^i; bits to the right carry weights 2^-1 = 1/2, 2^-2 = 1/4, 2^-3 = 1/8, ..., 2^-j.

Precision limit: 1 / 2^j
Representable numbers
• Limitation #1
• Can only exactly represent numbers of the form x/2^k (k: number of fractional bits)
• Values that cannot be exactly represented:
• 1/3 = 0.0101010101[01]... (base 2)
• 1/5 = 0.001100110011[0011]... (base 2)
• 1/10 = 0.0001100110011[0011]... (base 2)

• Limitation #2
• Given a fixed number of bits, limited range of numbers
• More integer bits, or more fraction bits?
• Larger value, or better precision?
• We are running out of bits …

Where should the binary point go? Placing it further to the right covers larger values; placing it further to the left gives a more precise fraction.
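Limitation #1 is easy to observe in any language with IEEE doubles. A quick check in Python, using `decimal.Decimal` to print the exact value actually stored:

```python
from decimal import Decimal

# 1/10 has no finite binary expansion, so the stored double is only close to 0.1
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)   # False: each side carries a different rounding error
print(0.5 + 0.25 == 0.75) # True: these have the form x/2^k, exactly representable
```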
Overview
• Get familiar with fractional binary numbers
• IEEE floating point standard
• Examples and properties
• Computation over floating point numbers
IEEE Floating Point
• IEEE Standard 754
• Before the standard, different machines used different floating point
encodings, making it hard to exchange information
• In 1985, IEEE established a uniform standard for floating point
arithmetic
• Nowadays, supported by all major CPUs

• Driven by numerical concerns


• Nice standards for rounding, overflow, underflow
• Hard to make fast in hardware
• Numerical analysts predominated over hardware designers
in defining standard
Floating Point Representation

s | exp | frac
(total available binary digits to represent a number)

• Any binary fractional number can be written in the following numerical form:

v = (-1)^s * M * 2^E

• Sign bit s determines whether the number is negative or positive
• Significand M is normally a fractional value in the range [1.0, 2.0)
• Exponent E weights the value by a power of two
• v is the resulting decimal value
Normalized values - Definition

s | exp | frac        v = (-1)^s * M * 2^E

• Normalized values: when exp ≠ 000...0 and exp ≠ 111...1

• E = exp - Bias
• exp: B2U (binary-to-unsigned) value read from the exp field
• Bias = 2^(k-1) - 1, where k is the number of bits in exp
• E.g., k = 3: Bias = 3, exp range [1, 6], E range [-2, 3]

• M = 1.xxx...x (base 2, with implied leading 1)
• xxx...x: bits from the frac field
• Minimum when frac = 000...0 (M = 1.0)
• Maximum when frac = 111...1 (M = 1.111...1)
Normalized values - Example

• Compute the decimal value of this 32-bit floating point number
• 0100 0110 0110 1101 1011 0100 0000 0000

Split into fields: 0 | 10001100 | 11011011011010000000000

• k = 8, s = 0
• Significand
• frac = 11011011011010000000000 (base 2)
• M = 1.1101101101101 (base 2)
• Exponent
• exp = 10001100 (base 2) = 140
• Bias = 2^7 - 1 = 127
• E = exp - Bias = 13

v = (-1)^0 * 1.1101101101101 (base 2) * 2^13
  = 11101101101101 (base 2)
  = 15213 (base 10)
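The same example can be verified with Python's `struct` module, which reinterprets the 32-bit pattern as a single-precision float (a sketch, not part of the original slides):

```python
import struct

# Bit pattern from the example: 0 | 10001100 | 11011011011010000000000
bits = 0b0_10001100_11011011011010000000000
value, = struct.unpack('>f', bits.to_bytes(4, 'big'))
print(value)  # 15213.0

# Decoding by hand, following the slide's steps
s    = bits >> 31                # 0
exp  = (bits >> 23) & 0xFF       # 140
frac = bits & 0x7FFFFF
E    = exp - 127                 # 13
M    = 1 + frac / 2**23          # 1.1101101101101 in binary
print((-1)**s * M * 2**E)        # 15213.0
```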
Denormalized values - Definition

s | 0000000...0 | frac        v = (-1)^s * M * 2^E

• When exp = 000...0

• E = 1 - Bias (NOT 0 - Bias)
• M = 0.xxx...x (base 2, with implied leading 0)
• xxx...x: bits from the frac field
• Cases
• exp = 000...0, frac = 000...0 → ZERO
• Note that s = 0/1 gives positive/negative zero
• exp = 000...0, frac ≠ 000...0 → numbers closest to 0.0
• Designed to represent very "small" numbers
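In the 64-bit double format (11 exponent bits, 52 fraction bits, Bias = 1023), the denormalized range can be probed directly; a sketch:

```python
import struct, sys

# All-zero exp with frac = 0...01: the smallest positive (denormalized) double,
# equal to 2^(1 - 1023) * 2^-52 = 2^-1074
tiny, = struct.unpack('<d', (1).to_bytes(8, 'little'))
print(tiny)                 # 5e-324
print(tiny == 2**-1074)     # True
print(sys.float_info.min)   # 2.2250738585072014e-308, the smallest *normalized* double
```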
Special values - Definition

s | 1111111...1 | frac        v = (-1)^s * M * 2^E

• When exp = 111...1

• Case 1: exp = 111...1, frac = 000...0
• Value: ∞ (infinity)
• Positive infinity or negative infinity depending on s
• Used to represent overflows
• e.g., the result of "x = 1.0/0.0"

• Case 2: exp = 111...1, frac ≠ 000...0
• Not-a-Number (NaN)
• Represents cases where no numeric value can be determined
• E.g., the result of sqrt(-1), ∞ - ∞, ∞ × 0
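These special values behave as described in any IEEE-compliant environment. A Python sketch (note that Python raises exceptions for `1.0/0.0` and `sqrt(-1)` instead of silently producing inf/NaN, so we trigger the values through overflow and arithmetic on inf):

```python
import math

inf = float('inf')
print(1e308 * 10)           # inf: the product overflows the double range
print(-1e308 * 10)          # -inf
print(inf - inf)            # nan: no numeric value can be determined
print(math.isnan(inf * 0))  # True
```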
A miniature floating-point example

An 8-bit floating point representation: s | 4-bit exp | 3-bit frac (Bias = 2^3 - 1 = 7)

s exp  frac   E   Value
0 0000 000   -6   0
0 0000 001   -6   1/8 * 1/64 = 1/512   (closest to zero)
0 0000 010   -6   2/8 * 1/64 = 2/512
...
0 0000 110   -6   6/8 * 1/64 = 6/512
0 0000 111   -6   7/8 * 1/64 = 7/512   (largest denormalized)
0 0001 000   -6   8/8 * 1/64 = 8/512   (smallest normalized)
0 0001 001   -6   9/8 * 1/64 = 9/512
...
0 0110 110   -1   14/8 * 1/2 = 14/16
0 0110 111   -1   15/8 * 1/2 = 15/16   (closest to 1 below)
0 0111 000    0   8/8 * 1 = 1
0 0111 001    0   9/8 * 1 = 9/8        (closest to 1 above)
0 0111 010    0   10/8 * 1 = 10/8
...
0 1110 110    7   14/8 * 128 = 224
0 1110 111    7   15/8 * 128 = 240     (largest normalized)
0 1111 000   n/a  inf
0 1111 001   n/a  NaN (not a number)

Rows with exp = 0000 are denormalized numbers; rows with exp from 0001 to 1110 are normalized numbers; rows with exp = 1111 are special values.
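The table can be reproduced mechanically. A sketch of a decoder for this 8-bit format (the function name `decode8` is ours, not from the slides):

```python
def decode8(bits: int) -> float:
    """Decode the 8-bit format: 1 sign bit, 4 exp bits, 3 frac bits, Bias = 7."""
    s, exp, frac = bits >> 7, (bits >> 3) & 0xF, bits & 0x7
    if exp == 0:                           # denormalized: E = 1 - Bias = -6
        val = (frac / 8) * 2 ** -6
    elif exp == 0xF:                       # special values
        val = float('inf') if frac == 0 else float('nan')
    else:                                  # normalized: E = exp - Bias
        val = (1 + frac / 8) * 2 ** (exp - 7)
    return -val if s else val

print(decode8(0b0_0000_001))  # 0.001953125 = 1/512, closest to zero
print(decode8(0b0_0111_000))  # 1.0
print(decode8(0b0_1110_111))  # 240.0, largest normalized
```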
Visualizing floating point encodings
− +
−Normalized −Denorm +Denorm +Normalized

NaN NaN
0 +0

• Only representing discrete numbers in a range


• Limited precision and limited max/min value
• Variable precision: higher as the value is closer to zero
• Smooth transition at the boundary of denorm and normalized
Nice Properties
• Floating point zero same as integer zero
• All bits are 0
• Can (almost) use unsigned integer comparison
• Must first compare sign bits
• Must consider -0 equal to +0
• NaNs are problematic
• Their bit patterns are greater than those of any other value
• What should the comparison yield?
• Otherwise, OK
• Denorm vs. normalized
• Normalized vs. infinity
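These comparison properties are easy to confirm; a quick sketch in Python:

```python
nan = float('nan')

print(-0.0 == 0.0)            # True: comparison must treat the two zeros as equal
print(nan == nan, nan > 1.0)  # False False: every comparison involving NaN is False
print(5e-324 < 2.3e-308)      # True: denormalized doubles sit below the normalized range
print(float('inf') > 1e308)   # True: infinity is greater than any finite value
```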
Appreciate the beauty of the encoding

An 8-bit floating point representation: s | 4-bit exp | 3-bit frac

s exp  frac   E   Value
0 0000 000   -6   0
0 0000 001   -6   1/8 * 1/64 = 1/512
0 0000 010   -6   2/8 * 1/64 = 2/512
...
0 0000 110   -6   6/8 * 1/64 = 6/512
0 0000 111   -6   7/8 * 1/64 = 7/512
0 0001 000   -6   8/8 * 1/64 = 8/512
0 0001 001   -6   9/8 * 1/64 = 9/512
...
0 0110 110   -1   14/8 * 1/2 = 14/16
0 0110 111   -1   15/8 * 1/2 = 15/16
0 0111 000    0   8/8 * 1 = 1
0 0111 001    0   9/8 * 1 = 9/8
0 0111 010    0   10/8 * 1 = 10/8
...
0 1110 110    7   14/8 * 128 = 224
0 1110 111    7   15/8 * 128 = 240
0 1111 000   n/a  inf
0 1111 001   n/a  NaN (not a number)

The denormalized numbers use 8 different bit vectors; the normalized numbers use 112 different bit vectors; the remaining bit vectors encode the special values.
The distribution of the encoded values

• This 8-bit encoding covers non-negative values in the range [0, 240], with 120 available bit vectors representing 120 different values (including both integer and fractional values).
• However, these 120 values are not evenly distributed across the range [0, 240].
• The encoding devotes more bit vectors to small values and fewer bit vectors to large values. For example, 50+ values fall in the range [0, 1], while only 1 value falls in the range [225, 240].
• On the small-value side, the encoding achieves higher precision; towards the large-value side, precision decreases but, in exchange, larger values are covered.
• The design idea is that people usually need higher precision when dealing with small values, while for large values precision is not as critical.
• This is a trade-off between precision and the breadth of the value range. The encoding is clever if this design idea reflects common usage patterns, which is indeed the case in practice.
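The counts quoted above can be verified by enumerating every non-negative finite encoding of the 8-bit format; a sketch:

```python
# Enumerate every non-negative finite value of the 8-bit format (s = 0, Bias = 7)
values = []
for exp in range(15):                  # exp = 1111 encodes inf/NaN, excluded
    for frac in range(8):
        if exp == 0:                   # denormalized
            values.append((frac / 8) * 2 ** -6)
        else:                          # normalized
            values.append((1 + frac / 8) * 2 ** (exp - 7))

print(len(values))                           # 120 finite values
print(sum(v <= 1 for v in values))           # 57 of them lie in [0, 1]
print(sum(225 <= v <= 240 for v in values))  # only 1 lies in [225, 240]
```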
Decimal value  IEEE FP 754 binary

• Steps
• Normalize to have leading 1
s 4-bits exp 3-bits frac
• Round to fit within fraction
• Post-normalize to deal with effects of rounding
• Case study

Value Binary
128 10000000
13 00001101
17 00010001
19 00010011
138 10001010
63 00111111
Normalize

s | 4-bit exp | 3-bit frac


• Requirement
• Set the binary point so the number has the form 1.xxx...x
• Increment the exponent each time you shift the point to the left

Value Binary Fraction E


128 10000000 1.0000000 7
13 00001101 1.1010000 3
17 00010001 1.0001000 4
19 00010011 1.0011000 4
138 10001010 1.0001010 7
63 00111111 1.1111100 5

We have only 3 bits for the frac part, so we need to do rounding!


A rounding problem

• Let's consider rounding the following numbers to integers.

Number   | Traditional rounding | Round to even
10.5     | 11   | 10
11.5     | 12   | 12
12.5     | 13   | 12
13.5     | 14   | 14
14.5     | 15   | 14
15.5     | 16   | 16
16.5     | 17   | 16
17.5     | 18   | 18
18.5     | 19   | 18
19.5     | 20   | 20
Average: 15 | 15.5 | 15

Assume we have the ten numbers listed above; their average value is 15. We want to round each number to an integer.

The second column lists the results of traditional rounding. Each rounded value is 0.5 larger than the original, so the average of the ten rounded numbers becomes 15.5: the traditional rounding scheme introduces a systematic upward bias.

The third column lists the results of round-to-even. Compared to the original values, some rounded values become larger and some become smaller, so the errors cancel: the average of the ten rounded numbers stays 15.
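Python's built-in `round()` uses this round-half-to-even rule for floats, so the comparison can be reproduced directly (the .5 values here are exact in binary, so no hidden representation error interferes):

```python
nums = [10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5]

traditional = [int(x + 0.5) for x in nums]  # always round halves up
to_even     = [round(x) for x in nums]      # round-half-to-even

print(sum(nums) / 10)         # 15.0: average of the originals
print(sum(traditional) / 10)  # 15.5: traditional rounding biases the average upward
print(sum(to_even) / 10)      # 15.0: round-to-even preserves the average
```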
Rounding
s | 4-bit exp | 3-bit frac

1.BBG | RXXX   (kept bits | removed bits)
• Guard bit (G): LSB of the kept result
• Round bit (R): first bit removed
• Sticky bit (S): OR of the remaining removed bits
• Round-up conditions
• R = 1, S = 1 → removed part > half a unit in the last place: round up
• R = 1, S = 0 → exactly halfway: round to even, i.e. round up only if G = 1, so the kept result ends in 0 (an even number)

Value Fraction GRS Incr? Rounded


128 1.0000000 000 NO 1.000
13 1.1010000 100 NO 1.101
17 1.0001000 010 NO 1.000
19 1.0011000 110 YES 1.010
138 1.0001010 011 YES 1.001
63 1.1111100 111 YES 10.000
Rounding

• The above rounding scheme can be broken down into the following cases
• Assume the frac part has three bits: 1.BBGRXXX
• Case 1: R = 0 → no carry in
• E.g., 1.1010110 → 1.101, 1.0000101 → 1.000
• Case 2: R = 1 and at least one of the sticky bits is 1 → carry in
• E.g., 1.0001001 → 1.001, 1.0011010 → 1.010
• Case 3: R = 1 and all of the sticky bits are 0 → depends on G
• Case 3.1: G = 1 → carry in
• E.g., 1.0011000 → 1.010, 1.1011000 → 1.110
• Case 3.2: G = 0 → no carry in
• E.g., 1.1001000 → 1.100, 1.0001000 → 1.000
Post-normalize
• Issue
• Rounding may have caused overflow
• Handle by shifting right once & incrementing exponent

Value | Fraction  | Rounded | exp | Adjust            | Result
128   | 1.0000000 | 1.000   | 7   |                   | 128
13    | 1.1010000 | 1.101   | 3   |                   | 13
17    | 1.0001000 | 1.000   | 4   |                   | 16
19    | 1.0011000 | 1.010   | 4   |                   | 20
138   | 1.0001010 | 1.001   | 7   |                   | 144
63    | 1.1111100 | 10.000  | 5   | 1.000 / exp = 6   | 64
Indication of Rounding
• The case for value 138
Value Fraction Rounded exp Result
138 1.0001010 1.001 7 144

• The bit vector to represent 138 is 0 1110 001


• In fact, 0 1110 001 represents value 144, instead of 138
• Looking at adjacent bit vectors
• 0 1110 000 → 128
• 0 1110 001 → 144
• 0 1110 010 → 160
• 138 cannot be precisely represented by this encoding, so it has been
rounded to the closest value the encoding can represent
Overview
• Background: fractional binary numbers
• IEEE floating point standard: definition
• Examples and properties
• Computation over floating point numbers
Floating point addition

• (-1)^s1 * M1 * 2^E1 + (-1)^s2 * M2 * 2^E2, assume E1 > E2
• Get the binary points lined up: shift M1 left by E1 - E2 bit positions, so both operands are expressed with exponent E2

• Exact result: (-1)^s * M * 2^E
• Sign s, significand M: result of the signed align & add
• Exponent E: E2

• Fixing
• If M ≥ 2, shift M right, increment E
• If M < 1, shift M left k positions, decrement E by k
• Overflow if E is out of range; the computer represents the result as +∞ or -∞
• Round M to fit the frac precision
Floating point multiplication

• (-1)^s1 * M1 * 2^E1 × (-1)^s2 * M2 * 2^E2

s1 | exp1 | frac1
s2 | exp2 | frac2

• Exact result: (-1)^s * M * 2^E
• Sign s: s1 ^ s2
• Significand M: M1 × M2
• Exponent E: E1 + E2

• Fixing
• Round M to fit the frac precision
• If M ≥ 2, shift M right, increment E
• Overflow if E is out of range; the computer represents the result as +∞ or -∞

• Implementation
• The biggest task is multiplying the significands, which takes the most time
Mathematical properties of FP Multiplication
• For the following questions, YES or NO?
• Multiplication Commutative (a*b = b*a)?
• Multiplication Associative ((a*b)*c = a*(b*c))?
• Possibility of overflow, inexactness of rounding
• E.g. (in single precision): (10^20 * 10^20) * 10^-20 = inf,
but 10^20 * (10^20 * 10^-20) = 10^20
• 1 * a = a?
• Multiplication distributes over addition (a*(b+c) = a*b + a*c)?
• Possibility of overflow, inexactness of rounding
• 10^20 * (10^20 - 10^20) = 0.0,
but 10^20 * 10^20 - 10^20 * 10^20 = inf - inf → NaN
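The same failures can be reproduced in double precision by picking values near the double overflow threshold (about 1.8 * 10^308): here we use 10^200 in place of the slide's single-precision 10^20:

```python
import math

a = (1e200 * 1e200) * 1e-200   # inner product overflows to inf; inf * 1e-200 = inf
b = 1e200 * (1e200 * 1e-200)   # inner product is about 1.0, so no overflow
print(a)                       # inf: associativity fails
print(math.isfinite(b))        # True

c = 1e200 * (1e200 - 1e200)          # 1e200 * 0.0
d = 1e200 * 1e200 - 1e200 * 1e200    # inf - inf
print(c, math.isnan(d))              # 0.0 True: distributivity fails
```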
Mathematical properties of FP addition
• For the following questions, YES or NO?
• Commutative (a+b = b+a)?
• Associative ((a+b)+c = a+(b+c))??
• Overflow and rounding
• (3.14 + 10^20) - 10^20 = 0.0
• 3.14 + (10^20 - 10^20) = 3.14
• 0 is additive identity?
• Every element (x) has an additive inverse (exists x’, so that
x+x’=0)?
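A quick check of these addition properties in double precision:

```python
import math

print((3.14 + 1e20) - 1e20)  # 0.0: 3.14 vanishes when aligned against 1e20
print(3.14 + (1e20 - 1e20))  # 3.14: so addition is not associative

print(0.0 + 2.5 == 2.5)      # True: 0 is an additive identity
inf = float('inf')
print(inf + (-inf))          # nan: infinity has no additive inverse
```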
− +
−Normalized −Denorm +Denorm +Normalized

NaN NaN
0 +0
Thank You
