COA Chapter 02 Part 3


CHAPTER 2

Computer
Arithmetic and
Digital Logic

These slides are being provided with permission from the copyright for CS2208
use only. The slides must not be reproduced or provided to anyone outside of
the class.
All download copies of the slides are for personal use only.
Students must destroy these copies within 30 days after receipt of final course
evaluations.
Computer Organization and Architecture: Themes and Variations, 1st Edition Clements

Rounding and Errors

 Floating-point arithmetic can lead to an increase in the number of bits in
the fractional part
 To keep the number of fractional bits constant, rounding is needed
o This introduces a rounding error
 The rounding mechanisms include
o Truncation (i.e., dropping unwanted bits), which rounds towards zero;
a.k.a., rounding down
o Rounding towards positive or negative infinity: the nearest valid
floating-point number in the direction of positive infinity (for positive
values) or negative infinity (for negative values) is chosen, i.e., the
value is rounded away from zero; a.k.a., rounding up.
o Rounding to nearest: the closest valid floating-point number to the
actual value is used.

Rounding and Errors

 Integer rounding examples:
Rounding towards zero (i.e., rounding down)
o +4.7 truncation, i.e., rounded towards zero  +4
o –4.7 truncation, i.e., rounded towards zero  –4
In truncation, we just get rid of the extra digits (regardless of whether the
number is positive or negative). The result is rounding towards zero.
Rounding towards ± infinity (i.e., rounding up)
It is the opposite of rounding towards zero
o +4.7 rounded towards + infinity  +5
o –4.7 rounded towards – infinity  –5
Rounding to nearest
o +4.7 rounded to nearest  +5
o –4.7 rounded to nearest  –5
o +4.3 rounded to nearest  +4
o –4.3 rounded to nearest  –4
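
The integer rounding examples above can be reproduced with a short Python sketch. The function names are illustrative only (they are not from the slides), and "rounding up" is implemented as rounding away from zero, which is how the slide defines it.

```python
import math

def round_toward_zero(x: float) -> int:
    # Truncation: drop the fractional digits.
    return math.trunc(x)

def round_away_from_zero(x: float) -> int:
    # Towards +infinity for positive values, towards -infinity for negative ones.
    return int(math.copysign(math.ceil(abs(x)), x))

def round_to_nearest(x: float) -> int:
    # Python's built-in round() resolves exact ties to the even neighbour.
    return round(x)

for x in (4.7, -4.7, 4.3, -4.3):
    print(x, round_toward_zero(x), round_away_from_zero(x), round_to_nearest(x))
```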

Rounding and Errors

 In binary, when the number to be rounded is midway between two points
on the floating-point line, rounding to the nearest selects the value whose
least-significant digit is zero (i.e., rounding to an even binary significand).
 For example:
 0.111000011110000111100001000₂ will be rounded to
0.11100001111000011110000₂
 0.111000011110000111100011000₂ will be rounded to
0.11100001111000011110010₂
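
A minimal sketch of rounding to the nearest with ties to even, applied to the two bit strings above. The helper name `round_fraction_bits` is hypothetical; it works on the digits after the binary point and, for brevity, assumes the rounding does not carry out of the top bit.

```python
def round_fraction_bits(bits: str, keep: int) -> str:
    """Round a string of fraction bits to `keep` bits (nearest, ties to even)."""
    kept, dropped = int(bits[:keep], 2), bits[keep:]
    half = "1" + "0" * (len(dropped) - 1)      # bit pattern of exactly one half
    # Equal-length bit strings compare like the numbers they represent.
    if dropped > half or (dropped == half and kept % 2 == 1):
        kept += 1
    return format(kept, f"0{keep}b")

print(round_fraction_bits("111000011110000111100001000", 23))
print(round_fraction_bits("111000011110000111100011000", 23))
```

In both inputs the dropped bits are 1000, i.e., exactly midway, so the result is decided by whether the last kept bit is already even.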

Normalization
 A number is called normalized when it is written in scientific notation
with a single non-zero digit before the radix point (i.e., the integer
part consists of a single non-zero digit).

Example 1:
 The number 123.456₁₀ is not normalized, as the integer part is not a
single non-zero digit.
 To normalize it, you need to move the decimal point two positions to
the left and to compensate for this move by multiplying the number by
100, i.e.,
 1.23456₁₀ × 10^2

Example 2:
 The number 0.00123₁₀ is not normalized, as the integer part is not a
single non-zero digit.
 To normalize it, you need to move the decimal point three positions to
the right and to compensate for this move by dividing the number by
1000, i.e.,
 1.23₁₀ × 10^-3
 In base b, a normalized number will have the form ± b₀.b₁b₂b₃... × b^n
where b₀ ≠ 0, and b₀, b₁, b₂, b₃, ... are integers between 0 and b – 1
This slide is modified from the original slide by the author A. Clements and used with permission. New content added and copyrighted by © Mahmoud R. El-Sakka.

Floating-point Numbers
 Floating-point arithmetic lets you handle the very large and very small
values found in scientific applications.
 Floating-point is also called scientific notation, because scientists use it
to represent large numbers (e.g., 1.2345 × 10^20) and small numbers that
are very close to zero, but not zero (e.g., 0.45679999 × 10^-50).
 A floating-point value is encoded as two components: a number and
an adjustment to the location of the radix point within the number.

 A binary floating-point number is represented by

mantissa × 2^exponent
o for example, 101010.111110₂ can be represented by
1.01010111110₂ × 2^5, where
 the significant digits (or simply the significand) are 1.01010111110
and
 the exponent is 5 (00000101₂ in 8-bit binary arithmetic).
 The term mantissa has been replaced by significand to indicate the
number of significant bits in a floating-point number.
 Because a floating-point number is defined as
the product of two values, a floating-point value is not unique;
for example, 10.110₂ × 2^4 = 1.011₂ × 2^5.
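
As a rough illustration of the significand/exponent split, the sketch below uses Python's `math.frexp` on the example value 101010.111110₂. The helper name `normalize_binary` is an assumption made for this example, and it expects a non-zero input.

```python
import math

def normalize_binary(x: float):
    """Return (significand, exponent) with 1 <= |significand| < 2 for non-zero x."""
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1         # shift one place to get a leading 1

value = int("101010111110", 2) / 2**6    # 101010.111110 in binary
sig, exp = normalize_binary(value)
print(value, sig, exp)

# The same value written two ways, as on the slide: 10.110 x 2^4 = 1.011 x 2^5
print(0b10110 / 2**3 * 2**4 == 0b1011 / 2**3 * 2**5)
```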

Normalization of Floating-point Numbers

 In the IEEE-754 Standard for Floating-Point Arithmetic, the significand
term is always normalized (unless it represents a zero or underflow)
 A normalized binary significand always has a leading 1 (i.e., 1 in the MSB)
 The normalized absolute non-zero values of the IEEE-754 FP numbers
are always in the range
1.000…0₂ × 2^-e (the minimum absolute value) to 1.111…1₂ × 2^e (the maximum absolute value)
Note: the book is missing the –ve sign in the first exponent here.
 The floating-point normalization leads to the highest available precision,
as all significant bits are utilized.
o the un-normalized 8-bit significand 0.0000101 has only three
significant bits (not four), whereas
o the normalized 8-bit significand 1.0100011 has eight
significant bits.

o If a floating-point calculation is to yield the value 0.110...₂ × 2^e,
the result would be normalized to give 1.10...₂ × 2^(e–1).
o Similarly, the result 10.1...₂ × 2^e would be normalized to 1.01...₂ × 2^(e+1).

Significand and Exponent Encoding

 The significand of an IEEE-754 floating-point number is represented in
sign and magnitude form.

 The exponent is represented in a biased form,

by adding a constant to the true exponent.

 Suppose an 8-bit exponent is used and all exponents are biased by 127.
o If the true exponent is 0, it will be encoded as 0 + 127 = 127.
o If the true exponent is –2, it will be encoded as –2 + 127 = 125.
o If the true exponent is +2, it will be encoded as +2 + 127 = 129.

 A real number such as 1010.1111₂ is normalized to get +1.0101111₂ × 2^3.

o The true exponent is +3, which is encoded as a biased exponent of
3 + 127; that is, 130₁₀ or 10000010₂ in binary form.

 Likewise, if a biased exponent is 130₁₀, the true exponent is 130 – 127 = 3
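
The biasing rule above amounts to one addition and one subtraction. The following sketch assumes the 8-bit, bias-127 case from this slide; the function names are illustrative, and the 1 to 254 range check reflects the normalized single-precision range.

```python
BIAS = 127

def encode_exponent(true_exp: int) -> str:
    """Return the 8-bit biased exponent field for a normalized number."""
    biased = true_exp + BIAS
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single-precision range")
    return format(biased, "08b")

def decode_exponent(field: str) -> int:
    """Recover the true exponent from an 8-bit biased field."""
    return int(field, 2) - BIAS

for e in (0, -2, 2, 3):
    print(e, encode_exponent(e))
print(decode_exponent("10000010"))
```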

Significand and Exponent Encoding

 A 32-bit single-precision IEEE-754 floating-point number is represented
by the bit sequence
S EEEEEEEE 1.FFFFFFFFFFFFFFFFFFFFFFF
o S is the sign bit,
 0 means positive significand,
 1 means negative significand
o E is an eight-bit biased exponent that tells you how to shift the
binary point, and
o F is a 23-bit fractional significand.
o The leading 1 and the binary point in front of the significand
are omitted when the number is encoded.
 A floating-point number X is defined as:

1 ≤ E ≤ 254  X = (–1)^S × 2^(E – B) × 1.F

In this case, B is 127, i.e., excess-127 code.
When 1 ≤ E ≤ 254,
the significand =
1 + the fractional significand F

Significand and Exponent Encoding

 If the exponent EEEEEEEE > 0 (and is not the reserved value 255), the
significand of an IEEE-754 floating-point number is normalized in the
range 1.0000...00 to 1.1111...11.
 If the exponent EEEEEEEE = 0, the significand is
represented without normalization. This form is used when it is impossible to
normalize the number.
o In such cases, the floating-point number X is defined as:
S 00000000 0.FFFFFFFFFFFFFFFFFFFFFFF
E = 0  X = (–1)^S × 2^(0 – (B – 1)) × 0.F
In this case, B – 1 is 126, i.e., excess-126 code.
When E = 0,
the significand = 0 + the fractional significand F
where
o S is the sign bit,
 0 means positive significand,
 1 means negative significand
o E = 0
 the exponent was biased by B – 1
o F is the fractional significand
 As E = 0, the significand was encoded without normalization,
i.e., 0.F without an implicit leading one

 When E = 0 and F ≠ 0  ± denormalized underflow number
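
The two definitions of X (normalized for 1 ≤ E ≤ 254, un-normalized for E = 0) can be sketched as one decoding function. The name `decode_single` is hypothetical, and the reserved case E = 255 is deliberately rejected here rather than interpreted.

```python
def decode_single(word: int) -> float:
    """Decode a 32-bit pattern using the two formulas for X given above."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 255:
        raise ValueError("E = 255 is reserved (infinity or NaN)")
    B = 127
    if e == 0:
        significand, exponent = f / 2**23, 0 - (B - 1)      # 0.F, excess-126
    else:
        significand, exponent = 1 + f / 2**23, e - B        # 1.F, excess-127
    return (-1) ** s * significand * 2.0 ** exponent

print(decode_single(0x3F800000))   # S=0, E=127, F=0
print(decode_single(0x00000001))   # S=0, E=0, F=1 (un-normalized)
```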

Significand and Exponent Encoding

 The floating-point value of zero is represented by

0.00...00 × 2^(most negative exponent)
i.e., the zero is represented by
o a zero significand and
o a zero biased exponent
as Figure 2.6 demonstrates.
In this floating-point representation,
how many zeros do we have?

Significand and Exponent Encoding

[Figure: IEEE-754 floating-point formats, single precision (float type in Java and C) and double precision (double type in Java and C), showing the biased and unbiased exponent values]

The L value (the leading bit of the significand) = 1, if and only if E ≠ 0
The L value = 0, if and only if E = 0
If E ≠ 0, True exponent = biased exponent – bias
If E = 0, True exponent = 0 – (bias – 1)
The book flipped the meaning of S. It is S = 0 for +ve and S = 1 for –ve.
In the IEEE single precision representation, E ranges from 1 to 254 for NORMALIZED numbers:
o the largest normalized absolute number is
2^+127 × 1.111…1₂ ≈ 2^+128 = 10^+38.5318394 ≈ 3.4 × 10^+38.
o the smallest normalized absolute number is
2^-126 × 1.000…0₂ = 2^-126 = 10^-37.9297794 ≈ 1.17 × 10^-38.

Significand and Exponent Encoding

In this case, B – 1 is 126

E = 0  X = (–1)^S × 2^(0 – (B – 1)) × 0.F

 Underflow occurs when the result of a calculation is a very small number;
smaller in magnitude than the smallest value representable as a
normalized floating-point number in the target data type.
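 Replacing an underflow case by a zero might be acceptable from the addition
point of view (adding a tiny value to a much larger one changes nothing), but
it is not acceptable from the multiplication point of view (the product of a
tiny value and a large one can be far from zero).
 NaN means Not a Number, e.g., the result of 0 ÷ 0, ∞ ÷ ∞, 0 × ∞, or ∞ – ∞
 In NaN, the value of F (which must be non-zero) is ignored by applications.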
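
The remark about replacing an underflow by zero can be illustrated with a small sketch. `flush_to_zero` is a hypothetical policy written only for this example, and `to_single` rounds a Python float to single precision through the standard `struct` module.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def flush_to_zero(x: float) -> float:
    """Hypothetical policy: replace every single-precision underflow value by 0."""
    return 0.0 if abs(x) < 2.0 ** -126 else x

tiny = to_single(2.0 ** -130)     # an un-normalized (underflow) single value
big = to_single(2.0 ** 100)

# Addition: keeping or flushing the tiny value gives the same sum.
print(to_single(1.0 + tiny) == to_single(1.0 + flush_to_zero(tiny)))
# Multiplication: the flushed product is 0, the true product is 2^-30.
print(to_single(tiny * big), to_single(flush_to_zero(tiny) * big))
```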

From Binary to 32-bit IEEE-754 FP

 Example 1(a):
Convert –11110000111100.00111100001111₂ into
a 32-bit single-precision IEEE-754 FP value.
o The number is negative  S = 1
o The significand is 11110000111100.00111100001111₂
o The normalized significand is
1.111000011110000111100001111₂ × 2^13
o The biased exponent is the true exponent plus 127; that is,
13 + 127 = 140₁₀ = 1000 1100₂
Hence, E = 1000 1100₂
o To encode the F value, we will ignore the leading 1 and
we will only consider the first 23 bits after the binary point,
i.e., the first 23 bits of 111000011110000111100001111₂
o The ignored part of the significand (1111₂) is rounded to the nearest,
hence the value of F = 11100001111000011110001₂
o The final number is 1100 0110 0111 0000 1111 0000 1111 0001₂,
or C670F0F1₁₆
or 0xC670F0F1
or 0XC670F0F1

Hexadecimal digits in binary:
0 = 0000   1 = 0001   2 = 0010   3 = 0011
4 = 0100   5 = 0101   6 = 0110   7 = 0111
8 = 1000   9 = 1001   A = 1010   B = 1011
C = 1100   D = 1101   E = 1110   F = 1111
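
One way to cross-check Example 1(a) is to let Python's standard `struct` module do the packing. This is only a verification sketch: the 28 significant bits are taken from the example, and 14 of them lie after the binary point.

```python
import struct

bits = "1111000011110000111100001111"    # 11110000111100.00111100001111 without the point
value = -int(bits, 2) / 2**14            # exact as a Python float (only 28 significant bits)
encoded = struct.pack(">f", value)       # big-endian single precision, rounded to nearest
print(encoded.hex().upper())
```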

From 32-bit IEEE-754 FP to Binary

 Example 1(b): Convert C670F0F1₁₆ from a 32-bit single-precision
IEEE-754 FP value into a binary value. It can also be written as 0xC670F0F1.
This is the same value as in Example 1(a).
o Unpack the number into sign bit, biased exponent, and fractional
significand:
C670F0F1₁₆ = 1100 0110 0111 0000 1111 0000 1111 0001₂
 S = 1
 E = 1000 1100
 F = 111 0000 1111 0000 1111 0001
o As the sign bit is 1, the number is negative.
o Subtract 127 from the biased exponent 1000 1100₂ to get
the true exponent  1000 1100₂ – 0111 1111₂ = 0000 1101₂ = 13₁₀.
o The fractional significand is .111 0000 1111 0000 1111 0001₂.
o Reinserting the leading one gives 1.111 0000 1111 0000 1111 0001₂.
o The number is –1.111 0000 1111 0000 1111 0001₂ × 2^13
= –1111 0000 1111 00.00 1111 0001₂
Note that the decoded answer is
–1111 0000 1111 00.00 1111 0001₂, not the original
–1111 0000 1111 00.00 1111 0000 1111₂
This is due to the rounding error.
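
The unpacking steps of Example 1(b) can be sketched with shifts and masks; the variable names are illustrative. The last lines decode the word with `struct` to show the value that is actually stored after rounding.

```python
import struct

word = 0xC670F0F1
s = word >> 31                 # sign bit
e = (word >> 23) & 0xFF        # 8-bit biased exponent
f = word & 0x7FFFFF            # 23-bit fractional significand

print(s, format(e, "08b"), format(f, "023b"))
print("true exponent:", e - 127)

value = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(value)
```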

From 32-bit IEEE-754 FP to Decimal

 Example 2: Convert 1111 1110 0110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
 S=1
 E = 1111 1100
 F =110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o Subtract 127 from the biased exponent 1111 1100₂ to get
the true exponent  1111 1100₂ – 0111 1111₂ = 0111 1101₂ = 125₁₀.
o The fractional significand is .110 0000 0000 0000 0000 0000₂.
o Reinserting the leading one gives 1.110 0000 0000 0000 0000 0000₂.
o The number is –1.11₂ × 2^125 = –1.75₁₀ × 2^125
2^125 = 10^z  log₁₀(2^125) = z  z = 125 × 0.30103 = 37.62875
2^125 = 10^37.62875 = 10^37 × 10^0.62875 = 10^37 × 4.25353
–1.75 × 2^125 = –1.75 × 10^37 × 4.25353 = –7.4436775 × 10^37
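
The logarithm step of Example 2 can be repeated numerically. The sketch below is a check rather than a prescribed method: it decodes the word with `struct` and then splits 2^125 into a power of ten and a decimal factor.

```python
import math
import struct

word = 0xFE600000    # 1111 1110 0110 0000 ... 0000
value = struct.unpack(">f", word.to_bytes(4, "big"))[0]

z = 125 * math.log10(2)                  # 2^125 = 10^z
power = math.floor(z)                    # integer part of z
factor = -1.75 * 10 ** (z - power)       # decimal significand

print(value)
print(factor, "x 10 ^", power)
```

The last digits differ slightly from the slide's –7.4436775 because the slide rounds log₁₀(2) to 0.30103 and 10^0.62875 to 4.25353.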

From 32-bit IEEE-754 FP to Decimal

 Example 3: Convert 1000 0000 0110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
 S=1
 E = 0000 0000
 F =110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o As E = 0  true exponent = 0 – (127 – 1) = –126
o The fractional significand is .110 0000 0000 0000 0000 00002.
o As E = 0, the fractional significand is not normalized. The L value =0,
as E = 0
o As E = 0 and F ≠ 0, it means that this is an underflow case.
o The number is –0.11₂ × 2^-126 = –0.75 × 2^-126
2^-126 = 10^z  log₁₀(2^-126) = z  z = –126 × 0.30103 = –37.92978
2^-126 = 10^-37.92978 = 10^-37 × 10^-0.92978 = 10^-37 × 0.11755
–0.75 × 2^-126 = –0.75 × 10^-37 × 0.11755 = –0.088162 × 10^-37
= –8.8162 × 10^-39, which is smaller in magnitude than the smallest
normalized value (1.17 × 10^-38)

From 32-bit IEEE-754 FP to Decimal

 Example 4: Convert 0111 1111 1000 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
 S=0
 E = 1111 1111
 F =000 0000 0000 0000 0000 0000
o As the sign bit is 0, the number is positive.
o As E = 255  either an infinity case or a NaN case
o The fractional significand is .000 0000 0000 0000 0000 0000₂.
o As the biased exponent is 255 and the F and the S values are zero,
it means that this is a +infinity case, i.e., a number larger than
3.4028235 × 10^+38

From 32-bit IEEE-754 FP to Decimal

 Example 5: Convert 1111 1111 1110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
 S=1
 E = 1111 1111
 F =110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o As E = 255  either an infinity case or a NaN case
o The fractional significand is .110 0000 0000 0000 0000 0000₂.
o As the biased exponent is 255, the F value is NOT zero, and the S
value is 1, it means that this is a –NaN case (Not a Number),
e.g., the result of a 0 ÷ 0, ∞ ÷ ∞, 0 × ∞, or ∞ – ∞ operation.
o In –NaN or +NaN cases, the value of F is ignored.
o The value is –NaN
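
The case analysis used in Examples 3 to 5 (E = 0, E = 255, and everything in between) can be summarized as a small classifier. The function name and the returned labels are assumptions made for this sketch.

```python
def classify(word: int) -> str:
    """Label a 32-bit single-precision pattern by its S, E, and F fields."""
    s, e, f = word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF
    sign = "-" if s else "+"
    if e == 255:
        return sign + ("infinity" if f == 0 else "NaN")
    if e == 0:
        return sign + ("zero" if f == 0 else "denormalized")
    return sign + "normalized"

for word in (0x80600000, 0x7F800000, 0xFFE00000, 0xFE600000):
    print(format(word, "08X"), classify(word))
```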

From 32-bit IEEE-754 FP to Decimal

 Example 6: Convert C46C0000₁₆ from a 32-bit single-precision
IEEE-754 FP value into a decimal value.
o Convert the hexadecimal number into binary form
C46C0000₁₆ = 1100 0100 0110 1100 0000 0000 0000 0000₂.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
 S = 1
 E = 1000 1000
 F = 110 1100 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o We subtract 127 from the biased exponent 1000 1000₂ to get
the true exponent  1000 1000₂ – 0111 1111₂ = 0000 1001₂ = 9₁₀ (it is 9, not 7).
o The fractional significand is .110 1100 0000 0000 0000 0000₂.
o Reinserting the leading one gives 1.110 1100 0000 0000 0000 0000₂.
o The number is –1.110 1100 0000 0000 0000 0000₂ × 2^9,
or –1110 1100 00.00 0000 0000 0000₂ (i.e., –944.0₁₀).

Hexadecimal digits in binary:
0 = 0000   1 = 0001   2 = 0010   3 = 0011
4 = 0100   5 = 0101   6 = 0110   7 = 0111
8 = 1000   9 = 1001   A = 1010   B = 1011
C = 1100   D = 1101   E = 1110   F = 1111

From Binary to 32-bit IEEE-754 FP

 Example 7: Convert 0.1000 1000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 0.1000 1000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is
1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-1
o Hence the number will be
1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-125
o As the exponent is greater than or equal to –126, the fractional part
will be represented as a normalized number
o The number = 1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-125
o As F is normalized, the biased exponent will be
the true exponent plus 127;
that is, –125 + 127 = 2; Hence, E = 0000 0010₂
o The encoded F value (23 bits, rounded to the nearest) will be
000 1000 0000 0000 0000 0000
o The final number is 0000 0001 0000 1000 0000 0000 0000 0000₂,
or 01080000₁₆.

From Binary to 32-bit IEEE-754 FP

 Example 8: Convert 0.0000 1000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 0.0000 1000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is
1.000 0000 0000 0000 0000 0001 11₂ × 2^-5
o Hence the number will be 1.000 0000 0000 0000 0000 0001 11₂ × 2^-129
o As the exponent is less than –126, the fractional part can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.001 0000 0000 0000 0000 0000 0011 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0; Hence, E = 0000 0000₂
o The encoded F value (23 bits, rounded to the nearest) will be
001 0000 0000 0000 0000 0000
o The final number is 0000 0000 0001 0000 0000 0000 0000 0000₂,
or 00100000₁₆.

From Binary to 32-bit IEEE-754 FP

 Example 9: Convert 0.0000 0000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 0.0000 0000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is 1.11₂ × 2^-28
o Hence the number will be 1.11₂ × 2^-152
o As the exponent is less than –126, the fractional part can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.000 0000 0000 0000 0000 0000 0011 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0; Hence, E = 0000 0000₂
o The encoded F value (23 bits, rounded to the nearest) will be
000 0000 0000 0000 0000 0000
o The final number is 0000 0000 0000 0000 0000 0000 0000 0000₂,
or 00000000₁₆.
I.e., the number is encoded as ZERO

From Binary to 32-bit IEEE-754 FP

 Example 10: Convert 0.0000 0000 0000 0000 0000 0000 0111 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 0.0000 0000 0000 0000 0000 0000 0111 11₂
The normalized fractional part is 1.1111₂ × 2^-26
o Hence the number will be 1.1111₂ × 2^-150
o As the exponent is less than –126, the fractional part can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.000 0000 0000 0000 0000 0000 1111 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0; Hence, E = 0000 0000₂
o The encoded F value (23 bits, rounded to the nearest) will be
000 0000 0000 0000 0000 0001
o The final number is 0000 0000 0000 0000 0000 0000 0000 0001₂,
or 00000001₁₆, the smallest non-zero positive un-normalized
underflow number (1.4012985 × 10^-45)
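
Examples 7 to 10 can be cross-checked with the standard `struct` module, which rounds to the nearest single-precision value and produces un-normalized results when the exponent falls below –126. The helper `encode_scaled` is hypothetical; it takes the bits after the binary point exactly as written on the slides (spaces are ignored) and the power of two.

```python
import math
import struct

def encode_scaled(fraction_bits: str, exp2: int) -> str:
    """Encode 0.<fraction_bits> x 2^exp2 as single precision; return 8 hex digits."""
    bits = fraction_bits.replace(" ", "")
    x = math.ldexp(int(bits, 2), exp2 - len(bits))   # exact as a Python float here
    return struct.pack(">f", x).hex().upper()

examples = {
    "Example 7": "1000 1000 0000 0000 0000 0000 0001 11",
    "Example 8": "0000 1000 0000 0000 0000 0000 0001 11",
    "Example 9": "0000 0000 0000 0000 0000 0000 0001 11",
    "Example 10": "0000 0000 0000 0000 0000 0000 0111 11",
}
for name, bits in examples.items():
    print(name, encode_scaled(bits, -124))
```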

From Binary to 32-bit IEEE-754 FP

 Example 11: Convert 1111.1111 1111 1111 1111 1111 011₂ × 2^124 into
a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 1111.1111 1111 1111 1111 1111 011₂
The normalized fractional part is
1.111 1111 1111 1111 1111 1111 011₂ × 2^3
o Hence the number will be 1.111 1111 1111 1111 1111 1111 011₂ × 2^127
o The biased exponent is the true exponent plus 127;
that is, 127 + 127 = 254; Hence, E = 1111 1110₂
o To encode the F value, we will ignore the leading 1 and
we will only consider the first 23 bits after the binary point
(rounded to the nearest), i.e.,
111 1111 1111 1111 1111 1111
o The final number is 0111 1111 0111 1111 1111 1111 1111 1111₂,
or 7F7FFFFF₁₆.
o This number is the largest positive normalized number
(3.4028235 × 10^+38)

From Binary to 32-bit IEEE-754 FP

 Example 12: Convert 1111.1111 1111 1111 1111 1111 111₂ × 2^124 into
a 32-bit single-precision IEEE-754 FP value.
o The number is positive  S = 0
o The fractional part is 1111.1111 1111 1111 1111 1111 111₂
The normalized fractional part is
1.111 1111 1111 1111 1111 1111 111₂ × 2^3
o Hence the number will be 1.111 1111 1111 1111 1111 1111 111₂ × 2^127
o To encode the F value, we will only consider the first 23 bits after the
binary point
o Note that the rounding here will add 1 to the fraction to make it
10.000 0000 0000 0000 0000 0000₂ × 2^127
o As a result of this, the number needs to be renormalized again
1.0000 0000 0000 0000 0000 0000₂ × 2^128
o The true exponent of the normalized number is > 127,
hence the number will be encoded as +infinity, i.e.,
 the F value will be 000 0000 0000 0000 0000 0000
 the E value will be 1111 1111₂
As long as the true exponent is > 127, the number will be
encoded as infinity, regardless of the value of F.
o The final number is 0111 1111 1000 0000 0000 0000 0000 0000₂
i.e., +infinity (7F800000₁₆)
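
The normalize, round, renormalize, and overflow steps of Examples 11 and 12 are written out by hand in the sketch below so that each step is visible. `encode_single` is a hypothetical helper, not a library routine; it assumes a finite input and uses Python's `round()`, which resolves ties to even.

```python
import math

def encode_single(x: float) -> int:
    """Sketch: encode a finite Python float as a 32-bit single-precision pattern."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    x = abs(x)
    if x == 0:
        return sign << 31
    true_exp = math.frexp(x)[1] - 1               # x = 1.f x 2^true_exp
    if true_exp < -126:
        # Un-normalized: exponent fixed at -126, no hidden leading 1.
        e_field, frac = 0, round(math.ldexp(x, 126 + 23))
    else:
        sig = round(math.ldexp(x, 23 - true_exp))  # 24-bit integer 1f...f
        if sig == 1 << 24:                         # rounding produced 10.00...0
            sig, true_exp = 1 << 23, true_exp + 1  # renormalize
        if true_exp > 127:                         # too large: encode as infinity
            return (sign << 31) | (0xFF << 23)
        e_field, frac = true_exp + 127, sig - (1 << 23)
    # Using + lets a fraction that rounds up to 2^23 carry into the exponent field.
    return (sign << 31) | ((e_field << 23) + frac)

x11 = math.ldexp(int("1" * 24 + "011", 2), 124 - 23)   # Example 11
x12 = math.ldexp(int("1" * 27, 2), 124 - 23)           # Example 12
print(format(encode_single(x11), "08X"), format(encode_single(x12), "08X"))
```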

From Decimal to 32-bit IEEE-754 FP

 Example 13: Convert 4100.125₁₀ into a 32-bit single-precision
IEEE-754 FP value.
o Convert 4100.125₁₀ into a fixed-point binary
 4100₁₀ = 1 0000 0000 0100₂ and
 0.125₁₀ = 0.001₂.
 Therefore, 4100.125₁₀ = 1 0000 0000 0100.001₂.
o Normalize 1 0000 0000 0100.001₂ to 1.000 0000 0010 0001₂ × 2^12.
o The sign bit, S, is 0 because the number is positive
o The biased exponent is the true exponent plus 127; that is,
12₁₀ + 127₁₀ = 139₁₀ = 1000 1011₂
o The fractional significand is 000 0000 0010 0001 0000 0000
 the leading 1 is stripped and
 the significand is expanded to 23 bits.
o The final number is 0100 0101 1000 0000 0010 0001 0000 0000₂,
or 45802100₁₆.

Hexadecimal digits in binary:
0 = 0000   1 = 0001   2 = 0010   3 = 0011
4 = 0100   5 = 0101   6 = 0110   7 = 0111
8 = 1000   9 = 1001   A = 1010   B = 1011
C = 1100   D = 1101   E = 1110   F = 1111
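
The first step of Example 13 (decimal to fixed-point binary) can be sketched with repeated multiplication of the fraction by 2. The helper name and the `max_bits` cut-off are assumptions made for this illustration; the final line cross-checks the encoded result with `struct`.

```python
import struct

def to_fixed_point_binary(integer_part: int, fraction: float, max_bits: int = 23) -> str:
    """Convert integer_part + fraction (0 <= fraction < 1) to a binary string."""
    int_bits = format(integer_part, "b")
    frac_bits = ""
    while fraction and len(frac_bits) < max_bits:
        fraction *= 2               # the next bit is the integer part of the product
        bit = int(fraction)
        frac_bits += str(bit)
        fraction -= bit
    return int_bits + "." + (frac_bits or "0")

print(to_fixed_point_binary(4100, 0.125))
print(struct.pack(">f", 4100.125).hex().upper())
```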

The 32-bit IEEE-754 FP

[Figure: number line of the 32-bit IEEE-754 FP values; the landmark values, from the most negative to the most positive, are listed below]
o S=1, E=255, F=0x000000: Value = –∞
o S=1, E=254 (true exp.=+127), F=0x7FFFFF: Value = –3.40282×10^+38, the largest negative normalized number, i.e., there is a hidden leading 1.
o S=1, E=1 (true exp.=–126), F=0x000000: Value = –1.17549×10^-38, the smallest negative normalized number, i.e., there is a hidden leading 1.
o S=1, E=0 (true exp.=–126), F=0x7FFFFF: Value = –1.17549×10^-38, the largest underflow negative number. Un-normalized, i.e., no leading 1.
o S=1, E=0 (true exp.=–126), F=0x000001: Value = –1.4013×10^-45, the smallest underflow negative number. Un-normalized, i.e., no leading 1.
o S=0 or 1, E=0 (true exp.=–126), F=0x000000: Value = +/–zero. Un-normalized, i.e., no leading 1.
o S=0, E=0 (true exp.=–126), F=0x000001: Value = +1.4013×10^-45, the smallest underflow positive number. Un-normalized, i.e., no leading 1.
o S=0, E=0 (true exp.=–126), F=0x7FFFFF: Value = +1.17549×10^-38, the largest underflow positive number. Un-normalized, i.e., no leading 1.
o S=0, E=1 (true exp.=–126), F=0x000000: Value = +1.17549×10^-38, the smallest positive normalized number, i.e., there is a hidden leading 1.
o S=0, E=254 (true exp.=+127), F=0x7FFFFF: Value = +3.40282×10^+38, the largest positive normalized number, i.e., there is a hidden leading 1.
o S=0, E=255, F=0x000000: Value = +∞, the next value after the largest positive normalized number.

Due to the used decimal precision, the largest underflow number and the smallest normalized number (both shown as 1.17549×10^-38) look the same, but they are not.

The step-size between consecutive floating-point numbers is NOT always constant as in integer numbers.

To compare two floating-point values without fully decoding them, you need to compare S, E, and then F values in order.
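
The two closing remarks (non-constant step size, and comparison by fields) can be observed directly on bit patterns. This is a sketch with illustrative helper names; the comparison shown covers two positive values only, since for negative values the order of magnitudes is reversed.

```python
import struct

def from_bits(word: int) -> float:
    """Decode a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

def step_after(word: int) -> float:
    """Gap between the value encoded by `word` and the next bit pattern."""
    return from_bits(word + 1) - from_bits(word)

def fields(word: int) -> tuple:
    """Split a pattern into (S, E, F)."""
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

# The gap grows with the exponent: near zero, near 1.0, near the largest value.
for word in (0x00000001, 0x3F800000, 0x7F7FFFFE):
    print(format(word, "08X"), step_after(word))

# For two positive values, comparing (E, F) agrees with comparing decoded values.
a, b = 0x3F800000, 0x40490FDB
print((fields(a)[1:] < fields(b)[1:]) == (from_bits(a) < from_bits(b)))
```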
No ratings yet
Numerical+Analysis+Chapter+1 2
13 pages
Number Bases
No ratings yet
Number Bases
4 pages
04 Float 2
No ratings yet
04 Float 2
44 pages
04 Float
No ratings yet
04 Float
40 pages
IEEE 754 Floating Point Notes
No ratings yet
IEEE 754 Floating Point Notes
4 pages
Topical Questions 2210-NUMBER SYSTEM (Encrypted)
No ratings yet
Topical Questions 2210-NUMBER SYSTEM (Encrypted)
36 pages
CH08.2-Computer Arithmetic
No ratings yet
CH08.2-Computer Arithmetic
14 pages
Floating Point & Fixed Point Representation - BCA II
No ratings yet
Floating Point & Fixed Point Representation - BCA II
24 pages
Chapter 1 - Izaac-Wang - Computational Quantum Mechanics (2018)
No ratings yet
Chapter 1 - Izaac-Wang - Computational Quantum Mechanics (2018)
12 pages
Floating - Point - Number
No ratings yet
Floating - Point - Number
36 pages
CH03 Data II
No ratings yet
CH03 Data II
31 pages
Xi String
No ratings yet
Xi String
12 pages
Floating-Point Numbers
No ratings yet
Floating-Point Numbers
23 pages
2.4 Floating Points
No ratings yet
2.4 Floating Points
36 pages
Floating Point
No ratings yet
Floating Point
13 pages
08 FloatingPoint
No ratings yet
08 FloatingPoint
52 pages
L2-Variables and Floating Point Number System
No ratings yet
L2-Variables and Floating Point Number System
38 pages
Calculus: Maths of the Gods
From Everand
Calculus: Maths of the Gods
Bill Todorovich
No ratings yet
Generalized Fermat Equation
From Everand
Generalized Fermat Equation
Ran Van Vo
No ratings yet
Fundamental Math
From Everand
Fundamental Math
Russell Pead
No ratings yet

Rounding and Errors

 A binary floating-point number is represented by

Normalization of Floating-point Numbers

o If a floating-point calculation is to yield the value $0.110\ldots_2 \times 2^{e}$,

Significand and Exponent Encoding

 The exponent is represented in a biased form,

 A real number such as $1010.1111_2$ is normalized to get $+1.0101111_2 \times 2^{3}$.

o The true exponent is +3, which is encoded as a biased exponent of

 Likewise, if a biased exponent is $130_{10}$, the true exponent is 130 – 127 = 3
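
The normalization and biasing steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `normalize_binary`, `bias_exponent` and `unbias_exponent` are hypothetical, and the bias of 127 assumes the 32-bit single-precision format.

```python
BIAS = 127  # assumption: exponent bias of the 32-bit IEEE-754 format


def normalize_binary(bits: str) -> tuple:
    """Normalize a positive binary string such as '1010.1111'.

    Returns (significand, exponent) with the significand written as 1.xxx.
    """
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")             # position of the leading 1
    exponent = len(int_part) - 1 - first_one  # how far the binary point moves
    fraction = digits[first_one + 1:].rstrip("0")
    return "1." + (fraction or "0"), exponent


def bias_exponent(true_exponent: int) -> int:
    """True exponent -> stored (biased) exponent."""
    return true_exponent + BIAS


def unbias_exponent(biased_exponent: int) -> int:
    """Stored (biased) exponent -> true exponent."""
    return biased_exponent - BIAS


significand, exponent = normalize_binary("1010.1111")
print(significand, exponent)    # 1.0101111 3
print(bias_exponent(exponent))  # 130
print(unbias_exponent(130))     # 3
```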

$1 \le E \le 254 \;\Rightarrow\; X = (-1)^{S} \times 2^{(E - B)} \times 1.F$
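
As a hedged illustration of the normalized-number formula, the sketch below splits a 32-bit pattern into S, E and F and evaluates it directly. The function name `decode_normalized` is hypothetical, and the 1/8/23-bit field widths with B = 127 are assumed for the single-precision format. The test pattern 0x412F0000 encodes the earlier example value $1010.1111_2$ (S = 0, E = 130, F = 0101111 followed by zeros).

```python
import struct

BIAS = 127  # assumption: B for the 32-bit format


def decode_normalized(word: int) -> float:
    """Evaluate X = (-1)^S * 2^(E - B) * 1.F for a pattern with 1 <= E <= 254."""
    s = (word >> 31) & 0x1       # sign bit S
    e = (word >> 23) & 0xFF      # 8-bit biased exponent E
    f = word & 0x7FFFFF          # 23-bit fraction F
    if not 1 <= e <= 254:
        raise ValueError("not a normalized number")
    significand = 1 + f / 2**23  # 1.F
    return (-1) ** s * 2.0 ** (e - BIAS) * significand


word = 0x412F0000
print(decode_normalized(word))                          # 10.9375
print(struct.unpack(">f", word.to_bytes(4, "big"))[0])  # 10.9375
```

The second `print` uses the standard `struct` module as an independent cross-check of the hand-written decoding.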

 When E = 0, F ≠ 0 ⇒ ± Denormalized underflow number

 The floating-point value of zero is represented by

The L value = 1 if and only if E ≠ 0

In this case, B – 1 is 126

$E = 0 \;\Rightarrow\; X = (-1)^{S} \times 2^{(0 - (B - 1))} \times 0.F$
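
The E = 0 case can be sketched the same way. This is a minimal illustration, assuming the 32-bit format with B = 127 (so B – 1 = 126); the function name `decode_denormalized` is hypothetical.

```python
import struct

BIAS = 127  # assumption: B for the 32-bit format


def decode_denormalized(word: int) -> float:
    """Evaluate X = (-1)^S * 2^(0 - (B - 1)) * 0.F for a pattern with E = 0."""
    s = (word >> 31) & 0x1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e != 0:
        raise ValueError("E must be 0 for a denormalized number")
    fraction = f / 2**23  # 0.F, with no implicit leading 1
    return (-1) ** s * 2.0 ** (0 - (BIAS - 1)) * fraction


smallest = decode_denormalized(0x00000001)  # only the last fraction bit set
print(smallest == 2.0 ** -149)              # True
print(smallest == struct.unpack(">f", (1).to_bytes(4, "big"))[0])  # True
```

With F = 0 the same expression evaluates to zero, which matches the all-zero exponent being shared by zero and the denormalized numbers.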

From Binary to 32-bit IEEE-754 FP

From 32-bit IEEE-754 FP to Binary

From 32-bit IEEE-754 FP to Decimal

o The fractional significand is $.110\ 1100\ 0000\ 0000\ 0000\ 0000_2$.
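
To turn a fractional significand like this one into a decimal value, each bit k after the binary point is weighted by $2^{-k}$. The sketch below assumes the number is normalized (E ≠ 0), so an implicit leading 1 is added. The exponent for this example is not shown here, so only the significand is computed. The function name `significand_from_fraction` is hypothetical.

```python
def significand_from_fraction(fraction_bits: str) -> float:
    """Return 1.F for a normalized number, given F as a string of bits."""
    bits = fraction_bits.replace(" ", "")
    value = 1.0                        # the implicit leading 1 (assumes E != 0)
    for position, bit in enumerate(bits, start=1):
        if bit == "1":
            value += 2.0 ** -position  # bit k of F has weight 2^-k
    return value


print(significand_from_fraction("110 1100 0000 0000 0000 0000"))  # 1.84375
```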

From Binary to 32-bit IEEE-754 FP

o Hence the number will be

o Hence the number will be $1.11_2 \times 2^{-152}$

o Hence the number will be $1.1111_2 \times 2^{-150}$
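
Values this small fall below the normalized range of the 32-bit format. The final encodings are not shown here, so the following is only a sketch of what one would expect under round-to-nearest. It relies on Python's standard `struct` module to round a double to single precision. The helper name `to_float32_bits` is hypothetical.

```python
import struct


def to_float32_bits(x: float) -> int:
    """Round a Python float (a double) to single precision; return the 32-bit pattern."""
    return int.from_bytes(struct.pack(">f", x), "big")


a = 0b111 / 4 * 2.0 ** -152     # 1.11 (binary) x 2^-152
b = 0b11111 / 16 * 2.0 ** -150  # 1.1111 (binary) x 2^-150

print(f"{to_float32_bits(a):08X}")  # 00000000
print(f"{to_float32_bits(b):08X}")  # 00000001
```

The smallest denormalized step is $2^{-149}$. The first value is about 0.22 of that step and rounds to zero, while the second is about 0.97 of it and rounds up to the smallest denormalized number.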

From Decimal to 32-bit IEEE-754 FP

[Figure: number line marking the values $-1.17549 \times 10^{-38}$ (S = 1) and $+1.17549 \times 10^{-38}$ (S = 0)]
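
The magnitude 1.17549×10^-38 is $2^{-126}$, the smallest normalized value of the 32-bit format (E = 1, F = 0). A quick check with Python's standard `struct` module, offered as an illustrative sketch rather than as part of the original slides:

```python
import struct

# Bit patterns with E = 1 and F = 0; only the sign bit differs.
smallest_normal = struct.unpack(">f", bytes.fromhex("00800000"))[0]  # S = 0
negative = struct.unpack(">f", bytes.fromhex("80800000"))[0]         # S = 1

print(smallest_normal == 2.0 ** -126)  # True
print(f"{smallest_normal:.5e}")        # 1.17549e-38
print(f"{negative:.5e}")               # -1.17549e-38
```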
