COA Chapter 02 Part 3


CHAPTER 2

Computer Arithmetic and Digital Logic

These slides are being provided with permission from the copyright holder for CS2208
use only. The slides must not be reproduced or provided to anyone outside of
the class.
All downloaded copies of the slides are for personal use only.
Students must destroy these copies within 30 days after receipt of final course
evaluations.
Computer Organization and Architecture: Themes and Variations, 1st Edition Clements

Rounding and Errors

• Floating-point arithmetic can increase the number of bits in the fractional part of a result.
• To keep the number of fractional bits constant, the result must be rounded.
o Rounding introduces an error.
• The rounding mechanisms include:
o Truncation (i.e., dropping the unwanted bits), which rounds towards zero; a.k.a. rounding down.
o Rounding towards positive or negative infinity: the nearest valid floating-point number in the direction of positive infinity (for positive values) or negative infinity (for negative values) is chosen; a.k.a. rounding up. In both cases the result moves away from zero.
o Rounding to nearest: the closest valid floating-point number to the actual value is used.


This slide is modified from the original slide by the author A. Clements and used with permission. New content added and copyrighted by © Mahmoud R. El-Sakka.

Rounding and Errors

• Integer rounding examples:
Rounding towards zero (i.e., rounding down)
o +4.7 truncated, i.e., rounded towards zero → +4
o –4.7 truncated, i.e., rounded towards zero → –4
In truncation, we simply discard the extra digits, regardless of whether the number
is positive or negative. The result is rounding towards zero.
Rounding towards ± infinity (i.e., rounding up)
It is the opposite of rounding towards zero.
o +4.7 rounded towards +infinity → +5
o –4.7 rounded towards –infinity → –5
Rounding to nearest
o +4.7 rounded to nearest → +5
o –4.7 rounded to nearest → –5
o +4.3 rounded to nearest → +4
o –4.3 rounded to nearest → –4
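
The three integer rounding rules above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function names are hypothetical, and "rounding towards ± infinity" is implemented in the slides' sense, i.e., away from zero.

```python
import math

def round_toward_zero(x: float) -> int:
    """Truncation: drop the fractional part."""
    return math.trunc(x)

def round_away_from_zero(x: float) -> int:
    """Positive values move towards +infinity, negative values towards -infinity."""
    return int(math.copysign(math.ceil(abs(x)), x))

def round_to_nearest(x: float) -> int:
    """Closest integer; Python's round() sends exact ties to the even neighbour."""
    return round(x)

for value in (4.7, -4.7, 4.3, -4.3):
    print(value,
          round_toward_zero(value),
          round_away_from_zero(value),
          round_to_nearest(value))
```

Note that the first two rules give different answers for every non-integer input, whereas rounding to nearest agrees with one or the other depending on the discarded fraction.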


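Rounding and Errors

• In binary, when the number to be rounded is exactly midway between two points
on the floating-point line, rounding to nearest selects the value whose
least-significant bit is zero (i.e., rounding to an even binary significand).
• For example, when keeping 23 fractional bits:
o 0.111000011110000111100001000₂ will be rounded to
0.11100001111000011110000₂ (the discarded bits 1000 are exactly half, and the retained LSB is already 0)
o 0.111000011110000111100011000₂ will be rounded to
0.11100001111000011110010₂ (the discarded bits 1000 are exactly half, and the retained LSB is 1, so the value is rounded up)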
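
A small sketch of round-to-nearest with ties to even, operating directly on a string of fractional bits, reproduces the two examples above. The helper name `round_bits_to_nearest_even` is an assumption made for illustration only.

```python
def round_bits_to_nearest_even(fraction_bits: str, keep: int) -> str:
    """Round a string of fractional bits to `keep` bits (nearest, ties to even)."""
    kept = int(fraction_bits[:keep], 2)
    dropped = fraction_bits[keep:]
    if dropped:
        half = 1 << (len(dropped) - 1)   # value of the pattern 100...0
        rest = int(dropped, 2)
        # Round up when above the midpoint, or at the midpoint with an odd LSB.
        if rest > half or (rest == half and kept & 1):
            kept += 1                    # a carry out would yield keep + 1 bits
    return format(kept, f"0{keep}b")

print(round_bits_to_nearest_even("111000011110000111100001000", 23))
# 11100001111000011110000
print(round_bits_to_nearest_even("111000011110000111100011000", 23))
# 11100001111000011110010
```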


Normalization
• A number is called normalized when it is written in scientific notation
with a single non-zero digit before the radix point (i.e., the integer
part consists of a single non-zero digit).

Example 1:
• The number 123.456₁₀ is not normalized, as the integer part is not a
single non-zero digit.
• To normalize it, move the decimal point two positions to the left and
compensate for this move by multiplying the number by 100, i.e.,
1.23456₁₀ × 10^2

Example 2:
• The number 0.00123₁₀ is not normalized, as the integer part is not a
single non-zero digit.
• To normalize it, move the decimal point three positions to the right and
compensate for this move by dividing the number by 1000, i.e.,
1.23₁₀ × 10^-3

• In base b, a normalized number has the form ± d0 . d1 d2 d3 ... × b^n,
where d0 ≠ 0, and d0, d1, d2, d3, ... are integers between 0 and b – 1.

Floating-point Numbers
• Floating-point arithmetic lets you handle the very large and very small
values found in scientific applications.
• Floating-point is also called scientific notation, because scientists use it
to represent large numbers (e.g., 1.2345 × 10^20) and small numbers that
are very close to zero, but not zero (e.g., 0.45679999 × 10^-50).
• A floating-point value is encoded as two components: a number and
an adjustment to the location of the radix point within the number.

• A binary floating-point number is represented by
mantissa × 2^exponent
o for example, 101010.111110₂ can be represented by
1.01010111110₂ × 2^5, where
▪ the significant digits (or simply the significand) are 1.01010111110
and
▪ the exponent is 5 (00000101₂ in 8-bit binary arithmetic).
• The term mantissa has been replaced by significand to indicate the
number of significant bits in a floating-point number.
• Because a floating-point number is defined as
the product of two values, a floating-point value is not unique;
for example, 10.110₂ × 2^4 = 1.011₂ × 2^5.

Normalization of Floating-point Numbers

• In the IEEE-754 Standard for Floating-Point Arithmetic, the significand
is always normalized (unless it represents a zero or an underflow).
• A normalized binary significand always has a leading 1 (i.e., 1 in the MSB).
• The normalized absolute non-zero values of the IEEE-754 FP numbers
are always in the range
1.000…0₂ × 2^-e (the minimum absolute value) to 1.111…1₂ × 2^e (the maximum absolute value)
Note: the book is missing the –ve sign on the exponent of the minimum value.
• Floating-point normalization leads to the highest available precision,
as all significant bits are utilized.
o the un-normalized 8-bit significand 0.0000101 has only three
significant bits (three, not four), whereas
o the normalized 8-bit significand 1.0100011 has eight
significant bits.
o If a floating-point calculation yields the value 0.110...₂ × 2^e,
the result is normalized to give 1.10...₂ × 2^(e–1).
o Similarly, the result 10.1...₂ × 2^e is normalized to 1.01...₂ × 2^(e+1).

Significand and Exponent Encoding

• The significand of an IEEE-754 floating-point number is represented in
sign and magnitude form.
• The exponent is represented in a biased form,
by adding a constant to the true exponent.
• Suppose an 8-bit exponent is used and all exponents are biased by 127.
o If the true exponent is 0, it will be encoded as 0 + 127 = 127.
o If the true exponent is –2, it will be encoded as –2 + 127 = 125.
o If the true exponent is +2, it will be encoded as +2 + 127 = 129.
• A real number such as 1010.1111₂ is normalized to get +1.0101111₂ × 2^3.
o The true exponent is +3, which is encoded as a biased exponent of
3 + 127; that is, 130₁₀ or 10000010₂ in binary form.
• Likewise, if a biased exponent is 130₁₀, the true exponent is 130 – 127 = 3.

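
The excess-127 exponent encoding described above can be sketched as a pair of helper functions. The names `encode_exponent` and `decode_exponent` are hypothetical, and the range check assumes that biased values 1 to 254 are the ones available to normalized numbers.

```python
BIAS = 127  # excess-127 code for the 8-bit exponent field

def encode_exponent(true_exponent: int) -> str:
    """Return the 8-bit biased exponent field as a bit string."""
    biased = true_exponent + BIAS
    if not 1 <= biased <= 254:
        raise ValueError("true exponent is outside the normalized range")
    return format(biased, "08b")

def decode_exponent(field: str) -> int:
    """Recover the true exponent from an 8-bit biased exponent field."""
    return int(field, 2) - BIAS

print(encode_exponent(3))           # 10000010
print(decode_exponent("10000010"))  # 3
```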

Significand and Exponent Encoding

• A 32-bit single-precision IEEE-754 floating-point number is represented
by the bit sequence
S EEEEEEEE 1.FFFFFFFFFFFFFFFFFFFFFFF
o S is the sign bit,
▪ 0 means positive significand,
▪ 1 means negative significand
o E is an eight-bit biased exponent that tells you how to shift the
binary point, and
o F is a 23-bit fractional significand.
o The leading 1 and the binary point in front of the significand
are omitted when the number is encoded.
• A floating-point number X is defined as:

1 ≤ E ≤ 254 → X = (–1)^S × 2^(E – B) × 1.F

In this case, B is 127, i.e., excess-127 code.
When 1 ≤ E ≤ 254,
the significand = 1 + the fractional significand F
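
As a rough sketch of the formula above, the following Python splits a 32-bit pattern into S, E, and F and evaluates (–1)^S × 2^(E – B) × 1.F for the normalized case. The function names are illustrative assumptions; the sample pattern 0x45802100 is the encoding of 4100.125 and is cross-checked against the standard `struct` module.

```python
import struct

def fields(bits: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into (S, E, F)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def normalized_value(bits: int, bias: int = 127) -> float:
    """Evaluate (-1)^S * 2^(E - B) * 1.F; valid only for 1 <= E <= 254."""
    s, e, f = fields(bits)
    if not 1 <= e <= 254:
        raise ValueError("not a normalized number")
    return (-1) ** s * 2.0 ** (e - bias) * (1 + f / 2 ** 23)

pattern = 0x45802100
print(fields(pattern))            # (0, 139, 8448)
print(normalized_value(pattern))  # 4100.125
print(struct.unpack(">f", pattern.to_bytes(4, "big"))[0])  # 4100.125
```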

Significand and Exponent Encoding

• If the exponent EEEEEEEE > 0 (i.e., 1 ≤ E ≤ 254), the significand of an IEEE-754
floating-point number is normalized in the range 1.0000...00 to 1.1111...11.
• If the exponent EEEEEEEE = 0, the significand is represented without normalization.
This form is used when it is impossible to normalize the number.
o In such cases, the floating-point number X is defined as:
S 00000000 0.FFFFFFFFFFFFFFFFFFFFFFF
E = 0 → X = (–1)^S × 2^(0 – (B – 1)) × 0.F
In this case, B – 1 is 126, i.e., excess-126 code.
When E = 0, the significand = 0 + the fractional significand F
where
o S is the sign bit,
▪ 0 means positive significand,
▪ 1 means negative significand
o E = 0
▪ the exponent was biased by B – 1
o F is the fractional significand
▪ As E = 0, the significand was encoded without normalization,
i.e., 0.F without an implicit leading one

• When E = 0 and F ≠ 0 → ± denormalized (underflow) number


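
The E = 0 formula, X = (–1)^S × 2^(0 – (B – 1)) × 0.F, can be sketched in the same style. The function name `subnormal_value` is a hypothetical helper; the two sample patterns are the smallest positive underflow number and a negative underflow number with F = 110 0000 ... 0000.

```python
def subnormal_value(bits: int, bias: int = 127) -> float:
    """Evaluate (-1)^S * 2^(0 - (B - 1)) * 0.F; valid only when E = 0."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e != 0:
        raise ValueError("E must be 0 for an un-normalized number")
    return (-1) ** s * 2.0 ** (0 - (bias - 1)) * (f / 2 ** 23)

print(subnormal_value(0x00000001) == 2.0 ** -149)          # True
print(subnormal_value(0x80600000) == -0.75 * 2.0 ** -126)  # True
```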

Significand and Exponent Encoding

• The floating-point value of zero is represented by
0.00...00 × 2^(most negative exponent)
i.e., zero is represented by
o a zero significand and
o a zero biased exponent
as Figure 2.6 demonstrates.
In this floating-point representation,
how many zeros do we have?
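
One way to explore this question is to look at the bit patterns directly. The sketch below (with a hypothetical helper `bit_pattern`) packs +0.0 and –0.0 as single-precision values using Python's `struct` module; because the sign bit is independent of E and F, two distinct patterns appear even though the values compare as equal.

```python
import struct

def bit_pattern(x: float) -> str:
    """Return the big-endian binary32 encoding of x as 8 hex digits."""
    return struct.pack(">f", x).hex()

print(bit_pattern(0.0))    # 00000000
print(bit_pattern(-0.0))   # 80000000
print(0.0 == -0.0)         # True
```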


Significand and Exponent Encoding

[Figure: the IEEE-754 single-precision format (float type in Java and C) and double-precision format (double type in Java and C), with the biased and unbiased exponent values marked, including the E = 255 case.]

• The L value (the leading bit of the significand) = 1, if and only if E ≠ 0.
• The L value = 0, if and only if E = 0.
• If E ≠ 0, true exponent = biased exponent – bias.
• If E = 0, true exponent = 0 – (bias – 1).
• Biased exponents 1 → 254 are used for NORMALIZED numbers.
• The book flipped the meaning of S. It is S = 0 for +ve and S = 1 for –ve.
• In the IEEE single-precision representation,
o the largest normalized absolute number is
2^+127 × 1.111…1₂ ≈ 2^+128 = 10^+38.5318394 ≈ 3.4×10^+38.
o the smallest normalized absolute number is
2^-126 × 1.000…0₂ = 2^-126 = 10^-37.9297794 ≈ 1.17×10^-38.
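
The two limits quoted above can be checked numerically. This is only a sketch: it builds the largest normalized significand as 2 – 2^-23 (i.e., 1 followed by 23 fractional ones) and compares the results with the bit patterns 0x7F7FFFFF and 0x00800000 decoded by `struct`.

```python
import math
import struct

largest = (2 - 2.0 ** -23) * 2.0 ** 127   # 1.111...1 (23 ones) x 2^127
smallest = 1.0 * 2.0 ** -126              # 1.000...0 x 2^-126

print(f"{largest:.7e}")    # 3.4028235e+38
print(f"{smallest:.7e}")   # 1.1754944e-38
print(math.log10(2.0 ** 128), math.log10(smallest))

# Cross-check against the corresponding binary32 bit patterns.
print(struct.unpack(">f", bytes.fromhex("7F7FFFFF"))[0] == largest)    # True
print(struct.unpack(">f", bytes.fromhex("00800000"))[0] == smallest)   # True
```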

Significand and Exponent Encoding

In this case, B – 1 is 126:

E = 0 → X = (–1)^S × 2^(0 – (B – 1)) × 0.F

• Underflow occurs when the result of a calculation is a very small number,
smaller in magnitude than the smallest value representable as a
normalized floating-point number in the target data type.
• Replacing an underflow case by a zero might be acceptable for addition,
but it is not acceptable for multiplication: adding zero instead of a tiny value
barely changes a sum, whereas multiplying by zero forces the whole product to zero.
• NaN means Not a Number, e.g., the result of 0 ÷ 0, ∞ ÷ ∞, 0 × ∞, or ∞ – ∞.
• In a NaN, F must be non-zero (E = 255 with F = 0 encodes infinity); apart from that,
the value of F is ignored by applications.

From Binary to 32-bit IEEE-754 FP

• Example 1(a):
Convert –11110000111100.00111100001111₂ into
a 32-bit single-precision IEEE-754 FP value.
o The number is negative → S = 1
o The significand is 11110000111100.00111100001111₂
o The normalized significand is
1.111000011110000111100001111₂ × 2^13
o The biased exponent is the true exponent plus 127; that is,
13 + 127 = 140₁₀ = 1000 1100₂
Hence, E = 1000 1100₂
o To encode the F value, we will ignore the leading 1 and
we will only consider the first 23 bits after the binary point,
i.e., 11100001111000011110000₂; the remaining bits (1111₂) are discarded
o The ignored part of the significand is rounded to the nearest,
hence the value of F = 11100001111000011110001₂
o The final number is 1100 0110 0111 0000 1111 0000 1111 0001₂,
or C670F0F1₁₆
or 0xC670F0F1
or 0XC670F0F1

Hexadecimal digits: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111,
8 = 1000, 9 = 1001, A = 1010, B = 1011, C = 1100, D = 1101, E = 1110, F = 1111
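
The result of Example 1(a) can be cross-checked with Python's standard `struct` module, which packs a value into the IEEE-754 binary32 format. This is a verification sketch rather than the slide's manual procedure; the variable names are illustrative.

```python
import struct

# Magnitude from Example 1(a): 11110000111100.00111100001111 in binary.
integer_part = int("11110000111100", 2)
fraction_part = int("00111100001111", 2) / 2 ** 14   # 14 fractional bits
value = -(integer_part + fraction_part)

# ">f" packs a big-endian single-precision value; the conversion rounds to nearest.
encoded = struct.pack(">f", value)
print(encoded.hex().upper())   # C670F0F1
```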


From 32-bit IEEE-754 FP to Binary

• Example 1(b): Convert C670F0F1₁₆ (which can also be written as 0xC670F0F1) from
a 32-bit single-precision IEEE-754 FP value into a binary value.
This is the same value as in Example 1(a).
o Unpack the number into sign bit, biased exponent, and fractional
significand:
C670F0F1₁₆ → 1100 0110 0111 0000 1111 0000 1111 0001₂
▪ S = 1
▪ E = 100 0110 0
▪ F = 111 0000 1111 0000 1111 0001
o As the sign bit is 1, the number is negative.
o Subtract 127 from the biased exponent 1000 1100₂ to get
the true exponent → 1000 1100₂ – 0111 1111₂ = 0000 1101₂ = 13₁₀.
o The fractional significand is .111 0000 1111 0000 1111 0001₂.
o Reinserting the leading one gives 1.111 0000 1111 0000 1111 0001₂.
o The number is –1.111 0000 1111 0000 1111 0001₂ × 2^13
= –1111 0000 1111 00.00 1111 0001₂
Note that the decoded value is
–1111 0000 1111 00.00 1111 0001₂, not the original
–1111 0000 1111 00.00 1111 0000 1111₂.
The difference is due to the rounding error.
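
The size of this rounding error can be measured exactly. The sketch below decodes 0xC670F0F1 with `struct` and subtracts the original 28-bit value using exact rational arithmetic from the standard `fractions` module; the variable names are illustrative.

```python
import struct
from fractions import Fraction

decoded = struct.unpack(">f", bytes.fromhex("C670F0F1"))[0]

# Original value: -11110000111100.00111100001111 (14 fractional bits).
original = -Fraction(int("1111000011110000111100001111", 2), 2 ** 14)

error = Fraction(decoded) - original
print(decoded)   # -15420.2353515625
print(error)     # -1/16384
```

The error of 2^-14 corresponds to the discarded bits 1111 being rounded up to the next representable value.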

From 32-bit IEEE-754 FP to Decimal

• Example 2: Convert 1111 1110 0110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
▪ S = 1
▪ E = 1111 1100
▪ F = 110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o Subtract 127 from the biased exponent 1111 1100₂ to get
the true exponent → 1111 1100₂ – 0111 1111₂ = 0111 1101₂ = 125₁₀.
o The fractional significand is .110 0000 0000 0000 0000 0000₂.
o Reinserting the leading one gives 1.110 0000 0000 0000 0000 0000₂.
o The number is –1.11₂ × 2^125 = –1.75₁₀ × 2^125
2^125 = 10^z → z = log10(2^125) = 125 × 0.30103 = 37.62875
2^125 = 10^37.62875 = 10^37 × 10^0.62875 = 10^37 × 4.25353
–1.75 × 2^125 = –1.75 × 10^37 × 4.25353 = –7.4436775 × 10^37


From 32-bit IEEE-754 FP to Decimal

• Example 3: Convert 1000 0000 0110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
▪ S = 1
▪ E = 0000 0000
▪ F = 110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o As E = 0 → true exponent = 0 – (127 – 1) = –126
o The fractional significand is .110 0000 0000 0000 0000 0000₂.
o As E = 0, the fractional significand is not normalized, and the L value = 0.
o As E = 0 and F ≠ 0, this is an underflow case.
o The number is –0.11₂ × 2^-126 = –0.75 × 2^-126
2^-126 = 10^z → z = log10(2^-126) = –126 × 0.30103 = –37.92978
2^-126 = 10^-37.92978 = 10^-37 × 10^-0.92978 = 10^-37 × 0.11755
–0.75 × 2^-126 = –0.75 × 10^-37 × 0.11755 = –0.088162 × 10^-37
= –8.8162 × 10^-39, whose magnitude is smaller than that of the smallest normalized value (1.17×10^-38)

From 32-bit IEEE-754 FP to Decimal

• Example 4: Convert 0111 1111 1000 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
▪ S = 0
▪ E = 1111 1111
▪ F = 000 0000 0000 0000 0000 0000
o As the sign bit is 0, the number is positive.
o As E = 255 → either an infinity case or a NaN case
o The fractional significand is .000 0000 0000 0000 0000 0000₂.
o As the biased exponent is 255 and the F and the S values are zero,
this is a +infinity case, i.e., a number larger than
3.4028235 × 10^+38


From 32-bit IEEE-754 FP to Decimal

• Example 5: Convert 1111 1111 1110 0000 0000 0000 0000 0000₂ from
a 32-bit single-precision IEEE-754 FP value into a decimal value.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
▪ S = 1
▪ E = 1111 1111
▪ F = 110 0000 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o As E = 255 → either an infinity case or a NaN case
o The fractional significand is .110 0000 0000 0000 0000 0000₂.
o As the biased exponent is 255, the F value is NOT zero, and the S
value is 1, this is a –NaN case (Not a Number),
e.g., the result of a 0 ÷ 0, ∞ ÷ ∞, 0 × ∞, or ∞ – ∞ operation.
o In –NaN or +NaN cases, the value of F is ignored (as long as it is not zero).
o The value is –NaN.
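
The case analysis used in Examples 2 to 5 (E = 0, 1 ≤ E ≤ 254, or E = 255, combined with F = 0 or F ≠ 0) can be summarized as a small classifier. The function name `classify` and the returned labels are assumptions made for this sketch.

```python
def classify(bits: int) -> str:
    """Name the kind of value encoded by a 32-bit IEEE-754 pattern."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = "-" if s else "+"
    if e == 255:                      # all ones: infinity or NaN
        return sign + ("infinity" if f == 0 else "NaN")
    if e == 0:                        # all zeros: zero or underflow
        return sign + ("zero" if f == 0 else "denormalized")
    return sign + "normalized"        # 1 <= E <= 254

for pattern in (0xFE600000, 0x80600000, 0x7F800000, 0xFFE00000):
    print(f"{pattern:08X}", classify(pattern))
```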

From 32-bit IEEE-754 FP to Decimal

• Example 6: Convert C46C0000₁₆ from a 32-bit single-precision
IEEE-754 FP value into a decimal value.
o Convert the hexadecimal number into binary form
C46C0000₁₆ = 1100 0100 0110 1100 0000 0000 0000 0000₂.
o Unpack the number into sign bit, biased exponent, and fractional
significand.
▪ S = 1
▪ E = 1000 1000
▪ F = 110 1100 0000 0000 0000 0000
o As the sign bit is 1, the number is negative.
o We subtract 127 from the biased exponent 1000 1000₂ to get
the true exponent → 1000 1000₂ – 0111 1111₂ = 0000 1001₂ = 9₁₀ (it is 9, not 7).
o The fractional significand is .110 1100 0000 0000 0000 0000₂.
o Reinserting the leading one gives 1.110 1100 0000 0000 0000 0000₂.
o The number is –1.110 1100 0000 0000 0000 0000₂ × 2^9,
or –1110 1100 00.00 0000 0000 0000₂ (i.e., –944.0₁₀).

Hexadecimal digits: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111,
8 = 1000, 9 = 1001, A = 1010, B = 1011, C = 1100, D = 1101, E = 1110, F = 1111


From Binary to 32-bit IEEE-754 FP

• Example 7: Convert 0.1000 1000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 0.1000 1000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is
1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-1
o Hence the number will be
1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-125
o As the exponent is greater than or equal to –126, the number
will be represented as a normalized number
o The number = 1.000 1000 0000 0000 0000 0000 0001 11₂ × 2^-125
o As F is normalized, the biased exponent will be
the true exponent plus 127;
that is, –125 + 127 = 2. Hence, E = 0000 0010₂
o The encoded F value (23 bits), rounded to the nearest, will be 000 1000 0000 0000 0000 0000
o The final number is 0000 0001 0000 1000 0000 0000 0000 0000₂,
or 01080000₁₆.


From Binary to 32-bit IEEE-754 FP

• Example 8: Convert 0.0000 1000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 0.0000 1000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is
1.000 0000 0000 0000 0000 0001 11₂ × 2^-5
o Hence the number will be 1.000 0000 0000 0000 0000 0001 11₂ × 2^-129
o As the exponent is less than –126, the number can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.001 0000 0000 0000 0000 0000 0011 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0. Hence, E = 0000 0000₂
o The encoded F value (23 bits), rounded to the nearest, will be 001 0000 0000 0000 0000 0000
o The final number is 0000 0000 0001 0000 0000 0000 0000 0000₂,
or 00100000₁₆.

From Binary to 32-bit IEEE-754 FP

• Example 9: Convert 0.0000 0000 0000 0000 0000 0000 0001 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 0.0000 0000 0000 0000 0000 0000 0001 11₂
The normalized fractional part is 1.11₂ × 2^-28
o Hence the number will be 1.11₂ × 2^-152
o As the exponent is less than –126, the number can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.000 0000 0000 0000 0000 0000 0011 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0. Hence, E = 0000 0000₂
o The encoded F value (23 bits), rounded to the nearest, will be 000 0000 0000 0000 0000 0000
o The final number is 0000 0000 0000 0000 0000 0000 0000 0000₂,
or 00000000₁₆.
I.e., the number is encoded as ZERO

From Binary to 32-bit IEEE-754 FP

• Example 10: Convert 0.0000 0000 0000 0000 0000 0000 0111 11₂ × 2^-124
into a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 0.0000 0000 0000 0000 0000 0000 0111 11₂
The normalized fractional part is 1.1111₂ × 2^-26
o Hence the number will be 1.1111₂ × 2^-150
o As the exponent is less than –126, the number can NOT be
represented as a normalized number (the number is too small)
o Instead, we will attempt to represent it as
an un-normalized underflow number with exponent = –126
o The number = 0.000 0000 0000 0000 0000 0000 1111 1₂ × 2^-126
o As F is un-normalized, the biased exponent will be
the true exponent plus 127 – 1;
that is, –126 + 127 – 1 = 0. Hence, E = 0000 0000₂
o The encoded F value (23 bits), rounded to the nearest, will be 000 0000 0000 0000 0000 0001
o The final number is 0000 0000 0000 0000 0000 0000 0000 0001₂,
or 00000001₁₆, the smallest non-zero positive un-normalized
underflow number (1.4012985×10^-45)

From Binary to 32-bit IEEE-754 FP

• Example 11: Convert 1111.1111 1111 1111 1111 1111 011₂ × 2^124 into
a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 1111.1111 1111 1111 1111 1111 011₂
The normalized fractional part is
1.111 1111 1111 1111 1111 1111 011₂ × 2^3
o Hence the number will be 1.111 1111 1111 1111 1111 1111 011₂ × 2^127
o The biased exponent is the true exponent plus 127;
that is, 127 + 127 = 254. Hence, E = 1111 1110₂
o To encode the F value, we will ignore the leading 1 and
we will only consider the first 23 bits after the binary point, rounded to the nearest, i.e.,
111 1111 1111 1111 1111 1111
o The final number is 0111 1111 0111 1111 1111 1111 1111 1111₂,
or 7F7FFFFF₁₆.
o This number is the largest positive normalized number
(3.4028235×10^+38)

From Binary to 32-bit IEEE-754 FP

• Example 12: Convert 1111.1111 1111 1111 1111 1111 111₂ × 2^124 into
a 32-bit single-precision IEEE-754 FP value.
o The number is positive → S = 0
o The fractional part is 1111.1111 1111 1111 1111 1111 111₂
The normalized fractional part is
1.111 1111 1111 1111 1111 1111 111₂ × 2^3
o Hence the number will be 1.111 1111 1111 1111 1111 1111 111₂ × 2^127
o To encode the F value, we will only consider the first 23 bits after the
binary point
o Note that the rounding here will add 1 to the fraction to make it
10.000 0000 0000 0000 0000 0000₂ × 2^127
o As a result, the number needs to be normalized again:
1.0000 0000 0000 0000 0000 0000₂ × 2^128
o The true exponent of the normalized number is > 127,
hence the number will be encoded as +infinity, i.e.,
▪ the F value will be 000 0000 0000 0000 0000 0000
▪ the E value will be 1111 1111₂
As long as the true exponent is > 127, the number will be encoded as infinity,
regardless of the value of F.
o The final number is 0111 1111 1000 0000 0000 0000 0000 0000₂
i.e., +infinity (7F800000₁₆)
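
The procedure followed in Examples 7 to 12 (normalize, fall back to exponent –126 when the number is too small, round to nearest, renormalize after a carry, and encode as infinity when the exponent exceeds 127) can be sketched with exact rational arithmetic. This is an illustrative sketch under those assumptions, not a reference implementation; `from_binary` and `encode_binary32` are hypothetical helper names.

```python
from fractions import Fraction

def from_binary(digits: str, exp2: int) -> Fraction:
    """Exact value of a binary literal such as '0.1000 1' times 2**exp2."""
    whole, _, frac = digits.replace(" ", "").partition(".")
    return Fraction(int(whole + frac, 2), 2 ** len(frac)) * Fraction(2) ** exp2

def encode_binary32(value: Fraction) -> int:
    """Encode an exact value as a 32-bit pattern (round to nearest, ties to even)."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude == 0:
        return sign << 31
    # True exponent of the normalized form 1.xxx * 2**exp.
    exp = 0
    while magnitude >= Fraction(2) ** (exp + 1):
        exp += 1
    while magnitude < Fraction(2) ** exp:
        exp -= 1
    exp = max(exp, -126)            # too small: use the un-normalized form
    # Significand in units of 2**-23; round() on a Fraction rounds ties to even.
    n = round(magnitude / Fraction(2) ** (exp - 23))
    if n >= 1 << 24:                # rounding carried out: normalize again
        n >>= 1
        exp += 1
    if exp > 127:
        return (sign << 31) | (255 << 23)           # infinity
    if n < 1 << 23:
        return (sign << 31) | n                     # E = 0, no leading 1
    return (sign << 31) | ((exp + 127) << 23) | (n - (1 << 23))

x = from_binary("1111.1111 1111 1111 1111 1111 111", 124)   # Example 12
print(f"{encode_binary32(x):08X}")   # 7F800000
```

Exact fractions are used so that the only rounding performed is the one the examples describe; a float-based version would round twice.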

From Decimal to 32-bit IEEE-754 FP

• Example 13: Convert 4100.125₁₀ into a 32-bit single-precision
IEEE-754 FP value.
o Convert 4100.125₁₀ into a fixed-point binary
▪ 4100₁₀ = 1 0000 0000 0100₂ and
▪ 0.125₁₀ = 0.001₂.
▪ Therefore, 4100.125₁₀ = 1000 0000 0010 0.001₂.
o Normalize 1000 0000 0010 0.001₂ to 1.000 0000 0010 0001₂ × 2^12.
o The sign bit, S, is 0 because the number is positive
o The biased exponent is the true exponent plus 127; that is,
12₁₀ + 127₁₀ = 139₁₀ = 1000 1011₂
o The fractional significand is 000 0000 0010 0001 0000 0000
▪ the leading 1 is stripped and
▪ the significand is expanded to 23 bits.
o The final number is 0100 0101 1000 0000 0010 0001 0000 0000₂,
or 45802100₁₆.

Hexadecimal digits: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111,
8 = 1000, 9 = 1001, A = 1010, B = 1011, C = 1100, D = 1101, E = 1110, F = 1111
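
Example 13 can be checked in a few lines of Python. The sketch below prints the binary form of the integer part and lets the standard `struct` module produce the single-precision encoding; the variable names are illustrative.

```python
import struct

print(format(4100, "b"))        # 1000000000100
print(0.125 == 2 ** -3)         # True, i.e., 0.001 in binary

encoded = struct.pack(">f", 4100.125)   # big-endian binary32
print(encoded.hex())            # 45802100
```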

The 32-bit IEEE-754 FP

[Figure: the 32-bit IEEE-754 floating-point number line, divided into normalized –ve, un-normalized, and normalized +ve regions. The landmark values, from the most negative to the most positive, are listed below.]

o –∞: S=1, E=255, F=0x000000
o Largest negative normalized number (i.e., there is a hidden leading 1): S=1, E=254 (true exp.=+127), F=0x7FFFFF, value = –3.40282×10^+38
o Smallest negative normalized number (i.e., there is a hidden leading 1): S=1, E=1 (true exp.=-126), F=0x000000, value = –1.17549×10^-38
o Largest underflow negative number (un-normalized, i.e., no leading 1): S=1, E=0 (true exp.=-126), F=0x7FFFFF, value = –1.17549×10^-38
o Smallest underflow negative number (un-normalized, i.e., no leading 1): S=1, E=0 (true exp.=-126), F=0x000001, value = –1.4013×10^-45
o ±zero (un-normalized, i.e., no leading 1): S=0 or 1, E=0 (true exp.=-126), F=0x000000
o Smallest underflow positive number (un-normalized, i.e., no leading 1): S=0, E=0 (true exp.=-126), F=0x000001, value = +1.4013×10^-45
o Largest underflow positive number (un-normalized, i.e., no leading 1): S=0, E=0 (true exp.=-126), F=0x7FFFFF, value = +1.17549×10^-38
o Smallest positive normalized number (i.e., there is a hidden leading 1): S=0, E=1 (true exp.=-126), F=0x000000, value = +1.17549×10^-38
o Largest positive normalized number (i.e., there is a hidden leading 1): S=0, E=254 (true exp.=+127), F=0x7FFFFF, value = +3.40282×10^+38
o +∞: S=0, E=255, F=0x000000; the next value after the largest positive normalized number.

Note: Due to the decimal precision used, the smallest normalized number and the largest underflow number (both shown as ±1.17549×10^-38) look the same, but they are not.

• The step size between consecutive floating-point numbers is NOT always constant, unlike in integer numbers.

• To compare two floating-point values without fully decoding them, compare their S, E, and then F values, in that order.
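
Both closing remarks can be illustrated with bit patterns. In the sketch below (all helper names are assumptions), `sort_key` orders values by looking only at S and then at E followed by F, and the gap between neighbouring patterns is measured near 1.0 and near 4096.0. NaN patterns are outside the scope of this ordering.

```python
import struct

def bits_of(x: float) -> int:
    """32-bit IEEE-754 pattern of x (x is rounded to single precision)."""
    return int.from_bytes(struct.pack(">f", x), "big")

def value_of(bits: int) -> float:
    """Value encoded by a 32-bit IEEE-754 pattern."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

def sort_key(bits: int) -> int:
    """Order by S first, then by E followed by F (reversed for negatives)."""
    magnitude = bits & 0x7FFFFFFF      # E and F as one 31-bit integer
    return -magnitude if bits >> 31 else magnitude

values = [4100.125, -944.0, 1.5, -1.5, 0.0, 2.0 ** -140, -(2.0 ** -140)]
by_bits = sorted(values, key=lambda v: sort_key(bits_of(v)))
print(by_bits == sorted(values))       # True

# The step size depends on the exponent.
gap_at_1 = value_of(bits_of(1.0) + 1) - 1.0
gap_at_4096 = value_of(bits_of(4096.0) + 1) - 4096.0
print(gap_at_1 == 2.0 ** -23, gap_at_4096 == 2.0 ** -11)   # True True
```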