
COA - Unit2 Floating Point Arithmetic 3

The document discusses floating point representations and arithmetic. It begins with an example of decimal division and then covers IEEE 754 floating point number representations using sign-magnitude notation. It explains how numbers are represented with a sign bit, exponent field, and fraction field. It also discusses details like exponent biasing and special values like infinity and NaN. Finally, it provides examples of floating point addition and multiplication algorithms and shows MIPS instructions for floating point operations.

Uploaded by

Devika csbs

Floating Point

• Today’s topics:
  - Division
  - IEEE 754 representations
  - FP arithmetic

Divide Example
• Divide $7_{ten}$ ($0000\ 0111_{two}$) by $2_{ten}$ ($0010_{two}$)

Iter  Step                             Quot  Divisor    Remainder
0     Initial values                   0000  0010 0000  0000 0111
1     Rem = Rem - Div                  0000  0010 0000  1110 0111
      Rem < 0 → +Div, shift 0 into Q   0000  0010 0000  0000 0111
      Shift Div right                  0000  0001 0000  0000 0111
2     Same steps as 1                  0000  0001 0000  1111 0111
                                       0000  0001 0000  0000 0111
                                       0000  0000 1000  0000 0111
3     Same steps as 1                  0000  0000 0100  0000 0111
4     Rem = Rem - Div                  0000  0000 0100  0000 0011
      Rem >= 0 → shift 1 into Q        0001  0000 0100  0000 0011
      Shift Div right                  0001  0000 0010  0000 0011
5     Same steps as 4                  0011  0000 0001  0000 0001

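The iterations in the table can be sketched in C. This is a minimal illustration, not the hardware datapath: the function name `restoring_divide4` and the `DivResult` type are hypothetical, and the sketch assumes 4-bit operands with the divisor starting in the upper half of an 8-bit register, as in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder produced by the sketch below. */
typedef struct {
    uint8_t quotient;
    uint8_t remainder;
} DivResult;

/* Restoring division of a 4-bit dividend by a 4-bit divisor.
   The divisor starts in the upper half of an 8-bit register and
   moves right once per iteration, as in the table. */
DivResult restoring_divide4(uint8_t dividend, uint8_t divisor)
{
    assert(dividend < 16 && divisor > 0 && divisor < 16);

    int rem = dividend;        /* remainder register, kept signed to test its sign */
    int div = divisor << 4;    /* divisor register: divisor in the upper 4 bits */
    unsigned quot = 0;

    for (int iter = 1; iter <= 5; iter++) {
        rem -= div;                    /* Rem = Rem - Div */
        if (rem < 0) {
            rem += div;                /* negative: add the divisor back */
            quot = quot << 1;          /* shift 0 into Q */
        } else {
            quot = (quot << 1) | 1u;   /* shift 1 into Q */
        }
        div >>= 1;                     /* shift Div right */
    }

    DivResult result = { (uint8_t)quot, (uint8_t)rem };
    return result;
}
```

Calling `restoring_divide4(7, 2)` walks through the same five iterations as the table and ends with quotient 0011 and remainder 0001.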
Hardware for Division

Source: H&P textbook

• A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back

• Similar to multiply, results are placed in Hi (remainder) and Lo (quotient)

Efficient Division

Divisions involving Negatives

• Simplest solution: convert to positive and adjust sign later

• Note that multiple solutions exist for the equation:
  Dividend = Quotient x Divisor + Remainder
  (for example, -7 = -4 x 2 + 1 also satisfies it)

  +7 div +2   Quo = +3   Rem = +1
  -7 div +2   Quo = -3   Rem = -1
  +7 div -2   Quo = -3   Rem = +1
  -7 div -2   Quo = +3   Rem = -1

• Convention: dividend and remainder have the same sign; the quotient is negative if the signs of dividend and divisor disagree. These rules satisfy the equation above.

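The "convert to positive and adjust sign later" approach can be sketched as follows. The function name `signed_divide` and the `SignedDiv` type are hypothetical, and the sketch assumes neither operand is `INT_MIN` (whose absolute value does not fit in an `int`).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int quotient;
    int remainder;
} SignedDiv;

/* Divide the magnitudes, then fix up the signs:
   the remainder takes the sign of the dividend, and the quotient
   is negative when the operand signs disagree.
   Assumes neither operand is INT_MIN. */
SignedDiv signed_divide(int dividend, int divisor)
{
    assert(divisor != 0);

    int q = abs(dividend) / abs(divisor);
    int r = abs(dividend) % abs(divisor);

    if ((dividend < 0) != (divisor < 0))
        q = -q;                 /* signs disagree: negative quotient */
    if (dividend < 0)
        r = -r;                 /* remainder follows the dividend */

    SignedDiv result = { q, r };
    return result;
}
```

C99 and later define `/` as truncating toward zero, so the built-in `/` and `%` operators follow the same convention and can be used to cross-check the four cases above.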
Floating Point

• Normalized scientific notation: single non-zero digit to the left of the decimal (binary) point; example: $3.5 \times 10^{9}$

• $1.010001_{two} \times 2^{-5} = (1 + 0 \times 2^{-1} + 1 \times 2^{-2} + \dots + 1 \times 2^{-6})_{ten} \times 2^{-5}$

• A standard notation enables easy exchange of data between machines and simplifies hardware algorithms; the IEEE 754 standard defines how floating point numbers are represented

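As a worked check of the binary example above, the expansion can be evaluated by keeping only its non-zero terms (a sketch of the arithmetic, nothing more):

```latex
\begin{aligned}
1.010001_{two} \times 2^{-5}
  &= (1 + 2^{-2} + 2^{-6}) \times 2^{-5} \\
  &= 1.265625 \times 0.03125 \\
  &= 0.03955078125_{ten}
\end{aligned}
```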
Sign and Magnitude Representation

Sign    Exponent   Fraction
1 bit   8 bits     23 bits
S       E          F

• More exponent bits → wider range of numbers (not necessarily more numbers; recall there are infinitely many real numbers)

• More fraction bits → higher precision

• Register value = $(-1)^{S} \times F \times 2^{E}$

• Since we are only representing normalized numbers, we are guaranteed that the number is of the form 1.xxxx…
  Hence, in the IEEE 754 standard, the 1 is implicit:
  Register value = $(-1)^{S} \times (1 + F) \times 2^{E}$
Sign and Magnitude Representation
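Sign    Exponent   Fraction
1 bit   8 bits     23 bits
S       E          F

• Largest number that can be represented: almost $2.0 \times 2^{127} \approx 3.4 \times 10^{38}$ (the exponent code for 128 is reserved for infinity and NaN)

• Smallest positive normalized number that can be represented: $1.0 \times 2^{-126} \approx 1.2 \times 10^{-38}$ (the exponent code for -127 is reserved for zero and denorms)

• Overflow: the magnitude of a result is larger than the largest number above;
  Underflow: the magnitude of a non-zero result is smaller than the smallest number above

• Double precision format: occupies two 32-bit registers
  Largest: about $1.8 \times 10^{308}$   Smallest (normalized): about $2.2 \times 10^{-308}$

Sign    Exponent   Fraction
1 bit   11 bits    52 bits
S       E          F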
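The single-precision limits, and what happens when a result falls outside them, can be probed with a short C sketch. All function names here are hypothetical, and the sketch assumes the C implementation stores `float` as a 32-bit IEEE 754 single, which is common but not guaranteed by the language.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single). */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Largest finite single: exponent field 254, fraction all ones. */
float largest_single(void)
{
    return bits_to_float(0x7F7FFFFFu);
}

/* Smallest positive normalized single: exponent field 1, fraction zero. */
float smallest_normal_single(void)
{
    return bits_to_float(0x00800000u);
}

/* Returns 1 if the product of two finite values overflowed to infinity. */
int product_overflows(float x, float y)
{
    volatile float p = x * y;   /* volatile forces rounding to single */
    return (float_to_bits(p) & 0x7FFFFFFFu) == 0x7F800000u;
}

/* Returns 1 if a non-zero product fell below the normalized range
   (it became a denorm or zero, so its exponent field is 0). */
int product_underflows(float x, float y)
{
    volatile float p = x * y;
    uint32_t exponent_field = (float_to_bits(p) >> 23) & 0xFFu;
    return x != 0.0f && y != 0.0f && exponent_field == 0;
}
```

For instance, `product_overflows(largest_single(), 2.0f)` reports an overflow, and `product_underflows(smallest_normal_single(), 0.5f)` reports an underflow.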
Details

• The number “0” has a special code so that the implicit 1 does not
get added: the code is all 0s
(it may seem that this takes up the representation for 1.0, but
given how the exponent is represented, we’ll soon see that
that’s not the case)
(see discussion of denorms (pg. 222) in the textbook)

• The largest exponent value (with zero fraction) represents +/- infinity

• The largest exponent value (with non-zero fraction) represents NaN (not a number), for the result of 0/0 or (infinity minus infinity)

• Note that these choices impact the smallest and largest numbers that can be represented

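The special codes above can be told apart by looking only at the exponent and fraction fields. The following is a minimal sketch for the single-precision format; the `FloatKind` enum and the function names are hypothetical, and `classify_float` assumes a 32-bit IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    KIND_ZERO,       /* exponent all 0s, fraction zero      */
    KIND_DENORM,     /* exponent all 0s, fraction non-zero  */
    KIND_NORMAL,     /* any other exponent                  */
    KIND_INFINITY,   /* exponent all 1s, fraction zero      */
    KIND_NAN         /* exponent all 1s, fraction non-zero  */
} FloatKind;

/* Classify a single-precision bit pattern by its fields. */
FloatKind classify_single(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0)
        return fraction == 0 ? KIND_ZERO : KIND_DENORM;
    if (exponent == 0xFFu)
        return fraction == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;
}

/* Same classification for a float value (assumes IEEE 754 single). */
FloatKind classify_float(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return classify_single(bits);
}
```

The sign bit is ignored, so both +0 and -0 classify as zero and both infinities classify as infinity.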
Exponent Representation

• To simplify sorting, the sign is placed in the first bit

• For a similar reason, the representation of the exponent is also modified: in order to use integer compares, it would be preferable to have the smallest exponent as 00…0 and the largest exponent as 11…1

• This is the biased notation, where a bias is subtracted from the exponent field to yield the true exponent

• IEEE 754 single-precision uses a bias of 127 (the 8-bit field values 0 to 255 then map to true exponents between -127 and 128); double precision uses a bias of 1023

Final representation: $(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

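The final representation can be evaluated directly from the three fields. This is a sketch for normalized single-precision patterns only; the function name `decode_single` is hypothetical, and the special codes (exponent field 0 or 255) are rejected rather than handled.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)
   for a normalized single-precision bit pattern. */
double decode_single(uint32_t bits)
{
    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    /* Exponent fields 0 and 255 are special codes, not handled here. */
    assert(exponent != 0 && exponent != 0xFFu);

    /* 1 + Fraction, where the 23 fraction bits are scaled by 2^-23. */
    double value = 1.0 + (double)fraction / 8388608.0;

    /* Apply 2^(Exponent - 127) by repeated exact doubling or halving. */
    int true_exponent = (int)exponent - 127;
    for (; true_exponent > 0; true_exponent--)
        value *= 2.0;
    for (; true_exponent < 0; true_exponent++)
        value /= 2.0;

    return sign ? -value : value;
}
```

For example, `decode_single(0x3F800000u)` has exponent field 127 and a zero fraction, so it evaluates to 1.0.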
Examples

Final representation: $(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

• Represent $-0.75_{ten}$ in single and double-precision formats ($-0.75_{ten} = -1.1_{two} \times 2^{-1}$)

  Single: (1 + 8 + 23)
  1 0111 1110 1000…000

  Double: (1 + 11 + 52)
  1 0111 1111 110 1000…000

• What decimal number is represented by the following single-precision number?
  1 1000 0001 01000…0000
  Answer: -5.0
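The bit patterns in these examples can be assembled from the sign, true exponent and fraction bits. The sketch below does that for both formats; the function names `encode_single`, `encode_double` and `float_to_bits` are hypothetical, and only normalized numbers are covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack sign, true exponent and 23 fraction bits (bias 127). */
uint32_t encode_single(unsigned sign, int true_exponent, uint32_t fraction)
{
    assert(sign <= 1u);
    assert(true_exponent >= -126 && true_exponent <= 127);
    assert(fraction < (1u << 23));
    return ((uint32_t)sign << 31)
         | ((uint32_t)(true_exponent + 127) << 23)
         | fraction;
}

/* Pack sign, true exponent and 52 fraction bits (bias 1023). */
uint64_t encode_double(unsigned sign, int true_exponent, uint64_t fraction)
{
    assert(sign <= 1u);
    assert(true_exponent >= -1022 && true_exponent <= 1023);
    assert(fraction < (1ULL << 52));
    return ((uint64_t)sign << 63)
         | ((uint64_t)(true_exponent + 1023) << 52)
         | fraction;
}

/* Bit pattern the machine uses for a float (assumes IEEE 754 single). */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For -0.75 the sign is 1, the true exponent is -1 and only the top fraction bit is set, so `encode_single(1, -1, 0x400000u)` yields 0xBF400000, which is the pattern 1 0111 1110 1000…000 shown above.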
FP Addition

• Consider the following decimal example (can maintain only 4 decimal digits and 2 exponent digits)

  $9.999 \times 10^{1} + 1.610 \times 10^{-1}$

  Convert to the larger exponent: $9.999 \times 10^{1} + 0.016 \times 10^{1}$
  Add: $10.015 \times 10^{1}$
  Normalize: $1.0015 \times 10^{2}$
  Check for overflow/underflow
  Round: $1.002 \times 10^{2}$
  Re-normalize

  Note: if we had more fraction bits, these errors would be minimized
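The decimal steps above can be mimicked with integers. This is a simplified sketch, not a real FP adder: the `Dec4` type and `dec4_add` function are hypothetical, only positive operands are handled, digits shifted out during alignment are dropped, and rounding is round-half-up.

```c
#include <assert.h>

/* A positive decimal value d.ddd x 10^exp. */
typedef struct {
    int sig;   /* four digits d.ddd stored as an integer 1000..9999 */
    int exp;   /* power of ten, limited to two digits */
} Dec4;

/* Adds two positive Dec4 values. Returns 0 on success,
   -1 if the result exponent no longer fits in two digits. */
int dec4_add(Dec4 a, Dec4 b, Dec4 *out)
{
    assert(a.sig >= 1000 && a.sig <= 9999);
    assert(b.sig >= 1000 && b.sig <= 9999);

    /* Make a the operand with the larger exponent. */
    if (a.exp < b.exp) { Dec4 t = a; a = b; b = t; }

    /* Convert to the larger exponent: shift the smaller operand
       right, dropping the digits that no longer fit. */
    int aligned = b.sig;
    for (int shift = a.exp - b.exp; shift > 0 && aligned > 0; shift--)
        aligned /= 10;

    /* Add the significands. */
    int sum = a.sig + aligned;
    int exp = a.exp;

    /* Normalize: a carry out leaves five digits (dd.ddd),
       so the point moves left and the exponent grows by one. */
    int carried = sum > 9999;
    if (carried)
        exp += 1;

    /* Check for overflow of the two-digit exponent. */
    if (exp > 99)
        return -1;

    /* Round back to four digits. The rounded value is at most 2000,
       so no re-normalization is needed in this positive-only sketch. */
    if (carried)
        sum = (sum + 5) / 10;

    out->sig = sum;
    out->exp = exp;
    return 0;
}
```

With `a = {9999, 1}` and `b = {1610, -1}` the aligned operand becomes 16 (that is, 0.016), the sum is 10015, and the rounded result is `{1002, 2}`, matching $1.002 \times 10^{2}$.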
FP Multiplication

• Similar steps:
  - Compute exponent (careful: adding two biased exponents includes the bias twice, so subtract it once)
  - Multiply significands (set the binary point correctly)
  - Normalize
  - Round (potentially re-normalize)
  - Assign sign

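The multiplication steps can be sketched on single-precision bit fields. This is an illustration under several stated assumptions, not a complete implementation: the function names are hypothetical, inputs and result must be normalized (no zeros, denorms, infinities or NaNs), `float` is assumed to be a 32-bit IEEE 754 single, and rounding is simplified to truncation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Multiply two normalized singles field by field (truncating). */
float fp_multiply_sketch(float a, float b)
{
    uint32_t ua = float_to_bits(a);
    uint32_t ub = float_to_bits(b);
    int ea = (int)((ua >> 23) & 0xFFu);
    int eb = (int)((ub >> 23) & 0xFFu);
    assert(ea > 0 && ea < 255 && eb > 0 && eb < 255);

    /* Compute exponent: both fields carry the bias, so subtract it once. */
    int exponent = ea + eb - 127;

    /* Multiply significands, restoring the implicit leading 1. */
    uint64_t sa = (ua & 0x7FFFFFu) | 0x800000u;
    uint64_t sb = (ub & 0x7FFFFFu) | 0x800000u;
    uint64_t product = sa * sb;          /* 1.x times 1.y lies in [1, 4) */

    /* Normalize: if the product reached 2.0 or more, shift right once. */
    if (product & (1ULL << 47)) {
        product >>= 1;
        exponent += 1;
    }

    /* Round: truncate to 23 fraction bits (a simplification). */
    uint32_t fraction = (uint32_t)(product >> 23) & 0x7FFFFFu;

    /* This sketch does not handle overflow or underflow. */
    assert(exponent > 0 && exponent < 255);

    /* Assign sign: negative when the operand signs differ. */
    uint32_t sign = (ua ^ ub) & 0x80000000u;
    return bits_to_float(sign | ((uint32_t)exponent << 23) | fraction);
}
```

Because the sketch truncates instead of rounding to nearest, it matches the hardware result only when the exact product fits in 24 significand bits, for example 1.5 x 2.5 = 3.75.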
MIPS Instructions

• The usual add.s, add.d, sub, mul, div (each with a .s or .d suffix)

• Comparison instructions: c.eq.s, c.neq.s, c.lt.s, …
  These comparisons set an internal bit in hardware that is then inspected by branch instructions: bc1t, bc1f

• Separate register file $f0 - $f31: a double-precision value is stored in (say) $f4-$f5 and is referred to by $f4

• Load/store instructions (lwc1, swc1) must still use integer registers for address computation

Code Example

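float f2c (float fahr)
{
    return ((5.0/9.0) * (fahr - 32.0));
}

(argument fahr is stored in $f12)

lwc1  $f16, const5($gp)     # $f16 = 5.0
lwc1  $f18, const9($gp)     # $f18 = 9.0
div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
lwc1  $f18, const32($gp)    # $f18 = 32.0
sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
mul.s $f0, $f16, $f18       # $f0 = (5.0 / 9.0) * (fahr - 32.0)
jr    $ra                   # return; the result is in $f0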
