Floating Point Arithmetic: Computer Architecture and Assembly Language Dr. Aiman El-Maleh
ICS 233
Computer Architecture and Assembly Language
Dr. Aiman El-Maleh
College of Computer Sciences and Engineering
King Fahd University of Petroleum and Minerals
[Adapted from slides of Dr. M. Mudawar, ICS 233, KFUPM]
Outline
❖ Floating-Point Numbers
[Figure: floating-point format fields: S | Exponent | Fraction]
[Figure: rounding directions: round to –∞, round to zero, round to +∞]
Alan L. Cox [email protected]
Next . . .
❖ Floating-Point Numbers
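Single precision (32 bits): S (1 bit) | Exponent (8 bits) | Fraction (23 bits)
Double precision (64 bits): S (1 bit) | Exponent (11 bits) | Fraction (52 bits)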
(continued)
❖ Solution:
Sign = 1 is negative
Exponent = (01111100)₂ = 124, E – bias = 124 – 127 = –3
Significand = (1.0100 … 0)₂ = 1 + 2^(–2) = 1.25 (1. is implicit)
Value in decimal = –1.25 × 2^(–3) = –0.15625
❖ What is the decimal value of this single-precision float?
01000001001001100000000000000000
❖ Solution (1. is implicit):
Value in decimal = +(1.01001100 … 0)₂ × 2^(130–127) =
(1.01001100 … 0)₂ × 2^3 = (1010.01100 … 0)₂ = 10.375
Floating Point ICS 233 – KFUPM © Muhamed Mudawar slide 12
Examples of Double Precision Float
❖ What is the decimal value of this Double Precision float ?
01000000010100101010000000000000
00000000000000000000000000000000
❖ Solution:
Value of exponent = (10000000101)₂ – Bias = 1029 – 1023 = 6
Value of double float = (1.00101010 … 0)₂ × 2^6 (1. is implicit) =
(1001010.10 … 0)₂ = 74.5
❖ What is the decimal value of this double-precision float?
10111111100010000000000000000000
00000000000000000000000000000000
• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00
Operation            Result
n / ±∞               ±0
±∞ × ±∞              ±∞
nonzero / ±0         ±∞
+∞ + +∞              +∞ (similar for –∞)
±0 / ±0              NaN
+∞ – +∞              NaN (similar for –∞)
±∞ / ±∞              NaN
±∞ × ±0              NaN
NaN op anything      NaN
Exp.  exp   E     2^E   Category
0     000   -2    ¼     Denormalized
1     001   -2    ¼     Normalized
2     010   -1    ½     Normalized
3     011   0     1     Normalized
4     100   1     2     Normalized
5     101   2     4     Normalized
6     110   3     8     Normalized
7     111   n/a         Inf or NaN
Simple Data Types
• Associativity law for addition: a + (b + c) = (a + b) + c
int i = 1000 / 6;
float f = 1000.0 / 6.0;
Surprise!
Arithmetic is done in binary but printed in decimal,
so the result is not always what you expect
#include <limits.h>
#include <stdio.h>
int main(void)
{
    unsigned int ui = UINT_MAX;
    float f = ui;  /* converted to the nearest representable float */
    printf("ui: %u\nf: %f\n", ui, f);
    return 0;
}
ui: 4294967295
f: 4294967296.000000