Pooja Vashisth
1. Representation of numerical data as floating-point numbers
2. Describe underflow, overflow, round off, and truncation errors
3. Describe floating point arithmetic operations
4. Hardware implementation of floating-point operations
Floating-point data formats
Underflow and overflow
How representations affect accuracy and precision
Investigate hardware implementation of various floating-point
arithmetic operations
CSC 258
We’ve presented a fixed-point representation of (some)
real numbers
There’s a shift in topics after next week.
We’re moving away from assembly and toward
computer organization.
Representation for non-integral numbers
Including very small and very large numbers
In binary
$\pm 1.xxxxxxx_2 \times 2^{yyyy}$
Defined by IEEE Std 754-1985
Developed in response to divergence of representations
Portability issues for scientific code
Field:    S        Exponent    Fraction
single:   1 bit    8 bits      23 bits
double:   1 bit    11 bits     52 bits
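The single-precision layout above can be made concrete with a short sketch that splits a float into its three fields. The names fp32_fields and fp32_split are hypothetical, and the code assumes the platform's float is the 32-bit IEEE 754 format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, still biased */
    uint32_t fraction;  /* 23 bits, without the implicit leading 1 */
} fp32_fields;

/* Split a float into S, Exponent, and Fraction.
   Assumption: float is IEEE 754 single precision (32 bits). */
fp32_fields fp32_split(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* copy out the raw bit pattern */
    fp32_fields f;
    f.sign     = bits >> 31;             /* bit 31 */
    f.exponent = (bits >> 23) & 0xFFu;   /* bits 30..23 */
    f.fraction = bits & 0x7FFFFFu;       /* bits 22..0 */
    return f;
}
```

For example, fp32_split(1.0f) yields sign 0, exponent 127 (the bias), and fraction 0.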
Single precision
Largest value
exponent: 11111110
⇒ actual exponent = 254 – 127 = +127
Fraction: 111…11 ⇒ significand ≈ 2.0
$\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}$
Double precision
Exponents 0000…00 and 1111…11 reserved
Smallest (normalized) value
Exponent: 00000000001
⇒ actual exponent = 1 – 1023 = –1022
Fraction: 000…00 ⇒ significand = 1.0
$\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}$
Largest value
Exponent: 11111111110
⇒ actual exponent = 2046 – 1023 = +1023
Fraction: 111…11 ⇒ significand ≈ 2.0
$\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}$
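As a rough check of these range limits, the sketch below builds the extreme values from their bit patterns and compares them with the constants in <float.h>. The helper names are hypothetical, and the code assumes float and double are the IEEE 754 single and double formats.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single). */
float float_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Reinterpret a 64-bit pattern as a double (assumes IEEE 754 double). */
double double_from_bits(uint64_t bits) {
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Returns 1 when the patterns described above match the <float.h> limits. */
int range_limits_match(void) {
    /* S = 0, exponent 11111110, fraction all ones */
    float fmax = float_from_bits(0x7F7FFFFFu);
    /* S = 0, exponent 00000000001, fraction all zeros */
    double dmin = double_from_bits(0x0010000000000000ull);
    /* S = 0, exponent 11111111110, fraction all ones */
    double dmax = double_from_bits(0x7FEFFFFFFFFFFFFFull);
    return fmax == FLT_MAX && dmin == DBL_MIN && dmax == DBL_MAX;
}
```

On a conforming platform range_limits_match() should return 1, since FLT_MAX, DBL_MIN, and DBL_MAX are the largest single, smallest normalized double, and largest double values.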
Relative precision
all fraction bits are significant
Single: approx $2^{-23}$
Equivalent to $23 \times \log_{10} 2 \approx 23 \times 0.3 \approx 6$ decimal digits of precision
Double: approx $2^{-52}$
Equivalent to $52 \times \log_{10} 2 \approx 52 \times 0.3 \approx 16$ decimal digits of precision
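The relative precision figures can be observed directly. The sketch below halves a candidate gap until adding half of it to 1.0 no longer changes the result. The function names are hypothetical, and the code assumes IEEE 754 formats with the default round-to-nearest mode.

```c
#include <float.h>

/* Gap between 1.0f and the next larger float, found by repeated halving.
   The volatile store forces the sum to be rounded to single precision. */
float spacing_at_one_f(void) {
    volatile float sum;
    float eps = 1.0f;
    for (;;) {
        sum = 1.0f + eps / 2.0f;
        if (sum == 1.0f) break;   /* half the gap is lost to rounding */
        eps /= 2.0f;
    }
    return eps;
}

/* Same idea for double precision. */
double spacing_at_one_d(void) {
    volatile double sum;
    double eps = 1.0;
    for (;;) {
        sum = 1.0 + eps / 2.0;
        if (sum == 1.0) break;
        eps /= 2.0;
    }
    return eps;
}
```

Under those assumptions the results equal FLT_EPSILON and DBL_EPSILON, that is, $2^{-23}$ and $2^{-52}$.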
Represent –0.75
$-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$
S = 1
Fraction = $1000\ldots00_2$
Exponent = –1 + Bias
Single: $-1 + 127 = 126 = 01111110_2$
Double: $-1 + 1023 = 1022 = 01111111110_2$
Single: 1 01111110 1000…00
Double: 1 01111111110 1000…00
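The encoding of –0.75 can be reproduced by packing the three fields into a word and reinterpreting it. This is a minimal sketch with hypothetical helper names, assuming float and double are the IEEE 754 single and double formats.

```c
#include <stdint.h>
#include <string.h>

/* Assemble S, biased Exponent, and Fraction into a single-precision value. */
float fp32_pack(uint32_t sign, uint32_t exponent, uint32_t fraction) {
    uint32_t bits = ((sign & 1u) << 31)
                  | ((exponent & 0xFFu) << 23)
                  | (fraction & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Assemble S, biased Exponent, and Fraction into a double-precision value. */
double fp64_pack(uint64_t sign, uint64_t exponent, uint64_t fraction) {
    uint64_t bits = ((sign & 1ull) << 63)
                  | ((exponent & 0x7FFull) << 52)
                  | (fraction & 0xFFFFFFFFFFFFFull);   /* low 52 bits */
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With S = 1, exponent 126, and only the top fraction bit set, fp32_pack(1, 126, 0x400000) gives –0.75; the double-precision equivalent is fp64_pack(1, 1022, 1ull << 51).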
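What number is represented by the single-precision float
1 10000001 01000…00
S = 1
Fraction = $01000\ldots00_2$
Exponent = $10000001_2 = 129$
$x = (-1)^1 \times (1 + .01_2) \times 2^{129 - 127} = -1 \times 1.25 \times 2^2 = -5.0$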
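The decoding procedure can be sketched as a function that evaluates a 32-bit pattern from its fields. The name fp32_value is hypothetical, and the sketch handles only normalized numbers (exponent field 1 to 254), not the reserved exponents.

```c
#include <stdint.h>

/* Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - 127) for a 32-bit pattern.
   Assumption: the pattern is a normalized number (exponent field 1..254). */
double fp32_value(uint32_t bits) {
    uint32_t sign     = bits >> 31;
    int      exponent = (int)((bits >> 23) & 0xFFu);
    uint32_t fraction = bits & 0x7FFFFFu;

    /* Fraction is a 23-bit integer; dividing by 2^23 makes it a binary fraction. */
    double significand = 1.0 + (double)fraction / 8388608.0;

    /* Build 2^(exponent - 127) by repeated doubling or halving. */
    double scale = 1.0;
    int e = exponent - 127;
    for (; e > 0; --e) scale *= 2.0;
    for (; e < 0; ++e) scale /= 2.0;

    double value = significand * scale;
    return sign ? -value : value;
}
```

The pattern in the question is 0xC0A00000 in hexadecimal, and fp32_value(0xC0A00000u) returns –5.0.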
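Consider a 4-digit decimal example
$9.999 \times 10^1 + 1.610 \times 10^{-1}$
2. Add significands (after aligning to the larger exponent)
$9.999 \times 10^1 + 0.016 \times 10^1 = 10.015 \times 10^1$
2. Add significands (4-digit binary example, exponents already aligned)
$1.000_2 \times 2^{-1} + -0.111_2 \times 2^{-1} = 0.001_2 \times 2^{-1}$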
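The binary addition can be traced with a toy model. This is a sketch under stated assumptions, not the hardware algorithm: toy_fp and toy_add are hypothetical names, the significand is a signed integer with 3 fraction bits (so $1.000_2$ is 8), bits shifted out during alignment are simply dropped, and rounding is left out.

```c
#include <stdlib.h>

/* Toy format: value = (sig / 8) * 2^exp, with sig a signed integer.
   A normalized significand 1.xxx_2 has 8 <= |sig| < 16. */
typedef struct {
    int sig;
    int exp;
} toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b) {
    /* Align: shift the operand with the smaller exponent to the right. */
    while (a.exp < b.exp) { a.sig /= 2; a.exp++; }
    while (b.exp < a.exp) { b.sig /= 2; b.exp++; }

    /* Add significands (the exponents are now equal). */
    toy_fp r = { a.sig + b.sig, a.exp };

    /* Normalize: bring |sig| back into the range 8..15. */
    if (r.sig == 0) return r;
    while (abs(r.sig) >= 16) { r.sig /= 2; r.exp++; }
    while (abs(r.sig) < 8)   { r.sig *= 2; r.exp--; }

    /* Rounding is omitted in this sketch. */
    return r;
}
```

With a = {8, -1} ($1.000_2 \times 2^{-1}$) and b = {-14, -2} ($-1.110_2 \times 2^{-2}$, which aligns to $-0.111_2 \times 2^{-1}$), the sum of significands is $0.001_2$ and normalization gives {8, -4}, that is $1.000_2 \times 2^{-4} = 0.0625$.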
[Figure: diagram with stages labeled Step 1 to Step 4]
Consider a 4-digit decimal example
$1.110 \times 10^{10} \times 9.200 \times 10^{-5}$
1. Add exponents
For biased exponents, subtract bias from sum
New exponent = 10 + (–5) = 5
2. Multiply significands
$1.110 \times 9.200 = 10.212 \Rightarrow 10.212 \times 10^5$
Binary example
1. Add exponents
Unbiased: –1 + (–2) = –3
Biased: (–1 + 127) + (–2 + 127) – 127 = –3 + 254 – 127 = –3 + 127
2. Multiply significands
$1.000_2 \times 1.110_2 = 1.110_2 \Rightarrow 1.110_2 \times 2^{-3}$
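The biased-exponent rule can be checked against real single-precision values. In this sketch the function names are hypothetical, float is assumed to be IEEE 754 single precision, and the operands 0.5 and 0.4375 are chosen to match the exponents –1 and –2 and the significands $1.000_2$ and $1.110_2$ used above.

```c
#include <stdint.h>
#include <string.h>

/* Extract the 8-bit biased exponent field of a float. */
uint32_t fp32_exponent_field(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits >> 23) & 0xFFu;
}

/* Exponent field of a product before any renormalization:
   add the two biased exponents, then subtract the bias once. */
uint32_t product_exponent_field(uint32_t e1, uint32_t e2) {
    return e1 + e2 - 127u;
}
```

For 0.5f (field 126) and 0.4375f (field 125) the rule gives 126 + 125 – 127 = 124, which is the exponent field of 0.5f * 0.4375f. The match holds here because the significand product $1.110_2$ is below 2; a product of 2 or more would be renormalized and the exponent incremented.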
Separate FP registers: f0, …, f31
Each holds a double-precision value
Single-precision values stored in the lower 32 bits
FP instructions operate only on FP registers
FP load and store instructions
flw, fld
fsw, fsd
Single-precision arithmetic
fadd.s, fsub.s, fmul.s, fdiv.s,
fsqrt.s
e.g., fadd.s f2, f4, f6
Double-precision arithmetic
fadd.d, fsub.d, fmul.d, fdiv.d,
fsqrt.d
e.g., fadd.d f2, f4, f6
C = C + A × B
All 32 × 32 matrices, 64-bit double-precision
elements
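C code:
#include <stddef.h>   /* size_t */
/* Every dimension after the first must be given, so rows are declared as [32]. */
void mm (double c[][32],
         double a[][32], double b[][32]) {
  size_t i, j, k;
  for (i = 0; i < 32; i = i + 1)
    for (j = 0; j < 32; j = j + 1)
      for (k = 0; k < 32; k = k + 1)
        c[i][j] = c[i][j]
                  + a[i][k] * b[k][j];   /* accumulate one term of row i times column j */
}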
Addresses of c, a, b in x10, x11, x12, and
i, j, k in x5, x6, x7
RISC-V code:
mm:...
li x28,32 // x28 = 32 (row size/loop end)
li x5,0 // i = 0; initialize 1st for loop
L1: li x6,0 // j = 0; initialize 2nd for loop
L2: li x7,0 // k = 0; initialize 3rd for loop
slli x30,x5,5 // x30 = i * 2**5 (size of row of c)
add x30,x30,x6 // x30 = i * size(row) + j
slli x30,x30,3 // x30 = byte offset of [i][j]
add x30,x10,x30 // x30 = byte address of c[i][j]
fld f0,0(x30) // f0 = c[i][j]
L3: slli x29,x7,5 // x29 = k * 2**5 (size of row of b)
add x29,x29,x6 // x29 = k * size(row) + j
slli x29,x29,3 // x29 = byte offset of [k][j]
add x29,x12,x29 // x29 = byte address of b[k][j]
fld f1,0(x29) // f1 = b[k][j]
slli x29,x5,5 // x29 = i * 2**5 (size of row of a)
add x29,x29,x7 // x29 = i * size(row) + k
slli x29,x29,3 // x29 = byte offset of [i][k]
add x29,x11,x29 // x29 = byte address of a[i][k]
fld f2,0(x29) // f2 = a[i][k]
fmul.d f1, f2, f1 // f1 = a[i][k] * b[k][j]
fadd.d f0, f0, f1 // f0 = c[i][j] + a[i][k] * b[k][j]
addi x7,x7,1 // k = k + 1
bltu x7,x28,L3 // if (k < 32) go to L3
fsd f0,0(x30) // c[i][j] = f0
addi x6,x6,1 // j = j + 1
bltu x6,x28,L2 // if (j < 32) go to L2
addi x5,x5,1 // i = i + 1
bltu x5,x28,L1 // if (i < 32) go to L1
IEEE Std 754 specifies additional rounding
control
Extra bits of precision (guard, round, sticky)
Choice of rounding modes
Allows programmer to fine-tune numerical behavior
of a computation
Not all FP units implement all options
Most programming languages and FP libraries just
use defaults
Trade-off between hardware complexity,
performance, and market requirements
• Submit READY? quizzes before next week's classes
• Participate in peer discussion and Q/A every week
• Check your lab schedule… (Lab D)
• Attempt your Quiz3 (closes Fri, Feb. 17)
• Practice questions; these are part of Homework2
(also listed on the eClass course webpage)
• #?: 3.1, 3.6, 3.11*, 3.13*
• #?: 3.18*, 3.20*, 3.23, 3.24
Basics of logic design
Hardware Description Language