Chapter 03 Arith 3 Float
Architecture
CH3 Computer Arithmetic (III)
Floating Point
Outline
• Overview
• RISC-V floating-point instructions
• Fixed-point and floating-point representations
• IEEE 754 standard
• Floating-point operations
RISC-V Floating-Point Instructions
• Arithmetic
• fadd.s, fsub.s, fmul.s, fdiv.s # s means single-precision
• fadd.d, fsub.d, fmul.d, fdiv.d # d means double-precision
• Comparisons
• feq.s, feq.d # equal
• flt.s, flt.d # less than
• fle.s, fle.d # less than or equal
• Load / store
• flw, fsw
• fld, fsd
Floating Point Unit and Register Files
• Separate 32 registers for floating-point
• Register pairs (e.g., $F0 and $F1) for double precision
• $F0 is not always zero
[Figure: processor block showing floating-point registers $0 through $31 and the Add and Mult/Div units]
Fixed-Point
• Integers scaled by an implicit factor
• The scaling factor for each variable does not change
(i.e., fixed) during the entire computation
• Examples
• 3.14 is represented as
• 314 (scaling factor = 1/100)
• 3140 (scaling factor = 1/1000)
• 5,000,000 is represented as
• 5 (scaling factor = 1,000,000)
• 50 (scaling factor = 100,000)
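A minimal Python sketch of the fixed-point idea above. The names SCALE, to_fixed, fixed_add, and fixed_mul are hypothetical, and a scaling factor of 1/100 is assumed for every variable.

```python
# Fixed-point sketch: values are stored as plain integers, and the scaling
# factor is implied by the program rather than stored with each value.
SCALE = 100  # assumption: stored integer = real value * 100 (scaling factor 1/100)

def to_fixed(text: str) -> int:
    """Convert a non-negative decimal string such as '3.14' to its scaled integer."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]          # keep exactly two fractional digits
    return int(whole) * SCALE + int(frac)

def fixed_add(a: int, b: int) -> int:
    return a + b                      # same scale on both sides: integer add is enough

def fixed_mul(a: int, b: int) -> int:
    return (a * b) // SCALE           # the product carries SCALE twice; drop one

pi = to_fixed("3.14")                 # 314
two = to_fixed("2.00")                # 200
total = fixed_add(pi, two)            # 514, read as 5.14
product = fixed_mul(pi, two)          # 628, read as 6.28
```

Because the scale never changes, addition needs no adjustment, while multiplication has to divide one copy of the scale back out.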
Floating-Point ~= Scientific Notation
• Speed of light: + 2.99792458 × 10^8 (m/s)
• Electron charge: - 1.60217733 × 10^-19 (C)
• 0.5 mole of carbon atoms: + 6.00000000 × 10^-3 (kg)
• Parts of each number: sign, significand (also called fraction or mantissa), radix (or base), and exponent
• e.g., in + 2.99792458 × 10^8 the sign is +, the significand is 2.99792458, the base is 10, and the exponent is 8
• Normalized form
• Exactly one non-zero significant digit to the left of the point
• 29.9 × 10^7 and 0.299 × 10^9 are not normalized forms
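Floating Point Number
• IEEE 754 standard
• Single-precision (32 bits): S (1 bit) | Exp. (8 bits) | Significand (23 bits)
• Double-precision (64 bits): S (1 bit) | Exp. (11 bits) | Significand (52 bits)
• Value of a normalized number: (-1)^S × 1.Significand × 2^(Exp. - bias), with bias = 127 for single precision and 1023 for double precision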
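The field layout can be inspected with Python's standard struct module. This is only a sketch: the helper names float32_fields and float64_fields are hypothetical, and the field widths used are the IEEE 754 ones (1/8/23 bits and 1/11/52 bits).

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (S, Exp., Significand) of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    significand = bits & 0x7FFFFF      # 23 bits, leading 1 not stored
    return sign, exponent, significand

def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (S, Exp., Significand) of x stored as a 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

one_single = float32_fields(1.0)       # (0, 127, 0)
one_double = float64_fields(1.0)       # (0, 1023, 0)
```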
Special Floating Point Numbers (32 bits)
                      S     Exp.    Significand
zero                  S     0…00    0…00
Denormalized value    S     0…00    non-zero
+/- ∞                 S     1…11    0…00
Not a number (NaN)    any   1…11    non-zero
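The table above can be turned into a small classifier. This is a sketch for single precision only, and classify_float32 is a hypothetical name.

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern using only its Exp. and Significand fields."""
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0:                  # Exp. = 0…00
        return "zero" if significand == 0 else "denormalized"
    if exponent == 0xFF:               # Exp. = 1…11
        return "infinity" if significand == 0 else "NaN"
    return "normalized"
```

For example, classify_float32(0x7F800000) gives "infinity" and classify_float32(0x00000001) gives "denormalized".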
Denormalized Value
• An all-zero exponent field (0…00) with a non-zero significand marks a denormalized value
• Value: (-1)^S × 0.Significand × 2^(1-bias)
• No leading one to the left of the point
• Objective
• Represent very small value
• Gradual underflow
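A short sketch of the denormalized-value formula for single precision (bias = 127, 23-bit significand). The helper name denormal_value is hypothetical, and exact fractions are used so that no rounding hides the result.

```python
import struct
from fractions import Fraction

BIAS = 127

def denormal_value(sign: int, significand: int) -> Fraction:
    """Value of a single-precision number whose exponent field is all zeros."""
    fraction = Fraction(significand, 1 << 23)           # 0.Significand
    return (-1) ** sign * fraction * Fraction(2) ** (1 - BIAS)

smallest = denormal_value(0, 1)                         # 2^-149
largest = denormal_value(0, 0x7FFFFF)                   # just below 2^-126
(hardware,) = struct.unpack(">f", struct.pack(">I", 0x00000001))
```

The largest denormalized value sits just below the smallest normalized value 2^-126, which is what makes the underflow gradual.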
Floating Point Examples
• 32 bits: S | Exp. | Significand
0 01111000 10100…000
= 1.1010…000two × 2^(120-127)
= 1.1010two × 2^-7
= 1.625ten × 2^-7
= 0.0126953125ten
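The decoding above can be cross-checked in Python. This sketch builds the bit pattern from the three fields, applies the formula by hand, and compares with what the struct module reports; the variable names are illustrative only.

```python
import struct

# S, Exp., Significand (the significand is 101 followed by twenty zeros)
bits = int("0" + "01111000" + "101" + "0" * 20, 2)
sign = bits >> 31
exponent = (bits >> 23) & 0xFF                  # 120
fraction = (bits & 0x7FFFFF) / 2**23            # 0.625
by_formula = (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)
(by_hardware,) = struct.unpack(">f", struct.pack(">I", bits))
```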
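Floating Point Examples
• Convert -3.14 to 32-bit float
• 3 = 11two
• 0.14: multiply the fraction part by 2 repeatedly; the integer part of each result is the next bit
0.14 → 0.28, 0.56, 1.12, 0.24, 0.48, 0.96, 1.92, 1.84, 1.68, 1.36, 0.72, 1.44,
0.88, 1.76, 1.52, 1.04, 0.08, 0.16, 0.32, 0.64, 1.28, 0.56, 1.12, 0.24, …
• 3.14
= 11.0010_0011_1101_0111_0000_1010…two
= 1.1001_0001_1110_1011_1000_010two × 2^1 (23-bit significand, assume not rounded)
• -3.14 in 32 bits (S | Exp. | Significand, with Exp. = 1 + 127 = 128)
= 1 10000000 10010001111010111000010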
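The repeated-doubling conversion can be sketched in Python as follows. The helper fraction_bits is a hypothetical name, and exact fractions are used so that 0.14 is not distorted by binary rounding. The last line asks the machine for its own encoding, which rounds to nearest rather than truncating.

```python
import struct
from fractions import Fraction

def fraction_bits(frac: Fraction, count: int) -> str:
    """Binary digits of 0 <= frac < 1, obtained by repeated doubling."""
    digits = []
    for _ in range(count):
        frac *= 2
        digit = int(frac)              # the integer part is the next bit
        digits.append(str(digit))
        frac -= digit
    return "".join(digits)

frac_part = fraction_bits(Fraction(14, 100), 24)   # bits of 0.14
magnitude = "11" + frac_part                        # 3.14 = 11.<frac_part>two
significand = magnitude[1:24]                       # 23 bits after the leading 1
exponent = format(1 + 127, "08b")                   # 11.x = 1.x × 2^1, plus bias
pattern = "1" + exponent + significand              # sign bit 1 for a negative number
truncated = int(pattern, 2)                         # 0xC048F5C2
(rounded,) = struct.unpack(">I", struct.pack(">f", -3.14))   # 0xC048F5C3
```

The two patterns differ only in the last significand bit: the slide's result assumes no rounding, while the hardware encoding rounds up because the discarded bits exceed half a unit in the last place.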
IEEE 754 Online Converter
https://www.h-schmidt.net/FloatConverter/IEEE754.html
http://babbage.cs.qc.cuny.edu/IEEE-754.old/Decimal.html
Floating Point Operations
• Comparisons
• Addition
• Multiplication
Comparisons
• Similar to sign-magnitude integer comparison
• Each operand's {S, Exp., Significand} is viewed as {S, Magnitude}, where the magnitude is the exponent and significand fields taken together
• Rationales
• Positive > negative
• Between two positive floating point numbers
• One with larger {exponent, significand} is greater
• Between two negative floating point numbers
• One with smaller {exponent, significand} is greater
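The rationale above can be sketched as a comparison on raw 32-bit patterns. The function names are hypothetical. Treating +0 and -0 as equal is handled explicitly, and NaN inputs are assumed not to occur.

```python
import struct

def float32_bits(x: float) -> int:
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def sign_magnitude_less_than(a_bits: int, b_bits: int) -> bool:
    """a < b using only the sign bit and the 31-bit {Exp., Significand} magnitude."""
    a_sign, a_mag = a_bits >> 31, a_bits & 0x7FFFFFFF
    b_sign, b_mag = b_bits >> 31, b_bits & 0x7FFFFFFF
    if a_mag == 0 and b_mag == 0:
        return False                   # +0 and -0 compare equal
    if a_sign != b_sign:
        return a_sign == 1             # negative < positive
    if a_sign == 0:
        return a_mag < b_mag           # both positive: larger magnitude is greater
    return a_mag > b_mag               # both negative: smaller magnitude is greater
```

Because the exponent field sits above the significand field, a single unsigned comparison of the 31-bit magnitude orders two numbers of the same sign.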
Comparisons (Cont'd)
• Cases directly supported by sign-magnitude
comparisons
• +∞ == +∞
• -∞ == -∞
• -∞ < all finite numbers < +∞
• 0 == -0
Addition
• Steps
1. Align (adjust the smaller number)
2. Perform addition
3. Normalize
4. Round
5. Re-normalize
• Examples
• 9.999ten × 10^1 + 1.610ten × 10^-1
• 1.101two × 2^9 + 1.110two × 2^12
• Assume four-digit significands
Decimal Example
9.999ten × 10^1 + 1.610ten × 10^-1
Align
= 9.999ten × 10^1 + 0.01610ten × 10^1
Add
= 10.01510ten × 10^1
Normalize
= 1.001510ten × 10^2
Round
= 1.002ten × 10^2
Renormalize
= 1.002ten × 10^2 (no change)
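The decimal example can be reproduced with Python's decimal module, which lets the number of significant digits be fixed. This is a sketch only: a four-digit context with round-to-nearest-even stands in for the four-digit significand assumed on the slide.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)

a = Decimal("9.999E1")
b = Decimal("1.610E-1")
exact = a + b                          # default 28-digit context: 100.1510
rounded = four_digits.add(a, b)        # rounded to four significant digits: 100.2
```

Here 100.2 is the same value as 1.002ten × 10^2.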
Binary Example
1.101two × 2^9 + 1.110two × 2^12
Align
= 0.001101two × 2^12 + 1.110two × 2^12
Add
= 1.111101two × 2^12
Normalize
= 1.111101two × 2^12 (no change)
Round
= 10.000two × 2^12
Renormalize
= 1.000two × 2^13
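A sketch of the same binary addition using exact fractions. The helper round_to_bits is a hypothetical name. It normalizes, rounds to nearest (ties to even), and renormalizes if the rounding carries out, which mirrors the last three steps of the example.

```python
from fractions import Fraction

def round_to_bits(value: Fraction, digits: int) -> tuple[int, int]:
    """Round a positive value to `digits` significant bits.

    Returns (m, e) with value ≈ m × 2^e and m a `digits`-bit integer.
    """
    value = Fraction(value)
    e = 0
    while value >= 2 ** digits:        # too large: shift right
        value /= 2
        e += 1
    while value < 2 ** (digits - 1):   # too small: shift left
        value *= 2
        e -= 1
    m = round(value)                   # Fraction rounds halves to even
    if m == 2 ** digits:               # rounding carried out: renormalize
        m //= 2
        e += 1
    return m, e

a = Fraction(0b1101, 8) * 2**9         # 1.101two × 2^9  = 832
b = Fraction(0b1110, 8) * 2**12        # 1.110two × 2^12 = 7168
m, e = round_to_bits(a + b, 4)         # (0b1000, 10), i.e. 1.000two × 2^13
```

The exact sum is 8000, and the four-digit result is 8192.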
[Figure: floating-point addition flow: compare exponents, shift the smaller number right, add, normalize, round]
Multiplication
• Steps
1. Add exponents (considering the bias)
2. Multiply the significands (with sign determined)
3. Normalize (and check over/underflow)
4. Round
5. Re-normalize (and re-check over/underflow)
• Examples
• (1.110ten × 10^10) × (9.200ten × 10^-5)
• (1.000two × 2^-1) × (-1.110two × 2^-2)
• Inputs and outputs have four-digit significand
Decimal Example
(1.110ten × 10^10) × (9.200ten × 10^-5)
Exponent 10 + (-5) = 5
Multiply 1.110ten × 9.200ten = 10.212ten
Normalize = 1.0212ten × 10^6
Round = 1.021ten × 10^6
Renormalize = 1.021ten × 10^6 (no change)
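The decimal multiplication can be checked step by step with Python's decimal module. As a sketch, a four-digit context with round-to-nearest-even is assumed for the four-digit significand.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)

a = Decimal("1.110E10")
b = Decimal("9.200E-5")
exponent = a.adjusted() + b.adjusted()                    # 10 + (-5) = 5
exact_significand = Decimal("1.110") * Decimal("9.200")   # 10.212
product = four_digits.multiply(a, b)                      # 1.021E+6
```

The exponent ends at 6 rather than 5 because normalizing 10.212 moves the point one place to the left.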
Binary Example
(1.000two × 2^-1) × (-1.110two × 2^-2)
Exponent (-1) + (-2) = (-3)
Multiply 1.000two × (-1.110two) = (-1.110000two)
Normalize = (-1.110000two) × 2^-3 (no change)
Round = (-1.110two) × 2^-3 (no change)
Renormalize = (-1.110two) × 2^-3 (no change)
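A sketch of the multiplication steps on field values, with the exponent kept in biased form as in step 1. The function multiply_fields is hypothetical: it assumes four-bit significands of the form 1.xxx passed as integers 8..15, a bias of 127, and it leaves out the overflow and underflow checks.

```python
BIAS = 127

def multiply_fields(s1: int, e1: int, m1: int,
                    s2: int, e2: int, m2: int) -> tuple[int, int, int]:
    """Multiply two numbers given as (sign, biased exponent, significand 1.xxx)."""
    sign = s1 ^ s2                     # signs differ: result is negative
    exponent = e1 + e2 - BIAS          # the bias was counted twice; remove one
    product = m1 * m2                  # xx.xxxxxx with six fraction bits
    drop = 3
    if product >= 1 << 7:              # product in [2, 4): normalize
        drop = 4
        exponent += 1
    kept = product >> drop
    rest = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1                      # round to nearest, ties to even
        if kept == 1 << 4:             # rounding carried out: renormalize
            kept >>= 1
            exponent += 1
    return sign, exponent, kept

# (1.000two × 2^-1) × (-1.110two × 2^-2)
result = multiply_fields(0, -1 + BIAS, 0b1000, 1, -2 + BIAS, 0b1110)
```

The result is (1, 124, 0b1110): a negative number with significand 1.110two and exponent 124 - 127 = -3.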
Internal Format with Extra Bits
• Extra bits are needed during arithmetic operations
to increase the arithmetic accuracy
• e.g., 1.101two × 2^9 + 1.110two × 2^12
[Figure: the same addition carried out without extra bits and with extra bits]
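One way to see the effect, as a sketch rather than a reproduction of the figure: align the smaller operand of 1.101two × 2^9 + 1.110two × 2^12 either by cutting it to three fraction bits straight away (no extra bits) or by keeping all of its bits until the final rounding. The helper name is hypothetical.

```python
from fractions import Fraction

def truncate_to_fraction_bits(value: Fraction, fraction_bits: int) -> Fraction:
    """Drop every bit beyond `fraction_bits` places after the binary point."""
    scale = 2 ** fraction_bits
    return Fraction(int(value * scale), scale)

big = Fraction(0b1110, 8)              # 1.110two (scaled by 2^12)
small = Fraction(0b1101, 8) / 8        # 1.101two × 2^9 aligned to 2^12: 0.001101two

without_extra = big + truncate_to_fraction_bits(small, 3)   # 1.111two
with_extra = big + small                                    # 1.111101two
with_extra_rounded = Fraction(round(with_extra * 8), 8)     # 10.000two
```

Scaled by 2^12, the two results are 7680 and 8192, against an exact sum of 8000, so keeping the extra bits gives the closer answer.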
IEEE 754 Internal Format
• Three extra bits
• The 3rd one represents any remaining nonzero bits to
the right
IEEE 754 Internal Format
• Roles/names of the three extra bits
• First: Guard
• Second: Round
• Third: Sticky
[Figure: the S, Exp., and Significand fields followed by the First, Second, and Third extra bits]
• e.g., 0.000010001two × 2^12 is held as 0.0000101two × 2^12
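A sketch of how the three extra bits could be derived from the bits that fall off the end of the significand. The function guard_round_sticky is a hypothetical name, and the example assumes four fraction bits are kept, which is what makes 0.000010001two come out as 0.0000101two.

```python
def guard_round_sticky(fraction_bits: str, kept: int) -> tuple[int, int, int]:
    """Return (guard, round, sticky) for the bits after the first `kept` fraction bits."""
    extra = fraction_bits[kept:].ljust(2, "0")
    guard = int(extra[0])              # first bit past the significand
    round_bit = int(extra[1])          # second bit past the significand
    sticky = int("1" in extra[2:])     # 1 if any remaining bit is nonzero
    return guard, round_bit, sticky

example = guard_round_sticky("000010001", 4)   # (1, 0, 1)
```

The sticky bit records that something nonzero was shifted out, which is enough to tell an exact tie from a value slightly above it.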
IEEE 754 Rounding Mode
• Four modes can be chosen by programmers
• Toward 0 (also called truncation)
• Toward +∞
• Toward -∞
• Toward nearest even (default mode)
• Choose the even one if there are two equally nearest values
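The four modes can be illustrated with Python's decimal module, whose rounding constants correspond to them. This is a decimal analogy rounding to whole numbers, not the binary hardware behavior, and the names MODES and round_all_modes are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

MODES = {
    "toward 0": ROUND_DOWN,
    "toward +inf": ROUND_CEILING,
    "toward -inf": ROUND_FLOOR,
    "nearest even": ROUND_HALF_EVEN,
}

def round_all_modes(text: str) -> dict[str, int]:
    """Round a decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {name: int(value.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}

positive_tie = round_all_modes("2.5")    # 2, 3, 2, 2
negative_tie = round_all_modes("-2.5")   # -2, -2, -3, -2
```

With a tie such as 2.5, the nearest-even mode picks 2 while 3.5 would go to 4, so ties do not drift in one direction.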
Round Toward Nearest Even
• Binary examples
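A few binary cases, sketched on integer bit patterns. The function round_nearest_even is a hypothetical helper that rounds away the lowest `drop` bits (drop must be at least 1). The values below are illustrative and are not taken from the slide's figure.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Round away the lowest `drop` bits of a non-negative bit pattern."""
    kept = value >> drop
    rest = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1                      # above half, or a tie with an odd result
    return kept

below = round_nearest_even(0b10001, 2)      # 1.0001two → 1.00two
above = round_nearest_even(0b10011, 2)      # 1.0011two → 1.01two
tie_even = round_nearest_even(0b10010, 2)   # 1.0010two → 1.00two (already even)
tie_odd = round_nearest_even(0b10110, 2)    # 1.0110two → 1.10two (odd, so round up)
```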
Outline
• Overview
• IEEE 754 standard
• Single-precision
• Double-precision
• Special numbers
• Floating-point operations
• Addition
• Multiplication
• Rounding