Lecture 13
15-02-2021
Multiplication

Long-multiplication approach:

    multiplicand        1000
    multiplier        × 1001
                        ----
                        1000
                       0000
                      0000
                     1000
                     -------
    product          1001000

Length of product is the sum of operand lengths
Chapter 3 — Arithmetic for Computers — 2 2
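
The long-multiplication approach above can be sketched in Python as a shift-and-add loop. This is a minimal illustration for unsigned operands, not a model of any particular hardware; the name `long_multiply` and the `width` parameter are assumptions made for the example.

```python
def long_multiply(multiplicand: int, multiplier: int, width: int = 4) -> int:
    """Shift-and-add multiplication of two unsigned width-bit integers."""
    product = 0
    for _ in range(width):
        if multiplier & 1:          # low multiplier bit is 1: add this partial product
            product += multiplicand
        multiplicand <<= 1          # the next partial product is shifted left 1 bit
        multiplier >>= 1            # expose the next multiplier bit
    return product


print(format(long_multiply(0b1000, 0b1001), "b"))  # 1001000
```

The result of two 4-bit operands needs up to 8 bits, matching the note that the product length is the sum of the operand lengths.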

Multiplication Hardware

[Figure: multiplication hardware, shown with the worked example 1000 × 1001 = 1001000. Label: "Initially 0".]

Optimized Multiplier

[Figure: optimized multiplier hardware, shown with the worked example 1000 × 1001 = 1001000. Label: "Multiplier is initially placed here".]

Important observation to make: shifting the product right 1 bit is equivalent to shifting the multiplicand 1 bit left.

  - One cycle per partial-product addition
  - That's OK if the frequency of multiplications is low

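A minimal Python sketch of the optimized scheme, assuming unsigned operands: the multiplier starts in the lower half of a combined register, the multiplicand is added into the upper half only, and the whole register shifts right once per step. The name `optimized_multiply` and its `width` parameter are hypothetical.

```python
def optimized_multiply(multiplicand: int, multiplier: int, width: int = 4) -> int:
    """Multiply using one combined product/multiplier register that shifts right."""
    mask = (1 << width) - 1
    register = multiplier & mask                          # lower half holds the multiplier
    for _ in range(width):
        if register & 1:                                  # current multiplier bit
            register += (multiplicand & mask) << width    # add into the upper half
        register >>= 1                                    # shift product and multiplier right
    return register


print(format(optimized_multiply(0b1000, 0b1001), "b"))  # 1001000
```

Because the product shifts right instead of the multiplicand shifting left, the adder only ever needs to be as wide as one operand.
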
Fast Multiplier

Uses multiple adders
  - Cost/performance tradeoff
  - Can be pipelined
  - Several multiplications performed in parallel

MIPS Multiplication

Two 32-bit registers for product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
Instructions:
  - mult rs, rt / multu rs, rt
      - 64-bit product in HI/LO
  - mfhi rd / mflo rd
      - Move from HI/LO to rd
      - Can test HI value to see if product overflows 32 bits
  - mul rd, rs, rt
      - Least-significant 32 bits of product → rd
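
The HI/LO split can be illustrated with a small Python emulation. This is a sketch of the arithmetic only, not a MIPS simulator; the helper names (`to_signed32`, `mult`, `multu`, `product_fits_32`) are assumptions introduced here.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs: int, rt: int):
    """Signed 32 x 32 -> 64-bit product, returned as (HI, LO) bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return (product >> 32) & MASK32, product & MASK32


def multu(rs: int, rt: int):
    """Unsigned 32 x 32 -> 64-bit product, returned as (HI, LO)."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def product_fits_32(hi: int, lo: int) -> bool:
    """A signed product fits in 32 bits when HI is just the sign extension of LO."""
    return hi == (MASK32 if lo & 0x80000000 else 0)


hi, lo = mult(100000, 100000)
print(hex(hi), hex(lo), product_fits_32(hi, lo))  # 0x2 0x540be400 False
```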

§3.4 Division

Check for 0 divisor
Long division approach
  - If divisor ≤ dividend bits
      - 1 bit in quotient, subtract
  - Otherwise
      - 0 bit in quotient, bring down next dividend bit
Restoring division
  - Do the subtract, and if remainder goes < 0, add divisor back
Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required

                       1001      quotient
    divisor  1000 ) 1001010      dividend
                   -1000
                       10
                       101
                       1010
                      -1000
                         10      remainder

n-bit operands yield n-bit quotient and remainder

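The restoring-division steps can be sketched in Python as follows. The function names are hypothetical, and the sign rule in `signed_divide` (remainder takes the sign of the dividend) is an assumption chosen to match the usual MIPS convention; the slide only says to adjust signs as required.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor")
    quotient, remainder = 0, 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        remainder -= divisor                                  # trial subtraction
        if remainder < 0:
            remainder += divisor                              # restore: quotient bit 0
            quotient <<= 1
        else:
            quotient = (quotient << 1) | 1                    # keep: quotient bit 1
    return quotient, remainder


def signed_divide(dividend: int, divisor: int, width: int = 32):
    """Divide absolute values, then adjust the signs of quotient and remainder."""
    quotient, remainder = restoring_divide(abs(dividend), abs(divisor), width)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder


print(restoring_divide(0b1001010, 0b1000, width=7))  # (9, 2)
print(signed_divide(-7, 2))                          # (-3, -1)
```
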
Division Hardware

[Figure: division hardware, shown with the worked example 1001010 ÷ 1000. Labels: "Initially divisor in left half", "Initially dividend", "32 bits", "33 iterations".]

FIGURE 3.10 Division example using the algorithm in Figure 3.9. The bit examined to determine the next step is circled in color.

Optimized Divider

[Figure: optimized divider hardware. Labels: "Dividend initially", "Remainder, Quotient".]

One cycle per partial-remainder subtraction
Looks a lot like a multiplier!
  - Same hardware can be used for both

Faster Division

Can't use parallel hardware as in multiplier:
  - Subtraction is conditional on sign of remainder
Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  - Still require multiple steps

MIPS Division

Use HI/LO registers for result
  - HI: 32-bit remainder
  - LO: 32-bit quotient
Instructions:
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
      - Software must perform checks if required
  - Use mfhi, mflo to access result
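
Because `div` performs no checking, the divide-by-0 test has to live in software. The Python sketch below emulates the HI/LO result and wraps it in such a check; all function names are hypothetical, and the quotient is assumed to truncate toward zero with the remainder taking the sign of the dividend.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def div(rs: int, rt: int):
    """Signed divide; returns (HI, LO) = (remainder, quotient). Caller ensures rt != 0."""
    a, b = to_signed32(rs), to_signed32(rt)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient                 # truncate toward zero
    remainder = a - quotient * b             # same sign as the dividend
    return remainder & MASK32, quotient & MASK32


def divu(rs: int, rt: int):
    """Unsigned divide; returns (HI, LO) = (remainder, quotient)."""
    a, b = rs & MASK32, rt & MASK32
    return a % b, a // b


def checked_div(rs: int, rt: int):
    """Software check that the hardware instruction does not perform."""
    if rt & MASK32 == 0:
        raise ZeroDivisionError("divisor is 0")
    return div(rs, rt)


hi, lo = checked_div(-7 & MASK32, 2)
print(to_signed32(hi), to_signed32(lo))  # -1 -3
```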

Floating Point

Representation for non-integral numbers
  - Including very small and very large numbers
Numbers in scientific notation:
  - –2.34 × 10^56      (normalized)
  - +0.002 × 10^–4     (not normalized)
  - +987.02 × 10^9     (not normalized)
In binary
  - ±1.xxxxxxx₂ × 2^yyyy   (1.xxxxxxx is the significand)
Types float and double in C

Floating Point Standard

Defined by IEEE Std 754-1985
Developed in response to divergence of representations:
  - Portability issues for scientific code
Now almost universally adopted
Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

    | S | Exponent        | Fraction        |
          single: 8 bits    single: 23 bits
          double: 11 bits   double: 52 bits

    x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)

    (View the + as a binary point.)

S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
Normalized significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023

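The field layout can be inspected from Python with the standard `struct` module. This is a small sketch; `single_fields` and `normalized_value` are hypothetical helper names, and `normalized_value` covers normalized numbers only.

```python
import struct


def single_fields(value: float) -> str:
    """Return the single-precision encoding as 'S EEEEEEEE FFF...F'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = format(bits, "032b")
    return f"{text[0]} {text[1:9]} {text[9:]}"


def normalized_value(sign: int, exponent: int, fraction: int,
                     bias: int = 127, frac_bits: int = 23) -> float:
    """Evaluate (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)."""
    return (-1) ** sign * (1 + fraction / 2 ** frac_bits) * 2.0 ** (exponent - bias)


print(single_fields(1.0))            # 0 01111111 00000000000000000000000
print(normalized_value(0, 127, 0))   # 1.0
```

Passing `bias=1023, frac_bits=52` to `normalized_value` evaluates the same formula for double precision.
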
Single-Precision Range

Exponents 00000000 and 11111111 are reserved
Smallest value:
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
Largest value:
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Double-Precision Range

Exponents 0000…00 and 1111…11 are reserved
Smallest value:
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
Largest value:
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

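The range limits for both formats can be checked by building the extreme bit patterns directly. This is a sketch using Python's `struct` module; the helper names are assumptions made for the example.

```python
import struct


def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


smallest_single = single_from_bits(0x00800000)           # exponent 00000001, fraction 000...0
largest_single = single_from_bits(0x7F7FFFFF)            # exponent 11111110, fraction 111...1
smallest_double = double_from_bits(0x0010000000000000)   # exponent 00000000001
largest_double = double_from_bits(0x7FEFFFFFFFFFFFFF)    # exponent 11111111110

print(smallest_single, largest_single)
print(smallest_double, largest_double)
```
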
Floating-Point Precision

Relative precision:
  - All fraction bits are significant
  - Single: approx 2^–23
      - Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^–52
      - Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision

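A short Python check of these precision figures. Python floats are double precision, so single precision is emulated here by a round trip through `struct`; `round_to_single` is a hypothetical helper.

```python
import math
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# Single: a change of 2^-23 next to 1.0 survives, a much smaller one is lost.
single_keeps = round_to_single(1.0 + 2.0 ** -23) != 1.0
single_loses = round_to_single(1.0 + 2.0 ** -25) == 1.0

# Double: the same experiment with 2^-52.
double_keeps = (1.0 + 2.0 ** -52) != 1.0
double_loses = (1.0 + 2.0 ** -54) == 1.0

single_digits = 23 * math.log10(2)   # about 6.9
double_digits = 52 * math.log10(2)   # about 15.7
print(single_keeps, single_loses, double_keeps, double_loses)
```
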
Floating-Point Example

Represent –0.75
  - –0.75 = (–1)^1 × 1.1₂ × 2^–1
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = –1 + Bias
      - Single: –1 + 127 = 126 = 01111110₂
      - Double: –1 + 1023 = 1022 = 01111111110₂
Single: 1 01111110 1000…00
Double: 1 01111111110 1000…00

Floating-Point Example

Which number is represented by the following in single-precision float:
    1 10000001 01000…00
  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
x = (–1)^1 × (1 + 0.01₂) × 2^(129 – 127)
  = (–1) × 1.25 × 2^2
  = –5.0

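The decoding above can be reproduced in Python by evaluating the formula field by field and cross-checking against `struct`. `decode_single` is a hypothetical helper and handles normalized patterns only.

```python
import struct


def decode_single(bit_string: str) -> float:
    """Evaluate (-1)^S x (1 + Fraction) x 2^(Exponent - 127) for a normalized pattern."""
    bits = bit_string.replace(" ", "")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2) / 2 ** 23
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)


pattern = "1 10000001 " + "01" + "0" * 21
value = decode_single(pattern)

# Cross-check by letting struct reinterpret the same 32 bits.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
(reference,) = struct.unpack(">f", raw)
print(value, reference)  # -5.0 -5.0
```
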
Exercise 1

Represent –85.125 in IEEE FP format
  - –85.125 = (–1)^1 × 85.125
  - 85 = 1010101₂
  - 0.125 = 0.001₂
  - 85.125 = 1010101.001₂ = 1.010101001₂ × 2^6
  - S = 1
  - Fraction = 010101001₂
  - Exponent = 127 + 6 = 133 = 10000101₂
  - 1 10000101 0101010010…00

Exercise 2

Represent 176.375 in IEEE FP format
  - 176.375 = 10110000.011₂ = 1.0110000011₂ × 2^7
  - Exponent = 127 + 7 = 134 = 10000110₂
  - 0 10000110 01100000110000000000000

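Both exercises can be checked with a short Python sketch that follows the same steps: take the sign, normalize to 1.xxx × 2^e, then add the bias. `encode_single` is a hypothetical helper that assumes a nonzero value that is exactly representable as a normalized single-precision number.

```python
import math
import struct


def encode_single(value: float) -> str:
    """Return 'S EEEEEEEE FFF...F' for a normalized, exactly representable value."""
    sign = 1 if value < 0 else 0
    mantissa, exp2 = math.frexp(abs(value))    # abs(value) = mantissa * 2**exp2, 0.5 <= mantissa < 1
    significand = mantissa * 2                 # renormalize to 1.xxx
    exponent = exp2 - 1
    fraction = round((significand - 1) * 2 ** 23)
    return f"{sign} {exponent + 127:08b} {fraction:023b}"


print(encode_single(-85.125))   # 1 10000101 01010100100000000000000
print(encode_single(176.375))   # 0 10000110 01100000110000000000000

# Cross-check the first answer against the machine encoding.
assert struct.pack(">f", -85.125) == int(encode_single(-85.125).replace(" ", ""), 2).to_bytes(4, "big")
```
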
Special Values
Denormal Numbers

The smallest representable normalized number (single precision) is ±1.0 × 2^–126
How to close the gap between this value and zero?
Subnormal (denormal) number:
  - When all the exponent bits are 0
  - The leading hidden bit of the significand is implied to be 0

Denormal Numbers

Exponent = 000...0 ⇒ hidden bit is 0

    x = (–1)^S × (0 + Fraction) × 2^(1 – Bias)

Denormal with fraction = 000...0

    x = (–1)^S × (0 + 0) × 2^(1 – Bias) = ±0.0

Two representations of 0.0!
Largest subnormal number (single precision) is 0.99999988 × 2^–126
  - It is close to the smallest normalized number 1 × 2^–126

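The subnormal values can be examined from Python by constructing the bit patterns directly. This is a sketch for single precision; `single_from_bits` is a hypothetical helper built on the standard `struct` module.

```python
import math
import struct


def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest_normal = single_from_bits(0x00800000)      # exponent 00000001, fraction 0
largest_subnormal = single_from_bits(0x007FFFFF)    # exponent 00000000, fraction 111...1
smallest_subnormal = single_from_bits(0x00000001)   # exponent 00000000, fraction 000...01
pos_zero = single_from_bits(0x00000000)
neg_zero = single_from_bits(0x80000000)

print(largest_subnormal / smallest_normal)          # just under 1.0
print(pos_zero == neg_zero, math.copysign(1.0, neg_zero))
```

The two zero patterns compare equal even though their sign bits differ.
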
Infinities and NaNs

Exponent = 111...1, Fraction = 000...0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding need for overflow check
Exponent = 111...1, Fraction ≠ 000...0
  - Not-a-Number (NaN)
  - Indicates illegal or undefined result
      - e.g., 0.0 / 0.0
  - Can be used in subsequent calculations

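A brief Python illustration of how infinities and NaNs propagate. Note one language-specific detail: Python raises `ZeroDivisionError` for `0.0 / 0.0` instead of returning NaN, so this sketch produces a NaN with `inf - inf`, another undefined operation.

```python
import math

inf = float("inf")
overflowed = 1e308 * 10.0        # too large for a double: becomes +Infinity
still_inf = overflowed + 1.0     # infinity can be used in later calculations
nan = inf - inf                  # undefined result: NaN
still_nan = nan * 2.0 + 1.0      # NaN propagates through later calculations

print(overflowed, still_inf, nan, still_nan)
print(nan == nan)                # False: NaN compares unequal to everything
```
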
Floating-Point Addition

Consider a 4-digit decimal example
  - 9.999 × 10^1 + 1.610 × 10^–1
1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
  - 1.0015 × 10^2
4. Round and renormalize if necessary
  - 1.002 × 10^2

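The decimal example can be reproduced with Python's `decimal` module limited to 4 significant digits. One difference to keep in mind: `decimal` computes the exact sum and rounds once, whereas the steps above drop digits of the smaller operand during alignment; for these operands both routes give the same answer. The context name `four_digits` is an assumption made for the example.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

four_digits = Context(prec=4, rounding=ROUND_HALF_UP)

a = Decimal("9.999E+1")
b = Decimal("1.610E-1")
total = four_digits.add(a, b)   # exact sum 100.1510, rounded to 4 significant digits

print(total)  # 100.2
```
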
Floating-Point Addition

Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 + –1.110₂ × 2^–2 (in decimal 0.5 + –0.4375)
1. Align binary points
  - Shift number with smaller exponent
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1
2. Add significands
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1
3. Normalize result & check for over/underflow
  - 1.000₂ × 2^–4, with no over/underflow
4. Round and renormalize if necessary
  - 1.000₂ × 2^–4 (no change) = 0.0625

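The four steps can be sketched in Python with integer significands. This is a simplified illustration: significands are signed integers scaled by 2^3 (so 1.000₂ is 8), bits shifted out during alignment are simply dropped, and no rounding or over/underflow check is modeled. The function name `fp_add` is hypothetical.

```python
def fp_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int, frac_bits: int = 3):
    """Add two values of the form sig / 2**frac_bits * 2**exp; returns (sig, exp)."""
    one = 1 << frac_bits                        # integer encoding of 1.000 (binary)
    # Step 1: align binary points by shifting the smaller-exponent operand right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sign_b = -1 if sig_b < 0 else 1
    sig_b = sign_b * (abs(sig_b) >> (exp_a - exp_b))
    # Step 2: add significands.
    total, exp = sig_a + sig_b, exp_a
    if total == 0:
        return 0, 0
    # Step 3: normalize so that 1.0 <= |significand| < 2.0.
    sign = -1 if total < 0 else 1
    magnitude = abs(total)
    while magnitude >= 2 * one:
        magnitude >>= 1
        exp += 1
    while magnitude < one:
        magnitude <<= 1
        exp -= 1
    # Step 4: rounding is omitted in this sketch.
    return sign * magnitude, exp


sig, exp = fp_add(0b1000, -1, -0b1110, -2)
print(sig, exp, sig / 8 * 2.0 ** exp)  # 8 -4 0.0625
```
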
FP Adder Hardware

Much more complex than integer adder
Doing it in one clock cycle would make clock cycle too long:
  - Much longer than integer operations
  - Slower clock would penalize all instructions
FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware

[Figure: FP adder hardware, annotated with Step 1, Step 2, Step 3 and Step 4 of the addition algorithm.]

FP Arithmetic Hardware

FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square-root
  - FP ↔ integer conversion
Operations usually take several cycles
  - Can be pipelined

Accurate Arithmetic

IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
Trade-off between hardware complexity, performance, and market requirements

Subword Parallelism

Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
  - Example: 128-bit adder:
      - Sixteen 8-bit adds
      - Eight 16-bit adds
      - Four 32-bit adds
Also called data-level parallelism, vector parallelism, or Single Instruction, Multiple Data (SIMD)

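The idea of partitioning one wide adder into independent lanes can be imitated in software. The Python sketch below is a scaled-down illustration (four 8-bit lanes in a 32-bit word instead of sixteen lanes in 128 bits); `packed_add_u8` is a hypothetical name, and the masking trick stands in for hardware that cuts the carry chain between lanes.

```python
LOW7 = 0x7F7F7F7F    # low 7 bits of every 8-bit lane
HIGH = 0x80808080    # top bit of every 8-bit lane


def packed_add_u8(a: int, b: int) -> int:
    """Add four 8-bit lanes at once; each lane wraps around independently."""
    low = (a & LOW7) + (b & LOW7)                     # carries cannot leave a lane
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF      # fold the top bits back in


result = packed_add_u8(0x01FF7F10, 0x01010101)
print(hex(result))  # 0x2008011
```

The lane holding 0xFF wraps to 0x00 without disturbing its neighbour, which is the behaviour a partitioned adder provides.
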
Associativity

Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail

                    (x+y)+z       x+(y+z)
    x              -1.50E+38     -1.50E+38
    y               1.50E+38      1.50E+38
    z               1.0           1.0
    inner sum       0.00E+00      1.50E+38
    result          1.00E+00      0.00E+00

  - Need to validate parallel programs under varying degrees of parallelism

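The associativity failure is easy to reproduce. Python floats are double precision, but 1.0 is still far below the precision available at 1.5 × 10^38, so the same effect appears with the values from the table.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z     # x + y cancels exactly, then z is added
right = x + (y + z)    # z is absorbed by y before the cancellation

print(left, right)  # 1.0 0.0
```
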
Who Cares About FP Accuracy?

Important for scientific code
  - But for everyday consumer use?
      - "My bank balance is out by 0.0002¢!"
The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles

Concluding Remarks

Bits have no inherent meaning
  - Interpretation depends on the instructions applied
Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

Concluding Remarks

ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
Bounded range and precision
  - Operations can overflow and underflow
MIPS ISA
  - Core instructions: 54 most frequently used
      - 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent
