

Lecture 13

15-02-2021

Multiplication

Long-multiplication approach:

      1000   multiplicand
    × 1001   multiplier
    ------
      1000
     0000
    0000
   1000
   -------
   1001000   product

Length of product is the sum of operand lengths
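
The long-multiplication procedure above can be sketched in a few lines of Python. This is only an illustration for unsigned operands; the function name `shift_add_multiply` and the `bits` parameter are assumptions made for this example, not part of the slides.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Multiply two unsigned `bits`-wide integers by long multiplication."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:          # current multiplier bit is 1: add the multiplicand
            product += multiplicand
        multiplicand <<= 1          # shift multiplicand left for the next column
        multiplier >>= 1            # expose the next multiplier bit
    return product


print(bin(shift_add_multiply(0b1000, 0b1001)))  # 0b1001000
```

Two 4-bit operands give a product of up to 8 bits, matching the rule that the product length is the sum of the operand lengths.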

Chapter 3 — Arithmetic for Computers — 2 2


Multiplication Hardware

[Figure: multiplication hardware. The Product register is initially 0.]

      1000
    × 1001
      1000
     0000
    0000
   1000
   1001000

Optimized Multiplier

      1000
    × 1001
      1000
     0000
    0000
   1000
   1001000

[Figure: optimized multiplier. The product (P) and the multiplier (M) share one register; the multiplier is initially placed in its right half.]

Important observation to make: shifting the product right 1 bit is equivalent to shifting the multiplicand 1 bit left

 One cycle per partial-product addition
 That’s ok, if frequency of multiplications is low
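
As a rough sketch of the optimized scheme, the following Python models a single register whose right half starts out holding the multiplier. The function name and the `bits` parameter are assumptions for illustration, and Python's unbounded integers stand in for the carry bit that real hardware would need.

```python
def optimized_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Unsigned multiply using one combined product/multiplier register."""
    product = multiplier & ((1 << bits) - 1)   # multiplier starts in the right half
    for _ in range(bits):
        if product & 1:                        # low bit is the current multiplier bit
            product += multiplicand << bits    # add multiplicand into the left half
        product >>= 1                          # shift product (and multiplier) right
    return product


print(bin(optimized_multiply(0b1000, 0b1001)))  # 0b1001000
```

The multiplicand is never shifted here; only the product register moves, which is the observation the slide makes.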
Fast Multiplier

Uses multiple adders
 Cost/performance tradeoff

 Can be pipelined
 Several multiplications performed in parallel
MIPS Multiplication

Two 32-bit registers for product
 HI: most-significant 32 bits
 LO: least-significant 32 bits

Instructions:
 mult rs, rt / multu rs, rt
   64-bit product in HI/LO
 mfhi rd / mflo rd
   Move from HI/LO to rd
   Can test HI value to see if product overflows 32 bits
 mul rd, rs, rt
   Least-significant 32 bits of product –> rd
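
To make the HI/LO split concrete, here is a small Python model of the unsigned case. It is a sketch, not MIPS code: the Python function `multu` only borrows the instruction's name, and it returns the two halves as a tuple instead of writing real registers.

```python
MASK32 = 0xFFFFFFFF


def multu(rs: int, rt: int):
    """Model an unsigned 32x32 multiply: return (HI, LO) of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    hi = (product >> 32) & MASK32   # most-significant 32 bits
    lo = product & MASK32           # least-significant 32 bits
    return hi, lo


hi, lo = multu(0x10000, 0x10000)
print(hi, lo)        # 1 0
print(hi != 0)       # True: the unsigned product does not fit in 32 bits
```

For unsigned operands a nonzero HI means the product overflowed 32 bits; the signed case needs a different test, which this sketch does not model.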



Division (§3.4)

Check for 0 divisor

Long division approach
 If divisor ≤ dividend bits
   1 bit in quotient, subtract
 Otherwise
   0 bit in quotient, bring down next dividend bit

Restoring division
 Do the subtract, and if remainder goes < 0, add divisor back

Signed division
 Divide using absolute values
 Adjust sign of quotient and remainder as required

                 1001   quotient
 divisor 1000 1001010   dividend
             -1000
                 10
                 101
                 1010
                -1000
                   10   remainder

n-bit operands yield n-bit quotient and remainder
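
The restoring-division steps above can be sketched as follows for unsigned operands. The function name `restoring_divide` and the `bits` parameter are hypothetical, chosen only for this example.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")   # check for 0 divisor first
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor                      # do the subtract
        if remainder < 0:
            remainder += divisor                  # went negative: add divisor back
            quotient = quotient << 1              # 0 bit in quotient
        else:
            quotient = (quotient << 1) | 1        # 1 bit in quotient
    return quotient, remainder


q, r = restoring_divide(0b1001010, 0b1000)
print(bin(q), bin(r))  # 0b1001 0b10
```

The result matches the worked example: 1001010 divided by 1000 gives quotient 1001 and remainder 10.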

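Division Hardware

[Figure: division hardware. The Divisor register initially holds the divisor in its left half; the Remainder register initially holds the dividend. With 32-bit operands the algorithm takes 33 iterations.]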

         1001
 1000 1001010
     -1000
         10
         101
         1010
        -1000
           10



FIGURE 3.10 Division example using the algorithm in Figure 3.9. The bit
examined to determine the next step is circled in color.



Optimized Divider

[Figure: optimized divider. A single Remainder/Quotient register initially holds the dividend.]


One cycle per partial-remainder subtraction

Looks a lot like a multiplier!
 Same hardware can be used for both
Faster Division

Can’t use parallel hardware as in multiplier:
 Subtraction is conditional on sign of
remainder…


Faster dividers (e.g. SRT division)
generate multiple quotient bits per step
 Still require multiple steps

MIPS Division

Use HI/LO registers for result
 HI: 32-bit remainder
 LO: 32-bit quotient

Instructions:
 div rs, rt / divu rs, rt
 No overflow or divide-by-0 checking

Software must perform checks if required

 Use mfhi, mflo to access result
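
The following Python sketch models where a signed divide leaves its results, using the "divide absolute values, then adjust signs" approach described earlier. The Python function `div` only borrows the instruction's name. The sign convention used (quotient truncated toward zero, remainder taking the sign of the dividend) is the usual one for this architecture but is stated here as an assumption. The zero check stands in for the software check; the hardware itself would not raise anything, and the overflow case is not modeled.

```python
def div(rs: int, rt: int):
    """Model a signed divide: return (HI, LO) = (remainder, quotient)."""
    if rt == 0:
        # Software must perform this check; the instruction does not.
        raise ZeroDivisionError("divide by 0")
    quotient = abs(rs) // abs(rt)      # divide using absolute values
    remainder = abs(rs) % abs(rt)
    if (rs < 0) != (rt < 0):
        quotient = -quotient           # operand signs differ: negative quotient
    if rs < 0:
        remainder = -remainder         # remainder follows the dividend's sign
    return remainder, quotient


hi, lo = div(-7, 2)
print(hi, lo)  # -1 -3
```

In every case the results satisfy dividend = quotient × divisor + remainder.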



Floating Point

Representation for non-integral numbers
 Including very small and very large numbers

Numbers in scientific notation:
 $-2.34 \times 10^{56}$ (normalized)
 $+0.002 \times 10^{-4}$ (not normalized)
 $+987.02 \times 10^{9}$ (not normalized)

In binary (normalized form)
 $\pm 1.xxxxxxx_2 \times 2^{yyyy}$, where $1.xxxxxxx_2$ is the significand

Types float and double in C

Floating Point Standard

Defined by IEEE Std 754-1985

Developed in response to divergence of
representations:
 Portability issues for scientific code

Now almost universally adopted

Two representations
 Single precision (32-bit)
 Double precision (64-bit)
IEEE Floating-Point Format

 S | Exponent        | Fraction
   | single: 8 bits  | single: 23 bits
   | double: 11 bits | double: 52 bits

$x = (-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)

Normalized significand: 1.0 ≤ |significand| < 2.0
 Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
 Significand is Fraction with the “1.” restored
 View the “+” in (1 + Fraction) as a binary point

Exponent: excess representation: actual exponent + Bias
 Ensures exponent is unsigned
 Single: Bias = 127; Double: Bias = 1023
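
As an illustration of the field layout, the sketch below uses Python's standard `struct` module to pull the three fields out of a single-precision value and then rebuilds the value from the formula. The helper names `decode_single` and `rebuild_single` are assumptions for this example, and the rebuild only covers normalized numbers.

```python
import struct


def decode_single(value: float):
    """Round `value` to single precision and return (S, Exponent, Fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def rebuild_single(sign: int, exponent: int, fraction: int) -> float:
    """x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias), normalized case only."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


fields = decode_single(-0.75)
print(fields)                   # (1, 126, 4194304)
print(rebuild_single(*fields))  # -0.75
```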

Single-Precision Range

Exponents 00000000 and 11111111 are reserved

Smallest value:
 Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
 Fraction: 000…00 ⇒ significand = 1.0
 $\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}$

Largest value:
 Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
 Fraction: 111…11 ⇒ significand ≈ 2.0
 $\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}$
Double-Precision Range

Exponents 0000…00 and 1111…11 reserved

Smallest value:
 Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
 Fraction: 000…00 ⇒ significand = 1.0
 $\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}$

Largest value:
 Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
 Fraction: 111…11 ⇒ significand ≈ 2.0
 $\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}$

Floating-Point Precision

Relative precision:
 All fraction bits are significant
 Single: approx $2^{-23}$
   Equivalent to $23 \times \log_{10} 2 \approx 23 \times 0.3 \approx 6$ decimal digits of precision
 Double: approx $2^{-52}$
   Equivalent to $52 \times \log_{10} 2 \approx 52 \times 0.3 \approx 16$ decimal digits of precision
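
The digit estimates can be reproduced directly. The snippet below is a small check, again assuming Python's `float` is an IEEE double.

```python
import math
import sys

single_digits = 23 * math.log10(2)
double_digits = 52 * math.log10(2)
print(round(single_digits, 1), round(double_digits, 1))  # 6.9 15.7

# The gap between 1.0 and the next larger double is 2**-52.
print(sys.float_info.epsilon == 2.0 ** -52)  # True

# An increment of half that gap is lost when added to 1.0.
print(1.0 + 2.0 ** -53 == 1.0)  # True
```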

Floating-Point Example

Represent –0.75
 $-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$
 S = 1
 Fraction = $1000\ldots00_2$
 Exponent = –1 + Bias
   Single: $-1 + 127 = 126 = 01111110_2$
   Double: $-1 + 1023 = 1022 = 01111111110_2$

Single: 1 01111110 1000…00
Double: 1 01111111110 1000…00
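
The two bit patterns can be confirmed with Python's `struct` module. The helper names `single_bits` and `double_bits` are made up for this sketch; they print the sign, exponent, and fraction fields separated by spaces.

```python
import struct


def single_bits(value: float) -> str:
    """Sign, 8-bit exponent, and 23-bit fraction of a single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"


def double_bits(value: float) -> str:
    """Sign, 11-bit exponent, and 52-bit fraction of a double-precision value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    s = format(bits, "064b")
    return f"{s[0]} {s[1:12]} {s[12:]}"


print(single_bits(-0.75))  # 1 01111110 10000000000000000000000
print(double_bits(-0.75))
```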

Floating-Point Example

Which number is represented by the following single-precision float?
 1 10000001 01000…00
 S = 1
 Fraction = $01000\ldots00_2$
 Exponent = $10000001_2 = 129$
 $x = (-1)^1 \times (1 + .01_2) \times 2^{(129 - 127)} = (-1) \times 1.25 \times 2^2 = -5.0$
Exercise 1

Represent –85.125 in IEEE FP format

 $-85.125 = (-1)^1 \times 85.125$
 $85 = 1010101_2$
 $.125 = .001_2$
 $85.125 = 1010101.001_2 = 1.010101001_2 \times 2^6$
 S = 1
 Fraction = $010101001_2$ (followed by zeros)
 Exponent = $6 + 127 = 133 = 10000101_2$
 Single: 1 10000101 0101010010…00
Exercise 2

Represent 176.375 in IEEE FP format

 $176.375 = 10110000.011_2 = 1.0110000011_2 \times 2^7$
 Exponent = $127 + 7 = 134 = 10000110_2$
 Single: 0 10000110 01100000110000000000000
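
The first step in both exercises, writing the magnitude in binary, can be sketched as below: the integer part is converted directly and the fractional part by repeated doubling. The function `to_binary` and its `frac_bits` limit are assumptions for this illustration, and it handles non-negative values only.

```python
def to_binary(value: float, frac_bits: int = 12) -> str:
    """Binary expansion of a non-negative number, up to `frac_bits` fraction bits."""
    whole = int(value)
    frac = value - whole
    digits = []
    for _ in range(frac_bits):
        if frac == 0:
            break
        frac *= 2                 # doubling moves the next fraction bit
        bit = int(frac)           # into the integer position
        digits.append(str(bit))
        frac -= bit
    text = bin(whole)[2:]
    return text + "." + "".join(digits) if digits else text


print(to_binary(85.125))   # 1010101.001
print(to_binary(176.375))  # 10110000.011
```

Normalizing then means moving the binary point to just after the leading 1 and counting the positions moved (6 and 7 here) as the actual exponent.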
Special Values

Denormal Numbers

The smallest representable normalized number is $\pm 1.0 \times 2^{-126}$

How to close the gap between it and 0?

Subnormal (denormal) number:
 When all the exponent bits are 0
 The leading hidden bit of the significand is implied to be 0

Denormal Numbers

Exponent = 000...0 ⇒ hidden bit is 0

$x = (-1)^S \times (0 + \text{Fraction}) \times 2^{1 - \text{Bias}}$

 Denormal with fraction = 000...0

$x = (-1)^S \times (0 + 0) \times 2^{1 - \text{Bias}} = \pm 0.0$

Two representations of 0.0!

 Largest subnormal number (single precision) is $0.99999988 \times 2^{-126}$
 It is close to the smallest normalized number $1 \times 2^{-126}$
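
A few single-precision bit patterns illustrate the subnormal range and the two zeros. `single_from_bits` is a hypothetical helper that reinterprets a 32-bit pattern; the comparisons are done in Python doubles, which represent every single-precision value exactly.

```python
import math
import struct


def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest_subnormal = single_from_bits(0x007FFFFF)   # Exponent 0, Fraction all 1s
smallest_subnormal = single_from_bits(0x00000001)  # Exponent 0, Fraction 000...01
positive_zero = single_from_bits(0x00000000)
negative_zero = single_from_bits(0x80000000)

print(largest_subnormal == (1 - 2.0 ** -23) * 2.0 ** -126)  # True
print(smallest_subnormal == 2.0 ** -149)                    # True
print(positive_zero == negative_zero)                       # True: both are 0.0
print(math.copysign(1.0, negative_zero))                    # -1.0: the sign bit survives
```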
Infinities and NaNs

Exponent = 111...1, Fraction = 000...0
 ±Infinity
 Can be used in subsequent calculations, avoiding need for
overflow check

Exponent = 111...1, Fraction ≠ 000...0
 Not-a-Number (NaN)
 Indicates illegal or undefined result

e.g., 0.0 / 0.0
 Can be used in subsequent calculations
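
The behaviour of infinities and NaNs in later calculations can be observed from Python. One caveat: Python itself raises `ZeroDivisionError` for `0.0 / 0.0` instead of returning NaN, so this sketch produces a NaN from `inf - inf`.

```python
import math

inf = float("inf")
overflowed = 1e308 * 10        # exceeds the double range, so the result is +infinity
nan = inf - inf                # undefined result, so the result is NaN

print(overflowed == inf)       # True
print(inf + 1.0 == inf)        # True: infinity propagates through later operations
print(math.isnan(nan + 1.0))   # True: NaN propagates as well
print(nan == nan)              # False: NaN compares unequal even to itself
```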

Floating-Point Addition

Consider a 4-digit decimal example
 $9.999 \times 10^1 + 1.610 \times 10^{-1}$

1. Align decimal points
 Shift number with smaller exponent
 $9.999 \times 10^1 + 0.016 \times 10^1$

2. Add significands
 $9.999 \times 10^1 + 0.016 \times 10^1 = 10.015 \times 10^1$

3. Normalize result & check for over/underflow
 $1.0015 \times 10^2$

4. Round and renormalize if necessary
 $1.002 \times 10^2$

Floating-Point Addition

Now consider a 4-digit binary example
 $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$ (in decimal, 0.5 + –0.4375)

1. Align binary points
 Shift number with smaller exponent
 $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$

2. Add significands
 $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$

3. Normalize result & check for over/underflow
 $1.000_2 \times 2^{-4}$, with no over/underflow

4. Round and renormalize if necessary
 $1.000_2 \times 2^{-4}$ (no change) = 0.0625

FP Adder Hardware

Much more complex than integer adder

Doing it in one clock cycle would make
clock cycle too long:
 Much longer than integer operations
 Slower clock would penalize all instructions

FP adder usually takes several cycles
 Can be pipelined
FP Adder Hardware

[Figure: FP adder datapath, with its stages labeled Step 1 through Step 4.]

FP Arithmetic Hardware

FP multiplier is of similar complexity to FP adder
 But uses a multiplier for significands instead of an adder

FP arithmetic hardware usually does
 Addition, subtraction, multiplication, division, reciprocal,
square-root
 FP ↔ integer conversion

Operations usually take several cycles
 Can be pipelined

Accurate Arithmetic

IEEE Std 754 specifies additional rounding control
 Extra bits of precision (guard, round, sticky)
 Choice of rounding modes
 Allows programmer to fine-tune numerical behavior of a
computation

Not all FP units implement all options
 Most programming languages and FP libraries just use
defaults

Trade-off between hardware complexity,
performance, and market requirements
Subword Parallelism

Graphics and audio applications can take
advantage of performing simultaneous
operations on short vectors
 Example: 128-bit adder:

Sixteen 8-bit adds

Eight 16-bit adds

Four 32-bit adds

Also called data-level parallelism, vector
parallelism, or Single Instruction, Multiple Data
(SIMD)
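
The idea of partitioning one wide adder into independent lanes can be imitated in software. The sketch below is an assumption-laden illustration rather than real SIMD hardware: it packs four 8-bit lanes into one 32-bit word (not 128 bits) and uses masking so that a carry out of one lane never reaches the next. The name `add_4x8` is made up for this example.

```python
LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane


def add_4x8(a: int, b: int) -> int:
    """Add four packed 8-bit lanes; each lane wraps around modulo 256."""
    partial = (a & LOW7) + (b & LOW7)       # carries cannot cross a lane boundary
    top = (a ^ b) & HIGH                    # top bits added without carry-out
    return (partial ^ top) & 0xFFFFFFFF


# Lanes: 0x01+0x01, 0xFF+0x01, 0x7F+0x01, 0x80+0x80
print(hex(add_4x8(0x01FF7F80, 0x01010180)))  # 0x2008000
```

Each lane's result (0x02, 0x00, 0x80, 0x00) is what a separate 8-bit adder would produce.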
Associativity

Parallel programs may interleave operations in
unexpected orders
 Assumptions of associativity may fail

                  (x+y)+z       x+(y+z)
x                -1.50E+38     -1.50E+38
y                 1.50E+38      1.50E+38
z                 1.0           1.0
x+y or y+z        0.00E+00      1.50E+38
result            1.00E+00      0.00E+00
 Need to validate parallel programs under
varying degrees of parallelism
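
The table's values can be reproduced directly. Python floats are double precision, not single, but 1.0 is still far too small to change 1.5e38, so the same failure of associativity appears.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y is exactly 0.0, so adding z gives 1.0
right = x + (y + z)   # y + z rounds back to 1.5e38, so the sum is 0.0

print(left, right)    # 1.0 0.0
print(left == right)  # False
```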
Who Cares About FP Accuracy?


Important for scientific code
 But for everyday consumer use?

“My bank balance is out by 0.0002¢!”


The Intel Pentium FDIV bug
 The market expects accuracy
 See Colwell, The Pentium Chronicles
Concluding Remarks

Bits have no inherent meaning
 Interpretation depends on the
instructions applied

Computer representations of numbers
 Finite range and precision
 Need to account for this in programs

Concluding Remarks

ISAs support arithmetic
 Signed and unsigned integers
 Floating-point approximation to reals

Bounded range and precision
 Operations can overflow and underflow

MIPS ISA
 Core instructions: 54 most frequently used

100% of SPECINT, 97% of SPECFP
 Other instructions: less frequent
