Chapter IV Computer Arithmetic
Jehan-François Pâris
[email protected]
Chapter Organization
• Representing negative numbers
• Integer addition and subtraction
• Integer multiplication and division
• Floating point operations
• Examples of implementation
– IBM 360, RISC, x86
A warning
• Binary addition, subtraction, multiplication and
division are very easy
ADDITION AND SUBTRACTION
General concept
• Decimal addition
      (carry)  1_
               19
             +  7
               26
• Binary addition
      (carry)  111_
              10011
            +   111
              11010
  – 16 + 8 + 2 = 26
Realization
• Simplest solution is a battery of full adders
[Figure: chain of four full adders; inputs x3 y3, x2 y2, x1 y1, x0 y0; sum outputs s3 s2 s1 s0; overflow output o]
Observations
• The adder adds four-bit values
• Output o indicates if there is an overflow
– A result that cannot be represented using 4
bits
– Happens when x + y > 15
• Operation is slowed down by carry propagation
– Faster solutions (not discussed here)
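The battery of full adders can be simulated in a few lines of Python. This is only a sketch: the names full_adder and ripple_carry_add and the parameter n (number of bits) are chosen here for illustration.

```python
def full_adder(x, y, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    total = x + y + carry_in
    return total & 1, total >> 1


def ripple_carry_add(x, y, n=4):
    """Add two n-bit unsigned values; return (n-bit sum, overflow output o)."""
    carry = 0
    result = 0
    for i in range(n):
        # Each stage must wait for the carry of the previous stage:
        # this is the carry propagation that slows the adder down.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(11, 3))  # (14, 0): no overflow
print(ripple_carry_add(9, 8))   # (1, 1): 17 does not fit in 4 bits
```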
Signed and unsigned additions
• Unsigned addition in 4-bit arithmetic
      (carry)  11_
              1011
            + 0011
              1110
  – 11 + 3 = 14 (8 + 4 + 2)
• Signed addition in 4-bit arithmetic
      (carry)  11_
              1011
            + 0011
              1110
  – -5 + 3 = -2
Signed and unsigned additions
• Same rules apply even though bit strings
represent different values
• Sole difference is overflow handling
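A short Python sketch makes the point concrete: the adder produces one bit string, and only its interpretation differs. The helper names as_unsigned and as_signed are assumptions made for this illustration.

```python
def as_unsigned(bits, n=4):
    """Interpret the low n bits as an unsigned value."""
    return bits & ((1 << n) - 1)


def as_signed(bits, n=4):
    """Interpret the low n bits as a two's complement value."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits >> (n - 1) else bits


total = (0b1011 + 0b0011) & 0b1111      # same 4-bit addition in both cases
print(format(total, "04b"))             # 1110
print(as_unsigned(0b1011), "+ 3 =", as_unsigned(total))  # 11 + 3 = 14
print(as_signed(0b1011), "+ 3 =", as_signed(total))      # -5 + 3 = -2
```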
Overflow handling (I)
• No overflow in signed 4-bit arithmetic
      (carry) 111__
               1110
             + 0011
               0001
  – -2 + 3 = 1 (correct)
• Overflow in signed 4-bit arithmetic
      (carry)  11__
               0110
             + 0011
               1001
  – 6 + 3 = -7 (false)
Overflow handling (II)
• In signed arithmetic an overflow happens when
– The sum of two positive numbers exceeds the
maximum positive value that can be
represented using n bits: $2^{n-1} - 1$
– The sum of two negative numbers falls below
the minimum negative value that can be
represented using n bits: $-2^{n-1}$
Example
• Four-bit arithmetic:
– Sixteen possible values
– Positive overflow happens when result > 7
– Negative overflow happens when result < -8
• Eight-bit arithmetic:
– 256 possible values
– Positive overflow happens when result > 127
– Negative overflow happens when result < -128
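The overflow limits can be checked with a small Python sketch. The function name signed_add and its return convention (wrapped result plus an overflow flag) are assumptions for illustration, not a description of any particular hardware.

```python
def signed_add(x, y, n=4):
    """Add two n-bit signed values.

    Returns (result as the hardware would store it, overflow flag).
    """
    lowest = -(1 << (n - 1))          # -8 for n = 4, -128 for n = 8
    highest = (1 << (n - 1)) - 1      # 7 for n = 4, 127 for n = 8
    true_sum = x + y
    overflow = true_sum < lowest or true_sum > highest
    # Wrap around modulo 2**n, then map back to the signed range.
    wrapped = (true_sum - lowest) % (1 << n) + lowest
    return wrapped, overflow


print(signed_add(-2, 3))        # (1, False)
print(signed_add(6, 3))         # (-7, True)
print(signed_add(100, 28, 8))   # (-128, True)
```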
Overflow handling (III)
• MIPS architecture handles signed and unsigned
overflows in a very different fashion:
– Ignores unsigned overflows
• Implements modulo $2^n$ arithmetic
– Generates an interrupt whenever it detects a
signed overflow
• Lets the OS handle the condition
Why?
• To keep the CPU as simple and regular as
possible
An interesting consequence
• Most C compilers ignore overflows
– C compilers must use unsigned arithmetic for
their integer operations
• Fortran compilers expect overflow conditions to
be detected
– Fortran compilers must use signed arithmetic
for their integer operations
Subtraction
• Can be implemented by
– Specific hardware
– Negating the subtrahend
Negating a number
• Toggle all bits then add one
In 4-bit arithmetic (I)
0000 0 1111 +1 = 0000 0
0001 1 1110 +1 = 1111 -1
0010 2 1101 +1 = 1110 -2
0011 3 1100 +1 = 1101 -3
0100 4 1011 +1 = 1100 -4
0101 5 1010 +1 = 1011 -5
0110 6 1001 +1 = 1010 -6
0111 7 1000 +1 = 1001 -7
In 4-bit arithmetic (II)
1000 -8 0111 +1 = 1000 ?
1001 -7 0110 +1 = 0111 7
1010 -6 0101 +1 = 0110 6
1011 -5 0100 +1 = 0101 5
1100 -4 0011 +1 = 0100 4
1101 -3 0010 +1 = 0011 3
1110 -2 0001 +1 = 0010 2
1111 -1 0000 +1 = 0001 1
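The two tables can be reproduced with a short Python sketch of the "toggle all bits then add one" rule. The names negate and subtract are illustrative; subtract shows subtraction done by negating the subtrahend.

```python
def negate(bits, n=4):
    """Two's complement negation: toggle all n bits, then add one."""
    mask = (1 << n) - 1
    return ((bits ^ mask) + 1) & mask


def subtract(x, y, n=4):
    """Compute x - y by adding the negated subtrahend (n-bit result)."""
    return (x + negate(y, n)) & ((1 << n) - 1)


print(format(negate(0b0101), "04b"))          # 1011, that is -5
print(format(negate(0b1000), "04b"))          # 1000: -8 has no positive counterpart
print(format(subtract(0b0101, 0b0011), "04b"))  # 0010, that is 5 - 3 = 2
```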
MULTIPLICATION
Decimal multiplication
      (carry)  1_
               37
             x 12
               74
              370
              444
• What are the rules?
– Successively multiply the multiplicand by each digit of the
multiplier, starting at the right, shifting the result left by
one extra position each time but the first
– Sum all partial results
Binary multiplication
      (carry) 1111___
                 1101
               x  101
                 1101
                  00
               110100
              1000001
• What are the rules?
– Successively multiply the multiplicand by each digit of
the multiplier, starting at the right, shifting the result left by
one extra position each time but the first
– Sum all partial results
• Binary multiplication is easy!
Binary multiplication table
X 0 1
0 0 0
1 0 1
Algorithm
• Clear contents of 64-bit product register
• for (i = 0; i < 32; i++) {
– if (LSB of multiplier register == 1)
• Add contents of multiplicand register to product
register
– Shift multiplier register right one position
– Shift multiplicand register left one position
• } // for loop
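The loop above can be sketched in Python as follows. The function name multiply_v1 and the parameter n (register width, 32 in the slides) are assumptions made so that the 4-bit examples can be run as well.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Shift-and-add multiplication of two unsigned n-bit values.

    The multiplicand and product registers are 2n bits wide.
    """
    mask = (1 << (2 * n)) - 1
    product = 0                                   # clear product register
    for _ in range(n):
        if multiplier & 1:                        # LSB of multiplier is 1
            product = (product + multiplicand) & mask
        multiplier >>= 1                          # shift multiplier right
        multiplicand = (multiplicand << 1) & mask # shift multiplicand left
    return product


print(multiply_v1(0b0011, 0b0011, n=4))  # 9
print(multiply_v1(0b1101, 0b101, n=4))   # 65
```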
Multiplier: First version
[Figure: multiplicand register (64 bits, shift left), multiplier register (shift right), 64-bit ALU, control]
• First addition
Multiplicand Multiplier Product
0011 0011 0011
Example (II)
• Shift right and left
Multiplicand Multiplier Product
0110 0001 0011
• Second addition
Multiplicand Multiplier Product
0110 0001 1001
– 0110 + 0011 = 1001
Example (III)
• Shift right and left
Multiplicand Multiplier Product
1100 0000 1001
       1101
     x  101
       1101
        00
     110100
    1000001
• Observe that the least significant bit added during each cycle
remains unchanged
Algorithm
• Clear contents of 64-bit product register
• for (i = 0; i < 32; i++) {
– if (LSB of multiplier register == 1)
• Add contents of multiplicand register to
product register
– Save LSB of product register
– Shift both multiplier register and product register
right one position
• } // for loop
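A possible Python rendering of this second version is shown below. It is a sketch under assumptions: the name multiply_v2, the parameter n, and the variable saved_bits (the bits shifted out of the product register) are all introduced here for illustration.

```python
def multiply_v2(multiplicand, multiplier, n=32):
    """Multiply two unsigned n-bit values, shifting the product right.

    The multiplicand never moves, so the additions only involve the
    upper part of the result; the bits shifted out are saved because
    later additions can no longer change them.
    """
    product = 0
    saved_bits = 0
    for i in range(n):
        if multiplier & 1:                    # LSB of multiplier is 1
            product += multiplicand
        saved_bits |= (product & 1) << i      # save LSB of product
        product >>= 1                         # shift product right
        multiplier >>= 1                      # shift multiplier right
    return (product << n) | saved_bits        # upper half, then saved bits


print(multiply_v2(0b0011, 0b0011, n=4))  # 9
```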
Multiplier: Second version
[Figure: multiplicand register, multiplier register (shift right), 32-bit ALU, control and test, product register (64 bits, shift right and save)]
Decimal Example (I)
• Multiply 27 by 12
• Start
Multiplicand Multiplier Product Result
27 12 -- --
• First digit
Multiplicand Multiplier Product Result
27 12 54 --
Decimal Example (II)
• Shift right multiplier and product
Multiplicand Multiplier Product Result
27 1 5 4
• Second digit
Multiplicand Multiplier Product Result
27 1 32 4
Decimal Example (III)
• Shift right multiplier and product
Multiplicand Multiplier Product Result
27 0 3 24
• First bit
Multiplicand Multiplier Product Result
0011 0011 0011 --
Example (II)
• Shift right multiplier and product
Multiplicand Multiplier Product Result
0011 0001 0001 1-
• Second bit
Multiplicand Multiplier Product Result
0011 0001 0100 1-
Product register contains 0011 + 0001 = 0100
Example (III)
• Shift right multiplier and product
Multiplicand Multiplier Product Result
0011 0000 0010 01-
[Figure: 32-bit ALU, control and test, combined multiplier + product register (shift right and save)]
Third Optimization
• Multiplication requires 32 additions and 32 shift
operations
• Can have two or more partial multiplications
– One using bits 0-15 of multiplier
– A second using bits 16-31
then add together the partial results
Multiplying negative numbers
• Can use the same algorithm as before but we
must extend the sign bit of the product
Related MIPS instructions (I)
• Integer multiplication uses a separate pair of
registers (hi and lo)
• mult $s0, $s1
– multiply contents of register $s0 by contents
of register $s1 and store results in register
pair hi-lo
• multu $s0, $s1
– same but unsigned
Related MIPS instructions (II)
• mflo $s0
– Move contents of register lo to register $s0
• mfhi $s0
– Move contents of register hi to register $s0
DIVISION
Division
• Implemented by successive subtractions
• Result must satisfy the equality
Dividend = Divisor × Quotient + Remainder
Decimal division (long division)
         303
    7 ) 2126
       -210
          26
         -21
           5
• What are the rules?
– Repeatedly try to subtract a smaller multiple of the divisor
from the dividend
– Record multiple (or zero)
– At each step, repeat with a lower power of ten
– Stop when remainder is smaller than divisor
Binary division
          011
    11 ) 1011
        -11        X  (subtraction fails: mark 0)
         1011
        - 11       >  (divisor shifted right once: mark 1)
          101
        -  11      >> (divisor shifted right twice: mark 1)
           10
• What are the rules?
– Repeatedly try to subtract powers of two of divisor from dividend
– Mark 1 for success, 0 for failure
– At each step, shift divisor one position to the right
– Stop when remainder is smaller than divisor
Same division in decimal
        2+1=3
    3 ) 11
       -12         X  (subtraction fails: mark 0)
        11
       - 6         >  (divisor shifted right once: mark 1)
         5
       - 3         >> (divisor shifted right twice: mark 1)
         2
• What are the rules?
– Repeatedly try to subtract powers of two of divisor from dividend
– Mark 1 for success, 0 for failure
– At each step, shift divisor one position to the right
– Stop when remainder is smaller than divisor
Observations
• Binary division is actually simpler
– We start with a left-shifted version of divisor
– We try to subtract it from dividend
• No need to find out which multiple to subtract
– We mark 1 for success, 0 for failure
– We shift divisor one position right after every
attempt
How to start the division
• One 64-bit register for successive remainders
– Initialized with dividend
• One 64-bit register for divisor
– Start with divisor in upper half
• One 32-bit register for the quotient
– All zeroes
How we proceed (I)
• After each step we shift the divisor to the right
one position at a time
[Figure: the divisor moving one position to the right at each step]
How we proceed (II)
• After each step we shift the contents of the
quotient register one position to the left
– To make space for the new 0 or 1 being
inserted
0
01
011
0110
Division Algorithm
• for i in range(0, 33): # from 0 to 32
– Subtract contents of divisor register from
remainder register
– if remainder ≥ 0:
• Shift quotient register to the left
• Set new rightmost bit to 1
– else:
• Undo subtraction
• Shift quotient register to the left
• Set new rightmost bit to 0
– Shift contents of divisor register right one position
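The algorithm translates almost line for line into runnable Python. In this sketch the function name divide and the parameter n (operand width, 32 in the slides) are assumptions; operands are taken to be unsigned n-bit values.

```python
def divide(dividend, divisor, n=32):
    """Restoring division of unsigned n-bit values.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = dividend              # remainder register starts as dividend
    divisor_reg = divisor << n        # divisor starts in the upper half
    quotient = 0
    for _ in range(n + 1):            # 33 steps for 32-bit operands
        remainder -= divisor_reg
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # success: insert a 1
        else:
            remainder += divisor_reg         # undo subtraction
            quotient = quotient << 1         # failure: insert a 0
        divisor_reg >>= 1             # shift divisor right
    return quotient, remainder


print(divide(0b1011, 0b11, n=4))  # (3, 2)
print(divide(2126, 7))            # (303, 5)
```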
A simple divider
[Figure: divisor register (64 bits, shift right), quotient register (shift left), 64-bit ALU, control and test, remainder register (64 bits)]
Signed division
• Easiest solution is to remember the sign of the
operands and adjust the sign of the quotient and
remainder accordingly
• A little problem:
5 ÷ 2 = 2 and the remainder is 1
-5 ÷ 2 = -2 and the remainder is -1
The sign of the remainder must match the sign
of the dividend
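One way to sketch the "remember the signs" approach in Python is shown below. The function name signed_divide is illustrative. In this sketch the remainder takes the sign of the dividend, so that Dividend = Divisor × Quotient + Remainder always holds.

```python
def signed_divide(dividend, divisor):
    """Divide magnitudes, then fix the signs of quotient and remainder."""
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):   # operands have different signs
        quotient = -quotient
    if dividend < 0:                      # remainder follows the dividend
        remainder = -remainder
    return quotient, remainder


for a, b in [(5, 2), (-5, 2), (5, -2), (-5, -2)]:
    q, r = signed_divide(a, b)
    print(a, "÷", b, "=", q, "remainder", r)
```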
Related MIPS instructions
• Integer division uses the same pair of registers
(hi and lo) as integer multiplication
• div $s0, $s1
– divide contents of register $s0 by contents of
register $s1, leave the quotient in register lo
and the remainder in register hi
• divu $s0, $s1
– same but unsigned
TRANSITION SLIDE
• Here end the materials that were on the first fall
2012 midterm
• Here start the materials that will be on the fall
2012 midterm
FLOATING POINT OPERATIONS
Floating point numbers
• Used to represent real numbers
• Very similar to scientific notation
$3.5 \times 10^{6}$, $0.82 \times 10^{-5}$, $75 \times 10^{6}$, …
• Both decimal numbers in scientific notation and
floating point numbers can be normalized:
$3.5 \times 10^{6}$, $8.2 \times 10^{-6}$, $7.5 \times 10^{7}$, …
Fractional binary numbers
• 0.1 is ½ or 0.5ten
• 0.01 is ¼ or 0.25ten
• 0.11 is ½ + ¼ = ¾ or 0.75ten
• 1.1 is 1½ or 1.5ten
• 10.01 is 2 + ¼ or 2.25ten
• 11.11 is ______ or _____
Normalizing binary numbers
• 0.1 becomes $1.0 \times 2^{-1}$
• 0.01 becomes $1.0 \times 2^{-2}$
• 0.11 becomes $1.1 \times 2^{-1}$
S   Exp    Coefficient
0   01…1   000…0
• Biased exponent is 127ten
• True coefficient is implicit one followed by all
zeroes
Decoding a floating point number
• Sign indicated by first bit
• Subtract 127 from biased exponent to obtain
power of two:
<be> – 127
• Use coefficient to construct a normalized binary
value with a binary point:
1.<coefficient>
• Number being represented is
$1.\langle\text{coefficient}\rangle \times 2^{\langle\text{be}\rangle - 127}$
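The decoding steps can be sketched in Python for a 32-bit pattern given as an integer. The function name decode_float32 is an assumption, and the sketch only handles normalized numbers (biased exponents 1 to 254), not the reserved special cases.

```python
def decode_float32(bits):
    """Decode a normalized single-precision bit pattern.

    Returns (sign, biased_exponent, coefficient, value).
    """
    sign = bits >> 31                       # first bit
    biased_exponent = (bits >> 23) & 0xFF   # next 8 bits
    coefficient = bits & 0x7FFFFF           # last 23 bits
    significand = 1 + coefficient / 2**23   # implicit leading one
    value = (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)
    return sign, biased_exponent, coefficient, value


print(decode_float32(0x3F800000)[3])  # 1.0
print(decode_float32(0x40400000)[3])  # 3.0
print(decode_float32(0xBF600000)[3])  # -0.875
```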
First example
0 01…1 00000000000000000000000000000
• Sign bit is zero:
Number is positive
• Biased exponent is 127
Power of two is zero
• Normalized binary value is
1.0000000
• Number is $1 \times 2^{0} = 1$
Second example
0 10…0 10000000000000000000000000000
• Sign bit is zero:
Number is positive
• Biased exponent is 128
Power of two is 1
• Normalized binary value is
1.1000000
• Number is $1.1 \times 2^{1}$ = 11 = 3ten
Third example
1 01…10 1100…0
• Sign bit is 1:
Number is negative
• Biased exponent is 126
Power of two is –1
• Normalized binary value is
1.1100000
• Number is $-1.11 \times 2^{-1}$ = –0.111 = –7/8ten
Can we do it now?
0 129ten 10100000000000000000000000000
• Sign bit is 0:
Number is ___________
• Biased exponent is 129
Power of two is _______
• Normalized binary value is
1.__________
• Number is _________________________
Encoding a floating point number
• Use sign to pick sign bit
• Normalize the number:
Convert it to form $1.\langle\text{more bits}\rangle \times 2^{\langle\text{exp}\rangle}$
• Add 127 to exponent <exp> to obtain
biased exponent <be>
• Coefficient <coeff> is equal to fractional part
<more bits> of number
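The encoding steps can be sketched in Python as follows. The function name encode_float32 is illustrative. The sketch assumes a nonzero input whose exponent lies in the normalized range, and it truncates extra coefficient bits instead of rounding them.

```python
from fractions import Fraction


def encode_float32(x):
    """Return the 32-bit pattern (as an int) of a nonzero number x."""
    x = Fraction(x)
    sign = 1 if x < 0 else 0            # use sign to pick sign bit
    significand = abs(x)
    exponent = 0
    while significand >= 2:             # normalize to 1.<more bits>
        significand /= 2
        exponent += 1
    while significand < 1:
        significand *= 2
        exponent -= 1
    biased_exponent = exponent + 127    # add 127 to exponent
    coefficient = int((significand - 1) * 2**23)  # fractional part, truncated
    return (sign << 31) | (biased_exponent << 23) | coefficient


print(hex(encode_float32(7)))               # 0x40e00000
print(hex(encode_float32(0.5)))             # 0x3f000000
print(hex(encode_float32(-2)))              # 0xc0000000
print(hex(encode_float32(Fraction(9, 4))))  # 0x40100000
```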
First example
• Represent 7:
– Convert to binary: 111
– Normalize: $1.11 \times 2^{2}$
– Sign bit is 0
– Biased exponent is 127 + 2 = 10000001two
– Coefficient is 1100…0
0 10…01 1100…0
Second example
• Represent 1/2
– Convert to binary: 0.1
– Normalize: $1.0 \times 2^{-1}$
– Sign bit is 0
– Biased exponent is 127 – 1 = 01111110two
– Coefficient is 00…0
0 01…10 00…0
Third example
• Represent –2
– Convert to binary: 10
– Normalize: $1.0 \times 2^{1}$
– Sign bit is 1
– Biased exponent is 127 + 1 = 10000000two
– Coefficient is 00…0
1 10…0 00…0
Fourth example
• Represent 9/4
– Convert to binary: $1001 \times 2^{-2}$ = 10.01
– Normalize: $1.001 \times 2^{1}$
– Sign bit is 0
– Biased exponent is 127 + 1 = 10000000two
– Coefficient is 0010…0
0 10…0 0010…0
Can we do it now?
• Represent 6.25:
– Convert to binary: ________
– Normalize: 1.______×2_______
– Sign bit is _____
– Biased exponent is 127 + ___ = ______ten
– Coefficient is_________
Range
• Can represent numbers between
$1.00\ldots0 \times 2^{-126}$ and $1.11\ldots1 \times 2^{127}$
– Say between $2^{-126}$ and $2^{128}$
• Observing that $2^{10} \approx 10^{3}$
we divide the exponents by 10 and multiply them by
3 to obtain the interval expressed in powers of 10
– Approximate range is $10^{-38}$ to $10^{38}$
Accuracy
• We have 24 significant bits
– Theoretical precision of $1/2^{24}$, that is, roughly
$1/10^{7}$
• Cannot add correctly billions or trillions
• Actual situation is worse if we do too many
computations
– 1,000,000 – 999,999.4875 = ???
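The subtraction above can be tried out by rounding each operand to single precision first. This sketch uses Python's struct module for the rounding; the helper name to_float32 is an assumption.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = to_float32(1_000_000.0)
b = to_float32(999_999.4875)
print(b)        # 999999.5: near one million, singles are spaced 1/16 apart
print(a - b)    # 0.5 instead of the exact answer 0.5125
```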
Guard bits
• Do all arithmetic operations with two additional
bits to reduce rounding errors
Double precision arithmetic (I)
• Use 64-bit double words
• Allows us to have
– One bit for sign
– Eleven bits for exponent
• 2,048 possible values
– Fifty-two bits for coefficient
• Plus the implicit leading bit
Double precision arithmetic (II)
• Exponents are still represented using a biased
notation
– Stored value = actual exponent + bias
• For 11-bit exponents, bias is 1023
– Stored value of 1 corresponds to –1,022
– Stored value of 2,046 corresponds to +1,023
– Stored values of 0 and 2,047 are reserved for
special cases
Double precision arithmetic (III)
• Can now represent numbers between
$1.00\ldots0 \times 2^{-1022}$ and $1.11\ldots1 \times 2^{1023}$
– Say between $2^{-1022}$ and $2^{1024}$
– Approximate range is $10^{-307}$ to $10^{307}$
• In reality, more like $10^{-308}$ to $10^{308}$
Double precision arithmetic (IV)
• We now have 53 significant bits
– Theoretical precision of $1/2^{53}$, that is, roughly
$1/10^{16}$
• Can now add correctly billions or trillions
If that is not enough, …
• Can use 128-bit quad words
• Allows us to have
– One bit for sign
– Fifteen bits for exponent
• From –16382 to +16383
– One hundred twelve bits for coefficient
• Plus the implicit leading bit
Decimal floating point addition (I)
• $5.25 \times 10^{3} + 1.22 \times 10^{2}$ = ?
• Denormalize number with smaller exponent:
$5.25 \times 10^{3} + 0.122 \times 10^{3}$
• Add the numbers:
$5.25 \times 10^{3} + 0.122 \times 10^{3} = 5.372 \times 10^{3}$
• Result is normalized
Decimal floating point addition (II)
• $9.25 \times 10^{3} + 8.22 \times 10^{2}$ = ?
• Denormalize number with smaller exponent:
$9.25 \times 10^{3} + 0.822 \times 10^{3}$
• Add the numbers:
$9.25 \times 10^{3} + 0.822 \times 10^{3} = 10.072 \times 10^{3}$
• Normalize the result:
$10.072 \times 10^{3} = 1.0072 \times 10^{4}$
Binary floating point addition (I)
• Say 1001 + 10 or $1.001 \times 2^{3} + 1.0 \times 2^{1}$
• Denormalize number with smaller exponent:
$1.001 \times 2^{3} + 0.01 \times 2^{3}$
• Add the numbers:
$1.001 \times 2^{3} + 0.01 \times 2^{3} = 1.011 \times 2^{3}$
• Result is normalized
Binary floating point addition (II)
• Say 101 + 11 or $1.01 \times 2^{2} + 1.1 \times 2^{1}$
• Denormalize number with smaller exponent:
$1.01 \times 2^{2} + 0.11 \times 2^{2}$
• Add the numbers:
$1.01 \times 2^{2} + 0.11 \times 2^{2} = 10.00 \times 2^{2}$
• Normalize the result:
$10.00 \times 2^{2} = 1.000 \times 2^{3}$
Binary floating point subtraction
• Say 101 – 11 or $1.01 \times 2^{2} - 1.1 \times 2^{1}$
• Denormalize number with smaller exponent:
$1.01 \times 2^{2} - 0.11 \times 2^{2}$
• Perform the subtraction:
$1.01 \times 2^{2} - 0.11 \times 2^{2} = 0.10 \times 2^{2}$
• Normalize the result:
$0.10 \times 2^{2} = 1.0 \times 2^{1}$
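The denormalize, add, normalize sequence can be sketched in Python. This is an illustration under assumptions: a number is modeled as a hypothetical pair (significand, exponent) with an exact Fraction significand, so no rounding takes place, unlike in real hardware. Subtraction is addition of a negative significand.

```python
from fractions import Fraction


def normalize(significand, exponent):
    """Bring the significand magnitude into the range [1, 2)."""
    if significand == 0:
        return significand, 0
    while abs(significand) >= 2:
        significand /= 2
        exponent += 1
    while abs(significand) < 1:
        significand *= 2
        exponent -= 1
    return significand, exponent


def fp_add(a, b):
    """Add two (significand, exponent) pairs, each meaning s * 2**e."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:                        # make a the larger-exponent operand
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb = sb / 2 ** (ea - eb)           # denormalize the smaller operand
    return normalize(sa + sb, ea)      # add, then normalize the result


five = (Fraction(5, 4), 2)             # 1.01 x 2^2
three = (Fraction(3, 2), 1)            # 1.1 x 2^1
print(fp_add(five, three))                  # (Fraction(1, 1), 3), that is 1000
print(fp_add(five, (-Fraction(3, 2), 1)))   # (Fraction(1, 1), 1), that is 10
```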
Decimal floating point multiplication
• Exponent of product is the sum of the exponents
of multiplicand and multiplier
• Coefficient of product is the product of the
coefficients of multiplicand and multiplier
• Compute sign using usual rules of arithmetic
• May have to renormalize the product
Decimal floating point multiplication
• $6 \times 10^{3} \times 2.5 \times 10^{2}$ = ?
• Exponent of product is:
3 + 2 = 5
• Multiply the coefficients:
6 × 2.5 = 15
• Result will be positive
• Normalize the result:
$15 \times 10^{5} = 1.5 \times 10^{6}$
Binary floating point multiplication
• Exponent of product is the sum of the exponents
of multiplicand and multiplier
• Coefficient of product is the product of the
coefficients of multiplicand and multiplier
• Compute sign using usual rules of arithmetic
• May have to renormalize the product
Binary floating point multiplication
• Say 110 × 11 or $1.1 \times 2^{2} \times 1.1 \times 2^{1}$
• Exponent of product is:
2 + 1 = 3
• Multiply the coefficients:
1.1 × 1.1 = 10.01
• Result will be positive
• Normalize the result:
$10.01 \times 2^{3} = 1.001 \times 2^{4}$
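The multiplication rules fit in a few lines of Python. As an assumption made for this sketch, a number is modeled as a hypothetical pair (significand, exponent) with an exact Fraction significand, so the product is never rounded; the name fp_mul is illustrative.

```python
from fractions import Fraction


def fp_mul(a, b):
    """Multiply two (significand, exponent) pairs, each meaning s * 2**e."""
    significand = a[0] * b[0]          # multiply the coefficients (sign included)
    exponent = a[1] + b[1]             # add the exponents
    if significand == 0:
        return significand, 0
    while abs(significand) >= 2:       # renormalize the product
        significand /= 2
        exponent += 1
    while abs(significand) < 1:
        significand *= 2
        exponent -= 1
    return significand, exponent


six = (Fraction(3, 2), 2)              # 1.1 x 2^2
three = (Fraction(3, 2), 1)            # 1.1 x 2^1
print(fp_mul(six, three))              # (Fraction(9, 8), 4), that is 1.001 x 2^4
```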
FP division
• Very tricky
• One good solution is to multiply the dividend by
the inverse of the divisor
A trap
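• Addition is not necessarily associative:
• $-9 \times 10^{37} + 9 \times 10^{37} + 4 \times 10^{-37}$
• Observe that
• $(-9 \times 10^{37} + 9 \times 10^{37}) + 4 \times 10^{-37} = 4 \times 10^{-37}$
while
• $-9 \times 10^{37} + (9 \times 10^{37} + 4 \times 10^{-37}) = 0$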
due to the limited accuracy of FP numbers
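The trap is easy to reproduce. Python floats are double precision, but the three values are so far apart that the same effect appears; the variable names below are illustrative.

```python
a, b, c = -9e37, 9e37, 4e-37

left = (a + b) + c    # the large values cancel first, so c survives
right = a + (b + c)   # c is lost when added to the much larger b

print(left)           # 4e-37
print(right)          # 0.0
print(left == right)  # False
```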
IMPLEMENTATIONS
The floating-point unit (I)
• Floating-point instructions were an optional
feature
– User had to buy a separate floating-point unit
aka floating point coprocessor
• Before Intel 80486, all Intel x86
architectures had the option to install a separate
floating-point chip (8087, 80287, 80387)
The floating-point unit (II)
• Default solution was to simulate the missing
floating-point instructions through assembly
routines
• As a result, many processor architectures use
separate banks of registers for integer arithmetic
and floating point arithmetic
The floating-point unit (III)
• Some older architectures implemented
– Single-precision operations in hardware
through the FPU
– Double-precision operations by software
• Made double-precision operations much
costlier than single-precision operations.
IBM 360 FP INSTRUCTIONS
Overview
• FPU offers a very familiar user interface
– Eight general purpose FP registers
• Distinct from the integer registers
– Two-operand instructions in both RR and RX
formats
• Includes single-precision and double-precision
versions of addition, subtraction, multiplication
and division
Examples of RR instructions
• AFR f1, f2 add contents of floating-point
register f2 into f1
• ADR f1,f2 add contents of double-precision
register f2 into f1
• LFR f1, f2 load contents of floating-point
register f2 into f1
• Also had load positive, load negative, load
complement instructions for floating-point and
double-precision operands
Examples of RX instructions
• AF r1, d(r2) add contents of word at address
d + contents(r2) into register r1
• AD r1,d(r2) …
MIPS FP INSTRUCTIONS
Overview
• Thirty-two specialized single-precision registers:
$f0, $f1, … $f31
• Each pair of single-precision registers forms a
double-precision register
• *.s instructions apply to single precision format
• *.d instructions apply to double precision format
• Most instructions are in the R format
R-format instructions (I)
• add.s f1, f2, f3 f1 = f2 + f3 (single precision)
• add.d f2, f4, f6 (f2, f2+1) = (f4, f4+1) + (f6, f6 +1)
(double precision applies to
register pairs)
• sub.s f1, f2, f3 f1 = f2 – f3 (single precision)
• sub.d f2, f4, f6 (double precision)
• mul.s f1, f2, f3 f1 = f2×f3 (single precision)
• mul.d f2, f4, f6 (double precision)
R-format instructions (II)
• div.s f1, f2, f3 f1 = f2 /f3 (single precision)
• div.d f2, f4, f6 (double precision)
• c.x.s f1, f2 FP condition = f1 x f2 ? 1 : 0
where x can be equal, not equal,
less than, less than or equal,
greater than, greater than or equal
• c.x.d f2, f4 (double precision)
I-format instructions (I)
• bc1t a jump to address computed by
adding 4×a to the current value of
the PC if the FP condition is true
• bc1f a jump to address computed by
adding 4×a to the current value of
the PC if the FP condition is false
I-format instructions (II)
• lwc1 f1, a(r1) load floating-point word at address
a + contents(r1) into f1
• ldc1 f2, a(r1) (double precision)
• swc1 f1, a(r1) store floating-point value in f1
into word at address
a + contents(r1)
• sdc1 f2, a(r1) (double precision)