Chapter 3:
Arithmetic for Computers
Ngo Lam Trung
[with materials from Computer Organization and Design, 4th Edition,
Patterson & Hennessy, © 2008, MK
and M.J. Irwin’s presentation, PSU 2008]
IT3283E Fall 2022 1
Content
❑ Integer representation and arithmetic
❑ Floating point number representation and arithmetic
What is stored inside a computer?
❑ Data, of course!
❑ And data is represented as binary numbers
❑ Then how are binary numbers treated by CPUs?
Integers
- Unsigned
- Signed
Floating point numbers
- Single precision
- Double precision
- Other formats
Unsigned Binary Integers
❑ Using an n-bit binary number to represent a non-negative integer
$x = x_{n-1}x_{n-2}\dots x_1x_0 = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_1 2^1 + x_0 2^0$
❑ Range: 0 to $+2^n - 1$
❑ Example
0000 0000 0000 0000 0000 0000 0000 1011(2)
$= 0 + \dots + 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0$
$= 0 + \dots + 8 + 0 + 2 + 1$ = 11(10)
❑ Data range using 32 bits
0 to $2^{32} - 1$ = 4,294,967,295
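A minimal Python sketch of the unsigned interpretation above, assuming bit strings are written MSB first with optional spaces (the helper names `unsigned_value` and `unsigned_range` are our own, hypothetical choices). It evaluates the weighted sum term by term exactly as in the formula.

```python
def unsigned_value(bits: str) -> int:
    """x = x_{n-1}*2^(n-1) + ... + x_1*2^1 + x_0*2^0 for a bit string written MSB first.
    Spaces are ignored so values can be copied straight from the slides."""
    bits = bits.replace(" ", "")
    value = 0
    for i, b in enumerate(reversed(bits)):   # i is the bit position, 0 = LSB
        value += int(b) * 2 ** i
    return value


def unsigned_range(n: int):
    """Smallest and largest value of an n-bit unsigned integer."""
    return 0, 2 ** n - 1


print(unsigned_value("0000 0000 0000 0000 0000 0000 0000 1011"))  # 11
print(unsigned_range(32))                                          # (0, 4294967295)
```

The same function can check the exercise answers, e.g. `unsigned_value("0000 0000 0000 0000 0000 0001 0011 0011")` gives 307.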
Example: 32-bit Unsigned Binary Integers
Hex Binary Decimal
0x00000000 0…0000 0
0x00000001 0…0001 1
0x00000002 0…0010 2
0x00000003 0…0011 3
0x00000004 0…0100 4
0x00000005 0…0101 5
0x00000006 0…0110 6
0x00000007 0…0111 7
0x00000008 0…1000 8
0x00000009 0…1001 9
…
0xFFFFFFFC 1…1100 $2^{32}-4$
0xFFFFFFFD 1…1101 $2^{32}-3$
0xFFFFFFFE 1…1110 $2^{32}-2$
0xFFFFFFFF 1…1111 $2^{32}-1$
Exercise
❑ Convert these decimal values to 32-bit unsigned binary integers
25 = 0000 0000 0000 0000 0000 0000 0001 1001
125 = 0000 0000 0000 0000 0000 0000 0111 1101
255 = 0000 0000 0000 0000 0000 0000 1111 1111
❑ Convert 32-bit integers to decimal
0000 0000 0000 0000 0000 0000 1100 1111 = 207
0000 0000 0000 0000 0000 0001 0011 0011 = 307
Signed binary integers
❑ Using an n-bit binary number to represent an integer, including negative values
$x = x_{n-1}x_{n-2}\dots x_1x_0 = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_1 2^1 + x_0 2^0$
❑ Range: $-2^{n-1}$ to $+2^{n-1} - 1$
❑ Example
1111 1111 1111 1111 1111 1111 1111 1100(2)
$= -1\times2^{31} + 1\times2^{30} + \dots + 1\times2^2 + 0\times2^1 + 0\times2^0$
= –2,147,483,648 + 2,147,483,644 = –4(10)
❑ Using 32 bits
–2,147,483,648 to +2,147,483,647
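A small Python sketch of the signed (two's-complement) interpretation, assuming MSB-first bit strings; `signed_value` and `signed_range` are hypothetical helper names. The only change from the unsigned case is that the sign bit carries the weight $-2^{n-1}$.

```python
def signed_value(bits: str) -> int:
    """Two's-complement value of a bit string written MSB first:
    the sign bit x_{n-1} has weight -2^(n-1); every other bit is weighted as in the unsigned case."""
    bits = bits.replace(" ", "")
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)        # sign bit: negative weight
    for i, b in enumerate(reversed(bits[1:])):  # bits n-2 .. 0: positive weights
        value += int(b) * 2 ** i
    return value


def signed_range(n: int):
    """Range of an n-bit two's-complement integer: -2^(n-1) .. 2^(n-1) - 1."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


print(signed_value("1111 1111 1111 1111 1111 1111 1111 1100"))  # -4
print(signed_range(32))  # (-2147483648, 2147483647)
```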
Signed integer negation
❑ Given $x = x_{n-1}x_{n-2}\dots x_1x_0$, how do we calculate $-x$?
❑ Let $\bar{x}$ be the 1's complement of $x$ (invert every bit: 1 → 0, 0 → 1)
$\bar{x} = 1111\dots11_2 - x$
Then
$\bar{x} + x = 1111\dots11_2 = -1$
➔ $\bar{x} + 1 = -x$
❑ Example: find the binary representation of –2
+2 = 0000 0000 … 0010(2)
–2 = 1111 1111 … 1101(2) + 1
= 1111 1111 … 1110(2)
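The invert-and-add-1 rule can be sketched in Python on plain integer bit patterns, assuming a pattern is a non-negative int in $[0, 2^n)$ (the function name `negate` is our own). The final mask drops any carry out of bit $n-1$, as fixed-width hardware would.

```python
def negate(x: int, n: int) -> int:
    """Two's-complement negation of an n-bit pattern: invert all bits, then add 1."""
    mask = (1 << n) - 1
    ones_complement = x ^ mask              # 1 -> 0, 0 -> 1
    return (ones_complement + 1) & mask     # keep only n bits


print(format(negate(0b10, 32), "032b"))     # 31 ones followed by 0, i.e. -2
print(format(negate(100, 16), "016b"))      # 1111111110011100, i.e. -100
```

One corner case worth noticing: `negate(0b10000000, 8)` returns `0b10000000` again, because -128 has no positive 8-bit counterpart.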
Signed binary negation
4-bit two's complement (2’sc) values:
2’sc binary   decimal
1000          -8   (= $-2^3$)
1001          -7   (= $-(2^3 - 1)$)
1010          -6
1011          -5
1100          -4
1101          -3
1110          -2
1111          -1
0000           0
0001           1
0010           2
0011           3
0100           4
0101           5
0110           6
0111           7   (= $2^3 - 1$)
❑ Negation: complement all the bits and add a 1
1011 (-5) → complement all the bits → 0100 → add a 1 → 0101 (+5)
1010 (-6) → complement all the bits → 0101 → add a 1 → 0110 (+6)
Exercise
❑ Find the 16-bit signed integer representation of
16 = 0000 0000 0001 0000
-16 = 1111 1111 1111 0000
100 = 0000 0000 0110 0100
-100 = 1111 1111 1001 1100
Sign extension
❑ Given an n-bit integer $x = x_{n-1}x_{n-2}\dots x_1x_0$
❑ Find the corresponding m-bit representation (m > n) with the
same numeric value
$x = x_{m-1}x_{m-2}\dots x_1x_0$
❑ → Replicate the sign bit to the left
❑ Examples: 8-bit to 16-bit
+2: 0000 0010 => 0000 0000 0000 0010
–2: 1111 1110 => 1111 1111 1111 1110
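A Python sketch of sign extension on integer bit patterns (the name `sign_extend` is hypothetical): if the sign bit $x_{n-1}$ is 1, ones are written into bit positions n to m-1; otherwise the upper bits stay 0.

```python
def sign_extend(x: int, n: int, m: int) -> int:
    """Widen an n-bit two's-complement pattern to m bits (m > n)
    by replicating the sign bit x_{n-1} into bit positions n .. m-1."""
    sign = (x >> (n - 1)) & 1
    if sign:
        x |= ((1 << m) - 1) ^ ((1 << n) - 1)   # ones exactly in bits n .. m-1
    return x


print(format(sign_extend(0b00000010, 8, 16), "016b"))  # 0000000000000010 (+2)
print(format(sign_extend(0b11111110, 8, 16), "016b"))  # 1111111111111110 (-2)
```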
Addition and subtraction
❑ Addition
Similar to adding two numbers by hand
Digits are added bit by bit from right to left
Carries are passed to the next digit on the left
❑ Subtraction
Negate the second operand, then add it to the first operand
Examples
❑ All numbers are 8-bit signed integers
12 + 8 =
122 + 8 =
122 + 80 =
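To check these three sums, here is a Python sketch of what an 8-bit adder produces, assuming the operands are given as ordinary signed ints (the name `add8` is our own). The bit patterns are added, the carry out of bit 7 is dropped, and the 8-bit result is reinterpreted as two's complement; overflow is flagged when that reinterpreted result differs from the true sum.

```python
def add8(a: int, b: int):
    """Add two 8-bit signed integers the way an 8-bit adder does: add the
    bit patterns, keep the low 8 bits (carry out dropped), reinterpret."""
    raw = ((a & 0xFF) + (b & 0xFF)) & 0xFF      # 8-bit result pattern
    result = raw - 256 if raw & 0x80 else raw    # two's-complement interpretation
    overflow = result != a + b                   # true sum did not fit in 8 bits
    return format(raw, "08b"), result, overflow


for a, b in [(12, 8), (122, 8), (122, 80)]:
    print(f"{a} + {b} ->", add8(a, b))
```

12 + 8 gives 0001 0100 = 20. Both 122 + 8 and 122 + 80 overflow: their patterns 1000 0010 and 1100 1010 read back as -126 and -54.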
Dealing with Overflow
❑ Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when
the sign bit contains a value bit of the result and not the proper sign bit
❑ When adding operands with different signs or when
subtracting operands with the same sign, overflow can
never occur
Operation Operand A Operand B Result indicating overflow
A+B ≥0 ≥0 <0
A+B <0 <0 ≥0
A-B ≥0 <0 <0
A-B <0 ≥0 ≥0
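The table can be turned into a sign-only overflow check. This Python sketch assumes 32-bit signed operands and wraps the result the way the hardware does (`to_signed32` and `overflows` are hypothetical names); it never looks at the true sum, only at the three signs.

```python
def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def overflows(op: str, a: int, b: int) -> bool:
    """Overflow test using only operand and result signs, row by row from the table."""
    if op == "+":
        r = to_signed32(a + b)
        return (a >= 0 and b >= 0 and r < 0) or (a < 0 and b < 0 and r >= 0)
    if op == "-":
        r = to_signed32(a - b)
        return (a >= 0 and b < 0 and r < 0) or (a < 0 and b >= 0 and r >= 0)
    raise ValueError(f"unsupported operation {op!r}")


print(overflows("+", 2**31 - 1, 1))   # True: 2147483647 + 1 wraps to -2147483648
print(overflows("-", -5, 3))          # False: -8 fits
```

The rows with mixed-sign addition or same-sign subtraction are absent because, as stated above, those cases can never overflow.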
Adder implementation
❑ N-bit ripple-carry adder
[Figure: four 1-bit ALUs; ALU i takes A_i, B_i and CarryIn_i and produces result_i and CarryOut_i, with CarryOut_i wired to CarryIn_(i+1)]
Performance depends on data length: each stage must wait for the carry from the stage to its right
➔ Performance is low
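A bit-level Python sketch of the ripple-carry chain (the names `full_adder` and `ripple_carry_add` are our own). Each 1-bit slice computes its carry with the majority function used on the next slide, and the loop makes the serial dependency explicit: slice i needs the carry from slice i-1.

```python
def full_adder(a: int, b: int, cin: int):
    """One 1-bit ALU slice: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, c0: int = 0):
    """n-bit ripple-carry adder: CarryOut_i feeds CarryIn_(i+1), so bit i
    cannot finish before bits 0..i-1 have produced their carries."""
    result, carry = 0, c0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry          # n-bit sum and final carry out


print(ripple_carry_add(0b0110, 0b0101, 4))   # (11, 0)
```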
Making addition faster: infinite hardware
❑ Parallelize the adder with the cost of hardware
❑ Given the addition:
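$a_{n-1}a_{n-2}\dots a_1a_0 + b_{n-1}b_{n-2}\dots b_1b_0$
❑ Let $c_i$ be the carry into bit $i$
$c_1 = (b_0 \cdot c_0) + (a_0 \cdot c_0) + (a_0 \cdot b_0)$
$c_2 = (b_1 \cdot c_1) + (a_1 \cdot c_1) + (a_1 \cdot b_1)$
❑ Then find $c_2$ from $a_0, b_0, a_1, b_1$ (and $c_0$) by substituting $c_1$:
$c_2 = (a_1 \cdot a_0 \cdot b_0) + (a_1 \cdot a_0 \cdot c_0) + (a_1 \cdot b_0 \cdot c_0) + (b_1 \cdot a_0 \cdot b_0) + (b_1 \cdot a_0 \cdot c_0) + (b_1 \cdot b_0 \cdot c_0) + (a_1 \cdot b_1)$
❑ The expression for $c_{n-1}$ will be extremely complicated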
❑ We can hardwire the adder circuit to have super-fast
performance. Problem: too many hardware gates!
Making addition faster: Carry Lookahead
❑ Approach
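Build a hardwired 4-bit adder → fast and simple enough
Develop a carry lookahead unit that calculates the carry bits before the addition finishes
❑ At bit $i$
$c_{i+1} = (b_i \cdot c_i) + (a_i \cdot c_i) + (a_i \cdot b_i) = (a_i \cdot b_i) + (a_i + b_i) \cdot c_i$
❑ Denote
$g_i = a_i \cdot b_i$ (generate), $p_i = a_i + b_i$ (propagate)
❑ Then
$c_{i+1} = g_i + p_i \cdot c_i$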
Carry lookahead
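❑ With a 4-bit adder
$c_1 = g_0 + p_0 \cdot c_0$
$c_2 = g_1 + p_1 \cdot g_0 + p_1 \cdot p_0 \cdot c_0$
$c_3 = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 + p_2 \cdot p_1 \cdot p_0 \cdot c_0$
$c_4 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0 + p_3 \cdot p_2 \cdot p_1 \cdot p_0 \cdot c_0$
➔ All carry bits can be calculated after 3 gate delays
➔ All result bits can be calculated after a maximum of 4 gate delays
➔ How do we implement a bigger adder?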
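A Python sketch of a 4-bit carry-lookahead adder that uses the flattened carry equations, assuming $g_i = a_i \cdot b_i$ and $p_i = a_i + b_i$ (the function name `cla4` is our own). Each carry is computed from g, p and $c_0$ alone, never from another carry, which is what lets hardware evaluate all of them in parallel.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry-lookahead adder: every carry is a two-level AND-OR of g_i, p_i and c0."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]                 # carry into bits 0..3
    s = 0
    for i in range(4):                         # sum bit = a_i XOR b_i XOR c_i
        s |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[i]) << i
    return s, c4


print(cla4(0b0110, 0b0101))   # (11, 0)
```

In software the loop still runs bit by bit. The point is the dependency structure: no line computing a carry reads another carry.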
Carry lookahead
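❑ For a 16-bit adder built from four 4-bit adders → fast block carries C1, C2, C3, C4 are needed
$C_0 = 0$
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$
$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$
$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0$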
Carry lookahead
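❑ Denote, for 4-bit block 0 (bits 3..0)
$P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0$
$G_0 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0$
and likewise $P_1, G_1$ (bits 7..4), $P_2, G_2$ (bits 11..8), $P_3, G_3$ (bits 15..12)
❑ Then the big carry bits can be calculated fast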
16-bit Adder
Exercise
❑ Determine $g_i, p_i, G_i, P_i$ when adding the two 16-bit
numbers
$a$ = 0001 1010 0011 0011
$b$ = 1110 0101 1110 1011
❑ Calculate $c_{15}$
Exercise
❑ $p_i, g_i$
Exercise
❑ $c_{15}$ is actually $C_4$
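The numeric answers are not visible in this text, so here is a Python sketch that computes them for the exercise operands. It assumes index i is bit position i (bit 0 is the rightmost bit) and $C_0 = 0$; the names `bit` and `block_pg` are our own. The block carries are evaluated one after another for readability, and the flattened two-level $C_4$ is computed as a cross-check. $C_4$ is the carry out of the top block, i.e. of the whole 16-bit adder.

```python
A = 0b0001_1010_0011_0011   # a from the exercise
B = 0b1110_0101_1110_1011   # b from the exercise


def bit(x: int, i: int) -> int:
    return (x >> i) & 1


g = [bit(A, i) & bit(B, i) for i in range(16)]   # g_i = a_i . b_i (index = bit position)
p = [bit(A, i) | bit(B, i) for i in range(16)]   # p_i = a_i + b_i


def block_pg(k: int):
    """(P_k, G_k) for 4-bit block k, which covers bits 4k .. 4k+3."""
    p0, p1, p2, p3 = p[4 * k: 4 * k + 4]
    g0, g1, g2, g3 = g[4 * k: 4 * k + 4]
    P_k = p3 & p2 & p1 & p0
    G_k = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    return P_k, G_k


P = [block_pg(k)[0] for k in range(4)]
G = [block_pg(k)[1] for k in range(4)]

# Block carries C_0..C_4, evaluated one after another: C_(k+1) = G_k + P_k . C_k
C = [0]
for k in range(4):
    C.append(G[k] | (P[k] & C[k]))

# The same C_4 from the flattened two-level equation the lookahead unit implements
C4_flat = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
           | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & C[0]))

print("g (bits 15..0):", "".join(map(str, reversed(g))))
print("p (bits 15..0):", "".join(map(str, reversed(p))))
print("P0..P3 =", P, " G0..G3 =", G, " C1..C4 =", C[1:], " C4 (flattened) =", C4_flat)
```

For these operands the sketch gives g = 0000 0000 0010 0011, p = 1111 1111 1111 1011, P0..P3 = 0, 1, 1, 1, G0..G3 = 0, 1, 0, 0 and C4 = 1. That agrees with the carry out of the ordinary sum 0x1A33 + 0xE5EB = 0x1001E.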
Multiply
❑ Binary multiplication is just a bunch of right shifts and
adds
[Figure: n-bit multiplicand times n-bit multiplier; the partial product array sums to a 2n-bit double-precision product]
The partial products can be formed in parallel and added in parallel for faster multiplication
n-bit multiplicand and multiplier → 2n-bit product
Example
Add and Right Shift Multiplier Hardware
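[Figure: add-and-right-shift multiplier with a multiplicand register, a 32-bit ALU (add), a product register (shift right) and control logic]
❑ Example with 4-bit integers: 6 x 5 = ?
Multiplicand: 0 1 1 0 = 6
Product register (left half 0, right half holds the multiplier):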
0 0 0 0 0 1 0 1 = 5
add 0 1 1 0 0 1 0 1 LSB=1 → add multiplicand
0 0 1 1 0 0 1 0 shift right
add 0 0 1 1 0 0 1 0 LSB=0 → no change
0 0 0 1 1 0 0 1 shift right
add 0 1 1 1 1 0 0 1 LSB=1 → add multiplicand
0 0 1 1 1 1 0 0 shift right
add 0 0 1 1 1 1 0 0 LSB=0 → no change
0 0 0 1 1 1 1 0 shift right = 30
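The trace above can be reproduced with a short Python model of the add-and-right-shift multiplier, assuming n-bit unsigned operands (the name `shift_add_multiply` is our own). Python's unbounded ints keep the carry out of the n-bit add as an extra top bit, which the following right shift brings back into the register, as the hardware's ALU carry would.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Add-and-right-shift multiplication of n-bit unsigned values.
    The 2n-bit product register starts with the multiplier in its right half."""
    mask = (1 << n) - 1
    product = multiplier & mask                     # left half = 0
    for _ in range(n):
        if product & 1:                             # LSB = 1 -> add multiplicand to left half
            product += (multiplicand & mask) << n   # may carry into bit 2n
        product >>= 1                               # shift the whole register right
    return product


print(shift_add_multiply(6, 5, n=4))   # 30
```

With `n=4` the register passes through the same values as the slide's trace: 0110 0101, 0011 0010, 0001 1001, 0111 1001, 0011 1100, 0001 1110.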
Fast multiplier – Design for Moore
❑ Why is this fast?
Fast multiplier – Design for Moore
❑ How fast is this?
❑ Anything wrong?
MIPS Multiply Instruction
❑ Multiply (mult and multu) produces a double
precision product (2 x 32 bit)
mult $s0, $s1 # hi||lo = $s0 * $s1
Encoding (op rs rt rd shamt funct): 0 16 17 0 0 0x18
Two additional registers: hi and lo
Low-order word of the product is stored in processor register
lo and the high-order word is stored in register hi
Instructions mfhi rd and mflo rd are provided to move
the product to (user accessible) registers in the register file
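A Python model of how the 64-bit product is split between hi and lo, and why `mult` and `multu` differ. It assumes registers hold 32-bit patterns (ints in 0..2^32-1); the functions `mult` and `multu` are illustrative models named after the instructions, not an emulator API.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs: int, rt: int):
    """Signed 32 x 32 -> 64-bit product, returned as (hi, lo)."""
    product = to_signed32(rs) * to_signed32(rt)
    bits64 = product & 0xFFFFFFFFFFFFFFFF      # 64-bit two's-complement pattern
    return bits64 >> 32, bits64 & MASK32


def multu(rs: int, rt: int):
    """Unsigned 32 x 32 -> 64-bit product, returned as (hi, lo)."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


print([hex(w) for w in mult(0xFFFFFFFF, 2)])    # -1 * 2 = -2
print([hex(w) for w in multu(0xFFFFFFFF, 2)])   # 4294967295 * 2
```

The lo words are identical (0xfffffffe). Only hi shows whether the operands were read as signed (hi = 0xffffffff) or unsigned (hi = 0x1).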
Division
❑ Division is just a bunch of quotient digit guesses and left
shifts and subtracts
dividend = quotient x divisor + remainder
[Figure: long division with an n-bit divisor, the dividend, the n-bit quotient, the partial remainder array and the n-bit remainder]
Left Shift and Subtract Division Hardware
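[Figure: divide hardware with a divisor register, a 32-bit ALU (subtract), a remainder register (shift left) whose right half collects the quotient, and control logic]
❑ Example with 4-bit integers: 6 / 2, divisor = 0010 = 2
0000 0110 = 6 (dividend in the right half of the remainder register)
0000 1100 shift left
sub 1110 1 1 0 0 rem neg, so quotient bit = 0
0000 1100 restore remainder
0001 1000 shift left
sub 1111 1 0 0 0 rem neg, so quotient bit = 0
0001 1000 restore remainder
0011 0000 shift left
sub 0001 0 0 0 1 rem pos, so quotient bit = 1
0010 0010 shift left
sub 0000 0 0 1 1 rem pos, so quotient bit = 1
= 3 with 0 remainder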
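A Python model of the same left-shift-and-subtract (restoring) scheme, assuming n-bit unsigned operands (the name `restoring_divide` is our own). As in the trace, one double-width register holds the partial remainder in its left half and collects quotient bits in its right half. Python ints let the left half briefly grow one bit wider after a shift, which the sketch simply tolerates.

```python
def restoring_divide(dividend: int, divisor: int, n: int = 32):
    """Left-shift-and-subtract division of n-bit unsigned values -> (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("the hardware does not check this; software must")
    mask = (1 << n) - 1
    reg = dividend & mask                  # left half = 0, right half = dividend
    for _ in range(n):
        reg <<= 1                          # shift left; the new quotient bit slot is 0
        trial = (reg >> n) - divisor       # subtract divisor from the left half
        if trial >= 0:                     # rem pos: keep the difference, quotient bit = 1
            reg = (trial << n) | (reg & mask) | 1
        # rem neg: the difference is never stored, which plays the role of
        # "restore remainder"; the quotient bit stays 0
    return reg & mask, reg >> n


print(restoring_divide(6, 2, n=4))   # (3, 0)
```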
MIPS Divide Instruction
❑ Divide (div and divu) generates the remainder in hi
and the quotient in lo
div $s0, $s1 # lo = $s0 / $s1
# hi = $s0 mod $s1
Encoding (op rs rt rd shamt funct): 0 16 17 0 0 0x1A
Instructions mfhi rd and mflo rd are provided to move
the quotient and remainder to (user accessible) registers in the
register file
❑ As with multiply, divide ignores overflow, so software must determine if the quotient is
too large. Software must also check the divisor to avoid division by 0.
Signed integer multiplication and division
❑ Reuse unsigned multiplication then fix product sign later
❑ Multiplication
Multiplicand and multiplier of the same sign: keep product
Multiplicand and multiplier of different signs: negate product
❑ Division:
Dividend and divisor of the same sign:
- Keep quotient
- Keep/negate remainder so it has the same sign as the dividend
Dividend and divisor of different signs:
- Negate quotient
- Keep/negate remainder so it has the same sign as the dividend
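A Python sketch of the sign fix-up rules, built on unsigned operations over the magnitudes (`signed_multiply` and `signed_divide` are hypothetical names). Giving the remainder the dividend's sign makes the quotient truncate toward zero, the same convention C99 uses for `/` and `%`. Python's own `//` and `%` floor instead, which is why the sketch divides magnitudes explicitly.

```python
def signed_multiply(a: int, b: int) -> int:
    """Multiply magnitudes, then negate when the operand signs differ."""
    p = abs(a) * abs(b)
    return -p if (a < 0) != (b < 0) else p


def signed_divide(dividend: int, divisor: int):
    """Divide magnitudes, then fix signs: quotient negated when signs differ,
    remainder given the sign of the dividend."""
    if divisor == 0:
        raise ZeroDivisionError("software must check the divisor")
    q, r = divmod(abs(dividend), abs(divisor))   # unsigned division on magnitudes
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, signed_divide(a, b))
```

Each result satisfies dividend = quotient x divisor + remainder, e.g. -7 = (-3) x 2 + (-1).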
Representing Big (and Small) Numbers
❑ Encoding non-integer values?
Earth mass: $(5.9722 \pm 0.0006) \times 10^{24}$ (kg)
Mass of an amu (atomic mass unit, 1/12 of the mass of a carbon-12 atom)
0.00000000000000000000000000166 or $1.66 \times 10^{-27}$ (kg)
Pi
π = 3.14159…
❑ Problem: how to represent the above numbers?
➔ We need reals or floating-point numbers!
➔ Floating point numbers in decimal:
➔ 1000
➔ $1 \times 10^3$
➔ $0.1 \times 10^4$
Floating point number
❑ In decimal system
$2013.1228 = 201.31228 \times 10^1$
$= 20.131228 \times 10^2$
$= 2.0131228 \times 10^3$
$= 20131228 \times 10^{-4}$
❑ What is the “standard” form?
$2.0131228 \times 10^3$ = 2.0131228E+03
(2.0131228 is the mantissa, 3 is the exponent)
❑ In binary: $X = 1.xxxxx \times 2^{yyyy}$
❑ Sign, mantissa, and exponent need to be represented
Floating point number
❑ Floating point representation in binary
$(-1)^{sign} \times 1.F \times 2^{E - bias}$
Still have to fit everything in 32 bits (single precision)
Bias = 127 with single precision floating point number
s E (exponent) F (fraction)
1 sign bit 8 bits 23 bits
❑ Defined by the IEEE 754-1985 standard
❑ Single precision: 32 bit
❑ Double precision: 64 bit
❑ Correspond to float and double in C
Examples
❑ Ex1: convert X into decimal value
X = 1100 0001 0101 0110 0000 0000 0000 0000
sign = 1 → X is negative
E = 1000 0010 = 130
F = 10101100...00
→ $X = (-1)^1 \times 1.101011000\dots00 \times 2^{130-127}$
$= -1.101011_2 \times 2^3 = -1101.011_2$
= -13.375
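A Python sketch that splits a 32-bit single-precision pattern into S, E and F and applies $(-1)^S \times 1.F \times 2^{E-127}$. It handles only normalized numbers (E = 0 and E = 255 are special encodings), and `decode_single` is a hypothetical name. As a cross-check it also decodes the same 4 bytes with Python's standard `struct` module.

```python
import struct


def decode_single(bits: str):
    """Return (S, E, F, value from the formula, value decoded by struct)."""
    word = int(bits.replace(" ", ""), 2)
    s = word >> 31                 # 1 sign bit
    e = (word >> 23) & 0xFF        # 8-bit biased exponent
    f = word & 0x7FFFFF            # 23-bit fraction
    value = (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)
    reference = struct.unpack(">f", word.to_bytes(4, "big"))[0]
    return s, e, f, value, reference


print(decode_single("1100 0001 0101 0110 0000 0000 0000 0000"))  # (1, 130, 5636096, -13.375, -13.375)
print(decode_single("0011 1111 1000 0000 0000 0000 0000 0000"))  # (0, 127, 0, 1.0, 1.0)
```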
Example
❑ Ex2: find decimal value of X
X = 0011 1111 1000 0000 0000 0000 0000 0000
sign = 0
e = 0111 1111 = 127
m = 000…0000 (23 bit 0)
$X = (-1)^0 \times 1.00\dots000 \times 2^{127-127} = 1.0$
Example
❑ Ex3: find binary representation of X = 9.6875 in IEEE 754
single precision
Converting X to plain binary
9(10) = 1001(2)
0.6875 x 2 = 1.375 → get bit 1
0.375 x 2 = 0.75 → get bit 0
0.75 x 2 = 1.5 → get bit 1
0.5 x 2 = 1.0 → get bit 1
➔ 9.6875(10) = 1001.1011(2)
Example
❑ Ex3: find binary representation of X = 9.6875 in IEEE 754
single precision
X = 9.6875(10) = 1001.1011(2) = $1.0011011 \times 2^3$
Then
S=0
e = 127 + 3 = 130(10) = 1000 0010(2)
m = 001101100...00 (23 bit)
Finally
X = 0100 0001 0001 1011 0000 0000 0000 0000
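To check an encoding like this one, the sketch below asks Python's `struct` module for the single-precision bytes of a value and prints them in nibbles. `encode_single` and `fields` are hypothetical helper names, and the result is whatever the platform's IEEE 754 float conversion produces (round to nearest).

```python
import struct


def encode_single(x: float) -> str:
    """IEEE 754 single-precision bit pattern of x, grouped in 4-bit nibbles."""
    word = int.from_bytes(struct.pack(">f", x), "big")
    bits = format(word, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


def fields(x: float):
    """(S, E, F) fields of x in single precision."""
    word = int.from_bytes(struct.pack(">f", x), "big")
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


print(encode_single(9.6875))   # 0100 0001 0001 1011 0000 0000 0000 0000
print(fields(9.6875))          # (0, 130, 1769472), i.e. F = 0011011 followed by 16 zeros
```

The same helpers can check the exercises that follow: `fields(0.5)` gives E = 126 for $1.0_2 \times 2^{-1}$, and `encode_single(100.75)` gives 0100 0010 1100 1001 1000 0000 0000 0000.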
Examples
❑ $1.0_2 \times 2^{-1}$ =
❑ 100.75(10) =
Some special values
❑ Smallest+: 0 00000001 1.00000000000000000000000
= $1 \times 2^{1-127}$
❑ Zero: 0 00000000 00000000000000000000000
= true 0
❑ Largest+: 0 11111110 1.11111111111111111111111
= $(2 - 2^{-23}) \times 2^{254-127}$
Too large or too small values
❑ Overflow (floating point) happens when a positive
exponent becomes too large to fit in the exponent field
❑ Underflow (floating point) happens when a negative
exponent becomes too large to fit in the exponent field
[Figure: number line from −∞ to +∞ with marks at $-2^{127}$, $-2^{-127}$, $2^{-127}$ and $2^{127}$; overflow lies beyond the largest magnitudes, underflow between the smallest nonzero magnitudes and 0]
❑ One way to reduce the chance of underflow or overflow is to offer another format that has a larger
exponent field
Double precision – takes two MIPS words
s E (exponent) F (fraction)
1 bit 11 bits 20 bits
F (fraction continued)
32 bits
Reduce underflow with the same bit length?
❑ De-normalized number
IEEE 754 FP Standard Encoding
❑ Special encodings are used to represent unusual events
± infinity for division by zero
NaN (not a number) for invalid operations such as 0/0
True zero is the bit string all zero
Encoding (single precision: E 8 bits, F 23 bits; double precision: E 11 bits, F 52 bits):
- E = all 0s, F = 0 → true zero (0)
- E = all 0s, F nonzero → ± denormalized number
- E = 0000 0001 to 1111 1110 (single, exponent −126 to +127) or 000…001 to 111…110 (double, exponent −1022 to +1023), F anything → ± floating point number
- E = all 1s, F = 0 → ± infinity
- E = all 1s, F nonzero → not a number (NaN)
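A Python sketch that classifies a value by its single-precision E and F fields, row by row from the encoding list (the name `classify_single` is our own). The value is first rounded to single precision by `struct`; finite values beyond the single-precision range make `struct.pack` raise `OverflowError`, so the sketch does not handle them.

```python
import math
import struct


def classify_single(x: float) -> str:
    """Classify x using the E and F fields of its IEEE 754 single-precision encoding."""
    word = int.from_bytes(struct.pack(">f", x), "big")
    e = (word >> 23) & 0xFF     # 8-bit biased exponent
    f = word & 0x7FFFFF         # 23-bit fraction
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    return "normalized"


for x in [0.0, -0.0, 1.0, 2.0 ** -126, 2.0 ** -127,
          (2 - 2 ** -23) * 2.0 ** 127, math.inf, math.nan]:
    print(repr(x), classify_single(x))
```

The values $2^{-126}$ and $(2 - 2^{-23}) \times 2^{127}$ are the smallest and largest positive normalized numbers from the earlier slide. $2^{-127}$ falls below that range and is encoded as a denormalized number.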
Floating Point Addition
❑ Addition (and subtraction)
$(F_1 \times 2^{E_1}) + (F_2 \times 2^{E_2}) = F_3 \times 2^{E_3}$
Step 0: Restore the hidden bit in F1 and in F2
Step 1: Align fractions by right shifting F2 by E1 - E2
positions (assuming E1 ≥ E2), keeping track of (three of) the
bits shifted out in G, R and S
Step 2: Add the resulting F2 to F1 to form F3
Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
- If F1 and F2 have the same sign → F3 ∈ [1,4) → if F3 ≥ 2, shift F3 right by 1 bit and increment
E3 (check for overflow)
- If F1 and F2 have different signs → F3 may require many left shifts, each time
decrementing E3 (check for underflow)
Step 4: Round F3 and possibly normalize F3 again
Step 5: Rehide the most significant bit of F3 before storing
the result
Floating Point Addition Example
❑ Add
$(0.5 = 1.0000_2 \times 2^{-1}) + (-0.4375 = -1.1100_2 \times 2^{-2})$
Step 0:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Floating Point Addition Example
❑ Add: 0.5 + (-0.4375) = ?
$(0.5 = 1.0000_2 \times 2^{-1}) + (-0.4375 = -1.1100_2 \times 2^{-2})$
Step 0: Hidden bits restored in the representation above
Step 1: Shift significand with the smaller exponent (1.1100)
right until its exponent matches the larger exponent
(so once)
Step 2: Add significands
1.0000 + (-0.111) = 1.0000 – 0.111 = 0.001
Step 3: Normalize the sum, checking for exponent
over/underflow
$0.001 \times 2^{-1} = 0.010 \times 2^{-2} = \dots = 1.000 \times 2^{-4}$
Step 4: The sum is already rounded, so we’re done
Step 5: Rehide the hidden bit before storing
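A toy Python model of the addition steps, assuming each operand is a (sign, unbiased exponent, significand) triple whose significand already includes the restored hidden bit, with 4 fraction bits as in the example. `fp_add` and `to_fraction` are our own names. As a simplification, bits shifted out during alignment are dropped: there are no G/R/S bits and no rounding step.

```python
from fractions import Fraction


def fp_add(s1, e1, m1, s2, e2, m2, frac_bits=4):
    """Toy FP addition; significand m means 1.xxxx = m / 2**frac_bits."""
    # Step 1: align: make operand 1 the one with the larger exponent, shift operand 2 right
    if e1 < e2:
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    m2 >>= e1 - e2
    # Step 2: add the signed significands
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    if total == 0:
        return 0, 0, 0                        # exact zero, no hidden bit
    s3, m3, e3 = int(total < 0), abs(total), e1
    # Step 3: normalize to 1.xxxx
    while m3 >= 2 << frac_bits:               # value >= 2: shift right, increment exponent
        m3 >>= 1
        e3 += 1
    while m3 < 1 << frac_bits:                # value < 1: shift left, decrement exponent
        m3 <<= 1
        e3 -= 1
    return s3, e3, m3


def to_fraction(s, e, m, frac_bits=4):
    """Exact value of a (sign, exponent, significand) triple."""
    v = Fraction(m, 1 << frac_bits) * Fraction(2) ** e
    return -v if s else v


# 0.5 = 1.0000 x 2^-1 and -0.4375 = -1.1100 x 2^-2
result = fp_add(0, -1, 0b10000, 1, -2, 0b11100)
print(result, to_fraction(*result))   # (0, -4, 16) 1/16
```

The intermediate values follow the slide: the aligned operand is 0.1110, the sum is 0.0010, and three left shifts give $1.0000 \times 2^{-4}$.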
Floating Point Multiplication
❑ Multiplication
$(F_1 \times 2^{E_1}) \times (F_2 \times 2^{E_2}) = F_3 \times 2^{E_3}$
Step 0: Restore the hidden bit in F1 and in F2
Step 1: Add the two (biased) exponents and subtract the
bias from the sum, so E1 + E2 – 127 = E3
also determine the sign of the product (which depends on
the sign of the operands (most significant bits))
Step 2: Multiply F1 by F2 to form a double precision F3
Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
- Since F1 and F2 come in normalized → F3 ∈ [1,4) → if F3 ≥ 2, shift F3 right by 1 bit and
increment E3
- Check for overflow/underflow
Step 4: Round F3 and possibly normalize F3 again
Step 5: Rehide the most significant bit of F3 before storing
the result
Floating Point Multiplication Example
❑ Multiply
$(0.5 = 1.0000_2 \times 2^{-1}) \times (-0.4375 = -1.1100_2 \times 2^{-2})$
Step 0:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Floating Point Multiplication Example
❑ Multiply
$(0.5 = 1.0000_2 \times 2^{-1}) \times (-0.4375 = -1.1100_2 \times 2^{-2})$
Step 0: Hidden bits restored in the representation above
Step 1: Add the exponents: without bias, -1 + (-2) = -3; with bias,
(-1+127) + (-2+127) – 127 = (-1 - 2) + (127 + 127 - 127) = -3 + 127 = 124.
The operands have different signs, so the product is negative
Step 2: Multiply the significands
1.0000 x 1.110 = 1.110000
Step 3: Normalize the product, checking for exponent over/underflow
$-1.110000 \times 2^{-3}$ is already normalized
Step 4: The product is already rounded, so we’re done
Step 5: Rehide the hidden bit before storing
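The same multiplication can be checked on real single-precision fields. This Python sketch takes S, E and F from `struct` (the helper name `fields` is our own) and follows Steps 0 to 3 and 5 on integer significands. It truncates instead of rounding (no Step 4) and does not check for exponent overflow or underflow.

```python
import struct


def fields(x: float):
    """(S, E, F) fields of x rounded to IEEE 754 single precision."""
    word = int.from_bytes(struct.pack(">f", x), "big")
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


s1, E1, F1 = fields(0.5)        # 0, 126, 0
s2, E2, F2 = fields(-0.4375)    # 1, 125, 1100...0

sign = s1 ^ s2                  # Step 1: signs differ, so the product is negative
E3 = E1 + E2 - 127              # Step 1: both exponents carry the bias, remove one copy
sig1 = (1 << 23) | F1           # Step 0: restore hidden bits (24-bit significands)
sig2 = (1 << 23) | F2
prod = sig1 * sig2              # Step 2: 48-bit product, value = prod / 2**46
if prod >> 47:                  # Step 3: product in [2, 4): shift right, increment E3
    prod >>= 1
    E3 += 1
F3 = (prod >> 23) & 0x7FFFFF    # Step 5: drop hidden bit; low bits truncated, not rounded

print(sign, E3, hex(F3))        # 1 124 0x600000
```

These fields are exactly those of -0.21875 = $-1.11_2 \times 2^{-3}$, the value Python itself gets for `0.5 * -0.4375`.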
Support for Accurate Arithmetic
❑ IEEE 754 FP rounding modes
Always round up (toward +∞)
Always round down (toward -∞)
Truncate (toward 0)
Round to nearest, ties to even (the tie case is Guard || Round || Sticky
= 100; it always creates a 0 in the least significant (kept)
bit of F)
❑ Rounding (except for truncation) requires the hardware to
include extra F bits during calculations
Guard and Round bit – 2 additional bits to increase accuracy
Sticky bit – used to support Round to nearest even; is set to a 1
whenever a 1 bit shifts (right) through it (e.g., when aligning F
during addition/subtraction)
F = 1 . xxxxxxxxxxxxxxxxxxxxxxx G R S
http://pages.cs.wisc.edu/~markhill/cs354/Fall2008/notes/flpt.apprec.html
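A Python sketch of the four rounding modes acting on a kept significand plus its Guard, Round and Sticky bits, assuming a sign-magnitude value as in IEEE 754 (`round_significand` and its mode strings are our own names). A carry out of the kept bits after rounding would require renormalizing, which the caller must handle.

```python
def round_significand(kept: int, g: int, r: int, s: int, mode: str, negative: bool = False) -> int:
    """Round the kept significand bits using guard (g), round (r) and sticky (s).
    negative=True means the value is < 0 (the magnitude is rounded)."""
    inexact = g or r or s
    if mode == "truncate":                 # toward 0: drop the extra bits
        return kept
    if mode == "up":                       # toward +inf: bump the magnitude only if positive
        return kept + 1 if inexact and not negative else kept
    if mode == "down":                     # toward -inf: bump the magnitude only if negative
        return kept + 1 if inexact and negative else kept
    if mode == "nearest_even":
        if g and (r or s or (kept & 1)):   # above halfway, or a tie (100) with odd LSB
            return kept + 1
        return kept
    raise ValueError(f"unknown mode {mode!r}")


print(bin(round_significand(0b1011, 1, 0, 0, "nearest_even")))   # tie, odd LSB -> 0b1100
print(bin(round_significand(0b1010, 1, 0, 0, "nearest_even")))   # tie, even LSB -> 0b1010
```

In both tie cases the kept result ends in 0, which is the "always creates a 0 in the least significant bit" property above.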
Example
❑ Calculate:
0.2 x 5 = ?
0.333 x 3 = ?
(1.0/3) x 3 = ?
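One way to explore these three products is in Python, whose `float` is IEEE 754 double precision on common platforms. The sketch prints each result's `repr` and whether it equals 1.0. It then uses `fractions.Fraction`, which converts a float exactly, to show that 0.2 and 1/3 were already rounded when stored.

```python
from fractions import Fraction

cases = {
    "0.2 * 5": 0.2 * 5,
    "0.333 * 3": 0.333 * 3,
    "(1.0 / 3) * 3": (1.0 / 3) * 3,
}
for expr, value in cases.items():
    print(f"{expr:14} -> {value!r:22} equal to 1.0? {value == 1.0}")

# The stored operands are not the decimal values written in the source code:
print(Fraction(0.2) == Fraction(1, 5))      # False: 0.2 has no finite binary form
print(Fraction(1.0 / 3) == Fraction(1, 3))  # False
```

In double precision, 0.2 * 5 and (1.0 / 3) * 3 both round back to exactly 1.0, although neither operand is exact: the rounding of the product happens to cancel the representation error. 0.333 * 3 is not 1.0, because 0.333 is not 1/3 to begin with.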