Chapter 8: Computer Arithmetic (Part 2)
2nd version
www.basiccomparch.com
Download the PDF of the book
Videos
Integer Division
* Let us only consider positive numbers
* N = DQ + R
* N → Dividend
* D → Divisor
* Q → Quotient
* R → Remainder
* Properties
* [Property 1:] R < D, R >= 0
* [Property 2:] Q is the largest positive integer
satisfying the equation (N = DQ +R) and Property 1
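* For example, with N = 7 and D = 3: 7 = 3 × 2 + 1, so Q = 2 and R = 1 (R < D, and a larger Q would make R negative)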
Reduction of the Division Problem
* N = DQ + R
* Write the quotient as Q = Qn·2^(n−1) + Q', where Q' consists of the remaining quotient bits Qn−1 … Q1
* Then N − D·Qn·2^(n−1) = D·Q' + R
* The left hand side is the reduced dividend N', so the same relation N' = D·Q' + R holds for a smaller problem
How to Reduce the Problem
* We need to find Qn
* We can try both values – 0 and 1
* First try 1
* If N − D·2^(n−1) >= 0, Qn = 1 (maximize the quotient)
* Otherwise it is 0
* Once we have reduced the problem
* We can proceed recursively
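A minimal C sketch of this bit-by-bit reduction (my own illustration; the function and variable names are not from the book). The quotient bit Qi is 1 exactly when N − D·2^i is non-negative, and we then recurse on the reduced dividend.

#include <stdint.h>

/* Sketch of the recursive reduction: try the quotient bit 1 first;
   if N - D*2^i >= 0 the bit is 1 and the dividend is reduced to
   N' = N - D*2^i, otherwise the bit is 0. */
static uint64_t divide_recursive(uint64_t n, uint64_t d, int i, uint64_t *rem)
{
    if (i < 0) {              /* all quotient bits have been decided */
        *rem = n;             /* whatever is left of N is the remainder */
        return 0;
    }
    uint64_t q_bit = 0;
    if (n >= (d << i)) {      /* N - D*2^i >= 0  =>  the i-th quotient bit is 1 */
        q_bit = 1;
        n -= (d << i);        /* reduce the dividend */
    }
    return (q_bit << i) | divide_recursive(n, d, i - 1, rem);
}

/* Usage: for 32-bit operands start with i = 31, e.g.
   divide_recursive(7, 3, 31, &r) returns 2 with r = 1. */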
Iterative Divider
[Block diagram of the iterative divider: a divisor register (D), registers U and V holding the partial remainder and the dividend/quotient bits, and a subtractor that computes (U − D).]
Restoring Division
Algorithm 3: Restoring algorithm to divide two 32-bit numbers
Data: Divisor in D, Dividend in V, U = 0
Result: U contains the remainder (lower 32 bits), and V contains the quotient
i ← 0
for i < 32 do
    i ← i + 1
    /* Left shift UV by 1 position */
    UV ← UV << 1
    U ← U - D
    if U ≥ 0 then
        q ← 1
    else
        U ← U + D
        q ← 0
    end
    /* Set the quotient bit */
    LSB of V ← q
end
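A C sketch of Algorithm 3 (my own code, not taken from the book). The U register is kept in a separate 64-bit variable so that the 33-bit intermediate value after the shift fits, and the restore step U ← U + D is implicit because the trial subtraction is committed only when U ≥ D.

#include <stdint.h>

static void restoring_divide(uint32_t n, uint32_t d,
                             uint32_t *quotient, uint32_t *remainder)
{
    uint64_t u = 0;     /* partial remainder (U); needs 33 bits after the shift */
    uint32_t v = n;     /* V: starts with the dividend, fills up with quotient bits */
    for (int i = 0; i < 32; i++) {
        /* left shift UV by 1 position: the MSB of V moves into U */
        u = (u << 1) | (v >> 31);
        v <<= 1;
        if (u >= d) {   /* the trial subtraction succeeds: quotient bit = 1 */
            u -= d;
            v |= 1u;
        }
        /* else: the quotient bit stays 0 and U is left untouched (no restore needed) */
    }
    *remainder = (uint32_t)u;   /* U holds the remainder */
    *quotient  = v;             /* V holds the quotient */
}

/* Usage: restoring_divide(7, 3, &q, &r) gives q = 2, r = 1 (assumes d != 0). */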
Example
Dividend (N) 00111
Quotient(Q) 0010
Remainder(R) 0001
9
Restoring Division
* Consider each bit of the dividend
* Left shift UV, then try to subtract the divisor from the U register
* If the subtraction is successful, set the relevant quotient bit to 1
* Else, restore U (add the divisor back) and set the relevant quotient bit to 0
Proof
* Let us consider the value stored in UV
(ignoring quotient bits)
* After the shift (first iteration)
* UV = 2N
* After the subtraction U ← U − D, UV contains
* UV − 2^n·D = 2N − 2^n·D = 2·(N − 2^(n−1)·D)
* If (U − D) >= 0
* N' = N − 2^(n−1)·D
* Thus, UV contains 2N'
Proof - II
* If (U – D) < 0
* We know that (N = N')
* Add D to U → Add 2^n·D to UV
* partial dividend = 2N = 2N'
* In both cases
* The partial dividend = 2N'
* After 32 iterations
* V will contain the entire quotient
Proof - III
* At the end, UV = 2^32 · N32 (Ni is the partial dividend after the i-th iteration)
* N31 = D·Q1 + R
* N31 − D·Q1 = N32 = R
* Thus, U contains the remainder (R)
Time Complexity
* n iterations
* Each iteration takes log(n) time
* Total time : n log(n)
Restoring vs Non-Restoring Division
* The non-restoring algorithm avoids the restoring addition (U ← U + D) inside the loop; at most one correcting addition is needed at the very end
Algorithm 4: Non-restoring algorithm to divide two 32-bit numbers
Data: Divisor in D, Dividend in V, U = 0
Result: U contains the remainder (lower 32 bits), and V contains the quotient
i ← 0
for i < 32 do
    i ← i + 1
    /* Left shift UV by 1 position */
    UV ← UV << 1
    if U ≥ 0 then
        U ← U − D
    else
        U ← U + D
    end
    if U ≥ 0 then
        q ← 1
    else
        q ← 0
    end
    /* Set the quotient bit */
    LSB of V ← q
end
if U < 0 then
    U ← U + D
end
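A C sketch of Algorithm 4 (again my own code, not from the book). U is kept as a signed 64-bit value so that it may stay negative between iterations; a single correction at the end brings the remainder back into range.

#include <stdint.h>

static void nonrestoring_divide(uint32_t n, uint32_t d,
                                uint32_t *quotient, uint32_t *remainder)
{
    int64_t  u = 0;     /* signed partial remainder (U) */
    uint32_t v = n;     /* V: dividend, fills up with quotient bits */
    for (int i = 0; i < 32; i++) {
        u = 2 * u + (int64_t)(v >> 31);   /* left shift UV by 1 position */
        v <<= 1;
        if (u >= 0)
            u -= d;     /* U >= 0: subtract the divisor */
        else
            u += d;     /* U < 0: add the divisor (no restore inside the loop) */
        if (u >= 0)
            v |= 1u;    /* the quotient bit is 1 exactly when U ends up >= 0 */
    }
    if (u < 0)          /* final correction step */
        u += d;
    *remainder = (uint32_t)u;
    *quotient  = v;
}

/* Usage: nonrestoring_divide(7, 3, &q, &r) gives q = 2, r = 1 (assumes d != 0). */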
Dividend (N) = 00111 (7), Divisor (D) = 0011 (3)
U V at the beginning: 00000 0111
Quotient (Q) = 0010 (2), Remainder (R) = 0001 (1)
[The iteration-by-iteration trace table is not reproduced here.]
Idea of the Proof
* Start from the beginning : If (U – D) >= 0
* Both the algorithms (restoring and non-restoring)
produce the same result, and have the same state
* If (U – D) < 0
* We have a divergence
* In the restoring algorithm
* value(UV) = A
* In the non-restoring algorithm
* value(UV) = A − 2^n·D
Proof - II
* In the next iteration (just after the shift)
* Restoring : value(UV) = 2A
* Non-Restoring : value(UV) = 2A − 2^(n+1)·D
* If the quotient bit is 1 (end of iteration)
* Restoring :
* Subtract 2^n·D
* value(UV) = 2A − 2^n·D
* Non-Restoring :
* Add 2^n·D
* value(UV) = 2A − 2^(n+1)·D + 2^n·D = 2A − 2^n·D
Proof - III
* If the quotient bit is 0
* Restoring
* partial dividend = 2A
* Non restoring
* partial dividend = 2A − 2^n·D
* Next iteration (if quotient bit = 1) (after shift)
* Restoring : partial dividend : 4A
* Non restoring : partial dividend : 4A − 2^(n+1)·D
* Keep applying the same logic ….
Outline
* Addition
* Multiplication
* Division
* Floating Point Addition
* Floating Point Multiplication
* Floating Point Division
Adding Two Numbers (same sign)
Symbol : Meaning
S : Sign bit (0 → +ve, 1 → −ve)
P : Significand (of the form 1.xxx for normal numbers, 0.xxx for denormal numbers)
M : Mantissa (the fractional part of the significand)
E : Biased exponent field (exponent + 127, where 127 is the bias)
Z : The set of integers
Addition
* Add : A + B
* Unpack the E fields → EA , EB
* Let the E field of the result be → EC
* Unpack the significand (P)
* P contains → 1 bit before the decimal point,
23 mantissa bits (24 bits)
* Unpack to a 25 bit number (unsigned)
* W → Add a leading 0 bit, then the 24 bits of the significand
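A small C sketch of the unpacking step for single precision (my own illustration; the struct and variable names are mine). It extracts S, the biased E field, and the 24-bit significand P with the implicit leading 1 made explicit for normal numbers; the 25-bit W is simply this P with one extra leading 0 bit.

#include <stdint.h>
#include <string.h>

typedef struct { unsigned s; unsigned e; uint32_t p; } unpacked_t;

static unpacked_t unpack(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* reinterpret the 32 bits of the float */
    unpacked_t u;
    u.s = bits >> 31;                  /* S: sign bit */
    u.e = (bits >> 23) & 0xFF;         /* E: biased exponent field */
    uint32_t m = bits & 0x7FFFFF;      /* M: 23 mantissa bits */
    /* P: 1.M for normal numbers, 0.M for denormal numbers (E = 0) */
    u.p = (u.e != 0) ? ((1u << 23) | m) : m;
    return u;
}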
Addition - II
* With no loss of generality
* Assume EA >= EB
Renormalisation
* Let the significand represented by the register W be PW
* There is a possibility that PW >= 2
* In this case, we need to renormalise
* W ← W >> 1
* EA ← EA + 1
* The final result
* Sign bit (same as the sign of A or B)
* Significand (PW), exponent field (EA)
Example
Example: Add: 1.01 × 2^3 + 1.11 × 2^1
Answer:
The decimal point in W is shown for enhancing readability. For simplicity, biased notation is not used.
1. A = 1.01 × 2^3 and B = 1.11 × 2^1
2. W = 01.11 (significand of B)
3. E = 3
4. W = 01.11 >> (3 − 1) = 00.0111
5. W + PA = 00.0111 + 01.0100 = 01.1011
6. Result: C = 1.1011 × 2^3
Example - II
Example: Add: 1.01₂ × 2^3 + 1.11₂ × 2^2
Answer:
The decimal point in W is shown for enhancing readability. For simplicity, biased notation is not used.
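Working it through with the same steps as the previous example (this worked-out answer is my reconstruction, following the procedure given above):
1. A = 1.01₂ × 2^3 and B = 1.11₂ × 2^2
2. W = 01.11 (significand of B)
3. E = 3
4. W = 01.11 >> (3 − 2) = 00.111
5. W + PA = 00.111 + 01.010 = 10.001
6. PW >= 2, so renormalise: W = 10.001 >> 1 = 01.0001, EA = 3 + 1 = 4
7. Result: C = 1.0001₂ × 2^4 (check: 10 + 7 = 17 = 1.0001₂ × 2^4)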
Rounding - II
IEEE 754 Rounding Modes
* Truncation
* P' = P
* Example in decimal : 9.5 → 9, 9.6 → 9
* Round to +∞
* P' = ⌈P + R⌉
* Example in decimal : 9.5 → 10, -3.2 → -3
IEEE 754 Rounding - II
* Round to -∞
* P' = ⌊P+R⌋
* Example in decimal : 9.5 → 9, -3.2 → -4
* Round to nearest
* P' = [P + R]
* Example in decimal :
* 9.4 → 9 , 9.5 → 10 (even)
* 9.6 → 10 , -2.3 → -2
* -3.5 → -4 (even)
Rounding Modes – Summary
Implementing Rounding
* We need three bits
* lsb(P)
* msb of the residue (R) → r (round bit)
* OR of the rest of the bits of the residue (R) → s (sticky bit)
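A small C sketch of round-to-nearest-even built from exactly these three bits (my own code; the names are mine):

#include <stdint.h>

/* p is the kept significand P; r is the round bit (MSB of the residue R);
   s is the sticky bit (OR of the remaining residue bits). */
static uint32_t round_to_nearest_even(uint32_t p, unsigned r, unsigned s)
{
    unsigned lsb = p & 1u;       /* lsb(P) */
    if (r && (s || lsb))         /* residue > 1/2, or exactly 1/2 with P odd */
        return p + 1;            /* round up; this may require renormalisation */
    return p;                    /* otherwise truncate */
}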
Renormalisation after Rounding
* Rounding up can make the significand equal to 2 again (e.g., 1.11…1 rounds up to 10.00…0)
* In this case, shift W right by 1 position and increment the exponent, exactly as before
Addition of Numbers (Opposite Signs)
* C = A + B
* Same assumption: EA >= EB
* Steps (see the C sketch below)
* Load W with the significand of B (PB)
* Take the 2's complement of W (W = −PB)
* W ← W >> (EA − EB)
* W ← W + PA
* If (W < 0), replace it with its 2's complement and flip the sign of the result
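A simplified C sketch of the whole significand path, covering both the same-sign and the opposite-sign cases (my own code; PA and PB are 24-bit integers scaled by 2^23, exponents are unbiased, and rounding and special values are left out):

#include <stdint.h>

typedef struct { int sign; uint32_t p; int e; } fp_t;   /* p is 1.xxx scaled by 2^23 */

static fp_t fp_add(fp_t a, fp_t b)
{
    if (b.e > a.e) { fp_t t = a; a = b; b = t; }   /* ensure EA >= EB */

    unsigned sh = (unsigned)(a.e - b.e);
    /* align B's significand; the slides complement first and then shift */
    uint32_t pb_aligned = (sh > 31) ? 0 : (b.p >> sh);        /* W >> (EA - EB) */
    int64_t  w = (a.sign == b.sign) ? (int64_t)pb_aligned
                                    : -(int64_t)pb_aligned;   /* 2's complement */
    w += a.p;                                                 /* W <- W + PA */

    fp_t c = { a.sign, 0, a.e };
    if (w < 0) { w = -w; c.sign = !a.sign; }   /* negative sum: flip the sign */

    /* normalise W back into [2^23, 2^24) and update the exponent */
    while (w >= ((int64_t)1 << 24)) { w >>= 1; c.e++; }
    while (w != 0 && w < ((int64_t)1 << 23)) { w <<= 1; c.e--; }
    c.p = (uint32_t)w;
    return c;
}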
Addition of Numbers (Opposite Signs) - II
[Flowchart for C = A + B: if A = 0, C = B; if B = 0, C = A; otherwise swap A and B such that EB <= EA; if sign(A) ≠ sign(B), take the 2's complement of W; W ← W + PA; normalize W and update E; if W < 0, take the 2's complement of W and flip S; round W; normalize W and update E; on overflow or underflow, report it; otherwise construct C out of W, E, and S.]
Outline
* Addition
* Multiplication
* Division
* Floating Point Addition
* Floating Point Multiplication
* Floating Point Division
Multiplication of FP Numbers
* Steps
* E ← EA + EB - bias
* W ← PA * PB
* Normalise (shift left or shift right)
* Round
* Renormalise
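A matching C sketch of the multiplication data path (my own code; significands are 24-bit integers scaled by 2^23, E fields are biased by 127, and rounding is omitted):

#include <stdint.h>

typedef struct { int sign; uint32_t p; int e; } fp_t;

static fp_t fp_mul(fp_t a, fp_t b)
{
    fp_t c;
    c.sign = a.sign ^ b.sign;              /* sign of the product */
    c.e    = a.e + b.e - 127;              /* E <- EA + EB - bias */
    uint64_t w = (uint64_t)a.p * b.p;      /* W <- PA * PB: a 48-bit product */
    /* the product of two values in [1, 2) lies in [1, 4):
       at most one right shift is needed to normalise */
    if (w >= ((uint64_t)1 << 47)) { w >>= 1; c.e++; }
    c.p = (uint32_t)(w >> 23);             /* keep 24 bits; the residue would feed rounding */
    return c;
}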
[Flowchart for C = A * B: if A = 0 or B = 0, then C = 0; otherwise S ← sign(A) XOR sign(B) and E ← EA + EB − bias; check for overflow or underflow and report it; W ← PA * PB; normalize W and update E; check for overflow or underflow and report it; round W; normalize W and update E; check for overflow or underflow and report it; construct C out of W, E, and S.]
Outline
* Addition
* Multiplication
* Division
* Floating Point Addition
* Floating Point Multiplication
* Floating Point Division
Simple Division Algorithm
* Divide A/B to produce C
* There is no notion of a remainder in FP division
* Algorithm
* E ← EA – EB + bias
* W ← PA / PB
* normalise, round, renormalise
* Complexity : O(n log(n))
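A C sketch of the significand step W ← PA / PB (my own code; the fp_t layout and the 127 bias are my assumptions, rounding is omitted, and B is assumed to be non-zero). The 24-bit significand of A is pre-scaled so that an ordinary integer divider produces enough quotient bits:

#include <stdint.h>

typedef struct { int sign; uint32_t p; int e; } fp_t;

static fp_t fp_div(fp_t a, fp_t b)
{
    fp_t c;
    c.sign = a.sign ^ b.sign;
    c.e    = a.e - b.e + 127;                  /* E <- EA - EB + bias */
    /* PA/PB lies in (1/2, 2); scaling PA by 2^24 keeps 24-25 quotient bits */
    uint64_t w = ((uint64_t)a.p << 24) / b.p;  /* W <- PA / PB, scaled */
    if (w < ((uint64_t)1 << 24)) {             /* quotient < 1: normalise left */
        w <<= 1;
        c.e--;
    }
    c.p = (uint32_t)(w >> 1);                  /* back to the 1.23 format */
    return c;
}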
Goldschmidt Division
* Let us compute the reciprocal of B (1/B)
* Then, we can use the standard floating point
multiplication algorithm
* Ignoring the exponent
* Let us compute (1/PB)
* If B is a normal floating point number
* 1 <= PB < 2
* PB = 1 + X (X < 1)
Goldschmidt Division - II
* No point considering Y^32
* It cannot be represented in our format
Generating the 1/(1-Y)
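A sketch (my own, in double precision for readability, and an assumption about the exact derivation) of one standard way to generate 1/(1 − Y): multiplying both the numerator and the denominator of 1/(1 − Y) by (1 + Y), (1 + Y^2), (1 + Y^4), (1 + Y^8) and (1 + Y^16) turns the denominator into 1 − Y^32, which is ignored as the previous slide suggests.

/* product (1 + y)(1 + y^2)(1 + y^4)(1 + y^8)(1 + y^16);
   the denominator of 1/(1 - y) becomes 1 - y^32, which is treated as 1 */
static double reciprocal_one_minus_y(double y)
{
    double result = 1.0;
    double y_pow  = y;               /* y^(2^i) */
    for (int i = 0; i < 5; i++) {
        result *= (1.0 + y_pow);
        y_pow  *= y_pow;             /* y, y^2, y^4, y^8, y^16 */
    }
    return result;                   /* ~ 1/(1 - y) when y^32 is negligible */
}

/* Usage: reciprocal_one_minus_y(0.25) is approximately 1/0.75 = 1.3333. */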
Division using the Newton-Raphson Method
* Let us focus on just finding the reciprocal
of a number
* Let us designate PB as b (1 <= b < 2)
* Aim is to compute 1/b
* Let us create a function f(x) = 1/x – b
* f(x) = 0, when x = 1/b
* Problem of computing the reciprocal
* same as computing the root of f(x)
Idea of the Method
[Figure: the curve f(x) with the root marked; the tangent at (x0, f(x0)) cuts the x-axis at x1, the tangent at (x1, f(x1)) cuts it at x2, and so on, with x2 < x1 < x0 converging towards the root.]
Analysis
* f(x) = 1/x − b
* f'(x) = d f(x)/dx = −1/x^2
* f'(x0) = −1/x0^2
* At x0, y = 1/x0 − b
Algebra
* Equation of the tangent at (x0, f(x0)): y − f(x0) = f'(x0)·(x − x0)
* The next estimate x1 is the point where this tangent intersects the x-axis
Intersection with the x-axis
* Setting the tangent to zero: −x1/x0^2 + 2/x0 − b = 0
* Solving for x1: x1 = 2·x0 − b·x0^2
Evolution of the Error
ε(x0) = b·x0 − 1
Bounding the Error
* ε(x1) = b·x1 − 1 = b·(2·x0 − b·x0^2) − 1 = −(b·x0 − 1)^2 = −ε(x0)^2
* The magnitude of the error is squared in every iteration
* If |ε(x0)| <= 1/2, then after k iterations the error magnitude is at most (1/2)^(2^k)
Evolution of the Error - II
Iteration | max(ε(x))
    0     |   1/2
    1     |   1/2^2
    2     |   1/2^4
    3     |   1/2^8
    4     |   1/2^16
    5     |   1/2^32

* ε(x) = bx − 1 = b·(x − 1/b)
* x − 1/b is the difference between the ideal value and the actual estimate (x). This is near 2^(−32), which is too small to be considered.
* No point considering beyond 5 iterations
* Since we are limited to 23-bit mantissas
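A small C sketch of the iteration (my own code, in double precision so the convergence is easy to see); x0 = 1/2 is one choice of starting point that keeps the initial error within 1/2 for any b in [1, 2):

#include <stdio.h>

int main(void)
{
    double b = 1.7;        /* the significand to invert, 1 <= b < 2 */
    double x = 0.5;        /* initial guess x0, so |e(x0)| = |b/2 - 1| <= 1/2 */
    for (int i = 1; i <= 5; i++) {
        x = 2.0 * x - b * x * x;              /* x_n = 2*x_(n-1) - b*x_(n-1)^2 */
        printf("iteration %d: x = %.10f  error = %.3e\n", i, x, b * x - 1.0);
    }
    /* after 5 iterations |e(x)| is below 2^-32, already finer than a 23-bit mantissa */
    return 0;
}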
Time Complexity
* In every step, the operation that we need to perform is:
* x_n = 2·x_(n−1) − b·x_(n−1)^2
* Requires a shift, multiply, and subtract operation
* O(log(n)) time
* Number of steps: O(log(n))
* Total time : O(log(n)^2)
THE END