Module 3 (BKM) - Arithmetic

Computer Arithmetic

Dr. Bimal Kumar Meher


Associate Professor
Dept. of CSE
Addition and Subtraction of signed
magnitude numbers
• Let the magnitudes of the two numbers to be added (or
subtracted) be A and B.
• When the signed numbers are added or subtracted, eight
different combinations of operation and operand signs have to
be considered:

Algorithm for Addition of signed magnitude
numbers
• When the signs of A and B are identical, add the two
magnitudes and attach the sign of A to the result.
• When the signs of A and B are different, compare the
magnitudes and subtract the smaller number from the
larger.
• Choose the sign of the result to be the same as A if A > B or the
complement of the sign of A if A < B.
• If the two magnitudes are equal, subtract B from A and
make the sign of the result positive.
Algorithm for Subtraction of signed magnitude
numbers
• When the signs of A and B are different, add the two
magnitudes and attach the sign of A to the result.
• When the signs of A and B are same, compare the
magnitudes and subtract the smaller number from the
larger.
• Choose the sign of the result to be the same as A if A > B or the
complement of the sign of A if A < B.
• If the two magnitudes are equal, subtract B from A and
make the sign of the result positive.
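The two algorithms above differ only in whether the sign of B is first complemented before the addition rules are applied. A small Python sketch of these rules (an illustration only, not the register-level hardware of the following slides; representing an operand as a (sign, magnitude) pair is an assumed convention):

# Sign-magnitude add/subtract rules: sign 0 means '+', sign 1 means '-'.
def signed_magnitude_add(As, A, Bs, B, subtract=False):
    """Return (sign, magnitude) of A +/- B in sign-magnitude form."""
    if subtract:          # subtraction: complement the sign of B, then add
        Bs ^= 1
    if As == Bs:          # identical signs: add magnitudes, keep the sign of A
        return As, A + B
    if A > B:             # different signs: subtract smaller from larger
        return As, A - B  # result keeps the sign of A
    if A < B:
        return Bs, B - A  # result takes the complement of the sign of A
    return 0, 0           # equal magnitudes: result is +0

print(signed_magnitude_add(0, 7, 1, 3))                  # (+7) + (-3) -> (0, 4)
print(signed_magnitude_add(1, 2, 1, 5, subtract=True))   # (-2) - (-5) -> (0, 3)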
Hardware Implementation
Basic components of Hardware
Implementation
• It needs two registers to hold the magnitude of two numbers,
say A and B.
• As and Bs are two flip-flops to hold the corresponding signs.
• The result may be transferred to a third register or may be
accumulated in A and As.
• A+B is performed with the help of parallel adder.
• A-B is performed by adding the 2’s complement of B to A.
• A 2’s complement subtractor and comparator can be
implemented by using a complementer.
• The output carry is transferred to flip-flop E, which determines
the relative magnitude of two numbers.
• The AVF (Add OverFlow) flip-flop holds the overflow bit during
A+B.
Hardware Implementation Explanation
• The addition A+B is done through the parallel adder.
• The sum S output of the adder is applied to the input of A
register.
• The complementer provides an output of B or the complement
of B depending on the mode control M.
• The complementer consists of XOR gates and the parallel
adder consists of full adder circuits.
• The M signal is also applied to the input carry of the adder.
• When M=0, the output of B is transferred to the adder, the
input carry is 0, and the output of the adder is equal to the
sum A+B.
• When M=1, the 1’s complement of B is applied to the adder,
the input carry is 1, and the output S = A + B’ + 1, i.e. A plus
the 2’s complement of B, which equals A − B.
Flowchart of
Addition/Subtraction
Hardware Algorithm
• The XOR decides
whether the signs of A
and B are identical or
not.
• If the AsBs=0, then the
signs are identical and
vice versa.
• For add operation,
identical signs means
the magnitudes to be
added.
• For subtract operation,
different sign means the
magnitudes to be
added.
Flowchart of
Addition/Subtraction
Hardware Algorithm-
Working
• The magnitudes are
added with a
microoperation
EA=A+B, where EA
is a register that
combines E and A.
• The carry in E after
the addition
constitutes an
overflow if it is
equal to 1.
• The value of E is
transferred into the
AVF.
Flowchart of
Addition/Subtraction
Hardware Algorithm-
Working
• The two magnitudes
are subtracted if the
signs are different
for an add operation
or identical for a
subtract operation.
• The subtraction is
done by adding 2’s
complement of B to
A.
• No overflow occurs
if the numbers are
subtracted. So AVF is
cleared to 0.
Flowchart of
Addition/Subtraction
Hardware Algorithm-
Working
• If E=1, it indicates A>=B
and the number in A is the
correct result if A≠0.
• If E=0, it indicates A<B, and
the 2’s complement of A is
taken.
• Since A<B, the sign of the
result is the complement
of the original sign of A.
• Therefore, As is
complemented to obtain
the correct sign.
• Finally, A and As give the
magnitude and sign of the result.
Addition/subtraction of signed numbers
At the ith stage, the inputs are xi, yi and the carry-in ci; the outputs are the sum si and the carry-out ci+1:

  xi  yi  ci | si  ci+1
   0   0   0 |  0   0
   0   0   1 |  1   0
   0   1   0 |  1   0
   0   1   1 |  0   1
   1   0   0 |  1   0
   1   0   1 |  0   1
   1   1   0 |  0   1
   1   1   1 |  1   1

  si   = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci = xi ⊕ yi ⊕ ci
  ci+1 = yi ci + xi ci + xi yi                       (where ' denotes complement)

Example (carries shown above the operands):

  Carry:   0 1 1 0 0
  X =  7:    0 1 1 1
  Y =  6:  + 0 1 1 0
  Z = 13:    1 1 0 1

Addition logic for a single stage
[Figure: logic circuits for the sum si and carry-out ci+1 of a single stage, and the full-adder (FA) block symbol with inputs xi, yi, ci and outputs si, ci+1.]
Full Adder (FA): Symbol for the complete circuit for a single stage of
addition.
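As a quick check of the truth table and equations above, a single stage can be written directly from them (a small Python sketch; the function name is illustrative):

def full_adder(xi, yi, ci):
    """One stage of addition: returns (si, ci_plus_1)."""
    si = xi ^ yi ^ ci
    ci_plus_1 = (xi & yi) | (xi & ci) | (yi & ci)
    return si, ci_plus_1

print(full_adder(1, 1, 0))   # -> (0, 1)
print(full_adder(1, 1, 1))   # -> (1, 1)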
n-bit adder
• Cascade n full adder (FA) blocks to form an n-bit adder.
• Carries propagate, or ripple, through this cascade,
hence this is called a ripple-carry adder.
[Figure: n-bit ripple-carry adder, a cascade of FA blocks; c0 enters at the least significant bit (LSB) position and cn emerges from the most significant bit (MSB) position.]
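A minimal sketch of the cascade in Python (the single-stage function is repeated so the snippet is self-contained; representing operands as LSB-first bit lists is an assumed convention, not from the slides):

def full_adder(xi, yi, ci):
    return xi ^ yi ^ ci, (xi & yi) | (xi & ci) | (yi & ci)

def ripple_carry_add(x, y, c0=0):
    """Add two equal-length LSB-first bit lists; returns (sum_bits, carry_out)."""
    carry, s = c0, []
    for xi, yi in zip(x, y):          # the carry ripples from LSB to MSB
        si, carry = full_adder(xi, yi, carry)
        s.append(si)
    return s, carry

# 7 + 6 = 13: x = 0111, y = 0110 (written LSB first)
print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))   # -> ([1, 0, 1, 1], 0)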
K n-bit adder
• Two kn-bit numbers can be added by cascading k n-bit adders.
• Each n-bit adder forms a block, so this is cascading of blocks.
• Carries ripple or propagate through blocks, hence called
Blocked Ripple Carry Adder
[Figure: blocked ripple-carry adder for kn-bit operands, a cascade of k n-bit adder blocks; the carry out of each block feeds the carry in of the next, from c0 up to ckn.]
n-bit subtractor
•Recall X – Y is equivalent to adding the 2’s complement of Y to X.
•The 2’s complement is the 1’s complement + 1.
•X – Y = X + Y’ + 1 (add the 1’s complement of Y and set the carry-in to 1)

[Figure: n-bit subtractor, the ripple-carry cascade with each yi complemented and the carry-in of the LSB stage forced to 1.]
n-bit adder/subtractor (contd..)
[Figure: n-bit adder/subtractor; the Add/Sub control line drives one input of an XOR gate on each yi and is also fed to the carry-in c0 of the n-bit adder.]
•Add/sub control = 0, addition.


•Add/sub control = 1, subtraction.
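A minimal sketch of the adder/subtractor in Python, assuming LSB-first bit lists; the XOR of each yi with the control line models the gates in the figure, and the control line also supplies the carry-in:

def full_adder(xi, yi, ci):
    return xi ^ yi ^ ci, (xi & yi) | (xi & ci) | (yi & ci)

def add_sub(x, y, control):
    """control = 0 -> X + Y, control = 1 -> X - Y (2's-complement result)."""
    carry = control                                       # carry-in supplies the "+1"
    s = []
    for xi, yi in zip(x, y):
        si, carry = full_adder(xi, yi ^ control, carry)   # complement Y when subtracting
        s.append(si)
    return s, carry

# 7 - 6 = 1: x = 0111, y = 0110 (LSB first)
print(add_sub([1, 1, 1, 0], [0, 1, 1, 0], 1))   # -> ([1, 0, 0, 0], 1)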
Detecting overflows
 Overflows can only occur when the sign of the two
operands is the same.
 Overflow occurs if the sign of the result is different
from the sign of the operands.
 Recall that the MSB represents the sign.
 xn-1, yn-1, sn-1 represent the sign of operand x, operand y and
result s respectively.
 Circuit to detect overflow can be implemented by the
following logic expressions:

Overflow = xn-1 yn-1 sn-1' + xn-1' yn-1' sn-1     (where ' denotes complement)

Overflow = cn ⊕ cn-1
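Both tests can be checked side by side with a small sketch (LSB-first bit lists, an assumed convention; the carry list records the carry out of every stage so that cn and cn-1 can be compared):

def full_adder(xi, yi, ci):
    return xi ^ yi ^ ci, (xi & yi) | (xi & ci) | (yi & ci)

def add_with_overflow(x, y):
    """x, y: LSB-first bit lists of equal length n (2's-complement operands)."""
    carry, carries, s = 0, [0], []
    for xi, yi in zip(x, y):
        si, carry = full_adder(xi, yi, carry)
        s.append(si)
        carries.append(carry)               # carries[i+1] = carry out of stage i
    xn, yn, sn = x[-1], y[-1], s[-1]        # sign bits
    ovf_signs = (xn & yn & (1 - sn)) | ((1 - xn) & (1 - yn) & sn)
    ovf_carry = carries[-1] ^ carries[-2]   # cn XOR cn-1
    assert ovf_signs == ovf_carry           # the two tests agree
    return s, ovf_signs

# 5 + 6 = 11 overflows 4-bit 2's complement (range -8..7)
print(add_with_overflow([1, 0, 1, 0], [0, 1, 1, 0]))   # -> ([1, 1, 0, 1], 1)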
Computing the Add time

Consider the 0th stage, with inputs x0, y0 and c0:

•c1 is available after 2 gate delays.
•s0 is available after 1 gate delay.

[Figure: the 0th full-adder stage together with its sum and carry logic circuits, as in the single-stage addition circuit shown earlier.]
Computing the Add time (contd..)
Cascade of 4 Full Adders, or a 4-bit adder

[Figure: the 4-bit ripple-carry adder with stage carries c1, c2, c3 and final carry-out c4.]

•s0 available after 1 gate delay, c1 available after 2 gate delays.


•s1 available after 3 gate delays, c2 available after 4 gate delays.
•s2 available after 5 gate delays, c3 available after 6 gate delays.
•s3 available after 7 gate delays, c4 available after 8 gate delays.
Note: For an n-bit adder, sn-1 is available after 2n-1
gate delays and cn is available after 2n gate delays.
Design of Fast Adder
Recall the equations:

  si   = xi ⊕ yi ⊕ ci
  ci+1 = xi yi + xi ci + yi ci

The second equation can be written as:

  ci+1 = xi yi + (xi + yi) ci

We can write:

  ci+1 = Gi + Pi ci,   where Gi = xi yi and Pi = xi + yi
• Gi is called generate function and Pi is called propagate
function
• Gi and Pi are computed only from xi and yi and not ci, thus
they can be computed in one gate delay after X and Y are
applied to the inputs of an n-bit adder.
Generate and Propagate Functions
• If Gi=1, then ci+1 = 1, independent of the input carry ci
• This occurs when both xi and yi are 1.
• The propagate function means that, an input carry will
produce an output carry when either xi is 1 or yi is 1.
• All Gi and Pi functions can be formed independently and in
parallel in one logic gate delay after the X and Y vectors are
applied to the inputs of an n-bit adder.
• Each bit stage contains an AND gate to form Gi, an OR gate to
form Pi, and a 3-input XOR gate to form si.
• An adequate propagate function can also be realized as Pi = xi ⊕ yi,
which differs from Pi = xi + yi only when xi = yi = 1. But in this case
Gi = 1, so it doesn’t matter whether Pi is 0 or 1.
• So we can cascade two 2-input XOR gates to replace the 3-
input XOR gate.
Carry Lookahead
ci 1  Gi  Pi ci
ci  Gi 1  Pi 1ci 1
 ci1  Gi  Pi (Gi 1  Pi 1ci 1 )
continuing
 ci1  Gi  Pi (Gi 1  Pi 1 (Gi  2  Pi 2 ci 2 ))
until
ci1  Gi  PiGi 1  Pi Pi1 Gi 2  ..  Pi Pi 1 ..P1G0  Pi Pi 1 ...P0 c 0
• All carries obtained 3 gate delays after X, Y and c0 are applied.
• One gate delay for Pi and Gi
• Two gate delays in the AND-OR circuit for ci+1
• All sums can be obtained in 1 gate delay after the carries are
computed.
• Independent of n, n-bit addition requires only 4 gate delays.
• This is called Carry Lookahead adder.
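A minimal sketch of a 4-bit carry-lookahead addition (the function name is illustrative; the sketch evaluates the carry recurrence ci+1 = Gi + Pi ci one step at a time, whereas the hardware realizes the fully expanded AND-OR form in two gate levels):

def cla_add4(x, y, c0=0):
    """4-bit carry-lookahead addition; x, y are LSB-first bit lists."""
    G = [xi & yi for xi, yi in zip(x, y)]   # generate functions
    P = [xi | yi for xi, yi in zip(x, y)]   # propagate functions
    c = [c0]
    for i in range(4):                      # ci+1 = Gi + Pi*ci
        c.append(G[i] | (P[i] & c[i]))
    s = [x[i] ^ y[i] ^ c[i] for i in range(4)]
    return s, c[4]

# 7 + 6 = 13: x = 0111, y = 0110 (LSB first)
print(cla_add4([1, 1, 1, 0], [0, 1, 1, 0]))   # -> ([1, 0, 1, 1], 0)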
Carry-lookahead Adder
[Figure: 4-bit carry-lookahead adder. Four B cells produce s3..s0; each B cell takes xi, yi and ci, and also generates Gi and Pi, which feed the carry-lookahead logic that produces c1..c4 from c0. A second figure shows the B cell for a single stage.]
Carry-lookahead Adder(contd..)
• Performing n-bit addition in 4 gate delays independent of n is
good only theoretically because of fan-in constraints.

ci1  Gi  PiGi 1  Pi Pi1 Gi 2  ..  Pi Pi 1 ..P1G0  Pi Pi 1 ...P0 c0

• Last AND gate and OR gate require a fan-in of (n+1) for a n-bit
adder.
• For a 4-bit adder (n=4) fan-in of 5 is required.
• Practical limit for most gates.
• In order to add operands longer than 4 bits, we can cascade
4-bit Carry-Lookahead adders.
• Cascade of Carry-Lookahead adders is called Blocked Carry-
Lookahead adder.
Blocked Carry-Lookahead adder
Carry-out from a 4-bit block can be given as:

  c4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0

Rewrite this as:

  P0^I = P3 P2 P1 P0
  G0^I = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0

The superscript I denotes the blocked carry lookahead and identifies the block.

Cascading four such 4-bit adders, c16 can be expressed as:

  c16 = G3^I + P3^I G2^I + P3^I P2^I G1^I + P3^I P2^I P1^I G0^I + P3^I P2^I P1^I P0^I c0
Blocked Carry-Lookahead adder
[Figure: 16-bit blocked carry-lookahead adder built from four 4-bit carry-lookahead adders. Each block produces its group functions G^I and P^I, which feed a second level of carry-lookahead logic generating c4, c8, c12 and c16 from c0.]

After xi, yi and c0 are applied as inputs:


- Gi and Pi for each stage are available after 1 gate delay.
- P^I is available after 2 gate delays and G^I after 3 gate delays.
- The block carries, up to and including c16, are available after 5 gate delays.
- s15, which depends on c12, is available after 8 (5+3) gate delays.
(Recall that for a 4-bit carry-lookahead adder, the last sum bit is
available 3 gate delays after all its inputs are available.)
Multiplication
Multiplication of unsigned numbers

• Product of 2 n-bit numbers is at most a 2n-bit number.


• Unsigned multiplication can be viewed as addition of
shifted versions of the multiplicand.
Multiplication of unsigned numbers
(contd..)
 We added the partial products at the end.
 An alternative is to add the partial products at each stage.
 Rules to implement multiplication are (a small sketch follows the list below):
 If the ith bit of the multiplier is 1, shift the multiplicand and
add the shifted multiplicand to the current value of the
partial product.
 Hand over the partial product to the next stage
 Value of the partial product at the start stage is 0.
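A minimal sketch of these rules in Python (integers stand in for the registers; the function name is illustrative):

def unsigned_multiply(multiplicand, multiplier, n=4):
    """Multiply two n-bit unsigned integers; the product fits in 2n bits."""
    partial_product = 0                            # partial product starts at 0
    for i in range(n):
        if (multiplier >> i) & 1:                  # ith multiplier bit is 1
            partial_product += multiplicand << i   # add the shifted multiplicand
    return partial_product

print(unsigned_multiply(0b1101, 0b1011))   # 13 * 11 = 143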
Combinational array multiplier

[Figure: 4×4 combinational array multiplier. The multiplicand m3..m0, gated by multiplier bit q0, forms the initial partial product PP0; each subsequent row of adders adds the multiplicand, gated by q1, q2, q3, to the incoming partial product, displaced one position.]

• The product is p7, p6, ..., p0.
• The multiplicand is shifted by displacing it through the array of adders.

Multiplication of unsigned numbers
Typical multiplication cell

[Figure: typical multiplication cell, a full adder whose inputs are a bit of the incoming partial product PPi, the AND of the jth multiplicand bit with the ith multiplier bit, and a carry-in; its outputs are a bit of the outgoing partial product PP(i+1) and a carry-out.]

Combinational array multiplier
(contd..)
• Combinational array multipliers are:
• Extremely inefficient.
• Have a high gate count for multiplying numbers of practical
size such as 32-bit or 64-bit numbers.
• Perform only one function, namely, unsigned integer
product.
• Sequential techniques are more efficient and need less
combinational logic.
Signed magnitude Multiplication
• Multiplication of two signed magnitude numbers is done
by successive shift and add operations.
• The process checks the LSB of the multiplier first.
• If it is 1, the multiplicand is copied down, otherwise 0s are
copied down.
• Then subsequent bits of the multiplier are checked and
either the multiplicand or 0s are copied by shifting one
position to the left from MSB of the last line of bits.
• Finally, the numbers are added and their sum gives the
product.
• The sign of their product is determined from the signs of the
multiplicand and the multiplier.
• If the signs are same, the sign of the product is positive, otherwise negative.
Sequential Multiplication
• The previous process is slightly modified when implemented
in a digital computer.
• First, instead of adding all the lines at the end, the adder adds
two lines of multiplicand bits and accumulate the partial
product in a register.
• Second, instead of shifting the multiplicand to the left, the
partial product is shifted to the right. Because, adding a left-
shifted multiplicand to an unshifted partial product is
equivalent to adding an unshifted multiplicand to a right-
shifted partial product.
• Third, when corresponding bit of the multiplier is 0, there is
no need to add all zeros to the partial product since it will not
alter the value.
Hardware Implementation

• Its hardware requirement is the same as that of the adder/subtractor,


with the inclusion of two more registers.
• Q and Qs: store the multiplier and the sign of the multiplier, respectively.
• B and Bs: store the multiplicand and the sign of the multiplicand,
respectively.
Hardware Implementation
• SC (sequence counter): stores the number of bits in the
multiplier. It is decremented by 1 after every step, and the
process continues until SC becomes zero.
• EA register gets the partial product after sum of A and B. E
stores the carry bit of the addition and A accumulates the
sum. Initially E and A store zero.
• Both partial product and multiplier are shifted to the right.
• The LSB of A is shifted into the most significant position of Q,
the bit from E is shifted into the most significant position of
A; 0 is shifted to E.
• After the shift, one bit of the partial product is shifted into
Q, pushing the multiplier bit one position to the right.
• By this process the rightmost flip-flop in register Q (Qn) will
hold the bit of the multiplier which will be inspected next.
Hardware Algorithm
1. First the signs of the multiplicand (Bs)
and the multiplier (Qs) are compared
and both A and Q are set to correspond
to the sign of the product.
2. A and E are cleared and SC is initialized.
3. If the low-order bit Qn = 1, the
multiplicand B is added to the partial
product A (EA = A + B).
4. The register combination EAQ is then
shifted one position to the right to form
the new partial product.
5. SC is decremented by 1.
6. If SC = 0, the process is stopped, else
the process is repeated from step 3 and
new partial product is formed.
7. The final product is available in A and Q
together (AQ).
Example
• Multiplicand B = 0111, Multiplier Q = 0101 (the leading bit is the sign;
the magnitudes 111 and 101 are used in the trace, so SC = 011)

  OPERATION                 E   A     Q     SC
  Initial values            0   000   101   011
  Qn=1; Add B                   111
  First partial product     0   111
  Shift right EAQ           0   011   110   010
  Qn=0; Shift right EAQ     0   001   111   001
  Qn=1; Add B                   111
  Second partial product    1   000
  Shift right EAQ           0   100   011   000

• Final Product in AQ = 100011
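A minimal Python sketch of the sequential algorithm above, modelling E, A, Q and SC as integers (register widths and the function name are assumptions for illustration):

def sequential_multiply(B, Q, n):
    """Multiply two n-bit unsigned magnitudes; the 2n-bit product ends up in AQ."""
    E, A, SC = 0, 0, n
    while SC != 0:
        if Q & 1:                       # Qn = 1: add the multiplicand to A
            A += B
            E = (A >> n) & 1            # carry out of the n-bit addition
            A &= (1 << n) - 1
        # shift EAQ right: E -> A(msb), A(lsb) -> Q(msb), 0 -> E
        Q = (Q >> 1) | ((A & 1) << (n - 1))
        A = (A >> 1) | (E << (n - 1))
        E = 0
        SC -= 1
    return (A << n) | Q                 # product assembled from A and Q

print(bin(sequential_multiply(0b111, 0b101, 3)))   # 7 * 5 -> 0b100011 (35)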
Signed Multiplication
• Considering 2’s-complement signed operands, what will happen
to (-13)(+11) if following the same method of unsigned
multiplication?
            1 0 0 1 1     ( - 13)
          × 0 1 0 1 1     ( + 11)
  -------------------
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 0 1 1 1 0 0 0 1     ( - 143)

Sign extension of the negative multiplicand (the repeated high-order 1s
in each partial product) is required for the result to be correct.

Signed Multiplication

• For a negative multiplier, a straightforward solution is to


form the 2’s-complement of both the multiplier and the
multiplicand and proceed as in the case of a positive
multiplier.
• This is possible because complementation of both operands
does not change the value or the sign of the product.
• A technique that works equally well for both negative and
positive multipliers – Booth algorithm.
Surprise Test (13.05.22)

1. Multiply each of the following pairs of signed 2’s


complement numbers using Booth algorithm. In
each case assume that A is the multiplicand and B
is the multiplier
a) A = 010111 and B = 110110
b) A = 110011 and B = 101100
c) A = 110101 and B = 011011
d) A = 001111 and B = 001111
2. What is the advantage of Booth recoding over
normal binary coding for multiplication of binary
numbers? Give examples of best case and worst
case Booth recoding.
Booth Multiplication Algorithm
• This algorithm can be used for multiplication for both
negative and positive multipliers.
• It generates a 2n-bit product and treats both positive and
negative 2’s-complement n-bit operands uniformly.
• The mechanism of operation is that strings of 0s in the
multiplier require no addition, only shifting.
• A string of 1s in the multiplier from bit weight 2^k down to 2^m can
be treated as 2^(k+1) – 2^m.
• Example: the binary number 001110 (+14) has a string of 1s
from 2^3 to 2^1 (k=3, m=1). So the number can be
represented as 2^(k+1) – 2^m = 2^4 – 2^1 = 14.
• So the multiplication M x 14, where M is the multiplicand
and 14 is the multiplier, can be done as M x 2^4 – M x 2^1.
• Thus the product can be obtained by shifting M four positions
to the left and subtracting M shifted left by one position.
Booth Multiplication Algorithm (contd…)
• Like all other multiplication schemes, Booth algorithm also
shifts the partial product (PP) after checking the multiplier
bits.
• Prior to the shifting, the multiplicand may be added to the
PP, subtracted from the PP, or left unchanged according to
the following rules:
1. The multiplicand is subtracted from the PP upon
encountering the first least significant 1 in a string of 1s in
the multiplier (assuming a dummy 0 before the 1).
2. The multiplicand is added to the PP upon encountering
the first 0 (provided that there was a previous 1) in a string
of 0s in the multiplier.
3. The PP does not change when the multiplier bit is identical
to the previous multiplier bit.
Booth Multiplication Algorithm (contd…)

• The algorithm works for positive or negative multipliers in 2’s


complement representation.
• The reason is a negative number ends with a string of 1s and
the last operation will be a subtraction of the appropriate
weight.
• Example: the multiplier -14 can be represented in 2’s
complement as 110010 and is treated as -2^5 + 2^4 + 2^1 = -14
Booth Recoding Example
• In the Booth scheme the multiplier is scanned from
right to left with a dummy 0:
• -1 times the shifted multiplicand is selected when moving
from 0 to 1,
• +1 times the shifted multiplicand is selected when moving
from 1 to 0,
• 0 times the shifted multiplicand is selected when moving
from 0 to 0 or 1 to 1.
  Multiplier (with a dummy 0 appended at the right):
   0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0  (0)
  Recoded:
   0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Booth recoding of a multiplier.


Booth Recoding Table

Multiplier
Version of multiplicand(M)
selected by bit
Bit i Bit i -1

0 0 0 XM
0 1 +1 XM
1 0 1 XM
1 1 0 XM
Booth Multiplication Example

     0 1 1 0 1              (+13)
   × 1 1 0 1 0              ( -6)
  Recoded multiplier:  0 -1 +1 -1  0
  -------------------
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 1 0 1 1 0 0 1 0       (-78)
Best case vs Worst case of Booth
Recoding
• Best case – a long string of 1’s (very few nonzero recoded digits)
• Worst case – 0’s and 1’s are alternating (every recoded digit is nonzero)

  Worst-case multiplier:
   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
  +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

  Ordinary multiplier:
   1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
   0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0

  Good multiplier:
   0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
   0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
Hardware Implementation of Booth
Algorithm

 The multiplicand is stored in BR and multiplier in QR including


their sign bits.
 An extra flip-flop Qn+1 is appended to QR which is cleared to 0.
 The sequence counter SC is initialized to the number of bits in
the multiplier.
Flow Chart for Booth Algorithm

[Figure: flowchart of the Booth multiplication algorithm (add or subtract BR to/from the partial product depending on the Qn Qn+1 bit pair, then arithmetic shift right and decrement SC).]

Example: Multiplier (QR) = 10011 (= -13), Multiplicand (BR) = 10111 (= -9), size of multiplier = 5.
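A minimal Python sketch of the Booth algorithm as described by the rules above; it keeps the partial product as a signed Python integer rather than in hardware registers (the function names are illustrative):

def booth_multiply(multiplicand, multiplier, n):
    """Return the 2n-bit 2's-complement product of two n-bit operands."""
    def to_signed(v, bits):                # interpret a bit pattern as 2's complement
        return v - (1 << bits) if v & (1 << (bits - 1)) else v

    M = to_signed(multiplicand, n)
    product = 0
    prev_bit = 0                           # the dummy 0 to the right of the LSB
    for i in range(n):
        bit = (multiplier >> i) & 1
        if bit == 1 and prev_bit == 0:     # first 1 of a string: subtract M*2^i
            product -= M << i
        elif bit == 0 and prev_bit == 1:   # first 0 after a string of 1s: add M*2^i
            product += M << i
        prev_bit = bit                     # 00 or 11: leave the partial product alone
    return product & ((1 << (2 * n)) - 1)  # 2n-bit 2's-complement pattern

# (-9) x (-13) = 117, using the 5-bit operands from the slide example
print(booth_multiply(0b10111, 0b10011, 5))        # -> 117
print(bin(booth_multiply(0b01101, 0b11010, 5)))   # (+13) x (-6) -> 0b1110110010 (-78)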
Integer Division
Manual Division

• Decimal: 274 ÷ 13 → quotient 21, remainder 1 (13 goes into 27 twice,
  27 - 26 = 1; bring down the 4 to get 14; 13 goes into 14 once, 14 - 13 = 1).
• Binary: 100010010 ÷ 1101 → quotient 10101, remainder 1 (successive trial
  subtractions of 1101 from 10001, 10000 and 1110).

Longhand division examples.

Longhand Division Steps

• Position the divisor appropriately with respect to


the dividend and perform a subtraction.
• If the remainder is zero or positive, a quotient bit of
1 is determined, the remainder is extended by
another bit of the dividend, the divisor is
repositioned, and another subtraction is performed.
• If the remainder is negative, a quotient bit of 0 is
determined, the dividend is restored by adding back
the divisor, and the divisor is repositioned for
another subtraction.
Hardware Components

[Figure: division hardware. An (n+1)-bit register A and the n-bit dividend/quotient register Q shift left together; an (n+1)-bit adder/subtractor combines A with the divisor register M, the quotient bit is set into q0, and a control sequencer drives the add/subtract, shift and quotient-setting steps.]
Restoring Division
1. Initialize the registers: Q=dividend, M=divisor, A=0, and count=n,
where n = number of bits of the dividend; A and M are taken as
(n+1)-bit values (a 0 is appended at the high end).
2. Repeat steps (3-7) while (count !=0)
3. Shift A and Q left one binary position
4. Subtract M from A, and place the answer back in A
5. If the sign of A is 1, set q0 to 0 and add M back to A
(restore A);
6. else set q0 to 1
7. count=count-1
8. Register Q gives the quotient and A gives the
remainder.
Restoring Division Example
Dividend = 1000 (8), Divisor M = 0 0 0 1 1 (3)

  Initially              A = 0 0 0 0 0    Q = 1 0 0 0

  First cycle:   Shift          A = 0 0 0 0 1    Q = 0 0 0 _
                 Subtract M     A = 1 1 1 1 0    (negative)
                 Set q0 = 0     Q = 0 0 0 0
                 Restore        A = 0 0 0 0 1

  Second cycle:  Shift          A = 0 0 0 1 0    Q = 0 0 0 _
                 Subtract M     A = 1 1 1 1 1    (negative)
                 Set q0 = 0     Q = 0 0 0 0
                 Restore        A = 0 0 0 1 0

  Third cycle:   Shift          A = 0 0 1 0 0    Q = 0 0 0 _
                 Subtract M     A = 0 0 0 0 1    (positive)
                 Set q0 = 1     Q = 0 0 0 1

  Fourth cycle:  Shift          A = 0 0 0 1 0    Q = 0 0 1 _
                 Subtract M     A = 1 1 1 1 1    (negative)
                 Set q0 = 0     Q = 0 0 1 0
                 Restore        A = 0 0 0 1 0

  Remainder (A) = 00010 = 2,  Quotient (Q) = 0010 = 2
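A minimal Python sketch of restoring division following the algorithm and example above (A is kept as a signed Python integer instead of an (n+1)-bit register):

def restoring_divide(dividend, divisor, n):
    """Return (quotient, remainder) for n-bit unsigned dividend and divisor."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # shift A and Q left one position; the top bit of Q moves into A
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A -= M                      # trial subtraction
        if A < 0:                   # sign of A is 1: restore A, set q0 = 0
            A += M
        else:                       # sign of A is 0: set q0 = 1
            Q |= 1
    return Q, A                     # Q holds the quotient, A the remainder

print(restoring_divide(8, 3, 4))   # -> (2, 2)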
Nonrestoring Division
• It avoids restoring A after an unsuccessful subtraction.
1. Initialize the registers: Q=dividend, M=divisor, A=0, and count=n,
where n = number of bits of the dividend; A and M are taken as
(n+1)-bit values.
2. Repeat steps (a-d) while (count != 0):
   a. Shift A and Q left one binary position.
   b. If the sign of A is 0, subtract M from A; else add M to A.
   c. If the sign of A is 0, set q0 to 1; else set q0 to 0.
   d. count = count - 1
3. If the sign of A is 1, add M to A (a final correction of the remainder).
4. Q gives the quotient and A gives the remainder.
Nonrestoring Division Example
Dividend = 1000 (8), Divisor M = 0 0 0 1 1 (3)

  Initially              A = 0 0 0 0 0    Q = 1 0 0 0

  First cycle:   Shift          A = 0 0 0 0 1    Q = 0 0 0 _
                 Subtract M     A = 1 1 1 1 0    (negative)
                 Set q0 = 0     Q = 0 0 0 0

  Second cycle:  Shift          A = 1 1 1 0 0    Q = 0 0 0 _
                 Add M          A = 1 1 1 1 1    (negative)
                 Set q0 = 0     Q = 0 0 0 0

  Third cycle:   Shift          A = 1 1 1 1 0    Q = 0 0 0 _
                 Add M          A = 0 0 0 0 1    (positive)
                 Set q0 = 1     Q = 0 0 0 1

  Fourth cycle:  Shift          A = 0 0 0 1 0    Q = 0 0 1 _
                 Subtract M     A = 1 1 1 1 1    (negative)
                 Set q0 = 0     Q = 0 0 1 0

  A is negative, so add M to restore the remainder: A = 1 1 1 1 1 + 0 0 0 1 1 = 0 0 0 1 0

  Remainder (A) = 00010 = 2,  Quotient (Q) = 0010 = 2
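A corresponding Python sketch of nonrestoring division under the same assumptions (A as a signed integer; function name illustrative):

def nonrestoring_divide(dividend, divisor, n):
    """Return (quotient, remainder) for n-bit unsigned dividend and divisor."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # shift A and Q left one position; the top bit of Q moves into A
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        if A >= 0:                  # sign of A is 0: subtract M
            A -= M
        else:                       # sign of A is 1: add M
            A += M
        if A >= 0:                  # quotient bit comes from the new sign of A
            Q |= 1
    if A < 0:                       # final correction of the remainder
        A += M
    return Q, A

print(nonrestoring_divide(8, 3, 4))   # -> (2, 2)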
Floating-Point Numbers
and
Operations
Fixed Point vs Scientific notation
• In fixed point notation either the binary point is to the
immediate right or it is to the immediate left.

  b31 b30 b29 .................... b1 b0 .     (implicit binary point at the right)

  . b31 b30 b29 .................... b1 b0     (implicit binary point at the left)

• Fixed point representation suffers from the drawback that it can


represent only a finite (and quite small) range of numbers.

  V(b) = b31·2^31 + b30·2^30 + b29·2^29 + .... + b1·2^1 + b0·2^0       (integer interpretation)

  V(b) = b31·2^-1 + b30·2^-2 + b29·2^-3 + .... + b1·2^-31 + b0·2^-32   (fraction interpretation)

  0 ≤ V(b) ≤ 1 - 2^-32 ≈ 0.9999999998



Fixed Point vs Scientific notation (contd…)
• A more convenient representation is the scientific
representation, where the numbers are represented in the
form:
  x = ± m1.m2m3m4 × b^e
• This helps to represent a very large number and very small
numbers by changing the position of the binary point
• Since the binary point is said to float, therefore the
numbers are called floating point numbers.
• Components of these numbers are:
Mantissa (m), implied base (b), and exponent (e)

• Significant digits: the digits that appear after the binary point in


the mantissa. The following example has 7 significant digits:

  x = ± 0.m1m2m3m4m5m6m7 × b^e
Sign and exponent digits
• In a 32-bit number, suppose we allocate 24 bits to represent a
fractional mantissa.
• Assume that the mantissa is represented in sign and magnitude
format, and we have allocated one bit to represent the sign.
• We allocate 7 bits to represent the exponent, and assume that
the exponent is represented as a 2’s complement integer.
• There are no bits allocated to represent the base, we assume
that the base is implied for now, that is the base is 2.
• Since a 7-bit 2’s complement number can represent values in
the range -64 to 63, the range of numbers that can be
represented is:
  0.0000001 x 2^-64 <= | x | <= 0.9999999 x 2^63

• In decimal representation this range is:


  0.5421 x 10^-20 <= | x | <= 9.2237 x 10^18
A sample representation

  | Sign bit | Exponent | Fractional mantissa |
       1          7               24            (bits)

•24-bit mantissa with an implied binary point to the immediate left
•7-bit exponent in 2’s complement form, and implied base is 2.
Normalization
• Consider the number:
  x = 0.0004056781 x 10^12
• If the number is to be represented using only 7 significant
mantissa digits, the representation ignoring rounding is:
  x = 0.0004056 x 10^12
• If the number is shifted so that as many significant digits as
possible are brought into the 7 available slots:
  x = 0.4056781 x 10^9

• Exponent of x was decreased by 1 for every left shift of x.


• A number which is brought into a form so that all of the
available mantissa digits are optimally used, is called a
normalized number.
• Same methodology holds in the case of binary mantissas
  .0001101000(10110) x 2^8 = .1101000101(10) x 2^5
Normalization, overflow and underflow
The procedure for normalizing a floating point number is:
Do (until MSB of mantissa = = 1)
Shift the mantissa left (or right)
Decrement (increment) the exponent by 1
end do
Applying the normalization procedure to:  .000111001110....0010 x 2^-62
gives:                                    .111001110........     x 2^-65

But we cannot represent an exponent of -65; in trying to normalize


the number we have underflowed our representation.

Applying the normalization procedure to:  1.00111000............ x 2^63
gives:                                    0.100111.............. x 2^64

This overflows the representation.
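A minimal sketch of this normalization loop for a binary fractional mantissa, using a Python float for the mantissa and a 7-bit exponent range of -64..63 (assumptions matching the sample representation above; function name illustrative):

def normalize(mantissa, exponent, emin=-64, emax=63):
    """mantissa is a fraction 0 <= m < 1; returns (mantissa, exponent, status)."""
    if mantissa == 0.0:
        return 0.0, 0, "zero"
    while mantissa < 0.5:             # MSB of the fraction is not yet 1
        mantissa *= 2.0               # shift the mantissa left ...
        exponent -= 1                 # ... and decrement the exponent
    while mantissa >= 1.0:            # point too far right: shift right instead
        mantissa /= 2.0
        exponent += 1
    if exponent < emin:
        return mantissa, exponent, "underflow"
    if exponent > emax:
        return mantissa, exponent, "overflow"
    return mantissa, exponent, "ok"

# .000111 (binary) x 2^-62 -> .111 x 2^-65, which underflows a 7-bit exponent
print(normalize(0.109375, -62))   # -> (0.875, -65, 'underflow')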
Excess notation
• Rather than representing an exponent in 2’s complement
form, it turns out to be more beneficial to represent the
exponent in excess notation.
• If 7 bits are allocated to the exponent, exponents can be
represented in the range of -64 to +63, that is: -64 <= e <= 63
• The exponent can also be represented using a coding
called excess-64: E’ = Etrue + 64

• In general, excess-p coding is represented as: E’ = Etrue + p


• If Etrue= -64, the Excess code E’= 0
• If Etrue =0, the Excess code E’= 64
• If Etrue =63 the Excess code E’= 127
• This enables efficient comparison of the relative sizes of two
floating point numbers.
IEEE notation
• IEEE Floating Point notation is the standard representation in use.
There are two representations:
• Single precision.
• Double precision.
• Both have an implied base of 2.
• Single precision:
• 32 bits (23-bit mantissa, 8-bit exponent in excess-127
representation)
• Double precision:
• 64 bits (52-bit mantissa, 11-bit exponent in excess-1023
representation)
• Fractional mantissa, with an implied binary point at immediate left.

  | Sign | Exponent | Mantissa |
     1     8 or 11    23 or 52   (bits)

Single precision (uses one word of 32 bits)
[Figure: single-precision format with a 1-bit sign, an 8-bit excess-127 exponent E’, and a 23-bit mantissa field M.]
Single precision (contd…)

• This is called excess-127 format.


• E’ is in the range 0 ≤E’≤255.
• The end values of this range, 0 and 255, are used to represent
special values.
• The range of E’ for normal values is 1 ≤ E’ ≤ 254.
• So the actual exponent E (= E’-127) is in the range of -126 ≤ E ≤ 127
• As the most significant bit of the mantissa is always 1 in binary, the
M field represent the fractional part of the mantissa i.e. the bits
right of the binary point.
• 32 bit representation is called single precision representation
because it occupies a single 32-bit word.
• The scale factor has a range of 2^-126 to 2^+127, which is approximately
equal to 10^±38.
• The 24-bit mantissa provides a precision approximately the same as a
7-digit decimal value.
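A minimal sketch of taking a single-precision bit pattern apart into its fields and rebuilding the value 1.M x 2^(E'-127); it handles normal numbers only and ignores the special E' values 0 and 255:

import struct

def decode_single(bits):
    """Decode a 32-bit IEEE single-precision pattern (normal numbers only)."""
    sign = (bits >> 31) & 0x1
    e_biased = (bits >> 23) & 0xFF          # E', in excess-127
    mantissa = bits & 0x7FFFFF              # 23-bit fraction field M
    exponent = e_biased - 127               # true exponent E
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** exponent
    return sign, e_biased, exponent, value

pattern = struct.unpack(">I", struct.pack(">f", -6.5))[0]
print(hex(pattern), decode_single(pattern))   # -6.5 = -1.625 x 2^2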
Double precision (uses 2 words of 32bits)
Double precision (contd…)

• 64 bit representation is called double precision representation


because it occupies two words of 32-bits each.
• E’ is in the range 0 ≤ E’ ≤ 2047.
• The end values of this range, 0 and 2047, are used to represent
special values.
• The range of E’ for normal values is 1 ≤ E’ ≤ 2046.
• So the actual exponent E (=E’-1023) is in the range of -1022 ≤ E ≤
1023.
• The scale factor has a range of 2^-1022 to 2^+1023, which is
approximately equal to 10^±308.
• The 52 bit mantissa provides approximately same as a 16-digit
decimal value.
Floating point arithmetic
Addition:
  3.1415 x 10^8 + 1.19 x 10^6 = 3.1415 x 10^8 + 0.0119 x 10^8 = 3.1534 x 10^8

Multiplication:
  3.1415 x 10^8 x 1.19 x 10^6 = (3.1415 x 1.19) x 10^(8+6)

Division:
  3.1415 x 10^8 / 1.19 x 10^6 = (3.1415 / 1.19) x 10^(8-6)

Biased exponent problem:

If a true exponent e is represented in excess-p notation, that is
as e + p, then consider what happens under multiplication:

  (a x 10^(x+p)) * (b x 10^(y+p)) = (a·b) x 10^(x+p+y+p) = (a·b) x 10^(x+y+2p)

Representing the result in excess-p notation requires an exponent of
x + y + p; instead it is x + y + 2p.
Biases must therefore be handled explicitly in floating point arithmetic.
Floating point arithmetic: ADD/SUB rule
• Choose the number with the smaller exponent.
• Shift its mantissa right until the exponents of both the
numbers are equal.
• Add or subtract the mantissas.
• Determine the sign of the result.
• Normalize the result if necessary and truncate/round to the
number of mantissa bits.
Floating point arithmetic: MUL rule
• Add the exponents.
• Subtract the bias.
• Multiply the mantissas and determine the sign of the result.
• Normalize the result (if necessary).
• Truncate/round the mantissa of the result.
Floating point arithmetic: DIV rule
• Subtract the exponents
• Add the bias.
• Divide the mantissas and determine the sign of the result.
• Normalize the result if necessary.
• Truncate/round the mantissa of the result.

Note: Multiplication and division do not require alignment of the


mantissas the way addition and subtraction do.
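A minimal sketch of the ADD/SUB rule using decimal (mantissa, exponent) pairs so the alignment and normalization steps stay visible; the mantissa convention 1 <= |m| < 10 and the function name are assumptions, and rounding is far simpler than in a real IEEE implementation:

def fp_add(m1, e1, m2, e2, digits=5):
    """Add two scientific-notation numbers m x 10^e; returns (m, e)."""
    if e1 < e2:                               # operand with the smaller exponent ...
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 /= 10 ** (e1 - e2)                     # ... has its mantissa shifted right
    m, e = m1 + m2, e1                        # add the aligned mantissas
    while abs(m) >= 10:                       # normalize the result
        m /= 10
        e += 1
    while m != 0 and abs(m) < 1:
        m *= 10
        e -= 1
    return round(m, digits - 1), e            # truncate/round to the mantissa size

print(fp_add(3.1415, 8, 1.19, 6))   # -> (3.1534, 8)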
