Divider Implementation - 2

1) The document describes an algorithm for integer division that works similarly to long division. It breaks the dividend into bits and subtracts the divisor repeatedly to obtain the quotient bits. 2) It then discusses hardware implementation considerations for restoring dividers, which keep track of the actual residue value at each step of division. 3) Special attention is paid to subtracting unsigned integers, as the divider relies on subtraction results. The document outlines how to determine the sign of subtraction results and properly represent zero differences using two's complement arithmetic.
ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY

Digital Library: Arithmetic Cores RECRLAB@OU

Divider Implementation
ALGORITHM
• The division of two unsigned integers $A/B$ (where $A$ is the dividend and $B$ the divisor) results in a quotient $Q$ and a remainder $R$. These quantities are related by $A = B \times Q + R$.
For the implementation, we follow the hand-division method. We take the bits of $A$ one at a time, starting from the most significant bit, shift each one into a running residue $R$, and compare $R$ with the divisor. If $R$ is greater than or equal to $B$, we subtract $B$ from it. Each iteration yields one bit of $Q$. Fig. 1 shows the algorithm as well as an example: A = 10001100; B = 1001.
ALGORITHM
    R = 0
    for i = n-1 downto 0
        left shift R (input = a_i)
        if R ≥ B
            q_i = 1, R ← R − B
        else
            q_i = 0
        end
    end

EXAMPLE
              00001111   Q
    B 1001 )  10001100   A
               1001
               10001
                1001
                10000
                 1001
                  1110
                  1001
                   101   R
Figure 1. Division Algorithm
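The loop in Fig. 1 can be sketched in software as follows. This is a minimal illustration, not the hardware description; the function name `shift_subtract_divide` and the explicit bit-width argument `n` are assumptions made for the example.

```python
def shift_subtract_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Divide the n-bit unsigned integer a by b; return (Q, R)."""
    if b == 0:
        raise ZeroDivisionError("B must be nonzero")
    q = 0
    r = 0
    for i in range(n - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)   # left shift R, shifting in bit a_i
        if r >= b:
            q |= 1 << i                 # q_i = 1
            r -= b                      # R <- R - B
        # else: q_i = 0 and R is left unchanged
    return q, r


q, r = shift_subtract_divide(0b10001100, 0b1001, 8)
print(f"Q = {q:08b}, R = {r:b}")  # Q = 00001111, R = 101
```

The printed values match the worked example of Fig. 1 (140 / 9 = 15 with remainder 5).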
For hardware implementation, we consider restoring dividers (i.e., those that keep the actual residue value at every step).

SUBTRACTION OF UNSIGNED NUMBERS REPRESENTED WITH 𝑛 BITS: 𝑇 = 𝑅 − 𝐵


• This point deserves special attention, as the divider hardware relies on a result obtained here.
• We usually determine the sign of the subtraction by sign-extending $R$ and $B$ so that they are in 2’s complement representation with $n + 1$ bits. Then, we compute $T = R + \mathrm{not}(B) + 1$, where $T = t_n t_{n-1} t_{n-2} \ldots t_0$, and $t_n$ determines the sign of the subtraction result.
However, when $R$ and $B$ are unsigned, we can compute $\mathrm{not}(B)$ without sign-extending $B$. We then analyze $c_n = c_{out}$:
- If $c_n = 1$ → $R \geq B$ (and $R - B$ is equal to $t_{n-1} t_{n-2} \ldots t_0$, i.e., it is an unsigned number with $n$ bits)
- If $c_n = 0$ → $R < B$ (here $R - B$ is NOT equal to $t_{n-1} t_{n-2} \ldots t_0$)
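As a small illustration of the carry-out rule, the sketch below evaluates the $n$-bit sum $R + \mathrm{not}(B) + 1$ in software. The helper name `sub_with_carry` is hypothetical, and Python integers stand in for the $n$-bit words.

```python
def sub_with_carry(r: int, b: int, n: int) -> tuple[int, int]:
    """Return (c_n, t) for the n-bit sum R + not(B) + 1."""
    mask = (1 << n) - 1
    total = (r & mask) + (~b & mask) + 1   # not(B) restricted to n bits, cin = 1
    c_n = total >> n                        # carry out of the n-bit adder
    t = total & mask                        # t_{n-1} ... t_0
    return c_n, t


print(sub_with_carry(0b1100, 0b1001, 4))  # (1, 3)
print(sub_with_carry(0b0101, 0b1001, 4))  # (0, 12)
```

In the first call $R \geq B$, so the carry is 1 and the low bits hold $12 - 9 = 3$. In the second call $R < B$, so the carry is 0 and the low bits (12) are not the value of $R - B$.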

NOTE ABOUT THE 2’S COMPLEMENT OF ZERO


• Let $A$ be a number in 2’s complement with $n$ bits: $A = a_{n-1} a_{n-2} \ldots a_0$, where $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$ is the signed decimal value of $A$.
• The 2’s complement of $A$ is given by $P = \mathrm{not}(A) + 1$, with $P = p_{n-1} p_{n-2} \ldots p_0$.
If $P$ and $A$ are thought of as $n$-bit unsigned numbers, i.e., $A = \sum_{i=0}^{n-1} a_i 2^i$, $P = \sum_{i=0}^{n-1} p_i 2^i$, then $P = 2^n - A$.

• What if $A = 0$? Here $P = 2^n$ requires $n + 1$ bits. Why is $P$ not zero? This is actually consistent with 2’s complement arithmetic: in the operation $Q - A = Q + \mathrm{not}(A) + 1$, we let $c_{in}$ hold the value of 1, so that if $A = 0$, then $\mathrm{not}(A) = 11 \ldots 11$ and $c_{in} = 1$. This way, $\mathrm{not}(A) + 1$ is properly represented. Fig. 2 shows this operation. Note that with $c_{in} = 1$, all carries (from $c_0$ to $c_n$) are one. The result of the operation is then $Q$. There is no overflow, as $overflow = c_n \oplus c_{n-1} = 0$. Thus, the case $A = 0$ works very well for 2’s complement operations if we let $c_{in}$ carry the value of 1.

    c_n = 1    c_{n-1} = 1    ...    c_in = c_0 = 1
    Q:        q_{n-1} q_{n-2} q_{n-3} ... q_0   +
    not(A):   1       1       1       ... 1
              ---------------------------------
              q_{n-1} q_{n-2} q_{n-3} ... q_0
Figure 2. Q-A when A=0
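The behaviour in Fig. 2 can be checked with a bit-level ripple-carry model. The sketch below is only an illustration; `ripple_add` is a hypothetical helper that returns every carry so that the claim "all carries are one" can be inspected directly.

```python
def ripple_add(x: int, y: int, cin: int, n: int) -> tuple[int, list[int]]:
    """Add two n-bit words bit by bit; return (sum, [c_0, ..., c_n])."""
    carries = [cin]
    s = 0
    for i in range(n):
        xi, yi, ci = (x >> i) & 1, (y >> i) & 1, carries[-1]
        s |= (xi ^ yi ^ ci) << i                            # full-adder sum bit
        carries.append((xi & yi) | (xi & ci) | (yi & ci))   # full-adder carry
    return s, carries


n = 4
q = 0b0101
not_a = (1 << n) - 1                 # not(A) for A = 0: all ones
result, carries = ripple_add(q, not_a, 1, n)
overflow = carries[n] ^ carries[n - 1]
print(result == q, carries, overflow)  # True [1, 1, 1, 1, 1] 0
```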
COMPUTING 𝑹 − 𝑩 WITH 𝒏 bits
• $R = r_{n-1} r_{n-2} \ldots r_0$, $B = b_{n-1} b_{n-2} \ldots b_0$. With $R$, $B$ unsigned, we have $0 \leq R, B \leq 2^n - 1$.
• To do $R - B$, we sign-extend $R$ and $B$ to $n + 1$ bits, turning them into two numbers in 2’s complement representation. The sign-extension actually amounts to zero-extending. Then: $R = 0 r_{n-1} r_{n-2} \ldots r_0$, $B = 0 b_{n-1} b_{n-2} \ldots b_0$, i.e., $r_n = b_n = 0$. In 2’s complement, we have that $0 \leq R, B \leq 2^n - 1$. It follows that $-(2^n - 1) \leq R - B \leq 2^n - 1$. Thus $R - B$ can be represented in 2’s complement with $n + 1$ bits (as expected).
• Let $K = \mathrm{not}(B) + 1$, $K = k_n k_{n-1} k_{n-2} \ldots k_0$. In unsigned representation, $K = 2^{n+1} - B$.

Fig. 3 shows the operation $R - B$ by using $R + K$, where $K = \mathrm{not}(B) + 1$. Recall that we let the 1 be held by $c_{in}$. Note that if $B = 0$ → $K = 2^{n+1}$ (here $K$ is represented by the second operand together with $c_{in} = 1$).

Daniel Llamocca

                                                      1  c_in
    R:  0 r_{n-1} r_{n-2} ... r_0   -       R:  0 r_{n-1} r_{n-2} ... r_0   +
    B:  0 b_{n-1} b_{n-2} ... b_0           K:  1 k_{n-1} k_{n-2} ... k_0
Figure 3. Operation $R - B \equiv R + K = R + \mathrm{not}(B) + 1$

Now, we determine the value of $k_n$:

• Case $B \neq 0$: $1 \leq B \leq 2^n - 1$ → $2^{n+1} - (2^n - 1) \leq K \leq 2^{n+1} - 1$ ∴ $2^n + 1 \leq K \leq 2^{n+1} - 1$. Thus, $k_n = 1$.
• Case $B = 0$: $K = 2^{n+1}$. $K$ requires $n + 2$ bits, with $k_{n+1} = 1$ and $k_n = 0$:

    Case                 K               k_n k_{n-1} k_{n-2} ... k_0     k_n
    B ≠ 0 (or B > 0)     2^n             100...0                         1
                         2^n + 1         100...1
                         ...             ...
                         2^{n+1} - 1     111...1
    B = 0                2^{n+1}         1000...0  (n + 2 bits)          0
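A quick numerical check of the value of $k_n$ for a small word size is sketched below. The helper `k_and_kn` is hypothetical; it simply evaluates $K = 2^{n+1} - B$ and extracts bit $n$.

```python
def k_and_kn(b: int, n: int) -> tuple[int, int]:
    """Return (K, k_n), where K = 2^(n+1) - B and k_n is bit n of K."""
    k = (1 << (n + 1)) - b
    return k, (k >> n) & 1


n = 3
for b in range(1 << n):
    k, kn = k_and_kn(b, n)
    print(f"B = {b}: K = {k:b}, k_n = {kn}")
```

For $n = 3$, only the row $B = 0$ reports $k_n = 0$ (with $K = 10000$, which needs $n + 2 = 5$ bits); every other row reports $k_n = 1$.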

Now, we consider $R$, $B$, and $K$ to represent unsigned integers.

$$R - B \equiv R + K = \sum_{i=0}^{n} r_i 2^i + \sum_{i=0}^{n} k_i 2^i = \sum_{i=0}^{n-1} r_i 2^i + k_n 2^n + \sum_{i=0}^{n-1} k_i 2^i$$

$$R + K = R + 2^{n+1} - B = \sum_{i=0}^{n-1} r_i 2^i + 2^{n+1} - \sum_{i=0}^{n-1} b_i 2^i$$

• $R - B < 0$:
  Since $R \geq 0$ → $B > 0$ → $k_n = 1$.
  - $R + 2^{n+1} - B = \sum_{i=0}^{n-1} r_i 2^i + 2^{n+1} - \sum_{i=0}^{n-1} b_i 2^i < 2^{n+1}$
  - $R + K = \sum_{i=0}^{n-1} r_i 2^i + k_n 2^n + \sum_{i=0}^{n-1} k_i 2^i < 2^{n+1}$ → $\sum_{i=0}^{n-1} r_i 2^i + \sum_{i=0}^{n-1} k_i 2^i < 2^n$
    o The $(n + 1)$-bit sum (considering the operation as unsigned) of $R$ and $K$ is lower than $2^{n+1}$. Then, there is no overflow in the $(n + 1)$-bit unsigned sum. Thus $c_{n+1} = 0$.
    o The $n$-bit sum (considering the operation as unsigned) of $R$ and $k_{n-1} k_{n-2} \ldots k_0$ is lower than $2^n$. Thus, there is no overflow of the $n$-bit unsigned sum. Thus $c_n = 0$.

• $R - B \geq 0$:
  - $R + 2^{n+1} - B = \sum_{i=0}^{n-1} r_i 2^i + 2^{n+1} - \sum_{i=0}^{n-1} b_i 2^i \geq 2^{n+1}$
  - $R + K = \sum_{i=0}^{n-1} r_i 2^i + k_n 2^n + \sum_{i=0}^{n-1} k_i 2^i \geq 2^{n+1}$ → $\sum_{i=0}^{n-1} r_i 2^i + \sum_{i=0}^{n-1} k_i 2^i \geq 2^{n+1} - k_n 2^n$
    o The $(n + 1)$-bit sum (considering the operation as unsigned) of $R$ and $K$ is greater than or equal to $2^{n+1}$. Then, there is overflow of the $(n + 1)$-bit unsigned sum. Thus $c_{n+1} = 1$.
    o For the $n$-bit sum of $R$ and $k_{n-1} k_{n-2} \ldots k_0$, we have two cases:
      $B > 0$ → $k_n = 1$. Then $\sum_{i=0}^{n-1} r_i 2^i + \sum_{i=0}^{n-1} k_i 2^i \geq 2^{n+1} - 2^n$ → $\sum_{i=0}^{n-1} r_i 2^i + \sum_{i=0}^{n-1} k_i 2^i \geq 2^n$
      $B = 0$ → $k_n = 0$. Then $\sum_{i=0}^{n-1} r_i 2^i + \sum_{i=0}^{n-1} k_i 2^i \geq 2^{n+1}$

      In both cases, the $n$-bit sum (considering the operands as unsigned) of $R$ and $k_{n-1} k_{n-2} \ldots k_0$ is greater than or equal to $2^n$. So, there is overflow of the $n$-bit unsigned sum. Thus $c_n = 1$ when $R \geq B$.

 2’s complement operation 𝑅 − 𝐵 with 𝑛 + 1 bits: There is no overflow of the subtraction as 𝑐𝑛 = 𝑐𝑛−1 .
 For 𝑅 − 𝐵 ≥ 0: The result 𝑇 = 𝑅 − 𝐵 is a positive number, thus 𝑇𝑛 = 0. Therefore 𝑡𝑛−1 𝑡𝑛−2 … 𝑡0 contains 𝑅 − 𝐵 in unsigned
representation.

In conclusion:

 𝐼𝑓 𝑅 < 𝐵 → 𝑐𝑛 = 0. The 𝑛 bits 𝑇𝑛−1 𝑇𝑛−2 … 𝑇0 DO NOT contain the result 𝑅 − 𝐵.


 𝐼𝑓 𝑅 ≥ 𝐵 → 𝑐𝑛 = 1. The 𝑛 bits 𝑇𝑛−1 𝑇𝑛−2 … 𝑇0 DO represent 𝑅 − 𝐵 in unsigned representation.
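The conclusion can be confirmed exhaustively for small word sizes. The sketch below is an illustrative check, not part of the original derivation: the hypothetical function `check_carry_rule` compares the $n$-bit carry $c_n$, the $(n + 1)$-bit carry $c_{n+1}$ and the sign bit $t_n$ against the expected behaviour for every pair $(R, B)$.

```python
def check_carry_rule(n: int) -> bool:
    """Verify c_n, c_{n+1} and t_n for every pair of n-bit unsigned R, B."""
    mask_n = (1 << n) - 1
    mask_n1 = (1 << (n + 1)) - 1
    for r in range(1 << n):
        for b in range(1 << n):
            total_n = r + (~b & mask_n) + 1      # n-bit operation R + not(B) + 1
            c_n = total_n >> n
            total_n1 = r + (~b & mask_n1) + 1    # same operation on zero-extended operands
            c_n1 = total_n1 >> (n + 1)
            t = total_n1 & mask_n1
            t_n = t >> n                         # sign bit of the (n+1)-bit result
            if c_n != int(r >= b) or c_n1 != c_n or t_n != 1 - c_n:
                return False
            if r >= b and (t & mask_n) != r - b:
                return False
    return True


print(all(check_carry_rule(n) for n in range(1, 7)))  # True
```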


RESTORING ARRAY DIVIDER FOR UNSIGNED INTEGERS

• $A$, $B$: positive integers in unsigned representation. $A = a_{N-1} a_{N-2} \ldots a_0$ with $N$ bits, and $B = b_{M-1} b_{M-2} \ldots b_0$ with $M$ bits, with the condition that $N \geq M$. $Q$ = quotient, $R$ = residue. $A = B \times Q + R$.

In this parallel implementation, the result of every stage is called the remainder $R_i$.

Fig. 4 depicts the parallel algorithm with $N$ stages. For each stage $i$, $i = 0, \ldots, N - 1$, we have:
$R_i$: output of stage $i$. Remainder after every stage.
$Y_i$: input of stage $i$. It holds the minuend.

For the next stage, we append the next bit of $A$ to $R_i$. This becomes $Y_{i+1}$ (the minuend):
$Y_{i+1} = R_i \,\&\, a_{N-2-i}$, $i = 0, \ldots, N - 2$, with $Y_0 = a_{N-1}$.

At each stage $i$, the subtraction $Y_i - B$ is performed. If $Y_i \geq B$, then $R_i = Y_i - B$. If $Y_i < B$, then $R_i = Y_i$.

    Stage   Y_i                              Computation of R_i                                          # of R_i bits
    0       Y_0 = a_{N-1}                    R_0 = Y_0 - B if Y_0 ≥ B;  R_0 = Y_0 if Y_0 < B              1
    1       Y_1 = R_0 & a_{N-2}              R_1 = Y_1 - B if Y_1 ≥ B;  R_1 = Y_1 if Y_1 < B              2
    2       Y_2 = R_1 & a_{N-3}              R_2 = Y_2 - B if Y_2 ≥ B;  R_2 = Y_2 if Y_2 < B              3
    ...     ...                              ...                                                         ...
    M-1     Y_{M-1} = R_{M-2} & a_{N-M}      R_{M-1} = Y_{M-1} - B if Y_{M-1} ≥ B;  R_{M-1} = Y_{M-1} if Y_{M-1} < B    M

Since $B$ has $M$ bits, the operation $Y_i - B$ requires $M$ bits for both operands. To maintain consistency, we let $Y_i$ be represented with $M$ bits.

$R_i$: output of each stage. For the first $M$ stages, $R_i$ requires $i + 1$ bits. However, for consistency and clarity’s sake, since $R_i$ might be the result of a subtraction, we let $R_i$ use $M$ bits.

For stages $0$ to $M - 2$:
$R_i$ is always transferred onto the next stage. Note that we transfer only the $M - 1$ least significant bits of $R_i$. There is no loss of accuracy here, since $R_i$ requires at most $M - 1$ bits up to stage $M - 2$. We need $R_i$ with $M - 1$ bits since $Y_{i+1}$ uses $M$ bits.

Stages $M - 1$ to $N - 1$:
Starting from stage $M - 1$, $R_i$ requires $M$ bits. We also know that the remainder requires at most $M$ bits (its maximum value is $2^M - 2$). So, starting from stage $M - 1$, we need to transfer $M$ bits. As $Y_{i+1}$ now requires $M + 1$ bits, we need $M + 1$ units starting from stage $M$.

[Figure 4: chain of $N$ stages. Stage $i$ takes $Y_i$ and produces $R_i$, which feeds $Y_{i+1}$. Stages 0 to M-1 are M bits wide; stages M to N-1 are M+1 bits wide.]
Figure 4. Parallel implementation algorithm

• To implement the operation $Y_i - B$ we use a subtractor. When $Y_i \geq B$ → $cout_i = 1$, and when $Y_i < B$ → $cout_i = 0$. This $cout_i$ becomes a bit of the quotient: $Q_i = cout_{N-1-i}$. The quotient $Q$ requires at most $N$ bits.
• Also, the final remainder is the result of the last stage. The maximum theoretical value of the remainder is $2^M - 2$, thus the remainder $R$ requires $M$ bits. $R = R_{N-1}$.
• Also, note that we should avoid a division by 0. If $B = 0$, then, in our circuit: $Q = 2^N - 1$ and $R = a_{M-1} a_{M-2} \ldots a_0$.
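A behavioural sketch of the stage chain is given below. It is an illustration under stated assumptions rather than the circuit itself: the function name `array_divide` is hypothetical, Python integers stand in for the wires, and $R_i$ is truncated to $M$ bits after every stage to mimic the fixed register width described above.

```python
def array_divide(a: int, b: int, n: int, m: int) -> tuple[int, int]:
    """Model N stages of the restoring array divider; return (Q, R)."""
    mask_m = (1 << m) - 1
    q = 0
    r = 0                                          # nothing precedes stage 0, so Y_0 = a_{N-1}
    for i in range(n):
        y = (r << 1) | ((a >> (n - 1 - i)) & 1)    # Y_i: previous R with the next bit of A appended
        cout = 1 if y >= b else 0                  # carry out of the subtractor Y_i - B
        r = (y - b if cout else y) & mask_m        # R_i, kept to M bits
        q |= cout << (n - 1 - i)                   # cout_i becomes quotient bit Q_{N-1-i}
    return q, r


print(array_divide(0b10001100, 0b1001, 8, 4))  # (15, 5)
print(array_divide(0b10001100, 0, 8, 4))       # (255, 12)
```

The second call reproduces the division-by-zero behaviour stated above for $N = 8$, $M = 4$: $Q = 2^N - 1 = 255$ and $R$ equals the four least significant bits of $A$ (1100).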


COMBINATIONAL ARRAY DIVIDER

Fig. 5 shows the hardware of this array divider for N=8, M=4. Note that the first M=4 stages only require 4 units, while the remaining stages require 5 units. This is a fully combinational implementation.
• Each level computes $R_i$. It first computes $Y_i - B$. When $Y_i \geq B$ → $cout_i = 1$, and when $Y_i < B$ → $cout_i = 0$. This $cout_i$ is used to determine whether $R_i$ is $Y_i - B$ or $Y_i$.
• Each Processing Unit (PU) processes one bit of $Y_i - B$ and lets the corresponding bit of either $Y_i - B$ or $Y_i$ be transferred on to the next stage.
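One row of PUs can be modelled bit by bit as sketched below. Two details are assumptions, not facts taken from the text: that each PU inverts its $b$ input internally so that the full adder computes $x + \mathrm{not}(b) + c_{in}$, and that the row's carry-out drives the select line of every PU in that row. The names `pu` and `stage` are hypothetical.

```python
def pu(x: int, b: int, cin: int) -> tuple[int, int]:
    """One processing unit: full adder on x, not(b), cin. Returns (diff_bit, cout)."""
    nb = b ^ 1                                    # assumed internal inversion of b
    s = x ^ nb ^ cin
    cout = (x & nb) | (x & cin) | (nb & cin)
    return s, cout


def stage(y: int, b: int, width: int) -> tuple[int, int]:
    """One row of `width` PUs. Returns (cout_i, R_i)."""
    carry = 1                                     # carry-in of the rightmost PU is tied to 1
    diff = 0
    for j in range(width):
        bit, carry = pu((y >> j) & 1, (b >> j) & 1, carry)
        diff |= bit << j
    r = diff if carry else y                      # row carry-out selects Y_i - B or Y_i
    return carry, r


print(stage(0b10001, 0b1001, 5))  # (1, 8)
print(stage(0b00100, 0b1001, 5))  # (0, 4)
```

With `width = 5` and a 4-bit $B$, the most significant PU sees $b = 0$, which corresponds to zero-extending $B$ in the 5-unit rows.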

[Figure 5: array of Processing Units (PUs), one row per stage. Row $i$ has inputs $x_{ij}$ and $b_j$, carries $c_{ij}$ with the rightmost carry-in tied to 1, and outputs $y_{ij}$. Rows 0 to 3 have 4 PUs and produce q7 to q4; rows 4 to 7 have 5 PUs and produce q3 to q0. Bits a7 down to a0 enter one per row at the rightmost PU, and the last row outputs r3 r2 r1 r0. Inset: each PU contains a full adder (FA) and a 2-to-1 multiplexer. Block symbol: ARRAY DIVIDER with inputs A (N bits) and B (M bits), outputs Q (N bits) and R (M bits).]
Figure 5. Fully Combinatorial Array Divider architecture for N=8, M=4

FULLY PIPELINED ARRAY DIVIDER

Fig. 6 shows the hardware core of the fully pipelined array divider with its inputs, outputs, and parameters.
[Figure 6: ARRAY DIVIDER block with parameters N and M; inputs A (N bits), B (M bits), E, resetn, and clock; outputs Q (N bits), R (M bits), and v.]
Figure 6. Fully pipelined IP core for the array divider


Fig. 7 shows the internal architecture of this pipelined array divider for N=8, M=4. Note that the first M=4 stages only require
4 units, while the next stages require 5 units. Note that the enable input ‘E’ is only an input to the shift register on the left,
which is used to generate the valid output 𝑣. This way, valid outputs are readily signaled. If E=’1’, the output result is computed
in N cycles (and v=’1’ after N cycles).
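The N-cycle latency and the valid flag can be illustrated with a cycle-level model. This is a rough sketch under assumptions that the text does not confirm: one register after every stage, and operands carried along the pipeline together with the valid bit. The class name `PipelinedDivider` and its `clock` method are hypothetical.

```python
class PipelinedDivider:
    """Cycle-level model: one (a, b, q, r, v) register after each of the N stages."""

    def __init__(self, n: int, m: int):
        self.n, self.m = n, m
        self.regs = [(0, 0, 0, 0, 0)] * n

    def clock(self, a: int, b: int, e: int) -> tuple[int, int, int]:
        """Advance one clock cycle; return (Q, R, v) visible after the edge."""
        new = []
        prev = (a, b, 0, 0, e)                    # the inputs feed stage 0
        for i in range(self.n):
            pa, pb, pq, pr, pv = prev
            y = (pr << 1) | ((pa >> (self.n - 1 - i)) & 1)
            cout = 1 if y >= pb else 0
            r = (y - pb if cout else y) & ((1 << self.m) - 1)
            new.append((pa, pb, pq | (cout << (self.n - 1 - i)), r, pv))
            prev = self.regs[i]                   # stage i+1 reads the old contents of register i
        self.regs = new
        _, _, q, r, v = self.regs[-1]
        return q, r, v


div = PipelinedDivider(8, 4)
outs = [div.clock(0b10001100, 0b1001, 1)] + [div.clock(0, 0, 0) for _ in range(8)]
print(outs[7])                    # (15, 5, 1)
print([o[2] for o in outs])       # [0, 0, 0, 0, 0, 0, 0, 1, 0]
```

In this model the result of an operand pair presented with E=1 appears with v=1 on the N-th clock edge, and v is 0 on every other cycle; Q and R are only meaningful while v=1.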

[Figure 7: the PU array of Fig. 5 (four rows of 4 PUs followed by four rows of 5 PUs, rightmost carry-in tied to 1 in every row), with inputs E, b3 b2 b1 b0, and a7 to a0 at the top, and outputs v, q7 to q0, and r3 r2 r1 r0 at the bottom.]
Figure 7. Fully Pipelined Array Divider architecture for N=8, M=4


ITERATIVE RESTORING DIVIDER

Fig. 8 shows the iterative hardware architecture as well as the state machine. Here, $R_i$ is always held in register R. The subtractor computes $Y_i - B$. This requires $M + 1$ bits in the worst case.
• If $Y_i \geq B$, then $R_i = Y_i - B$. Here $Y_i$ is the minuend. $Y_i - B$ is loaded onto register R. Note that only $M$ bits are needed.
• If $Y_i < B$, then $R_i = Y_i$. Here only $Y_i$ is loaded onto register R. This is done by just shifting $a_{N-1}$ into register R.

Note that R requires $M$ bits since it holds the remainder at every stage. Also, since we always shift $cout_i$ onto register A, the quotient $Q$ is held at A in the last iteration.
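The register-level behaviour described above can be sketched as a simple loop. This is a behavioural illustration only: the function name `iterative_divide` is hypothetical, the state machine is collapsed into one loop iteration per shift/subtract step, and the cycle count ignores any load or done states.

```python
def iterative_divide(da: int, db: int, n: int, m: int) -> tuple[int, int, int]:
    """Return (Q, R, iterations) using registers A (N bits), B and R (M bits)."""
    mask_a = (1 << n) - 1
    mask_m = (1 << m) - 1
    a, b, r = da & mask_a, db & mask_m, 0          # load A and B, clear R
    iterations = 0
    for _ in range(n):                             # counter C runs from 0 to N-1
        a_msb = (a >> (n - 1)) & 1
        y = (r << 1) | a_msb                       # Y = R_{M-1}...R_0 & a_{N-1}, M+1 bits
        cout = 1 if y >= b else 0                  # carry out of the (M+1)-bit subtractor
        r = ((y - b) if cout else y) & mask_m      # load Y - B, or just shift a_{N-1} into R
        a = ((a << 1) | cout) & mask_a             # shift cout into register A
        iterations += 1
    return a, r, iterations


print(iterative_divide(0b10001100, 0b1001, 8, 4))  # (15, 5, 8)
```

After the last iteration, register A has shifted out all bits of the dividend and holds the $N$ quotient bits, which is why no separate quotient register appears in this model.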
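[Figure 8: datapath and FSM. Datapath: an N-bit left-shift register A (loaded from DA, shifting in cout), an M-bit register B (loaded from DB), an M-bit left-shift register R (with synchronous clear sclrR, shifting in $a_{N-1}$), an (M+1)-bit subtractor computing $Y - (0\&B)$ with $Y = R_{M-1} R_{M-2} \ldots R_0 a_{N-1}$, and a counter C. Outputs: Q (N bits), R (M bits), and done. FSM: S1 (entered when resetn=0): sclrR ← 1, ER ← 1, C ← 0; when E=1, assert LAB and EA and go to S2. S2: ER ← 1, EA ← 1; if cout=1, LR ← 1; if C = N-1, go to S3, otherwise C ← C+1. S3: done ← 1; the next state depends on E.]
Figure 8. Iterative Divider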
