[Figure: carry-skip adder built from 4-bit blocks with skip units; carry-in cin, block carries c4, c8, c12, and block propagate signals p[0,3], p[4,7], p[8,11], p[12,15].]
The longest delay path begins with a carry generated in stage 0 in the least
significant block, propagates through 3 stages in that block, then through
the OR gate, then through k − 2 carry-skip units, and then through 3 of the
4 stages in the most significant block, to the ck−1 signal. We can generalize
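these results for a block size of b in a k bit adder as follows:
$$T_{\text{fixed-skip-add}} = (b-1)T_p + D + (k/b - 2)T_s + (b-1)T_p$$
where Tp is the time to propagate a carry through one stage of the adder (from ci
to ci+1), Ts is the delay through one carry-skip stage, and the single D term
accounts for the OR gate.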
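Recall that Tp = 2D in the standard ripple-carry adder based on two half-adders.
The delay Ts = 2D since there is an AND gate and an OR gate in
series in the carry-skip unit. With these values the delay of the path described
above is $T_{\text{fixed-skip-add}} = 4Db + 2kD/b - 7D$.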
The optimum block size, bopt , is found by differentiating the right-hand side
with respect to b and equating the result to zero.
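Setting $dT/db = 4D - 2kD/b^2 = 0$ gives $b^{opt} = \sqrt{k/2}$, and the corresponding delay is
$$T^{opt}_{\text{fixed-skip-add}} = 4D\,b^{opt} + \frac{2kD}{b^{opt}} - 7D = 4D\sqrt{k/2} + 2D\sqrt{2k} - 7D \approx 5.66D\sqrt{k} - 7D$$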
For example, in a 32 bit adder, bopt = 4 and the delay is approximately 25D.
Compare this value with the delay of a ripple-carry system, 64D.
In a 64 bit adder, $b^{opt} = \sqrt{32} \approx 5.657$. If we use b = 4, the delay is 41D. If
b = 8 the delay is again 41D. An in-between solution is possible with b = 6.
Then there are 10 blocks of 6 and 1 block of 4 (at the most significant end).
The corresponding delay is 35D. The 64 bit ripple-carry adder has delay
128D.
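The delay figures quoted above can be checked with a short Python sketch. The function names are hypothetical, all delays are in units of D, and the 1D OR-gate term is an assumption inferred from the −7D constant in the closed form.

```python
import math

TP = 2    # carry propagation through one adder stage, in units of D
TS = 2    # delay through one carry-skip unit, in units of D
T_OR = 1  # assumed delay of the OR gate after the first block, in units of D


def path1_delay(blocks):
    """Delay (in D) of the longest path, blocks listed least significant first.

    The carry ripples through all but one stage of the first block, passes
    the OR gate, skips every interior block, and ripples through all but one
    stage of the last block.
    """
    first, last = blocks[0], blocks[-1]
    skips = len(blocks) - 2
    return (first - 1) * TP + T_OR + skips * TS + (last - 1) * TP


def fixed_block_delay(k, b):
    """Closed form 4b + 2k/b - 7 (in D) for a k bit adder with equal blocks of size b."""
    return 4 * b + 2 * k / b - 7


def optimum_block_size(k):
    """Block size that minimises fixed_block_delay, i.e. sqrt(k/2)."""
    return math.sqrt(k / 2)
```

For example, `path1_delay([6] * 10 + [4])` evaluates the mixed 64 bit arrangement of ten blocks of 6 and one block of 4, and `fixed_block_delay(64, 4)` and `fixed_block_delay(64, 8)` evaluate the two equal-block alternatives.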
More Fast Adders 3
In the next development there are t carry-skip blocks with sizes bt−1 , · · · , b1 , b0
going from left to right. See Figure 2:
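[Figure 2: carry-skip adder with block sizes b5, b4, b3, b2, b1, b0, showing path 1 (stage 0 to stage k−1), path 2 (stage b0 to stage k−1), and path 3 (stage 0 to stage k−b5−1).]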
Consider the equation for the worst case delay from stage 0 to stage k − 1,
corresponding to path 1 in the diagram:
Compare this with the delay for path 1, the longest carry-propagation path.
Again, if Tp = Ts , block size bt−2 can be one larger than block size bt−1
without making this delay path worse than path 1. Blocks to the left of
the center of the adder may also have sizes that form a simple incremental
sequence.
This analysis suggests an organ-pipe structure for the block sizes,
$$b,\; b+1,\; \cdots,\; b + \tfrac{t}{2} - 1,\; b + \tfrac{t}{2} - 1,\; \cdots,\; b+1,\; b$$
The worst case delays are from carries generated in stage zero of the adder and
absorbed anywhere in the left-hand half, and from carries generated anywhere
in the right-hand half and absorbed in stage 27. These eight delays of 17D
are made equal by making the block sizes vary in the organ-pipe fashion
described above.
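The organ-pipe arrangement can be explored with a small Python sketch. The function names are hypothetical, delays are in units of D with Tp = Ts = 2D, and the 1D OR-gate term is an assumption. The worst case is found by brute force over every pair of a generating block and an absorbing block, rather than by the analytical argument used in the text.

```python
from itertools import combinations

TP, TS, T_OR = 2, 2, 1  # stage, skip and OR-gate delays in units of D (T_OR assumed)


def organ_pipe(b, t):
    """Block sizes b, b+1, ..., b+t/2-1, b+t/2-1, ..., b+1, b for an even t."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]


def worst_case_delay(blocks):
    """Largest delay (in D) over all pairs of generating and absorbing blocks.

    A carry generated at the bottom of block i ripples through the rest of
    that block, passes the OR gate, skips the blocks in between, and ripples
    through all but one stage of block j.
    """
    worst = 0
    for i, j in combinations(range(len(blocks)), 2):
        delay = (blocks[i] - 1) * TP + T_OR + (j - i - 1) * TS + (blocks[j] - 1) * TP
        worst = max(worst, delay)
    return worst
```

With `organ_pipe(2, 8)` the sizes are 2, 3, 4, 5, 5, 4, 3, 2, which sum to 28 stages, and `worst_case_delay` reports 17 for them. This appears to match the 28 stage example with delays of 17D referred to above, although that correspondence is an inference.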
which gives:
b = k/t − t/4 + 1/2
The worst-case delay through the adder with variable block sizes is then:
$$T^{opt}_{\text{var-skip-add}} = 4D\sqrt{k} - 5D$$
which is smaller by a factor of approximately $\sqrt{2}$ than with fixed block size. Example:
Continuing with our 32 bit adder example of the previous section,
$$t^{opt} = 2\sqrt{32} \approx 11.3$$
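As a rough numerical check, the following Python sketch evaluates the optimum number of blocks and the two optimum delay expressions for a given k. The function names are hypothetical and the results are in units of D. The values are real numbers, so a practical design would still have to round to integer block counts and sizes.

```python
import math


def var_skip_t_opt(k):
    """Optimum number of variable-size blocks, 2*sqrt(k)."""
    return 2 * math.sqrt(k)


def var_skip_delay_opt(k):
    """Worst-case delay with variable block sizes, 4*sqrt(k) - 5 (in D)."""
    return 4 * math.sqrt(k) - 5


def fixed_skip_delay_opt(k):
    """Worst-case delay with the optimum fixed block size sqrt(k/2) (in D)."""
    b = math.sqrt(k / 2)
    return 4 * b + 2 * k / b - 7


def leading_term_ratio(k):
    """Ratio of the sqrt(k) terms of the fixed and variable delays."""
    return (fixed_skip_delay_opt(k) + 7) / (var_skip_delay_opt(k) + 5)
```

The ratio of the leading terms is independent of k, which is where the factor of about 1.41 between the two schemes comes from.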
The delay through this adder, assuming ripple-carry units are used for the
three k/2 bit adders, is kD + Tm , where Tm is the delay through the layer of
2-input multiplexers. This is approximately 1/2 the delay of a ripple carry
adder, but it requires a large increase in the number of transistors. Note that
some sharing of the logic between two adders in the left-half of the unit is
possible. In position j ≥ k/2 with inputs xj and yj, the signals xj ⊕ yj and
xj · yj are needed in both adders for that position.
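A behavioural Python sketch of the one-level carry-select scheme is given below. It models function only, not timing, and the function names are hypothetical. It assumes k is even and that both operands fit in k bits. The propagate and generate signals computed in `ripple_add` are the xj ⊕ yj and xj · yj signals that the two upper adders could share.

```python
def ripple_add(x, y, cin, width):
    """Add two width-bit integers one stage at a time; return (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        p, g = xi ^ yi, xi & yi      # propagate and generate for stage i
        s |= (p ^ c) << i            # sum bit
        c = g | (p & c)              # carry into stage i + 1
    return s, c


def carry_select_add(x, y, cin, k):
    """k bit carry-select addition using three k/2 bit ripple-carry adders."""
    half = k // 2
    mask = (1 << half) - 1
    # Low half: a single adder with the real carry-in.
    lo, c_half = ripple_add(x & mask, y & mask, cin, half)
    # High half: two adders, one for each possible incoming carry.
    xh, yh = x >> half, y >> half
    hi0, c0 = ripple_add(xh, yh, 0, half)
    hi1, c1 = ripple_add(xh, yh, 1, half)
    # Multiplexer layer: the low-half carry-out selects one precomputed result.
    hi, cout = (hi1, c1) if c_half else (hi0, c0)
    return (hi << half) | lo, cout
```

In hardware the three `ripple_add` calls run in parallel, so the delay is that of one k/2 bit adder plus the multiplexer layer.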
The carry-select technique could be applied recursively to each of the three
adder blocks in Figure 3. Just one recursive application results in the circuit
of Figure 4.
Each of the three previous k/2 bit adders has been replaced by a carry-
select adder using three k/4 bit adders. The result is wasteful since the
most significant k/4 bit adders are unnecessarily repeated. We could use two
adders in place of four. The result is shown in Figure 5.
[Figures 4 and 5: two-level carry-select adders built from k/4 bit adder blocks and layers of multiplexers selected by ck/4 and ck/2; the outputs are cout with the high k/2 bits, the middle k/4 bits, and the low k/4 bits.]
where $T(i)$ is the delay of the k bit conditional-sum adder using adder blocks
of size i and $T_{FA}$ is the delay of a single full adder.
The number of full-adders is 2k − i and the number of multiplexers is (k −
1)(log2 k + 1).
Note again that some sharing of the logic between the two adders with iden-
tical inputs is possible. The number of full-adders given above is therefore
an overestimate of the circuit needs. The text suggests an equivalent of only
k full adders.
A 32 bit fully recursive conditional-sum adder would have delay 6D + 5Tm ,
which could realistically become 16D or so, and would require 32 full adders
and 124 2-input multiplexers. We would probably implement the multiplex-
ers using pass transistor logic. Inverters would be needed to avoid long chains
of pass transistors.
A 64 bit fully recursive carry-select adder would have delay 6D + 6Tm , which
could realistically become 18D or so, and would require 64 full adders and
315 2-input multiplexers. The use of compound CMOS EX-OR gates could
shave a few more gate delays from these estimates, although the assumption
of 2D for the multiplexer delay could be an underestimate since at least one
inverter per multiplexer would be present in the worst-case delay path.
These delay times are not much larger than for the best carry-prefix net-
works and CLA implementations, and the numbers of transistors are also
comparable.
Assuming a 2-input multiplexer requires 8 transistors (including an inverter
for its control input and another for its output), and each adder stage can be
implemented in 32 transistors, a 64 bit conditional-sum adder would require
approximately 32 × 64 + 8 × 315 = 4,568 transistors. Recall that the CLA-4
based 64 bit adder had delay 15D and required 4076 transistors, while the
Kogge-Stone carry-prefix network adder had delay 10D and required 6670
transistors.
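The delay and transistor estimates in this section follow a simple pattern, sketched below in Python. The function names are hypothetical. The defaults (6D for the full adder, 2D per multiplexer level, 32 transistors per adder stage, 8 per multiplexer) are the assumptions quoted in the text, and the multiplexer count is passed in rather than derived.

```python
def conditional_sum_delay(k, t_fa=6, t_mux=2):
    """Delay (in D) of a fully recursive adder: one full adder plus log2(k) mux levels."""
    assert k > 0 and k & (k - 1) == 0, "k is assumed to be a power of two"
    levels = k.bit_length() - 1  # exact log2 for a power of two
    return t_fa + levels * t_mux


def transistor_estimate(full_adders, muxes, per_adder=32, per_mux=8):
    """Transistor count from the per-adder and per-multiplexer figures."""
    return per_adder * full_adders + per_mux * muxes
```

For example, `conditional_sum_delay(64)` evaluates 6D + 6Tm with Tm = 2D, and `transistor_estimate(64, 315)` evaluates 32 × 64 + 8 × 315.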
Conditional-sum addition of two 16-digit decimal numbers is illustrated in the following table.
2 6 7 7 4 1 0 0 2 6 9 2 4 3 5 8
+ 5 6 0 4 9 7 9 4 1 5 1 7 1 6 4 5
At t0 the sum in the rightmost column is finalized, and in each of the other 15
columns two sums are present (one for carry-in 0 and one for carry-in 1). These
are combined in pairs to form two sums for each group of two columns, then four
columns, and so on. For the pair of columns with inputs 92+17, the two pairs of
sums are 10/11 and 9/10. The logic combines these to form the two sums 109/110
for that group of two columns. These are then combined with sums 041/042 to the
left to form sums 04209/04210 for 4 columns. The same process occurs at each
level. At the same time, the resolved carry from the right-hand end of the sum
makes its way across the columns, straddling 1, 2, 4, 8, then 16 columns.
       2  6  7  7  4  1  0  0  2  6  9  2  4  3  5  8
       5  6  0  4  9  7  9  4  1  5  1  7  1  6  4  5
t0    07 12 07 11 13 08 09 04 03 11 10 09 05 09 09 13
      08 13 08 12 14 09 10 05 04 12 11 10 06 10 10
t3    082823894                042096003
      082823895
t4    08282389442096003
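The table can be reproduced with a short Python sketch of conditional-sum addition on decimal digits. The function name and data layout are hypothetical. It assumes two equal-length digit lists, most significant digit first, with a length that is a power of two. For uniformity it keeps both candidate sums for the rightmost group as well, although only the carry-in 0 value is needed there.

```python
def conditional_sum_decimal(x_digits, y_digits):
    """Return (total, levels) for conditional-sum addition of two digit lists.

    Each group is a tuple (sum if carry-in is 0, sum if carry-in is 1, width),
    where each sum includes the group's carry-out as its leading digit.
    """
    # Level t0: both candidate sums for every single column.
    groups = [(x + y, x + y + 1, 1) for x, y in zip(x_digits, y_digits)]
    levels = [groups]
    while len(groups) > 1:
        merged = []
        for left, right in zip(groups[0::2], groups[1::2]):
            l0, l1, lw = left
            r0, r1, rw = right
            base = 10 ** rw
            pair = []
            for r in (r0, r1):
                # The right group's carry-out selects which left sum to use.
                carry, low = divmod(r, base)
                pair.append((l1 if carry else l0) * base + low)
            merged.append((pair[0], pair[1], lw + rw))
        groups = merged
        levels.append(groups)
    return groups[0][0], levels
```

Calling it with the digits of 2677410026924358 and 5604979415171645 gives `levels[0]` as the t0 row of column sums and `levels[3]` as the t3 row of 8-column sums. The entry for the columns 92+17 at `levels[1][5]` holds the pair 109/110 discussed above.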