An Array Multiplier
Logical effort is very effective for comparing large structures to select one of
several quite different overall organizations. Rather than completing a detailed
design of each alternative, we can use the method of logical effort to estimate
the performance of a design sketch. In this extended example, we explore several
designs for a multiplier. We make no claim that we find the best multiplier
design for any situation; we offer it only to illustrate the application of logical
effort.
The example illustrates many of the techniques of logical effort explained in
the book. The reader is assumed to be familiar with logical effort applied to static
gates (Chapters 1 and 4), asymmetric gates (Chapter 6), forks (Chapter 9), and
branches (Chapter 10). An alternative design using domino logic (Chapter 8)
is explored briefly. Methods of building multipliers and adders are introduced
briefly in the example; for a more complete treatment, the reader is invited to
study a text on computer design, such as D. A. Patterson and J. L. Hennessy,
Computer Organization and Design, 2nd edition, Morgan Kaufmann, 1998 (Section 4.6).
A multiplier is an interesting design example because it affords a rich set of
design choices. We can build a multiplier that uses a large array of adders or
an alternative that cycles a smaller array of adders several times to complete the
product. We will use logical effort to seek a design with minimum delay for a
given array configuration.
1 Multiplier Structure
There are many different structures that can be used to implement a multiplier.
Method 1 in Table 1 shows diagrammatically the simple method for multiplication we were all taught in school, modified so that all the numbers use binary
notation. In the illustration, a 5-bit multiplicand is multiplied by a 6-bit multiplier to obtain an 11-bit product. Each row of the array is the product of the
multiplicand and a single bit of the multiplier. Because multiplier bits are either
0 or 1, each row of the array is either 0 or a copy of the multiplicand. The array
is laid out so that each row is shifted to the left to account for the increasing binary significance of the multiplier bits. We sum the rows of the array to obtain
the product. You may wish to verify that the result is correct: in base ten, the
multiplicand is 11, the multiplier 46, and the product 506.
This simple method can be implemented directly in hardware by using five
separate 2-input, 5-bit adders to add the six rows of the array shown in the
table. But this design suffers in two ways. First, it is large because there are a lot
of adders. And second, the delay will be substantial because bits must propagate
through all five adders and because each adder has a carry path to resolve a 5-bit
carry.
Table 1 Methods of multiplying the 5-bit multiplicand 01011 (decimal 11) by the 6-bit multiplier 101110 (decimal 46) to obtain the 11-bit product 00111111010 (decimal 506).
Figure 1 The multiplier array of adder cells. A partial product enters at the
top, the Pij values are added, and a new partial product emerges at the bottom, in
carry-save form. In an actual layout, successive rows would probably be shifted
to the right so that the adder cells would form a perfect rectangular array.
But how does carry-save form avoid carry paths in the array? The answer
is illustrated in Figure 1, laid out to resemble the format shown in Table 1,
Method 3. The previous partial product enters at the top, in carry-save form,
from register cells Tj . That is, each bit of the partial product is represented by
two signals, which must be added to determine the binary value. Both signals
from Tj have binary weight 2^j in the result. The next partial product is produced
at the bottom of the figure. Note that three bits of the result (T1, T2, and T3)
are produced in carry-resolved rather than carry-save form and become a part
of the final product. The remaining five bits of partial product are routed to the
inputs of the registers Tj , j = 0 . . . 4 to become the partial product used as input
in the next iteration of the multiplier. To simplify the diagram, the drawing of
each register Tj is split: its outputs appear at the top of the figure and its inputs
appear at the bottom.
The array consists of three rows of five adder cells each. Each row is responsible for adding a shifted form of the multiplicand to the partial product and
passing the partial product to the row below. Each adder cell contains a 1-bit
full adder with three inputs of equal binary value, labeled a, b, and c. Each cell
has two outputs representing the sum (s) and carry (cry) that result from adding
together the three inputs. The sum output represents the same binary weight as
each of the three inputs. The carry output represents twice the binary weight of
the sum output. As you can see from the figure, each cell combines a sum and
a carry input from cells earlier in the array with a product bit, labeled Pij . Note
that the longest path through the array is three adder cells, corresponding to the
number of rows in the array.
The product bits Pij are generated by combining multiplicand and multiplier bits using an and gate, Pij = Qi · Rj, where Qi are bits of the multiplier and Rj are bits of the multiplicand, again using the notation that bit j has weight 2^j. The
effect is that a row of cells adds 0 to the partial product if the corresponding bit
of the multiplier is 0 and adds the multiplicand if it is 1. The structure of the
array causes the multiplicand to shift to the left, corresponding to the weight
of the multiplier bit. The multiplier Q and multiplicand R are both stored in
registers operated at the same time as the partial-product register T. Of course,
at the end of the first cycle, the multiplier must be shifted three bits to the right
so that in the second cycle the high-order three bits of the multiplier are used
to control the array. Neither the registers Q and R nor the shifting logic appears
in the figure. Also not shown are registers to save the low-order three bits of the
result of the first cycle (T1, T2, and T3).
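The behavior just described is easy to model in software. The following Python sketch (not from the original text; the names and structure are illustrative) keeps the partial product as a (sum, carry) pair of integers, adds three rows per cycle with carry-save adders, and resolves the carries with one ordinary addition at the end; it reproduces the 11 × 46 = 506 example of Table 1.

def carry_save_add(a, b, c):
    """Add three numbers without propagating carries; return (sum, carry)."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def multiply(multiplicand, multiplier, rows_per_cycle=3):
    """Model of the iterated carry-save array: each cycle folds rows_per_cycle
    shifted copies of the multiplicand into a (sum, carry) partial product."""
    s, c = 0, 0
    shift = 0
    while multiplier:
        for _ in range(rows_per_cycle):        # one pass through the adder array
            row = multiplicand << shift if multiplier & 1 else 0
            s, c = carry_save_add(s, c, row)
            multiplier >>= 1
            shift += 1
    return s + c                               # final carry-resolving addition

assert multiply(11, 46) == 506                 # the worked example of Table 1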
The design task is to make the long paths through the arrays of and gates and adders operate quickly. In order to obtain results that are somewhat more general than the example, we'll represent the number of bits in the multiplicand by the symbol m and the number of rows in the multiplier array by n. So the longest path through the array has an and gate and n adder cells. Note that the choice of a carry-save representation avoids a carry path that would have a length m + n, which is usually much longer than n, the length of the carry path in the array. By retaining m and n as parameters, we can obtain results that will let us evaluate alternatives to the 3 × 5 array design if we wish to reevaluate the speed-space trade-off.
Figure 2 An adder cell sums three inputs a, b, and c to produce one sum bit s
and one carry bit cry.
Table 2 Logical effort of the majority and parity gate inputs.

    Majority                        Parity
    Input   Logical effort g        Bundle  Logical effort g
    a       2                       a       6
    b       4                       b       12
    c       4                       c       6
    Total   10                      Total   24
2 Adder Cell
A key element in these paths is the adder cell, shown in Figure 2. It consists of
a parity circuit to generate the sum output and a majority circuit to generate
the carry output. Circuits for the majority and parity gates are shown in Figures 4.6(b) and 4.7(b) in the book. In order to reduce the logical effort of the
majority and parity circuits, we have chosen topologically asymmetric forms.
The three inputs of the parity circuit drive different numbers of transistors. One
input is twice as hard to drive as the other two. Similarly, two inputs of the majority circuit are twice as hard to drive as the other one (see Table 2). Thus, some
inputs of these circuits are easier to drive than others.
Circuit details of these two gates can be ignored for now. The parity gate
requires dual-rail (true and complement) inputs. The majority gate inverts the
sense of its inputs, that is, it computes the complement of the majority of its
inputs. However, because of the symmetry of the majority function, the gate
will compute true majority by complementing each of its inputs. We can defer
considering these details until later because they do not enter into logical effort
calculations.
A critical part of the design of the multiplier array is choosing which inputs
of the majority and parity circuits to connect to which outputs of previous
circuits. From a functional point of view, the three inputs to the majority and
parity circuits are interchangeable. From a logical effort point of view, however,
they are not. We are thus faced with a combinatorial problem. Shall we connect
the sum output to the easy-to-drive or harder-to-drive inputs of the majority
and parity circuits of the subsequent adder? Inside each adder cell, should we
connect the hard-to-drive input of the parity circuit to the easy-to-drive input
of the majority circuit to make the input capacitances of the three adder inputs
more equal?
Let us first consider the situation within a single adder cell. As shown in
Figure 2, we will call the load driven by the majority circuit x and the load driven
by the parity circuit y. The relative sizes of these two loads will depend on how
these signals are connected to inputs of subsequent adder cells. The figure also
shows that stages of amplification may be inserted before (or perhaps after) the
logic circuits. We will probably use 2-1 forks to provide the required true and
complement signals, but let's not worry about that just yet. The wiring structure
of the array depicted in Figure 1 tells us that the sum and carry paths in each
cell should have identical delays so that all paths through the entire array that
traverse n adder cells will have the same overall delay. We know that the fastest
design will be one that minimizes the effort along a path through the entire
array and therefore one that also minimizes the effort along a path from inputs to
outputs of an adder cell. Moreover, logical effort tells us the relationship between
the path effort F through the adder, the input capacitances of the three adder
inputs, and the output loads to be driven, x and y.
The adder cell has two different configurations, depending on which inputs
of the majority gate are tied to which inputs of the parity gate (see Figure 3).
These two configurations are named V and W. The following list shows the
input capacitance on each of the three cell inputs (a, b, and c) for the two
configurations. In each case, the effort F is the effort along the paths from the
specified input to either the carry or sum outputs.
    Configuration V                 Configuration W
    Ca = (2x + 6y)/F                Ca = (2x + 12y)/F
    Cb = (4x + 12y)/F               Cb = (4x + 6y)/F
    Cc = (4x + 6y)/F                Cc = (4x + 6y)/F

Figure 3 Two wiring configurations for the adder cell. Numbers inside the
gate symbols are the logical effort of the input or input bundle.
These expressions are derived from the basic equation of logical effort,
F = GH. By way of example, let us consider the a input of configuration V. If
we assume a fraction α of the input capacitance is devoted to the path through the majority gate, we have Fcry = 2x/(αCa), where Fcry is the effort to drive the carry output. Analogously for the sum output, we have Fs = 6y/((1 − α)Ca). Recognizing that the two efforts should be equal for equal delay, F = Fcry = Fs, we find that the branching fraction α drops out (why?), and we obtain F = (2x + 6y)/Ca, the expression shown in the preceding list. The constants 2, 4,
6, and 12 that appear in these expressions are the logical efforts of the various
inputs of the parity and majority circuits. Notice that in configuration W the
two inputs called b and c have identical capacitance.
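The cancellation is easy to confirm numerically. The short sketch below is illustrative only; the particular x and y values are the ones found later for the best configuration. It splits the input capacitance so the two efforts match and checks that F equals (2x + 6y)/Ca regardless of the split.

# Check that the split of input capacitance between the two paths drops out.
Ca, x, y = 1.0, 1.3, 1.0          # example values (the V4 solution found later)
alpha = 2 * x / (2 * x + 6 * y)   # the split that equalizes the two efforts
F_cry = 2 * x / (alpha * Ca)      # effort from input a to the carry output
F_s = 6 * y / ((1 - alpha) * Ca)  # effort from input a to the sum output
assert abs(F_cry - F_s) < 1e-9
assert abs(F_cry - (2 * x + 6 * y) / Ca) < 1e-9
print(F_cry)                      # 8.6 for these values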
Next we need to consider how the sum and carry outputs of a cell are
connected to the a, b, or c inputs of subsequent adders. There are six possible
wiring configurations (see Table 3), characterized by the way in which each input
is connected to the previous cell. These wiring configurations are combined
with the internal configurations described previously (V and W) to yield twelve
combinations, V1 . . . V6 and W1 . . . W6. Because some of the inputs have identical logical effort, some combinations have the same effect. Listing only distinct cases, we have V1 . . . V6, W1, W3, W4 (Table 4).

Table 3 The six possible wiring configurations between adder cells.

    Case    Connections
    1       Pij → a,  cry → b,  s → c
    2       Pij → a,  cry → c,  s → b
    3       Pij → b,  cry → a,  s → c
    4       Pij → b,  cry → c,  s → a
    5       Pij → c,  cry → a,  s → b
    6       Pij → c,  cry → b,  s → a

Table 4 Summary of intercell wiring cases. The fastest design is V4; the slowest is V2.

    Type    Path effort F
    V1      12.0
    V2      14.32
    V3      9.29
    V4      8.61
    V5      14.0
    V6      10.0
    W1      10.0
    W3      11.21
    W4      13.29
Which combination gives the best performance? Because the answer to this
combinatorial problem can be found only by working out the least-delay circuit
for each possible combination of inputs and outputs, we shall write expressions
for capacitance for each of the various combinations. For each combination we
can set x and y equal to the capacitances of the inputs that they drive and derive a
value for F as a consequence. We'll show the steps for combination W4. Because the carry signal is connected to input c of the next stage, we have

x = Cc = (4x + 6y)/F        (1)

Likewise, because the sum signal is connected to input a of the next stage,

y = Ca = (2x + 12y)/F        (2)
These two equations have a consistent solution only for a particular value of F; solving them gives F = 13.29 for combination W4. Carrying out the same calculation for each of the distinct combinations gives the path efforts listed in Table 4.¹ The fastest design is V4, with F = 8.61, so the least delay through an adder cell is

D = N F^(1/N) + P = 2√8.61 + 6 + 1 = 12.9        (3)

based on an estimate of the parasitic delay of the majority or parity gate of 6 p_inv, the addition of p_inv assuming one stage of inverter to make a two-stage design, and the rule of thumb that p_inv = 1.
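If you want to check the whole of Table 4 mechanically, the following Python sketch (mine, not part of the original text) sets up the same pair of load equations for every combination of internal configuration and intercell wiring and solves for the path effort F.

import math

# Logical efforts (gm, gp) seen by each cell input: gm toward the majority gate
# (carry) and gp toward the parity gate (sum), for the two internal wiring
# configurations of Figure 3.
EFFORTS = {
    "V": {"a": (2, 6), "b": (4, 12), "c": (4, 6)},
    "W": {"a": (2, 12), "b": (4, 6), "c": (4, 6)},
}

def path_effort(config, cry_in, sum_in):
    """Largest F satisfying F*x = gm_i*x + gp_i*y and F*y = gm_j*x + gp_j*y,
    where the carry output drives input cry_in and the sum drives sum_in."""
    gm_i, gp_i = EFFORTS[config][cry_in]
    gm_j, gp_j = EFFORTS[config][sum_in]
    # A nontrivial (x, y) requires the determinant of the 2x2 system to vanish:
    # F^2 - (gm_i + gp_j) F + (gm_i*gp_j - gp_i*gm_j) = 0.  Take the larger root.
    b, c = gm_i + gp_j, gm_i * gp_j - gp_i * gm_j
    return (b + math.sqrt(b * b - 4 * c)) / 2

CASES = {1: ("b", "c"), 2: ("c", "b"), 3: ("a", "c"),
         4: ("c", "a"), 5: ("a", "b"), 6: ("b", "a")}   # (cry input, sum input)

for config in "VW":
    for case, (cry_in, sum_in) in CASES.items():
        print(f"{config}{case}: F = {path_effort(config, cry_in, sum_in):.2f}")

Running it lists all twelve combinations; the duplicated values among the W cases show why only W1, W3, and W4 need to be tabulated as distinct.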
We can also compute the load capacitances of the inputs a, b, and c from
the equations prepared for configuration V. Given that F = 8.61, we find that
Ca = y = 1, Cb = 2, and Cc = x = 1.3. Thus, the carry (cry) signals will have
slightly more drive than the sum (s) signals, matched to the c and a input capacitances, respectively. We need to inspect the edges of the adder cell array to ensure that these conventions are acceptable. At the bottom of the array, carry and sum signals enter the partial-product register; at the top of the array, the register drives the a and c inputs of adder cells. So by great good fortune, we simply arrange that the two bits stored for each binary position in register T are carried at different drive strengths: one has input and output drive of 1.0 units and one of 1.3 units!

¹ In general, the delays along two paths will differ if the parasitic delays of the gates on the paths differ. But it so happens that the estimated parasitic delays for the majority and parity gates are both 6 p_inv, so we are justified in asserting that equal efforts along the paths yield equal delays.
The product bits Pij are generated by two-input nand gates that combine multiplier and multiplicand bits, as sketched in Figure 4. Each multiplier bit Qi must drive m nand gates, one in each column of its row, while each multiplicand bit Rj drives only n nand gates, one in each row. Because the two inputs of each nand gate see such different loads, we use an asymmetric nand gate, favoring the more heavily loaded multiplier bit on its a input and choosing the symmetry factor so that the efforts borne by the two paths are equal:

m · ga = n · gb        (4)
Figure 4 The multiplier and multiplicand are combined to produce values Pij
to be summed in the adder array.
Equations 6.2 and 6.3 in the book give expressions for ga and gb for a nand gate. Assuming the usual pullup-to-pulldown width ratio of 2 and taking m = 5 and n = 3, we solve to find a value for the symmetry factor s = 0.28. As a result, ga = 1.13 and gb = 1.86.
Now we can estimate the delay of the circuit. The electrical effort will be 2
because, first, we can assume multiplicand and multiplier bits have the same
drive as register T, which drives the a inputs, with Ca = 1; and, second, the
nand gate must drive input b, with Cb = 2. Thus, we have F = GBH = ga · m · (2) = 1.13 × 5 × 2 = 11.3. This calls for a two-stage design, so

D = N F^(1/N) + P = 2√11.3 + 1 + 2 = 9.7        (5)

It's worth remarking that if we had not used an asymmetric nand gate, the delay would have been 10.3.
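These numbers can be verified with a small calculation. The sketch below is an illustration under the stated assumptions, not code from the original: it uses my reconstruction of the nand-gate effort expressions for a pullup-to-pulldown ratio of 2, finds the symmetry factor by equalizing the efforts borne by the multiplier and multiplicand bits, and then evaluates the two-stage delay.

import math

def nand_efforts(s):
    """Logical efforts of the favored (a) and unfavored (b) inputs of an
    asymmetric two-input nand with symmetry factor s, assuming a
    pullup-to-pulldown width ratio of 2."""
    return (1 / (1 - s) + 2) / 3, (1 / s + 2) / 3

def symmetry_factor(m, n):
    """Choose s so the multiplier bit (m gates on input a) and the multiplicand
    bit (n gates on input b) bear equal effort: m*ga == n*gb."""
    lo, hi = 1e-6, 0.5
    for _ in range(100):                  # simple bisection
        s = (lo + hi) / 2
        ga, gb = nand_efforts(s)
        lo, hi = (lo, s) if ga * m > gb * n else (s, hi)
    return s

s = round(symmetry_factor(5, 3), 2)
ga, gb = nand_efforts(s)
print(s, round(ga, 2), round(gb, 2))                 # 0.28 1.13 1.86
F = ga * 5 * 2                                       # F = G B H for the top-row path
print(round(F, 1), round(2 * math.sqrt(F) + 3, 1))   # 11.3 and a delay of 9.7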
The delay of this structure plays a different role in different parts of the adder
array. In the top row, the delay is on the critical path through the entire array:
the top row of adder cells cannot begin working until the P0j bits have been
computed. However, for all other rows, the delay in computing the product
bits will be far less than the delay of other signals reaching the adder cells
(compare the delay of computing the product bits with the delay of an adder
cell computed in the previous section). This suggests that we might prefer a
different design for all but the top row, using delayed multiplicand bits to drive
the nand gates. In other words, we drive the top row as shown in Figure 4 but
fork off a separate signal to carry the multiplicand bit to other rows. The fork
will have more stages of amplification that take more time, but it will not offer
much load to the multiplicand bit. If we divide the load equally between the top
row nand gates and the fork, the situation would be similar to setting n = 2
in the preceding analysis. If we do so, we find ga = 1.07. This differs very little
from 1.13, illustrating the limited gains available from asymmetric structures.
This analysis lets us assume that ga ≈ 1.1 irrespective of the number of rows, n.
As the number of columns, m, increases, the situation is not as favorable.
The load on a multiplier bit increases, and we must build a suitable string of
amplifiers to drive the bit to all nand gates in the top row as fast as possible.
(Again, other rows are not on the critical path and could get by with less drive.
But the drive for the top row is on the critical path for the whole array.) We
have F = GBH = ga · m · (2) = 2.2m. We know that good designs will bear an effort of about ρ = 3.59 for each stage, so the best number of stages will be N = ln F / ln ρ, thus giving rise to a delay:

D = N F^(1/N) + N p_inv + 1 = 3.6 ln F + 1 = 3.6 ln m + 3.8        (6)
(The term 1 accounts for the extra parasitic delay of a nand gate compared
to that of an inverter.) For example, for m = 32, D = 16.3.
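A few lines of Python (illustrative only) reproduce this estimate, both with the stage count rounded to an integer and with the smooth 3.6 ln m + 3.8 approximation:

import math

RHO = 3.59                      # best stage effort for static gates
P_INV = 1.0

def drive_delay(m, ga=1.1, H=2):
    """Delay of the amplifier string plus nand driving m columns (Equation 6)."""
    F = ga * m * H                                   # path effort, about 2.2 m
    N = max(1, round(math.log(F) / math.log(RHO)))   # best number of stages
    return N * F ** (1 / N) + N * P_INV + 1          # +1: extra parasitic of the nand

for m in (5, 16, 32, 64):
    print(m, round(drive_delay(m), 1), round(3.6 * math.log(m) + 3.8, 1))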
Combining the estimated delay of generating the product bits for the top row (9.7) with the delay of the three adder cells on the longest path (3 × 12.9 = 38.7) gives a total of about 48.4 for one pass through the array.

Is a total delay of 48.4 acceptable? Perhaps this is just the number we were seeking! If it's smaller than our target, we can omit some of the amplifier stages we used to obtain least delay. If it's larger than we intended, we need to make some structural changes to the multiplier, because these delays are estimates of the least possible delay: it may not be possible to achieve such a low delay when design details are considered.
What kinds of structural changes could we contemplate? The delay to compute the Pij bits for the top row of adder cells could be removed entirely by
pipelining, that is, computing these values in the previous cycle and saving them
in a register. Of course, we would need to worry about how these values are prepared for the first iteration of the multiplier. If successful, this change would
reduce the delay to 3 × 12.9 = 38.7.
If still faster cycle time is required, we could consider reducing from 3 to 2 the
number of multiplier bits used in each cycle, thus reducing the number of rows
of adder cells to 2. Of course, this might not speed up the overall multiplication
time because additional iterations would be required in order to use all the
multiplier bits (6 in our example).
We could also consider changing circuit technologies. Rather than using
static gates, we could consider domino gates. We'll defer a detailed explanation
of this choice until a later section, but we know from experience that domino
circuits are likely to be faster than their static counterparts.
It's also important to remark that our analysis is not perfect. We've ignored the design of the partial-product register. If the register cells use multiplexers driven by a two-phase clock, they will have two stages, each with a logical effort of 2, for a combined logical effort of 4. Ideally, two stages should be bearing an effort more like ρ² ≈ 13. So the registers can amplify; that is, their outputs should drive more load than their inputs require. Our design does not exploit this amplification because we would need adder cells with different transistor sizes: the top row would use larger transistors than the bottom row. We've chosen not to attempt this optimization, but it is easy to analyze (see Exercise 3).
5 Circuit Design
In this section, we assume that the structure explored in previous sections has been selected and that it's time to complete the design. We need to insert amplifiers so that paths have the right number of stages, we need to worry about the polarity of various signals, and we need to provide true and complement forms of signals that drive the three-input parity gate in the adder cells.

We've already noticed a problem: the critical path through the adder cell has an effort of F = 8.61, which would suggest a two-stage design, but if we use a 2-1 fork to generate true and complement forms, followed by the parity stage, that path will have an effective length between two and three stages, which is too many.
If we want to have exactly two stages in each adder cell, we can use a dual-rail design, in which every input and output to the cell is carried as a two-wire
bundle containing true and complement forms of the signal. Figure 5 shows a
detailed design of such a cell. The true and complement inputs to the parity
gates appear separately; each input is labeled with its logical effort. The output
loads are 1.0 and 1.3, and the input loads 1, 2, and 1.3, as determined by the
analysis in Section 2. We can verify the effort of one of the paths (e.g., a) by
summing the effort GH of each branch: F = 2(1.3/1) + 3(1/1) + 3(1/1) = 8.6,
which is exactly what our previous analysis determined, so this circuit design bears out the earlier analysis.

Knowing that the effort through the cell should be 8.6 (about 2.9 per stage), it's easy to determine the
transistor sizes of each of the gates. For example, suppose the inverter connected
to the a input has an input capacitance equal to that of a transistor whose width
is 6.4 microns and whose length is the minimum permitted by the fabrication
process. Then the sum of the transistor widths driven by the inverter should be
6.4 × √8.6 = 18.8 microns. This should be divided in proportion to the effort borne by each branch, that is, 2 × 1.3 to the majority gate and 3 × 1 to each of the
two parity gates. Thus, the majority gate should have 5.7 microns of input load
and the parity gates 6.6 microns of input load. The other paths can be analyzed
analogously.
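The same sizing arithmetic, as a quick sketch (the widths are the example's assumed values):

# Apportion the load driven by the inverter on the a input of the dual-rail cell.
stage_effort = 8.6 ** 0.5                    # each of the two stages bears sqrt(8.6)
total_width = round(6.4 * stage_effort, 1)   # total gate width driven, 18.8 um

branch_efforts = {"majority gate": 2 * 1.3, "each parity gate": 3 * 1.0}
total_effort = 2 * 1.3 + 2 * (3 * 1.0)       # 8.6, as computed in the text
for name, effort in branch_efforts.items():
    print(name, round(total_width * effort / total_effort, 1), "um")
# majority gate 5.7 um, each parity gate 6.6 um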
It's interesting to consider the performance of a one-stage design; that is, to remove the inverters from the circuit of Figure 5 and invert the polarity of the inputs. We now have D = 8.6 + 6 = 14.6, compared to 12.9 for the best two-stage design.
One problem with the dual-rail design may prove fatal: it takes a lot of area.
The area of the additional majority and parity gates, which are large, may exceed
that of an alternative single-rail design. The complex wiring topologies of these
gates require lots of room for wires, contacts, and crossovers. If you've ever tried
to lay out a two-input xor gate, you can appreciate the problem!
Figure 5 Design for an adder cell in which all inputs and outputs are carried
in dual-rail bundles.
Figure 6 Design for an adder cell in which all inputs and outputs are carried
in single-rail form.

Figure 6 shows such a single-rail alternative. Within the cell, 2-1 forks generate the true and complement forms needed by the parity gate, and the majority gate, which inverts, is driven from the complement outputs of the forks so that it produces the true carry. If we size the majority and parity gates so that each bears a stage effort of about 2.4, the parity inputs with a logical effort of 3 will have capacitance 3(1.0/2.4) = 1.25, and the input with logical effort of 6 will have capacitance 2.5.
Now we can analyze the forks with known loads. Consider the fork attached to the a signal. We don't know how the input load of 1.0 will divide between the two paths, but we know that if the 2-inverter leg of the fork has input capacitance α, the other has input capacitance 1 − α. The 2-inverter leg is loaded with capacitance 1.25. The 1-inverter leg is loaded with 1 + 1.25 = 2.25. Setting the delays in the two paths equal, we have

2√(1.25/α) + 2 = 2.25/(1 − α) + 1        (7)

We find α = 0.47 and the delay through either path is 5.25. To get the delay of the
entire path, we must add the stage effort of the majority gate (the 2.4 assumed above) and its parasitic delay, 6. The total is thus 5.25 + 2.4 + 6 = 13.65. This is only slightly worse than the best (12.9), and this circuit may be much easier to lay out than the dual-rail form. Although the 2-1 forks look forbidding, the inverters and their wiring aren't hard to lay out.
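Equation 7 is straightforward to solve numerically. The sketch below (illustrative only; it simply bisects on the difference between the two leg delays) reproduces α ≈ 0.47 and the 13.65 total.

# Split the 1.0 units of input capacitance of the a signal between the two legs
# of its 2-1 fork so that the two legs have equal delay (Equation 7).
def fork_delays(alpha, load2=1.25, load1=2.25, p_inv=1.0):
    d2 = 2 * (load2 / alpha) ** 0.5 + 2 * p_inv   # two-inverter leg, two equal stages
    d1 = load1 / (1 - alpha) + p_inv              # one-inverter leg
    return d2, d1

lo, hi = 0.01, 0.99
for _ in range(60):                               # bisection on d2 - d1
    alpha = (lo + hi) / 2
    d2, d1 = fork_delays(alpha)
    lo, hi = (alpha, hi) if d2 > d1 else (lo, alpha)

print(round(alpha, 2))                            # about 0.47
d2, d1 = fork_delays(0.47)                        # evaluate at the rounded value
print(round(d1, 2), round(d1 + 2.4 + 6, 2))       # 5.25 for the fork, 13.65 in total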
6.1 Wiring capacitance
Do the long wires in the multiplier array contribute significant delay? To address
this question, we need to estimate the size of an adder cell. For the first time, we
have to choose an actual transistor size in microns. Let's assume we're using a
0.6 micron process, with a design style in which 1 unit of capacitance (e.g., the
load presented by input a in Figure 6) represents 9 microns of transistor width;
for example, an inverter with pulldown width of 3 microns and pullup width
of 6 microns. By looking at a layout for a two-input xor gate and making some
crude estimates, we can guess that the parity gate could be laid out in a box 20
microns wide and 30 microns high. The majority gate would be 15 microns wide
and the same height (we're assuming power and ground wires run horizontally,
so all the gates should have the same height). The forks would require about 10
microns each. Thus, the cell might fit in a box 65 microns wide and 30 microns
high.
A multiplicand bit wire runs vertically to span three rows of cells, or about
90 microns. Using the rule of thumb that a wire is about 1/10 the capacitance
per unit length of a transistor gate, the wire would offer the same capacitance
as 9 microns of gate width, or about 1 unit of capacitance. This is negligible.
A multiplier bit wire runs horizontally to span five columns of cells, or
about 320 microns. This converts to 32 microns of gate width, or about 4
units of capacitance. This is enough load that we should consider it in our
design. If the inverter driving the horizontal wire (signal Qi in Figure 4) has
an input capacitance of 1 unit (which we assumed) and a stage effort of around
4 (characteristic of minimum-delay paths), then the load capacitance of the five
nand gates can be assumed to total about 4 units. Thus, the wiring capacitance
doubles the assumed load on the inverter. We should redo the analysis for this
part of the circuit when we choose final transistor sizes. We will find that the
greater wire load may necessitate a second inverter in the Qi amplifier. Moreover,
since the nand gates represent a smaller fraction of the Qi load, they may be
increased in size, reducing their efforts while increasing the amplifier effort by
a smaller amount.
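The wire-load estimates can be reproduced in a few lines (all dimensions here are the rough guesses made above, not measured values):

# Rough wire-capacitance estimates for the 3 x 5 array, in "units" where
# 1 unit of input capacitance corresponds to 9 um of transistor gate width.
CELL_W, CELL_H = 65.0, 30.0      # estimated adder-cell footprint, um
UNIT_GATE_WIDTH = 9.0            # um of gate width per unit of capacitance
WIRE_CAP_RATIO = 0.1             # wire cap per um ~ 1/10 of gate cap per um

def wire_units(length_um):
    return length_um * WIRE_CAP_RATIO / UNIT_GATE_WIDTH

print(round(wire_units(3 * CELL_H), 1))   # multiplicand wire: about 1 unit, negligible
print(round(wire_units(5 * CELL_W), 1))   # multiplier wire: about 3.6 units
                                          # (the text rounds this to 4), significant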
6.2 Domino logic
Let's briefly analyze the use of domino logic in the adder cell. (To understand this section, you will need to be familiar with Chapter 8 in the book.) We'll
use the domino gate structure shown in Figure 8.4(a), consisting of a dynamic
stage using a clocked evaluation transistor, followed by a hi-skew inverter. The
dynamic stages will use pulldown networks analogous to those of Figure 8.5
that compute the majority and parity functions. Consider the parity pulldown
network shown in Figure 4.6(b). If we add a series evaluation transistor, well
need to make each of the transistors have width 4 rather than width 3 in order
to match the pulldown characteristics of the reference inverter. Thus, the logical
effort of the a and c inputs (true and complement) of the dynamic circuit is 4/3, and the logical effort of the b inputs is 8/3. An analogous attack on the majority gate of
Figure 4.7(b) finds logical effort of the a input to be 1, while that of the b and c
inputs is 2. We recall that the hi-skew inverter (Figure 7.4) has logical effort 5/6
for rising outputs (Table 7.2).
Figure 7 shows the cell put together. Note that we must carry the inputs and
outputs in dual-rail form because of the nature of domino logic: because the
parity network requires true and complement forms and we must have an even
number of stages along every signal path, we have to carry both forms. Despite
the changes in logical effort from the static gates, we still will consider a V4
internal wiring configuration with relative loads on carry and sum signals in
the ratio 1.3 to 1. We find that the path effort through the cell along the a or c
paths is 3.3, contrasted with 8.61 for the static design. Such a low path effort calls
for a one-stage design, but with domino logic we must have an even number, so
two is the best we can do!
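The 3.3 figure can be confirmed with a short calculation. The sketch below (a sketch under the loading assumptions just described, not code from the text) sums the effort contributions seen by one rail of the a input and compares the result with the static dual-rail cell.

# Path effort from the a (or c) input of the domino cell, with V4-style loading.
HI_SKEW_G = 5.0 / 6.0        # logical effort of the hi-skew inverter, rising output
CRY_LOAD, S_LOAD = 1.3, 1.0  # relative loads on the carry and sum outputs
CA = 1.0                     # input capacitance of each rail of the a bundle

# Each true rail drives one dynamic majority gate (g = 1) and one input of each
# of the two dynamic parity gates (g = 4/3 apiece); a hi-skew inverter follows each.
F_domino = (1 * HI_SKEW_G * CRY_LOAD + 2 * (4.0 / 3.0) * HI_SKEW_G * S_LOAD) / CA
print(round(F_domino, 1))    # about 3.3

# The corresponding figure for the static dual-rail cell of Figure 5:
F_static = (2 * CRY_LOAD + 2 * 3 * S_LOAD) / CA
print(round(F_static, 1))    # 8.6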
What is the delay of this cell? We estimate the parasitic delay of the majority and parity stages is about 3 (why?) and that of the hi-skew gates is 5/6 (Section 8.2.2). So we have

D = N F^(1/N) + P = 2√3.3 + 3 + 5/6 ≈ 7.5

Figure 7 The domino adder cell; as in Figure 5, all inputs and outputs are carried in dual-rail bundles.
Lest you get euphoric over this result, beware that domino circuits, though
fast, require careful design and noise analysis. Making the parity and majority
dynamic gates work properly will probably require secondary precharge transistors on one or more nodes within the pulldown network. A fair amount of
analysis and simulation may be required to demonstrate that the gates work
correctly with sufficiently wide operating margins.
7 Conclusion
The multiplier design example has illustrated some of the strengths of logical
effort:
When a great many design alternatives exist, logical effort can be a simple way
to find the best. The twelve different wiring topologies of the adder cell (two
internal configurations, six external wiring patterns) were easily compared
with logical effort.
Even without detailed design of circuits and transistor sizes, logical effort
gives a delay estimate. We were able to estimate the delay of an n × m array implemented in static gates to be 12.9n + 3.6 ln m + 3.8 (see Equations 3
and 6).
Preliminary delay estimates reveal weaknesses in designs. The time required
to generate the product bits, Pij , is a significant fraction of the total delay.
Exercises
1. The analysis of the single-rail circuit, Figure 6, approximated the sizes for
the majority and parity gates. Work out the best sizes for these gates.
2. Analyze the domino form of the adder cell to determine whether the V4
configuration is the best and what the relative loading of the carry and sum
signals should be.
3. At the end of Section 4, we pointed out that the adder array might be faster if
different rows of adder cells used different designs. Estimate the maximum
speed increase that could be obtained in the given example (n = 3).