A simple algorithm for fanout optimization using high-performance buffer libraries
1063-6757/93 $03.00 © 1993 IEEE
Authorized licensed use limited to: TAMAL DAS. Downloaded on January 20,2025 at 17:41:16 UTC from IEEE Xplore. Restrictions apply.
LT-trees, which are fanout chains terminated by a two-level tree of inverters for extra buffering. Dynamic programming techniques are applied to solve for the minimal-delay implementation. Yoshikawa et al. [6] propose a variation of combinatorial merging able to generate a broad spectrum of buffer-tree topologies. Singh et al. [7] recursively generate fanout trees, which is combined with gate repowering. These algorithms are complex and slow.

By contrast, our algorithm only builds fanout chains. Its simplicity allows us to obtain better solutions when the objective is to minimize area under delay constraints. We incur little or no penalty by reducing the search space because of the freedom we have in our technology to size inverters almost continuously.

Definitions

We now state the fanout problem. The problem inputs are:

- a set of n sinks, labeled S_1 to S_n. Each sink has a given polarity, a load L_k, and a required time R_k. There are p sinks with positive polarity and q with negative polarity;

- a driving gate, or source, with a worst-case arrival time A;

- a set of inverters (differing in size) from a library.

We want as solution a fanout tree of minimum area that meets the delay constraints, i.e. such that the required arrival times for all the sinks are satisfied.

loads of the replaced sinks, and as required time the earliest of the replaced sinks' times.

Our aggregate sinks are labeled S_1 to S_c, with loads L_1 to L_c and required times R_1 to R_c. Thus, chain node N_k has load L_k, and its signal must arrive no later than R_k. The buffer driving N_k is labeled B_k, and its delay is denoted by D_k. Specifying the sink assignment transforms a problem into a simpler but equivalent "aggregate fanout problem."

We now indicate a crucial characteristic of the inverters in our custom library: It has a wide range of inverter sizes so we may assume they are continuous. We define two functions: A(C_out, D) is the area required by a single inverter to drive load C_out with delay D, and C_in(C_out, D) is its input capacitance. Both A and C_in are monotonically increasing in C_out and decreasing in D. Typically C_in is proportional to the area A, but we handle C_in as a separate function to clarify the equations. While size can grow arbitrarily, in practice the library has only inverters up to a "reasonable" size.

Assumptions

We have two classes of assumptions. First, technology assumptions (TA) 1 to 3, which are typical of CMOS technologies over their useful range, and so valid across processes. Data supporting 1 to 3 is found in the Appendix.

1. CMOS inverters have a high gain. Specifically, over the useful range of the library:
Duplicating gates implies extra wires and less efficient routing. Load splitting is common when driving large loads with semi-custom libraries, but a custom methodology allows very large inverters. For our processes, driving a load with one inverter results in 4 to 18 % less area than using two smaller inverters.

We add two methodology assumptions (MA). These assumptions come from our design methodology, rather than CMOS physical properties, and may be violated in special circumstances.

1. No gate in the network can have a delay smaller than D_min, or larger than D_max. This avoids potential coupling noise and reduces power consumption. In practice, D_min and D_max are within a 1:4 range.

2. Critical paths should not contain inverters. This cannot be guaranteed in all cases. However, to speed up critical paths, signal polarities are chosen to minimize the number of inverters. If two paths need a signal in opposite polarities, the most critical path gets it inversion-free.

We can place the sinks at different locations in a given fanout chain as long as we respect the polarities. For n sinks and c locations, there are c^n possible sink assignments. An exhaustive enumeration of all possible assignments is too costly. As most other fanout algorithms, our algorithm only considers monotone sink assignments, i.e. sink assignments that place sinks along a fanout chain in order of increasing required times.

Restricting the search to monotone sink assignments is suboptimal in general, as was shown by Touati [1, pp. 46]. However, his example relies critically on the use of a discrete library. With a continuous library, restricting the search to monotone sink assignments does not incur any loss of optimality, as long as our technology and methodology assumptions remain valid.

Theorem 1: For a fanout chain with sinks of both polarities, given required times and a library with almost continuous inverter sizes, the minimal area solution has the sinks of a given polarity placed in order of required times.

Before proving the theorem, we consider a special case. Assume two sink assignments for a fanout chain, PA and PB, such that: the only difference between PA and PB is that in PB, a load ΔL_3 was shifted to a sink of equal polarity but further away from the source.

\[ \mathrm{AreaGain}_{B_3} = A(L_3 + \Delta L_3, D_3) - A(L_3, D_3) \]

but by the mean-value theorem

\[ \mathrm{AreaGain}_{B_3} = \frac{\partial A}{\partial C_{out}}(L_3', D_3)\,\Delta L_3 \]
for some intermediate value L_3 ≤ L_3' ≤ L_3 + ΔL_3. As the load change at node N_2 is only due to the new input capacitance at N_2

then we have the inequality

The same process is repeated back along the chain. If we examine the area gain for B_2, we see it involves the product of ΔL_3 with two partials, ∂A/∂C_out and ∂C_in/∂C_out. By our TA 1 and 2, we neglect AreaGain_B2 when adding up the total area change. As for AreaGain_B1, two terms cause load changes: one due to the removal of ΔL_3, and another due to the increase of the input capacitance of B_2. The area gain due to the larger input capacitance involves the product of two partials of the type ∂C_in/∂C_out with ΔL_3 as one can see by applying the inequality above twice. Thus, we can neglect this term. After these observations, we are left with a total area gain

for some L_1 - ΔL_3 ≤ L_1' ≤ L_1.

As we assumed B_1 is faster than B_3, by TA 2 we obtain the first partial is actually smaller than the second, so we have decreased the total area. This proves the claim.

Case 3: If Sol-A has B_1 sized larger and slower than B_3, Sol-B still has smaller area than Sol-A.

Due to lack of space, we omit the proof. Now we prove Theorem 1.

Proof of the Theorem: Consider a solution with out-of-order sinks. Some sinks with late required times are at earlier points on the chain than other sinks of the same polarity with smaller required times, i.e., there are sinks S_i and S_j, of the same polarity, on nodes N_k and N_m with k < m, such that R_i > R_j (so S_i is less critical). Consider now moving S_i to N_m. By changing one sink assignment, we create a new problem PB from the original one. Note that now PB has the same (or greater) required times as the original problem: R_l^B >= R_l^A for all l. R_k can only have improved, since we relaxed its requirements by removing a sink. But R_m has not changed since we added a noncritical sink. By the Lemma, there is a solution for the new sink assignment at least as small as the best for the out-of-order sink assignment. By repeatedly moving out-of-order sinks to an in-order location without increasing area, we eventually obtain an in-order solution no worse than the original one.

Minimal Area under Delay Constraint

We present two techniques for attacking the problem of minimal area fanout chains under a delay constraint (MADC). Each is appropriate in a different environment. In both cases, the basic algorithm consists of the following steps:

- Try all possible sink assignments compatible with increasing order of required times and sink polarity.

- For each sink assignment, apply an inverter selection algorithm and determine the solution cost.

- Pick the best solution.

We now limit ourselves to chains with at most four inverters. In practice, longer chains are rarely necessary. So, we do not consider this an undue restriction. The arguments can be easily extended to an arbitrary chain, but at a loss in computational efficiency.

As a consequence of Theorem 1, when we analyze sink assignments we can restrict our search to assignments that are ordered with respect to required times. As the number of inverters in our chain is equal to 4, we then have O(p * q) assignments (where p (q) indicates the number of positive (negative) sinks), instead of a number of cases that grows exponentially with sink count. If we used inverter chains of length 2K, the number of cases to consider is O((p * q)^(K-1)), still polynomial in the number of sinks.

Integration with a Tree-based MADC Algorithm

Optimizing fanout trees is typically done when mapping the network. Since an inverter chain is a tree, sizing the chain can be combined with a minimum area under delay constraint (MADC) tree-covering algorithm ([1], [8]). We add the fanout chain to its fanin tree to make a single tree, and optimize the entire unit using the same covering algorithm. We perform fanout chain inverter selection and tree covering in a single step.

Since algorithms such as [9] proceed from tree leaves to tree root, calculating dynamic programs, the integration works well. The dynamic programs for all non-chain nodes (the nodes in the original fanin tree) are calculated first, only once. Then, the
dynamic program for the chain nodes is calculated O(p * q) times (once for each ordered sink assignment) and the best overall solution is chosen. The total complexity is thus O(sinks^2 * treenodes).

Note that the algorithm naturally picks the optimal length of inverter chain to use. Even though the initial network has four inverters, it may map to fewer inverters if this produces less area and it can drive the loads under the delay constraints.

Device Sizing Approach

In summary, we have focused on the case of libraries with a large, almost continuous range of inverter sizes. For such libraries, we have provided experimental evidence that the fanout chain is typically the minimal-area configuration of a fanout tree which satisfies the delay constraints. We present simple, efficient algorithms to explore the relevant space of fanout chains to find the best one. We showed that our techniques give similar or better results than existing techniques at lower cost in software complexity.
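The enumeration of ordered sink assignments described under "Minimal Area under Delay Constraint" can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names Sink, aggregate, ordered_splits, monotone_assignments and best_assignment are hypothetical. The sketch assumes that sinks attach only to the chain nodes, that odd-numbered nodes carry negative-polarity sinks and even-numbered nodes positive-polarity sinks, and that the inverter selection step is replaced by a caller-supplied cost function.

```python
from itertools import combinations_with_replacement, product
from typing import NamedTuple


class Sink(NamedTuple):
    name: str
    load: float      # load L_k, arbitrary capacitance units
    required: float  # required time R_k, in ns


def aggregate(sinks):
    """Aggregate sink of one chain node: summed load, earliest required time."""
    if not sinks:
        return 0.0, float("inf")  # empty node: no load, no timing constraint
    return sum(s.load for s in sinks), min(s.required for s in sinks)


def ordered_splits(sinks, nodes):
    """Cut the sinks, sorted by required time, into `nodes` consecutive groups.

    Group 0 is closest to the source and holds the most critical sinks.
    Groups may be empty.
    """
    ordered = sorted(sinks, key=lambda s: s.required)
    n = len(ordered)
    # nodes - 1 non-decreasing cut positions determine one ordered split.
    for cuts in combinations_with_replacement(range(n + 1), nodes - 1):
        bounds = (0, *cuts, n)
        yield [ordered[bounds[i]:bounds[i + 1]] for i in range(nodes)]


def monotone_assignments(positive, negative, chain_length=4):
    """Yield chains [N_1, ..., N_chain_length], each entry a list of sinks.

    Assumption: N_1, N_3, ... take negative sinks; N_2, N_4, ... positive ones.
    """
    per_polarity = chain_length // 2
    neg_splits = list(ordered_splits(negative, per_polarity))
    pos_splits = list(ordered_splits(positive, per_polarity))
    for neg_groups, pos_groups in product(neg_splits, pos_splits):
        chain = []
        for neg_group, pos_group in zip(neg_groups, pos_groups):
            chain.extend([neg_group, pos_group])
        yield chain


def best_assignment(positive, negative, cost, chain_length=4):
    """Return the cheapest chain under a caller-supplied cost function."""
    return min(monotone_assignments(positive, negative, chain_length), key=cost)
```

With p positive and q negative sinks, the four-inverter case enumerates (p + 1)(q + 1) chains, which matches the O(p * q) bound stated in the text. A real cost function would size the inverters for the aggregate loads and required times of each node and return the resulting area.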
The 0.9ns line gives an isolation ratio of 0.13, and the 0.35ns line corresponds to a ratio of 0.6. Thus, reasonable speeds do have high gains.

TA 2 states that as inverters become faster, they get more self-loaded. Fixing a load C, we see that the constant-delay lines are initially closely spaced and then they get much further apart, showing increased area penalties for the same delay reduction.

Finally, TA 3 says that driving a load in a given time with a single buffer is better than splitting the load into two. Note that the constant delay lines have a y-intercept greater than zero. This means that a "zero load" inverter still has positive area, so using two inverters doubles this cost. The numbers do not include the wiring costs; this effect would make the gain more dramatic. This justifies TA 3.

References

[1] "Performance-Oriented Technology Mapping," H. Touati, PhD Thesis, U. California at Berkeley, 1990.

[2] "Logic Synthesis for VLSI Design," R. Rudell, Memo No. UCB/ERL M89/49, U. California at Berkeley, 1989.

[3] "Sequential Circuit Design using Synthesis and Optimization," E. Sentovich et al., Proc. ICCD 1992, pp. 328-333.

[4] "The Fanout Problem: From Theory to Practice," C.L. Berman et al., in Advanced Research in VLSI: Proc. 1989 Decennial Caltech Conference, C.L. Seitz (Ed.), MIT Press, 1989, pp. 69-99.

[5] "Lattis: An Iterative Speedup Heuristic for Mapped Logic," J. Fishburn, Proc. DAC 1992, pp. 488-491.

[6] "Timing Optimization on Mapped Circuits," T. Yoshikawa et al., Proc. DAC 1991, pp. 112-117.

[7] "A Heuristic Algorithm for the Fanout Problem," K.J. Singh and A. Sangiovanni-Vincentelli, Proc. DAC 1990, pp. 357-360.

[8] "A Near-Optimal Algorithm for Technology Mapping Minimizing Area under Delay Constraints," K. Chaudhary and M. Pedram, Proc. DAC 1992, pp. 492-498.

[9] "DAGON: Technology Binding and Local Optimization by DAG Matching," K. Keutzer, Proc. DAC 1987, pp. 341-347.

[10] "TILOS: A Posynomial Approach to Transistor Sizing," J. Fishburn and A. Dunlop, Proc. ICCAD 1985, pp. 326-328.

[11] "A Convex Optimization Approach to Transistor Sizing for VLSI Circuits," S.S. Sapatnekar et al., Proc. ICCAD 1991, pp. 482-485.
Table 1. Results. Columns: problem; # sinks; # inverters, area, slack (ns); # inverters, area, slack (ns).