A_fanout_optimization_algorithm_based_on_the_effort_delay_model
A Fanout Optimization Algorithm Based on the Effort Delay Model

Peyman Rezvani and Massoud Pedram

Abstract—This paper presents a Logical Effort-based fanout OPtimizer for ARea and Delay (LEOPARD), which relies on the availability of a (near) continuous size buffer library. Based on the concept of logical effort in very large scale integrated circuits, the proposed algorithm attempts to minimize the total buffer area under the required time and input capacitance constraints by constructing the fanout tree topology and assigning the buffer sizes. More precisely, the proposed algorithm produces the optimum fanout tree solution if the fanout tree topology is restricted to a chain of buffers. For the case where a discrete size library of buffers is available, this paper also presents a postprocessing (buffer merging) step that transforms the continuous buffer-sizing solution to a discrete one while minimizing the round-off error. Experimental results show that compared with previous approaches, both for continuous and discrete buffer libraries, LEOPARD achieves a significant reduction in the total buffer area subject to the required time constraints.

Index Terms—Buffer insertion, fanout optimization, gate sizing, logic design, logical effort.

Manuscript received October 7, 2003; revised March 16, 2003. A preliminary version of this work appeared in Proc. Int. Conf. Computer-Aided Design, San Jose, CA, pp. 516–519, Nov. 1999. This paper was recommended by Associate Editor M. D. F. Wong.
The authors are with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089 (e-mail: peyman@usc.edu; [email protected]).
Digital Object Identifier 10.1109/TCAD.2003.819423

I. INTRODUCTION

Quite often in a very large scale integrated (VLSI) design, a signal needs to be distributed to several destinations under a required timing constraint at each destination. Furthermore, in practice, there may also be a limitation on the load that can be driven by the source signal. Fanout optimization is the problem of finding a buffer-tree topology and sizing the buffers in this topology so as to satisfy the constraints. Since these buffers must be picked from the sizes that are available in a given cell library, the more realistic problem is to find the optimum sizes for the buffers from the set of sizes available in the library. This problem has been proved to be NP-complete [1]. While several approaches exist for tackling the fanout optimization problem using simplified delay models [9], [10], new techniques [12] have also been proposed which use more accurate delay models or even take interconnect delay into account [11]. More recently, however, researchers [3] have started to use continuous, as opposed to discrete, size libraries, in the sense that the optimum fanout tree is calculated with the assumption that buffers are available in all sizes. This greatly simplifies the problem and allows the application of more powerful optimization techniques. At the same time, the number of discrete sizes for inverters in a typical application-specific integrated circuit (ASIC) library has increased to the extent that a “near-continuous inverter sizing” model has become a valid and fairly accurate model.

In [2], the authors simplified the fanout optimization problem by restricting the search space to a subset of trees and showed that the results still compare very favorably with the algorithms that consider a larger set of topologies. The authors used a dynamic programming approach to implicitly enumerate the set of so-called LT-trees and find the optimal LT-tree topology and sizing. An LT-tree is either a 2-level buffer type or a chain of buffers with intermediate fanouts to sinks that ends in sinks or in a 2-level tree. Reference [3] also restricted the search space to a certain class of trees, called fanout-free trees, and showed that there still exists an optimal solution in this search space under a gain-based delay model. Fanout-free trees are trees in which a buffer can drive at most one other buffer.

In this paper, an algorithm is presented that finds the fanout tree topology and sizes of the buffers on the tree by decomposing the whole problem into subproblems and solving each subproblem separately for each sink. The solutions to the subproblems are then merged to form the solution to the whole problem. Our derivation relies on the notions of logical and electrical effort first proposed in [4].

Sutherland and Sproull [4] minimized the delay along any single path by assigning equal delay budgets to each stage on that path. While this approach was proven to minimize the delay, it did not necessarily result in an optimal solution in terms of the total buffer area. Kung [3], on the other hand, solved the fanout-optimization problem to minimize the input capacitance seen at the source gate subject to timing constraints for the sinks and without any consideration of the buffer area. In contrast, the approach presented in this paper minimizes the total buffer area subject to a capacitance constraint for the driver. This is an important distinction because it allows one to trade off the propagation delay through the source driver and through the rest of the buffer tree to reduce the total buffer area without too large an increase in the overall delay.

The remainder of this paper is organized as follows. In Section II, the effort delay model that is used throughout this paper is explained. Section III explains the details of the algorithm. In Section IV, experimental results are shown, and in Section V, we conclude the paper.

II. DELAY MODEL

The delay model used in this paper is based on the concept of logical and electrical efforts presented in [4]. The effort-based model is basically a reformulation of the conventional RC model of CMOS gate delay. Using the same terminology as in [4], the delay of a gate is defined to be

$$d = \tau (p + gh) \tag{1}$$

where $\tau$ is a time unit that characterizes the semiconductor process being used. It is only used to convert the unitless part of $(p + gh)$ to a time unit. For simplicity, $\tau$ is not considered from now on. Parameter $p$ is the parasitic delay of the gate. The major contribution to the parasitic delay is the capacitance of the source/drain regions of the transistors that drive the output. Throughout this paper $p_{inv}$ is used as the parasitic delay for an inverter. Parameter $g$ is called the logical effort of the gate and depends only on the topology of the gate and the ability to produce output current. The logical effort for an inverter is assumed to be 1 and, for other gates, calculated based on their internal topologies. The logical effort of a logic gate tells how much worse it is at producing output current than is an inverter, given that each of its inputs may have only the same input capacitance as the inverter. Parameter $h$ (specified for each input pin of the gate) is called the electrical effort (also called gain) of the gate and is defined to be the ratio of the capacitive load driven by the gate to the input capacitance at the corresponding input pin. The electrical effort describes how the electrical environment of the logic gate affects performance and how the size of the transistors in the gate determines its load-driving capability.

The important point is that $p$ and $g$ are independent of the size of the gate, and the only factor that is affected by sizing is the electrical effort $h$. Reference [4] shows how $p$ and $g$ are independent of sizing by doing the reformulation to define the four factors $\tau$, $p$, $g$, and $h$ in terms of the resistance and capacitance of a minimum size inverter and a template gate representing the topology of the gate. For details, refer to [4].
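As a minimal sketch of (1), the following Python snippet evaluates the effort delay of an inverter at two different sizes driving the same load. The helper name `gate_delay` and the capacitance values are illustrative assumptions, not taken from the paper; only $p_{inv} = 0.6$ and $g = 1$ for an inverter come from the text.

```python
def gate_delay(p, g, c_load, c_in, tau=1.0):
    """Effort delay d = tau * (p + g * h), with electrical effort h = c_load / c_in."""
    h = c_load / c_in
    return tau * (p + g * h)


P_INV = 0.6   # parasitic delay of an inverter (value used in the experiments)
G_INV = 1.0   # logical effort of an inverter

# Hypothetical example: the same load driven by a small and a large inverter.
# Only h changes with the inverter size; p and g stay fixed.
d_small = gate_delay(P_INV, G_INV, c_load=8.0, c_in=1.0)  # h = 8
d_large = gate_delay(P_INV, G_INV, c_load=8.0, c_in=4.0)  # h = 2
print(d_small, d_large)
```

The larger inverter is faster at driving the load, but it presents four times the input capacitance to whatever drives it, which is the trade-off the rest of the paper exploits.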
find the optimum number of buffers for a buffer chain and the appropriate sizing for them to minimize the total buffer area such that the delay from $Q$ to $S$ is less than or equal to $T_R$, the required polarity $P$ is achieved, and the capacitive load imposed on $Q$ is no more than $C_{in}$.

Multiple-sink fanout optimization (mFO) problem: Given the source of a signal $Q$ with maximum driving capability $C_{in}$ along with a set of $m$ sinks $S_i$, each of which is assigned a triplet $(C_L, T_R, P_i)$ where $C_L$ is the capacitive load, $T_R$ is the required arrival time, and $P_i$ is the required polarity for the sink $S_i$, find a fanout tree of buffers and the appropriate sizing for them to minimize the total buffer area such that the timing constraint and the polarity required at each sink are satisfied and the capacitive load imposed on $Q$ is no more than $C_{in}$.

Note that the only difference between the two problems is the number of sinks to be driven. Area, the objective function in both of these problems, is considered to be the summation of input capacitances of all the buffers, which is reasonable with the assumption of continuous sizing for the gates.

The rest of this section is organized as follows. The 1FO problem is solved in Section III-A, and in Section III-B, the mFO problem is solved based on the solution derived for the 1FO problem.

A. Buffer Chain

For the 1FO problem, the solution is a chain of buffers between the source and the sink (Fig. 1). The variables of the problem are defined to be the number of buffers $n$ and the electrical efforts of these buffers $h_1, h_2, \ldots, h_n$.

Since the logical effort for an inverter is 1, the delay through the buffer chain can be expressed in terms of $n$ and the $h_i$s as follows:

$$\text{delay} = n p_{inv} + \sum_{i=1}^{n} h_i. \tag{2}$$

The overall area, which is calculated as the summation of the input capacitances of all the buffers, is

$$\text{area} = \sum_{i=1}^{n} C_i = \sum_{i=1}^{n} \frac{C_L}{\prod_{j=i}^{n} h_j}. \tag{3}$$

The goal would be to find $n$ and all $h_i$s to minimize area while both timing and input capacitance constraints are satisfied. That is

$$\begin{aligned} \text{Min} \quad & \text{area} \\ \text{s.t.} \quad & \text{delay} \le T_R \\ & C_1 \le C_{in}. \end{aligned}$$

Theorem 1: In the 1FO problem, delay through the optimum buffer chain is exactly equal to the specified required time $T_R$, i.e., $\text{delay} = T_R$.

constant and equal to

$$\sum_{i=1}^{n} h_i = T_R - n p_{inv}. \tag{4}$$

Hence, the claim is proved.

To find the optimum number of buffers $n$, the maximum input capacitance constraint $C_1 \le C_{in}$ is used, where $C_1$ is the input capacitance of the first buffer in the chain being driven by the source signal and $C_{in}$ is the given constraint on the input capacitance.

The input capacitance for the first buffer is computed as follows:

$$C_1 = \frac{C_L}{\prod_{i=1}^{n} h_i}. \tag{5}$$

Let the electrical effort of the chain be defined as the product of electrical efforts of all the buffers, and let it be denoted by $H$. Using the above equation, the input capacitance constraint can be restated as follows:

$$H = \prod_{i=1}^{n} h_i = \frac{C_L}{C_1} \ge \frac{C_L}{C_{in}}. \tag{6}$$

Theorem 2: In the 1FO problem, for a fixed number of buffers $n$ in the chain, the electrical effort of the buffer chain $H$ achieves its maximum value when all $h_i$s are equal.

Proof: According to Lemma 1, the summation of all $h_i$s is constant for any given number of buffers. Since the product of some variables with a constant summation is maximum when all those variables are equal, all $h_i$s have to be equal to maximize $H$.

The electrical effort of each buffer for the buffer chain that maximizes $H$, according to Theorem 2 and (4), would then be

$$\hat{h}_i = \hat{h} = \frac{T_R - n p_{inv}}{n}, \quad \forall i = 1, \ldots, n \tag{7}$$

and the corresponding maximum value of $H$ is

$$\hat{H} = \left( \frac{T_R - n p_{inv}}{n} \right)^{n}. \tag{8}$$

$\hat{H}$ is drawn in Fig. 2 for $T_R = 14$ and $p_{inv} = 0.6$.

According to Theorem 2, there is a maximum value that $H$ can achieve for any given buffer count. Therefore, the only buffer counts that are feasible are those for which the maximum value that $H$ achieves is not less than the ratio $C_L / C_{in}$ (6), and those correspond to the buffer counts between the points of intersection of $\hat{H}$ and the line $C_L / C_{in}$ (Fig. 2). As an example, for Case I in Fig. 2, there is no feasible solution because there are no intersection points and $\hat{H}$ lies below $C_L / C_{in}$ for all buffer counts. For Case III, on the other hand,
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 12, DECEMBER 2003 1673
Fig. 2. Plot of $\hat{H} = \mathrm{Max}(\prod h_i)$ versus $n$.

5 and 6, the optimum sizing for the buffers on the chain is found by solving a convex optimization problem as follows:

$$\begin{aligned} \text{Min} \quad & \frac{C_L}{h_1 h_2 \cdots h_n} + \frac{C_L}{h_2 \cdots h_n} + \cdots + \frac{C_L}{h_n} \\ \text{s.t.} \quad & h_1 + \cdots + h_n \le T_R - n p_{inv} \\ & h_1 \cdots h_n \ge \frac{C_L}{C_{in}}. \end{aligned} \tag{11}$$

This is a minimization of a posynomial function with posynomial inequality constraints that can be easily solved in polynomial time [6]. Finally, among all of the solutions, the one with the minimum area is selected as the optimum solution.
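The objective and constraints of (11) can be evaluated for any candidate vector of electrical efforts. The sketch below only checks candidates; it does not solve the posynomial program, for which the paper relies on a convex solver. The function names and the numeric values ($C_L = 64$, $C_{in} = 1$) are illustrative assumptions.

```python
import math


def chain_area(h, c_load):
    """Objective of (11): sum of buffer input capacitances, C_i = c_load / (h_i * ... * h_n)."""
    area, cap = 0.0, c_load
    for h_i in reversed(h):   # walk from the sink back toward the source
        cap /= h_i            # input capacitance of this buffer
        area += cap
    return area


def chain_delay(h, p_inv):
    """Delay of the chain: n * p_inv plus the sum of the electrical efforts."""
    return len(h) * p_inv + sum(h)


def is_feasible(h, c_load, c_in, t_req, p_inv, tol=1e-9):
    """Check the timing constraint and the input capacitance constraint C_1 <= C_in."""
    c_1 = c_load / math.prod(h)
    return chain_delay(h, p_inv) <= t_req + tol and c_1 <= c_in + tol


# Hypothetical single-sink instance: C_L = 64, C_in = 1, p_inv = 0.6.
equal = [4.0, 4.0, 4.0]     # equal efforts: delay 13.8
tapered = [2.0, 4.0, 8.0]   # same product H = 64, larger efforts near the sink: delay 15.8
for t_req in (14.0, 16.0):
    for h in (equal, tapered):
        print(t_req, h, chain_area(h, 64.0), is_feasible(h, 64.0, 1.0, t_req, 0.6))
```

With these numbers the equal-effort chain meets $T_R = 14$ with an area of 21, while the tapered chain needs $T_R \ge 15.8$ but has an area of only 11. This is consistent with the observation that equal stage efforts minimize delay but not necessarily area.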
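It is interesting to note that by taking the derivative of $\hat{H}$ with respect to $n$ and setting it equal to zero, its maximum value is found to be at

$$\hat{n} = T_R \times \phi(p_{inv}) \tag{12}$$

where

$$\phi(p_{inv}) = \frac{\mathrm{Lambert}\left(\frac{p_{inv}}{e}\right)}{p_{inv}\left(\mathrm{Lambert}\left(\frac{p_{inv}}{e}\right) + 1\right)}. \tag{13}$$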
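The feasibility argument and the location of the maximum can be checked numerically. The sketch below assumes the maximum chain effort for $n$ buffers is $((T_R - n p_{inv})/n)^n$, which follows from Theorem 2 and (4), and scans the buffer-count range $[1, \lfloor T_R / p_{inv} \rfloor]$. The Lambert W function is computed with a small Newton iteration written for this example (a library routine such as SciPy's `lambertw` could be used instead); all function names are illustrative.

```python
import math


def max_chain_effort(n, t_req, p_inv):
    """Largest achievable H for n buffers: all efforts equal to (T_R - n*p_inv)/n."""
    h = (t_req - n * p_inv) / n
    return h ** n if h > 0 else 0.0


def feasible_counts(c_load, c_in, t_req, p_inv):
    """Buffer counts whose maximum chain effort reaches the ratio C_L / C_in, as in (6)."""
    n_max = math.floor(t_req / p_inv)
    return [n for n in range(1, n_max + 1)
            if max_chain_effort(n, t_req, p_inv) >= c_load / c_in]


def lambert_w(x, iters=50):
    """Principal branch of Lambert W for x >= 0, via Newton's method on w*exp(w) = x."""
    w = math.log1p(x)  # starting point at or above the root for x >= 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w


def n_hat(t_req, p_inv):
    """Real-valued buffer count that maximizes the chain effort, as in (12) and (13)."""
    w = lambert_w(p_inv / math.e)
    return t_req * w / (p_inv * (w + 1.0))


# Setting of Fig. 2: T_R = 14, p_inv = 0.6; the ratios C_L / C_in are hypothetical.
print(feasible_counts(50.0, 1.0, 14.0, 0.6))    # a window of consecutive buffer counts
print(feasible_counts(100.0, 1.0, 14.0, 0.6))   # empty: the curve never reaches the ratio
print(n_hat(14.0, 0.6))                         # lies between 3 and 4
```

Because $\hat{n}$ is generally not an integer, the best integer buffer count is one of its two neighbors; for $T_R = 14$ and $p_{inv} = 0.6$ the scan shows the largest chain effort at $n = 4$.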
Fig. 5. Input capacitance allocation for a fanout-free buffer tree.

Therefore, if $T^*$ is the optimal fanout tree with the proper sizing of buffers, it can be split into a fanout-free tree consisting of a set of buffer chains $T$, which has the same area as $T^*$, according to Theorem 4, and also satisfies the timing and input capacitance constraints (Fig. 5). First, $T$ will be found by using the optimal algorithm presented in Section III-A. The method used to transform $T$ into $T^*$ will be discussed later.

The 1FO problem was stated such that the maximum input capacitance allowed was given. Therefore, before the mFO problem can be broken down into 1FO problems, different portions of $C_{in}$ need to be allocated to each branch (Fig. 5).

Input capacitance allocation (ICA) problem: Given a number of sinks, each with a required time, capacitive load, and required polarity, and a total budget on input capacitance $C_{in}$, allocate portions of $C_{in}$ to each branch, such that the total area is minimized while the given constraints for all sinks are satisfied.

In this section, it is first proven that the ICA problem is NP-complete and then a heuristic is proposed for solving this problem.

Intuitively speaking, the input capacitance allocation problem is similar to the Knapsack problem, where objects of the Knapsack problem correspond to the capacitance budgets of each branch and the total capacitance is limited by the input capacitance constraint $C_{in}$, which corresponds to the Knapsack volume.

Before it can be formally proven that this problem is NP-complete, the behavior of area must be studied as a function of input capacitance for each branch. The valid range for the buffer count on branch $i$ is $[1, \lfloor T_R / p_{inv} \rfloor]$, according to (10). For each buffer count $n$ in this range, there exists a maximum electrical effort for the buffer chain, according to (8). Therefore, because of the capacitance constraint in (6), there exists a minimum required input capacitance as follows:

$$\underline{C}_i = \frac{C_L}{\left( \frac{T_R - n p_{inv}}{n} \right)^{n}} \tag{15}$$

where the denominator is the maximum value that can be achieved by $\prod h$, according to (8). On the other hand, there exists a maximum beneficial input capacitance, $\overline{C}_i$, for each buffer count, which means that allocating an input capacitance larger than $\overline{C}_i$ will not improve area any further. This value can be calculated using the same optimization problem as in (11) but with the capacitance constraint dropped. That is

$$\{h\} = \arg\min\ \text{area}_n \quad \text{s.t.} \quad \text{delay}_n \le T_R$$

and then calculating $\overline{C}_i$ as follows:

$$\overline{C}_i = \frac{C_L}{\prod h}.$$

Obviously, any input capacitance larger than $\overline{C}_i$ will not improve area any further because allocating $\overline{C}_i$ already results in the same solution as when the capacitance constraint is dropped.

Now that there exists a range for input capacitance for each buffer count, it can be proven that area is a decreasing function of input capacitance in this range.

Theorem 5: For a fixed number of buffers in a buffer chain, the area cost is a decreasing function of input capacitance for $\underline{C}_i \le C_{in} \le \overline{C}_i$.

Proof: Increasing input capacitance $C_{in}$ for a branch will decrease the ratio $C_L / C_{in}$ in the capacitive constraint of the optimization problem in (11). Therefore, there either exists a better solution with smaller area or, if not, the same solution with the same area is still achievable. Hence, increasing input capacitance will not increase area and, therefore, area is a decreasing function of input capacitance and the claim is proven.

Area versus input capacitance for some buffer count will, therefore, look something like the graph shown in Fig. 6(a). As shown in Fig. 6(a), no feasible solution exists for input capacitances smaller than $\underline{C}_i$ and the area stays the same for input capacitances larger than $\overline{C}_i$. Different buffer counts in the range $[1, \lfloor T_R / p_{inv} \rfloor]$ result in the graphs shown in Fig. 6(b). The minimum area over all buffer counts will, therefore, look like the graph shown in Fig. 6(c). This piecewise nature of area versus input capacitance, which is due to different buffer counts, causes the ICA problem to be NP-complete.

Theorem 6: The ICA problem is NP-complete.

Proof: To perform the proof, the 0-1 Knapsack problem will be reduced to the ICA problem. In the conventional version of the Knapsack problem, each item has a size and a value and the objective is to maximize the total value. In the ICA problem, however, the objective is to minimize area. Therefore, we will consider the negative of area, rather than the area itself, so as to make the problem a maximization problem rather than a minimization one [Fig. 7(a)].

The value versus size curve for some item of the 0-1 Knapsack problem is shown in Fig. 7(b). The point about this graph is that it is not a continuous one. For sizes below $s_i$, the value is zero, and for sizes greater than $s_i$, the value is $v_i$. Assuming $\epsilon$ to be the accuracy of the machine, the graph can be modified to the one shown in Fig. 7(c) to make it a continuous one. Note that the graph may have any arbitrary behavior in the range between $s_i$ and $s_i + \epsilon$. This new graph is a special case of the graph shown in Fig. 7(a), in which the curve has become linear. Since the 0-1 Knapsack problem is NP-complete, the ICA problem is NP-hard as well; otherwise one could formulate the 0-1 Knapsack problem as an ICA problem and solve it in polynomial time. Note that the NP-hardness of ICA is because of the piecewise nature of the area versus input capacitance curve and that, in turn, is because area is represented by different functions for different buffer counts. Now that it has been proven that ICA is NP-hard, it must be shown that a candidate solution to the decision version of ICA can be checked in polynomial time. This is obviously true because one can easily add up the input capacitances of each branch and compare the total with the input capacitance budget $C_{in}$. This can be done in linear time, meaning ICA is in NP, and since it was proven that ICA is NP-hard, the ICA problem is NP-complete.

After proving that ICA is an NP-complete problem, this section proceeds by proposing a heuristic method for allocating input capacitances to each branch.

Let $m$ denote the number of sinks and, thus, the number of branches. Consider the $k$th branch ($1 \le k \le m$). $H_k$, the maximum electrical effort of the $k$th branch, has its minimal value of 1 at $n_k = 0$ ($\lim H$ when $n$ tends toward 0). On the other hand, $H_k$ cannot be any larger than $\Phi(T_R, p_{inv})$, the value of $H_k(n_k)$ when $n_k$ is calculated from (12). According to (5), the maximum value of $H_k$ corresponds to the minimum value of $C_{ik}$. Therefore, the minimum acceptable input capacitance would be

$$\underline{C}_k = \frac{C_{L_k}}{\Phi(T_R, p_{inv})}. \tag{16}$$
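The lower bound on the capacitance that a branch needs can be sketched numerically. The snippet below evaluates the per-buffer-count minimum of (15) and takes the smallest value over the integer buffer counts in $[1, \lfloor T_R / p_{inv} \rfloor]$. Note the assumption: (16) uses the maximum of the chain effort at the real-valued buffer count from (12), so scanning integers, as done here, gives a bound that is slightly larger than or equal to the one in (16). Function names and the load value are hypothetical.

```python
import math


def min_input_cap(c_load, n, t_req, p_inv):
    """Eq. (15): smallest input capacitance for which an n-buffer chain can meet T_R."""
    h = (t_req - n * p_inv) / n          # equal effort per stage, as in (7)
    if h <= 0:
        return math.inf                  # no time budget left for this buffer count
    return c_load / h ** n


def min_acceptable_cap(c_load, t_req, p_inv):
    """Smallest value of (15) over integer buffer counts; returns (capacitance, n)."""
    n_max = math.floor(t_req / p_inv)
    return min((min_input_cap(c_load, n, t_req, p_inv), n)
               for n in range(1, n_max + 1))


# Hypothetical branch: C_L = 64 with T_R = 14 and p_inv = 0.6.
cap, n_best = min_acceptable_cap(64.0, 14.0, 0.6)
print(n_best, cap)
```

Any allocation below the returned capacitance leaves this branch without a feasible buffer chain, whatever buffer count is chosen.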
Fig. 6. (a) Area versus input cap for some buffer count $n$. (b) Area versus input cap for different buffer counts. (c) Minimum area versus input cap.

Fig. 7. (a) Area versus input cap. (b) Value versus size for an item of knapsack problem. (c) Modified value versus size graph.

Allocating any capacitance less than $\underline{C}_k$ to any branch will make that branch infeasible. Hence, $m$ new positive variables $x_k$ for $k = 1, \ldots, m$ are introduced such that

$$C_{ik} = \underline{C}_k + x_k. \tag{17}$$

This way, one can be sure that the minimum required capacitance is allocated to each branch. The heuristic is to find $x_k$s in such a way that their ratio is proportional to the positive slope of the $\hat{H}$ graph in Fig. 2. The motivation behind this heuristic is the fact that for two different branches to have the same change in buffer count, the branch with smaller slope would need a smaller change in $C_L / C_{in}$. When a branch is given a wider range of buffer counts to explore, a better solution will likely be found. For an example, refer to Fig. 8. Branch 1 has a larger slope compared to branch 2; therefore, a larger change in $C_L / C_{in}$ for branch 1 is required to have the same buffer count range as branch 2. Since $C_L$ is given and fixed for each branch, changing $C_L / C_{in}$ corresponds to changing the $C_{in}$ allocated to that branch.

The proposed heuristic is shown in Fig. 9. Line 4 finds $x_k$s such that the desired ratio between them, as discussed above, is fulfilled. The slope for each branch is estimated as follows:

$$\text{slope}_k = \frac{y_{max} - y_{min}}{x_{max} - x_{min}} = \frac{\Phi(T_R, p_{inv}) - 1}{T_R\, \phi(p_{inv}) - 0}. \tag{18}$$

After finding the allocated input capacitances, $m$ instances of the 1FO problem will be generated that can be optimally solved by the algorithm presented in Section III-A.

C. Merging Buffer Chains

So far, a continuous-sized buffer library has been assumed. In reality the ASIC library has a finite (and hopefully large) number of inverter sizes. So the solution needs to be mapped to one consistent with the library. The main problem when rounding the inverter sizes is that it may result in significant errors. To alleviate this problem, the merging
transformation, which is the opposite of the split transformation introduced in Fig. 4, is used.

Fig. 9. Algorithm InCapAlloc.

TABLE I
COMPARISON WITH SUTHERLAND

To show how this works, recall Theorem 4. If the electrical efforts of the buffers on two branches are equal, one can merge them and replace them with a single buffer with the same electrical effort. Note that simply because the electrical efforts of the buffers are the same, one cannot conclude that the buffer sizes are also the same. As shown in Fig. 4, the sizes of each of the buffers before merging are $C_1 / h$ and $C_2 / h$, respectively, and the size of the buffer after merging is $(C_1 + C_2) / h$. Therefore, the size of the buffer after merging is equal to the summation of buffer sizes before merging. This fact can be used to reduce the rounding error. As an example, consider a buffer size of 0.35 that has to be mapped to a buffer size of 1 in the ASIC library. Now, if two buffers of size 0.35 could be merged to a single buffer, the size would be 0.7, and rounding to a buffer size of 1 would result in smaller error.

Clearly, one has to be concerned about satisfying the required time and input capacitance constraints when performing this transformation. The merging should be performed in such a way that all timing constraints are satisfied and the area (as well as the input capacitance of the very first stage) is the same. As noted in the proof of Theorem 4, for the merging transformation to produce the exact same area and delay, the electrical efforts of the buffers to be merged must be equal. However, because each branch of the fanout tree is optimized separately with respect to the corresponding sink, the electrical efforts of the buffers may not necessarily be equal. Thus, a constant $\varepsilon$ is defined and two buffers are merged if the difference between their electrical efforts is less than or equal to $\varepsilon$ percent. In addition, two buffers are merged if the rounding error after merging the two is smaller than the summation of rounding errors of each buffer before the merge operation. Obviously, the efficiency of this approach is dependent on the order in which the buffers are selected to be merged. The approach presented here is to cluster the buffers into groups of nearly equal electrical efforts and check for the merging possibilities inside each group. Merging is performed starting at the source of the signal, and proceeding toward the sinks, while at the same time preserving the area so as not to increase the capacitive load imposed on the previous stage. The pseudocode for a recursive merging algorithm is shown in Fig. 10.

IV. EXPERIMENTAL RESULTS

Three different sets of experiments were performed. In the first set, the LEOPARD algorithm of Section III was compared with an implementation of the Sutherland algorithm [4], which minimizes delay through a path. The results are reported in Table I.
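Before turning to the remaining experiments, the merging test of Section III-C can be illustrated with a short sketch. It is not the recursive algorithm of Fig. 10. It only encodes the two stated conditions for a pair of buffers, under the assumptions that both conditions must hold together, that the effort difference is measured relative to the larger effort, and that the library sizes are the hypothetical values `[1, 2, 4]`.

```python
def round_to_library(size, library):
    """Map a continuous buffer size to the closest size available in the library."""
    return min(library, key=lambda s: abs(s - size))


def rounding_error(size, library):
    """Absolute error introduced by rounding a buffer size to the library."""
    return abs(round_to_library(size, library) - size)


def should_merge(size1, size2, h1, h2, library, eps_percent):
    """Pairwise merge test: efforts within eps percent AND smaller rounding error after merging."""
    efforts_close = abs(h1 - h2) <= eps_percent / 100.0 * max(h1, h2)
    merged_error = rounding_error(size1 + size2, library)   # merged size is the sum of sizes
    separate_error = rounding_error(size1, library) + rounding_error(size2, library)
    return efforts_close and merged_error < separate_error


LIBRARY = [1.0, 2.0, 4.0]  # hypothetical discrete inverter sizes

# Example from the text: two buffers of size 0.35 versus one merged buffer of size 0.7.
print(rounding_error(0.35, LIBRARY) * 2, rounding_error(0.7, LIBRARY))
print(should_merge(0.35, 0.35, 3.0, 3.0, LIBRARY, eps_percent=5.0))
```

For the example in the text, rounding two separate buffers of size 0.35 up to size 1 costs a total error of 1.3, whereas the merged buffer of size 0.7 costs about 0.3.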
For all of the experiments, the minimization problems within the LEOPARD algorithm were solved using the Matlab Optimization Toolbox v. 2.0. Furthermore, $p_{inv}$ was assumed to be 0.6. For each circuit, the capacitive load of the sink and the maximum capacitance that the source can drive were given. First, the path delay was minimized using Sutherland’s method. Delay and area of the minimum-delay buffer chain are reported in columns 2 and 3. Next, the resulting delay and polarity were used as the constraints for the area minimization problem in LEOPARD. In the 4th column, the minimum area generated by LEOPARD, subject to the given constraints, is shown. As expected, the area is almost the same because delay has been minimized and, hence, the timing constraint is so tight that there is not much room for reducing area. However, when LEOPARD was given a 5% additional slack, it reduced area by an average of 29% as shown in columns 6 and 7. This shows how delay can be traded off for area to significantly reduce area using LEOPARD if a slight increase in delay can be afforded. Note that merging or rounding is not applied during this set of experiments and the area reported is the summation of input capacitances of all inverters.

In the next set of experiments, the results from LEOPARD are compared with the results of an implementation of Kung’s algorithm [3]. For each circuit, a number of sinks with capacitive load, required time, and required polarity are given. The number of sinks for each circuit is shown in column 2. Kung’s algorithm was first used to minimize capacitive load on the source. The resulting capacitance and area are reported in columns 3 and 4. The capacitance calculated by Kung’s algorithm was then used as the capacitive constraint for area optimization in LEOPARD. The resulting area is reported in column 5. Finally, an additional 5% input capacitance was allowed for each circuit to further reduce area, and the resulting input capacitance and area are shown in columns 6 and 7. An average of 19% improvement in area is achieved at the expense of 5% additional input capacitance. Note that in this set of experiments, neither merging nor rounding was performed for Kung’s algorithm or LEOPARD and the area reported in Table II is the total capacitance of inverters calculated by the algorithms rather than extracted from the library. $p_{inv}$ is assumed to be 0.6.

Finally, our last set of experimental results compares LEOPARD with the sequential interactive synthesis (SIS) fanout optimization program. SIS runs different fanout optimization programs, namely LT-Tree, Two-Level, Bottom-Up, and Balanced, and the best one is reported [14]. In this set of experiments, a standard cell library consisting of ten different inverters was used. For each inverter, the intrinsic delay and $R_{out}$ were specified for the SIS library delay model and $p_{inv}$ and $\tau$ were specified for the Sutherland delay model. A very good match between the SIS delay and logical effort delay model values was enforced.

The fanout optimization programs of SIS were first used to perform fanout optimization. The results are reported in column 6 of Table III. Then, the delay and input capacitance resulting from SIS were used as constraints for LEOPARD. The results, assuming a continuous-size buffer library, are reported in column 3. Then, merging and mapping to the real buffers in the ASIC library were performed, and the results are shown in columns 4 and 5. As shown in the table, in the case of continuous sizing the area is expressed in terms of the capacitances, but for the discrete-sized buffers, it is the actual buffer area extracted from the library. Results show an average of 38% area improvement for LEOPARD.

V. CONCLUSION

This paper presented an optimal algorithm for buffer chains to minimize area with the assumption of continuous sizing for the buffers. The algorithm finds the optimum number of buffers and the optimum sizing for them by solving a posynomial minimization problem subject to posynomial inequality constraints which can be easily and quickly solved by a convex program solver. Based on this algorithm, a heuristic method was presented for the general case of buffer trees. Considering the fact that the number of discrete sizes for buffers in typical libraries has greatly increased, the assumption of a near-continuous buffer library is fairly accurate.

REFERENCES

[1] C. L. Berman, J. L. Carter, and K. F. Day, “The fanout problem: from theory to practice,” in Advanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conferences, C. L. Seitz, Ed. Cambridge, MA: MIT Press, 1989, pp. 69–99.
[2] K. Kodandapani, J. Grodstein, A. Domic, and H. Touati, “A simple algorithm for fanout optimization using high-performance buffer libraries,” in Proc. Int. Conf. Computer-Aided Design, 1993, pp. 466–471.
[3] D. S. Kung, “A fast fanout optimization algorithm for near-continuous buffer libraries,” in Proc. 35th Design Automation Conf., 1998, pp. 352–355.
[4] I. E. Sutherland and R. F. Sproull, “Logical effort: Designing for speed on the back of an envelope,” in Advanced Research in VLSI. Santa Cruz, CA: Univ. of Calif., 1991.
[5] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth, “On the Lambert W function,” Adv. Computat. Math., vol. 5, pp. 329–359, 1996.
[6] P. M. Vaidya, “A new algorithm for minimizing convex functions over convex sets,” in Proc. IEEE Foundations Comput. Sci., Oct. 1989, pp. 332–337.
[7] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison Wesley, 1980.
[8] P. Rezvani, A. Ajami, M. Pedram, and H. Savoj, “LEOPARD: A logical effort-based fanout optimizer for area and delay,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1999, pp. 516–519.
[9] M. C. Golumbic, “Combinational merging,” IEEE Trans. Comput., vol. 25, pp. 1164–1167, Nov. 1976.
[10] K. J. Singh and A. Sangiovanni-Vincentelli, “A heuristic algorithm for the fanout problem,” in Proc. 27th Design Automation Conf., June 1990, pp. 357–360.
[11] A. Salek, J. Lou, and M. Pedram, “A simultaneous routing tree construction and fanout optimization algorithm,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1998, pp. 625–630.
[12] P. Cocchini, M. Pedram, G. Piccinini, and M. Zamboni, “Fanout optimization under a submicron transistor-level delay model,” IEEE Trans. Computer-Aided Design, vol. 9, pp. 339–349, Mar. 1990.
[13] Y.Yu Nesterov and A. Nemirovsky, Interior point polynomial methods in convex programming. Philadelphia, PA: SIAM, 1994.
[14] H. J. Touati, “Performance-Oriented Technology Mapping,” Ph.D. dissertation, Univ. California, Berkeley, 1990.

Reliability-Constrained Area Optimization of VLSI Power/Ground Networks Via Sequence of Linear Programmings

Sheldon X.-D. Tan, C.-J. Richard Shi, and Jyh-Chwen Lee

these methods have been incorporated into commercial computer-aided design (CAD) tools and used by industry.

One major obstacle is that these methods are based on constrained nonlinear programming, a process known to be computationally intensive (NP-hard) [12]. These methods are applicable only to small size problems, while P/G networks in today’s very large scale integration (VLSI) design may contain millions of wire segments (therefore, millions of variables). On the other hand, with the continuous shrinking of the chip feature size, P/G network optimization is becoming increasingly important, since more and more portions of the chip area are dedicated to P/G routings, and the problems of IR drop and electromigration deteriorate.

In this paper, we present a new method capable of solving the P/G optimization problem orders of magnitude faster than the best known method. Our method is inspired by a key observation made by Chowdhury that if currents in wire segments are fixed, and voltages are used as variables, then the resulting optimization problem is convex [8]. However, instead of using the conjugate gradient method as in [8], we show that the problem can be solved elegantly by a sequence of linear programs. We prove that there always exists a sequence of linear programs that converge to the optimal solution of the original convex optimization problem. Experimental results have demonstrated that usually a few linear programs are required to reach the optimal solution. The complexity of the proposed method is proportional to the complexity of linear programming (which can be solved in polynomial time [5], [12]). Therefore, our method is scalable, i.e., the CPU time increases approximately polynomially with the size of a network. In practice,
we have observed that the new method is orders of magnitude faster
Abstract—This paper presents a new method of sizing the widths of than the conjugate gradient method with constantly better optimization
the power and ground routes in integrated circuits so that the chip area results.
required by the routes is minimized subject to electromigration and IR This paper is organized as follows. Section II reviews some previous
voltage drop constraints. The basic idea is to transform the underlying work. Section III describes the formulation of the P/G network opti-
constrained nonlinear programming problem into a sequence of linear
programs. Theoretically, we show that the sequence of linear programs mization problem. The new method is presented in Section IV. Some
always converges to the optimum solution of the relaxed convex opti- practical considerations are described in Section V. Experimental re-
mization problem. Experimental results demonstrate that the proposed sults from some large P/G networks are summarized in Section VI.
sequence-of-linear-program method is orders of magnitude faster than the Section VII concludes the paper.
best-known method based on conjugate gradients with constantly better
solution qualities.
Index Terms—Circuit modeling, linear programming, power distribu- II. PREVIOUS WORK
tion network, simulation and optimization.
It is generally assumed that the average current drawn by each
module is known and is modeled as an independent current source
I. INTRODUCTION (we do not consider the temporal correlations of current sources). The
Power/ground (P/G) networks connect the P/G supplies in the circuit constraints from reliability and design rules include: 1) IR voltage
modules to the P/G pads on a chip. An important problem in P/G net- drop constraints; 2) metal-migration constraints; 3) minimum width
work design is to use the minimum amount of chip area for wiring P/G constraints; and 4) equal width constraints. The problem of deter-
networks, while avoiding potential reliability failures due to electromi- mining the widths of wire segments of a P/G network to minimize the
gration and excessive IR drops. Specifically, we are concerned with the total P/G routing area subject to all these constraints is a constrained
problem of P/G-network optimization where the topologies of P/G net- nonlinear optimization problem [6], [7].
works are assumed to be fixed, and only the widths of wire segments In the method of Chowdhury and Breuer [6], resistance values and
are to be determined. Several methods have been developed to solve branch currents are selected as independent variables. Both the objec-
this problem [6]–[9]. However, to the best of our knowledge, none of tive function and the IR voltage drop constraints become nonlinear. The
augmented Lagrangian method combined with the steepest descent al-
gorithm [1] is used to solve the resulting problem.
Manuscript received August 17, 2002; revised February 3, 2003. Some pre-
Dutta and Marek-Sadowska [9] used only resistance values as vari-
liminary results of this paper were presented at the ACM/IEEE 38th Design
Automation Conference, New Orleans, LA, June 1999. This paper was recom- ables. All of the constraints expressed in terms of nodal (terminal)
mended by Associate Editor M. Sarrafzadeh. voltages and branch currents, which have to be obtained by explicitly
S. X.-D. Tan is with the Department of Electrical Engineering, University of solving an electrical network, become nonlinear. The feasible direction
California, Riverside, CA 92521 USA (e-mail: [email protected]). method [4] is employed to solve the nonlinear optimization problem.
C.-J. R. Shi is with the Department of Electrical Engineering, University of
Washington, Seattle, WA 98195 USA. At each iteration step, extra effort is required to solve the electrical net-
J.-C. Lee is with Synopsys Inc., Mountain View, CA 94043 USA. work for nodal voltages and branch currents, as well as their gradients
Digital Object Identifier 10.1109/TCAD.2003.819429 by numerical differentiation.
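To make the per-iteration cost of the resistance-based methods concrete, the following minimal sketch evaluates one candidate set of wire widths on a simple series P/G line (pad, then node 0, then node 1, and so on). It is an illustration under stated assumptions, not the procedure of any cited method: the chain topology, the names `Segment` and `evaluate_chain`, the sheet-resistance model r = sheet_res * length / width, and the electromigration limit expressed as current per unit width are all hypothetical choices made here for clarity. A general network would require solving a linear system for the nodal voltages instead of accumulating drops along a chain.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length: float  # wire length (assumed unit: micrometers)
    width: float   # wire width (assumed unit: micrometers)


def evaluate_chain(segments, node_currents, v_pad, sheet_res, v_min, j_max, w_min):
    """Evaluate a series P/G line for a fixed set of widths.

    segments[k] connects node k-1 (the pad for k == 0) to node k, and
    node_currents[k] is the average current drawn at node k.
    Returns (nodal voltages, total routing area, list of violations).
    """
    # The current in segment k is the sum of all currents drawn at node k
    # and at every node further from the pad.
    branch = []
    downstream = 0.0
    for current in reversed(node_currents):
        downstream += current
        branch.append(downstream)
    branch.reverse()

    voltages, area, violations = [], 0.0, []
    v = v_pad
    for k, (seg, i_branch) in enumerate(zip(segments, branch)):
        resistance = sheet_res * seg.length / seg.width  # assumed wire model
        v -= i_branch * resistance                       # IR drop across segment k
        voltages.append(v)
        area += seg.length * seg.width
        if v < v_min:
            violations.append((k, "ir_drop"))
        if i_branch / seg.width > j_max:
            violations.append((k, "electromigration"))
        if seg.width < w_min:
            violations.append((k, "min_width"))
    return voltages, area, violations


# Hypothetical example: two 100 um segments, 10 mA drawn at each node.
example_segments = [Segment(100.0, 2.0), Segment(100.0, 1.0)]
example = evaluate_chain(example_segments, [0.01, 0.01], v_pad=1.8,
                         sheet_res=0.05, v_min=1.72, j_max=0.02, w_min=0.5)
print(example)
```

In this sketch, widening a segment lowers its resistance and its current density but raises the routing area, which is the trade-off the optimization must resolve. Because every constraint check depends on voltages and currents recomputed from the widths, an optimizer that treats widths or resistances as its only variables has to repeat this evaluation at each step.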