Deferred Merge Algorithm by Boese
(DME) Algorithm
The Deferred-Merge Embedding (DME) algorithm embeds internal nodes of
the topology $G$ via a two-phase process. A bottom-up phase constructs a
tree of line segments that represent loci of possible placements of the
internal nodes in the ZST. A top-down phase then resolves the exact
locations of the internal nodes.

[Figure not reproduced. Surviving labels: $ms(a)$, $ms(b)$, $ms(v)$,
$|e_b|$, $|e_v|$, $s_1$, $s_2$, $s_7$, $s_8$, and "possible placements
of $v$".]
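The two-phase process described above can be illustrated with a minimal sketch for the linear delay model. Everything named here (`Rect`, `Node`, `build_tree_of_segments`, `find_exact_placements`, `sink_path_lengths`) is hypothetical and not taken from the authors' C implementation. The sketch assumes rotated coordinates $(u, v) = (x + y, x - y)$, in which Manhattan distance becomes $\max(|\Delta u|, |\Delta v|)$ and a TRR becomes an axis-aligned rectangle. The edge-length rule is likewise an assumption standing in for the calculation that the paper defers to [2]: balance the two child delays, adding extra wire to one edge when the children are too close together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in rotated coordinates (u, v) = (x + y, x - y).

    A merging segment is a Rect with zero width or zero height.
    """
    ulo: float
    uhi: float
    vlo: float
    vhi: float

    def expand(self, r):
        """TRR whose core is this rectangle and whose radius is r."""
        return Rect(self.ulo - r, self.uhi + r, self.vlo - r, self.vhi + r)

    def intersect(self, other):
        return Rect(max(self.ulo, other.ulo), min(self.uhi, other.uhi),
                    max(self.vlo, other.vlo), min(self.vhi, other.vhi))

    def dist(self, other):
        """Manhattan distance between two regions (max of the coordinate gaps)."""
        gap_u = max(0.0, self.ulo - other.uhi, other.ulo - self.uhi)
        gap_v = max(0.0, self.vlo - other.vhi, other.vlo - self.vhi)
        return max(gap_u, gap_v)


class Node:
    """Topology node: a sink with a location, or an internal node with two children."""

    def __init__(self, sink=None, left=None, right=None):
        self.sink, self.left, self.right = sink, left, right
        self.ms = None     # merging segment ms(v)
        self.edge = 0.0    # |e_v|, wire length to the parent
        self.delay = 0.0   # linear delay from v to the sinks below it
        self.pl = None     # chosen placement (x, y)


def build_tree_of_segments(v):
    """Phase I (bottom-up): compute ms(v) and the child edge lengths."""
    if v.sink is not None:
        x, y = v.sink
        v.ms = Rect(x + y, x + y, x - y, x - y)
        return
    a, b = v.left, v.right
    build_tree_of_segments(a)
    build_tree_of_segments(b)
    d = a.ms.dist(b.ms)
    if abs(a.delay - b.delay) <= d:      # balance without extra wire
        a.edge = (d + b.delay - a.delay) / 2
        b.edge = d - a.edge
    elif a.delay > b.delay:              # assumed rule: extra wire on b's edge
        a.edge, b.edge = 0.0, a.delay - b.delay
    else:                                # assumed rule: extra wire on a's edge
        a.edge, b.edge = b.delay - a.delay, 0.0
    v.delay = a.delay + a.edge
    v.ms = a.ms.expand(a.edge).intersect(b.ms.expand(b.edge))


def find_exact_placements(v, parent_uv=None):
    """Phase II (top-down): pick pl(v) in ms(v), within |e_v| of pl(p)."""
    region = v.ms
    if parent_uv is not None:
        pu, pv = parent_uv
        region = v.ms.intersect(Rect(pu, pu, pv, pv).expand(v.edge))
    u, w = region.ulo, region.vlo        # any point of the region would do
    v.pl = ((u + w) / 2, (u - w) / 2)    # rotate back to (x, y)
    if v.sink is None:
        find_exact_placements(v.left, (u, w))
        find_exact_placements(v.right, (u, w))


def sink_path_lengths(v, acc=0.0):
    """Sum of edge lengths from the root to every sink."""
    if v.sink is not None:
        return [acc]
    return (sink_path_lengths(v.left, acc + v.left.edge)
            + sink_path_lengths(v.right, acc + v.right.edge))


s1, s2, s3, s4 = (Node(sink=p) for p in [(0, 0), (4, 0), (0, 6), (10, 2)])
root = Node(left=Node(left=s1, right=s2), right=Node(left=s3, right=s4))
build_tree_of_segments(root)
find_exact_placements(root)
print(root.pl, root.delay, sink_path_lengths(root))
# (3.0, 2.0) 7.0 [7.0, 7.0, 7.0, 7.0]
```

In this example every root-to-sink path has length 7, so the skew is zero. The left child of the root receives an edge of length 5 although its merging segment is only distance 3 from its sibling's, which is the extra-wire case.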
Figure 3: A tree of merging segments. Solid lines are merging segments;
dotted lines indicate edges between merging segments.

Procedure Build_Tree_of_Segments
Input: Topology $G$; set of sink locations $S$
Output: Merging segments $ms(v)$ for each node $v$ in $G$ and
        edge lengths $|e_v|$ for each $v \neq s_0$
for each node $v$ in $G$ (bottom-up order)
    if $v$ is a sink node,
        $ms(v) \leftarrow \{pl(v)\}$
    else
        Let $a$ and $b$ be the children of $v$
        Calculate_Edge_Lengths($|e_a|$, $|e_b|$)
        Create TRRs $trr_a$ and $trr_b$ as follows:
            $core(trr_a) \leftarrow ms(a)$
            $radius(trr_a) \leftarrow |e_a|$
            $core(trr_b) \leftarrow ms(b)$
            $radius(trr_b) \leftarrow |e_b|$
        $ms(v) \leftarrow trr_a \cap trr_b$
    endif

Figure 4: Constructing the tree of segments.

Figure 4 gives a precise description of the procedure
Build_Tree_of_Segments, which constructs the tree of merging segments.
Details of the Calculate_Edge_Lengths step depend on the delay model.
For the linear model, the calculation is straightforward (see [2]). The
calculation for the Elmore model can be found in [2][3][8]. Unless more
wire is needed to balance delays between $T_a$ and $T_b$, it must be
that $|e_a| + |e_b| = d(ms(a), ms(b))$.

By Lemma 1, procedure Build_Tree_of_Segments requires constant time to
compute each new merging segment, and linear time in the size of $S$ to
construct the entire tree of merging segments.

3.2 Phase II: Embedding of Nodes

Once the tree of segments has been constructed, the exact embeddings of
internal nodes in the ZST are chosen in a top-down manner. For node $v$
in topology $G$, we select $pl(v)$ as follows: (i) if $v$ is the root
node, then any point in $ms(v)$ can be chosen as $pl(v)$ (see footnote
3); and (ii) if $v$ is an internal node other than the root, then $v$
can be embedded at any point in $ms(v)$ that is at distance $|e_v|$ or
less from $pl(p)$. (The merging segment $ms(p)$ was constructed such
that $d(ms(v), ms(p)) \le |e_v|$, so there must exist some choice of
$pl(v)$ satisfying this condition; see footnote 4.) More specifically,
the procedure creates a square TRR $trr_p$ with radius $|e_v|$ and with
core equal to the placement of $v$'s parent node $p$. The placement of
$v$ can be any point from $ms(v) \cap trr_p$ (see Figure 5). In Figure
3, the resulting placements for the tree of merging segments are
indicated by the points where segments are connected by dotted lines.
Figure 6 describes procedure Find_Exact_Placements, which performs the
embedding of nodes from the tree of merging segments.

Footnote 3: If the specification requires a fixed source location,
$s_0'$, choose $pl(s_0) \in ms(s_0)$ with minimum distance from $s_0'$
and connect a wire directly from $s_0'$ to $pl(s_0)$.

Footnote 4: The distance $d(ms(v), ms(p))$ can be less than $|e_v|$
only when extra wire is used to merge $v$ with its sibling $w$, i.e.,
when the merging cost for $p$ is greater than $d(ms(v), ms(w))$.

for each internal node $v$ in $G$ (top-down order)
    if $v$ is the root
        Choose any $q \in ms(v)$
        $pl(v) \leftarrow q$
    else
        Let $p$ be the parent node of $v$
        Construct $trr_p$ as follows:
            $core(trr_p) \leftarrow \{pl(p)\}$
            $radius(trr_p) \leftarrow |e_v|$
        Choose any $q \in ms(v) \cap trr_p$
        $pl(v) \leftarrow q$
    endif

Figure 6: Creating the ZST by embedding internal nodes of the topology.

Since each instruction in Find_Exact_Placements is executed at most once
for each node in $G$, and since the intersection of TRRs $ms(v)$ and
$trr_p$ can be found in constant time by Lemma 1, Find_Exact_Placements
requires time linear in the size of $S$. Hence, DME is a linear-time
algorithm overall.

4 Optimality of DME for Linear Delay

The DME algorithm is optimal in the linear delay regime (the proof of
Theorem 1 is contained in [2]).

Theorem 1: Given a set of sinks $S \subset \mathbb{R}^2$ and a
connection topology $G$, the DME algorithm produces a ZST $T$ in the
linear model with minimum cost over all ZSTs with topology $G$ and
sinks $S$.

DME also produces the optimal ZST in the variation of the Zero Skew
Clock Routing Problem where the position of the source is fixed. This
extension to Theorem 1 is proved in [4].

Under the linear model, DME also minimizes the source-sink delay in a
ZST. We now prove that given any input topology, DME will in fact
construct a ZST with delay equal to one-half the diameter of the sink
set $S$, which is the minimum feasible radius for any tree connecting
$S$.

Define a Manhattan disk to be a TRR with a core consisting of a single
point. In other words, a Manhattan disk is the set of all points within
a prescribed radius of a central point. In the Manhattan plane, such a
"disk" is actually shaped like a diamond (e.g., $trr_p$ in Figure 5).
Let $MD(s_i, r)$ denote the Manhattan disk with core $\{s_i\}$ and
radius $r \ge 0$. The diameter of $S$ is defined to be
$\max\{d(s_i, s_j) \mid s_i, s_j \in S\}$. Lemma 2 shows that it is
feasible to construct a ZST for $S$ with linear delay equal to one-half
its diameter.

Lemma 2: Let $d$ be the diameter of sink set $S$. Then

    $\bigcap_{s_i \in S} MD(s_i, d/2) \neq \emptyset$.

Proof: It is well known that the Manhattan metric after a 45 degree
rotation is equivalent to the $L_\infty$ metric, where
$d[(x, y), (x', y')] = \max\{|x - x'|, |y - y'|\}$. Under this
transformation, Manhattan disks become squares with vertical and
horizontal boundaries. Consider the smallest rectangle $R$ with vertical
and horizontal boundary lines that contains all points in $S$ (after
rotation). Let $d$ be the diameter of $S$. Then both the width and
height of $R$ must be less than or equal to $d$ (otherwise there would
be two sinks $s_i$ and $s_j$ with $d(s_i, s_j) > d$). Consequently, the
point at the center of $R$ is within distance $d/2$ of all sinks in $S$,
and is contained in $\bigcap_{s_i \in S} MD(s_i, d/2)$.

The next lemma states that increasing the radius of two TRRs by a
constant, $\delta$, will increase the radius of their intersection by
$\delta$ without changing its core.

Lemma 3: Let $A$ and $B$ be TRRs, and suppose $A \cap B = C \neq
\emptyset$. Construct TRRs $A'$ and $B'$ such that for $\delta \ge 0$,
$core(A') = core(A)$, $radius(A') = radius(A) + \delta$,
$core(B') = core(B)$, and $radius(B') = radius(B) + \delta$. If
$C' = A' \cap B'$, then $core(C') = core(C)$ and
$radius(C') = radius(C) + \delta$.

Proof: The lemma is obvious after transformation to the $L_\infty$
metric, where TRRs become rectangles with vertical and horizontal
boundaries.

Theorem 2: For any sink set $S$ and topology $G$, the DME algorithm will
find a ZST with minimum feasible delay, equal to one-half the diameter
of $S$.

Proof: Let $d$ equal the diameter of $S$. We assign a TRR, called
$TRR(v)$, to each node $v \in G$ such that

    if $v$ is a sink node, then $TRR(v) = MD(pl(v), d/2)$; and

    if $v$ is an internal node with children $a$ and $b$, then
    $TRR(v) = TRR(a) \cap TRR(b)$.

By Lemma 2, $TRR(s_0) = \bigcap_{s_i \in S} MD(s_i, d/2)$ is non-empty.
Let $s_j$ and $s_k$ be two points in $S$ such that $d(s_j, s_k) = d$.
The intersection of $TRR(s_j) = MD(s_j, d/2)$ and
$TRR(s_k) = MD(s_k, d/2)$ must have radius 0 (by Lemma 1), and so
$TRR(s_0)$ must have radius 0.

For any node $v$, let $t_{LD}(v)$ be the linear delay (sum of edge
lengths) from $v$ to each of the sinks in the subtree of $v$ constructed
by the DME algorithm.

Fact: For each node $v$ in $G$, $core(TRR(v)) = ms(v)$ and
$radius(TRR(v)) = d/2 - t_{LD}(v)$.

We prove the Fact using induction on the maximum number of edges between
$v$ and sinks in its subtree. If $v$ is a sink, then
$core(TRR(v)) = \{v\} = ms(v)$; and

    $radius(TRR(v)) = d/2 = d/2 - t_{LD}(v)$.

If $v$ is an internal node with children $a$ and $b$, inductively assume
that the Fact holds for $a$ and $b$. In the linear delay model, we have
that $t_{LD}(a) = t_{LD}(v) - |e_a|$. Hence,

    $radius(TRR(a)) = d/2 - t_{LD}(a) = d/2 - t_{LD}(v) + |e_a|$.

Similarly, $radius(TRR(b)) = d/2 - t_{LD}(v) + |e_b|$. Consider the TRRs
$trr_a$ and $trr_b$ constructed by procedure Build_Tree_of_Segments in
Figure 4. By construction, $core(trr_a) = ms(a)$,
$radius(trr_a) = |e_a|$, $core(trr_b) = ms(b)$, and
$radius(trr_b) = |e_b|$. Thus,

    $radius(TRR(a)) = d/2 - t_{LD}(v) + radius(trr_a)$

and

    $radius(TRR(b)) = d/2 - t_{LD}(v) + radius(trr_b)$.

In other words, $TRR(a)$ and $TRR(b)$ can be constructed from $trr_a$
and $trr_b$, respectively, by adding the constant $d/2 - t_{LD}(v)$ to
their radii. Consequently, since $trr_a \cap trr_b = ms(v)$ has radius
0, Lemma 3 implies that $core(TRR(v)) = ms(v)$ and
$radius(TRR(v)) = d/2 - t_{LD}(v)$. This proves the Fact.

Because $radius(TRR(s_0)) = 0$, we have that $t_{LD}(s_0) = d/2$, which
proves the theorem.
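The rotation argument used for Lemma 2 and Theorem 2 can be checked numerically. The sketch below is illustrative only: the function names `manhattan`, `diameter` and `disk_intersection` and the four sample sinks are assumptions, not part of the paper. It rotates the sinks to $(u, v) = (x + y, x - y)$, where each Manhattan disk is a square, and intersects the squares coordinate by coordinate.

```python
from itertools import combinations


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def diameter(sinks):
    """Largest pairwise Manhattan distance in the sink set."""
    return max(manhattan(p, q) for p, q in combinations(sinks, 2))


def disk_intersection(sinks, r):
    """Intersection of the Manhattan disks MD(s_i, r), or None if it is empty.

    The result is (ulo, uhi, vlo, vhi) in rotated coordinates
    (u, v) = (x + y, x - y), where each disk is the square
    [u_i - r, u_i + r] x [v_i - r, v_i + r].
    """
    us = [x + y for x, y in sinks]
    vs = [x - y for x, y in sinks]
    ulo, uhi = max(us) - r, min(us) + r
    vlo, vhi = max(vs) - r, min(vs) + r
    if ulo > uhi or vlo > vhi:
        return None
    return (ulo, uhi, vlo, vhi)


sinks = [(0, 0), (4, 0), (0, 6), (10, 2)]   # hypothetical sink set
d = diameter(sinks)
print(d, disk_intersection(sinks, d / 2))    # 14 (5.0, 7.0, 1.0, 1.0)
print(disk_intersection(sinks, d / 2 - 0.5)) # None
```

For this sink set the intersection at radius $d/2$ is non-empty, as Lemma 2 states, and has zero height in the rotated plane, which corresponds to the radius-0 region $TRR(s_0)$ in the proof of Theorem 2. Any smaller radius leaves the intersection empty, so no tree connecting these sinks can have linear delay below $d/2$.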
5 Suboptimality For Elmore Delay

While the experimental results in Section 6 clearly show the
effectiveness of the DME algorithm in the Elmore delay model, examples
exist for which DME does not give an optimal ZST under the Elmore model
for a given topology [2][4]. The counterexample in [2][4] refutes the
claim in [3] that the DME algorithm is optimal for any given routing
topology under the Elmore model.

6 Results

We implemented the DME algorithm on Sun SPARC workstations in the
C/UNIX environment. The code can be obtained from the authors. We used
two sets of benchmarks: (i) the sink placements for the MCNC Primary1
and Primary2 benchmarks used in [5] and [6], and originally provided by
the authors of [5]; and (ii) the sink placements for the five
benchmarks r1 - r5 used in [8].

Our experimental results for linear delay are contained in Table 1. We
applied the DME embedding algorithm to the topologies generated by the
bottom-up, matching based method of Kahng, Cong and Robins (KCR) [6].
We compare our results with the original KCR results and with the
Method of Means and Medians (MMM) of Jackson et al. [5]. The combined
algorithm KCR+DME produced an average reduction in cost of 9% from the
original KCR results and 16% from the MMM results. In the linear model,
DME also produces trees with optimal source-sink delay. In our
experiments, this optimal delay was on average 19% less than that of
the KCR constructions.

            Number      MMM      KCR   KCR+DME   Reduction by KCR+DME
Benchmark  of sinks    cost     cost     cost   from MMM (%)  from KCR (%)
P1             269    161.7    153.9    140.3       13.2           8.8
P2             603    406.3    376.7    350.4       13.8           7.0
r1             267    1,815    1,627    1,497       17.5           8.0
r2             598    3,625    3,349    3,013       16.9          10.0
r3             862    4,643    4,360    3,902       16.0          10.5
r4           1,903    9,376    8,580    7,782       17.0           9.3
r5           3,101   13,805   12,928   11,665       15.5           9.8
average                                             15.7           9.1

Table 1: Comparison of KCR+DME with other algorithms for the linear
delay model, using MCNC benchmarks Primary1 (P1) and Primary2 (P2), and
benchmarks r1 through r5 from Tsay.

Similar improvements were obtained for Elmore delay on the same
benchmarks, as shown in Table 2. The average reduction in wirelength
was 16% versus the MMM results, and 12% versus the results of Tsay. Our
results also indicate a very significant reduction in source-sink delay
in the Elmore model: the combination of KCR+DME reduced Elmore delay by
an average of 22% compared to the results of Tsay.

              MMM     Tsay  Tsay+DME  KCR+DME   Reduction by KCR+DME
Benchmark    cost     cost     cost     cost   from MMM (%)  from Tsay (%)
P1          161.7      -        -      140.3       13.2           -
P2          406.3      -        -      348.3       14.3           -
r1          1,815    1,697    1,658    1,487       18.1          12.4
r2          3,625    3,432    3,368    3,020       16.7          12.0
r3          4,643    4,407    4,333    3,867       16.7          12.3
r4          9,376    8,866    8,694    7,713       17.7          13.0
r5         13,805   13,199   12,926   11,606       15.9          12.1
average                                            16.1          12.4

Table 2: Comparison of KCR+DME with other algorithms for the Elmore
delay model. Results of Tsay's algorithm for benchmarks P1 and P2 were
not available.

7 Conclusion

The Deferred-Merge Embedding (DME) algorithm offers many improvements
over previous embedding schemes. DME constructs a highly flexible tree
of merging segments which allows a choice among minimum-cost zero skew
clock trees. Given any connection topology over the set of sink
locations, DME always produces a tree with exact zero skew, and may
thus be applied to previously generated clock trees in order to improve
both wirelength and delay. Experiments show that applying DME to
topologies generated by the algorithm of [6] results in wirelength
reductions of 9% to 16% over [5] [6] [8]. Finally, under the linear
delay model, DME yields optimal total wirelength for the topology and
optimal source-sink delay overall.

8 Remarks and Acknowledgements

Most of the results in this paper also appear in [4], reflecting a
collaboration between the present authors and the authors of [3] that
arose after it was learned that the two groups had, through independent
research, come up with essentially the same embedding approach. The
authors are grateful to Dr. Ren-Song Tsay for providing benchmark data.

References

[1] H. Bakoglu, Circuits, Interconnections and Packaging for VLSI,
    Addison-Wesley, 1990.
[2] K. D. Boese and A. B. Kahng, "Zero-Skew Clock Routing Trees With
    Minimum Wirelength," technical report UCLA CSD-920012, March 1992.
[3] T.-H. Chao, Y.-C. Hsu, and J.-M. Ho, "Zero Skew Clock Net Routing,"
    to appear in Proc. ACM/IEEE Design Automation Conf., 1992.
[4] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K. D. Boese and A. B. Kahng, "Zero
    Skew Clock Routing With Minimum Wirelength," submitted to IEEE
    Transactions on Computers and Systems, 1992.
[5] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, "Clock Routing for
    High Performance ICs," Proc. ACM/IEEE Design Automation Conf., 1990,
    pp. 573-579.
[6] A. B. Kahng, J. Cong, and G. Robins, "High-Performance Clock Routing
    Based on Recursive Geometric Matching," Proc. ACM/IEEE Design
    Automation Conf., 1991, pp. 322-327.
[7] J. Rubinstein, P. Penfield, and M. A. Horowitz, "Signal Delay in RC
    Tree Networks," IEEE Transactions on Computer-Aided Design 2(3),
    July 1983, pp. 202-211.
[8] R. S. Tsay, "Exact Zero Skew," IEEE Int. Conference on
    Computer-Aided Design, 1991, pp. 336-339.