Zero-Skew Clock Routing Trees With Minimum Wirelength

Kenneth D. Boese and Andrew B. Kahng


UCLA Computer Science Dept., Los Angeles, CA 90024-1596

Abstract

In the design of high performance VLSI systems, minimization of clock skew is an increasingly important objective. Additionally, wirelength of clock routing trees should be minimized in order to reduce system power requirements and deformation of the clock pulse at the synchronizing elements of the system. In this paper, we present the Deferred-Merge Embedding (DME) algorithm, which in linear time embeds any given connection topology into the Manhattan plane to create a clock tree with zero skew while minimizing total wirelength. Extensive experimental results show that the algorithm yields exact zero skew trees with 9% to 16% wirelength reduction over previous constructions [5] [6]. The DME algorithm may be applied to either the Elmore or the linear delay model, and yields optimal total wirelength for linear delay.

1 Introduction

In synchronous VLSI designs, circuit speed is increasingly limited by clock skew, which is the maximum difference in arrival times of the clocking signal at the synchronizing elements. This is seen from the following well-known inequality governing the clock period of a clock signal net [1] [5]:

$$\text{clock period} \ge t_d + t_{skew} + t_{su} + t_{ds}$$

where $t_d$ is the delay on the longest path through combinational logic, $t_{skew}$ is the clock skew, $t_{su}$ is the setup time of the synchronizing elements, and $t_{ds}$ is the propagation delay within the synchronizing elements. With increased switching speeds, $t_{skew}$ may account for over 10% of the system cycle time in high-performance systems [1].

Previous methods for skew minimization [5] [6] [8] concentrate on the problem of computing a clock tree topology, and only incompletely address the associated problem of finding a minimum-cost embedding of the topology. However, the total wirelength of the clock tree is critical to power consumption and other area/performance parameters of the layout. In this paper, we propose a new approach which achieves exact zero skew while significantly reducing the total wirelength of the clock tree. The basic idea of our Deferred-Merge Embedding (DME) algorithm is to defer the embedding of internal nodes in a given topology for as long as possible: (i) a bottom-up phase computes loci of feasible locations for the roots of recursively merged subtrees, and (ii) a top-down phase then resolves the exact embedding of these internal nodes of the clock tree. In practice, the DME algorithm begins with an initial clock tree computed by any previous method, then maintains exact zero clock skew while reducing the wirelength. In regimes where the linear delay model applies, our method produces the optimal (i.e., minimum wirelength) zero skew clock tree with respect to the prescribed topology, and this tree will also enjoy optimal source-sink delay. Experimental results in Section 6 below show that the DME approach is highly effective in both the Elmore and linear delay models. We achieve average savings in total clock tree wirelength of 16% over the MMM algorithm [5] and 9% over the method of Kahng et al. [6]. In all cases, our clock trees have exact zero skew according to the appropriate delay model.

* This work was supported in part by NSF MIP-9110696, ARO DAAK-70-92-K-0001, and a GTE Graduate Fellowship.

2 Problem Formulation

The placement phase of physical layout determines positions for the synchronizing elements of a circuit, which we call the sinks of the clock net. A finite set of sink locations, denoted by $S = \{s_1, s_2, \ldots, s_n\} \subset \mathbb{R}^2$, specifies an instance of the clock routing problem. A connection topology is defined to be a rooted binary tree, $G$, which has $n$ leaves corresponding to the set of sinks $S$. A clock tree $T(S)$ is an embedding of the connection topology in the Manhattan plane.¹ In other words, the embedding associates a placement in $\mathbb{R}^2$ with each internal node $v \in G$; we will use $pl(T, v)$ or $pl(v)$ to denote this location. The root of the clock tree is the clock source, denoted by $s_0$. We direct all edges of the clock tree away from the source; a directed edge from $v$ to $w$ may be uniquely identified with $w$ and denoted by $e_w$. We say that $v$ is the parent of $w$, and $w$ is a child of $v$. The wirelength, or cost, of the edge $e_w$ is denoted by $|e_w|$, and must be greater than or equal to the Manhattan distance between its endpoints $pl(w)$ and $pl(v)$.² The cost of $T(S)$ is the total wirelength of the edges in $T(S)$.

¹ Because the meaning is clear, we use $T(S)$ instead of $T(S, G)$ to denote a clock tree, although implicitly the embedding is always with respect to a particular topology $G$.
² To preserve zero skew, it is sometimes necessary for an edge to have length greater than the distance between its endpoints.

For a given clock tree $T(S)$, let $t_d(s_0, s_i)$ denote the signal propagation time, or delay, on the unique path from source $s_0$ to sink $s_i$; the collection of edges in this path is denoted by $path(s_0, s_i)$. The skew of $T(S)$ is the maximum value of $|t_d(s_0, s_i) - t_d(s_0, s_j)|$ over all
sink pairs $s_i, s_j \in S$. If the skew of $T(S)$ is zero then it is called a zero skew clock tree (ZST). Given a set $S$ of sinks, the zero skew clock routing problem is to construct a ZST $T(S)$ of minimum cost. A variant of interest is where the topology is prescribed:

Zero Skew Clock Routing Problem (S, G): Given a set $S$ of sink locations and a connection topology $G$, construct a ZST $T(S)$ with topology $G$ and having minimum cost.

The notion of a zero skew clock tree is well defined only in the context of a method for evaluating signal delays. The delay from the source to any sink depends on the wirelength of the source-sink path, the RC constants of the wire segments in the routing, and the underlying connection topology of the clock tree. In practice simple RC delay approximations, such as the linear model or the Elmore model, are often used to approximate signal delay. Since our construction applies to any delay model that is monotone in the wirelength of each edge (e.g., in the linear model, delay is simply given by edge length), we defer details of these delay models to [2][3][8].

3 The Deferred-Merge Embedding (DME) Algorithm

The Deferred-Merge Embedding (DME) algorithm embeds internal nodes of the topology $G$ via a two-phase process. A bottom-up phase constructs a tree of line segments that represent loci of possible placements of the internal nodes in the ZST. A top-down phase then resolves the exact locations of all internal nodes in $T$. In the discussion that follows, the distance between two points $p$ and $q$ is assumed to be the Manhattan distance $d(p, q)$, and the distance between two sets of points $P$ and $Q$, written $d(P, Q)$, is given by $\min\{d(p, q) \mid p \in P \text{ and } q \in Q\}$.

3.1 Phase I: Tree of Merging Segments

For prescribed sink locations $S$ and connection topology $G$, we construct a tree of merging segments. For each node $v \in G$, we construct a merging segment containing a set of possible placements of $v$. The merging segment of a node depends on the merging segments of its two children, so the topology must be processed in a bottom-up order. In building the tree of merging segments, we also assign a length to each edge in $G$; this length is retained in the final embedding of $G$ as a ZST.

Let $a$ and $b$ be the children of node $v$ in $G$. We use $TS_a$ and $TS_b$ to denote the subtrees of merging segments rooted at $a$ and $b$, respectively. We are interested in placements of $v$ which allow $TS_a$ and $TS_b$ to be merged with minimum added wire while preserving zero skew. Define the merging cost between $TS_a$ and $TS_b$ to be $|e_a| + |e_b|$, where $|e_a|$ and $|e_b|$ are the lengths to be assigned to edges $e_a$ and $e_b$. Since delay is a monotone increasing function of wirelength, there is a unique assignment to $|e_a|$ and $|e_b|$ that minimizes merging cost while balancing delays at $pl(v)$.

A Manhattan arc is a line segment, possibly of zero length, with slope +1 or -1; in other words, a Manhattan arc is a line segment tilted at 45 degrees from the wiring directions. The collection of points within a fixed distance of a Manhattan arc is called a tilted rectangular region, or TRR, whose boundary is composed of Manhattan arcs (see Figure 1). The core of a TRR is the subset of the TRR at maximum (Manhattan) distance from its boundary; this subset is always a Manhattan arc. The radius of a TRR is the distance between its core and its boundary.

[Figure 1: An example of a TRR. Labels: core, radius.]

The merging segment of node $v$, $ms(v)$, is defined recursively as follows: if $v$ is a sink $s_i$, then $ms(v) = \{s_i\}$. If $v$ is an internal node, then $ms(v)$ is the set of all points within distance $|e_a|$ of $ms(a)$ and within distance $|e_b|$ of $ms(b)$. If $ms(a)$ and $ms(b)$ are both Manhattan arcs, then we obtain the merging segment $ms(v)$ by intersecting two TRRs, $trr_a$ with core $ms(a)$ and radius $|e_a|$, and $trr_b$ with core $ms(b)$ and radius $|e_b|$, i.e., $ms(v) = trr_a \cap trr_b$. Figure 2 depicts an example of the construction of $ms(v)$. The following lemma can be used to show that if $ms(a)$ and $ms(b)$ are Manhattan arcs, then $ms(v)$ is also a Manhattan arc. Moreover, since for each sink $s_i$, we have that $ms(s_i)$ is a single point and thus a Manhattan arc, by induction all merging segments are Manhattan arcs.

[Figure 2: Construction of merging segment $ms(v)$. Labels: $trr_a$, $trr_b$, $ms(a)$, $ms(v)$, $ms(b)$, $|e_a|$, $|e_b|$.]

Lemma 1: The intersection of two TRRs, $A$ and $B$, is also a TRR and can be found in constant time. If $radius(A) + radius(B) = d(core(A), core(B))$, then $A \cap B$ is also a Manhattan arc.

The proof of Lemma 1 is contained in [2].
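As an illustration of Lemma 1 and of the construction of $ms(v)$, the following Python sketch represents each TRR in rotated coordinates u = x + y, v = x - y, in which the Manhattan distance equals max(|Δu|, |Δv|) and every TRR becomes an axis-aligned box. The class name Box and the helpers point, trr, intersect, distance and to_xy are illustrative names that do not appear in the paper, and the equal split of edge lengths assumes two sinks under the linear delay model with no delay below them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in rotated coordinates u = x + y, v = x - y (hypothetical helper)."""
    ulo: float
    uhi: float
    vlo: float
    vhi: float


def point(x: float, y: float) -> Box:
    """A sink location as a degenerate box (a zero-length Manhattan arc)."""
    return Box(x + y, x + y, x - y, x - y)


def trr(core: Box, radius: float) -> Box:
    """All points within Manhattan distance `radius` of `core`."""
    return Box(core.ulo - radius, core.uhi + radius,
               core.vlo - radius, core.vhi + radius)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Constant-time intersection of two TRRs; None if they are disjoint."""
    ulo, uhi = max(a.ulo, b.ulo), min(a.uhi, b.uhi)
    vlo, vhi = max(a.vlo, b.vlo), min(a.vhi, b.vhi)
    if ulo > uhi or vlo > vhi:
        return None
    return Box(ulo, uhi, vlo, vhi)


def distance(a: Box, b: Box) -> float:
    """Manhattan distance between two point sets given as boxes."""
    du = max(a.ulo - b.uhi, b.ulo - a.uhi, 0)
    dv = max(a.vlo - b.vhi, b.vlo - a.vhi, 0)
    return max(du, dv)


def to_xy(u: float, v: float) -> tuple:
    """Map rotated coordinates back to the original (x, y) plane."""
    return ((u + v) / 2, (u - v) / 2)


# Two sinks; under linear delay an equal split of the distance balances them.
ms_a, ms_b = point(0, 0), point(4, 2)
d_ab = distance(ms_a, ms_b)            # Manhattan distance between the sinks
e_a = e_b = d_ab / 2                   # radii sum to d_ab, as in Lemma 1
ms_v = intersect(trr(ms_a, e_a), trr(ms_b, e_b))
ends = (to_xy(ms_v.ulo, ms_v.vlo), to_xy(ms_v.uhi, ms_v.vhi))
print(d_ab, ms_v, ends)
```

For these two sinks the merging segment runs from (1, 2) to (3, 0), a segment of slope -1 whose points all lie at distance 3 from both sinks. The box is degenerate in u, which is the rotated-coordinate form of the Manhattan arc promised by Lemma 1.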
Figure 3 illustrates a tree of merging segments. The leaves of the tree are all single points representing the sink locations $s_1, \ldots, s_8$, and the interior nodes of the tree are Manhattan arcs.

[Figure 3: A tree of merging segments. Solid lines are merging segments; dotted lines indicate edges between merging segments. Labels: sinks $s_1$ through $s_8$ and the root merging segment.]

Procedure Build_Tree_of_Segments
Input: Topology G; set of sink locations S
Output: Merging segments ms(v) for each node v in G and edge lengths |e_v| for each v ≠ s_0
for each node v in G (bottom-up order)
    if v is a sink node,
        ms(v) ← {pl(v)}
    else
        Let a and b be the children of v
        Calculate_Edge_Lengths(|e_a|, |e_b|)
        Create TRRs trr_a and trr_b as follows:
            core(trr_a) ← ms(a)
            radius(trr_a) ← |e_a|
            core(trr_b) ← ms(b)
            radius(trr_b) ← |e_b|
        ms(v) ← trr_a ∩ trr_b
    endif

Figure 4: Constructing the tree of segments.

Figure 4 gives a precise description of the procedure Build_Tree_of_Segments, which constructs the tree of merging segments. Details of the Calculate_Edge_Lengths step depend on the delay model. For the linear model, the calculation is straightforward (see [2]). The calculation for the Elmore model can be found in [2][3][8]. Unless more wire is needed to balance delays between $TS_a$ and $TS_b$, it must be that $|e_a| + |e_b| = d(ms(a), ms(b))$.

By Lemma 1, procedure Build_Tree_of_Segments requires constant time to compute each new merging segment, and linear time in the size of $S$ to construct the entire tree of merging segments.

3.2 Phase II: Embedding of Nodes

Once the tree of segments has been constructed, the exact embeddings of internal nodes in the ZST are chosen in a top-down manner. For node $v$ in topology $G$, we select $pl(v)$ as follows: (i) if $v$ is the root node, then any point in $ms(v)$ can be chosen as $pl(v)$;³ and (ii) if $v$ is an internal node other than the root, then $v$ can be embedded at any point in $ms(v)$ that is at distance $|e_v|$ or less from $pl(p)$, where $p$ is the parent of $v$. (The merging segment $ms(p)$ was constructed such that $d(ms(v), ms(p)) \le |e_v|$, so there must exist some choice of $pl(v)$ satisfying this condition.⁴) More specifically, the procedure creates a square TRR $trr_p$ with radius $|e_v|$ and with core equal to the placement of $v$'s parent node $p$. The placement of $v$ can be any point from $ms(v) \cap trr_p$ (see Figure 5). In Figure 3, the resulting placements for the tree of merging segments are indicated by the points where segments are connected by dotted lines. Figure 6 describes procedure Find_Exact_Placements, which performs the embedding of nodes from the tree of merging segments.

³ If the specification requires a fixed source location, $s_0'$, choose $pl(s_0) \in ms(s_0)$ with minimum distance from $s_0'$ and connect a wire directly from $s_0'$ to $pl(s_0)$.
⁴ The distance can be less than $d(ms(v), ms(p))$ only when extra wire is used to merge $v$ with its sibling $w$, i.e., when the merging cost for $p$ is greater than $d(ms(v), ms(w))$.

[Figure 5: Finding the placement of $v$ given the placement of its parent $p$. Labels: $trr_p$, $pl(p)$, $|e_v|$, $ms(v)$, possible placements of $v$.]

Procedure Find_Exact_Placements
Input: Tree of segments TS containing ms(v) and |e_v| for each node v in G
Output: ZST T(S)
for each internal node v in G (top-down order)
    if v is the root
        Choose any q ∈ ms(v)
        pl(v) ← q
    else
        Let p be the parent node of v
        Construct trr_p as follows:
            core(trr_p) ← {pl(p)}
            radius(trr_p) ← |e_v|
        Choose any q ∈ ms(v) ∩ trr_p
        pl(v) ← q
    endif

Figure 6: Creating the ZST by embedding internal nodes of the topology.

Since each instruction in Find_Exact_Placements is executed at most once for each node in $G$, and since the intersection of TRRs $ms(v)$ and $trr_p$ can be found in constant time by Lemma 1, Find_Exact_Placements
requires time linear in the size of $S$. Hence, DME is a linear time algorithm overall.

4 Optimality of DME for Linear Delay

The DME algorithm is optimal in the linear delay regime (the proof of Theorem 1 is contained in [2]).

Theorem 1: Given a set of sinks $S \subset \mathbb{R}^2$ and a connection topology $G$, the DME algorithm produces a ZST $T$ in the linear model with minimum cost over all ZSTs with topology $G$ and sinks $S$.

DME also produces the optimal ZST in the variation of the Zero Skew Clock Routing Problem where the position of the source is fixed. This extension to Theorem 1 is proved in [4].

Under the linear model, DME also minimizes the source-sink delay in a ZST. We now prove that given any input topology, DME will in fact construct a ZST with delay equal to one-half the diameter of the sink set $S$, which is the minimum feasible radius for any tree connecting $S$.

Define a Manhattan disk to be a TRR with a core consisting of a single point. In other words, a Manhattan disk is the set of all points within a prescribed radius of a central point. In the Manhattan plane, such a "disk" is actually shaped like a diamond (e.g., $trr_p$ in Figure 5). Let $MD(s_i, r)$ denote the Manhattan disk with core $\{s_i\}$ and radius $r \ge 0$. The diameter of $S$ is defined to be $\max\{d(s_i, s_j) \mid s_i, s_j \in S\}$. Lemma 2 shows that it is feasible to construct a ZST for $S$ with linear delay equal to one-half its diameter.

Lemma 2: Let $d$ be the diameter of sink set $S$. Then

$$\bigcap_{s_i \in S} MD(s_i, d/2) \neq \emptyset.$$

Proof: It is well known that the Manhattan metric after a 45 degree rotation is equivalent to the $L_\infty$ metric, where $d[(x, y), (x', y')] = \max\{|x - x'|, |y - y'|\}$. Hence we need only prove the lemma for the $L_\infty$ metric, where TRRs are equivalent to rectangles with vertical and horizontal boundaries. Consider the smallest rectangle $R$ with vertical and horizontal boundary lines that contains all points in $S$ (after rotation). Let $d$ be the diameter of $S$. Then both the width and height of $R$ must be less than or equal to $d$ (otherwise there would be two sinks $s_i$ and $s_j$ with $d(s_i, s_j) > d$). Consequently, the point at the center of $R$ is within distance $d/2$ of all sinks in $S$, and is contained in $\bigcap_{s_i \in S} MD(s_i, d/2)$.

The next lemma states that increasing the radius of two TRRs by a constant, $\delta$, will increase the radius of their intersection by $\delta$ without changing its core.

Lemma 3: Let $A$ and $B$ be TRRs, and suppose $A \cap B = C \neq \emptyset$. Construct TRRs $A'$ and $B'$ such that for $\delta \ge 0$, $core(A') = core(A)$, $radius(A') = radius(A) + \delta$, $core(B') = core(B)$, and $radius(B') = radius(B) + \delta$. If $C' = A' \cap B'$, then $core(C') = core(C)$ and $radius(C') = radius(C) + \delta$.

Proof: The lemma is obvious after transformation to the $L_\infty$ metric, where TRRs become rectangles with vertical and horizontal boundaries.

Theorem 2: For any sink set $S$ and topology $G$, the DME algorithm will find a ZST with minimum feasible delay, equal to one-half the diameter of $S$.

Proof: Let $d$ equal the diameter of $S$. We assign a TRR, called $TRR(v)$, to each node $v \in G$ such that

• if $v$ is a sink node, then $TRR(v) = MD(pl(v), d/2)$; and
• if $v$ is an internal node with children $a$ and $b$, then $TRR(v) = TRR(a) \cap TRR(b)$.

By Lemma 2, $TRR(s_0) = \bigcap_{s_i \in S} MD(s_i, d/2)$ is non-empty. Let $s_j$ and $s_k$ be two points in $S$ such that $d(s_j, s_k) = d$. The intersection of $TRR(s_j) = MD(s_j, d/2)$ and $TRR(s_k) = MD(s_k, d/2)$ must have radius 0 (by Lemma 1), and so $TRR(s_0)$ must have radius 0.

For any node $v$, let $t_{LD}(v)$ be the linear delay (sum of edge lengths) from $v$ to each of the sinks in the subtree of $v$ constructed by the DME algorithm.

Fact: For each node $v$ in $G$, $core(TRR(v)) = ms(v)$ and $radius(TRR(v)) = d/2 - t_{LD}(v)$.

We prove the Fact using induction on the maximum number of edges between $v$ and sinks in its subtree. If $v$ is a sink, then $t_{LD}(v) = 0$, so $core(TRR(v)) = \{v\} = ms(v)$ and

$$radius(TRR(v)) = d/2 = d/2 - t_{LD}(v).$$

If $v$ is an internal node with children $a$ and $b$, inductively assume that the Fact holds for $a$ and $b$. In the linear delay model, we have that $t_{LD}(a) = t_{LD}(v) - |e_a|$. Hence,

$$radius(TRR(a)) = d/2 - t_{LD}(a) = d/2 - t_{LD}(v) + |e_a|.$$

Similarly, $radius(TRR(b)) = d/2 - t_{LD}(v) + |e_b|$. Consider the TRRs $trr_a$ and $trr_b$ constructed by procedure Build_Tree_of_Segments in Figure 4. By construction, $core(trr_a) = ms(a)$, $radius(trr_a) = |e_a|$, $core(trr_b) = ms(b)$, and $radius(trr_b) = |e_b|$. Thus,

$$radius(TRR(a)) = d/2 - t_{LD}(v) + radius(trr_a)$$

and

$$radius(TRR(b)) = d/2 - t_{LD}(v) + radius(trr_b).$$

In other words, $TRR(a)$ and $TRR(b)$ can be constructed from $trr_a$ and $trr_b$, respectively, by adding the constant $d/2 - t_{LD}(v)$ to their radii. Consequently, Lemma 3 implies that $core(TRR(v)) = ms(v)$ and $radius(TRR(v)) = d/2 - t_{LD}(v)$. This proves the Fact.

Because $radius(TRR(s_0)) = 0$, we have that $t_{LD}(s_0) = d/2$, which proves the theorem.
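The two phases and the statement of Theorem 2 can be exercised on a small instance. The following Python sketch is a minimal illustration under the linear delay model only; it is not the authors' C implementation. The function names merge, place and wirelength, the nested-tuple encoding of the topology, and the dictionary fields are all hypothetical. The edge-length rule (balance the two path lengths, and add detour wire on the faster side when the gap is too small) is an assumed form of Calculate_Edge_Lengths, whose details the paper defers to [2]. Boxes are stored as (ulo, uhi, vlo, vhi) in rotated coordinates u = x + y, v = x - y, and the sink coordinates are chosen so that all arithmetic is exact.

```python
from itertools import combinations


def dist(a, b):
    """Manhattan distance between two boxes given in rotated (u, v) coordinates."""
    du = max(a[0] - b[1], b[0] - a[1], 0)
    dv = max(a[2] - b[3], b[2] - a[3], 0)
    return max(du, dv)


def expand(box, r):
    """TRR with core `box` and radius `r`."""
    return (box[0] - r, box[1] + r, box[2] - r, box[3] + r)


def intersect(a, b):
    """Intersection of two boxes (assumed non-empty in this sketch)."""
    return (max(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), min(a[3], b[3]))


def merge(node):
    """Phase I (bottom-up): merging segment and linear delay for each node."""
    if not isinstance(node[0], tuple):           # a sink (x, y)
        x, y = node
        return {"ms": (x + y, x + y, x - y, x - y), "t": 0, "kids": []}
    a, b = merge(node[0]), merge(node[1])
    D = dist(a["ms"], b["ms"])
    ea = (D + b["t"] - a["t"]) / 2                # balances a.t + ea = b.t + eb
    if ea < 0:                                    # a is too slow: detour wire on b
        ea, eb = 0, a["t"] - b["t"]
    elif ea > D:                                  # b is too slow: detour wire on a
        ea, eb = b["t"] - a["t"], 0
    else:
        eb = D - ea
    ms = intersect(expand(a["ms"], ea), expand(b["ms"], eb))
    return {"ms": ms, "t": a["t"] + ea, "kids": [(a, ea), (b, eb)]}


def place(n, parent_uv=None, e=0):
    """Phase II (top-down): pick a point of ms(v) within |e_v| of the parent."""
    box = n["ms"]
    if parent_uv is not None:
        pu, pv = parent_uv
        box = intersect(box, expand((pu, pu, pv, pv), e))
    u, v = box[0], box[2]                         # any point of the box is valid
    n["pl"] = ((u + v) / 2, (u - v) / 2)          # back to (x, y)
    for child, length in n["kids"]:
        place(child, (u, v), length)


def wirelength(n):
    """Total assigned edge length in the subtree rooted at n."""
    return sum(length + wirelength(c) for c, length in n["kids"])


sinks = [(0, 0), (4, 2), (1, 5), (6, 6)]
topology = ((sinks[0], sinks[1]), (sinks[2], sinks[3]))
root = merge(topology)
place(root)
d = max(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in combinations(sinks, 2))
print(root["t"], d / 2, wirelength(root), root["pl"])
```

For this instance the source-sink delay is 6, which is one-half of the diameter 12 as Theorem 2 states, and the total wirelength is 18. The sketch always picks the lower corner of the feasible box in Phase II; any other point of $ms(v) \cap trr_p$ would give a different embedding with the same edge lengths.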
5 Suboptimality For Elmore Delay

While the experimental results in Section 6 clearly show the effectiveness of the DME algorithm in the Elmore delay model, examples exist for which DME does not give an optimal ZST under the Elmore model for a given topology [2][4]. The counterexample in [2][4] refutes the claim in [3] that the DME algorithm is optimal for any given routing topology under the Elmore model.

6 Results

We implemented the DME algorithm on Sun SPARC workstations in the C/UNIX environment. The code can be obtained from the authors. We used two sets of benchmarks: (i) the sink placements for the MCNC Primary1 and Primary2 benchmarks used in [5] and [6], and originally provided by the authors of [5]; and (ii) the sink placements for the five benchmarks r1 - r5 used in [8].

Our experimental results for linear delay are contained in Table 1. We applied the DME embedding algorithm to the topologies generated by the bottom-up, matching based method of Kahng, Cong and Robins (KCR) [6]. We compare our results with the original KCR results and with the Method of Means and Medians (MMM) of Jackson et al. [5]. The combined algorithm KCR+DME produced an average reduction in cost of 9% from the original KCR results and 16% from the MMM results. In the linear model, DME also produces trees with optimal source-sink delay. In our experiments, this optimal delay was on average 19% less than that of the KCR constructions.

          number     MMM      KCR      KCR+DME   reduction by   reduction by
          of sinks   cost     cost     cost      KCR+DME from   KCR+DME from
                                                 MMM (%)        KCR (%)
P1        269        161.7    153.9    140.3     13.2           8.8
P2        603        406.3    376.7    350.4     13.8           7.0
r1        267        1,815    1,627    1,497     17.5           8.0
r2        598        3,625    3,349    3,013     16.9           10.0
r3        862        4,643    4,360    3,902     16.0           10.5
r4        1,903      9,376    8,580    7,782     17.0           9.3
r5        3,101      13,805   12,928   11,665    15.5           9.8
average                                          15.7           9.1

Table 1: Comparison of KCR+DME with other algorithms for the linear delay model, using MCNC benchmarks Primary1 (P1) and Primary2 (P2), and benchmarks r1 through r5 from Tsay.

          MMM      Tsay     Tsay+DME   KCR+DME   reduction by   reduction by
          cost     cost     cost       cost      KCR+DME from   KCR+DME from
                                                 MMM (%)        Tsay (%)
P1        161.7                        140.3     13.2
P2        406.3                        348.3     14.3
r1        1,815    1,697    1,658      1,487     18.1           12.4
r2        3,625    3,432    3,368      3,020     16.7           12.0
r3        4,643    4,407    4,333      3,867     16.7           12.3
r4        9,376    8,866    8,694      7,713     17.7           13.0
r5        13,805   13,199   12,926     11,606    15.9           12.1
average                                          16.1           12.4

Table 2: Comparison of KCR+DME with other algorithms for the Elmore delay model. Results of Tsay's algorithm for benchmarks P1 and P2 were not available.

Similar improvements were obtained for Elmore delay on the same benchmarks, as shown in Table 2. The average reduction in wirelength was 16% versus the MMM results, and 12% versus the results of Tsay. Our results also indicate a very significant reduction in source-sink delay in the Elmore model: the combination of KCR+DME reduced Elmore delay by an average of 22% compared to the results of Tsay.

7 Conclusion

The Deferred-Merge Embedding (DME) algorithm offers many improvements over previous embedding schemes. DME constructs a highly flexible tree of merging segments which allows a choice among minimum-cost zero skew clock trees. Given any connection topology over the set of sink locations, DME always produces a tree with exact zero skew, and may thus be applied to previously generated clock trees in order to improve both wirelength and delay. Experiments show that applying DME to topologies generated by the algorithm of [6] results in wirelength reductions of 9% to 16% over [5] [6] [8]. Finally, under the linear delay model, DME yields optimal total wirelength for the topology and optimal source-sink delay overall.

8 Remarks and Acknowledgements

Most of the results in this paper also appear in [4], reflecting a collaboration between the present authors and the authors of [3] that arose after it was learned that the two groups had, through independent research, come up with essentially the same embedding approach. The authors are grateful to Dr. Ren-Song Tsay for providing benchmark data.

References

[1] H. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, 1990.
[2] K. D. Boese and A. B. Kahng, "Zero-Skew Clock Routing Trees With Minimum Wirelength," technical report UCLA CSD-920012, March 1992.
[3] T.-H. Chao, Y.-C. Hsu, and J.-M. Ho, "Zero Skew Clock Net Routing," to appear in Proc. ACM/IEEE Design Automation Conf., 1992.
[4] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K. D. Boese and A. B. Kahng, "Zero Skew Clock Routing With Minimum Wirelength," submitted to IEEE Transactions on Computers and Systems, 1992.
[5] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, "Clock Routing for High Performance ICs," Proc. ACM/IEEE Design Automation Conf., 1990, pp. 573-579.
[6] A. B. Kahng, J. Cong, and G. Robins, "High-Performance Clock Routing Based on Recursive Geometric Matching," Proc. ACM/IEEE Design Automation Conf., 1991, pp. 322-327.
[7] J. Rubinstein, P. Penfield, and M. A. Horowitz, "Signal Delay in RC Tree Networks," IEEE Transactions on Computer-Aided Design 2(3), July 1983, pp. 202-211.
[8] R. S. Tsay, "Exact Zero Skew," IEEE Int. Conference on Computer-Aided Design, 1991, pp. 336-339.
