Scalable H-Tree With Useful Skew
1937-4151 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. XX, NO. Y, MONTH 2017 2
TABLE I
PROPERTIES OF CLOCK TREE CONSTRUCTION PERFORMED BASED ON THE DYNAMIC IMPLIED SKEW CONSTRAINTS AND THE STATIC ARRIVAL TIME CONSTRAINTS, WHICH ARE RESPECTIVELY SHOWN IN FIGURE 1 AND FIGURE 2.
and lower bound of each arrival time range.

To facilitate the construction of such a BST, the minimum and maximum downstream delay of each subtree $i$ are stored and denoted $min\_t_i$ and $max\_t_i$, respectively. Initially, $min\_t = 0$ and $max\_t = 0$ for each subtree. Next, a clock tree is constructed by iteratively merging subtrees while ensuring that $max\_t_k - min\_t_k \le B$ of each formed subtree $k$.

A pair of subtrees $i$ and $j$ are merged into a larger subtree $k$ with $max\_t_k - min\_t_k \le B$ as follows: the subtrees $i$ and $j$ are connected with a wire and the length of the wire is equal to the Manhattan distance between the subtrees. (For certain pairs of delay imbalanced subtrees, detour wiring is required [1].) Next, the alternative locations for the root of subtree $k$ are determined. This can be performed in constant time, as the skew bound $B$ can be obtained in constant time and $min\_t_k$ and $max\_t_k$ can be computed incrementally as follows:

$$min\_t_k = \min\{min\_t_i + w(k, i),\; min\_t_j + w(k, j)\}, \tag{7}$$

$$max\_t_k = \max\{max\_t_i + w(k, i),\; max\_t_j + w(k, j)\}, \tag{8}$$

satisfies $\frac{B^v}{2} \ge x_i^{ub}$ and $\frac{B^v}{2} \ge -x_i^{lb}$ for all $i \in V$, which is illustrated in Figure 4. The virtual minimum delay offset $off_i^{min}$ and virtual maximum delay offset $off_i^{max}$ for a sink $i$ are specified by the arrival time constraints and $B^v$ as follows:

$$off_i^{min} = -\frac{B^v}{2} - x_i^{lb}, \tag{9}$$

$$off_i^{max} = \frac{B^v}{2} - x_i^{ub}, \tag{10}$$

The skew bound $B^v$ can be obtained in constant time and $min\_t_k$ and $max\_t_k$ can still be incrementally computed for each subtree $k$. Therefore, it is possible to merge a pair of subtrees in constant time. Given a specified routing tree topology, it is therefore also possible to construct a UST in linear time.

Note that the reference point is arbitrary and not specified and that $B^v$ can in fact be defined to an arbitrary value by the offsets. The generalization allows arrival time ranges to be unaligned (importantly pairwise non-intersecting) and of different lengths. The increased flexibility in the specification
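
As a minimal sketch of the constant-time merge, the Python fragment below applies the incremental updates of Eqs. (7) and (8) and initializes each sink with the virtual offsets of Eqs. (9) and (10). It assumes that the offsets serve as the initial min_t and max_t of a sink and that w(k, i) is a plain delay value. The names `Subtree`, `sink`, and `merge` are hypothetical and not taken from the paper; detour wiring and root placement are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtree:
    min_t: float  # minimum downstream delay of the subtree
    max_t: float  # maximum downstream delay of the subtree


def sink(x_lb: float, x_ub: float, b_v: float) -> Subtree:
    """Leaf for a sink with arrival time range [x_lb, x_ub].

    Assumption: the virtual offsets of Eqs. (9) and (10) are used as the
    initial min_t and max_t of the sink.
    """
    if b_v / 2 < x_ub or b_v / 2 < -x_lb:
        raise ValueError("B^v/2 must bound x_ub and -x_lb")
    return Subtree(min_t=-b_v / 2 - x_lb, max_t=b_v / 2 - x_ub)


def merge(i: Subtree, j: Subtree, w_ki: float, w_kj: float,
          bound: float) -> Optional[Subtree]:
    """Merge subtrees i and j under a new root k in constant time.

    w_ki and w_kj are the delays from k to the roots of i and j.
    Returns None when the skew bound would be violated.
    """
    min_t = min(i.min_t + w_ki, j.min_t + w_kj)  # Eq. (7)
    max_t = max(i.max_t + w_ki, j.max_t + w_kj)  # Eq. (8)
    if max_t - min_t > bound:
        return None
    return Subtree(min_t, max_t)
```

For example, with a bound of 4, sinks with ranges [-1, 1] and [0, 2] start at (min_t, max_t) = (-1, 1) and (-2, 0). Merging them with delays 1 and 2 gives (0, 2), which respects the bound, whereas delays 0 and 5 give a spread of 6 and the merge is rejected.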
stem from the fact that the inserted safety margins are smaller than the
magnitude of the delay variations introduced by OCV.
The clock tree construction is guided by the Elmore delay
model and pre-characterized LUTs for the buffers. The LUTs
characterize the output transition time and propagation delay
of the buffers as a function of the downstream load capacitance
and the input transition time. The CTO process is guided by
NGSPICE simulations. Optionally, NGSPICE simulations can
also be integrated into the CTS phase to accurately evaluate
the transition time of each subtree and to update the SG with
accurate timing after each buffer stage has been constructed.
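
To make the delay model concrete, the following is a minimal sketch of the standard Elmore delay on an RC tree. It is an illustration only: the array-based tree encoding and the function name `elmore_delays` are assumptions, units are left abstract, and buffers and their LUTs are not modeled.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root (node 0) to every node of an RC tree.

    parent[v]: index of the parent of node v (parent[0] is ignored).
               Assumption: parent[v] < v, so parents precede children.
    res[v]:    resistance of the wire from parent[v] to v.
    cap[v]:    capacitance lumped at node v.
    """
    n = len(parent)
    # Accumulate the total downstream capacitance seen at each node.
    down = list(cap)
    for v in range(n - 1, 0, -1):
        down[parent[v]] += down[v]
    # Each wire contributes its resistance times the capacitance it drives.
    delay = [0.0] * n
    for v in range(1, n):
        delay[v] = delay[parent[v]] + res[v] * down[v]
    return delay
```

For a root driving node 1 through a resistance of 2, with node 1 fanning out to nodes 2 and 3 through resistances 1 and 3, and node capacitances 0, 1, 2, and 1, the delays are 0, 8, 10, and 11.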
Fig. 7. Flow for CTS and CTO.
B. Solution space exploration
Solution space exploration is performed by transforming each subtree into alternative implementations. In the following, we describe the details of how the transformations are applied in the tree construction process.

1) Routing tree topology transformation: When a subtree with $n$ leaf nodes is inserted into the NNG, it is rerooted to $(2n - 3)$ subtrees with different tree topologies. To bound the run-time overhead, subtrees with detour wires are discarded during the rerooting process. In addition, if there are more than $N_{max}$ alternative implementations for a subtree, the $N_{max}$ implementations with the smallest capacitive cost are sampled and the remainder of the implementations are discarded [3]. Consequently, the run-time complexity of evaluating the cost of joining two subtrees is $O(N_{max}^2)$. After a pair of subtrees are joined, only the subtree combination with the smallest capacitive cost is kept. Next, the newly formed subtree is rerooted and reinserted into the NNG.

2) Buffer tree topology transformation: The buffer tree topology transformation is applied after a buffer has been inserted at the root of each subtree in the buffer insertion step. The transformation allows a buffered subtree with $n$ leaf nodes to be transformed into $(2n - 3)$ alternative implementations. However, many implementations are eliminated because of the transition time constraint. In our implementation, subtrees that required the insertion of detour wires were also removed. Next, the formed subtrees with buffers attached at the root are inserted into the NNG.

3) Buffer sizing and stem wire insertion transformation: Buffer sizing and stem wire insertion are applied after the buffer tree topology transformation. In combination, the two transformations result in up to $p \cdot q \cdot (2n - 3)$ alternative implementations for a subtree with $n$ leaf nodes.

When stem wires of different lengths are used, the cost of merging two subtrees is set to the length of the stem wires (in the previous buffer stage) plus the length of wires required to connect the subtree pair. If buffers are sized up, the capacitive cost of the buffers is also included. (The wire length cost can be translated into capacitive cost.) When stem wires of different lengths are used, it is common that multiple different alternative subtree combinations result in subtrees with the exact same (smallest) cost. For these cases, it is important to keep all (or at least multiple) alternative subtree combinations, or there may be a noticeable overall loss in solution quality. Even though the capacitive costs are the same, the root node of each topology may be restricted to different spatial locations. Note that the selection of the exact physical location of each buffer (and stem wire) is deferred using the DME paradigm until the top-down embedding as in [9].

C. Specification and respecification of arrival time constraints
In the construction of the bottommost buffered stage, the arrival time constraints are specified with respect to the sinks, as described in Section VII. In the construction of a higher-level buffer stage, an SG is formed and the arrival time constraints are respecified to expose additional timing margins. It is easy to respecify the constraints if only a single implementation is considered for each subtree. (We do not consider various buffer sizes and various stem wire lengths when respecifying the constraints.) However, when alternative buffer tree topologies are considered, negative cycles may be formed in the SG. Therefore, a subset of alternative subtree implementations that do not create negative cycles must be found. This is solved by first selecting the minimum cost subtree implementation for each subtree. Next, one alternative implementation is added to each subtree while it is verified that no negative cycles are formed in the SG. If a negative cycle is formed, the subtree implementation creating the cycle is discarded. This process is repeated until all implementations have been added or discarded.

D. Clock tree optimization
After an initial clock tree has been constructed in the CTS phase, some timing violations may still exist. The timing violations stem from the fact that the inserted safety margins $M_{user}$ are smaller than the magnitude of the delay variations $\delta_i$ and $\delta_j$ introduced by OCV. Let the closest common ancestor of $FF_i$ and $FF_j$ in the clock tree be denoted $CCA_{ij}$. In Eq. (1) and Eq. (2), $\delta_i$ ($\delta_j$) is equal to $c_{ocv}$ times the propagation delay between $CCA_{ij}$ and $FF_i$ ($FF_j$) in the clock tree. The parameter $c_{ocv}$ is set to 0.085 in our experiments. The total negative slack (TNS) is the sum of the timing violations in Eq. (1) and Eq. (2). The objective of the CTO phase is to reduce TNS to zero. The motivation for performing CTO in this paper is to allow comparisons with results in earlier studies where CTO was applied. We evaluate our clock trees both after CTS and after CTO, to demonstrate the effectiveness of the proposed framework.

In general, CTO is performed by realizing delay adjustments in the tree by inserting buffers and detour wires. The delay adjustments are specified using an LP formulation [18], [19], [20]. For further technical details of the CTO phase, please refer to [18], [19], [20].

IX. EXPERIMENTAL EVALUATION
In this section, we present experimental results to demonstrate the effectiveness of the proposed framework. In Section IX-A, we introduce the tree structures that are used in the evaluation. In Section IX-B, static arrival time constraints are compared with dynamic implied skew constraints. The solution space exploration is evaluated in Section IX-C. The techniques of specifying and respecifying arrival time constraints are evaluated in Section IX-D. The robustness of the constructed clock trees to OCV is evaluated in Section IX-E and the timing model selection is evaluated in Section IX-F. Note that CTO is only applied when evaluating the robustness to OCV and the timing model selection in Section IX-E and Section IX-F, respectively.

The algorithms are implemented in C++ and the experiments are performed on an 8-core 3.4 GHz Linux machine with 31.3 GB of memory.
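
As an illustration of the greedy screening of alternative subtree implementations described in the subsection on respecification of arrival time constraints, the sketch below uses Bellman-Ford relaxation to test the SG for negative cycles. It is written in Python for brevity rather than in the C++ of the actual implementation. How an implementation maps to SG edges is not given in this section, so the sketch assumes that each alternative contributes a list of weighted edges and that the candidates arrive as one flat ordered list; both function names are hypothetical.

```python
def has_negative_cycle(num_nodes, edges):
    """Bellman-Ford test on a directed graph given as (u, v, weight) edges.

    Starting every distance at 0 is equivalent to adding a virtual source
    with zero-weight edges to all nodes, so cycles anywhere are detected.
    """
    dist = [0.0] * num_nodes
    for _ in range(max(1, num_nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged: no negative cycle
    return True  # still relaxing after num_nodes passes


def select_implementations(num_nodes, base_edges, alternatives):
    """Greedily keep the alternatives that do not create a negative cycle.

    base_edges:   edges induced by the minimum cost implementation of
                  every subtree (assumed free of negative cycles).
    alternatives: list of (name, edges) candidates in the order tried.
    """
    kept_edges = list(base_edges)
    kept = []
    for name, edges in alternatives:
        trial = kept_edges + list(edges)
        if has_negative_cycle(num_nodes, trial):
            continue  # discard the implementation that creates the cycle
        kept_edges = trial
        kept.append(name)
    return kept
```

Each candidate triggers one full Bellman-Ford pass in this sketch, which is simple but not incremental; a production flow would likely update the shortest-path labels instead of recomputing them.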
TABLE III
COMPARISONS OF VARIOUS TREE STRUCTURES.
TABLE IV
EVALUATION OF CLOCK TREES IN TNS, TIMING YIELD, AND CAPACITIVE COST. A ‘-’ IN THE CTO RUN-TIME COLUMN MEANS THAT CTO IS NOT REQUIRED.