Slack Driven Clock Offset CTS
Multi-corner Multi-mode
Timing Closure
Subhendu Roy¹, Pavlos M. Mattheakis², Laurent Masse-Navette² and David Z. Pan¹
¹ECE Department, The University of Texas at Austin
²Mentor Graphics, Fremont
Outline
! CTS Preliminaries
! Prior Work and Limitations
! Experimental Results
CTS Preliminaries
Prior Work and Limitations (1)
Prior Work and Limitations (2)
! [Lu+, IMSCS’09]: post-CTS bounded delay buffering at leaves
› Buffering at leaves incurs high area/power cost
› Does not tackle the MCMM scenario
[Figure: clock buffers B1 to B3 driving flip-flops, one with D-slack < 0 and Q-slack > 0, the other with D-slack > 0 and Q-slack < 0; buffering at the leaves incurs too much area cost]
Notion of Offset
! Pre-CTS useful skew: difficult to implement
! Post-CTS useful skew: greedy, high area cost, may not support MCMM
[Figure: clock schedules s1 to s5 at registers ff1 to ff5 vs. offsets o1, o2 at the outputs of clock buffers B1 to B3; offsets reduce the granularity of clock scheduling]
! Positive offset if doff > 0: clock arrival at B1's output is to be delayed by doff
! Negative offset if doff < 0: clock arrival at B1's output is to be expedited by |doff|
[Figure: B0 drives B2, B1, B3; B1 drives B4, B5; offset doff applied at B1's output]
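A minimal sketch of the offset notion, assuming an ideal additive delay model in which an offset at a pin simply shifts the clock arrival of that pin and of every cell in its fan-out cone. The tree shape follows the figure; the delay values and the names `CHILDREN`, `DELAY` and `arrivals` are hypothetical.

```python
from typing import Dict, List

# Hypothetical clock tree matching the figure: B0 drives B2, B1, B3; B1 drives B4, B5.
CHILDREN: Dict[str, List[str]] = {"B0": ["B2", "B1", "B3"], "B1": ["B4", "B5"]}
# Made-up cell delays in ps, for illustration only.
DELAY: Dict[str, float] = {"B0": 0.0, "B1": 50.0, "B2": 50.0,
                           "B3": 50.0, "B4": 40.0, "B5": 40.0}


def arrivals(root: str, offsets: Dict[str, float]) -> Dict[str, float]:
    """Clock arrival at each cell output pin (ideal model, no load effects).

    An offset at a pin shifts that pin and its whole fan-out cone:
    d_off > 0 delays the arrival, d_off < 0 expedites it.
    """
    result: Dict[str, float] = {}
    stack = [(root, 0.0)]
    while stack:
        node, t_in = stack.pop()
        t_out = t_in + DELAY[node] + offsets.get(node, 0.0)
        result[node] = t_out
        for child in CHILDREN.get(node, []):
            stack.append((child, t_out))
    return result


base = arrivals("B0", {})
positive = arrivals("B0", {"B1": 20.0})   # +d_off at B1's output
negative = arrivals("B0", {"B1": -20.0})  # -d_off at B1's output
```

Load changes are deliberately ignored here; in a real tree they are what makes realizing offsets, especially negative ones, difficult.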
Our Contributions
! First work to consider offsets at output pins of clock tree cells
› In a placed design with an already routed clock tree
! An area-efficient and non-intrusive algorithm is presented
› To realize negative offsets
! A methodology for clock tree resynthesis is presented
› Significantly improved timing metrics in large-scale industrial designs under MCMM scenarios
How CT Resynthesis Fits in the Flow
Floorplanning, Placement
→ Pre-CTS Optimization
→ Clock Tree Resynthesis
› Estimate offsets by LP solver
› Realize offsets incrementally
→ Post-CTS Data-path Optimization
MCMM Offset Estimation
Positive Offset Realization
No impact on siblings
[Figure: a delay block D1 inserted at B1's output realizes +doff for B4 and B5; B1's siblings B2 and B3 under B0 are unaffected]
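The sketch below models delay-block insertion under an ideal additive delay model; the names and values are hypothetical, and D1's delay is assumed to equal d_off exactly. Because D1 is spliced in after B1's output, the load B0 sees (B1's input pin) does not change, which is why the siblings B2 and B3 keep their arrivals.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]


def insert_delay_block(children: Tree, delay: Dict[str, float],
                       pin: str, block: str, d_off: float) -> None:
    """Realize a positive offset at `pin` by splicing a delay block between
    `pin` and all of its current fan-outs (edits the tree in place)."""
    if d_off <= 0:
        raise ValueError("delay insertion can only realize positive offsets")
    children[block] = children.get(pin, [])
    children[pin] = [block]
    delay[block] = d_off  # assumption: the block's delay equals d_off exactly


def arrivals(children: Tree, delay: Dict[str, float], root: str) -> Dict[str, float]:
    """Clock arrival at each cell output under an ideal additive delay model."""
    out: Dict[str, float] = {}
    stack = [(root, 0.0)]
    while stack:
        node, t_in = stack.pop()
        out[node] = t_in + delay[node]
        stack.extend((c, out[node]) for c in children.get(node, []))
    return out


children: Tree = {"B0": ["B2", "B1", "B3"], "B1": ["B4", "B5"]}
delay = {"B0": 0.0, "B1": 50.0, "B2": 50.0, "B3": 50.0, "B4": 40.0, "B5": 40.0}
before = arrivals(children, delay, "B0")
insert_delay_block(children, delay, "B1", "D1", 15.0)  # +d_off = 15 ps at B1
after = arrivals(children, delay, "B0")
```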
Negative Offset Realization Issues (1)
[Figure: clock tree B0 to B6 before and after realizing −doff at B1's output]
Negative Offset Realization Issues (2)
! Speed-up by buffer removal may not be practically realizable
[Figure: removing buffer B1 between B0 and B2, B3, B4; annotations Levels = [0 3], Levels = [-1 3], Levels = [-3 3] at nodes hn1, hn2]
Robust Negative Offset Realization
! Restructuring should guarantee no adverse impact on the clock tree under MCMM
! Need to identify potential acceptor pins
› Sequential cells in the transitive fan-out (TFO) should have positive slack available
[Figure: restructuring among B0 to B6 to realize −doff at B1's output; B0 needs to be a good acceptor]
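A minimal acceptor test, assuming that "available positive slack" means every sequential cell in the pin's TFO keeps positive Q-slack in every scenario (the Q-slack reading follows the later check on B4). The scenario names, slack values and the optional `margin` parameter are hypothetical.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]


def tfo_sinks(pin: str, children: Tree) -> Set[str]:
    """Sequential sinks (leaves) in the transitive fan-out of a clock pin."""
    sinks: Set[str] = set()
    stack = [pin]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            sinks.add(node)
    return sinks


def is_acceptor(pin: str, children: Tree,
                q_slack: Dict[str, Dict[str, float]], margin: float = 0.0) -> bool:
    """True if every sequential cell in the TFO of `pin` keeps Q-slack above
    `margin` in every scenario (q_slack[ff][scenario])."""
    return all(slack > margin
               for ff in tfo_sinks(pin, children)
               for slack in q_slack[ff].values())


# Hypothetical subtrees and per-scenario Q-slacks (ps).
children: Tree = {"B5": ["ff6", "ff7"], "B6": ["ff8"]}
q_slack = {
    "ff6": {"func_ss": 12.0, "func_ff": 5.0},
    "ff7": {"func_ss": 3.0, "func_ff": -1.0},
    "ff8": {"func_ss": 9.0, "func_ff": 4.0},
}
```

Here B5 is rejected because ff7 fails in one scenario, while B6 qualifies; checking all scenarios is what keeps the restructuring safe under MCMM.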
Slack Manager to Identify Acceptors
[Figure: clock subtree rooted at B1 over ff1 to ff5, annotated with Q-slack parameters]
B1: Qslksum = −8, Qslkcnt = 2
B3: Qslksum = −2, Qslkcnt = 1
B2: Qslksum = −6, Qslkcnt = 1
ff1 to ff5: Qslk = 8, 4, −2, 8, −6
! Same info kept for D-slack parameters
! Slack parameters calculated
› Per scenario (mode + corner combination)
› Bottom-up fashion
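The bottom-up computation can be sketched as a post-order traversal. The topology (B1 driving B3 and B2, with ff1 to ff3 under B3 and ff4, ff5 under B2) is an assumption chosen so that the results reproduce the slide's values; Qslksum is read as the sum of the negative Q-slacks in a node's fan-out and Qslkcnt as their count. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]
Stats = Tuple[float, int]  # (sum of negative slacks, count of negative slacks)

# Hypothetical topology chosen so the numbers match the slide.
CHILDREN: Tree = {"B1": ["B3", "B2"], "B3": ["ff1", "ff2", "ff3"], "B2": ["ff4", "ff5"]}
Q_SLACK = {"ff1": 8.0, "ff2": 4.0, "ff3": -2.0, "ff4": 8.0, "ff5": -6.0}


def aggregate(node: str, children: Tree, slack: Dict[str, float],
              table: Dict[str, Stats]) -> Stats:
    """Post-order (bottom-up) pass storing each node's negative-slack stats."""
    kids = children.get(node, [])
    if not kids:  # a sequential sink
        s = slack[node]
        stats = (s, 1) if s < 0 else (0.0, 0)
    else:
        child_stats = [aggregate(k, children, slack, table) for k in kids]
        stats = (sum(s for s, _ in child_stats), sum(c for _, c in child_stats))
    table[node] = stats
    return stats


def slack_manager(root: str, children: Tree,
                  slack_by_scenario: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, Stats]]:
    """One table per scenario (mode + corner); call again with D-slacks."""
    tables: Dict[str, Dict[str, Stats]] = {}
    for scenario, slack in slack_by_scenario.items():
        tables[scenario] = {}
        aggregate(root, children, slack, tables[scenario])
    return tables


tables = slack_manager("B1", CHILDREN, {"mode1_corner1": Q_SLACK})
```

Each node is visited once per scenario, so the tables cost time linear in tree size times the number of scenarios.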
Clock Tree Restructuring
[Figure: B4 at level x − 1; B0, B5, B6 at level x; B1 at level x + 1, driving B2 and B3]
Is (neg. Q-slack count at B0) − (neg. D-slack count at B0) ≥ 0?
No → Size up B1
Yes → To move B1, is neg. Q-slack count at B4 = 0 across all scenarios?
Is neg. Q-slack count at B4 = 0 across all scenarios?
Yes → B4 is a candidate acceptor
[Figure: after restructuring, B1 (driving B2 and B3) is moved up to level x, below B4 at level x − 1]
Restructuring guarantees no adverse impact on the FFs in the TFO of B5 and B6
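The restructuring decisions above can be written as a small function over the counts kept by the slack manager. The slides do not say what happens when B4 fails its check, so the `reject_B4` result is a hypothetical placeholder; treating the B0 counts as single numbers rather than per-scenario values is also an assumption, and all names are illustrative.

```python
from typing import Dict


def restructure_step(neg_q_b0: int, neg_d_b0: int,
                     neg_q_b4: Dict[str, int]) -> str:
    """Decision steps from the Clock Tree Restructuring slides.

    neg_q_b0 / neg_d_b0: negative Q-/D-slack counts under B0 (B1's driver).
    neg_q_b4: negative Q-slack count under B4, keyed by scenario.
    """
    if neg_q_b0 - neg_d_b0 < 0:
        return "size_up_B1"
    if all(count == 0 for count in neg_q_b4.values()):
        return "move_B1_under_B4"  # B4 is a candidate acceptor
    return "reject_B4"  # hypothetical: this branch is not shown on the slides
```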
Neg. Offset Realization Algorithm (NORA)
Cost Function
! Sort according to geometrical proximity
$$\text{Cost} = \begin{cases} \infty, & \text{if DRC violation} \\ \beta \cdot \text{error}, & \text{otherwise} \end{cases}$$
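A sketch of acceptor selection with this cost, assuming `error` is the absolute difference between the offset a candidate would achieve and the target offset, and that each candidate's achieved offset and DRC status are already known. `Candidate`, `pick_acceptor` and the optional `k` (keep only the k nearest candidates) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    x: float
    y: float
    achieved_offset: float  # hypothetical estimate of the offset if moved here
    drc_violation: bool


def cost(c: Candidate, target_offset: float, beta: float = 1.0) -> float:
    """Cost = inf on a DRC violation, otherwise beta * error, where error is
    assumed to be |achieved_offset - target_offset|."""
    if c.drc_violation:
        return math.inf
    return beta * abs(c.achieved_offset - target_offset)


def pick_acceptor(cands: Sequence[Candidate], pin_xy: Tuple[float, float],
                  target_offset: float, beta: float = 1.0,
                  k: Optional[int] = None) -> Optional[Candidate]:
    """Sort by distance to the pin, optionally keep the k nearest, and return
    the cheapest finite-cost candidate (or None if all are infinite)."""
    ordered: List[Candidate] = sorted(
        cands, key=lambda c: math.hypot(c.x - pin_xy[0], c.y - pin_xy[1]))
    if k is not None:
        ordered = ordered[:k]
    best = min(ordered, key=lambda c: cost(c, target_offset, beta), default=None)
    if best is None or math.isinf(cost(best, target_offset, beta)):
        return None
    return best


cands = [Candidate("B4", 10.0, 0.0, -18.0, False),
         Candidate("B7", 2.0, 0.0, -20.0, True),
         Candidate("B8", 30.0, 0.0, -20.0, False)]
```

Since `min` returns the first minimum it encounters, ties go to the geometrically closer candidate.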
Clock Tree Resynthesis Algorithm
Calculate clock tree offsets
→ Extract offset(p); is Offset(p) > 0?
› Yes → positive offset realization
› No → NORA(p, offset)
→ Any remaining offset?
› Yes → extract the next offset(p)
› No → End
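The flowchart's loop, sketched with the two realization steps passed in as callbacks. The callback names are hypothetical, and skipping zero offsets and processing offsets in insertion order are assumptions the flowchart does not state.

```python
from collections import deque
from typing import Callable, Dict

Realizer = Callable[[str, float], None]


def resynthesize(offsets: Dict[str, float], realize_positive: Realizer,
                 nora: Realizer) -> None:
    """Loop of the Clock Tree Resynthesis flowchart.

    `offsets` holds the pre-computed offset per clock-tree pin p.
    Zero offsets need no change and are skipped (an assumption).
    """
    pending = deque(offsets.items())
    while pending:                      # "Any remaining offset?"
        pin, off = pending.popleft()    # "Extract offset(p)"
        if off > 0:                     # "Offset(p) > 0?"
            realize_positive(pin, off)
        elif off < 0:
            nora(pin, off)              # NORA(p, offset)


calls = []
resynthesize({"B1": 15.0, "B7": -12.0, "B9": 0.0},
             realize_positive=lambda p, o: calls.append(("delay", p, o)),
             nora=lambda p, o: calls.append(("nora", p, o)))
```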
Experimental Setup
! Integrated into an industrial P&R tool
! Run on a 16-core 3 GHz CPU with 256 GB RAM
Only Negative Offset Realization
! Restructuring is area-efficient
! Avg. 15.85% improvement in TNS (total negative slack)
Pos. and Neg. Offset Realization
The Overall Comparison
Conclusion and Future Work
! First work to consider offsets at output pins of clock tree cells instead of estimating the clock schedule at registers
! A novel clock tree resynthesis methodology is presented
Future Work:
! Concurrent offset realization
! Introduce OCV impact into the cost function
THANK YOU
Questions?
Back-up Slides
Future Work
Local Transformation
! Speed-up by buffer removal may not be practically realizable
[Figure: removing buffer B1 between B0 and B2, B3, B4]
[Rama12] V. Ramachandran, "Functional Skew Aware Clock Tree Synthesis," ISPD 2012.
Motivation
! [Kour99], [Naw06]: data-path-aware clock scheduling
› Calculate clock skew in the pre-CTS stage
› Actual implementation difficult to achieve
› Unaware of MCMM scenarios
Preliminaries
[Figure: FF1 → Comb. Block → FF2, with clock skew $s_d - s_s$ between destination and source registers]
Motivation
! Issues
› Maximum operating frequency limited
› Sacrifice in area/power
Motivation
$t_{pd,reg} = 2$ ns, $T_{su} = 1$ ns, combinational delays of 17 ns and 11 ns
$$T + (s_d - s_s) \geq t_{pd,reg} + t_{pd,comb} + T_{su}$$
Without skew: $T_{clock,min} = 2 + 17 + 1 = 20$ ns
Motivation
$t_{pd,reg} = 2$ ns, $T_{su} = 1$ ns, combinational delays of 17 ns and 11 ns
$$T + (s_d - s_s) \geq t_{pd,reg} + t_{pd,comb} + T_{su}$$
With 3 ns of useful skew: $T_{clock,min} = 17$ ns
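A quick check of the numbers, assuming the 17 ns and 11 ns paths are consecutive stages through three registers and that the 3 ns of useful skew delays the clock of the middle register; hold constraints are ignored. `min_period` is a hypothetical helper.

```python
from typing import Sequence


def min_period(t_pd_reg: float, t_su: float,
               comb_delays: Sequence[float], arrivals: Sequence[float]) -> float:
    """Smallest T meeting T + (s_d - s_s) >= t_pd_reg + t_pd_comb + T_su on
    every stage; stage i runs from register i to register i + 1."""
    return max(t_pd_reg + d + t_su - (arrivals[i + 1] - arrivals[i])
               for i, d in enumerate(comb_delays))


zero_skew = min_period(2.0, 1.0, [17.0, 11.0], [0.0, 0.0, 0.0])    # ns
useful_skew = min_period(2.0, 1.0, [17.0, 11.0], [0.0, 3.0, 0.0])  # ns
```

Delaying the middle register's clock by 3 ns gives the 17 ns stage 3 ns more time and takes 3 ns from the 11 ns stage, so both stages end up requiring $T \geq 17$ ns.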
Outline
! Preliminaries
! Motivation
! Our Approach
! Feasibility Aware Clock Scheduling (FACS)
! Experimental Results
What is Offset?
[Figure: B0 drives B2, B1, B3; B1 drives B4, B5; the offset is applied at op, the output pin of B1]
+doff: clock arrival at op is to be delayed by doff
−doff: clock arrival at op is to be expedited by doff
Experimental Results
Discussion:
! In design E, the clock-tree overhead (54.98%) seems high!
Offset Extraction in MCMM
! MCMM Handling
› Scaling factors are calculated for each corner
› Functional timing paths across all active modes are analyzed