Machine Learning Applications in Physical Design - Recent Results and Directions
1 CONTEXT: THE LAST SCALING LEVERS
Semiconductor technology scaling is challenged on many fronts that include pitch scaling, patterning flexibility, wafer processing cost, interconnect resistance, and variability. The difficulty of continuing Moore’s-Law lateral scaling beyond the foundry 5nm node has been widely lamented. Scaling boosters (buried interconnects, backside power delivery, supervias), next device architectures (VGAA FETs), ever-improving design-technology co-optimizations, and use of the vertical dimension (heterogeneous multi-die integration, monolithic 3D VLSI) all offer potential extensions of the industry’s scaling trajectory. In addition, various “rebooting computing” paradigms – quantum, approximate, stochastic, adiabatic, neuromorphic, etc. – are being actively explored.
No matter how future extensions of semiconductor scaling materialize, the industry already faces a crisis: design of new products in advanced nodes costs too much.1 Cost pressures rise when incremental technology and product benefits fall. Transitioning from 40nm to 28nm brought as little as 20% power, performance or area (PPA) benefit. Today, going from foundry 10nm to 7nm, or from 7nm to 5nm, the benefit is significantly less, and products may
Figure 1: Design Capability Gap [40] [20].
More broadly, the industry faces three intertwined challenges: cost, quality and predictability. Cost corresponds to engineering effort, compute effort, and schedule. Quality corresponds to traditional power, performance and area (PPA) competitive metrics along with other criteria such as reliability and yield (which also determines cost). Predictability corresponds to the reliability of the design schedule, e.g., whether there will be unforeseen floorplan ECO iterations, whether detailed routing or timing closure flow stages will have larger than anticipated turnaround time, etc. Product quality of results (QOR) must also be predictable. Each of these three challenges implies a corresponding “last lever” for scaling. In other words, reduction of design cost, improvement of design quality, and reduction of design schedule (which is the flip side of predictability; recall that Moore’s Law is “one week equals one percent”) are all forms of design-based equivalent scaling [19] [20] that can extend availability of leading-edge technology to designers and new products. A powerful lever for this will be the use of machine learning (ML) techniques, both inside and “around” electronic design automation (EDA) tools.
The remainder of this paper reviews opportunities for machine learning in IC physical implementation. Section 2 reviews example ML applications aimed at removing unnecessary design and modeling margins through new correlation mechanisms. Section 3 reviews applications that seek faster design convergence through predictors of downstream flow outcomes. Section 4 gives a broader vision of how ML can help the IC design and EDA fields escape the current “local minimum” of coevolution in design methodology and design tools. Section 5 concludes with open challenges for ML in IC physical design. Since this paper shares its subject matter and was written contemporaneously with [23], readers are referred to [23] for additional context.
1 The 2001 International Technology Roadmap for Semiconductors [40] noted that “cost of design is the greatest threat to continuation of the semiconductor roadmap”.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ISPD’18, March 25–28, 2018, Monterey, CA, USA
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5626-8/18/03...$15.00
https://doi.org/10.1145/3177540.3177554
Statistical and Machine Learning-Based CAD ISPD’18, March 25–28, 2018, Monterey, CA, USA
Figure 2: Accuracy-cost tradeoff in analysis.
Miscorrelation forces introduction of design guardbands and/or pessimism into the flow. For example, if the place-and-route (P&R) tool’s STA report determines that an endpoint has positive worst setup slack, while the signoff STA tool determines that the same endpoint has negative worst slack, an iteration (ECO fixing step) will be required. On the other hand, if the P&R tool applies pessimism to guardband its miscorrelation to the signoff tool, this will cause unneeded sizing, shielding or VT-swapping operations that cost area, power and design schedule. Miscorrelation of timing analyses is particularly harmful: (i) timing closure can consume up to 60% of design time [12], and (ii) added guardbands not only worsen power-speed-area tradeoffs [3, 9, 12], but can also lead to non-convergence of the design.
Signoff Timer Correlation. Correlation to signoff timing is the most valuable target for ML in back-end design. Improved correlation can give “better accuracy for free” that shifts the cost-accuracy tradeoff (i.e., achieving the ML impact in Figure 2) and reduces iterations, turnaround time, overdesign, and tool license usage along the entire path to final design signoff.3 [27] uses a learning-based approach to fit analytical models of wire slew and delay to estimates from a signoff STA tool. These models improve accuracy of delay and slew estimations along with overall timer correlation, such that fewer invocations of signoff STA are needed during incremental gate sizing optimization [34]. [16] applies deep learning to model and correct divergence between different STA tools with respect to flip-flop setup time, cell arc delay, wire delay, stage delay, and path slack at timing endpoints. The approach achieves substantial (multiple stage delays) reductions in miscorrelation. Both a one-time training methodology using artificial and real circuit topologies, as well as an incremental training flow during production usage, are described (Figure 3(a)). [30] achieves accurate (sub-10ps worst-case error in a foundry 28nm FDSOI technology) prediction of SI-mode timing slacks based on “cheaper, faster” non-SI mode reports. A combination of electrical, functional and topological parameters is used to predict the incremental transition times and arc/path delays due to SI effects. From this and other works, an apparent “no-brainer” is to use Hybrid Surrogate Modeling (HSM) [28] to combine predicted values from multiple ML models into final predictions (Figure 3(b)).
Figure 3: Flow and results for machine learning of STA tool miscorrelation: (a) [16]; (b) [30]. HSM approaches are described in [28] [29].
Next Targets. [23] identifies two near-term extensions in the realm of timer analysis correlation. (1) PBA from GBA. Timing analysis pessimism is reduced with path-based analysis (PBA), at the cost of significantly greater runtime than traditional graph-based analysis (GBA). In GBA, worst (resp. best) transitions (for max (resp. min) delay analyses) are propagated at each pin along a timing path, leading to conservative arrival time estimates. PBA calculates path-specific transition and arrival times at each pin, reducing pessimism that can easily exceed a stage delay. Figure 4 shows the frequency distribution of endpoint slack pessimism in GBA. This pessimism harms the design flow, e.g., when GBA reports negative slack while PBA slack is positive, schedule and chip resources are wasted to fix false timing violations; when both GBA and PBA report negative slack, there is waste from over-fixing per the GBA report; etc. Similar considerations apply to accuracy requirements for prediction of PBA slack itself. (2) Prediction of timing at “missing corners”. Today’s signoff timing analysis is performed at 200+ corners, and even P&R and optimization steps of physical design must satisfy constraints at dozens of corners. [23] [24] note that prediction of STA results for one or more “missing” corners that are not analyzed, based on the STA reports for corners that are analyzed, corresponds to matrix completion in ML [6], and that the outlook for this ML application is promising. An implicit challenge is to identify or synthesize the K timing corners that will enable the most accurate prediction of timing at all N production timing corners. Product teams can also inform foundries and library teams of these K corners, so that the corresponding timing libraries can be the first to be characterized.
2 The figure’s y-axis shows that the error of the simplest estimates (e.g., “Elmore delay”) can be viewed as having accuracy of (100 − x)%. The return on investment for new ML applications would be higher when x is larger.
3 Given that miscorrelation equates with margin, it is useful to note [18].
Figure 4: Frequency distribution of ((PBA slack) − (GBA slack)) at endpoints of netcard, 28FDSOI.
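The source of GBA pessimism described under “PBA from GBA” can be illustrated with a minimal sketch. It is not the algorithm of any STA tool: the linear delay and slew model, the Stage fields, and all numbers (in ps) are hypothetical and chosen only so that the arithmetic is easy to follow. Two branches merge at a pin; GBA carries forward the worst arrival and the worst slew even when they come from different branches, while PBA keeps each path's own slew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical linear stage model (all values in ps)."""
    base_delay: float      # delay at zero input slew
    delay_per_slew: float  # added delay per ps of input slew
    base_slew: float       # output slew at zero input slew
    slew_gain: float       # output slew per ps of input slew


def propagate(arrival, slew, stages):
    """Push one (arrival, slew) pair through a chain of stages."""
    for st in stages:
        arrival += st.base_delay + st.delay_per_slew * slew
        slew = st.base_slew + st.slew_gain * slew
    return arrival


def gba_arrival(branches, stages):
    # GBA (max analysis): merge to the worst arrival and the worst slew,
    # even if they belong to different branches.
    worst_arrival = max(a for a, _ in branches)
    worst_slew = max(s for _, s in branches)
    return propagate(worst_arrival, worst_slew, stages)


def pba_arrival(branches, stages):
    # PBA: each path keeps its own slew; report the worst resulting arrival.
    return max(propagate(a, s, stages) for a, s in branches)


# (arrival, slew) of two branches at the merge pin: one late and sharp,
# one early and slow.
branches = [(100.0, 20.0), (80.0, 60.0)]
stages = [Stage(base_delay=30.0, delay_per_slew=0.5, base_slew=10.0, slew_gain=0.5)]
required = 150.0

gba_slack = required - gba_arrival(branches, stages)
pba_slack = required - pba_arrival(branches, stages)
print(gba_slack, pba_slack, pba_slack - gba_slack)  # -10.0 10.0 20.0
```

In this toy case GBA reports a violation (slack of -10 ps) that PBA shows to be false (slack of +10 ps), which is the situation in which fixing effort is wasted. The difference of 20 ps is the quantity whose distribution Figure 4 reports for a real design.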
lie in finding the fixed point of a chicken-egg loop, as noted in [7] [22]. An example challenge today is to predict the fixed point for (non-uniform) power distribution and post-P&R layout that meets signoff constraints with maximum utilization.

4 SOC IMPLEMENTATION: A VISION
Physical design tools and flows today are unpredictable. A root cause is that many complex heuristics have been accreted upon previous complex heuristics. Thus, tools have become unpredictable, particularly when they are forced to try hard. Figure 7 (left), from implementation of the PULPino low-power RISC-V core in a foundry 14nm enablement, shows that post-P&R area can change by 6% when target frequency changes by just 10MHz near the maximum achievable frequency. Figure 7 (right) illustrates that the statistics of this noisy tool behavior are Gaussian [32] [17]. Unpredictability of design implementation results in unpredictability of the design schedule. However, since product companies must strictly meet design and tapeout schedules, the design target (PPA) must be guardbanded, impacting product quality and profitability. Put another way: (i) our heuristics and tools are chaotic when designers demand best-quality results; and (ii) when designers want predictable results, they must aim low.
reduces the time needed to solve any given subproblem, and smaller subproblems can be better solved (see [33]). At the same time, increasing the number of design partitions without undue loss of global solution quality demands new placement, global routing and optimization algorithms, as well as fundamentally new RTL partition and floorplan co-optimization capabilities. Further, reducing design flexibility by giving designers “freedoms from choice” with respect to RTL constructs, power distribution, clock distribution, global buffering, non-default wiring rules, etc. would increase predictability, leading to fewer iterations (ideally, single-pass design). Turnaround time is then minimized. Improved predictability and fewer iterations result in smaller design guardbands. The end result: improvement of achieved design quality, which shrinks the design capability gap. As pointed out in [24], achieving this vision of future SOC design methodology would improve quality, schedule and cost – i.e., “the last scaling levers”. A number of new mindsets for tool developers and design flow engineers are implicit: (i) tools and flows should never return unexpected results; (ii) designers should see predictability, not chaos, in their tools and flows; (iii) cloud deployment and parallel search can help to preserve or improve achieved quality of results; and (iv) the focus of design-based equivalent scaling is on sustained reduction of design time and design effort.
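A back-of-the-envelope sketch can connect two points made in this section: Gaussian tool noise forces a guardbanded target, and parallel search can recover some of that guardband. The sketch assumes that post-P&R area of a single run is Gaussian and that parallel runs are independent. The function name, the 95% confidence level, and the mean and sigma values are hypothetical and are not data from the cited works.

```python
from statistics import NormalDist


def committable_area(mean_area, sigma_area, n_parallel_runs, confidence=0.95):
    """Smallest area target that the best of n independent runs meets with
    the given confidence, under a Gaussian model of tool noise."""
    # P(best of n <= x) = 1 - (1 - F(x))**n, so the per-run probability
    # needed is F(x) = 1 - (1 - confidence)**(1/n).
    per_run_prob = 1.0 - (1.0 - confidence) ** (1.0 / n_parallel_runs)
    return NormalDist(mean_area, sigma_area).inv_cdf(per_run_prob)


# Hypothetical noise statistics, in arbitrary area units.
mean_area, sigma_area = 1000.0, 20.0
single = committable_area(mean_area, sigma_area, 1)
parallel = committable_area(mean_area, sigma_area, 8)
print(f"target with 1 run: {single:.1f}, with best of 8 runs: {parallel:.1f}")
```

Under these assumptions, a team relying on one run must commit to an area above the mean (it must “aim low” on quality), while a team that launches several runs and keeps the best can commit to a tighter target at the same confidence. If runs are correlated in practice, the benefit would be smaller than this sketch suggests.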
I.e., as in [22] [7], with early priorities being vectorless dynamic IR drop and power-temperature loops. (5) Continued improvement of timing correlation and estimation as in [16] [30]. Matching the golden tool earlier in the flow will more accurately drive optimizations and reduce ECO iterations.
Predictive Models of Tools and Designs. (1) Prediction of the convergent point for non-uniform PDN and P&R. The PDN is defined before placement, but power analysis and routability impact can be assessed only after routing. (2) Estimation of the PPA response of a given block in response to floorplan optimizations. Final PPA impacts of feedthroughs, shape, utilization, memory placement, etc. must be comprehended to enable floorplan assessment and optimization (within a higher-level exploration of design partitioning/floorplanning solutions). (3) Estimation of useful skew impact on post-route WNS, TNS metrics. See, e.g., [10]. A low-level related challenge: predicting buffer locations to optimize both common paths and useful skew. (4) “Auto-magic” determination of constraints for a given netlist, for given performance and power targets – i.e., best settings for maxtrans, maxcap, clock uncertainty, etc. at each flow stage. More generally, determine “magic” corners and constraints that will produce the best netlist to send into P&R. (5) Prediction of the best “target sequence” of constraints through layout optimization phases. I.e., timing and power targets at synthesis, placement, etc. such that best final PPA metrics are achieved. (6) Prediction of impacts (setup, hold slack, max transition, power) of an ECO, across MCMM scenarios. (7) Prediction of the “most-optimizable” cells during design closure. Many optimization steps are wasted on instances that cannot be perturbed due to placement, timing, power and other context. (8) Prediction of divergence (detouring, timing/slew violations) between trial/global route and final detailed route. (9) Prediction of “doomed runs” at all steps of the physical design flow.
And More. (1) Infrastructure for ML in IC design. Standards for model encapsulation, model application, IP-preserving model sharing, etc. are yet to be developed. (2) Standard ML platform for EDA modeling. Enablement of design metrics collection, tool and flow model generation, design-adaptive tool and flow configuration, prediction of tool and flow outcomes, etc. would realize the original vision of METRICS [36] [11] [31]. (3) Development of more modelable algorithms and tools with smoother, less-chaotic outcomes than present methods. (4) Development of datasets to support ML. This spans new classes of artificial circuits and “eyecharts”, as well as sharing of training data and the data generation task across different design organizations.

6 ACKNOWLEDGMENTS
Many thanks are due to Dr. Tuck-Boon Chan, Dr. Jiajia Li, Dr. Siddhartha Nath, Dr. Stefanus Mantik, Dr. Kambiz Samadi, Dr. Kwangok Jeong, Ms. Hyein Lee and Mr. Wei-Ting Jonas Chan who, along with current ABKGroup students and collaborators, performed much of the research cited in this paper. I thank Professor Lawrence Saul for ongoing discussions and collaborations. Permission of coauthors to reproduce figures from works referenced here is gratefully acknowledged. Research at UCSD is supported by NSF, Qualcomm, Samsung, NXP, Mentor Graphics and the C-DEN center.

REFERENCES
[1] P. Agrawal, M. Broxterman, B. Chatterjee, P. Cuevas, K. H. Hayashi, A. B. Kahng, P. K. Myana and S. Nath, “Optimal Scheduling and Allocation for IC Design Management and Cost Reduction”, ACM TODAES 22(4) (2017), pp. 60:1-60:30.
[2] D. Aldous and U. Vazirani, “Go With the Winners”, Proc. IEEE Symp. on Foundations of Computer Science, 1994, pp. 492-501.
[3] S. Bansal and R. Goering, “Making 20nm Design Challenges Manageable”, http://www.chipdesignmag.com/pdfs/chip_design_special_DAC_issue_2012.pdf
[4] D. Bertsekas, Dynamic Programming and Optimal Control, Athena, 1995.
[5] K. D. Boese, A. B. Kahng and S. Muddu, “New Adaptive Multistart Techniques for Combinatorial Global Optimizations”, Operations Research Letters 16(2) (1994), pp. 101-113.
[6] E. J. Candes and B. Recht, “Exact Matrix Completion via Convex Optimization”, Foundations of Computational Mathematics 9 (2009), pp. 717-772.
[7] W.-T. J. Chan, K. Y. Chung, A. B. Kahng, N. D. MacDonald and S. Nath, “Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design”, Proc. ASP-DAC, 2016, pp. 178-185.
[8] W.-T. J. Chan, P.-H. Ho, A. B. Kahng and P. Saxena, “Routability Optimization for Industrial Designs at Sub-14nm Process Nodes Using Machine Learning”, Proc. ISPD, 2017, pp. 15-21.
[9] T.-B. Chan, A. B. Kahng, J. Li and S. Nath, “Optimization of Overdrive Signoff”, Proc. ASP-DAC, 2013, pp. 344-349.
[10] T.-B. Chan, A. B. Kahng and J. Li, “NOLO: A No-Loop, Predictive Useful Skew Methodology for Improved Timing in IC Implementation”, Proc. ISQED, 2014, pp. 504-509.
[11] S. Fenstermaker, D. George, A. B. Kahng, S. Mantik and B. Thielges, “METRICS: A System Architecture for Design Process Optimization”, Proc. DAC, 2000, pp. 705-710.
[12] R. Goering, “What’s Needed to “Fix” Timing Signoff?”, DAC Panel, 2013.
[13] P. Gupta, A. B. Kahng, A. Kasibhatla and P. Sharma, “Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics”, Proc. DAC, 2010, pp. 597-602.
[14] L. Hagen and A. B. Kahng, “Combining Problem Reduction and Adaptive Multi-Start: A New Technique for Superior Iterative Partitioning”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 16(7) (1997), pp. 709-717.
[15] K. Han, A. B. Kahng, J. Lee, J. Li and S. Nath, “A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Skew Variation Reduction”, Proc. DAC, 2015, pp. 26:1-26:6.
[16] S. S. Han, A. B. Kahng, S. Nath and A. Vydyanathan, “A Deep Learning Methodology to Proliferate Golden Signoff Timing”, Proc. DATE, 2014, pp. 260:1-260:6.
[17] K. Jeong and A. B. Kahng, “Methodology From Chaos in IC Implementation”, Proc. ISQED, 2010, pp. 885-892.
[18] K. Jeong, A. B. Kahng and K. Samadi, “Impacts of Guardband Reduction on Design Process Outcomes: A Quantitative Approach”, IEEE Trans. Semiconductor Manufacturing 22(4) (2009), pp. 552-565.
[19] A. B. Kahng, “The Cost of Design”, IEEE Design & Test of Computers, 2002.
[20] A. B. Kahng, “The ITRS Design Technology and System Drivers Roadmap: Process and Status”, Proc. DAC, 2013, pp. 34-39.
[21] A. B. Kahng, DARPA IDEA Workshop presentation, Arlington, April 2017.
[22] A. B. Kahng, ANSYS Executive Breakfast keynote talk, June 2017. http://vlsicad.ucsd.edu/Presentations/talk/Kahng-ANSYS-DACBreakfast_talk_DISTRIBUTED2.pdf
[23] A. B. Kahng, “New Directions for Learning-Based IC Design Tools and Methodologies”, Proc. ASP-DAC, 2018, pp. 405-410.
[24] A. B. Kahng, “Quality, Schedule, and Cost: Design Technology and the Last Semiconductor Scaling Levers”, keynote talk, ASP-DAC, 2018. http://vlsicad.ucsd.edu/ASPDAC18/ASP-DAC-2018-Keynote-Kahng-POSTED.pptx
[25] A. B. Kahng and S. Kang, “Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions”, Proc. ISPD, 2012, pp. 153-160.
[26] A. B. Kahng, S. Kang, H. Lee, I. L. Markov and P. Thapar, “High-Performance Gate Sizing with a Signoff Timer”, Proc. ICCAD, 2013, pp. 450-457.
[27] A. B. Kahng, S. Kang, H. Lee, S. Nath and J. Wadhwani, “Learning-Based Approximation of Interconnect Delay and Slew in Signoff Timing Tools”, Proc. SLIP, 2013, pp. 1-8.
[28] A. B. Kahng, B. Lin and S. Nath, “Enhanced Metamodeling Techniques for High-Dimensional IC Design Estimation Problems”, Proc. DATE, 2013, pp. 1861-1866.
[29] A. B. Kahng, B. Lin and S. Nath, “High-Dimensional Metamodeling for Prediction of Clock Tree Synthesis Outcomes”, Proc. SLIP, 2013, pp. 1-7.
[30] A. B. Kahng, M. Luo and S. Nath, “SI for Free: Machine Learning of Interconnect Coupling Delay and Transition Effects”, Proc. SLIP, 2015, pp. 1-8.
[31] A. B. Kahng and S. Mantik, “A System for Automatic Recording and Prediction of Design Quality Metrics”, Proc. ISQED, 2001, pp. 81-86.
[32] A. Kahng and S. Mantik, “Measurement of Inherent Noise in EDA Tools”, Proc. ISQED, 2002, pp. 206-211.
[33] A. Katsioulas, S. Chow, J. Avidan and D. Fotakis, “Integrated Circuit Architecture with Standard Blocks”, U.S. Patent 6,467,074, 2002.
[34] C. W. Moon, P. Gupta, P. J. Donehue and A. B. Kahng, “Method of Designing a Digital Circuit by Correlating Different Static Timing Analyzers”, U.S. Patent 7,823,098, 2010.
[35] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proc. IEEE 77 (1989), pp. 257-286.
[36] The GSRC METRICS Initiative. http://vlsicad.ucsd.edu/GSRC/metrics/
[37] Partitioning- and Placement-based Intrinsic Rent Parameter Evaluation. http://vlsicad.ucsd.edu/WLD/RentCon.pdf
[38] “DARPA Rolls Out Electronics Resurgence Initiative”, https://www.darpa.mil/news-events/2017-09-13
[39] Gate Sizing Benchmarks With Known Optimal Solution. http://vlsicad.ucsd.edu/SIZING/bench/artificial.html
[40] International Technology Roadmap for Semiconductors. http://www.itrs2.net/itrs-reports.html