Survey of Machine Learning For Electronic Design Automation
Session 7A: Special Session - 3: Machine Learning-Aided Computer-Aided Design GLSVLSI ’22, June 6–8, 2022, Irvine, CA, USA
using a Convolutional Neural Network (CNN). The demonstrations are made by successfully designing logic synthesis flows of three large-scale designs. The authors in [20] propose a transfer learning approach that reuses the knowledge obtained from previously explored design spaces in exploring a new target design space. The authors develop a novel neural network model for mixed-sharing multi-domain transfer learning. In [42] an end-to-end framework called IRONMAN is proposed. The main goal is to enable a flexible and automated Design Space Exploration (DSE), which can provide optimized solutions under user-specified constraints or Pareto trade-offs among different objectives, such as resource types, area, and latency. The IRONMAN framework consists of three main components: GPP (a graph-neural-network-based performance predictor), RLMD (an RL-based DSE engine that explores the optimized resource allocation strategy), and CT (a code transformer that assists RLMD and GPP by extracting data flow graphs from original HLS C/C++). Authors in [10] have presented MLSBench, a collection of around 5000 synthesizable designs written in C and C++. They provide a methodology to generate variations of a design, which creates a potential for creating new designs and enlarging the database in the future. This is followed by analysis and validation that the generated designs are different. The authors in [9] propose MAFIA, a tool to compile ML inference on small form-factor FPGAs for IoT applications. MAFIA provides native support for linear algebra operations and can express a variety of ML algorithms, including state-of-the-art models. In [6] the authors develop hls4ml, which is an open-source hardware-software codesign flow. This is used to interpret and translate ML algorithms for ASIC and FPGA implementations. The paper introduces readers to the essential features of hls4ml, which include network optimization techniques like pruning and quantization-aware training, which can be integrated into the device implementations.

2.2 Physical Design
Once again, owing to the rise in the need for semiconductor ICs, generating a layout from a netlist is essential in the IC design process. Physical design is the conversion of a logical netlist or RTL into a physical layout. Most fabrication processes require design houses to use certain design libraries specific to their fabrication process. Generating these design layouts from the design netlist and other design files is complex and time-consuming. Authors in [4] review available opportunities for ML with a focus on IC physical implementation. They give examples like (1) removing unnecessary design and modeling margins through correlation mechanisms, (2) achieving faster design convergence through predictors of downstream flow outcomes that comprehend both tools and design instances, and (3) corollaries such as optimizing the usage of design resources licenses and available schedule. Some open challenges for ML in IC physical design are also discussed herein.

Floor planning and Placement: Chip placement and floor planning are two important processes in the IC design flow. Finding the optimal floor plan and placement of a design is considered one of the most time-consuming and complex processes. Modern IC designs have numerous smaller IPs, macros, and modules that require multiple iterations of placement to figure out the optimal position for each instance. Most ICs do not have the optimal placement simply because of the high number of placement permutations possible, even for small designs. This is exacerbated by larger designs, and finding the best placement while using reasonable resources is a challenge. Multiple research teams have invested in finding solutions to improve the placement. In [16], the authors first train a model to predict the number of Design Rule Check (DRC) violations for the current macro placement. DRCs are design rules that make sure the netlist layout is compliant with the foundry-specific tape-out requirements. To generate macro placements with fewer DRC violations, the authors in [16] use the predictions obtained from their trained ML model as the evaluation function in a simulated annealing process. While this work represents an interesting direction, the results shared are based on netlists with fewer than six macros, which are not realistic compared to modern IC netlists. Moreover, their approach does not include any optimization during the place and route steps. Due to the optimization, the placement and routing can change dramatically, and the actual DRC will change accordingly, invalidating the model prediction. In addition, although adhering to the DRC criteria is necessary, the primary objective of macro placement is to optimize for wire length, timing metrics like Worst Negative Slack (WNS) and Total Negative Slack (TNS), power, and area, and their work does not consider these metrics.

Recently, authors in [29] have presented a learning-based approach for chip placement, and unlike other prior methods, their approach has the ability to learn from past experience and progressively improve over time. As the model is trained over a greater number of chip blocks, it becomes better at generating optimized placements for previously unseen chip macros, modules, and blocks. The authors treat chip placement as an RL problem and train an agent to place the nodes of a chip netlist onto a chip layout. Representation learning is used in the supervised task of predicting placement quality in order to enhance their RL policy. The authors were able to enable feature-rich embeddings of the input netlists by designing a neural network architecture that could accurately predict reward across the wide variety of input netlists and their placements. This architecture is used as the encoder of their RL policy and value networks to enable transfer learning. Mirhoseini and colleagues, in [28], use an RL-based graph placement method. As shown in Fig. 2, an RL agent is used to place macros one after the other, and once all macros are placed, the standard cell placement is done using a force-directed method. The method learns from past experiences, which in turn improves the speed and quality of producing solutions for new instances of the problem.

Clock Tree Synthesis (CTS): CTS is the physical design step that implements the clock network. Historically, the EDA tools were designed to build balanced clock trees by minimizing the clock skew, giving each register-to-register timing path equal time. The problem with the zero skew clock tree is that all registers launch simultaneously, resulting in a surge of demanded current from the battery on the active edge of the clock. In this section, recent works utilizing ML to advance the CTS process are discussed. In [1] the author presents a fully automated RL-based solution for reducing the peak current. The agent modifies the clock arrival times for each of the registers to maximize the distribution of clock arrivals. Using RL allows the agent to explore optimization opportunities beyond the heuristic algorithm. The work in [39] employs a genetic algorithm to optimize the clock-skew by limiting the maximum number of
clock drivers introduced and utilizes clustering techniques. In [26] the authors utilize a mixed technique to achieve CTS optimization. The paper employs a Generative Adversarial Network (GAN) augmented by RL. It is worth noting that the traditional GAN includes a generator and discriminator. In this paper, RL uses a pre-trained regression model as a supervisor of the generator. The work in [21] uses an Artificial Neural Network (ANN) to estimate clock tree elements such as the number of buffers to be used or the wire loads. During CTS, the proposed technique uses the ANN to determine the number of buffers to be added or removed to achieve the designated target clock skew. The result maximizes input transition times for clock buffers and sinks. The work presented by [34] suggests a two-tier hybrid approach to optimization. The paper discusses the employment of supervised ML techniques such as the Support Vector Machine (SVM) algorithm to estimate clock buffer and wire sizing. The report focuses on providing an alternative to the expensive circuit-level simulations and reducing clock skew without significantly increasing power dissipation. The paper [30] utilizes a Convolutional Neural Network (CNN) augmented and enhanced by K-Means clustering and Linear Programming optimization to estimate different parameters of CTS. The paper focuses on decreasing the power consumption of the clock network by reducing the clock sinks along various data paths.

Figure 2: Overview of RL-based chip floorplanning method and training scheme in [28] (reproduced from [28])

Routing: Routing has been a critical and complex challenge for IC designers. Due to the huge number of routing possibilities for each design, the need to optimize the EDA tools and routing algorithm is paramount, especially with larger and more complex designs. ML has been used to improve routing quality and time. Authors in [45] provide insights into learning-free placement and routing approaches and then provide a detailed review of recent advances in ML for routing and placement. The proposed method in [38] uses a deep learning-based congestion estimation algorithm to improve routing quality. Their routing algorithm extracts appropriate three-dimensional features from already placed netlists. The authors also propose a congestion estimator that produces a heatmap to serve as a guide for the initial pattern during the global routing phase. In [22], a deep RL method is proposed to solve the global routing problem in a simulated environment and an RL agent is used to produce an optimal policy for routing.

2.3 IR drop
The on-chip power delivery network (PDN) is a vital part of any chip, as it determines the quality and reliability of the fabricated IC. Ideally, a grid that is dense and compact is desired. However, a sparse PDN leaves more room for clock, signal, and Engineering Change Order (ECO) routing. Most complex designs need multiple iterations of PDN design before finalizing the PDN layout. Authors in [13] extract relevant SOC floorplan and PDN features using superposition and partitioning techniques. An ML model is then used to predict the updated static IR drop for each power node by a series of SOC floorplan alterations. This is done without the need to run a golden IR drop tool. The manuscript also shows significant improvement over an industry-leading golden IR drop sign-off tool with negligible error rates. Similarly, authors in [4] present a design flow to generate a PDN with negligible overhead for standard cell routing while still meeting the IR drop and EM constraints for a given placement. The ML model used in [4] predicts the total wire length of the global route associated with a given PDN configuration to speed up the search process. To avoid recalculating the IR drop after each ECO, the authors in [7] use timing, power, and physical features collected before ECO to predict the IR drop of a design after ECO. Regional models for cell instances near IR drop violations are built to improve prediction accuracy and training time. Results in [7] show that the IR drop of a design with 100,000 cell instances can be predicted within 2 minutes. In [24], the authors propose an ML technique to build IR drop prediction models based on circuits before ECO revision. Once the ECO revision is done, these prediction models are reused to predict the IR drop of the ECO-revised circuit. The work in [43] provides a review of the process in IR drop estimation techniques that use ML algorithms. Authors in [44] propose PowerNet, a CNN-based dynamic IR drop estimation technique that can handle both vector-based and vector-less IR analysis. The CNN model used in PowerNet is general and transferable to different designs. The authors in [23] propose an automatic flow to alter IR drop violations by ECO, which provides cell movement and downsize solutions. An ML algorithm is used to predict IR drop in order to prevent over-fixing. A novel multi-round bipartite matching is used to optimize the resources used during the ECO flow. MAVIREC [5] is a tool that uses ML techniques like three-dimensional convolutions and regression-like layers to suggest a larger subset of worst-case test patterns in order to improve test coverage and accurately predict the IR drop. Another method to predict the IR drop of an IC layout is presented in [15], where XGBoost, an ML technique, is used to make dynamic IR drop predictions that can be applied to vector-based and vector-less IR drop analysis simultaneously. In [15], the authors use a correlation coefficient to characterize the symmetry of predicted data and golden data.

2.4 Static Timing Analysis
Static Timing Analysis (STA) is a process that takes several iterations. Each iteration may take several hours for larger designs. STA is a technique to check if a design satisfies the timing rules required for the end product to function correctly. The input to an STA tool is the routed netlist, clock definitions (or clock frequency), and external environment definitions. The STA validates whether the design could operate at the rated clock frequency without any timing violations. Some of the basic timing violations are setup violations and hold violations. Almost always, the initial layout will have multiple timing violations, which the STA tool resolves by an iterative process of buffer insertion, signal and clock-tree re-routing, and layout alterations. Accurate timing closure is an important step in the IC design flow. STA will be done multiple times after each
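As a minimal sketch of the setup and hold checks mentioned above, the Python snippet below computes per-path slack together with the WNS and TNS figures introduced in Section 2.2. The `TimingPath` fields, the single-cycle slack formulas, and the convention that WNS is reported as zero when no path fails are assumptions made for illustration; they are not taken from any of the surveyed tools.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TimingPath:
    """Hypothetical register-to-register path; all times in nanoseconds."""
    name: str
    max_arrival: float  # latest data arrival at the capture register
    min_arrival: float  # earliest data arrival at the capture register
    setup: float        # setup requirement of the capture register
    hold: float         # hold requirement of the capture register
    skew: float = 0.0   # capture clock arrival minus launch clock arrival


def setup_slack(path: TimingPath, period: float) -> float:
    # Data must settle `setup` before the next capture edge.
    return (period + path.skew - path.setup) - path.max_arrival


def hold_slack(path: TimingPath) -> float:
    # Data must stay stable for `hold` after the current capture edge.
    return path.min_arrival - (path.skew + path.hold)


def wns_tns(slacks: Iterable[float]) -> Tuple[float, float]:
    """Return (WNS, TNS); both are 0.0 when no slack is negative (assumed convention)."""
    negative = [s for s in slacks if s < 0.0]
    wns = min(negative) if negative else 0.0
    return wns, sum(negative, 0.0)


# Illustrative paths with made-up numbers, checked against a 2 ns clock.
paths = [
    TimingPath("a", max_arrival=1.5, min_arrival=0.75, setup=0.25, hold=0.25),
    TimingPath("b", max_arrival=2.0, min_arrival=0.25, setup=0.25, hold=0.5),
    TimingPath("c", max_arrival=2.5, min_arrival=1.0, setup=0.0, hold=0.25),
]
period = 2.0
setup_slacks = [setup_slack(p, period) for p in paths]
hold_slacks = [hold_slack(p) for p in paths]
setup_wns, setup_tns = wns_tns(setup_slacks)
hold_wns, hold_tns = wns_tns(hold_slacks)
```

In this toy data, a negative setup slack marks a setup violation and a negative hold slack marks a hold violation; path `b` fails both checks, while path `c` fails only the setup check.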
methods are discussed under the relevant sections. Finally, future perspectives of ML in EDA are given, and opportunities in hardware security automation are discussed.

ACKNOWLEDGMENT
This work was supported in part by the National Science Foundation through Computing Research Association for CIFellows #2030859.

REFERENCES
[1] S. A. Beheshti-Shirazi, A. Vakil, S. Manoj, et al. 2021. A Reinforced Learning Solution for Clock Skew Engineering to Reduce Peak Current and IR Drop. In Proceedings of the 2021 on Great Lakes Symposium on VLSI. 181–187.
[2] S. Bian, M. Hiromoto, M. Shintani, et al. 2017. LSTA: Learning-based static timing analysis for high-dimensional correlated on-chip variations. In 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 1–6.
[3] P. Cao, W. Bao, K. Wang, et al. 2021. A Timing Prediction Framework for Wide Voltage Design with Data Augmentation Strategy. In Proceedings of the 26th Asia and South Pacific Design Automation Conference. 291–296.
[4] W.-H. Chang, C.-H. Lin, S.-P. Mu, et al. 2017. Generating Routing-Driven Power Distribution Networks With Machine-Learning Technique. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 8 (2017), 1237–1250.
[5] V. A. Chhabria, Y. Zhang, H. Ren, et al. 2021. MAVIREC: ML-Aided Vectored IR-Drop Estimation and Classification. In 2021 Design, Automation Test in Europe Conference Exhibition (DATE). 1825–1828.
[6] F. Fahim, B. Hawks, C. Herwig, et al. 2021. hls4ml: An open-source codesign workflow to empower scientific low-power machine learning devices. arXiv preprint arXiv:2103.05579 (2021).
[7] Y.-C. Fang, H.-Y. Lin, M.-Y. Su, et al. 2018. Machine-Learning-Based Dynamic IR Drop Prediction for ECO. In Proceedings of the International Conference on Computer-Aided Design (ICCAD ’18). Association for Computing Machinery, New York, NY, USA, Article 17, 7 pages.
[8] M. Ferianc, H. Fan, R. S. W. Chu, et al. 2020. Improving Performance Estimation for FPGA-Based Accelerators for Convolutional Neural Networks. In Applied Reconfigurable Computing. Architectures, Tools, and Applications, Fernando Rincón, Jesús Barba, Hayden K. H. So, Pedro Diniz, and Julián Caba (Eds.). Springer International Publishing, Cham, 3–13.
[9] N. P. Ghanathe, V. Seshadri, R. Sharma, et al. 2021. MAFIA: Machine Learning Acceleration on FPGAs for IoT Applications. In 2021 31st International Conference on Field-Programmable Logic and Applications (FPL). 347–354.
[10] P. Goswami, M. Shahshahani, and D. Bhatia. 2020. MLSBench: A Synthesizable Dataset of HLS Designs to Support ML Based Design Flows. In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’20). Association for Computing Machinery, New York, NY, USA, 312.
[11] W. Haaswijk, E. Collins, B. Seguin, et al. 2018. Deep Learning for Logic Optimization Algorithms. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). 1–4.
[12] A. Han, Z. Zhao, C. Feng, et al. 2021. Stage-based Path Delay Prediction with Customized Machine Learning Technique. In Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering. 926–933.
[13] C.-T. Ho and A. B. Kahng. 2019. IncPIRD: Fast Learning-Based Prediction of Incremental IR Drop. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–8.
[14] A. Hosny, S. Hashemi, M. Shalan, et al. 2020. DRiLLS: Deep Reinforcement Learning for Logic Synthesis. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 581–586.
[15] P. Huang, C. Ma, and Z. Wu. 2021. Fast Dynamic IR-Drop Prediction Using Machine Learning in Bulk FinFET Technologies. Symmetry 13, 10 (2021).
[16] Y.-H. Huang, Z. Xie, G.-Q. Fang, et al. 2019. Routability-Driven Macro Placement with Embedded CNN-Based Prediction Model. In 2019 Design, Automation Test in Europe Conference Exhibition (DATE). 180–185.
[17] A. B. Kahng, U. Mallappa, and L. Saul. 2018. Using machine learning to predict path-based slack from graph-based timing analysis. In ICCD. IEEE, 603–612.
[18] A. B. Kahng, U. Mallappa, L. Saul, et al. 2019. "Unobserved Corner" Prediction: Reducing Timing Analysis Effort for Faster Design Convergence in Advanced-Node Design. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 168–173.
[19] R. G. Kim, J. R. Doppa, and P. P. Pande. 2018. Machine Learning for Design Space Exploration and Optimization of Manycore Systems. In Proceedings of the International Conference on Computer-Aided Design (ICCAD ’18). Association for Computing Machinery, New York, NY, USA, Article 48, 6 pages.
[20] J. Kwon and L. P. Carloni. 2020. Transfer Learning for Design-Space Exploration with High-Level Synthesis. In 2020 ACM/IEEE 2nd Workshop on Machine Learning for CAD (MLCAD). 163–168.
[21] Y. Kwon, J. Jung, I. Han, et al. 2018. Transient Clock Power Estimation of Pre-CTS Netlist. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). 1–4.
[22] H. Liao, W. Zhang, X. Dong, et al. 2019. A Deep Reinforcement Learning Approach for Global Routing. ArXiv abs/1906.08809 (2019).
[23] H.-Y. Lin, Y.-C. Fang, S.-T. Liu, et al. 2020. Automatic IR-Drop ECO Using Machine Learning. In 2020 IEEE International Test Conference in Asia (ITC-Asia). 7–12.
[24] S.-Y. Lin, Y.-C. Fang, Y.-C. Li, et al. 2018. IR drop prediction of ECO-revised circuits using machine learning. In 2018 IEEE 36th VLSI Test Symposium (VTS). 1–6.
[25] D. S. Lopera, L. Servadei, V. P. Kasi, et al. 2021. RTL Delay Prediction Using Neural Networks. In 2021 IEEE Nordic Circuits and Systems Conference (NorCAS). IEEE, 1–7.
[26] Y.-C. Lu, J. Lee, A. Agnesina, et al. 2019. GAN-CTS: A generative adversarial framework for clock tree prediction and optimization. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1–8.
[27] H. M. Makrani, F. Farahmand, H. Sayadi, et al. 2019. Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design. In 2019 29th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 397–403.
[28] A. Mirhoseini, A. Goldie, M. Yazgan, et al. 2021. A graph placement methodology for fast chip design. Nature 594, 7862 (2021), 207–212.
[29] A. Mirhoseini, A. Goldie, M. Yazgan, et al. 2020. Chip Placement with Deep Reinforcement Learning. ArXiv abs/2004.10746 (2020).
[30] S. Nagaria and S. Deb. 2020. Designing of an Optimization Technique for the Prediction of CTS Outcomes using Neural Network. In 2020 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS). 312–315.
[31] W. L. Neto, M. Austin, S. Temple, et al. 2019. LSOracle: a Logic Synthesis Framework Driven by Artificial Intelligence: Invited Paper. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–6.
[32] O. S. Ram and S. Saurabh. 2020. Modeling multiple-input switching in timing analysis using machine learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 4 (2020), 723–734.
[33] W. Raslan and Y. Ismail. [n. d.]. Deep Learning Autoencoder-based Compression for Current Source Model Waveforms. In 2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS). IEEE, 1–6.
[34] R. Samanta, J. Hu, and P. Li. 2010. Discrete Buffer and Wire Sizing for Link-Based Non-Tree Clock Networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18, 7 (2010), 1025–1035.
[35] S. Saurabh, H. Shah, and S. Singh. 2018. Timing closure problem: Review of challenges at advanced process nodes and solutions. IETE Technical Review (2018).
[36] M. A. Savari and H. Jahanirad. 2020. NN-SSTA: A deep neural network approach for statistical static timing analysis. Expert Systems with Applications 149 (2020), 113309.
[37] T. Sharma, S. Kolluru, and K. S. Stevens. 2020. Learning Based Timing Closure on Relative Timed Design. In IFIP/IEEE International Conference on Very Large Scale Integration-System on a Chip. Springer, 133–148.
[38] M. Su, H. Ding, S. Weng, et al. 2022. High-Correlation 3D Routability Estimation for Congestion-guided Global Routing. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). 580–585.
[39] P. Vuillod, L. Benini, A. Bogliolo, et al. 1996. Clock-skew optimization for peak current reduction. In Proceedings of 1996 International Symposium on Low Power Electronics and Design. IEEE, 265–270.
[40] A. Wheeldon, A. Yakovlev, and R. Shafik. 2021. Self-timed Reinforcement Learning using Tsetlin Machine. In 2021 27th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC). IEEE, 40–47.
[41] A. Williams. 2019. Largest Chip Ever Holds 1.2 Trillion Transistors. Retrieved April 5, 2022 from https://hackaday.com/2019/08/21/largest-chip-ever-holds-1-2-trillion-transistors/
[42] N. Wu, Y. Xie, and C. Hao. 2021. IRONMAN: GNN-assisted Design Space Exploration in High-Level Synthesis via Reinforcement Learning. Proceedings of the 2021 on Great Lakes Symposium on VLSI (2021).
[43] Z. Xie, H. Li, X. Xu, et al. 2020. Fast IR Drop Estimation with Machine Learning. In Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD ’20). Association for Computing Machinery, New York, NY, USA, Article 13, 8 pages.
[44] Z. Xie, H. Ren, B. Khailany, et al. 2020. PowerNet: Transferable Dynamic IR Drop Estimation via Maximum Convolutional Neural Network. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 13–18.
[45] J. Yan, X. Lyu, R. Cheng, et al. 2022. Towards Machine Learning for Placement and Routing in Chip Design: a Methodological Overview. ArXiv abs/2202.13564 (2022).
[46] T. Yang, G. He, and P. Cao. 2022. Pre-Routing Path Delay Estimation Based on Transformer and Residual Framework. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). 184–189.
[47] C. Yu, H. Xiao, and G. De Micheli. 2018. Developing Synthesis Flows without Human Knowledge. In Proceedings of the 55th Annual Design Automation Conference (DAC ’18). Association for Computing Machinery, New York, NY, USA, Article 50, 6 pages.