
AI/ML Algorithms and Applications in VLSI Design and Technology

Deepthi Amuru^a,*, Harsha V. Vudumula^a, Pavan K. Cherupally^a, Sushanth R. Gurram^a, Amir Ahmad^b, Andleeb Zahra^a and Zia Abbas^a

^a Center for VLSI and Embedded Systems Technology (CVEST), International Institute of Information Technology, Hyderabad (IIIT-H), Gachibowli, Hyderabad, 500032, India
^b College of IT, UAE University, Al Ain, 15551, Abu Dhabi, UAE

ARTICLE INFO

Keywords:
Artificial Intelligence (AI)
Machine learning (ML)
Manufacturing
CMOS
VLSI Design

ABSTRACT

An evident challenge ahead for the integrated circuit (IC) industry is the investigation and development of methods to reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual, time-consuming, and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort for understanding and processing the data within and across different abstraction levels. It, in turn, improves the IC yield and reduces the manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced in the past toward VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

∗Corresponding author
[email protected] (D. Amuru); [email protected] (H.V. Vudumula); [email protected] (P.K. Cherupally); [email protected] (S.R. Gurram); [email protected] (A. Ahmad); [email protected] (A. Zahra); [email protected] (Z. Abbas)
ORCID(s): 0000-0003-0793-3244 (D. Amuru)

1. Introduction

A dramatic revolution has been triggered in the field of electronics by the advent of complementary metal-oxide-semiconductor (CMOS) transistors in the integrated circuit (IC) industry, leading to the era of semiconductor devices. Thenceforth, CMOS technology has been the predominant technology in the field of microelectronics. The number of transistors fabricated on a single chip has increased exponentially since the 1960s [1], [2]. The continuous downscaling of transistors over many technological generations has improved the density and performance of these devices [3], leading to tremendous growth in the microelectronics industry. The realization of complex digital systems on a single chip is enabled by modern very-large-scale integration (VLSI) technology. The high demand for portable electronics in recent years has significantly increased the demand for power-sensitive designs with sophisticated features. Highly advanced and scalable VLSI circuits meet the ever-increasing demand in the electronics industry. Continuous device downscaling is one of the major driving forces of IC technology advancement with improved device performance. Currently, devices are being scaled down to the sub-3-nm-gate regime and beyond.

Aggressive downscaling of CMOS technology has created many challenges, as well as new opportunities, for device engineers. The semiconductor process complexity increases as the transistor dimensions decrease. As we approach atomic dimensions, simple scaling eventually stops. Although devices are small, many aspects of their performance deteriorate, e.g., leakage increases [4, 5, 6]; gain decreases; and sensitivity to process variations in manufacturing increases [7]. The profound increase in process variations significantly impacts the circuit operation, leading to variable performance in identical-sized transistors. It further impacts the propagation delay of the circuit, which behaves as a stochastic random variable, thereby complicating timing-closure techniques and strongly affecting the chip yield [8]. Increasing process variation in the nanometer regime is one of the major causes of parametric yield loss. Multi-gate field-effect transistors (FETs) [9] are more tolerant to process variations than CMOS transistors. However, their performance parameters are also affected by aggressive scaling [10, 11].

Advanced and affordable design techniques with finer optimization must be adopted in the VLSI design flow to maintain future performance trends in circuits and systems. The turnaround time of a chip depends on the performance of electronic design automation (EDA) tools in overcoming design constraints. The traditional rule-based methodologies in EDA take longer to yield an optimal solution for the set design constraints. In addition, to a certain level, the conventional solutions employed for such tasks are largely manual; thus, they are time-critical and resource-intensive, resulting in time-to-market delays. Moreover, once the data are fed back, it is difficult and time-consuming for the designers to understand the underlying functionalities, i.e., the root cause of issues, and apply fixes if required. This difficulty increases under the impact of process and environmental variations [12, 7].

Artificial intelligence (AI) has provided prominent solutions to many problems in various fields. The principle of AI is based on human intelligence, interpreted in such a way that a machine can easily mimic it and execute tasks of

D Amuru et al.: Preprint submitted to Elsevier Page 1 of 41


AI/ML in VLSI

Figure 1: Different areas of VLSI Technology reviewed in the paper

varying complexity. Machine learning (ML) is a subset of AI. The goals of AI/ML are learning, reasoning, predicting, and perceiving. AI/ML can quickly identify trends and patterns in large volumes of data, enabling users to make relevant decisions. AI/ML algorithms can handle multi-dimensional and multivariate data at high computational speeds. These algorithms continuously gain experience and improve the accuracy and efficiency of their predictions. Further, they facilitate decision-making by optimizing the relevant processes. Considering the numerous advantages of AI/ML algorithms, their applications are endless. Over the last decade, AI/ML strategies have been extensively applied in VLSI design and technology.

VLSI computer-aided design (CAD) tools are involved in several stages of the chip design flow, from design entry to full-custom layouts. The design and performance evaluation of highly complex digital and analog ICs depends on the CAD tools' capability. The advancement of VLSI-CAD tools is becoming increasingly challenging and complex with the tremendous increase in transistors per chip. Numerous opportunities are available in semiconductor and EDA technology for developing and incorporating AI/ML solutions to automate processes at various VLSI design and manufacturing levels for quick convergence [13], [14]. These intelligent learning algorithms are steered and designed to achieve relatively fast turnaround times with efficient, automated solutions for chip fabrication.

This work thoroughly attempts to summarize the literature on AI/ML algorithms for VLSI design and modeling at different abstraction levels. It is the first paper that provides a detailed review encompassing circuit modeling to system-on-chip (SoC) design, along with physical design, testing, and manufacturing. We also briefly present the VLSI design flow and an introduction to artificial intelligence for the benefit of the readers.

We organized the paper as follows. Section 2 briefly discusses the existing review articles on AI/ML-VLSI. A brief on the different steps in VLSI design and manufacturing and an overview of artificial intelligence and machine learning are presented in sections 3 and 4, respectively. A detailed survey of AI/ML-CAD-oriented work in circuit simulation at various abstraction levels (device level, gate level, circuit level, register-transfer level (RTL), and post-layout simulation) is presented in Section 5. A review of AI/ML algorithms at the architecture and SoC levels is reported in sections 6 and 7. A survey of the learning strategies proposed in physical design and manufacturing (lithography, reliability analysis, yield prediction, and management) is discussed in sections 8 and 9, respectively. The AI/ML approaches proposed in testing are reported in Section 10. Sources of training data for AI/ML-VLSI are presented in Section 11, followed by challenges and opportunities for AI/ML approaches in the field of VLSI in Section 12.

2. Existing Reviews

The impact of AI on VLSI design was first demonstrated in 1985 by Robert S. Kirk [15]. He briefly explained the scope of and necessity for AI techniques in CAD tools at different levels of VLSI design. His paper included a brief on the then-existing VLSI-AI tools and stressed the importance of incorporating the expanded capabilities of AI in CAD tools. The advantages of incorporating AI in the VLSI design process and its applications are briefed in [16] and [17]. Khan et al. [17] focused on the applications of AI in the IC industry, particularly in expert systems and the knowledge-based systems, such as the design automation assistant, the design advisor by NCR, and REDESIGN, used in the VLSI industry. Rapid developments in AI/ML have drawn the attention of researchers, who have made numerous pioneering efforts

to design, develop, and apply learning strategies to VLSI design and manufacturing. The implementation of neural networks (NNs) for digital and analog VLSI circuits and knowledge-based systems has been reported in [18]. The scope for the joint optimization of physical design with data analytics and ML is reviewed in [19].

Many recent applications and opportunities for ML in physical design are reviewed in [20]. Beerel et al. [21] stated the challenges and opportunities associated with ML-based algorithms in asynchronous CAD/VLSI; they proposed the development of an ML-based recommendation tool, called a design advisor, that monitors and records the actions taken by various designers during the usage of standard RTL, logic synthesis, and place-and-route tools. The design advisor chooses the best action for a given scenario by running powerful training engines. Subsequently, the design advisor is deployed and used by circuit designers to obtain design recommendations. Overall, these design advisors focus more on asynchronous CAD/ML tools. Stratigopoulos et al. reviewed IC testing by demonstrating various ML techniques in the field of testing and provided recommendations for future practitioners [22].

Elfadel et al. [23] discussed in detail various ML methods used in the fields of physical design; yield prediction; failure, power, and thermal analysis; and analog design. Khailany et al. [24] highlighted the application of ML in chip design. They focused on ML-based approaches in micro-architectural design space exploration, power analysis, VLSI physical design, and analog design to optimize the prediction speed and tape-out time. They proposed an AI-driven physical design flow with a deep reinforcement learning (DRL) optimization loop to automatically explore the design space for high-quality physical floorplans, timing constraints, and placements, which can achieve good-quality results in the downstream clock-tree synthesis (CTS) and routing steps.

ML in EDA is currently gaining the attention of researchers and research communities. Employing ML in IC design and manufacturing augments the designers by reducing their time and effort in data analysis, optimizing the design flow, and improving time to market [25]. Rapp et al. presented a comprehensive overview of the state of the art in ML for CAD at different abstraction levels [26]. Interestingly, the paper also presents a meta-study of ML usage in CAD to capture the overall trend of suitable ML algorithms at various levels of the VLSI cycle. As per the meta-study, the trend for ML-CAD is shifting toward physical design with NN implementations compared with other abstraction levels and algorithms. The paper also discusses open challenges in employing ML for CAD, such as the problem of combinatorial optimization, the limited availability of training data, and practical limitations. However, its reviews and summaries cover only the last five years and are limited to five key conferences and journals. Another survey [27] summarizes ML-CAD works in a well-tabulated manner covering many abstraction levels in the digital/analog design flow; however, more focus on challenges and future directions was needed. In [28], a comprehensive review of Graph Neural Networks (GNNs) for EDA is presented, highlighting the areas of logic synthesis, physical design, and verification. As graphs are an intuitive way of representing circuits, netlists, and layouts, GNNs can easily fit into EDA to solve combinatorial optimization problems at various levels and improve the QoR (Quality of Results) [29]. A review of ML achievements in placement and routing, with results on the ISPD 2015 benchmark datasets, is presented in [30].

Recently, a brief review of machine learning and deep learning techniques incorporated in analog and digital VLSI, including physical design, was presented in [31]. VLSI computer-aided design at different abstraction levels from a machine-learning perspective is presented in [32]. In [33], the applications, opportunities, and challenges of reinforcement learning in EDA, mainly macro chip placement, analog transistor sizing, and logic synthesis, are discussed with practical implementations.

The reviews mentioned above fall short of providing a detailed discussion of the AI/ML approaches proposed in the literature that covers all the abstraction levels of the digital VLSI design flow. This review summarizes the literature on AI/ML algorithms for VLSI design and modeling at different abstraction levels. We also discuss the challenges, opportunities, and scope for incorporating automated learning strategies at various levels in the semiconductor design flow. The design abstraction levels covered in this review under different sections are shown through a dendrogram in fig.1. A concise VLSI design flow with the traditional commercial CAD tools used in the industry and the surrogate AI/ML techniques proposed by researchers is given in fig.2. Figure 6 provides a summary of the AI/ML techniques proposed in the literature for VLSI circuit simulation for estimating circuit performance parameters, such as the transistor characteristics, statistical static timing analysis (SSTA), leakage power, power consumption, and post-layout behavior.

In the following sections, we present a brief background of AI/ML and a brief description of the different stages of the VLSI design flow.

3. Brief on VLSI Design Flow

A traditional digital IC design flow has many hierarchical levels, as shown in fig.2; the flowchart covers a generalized design flow, including the front-end and back-end of full-custom/semi-custom IC designs. The design specifications abstractly describe the functionality, interface, and overall architecture of the digital circuit to be designed. They include block diagrams providing the functional description, timing specifications, propagation delays, the package type required, and design constraints. They also act as an agreement between the design engineer and vendor. The architectural design level comprises the system's basic architecture. It includes decisions such as reduced instruction set computing/complex instruction set computing (RISC/CISC)


Figure 2: Modern Chip Design Flow

processors and the number of arithmetic logic units (ALUs) and floating-point units. The outcome of this level is a micro-architectural specification that contains the functional descriptions of the subsystem units. Architects can estimate the design performance and power based on such descriptions. The behavioral design level is next; it provides the functional description of the design, often written using Verilog HDL/VHDL. The behavioral level comprises a high-level description of the functionality, hiding the underlying implementation details. The timing information is checked and validated at the next level, i.e., the RTL (register-transfer level) description. A high-level synthesis (HLS) tool can automatically convert C/C++-based system specifications to HDL. Alternatively, the logic synthesis tool produces the netlist, i.e., a gate-level description of the high-level behavioral description. The logic synthesis tool ensures that the gate-level netlist meets the timing, area, and power specifications. Logic verification is performed through testbench simulation. Formal verification and scan insertion through design for testability (DFT) are performed at this stage to examine the RTL mapping [34]. Next, system partitioning, which divides large and complex systems into small modules, is performed, followed by floorplanning, placement, and routing. The primary function of the floorplanner

is to estimate the required chip area for standard cell/module design implementation; it is also responsible for improving the design performance. The place-and-route tool places the sub-modules, gates, and flip-flops, followed by clock-tree synthesis (CTS) and reset routing. Subsequently, the routing of each block is performed. After placement and routing, layout verification is performed to determine whether the designed layout conforms to the electrical/physical design rules and the source schematic. These checks are implemented using tools such as design rule check (DRC) and electrical rule check (ERC). After the post-layout simulation, where parasitic resistance and capacitance extraction and verification are performed, the chip moves to the sign-off stage [35]. GDS-II is the resultant file sent to the semiconductor foundries for IC fabrication.

IC fabrication involves many advanced and complex physical and chemical processes that must be performed with utmost precision. It comprises numerous stages, from wafer preparation to reliability testing. A detailed description of each stage is presented in [36]. In brief, silicon crystals are grown and sliced to produce wafers. The wafers must be polished to near perfection to achieve the extremely small dimensions of VLSI devices. The fabrication process comprises several steps, including the deposition and diffusion of various materials on the wafer. The layout data from the GDS-II file is converted into photolithographic masks, one for each layer. The masks define the spaces on the wafer where certain materials need to be deposited, diffused, or even removed. During each step, one mask is used. Several dozen masks may be used to complete the fabrication process. Lithography is the step that involves mask preparation and verification as well as the definition of different materials in specific areas of the IC. It is a crucial step during fabrication and is repeated numerous times at different stages. It is the step most affected by the downscaling of technology nodes and the increase in process variations. After the chip is fabricated, the wafer is diced, and individual chips are separated. Subsequently, each chip is packaged and tested to validate the design specifications and functional behavior. Post-silicon validation is the last step in IC manufacturing and is used to detect and fix bugs in ICs and systems after production [37].

4. Brief on AI/ML algorithms

In modern times, statistical learning plays a crucial role in nearly every emerging field of science and technology. The vast amount of data generated and communicated within each field can be mined for learning patterns and dependencies among the parameters for future analyses and predictions. The statistical learning approach can be applied to solve many real-world problems. AI is a technology that enables a machine to simulate human behavior. ML and deep learning are the two main subsets of AI. ML allows a machine to automatically learn from past data without explicit programming. Deep learning is the prime subset of ML (Fig. 3(a)). ML includes learning and self-correction when new data is introduced. ML can handle structured and semi-structured data, whereas AI can handle structured, semi-structured, and unstructured data. ML can be divided into three main types: supervised, unsupervised, and reinforcement learning. Supervised learning is performed when an output label is present for every element in the given data. Unsupervised learning is performed when only input variables are present in the data. Learning that involves data with a few labeled samples, the rest being unlabeled, is referred to as semi-supervised learning [38].

4.1. Supervised Learning

Supervised learning is further divided into two classes: classification and regression. Classification is a form of data analysis that extracts models describing important data classes. Such models, called classifiers, predict discrete categorical class labels [39]. In contrast, regression is used to predict missing or unavailable numerical data rather than discrete class labels. Regression analysis is a statistical methodology generally used for the numeric prediction of continuous-valued functions [40]. The term prediction refers to both numeric and class-label predictions. The classification/regression process can be viewed as learning a mapping 𝑌 = 𝑓(𝑋), where 𝑌 is a set of output variables for 𝑋 input variables. The mapping function is estimated for predicting the associated class label 𝑦 of a given new tuple 𝑋 (Fig. 3(b)). The most considerable drawback of supervised learning is that it requires a massive amount of unbiased labeled training data, which is hard to produce in specific applications such as VLSI. The most popular regression and classification algorithms include linear, polynomial, and ridge regressions; decision trees (DTs); random forests (RFs); support vector machines (SVMs); and ensemble learning [41, 42].

4.2. Unsupervised Learning

In contrast to supervised learning, unsupervised learning does not require a label for each training tuple. Hence, it requires less effort to generate the data than supervised learning. However, point estimates/desired outputs for a given input vector are harder to achieve with unsupervised learning. It is employed to identify unknown patterns in the data. Clustering and dimensionality reduction through principal component analysis and other methods are powerful applications associated with unsupervised learning. Clustering involves grouping or segmenting objects into subsets or "clusters" such that the objects in each cluster are more closely related to one another than to the objects of different clusters. For a more detailed discussion, refer to [43]. Common clustering algorithms include K-nearest neighbors (KNN), K-means clustering, hierarchical clustering, and agglomerative clustering [44].

4.3. Semi-supervised Learning

Semi-supervised learning acts as a bridge between supervised and unsupervised methodologies. It is useful when the training data has a limited number of labeled samples and a large set of unlabeled samples. It works well for automating data labeling.
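The labeled-plus-unlabeled workflow can be sketched in a few lines as a minimal self-training (pseudo-labeling) loop. The two-cluster synthetic data and the nearest-centroid classifier below are purely illustrative assumptions for this sketch, not methods taken from the surveyed literature.

```python
import numpy as np

# Illustrative self-training (pseudo-labeling) sketch.
# Data and the nearest-centroid rule are assumptions for this example only.
rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters: few labeled, many unlabeled points.
X_lab = np.vstack([rng.normal(0.0, 0.3, (3, 2)),
                   rng.normal(3.0, 0.3, (3, 2))])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                   rng.normal(3.0, 0.3, (50, 2))])

def nearest_centroid(X, centroids):
    # Assign each row of X to the nearest centroid (Euclidean distance).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Step 1: "train" on the labeled seed set (compute class centroids).
centroids = np.array([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])

# Step 2: pseudo-label the unlabeled pool with the seed model.
y_pseudo = nearest_centroid(X_unl, centroids)

# Step 3: retrain on the union of labeled and pseudo-labeled data.
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_pseudo])
centroids = np.array([X_all[y_all == c].mean(axis=0) for c in (0, 1)])

print(nearest_centroid(np.array([[0.1, 0.1], [2.9, 3.1]]), centroids))  # → [0 1]
```

In practice, the centroid rule would be replaced by any classifier that reports confidence, and only high-confidence pseudo labels would typically be kept in each round.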


Figure 3: (a) Overview of Artificial Intelligence techniques (b) Learning function of classification/regression algorithms (c) Deep
learning training and prediction

It works better than supervised/unsupervised learning alone in some applications. The training starts with the limited labeled data; algorithms then model the unlabeled dataset with pseudo labels in the next step. Then, the labeled data is linked with the pseudo labels and later with the unlabeled data to improve accuracy [45, 46]. However, much effort is needed to converge both parts of the semi-supervised methodology in certain complex applications.

4.4. Reinforcement Learning

Reinforcement learning is an area of machine learning that maps situations to actions to maximize a numerical reward signal; it is focused on goal-directed learning based on interactions [47]. It does not rely on examples of correct behavior, as in supervised learning, nor does it try to find a hidden pattern, as in unsupervised learning. Reinforcement learning tries to learn from experience and find an optimum solution that maximizes a reward signal.

4.5. Deep Learning

Deep learning is a subset of ML and is particularly suitable for big-data processing. Deep learning enables the computer to build complex concepts from simple concepts [48]. A feed-forward network or multi-layer perceptron (MLP) is an essential example of a deep learning model or artificial neural network (ANN) (Fig. 3(c)). An MLP is a mathematical function mapping a set of input and output values. The function is formed by composing many simple functions. A shallow neural network (SNN) is an NN with one or two hidden layers. A network with tens to hundreds of such layers is called a deep neural network (DNN). DNNs extract features layer by layer and combine low-level features to form high-level features; thus, they can be used to find distributed expressions of data [49]. Compared with SNNs, DNNs have better feature expression and the ability to model complex mappings. Frequently used DNNs include deep belief networks, stacked autoencoders (SAEs), and deep convolutional NNs (DCNNs) [48]. Recently, DNNs have revolutionized the field of computer vision. DCNNs are well suited for computer vision tasks [50]. Other popular deep learning techniques include recurrent NNs (RNNs) [51]; generative adversarial networks (GANs) [52, 53]; and deep reinforcement learning (DRL) [54]. Refer to the research works mentioned in fig. 2 for implementation details of these algorithms.

Rapid development in several fields of AI/ML is increasing the scope for solution creation to address many divergent problems associated with IC design and manufacturing. In the following sections, we discuss the applications of AI/ML at different abstraction levels of VLSI design and analysis, starting with circuit simulation.

5. AI at the Circuit Simulation

Simulation plays a vital role in IC device modeling. Performance evaluation of designed circuits through simulations is becoming quite challenging in the nanometer regime due to increasing process and environmental variations [55, 56, 57]. The ability to discover functional and electrical performance variations early in the design cycle can improve the IC yield, which depends on the simulation tools' capability. By assimilating the automated learning capabilities offered by AI/ML algorithms in E-CAD tools, the turnaround time and performance of the chip can be revamped with reduced design effort. Researchers have proposed surrogate methodologies targeting the characterization of the leakage power, total power, dynamic power, propagation delay, and IR-drop estimation ranging from stack-level transistor models to the subsystem level [58]. Different AI/ML algorithms have been explored for circuit modeling at different abstraction levels, including linear regression (LR), polynomial regression (PR), response surface modeling (RSM), SVMs, ensemble techniques, the Bayes theorem, ANNs, and pattern recognition models [59]. The following subsections describe

the learning strategies proposed in the literature for VLSI device/circuit characterization at different abstraction levels.

5.1. DEVICE LEVEL

Parametric yield estimation of the circuit and device modeling at the transistor level are the primary focus areas at this level. Parametric yield estimation of statistical-aware VLSI circuits is not new; this process has been evolving along with ML algorithms since the 1980s. Statistical parametric yield estimation was proposed [60] for determining the overall parametric yield of MOS circuits. Alvarez et al. and Young et al. proposed statistical design analysis through response surface methodology (RSM) for computer-aided VLSI device design [61, 62]. The proposed models have been successfully applied to optimize the BiCMOS transistor design. RSM has inspired industrial experimentation since its development in the 1950s. Refer to [63], [64] for a comprehensive review of RSM. Khan et al. [65] proposed the multivariate polynomial regression (MPR) method for approximating the Early voltage and the MOSFET characteristics in saturation; they considered a curve-fitting approach using the least-squares method in MPR to simplify the complexity of the BSIM3 and BSIM4 equations [66] and calculate the MOSFET characteristics realistically.

Considering the drastic decrease in the dimensions of technology nodes, conducting a thorough analysis of the characteristics at the device level is of utmost necessity. The randomness in the behavior of transistors due to inter-die and intra-die process variations causes exponential changes in the device currents, particularly in the sub-threshold region [56]. Statistical sampling techniques are more effective than conventional corner-based methods for estimating the effect of the process parameters on the device [67]. The datasets generated from statistical sampling techniques are best suited for learning strategies. The development of AI/ML algorithms for analyzing device parameters at different technology nodes facilitates the optimization of the device parameters and estimating the parametric yield

to manufacturing is another concern [71]. Identifying the trend in ML applications for device modeling from RSM to ANNs over the years, and noticing the future requirements of advanced technologies, we propose inductive transfer learning [72, 73] as a promising technique for investigating device behavior in forthcoming technology nodes from the knowledge of existing technology nodes.

Given a source domain D_S, a corresponding source task T_S, a target domain D_T, and a target task T_T, the objective of transfer learning is to enable the learning of the target conditional probability distribution P(Y_T | X_T) in D_T with the information gained from D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T. In most cases, a limited number of labeled target examples, exponentially smaller than the number of labeled source examples, is assumed to be available. Fig. 4 shows the proposed methodology for developing a learning system using transfer learning to analyze the behavior of devices in upcoming technology nodes.

5.2. GATE LEVEL

Researchers have explored the application and development of AI/ML techniques for gate-level circuit design and evaluation. Figure 5 shows generalized modeling of statistical-aware circuit simulation at the gate level. Over the years, RSM modeling has been popular for estimating process-variation effects on circuit design. Mutlu et al. presented a detailed analysis of the development of RSMs to estimate the process-variation effects on circuit design [74]. Basu et al. [75] developed a library of statistical intra-gate-variation-tolerant cells by building RSM-based gate-delay models with reduced dimensions; the developed, optimized standard cells can be used for chip-level optimization to realize the timing of critical paths. In [76], [77], RSM learning models were developed via a combination of statistical design of experiments (DoE) and an automatic selection algorithm for the SSTA of gate-level library-cell characterization of VLSI circuits. Their models considered the threshold voltage (V_th) and current gain (β) as model parameters for a compact transistor model characterization of power,
at very high computational speeds. Owing to this fact, an delay, and output transitions. In [76], the RSM and linear
ML-based Tikhonov regularization (TR) approach is im- sensitivity approaches were proposed to increase the analysis
plemented to analyze the impact of the process on 𝑉𝑇 𝐻 speed by one and two orders of magnitude, respectively,
in GaN-based high electron mobility transistors (HEMTs) when compared to that of Monte Carlo (MC) simulations,
[68]. In [69], neural network-based variability analysis of albeit at the cost of a decrease in accuracy of up to 2% and 7%
ferroelectric field-effect transistor (FeFET) with raw data in respectively. In [77], on average, s-DoE has an error of 0.22%
the form of polarization maps from the metrology as inputs is at the tails of 3𝜎 distribution compared to the 10x error
proposed. High/low threshold voltage, on-state current, and given by sensitivity analysis by cadence encounter library
sub-threshold slope are sampled as outputs from the model. characterizer (ELC).
The experiments show that ML predictions are 106 times Miranda et al. [78] also proposed a variation-aware sta-
faster and > 98% accurate compared to TCAD simulations. tistical design of experiments approach (s-DoE) for predict-
A hybrid analytical and deep-learning-assisted MOSFET I- ing the parametric yield of static random access memory
V (current-voltage) modeling is proposed in [70]. For mod- (SRAM) circuits under process variability. Their approach
eling the I-V characteristics of a 12nm gate length GAAFET achieved an accuracy of approximately two orders of mag-
(Gate-all-around transistor) technology, a 3-layer NN with nitude better than that for the sensitivity analysis in the
18 neurons was employed. tail response under 3𝜎 process variations and a CPU time
Performance evaluation of FinFET devices and circuits 10–100 times less than that in MC simulations. The case
designed at 7 nm and above is becoming challenging. Ac- studies in the article demonstrate the advantage of s-DoE
curate estimation of the reliability of these devices prior

D Amuru et al.: Preprint submitted to Elsevier Page 7 of 41


AI/ML in VLSI

Figure 4: Block diagram of the proposed inductive transfer learning for device modeling at upcoming lower technology nodes

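The inductive-transfer idea in the block diagram can be illustrated with a deliberately tiny numerical sketch: a surrogate pretrained on abundant source-node data is fine-tuned with only a few target-node samples and ends up closer to the target trend than a model trained from scratch with the same budget. All trends, values, and the linear surrogate below are synthetic illustrations, not measured device data:

```python
def fit(xs, ys, w0=0.0, b0=0.0, lr=0.01, steps=200):
    """Plain gradient descent on mean-squared error for y ~ w*x + b."""
    w, b = w0, b0
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Source node: plentiful labeled data following a hypothetical trend.
src_x = [i / 10 for i in range(100)]
src_y = [2.0 * x + 1.0 for x in src_x]
w_s, b_s = fit(src_x, src_y, lr=0.01, steps=2000)

# Target node: only 4 labeled samples of a slightly shifted trend.
tgt_x = [0.5, 2.0, 5.0, 8.0]
tgt_y = [2.2 * x + 0.9 for x in tgt_x]

# Transfer: initialize from source weights; baseline: same budget from scratch.
w_t, b_t = fit(tgt_x, tgt_y, w0=w_s, b0=b_s, lr=0.005, steps=50)
w_0, b_0 = fit(tgt_x, tgt_y, lr=0.005, steps=50)

assert mse(w_t, b_t, tgt_x, tgt_y) < mse(w_0, b_0, tgt_x, tgt_y)
```

The same pattern scales to NN surrogates: pretrain on dense simulations of a mature node, then fine-tune on the sparse labeled data available for the upcoming node.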
in choosing the region of interest in the distribution to improve accuracy while reducing the number of simulations. Along similar lines, Chaudhuri et al. [79] developed accurate RSM-based analytical leakage models for 22nm shorted-gate and independent-gate FinFETs using a central composite rotatable design to estimate the leakage current in FinFET standard cells by considering the process variations. Their results agreed well with the quasi-MC simulations performed in TCAD using 2D cross-sections.

Exploration of possible patterns in simulated data and reuse of the data across various stages of circuit design has been of great interest. In this fashion, Cao et al. [80] proposed a robust table-lookup method for estimating the gate-level circuit leakage power and switching energy of all possible states using Bayesian inference (BI) and neural networks (NNs). Their model uses pattern recognition by classifying the possible states based on the average power consumption values using NNs. The idea is centered on using the statistical information on a circuit's available SPICE power data points to characterize the correlation between the state-transition patterns and the power consumption values of the circuit. Such correlated pattern information is further utilized to predict the power consumption of any seen and unforeseen state transition in the entire state-transition space of the circuit. The estimation errors obtained using NNs always exhibit normal distributions, with much smaller variations than the benchmark curves. Moreover, the estimation error decreases with the number of clusters and the complexity of the NNs when appropriate features are extracted. Additionally, the time required to train and validate the NNs is negligible compared to the computing time required to generate statistical distributions in the SPICE environment.

Applying BI, Yu et al. [81] proposed a novel nonlinear analytical timing model for the statistical characterization of the delay and slew of standard library cells in bulk silicon and SOI technologies, and in non-FinFET and FinFET technologies, using a limited combination of output capacitance, input slew rate, and supply voltage. Utilizing the Bayesian inference framework, they extract the new timing model parameters using an ultra-small set of additional timing measurements from the target technology, achieving a 15× runtime speedup in simulation runs without compromising accuracy, which is better than the traditional lookup-table approach. They employed ML to develop priors of the timing model coefficients using old libraries and sparse sampling to provide the additional data points required for building the new library in the target technology.

Over time, polynomial regression became another important analytical modeling technique. A statistical leakage estimation through PR was proposed by [82]. Experimental results on the MCNC benchmark [83] show that the leakage estimation is five times more efficient than Wilkinson's approach [84], with no accuracy loss in mean estimation and about 1% in standard deviation. On these lines, Moshrefi et al. [85] proposed an accurate, low-cost Burr distribution as a function for delay estimation at threshold voltages varying ±10% from the mean. The samples were generated at the 90, 45, and 22nm technology nodes. Statistical data from MATLAB were applied to HSPICE simulations to obtain the delay variations. The relation between the threshold voltage and delay variations was determined as a fourth-order polynomial equation. In addition to the mean and variance of the estimated distributions, the maximum likelihood was considered the third parameter, forming a three-parameter probability density function. The proposed Burr distribution benefits from one more degree of freedom than the normal distribution [86], yielding a lower estimation error.

The AI/ML predictive algorithms are intermittently applied for the process–voltage–temperature (PVT) variation-aware library-cell characterization of digital circuit design
Figure 5: Generalized statistical aware modeling for VLSI circuit simulation

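As a toy check of the extra shape flexibility noted for the Burr distribution, its two-shape-parameter Burr XII density can be written down directly and validated against the closed-form CDF; the shape values below are arbitrary illustrations, not fitted delay parameters from [85]:

```python
def burr_pdf(x, c, k):
    """Burr XII density f(x; c, k) = c*k*x^(c-1) / (1 + x^c)^(k+1) for x > 0.
    Two shape parameters (plus an omitted scale) give one more degree of
    freedom than a fitted normal distribution."""
    return c * k * x ** (c - 1) / (1 + x ** c) ** (k + 1)

def integrate(f, a, b, n=100000):
    """Simple midpoint-rule numerical integration."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

c, k = 2.0, 3.0
# Total probability: the tail beyond x = 50 is negligible for these shapes.
total = integrate(lambda x: burr_pdf(x, c, k), 0.0, 50.0)
# Cross-check against the closed-form CDF, F(x) = 1 - (1 + x^c)^(-k).
head = integrate(lambda x: burr_pdf(x, c, k), 0.0, 1.0)
assert abs(total - 1.0) < 1e-3
assert abs(head - (1 - (1 + 1.0 ** c) ** -k)) < 1e-3
```

In a delay-fitting setting, c and k (and a scale) would be estimated from the simulated delay samples, e.g. by maximum likelihood.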
and simulation. The accurate performance modeling of digital circuits is becoming difficult with the acute downscaling of transistor dimensions in the deep sub-micrometer regime [87], [88]. To address the concern regarding the performance modeling of digital circuits in the sub-micrometer regime, Stillmaker et al. [89] developed polynomial equations for curve-fitting the measurements of the CMOS circuit delay, power, and energy dissipation based on HSPICE-simulated data using predictive technology models (PTMs) [90] at technology nodes ranging from 180 nm to 7 nm. Second-order and third-order polynomial models were developed with iterative power, delay, and energy measurement experiments, attaining a coefficient of determination (R² score [91]) of 0.95. The scaling models proposed in [92] and [89] are more accurate for comparing devices at different technology nodes and supply voltages than the classical scaling methods.

Development of MPR and ANN models for measuring the PVT-aware (process, voltage, temperature) leakage in CMOS and FinFET digital logic cells was reported in [91] and [93], respectively. [91] also models the total power with the same MPR model. The developed models demonstrated high accuracy, with < 1% error w.r.t. the HSPICE simulations. Amuru et al. [94] reported a PVT-aware estimation of leakage power and propagation delay with a gradient boosting algorithm, which yields a < 1% error in estimations with a 10⁴ times improvement in computational speed compared to HSPICE simulations. These characterized library-cell estimations can be used for estimating the overall leakage power and propagation delay of complex circuits, avoiding the relatively long simulation runs of traditional compilers. Bhavesh et al. [95] propose an estimation of the power consumption of MOSFET-based digital circuits using regression algorithms. PMOS-based Resistive Load Inverter (RLI), NMOS-based RLI, and CMOS-based NAND gate layouts are employed at 90nm MOS technology to create the dataset. The feature vectors extracted for modeling are the capacitance, resistance, number of MOSFETs, their respective widths and lengths, and the average power consumption of the respective layout. As per the experimental results, extra-trees and polynomial regressors demonstrate better performance than linear, RF, and DT regressors. GPU-based circuit analysis is required at the present state of complex-circuit analysis. Recently, XT-PRAGGMA, a tool to eliminate false aggressors and accurately predict crosstalk-induced delta delays using GPU-accelerated dynamic gate-level simulations and machine learning, was proposed in [96]. It shows a speedup of 1800x compared to SPICE-based simulations.

An accurate yield estimation in the early stage of the design cycle can positively impact the cost and quality of IC manufacturing [97], [98]. A comprehensive analysis of the delay and power characteristics of VLSI circuits designed at the sub-nanometer scale under expanding PVT variations is extremely important for parametric yield estimation. As reported earlier, accurate predictions are made by well-trained AI/ML algorithms, such as PR, ANNs, GB, and BI, with power and delay estimations very close to those of the most reliable HSPICE models. Incorporating such efficient ML models into EDA tools for library-cell characterization at the transistor and gate levels facilitates the performance evaluation of complex VLSI circuits at very high computational speeds, facilitating yield analysis. These advanced computing EDA tools drastically improve the turnaround time of the IC.

5.3. CIRCUIT LEVEL
Statistical characterization of VLSI circuits under process variations is essential for avoiding silicon re-spins. Similar to the gate level, explorations for the design of ML-based surrogate models at the circuit level were reported in
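A minimal sketch of the gradient-boosting idea used by such estimators — each weak learner is fit to the residuals of the ensemble so far — on a synthetic, leakage-like monotone curve (the data generator and stump learner are illustrative only, not the trained models of [94]):

```python
def fit_stump(xs, ys):
    """Best single-split regression stump: (threshold, left mean, right mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, resid)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

# Synthetic "leakage vs. temperature" curve (illustrative, exponential-like).
xs = [20 + 2 * i for i in range(50)]          # temperature sweep, 20..118
ys = [0.1 * 1.03 ** (x - 20) for x in xs]     # monotone leakage-like response

model = boost(xs, ys)
train_mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
naive_mse = sum((sum(ys) / len(ys) - y) ** 2 for y in ys) / len(ys)
assert train_mse < 0.05 * naive_mse   # ensemble easily beats the mean predictor
```

Production estimators of this kind use deeper trees, many features (V, T, process corner), and held-out validation, but the residual-fitting loop is the same.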
the literature. Hou et al. [99] reported the power estimation of VLSI circuits using NNs. Trained NNs can estimate the power using the input/output (I/O) and cell numbers without requiring circuit information such as net structures. This approach requires the power estimation results of benchmark circuits to train the target NN. Limited experimental results have shown that this method can give acceptable results with a specific net structure at a considerably high speed. Stockman et al. [100] discussed a novel approach for predicting power consumption based on memory activity counters, exploiting the statistical relationship between power consumption and potential variables. The proposed ML models for the prediction include support vector regression (SVR), genetic algorithms, and NNs. They showed that an NN with two hidden layers and five nodes per layer is the best predictor among the chosen ML models, with a mean square error of 0.047. In addition, they explained that the ML approaches are significantly less costly and less complex than a hardware solution, with reduced run time. Janakiraman et al. [101] proposed an efficient ANN model for characterizing the voltage- and temperature-aware statistical analysis of leakage power. Trained transistor-level stack models are used for circuit leakage estimation. The designed model showed a 100x improvement in runtime, with < 1% and < 2% errors in the mean and standard deviation of Monte-Carlo statistical leakage estimations, respectively. The complexity of the comprehensive model is reported as 𝑂(𝑁), on par with existing linear and quadratic models [102, 103, 84, 104].

Garg et al. presented SVM-based macro models for characterizing transistor stacks of CMOS gates, with an average 17× runtime speedup compared to HSPICE computations for estimating the leakage power [105]. Kahng et al. [106] proposed a hybrid surrogate model that combines the predictions of ANN and SVM models to estimate the incremental delay due to the signal-integrity-aware path delay in a 28-nm FDSOI technology, demonstrating a worst-case error of < 10 ps. An accurate power estimation of CMOS VLSI circuits using Random Forest (RF), which performs better than NNs, is proposed in [107]. The results show good agreement on the ISCAS'89 benchmark circuits. A fast and efficient ResNet-based digital circuit optimization framework for leakage and delay is proposed in [108]. Results on 22nm Metal Gate High-K digital cells show 36.7% and 18.8% reductions in delay and leakage using a genetic algorithm.

5.4. RTL LEVEL
The effect of process variability on guard bands and its mitigation are detailed in [109]. Jiao et al. [110] proposed a supervised-learning model for bit-level static timing error prediction at the RTL level, aiming for a guard-band reduction in error-resilient applications. They considered floating-point pipelined circuits in their analysis. The circuit's behavior was characterized by timing errors using the Synopsys Design Compiler and IC Compiler as the frontend and backend design tools, respectively. Synopsys PrimeTime was used for voltage and temperature scaling, followed by a post-layout simulation with SDF back-annotation in Mentor Graphics ModelSim to extract the bit-level timing error information. The logistic regression model shows an average accuracy of 95% at various voltage/temperature corners and unseen workloads, with an average guard-band reduction of 10%. ML-based power estimation techniques at the RTL level that outperform commercial RTL tools [111, 112, 113, 114] were proposed in [115]. Their experiments recommend CNNs over ridge regression, gradient tree boosting, and multi-layer perceptrons for accurate power estimations. The average power estimation from RTL simulations using a GNN [116], GRANNITE, was presented in [117]. GRANNITE achieved a > 18.7× speedup when compared to traditional per-cycle gate-level simulations.

The AI/ML strategies can be extended to the circuit and RTL levels to build macrocell models for parametric yield estimation and optimization. The models built using ANNs, CNNs, and deep learning techniques are helpful for complex cell design optimization and power-delay product prediction, as they are less dependent on the complete circuit description. Another critical bottleneck is the generation of big data for ML algorithms. ML algorithms require a large amount of simulated data to accurately develop I/O relationships, which is possible at some levels of digital circuits and their applications. The concept of GANs can help address this concern. Generative models aim to estimate the training data's probability distribution and generate samples belonging to the same data distribution manifold [118]. GAN-based semi-supervised architectures for the regression task proposed recently [119] strengthen the possibilities of applying GANs to the regression tasks of digital circuits. Different measures and techniques need to be explored to keep the quantization error introduced by these networks in check.

5.5. POST LAYOUT SIMULATION
ML models also facilitate the efficient use of resources in repeated dynamic IR-drop simulations. The model proposed in [120] reduces the training time by building small-region models for cell instances with IR-drop violations instead of building a global model for the entire chip. Further, the ML models work on the regional clusters to extract the required features and predict the violations. Experiments on validated industry designs show that the XGBoost model outperforms CNNs for IR-drop prediction, requiring less than 2 min for each ECO iteration. Zhiyao Xie et al. [121] developed a fast, design-independent dynamic IR-drop estimation technique named PowerNet, based on CNNs. Design-dependent, ML-based IR-drop estimation techniques are proposed in [120, 122, 123, 124, 125].

Han et al. [126] proposed an ML-based tool called Golden Timer eXtension (GTX) for sign-off timing analysis. Using the proposed tool, they attempted to predict the timing slack between different timing tools and the correlation between the sign-off tool and the implementation tool across multiple technology nodes. The poor yield due to the inaccurate timing estimation by the STA sign-off, particularly
Figure 6: Summary of proposed AI/ML algorithms in literature for circuit simulation parameter estimation/performance evaluation

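The bit-level timing-error predictor discussed in Section 5.4 is, at its core, a logistic-regression classifier; a minimal sketch on synthetic slack/droop features (the feature choice, threshold, and data are invented for illustration, not those of [110]):

```python
import math

def train_logreg(X, y, lr=2.0, steps=3000):
    """Batch gradient descent on the logistic loss: P(error) = sigmoid(w.x + b)."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            for j, xj in enumerate(xi):
                gw[j] += (p - yi) * xj
            gb += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, xi):
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0

# Synthetic features per pipeline stage: [path slack (ns), supply droop (V)].
# Label 1 (timing error) when the slack margin over the droop is small.
pairs = [(s, d) for s in range(10) for d in range(10)]
X = [[s / 10, d / 10] for s, d in pairs]
y = [1 if s - d < 2 else 0 for s, d in pairs]

w, b = train_logreg(X, y)
acc = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
assert acc > 0.9
```

In a guard-band flow, the learned probability (rather than the hard 0/1 decision) would set how aggressively the margin can be reduced per corner.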
at nodes below 16 nm at low voltages, can be improved using surrogate tools supporting advanced processes for accurate timing calibration. ML techniques in chip design and manufacturing, notably addressing the effect of process variations on chip manufacturing in the sub-22-nm regime, are discussed in [127]. The authors discuss pattern-matching techniques integrated with ML techniques for pre-silicon HD, post-silicon variation extraction, bug localization, and learning techniques for post-silicon timing tuning. [128] reviews some of the on-chip power grid design solutions using AI/ML approaches. It thoroughly discusses power grid analysis using probabilistic, heuristic, and machine-learning approaches. It further recommends that it is necessary to obtain electromigration-aware aging predictions of the power grid networks during the design phase itself.

Power delivery networks (PDNs) supply low-noise power to the active components of ICs. As the supply voltage scaled down, the variations in the power supply voltage increased, affecting the system's performance, especially at higher frequencies. The effects of this power supply noise can be minimized with a proper design of an impedance-controlled PDN. The probability of system failure increases with the PDN ratio (the ratio of the actual impedance of the PDN to the target impedance). It can be minimized by efficiently selecting and placing decoupling capacitors on
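The PDN-ratio criterion lends itself to a small numerical sketch: model each decoupling capacitor as a series R-L-C branch, combine branches in parallel, and compare |Z| against the target impedance (all element values below are arbitrary illustrations, not a real board design):

```python
import math

def decap_impedance(f, r_esr, l_esl, c):
    """Series R-L-C model of one decoupling capacitor at frequency f (Hz)."""
    w = 2 * math.pi * f
    return r_esr + 1j * w * l_esl + 1 / (1j * w * c)

def parallel(zs):
    return 1 / sum(1 / z for z in zs)

# Hypothetical 100 uF bulk capacitor and 100 nF ceramic capacitor (with ESR/ESL).
bulk = lambda f: decap_impedance(f, 10e-3, 5e-9, 100e-6)
ceramic = lambda f: decap_impedance(f, 5e-3, 1e-9, 100e-9)

f = 10e6            # evaluate the PDN at 10 MHz
z_target = 0.1      # target impedance in ohms
z_bulk_only = abs(bulk(f))
z_both = abs(parallel([bulk(f), ceramic(f)]))
pdn_ratio = z_both / z_target

# The added ceramic lowers the high-frequency impedance, but the PDN ratio is
# still above 1 here, signalling that more (or better-placed) decaps are needed.
assert z_both < z_bulk_only
assert pdn_ratio > 1.0
```

Optimizers such as the surrogate-assisted framework of [129] search over exactly these selection/placement choices, sweeping frequency and watching for anti-resonance peaks between branches.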
the board and/or the package. A fast ML-based surrogate-assisted meta-heuristic optimization framework for decoupling capacitor optimization is proposed in [129].

Further, a low-cost machine learning-based chip performance prediction framework using on-chip resources is proposed in [130]. It predicts the maximum operating frequency of chips for speed binning with an accuracy of over 90% w.r.t. automatic test equipment (ATE). Experimental results on 12nm industrial chips show that linear regression is more suitable, with less training time and a smaller model size, than XGBoost. It also proposes a sensor selection method to minimize the area overhead of on-chip sensors. Sadiqbatcha et al. [131] propose RealMaps, a framework for the real-time estimation of full-chip heatmaps using an LSTM-NN (long short-term memory) model with existing embedded temperature sensors and system-level utilization information. The experiments to identify the dominant spatial features through the 2D spatial DCT (discrete cosine transform) show that only 36 DCT coefficients are required to maintain sufficient accuracy. Fig. 6 presents a summary of the AI/ML algorithms proposed in the literature to address VLSI circuit simulation.

As reported earlier, AI and ML can be incorporated into EDA tools and methodologies at various stages of circuit simulation to address different statistical/parameter estimations, including the leakage power, total power, propagation delay, and effects induced due to aging, yield, and power consumption. Assimilation of these automated learning strategies into VLSI circuit design and simulation will revolutionize the field of CAD–VLSI, considering the numerous related advantages.

6. AI in Architectures
The design of VLSI architectures became dynamic with the evolution of AI/ML techniques [132, 133]. Advances in NN algorithms and innovations in high-bandwidth and high-performance semiconductor designs have paved a new way to address the challenges in the hardware implementations of advanced real-time applications. Over the last few decades, different architectures have inspired the advancement of VLSI technology. Most design developments/improvements are motivated by the need for edge applications with high processing speeds, improved reliability, low implementation cost, and short time-to-market windows. The architectural designs proposed in the literature are majorly for the application domains of image processing, signal processing, speech processing, IoT, and automobiles.

This survey presents a broad review of VLSI architectural modifications for the memory and systolic array architectures in this section, and at the SoC level in the next section, to provide the reader with an overview and the scope of research in VLSI architectures for ML.

6.1. Memory Systems
Memory systems are one of the essential and dominant components of computational systems. Different scalable memory architectures have been designed for the real-time processing of ML algorithms in various IoT (Internet of Things) and embedded system applications. Various AI applications involve large datasets and demand a faster interface between the computing unit and memory. Different memory architectures were proposed in the past, addressing data movement and processing issues. Kang et al. [134] proposed deep embedding of computation in an SRAM parallel processing architecture for pattern recognition in 256 × 256 images; their model enables multi-row read access and analog signal processing without degrading the system performance. Their method employs two models: multi-row READ and the analog sum of absolute differences (SAD) computation. This architecture differs from the conventional architecture as a data path between the processor and memory is not required. The SAD is computed at different locations of the array in parallel with multiple windows of the template pattern. For high-performance computations, Zhang et al. [132] proposed a 6T-SRAM array that stores an ML classifier model, which is an ultra-low-energy detector for image classification. The prototype is a 128 × 128 SRAM array that operates at 300 MHz, with an accuracy equivalent to that of a discrete SRAM/digital-MAC system.

Gonugondla et al. [135] presented a robust, deep-in-memory ML classifier with stochastic gradient descent based on an on-chip trainer using a standard 16 kB 6T-SRAM bit-cell array. In-memory computing is a technology that uses memory devices assembled in an array to execute MAC operations [136]. Kang et al. [133] worked on the deep in-memory architecture (DIMA) as a substitute for the regular von Neumann architecture for realizing energy- and latency-efficient ML SoCs. This architecture was employed mainly for targeted applications, such as IoT and autonomous driving, which require computing heavy ML algorithms. DIMA eliminates the need for separate computation and memory by implanting the conventional memory periphery with the computation hardware. The design employs 6T SRAM with an unchanged bit-cell structure to maintain the storage density. In [137], a MAC circuit architecture in a 2T–1C configuration (two MoS2 FETs and one metal-insulator-metal capacitor) is the core module for the convolution operation in an artificial neural network. The memory portion of this circuit is similar to dynamic random access memory (DRAM), but with a longer retention time owing to the ultralow leakage current of the MoS2 transistors.

Wang et al. [138] discussed a parallel digital VLSI architecture for combined SVM training and classification. In this parallel architecture, a multi-layer system bus and multiple distributed memories fully utilize parallelism. Before this, many SVMs were developed and discussed in [139, 140, 141], primarily focusing on the 90-nm technology node. Distinctively, Wang et al. in [138] developed the SVM on a 45-nm node on a commercial GPU with an enhanced speedup of 29× compared with traditional SVMs on digital hardware.

A part of the computation tasks can be performed inside the memory to solve the data movement issue, thus avoiding the memory access bottleneck and accelerating the application performance significantly. Such architectures are
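The multi-window SAD matching described above can be mimicked in software; each window's score against the template is independent, which is exactly what lets an in-memory array evaluate many windows in parallel (the image and template below are toy data, and the Python loop stands in for the parallel analog computation):

```python
def sad(window, template):
    """Sum of absolute differences between two equal-sized patches."""
    return sum(abs(a - b)
               for wrow, trow in zip(window, template)
               for a, b in zip(wrow, trow))

def match(image, template):
    """Evaluate SAD at every window position; a compute-in-memory array
    scores these positions concurrently instead of in this loop."""
    th, tw = len(template), len(template[0])
    scores = {}
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            scores[(r, c)] = sad(window, template)
    return min(scores, key=scores.get)   # best-matching top-left corner

# Toy 6x6 "image" with a bright 2x2 patch at row 3, col 2.
image = [[0] * 6 for _ in range(6)]
for r, c in [(3, 2), (3, 3), (4, 2), (4, 3)]:
    image[r][c] = 9
template = [[9, 9], [9, 9]]

assert match(image, template) == (3, 2)
```

The hardware version additionally quantizes the per-column absolute differences in the analog domain, trading a little accuracy for a large energy saving.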
processing-in-memory (PIM) architectures. [142] proposes NNPIM, a novel PIM architecture, to accelerate NN inference inside the memory. The memory architecture combines a crossbar memory architecture for faster operations, optimization techniques to improve the NN performance and reduce energy consumption, and a weight-sharing mechanism to reduce the computational requirements of NNs. [143, 144] are some of the significant state-of-the-art DRAM PIM architectures.

Another evolved computing technique is near-memory processing (NMP). Near-memory processing incorporates the memory and logic chips in 3D storage packages to provide high bandwidth. Schuiki et al. [145] proposed a near-memory architecture for training DNNs. This model was developed for accelerating DNN training rather than inference. The training engine, NTX, was used to train the DNNs at scale. They explored the RISC-V cores and the NTX coprocessor, reducing the overhead on the main processor by seven times. The NTX combined with the RISC-V processor core offers a shared memory space with single-cycle access on a 128-kB tightly coupled data memory. The architecture employs a hybrid memory cube as the memory module for training the DNNs in data centers.

In [146], a general-purpose vector architecture for the migration of ML kernels to near-data processing (NDP), achieving high speedups with low energy consumption, is presented. Their architecture shows a speedup of up to 10× for KNN, 11× for MLP, and 3× for convolution when processing near-data compared to a high-performance x86 baseline. The work also includes an NDP intrinsics library that supports validating NDP architectures based on large vectors. A machine-learning framework is proposed in [147] to effectively predict the most suitable NDP system (among an HBM-based (High Bandwidth Memory) NDP system, an HMC-based (Hybrid Memory Cube) NDP system, and a conventional DDR4-based system) for a given application based on the performance rankings for a given workload. Kaplan et al. worked on K-means and KNN algorithm evaluation for processing in-storage acceleration of ML (PRINS) [148], a system employing resistive content-addressable memory (ReCAM). This architecture functions both as storage and as a massively parallel associative processor. This design works better than the von Neumann architecture model in managing the bottleneck between the storage and main memory. These algorithms outperformed CPU, GPU, and field-programmable gate array (FPGA) implementations in the time for fetching and accessing data from the main memory. The ReCAM is more efficient than traditional CAMs as it implements line-by-line execution of the truth table of the expression. PRINS enhances the power efficiency and performance compared to other hardware for both K-means and KNN evaluation.

A survey on the architectural aspects, dimensions, challenges, and limitations of compute-in-memory (CIM) is presented in [149]. A robust and area-efficient CIM approach with 6T foundry bit-cells that has an improved dynamic voltage range for dot-product computations, withstands bit-cell 𝑉𝑡 variations, and eliminates any read-disturb issues is proposed in [150]. Recent state-of-the-art works on CIM chips are presented in [151]. As per their research, the SRAM-based CIM solution can be a more suitable choice for AI processors than NVM-based (non-volatile memory) CIMs. NVM-based CIMs, or memristive devices, include resistive random-access memory (RRAM), magneto-resistive RAM (MRAM), and phase-change memory (PCM) [152, 153]. A survey on the memristive simulation frameworks, their comparisons, and future modeling is highlighted in [154].

In the past, Cheng et al. [155] introduced the training-in-memory architecture for the memristor-based DNN named TIME. It reduced the computation time of regular training systems. This architecture supports not only inference but also backpropagation and weight updates during the training of the NNs. It is based on metal-oxide resistive random access memory, which enhances performance and efficiency. The main module is divided into three subarrays: full-function, buffer, and memory. The full-function subarray manages the memory and training operations such as inference, backpropagation, and update. The memory subarray manages data storage, and the buffer subarray holds the intermediate data for the full-function subarray. This architecture improves energy efficiency in deep reinforcement learning and supervised learning.

A thorough survey on hardware accelerators is outside the scope of this paper. Interested readers can refer to [156] for a review of accelerators and similar works [157, 158, 159]. However, we provide an overview of the different design aspects at the architecture level that speed up ML computations.

6.2. Systolic Arrays
A systolic array is a subset of the data-flow architecture comprising several identical cells, with each cell locally connected to its nearest neighbor. A wavefront of computation is propagated in the array with a throughput proportional to the I/O bandwidth. Systolic arrays are fine-grained and highly concurrent architectures. The progress of IoT-based smart applications has exponentially increased the demand for deep learning algorithms and, in turn, systolic array-based architectures.

Along these lines, an automatic design space exploration framework for CNN-based systolic array architecture implementations on an FPGA under high resource utilization and at higher speeds was proposed in [160, 161]. They utilize analytical models to provide in-depth resource estimation and performance analysis. However, systolic array implementations on FPGAs are much affected by the sparsity problem of deep neural networks. Researchers have earlier worked on this problem. An approach of packing sparse convolutional neural networks into a denser format for efficient implementation using systolic arrays is proposed in [162]. However, these designs create irregular sparse models that fail to exploit the data-reuse feature of the systolic array. Structured pruning was introduced in [163, 164] to overcome the problem associated with the data-reuse rate

D Amuru et al.: Preprint submitted to Elsevier Page 13 of 41


AI/ML in VLSI
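The wavefront dataflow described in Sec. 6.2 can be made concrete with a small software model. The sketch below is an illustrative output-stationary schedule — the array size, register names, and cycle model are assumptions for exposition, not any specific accelerator. Rows of A enter from the left and columns of B from the top, each skewed by one cycle per row/column, and every processing element performs one multiply-accumulate per cycle:

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle toy model of an output-stationary n x n systolic array.

    PE (i, j) accumulates C[i][j]; operands hop one PE per cycle, rows of A
    moving rightward and columns of B moving downward as a skewed wavefront.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]   # operand of A held by PE (i, j)
    b_reg = [[0] * n for _ in range(n)]   # operand of B held by PE (i, j)
    cycles = 3 * n - 2                    # time for the last wavefront to drain
    for t in range(cycles):
        for i in range(n):                # shift A operands one PE to the right
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                     # skewed injection of row i of A
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):                # shift B operands one PE downward
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                     # skewed injection of column j of B
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        for i in range(n):                # every PE does one MAC per cycle
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

After 3n−2 cycles every partial sum has drained and C equals the ordinary matrix product; the skewed injection is exactly the "wavefront" whose throughput is bounded by how fast new operands can enter at the array boundary.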
Further, [165] proposes Eridanus, an approach for structurally pruning the zero values in sparse DNN models before implementing them on systolic arrays. The approach examines the correlation among all the filters to extract locally-dense blocks whose widths match the width of the target systolic array, thus reducing the sparsity problem. Similarly, optimization of the systolic array architecture of deep learning accelerators for sparse CNN models on FPGA platforms is necessary, as the zeros in the filter matrix of a CNN occupy the computation units, resulting in sub-optimal efficiency. A sparse matrix packing method with bit-map representation that condenses sparse filters to reduce the computation required for systolic array accelerators is proposed in [166].

Many systolic array architectural modifications were proposed in the literature addressing specific applications. In [167], an MLP training accelerator as a systolic array on a Xilinx U50 Alveo FPGA card is proposed to address attack detection on a massive amount of traffic logs in network intrusion detection in a short time. The processing speed per power consumption was 11.5 times better than the CPU and 21.4 times better than the GPU. An approximate systolic array architecture combines timing error prediction and approximate computing to relax the timing constraints of MACs [168]. The proposed array could obtain a 36% energy reduction on CIFAR-10 image classification with only a 1% accuracy loss. A reconfigurable systolic ring architecture to reduce the on-chip memory requirement and power consumption is proposed in [169].

Matrix multiplication is one of the primary computations in most computing architectures. [170] proposes a novel systolic array based on factoring and radix-8 multipliers to significantly reduce the area, delay, and power compared to the conventional radix-4 design while providing the same functionality. FusedGCN [171] is a systolic architecture that computes the triple matrix multiplication to accelerate graph convolutions. It supports compressed sparse representations and tiled computations without losing the regularity of a systolic architecture. Recently, a hybrid accumulator factored systolic array based on partial factoring of the carry-propagate adder was proposed [172] with a significant improvement in area, delay, and power.

The functional safety of the accelerators is another critical concern. Faults manifested due to manufacturing defects in the data paths of GPU/TPU-accelerated DNNs on systolic arrays may lead to a functional safety violation. An extensive functional safety assessment of a DNN accelerator exposed to faults in the data path is presented in [173].

The directions of the state-of-the-art works call for systolic array architectures that are more flexible, with more data-flow strategies and multiple data-transmission modes, to handle the increasing depths of future deep neural networks.

7. AI at the SoC

Artificial intelligence, more specifically deep learning, is feasible in most hardware applications due to the advancements in computing and semiconductor fields. Many attempts have been made to replicate the human brain in next-generation applications, often referred to as neuromorphic computing. Several critical modifications are made to SoC architectures to incorporate deep-learning capabilities. These design modifications impact general-purpose SoC designs and specialized systems that include specialized processing technologies with heterogeneous and massively parallel matrix computations, innovative memory architectures, and high-speed data connectivity.

AI-SoC models must be compressed to ensure their operation on constrained memory architectures in mobile, communications, automobile, and IoT edge applications. The model compression is performed through controlled pruning without compromising accuracy. However, power, latency, and other metrics could be traded off. Therefore, the architectural modifications are to be carefully chosen with combined efforts on the memory and datapath subsystems.

The FPGA (Field Programmable Gate Array) is one of the most widespread and commercially available programmable logic devices to accelerate the computing power of AI on hardware [138, 165]. The FPGA became a robust device for hardware accelerators because of its low cost, high energy efficiency, reusability, and flexibility. ASICs (Application Specific Integrated Circuits) are at their best for implementing specialized applications.

NNs are biologically inspired and perform parallel computations. Digital units such as DSP models, floating-point units, ALUs, and high-speed multipliers can be effectively implemented using NN techniques. The fundamental advantage of NNs for digital applications is that high-speed circuits can be realized efficiently because of the almost constant operation time, regardless of the increasing number of bits in the circuit. Exploiting the parallelism in NN computations also provides a balance between using internal and off-chip memory.

Many ML and deep learning applications were reported in the past for the performance evaluation of SoCs. Joseph et al. [174] developed empirical models for processors using LR to characterize the relationship between processor response and micro-architectural parameters. Lee et al. [175] and Yun et al. [176] proposed power estimation models established via regression analysis for accurate prediction of the performance and power of microprocessor applications in the micro-architectural design space. The model proposed in [175] reduces the simulation cost with increased profiling efficiency and improved performance by effectively assessing and modeling the sensitivity according to the number of samples simulated for the model formulation, finding fewer than 4000 samples sufficient from a design space of approximately 22 billion points. Depending on the application, 50% - 90% of predictions achieve error rates of < 10%. The maximum outlier error reported is approximately 20% - 33%. Hierarchical clustering is employed

in [176] to determine the best predictors among the ten considered events. The proposed model shows an average estimation error of approximately 4% between the actual and estimated power consumption when applied to an Intel-XScale-architecture-based PXA320 mobile processor.

An investigation and comparative analysis of the application of machine learning algorithms to the logic synthesis of incompletely-specified functions is presented in [177]. Periodic performance monitoring of SoCs is essential for high-speed and energy-efficient computing systems. However, performance monitoring depends on the accurate sampling of critical paths. These critical paths drastically vary with PVT conditions, particularly at advanced nodes. Addressing this issue, Wang et al. [178] propose a machine-learning-based SoC real-time performance monitoring methodology incorporating physical parasitic characteristics and PVT variations with unknown critical paths.

Several SoC architectures were reported targeting specific applications. An MLSoC for multimedia content analysis, implemented in TSMC 90-nm CMOS technology, is presented in [179]. Jokic et al. [180] present a complete end-to-end dual-engine SoC for face analysis that achieves a >2X improvement in energy efficiency compared to state-of-the-art systems. The efficiency comes from the hierarchical implementation of a binary decision tree in the first level and a more power-hungry CNN in the next level, which can be triggered when needed. Machine efficiency monitoring is significant for achieving high productivity, failure detection, and cost reduction. An SoC-based tool wear monitoring system with a combination of signal processing, deep learning, and decision making is proposed in [181]. The sensor-fusion data collected from a three-axial accelerometer and a MEMS microphone, combined with camera measurements of tool flank wear under different scenarios, is fed to a CNN to detect any machining variation. Extreme learning machines (ELMs) are NN architectures to increase computational efficiency and performance for large data processing [182]. A low-cost real-time neuromorphic hardware system of a spiking Extreme Learning Machine (ELM) with on-chip triplet-based reward-modulated spike-timing-dependent plasticity (R-STDP) learning capability is proposed in [183].

A thorough timing analysis of an SoC is also essential to meet the design specifications. In [184], ensemble learning-based timing analysis in an SoC physical design was performed. Ensemble learning is a combination of multiple machine learning models to improve the performance of the base learners. Many floor plan files with different parameter settings, with the slack time from the Synopsys IC Compiler tool as the label, were used for training supervised learning algorithms. The idea was to feed back the prediction results at an early stage to the physical design flow to modify the improper floorplan. Bigram-based multi-voltage-aware timing path slack divergence prediction [185] utilizes the classification and regression tree (CART) approach. Experimental results show an accuracy of 95 to 97% in predicting cell delays and endpoint timing slack.

CAD tools capable of delivering industrial-quality chip designs must be tuned for optimal PPA (performance, power, area). A holistic approach that involves online and offline machine learning approaches working together for industrial design flow tuning is proposed in [186]. The work highlights SynTunSys (STS), an online system that optimizes designs and generates data for a recommender system that performs offline training and recommendation. The work also proposes adaptable online & offline systems for the future that dynamically adapt to the trials originating from the online-learning algorithm and the recommender system over the lifespan of the system across various STS iterations. Addressing the challenges in meeting timing constraints in modern ICs, Ajirlou et al. [187] proposed an additional ML pipeline stage in the baseline pipelined RISC processor to classify instructions into propagation delay classes and enhance temporal resource utilization. The critical challenges in deploying ML-based SoC design for real design flows are presented in [188]. The work highlights the challenges due to limited data, insufficient open-source benchmarks and datasets, EDA tool-based data generation, and synthetic data generation.

AI-SoC architectures are at the beginning of their capabilities, with tightly coupled processors and memory architectures. There is a long way to go to reach their full capacity of mimicking the human brain in edge applications.

8. AI in Physical design

VLSI physical design has numerous combinatorial problems that require many iterations to converge. Semiconductor technology scaling has increased the complexity of these design problems with complex design rules and design-for-manufacturing (DFM) constraints, making it challenging to achieve optimal solutions [189]. Traditionally, these issues/violations are detected and fixed manually. However, the traditional manual approach to design closure at advanced nodes is striving hard to meet the market windows. In addition, the design quality and manufacturing process in the later stages of the design flow become extremely sensitive to changes in the early stages, in turn increasing the turnaround time and retarding the design closure. Thus, the early-stage prediction of valid designs is critical, particularly at the current technology nodes. Machine learning and pattern-matching techniques provide reasonably good abstraction and quality of results at several stages of physical design. They act as a bridge to connect each step and provide valuable feedback to achieve early design closure.

Broadly, physical design can be divided into four stages: partitioning, floor planning, placement & clock tree synthesis, and routing (Fig. 7). We review AI/ML approaches proposed by researchers in these stages through the following subsections.

8.1. AI for Partitioning, Floor planning and Placement

Partitioning is one of the dominant areas of VLSI physical design. The main objective of partitioning is to divide the complex circuit into sub-blocks, design them individually, and then assemble them separately to reduce the design complexity.

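The partitioning objective just stated — split the circuit so that as few nets as possible cross between sub-blocks — can be written down in a few lines. The sketch below scores a bipartition by its cut size and finds the best balanced split by brute force; the netlist and names are made up for illustration, and the exhaustive search is only feasible on toy instances (real partitioners use heuristics in the Kernighan–Lin / Fiduccia–Mattheyses family):

```python
from itertools import combinations

def cut_size(nets, part):
    """Number of nets whose cells land in both partitions (the min-cut objective)."""
    return sum(1 for net in nets if len({part[c] for c in net}) > 1)

def best_balanced_bipartition(cells, nets):
    """Exhaustive balanced min-cut -- exponential, for intuition on tiny examples."""
    best, best_cut = None, float("inf")
    for left in combinations(cells, len(cells) // 2):
        part = {c: (0 if c in left else 1) for c in cells}
        cut = cut_size(nets, part)
        if cut < best_cut:
            best, best_cut = part, cut
    return best, best_cut
```

For four cells a–d with nets {a,b}, {c,d}, {a,c}, the best balanced split keeps a with b and c with d, cutting only the single net {a,c}.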
Figure 7: Physical Design Flow

Floor planning and placement are the other critical stages in the design flow for design quality and design closure. Floor planning maps the logic description from partitioning and the physical description to minimize chip area and delay. The floor planning goals are arranging the chip's sub-blocks and deciding the type and location of I/O pads, power pads, and power and clock distributions. Placement determines the physical locations of logic gates (cells) in the circuit layout, and its solution largely impacts the subsequent routing and post-routing closure. Global placement, legalization, and detailed placement are the three stages of placement. Global placement provides the rough locations of standard cells, and legalization removes any design rule violations and overlaps based on the global placement solution. Detailed placement incrementally improves the overall placement quality [190].

Chip floor planning is modeled as a reinforcement learning problem in [191]. An edge-based graph convolutional neural network architecture capable of learning rich and transferable chip representations is modeled out of RL. The method was used to design the next generation of Google's artificial intelligence accelerators and has shown the potential to save thousands of hours of human effort for each new generation. A machine learning-based methodology to predict the post-P&R (place and route) slack of SRAMs at the floor planning stage, given only a netlist, constraints, and floor plan context, tested on a 28nm foundry FDSOI technology, shows a worst-case error of 224ps [192]. Cheng et al. [193] propose a regression methodology to quickly evaluate routing congestion and half-perimeter wire length for each macro placement during floor planning. They explored solutions using different regression techniques – LR, DTR (decision tree regressor), boosted DTR, NN, and Poisson regression. Among these, DTR showed the best performance. A multi-chip module (MCM) has many small chips integrated into a package and joined by interconnects [194]. Multi-chip partitioning is harder due to the sparse search space. An RL solution for partitioning ML models in MCMs is presented in [195].

Moving to placement, high regularity of data paths is essential for compact layout design during placement. However, the data paths are frequently mixed with other circuits, such as random logic. For designs with many embedded data paths, it is crucial to extract and place them appropriately for high-quality placement. Existing analytical placement techniques handle them sub-optimally [196], and modern placers fail to handle data paths effectively due to technological constraints. ML plays a crucial role in such scenarios. Ward et al. [197] proposed PADE to demonstrate the capability of automatic datapath extraction for large-scale designs mixed with random and datapath circuits. The effective features are extracted by analyzing the global placement netlist to predict the direction of the datapath. PADE employs a combination of SVM and NN for cluster classification and evaluation. Experimental results on hybrid benchmarks showed promising improvements in half-perimeter and Steiner tree wire lengths. Wang et al. present connection vector-based and learning-based data path logic extraction strategies [198]. SVM and CNN are employed for the machine learning-based extraction. Results on the MISPD 2011 data path benchmarks show that both strategies perform equally well in classifying data path and non-data path parts.

Chip placement is one of the chip design cycle's most time-consuming and complex stages. AI will provide the necessary means to shorten the chip design cycle, ultimately forming a symbiotic relationship between the hardware and AI, each promoting the advancement of the other. To reduce the time required by chip placement, Mirhoseini et al. proposed an approach that can learn from past experiences and improve over time [199]. The authors posed placement as an RL problem and trained an agent to place the nodes of a chip netlist onto a chip canvas such that the final PPA is optimized while adhering to the constraints imposed by the placement density and routing congestion. The RL agent (policy network) sequentially places the macros, and once all macros are placed, a force-directed method produces a rough placement of the standard cells. This RL agent becomes faster and better at chip placement as it gains experience on numerous chip netlists. The results showed that the proposed approach generates placements in under 6 hours, whereas the strongest baselines require human experts in the loop, and the overall process may take several weeks. In [200], quantum machine learning techniques are proposed for faster and optimal solutions with low error rates to VLSI placement problems. A complete placement was achieved using the variational quantum eigensolver (VQE) [201] approach, tested on two circuits: a toy circuit (comprising eight gates) and another circuit called "Apte," taken from the MCNC benchmark suite [83]. Research on GPU acceleration for placement and timing analysis achieved a 500x speedup for static timing analysis on a million-gate design, harnessing the power of machine learning techniques with heterogeneous parallelism [202].

Placement and routing are two highly dependent physical design stages. Tight cooperation between them is highly recommendable for an optimized chip layout.

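The sequential decision process behind such RL placement formulations can be illustrated without any learning machinery: an "environment" hands out one macro at a time, a policy picks a free grid cell, and the negative half-perimeter wirelength (HPWL) of the finished placement serves as the episode reward. Everything below — macro names, grid size, and the random policy standing in for a trained policy network — is an illustrative assumption, not the cited authors' implementation:

```python
import random

def hpwl(positions, nets):
    """Half-perimeter wirelength: the usual proxy cost that the reward penalizes."""
    total = 0
    for net in nets:
        xs = [positions[m][0] for m in net]
        ys = [positions[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def rollout(macros, nets, grid, policy, rng):
    """One episode: place macros sequentially, then score the final placement."""
    free = [(x, y) for x in range(grid) for y in range(grid)]
    positions = {}
    for m in macros:
        cell = policy(m, positions, free, rng)
        free.remove(cell)            # density constraint: one macro per cell
        positions[m] = cell
    return positions, -hpwl(positions, nets)

def random_policy(macro, positions, free, rng):
    return rng.choice(free)

def best_of(n, macros, nets, grid, rng):
    """Keep the best of several random episodes, just to show the search space.

    A trained agent would instead adjust its policy to maximize expected reward.
    """
    return max(rollout(macros, nets, grid, random_policy, rng)[1] for _ in range(n))
```

The point of the sketch is the episode structure — sequential actions, a legality check, and a single end-of-episode reward — which is what lets experience transfer across netlists in the learned setting.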

Figure 8: Meta Modeling Flow

Traditional placement algorithms that estimate routability using pin delay or wirelength models can never meet their objectives due to increased manufacturing constraints and complex standard cell layouts [203]. A deep-learning (CNN-based) model to estimate the routability of a placement, quickly analyzing the degree of routing difficulty to be encountered by the detailed router, is presented in [204]. In [205], a CNN-based RL model is proposed for detailed placement while keeping the routability of the current placement optimal. A generalized placement optimization framework to meet the post-layout PPA metrics with a small runtime overhead is proposed in [206]. Given an initial placement, unsupervised learning discovers the critical cell clusters for post-route PPA improvements from timing, power, and congestion analysis, followed by a directed-placement optimization. The approach is validated on industrial benchmarks in a 5nm technology node.

A machine-learning model for predicting the sensitivity of the minimum valid block-level area to various physical layout factors, providing a 100x speedup compared to conventional design technology co-optimization (DTCO) and system technology co-optimization (STCO) approaches, is proposed in [207]. From their experiments across various ML algorithms, this research suggests bootstrap aggregation and gradient boosting techniques for block-level area sensitivity prediction. Further, [208] presents MAGICAL, an open-source VLSI placement engine providing fully automated analog layout from netlists to GDSII, including automatic layout constraint generation, placement, and routing; MAGICAL 1.0 is open-source. [209] presents automated floor planning by exploration with different floor plan alternatives and placement styles.

RL is being proposed as the best solution for the physical design of an IC, as it does not depend on any external data or prior knowledge for training and could produce unusual solutions based on the design space exploration by the agent. Some RL approaches for placement optimization are reported in [210, 211].

8.2. AI for Clock Tree Synthesis (CTS)

Clock tree synthesis is one of the crucial steps in VLSI physical design. It is used to reduce clock skew and insertion delay. As the clock network contributes a large percentage of the overall power in the final full-chip design, it is vital to have an optimized clock tree that prevents serious design problems, including excessive power consumption, high routing congestion (caused when extra shielding techniques are used), and protracted timing closure. With the downscaling of devices, the run time and complexity of existing EDA tools for accomplishing CTS have increased. Highly efficient clock trees that optimize key desired parameters, such as the clock power, skew, and clock wire length, are required. It is a very time-consuming process involving a search over a wide range of candidate parameters. Several ML algorithms have been proposed to automate the prediction of clock-network metrics.

Data mining tools such as the Cubist data mining tool [212] are used to achieve skew and insertion delay efficiently. In [213], statistical learning and meta-modeling methods (including surrogate models) were employed to predict essential parameters, such as the clock power and clock wire length, as shown in Fig. 8. In [214], the authors implement a hierarchical hybrid surrogate model for CTS prediction, mitigating parameter multi-collinearity challenges in relatively high dimensions. They tackle the high-dimensionality problem by dividing the architectural and floor planning parameters into two groups – one with low multi-collinearity and the other with parameters that exhibit large linear dependence. Later, the models from these groups are combined through least-squares regression (LSQR). [215] presents an ANN-based transient clock power estimation that can be applied to pre-CTS netlists.

Ray et al. [216] employ ML-based parameter tuning in multi-source CTS to build a high-performance clock network with a quick turnaround time. GAN-CTS, a conditional GAN framework for CTS outcome prediction and optimization, outperforms commercial auto-generated clock tree tools in terms of clock power, skew, and wire length (the target CTS metrics) [217]. Design features are directly extracted from placement layout images to perform practical CTS outcome predictions. The framework also employs RL to supervise the generator of the GAN toward clock tree optimization. A modified GAN-CTS [218] employs a multitask learning technique to simultaneously predict the target CTS metrics using a multi-output deep NN. It achieves higher accuracy in a shorter training time compared to the meta-learning approach [217].

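The meta-modeling flow of Fig. 8 boils down to running a handful of expensive CTS trials and then fitting a cheap analytical surrogate that predicts a metric from the knob settings. A deliberately minimal one-knob sketch follows; the trial data, the knob, and the linear form are illustrative assumptions — published surrogates use richer model bases:

```python
def fit_linear_surrogate(xs, ys):
    """Ordinary least squares for y ~ a*x + b, the simplest possible surrogate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical trial data: clock power (mW) observed for a few buffer-size knobs.
knobs = [2, 4, 6, 8]
power = [5.1, 9.0, 13.1, 16.9]
a, b = fit_linear_surrogate(knobs, power)
predicted = a * 5 + b        # query the surrogate instead of re-running CTS
```

Once fitted, the surrogate answers "what if" queries in microseconds, which is what makes wide parameter sweeps over clock power, skew, and wirelength tractable.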
An RL-based solution reduces the peak current of a design at the CTS stage by over 40% compared to the heuristic CTS solutions utilized by physical design EDA tools [219].

8.3. AI for Routing

Routing lays physical connections to the circuit blocks and pins assigned during placement as per the logical connectivity and design rules. Global routing (GR) and detailed routing are the two routing stages. Global routing partitions the routing region into tiles and decides tile-to-tile paths for all nets while attempting to optimize specific objectives such as minimum wire length and timing budget. The actual geometric layout of each net within the assigned routing regions is carried out in the detailed routing stage [220].

Routing congestion [221] is the major bottleneck in the GR stage, caused when an overflow of net assignment occurs in a region. Another area for improvement in routing is DRVs (detailed routing violations). The heuristic and probabilistic approaches employed in traditional GR solutions [222, 223] suffer from scalability limitations associated with advanced nodes. Early prediction of routing requirements enables design engineers to create high-quality layouts faster. Some research efforts were made to address these challenges using machine-learning-based approaches.

MARS (multivariate adaptive regression splines), a non-parametric flexible regression modeling technique for high-dimensional data, is used for modeling routing congestion in [224]. Qi et al. [225] also utilize MARS to construct a routing congestion model that directly estimates detailed routing congestion through a mapping function that maps global routes and layout data to detailed routing congestion. Router-friendly placement solutions can be obtained from congestion estimators. An SVM for classifying the routability of BEOL (back end of line) stack-specific placements, based on the DRVs/DRCs (design rule check) from P&R tools at the post-route stage, achieved significant improvements over employing only congestion maps [226]. Xie et al. propose RouteNet [227] to evaluate the overall routability of cell placement solutions without global routing and to predict the locations of DRC (design rule checking) hotspots. RouteNet is built over CNNs and shows 50% higher accuracy than GR. An ML approach predicts any short violations before detailed routing using placement and global routing congestion information and sends it as feedback to the placement system for improvement [228, 229, 230].

Figure 9 shows the general procedure of ML-based routing shorts prediction. In the routing step, each circuit under training is routed using a detailed routing tool, and the locations of any identified shorts are collected. In feature extraction, the circuit area is divided into small tiles, and occurrences of shorts are investigated inside these tiles. Each tile is described by a feature vector (appropriate features that contribute to routing violations) and is considered as an instance. An instance belongs to the negative (N) class and is labeled with the target value of 0 if there is no short in its tile area. An instance belongs to the positive (P) class and is labeled with the target value of 1 if any short violation is observed in its tile area after detailed routing. The collected data are fed to a supervised-learning algorithm.

Zhang et al. propose a density and pin peaks-based fast neural network algorithm in NTHU-Route 2.0 [231] to predict the congestion map [232]. A GAN-based congestion estimator can produce congestion heatmaps from placement and netlist information [233]. A deep learning framework for predicting shorts violations by extracting useful features after placement and analyzing them drastically decreases the prediction time and the requirement of a global router [234]. Interestingly, the framework treats short prediction as a binary classification problem with imbalanced data. The results show that the model is 14x faster than NCTU-GR [235] for smaller designs and up to 96x faster for larger designs. A CNN-based GR congestion estimation algorithm [236] that utilizes 3D congestion information similar to [237] showed an incremental improvement in the number of overflows, wire length, and vias. The congestion heatmaps and placement information extracted from hyper-images of the design feature extraction algorithm act as inputs to the congestion model. Goswami et al. posed routing congestion prediction for FPGAs as a regression problem [238]. The paper reports important features obtained through thorough feature engineering for modeling the regression algorithms - RF, MLP, LR, and MARS. On average, the proposed methodology is 25 to 50 times faster than the Xilinx Vivado-based routing calculation tool, which reports the actual congestion after detailed routing.

One solution for reducing the overall time of physical design is to predict the circuit performance after physical design. Li and Franzon [239] proposed an ML approach using surrogate modeling (SUMO). They employed surrogate models to predict the results after the GR step. In the first stage, SUMO generates models for each output to predict the GR results. In the second stage, after analyzing the linear relationships among thousands of GR results and detailed routing results, these results were set as inputs and outputs in ML models. These trained ML models precisely predict the after-detailed-routing results using the GR results. NNs and decision trees are the most used ML models for this problem. A machine learning-based pre-routing timing prediction approach [240] shows a close match with the post-routing analysis from Synopsys PrimeTime. RF performed well in their analysis compared to lasso and NN techniques.

A substrate routing automation framework based on supervised learning algorithms is proposed in [241]. It combines manual and automated results as training data for a CNN for an improved design cycle and performance. A GNN-based congestion estimation approach that can predict the detail-routed lower-metal-layer congestion values from a technology-specific gate-level netlist for every cell in a design is proposed in [242]. The training dataset is built from the detail-routed congestion maps by dividing them into discrete grids and assigning the congestion value of each grid as the target value. Another GNN-based approach for routing short violations prediction at the placement stage is proposed in [243].
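A minimal end-to-end version of the tile-based flow of Fig. 9 — features per tile, a 0/1 shorts label, and a supervised learner — fits in a few lines. The two features and the perceptron below are illustrative stand-ins for the richer feature vectors and models used in the cited works:

```python
def tile_dataset(pin_density, congestion, shorts):
    """One (features, label) instance per tile; label 1 = short seen after routing."""
    data = []
    for row_d, row_c, row_s in zip(pin_density, congestion, shorts):
        for d, c, s in zip(row_d, row_c, row_s):
            data.append(((d, c), 1 if s > 0 else 0))
    return data

def train_perceptron(data, epochs=50, lr=0.1):
    """Tiny linear classifier standing in for the supervised-learning step."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (d, c), label in data:
            pred = 1 if w[0] * d + w[1] * c + b > 0 else 0
            err = label - pred
            w = [w[0] + lr * err * d, w[1] + lr * err * c]
            b += lr * err
    return w, b

def predict(model, feats):
    w, b = model
    return 1 if w[0] * feats[0] + w[1] * feats[1] + b > 0 else 0
```

In a real flow the per-tile features come from the placed-and-globally-routed database (pin counts, overflow, cell mix), and the tiles predicted as P-class are fed back to the placement system for improvement, as in [228, 229, 230].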

violations prediction at the placement stage is proposed in [243]. The information is fed back to the placement system, and a new placement result is generated with reduced DRC violations. GraphSAGE (graph sample and aggregate) is applied effectively to combine the adjacency matrix with the features of each tile. [244] presents a survey of recent developments in machine-learning-based routing algorithms. XGBoost is employed in [245] to predict post-detailed-routing timing at the post-GR stage; when employed for post-GR optimization, it improves the circuit performance.

Figure 9: Typical flow of ML-based routing techniques

Supervised learning (NNs in particular) and RL-based solutions reported in the literature have produced much valuable feedback and many solutions for complex modeling tasks at various physical design stages. After placement and routing, the layout is generated and the design is ready for fabrication.

9. AI in Manufacturing

Numerous processes are involved in manufacturing an IC, including wafer preparation, epitaxy, oxidation, diffusion, ion implantation, lithography, etching, and metallization [246]. All the steps are performed in highly sophisticated fabrication units with constant human supervision. The fabricated ICs are packaged in special packages to protect them from external/environmental damage.

9.1. AI for Lithography

Most chip-manufacturing processes are complex chemical processes, except for the lithography process. Lithography transfers layout data into geometric patterns as masks, and from the masks to the resist material on the semiconductor. After physical design, lithography is a crucial step in chip manufacturing. Masks identify spaces on the wafer where certain materials need to be deposited, diffused, or removed. The fabrication process involves several dozen steps of deposition and diffusion, depending on the circuit's complexity, and during each step one mask is used. The exposure parameters required to achieve accurate pattern transfer from the mask to the photosensitive layer primarily depend on the wavelength of the radiation source and the dose required to achieve the desired change in the properties of the photoresist. Identifying defects during mask synthesis and verifying each lithography stage before proceeding to the next is crucial for yield enhancement, but is very difficult at nanometer dimensions, mainly due to the increased random variations in the process. Introducing and improving automated procedures at various stages of lithography is necessary to increase the manufacturing yield and reduce the cost and turnaround time. Traditionally, this was a very laborious process; fortunately, the introduction of ML has afforded many opportunities for increasing the processing speed, particularly in mask synthesis and verification [247].

The need for machine learning in the lithography process is discussed in [248], which also highlights various algorithms, and their trade-offs, used for hotspot detection (HD), optical proximity correction (OPC), sub-resolution assist features (SRAF), phase-shift masks (PSM), and resist modelling, and proposes a Gaussian process to reduce the false-positive outcomes of ML algorithms. We discuss the research on ML for lithography in detail in the following sub-sections.

9.1.1. At Mask Synthesis

Optical lithography is the most widely used technique in IC manufacturing, where a geometric mask is projected onto a photoresist-coated semiconductor through a photon-based technique. Moore's law has driven features to ever smaller dimensions, and the technology has been scaled down to the limit of the light wavelength. Consequently, the printed patterns get distorted due to diffraction, resulting in process defects. Various resolution enhancement techniques (RETs) are employed to improve the performance of photolithography. OPC and SRAF insertion are the most used RETs to maximize the process window and ensure accurate patterns on the wafer. However, these enhancement techniques suffer from extremely long runtimes owing to their laborious iterative processes. Many state-of-the-art methods use machine learning to identify defective lithographic patterns.

LR was the first ML technique used in OPC. An LR model for predicting the optimum starting point for traditional iterative model-based OPC has been proposed [249]. Using discrete cosine transform coefficients from lowpass-filtered 2 × 2 µm layout patterns as inputs, and creating separate models for normal-edge, concave-corner, and convex-corner fragments, the authors achieved a 32% reduction in runtime. Deep learning came into use when Luo [250] proposed a three-layer MLP to generate the optimal mask pattern for OPC; using the steepest descent method to generate the training set, his model drastically reduced computation time.
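As a toy illustration of this regression view of OPC, the sketch below fits a linear model that maps cheap layout features to an edge-correction offset, in the spirit of learning a starting point for iterative OPC. The features, weights, and synthetic labels are hypothetical stand-ins (not the DCT-based features of [249]); only the mechanics of fitting and prediction are shown.

```python
import random

# Minimal sketch (not the method of [249]): fit a linear model that predicts
# an OPC edge-correction offset (nm) from two hypothetical layout features:
# local pattern density and spacing to a neighboring edge. Synthetic labels
# stand in for lithography-simulation results.
random.seed(0)

def make_sample():
    density = random.uniform(0.2, 0.8)     # hypothetical local pattern density
    spacing = random.uniform(20.0, 200.0)  # hypothetical edge spacing (nm)
    # Assumed ground truth: denser, tighter layouts need more correction.
    label = 5.0 * density - 0.01 * spacing + 2.0
    return [1.0, density, spacing], label  # leading 1.0 is the intercept term

train = [make_sample() for _ in range(200)]

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for a small linear system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Ordinary least squares via the normal equations (X^T X) w = X^T y.
X = [x for x, _ in train]
y = [t for _, t in train]
XtX = [[sum(xi[r] * xi[c] for xi in X) for c in range(3)] for r in range(3)]
Xty = [sum(xi[r] * yi for xi, yi in zip(X, y)) for r in range(3)]
w = solve(XtX, Xty)

x_new, t_new = make_sample()
pred = sum(wi * xi for wi, xi in zip(w, x_new))
print(round(pred - t_new, 6))  # noiseless linear data, so a near-zero residual
```

On noiseless synthetic data the recovered weights match the generating coefficients; a real flow would of course train on lithography-simulated corrections and richer features.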


Figure 10: Typical procedure of the ML-based mask synthesis flow

A hierarchical Bayes model (HBM) was proposed for OPC in [251], along with a new feature-extraction technique known as concentric circle area sampling (CCAS). HBM provides a flexible model that is not constrained by the linearity of the model parameters or the number of samples; the model utilizes Bayesian inference to learn the optimal parameters from the given data. All parameters are estimated using the Markov chain Monte Carlo method [252]. This approach has shown better results than other ML techniques, such as LR and SVMs. Most ML OPCs use local pattern densities or pixel values of rasterized layouts as parameters, which are typically huge in number; this leads to overfitting and, consequently, reduced accuracy. Choi et al. [253] proposed using the basis functions of the polar Fourier transform (PFT) as parameters of ML OPC. The PFT signals obtained from the layout are used as input parameters for an MLP whose numbers of layers and neurons are decided empirically. Experimental results show that this model achieves an 80% reduction in OPC time and a 35% reduction in error.

ML is also explored in inverse lithography technology (ILT) [254], a popular pixel-based OPC method. ILT treats OPC as an inverse imaging problem and follows a rigorous approach to determine the mask shapes that produce the desired on-wafer results. Jia and Lam [255] developed a stochastic gradient descent model for mask optimization that showed promising results in robust mask production. Luo et al. [256] proposed an SVM-based layout retargeting method for ILT for fast convergence. A solution to ILT was also achieved through a hybrid approach, combining physics-based feature maps [257] with image-space information as model inputs to a DCNN (deep CNN) [258].

SRAFs are small rectangular patterns on a mask that assist in printing target patterns; they are not printed even though they are on the mask. The process of SRAF generation is similar to OPC and is computationally expensive. Recently, ML was applied to SRAF generation. Xu et al. [259] demonstrated an SRAF generation technique with supervised-learning data for the first time. In their model, features are extracted using CCAS and compacted to reduce the training data size. Logistic regression and SVM models were employed for training and testing. Instead of using the models as binary classifiers, the authors use them to produce probability maps, and SRAFs are inserted at the grids with probability maxima. This model shows a drastic speedup in computation with less error. Shim et al. [260] used decision trees and logistic regression for SRAF generation, which showed a 10× improvement in runtime.

Etching and mask synthesis are performed simultaneously. Recently, ML has been used to predict the etch bias (over-etched or under-etched). ANNs [261, 262, 263] have been used to predict the etch proximity correction that compensates for the etch bias, yielding better accuracy than traditional methods.

Although these ML models achieve high accuracy, they require a large amount of data for training. In the field of lithography, where the technology shrinks very rapidly and old data cannot be used for new models, data generation is a very laborious task. One solution to this problem is transfer learning [264], which takes the data generated on old technology nodes, together with information about the evolution of the nodes (e.g., from 10 to 7 nm), and uses them for model training. The authors also employ active data selection, using clustering to exploit unlabeled data for training. ResNet is used along with these two techniques (active learning and transfer learning), yielding high accuracy with very few training samples.
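The probability-map idea behind ML-driven SRAF insertion described above can be sketched as follows. The single hand-set feature and the logistic weights below are illustrative assumptions, not the trained CCAS-based model of [259]; the sketch only shows scoring grid cells with a logistic model and inserting SRAFs at above-threshold local maxima.

```python
import math

# Minimal sketch of probability-map SRAF insertion: a (here hand-set)
# logistic model scores each grid cell, and SRAFs are placed only at cells
# that are local probability maxima above a threshold. Feature and weights
# are illustrative assumptions, not values from any published model.
W, H = 8, 8
target_col = 0  # assumed vertical target edge at column 0

def feature(x, y):
    # Hypothetical feature: SRAFs favored at a stand-off of ~2 cells
    # from the target edge (quadratic penalty away from that distance).
    d = abs(x - target_col)
    return -(d - 2.0) ** 2

w, b = 1.5, 0.5  # stand-in logistic weights

def prob(x, y):
    return 1.0 / (1.0 + math.exp(-(w * feature(x, y) + b)))

p = [[prob(x, y) for x in range(W)] for y in range(H)]

def is_local_max(x, y):
    # Strictly greater neighbors disqualify a cell; ties are allowed.
    here = p[y][x]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < W and 0 <= ny < H and p[ny][nx] > here:
                return False
    return True

srafs = [(x, y) for y in range(H) for x in range(W)
         if p[y][x] > 0.5 and is_local_max(x, y)]
print(sorted({x for x, _ in srafs}))  # -> [2]: one SRAF column at the stand-off
```

With this toy feature, the whole column two cells away from the target edge scores above threshold, so a single SRAF column is inserted there; a trained model would instead score real CCAS features per grid.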


Figure 11: Generative adversarial networks

Figure 12: Examples of lithography hotspot patterns

GANs [53] are one of the hottest prospects in deep learning. Figure 11 shows the general design of the primary optimization flow of a generative adversarial network. It contains two networks interacting with each other. The first, called the "generator," takes random vectors as input and generates samples as close to the true dataset distribution as possible. The second, called the "discriminator," attempts to distinguish the true dataset from the generated samples. At convergence, ideally, the generator is expected to generate samples with the same distribution as the true dataset. This technique has been exploited in lithography modeling. GANs were used for OPC, where intermediate ILT results initialize the generator; this improves the training process, allowing the network to produce an improved mask [265]. In [266], a CGAN was used for SRAF generation. A conditional GAN is an extension of the GAN in which the generator and discriminator are conditioned on some auxiliary information, such as class labels or data from other modalities. A new data-preparation technique, a novel multi-channel heatmap encoding/decoding scheme that maps layouts to images suitable for CGAN training while preserving the layout details, was also proposed there. This model achieves a 14× reduction in computation costs compared to state-of-the-art ML techniques. LithoGAN [267] is an end-to-end lithography modeling approach in which the mask pattern is directly mapped to the resist pattern. Here, a CGAN is used to predict the shape of the resist pattern, and a CNN is used to determine the center location of the resist pattern. This technique overcomes the laborious process of building and training a model for each stage, resulting in a reduction in computation time of approximately 190× compared to other ML techniques.

Different OPC engines work on different design patterns, each with its own advantages and disadvantages. Compared to model-based OPC, ILTs generally promise good mask printability owing to their relatively large solution space. However, this conclusion does not always hold, as ILTs need to solve a highly non-convex optimization problem that is occasionally challenging to converge. GANs yield good results but are difficult to train for some patterns. To overcome these challenges, Yang et al. [268] proposed a heterogeneous OPC flow in which a deterministic ML model decides the appropriate OPC engine for a given pattern, taking advantage of both ILT and model-based OPC with negligible overhead. They designed a classification model with a task-aware loss function to capture the design characteristics better and achieve their objectives. Yang et al. [269] also proposed an active-learning-based layout pattern sampling and HD flow for effective, optimized pattern selection. The experiments show that the proposed flow significantly reduces the lithography simulation overhead with satisfactory detection accuracy.

E-beam lithography is another prominent patterning method to electronically transfer layouts onto the wafer. Non-uniformities caused by parallel e-beam maskless lithography result in variations within the targets. Scatterometry measures the defects caused by simulated dose variations in patterned multi-beam maskless lithography. An ML-based scatterometry to quantify the critical dimension (the parameter measured for variation detection) and a sensitivity analysis for detecting beam defects are proposed in [270]. A fast in-line EUV resist characterization using scatterometry in conjunction with machine learning algorithms is presented in [271].

9.1.2. At Mask Verification

Due to complicated design rules and the various RETs such as OPC and SRAF, there may still be many lithographic hotspots that cause opens, shorts, and yield reductions (Fig. 12). Therefore, detecting and removing these hotspots is critical for achieving a high yield. Traditionally, pattern-matching techniques are widely used in HD: hotspot patterns are stored in a predefined library, and given a new testing pattern, a hotspot is detected if it can be matched to the existing patterns. This technique is accurate for already-known hotspot patterns but does not work well for new, unknown patterns. ML-based approaches show better accuracy for both seen and unseen patterns.

Early ML usage in lithography HD included classifiers such as simple NNs (including ANNs) [272, 273], which detect hotspots from given patterns. Clustering algorithms were also extensively used [274, 275], where a large dataset

of hotspots is divided into multiple classes using these algorithms, and pattern-matching techniques are used for the detection of new hotspots. As the detected hotspots in the same class share similar geometric shapes, it is expected that they can be fixed using a standard fixing solution. False alarms are a critical issue in HD for many ML methods, and researchers have been attempting to overcome this challenge. Ding et al. [276] successfully refined SVM and ANN classifiers to identify the hotspot patterns more accurately. Topological classification is another method, in which feedback learning is employed to reduce false alarms. Yu et al. [277] classified the already-known hotspot and non-hotspot patterns into clusters according to the topologies of their core regions. Subsequently, they extracted critical features and constructed an SVM kernel with multiple feedback learning from the mispredicted non-hotspots. Combining different ML techniques often yields better outcomes. Ding et al. [278] proposed a new algorithm that combines ML and pattern matching, and Matsunawa et al. [279] used the AdaBoost classifier; both approaches resulted in a significant reduction in false alarms and outperformed many other ML techniques. Semi-supervised learning [280] is also being used for detection; it leverages both labeled and unlabeled data, thereby reducing the dependence on labeled training data. It is advantageous because obtaining labeled hotspot regions is considerably more difficult. This method combines classification and clustering, creating a multitasking network that groups the unlabeled data with labeled data and then uses them for learning. It reduces the pre-processing time and the amount of labeled training data required.

CNNs are a widely used NN technique in image processing, classification, and similar tasks. HD is very similar to image classification, and CNNs [281, 282] have recently been used in this field, yielding better accuracy than other state-of-the-art ML approaches. Pooling layers are one of the building blocks of CNNs. These layers reduce the number of parameters and computation steps in the network by extracting a statistical summary of the local regions of the previous layer, thereby reducing the feature-map dimension and drastically lowering the sensitivity of the NN to small changes. However, in the HD process, these layers may ignore small edge displacements and turn them into non-hotspot regions. Yang et al. [283] proposed a pooling-free CNN architecture that overcomes this defect, yielding increased accuracy. Online learning is another ML method, in which the model is trained and updated with new data fed over time to build the predictor for future data. This method can adapt over time and works well with new models; thus, it can be used in HD. Although CNNs have the potential to perform well in HD, hotspot patterns are always minorities in VLSI mask design, as fewer such patterns are available for training; the resulting imbalanced training dataset yields a model with high false negatives.

Yang et al. [284] applied minority up-sampling and random mirror-flipping before training the network and achieved better performance than state-of-the-art hotspot detectors. In this pre-processing technique, the training dataset is first augmented with mirror-flipped and 180°-rotated versions of the original layout clips, followed by up-sampling; overfitting can be reduced through the random mirroring. Zhang et al. [285] built an online learning model with a novel critical feature extraction technique. They constructed an ensemble classifier using smooth boosting and a modified naive Bayes. Their technique outperformed state-of-the-art methods in terms of accuracy and false alarms. Subsequently, they extended this technique to online learning, which yielded even better performance. Ye [286] proposed Litho-GPA, in which a Gaussian process assurance (a confidence value) is provided along with each prediction. The framework also incorporated a set of weak classifiers and active data sampling for learning, reducing the amount of training data and computation required.

Park et al. [287] propose an SVM model trained with lithographic information that detects pinching and bridging hotspots during mask transfer to the wafer. They further incorporate domain knowledge of lithographic information in the SVM kernel to accomplish an accurate decision function that classifies hotspots into four categories: horizontal bridging (HB), vertical bridging (VB), horizontal pinching (HP), and vertical pinching (VP). A hybrid pattern-matching/SVM classifier for HD is presented in [288]. CNN-based HD is proposed in [289, 290]. The framework in [290] also has a transfer learning scheme to reduce the training-sample requirement for modeling HD at a more advanced node. A modified DNN for HD, replacing pooling layers with convolution layers, is proposed in [291]. They applied hotspot folding, rotating, and mirror-flipping to highly imbalanced datasets to maximize the number of training samples. Addressing the challenge of imbalanced datasets in HD, [292] propose a dataset sampling technique based on autoencoders. The autoencoders identify latent data features that can reconstruct the input patterns, which are then grouped using density-based spatial clustering of applications with noise (DBSCAN). These clustered patterns are sampled to reduce the training set size. An automatic layout generation tool that can synthesize different layout patterns given a set of design rules is proposed in [293]. The tool supports via and unidirectional metal layer generation. Its robustness in HD is tested using state-of-the-art ML models.

SONR (state of nature reduction) [294], a semi-supervised feature-vector-based ML tool for lithographic HD at different stages, with cross products based on known hotspots, is a fast and effective method to optimize the OPC verification flow and improve manufacturing yield. The proposed workflow is available as Mentor's Calibre SONR tool. [295] demonstrates an HD case study based on ADAPT, a framework for the fast migration of machine learning models across different IC technologies. It is an unsupervised Bayesian approach that significantly reduces model cost and provides customized learning with fewer-data techniques and labeling strategies. An ML-based color defect detection for after-develop inspections in lithography exhibited more sensitivity and specificity in a trial comparison against the reference method [296, 297].
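The mirror/rotate augmentation used above to counter hotspot class imbalance can be sketched in a few lines. The 3×3 binary grid below is a toy stand-in for a real layout clip; the transformations (mirror flip and 180-degree rotation) follow the spirit of [284] without reproducing its exact pipeline.

```python
# Minimal sketch of minority-class augmentation in the spirit of [284]:
# grow the hotspot class by adding mirror-flipped and 180-degree-rotated
# copies of each layout clip before up-sampling. Clips are toy 0/1 grids.

def mirror(clip):
    # Left-right flip of a 2D grid.
    return [row[::-1] for row in clip]

def rotate180(clip):
    # 180-degree rotation: reverse row order, then reverse each row.
    return [row[::-1] for row in clip[::-1]]

def augment(clips):
    out = []
    for c in clips:
        for v in (c, mirror(c), rotate180(c), mirror(rotate180(c))):
            if v not in out:  # symmetric clips produce duplicate orientations
                out.append(v)
    return out

# One asymmetric toy "hotspot" clip.
hotspots = [[[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]]

aug = augment(hotspots)
print(len(aug))  # -> 4 distinct orientations for this asymmetric clip
```

For an asymmetric clip, each hotspot yields four distinct training samples; up-sampling the augmented minority class then balances the dataset seen by the classifier.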

Automation in SEM (scanning electron microscope) image pre-processing using dimensionality reduction and feature detection dramatically reduces the computation time of lithography patterning [298]. A framework combining ML models to automatically mine lithographic hotspots from massive SEM images detects hard defects, such as bridging and necking, and soft defects, such as scumming, that are hard to detect by manual inspection [299]. However, the authors propose manual inspection on top of their framework for the final decision on the detected hotspots. The proposed solution could reduce the workload to a large extent compared with the traditional way. Recently, many researchers have been searching for efficient solutions beyond ML [300, 301]. A circuit-based hybrid quantum-classical machine learning approach using variational quantum layers for lithography HD from SEM images is proposed in [300]. The hybrid approach adds quantum circuits to the conventional CNN for enhanced performance. Quantum computing simulation has been performed with cuQuantum, an Nvidia software development kit with optimized libraries and tools for accelerating quantum computing workflows. A virtual metrology model using CNNs to predict the overlay errors of the photolithography process is presented in [301].

Layout patterns play an essential role as resources for the various DFM flows that we have already discussed. However, VLSI layout pattern libraries are not readily available due to the long and iterative technology life cycle, which can slow down technology node development. Significant effort has been devoted to enlarging existing libraries by exploiting existing patterns, including flipping, rotating, and using a random generator. These methods are coupled with complex manuals for guidance and hardly increase the layout diversity owing to their deterministic strategy. To address these problems, Zhang et al. [302] proposed a pattern generation and legalization framework comprising two learning-based modules for pattern topology generation and design rule legalization. In the generation stage, a variational convolutional autoencoder (VCAE) [303] is designed to efficiently generate realistic pattern topologies via Gaussian perturbation. In the legalization stage, a CGAN [304] model is used to transform the generated samples from blurry patterns to smooth ones, significantly reducing the DRC violation risks. Based on an adversarial autoencoder, a pattern style detection tool is designed to examine the pattern styles and filter out unrealistic generated patterns. A novel confidence-aware deep learning model for post-fabrication wafer map defect detection is proposed in [305]. The experimental results on industrial wafer datasets demonstrate superior accuracy compared to the traditional approach. The paper also discusses the scope of DL-based approaches for manufacturing and yield in the near future.

Evidently, ML is no longer a novelty in chip fabrication. Chip manufacturers will continue to leverage the technology as it matures. ML provides solutions to many problems in lithography. However, unlike areas such as image processing, where a large amount of data is available, it is difficult and expensive to obtain enough data in VLSI design for training robust and accurate models. Therefore, developing techniques for improving modeling accuracy with a relaxed demand for big data is critical to promote the widespread adoption of ML.

9.2. Reliability Analysis

Over the last few decades, shrinking CMOS geometries have increased manufacturing defect levels and on-chip fault rates. Increased fault rates have considerably impacted the performance and reliability of circuits. The multiplied fault rates have necessitated accurate and robust reliability analysis. Fundamental reliability analysis evaluates logic circuit errors due to hot-carrier injection, electro-migration, NBTI (negative-bias temperature instability), and electrostatic discharge. Reliability engineers focus on correcting the functionality and enhancing the circuit's lifetime.

Precise reliability analysis involves numerous mathematical equations. However, mathematical approaches fall apart due to the complexity involved in the reliability estimation of large circuits with millions of gates. Reliability engineers have worked with MC simulations, which are rigorous and time-consuming. Therefore, the evolution of ML has aided engineers in developing exhaustive and rapid reliability analysis algorithms. Patel et al. [306] and Krishnaswamy et al. [307] developed probabilistic transfer matrices, which perform simultaneous analyses over all the possible I/O combinations. The major limitation of the method is the large memory requirement for the matrices. Choudhury et al. [308] recommended algorithms for reliability analysis based on a single pass, observability, and a max-k gate. The methods are precise for PVT- and aging-related degradation. Beg et al. [309] presented an NN-based nanocircuit reliability estimation method as an alternative to traditional mathematical methods. This method is time-efficient for the analysis of circuits.

Circuit aging is one of the essential concerns in the nanometer regime in designing future reliable ICs. Several operating conditions, such as temperature, voltage bias, and process parameters, influence the performance degradation of an IC. NBTI is a significant phenomenon at present and future technology nodes and contributes significantly to the performance degradation of an IC due to aging. It shifts the threshold voltage over the lifetime, degrades the device drive current, and degrades the device performance. It is necessary to evaluate the impact of NBTI on the performance of a circuit under stress early in the design phase to incorporate appropriate design solutions. Many researchers have contributed to NBTI estimation early in the design phase through ML algorithms.

Karmi et al. [310] proposed an aging prognosis approach based on nonlinear regression models that map circuit operating conditions to critical path delay estimates. The approach also considered the effects of process variations. The experiments showed that the impact of IC aging on critical path delays could be accurately estimated through nonlinear regression models. Such modeling facilitates the

implementation of preventive actions before the circuit experiences aging-related malfunctions. A gate-level timing prediction under dynamic on-chip variations is proposed in [311]. The high-dimensional features added to the statistical timing analysis for modeling NBTI increase with increasing circuit complexity. The proposed learning-based approach efficiently captures these high-dimensional correlations and estimates the NBTI-induced delay degradation with a maximum absolute error of 4% across all the designs. SVR and random forest models were applied to the timing estimation. Analysis and estimation of the impact of NBTI-induced variations in multi-gate transistors in digital circuits are becoming highly challenging [312]. A quick and accurate estimation of the impact of process variation and device aging on the delay of any path within a circuit is possible through GNNs [313].

Electro-migration is another concern successfully addressed by various AI strategies. A new data-driven learning-based approach for fast 2D analysis of electric potential and electric fields based on DNNs is proposed in [314]. As an extension, Lamichhane et al. [315] proposed an image-generative learning framework for electrostatic analysis for VLSI dielectric aging estimation. It speeds up the analysis compared to the conventional numerical method, COMSOL. Compared to a similar CNN-based method, the proposed GAN-based approach gives a 1.54× speedup with similar accuracy.

Reliability analysis and failure prediction of 3D ICs have gained considerable attention over the past few years. A study on 3D X-ray tomographic images combined with AI deep learning based on a CNN for non-destructive analysis of solder interconnects demonstrates an accuracy of 89.9% in predicting the operational faults of solder joints in 3D ICs [316]. Adaptive lifetime prediction techniques (ADLPT) that minimize redundant prediction operations in 3D NAND flash memories by exploiting reliability variation are presented in [317].

Kundu et al. [318] discuss the reliability issues of different AI/ML hardware. The paper explores and analyzes the impact of DRAM faults on the performance of a DNN accelerator by implementing an MLP on the MNIST dataset. Further, they discussed the impact of circuit- and transistor-level hazards, such as PVT variations, runtime power-supply voltage noise and droop, circuit aging, and radiation-induced soft errors, on AI/ML accelerator performance. The accuracy impact on MAC units due to these hazards has been estimated. The paper also highlights the reliability issues of neuromorphic hardware and proposes RENEU, a reliability-oriented approach to map machine learning applications to it.

The ever-growing circuit complexity is also raising concerns about hardware security. ML can aid in detecting hardware attacks and could take the necessary countermeasures with suitable design [319]. Hardware assurance and verification of manufactured ICs are also important to identify hardware Trojans. Manual verification to identify such security threats is becoming challenging at the present scale of circuit design. Addressing these issues, [320] proposed CNN-based arithmetic circuit classification, taking as input an image generated from a circuit's conjunctive normal form description. However, the structural information of circuits is difficult to capture in the CNN framework. Resolving this, [321] proposes a GNN framework for ASIC circuit netlist recognition, which classifies circuits according to their structural similarity. Case studies on four adder designs exhibit 98.3% accuracy. Several environmental, performance-related, and process-related embedded instruments (EIs) are present in an SoC with a JTAG interface. The EI data is systematically collected over time and analyzed using PCA (principal component analysis) and a power-law-based degradation model to predict the remaining valid lifetime of an SoC [322]. Liakos et al. [323] proposed hardware trojan learning analysis (ATLAS), which identifies hardware-Trojan-infected circuits using a gradient boosting model on data from the gate-level netlist phase. The feature extraction was based on the area and power analysis from the Synopsys Design Compiler NXT industrial tool. The ATLAS model was trained and tested on all circuits available in the Trust-HUB benchmark suite. The experimental results show that its classification performance is better than that of existing models. In [324], GNNs are proposed for reverse engineering of gate-level netlists without manual intervention or post-processing. The experimental results on the EPFL benchmarks [325], the ISCAS-85 benchmarks, and the 74X-series benchmarks show an average accuracy of 98.82% in mapping individual gates to modules.

Numerical methods and MC simulations, which reliability engineers widely use, have memory and timing-constraint bottlenecks. NNs and Bayesian statistical models are exhaustive and consume less memory. Recently, hyperdimensional computing, an emerging alternative to ML, has been proposed to address circuit reliability issues [326]. Experiments estimating transistor electrical characteristics and manufacturing variability on industrial 14 nm FinFET Intel instruments demonstrate 4× smaller error with 20× fewer training samples. Thus, ML and more advanced models will play a significant role in reliability estimation in the future.

9.3. Yield Estimation and Management

Many complex and interrelated components of the manufacturing process affect the yield of an IC. Yield learning and optimization are critical for advanced IC design and manufacturing. A yield prediction model is necessary to precisely evaluate the productivity of new wafer maps, because the yield is directly related to productivity and the design of the wafer map affects the yield [327]. Many statistical approaches [328, 329] for yield modeling and optimization have been proposed since the 1980s; however, with the uncertainty in nanoscale fabrication and the growing complexity of the process, large volumes of data are being generated daily, and traditional approaches have limits in extracting the full benefits of the data. Even the most

complex, sophisticated process results in poor exploitation of data. AI/ML could aid in continuous quality improvement in a large and complex process.

Cycle time (CT) is one of the critical performance measures of a semiconductor production line. Understanding the key factors influencing CT is essential for its effective reduction and for yield enhancement. A data-driven approach that predicts CT and identifies the key factors influencing it is proposed in [330, 331]. Data mining and ML can be used to analyze the information and knowledge extracted from different stages of manufacturing for troubleshooting and defect diagnosis, which decreases the turnaround time. Learning approaches for yield enhancement are proposed in [332]. A backend final-test yield prediction at the wafer fabrication stage, using a Gaussian Mixture Model (GMM) clustering approach through a weighted ensembled regressor, is proposed in [333]. Yield prediction at an early stage helps in cost reduction and quality control. However, GMMs have some limitations: they are sensitive to the initial parameter guesses and prone to getting stuck in local minima. As an extension, [334] proposes a final-test yield optimization approach through inverse design of the wafer acceptance test parameters.

Classification aids in minimizing wafer yield loss and improving package yield by thoroughly analyzing data across fab measurements, wafer tests, and package tests [335]. In [336], an ROI-based (return on investment) wafer productivity model using DNNs as a yield prediction technique and differential evolution for optimization is proposed. The DNNs are trained using geometric features of dies. A DNN approach exploits spatial relationships among the positions of dies on a wafer and die-level yield variations collected from wafer tests to predict the yield for pre-evaluating the productivity of new wafers [327].

ML is gradually being utilized in yield prediction and optimization and is still in its early stages. There is scope for significant growth in using various ML techniques for yield enhancement.

10. AI at Testing

VLSI testing is the process of detecting possible faults in an IC after chip fabrication. It is the most critical step in the VLSI design flow. The earlier a defect is detected, the lower the final product cost. The Rule of 10 [337] states that the cost of fault detection increases by an order of 10 when moving from one stage to the next in the IC design flow. Improving the yield is a necessity for any company; shipping defective parts can destroy a company's reputation [338]. Almost 70% of the design development time and resources are spent on VLSI testing. Different stages of the design flow involve different testing procedures. Broadly, the levels of testing are functional verification testing, acceptance testing, manufacturing testing, wafer-level testing, package-level testing, and so on [339]. We highlight the significant areas of testing that have AI/ML contributions.

10.1. Functional Verification

Functional verification is verifying that a design conforms to its specifications. A set of input vectors is provided to the CUT (circuit under test), and its output is compared to the golden output of the specification to check for faults. Functional verification [340] is very difficult because of the sheer volume of possible test cases, even in a simple design. Manufacturers generally employ a random test pattern generator [341], which provides significant fault coverage; however, it may only cover some of the faults and has a very long runtime. ML is being used to predict the best test set to achieve the maximum fault coverage with a minimum number of test vectors. In [342], the nearest neighbor algorithm was used to generate efficient patterns for BIST (built-in self-test) test pattern generation, improving the fault coverage. This algorithm detects random-pattern-resistant faults and produces test patterns directed toward them. Bayesian networks were used in [343, 344] to predict the test pattern set. A Bayesian network is a graphical representation of the joint probability distribution for a set of variables. The Bayesian network model describes the relations between the test directives and the coverage space and is used to achieve the required test patterns for a given coverage area. It was further enhanced by clustering the coverage events and working on them as a group. Hughes et al. [345] proposed an ML approach for functional verification, where a NN model is used with RL to track the coverage results of a simulation and then generate a set of verification input data recommendations that increase the probability of hitting functional coverage statements and identifying hard-to-hit faults, while adjusting itself.

The initial state of the CUT greatly impacts the time taken and the ability of stimuli generators to generate the requested stimuli successfully. Some initial states can lead to poor fault coverage, resulting in faulty products. Bayesian networks are employed to automatically and approximately identify the region of favorable initial states; however, they require a certain level of human guidance to select one of the initial states [346]. Identification of power-risky test patterns is also essential, as excessive test power can lead to failure due to IR drop and noise. However, simulating all the patterns is impossible due to the long runtime. Thus, pre-selection and creation of a subset of patterns are crucial. Dhotre et al. [347] proposed a transient power activity metric to identify potentially power-risky patterns. The method uses the layout and power information to rank the patterns approximately according to their power dissipation and subsequently uses K-means clustering to group all the instances with concentrated high switching activity.

The application of ML can be extended to delay test measurements as well. Wang et al. [348, 349] proposed models for F_max prediction based on the results of structural delay test measurements to determine the optimum conditions for improving the correlation between the golden reference and a potential low-cost alternative for measuring
the performance variability of the chip design. The performance and robustness of the proposed methodology, with a new dataset pruning method called "conformity check," are demonstrated on a high-performance microprocessor design using KNN, least-squares fit, ridge regression, SVR, and Gaussian process regression (GPR) models. GPR has proven effective in achieving accurate functional and system F_max prediction.

In [350], an explainable ML approach called Sentences in Feature Subsets (SiFS) for test point insertion (TPI) is proposed. The proposed ML methodology can also apply to human-readable classification in EDA. An ANN-guided ATPG (automatic test pattern generation) [351, 352] proposed in the recent past reduces the backtracks for PODEM and improves the backtraces, particularly in reconvergent fault-free circuits, with reduced CPU time. The training parameters include input-output distances and testability values from COP (controllability and observability program) for signal nodes. Unifying ANN for ATPG incurs a one-time cost, after which the ML imparted to ATPG can have long-term benefits. Design2Vec, a deep architecture that learns representations of RTL syntax and semantic abstractions of hardware designs using a GNN, is proposed in [353]. These representations are applied to several tasks within verification, such as test point coverage and new test generation to cover desired points. A pattern identification and reordering method is presented in [354]. An ML algorithm was used to select the most effective test patterns, and then an optimal pattern sequence was determined using the weighted SVMRANK (SVM Rank classification) algorithm. Experiments show a time saving of 3.89 times at the expense of 2% prediction accuracy. In [355], a KNN method is proposed to divide the test patterns into valid and invalid patterns and then use only the valid patterns to reduce the test time. Experiments show that, compared to the traditional method, this methodology reduces the test time by 1.75 times. Chen et al. proposed an RL-based test program generation technique for transition delay fault (TDF) detection [356].

Even though ML has shown significant progress and promise for functional verification, more is needed to perfect accuracy and reduce human intervention. Intelligent data collection procedures and a novel feature extraction scheme with MI should be inducted into CAD tools as initial steps for IC testing to become fully automated.

10.2. Fault Diagnosis

After functional verification, the following test procedure identifies the fault location and type; this is called fault diagnosis. Traditionally, this step is not fully automated, and the engineer's experience and intuition play a part in developing the test strategies. At present, digital circuits and systems are almost fully automated and have been extensively explored. In contrast, analog circuits are more difficult to diagnose; over the last few years, extensive research on analog fault diagnosis has been conducted, and many ML models have been reported. Most of these models focus on obtaining the output response of a circuit for the test pattern; different pre-processing techniques are applied before applying this data as input to the ML model, which attempts to classify the fault. In [357], the Fourier harmonic components of the CUT response are simulated from a sinusoidal input signal and supplied to a two-layer MLP, which attempts to identify the fault. Additionally, a selection criterion for determining the best components that describe the circuit behavior under fault-free (nominal) and faulty conditions is used, and these components are provided as input to a NN [358]. The NN, along with clustering, classifies the faults into a fault dictionary.

In [359], a wavelet transform along with PCA was used as a pre-processing technique to extract the optimal number of features from the circuit node voltages. A two-layer NN was trained on these features to predict the probability of the input features belonging to different fault classes. This model yields 95%-98% accuracy on nonlinear circuits. It was improved in [360] by dividing the circuit successively into modules. At each stage of module subdivision, a NN is trained to determine the sub-module that inherits the fault of interest from the parent module. This led to an increase in the training efficiency of the NN, resulting in 100% classification accuracy. A novel anomaly detection technique [361] for post-silicon bug diagnosis was proposed to detect aberrant behaviors or anomalies and identify a bug's time and location. The algorithm comprises two stages. In the first stage, it collects data online by running multiple executions of the same test on the post-silicon platform, after which it records compact measurements of the signal activity. In the second stage, it analyzes the data offline. The authors measured the amount of time a signal's value was one during each time step and applied ML to the measurements to locate the bug. The fundamental goal of testing is to determine the defects' root causes and eliminate them.

Automatic defect classification, which has existed for several years, has been revolutionized by ML in terms of speed and accuracy, although ML-based defect analysis still does not meet industrial standards. Much focus is needed on automated defect analysis to locate the root cause of a defect.

10.3. Scan Chain Diagnosis

Scan chain structures are widely used in VLSI circuits under design for testing. They increase the fault coverage and diagnosability by enhancing the controllability and observability of the digital circuit logic [362]. Figure 13 shows the design of a preliminary scan chain. During normal circuit operation, these structures function like regular flip-flops, and during testing, they shift and capture data at intermediate nodes, aiding the identification of the fault location. However, the circuit cannot be tested if a fault occurs in the scan chain itself. Therefore, scan chain diagnosis is crucial. Traditionally, many special-tester-based and hardware-based diagnostic techniques were used. Although they provide high accuracy, they are computationally expensive and time-consuming. Recently, software-based diagnostic methods have attracted significant attention; however, these methods
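The fault-dictionary style of classification described in Section 10.2 can be sketched minimally as a nearest-signature lookup: each fault class is represented by a feature vector extracted from the CUT's response, and a measured response is labeled by its closest dictionary entry. This is an illustrative toy, not the method of [357]-[360]; the fault names and all signature values below are synthetic.

```python
import math

# Hypothetical fault dictionary: class name -> response features
# (e.g., harmonic magnitudes). All numbers are made up for illustration.
fault_dictionary = {
    "nominal":  [1.00, 0.10, 0.02],
    "R1_open":  [0.55, 0.40, 0.08],
    "C2_short": [0.20, 0.75, 0.30],
}

def classify(features):
    """Label a measured response by its nearest dictionary signature."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(fault_dictionary, key=lambda f: dist(features, fault_dictionary[f]))

measured = [0.52, 0.43, 0.05]   # synthetic measured response features
print(classify(measured))        # -> R1_open
```

In practice, the published approaches replace this nearest-centroid rule with a trained NN and derive the features via Fourier or wavelet pre-processing, but the dictionary-lookup structure is the same.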

Figure 13: A Typical Scan Chain Design
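The chain structure in Figure 13 suggests the intuition behind clustering-based scan chain diagnosis such as [363]: a defect in one scan cell typically splits the chain into a "mostly failing" and a "mostly passing" segment, so the suspect cell lies at the sharpest change point of the per-cell failing probabilities. A minimal sketch of that intuition (with synthetic probabilities, not the Bayesian model of [363]):

```python
# Toy change-point locator for scan chain diagnosis. The per-cell
# failing probabilities below are synthetic, for illustration only.

def suspect_boundary(fail_prob):
    """Return index i such that the defect is suspected between
    scan cell i and cell i+1 (largest jump in failing probability)."""
    diffs = [abs(fail_prob[i + 1] - fail_prob[i])
             for i in range(len(fail_prob) - 1)]
    return max(range(len(diffs)), key=lambda i: diffs[i])

# Failing probability per scan cell, observed over many test patterns.
probs = [0.96, 0.94, 0.97, 0.95, 0.12, 0.10, 0.08, 0.11]
print(suspect_boundary(probs))  # -> 3: defect suspected between cells 3 and 4
```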

do not provide satisfactory results. ML is widely used in scan chain diagnosis to achieve sufficient resolution and accuracy.

An unsupervised-learning model was proposed in [363], where a Bayesian model was employed for diagnosis. The failing probabilities of each scan cell were supplied as input to the model, which partitioned the scan cells into multiple clusters. The defective scan cell is then found at the boundaries of adjacent clusters. This model yielded 100% accuracy for both permanent and intermittent faults, although only for single stuck-at faults. ANNs have come into use recently, providing sufficient resolution and accuracy. For example, in [364], a coarse global neural network was used to select several suspected scan cells (an affine group) from all the scan-chain cells, and a refined local neural network was used to identify the final suspected scan cell in the affine group. This successively refined focus increased the resolution and accuracy but significantly increased the training time due to the multiple networks. A two-stage NN model was proposed in [365] to identify the exact location of a stuck-at fault and a transition fault. The first-stage ANN, trained with the entire scan data with all faults, predicts a scan window with successive candidates. The second-stage ANN analyzes the fail data locally to identify the exact fault location.

Liu et al. proposed RF classification to predict test chip design exploration synthesis outcomes [366]. In [367], a DT-based screening method is proposed to predict unreliable dies that would fail the HTOL (high-temperature operating life) test. The HTOL test is a popular test to determine a device's intrinsic reliability and predict its long-term failure rate and lifetime [368]. SVM- and autoencoder-based early-stage system-level testing (SLT) failure estimation reduces the testing cost by 40% with a minor impact on defective parts per million (DPPM) [369]. In addition, adaptive test methods that analyze the failing data and test logs, dynamically reorder the test patterns, and adjust the testing process bring down the testing cost by several orders [370, 371].

The state of the art in DL for IC testing (GCNs (Graph Convolutional Networks) and ANNs in particular) is discussed in [372]. The work systematically investigates the robustness of ML metrics and models in the context of IC testing and highlights the opportunities and challenges in adopting them. A novel physics-informed neural network (PINN) to model electrostatic problems for VLSI modeling applications achieves an error rate of 9.3% in electric potential estimation without labeled data and yields a 5.7% error with the assistance of a limited number of coarse labels [373]. The paper also highlights the implementation of ML models for data exploration for IC testing and reliability analysis. In a survey of ML applications in analog and digital IC testing, significant challenges and opportunities are presented [374].

We observe that deep NNs (GNNs in particular) and Bayesian networks are the most suitable approaches to act as alternatives to various laborious manual testing procedures.

11. Sources of Training Data for AI/ML-VLSI

The techniques of AI/ML would aid in solving many challenges in the IC industry. Nevertheless, the limited data availability for training the necessary algorithms is a known difficulty in the VLSI domain. Although there is a plethora of tools for designing, manufacturing, and testing VLSI circuits, a systematic way of capturing relevant and sufficient data for training AI/ML algorithms still needs to be developed. A structured methodology for automated data capture across different design levels needs to be incorporated into the IC design flow to resolve the challenge of data scarcity to a certain extent.

This section presents a brief overview of the sources of training data explored and implemented in the literature, for future research interest (Fig. 14). SIS is an interactive tool for synthesizing and optimizing sequential circuits that produces an optimized netlist in the target technology [375]. Benchmark circuits to analyze hardware security are available at Trust-HUB [376]. The research community utilized EDA tools from Cadence, Synopsys, and Mentor Graphics, while ISCAS and ISPD benchmarks were used by many to generate training datasets and for testing/model validation.

12. Challenges and Opportunities for AI/ML in VLSI

The dimensions of devices are decreasing; however, as we approach atomic dimensions, many aspects of their performance deteriorate, e.g., leakage increases (particularly in
Figure 14: Sources of training data in the literature
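The simulation sweeps obtainable from the tools and benchmarks catalogued in Figure 14 can be turned directly into training sets for the surrogate models discussed in this survey. A deliberately tiny sketch of the idea, fitting a closed-form least-squares surrogate that maps a process parameter to a circuit metric; the parameter names and all sample values are synthetic, not from any published model:

```python
# Toy surrogate model: fit y = a*x + b to a handful of "simulation"
# samples, then evaluate the surrogate instead of re-running the
# simulator. All numbers below are synthetic placeholders.

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical sweep: channel-length deviation (nm) vs. simulated delay (ps).
lengths = [-2.0, -1.0, 0.0, 1.0, 2.0]
delays  = [ 8.0,  9.1, 10.0, 10.9, 12.1]
a, b = fit_line(lengths, delays)
predict = lambda x: a * x + b     # cheap surrogate evaluation
print(round(predict(0.5), 2))
```

Real surrogates in the literature are, of course, nonlinear (NNs, GPR, ensembles) and multidimensional, but the workflow, sample the simulator, fit a cheap model, and query the model during design exploration, is the same.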

the sub-threshold region), gain decreases, and sensitivity to fluctuations in manufacturing processes increases drastically [7]. This results in lower reliability and yield. The growing process variability and environmental sources of variation in nanometer technology are leading to the deterioration of the overall circuit performance. Modeling these effects based on worst-case process corners is no longer valid, as most parameters vary statistically. Moreover, many parameters exhibit complex correlations and wide variances.

Presently, computationally efficient methods for estimating the outputs corresponding to given inputs are among the areas attracting significant interest in the field of circuit modeling for VLSI–CAD. To maximize chip reliability and yield, each design in VLSI is optimally tuned to consume and dissipate low power, occupy a minimum area, and achieve high throughput. Device models coupled with circuit simulation tools significantly improve design productivity, providing insights for improving the design choices and circuit performance [7]. Accurate and fast estimation techniques are required during circuit design and modeling to estimate and verify the effect of process variations on the circuit output; this can aid the incorporation of corrective measures/methodologies to improve the yield, thereby guaranteeing the design quality. The primary challenge under process variations is to identify the dominant parameters causing the variations, estimate the relationship between the dominant parameters and the circuit performance parameters, develop models for performance evaluation, and incorporate these models into design tools. This problem is more pronounced in the nanometer regime with the increased complexity of digital design. Traditional models estimating circuit performance comprise many parameters and complicated equations that significantly slow the simulation speed. At the current technology nodes, there is a need for compact device models with the essential capabilities of scalability and universality (i.e., the ability to support different technologies). Nevertheless, one can see many opportunities to address these challenges.

Surrogate ML/AI models provide solutions to these problems. These models forecast the device performance and can be easily extended to circuit-level and system-level design and analysis. Such models have been proposed in the literature to improve the turnaround time and yield of ICs [91, 93]. This learning methodology can also be applied to post-layout simulation and ECO, aiding the achievement of timing closure [120]. Surrogate AI/ML models offer simulation rates comparable to traditional EDA tools with reasonable accuracy. Potential risks in advanced silicon nodes can be estimated and analyzed with prior design data using ML algorithms. These algorithms can capture complex electrical behavior in advanced technology nodes better than traditional EDA tools. The best methodologies for incorporating computationally effective AI/ML models into VLSI–CAD design tools need to be explored.

Estimation and analysis of subsystem behavior are also crucial in IC technology. For instance, accurately estimating the subsystem power consumption of commercial smartphones is necessary for various applications in many research areas. In this regard, learning algorithms will improve the end-to-end performance, promoting a high utilization ratio and high data bandwidth [377]. Memory designs at nanometer technology nodes are becoming increasingly challenging because they are the smallest devices on the chip and are thus affected the most in terms of functionality and yield. Increasing inter-die and intra-die variabilities will exacerbate cell-stability concerns. The AI/ML approach also increases the statistical analysis rate of memory designs. The learning strategies of AI/ML have been extended to high-level SoC designs [378] in the past. Kong and Bohr [379, 380] survey the design challenges in the nanometer regime. A vast network of AI is employed in hardware acceleration to implement dynamic high-level digital circuits onto the hardware. High-speed VLSI hardware systems provide the necessary driving capability for AI/ML algorithms to achieve their maximum potential. Dense NNs used extensively in embedded systems, such as IoT sensors, cars, and cameras, need high classification speeds, which are possible with high-speed hardware accelerators [381, 382]. In-memory computing by IBM [383] demonstrates how the completion speed of ML tasks can be increased significantly with reduced power consumption. The realization of AI/ML learning algorithms in hardware reduces the learning time and increases the speed of the prediction process by many orders of magnitude. Interconnect datasets need to be built carefully with many SoC parameters, floorplanning and routing constraints, and clock characteristics, creating an ample design space for exploration. Current GPUs offer high acceleration rates with parallel computing facilities and superior performance for large design spaces.

Present ASIC design methodologies break down in light of the new economic and technological realities; new design methodologies are required for which the physical implementation of the design is more predictable. A database and interface spanning design to manufacturing must be provided to effectively manage the parameter variability and the increasing data volume [379]. A paradigm shift in CAD tool research is required to manage complex functional and physical variabilities. The AI/ML algorithms can be unfolded into the CAD-tool methodologies in physical design to manage the involved complexities. Data mining approaches, such as clustering and classification, can be imbibed into VLSI partitioning, paving a new route for recognizing hidden patterns in data and predicting the relationships between attributes that enable forecasting outcomes [384].

Similarly, learning strategies can be applied to find cost-effective solutions for placement and routing [23]. Reducing the design cost of an IC is the primary driving force for downscaling [385]. The continued shrinkage of logic devices has brought about new challenges in chip manufacturing. It is increasingly difficult to resolve fine patterns and place them accurately on the die, particularly at sizes below 20 nm. ML techniques can be utilized in the chip-manufacturing-process-optimized compact-patterning models in the lithography process, mask synthesis, and correction, and can be
extended to physical verification to validate the design's manufacturability. For automated recovery and repair and big-data debugging, the challenges in chip manufacturing need to be addressed. Post-silicon validation is also possible using ML algorithms with available training data from the pre-silicon stage. The cost of testing a VLSI chip/subsystem can be reduced using AI algorithms. For instance, finding an efficient solution for rearranging the test cases using AI heuristic search algorithms can reduce power dissipation during testing [386].

Having stated that there are many critical problems, such as high variability and deteriorated reliability, a wide variety of AI and ML approaches, including supervised/unsupervised/semi-supervised learning, NNs and MLP structures [387], [43], [59], and CNNs and deep learning [388], provide opportunities to solve the numerous problems and challenges in the field of VLSI design. There is a trade-off between selecting suitable algorithms and architectures given the available training data and other model constraints. Fitting these new techniques into the classical VLSI flow is another big challenge. Another issue is the availability of standardized, licensed ML algorithms with a thorough debugging facility. High-yielding implementations are achievable by critically channeling ML designers' domain knowledge with that of CAD designers.

The limited availability of training data can be maximally addressed if the data flow across the design cycle can be effectively captured and explored. The chip design industry should understand the importance of systematic generation and capture of data, the incorporation of distributed big data systems for chip workflows, and data-driven optimizations to accelerate the quality, cost, and time of results [389]. It could be beneficial to have benchmark datasets for AI/ML training for future research and automated IC design flow development. To address the dearth of training data, the critical challenge for employing AI/ML, standardized statistical training data for circuit modeling should be developed. Such open-source contributions in the VLSI community help to address the challenges more effectively. Further, researchers can use them as a baseline and progress rapidly [390].

Different abstraction levels in the design flow, ranging from circuit design to chip fabrication and testing, inherently comprise numerous models relating inputs to outputs. An enormous amount of data flows across the billions of devices or components integrated (or to be integrated) on the chip [23]. The complex I/O relationships between the components, processes, and various abstraction levels, and within each abstraction level, can be explored via AI/ML algorithms using the information accumulated during different kinds of simulations/analyses. Further, we need to analyze the data streams associated with file operations, which clustering algorithms can use to deliver high application performance. AI/ML solutions can be employed in VLSI–CAD for design-flow optimization.

Future advancements in differential programming and quantum ML approaches can lead to incredible breakthroughs in the EDA industry.

References

[1] J. Carballo, W. J. Chan, P. A. Gargini, A. B. Kahng, S. Nath, ITRS 2.0: Toward a re-framing of the semiconductor technology roadmap, in: 2014 IEEE 32nd International Conference on Computer Design (ICCD), 2014, pp. 139–146. doi:10.1109/ICCD.2014.6974673.
[2] G. E. Moore, Cramming more components onto integrated circuits, reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff., IEEE Solid-State Circuits Society Newsletter 11 (2006) 33–35.
[3] H.-S. P. Wong, D. J. Frank, P. M. Solomon, C. H. J. Wann, J. J. Welser, Nanoscale CMOS, Proceedings of the IEEE 87 (1999) 537–570.
[4] R. Vaddi, S. Dasgupta, R. P. Agarwal, Device and circuit design challenges in the digital subthreshold region for ultralow-power applications, VLSI Des. 2009 (2009).
[5] D. Sylvester, H. Kaul, Power-driven challenges in nanometer design, IEEE Des. Test 18 (2001) 12–22.
[6] H. Iwai, Logic LSI technology roadmap for 22 nm and beyond, in: 2009 16th IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits, 2009, pp. 7–10. doi:10.1109/IPFA.2009.5232710.
[7] B. H. Calhoun, Y. Cao, X. Li, K. Mai, L. T. Pileggi, R. A. Rutenbar, K. L. Shepard, Digital circuit design challenges and opportunities in the era of nanoscale CMOS, Proceedings of the IEEE 96 (2008) 343–365.
[8] M. H. Abu-Rahma, M. Anis, Variability in VLSI circuits: Sources and design considerations, in: 2007 IEEE International Symposium on Circuits and Systems, 2007, pp. 3215–3218. doi:10.1109/ISCAS.2007.378156.
[9] S. Chaudhuri, N. K. Jha, FinFET logic circuit optimization with different FinFET styles: Lower power possible at higher supply voltage, in: 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems, 2014, pp. 476–482. doi:10.1109/VLSID.2014.89.
[10] R. S. Rathore, A. K. Rana, R. Sharma, Threshold voltage variability induced by statistical parameters fluctuations in nanoscale bulk and SOI FinFETs, in: 2017 4th International Conference on Signal Processing, Computing and Control (ISPCC), 2017, pp. 377–380. doi:10.1109/ISPCC.2017.8269707.
[11] A. R. Brown, N. Daval, K. K. Bourdelle, B. Nguyen, A. Asenov, Comparative simulation analysis of process-induced variability in nanoscale SOI and bulk trigate FinFETs, IEEE Transactions on Electron Devices 60 (2013) 3611–3617.
[12] M. Belleville, O. Thomas, A. Valentian, F. Clermidy, Designing digital circuits with nano-scale devices: Challenges and opportunities, Solid-State Electronics 84 (2013) 38–45. Selected papers from the ESSDERC 2012 Conference.
[13] L. Wang, M. Luo, Machine learning applications and opportunities in IC design flow, in: 2019 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2019, pp. 1–3.
[14] C. K. C. Lee, Deep learning creativity in EDA, in: 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2020, pp. 1–1.
[15] R. S. Kirk, The impact of AI technology on VLSI design, in: Managing Requirements Knowledge, International Workshop on, IEEE Computer Society, 1985, pp. 125–125.
[16] G. Rabbat, VLSI and AI are getting closer, IEEE Circuits and Devices Magazine 4 (1988) 15–18.
[17] M. Z. A. Khan, H. Saleem, S. Afzal, Application of VLSI in artificial intelligence (2012).
[18] J. Delgado-Frias, W. Moore, VLSI for Artificial Intelligence, volume 68, 1989. doi:10.1007/978-1-4613-1619-0.
[19] L. Capodieci, Data analytics and machine learning for design-process-yield optimization in electronic design automation and IC semiconductor manufacturing, in: 2017 China Semiconductor Technology International Conference (CSTIC), 2017, pp. 1–3.
[20] A. B. Kahng, Machine learning applications in physical design: Recent results and directions, in: Proceedings of the 2018 International Symposium on Physical Design, 2018, pp. 68–73.
[21] P. A. Beerel, M. Pedram, Opportunities for machine learning in electronic design automation, in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1–5.
[22] H.-G. Stratigopoulos, Machine learning applications in ic testing, in: 2018 IEEE 23rd European Test Symposium (ETS), IEEE, 2018, pp. 1–10.
[23] I. A. Elfadel, D. Boning, X. Li, Machine Learning in VLSI Computer-Aided Design, 2019. doi:10.1007/978-3-030-04666-8.
[24] B. Khailany, H. Ren, S. Dai, S. Godil, B. Keller, R. Kirby, A. Klinefelter, R. Venkatesan, Y. Zhang, B. Catanzaro, et al., Accelerating chip design with machine learning, IEEE Micro 40 (2020) 23–32.
[25] C. Schuermyer, Deploying new nodes faster with machine learning for ic design and manufacturing, in: 2019 International Symposium on VLSI Technology, Systems and Application (VLSI-TSA), 2019, pp. 1–3. doi:10.1109/VLSI-TSA.2019.8804650.
[26] M. Rapp, H. Amrouch, Y. Lin, B. Yu, D. Z. Pan, M. Wolf, J. Henkel, Mlcad: A survey of research in machine learning for cad keynote paper, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2021) 1–1.
[27] G. Huang, J. Hu, Y. He, J. Liu, M. Ma, Z. Shen, J. Wu, Y. Xu, H. Zhang, K. Zhong, X. Ning, Y. Ma, H. Yang, B. Yu, H. Yang, Y. Wang, Machine learning for electronic design automation: A survey, ACM Trans. Des. Autom. Electron. Syst. 26 (2021).
[28] D. S. Lopera, L. Servadei, G. N. Kiprit, S. Hazra, R. Wille, W. Ecker, A survey of graph neural networks for electronic design automation, in: 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), 2021, pp. 1–6. doi:10.1109/MLCAD52597.2021.9531070.
[29] Y. Ma, Z. He, W. Li, L. Zhang, B. Yu, Understanding Graphs in EDA: From Shallow to Deep Learning, Association for Computing Machinery, New York, NY, USA, 2020, p. 119–126. URL: https://doi.org/10.1145/3372780.3378173.
[30] V. Hamolia, V. Melnyk, A survey of machine learning methods and applications in electronic design automation, in: 2021 11th International Conference on Advanced Computer Information Technologies (ACIT), 2021, pp. 757–760. doi:10.1109/ACIT52158.2021.9548117.
[31] A. Malhotra, A. Singh, Implementation of ai in the field of vlsi: A review, in: 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T), 2022, pp. 1–5. doi:10.1109/ICPC2T53885.2022.9776845.
[32] M. Bansal, Priya, Machine learning perspective in vlsi computer-aided design at different abstraction levels, in: S. Shakya, R. Bestak, R. Palanisamy, K. A. Kamel (Eds.), Mobile Computing and Sustainable Informatics, Springer Singapore, Singapore, 2022, pp. 95–112.
[33] A. F. Budak, Z. Jiang, K. Zhu, A. Mirhoseini, A. Goldie, D. Z. Pan, Reinforcement learning for electronic design automation: Case studies and perspectives: (invited paper), in: 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022, pp. 500–505. doi:10.1109/ASP-DAC52403.2022.9712578.
[34] L.-T. Wang, C.-W. Wu, X. Wen, VLSI Test Principles and Architectures: Design for Testability, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
[35] N. Weste, D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed., Addison-Wesley Publishing Company, USA, 2010.
[36] S. M. Sze (Ed.), VLSI technology, McGraw-Hill series in electrical engineering: Electronics and electronic circuits, 2nd ed., McGraw-Hill, New York, 1988.
[37] S. Mitra, S. A. Seshia, N. Nicolici, Post-silicon validation opportunities, challenges and recent advances, in: Proceedings of the 47th Design Automation Conference, DAC '10, Association for Computing Machinery, New York, NY, USA, 2010, p. 12–17. URL: https://doi.org/10.1145/1837274.1837280. doi:10.1145/1837274.1837280.
[38] E. Alpaydin, Introduction to machine learning, MIT press, 2020.
[39] J. Han, M. Kamber, J. Pei, 8 - classification: Basic concepts, in: J. Han, M. Kamber, J. Pei (Eds.), Data Mining (Third Edition), The Morgan Kaufmann Series in Data Management Systems, third edition ed., Morgan Kaufmann, Boston, 2012, pp. 327–391. URL: https://www.sciencedirect.com/science/article/pii/B9780123814791000083. doi:https://doi.org/10.1016/B978-0-12-381479-1.00008-3.
[40] S. B. Kotsiantis, I. Zaharakis, P. Pintelas, Supervised machine learning: A review of classification techniques, Emerging artificial intelligence applications in computer engineering 160 (2007) 3–24.
[41] T. G. Dietterich, Machine-learning research, AI magazine 18 (1997) 97–97.
[42] T. G. Dietterich, Ensemble methods in machine learning, in: Multiple Classifier Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2000, pp. 1–15.
[43] T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning – data mining, inference, and prediction, 2008.
[44] R. Xu, D. Wunsch, Clustering, volume 10, John Wiley & Sons, 2008.
[45] Semi-Supervised Classification Using Pattern Clustering, John Wiley & Sons, Ltd, 2013, pp. 127–181. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118557693.ch4. doi:https://doi.org/10.1002/9781118557693.ch4.
[46] O. Chapelle, B. Schölkopf, A. Zien, Introduction to Semi-Supervised Learning, 2006, pp. 1–12.
[47] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, A Bradford Book, Cambridge, MA, USA, 2018.
[48] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org.
[49] Huang Yi, Sun Shiyu, Duan Xiusheng, Chen Zhigang, A study on deep neural networks framework, in: 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), 2016, pp. 1519–1522. doi:10.1109/IMCEC.2016.7867471.
[50] E. Nishani, B. Çiço, Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation, in: 2017 6th Mediterranean Conference on Embedded Computing (MECO), 2017, pp. 1–4. doi:10.1109/MECO.2017.7977207.
[51] B. Chandra, R. K. Sharma, On improving recurrent neural network for image classification, in: 2017 International Joint Conference on Neural Networks (IJCNN), 2017, pp. 1904–1907. doi:10.1109/IJCNN.2017.7966083.
[52] A. Sinha, M. Jenckel, S. S. Bukhari, A. Dengel, Unsupervised ocr model evaluation using gan, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1256–1261. doi:10.1109/ICDAR.2019.00-42.
[53] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
[54] A. Jeerige, D. Bein, A. Verma, Comparison of deep reinforcement learning approaches for intelligent game playing, in: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), 2019, pp. 0366–0371. doi:10.1109/CCWC.2019.8666545.
[55] A. Zjajo, Stochastic Process Variation in Deep-Submicron CMOS, Springer, 2016.
[56] S. Shukla, S. S. Gill, N. Kaur, H. Jatana, V. Nehru, Comparative simulation analysis of process parameter variations in 20 nm triangular finfet, Active and Passive Electronic Components 2017 (2017).
[57] Z. Abbas, M. Olivieri, Impact of technology scaling on leakage power in nano-scale bulk cmos digital standard cells, Microelectronics Journal 45 (2014) 179–195.
[58] M. Olivieri, A. Mastrandrea, Logic drivers: A propagation delay modeling paradigm for statistical simulation of standard cell designs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22 (2013) 1429–1440.
[59] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag, Berlin, Heidelberg, 2006.
[60] P. Cox, Ping Yang, S. S. Mahant-Shetti, P. Chatterjee, Statistical modeling for efficient parametric yield estimation of mos vlsi circuits, IEEE Transactions on Electron Devices 32 (1985) 471–478.


[61] A. R. Alvarez, B. L. Abdi, D. L. Young, H. D. Weed, J. Teplik, E. R. Herald, Application of statistical design and response surface methods to computer-aided vlsi device design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 7 (1988) 272–288.
[62] D. L. Young, J. Teplik, H. D. Weed, N. T. Tracht, A. R. Alvarez, Application of statistical design and response surface methods to computer-aided vlsi device design ii. desirability functions and taguchi methods, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 10 (1991) 103–115.
[63] R. H. Myers, D. C. Montgomery, G. G. Vining, C. M. Borror, S. M. Kowalski, Response surface methodology: A retrospective and literature survey, Journal of Quality Technology 36 (2004) 53–77.
[64] R. Myers, D. Montgomery, C. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley Series in Probability and Statistics, Wiley, 2016. URL: https://books.google.co.in/books?id=vOBbCwAAQBAJ.
[65] M. A. H. Khan, A. S. M. Z. Rahman, T. Muntasir, U. K. Acharjee, M. A. Layek, Multiple polynomial regression for modeling a mosfet in saturation to validate the early voltage, in: 2011 IEEE Symposium on Industrial Electronics and Applications, 2011, pp. 261–266.
[66] Y. S. Chauhan, S. Venugopalan, M. A. Karim, S. Khandelwal, N. Paydavosi, P. Thakur, A. M. Niknejad, C. C. Hu, Bsim — industry standard compact mosfet models, in: 2012 Proceedings of the ESSCIRC (ESSCIRC), 2012, pp. 30–33. doi:10.1109/ESSCIRC.2012.6341249.
[67] Z. Abbas, M. Olivieri, Optimal transistor sizing for maximum yield in variation-aware standard cell design, International Journal of Circuit Theory and Applications 44 (2016) 1400–1424.
[68] T.-L. Wu, S. B. Kutub, Machine learning-based statistical approach to analyze process dependencies on threshold voltage in recessed gate algan/gan mis-hemts, IEEE Transactions on Electron Devices 67 (2020) 5448–5453.
[69] G. Choe, P. V. Ravindran, A. Lu, J. Hur, M. Lederer, A. Reck, S. Lombardo, N. Afroze, J. Kacher, A. I. Khan, S. Yu, Machine learning assisted statistical variation analysis of ferroelectric transistors: From experimental metrology to predictive modeling, in: 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 336–337. doi:10.1109/VLSITechnologyandCir46769.2022.9830392.
[70] M.-Y. Kao, H. Kam, C. Hu, Deep-learning-assisted physics-driven mosfet current-voltage modeling, IEEE Electron Device Letters 43 (2022) 974–977.
[71] M. Choi, X. Xu, V. Moroz, Modeling performance and thermal induced reliability issues of a 3nm finfet logic chip operation in a fan-out and a flip-chip packages, in: 2019 18th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), 2019, pp. 107–112.
[72] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (2010) 1345–1359.
[73] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, Q. He, A comprehensive survey on transfer learning, Proceedings of the IEEE 109 (2021) 43–76.
[74] A. A. Mutlu, M. Rahman, Statistical methods for the estimation of process variation effects on circuit operation, IEEE Transactions on Electronics Packaging Manufacturing 28 (2005) 364–375.
[75] S. Basu, P. Thakore, R. Vemuri, Process variation tolerant standard cell library development using reduced dimension statistical modeling and optimization techniques, in: 8th International Symposium on Quality Electronic Design (ISQED'07), 2007, pp. 814–820.
[76] L. Brusamarello, G. Wirth, P. Roussel, M. Miranda, Fast and accurate statistical characterization of standard cell libraries, Microelectronics Reliability 51 (2011) 2341–2350.
[77] M. Miranda, P. Roussel, L. Brusamarello, G. Wirth, Statistical characterization of standard cells using design of experiments with response surface modeling, in: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), 2011, pp. 77–82.
[78] M. Miranda, P. Zuber, P. Dobrovolný, P. Roussel, Variability aware modeling for yield enhancement of sram and logic, in: 2011 Design, Automation Test in Europe, 2011, pp. 1–6.
[79] S. Chaudhuri, P. Mishra, N. K. Jha, Accurate leakage estimation for finfet standard cells using the response surface methodology, in: 2012 25th International Conference on VLSI Design, IEEE, 2012, pp. 238–244.
[80] L. Cao, Circuit power estimation using pattern recognition techniques, in: Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design, 2002, pp. 412–417.
[81] L. Yu, S. Saxena, C. Hess, I. A. M. Elfadel, D. Antoniadis, D. Boning, Statistical library characterization using belief propagation across multiple technology nodes, in: 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2015, pp. 1383–1388.
[82] L. Cheng, P. Gupta, L. He, Efficient additive statistical leakage estimation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28 (2009) 1777–1781.
[83] Mcnc designers' manual, 1993. URL: https://www.carolana.com/NC/NC_Manuals/NC_Manual_1993_1994.pdf.
[84] H. Chang, S. Sapatnekar, Full-chip analysis of leakage power under process variations, including spatial correlations, in: Proceedings. 42nd Design Automation Conference, 2005., 2005, pp. 523–528. doi:10.1109/DAC.2005.193865.
[85] A. Moshrefi, H. Aghababa, O. Shoaei, Statistical estimation of delay in nano-scale cmos circuits using burr distribution, Microelectron. J. 79 (2018) 30–37.
[86] T.-T. Liu, J. M. Rabaey, Statistical analysis and optimization of asynchronous digital circuits, in: 2012 IEEE 18th International Symposium on Asynchronous Circuits and Systems, IEEE, 2012, pp. 1–8.
[87] K. J. Kuhn, Considerations for ultimate cmos scaling, IEEE Transactions on Electron Devices 59 (2012) 1813–1828.
[88] K. J. Kuhn, Cmos transistor scaling past 32nm and implications on variation, in: 2010 IEEE/SEMI Advanced Semiconductor Manufacturing Conference (ASMC), 2010, pp. 241–246.
[89] A. Stillmaker, B. Baas, Scaling equations for the accurate prediction of cmos device performance from 180 nm to 7 nm, Integration 58 (2017) 74–81.
[90] Predictive technology model, 2012. URL: http://ptm.asu.edu/.
[91] D. Amuru, A. Zahra, Z. Abbas, Statistical variation aware leakage and total power estimation of 16 nm vlsi digital circuits based on regression models, in: A. Sengupta, S. Dasgupta, V. Singh, R. Sharma, S. Kumar Vishvakarma (Eds.), VLSI Design and Test, Springer Singapore, Singapore, 2019, pp. 565–578.
[92] A. Stillmaker, Z. Xiao, B. Baas, Toward more accurate scaling estimates of cmos circuits from 180 nm to 22 nm, 2012.
[93] S. Gourishetty, H. Mandadapu, A. Zahra, Z. Abbas, A highly accurate machine learning approach to modelling pvt variation aware leakage power in finfet digital circuits, in: 2019 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2019, pp. 61–64.
[94] D. Amuru, M. S. Ahmed, Z. Abbas, An efficient gradient boosting approach for pvt aware estimation of leakage power and propagation delay in cmos/finfet digital cells, in: 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1–5.
[95] M. D. Bhavesh, N. A. Anilkumar, M. I. Patel, R. Gajjar, D. Panchal, Power consumption prediction of digital circuits using machine learning, in: 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), 2022, pp. 1–6. doi:10.1109/AISP53593.2022.9760542.
[96] V. A. Chhabria, B. Keller, Y. Zhang, S. Vollala, S. Pratty, H. Ren, B. Khailany, Xt-praggma: Crosstalk pessimism reduction achieved with gpu gate-level simulations and machine learning, in: 2022 ACM/IEEE 4th Workshop on Machine Learning for CAD (MLCAD), 2022, pp. 63–69. doi:10.1109/MLCAD55463.2022.9900084.
[97] T. Chen, V.-K. Kim, M. Tegethoff, Ic yield estimation at early stages of the design cycle, Microelectronics journal 30 (1999) 725–732.


[98] R. R. Rao, A. Devgan, D. Blaauw, D. Sylvester, Parametric yield estimation considering leakage variability, in: Proceedings of the 41st Annual Design Automation Conference, DAC '04, Association for Computing Machinery, New York, NY, USA, 2004, p. 442–447. URL: https://doi.org/10.1145/996566.996693. doi:10.1145/996566.996693.
[99] L. Hou, L. Zheng, W. Wu, Neural network based vlsi power estimation, in: 2006 8th International Conference on Solid-State and Integrated Circuit Technology Proceedings, 2006, pp. 1919–1921.
[100] M. Stockman, M. Awad, R. Khanna, C. Le, H. David, E. Gorbatov, U. Hanebutte, A novel approach to memory power estimation using machine learning, in: 2010 International Conference on Energy Aware Computing, IEEE, 2010, pp. 1–3.
[101] V. Janakiraman, A. Bharadwaj, V. Visvanathan, Voltage and temperature aware statistical leakage analysis framework using artificial neural networks, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 29 (2010) 1056–1069.
[102] S. Narendra, V. De, S. Borkar, D. Antoniadis, A. Chandrakasan, Full-chip sub-threshold leakage power prediction model for sub-0.18 /spl mu/m cmos, in: Proceedings of the International Symposium on Low Power Electronics and Design, 2002, pp. 19–23.
[103] R. R. Rao, A. Devgan, D. Blaauw, D. Sylvester, Analytical yield prediction considering leakage/performance correlation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (2006) 1685–1695.
[104] H. Chang, S. S. Sapatnekar, Prediction of leakage power under process uncertainties, ACM Trans. Des. Autom. Electron. Syst. 12 (2007) 12–es.
[105] L. Garg, V. Sahula, Variability aware support vector machine based macromodels for statistical estimation of subthreshold leakage power, in: 2012 International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 2012, pp. 253–256.
[106] A. B. Kahng, M. Luo, S. Nath, Si for free: machine learning of interconnect coupling delay and transition effects, in: 2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP), 2015, pp. 1–8.
[107] V. Govindaraj, B. Arunadevi, Machine learning based power estimation for cmos vlsi circuits, Applied Artificial Intelligence 35 (2021) 1043–1055.
[108] K. Agarwal, A. Jain, D. Amuru, Z. Abbas, Fast and efficient resnn and genetic optimization for pvt aware performance enhancement in digital circuits, in: 2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2022, pp. 1–4. doi:10.1109/VLSI-DAT54769.2022.9768067.
[109] A. Rahimi, L. Benini, R. K. Gupta, Hierarchically focused guardbanding: An adaptive approach to mitigate pvt variations and aging, in: 2013 Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pp. 1695–1700.
[110] X. Jiao, A. Rahimi, B. Narayanaswamy, H. Fatemi, J. P. de Gyvez, R. K. Gupta, Supervised learning based model for predicting variability-induced timing errors, in: 2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS), 2015, pp. 1–4.
[111] A. Bogliolo, L. Benini, G. De Micheli, Regression-based rtl power modeling, ACM Trans. Des. Autom. Electron. Syst. 5 (2000) 337–372.
[112] J. H. Anderson, F. N. Najm, Power estimation techniques for fpgas, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 12 (2004) 1015–1027.
[113] S. Ahuja, D. A. Mathaikutty, G. Singh, J. Stetzer, S. K. Shukla, A. Dingankar, Power estimation methodology for a high-level synthesis framework, in: 2009 10th International Symposium on Quality Electronic Design, 2009, pp. 541–546. doi:10.1109/ISQED.2009.4810352.
[114] D. Sunwoo, G. Y. Wu, N. A. Patil, D. Chiou, Presto: An fpga-accelerated power estimation methodology for complex systems, in: 2010 International Conference on Field Programmable Logic and Applications, 2010, pp. 310–317. doi:10.1109/FPL.2010.69.
[115] Y. Zhou, H. Ren, Y. Zhang, B. Keller, B. Khailany, Z. Zhang, Primal: Power inference using machine learning, in: 2019 56th ACM/IEEE Design Automation Conference (DAC), 2019, pp. 1–6.
[116] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, M. Sun, Graph neural networks: A review of methods and applications, 2019. arXiv:1812.08434.
[117] Y. Zhang, H. Ren, B. Khailany, Grannite: Graph neural network inference for transferable power estimation, in: 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020, pp. 1–6. doi:10.1109/DAC18072.2020.9218643.
[118] E. Banijamali, A. Ghodsi, P. Poupart, Generative mixture of networks, in: 2017 International Joint Conference on Neural Networks (IJCNN), 2017, pp. 3753–3760. doi:10.1109/IJCNN.2017.7966329.
[119] M. Rezagholiradeh, M. A. Haidar, Reg-gan: Semi-supervised learning based on generative adversarial networks for regression, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 2806–2810.
[120] Y. Fang, H. Lin, M. Sui, C. Li, E. J. Fang, Machine-learning-based dynamic ir drop prediction for eco, in: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018, pp. 1–7. doi:10.1145/3240765.3240823.
[121] Z. Xie, H. Ren, B. Khailany, Y. Sheng, S. Santosh, J. Hu, Y. Chen, Powernet: Transferable dynamic ir drop estimation via maximum convolutional neural network, in: 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), 2020, pp. 13–18. doi:10.1109/ASP-DAC47756.2020.9045574.
[122] S. Lin, Y. Fang, Y. Li, Y. Liu, T. Yang, S. Lin, C. Li, E. J. Fang, Ir drop prediction of eco-revised circuits using machine learning, in: 2018 IEEE 36th VLSI Test Symposium (VTS), 2018, pp. 1–6. doi:10.1109/VTS.2018.8368657.
[123] Y. Yamato, T. Yoneda, K. Hatayama, M. Inoue, A fast and accurate per-cell dynamic ir-drop estimation method for at-speed scan test pattern validation, in: 2012 IEEE International Test Conference, 2012, pp. 1–8. doi:10.1109/TEST.2012.6401549.
[124] F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, M. B. Tahoori, On-chip voltage-droop prediction using support-vector machines, in: 2014 IEEE 32nd VLSI Test Symposium (VTS), 2014, pp. 1–6. doi:10.1109/VTS.2014.6818798.
[125] S. Kundu, M. Prasad, S. Nishad, S. Nachireddy, H. K, Mlir: Machine learning based ir drop prediction on eco revised design for faster convergence, in: 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), 2022, pp. 68–73. doi:10.1109/VLSID2022.2022.00025.
[126] S. Han, A. B. Kahng, S. Nath, A. S. Vydyanathan, A deep learning methodology to proliferate golden signoff timing, in: 2014 Design, Automation Test in Europe Conference Exhibition (DATE), 2014, pp. 1–6.
[127] C. Zhuo, B. Yu, D. Gao, Accelerating chip design with machine learning: From pre-silicon to post-silicon, in: 2017 30th IEEE International System-on-Chip Conference (SOCC), 2017, pp. 227–232. doi:10.1109/SOCC.2017.8226046.
[128] S. Dey, S. Nandi, G. Trivedi, Machine learning for vlsi cad: A case study in on-chip power grid design, in: 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2021, pp. 378–383. doi:10.1109/ISVLSI51109.2021.00075.
[129] H. Vaghasiya, A. Jain, J. N. Tripathi, A machine learning based metaheuristic technique for decoupling capacitor optimization, in: 2022 IEEE 26th Workshop on Signal and Power Integrity (SPI), 2022, pp. 1–4. doi:10.1109/SPI54345.2022.9874924.
[130] M.-Y. Su, W.-C. Lin, Y.-T. Kuo, C.-M. Li, E. J.-W. Fang, S. S.-Y. Hsueh, Chip performance prediction using machine learning techniques, in: 2021 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2021, pp. 1–4. doi:10.1109/VLSI-DAT52063.2021.9427338.
[131] S. Sadiqbatcha, J. Zhang, H. Amrouch, S. X.-D. Tan, Real-time full-chip thermal tracking: A post-silicon, machine learning perspective, IEEE Transactions on Computers 71 (2022) 1411–1424.


[132] J. Zhang, Z. Wang, N. Verma, In-memory computation of a machine-learning classifier in a standard 6t sram array, IEEE Journal of Solid-State Circuits 52 (2017) 915–924.
[133] M. Kang, Y. Kim, A. D. Patil, N. R. Shanbhag, Deep in-memory architectures for machine learning–accuracy versus efficiency trade-offs, IEEE Transactions on Circuits and Systems I: Regular Papers 67 (2020) 1627–1639.
[134] M. Kang, M.-S. Keel, N. R. Shanbhag, S. Eilert, K. Curewitz, An energy-efficient vlsi architecture for pattern recognition via deep embedding of computation in sram, in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 8326–8330. doi:10.1109/ICASSP.2014.6855225.
[135] S. K. Gonugondla, M. Kang, N. Shanbhag, A 42pj/decision 3.12tops/w robust in-memory machine learning classifier with on-chip training, in: 2018 IEEE International Solid-State Circuits Conference - (ISSCC), 2018, pp. 490–492.
[136] A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, E. Eleftheriou, Memory devices and applications for in-memory computing, Nature Nanotechnology 15 (2020) 529–544.
[137] Y. Wang, H. Tang, Y. Xie, X. Chen, S. Ma, Z. Sun, Q. Sun, L. Chen, H. Zhu, J. Wan, Z. Xu, D. W. Zhang, P. Zhou, W. Bao, An in-memory computing architecture based on two-dimensional semiconductors for multiply-accumulate operations, Nature Communications 12 (2021) 3347.
[138] Q. Wang, P. Li, Y. Kim, A parallel digital vlsi architecture for integrated support vector machine training and classification, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23 (2015) 1471–1484.
[139] K. Kang, T. Shibata, An on-chip-trainable gaussian-kernel analog support vector machine, IEEE Transactions on Circuits and Systems I: Regular Papers 57 (2010) 1513–1524.
[140] T. Kuan, J. Wang, J. Wang, P. Lin, G. Gu, Vlsi design of an svm learning core on sequential minimal optimization algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20 (2012) 673–683.
[141] M. Papadonikolakis, C. Bouganis, Novel cascade fpga accelerator for support vector machines classification, IEEE Transactions on Neural Networks and Learning Systems 23 (2012) 1040–1052.
[142] S. Gupta, M. Imani, H. Kaur, T. S. Rosing, Nnpim: A processing in-memory architecture for neural network acceleration, IEEE Transactions on Computers 68 (2019) 1325–1337.
[143] M. He, C. Song, I. Kim, C. Jeong, S. Kim, I. Park, M. Thottethodi, T. N. Vijaykumar, Newton: A dram-maker's accelerator-in-memory (aim) architecture for machine learning, in: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 372–385. doi:10.1109/MICRO50266.2020.00040.
[144] D. Chen, H. Jin, L. Zheng, Y. Huang, P. Yao, C. Gui, Q. Wang, H. Liu, H. He, X. Liao, R. Zheng, A general offloading approach for near-dram processing-in-memory architectures, in: 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022, pp. 246–257. doi:10.1109/IPDPS53621.2022.00032.
[145] F. Schuiki, M. Schaffner, F. K. Gürkaynak, L. Benini, A scalable near-memory architecture for training deep neural networks on large in-memory datasets, IEEE Transactions on Computers 68 (2019) 484–497.
[146] A. S. Cordeiro, S. R. d. Santos, F. B. Moreira, P. C. Santos, L. Carro, M. A. Z. Alves, Machine learning migration for efficient near-data processing, in: 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2021, pp. 212–219. doi:10.1109/PDP52278.2021.00041.
[147] V. Iskandar, M. A. Abd El Ghany, D. Goehringer, Near-data-processing architectures performance estimation and ranking using machine learning predictors, in: 2021 24th Euromicro Conference on Digital System Design (DSD), 2021, pp. 158–165. doi:10.1109/DSD53832.2021.00033.
[148] R. Kaplan, L. Yavits, R. Ginosar, Prins: Processing-in-storage acceleration of machine learning, IEEE Transactions on Nanotechnology 17 (2018) 889–896.
[149] S. Bavikadi, P. R. Sutradhar, K. N. Khasawneh, A. Ganguly, S. M. Pudukotai Dinakarrao, A review of in-memory computing architectures for machine learning applications, in: Proceedings of the 2020 on Great Lakes Symposium on VLSI, GLSVLSI '20, Association for Computing Machinery, New York, NY, USA, 2020, p. 89–94. URL: https://doi.org/10.1145/3386263.3407649. doi:10.1145/3386263.3407649.
[150] A. Biswas, H. Sanghvi, M. Mehendale, G. Preet, An area-efficient 6t-sram based compute-in-memory architecture with reconfigurable sar adcs for energy-efficient deep neural networks in edge ml applications, in: 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 1–2. doi:10.1109/CICC53496.2022.9772789.
[151] L. Chang, C. Li, Z. Zhang, J. Xiao, Q. Liu, Z. Zhu, W. Li, Z. Zhu, S. Yang, J. Zhou, Energy-efficient computing-in-memory architecture for ai processor: device, circuit, architecture perspective, Science China Information Sciences 64 (2021) 160403.
[152] W. Wan, R. Kubendran, S. B. Eryilmaz, W. Zhang, Y. Liao, D. Wu, S. Deiss, B. Gao, P. Raina, S. Joshi, H. Wu, G. Cauwenberghs, H.-S. P. Wong, 33.1 a 74 tmacs/w cmos-rram neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models, in: 2020 IEEE International Solid-State Circuits Conference - (ISSCC), 2020, pp. 498–500. doi:10.1109/ISSCC19947.2020.9062979.
[153] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, Y. Xie, Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory, in: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016, pp. 27–39. doi:10.1109/ISCA.2016.13.
[154] C. Lammie, W. Xiang, M. Rahimi Azghadi, Modeling and simulating in-memory memristive deep learning systems: An overview of current efforts, Array 13 (2022) 100116.
[155] Ming Cheng, Lixue Xia, Zhenhua Zhu, Yi Cai, Yuan Xie, Yu Wang, Huazhong Yang, Time: A training-in-memory architecture for memristor-based deep neural networks, in: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), 2017, pp. 1–6.
[156] S. Dave, R. Baghdadi, T. Nowatzki, S. Avancha, A. Shrivastava, B. Li, Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights, Proceedings of the IEEE 109 (2021) 1706–1752.
[157] W. Olin-Ammentorp, Y. Sokolov, M. Bazhenov, A dual-memory architecture for reinforcement learning on neuromorphic platforms, Neuromorphic Computing and Engineering 1 (2021) 024003.
[158] S. Hoffmann-Eifert, Nanoscale hfo2-based memristive devices for neuromorphic computing, in: 2022 Device Research Conference (DRC), 2022, pp. 1–2. doi:10.1109/DRC55272.2022.9855810.
[159] T. Tang, S. Li, L. Nai, N. Jouppi, Y. Xie, Neurometer: An integrated power, area, and timing modeling framework for machine learning accelerators industry track paper, in: 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 841–853. doi:10.1109/HPCA51647.2021.00075.
[160] X. Wei, C. H. Yu, P. Zhang, Y. Chen, Y. Wang, H. Hu, Y. Liang, J. Cong, Automated systolic array architecture synthesis for high throughput cnn inference on fpgas, in: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), 2017, pp. 1–6. doi:10.1145/3061639.3062207.
[161] H. Ahmad, M. Tanvir, M. A. Hanif, M. U. Javed, R. Hafiz, M. Shafique, Systimator: A design space exploration methodology for systolic array based cnns acceleration on the fpga-based edge nodes, 2019. arXiv:1901.04986.
[162] H. Kung, B. McDanel, S. Q. Zhang, Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization, in: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '19, Association for Computing Machinery, New York, NY, USA, 2019, p. 821–834. URL: https://doi.org/10.1145/3297858.3304028. doi:10.1145/3297858.3304028.


[163] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, 2016. arXiv:1510.00149.
[164] P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, Pruning convolutional neural networks for resource efficient inference, 2017. arXiv:1611.06440.
[165] B. Asgari, R. Hadidi, H. Kim, S. Yalamanchili, Eridanus: Efficiently running inference of dnns using systolic arrays, IEEE Micro 39 (2019) 46–54.
[166] C. Jiang, D. Ojika, B. Patel, H. Lam, Optimized fpga-based deep learning accelerator for sparse cnn using high bandwidth memory, in: 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021, pp. 157–164. doi:10.1109/FCCM51124.2021.00026.
[167] T. Senoo, A. Jinguji, R. Kuramochi, H. Nakahara, A multilayer perceptron training accelerator using systolic array, in: 2021 IEEE Asia Pacific Conference on Circuit and Systems (APCCAS), 2021, pp. 77–80. doi:10.1109/APCCAS51387.2021.9687773.
[168] N.-C. Huang, W.-K. Tseng, H.-J. Chou, K.-C. Wu, An energy-efficient approximate systolic array based on timing error prediction and prevention, in: 2021 IEEE 39th VLSI Test Symposium (VTS), 2021, pp. 1–7. doi:10.1109/VTS50974.2021.9441004.
[169] Y. Parmar, K. Sridharan, A resource-efficient multiplierless systolic array architecture for convolutions in deep networks, IEEE Transac-
[178] D.-H. Wang, P.-J. Lin, H.-T. Yang, C.-A. Hsu, S.-H. Huang, M. P.-H. Lin, A novel machine-learning based soc performance monitoring methodology under wide-range pvt variations with unknown critical paths, in: 2021 58th ACM/IEEE Design Automation Conference (DAC), 2021, pp. 1370–1371. doi:10.1109/DAC18074.2021.9586155.
[179] T.-W. Chen, C.-S. Tang, S.-F. Tsai, C.-H. Tsai, S.-Y. Chien, L.-G. Chen, Tera-scale performance machine learning soc (mlsoc) with dual stream processor architecture for multimedia content analysis, IEEE Journal of Solid-State Circuits 45 (2010) 2321–2329.
[180] P. Jokic, E. Azarkhish, R. Cattenoz, E. Türetken, L. Benini, S. Emery, A sub-mw dual-engine ml inference system-on-chip for complete end-to-end face-analysis at the edge, in: 2021 Symposium on VLSI Circuits, 2021, pp. 1–2. doi:10.23919/VLSICircuits52068.2021.9492401.
[181] C.-W. Hung, C.-H. Lee, C.-C. Kuo, S.-X. Zeng, Soc-based early failure detection system using deep learning for tool wear, IEEE Access 10 (2022) 70491–70501.
[182] A. Safaei, Q. M. J. Wu, Y. Yang, T. Akılan, System-on-a-chip (soc)-based hardware acceleration for extreme learning machine, in: 2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2017, pp. 470–473. doi:10.1109/ICECS.2017.8292050.
[183] Z. He, C. Shi, T. Wang, Y. Wang, M. Tian, X. Zhou, P. Li, L. Liu, N. Wu, G. Luo, A low-cost fpga implementation of spiking extreme
tions on Circuits and Systems II: Express Briefs 67 (2020) 370–374. learning machine with on-chip reward-modulated stdp learning,
[170] I. Ullah, K. Inayat, J.-S. Yang, J. Chung, Factored radix-8 systolic IEEE Transactions on Circuits and Systems II: Express Briefs 69
array for tensor processing, in: 2020 57th ACM/IEEE Design Au- (2022) 1657–1661.
tomation Conference (DAC), 2020, pp. 1–6. doi:10.1109/DAC18072. [184] L. Bai, L. Chen, Machine-learning-based early-stage timing predic-
2020.9218585. tion in soc physical design, in: 2018 14th IEEE International Con-
[171] C. Peltekis, D. Filippas, C. Nicopoulos, G. Dimitrakopoulos, ference on Solid-State and Integrated Circuit Technology (ICSICT),
Fusedgcn: A systolic three-matrix multiplication architecture for 2018, pp. 1–3. doi:10.1109/ICSICT.2018.8565778.
graph convolutional networks, in: 2022 IEEE 33rd International [185] V. Gotra, S. K. R. Reddy, Simultaneous multi voltage aware timing
Conference on Application-specific Systems, Architectures and analysis methodology for soc using machine learning, in: 2020 IEEE
Processors (ASAP), 2022, pp. 93–97. doi:10.1109/ASAP54787.2022. 33rd International System-on-Chip Conference (SOCC), 2020, pp.
00024. 254–257. doi:10.1109/SOCC49529.2020.9524780.
[172] K. Inayat, J. Chung, Hybrid accumulator factored systolic array for [186] M. M. Ziegler, J. Kwon, H.-Y. Liu, L. P. Carloni, Online and
machine learning acceleration, IEEE Transactions on Very Large offline machine learning for industrial design flow tuning: (invited
Scale Integration (VLSI) Systems 30 (2022) 881–892. - iccad special session paper), in: 2021 IEEE/ACM International
[173] S. Kundu, S. Banerjee, A. Raha, S. Natarajan, K. Basu, Toward Conference On Computer Aided Design (ICCAD), 2021, pp. 1–9.
functional safety of systolic array-based deep learning hardware doi:10.1109/ICCAD51958.2021.9643577.
accelerators, IEEE Transactions on Very Large Scale Integration [187] A. F. Ajirlou, I. Partin-Vaisband, A machine learning pipeline stage
(VLSI) Systems 29 (2021) 485–498. for adaptive frequency adjustment, IEEE Transactions on Computers
[174] P. Joseph, K. Vaswani, M. Thazhuthaveetil, Construction and use 71 (2022) 587–598.
of linear regression models for processor performance analysis, in: [188] S. Kapoor, P. Agarwal, L. Kostas, Challenges in building deployable
The Twelfth International Symposium on High-Performance machine learning solutions for soc design, in: 2022 IEEE Women in
Computer Architecture, 2006, pp. 99–108. doi:10.1109/HPCA. Technology Conference (WINTECHCON), 2022, pp. 1–6. doi:10.
2006.1598116. 1109/WINTECHCON55229.2022.9832287.
[175] B. C. Lee, D. M. Brooks, Accurate and efficient regression modeling [189] I. M. Elfadel, D. S. Boning, X. Li, Machine Learning in VLSI
for microarchitectural performance and power prediction, in: Pro- Computer-Aided Design, Springer, 2019.
ceedings of the 12th International Conference on Architectural Sup- [190] Y. Lin, W. Li, J. Gu, H. Ren, B. Khailany, D. Z. Pan, Abcdplace:
port for Programming Languages and Operating Systems, ASPLOS Accelerated batch-based concurrent detailed placement on multi-
XII, Association for Computing Machinery, New York, NY, USA, threaded cpus and gpus, IEEE Transactions on Computer-Aided
2006, p. 185–194. URL: https://doi.org/10.1145/1168857.1168881. Design of Integrated Circuits and Systems 39 (2020) 5083–5096.
doi:10.1145/1168857.1168881. [191] A. Mirhoseini, A. Goldie, M. Yazgan, J. W. Jiang, E. Songhori,
[176] H.-S. Yun, S.-J. Lee, Power prediction of mobile processors based S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, A. Nazi, J. Pak, A. Tong,
on statistical analysis of performance monitoring events, Journal of K. Srinivasa, W. Hang, E. Tuncer, Q. V. Le, J. Laudon, R. Ho,
KIISE: Computing Practices and Letters 15 (2009) 469–477. R. Carpenter, J. Dean, A graph placement methodology for fast chip
[177] S. Rai, W. L. Neto, Y. Miyasaka, X. Zhang, M. Yu, Q. Yi, M. Fujita, design, Nature 594 (2021) 207–212.
G. B. Manske, M. F. Pontes, L. S. da Rosa Junior, M. S. de [192] W.-T. J. Chan, K. Y. Chung, A. B. Kahng, N. D. MacDonald, S. Nath,
Aguiar, P. F. Butzen, P.-C. Chien, Y.-S. Huang, H.-R. Wang, J.- Learning-based prediction of embedded memory timing failures
H. R. Jiang, J. Gu, Z. Zhao, Z. Jiang, D. Z. Pan, B. A. de Abreu, during initial floorplan design, in: 2016 21st Asia and South Pacific
I. de Souza Campos, A. Berndt, C. Meinhardt, J. T. Carvalho, Design Automation Conference (ASP-DAC), 2016, pp. 178–185.
M. Grellert, S. Bampi, A. Lohana, A. Kumar, W. Zeng, A. Davoodi, doi:10.1109/ASPDAC.2016.7428008.
R. O. Topaloglu, Y. Zhou, J. Dotzel, Y. Zhang, H. Wang, Z. Zhang, [193] W.-K. Cheng, Y.-Y. Guo, C.-S. Wu, Evaluation of routability-driven
V. Tenace, P.-E. Gaillardon, A. Mishchenko, S. Chatterjee, Logic macro placement with machine-learning technique, in: 2018 7th
synthesis meets machine learning: Trading exactness for generaliza- International Symposium on Next Generation Electronics (ISNE),
tion, 2020. arXiv:2012.02530. 2018, pp. 1–3. doi:10.1109/ISNE.2018.8394712.


[194] A. Arunkumar, E. Bolotin, B. Cho, U. Milic, E. Ebrahimi, O. Villa, [209] T.-C. Chen, P.-Y. Lee, T.-C. Chen, Automatic floorplanning for
A. Jaleel, C.-J. Wu, D. Nellans, Mcm-gpu: Multi-chip-module ai socs, in: 2020 International Symposium on VLSI Design,
gpus for continued performance scalability, in: 2017 ACM/IEEE Automation and Test (VLSI-DAT), 2020, pp. 1–2. doi:10.1109/
44th Annual International Symposium on Computer Architecture VLSI-DAT49148.2020.9196464.
(ISCA), 2017, pp. 320–332. doi:10.1145/3079856.3080231. [210] Q. Cai, W. Hang, A. Mirhoseini, G. Tucker, J. Wang, W. Wei,
[195] X. Xie, P. Prabhu, U. Beaugnon, P. M. Phothilimthana, S. Roy, Reinforcement learning driven heuristic optimization, 2019. URL:
A. Mirhoseini, E. Brevdo, J. Laudon, Y. Zhou, A transferable https://arxiv.org/abs/1906.06639. doi:10.48550/ARXIV.1906.06639.
approach for partitioning machine learning models on multi-chip- [211] A. Goldie, A. Mirhoseini, Placement optimization with deep re-
modules, 2021. URL: https://arxiv.org/abs/2112.04041. doi:10. inforcement learning, in: Proceedings of the 2020 International
48550/ARXIV.2112.04041. Symposium on Physical Design, ISPD ’20, Association for Com-
[196] S. I. Ward, D. A. Papa, Z. Li, C. N. Sze, C. J. Alpert, E. Swartzlander, puting Machinery, New York, NY, USA, 2020, p. 3–7. URL: https:
Quantifying academic placer performance on custom designs, in: //doi.org/10.1145/3372780.3378174. doi:10.1145/3372780.3378174.
Proceedings of the 2011 International Symposium on Physical De- [212] A. B. Kahng, S. Mantik, A system for automatic recording and
sign, ISPD ’11, Association for Computing Machinery, New York, prediction of design quality metrics, in: Proceedings of the IEEE
NY, USA, 2011, p. 91–98. URL: https://doi.org/10.1145/1960397. 2001. 2nd International Symposium on Quality Electronic Design,
1960420. doi:10.1145/1960397.1960420. 2001, pp. 81–86.
[197] S. Ward, D. Ding, D. Z. Pan, Pade: A high-performance placer [213] A. B. Kahng, B. Lin, S. Nath, Enhanced metamodeling techniques
with automatic datapath extraction and evaluation through high- for high-dimensional ic design estimation problems, in: 2013
dimensional data learning, in: DAC Design Automation Conference Design, Automation & Test in Europe Conference & Exhibition
2012, 2012, pp. 756–761. (DATE), 2013, pp. 1861–1866. doi:10.7873/DATE.2013.371.
[198] Y. Wang, D. Yeo, H. Shin, Effective datapath logic extraction [214] A. B. Kahng, B. Lin, S. Nath, High-dimensional metamodeling for
techniques using connection vectors, IET Circuits, Devices & prediction of clock tree synthesis outcomes, in: 2013 ACM/IEEE
Systems 13 (2019) 741–747. International Workshop on System Level Interconnect Prediction
[199] A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, S. Wang, (SLIP), 2013, pp. 1–7.
Y.-J. Lee, E. Johnson, O. Pathak, S. Bae, A. Nazi, J. Pak, A. Tong, [215] Y. Kwon, J. Jung, I. Han, Y. Shin, Transient clock power estimation
K. Srinivasa, W. Hang, E. Tuncer, A. Babu, Q. V. Le, J. Laudon, of pre-cts netlist, in: 2018 IEEE International Symposium on
R. Ho, R. Carpenter, J. Dean, Chip placement with deep reinforce- Circuits and Systems (ISCAS), 2018, pp. 1–4. doi:10.1109/ISCAS.
ment learning, 2020. arXiv:2004.10746. 2018.8351430.
[200] I. Turtletaub, G. Li, M. Ibrahim, P. Franzon, Application of Quantum [216] P. Ray, V. S. Prashant, B. P. Rao, Machine learning based parameter
Machine Learning to VLSI Placement, Association for Computing tuning for performance and power optimization of multisource clock
Machinery, New York, NY, USA, 2020, p. 61–66. URL: https: tree synthesis, in: 2022 IEEE 35th International System-on-Chip
//doi.org/10.1145/3380446.3430644. Conference (SOCC), 2022, pp. 1–2. doi:10.1109/SOCC56010.2022.
[201] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. 9908123.
Love, A. Aspuru-Guzik, J. L. O’Brien, A variational eigenvalue [217] Y.-C. Lu, J. Lee, A. Agnesina, K. Samadi, S. K. Lim, Gan-
solver on a photonic quantum processor, Nature Communications cts: A generative adversarial framework for clock tree prediction
5 (2014). and optimization, in: 2019 IEEE/ACM International Conference
[202] T.-W. Huang, Machine learning system-enabled gpu acceleration on Computer-Aided Design (ICCAD), 2019, pp. 1–8. doi:10.1109/
for eda, in: 2021 International Symposium on VLSI Design, ICCAD45719.2019.8942063.
Automation and Test (VLSI-DAT), 2021, pp. 1–1. doi:10.1109/ [218] Y.-C. Lu, J. Lee, A. Agnesina, K. Samadi, S. K. Lim, A clock tree
VLSI-DAT52063.2021.9427323. prediction and optimization framework using generative adversarial
[203] A. B. Kahng, Advancing placement, in: Proceedings of the learning, IEEE Transactions on Computer-Aided Design of Inte-
2021 International Symposium on Physical Design, ISPD ’21, As- grated Circuits and Systems 41 (2022) 3104–3117.
sociation for Computing Machinery, New York, NY, USA, 2021, [219] S. A. Beheshti-Shirazi, A. Vakil, S. Manoj, I. Savidis, H. Homayoun,
p. 15–22. URL: https://doi.org/10.1145/3439706.3446884. doi:10. to reduce peak current and ir drop, in: Proceedings of the 2021
1145/3439706.3446884. to reduce peak current and ir drop, in: Proceedings of the 2021
[204] A. Alhyari, A. Shamli, Z. Abuwaimer, S. Areibi, G. Grewal, A deep on Great Lakes Symposium on VLSI, GLSVLSI ’21, Associa-
learning framework to predict routability for fpga circuit placement, tion for Computing Machinery, New York, NY, USA, 2021, p.
in: 2019 29th International Conference on Field Programmable 181–187. URL: https://doi.org/10.1145/3453688.3461754. doi:10.
Logic and Applications (FPL), 2019, pp. 334–341. doi:10.1109/FPL. 1145/3453688.3461754.
2019.00060. [220] L.-T. Wang, Y.-W. Chang, K.-T. T. Cheng, Electronic Design Au-
[205] S. F. Almeida, J. Luís Güntzel, L. Behjat, C. Meinhardt, Routability- tomation: Synthesis, Verification, and Test, Morgan Kaufmann Pub-
driven detailed placement using reinforcement learning, in: 2022 lishers Inc., San Francisco, CA, USA, 2009.
IFIP/IEEE 30th International Conference on Very Large Scale In- [221] Y. Wei, C. Sze, N. Viswanathan, Z. Li, C. J. Alpert, L. Reddy,
tegration (VLSI-SoC), 2022, pp. 1–2. doi:10.1109/VLSI-SoC54400. A. D. Huber, G. E. Tellez, D. Keller, S. S. Sapatnekar, Techniques
2022.9939602. for scalable and effective routability evaluation, ACM Trans. Des.
[206] Y.-C. Lu, T. Yang, S. K. Lim, H. Ren, Placement optimization via Autom. Electron. Syst. 19 (2014).
ppa-directed graph clustering, in: 2022 ACM/IEEE 4th Workshop [222] G. Udgirkar, G. Indumathi, Vlsi global routing algorithms: A survey,
on Machine Learning for CAD (MLCAD), 2022, pp. 1–6. doi:10. in: 2016 3rd International Conference on Computing for Sustainable
1109/MLCAD55463.2022.9900089. Global Development (INDIACom), 2016, pp. 2528–2533.
[207] C.-K. Cheng, C.-T. Ho, C. Holtz, D. Lee, B. Lin, Machine learning [223] Z. Qi, Y. Cai, Q. Zhou, Z. Li, M. Chen, Vfgr: A very fast parallel
prediction for design and system technology co-optimization sensi- global router with accurate congestion modeling, in: 2014 19th
tivity analysis, IEEE Transactions on Very Large Scale Integration Asia and South Pacific Design Automation Conference (ASP-DAC),
(VLSI) Systems 30 (2022) 1059–1072. 2014, pp. 525–530. doi:10.1109/ASPDAC.2014.6742945.
[208] D. Z. Pan, Edaml 2022 keynote speaker: Machine learning for agile, [224] J. H. Friedman, Multivariate adaptive regression splines, Ann.
intelligent and open-source eda, in: 2022 IEEE International Paral- Statist. 19 (1991) 1–67.
lel and Distributed Processing Symposium Workshops (IPDPSW), [225] Z. Qi, Y. Cai, Q. Zhou, Accurate prediction of detailed routing
2022, pp. 1181–1181. doi:10.1109/IPDPSW55747.2022.00193. congestion using supervised data learning, in: 2014 IEEE 32nd
International Conference on Computer Design (ICCD), 2014, pp.


97–103. 2019 IFIP/IEEE 27th International Conference on Very Large Scale


[226] W. J. Chan, Y. Du, A. B. Kahng, S. Nath, K. Samadi, Beol stack- Integration (VLSI-SoC), 2019, pp. 217–222. doi:10.1109/VLSI-SoC.
aware routability prediction from placement using data mining tech- 2019.8920342.
niques, in: 2016 IEEE 34th International Conference on Computer [243] X. Chen, Z. Di, W. Wu, Q. Wu, J. Shi, Q. Feng, Detailed routing
Design (ICCD), 2016, pp. 41–48. doi:10.1109/ICCD.2016.7753259. short violation prediction using graph-based deep learning model,
[227] Z. Xie, Y. Huang, G. Fang, H. Ren, S. Fang, Y. Chen, J. Hu, IEEE Transactions on Circuits and Systems II: Express Briefs 69
Routenet: Routability prediction for mixed-size designs using con- (2022) 564–568.
volutional neural network, in: 2018 IEEE/ACM International Con- [244] L. Li, Y. Cai, Q. Zhou, A survey on machine learning-based routing
ference on Computer-Aided Design (ICCAD), 2018, pp. 1–8. doi:10. for vlsi physical design, Integr. VLSI J. 86 (2022) 51–56.
1145/3240765.3240843. [245] V. A. Chhabria, W. Jiang, A. B. Kahng, S. S. Sapatnekar, From
[228] A. F. Tabrizi, N. K. Darav, L. Rakai, A. Kennings, L. Behjat, De- global route to detailed route: Ml for fast and accurate wire parasitics
tailed routing violation prediction during placement using machine and timing prediction, in: 2022 ACM/IEEE 4th Workshop on
learning, in: 2017 International Symposium on VLSI Design, Machine Learning for CAD (MLCAD), 2022, pp. 7–14. doi:10.1109/
Automation and Test (VLSI-DAT), 2017, pp. 1–4. doi:10.1109/ MLCAD55463.2022.9900099.
VLSI-DAT.2017.7939657. [246] S. M. Sze, et al., VLSI technology, McGraw-hill, 1988.
[229] L.-C. Chen, C.-C. Huang, Y.-L. Chang, H.-M. Chen, A learning- [247] J. N. Helbert, Handbook of VLSI microlithography, Cambridge
based methodology for routability prediction in placement, in: 2018 University Press, 2001.
International Symposium on VLSI Design, Automation and Test [248] M. Phute, A. Sahastrabudhe, S. Pimparkhede, S. Potphode, K. Ren-
(VLSI-DAT), 2018, pp. 1–4. doi:10.1109/VLSI-DAT.2018.8373272. gade, S. Shilaskar, A survey on machine learning in lithography,
[230] Y.-Y. Huang, C.-T. Lin, W.-L. Liang, H.-M. Chen, Learning based in: 2021 International Conference on Artificial Intelligence and
placement refinement to reduce drc short violations, in: 2021 Inter- Machine Vision (AIMV), 2021, pp. 1–6. doi:10.1109/AIMV53313.
national Symposium on VLSI Design, Automation and Test (VLSI- 2021.9670977.
DAT), 2021, pp. 1–4. doi:10.1109/VLSI-DAT52063.2021.9427321. [249] A. Gu, A. Zakhor, Optical proximity correction with linear re-
[231] J.-R. Gao, P.-C. Wu, T.-C. Wang, A new global router for modern gression, IEEE Transactions on Semiconductor Manufacturing 21
designs, in: 2008 Asia and South Pacific Design Automation (2008) 263–271.
Conference, 2008, pp. 232–237. doi:10.1109/ASPDAC.2008.4483948. [250] R. Luo, Optical proximity correction using a multilayer perceptron
[232] T. Zhang, X. Liu, W. Tang, J. Chen, Z. Xiao, F. Zhang, W. Hu, neural network, Journal of Optics 15 (2013) 075708.
Z. Zhou, Y. Cheng, Predicted congestion using a density-based fast [251] T. Matsunawa, B. Yu, D. Z. Pan, Optical proximity correction with
neural network algorithm in global routing, in: 2019 IEEE Inter- hierarchical bayes model, in: Optical Microlithography XXVIII,
national Conference on Electron Devices and Solid-State Circuits volume 9426, International Society for Optics and Photonics, 2015,
(EDSSC), 2019, pp. 1–3. doi:10.1109/EDSSC.2019.8754196. p. 94260X.
[233] Z. Zhou, Z. Zhu, J. Chen, Y. Ma, B. Yu, T.-Y. Ho, G. Lemieux, [252] W. Gilks, Markov Chain Monte Carlo, Hoboken, 2005.
A. Ivanov, Congestion-aware global routing using deep convolu- [253] S. Choi, S. Shim, Y. Shin, Machine learning (ml)-guided opc using
tional generative adversarial networks, in: 2019 ACM/IEEE 1st basis functions of polar fourier transform, in: Optical Microlithog-
Workshop on Machine Learning for CAD (MLCAD), 2019, pp. 1–6. raphy XXIX, volume 9780, International Society for Optics and
doi:10.1109/MLCAD48534.2019.9142082. Photonics, 2016, p. 97800H.
[234] A. F. Tabrizi, N. K. Darav, L. Rakai, I. Bustany, A. Kennings, L. Be- [254] L. Pang, Y. Liu, D. Abrams, Inverse lithography technology (ilt):
hjat, Eh?predictor: A deep learning framework to identify detailed What is the impact to the photomask industry?, in: Photomask and
routing short violations from a placed netlist, IEEE Transactions Next-Generation Lithography Mask Technology XIII, volume 6283,
on Computer-Aided Design of Integrated Circuits and Systems 39 International Society for Optics and Photonics, 2006, p. 62830X.
(2020) 1177–1190. [255] N. Jia, E. Y. Lam, Machine learning for inverse lithography: using
[235] K.-R. Dai, W.-H. Liu, Y.-L. Li, Nctu-gr: Efficient simulated stochastic gradient descent for robust photomask synthesis, Journal
evolution-based rerouting and congestion-relaxed layer assignment of Optics 12 (2010) 045601.
on 3-d global routing, IEEE Transactions on Very Large Scale [256] K.-s. Luo, Z. Shi, X.-l. Yan, Z. Geng, Svm based layout retargeting
Integration (VLSI) Systems 20 (2012) 459–472. for fast and regularized inverse lithography, Journal of Zhejiang
[236] Y. Pan, Z. Zhou, A. Ivanov, Routability-driven global routing with University SCIENCE C 15 (2014) 390–400.
3d congestion estimation using a customized neural network, in: [257] X. Shi, Y. Zhao, S. Chen, C. Li, Ai computational lithography,
2022 23rd International Symposium on Quality Electronic Design in: 2020 China Semiconductor Technology International Conference
(ISQED), 2022, pp. 1–6. doi:10.1109/ISQED54688.2022.9806228. (CSTIC), 2020, pp. 1–4. doi:10.1109/CSTIC49141.2020.9282529.
[237] J. Liu, C.-W. Pui, F. Wang, E. F. Y. Young, Cugr: Detailed- [258] X. Shi, Y. Yan, T. Zhou, X. Yu, C. Li, S. Chen, Y. Zhao, Fast and
routability-driven 3d global routing with probabilistic resource accurate machine learning inverse lithography using physics-based
model, in: 2020 57th ACM/IEEE Design Automation Conference feature maps and specially designed dcnn, in: 2020 International
(DAC), 2020, pp. 1–6. doi:10.1109/DAC18072.2020.9218646. Workshop on Advanced Patterning Solutions (IWAPS), 2020, pp.
[238] P. Goswami, D. Bhatia, Congestion prediction in fpga using regres- 1–3. doi:10.1109/IWAPS51164.2020.9286814.
sion based learning methods, Electronics 10 (2021). [259] X. Xu, Y. Lin, M. Li, T. Matsunawa, S. Nojima, C. Kodama,
[239] B. Li, P. D. Franzon, Machine learning in physical design, in: T. Kotani, D. Z. Pan, Subresolution assist feature generation with
2016 IEEE 25th Conference on Electrical Performance Of Electronic supervised data learning, IEEE Transactions on Computer-Aided
Packaging And Systems (EPEPS), 2016, pp. 147–150. Design of Integrated Circuits and Systems 37 (2017) 1225–1236.
[240] E. C. Barboza, N. Shukla, Y. Chen, J. Hu, Machine learning-based [260] S. Shim, S. Choi, Y. Shin, Machine learning (ml)-based lithography
pre-routing timing prediction with reduced pessimism, in: 2019 56th optimizations, in: 2016 IEEE Asia Pacific Conference on Circuits
ACM/IEEE Design Automation Conference (DAC), 2019, pp. 1–6. and Systems (APCCAS), 2016, pp. 530–533.
[241] Y.-H. Yeh, S. Y.-H. Chen, H.-M. Chen, D.-Y. Tu, G.-Q. Fang, Y.-C. [261] S. Shim, Y. Shin, Machine learning-guided etch proximity correc-
Kuo, P.-Y. Chen, Substrate signal routing solution exploration for tion, IEEE Transactions on Semiconductor Manufacturing 30 (2017)
high-density packages with machine learning, in: 2022 International 1–7.
Symposium on VLSI Design, Automation and Test (VLSI-DAT), [262] R. Chen, H. Hu, X. Li, Y. Chen, X. SU, L. Dong, L. Qu, C. Li,
2022, pp. 1–4. doi:10.1109/VLSI-DAT54769.2022.9768081. J. Yan, Y. Wei, Etch model based on machine learning, in:
[242] R. Kirby, S. Godil, R. Roy, B. Catanzaro, Congestionnet: Rout- 2020 China Semiconductor Technology International Conference
ing congestion prediction using deep graph neural networks, in: (CSTIC), 2020, pp. 1–4. doi:10.1109/CSTIC49141.2020.9282462.


[263] Y. Meng, Y.-C. Kim, S. Guo, Z. Shu, Y. Zhang, Q. Liu, Machine [278] D. Ding, B. Yu, J. Ghosh, D. Z. Pan, Epic: Efficient prediction of ic
learning models for edge placement error based etch bias, IEEE manufacturing hotspots with a unified meta-classification formula-
Transactions on Semiconductor Manufacturing 34 (2021) 42–48. tion, in: 17th Asia and South Pacific Design Automation Conference,
[264] Y. Lin, M. Li, Y. Watanabe, T. Kimura, T. Matsunawa, S. Nojima, IEEE, 2012, pp. 263–270.
D. Z. Pan, Data efficient lithography modeling with transfer learning [279] T. Matsunawa, J.-R. Gao, B. Yu, D. Z. Pan, A new lithogra-
and active data selection, IEEE Transactions on Computer-Aided phy hotspot detection framework based on adaboost classifier and
Design of Integrated Circuits and Systems 38 (2019) 1900–1913. simplified feature extraction, in: Design-Process-Technology Co-
[265] H. Yang, S. Li, Y. Ma, B. Yu, E. F. Young, Gan-opc: Mask optimization for Manufacturability IX, volume 9427, International
optimization with lithography-guided generative adversarial nets, in: Society for Optics and Photonics, 2015, p. 94270S.
Proceedings of the 55th Annual Design Automation Conference, [280] Y. Chen, Y. Lin, T. Gai, Y. Su, Y. Wei, D. Z. Pan, Semi-supervised
2018, pp. 1–6. hotspot detection with self-paced multi-task learning, IEEE Transac-
[266] M. B. Alawieh, Y. Lin, Z. Zhang, M. Li, Q. Huang, D. Z. Pan, Gan- tions on Computer-Aided Design of Integrated Circuits and Systems
sraf: Sub-resolution assist feature generation using conditional (2019).
generative adversarial networks, in: Proceedings of the 56th Annual [281] M. Shin, J.-H. Lee, Accurate lithography hotspot detection
Design Automation Conference 2019, 2019, pp. 1–6. using deep convolutional neural networks, Journal of Mi-
[267] W. Ye, M. B. Alawieh, Y. Lin, D. Z. Pan, Lithogan: End-to-end cro/Nanolithography, MEMS, and MOEMS 15 (2016) 043507.
lithography modeling with generative adversarial networks, in: 2019 [282] V. Borisov, J. Scheible, Lithography hotspots detection using deep
56th ACM/IEEE Design Automation Conference (DAC), IEEE, learning, in: 2018 15th International Conference on Synthesis,
2019, pp. 1–6. Modeling, Analysis and Simulation Methods and Applications to
[268] H. Yang, W. Zhong, Y. Ma, H. Geng, R. Chen, W. Chen, B. Yu, Vlsi Circuit Design (SMACD), IEEE, 2018, pp. 145–148.
mask optimization: From shallow to deep learning, in: 2020 25th [283] H. Yang, Y. Lin, B. Yu, E. F. Young, Lithography hotspot detection:
Asia and South Pacific Design Automation Conference (ASP-DAC), From shallow to deep learning, in: 2017 30th IEEE International
IEEE, 2020, pp. 434–439. System-on-Chip Conference (SOCC), IEEE, 2017, pp. 233–238.
[269] H. Yang, S. Li, C. Tabery, B. Lin, B. Yu, Bridging the gap between [284] H. Yang, L. Luo, J. Su, C. Lin, B. Yu, Imbalance aware lithog-
layout pattern sampling and hotspot detection via batch active raphy hotspot detection: a deep learning approach, Journal of
learning, IEEE Transactions on Computer-Aided Design of Micro/Nanolithography, MEMS, and MOEMS 16 (2017) 033504.
Integrated Circuits and Systems 40 (2021) 1464–1475. [285] H. Zhang, B. Yu, E. F. Young, Enabling online learning in
[270] N. Figueiro, F. Sanchez, R. Koret, M. Shifrin, Y. Etzioni, lithography hotspot detection with information-theoretic feature op-
S. Wolfling, M. Sendelbach, Y. Blancquaert, T. Labbaye, G. Rade- timization, in: Proceedings of the 35th International Conference on
maker, J. Pradelles, L. Mourier, S. Rey, L. Pain, Application of Computer-Aided Design, 2016, pp. 1–8.
scatterometry-based machine learning to control multiple electron [286] W. Ye, M. B. Alawieh, M. Li, Y. Lin, D. Z. Pan, Litho-gpa: Gaussian
beam lithography: Am: Advanced metrology, in: 2018 29th An- process assurance for lithography hotspot detection, in: 2019 Design,
nual SEMI Advanced Semiconductor Manufacturing Conference Automation & Test in Europe Conference & Exhibition (DATE),
(ASMC), 2018, pp. 328–333. doi:10.1109/ASMC.2018.8373222. IEEE, 2019, pp. 54–59.
[271] M. P. McLaughlin, A. Stamper, G. Barber, J. Paduano, P. Mennell, [287] J. W. Park, A. Torres, X. Song, Litho-aware machine learning for
E. Benn, M. Linnane, J. Zwick, C. Khatumria, R. L. Isaacson, hotspot detection, IEEE Transactions on Computer-Aided Design
N. Hoffman, C. Menser, Enhanced defect detection in after de- of Integrated Circuits and Systems 37 (2018) 1510–1514.
velop inspection with machine learning disposition, in: 2021 32nd [288] K. Madkour, S. Mohamed, D. Tantawy, M. Anis, Hotspot detection
Annual SEMI Advanced Semiconductor Manufacturing Conference using machine learning, in: 2016 17th International Symposium on
(ASMC), 2021, pp. 1–5. doi:10.1109/ASMC51741.2021.9435721. Quality Electronic Design (ISQED), 2016, pp. 405–409. doi:10.
[272] N. Nagase, K. Suzuki, K. Takahashi, M. Minemura, S. Yamauchi, 1109/ISQED.2016.7479235.
T. Okada, Study of hot spot detection using neural networks [289] H. Yang, J. Su, Y. Zou, Y. Ma, B. Yu, E. F. Y. Young, Layout hotspot
judgment, in: Photomask and Next-Generation Lithography Mask detection with feature tensor generation and deep biased learning,
Technology XIV, volume 6607, International Society for Optics and IEEE Transactions on Computer-Aided Design of Integrated Cir-
Photonics, 2007, p. 66071B. cuits and Systems 38 (2019) 1175–1187.
[273] D. Ding, X. Wu, J. Ghosh, D. Z. Pan, Machine learning based [290] T. Gai, T. Qu, S. Wang, X. Su, R. Xu, Y. Wang, J. Xue, Y. Su, Y. Wei,
lithographic hotspot detection with critical-feature extraction and T. Ye, Flexible hotspot detection based on fully convolutional
classification, in: 2009 IEEE International Conference on IC Design network with transfer learning, IEEE Transactions on Computer-
and Technology, IEEE, 2009, pp. 219–222. Aided Design of Integrated Circuits and Systems 41 (2022) 4626–
[274] N. Ma, J. Ghan, S. Mishra, C. Spanos, K. Poolla, N. Rodriguez, 4638.
L. Capodieci, Automatic hotspot classification using pattern-based [291] Y. Zhang, C. Zhang, M. Li, L. Zhao, C. Yang, Z. Wang, Modified
clustering, in: Design for Manufacturability through Design-Process deep learning approach for layout hotspot detection, in: 2018 IEEE
Integration II, volume 6925, International Society for Optics and International Conference on Electron Devices and Solid-State
Photonics, 2008, p. 692505. Circuits (EDSSC), 2018, pp. 1–2. doi:10.1109/EDSSC.2018.8487177.
[275] J. Ghan, N. Ma, S. Mishra, C. Spanos, K. Poolla, N. Rodriguez, [292] M. T. Ismail, H. Sharara, K. Madkour, K. Seddik, Autoencoder-
L. Capodieci, Clustering and pattern matching for an automatic based data sampling for machine learning-based lithography hotspot
hotspot classification and detection system, in: Design for Manu- detection, in: 2022 ACM/IEEE 4th Workshop on Machine Learning
facturability through Design-Process Integration III, volume 7275, for CAD (MLCAD), 2022, pp. 91–96. doi:10.1109/MLCAD55463.2022.
International Society for Optics and Photonics, 2009, p. 727516. 9900096.
[276] D. Ding, J. A. Torres, D. Z. Pan, High performance lithography [293] H. Yang, W. Chen, P. Pathak, F. Gennari, Y.-C. Lai, B. Yu, Auto-
hotspot detection with successively refined pattern identifications matic layout generation with applications in machine learning engine
and machine learning, IEEE Transactions on Computer-Aided evaluation, in: 2019 ACM/IEEE 1st Workshop on Machine Learning
Design of Integrated Circuits and Systems 30 (2011) 1621–1634. for CAD (MLCAD), 2019, pp. 1–6. doi:10.1109/MLCAD48534.2019.
[277] Y.-T. Yu, G.-H. Lin, I. H.-R. Jiang, C. Chiang, Machine-learning- 9142121.
based hotspot detection using topological classification and critical [294] W. Zhang, K. Chen, X. Li, Y. Ma, C. Zhu, B. Chen, X. Gao,
feature extraction, IEEE Transactions on Computer-Aided Design K. Kim, A workflow of hotspot prediction based on semi-supervised
of Integrated Circuits and Systems 34 (2015) 460–470. machine learning methodology, in: 2021 International Workshop
on Advanced Patterning Solutions (IWAPS), 2021, pp. 1–3. doi:10.


1109/IWAPS54037.2021.9671068. [313] L. Alrahis, J. Knechtel, F. Klemme, H. Amrouch, O. Sinanoglu,


[295] M. B. Alawieh, D. Z. Pan, Adapt: An adaptive machine learning Gnn4rel: Graph neural networks for predicting circuit reliability
framework with application to lithography hotspot detection, in: degradation, IEEE Transactions on Computer-Aided Design of
2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD Integrated Circuits and Systems 41 (2022) 3826–3837.
(MLCAD), 2021, pp. 1–6. doi:10.1109/MLCAD52597.2021.9531210. [314] S. Peng, W. Jin, L. Chen, S. X.-D. Tan, Data-driven fast elec-
[296] D. Schmidt, K. Petrillo, M. Breton, J. Fullam, R. Koret, I. Turovets, trostatics and tddb aging analysis, in: Proceedings of the 2020
A. Cepler, Advanced euv resist characterization using scatterometry ACM/IEEE Workshop on Machine Learning for CAD, MLCAD ’20,
and machine learning, in: 2021 32nd Annual SEMI Advanced Association for Computing Machinery, New York, NY, USA, 2020,
Semiconductor Manufacturing Conference (ASMC), 2021, pp. 1–4. p. 71–76. URL: https://doi.org/10.1145/3380446.3430620. doi:10.
doi:10.1109/ASMC51741.2021.9435698. 1145/3380446.3430620.
[297] M. P. McLaughlin, P. Mennell, A. Stamper, G. Barber, J. Paduano, [315] S. Lamichhane, S. Peng, W. Jin, S. X.-D. Tan, Fast electrostatic anal-
E. Benn, M. Linnane, J. Zwick, C. Khatumria, R. L. Isaacson, ysis for vlsi aging based on generative learning, in: 2021 ACM/IEEE
N. Hoffman, C. Menser, Improved color defect detection with 3rd Workshop on Machine Learning for CAD (MLCAD), 2021, pp.
machine learning for after develop inspections in lithography, IEEE 1–6. doi:10.1109/MLCAD52597.2021.9531320.
Transactions on Semiconductor Manufacturing 35 (2022) 418–424. [316] P.-N. Hsu, K.-C. Shie, K.-P. Chen, J.-C. Tu, C.-C. Wu, N.-T. Tsou, Y.-
[298] P. Parashar, C. Akbar, T. S. Rawat, S. Pratik, R. Butola, S. H. Chen, C. Lo, N.-Y. Chen, Y.-F. Hsieh, M. Wu, C. Chen, K.-N. Tu, Artificial
Y.-S. Chang, S. Nuannimnoi, A. S. Lin, Intelligent photolithogra- intelligence deep learning for 3d ic reliability prediction, Scientific
phy corrections using dimensionality reductions, IEEE Photonics Reports 12 (2022) 6711.
Journal 11 (2019) 1–15. [317] Y. Pan, Z. Lu, H. Zhang, H. Zhang, M. T. Arafin, Z. Liu, G. Qu,
[299] T. Zhou, B. Xu, C. Li, X. Diao, Y. Yan, S. Chen, Y. Zhao, Adlpt: Improving 3d nand flash memory reliability by adaptive
K. Zhou, W. Zhou, X. Zeng, X. Shi, Mining lithography hotspots lifetime prediction techniques, IEEE Transactions on Computers
from massive sem images using machine learning model, in: 2021 (2022) 1–14.
China Semiconductor Technology International Conference [318] S. Kundu, K. Basu, M. Sadi, T. Titirsha, S. Song, A. Das, U. Guin,
(CSTIC), 2021, pp. 1–3. doi:10.1109/CSTIC52283.2021.9461533. Special session: Reliability analysis for ai/ml hardware, in: 2021
[300] Y.-F. Yang, M. Sun, Hybrid quantum-classical machine learning IEEE 39th VLSI Test Symposium (VTS), 2021, pp. 1–10. doi:10.
for lithography hotspot detection, in: 2022 33rd Annual SEMI Ad- 1109/VTS50974.2021.9441050.
vanced Semiconductor Manufacturing Conference (ASMC), 2022, [319] F. Regazzoni, S. Bhasin, A. A. Pour, I. Alshaer, F. Aydin, A. Aysu,
pp. 1–6. doi:10.1109/ASMC54647.2022.9792509. V. Beroulle, G. Di Natale, P. Franzon, D. Hely, N. Homma, A. Ito,
[301] T. C. Tin, S. C. Tan, C. K. Lee, Virtual metrology in semiconductor D. Jap, P. Kashyap, I. Polian, S. Potluri, R. Ueno, E.-I. Vatajelu,
fabrication foundry using deep learning neural networks, IEEE V. Yli-Mäyry, Machine learning and hardware security: Challenges
Access 10 (2022) 81960–81973. and opportunities -invited talk-, in: 2020 IEEE/ACM International
[302] X. Zhang, J. Shiely, E. F. Young, Layout pattern generation and Conference On Computer Aided Design (ICCAD), 2020, pp. 1–6.
legalization with generative learning models, in: 2020 IEEE/ACM [320] L. M. Silva, F. V. Andrade, A. O. Fernandes, L. F. M. Vieira, Arith-
International Conference On Computer Aided Design (ICCAD), metic circuit classification using convolutional neural networks, in:
2020, pp. 1–9. 2018 International Joint Conference on Neural Networks (IJCNN),
[303] D. P. Kingma, M. Welling, An introduction to variational autoen- 2018, pp. 1–7. doi:10.1109/IJCNN.2018.8489382.
coders, Found. Trends Mach. Learn. 12 (2019) 307–392. [321] X. Hong, T. Lin, Y. Shi, B. H. Gwee, Asic circuit netlist recognition
[304] M. Mirza, S. Osindero, Conditional generative adversarial nets, using graph neural network, in: 2021 IEEE International Symposium
CoRR abs/1411.1784 (2014). on the Physical and Failure Analysis of Integrated Circuits (IPFA),
[305] M. B. Alawieh, W. Ye, D. Z. Pan, Re-examining vlsi manufacturing 2021, pp. 1–5. doi:10.1109/IPFA53173.2021.9617311.
and yield through the lens of deep learning : (invited talk), in: 2020 [322] G. Ali, L. Bagheriye, H. G. Kerkhoff, On-chip embedded instru-
IEEE/ACM International Conference On Computer Aided Design ments data fusion and life-time prognostics of dependable vlsi-socs
(ICCAD), 2020, pp. 1–8. using machine-learning, in: 2020 IEEE International Symposium
[306] K. N. Patel, I. Markov, J. Hayes, Evaluating circuit reliability under on Circuits and Systems (ISCAS), 2020, pp. 1–5. doi:10.1109/
probabilistic gate-level fault models, 2003. ISCAS45731.2020.9180773.
[307] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, J. P. Hayes, [323] K. G. Liakos, G. K. Georgakilas, F. C. Plessas, Hardware trojan
Accurate reliability evaluation and enhancement via probabilistic classification at gate-level netlists based on area and power ma-
transfer matrices, in: Design, Automation and Test in Europe, 2005, chine learning analysis, in: 2021 IEEE Computer Society Annual
pp. 282–287 Vol. 1. doi:10.1109/DATE.2005.47. Symposium on VLSI (ISVLSI), 2021, pp. 412–417. doi:10.1109/
[308] M. R. Choudhury, K. Mohanram, Accurate and scalable reliability ISVLSI51109.2021.00081.
analysis of logic circuits, in: 2007 Design, Automation Test in [324] L. Alrahis, A. Sengupta, J. Knechtel, S. Patnaik, H. Saleh, B. Mo-
Europe Conference Exhibition, 2007, pp. 1–6. doi:10.1109/DATE. hammad, M. Al-Qutayri, O. Sinanoglu, Gnn-re: Graph neural net-
2007.364503. works for reverse engineering of gate-level netlists, IEEE Transac-
[309] A. Beg, F. Awwad, W. Ibrahim, F. Ahmed, On the reliability tions on Computer-Aided Design of Integrated Circuits and Systems
estimation of nano-circuits using neural networks, Microprocessors 41 (2022) 2435–2448.
and Microsystems 39 (2015) 674 – 685. [325] L. Amarú, P.-E. Gaillardon, G. De Micheli, The epfl combinational
[310] N. Karimi, K. Huang, Prognosis of nbti aging using a machine benchmark suite, in: Proceedings of the 24th International Workshop
learning scheme, in: 2016 IEEE International Symposium on Defect on Logic & Synthesis (IWLS), CONF, 2015.
and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), [326] P. R. Genssler, H. Amrouch, Brain-inspired computing for circuit
2016, pp. 7–10. doi:10.1109/DFT.2016.7684060. reliability characterization, IEEE Transactions on Computers 71
[311] S. Bian, M. Hiromoto, M. Shintani, T. Sato, Lsta: Learning- based (2022) 3336–3348.
static timing analysis for high-dimensional correlated on-chip [327] S.-J. Jang, J.-S. Kim, T.-W. Kim, H.-J. Lee, S. Ko, A wafer
variations, in: 2017 54th ACM/EDAC/IEEE Design Automation map yield prediction based on machine learning for productivity
Conference (DAC), 2017, pp. 1–6. enhancement, IEEE Transactions on Semiconductor Manufacturing
[312] T. Cho, R. Liang, G. Yu, J. Xu, Reliability analysis of p-type soi 32 (2019) 400–407.
finfets with multiple sige channels on the degradation of nbti, in: [328] W. Maly, A. J. Strojwas, S. W. Director, Vlsi yield prediction and
2020 IEEE Silicon Nanoelectronics Workshop (SNW), 2020, pp. estimation: A unified framework, IEEE Transactions on Computer-
101–102. Aided Design of Integrated Circuits and Systems 5 (1986) 114–130.


[329] I. Koren, Z. Koren, Defect tolerance in vlsi circuits: techniques and yield analysis, Proceedings of the IEEE 86 (1998) 1819–1838.
[330] P. Backus, M. Janakiram, S. Mowzoon, C. Runger, A. Bhargava, Factory cycle-time prediction with a data-mining approach, IEEE Transactions on Semiconductor Manufacturing 19 (2006) 252–258.
[331] Y. Meidan, B. Lerner, G. Rabinowitz, M. Hassoun, Cycle-time key factor identification and prediction in semiconductor manufacturing using machine learning and data mining, IEEE Transactions on Semiconductor Manufacturing 24 (2011) 237–248.
[332] C.-F. Chien, W.-C. Wang, J.-C. Cheng, Data mining for yield enhancement in semiconductor manufacturing and an empirical study, Expert Systems with Applications 33 (2007) 192–198.
[333] D. Jiang, W. Lin, N. Raghavan, A gaussian mixture model clustering ensemble regressor for semiconductor manufacturing final test yield prediction, IEEE Access 9 (2021) 22253–22263.
[334] D. Jiang, W. Lin, N. Raghavan, Semiconductor manufacturing final test yield optimization and wafer acceptance test parameter inverse design using multi-objective optimization algorithms, IEEE Access 9 (2021) 137655–137666.
[335] H. G. Kim, Y. S. Han, J.-H. Lee, Package yield enhancement using machine learning in semiconductor manufacturing, in: 2015 IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2015, pp. 316–320. doi:10.1109/IAEAC.2015.7428567.
[336] J.-S. Kim, S.-J. Jang, T.-W. Kim, H.-J. Lee, J.-B. Lee, A productivity-oriented wafer map optimization using yield model based on machine learning, IEEE Transactions on Semiconductor Manufacturing 32 (2019) 39–47.
[337] C. Mead, L. Conway, Introduction to VLSI Systems, volume 1080, Addison-Wesley, Reading, MA, 1980.
[338] D. Price, Pentium fdiv flaw-lessons learned, IEEE Micro 15 (1995) 86–88.
[339] L.-T. Wang, C.-W. Wu, X. Wen, VLSI Test Principles and Architectures: Design for Testability (Systems on Silicon), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
[340] B. Wile, J. Goss, W. Roesner, Comprehensive Functional Verification: The Complete Industry Cycle, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
[341] R. Lisanke, F. Brglez, A. de Geus, D. Gregory, Testability-driven random test-pattern generation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 6 (1987) 1082–1087.
[342] C. Fagot, P. Girard, C. Landrault, On using machine learning for logic bist, in: Proceedings International Test Conference 1997, 1997, pp. 338–346. doi:10.1109/TEST.1997.639635.
[343] S. Fine, A. Ziv, Coverage directed test generation for functional verification using bayesian networks, in: Proceedings 2003 Design Automation Conference (IEEE Cat. No.03CH37451), 2003, pp. 286–291. doi:10.1145/775832.775907.
[344] M. Braun, S. Fine, A. Ziv, Enhancing the efficiency of bayesian network based coverage directed test generation, in: Proceedings Ninth IEEE International High-Level Design Validation and Test Workshop (IEEE Cat. No.04EX940), 2004, pp. 75–80. doi:10.1109/HLDVT.2004.1431241.
[345] W. Hughes, S. Srinivasan, R. Suvarna, M. Kulkarni, Optimizing design verification using machine learning: Doing better than random, CoRR abs/1909.13168 (2019).
[346] S. Fine, A. Freund, I. Jaeger, Y. Mansour, Y. Naveh, A. Ziv, Harnessing machine learning to improve the success rate of stimuli generation, IEEE Transactions on Computers 55 (2006) 1344–1355.
[347] H. Dhotre, S. Eggersglüß, M. Dehbashi, U. Pfannkuchen, R. Drechsler, Machine learning based test pattern analysis for localizing critical power activity areas, in: 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2017, pp. 1–6. doi:10.1109/DFT.2017.8244464.
[348] J. Chen, L.-C. Wang, P.-H. Chang, J. Zeng, S. Yu, M. Mateja, Data learning techniques and methodology for fmax prediction, in: 2009 International Test Conference, 2009, pp. 1–10. doi:10.1109/TEST.2009.5355620.
[349] L.-C. Wang, Data learning techniques for functional/system fmax prediction, in: 2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009, p. 451. doi:10.1109/DFT.2009.61.
[350] P. Krishnamurthy, A. B. Chowdhury, B. Tan, F. Khorrami, R. Karri, Explaining and interpreting machine learning cad decisions: An ic testing case study, in: 2020 ACM/IEEE 2nd Workshop on Machine Learning for CAD (MLCAD), 2020, pp. 129–134. doi:10.1145/3380446.3430643.
[351] S. Roy, S. K. Millican, V. D. Agrawal, Training neural network for machine intelligence in automatic test pattern generator, in: 2021 34th International Conference on VLSI Design and 2021 20th International Conference on Embedded Systems (VLSID), 2021, pp. 316–321. doi:10.1109/VLSID51830.2021.00059.
[352] S. Roy, S. K. Millican, V. D. Agrawal, Multi-heuristic machine intelligence guidance in automatic test pattern generation, in: 2022 IEEE 31st Microelectronics Design & Test Symposium (MDTS), 2022, pp. 1–6. doi:10.1109/MDTS54894.2022.9826985.
[353] S. Vasudevan, W. J. Jiang, D. Bieber, R. Singh, H. Shojaei, C. R. Ho, C. Sutton, Learning semantic representations to verify hardware designs, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 23491–23504. URL: https://proceedings.neurips.cc/paper/2021/file/c5aa65949d20f6b20e1a922c13d974e7-Paper.pdf.
[354] T. Song, H. Liang, T. Ni, Z. Huang, Y. Lu, J. Wan, A. Yan, Pattern reorder for test cost reduction through improved svmrank algorithm, IEEE Access 8 (2020) 147965–147972.
[355] T. Song, Z. Huang, A. Yan, Machine learning classification algorithm for vlsi test cost reduction, Integration 87 (2022) 40–48.
[356] C.-Y. Chen, J.-L. Huang, Reinforcement-learning-based test program generation for software-based self-test, in: 2019 IEEE 28th Asian Test Symposium (ATS), 2019, pp. 73–735. doi:10.1109/ATS47505.2019.00013.
[357] Y. Maidon, B. Jervis, N. Dutton, S. Lesage, Diagnosis of multifaults in analogue circuits using multilayer perceptrons, IEE Proceedings-Circuits, Devices and Systems 144 (1997) 149–154.
[358] M. A. El-Gamal, M. A. El-Yazeed, A combined clustering and neural network approach for analog multiple hard fault classification, Journal of Electronic Testing 14 (1999) 207–217.
[359] F. Aminian, M. Aminian, Fault diagnosis of nonlinear analog circuits using neural networks with wavelet and fourier transforms as preprocessors, Journal of Electronic Testing 17 (2001) 471–481.
[360] M. Aminian, F. Aminian, A modular fault-diagnostic system for analog electronic circuits using neural networks with wavelet transform as a preprocessor, IEEE Transactions on Instrumentation and Measurement 56 (2007) 1546–1554.
[361] A. DeOrio, Q. Li, M. Burgess, V. Bertacco, Machine learning-based anomaly detection for post-silicon bug diagnosis, in: 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp. 491–496. doi:10.7873/DATE.2013.112.
[362] Y. Huang, R. Guo, W.-T. Cheng, J. C.-M. Li, Survey of scan chain diagnosis, IEEE Design & Test of Computers 25 (2008) 240–248.
[363] Y. Huang, B. Benware, R. Klingenberg, H. Tang, J. Dsouza, W.-T. Cheng, Scan chain diagnosis based on unsupervised machine learning, in: 2017 IEEE 26th Asian Test Symposium (ATS), 2017, pp. 225–230. doi:10.1109/ATS.2017.50.
[364] M. Chern, S.-W. Lee, S.-Y. Huang, Y. Huang, G. Veda, K.-H. H. Tsai, W.-T. Cheng, Improving scan chain diagnostic accuracy using multi-stage artificial neural networks, in: Proceedings of the 24th Asia and South Pacific Design Automation Conference, ASPDAC ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 341–346. doi:10.1145/3287624.3287692.
[365] H. Lim, T. H. Kim, S. Kim, S. Kang, Diagnosis of scan chain faults based-on machine-learning, in: 2020 International SoC Design Conference (ISOCC), 2020, pp. 57–58. doi:10.1109/ISOCC50952.2020.9333074.
[366] Z. Liu, Q. Huang, C. Fang, R. D. Blanton, Improving test chip design efficiency via machine learning, in: 2019 IEEE International Test Conference (ITC), 2019, pp. 1–10. doi:10.1109/ITC44170.2019.9000131.
[367] Y.-C. Cheng, P.-Y. Tan, C.-W. Wu, M.-D. Shieh, C.-H. Chuang, G. Liao, A decision tree-based screening method for improving test quality of memory chips, in: 2022 IEEE International Test Conference in Asia (ITC-Asia), 2022, pp. 19–24. doi:10.1109/ITCAsia55616.2022.00014.
[368] R. Sleik, M. Glavanovics, Y. Nikitin, M. Di Bernardo, A. Muetze, K. Krischan, Performance enhancement of a modular test system for power semiconductors for htol testing by use of an embedded system, in: 2017 19th European Conference on Power Electronics and Applications (EPE'17 ECCE Europe), 2017, pp. P.1–P.8. doi:10.23919/EPE17ECCEEurope.2017.8098933.
[369] C. Liu, J. Ou, Smart sampling for efficient system level test: A robust machine learning approach, in: 2021 IEEE International Test Conference (ITC), 2021, pp. 53–62. doi:10.1109/ITC50571.2021.00013.
[370] C. Fang, Q. Huang, R. Blanton, Adaptive test pattern reordering for diagnosis using k-nearest neighbors, in: 2020 IEEE International Test Conference in Asia (ITC-Asia), 2020, pp. 59–64. doi:10.1109/ITC-Asia51099.2020.00022.
[371] M. Liu, K. Chakrabarty, Adaptive methods for machine learning-based testing of integrated circuits and boards, in: 2021 IEEE International Test Conference (ITC), 2021, pp. 153–162. doi:10.1109/ITC50571.2021.00023.
[372] A. B. Chowdhury, B. Tan, S. Garg, R. Karri, Robust deep learning for ic test problems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41 (2022) 183–195.
[373] H. Amrouch, A. B. Chowdhury, W. Jin, R. Karri, F. Khorrami, P. Krishnamurthy, I. Polian, V. M. van Santen, B. Tan, S. X.-D. Tan, Special session: Machine learning for semiconductor test and reliability, in: 2021 IEEE 39th VLSI Test Symposium (VTS), 2021, pp. 1–11. doi:10.1109/VTS50974.2021.9441052.
[374] S. Roy, S. K. Millican, V. D. Agrawal, Special session – machine learning in test: A survey of analog, digital, memory, and rf integrated circuits, in: 2021 IEEE 39th VLSI Test Symposium (VTS), 2021, pp. 1–14. doi:10.1109/VTS50974.2021.9441051.
[375] E. Sentovich, K. Singh, C. Moon, H. Savoj, R. Brayton, A. Sangiovanni-Vincentelli, Sequential circuit design using synthesis and optimization, in: Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers Processors, 1992, pp. 328–333. doi:10.1109/ICCD.1992.276282.
[376] B. Shakya, T. He, H. Salmani, D. Forte, S. Bhunia, M. Tehranipoor, Benchmarking of hardware trojans and maliciously affected circuits, Journal of Hardware and Systems Security 1 (2017) 85–102.
[377] E. İpek, S. A. McKee, R. Caruana, B. R. de Supinski, M. Schulz, Efficiently exploring architectural design spaces via predictive modeling, in: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, Association for Computing Machinery, New York, NY, USA, 2006, pp. 195–206. doi:10.1145/1168857.1168882.
[378] H.-J. Yoo, Mobile/embedded dnn and ai socs, in: 2018 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2018, p. 1. doi:10.1109/VLSI-DAT.2018.8373285.
[379] J.-T. Kong, Cad for nanometer silicon design challenges and success, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 12 (2004) 1132–1147.
[380] M. T. Bohr, Nanotechnology goals and challenges for electronic applications, IEEE Transactions on Nanotechnology 1 (2002) 56–62.
[381] Y.-W. Lin, Y.-B. Lin, C.-Y. Liu, Aitalk: a tutorial to implement ai as iot devices, IET Networks 8 (2019) 195–202.
[382] M. Song, K. Zhong, J. Zhang, Y. Hu, D. Liu, W. Zhang, J. Wang, T. Li, In-situ ai: Towards autonomous and incremental deep learning for iot systems, in: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018, pp. 92–103. doi:10.1109/HPCA.2018.00018.
[383] E. Eleftheriou, “in-memory computing”: Accelerating ai applications, in: 2018 48th European Solid-State Device Research Conference (ESSDERC), 2018, pp. 4–5. doi:10.1109/ESSDERC.2018.8486900.
[384] B. Yu, D. Z. Pan, T. Matsunawa, X. Zeng, Machine learning and pattern matching in physical design, in: The 20th Asia and South Pacific Design Automation Conference, 2015, pp. 286–293. doi:10.1109/ASPDAC.2015.7059020.
[385] H. Iwai, K. Kakushima, H. Wong, Challenges for future semiconductor manufacturing, International Journal of High Speed Electronics and Systems 16 (2006) 43–81.
[386] Vandana, A. Singh, Multi-objective test case minimization using evolutionary algorithms: A review, in: 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), volume 1, 2017, pp. 329–334. doi:10.1109/ICECA.2017.8203698.
[387] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, Heidelberg, 1995.
[388] S. Lathuilière, P. Mesejo, X. Alameda-Pineda, R. Horaud, A comprehensive analysis of deep regression, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 2065–2081.
[389] S. Obilisetty, Digital intelligence and chip design, in: 2018 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2018, pp. 1–4. doi:10.1109/VLSI-DAT.2018.8373256.
[390] M. Shafique, R. Hafiz, M. U. Javed, S. Abbas, L. Sekanina, Z. Vasicek, V. Mrazek, Adaptive and energy-efficient architectures for machine learning: Challenges, opportunities, and research roadmap, in: 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2017, pp. 627–632. doi:10.1109/ISVLSI.2017.124.
