

Understanding Graphs in EDA: From Shallow to Deep Learning


Yuzhe Ma, Zhuolun He, Wei Li, Lu Zhang, and Bei Yu
The Chinese University of Hong Kong (CUHK)
ABSTRACT
As the scale of integrated circuits keeps increasing, there has been a surge of research in electronic design automation (EDA) to keep technology node scaling on track. Graphs are of great significance in this evolution, since they are among the most natural abstractions for fundamental objects in EDA, such as netlists and layouts; many EDA problems are therefore essentially graph problems. Traditional approaches to these problems are mostly based on analytical solutions or heuristic algorithms, which require substantial design and tuning effort. With the emergence of learning techniques, tackling graph problems with machine learning or deep learning has become a promising way to further improve the quality of solutions. In this paper, we discuss a set of key techniques for conducting machine learning on graphs. In particular, a few challenges in applying graph learning to EDA applications are highlighted. Furthermore, two case studies are presented to demonstrate the potential of graph learning in EDA applications.

Figure 1: (a) A circuit; (b) The graph representation.
mathematical programming is another useful approach to utilize.
ACM Reference Format:
Yuzhe Ma, Zhuolun He, Wei Li, Lu Zhang, and Bei Yu. 2020. Understanding Graphs in EDA: From Shallow to Deep Learning. In Proceedings of the 2020 International Symposium on Physical Design (ISPD '20), March 29–April 1, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3372780.3378173
1 INTRODUCTION
The modern electronic design automation (EDA) flow is constituted by multiple stages. At each step, dedicated optimization is performed to achieve the desired quality of results (QoR). As integration keeps increasing, more and more design constraints are imposed, leading to performance bottlenecks and severe runtime overhead. Various optimization techniques have been proposed to improve or renovate the existing methodologies in the EDA flow.

The graph is one of the core subjects in numerous EDA problems and optimization algorithms. It is a mathematical structure that models pairwise relationships among items, which makes it a natural and powerful representation for many fundamental objects in EDA applications, such as Boolean functions, netlists, and layouts. Over the past few decades, many problems have been investigated by leveraging graph abstraction, and a rich set of elegant graph algorithms has been developed to solve them [1–7]. Graph algorithms can assist EDA problem-solving in a few ways. Firstly, once a problem is abstracted into a graph representation, numerous well-known graph algorithms can be directly applied, or slightly modified, to form a solution. However, it may still be non-trivial to develop an effective approach for complicated problems even after they are modeled with graphs. Besides, mathematical programming is another useful approach: by defining an objective function and a set of constraints over the constructed graph, mathematical optimization methodologies can be applied to derive a solution, where highly optimized software and libraries can be of great help. Many problems are commonly tackled with integer linear programming (ILP) [5, 6, 8], linear programming [9, 10], etc. However, given that many EDA problems are NP-hard and problem sizes are usually very large, efficiency is a major concern. Heuristic algorithms and approximation algorithms over graphs have therefore been intensively developed to achieve a balance between performance and efficiency [11, 12].

As traditional optimization becomes more and more sophisticated, data-driven learning techniques have drawn much attention in hardware design automation. Classical machine learning techniques require manual extraction of features, which are fed into a downstream learning model for training, and several such attempts have been made in EDA applications [13–17]. Deep learning has demonstrated that feature representations can be learned automatically in the presence of a large amount of data, which achieves noticeable performance gains in the EDA area [18–21]. However, a few issues emerge when it comes to graphs. On one hand, applying classical machine learning approaches on graphs relies on non-trivial heuristics for feature extraction to encode structural information, which takes substantial effort to achieve the desired performance. On the other hand, it is not straightforward to transfer the conventional grid-based operations (e.g., convolution, pooling, etc.) of deep neural networks to irregularly structured data like graphs. Recently, many approaches have emerged in graph learning research covering a wide range of sub-fields, including node classification [22–24], graph generation [25–27], and model robustness [28]. Inspired by such recent progress in graph learning, there are also a few attempts to apply graph learning techniques to certain problems in EDA [29], which demonstrate the great potential of graph-based learning methods in overcoming performance bottlenecks in the existing design flow and pushing forward advances in the industry.

Figure 2: Examples of routing graphs. (a) Channel model; (b) Channel graph; (c) Grid model; (d) Grid graph.

Figure 3: An example of a decomposition graph in the layout decomposition problem. (a) Layout pattern; (b) Decomposition graph; (c) Decomposition graph with stitch edges.

In this paper, a number of general graph-based learning techniques are briefly introduced. In particular, several special properties that are critical in circuit design are highlighted, including hypergraphs, graph heterogeneity, and scalability. We then focus on applying graph-based learning techniques to assist EDA applications. Two case studies of graph learning in EDA are presented, including testability analysis for design-for-testing and timing model selection in a netlist.

The rest of this paper is organized as follows. Section 2 reviews a set of graph-based problems and widely used graph algorithms in the EDA field. Section 3 introduces the fundamental ideas of graph learning and some conventional graph learning algorithms. Section 4 presents two case studies, and Section 5 concludes the paper.
2 TRADITIONAL GRAPH-BASED METHODOLOGIES IN EDA APPLICATIONS
The graph model is leveraged in a wide range of applications in the modern design flow, where it can greatly simplify problem formulation and algorithm analysis. On top of that, many problems in a typical EDA flow can be addressed effectively this way, e.g., technology mapping [1, 30, 31], testability analysis [2], circuit partitioning [3, 32], placement [4, 33], etc. The most intuitive modeling of a circuit is a graph whose nodes represent gates and whose edges represent wires, as shown in Figure 1. In addition, different ways of graph construction may be applied based on the characteristics of the application. In logic verification, a Boolean function is modeled with a rooted, directed graph [34]. Global routing leverages graphs to capture the adjacencies and capacities of the routing region, using the channel graph model [35] (Figures 2(a)–2(b)) or the grid graph model [8, 36] (Figures 2(c)–2(d)). In detailed routing, horizontal and vertical constraint graphs are used to model the relative positions of different nets in a channel routing instance [37]. In the post-layout stage, graph representation is still a powerful method. Layout decomposition is a challenging problem when using multiple patterning lithography for manufacturing at advanced technology nodes; the problem can be modeled using a conflict graph and a stitch graph [5, 6, 38–40], as shown in Figure 3.
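As a minimal illustration of this gate-and-wire modeling (our own sketch, not code from the paper), the following Python snippet builds a Figure 1-style directed graph from a toy netlist with networkx; the gate names and connectivity are invented:

```python
import networkx as nx

# A toy gate-level netlist: each gate maps to the gates its output drives.
# Gate names and connectivity are illustrative only.
netlist = {
    "A": ["B", "E"],
    "B": ["C"],
    "C": ["D"],
    "E": ["F"],
    "F": ["G"],
    "G": ["D"],
}

# Nodes represent gates and directed edges represent wires (cf. Figure 1).
g = nx.DiGraph()
for gate, fanout in netlist.items():
    for sink in fanout:
        g.add_edge(gate, sink)

print(g.number_of_nodes(), g.number_of_edges())  # 7 7
```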

Since many problems can be modeled as graphs, a convenient way to solve them is to apply graph algorithms directly. For example, critical path extraction can be transformed into finding the longest path in a weighted, directed graph, which can be solved with the Bellman-Ford algorithm [41, 42]. The construction of a power network with minimum wirelength can be modeled as a minimum spanning tree (MST) problem [42]. Other widely applied graph algorithms in EDA include network flow for placement [9, 10], graph partitioning [32], and graph coloring for layout decomposition [5, 6, 12, 39, 43, 44]. Besides, many NP-complete and NP-hard problems are prevalent in the EDA field. Considering that problem scales are huge in modern designs, it is difficult to derive optimal solutions, so heuristic algorithms, which empirically yield good solutions, are also commonly applied. Previous studies have shown that heuristic algorithms can achieve high-quality results efficiently in floorplanning [45, 46] and layout decomposition [12, 47, 48]. In addition, mathematical optimization, which optimizes the value of an objective function subject to a set of constraints, is another category of approaches used intensively to solve graph problems in EDA. For example, integer linear programming (ILP) is used for a variety of problems such as routing [49] and layout decomposition [6, 50].
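To make the critical-path claim concrete, here is a minimal sketch of longest-path computation on a weighted DAG in topological order (our illustration with made-up delays; negating the weights and running Bellman-Ford would give the same answer):

```python
from graphlib import TopologicalSorter

# Weighted DAG: edges[u] = [(v, delay), ...]; node names and delays are toys.
edges = {"in": [("g1", 2.0), ("g2", 3.0)],
         "g1": [("g3", 1.5)],
         "g2": [("g3", 2.5)],
         "g3": [("out", 1.0)],
         "out": []}

preds = {u: [] for u in edges}
for u, outs in edges.items():
    for v, w in outs:
        preds[v].append((u, w))

# Longest arrival time per node, visiting nodes in topological order.
order = TopologicalSorter({u: [p for p, _ in preds[u]] for u in edges}).static_order()
arrival = {}
for u in order:
    arrival[u] = max((arrival[p] + w for p, w in preds[u]), default=0.0)

print(arrival["out"])  # 6.5: the critical path in -> g2 -> g3 -> out
```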

Apart from traditional analytical and heuristic algorithms, a few attempts have exploited machine learning techniques to tackle graph-based problems. Most of these approaches use classical machine learning models, which are "shallow" learning in contrast to deep neural networks.

Samanta et al. [51] performed wire and buffer sizing based on support vector regression (SVR) to minimize variation on non-tree clock networks. In [52], a machine learning approach is proposed to model wire delay and slew based on the timing graph, using a set of classical analytical values extracted from the timing graph as regression features. A Gaussian-process-regression-based active learning flow is proposed for high-performance adder design space exploration based on the prefix graph representation [16]. A timing failure prediction technique is proposed in [53], given the netlist, the timing constraints, and the floorplan. In the place-and-route (P&R) stage, routability is a critical issue with a large impact on the final quality, and some previous works investigate routability estimation with a particular routing graph model based on manually extracted features. Qi et al. [54] and Zhou et al. [55] applied multivariate adaptive regression splines (MARS) to detailed routing congestion estimation. Pui et al. [17] proposed a hierarchical hybrid model for congestion estimation in FPGA placement, which consists of linear regression and support vector regression.

Figure 4: A graph neural network with k layers for embedding generation. The obtained embedding is fed to downstream tasks.
3 GRAPH REPRESENTATION WITH DEEP LEARNING METHODOLOGIES
The main challenge of conducting learning algorithms on graphs is how to encode the structural information of graphs, which has been intensively investigated in the machine learning community. In this section, we introduce a set of graph representation methods based on neural networks, and highlight some challenges in applying them to EDA applications.
3.1 Graph Learning with Neural Networks tion and simplification [22, 59].
Graph-based learning is a new approach to machine learning with a wide range of applications [56]. Before performing a certain task, a representation of each node or of the whole graph should be obtained first, which is known as an embedding and can be fed to downstream models, as shown in Figure 4.

The blossoming of deep learning in various disciplines has promoted the application of neural networks in graph learning. Typically, neural networks excel at extracting latent representations from Euclidean data, such as an image (a grid of pixels) or a text (a sequence of letters), whereas a graph lies in a non-Euclidean domain and can be quite irregular. It is therefore natural and necessary to extend deep learning approaches to graph data.

The seminal works on graph neural networks (GNNs) are mostly inspired by recurrent neural networks, where nodes recurrently exchange information with adjacent nodes until a stable state is reached. Formally, the hidden state of a node v is updated recurrently as follows:

h_v^{(t+1)} = \sum_{u \in \mathcal{N}_v} f\big(h_u^{(t)}, x_u, x_v, x_{uv}\big),    (1)

where h_v^{(t)} is the hidden state of node v at time step t, \mathcal{N}_v the set of adjacent vertices of v, x_v the feature of node v, x_{uv} the feature of edge uv, and f the parametric function for the local state transition, which should be carefully designed (specifically, as a contraction map [57]) to ensure convergence. Despite the conceptual significance, interest in recurrent GNNs was limited due to the restricted expressive power of a contractive operation and the heavy computational burden of reaching its equilibrium.

Given the drawbacks of recurrent GNNs, the emergence of Graph Convolutional Networks (GCNs) is no surprise. Taxonomically, GCNs fall into two categories, viz., spectral-based and spatial-based, with the former based on graph spectral analysis, while the latter inherits the paradigm of message passing from recurrent GNNs. We introduce the two approaches in the following paragraphs.

In the spectral approaches, graph convolution is defined [58] in the Fourier domain, where the eigen-decomposition of the graph Laplacian is computed. Specifically, the graph Laplacian is defined as L = I - D^{-1/2} A D^{-1/2} = V \Lambda V^{\top}, where I is the identity matrix, and D and A are the degree matrix and the adjacency matrix of the graph, respectively. Let g_{\theta}: \mathbb{R} \to \mathbb{R} be a filter defined on the graph spectrum \Lambda and f: \mathcal{V} \to \mathbb{R} be the features of nodes; graph convolution is then given by:

g_{\theta} * f = V \big(g_{\theta}(\Lambda) \odot V^{\top} f\big),    (2)

which respects the Convolution Theorem. Equivalently, we write g = \mathrm{diag}(g_{\theta}(\Lambda)), and a convolutional layer with multiple (f_l) channels is defined by:

H^{(l+1)} = \sigma\Big(\sum_{i=1}^{f_l} V \big(g_i V^{\top} H^{(l)}\big)\Big),    (3)

where H^{(l)} is the output of the previous convolutional layer with H^{(0)} := X the collection of node features, g_i the i-th trainable filter, and \sigma(\cdot) a nonlinear activation function. Note that filters in the spectral domain may not be localized, which can be alleviated with smoothing techniques [58]. Further, the computational complexity of this line of methods has been reduced through approximation and simplification [22, 59].
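To make Equation (2) concrete, here is a minimal numpy sketch (our own illustration; the graph and the exponential filter are arbitrary assumptions) that builds the normalized Laplacian of a small graph, eigen-decomposes it, and applies a filter on the spectrum:

```python
import numpy as np

# Adjacency matrix of a small undirected example graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt  # normalized Laplacian

# L is symmetric, so eigh yields L = V diag(lam) V^T with orthonormal V.
lam, V = np.linalg.eigh(L)

f = np.random.default_rng(0).standard_normal(len(A))  # scalar node features
g_of_lam = np.exp(-2.0 * lam)   # an arbitrary example filter g_theta

# Equation (2): transform to the spectral domain, filter, transform back.
out = V @ (g_of_lam * (V.T @ f))
print(out)
```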

Figure 5: An illustration of computing the embedding for a node with l = 2. (a) Graph; (b) Procedure to compute the embedding for node 1.

Spatial-based graph convolutions are defined based on the spatial relationship of nodes, where information is propagated and aggregated in a message-passing scheme. A representative work is GraphSAGE [23], which generates node embeddings by leveraging node feature information from the neighborhood. The fundamental procedure consists of two steps, i.e., aggregation and encoding, which can be formulated as:

h_{N(v)}^{(l)} \leftarrow \mathrm{AGGREGATE}_l\big(\{h_u^{(l-1)}, \forall u \in N(v)\}\big),    (4)

h_v^{(l)} \leftarrow \sigma\big(W^{(l)} \cdot h_{N(v)}^{(l)}\big),    (5)

where AGGREGATE_l in Equation (4) is the aggregation function applied to node v and its neighborhood N(v), and Equation (5) is the encoding operation consisting of an embedding projection and a non-linear activation. An example illustrating a 2-layer network generating the embedding for node 1 is depicted in Figure 5, where the encoding dimensions in layers l_1 and l_2 are d_1 and d_2, respectively. Specifically, if the mean function is selected as the aggregation function, the aggregation in layer l is equivalent to Equation (6):

H_{N(v)}^{(l)} = A \cdot H^{(l-1)} = \begin{bmatrix} 1 & w_1 & w_1 & w_1 & 0 \\ w_2 & 1 & 0 & 0 & w_1 \\ w_2 & 0 & 1 & 0 & 0 \\ w_2 & 0 & 0 & 1 & 0 \\ 0 & w_2 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} h_1^{(l-1)} \\ h_2^{(l-1)} \\ h_3^{(l-1)} \\ h_4^{(l-1)} \\ h_5^{(l-1)} \end{bmatrix},    (6)

where A is the adjacency matrix of the graph, H^{(l)} contains the embeddings of every node in the graph, and w_1 and w_2 are the weights for input edges and output edges, respectively. The two-step process can then be calculated with matrix operations:

H^{(l)} = \sigma\big((A \cdot H^{(l-1)}) \cdot W^{(l)}\big).    (7)

Note that random walk is often introduced as a sampling technique. For more representative work, see [60–62]; we refer readers to [57] for a more comprehensive survey of graph learning.
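The following minimal numpy sketch of one mean-aggregation layer (Equations (4)–(5)) is our own illustration, not the authors' implementation; the random graph and the dimensions d_0 = 8, d_1 = 16, d_2 = 4 are arbitrary:

```python
import numpy as np

def sage_layer(A, H, W):
    """One mean-aggregation layer: A is an [n, n] adjacency matrix, H the
    [n, d_in] node embeddings, W a [d_in, d_out] weight matrix."""
    deg = A.sum(axis=1, keepdims=True)
    h_neigh = (A @ H) / np.maximum(deg, 1.0)  # Equation (4): mean over N(v)
    return np.maximum(h_neigh @ W, 0.0)       # Equation (5) with ReLU

# Two layers generate each node's embedding from its 2-hop neighborhood,
# mirroring Figure 5.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(5, 5)).astype(float)
A = np.maximum(A, A.T)                        # symmetrize: undirected graph
H0 = rng.standard_normal((5, 8))
H1 = sage_layer(A, H0, rng.standard_normal((8, 16)))
H2 = sage_layer(A, H1, rng.standard_normal((16, 4)))
print(H2.shape)  # (5, 4)
```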
3.2 Challenges in EDA Applications

3.2.1 Scalability. Unlike conventional graph learning tasks, graph learning for EDA problems is prone to runtime overhead, considering that the scale of circuits keeps soaring. Similar to conventional CNNs, the most time-consuming process in the computation of a GCN is embedding generation. To tackle the issue of scalability, several attempts have been made toward efficient graph representation learning. In [63], a forward computation method with personalized PageRank is investigated to incorporate neighborhood features without an aggregation procedure. Besides, it has been pointed out that the inefficiency might be caused by duplicated computation under the GraphSAGE-like framework [64]. To address this, PinSAGE [64] is proposed to select important neighbors by random walk instead of aggregating all the neighbors, and a MapReduce pipeline is leveraged to maximize the inference throughput of a trained model. Recently, GraphZoom [65], a multi-level spectral framework, was proposed to improve both the accuracy and the scalability of unsupervised graph embedding algorithms. In addition to designing specific algorithms and models, there are also a few third-party libraries like DGL [66] that help users make their networks scalable.

3.2.2 Hypergraph. Another significant distinction is that hypergraphs are commonly used in EDA applications. Different from a regular edge that connects exactly two vertices, a hyperedge may connect more than two vertices. The basic idea of extending GCNs to handle hypergraphs is to approximate the structural information indicated by a hyperedge with normal edges. One way to perform convolution on a hypergraph is proposed in [67], which can be written as:

H^{(l+1)} = \sigma\big(D^{-1/2} Q W B^{-1} Q^{\top} D^{-1/2} H^{(l)} P\big),    (8)

where D and B are the degree matrices of the vertices and hyperedges, respectively, P and W are trainable weights, and Q is the incidence matrix of the hypergraph. Intuitively, Equation (8) works through clique expansion [67], one of the most widely used methods to handle a hyperedge: a hyperedge of size k is replaced with a k-clique. There are also a few variations of clique expansion, such as attention-based clique expansion [68], which assigns different weights to the generated normal edges through an attention mechanism. However, performing clique expansion on a hyperedge of size k requires O(k^2) edges.

In order to reduce the complexity and improve efficiency, the transformation procedure can be reduced by selection. Specifically, a few normal edges are selected from the edge set generated by clique expansion instead of keeping all the edges. The selection criterion is based on the assumption that nodes in the same hyperedge share similar features. Intuitively, an edge can be omitted if the representations of its two nodes are already similar during training, while nodes with relatively distinct representations should remain connected. Therefore, one criterion proposed in [69] is to select the node pair connected by the same hyperedge with the maximal feature difference:

(i, j) = \arg\max_{i, j \in e} \|h_i - h_j\|_2,    (9)

where h_i and h_j are the features of nodes i and j, respectively. By connecting (i, j) and {(i, u), (j, u) : u ∈ S_e}, where each u ∈ S_e (the other nodes of hyperedge e) is called a "mediator", the complexity can be reduced to O(k). With this scheme, the graph structure changes dynamically during training, since the node representations keep being updated. [70] also proposes a fast selection criterion which compares the differences of the input features of the nodes and selects the edges once at the beginning, so that the graph structure is fixed regardless of potential changes of the node representations during training; it requires a constant number of edges for the approximation.
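The following toy numpy sketch of Equation (8) is our own illustration under stated assumptions (a 4-node, 2-hyperedge incidence matrix, unit hyperedge weights, and random stand-ins for the trainable matrices):

```python
import numpy as np

# Incidence matrix Q of a toy hypergraph: 4 nodes, 2 hyperedges
# ({v0, v1, v2} and {v2, v3}); all names and sizes are illustrative.
Q = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([1.0, 1.0])                  # hyperedge weights (diagonal of W)
Dv = np.diag((Q * w).sum(axis=1))         # vertex degree matrix D
B = np.diag(Q.sum(axis=0))                # hyperedge degree matrix

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))           # node features
P = rng.standard_normal((3, 3))           # trainable projection (stand-in)

Dv_is = np.linalg.inv(np.sqrt(Dv))        # D^{-1/2} (Dv is diagonal)
prop = Dv_is @ Q @ np.diag(w) @ np.linalg.inv(B) @ Q.T @ Dv_is
H_next = np.maximum(prop @ H @ P, 0.0)    # Equation (8) with ReLU as sigma
print(H_next.shape)                       # (4, 3)
```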

3.2.3 Heterogeneous Graphs. Most of the existing works on graph representation focus on homogeneous graphs, in which all vertices are of the same type and all edges represent only one kind of relation. However, graphs in EDA can be constructed in a heterogeneous manner, with different types of nodes and edges. For example, in the multiple patterning lithography problem, a typical graph contains two types of edges, stitch edges and conflict edges, as shown in Figure 3(c). A conflict edge implies that the connected nodes tend to be assigned different colors, while a stitch edge implies that the connected nodes should have the same color. To address the issue of heterogeneity, several methodologies have been proposed based on the fundamental knowledge of learning on homogeneous graphs. The main difference lies in the selection of the neighborhood in the feature aggregation step.

Considering that there are multiple types of nodes in heterogeneous graphs, feature aggregation naturally involves aggregating nodes of the same type and of different types. Zhang et al. proposed HetGNN [71] to capture both structural and content heterogeneity in heterogeneous graphs. Aggregating the nodes of the same type in the neighborhood is done as:

e_v^{(k+1)} = \frac{\sum_{u \in N_t(v)} F\big(h_u^{(k)}\big)}{|N_t(v)|},    (10)

where N_t(v) is the neighborhood of node v with the same type t, and F(·) is a user-defined transformation function; e.g., a Bi-LSTM is used in [71] and a linear transformation is applied in [62]. Each node therefore has |O_V| type-specific embeddings in total, where O_V is the set of all node types in the graph. The embeddings of different types are then combined through an attention mechanism as follows:

h_v^{(k+1)} = \sum_{t \in O_V} \alpha_t e_v^{(k+1)},    (11)

where \alpha_t denotes the importance of node type t to node v.

Apart from node heterogeneity, there can also be multiple types of edges in a graph, which denote different relationships between items. In [72], a relational graph convolutional network (R-GCN) is proposed to deal with different types of relations (edges) in a graph. Essentially, a forward computation of a node in R-GCN is performed as:

h_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\Big),    (12)

where N_i^r denotes the set of neighbor indices of node i under relation r ∈ R, and c_{i,r} is a normalization constant which can be pre-determined or learned, according to [72].
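A minimal numpy sketch of one R-GCN update (Equation (12)), assuming one adjacency matrix per relation and c_{i,r} = |N_i^r|; the two relations and all values are illustrative, not from the paper:

```python
import numpy as np

def rgcn_layer(A_per_rel, H, W_per_rel, W0):
    """One R-GCN update (Equation (12)): A_per_rel holds one [n, n]
    adjacency matrix per relation; c_{i,r} is taken as |N_i^r|."""
    out = H @ W0                                   # self-connection term
    for A_r, W_r in zip(A_per_rel, W_per_rel):
        c = np.maximum(A_r.sum(axis=1, keepdims=True), 1.0)
        out += (A_r @ (H @ W_r)) / c               # normalized relation term
    return np.maximum(out, 0.0)                    # ReLU as sigma

# Toy example with two relations (e.g., conflict edges and stitch edges).
rng = np.random.default_rng(0)
A_conflict = rng.integers(0, 2, size=(6, 6)).astype(float)
A_stitch = rng.integers(0, 2, size=(6, 6)).astype(float)
H = rng.standard_normal((6, 4))
Ws = [rng.standard_normal((4, 4)) for _ in range(2)]
out = rgcn_layer([A_conflict, A_stitch], H, Ws, rng.standard_normal((4, 4)))
print(out.shape)  # (6, 4)
```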
Besides feature aggregation by node type and edge type, Wang et al. proposed HetGAN [73], which uses the concept of meta-paths to select neighbors. A meta-path indicates a composite relation and is able to represent a semantic relationship; e.g., Author-Paper-Author is one kind of meta-path in an academic graph and represents a co-author relationship. HetGAN uses node-level attention and semantic-level (meta-path-level) attention to learn the importance of each node and each meta-path, respectively:

z_i^{\Phi} = \sigma\Big(\sum_{j \in N_i^{\Phi}} \alpha_{ij}^{\Phi} \cdot h_j\Big),    (13)

where z_i^{\Phi} is the embedding of node i for meta-path \Phi, N_i^{\Phi} denotes the meta-path-based neighbors of node i, and \alpha_{ij}^{\Phi} is a weight coefficient calculated through an attention mechanism. Feature aggregation via Equation (13) is based on a single meta-path, so it is semantic-specific and captures one particular kind of semantic information. To combine the different semantic information reflected by different meta-paths, the importance of each semantic-specific embedding should be identified, which is calculated as:

w_{\Phi_i} = \frac{1}{|\mathcal{V}|} \sum_{i \in \mathcal{V}} q^{\top} \cdot \tanh\big(W \cdot z_i^{\Phi} + b\big),    (14)

which is essentially a non-linear transformation with an attention vector q. The importance coefficients are then normalized by a softmax function. One limitation of HetGAN is that the meta-paths must be pre-defined manually, which might not capture all meaningful meta-paths. Yun et al. proposed Graph Transformer Networks (GTN) [74], in which a meta-path is represented by matrix multiplication of soft adjacency matrices.
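A toy numpy sketch of the semantic-level attention in Equations (13)–(14), assuming the node-level step has already produced one embedding matrix per meta-path; all shapes and values here are our illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Three meta-path-specific embedding matrices for 6 nodes (d = 8 each),
# standing in for the outputs of the node-level attention step.
rng = np.random.default_rng(0)
Z = [rng.standard_normal((6, 8)) for _ in range(3)]
W, b = rng.standard_normal((8, 8)), rng.standard_normal(8)
q = rng.standard_normal(8)                # semantic attention vector

w_phi = np.array([np.mean(np.tanh(z @ W + b) @ q) for z in Z])  # Eq. (14)
beta = softmax(w_phi)                     # normalized importance per meta-path
h = sum(bi * zi for bi, zi in zip(beta, Z))  # fuse the semantic embeddings
print(h.shape)                            # (6, 8)
```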
4 CASE STUDIES
In this section, we present two case studies on applying graph learning to EDA applications: test point insertion and timing model selection.

4.1 Test Point Insertion
4.1.1 Problem Background. Built-in self-test (BIST) is an important technique in design-for-testing, whose purpose is to design additional features into integrated circuits that allow them to perform self-testing, so that controllability and observability can be improved. Test point insertion (TPI) is a broadly used approach that adds extra control points (CPs) or observation points (OPs) to the circuit. CPs can be used for setting signal lines to desired logic values, while OPs are added as scan cells to make a node observable. An example demonstrating the two kinds of insertion is given in Figure 6. In this case study, a GCN is applied to perform observation point insertion in a given netlist to improve the observability of a design. Essentially, this can be cast as a binary classification problem: for each gate in the design, determine whether an observation point should be added on its output port. A comprehensive study is in [29].

Figure 6: Example of test point insertion: (a) TPI with an AND/OR gate; (b) TPI with a multiplexer.

Table 1: Statistics of test point insertion benchmarks

Design   Bench1    Bench2    Bench3    Bench4
#Nodes   1384264   1456453   1416382   1397586
#Edges   2102622   2182639   2137364   2124516

Figure 7: Accuracy comparison with classical machine learning algorithms.
4.1.2 Implementation Details. SCOAP [75], a classical quantitative heuristic measurement for testability evaluation, is leveraged to set the initial features of each node. Specifically, each node is associated with a four-dimensional vector [LL, C0, C1, O]. LL represents the logic level of the corresponding gate; [C0, C1, O] correspond to controllability-0, controllability-1, and observability, respectively, which are calculated with the SCOAP method.

In order to demonstrate the superiority of the GCN model, we compare its classification accuracy against four classical learning models: logistic regression (LR), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP). Since classical machine learning models require handcrafted features extracted from a graph, neighborhood features are manually integrated by collecting the features of the nodes in the fan-in cone and the fan-out cone: 500 nodes in the fan-in cone and 500 nodes in the fan-out cone are collected using breadth-first search, and every time a node is visited, its feature is concatenated to the current feature vector, as sketched below.
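The cone-feature collection can be sketched as follows (our own illustration; the helper name, the predecessor map, and the tiny example netlist are assumptions, not the authors' code):

```python
from collections import deque

def cone_features(start, preds, feat, limit=500):
    """Collect up to `limit` SCOAP vectors from the fan-in cone of `start`
    by BFS and concatenate them; `preds` maps a gate to its predecessor
    gates and `feat` holds the [LL, C0, C1, O] vectors. The fan-out cone is
    handled the same way with a successor map."""
    seen, collected = {start}, []
    queue = deque(preds.get(start, ()))
    while queue and len(collected) < limit:
        v = queue.popleft()
        if v in seen:
            continue
        seen.add(v)
        collected.append(feat[v])               # concatenate on visit
        queue.extend(preds.get(v, ()))
    flat = [x for vec in collected for x in vec]
    flat += [0.0] * (4 * limit - len(flat))     # pad to a fixed length
    return flat

# Tiny usage example with made-up gates and SCOAP values.
preds = {"g3": ["g1", "g2"], "g1": [], "g2": []}
feat = {"g1": [1, 2, 3, 4], "g2": [1, 1, 2, 5], "g3": [2, 3, 3, 6]}
print(len(cone_features("g3", preds, feat)))    # 2000
```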
The node embedding generation is conducted similarly to GraphSAGE, consisting of three aggregation layers and three encoding layers whose dimensions are 32, 64, and 128, respectively. The classification is performed with a set of fully-connected layers whose dimensions are 64, 64, 128, and 2. Four industrial benchmarks (Bench1 – Bench4) are used in the experiments, whose statistics are shown in Table 1; the graph sizes are all in the million scale. To preserve the evaluation principle of a machine learning model, each time three designs are used for training and the remaining one is used for testing. The accuracy comparison is presented in Figure 7. On average, the GCN achieves significantly higher accuracy than all the classical machine learning models across the test designs.

Data visualization can help us judge whether the representation of a node is discriminative. We therefore visualize the node embeddings generated at different network depths, which denote the representations after integrating the features of the nodes in the 1-hop, 2-hop, and 3-hop neighborhoods, respectively. In this experiment, we visualize the feature representations obtained from the different encoders using t-SNE [76] for 1000 nodes, including 500 positive nodes and 500 negative nodes, as shown in Figure 8. It can be observed that the representations obtained for the positive class and the negative class become more discriminative as the search depth increases.
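The visualization step can be reproduced with scikit-learn; in this minimal sketch of ours the embeddings are random stand-ins for the GCN encoder outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-ins for 500 positive and 500 negative node embeddings
# (in the paper these come from the trained GCN encoders).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 1.0, (500, 128)),
                 rng.normal(2.0, 1.0, (500, 128))])

xy = TSNE(n_components=2).fit_transform(emb)  # [1000, 2] projection
print(xy.shape)
```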


Figure 8: Visualization of node embeddings with different search depths L. (a) L = 1; (b) L = 2; (c) L = 3.

4.2 Timing Model Selection
4.2.1 Problem Background. Gate sizing, an intermediate step that resizes instances, is a commonly used method to optimize the timing of a circuit. In a modern design flow, different nets require different models for delay calculation, such as a wire delay model or a buffer delay model with various parameters, to achieve accurate outputs, as shown in Figure 9. A conventional gate sizing flow suffers from inaccurate selection of the delay model for each net in the circuit, which relies on heuristics and is usually conservative. In this case study, our goal is to train a classification model such that the selection is more accurate than the heuristics.

Figure 9: Example of timing model selection. (a) Wire RC delay model; (b) Buffer delay model.

4.2.2 Experiment Details. A netlist is represented as a directed graph G(V, E). Each node v ∈ V represents a driver node and each edge e ∈ E represents the connection between two nodes. The dataset consists of four 7nm designs. An initial attribute feature vector with a dimension of 14 is supplied to each driver node, including the fan-out number, instance location, sensitivity, slew, arrival time, slack, the capacitance and resistances of the net and sink, and the delays of the net and arc. Labels of the nodes are generated by comparing and analyzing the netlists before and after the global optimization step in an industrial tool. Statistics of the designs are summarized in Table 2, where #POS and #NEG represent the numbers of nets using the buffer delay model and the wire RC delay model, respectively.

Table 2: Statistics of timing model selection benchmarks

Design   #Nodes   #Edges   #POS   #NEG
D1       49559    109118   2961   46598
D2       46548    105534   2168   44380
D3       45986    95423    2783   43203
D4       41943    90992    1808   40135
Given the dataset, a GCN can be trained based on the GraphSAGE-like framework [23]. A single GCN is trained for this task, which contains two steps of the aggregation-encoding process and a fully connected layer with a hidden dimension of 64. As in many classification problems in EDA applications, data imbalance is a severe issue. To resolve it, a two-stage GCN similar to [29] is leveraged, in which both models share the same structure. After the first GCN model is trained, its parameters are fixed, and the second one is then trained, initialized with the parameters obtained from the first stage.
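The two-stage scheme can be sketched as follows in PyTorch-style pseudocode; `GCN` and `train` are hypothetical placeholders (the paper does not publish its implementation), and the point is only the freeze-then-initialize flow:

```python
# Conceptual sketch only: `GCN`, `train`, `graph`, and `labels` are
# hypothetical placeholders, not the authors' code.
model_1 = GCN(in_dim=14, hidden_dim=64, num_classes=2)
train(model_1, graph, labels)                  # stage one on imbalanced data

for p in model_1.parameters():                 # freeze the stage-one model
    p.requires_grad = False

model_2 = GCN(in_dim=14, hidden_dim=64, num_classes=2)
model_2.load_state_dict(model_1.state_dict())  # initialize from stage one
train(model_2, graph, labels)                  # stage two training
```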
Table 3 shows the results of the numerical sizing baseline, the single-stage GCN, and the two-stage GCN. In every round of train-and-test, we select one design as the testing dataset, while the other three designs are split into training and validation datasets. The results show that the two-stage GCN achieves the highest F1-score among the three methods, which demonstrates the effectiveness of the GCN approach.

Table 3: Results on the benchmarks

Design    Model      F1-score   Precision   Recall
D1        Baseline   0.502      0.597       0.433
          GCN        0.561      0.466       0.706
          GCN-2      0.581      0.523       0.652
D2        Baseline   0.529      0.528       0.530
          GCN        0.462      0.326       0.791
          GCN-2      0.574      0.532       0.623
D3        Baseline   0.527      0.660       0.438
          GCN        0.526      0.396       0.782
          GCN-2      0.538      0.437       0.699
D4        Baseline   0.549      0.542       0.556
          GCN        0.497      0.364       0.785
          GCN-2      0.556      0.454       0.715
Average   Baseline   0.527      0.582       0.490
          GCN        0.511      0.388       0.766
          GCN-2      0.565      0.493       0.669

5 CONCLUSION
In this paper, we discussed a few key techniques for extending deep learning approaches to handle irregularly structured data and highlighted several challenges that are commonly encountered in EDA applications. Two case studies, on timing model selection and test point insertion, demonstrated the promising capabilities of graph learning in the circuit design domain.

Although significant improvements have been achieved, there are still many open questions. For example, paths in a graph can reveal important properties of the graph (e.g., critical paths in a circuit), which is distinct from current developments based on neighborhood aggregation. Handling paths in a graph with learning techniques may broaden the applicability of graph learning in the EDA domain. In addition, conventional learning algorithms focus on classification or regression tasks, which typically cannot directly yield the final solution to a combinatorial optimization problem. Leveraging graph learning to solve combinatorial problems might be a new direction for graph learning to play a role in the EDA domain and beyond.

ACKNOWLEDGMENT
The authors would like to thank Dr. Qinghua Liu from Cadence Design Systems and Dr. Mark H. Ren from NVIDIA Research for their valuable support and insightful comments on the completion of the case studies. This work is supported by The Research Grants Council of Hong Kong SAR (Project No. CUHK24209017).

REFERENCES
[1] K.-C. Chen, J. Cong, Y. Ding, A. B. Kahng, and P. Trajmar, "DAG-Map: Graph-based FPGA technology mapping for delay optimization," IEEE Design & Test of Computers, vol. 9, no. 3, pp. 7–20, 1992.
[2] K.-T. Cheng and C.-J. Lin, "Timing-driven test point insertion for full-scan and partial-scan BIST," in Proc. ITC, 1995, pp. 506–514.
[3] N. Selvakkumaran and G. Karypis, "Multiobjective hypergraph-partitioning algorithms for cut and maximum subdomain-degree minimization," IEEE TCAD, vol. 25, no. 3, pp. 504–517, 2006.
[4] B. Hu and M. Marek-Sadowska, "Fine granularity clustering-based placement," IEEE TCAD, vol. 23, no. 4, pp. 527–536, April 2004.
[5] A. B. Kahng, C.-H. Park, X. Xu, and H. Yao, "Layout decomposition approaches for double patterning lithography," IEEE TCAD, vol. 29, pp. 939–952, June 2010.
[6] B. Yu, K. Yuan, D. Ding, and D. Z. Pan, "Layout decomposition for triple patterning lithography," IEEE TCAD, vol. 34, no. 3, pp. 433–446, March 2015.
[7] D. Z. Pan, B. Yu, and J.-R. Gao, "Design for manufacturing with emerging nanolithography," IEEE TCAD, vol. 32, no. 10, pp. 1453–1472, 2013.
[8] M. Cho and D. Z. Pan, "BoxRouter: A new global router based on box expansion and progressive ILP," in Proc. DAC, 2006, pp. 373–378.
[9] Y. Lin, B. Yu, X. Xu, J.-R. Gao, N. Viswanathan, W.-H. Liu, Z. Li, C. J. Alpert, and D. Z. Pan, "MrDP: Multiple-row detailed placement of heterogeneous-sized cells for advanced nodes," IEEE TCAD, 2017.
[10] H. Li, W.-K. Chow, G. Chen, E. F. Young, and B. Yu, "Routability-driven and fence-aware legalization for mixed-cell-height circuits," in Proc. DAC, 2018, pp. 1–6.
[11] G. Chen and E. F. Young, "SALT: Provably good routing topology by a novel Steiner shallow-light tree algorithm," IEEE TCAD, 2019.
[12] S.-Y. Fang, Y.-W. Chang, and W.-Y. Chen, "A novel layout decomposition algorithm for triple patterning lithography," IEEE TCAD, vol. 33, no. 3, pp. 397–408, March 2014.
[13] H. Zhang, B. Yu, and E. F. Y. Young, "Enabling online learning in lithography hotspot detection with information-theoretic feature optimization," in Proc. ICCAD, 2016, pp. 47:1–47:8.
[14] H. Geng, W. Zhong, H. Yang, Y. Ma, J. Mitra, and B. Yu, "SRAF insertion via supervised dictionary learning," IEEE TCAD, 2019.

[15] W.-H. Chang, L.-D. Chen, C.-H. Lin, S.-P. Mu, M. C.-T. Chao, C.-H. Tsai, and Y.-C. Chiu, "Generating routing-driven power distribution networks with machine-learning technique," in Proc. ISPD, 2016, pp. 145–152.
[16] Y. Ma, S. Roy, J. Miao, J. Chen, and B. Yu, "Cross-layer optimization for high speed adders: A Pareto driven machine learning approach," IEEE TCAD, vol. 38, no. 12, pp. 2298–2311, 2018.
[17] C.-W. Pui, G. Chen, Y. Ma, E. F. Young, and B. Yu, "Clock-aware UltraScale FPGA placement with machine learning routability prediction," in Proc. ICCAD, 2017, pp. 929–936.
[18] H. Yang, S. Li, Y. Ma, B. Yu, and E. F. Young, "GAN-OPC: Mask optimization with lithography-guided generative adversarial nets," in Proc. DAC, 2018, pp. 131:1–131:6.
[19] H. Yang, J. Su, Y. Zou, Y. Ma, B. Yu, and E. F. Y. Young, "Layout hotspot detection with feature tensor generation and deep biased learning," IEEE TCAD, vol. 38, no. 6, pp. 1175–1187, 2019.
[20] Z. Xie, Y.-H. Huang, G.-Q. Fang, H. Ren, S.-Y. Fang, Y. Chen, and J. Hu, "RouteNet: Routability prediction for mixed-size designs using convolutional neural network," in Proc. ICCAD, 2018, pp. 80:1–80:8.
[21] H. Yang, P. Pathak, F. Gennari, Y.-C. Lai, and B. Yu, "DeePattern: Layout pattern generation with transforming convolutional auto-encoder," in Proc. DAC, 2019, pp. 148:1–148:6.
[22] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, 2016.
[23] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. NIPS, 2017, pp. 1024–1034.
[24] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proc. KDD, 2018, pp. 974–983.
[25] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei, "Scene graph generation by iterative message passing," in Proc. CVPR, 2017, pp. 5410–5419.
[26] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec, "Graph convolutional policy network for goal-directed molecular graph generation," in Proc. NIPS, 2018, pp. 6410–6421.
[27] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, "GraphRNN: Generating realistic graphs with deep auto-regressive models," in Proc. ICML, 2018, pp. 5694–5703.
[28] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, "Adversarial attack on graph structured data," in Proc. ICML, 2018, pp. 1123–1132.
[29] Y. Ma, H. Ren, B. Khailany, H. Sikka, L. Luo, K. Natarajan, and B. Yu, "High performance graph convolutional networks with applications in testability analysis," in Proc. DAC, 2019, p. 18.
[30] R. J. Francis, J. Rose, and K. Chung, "Chortle: A technology mapping program for lookup table-based field programmable gate arrays," in Proc. DAC, 1990, pp. 613–619.
[31] R. Brayton and A. Mishchenko, "ABC: An academic industrial-strength verification tool," in Proc. CAV, 2010, pp. 24–40.
[32] C. J. Alpert, A. E. Caldwell, A. B. Kahng, and I. L. Markov, "Hypergraph partitioning with fixed vertices," IEEE TCAD, vol. 19, no. 2, pp. 267–272, 2000.
[33] B. Yu, X. Xu, J.-R. Gao, Y. Lin, Z. Li, C. Alpert, and D. Z. Pan, "Methodology for standard cell compliance and detailed placement for triple patterning lithography," IEEE TCAD, vol. 34, no. 5, pp. 726–739, May 2015.
[34] R. E. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE TC, vol. 100, no. 8, pp. 677–691, 1986.
[35] J. Cong and P. H. Madden, "Performance driven global routing for standard cell design," in Proc. ISPD, 1997, pp. 73–80.
[36] C. Albrecht, "Provably good global routing by a new approximation algorithm for multicommodity flow," in Proc. ISPD, 2000, pp. 19–25.
[37] T. Yoshimura and E. S. Kuh, "Efficient algorithms for channel routing," IEEE TCAD, vol. 1, no. 1, pp. 25–35, 1982.
[38] K. Yuan, J.-S. Yang, and D. Z. Pan, "Double patterning layout decomposition for simultaneous conflict and stitch minimization," IEEE TCAD, vol. 29, no. 2, pp. 185–196, February 2010.
[39] H.-Y. Chang and I. H.-R. Jiang, "Multiple patterning layout decomposition considering complex coloring rules," in Proc. DAC, 2016, pp. 40:1–40:6.
[40] Y. Ma, J.-R. Gao, J. Kuang, J. Miao, and B. Yu, "A unified framework for simultaneous layout decomposition and mask optimization," in Proc. ICCAD, 2017, pp. 81–88.
[41] R. Bellman, "On a routing problem," Quarterly of Applied Mathematics, vol. 16, no. 1, pp. 87–90, 1958.
[42] L.-T. Wang, Y.-W. Chang, and K.-T. T. Cheng, Electronic Design Automation: Synthesis, Verification, and Test. Morgan Kaufmann, 2009.
[43] B. Yu and D. Z. Pan, "Layout decomposition for quadruple patterning lithography and beyond," in Proc. DAC, 2014, pp. 53:1–53:6.
[44] B. Yu, Y.-H. Lin, G. Luk-Pat, D. Ding, K. Lucas, and D. Z. Pan, "A high-performance triple patterning layout decomposer with balanced density," in Proc. ICCAD, 2013, pp. 163–169.
[45] C. K. Cheng, S. Z. Yao, and T. C. Hu, "The orientation of modules based on graph decomposition," IEEE TC, vol. 40, pp. 774–780, June 1991.
[46] C.-W. Sham, F. Y. Young, and C. Chu, "Optimal cell flipping in placement and floorplanning," in Proc. DAC, 2006, pp. 1109–1114.
[47] J. Kuang and E. F. Y. Young, "An efficient layout decomposition approach for triple patterning lithography," in Proc. DAC, 2013, pp. 69:1–69:6.
[48] Y. Yang, W.-S. Luk, D. Z. Pan, H. Zhou, C. Yan, D. Zhou, and X. Zeng, "Layout decomposition co-optimization for hybrid e-beam and multiple patterning lithography," IEEE TCAD, vol. 35, no. 9, pp. 1532–1545, 2016.
[49] M. Cho and D. Z. Pan, "BoxRouter: A new global router based on box expansion and progressive ILP," IEEE TCAD, vol. 26, no. 12, pp. 2130–2143, 2007.
[50] Y. Lin, X. Xu, B. Yu, R. Baldick, and D. Z. Pan, "Triple/quadruple patterning layout decomposition via linear programming and iterative rounding," JM3, vol. 16, no. 2, 2017.
[51] R. Samanta, J. Hu, and P. Li, "Discrete buffer and wire sizing for link-based non-tree clock networks," IEEE TVLSI, vol. 18, no. 7, pp. 1025–1035, 2009.
[52] A. B. Kahng, S. Kang, H. Lee, S. Nath, and J. Wadhwani, "Learning-based approximation of interconnect delay and slew in signoff timing tools," in Proc. SLIP, 2013, pp. 1–8.
[53] W.-T. J. Chan, K. Y. Chung, A. B. Kahng, N. D. MacDonald, and S. Nath, "Learning-based prediction of embedded memory timing failures during initial floorplan design," in Proc. ASPDAC, 2016, pp. 178–185.
[54] Z. Qi, Y. Cai, and Q. Zhou, "Accurate prediction of detailed routing congestion using supervised data learning," in Proc. ICCD, 2014, pp. 97–103.
[55] Q. Zhou, X. Wang, Z. Qi, Z. Chen, Q. Zhou, and Y. Cai, "An accurate detailed routing routability prediction model in placement," in Proc. ASQED, 2015, pp. 119–122.
[56] H. Cai, V. W. Zheng, and K. Chang, "A comprehensive survey of graph embedding: Problems, techniques and applications," IEEE TKDE, vol. 30, no. 9, pp. 1616–1637, 2018.
[57] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A comprehensive survey on graph neural networks," arXiv preprint arXiv:1901.00596, 2019.
[58] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," arXiv preprint arXiv:1312.6203, 2013.
[59] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. NIPS, 2016, pp. 3844–3852.
[60] A. Micheli, "Neural network for graphs: A contextual constructive approach," IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 498–511, 2009.
[61] J. Atwood and D. Towsley, "Diffusion-convolutional neural networks," in Proc. NIPS, 2016, pp. 1993–2001.
[62] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[63] A. Bojchevski, J. Klicpera, B. Perozzi, M. Blais, A. Kapoor, M. Lukasik, and S. Günnemann, "Is PageRank all you need for scalable graph neural networks?" 2019.
[64] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proc. KDD, 2018, pp. 974–983.
[65] C. Deng, Z. Zhao, Y. Wang, Z. Zhang, and Z. Feng, "GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding," arXiv preprint arXiv:1910.02370, 2019.
[66] M. Wang, L. Yu, D. Zheng, Q. Gan, Y. Gai, Z. Ye, M. Li, J. Zhou, Q. Huang, C. Ma et al., "Deep Graph Library: Towards efficient and scalable deep learning on graphs," arXiv preprint arXiv:1909.01315, 2019.
[67] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, "Hypergraph neural networks," in Proc. AAAI, vol. 33, 2019, pp. 3558–3565.
[68] S. Bai, F. Zhang, and P. H. Torr, "Hypergraph convolution and hypergraph attention," arXiv preprint arXiv:1901.08150, 2019.
[69] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar, "HyperGCN: A new method for training graph convolutional networks on hypergraphs," in Proc. NIPS, 2019, pp. 1509–1520.
[70] T.-H. H. Chan and Z. Liang, "Generalizing the hypergraph Laplacian via a diffusion process with mediators," Theoretical Computer Science, 2019.
[71] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, "Heterogeneous graph neural network," in Proc. KDD, 2019, pp. 793–803.
[72] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in Proc. ESWC, 2018, pp. 593–607.
[73] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, "Heterogeneous graph attention network," in Proc. WWW, 2019, pp. 2022–2032.
[74] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim, "Graph transformer networks," in Proc. NIPS, 2019, pp. 11960–11970.
[75] L. H. Goldstein and E. L. Thigpen, "SCOAP: Sandia controllability/observability analysis program," in Proc. DAC, 1980, pp. 190–196.
[76] L. v. d. Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
