LayouTransformer: Generating Layout Patterns with Transformer via Sequential Pattern Modeling
pattern design rules by explicitly modeling the shapes of polygons and the layouts among polygons. Specifically, we first propose a sequential pattern representation scheme that fully describes the geometric information of polygons by encoding the 2D layout patterns as sequences of tokens, i.e., vertexes and edges. Then we train a sequential generative model to capture the long-term dependency among tokens and thus learn the design rules from training examples. To generate a new pattern in sequence, each token is generated conditioned on the previously generated tokens, which may come from the same polygon or from different polygons in the same layout map. Our framework, termed LayouTransformer, is based on the Transformer architecture due to its remarkable ability in sequence modeling. Comprehensive experiments show that our LayouTransformer not only generates a large number of legal patterns but also maintains high generation diversity, demonstrating its superiority over existing pattern generative models.

[Figure 1: Illustration of existing image-based modeling and our proposed sequential modeling for pattern generation.]

1 INTRODUCTION

Diverse layout patterns of Very-Large-Scale Integration (VLSI) are of crucial significance for a variety of lithography design applications, including OPC recipes [1-4], source mask optimization, layout hotspot detection [5-8], and lithography simulation [9-11]. In recent years, machine learning has been widely adopted in various lithography design applications, giving rise to the requirement for large-scale and diverse layout patterns. Building such pattern libraries is extremely time-consuming due to the long logic-to-chip design cycle. To overcome this, researchers have paid increasing attention to synthesizing diverse layout patterns of VLSI.

Early works [12-14] propose simple deterministic methods to synthesize new patterns. For example, some predefined unit patterns are manually selected and combined into a new pattern. Another, simpler method is to rotate or flip existing patterns to obtain new patterns. Though effective to some extent, these methods only achieve minor augmentation of the original pattern library, and both the quantity and diversity of the synthesized patterns are limited.

To break this limitation, researchers exploit powerful machine learning methods to model the design process of patterns through deep neural networks. They regard the pattern generation problem as image generation of layout maps, each with multiple polygons. Nevertheless, image-based modeling for layout pattern generation may suffer from two drawbacks. First, the pixel-level modeling of pattern geometric information is insufficient to capture the shapes and layouts of patterns, leading to poor legality of the generated patterns, as shown in Figure 1 (upper). To alleviate this, a time-consuming post-processing step is needed, which smooths the noisy patterns based on a Conditional Generative Adversarial Network (CGAN) [15] to reduce the risk of violating DRC rules. Second, the diversity of patterns generated by existing image-based generative models is limited. Generative Adversarial Networks (GANs)

* Equal contribution

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICCAD '22, October 30-November 3, 2022, San Diego, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9217-4/22/10 . . . $15.00
https://doi.org/10.1145/3508352.3549350
ICCAD ’22, October 30-November 3, 2022, San Diego, CA, USA Liangjian Wen*, Yi Zhu*, Lei Ye, Guojin Chen, Bei Yu, Jianzhuang Liu, and Chunjing Xu
and 𝑐_𝑦 are the numbers of scan lines subtracted by one along the 𝑥-axis and 𝑦-axis, respectively. Hence, following [17], the pattern diversity is defined as the Shannon entropy [27] of the distribution of the pattern complexities as follows:

Definition 1 (Pattern Diversity). Pattern diversity, denoted by 𝐻, is the Shannon entropy of the pattern complexity sampled from the pattern library,

    𝐻 = − Σ_𝑖 Σ_𝑗 𝑃(𝑐_𝑥𝑖, 𝑐_𝑦𝑗) log 𝑃(𝑐_𝑥𝑖, 𝑐_𝑦𝑗),    (1)

where 𝑃(𝑐_𝑥𝑖, 𝑐_𝑦𝑗) denotes the probability of a pattern with complexity (𝑐_𝑥𝑖, 𝑐_𝑦𝑗) sampled from the library.

According to this definition, the larger the entropy of the pattern complexity in the pattern library, the more diverse the patterns.

Pattern Validity. Synthesizing realistic layout patterns is another essential target of pattern generation. This implies that the generated patterns should closely resemble the real patterns. To evaluate how realistic the generated patterns are, [24] defines pattern validity as follows:

Definition 2 (Pattern Validity). Pattern validity is the ratio of realistic patterns to total patterns.

Section 6.3 gives a scheme to measure whether a generated pattern is realistic. Based on the above evaluation metrics for pattern generation, the fundamental framework of pattern generation can be formulated as follows:

Definition 3 (Pattern Generation). Given a set of real layout patterns and design rules, pattern generation aims to establish a legal pattern library such that the pattern diversity and the pattern validity of the layout patterns in the library are maximized.

4 OVERVIEW OF OUR FRAMEWORK

Learning to generate diverse and legal layout patterns in 2D maps is challenging since it is difficult to capture and follow the many design rules about the geometric shapes and spatial layouts of the patterns. In this work, to the best of our knowledge, we for the first time regard the 2D pattern generation problem as a 1D sequence generation problem. To achieve this, we design a novel sequential pattern generation framework that can well capture the pattern design rules and generate a large variety of new layout patterns. As shown in Figure 4, given a set of layout examples, we first extract their sequential pattern representations, where each pattern is serialized as tokens of the starting point and the directions and offsets of edges. Then the extracted sequences are encoded and fed, in an auto-regressive manner, into a sequential generative model built upon the Transformer architecture, a successful model in generating well-structured and grammar-correct sentences. Trained on a dataset of serialized patterns, our auto-regressive generative model can generate new layout patterns that are legal and diverse.

5 SEQUENTIAL PATTERN GENERATION

Transformers have shown great potential in sequence modeling due to their remarkable ability in capturing long-term dependencies of sequences. As a result, Transformers have become a dominant architecture for natural language processing, where the model is required to capture long sequence features, understand the structure of language, maintain common grammar rules, and meanwhile enable parallel training. Inspired by the unprecedented success of Transformer-based language generation models, e.g., GPT [19], BERT [20], and XLNet [21], we take advantage of the sequence modeling ability of the Transformer architecture and design a novel sequence generation framework for generating layout patterns.

To achieve this, we first propose the sequential pattern representation that converts the 2D layout patterns into 1D sequences (Section 5.1). Then, we formulate the pattern generation problem from the perspective of sequential modeling (Section 5.2), and design a sequential pattern generation model based on the Transformer architecture (Section 5.3). The training and generating procedures are described in Section 5.4 and Section 5.5, respectively.

5.1 Sequential Pattern Representation

In this subsection, we first give a brief introduction of the recently proposed squish pattern [18], a 2D pattern representation for generative models. After that, we present our sequential pattern representation that converts the 2D patterns into 1D sequences.

The squish pattern representation compresses a layout image into a pattern topology matrix T and two geometric vectors 𝜹_𝑥 and 𝜹_𝑦. As shown in Figure 3(a), the topology T is a binarized matrix where 1 indicates the existence of a polygon and 0 otherwise. The vectors 𝜹_𝑥 and 𝜹_𝑦 describe the widths and heights of the grids along the 𝑥 and 𝑦 axes, respectively. Such a compositional representation requires the layout generation models to generate the topology matrix and the geometric vectors jointly while maintaining pattern legality, which is quite challenging for modern generative models. Existing layout generation learning models [17, 24] learn to generate layout topologies with the geometric vectors fixed, which limits the diversity and flexibility of pattern generation.

In contrast, our proposed sequential patterns represent the layout maps in a more compact and flexible way. As shown in Figure 3(b), each polygon is represented as a sequence containing the coordinates of the starting point, and the directions and offsets for walking through all the edges in turn from the starting point in a counterclockwise direction. Specifically, each polygon is delimited by a pair of [𝑆𝑂𝑃] and [𝐸𝑂𝑃] tokens indicating the start and the end of the pattern sequence, respectively. The starting point (𝑥₀, 𝑦₀) is the upper-left corner of the polygon. We define four directional tokens to represent the directions "up", "down", "left" and "right". Each directional token is followed by an offset value indicating the length of the edge. The coordinates of the starting point and the offsets are discretized into integers in [1, 𝑁], with 1 being the minimum distance unit and 𝑁 being the maximum length of the layout patterns. Multiple polygons are contained within a pair of tokens [𝑆𝑂𝐵] and [𝐸𝑂𝐵] denoting the head and the tail of a layout block. In the sequential representation of the layout block, the polygons are sorted in ascending order according to the 𝑦-coordinate of their upper-left corners; if two polygons have the same 𝑦-coordinate, the one with the smaller 𝑥₀ comes first. By analogy with language, we define the vocabulary as the set of all possible values of the tokens in the pattern sequences. Hence, the size of the vocabulary is equal to 𝑁 + 8, where 8 is the number of
[Figure 3: (a) The squish pattern representation: a topology matrix T with geometry vectors δ_x = [a1, a2, a3, a4] and δ_y = [b1, b2, b3, b4]. (b) Our sequential pattern representation: a block sequence [SOB], P1, ..., Pk, [EOB] ("Start Of Block" / "End Of Block"), where each polygon sequence has the form [SOP], x0, y0, d1, ..., [EOP] ("Start Of Pattern" / "End Of Pattern").]
[Figure 4: pipeline from "Real Layout Patterns" through "Sequential Pattern Representation", "Token Embedding", "Train Sequential Generative Model", "Sampling", and "Sequence Decoding" to "New Layout Patterns".]
Figure 4: Overview of our LayouTransformer framework.

[Figure 5: one Transformer block: input X plus positional embedding; multi-head attention over Q, K, V (MatMul, Scale, Softmax, MatMul); Add & Norm; feed forward; Add & Norm.]
the tokens, including [𝑆𝑂𝑃], [𝐸𝑂𝑃], [𝑆𝑂𝐵], [𝐸𝑂𝐵], "up", "down", "left" and "right".

Figure 5: Illustration of the Transformer block.
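As a concrete illustration of this representation, encoding a rectilinear polygon into such a token sequence takes only a few lines. The sketch below is illustrative, not the paper's code; it assumes image-style coordinates in which 𝑦 grows downward, so "down" increases 𝑦:

```python
def encode_polygon(vertices):
    """Encode a rectilinear polygon as ['SOP', x0, y0, dir, offset, ..., 'EOP'].

    `vertices` lists the corners in walk order, starting from the upper-left
    corner; y is assumed to grow downward, as in image coordinates.
    """
    x0, y0 = vertices[0]
    seq = ['SOP', x0, y0]
    closed = vertices + [vertices[0]]          # walk back to the start
    for (xa, ya), (xb, yb) in zip(closed, closed[1:]):
        dx, dy = xb - xa, yb - ya
        assert (dx == 0) != (dy == 0), "edges must be axis-aligned"
        if dx > 0:
            seq += ['right', dx]
        elif dx < 0:
            seq += ['left', -dx]
        elif dy > 0:
            seq += ['down', dy]
        else:
            seq += ['up', -dy]
    seq.append('EOP')
    return seq

def encode_block(polygons):
    """Wrap polygon sequences in ['SOB', ..., 'EOB'], sorted by upper-left
    corner: ascending y first, then ascending x on ties."""
    seq = ['SOB']
    for poly in sorted(polygons, key=lambda p: (p[0][1], p[0][0])):
        seq += encode_polygon(poly)
    seq.append('EOB')
    return seq
```

For a 291 × 267 rectangle anchored at the origin, this yields 'SOP', 0, 0, 'down', 267, 'right', 291, 'up', 267, 'left', 291, 'EOP', matching the token style of the sequences shown in Figure 6.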
[Figure 6: (a) Training stage: training patterns are converted by "Extracting Sequential Patterns" into token embeddings and fed through Transformer blocks 1..K. (b) Generating stage: tokens are sampled auto-regressively into sequences such as 'SOB', 'SOP', 0, 0, 'down', 267, 'right', 291, 'up', 80, 'left', 165, 'up', 96, 'right', 63, 'up', 91, 'left', 189, 'EOP', 'SOP', 457, 0, 'down', 267, 'right', 34, 'up', 112, 'right', 18, 'up', 155, 'left', 52, 'EOP', ..., 'EOB', which are decoded into generated patterns.]
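The "Sequence Decoding" stage of Figure 4 inverts the representation sketched above: a token sequence is walked back into polygon corners. A minimal decoder under the same assumptions (illustrative names, 𝑦 growing downward) could look like:

```python
def decode_polygon(seq):
    """Turn ['SOP', x0, y0, dir, offset, ..., 'EOP'] back into corner coordinates."""
    assert seq[0] == 'SOP' and seq[-1] == 'EOP'
    x, y = seq[1], seq[2]
    vertices = [(x, y)]
    step = {'right': (1, 0), 'left': (-1, 0), 'down': (0, 1), 'up': (0, -1)}
    tokens = iter(seq[3:-1])
    for direction, offset in zip(tokens, tokens):   # consume (dir, offset) pairs
        dx, dy = step[direction]
        x, y = x + dx * offset, y + dy * offset
        vertices.append((x, y))
    if vertices[-1] == vertices[0]:                  # a legal walk closes the loop
        vertices.pop()
    return vertices
```

A sequence whose walk does not return to (x0, y0) corresponds to one of the incomplete polygons discussed in Section 6.5.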
the resulting token embeddings at all attention heads are concatenated and once again projected to the input dimension via a learnable weight matrix 𝑾_𝑝 ∈ R^((𝐻×𝑑)×𝑑). For each 𝒙_𝑖 ∈ 𝑿, we obtain the output embedding 𝒙̂_𝑖 as:

    𝑨_ℎ(𝒙_𝑖, 𝒙_𝑗) = softmax((𝑾_ℎ^𝑄 𝑿)(𝑾_ℎ^𝐾 𝑿)^⊤ / √𝑑),    (3)

    𝒙̃_𝑖 = concat_{ℎ=1}^{𝐻} { Σ_{𝑗=1}^{𝑁} 𝑨_ℎ(𝒙_𝑖, 𝒙_𝑗)(𝑾_ℎ^𝑉 𝑿) },    (4)

    𝒙̂_𝑖 = 𝑾_𝑝 𝒙̃_𝑖.    (5)

Beyond the Transformer, our model introduces two extra mechanisms for sequential pattern generation learning. The first is masked self-attention, where each token is only allowed to attend to the tokens preceding it in the sequence, since the subsequent tokens are unknown during token generation. The second is segment-level recurrence, which is inspired by recurrent neural networks and was first proposed in Transformer-XL [22]. In practice, if a sequence is very long, it is divided into several segments of a fixed length to reduce the computation cost. This mechanism enables the learning of dependencies across segments without disrupting their coherence. By doing this, our model can well capture the long-term relationships among sequences of different polygons and even polygons from different layout maps.

5.4 Training LayouTransformer

Here we present the training process of our sequential pattern generation model, as shown in Figure 6(a). Given a layout map 𝐼, the patterns on it are first encoded as a sequence 𝒕 = {𝑡_1, ..., 𝑡_𝐿} = {[𝑆𝑂𝐵], [𝑆𝑂𝑃], 𝑥₀, 𝑦₀, ..., [𝐸𝑂𝐵]}. The tokens in the sequence 𝒕 are fed into a learnable word embedding layer 𝑓(·), resulting in a set of embeddings 𝑺 = {𝒔_1, ..., 𝒔_𝐿}, 𝒔_𝑖 = 𝑓(𝑡_𝑖) ∈ R^𝑑. The word embedding layer aims to learn an embedding for each token in the vocabulary 𝑉, which contains all possible tokens. We also employ positional embeddings to encode the relative positions of the tokens as in [22], obtaining a set of embeddings 𝑬 = {𝒆_1, ..., 𝒆_𝐿}, 𝒆_𝑖 ∈ R^𝑑. Note that the positional embeddings are crucial for modeling the shapes and spatial layouts of patterns, since the relative position information of a token provides a strong prior for predicting a token, e.g., the next token after [𝑆𝑂𝐵] must be [𝑆𝑂𝑃].

The model learns to predict each token 𝑡_𝑖 based on the previous tokens {𝑡_1, ..., 𝑡_{𝑖−1}}. For example, as shown in Figure 6(a), the token 𝑡_𝑖 = 𝑥₀ in the sequence 𝒕 is determined based on the given subsequence {[𝑆𝑂𝐵], [𝑆𝑂𝑃]}. First, the token embeddings in 𝑺 and the positional embeddings in 𝑬 are added together and fed into the Transformer model. Then, the model outputs {𝒛_1, ..., 𝒛_{𝑖−1}} and uses the last hidden state 𝒛_{𝑖−1} to predict token 𝑡_𝑖. This procedure is represented as:

    {𝒛̂_1, ..., 𝒛̂_{𝑖−1}} = {𝒔_1 + 𝒆_1, ..., 𝒔_{𝑖−1} + 𝒆_{𝑖−1}},    (6)

    {𝒛_1, ..., 𝒛_{𝑖−1}} = TF({𝒛̂_1, ..., 𝒛̂_{𝑖−1}}),    (7)

where 𝒛̂_𝑖, 𝒛_𝑖 ∈ R^𝑑. We use the cross entropy loss between the conditional distribution 𝑝(𝑡_𝑖 | 𝑡_1, ..., 𝑡_{𝑖−1}) and the one-hot distribution of the target token 𝑡_𝑖 to maximize the prediction of 𝑡_𝑖, thus optimizing the parameters of our model, where

    𝑝(𝑡_𝑖 | 𝑡_1, ..., 𝑡_{𝑖−1}) = exp(𝒛_{𝑖−1}^⊤ 𝑓(𝑡_𝑖)) / Σ_{𝑡∈𝑉} exp(𝒛_{𝑖−1}^⊤ 𝑓(𝑡)).    (8)

In the training process, the predictions of different tokens are independent, so they can be calculated in parallel, meaning that LayouTransformer can be trained efficiently.

5.5 Generating New Patterns

After training, our model can generate new patterns in an auto-regressive manner, i.e., each token is generated based on the preceding generated ones, as shown in Figure 6(b). Given a starting prompt as input, e.g., "[𝑆𝑂𝐵], [𝑆𝑂𝑃]", the prompt is first tokenized and encoded via the learned word embedding layer, and then fed into the trained Transformer blocks. Based on the output vectors, we obtain the categorical distribution 𝑝(𝑡_3 | 𝑡_1 = [𝑆𝑂𝐵], 𝑡_2 = [𝑆𝑂𝑃]) over the possible values of the next token using the softmax function. From this distribution, a token 𝑥₀ is sampled as the generated next token, which is then regarded as an input token for predicting the subsequent token. Then, "[𝑆𝑂𝐵], [𝑆𝑂𝑃], 𝑥₀" is fed back into the model to continue the generation. This process is repeated until the token [𝐸𝑂𝐵], which indicates the end of the generation of a layout map, is encountered.
Here we employ Nucleus Sampling [29] as our sampling strategy to enable generation diversity while maintaining legality. It selects the smallest possible set of top-ranked tokens such that the sum of their probabilities is greater than a threshold 𝜖. The probabilities of the remaining tokens are set to 0, and the probabilities of the tokens in the set are re-scaled so that they sum to 1. From the rescaled distribution, we sample a token for generation. When the model is very certain about a few tokens, the potential candidate set is small, e.g., when generating a "[𝑆𝑂𝑃]" after a "[𝑆𝑂𝐵]". Otherwise, there will be many potential candidate tokens, e.g., when generating "𝑥₀" after a "[𝑆𝑂𝑃]"; sampling among them generates diverse polygons in the block.

6 EXPERIMENTS

6.1 Experimental Setup

Dataset: We follow [23] to obtain the dataset of small layout pattern images with a size of 2048 × 2048 nm², by splitting a 160 × 400 μm² layout map from the ICCAD 2014 contest. 80% of the images are randomly selected as the training set while the others serve as the validation set for model training.

Network Configuration: In our sequential pattern generation framework, the vocabulary size is 2056, containing 2048 possible integers for coordinates and offsets, 4 directional tokens (up, down, left, right), and 4 tokens indicating the start or end of a polygon or a block. The dimensionality of the word and positional embeddings is set to 512. Our Transformer architecture has 𝐾 = 6 blocks, and each multi-head attention module has 8 attention heads. The lengths of memory and prediction of our Transformer model are both set to 512. The dimensionality of the hidden states is also 512, and the intermediate size of the feed-forward networks is 1024.

Training Details: Our sequential pattern generation is implemented with PyTorch [30]. The training of our model runs for 40K steps, about 42 epochs with batch size 22. We employ the Adam optimizer with 𝛽₁ = 0.9 and 𝛽₂ = 0.999. The learning rate is set to 3e−4, with a warm-up of 2000 steps and cosine learning rate decay. The rates of the residual dropout and attention dropout are both set to 0.1. The total training procedure takes about four hours using 8 Nvidia Tesla V100 32GB GPUs.

Table 1: Comparison with recent learning-based methods.

Set/Method              | Generated Patterns | Diversity (𝐻) | Legal Patterns | Legal Diversity (𝐻)
Real patterns           | 13869              | 10.7767       | −              | −
CAE [17]                | 100000             | 4.5875        | 19             | 3.7871
VCAE [24]               | 100000             | 10.9311       | 2126           | 9.9775
CAE+LegalGAN [24]       | 100000             | 5.8465        | 3740           | 5.8142
VCAE+LegalGAN [24]      | 100000             | 9.8692        | 84510          | 9.8669
LayouTransformer (Ours) | 100000             | 10.532        | 89726          | 10.527

6.2 Pattern Diversity and Legality

We generate 100000 patterns from the trained model and evaluate their diversity and legality. The legality is checked via the tool KLayout based on the design rules described in Section 3. After that, we obtain the DRC-clean legal patterns and further evaluate their diversity.

We compare our LayouTransformer with several learning-based baselines. CAE [17] is a vanilla convolutional auto-encoder model. VCAE [24] is a variational convolutional auto-encoder model. The LegalGAN [24] model can legalize most of the illegal patterns. In Table 1, the VCAE baseline achieves larger diversity (10.9311) than our method (10.532) on the generated patterns. However, most of the VCAE-generated patterns violate the DRC rules (only 2126 patterns are legal), which renders this larger diversity value invalid. Furthermore, our method obtains the largest number of DRC-clean patterns (89726 out of 100000) and the best diversity (10.527) on the legal patterns, which demonstrates the superiority of our sequential pattern generation model. Note that we do not use LegalGAN to correct illegal patterns. Because our sequential pattern representation explicitly encodes the complete geometric information of each polygon and the spatial layout of multiple polygons, our trained LayouTransformer can well capture the design rules and generate DRC-clean patterns. The compared baselines typically use the squish pattern representation and learn a model to generate the topology map with the geometry fixed, which limits the diversity of pattern generation to some extent. In contrast, since LayouTransformer models the distribution of layout patterns both topologically and geometrically, sampling from it enables better diversity.

We show some randomly selected examples of the training and generated patterns in Figure 7 and Figure 8, respectively. From these examples, we can observe that: 1) different from VCAE, where the geometry vectors only come from the training patterns, our method can generate patterns with new geometries that are never seen in the training set; 2) the complexity of our generated polygons is consistent with the training samples, without an undesirable significant style gap between them; 3) when a layout map contains many complex patterns, they can be well arranged at legal locations with regard to their surrounding patterns.

Table 2: Pattern validity comparison. The 10787 training patterns from the real patterns are used to train the detection model [23].

Set/Method              | Legal Patterns | Validity (𝑇 = 0.6) | Validity (𝑇 = 0.7) | Validity (𝑇 = 0.8)
Training patterns       | 10787          | 0.8998             | 0.9702             | 0.9904
CAE+LegalGAN [24]       | 3740           | 0.0003             | 0.0027             | 0.0167
VCAE+LegalGAN [24]      | 84510          | 0.5430             | 0.7840             | 0.9057
LayouTransformer (Ours) | 89726          | 0.8416             | 0.9438             | 0.9834

6.3 Pattern Validity

In this subsection, we evaluate how realistic the patterns generated by LayouTransformer are. We adopt the pattern style detection model proposed by [24] to verify the validity of the patterns.

The key idea of the model is that realistic patterns with a particular layout style are viewed as normal; otherwise they are anomalous. Hence, pattern style detection is regarded as anomaly detection on generated layouts. Specifically, a CGAN [15] is trained to learn the distribution of normal samples. Since the encoder is trained on normal samples, it fails to learn anomalous features, and the trained generator is unable to reconstruct an anomalous pattern from its encoding. Hence, we can detect whether a given pattern is realistic by the difference of features between the pattern and its reconstructed
Figure 7: Examples of training patterns (white regions represent the metal shapes).
Figure 8: Examples of generated patterns (white regions represent the metal shapes).
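The diversity values reported in Table 1 follow Definition 1: the Shannon entropy of the empirical distribution of per-pattern complexity pairs. A minimal sketch of this computation (base-2 logarithms are an assumption here; the paper does not state the base):

```python
from collections import Counter
from math import log2

def pattern_diversity(complexities):
    """Shannon entropy H of the empirical distribution of (c_x, c_y)
    complexity pairs over a pattern library (Definition 1, Eq. (1))."""
    counts = Counter(complexities)
    total = len(complexities)
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

Two equally frequent complexity classes give H = 1 bit, while a library concentrated on a single class gives H = 0, matching the intuition that larger entropy means a more diverse library.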
one. This difference is defined as the pattern anomaly score. If a pattern 𝑥 is realistic, its 𝑠𝑐𝑜𝑟𝑒(𝑥) should satisfy:

    𝑠𝑐𝑜𝑟𝑒(𝑥) < 𝑇,    (9)

where 𝑇 is a pre-defined threshold that determines the realism of a generated pattern.

For fair comparison, we adopt the same pre-trained pattern style detection model as in [24] to compute the anomaly scores of layout patterns generated by LayouTransformer. The legal patterns among the 100000 generated patterns are used to evaluate the pattern validity. We also cite the numbers of legal patterns (out of 100000) and the validity values obtained by [24] in Table 2. From this table, we can see that our LayouTransformer achieves the highest pattern
[Figure 9: histogram comparing the frequency of training and generated patterns over the number of polygon points (4, 5, 6, 7, 8, 9, 10~20, 20~130).]
Figure 9: Comparison of the polygon shape distributions (indicated by the numbers of polygon vertexes) of the training set and the generated set.

Figure 10: Some layout patterns with incomplete polygons.
validity with different thresholds, even approaching the validity of the real training patterns. The comparison between the training and generated patterns in Figure 7 and Figure 8 also shows that our generated patterns have high pattern validity from a human's view.

6.4 Statistical Analysis of the Polygon Shape Distribution

The diversity of generated layout patterns plays a crucial role in various lithography design applications. Since the pattern complexity is defined as the number of scan lines subtracted by one along the 𝑥-axis and 𝑦-axis, the Shannon entropy of the pattern complexity only measures the difference among layout patterns. If the generated patterns with a high Shannon entropy of the pattern complexity are composed of only simple polygons, these generated patterns cannot improve lithography design applications, because the generated layout patterns need to have the same complexity of polygons as the training patterns.

Hence, we further evaluate the diversity of the layout patterns generated by our method by comparing the polygon shape distribution of the generated patterns with that of the real training patterns. Specifically, the shape of a polygon is determined by its vertexes. For simplicity, we count the number of vertexes of a polygon as its shape representation. Figure 9 shows the comparison of the polygon shape distributions in the training set and the generated set. We observe that the generated set has a similar distribution of polygon complexity to the training set. This further demonstrates the ability of LayouTransformer to generate diverse layout patterns whose shape distribution follows that of the training samples.

6.5 Generated Anomalous Patterns

Transformer-XL is employed to approximate the true distribution of the sequences of layout patterns based on the finite, length-limited true sequences. This gives rise to a minor difference between the approximate distribution and the true distribution, although Transformer-XL can well fit the training data. Since the generated pattern sequences are randomly sampled from the approximate distribution, some anomalous patterns are difficult to avoid.

These anomalous patterns can be divided into two categories. One is that the sequence between a pair of [𝑆𝑂𝑃] and [𝐸𝑂𝑃] tokens cannot form a complete polygon with only horizontal and vertical edges. Figure 10 shows some such patterns. The proportion of this kind of anomalous pattern among the 100000 generated patterns is only 0.48%. These incomplete polygons can be easily checked. The other category is that different polygons in a layout (partially) overlap spatially, as shown in Figure 11. These layout patterns seriously violate the DRC rules; their proportion in the 100000 generated patterns is 7.9%. To recycle these illegal patterns, the polygons could be adjusted by a linear programming algorithm with the constraints of the DRC rules [17], or a LegalGAN could be designed to reduce the number of overlapping polygons. These methods will be our future research for improvement. In this paper, we consider these anomalous patterns as illegal. As reported in Table 1, there are 89726 legal layout patterns out of the 100000 generated layout patterns.

Figure 11: Examples of partially overlapped polygons.

7 CONCLUSION

In this work, we make the first attempt at learning the generation of layout patterns from the perspective of sequential modeling. To this end, we propose a novel and efficient sequential pattern representation to explicitly encode the shapes and layouts of patterns. This representation method is lossless, highly structured, and has high encoding efficiency. Also, we develop LayouTransformer based on the Transformer, a powerful architecture for sequence modeling that has achieved great progress in language generation. LayouTransformer regards the design rules behind the training patterns as a kind of grammar rules. Compared to existing learning-based generative methods that typically generate irregular patterns due to pixel-level modeling, our LayouTransformer can well capture the long-term dependencies among the vertexes and edges of polygons as well as the spatial relationships among different polygons, and generate diverse, realistic and DRC-clean patterns for lithography design applications. Extensive experiments show that our LayouTransformer significantly outperforms existing learning-based methods in terms of pattern diversity, legality, and validity, demonstrating its superiority.

ACKNOWLEDGEMENT

We thank Xiaopeng Zhang for extremely helpful discussion.
REFERENCES
[1] J.-R. Gao, X. Xu, B. Yu, and D. Z. Pan, "MOSAIC: Mask optimizing solution with process window aware inverse correction," in 51st ACM/EDAC/IEEE Design Automation Conference (DAC), 2014.
[2] A. Hamouda, M. Bahnas, D. Schumacher, I. Graur, A. Chen, K. Madkour, H. Ali, J. Meiring, N. Lafferty, and C. McGinty, "Enhanced OPC recipe coverage and early hotspot detection through automated layout generation and analysis," in Optical Microlithography, vol. 10147, 2017, pp. 223–231.
[3] B. Jiang, H. Zhang, J. Yang, and E. F. Y. Young, "A fast machine learning-based mask printability predictor for OPC acceleration," in 24th Asia and South Pacific Design Automation Conference, 2019, pp. 412–419.
[4] J. Kuang, W.-K. Chow, and E. F. Y. Young, "A robust approach for process variation aware mask optimization," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 1591–1594.
[5] R. Chen, W. Zhong, H. Yang, H. Geng, X. Zeng, and B. Yu, "Faster region-based hotspot detection," in 56th ACM/IEEE Design Automation Conference (DAC), 2019.
[6] H. Yang, Y. Lin, B. Yu, and E. F. Y. Young, "Lithography hotspot detection: From shallow to deep learning," in 30th IEEE International System-on-Chip Conference (SOCC), 2017, pp. 233–238.
[7] H. Yang, J. Su, Y. Zou, Y. Ma, B. Yu, and E. F. Y. Young, "Layout hotspot detection with feature tensor generation and deep biased learning," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, pp. 1175–1187, 2019.
[8] H. Zhang, B. Yu, and E. F. Young, "Enabling online learning in lithography hotspot detection with information-theoretic feature optimization," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016.
[9] J. Kuang, J. Ye, and E. F. Young, "Simultaneous template optimization and mask assignment for DSA with multiple patterning," in 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 2016, pp. 75–82.
[10] J. Kuang and E. F. Y. Young, "An efficient layout decomposition approach for triple patterning lithography," in 50th ACM/EDAC/IEEE Design Automation Conference (DAC), 2013.
[11] W. Ye, M. B. Alawieh, Y. Lin, and D. Z. Pan, "LithoGAN: End-to-end lithography modeling with generative adversarial networks," in 56th ACM/IEEE Design Automation Conference (DAC), 2019.
[12] G. R. Reddy, K. Madkour, and Y. Makris, "Machine learning-based hotspot detection: Fallacies, pitfalls and marching orders," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2019.
[13] G. R. Reddy, C. Xanthopoulos, and Y. Makris, "Enhanced hotspot detection through synthetic pattern generation and design of experiments," in IEEE 36th VLSI Test Symposium (VTS), 2018.
[14] W. Ye, Y. Lin, M. Li, Q. Liu, and D. Z. Pan, "LithoROC: Lithography hotspot detection with explicit ROC optimization," Association for Computing Machinery, 2019.
[15] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[16] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, vol. 27, 2014.
[17] H. Yang, P. Pathak, F. Gennari, Y.-C. Lai, and B. Yu, "DeePattern: Layout pattern generation with transforming convolutional auto-encoder," in 56th ACM/IEEE Design Automation Conference (DAC), 2019.
[18] F. E. Gennari and Y.-C. La, "Topology design using squish patterns," US Patent 8,832,621, 2014.
[19] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," in Advances in Neural Information Processing Systems, 2020.
[20] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171–4186.
[21] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems 32, 2019, pp. 5754–5764.
[22] Z. Dai, Z. Yang, Y. Yang, J. G. Carbonell, Q. V. Le, and R. Salakhutdinov, "Transformer-XL: Attentive language models beyond a fixed-length context," in 57th Conference of the Association for Computational Linguistics, 2019, pp. 2978–2988.
[23] P. Kareem, Y. Kwon, and Y. Shin, "Layout pattern synthesis for lithography optimizations," IEEE Transactions on Semiconductor Manufacturing, vol. 33, no. 2, pp. 283–290, 2020.
[24] X. Zhang, J. Shiely, and E. F. Young, "Layout pattern generation and legalization with generative learning models," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2020.
[25] P. Kareem and Y. Shin, "Synthesis of lithography test patterns using machine learning model," IEEE Transactions on Semiconductor Manufacturing, vol. 34, no. 1, pp. 49–57, 2021.
[26] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, "Adversarial autoencoders," arXiv preprint arXiv:1511.05644, 2015.
[27] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[29] A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, "The curious case of neural text degeneration," in 8th International Conference on Learning Representations, 2020.
[30] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, vol. 32, 2019.