(1) As far as the authors are aware, KATO is the first BO sizing framework that enables knowledge transfer across different designs and technology nodes.
(2) We propose Selective Transfer Learning (STL), a simple yet effective Bayesian selection strategy in the BO pipeline.
(3) As an alternative to DKL, we propose a Neural kernel (Neuk), which is more powerful and stable for BO.
(4) KATO is validated on practical analog designs against state-of-the-art methods on multiple experiment setups, showcasing a 2x speedup and 1.2x performance enhancement.
2 BACKGROUND
2.1 Problem Definition
Transistor sizing is typically formulated as a constrained optimization problem. The goal is to maximize a specific performance metric while ensuring that various other metrics meet predefined constraints, i.e.,
\[
\operatorname*{argmax}_{x} \; f_0(x) \quad \text{s.t.} \quad f_i(x) \ge C_i, \;\; \forall i \in \{1, \dots, N_c\}. \tag{1}
\]
Here, x ∈ R^d represents the design variables, f_i(x) computes the i-th performance metric, and C_i denotes the required minimum value for that metric. Given the complexity of solving this constrained optimization problem, many works convert it into an unconstrained optimization problem by defining a Figure of Merit (FOM) that combines the performance metrics. For instance, [11] defines
\[
FOM(x) = \sum_{i=0}^{N_c} w_i \times \frac{\min\!\big(f_i(x), f_i^{bound}\big) - f_i^{min}}{f_i^{max} - f_i^{min}}, \tag{2}
\]
where f_i^bound is the pre-determined limit, f_i^max and f_i^min are the maximum and minimum values obtained from 10,000 random samples, and w_i ∈ {−1, 1} is chosen according to whether the metric is to be maximized or minimized. The FOM approach is less desirable in practice because it is difficult to set all of these hyperparameters properly.
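To make Eq. (2) concrete, here is a minimal Python sketch of the FOM computation in our own notation; the metric values, bounds, ranges, and weights below are illustrative placeholders, not from the paper.

```python
import numpy as np

def fom(metrics, bounds, f_min, f_max, weights):
    """Figure of Merit of Eq. (2): clip each metric at its bound, normalize by
    the (min, max) range estimated from random samples, and sum with +/-1 weights."""
    clipped = np.minimum(np.asarray(metrics, dtype=float), bounds)   # min(f_i(x), f_i^bound)
    return float(np.sum(np.asarray(weights) * (clipped - f_min) / (np.asarray(f_max) - f_min)))

# Hypothetical two-metric example: gain (maximize, w=+1) and current (minimize, w=-1)
print(fom(metrics=[62.0, 250.0], bounds=[80.0, 300.0],
          f_min=[40.0, 100.0], f_max=[90.0, 400.0], weights=[1.0, -1.0]))
```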
2.2 Gaussian Process
A Gaussian process (GP) is a common choice of surrogate model for building the input-output mapping of complex computer code due to its flexibility and uncertainty quantification. We can approximate the black-box function f_0(x) by placing a GP prior, f_0(x)|θ ∼ GP(m(x), k(x, x'|θ)), where the mean function is normally assumed to be zero, i.e., m_0(x) ≡ 0, by virtue of centering the data. The covariance function can take many forms; the most common is the automatic relevance determination (ARD) kernel
\[
k(x, x'|\theta) = \theta_0 \exp\!\big(-(x - x')^T \mathrm{diag}(\theta_1, \dots, \theta_l)\,(x - x')\big),
\]
with θ denoting the kernel hyperparameters. Other kernels, such as Rational Quadratic (RQ), Periodic (PERD), and Matern, can also be used depending on the application.
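For reference, the ARD kernel above can be evaluated over batches of design points as in the following sketch; theta0 and lengthscales stand in for the hyperparameters θ_0 and θ_1, ..., θ_l and are our own naming, not the paper's code.

```python
import numpy as np

def ard_kernel(X1, X2, theta0, lengthscales):
    """ARD kernel k(x, x') = theta0 * exp(-(x - x')^T diag(theta_1..theta_l) (x - x'))
    evaluated for all pairs of rows in X1 (n x d) and X2 (m x d)."""
    diff = X1[:, None, :] - X2[None, :, :]                       # (n, m, d) pairwise differences
    sq = np.einsum("nmd,d,nmd->nm", diff, lengthscales, diff)    # weighted squared distances
    return theta0 * np.exp(-sq)

# Hypothetical usage on 5 random 3-dimensional design points
X = np.random.rand(5, 3)
K = ard_kernel(X, X, theta0=1.0, lengthscales=np.array([1.0, 0.5, 2.0]))
```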
For any given design variable x, f(x) is now considered a random variable, and multiple observations f(x_i) form a joint Gaussian with covariance matrix K = [K_ij], where K_ij = k(x_i, x_j). Accounting for model inadequacy and numerical noise ε ∼ N(0, σ²), the log-likelihood L is
\[
\mathcal{L} = -\frac{1}{2} y^T (K + \sigma^2 I)^{-1} y \;-\; \frac{1}{2} \ln\left|K + \sigma^2 I\right| \;-\; \frac{N}{2} \log(2\pi). \tag{3}
\]
The kernel hyperparameters θ are estimated via maximum likelihood estimation (MLE) of L. Conditioning on y gives the predictive posterior f̂(x) ∼ N(μ(x), v(x)) with
\[
\mu(x) = k^T(x)\,(K + \sigma^2 I)^{-1} y; \qquad
v(x) = k(x, x) - k^T(x)\,(K + \sigma^2 I)^{-1} k(x). \tag{4}
\]
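The sketch below ties Eqs. (3)-(4) together, reusing the ard_kernel helper from the previous snippet; it is our own illustration with fixed hyperparameters (in practice they would be optimized by MLE), not the KATO implementation.

```python
import numpy as np

def gp_fit_predict(X, y, Xs, theta0, lengthscales, noise):
    """GP posterior mean/variance of Eq. (4) with a fixed ARD kernel.
    X: (n, d) training inputs, y: (n,) centered outputs, Xs: (m, d) test inputs."""
    K = ard_kernel(X, X, theta0, lengthscales) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                        # stable (K + sigma^2 I)^-1 via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = ard_kernel(X, Xs, theta0, lengthscales)     # cross-covariances k(x)
    mu = Ks.T @ alpha                                # predictive mean, Eq. (4)
    V = np.linalg.solve(L, Ks)
    var = theta0 - np.sum(V**2, axis=0)              # predictive variance, Eq. (4); k(x,x) = theta0
    return mu, var

def neg_log_likelihood(X, y, theta0, lengthscales, noise):
    """Negative of the log-likelihood in Eq. (3); minimized to obtain the MLE."""
    K = ard_kernel(X, X, theta0, lengthscales) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)
```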
2.3 Bayesian Optimization
To maximize f(x), we optimize x by sequentially querying points such that each point yields an improvement I(x) = max(f̂(x) − y†, 0), where y† is the current optimum and f̂(x) is the predictive posterior in Eq. (4). The probability that x yields an improvement is
\[
PI(x) = \Phi\!\left(\frac{\mu(x) - y^{\dagger}}{\sigma(x)}\right), \tag{5}
\]
which is the probability of improvement (PI), with σ(x) = √v(x). For a more informative criterion, we can take the expected improvement (EI) over the predictive posterior:
\[
EI(x) = \big(\mu(x) - y^{\dagger}\big)\,\Phi(u(x)) + \sigma(x)\,\phi(u(x)), \qquad u(x) = \frac{\mu(x) - y^{\dagger}}{\sigma(x)}, \tag{6}
\]
where φ(·) and Φ(·) are the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution, respectively.
The candidate for the next iteration is selected by argmax_{x∈X} EI(x) using gradient-based methods, e.g., L-BFGS-B. Rather than looking only at the expected improvement, we can also approach the optimum by exploring areas of higher uncertainty, a.k.a. the upper confidence bound (UCB),
\[
UCB(x) = \mu(x) + \beta\, v(x), \tag{7}
\]
where β controls the tradeoff between exploration and exploitation.
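As a concrete reference for Eqs. (5)-(7), the acquisition values can be computed from the GP posterior as in the sketch below (our own code; the default beta is an assumption).

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, var, y_best, beta=2.0, eps=1e-12):
    """Probability of improvement (Eq. 5), expected improvement (Eq. 6),
    and upper confidence bound (Eq. 7) for arrays of posterior means/variances."""
    sigma = np.sqrt(np.maximum(var, eps))
    u = (mu - y_best) / sigma
    pi = norm.cdf(u)                                          # Eq. (5)
    ei = (mu - y_best) * norm.cdf(u) + sigma * norm.pdf(u)    # Eq. (6)
    ucb = mu + beta * var                                     # Eq. (7), as written in the text
    return pi, ei, ucb
```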
3 PROPOSED METHODOLOGIES
3.1 Automatic Kernel Learning: Neural Kernel
Choosing the right kernel function is vital in BO. Despite DKL's success, creating an effective network structure remains challenging. Following [9], we introduce the Neural Kernel (Neuk) as a basic unit to construct an automatic kernel function.
Neuk is inspired by the fact that kernel functions can be safely composed by adding and multiplying different kernels. This compositional flexibility is mirrored in the architecture of neural networks, specifically within the linear layers. Neuk leverages this concept by substituting traditional nonlinear activation layers with kernel functions, facilitating automatic kernel construction.
To illustrate, consider two input vectors, x_1 and x_2. In Neuk, these vectors are processed through multiple kernels {h_i(x, x')}_{i=1}^{N_k}, each undergoing a linear transformation as follows:
\[
h_i(x_1, x_2) = h_i\!\big(W^{(i)} x_1 + b^{(i)},\; W^{(i)} x_2 + b^{(i)}\big). \tag{8}
\]
Here, W^(i) and b^(i) represent the weight and bias of the i-th kernel function, respectively, and h_i(·) is the corresponding kernel function. Subsequently, latent variables z are generated through a linear combination of these kernels:
\[
z = W^{(z)}\, h(x_1, x_2) + b^{(z)}, \tag{9}
\]
where W^(z) and b^(z) are the weight and bias of the linear layer, and h(x_1, x_2) = [h_1(x_1, x_2), ..., h_{d_l}(x_1, x_2)]^T. This configuration constitutes the core unit of Neuk. For broader applications, multiple Neuk units can be stacked horizontally to form a Deep Neuk (DNeuk) or vertically for a Wide Neuk (WNeuk). In this study, however, we utilize a single Neuk unit, finding it sufficiently flexible and efficient without excessively increasing the parameter count.
The final step in the Neuk process applies a nonlinear transformation to the latent variables z, ensuring the positive semi-definiteness of the kernel function,
\[
k_{neuk}(x_1, x_2) = \exp\!\Big(\sum_{j} z_j + b^{(k)}\Big). \tag{10}
\]
A graphical representation of the Neuk architecture is shown in Fig. 1, along with small experiments demonstrating its effectiveness in predicting the performance of a 180nm second-stage amplification circuit (see Section 4) with 100 training and 50 testing data points.
Figure 1: Neural kernel and assessments
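To illustrate Eqs. (8)-(10), below is a minimal PyTorch sketch of a single Neuk unit under our reading of the text, with RBF- and RQ-style primitive kernels; the layer sizes and kernel choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Neuk(nn.Module):
    """One Neural-Kernel unit: per-kernel linear maps (Eq. 8), a linear
    combination of primitive kernel values (Eq. 9), and an exp transform (Eq. 10)."""
    def __init__(self, d_in, d_latent=8, d_out=4):
        super().__init__()
        self.lin_rbf = nn.Linear(d_in, d_latent)    # W^(1), b^(1)
        self.lin_rq = nn.Linear(d_in, d_latent)     # W^(2), b^(2)
        self.mix = nn.Linear(2, d_out)              # W^(z), b^(z) over the kernel values
        self.bias_k = nn.Parameter(torch.zeros(1))  # b^(k) in Eq. (10)
        # In practice the mixing weights would be kept nonnegative so that sums,
        # products, and exp of kernels remain a valid (PSD) kernel.

    @staticmethod
    def _sqdist(a, b):
        return ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)

    def forward(self, x1, x2):
        # Primitive kernels evaluated on linearly transformed inputs (Eq. 8)
        rbf = torch.exp(-self._sqdist(self.lin_rbf(x1), self.lin_rbf(x2)))
        rq = (1.0 + self._sqdist(self.lin_rq(x1), self.lin_rq(x2))) ** (-1.0)
        h = torch.stack([rbf, rq], dim=-1)           # (n, m, 2) kernel values
        z = self.mix(h)                              # Eq. (9): latent variables
        return torch.exp(z.sum(-1) + self.bias_k)    # Eq. (10): (n, m) kernel matrix

# Hypothetical usage: a 6-dimensional sizing space, 5 x 5 kernel matrix
K = Neuk(d_in=6)(torch.rand(5, 6), torch.rand(5, 6))
```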
3.2 Knowledge Alignment and Transfer
In the literature, transfer learning is predominantly based on deep learning, wherein the knowledge is encoded within the neural network weights, facilitating transfer by fine-tuning these weights on a target dataset. GPs, in contrast, present a fundamentally different paradigm: the predictive capability of a GP is intrinsically tied to the source data it is trained on (see Eq. (4)). This reliance on data for prediction underscores a significant challenge in applying transfer learning to GPs.
To address this challenge, we propose an encoder-decoder structure, which we refer to as Knowledge Alignment and Transfer (KAT) in GPs. This approach retains the intrinsic knowledge of the source GP while aligning it with the target domain through an encoder and decoder mechanism.
Consider a source dataset D^(s) = {(x_i^(s), y_i^(s))}, on which the GP model GP(x) is trained, and a target dataset D^(t) = {(x_i^(t), y_i^(t))}. The first step introduces an encoder E(x), which maps the target input x^(t) into the source input space x^(s). This encoder accounts for potential differences in dimensionality between the source and target datasets, effectively managing any compression or redundancy.
The target outputs may have different value ranges, or even different quantities, from the source outputs. We therefore employ a decoder D(y^(s)) that maps the source output y^(s) to the target output y^(t). Thus, the KAT-GP for the target domain is expressed as y^(t)(x^(t)) = D(GP(E(x^(t)))), with the encoder and decoder aligning the source and target domains. The specific architecture of the encoder and decoder depends on the problem and available data. In this work, both the encoder and decoder are small shallow neural networks with a linear(d_in × 32)-sigmoid-linear(32 × d_out) structure, where d_in and d_out are the input and output dimensions of the respective mapping.
Figure 2: Knowledge Alignment and Transfer (KAT)-GP
It is important to note that unless the decoder is a linear operator, KAT-GP is no longer a GP and does not admit a closed-form solution. We approximate the predictive mean and covariance of the overall model through the Delta method, which employs Taylor series expansions to estimate the predictive mean μ^(t)(x^(t)) and covariance S^(t)(x^(t)) of the transformed output:
\[
\mu^{(t)}(x^{(t)}) = D\big(\mu^{(s)}(E(x^{(t)}))\big); \qquad
S^{(t)}(x^{(t)}) = J\, S^{(s)}\, J^T, \tag{11}
\]
where μ^(s)(x^(s)) and S^(s)(x^(s)) are the predictive mean and covariance of GP(x), respectively, and J is the Jacobian of the decoder D with respect to its input, evaluated at μ^(s)(E(x^(t))). Training of KAT-GP thus maximizes the log-likelihood in Eq. (12) via gradient descent with respect to the parameters of the encoder and decoder, as well as the hyperparameters of the neural kernel:
\[
\mathcal{L} = \sum_i \log \mathcal{N}\!\Big(y_i^{(t)} \,\Big|\, D\big(\mu(E(x_i^{(t)}))\big),\; J_i S_i J_i^T + \sigma_t^2 I\Big). \tag{12}
\]
KAT-GP is illustrated in Fig. 2. It offers the first knowledge-transfer solution for GPs with different design and performance spaces.
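The following PyTorch sketch shows how the KAT structure and the Delta-method prediction of Eq. (11) could be wired together; the shapes, the dummy source GP, and the use of torch.autograd for the Jacobian are our own simplifications, not the authors' code.

```python
import torch
import torch.nn as nn

class KATGP(nn.Module):
    """KAT-GP sketch: encoder -> source GP -> decoder, with Delta-method
    moments (Eq. 11). Shapes are for a single test point to keep things simple."""
    def __init__(self, source_gp, d_t_in, d_s_in, d_s_out, d_t_out):
        super().__init__()
        self.source_gp = source_gp  # callable x_s -> (mu_s, S_s); assumed pre-trained on D^(s)
        self.encoder = nn.Sequential(nn.Linear(d_t_in, 32), nn.Sigmoid(), nn.Linear(32, d_s_in))
        self.decoder = nn.Sequential(nn.Linear(d_s_out, 32), nn.Sigmoid(), nn.Linear(32, d_t_out))

    def predict(self, x_t):
        mu_s, S_s = self.source_gp(self.encoder(x_t))               # source posterior at E(x^(t))
        mu_t = self.decoder(mu_s)                                   # Eq. (11): D(mu^(s)(E(x)))
        J = torch.autograd.functional.jacobian(self.decoder, mu_s)  # (d_t_out, d_s_out)
        S_t = J @ S_s @ J.T                                         # Eq. (11): J S^(s) J^T
        return mu_t, S_t

# Hypothetical usage with a dummy "source GP" returning fixed moments
dummy_gp = lambda x_s: (torch.tanh(x_s[:2]), 0.1 * torch.eye(2))
model = KATGP(dummy_gp, d_t_in=8, d_s_in=6, d_s_out=2, d_t_out=3)
mu_t, S_t = model.predict(torch.rand(8))
```

Training per Eq. (12) would then maximize the Gaussian log-likelihood of the target outputs under N(mu_t, S_t + σ_t²I) with respect to the encoder/decoder parameters and the neural-kernel hyperparameters, e.g., via torch.distributions.MultivariateNormal.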
3.3 Modified Constrained MACE
In transistor sizing, it is crucial to harness the power of parallel computing by running multiple simulations simultaneously. One of the most popular solutions, MACE [12], resolves this challenge by proposing candidates lying on the Pareto frontier of the objectives
\[
\Big\{ UCB(x),\; PI(x),\; EI(x),\; PF(x),\; \sum_{i=1}^{N_c} \max\big(0, u_i(x)\big),\; \sum_{i=1}^{N_c} \max\Big(0, \frac{u_i(x)}{v_i(x)}\Big) \Big\}
\]
using the genetic search NSGA-II. Here, PF(x) is the probability of feasibility, which applies Eq. (5) to all constraint metrics, i.e., PF(x) = ∏_{i=1}^{N_c} Φ((u_i(x) − C_i)/v_i(x)), where u_i(x) and v_i(x) denote the predictive mean and variance of the i-th constraint metric.
Despite its success, MACE suffers from high computational complexity, as it requires a Pareto-front search over six correlated objectives. To mitigate this issue, we incorporate the constraints directly into the acquisitions of the primal metric f_0(x), and the multi-objective optimization becomes
\[
\operatorname*{argmax}_{x} \big\{ UCB(x),\; PI(x),\; EI(x) \big\} \times PF(x). \tag{13}
\]
This reduction in dimensionality significantly improves efficiency, since the complexity of the Pareto-front search grows exponentially with the number of objectives, while maintaining the same level of performance. Empirically, we do not observe any performance degradation.
3.4 Selective Transfer Learning with BO
While transfer learning proves effective in numerous scenarios, its utility is not universal, particularly when the source and target domains differ significantly, such as between an SRAM and an ADC. It is important to note that even when the source and target datasets have an equal number of points, the utility of KAT-GP may not be immediately apparent. However, our empirical studies reveal that source and target data often exhibit distinctly different distributions, with varying concentration regions. This divergence can provide valuable insights for the optimization of the target circuit, even when the target data exceeds the source data in size. The effectiveness of transfer learning in such cases is inherently problem-dependent, requiring adaptable strategies for diverse scenarios.
To address these challenges, we propose a Selective Transfer Learning (STL) strategy, which synergizes with the batch nature of the MACE algorithm to maximize the benefits of transfer learning. This approach involves training both a KAT-GP model and a GP model equipped with a Neural Kernel (referred to as NeukGP) exclusively on the target data. During the Bayesian Optimization (BO) process, each model collaborates with MACE to generate a proposal Pareto front set, denoted P_i (with i = 1, 2 in this context). We randomly select w_1/(w_1+w_2) · N_B points from P_1 to form A_1, and w_2/(w_1+w_2) · N_B points from P_2 to form A_2. Points in A_1 and A_2 are then simulated and evaluated. The weights are initialized with the number of samples and updated based on the number of simulations that improve the current best, i.e.,
\[
w_i = w_i + \big| f(A_i) > y^{\dagger} \big|_n, \tag{14}
\]
where |f(A_i) > y†|_n denotes the number of points in A_i that surpass the current best objective value y†, and f can be the constrained objective or the Figure of Merit (FOM). The STL algorithm is summarized in Algorithm 1.

Algorithm 1 KATO with Selective Transfer Learning
Require: Source dataset D_s, initial target circuit data D_t, number of iterations N_I, batch size N_B per iteration
1: Train KAT-GP on D_s and D_t
2: Train NeukGP on D_t
3: for i = 1 → N_I do
4:   Update KAT-GP and NeukGP based on D_t
5:   Apply MACE to KAT-GP and NeukGP to generate proposal sets P_1 and P_2
6:   Form action sets A_1 and A_2 by randomly selecting w_1/(w_1+w_2) · N_B points from P_1 and w_2/(w_1+w_2) · N_B points from P_2
7:   Simulate A_1 and A_2 and update D_t ← D_t ∪ A_1 ∪ A_2
8:   Update w_1 and w_2 based on Eq. (14) and the best design x*
9: end for
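The core of Algorithm 1 (the weighted batch split and the Eq. (14) update) can be sketched as below; pareto_sets stands for the MACE proposals P_1 and P_2, and simulate for the SPICE testbench, both hypothetical stand-ins rather than the authors' code.

```python
import numpy as np

def stl_iteration(pareto_sets, weights, y_best, simulate, batch_size, rng=np.random):
    """One STL step: split the batch between the KAT-GP and NeukGP proposals
    in proportion to their weights, simulate, and apply the Eq. (14) update."""
    w = np.asarray(weights, dtype=float)
    counts = np.round(batch_size * w / w.sum()).astype(int)   # w_i/(w_1+w_2) * N_B points each
    new_points, new_values = [], []
    for i, (P, n) in enumerate(zip(pareto_sets, counts)):
        picked = [P[j] for j in rng.choice(len(P), size=min(n, len(P)), replace=False)]
        values = [simulate(x) for x in picked]                # evaluate the selected designs
        w[i] += sum(v > y_best for v in values)               # Eq. (14): count improvements over y†
        new_points += picked
        new_values += values
    y_best = max([y_best] + new_values)                       # update the incumbent for the next step
    return new_points, new_values, w, y_best
```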
4 EXPERIMENT
To assess the effectiveness of KATO, we conducted experiments on three analog circuits: a two-stage operational amplifier (OpAmp), a three-stage OpAmp, and a bandgap circuit (depicted in Fig. 3).
Figure 3: (a) Two-stage operational amplifier, (b) three-stage operational amplifier, (c) bandgap
Two-stage Operational Amplifier (OpAmp) focuses on optimizing the lengths of the transistors in the first stage, the capacitance of the capacitors, the resistance of the resistors, and the bias currents for both stages. The objective is to minimize the total current consumption (I_total) while meeting specific performance criteria: a phase margin (PM) greater than 60 degrees, a gain-bandwidth product (GBW) over 4MHz, and a gain exceeding 60dB:
\[
\operatorname*{argmin} \; I_{total} \quad \text{s.t.} \quad PM > 60^{\circ},\; GBW > 4\,\mathrm{MHz},\; Gain > 60\,\mathrm{dB}. \tag{15}
\]
Three-stage OpAmp improves the gain beyond what a two-stage OpAmp can offer by adding a third stage. This variant introduces more design variables, including the lengths of the transistors in the first stage, the capacitance of two capacitors, and the bias currents for all three stages. The optimization is specified as
\[
\operatorname*{argmin} \; I_{total} \quad \text{s.t.} \quad PM > 60^{\circ},\; GBW > 2\,\mathrm{MHz},\; Gain > 80\,\mathrm{dB}. \tag{16}
\]
Bandgap Reference Circuit is vital for maintaining precise and stable outputs in analog and mixed-signal systems-on-a-chip. The design variables include the length of the input transistor, the widths of the bias transistors of the operational amplifier, and the resistance of the resistors. The aim is to minimize the temperature coefficient (TC), with constraints on the total current consumption (I_total below 6uA) and the power supply rejection ratio (PSRR above 50dB):
\[
\operatorname*{argmin} \; TC \quad \text{s.t.} \quad I_{total} < 6\,\mathrm{uA},\; PSRR > 50\,\mathrm{dB}. \tag{17}
\]
For the experiments, each method was repeated five times with different random seeds, and statistical results are reported. Baselines were implemented with fine-tuned hyperparameters to ensure optimal performance. All circuits were implemented using 180nm and 40nm Process Design Kits (PDK). KATO is implemented in PyTorch with MACE (https://github.com/Alaya-in-Matrix/MACE). Experiments were carried out on a workstation equipped with an AMD 7950X CPU and 64GB RAM.
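For illustration only, the sizing specifications of Eqs. (15)-(17) might be encoded as plain data and used to filter valid designs as in Section 4.2; the dictionary layout and metric names below are our own, not part of KATO.

```python
# Hypothetical encoding of the constrained sizing problems in Eqs. (15)-(17);
# units follow the text (uA, dB, degrees, MHz, temperature coefficient).
SPECS = {
    "two_stage_opamp": {"objective": "I_total",
                        "constraints": {"PM": (">", 60.0), "GBW": (">", 4.0), "Gain": (">", 60.0)}},
    "three_stage_opamp": {"objective": "I_total",
                          "constraints": {"PM": (">", 60.0), "GBW": (">", 2.0), "Gain": (">", 80.0)}},
    "bandgap": {"objective": "TC",
                "constraints": {"I_total": ("<", 6.0), "PSRR": (">", 50.0)}},
}

def is_valid(metrics, spec):
    """A design counts as valid only if every constraint is met (cf. Section 4.2)."""
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b}
    return all(ops[op](metrics[name], bound) for name, (op, bound) in spec["constraints"].items())
```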
4.1 Assessment of FOM Optimization
We initiated the evaluation based on the FOM of Eq. (2). KATO was compared against SOTA Bayesian optimization techniques for a single objective, namely SMAC-RF (https://github.com/automl/SMAC3), along with MACE and a naive random search (RS) strategy. All methods are given 10 random simulations as the initial dataset, and the sizing results (FOM versus simulation budget) for the 180nm technology node are shown in Fig. 4. SMAC-RF is slightly better than MACE on this simple single-objective optimization task. Notably, KATO outperforms the baselines by a large margin: it consistently achieves the maximum FOM, with up to 1.2x improvement, and it takes about 50% fewer simulations to reach a similar optimal FOM. The optimal result of RS does not actually satisfy all constraints, highlighting the limitation of FOM-based optimization.
4.2 Assessment of Constrained Optimization
Next, we assess the proposed method for transistor sizing in a more practical and challenging constrained optimization setup. During optimization, only designs satisfying all constraints are considered valid and included in the performance reports. To provide sufficient valid designs for the surrogate model, we first simulate 300 random designs, typically yielding about 7 valid designs, a 2.3% valid rate that makes RS inapplicable for this task.
We compare KATO with SOTA constrained BO methods tailored for circuit design, namely MESMOC [2], USEMOC [3], and MACE with constraints [12]. The 180nm results are shown in Fig. 5, where MESMOC shows poor performance due to its lack of exploration, and MACE is generally good except on the three-stage OpAmp. KATO demonstrates consistent superiority, always achieving the best performance with a clear margin and, most importantly, with about 50% of the simulation cost of the best-performing baseline. The final design performance is shown in Table 1, where KATO achieves the best result by aggressively trading off the constrained metrics (e.g., Gain) as long as they fulfill the requirements.
Figure 6: Transistor sizing constrained optimization with transfer learning of designs and technology node. (a) Two-stage OpAmp (180nm) to two-stage OpAmp (40nm); (b) three-stage OpAmp (180nm) to three-stage OpAmp (40nm); (c) three-stage OpAmp (40nm) to two-stage OpAmp (40nm); (d) two-stage OpAmp (40nm) to three-stage OpAmp (40nm); (e) three-stage OpAmp (180nm) to two-stage OpAmp (40nm); (f) two-stage OpAmp (180nm) to three-stage OpAmp (40nm)

The final design performance is shown in Table 2. Transfer learning between technology nodes achieves the best results, as it is the easier task. Nonetheless, the differences between the transfer learning tasks are not significant. Compared to the human expert on the three-stage OpAmp, KATO shows up to 1.62x improvement in key performance.

Table 2: Transistor sizing optimal performance via constrained optimization with transfer learning

Two-Stage OpAmp (40nm)
Method | I (uA) | Gain (dB) | PM (°) | GBW (MHz)
Specifications | min | >50 | >60 | >4
Human Expert | 308.10 | 51.77 | 71.33 | 7.08
KATO | 273.04 | 52.44 | 81.24 | 21.09
KATO (TL Node) | 254.05 | 50.29 | 83.72 | 15.05
KATO (TL Design) | 257.12 | 50.04 | 82.68 | 10.28
KATO (TL Node&Design) | 258.01 | 51.23 | 85.78 | 13.31

Three-Stage OpAmp (40nm)
Method | I (uA) | Gain (dB) | PM (°) | GBW (MHz)
Specifications | min | >70 | >60 | >2
Human Expert | 244.72 | 74.10 | 60.18 | 2.03
KATO | 151.09 | 70.23 | 69.85 | 3.49
KATO (TL Node) | 118.47 | 74.41 | 71.84 | 2.65
KATO (TL Design) | 118.71 | 71.46 | 72.92 | 2.43
KATO (TL Node&Design) | 120.08 | 70.44 | 73.44 | 2.48

5 CONCLUSION
We propose KATO, a novel transfer learning framework for transistor sizing, which for the first time enables transferring knowledge from different designs and technologies into BO. Beyond improving on the SOTA, we hope the idea of KAT can inspire further research. Future work includes extending the transfer learning to circuits of various other types, e.g., SRAM, ADC, and PLL.

REFERENCES
[1] Chen Bai, Qi Sun, Jianwang Zhai, Yuzhe Ma, Bei Yu, and Martin D. F. Wong. 2021. BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework. In Proc. ICCAD. IEEE, Munich, Germany, 1-9.
[2] Syrine Belakaria, Aryan Deshwal, and Janardhan Rao Doppa. 2020. Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints. CoRR abs/2009.01721 (2020), 7825-7835. arXiv:2009.01721
[3] Syrine Belakaria, Aryan Deshwal, Nitthilan Kannappan Jayakodi, and Janardhan Rao Doppa. 2020. Uncertainty-Aware Search Framework for Multi-Objective Bayesian Optimization. In Proc. AAAI. AAAI Press, 10044-10052.
[4] Ibrahim M. Elfadel, Duane S. Boning, and Xin Li. 2019. Machine Learning in VLSI Computer-Aided Design. Springer.
[5] Yaguang Li, Yishuang Lin, Meghna Madhusudan, Arvind Sharma, Sachin Sapatnekar, Ramesh Harjani, and Jiang Hu. 2021. A circuit attention network-based actor-critic learning approach to robust analog transistor sizing. In Workshop MLCAD. IEEE, Raleigh, North Carolina, USA, 1-6.
[6] Wenlong Lyu, Pan Xue, and Fan Yang. 2018. An Efficient Bayesian Optimization Approach for Automated Optimization of Analog Circuits. IEEE Transactions on Circuits and Systems I: Regular Papers 65, 6 (June 2018), 1954-1967.
[7] Wenlong Lyu, Fan Yang, and Changhao Yan. 2018. Batch Bayesian Optimization via Multi-objective Acquisition Ensemble for Automated Analog Circuit Design. In Proc. ICML, Vol. 80. PMLR, 3312-3320.
[8] Keertana Settaluri, Zhaokai Liu, Rishubh Khurana, Arash Mirhaj, Rajeev Jain, and Borivoje Nikolic. 2021. Automated design of analog circuits using reinforcement learning. IEEE TCAD 41, 9 (2021), 2794-2807.
[9] Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. 2018. Differentiable compositional kernel learning for Gaussian processes. In Proc. ICML. PMLR, 4828-4837.
[10] Konstantinos Touloupas, Nikos Chouridis, and Paul P. Sotiriadis. 2021. Local Bayesian optimization for analog circuit sizing. In Proc. DAC. IEEE, San Francisco, California, USA, 1237-1242.
[11] Hanrui Wang, Kuan Wang, and Jiacheng Yang. 2020. GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning. In Proc. DAC. IEEE, San Francisco, California, USA, 1-6.
[12] Shuhan Zhang, Fan Yang, Changhao Yan, Dian Zhou, and Xuan Zeng. 2021. An efficient batch-constrained Bayesian optimization approach for analog circuit synthesis via multiobjective acquisition ensemble. IEEE TCAD 41, 1 (2021), 1-14.
[13] Zheng Zhang, Tinghuan Chen, Jiaxin Huang, and Meng Zhang. 2022. A fast parameter tuning framework via transfer learning and multi-objective Bayesian optimization. In Proc. DAC. ACM, San Francisco, California, USA, 133-138.