Multiagent Based Reinforcement Learning (MA-RL): An Automated Designer for Complex Analog Circuits
Abstract—Despite the efforts devoted to analog circuit design automation, complex analog circuit design still requires extensive manual iteration, making it labor intensive and time-consuming. Recently, reinforcement learning (RL) algorithms have been successfully demonstrated for analog circuit design optimization. However, a robust and highly efficient RL method for designing analog circuits with a complex design space has not yet been fully explored. In this work, inspired by multiagent planning theory as well as human expert design practice, we propose a multiagent-based RL (MA-RL) framework to tackle this issue. In particular, we 1) partition the complex analog circuit into several subblocks based on topology information, which effectively reduces the complexity of the design search space; 2) leverage MA-RL for circuit optimization, where each agent corresponds to a single subblock and the interactions between agents mimic the design tradeoffs that human experts make between circuit subblocks; 3) introduce and compare three different multiagent RL algorithms and the corresponding frameworks to demonstrate the effectiveness of the MA-RL method; 4) employ twin-delayed techniques and proximal policy optimization to further boost training stability and achieve higher performance; 5) investigate the impact of different reward function definitions and different state settings of the MA-RL agents to further improve the robustness of the framework; and 6) demonstrate experiments on three different complex analog circuit topologies (gain boost amplifier, delay-locked loop, and SAR ADC) as well as knowledge transfer between two technology nodes. It is shown that the MA-RL framework can achieve the best figures of merit for the design of complex analog circuits. This work sheds light on future large-scale analog circuit system design automation.

Index Terms—Circuit design automation, complex analog circuits, multiagent reinforcement learning (MA-RL), proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (DDPG).

I. INTRODUCTION

Due to the development of the Internet of Things (IoT), 5G/6G communication, and edge computing, the demand for electronic chip integrated circuit (IC) design has increased drastically. Digital circuit design implements functionalities based on standard cells and takes advantage of various computer-aided design (CAD) tools [1]. However, the design of analog and mixed-signal (AMS) circuits is still extremely labor intensive and time-consuming due to their highly nonlinear behavior and the complex tradeoffs among circuit Specs. It still relies heavily on the iterative optimization of analog circuit topology selection, component parameter selection, and layout routing by human design experts. This is not only time-consuming but also highly dependent on designer experience, making analog circuit design an "art" with large variations.

To tackle this problem, automated design efforts for analog circuits have been increasingly investigated [2], [3], [4], [5], [6], [17]. These efforts primarily focus on three different aspects: 1) automated selection of circuit topology [14], [15]; 2) synthesis and optimization of circuit parameters, such as device sizing under a determined topology [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13]; and 3) automatic layout placement and routing [16], [17]. This work focuses on the second aspect. It is also noted that some commercial EDA software has begun to provide sizing functions for analog circuits; e.g., the "Global optimization" tool in Cadence can achieve good automated sizing results for some analog circuits. However, its underlying algorithms are not disclosed to the public, and its capability for optimizing complex circuits with multiple subblocks is still unknown. Therefore, academic research on novel algorithms for complex analog circuit optimization is still desired: it contributes to a better understanding of these algorithms by the community, and it may also serve as a good supplement to existing tools in certain scenarios. Existing research works on analog circuit parameter optimization can be

Manuscript received 14 July 2023; revised 12 November 2023, 10 January 2024, and 11 April 2024; accepted 25 April 2024. Date of publication 8 May 2024; date of current version 22 November 2024. This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFA0711900 and Grant 2020YFA0711901; in part by the National Natural Science Foundation of China Research under Project 62350610270, Project 62374034, Project 62235009, Project 62141407, and Project 62304052; in part by the Innovation Program of Shanghai Municipal Education Commission under Grant 2021-01-07-00-07-E00077; and in part by the Natural Science Foundation of Shanghai under Grant 22ZR1403500. This article was recommended by Associate Editor G. G. E. Gielen. (Corresponding authors: Zhangcheng Huang; Xuan Zeng; Ye Lu.)

Jiarui Bao, Jinxin Zhang, Xingwei Feng, and Ye Lu are with the State Key Laboratory of Integrated Chips and Systems, School of Information Science and Technology, Fudan University, Shanghai 200433, China (e-mail: lu_ye@fudan.edu.cn).

Zhangcheng Huang is with the Frontier Institute of Chip and System Shanghai, Fudan University, Shanghai 200433, China (e-mail: huangzc@fudan.edu.cn).

Zhaori Bi and Xuan Zeng are with the State Key Laboratory of Integrated Chips and Systems, School of Microelectronics, Fudan University, Shanghai 200433, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCAD.2024.3398554
1937-4151 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on December 24,2024 at 12:52:20 UTC from IEEE Xplore. Restrictions apply.
BAO et al.: MA-RL: AN AUTOMATED DESIGNER FOR COMPLEX ANALOG CIRCUITS 4399
4400 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 43, NO. 12, DECEMBER 2024
where $s_j^{t,\min}$ is defined as the target value of a Spec if that particular Spec is the smaller the better, while $s_j^{t,\max}$ is defined as the target value of a Spec if that particular Spec is the larger the better; otherwise, they are set to the worst boundary. Taking gain as an example, if the gain of a circuit is expected to be 10 dB or greater, then $s_j^{t,\max}$ is set to 10 and $s_j^{t,\min}$ is set to 0. If $s_j^{t}$ is worse than the worst-boundary value, −1 is assigned as the score for this reward.

$r_j^{t,b}$ is defined referring to [34]. Particularly, a clipping mechanism is introduced for the Spec reward definition to alleviate the problem of over-shooting certain Specs while under-weighting others. The clipping reward of the jth performance Spec, $r_j^{t,b}$, can be divided into $r_{j,u}^{t,b}$ and $r_{j,d}^{t,b}$, each defined as follows:

$$
r_{j,u}^{t,b} =
\begin{cases}
-1, & \text{if } s_{j,u}^{t} < s_{j,u}^{t,\min} \\
\min\!\left(1,\ \dfrac{s_{j,u}^{t} - s_{j,u}^{t,\min}}{s_{j,u}^{t,\max} - s_{j,u}^{t,\min}}\right), & \text{if } s_{j,u}^{t} > s_{j,u}^{t,\min} \text{ and } \sum_{j \in M} r_j^{t} < M \\
\dfrac{s_{j,u}^{t} - s_{j,u}^{t,\min}}{s_{j,u}^{t,\max} - s_{j,u}^{t,\min}}, & \text{if } s_{j,u}^{t} > s_{j,u}^{t,\min} \text{ and } \sum_{j \in M} r_j^{t} \ge M
\end{cases}
\tag{12}
$$

$$
r_{j,d}^{t,b} =
\begin{cases}
-1, & \text{if } s_{j,d}^{t} > s_{j,d}^{t,\max} \\
\min\!\left(1,\ \dfrac{s_{j,d}^{t,\max} - s_{j,d}^{t}}{s_{j,d}^{t,\max} - s_{j,d}^{t,\min}}\right), & \text{if } s_{j,d}^{t} < s_{j,d}^{t,\max} \text{ and } \sum_{j \in M} r_j^{t} < M \\
\dfrac{s_{j,d}^{t,\max} - s_{j,d}^{t}}{s_{j,d}^{t,\max} - s_{j,d}^{t,\min}}, & \text{if } s_{j,d}^{t} < s_{j,d}^{t,\max} \text{ and } \sum_{j \in M} r_j^{t} \ge M
\end{cases}
\tag{13}
$$

where $r_{j,u}^{t,b}$ represents the reward for the Spec $s_{j,u}^{t}$, which is the larger the better, and $r_{j,d}^{t,b}$ represents the reward for the Spec $s_{j,d}^{t}$, which is the smaller the better. $s_{j,u}^{t,\max}$ and $s_{j,d}^{t,\min}$ are defined as the target values of the Spec $s_{j,u}^{t}$ and the Spec $s_{j,d}^{t}$, respectively, while $s_{j,u}^{t,\min}$ and $s_{j,d}^{t,\max}$ are defined as the worst-boundary values. When a Spec fails to meet the worst boundary, the reward of the Spec is set to −1. When a Spec meets the design objective, but the entire circuit does not satisfy its overall design objective, the reward is fixed at 1. Once the entire circuit meets its overall design objective, the reward of this particular Spec continues to increase and it is further optimized.

D. Algorithm Execution Flow

In the optimization process, we first leverage the multiagent DDPG (MADDPG) algorithm [29]; the detailed algorithm flow is shown in Algorithm 1. $x = (S_1, \ldots, S_N)$ and $x' = (S'_1, \ldots, S'_N)$ denote the current and the next set of states of the circuit with N subblocks. Similarly, $A = (A_1, \ldots, A_N)$ and $R = (R_1, \ldots, R_N)$ denote the set of actions and rewards of the N subblocks, respectively. Each agent in MADDPG contains a critic network and an actor network. The critic network provides a Q value which contains the reward and the future discounted reward of the subblock. The actor network is trained to produce an action that maximizes the Q value.

Note that DDPG requires a stable environment for training, while the subblocks of a circuit interact with each other; therefore, these subblocks cannot be trained by multiple parallel non-interactive DDPG algorithms. MADDPG takes advantage of centralized training with decentralized execution to solve this problem.

Algorithm 1: MADDPG for Circuit Optimization
Create N agents for the complex analog circuit's N sub-blocks
Randomly initialize the actor network $\mu_i(S_i \mid \theta_i^{\mu})$ and the critic network $Q_i(S_i, A_i \mid \theta_i^{Q})$ in the i-th agent for all $i \in N$;
Initialize replay buffer D and batch size B;
for t = 1 to T do
    Randomly generate initial states in each agent $x = (S_1, \ldots, S_N)$;
    if_1 size(D) < B then
        Generate random sample action $A_i \leftarrow \mu_i(S_i \mid \theta_i^{\mu})$ in the i-th agent for $i \in N$
    else
        Select action for the i-th agent according to policy and exploration noise $\sigma$, $A_i \leftarrow \mu_i(S_i \mid \theta_i^{\mu}) + \sigma$;
    end if_1
    Use $A = (A_1, \ldots, A_N)$ as input for circuit simulation and observe new state $x' = (S'_1, \ldots, S'_N)$ and reward $R = (R_1, \ldots, R_N)$;
    Store transition $(x, A, R, x')$ in replay buffer D;
    if_2 size(D) > B then
        Sample a batch of B transitions $(\hat{x}, \hat{A}, \hat{R}, \hat{x}')$ in D,
        for agent i = 1 to N do
            $\hat{A}'_i \leftarrow \mu'_i(\hat{S}'_i \mid \theta_i^{\mu'})$;
            $y_i \leftarrow r_i + \gamma \cdot Q'_i(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N)$;
            update the critic network in the i-th agent by minimizing loss function: $L = \frac{1}{B} \sum \left( Q_i(\hat{x}, \hat{A}_1, \ldots, \hat{A}_N) - y_i \right)^2$;
            update actor network in the i-th agent by the gradient: $\nabla_{\theta_i^{\mu}} J(\theta_i^{\mu}) = \frac{1}{B} \sum \nabla_{a_i} Q_i(\hat{x}, \hat{A}_1, \ldots, \hat{A}_N)\, \nabla_{\theta_i^{\mu}} \mu_i(\hat{S}_i \mid \theta_i^{\mu})$;
        end for
    end if_2
end for

1) In the execution phase, MADDPG establishes a replay buffer D to store the sampled transitions $(x, A, R, x')$. If the number of transitions in D is less than the batch size B, the untrained actor networks are used to generate random design parameters, which are then used for circuit simulation. However, once the number of transitions in D becomes larger than the batch size B, predictions of the optimal policy are made using the actor networks that have been trained. In the execution phase, a particular actor network only needs the information of its own local agent ($S_i$ in $x$) to generate an action $A_i$. At the same time, exploration noise $\sigma$ is added to the action $A_i$ to help reach the global optimal point. The action vector $A = (A_1, \ldots, A_N)$ is provided to the circuit for circuit simulation, and then the next state $x'$ and reward $R$ for all subblock circuits can be observed.
2) The centralized training phase is based on the transitions $(x, A, R, x')$ collected in the execution phase. Concretely, each agent takes the information from all agents to train its critic network, and then the output of the critic network of an agent influences its corresponding actor network by changing its loss function. For example, in a complex analog circuit with N subblocks, the performance of the ith subblock is influenced not only by its own design parameters but also by those in other subblocks. In the execution in MADDPG framework,
each agent is able to take into account the design parameters and state vectors from all subblocks for calculating the Q value both for the current state and that of the next state. Combining the reward $R_i$ and $Q'$, the state-action value function $y_i$ of the ith agent is obtained using the Bellman equation, as described in Algorithm 1. The critic network is updated based on the loss function of the current Q value and $y_i$. Subsequently, the estimated Q value obtained by the critic network is used to compute policy gradients, which are then used to update the actor network. The centralized training method of MADDPG enables each agent to learn and interact effectively in an environment affected by the influences of other subblocks and is suitable for complex analog circuits that require multiple subblocks for tradeoff optimization.
3) For MA-RL algorithm termination, a maximum number of simulation runs is first set based on the design complexity of a particular circuit, e.g., 15 000 for a gain boost amplifier (GBA) circuit case. A particular design solution is updated and recorded if its reward is higher than the previous one. Finally, if the design solution does not reach the design Specs within the entire 15 000 runs, the one with the highest reward will be output. On the other hand, if the design Specs are reached for a particular run and the result of this run is not updated for another 1000 simulation runs, we consider the result to have converged and the algorithm will be terminated.

It is noted that in prior art a "Done" flag is incorporated in the termination condition [35]; this has also been studied for the complex analog circuit design optimization task in Section IV. The "Done" flag is a Boolean variable assigned either 0 or 1, where the change in assignment indicates the end of an episode and the start of a new one. When the "Done" flag is introduced, the Bellman equation for each agent in Algorithm 1 is modified as follows:

$$
y_i \leftarrow r_i + \gamma \cdot Q'_i\!\left(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N\right)(1 - \text{Done}). \tag{14}
$$

E. Enhancing RL Agent With Twin Delayed Technique

Similar to DQN's max operation on the value function, MADDPG also tends to overestimate the value function [31]. In the circuit automation task, it may preferentially select circuit transitions $(A, S, S', R)$ with a large value that is overestimated and finish the circuit design at a suboptimal point. Furthermore, the DDPG algorithm appears to suffer from weak robustness and stability.

To further improve design automation quality and efficiency, the multiagent twin delayed DDPG (TD3) algorithm is introduced for model training [31], and three major techniques are added in TD3.

1) Double Network: Two critic networks instead of one are employed for value function evaluation, and the lower-value one is used for the target update

$$
y_i \leftarrow r_i + \gamma \cdot \min\!\left( Q'_{1,i}\!\left(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N\right),\ Q'_{2,i}\!\left(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N\right) \right). \tag{15}
$$

This mitigates the risk of value function overestimation caused by training noise. Similar to that of Algorithm 1, the Bellman equation for each agent in Algorithm 2 needs to be modified as follows when the "Done" flag is introduced:

$$
y_i \leftarrow r_i + \gamma \cdot \min\!\left( Q'_{1,i}\!\left(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N\right),\ Q'_{2,i}\!\left(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N\right) \right)(1 - \text{Done}). \tag{16}
$$

2) Delayed Policy Update: The actor network is updated at a lower frequency than the critic networks to minimize the expectation of the TD-error and improve the stability of circuit strategy training.

3) Target Policy Smoothing Regularization: Normally distributed noise is added to the training actions in the critic network for regularization; this reduces ANN overfitting and improves the stability of circuit training. The execution flow of these changes is detailed in Algorithm 2.

Algorithm 2: MATD3 for Circuit Optimization
Provide N agents for the complex analog circuit's N sub-blocks
Randomly initialize the actor network $\mu_i(S_i \mid \theta_i^{\mu})$ and two critic networks $Q_{1,i}(S_i, A_i \mid \theta_i^{Q_1})$, $Q_{2,i}(S_i, A_i \mid \theta_i^{Q_2})$ in the i-th agent;
The pseudo-code in the next part is the same as that in Algorithm 1 except that the if_2 block is replaced by the if_3 block shown below:
if_3 size(D) > B then
    Sample a batch of B transitions $(\hat{x}, \hat{A}, \hat{R}, \hat{x}')$ in D;
    for agent i = 1 to N do
        Obtain target actor according to the actor network and regularization noise $\epsilon$, $\hat{A}'_i \leftarrow \mu'_i(\hat{S}'_i \mid \theta_i^{\mu'}) + \epsilon$;
        $y_i \leftarrow r_i + \gamma \cdot \min\!\left( Q'_{1,i}(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N),\ Q'_{2,i}(\hat{x}', \hat{A}'_1, \ldots, \hat{A}'_N) \right)$;
        update the critic networks in the i-th agent by minimizing loss function:
            $L(\theta_i^{Q_1}) \leftarrow \frac{1}{B} \sum \left( Q_{1,i}(\hat{x}, \hat{A}_1, \ldots, \hat{A}_N) - y_i \right)^2$;
            $L(\theta_i^{Q_2}) \leftarrow \frac{1}{B} \sum \left( Q_{2,i}(\hat{x}, \hat{A}_1, \ldots, \hat{A}_N) - y_i \right)^2$;
        if_4 t mod d then
            update actor network in the i-th agent by the gradient: $\nabla_{\theta_i^{\mu}} J(\theta_i^{\mu}) = \frac{1}{B} \sum \nabla_{a_i} Q_{1,i}(\hat{x}, \hat{A}_1, \ldots, \hat{A}_N)\, \nabla_{\theta_i^{\mu}} \mu_i(\hat{S}_i \mid \theta_i^{\mu})$;
        end if_4
    end for
end if_3

As an initial test, we compare the robustness of the TD3 and DDPG algorithms in the optimization of a rail-to-rail OTA [Fig. 3(a)] designed on a commercial 0.13-μm mixed-signal CMOS process PDK. The rail-to-rail OTA has 36 tunable design parameters and 7 design objectives. The design objectives are defined as follows:

$$
\begin{aligned}
&\text{Open Loop Gain} \ge 100\ \text{dB} \\
&\text{Unity Gain Frequency} \ge 50\ \text{MHz} \\
&\text{Power} \le 200\ \mu\text{W} \\
&\text{Slew Rate} \ge 10\ \text{V}/\mu\text{s} \\
&\text{PSRR} \ge 80\ \text{dB} \\
&\text{CMRR} \ge 100\ \text{dB} \\
&\text{Input common-mode range} = 1.2\ \text{V}.
\end{aligned}
\tag{17}
$$

In the comparison, the rewards are calculated based on (10) and (11), and then normalized according to the design Specs.
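As a minimal sketch of the twin-critic target in (15) and (16), the following Python functions compute the target $y_i$ for one agent from two next-state critic estimates, together with the batch critic loss used in Algorithm 2. The function names, the default discount factor of 0.99, and the choice to refresh the actor when the step index is a multiple of the delay d are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def td3_target(reward: float, q1_next: float, q2_next: float,
               done: bool, gamma: float = 0.99) -> float:
    """Twin-critic TD target in the form of (16); gamma=0.99 is an assumed default."""
    # Double Network: keep the lower of the two next-state critic estimates.
    q_min = min(q1_next, q2_next)
    # The (1 - Done) factor drops the bootstrap term when an episode ends.
    return reward + gamma * q_min * (1.0 - float(done))


def critic_loss(q_pred: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic predictions and targets over a batch."""
    if len(q_pred) != len(targets) or len(q_pred) == 0:
        raise ValueError("q_pred and targets must be non-empty and equally long")
    return sum((q - y) ** 2 for q, y in zip(q_pred, targets)) / len(q_pred)


def should_update_actor(step: int, policy_delay: int) -> bool:
    """Delayed Policy Update: assumed here to fire every `policy_delay` steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    y = td3_target(reward=1.0, q1_next=2.0, q2_next=3.0, done=False, gamma=0.5)
    print(y)  # bootstraps from the smaller estimate, 2.0
    print(critic_loss([1.0, 2.0], [0.0, 4.0]))
```

Setting `done=True` reduces the target to the immediate reward, which is the effect of the (1 − Done) factor in (14) and (16).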
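The termination rule described for the MA-RL algorithm (a fixed simulation budget, plus early stopping once a Spec-meeting result has not been improved for 1000 runs) can be sketched as a small bookkeeping class. The class name `TerminationTracker`, its fields, and the reading that "not updated" means "no higher reward recorded" are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationTracker:
    max_runs: int = 15_000   # simulation budget quoted for the GBA case
    patience: int = 1_000    # runs without improvement after the Specs are met
    runs: int = 0
    best_reward: float = float("-inf")
    best_run: Optional[int] = None
    best_meets_specs: bool = False

    def update(self, reward: float, specs_met: bool) -> bool:
        """Record one simulation run; return True when the search should stop."""
        self.runs += 1
        # A design solution is recorded only if its reward beats the previous best.
        if reward > self.best_reward:
            self.best_reward = reward
            self.best_run = self.runs
            self.best_meets_specs = specs_met
        # Converged: the best solution meets the Specs and has stood for `patience` runs.
        converged = (self.best_meets_specs
                     and self.best_run is not None
                     and self.runs - self.best_run >= self.patience)
        return converged or self.runs >= self.max_runs


if __name__ == "__main__":
    tracker = TerminationTracker(max_runs=10, patience=3)
    for reward, met in [(1.0, True), (0.5, False), (0.5, False), (0.5, False)]:
        stop = tracker.update(reward, met)
    print(stop, tracker.best_reward)
```

If the budget is exhausted without meeting the Specs, `best_reward` still identifies the highest-reward solution to output.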
Fig. 5. Schematic of (a) DLL circuits, (b) timing diagram, (c) PD block, (d) VCDL block, and (e) CP block.
Fig. 6. Schematic of (a) SAR ADC circuits, (b) BS block, (c) CDAC, (d) comparator block, and (e) successive approximation register logic block.
TABLE I
PERFORMANCE SPECS COMPARISONS OF GAIN BOOST AMPLIFIER OBTAINED BY TEN OPTIMIZATIONS OF MATD3 WITH TWO DIFFERENT REWARD SETTINGS

harmonic distortion (THD), (e) power, and (f) FOM. The design objectives are defined as follows:

$$
\begin{aligned}
&\text{Effective Number of Bits} \ge 9.5\ \text{bit} \\
&\text{Signal-to-Noise and Distortion Ratio} \ge 60\ \text{dB} \\
&\text{Spurious-Free Dynamic Range} \ge 65\ \text{dB} \\
&\text{Total Harmonic Distortion} \le -60\ \text{dB} \\
&\text{Power} \le 8\ \text{mW} \\
&\text{FOM} \le 220\ \text{fJ/c.step}.
\end{aligned}
\tag{21}
$$
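To illustrate how targets such as those in (21) could feed the clipped Spec reward of (12) and (13), the sketch below scores a set of measured SAR ADC Specs. The target values come from (21); the worst-boundary values, the dictionary keys, and the function names are hypothetical and chosen only for this example. A Spec exactly on its worst boundary, a case the equations leave open, is scored 0 here.

```python
# name: (direction, worst boundary [hypothetical], target value from (21))
SPECS = {
    "enob_bit": ("up", 6.0, 9.5),
    "sndr_db": ("up", 40.0, 60.0),
    "sfdr_db": ("up", 45.0, 65.0),
    "thd_db": ("down", -40.0, -60.0),
    "power_mw": ("down", 16.0, 8.0),
    "fom_fj_per_step": ("down", 440.0, 220.0),
}


def raw_score(direction, worst, target, value):
    """Normalized distance from the worst boundary; None if the boundary is violated."""
    if direction == "up":          # larger the better, as in (12)
        if value < worst:
            return None
        return (value - worst) / (target - worst)
    if value > worst:              # smaller the better, as in (13)
        return None
    return (worst - value) / (worst - target)


def spec_rewards(measured):
    """Per-Spec rewards with clipping at 1 until every Spec reaches its target."""
    raw = {name: raw_score(*SPECS[name], measured[name]) for name in SPECS}
    clipped = {n: -1.0 if s is None else min(1.0, s) for n, s in raw.items()}
    # Clipping is released only when the summed clipped reward reaches M.
    if sum(clipped.values()) >= len(SPECS):
        return dict(raw)
    return clipped


if __name__ == "__main__":
    measured = {"enob_bit": 9.5, "sndr_db": 50.0, "sfdr_db": 70.0,
                "thd_db": -30.0, "power_mw": 8.0, "fom_fj_per_step": 330.0}
    print(spec_rewards(measured))
```

In this example the SFDR over-shoot is capped at 1 because THD violates its (hypothetical) worst boundary, so the over-shoot cannot compensate for the failing Spec.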
TABLE II
PERFORMANCE METRICS COMPARISONS OF GBA
TABLE III
PERFORMANCE SPECS COMPARISONS OF DLL

TABLE IV
PERFORMANCE SPECS COMPARISONS OF SAR ADC

TABLE V
PERFORMANCE SPECS OF DLL FOR DIRECT MA-RL ACTION TRANSFER FROM 0.13- TO 0.18-μm TECHNOLOGY
V. CONCLUSION

We propose MA-RL-based methods for the automated optimization of complex analog circuits to achieve better performance. The circuit is partitioned into subblocks based on topology information. Each subblock is assigned to an RL agent, for which a specific reward is designed based on design knowledge. The agents then interact with each other during the training process to reach the overall design goal. Furthermore, the twin delayed technique and PPO are introduced to further improve training stability and efficiency. Finally, adopting reward clipping and setting circuit operating conditions as agent states can help improve the success rate of the optimization algorithm in achieving the design objectives. This work opens a pathway for large-scale analog circuit as well as system design.

REFERENCES

[1] I. A. M. Elfadel, D. S. Boning, and X. Li, Machine Learning in VLSI Computer-Aided Design. Cham, Switzerland: Springer, 2018.
[2] E. Deniz and G. Dündar, "Hierarchical performance estimation of analog blocks using Pareto Fronts," in Proc. 6th Conf. Ph.D. Res. Microelectron. Electron., 2010, pp. 1–4.
[3] G. Alpaydin, S. Balkir, and G. Dundar, "An evolutionary approach to automatic synthesis of high-performance analog integrated circuits," IEEE Trans. Evol. Comput., vol. 7, no. 3, pp. 240–252, Jun. 2003.
[4] M. Barros, J. Guilherme, and N. Horta, "Analog circuits optimization based on evolutionary computation techniques," Integration, vol. 43, no. 1, pp. 136–155, 2010.
[5] G. Wolfe and R. Vemuri, "Extraction and use of neural network models in automated synthesis of operational amplifiers," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 2, pp. 198–212, Feb. 2003.
[6] J. P. S. Rosa, D. J. D. Guerra, N. C. G. Horta, R. M. F. Martins, and N. C. C. Lourenço, Using ANNs to Size Analog Integrated Circuits. Cham, Switzerland: Springer Int. Publ., 2019.
[7] W. L. Lyu, F. Yang, C. H. Yan, D. A. Zhou, and X. Zeng, "Batch Bayesian optimization via multi-objective acquisition ensemble for automated analog circuit design," in Proc. 35th Int. Conf. Mach. Learn. (ICML), 2018, pp. 10–15.
[8] W. Lyu et al., "An efficient Bayesian optimization approach for automated optimization of analog circuits," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 6, pp. 1954–1967, Jun. 2018.
[9] S. Zhang, F. Yang, C. Yan, D. Zhou, and X. Zeng, "An efficient batch-constrained Bayesian optimization approach for analog circuit synthesis via multiobjective acquisition ensemble," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 1, pp. 1–14, Jan. 2022.
[10] H. Wang et al., "GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning," in Proc. 57th ACM/IEEE Design Autom. Conf. (DAC), 2020, pp. 1–6.
[11] N. S. Karthik Somayaji, H. Hu, and P. Li, "Prioritized reinforcement learning for analog circuit optimization with design knowledge," in Proc. 58th ACM/IEEE Design Autom. Conf. (DAC), 2021, pp. 1231–1236.
[12] K. Settaluri, A. Haj-Ali, Q. Huang, K. Hakhamaneshi, and B. Nikolic, "AutoCkt: Deep reinforcement learning of analog circuit designs," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), 2020, pp. 490–495.
[13] K. Settaluri, Z. Liu, R. Khurana, A. Mirhaj, R. Jain, and B. Nikolic, "Automated design of analog circuits using reinforcement learning," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 9, pp. 2794–2807, Sep. 2022.
[14] T. Sripramong and C. Toumazou, "The invention of CMOS amplifiers using genetic programming and current-flow analysis," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 11, pp. 1237–1252, Nov. 2002.
[15] X. Wang and L. Hedrich, "An approach to topology synthesis of analog circuits using hierarchical blocks and symbolic analysis," in Proc. Asia South Pac. Conf. Design Autom. (ASPDAC), 2006, pp. 700–705.
[16] H. Chen et al., "MAGICAL: An open-source fully automated analog IC layout system from netlist to GDSII," IEEE Design Test, vol. 38, no. 2, pp. 19–26, Apr. 2021.
[17] H. Habal and H. Graeb, "Constraint-based layout-driven sizing of analog circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 8, pp. 1089–1102, Aug. 2011.
[18] M. Fakhfakh, Y. Cooren, A. Sallem, M. Loulou, and P. Siarry, "Analog circuit design optimization through the particle swarm optimization technique," Analog Integr. Circuits Signal Process., vol. 63, no. 1, pp. 71–82, 2011.
[19] M. del Mar Hershenson, S. S. Mohan, S. P. Boyd, and T. H. Lee, "Optimization of inductor circuits via geometric programming," in Proc. Design Autom. Conf. (DAC), 1999, pp. 994–998.
[20] A. K. Singh, K. Ragab, M. Lok, C. Caramanis, and M. Orshansky, "Predictable equation-based analog optimization based on explicit capture of modeling error statistics," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 10, pp. 1485–1498, Oct. 2012.
[21] Y. Li, Y. Wang, Y. Li, R. Zhou, and Z. Lin, "An artificial neural network assisted optimization system for analog design space exploration," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 10, pp. 2640–2653, Oct. 2020.
[22] B. Liu, D. Zhao, P. Reynaert, and G. G. E. Gielen, "GASPAD: A general and efficient mm-wave integrated circuit synthesis method based on surrogate model assisted evolutionary algorithm," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 2, pp. 169–182, Feb. 2014.
[23] Z. Zhao and L. Zhang, "Deep reinforcement learning for analog circuit sizing," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2020, pp. 1–5.
[24] E. Ephrati and J. S. Rosenschein, "Divide and conquer in multi-agent planning," in Proc. Int. Conf. Assoc. Adv. Artif. Intell., 1994, pp. 375–380.
[25] G. Berkol, E. Afacan, G. Dündar, and E. V. Fernandez, "A hierarchical design automation concept for analog circuits," in Proc. IEEE Int. Conf. Electron., Circuits Syst. (ICECS), 2016, pp. 133–136.
[26] M. Neuner and H. Graeb, "Hierarchical analog power-down synthesis," in Proc. 27th IEEE Int. Conf. Electronics, Circuits Syst. (ICECS), 2020, pp. 1–4.
[27] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[28] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.
[29] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-agent actor–critic for mixed cooperative-competitive environments," in Proc. 31st Annu. Conf. Neural Inf. Process. Syst. (NIPS), 2017, pp. 6382–6393.
[30] D. Silver, G. Lever, N. Heess, T. Degris, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. 31st Int. Conf. Mach. Learn. (ICML), 2014, pp. 387–395.
[31] S. Fujimoto, H. V. Hoof, and D. Meger, "Addressing function approximation error in actor–critic methods," in Proc. 35th Int. Conf. Mach. Learn. (ICML), 2018, pp. 1587–1596.
[32] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017, arXiv:1707.06347.
[33] J. Wang, L. Siek, R. Filippi, and K. A. Ng, "A top-down design verification based on reuse modular and parametric behavioral modeling for subranging pipelined analog-to-digital converter," in Proc. Int. Symp. Integr. Circuits (ISIC), 2007, pp. 378–381.
[34] W. Shi et al., "RobustAnalog: Fast variation-aware analog circuit design via multi-task RL," in Proc. ACM/IEEE 4th Workshop Mach. Learn. CAD (MLCAD), 2022, pp. 35–41.
[35] Z. Li and A. C. Carusone, "Design and optimization of low-dropout voltage regulator using relational graph neural network and reinforcement learning in open-source SKY130 process," in Proc. IEEE/ACM Int. Conf. Comput. Aided Design (ICCAD), 2023, pp. 1–9.

Jiarui Bao received the B.S. degree in material physics from Nanchang University, Nanchang, China, in 2018. He is currently pursuing the Ph.D. degree with the School of Information Science and Technology, Fudan University, Shanghai, China.
His research interests include analog circuit design automation and optimization, RF circuit design automation, and semiconductor devices and physics.
Jinxin Zhang received the B.S. degree in information engineering from Shandong University, Jinan, Shandong, China, in 2020, and the M.S. degree from Fudan University, Shanghai, China, in 2023.
His research interests include analog circuit design automation and reinforcement learning.

Zhangcheng Huang (Member, IEEE) received the B.S. degree in physics from Nanjing University, Nanjing, China, in 2006, and the Ph.D. degree in microelectronics from the Chinese Academy of Sciences, Beijing, China, in 2011.
From 2011 to 2013, he was an Assistant Professor with the Shanghai Institute of Technical Physics, Chinese Academy of Sciences. From 2013 to 2020, he was an Associate Professor. He is currently an Associate Professor with the Frontier Institute of Chip and System, Fudan University, Shanghai, China. His research interests include high-performance ASICs for detectors and smart imaging sensors.

Xuan Zeng (Senior Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Fudan University, Shanghai, China, in 1991 and 1997, respectively.
She is currently a Full Professor with the Microelectronics Department. She served as the Director of the State Key Laboratory of Application Specific Integrated Circuits and Systems, Fudan University from 2008 to 2012. She was a Visiting Professor with the Department of Electrical Engineering, Texas A&M University, College Station, TX, USA, and with the Microelectronics Department, Technische Universiteit Delft, Delft, The Netherlands, in 2002 and 2003, respectively. Her current research interests include analog circuit modeling and synthesis, design for manufacturability, high-speed interconnect analysis and optimization, and circuit simulation.
Prof. Zeng received the Best Paper Award from Integration, the VLSI Journal 2018 and the Best Paper Award from the 8th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference 2017. She received the Changjiang Distinguished Professor with the Ministry of Education Department of China in 2014, the Chinese National Science Funds for Distinguished Young Scientists in 2011 and the First-Class of Natural Science Prize of Shanghai in 2012, 10th For Women in Science Award in China in 2013, and Shanghai Municipal Natural Science Peony Award in 2014. She is an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS: PART II, IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and ACM TRANSACTIONS ON DESIGN AUTOMATION ON ELECTRONICS AND SYSTEMS.