Physics-Inspired Neural Networks For Efficient Device Compact Modeling
Received 22 October 2016; revised 30 November 2016; accepted 2 December 2016. Date of publication 8 December 2016;
date of current version 23 January 2017.
Digital Object Identifier 10.1109/JXCDC.2016.2636161
ABSTRACT We present a novel physics-inspired neural network (Pi-NN) approach for compact modeling.
The development of high-quality compact models for devices is key to connecting device science with
applications. One recent approach is to treat compact modeling as a regression problem in machine
learning. The most common learning algorithm used to develop compact models is the multilayer
perceptron (MLP) neural network. However, device compact models derived using MLP neural networks
often exhibit unphysical behavior, which is eliminated in the Pi-NN approach proposed in this paper,
since the Pi-NN incorporates fundamental device physics. As a result, smooth, accurate, and
computationally efficient device models can be learned from discrete data points by using Pi-NN.
This paper sheds new light on the future of neural network compact modeling.
INDEX TERMS Supervised learning, artificial neural networks, semiconductor device modeling, TFETs.
I. INTRODUCTION
… method has raised a lot of interest [2]–[4], given the fact that it …
FIGURE 2. Training procedure for artificial neural network device compact modeling.
VOLUME 2, 2016 45
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
FIGURE 3. Multilayer perceptron (MLP) neural network model.

FIGURE 4. Compact model of the n-type thin-TFET derived based on the MLP neural network widely used in previous works [2]–[4]. (a) Training errors and test errors for a variety of hyperparameters. (b) MLP neural network with 7 tanh neurons in the first and second hidden layers. From (c)–(f), the I–V curves generated by the MLP neural network shown in (b) are plotted along with the training data and the test data. (c) ID versus VDS at different VTG values. (d) ID versus VTG at different VDS values in linear scale. (e) ID versus VDS at different VTG values around VDS = 0; the embedded plot shows unphysical ID–VDS relationships around VDS = 0. (f) ID versus VTG at different VDS values in semilog scale; unphysical oscillation of ID around zero appears in the subthreshold region and when VDS = 0.

With the hyperparameter (i, j) set to (5, 5), (7, 7), and (9, 9), these three MLP neural networks were trained for 5 million epochs. Using the loss function defined in Fig. 3, the root-mean-squared deviations for the training data and the test data are shown in Fig. 4(a). The test errors are used to evaluate the generalization ability of the model, namely, how well the model fits unseen data. As shown in Fig. 4(a), the test errors stay close to the training errors, which indicates good generalization. We choose to plot the I–V curves modeled by the MLP neural network with 7 tanh neurons in each of the first and second hidden layers, as shown in Fig. 4(b), which gives a neural network with 15 neurons and 85 parameters in total. Fig. 4(c)–(f) shows the I–V curves generated by the MLP neural network compact model along with the training data and the test data. Good fitting in the linear scale is achieved for both the ID–VDS and the ID–VTG curves. However, in a zoomed-in view of the region near VDS = 0, ID is not zero when VDS is zero, indicating that the ID–VDS relationship is unphysical.
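The 15-neuron, 85-parameter count quoted above can be checked directly. Below is a minimal NumPy sketch of the Fig. 4(b) topology (2 inputs, VTG and VDS; two hidden layers of 7 tanh neurons; 1 linear output neuron for ID); the random weights are placeholders, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameter (i, j): neurons in the first and second hidden layers.
i, j = 7, 7

# Placeholder weights for a 2-input (VTG, VDS), 1-output (ID) MLP.
W1, b1 = rng.normal(size=(2, i)), np.zeros(i)
W2, b2 = rng.normal(size=(i, j)), np.zeros(j)
W3, b3 = rng.normal(size=(j, 1)), np.zeros(1)

def mlp(v):                        # v: (n, 2) array of (VTG, VDS) bias points
    h1 = np.tanh(v @ W1 + b1)      # first hidden layer, tanh activation
    h2 = np.tanh(h1 @ W2 + b2)     # second hidden layer, tanh activation
    return h2 @ W3 + b3            # linear output neuron -> modeled ID

# Neuron and parameter counts quoted in the text for (i, j) = (7, 7).
n_neurons = i + j + 1
n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(n_neurons, n_params)         # 15 neurons, 85 parameters
```

Note that nothing in this architecture forces ID to vanish at VDS = 0, which is why the unphysical behavior in Fig. 4(e) and (f) can appear.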
Li et al.: Pi-NN for efficient device compact modeling
VI. CONCLUSION
Motivated by the need for high-quality compact models for emerging devices, we have proposed a novel neural network, the Pi-NN, for compact modeling. With fundamental device physics incorporated, the Pi-NN method can produce accurate, smooth, and computationally efficient transistor models with good generalization ability. The Thin-TFET is presented as an example to illustrate the capabilities of the Pi-NN. A relatively small compact model is achieved with excellent fitting in both the on and the subthreshold regions of the Thin-TFET. The charge–voltage, Q–V, relationships in a device are highly desirable for circuit design. It is possible to construct Q–V relations from the device C–V data (not shown here). However, since the sign of the terminal charge density depends on both VTG and VDS, the Pi-NN architecture cannot be directly applied to modeling Q–V relations. The workaround is to connect VTG and VDS to both the tanh subnet and the sig subnet in the Pi-NN, and to add bias terms to the tanh neurons. This modified Pi-NN is compatible with the adjoint neural network method for constructing Q–V relations from C–V measurements [2], [11]. However, this modified Pi-NN architecture has no apparent advantage over the MLP architecture for Q–V modeling. Future work will focus on how to better integrate Q–V modeling into the Pi-NN framework. Finally, the Pi-NN approach is readily implementable in commercial measurement and modeling systems.
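One plausible reading of the subnet structure discussed above can be sketched as follows. This is a speculative illustration, not the paper's published equations: it assumes each tanh neuron sees only VDS with no bias term (so every tanh output, and hence the modeled ID, vanishes exactly at VDS = 0), each sigmoid neuron sees only VTG (capturing the smooth subthreshold turn-on), and the two subnet outputs are multiplied; the weights are random placeholders.

```python
import numpy as np

def pinn_id(vtg, vds, a, wt, b, c, wg):
    """Hedged sketch of a Pi-NN-style product-form compact model.

    Assumed wiring (illustration only):
      * tanh subnet driven by VDS alone, with NO bias terms, so the
        modeled ID is exactly zero at VDS = 0 by construction;
      * sigmoid subnet driven by VTG alone, giving a smooth turn-on;
      * the two subnet outputs are multiplied, ID ~ f(VDS) * g(VTG).
    """
    f = np.tanh(np.outer(vds, a)) @ wt                          # tanh subnet
    g = (1.0 / (1.0 + np.exp(-(np.outer(vtg, b) + c)))) @ wg    # sigmoid subnet
    return f * g

# 2 tanh and 3 sigmoid hidden neurons, as in Fig. 6(b); weights are
# random placeholders standing in for trained values.
rng = np.random.default_rng(1)
a, wt = rng.normal(size=2), rng.normal(size=2)
b, c, wg = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

vds = np.array([0.0, 0.05, 0.1])
vtg = np.full(3, 0.3)
id_model = pinn_id(vtg, vds, a, wt, b, c, wg)
print(id_model[0])   # exactly 0.0 at VDS = 0, by construction
```

This built-in zero at VDS = 0 is what an unconstrained MLP cannot guarantee, and it also shows why adding bias terms to the tanh neurons (the modified Pi-NN above) relaxes the constraint when the modeled quantity need not vanish there.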
FIGURE 6. Compact model of the n-type thin-TFET derived based on the Pi-NN developed in this paper. (a) Training errors and test errors for a variety of hyperparameters. (b) Pi-NN model with 2 tanh neurons and 3 sigmoid neurons in the hidden layer. From (c)–(f), the I–V curves generated by the Pi-NN model shown in (b) are plotted along with the training data and the test data. (c) ID versus VDS at different VTG values. (d) ID versus VTG at different VDS values in linear scale. (e) ID versus VDS at different VTG values around VDS = 0; the embedded plot shows a well-behaved ID–VDS relationship around VDS = 0. (f) ID versus VTG at different VDS values in semilog scale; good fitting is achieved in the subthreshold region. All the unphysical behaviors of the MLP neural network are eliminated, and the size of the neural network is largely reduced.

… training and test error. More complex models can achieve smaller training and test error, but the improvement is not significant enough to justify the increased complexity.

REFERENCES
[1] J. Sellier et al., "NEMO5, a parallel, multiscale, multiphysics nanoelectronics modeling tool," in Proc. Int. Conf. Simulation Semiconductor Process. Devices (SISPAD), Denver, CO, USA, 2012.
[2] J. Xu and D. E. Root, "Advances in artificial neural network models of active devices," in Proc. IEEE MTT-S Int. Conf. Numer. Electromagn. Multiphys. Modeling Optim. (NEMO), Aug. 2015, pp. 1–3.
[3] H. B. Hammouda, M. Mhiri, Z. Gafsi, and K. Besbes, "Neural-based models of semiconductor devices for SPICE simulator," Amer. J. Appl. Sci., vol. 5, no. 4, pp. 785–791, 2008.
[4] W. Fang and Q.-J. Zhang, "Knowledge-based neural models for microwave design," IEEE Trans. Microw. Theory Techn., vol. 45, no. 12, pp. 2333–2343, Dec. 1997.
[5] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Netw., vol. 4, no. 2, pp. 251–257, 1991.
[6] A. C. Seabaugh and Q. Zhang, "Low-voltage tunnel transistors for beyond CMOS logic," Proc. IEEE, vol. 98, no. 12, pp. 2095–2110, Dec. 2010.
[7] M. O. Li, D. Esseni, J. J. Nahas, D. Jena, and H. G. Xing, "Two-dimensional heterojunction interlayer tunneling field effect transistors (Thin-TFETs)," IEEE J. Electron Devices Soc., vol. 3, no. 3, pp. 200–207, May 2015.
[8] M. O. Li, D. Esseni, G. Snider, D. Jena, and H. G. Xing, "Single particle transport in two-dimensional heterojunction interlayer tunneling field effect transistor," J. Appl. Phys., vol. 115, no. 7, p. 074508, 2014.
[9] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986.
[10] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Mach. Learn. Res., vol. 3, pp. 115–143, Aug. 2002.
[11] J. Xu, M. C. E. Yagoub, R. Ding, and Q. J. Zhang, "Exact adjoint sensitivity analysis for neural-based microwave modeling and design," IEEE Trans. Microw. Theory Techn., vol. 51, no. 1, pp. 226–237, Jan. 2003.