
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

Received 22 October 2016; revised 30 November 2016; accepted 2 December 2016. Date of publication 8 December 2016;
date of current version 23 January 2017.
Digital Object Identifier 10.1109/JXCDC.2016.2636161

Physics-Inspired Neural Networks for Efficient Device Compact Modeling
MINGDA LI1, OZAN İRSOY2, CLAIRE CARDIE2, AND HUILI GRACE XING1
1School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14850, USA
2Department of Computer Science, Cornell University, Ithaca, NY 14860, USA

CORRESPONDING AUTHORS: H. G. XING and M. LI ([email protected]; [email protected])


This work was supported in part by the Center for Low Energy Systems Technology, one of the six SRC STARnet Centers, in part by MARCO
and DARPA, and in part by the National Science Foundation and Air Force Office of Scientific Research EFRI 2-DARE under Grant 1433490.

ABSTRACT We present a novel physics-inspired neural network (Pi-NN) approach for compact modeling.
The development of high-quality compact models for devices is key to connecting device science with
applications. One recent approach is to treat compact modeling as a regression problem in machine learning.
The most common learning algorithm used to develop compact models is the multilayer perceptron (MLP)
neural network. However, device compact models derived using MLP neural networks often exhibit
unphysical behavior, which is eliminated in the Pi-NN approach proposed in this paper, since the Pi-NN
incorporates fundamental device physics. As a result, smooth, accurate, and computationally efficient device
models can be learned from discrete data points by using the Pi-NN. This paper sheds new light on the future
of neural network compact modeling.

INDEX TERMS Supervised learning, artificial neural networks, semiconductor device modeling, TFETs.

I. INTRODUCTION

DEVICE compact modeling bridges device science to applications, and therefore it plays a very important role in device research. There are two extremes for device modeling: one is purely physical and the other is purely empirical. Looking at these two extremes, a purely physical modeling method, such as NEMO [1], is computationally expensive for use in circuit simulations, while a purely empirical modeling method, such as the table lookup model, has limited generalization (extrapolation) ability. Therefore, to find a middle ground between purely physical and purely empirical models, the Electronic Design Automation industry, represented by the Compact Model Coalition, chooses to promote physics-based compact models. These use fundamental device physics as the building blocks, then add empirical fitting to modify and merge different analytical physical expressions into smooth functions. However, developing high-quality physics-based compact models is very time-consuming, so such models are often not available for emerging devices. As an alternative, regression with machine learning can be used to model relationships between different variables with certain generalization abilities. Among different regression algorithms, the neural network modeling method has attracted a lot of interest [2]–[4], given that it is theoretically capable of arbitrarily accurate approximation of any function and its derivatives [5].

Compared with another widely used data-driven model, the table lookup model, the neural network model performs better on the following three aspects.

1) Scalability: In order to achieve a certain level of accuracy, the table lookup model needs a large amount of data, and the space complexity increases exponentially with increasing dimensions. In contrast, the neural network model is lightweight and scalable.

2) Generalization: The table lookup model has poor generalization performance; the polynomial fitting used in the table lookup model often has high out-of-sample errors. In contrast, with correct learning algorithms, the neural network model can generalize well, which makes it more robust against noise.

3) Smoothness: An ideal compact model needs to be infinitely differentiable. The table lookup model is not infinitely differentiable due to the nature of polynomial fitting; higher-order polynomial fitting improves smoothness, but at the expense of computational efficiency. Therefore, the table lookup model cannot be both smooth and computationally efficient. In contrast, the neural network model is guaranteed to be infinitely differentiable (a small numerical illustration follows this list).
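To make the smoothness contrast concrete, the short Python sketch below (ours, not from the paper; the sampled characteristic and weights are hypothetical, and piecewise-linear interpolation stands in for the simplest table-lookup scheme) compares the numerical derivative of a table lookup with that of a single tanh neuron: the former's derivative jumps at the grid points, while the latter is infinitely differentiable.

```python
import numpy as np

# Hypothetical 1-D "device" characteristic sampled on a coarse bias grid.
v_grid = np.linspace(0.0, 0.4, 9)        # table-lookup bias points (V)
i_grid = np.tanh(10.0 * v_grid)          # stand-in for simulated current

def table_lookup(v):
    """Piecewise-linear interpolation: continuous, but its derivative
    jumps at every grid point, so it is not infinitely differentiable."""
    return np.interp(v, v_grid, i_grid)

def tiny_tanh_net(v, w1=8.0, w2=1.2):
    """A one-neuron tanh 'network': infinitely differentiable in v."""
    return w2 * np.tanh(w1 * v)

v = np.linspace(0.0, 0.4, 2001)
dv = v[1] - v[0]
g_table = np.gradient(table_lookup(v), dv)   # staircase-like derivative
g_net = np.gradient(tiny_tanh_net(v), dv)    # smooth everywhere
print("max |second difference| (table):", np.abs(np.diff(g_table)).max())
print("max |second difference| (net):  ", np.abs(np.diff(g_net)).max())
```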
Previous works [2]–[4] used multilayer perceptron (MLP) neural networks to develop compact models, which are prone to unphysical behavior [see Fig. 4(e) and (f)]. To eliminate this unphysical behavior, we have developed a novel neural network structure, the physics-inspired neural network (Pi-NN), with fundamental device physics embedded. As a result, the Pi-NN can be trained to generate an accurate, smooth, and computationally efficient device compact model.

II. THIN-TFET AND TRAINING PROCEDURE

To illustrate the principles of Pi-NN, we develop compact models for the dc I–V curves of a transistor. Physics-based device modeling is typically challenging, because the I–V curves are highly nonlinear and require different analytical physical expressions in different bias windows. Therefore, it is usually difficult to handcraft an infinitely differentiable function from these physical expressions. Since high-quality physics-based compact models are as yet unavailable for emerging devices, such as tunnel field effect transistors (TFETs) [6], the neural network modeling approach has an added attraction. Here, we use a novel device proposed in our group, a thin-TFET [7] (2-D heterojunction interlayer tunneling field effect transistor), as the example device for testing the neural network modeling techniques.

The schematic device structure of an n-type thin-TFET is shown in Fig. 1. The training data are simulated [7] for the top gate voltage (VTG) from 0 to 0.4 V and the drain–source voltage (VDS) from −0.1 to 0.4 V with a uniform step of 0.01 V, while the test data are for VTG from 0.005 to 0.405 V and VDS from −0.095 to 0.405 V with a uniform step of 0.01 V. The detailed training procedure is shown in Fig. 2.

FIGURE 1. Schematic of the example emerging device modeled in this paper: an n-type thin-TFET [7], [8]. Its I–V curves are obtained by sweeping the top gate (VTG) with the back gate (VBG) grounded.

FIGURE 2. Training procedure for artificial neural network device compact modeling.

In the preprocessing step in Fig. 2, a scalar function of the form exp(−a(VTG + b)) + 1 is multiplied to the output, which helps improve deep subthreshold modeling. The values of a and b are chosen by following the general rules described below. Since this scalar function is used to improve deep subthreshold modeling, we should choose a and b such that

$$\exp\bigl(-a(V_{TG}+b)\bigr)+1 \begin{cases} = 1, & V_{TG} > V_{TH}\\ \gg 1, & V_{TG} < V_{TH} \end{cases}\tag{1}$$

where VTH is the threshold voltage. Therefore, |b| should be smaller than the threshold voltage, and a is approximately the slope of the ID–VTG curves in the deep subthreshold region. The final values of a and b are fine-tuned by trial and error. For example, in this paper, the threshold voltage of the thin-TFET is around 100 mV, so b is set to −50 mV. As for a, the subthreshold swing for VTG < 50 mV is around 17 mV/decade, and therefore a is set to 1/17 × 2.3 = 0.135 mV⁻¹ (where 2.3 comes from ln(10) ≈ 2.3).
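A minimal illustrative sketch of the bias grids and this preprocessing scaling (not the paper's code; array and function names are ours):

```python
import numpy as np

# Training grid: VTG in [0, 0.4] V, VDS in [-0.1, 0.4] V, step 10 mV.
vtg_train = np.arange(0.0, 0.4 + 1e-9, 0.01)
vds_train = np.arange(-0.1, 0.4 + 1e-9, 0.01)
# Test grid: shifted by 5 mV so that every test point is unseen.
vtg_test = np.arange(0.005, 0.405 + 1e-9, 0.01)
vds_test = np.arange(-0.095, 0.405 + 1e-9, 0.01)

# Deep-subthreshold preprocessing: multiply the target current by
# exp(-a*(VTG + b)) + 1. Following the rules in the text: |b| < VTH,
# and a matches the subthreshold slope (0.135 mV^-1 = 135 V^-1).
a = 135.0     # V^-1
b = -0.050    # V (i.e., -50 mV)

def preprocess_scale(vtg):
    """~1 above threshold, >>1 deep in subthreshold (see eq. (1))."""
    return np.exp(-a * (vtg + b)) + 1.0

# Example: scale a simulated current map i_d[i, j] = I(vtg[i], vds[j]):
# i_d_scaled = i_d * preprocess_scale(vtg_train)[:, None]
```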
III. MLP NEURAL NETWORK MODELING AND UNPHYSICAL BEHAVIOR

In this section, we use the MLP neural network to generate a compact model for the dc I–V curves of the thin-TFET. The MLP neural network architecture and its well-established learning algorithms are shown in Fig. 3 [9]. After some initial training, we choose to use MLP neural networks with two hidden layers and define the hyperparameter as (i, j), where i is the number of neurons in the first hidden layer and j is the number of neurons in the second hidden layer. Each neuron uses the hyperbolic tangent function tanh(x) = (e^x − e^−x)/(e^x + e^−x) as the activation function.


FIGURE 3. Multilayer perceptron (MLP) neural network model.

FIGURE 4. Compact model of the n-type thin-TFET derived based on the MLP neural network widely used in previous works [2]–[4]. (a) Training errors and test errors for a variety of hyperparameters. (b) MLP neural network with 7 tanh neurons in the first and second hidden layers. From (c)–(f), the I–V curves generated by the MLP neural network shown in (b) are plotted along with the training data and the test data. (c) ID versus VDS at different VTG values. (d) ID versus VTG at different VDS values in linear scale. (e) ID versus VDS at different VTG values around VDS = 0; the embedded plot shows unphysical ID–VDS relationships around VDS = 0. (f) ID versus VTG at different VDS values in semilog scale; unphysical oscillation of ID around zero appears in the subthreshold region and when VDS = 0.

By choosing the hyperparameter (i, j) to be (5, 5), (7, 7), and (9, 9), these three MLP neural networks were trained for 5 million epochs. Using the loss function defined in Fig. 3, the root-mean-squared deviations for the training data and the test data are shown in Fig. 4(a). The test errors are used to evaluate the generalization ability of the model, namely, how the model fits unseen data. As shown in Fig. 4(a), the test errors stay close to the training errors, which indicates good generalization. We choose to plot the I–V curves modeled by the MLP neural network with 7 tanh neurons in each of the first and second hidden layers, as shown in Fig. 4(b), which gives a neural network with 15 neurons and 85 parameters in total.

Fig. 4(c)–(f) shows the I–V curves generated by the MLP neural network compact model along with the training data and the test data. Good fitting in the linear scale is achieved for both the ID–VDS and the ID–VTG curves. However, if we zoom in on the region near VDS = 0, ID is not zero when VDS is zero, indicating that the ID–VDS relationship is unphysical around VDS = 0 [see Fig. 4(e) (inset)]. Moreover, the ID–VTG relationship is also unphysical in the subthreshold region [shown in Fig. 4(f)]. The fundamental reason for these unphysical behaviors is that the MLP neural network has no knowledge of the device physics; therefore, the fitting is no longer physical when ID is very small. In order to eliminate these unphysical behaviors, we have to design a neural network with a priori knowledge of the fundamental device physics.
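Fig. 3 is described only at a high level above, so the following is a minimal numpy sketch of such an MLP, assuming the (i, j) = (7, 7) configuration with a linear output neuron (weight names and the initialization are our assumptions, not the paper's code). It reproduces the quoted counts: 15 neurons and 85 parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in=2, i=7, j=7, n_out=1):
    """MLP with hyperparameter (i, j): 2 inputs (VTG, VDS), two tanh
    hidden layers, linear output -> 7+7+1 = 15 neurons, 85 parameters."""
    return {
        "W1": rng.normal(0, 0.5, (i, n_in)),  "b1": np.zeros(i),
        "W2": rng.normal(0, 0.5, (j, i)),     "b2": np.zeros(j),
        "W3": rng.normal(0, 0.5, (n_out, j)), "b3": np.zeros(n_out),
    }

def mlp_forward(p, x):
    """x has shape (2, N): rows are the VTG and VDS sweeps."""
    h1 = np.tanh(p["W1"] @ x + p["b1"][:, None])
    h2 = np.tanh(p["W2"] @ h1 + p["b2"][:, None])
    return p["W3"] @ h2 + p["b3"][:, None]   # modeled drain current

def rmse(p, x, y):
    """Root-mean-squared deviation used to report training/test errors."""
    return np.sqrt(np.mean((mlp_forward(p, x) - y) ** 2))
```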

IV. PHYSICS-INSPIRED NEURAL NETWORK DESIGN


First, we note that the inputs VDS and VTG are related to two different physical effects: VDS drives the current through the device, while VTG controls the channel potential profile to change the magnitude of the current. Therefore, VDS and VTG should be fed to two different neural networks. From fundamental device physics, we know the ID–VDS curves have a linear region at small VDS and a saturation region at large VDS. This behavior is similar to a tanh function, which indicates that VDS should be fed into a neural network with tanh activation functions (the tanh subnet). To ensure ID equals zero when VDS equals zero, all the tanh neurons in the tanh subnet must have no bias terms. On the other hand, the ID–VTG curves have an exponential turn-on in the subthreshold region and then become polynomial in the on region. This shape is well captured by the sigmoid function sig(x) = 1/(1 + e^−x). Therefore, VTG is fed into a neural network with sigmoid activation functions (the sig subnet). It should be noted that we assume the gate leakage current is negligible, so VTG does not change the sign of ID. The final drain current is the entrywise product of the outputs of the tanh subnet and the sig subnet. This entrywise product reflects the control of VTG over the drain current driven by VDS. In addition, VDS can affect the channel potential profile controlled by VTG due to various nonideal effects, such as short channel effects. A simple but effective remedy for this is to add weighted connections from each layer in the tanh subnet to its corresponding layer in the sig subnet. By embedding the above device physics in a neural network structure, we arrive at the Pi-NN. The Pi-NN architecture and its pseudocode for the feed-forward and error back-propagation algorithms are shown in Fig. 5. This novel neural network is reminiscent of the peephole long short-term memory [10], with the notable difference that the Pi-NN does not propagate through time. In general, the Pi-NN architecture can model the I–V curves of any transistor if two conditions are satisfied: 1) ID equals zero if and only if VDS equals zero and 2) VTG does not change the sign of ID (i.e., the gate leakage current is negligible).
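Reading the description above literally, a minimal numpy sketch of one plausible Pi-NN forward pass is given below (the authoritative definition is the pseudocode in Fig. 5 and the linked repository; all names and the initialization are our assumptions). By construction, the bias-free tanh path gives ID = 0 at VDS = 0, and the sigmoid factor is strictly positive, so VTG cannot flip the sign of ID.

```python
import numpy as np

rng = np.random.default_rng(0)

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_pinn(m=2, n=3):
    """Pi-NN with hyperparameter (m, n): m tanh and n sigmoid hidden
    neurons. The tanh subnet has no bias terms so that ID = 0 at VDS = 0."""
    return {
        "Wt": rng.normal(0, 0.5, (m, 1)),      # VDS -> tanh hidden (no bias)
        "Ws": rng.normal(0, 0.5, (n, 1)),      # VTG -> sig hidden
        "bs": np.zeros(n),                     # sig hidden bias
        "U":  rng.normal(0, 0.1, (n, m)),      # tanh -> sig layer connection
        "wt_out": rng.normal(0, 0.5, (1, m)),  # tanh hidden -> tanh output
        "ws_out": rng.normal(0, 0.5, (1, n)),  # sig hidden -> sig output
        "bs_out": np.zeros(1),                 # sig output bias
    }

def pinn_forward(p, vtg, vds):
    """vtg, vds: arrays of shape (N,). Returns modeled ID of shape (N,)."""
    ht = np.tanh(p["Wt"] @ vds[None, :])                 # zero at VDS = 0
    hs = sig(p["Ws"] @ vtg[None, :] + p["U"] @ ht
             + p["bs"][:, None])                         # VDS enters via U
    t_out = np.tanh(p["wt_out"] @ ht)                    # still zero at VDS = 0
    s_out = sig(p["ws_out"] @ hs + p["bs_out"][:, None]) # positive gate factor
    return (t_out * s_out).ravel()                       # entrywise product
```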

V. PHYSICS-INSPIRED NEURAL NETWORK MODELING


FIGURE 5. Pi-NN model. Source code available at https://github.com/Oscarlight/Pi-NN.

After initial training, we choose to use Pi-NNs with one hidden layer and define the hyperparameter as (m, n), where m is the number of tanh neurons in the hidden layer and n is the number of sigmoid neurons in the same hidden layer. The test errors stay close to the training errors, as shown in Fig. 6(a), which indicates good generalization. The model complexity is gradually increased from the hyperparameter (2, 2) to (3, 4). From Fig. 6(a), the model with the hyperparameter (2, 3) is the simplest model with converging training and test error. More complex models can achieve smaller training and test error, but the improvement is not significant enough to justify the increased complexity.

FIGURE 6. Compact model of the n-type thin-TFET derived based on the Pi-NN developed in this paper. (a) Training errors and test errors for a variety of hyperparameters. (b) Pi-NN model with 2 tanh neurons and 3 sigmoid neurons in the hidden layer. From (c)–(f), the I–V curves generated by the Pi-NN model shown in (b) are plotted along with the training data and the test data. (c) ID versus VDS at different VTG values. (d) ID versus VTG at different VDS values in linear scale. (e) ID versus VDS at different VTG values around VDS = 0; the embedded plot shows a well-behaved ID–VDS relationship around VDS = 0. (f) ID versus VTG at different VDS values in semilog scale; good fitting is achieved in the subthreshold region. All the unphysical behaviors of the MLP neural network are eliminated, and the size of the neural network is largely reduced.


Balancing model complexity against accuracy, we choose the model with the hyperparameter (2, 3), as shown in Fig. 6(b), which gives a small Pi-NN model with only 7 neurons and 20 parameters in total. Excellent modeling is demonstrated in both the on region [shown in Fig. 6(c) and (d)] and the subthreshold region [shown in Fig. 6(f)]. The ID–VDS relationship around VDS = 0 is shown in Fig. 6(e). All the unphysical behaviors that appeared in the MLP neural network model have been eliminated. Moreover, thanks to the embedded device physics, the Pi-NN requires far fewer parameters than the MLP neural network, which results in a smaller, more efficient compact model.
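Under the illustrative Pi-NN sketch of Section IV (one possible parameter layout, not necessarily the authors' exact bookkeeping), the (2, 3) configuration indeed comes out at 7 neurons and 20 parameters:

```python
p = init_pinn(m=2, n=3)
n_params = sum(w.size for w in p.values())
print(n_params)   # 20 -- far smaller than the 85-parameter MLP of Section III
```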

VI. CONCLUSION
Motivated by the need for high-quality compact models for emerging devices, we have proposed a novel neural network, the Pi-NN, for compact modeling. With fundamental device physics incorporated, the Pi-NN method can produce accurate, smooth, and computationally efficient transistor models with good generalization ability. The thin-TFET is presented as an example to illustrate the capabilities of the Pi-NN. A relatively small compact model is achieved with excellent fitting in both the on and the subthreshold regions of the thin-TFET. The charge–voltage (Q–V) relationships in a device are highly desirable for circuit design. It is possible to construct Q–V relations from the device C–V data (not shown here). However, since the sign of the terminal charge density depends on both VTG and VDS, the Pi-NN architecture cannot be directly applied to modeling Q–V relations. The workaround is to connect both VTG and VDS to both the tanh subnet and the sig subnet in the Pi-NN, and to add bias terms to the tanh neurons. This modified Pi-NN is compatible with the adjoint neural network method for constructing Q–V relations from C–V measurements [2], [11]. However, this modified Pi-NN architecture has no apparent advantage over the MLP architecture for Q–V modeling. Future work will focus on how to better integrate Q–V modeling into the Pi-NN framework. Finally, the Pi-NN approach is readily implementable in commercial measurement and modeling systems.
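As a hedged sketch of that modification (reusing np, rng, and sig from the earlier Pi-NN sketch; this mirrors only the two changes named in the text, and is not the authors' released code):

```python
def init_qv_pinn(m=2, n=3):
    """Modified Pi-NN for Q-V modeling: both VTG and VDS enter both
    subnets, and the tanh neurons now carry bias terms, so the output
    can change sign with either bias."""
    return {
        "Wt": rng.normal(0, 0.5, (m, 2)), "bt": np.zeros(m),
        "Ws": rng.normal(0, 0.5, (n, 2)), "bs": np.zeros(n),
        "U":  rng.normal(0, 0.1, (n, m)),
        "wt_out": rng.normal(0, 0.5, (1, m)), "bt_out": np.zeros(1),
        "ws_out": rng.normal(0, 0.5, (1, n)), "bs_out": np.zeros(1),
    }

def qv_forward(p, vtg, vds):
    """Returns a modeled terminal charge of shape (N,)."""
    x = np.stack([vtg, vds])                 # both inputs feed both subnets
    ht = np.tanh(p["Wt"] @ x + p["bt"][:, None])
    hs = sig(p["Ws"] @ x + p["U"] @ ht + p["bs"][:, None])
    t_out = np.tanh(p["wt_out"] @ ht + p["bt_out"][:, None])
    s_out = sig(p["ws_out"] @ hs + p["bs_out"][:, None])
    return (t_out * s_out).ravel()
```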
REFERENCES
[1] J. Sellier et al., "NEMO5, a parallel, multiscale, multiphysics nanoelectronics modeling tool," in Proc. Int. Conf. Simulation Semiconductor Process. Devices (SISPAD), Denver, CO, USA, 2012.
[2] J. Xu and D. E. Root, "Advances in artificial neural network models of active devices," in Proc. IEEE MTT-S Int. Conf. Numer. Electromagn. Multiphys. Modeling Optim. (NEMO), Aug. 2015, pp. 1–3.
[3] H. B. Hammouda, M. Mhiri, Z. Gafsi, and K. Besbes, "Neural-based models of semiconductor devices for SPICE simulator," Amer. J. Appl. Sci., vol. 5, no. 4, pp. 785–791, 2008.
[4] W. Fang and Q.-J. Zhang, "Knowledge-based neural models for microwave design," IEEE Trans. Microw. Theory Techn., vol. 45, no. 12, pp. 2333–2343, Dec. 1997.
[5] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Netw., vol. 4, no. 2, pp. 251–257, 1991.
[6] A. C. Seabaugh and Q. Zhang, "Low-voltage tunnel transistors for beyond CMOS logic," Proc. IEEE, vol. 98, no. 12, pp. 2095–2110, Dec. 2010.


[7] M. Li, D. Esseni, J. J. Nahas, D. Jena, and H. G. Xing, "Two-dimensional heterojunction interlayer tunneling field effect transistors (Thin-TFETs)," IEEE J. Electron Devices Soc., vol. 3, no. 3, pp. 200–207, May 2015.
[8] M. Li, D. Esseni, G. Snider, D. Jena, and H. G. Xing, "Single particle transport in two-dimensional heterojunction interlayer tunneling field effect transistor," J. Appl. Phys., vol. 115, no. 7, p. 074508, 2014.
[9] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986.
[10] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Mach. Learn. Res., vol. 3, pp. 115–143, Aug. 2002.
[11] J. Xu, M. C. E. Yagoub, R. Ding, and Q. J. Zhang, "Exact adjoint sensitivity analysis for neural-based microwave modeling and design," IEEE Trans. Microw. Theory Techn., vol. 51, no. 1, pp. 226–237, Jan. 2003.

MINGDA LI received the B.S. degree in microelectronics from Fudan University, Shanghai, China, in 2012, and the M.S. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 2014. He is currently pursuing the Ph.D. degree in electrical and computer engineering with Cornell University, Ithaca, NY, USA.

OZAN İRSOY received the B.A. degree in mathematics and the B.Sc. degree in computer engineering from Bogazici University, Istanbul, Turkey, in 2012. He is currently pursuing the Ph.D. degree in computer science with Cornell University, Ithaca, NY, USA.

CLAIRE CARDIE received the B.S. degree from Yale University, New Haven, CT, USA, in 1982, and the M.S. and Ph.D. degrees from the University of Massachusetts, Amherst, MA, USA, in 1989 and 1994, respectively, all in computer science. She is currently a Professor with the Department of Computer Science and the Department of Information Science, Cornell University, Ithaca, NY, USA.

HUILI GRACE XING (S'01–M'03–SM'14) received the B.S. degree in physics from Peking University, Beijing, China, in 1996, the M.S. degree in material science from Lehigh University, Bethlehem, PA, USA, in 1998, and the Ph.D. degree in electrical engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2003. She is currently a Professor with the School of Electrical & Computer Engineering and the Department of Materials Science and Engineering, Cornell University, Ithaca, NY, USA.

