
PHYSICAL REVIEW E 98, 022138 (2018)

Smallest neural network to learn the Ising criticality

Dongkyu Kim and Dong-Hee Kim*


Department of Physics and Photon Science, School of Physics and Chemistry,
Gwangju Institute of Science and Technology, Gwangju 61005, Korea

*[email protected]

(Received 6 April 2018; published 31 August 2018)

Learning with an artificial neural network encodes the system behavior in a feed-forward function with a
number of parameters optimized by data-driven training. An open question is whether one can minimize the
network complexity without loss of performance to reveal how and why it works. Here we investigate the learning
of the phase transition in the Ising model and find that having two hidden neurons can be enough for an accurate
prediction of critical temperature. We show that the networks learn the scaling dimension of the order parameter
while being trained as a phase classifier, demonstrating how the machine learning exploits the Ising universality
to work for different lattices of the same criticality within a single set of trainings in one lattice geometry.

DOI: 10.1103/PhysRevE.98.022138

I. INTRODUCTION

Machine learning [1,2] is a framework for prediction based on data-driven optimization of a hidden complex structure of unknowns, which drastically differs from a conventional model of explanation based on a physical understanding of a system. Performing as a classifier, an artificial neural network can suggest a proper label for an unacquainted input without doing explicit analysis, which is done by training a large set of the network parameters to adapt themselves to already labeled data. In spite of many empirical successes, one intrinsic issue is that the network often works like a "black box" since it is generally difficult to see inside how it reaches a particular output. Such lack of transparency is due to the high complexity coming out of the interplay between many network parameters. The more complex structure may help increase flexibility in learning but at the same time makes it harder to understand how it extracts a desired feature from the data. In this paper, we present the opposite extreme of a minimally simple neural network to explain the observed accuracy and universality in its learning of the phase transition in the Ising model.

The ideas of machine learning have been actively applied to problems in classical and quantum physics. For instance, efficient Monte Carlo simulation methods were proposed by integrating machine learning into wave-function representations [3–10] and cluster updates [11–16]. On the other hand, phase transitions have been extensively examined in various schemes of the supervised [17–33] and unsupervised [34–42] learning and also in the deep learning with advanced structures [43–52] to classify phases, capture topological features, and locate transition points. An intriguing observation in the supervised learning is that a fixed neural network often works even for systems that were unseen in training. In particular, for the Ising model, the seminal work by Carrasquilla and Melko [17] demonstrated that the phase classifier trained in the square lattices successfully predicted the critical temperature of the unseen triangular-lattice model. Remarkably, the network outputs for different system sizes fell on the same curve in the finite-size-scaling tests.

We explain these behaviors by solving an analytically tractable model of the neural network that we devise to capture a typical structure emerging in the training of large-scale networks. While the number of hidden neurons is reduced to two in our network, the accuracy of locating the critical temperature is comparable to the previous result with 100 neurons [17]. It turns out that the information explicitly encoded in the network is the scaling dimension of the order parameter, leading to the interoperability within the class of the same criticality.

II. PATTERNS IN THE TRAINED NETWORKS

Let us first show the structure that we observe in the network trained in the square lattices (see Fig. 1). We consider a typical fully connected feed-forward network with a single hidden layer of 50 neurons between input and output where the sigmoid function normalizes the activation signals. The network is trained by assigning zero (one) to the desired output for the disordered (ordered) phase based on the labeled dataset of spin configurations which are sampled from the Monte Carlo simulations [53] with the Ising Hamiltonian H = −∑_{⟨i,j⟩} s_i s_j, where s_i ∈ {1, −1} is the spin state at site i, and ⟨i,j⟩ runs over the nearest neighbors. The training dataset is prepared at various temperatures around the exact critical temperature T_c = 2/ln(1 + √2). We use TENSORFLOW [54] to minimize the cross entropy with the L2 regularization to avoid overfitting. Details are given in Appendix A.

Two features are notable from the link weights between the input and hidden layers as exemplified in Fig. 1(b). First, a hidden neuron tends to receive a signal accumulated with almost constant weights of either sign, suggesting that the input {s_i} is reduced to its sum ∝ ±∑_i s_i. This is consistent with the activation patterns of the hidden neurons observed previously [17,28] and the concept of the toy model [17]. Second, there are neurons found effectively unlinked with
vanishing weights, implying that the size of the hidden layer can be even smaller.

These features are robust at the regularization strength λ > 0.001 for all system sizes examined. The structure partly survives at λ = 0.001, while it fades away as λ gets weaker (see Fig. 5 in Appendix A). We find that the prediction of the transition temperature is consistent at λ > 0.001 for the networks that are trained in the square lattices and examined also in the triangular lattices [see Fig. 1(c)]. In contrast, as λ gets weaker, the accuracy becomes inconsistent in this interoperability test between the different lattices.

FIG. 1. Neural network as a phase classifier for the Ising model. (a) The schematic diagram of the signal processing. (b) The examples of the weight matrix W_1 at different values of the regularization strength λ. (c) The transition temperature as a function of λ predicted by the 50-unit networks which are trained in the square lattices and then applied to the square and triangular lattices for interoperability tests.

III. LEARNING THE FINITE-SIZE SCALING WITH THE TWO-UNIT NETWORK MODEL

Inspired from these observations, we propose a minimal network model by having only two neurons in the hidden layer. The one is linked from the input with a positive constant weight, reading y ∝ ∑_i s_i, while the other is associated with the opposite sign, reading −y. We need a pair of them to learn the Z(2) symmetry of the Ising model, which is in contrast to the previous toy model [17]. Thus, we write the weight matrix W_1 for the links and the bias vector b_1 for the hidden neurons as

    W_1 = (1/N) [ 1, 1, ···, 1 ; −1, −1, ···, −1 ],   b_1 = −μ [ 1 ; 1 ],    (1)

where N is the number of lattice sites, and μ is a bias parameter to be determined by training. The outgoing signals from the hidden layer are normalized through the sigmoid function f(x) = (1/2)[1 + tanh(x/2)].

We treat the two signals delivered from the hidden to output layer on an equal footing by setting the second weight matrix as W_2 = 4ε(1, 1) with the bias b_2 = −2ε on the output neuron. This choice of W_2 completes the Z(2) symmetry of the network output for the Ising spin inputs. Being activated also with the sigmoid function, the final output is then written as q_k = [1 + tanh(ε z_k)]/2 for an input {s_i^(k)}, where z_k = 2[f(y_k − μ) + f(−y_k − μ)] − 1 for y_k ≡ (1/N) ∑_i s_i^(k). The pseudotransition temperature T^* can be typically given by ⟨q⟩_{T^*} = 1/2, where ⟨·⟩_T denotes an average over the dataset at temperature T.

The coverage of the final output function clarifies our motivation for having the common prefactor ε in W_2 and b_2. For a value of μ that is not small, the signal after passing through the network W_2 resides in the range of (0, 4ε). Thus, the shift with the bias b_2 = −2ε is a natural choice to make the final output of the sigmoid function cover the full range of (0,1) [see Fig. 1(a)] as required for successful learning of the phase transition.
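As a concrete illustration of Eq. (1) and the output formula above, the following NumPy sketch evaluates the two-unit network on a batch of spin configurations. It is not the authors' code, and the parameter values ε = 15 and μ = ln 3 passed in the example call are placeholders chosen only because parameters of this magnitude appear in the finite-size analysis below.

```python
import numpy as np

def sigmoid(x):
    # f(x) = (1/2)[1 + tanh(x/2)], the activation used in the text
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def two_unit_output(spins, eps, mu):
    """Output q of the two-unit network for a batch of +-1 spin configurations.

    spins : array of shape (n_samples, N); eps, mu : the two trainable parameters.
    """
    y = spins.mean(axis=1)                        # y_k = (1/N) sum_i s_i
    hidden = sigmoid(y - mu) + sigmoid(-y - mu)   # the two hidden units, reading +y and -y
    z = 2.0 * hidden - 1.0                        # z_k = 2[f(y - mu) + f(-y - mu)] - 1
    return 0.5 * (1.0 + np.tanh(eps * z))         # q_k = [1 + tanh(eps * z_k)]/2

# Example call with placeholder parameter values (an assumption for illustration).
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(5, 32 * 32))    # five random high-temperature configurations
print(two_unit_output(spins, eps=15.0, mu=np.log(3.0)))
```

Random ±1 configurations have y ≈ 0, so with μ = ln 3 the printed outputs sit near the decision value q = 1/2, while fully polarized configurations drive q toward one.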
The training of our two-unit network is done by minimizing the cross entropy,

    L = −∫_{T_l}^{T_u} dT ∫_0^1 dy ρ_T(y) [ p_1 ln q + p_0 ln(1 − q) ],    (2)

where the reference classifier is given by the Heaviside step function Θ(x) as p_0 ≡ Θ(T − T_c) and p_1 = 1 − p_0, the network output is denoted by q ≡ q[z(y, μ), ε], and ρ_T(y) is the density of training data giving y at T.

Treating this learning problem analytically, we find that an interesting system-size dependence is encoded in the network parameters ε and μ. For simplicity, we approximate ρ_T(y) ∝ δ(y − m) with the order parameter m ≡ ⟨|y|⟩_T by ignoring the fluctuations in the input dataset of y. The minimization of L(ε, μ) then leads to the following coupled equations:

    ∫_{T_l}^{T_u} dT [ 2 p_{1/2} z_m − z_m tanh(ε z_m) ] = 0,    (3)

    ∫_{T_l}^{T_u} dT (∂/∂μ) [ 2 p_{1/2} z_m − (1/ε) ln cosh(ε z_m) ] = 0,    (4)

where z_m ≡ z(m, μ), and p_{1/2} = 1/2 − p_0. The precise bounds of (T_l, T_u) are irrelevant if it is wide enough because the integrands vanish for a large |z_m|. Thus, the criterion |z_m| ≲ 1/ε allows us to define the effective bounds as T^* ± δT centered at the pseudotransition temperature T^* where z_m(T^*) = 0. Below we show that the effective range of a significant T is comparable to the critical window that scales as L^{−1/ν} with the length scale L of the system, which becomes essential to understand the behavior of the trained parameters ε_L and μ_L.

In the area of a small z_m around T^*, we may express z_m as z_m ≈ (3/8) m_* (m − m_*), where m_* ≡ m(T^*). In the transition area, we can also replace m by its finite-size-scaling ansatz m_L = L^{−σ} m̃[(T − T_c) L^{1/ν}] with the scaling dimension σ ≡ β/ν, where m̃(x) is a scale-invariant function. Then, by expanding Eq. (3) for z_m, we can simply write down
the leading-order behavior of ε_L as

    ε_L ∼ L^{2σ} [ ∫_{x_−}^{x_+} p_{1/2} m̃_* [m̃(x) − m̃_*] dx ] / [ ∫_{x_−}^{x_+} m̃_*^2 [m̃(x) − m̃_*]^2 dx ],    (5)

where x_± = (T_L^* − T_c ± δT_L) L^{1/ν}. This reduces to ε_L ∼ L^{2σ} when x_± ∼ O(1), which is indeed confirmed in Eq. (4). Through similar procedures, one can also write down the scaling solution of Eq. (4) as

    T_L^* − T_c ∼ −L^{−1/ν} ∫_{x_−}^{x_+} m̃_* m̃(x) dx + 2 m̃_*^2 δT_L,    (6)

where ε_L is replaced by L^{2σ}. This holds when T_L^* − T_c ∼ L^{−1/ν} and δT_L ∼ L^{−1/ν}, or equivalently x_± ∼ O(1). For μ_L, the equation z[L^{−σ} m̃(T_L^*), μ_L] = 0 leads to the asymptotic behavior of μ_L − ln 3 ∼ L^{−2σ}.

IV. NUMERICAL VERIFICATIONS

We numerically verify the behavior of ε_L ∼ L^{2σ} and μ_L − ln 3 ∼ L^{−2σ} by performing the learning based on the Monte Carlo datasets. Specifically, we construct the input data distribution ρ_T(y) by employing the Wang-Landau sampling method for energy and magnetization [55–57] (see Appendix C). This allows us to compute Eq. (2) directly with the predetermined ρ_T(y), which makes the minimization numerically straightforward. Figure 2 presents ε_L and μ_L obtained in two dimensions (2D) for the different choices of the underlying geometry and temperature range for the learning. In all cases, the trained parameters become increasingly parallel to the lines of ε_L ∼ L^{1/4} and μ_L − ln 3 ∼ L^{−1/4} for the exact exponent of σ = 1/8 as L increases.
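The direct minimization of Eq. (2) over the two parameters can be reproduced schematically as follows. This sketch is not the authors' implementation: instead of the Wang-Landau distribution it uses the δ-function approximation ρ_T(y) ≈ δ(y − m_L(T)) with a purely synthetic scale-invariant magnetization curve m_L(T) (an assumption for illustration only), and a Nelder-Mead search in place of whatever optimizer was actually used.

```python
import numpy as np
from scipy.optimize import minimize

T_C = 2.0 / np.log(1.0 + np.sqrt(2.0))        # exact 2D Ising critical temperature

def m_toy(T, L, sigma=0.125, nu=1.0):
    """Toy finite-size magnetization m_L(T) = L^{-sigma} mtilde(x); a stand-in
    for the Wang-Landau distribution rho_T(y), used here only for illustration."""
    x = (T - T_C) * L**(1.0 / nu)
    mtilde = (np.sqrt(x**2 + 1.0) - x) ** 0.125   # scale-invariant toy function
    return L**(-sigma) * mtilde

def cross_entropy(params, T_grid, L):
    """Eq. (2) in the delta-function approximation on a temperature grid."""
    eps, mu = params
    m = m_toy(T_grid, L)
    f = lambda u: 0.5 * (1.0 + np.tanh(0.5 * u))
    z = 2.0 * (f(m - mu) + f(-m - mu)) - 1.0
    q = np.clip(0.5 * (1.0 + np.tanh(eps * z)), 1e-12, 1.0 - 1e-12)
    p1 = (T_grid < T_C).astype(float)             # reference labels; p0 = 1 - p1
    return -(p1 * np.log(q) + (1.0 - p1) * np.log(1.0 - q)).mean()

T_grid = np.linspace(0.5 * T_C, 1.5 * T_C, 201)
for L in (16, 32, 64, 128):
    res = minimize(cross_entropy, x0=[5.0, 1.2], args=(T_grid, L),
                   method="Nelder-Mead")
    eps_L, mu_L = res.x
    print(f"L={L:4d}  eps_L={eps_L:8.3f}  mu_L-ln3={mu_L - np.log(3.0):+.4f}")
```

With this toy input the printed ε_L should grow roughly as L^{1/4} while μ_L − ln 3 shrinks, mirroring the trend reported in Fig. 2, although the numbers themselves have no quantitative meaning.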
FIG. 2. Learning with the two-unit neural network. The system-size dependence of the network parameters (a) ε_L and (b) μ_L trained in the square (SQ1, SQ2) and triangular (TR1, TR2) lattices for T/T_c ∈ [0.5, 1.5] (SQ1, TR1) and [0, 2] (SQ2, TR2). (c) The comparison of the transition point predicted by SQ1, the estimate with the 100-unit network [17], and the exact T_c. The inset shows the scaling collapse of the network outputs with the exponent ν = 0.94.

It turns out that although we have only two neurons in the hidden layer, the transition point located in our two-unit network is as accurate as the previous estimate with 100 hidden neurons [17]. In Fig. 2(c) showing the outputs of the network SQ1 trained and examined in the square lattices, the extrapolation from T_L^* finds T_∞^* = 2.267(1) with the exponent ν = 0.94(2), which agrees well with the previous estimate of T_c = 2.266(2) with ν = 1.0(2) [17]. Also, the location of T_∞^* is at the crossings between the curves of different L's, leading to the scale invariance in the network outputs at the transition temperature.

The deviation from the exact T_c is possibly due to the finite-size effects of the systems accessible in the learning which are apparent in ε_L and μ_L at small L's. Since we now know from the analytic results that ε_L and μ_L should scale asymptotically with the exponent 2σ, we may try to remove the finite-size effects by extrapolating the network parameters as ε_L = a_ε L^{1/4} and μ_L = ln 3 + a_μ L^{−1/4} with the expected exponent σ = 1/8. We have observed that this parametrization provides T_∞^* ≈ 2.269, which is very close to the exact value of T_c (see Fig. 6 in Appendix D).
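The two steps used here, reading off T_L^* from the ⟨q⟩_T = 1/2 crossing and extrapolating with a power law in L, can be sketched as below. Everything fed into the fit is synthetic: the magnetization curve is the same toy stand-in as above, and a_ε = 15 and a_μ = 0.25 are borrowed from the parameter scan of Fig. 3 merely as plausible magnitudes, so the script illustrates the procedure rather than reproducing the published numbers.

```python
import numpy as np
from scipy.optimize import brentq, curve_fit

T_C = 2.0 / np.log(1.0 + np.sqrt(2.0))      # exact square-lattice critical temperature

def qbar(T, L, a_eps=15.0, a_mu=0.25):
    """Toy <q>_T for a two-unit network with eps_L = a_eps L^{1/4} and
    mu_L = ln 3 + a_mu L^{-1/4}; the magnetization curve is a synthetic stand-in."""
    eps = a_eps * L**0.25
    mu = np.log(3.0) + a_mu * L**(-0.25)
    x = (T - T_C) * L                                        # scaling variable, nu = 1
    m = 0.8 * L**(-0.125) * (np.sqrt(x**2 + 1.0) - x)**0.125  # toy m_L(T), sigma = 1/8
    f = lambda u: 0.5 * (1.0 + np.tanh(0.5 * u))
    z = 2.0 * (f(m - mu) + f(-m - mu)) - 1.0
    return 0.5 * (1.0 + np.tanh(eps * z))

# Pseudotransition temperature T_L*: where <q>_T crosses 1/2 for each size
sizes = np.array([16, 24, 32, 48, 64, 96, 128])
T_star = np.array([brentq(lambda T: qbar(T, L) - 0.5, 1.5, 3.5) for L in sizes])

# Power-law extrapolation T_L* = T_inf + c L^(-1/nu) to the thermodynamic limit
power_law = lambda L, T_inf, c, inv_nu: T_inf + c * L**(-inv_nu)
popt, _ = curve_fit(power_law, sizes, T_star, p0=[2.27, 1.0, 1.0])
print("T_inf = %.4f, 1/nu = %.3f  (exact T_c = %.4f)" % (popt[0], popt[2], T_C))
```

For real data one would replace qbar() by the measured network outputs averaged over the Monte Carlo samples at each temperature and size.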
An important implication of our analytic results is that the essential information encoded by the learning is only the exponent σ of the critical behavior. Thus, after the training is done, one may not be able to distinguish the networks by the system-specific properties of the training datasets, such as an underlying lattice geometry and a location of T_c, as long as they are in the same universality class. This implies that one can actually use the network trained in the square lattices for the prediction with the data in the triangular lattices, explaining the previous observation with the 100-unit network in Ref. [17].

Figure 3 shows that locating the precise T_c in the thermodynamic limit is not affected by the training-specific values of (a_ε, a_μ) when σ is fixed. For these tests, the input datasets are prepared for the systems with the sizes up to L = 1024 in the Monte Carlo simulations (see Appendix B for details). The interoperability between the square and triangular lattices is more directly examined in Fig. 3(c) by using the network ExtSQ1 with (a_ε, a_μ) being extrapolated from the SQ1 parameters trained in the square lattices. Testing with the data of the triangular lattices shows an excellent scaling collapse at the exact values of T_c = 4/ln 3 and ν = 1. The explicit use of SQ1 for small L's provides T_c ≈ 3.637 (see Fig. 6 in Appendix D), which is also comparable to the 100-unit network estimate of T_c = 3.65(1) [17].

Finally, we discuss what happens in practice when the neural network operates on a system with an exponent mismatch. Figure 4(a) shows the case where the network ExtSQ1 with the 2D exponent is applied to the inputs given in three-dimensional (3D) cubic lattices. Interestingly, the pseudotransition temperatures T_L^* show a clean power-law convergence to reach T_∞^* ≈ 4.4695. This is not a precise T_c and comes with a wrong exponent, but one might say that it is still not too far from the known T_c. However, it clearly loses a scale-invariant point of the output curves, and thus the finite-size-scaling test fails. On the other hand, a network parametrized with the known 3D exponent σ = 0.518 15 [58] provides T_c = 4.511 52(1) with
ν = 0.63 [see Fig. 4(b)], which is in excellent agreement with the previous Monte Carlo estimates [59].

FIG. 3. Interoperability of the two-unit neural network. At the fixed scaling of ε_L = a_ε L^{1/4} and μ_L = ln 3 + a_μ L^{−1/4}, the consistency of finding T_c (dashed line) is examined by varying (a) a_ε at a_μ = 0.25 and (b) a_μ at a_ε = 15 for the inputs prepared in the square lattices. The network ExtSQ1 is associated with (a_ε, a_μ) extrapolated from SQ1 trained in the square lattices. (c) The finite-size-scaling test of the outputs of ExtSQ1 for the inputs from the triangular lattices. The exact values of T_c = 4/ln 3 and ν = 1 are used. The error bars (not shown) are much smaller than the symbol size.

FIG. 4. Test of the 3D Ising model with the extrapolated two-unit networks. ExtSQ1 is used in (a), examining the behavior of the pseudotransition points T_L^* and the crossing point (not existing) between the output curves. (b) The finite-size-scaling test of the outputs of the network made by using the previous 3D Ising estimate of σ = 0.518 15 [58].

V. CONCLUSIONS

In conclusion, we have shown that the minimal binary structure with two neurons in the hidden layer is essential in understanding the accuracy and interoperability of a neural network observed in the supervised learning of the phase transition in the Ising model. We have found that the scaling dimension of the order parameter is encoded into the system-size dependence of the network parameters in the learning process. This allows the conventional finite-size-scaling analysis with the network outputs to locate the critical temperature and, more importantly, demonstrates how one trained neural network can work for different lattices of the same Ising universality.

Explainable machine learning aims to provide a transparent platform that allows an interpretable prediction, which is crucial for applications that require extreme reliability. In the learning of classifying the phases in the Ising model, we have attempted downsizing the neural network to reveal a traceable structure which turns out to be irreducibly simple and yet not to lose its performance. This suggests a necessity of further studies to explore interpretable building blocks of machine learning in a broader range of physical systems.

ACKNOWLEDGMENTS

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education (Grant No. NRF-2017R1D1A1B03034669) and also by a GIST Research Institute (GRI) grant funded by the GIST in 2018.

APPENDIX A: NUMERICAL TRAINING OF THE 50-UNIT NEURAL NETWORK WITH THE L2 REGULARIZATION

For the numerical training of the 50-unit neural network, we construct the loss function L(W_1, W_2, b_1, b_2; λ) by combining the cross entropy and regularization terms (for the overview, see Ref. [60]) as

    L = −(1/n_data) ∑_{i=1}^{n_data} [ p ln q + (1 − p) ln(1 − q) ] + (λ/4) ∑_{l=1,2} ||W_l||_F^2,

where the reference classifier p is set to be 1 if the temperature of the input σ_T is below T_c and 0 otherwise, the network output q is a function of the parameter set (W_1, W_2, b_1, b_2) and the input σ_T = {s_1, ..., s_N}, and the last term is the L2 regularization with the strength λ which helps to avoid overfitting. The training dataset includes 1800 spin configurations per temperature sampled with the spin-up-down symmetry being imposed in the Monte Carlo sampling processes. The spin configurations are sampled for 229 temperatures regularly spaced with step size 0.01 in the range of (0.5 T_c, 1.5 T_c), giving the total number of the training data n_data = 412 200. The minimization is performed by using the Adam optimizer implemented in TENSORFLOW [54], and the learning proceeds with the entire training dataset during 30 000 epochs at the learning rate 0.0005. The training is done in the L × L square lattices for L = 16, 20, 24, 32, 40.
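For orientation, the setup above can be written down in a few lines with the present-day tf.keras interface. This is only a sketch under assumptions: the paper predates this API and does not specify details such as weight initialization, and the arrays X (spin configurations with ±1 entries) and p (phase labels) are assumed to be prepared separately from the Monte Carlo data described above.

```python
import tensorflow as tf

L_side = 16                  # linear lattice size, so the input has N = L_side**2 spins
lam = 0.005                  # L2-regularization strength lambda

# One hidden layer of 50 sigmoid units and a sigmoid output, as in Appendix A.
# Note that tf.keras' l2 regularizer adds lam * sum(w**2), so matching the
# lambda/4 prefactor on the Frobenius norms above would require rescaling lam.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(L_side * L_side,)),
    tf.keras.layers.Dense(50, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(lam)),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(lam)),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# Full-batch training for 30 000 epochs as quoted in the text:
# model.fit(X, p, batch_size=len(X), epochs=30_000)
```

The fit call is left commented out because X and p are not constructed here.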
FIG. 5. Dependence of the training of a neural network on the L2-regularization strength λ. (a) The visualization of the weight matrix W_1^T of the 50-unit network trained in the 16 × 16 square lattices. (b) The success percentage in the phase classification test given as a function of λ. The sum of the weights of the incoming links to a hidden neuron is plotted for (c) λ = 0.005, (d) 0.001, and (e) 0.0005, where the length of the bar indicates the magnitude of the partial sum of the weights having the sign opposite to the total weight sum. The panels (f)–(h) present the predictions of the transition temperature in the square and triangular lattices at λ = 0.005, 0.001, 0.0005.

Figure 5(a) visualizes the weight matrix W_1 of the neural network trained at various values of λ ranging from 0.1 to 0.0001. In the typical validation test of the phase classification with the reserved test dataset of 200 configurations per temperature in the same range, we have very similar success percentages of about 95% for all λ's examined, while slightly higher percentages are found at 0.0005 ≲ λ ≲ 0.01 [see Fig. 5(b)]. However, this simple classification test does not fully validate the actual accuracy and performance of the network for our purpose of investigating its ability to predict the transition temperature and interoperability with different underlying lattice geometries.

The direct tests of finding the transition temperature with the networks trained in the square lattices are performed with the datasets separately prepared in the triangular and square lattices. The performance shown in these tests seems to be closely related to the existence or nonexistence of the plus-minus structure of the weight observed in the weight matrix
W_1, which turns out to undergo a crossover from a structured to unstructured one around λ = 0.001. This crossover does not change with the size of the system that we have examined. One way to notice the change of the visibility of the structure is to look at the weight sum of the incoming links of a hidden neuron as exemplified in Figs. 5(c)–5(e). At λ = 0.005, the plus-minus structure is very clear since all contributing weights to a hidden neuron are of the same sign. While the weight sums are still well separated into plus, minus, and zeros, defects start to appear at λ = 0.001, which is shown by the finite length of the bar in Fig. 5(d) that indicates the contribution of the weights with the opposite sign to the total sum at a given neuron. At the weaker regularizations of λ ≲ 0.0005, the length of the bar tends to get larger to be comparable to the magnitude of the weight sum. The behavior of the pseudotransition temperatures differs as well at these λ's as shown in Figs. 5(f)–5(h). The mark at L = ∞ is from the extrapolation with the last three points, while the error bars indicate the combined uncertainty of the three- and four-point fittings. Up to λ = 0.005 of having the clear structure in W_1, the extrapolation to L = ∞ is very consistent. At λ = 0.001, the finite-size behavior becomes severe, and below λ = 0.0005, the accuracy of predicting the transition temperature in the triangular lattices becomes poor as λ further decreases, implying that the overfitting to the reference of a step-function-like classification may have occurred during the training with the data in the square lattices at such small λ.

APPENDIX B: PREPARATION OF THE INPUT DATASET OF SPIN CONFIGURATIONS AND MAGNETIZATIONS

The Monte Carlo simulations with the Wolff cluster update algorithm [53] are used to produce the input dataset for training and testing. The spin configurations and magnetizations are sampled at every ⌈N/⟨N_c⟩⌉ cluster flips, where N and ⟨N_c⟩ are the number of the lattice sites and the average cluster size, respectively. In the measurement of the output of the extrapolated two-unit network, the first 10 000 samples are thrown away during the thermalization, and then 30 bins of 100 000 samples are used for the measurements. The error bars are estimated by using the jackknife method, but it turns out that they are much smaller than the symbol sizes in all plots and thus are not shown in the figures of the main text.
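A compact sketch of a Wolff single-cluster update of the kind used for this data generation is given below. It is a generic textbook implementation rather than the authors' code, and the measurement stride follows the every-⌈N/⟨N_c⟩⌉-flips rule quoted above only approximately, with the run length chosen arbitrarily small for illustration.

```python
import numpy as np

def wolff_update(spins, T, rng):
    """One Wolff single-cluster flip [53] for the ferromagnetic Ising model on an
    L x L square lattice with periodic boundaries (J = k_B = 1); returns the cluster size."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 / T)            # bond-activation probability
    i, j = rng.integers(L, size=2)
    seed = spins[i, j]
    stack = [(i, j)]
    spins[i, j] = -seed                       # flip sites as they join the cluster
    size = 1
    while stack:
        x, y = stack.pop()
        for nx, ny in ((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L):
            if spins[nx, ny] == seed and rng.random() < p_add:
                spins[nx, ny] = -seed
                stack.append((nx, ny))
                size += 1
    return size

# Toy run: thermalize, then record y = (1/N) sum_i s_i roughly every N/<N_c> flips.
L, T = 16, 2.27
rng = np.random.default_rng(1)
spins = rng.choice([-1, 1], size=(L, L)).astype(int)
cluster_sizes = [wolff_update(spins, T, rng) for _ in range(500)]   # thermalization
stride = max(1, (L * L) // max(1, int(np.mean(cluster_sizes))))
samples = []
for k in range(2000):
    wolff_update(spins, T, rng)
    if k % stride == 0:
        samples.append(spins.mean())
print("<|y|> =", np.mean(np.abs(samples)))
```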

APPENDIX C: PREPARATION OF ρ_T(y) TO TRAIN THE TWO-UNIT NETWORK MODEL

We employ the two-parameter Wang-Landau sampling method [55–57] to generate the joint density of states g(E, M) of the Ising model by following the standard procedures (for instance, see Ref. [61], and the references therein). The variables E = ∑_{⟨i,j⟩} s_i s_j and M = ∑_i s_i cover all possible values of the energy and total magnetization. In all sizes of the system examined, the flatness criterion of the histogram is set to be larger or equal to 0.9, and the stopping criterion of the modification factor is given as ln f < 10^{−8}. The two-parameter Wang-Landau calculations are known to consume a huge amount of computational time, but still we have obtained g(E, M) up to L = 48 (L = 40) in the square (triangular) lattices, where the largest calculation took about 4 months on a single 3.4 GHz Xeon E3 processor. Note that we have obtained a single set of g(E, M) for each system within our computational resources, and thus the curves in Fig. 2 have been given without error bars. Once the joint density of states g(E, M) is obtained, the distribution function ρ_T(y) of the magnetization y to be used to evaluate the two-unit network can be computed at any temperature T as

    ρ_T(y)|_{y=M/N} = ∑_E g(E, M) exp(J E / k_B T) / ∑_{E,M} g(E, M) exp(J E / k_B T),

where the ferromagnetic coupling J and the Boltzmann constant k_B are set to be unity.
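The reweighting step in the equation above is straightforward to reproduce. In the sketch below, the months-long Wang-Landau estimate of g(E, M) is replaced by an exact brute-force enumeration of a tiny 4 × 4 periodic lattice, which is an illustrative stand-in only; the ρ_T(y) construction itself follows the formula in the text with J = k_B = 1.

```python
import numpy as np
from collections import defaultdict

# Stand-in for the Wang-Landau output: exact g(E, M) of a 4 x 4 periodic lattice
# obtained by enumerating all 2^16 spin configurations.
L = 4
N = L * L
states = ((np.arange(2**N)[:, None] >> np.arange(N)) & 1) * 2 - 1     # entries +-1
spins = states.reshape(-1, L, L)
E = (spins * np.roll(spins, 1, axis=1)).sum(axis=(1, 2)) \
  + (spins * np.roll(spins, 1, axis=2)).sum(axis=(1, 2))              # E = sum_<ij> s_i s_j
M = spins.sum(axis=(1, 2))

g = defaultdict(float)                                                # joint density of states
for e, m in zip(E, M):
    g[(int(e), int(m))] += 1.0

def rho_T(T):
    """rho_T(y) at y = M/N from g(E, M), with J = k_B = 1."""
    w = defaultdict(float)
    for (e, m), count in g.items():
        w[m] += count * np.exp((e - 2 * N) / T)     # constant shift keeps exponents <= 0
    Z = sum(w.values())
    ms = sorted(w)
    return np.array(ms) / N, np.array([w[m] / Z for m in ms])

y, prob = rho_T(2.27)
print("<|y|> at T = 2.27:", np.sum(np.abs(y) * prob))
```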

APPENDIX D: SUPPLEMENTAL FIGURES OF LOCATING THE TRANSITION POINT

Figure 6 displays the supplemental figures of finding the transition temperatures with the two-unit neural networks in the square and triangular lattices. The extrapolated network ExtSQ1 is associated with the parameter set (a_ε, a_μ) fitted to those of the network SQ1 that was explicitly trained in the square lattices. In the validation of the network ExtSQ1 for the transition temperature with the data in the square lattices, the extrapolation of the pseudotransition temperatures (T_L^* along the 0.5-line of the output) provides T_∞^* ≈ 2.269,
which agrees very well with the exact value of the critical point T_c = 2/ln(1 + √2). On the other hand, in the additional interoperability test of SQ1 with the data in the triangular lattices, we obtain T_∞^* ≈ 3.637, which is also very comparable to the value of T_c = 4/ln 3 of the exact solution in the triangular lattices.

FIG. 6. Locating the transition temperature with the two-unit networks, ExtSQ1 (a) in the square lattices and SQ1 (b) in the triangular lattices. The mark T_∞^* indicates the pseudotransition point extrapolated in the thermodynamic limit, which is identified in the insets as T_∞^* ≈ 2.269 (a) in the square lattices and T_∞^* ≈ 3.637 (b) in the triangular lattices. The arrows with T_c indicate the location of the exact critical temperature.

[1] G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006).
[2] Y. LeCun, Y. Bengio, and G. Hinton, Nature (London) 521, 436 (2015).
[3] G. Carleo and M. Troyer, Science 355, 602 (2017).
[4] Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, Phys. Rev. B 96, 205152 (2017).
[5] X. Gao and L.-M. Duan, Nat. Commun. 8, 662 (2017).
[6] D.-L. Deng, X. Li, and S. Das Sarma, Phys. Rev. X 7, 021021 (2017).
[7] Z. Cai and J. Liu, Phys. Rev. B 97, 035116 (2018).
[8] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Phys. Rev. X 8, 011006 (2018).
[9] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Phys. Rev. B 97, 085104 (2018).
[10] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nat. Phys. 14, 447 (2018).
[11] L. Huang and L. Wang, Phys. Rev. B 95, 035105 (2017).
[12] L. Wang, Phys. Rev. E 96, 051301(R) (2017).
[13] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Phys. Rev. B 95, 041101(R) (2017).
[14] J. Liu, H. Shen, Y. Qi, Z. Y. Meng, and L. Fu, Phys. Rev. B 95, 241104(R) (2017).
[15] X. Y. Xu, Y. Qi, J. Liu, L. Fu, and Z. Y. Meng, Phys. Rev. B 96, 041119(R) (2017).
[16] Y. Nagai, H. Shen, Y. Qi, J. Liu, and L. Fu, Phys. Rev. B 96, 161102(R) (2017).
[17] J. Carrasquilla and R. G. Melko, Nat. Phys. 13, 431 (2017).
[18] E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nat. Phys. 13, 435 (2017).
[19] P. Broecker, J. Carrasquilla, R. G. Melko, and S. Trebst, Sci. Rep. 7, 8823 (2017).
[20] K. Ch’ng, J. Carrasquilla, R. G. Melko, and E. Khatami, Phys. Rev. X 7, 031038 (2017).
[21] F. Schindler, N. Regnault, and T. Neupert, Phys. Rev. B 95, 245134 (2017).
[22] A. Tanaka and A. Tomiya, J. Phys. Soc. Jpn. 86, 063001 (2017).
[23] S. J. Wetzel and M. Scherzer, Phys. Rev. B 96, 184410 (2017).
[24] Y. Zhang and E.-A. Kim, Phys. Rev. Lett. 118, 216401 (2017).
[25] Y. Zhang, R. G. Melko, and E.-A. Kim, Phys. Rev. B 96, 245119 (2017).
[26] P. Zhang, H. Shen, and H. Zhai, Phys. Rev. Lett. 120, 066401 (2018).
[27] M. J. S. Beach, A. Golubeva, and R. G. Melko, Phys. Rev. B 97, 045207 (2018).
[28] P. Suchsland and S. Wessel, Phys. Rev. B 97, 174435 (2018).
[29] M. Koch-Janusz and Z. Ringel, Nat. Phys. 14, 578 (2018).
[30] I. A. Iakovlev, O. M. Sotnikov, and V. V. Mazurenko, arXiv:1803.06682.
[31] R. A. Vargas-Hernández, J. Sous, M. Berciu, and R. V. Krems, arXiv:1803.08195.
[32] Y.-T. Hsu, X. Li, D.-L. Deng, and S. Das Sarma, arXiv:1805.12138.
[33] X.-Y. Dong, F. Pollmann, and X.-F. Zhang, arXiv:1806.00829.
[34] L. Wang, Phys. Rev. B 94, 195105 (2016).
[35] G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016).
[36] W. Hu, R. R. P. Singh, and R. T. Scalettar, Phys. Rev. E 95, 062122 (2017).
[37] N. C. Costa, W. Hu, Z. J. Bai, R. T. Scalettar, and R. R. P. Singh, Phys. Rev. B 96, 195138 (2017).
[38] S. J. Wetzel, Phys. Rev. E 96, 022140 (2017).
[39] P. Ponte and R. G. Melko, Phys. Rev. B 96, 205146 (2017).
[40] K. Ch’ng, N. Vazquez, and E. Khatami, Phys. Rev. E 97, 013306 (2018).
[41] S. Iso, S. Shiba, and S. Yokoo, Phys. Rev. E 97, 053304 (2018).
[42] W.-J. Rao, Z. Li, Q. Zhu, M. Luo, and X. Wan, Phys. Rev. B 97, 094207 (2018).
[43] K. Mills and I. Tamblyn, Phys. Rev. E 97, 032119 (2018).
[44] P. Huembeli, A. Dauphin, and P. Wittek, Phys. Rev. B 97, 134109 (2018).
[45] Y.-H. Liu and E. P. L. van Nieuwenburg, Phys. Rev. Lett. 120, 176401 (2018).
[46] T. Ohtsuki and T. Ohtsuki, J. Phys. Soc. Jpn. 85, 123706 (2016).
[47] T. Ohtsuki and T. Ohtsuki, J. Phys. Soc. Jpn. 86, 044708 (2017).
[48] A. Morningstar and R. G. Melko, arXiv:1708.04622.
[49] Z. Liu, S. P. Rodrigues, and W. Cai, arXiv:1710.04987.
[50] N. Sun, J. Yi, P. Zhang, H. Shen, and H. Zhai, Phys. Rev. B 98, 085402 (2018).
[51] P. Huembeli, A. Dauphin, P. Wittek, and C. Gogolin, arXiv:1806.00419.
[52] V. K. Singh and J. H. Han, arXiv:1806.03749.
[53] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).
[54] M. Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015; software available from https://tensorflow.org.
[55] F. Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001).
[56] F. Wang and D. P. Landau, Phys. Rev. E 64, 056101 (2001).
[57] D. P. Landau, S.-H. Tsai, and M. Exler, Am. J. Phys. 72, 1294 (2004).
[58] S. El-Showk, M. F. Paulos, D. Poland, S. Rychkov, D. Simmons-Duffin, and A. Vichi, J. Stat. Phys. 157, 869 (2014).
[59] M. Hasenbusch, Phys. Rev. B 82, 174433 (2010).
[60] M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015).
[61] W. Kwak, J. Jeong, J. Lee, and D.-H. Kim, Phys. Rev. E 92, 022134 (2015).
