Smallest Neural Network To Learn The Ising Criticality. Phys. Rev. E 98, 022138 (2018)
Learning with an artificial neural network encodes the system behavior in a feed-forward function with a number of parameters optimized by data-driven training. An open question is whether one can minimize the network complexity without loss of performance to reveal how and why it works. Here we investigate the learning of the phase transition in the Ising model and find that having two hidden neurons can be enough for an accurate prediction of the critical temperature. We show that the networks learn the scaling dimension of the order parameter while being trained as phase classifiers, demonstrating how machine learning exploits the Ising universality so that a single set of networks trained in one lattice geometry works for different lattices of the same criticality.
DOI: 10.1103/PhysRevE.98.022138
V. CONCLUSIONS
In conclusion, we have shown that the minimal binary
structure with two neurons in the hidden layer is essential in
understanding the accuracy and interoperability of a neural
network observed in the supervised learning of the phase
transition in the Ising model. We have found that the scaling dimension of the order parameter is encoded in the system-size dependence of the network parameters during the learning process.
This allows the conventional finite-size-scaling analysis with
the network outputs to locate the critical temperature and, more
importantly, demonstrates how one trained neural network can
work for different lattices of the same Ising universality.
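As a concrete illustration of how the network outputs enter such a finite-size-scaling analysis, the sketch below locates the pseudotransition temperature T_L^* where the averaged classifier output crosses 0.5, as done along the 0.5-line of the output in Appendix D. The helper name and the linear interpolation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pseudo_tc(temps, outputs):
    """Locate T*_L where the averaged network output crosses 0.5.

    Assumes `temps` is an increasing grid spanning the transition and
    `outputs` is the averaged classifier output, close to 1 deep in
    the ordered phase and close to 0 deep in the disordered phase.
    """
    temps = np.asarray(temps, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    i = int(np.argmax(outputs < 0.5))   # first grid point below the 0.5 line
    assert i > 0, "temperature grid must start in the ordered phase"
    # Linear interpolation between the two points bracketing the crossing.
    t0, t1 = temps[i - 1], temps[i]
    y0, y1 = outputs[i - 1], outputs[i]
    return t0 + (0.5 - y0) * (t1 - t0) / (y1 - y0)
```

Repeating this for each system size L and extrapolating T_L^* against 1/L then gives the critical-temperature estimate.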
Explainable machine learning aims to provide a transparent platform that allows interpretable predictions, which is crucial for applications requiring extreme reliability. In learning to classify the phases of the Ising model, we have attempted to downsize the neural network to reveal a traceable structure, which turns out to be irreducibly simple and yet loses none of its performance. This suggests the need for further
studies to explore interpretable building blocks of machine learning in a broader range of physical systems.

FIG. 3. Interoperability of the two-unit neural network. At the fixed scaling of ε_L = a_ε L^{1/4} and μ_L = ln 3 + a_μ L^{-1/4}, the consistency of finding T_c (dashed line) is examined by varying (a) a_ε at a_μ = 0.25 and (b) a_μ at a_ε = 15 for the inputs prepared in the square lattices. The network ExtSQ1 is associated with (a_ε, a_μ) extrapolated from SQ1 trained in the square lattices. (c) The finite-size-scaling test of the outputs of ExtSQ1 for the inputs from the triangular lattices. The exact values of T_c = 4/ln 3 and ν = 1 are used. The error bars (not shown) are much smaller than the symbol size.

ACKNOWLEDGMENTS

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education (Grant No. NRF-2017R1D1A1B03034669) and by a GIST Research Institute (GRI) grant funded by GIST in 2018.
FIG. 5. Dependence of the training of a neural network on the L2-regularization strength λ. (a) Visualization of the weight matrix W_1^T of the 50-unit network trained in the 16 × 16 square lattices. (b) The success percentage in the phase classification test as a function of λ. The sum of the weights of the incoming links to a hidden neuron is plotted for (c) λ = 0.005, (d) λ = 0.001, and (e) λ = 0.0005, where the length of the bar indicates the magnitude of the partial sum of the weights having the sign opposite to the total weight sum. Panels (f)-(h) present the predictions of the transition temperature in the square and triangular lattices at λ = 0.005, 0.001, and 0.0005, respectively.
Figure 5(a) visualizes the weight matrix W_1 of the neural network trained at various values of λ ranging from 0.1 to 0.0001. In the typical validation test of the phase classification with the reserved test dataset of 200 configurations per temperature in the same range, we have very similar success percentages of about 95% for all λ's examined, while slightly higher percentages are found at 0.0005 ≤ λ ≤ 0.01 [see Fig. 5(b)]. However, this simple classification test does not fully validate the actual accuracy and performance of the network for our purpose of investigating its ability to predict the transition temperature and its interoperability with different underlying lattice geometries.

The direct tests of finding the transition temperature with the networks trained in the square lattices are performed with the datasets separately prepared in the triangular and square lattices. The performance shown in these tests seems to be closely related to the existence or nonexistence of the plus-minus structure of the weights observed in the weight matrix W_1, which turns out to undergo a crossover from a structured to an unstructured one around λ = 0.001. This crossover does not change with the size of the system that we have examined. One way to notice the change in the visibility of the structure is to look at the weight sum of the incoming links of a hidden neuron, as exemplified in Figs. 5(c)-5(e). At λ = 0.005, the plus-minus structure is very clear since all contributing weights to a hidden neuron are of the same sign. While the weight sums are still well separated into plus, minus, and zeros, defects start to appear at λ = 0.001, as shown by the finite length of the bar in Fig. 5(d), which indicates the contribution of the weights with the opposite sign to the total sum at a given neuron. At the weaker regularizations of λ ≤ 0.0005, the length of the bar tends to get larger, becoming comparable to the magnitude of the weight sum.
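This weight-sum diagnostic is compact enough to state in code. The following numpy sketch assumes W_1 is laid out as (inputs × hidden units) and uses a hypothetical helper name; it returns, for each hidden neuron, the total incoming weight sum and the magnitude of the opposite-sign partial sum that sets the bar lengths in Figs. 5(c)-5(e).

```python
import numpy as np

def hidden_unit_weight_sums(W1):
    """Per-neuron probe of the plus-minus structure of W1.

    W1 has shape (n_inputs, n_hidden), so column j collects the
    incoming link weights of hidden neuron j. Returns the total
    incoming weight sum per neuron and the magnitude of the partial
    sum of the weights whose sign opposes that total.
    """
    totals = W1.sum(axis=0)
    # Keep only entries whose sign is opposite to their column's total.
    opposite = np.where(np.sign(W1) == -np.sign(totals), W1, 0.0)
    defects = np.abs(opposite.sum(axis=0))
    return totals, defects

# A clean column (all one sign) has zero defect; a mixed-sign column
# shows a finite opposite-sign contribution.
W1 = np.array([[0.8, 0.5],
               [0.7, -0.2],
               [0.9, 0.6]])
print(hidden_unit_weight_sums(W1))  # totals [2.4, 0.9], defects [0.0, 0.2]
```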
The behavior of the pseudotransition temperatures differs as well at these λ's, as shown in Figs. 5(f)-5(h). The mark at L = ∞ is from the extrapolation with the last three points, while the error bars indicate the combined uncertainty of the three- and four-point fittings. Up to λ = 0.005, where the clear structure in W_1 persists, the system-size extrapolation with 1/L is very consistent. At λ = 0.001, the finite-size behavior becomes severe, and below λ = 0.0005, the accuracy of predicting the transition temperature in the triangular lattices becomes poor as λ decreases further, implying that overfitting to the reference of a step-function-like classification may have occurred during the training with the data in the square lattices at such small λ.

… on a single 3.4 GHz Xeon E3 processor. Note that we have obtained a single set of g(E, M) for each system within our computational resources, and thus the curves in Fig. 2 have been given without error bars. Once the joint density of states g(E, M) is obtained, the distribution function ρ_T(y) of the magnetization y, to be used to evaluate the two-unit network, can be computed at any temperature T as

$$\rho_T(y)\big|_{y=M/N} = \frac{\sum_{E} g(E, M)\, \exp[J E / k_B T]}{\sum_{E, M} g(E, M)\, \exp[J E / k_B T]},$$

where the ferromagnetic coupling J and the Boltzmann constant k_B are set to unity.
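The reweighting formula above reduces to a few lines of numpy once ln g(E, M) is stored as a dense array over the (E, M) grid; the function name, array layout, and log-space stabilization below are illustrative choices, not the authors' code.

```python
import numpy as np

def magnetization_distribution(ln_g, E_vals, M_vals, N, T):
    """Evaluate rho_T(y) at y = M/N from the joint density of states.

    ln_g[i, j] holds ln g(E_i, M_j) (-np.inf where g vanishes), and the
    Boltzmann factor follows the sign convention above with J = k_B = 1.
    """
    # Log of the weighted density g(E, M) exp(J E / k_B T); shift by the
    # maximum before exponentiating to avoid overflow.
    ln_w = ln_g + E_vals[:, None] / T
    w = np.exp(ln_w - np.max(ln_w))
    rho = w.sum(axis=0)      # numerator: sum over E at each fixed M
    rho /= rho.sum()         # denominator: sum over both E and M
    return M_vals / N, rho   # y = M/N and its distribution rho_T(y)
```

Because g(E, M) itself is temperature independent, a single such table suffices to evaluate ρ_T(y) on an arbitrarily fine temperature grid.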
APPENDIX D: SUPPLEMENTAL FIGURES OF LOCATING THE TRANSITION POINT

Figure 6 displays the supplemental figures of finding the transition temperatures with the two-unit neural networks in the square and triangular lattices. The extrapolated network ExtSQ1 is associated with the parameter set (a_ε, a_μ) fitted to those of the network SQ1 that was explicitly trained in the square lattices. In the validation of the network ExtSQ1 for the transition temperature with the data in the square lattices, the extrapolation of the pseudotransition temperatures (T_L^* along the 0.5-line of the output) provides T_∞^* ≈ 2.269, which agrees very well with the exact value of the critical point T_c = 2/ln(1 + √2). On the other hand, in the additional interoperability test of SQ1 with the data in the triangular lattices, we obtain T_∞^* ≈ 3.637, which is also very comparable to the value T_c = 4/ln 3 of the exact solution in the triangular lattices.
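The quoted T_∞^* values follow from the 1/L extrapolation described with Figs. 5(f)-5(h). A minimal sketch, assuming a fit linear in 1/L and a hypothetical helper name:

```python
import numpy as np

def extrapolate_tc(L, T_star):
    """Extrapolate pseudotransition temperatures T*_L to L = infinity.

    Fits T*_L linearly in 1/L over the three and the four largest
    sizes; the three-point intercept is the estimate, and the spread
    between the two fits serves as a rough combined uncertainty.
    """
    L = np.asarray(L, dtype=float)
    T_star = np.asarray(T_star, dtype=float)
    order = np.argsort(L)
    intercepts = []
    for n in (3, 4):
        sel = order[-n:]  # the n largest system sizes
        slope, intercept = np.polyfit(1.0 / L[sel], T_star[sel], 1)
        intercepts.append(intercept)
    return intercepts[0], abs(intercepts[1] - intercepts[0])
```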