Discovering Phase Transitions With Unsupervised Learning

Lei Wang
Beijing National Lab for Condensed Matter Physics and Institute of Physics,
Chinese Academy of Sciences, Beijing 100190, China
(arXiv:1606.00318v2 [cond-mat.stat-mech], 6 Jun 2016)

Unsupervised learning is a discipline of machine learning which aims at discovering patterns in big data sets or classifying the data into several categories without being trained explicitly. We show that unsupervised learning techniques can be readily used to identify phases and phase transitions of many-body systems. Starting with raw spin configurations of a prototypical Ising model, we use principal component analysis to extract relevant low-dimensional representations of the original data and use clustering analysis to identify distinct phases in the feature space. This approach successfully finds physical concepts such as the order parameter and the structure factor to be indicators of the phase transition. We discuss future prospects of discovering more complex phases and phase transitions using unsupervised learning techniques.
Classifying phases of matter and identifying phase transitions between them is one of the central topics of condensed matter physics research. Despite an astronomical number of constituent particles, it often suffices to represent the state of a many-body system with only a few variables. For example, a conventional approach in condensed matter physics is to identify order parameters via symmetry considerations or by analyzing low-energy collective degrees of freedom, and to use them to label phases of matter [1].

However, it is harder to identify phases and phase transitions in this way in a growing number of new states of matter, where the order parameter may only be defined in an elusive nonlocal way [2]. These new developments call for new ways of identifying appropriate indicators of phase transitions.

To meet this challenge, we use machine learning techniques to extract information about phases and phase transitions directly from many-body configurations. In fact, the application of machine learning techniques to condensed matter physics is a burgeoning field [3–13][33]. For example, regression approaches are used to predict crystal structures [3], to approximate density functionals [6], and to solve quantum impurity problems [10]; artificial neural networks are trained to classify phases of classical statistical models [13]. However, most of these applications use supervised learning techniques (regression and classification), where a learner needs to be trained on a previously solved data set (input/output pairs) before it can make predictions.

In unsupervised learning, on the other hand, there is no such explicit training phase. The learner should by itself find interesting patterns in the input data. Typical unsupervised learning tasks include cluster analysis and feature extraction. Cluster analysis divides the input data into several groups based on certain measures of similarity. Feature extraction finds a low-dimensional representation of the dataset while still preserving the essential characteristics of the original data. Unsupervised learning methods have broad applications in data compression, visualization, online advertising, recommender systems, etc. They are often used as a preprocessor for supervised learning to simplify the training procedure. In many cases, unsupervised learning also leads to better human interpretation of complex datasets.

In this paper, we explore the application of unsupervised learning in many-body physics with a focus on phase transitions. The advantage of unsupervised learning is that one assumes neither the presence of a phase transition nor the precise location of the critical point. Dimension reduction techniques can extract salient features such as the order parameter and the structure factor from the raw configuration data. Clustering analysis can then divide the data into several groups in the low-dimensional feature space, representing different phases. Our studies show that unsupervised learning techniques have great potential for addressing the big data challenge in many-body physics and for making scientific discoveries.

As an example, we consider the prototypical classical Ising model

H = -J \sum_{\langle i,j \rangle} \sigma_i \sigma_j ,    (1)

where the spins take the two values σ_i = ±1. We consider the model (1) on a square lattice with periodic boundary conditions and set J = 1 as the energy unit. The system undergoes a phase transition at the temperature T_c/J = 2/\ln(1 + \sqrt{2}) ≈ 2.269 [14]. A discrete Z_2 spin inversion symmetry is broken in the ferromagnetic phase below T_c and is restored in the disordered phase at temperatures above T_c.

We generate 100 uncorrelated spin configuration samples using Monte Carlo simulation [15] at each of the temperatures T/J = 1.6, 1.7, ..., 2.9 and collect them into an M × N matrix

X = \begin{pmatrix} \uparrow & \downarrow & \uparrow & \cdots & \uparrow \\ \vdots & & & & \vdots \\ \downarrow & \uparrow & \downarrow & \cdots & \uparrow \end{pmatrix}_{M \times N} ,    (2)

where M = 1400 is the total number of samples and N is the number of lattice sites. The up and down arrows in the matrix denote σ_i = ±1. Such a matrix is the only data we feed to the unsupervised learning algorithm.
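To make the data-generation step concrete, the following is a minimal sketch of how such a matrix could be assembled. The paper samples with the cluster algorithm of Ref. [15]; this sketch uses single-spin-flip Metropolis updates instead for brevity, and every function name and parameter value in it is an illustrative assumption rather than the original setup.

```python
import numpy as np

def metropolis_sweep(spins, T, rng):
    """One Metropolis sweep of an L x L Ising lattice with periodic boundaries."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        # Sum of the four nearest neighbors (periodic boundary conditions).
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn  # energy cost of flipping spin (i, j), J = 1
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

def build_data_matrix(L=20, n_samples=100, n_equil=1000, n_skip=10, seed=0):
    """Collect raw spin configurations into the M x N matrix X of Eq. (2).

    Pure-Python loops are slow; this is for illustration only.
    """
    rng = np.random.default_rng(seed)
    temperatures = np.arange(1.6, 2.95, 0.1)  # T/J = 1.6, 1.7, ..., 2.9
    rows = []
    for T in temperatures:
        spins = rng.choice([-1, 1], size=(L, L))
        for _ in range(n_equil):            # equilibrate at this temperature
            metropolis_sweep(spins, T, rng)
        for _ in range(n_samples):
            for _ in range(n_skip):         # sweeps between measurements
                metropolis_sweep(spins, T, rng)
            rows.append(spins.ravel().copy())
    return np.asarray(rows, dtype=float)    # shape (M, N) = (1400, L * L)

X = build_data_matrix()
```

Near T_c local updates decorrelate slowly (critical slowing down), which is why the cluster algorithm [15] is the appropriate tool for producing genuinely uncorrelated samples.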
Figure 1: The first few explained variance ratios obtained from the raw Ising configurations. The inset shows the weights of the first principal component on an N = 40^2 square lattice.

Figure 2: Projection of the samples onto the plane of the leading two principal components. The color bar on the right indicates the temperature T/J of the samples. The panels (a–c) are for N = 20^2, 40^2 and 80^2 sites, respectively.
Our goal is to discover the possible phase transition of the model (1) without assuming its existence. This is different from the supervised learning task, where exact knowledge of T_c was used to train a learner [13]. Moreover, the following analysis does not assume any prior knowledge about the lattice geometry or the Hamiltonian. We are going to use the unsupervised learning approach to extract salient features from the data and then use this information to cluster the samples into distinct phases. Knowledge about the temperature of each sample and the critical temperature T_c of the Ising model is used only to verify the clustering.

Interpreting each row of X as a coordinate in an N-dimensional space, the M data points form a cloud centered around the origin of a hypercube [34]. Discovering a phase transition amounts to finding a hypersurface which divides the data points into several groups, each representing a phase. The task is akin to the standard unsupervised learning technique of cluster analysis [16], for which numerous algorithms are available that group the data based on different criteria.

However, directly applying clustering algorithms to the Ising configurations may not be very enlightening. The reasons are twofold. First, even if one manages to separate the data into several groups, clusters in a high-dimensional space may not directly offer useful physical insights. Second, many clustering algorithms rely on a good measure of similarity between the data points, whose definition is ambiguous without domain knowledge such as the distance between two spin configurations.

On the other hand, the raw spin configuration is a highly redundant description of the system's state because there are correlations among the spins. Moreover, as the temperature varies, there is an overall tendency in the raw spin configurations, such as a lowering of the total magnetization. In the following, we will therefore first identify some crucial features in the raw data. They provide an effective low-dimensional representation of the original data, and in terms of these features the meaning of the distance between configurations becomes more transparent. The separation of phases is also often clearly visible, and comprehensible to a human, in the reduced space spanned by these features. Feature extraction therefore not only simplifies the subsequent clustering analysis but also provides an effective means of visualization and physical insight. We denote the crucial features extracted by the unsupervised learning as indicators of the phase transition. In general, they do not necessarily need to coincide with the conventional order parameters defined in condensed matter physics. The unsupervised learning approach nevertheless provides an alternative view of phases and phase transitions.

Principal component analysis (PCA) [17] is a widely used feature extraction technique. The principal components are mutually orthogonal directions along which the variances of the data decrease monotonically. PCA finds the principal components through a linear transformation of the original coordinates, Y = XW. When applied to the Ising configurations in Eq. (2), PCA finds the most significant variations of the data with temperature. We interpret these as relevant features of the data and use them as indicators of the phase transition, if there is any.
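As a concrete illustration of this program (not code from the paper), both steps take only a few lines with scikit-learn, whose clustering module is pointed to in [36]. The number of retained components and the choice of k-means with three clusters are assumptions made here for illustration:

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Feature extraction: project the raw configurations onto the
# leading principal components, Y = XW.
pca = PCA(n_components=2)
Y = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # the ratios plotted in Fig. 1

# Clustering in the low-dimensional feature space. Three clusters are
# an illustrative guess for this model: the two symmetry-related
# ferromagnetic branches plus the paramagnetic cloud near the origin.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Y)
```

Any other algorithm from the clustering module [36] could be swapped in; the point is that the clustering is performed in the extracted feature space rather than on the raw N-dimensional configurations.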
Figure 3: Typical configurations of the COP Ising model below (a,b) and above (c) the critical temperature. Red and blue pixels indicate up and down spins. Exactly half of the pixels are red/blue due to the constraint \sum_i \sigma_i ≡ 0.

Figure 4: Explained variance ratios of the COP Ising model. Insets show the weights corresponding to the four leading principal components.

We write the orthogonal transformation in terms of column vectors, W = (w_1, w_2, ..., w_N), and denote w_ℓ as the weights of the principal components in the configuration space. They are determined by the eigenproblem [18][35]

X^T X w_\ell = \lambda_\ell w_\ell .    (3)
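As footnote [35] notes, this eigenproblem is solved in practice through a singular value decomposition of X. A minimal NumPy sketch of that route, written against the data matrix assembled above:

```python
import numpy as np

# SVD route to Eq. (3): with X = U S V^T, the rows of V^T are the
# principal weights w_l and the eigenvalues are lambda_l = s_l**2.
# X needs no extra centering here: its columns sum to (approximately)
# zero already, cf. footnote [34].
U, s, Vt = np.linalg.svd(X, full_matrices=False)
lam = s**2
var_ratio = lam / lam.sum()  # explained variance ratios (Figs. 1 and 4)
w1 = Vt[0]                   # weights of the first principal component
Y = X @ Vt.T                 # projections Y = XW onto all components
```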
We next consider the Ising model with a conserved order parameter (COP), in which the total magnetization is constrained to zero, \sum_i \sigma_i ≡ 0, so that the low-temperature configurations form domain walls (Fig. 3). This transition is less commonly studied than the ferromagnetic one; it is, therefore, a good example to demonstrate the ability of the unsupervised learning approach.

We perform the same PCA on the COP Ising configurations sampled with Monte Carlo simulation [19] and show the first few explained variance ratios in Fig. 4. Notably, there are four leading principal components instead of one. Their weights, plotted in the insets of Fig. 4, show notable nonuniformity over the lattice sites. This indicates that in the COP Ising model the spatial distribution of the spins varies drastically as the temperature changes. Denote the Euclidean coordinates of site i as (μ_i, ν_i), where μ_i, ν_i = 1, 2, ..., \sqrt{N}. The weights of the four leading principal components can be written as cos(θ_i), cos(φ_i), sin(θ_i), sin(φ_i), where (θ_i, φ_i) = (μ_i, ν_i) × 2π/\sqrt{N} [37]. Note that these four mutually orthogonal weights correspond to the two orientations of the domain walls shown in Fig. 3(a,b). Therefore, the PCA correctly finds the rotational symmetry breaking caused by the domain wall formation.

To visualize the samples in the four-dimensional feature space spanned by the first few principal components, we plot two-dimensional projections in Fig. 5. In all cases, the high-temperature samples lie around the origin while the low-temperature samples form a surrounding cloud. Motivated by the circular shapes of all these projections, we further reduce to a two-dimensional space via the nonlinear transformation (y_1, y_2, y_3, y_4) ↦ (y_1^2 + y_2^2, y_3^2 + y_4^2). As shown in Fig. 6(a), the line \sum_{\ell=1}^{4} y_\ell^2 = const (a four-dimensional sphere of constant radius) separates the low- and high-temperature samples. This motivates a further dimension reduction to the single variable \sum_{\ell=1}^{4} y_\ell^2 as an indicator of the phase transition in the COP Ising model.

Substituting in the weights of the four principal components cos(θ_i), cos(φ_i), sin(θ_i), sin(φ_i), the sum \sum_{\ell=1}^{4} y_\ell^2 is proportional to the structure factor

S = \frac{1}{N^2} \sum_{i,j} \sigma_i \sigma_j \left[ \cos(\theta_i - \theta_j) + \cos(\phi_i - \phi_j) \right] .    (5)

Even though this structure factor was unknown to the author before it was discovered by the learner, one can convince oneself that it indeed captures the domain wall formation at low temperatures shown in Fig. 3(a,b). Figure 6(b) shows the structure factor versus temperature for various system sizes. It decreases as the temperature increases and clearly serves as a good indicator of the phase transition. We emphasize that the input spin configurations contain no information about the lattice geometry or the Hamiltonian. Nevertheless, the unsupervised learner has by itself extracted meaningful information related to the breaking of the orientational order. Therefore, even without knowledge of the lattice and an analytical understanding of the structure factor Eq. (5), \sum_{\ell=1}^{4} y_\ell^2 plays the same role of separating the phases in the projected space.

It is interesting to compare our analysis of phase transitions to standard image recognition applications. In the Ising model example, the learner essentially finds the brightness of the image, \sum_i \sigma_i, as an indicator of the phase transition. In the COP Ising model example, instead of detecting the sharpness of edges (the melting of domain walls) as in an ordinary image recognition routine, the PCA learner finds the structure factor Eq. (5) related to symmetry breaking, which is a fundamental concept in phase transitions and condensed matter physics.
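As a companion to Eq. (5) (an illustration, not code from the paper), the structure factor can be evaluated directly by noting that the double sum factorizes, \sum_{i,j} \sigma_i \sigma_j \cos(a_i - a_j) = |\sum_i \sigma_i e^{i a_i}|^2, which reduces the cost from O(N^2) to O(N):

```python
import numpy as np

def cop_structure_factor(spins):
    """Evaluate the structure factor S of Eq. (5) for one L x L configuration."""
    L = spins.shape[0]
    N = L * L
    # Site coordinates (mu_i, nu_i) and phases (theta_i, phi_i) = 2*pi*(mu, nu)/L.
    mu, nu = np.meshgrid(np.arange(1, L + 1), np.arange(1, L + 1), indexing="ij")
    theta, phi = 2 * np.pi * mu / L, 2 * np.pi * nu / L
    # |sum_i s_i exp(i a_i)|^2 equals sum_{i,j} s_i s_j cos(a_i - a_j).
    s_theta = np.abs(np.sum(spins * np.exp(1j * theta))) ** 2
    s_phi = np.abs(np.sum(spins * np.exp(1j * phi))) ** 2
    return (s_theta + s_phi) / N**2
```

Up to normalization this matches the PCA-side quantity \sum_{\ell=1}^{4} y_\ell^2, so either route can serve as the indicator plotted in Fig. 6(b).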
Considering that PCA is arguably one of the simplest unsupervised learning techniques, the obtained results are rather encouraging. In essence, our analysis finds the dominant collective modes of the system related to the phase transition. The approach can be readily generalized to more complex cases, such as models with emergent symmetry and order by disorder [21]. The unsupervised learning approach is particularly profitable in the case of hidden or multiple intertwined orders, where it can help to single out the various phases.

Although the nonlinear transformation of the raw configurations, Eq. (5), was discovered via visualization in Fig. 5, simple PCA is nevertheless limited to linear transformations. It therefore remains challenging to identify more subtle phase transitions related to topological order, where the indicators of the phase transition are nontrivial nonlinear functions of the original configurations. For this purpose, it would be interesting to see whether a machine learning approach can comprehend concepts such as the duality transformation [22], the Wilson loop [23] and the string order parameter [24]. A judicious application of kernel techniques [25] or neural-network-based deep autoencoders [26] may achieve some of these goals.
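As one concrete direction along these lines (an assumption of this rewrite, not a result of the paper), a kernelized PCA is an almost drop-in replacement for the linear transformation used above; whether a given kernel captures a particular nonlinear order is exactly the open question raised here:

```python
from sklearn.decomposition import KernelPCA

# A nonlinear variant of the linear PCA used in the main text.
# The RBF kernel and gamma value are illustrative, untuned choices.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-3)
Y_nonlinear = kpca.fit_transform(X)
```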
Furthermore, although our discussion focuses on thermal phase transitions of the classical Ising model, the unsupervised learning approaches can also be used to analyze quantum many-body systems and quantum phase transitions [27]. In these applications, diagnosing quantum states of matter without knowledge of the Hamiltonian is a useful paradigm for cases where one only has access to wavefunctions or experimental data.

Acknowledgment: The author thanks Xi Dai, Ye-Hua Liu, Yuan Wan, QuanSheng Wu and Ilia Zintchenko for discussions and encouragement. The author also thanks Zi Cai for discussions and a careful reading of the manuscript. L.W. is supported by the start-up funding of IOP-CAS.

[1] P. W. Anderson, Basic Notions of Condensed Matter Physics (The Benjamin-Cummings Publishing Company, 1984).
[2] X.-G. Wen, Quantum Field Theory of Many-Body Systems (Oxford University Press, 2004).
[3] S. Curtarolo, D. Morgan, K. Persson, J. Rodgers, and G. Ceder, Physical Review Letters 91, 135503 (2003).
[4] O. S. Ovchinnikov, S. Jesse, P. Bintacchit, S. Trolier-McKinstry, and S. V. Kalinin, Physical Review Letters 103, 157203 (2009).
[5] G. Hautier, C. C. Fischer, A. Jain, T. Mueller, and G. Ceder, Chemistry of Materials 22, 3762 (2010).
[6] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Physical Review Letters 108, 253002 (2012).
[7] Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Physical Review B 85, 104104 (2012).
[8] E. LeDell, Prabhat, D. Y. Zubarev, B. Austin, and W. A. Lester, Journal of Mathematical Chemistry 50, 2043 (2012).
[9] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Physical Review Letters 108, 058301 (2012).
[10] L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, and A. J. Millis, Physical Review B 90, 155136 (2014).
[11] G. Pilania, J. E. Gubernatis, and T. Lookman, Physical Review B 91, 214302 (2015).
[12] Z. Li, J. R. Kermode, and A. De Vita, Physical Review Letters 114, 096405 (2015).
[13] J. Carrasquilla and R. G. Melko, arXiv:1605.01735 (2016).
[14] L. Onsager, Physical Review 65, 117 (1944).
[15] U. Wolff, Physical Review Letters 62, 361 (1989).
[16] B. S. Everitt, S. Landau, M. Leese, and D. Stahl, Cluster Analysis (Wiley, 2010).
[17] K. Pearson, Philosophical Magazine 2, 559 (1901).
[18] I. Jolliffe, Principal Component Analysis (John Wiley & Sons, Chichester, UK, 2002).
[19] M. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics (Oxford, 1999).
[20] C. N. Yang, Physical Review 85, 808 (1952).
[21] R. Moessner and S. L. Sondhi, Physical Review B 63, 224401 (2001).
[22] F. J. Wegner, Journal of Mathematical Physics 12, 2259 (1971).
[23] K. G. Wilson, Physical Review D 10, 2445 (1974).
[24] M. den Nijs and K. Rommelse, Physical Review B 40, 4709 (1989).
[25] B. Schölkopf, A. Smola, and K. R. Müller, Neural Computation (1998).
[26] G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006).
[27] S. Sachdev, Quantum Phase Transitions (Cambridge University Press, 2011).
[28] L. Saitta, A. Giordana, and A. Cornuéjols, Phase Transitions in Machine Learning (Cambridge University Press, 2011).
[29] P. Mehta and D. J. Schwab, arXiv:1410.3831 (2014).
[30] E. M. Stoudenmire and D. J. Schwab, arXiv:1605.05775 (2016).
[31] S. Lloyd, M. Mohseni, and P. Rebentrost, arXiv:1307.0411 (2013).
[32] S. R. White, Physical Review Letters 69, 2863 (1992).
[33] We also note the application of physics ideas such as phase transitions [28], the renormalization group [29], tensor networks [30] and quantum computation [31] to machine learning.
[34] Each column of X sums up to zero since on average each site has zero magnetization.
[35] In practice this eigenproblem is often solved by singular value decomposition of X. In fact, replacing the input data X (raw spin configurations collected at various temperatures) by the wave function of a one-dimensional quantum system, the math here is identical to the truncation of Schmidt coefficients in density-matrix renormalization group calculations [32].
[36] See, for example, the methods provided in the scikit-learn cluster module, http://scikit-learn.org/stable/modules/clustering.html
[37] The weights shown in the insets of Fig. 4 are linear mixtures of them.