
Estimating Gibbs free energies via isobaric-isothermal flows


Published 4 September 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Peter Wirnsberger et al 2023 Mach. Learn.: Sci. Technol. 4 035039. DOI: 10.1088/2632-2153/acefa8


Abstract

We present a machine-learning model based on normalizing flows that is trained to sample from the isobaric-isothermal ensemble. In our approach, we approximate the joint distribution of a fully-flexible triclinic simulation box and particle coordinates to achieve a desired internal pressure. This novel extension of flow-based sampling to the isobaric-isothermal ensemble yields direct estimates of Gibbs free energies. We test our NPT-flow on monatomic water in the cubic and hexagonal ice phases and find excellent agreement of Gibbs free energies and other observables compared with established baselines.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Computer simulations have become an indispensable tool to refine our theoretical understanding of matter. Molecular dynamics (MD) and Monte Carlo methods are the two bedrock techniques for molecular simulation [1]. Recent advances in machine learning (ML), however, have enabled another paradigm of sampling from high-dimensional Boltzmann distributions that is based on powerful generative models, in particular normalizing flows [2–5]. An appealing property of normalizing flows is that they support drawing independent samples and provide exact likelihoods that are, for example, useful for efficient free energy estimation [6, 7]. Irrespective of the sampling method, statistical mechanics provides us with the mathematical language to reason about the complex interactions within a probabilistic framework.

A key goal in equilibrium statistical mechanics is the efficient estimation of expectation values ${\mathbb{E}}_{p(x)} \left [ {A(x)} \right]$, where A(x) is an arbitrary function of a typically high-dimensional vector x and p(x) is the probability density which depends on a set of constant external parameters that define the statistical ensemble [1]. Two popular ensembles are the canonical ensemble, at fixed number of particles N, volume V and temperature T, and the isobaric-isothermal ensemble, at fixed N, P and T, where P is the pressure. While most of the differences between ensembles disappear in the 'thermodynamic limit', where $N \to \infty$ and $N/V \to \rho$ with ρ being the density, fluctuations and statistical averages can differ for some observables [1]. It is therefore crucial to sample from the correct distribution that is compatible with the external set of parameters.

Despite the considerable interest in ML-based sampling of molecular systems, applications of normalizing flows have primarily focused on the canonical ensemble [3, 6, 8–11], and do not support sampling from the isobaric-isothermal ensemble. Imposing a constant pressure is crucial, however, for studying important physical phenomena such as first-order phase transitions, across which the pressure is constant but the volume changes. It is further important for studying solid crystals to ensure that they can relax to their equilibrium shape [12, 13]. Moreover, a normalizing flow capable of sampling from the isobaric-isothermal ensemble would provide us with a simple and direct route to estimating the Gibbs free energy, by utilizing it as a targeted free energy estimator [6, 14].

In this work, we propose a normalizing flow model, called NPT-flow, that jointly generates particle coordinates and triclinic box shape parameters keeping the number of particles N, the pressure P and the temperature T fixed. The model can therefore be trained to approximate the target density of the isobaric-isothermal ensemble. The paper is structured as follows. First, we provide some background on flow-based sampling of the canonical ensemble in section 2 and introduce the NPT-flow for the isobaric-isothermal ensemble in section 3. We then benchmark the model on ice in two different crystal phases by comparison with MD data in section 4, and discuss limitations of and interesting extensions to our approach in section 5.

2. Background

This section discusses how normalizing flows can be used for unbiased estimation of observables within the framework of statistical mechanics. The general idea is to train a normalizing flow to closely approximate the canonical ensemble using standard deep-learning tools, and then estimate expectations under the ensemble using the flow model as a proposal distribution in an importance-weighting scheme. Readers familiar with this background can skip to section 3, where the extension of this methodology to the isobaric-isothermal ensemble is presented.

2.1. Canonical ensemble and Helmholtz free energy

We consider a system of N atoms in D dimensions, confined in a fixed simulation box (typically orthorhombic). Let $x_n = (x_{n,1}, \ldots, x_{n,D})$ be the position coordinates of the nth particle, and $x = (x_1, \ldots, x_N)$ be the configuration of the system. We assume that the atoms are subject to a potential energy U(x) and the system is in contact with a heat bath at fixed temperature T with which it can exchange energy. A standard result in statistical mechanics is that the equilibrium distribution of such a system (known as the canonical or NVT ensemble) is given by the Boltzmann distribution:

$p(x) = \frac{1}{Z_V} \exp[-\beta U(x)]$    (1)

where $Z_V = \int \mathrm{d}x \exp[-\beta U(x)]$ is the canonical partition function and $\beta = 1/k_\mathrm{B} T$ is the inverse temperature ($k_\mathrm{B}$ is Boltzmann's constant). The canonical partition function is directly related to the Helmholtz free energy F via

$F = -k_\mathrm{B} T \ln Z_V$    (2)

In practice, one is interested in estimating (i) expectations under the Boltzmann distribution ${\mathbb{E}}_{p(x)} \left [ {A(x)} \right]$ of physical observables A, and (ii) the partition function $Z_V$ and hence the free energy F. These problems are computationally challenging due to the high dimensionality of x and the fact that p(x) can only be evaluated point-wise and up to a constant. In what follows, we focus on a method that takes advantage of recent advances in ML and deep generative modeling, specifically normalizing flows.

2.2. Normalizing flows

Normalizing flows are deep generative models for parameterizing flexible probability distributions. A normalizing flow parameterizes a complex distribution $q_\theta(x)$ (with parameters θ) as the pushforward of a (typically fixed) simple base distribution q(u) via an invertible and differentiable function $f_\theta$ (typically parameterized with deep neural networks). In other words, a sample x from $q_\theta(x)$ is drawn by first sampling u from q(u) and transforming it with $f_\theta$:

$x = f_\theta(u), \quad \text{where} \quad u \sim q(u)$    (3)

Thanks to the invertibility and differentiability of $f_\theta$, the probability density of x can be computed exactly with a change of variables:

$q_\theta(x) = q(u) \left|\det \frac{\partial f_\theta}{\partial u}\right|^{-1}, \quad \text{where} \quad u = f_\theta^{-1}(x)$    (4)

where $\frac{\partial f_\theta}{\partial u}$ is the Jacobian of $f_\theta$. There is a wide variety of implementations of invertible and differentiable functions $f_\theta$ using deep neural networks that are flexible enough to parameterize complex distributions $q_\theta(x)$ but whose Jacobian determinant $\det \frac{\partial f_\theta}{\partial u}$ remains efficient to compute, enabling exact density evaluation; for a thorough review, see [4, 15].
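To make the change of variables concrete, the following is a minimal JAX sketch of equations (3) and (4) for a toy elementwise affine flow. All names (forward, log_density) and the choice of base distribution are ours for illustration; the flows used later in this paper are far more expressive.

```python
import jax
import jax.numpy as jnp

def forward(theta, u):
    # x = f_theta(u): an elementwise affine map, invertible for scale != 0.
    scale, shift = theta
    return scale * u + shift

def log_density(theta, x):
    # Equation (4): q_theta(x) = q(u) |det df_theta/du|^{-1} at u = f^{-1}(x).
    scale, shift = theta
    u = (x - shift) / scale                               # inverse transform
    base_logp = jnp.sum(jax.scipy.stats.norm.logpdf(u))   # base density q(u)
    log_det = jnp.sum(jnp.log(jnp.abs(scale)))            # log |det Jacobian|
    return base_logp - log_det

theta = (2.0 * jnp.ones(3), jnp.zeros(3))
u = jax.random.normal(jax.random.PRNGKey(0), (3,))
x = forward(theta, u)                                     # equation (3)
print(log_density(theta, x))
```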

2.3. Training normalizing flows to approximate target distributions

Given a target distribution that can be evaluated point-wise and up to a constant, such as the Boltzmann distribution in equation (1), the objective is to train a normalizing-flow model to approximate it. Let the intractable target distribution take the general form $p(x) = \frac{1}{Z}\exp[g(x)]$ and let $q_\theta(x)$ be a flow model as defined in equation (3). We train the flow to approximate the target by minimizing a suitably chosen divergence between p(x) and $q_\theta(x)$ with respect to θ. A common choice is the reverse Kullback–Leibler divergence:

$D_{\mathrm{KL}}\left[q_\theta(x) \,\|\, p(x)\right] = \mathbb{E}_{q_\theta(x)}\left[\ln q_\theta(x) - \ln p(x)\right]$    (5)

$= \mathbb{E}_{q_\theta(x)}\left[\ln q_\theta(x) - g(x)\right] + \ln Z$    (6)

$= \mathbb{E}_{q(u)}\left[\ln q_\theta(f_\theta(u)) - g(f_\theta(u))\right] + \ln Z$    (7)

where in the last step we reparameterized the expectation with respect to the base distribution q(u). Using the final expression, the gradient of the KL divergence with respect to θ is

$\nabla_\theta\, D_{\mathrm{KL}}\left[q_\theta(x) \,\|\, p(x)\right] = \mathbb{E}_{q(u)}\left[\nabla_\theta\left(\ln q_\theta(f_\theta(u)) - g(f_\theta(u))\right)\right]$    (8)

Using samples from q(u), one can readily obtain an unbiased estimate of the above gradient, which can be used to minimize the KL divergence with any stochastic gradient optimization method.
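The following toy JAX sketch illustrates this training loop for the affine flow of the previous sketch. The quadratic log-target g(x) and the use of the optax optimizer are illustrative assumptions, not the setup used in our experiments.

```python
import jax
import jax.numpy as jnp
import optax  # assumption: any stochastic gradient optimizer would do

def forward(theta, u):
    scale, shift = theta
    return scale * u + shift

def log_q(theta, x):
    scale, shift = theta
    u = (x - shift) / scale
    return jnp.sum(jax.scipy.stats.norm.logpdf(u)) - jnp.sum(jnp.log(jnp.abs(scale)))

def g(x):
    # Toy unnormalized log-target, standing in for -beta * U(x).
    return -0.5 * jnp.sum((x - 1.0) ** 2)

def kl_objective(theta, u_batch):
    # Monte-Carlo estimate of E_{q(u)}[ln q_theta(f(u)) - g(f(u))], equation (7).
    xs = jax.vmap(lambda u: forward(theta, u))(u_batch)
    return jnp.mean(jax.vmap(lambda x: log_q(theta, x))(xs) - jax.vmap(g)(xs))

theta = (jnp.ones(3), jnp.zeros(3))
opt = optax.adam(1e-2)
opt_state = opt.init(theta)
key = jax.random.PRNGKey(0)
for _ in range(500):
    key, sub = jax.random.split(key)
    u_batch = jax.random.normal(sub, (256, 3))
    grads = jax.grad(kl_objective)(theta, u_batch)  # unbiased gradient, equation (8)
    updates, opt_state = opt.update(grads, opt_state)
    theta = optax.apply_updates(theta, updates)
```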

2.4. Estimation of observables and free energy

After we train a flow model $q_\theta(x)$ to approximate the target distribution p(x), samples from $q_\theta(x)$ can be used in a Monte-Carlo fashion to estimate expectations ${\mathbb{E}}_{p(x)} \left [ {A(x)} \right]$. Since $q_\theta(x)$ does not exactly equal p(x), direct Monte-Carlo estimates with samples from $q_\theta(x)$ will be biased; to avoid that, the probability density of samples can be used to remove the bias. Assuming $q_\theta(x) \gt 0$ whenever $p(x) \gt 0$, the simplest way to remove bias is with importance weighting: using i.i.d. samples $(x^{(1)}, \ldots, x^{(S)})$ from $q_\theta(x)$, we can form the asymptotically unbiased estimate

$\mathbb{E}_{p(x)}\left[A(x)\right] \approx \frac{\sum_{s=1}^{S} w^{(s)} A(x^{(s)})}{\sum_{s=1}^{S} w^{(s)}}, \quad \text{where} \quad w^{(s)} = \frac{\exp\left[g(x^{(s)})\right]}{q_\theta(x^{(s)})}$    (9)

Similarly, the normalizing constant $Z = \int \mathrm{d}x \exp[g(x)]$ can be unbiasedly estimated as

$Z \approx \frac{1}{S} \sum_{s=1}^{S} w^{(s)} = \frac{1}{S} \sum_{s=1}^{S} \frac{\exp\left[g(x^{(s)})\right]}{q_\theta(x^{(s)})}$    (10)

A disadvantage of importance weighting is that the resulting estimator has lower statistical performance than a standard Monte-Carlo average over S independent samples from the target. To quantify this reduction in statistical performance, the effective sample size (ESS) is often used, estimated by [16]

$\mathrm{ESS} = \frac{\left(\sum_{s=1}^{S} w^{(s)}\right)^{2}}{\sum_{s=1}^{S}\left(w^{(s)}\right)^{2}}$    (11)

The ESS is a number between 1 and S (often reported as a percentage of S), and roughly measures how many independent samples would result in a Monte-Carlo estimator of the same statistical performance.
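The estimators in equations (9)-(11) are straightforward to implement. The sketch below demonstrates them in a toy setting where a broad Gaussian proposal stands in for the trained flow; all names are of our choosing.

```python
import jax
import jax.numpy as jnp

def g(x):
    # Toy unnormalized log-target: a standard normal centred at 1, so the
    # true normalizer is Z = (2*pi)**1.5 and E[x_1] = 1.
    return -0.5 * jnp.sum((x - 1.0) ** 2, axis=-1)

def log_q(x):
    # Toy proposal density (standing in for the trained flow): N(0, 2^2 I).
    return jnp.sum(jax.scipy.stats.norm.logpdf(x, scale=2.0), axis=-1)

S = 100_000
x = 2.0 * jax.random.normal(jax.random.PRNGKey(0), (S, 3))   # i.i.d. samples
log_w = g(x) - log_q(x)                       # log importance weights
w = jnp.exp(log_w - jnp.max(log_w))           # rescaled for numerical stability
A = x[:, 0]                                   # example observable A(x) = x_1
mean_A = jnp.sum(w * A) / jnp.sum(w)                        # equation (9)
log_Z = jax.scipy.special.logsumexp(log_w) - jnp.log(S)     # equation (10)
ess = jnp.sum(w) ** 2 / jnp.sum(w ** 2)                     # equation (11)
print(mean_A, jnp.exp(log_Z), 100 * ess / S)  # ~1.0, ~(2*pi)**1.5, ESS in %
```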

In statistical mechanics, the estimation method in equation (10) is widely known as free energy perturbation (FEP) [17]; when combined with a configuration-space map, it is referred to as targeted FEP [14], and in combination with a learned map or a flow model, as learned FEP (LFEP) [6, 7].

In general, the simpler the estimation method, the more closely the flow model must approximate the target for the statistical efficiency of the estimator to remain within practical constraints. More sophisticated, but also computationally more expensive, multi-step estimation methods are available and can be utilized as learned estimators [6, 18–21] if higher accuracy is desirable.

3. Method

This section extends the application of normalizing flows from the canonical to the isobaric-isothermal ensemble with triclinic simulation boxes. The presentation is in two parts: section 3.1 shows how to construct a target distribution for the isobaric-isothermal ensemble, and section 3.2 proposes a flow architecture suitable for this target.

3.1. Isobaric-isothermal ensemble for atomistic systems

In the isobaric-isothermal ensemble, the external pressure P is held fixed while the shape of the simulation box, and hence its volume V, fluctuates. Therefore, the shape parameters of the simulation box should be treated as additional degrees of freedom that co-vary with the configuration x. In what follows, we will assume that the shape of the simulation box is controlled by a set of parameters h and we will formulate a joint probability density $p(x, h)$.

A standard result in statistical mechanics is that the equilibrium distribution of the (x, V) system is given by replacing the potential energy U with the enthalpy U + PV in the Boltzmann distribution in equation (1), which gives:

$p(x, V) = \frac{1}{Z_P} \exp\left[-\beta\left(U(x, V) + PV\right)\right]$    (12)

where $Z_P = \int \mathrm{d}x \,\mathrm{d}V \exp[-\beta (U(x, V) + PV)] = \int \mathrm{d}V \exp[-\beta PV]\, Z_V$ is the partition function of the isobaric-isothermal ensemble. We note that the potential energy generally depends on the box shape, hence we include an explicit dependence on V. Analogously to the NVT case, the partition function $Z_P$ is related to the Gibbs free energy G via

$G = -k_\mathrm{B} T \ln Z_P$    (13)

The above distribution is directly applicable to simulation boxes whose only shape parameter is V, for example a simulation box with isotropic scaling. However, it is not directly applicable to simulation boxes with more than one shape parameter without further assumptions about how the additional shape parameters are distributed.

In this work, we are interested in the case of a triclinic simulation box in D dimensions. The full flexibility to adjust the box shape is, for example, important when studying solids, to ensure that they can relax to their equilibrium shape [12, 13]. To describe the triclinic box shape, we let h be a D×D matrix with positive determinant that transforms the hypercube $[0, 1]^D$ into the simulation box. For D = 3, such a simulation box has nine shape parameters controlling the following nine degrees of freedom: scaling each of the three side lengths, varying each of the three internal angles, and three rigid rotations (see figure 1). Assuming that all simulation boxes with the same volume V are a priori equally likely, Martyna et al [22] derive the following distribution for the (x, h) system (equation 2.15 in their paper):

$p(x, h) \propto \frac{1}{V^{D-1}}\, \exp\left[-\beta\left(U(x, h) + PV\right)\right]$    (14)

where $V = \det{h}$ is the volume of the simulation box.


Figure 1. Illustration of the box transformation. The cubic reference box $h_0$ (left) is transformed into a general triclinic box h (right) by multiplication with an upper-triangular matrix M with positive diagonal elements.


Since the systems we consider are globally rotationally symmetric, it is desirable to eliminate the $D(D-1) / 2$ rigid rotations from the shape parameters h. This can be done by restricting h to be upper-triangular ($h_{ij} = 0$ for i > j) and requiring its diagonal elements hii to be positive. In this case, Martyna et al [22] derive the following distribution (equation 2.35 in their paper):

$p(x, h) \propto \frac{\prod_{i=1}^{D} h_{ii}^{D-i}}{V^{D-1}}\, \exp\left[-\beta\left(U(x, h) + PV\right)\right]$    (15)

where $V = \prod_{i = 1}^D h_{ii}$ is the volume of the simulation box.

A practical issue with the above distribution is that the range of atom positions $x = (x_1, \ldots, x_N)$ is not fixed, but varies with the shape parameters h. To avoid this complication, we define a fixed reference box with parameters $h_0$ (where we choose $h_0$ to be a diagonal D×D matrix with positive diagonal elements) and work with reference coordinates $s = (s_1, \ldots, s_N)$ that are defined with respect to the reference box. Concretely, we make the following variable changes:

  • (i) $h = M h_0$, where M is a D×D upper-triangular matrix with positive diagonal elements, and
  • (ii) $x_n = M s_n$ for $n = 1, \ldots, N$.

In addition, we log-transform the diagonal elements of M, so that we work with unconstrained shape parameters while ensuring that the diagonal elements $M_{ii}$ are always positive. That is, we define a D×D upper-triangular matrix m with unconstrained upper-triangular elements, and make the variable change:

  • (iii) $M_{ii} = \exp(m_{ii})$ for $i = 1, \ldots, D$, and
  • (iv) $M_{ij} = m_{ij}$ for i ≠ j.

The above transformations change the variables of interest from (x, h) to (s, m). As shown in the supplementary material, this change of variables leads to the following probability density for (s, m), which is the target distribution we use in our experiments:

$p(s, m) \propto \exp\left[-\beta\left(U(s, m) + PV\right) + (N+1-D)\ln\frac{V}{V_0} + \sum_{i=1}^{D}(D+1-i)\, m_{ii}\right]$    (16)

where $V = \prod_{i = 1}^D \exp(m_{ii})h_{0,i}$ is the volume of the simulation box and $V_0 = \prod_{i = 1}^D h_{0,i}$ is the volume of the reference box. The term $(N+1-D) \ln{V_0}$ is a constant with respect to (s, m), so it can be absorbed into the normalizing constant.
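As a concrete illustration, the sketch below implements the variable changes (i)-(iv) and evaluates the log-density of equation (16) as written above. It is a sketch under our reading of the equation (the Jacobian coefficients $(D+1-i)$ follow our reconstruction and should be checked against the supplementary material), and `potential` is a placeholder for the physical energy U(x, h).

```python
import jax.numpy as jnp

D = 3  # spatial dimensions

def m_to_M(m_flat):
    # Variable changes (iii) and (iv): unconstrained parameters -> upper-
    # triangular M with positive (exponentiated) diagonal elements.
    iu = jnp.triu_indices(D)
    M = jnp.zeros((D, D)).at[iu].set(m_flat)
    return M.at[jnp.diag_indices(D)].set(jnp.exp(jnp.diag(M)))

def target_log_density(s, m_flat, h0_diag, beta, P, potential):
    M = m_to_M(m_flat)
    h = M * h0_diag[None, :]         # variable change (i): h = M h0, h0 diagonal
    x = s @ M.T                      # variable change (ii): x_n = M s_n
    V = jnp.prod(jnp.diag(h))        # volume of the simulation box
    V0 = jnp.prod(h0_diag)           # volume of the reference box
    # Jacobian term of equation (16), coefficients (D+1-i) as reconstructed.
    jac_term = jnp.sum((D - jnp.arange(D)) * jnp.log(jnp.diag(M)))
    N = s.shape[0]
    return (-beta * (potential(x, h) + P * V)
            + (N + 1 - D) * jnp.log(V / V0) + jac_term)

# Hypothetical usage with a dummy potential standing in for U(x, h):
# logp = target_log_density(s, m_flat, h0_diag, beta=1.0, P=1.0,
#                           potential=lambda x, h: 0.0)
```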

3.2. Model architecture

In this section, we describe the architecture of a model $q_\theta(s, m)$ that is suitable for approximating the target in equation (16). Our primary design choice is to decompose the model as:

$q_\theta(s, m) = q_{\theta_1}(m)\, q_{\theta_2}(s \,|\, m)$    (17)

where $q_{\theta_1}(m)$ is a flow model over the shape parameters, and $q_{\theta_2}(s \,|\, m)$ is a shape-conditional flow model over the configuration. The trainable parameters of the full model are $\theta = (\theta_1, \theta_2)$, where $\theta_1$ are the trainable parameters of $q_{\theta_1}(m)$ and $\theta_2$ are the trainable parameters of $q_{\theta_2}(s \,|\, m)$. In the following paragraphs, we describe each model in detail. The overall model architecture is illustrated in figure 2.


Figure 2. Left: high-level architecture of the composite flow model. Blue indicates invertible pathways, red indicates conditional (non-invertible) pathways. The flow over the configuration s also includes other invertible layers (circular shifts) interleaved with the coupling layers, not shown here for simplicity (these layers are not affected by $c_m$). Right: architecture of a single coupling layer, showing how it is conditioned on the encoded shape $c_m$.


3.2.1. Flow over shape parameters

As previously described in section 3.1, m consists of $D(D+1)/2$ unconstrained shape parameters. For concreteness, we will focus on D = 3 (which is the setting of our experiments), in which case $m\in\mathbb{R}^6$. Since m is unconstrained, any standard flow can be used to model its distribution. In our experiments, we use a flow model with base distribution $q_m(u_m)$ and transformation $f_m(u_m)$ defined as follows.

  • The base distribution is a standard six-dimensional Gaussian with zero mean and unit covariance: $q_m(u_m) = \mathcal{N}(u_m; 0, I)$.
  • The flow transformation is affine: $f_m(u_m) = Wu_m + b$, where W is a $6\times 6$ upper-triangular matrix with positive diagonal elements and b is a six-dimensional vector. We make the diagonal elements of W positive by parameterizing them as the softplus of unconstrained parameters.

The trainable parameters $\theta_1$ of the above flow are: the bias b, the (pre-softplus) diagonal of W, and the above-diagonal elements of W.

This flow parameterizes a six-dimensional Gaussian distribution with mean b and covariance $WW^\top$, that is, $q_{{\theta_1}}(m) = \mathcal{N}(m; b, WW^\top)$. This Gaussian parameterization is complete, in the sense that any Gaussian distribution can be written this way. We found that this Gaussian parameterization is sufficient for our systems of interest (cubic and hexagonal ice): in section 4.1, we show empirically that the true distribution of shape parameters is statistically indistinguishable from a Gaussian. However, it is straightforward to use more flexible flow models if Gaussianity does not hold.
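A minimal sketch of this Gaussian-affine shape flow follows. Parameter names (raw_diag, off_diag, b) are ours; the softplus parameterization of the diagonal matches the description above.

```python
import jax
import jax.numpy as jnp

DIM = 6  # D(D+1)/2 shape parameters for D = 3

def build_W(raw_diag, off_diag):
    # Upper-triangular W with softplus-positive diagonal, as in section 3.2.1.
    W = jnp.zeros((DIM, DIM)).at[jnp.triu_indices(DIM, k=1)].set(off_diag)
    return W.at[jnp.diag_indices(DIM)].set(jax.nn.softplus(raw_diag))

def sample_m(key, raw_diag, off_diag, b):
    u = jnp.asarray(jax.random.normal(key, (DIM,)))   # base sample u_m ~ N(0, I)
    return build_W(raw_diag, off_diag) @ u + b        # m = W u + b ~ N(b, W W^T)

def log_prob_m(m, raw_diag, off_diag, b):
    W = build_W(raw_diag, off_diag)
    u = jax.scipy.linalg.solve_triangular(W, m - b, lower=False)  # invert flow
    base = jnp.sum(jax.scipy.stats.norm.logpdf(u))
    return base - jnp.sum(jnp.log(jnp.diag(W)))       # change of variables, eq. (4)

# Hypothetical parameters: pre-softplus diagonal, 15 off-diagonal entries, bias.
params = (jnp.zeros(DIM), jnp.zeros(DIM * (DIM - 1) // 2), jnp.zeros(DIM))
m = sample_m(jax.random.PRNGKey(0), *params)
print(log_prob_m(m, *params))
```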

3.2.2. Shape-conditional flow over configuration

The atom configuration s consists of the ND atom coordinates with respect to the orthorhombic reference box $h_0$. Conditional on the shape parameters, the atom configuration is distributed as $p(s \,|\, m) \propto \exp[-\beta U(x, h)]$, which is the Boltzmann distribution of a canonical ensemble confined in the reference box. Therefore, the configuration model $q_{\theta_2}(s \,|\, m)$ is tasked with approximating a family of canonical ensembles, indexed by the shape parameters m.

A general strategy for designing a conditional flow is to start from an unconditional one, $s = f_s(u_s)$, and inject the conditioning information m as an additional input, such that $s = f_s(u_s; m)$. The transformation $f_s(u_s; m)$ must be invertible with respect to $u_s$, but not with respect to m. This is the approach we follow in this work.

We build upon the work in [11], which proposed a flow model for atomic solids at fixed volume. This flow model consists of a number of invertible coupling layers, each of which transforms part of its input (see figure 2). Concretely, let $z = (z_{n,i})^{n = 1:N}_{i = 1:D}$ denote the input to a coupling layer and $z^{\prime} = (z^{\prime}_{n,i})^{n = 1:N}_{i = 1:D}$ denote the output, where n ranges over particles and i ranges over spatial dimensions. The coupling layer works as follows.

  • First, the input is split into two parts along the i index, $z_A$ and $z_B$. The first part stays fixed, whereas the second gets transformed by an invertible transformation $G(z_B; \psi)$ parameterized by ψ. For simplicity, let us assume that the first d spatial dimensions stay fixed, so $z_A = (z_{n,i})^{n = 1:N}_{i = 1:d}$.
  • The parameters ψ are computed as a function of $z_A$, $\psi = C(z_A)$, where C is referred to as the conditioner. The first step of C is to apply a circular encoding to each particle $(z_{n,i})_{i = 1:d}$ (see appendix A of [11] for details). The second step is to apply a permutation-equivariant transformer architecture [23] to the circular encodings, which outputs the parameters ψ.

This flow architecture results in a distribution over the configuration s that is invariant to particle permutations.

To make the above architecture shape-conditional, we first encode the shape parameters m into a code $c_m$ using a multi-layer perceptron, as shown in figure 2. Then, we make the code $c_m$ an additional input to every conditioner in the flow such that, for every coupling layer, $\psi = C(z_A, c_m)$. We achieve this by concatenating $c_m$ with the circular encoding of each particle $(z_{n,i})_{i = 1:d}$ prior to feeding the latter to the permutation-equivariant transformer. This modification preserves the permutation equivariance, and results in a conditional distribution $q_{\theta_2}(s \,|\, m)$ that is invariant to particle permutations for every m.

The trainable parameters $\theta_2$ of this model are: (a) the weights and biases of the multi-layer perceptron that encodes m into $c_m$; (b) the trainable parameters of the permutation-equivariant transformers that parameterize the conditioners C (there is one such transformer for each coupling layer, each with its own trainable parameters); and (c) the trainable circular shifts that are interleaved between coupling layers (these were omitted from this section for simplicity as they are not affected by conditioning on $c_m$, but their details can be found in appendix A of [11]).
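The sketch below illustrates the shape-conditioning mechanism in a heavily simplified form: a per-particle MLP (permutation-equivariant because it is applied to each particle independently) stands in for the transformer of [11, 23], and a plain circular shift stands in for the invertible transform G. All shapes and names are illustrative, not the architecture used in our experiments.

```python
import jax
import jax.numpy as jnp

def circular_encode(z, num_freqs=2):
    # Periodic encoding of fractional coordinates in [0, 1), in the spirit
    # of appendix A of [11].
    k = jnp.arange(1, num_freqs + 1)
    ang = 2 * jnp.pi * z[..., None] * k
    enc = jnp.concatenate([jnp.sin(ang), jnp.cos(ang)], axis=-1)
    return enc.reshape(*z.shape[:-1], -1)

def conditioner_mlp(params, feats):
    # Toy stand-in for the permutation-equivariant transformer: the same MLP
    # acts on every particle independently, so permutations commute with it.
    (W1, b1), (W2, b2) = params
    return jnp.tanh(feats @ W1 + b1) @ W2 + b2

def coupling_layer(z, c_m, params, d=1):
    z_a, z_b = z[:, :d], z[:, d:]            # split along the spatial index
    feats = circular_encode(z_a)             # circular encoding of z_A
    c_rep = jnp.broadcast_to(c_m, (z.shape[0], c_m.shape[0]))
    # Concatenate the shape code c_m with each particle's encoding: psi = C(z_A, c_m).
    psi = conditioner_mlp(params, jnp.concatenate([feats, c_rep], axis=-1))
    # Toy invertible transform G: a circular shift of z_B (the real model
    # uses a more expressive invertible map).
    z_b_new = (z_b + psi) % 1.0
    return jnp.concatenate([z_a, z_b_new], axis=-1)

def init_mlp(key, in_dim, hidden, out_dim):
    k1, k2 = jax.random.split(key)
    return ((jax.random.normal(k1, (in_dim, hidden)) / jnp.sqrt(in_dim), jnp.zeros(hidden)),
            (jax.random.normal(k2, (hidden, out_dim)) / jnp.sqrt(hidden), jnp.zeros(out_dim)))

key = jax.random.PRNGKey(0)
N, D, code_dim = 8, 3, 4
z = jax.random.uniform(key, (N, D))          # fractional particle coordinates
c_m = jnp.ones(code_dim)                     # encoded shape parameters (stand-in)
params = init_mlp(key, in_dim=4 + code_dim, hidden=16, out_dim=D - 1)
print(coupling_layer(z, c_m, params).shape)  # (8, 3)
```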

4. Results

We assess the performance of the NPT-flow model proposed in section 3 by comparing selected observables and Gibbs free energy estimates to results obtained with MD simulations. We benchmark our trained models on ice, using a coarse-grained potential called monatomic water [24], which treats molecules as point particles interacting via two-body and three-body terms. More specifically, we trained separate models for varying numbers of particles (64, 216 and 512) in the hexagonal (Ih) and cubic (Ic) phases of ice, totaling six experiments with a minimum of ten random seeds each. The goal of the flow model is to draw samples at ambient pressure (1 atm) and a temperature of 200 K. All models were initialized with a perfect lattice at a density of about 1.004 g cm−3, as in [11], which leads to a pressure of approximately $1550\,\text{atm}$ in an NVT simulation.

We ran MD in the NPT ensemble using the simulation package LAMMPS [25], and employed a Langevin thermostat to control the temperature and a fully flexible barostat to control the pressure [12, 22, 26]. All further simulation settings are identical to the ones used in [11] and we refer to appendices C–D in the Supplemental Data of that work for further details.

4.1. Distributions of shape parameters

We first justify our use of a Gaussian-affine flow by verifying that for our systems of interest the true distribution of shape parameters is statistically indistinguishable from a six-dimensional Gaussian. To this end, we compare histograms of the six shape parameters sampled from the flow model with their counterparts sampled during MD. The results for the 512-particle cubic system are shown in figure 3 and demonstrate excellent agreement of model and MD. Results for hexagonal ice show the same quality of agreement (see supplementary material) and results for all smaller systems are omitted for the sake of brevity, as training with fewer particles is in general much simpler due to the reduced number of degrees of freedom.


Figure 3. Distributions of shape parameters. The panels show the distributions of diagonal (left) and off-diagonal (right) elements of the shape matrix, estimated from samples from the trained model (model) and NPT simulations (MD) for the cubic 512-particle system.


To further support our assumption that a Gaussian with full covariance matrix is sufficient to model the shape distribution accurately for ice, we estimated the maximum mean discrepancy (MMD) [28] between the MD distribution of shape parameters and a full-covariance Gaussian fit. We observed that MMD estimates are close to zero (zero MMD implies equality of distributions) and that they are distributed similarly to reference MMD estimates between two identical six-dimensional Gaussians (see figure 4).
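For reference, an unbiased estimator of the squared MMD with a squared exponential kernel can be sketched as follows; the bandwidth and sample sizes are our choices, not the settings used for figure 4.

```python
import jax
import jax.numpy as jnp

def sq_exp_kernel(x, y, bandwidth=1.0):
    # Squared exponential kernel matrix between two sample sets.
    d2 = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-d2 / (2 * bandwidth**2))

def mmd_squared(x, y):
    # Unbiased MMD^2: off-diagonal means of K_xx and K_yy, full mean of K_xy.
    kxx, kyy, kxy = sq_exp_kernel(x, x), sq_exp_kernel(y, y), sq_exp_kernel(x, y)
    n, m = x.shape[0], y.shape[0]
    t_xx = (jnp.sum(kxx) - jnp.trace(kxx)) / (n * (n - 1))
    t_yy = (jnp.sum(kyy) - jnp.trace(kyy)) / (m * (m - 1))
    return t_xx + t_yy - 2.0 * jnp.mean(kxy)

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(k1, (2000, 6))   # stand-in for MD shape samples
b = jax.random.normal(k2, (2000, 6))   # stand-in for the Gaussian fit samples
print(mmd_squared(a, b))               # near zero for identical distributions
```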


Figure 4. Estimated squared maximum mean discrepancy (using a squared exponential kernel) between the MD distribution of shape parameters and a six-dimensional full-covariance Gaussian fit, for cubic (left) and hexagonal (right) ice. The squared MMD was estimated using 10 000 samples, and the experiment was repeated 100 times (resulting in 100 MMD estimates). For reference, MMD estimates between the six-dimensional Gaussian and itself are also shown. Boxes range from the first to the third quartile of data, with median at the red line; whiskers extend from the corresponding quartile to the farthest datapoint within 1.5 times the box size. All other points are depicted separately as outliers.


4.2. Distributions of energy and pressure

We next compare the distributions of energies and instantaneous pressure values for both cubic and hexagonal 512-particle systems in figure 5 and observe again good overall agreement between model and MD for both quantities. Table 1 shows the average energy and pressure for various system sizes (64, 216 and 512 particles) for both cubic and hexagonal ice, as estimated (a) directly by model samples (biased), (b) by re-weighted model samples (unbiased), and (c) by MD. We also show the ESS of importance sampling in each case as a percentage of the actual sample size. The results suggest that the bias due to the model is generally small, and that both observables are accurately estimated. We can further see that importance sampling is able to remove small biases, although at the expense of increasing the error bars, especially for the largest system size (512 particles). Finally, we see that the ESS is reasonably high for 64 and 216 particles, but becomes small for 512 particles, leading to the pronounced increase in estimation uncertainty mentioned previously.


Figure 5. A comparison of the energy histograms (left) and instantaneous pressure histograms (right) for the trained 512-particle cubic (top) and hexagonal (bottom) models and MD.


Table 1. Biased (BE) and unbiased importance-sampling (IS) estimates of mean energy and pressure for system sizes of N = 64, 216 and 512 particles, along with estimates of the ESS expressed as a percentage of the actual sample size. MD estimates were computed with 100 seeds and 10 000 samples each. Model estimates comprise 16 seeds for each phase for 64 particles, 16 and 14 seeds for cubic and hexagonal, respectively, for 216 particles, and 11 seeds for 512 particles, with 512 000 samples per seed in all cases.

            Cubic                                    Hexagonal
            Energy (kcal mol−1)   Pressure (atm)     Energy (kcal mol−1)   Pressure (atm)

N = 64      ESS = 53.7%                              ESS = 60.3%
  BE        −750.61 ± 0.02        0.88 ± 1.10        −750.50 ± 0.02        0.28 ± 1.98
  IS        −750.456 ± 0.005      1.15 ± 0.92        −750.351 ± 0.005      0.29 ± 1.98
  MD        −750.451 ± 0.008      0.85 ± 0.63        −750.352 ± 0.009      1.30 ± 0.73

N = 216     ESS = 6.8%                               ESS = 13.6%
  BE        −2535.08 ± 0.05       1.58 ± 1.26        −2534.69 ± 0.04       1.03 ± 1.46
  IS        −2534.50 ± 0.11       2.43 ± 2.67        −2534.20 ± 0.01       −0.01 ± 0.80
  MD        −2534.54 ± 0.02       0.93 ± 0.39        −2534.22 ± 0.02       0.80 ± 0.42

N = 512     ESS = 0.17%                              ESS = 0.15%
  BE        −6009.56 ± 0.17       1.57 ± 0.91        −6009.07 ± 0.11       0.93 ± 1.14
  IS        −6008.84 ± 0.48       −1.63 ± 16.1       −6007.6 ± 1.1         −23.6 ± 38.8
  MD        −6008.776 ± 0.027     0.98 ± 0.28        −6008.011 ± 0.024     0.93 ± 0.24

4.3. Radial distribution functions

To further assess the quality of the trained model, we computed the radial distribution function and compared the results for both phases to MD in figure 6. We find the NPT-flow results to be almost indistinguishable from MD under this metric, suggesting that the model captures pairwise structural properties correctly.
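For completeness, a minimal sketch of a radial distribution function estimate is shown below. It assumes an orthorhombic box with the standard minimum-image convention (the fully triclinic case requires a more general minimum-image search), and all names and bin settings are ours.

```python
import jax.numpy as jnp

def rdf(x, box, num_bins=200):
    # x: (N, 3) particle positions; box: (3,) orthorhombic side lengths.
    N = x.shape[0]
    r_max = jnp.min(box) / 2
    diff = x[:, None, :] - x[None, :, :]
    diff -= box * jnp.round(diff / box)          # minimum-image convention
    r = jnp.sqrt(jnp.sum(diff**2, axis=-1))
    iu = jnp.triu_indices(N, k=1)                # unordered particle pairs
    hist, edges = jnp.histogram(r[iu], bins=num_bins, range=(0.0, r_max))
    shell = 4.0 / 3.0 * jnp.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = N / jnp.prod(box)
    # Normalize pair counts by the ideal-gas expectation per shell.
    return edges[:-1], hist / (shell * density * N / 2)
```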


Figure 6. Radial distribution functions for the 512-particle cubic system (left) and hexagonal system (right). Results obtained with the respective models are shown as dashed lines and the MD results are shown as solid lines.


4.4. Gibbs free energies

We finally compare Gibbs free energy estimates obtained with our trained models with MD estimates for all three system sizes. For the MD baseline estimates, we first borrow the Helmholtz free energy estimates published in [11] and promote these to Gibbs free energy estimates following the method proposed by Cheng and Ceriotti [29]. The Gibbs free energy is related to the Helmholtz free energy by

$G = F(h_\text{ref}) + P V_\text{ref} + k_\mathrm{B} T \ln p(h_\text{ref} \,|\, P, T)$    (18)

where $h_\text{ref}$ defines the fixed simulation cell for which we know the Helmholtz free energy estimate $F(h_\text{ref})$, $V_\text{ref}$ is its volume, and $\ln p (h_\text{ref} \,|\, P, T)$ denotes the log-density of the reference box at fixed pressure and temperature. To obtain an estimate of this quantity, we first fit a six-dimensional Gaussian to the shape data sampled in an NPT simulation, and then evaluate the log-density of the fit at the reference box analytically.
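A sketch of this baseline computation, assuming the form of equation (18) as reconstructed above (and omitting parameterization-dependent constants), is:

```python
import jax
import jax.numpy as jnp

def gibbs_from_helmholtz(F_ref, P, V_ref, m_ref, m_samples, kT):
    # Fit a full-covariance Gaussian to the NPT-sampled shape parameters.
    mu = jnp.mean(m_samples, axis=0)
    cov = jnp.cov(m_samples.T)
    # Evaluate the fitted log-density at the reference box analytically.
    log_p_ref = jax.scipy.stats.multivariate_normal.logpdf(m_ref, mu, cov)
    # Equation (18), up to constants; all argument names are ours.
    return F_ref + P * V_ref + kT * log_p_ref
```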

Table 2 compares the Gibbs free energy differences predicted by our models, using the LFEP estimator in equation (10), with the MD baseline estimates computed using equation (18). We find excellent agreement between both methods across all system sizes, verifying that our NPT-flow can accurately resolve even the small free energy difference between cubic and hexagonal ice.

Table 2. Gibbs free energy differences between cubic and hexagonal ice, $\Delta G = G_\text{cubic} - G_\text{hex}$, at ambient pressure and a temperature of 200 K, reported in units of J mol−1.

N      LFEP         Baseline
64     51.76(3)     51.64(27)
216    15.61(21)    15.56(22)
512    6.70(32)     6.51(57)

5. Discussion

To summarize our work, we have proposed a normalizing-flow model, which we call NPT-flow, that can be optimized to sample from the isobaric-isothermal ensemble. We have further presented experimental evidence to show that the NPT-flow can sample ice for system sizes of up to 512 particles with high accuracy without requiring ground-truth samples to train. All structural and energetic metrics considered in this work agree well with the MD baseline results, even without applying an explicit re-weighting step to unbias estimates. We further demonstrated that the NPT-flow can capture the relatively small Gibbs free energy differences between cubic and hexagonal ice accurately when used as a targeted free energy estimator.

We find the agreement of the NPT-flow with MD to be of similar quality to the NVT results presented in [11], where a particle-only flow model was used to generate particle coordinates. This suggests that simultaneously generating the additional six shape parameters of the box distribution does not complicate the learning task appreciably. Furthermore, the required hardware, training times and quality of results are all comparable with what was reported previously for the canonical ensemble [11] and are summarized in the supplementary material. An interesting direction for future work would be to extend the NPT-flow to rigid bodies, such as explicit water, by combining our model with the model proposed by Köhler et al [30].

There are several limitations to our approach, one being that we currently assume a Gaussian-affine box distribution. While this type of model is fully sufficient for the systems under consideration, it is conceivable that other problems may require a more flexible description. Making this part more general, however, is a straightforward extension, as we can replace the affine flow with any standard non-affine one from the flow literature [4, 15].

A more fundamental limitation is the restriction to relatively small system sizes. In particular, for the hexagonal 512-particle system, we obtain an ESS of only about 0.15% (expressed as a percentage of the actual sample size), compared to an ESS of around 60% for 64 particles (see table 1). This can be overcome by drawing enough samples, which is computationally efficient with a flow model, as sampling can be done in parallel. However, the rapid decay in the statistical efficiency of naive importance sampling as dimensionality increases makes scaling the approach to larger systems challenging. One option for pushing the dimensionality further would be to consider more sophisticated estimation methods than naive importance sampling, in particular multi-stage methods capable of working with flow models [6, 18–21]. Another avenue would be to improve the architecture and training methods of flow models further, but this would require further innovations in the field.

Finally, although not the focus of this paper, it is worth commenting on the broader comparison between flow-based sampling methods and MD. Apart from the fact that MD models the temporal evolution of a system, which makes it applicable to a wider class of problems, a core difference between the two approaches is that flow-based methods can produce independent samples in parallel, whereas MD is a sequential method based on small integration steps, leading to correlated samples. Therefore, flow-based methods are in principle not constrained by the existence of energy barriers between metastable states, as for example discussed by Noé et al [3]. Moreover, flow-based sampling is easily parallelizable and therefore well-suited to modern distributed hardware. The main downside of flow-based methods is the need for training, which as discussed becomes increasingly challenging for larger systems. Therefore, a promising avenue for flow-based methods is amortization, where a single flow model may be trained to approximate a range of different states, parameterized, for example, by system size, temperature, pressure or other quantities of interest, so that the cost of training is amortized across many evaluations.

Acknowledgment

We would like to thank our colleagues Sébastien Racanière, Andrew J Ballard, Alexander Pritzel, Danilo Jimenez Rezende, Charles Blundell, Théophane Weber, John Jumper, Jonas Adler, Alexander G D G Matthews and Arnaud Doucet for their help and for stimulating discussions.

Data availability statement

The data cannot be made publicly available upon publication because the cost of preparing, depositing and hosting the data would be prohibitive within the terms of this research project. The data that support the findings of this study are available upon reasonable request from the authors.


Supplementary data (0.7 MB PDF)