Anharmonic phonons with Gaussian processes

Keerati Keeratikarn, Department of Physics, Imperial College London, Exhibition Road, London SW7 2AZ, UK; Jarvist Moore Frost, Department of Chemistry, Imperial College London, UK, and Department of Physics, Imperial College London, Exhibition Road, London SW7 2AZ, UK

(June 6, 2024)

Abstract

We provide a method for calculating anharmonic lattice dynamics by building a surrogate model based on Gaussian processes (GPs). Because of the underlying Gaussian form of a GP, the model is infinitely differentiable. This allows us to train the model directly on forces (the derivatives of the potential energy surface, PES), reducing the number of evaluations required for a given accuracy. We extend this differentiation to calculate second- and third-order force constants directly, using automatic differentiation (AD). For the five model materials we study, the force constants are in close agreement with a standard finite-displacement approach. Our method appears to scale linearly with the number of atoms when predicting both second- and third-order (anharmonic) force constants.

Potential Energy Surface, Anharmonic Force Constants, Gaussian Processes

Anharmonicity; Gaussian processes; Phonon

I Introduction

Finite-temperature properties of matter require a model for thermal motion. In crystals it is natural to describe thermal motion as collective excitations about the equilibrium structure. The standard approach is the finite displacement method (FDM), based on a Taylor expansion of the potential energy surface (PES) around the equilibrium. The second-order force constants provide the harmonic approximation to the PES; anharmonic contributions (required for a finite thermal conductivity) require higher-order force constants. FDM-based calculations scale poorly both with the size of the system and with the order of the force constants.

The potential energy surface of a solid-state material directly determines its mechanical response (Young's modulus, yield strength), and also provides the environment in which the electronic structure exists. It is therefore key both to the electron-phonon coupling, which leads to finite electrical conductivity, and to the phonon-phonon scattering, which leads to finite thermal conductivity.

For small perturbations around the equilibrium structure, the potential energy surface can be Taylor expanded. At equilibrium the first-order (linear) term of this expansion is zero (there are no forces), so the first non-zero term is the second-order, harmonic, contribution. Truncating the expansion at this point (the harmonic approximation) leads to the mass-weighted 'dynamical matrix' of force constants. Diagonalising this matrix gives the normal modes (its eigenvectors) and the squared vibrational frequencies (its eigenvalues), which describe the phonon properties of a material. This change of basis is often used as a natural way to describe the response theory of solid-state materials. Phonon modes can be measured directly, as a function of crystal momentum, with neutron scattering experiments, and the $\Gamma$-point modes give rise to the infrared absorption of a material (through the change in dipole moment along a given mode) and to its Raman response (through the change in polarisability along a given mode).

In the harmonic approximation these normal modes do not interact, and therefore have an infinite lifetime; a harmonic picture of matter thus directly predicts an infinite thermal conductivity. Anharmonic corrections to this picture can be added via many-body perturbation theory, in which second quantisation is used to treat these vibrational states as quasi-particles: phonons, which are bosons.

The standard approach to calculating phonons is the finite displacement method. For harmonic phonons, each atom is displaced by a small amount separately in $\{x,y,z\}$; built from total energies alone, the resulting $3N\times 3N$ force-constant matrix requires of order $3^{2}N^{2}$ evaluations of the total energy. (Typically these energies and forces come from an electronic structure method, such as density functional theory.)

To include third-order force constants in a finite displacement method, one must consider each atom moving in $\{x,y,z\}$ combined with displacements of the other atoms in $\{x,y,z\}$, requiring of order $3^{3}N^{3}$ separate evaluations of the total energy.

This prohibitive scaling, combined with the at best $O(N^{3})$ scaling of electronic structure methods, severely limits the size of systems for which we can predict the thermal conductivity. This is a major limitation on humanity's ability to design materials with specific thermal characteristics. A particular technical application is the design of thermoelectrics, where one wants to minimise thermal conductivity while maximising electrical conductivity, and for which state-of-the-art materials have extremely complex structures in order to maximise phonon scattering.

From an information-theoretic point of view, this restriction is curious. There is no more information embedded in the third-order derivatives than in the second. In fact, the full potential energy surface is present in every calculation, and in electronic structure methods the cost of evaluating the full, anharmonic, system has already been paid through the $O(N^{3})$ scaling of the method.

As an alternative, one could use a more sophisticated surrogate model than fitting the individual terms of a Taylor expansion. In this work we use Gaussian processes as a machine-learning surrogate for the potential energy surface. Gaussian processes (GPs) are a Bayesian (probabilistic) machine-learning method that can fit an arbitrary function. Over the last 15 years, a number of methods have been developed that use Gaussian processes to fit potential energy surfaces [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

Our particular focus is motivated by the fact that any linear operation on a Gaussian process results in a (transformed) Gaussian process, and differentiation is one such linear operation. In the context of force constants, this means that the hierarchy of energy derivatives ( $\frac{\partial^{N}E}{\partial r^{N}}$ ) can both be used to condition (fit) the Gaussian process energy surface and be calculated directly from it. This extracts the maximum amount of information from each individual electronic structure calculation, and allows force constants of arbitrary order to be calculated without further electronic structure calculations. As a particular technical application, this may enable the calculation of anharmonic phonons for larger unit cells at lower computational cost.

In prior work, Gaussian processes have been applied to fitting low-dimensional potential energy surfaces for polarised water [1], $\mathbf{N}_{4}$ [5], a single Si crystal [10, 11], methyl chloride [6] and even chemical reactions [8, 7, 9]. Wu, Aoi and Pillow provide a general Bayesian optimisation method that makes use of gradient information [14]; this follows earlier work that concentrated on low-dimensional observations of a dynamic model [15]. Gaussian process models of potential energy surfaces and forces have found considerable application in molecular dynamics simulations of materials [2, 3].

Gaussian processes scale poorly with the number of training points $m$: conditioning (training) the model costs $O(m^{3})$, from the inversion of an $m\times m$ matrix, and evaluating the model costs $O(m)$ per prediction, from a dot product with a $1\times m$ vector [13, 16].

Several techniques are used to minimise the amount of training data required. One is to increase the expressiveness of the Gaussian process model by composing two or more kernel functions. Dai and Krems [12] used a composite kernel to interpolate and extrapolate the 6-dimensional potential energy surface of $\mathbf{H}_{3}\mathbf{O}^{+}$ . Subsequently, Sugisawa et al. [13] predicted a 51-dimensional potential energy surface for the protonated imidazole dimer with a full-dimensional GP model composed of several GP models for lower-dimensional molecular fragments.

Differentiating the Gaussian Process is another approach to reduce the training data required. Garijo del Río et al.[17] used Gaussian processes as a surrogate model of the anharmonic potential energy surface to accelerate structural relaxation, providing the GPMin function in the Atomic Simulation Environment (ASE) package. However, their Gaussian process directly fits a $1+3N$ vector of the total energy and the $3N$ forces. They do not exploit the linear nature of the differentiation operation, and therefore the ability to build a nested Gaussian process.

More similar to our work, Kaappa et al. [18] used a gradient-aided model to enhance global structure prediction, and Asnaashari and Krems [19] added composite kernels to boost the accuracy of force and energy predictions for large molecules.

As far as we know, we are the first to extend this approach to higher derivatives (anharmonic force constants) and to apply it to 'lattice dynamics'.

An alternative approach to accelerating anharmonic calculations is the hiPhive [20] extension of the cluster expansion. The potential of the atomic environment is constructed from a sum of two-body, three-body and four-body clusters, in which each pair (or larger group) of atoms contributes through harmonic (or higher-order anharmonic) force constants. These force constants are parameterised by machine-learning fits to forces from prior calculations on the corresponding input configurations.

II Methods

The key development challenge in our work was building derivatives into the Gaussian processes. We did this by combining analytic derivatives of the Gaussian process kernels with automatic differentiation (which applies the chain rule of differentiation recursively) at the code level. McHutchon's technical note was extremely useful, providing a number of Gaussian process kernel derivatives [21].

II.1 Gaussian Process model

We begin by discussing standard Gaussian process regression, in the context of the potential energy surface and its derivatives.

To predict potential energy surfaces with Gaussian processes, the potential energy $E(\mathbf{x}^{*})$ of an atomic configuration (descriptor) $\mathbf{x}^{*}$ can be written as a weighted sum of arbitrary basis functions $\phi_{h}$:

\begin{split}E(\mathbf{x}^{*})&=\sum_{h=1}^{H}w_{h}\phi_{h}(\mathbf{x}^{*})\\ &=\mathbf{w}^{\mathrm{T}}\Phi(\mathbf{x}^{*}).\end{split}

(1)

The descriptor $\mathbf{x}^{*}$ can be chosen to represent a set of atomic positions in Cartesian coordinates for an $N$ -atom system,

\begin{split}\mathbf{x}^{*}:=\{x^{*}_{1},y^{*}_{1},z^{*}_{1},\ldots,x^{*}_{N},y^{*}_{N},z^{*}_{N}\}.\end{split}

(2)

The weights $w_{h}$, which form an $H$-dimensional weight vector $\mathbf{w}$ corresponding to the basis vector $\Phi$, are taken to be Gaussian (normally) distributed random variables, to be conditioned on prior potential energy surface data $E(\mathbf{x})$. Each weight is independently distributed as $P(w_{h}):=\mathcal{N}(w_{h};\,0,\,\sigma_{w}^{2})$, with zero mean and variance $\sigma_{w}^{2}$. Using (1), we can write the covariance between a prior potential energy $E(\mathbf{x})$ and a posterior $E(\mathbf{x}^{*})$ as

\begin{split}\left\langle E(\mathbf{x})E(\mathbf{x}^{*})\right\rangle&=\sum_{hh^{\prime}}^{H}\langle w_{h}w_{h^{\prime}}\rangle\phi_{h}(\mathbf{x})\phi_{h^{\prime}}(\mathbf{x}^{*})\\ &=\sum_{hh^{\prime}}^{H}\left(\int\mathrm{d}^{H}\mathbf{w}\,P(\mathbf{w})\,w_{h}w_{h^{\prime}}\right)\phi_{h}(\mathbf{x})\phi_{h^{\prime}}(\mathbf{x}^{*}).\end{split}

(3)

Since the weights are independent with zero mean and variance $\sigma_{w}^{2}$, the integral yields $\sigma_{w}^{2}\delta_{hh^{\prime}}$, so that

\begin{split}\left\langle E(\mathbf{x})E(\mathbf{x}^{*})\right\rangle&=\sum_{hh^{\prime}}^{H}\left(\sigma_{w}^{2}\delta_{hh^{\prime}}\right)\phi_{h}(\mathbf{x})\phi_{h^{\prime}}(\mathbf{x}^{*})\\ &=\sigma_{w}^{2}\sum_{h}^{H}\phi_{h}(\mathbf{x})\phi_{h}(\mathbf{x}^{*}).\end{split}

(4)

The inner product of the basis vectors $\Phi(\mathbf{x})$ and $\Phi(\mathbf{x}^{*})$ acts as a similarity measure between two atomic descriptors and, up to the constant factor $\sigma_{w}^{2}$, can be rewritten as a kernel (covariance) function

\begin{split}k(\mathbf{x},\mathbf{x}^{*})=\sum_{h=1}^{H}\phi_{h}(\mathbf{x})\phi_{h}(\mathbf{x}^{*}).\end{split}

(5)

We can use this kernel function to perform Gaussian process regression, predicting a posterior potential energy $E(\mathbf{x}^{*})$ from a prior dataset $\mathbf{E}$ of previously calculated potential energies. The prior dataset consists of $m$ potential energies corresponding to $m$ atomic descriptors, $\mathbf{E}:=\left[E_{1}\;E_{2}\cdots E_{m}\right]^{\mathrm{T}}$ and $\mathbf{X}:=\{\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{m}\}$ respectively. Each prior $E_{i}$ can carry an observation noise of standard deviation $\sigma_{e}$, due to errors or convergence limits in the electronic structure calculations: $E_{i}=E(\mathbf{x}_{i})+\epsilon_{i}$ with $\epsilon_{i}\sim\mathcal{N}(0,\sigma_{e}^{2})$. If this noise is independent for each data point, the covariance of any two prior data is $\langle E_{i}E_{j}\rangle=k(\mathbf{x}_{i},\mathbf{x}_{j})+\sigma^{2}_{e}\delta_{ij}$, and the prior probability distribution of each $E_{i}$ is Gaussian with zero mean and variance $k(\mathbf{x}_{i},\mathbf{x}_{i})+\sigma^{2}_{e}$:

P(E_{i})=\frac{\exp\left(-\frac{1}{2}\left(k(\mathbf{x}_{i},\mathbf{x}_{i})+\sigma^{2}_{e}\right)^{-1}E_{i}^{2}\right)}{\sqrt{2\pi\left(k(\mathbf{x}_{i},\mathbf{x}_{i})+\sigma^{2}_{e}\right)}}.

(6)

Therefore, the joint probability distribution of all the priors (the marginal likelihood) is a multivariate Gaussian with zero mean and $m\times m$ covariance matrix $\langle\mathbf{E}\mathbf{E}^{\mathrm{T}}\rangle=K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}$:

P(\mathbf{E})=\frac{\exp\left(-\frac{1}{2}\mathbf{E}^{\mathrm{T}}\left[K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]^{-1}\mathbf{E}\right)}{\sqrt{(2\pi)^{m}\det\left[K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]}}.

(7)

To obtain the Gaussian process regression function $E(\mathbf{x}^{*})$, we use Bayes' rule to calculate the conditional probability distribution of $E(\mathbf{x}^{*})$ given the prior data $\mathbf{E}$:

\begin{split}P\left(E(\mathbf{x}^{*})\,|\,\mathbf{E}\right)=\frac{P\left(\mathbf{E},E(\mathbf{x}^{*})\right)}{P\left(\mathbf{E}\right)}.\end{split}

(8)

This conditional distribution is also Gaussian, since $P\left(\mathbf{E},E(\mathbf{x}^{*})\right)$ is a joint Gaussian distribution. Substituting (7), and its $(m+1)$-dimensional analogue for the joint distribution, into (8) and following the algebra in [22], the posterior mean of $\left(E(\mathbf{x}^{*})\,|\,\mathbf{E}\right)$ is

E(\mathbf{x}^{*})=k(\mathbf{x}^{*},\mathbf{X})\cdot\left[K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]^{-1}\cdot\mathbf{E},

(9)

where $k(\mathbf{x}^{*},\;\textbf{X})$ is a $1\times m$ covariance matrix between the posterior $E(\mathbf{x}^{*})$ and the dataset $\mathbf{E}$ .
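A minimal NumPy sketch of Eq. (9) follows. It is not the GPFC implementation: it assumes the RBF kernel of Eq. (10) with illustrative hyperparameters, and a toy one-dimensional function standing in for electronic structure energies. The Cholesky factorisation is the $O(m^{3})$ conditioning step; each prediction is then an $O(m)$ dot product.

```python
import numpy as np

def rbf(X1, X2, sigma_o=1.0, length=0.4):
    """Squared-exponential kernel of Eq. (10) between the rows of X1 and X2."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma_o * np.exp(-sq / (2.0 * length**2))

def gp_fit(X, E, sigma_e=1e-4):
    """Return alpha = [K(X, X) + sigma_e^2 I]^{-1} E via a Cholesky factor (O(m^3))."""
    K = rbf(X, X) + sigma_e**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                       # K = L L^T
    return np.linalg.solve(L.T, np.linalg.solve(L, E))

def gp_predict(Xstar, X, alpha):
    """Posterior mean of Eq. (9): k(x*, X) . alpha, an O(m) dot product per point."""
    return rbf(Xstar, X) @ alpha

# Toy one-dimensional "energy surface" sampled at m = 12 descriptors.
X = np.linspace(-1.0, 1.0, 12)[:, None]
E = np.cos(3.0 * X[:, 0])
alpha = gp_fit(X, E)
Xstar = np.array([[0.05], [0.5]])
print(gp_predict(Xstar, X, alpha), np.cos(3.0 * Xstar[:, 0]))
```

Using a Cholesky factorisation rather than an explicit inverse is the usual numerically stable choice for the symmetric positive-definite matrix $K+\sigma_{e}^{2}\mathbb{I}$.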

Gaussian processes are well suited to predicting potential energy surfaces, since the model depends only on a kernel function of two (identical or different) atomic descriptors and is trained directly on the prior potential energy data. There is no need to determine the basis functions and the corresponding weights explicitly for the prediction; this is known as the "kernel trick".

However, it is possible to define a basis set explicitly in order to model nonlinear relationships between the descriptors. For instance, instead of representing atomic configurations in a Cartesian basis, we could use spherical harmonics as basis functions, making the model equivariant, as used to great effect in recent works oriented towards molecular dynamics [3, 23, 24, 25].

Commonly used Gaussian process kernel functions define the similarity measure between atomic descriptors in our Gaussian process model. Essentially, they encode a prior belief about how much influence nearby data points should have on the underlying model, and generally encode the belief that physics is myopic: nearby data points should have a greater influence on the model.

The standard and mathematically convenient kernel is the squared exponential (Gaussian or Radial Basis Function: RBF) kernel,

k(\mathbf{x},\mathbf{x}^{*})=\sigma_{o}\exp\left(-\frac{|\mathbf{x}-\mathbf{x}^{*}|^{2}}{2l^{2}}\right),

(10)

where $\sigma_{o}$ sets the overall kernel scale and $l$ is the length scale over which influence decays.

Simpler kernels include the dot-product (or linear) kernel,

k(\mathbf{x},\>\mathbf{x}^{*})=\mathbf{x}\cdot\mathbf{x}^{*}

(11)

and its more generalised form: the polynomial kernel

k(\mathbf{x},\>\mathbf{x}^{*})=(\mathbf{x}\cdot\mathbf{x}^{*})^{\zeta},

(12)

where $\zeta$ is the degree of the polynomial.

Many other kernels have been developed, such as the Matérn, periodic and rational quadratic kernels, whose definitions can be found in Chapter 4 of Rasmussen and Williams' book [16].
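As an illustration of Eqs. (10) to (12), the sketch below (NumPy, with illustrative parameter values and hypothetical function names) evaluates each kernel on a few small random displacements and checks that the resulting Gram matrices are positive semi-definite, as any valid covariance function requires.

```python
import numpy as np

def rbf_kernel(x, y, sigma_o=1.0, length=0.4):
    """Eq. (10): squared-exponential kernel."""
    return sigma_o * np.exp(-np.sum((x - y) ** 2) / (2.0 * length**2))

def linear_kernel(x, y):
    """Eq. (11): dot-product kernel."""
    return float(np.dot(x, y))

def polynomial_kernel(x, y, zeta=3):
    """Eq. (12): polynomial kernel of integer degree zeta."""
    return float(np.dot(x, y)) ** zeta

def gram(kernel, X):
    """m x m covariance matrix K(X, X) for a list of descriptors X."""
    return np.array([[kernel(a, b) for b in X] for a in X])

rng = np.random.default_rng(0)
X = rng.normal(scale=0.05, size=(6, 3))   # six small random 3D displacements
for kern in (rbf_kernel, linear_kernel, polynomial_kernel):
    # The smallest eigenvalue should be non-negative up to rounding error.
    print(kern.__name__, np.linalg.eigvalsh(gram(kern, X)).min())
```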

In the conditioning (training) process, we may want to undertake hyperparameter optimisation, to set the observation noise $\sigma_{e}$ , the kernel scale $\sigma_{o}$ and the length scale $l$ for (10). One approach we implement is to maximise the logarithm of the marginal likelihood (7):

\begin{split}\log P=&-\frac{1}{2}\mathbf{E}^{\mathrm{T}}\cdot\left[K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]^{-1}\cdot\mathbf{E}\\ &-\frac{1}{2}\log\left(\det\left[K(\mathbf{X},\mathbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]\right)-\frac{m}{2}\log 2\pi\end{split}

(13)

at the maximum of which the derivatives of $\log P$ with respect to the hyperparameters are zero.
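A minimal sketch of Eq. (13), assuming NumPy, the RBF kernel of Eq. (10) and toy data; the Cholesky factor gives both $[K+\sigma_{e}^{2}\mathbb{I}]^{-1}\mathbf{E}$ and the log-determinant. The coarse grid search over $l$ is only an illustrative stand-in for gradient-based hyperparameter optimisation.

```python
import numpy as np

def rbf(X1, X2, sigma_o, length):
    """Squared-exponential kernel of Eq. (10)."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma_o * np.exp(-sq / (2.0 * length**2))

def log_marginal_likelihood(X, E, sigma_o, length, sigma_e):
    """Eq. (13) evaluated stably through a Cholesky factor C = L L^T."""
    m = len(E)
    C = rbf(X, X, sigma_o, length) + sigma_e**2 * np.eye(m)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, E))   # C^{-1} E
    data_fit = -0.5 * E @ alpha
    complexity = -np.sum(np.log(np.diag(L)))              # -0.5 * log det C
    return data_fit + complexity - 0.5 * m * np.log(2.0 * np.pi)

X = np.linspace(-1.0, 1.0, 8)[:, None]
E = np.cos(3.0 * X[:, 0])
lengths = [0.1, 0.2, 0.4, 0.8]
scores = [log_marginal_likelihood(X, E, 1.0, l, 1e-3) for l in lengths]
best = lengths[int(np.argmax(scores))]                    # crude grid "optimum"
print(dict(zip(lengths, scores)), best)
```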

So far, everything we have discussed is a standard Gaussian process, adapted to model a potential energy surface. This would require many electronic structure calculations to train the model, as we only use the total energy of the system (a single scalar value) from each calculation. In the next section, we propose a derivative Gaussian process model of the potential energy surface, in which we additionally use the derivatives of the energy (the forces), which are often provided 'for free' by electronic structure calculations via the Hellmann-Feynman theorem. This significantly reduces the number of calculations required to train the model.

II.2 Derivative model

To make use of force information (the partial derivatives of the energy with respect to displacements), we need to differentiate the Gaussian process [15, 14]: the kernel functions must be differentiated with respect to the descriptors, up to second order.

Consider the first derivative of the kernel function $k(\mathbf{x},\mathbf{x}^{*})$ with respect to the descriptor $\mathbf{x}\in\mathbb{R}^{3N}$ of $N$ atoms in Cartesian representation. The derivative can be written as a $3N\times 1$ kernel matrix (the gradient, a rank-1 tensor)

\nabla_{\mathbf{x}}k(\mathbf{x},\mathbf{x}^{*})=\left[\frac{\partial k}{\partial x_{1}}\ \ \frac{\partial k}{\partial x_{2}}\cdots\frac{\partial k}{\partial x_{3N}}\right]^{\mathrm{T}}.

(14)

Figure 1: Diagram (1a) shows the tensor contraction of Eq. (20) for the derivative GP model trained on one data point at $\mathbf{x}$ in Cartesian coordinates. This yields (1b), the harmonic force constants $\Phi^{(2)}$ at $\mathbf{x}^{*}$ for the $2\times 1\times 1$ bulk Si supercell illustrated in (1c).

In the derivative model, this matrix is used as the covariance matrix linking the force information at $\mathbf{x}$ to the potential energy at $\mathbf{x}^{*}$. Similarly, we can define the covariance matrix between the potential energy at $\mathbf{x}$ and the forces at $\mathbf{x}^{*}$ by taking the first derivative of the kernel with respect to $\mathbf{x}^{*}$:

\nabla_{\mathbf{x}^{*}}k(\mathbf{x},\mathbf{x}^{*})=\left[\frac{\partial k}{\partial x^{*}_{1}}\ \ \frac{\partial k}{\partial x^{*}_{2}}\cdots\frac{\partial k}{\partial x^{*}_{3N}}\right]^{\mathrm{T}}.

(15)

Finally, the marginal likelihood of the model also contains the covariance between the forces at $\mathbf{x}$ and at $\mathbf{x}^{*}$. For this we require the mixed second derivative of the kernel with respect to the two descriptors, a $3N\times 3N$ rank-2 covariance tensor,

\nabla_{\mathbf{x}}\nabla_{\mathbf{x}^{*}}k=\begin{bmatrix}\frac{\partial^{2}k}{\partial x_{1}\partial x_{1}^{*}}&\frac{\partial^{2}k}{\partial x_{1}\partial x_{2}^{*}}&\cdots&\frac{\partial^{2}k}{\partial x_{1}\partial x_{3N}^{*}}\\ \frac{\partial^{2}k}{\partial x_{2}\partial x_{1}^{*}}&\frac{\partial^{2}k}{\partial x_{2}\partial x_{2}^{*}}&\cdots&\frac{\partial^{2}k}{\partial x_{2}\partial x_{3N}^{*}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^{2}k}{\partial x_{3N}\partial x_{1}^{*}}&\frac{\partial^{2}k}{\partial x_{3N}\partial x_{2}^{*}}&\cdots&\frac{\partial^{2}k}{\partial x_{3N}\partial x_{3N}^{*}}\end{bmatrix}.

(16)

With only one prior energy $E(\mathbf{x}_{1})$ and the corresponding forces $\nabla E(\mathbf{x}_{1})$, the covariance matrix of the marginal likelihood is assembled from (10), (14), (15) and (16). This $(1+3N)\times(1+3N)$ covariance matrix is

\mathcal{K}(\mathbf{x}_{1},\mathbf{x}_{1})=\begin{bmatrix}k&(\nabla_{\mathbf{x}^{*}}k)^{\mathrm{T}}\\ \nabla_{\mathbf{x}}k&\nabla_{\mathbf{x}}\nabla_{\mathbf{x}^{*}}k\end{bmatrix}_{\mathbf{x},\mathbf{x}^{*}=\mathbf{x}_{1}}+\Sigma,

(17)

where the $(1+3N)\times(1+3N)$ observation noise matrix $\Sigma$ is

\Sigma:=\begin{bmatrix}\sigma^{2}_{e}&\mathbf{0}\\ \mathbf{0}&\sigma^{2}_{f}\cdot\mathbb{I}_{3N\times 3N}\end{bmatrix},

(18)

where $\sigma_{e}$ and $\sigma_{f}$ are the observation noises for the prior energies and forces respectively. One choice for these noises is the energy and force convergence criteria of the underlying electronic structure calculations used for training.

Therefore the potential energy prediction, analogous to (9), is

E(\mathbf{x}^{*})=\begin{bmatrix}k&(\nabla_{\mathbf{x}^{*}}k)^{\mathrm{T}}\end{bmatrix}_{\mathbf{x}^{*},\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}},

(19)

where the subscripts indicate the functions evaluated at the descriptors.
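To make Eqs. (14) to (19) concrete, the NumPy sketch below assembles the single-point covariance matrix of Eq. (17) for the RBF kernel, using analytic kernel derivatives, and evaluates Eq. (19). The function names, hyperparameters and toy harmonic energy are illustrative assumptions, not the GPFC code. The first kernel argument plays the role of $\mathbf{x}$ and the second of $\mathbf{x}^{*}$; the "forces" here are energy gradients $\nabla E$, as in Eq. (19).

```python
import numpy as np

def rbf_blocks(a, b, s=1.0, l=0.4):
    """Covariance between [E(a), grad E(a)] and [E(b), grad E(b)] for the RBF
    kernel k(a, b) = s exp(-|a - b|^2 / 2 l^2)."""
    r = a - b
    k = s * np.exp(-r @ r / (2.0 * l**2))
    n = len(r)
    C = np.empty((1 + n, 1 + n))
    C[0, 0] = k                                                # cov(E(a), E(b))
    C[0, 1:] = k * r / l**2                                    # dk/db: cov(E(a), grad E(b))
    C[1:, 0] = -k * r / l**2                                   # dk/da: cov(grad E(a), E(b))
    C[1:, 1:] = k * (np.eye(n) / l**2 - np.outer(r, r) / l**4)  # d2k/da db
    return C

def predict_energy(x_star, x1, E1, G1, sig_e=1e-4, sig_f=1e-4):
    """Eq. (19) for a single training point x1 with energy E1 and gradient G1."""
    n = len(x1)
    Sigma = np.diag(np.r_[sig_e**2, np.full(n, sig_f**2)])    # Eq. (18)
    K = rbf_blocks(x1, x1) + Sigma                             # Eq. (17)
    row = rbf_blocks(x_star, x1)[0]                            # [k, (grad k)^T]
    return row @ np.linalg.solve(K, np.r_[E1, G1])

# Toy configuration of 2 atoms (3N = 6 coordinates), harmonic energy E = 0.5 |x|^2.
x1 = np.array([0.01, -0.02, 0.0, 0.015, 0.0, -0.01])
E1, G1 = 0.5 * x1 @ x1, x1.copy()
x_star = x1 + 0.005
print(predict_energy(x_star, x1, E1, G1), 0.5 * x_star @ x_star)
```

At $\mathbf{x}=\mathbf{x}^{*}$ the RBF gradient blocks vanish, so the energy-force off-diagonal blocks of Eq. (17) are zero for a single training point; they become non-zero once several training points are combined.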

Motivated by the derivative model, the differentiation can be extended to compute the second order (harmonic) and the third order (cubic anharmonic) force constants.

To predict the harmonic force constants, we need the second- and third-order derivatives of the kernel with respect to the descriptors,

(\nabla_{\mathbf{x}^{*}})^{2}k\quad\text{and}\quad(\nabla_{\mathbf{x}^{*}})^{2}\nabla_{\mathbf{x}}k.

These are a $3N\times 3N$ rank-2 tensor and a $3N\times 3N\times 3N$ rank-3 tensor, giving the covariances that link the harmonic force constants at $\mathbf{x}^{*}$ to the potential energy and to the forces at $\mathbf{x}$, respectively. We can then repeat (19), replacing the covariance row vector with these tensors:

\begin{split}\Phi^{(2)}(\mathbf{x}^{*})=\begin{bmatrix}(\nabla_{\mathbf{x}^{*}})^{2}k\\ (\nabla_{\mathbf{x}^{*}})^{2}\nabla_{\mathbf{x}}k\end{bmatrix}^{\mathrm{T}}_{\mathbf{x}^{*},\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}},\end{split}

(20)

where $\Phi^{(2)}$ is the $3N\times 3N$ harmonic force-constant matrix, coupling every pair of degrees of freedom of $\mathbf{x}^{*}$. Eq. (20) can be constructed as a tensor contraction, as shown in Figure 1.

We propose a similar method for predicting the (cubic) anharmonic force constant. The covariance tensor correlating the cubic anharmonic force constant to the potential energy and to the forces can be written as

(\nabla_{\mathbf{x}^{*}})^{3}k\quad\text{and}\quad(\nabla_{\mathbf{x}^{*}})^{3}\nabla_{\mathbf{x}}k

respectively. We use these rank-3 and rank-4 covariance tensors to recast (20) as

\begin{split}\Phi^{(3)}(\mathbf{x}^{*})=\begin{bmatrix}(\nabla_{\mathbf{x}^{*}})^{3}k\\ (\nabla_{\mathbf{x}^{*}})^{3}\nabla_{\mathbf{x}}k\end{bmatrix}^{\mathrm{T}}_{\mathbf{x}^{*},\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}},\end{split}

(21)

where $\Phi^{(3)}$ is the $3N\times 3N\times 3N$ cubic anharmonic force-constant tensor.
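The sketch below extends the single-point example to several training points and evaluates Eqs. (20) and (21) with analytic RBF kernel derivatives in NumPy, rather than the automatic differentiation used in this work. The first kernel argument is the test configuration and the second a training configuration; the hyperparameters, noise values, helper names and two-coordinate toy potential are assumptions for illustration only.

```python
import numpy as np

SIG_O, LEN = 1.0, 0.4   # kernel scale sigma_o and length scale l (illustrative)

def kernel_derivs(a, b):
    """RBF kernel k(a, b) and the derivatives used in Eqs. (17)-(21), r = a - b.
    Keys name the differentiated slots: 'a' = test point, 'b' = training point."""
    r = a - b
    I = np.eye(len(r))
    k = SIG_O * np.exp(-r @ r / (2 * LEN**2))
    rr = np.outer(r, r)
    rrr = np.einsum("i,j,c->ijc", r, r, r)
    sym3 = (np.einsum("ij,c->ijc", I, r) + np.einsum("ic,j->ijc", I, r)
            + np.einsum("jc,i->ijc", I, r))
    T = -rrr / LEN**6 + sym3 / LEN**4                     # (d3k/da3) / k
    dT = ((np.einsum("ic,j,k->ijkc", I, r, r) + np.einsum("jc,i,k->ijkc", I, r, r)
           + np.einsum("kc,i,j->ijkc", I, r, r)) / LEN**6
          - (np.einsum("ij,kc->ijkc", I, I) + np.einsum("ik,jc->ijkc", I, I)
             + np.einsum("jk,ic->ijkc", I, I)) / LEN**4)  # dT/db_c
    return {
        "k": k,
        "db": k * r / LEN**2,                             # dk/db_c
        "da": -k * r / LEN**2,                            # dk/da_i
        "dadb": k * (I / LEN**2 - rr / LEN**4),           # d2k/da_i db_c
        "da2": k * (rr / LEN**4 - I / LEN**2),            # d2k/da_i da_j
        "da2db": k * (rrr / LEN**6 - sym3 / LEN**4),      # d3k/da_i da_j db_c
        "da3": k * T,                                     # d3k/da_i da_j da_k
        "da3db": k * (np.einsum("ijk,c->ijkc", T, r) / LEN**2 + dT),
    }

def train(X, E, G, sig_e=1e-3, sig_f=1e-3):
    """Solve (K + Sigma) alpha = [E_1, grad E_1, ..., E_m, grad E_m]."""
    m, n = X.shape
    K = np.zeros((m * (n + 1), m * (n + 1)))
    for p in range(m):
        for q in range(m):
            d = kernel_derivs(X[p], X[q])
            K[p*(n+1):(p+1)*(n+1), q*(n+1):(q+1)*(n+1)] = np.block(
                [[np.array([[d["k"]]]), d["db"][None, :]],
                 [d["da"][:, None], d["dadb"]]])
    noise = np.tile(np.r_[sig_e**2, np.full(n, sig_f**2)], m)
    y = np.concatenate([np.r_[E[p], G[p]] for p in range(m)])
    return np.linalg.solve(K + np.diag(noise), y)

def predict(x_star, X, alpha):
    """Posterior-mean energy, gradient, Phi^(2) (Eq. 20) and Phi^(3) (Eq. 21)."""
    m, n = X.shape
    A = alpha.reshape(m, n + 1)                 # per point: [alpha_E, alpha_grad]
    E, G = 0.0, np.zeros(n)
    H, T3 = np.zeros((n, n)), np.zeros((n, n, n))
    for q in range(m):
        d = kernel_derivs(x_star, X[q])
        aE, aG = A[q, 0], A[q, 1:]
        E += d["k"] * aE + d["db"] @ aG
        G += d["da"] * aE + d["dadb"] @ aG
        H += d["da2"] * aE + d["da2db"] @ aG
        T3 += d["da3"] * aE + d["da3db"] @ aG
    return E, G, H, T3

# Toy anharmonic surface in two coordinates, standing in for DFT data.
energy = lambda x: 0.5 * x[0]**2 + x[1]**2 + 0.5 * x[0]**3
grad = lambda x: np.array([x[0] + 1.5 * x[0]**2, 2.0 * x[1]])
rng = np.random.default_rng(0)
X = rng.normal(scale=0.2, size=(6, 2))
alpha = train(X, np.array([energy(x) for x in X]), np.array([grad(x) for x in X]))
E0, G0, Phi2, Phi3 = predict(np.zeros(2), X, alpha)
print(Phi2, Phi3[0, 0, 0])   # compare with exact diag(1, 2) and 3
```

Because $\Phi^{(2)}$ and $\Phi^{(3)}$ are exact derivatives of the same posterior mean, they can be checked against finite differences of the predicted gradient and Hessian.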

With these methods of directly evaluating the force constants from our fitted Gaussian Process, we have all the necessary components for calculating the lattice dynamics of a material, including anharmonicity.

II.3 Phonon coordinate representations

Instead of using the direct Cartesian representation, we also tried using the phonon coordinates (normal modes) directly with the Gaussian process. Normal modes form a natural basis for the motion of a material or molecule, and naturally encode its symmetries. For this reason they are also the starting point for higher-order physical models of lattice dynamics, such as many-body perturbation theory, which couples the normal modes through the anharmonicity.

Our hope was that this representation would increase the efficiency of the machine-learning method (i.e. greater accuracy for fewer evaluations), without our having to build any symmetries or equivariances directly into the method. The normal modes are obtained by diagonalising the mass-weighted second-order force-constant matrix (the dynamical matrix).

To build phonon descriptors, we start by considering this “Dynamical Matrix” [26, 27, 28] at wave vector q:

\begin{split}D_{Ai,Bj}(\mathbf{q})=\frac{1}{\sqrt{m_{A}m^{\prime}_{B}}}\sum_{l^{\prime}}\Phi^{(2)}_{0Ai,l^{\prime}Bj}\,e^{i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}\end{split}

(22)

for Cartesian component $i$ of atom $A$ and component $j$ of atom $B$, with masses $m_{A}$ and $m^{\prime}_{B}$ respectively. The summation is over all primitive cells $l^{\prime}$ in the supercell, where $0$ denotes the reference primitive cell, and $\mathbf{R}^{o}$ denotes an equilibrium atomic position.
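As a toy check of Eq. (22), the sketch below builds the dynamical matrix of a one-dimensional diatomic chain with nearest-neighbour springs (a single Cartesian component, so the indices $i,j$ drop out). The masses, spring constant and dictionary layout of the force constants are illustrative assumptions. At $\Gamma$ the eigenvalues (squared frequencies) should be $0$ (acoustic) and $2C(1/m_{A}+1/m_{B})$ (optical).

```python
import numpy as np

a = 1.0                              # lattice constant
tau = {"A": 0.0, "B": 0.5}           # equilibrium basis positions in the cell
mass = {"A": 1.0, "B": 2.0}
C = 1.0                              # nearest-neighbour spring constant
# Real-space force constants Phi^(2)_{0A, l'B}, keyed by
# (atom in reference cell 0, cell index l', atom in cell l').
phi = {("A", 0, "A"): 2 * C, ("A", 0, "B"): -C, ("A", -1, "B"): -C,
       ("B", 0, "B"): 2 * C, ("B", 0, "A"): -C, ("B", 1, "A"): -C}
atoms = ["A", "B"]

def dynamical_matrix(q):
    """Eq. (22) in 1D: mass-weighted Fourier sum over cells l'."""
    D = np.zeros((2, 2), dtype=complex)
    for (A, lp, B), value in phi.items():
        phase = np.exp(1j * q * ((lp * a + tau[B]) - tau[A]))
        D[atoms.index(A), atoms.index(B)] += value * phase / np.sqrt(mass[A] * mass[B])
    return D

for q in (0.0, np.pi / a):
    print(q, np.linalg.eigvalsh(dynamical_matrix(q)))   # squared frequencies
```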

Our phonon descriptors are derived from a simple change of basis, using the eigenvectors of the dynamical matrix,

\begin{split}d_{Bj}(\mathbf{q})=\sqrt{m^{\prime}_{B}}\sum_{l^{\prime}}\mathbf{x}_{l^{\prime}Bj}\,e^{i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})},\end{split}

(23)

where x are the original Cartesian descriptors [29].
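At the $\Gamma$-point all the phase factors in Eq. (23) are unity, so each descriptor is a mass-weighted sum over cells. The sketch below assumes displacement descriptors ordered cell by cell (atom index $l^{\prime}\kappa+B$); this ordering and the function name are our own hypothetical conventions, not necessarily those of the GPFC code.

```python
import numpy as np

def gamma_descriptors(x, masses, n_cells):
    """Eq. (23) at q = 0: d_{Bj} = sqrt(m_B) * sum over cells l' of x_{l'Bj}.

    x: (n_cells * kappa, 3) Cartesian displacement descriptors, assumed ordered
    cell by cell so that atom index = l' * kappa + B (hypothetical convention)."""
    kappa = len(masses)
    per_cell = x.reshape(n_cells, kappa, 3)
    return np.sqrt(np.asarray(masses, dtype=float))[:, None] * per_cell.sum(axis=0)

rng = np.random.default_rng(1)
masses = [28.085, 28.085]              # two Si atoms in the diamond primitive cell
n_cells = 2 * 2 * 2                    # 2 x 2 x 2 supercell
x = rng.normal(scale=0.01, size=(n_cells * len(masses), 3))   # 0.01 Angstrom rattles
d = gamma_descriptors(x, masses, n_cells)
print(x.size, "->", d.size)            # 48 -> 6
```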

System symmetry is thereby imposed directly on the model: the model does not have to learn the symmetry itself. Additionally, the descriptors are collapsed from the size of an $l\times l\times l$ supercell to that of a $\kappa$-atom primitive cell, where $N=l^{3}\kappa$ atoms are in the atomic environment. The corresponding tensor contraction, Eq. (20), can be reconstructed as shown in Figure 2.

For $m$ training points, the learning cost $\mathcal{O}(3^{3}m^{3}N^{3})$ and the prediction cost $\mathcal{O}(3mN)$ [16, 30] are reduced to $\mathcal{O}(3^{3}m^{3}\kappa^{3})$ and $\mathcal{O}(3m\kappa)$ respectively.
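As a worked example, for the $2\times 2\times 2$ supercells of two-atom primitive cells used in Section III ($l=2$, $\kappa=2$, $N=16$), and ignoring constant prefactors, the reduction factors are

\frac{3^{3}m^{3}N^{3}}{3^{3}m^{3}\kappa^{3}}=\left(\frac{N}{\kappa}\right)^{3}=l^{9}=512,\qquad\frac{3mN}{3m\kappa}=\frac{N}{\kappa}=l^{3}=8.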

The second-order force constants predicted by Eq. (20) are then in phonon coordinates, as a dynamical matrix at wave vector $\mathbf{q}$. We can recover the second-order force constants in Cartesian coordinates by an inverse discrete Fourier transform of the set of $N$ dynamical matrices of Eq. (22), as follows:

\begin{split}\Phi^{(2)}_{0Ai,l^{\prime}Bj}=\frac{\sqrt{m_{A}m^{\prime}_{B}}}{N}\sum_{\mathbf{q}}D_{Ai,Bj}(\mathbf{q})\,e^{-i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}.\end{split}

(24)

Similarly, the third-order force constants predicted in phonon coordinates can be transformed back to Cartesian coordinates with a similar expression,

\begin{split}\Phi^{(3)}_{0Ai,l^{\prime}Bj,l^{\prime\prime}Ck}=&\frac{\sqrt{m_{A}m^{\prime}_{B}m^{\prime\prime}_{C}}}{N}\sum_{\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime}}\Phi^{(3)}_{Ai,Bj,Ck}(\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime})\\ &\times e^{-i\mathbf{q}^{\prime}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}\,e^{-i\mathbf{q}^{\prime\prime}\cdot(\mathbf{R}^{o}_{l^{\prime\prime}C}-\mathbf{R}^{o}_{0A})}\\ &\times e^{-i(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime})\cdot\mathbf{R}^{o}_{0A}}\,\Delta(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime}),\end{split}

(25)

where the wave vectors must conserve crystal momentum, so that they are constrained by

\Delta(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime})=\begin{cases}1&\text{if }\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime}\text{ is a reciprocal lattice vector},\\ 0&\text{otherwise}.\end{cases}

(26)

Therefore, the three phonon descriptors ( $d_{Ai}(\textbf{q})$ , $d_{Bj}(\textbf{q}^{\prime})$ and $d_{Ck}(\textbf{q}^{\prime\prime})$ ) used in our prediction have to satisfy this constraint.
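The constraint of Eq. (26) is easy to enumerate on a $\Gamma$-centred $n\times n\times n$ mesh, where wave vectors are represented by integer indices (fractional coordinate equal to index divided by $n$); this representation is an assumption for illustration. Once $\mathbf{q}$ and $\mathbf{q}^{\prime}$ are chosen, exactly one $\mathbf{q}^{\prime\prime}$ on the mesh closes the triplet.

```python
import itertools
import numpy as np

def delta(q1, q2, q3, n):
    """Eq. (26) for integer mesh indices on an n x n x n Gamma-centred grid:
    1 if q1 + q2 + q3 (in units of 1/n) is a reciprocal lattice vector, else 0."""
    return int(np.all((q1 + q2 + q3) % n == 0))

n = 2
mesh = [np.array(p) for p in itertools.product(range(n), repeat=3)]
count = sum(delta(q1, q2, q3, n) for q1 in mesh for q2 in mesh for q3 in mesh)
print(len(mesh) ** 3, count)   # 512 candidate triplets, of which 64 = n**6 conserve momentum
```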

III Dataset preparation and training

In developing our method, and to compare it with standard approaches, we consider a set of representative solid-state materials. We compare our method (which we term GPFC, Gaussian Process Force-Constants) with a traditional finite displacement method (as implemented in Phonopy [26, 28]) and with a third-order many-body perturbation theory approach (in Phono3py [27, 28]). We also use Phonopy and Phono3py to process our predicted force constants. The details of the training datasets are given below.

phono(3)py-dataset: we use the phono(3)py package to generate displaced structures of $2\times 2\times 2$ supercells following the finite displacement method (FDM), and then evaluate the energies and forces with density functional theory in VASP, using a plane-wave cutoff energy ENMAX $=800$ eV, an SCF energy convergence criterion EDIFF $=10^{-8}$ eV, and a $6\times 6\times 6$ k-point mesh (kpts $=[6,6,6]$). We then pass the VASP results to Phonopy to generate force constants using space-group symmetry operations.

GPFC-dataset: we use ASE to generate rattled $2\times 2\times 2$ supercell structures, with normally distributed random atomic displacements of standard deviation $0.01$ Å, and then perform similar density functional theory calculations of the energies and forces in VASP, with the same plane-wave cutoff, k-point mesh and convergence parameters as above. The resulting dataset of total energies and forces is used to train the derivative GP model following Eq. (20), without imposing any symmetry.

Although the GPFC hyperparameters (the kernel scale $\sigma_{o}$, the length scale $l$, and the observation noises of energy $\sigma_{e}$ and force $\sigma_{f}$) can be optimised by maximising the logarithm of the marginal likelihood, Eq. (13), we instead fix them to reasonable constants. We set $\sigma_{o}=1$ and $l=0.4$, suitable for normally distributed rattles of $0.01$ to $0.05$ Å about the relaxed structure [17]. We take the observation noises $\sigma_{e}$ and $\sigma_{f}$ as $10^{-8}$, the same as the SCF energy convergence criterion (EDIFF) in our VASP calculations.

IV Results and discussion

To compare the harmonic (second-order) force constants from our model with those from the standard finite displacement method (FDM) as implemented in Phonopy, we examine the phonon band structures and densities of states of diamond Si (d-Si, FCC), GaAs, CdTe, NaCl and PbTe, shown in Figure 3. The root mean square errors of the phonon band structures are around $1$ to $4\%$. The Wasserstein distance (earth mover's distance, EMD) measures the dissimilarity between the phonon densities of states from the two methods over the frequency ($\omega$) range; we obtain $0.043$, $0.037$, $0.021$, $0.071$ and $0.022$ THz$\cdot$eV$^{-1}$Å$^{-3}$ respectively.
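The text does not specify how the EMD is computed. As one possibility, for two densities of states sampled on a common uniform frequency grid, the one-dimensional Wasserstein distance is the integral of the absolute difference of their cumulative distributions; the sketch below normalises each DOS to unit area first, which is an assumption. SciPy's `scipy.stats.wasserstein_distance` offers a weighted-sample alternative.

```python
import numpy as np

def wasserstein_1d(omega, g1, g2):
    """Earth mover's distance between two DOS curves on the same uniform grid omega.
    Each curve is normalised to unit area, then W1 = integral of |F1 - F2| d omega."""
    dw = omega[1] - omega[0]
    p1 = g1 / (g1.sum() * dw)
    p2 = g2 / (g2.sum() * dw)
    cdf_diff = np.cumsum(p1 - p2) * dw          # F1 - F2 at each grid point
    return np.sum(np.abs(cdf_diff)) * dw

# Two toy Gaussian "DOS" peaks (THz) offset by 0.1 THz: W1 should be close to 0.1.
omega = np.linspace(0.0, 20.0, 4001)
g1 = np.exp(-((omega - 8.0) ** 2) / (2 * 0.5**2))
g2 = np.exp(-((omega - 8.1) ** 2) / (2 * 0.5**2))
print(wasserstein_1d(omega, g1, g2))
```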

We characterise the learning curves of the model by the degree to which it violates the acoustic (translational) sum rule. For the harmonic force constants, the sum rule is

$$\sum_{B}\Phi^{(2)}_{Ai,Bj}=0, \qquad (27)$$

and for the third-order anharmonic force constants it is

$$\sum_{C}\Phi^{(3)}_{Ai,Bj,Ck}=0, \qquad (28)$$

where $i$ , $j$ and $k$ are the Cartesian components of the displacements of atoms $A$ , $B$ and $C$ , respectively.
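The sum-rule diagnostics in Eqs. (27) and (28) can be sketched as follows. The sketch assumes force constants stored as dense nested lists indexed by $3A+i$, which is a hypothetical layout and not the one used in GPFC.jl. The toy periodic chain has bond energy $\tfrac{k_2}{2}d^2+\tfrac{k_3}{6}d^3$ for the $x$-stretch $d$ of each bond, with arbitrary illustrative values of $k_2$ and $k_3$. It satisfies both sum rules exactly, so any non-zero RMS signals a violation.

```python
import math

def asr_fc2(phi, n):
    """RMS over (A, i, j) of sum_B Phi2[3A+i][3B+j]; zero when Eq. (27) holds."""
    sums = [sum(phi[3 * a + i][3 * b + j] for b in range(n))
            for a in range(n) for i in range(3) for j in range(3)]
    return math.sqrt(sum(s * s for s in sums) / len(sums))

def asr_fc3(phi, n):
    """RMS over (A, i, B, j, k) of sum_C Phi3[3A+i][3B+j][3C+k]; zero when Eq. (28) holds."""
    sums = [sum(phi[3 * a + i][3 * b + j][3 * c + k] for c in range(n))
            for a in range(n) for i in range(3)
            for b in range(n) for j in range(3) for k in range(3)]
    return math.sqrt(sum(s * s for s in sums) / len(sums))

def toy_chain(n, k2=1.0, k3=0.5):
    """Hypothetical periodic chain of n >= 2 atoms; each bond couples only x-displacements."""
    dim = 3 * n
    phi2 = [[0.0] * dim for _ in range(dim)]
    phi3 = [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
    for a in range(n):
        sign = {a: -1.0, (a + 1) % n: 1.0}  # d = u_x(a+1) - u_x(a)
        for p, sp in sign.items():
            for q, sq in sign.items():
                phi2[3 * p][3 * q] += k2 * sp * sq
                for r, sr in sign.items():
                    phi3[3 * p][3 * q][3 * r] += k3 * sp * sq * sr
    return phi2, phi3

phi2, phi3 = toy_chain(4)
print(asr_fc2(phi2, 4), asr_fc3(phi3, 4))  # both 0.0: the sum rules hold
broken = [row[:] for row in phi2]
broken[0][0] += 0.1                        # spoil translational invariance
print(asr_fc2(broken, 4))                  # non-zero violation
```

A learned model gives a non-zero RMS that should shrink towards zero as training points are added. This is the behaviour tracked by the learning curves.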

In Figure 4 (a), the sum rules in the Cartesian basis start to converge to zero at $\approx 48$ data points for all the materials. This number equals the number of degrees of freedom of the atomic descriptor in Cartesian coordinates, which is $48$ for a $2\times 2\times 2$ supercell of a two-atom primitive cell. Although the five materials are different, their learning curves converge with the same number of training points.

In phonon coordinates, by contrast, the sum rules converge to zero with $\approx 6$ data points. This number similarly corresponds to the number of degrees of freedom of the phonon descriptor, which is now defined on the primitive unit cell. The phonon-coordinate FC2 (the dynamical matrix) and FC3 are evaluated at the $\Gamma$ -point only. At other q-points, the evaluation is limited by the current automatic differentiation package, which would need to differentiate complex-valued quantities. Because of this limitation, we cannot recover the Cartesian FC2 and FC3 using Eq. (24) and Eq. (25).
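The two data-point counts above follow from simple counting. This assumes, as in the text, a two-atom primitive cell and a $2\times 2\times 2$ supercell, i.e. a diagonal supercell matrix $S$ with $\det S = 8$:

$$N_{\mathrm{Cart}} = 3\,n_{\mathrm{prim}}\det S = 3\times 2\times 8 = 48, \qquad N_{\mathrm{ph}} = 3\,n_{\mathrm{prim}} = 3\times 2 = 6.$$

The ratio $N_{\mathrm{Cart}}/N_{\mathrm{ph}} = \det S = 8$ is simply the number of primitive cells in the supercell.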

To compare the cubic anharmonic (third-order) force constants between the methods, we consider the phonon lifetimes and the lattice thermal conductivity of d-Si, NaCl, PbTe, GaAs and CdTe at finite temperatures ( $300-1200$ K). These are calculated with third-order many-body perturbation theory using two different sets of cubic anharmonic force constants: one from FDM in Phono3py and the other from kernel regression in our GPFC. The cubic anharmonic Gaussian process force constants are predicted using Eq. (23). We train the derivative GP model on the same training set used to predict the harmonic force constants.
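The cubic force constants can be linked to the quantities compared here under the single-mode relaxation-time approximation (RTA), one of the thermal-conductivity solvers available in Phono3py. The choice of solver is our assumption. In that approximation,

$$\kappa = \frac{1}{N V_0}\sum_{\lambda} C_{\lambda}\, \mathbf{v}_{\lambda}\otimes\mathbf{v}_{\lambda}\, \tau_{\lambda}, \qquad \tau_{\lambda} = \frac{1}{2\Gamma_{\lambda}(\omega_{\lambda})},$$

where $\lambda$ labels a phonon mode (wavevector and band), $N$ is the number of sampled wavevectors, $V_0$ the primitive-cell volume, $C_{\lambda}$ the mode heat capacity, $\mathbf{v}_{\lambda}$ the group velocity and $\Gamma_{\lambda}$ the phonon linewidth, given by the imaginary part of the lowest-order (bubble) self-energy. Because $\Gamma_{\lambda}$ is quadratic in the Fourier-transformed cubic force constants, errors in FC3 propagate directly into the lifetimes and into $\kappa$.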

Based on the convergence of the acoustic sum rule (Figure 4), accurate prediction of the cubic anharmonic force constants in a Cartesian basis requires up to $400$ energy and force calculations. In the phonon basis, by contrast, starting from the dynamical matrix (harmonic force constants), only $50$ energy and force calculations are required.

The predicted phonon lifetimes of d-Si and PbTe at $300$ K are shown in Figure 5 (a) and 5 (b), and their predicted lattice thermal conductivities in Figure 5 (c). Again, we use the earth mover's distance to quantify the dissimilarity of the projected lifetime densities of states, at $30$ K intervals from $300$ K to $1200$ K. The mean EMD is $3.9\times 10^{-1}\>ps$ for d-Si and $8.6\times 10^{-3}\>ps$ for PbTe. The mean absolute errors of the lattice thermal conductivity calculated using the third-order GPFC are $2.03$ ( $\sim 2\%$ error) and $1.4\times 10^{-3}$ ( $\sim 7\%$ error) $W\cdot m^{-1}K^{-1}$ , respectively. We also calculate the phonon lifetimes and lattice thermal conductivities of NaCl, GaAs and CdTe from our GPFCs, with errors ranging from $2\%$ to $7\%$ . The accuracy of the third-order GPFC prediction for PbTe is relatively low among the five materials, because PbTe is strongly anharmonic: the resulting strong phonon-phonon interactions shorten its phonon lifetimes.

A key finding of our GPFC experiments is that the number of training points required appears to equal the number of degrees of freedom of the descriptor. In the phonon basis, this is the number of phonon bands being calculated. A Coulomb matrix [31], or a similar species-aware descriptor, could potentially be used to share information between similar chemical species in the unit cell.
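The Coulomb matrix of Rupp et al. [31] has diagonal entries $0.5\,Z_i^{2.4}$ and off-diagonal entries $Z_i Z_j/|\mathbf{R}_i-\mathbf{R}_j|$. The sketch below builds it for a hypothetical Na-Cl pair 2.82 Å apart. It is not part of GPFC. The original definition uses atomic units, whereas here distances stay in the units of the input positions. Applying it to a crystal would need a periodic generalisation, which we do not attempt.

```python
import math

def coulomb_matrix(charges, positions):
    """Coulomb matrix: M_ii = 0.5 Z_i**2.4, M_ij = Z_i Z_j / |R_i - R_j|."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4  # self-interaction term
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m

# Hypothetical example: one Na (Z = 11) and one Cl (Z = 17) atom 2.82 Å apart
cm = coulomb_matrix([11, 17], [(0.0, 0.0, 0.0), (2.82, 0.0, 0.0)])
for row in cm:
    print([round(v, 2) for v in row])
```

Because the entries depend on the nuclear charges, two sites occupied by different species produce different descriptor entries even at identical geometry. This is the kind of species awareness the paragraph above has in mind.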

For the anharmonic (up to cubic) GPFC, we find that approximately $8$ times as many data points are required as for a harmonic GPFC.

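Therefore, within our limited experiments, the cost of calculating anharmonic force constants appears to scale linearly with the number of atoms in the unit cell: the harmonic fit needs as many points as there are descriptor degrees of freedom, and the cubic fit needs a roughly constant multiple of that.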
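The counting rule behind this conclusion can be made explicit with a hypothetical helper, which is not part of GPFC.jl. The harmonic fit needs $3n$ points for $n$ atoms in the cell used by the descriptor, and the cubic fit needs about $8$ times that, using the empirical factor reported above. Both numbers grow linearly with $n$. The quoted counts of $50$ (phonon basis) and $400$ (Cartesian basis) sit close to $8\times 6=48$ and $8\times 48=384$.

```python
def training_budget(n_prim_atoms, cells_in_supercell=1, cubic_factor=8):
    """Hypothetical rule of thumb: harmonic points = descriptor DOF; cubic ~ cubic_factor x harmonic."""
    n_harmonic = 3 * n_prim_atoms * cells_in_supercell
    return n_harmonic, cubic_factor * n_harmonic

print(training_budget(2))      # phonon basis, primitive cell: (6, 48)
print(training_budget(2, 8))   # Cartesian basis, 2x2x2 supercell: (48, 384)
print(50 / 6, 400 / 48)        # observed cubic/harmonic ratios, both about 8.3
for n_atoms in (2, 4, 8):      # doubling the atoms doubles both budgets
    print(n_atoms, training_budget(n_atoms))
```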

V Conclusion

We develop a method to model lattice dynamics by fitting a derivative Gaussian Process force-constant (GPFC) model. The derivative GP encodes the correlations between the energy and its derivatives, i.e. the forces, the second-order force constants (FC2) and the third-order force constants (FC3). We test this model on the harmonic and anharmonic (cubic) force constants of five materials (d-Si, NaCl, PbTe, GaAs and CdTe). Accurate predictions of the harmonic force constants require as many energy and force evaluations (here density functional theory calculations) as there are degrees of freedom in our descriptor basis. For the most compact, phonon descriptor, this equals the number of phonon bands to be predicted. To predict the third-order force constants, we appear to require about 8 times as much data as is needed to fit the harmonic force constants.

Our models appear to offer linear scaling in the prediction of anharmonic force constants. We are, however, technically limited in applying our phonon descriptor to anharmonic force constants, because this requires differentiating complex-valued quantities at wavevectors away from $\Gamma$ in the Brillouin zone.

An extension of this work would be to leverage atomistic machine-learning force-field descriptors, e.g. the radial and spherical-harmonic basis functions used in GAP [2, 3] or ACE [23, 32]. These descriptors are equivariant under rotations in three-dimensional space, and may offer some of the benefits of the phonon descriptors developed in this work.

As an alternative approach, one could directly train (or fine-tune) a full machine-learning force field, and then use it as a surrogate model in the standard finite-displacement phonon workflow. In a future study we will compare the data efficiency of this approach.

VI Author contribution

K.K.: Formal Analysis (lead); Investigation (lead); Methodology (equal); Software (lead); Writing – original draft (equal); Writing – review and editing (equal).

J.M.F.: Conceptualization (lead); Methodology (equal); Writing – original draft (equal); Writing – review and editing (equal).

VII Acknowledgement

J.M.F. is supported by a Royal Society University Research Fellowship (URF-R1-191292). K.K. is supported by a Thai scholarship from the Development and Promotion of Science and Technology project. Julia [33] code implementing these calculations is available as a repository on GitHub [34]. This work made use of the Imperial College Research Computing Service [35]. Via our membership of the UK’s HEC Materials Chemistry Consortium, which is funded by EPSRC (EP/R029431 and EP/X035859), this work used the ARCHER2 UK National Supercomputing Service (http://www.archer2.ac.uk).

Appendix A Force constants from a Gaussian Process

References

Handley et al. [2009] C. M. Handley, G. I. Hawe, D. B. Kell, and P. L. A. Popelier, Optimal construction of a fast and accurate polarisable water potential based on multipole moments trained by machine learning, Physical Chemistry Chemical Physics 11, 6365 (2009).
Bartók et al. [2010] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons, Physical Review Letters 104, 10.1103/physrevlett.104.136403 (2010).
Bartók et al. [2013] A. P. Bartók, R. Kondor, and G. Csányi, On representing chemical environments, Physical Review B 87, 10.1103/physrevb.87.184115 (2013).
Bartók and Csányi [2015] A. P. Bartók and G. Csányi, Gaussian approximation potentials: A brief tutorial introduction, International Journal of Quantum Chemistry 115, 1051–1057 (2015).
Cui and Krems [2016] J. Cui and R. V. Krems, Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with gaussian processes, Journal of Physics B: Atomic, Molecular and Optical Physics 49, 224001 (2016).
Dral et al. [2017] P. O. Dral, A. Owens, S. N. Yurchenko, and W. Thiel, Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels, The Journal of Chemical Physics 146, 10.1063/1.4989536 (2017).
Guan et al. [2017] Y. Guan, S. Yang, and D. H. Zhang, Construction of reactive potential energy surfaces with gaussian process regression: active data selection, Molecular Physics 116, 823–834 (2017).
Kolb et al. [2017] B. Kolb, P. Marshall, B. Zhao, B. Jiang, and H. Guo, Representing global reactive potential energy surfaces using gaussian processes, The Journal of Physical Chemistry A 121, 2552–2557 (2017).
Guan et al. [2018] Y. Guan, S. Yang, and D. H. Zhang, Application of clustering algorithms to partitioning configuration space in fitting reactive potential energy surfaces, The Journal of Physical Chemistry A 122, 3140–3147 (2018).
Bartók et al. [2018] A. P. Bartók, J. Kermode, N. Bernstein, and G. Csányi, Machine learning a general-purpose interatomic potential for silicon, Physical Review X 8, 10.1103/physrevx.8.041048 (2018).
Strickson et al. [2019] O. Strickson, N. Nikiforakis, and E. Artacho, Dynamical continuum simulation of condensed matter from first principles, Physical Review Research 1, 10.1103/physrevresearch.1.033199 (2019).
Dai and Krems [2020] J. Dai and R. V. Krems, Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by gaussian processes with composite kernels, Journal of Chemical Theory and Computation 16, 1386 (2020).
Sugisawa et al. [2020] H. Sugisawa, T. Ida, and R. V. Krems, Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimer, The Journal of Chemical Physics 153, 114101 (2020).
Wu et al. [2017] A. Wu, M. C. Aoi, and J. W. Pillow, Exploiting gradients and hessians in bayesian optimization and bayesian quadrature (2017), arXiv:1704.00060 .
Solak et al. [2002] E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and C. E. Rasmussen, Derivative observations in gaussian process models of dynamic systems, in Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS’02 (MIT Press, Cambridge, MA, USA, 2002) p. 1057–1064.
Rasmussen and Williams [2006] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning, Adaptive computation and machine learning (MIT Press, 2006) pp. I–XVIII, 1–248.
del Río et al. [2019] E. G. del Río, J. J. Mortensen, and K. W. Jacobsen, Local bayesian optimizer for atomic structures, Physical Review B 100, 10.1103/physrevb.100.104103 (2019).
Kaappa et al. [2021] S. Kaappa, E. G. del Río, and K. W. Jacobsen, Global optimization of atomic structures with gradient-enhanced gaussian process regression, Physical Review B 103, 10.1103/physrevb.103.174114 (2021).
Asnaashari and Krems [2021] K. Asnaashari and R. V. Krems, Gradient domain machine learning with composite kernels: improving the accuracy of PES and force fields for large molecules, Machine Learning: Science and Technology 3, 015005 (2021).
Eriksson et al. [2019] F. Eriksson, E. Fransson, and P. Erhart, The hiphive package for the extraction of high‐order force constants by machine learning, Advanced Theory and Simulations 2, 10.1002/adts.201800184 (2019).
McHutchon [2013] A. McHutchon, Differentiating gaussian processes, http://mlg.eng.cam.ac.uk/mchutchon/DifferentiatingGPs.pdf (2013).
MacKay [2003] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, 2003).
Drautz [2019] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Physical Review B 99, 10.1103/physrevb.99.014104 (2019).
Batatia et al. [2022a] I. Batatia, D. P. Kovács, G. N. C. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, in Advances in Neural Information Processing Systems, Vol. 35, edited by S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Curran Associates, Inc., 2022) pp. 11423–11436, arXiv:2206.07697 [stat.ML] .
Batatia et al. [2022b] I. Batatia, S. Batzner, D. P. Kovács, A. Musaelian, G. N. C. Simm, R. Drautz, C. Ortner, B. Kozinsky, and G. Csányi, The design space of e(3)-equivariant atom-centered interatomic potentials (2022b).
Togo [2023] A. Togo, First-principles phonon calculations with phonopy and phono3py, J. Phys. Soc. Jpn. 92, 012001 (2023).
Togo et al. [2015] A. Togo, L. Chaput, and I. Tanaka, Distributions of phonon lifetimes in brillouin zones, Phys. Rev. B Condens. Matter Mater. Phys. 91 (2015).
Togo et al. [2023] A. Togo, L. Chaput, T. Tadano, and I. Tanaka, Implementation strategies in phonopy and phono3py, J. Phys. Condens. Matter 35 (2023).
Shinohara et al. [2023] K. Shinohara, A. Togo, and I. Tanaka, spgrep: On-the-fly generator of space-group irreducible representations, J. Open Source Softw. 8, 5269 (2023).
Eriksson et al. [2018] D. Eriksson, K. Dong, E. H. Lee, D. Bindel, and A. G. Wilson, Scaling gaussian process regression with derivatives, https://dl.acm.org/doi/pdf/10.5555/3327757.3327791, accessed: 2023-12-7.
Rupp et al. [2012] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Fast and accurate modeling of molecular atomization energies with machine learning, Physical Review Letters 108, 10.1103/physrevlett.108.058301 (2012).
Witt et al. [2023] W. C. Witt, C. van der Oord, E. Gelžinytė, T. Järvinen, A. Ross, J. P. Darby, C. H. Ho, W. J. Baldwin, M. Sachs, J. Kermode, N. Bernstein, G. Csányi, and C. Ortner, ACEpotentials.jl: A julia implementation of the atomic cluster expansion, J. Chem. Phys. 159 (2023).
Bezanson et al. [2017] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM review 59, 65 (2017).
Keeratikarn and Frost [2024] K. Keeratikarn and J. M. Frost, https://github.com/Frost-group/GPFC.jl (2021–2024).
Harvey [2017] M. Harvey, Imperial college research computing service (2017).

