Anharmonic phonons with Gaussian processes

Keerati Keeratikarn, Department of Physics, Imperial College London, Exhibition Road, London SW7 2AZ, UK
Jarvist Moore Frost, Department of Chemistry, Imperial College London, UK; Department of Physics, Imperial College London, Exhibition Road, London SW7 2AZ, UK, [email protected]
(June 6, 2024)
Abstract

We provide a method for calculating anharmonic lattice dynamics by building a surrogate model based on Gaussian Processes (GPs). Due to the underlying Gaussian form of a GP, the model is infinitely differentiable. This allows us to train the model directly on forces (derivatives of the potential energy surface, PES), reducing the number of evaluations required for a given accuracy. We can extend this differentiation to directly calculate second- and third-order force-constants using automatic differentiation (AD). For the five model materials we study, we find that the force-constants are in close agreement with a standard finite-displacement approach. Our method appears to scale linearly with the number of atoms when predicting both second- and third-order (anharmonic) force-constants.

Potential Energy Surface, Anharmonic Force Constants, Gaussian Processes
Anharmonicity; Gaussian processes; Phonon

I Introduction

Finite temperature properties of matter require a model for thermal motion. In crystals it is natural to describe the thermal motion as collective excitations around the equilibrium structure. The standard approach is to use a finite displacement method (FDM), based on a Taylor expansion of the potential energy surface (PES) around equilibrium. The second-order force constants provide the harmonic approximation to the PES; anharmonic contributions (required for finite thermal conductivity) require higher-order force constants. FDM-based calculations scale poorly both with the size of the system and with the order of the force constants.

The potential energy surface (PES) of a solid-state material directly provides the mechanical response of the material (Young’s modulus, yield strength), and also provides the environment in which the electronic structure exists. Thus it is key to both the electron-phonon coupling, which leads to finite electrical conductivity, and the phonon-phonon scattering, which leads to finite thermal conductivity.

For small perturbations around the equilibrium structure, a Taylor expansion of the potential energy surface can be made. Since the first-order (linear) term of this expansion is zero at equilibrium (i.e. there are no forces), the first non-zero term is the second-order harmonic contribution. Cutting off the expansion at this point (the harmonic approximation) leads to the mass-weighted ‘dynamical matrix’ of force-constants. Diagonalising this matrix produces normal modes (eigenvectors of the dynamical matrix) and vibrational frequencies (square roots of the eigenvalues of the dynamical matrix), which describe the phonon properties of a material. This change of basis is often used as a natural way in which to describe the response theory of solid-state materials. The phonon modes can be directly measured as a function of crystal momentum with neutron scattering experiments, and the gamma-point modes give rise to infrared absorption of materials (by the polarisation of a given mode), and Raman response (by the hyperpolarisation of a given mode).
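The expansion described above can be written out explicitly. The following is a sketch in generic notation that is an assumption of this note rather than notation taken from the rest of the paper: $u_i$ is the displacement of Cartesian degree of freedom $i$ from equilibrium, $m_i$ is the mass of the atom it belongs to, and $\Phi$ denotes the force-constant tensors. The dynamical matrix is written for a finite system (equivalently, the gamma point of a periodic one); at a general wavevector it acquires phase factors that are omitted here.

```latex
\begin{align}
E(\mathbf{u}) &= E_0
  + \sum_i \left.\frac{\partial E}{\partial u_i}\right|_{0} u_i
  + \frac{1}{2}\sum_{ij} \Phi_{ij}\, u_i u_j
  + \frac{1}{6}\sum_{ijk} \Phi_{ijk}\, u_i u_j u_k + \dots, \\
\Phi_{ij} &= \left.\frac{\partial^2 E}{\partial u_i\, \partial u_j}\right|_{0},
\qquad
\Phi_{ijk} = \left.\frac{\partial^3 E}{\partial u_i\, \partial u_j\, \partial u_k}\right|_{0}, \\
D_{ij} &= \frac{\Phi_{ij}}{\sqrt{m_i m_j}},
\qquad
\sum_j D_{ij}\, e^{(\nu)}_j = \omega_\nu^2\, e^{(\nu)}_i .
\end{align}
```

The linear term vanishes at equilibrium, the eigenvectors $e^{(\nu)}$ are the normal modes, and the eigenvalues are the squared mode frequencies $\omega_\nu^2$.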

These normal modes do not interact, and therefore have an infinite lifetime; thus a harmonic picture of matter directly leads to the prediction of infinite thermal conductivity. Anharmonic corrections to this picture can be added via many-body perturbation theory, where second quantisation is used to treat these vibrational states as bosonic quasi-particles, the phonons.

The standard approach to calculate phonons is via a finite displacement method. For harmonic phonons, this requires moving each atom a small displacement separately in $\{x, y, z\}$, requiring $3^{2}N^{2}$ evaluations of the total energy. (Typically these forces come from an electronic structure method, such as density functional theory.)

To include third-order force-constants in a finite displacement method, one must consider each atom moving in $\{x, y, z\}$ combined with every other atom moving in $\{x, y, z\}$, requiring $3^{3}N^{3}$ separate evaluations of the total energy.
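To give a feel for how quickly these counts grow, the short sketch below evaluates $(3N)^{n}$ for force constants of order $n$. The function name is hypothetical, and the count is the naive one quoted in the text: it takes no account of crystal symmetry, which practical codes use to reduce the number of displacements.

```python
def fdm_energy_evaluations(n_atoms: int, order: int) -> int:
    """Naive count of total-energy evaluations, (3N)^order, as quoted in the text."""
    if n_atoms < 1 or order < 1:
        raise ValueError("n_atoms and order must be positive integers")
    return (3 * n_atoms) ** order


if __name__ == "__main__":
    # Compare harmonic (order 2) and third-order counts for a few cell sizes.
    for n_atoms in (2, 8, 64):
        second = fdm_energy_evaluations(n_atoms, 2)
        third = fdm_energy_evaluations(n_atoms, 3)
        print(f"N = {n_atoms:3d}   2nd order: {second:8d}   3rd order: {third:10d}")
```

Going from second to third order multiplies the count by a further factor of $3N$, which is the scaling problem the surrogate model is intended to avoid.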

This prohibitive scaling, combined with the minimal $O(N^{3})$ scaling of electronic structure methods, severely limits the size of systems for which we can predict the thermal conductivity. This is a major limitation in humanity’s ability to design materials with specific thermal characteristics. A particular technical application is in the design of thermoelectrics, where one wants to minimise thermal conductivity while maximising electrical conductivity, and for which state-of-the-art materials are extremely complex to maximise phonon scattering.

From an information theoretic point of view, this restriction is curious. There is no more information embedded in the third-order derivatives than the second. In fact, the full potential energy surface is present in every calculation, and in electronic structure methods the evaluation of the more complicated system has already been paid for by the $O(N^{3})$ scaling of the electronic structure method.

As an alternative approach, one could use a surrogate method more sophisticated than fitting the individual terms of a Taylor expansion. In this work we use Gaussian processes as a machine-learning surrogate for the potential energy surface. Gaussian processes (GPs) are a Bayesian (probabilistic) machine learning method which can fit an arbitrary function. During the last 15 years, a number of methods have been developed using Gaussian processes to fit potential energy surfaces[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

Our particular focus is motivated by the fact that every linear operation on a Gaussian process results in a (transformed) Gaussian process. Differentiation is one such linear operation. In the context of force-constants, this means that the hierarchy of force-constants ($\frac{\partial^{N}E}{\partial r^{N}}$) can be used to condition (fit) the Gaussian process energy surface, and can also be calculated directly from the fitted Gaussian process. This enables a maximal amount of information to be extracted from each individual electronic structure calculation, and allows force-constants of arbitrary order to be calculated without further calculation. As a particular technical application, this may enable the calculation of anharmonic phonons for larger unit cells at a lower computational cost.

Gaussian processes have recently been applied to fitting low-dimensional potential energy surfaces for polarised water [1], $\mathbf{N}_{4}$ [5], a single Si crystal [10, 11], methyl chloride [6] and even chemical reactions [8, 7, 9]. Wu, Aoi and Pillow provide a general Bayesian optimisation method that makes use of gradient information [14]. This follows from earlier work which concentrated on low-dimensional observations of a dynamic model [15]. Gaussian process models of potential energy surfaces and forces have found considerable application in molecular dynamics of materials [2, 3].

Gaussian processes scale poorly with respect to the number of training points $m$: as $O(m^{3})$ from the inversion of an $m \times m$ matrix to normalise the model, and as $O(m)$ from the dot product with a $1 \times m$ matrix to evaluate the model [13, 16].

Several techniques are used to minimise the amount of training data required. One such technique is to increase the complexity of the Gaussian process model by combining two or more kernel functions. Dai and Krems [12] illustrated the use of composite kernels to interpolate and extrapolate the 6-dimensional potential energy surface of $\mathbf{H}_{3}\mathbf{O}^{+}$. Subsequently, Sugisawa et al. [13] predicted a 51-dimensional potential energy surface for the protonated imidazole dimer using a full-dimensional GP model composed of several GP models for lower-dimensional molecular fragments.

Differentiating the Gaussian Process is another approach to reduce the training data required. Garijo del Río et al.[17] used Gaussian processes as a surrogate model of the anharmonic potential energy surface to accelerate structural relaxation, providing the GPMin function in the Atomic Simulation Environment (ASE) package. However, their Gaussian process directly fits a $1+3N$ vector of the total energy and the $3N$ forces. They do not exploit the linear nature of the differentiation operation, and therefore the ability to build a nested Gaussian process.

More similar to our work, Kaappa et al.[18] used a gradient-aided model to enhance global structure prediction. Asnaashari and Krems[19] added composite kernels to boost the accuracy of force and energy prediction for large molecules.

As far as we know, we are the first to take this approach to higher derivatives (anharmonic force constants), and to apply it to ‘lattice dynamics’.

An alternative approach to accelerate anharmonic calculations is that of the HiPhive [20] extension of cluster-expansion. The potential of the atomic environment is constructed from a sum of two-body, three-body and four-body clusters. In these clusters, the interaction of a pair (or larger group) of atoms is described by harmonic (or higher-order anharmonic) force constants. These force constants are parameterised by fitting a machine-learning model to previously calculated forces associated with the input atomic configurations.

II Methods

The key development challenge in our work was building derivatives into the Gaussian processes. We did this by combining analytic derivatives of the Gaussian process kernels with automatic differentiation (which recursively applies the chain rule of differentiation) at the computer code level. McHutchon’s technical note was extremely useful in providing a number of Gaussian process kernel derivatives[21].

II.1 Gaussian Process model

We begin by discussing standard Gaussian process regression, in the context of the potential energy surface and its derivatives.

To predict potential energy surfaces by Gaussian processes, a potential energy $E(\mathbf{x}^{*})$ can be written in terms of the sum of arbitrary basis functions $\phi_{h}$ as a function of an atomic configuration (descriptor) $\mathbf{x}^{*}$, which can be expressed as

$$E(\mathbf{x}^{*}) = \sum_{h=1}^{H} w_{h}\phi_{h}(\mathbf{x}^{*}) = \mathbf{w}^{\mathrm{T}}\Phi(\mathbf{x}^{*}). \tag{1}$$

The descriptor $\mathbf{x}^{*}$ can be chosen to represent a set of atomic positions in Cartesian coordinates for an $N$-atom system,

$$\mathbf{x}^{*} := \{x^{*}_{1}, y^{*}_{1}, z^{*}_{1}, \ldots, x^{*}_{N}, y^{*}_{N}, z^{*}_{N}\}. \tag{2}$$

The weights $w_{h}$ of the $H$-dimensional weight vector $\mathbf{w}$, corresponding to the basis vector $\Phi$, are given a Gaussian (normal) prior distribution. Each weight is distributed as $P(w_{h}) := \mathcal{N}(w_{h};\,0,\,\sigma_{w}^{2})$, with zero mean and variance $\sigma_{w}^{2}$. Using the expression in (1), we can write the covariance between a prior potential energy $E(\mathbf{x})$ and a posterior $E(\mathbf{x}^{*})$ as

$$\left\langle E(\mathbf{x})E(\mathbf{x}^{*})\right\rangle = \sum_{hh'}^{H}\langle w_{h}w_{h'}\rangle\,\phi_{h}(\mathbf{x})\phi_{h'}(\mathbf{x}^{*}) = \sum_{hh'}^{H}\left(\int \mathrm{d}^{H}\mathbf{w}\,P(\mathbf{w})\,w_{h}w_{h'}\right)\phi_{h}(\mathbf{x})\phi_{h'}(\mathbf{x}^{*}). \tag{3}$$

Since the weights are independent, zero-mean Gaussian variables, the integral yields $\sigma_{w}^{2}\delta_{hh'}$, so we have

$$\left\langle E(\mathbf{x})E(\mathbf{x}^{*})\right\rangle = \sum_{hh'}^{H}\left(\sigma_{w}^{2}\delta_{hh'}\right)\phi_{h}(\mathbf{x})\phi_{h'}(\mathbf{x}^{*}) = \sigma_{w}^{2}\sum_{h}^{H}\phi_{h}(\mathbf{x})\phi_{h}(\mathbf{x}^{*}). \tag{4}$$

The inner product between the basis functions evaluated at two atomic descriptors defines a similarity measure between those descriptors, and can be rewritten as a kernel (covariance) function

$$k(\mathbf{x}, \mathbf{x}^{*}) = \sum_{h=1}^{H}\phi_{h}(\mathbf{x})\phi_{h}(\mathbf{x}^{*}). \tag{5}$$

We can use this kernel function to perform Gaussian process regression, predicting a posterior potential energy $E(\mathbf{x}^{*})$ from a prior dataset $\mathbf{E}$ of previously calculated potential energies. The prior dataset consists of $m$ potential energies corresponding to $m$ atomic descriptors, i.e. $\mathbf{E} := \left[E_{1}\ E_{2} \cdots E_{m}\right]^{\mathrm{T}}$ and $\mathbf{X} := \{\mathbf{x}_{1}, \mathbf{x}_{2}, \cdots, \mathbf{x}_{m}\}$ respectively. Each prior $E_{i}$ can have an observation noise $\sigma_{e}$, due to the error or convergence limitation of the electronic structure calculations: $E_{i} = E(\mathbf{x}_{i}) + \sigma_{e}$. If each prior $E_{i}$ is independently (univariate) Gaussian distributed, the covariance of any two of the prior data is $\langle E_{i}E_{j}\rangle = k(\mathbf{x}_{i}, \mathbf{x}_{j}) + \sigma^{2}_{e}\delta_{ij}$, and we can write the prior probability distribution of each $E_{i}$, with zero mean and variance $k(\mathbf{x}_{i}, \mathbf{x}_{i}) + \sigma^{2}_{e}$, as

$$P(E_{i}) = \frac{\exp\left(-\frac{1}{2}\left(k(\mathbf{x}_{i}, \mathbf{x}_{i}) + \sigma^{2}_{e}\right)^{-1} E_{i}^{2}\right)}{\sqrt{2\pi\left(k(\mathbf{x}_{i}, \mathbf{x}_{i}) + \sigma^{2}_{e}\right)}}. \tag{6}$$

Therefore, the probability distribution of all priors (marginal likelihood) is a multivariate Gaussian distribution with zero mean and the $m \times m$ covariance matrix $\langle\mathbf{E}\mathbf{E}^{\mathrm{T}}\rangle = K(\mathbf{X}, \mathbf{X}) + \sigma^{2}_{e}\mathbb{I}_{mm}$, expressed as

$$P(\mathbf{E}) = \frac{\exp\left(-\frac{1}{2}\mathbf{E}^{\mathrm{T}}\cdot\left[K(\mathbf{X}, \mathbf{X}) + \sigma^{2}_{e}\mathbb{I}_{mm}\right]^{-1}\cdot\mathbf{E}\right)}{\sqrt{(2\pi)^{m}\det\left[K(\mathbf{X}, \mathbf{X}) + \sigma^{2}_{e}\mathbb{I}_{mm}\right]}}. \tag{7}$$

To make a Gaussian process regression function $E(\mathbf{x}^{*})$, we use Bayesian inference to calculate the conditional probability distribution of $E(\mathbf{x}^{*})$, given the prior data $\mathbf{E}$:

$$P\left(E(\mathbf{x}^{*})\,|\,\mathbf{E}\right) = \frac{P\left(\mathbf{E},\, E(\mathbf{x}^{*})\right)}{P\left(\mathbf{E}\right)}. \tag{8}$$

This is also normally distributed, since $P\left(\mathbf{E},\, E(\mathbf{x}^{*})\right)$ is a Gaussian joint distribution. By substituting (7) into (8) and manipulating it algebraically following the derivation in [22], we can write the posterior mean of $\left(E(\mathbf{x}^{*})\,|\,\mathbf{E}\right)$ as

$$E(\mathbf{x}^{*}) = k(\mathbf{x}^{*}, \mathbf{X})\cdot\left[K(\mathbf{X}, \mathbf{X}) + \sigma^{2}_{e}\mathbb{I}_{mm}\right]^{-1}\cdot\mathbf{E}, \tag{9}$$

where $k(\mathbf{x}^{*}, \mathbf{X})$ is a $1 \times m$ covariance matrix between the posterior $E(\mathbf{x}^{*})$ and the dataset $\mathbf{E}$.
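As an illustration of the posterior mean (9), the following minimal sketch fits a one-dimensional toy energy surface. Everything specific in it is an assumption made for illustration: the squared-exponential kernel and its length scale, the made-up harmonic training data, the helper names, and the hand-written Cholesky solve, which is used only to keep the sketch free of dependencies (a linear-algebra library would normally be used instead).

```python
import math


def rbf(x, y, sigma_o=1.0, length=0.5):
    """Squared-exponential kernel between two scalar descriptors (illustrative choice)."""
    return sigma_o * math.exp(-((x - y) ** 2) / (2.0 * length**2))


def cholesky(a):
    """Lower-triangular factor L with a = L L^T, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit(xs, energies, sigma_e=1e-4):
    """Return alpha = [K(X, X) + sigma_e^2 I]^{-1} E, the training-dependent part of Eq. (9)."""
    m = len(xs)
    cov = [[rbf(xs[i], xs[j]) + (sigma_e**2 if i == j else 0.0) for j in range(m)]
           for i in range(m)]
    return cho_solve(cholesky(cov), energies)


def posterior_mean(x_star, xs, alpha):
    """Eq. (9): E(x*) = k(x*, X) . alpha, an O(m) dot product per prediction."""
    return sum(rbf(x_star, x) * a for x, a in zip(xs, alpha))


# Toy one-dimensional harmonic well sampled at five displacements (made-up data).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
energies = [x**2 for x in xs]
alpha = fit(xs, energies)

for x_star in (0.0, 0.25, 0.5):
    print(f"x* = {x_star:5.2f}  GP mean = {posterior_mean(x_star, xs, alpha): .6f}")
```

The expensive step (factorising the $m \times m$ matrix) is done once in `fit`; each subsequent prediction is a single dot product with the stored vector `alpha`.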

Gaussian processes are well suited to predicting potential energy surfaces, since the model depends only on a kernel that is a function of pairs of atomic descriptors (identical or distinct) and is trained directly on the prior potential energy data. There is no need to determine the basis functions and the corresponding weights explicitly for the prediction.

However, it is possible to define a basis set in order to model nonlinear functions where there is a nonlinear relationship among the corresponding descriptors. For instance, instead of representing atomic configurations in the Cartesian basis we could use spherical harmonics as basis functions, making the model equivariant, as used to great effect in recent works oriented towards molecular dynamics[3, 23, 24, 25]. This approach is also known as the “kernel trick”.

Commonly used Gaussian process kernel functions allow us to define the similarity measure between atomic descriptors in our Gaussian process model. Essentially these contain a prior belief in how much influence nearby data points should have on the underlying model. They generally encode the belief that physics is myopic: nearby data points should have a greater influence on the model.

The standard and mathematically convenient kernel is the squared exponential (Gaussian or Radial Basis Function: RBF) kernel,

$$k(\mathbf{x}, \mathbf{x}^{*}) = \sigma_{o}\exp\left(-\frac{|\mathbf{x} - \mathbf{x}^{*}|^{2}}{2l^{2}}\right), \tag{10}$$

where $\sigma_{o}$ is the kernel scale and $l$ is the length scale on which influence decays.
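A direct transcription of the squared-exponential kernel (10) for flat Cartesian descriptors might look as follows. This is a sketch only: the function name, the default parameter values and the two-atom coordinates are made up for illustration.

```python
import math


def rbf_kernel(x, x_star, sigma_o=1.0, length=1.0):
    """Eq. (10): sigma_o * exp(-|x - x*|^2 / (2 l^2)) for flat coordinate lists."""
    if len(x) != len(x_star):
        raise ValueError("descriptors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_star))
    return sigma_o * math.exp(-sq_dist / (2.0 * length**2))


# Two-atom toy configurations, ordered [x1, y1, z1, x2, y2, z2] (values are made up).
reference = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
displaced = [0.0, 0.0, 0.0, 1.1, 0.0, 0.0]

# A short length scale makes the two configurations look dissimilar,
# a long one makes them look nearly identical.
for length in (0.05, 0.5):
    print(length, rbf_kernel(reference, displaced, length=length))
```

The kernel equals its prefactor when the two descriptors coincide and decays with their Euclidean distance, so the length scale controls how far the influence of a training configuration extends.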

Simpler kernels include a dot-product (or linear) kernel defined as

$$k(\mathbf{x}, \mathbf{x}^{*}) = \mathbf{x}\cdot\mathbf{x}^{*} \tag{11}$$

and its generalisation, the polynomial kernel

$$k(\mathbf{x}, \mathbf{x}^{*}) = (\mathbf{x}\cdot\mathbf{x}^{*})^{\zeta}, \tag{12}$$

where $\zeta$ is the degree of the polynomial.
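The polynomial kernel gives a concrete case of the basis-function sum (5): for degree 2 in two dimensions, the kernel equals the inner product of three explicit basis functions. The sketch below checks this numerically; the particular feature map and the test vectors are chosen here for illustration and do not come from the paper.

```python
import math


def polynomial_kernel(x, x_star, zeta=2):
    """Eq. (12): (x . x*)^zeta."""
    return sum(a * b for a, b in zip(x, x_star)) ** zeta


def quadratic_features(x):
    """Explicit basis functions phi_h for zeta = 2 in two dimensions."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, x_star = [0.3, -1.2], [2.0, 1.5]

# Implicit evaluation via the kernel, and explicit evaluation via Eq. (5).
implicit = polynomial_kernel(x, x_star, zeta=2)
explicit = sum(p * q for p, q in zip(quadratic_features(x), quadratic_features(x_star)))
print(implicit, explicit)
```

The two numbers agree, which is why the basis functions never need to be constructed explicitly: evaluating the kernel is enough.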

There are many other kernels, such as the Matérn, periodic and rational quadratic kernels, whose definitions can be found in Chapter 4 of Rasmussen’s book[16].

In the conditioning (training) process, we may want to undertake hyperparameter optimisation, to set the observation noise σesubscript𝜎𝑒\sigma_{e}italic_σ start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, the kernel scale σosubscript𝜎𝑜\sigma_{o}italic_σ start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT and the length scale l𝑙litalic_l for (10). One approach we implement is to maximise the logarithm of the marginal likelihood (7):

logP=12ET[K(X,X)+σe2𝕀mm]E12log(det[K(X,X)+σe2𝕀mm])m2log2π𝑃12superscriptETdelimited-[]𝐾XXsubscriptsuperscript𝜎2𝑒subscript𝕀𝑚𝑚E12delimited-[]𝐾XXsubscriptsuperscript𝜎2𝑒subscript𝕀𝑚𝑚𝑚22𝜋\begin{split}\log&P=-\frac{1}{2}\textbf{E}^{\text{T}}\cdot\left[K(\textbf{X},% \>\textbf{X})+\sigma^{2}_{e}\mathbb{I}_{mm}\right]\cdot\textbf{E}\\ &-\frac{1}{2}\log\left(\det\left[K(\textbf{X},\>\textbf{X})+\sigma^{2}_{e}% \mathbb{I}_{mm}\right]\right)-\frac{m}{2}\log 2\pi\end{split}start_ROW start_CELL roman_log end_CELL start_CELL italic_P = - divide start_ARG 1 end_ARG start_ARG 2 end_ARG E start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT ⋅ [ italic_K ( X , X ) + italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT blackboard_I start_POSTSUBSCRIPT italic_m italic_m end_POSTSUBSCRIPT ] ⋅ E end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_log ( roman_det [ italic_K ( X , X ) + italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT blackboard_I start_POSTSUBSCRIPT italic_m italic_m end_POSTSUBSCRIPT ] ) - divide start_ARG italic_m end_ARG start_ARG 2 end_ARG roman_log 2 italic_π end_CELL end_ROW (13)

whose derivatives with respect to the hyperparameters vanish at the optimum.
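As a minimal sketch of how the objective in (13) might be evaluated numerically, the Python function below computes the log marginal likelihood through a Cholesky factorisation rather than an explicit inverse and determinant. The function name and arguments are illustrative and are not taken from the GPFC implementation; `K` is assumed to be the $m\times m$ kernel matrix $K(\mathbf{X},\mathbf{X})$ already built from the chosen kernel.

```python
import numpy as np

def log_marginal_likelihood(K, E, sigma_e):
    """Log marginal likelihood of m observed energies E under a zero-mean GP.

    K is the (m, m) kernel matrix and sigma_e the observation noise.
    """
    m = E.size
    Ky = K + sigma_e**2 * np.eye(m)
    L = np.linalg.cholesky(Ky)                            # Ky = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, E))   # Ky^{-1} E
    data_fit = -0.5 * E @ alpha
    complexity = -np.sum(np.log(np.diag(L)))              # = -1/2 log det Ky
    return data_fit + complexity - 0.5 * m * np.log(2.0 * np.pi)
```

In practice a value of this kind would be handed to a numerical optimiser that varies $\sigma_{e}$, $\sigma_{o}$ and $l$; the choice of optimiser is left open here.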

So far everything we have discussed is a standard Gaussian process, adapted to model a potential energy surface. Such a model requires many electronic structure calculations to train, as it only makes use of the total energy of the system (a single scalar value) per calculation. In the next section, we propose a derivative Gaussian process model of the potential energy surface, which additionally uses the derivatives of the energy (forces), often provided ‘for free’ by the Hellmann-Feynman theorem in electronic structure calculations. This significantly reduces the number of calculations required to train the model.

II.2 Derivative model

To make use of force (partial derivative of energy with respect to displacement) information we need to take the derivative of the Gaussian process[15, 14]: kernel functions need to be differentiated with respect to the descriptors, up to the second order.

Consider the first derivative of the kernel function $k(\mathbf{x},\mathbf{x}^{*})$ with respect to the descriptor $\mathbf{x}$ of $N$ atoms in the Cartesian representation, $\mathbf{x}\in\mathbb{R}^{3N}$. The derivative can be written as a $3N\times 1$ kernel matrix (the Jacobian, a first rank tensor)

$$\left[\nabla_{\mathbf{x}}k(\mathbf{x},\,\mathbf{x}^{*})\right]^{\mathrm{T}} = \left[\frac{\partial k}{\partial x_{1}}\ \ \frac{\partial k}{\partial x_{2}}\ \cdots\ \frac{\partial k}{\partial x_{3N}}\right]^{\mathrm{T}}. \tag{14}$$
Figure 1: Diagram (1a) shows the tensor contraction, Eq. (20), of the derivative GP model trained on one datapoint at $\mathbf{x}$ in Cartesian coordinates. This results in (1b), the harmonic force constant $\Phi^{(2)}$ at $\mathbf{x}^{*}$, of the $2\times 1\times 1$ Si-bulk illustrated in (1c).

In the derivative model, this matrix is used as the covariance matrix which links the force information at $\mathbf{x}$ to the potential energy at $\mathbf{x}^{*}$. Similarly, we can define the covariance matrix between the potential energy at $\mathbf{x}$ and the forces at $\mathbf{x}^{*}$ by taking the first derivative of the kernel with respect to $\mathbf{x}^{*}$:

$$\left[\nabla_{\mathbf{x}^{*}}k(\mathbf{x},\,\mathbf{x}^{*})\right]^{\mathrm{T}} = \left[\frac{\partial k}{\partial x^{*}_{1}}\ \ \frac{\partial k}{\partial x^{*}_{2}}\ \cdots\ \frac{\partial k}{\partial x^{*}_{3N}}\right]^{\mathrm{T}}. \tag{15}$$

Finally, we need the model marginal likelihood, which contains the covariance of the forces at $\mathbf{x}$ and $\mathbf{x}^{*}$. For this we require the Hessian of the kernel with respect to those two descriptors, resulting in a $3N\times 3N$ second rank tensor covariance matrix,

$$\nabla_{\mathbf{x}}\nabla_{\mathbf{x}^{*}}k = \begin{bmatrix}\frac{\partial^{2}k}{\partial x_{1}\partial x_{1}^{*}} & \frac{\partial^{2}k}{\partial x_{1}\partial x_{2}^{*}} & \cdots & \frac{\partial^{2}k}{\partial x_{1}\partial x_{3N}^{*}}\\ \frac{\partial^{2}k}{\partial x_{2}\partial x_{1}^{*}} & \frac{\partial^{2}k}{\partial x_{2}\partial x_{2}^{*}} & & \vdots\\ \vdots & & \ddots & \\ \frac{\partial^{2}k}{\partial x_{3N}\partial x_{1}^{*}} & \cdots & & \frac{\partial^{2}k}{\partial x_{3N}\partial x_{3N}^{*}}\end{bmatrix}. \tag{16}$$
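For concreteness, the blocks (14) to (16) have simple closed forms for a squared-exponential kernel. The expressions below assume $k(\mathbf{x},\mathbf{x}^{*})=\sigma_{o}^{2}\exp\left(-|\mathbf{x}-\mathbf{x}^{*}|^{2}/2l^{2}\right)$, which is our reading of the kernel with scale $\sigma_{o}$ and length scale $l$ referred to as (10); other kernels give different expressions.

```latex
\begin{aligned}
\frac{\partial k}{\partial x_{i}} &= -\frac{x_{i}-x_{i}^{*}}{l^{2}}\,k, \qquad
\frac{\partial k}{\partial x_{i}^{*}} = +\frac{x_{i}-x_{i}^{*}}{l^{2}}\,k,\\
\frac{\partial^{2}k}{\partial x_{i}\,\partial x_{j}^{*}} &= \left[\frac{\delta_{ij}}{l^{2}}-\frac{(x_{i}-x_{i}^{*})(x_{j}-x_{j}^{*})}{l^{4}}\right]k .
\end{aligned}
```

Under this assumption, setting $\mathbf{x}=\mathbf{x}^{*}$ makes both gradient blocks vanish and reduces the Hessian block to $(\sigma_{o}^{2}/l^{2})\,\mathbb{I}$.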

With only one prior energy $E(\mathbf{x}_{1})$ and the corresponding forces $\nabla E(\mathbf{x}_{1})$, the marginal likelihood matrix can be constructed by combining (10), (14), (15) and (16). This $(1+3N)\times(1+3N)$ marginal likelihood covariance matrix is

$$\mathcal{K}(\mathbf{x}_{1},\,\mathbf{x}_{1}) = \begin{bmatrix}k & (\nabla_{\mathbf{x}}k)^{\mathrm{T}}\\ \nabla_{\mathbf{x}^{*}}k & \nabla_{\mathbf{x}}\nabla_{\mathbf{x}^{*}}k\end{bmatrix}_{\mathbf{x},\,\mathbf{x}^{*}=\mathbf{x}_{1}} + \Sigma, \tag{17}$$

where the $(1+3N)\times(1+3N)$ observation noise matrix $\Sigma$ is

$$\Sigma := \begin{bmatrix}\sigma^{2}_{e} & \mathbf{0}\\ \mathbf{0} & \sigma^{2}_{f}\cdot\mathbb{I}_{3N\times 3N}\end{bmatrix}, \tag{18}$$

where $\sigma_{e}$ and $\sigma_{f}$ are the observation noises for the prior energy and force respectively. One choice for these noises is the energy and force convergence thresholds of the underlying electronic structure calculations used for training.

Therefore the potential energy prediction, analogous to (9), is

$$E(\mathbf{x}^{*}) = \begin{bmatrix}k & (\nabla_{\mathbf{x}^{*}}k)^{\mathrm{T}}\end{bmatrix}_{\mathbf{x}^{*},\,\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\,\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}}, \tag{19}$$

where the subscripts indicate the descriptors at which the functions are evaluated.
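The single-point construction of (17) to (19) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the GPFC implementation: it assumes a squared-exponential kernel, writes the kernel derivatives by hand, works with energy gradients (forces are their negatives), and all function names are hypothetical.

```python
import numpy as np

def se_blocks(r, sigma_o, l):
    """Squared-exponential kernel k with gradient g and Hessian H w.r.t. r.

    r = x_star - x_train, so derivatives w.r.t. x_train carry an extra sign.
    """
    k = sigma_o**2 * np.exp(-0.5 * (r @ r) / l**2)
    g = -(r / l**2) * k
    H = (np.outer(r, r) / l**4 - np.eye(r.size) / l**2) * k
    return k, g, H

def fit_single_point(E1, gradE1, sigma_o, l, sigma_e, sigma_f):
    """Weights K^{-1} [E, grad E] for one training configuration."""
    D = gradE1.size
    k, g, H = se_blocks(np.zeros(D), sigma_o, l)
    K = np.zeros((1 + D, 1 + D))
    K[0, 0] = k          # cov(energy, energy)
    K[0, 1:] = -g        # cov(energy, gradient)
    K[1:, 0] = g         # cov(gradient, energy)
    K[1:, 1:] = -H       # cov(gradient, gradient)
    K += np.diag(np.r_[sigma_e**2, np.full(D, sigma_f**2)])  # noise matrix
    return np.linalg.solve(K, np.r_[E1, gradE1])

def predict_energy(x_star, x1, weights, sigma_o, l):
    """Posterior-mean energy at x_star given the single training point x1."""
    k, g, _ = se_blocks(x_star - x1, sigma_o, l)
    return np.r_[k, -g] @ weights
```

With small noise, the prediction reproduces the training energy at $\mathbf{x}_{1}$, and its slope there reproduces the training gradient, which is the sense in which one calculation supplies $1+3N$ pieces of information.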

Motivated by the derivative model, the differentiation can be extended to compute the second order (harmonic) and the third order (cubic anharmonic) force constants.

Figure 2: Diagram (2a) illustrates Eq. (20) using a phonon descriptor. The result (2b) is the dynamical matrix $\mathbf{D}(\mathbf{d}_{\mathbf{q}})$ instead of the harmonic force constant, which corresponds to the analysis of the primitive Si crystal (2c).

To predict the harmonic force constant, it is necessary to evaluate the second and third order derivatives of the kernel with respect to those descriptors, yielding

$$(\nabla_{\mathbf{x}^{*}})^{2}k\quad\text{and}\quad(\nabla_{\mathbf{x}^{*}})^{2}\nabla_{\mathbf{x}}k.$$

These correspond to $(3N\times 3N)$ rank-2 and $(3N\times 3N\times 3N)$ rank-3 tensors, which give the covariance linking the harmonic force constant at $\mathbf{x}^{*}$ to the potential energy and to the forces at $\mathbf{x}$, respectively. We can then evaluate (19) again with these covariance tensors substituted:

$$\Phi^{(2)}(\mathbf{x}^{*}) = \begin{bmatrix}(\nabla_{\mathbf{x}^{*}})^{2}k\\ (\nabla_{\mathbf{x}^{*}})^{2}\nabla_{\mathbf{x}}k\end{bmatrix}^{\mathrm{T}}_{\mathbf{x}^{*},\,\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\,\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}}, \tag{20}$$

where $\Phi^{(2)}$ is the $(3N\times 3N)$ harmonic force constant matrix, containing the correlation between all possible degrees of freedom of $\mathbf{x}^{*}$. Eq. (20) can be constructed as a tensor, which is then contracted, as in Figure 1.
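The contraction in (20) can be illustrated with a small NumPy sketch. Several things here are assumptions rather than the authors' method: the kernel is taken to be squared-exponential, the derivatives are written out by hand instead of being obtained by automatic differentiation, the single training point of (17) is generalised to $m$ points by stacking blocks, and the two-dimensional toy potential and all names are hypothetical.

```python
import numpy as np

def se_derivs(r, sigma_o, l):
    """k and its 1st, 2nd and 3rd derivatives w.r.t. r = x_star - x_train."""
    I = np.eye(r.size)
    k = sigma_o**2 * np.exp(-0.5 * (r @ r) / l**2)
    g = -(r / l**2) * k
    H = (np.outer(r, r) / l**4 - I / l**2) * k
    sym = (np.einsum("ia,j->ija", I, r) + np.einsum("ja,i->ija", I, r)
           + np.einsum("ij,a->ija", I, r))
    T = (sym / l**4 - np.einsum("i,j,a->ija", r, r, r) / l**6) * k
    return k, g, H, T

def fit(X, E, G, sigma_o, l, sigma_e, sigma_f):
    """Solve K w = [E_1, grad E_1, E_2, grad E_2, ...] for m training points."""
    m, D = X.shape
    n = 1 + D
    K = np.zeros((m * n, m * n))
    for p in range(m):
        for q in range(m):
            k, g, H, _ = se_derivs(X[p] - X[q], sigma_o, l)
            blk = K[p * n:(p + 1) * n, q * n:(q + 1) * n]  # view into K
            blk[0, 0] = k
            blk[0, 1:] = -g
            blk[1:, 0] = g
            blk[1:, 1:] = -H
    K += np.diag(np.tile(np.r_[sigma_e**2, np.full(D, sigma_f**2)], m))
    y = np.concatenate([np.r_[E[p], G[p]] for p in range(m)])
    return np.linalg.solve(K, y)

def predict(x_star, X, w, sigma_o, l):
    """Posterior-mean gradient and Hessian (second order force constants)."""
    m, D = X.shape
    n = 1 + D
    grad, hess = np.zeros(D), np.zeros((D, D))
    for q in range(m):
        _, g, H, T = se_derivs(x_star - X[q], sigma_o, l)
        w0, wg = w[q * n], w[q * n + 1:(q + 1) * n]
        grad += g * w0 - H @ wg
        hess += H * w0 - np.einsum("ija,a->ij", T, wg)
    return grad, hess

# Toy 2D potential (hypothetical): E = 1/2 x^T A x + c x_0^2 x_1
A, c = np.array([[2.0, 0.3], [0.3, 1.0]]), 0.2
X = np.array([[0.0, 0.0], [0.5, 0.5], [-0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]])
E = np.array([0.5 * x @ A @ x + c * x[0]**2 * x[1] for x in X])
G = np.array([A @ x + c * np.array([2 * x[0] * x[1], x[0]**2]) for x in X])
w = fit(X, E, G, sigma_o=1.0, l=1.0, sigma_e=1e-2, sigma_f=1e-2)
grad0, phi2 = predict(np.zeros(2), X, w, sigma_o=1.0, l=1.0)
```

Here `phi2` plays the role of $\Phi^{(2)}$ at the origin and can be compared with the matrix `A` of the toy potential; how closely they agree depends on the hyperparameters and training points chosen. One design point worth noting: for this kernel a single training point placed exactly at $\mathbf{x}^{*}$ yields a Hessian proportional to the identity, so the sketch uses several displaced configurations.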

We propose a similar method for predicting the (cubic) anharmonic force constant. The covariance tensor correlating the cubic anharmonic force constant to the potential energy and to the forces can be written as

$$(\nabla_{\mathbf{x}^{*}})^{3}k\quad\text{and}\quad(\nabla_{\mathbf{x}^{*}})^{3}\nabla_{\mathbf{x}}k$$

respectively. We use these rank-3 and rank-4 covariance tensors to recast (20) as

$$\Phi^{(3)}(\mathbf{x}^{*}) = \begin{bmatrix}(\nabla_{\mathbf{x}^{*}})^{3}k\\ (\nabla_{\mathbf{x}^{*}})^{3}\nabla_{\mathbf{x}}k\end{bmatrix}^{\mathrm{T}}_{\mathbf{x}^{*},\,\mathbf{x}_{1}}\cdot\mathcal{K}^{-1}_{\mathbf{x}_{1},\,\mathbf{x}_{1}}\cdot\begin{bmatrix}E\\ \nabla E\end{bmatrix}_{\mathbf{x}_{1}}, \tag{21}$$

where $\Phi^{(3)}$ is the cubic anharmonic force constant tensor of size $(3N\times 3N\times 3N)$.

With these methods of directly evaluating the force constants from our fitted Gaussian Process, we have all the necessary components for calculating the lattice dynamics of a material, including anharmonicity.

Figure 3: Panels (a), (b), (c), (d) and (e) show the phonon band structures and densities of states of d-Si (FCC), GaAs, CdTe, NaCl and PbTe respectively. Solid green lines are calculated from the harmonic force constants evaluated by Phonopy (with the finite displacement method), while the dashed red lines are calculated from the harmonic Gaussian process force constants.

II.3 Phonon coordinate representations

Instead of using the direct Cartesian representation, we also tried using the phonon coordinates (normal modes) directly with the Gaussian process. These normal modes form a natural basis for the motion of a material or molecule, and naturally encode its symmetries. Because of this, they are also the starting point for higher order physical models of lattice dynamics, such as many-body perturbation theory, which couples the normal modes through the anharmonicity.

Our hope was that this representation would increase the efficiency of the machine learning method (i.e. greater accuracy for fewer evaluations), without having to build any symmetries or equivariances directly into our methods. The normal modes are obtained by diagonalising the mass-weighted second-order force-constant matrix (the dynamical matrix).

Figure 4: Panels (a) and (b) show the FC2 and FC3 learning curves via the acoustic sum rule. Yellow, red, green, blue and pink colours represent d-Si (FCC), GaAs, CdTe, NaCl and PbTe. The solid and dashed lines indicate the force constants in the Cartesian and phonon basis (evaluated at the $\Gamma$ point).

To build phonon descriptors, we start by considering this dynamical matrix[26, 27, 28] at wave vector $\mathbf{q}$:

$$D_{Ai,Bj}(\mathbf{q}) = \frac{1}{\sqrt{m_{A}m^{\prime}_{B}}}\sum_{l^{\prime}}\Phi^{(2)}_{0Ai,\,l^{\prime}Bj}\,\mathrm{e}^{i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})} \tag{22}$$

for component $i$ of atom $A$ and component $j$ of atom $B$, with masses $m_{A}$ and $m^{\prime}_{B}$ respectively. The summation is over all possible primitive cells $l^{\prime}$ in the supercell, where $0$ indicates the reference primitive cell. Here $\mathbf{R}^{o}$ is an atomic displacement from the equilibrium.
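To make the index structure of (22) concrete, the sketch below evaluates the dynamical matrix for a one-dimensional diatomic chain, so the Cartesian indices $i, j$ are dropped. The chain, its spring constant and masses, and the function name are hypothetical choices for illustration; the positions entering the phase factor are taken to be the equilibrium lattice sites.

```python
import numpy as np

def dynamical_matrix(q, phi, cells, offsets, masses, a=1.0):
    """D_AB(q) for a 1D chain, following the structure of Eq. (22).

    phi[c, A, B] couples atom A in the reference cell to atom B in cell cells[c].
    """
    R = cells[:, None] * a + offsets[None, :]                           # R[c, B]
    phase = np.exp(1j * q * (R[:, None, :] - offsets[None, :, None]))   # [c, A, B]
    return (phi * phase).sum(axis=0) / np.sqrt(np.outer(masses, masses))

# Hypothetical diatomic chain with nearest-neighbour springs.
K_spring, a = 1.0, 1.0
cells = np.array([-1, 0, 1])
offsets = np.array([0.0, 0.5 * a])          # atom A at 0, atom B at a/2
masses = np.array([1.0, 2.0])
phi = np.zeros((3, 2, 2))
phi[1] = [[2 * K_spring, -K_spring], [-K_spring, 2 * K_spring]]  # same cell
phi[0, 0, 1] = -K_spring                    # A in cell 0 with B in cell -1
phi[2, 1, 0] = -K_spring                    # B in cell 0 with A in cell +1

# Squared frequencies at the zone centre and zone boundary.
omega2_gamma = np.linalg.eigvalsh(dynamical_matrix(0.0, phi, cells, offsets, masses, a))
omega2_edge = np.linalg.eigvalsh(dynamical_matrix(np.pi / a, phi, cells, offsets, masses, a))
```

At the zone centre the two eigenvalues are the acoustic mode at zero and the optical mode at $2K(1/m_{A}+1/m_{B})$, which is a quick sanity check on the sign and mass-weighting conventions.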

Our phonon descriptors are derived from a simple change of basis, using these new eigenvectors of the dynamical matrix,

$$d_{Bj}(\mathbf{q}) = \sqrt{m^{\prime}_{B}}\sum_{l^{\prime}}\mathbf{x}_{l^{\prime}Bj}\,\mathrm{e}^{i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}, \tag{23}$$

where x are the original Cartesian descriptors [29].

The system symmetry is thereby imposed directly on the model: the model does not have to learn the symmetry by itself. Additionally, the model descriptors are collapsed from an $l\times l\times l$ supercell to the size of a $\kappa$-atom primitive cell, where the atomic environment contains $N=l^{3}\kappa$ atoms. The corresponding tensor contraction, Equation (20), can be reconstructed as shown in Figure 2.

For $m$ training points, the learning cost of $\mathcal{O}(3^{3}m^{3}N^{3})$ and the prediction cost of $\mathcal{O}(3mN)$ [16, 30] are reduced to $\mathcal{O}(3^{3}m^{3}\kappa^{3})$ and $\mathcal{O}(3m\kappa)$ respectively.
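To give a sense of scale, here is a worked example under the cost model quoted above, at fixed $m$, for a $2\times 2\times 2$ supercell of a material with a two-atom primitive cell such as diamond-structure Si ($l=2$, $\kappa=2$). Prefactors are ignored, so these are asymptotic ratios only.

```latex
N = l^{3}\kappa = 2^{3}\times 2 = 16, \qquad
\frac{3^{3}m^{3}N^{3}}{3^{3}m^{3}\kappa^{3}} = \left(\frac{N}{\kappa}\right)^{3} = 8^{3} = 512, \qquad
\frac{3mN}{3m\kappa} = \frac{N}{\kappa} = 8 .
```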

The predicted second order force constant (obtained from Equation (20)) will be in phonon coordinates, as a dynamical matrix at wave-vector $\mathbf{q}$. We can then recover the second order force constant in Cartesian coordinates by using the inverse discrete Fourier transform of the set of $N$ dynamical matrices, Equation (22), as follows:

$$\Phi^{(2)}_{0Ai,\,l^{\prime}Bj} = \frac{\sqrt{m_{A}m^{\prime}_{B}}}{N}\sum_{\mathbf{q}}D_{Ai,Bj}(\mathbf{q})\,\mathrm{e}^{-i\mathbf{q}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}. \tag{24}$$
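The pair (22) and (24) forms a discrete Fourier transform and its inverse when the wave vectors are commensurate with the supercell. A minimal round-trip check on a hypothetical one-dimensional monatomic ring is sketched below; the ring size, spring constant and variable names are illustrative, and $N$ is taken here to be the number of commensurate wave vectors.

```python
import numpy as np

n_cells, a, mass, K_spring = 8, 1.0, 1.0, 1.0
R = a * np.arange(n_cells)                               # cell positions on a ring
q = 2.0 * np.pi * np.arange(n_cells) / (n_cells * a)     # commensurate wave vectors

# Hypothetical force constants Phi(0, l') with nearest-neighbour springs.
phi = np.zeros(n_cells)
phi[0], phi[1], phi[-1] = 2 * K_spring, -K_spring, -K_spring

# Forward transform: D(q) = (1/m) sum_l' Phi(0, l') exp(+i q R_l')
D = (phi[None, :] * np.exp(1j * np.outer(q, R))).sum(axis=1) / mass

# Inverse transform: Phi(0, l') = (m/N) sum_q D(q) exp(-i q R_l')
phi_back = (mass / n_cells) * (D[:, None] * np.exp(-1j * np.outer(q, R))).sum(axis=0)
```

For this chain `D` reduces to the textbook dispersion $(4K/m)\sin^{2}(qa/2)$, and `phi_back` recovers the original real-space force constants up to rounding.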

Similarly, the third order force constant predicted in phonon coordinates can be transformed back to Cartesian coordinates with a similar expression,

$$\begin{aligned}\Phi^{(3)}_{0Ai,\,l^{\prime}Bj,\,l^{\prime\prime}Ck} = {} & \frac{\sqrt{m_{A}m^{\prime}_{B}m^{\prime\prime}_{C}}}{N}\sum_{\mathbf{q}^{\prime},\,\mathbf{q}^{\prime\prime}}\Phi^{(3)}_{Ai,Bj,Ck}(\mathbf{q}^{\prime},\,\mathbf{q}^{\prime\prime})\\ & \mathrm{e}^{-i\mathbf{q}^{\prime}\cdot(\mathbf{R}^{o}_{l^{\prime}B}-\mathbf{R}^{o}_{0A})}\times\mathrm{e}^{-i\mathbf{q}^{\prime\prime}\cdot(\mathbf{R}^{o}_{l^{\prime\prime}C}-\mathbf{R}^{o}_{0A})}\\ & \mathrm{e}^{-i(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime})\cdot\mathbf{R}^{o}_{0A}}\,\Delta(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime}),\end{aligned} \tag{25}$$

where these wave vectors have to conserve the lattice momentum. This means they are constrained by

$$\Delta(\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime}) = \begin{cases}1 & \text{if }\mathbf{q}+\mathbf{q}^{\prime}+\mathbf{q}^{\prime\prime}\text{ is a reciprocal lattice vector}\\ 0 & \text{otherwise}.\end{cases} \tag{26}$$

Therefore, the three phonon descriptors ($d_{Ai}(\textbf{q})$, $d_{Bj}(\textbf{q}^{\prime})$ and $d_{Ck}(\textbf{q}^{\prime\prime})$) used in our prediction have to satisfy this constraint.
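As an illustrative sketch of the constraint in Eq. (26), the following Python function evaluates $\Delta$ for three wave vectors. It assumes the wave vectors are given in fractional (reduced) coordinates of the reciprocal lattice, so that reciprocal lattice vectors have integer components. The function name `delta` and the tolerance `tol` are hypothetical choices for illustration and are not taken from the GPFC code.

```python
def delta(q, q1, q2, tol=1e-8):
    """Return 1 if q + q1 + q2 is a reciprocal lattice vector, else 0.

    Assumption: wave vectors are in fractional coordinates, so a
    reciprocal lattice vector has integer components.
    """
    total = [a + b + c for a, b, c in zip(q, q1, q2)]
    is_lattice_vector = all(abs(t - round(t)) < tol for t in total)
    return 1 if is_lattice_vector else 0


# The three vectors sum to (1, 0, 0), a reciprocal lattice vector.
allowed = delta((0.5, 0.0, 0.0), (0.25, 0.0, 0.0), (0.25, 0.0, 0.0))
# The three vectors sum to (0.75, 0, 0), which is not a lattice vector.
forbidden = delta((0.5, 0.0, 0.0), (0.25, 0.0, 0.0), (0.0, 0.0, 0.0))
print(allowed, forbidden)
```

Only triplets for which the function returns 1 contribute to the third-order force constants in the phonon basis.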

III Dataset preparation and training

In the development of our methods, and to compare to standard approaches, we consider a set of representative solid state materials. We compare our method (which we term GPFC, Gaussian Process Force-Constants) with a traditional finite displacement method (as implemented in Phonopy [26, 28]) and a third-order many-body perturbation theory approach (in Phono3py [27, 28]). For processing of our predicted force-constants we use Phonopy and Phono3py. The details of the training datasets are provided below.

phono(3)py-dataset: we use the phono(3)py package to generate displaced structures of $2\times 2\times 2$ supercells following the finite displacement method (FDM), and then evaluate the energies and forces with density functional theory calculations in VASP, using a plane-wave cutoff energy (ENMAX $=800\>eV$), an SCF energy convergence criterion (EDIFF $=10^{-8}\>eV$), and a k-point mesh (kpts $=[6,\>6,\>6]$). We then pass the VASP results to Phonopy to generate the force constants, applying the space-group symmetry operations.

GPFC-dataset: we use ASE to generate rattled structures of the $2\times 2\times 2$ supercell, with normally distributed atomic displacements of standard deviation $0.01\>\AA$, and then perform the same density functional theory calculations of energy and force in VASP. The plane-wave energy cutoff, k-point mesh and convergence parameters are as above. Subsequently, the dataset of total energies and forces is used to train the derivative GP model following Eq. (20) without imposing any symmetry.
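The rattling step can be sketched in a few lines of standard-library Python. This is not the ASE call used for the dataset (ASE provides a rattle method on its `Atoms` object); it is a dependency-free illustration under stated assumptions. The lattice constant `A`, the two-atom diamond-like basis, and the helper names `supercell` and `rattle` are hypothetical and chosen only to give a 16-atom $2\times 2\times 2$ supercell.

```python
import random

A = 5.43  # assumed lattice constant in angstrom (roughly that of Si); illustrative only
PRIMITIVE_VECTORS = [(0.0, A / 2, A / 2), (A / 2, 0.0, A / 2), (A / 2, A / 2, 0.0)]
BASIS_FRAC = [(0.0, 0.0, 0.0), (0.25, 0.25, 0.25)]  # two-atom basis, fractional coordinates


def supercell(n=2):
    """Cartesian positions of an n x n x n supercell of the two-atom primitive cell."""
    atoms = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for f in BASIS_FRAC:
                    frac = (i + f[0], j + f[1], k + f[2])
                    atoms.append([
                        sum(frac[m] * PRIMITIVE_VECTORS[m][c] for m in range(3))
                        for c in range(3)
                    ])
    return atoms


def rattle(positions, stdev=0.01, seed=0):
    """Add independent Gaussian noise (standard deviation stdev, in angstrom) to every coordinate."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, stdev) for x in atom] for atom in positions]


relaxed = supercell(2)
# One rattled structure per seed; each would then be passed to a DFT calculation.
training_structures = [rattle(relaxed, stdev=0.01, seed=s) for s in range(5)]
print(len(relaxed), len(training_structures))
```

Each rattled structure displaces all atoms at once, so a single energy and force evaluation samples every Cartesian degree of freedom of the supercell.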

Though the GPFC hyperparameters (the kernel scale $\sigma_{o}$, the length scale $l$, and the observation noise of the energy $\sigma_{e}$ and of the force $\sigma_{f}$) can be optimised by maximising the logarithm of the marginal likelihood as in Eq. (13), we set them to reasonable constants. We take $\sigma_{o}=1$ and $l=0.4$ for the $0.01$ to $0.05\>\AA$ normally distributed rattles of the relaxed structure [17]. The observation noises $\sigma_{e}$ and $\sigma_{f}$ we take as $10^{-8}$, the same as the SCF energy convergence (EDIFF) in our VASP calculations.
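To make the role of these hyperparameters concrete, the sketch below evaluates a squared-exponential kernel and its analytic gradient with $\sigma_{o}=1$ and $l=0.4$. The exact kernel parametrisation is an assumption here, namely $k(\textbf{x},\textbf{x}^{\prime})=\sigma_{o}^{2}\exp(-|\textbf{x}-\textbf{x}^{\prime}|^{2}/2l^{2})$, and the function names are hypothetical. The gradient is the kind of term that couples energy observations to force observations in a derivative GP.

```python
import math

SIGMA_O = 1.0  # kernel scale quoted in the text
LENGTH = 0.4   # length scale quoted in the text


def se_kernel(x, y, sigma_o=SIGMA_O, length=LENGTH):
    """Squared-exponential kernel (assumed form) between descriptor vectors x and y."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return sigma_o ** 2 * math.exp(-0.5 * r2 / length ** 2)


def se_kernel_grad(x, y, sigma_o=SIGMA_O, length=LENGTH):
    """Analytic gradient of the kernel with respect to the first argument x."""
    k = se_kernel(x, y, sigma_o, length)
    return [-(a - b) / length ** 2 * k for a, b in zip(x, y)]


x = [0.01, -0.02, 0.03]
y = [0.00, 0.01, -0.01]
print(se_kernel(x, y), se_kernel_grad(x, y))
```

Because the kernel is smooth, the same differentiation can be repeated to obtain the covariances needed for second- and third-order force constants.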

Figure 5: (a), (b) Predicted phonon lifetimes at $300$ K and (c) lattice thermal conductivity of d-Si and PbTe from $300\>K$ to $1200\>K$. Green point clouds and lines are calculated from the cubic anharmonic force constants evaluated by Phono3py (with finite displacement methods), while red point clouds and lines are calculated from the cubic anharmonic Gaussian process force constants.

IV Results and discussion

To compare the harmonic (second-order) force constants from our model with those from the standard finite-displacement method (FDM) as implemented in Phonopy, we examine the phonon band structures and densities of states of diamond (d)-Si (FCC), GaAs, CdTe, NaCl and PbTe, which are illustrated in Figure 3. The root mean square errors of the phonon band structures are around $1$ to $4\%$. The Wasserstein distance (earth mover's distance, EMD) is used to measure the dissimilarity between the phonon densities of states from the two methods over the frequency ($\omega$) range: we calculate these as $0.043$, $0.037$, $0.021$, $0.071$ and $0.022$ $THz\cdot eV^{-1}\AA^{-3}$ respectively.
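For readers unfamiliar with the earth mover's distance, the sketch below computes the one-dimensional Wasserstein-1 distance between two densities of states sampled on the same uniform frequency grid, using the standard identity that in one dimension it equals the integrated absolute difference of the cumulative distributions. The normalisation of each curve to unit area, the function name `emd_1d` and the toy inputs are assumptions for illustration; the normalisation and units used for the values quoted above may differ.

```python
def emd_1d(p, q, dx):
    """Wasserstein-1 distance between two histograms on a shared uniform grid.

    p, q: non-negative weights per grid point (normalised internally).
    dx:   grid spacing, which sets the unit of the result.
    """
    sum_p, sum_q = sum(p), sum(q)
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cdf_p += a / sum_p
        cdf_q += b / sum_q
        distance += abs(cdf_p - cdf_q) * dx
    return distance


# Toy example: two single peaks separated by three grid points of width 0.5.
dos_a = [0, 0, 1, 0, 0, 0, 0]
dos_b = [0, 0, 0, 0, 0, 1, 0]
print(emd_1d(dos_a, dos_b, dx=0.5))  # 1.5
```

The toy result is the peak separation, which is the sense in which the EMD measures how far spectral weight has to be moved to turn one curve into the other.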

The model learning curves are shown in terms of the degree of violation of the acoustic (or translational) sum rule. For harmonic force-constants, the sum rule is

\[
\sum_{B}\Phi^{(2)}_{Ai,Bj}=0,
\tag{27}
\]

and for third order anharmonic force-constants this would be

\[
\sum_{C}\Phi^{(3)}_{Ai,Bj,Ck}=0,
\tag{28}
\]

with components $i$, $j$, $k$ of atoms $A$, $B$, $C$ respectively.
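The two sum rules can be checked numerically on a toy model. The sketch below uses a hypothetical one-dimensional three-atom chain with nearest-neighbour bonds containing harmonic and cubic terms (the parameters `k`, `g` and `r0` are arbitrary), and obtains the force constants by nested central finite differences of the energy rather than from a GP. Because the energy depends only on interatomic separations, both sums should vanish to numerical precision.

```python
def energy(x, k=1.0, g=0.5, r0=1.0):
    """Toy 1D chain: nearest-neighbour bonds with harmonic and cubic terms."""
    total = 0.0
    for a in range(len(x) - 1):
        d = x[a + 1] - x[a] - r0
        total += 0.5 * k * d ** 2 + g * d ** 3 / 6.0
    return total


def mixed_derivative(f, x, indices, h=0.05):
    """Nested central finite difference of f with respect to the listed coordinates."""
    if not indices:
        return f(x)
    i, rest = indices[0], indices[1:]
    plus = list(x)
    plus[i] += h
    minus = list(x)
    minus[i] -= h
    return (mixed_derivative(f, plus, rest, h)
            - mixed_derivative(f, minus, rest, h)) / (2.0 * h)


x0 = [0.0, 1.0, 2.0]  # equilibrium positions of the toy chain
n = len(x0)
phi2 = [[mixed_derivative(energy, x0, (a, b)) for b in range(n)] for a in range(n)]
phi3 = [[[mixed_derivative(energy, x0, (a, b, c)) for c in range(n)]
         for b in range(n)] for a in range(n)]

# Largest violation of Eq. (27) and Eq. (28) over all free indices.
violation2 = max(abs(sum(phi2[a])) for a in range(n))
violation3 = max(abs(sum(phi3[a][b])) for a in range(n) for b in range(n))
print(violation2, violation3)
```

For a fitted model the same two quantities are not guaranteed to vanish, which is why their decay with training-set size is a useful learning curve.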

In Figure 4 (a), we can see that the sum-rule violations in the Cartesian basis start to converge to zero with $\approx 48$ data-points for all of the materials considered. This number corresponds to the number of degrees of freedom of the atomic descriptor in Cartesian coordinates, which is $48$ for the $2\times 2\times 2$ supercell of a two-atom primitive cell ($3$ Cartesian components $\times$ $2$ atoms $\times$ $8$ primitive cells). Although the five materials are different, their learning curves converge with the same number of training points.

Meanwhile, in phonon coordinates they converge to zero with $\approx 6$ data-points. The number of data-points required similarly corresponds to the number of degrees of freedom of the phonon descriptor (now in the primitive unit cell). The phonon-coordinate FC2 (dynamical matrix) and FC3 are evaluated at the $\Gamma$-point. At other q-points, the evaluation is limited by the current automatic differentiation package. With this limitation, we cannot recover the Cartesian FC2 and FC3 using Eq. (24) and Eq. (25).

To compare the cubic anharmonic (third-order) force constants between the methods, we consider the phonon lifetimes and the lattice thermal conductivity of d-Si, NaCl, PbTe, GaAs and CdTe at finite temperature ($300$ to $1200$ K). They are calculated using a third-order many-body perturbation theory approach with two different sets of cubic anharmonic force constants, one from FDM in Phono3py and the other from kernel regression in our GPFC. The cubic anharmonic Gaussian process force constants are predicted using Eq. (23). We train the derivative GP model with the same training set as used in the prediction of the harmonic force constants.

Based on the convergence of the acoustic sum-rule (Figure 4), accurate prediction of the cubic anharmonic force constants with a Cartesian basis requires up to $400$ energy and force calculations. Meanwhile, in the phonon basis, starting with a dynamical matrix (harmonic force constants), only $50$ energy and force calculations are required.

The predicted phonon lifetimes of d-Si and PbTe at $300$ K are shown in Figure 5 (a) and 5 (b), while their predicted lattice thermal conductivities are illustrated in Figure 5 (c). Again, the earth mover's distance is used to quantify the dissimilarity of the projected lifetime density of states, for each $30$ K step from $300$ K to $1200$ K. The mean EMD is $3.9\times 10^{-1}\>ps$ for d-Si and $8.6\times 10^{-3}\>ps$ for PbTe. The mean absolute errors of the lattice thermal conductivity calculated using the third-order GPFC are $2.03$ ($\sim 2\%$ error) and $1.4\times 10^{-3}$ ($\sim 7\%$ error) $W\cdot m^{-1}K^{-1}$, respectively. Moreover, the phonon lifetimes and the lattice thermal conductivity of NaCl, GaAs and CdTe are calculated based on our GPFCs with errors ranging from $2\%$ to $7\%$. The accuracy of the third-order GPFC prediction for PbTe is the lowest among the five materials, because PbTe exhibits strong anharmonic behaviour, leading to phonon-phonon interactions which reduce its phonon lifetimes.

A key finding of our experiments with our GPFC approach is that we seem to require a number of training points equal to the number of degrees of freedom of the descriptor. In the phonon basis this is the number of phonon bands being calculated. Potentially the Coulomb matrix [31], or a similar species-aware descriptor, could be used to share information about similar chemical species in the unit cell.

In our anharmonic (to cubic) GPFC experiments, we find we require approximately $8$ times the number of data points required for a harmonic GPFC.

Therefore, in our limited experiments the calculation of anharmonic force constants is linear in the number of elements in the unit cell.
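The empirical counting rules above can be collected into a small estimator. This is only a restatement of the observations in this section as arithmetic, not a guarantee: the factor of $8$ for cubic force constants is the approximate value we observe, and the function names are hypothetical.

```python
def descriptor_dof(atoms_per_primitive_cell, supercell=(1, 1, 1)):
    """Degrees of freedom of the descriptor: 3 components per atom per cell."""
    n_cells = supercell[0] * supercell[1] * supercell[2]
    return 3 * atoms_per_primitive_cell * n_cells


def training_points(dof, order=2, cubic_factor=8):
    """Approximate number of energy and force evaluations suggested by our experiments."""
    if order == 2:
        return dof
    if order == 3:
        return cubic_factor * dof
    raise ValueError("only second- and third-order force constants are covered")


cartesian = descriptor_dof(2, supercell=(2, 2, 2))  # 48 for a two-atom primitive cell
phonon = descriptor_dof(2)                          # 6 in the primitive-cell phonon basis
for label, dof in (("Cartesian", cartesian), ("phonon", phonon)):
    print(label, training_points(dof, order=2), training_points(dof, order=3))
```

For the two-atom primitive cells studied here this gives $48$ and $384$ points in the Cartesian basis and $6$ and $48$ in the phonon basis, consistent with the convergence observed at $\approx 48$, up to $400$, $\approx 6$ and $50$ evaluations respectively. Both estimates grow in proportion to the number of atoms.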

V Conclusion

We develop a method to model lattice dynamics, by fitting a derivative Gaussian Process force-constant (GPFC) model. These derivatives provide the correlation among the energy and its derivatives, i.e. the forces, second-order force constants (FC2), and third-order force constants (FC3). We experiment with this model on the harmonic and anharmonic (cubic) force constants of five materials (d-Si, NaCl, PbTe, GaAs and CdTe). Accurate predictions of the harmonic force constants require the same number of energy and force evaluations (here density functional theory calculations) as the number of degrees of freedom in our descriptor basis. For the most compact phonon descriptor, this is the number of phonon bands to be predicted. To predict the third-order force constants, we seem to require 8 times the data required to fit the harmonic force constants.

Our models seem to offer linear scaling of the prediction of anharmonic force constants. We are technically limited in being able to apply our phonon descriptor to anharmonic force constants, due to the necessity of dealing with the derivative of a complex number at positions away from $\Gamma$ in the Brillouin zone.

An extension of this work would be to leverage the development of atomistic machine-learning force-field descriptors, i.e. radial basis and spherical harmonic basis functions as in GAP [2, 3] or ACE [23, 32]. These model descriptors are equivariant under rotations in three-dimensional space, and may offer some of the benefits of the phonon descriptors we have developed in this work.

As an alternative approach, one could directly train (or fine tune) a full machine-learning force-field, and then use this as a surrogate model for the standard finite displacement phonon workflow. In a future study we will compare the data efficiency of this approach.
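A minimal sketch of this surrogate workflow is given below: second-order force constants are obtained by central finite displacements of a force function, $\Phi_{ij}=-\partial F_{i}/\partial x_{j}$. The force function here is a hypothetical stand-in (a one-dimensional harmonic chain with free ends), not a trained machine-learning force-field, and the displacement size `h` is an arbitrary choice; in practice the displacements and symmetry reduction would be handled by a package such as Phonopy.

```python
def surrogate_forces(x, k=1.0, r0=1.0):
    """Hypothetical surrogate: forces of a 1D nearest-neighbour harmonic chain."""
    forces = [0.0] * len(x)
    for a in range(len(x) - 1):
        d = x[a + 1] - x[a] - r0
        forces[a] += k * d       # stretched bond pulls atom a to the right
        forces[a + 1] -= k * d   # and atom a+1 to the left
    return forces


def fc2_finite_displacement(force_fn, x0, h=1e-3):
    """Second-order force constants from central differences of the forces."""
    n = len(x0)
    phi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(x0)
        plus[j] += h
        minus = list(x0)
        minus[j] -= h
        f_plus, f_minus = force_fn(plus), force_fn(minus)
        for i in range(n):
            phi[i][j] = -(f_plus[i] - f_minus[i]) / (2.0 * h)
    return phi


phi = fc2_finite_displacement(surrogate_forces, [0.0, 1.0, 2.0])
for row in phi:
    print([round(v, 6) for v in row])
```

Each displaced structure costs one surrogate evaluation instead of one density functional theory calculation, which is where the saving of such an approach would come from.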

VI Author contribution

K.K.: Formal Analysis (lead); Investigation (lead); Methodology (equal); Software (lead); Writing – original draft (equal); Writing – review and editing (equal).

J.M.F.: Conceptualization (lead); Methodology (equal); Writing – original draft (equal); Writing – review and editing (equal).

VII Acknowledgement

J.M.F. is supported by a Royal Society University Research Fellowship (URF-R1-191292). K.K. is supported by a Thai scholarship, Development and Promotion of Science and Technology project. Julia [33] codes implementing these calculations are available as a repository on GitHub [34]. This work made use of the Imperial College Research Computing Service [35]. Via our membership of the UK’s HEC Materials Chemistry Consortium, which is funded by EPSRC (EP/R029431 and EP/X035859), this work used the ARCHER2 UK National Supercomputing Service (http://www.archer2.ac.uk).

Appendix A Force constants from a Gaussian Process

References

  • Handley et al. [2009] C. M. Handley, G. I. Hawe, D. B. Kell, and P. L. A. Popelier, Optimal construction of a fast and accurate polarisable water potential based on multipole moments trained by machine learning, Physical Chemistry Chemical Physics 11, 6365 (2009).
  • Bartók et al. [2010] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons, Physical Review Letters 104, 10.1103/physrevlett.104.136403 (2010).
  • Bartók et al. [2013] A. P. Bartók, R. Kondor, and G. Csányi, On representing chemical environments, Physical Review B 87, 10.1103/physrevb.87.184115 (2013).
  • Bartók and Csányi [2015] A. P. Bartók and G. Csányi, Gaussian approximation potentials: A brief tutorial introduction, International Journal of Quantum Chemistry 115, 1051–1057 (2015).
  • Cui and Krems [2016] J. Cui and R. V. Krems, Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with gaussian processes, Journal of Physics B: Atomic, Molecular and Optical Physics 49, 224001 (2016).
  • Dral et al. [2017] P. O. Dral, A. Owens, S. N. Yurchenko, and W. Thiel, Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels, The Journal of Chemical Physics 146, 10.1063/1.4989536 (2017).
  • Guan et al. [2017] Y. Guan, S. Yang, and D. H. Zhang, Construction of reactive potential energy surfaces with gaussian process regression: active data selection, Molecular Physics 116, 823–834 (2017).
  • Kolb et al. [2017] B. Kolb, P. Marshall, B. Zhao, B. Jiang, and H. Guo, Representing global reactive potential energy surfaces using gaussian processes, The Journal of Physical Chemistry A 121, 2552–2557 (2017).
  • Guan et al. [2018] Y. Guan, S. Yang, and D. H. Zhang, Application of clustering algorithms to partitioning configuration space in fitting reactive potential energy surfaces, The Journal of Physical Chemistry A 122, 3140–3147 (2018).
  • Bartók et al. [2018] A. P. Bartók, J. Kermode, N. Bernstein, and G. Csányi, Machine learning a general-purpose interatomic potential for silicon, Physical Review X 8, 10.1103/physrevx.8.041048 (2018).
  • Strickson et al. [2019] O. Strickson, N. Nikiforakis, and E. Artacho, Dynamical continuum simulation of condensed matter from first principles, Physical Review Research 1, 10.1103/physrevresearch.1.033199 (2019).
  • Dai and Krems [2020] J. Dai and R. V. Krems, Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by gaussian processes with composite kernels, Journal of Chemical Theory and Computation 16, 1386 (2020).
  • Sugisawa et al. [2020] H. Sugisawa, T. Ida, and R. V. Krems, Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimer, The Journal of Chemical Physics 153, 114101 (2020).
  • Wu et al. [2017] A. Wu, M. C. Aoi, and J. W. Pillow, Exploiting gradients and hessians in bayesian optimization and bayesian quadrature (2017), arXiv:1704.00060 .
  • Solak et al. [2002] E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and C. E. Rasmussen, Derivative observations in gaussian process models of dynamic systems, in Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS’02 (MIT Press, Cambridge, MA, USA, 2002) p. 1057–1064.
  • Rasmussen and Williams [2006] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning, Adaptive computation and machine learning (MIT Press, 2006) pp. I–XVIII, 1–248.
  • del Río et al. [2019] E. G. del Río, J. J. Mortensen, and K. W. Jacobsen, Local bayesian optimizer for atomic structures, Physical Review B 100, 10.1103/physrevb.100.104103 (2019).
  • Kaappa et al. [2021] S. Kaappa, E. G. del Río, and K. W. Jacobsen, Global optimization of atomic structures with gradient-enhanced gaussian process regression, Physical Review B 103, 10.1103/physrevb.103.174114 (2021).
  • Asnaashari and Krems [2021] K. Asnaashari and R. V. Krems, Gradient domain machine learning with composite kernels: improving the accuracy of PES and force fields for large molecules, Machine Learning: Science and Technology 3, 015005 (2021).
  • Eriksson et al. [2019] F. Eriksson, E. Fransson, and P. Erhart, The hiphive package for the extraction of high‐order force constants by machine learning, Advanced Theory and Simulations 2, 10.1002/adts.201800184 (2019).
  • McHutchon [2013] A. McHutchon, Differentiating gaussian processes, http://mlg.eng.cam.ac.uk/mchutchon/DifferentiatingGPs.pdf (2013).
  • MacKay [2003] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, 2003).
  • Drautz [2019] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Physical Review B 99, 10.1103/physrevb.99.014104 (2019).
  • Batatia et al. [2022a] I. Batatia, D. P. Kovács, G. N. C. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, in Advances in Neural Information Processing Systems, Vol. 35, edited by S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Curran Associates, Inc., 2022) pp. 11423–11436, arXiv:2206.07697 [stat.ML] .
  • Batatia et al. [2022b] I. Batatia, S. Batzner, D. P. Kovács, A. Musaelian, G. N. C. Simm, R. Drautz, C. Ortner, B. Kozinsky, and G. Csányi, The design space of e(3)-equivariant atom-centered interatomic potentials (2022b).
  • Togo [2023] A. Togo, First-principles phonon calculations with phonopy and phono3py, J. Phys. Soc. Jpn. 92, 012001 (2023).
  • Togo et al. [2015] A. Togo, L. Chaput, and I. Tanaka, Distributions of phonon lifetimes in brillouin zones, Phys. Rev. B Condens. Matter Mater. Phys. 91 (2015).
  • Togo et al. [2023] A. Togo, L. Chaput, T. Tadano, and I. Tanaka, Implementation strategies in phonopy and phono3py, J. Phys. Condens. Matter 35 (2023).
  • Shinohara et al. [2023] K. Shinohara, A. Togo, and I. Tanaka, spgrep: On-the-fly generator of space-group irreducible representations, J. Open Source Softw. 8, 5269 (2023).
  • [30] D. Eriksson, K. Dong, E. H. Lee, D. Bindel, and A. G. Wilson, Scaling gaussian process regression with derivatives, https://dl.acm.org/doi/pdf/10.5555/3327757.3327791, accessed: 2023-12-7.
  • Rupp et al. [2012] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Fast and accurate modeling of molecular atomization energies with machine learning, Physical Review Letters 108, 10.1103/physrevlett.108.058301 (2012).
  • Witt et al. [2023] W. C. Witt, C. van der Oord, E. Gelžinytė, T. Järvinen, A. Ross, J. P. Darby, C. H. Ho, W. J. Baldwin, M. Sachs, J. Kermode, N. Bernstein, G. Csányi, and C. Ortner, ACEpotentials.jl: A julia implementation of the atomic cluster expansion, J. Chem. Phys. 159 (2023).
  • Bezanson et al. [2017] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM review 59, 65 (2017).
  • Keeratikarn and Frost [2024] K. Keeratikarn and J. M. Frost, https://github.com/Frost-group/GPFC.jl (2021–2024).
  • Harvey [2017] M. Harvey, Imperial college research computing service (2017).