Completing DFT With ML
Kohn–Sham density functional theory (DFT) is the basis of modern computational approaches to electronic structures. Its
accuracy heavily relies on the exchange-correlation energy functional, which encapsulates electron–electron interaction beyond
the classical model. As its universal form remains undiscovered, approximated functionals constructed with heuristic approaches
are used for practical studies. However, their accuracy and transferability are limited, and a systematic approach to
improving them remains unclear. In this study, we demonstrate that the functional can be systematically constructed using accurate
density distributions and energies in reference molecules via machine learning. Surprisingly, a trial functional machine learned from
only a few molecules is already applicable to hundreds of molecules comprising various first- and second-row elements with the
same accuracy as the standard functionals. This is achieved by relating density and energy using a flexible feed-forward neural
network, which allows us to take a functional derivative via the back-propagation algorithm. In addition, simply by introducing a
nonlocal density descriptor, the nonlocal effect is included to improve accuracy, which has hitherto been impractical. Our approach
thus will help enrich the DFT framework by utilizing the rapidly advancing machine-learning technique.
1Department of Physics, The University of Tokyo, Hongo, Bunkyo-Ku, Tokyo 113-0033, Japan. 2Institute for Solid State Physics, The University of Tokyo, Kashiwa, Chiba 277-8581, Japan. ✉email: [email protected]
Published in partnership with the Shanghai Institute of Ceramics of the Chinese Academy of Sciences
R. Nagai et al.
was adopted because of its ability to represent any well-behaved function with arbitrary accuracy17,18. We have found that, when applied to Vion not referenced in the training, the explicit treatment of the kinetic energy suppresses the effect of spurious oscillation in the predicted Vxc and reduces the error of the finally obtained n(r). This result suggests that the machine-learning approach to Vxc with the KS equation is a promising route. The challenge is then to make the ML of Vxc feasible for real materials.

Our strategy is to restrict the functional form to the (semi-)local one, as adopted in most existing functionals for KS-DFT. Specifically, we assume the following form for the xc energy Exc[n] and obtain the xc potential by Vxc(r) = δExc/δn(r):

\[ E_{xc}[n] \equiv \int d\mathbf{r}\, n(\mathbf{r})\, \varepsilon_{xc}(g[n](\mathbf{r})), \tag{2} \]

where g[n](r) represents any local or nonlocal variables (descriptors) that capture the effect of the density distribution around r. Most of the existing functionals follow the local spin-density approximation (LSDA)19,20, the generalized gradient approximation (GGA)21–23, or meta-GGA24–26, by defining g[n](r) as (n(r), ζ(r) ≡ (n↑(r) − n↓(r))/n(r))^T, (n(r), ζ(r), s(r) ≡ |∇n(r)|/[2(3π^2)^{1/3} n^{4/3}(r)])^T, or (n(r), ζ(r), s(r), τ(r) ≡ (1/2) Σ_i^occ |∇φi(r)|^2)^T, respectively. In this study, the xc-energy density εxc(r) is formulated using a feed-forward NN with H layers, which is a vector-to-vector mapping u → v represented by

\[ \mathbf{v} = h_H(\cdots(h_2(h_1(\mathbf{u})))), \tag{3} \]

\[ h_i(\mathbf{x}) \equiv f(W_i \mathbf{x} + \mathbf{b}_i), \tag{4} \]

where hi represents the ith layer of the NN, and the input vector x is nonlinearly transformed by the activation function f after being linearly transformed by the weight parameters Wi and bi. To evaluate the functional derivative δExc/δn(r) for the xc potential, we utilize the back-propagation technique27, which is an efficient algorithm for differentiating an NN by applying the chain rule. This NN form thus relates {n(r)} and {Vxc(r)} so that it can be incorporated into the KS equation. In this case, we define u as the local density descriptors g[n](r) and v as a one-dimensional vector εxc(r) (Fig. 1). The “Methods” section contains further details.

This (semi-)local NN form has practical advantages compared with the fully nonlocal form adopted in previous studies. First, the local mapping g(r) → εxc(r) is transferable to systems of any size, while the fully nonlocal one is not. Second, even a few systems can provide a large amount of training data, since every grid point r yields a different pair of values {g(r), εxc(r)}, which can be sufficient to determine the NN parameters. As we demonstrate later, those features enable us to construct functionals whose performance is comparable to standard functionals by training with data from only a few molecules.

We applied this ML construction approach to four types of approximation by setting the input vector g: LSDA, GGA, meta-GGA, as well as a new formulation that we call the “near region approximation” (NRA), defined by g[n](r) = (n(r), ζ(r), s(r), τ(r), R(r))^T, where R(r) ≡ ∫ dr′ n(r′) exp(−|r − r′|/σ). Gunnarsson et al.28 demonstrated that such an averaged density around r describes εxc(r) efficiently; therefore, we added it to the g of meta-GGA. Construction in such a nonlocal form has been uncommon, except for van der Waals systems, because of the absence of appropriate physical conditions.

To test the performance of this approach, we constructed a functional using a few molecules to train the NN. We selected three molecules according to the following criteria: (i) The structures of the molecules should be distinct from each other and have low symmetry. (ii) Electrically polarized molecules are preferably included to handle cases where the optimized orbitals are highly distorted from the atomic orbitals. (iii) Most importantly, at least one spin-polarized molecule should be included, which is necessary for determining the dependency on the spin polarization ζ. Following those criteria, H2O, NH3, and NO are selected as the reference molecules. Note that the NO radical is spin-polarized.

The functionals are trained to reproduce the atomization energy (AE) and the density distribution (DD) of these molecules. We generated the training data using accurate quantum chemical calculations, i.e., the Gaussian-2 method (G2)29 for the AE and the coupled-cluster method with single and double excitations (CCSD)30,31 for the DD, which are more accurate methods than DFT. We adopt the AE for training instead of the total energy (TE), considering that typical errors of existing functionals for the TE (~hartree) are much larger than those for the AE (~eV or kcal/mol; see Table 1). The larger error implies the difficulty of reproducing the TE within the (semi-)local approximations, whereas a relative energy such as the AE can be predicted more accurately owing to error cancellation. It is also worth emphasizing that the DD contains abundant information on the electronic structure over the whole three-dimensional space, which is expected to contribute to determining a large number of NN parameters. We selected the above conditions simply for demonstration; how the accuracy depends on the choice of the training dataset remains a target for future studies. Ultimately, the training dataset comprised the AE and DD of H2O, NH3, and NO.

We trained the NN parameters so that εxc optimally reproduces the values of AE and DD for the training molecules through the self-consistent solution of the KS equation (Eq. 1). For this purpose, we designed a Metropolis-type Monte Carlo update method for the NN parameters. At each step, the KS equation was self-consistently solved for the three molecules to obtain their densities and total energies. Subsequently, errors from the reference values of the AE and DD were evaluated for the update of parameters. The energies of the component atoms (H, N, and O in their isolated form) were also calculated with KS-DFT using the same εxc to calculate the AE. This procedure was repeated until the error was minimized. See the “Methods” section for the exact definition of the error function and computational details.
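The Metropolis-type update described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: `metropolis_train` and `toy_cost` are hypothetical names, and `toy_cost` is a cheap stand-in for the real cost, which requires self-consistent KS-DFT calculations for the molecules and atoms. The acceptance ratio follows the form given in the "Methods" section (Eq. 13), and the default schedule (about 300 steps, T from 0.1 to 0.06, δw from 0.01 to 0.005) uses the values quoted there. Updating the stored error on every accepted move is an assumption made here for simplicity.

```python
import math
import random


def toy_cost(w):
    """Hypothetical stand-in for the training error; always positive."""
    target = [0.2, -0.1]
    return 1e-3 + sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def metropolis_train(cost, w0, n_steps=300, t_start=0.1, t_end=0.06,
                     dw_start=0.01, dw_end=0.005, seed=0):
    """Metropolis-type random search over parameters w (illustrative only).

    cost must return a positive number, because the acceptance ratio
    divides by the previously accepted error.
    """
    rng = random.Random(seed)
    w = list(w0)
    err_old = cost(w)
    best_w, best_err = list(w), err_old
    for step in range(n_steps):
        # Linearly reduce the temperature T and the perturbation width dw.
        frac = step / max(n_steps - 1, 1)
        temp = t_start + (t_end - t_start) * frac
        dw = dw_start + (dw_end - dw_start) * frac
        # Perturb every parameter with Gaussian noise of width dw.
        trial = [wi + rng.gauss(0.0, dw) for wi in w]
        err_new = cost(trial)
        # P = exp(-(err_new - err_old) / (T * err_old)); P >= 1 always accepts.
        exponent = -(err_new - err_old) / (temp * err_old)
        if exponent >= 0.0 or rng.random() < math.exp(exponent):
            w, err_old = trial, err_new  # assumption: update on every accept
            if err_new < best_err:
                best_w, best_err = list(trial), err_new
    # Return the parameters with the smallest cost seen during the run.
    return best_w, best_err


if __name__ == "__main__":
    w_init = [0.5, -0.3]
    w_best, err_best = metropolis_train(toy_cost, w_init)
    print("initial error:", toy_cost(w_init))
    print("best error:   ", err_best)
```

Because the acceptance ratio is normalized by the previous error, the search tolerates relative rather than absolute increases in the cost, which keeps the acceptance rate meaningful as the error shrinks.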
npj Computational Materials (2020) 43
Table 1. Performances of NN-based and existing functionals.
AE147a1 (kcal/mol) | AEHC28a2 (kcal/mol) | DD147b (-) | TE147c (hartree) | IP13d (kcal/mol) | BH76e (kcal/mol)
Tao–Perdew–Staroverov–Scuseria (TPSS)24, “Strongly Constrained and Appropriately Normed” (SCAN)25, and M06-L26 are meta-GGAs, and PBE035, B3LYP36, and M0637 are hybrid functionals.

For the wide range of unreferenced molecular systems and unreferenced quantities (BH, IP, and TE), the NN-based functionals exhibit comparable or superior performance to existing functionals at every approximation level. In particular, the nonlocal NRA-type functional is comparable to the hybrid functionals, which partly contain nonlocal effects. It is also noteworthy that the NN-based functionals are comparable to M06-L, B3LYP, or M06, whose parameters were fitted with reference to more than 100 systems. This remarkable transferability with the small training dataset is nontrivial in the context of conventional ML methods predicting material properties. It reflects the advantage of using the electron density, which is common to any material, as the input for the ML mapping. Even for unreferenced molecules, this NN-based functional would work if their local DD is similar to that included in the reference molecules. Indeed, the NN-based functional shows accuracies for the AE of hydrocarbons (AEHC28) comparable to those for other molecules, even though no carbon is included in the reference molecules. Furthermore, some hydrocarbons such as benzene and butadiene have delocalized electrons owing to their conjugated structures. In LSDA or GGA, the error for them is relatively large, whereas the error becomes much smaller in meta-GGA and NRA (see Supplementary Table for detailed values). This means that, as more descriptors of the DD are included, the NN gains the ability to distinguish whether the electrons are localized or delocalized.

The NN-based functionals also tend to be accurate for the unreferenced properties TE and IP, as well as for the trained property DD. The accuracy for TE and DD should be related because the Hohenberg–Kohn theorem proves their one-to-one correspondence. Accuracy for IP is also closely related to that of DD, as Perdew et al. showed that IP can be calculated accurately using a potential generated from an accurate density that reproduces an accurate HOMO orbital energy38. This improvement for those basic quantities would increase the accuracy of all other properties. Thus, training using density is effective not only for determining a large number of NN parameters, but also for improving their accuracy.

It is also remarkable that NN-LSDA achieves far better accuracy than SVWN. Tozer et al.39 showed that, within the local approximation, the energy density functional cannot be determined uniquely because the xc potential takes multiple values for the same local n(r), as it is actually nonlocal. Among the various dependencies on n(r), the conventional LSDA has been adjusted for the uniform electron gas, while our functional can be contrasted as “LSDA adjusted to molecular systems”. In addition, as the approximation level increases (GGA, meta-GGA, and NRA), the multivaluedness of the mapping g → εxc reduces; thus, the accuracy tends to improve as depicted in Fig. 2. These results suggest that systematic improvement of the functional is realized by adding further descriptors to g, and by training with DDs.

Figure 3 also shows the improvement with the approximation level for each benchmark molecule. For example, for the AE of HCN shown in panel (b), the NN-based functional becomes more accurate as the approximation level increases. However, for the AEs of SiH4 and CCl4, the accuracy does not improve systematically. This implies that their electron DD cannot be trained sufficiently with the current reference molecules. Indeed, they have tetrahedrally coordinated structures, which do not appear in the reference molecules. Large parts of their DD are considered not to appear in the reference molecules, leading to inaccurate prediction of the functional value. When we attempt further improvement by expanding the training dataset, we can find molecules that are “out of training” by this analysis and add them to the dataset.

We also applied the NN-based meta-GGA functional to the bond dissociation of C2H2 and N2, comparing the results to those of the existing meta-GGA functionals as shown in panels (a) and (b) of Fig. 4. They agree very well, even though the NN-based functional is trained only on molecules in equilibrium structures. This transferability to unreferenced structures is nontrivial in typical ML applications that predict material properties directly from atomic configurations while skipping basic physical theories. This indicates the advantage of explicitly solving the KS equation, where the kinetic energy operator mitigates nonphysical noise of the ML xc potential
Fig. 2 Improving accuracy for energies with improving accuracy for density. The panels represent accuracy for atomization energy (AE147),
reaction barrier height (BH76), and total energy (TE147) against accuracy for density distribution (DD147), corresponding to Table 1. The
closed and open markers represent the accuracy of existing and the NN-based functionals, respectively.
Fig. 4 Dissociation curves and density transformation for linear molecules. Dissociation curves of (a) C2H2 and (b) N2 are calculated using the
NN-based meta-GGA and other existing meta-GGA functionals. For C2H2, the two C–H bonds are dissociated symmetrically along the original
bond direction. The horizontal axis shows the bond length, and the vertical axis shows energy relative to the atomized limit (Eatomized). The
magenta dashed lines and “x” marks show the bond lengths and the atomization energies from experiments32. The “o” marks show the peaks
of each curve. Density transformations of (c) C2H2 and (d) N2 due to binding are calculated using the CCSD method and DFT with the NN-based
meta-GGA functional. The densities of the single CH radical and N atom are plotted as blue lines, and the densities of the bonded molecules
are plotted as red lines. They are plotted along the 1D coordinate x passing through the centers of those molecules.
and then training it with the electron DDs and energy-related properties of appropriate reference materials. The NN-based functionals trained using only a few reference molecules exhibit comparable or superior performance to the representative standard functionals. We have revealed that employing the (semi-)local form and including DDs in the training dataset contribute to this transferability, as well as to the determination of a large number of NN parameters. Furthermore, this approach enables the systematic construction of a functional with minimum assumptions, as demonstrated by the NRA functional with a nonlocal variable R, which is difficult to construct using conventional methods because of the lack of physical conditions. In Jacob’s ladder8, an approximated functional becomes accurate when including nonlocality in orbital-dependent ways, such as hybrid functionals or the random-phase approximation40,41; however, its computational cost simultaneously increases. On the other hand, our approach of introducing nonlocality retains the classical framework of solving the KS equation with an explicit functional of the density, which makes the calculation more feasible for larger systems. In future studies, our approach is expected to be applied to systems that cannot be completely treated with existing functionals, such as those with dispersion interaction usually treated with van der Waals functionals, those with self-interaction error usually treated with range-separated hybrid functionals, or strongly correlated systems treated with DFT+U approaches42. For those problems with complicated nonlocality, our approach seems effective as it can systematically construct a maximally flexible functional form.

METHODS

Structure of the NN-based functional

We formulate the xc-energy density as

\[ \varepsilon_{xc}(n, g) = -n^{1/3}\, \frac{1}{2}\left\{ (1+\zeta)^{4/3} + (1-\zeta)^{4/3} \right\} G^{NN}_{xc}(g). \tag{6} \]

The first factor n^{1/3} corresponds to the Slater exchange energy density19, and the second comes from the dependency of the exchange energy of the uniformly spin-polarized electron gas on spin polarization43. They comprise the minimal physical conditions introduced to make the initial state of the NN close to the goal. The remaining correction G^{NN}_{xc} is constructed using the fully connected NN defined in Eq. 3 with four layers:

\[ G^{NN}_{xc}(g) = 1 + h_4(\cdots(h_1(g))). \tag{7} \]
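As an illustration of Eqs. 6 and 7, the following pure-Python sketch evaluates the xc-energy density at a single grid point. It is a sketch under stated assumptions, not the published code: the function names (`make_network`, `g_nn`, `eps_xc`) and the random, untrained weights are hypothetical, and the input g is assumed to be already preprocessed. The layer sizes (100 hidden units, four layers) and the ELU activation f(x) = max(0, x) + min(0, e^x − 1) follow the values stated in the Methods text.

```python
import math
import random


def elu(x):
    """Exponential linear unit: max(0, x) + min(0, exp(x) - 1)."""
    return x if x > 0.0 else math.exp(x) - 1.0


def dense(weights, bias, x):
    """One layer h_i(x) = f(W_i x + b_i) with f = ELU (Eq. 4)."""
    return [elu(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def make_network(n_inputs, n_hidden=100, scale=0.1, seed=0):
    """Four fully connected layers with random (untrained) weights."""
    rng = random.Random(seed)
    sizes = [n_inputs, n_hidden, n_hidden, n_hidden, 1]
    network = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network


def g_nn(network, g):
    """Eq. 7: G = 1 + h4(h3(h2(h1(g))))."""
    x = list(g)
    for weights, bias in network:
        x = dense(weights, bias, x)
    return 1.0 + x[0]


def spin_factor(zeta):
    """Spin-scaling factor (1/2){(1 + zeta)^(4/3) + (1 - zeta)^(4/3)}."""
    return 0.5 * ((1.0 + zeta) ** (4.0 / 3.0) + (1.0 - zeta) ** (4.0 / 3.0))


def eps_xc(network, n, zeta, g):
    """Eq. 6: eps_xc = -n^(1/3) * spin_factor(zeta) * G(g)."""
    return -(n ** (1.0 / 3.0)) * spin_factor(zeta) * g_nn(network, g)


if __name__ == "__main__":
    net = make_network(n_inputs=4)          # meta-GGA: g has four elements
    g_point = [-0.4, 0.01, -0.7, 0.2]       # hypothetical preprocessed inputs
    print(eps_xc(net, n=0.3, zeta=0.2, g=g_point))
```

Since ELU is bounded below by −1, the final layer keeps G positive in this sketch, so εxc stays nonpositive regardless of the weights. With all weights set to zero, G = 1 and the expression reduces to the bare prefactor of Eq. 6.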
Fig. 5 Behaviors of functionals around typical ranges of density distributions. The vertical coordinates are defined using Eq. (5). As meta-GGA-type functionals have four variables, the panels show the dependency on one of them while the others are fixed. rs ≡ (3/4πn)^{1/3} represents the average distance between electrons. rs is about 1 bohr in typical metals. τunif ≡ (3/10)(3π^2)^{2/3} n^{5/3} and τW ≡ |∇n|^2/(8n) represent τ at the UEG and single-orbital limits, respectively24.
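The three quantities defined in this caption are simple closed-form functions of the density and its gradient. A small sketch in atomic units, with hypothetical function names, makes the definitions concrete:

```python
import math


def wigner_seitz_radius(n):
    """r_s = (3 / (4 pi n))^(1/3): average distance between electrons."""
    return (3.0 / (4.0 * math.pi * n)) ** (1.0 / 3.0)


def tau_unif(n):
    """Kinetic energy density of the uniform electron gas (UEG) limit."""
    return 0.3 * (3.0 * math.pi ** 2) ** (2.0 / 3.0) * n ** (5.0 / 3.0)


def tau_w(n, grad_n_norm):
    """Single-orbital (von Weizsaecker) limit: |grad n|^2 / (8 n)."""
    return grad_n_norm ** 2 / (8.0 * n)


if __name__ == "__main__":
    n = 3.0 / (4.0 * math.pi)   # density for which r_s is exactly 1 bohr
    print(wigner_seitz_radius(n), tau_unif(n), tau_w(n, grad_n_norm=0.5))
```

The ratio of a computed τ to `tau_unif(n)` gives a dimensionless measure of how far the local kinetic energy density is from the uniform-gas value at the same density.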
Before applying the NN, each element of g is preprocessed as follows:

\[
\begin{aligned}
n &\to \log n^{1/3}, \\
\zeta &\to \log \tfrac{1}{2}\left\{ (1+\zeta)^{4/3} + (1-\zeta)^{4/3} \right\}, \\
s &\to \log s, \\
\tau &\to \log \frac{\tau}{n^{5/3}\left[ (1+\zeta)^{5/3} + (1-\zeta)^{5/3} \right]}, \\
R &\to \log R.
\end{aligned}
\tag{8}
\]

These transformations are introduced to facilitate the optimization of the NN by making g dimensionless, suppressing the change in magnitude, and regularizing the variance ranges of all input elements. For the activation function f, we adopted the smooth nonlinear activation function named “exponential linear units”44, which is defined as f(x) = max(0, x) + min(0, e^x − 1). The last layer hH is designed to keep the value of εxc nonpositive. The dimensions of the parameter matrices and bias vectors are as follows: dim W1 = 100 × N, dim W2 = dim W3 = 100 × 100, dim W4 = 1 × 100, dim b1 = dim b2 = dim b3 = 100, dim b4 = 1, where N represents the number of elements in g.

Functional with nonlocal DD

We suggest a functional form treating nonlocality by introducing a nonlocal descriptor:

\[ E_{xc}[n] = \int d\mathbf{r}\, n(\mathbf{r})\, \varepsilon_{xc}(g[n](\mathbf{r})), \tag{9} \]

\[ g[n](\mathbf{r}) = (g_{\mathrm{local}}(\mathbf{r}), R(\mathbf{r})), \tag{10} \]

\[ R(\mathbf{r}) = \int d\mathbf{r}'\, n(\mathbf{r}')\, d(\mathbf{r}, \mathbf{r}'). \tag{11} \]

glocal(r) represents (semi-)local descriptors such as n(r), s(r), or τ(r), while R(r) includes the weighted DD around r, with the weight function d(r, r′) vanishing in the |r − r′| → ∞ limit. As a result of the nonlocality, the functional derivative contains an integration over the whole space:

\[
V_{xc}[n](\mathbf{r}) = \frac{\delta E_{xc}}{\delta n(\mathbf{r})}
= \frac{\partial\{ n(\mathbf{r})\, \varepsilon_{xc}(g[n](\mathbf{r})) \}}{\partial g_{\mathrm{local}}(\mathbf{r})} \frac{\delta g_{\mathrm{local}}(\mathbf{r})}{\delta n(\mathbf{r})}
+ \int d\mathbf{r}'\, \frac{\partial\{ n(\mathbf{r}')\, \varepsilon_{xc}(g[n](\mathbf{r}')) \}}{\partial R(\mathbf{r}')}\, d(\mathbf{r}, \mathbf{r}').
\tag{12}
\]

We implemented those integrations numerically on the same grid points as those used in the exchange-correlation integration. The cost of evaluating the xc potential is proportional to the square of the system size. In this work, we defined d(r, r′) as exp(−|r − r′|/σ). σ was fixed to 0.2 bohr. This is derived from the inverse of the Fermi wavenumber, which is known to be the typical distance at which the contribution to the exchange-correlation hole at r from r′ decays28, in the H2O molecule estimated from the DD calculated by the CCSD calculation (averaged with respect to the number of electrons).

Training the NN-based functional

We used the Monte Carlo method by repeating the following steps to train the NN:

1. At the tth iteration, add a perturbation δwt to the weights wt in the NN. w represents both the elements in the matrices {Wi} and the vectors {bi}. Each element in δwt is generated randomly from the normal distribution N(0, δw).
2. Conduct the KS-DFT calculation for the target molecules and atoms to evaluate the cost function Δ^t_err in Eq. (14) using the NN-based functional with the weight parameters wt + δwt.
3. According to a random number p generated from the uniform distribution in (0,1) and the acceptance ratio P defined as follows, decide whether to accept or reject the weight perturbation δwt.

\[ P = \exp\left( -\frac{\Delta^{t}_{\mathrm{err}} - \Delta^{\mathrm{old}}_{\mathrm{err}}}{T\, \Delta^{\mathrm{old}}_{\mathrm{err}}} \right). \tag{13} \]

If P > 1: Set wt+1 = wt + δwt and Δ^old_err = Δ^t_err. Restart from step 2.
If p < P < 1: Set wt+1 = wt + δwt. Restart from step 1.
If P < p: Set wt+1 = wt. Restart from step 1.

We repeated those steps while decreasing δw and T. The cost function Δerr is defined as

\[
\Delta_{\mathrm{err}} = c_1 \left( \Delta_{\mathrm{G2}} AE_{\mathrm{H_2O}} + \Delta_{\mathrm{G2}} AE_{\mathrm{NH_3}} + \Delta_{\mathrm{G2}} AE_{\mathrm{NO}} \right) / E_0
+ c_2 \left( \Delta_{\mathrm{CCSD}} n_{\mathrm{H_2O}} + \Delta_{\mathrm{CCSD}} n_{\mathrm{NH_3}} + \Delta_{\mathrm{CCSD}} n_{\mathrm{NO}} \right),
\tag{14}
\]

where ΔG2AE represents the absolute deviation of the AE in hartree from the G2 calculation, and E0 was set to 1 hartree. ΔCCSDn represents the error between the n obtained by the DFT and CCSD calculations:

\[ \Delta_{\mathrm{CCSD}} n_M = \frac{1}{N_e} \sqrt{ \int d\mathbf{r} \left( n^{\mathrm{DFT}}_M(\mathbf{r}) - n^{\mathrm{CCSD}}_M(\mathbf{r}) \right)^2 }, \tag{15} \]

where Ne represents the number of electrons in molecule M. The integrations were conducted numerically on the same grid points as those used in the exchange-correlation integration of the KS equation (see the
“Computational details” section). E0 is adjusted such that the magnitudes of the contributions from the two terms become similar at the initial step of the training. c2/c1 determines the balance of the two terms. In this study, E0 and c2/c1 were fixed to 1 hartree and 10, respectively, for the training of any type of functional.

When training each NN-based functional, all steps were repeated approximately 300 times. The initial T and δw were set to 0.1 and 0.01, respectively, and they were linearly reduced to 0.06 and 0.005, respectively. All steps were conducted in parallel with 160 threads on ISSP System C, and the weight parameters that minimized the cost function were ultimately adopted.

NN-size dependency of performance

To find the optimum NN size, we compared the performance of NN-based meta-GGA functionals trained in the same way as described above with three different sizes: (α) H = 3, Nhidden = 50, (β) H = 4, Nhidden = 100, and (γ) H = 5, Nhidden = 200. Nhidden represents the size of the hidden layers, h1, ..., hH−1, in Eq. 3. As shown in Fig. 6, the performance improves to a certain extent as the NN size increases, but it does not improve further when the size is larger than β. Therefore, we decided to use the NN with size β throughout this work to balance computational cost and accuracy. Developing a more accurate functional would require finer tuning.

Fig. 6 Network-size dependencies of the performance of NN-based meta-GGA functionals. AE147 MAE and DD147 ME (see the caption of Table 1 for their definitions) of NN-based meta-GGA functionals implemented with different NN sizes are plotted. The definition of each NN size is shown in the main text.

Computational details

All DFT and CCSD calculations in our work were implemented using PySCF version 1.6.245, and the 6–311++G(3df,3pd) basis set was used both in training the NN-based functionals and in testing the accuracies of the functionals. For the DFT calculations, the default settings of PySCF were used throughout. For the integration of xc potentials and energy densities, we used the angular grids of Lebedev et al.46 and the radial grids of Treutler et al.47. The numbers of radial and angular grids were set to (50, 302) for H, (75, 302) for second-row elements, and (80–105, 434) for third-row elements. For molecules, Becke partitioning48 was used. The NN-based functionals could cause a convergence issue owing to poor extrapolation when they are applied to a density far from that included in the training dataset; therefore, the initial density guess of the self-consistent DFT calculation should be sufficiently close to the final density. In this work, initial guesses of the KS density were given by a superposition of atomic densities, which successfully made the calculations converge.

We used PyTorch version 1.1.0 for the NN implementation and took its derivative via the back-propagation technique49.

DATA AVAILABILITY

The individual values for all benchmark systems in Table 1 are listed in the Supplementary Table.

CODE AVAILABILITY

The trained NN parameters are available at https://github.com/ml-electron-project/NNfunctional with usages implemented in PySCF codes.

REFERENCES

1. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
2. Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
3. Brockherde, F. et al. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8, 872 (2017).
4. Grisafi, A. et al. Transferable machine-learning model of the electron density. ACS Cent. Sci. 5, 57–64 (2018).
5. Chandrasekaran, A. et al. Solving the electronic structure problem with machine learning. npj Comput. Mater. 5, 22 (2019).
6. Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
7. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965).
8. Perdew, J. P. & Schmidt, K. Jacob’s ladder of density functional approximations for the exchange-correlation energy. AIP Conf. Proc. 577, 1–20 (2001).
9. Mardirossian, N. & Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 115, 2315–2372 (2017).
10. Gillan, M. J., Alfè, D. & Michaelides, A. Perspective: how good is DFT for water? J. Chem. Phys. 144, 130901 (2016).
11. Ekholm, M. et al. Assessing the SCAN functional for itinerant electron ferromagnets. Phys. Rev. B 98, 094413 (2018).
12. Medvedev, M. G., Bushmarinov, I. S., Sun, J., Perdew, J. P. & Lyssenko, K. A. Density functional theory is straying from the path toward the exact functional. Science 355, 49–52 (2017).
13. Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R. & Burke, K. Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012).
14. Snyder, J. C. et al. Orbital-free bond breaking via machine learning. J. Chem. Phys. 139, 224104 (2013).
15. Li, L. et al. Understanding machine-learned density functionals. Int. J. Quantum Chem. 116, 819–833 (2016).
16. Nagai, R., Akashi, R., Sasaki, S. & Tsuneyuki, S. Neural-network Kohn-Sham exchange-correlation potential and its out-of-training transferability. J. Chem. Phys. 148, 241737 (2018).
17. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989).
18. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991).
19. Slater, J. C. A simplification of the Hartree-Fock method. Phys. Rev. 81, 385–390 (1951).
20. Vosko, S. H., Wilk, L. & Nusair, M. Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis. Can. J. Phys. 58, 1200–1211 (1980).
21. Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 38, 3098–3100 (1988).
22. Lee, C., Yang, W. & Parr, R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 37, 785–789 (1988).
23. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
24. Tao, J., Perdew, J. P., Staroverov, V. N. & Scuseria, G. E. Climbing the density functional ladder: nonempirical meta–generalized gradient approximation designed for molecules and solids. Phys. Rev. Lett. 91, 146401 (2003).
25. Sun, J., Ruzsinszky, A. & Perdew, J. P. Strongly constrained and appropriately normed semilocal density functional. Phys. Rev. Lett. 115, 036402 (2015).
26. Zhao, Y. & Truhlar, D. G. A new local density functional for main-group thermochemistry, transition metal bonding, thermochemical kinetics, and noncovalent interactions. J. Chem. Phys. 125, 194101 (2006).
27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
28. Gunnarsson, O., Jonson, M. & Lundqvist, B. I. Descriptions of exchange and correlation effects in inhomogeneous electron systems. Phys. Rev. B 20, 3136–3164 (1979).
29. Curtiss, L. A., Raghavachari, K., Trucks, G. W. & Pople, J. A. Gaussian-2 theory for molecular energies of first- and second-row compounds. J. Chem. Phys. 94, 7221–7230 (1991).
30. Čížek, J. On the correlation problem in atomic and molecular systems. Calculation of wavefunction components in Ursell-type expansion using quantum-field theoretical methods. J. Chem. Phys. 45, 4256–4266 (1966).
31. Purvis, G. D. & Bartlett, R. J. A full coupled-cluster singles and doubles model: the inclusion of disconnected triples. J. Chem. Phys. 76, 1910–1918 (1982).
32. Curtiss, L. A., Raghavachari, K., Redfern, P. C. & Pople, J. A. Assessment of Gaussian-2 and density functional theories for the computation of enthalpies of formation. J. Chem. Phys. 106, 1063–1079 (1997).
33. Lynch, B. J., Zhao, Y. & Truhlar, D. G. Effectiveness of diffuse basis functions for calculating relative energies by density functional theory. J. Phys. Chem. A 107, 1384–1388 (2003).
34. Zhao, Y., González-García, N. & Truhlar, D. G. Benchmark database of barrier heights for heavy atom transfer, nucleophilic substitution, association, and unimolecular reactions and its use to test theoretical methods. J. Phys. Chem. A 109, 2012–2018 (2005).
35. Ernzerhof, M. & Scuseria, G. E. Assessment of the Perdew-Burke-Ernzerhof exchange-correlation functional. J. Chem. Phys. 110, 5029–5036 (1999).
36. Stephens, P. J., Devlin, F. J., Chabalowski, C. F. & Frisch, M. J. Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J. Phys. Chem. 98, 11623–11627 (1994).
37. Zhao, Y. & Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 120, 215–241 (2008).
38. Perdew, J. P., Parr, R. G., Levy, M. & Balduz, J. L. Jr Density-functional theory for fractional particle number: derivative discontinuities of the energy. Phys. Rev. Lett. 49, 1691 (1982).
39. Tozer, D. J., Ingamells, V. E. & Handy, N. C. Exchange-correlation potentials. J. Chem. Phys. 105, 9200–9213 (1996).
40. McLachlan, A. & Ball, M. Time-dependent Hartree–Fock theory for molecules. Rev. Mod. Phys. 36, 844 (1964).
41. Furche, F. Molecular tests of the random phase approximation to the exchange-correlation energy functional. Phys. Rev. B 64, 195120 (2001).
42. Burke, K. Perspective on density functional theory. J. Chem. Phys. 136, 150901 (2012).
43. Oliver, G. L. & Perdew, J. P. Spin-density gradient expansion for the kinetic energy. Phys. Rev. A 20, 397–403 (1979).
44. Clevert, D. A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). ICLR2016 (2015).
45. Sun, Q. et al. PySCF: the Python-based simulations of chemistry framework. Wiley Interdiscip. Rev. Comput. Mol. Sci. 8, e1340 (2017).
46. Lebedev, V. I. & Laikov, D. A quadrature formula for the sphere of the 131st algebraic order of accuracy. in Dokl. Math. vol. 59, 477–481 (Pleiades Publishing, Ltd., 1999).
47. Treutler, O. & Ahlrichs, R. Efficient molecular numerical integration schemes. J. Chem. Phys. 102, 346–354 (1995).
48. Becke, A. D. A multicenter numerical integration scheme for polyatomic molecules. J. Chem. Phys. 88, 2547–2553 (1988).
49. Paszke, A. et al. Automatic differentiation in PyTorch. NeurIPS (2017).

ACKNOWLEDGEMENTS

R.N. thanks Takahito Nakajima and Yoshiyuki Yamamoto for their enlightening comments. Part of the calculations were performed at the Supercomputer Center at the Institute for Solid State Physics at the University of Tokyo.

AUTHOR CONTRIBUTIONS

R.N. designed the method, implemented the codes, and performed the calculation. All authors contributed to developing the concept, analyzing the results, and writing the paper.

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

Supplementary information is available for this paper at https://doi.org/10.1038/s41524-020-0310-0.

Correspondence and requests for materials should be addressed to R.N.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020