José M Soler 2002 J. Phys. Condens. Matter 14 2745
The SIESTA method for ab initio order-N materials simulation
Abstract
We have developed and implemented a selfconsistent density functional method
using standard norm-conserving pseudopotentials and a flexible, numerical
linear combination of atomic orbitals basis set, which includes multiple-zeta
and polarization orbitals. Exchange and correlation are treated with the local
spin density or generalized gradient approximations. The basis functions and
the electron density are projected on a real-space grid, in order to calculate
the Hartree and exchange–correlation potentials and matrix elements, with a
number of operations that scales linearly with the size of the system. We
use a modified energy functional, whose minimization produces orthogonal
wavefunctions and the same energy and density as the Kohn–Sham energy
functional, without the need for an explicit orthogonalization. Additionally,
using localized Wannier-like electron wavefunctions allows the computation
time and memory required to minimize the energy to also scale linearly with
the size of the system. Forces and stresses are also calculated efficiently
and accurately, thus allowing structural relaxation and molecular dynamics
simulations.
1. Introduction
As the improvements in computer hardware and software allow the simulation of molecules
and materials with an increasing number of atoms N , the use of so-called order-N algorithms,
in which the computer time and memory scale linearly with the simulated system size,
becomes increasingly important. These O(N ) methods were developed during the 1970s
and 80s for long-range forces [1] and empirical interatomic potentials [2] but only in the
last 5–10 years for the much more complex quantum mechanical methods, in which atomic
forces are obtained by solving the interaction of ions and electrons together [3]. Even among
quantum mechanical methods, there are very different levels of approximation: empirical
or semiempirical orthogonal tight-binding methods are the simplest ones [4, 5]; ‘ab initio’
nonorthogonal tight-binding and nonselfconsistent Harris-functional methods are next [6, 7]
and fully selfconsistent density functional theory (DFT) methods are the most complex and
reliable [8]. The implementation of O(N ) methods in quantum mechanical simulations has also
followed these steps, with several methods already well established within the tight-binding
formalism [5], but much less so in selfconsistent DFT [9]. The latter also require, in addition
to solving the Schrödinger equation, the determination of the selfconsistent Hamiltonian in O(N )
iterations. While this is difficult using plane waves, a localized basis set appears to be the
natural choice. One proposed approach is the ‘blips’ of Hernandez and Gillan [10], regularly
spaced Gaussian-like splines that can be systematically increased, in the spirit of finite-element
methods, although at a considerable computational cost.
We have developed a fully selfconsistent DFT, based on a flexible linear combination of
atomic orbitals (LCAO) basis set, with essentially perfect O(N ) scaling. It allows extremely
fast simulations using minimal basis sets and very accurate calculations with complete multiple-
zeta and polarized bases, depending on the required accuracy and available computational
power. In previous papers [11, 12] we have described preliminary versions of this method,
which we call SIESTA (Spanish Initiative for Electronic Simulations with Thousands of Atoms).
There is also a review [13] of the tens of studies performed with it, in a wide variety of
systems, such as metallic surfaces, nanotubes and biomolecules. In this work we present a
more complete description of the method, as well as some important improvements.
Apart from that of Born and Oppenheimer, the most basic approximations concern the
treatment of exchange and correlation (XC), and the use of pseudopotentials. Exchange and
correlation are treated within Kohn–Sham DFT [14]. We allow for both the local (spin) density
approximation [15] (LDA/LSD) and the generalized gradient approximation (GGA) [16]. We
use standard norm-conserving pseudopotentials [17, 18] in their fully nonlocal form [19]. We
also include scalar-relativistic effects and the nonlinear partial-core correction to treat XC in
the core region [20].
The SIESTA code has already been tested and applied to dozens of systems and a variety of
properties [13]. Therefore, we shall just illustrate here the convergence of a few characteristic
magnitudes of silicon, the archetypical system of the field, with respect to the main precision
parameters that characterize our method: basis size (number of atomic basis orbitals); basis
range (radius of the basis orbitals); fineness of the real-space integration grid and confinement
radius of the Wannier-like electron states. Other parameters, such as the k-sampling integration
grid, are common to all similar methods and we shall not discuss their convergence here.
2. Pseudopotential
Although the use of pseudopotentials is not strictly necessary with atomic basis sets, we
find them convenient to get rid of the core electrons and, more importantly, to allow for the
expansion of a smooth (pseudo-) charge density on a uniform spatial grid. The theory and usage
of first-principles norm-conserving pseudopotentials [17] is already well established. SIESTA
reads them in semilocal form (a different radial potential Vl (r) for each angular momentum l,
optionally generated scalar-relativistically [21,22]) from a data file that users can fill with their
preferred choice. We generally use the Troullier–Martins parametrization [23]. We transform
this semilocal form into the fully nonlocal form proposed by Kleinman and Bylander (KB) [19]:
$$\hat V^{PS} = V_{\rm local}(r) + \hat V^{KB} \tag{1}$$
$$\hat V^{KB} = \sum_{l=0}^{l_{max}^{KB}} \sum_{m=-l}^{l} \sum_{n=1}^{N_l^{KB}} |\chi_{lmn}^{KB}\rangle\, v_{ln}^{KB}\, \langle\chi_{lmn}^{KB}| \tag{2}$$
$$v_{ln}^{KB} = \langle\varphi_{ln}|\delta V_l(r)|\varphi_{ln}\rangle \tag{3}$$
where $r = |\vec r|$, $\hat r = \vec r/r$ and $\delta V_l(r) = V_l(r) - V_{\rm local}(r)$. The $\chi_{lmn}^{KB}(\vec r) = \chi_{ln}^{KB}(r)\,Y_{lm}(\hat r)$ (with $Y_{lm}(\hat r)$ a spherical harmonic) are the KB projection functions
$$\chi_{ln}^{KB}(r) = \delta V_l(r)\,\varphi_{ln}(r). \tag{4}$$
The functions $\varphi_{ln}$ are obtained from the eigenstates $\psi_{ln}$ of the semilocal pseudopotential (screened by the pseudo-valence charge density) at energy $\epsilon_{ln}$, using the orthogonalization scheme proposed by Blöchl [24]:
$$\varphi_{ln}(r) = \psi_{ln}(r) - \sum_{n'=1}^{n-1} \varphi_{ln'}(r)\, \frac{\langle\varphi_{ln'}|\delta V_l(r)|\psi_{ln}\rangle}{\langle\varphi_{ln'}|\delta V_l(r)|\varphi_{ln'}\rangle} \tag{5}$$
$$\left(-\frac{1}{2r}\frac{d^2}{dr^2}r + \frac{l(l+1)}{2r^2} + V_l(r) + V^H(r) + V^{xc}(r)\right)\psi_{ln}(r) = \epsilon_{ln}\,\psi_{ln}(r). \tag{6}$$
V H and V xc are the Hartree and XC potentials for the pseudo-valence charge density, and we
are using atomic units (e = h̄ = me = 1) throughout this paper.
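As an illustration of how equations (3) and (4) can be evaluated in practice, the following sketch computes the radial projector $\delta V_l(r)\varphi_{ln}(r)$ and the integral $\langle\varphi_{ln}|\delta V_l|\varphi_{ln}\rangle$ on a radial grid. It is only schematic: the radial grid, the integration weights and the analytic potentials and orbital used in the example are illustrative assumptions, not actual SIESTA data.

```python
import numpy as np

def kb_ingredients(r, w, V_l, V_local, phi_ln):
    """Evaluate the ingredients of equations (3) and (4) on a radial grid.
    r       : radial grid (bohr)
    w       : integration weights such that sum(w*f) ~ int f(r) dr
    V_l     : screened semilocal pseudopotential V_l(r) for this l
    V_local : local pseudopotential V_local(r)
    phi_ln  : radial function phi_ln(r) of the reference state
    Returns the radial projector dV*phi (eq. (4)) and <phi|dV|phi> (eq. (3))."""
    dV = V_l - V_local                  # delta V_l(r); zero beyond r_core by construction
    chi = dV * phi_ln                   # radial part of the KB projection function
    integral = np.sum(w * phi_ln * dV * phi_ln * r**2)   # <phi_ln| dV_l |phi_ln>
    return chi, integral

# Illustrative call with made-up (hydrogen-like) functions, only to show the interface:
r = np.linspace(1e-6, 6.0, 2000)
w = np.gradient(r)
phi = 2.0 * np.exp(-r)
V_l = -(1.0 / r) * (1.0 - np.exp(-2.0 * r))
V_loc = -(1.0 / r) * (1.0 - np.exp(-3.0 * r))
chi, v = kb_ingredients(r, w, V_l, V_loc, phi)
```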
The local part of the pseudopotential Vlocal (r) is in principle arbitrary, but it must join the
semilocal potentials Vl (r), which, by construction, all become equal to the (unscreened) all-
electron potential beyond the pseudopotential core radius rcore . Thus, δVl (r) = 0 for r > rcore .
Ramer and Rappe have proposed that Vlocal (r) be optimized for transferability [25], but most
plane wave schemes make it equal to one of the Vl (r) for reasons of efficiency. Our case is
different because Vlocal (r) is the only pseudopotential part that needs to be represented in the
real space grid, while the matrix elements of the nonlocal part V̂KB are cheaply and accurately
calculated by two-centre integrals. Therefore, we optimize Vlocal (r) for smoothness, making
it equal to the potential created by a positive charge distribution of the form [26]
$$\rho^{\rm local}(r) \propto \exp\left[-\left(\sinh(abr)/\sinh(b)\right)^2\right], \tag{7}$$
where $a$ and $b$ are chosen to provide simultaneously optimal real-space localization and
reciprocal-space convergence8. After some numerical tests we have taken $b = 1$ and
$a = 1.82/r_{\rm core}$. Figure 1 shows $V_{\rm local}(r)$ for silicon.
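A minimal numerical sketch of this construction is given below: it evaluates the charge shape of equation (7) with $b = 1$ and $a = 1.82/r_{\rm core}$, normalizes it to the valence charge $Z$ (as in figure 1), and obtains the corresponding electrostatic potential by radial integration of Gauss's law. The grid and the silicon-like values of $r_{\rm core}$ and $Z$ are illustrative assumptions.

```python
import numpy as np

def local_charge_and_potential(r, rcore, Z, b=1.0):
    """Charge distribution of equation (7), normalized to the valence charge Z,
    and the local pseudopotential it generates (attractive for electrons, -> -Z/r)."""
    a = 1.82 / rcore
    rho = np.exp(-(np.sinh(a * b * r) / np.sinh(b)) ** 2)
    rho *= Z / np.trapz(4.0 * np.pi * r**2 * rho, r)       # normalize: integral = Z
    f_in = 4.0 * np.pi * r**2 * rho                         # integrand of enclosed charge
    f_out = 4.0 * np.pi * r * rho                           # integrand of outer-shell term
    cum = lambda f: np.concatenate(([0.0],
          np.cumsum(np.diff(r) * 0.5 * (f[1:] + f[:-1]))))  # cumulative trapezoid
    q = cum(f_in)                                           # charge enclosed within r
    outer = np.trapz(f_out, r) - cum(f_out)                 # int_r^inf 4*pi*r'*rho dr'
    V_local = -(q / np.maximum(r, 1e-12) + outer)           # tends to -Z/r at large r
    return rho, V_local

# Illustrative silicon-like parameters (not the actual SIESTA input):
r = np.linspace(1e-4, 10.0, 4000)
rho, V = local_charge_and_potential(r, rcore=1.9, Z=4.0)
```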
Since $V_l(r) = V_{\rm local}(r)$ outside $r_{\rm core}$, $\chi_{ln}^{KB}(r)$ is strictly zero beyond that radius, irrespective of the value of $\epsilon_{ln}$. Generally it is sufficient to have a single projector $\chi_{lm}^{KB}$⁹
8 The local potentials constructed in this way usually have a strength (depth) that is an average of the different Vl and
neither too deep nor too shallow. This tends to maintain the separable potentials free of ghost states [80].
9 For some atoms, typically those with semicore states suitable for treating together with the valence states, $V_l(r)$ only assumes the asymptotic coulombic behaviour $-2Z_{\rm val}/r$, and therefore only cancels out exactly with our $V_{\rm local}(r)$, for $r$ larger than a certain $r_C > r_{\rm core}$. In these cases, to avoid very extended KB projector functions, we generate the local potentials with a prescription different from that presented in the text: if $r_C > 1.3\, r_{\rm core}$ we take $V_{\rm local}(r) = V_l(r)$ for $r > r_{\rm core}$ and $V_{\rm local}(r) = \exp(v_1 + v_2 r^2 + v_3 r^3)$ for $r < r_{\rm core}$, where $v_1$, $v_2$ and $v_3$ are determined by enforcing the continuity of the potential up to the second derivative. This simple prescription usually produces smooth local potentials with properties similar to those noted in the text (see footnote 8).
Figure 1. Local pseudopotential for silicon. Vlocal is the unscreened local part of the
pseudopotential, generated as the electrostatic potential produced by a localized distribution of
positive charge, equation (7), whose integral is equal to the valence ion charge (Z = 4 for Si). The
dashed curve is $-Z/r$. $V_{NA}$ is the local pseudopotential screened by an electron charge distribution,
generated by filling the first-ζ basis orbitals with the free-atom valence occupations. Since these
basis orbitals are strictly confined to a radius $r^c_{max}$, $V_{NA}$ is also strictly zero beyond that radius.
for each angular momentum (i.e. a single term in the sum on $n$). In this case we follow the
normal practice of making $\epsilon_{ln}$ equal to the valence atomic eigenvalue $\epsilon_l$, and the function $\varphi_l(r)$
in equation (4) is identical to the corresponding eigenstate $\psi_l(r)$. In some cases, particularly
for alkaline metals, alkaline earths and transition metals of the first few columns, we have
sometimes found it necessary to include the semicore states together with the valence states10 .
In these cases, we also include two independent KB projectors, one for the semicore and one
for the valence states. However, our pseudopotentials are still norm conserving rather than
‘ultrasoft’ [27]. This is because, in our case, it is only the electron density that needs to be
accurately represented in a real-space grid, rather than each wavefunction. Therefore, the
ultrasoft pseudopotential formalism does not imply in SIESTA the same savings as it does
in PW schemes. Also, since the nonlocal part of the pseudopotential is a relatively cheap
operator within SIESTA, we generally (but not necessarily) use a larger than usual value of
$l_{max}^{KB}$ in equation (2), making it one unit larger than the $l_{max}$ of the basis functions.
3. Basis set
Order-N methods rely heavily on the sparsity of the Hamiltonian and overlap matrices. This
sparsity requires either the neglect of matrix elements that are small enough or the use of
strictly confined basis orbitals, i.e. orbitals that are zero beyond a certain radius [7]. We have
adopted this latter approach because it keeps the energy strictly variational, thus facilitating
the test of the convergence with respect to the radius of confinement. Within this radius, our
atomic basis orbitals are products of a numerical radial function and a spherical harmonic. For
atom I , located at RI ,
$$\phi_{Ilmn}(\vec r) = \phi_{Iln}(r_I)\,Y_{lm}(\hat r_I) \tag{8}$$
where rI = r − RI . The angular momentum (labelled by l, m) may be arbitrarily large
and, in general, there will be several orbitals (labelled by index n) with the same angular
10 If there are both semicore and valence electrons with the same angular momentum, the pseudopotential is generated
for an ion.
the number of nonzero matrix elements, without any loss of variational freedom.
To achieve well converged results, in addition to the atomic valence orbitals, it is generally
necessary to also include polarization orbitals, to account for the deformation induced by bond formation.
where $\delta H = Ez$ and $\delta E = \langle\phi|\delta H|\phi\rangle = 0$ because $\delta H$ is odd. Selection rules imply that the resulting perturbed orbital will only have components with $l' = l \pm 1$, $m' = m$:
$$\delta H\,\phi_{lm}(\vec r) = \left(E\, r\cos\theta\right)\left(\phi_l(r)Y_{lm}(\hat r)\right) = E\, r\,\phi_l(r)\left(c_{l-1}Y_{l-1,m} + c_{l+1}Y_{l+1,m}\right) \tag{12}$$
and
Since in general there will already be orbitals with angular momentum $l - 1$ in the basis set, we select the $l + 1$ component by substituting (12) and (13) in (11), multiplying by $Y^*_{l+1,m}(\hat r)$ and integrating over angular variables. Thus we obtain the equation
$$\left(-\frac{1}{2r}\frac{d^2}{dr^2}r + \frac{(l+1)(l+2)}{2r^2} + V_l(r) - E_l\right)\varphi_{l+1}(r) = -r\,\phi_l(r) \tag{14}$$
where we have also eliminated the factors E and cl+1 , which only affect the normalization of
ϕl+1 . The polarization orbitals are then added to the basis set: φl+1,m (r ) = N ϕl+1 (r)Yl+1,m (r̂ ),
where N is a normalization constant.
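The radial equation (14) can be solved numerically in a few lines; the sketch below uses a simple finite-difference discretization for $u(r) = r\varphi_{l+1}(r)$ with $u = 0$ at both ends of the grid (the orbital is confined, like its parent $\phi_l$). It is only illustrative and does not reproduce SIESTA's actual integration scheme; the inputs $V_l$, $E_l$ and $\phi_l$ are assumed to be tabulated on a uniform radial grid.

```python
import numpy as np

def polarization_orbital(r, V_l, E_l, phi_l, l):
    """Solve equation (14) for phi_{l+1}(r) by finite differences,
    writing u(r) = r*phi_{l+1}(r) so that the radial kinetic operator
    -(1/2r) d^2/dr^2 (r .) becomes -u''/(2r)."""
    h = r[1] - r[0]
    ri = r[1:-1]                                     # interior points; u(0) = u(rc) = 0
    diag = 1.0 / h**2 + (l + 1) * (l + 2) / (2.0 * ri**2) + V_l[1:-1] - E_l
    off = -0.5 / h**2 * np.ones(len(ri) - 1)
    A = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    u = np.linalg.solve(A, -ri**2 * phi_l[1:-1])     # right-hand side -r*phi_l, times r
    phi = np.zeros_like(r)
    phi[1:-1] = u / ri
    return phi / np.sqrt(np.trapz((r * phi) ** 2, r))   # normalize like the basis orbitals
```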
We have found that the previously described procedures generate reasonable minimal SZ
basis sets, appropriate for semiquantitative simulations, and double-ζ plus polarization (DZP)
basis sets that yield high-quality results for most of the systems studied. We thus refer to DZP
as the ‘standard’ basis, because it usually represents a good balance between well converged
results and a reasonable computational cost. In some cases (typically alkali and some transition
metals), semicore states also need to be included for good-quality results. More recently [33],
we have obtained extremely efficient basis sets optimized variationally in molecules or solids.
Figure 2 shows the performance of these atomic basis sets compared with plane waves, using
the same pseudopotentials and geometries. It may be seen that the SZ bases are comparable
to plane-wave cutoffs typically used in Car–Parrinello molecular dynamics simulations, while
DZP sets are comparable to the cutoffs used in geometry relaxations and energy comparisons.
As expected, the LCAO is far more efficient, typically by a factor of 10–20, in terms of number
of basis orbitals. This efficiency must be balanced against the faster algorithms available
for plane waves, and our main motivation for using an LCAO basis is its suitability for O(N )
methods. Still, we have generally found that, even without using the O(N ) functional, SIESTA
is considerably faster than a plane-wave calculation of similar quality.
Figure 3 shows the convergence of the total-energy curve of silicon, as a function of
lattice parameter, for different basis sizes, and table 1 summarizes the same information
numerically. It can be seen that the ‘standard’ DZP basis already offers quite well converged
results, comparable to those used in practice in most plane-wave calculations.
Figure 4 shows the dependence of the lattice constant, bulk modulus and cohesive energy
of bulk silicon on the range of the basis orbitals. It shows that a cutoff radius of 3 Å for both
s and p orbitals already yields very well converged results, especially when using a ‘standard’
DZP basis.
[Figure 2 plot: total energy per Si atom versus plane-wave cutoff; basis sizes per atom are SZ (4), DZ (8), TZ (12), SZP (9), DZP (13), TZP (17), TZDP (22), TZTP (27) and TZTPF (34); the plane-wave basis contains 25–464 functions for cutoffs of 5–35 Ryd.]
Figure 2. Comparison of convergence of the total energy with respect to the sizes of a plane-
wave basis set and of the LCAO basis set used by SIESTA. The curve shows the total energy per
atom of silicon versus the cutoff of a plane-wave basis, calculated with a program independent of
SIESTA, which uses the same pseudopotential. The arrows indicate the energies obtained with
different LCAO basis sets, calculated with SIESTA, and the plane-wave cutoffs that yield the same
energies. The numbers in parentheses indicate the basis sizes, i.e. the number of atomic orbitals
or plane waves of each basis set. SZ, single ζ (valence s and p orbitals); DZ, double ζ ; TZ,
triple ζ ; DZP, double-ζ valence orbitals plus SZ-polarization d orbitals; TZP, triple-ζ valence plus
SZ polarization; TZDP, triple-ζ valence plus double-ζ polarization; TZTP, triple-ζ valence plus
triple-ζ polarization; TZTPF, the same as TZTP plus extra SZ-polarization f orbitals.
Figure 3. Total energy per atom versus lattice constant for bulk silicon, using different basis sets,
denoted as in figure 2. PW refers to a very well converged (50 Ryd cutoff) plane-wave calculation.
The dotted curve joins the minima of the different curves.
4. Electron Hamiltonian
Figure 4. Dependence of the lattice constant, bulk modulus and cohesive energy of bulk silicon
on the cutoff radius of the basis orbitals. The s and p orbital radii have been made equal in this
case, to simplify the plot. PW refers to a well converged plane-wave calculation with the same
pseudopotential.
Table 1. Comparisons of the lattice constant a, bulk modulus B and cohesive energy Ec for bulk
Si, obtained with different basis sets. The basis notation is as in figure 2. PW refers to a 50 Ryd
cutoff plane-wave calculation. The LAPW results were taken from [34], and the experimental
values from [35].
In order to eliminate the long range of $V_I^{\rm local}$, we screen it with the potential $V_I^{\rm atom}$, created by an atomic electron density $\rho_I^{\rm atom}$, constructed by populating the basis functions with appropriate valence atomic charges. Notice that, since the atomic basis orbitals are zero beyond the cutoff radius $r_I^c = \max_l(r_{Il}^c)$, the screened ‘neutral-atom’ (NA) potential $V_I^{NA} \equiv V_I^{\rm local} + V_I^{\rm atom}$ is also zero beyond this radius [7] (see figure 1). Now let $\delta\rho(\vec r)$ be the difference between the selfconsistent electron density $\rho(\vec r)$ and the sum of atomic densities $\rho^{\rm atom} = \sum_I\rho_I^{\rm atom}$, and let $\delta V^H(\vec r)$ be the electrostatic potential generated by $\delta\rho(\vec r)$, which integrates to zero and is usually much smaller than $\rho(\vec r)$. Then the total Hamiltonian may be rewritten as
$$\hat H = \hat T + \sum_I \hat V_I^{KB} + \sum_I V_I^{NA}(\vec r) + \delta V^H(\vec r) + V^{xc}(\vec r). \tag{16}$$
The matrix elements of the first two terms involve only two-centre integrals, which are
calculated in reciprocal space and tabulated as a function of interatomic distance. The
remaining terms involve potentials which are calculated on a three-dimensional real-space
grid. We consider these two approaches in detail in the following sections.
5. Two-centre integrals
The overlap matrix and the largest part of the Hamiltonian matrix elements are given by
two-centre integrals11 . We calculate these integrals in Fourier space, as proposed by Sankey
and Niklewski [7], but we use some implementation details explained in this section. Let us
consider first overlap integrals of the form
$$S(\vec R) \equiv \langle\psi_1|\psi_2\rangle = \int \psi_1^*(\vec r)\,\psi_2(\vec r - \vec R)\, d^3r, \tag{17}$$
where the integral is over all space and $\psi_1$, $\psi_2$ may be basis functions $\phi_{lmn}$, KB pseudopotential projectors $\chi_{lmn}$ or more complicated functions centred on the atoms. The function $S(\vec R)$ can be seen as a convolution: we take the Fourier transform
$$\psi(\vec k) = \frac{1}{(2\pi)^{3/2}}\int\psi(\vec r)\,e^{-i\vec k\cdot\vec r}\, d^3r \tag{18}$$
where we use the same symbol $\psi$ for $\psi(\vec r)$ and $\psi(\vec k)$, as its meaning is clear from the different arguments. We also use the plane-wave expression of Dirac's delta function, $\int e^{i(\vec k' - \vec k)\cdot\vec r}\, d^3r = (2\pi)^3\delta(\vec k' - \vec k)$, to find the usual result that the Fourier transform of a convolution in real space is a simple product in reciprocal space:
$$S(\vec R) = \int\psi_1^*(\vec k)\,\psi_2(\vec k)\,e^{-i\vec k\cdot\vec R}\, d^3k. \tag{19}$$
Let us assume now that the functions $\psi(\vec r)$ can be expanded exactly with a finite number of spherical harmonics:
$$\psi(\vec r) = \sum_{l=0}^{l_{max}}\sum_{m=-l}^{l}\psi_{lm}(r)\,Y_{lm}(\hat r), \tag{20}$$
$$\psi_{lm}(r) = \int_0^\pi \sin\theta\, d\theta\int_0^{2\pi} d\varphi\; Y^*_{lm}(\theta,\varphi)\,\psi(r,\theta,\varphi). \tag{21}$$
11 Some integrals, such as $\langle\phi_{Ilmn}|V_I^{NA}|\phi_{I'l'm'n'}\rangle$, could also be calculated in this way, but this is not the case for those like $\langle\phi_{Ilmn}|V_J^{NA}|\phi_{I'l'm'n'}\rangle$, which involve rather cumbersome three-centre integrals of arbitrary numerical functions. Therefore, it is simpler to find the total NA potential and to calculate a single integral $\langle\phi_{Ilmn}|V^{NA}(\vec r)|\phi_{I'l'm'n'}\rangle$ in the uniform spatial grid.
This is clearly true for basis functions and KB projectors, which contain a single spherical harmonic, and also for functions such as $x\psi(\vec r)$, which appear in dipole matrix elements. We now substitute in (18) the expansion of a plane wave in spherical harmonics [36]
$$e^{i\vec k\cdot\vec r} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l} i^l\, j_l(kr)\, Y^*_{lm}(\hat k)\, Y_{lm}(\hat r), \tag{22}$$
to obtain
$$\psi(\vec k) = \sum_{l=0}^{l_{max}}\sum_{m=-l}^{l}\psi_{lm}(k)\,Y_{lm}(\hat k), \tag{23}$$
$$\psi_{lm}(k) = \sqrt{\frac{2}{\pi}}\,(-i)^l\int_0^\infty r^2\, dr\, j_l(kr)\,\psi_{lm}(r). \tag{24}$$
Substituting now (23) and (22) into (19) we obtain
$$S(\vec R) = \sum_{l=0}^{2l_{max}}\sum_{m=-l}^{l}S_{lm}(R)\,Y_{lm}(\hat R) \tag{25}$$
where
$$S_{lm}(R) = \sum_{l_1m_1}\sum_{l_2m_2}G_{l_1m_1,l_2m_2,lm}\,S_{l_1m_1,l_2m_2,l}(R), \tag{26}$$
$$G_{l_1m_1,l_2m_2,lm} = \int_0^\pi\sin\theta\, d\theta\int_0^{2\pi}d\varphi\; Y^*_{l_1m_1}(\theta,\varphi)\,Y_{l_2m_2}(\theta,\varphi)\,Y^*_{lm}(\theta,\varphi), \tag{27}$$
$$S_{l_1m_1,l_2m_2,l}(R) = 4\pi\, i^{l_1-l_2-l}\int_0^\infty k^2\, dk\, j_l(kR)\, i^{-l_1}\psi^*_{1,l_1m_1}(k)\, i^{l_2}\psi_{2,l_2m_2}(k). \tag{28}$$
Notice that $i^{-l_1}\psi^*_1(k)$, $i^{l_2}\psi_2(k)$ and $i^{l_1-l_2-l}$ are all real, since $l_1 - l_2 - l$ is even for all $l$ for which $G_{l_1m_1,l_2m_2,lm} \neq 0$. The Gaunt coefficients $G_{l_1m_1,l_2m_2,lm}$ can be obtained by recursion from Clebsch–Gordan coefficients [7]. However, we use real spherical harmonics for computational efficiency:
$$Y_{lm}(\theta,\varphi) = C_{lm}\,P_l^{m}(\cos\theta)\begin{cases}\sin(m\varphi) & \text{if } m < 0\\ \cos(m\varphi) & \text{if } m \geqslant 0\end{cases} \tag{29}$$
where $P_l^m(z)$ are the associated Legendre polynomials and $C_{lm}$ normalization constants [28]. This does not affect the validity of any of the previous equations, but it modifies the value of the Gaunt coefficients. Therefore, we find it simpler and more general to calculate $G_{l_1m_1,l_2m_2,lm}$ directly from equation (27). To do this, we use a Gaussian quadrature [28]
$$\int_0^\pi\sin\theta\, d\theta\int_0^{2\pi}d\varphi \;\rightarrow\; 4\pi\,\frac{1}{N_\theta}\sum_{i=1}^{N_\theta}w_i\sin\theta_i\;\frac{1}{N_\varphi}\sum_{j=1}^{N_\varphi} \tag{30}$$
with $N_\varphi = 1 + 3l_{max}$, $N_\theta = 1 + {\rm int}(3l_{max}/2)$, and the points $\cos\theta_i$ and weights $w_i$ calculated as described in [28]. This quadrature is exact in equation (27) for spherical harmonics $Y_{lm}$ (real or complex) of $l \leqslant l_{max}$, and it can also be used to find the expansion of $\psi(\vec r)$ in spherical harmonics (equation (21)).
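As an illustration, Gaunt coefficients for real spherical harmonics can be generated with a few lines of code using this kind of product quadrature (Gauss–Legendre in $\cos\theta$, uniform in $\varphi$). The sketch below builds real harmonics from SciPy's complex ones; the sign convention and the slightly generous number of quadrature points are choices of this example, not necessarily those used in SIESTA.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonics built from SciPy's complex ones
    (SciPy convention: sph_harm(m, l, azimuth, polar))."""
    if m == 0:
        return np.real(sph_harm(0, l, phi, theta))
    if m > 0:
        return np.sqrt(2.0) * (-1) ** m * np.real(sph_harm(m, l, phi, theta))
    return np.sqrt(2.0) * (-1) ** m * np.imag(sph_harm(-m, l, phi, theta))

def gaunt_real(l1, m1, l2, m2, l3, m3):
    """Gaunt coefficient of equation (27) for real spherical harmonics,
    by a Gauss-Legendre (theta) x uniform (phi) angular quadrature."""
    lmax = max(l1, l2, l3)
    ntheta = 2 + (3 * lmax) // 2          # enough points for exactness of the product
    nphi = 1 + 3 * lmax
    x, w = np.polynomial.legendre.leggauss(ntheta)   # x = cos(theta)
    theta = np.arccos(x)
    phi = 2.0 * np.pi * np.arange(nphi) / nphi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    f = (real_sph_harm(l1, m1, T, P) * real_sph_harm(l2, m2, T, P)
         * real_sph_harm(l3, m3, T, P))
    return (2.0 * np.pi / nphi) * np.sum(w[:, None] * f)

# Example: G_{00,00,00} should be 1/sqrt(4*pi) ~ 0.2821
print(gaunt_real(0, 0, 0, 0, 0, 0))
```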
The coefficients Gl1 m1 ,l2 m2 ,lm are universal and they can be calculated and stored once
and for all. The functions Sl1 m1 ,l2 m2 ,l (R) depend, of course, on the functions ψ1,2 (r ) being
integrated. For each pair of functions, they can be calculated and stored in a fine radial grid
Ri , up to the maximum distance Rmax = r1c + r2c at which ψ1 and ψ2 overlap. Their value at
an arbitrary distance R can then be obtained very accurately using a spline interpolation.
Kinetic matrix elements $T(\vec R) \equiv \langle\psi_1|-\tfrac{1}{2}\nabla^2|\psi_2\rangle$ can be obtained in exactly the same way, except for an extra factor of $k^2/2$ in equation (28):
$$T_{l_1m_1,l_2m_2,l}(R) = 4\pi\, i^{l_1-l_2-l}\int_0^\infty \frac{k^4}{2}\, dk\, j_l(kR)\, i^{-l_1}\psi^*_{1,l_1m_1}(k)\, i^{l_2}\psi_{2,l_2m_2}(k). \tag{31}$$
Since we frequently use basis orbitals with a kink [7], we need rather fine radial grids to obtain
accurate kinetic matrix elements, and we typically use grid cutoffs of more than 2000 Ryd for
this purpose. Once obtained, the fine grid does not penalize the execution time, because the
interpolation effort is independent of the number of grid points. It also affects very marginally
the storage requirements, because of the one-dimensional character of the tables. However,
even though it needs to be performed only once, the calculation of the radial integrals (24), (28),
and (31) is not negligible if performed unwisely. We have developed a special fast radial Fourier
transform for this purpose, as explained in appendix B.
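A straightforward (if not particularly fast) way to obtain the radial transforms of equation (24) is a direct quadrature with spherical Bessel functions, as in the sketch below; it does not reproduce the optimized radial Fourier transform of appendix B, and the confined test orbital is an illustrative assumption.

```python
import numpy as np
from scipy.special import spherical_jn

def radial_bessel_transform(l, r, f_r, k):
    """Radial part of equation (24): sqrt(2/pi) * int_0^inf r^2 j_l(kr) f(r) dr.
    The (-i)^l phase of (24) is carried separately, as in equation (28)."""
    kr = np.outer(k, r)                          # shape (nk, nr)
    jl = spherical_jn(l, kr)
    return np.sqrt(2.0 / np.pi) * np.trapz(jl * (r**2 * f_r), r, axis=1)

# Illustrative orbital: a strictly confined, kink-free s function
r = np.linspace(0.0, 5.0, 3000)
f = np.where(r < 4.0, (1.0 - r / 4.0) ** 2, 0.0)   # zero beyond rc = 4 bohr
k = np.linspace(0.0, 20.0, 500)
f_k = radial_bessel_transform(0, r, f, k)
```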
Dipole matrix elements, such as ψ1 |x|ψ2 , can also be obtained easily by defining a new
function χ1 (r ) ≡ xψ1 (r ), expanding it using (21) and computing χ1 |ψ2 as explained above
(with the precaution of using lmax + 1 instead of lmax ).
6. Grid integrals
The matrix elements of the last three terms of equation (16) involve potentials which are
calculated on a real-space grid. The fineness of this grid is controlled by a ‘grid cutoff’
Ecut : the maximum kinetic energy of the plane waves that can be represented in the grid
without aliasing12 . The short-range screened pseudopotentials VIN A (r ) in (16) are tabulated
as a function of the distance to atoms I and easily interpolated at any desired grid point. The
last two terms require the calculation of the electron density on the grid. Let ψi (r ) be the
Hamiltonian eigenstates, expanded in the atomic basis set
$$\psi_i(\vec r) = \sum_\mu \phi_\mu(\vec r)\, c_{\mu i}, \tag{32}$$
where $c_{\mu i} = \langle\tilde\phi_\mu|\psi_i\rangle$ and $\tilde\phi_\mu$ is the dual orbital of $\phi_\mu$: $\langle\tilde\phi_\mu|\phi_\nu\rangle = \delta_{\mu\nu}$. We use the compact index notation $\mu \equiv \{Ilmn\}$ for the basis orbitals, equation (8). The electron density is then
$$\rho(\vec r) = \sum_i n_i\, |\psi_i(\vec r)|^2 \tag{33}$$
where $n_i$ is the occupation of state $\psi_i$. If we substitute (32) into (33) and define a density matrix
$$\rho_{\mu\nu} = \sum_i c_{\mu i}\, n_i\, c_{i\nu}, \tag{34}$$
where $c_{i\nu} \equiv c^*_{\nu i}$, the electron density can be rewritten as
$$\rho(\vec r) = \sum_{\mu\nu}\rho_{\mu\nu}\,\phi^*_\nu(\vec r)\,\phi_\mu(\vec r). \tag{35}$$
We use the notation φµ∗ for generality, despite our use of real basis orbitals in practice. Then, to
calculate the density at a given grid point, we first find all the atomic basis orbitals, equation (8),
at that point, interpolating the radial part from numerical tables, and then we use (35) to calculate
the density. Notice that only a small number of basis orbitals are nonzero at a given grid point,
so that the calculation of the density can be performed in O(N ) operations, once ρµν is known.
12 Notice that our grid cutoff to represent the density is not directly comparable to the energy cutoff in the context
of plane-wave codes, which usually refers to the wavefunctions. Strictly speaking, the density requires a value four
times larger.
The storage of the orbital values at the grid points can be one of the most expensive parts of
the program in terms of memory usage. Hence, an option is included to calculate and use
these terms on the fly, in the spirit of a direct-SCF calculation. The calculation of ρµν itself
with equation (34) does not scale linearly with the system size, requiring instead the use of
special O(N ) techniques to be described below. However, notice that in order to calculate
the density, only the matrix elements ρµν for which φµ and φν overlap are required, and they
can therefore be stored as a sparse matrix of O(N ) size. Once the valence density is available
in the grid, we add to it, if necessary, the nonlinear core correction [20], a spherical charge
density intended to simulate the atomic cores, which is also interpolated from a radial grid.
With it, we find the XC potential V xc (r ), trivially in the LDA and using the method described
in [?] for the GGA. To calculate δV H (r ), we first find ρ atom (r ) at the grid points, as a sum
of spherical atomic densities (also interpolated from a radial grid) and subtract it from ρ(r )
to find δρ(r ). We then solve Poisson’s equation to obtain δV H (r ) and find the total grid
potential $V(\vec r) = V^{NA}(\vec r) + \delta V^H(\vec r) + V^{xc}(\vec r)$. Finally, at every grid point, we calculate
$V(\vec r)\,\phi^*_\mu(\vec r)\,\phi_\nu(\vec r)\,\Delta r^3$ for all pairs $\phi_\mu$, $\phi_\nu$ which are not zero at that point ($\Delta r^3$ is the volume
per grid point) and add it to the Hamiltonian matrix element $H_{\mu\nu}$.
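The following sketch illustrates, for a toy system of s-type orbitals, how the density of equation (35) is accumulated at the grid points using only the orbitals that are nonzero there, so that the cost per point does not grow with system size. The sparse-matrix block extraction and the simple exponential radial shape are illustrative choices, not SIESTA's implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix

def density_on_grid(points, centers, rc, radial, dm):
    """Equation (35): rho(r) = sum_{mu nu} rho_{mu nu} phi_nu(r) phi_mu(r),
    evaluated at each grid point with only the (few) orbitals nonzero there.
    points  : (Npoints, 3) grid coordinates
    centers : (Norb, 3) orbital centres (one s-type orbital per centre here)
    rc      : common orbital cutoff radius
    radial  : callable mapping distances to orbital values (zero beyond rc)
    dm      : sparse density matrix rho_{mu nu} (SciPy CSR)"""
    rho = np.zeros(len(points))
    for ip, p in enumerate(points):
        d = np.linalg.norm(centers - p, axis=1)
        nz = np.flatnonzero(d < rc)              # orbitals reaching this point
        if nz.size == 0:
            continue
        phi = radial(d[nz])
        block = dm[nz, :][:, nz].toarray()       # small dense block of rho_{mu nu}
        rho[ip] = phi @ block @ phi
    return rho

# Tiny illustration: two overlapping 'atoms' along a line of grid points
centers = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
dm = csr_matrix(np.array([[1.0, 0.2], [0.2, 1.0]]))
radial = lambda d: np.where(d < 3.0, np.exp(-d), 0.0)
pts = np.column_stack([np.linspace(-2.0, 3.5, 50), np.zeros(50), np.zeros(50)])
rho = density_on_grid(pts, centers, 3.0, radial, dm)
```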
To solve Poisson’s equation and find δV H (r ) we normally use fast Fourier transforms
in a unit cell that is either naturally periodic or made artificially periodic by a supercell
construction. For neutral isolated molecules, our use of strictly confined basis orbitals makes
it trivial to avoid any direct overlap between the repeated molecules, and the electric multipole
interactions decrease rapidly with cell size. For charged molecules we suppress the G = 0
Fourier component (an infinite constant) of the potential created by the excess of charge. This
amounts to compensating this excess with a uniform charge background. We then use the
method of Makov and Payne [38] to correct the total energy for the interaction between the
repeated cells. Alternatively, we can solve Poisson’s equation by the multigrid method, using
finite differences and fixed boundary conditions, obtained from the multipole expansion of
the molecular charge density. This can be done in strictly O(N ) operations, unlike the fast
Fourier transformations (FFTs), which scale as N log N . However, the cost of this operation
is typically negligible and therefore has no influence on the overall scaling properties of the
calculation.
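A bare-bones version of this FFT Poisson solve, for an orthorhombic cell, is sketched below; the $G = 0$ component is simply zeroed, which for a charged system corresponds to the uniform compensating background mentioned above. The grid shape and units are assumptions of the example.

```python
import numpy as np

def hartree_potential_fft(delta_rho, cell_lengths):
    """Solve Poisson's equation on a periodic grid by FFT: V(G) = 4*pi*rho(G)/G^2,
    with the G = 0 component removed (uniform compensating background).
    delta_rho    : 3D array with the density difference on the grid (e/bohr^3)
    cell_lengths : (3,) lengths of an orthorhombic cell (bohr)"""
    n = delta_rho.shape
    rho_G = np.fft.fftn(delta_rho)
    gx = 2.0 * np.pi * np.fft.fftfreq(n[0], d=cell_lengths[0] / n[0])
    gy = 2.0 * np.pi * np.fft.fftfreq(n[1], d=cell_lengths[1] / n[1])
    gz = 2.0 * np.pi * np.fft.fftfreq(n[2], d=cell_lengths[2] / n[2])
    G2 = gx[:, None, None]**2 + gy[None, :, None]**2 + gz[None, None, :]**2
    G2[0, 0, 0] = 1.0                      # avoid division by zero
    V_G = 4.0 * np.pi * rho_G / G2
    V_G[0, 0, 0] = 0.0                     # suppress the G = 0 component
    return np.real(np.fft.ifftn(V_G))
```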
Figures 5 and 6 show the convergence of different magnitudes with respect to the energy cutoff of the integration grid. For orthogonal unit cell vectors this is simply, in atomic units, $E_{\rm cut} = (\pi/\Delta x)^2/2$, with $\Delta x$ the grid interval. For example, a 100 Ryd (50 au) cutoff corresponds to a grid interval $\Delta x = \pi/\sqrt{2\times 50} \approx 0.31$ bohr.
7. Noncollinear spin
In the usual case of a normal (collinear) spin-polarized system, there are two sets of values for
ψi (r ), ρµν , ρ(r ), V xc (r ) and Hµν , one for spin up and another for spin down. Thus, the grid
calculations can be repeated twice in an almost independent way: only to calculate V xc (r )
need they be combined. However, in the noncollinear spin case [39–42], the density at every
point is represented not only by the up and down values, but also by a vector giving the spin
direction. Equivalently, it may be represented by a local spin density matrix
$$\rho^{\alpha\beta}(\vec r) = \sum_i n_i\,\psi_i^{\beta*}(\vec r)\,\psi_i^{\alpha}(\vec r) = \sum_{\mu\nu}\rho^{\alpha\beta}_{\mu\nu}\,\phi^*_\nu(\vec r)\,\phi_\mu(\vec r) \tag{36}$$
$$\psi^\alpha_i(\vec r) = \sum_\mu \phi_\mu(\vec r)\, c^\alpha_{\mu i} \tag{37}$$
$$\rho^{\alpha\beta}_{\mu\nu} = \sum_i c^\alpha_{\mu i}\, n_i\, c^\beta_{i\nu} \tag{38}$$
Figure 5. (a) Convergence of the total energy and pressure in bulk silicon as a function of the
energy cutoff Ecut of the real-space integration mesh. Circles and continuous line: using a grid-cell
sampling of eight refinement points per original grid point. The refinement points are used only in
the final calculation, not during the selfconsistency iteration (see text). Triangles: two refinement
points per original grid point. White circles: no grid-cell sampling. (b) Bond length and angle of
the water molecule as a function of Ecut .
where $\alpha$, $\beta$ are spin indices, with up or down values. The coefficients $c^\alpha_{\mu i}$ are obtained by solving the generalized eigenvalue problem
$$\sum_{\nu\beta}\left(H^{\alpha\beta}_{\mu\nu} - E_i\, S_{\mu\nu}\,\delta^{\alpha\beta}\right)c^\beta_{\nu i} = 0 \tag{39}$$
where $H^{\alpha\beta}_{\mu\nu}$, like $\rho^{\alpha\beta}_{\mu\nu}$, is a $(2N\times 2N)$ matrix, with $N$ the number of basis functions:
$$H^{\alpha\beta}_{\mu\nu} = \langle\phi_\mu|\hat T + \hat V^{KB} + V^{NA}(\vec r) + \delta V^H(\vec r) + V^{\alpha\beta}_{XC}(\vec r)|\phi_\nu\rangle. \tag{40}$$
This is in contrast to the collinear spin case, in which the Hamiltonian and density matrices can be factorized into two $N\times N$ matrices, one for each spin direction. To calculate $V^{\alpha\beta}_{XC}(\vec r)$ we first diagonalize the $2\times 2$ matrix $\rho^{\alpha\beta}(\vec r)$ at every point, in order to find the up and down spin densities $\rho^\uparrow(\vec r)$, $\rho^\downarrow(\vec r)$ in the direction of the local spin vector. We then find $V^\uparrow_{XC}(\vec r)$, $V^\downarrow_{XC}(\vec r)$ in that direction, with the usual local spin density functional [15], and we rotate $V^{\alpha\beta}_{XC}(\vec r)$ back to the original direction. Thus, the grid operations are still basically the same, except that they now need to be repeated three times, for the $\uparrow\uparrow$, $\downarrow\downarrow$ and $\uparrow\downarrow$ components. Notice that $\rho^{\alpha\beta}(\vec r)$ and $V^{\alpha\beta}_{XC}(\vec r)$ are locally Hermitian, while $H^{\alpha\beta}_{\mu\nu}$ and $\rho^{\alpha\beta}_{\mu\nu}$ are globally Hermitian ($H^{\beta\alpha}_{\nu\mu} = H^{\alpha\beta*}_{\mu\nu}$), so their $\downarrow\uparrow$ components can be obtained from the $\uparrow\downarrow$ ones.
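The local $2\times 2$ diagonalization and back-rotation described above can be sketched as follows for a single grid point; the toy exchange-only functional used in the example is just a placeholder for the actual LSD functional [15].

```python
import numpy as np

def vxc_noncollinear(rho2x2, vxc_of_updown):
    """At one grid point: diagonalize the 2x2 spin-density matrix, evaluate the
    collinear XC potentials for the eigen-densities (spin up/down along the local
    quantization axis) and rotate the result back to the original frame.
    rho2x2        : Hermitian 2x2 local spin-density matrix rho^{alpha beta}(r)
    vxc_of_updown : callable (n_up, n_down) -> (v_up, v_down), any LSD functional"""
    evals, U = np.linalg.eigh(rho2x2)          # local up/down densities and rotation
    n_dn, n_up = evals[0], evals[1]
    v_up, v_dn = vxc_of_updown(n_up, n_dn)
    V_diag = np.diag([v_dn, v_up])             # order matches the eigenvalue order
    return U @ V_diag @ U.conj().T             # V_xc back in the original spin frame

# Illustrative call with a toy exchange-only functional (not the one used in SIESTA):
toy_lsd = lambda nu, nd: (-(3.0 / np.pi * max(nu, 0.0)) ** (1.0 / 3.0),
                          -(3.0 / np.pi * max(nd, 0.0)) ** (1.0 / 3.0))
rho = np.array([[0.6, 0.1 - 0.05j], [0.1 + 0.05j, 0.4]])
Vxc = vxc_noncollinear(rho, toy_lsd)
```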
8. Brillouin zone sampling

Integration of all magnitudes over the Brillouin zone (BZ) is essential for small and moderately
large unit cells, especially of metals. Although SIESTA is designed for large unit cells, in
practice it is very useful, especially for comparisons and checks, to be able to also perform
calculations efficiently on smaller systems without using expensive superlattices. On the other
Figure 6. The same as figure 5 for the total energy and pressure of bulk iron. This is presented as
an especially difficult case because of the very hard partial core correction (rm = 0.7 au) required
for a correct description of XC.
hand, an efficient k-sampling implementation should not penalize, because of the required
complex arithmetic, the Γ-point calculations used in large cells. A solution used in some
programs is to have two different versions of all or part of the code, but this poses extra
maintenance requirements. We have dealt with this problem in the following way: around the
unit cell (and comprising the unit cell itself) we define an auxiliary supercell large enough to
contain all the atoms whose basis orbitals are nonzero at any of the grid points of the unit cell,
or which overlap with any of the basis orbitals in it. We calculate all the nonzero two-centre
integrals between the unit-cell basis orbitals and the supercell orbitals, without any complex
phase factors. We also calculate the grid integrals between all the supercell basis orbitals φµ
and φν (primed indices run over all the supercell), but within the unit cell only. We accumulate
these integrals in the corresponding matrix elements, thus making use of the relation
$$\langle\phi_\mu|V(\vec r)|\phi_\nu\rangle = \sum_{(\mu'\nu')\equiv(\mu\nu)}\langle\phi_{\mu'}|V(\vec r)\,f(\vec r)|\phi_{\nu'}\rangle. \tag{41}$$
Here $f(\vec r) = 1$ for $\vec r$ within the unit cell and zero otherwise, and $\phi_\mu$ is within the unit cell. The notation $\mu' \equiv \mu$ indicates that $\phi_{\mu'}$ and $\phi_\mu$ are equivalent orbitals, related by a lattice vector translation. $(\mu'\nu') \equiv (\mu\nu)$ means that the sum extends over all pairs of supercell orbitals $\phi_{\mu'}$ and $\phi_{\nu'}$ such that $\mu' \equiv \mu$, $\nu' \equiv \nu$, and $\vec R_{\mu'} - \vec R_{\nu'} = \vec R_\mu - \vec R_\nu$. Once all the real
overlap and Hamiltonian matrix elements are calculated, we multiply them, at every k-point, by the corresponding phase factors and accumulate them by folding the supercell orbital to its unit-cell counterpart. Thus
$$H_{\mu\nu}(\vec k) = \sum_{\nu'\equiv\nu} H_{\mu\nu'}\, e^{i\vec k\cdot(\vec R_{\nu'} - \vec R_\mu)} \tag{42}$$
where $\phi_\mu$ and $\phi_\nu$ are within the unit cell. The resulting $N\times N$ complex eigenvalue problem, with $N$ the number of orbitals in the unit cell, is then solved at every sampled k point, finding
where the sum in $\mu'$ extends to all basis orbitals in space, $i$ labels the different bands, $c_{\mu'i} = c_{\mu i}$ if $\mu' \equiv \mu$, and $\psi_i(\vec k,\vec r)$ is normalized in the unit cell.
The electron density is then
$$\rho(\vec r) = \sum_i\int_{BZ} n_i(\vec k)\,|\psi_i(\vec k,\vec r)|^2\, d^3k = \sum_{\mu'\nu'}\rho_{\mu'\nu'}\,\phi^*_{\nu'}(\vec r)\,\phi_{\mu'}(\vec r) \tag{44}$$
where the sum is again over all basis orbitals in space, and the density matrix
$$\rho_{\mu\nu} = \sum_i\int_{BZ} c_{\mu i}(\vec k)\, n_i(\vec k)\, c_{i\nu}(\vec k)\, e^{i\vec k\cdot(\vec R_\nu - \vec R_\mu)}\, d^3k \tag{45}$$
is real (for real $\phi_\mu$) and periodic, i.e. $\rho_{\mu\nu} = \rho_{\mu'\nu'}$ if $(\nu,\mu) \equiv (\nu',\mu')$ (with ‘≡’ meaning again ‘equivalent by translation’). Thus, to calculate the density at a grid point of the unit cell, we simply find the sum (44) over all the pairs of orbitals $\phi_{\mu'}$, $\phi_{\nu'}$ in the supercell that are nonzero at that point.
In practice, the integral in (45) is performed in a finite, uniform grid of the BZ. The fineness
of this grid is controlled by a k-grid cutoff lcut , a real-space radius which plays a role equivalent
to the plane-wave cutoff of the real-space grid [43]. The origin of the k-grid may be displaced
from k = 0 in order to decrease the number of inequivalent k-points [44].
If the unit cell is large enough to allow a Γ-point-only calculation, the multiplication by phase factors is skipped and a single real-matrix eigenvalue problem is solved (in this case, the real matrix elements are accumulated directly in the first stage, if multiple overlaps occur). In this way, no complex arithmetic penalty occurs, and the differences between Γ-point and k-sampling are limited to a very small section of the code, while all the two-centre and grid integrals always use the same real-arithmetic code.
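A compact sketch of the folding of equation (42) is given below: real supercell matrix elements are accumulated into the complex $H(\vec k)$ with the appropriate phase factors. The array layouts and names are assumptions of this example.

```python
import numpy as np

def fold_to_k(H_sc, sc_to_uc, R_sc, R_uc, k):
    """Equation (42): fold real supercell matrix elements H_{mu nu'} into the
    complex unit-cell matrix H_{mu nu}(k) with phases exp[i k.(R_nu' - R_mu)].
    H_sc     : (N_uc, N_sc) real matrix, unit-cell orbitals vs supercell orbitals
    sc_to_uc : (N_sc,) index of the unit-cell orbital equivalent to each supercell orbital
    R_sc     : (N_sc, 3) centres of the supercell orbitals
    R_uc     : (N_uc, 3) centres of the unit-cell orbitals
    k        : (3,) k-point (inverse length units consistent with R)"""
    n_uc = H_sc.shape[0]
    Hk = np.zeros((n_uc, n_uc), dtype=complex)
    for nu_p in range(H_sc.shape[1]):
        nu = sc_to_uc[nu_p]
        phase = np.exp(1j * ((R_sc[nu_p] - R_uc) @ k))   # one phase per row mu
        Hk[:, nu] += H_sc[:, nu_p] * phase
    return Hk
```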
9. Total energy
The Kohn–Sham [14] total energy can be written as a sum of a band-structure (BS) energy
plus some correction terms, sometimes called ‘double-count’ corrections. The BS term is the
sum of the energies of the occupied states ψi :
$$E^{BS} = \sum_i n_i\,\langle\psi_i|\hat H|\psi_i\rangle = \sum_{\mu\nu}H_{\mu\nu}\,\rho_{\nu\mu} = {\rm Tr}(H\rho) \tag{46}$$
where spin and k-sampling notations are omitted here for simplicity. At convergence, the ψi are
simply the eigenvectors of the Hamiltonian, but it is important to realize that the Kohn–Sham
functional is also perfectly well defined outside this so-called ‘Born–Oppenheimer surface’,
i.e. it is defined for any set of orthonormal ψi . The correction terms are simple functionals of
the electron density, which can be obtained from equation (35), and the atomic positions. The
Kohn–Sham total energy can then be written as
$$E^{KS} = \sum_{\mu\nu}H_{\mu\nu}\rho_{\nu\mu} - \frac{1}{2}\int V^H(\vec r)\,\rho(\vec r)\, d^3r + \int\left(\epsilon^{xc}(\vec r) - V^{xc}(\vec r)\right)\rho(\vec r)\, d^3r + \sum_{I<J}\frac{Z_IZ_J}{R_{IJ}} \tag{47}$$
where $I$, $J$ are atomic indices, $R_{IJ} \equiv |\vec R_J - \vec R_I|$, $Z_I$, $Z_J$ are the valence ion pseudoatom charges and $\epsilon^{xc}(\vec r)\rho(\vec r)$ is the exchange–correlation energy density. In order to avoid the long-range interactions of the last term, we construct from the local pseudopotential $V_I^{\rm local}$, which
has an asymptotic behaviour of $-Z_I/r$, a diffuse ion charge $\rho_I^{\rm local}(r)$ whose electrostatic potential is equal to $V_I^{\rm local}(r)$:
$$\rho_I^{\rm local}(\vec r) = -\frac{1}{4\pi}\nabla^2 V_I^{\rm local}(\vec r). \tag{48}$$
Notice that we define the electron density as positive, and therefore $\rho_I^{\rm local} \leqslant 0$. Then, we write the last term in (47) as
$$\sum_{I<J}\frac{Z_IZ_J}{R_{IJ}} = \frac{1}{2}\sum_{IJ}U_{IJ}^{\rm local}(R_{IJ}) + \sum_{I<J}\delta U_{IJ}^{\rm local}(R_{IJ}) - \sum_I U_I^{\rm local} \tag{49}$$
where $U_{IJ}^{\rm local}$ is the electrostatic interaction between the diffuse ion charges in atoms $I$ and $J$,
$$U_{IJ}^{\rm local}(|\vec R|) = \int V_I^{\rm local}(\vec r)\,\rho_J^{\rm local}(\vec r - \vec R)\, d^3r, \tag{50}$$
$\delta U_{IJ}^{\rm local}$ is a small short-range interaction term to correct for a possible overlap between the soft-ion charges, which appears when the core densities are very extended,
$$\delta U_{IJ}^{\rm local}(R) = \frac{Z_IZ_J}{R} - U_{IJ}^{\rm local}(R), \tag{51}$$
and $U_I^{\rm local}$ is the fictitious selfinteraction of an ion charge (notice that the first right-hand sum in (49) includes the $I = J$ terms):
$$U_I^{\rm local} = \frac{1}{2}U_{II}^{\rm local}(0) = \frac{1}{2}\int V_I^{\rm local}(r)\,\rho_I^{\rm local}(r)\,4\pi r^2\, dr. \tag{52}$$
Defining $\rho_I^{NA}$ from $V_I^{NA}$, analogously to $\rho_I^{\rm local}$, we have that $\rho_I^{NA} = \rho_I^{\rm local} + \rho_I^{\rm atom}$, and equation (47) can be transformed, after some rearrangement of terms, into
$$E^{KS} = \sum_{\mu\nu}\left(T_{\mu\nu} + V^{KB}_{\mu\nu}\right)\rho_{\nu\mu} + \frac{1}{2}\sum_{IJ}U^{NA}_{IJ}(R_{IJ}) + \sum_{I<J}\delta U^{\rm local}_{IJ}(R_{IJ}) - \sum_I U^{\rm local}_I + \int V^{NA}(\vec r)\,\delta\rho(\vec r)\, d^3r + \frac{1}{2}\int\delta V^H(\vec r)\,\delta\rho(\vec r)\, d^3r + \int\epsilon^{xc}(\vec r)\,\rho(\vec r)\, d^3r \tag{53}$$
where $V^{NA} = \sum_I V^{NA}_I$ and $\delta\rho = \rho - \sum_I\rho^{\rm atom}_I$.
$$U^{NA}_{IJ}(R) = \int V^{NA}_I(\vec r)\,\rho^{NA}_J(\vec r - \vec R)\, d^3r = -\frac{1}{4\pi}\int V^{NA}_I(\vec r)\,\nabla^2 V^{NA}_J(\vec r - \vec R)\, d^3r \tag{54}$$
is a radial pairwise potential that can be obtained from $V^{NA}_I(r)$ as a two-centre integral, by the same method as described previously for the kinetic matrix elements:
$$T_{\mu\nu} = \langle\phi_\mu|-\tfrac{1}{2}\nabla^2|\phi_\nu\rangle = -\tfrac{1}{2}\int\phi^*_\mu(\vec r)\,\nabla^2\phi_\nu(\vec r - \vec R_{\mu\nu})\, d^3r. \tag{55}$$
$V^{KB}_{\mu\nu}$ is also obtained by two-centre integrals:
$$V^{KB}_{\mu\nu} = \sum_\alpha\langle\phi_\mu|\chi_\alpha\rangle\, v^{KB}_\alpha\,\langle\chi_\alpha|\phi_\nu\rangle \tag{56}$$
where the sum is over all the KB projectors χα that overlap simultaneously with φµ and φν .
Although (53) is the total-energy equation actually used by SIESTA, its meaning may be further clarified if the $I = J$ terms of $\frac{1}{2}\sum_{IJ}U^{NA}_{IJ}(R_{IJ})$ are combined with $\sum_I U^{\rm local}_I$ to yield
$$E^{KS} = \sum_{\mu\nu}\left(T_{\mu\nu} + V^{KB}_{\mu\nu}\right)\rho_{\nu\mu} + \sum_{I<J}U^{NA}_{IJ}(R_{IJ}) + \sum_{I<J}\delta U^{\rm local}_{IJ}(R_{IJ}) + \sum_I U^{\rm atom}_I + \int V^{NA}(\vec r)\,\delta\rho(\vec r)\, d^3r + \frac{1}{2}\int\delta V^H(\vec r)\,\delta\rho(\vec r)\, d^3r + \int\epsilon^{xc}(\vec r)\,\rho(\vec r)\, d^3r \tag{57}$$
where
$$U^{\rm atom}_I = \int_0^\infty\left(V^{\rm local}_I(r) + \tfrac{1}{2}V^{\rm atom}_I(r)\right)\rho^{\rm atom}_I(r)\,4\pi r^2\, dr \tag{58}$$
is the electrostatic energy of an isolated atom.
The last three terms in equation (53) are calculated using the real-space grid. In addition
to getting rid of all long-range potentials (except that implicit in δV H (r )), the advantage
of (53) is that, apart from the relatively slowly varying exchange–correlation energy density,
the grid integrals involve δρ(r ), which is generally much smaller than ρ(r ). Thus, the errors
associated with the finite grid spacing are drastically reduced. Critically, the kinetic energy
matrix elements can be calculated almost exactly, without any grid integrations.
It is frequently desirable to introduce a finite electronic temperature $T$ and/or a fixed chemical potential $\mu$, either because of true physical conditions or to accelerate the selfconsistency iteration. Then, the functional that must be minimized is the free energy [45]
$$F(\{\vec R_I\},\{\psi_i(\vec r)\},\{n_i\}) = E^{KS}(\{\vec R_I\},\{\psi_i(\vec r)\},\{n_i\}) - \mu\sum_i n_i + k_BT\sum_i\left[n_i\log n_i + (1-n_i)\log(1-n_i)\right]. \tag{59}$$
Minimization with respect to $n_i$ yields the usual Fermi–Dirac distribution $n_i = 1/(1 + e^{(\epsilon_i - \mu)/k_BT})$.
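For reference, the occupations resulting from this minimization and the corresponding entropy contribution to (59) can be evaluated as follows (a sketch; spin degeneracy and k-point weights are omitted).

```python
import numpy as np

def occupations_and_entropy(eigvals, mu, kT):
    """Fermi-Dirac occupations that minimize the free energy (59), and the
    electronic -T*S term, i.e. kT * sum_i [n ln n + (1-n) ln(1-n)]."""
    x = np.clip((eigvals - mu) / kT, -500.0, 500.0)
    n = 1.0 / (1.0 + np.exp(x))
    with np.errstate(divide="ignore", invalid="ignore"):
        s = np.where((n > 0.0) & (n < 1.0),
                     n * np.log(n) + (1.0 - n) * np.log(1.0 - n), 0.0)
    return n, kT * np.sum(s)
```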
We shall mention here a special use of the Harris energy functional, that is generally defined as [46, 47]
$$E^{\rm Harris}[\rho^{\rm in}] = \sum_i n^{\rm out}_i\,\langle\psi^{\rm out}_i|\hat H^{\rm in}|\psi^{\rm out}_i\rangle - \frac{1}{2}\int\!\!\int\frac{\rho^{\rm in}(\vec r)\,\rho^{\rm in}(\vec r\,')}{|\vec r - \vec r\,'|}\, d^3r\, d^3r' + \int\left(\epsilon^{\rm in}_{xc}(\vec r) - v^{\rm in}_{xc}(\vec r)\right)\rho^{\rm in}(\vec r)\, d^3r + \sum_{I<J}\frac{Z_IZ_J}{R_{IJ}} \tag{60}$$
where $\hat H^{\rm in}$ is the KS Hamiltonian produced by a trial density $\rho^{\rm in}$ and $\psi^{\rm out}_i$ are its eigenvectors (which in general are different from those whose density is $\rho^{\rm in}$). As in equation (46), the first term in (60) can be written as ${\rm Tr}(H^{\rm in}\rho^{\rm out})$, and the rest are the so-called ‘double-count corrections’. An important advantage of equation (60) is that it does not require $\rho^{\rm in}$ to be obtained from a set of orthogonal electron states $\psi^{\rm in}_i$, and in fact $\rho^{\rm in}$ is frequently taken as a simple superposition of atomic densities. However, we shall assume here that the states $\psi^{\rm in}_i$ are indeed known. In this case, the Kohn–Sham energy $E^{KS}[\rho^{\rm in}]$, equation (47), obeys exactly the same expression (60), except that $\psi^{\rm out}_i$ and $n^{\rm out}_i$ must be replaced by $\psi^{\rm in}_i$ and $n^{\rm in}_i$. Thus, a simple subtraction gives
$$E^{\rm Harris}[\rho^{\rm in}] = E^{KS}[\rho^{\rm in}] + \sum_{\mu\nu}H^{\rm in}_{\nu\mu}\left(\rho^{\rm out}_{\mu\nu} - \rho^{\rm in}_{\mu\nu}\right). \tag{61}$$
Generally the Harris functional is used nonselfconsistently, with a trial density given by the sum of atomic densities, but here we want to comment on its usefulness to improve dramatically the estimate of the converged total energy, by taking $\rho^{\rm in}_{\mu\nu}$ as the density matrix of the $(n-1)$th selfconsistency iteration and $\rho^{\rm out}_{\mu\nu}$ as that of the $n$th iteration. In fact, $E^{\rm Harris}$ frequently gives, after just two or three iterations, a better estimate than $E^{KS}$ after tens of iterations. Unfortunately, we have found that there is hardly any improvement in the convergence of the atomic forces thus estimated, and therefore the selfconsistent Harris functional is less useful for geometry relaxations or molecular dynamics.
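In matrix form, the correction of equation (61) is just a trace over overlapping orbital pairs, as the following one-line sketch shows (dense arrays are used here for simplicity; in SIESTA the matrices involved are sparse).

```python
import numpy as np

def harris_energy(E_KS_in, H_in, dm_in, dm_out):
    """Equation (61): E_Harris[rho_in] = E_KS[rho_in] + Tr[H_in (rho_out - rho_in)]."""
    return E_KS_in + np.sum(H_in * (dm_out - dm_in).T)   # Tr(A B) = sum_ij A_ij B_ji
```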
Atomic forces and stresses are obtained by direct differentiation of (53) with respect to atomic positions. They are obtained simultaneously with the total energy, mostly in the same places in the code, under the general paradigm ‘a piece of energy ⇒ a piece of force/stress’ (except that some pieces are calculated only in the last selfconsistency step). This ensures that all force contributions, including Pulay corrections, are automatically included. The force contribution from the first term in (53) is
$$\frac{\partial}{\partial\vec R_I}\sum_{\mu\nu}\left(T_{\mu\nu} + V^{KB}_{\mu\nu}\right)\rho_{\nu\mu} = \sum_{\mu\nu}\left(T_{\mu\nu} + V^{KB}_{\mu\nu}\right)\frac{\partial\rho_{\nu\mu}}{\partial\vec R_I} + 2\sum_\mu\sum_{\nu\in I}\frac{dT_{\mu\nu}}{d\vec R_{\mu\nu}}\rho_{\nu\mu} + 2\sum_\mu\sum_{\nu\in I}\sum_\alpha S_{\mu\alpha}\, v^{KB}_\alpha\,\frac{dS_{\alpha\nu}}{d\vec R_{\alpha\nu}}\rho_{\nu\mu} - 2\sum_{\mu\nu}\sum_{\alpha\in I}S_{\mu\alpha}\, v^{KB}_\alpha\,\frac{dS_{\alpha\nu}}{d\vec R_{\alpha\nu}}\rho_{\nu\mu} \tag{62}$$
Now, using equation (35) and that, for $\nu\in I$, $\partial\phi_\nu(\vec r)/\partial\vec R_I = -\nabla\phi_\nu$, the changes of the selfconsistent and atomic densities are
$$\frac{\partial\rho(\vec r)}{\partial\vec R_I} = {\rm Re}\sum_{\mu\nu}\frac{\partial\rho_{\nu\mu}}{\partial\vec R_I}\,\phi^*_\mu(\vec r)\,\phi_\nu(\vec r) - 2\,{\rm Re}\sum_\mu\sum_{\nu\in I}\rho_{\nu\mu}\,\phi^*_\mu(\vec r)\,\nabla\phi_\nu(\vec r) \tag{68}$$
$$\frac{\partial\rho^{\rm atom}(\vec r)}{\partial\vec R_I} = -2\,{\rm Re}\sum_{\mu\in I}\rho^{\rm atom}_{\mu\mu}\,\phi^*_\mu(\vec r)\,\nabla\phi_\mu(\vec r) \tag{69}$$
where we have taken into account that the density matrix of the separated atoms is diagonal. Thus, still leaving aside the terms with $\partial\rho_{\nu\mu}/\partial\vec R_I$, the last term in equation (65), as well as those in (66) and (67), has the general form
$${\rm Re}\sum_\mu\sum_{\nu\in I}\rho_{\nu\mu}\int V(\vec r)\,\phi^*_\mu(\vec r)\,\nabla\phi_\nu(\vec r)\, d^3r = {\rm Re}\sum_\mu\sum_{\nu\in I}\rho_{\nu\mu}\,\langle\phi_\mu|V(\vec r)|\nabla\phi_\nu\rangle. \tag{70}$$
These integrals are calculated on the grid, in the same way as those for the total energy (i.e. $\langle\phi_\mu|V(\vec r)|\phi_\nu\rangle$). The gradients $\nabla\phi_\nu(\vec r)$ at the grid points are obtained analytically, like those of $\phi_\nu(\vec r)$, from their radial grid interpolations of $\phi(r)/r^l$:
$$\nabla\phi_{Ilmn}(\vec r) = \frac{d}{dr}\!\left(\frac{\phi_{Iln}(r)}{r^l}\right) r^l\, Y_{lm}(\hat r)\,\hat r + \frac{\phi_{Iln}(r)}{r^l}\,\nabla\!\left(r^l\, Y_{lm}(\hat r)\right). \tag{71}$$
In some special cases, with elements that require hard-partial-core corrections or explicit
inclusion of the semicore, the grid integrals may pose a problem for geometry relaxations,
because they make the energy dependent on the position of the atoms relative to the grid.
This ‘eggbox effect’ is small for the energy itself, and it decreases fast with the grid spacing,
but the effect is larger and the convergence slower for the forces, as they are proportional to
the amplitude of the energy oscillation, but inversely proportional to its period. These force
oscillations complicate the force landscape, especially when the true atomic forces become
small, making the convergence of the geometry optimization more difficult. Of course, the
problem can be avoided by decreasing the grid spacing but this has an additional cost in
computer time and memory. Therefore, we have found it useful to minimize this problem by
recalculating the forces at a set of positions determined by translating the whole system by a
set of points in a finer mesh. This procedure, which we call ‘grid-cell sampling’, has no extra
cost in memory, and since it is done only at the end of the selfconsistency iteration, for fixed
ρµν , it has only a moderate cost in CPU time.
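A schematic version of this grid-cell sampling is shown below: the forces are recomputed with the whole system rigidly displaced to the points of a finer sub-mesh and averaged. The averaging and the uniform sub-mesh are simplifying assumptions of this sketch; `force_fn` stands for whatever routine evaluates the forces at fixed density matrix.

```python
import numpy as np

def grid_cell_sampled_forces(force_fn, positions, dx, nsub=2):
    """Average the forces over rigid translations of the system within one grid cell,
    so that the 'eggbox' oscillation largely cancels.
    force_fn  : callable positions -> (N, 3) forces (with fixed density matrix)
    positions : (N, 3) atomic positions
    dx        : (3,) grid spacings
    nsub      : number of sub-divisions per grid interval in each direction"""
    shifts = np.stack(np.meshgrid(*[np.arange(n) / n for n in (nsub,) * 3],
                                  indexing="ij"), axis=-1).reshape(-1, 3) * dx
    forces = np.zeros_like(positions)
    for s in shifts:
        forces += force_fn(positions + s)
    return forces / len(shifts)
```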
At finite temperature, the forces are really the derivatives of the free energy with respect to atomic displacements since
$$\frac{dF(\{\vec R_I\},\{\psi_i(\vec r)\},\{n_i\})}{d\vec R_I} = \frac{\partial F}{\partial\vec R_I} + \sum_i\frac{\partial F}{\partial n_i}\frac{\partial n_i}{\partial\vec R_I} + \sum_i\int\frac{\partial F}{\partial\psi^*_i(\vec r)}\frac{\partial\psi_i(\vec r)}{\partial\vec R_I}\, d^3r = \frac{\partial E}{\partial\vec R_I}. \tag{72}$$
In this particular equation we have used the notation $d/d\vec R_I$, as opposed to $\partial/\partial\vec R_I$, to indicate the inclusion of the change in $\psi_i(\vec r)$ and $n_i$ when we move the atom, in calculating the derivative, but we have also used that $\partial F/\partial n_i = \partial F/\partial\psi_i(\vec r) = 0$ and that the last two terms in (59) do not depend on $\vec R_I$, so that $\partial F/\partial\vec R_I = \partial E/\partial\vec R_I$. The latter are the atomic forces actually calculated. Notice, however, that $dE/d\vec R_I \neq \partial E/\partial\vec R_I$, so the calculated forces are indeed the total derivatives of the free energy, not of the internal energy.
We would like to also mention the calculation of forces using the nonselfconsistent
Harris functional, in which the ‘in’ density is a superposition of atomic densities. We have
implemented this as an option for ‘quick and dirty’ calculations because, used with a minimal
basis set, it makes SIESTA competitive with tight-binding methods, which are much faster than
density functional calculations. The problem that we address here is that, although E H arris is
stationary with respect to $\rho^{\rm out}$, it is not so with respect to $\rho^{\rm in}$. In particular, there appears a force term
$$\int\frac{\partial V^{\rm in}_{xc}(\vec r)}{\partial\vec R_I}\,\rho^{\rm out}(\vec r)\, d^3r. \tag{73}$$
A similar term appears for the electrostatic interaction between the input and output density,
but it presents no special problems because of the linear character of the Hartree potential.
However, evaluation of (73) requires the change of the exchange–correlation potential with
density, a quantity also required to evaluate the linear response of the electron gas, but not in
normal energy and force calculations. Finally, notice that, apart from this minor difficulty, the
Harris-functional forces are perfectly well defined at the first iteration only. For later iterations
(but still not converged) there is no practical way to calculate ∂ρ in /∂ RI and, without the help
of the Hellmann–Feynman theorem (which applies only at convergence), the forces are not
well defined. Of course, the omission of the terms depending on this quantity produces an
estimate of the forces, but we have found that their convergence is not appreciably faster than
those estimated from the Kohn–Sham functional.
We define the stress tensor as the positive derivative of the total energy with respect to the strain tensor
$$\sigma_{\alpha\beta} = \frac{\partial E^{KS}}{\partial\epsilon_{\alpha\beta}} \tag{74}$$
where $\alpha$, $\beta$ are Cartesian coordinate indices. To translate to standard units of pressure, we must simply divide by the unit-cell volume and change sign. During the deformation, all vector positions, including those of atoms and grid points (and of course lattice vectors), change according to
$$r'_\alpha = \sum_{\beta=1}^3\left(\delta_{\alpha\beta} + \epsilon_{\alpha\beta}\right)r_\beta. \tag{75}$$
The shapes of the basis functions, KB projectors and atomic densities and potentials do not change, but their origin is displaced according to (75). From this equation, we find that
$$\frac{\partial r'_\gamma}{\partial\epsilon_{\alpha\beta}} = \delta_{\gamma\alpha}\, r_\beta. \tag{76}$$
The change in $E^{KS}$ is essentially due to these position displacements, and therefore the calculation of the stress is almost perfectly parallel to that of the atomic forces, thus being performed in the same sections of the code. For example,
$$\frac{\partial T_{\mu\nu}}{\partial\epsilon_{\alpha\beta}} = \sum_{\gamma=1}^3\frac{\partial T_{\mu\nu}}{\partial r^\gamma_{\mu\nu}}\frac{\partial r^\gamma_{\mu\nu}}{\partial\epsilon_{\alpha\beta}} = \frac{\partial T_{\mu\nu}}{\partial r^\alpha_{\mu\nu}}\, r^\beta_{\mu\nu}. \tag{77}$$
Since $\partial T_{\mu\nu}/\partial r^\alpha_{\mu\nu}$ is evaluated to calculate the forces, it takes very little extra effort to also multiply it by $r^\beta_{\mu\nu}$ for the stress. Equally, force contributions such as (70) have their obvious stress counterpart
$$\sum_{\mu\nu}\rho_{\nu\mu}\,\langle\phi_\mu|V(\vec r)|(\nabla_\alpha\phi_\nu)\, r_\beta\rangle. \tag{78}$$
However, there are three exceptions to this parallelism. The first concerns the change of the volume per grid point or, in other words, the Jacobian of the transformation (75) in the integrals over the unit cell. This Jacobian is simply $\delta_{\alpha\beta}$, and it leads to a stress contribution
$$\left[\int\left(V^{NA}(\vec r) + \tfrac{1}{2}\delta V^H(\vec r)\right)\delta\rho(\vec r)\, d^3r + E^{xc}\right]\delta_{\alpha\beta}. \tag{79}$$
Notice that the renormalization of the density, required to conserve the charge when the volume changes, enters through the orthonormality constraints, to be discussed in appendix A. The second special contribution to the stress lies in the fact that, as we deform the lattice, there is a change in the factor $1/|\vec r - \vec r\,'|$ of the electrostatic energy integrals. We deal with this contribution in reciprocal space, when we calculate the Hartree potential by FFTs, by evaluating the derivative of the reciprocal-space vectors with respect to $\epsilon_{\alpha\beta}$. Since $G'_\alpha = \sum_\beta G_\beta\left(\delta_{\beta\alpha} - \epsilon_{\beta\alpha}\right)$,
$$\frac{\partial}{\partial\epsilon_{\alpha\beta}}\,\frac{1}{G^2} = \frac{2\,G_\alpha G_\beta}{G^4}. \tag{80}$$
Finally, the third special stress contribution arises in GGA XC, from the change of the gradient of the deformed density $\rho(\vec r) \rightarrow \rho'(\vec r\,')$. The treatment of this contribution is explained in detail in [?].
The calculation of the electric polarization, as an integral in the grid across the unit cell,
is standard and almost free for molecules, chains and slabs (in the directions perpendicular
to the chain axis, or to the surface). For bulk systems, the electric polarization cannot be
found from the charge distribution in the unit cell alone. In this case, we need the so-called
Berry-phase theory of polarization [48, 49], which allows us to compute quantities such as the dynamical charges [48] and piezoelectric constants [50, 51]. Here we comment on some details of our implementation [52].
If $\vec R_\alpha$ are the lattice vectors and $\vec P^e = \sum_{\alpha=1}^3 P^e_\alpha\vec R_\alpha$ is the electronic contribution to the macroscopic polarization, then we have
$$2\pi P^e_\alpha = \vec G_\alpha\cdot\vec P^e = -\frac{2e}{(2\pi)^3}\int_{BZ} d\vec k\;\vec G_\alpha\cdot\frac{\partial}{\partial\vec k'}D(\vec k,\vec k')\bigg|_{\vec k'=\vec k} \tag{81}$$
where $\vec G_\alpha$ is the corresponding reciprocal lattice vector, $e$ is the electron charge, $u_i(\vec k,\vec r) = e^{-i\vec k\cdot\vec r}\psi_i(\vec k,\vec r)$ is the periodic part of the Bloch function and the factor of two originates from the spin degeneracy. The quantum phase $D(\vec k,\vec k')$ is defined as
$$D(\vec k,\vec k') = {\rm Im}\left[\ln\left(\det\langle u_i(\vec k,\vec r)|u_j(\vec k',\vec r)\rangle\right)\right]. \tag{82}$$
The derivative in (81) depends on a gauge that must be chosen such that $u(\vec k + \vec G,\vec r) = e^{-i\vec G\cdot\vec r}u(\vec k,\vec r)$. In practice, the integral is replaced by a discrete summation, and a finite-difference approximation is made for the derivative [48]: $\Delta k_\alpha\,\partial D(\vec k,\vec k')/\partial k'_\alpha|_{\vec k'=\vec k} \approx \frac{1}{2}\left[D(\vec k,\vec k+\Delta\vec k_\alpha) - D(\vec k,\vec k-\Delta\vec k_\alpha)\right]$, where $\Delta\vec k_\alpha = \vec G_\alpha/N_\alpha$. Then (81) becomes, for $\alpha = 1$,
2e 3 −1 N
N2 −1,N 1 −1
where we have split the sum to stress the fact that we have a two-dimensional integral in the
plane defined by G2 and G3 , and a linear integral along G1 . Due to the approximation in
the derivative, the linear integral usually requires a finer mesh than the surface integral. To evaluate $D(\vec k,\vec k+\Delta\vec k)$ we use our LCAO basis:
$$\langle u_i(\vec k)|u_j(\vec k+\Delta\vec k)\rangle = \langle\psi_i(\vec k)|e^{-i\Delta\vec k\cdot\vec r}|\psi_j(\vec k+\Delta\vec k)\rangle = \sum_{\nu'\mu'}c_{i\nu'}(\vec k)\,c_{\mu'j}(\vec k+\Delta\vec k)\,e^{-i\vec k\cdot(\vec R_{\nu'}-\vec R_{\mu'})}\,\langle\phi_{\nu'}|e^{-i\Delta\vec k\cdot(\vec r-\vec R_{\mu'})}|\phi_{\mu'}\rangle. \tag{84}$$
Formulae similar to (84) have been implemented by several authors [53, 54], mainly in the context of Hartree–Fock calculations, in which the basis orbitals are expanded in Gaussians whose matrix elements can be found analytically [53]. Our numerical, localized pseudo-atomic basis orbitals are not well suited for a Gaussian expansion. Instead, we expand the plane waves appearing in equation (84) to first order in $\Delta\vec k$, $e^{-i\Delta\vec k\cdot(\vec r-\vec R_{\mu'})} \approx 1 - i\Delta\vec k\cdot(\vec r-\vec R_{\mu'}) + O(\Delta k^2)$, and then we calculate the matrix elements of the position operator as explained in section 5. It is interesting to note that, since the discretized formula (83) only holds to $O(\Delta k^2)$, the approximation of the matrix elements in (84) does not introduce any further errors in the calculation of the polarization. In a symmetrized version, we approximate equation (84) as
$$\sum_{\nu'\mu'}c_{i\nu'}(\vec k)\,c_{\mu'j}(\vec k+\Delta\vec k)\,e^{-i(\vec k+\frac{\Delta\vec k}{2})\cdot(\vec R_{\nu'}-\vec R_{\mu'})}\left[\langle\phi_{\nu'}|\phi_{\mu'}\rangle - i\,\frac{\Delta\vec k}{2}\cdot\left(\langle\phi_{\nu'}|(\vec r-\vec R_{\nu'})|\phi_{\mu'}\rangle + \langle\phi_{\nu'}|(\vec r-\vec R_{\mu'})|\phi_{\mu'}\rangle\right)\right]. \tag{85}$$
The basic problem for solving the Kohn–Sham equations in O(N ) operations is that the
solutions (the Hamiltonian eigenvectors) are extended over the whole system and overlap
with each other. Just to check the orthogonality of N trial solutions, by performing integrals
over the whole system, involves ∼N 3 operations. Among the different methods proposed to
solve this problem [5, 9], we have chosen the localized-orbital approach [6, 55, 56] because of
its superior efficiency for nonorthogonal basis sets. The initially proposed functional [6, 55]
used a fixed number of occupied states, equal to the number of electron pairs, and it was found
to have numerous local minima in which the electron configuration was easily trapped. A
revised functional form [56], which uses a larger number of states than electron pairs, with
variable occupations, has been found empirically to avoid the local-minimum problem. This
is the functional that we use and recommend.
Each of the localized, Wannier-like states is constrained to its own localization region.
Each atom I is assigned a number of states equal to int(ZIval /2 + 1) so that, if doubly
occupied, they can contain at least one excess electron (they can also become empty during
the minimization of the energy functional). These states are confined to a sphere of radius Rc
(common to all states) centred at RI . More precisely, the expansion (equation (32)) of a state
ψi centred at RI may contain only basis orbitals φµ centred on atoms J such that |RI J | < Rc .
This implies that ψi (r ) may extend to a maximum range Rc +rcmax , where rcmax is the maximum
range of the basis orbitals. For covalent systems, a localization region centred on bonds rather
than atoms is more efficient [57] (it leads to a lower energy for the same Rc ), but it is less
suitable for a general algorithm, especially in the case of ambiguous bonds. Therefore, we
generally use the atom-centred localization regions.
In the method of Kim, Mauri and Galli (KMG) [56], the BS energy is rewritten as
$$E^{KMG} = 2\sum_{ij}\left(2\delta_{ji} - S_{ji}\right)\left(H_{ij} - \eta S_{ij}\right) = 4\sum_i\sum_{\mu\nu}c_{i\mu}\,\delta H_{\mu\nu}\,c_{\nu i} - 2\sum_{ij}\sum_{\alpha\beta\mu\nu}c_{i\alpha}S_{\alpha\beta}c_{\beta j}\,c_{j\mu}\,\delta H_{\mu\nu}\,c_{\nu i} \tag{86}$$
Figure 7. Convergence of the lattice constant, bulk modulus and cohesive energy as a function of
the localization radius Rc of the Wannier-like electron states in silicon. We used a supercell of 512
atoms and a minimal basis set with a cutoff radius rc = 5 au for both s and p orbitals.
where $S_{ij} = \langle\psi_i|\psi_j\rangle$, $H_{ij} = \langle\psi_i|H|\psi_j\rangle$, $\delta H_{\mu\nu} = H_{\mu\nu} - \eta S_{\mu\nu}$ and we have assumed a nonmagnetic solution with doubly occupied states. The ‘double-count’ correction terms of equation (47) remain unchanged and the electron density is still defined by (35), but the density matrix is re-defined as
$$\rho_{\mu\nu} = 2\sum_{ij}c_{\mu i}\left(2\delta_{ij} - S_{ij}\right)c_{j\nu} = 4\sum_i c_{\mu i}c_{i\nu} - 2\sum_{ij}\sum_{\alpha\beta}c_{\mu i}c_{i\alpha}S_{\alpha\beta}c_{\beta j}c_{j\nu}. \tag{87}$$
The parameter η in equation (86) plays the role of a chemical potential, and must be
chosen to lie within the bandgap between the occupied and empty states. This may be tricky
sometimes, since the electron bands can shift during the selfconsistency process or when the
atoms move. In general, the number of electrons will not be exactly the desired one, even if η is
within the bandgap, because the minimization of (86) implies a tradeoff in which the localized
states become fractionally occupied. To avoid an infinite Hartree energy in periodic systems,
we simply renormalize the density matrix so that the total electron charge $\sum_{\mu\nu}S_{\mu\nu}\rho_{\nu\mu}$ is equal
to the required value.
For a given potential, the functional (86) is minimized by the conjugate-gradients method,
using its derivatives with respect to the expansion coefficients
$$\frac{\partial E^{KMG}}{\partial c_{i\mu}} = 4\sum_\nu\delta H_{\mu\nu}c_{\nu i} - 2\sum_{j\alpha\beta\nu}\left(S_{\mu\nu}c_{\nu j}c_{j\alpha}\delta H_{\alpha\beta}c_{\beta i} + \delta H_{\mu\nu}c_{\nu j}c_{j\alpha}S_{\alpha\beta}c_{\beta i}\right). \tag{88}$$
The minimization proceeds without any need to orthonormalize the electron states ψi . Instead,
the orthogonality, as well as the correct normalization (unity below η and zero above it), results
from the minimization of E KMG . This is because, in contrast to the KS functional, E KMG is
designed to penalize any nonorthogonality [56]. The KS ground state, with all the occupied
ψi orthonormal, is also the minimum of (86), at which E KMG = E KS . If the variational
freedom is constrained by the localization of the ψi , the orthogonality cannot be exact, and
the resulting energy is slightly larger than for unconstrained wavefunctions. In insulators and
semiconductors, the Wannier functions are exponentially localized [58], and the energy excess
due to their strict localization decreases rapidly as a function of the localization radius Rc , as
can be seen in figure 7.
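As a purely illustrative transcription of equations (86)–(88), the following Fortran fragment evaluates E^KMG and its gradient using ordinary dense matrices. The routine and array names are our own choices for this example and no sparsity or localization is exploited, so this is only a sketch of the algebra, not the O(N) implementation used in SIESTA:

! Dense-matrix illustration of the KMG functional (86) and its gradient (88).
! NOT the sparse O(N) code of SIESTA; names and storage are example choices.
subroutine kmg_energy_gradient(n, nstates, H, S, c, eta, E, grad)
  implicit none
  integer,  intent(in)  :: n, nstates          ! basis size, number of states
  real(8),  intent(in)  :: H(n,n), S(n,n)      ! Hamiltonian and overlap matrices
  real(8),  intent(in)  :: c(n,nstates)        ! expansion coefficients c_{mu i}
  real(8),  intent(in)  :: eta                 ! chemical potential in the gap
  real(8),  intent(out) :: E                   ! E^KMG
  real(8),  intent(out) :: grad(n,nstates)     ! dE^KMG/dc_{i mu}
  real(8) :: dH(n,n), dHc(n,nstates), Sc(n,nstates)
  real(8) :: hmat(nstates,nstates), smat(nstates,nstates)
  integer :: i

  dH   = H - eta*S                             ! delta H = H - eta*S
  dHc  = matmul(dH, c)                         ! (delta H) c
  Sc   = matmul(S,  c)                         ! S c
  hmat = matmul(transpose(c), dHc)             ! c^T (delta H) c
  smat = matmul(transpose(c), Sc)              ! c^T S c

  ! E^KMG = 4 Tr(hmat) - 2 Tr(smat hmat), for doubly occupied states
  E = 0.d0
  do i = 1, nstates
    E = E + 4.d0*hmat(i,i) - 2.d0*dot_product(smat(i,:), hmat(:,i))
  end do

  ! Gradient, equation (88)
  grad = 4.d0*dHc - 2.d0*( matmul(Sc, hmat) + matmul(dHc, smat) )
end subroutine kmg_energy_gradient

In the actual O(N) implementation the same contractions are carried out with sparse matrices and localized coefficients, using the techniques described at the end of the paper.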
Figure 8. CPU time and memory for silicon supercells of 64, 512, 1000, 4096 and 8000 atoms.
Times are for one average molecular dynamics step at 300 K. This includes ten SCF steps, each
with ten conjugate gradient minimization steps of the O(N) energy functional. Memory figures are peak
values. Although the memory requirement for 8000 atoms was determined accurately, the run could
not be performed because of insufficient memory in the PC used.
If the system is metallic, or if the chemical potential is not within the bandgap (for example
because of the presence of defects), the KMG functional cannot be used in practice. In fact,
although some O(N ) methods can handle metallic systems in principle [9], we are not aware
of any practical calculations at a DFT level. In such cases we copy the Hamiltonian and
overlap matrices to standard expanded arrays and solve the generalized eigenvalue problem
by conventional order-N³ diagonalization techniques [59]. However, even in this case, most
of the operations, and particularly those to find the density and potential, and to set up the
Hamiltonian, are still performed in O(N ) operations.
Irrespective of whether the O(N ) functional or the standard diagonalization is used,
an outer selfconsistency iteration is required, in which the density matrix is updated using
Pulay’s residual metric minimization by direct inversion of the iterative subspace (RMM-
DIIS) method [60, 61]. Even when the code is strictly O(N ), the CPU time may increase
faster if the number of iterations required to achieve the solution increases with N . In fact, it
is a common experience that the required number of selfconsistency iterations increases with
the size of the system. This is mainly because of the ‘charge sloshing’ effect, in which small
displacements of charge from one side of the system to the other give rise to larger changes
of the potential as the size increases. Fortunately, the localized character of the Wannier-like
wavefunctions used in the O(N ) method helps to solve also this problem, by limiting the
charge sloshing. Table 2 presents the average number of iterations required to minimize the
O(N ) functional and the average number of selfconsistency iterations, during a molecular
dynamics simulation of bulk silicon at room temperature. It can be seen that these numbers
are quite small and that they increase very moderately with system size. As might be expected,
the number of minimization iterations increases with the localization radius, i.e. with the
number of degrees of freedom (cµi coefficients) of the wavefunctions, but this increase is also
rather moderate.
Figure 8 shows the essentially perfect O(N ) behaviour of the overall CPU time and
memory. This is not surprising in view of the completely strict enforcement of O(N )
algorithms everywhere in the code (except the marginal N log N factor in the FFT used
to solve Poisson’s equation, which represents a very small fraction of CPU time even for
4000 atoms).
Table 2. Average number of selfconsistency (SCF) iterations (per molecular dynamics step) and
average number of conjugate-gradient (CG) iterations (per SCF iteration) required to minimize the
O(N ) functional, during a simulation of bulk silicon at ∼300 K. We used the Verlet method [62] at
constant energy, with a time step of 1.5 fs, and a minimal basis set with a cutoff radius rc = 5 au.
Rc is the localization radius of the Wannier-like wavefunctions used in the O(N ) functional (see
text). N is the number of atoms in the system.
           Rc = 4 Å         Rc = 5 Å
  N        CG     SCF       CG     SCF
  64       5.8    9.3       8.4    8.4
  512      4.9    11.4      8.8    10.1
  1000     4.3    11.5      9.9    11.5
Here we shall simply mention some of the possibilities and features of the SIESTA
implementation of DFT.
• A general-purpose package [63], the flexible data format (fdf), initially developed for
the SIESTA project, allows the introduction of all the data and precision parameters in
a simple tag-oriented, order-independent format which accepts different physical units.
The data can then be accessed from anywhere in the program, using simple subroutine
calls in which a default value is specified for the case in which the data are not present.
A simple call also allows the read pointer to be positioned in order to read complex data
‘blocks’ also marked with tags (an illustrative input fragment is shown after this list).
• The systematic calculation of atomic forces and stress tensor allows the simultaneous
relaxation of atomic coordinates and cell shape and size, using a conjugate gradients
minimization or several other minimization/annealing algorithms.
• It is possible to perform a variety of molecular dynamics simulations, at constant energy
or temperature, and at constant volume or pressure, also including Parrinello–Rahman
dynamics with variable cell shape [62]. The geometry relaxation may be restricted, to
impose certain positions or coordinates, or more complex constraints.
• The auxiliary program VIBRA processes systematically the atomic forces for sets of
displaced atomic positions, and from them computes the Hessian matrix and the phonon
spectrum. An interface to the PHONON program [64] is also provided within SIESTA.
• A linear response program (LINRES) to calculate phonon frequencies has also been
developed [65]. The code reads the SCF solution obtained by SIESTA, and calculates
the linear response to the atomic displacements, using first-order perturbation theory. It
then calculates the dynamical matrix, from which the phonon frequencies are obtained.
• A number of auxiliary programs allows various representations of the total density, the
total and local density of states and the electrostatic or total potentials. The representations
include both two-dimensional cuts and three-dimensional views, which may be coloured
to simultaneously represent the density and potential.
• Thanks to an interface with the TRANSIESTA program, it is possible to calculate transport
properties across a nanocontact, finding selfconsistently the effective potential across a
finite voltage drop, at a DFT level, using the Keldysh Green function formalism [66].
• The optical response can be studied with SIESTA using different approaches. An
approximate dielectric function can be calculated from the dipolar transition matrix
elements between occupied and unoccupied single-electron eigenstates using first-order
time-dependent perturbation theory [67]. For finite systems, these are easily calculated
from the matrix elements of the position operator between the basis orbitals. For infinite
periodic systems, we use the matrix elements of the momentum operator. It is important
to notice, however, that the use of nonlocal pseudopotentials requires some correction
terms [68].
We have also implemented a more sophisticated approach to compute the optical response
of finite systems, using the adiabatic approximation to time-dependent DFT [69, 70].
The idea is to integrate the time-dependent Schrödinger equation when a time-dependent
perturbation is applied to the system [71]. From the time evolution, it is then possible
to extract the optical absorption and dipole strength functions, including some genuinely
many-body effects, such as plasmons. Using this approach we have successfully calculated
the electronic response of systems such as fullerenes and small metallic clusters [72].
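As an illustration of the fdf format mentioned in the first item above, a fragment of a typical input file might look as follows; the values and the particular selection of keywords shown here are merely indicative:

SystemLabel        si64            # prefix used for the output files
NumberOfAtoms      64
NumberOfSpecies    1
%block ChemicalSpeciesLabel
  1  14  Si                        # species index, atomic number, label
%endblock ChemicalSpeciesLabel
LatticeConstant    5.43 Ang        # a value followed by its physical unit
MeshCutoff         100.0 Ry        # precision parameter of the real-space grid

Tags may appear in any order, units are converted internally, and any tag that is absent simply takes the default value specified in the corresponding subroutine call.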
16. Summary
We have presented one of the first fully operational O(N ) implementations of DFT. This
implementation has required many scientific and technical breakthroughs to calculate all the
terms of the DFT Hamiltonian and to solve the Schrödinger equation in strictly O(N) operations.
Some of these innovations are the following.
• A flexible, numerical atomic basis set, which allows extremely fast calculations, using a
minimal basis, as well as highly converged ones, using multiple-ζ + polarization bases.
New methods have been devised to generate these numerical basis sets, adapting well
known principles of quantum chemistry, such as the split-valence concept.
• Norm-conserving pseudopotentials optimized for smoothness of the local potential, while
the application of the KB nonlocal projectors to our atomic basis orbitals is nearly cost-free,
since it reduces to two-centre integrals.
• A flexible, efficient and general method to calculate two-centre integrals of arbitrary
numerical radial functions, using convolutions and a new FFT method for radial functions.
• Evaluation of the matrix elements of the selfconsistent potential using a regular real-space
grid. The density gradient is evaluated by finite differences in the grid, to calculate the
XC potential of the GGA.
• New expressions for the total energy and forces, in which long-range interactions
are handled efficiently by using the difference between the Hartree potentials of the
selfconsistent density and of the sum of atomic densities. This also considerably decreases
the errors due to the finite integration grid.
• Minimization of an O(N ) functional [56] with localized Wannier-like orbitals allows us to
find the electronic ground state without any need to orthogonalize the one-electron states,
which instead become orthonormal as a result of the minimization. The ground-state
information is ‘coded’ into the one-electron density matrix, which is then used to find the
electron density and total energy, without any further knowledge of the individual electron
states. This allows a unified treatment of the ground states obtained by O(N ) methods or
by conventional Hamiltonian diagonalization, as well as the inclusion of k-sampling and
finite-temperature effects.
• It has been found that the Harris-functional energy converges much faster than the Kohn–
Sham energy, even if it is the latter (or the O(N ) functional) that is minimized. As
a single-iteration scheme, with a minimal basis set, the Harris functional provides a
nonselfconsistent, but reasonable and extremely fast, method for initial relaxations and
exploratory molecular dynamics.
In conclusion, the SIESTA method provides a very general scheme to perform a range
of calculations from very fast to very accurate, depending on the needs and stage of the
simulation, of all kinds of molecule, material and surface. It allows DFT simulations of more
than a thousand atoms in modest PC workstations, and over a hundred thousand atoms in
parallel platforms [73].
Acknowledgments
We are deeply indebted to Otto Sankey and David Drabold for allowing us to use their code
as an initial seed for this project, and to Richard Martin for continuous ideas and support. We
thank Jose Luis Martins for numerous discussions and ideas, and Jürgen Kübler for helping
us implement the noncollinear spin. The exchange–correlation methods and routines were
developed in collaboration with Carlos Balbás and Jose L Martins. We also thank In-Ho Lee,
Maider Machado, Juana Moreno and Art R Williams for some routines, and Eduardo Anglada
and Oscar Paz for their computational help. This work was supported by the Fundación
Ramón Areces and by Spain’s MCyT grant BFM2000-1312. JDG would like to thank the
Royal Society for a University Research Fellowship and EPSRC for the provision of computer
facilities. DSP acknowledges support from the Basque Government (Programa de Formación
de Investigadores).
Appendix A

We have yet to comment on the force and stress terms containing ∂ρ_µν/∂R_I. Substituting the
first term of equation (68) into equations (65)–(67) and adding the first term of equation (62) we
obtain a simple expression: $\sum_{\mu\nu} H_{\nu\mu}\,\partial\rho_{\mu\nu}/\partial R_I$. Now, ρ_µν is a function of the Hamiltonian
eigenvector coefficients and occupations only (equation (34)). On the Born–Oppenheimer
surface (BOS), E KS is stationary with respect to these coefficients and occupations, and the
Hellmann–Feynman theorem guarantees that any change of them will not modify the total
energy to first order, and therefore will not affect the forces. In other words, the atomic forces
are the partial derivatives ∂E KS /∂ RI at constant cµi and ni . Even in the Car–Parrinello
scheme, in which the system moves out of the BOS, making the Hellmann–Feynman theorem
invalid, the atomic forces are nevertheless defined as derivatives at constant cµi and ni . Thus, it
may seem that the terms ∂ρµν /∂ RI are irrelevant for the calculation of the forces. However, in
the previous discussion we have omitted to say that the KS energy must be minimized under the
constraint of orthonormality of the occupied states and that, therefore, at the BOS the energy
is stationary only with respect to changes of ψi which do not violate the orthonormality. With
an atomic basis set, the displacement of atoms (and the deformation of the unit cell) modifies
the basis, and therefore the occupied states $\psi_i = \sum_\mu \phi_\mu c_{\mu i}$, even at constant c_µi. Moreover,
the change of the states affects their orthonormality. Thus, in order to calculate the new total
energy, we need to re-orthonormalize the occupied states, by changing their coefficients cµi .
Schematically, we must solve
$$\langle\psi_i|\delta S|\psi_j\rangle + \langle\delta\psi_i|S|\psi_j\rangle + \langle\psi_i|S|\delta\psi_j\rangle = 0 \qquad (A.1)$$
where δS represents the change of S_µν due to the atomic displacements, and δψ_i the
modification of ψ_i due to the change of c_µi. Without loss of generality, we can expand δψ_i
in the basis of the eigenvectors ψ_j as $\delta\psi_i = \sum_j \psi_j \lambda_{ji}$. Substituting this expansion into (A.1)
and using that $\langle\psi_i|S|\psi_j\rangle = \delta_{ij}$ we obtain $\lambda_{ji} = -\tfrac{1}{2}\langle\psi_j|\delta S|\psi_i\rangle$. Thus
$$|\delta\psi_i\rangle = -\frac{1}{2}\sum_j |\psi_j\rangle\langle\psi_j|\delta S|\psi_i\rangle. \qquad (A.2)$$
In terms of the coefficients c_µi, we have $\langle\psi_j|\delta S|\psi_i\rangle = \sum_{\mu\nu} c_{j\mu}\,\delta S_{\mu\nu}\,c_{\nu i}$ and
$$\delta c_{\mu i} = -\frac{1}{2}\sum_{j\eta\nu} c_{\mu j} c_{j\eta}\,\delta S_{\eta\nu}\,c_{\nu i}
 = -\frac{1}{2}\sum_{\eta\nu} S^{-1}_{\mu\eta}\,\delta S_{\eta\nu}\,c_{\nu i} \qquad (A.3)$$
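A compressed form of the intermediate algebra leading from (A.3) to the result below (this particular presentation is ours) uses the generalized eigenvalue relation $\sum_\nu H_{\mu\nu} c_{\nu i} = \varepsilon_i \sum_\nu S_{\mu\nu} c_{\nu i}$, i.e. $\langle\psi_i|H|\psi_j\rangle = \varepsilon_i\delta_{ij}$ for the orthonormal occupied states:
$$\delta\rho_{\mu\nu} = \sum_i n_i\,(\delta c_{\mu i}\,c_{i\nu} + c_{\mu i}\,\delta c_{i\nu})$$
$$\sum_{\mu\nu} H_{\mu\nu}\,\delta\rho_{\nu\mu} = 2\sum_i n_i\sum_{\mu\nu} c_{i\mu} H_{\mu\nu}\,\delta c_{\nu i}
 = -\sum_i n_i\,\varepsilon_i\sum_{\eta\nu} c_{i\eta}\,\delta S_{\eta\nu}\,c_{\nu i}$$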
so that
$$\sum_{\mu\nu} H_{\mu\nu}\,\delta\rho_{\nu\mu} = -\sum_{\mu\nu} E_{\mu\nu}\,\delta S_{\nu\mu} \qquad (A.6)$$
where ε_i are the eigenstate energies and
$$E_{\mu\nu} = \sum_i n_i\,\varepsilon_i\,c_{\mu i}\,c_{i\nu} \qquad (A.7)$$
is the energy-density matrix. To calculate the orthogonalization force or stress, δS_µν
must be substituted by the appropriate derivative:
$$F_I^{\mathrm{orthog}} = 2\sum_{\mu}\sum_{\nu\in I} E_{\nu\mu}\,\frac{\partial S_{\mu\nu}}{\partial R_{\mu\nu}}. \qquad (A.8)$$
This equation has been derived before in different ways, and Ordejón et al [74] found it also
for the O(N ) functional, even though it does not require the occupied states to be orthogonal.
In this case, equation (A.7) must be substituted by a more complicated expression [74].
Similarly, the stress contribution is
$$\sigma_{\alpha\beta}^{\mathrm{orthog}} = -\sum_{\mu\nu} E_{\nu\mu}\,\frac{\partial S_{\mu\nu}}{\partial R^{\alpha}_{\mu\nu}}\,R^{\beta}_{\mu\nu}. \qquad (A.9)$$
Notice that we have extended the integral to the whole real axis, defining $\psi_l(-r) \equiv (-1)^l \psi_l(r)$,
in accordance with the behaviour $\psi_l(r) \sim r^l$, $r \to 0$. The coefficients $c^{s,c}_{ln}$ can be obtained
by defining a complex polynomial $P_l(x) = P_l^c(x) + \mathrm{i}P_l^s(x)$, which obeys the recurrence
relations [78]
$$P_0(x) = \mathrm{i} \equiv \sqrt{-1}$$
$$P_1(x) = \mathrm{i} - x \qquad (B.3)$$
$$P_{l+1}(x) = (2l+1)P_l(x) - x^2 P_{l-1}(x).$$
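As a consistency check of this recurrence (the check itself is ours, not part of the SIESTA code), the initial values above imply that the polynomials reproduce the spherical Bessel functions through $j_l(x) = [P_l^s(x)\sin x + P_l^c(x)\cos x]/x^{l+1}$. The short Fortran fragment below evaluates j_l in this way (routine and variable names are our own choices):

! Check of recurrence (B.3): the complex polynomials P_l reproduce the
! spherical Bessel functions as
!   j_l(x) = ( P_l^s(x)*sin(x) + P_l^c(x)*cos(x) ) / x**(l+1)
subroutine bessel_from_recurrence(lmax, x, jl)
  implicit none
  integer,    intent(in)  :: lmax
  real(8),    intent(in)  :: x            ! must be nonzero
  real(8),    intent(out) :: jl(0:lmax)   ! j_0(x) ... j_lmax(x)
  complex(8) :: pm1, p, pnext             ! P_{l-1}, P_l, P_{l+1}
  integer    :: l

  pm1 = (0.d0, 1.d0)                      ! P_0 = i
  p   = (0.d0, 1.d0) - x                  ! P_1 = i - x
  jl(0) = ( aimag(pm1)*sin(x) + real(pm1)*cos(x) ) / x
  if (lmax >= 1) jl(1) = ( aimag(p)*sin(x) + real(p)*cos(x) ) / x**2
  do l = 1, lmax-1
    pnext = (2*l+1)*p - x**2 * pm1        ! recurrence (B.3)
    jl(l+1) = ( aimag(pnext)*sin(x) + real(pnext)*cos(x) ) / x**(l+2)
    pm1 = p
    p   = pnext
  end do
end subroutine bessel_from_recurrence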
In order to perform the integrals in (B.2) using discrete FFTs, we need to calculate ψ(r) on
a regular radial grid, up to a maximum radius rmax , beyond which ψ(r) is assumed to be
strictly zero. The separation Δr between grid points determines a cutoff k_max = π/Δr in
reciprocal space and, vice versa, Δk = π/r_max. For convolutions, such as those involved
in equation (28), we need r_max = r_1^c + r_2^c and k_max = max(k_1^c, k_2^c), where r_{1,2}^c, k_{1,2}^c are the
cutoff radii and maximum wavevectors of ψ_{1,2}, respectively. We must then pad with zeros
the intervals [r_{1,2}^c, r_max] for the forward transforms ψ_{1,2}(r) → ψ_{1,2}(k). In practice, we set
r_max = 2 max_µ(r_µ^c), k_max = max_µ(k_µ^c), where µ labels all the basis orbitals and KB projectors,
and we use the same real and reciprocal grids for all orbital pairs. In this way, we need to
perform the forward transform only once for each radial function ψ_µ(r). Finally, notice that in
equation (28) $\psi^*_{1,l_1m_1}(k)\,\psi_{2,l_2m_2}(k) \sim k^{l_1+l_2}$ for k → 0, while l_1 + l_2 − l is even and non-negative,
so the integrands of equation (B.2) for the backward transform are all even and well behaved
at the origin.
We describe here a simple and efficient algorithm to handle mesh indices in three-dimensional
periodic systems. Its versatility makes it suitable for several different tasks in SIESTA such as
neighbour-list constructions, basis orbital evaluation in the real-space integration grid, density-
gradient calculations in the GGA etc. It would also be very appropriate for other problems, such
as the solution of partial differential equations by real-space discretization or the calculation
of the interaction energy in lattice models. For clarity of the exposition, we shall describe the
algorithm for a particularly simple application, namely the evaluation of the Laplacian of a
function f (r ) using finite differences, even though the algorithm is not used in SIESTA for this
purpose. In three dimensions, one generally discretizes space in all three periodic directions,
using an index for each direction. For simplicity, let us consider an orthorhombic unit cell,
with mesh steps Δx, Δy, Δz. Then the simplest formula for the Laplacian is
$$\nabla^2 f_{i_x,i_y,i_z} = (f_{i_x+1,i_y,i_z} - 2f_{i_x,i_y,i_z} + f_{i_x-1,i_y,i_z})/\Delta x^2
 + (f_{i_x,i_y+1,i_z} - 2f_{i_x,i_y,i_z} + f_{i_x,i_y-1,i_z})/\Delta y^2
 + (f_{i_x,i_y,i_z+1} - 2f_{i_x,i_y,i_z} + f_{i_x,i_y,i_z-1})/\Delta z^2$$
which might be coded in Fortran as
Lf(ix,iy,iz) = &
( f(modulo(ix+1,nx),iy,iz) + &
f(modulo(ix-1,nx),iy,iz) )/dx2 &
+ ( f(ix,modulo(iy+1,ny),iz) + &
f(ix,modulo(iy-1,ny),iz) )/dy2 &
+ ( f(ix,iy,modulo(iz+1,nz)) + &
f(ix,iy,modulo(iz-1,nz)) )/dz2 &
- f(ix,iy,iz) * (2/dx2+2/dy2+2/dz2)
where the indices iα (α = {x, y, z}) of the arrays f and Lf run from 0 to nα − 1,
as in C. There are two problems with this construction. First, the modulo operations
are required to bring the indices back to the allowed range [0, nα − 1], and second,
the use of three indices to refer to a mesh point implies implicit arithmetic operations,
generated by the compiler, to translate them into a single index giving its position
in memory.
A straightforward solution to these inefficiencies would be to create a neighbour-
point list j_neighb(i,neighb), of the size of the number of mesh points multiplied by
the number of neighbour points. However, although the latter is only six in our simple
example, it may frequently be as high as several hundred, which generally makes this
approach unfeasible. A partial solution, addressing only the first problem, is to create six
(or more for longer ranges) one-dimensional tables $j_\alpha^{\pm 1}(i_\alpha) = \mathrm{mod}(i_\alpha \pm 1, n_\alpha)$ to avoid
the modulo computations [79]. Here, we describe a multidimensional generalization of this
method, which solves both problems at the expense of a very reasonable amount of extra
storage.
The method is based on an extended mesh, which extends beyond the periodic unit cell,
by as much as required to cover all the space that can be reached from the unit cell by the range
of the interactions or the finite-difference operator. The extended mesh range is $i_\alpha^{\min} = -\Delta n_\alpha$
and $i_\alpha^{\max} = n_\alpha - 1 + \Delta n_\alpha$, where $\Delta n_\alpha = 1$ in our particular example, in which the Laplacian
formula extends just to first-neighbour mesh points. In principle, in cases with a small unit
cell and a long range, the mesh extension may be larger than the unit cell itself, extending over
several neighbour cells. However, in the more relevant case of a large system, we shall expect
the extension region to be small compared with the unit cell. We then consider two combined
indices, one associated with the normal unit-cell mesh, and another one associated with the
extended mesh, each obtained by folding the three mesh indices into a single one, where
$n_\alpha^{\mathrm{ext}} = i_\alpha^{\max} - i_\alpha^{\min} + 1 = n_\alpha + 2\Delta n_\alpha$. The key observation is that, if $i^{\mathrm{ext}}$ is a mesh
point within the unit cell ($0 \leqslant i_\alpha \leqslant n_\alpha - 1$), and if $j^{\mathrm{ext}}$ is a neighbour mesh point (within its
interaction range, i.e. $|j_\alpha - i_\alpha| \leqslant \Delta n_\alpha$), then the arithmetic difference $j^{\mathrm{ext}} - i^{\mathrm{ext}}$ depends only
on the relative positions of $i^{\mathrm{ext}}$ and $j^{\mathrm{ext}}$ (i.e. on $j_\alpha - i_\alpha$), and not on the position of $i^{\mathrm{ext}}$ within
the unit cell. We can then create a list of neighbour strides $\Delta i^{\mathrm{ext}}$, and two arrays to translate
back and forth between i and iext . One of the arrays maps the unit-cell points to the central
region of the extended mesh, while the other one folds back the extended mesh points to their
periodically equivalent points within the unit cell. Then, to access the neighbours of a point i,
we (a) translate $i \to i^{\mathrm{ext}}$, (b) find $j^{\mathrm{ext}} = i^{\mathrm{ext}} + \Delta i^{\mathrm{ext}}$ and (c) translate $j^{\mathrm{ext}} \to j$. Notice that
several points of the extended mesh will map to the same point within the unit cell and that, in
principle, a unit-cell point j may be a neighbour of i through different values of jext . In our
example, the innermost loop would then read
Lf(i) = 0
do neighb = 1,n_neighb
j_ext = i_extended(i) + ij_delta(neighb)
j = i_cell(j_ext)
Lf(i) = Lf(i) + L(neighb) * f(j)
end do
where the number of neighbour points would be n_neighb=7, including the central point
itself, and L(neighb) would contain the corresponding finite-difference coefficients (1/Δx², 1/Δy²
or 1/Δz² for the six neighbours and −2/Δx² − 2/Δy² − 2/Δz² for the central point).
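A minimal sketch of how the translation arrays i_extended and i_cell and the strides ij_delta of this example could be set up is the following; the routine and variable names are our own, and 1-based combined indices are assumed:

! Illustrative construction of the index-translation arrays and neighbour
! strides used in the loop above, for the orthorhombic Laplacian example.
subroutine setup_mesh_indices(nx, ny, nz, i_extended, i_cell, ij_delta, n_neighb)
  implicit none
  integer, parameter   :: dn = 1        ! mesh extension of this stencil
  integer, intent(in)  :: nx, ny, nz    ! unit-cell mesh divisions
  integer, intent(out) :: i_extended(nx*ny*nz)                   ! cell -> extended
  integer, intent(out) :: i_cell((nx+2*dn)*(ny+2*dn)*(nz+2*dn))  ! extended -> cell
  integer, intent(out) :: ij_delta(7)   ! strides to the centre and its 6 neighbours
  integer, intent(out) :: n_neighb
  integer :: ix, iy, iz, i, i_ext, nxe, nye

  nxe = nx + 2*dn
  nye = ny + 2*dn

  ! Fold back every extended-mesh point to its periodic image in the unit
  ! cell, and record where each unit-cell point sits in the extended mesh.
  do iz = -dn, nz-1+dn
    do iy = -dn, ny-1+dn
      do ix = -dn, nx-1+dn
        i_ext = 1 + (ix+dn) + nxe*(iy+dn) + nxe*nye*(iz+dn)
        i     = 1 + modulo(ix,nx) + nx*modulo(iy,ny) + nx*ny*modulo(iz,nz)
        i_cell(i_ext) = i
        if (ix>=0 .and. ix<nx .and. iy>=0 .and. iy<ny .and. iz>=0 .and. iz<nz) &
          i_extended(i) = i_ext
      end do
    end do
  end do

  ! Strides, in the extended mesh, from a point to itself and to its six
  ! first neighbours; they do not depend on the position within the cell.
  n_neighb = 0
  do iz = -1, 1
    do iy = -1, 1
      do ix = -1, 1
        if (abs(ix) + abs(iy) + abs(iz) <= 1) then
          n_neighb = n_neighb + 1
          ij_delta(n_neighb) = ix + nxe*iy + nxe*nye*iz
        end if
      end do
    end do
  end do
end subroutine setup_mesh_indices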
We shall describe here some of the sparse-matrix multiplication techniques used in evaluating
equations (35) and (86)–(88). There are a large variety of sparse-matrix representations and
algorithms, each one optimized for a different kind of sparsity. The main constraint for choosing
our representation and algorithms is that they must be O(N ) in both memory and CPU time.
We enforce this condition strictly by requiring, for example, that a vector of size ∼N will not
be reset to zero a number ∼N of times. In our sparse matrices, such as Sµν , Hµν , cµi , ρµν and
φµ (r ), the number p of nonzero elements in a row is typically much larger than unity (but still
of order ∼N⁰) and much smaller than the row size m ∼ N¹. Such matrix rows are efficiently
stored as a real vector of size p, containing the nonzero elements, and an integer vector of the
same size containing the column index of each nonzero element. The whole matrix A of n rows
is then represented by two arrays A and jcol, of size n × p, such that Aij = A(i,k) (see footnote 14), where
j = jcol(i,k). The problem with this representation is that, given a value j of the column
index, there is no simple way to access the element Aij without scanning the whole row, which
is frequently too costly. One solution is to unpack a row i, that will be repeatedly used, into
‘expanded form’, i.e. to transfer it to a vector Arow of the full row size m (containing also all
the zeros), so that Aij = Arow(j). Since p ≫ 1, the size of Arow is negligible compared with
that of A and jcol.
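As a toy illustration of this storage scheme (the numbers are invented for this example), the 1 × 6 row (0 7 0 0 3 0), which has p = 2 nonzero elements, would be stored and unpacked as follows:

! Sparse storage of the row (0 7 0 0 3 0):
A(1,1:2)    = (/ 7.d0, 3.d0 /)   ! nonzero values
jcol(1,1:2) = (/ 2, 5 /)         ! their column indices
ncol(1)     = 2                  ! number of nonzero elements in the row
! Unpacking into 'expanded form', so that Aij = Arow(j):
Arow = 0.d0
do k = 1, ncol(1)
  Arow(jcol(1,k)) = A(1,k)
end do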
To find the matrix product C of two sparse matrices A and B
$$C_{ik} = \sum_j A_{ij} B_{jk}$$
we proceed iteratively for each row i of A (which will generate the same row of C): each
nonzero element j of the row is multiplied by every nonzero element of the j th row of B
(whose column index is, say k) and the result is accumulated in the kth position of an auxiliary
‘expanded’ vector. After finishing with that row of A we pack the vector in sparse format into
13 In the present example, further savings can be achieved by extending the arrays f (i) and Lf (i) themselves,
eliminating the index translations and facilitating the parallelization. Another efficient possibility is provided in
Fortran90 by the intrinsic cshift operation. However, these approaches are more complicated in other cases, such as
atomic-neighbour list constructions, while the double-index algorithm is quite general.
14 In Fortran, we alternate the order of i and k, to store the row elements consecutively in memory. If the number of
nonzero elements fluctuates widely for different rows, we also store all the rows consecutively as a large single vector.
the ith row of C and restore the auxiliary vector to zero. In fact, the packing can be performed
simultaneously with the product, using an auxiliary index vector instead:
C = 0.
ncolC = 0
jcolC = 0
pos = 0 ! Auxiliary index vector
do i = 1, nA
do jA = 1,ncolA(i)
j = jcolA(i,jA)
do jB = 1,ncolB(j)
k = jcolB(j,jB)
jC = pos(k)
if (jC==0) then ! New nonzero col
ncolC(i) = ncolC(i) + 1
jC = ncolC(i)
jcolC(i,jC) = k
pos(k) = jC
endif
C(i,jC) = C(i,jC) + A(i,jA)*B(j,jB)
enddo
enddo
do jA = 1,ncolA(i) ! Restore pos to zero
j = jcolA(i,jA)
do jB = 1,ncolB(j)
k = jcolB(j,jB)
pos(k) = 0
enddo
enddo
enddo
Notice that the auxiliary vector pos, which keeps the position in ‘packed format’ of the nonzero
elements of one row of C, is initialized in full only once. Notice also that this algorithm, unlike
those of [28], does not require the matrix elements to be stored in ascending column order.
The previous algorithm generates all the nonzero elements of C but in many cases we need
only some of them. For example, to calculate the electron density (equation (35)), we need
only the density matrix elements ρµν for which φµ and φν overlap. Also the expression (88)
needs to be evaluated only for the coefficients cµi which are allowed to be nonzero by the
localization constraints. In these cases, in which the array jcolC is already known, another
algorithm is more effective. We start by finding the sparse representation of B in column order
or, in other words, the transpose of B:
Bt = 0 ! B transpose
jcolBt = 0
ncolBt = 0
do i = 1,nB
do jB = 1,ncolB(i)
j = jcolB(i,jB)
ncolBt(j) = ncolBt(j) + 1
jBt = ncolBt(j)
jcolBt(j,jBt) = i
Bt(j,jBt) = B(i,jB)
enddo
enddo
We then unpack a row i of A and multiply it by a column j of B (a row of its transpose) for
each required matrix element Cij of their product:
C = 0.
Arow = 0. ! Auxiliary vector
do i = 1,nC
do jA = 1,ncolA(i) ! Copy one row of A
j = jcolA(i,jA)
Arow(j) = A(i,jA)
enddo
do jC = 1,ncolC(i) ! Calculate Cij
j = jcolC(i,jC)
do jBt = 1,ncolBt(j)
k = jcolBt(j,jBt)
C(i,jC) = C(i,jC) + Arow(k)*Bt(j,jBt)
enddo
enddo
do jA = 1,ncolA(i) ! Restore Arow to zero
j = jcolA(i,jA)
Arow(j) = 0.
enddo
enddo
The combination of these two matrix multiplication algorithms allows an efficient evaluation
of equations (86)–(88). Since these equations involve a trace or a relatively small subset of a
final matrix, it is important to control the order and sparsity of the intermediate products, in
order to keep them as sparse as possible. Notice that, once a row of A × B has been evaluated,
it may be multiplied by a third matrix, to obtain a row of the final product, without any need
to store the whole intermediate matrix.
To calculate the density at a grid point using equation (35) we need to access the matrix
elements ρµν , and this is inefficient if they are stored in sparse format. Thus, we first copy
the matrix elements, between the n_r basis orbitals which are nonzero at the grid point r, into
an auxiliary matrix array, of size n_aux × n_aux, with n_aux larger than n_r. We also create a lookup table
pos, of size equal to the total number of basis orbitals, such that pos(mu) is the position, in
the auxiliary matrix, of the matrix elements of orbital mu (or zero if they have not been copied
to it). If there are new nonzero orbitals at the next grid points, we keep copying them into
the auxiliary matrix, until all its naux slots are full, at which point we erase it and restart the
process. Since successive grid points tend to contain the same nonzero basis orbitals, these
copies and erasures are not frequent.
References
[6] Ordejon P, Drabold D A, Grumbach M P and Martin R M 1993 Phys. Rev. B 48 14 646
[7] Sankey O F and Niklewski D J 1989 Phys. Rev. B 40 3979
[8] Payne M C, Teter M P, Allan D C, Arias T A and Joannopoulos J D 1992 Rev. Mod. Phys. 64 1045
[9] Goedecker S 1999 Rev. Mod. Phys. 71 1085
[10] Hernandez E and Gillan M J 1995 Phys. Rev. B 51 10 157
[11] Ordejon P, Artacho E and Soler J M 1996 Phys. Rev. B 53 R10 441
[12] Sanchez-Portal D, Ordejon P, Artacho E and Soler J M 1997 Int. J. Quantum Chem. 65 453
[13] Ordejon P 2000 Phys. Status Solidi b 217 335 see also http://www.uam.es/siesta
[14] Kohn W and Sham L J 1965 Phys. Rev. 140 A1133
[15] Perdew J P and Zunger A 1981 Phys. Rev. B 23 5048
[16] Perdew J P, Burke K and Ernzerhof M 1996 Phys. Rev. Lett. 77 3865
[17] Hamann D R, Schlüter M and Chiang C 1979 Phys. Rev. Lett. 43 1494
[18] Bachelet G B, Hamann D R and Schlüter M 1982 Phys. Rev. B 26 4199
[19] Kleinman L and Bylander D M 1982 Phys. Rev. Lett. 48 1425
[20] Louie S G, Froyen S and Cohen M L 1982 Phys. Rev. B 26 1738
[21] Kleinman L 1980 Phys. Rev. B 21 2630
[22] Bachelet G B and Schlüter M 1982 Phys. Rev. B 25 2103
[23] Troullier N and Martins J L 1991 Phys. Rev. B 43 1993
[24] Blöchl P E 1990 Phys. Rev. B 41 5414
[25] Ramer N J and Rappe A M 1999 Phys. Rev. B 59 12 471
[26] Vanderbilt D 1985 Phys. Rev. B 32 8412
[27] Vanderbilt D 1990 Phys. Rev. B 41 7892
[28] Press W H, Teukolsky S A, Vetterling W T and Flannery B P 1992 Numerical Recipes (Cambridge: Cambridge
University Press)
[29] Sanchez-Portal D, Soler J M and Artacho E 1996 J. Phys.: Condens. Matter 8 3859
[30] Lippert G, Hutter J, Ballone P and Parrinello M 1996 J. Phys. Chem. 100 6231
[31] Artacho E, Sanchez-Portal D, Ordejon P, Garcia A and Soler J M 1999 Phys. Status Solidi b 215 809
[32] Huzinaga S et al 1984 Gaussian Basis Sets for Molecular Calculations (Berlin: Elsevier)
[33] Junquera J, Paz O, Sanchez-Portal D and Artacho E 2001 Phys. Rev. B 64 235111
[34] Filippi C, Singh D J and Umrigar C J 1994 Phys. Rev. B 50 14 947
[35] Kittel C 1986 Introduction to Solid State Physics (New York: Wiley)
[36] Jackson J D 1962 Classical Electrodynamics (New York: Wiley)
[37] Balbás L C, Martins J L and Soler J M 2001 Phys. Rev. B 64 165110
[38] Makov G and Payne M C 1995 Phys. Rev. B 51 4014
[39] Sandratskii L M and Guletskii P G 1986 J. Phys. F: Met. Phys. 16 L43
[40] Kübler J, Höck K H, Sticht J and Williams A R 1988 J. Appl. Phys. 63 3482
[41] Oda T, Pasquarello A and Car R 1998 Phys. Rev. Lett. 80 3622
[42] Postnikov A V, Engel P and Soler J M 2001 Preprint cond-mat/0109540
[43] Moreno J and Soler J M 1992 Phys. Rev. B 45 13 891
[44] Monkhorst H J and Pack J D 1976 Phys. Rev. B 13 5188
[45] Mermin N D 1965 Phys. Rev. 137 A1441
[46] Harris J 1985 Phys. Rev. B 31 1770
[47] Foulkes W M C and Haydock R 1989 Phys. Rev. B 39 12 520
[48] King-Smith R D and Vanderbilt D 1993 Phys. Rev. B 47 1651
[49] Resta R 1994 Rev. Mod. Phys. 66 899
[50] Saghi-Szabo G, Cohen R E and Krakauer H 1998 Phys. Rev. Lett. 80 4321
[51] Vanderbilt D 2000 J. Phys. Chem. Solids 61 147
[52] Sánchez-Portal D, Souza I and Martin R M 2000 Fundamental Physics of Ferroelectrics (AIP Conf. Proc.
Vol. 535) ed R Cohen (Melville: AIP) pp 111–20
[53] Dall’Olio S and Dovesi R 1997 Phys. Rev. B 56 10 105
[54] Yaschenko E, Fu L, Resca L and Resta R 1998 Phys. Rev. B 58 1222
[55] Mauri F, Galli G and Car R 1993 Phys. Rev. B 47 9973
[56] Kim J, Mauri F and Galli G 1995 Phys. Rev. B 52 1640
[57] Stephan U, Drabold D A and Martin R M 1998 Phys. Rev. B 58 13 472
[58] Kohn W 1959 Phys. Rev. 115 809
[59] Anderson E et al 1999 LAPACK Users’ Guide (Philadelphia, PA: SIAM)
[60] Pulay P 1980 Chem. Phys. Lett. 73 393
[61] Pulay P 1982 J. Comput. Chem. 13 556
[62] Allen M P and Tildesley D J 1987 Computer Simulation of Liquids (Oxford: Oxford University Press)
[63] Garcia A and Soler J M 2001 unpublished
[64] Parlinski K 2001 unpublished
[65] Pruneda J M, Estreicher S, Junquera J, Ferrer J and Ordejon P 2002 Phys. Rev. B 65 075210
[66] Brandbyge M, Stokbro K, Taylor J, Mozos J-L and Ordejon P 2001 Mater. Res. Soc. Symp. Proc. 636 D9.25.1
[67] Economou E N 1983 Green’s Functions in Quantum Physics (Berlin: Springer)
[68] Read A J and Needs R J 1991 Phys. Rev. B 44 13 071
[69] Gross E K U, Ullrich C A and Gossmann U J 1995 Density Functional Theory ed E K U Gross and R M Dreizler
(New York: Plenum) pp 149–71
[70] Gross E K U, Dobson J F and Petersilka M 1996 Density Functional Theory II: Relativistic and Time Dependent
Extensions (Topics in Current Chemistry Vol. 181) ed R F Nalewajski (Berlin: Springer) pp 81–172
[71] Yabana K and Bertsch G F 1996 Phys. Rev. B 54 4484
[72] Tsolakidis A, Sanchez-Portal D and Martin R M 2001 Preprint cond-mat/0109488
[73] Gale J D 2002 unpublished
[74] Ordejon P, Drabold D A, Martin R M and Grumbach M P 1995 Phys. Rev. B 51 1456
[75] Suter B W 1991 IEEE Trans. Signal Process. 39 532
[76] Mohsen A A and Hashish E A 1994 Geophys. Prospect. 42 131
[77] Ferrari T, Perciante D and Dubra A 1999 J. Opt. Soc. Am. A 16 2581
[78] Abramowitz M and Stegun I A 1964 Handbook of Mathematical Functions (New York: Dover)
[79] Binder K and Heermann D W 1992 Monte Carlo Simulation in Statistical Physics (Berlin: Springer)
[80] Gonze X, Stumpf R and Scheffler M 1991 Phys. Rev. B 44 8503