Editorial Board
R. Beig, Wien, Austria
W. Beiglböck, Heidelberg, Germany
W. Domcke, Garching, Germany
B.-G. Englert, Singapore
U. Frisch, Nice, France
P. Hänggi, Augsburg, Germany
G. Hasinger, Garching, Germany
K. Hepp, Zürich, Switzerland
W. Hillebrandt, Garching, Germany
D. Imboden, Zürich, Switzerland
R. L. Jaffe, Cambridge, MA, USA
R. Lipowsky, Potsdam, Germany
H. v. Löhneysen, Karlsruhe, Germany
I. Ojima, Kyoto, Japan
D. Sornette, Nice, France, and Zürich, Switzerland
S. Theisen, Potsdam, Germany
W. Weise, Garching, Germany
J. Wess, München, Germany
J. Zittartz, Köln, Germany
The Lecture Notes in Physics
The series Lecture Notes in Physics (LNP), founded in 1969, reports new developments
in physics research and teaching – quickly and informally, but with a high quality and
the explicit aim to summarize and communicate current knowledge in an accessible way.
Books published in this series are conceived as bridging material between advanced grad-
uate textbooks and the forefront of research and to serve three purposes:
• to be a compact and modern up-to-date source of reference on a well-defined topic
• to serve as an accessible introduction to the field to postgraduate students and
nonspecialist researchers from related areas
• to be a source of advanced teaching material for specialized seminars, courses and
schools
Both monographs and multi-author volumes will be considered for publication. Edited
volumes should, however, consist of a very limited number of contributions only. Pro-
ceedings will not be considered for LNP.
Volumes published in LNP are disseminated both in print and in electronic formats, the
electronic archive being available at springerlink.com. The series content is indexed, ab-
stracted and referenced by many abstracting and information services, bibliographic net-
works, subscription agencies, library networks, and consortia.
Proposals should be sent to a member of the Editorial Board, or directly to the managing
editor at Springer:
Christian Caron
Springer Heidelberg
Physics Editorial Department I
Tiergartenstrasse 17
69121 Heidelberg / Germany
christian.caron@springer.com
H. Fehske
R. Schneider
A. Weiße (Eds.)
Computational
Many-Particle Physics
Editors

Holger Fehske
Alexander Weiße
Universität Greifswald
Institut für Physik
Felix-Hausdorff-Str. 6
17489 Greifswald, Germany
holger.fehske@physik.uni-greifswald.de
weisse@physik.uni-greifswald.de

Ralf Schneider
Max-Planck-Institut für Plasmaphysik
Wendelsteinstr. 1
17491 Greifswald, Germany
ralf.schneider@ipp.mpg.de
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: by the authors and Integra using a Springer LaTeX macro package
Cover design: eStudio Calamar S.L., F. Steinen-Broo, Pau/Girona, Spain
Printed on acid-free paper. SPIN: 11808855
Preface
statistical physics, solid state theory and high performance computing. The present
Lecture Notes summarize and extend the material showcased over a 2-week period
of tightly scheduled tutorials, seminars and exercises. The emphasis is on a very ped-
agogical and systematic introduction to various numerical concepts and techniques,
with the hope that readers may quickly start programming on their own. The spectrum of
the numerical methods presented is very broad, covering classical as well as quan-
tum few- and many-particle systems. The trade-off between the number of particles, the complexity of the underlying microscopic models and the importance of the interactions determines the choice of the appropriate numerical approach. We therefore arranged the book according to the algorithms and techniques employed, rather than the physics applications, which we think is more natural for a book on numerical methods.
We start with methods for classical many-particle systems. Here, molecular dy-
namics approaches trace the motion of individual particles, kinetic approaches work
with the distribution functions of particles and momenta, while hybrid approaches
combine both concepts. A prominent example is the particle-in-cell method typi-
cally applied to model plasmas, where the time evolution of distribution functions is
approximated by the dynamics of pseudo-particles, representing thousands or mil-
lions of real particles. Of course, at a certain length scale the quantum nature of
the particles becomes important. As an attempt to close the gap between classi-
cal and quantum systems, we outline a number of semi-classical (Wigner-function,
Boltzmann- and Vlasov-equation based) approaches, which in particular address
transport properties. The concept of Monte Carlo sampling is equally important
for classical, statistical and quantum physical problems. The corresponding chap-
ters therefore account for a substantial part of the book and introduce the major
stochastic approaches in application to very different physical situations. Focussing
on solids and their properties, we continue with ab initio approaches to the elec-
tronic structure problem, where band structure effects are taken into account with
full detail, but Coulomb interactions and the resulting correlations are treated ap-
proximately. Dynamical mean field theories and cluster approaches aim at improv-
ing the description of correlations and bridge the gap to an exact numerical treatment
of basic microscopic models. Exact diagonalization of finite systems gives access
to their ground-state, spectral and thermodynamic properties. Since these methods
work with the full many-particle Hamiltonian, the study of a decent number of par-
ticles or larger system sizes is a challenging task, and there is a strong demand
to circumvent these limitations. Along this line the density matrix renormalization
group represents a clever technique to restrict the many-particle Hilbert space to
the physically most important subset. Finally, all the discussed methods heavily
rely on the use of powerful computers, and the book would be incomplete without
two detailed chapters on parallel programming and optimization techniques for high
performance computing.
Of course, the preparation of such a comprehensive book would have been im-
possible without support from many colleagues and sponsors. First of all, we thank
the lecturers and authors for their engagement, enthusiasm and patience. We are
greatly indebted to Milena Pfafferott and Andrea Pulss for their assistance during
the editorial work and the fine-tuning of the articles. Jutta Gauger, Beate Kemnitz,
Thomas Meyer and Gerald Schubert did an invaluable job in the organization of the
summer school. Finally, we acknowledge financial support from the Wilhelm and
Else Heraeus foundation, the Deutsche Forschungsgemeinschaft through SFB 652
and TR 24 and the Helmholtz-Gemeinschaft through COMAS.
1 Introduction to Molecular Dynamics

Molecular dynamics (MD) numerically integrates Newton's equations of motion,

F_i = m_i a_i , (1.1)
for each atom i in a system constituted by N atoms. Here, mi is the atom mass, ai
its acceleration and F i the force acting upon it due to the interactions with the other
atoms. Equivalently, one can solve the classical Hamiltonian equations of motion,

ṗ_i = −∂H/∂r_i , (1.2)
ṙ_i = ∂H/∂p_i , (1.3)
where pi and r i are the momentum and position co-ordinates for the ith atom. H,
the Hamiltonian, which is defined as a function of position and momenta, is given by
H(p_i, r_i) = Σ_{i=1}^N p_i²/(2m_i) + V(r_i) . (1.4)
The force on an atom can be calculated as the derivative of energy with respect to
the change in the atom’s position
F_i = m_i a_i = −∇_i V = −dE/dr_i . (1.5)
Knowledge of the atomic forces and masses can then be used to solve for the po-
sitions of each atom along a series of extremely small time steps (on the order of
femtoseconds). The velocities are calculated from the accelerations
a_i = dv_i/dt . (1.6)
dt
Finally, the positions are calculated from the velocities
v_i = dr_i/dt . (1.7)
dt
To summarize the procedure, at each step, the forces on the atoms are computed
and combined with the current positions and velocities to generate new positions
and velocities a short time ahead. The force acting on each atom is assumed to be
constant during the time interval. The atoms are then moved to the new positions,
an updated set of forces is computed, and the dynamics cycle continues.
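As a minimal illustration of this cycle (a sketch, not taken from the text; `compute_forces`, `masses` and the time step are placeholders), the following Python fragment performs one such update with the force held constant over the interval. Production codes use the Verlet-type integrators of Sect. 1.4 rather than this simplest explicit scheme.

```python
import numpy as np

def md_step(pos, vel, masses, compute_forces, dt):
    """One explicit MD cycle: compute forces, hold them constant over dt,
    update velocities (1.6) and then positions (1.7)."""
    forces = compute_forces(pos)          # F_i from the current configuration
    acc = forces / masses[:, None]        # a_i = F_i / m_i, cf. (1.1)
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# usage sketch (placeholders): N atoms in 3D, dt of the order of femtoseconds
# pos, vel = initial_positions, initial_velocities   # arrays of shape (N, 3)
# for step in range(n_steps):
#     pos, vel = md_step(pos, vel, masses, compute_forces, dt)
```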
Molecular dynamics simulations typically scale as either O(N log N) or O(N), with N the number of atoms. Even so, simulations with a macroscopic number of atoms or molecules (∼ 10²³) are impossible to handle with MD. Therefore, statistical mechanics is used to extract the macroscopic information from the microscopic information provided by MD.
Two important properties of the equations of motion should be noted. One is
that they are time reversible, i.e., they take the same form when the transformation
t → −t is made. The consequence of time reversal symmetry is that the microscopic physics is independent of the direction of the flow of time. The other property is conservation of the total energy: since the Hamiltonian carries no explicit time dependence,

dH/dt = Σ_{i=1}^N (∂H/∂r_i · ṙ_i + ∂H/∂p_i · ṗ_i) = Σ_{i=1}^N (∂H/∂r_i · ∂H/∂p_i − ∂H/∂p_i · ∂H/∂r_i) = 0 . (1.8)
with v_{x,y,z} being the velocities, r the positions of the atoms, and i the index running over all N atoms in the system; Φ_i(r_i) is the potential energy of the ith atom due to all other atoms in the system.

For the Berendsen thermostat, the coupling time constant τ_T has to be greater than ∆t. According to Berendsen [7], if τ_T > 100∆t the system exhibits natural fluctuations about the average.
The Berendsen pressure control is implemented by changing all atom positions and the system cell size during the simulation. If the desired pressure is P₀ and τ_P is the time constant for pressure control, which should typically be greater than 100∆t, the scaling factor μ is given by

μ = [1 − (β∆t/τ_P)(P₀ − P)]^{1/3} , (1.13)
This type of temperature and pressure scaling should only be applied once the solution of the equations of motion gives realistic fluctuations of temperature and pressure for a system in equilibrium, and when large values of τ_T and τ_P are chosen.
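As an illustration (the velocity-scaling factor λ = [1 + (∆t/τ_T)(T₀/T − 1)]^{1/2} is the commonly quoted Berendsen form, not shown explicitly in the text above), a minimal sketch combining both weak couplings:

```python
import numpy as np

def berendsen_scale(vel, pos, box, T, T0, P, P0, dt, tauT, tauP, beta):
    """Berendsen weak coupling (sketch): rescale velocities toward T0 and
    positions/box isotropically toward P0; beta is the assumed compressibility."""
    lam = np.sqrt(1.0 + (dt / tauT) * (T0 / T - 1.0))        # thermostat factor
    mu = (1.0 - beta * dt / tauP * (P0 - P)) ** (1.0 / 3.0)  # barostat factor (1.13)
    return vel * lam, pos * mu, box * mu
```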
For pair potentials, the total potential energy of a system can be calculated from the
sum of energy contributions from pairs of atoms and it depends only on the distance
between atoms. One example of a pair potential is the Lennard-Jones potential [11] (also known as the 6–12 potential). Other examples of pair potentials are the Coulomb potential, the Morse potential [12], etc. The Lennard-Jones potential is the most commonly used form:

V_LJ(r) = 4ε[(σ/r)¹² − (σ/r)⁶] , (1.17)
where ε is the cohesive energy well depth and σ the equilibrium distance. The (σ/r)¹² term describes the repulsive force due to the overlap of electron orbitals (Pauli repulsion) and has no true physical motivation, other than that the exponent must be larger than 6 to obtain a potential well. One often uses 12 because it can be computed efficiently as the square of the (σ/r)⁶ term. The (σ/r)⁶ term describes the attractive (van der Waals) force and can be derived classically by considering how two charged spheres induce dipole-dipole interactions in each other. This potential was used in the earliest studies of the properties of liquid argon [13, 14]. LJ potentials are not a good choice for very small r (r ≲ 0.1 nm), since the true interaction behaves as ∼ (1/r) exp(−r) and not as 1/r¹².
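A minimal sketch of (1.17) and of the radial force derived from it; with the defaults ε = σ = 1 it works directly in the reduced units introduced below.

```python
import numpy as np

def lj_potential(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 6-12 potential, (1.17)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6**2 - sr6)

def lj_force(r, eps=1.0, sigma=1.0):
    """Radial force F(r) = -dV/dr (positive values are repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6**2 - sr6) / r
```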
Typical system sizes in molecular dynamics simulations are very small, up to about 1000 atoms. As a consequence, most extensive quantities are small in magnitude when measured in macroscopic units. There are two ways to overcome this problem: either work in atomic-scale units (ps, amu, nm), or make all observable quantities dimensionless with respect to their characteristic values. The second approach is more popular. The scaling is done with the model parameters, e.g. size σ, energy ε and mass m. The common recipe is to choose one model parameter as reference (say the pair-potential well depth ε) and express the other quantities in terms of this reference value (E* = E/ε). The remaining parameters are treated similarly. For example: dimensionless distance r* = r/σ, energy E* = E/ε, temperature T* = k_B T/ε, time t* = t/[σ(m/ε)^{1/2}], force F* = Fσ/ε, diffusion coefficient D* = D/[σ(ε/m)^{1/2}], and so on.
Writing the LJ potential in dimensionless form,

V*_LJ(r*) = 4[(1/r*)¹² − (1/r*)⁶] , (1.18)

we see that it is parameter independent; consequently all properties must also be parameter independent. If a potential has only a couple of parameters, this scaling has considerable advantages: potential evaluation can be very efficient in reduced units, and since the results are always the same, they can be transferred to different systems by straightforward scaling with the model parameters σ, ε and m. This is equivalent to setting the parameters to unit value, and it is convenient to report system properties in this form, e.g. P*(ρ*).
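For example, converting laboratory values to reduced units (a sketch; the argon numbers are commonly quoted literature values, inserted here only for illustration):

```python
import math

# Commonly quoted Lennard-Jones parameters for argon (illustrative values)
kB = 1.380649e-23            # J/K
eps = 119.8 * kB             # well depth, J
sigma = 0.3405e-9            # m
mass = 39.948 * 1.66054e-27  # kg

tau = sigma * math.sqrt(mass / eps)   # reduced time unit, ~2.2 ps for argon

def T_star(T):                 # T* = kB T / eps
    return kB * T / eps

def t_star(t):                 # t* = t / [sigma (m/eps)^(1/2)]
    return t / tau

print(T_star(94.4))            # ~0.79: liquid argon in reduced temperature
print(t_star(1.0e-12))         # one picosecond in reduced time units
```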
angular terms are needed, and in many cases many more complicated terms as well.
For instance, in carbon chains the difference between single and double bonds often
is important, and for this at least a four-body term is needed.
To describe complex molecules a large set of inter-atomic potentials (often also
called force fields) have been developed by chemists, physicists and biochemists. At
least when force fields are used to describe atom motion inside molecules and inter-
actions between molecules (but not actual chemical reactions) the term molecular
mechanics is often used.
The total energy of a molecule can be written as

E_total = E_bond + E_angle + E_torsion + E_oop + E_cross + E_nonbond , (1.19)

where:
– Ebond describes the energy change related to a change of bond length, and thus
is simply a pair potential V2 .
– Eangle describes the energy change associated with a change in the bond angle,
i.e. is a three-body potential V3 .
– Etorsion describes the torsion, i.e. energy associated with the rotation between
two parts of a molecule relative to each other.
– Eoop describes out-of-plane interactions, i.e. the energy change when one part
of a molecule is out of the plane with another.
– Ecross are cross terms between the other interaction terms.
– Enonbond describes interaction energies which are not associated with covalent bonding; these could be e.g. ionic or van der Waals terms.
In the following we describe the terms, using notation more common in chemistry than the physics notation used earlier.
This term describes the energy change associated with the bond length. It is a simple
pair potential, and could be e.g. a Morse or LJ potential. At its simplest, it is purely
harmonic, i.e.
E_bond = Σ_bonds ½ k_b (b − b₀)² , (1.20)
we see that this is essentially the same as the pair potentials dealt with earlier. It amounts to approximating the bond as a spring with spring constant k. Although the approximation is very simple, it can be good enough in problems where we are always close to equilibrium, since any smooth potential well can, to first order, be approximated by a harmonic well. But harmonic
potentials obviously can not describe large displacements of atoms or bond breaking
reasonably. In solids, the harmonic approximation corresponds to the elastic regime,
i.e. the one where stress is directly proportional to the strain (Hooke’s law).
To improve on the bond model beyond the elastic regime, one can add higher-
order terms to it, e.g.
E_bond = Σ_bonds [K₂(b − b₀)² + K₃(b − b₀)³ + K₄(b − b₀)⁴] . (1.22)
This way also larger strain can be described, but this still does not describe bond
breaking (dissociation).
Also the Morse potential
E_bond = Σ_bonds D_b [1 − e^{−a(b−b₀)}]² (1.23)
is much used to describe bond energies. It is good in that it levels off as b tends to infinity, so it can describe bond breaking. On the other hand, the interaction never goes fully to zero, which is not quite realistic either, as in reality a covalent bond does break essentially completely at some finite inter-atomic distance.
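For comparison, a sketch of the three bond-stretch forms side by side (parameter values would come from a specific force field; the totals (1.20), (1.22), (1.23) sum these expressions over all bonds):

```python
import numpy as np

def e_bond_harmonic(b, b0, kb):
    """Harmonic bond term for one bond, cf. (1.20)."""
    return 0.5 * kb * (b - b0) ** 2

def e_bond_quartic(b, b0, K2, K3, K4):
    """Higher-order expansion, cf. (1.22); valid for larger strain."""
    d = b - b0
    return K2 * d**2 + K3 * d**3 + K4 * d**4

def e_bond_morse(b, b0, Db, a):
    """Morse bond term, cf. (1.23); levels off at Db for large b."""
    return Db * (1.0 - np.exp(-a * (b - b0))) ** 2
```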
The angular terms describe the energy change associated with two bonds forming an
angle with each other. Most kinds of covalent bonds have some angle which is most
favored by them – for sp³ hybridized bonds it is ∼109°, for sp² 120° and so on.
Like for bond lengths, the easiest way to describe bond angles is to use a harmonic
term like
E_angle = Σ_θ H_θ (θ − θ₀)² , (1.24)
where θ₀ is the equilibrium angle and H_θ a constant that describes the strength of the angular dependence.
This may work well for deviations up to 10° or so, but for larger angles additional terms are needed. A typical means of improvement is to add third-order terms and so forth, for instance

E_angle = Σ_θ [H₂(θ − θ₀)² + H₃(θ − θ₀)³] . (1.25)
The bond and angular terms were already familiar from the potentials for solids. In
the physics and chemistry of molecules there are many important effects which can
not be described solely with these terms. The most fundamental of these is probably
torsion. By this, the rotation of one part of a molecule with respect to another is meant. A simple example is the rotation of the two ends of the ethane molecule C₂H₆ around the central C–C bond.
Torsional forces can be caused by e.g. dipole-dipole-interactions and bond con-
jugation. If the angle between two parts is described by an angle φ, it is clear that the
function f which describes the rotation should have the property f (φ) = f (φ+2π),
because it is possible to do a full rotation around the central bond and return to the
initial state. The trigonometric functions sine and cosine of course fulfill this re-
quirement, so it is natural to describe the torsional energy with a few terms in a
quirement, so it is natural to describe the torsional energy with a few terms of a Fourier series; one common convention is

E_torsion = Σ_φ ½[V₁(1 + cos φ) + V₂(1 − cos 2φ) + V₃(1 + cos 3φ)] . (1.26)

The first term V₁ is often interpreted to be related to dipole-dipole interactions, V₂ to bond conjugation and V₃ to steric energy.
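A direct transcription of this Fourier form for a single dihedral (a sketch; V1, V2, V3 are force-field parameters, and sign conventions differ between force fields):

```python
import numpy as np

def e_torsion(phi, V1, V2, V3):
    """Three-term Fourier torsion energy for one dihedral angle phi.
    Periodic by construction: e_torsion(phi) == e_torsion(phi + 2*pi)."""
    return 0.5 * (V1 * (1.0 + np.cos(phi))
                  + V2 * (1.0 - np.cos(2.0 * phi))
                  + V3 * (1.0 + np.cos(3.0 * phi)))
```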
With the out-of-plane-terms one describes the energy which in (some cases) is as-
sociated with the displacement of atoms out of the plane in which they should be.
This is relevant in some (parts of) molecules where atoms are known to lie all in the
same plane. The functional form can be rather simple
E_oop = Σ_χ H_χ χ² . (1.27)
The cross-terms are functions which contain several of the above-mentioned quanti-
ties. They could e.g. describe how a stretched bond has a weaker angular dependence
than a normal one. Or they can describe the relations between two displacements,
an angle and a torsion and so on.
With the non-bonding terms, all effects which affect the energy of a molecule but are not covalent bonds are meant. These are e.g. van der Waals terms, electrostatic Coulomb interactions and hydrogen bonds. These terms can thus be further divided as

E_nonbond = E_vdW + E_Coulomb + E_hbond . (1.28)
The van der Waals term is often a simple Lennard-Jones-potential, and ECoulomb a
Coulomb potential for some, usually fractional, charges qi .
Most of the potential functions used in MD simulations are intended for modeling
physical processes, not chemical reactions. The formation and breaking of chemi-
cal bonds are inherently quantum mechanical processes, and are often studied using
first-principles methods. Nevertheless, classical potentials do exist that can empiri-
cally model changes in covalent bonding.
One successful method for treating covalent bonding interactions in computer
simulations is the Tersoff-type potential [15, 16, 17, 18]. Unlike traditional molecu-
lar mechanics force fields [19, 20, 21, 22, 23, 24, 25, 26], the Tersoff model allows
for the formation and dissociation of covalent chemical bonds during a simulation.
Many-body terms reflecting the local coordination environment of each atom are
used to modify the strength of more conventional pairwise terms. With this ap-
proach, individual atoms are not constrained to remain attached to specific neigh-
bors, or to maintain a particular hybridization state or coordination number. Models
of this sort, despite being purely classical, can provide a realistic description of co-
valent bonding processes in non-electrostatic systems. Potentials of this type have
been developed to treat systems containing silicon [16], carbon [17, 27], germanium
[18], oxygen [27], or hydrogen [27], as well as heterogeneous systems containing
various combinations of these species [18, 28, 29, 30, 31].
One particularly successful example of a Tersoff potential is the reactive empiri-
cal bond-order (REBO) potential developed by Brenner [30, 31, 32, 33]. This model
uses a Tersoff-style potential to describe the covalent bonding interactions in carbon
and hydrocarbon systems. Originally developed for use in simulating the chemical
vapor deposition of diamond [30], the REBO potential has been extended to provide
more accurate treatment of the energetic, elastic, and vibrational properties of solid
carbon and small hydrocarbons [33]. This potential has been used to model many
different materials and processes, including fullerenes [32], carbon nanotubes [34],
amorphous carbon [35], and the tribology and tribochemistry of diamond interfaces
[36, 37, 38, 39, 40, 41, 42].
The REBO potential is not appropriate for studying every hydrocarbon system,
however. In particular, the absence of dispersion and non-bonded repulsion terms
makes the potential poorly suited for any system with significant intermolecular
interactions. This is the case for many important hydrocarbon systems, including
liquids and thin films, as well as some solid-state materials such as graphite and
fullerenes. Even covalent materials such as diamond can benefit from a treatment
including non-bonded interactions. The bulk phase is dominated by covalent inter-
actions, but longer-range forces become quite important when studying interfacial
systems [27].
Various attempts have been made previously to combine non-bonded interac-
tions with the Tersoff or REBO potentials in a way that preserves the reactive ca-
pabilities of the model [43, 44, 45]. One such improvement of the Tersoff potential
was presented by Kai Nordlund et al. [46], which retains the good description of the covalent bonding and yet also describes accurately both the short-range repulsive part of the potential and the long-range bonding between graphite planes. One
way to do this is to simply reduce the repulsive barrier associated with the Lennard-
Jones or other potential [47], although this results in barriers which are too large for
radical species and too small for saturated compounds. Another alternative, taken
by Nyden et al. [44], is to allow bonds to dissociate with a Morse potential [12],
and explicitly check for recombination reactions between dissociated radicals. This
approach has been used to model thermal decomposition of polymers [44], but is
not general enough to treat arbitrary reactions in hydrocarbons, such as addition
across unsaturated bonds. Another method, used by Che et al. [45] is to reduce the
repulsive non-bonded interactions based on the covalent interaction energy, rather
than the distance. This method can help eliminate non-bonded interactions dur-
ing bond dissociations, but will again tend to overestimate barriers in association
reactions.
summing the above two equations eliminates the odd-order terms, and rearranging gives

r(t + δt) = 2r(t) − r(t − δt) + a(t)δt² .

Notice that the position vector at time t + δt is calculated from the position vectors at times t and t − δt; this makes Verlet's algorithm a two-step method. Therefore it is not self-starting: the initial positions r(0) and velocities v(0) are not sufficient to begin a calculation. Also, the velocities do not appear explicitly in the above equation; they can be calculated from the central difference

v(t) = [r(t + δt) − r(t − δt)]/(2δt) .
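A sketch of one Verlet step, including one common way to bootstrap the two-step method from the initial conditions:

```python
def verlet_step(r_now, r_prev, acc, dt):
    """One position-Verlet step: r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt^2,
    with the on-step velocity from the central difference."""
    r_next = 2.0 * r_now - r_prev + acc * dt**2
    v_now = (r_next - r_prev) / (2.0 * dt)
    return r_next, v_now

# Bootstrapping from r(0), v(0), a(0), e.g. via a backward Taylor step:
# r_prev = r0 - v0 * dt + 0.5 * a0 * dt**2
```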
If the Taylor expansions are truncated, so that only the terms shown explicitly in (1.34) are left, the quantities are called the predicted values r^p, v^p, a^p and b^p. The force is computed by taking the gradient of the potential at the predicted position r^p, and a new acceleration value is obtained. Since the predicted values are not based on the physics, this re-calculated acceleration differs from the predicted acceleration a^p of (1.34). The difference between the two values is called the error signal, or error,
∆a(t + δt) = a^c(t + δt) − a^p(t + δt) . (1.35)
This error signal is used to correct all predicted quantities in (1.34)
r^c(t + δt) = r^p(t + δt) + c₀∆a(t + δt) ,
v^c(t + δt) = v^p(t + δt) + c₁∆a(t + δt) ,
a^c(t + δt) = a^p(t + δt) + c₂∆a(t + δt) ,
b^c(t + δt) = b^p(t + δt) + c₃∆a(t + δt) . (1.36)
All the corrected quantities are proportional to the error signal, and the propor-
tional coefficients are determined to maximize the stability of the calculation. These
corrected values are now better approximations of the true quantities, and are used
to predict the quantities in the next iteration. The best choice for these coefficients
depends on the order of both the differential equations and the Taylor series [53].
These coefficients are computed based on the order of the algorithm being used in
the simulation. In addition, the accuracy of the numerical integrator algorithms also
depends on the time step size, which is typically on the order of fractions of femto-
seconds (10−15 s). Thus, the simulation as a whole is able to describe only short-
time scale phenomena that last on the order of pico- (10−12 ) up to nano-seconds
(10−9 s).
1.4.3 Leap-Frog
In this algorithm, the velocities are first calculated at time t + 1/2δt; these are used
to calculate the positions, r, at time t + δt. In this way, the velocities leap over the
positions, then the positions leap over the velocities. The advantage of this algorithm
is that the velocities are explicitly calculated, however, the disadvantage is that they
are not calculated at the same time as the positions. The velocities at time t can be
approximated by the relationship
v(t) = ½[v(t − ½δt) + v(t + ½δt)] . (1.37)

Therefore:

r(t + δt) = r(t) + v(t + ½δt)δt , (1.38)
v(t + ½δt) = v(t − ½δt) + a(t)δt . (1.39)
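A sketch of one leap-frog step (arrays for positions, half-step velocities and accelerations are assumed):

```python
def leapfrog_step(r, v_half, acc, dt):
    """One leap-frog step, cf. (1.38)-(1.39); acc is a(t) at the current positions."""
    v_half_new = v_half + acc * dt           # v(t+dt/2) = v(t-dt/2) + a(t) dt
    r_new = r + v_half_new * dt              # r(t+dt)   = r(t) + v(t+dt/2) dt
    v_on_step = 0.5 * (v_half + v_half_new)  # on-step estimate, cf. (1.37)
    return r_new, v_half_new, v_on_step
```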
The advantage of this algorithm is that it provides a more accurate expression for
the velocities and better energy conservation. The disadvantage is that the more
complex expressions make the calculation more expensive:

r(t + δt) = r(t) + δt v(t) + (2/3)δt² a(t) − (1/6)δt² a(t − δt) . (1.45)

The predicted velocity is given by

v(t + δt) = v(t) + (3/2)δt a(t) − (1/2)δt a(t − δt) . (1.46)

The acceleration (1.47) is based on the predicted velocity v_i from the previous equation. The corrected velocity is then given by

v(t + δt) = v(t) + (1/3)δt a(t + δt) + (5/6)δt a(t) − (1/6)δt a(t − δt) . (1.48)
The fifth-order Gear predictor-corrector method [53] predicts the molecular position r_i at time t + δt using a fifth-order Taylor series based on positions and their derivatives at time t. It is particularly useful for stiff differential equations.
where t is the simulation time, M is the number of time steps in the simulation and A(p^N, r^N) is the instantaneous value of A. This integral is generally extremely difficult to calculate, because one must evaluate all possible states of the system. In statistical mechanics, experimental observables are assumed to be ensemble averages,

⟨A⟩_ensemble = ∬ dp^N dr^N A(p^N, r^N) ρ(p^N, r^N) , (1.50)
where ρ(p^N, r^N) is the probability density of the ensemble. The basic idea is that if one allows the system to evolve in time indefinitely, it will eventually pass through all possible states (the ergodic hypothesis). One goal of a molecular dynamics simulation is therefore to generate enough representative configurations that time averages and ensemble averages coincide.
There are a number of different physical quantities in which one may be interested. For a liquid, these may be liquid structure factors, transport coefficients (e.g. diffusion coefficient, viscosity or thermal conductivity), etc. For solids, these may be crystal structure, adsorption of molecules on surfaces, melting behaviour, etc. Here we will consider the diagnostic methods to calculate the internal energy, pressure tensor, self-diffusion coefficient and pair distribution function. More details are described in [55, 56].
1.5.2.1 Energy
The energy is the simplest and most straightforward quantity to calculate. For all pairs of atoms (i, j) one calculates their separation r_ij, which is substituted into the chosen form of the potential U(r). The energy has contributions from both potential and kinetic terms. The kinetic energy should be calculated after the momenta p have been updated, i.e. after the force routine has been called. It can then be added to the potential energy,

E = ⟨H⟩ = ⟨K⟩ + ⟨U⟩ = ⟨Σ_i |p_i|²/(2m_i)⟩ + ⟨U(r)⟩ . (1.52)
U(r) is obtained directly from the potential energy calculation. The average temperature follows from the kinetic energy via equipartition,

E_kin = ⟨K⟩ = (3/2)N k_B T  ⟹  T = (1/(3N k_B)) ⟨Σ_{i=1}^N |p_i|²/m_i⟩ . (1.53)
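A direct transcription of (1.53) (a sketch; reduced units with k_B = 1 by default):

```python
import numpy as np

def instantaneous_temperature(vel, masses, kB=1.0):
    """Temperature from the kinetic energy via (1.53)."""
    N = len(masses)
    p2_over_m = np.sum(masses[:, None] * vel**2)   # sum_i |p_i|^2 / m_i
    return p2_over_m / (3.0 * N * kB)
```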
1.5.2.2 Pressure
Pressure is a second-rank tensor. For inhomogeneous systems, one calculates this tensor by finding the force across potential surfaces [57]. For homogeneous systems this is not the most efficient method; instead one uses the virial theorem to calculate the configurational part of the pressure tensor and adds it to the kinetic part. For a derivation of the virial theorem see [58]. The full expression for the pressure tensor of a homogeneous system of particles is
P(r, t) = (1/V)[Σ_{i=1}^N m_i v_i(t)v_i(t) + Σ_{i=1}^N Σ_{j>i} r_ij(t)F_ij(t)]|_{r_i(t)=r} , (1.54)
where V is the volume and m_i, v_i are the mass and velocity of particle i. The first term represents the kinetic contribution, the second the configurational part of the pressure tensor. Note that the interaction between each pair is calculated just once. The above equation is valid for atomic systems at equilibrium; systems of molecules require some modifications, as do non-equilibrium systems.
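For a homogeneous system one often needs only the scalar pressure, one third of the trace of (1.54); a sketch, with each interacting pair appearing exactly once in the pair lists as noted above:

```python
import numpy as np

def pressure_scalar(vel, masses, rij, fij, volume):
    """Scalar pressure from one third of the trace of (1.54).
    rij, fij: arrays of pair separation vectors r_ij and pair forces F_ij."""
    kinetic = np.sum(masses[:, None] * vel**2)   # sum_i m_i v_i^2
    virial = np.sum(rij * fij)                   # sum_{i<j} r_ij . F_ij
    return (kinetic + virial) / (3.0 * volume)
```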
The static properties of the system e.g. structure, energy, pressure etc. are obtained
from the pair (or radial) correlation function. Pair correlation function, g(r), gives
the information on the structure of the material. It gives the probability of locating
pairs of atoms separated by a distance r, relative to that for a completely random
distribution at the same density (i.e. the ideal gas). For a crystal, it exhibits a se-
quence of peaks at positions corresponding to the coordination shells around a given atom. For
amorphous materials and liquid, g(r) exhibits its major peak close to the average
atomic separation of neighboring atoms, and oscillates with less pronounced peaks
at larger distances. The magnitude of the peaks usually decays exponentially with
distance as g(r) → 1. In most cases, g(r) vanishes below a certain distance where
atomic repulsion is strong enough to prevent pairs of atoms from getting too close.
It is defined as
g(r) = (V/N²) ⟨Σ_{i=1}^N Σ_{j≠i} δ(r − r_ij)⟩ . (1.55)
The dynamic and transport properties of the system are obtained from time correlation functions. Any transport coefficient K can be calculated using the generalized Einstein and Green-Kubo formulas [60],

K = lim_{t→∞} ⟨[A(t) − A(0)]²⟩/(2t) = ∫₀^∞ dτ ⟨Ȧ(τ)Ȧ(0)⟩ . (1.56)
To calculate the self-diffusion coefficient one takes A(t) = r_i(t), the atom position at time t, so that Ȧ(t) = v_i(t) is the atom velocity. For the shear viscosity, A(t) = Σ_i m_i r_i(t)v_i(t) and Ȧ = σ^{αβ}, an element of the stress tensor. Other transport quantities can be calculated similarly. If we compare the value of A(t) with its value at time zero, A(0), the two values are correlated at sufficiently short times, but at longer times the value of A(t) has no correlation with its value at t = 0. Information on the relevant dynamical processes is contained in the time decay of the correlation function. Time correlation functions can be related to experimental spectra by a Fourier transformation.
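For the self-diffusion coefficient, a sketch of the Einstein route of (1.56); note that for the full three-dimensional displacement vector the factor 2t becomes 6t (2t per Cartesian component):

```python
import numpy as np

def self_diffusion(traj, dt):
    """Einstein-relation estimate of D: slope of the MSD over 6t in 3D.
    traj: unwrapped positions, array of shape (nsteps, N, 3)."""
    disp = traj - traj[0]                           # r_i(t) - r_i(0)
    msd = np.mean(np.sum(disp**2, axis=2), axis=1)  # <|r(t) - r(0)|^2>
    t = np.arange(len(traj)) * dt
    half = len(t) // 2                              # fit the long-time regime only
    slope = np.polyfit(t[half:], msd[half:], 1)[0]
    return slope / 6.0
```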
Fig. 1.1. Left: One hydrogen atom in a carbon lattice. Right: Diffusion paths at 900 K for a
hydrogen atom in graphite. Small frequent jumps and rare large jumps are visible
Multi-Scale modeling is the field of solving physical problems which have important
features at multiple scales, particularly multiple spatial and temporal scales. As
an example, the problem of protein folding has multiple time scales. While the
time scale for the vibration of the covalent bonds is of the order of femtoseconds
(10−15 s), folding time for proteins may very well be of the order of seconds. Well-
known examples of problems with multiple length scales include turbulent flows,
mass distribution in the universe, and vortical structures on the weather map [63]. In
addition, different physical laws may be required to describe the system at different
scales. Take the example of fluids. At the macroscale (meters or millimeters), fluids
are accurately described by the density, velocity and temperature fields, which obey
the continuum Navier-Stokes equations. On the scale of mean free path, it is neces-
sary to use kinetic theory (Boltzmann equations) to get a more detailed description
in the terms of the one-particle phase-space distribution function. At the nanome-
ter scale, molecular dynamics in the form of Newton’s law has to be used to give
the actual position and velocity of each individual atom that makes up the fluid.
If a liquid such as water is used as the solvent for protein folding, then the elec-
tronic structure of the water molecules becomes important and these are described
by Schrödinger’s equation in quantum mechanics. The boundaries between different
levels of theories may vary, depending on the system being studied, but the overall
trend described above is generally valid. At each finer scale a more detailed theory
has to be used, giving rise to more detailed information on the system. Warrier et al.
[Fig. 1.2 shows a schematic of this approach: MD within ∼5 nm crystallites, Monte Carlo diffusion (MCD) within granules, and kinetic Monte Carlo (KMC) in the voids and micro-voids of the ∼5 μm porous geometry, up to trans-granular diffusion (TGD).]

Fig. 1.2. Multi-scale modeling approach for diffusion of hydrogen in porous graphite
[61] have done a multi-scale analysis of the diffusion of hydrogen isotopes in porous graphite. They used the insight gained from microscopic models (consisting of a few hundred atoms over a time scale of a few picoseconds and length scales of nanometers, using MD) to model the hydrogen isotope reactions and transport at the meso-scale (trans-granular diffusion, with length scales of a few microns) and further at the macro-scale (typically a centimeter over a time scale of milliseconds). A multi-scale approach, both in length and time, to modeling plasma-surface interaction is therefore necessary. Figure 1.2 illustrates this multi-scale modeling approach.
1.7 Ab Initio MD
In this approach, a global potential energy surface is constructed in a first step either
empirically or based on electronic structure calculations. In a second step, the dy-
namical evolution of the nuclei is generated by using classical mechanics, quantum
mechanics or semi/quasiclassical approximations of various sorts.
Suppose that a useful trajectory consists of about 10^M molecular dynamics steps, i.e. 10^M electronic structure calculations are needed to generate one trajectory. Furthermore, it is assumed that 10^n independent trajectories are necessary in order to average over different initial conditions, so that 10^{M+n} ab initio molecular dynamics steps are required in total. Finally, it is assumed that each single-point electronic structure calculation needed to devise the global potential energy surface and one ab initio molecular dynamics time step require roughly the same amount of CPU time. Based on this truly simplistic order-of-magnitude estimate, the advantage of ab initio molecular dynamics vs. calculations relying on the computation of a global potential energy surface amounts to about 10^{3N−6−M−n}, the ratio of the roughly 10^{3N−6} single points needed to map a global surface (about ten points per internal degree of freedom) to the 10^{M+n} ab initio molecular dynamics steps. The crucial point is that for a given statistical accuracy (that is, for M and n fixed and independent of N) and for a given electronic structure method, the computational advantage of on-the-fly approaches grows like 10^N with system size. Of course, considerable progress has been achieved in trajectory calculations by carefully selecting the discretization points and reducing their number, choosing sophisticated representations and internal coordinates, exploiting symmetry etc., but basically the exponential scaling with the number of nuclei remains a problem. Other strategies consist, for instance, in reducing the number of active degrees of freedom by constraining certain internal coordinates, representing less important ones by a (harmonic) bath or friction, or building up the global potential energy surface in terms of few-body fragments. All these approaches, however, invoke approximations beyond those of the electronic structure method itself. Finally, it is evident that the computational advantage of on-the-fly approaches diminishes as more and more trajectories are needed for a given (small) system. For instance, extensive averaging over many different initial conditions is required in order to calculate scattering or reactive cross sections quantitatively.
A variety of powerful ab initio molecular dynamics codes have been developed; a few of them are CASTEP [73], CP-PAW [74], fhi98md [75], NWChem [76], VASP [77], GAUSSIAN [78], MOLPRO [79] and ABINIT [80, 81].
The Car-Parrinello Lagrangian takes the standard form

L_CP = Σ_I ½M_I Ṙ_I² + Σ_i ½μ_i ⟨ψ̇_i|ψ̇_i⟩ − ⟨Ψ₀|H_e|Ψ₀⟩ + constraints , (1.58)

where μ_i (= μ) are the fictitious masses or inertia parameters assigned to the orbital degrees of freedom; the units of the mass parameter μ are energy times a squared time for reasons of dimensionality. The ψ_i are regarded as classical fields, and M_I are the ionic masses. The potential energy in the Car-Parrinello Lagrangian can be written as

⟨Ψ₀|H_e|Ψ₀⟩ = E^KS[{ψ_i}, {R_I}] , (1.59)
where E^KS is the LDA Kohn-Sham energy functional. Within the pseudopotential implementation of the local density approximation (LDA) in the Kohn-Sham (KS) scheme, the ionic potential energy corresponding to the electrons in their ground state can be found by minimizing the KS total-energy functional E^KS[{ψ_i}, {R_I}] with respect to the one-particle wavefunctions ψ_i(r) describing the valence-electron density, subject to orthonormalization constraints. Written out in terms of orthonormal one-particle orbitals ψ_i(r), E^KS consists, respectively, of the electronic kinetic energy, the electrostatic Hartree term, the integral of the LDA exchange and correlation energy density ε_XC, the electron-ion pseudopotential interaction, and the ion-ion interaction potential energy. The electronic density ρ(r) is given by
ρ(r) = Σ_i f_i |ψ_i(r)|² , (1.61)
where fi are occupation numbers.
The corresponding Newtonian equations of motion are obtained from the associated Euler-Lagrange equations,

(d/dt) ∂L/∂Ṙ_I = ∂L/∂R_I , (1.62)
(d/dt) δL/δψ̇_i* = δL/δψ_i* , (1.63)
as in classical mechanics, but here for both the nuclear positions and the orbitals; note that ψ_i* = ⟨ψ_i| and that the constraints are holonomic (i.e. they can be expressed in the form f(r₁, r₂, ..., t) = 0). Following this route, the generic Car-Parrinello equations of motion are found to be of the form
M_I R̈_I(t) = −∂/∂R_I ⟨Ψ₀|H_e|Ψ₀⟩ + ∂/∂R_I {constraints} , (1.64)
μ_i ψ̈_i(t) = −δ/δψ_i* ⟨Ψ₀|H_e|Ψ₀⟩ + δ/δψ_i* {constraints} . (1.65)
Note that the constraints within the total wavefunction lead to constraint forces
in the equations of motion. Note also that these constraints might be a function of
both the set of orbitals {ψi } and the nuclear positions {RI }. These dependencies
have to be taken into account properly in deriving the Car-Parrinello equations fol-
lowing from (1.58) using (1.62) and (1.63).
According to the Car-Parrinello equations of motion, the nuclei evolve in time at a certain (instantaneous) physical temperature ∝ Σ_I M_I Ṙ_I², whereas a fictitious temperature ∝ Σ_i μ_i ⟨ψ̇_i|ψ̇_i⟩ is associated with the electronic degrees of freedom. In
this terminology, low electronic temperature or cold electrons means that the elec-
tronic subsystem is close to its instantaneous minimum energy min{ψi } Ψ0 |He |Ψ0
i.e. close to the exact Born-Oppenheimer (BO) surface. Thus, a ground-state wave-
function optimized for the initial configuration of the nuclei will stay close to its
ground state also during time evolution if it is kept at a sufficiently low temperature.
The remaining task is to separate in practice nuclear and electronic motion such that
the fast electronic subsystem stays cold also for long times but still follows the slow
nuclear motion adiabatically (or instantaneously). Simultaneously, the nuclei are
nevertheless kept at a much higher temperature. This can be achieved in nonlinear
classical dynamics via decoupling of the two subsystems and (quasi-)adiabatic time
evolution. This is possible if the power spectra stemming from both dynamics do not
have substantial overlap in the frequency domain so that energy transfer from the
hot nuclei to the cold electrons becomes practically impossible on the relevant time
scales. This amounts in other words to imposing and maintaining a metastability
condition in a complex dynamical system for sufficiently long times.
The Hamiltonian, or conserved energy, is a constant of motion (as in classical MD, with relative variations smaller than 10⁻⁶ and with no drift), which serves as an extremely sensitive check of the molecular dynamics algorithm. In contrast, the electronic energy displays a simple oscillation pattern due to the simplicity of
the phonon modes. Most importantly, the fictitious kinetic energy of the electrons is
found to perform bound oscillations around a constant, i.e. the electrons do not heat
up systematically in the presence of the hot nuclei.
As we have seen above, the Car-Parrinello method gives physical results even if the orbitals are not exactly at the BO surface, provided that the electronic and ionic degrees of freedom remain adiabatically separated and the electrons stay close to the BO surface. Loss of adiabaticity means that energy is transferred from the hot nuclei to the cold electrons, and the Car-Parrinello MD deviates from the BO surface.
1.8.1 Adiabaticity
The metastable two-temperature regime set up in the CP dynamics is extremely efficient at approximating the constraint of keeping the electronic energy functional at its minimum without explicit minimization. At the beginning of the nu-
merical simulation, the electronic subsystem is in an initial state which is very close
to the minimum of the energy surface. When the ions start moving, their motion
causes a change in the instantaneous position of the minimum in the electronic pa-
rameter space. The electrons experience restoring forces and start moving. If they
start from a neighborhood of a stable equilibrium position, there will be range of
initial velocities such that a regime of small oscillations is originated.
A simple harmonic analysis of the frequency spectrum of the orbital classical
fields close to the minimum defining the ground state yields [82]
ω_ij = [2(ε_i − ε_j)/μ]^{1/2} , (1.66)

where ε_j and ε_i are the eigenvalues of occupied and unoccupied orbitals, respectively. The analytic estimate for the lowest possible electronic frequency,

ω_e^min ∝ (E_gap/μ)^{1/2} , (1.67)

shows that this frequency increases like the square root of the electronic energy difference E_gap between the lowest unoccupied and the highest occupied orbital. On the other hand, it increases similarly for a decreasing fictitious mass parameter μ.
Since the parameters E_gap and the maximum phonon frequency ω_n^max are dictated by the physics, the only parameter available to control the adiabatic separation is the fictitious mass, which is therefore also called the adiabaticity parameter. However, decreasing μ not only shifts the electronic spectrum upwards on the frequency scale, but also stretches the entire frequency spectrum according to (1.66). This leads to an increase of the maximum frequency according to

ω_e^max ∝ (E_cut/μ)^{1/2} , (1.68)
where E_cut is the largest kinetic energy occurring in an expansion of the wavefunction in terms of a plane-wave basis set. A limit on decreasing μ arbitrarily is set by the maximum length of the molecular dynamics time step ∆t_max that can be used: the time step is inversely proportional to the highest frequency in the system, which is ω_e^max, and thus

∆t_max ∝ (μ/E_cut)^{1/2} . (1.69)
In the limit when the electronic gap is very small or even vanishes, E_gap → 0, as is the case for metallic systems, all the above arguments break down due to the occurrence of zero-frequency electronic modes in the power spectrum according to (1.67), which necessarily overlap with the phonon spectrum. It has been shown
that the coupling of separate Nosé-Hoover thermostats [68, 69, 83] to the nuclear
and electronic subsystem can maintain adiabaticity by counterbalancing the energy
flow from ions to electrons so that the electrons stay cool [84]; see [85] for a sim-
ilar idea to restore adiabaticity. Although this method is demonstrated to work in
practice [86], this ad hoc cure is not entirely satisfactory from both a theoretical and
practical point of view so that the well-controlled Born-Oppenheimer approach is
recommended for strongly metallic systems.
has been restricted to cases in which modest potential quality seems sufficient and
in which discrete spectral lines or state-selected dynamics are not required, as in
rate constant calculations based on classical trajectories [103] or in transition state
theory [104, 105]. In contrast, the highest-accuracy ab initio calculations can take
hours or more of computer time, even for small systems. Another obstacle for on-
the-fly calculations of ab initio energies is the failure or non-convergence of the ab
initio method. One frequently comes across this problem when the nuclear config-
urations are in a state for which the selected ab initio method fails. This is seen in
particular for dissociating molecules. The absence of an ab initio energy can be treated as a hole in the surface and corrected on the pre-calculated surface. Moreover, carefully adding ab initio fragment data for the dissociating molecule allows one to study the reaction dynamics on a high-quality surface. Thus, the
construction of accurate analytic representations of PES is a necessary step in full
quantum spectroscopic and dynamics studies.
The number of high-level ab initio data points currently needed for adequate
sampling of dynamically significant regions typically ranges from several hundred
to several thousand points for tri- and tetra-atomic systems. Methods that use deriva-
tives typically use fewer configurations; however, the number of pieces of informa-
tion is typically in the same range [106, 107, 108, 109, 110, 111, 112, 113].
In constructing the PES the prescribed functional form must be carefully crafted
so that it
(i) does not introduce arbitrary features,
(ii) achieves the required smoothness,
(iii) preserves any necessary permutation symmetry, and
(iv) agrees with any known asymptotic form of the underlying PES.
An analytic fit that has a residual significantly larger than the error in the high level
ab initio calculations is only marginally more useful than if a lower-level calcula-
tion is employed. High-quality ab initio calculations demand representations that
preserve their level of accuracy.
One such method, the reproducing kernel Hilbert space (RKHS) method, was introduced by Hollebeek et al. [114]. Several other examples of carefully crafted analytic representations are listed in [114].
all particles within the interaction range of the potential: the longer the range of the potential, the larger the number of forces that must be calculated at each time step.
The simulation region or cell is effectively replicated in all spatial directions, so that
particles leaving the cell reappear at the opposite boundary. For systems governed
by a short-ranged potential – say Lennard-Jones or hard spheres – it is sufficient to
take just the neighbouring simulation volumes into account, leading to the minimum-
image configuration shown in Fig. 1.3.
The potential seen by the particle at r_i is summed over all other particles r_j, or their periodic images (r_j ± n), where n = (i_x, i_y, i_z)L with i_α = 0, ±1, ±2, ±3, ..., whichever is closest. L denotes the side length of the simulation box. More typically, this list is further restricted to particles lying within a sphere centred on r_i. For long-range potentials this arrangement is inadequate, because the contributions from more distant images at 2L, 3L etc. are no longer negligible.
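The minimum-image prescription as a one-liner (cubic box assumed):

```python
import numpy as np

def minimum_image(ri, rj, L):
    """Separation r_ij using the nearest periodic image in a cubic box of side L."""
    d = ri - rj
    return d - L * np.round(d / L)   # shifts each component by the closest multiple of L
```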
One is faced with the challenge of arranging the terms in the potential energy equation so that the contributions from oppositely charged pairs cancel and the summation series converges, preferably as fast as possible.
A way to achieve this is to add image cells radially outwards from the origin as
shown in Fig. 1.4 (this is to build up sets of images contained within successively
larger spheres surrounding the simulation region).
Fig. 1.3. Periodic boundary conditions for the simulation region (centre, dark-shaded particles at positions r_j), showing the minimum-image box for the reference ion ⊕ at position r_i, containing the nearest periodic images (light-shaded particles at positions r_j ± n)
For the above scheme, the potential at r_i due to the charges at r_j and their periodic images is

V_s(r_i) = Σ′_n Σ_{j=1}^N q_j/|r_ij + n| , (1.70)

where the prime indicates that the j = i term is omitted in the primary cell (n = 0).
Fig. 1.4. Constructing a convergent sum over periodic images (adapted from Allen &
Tildesley)
to the correct value, but only slowly. The summation over the boxes in (1.70) is computationally expensive for an N-body problem: the O(N²) operation count grows to N_box × N².
Ewald's idea gets around this problem by recasting the potential equation into the sum of two rapidly converging series, one in real space and one in reciprocal k-space. Consider the simple Gaussian distribution originally used by Ewald himself,

σ(r) = (α³/π^{3/2}) e^{−α²r²} , (1.71)

which is normalized such that

∫ σ(r) d³r = 1 . (1.72)
Note that α determines the height and width, i.e. the effective size, of the smeared charges; σ is therefore called the spreading function. To obtain the real-space term depicted in Fig. 1.5, we subtract the lattice sum for the smeared-out charges from the original point-charge sum, thus
V_r(r_i) = Σ′_n Σ_{j=1}^N (q_j/|r_ij + n|) [1 − ∫ σ(r − r_ij) d³r]
         = Σ′_n Σ_j q_j [1/|r_ij + n| − (4α³/π^{1/2}) (1/|r_ij + n|) ∫₀^{|r_ij+n|} r² e^{−α²r²} dr
           − (4α³/π^{1/2}) ∫_{|r_ij+n|}^∞ r e^{−α²r²} dr] . (1.73)
The second term in the above equation can be integrated by parts to give an error function,

erfc(x) = 1 − (2/π^{1/2}) ∫₀^x e^{−t²} dt , (1.74)
Fig. 1.5. Splitting the sum for point charges into two rapidly convergent series for Gaussian-
shaped charges
plus a term which exactly cancels the third term. This gives

V_r(r_i) = Σ′_n Σ_{j=1}^N q_j erfc(α|r_ij + n|)/|r_ij + n| . (1.75)
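A sketch of the real-space sum (1.75) under simplifying assumptions (nearest images only, so r_c < L/2 is required; each pair counted once; the Coulomb prefactor and the reciprocal-space and self-energy parts are omitted):

```python
import numpy as np
from scipy.special import erfc

def ewald_real_space(pos, q, L, alpha, rc):
    """Real-space part of the Ewald energy, cf. (1.75), with n = 0 only
    after minimum imaging."""
    N = len(pos)
    U = 0.0
    for i in range(N - 1):
        d = pos[i + 1:] - pos[i]
        d -= L * np.round(d / L)                  # minimum image
        r = np.linalg.norm(d, axis=1)
        m = r < rc
        U += q[i] * np.sum(q[i + 1:][m] * erfc(alpha * r[m]) / r[m])
    return U
```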
Now, for the reciprocal-space sum, consider the charge density of the whole lattice at some arbitrary position r,

ρ(r) = Σ_j q_j δ(r − r_j) . (1.76)

Since the lattice is periodic, we can express this equivalently as a Fourier sum,

ρ(r) = (1/L³) Σ_k f(k) e^{−ik·r} . (1.77)

The smeared charge density ρ′(r) is the convolution of ρ(r) with σ(r), which can be expressed in Fourier space as

ρ′(r) = (1/L³) Σ′_k f(k) φ(k, α) e^{−ik·r} , (1.81)
where φ(k, α) is the Fourier transform of the charge-smearing function σ(r), i.e.
φ(k, α) = e^{−|k|²/(4α²)} . (1.82)
The potential due to the smeared charges in k-space at the reference position ri is
V_k(r_i) = ∫ ρ′(r_i + r)/r d³r = (1/L³) Σ′_k f(k) φ(k, α) e^{−ik·r_i} ∫ e^{−ik·r}/r d³r . (1.83)
The integral on the right of the above expression is 4π/k². Combining this with the earlier results (1.79) and (1.82), we get

V_k(r_i) = (4π/L³) Σ′_k Σ_j q_j e^{ik·(r_j − r_i)} e^{−|k|²/(4α²)}/|k|² . (1.84)
f_i = −∇_{r_i} U
    = (q_i/(4πε₀)) Σ′_n Σ_{j=1, j≠i}^N q_j [erfc(α|r_ij + n|)/|r_ij + n| + (2α/√π) e^{−α²|r_ij+n|²}] (r_ij + n)/|r_ij + n|²   (real-space term)
    + (2q_i/(ε₀V)) Σ_{k>0} (k/k²) e^{−k²/(4α²)} [sin(k·r_i) Σ_{j=1}^N q_j cos(k·r_j) − cos(k·r_i) Σ_{j=1}^N q_j sin(k·r_j)]   (reciprocal-space term)
    − (q_i/(6ε₀V)) Σ_{j=1}^N q_j r_j .   (surface dipole term) (1.87)
whose derivative is absent from the expression (1.87) for the forces. This term corrects for interactions between charges on the same molecule, which are implicitly included in the reciprocal-space sum but are not required in the rigid-molecule model. Although the site forces f_i do include unwanted terms, these sum to zero in the evaluation of the molecular centre-of-mass forces and torques (by the conservation laws for linear and angular momentum).
Both the real- and reciprocal-space series (the sums over n and k) converge fairly rapidly, so that only a few terms need to be evaluated. One defines cut-off distances r_c and k_c so that only terms with |r_ij + n| < r_c and |k| < k_c are included. The parameter α determines how rapidly the terms decrease, and hence the values of r_c and k_c needed to achieve a given accuracy.
For fixed α and accuracy, the number of terms in the real-space sum is proportional to the total number of sites N, but the cost of the reciprocal-space sum increases as N². An overall scaling of N^{3/2} may be achieved if α is allowed to vary with N. This is discussed in detail in an excellent article by D. Fincham [115]. The optimal value of α is

α = √π (t_R N/(t_F V²))^{1/6} , (1.89)
where tR and tF are the execution times needed to evaluate a single term in the real-
and reciprocal-space sums respectively. If we require that the sums converge to an
accuracy of ε = exp(−p), the cutoffs are then given by

r_c = √p/α , (1.90)
k_c = 2α√p . (1.91)
A representative value of t_R/t_F has been established as 5.5. Though this will vary on different processors and for different potentials, its value is not critical, since it enters the equations only as a sixth root.
It must be emphasized that r_c is used as a cutoff for the short-ranged potentials as well as for the electrostatic part. The value chosen above does not take the nature of the non-electrostatic part of the potential into account.
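The parameter choice (1.89)-(1.91), wrapped in a small helper (using the representative value t_R/t_F = 5.5 quoted above):

```python
import math

def ewald_parameters(N, V, p, tR_over_tF=5.5):
    """alpha, r_c and k_c from (1.89)-(1.91) for target accuracy eps = exp(-p)."""
    alpha = math.sqrt(math.pi) * (tR_over_tF * N / V**2) ** (1.0 / 6.0)
    rc = math.sqrt(p) / alpha
    kc = 2.0 * alpha * math.sqrt(p)
    return alpha, rc, kc

# e.g. 1000 sites, box volume 10^3, accuracy 10^-5 (p = -ln(1e-5) ~ 11.5):
# alpha, rc, kc = ewald_parameters(1000, 1.0e3, p=-math.log(1.0e-5))
```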
In a periodic system the electrostatic energy is finite only if the total electric charge
of the MD cell is zero. The reciprocal space sum for k = 0 takes the form
" N "2
1 −k2 /(4α2 ) "" ""
e " qi " , (1.92)
k2 "
i=1
"
which is zero in the case of electro-neutrality but infinite otherwise. Its omission is physically equivalent to adding a uniform jelly of charge which exactly neutralizes the unbalanced point charges. But though the form of the reciprocal-space sum is unaffected by the uniform charge jelly, the real-space sum is not. The real-space part of the interaction of the jelly with each point charge, as well as the self-energy of the jelly itself, must be included, giving

− (1/(8ε₀Vα²)) |Σ_{i=1}^N q_i|² . (1.93)
A further term accounts for the boundary conditions applied to the periodic system. It was suggested by De Leeuw, Perram and Smith [116, 117, 118] in order to accurately model dipolar systems, and is necessary in any calculation of a dielectric constant:

+ (1/(6ε₀V)) |Σ_{i=1}^N q_i r_i|² . (1.94)
Consider a near-spherical cluster of MD cells. The infinite-system result for any property is the limit of its cluster value as the size of the cluster tends to infinity. However, this value is non-unique and depends on the dielectric constant ε_s of the physical medium surrounding the cluster. If this medium is conductive (ε_s = ∞), the dipole moment of the cluster is neutralized by image charges, whereas in a vacuum (ε_s = 1) it remains. It is easy to show that in that case the dipole moment per unit volume (or per MD cell) does not decrease with the size of the cluster. The term (1.94) is then just the dipole energy, and it ought to be used in any calculation of the dielectric constant of a dipolar molecular system.
There is a large number of N -body problems for which periodic boundaries are
completely inappropriate, for example: galaxy dynamics, electron-beam transport,
large proteins [119], and any number of problems with complex geometries. Two
new approaches were put forward in the mid-1980’s, the first from Appel [120]
and Barnes & Hut [121], who proposed O(N log N )-schemes based on hierarchical
grouping of distant particles; the second from Greengard & Rohklin [122] with an
O(N ) (better than O(N log N )) solution with rounding-error accuracy. These two
methods are known today as the hierarchical tree algorithm and the Fast Multipole
Method (FMM) respectively – have revolutionized N -body simulation in a much
1 Introduction to Molecular Dynamics 37
broader sense than the specialized periodic methods discussed earlier. They offer a
generic means of accelerating the computation of many-particle systems governed
by central, long-range potentials.
References
1. Y. Duan, L. Wang, P. Kollman, P. Natl. Acad. Sci. USA 95, 9897 (1998) 3
2. Q. Zhong, P. Moore, D. Newns, M. Klein, FEBS Lett. 427, 267 (1998) 3
3. Q. Zhong, Q. Jiang, P. Moore, D. Newns, M. Klein, Biophys. J. 74, 3 (1998) 3
4. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985) 3, 25
5. G. Galli, M. Parrinello, in Proceedings of the NATO Advanced Study Institute on Com-
puter Simulation in Material Schience: Interatomic Potentiols, Simulation Techniques
and Applications, Aussois, France, 25 March - 5 April 1991, Vol. 3, ed. by M. Meyer,
V. Pontikis (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991), Vol. 3,
pp. 283–304 3
6. D. Heermann, Computer Simulation Methods (Springer, Berlin Heidelberg New York,
1986) 5
7. H. Berendsen, J. Postma, W. van Gunsteren, A. DiNola, J. Haak, J. Chem. Phys. 81,
3684 (1984) 7
8. H. Andersen, J. Chem. Phys. 72, 2384 (1980) 7
9. W. Hoover, Phys. Rev. A 31, 1695 (1985) 7
10. A. Voter, F. Montalenti, T. Germann, Annu. Rev. Mater. Res. 32, 321 (2002) 8
11. J. Lennard-Jones, P. Roy. Soc. Lond. 43, 461 (1931) 8
12. P. Morse, Phys. Rev. 34, 57 (1929) 9, 14
13. A. Rahman, Phys. Rev. 136, A405 (1964) 9
14. L. Verlet, Phys. Rev. 159, 98 (1967) 9, 14
15. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986) 13
16. J. Tersoff, Phys. Rev. B 37, 6991 (1988) 13
17. J. Tersoff, Phys. Rev. Lett. 61, 2879 (1988) 13
18. J. Tersoff, Phys. Rev. B 39, 5566 (1989) 13
19. W. Jorgensen, J. Madura, C. Swenson, J. Am. Chem. Soc. 106, 6638 (1984) 13
20. N. Allinger, K. Chen, J. Lii, J. Comput. Chem. 17, 642 (1996) 13
21. W. Jorgensen, D. Maxwell, J. Tiradorives, J. Am. Chem. Soc. 118, 11225 (1996) 13
22. W. Cornell, P. Cieplak, C. Bayly, I. Gould, K. Merz, D. Ferguson, D. Spellmeyer, T. Fox,
J. Caldwell, P. Kollman, J. Am. Chem. Soc. 118, 2309 (1996) 13
23. T. Halgren, J. Comput. Chem. 17, 490 (1996) 13
24. S. Nath, F. Escobedo, J. de Pablo, J. Chem. Phys. 108, 9905 (1998) 13
25. M. Martin, J. Siepmann, J. Phys. Chem. B 102, 2569 (1998) 13
26. H. Sun, J. Phys. Chem. B 102, 7338 (1998) 13
27. D. Brenner, Mat. Res. Soc. Symp. Proc. 141, 59 (1989) 13
28. M. Ramana Murty, H. Atwater, Phys. Rev. B 51, 4889 (1995) 13
29. A. Dyson, P. Smith, Surf. Sci. 355, 140 (1996) 13
30. D. Brenner, Phys. Rev. B 42, 9458 (1990) 13
31. D. Brenner, Phys. Rev. B 46, 1948 (1992) 13
32. D. Brenner, J. Harrison, C. White, R. Colton, Thin Solid Films 206, 220 (1991) 13
33. D. Brenner, K. Tupper, S. Sinnott, R. Colton, J. Harrison, Abstr. Pap. Am. Chem. S.
207, 166 (1994) 13
38 R. Schneider et al.
34. J. Harrison, S. Stuart, D. Robertson, C. White, J. Phys. Chem. B 101, 9682 (1997) 13
35. S. Sinnott, R. Colton, C. White, O. Shenderova, D. Brenner, J. Harrison, J. Vac. Sci.
Technol. A 15, 936 (1997) 13
36. J. Harrison, C. White, R. Colton, D. Brenner, Phys. Rev. B 46, 9700 (1992) 13
37. J. Harrison, R. Colton, C. White, D. Brenner, Wear 168, 127 (1993) 13
38. J. Harrison, C. White, R. Colton, D. Brenner, J. Phys. Chem. 97, 6573 (1993) 13
39. J. Harrison, D. Brenner, J. Am. Chem. Soc. 116, 10399 (1994) 13
40. J. Harrison, C. White, R. Colton, D. Brenner, Thin Solid Films 260, 205 (1995) 13
41. M. Perry, J. Harrison, Langmuir 12, 4552 (1996) 13
42. D. Allara, A. Parikh, E. Judge, J. Chem. Phys. 100, 1761 (1994) 13
43. R. Smith, K. Beardmore, Thin Solid Films 272, 255 (1996) 13
44. M. Nyden, T. Coley, S. Mumby, Polym. Eng. Sci 37, 1496 (1997) 13, 14
45. J. Che, T. Cagin, W. Goddard, Theor. Chem. Acc. 102, 346 (1999) 13, 14
46. K. Nordlund, J. Keinonen, T. Mattila, Phys. Rev. Lett. 77, 699 (1996) 13
47. S. Stuart, B. Berne, J. Phys. Chem. 100, 11934 (1996) 14
48. R. Hockney, J. Eastwood, Computer Simulation Using Particles (McGraw-Hill,
New-York, USA, 1981) 14
49. W. Swope, H. Andersen, P. Berens, K. Wilson, J. Chem. Phys. 76, 637 (1982) 14
50. D. Beeman, J. Comput. Phys. 20, 130 (1976) 14
51. G. Martyna, M. Tuckerman, J. Chem. Phys. 102, 8071 (1995) 14
52. M. Tuckerman, B. Berne, G. Martyna, J. Chem. Phys. 97, 1990 (1992) 14
53. C. Gear, Numerical Initial Value Problems in Ordinary Differential Equations (Chap. 9)
(Prentice Hall, Englewood Cliffs, NJ, USA, 1971) 16, 18
54. H. Yoshida, Phys. Lett. A 150, 262 (1990) 18
55. D. Frenkel, B. Smit, Understanding Molecular Simulation: From Algorithms to Appli-
cations (Academic Press, San Diego, 1996) 19
56. M. Allen, D. Tildesley, Computer simulation of liquids (Clarendon Press, Oxford, 1987)
19, 20, 21, 23, 30
57. B. Todd, D. Evans, P. Daivis, Phys. Rev. E 52, 1627 (1995) 19
58. J. Irving, J. Kirkwood, J. Chem. Phys. 18, 817 (1950) 19
59. D. McQuarrie, Statistical Mechanics (Harper and Row, New York, 1976) 20
60. D. Frenkel, B. Smit, Understanding Molecular Simulation: From Algorithms to Appli-
cations (Academic Press, San Diego, 2002) 20
61. M. Warrier, R. Schneider, E. Salonen, K. Nordlund, Contrib. Plasma Phys. 44, 307
(2004) 21, 23
62. J. Klafter, M. Shlesinger, G. Zumofen, Phys. Today 2, 33 (1996) 21
63. E. Weinan, B. Engquist, Not. Am. Math. Soc 50, 1062 (2003) 22
64. B. Berne, G. Ciccotti, C. D.F. (eds.), Classical and Quantum Dynamics in Condensed
Phase Simulations (World Scientific Publishing Company, Singapore, Singapore, 1998)
23
65. K. Binder, G. Ciccotti (eds.), Monte Carlo and Molecular Dynamics of Condensed
Matter Systems (Editrice Compositori, Bologna, Italy, 1996) 23
66. G. Ciccotti, D. Frenkel, I. McDonald, Simulation of Liquids and Solids (North Holland,
Amsterdam, 1987) 23
67. R. Esser, P. Grassberger, J. Grotendorst, M. Lewerenz (eds.), Molecular Dynamics
on Parallel Computers (World Scientific Publishing Company, Singapore, Singapore,
1999) 23
68. D. Frenkel, B. Smit, Understanding Molecular Simulations: From Algorithms to Appli-
cations (Academic Press, San Diego, 2005) 23, 28
1 Introduction to Molecular Dynamics 39
Classical molecular dynamics (MD) is a well established and powerful tool in vari-
ous fields of science, e.g. chemistry, plasma physics, cluster physics and condensed
matter physics. Objects of investigation are few-body systems and many-body sys-
tems as well. The broadness and level of sophistication of this technique is docu-
mented in many monographs and reviews, see for example [1, 2]. Here we discuss
the extension of MD to quantum systems (QMD). There have been many attempts
in this direction which differ from each other, depending on the type of system un-
der consideration. One variety of QMD has been developed for condensed matter
systems. This approach is reviewed e.g. in [3] and will not be discussed here. In
this contribution we deal with unbound electrons as they occur in gases, fluids or
plasmas. Here, a quite successful strategy is to replace classical point particles by
wave packets [3, 4, 5, 6]. This method, however, struggles with problems related
to the dispersion of such a wave packet and difficulties to properly describe strong
electron-ion interaction and bound-state formation. We try to avoid these restric-
tions by an alternative approach: We start the discussion of quantum dynamics by a
general consideration of quantum distribution functions.
V. S. Filinov et al.: Wigner Function Quantum Molecular Dynamics, Lect. Notes Phys. 739, 41–60 (2008)
DOI 10.1007/978-3-540-74686-7 2 c Springer-Verlag Berlin Heidelberg 2008
42 V. S. Filinov et al.
N N
= p2i
H + V# (qi ) + V (qi , qj ) , (2.3)
j=1
2m i=1 i<j
where V# (qi ) and V (qi , qj ) denote an external and an interaction potential, respec-
tively. The equation of motion for fW has the form [8, 7] (see also Sect. 2.3)
∞
∂fW p
+ · ∇ q fW = ds fW (p − s, q, t) ω
# (s, q, t) , (2.4)
∂t m
−∞
takes into account the non-local contribution of the potential energy in the quantum
case. Equivalently, expanding the integral around q ′ = 0, (2.4) can be rewritten by
an infinite sum of local potential terms
∞
∂fW p ∂fW (/(2i))2n ∂ 2n+1 V ∂ 2n+1 fW
+ = , , (2.6)
∂t m ∂q n=0
(2n + 1)! ∂q 2n+1 ∂p2n+1
where (∂ 2n+1 V /∂q 2n+1 , ∂ 2n+1 fW /∂p2n+1 ) denotes the scalar product of two vec-
tors which for an N -particle system contain 3N components.
If the potential does not contain terms higher than second order in q, i.e.
∂ n V /∂q n |n≥3 = 0, (2.6) reduces to the classical Liouville equation for the dis-
tribution function f :
∂f p ∂f ∂V ∂f
+ = . (2.7)
∂t m ∂q ∂q ∂p
The Wigner function must satisfy a number of conditions [9], therefore, the initial
function fW (q, p, 0) cannot be chosen arbitrarily. Even if fW (q, p, t) satisfies the
classical equation (2.7) it nevertheless describes the evolution of a quantum distri-
bution because a properly chosen initial function fW (q, p, 0) contains, in general, all
2 Wigner Function Quantum Molecular Dynamics 43
In order to obtain an effective pair potential which is finite at zero interparticle dis-
tance, we consider (2.4) for two particles. Assuming further thermodynamic equi-
librium with a given temperature kB T = 1/β, spatial homogeneity and neglect-
ing three-particle correlations, one can solve for the two-particle Wigner function
eq eq
fW,12 = F12 (r1 , p1 , r2 , p2 , β) ≈ F12 (r1 − r2 , p1 , p2 , β).
eq
This is now rewritten as in the canonical case [7], F12 (r1 − r2 , p1 , p2 , β) ≡
eq eq qp
F1 (p1 , β)F2 (p2 , β) exp(−βV12 ), which defines the desired quantum pair poten-
qp
tial V12 .
qp
The first solution for V12 was found by Kelbg in the limit of weak coupling
[16, 17, 18]. It has the form of (2.10) with γij → 1, for details and references see
[10, 19]. The Kelbg potential, or slightly modified versions, is widely used in nu-
merical simulations of dense plasmas [4, 5, 20, 21, 22]. It is finite at zero distance
44 V. S. Filinov et al.
which correctly captures basic quantum diffraction effects preventing any diver-
gence. However, the absolute value at r = 0 is incorrect which has lead to the
derivation of further improved potentials, see [10, 19, 23] and references therein.
Here we use the improved Kelbg potential (IKP),
$ %
qi qj 2 2 √ rij rij
Φ (rij , β) = 1 − e−rij /λij + π 1 − erf γij , (2.10)
rij λij γij λij
where rij = |r ij |, xij = rij /λij , λ2ij = 2 β/(2μij ) and μ−1
ij = mi
−1
+ m−1 j ,
which contains additional free parameters γij that can be obtained from a fit to the
exact solution of the two-particle problem [19].
40
20
H/NP, [eV]
–20
rs = 4, MD
rs = 6, MD
–40 rs = 4, PIMC
rs = 6, PIMC
Fig. 2.1. Internal energy per hydrogen atom at rs = 4 and rs = 6 versus temperature, MD
results are compared to restricted PIMC simulations [19, 24]
molecule formation (see below), there also appear clusters of several molecules
which is unphysical under the present conditions and is caused by the approximate
two-particle treatment of quantum effects in the IKP. This turns out to be the reason
for the too small energy at low temperatures (see Fig. 2.1).
Let us now turn to a more detailed analysis of the spatial configuration of the
particles. In Fig. 2.2 the pair distribution functions of all particle species with the
same charge are plotted at two densities. Consider first the case of T = 125 000 K
(upper panels). For both densities all functions agree qualitatively showing a de-
pletion at zero distance due to Coulomb repulsion. Besides, there are differences
which arise from the spin properties. Electrons with the same spin show a Coulomb
hole around r = 0 which is broader than the one of the protons due to the Pauli
principle with additional repulsion of electrons with the same spin projection. This
trend is reversed at low temperatures (see middle panel), which is due to the for-
mation of hydrogen atoms and molecules. In this case, electrons, i.e., their classical
trajectories, are spread out around the protons giving rise to an increased probability
of close encounters of two electrons belonging to different atoms compared to two
protons.
Now, let us compare electrons with parallel and electrons with anti-parallel
spins. In all cases, we observe a significantly increased probability to find two elec-
trons with opposite spin at distances below one Bohr radius, which is due to the
missing Pauli repulsion. This trend increases when the temperature is lowered be-
cause of increasing quantum effects. Before analyzing the lowest temperature in
Fig. 2.2, let us consider the electron-proton (e-p) distributions. Multiplying these
functions by r2 gives essentially the radial probability density Wep (r) = r2 gep (r),
46 V. S. Filinov et al.
1 1
1 1
e↑– e↑
80
e↑– e↓
100
p-p
40 50
0 2 4 6 8 0 2 4 6 8
r/aB r/aB
Fig. 2.2. Electron-electron (e-e) and proton-proton (p-p) pair distribution functions for a cor-
related hydrogen plasma with rs = 4 (left row) and rs = 6 (right row) for T = 125 000 K,
61 250 K and 31 250 K (from top to bottom) [19]
which is plotted in Fig. 2.3. At low temperatures this function converges to the
ground state probability density of the hydrogen atom Wep (r) = r2 |ψ|21s (r) influ-
enced by the surrounding plasma. Here, lowering of the temperature leads towards
the formation of a shoulder around 1.4aB for rs = 4 and 1.2aB for rs = 6 which
is due to the formation of hydrogen atoms; this is confirmed by the corresponding
quasi-bound electron trajectories. At this temperature, the observed most probable
electron distance is slightly larger than one aB as in the atom hydrogen ground state.
Of course, classical MD cannot yield quantization of the bound electron motion, but
it correctly reproduces (via averaging over the trajectories) the statistical properties
of the atoms, such as the probability density averaged over the energy spectrum.
At 62 500 K and rs = 6 (right middle part of Fig. 2.2) the simulations show a
first weak signature of molecule formation – see the maximum of the p-p distri-
bution function around r = 2aB and the maximum of the distribution function of
electrons with anti-parallel spins around r = 1.5aB . Upon further lowering of the
temperature by a factor of two (lower panel of Fig. 2.2) the p-p functions exhibit a
clear peak very close to r = 1.4aB , the theoretical p-p separation in H2 . At the same
time, also the e-e functions have a clear peak around r = 0.5aB , the two electrons
are concentrated between the protons. In contrast, in the case of parallel spins, no
molecules are formed, the most probable electron distance is around r = 1.2aB .
2 Wigner Function Quantum Molecular Dynamics 47
rs = 4
20
10 T = 166667 K
T = 125000 K
T = 62500 K
gep(r) r2
T = 50000 K
rs = 6
20
10
0
0 1 2 3 4 5
r/aB
Fig. 2.3. Electron-proton (e-p) pair distribution functions multiplied by r 2 as function of e-p
distance at rs = 4 (top) and rs = 6 (bottom) at four temperatures [19]
where r αi (t) denotes the trajectory of particle i obtained in the simulation. We now
define the three partial density-density time correlation functions (DDCF) between
sorts α and η as
1
Aαη (k, t) = ρα (k, t)ρη (−k, 0) , (2.13)
Nα + Nη
where, due to isotropy, k = k. Here ρα (k, t)ρη (−k, 0) denotes averaging along
the trajectories by shifting the time interval and keeping the difference equal to t.
48 V. S. Filinov et al.
Note also, that Aαη (k, t) = Aηα (k, t) for all pairs α and η. In addition to the
spin-resolved electron functions we can also consider the spin averaged correlation
function A(k, t) = A↑↑ (k, t) + A↓↑ (k, t).
We have performed a series of simulation runs of equilibrium fluctuations in
hydrogen plasmas with coupling parameters Γ and electron degeneracy√ parameters
χe = ρΛ3e with the electron de Broglie wavelength Λe = / 2πme kB T ranging
from zero (classical system) to one (quantum or degenerate system). The electron
DDCF for Γ = 1 and χe = 1 are plotted in Fig. 2.4 for four values of the di-
mensionless wavenumber q = kr̄. The correlation functions (↑↑ and ↓↑) have two
characteristic features – a highly damped, high-frequency part and a weakly damped
low-frequency tail. The latter originates from slow ionic motion whereas the high-
frequency part is related to oscillations with frequencies close to the electron plasma
frequency ωpl . On the other hand, the time scale of the ion motion is determined
i
by the ion plasma frequency ωpl = 4πρi Zi2 e2 /mi , the ratio of the two time
scales is mi /me ≈ 43. The slow proton oscillations are clearly seen in the proton
DDCF, shown in Fig. 2.5. To resolve the proton oscillations the whole simulation
(including the electron dynamics) has to extend over several proton plasma periods
i
Tp = 2π/ωpl thereby resolving the fast electronic motions as well, which sets the
numerical limitation of the calculation.
The temporal Fourier transform of the DDCF yields another very important
quantity – the dynamic structure factor, Sα,η (ω, q), which allows one to analyze,
e.g., the dispersion of the coupled electron and proton oscillations. Fig. 2.6 shows
q = 0.39
200 ↑↑ q = 0.55
q = 1.22
q = 1.73
40
<ρe(t)ρe(0)>
0 ↓↑
–40
100
0
0 40 80
t⋅ωpl
Fig. 2.4. Electron DDCF (2.13) multiplied by (Ne↑ + Ne↓ ) for Γ = 1 and χe = 1 for four
wave vectors. Upper (middle) panel: Correlation functions for parallel (antiparallel) spins.
Bottom: Spin-averaged function [25]
2 Wigner Function Quantum Molecular Dynamics 49
q = 0.39
q = 0.55
400 q = 0.95
q = 1.22
q = 1.73
<ρi(t)ρi(0)>
200
0 2 4 6
t⋅ωipl
Fig. 2.5. Proton DDCF (2.13) for Γ = 1 and χe = 1 for five wave vectors (in units of
1/r̄) [25]
3
Γ = 0.1 ρΛ3 = 0.5
Γ = 1.0 ρΛ3 = 0.5
Γ = 1.0 ρΛ3 = 1.0
Γ = 0.1 ρΛ3 = 0.1
2
ωpl
i
0
0 0,5 1 1,5
q
Fig. 2.6. Ion-acoustic wave dispersion in a dense hydrogen plasma. Lines correspond to
weighted linear fits to the MD data (symbols). The scatter of the data is due to the limited
particle number N and simulation time and can be systematically reduced. Also, smaller
q-values require larger N [25]
50 V. S. Filinov et al.
dispersion results for the collective proton oscillations, for the electron modes see
[22, 24], which follow from the peak positions of Sii (ω, q). Fig. 2.6 shows the
peak frequency versus wave number, i.e. the dispersion of longitudinal ion-acoustic
waves, ω(q) = vMD q, where vMD denotes our MD result for the phase veloc-
ity. This can be compared to the familiar analytical expression for an ideal two-
temperature (Te ≫ Ti ) plasma vs = Zi kB Te /mi , where vs is the ion sound
velocity. We observe deviations of about 10% for weak degeneracy χe < 0.5, and
about 10% for large degeneracy χe ≥ 1, which are due to nonideality (correlations)
and quantum effects, directly included in our simulations. For further details on this
method see [6, 24, 25].
Thus semiclassical MD is a powerful approach to correlated quantum plasmas.
Thermodynamic and dynamic properties are accurately computed if accurate quan-
tum pair potentials, such as the IKP, are used.
In the classical limit ( → 0), the r.h.s of (2.15) vanishes and we obtain the classical
Liouville equation
∂W p
+ · ∇q W + F (q) · ∇p W = 0 . (2.17)
∂t m
The solution of (2.17) is known and can be expressed by the Green function [9]
2 Wigner Function Quantum Molecular Dynamics 51
where p(τ ) and q(τ ) are the phase space trajectories of all particles, which are the
solutions of Hamilton’s equations with the initial conditions at τ = t0 = 0,
dq̄ p̄(τ )
= ; q̄(0) = q0 ,
dτ m
dp̄
= F (q̄(τ )); p̄(0) = p0 . (2.19)
dτ
Using the Green function, the time-dependent solution of the classical Liouville
equation takes the form
W (p, q, t) = dp0 dq0 G(p, q, t; p0 , q0 , 0) W0 (p0 , q0 ) . (2.20)
With this result, it is now possible to construct a solution also for the quantum
case. To this end we note that it is straightforward to convert (2.15) into an integral
equation
W (p, q, t) = dp0 dq0 G(p, q, t; p0 , q0 , 0) W0 (p0 , q0 )
t
+ dt1 dp1 dq1 G(p, q, t; p1 , q1 , t1 )
0
∞
× ds1 ω(s1 , q1 , t1 ) W (p1 − s1 , q1 , t1 ) , (2.21)
−∞
which is exact and can be solved efficiently by iteration [10, 11]. The idea is to
replace the unknown function W under the integral in (2.21) by an approximation.
The first approximation is obtained by solving (2.21) to lowest order, i.e. by neglect-
ing the integral term completely. This gives the first order result for W which can
again be substituted for W in the integral in (2.21) and so on. This way we can sys-
tematically derive improved approximations for W . The procedure leads to a series
of terms of the following general form,
t
(0) (1)
W (p, q, t) = W (p, q, t) + W (p, q, t) + dt1 d1 G(p, q, t; 1, t1 )
0
t1
× dt2 d2 G(p1 − s1 , q1 , t1 ; 2, t2 )
0
∞
× ds2 ω(s2 , q2 , t2 ) W (p2 − s2 , q2 , t2 ) , (2.22)
−∞
52 V. S. Filinov et al.
The terms W (0) and W (1) are the first of an infinite series. To shorten the notation,
all higher order terms are again summed up giving rise to the last term in (2.22).
Below we will give also the third term, W (2) , but first we discuss the physical inter-
pretation of each contribution.
W (0) (p, q, t), as it follows from the Green function G(p, q, t; p0 , q0 , 0), de-
scribes the propagation of the Wigner function along the classical characteristics,
i.e., the solutions of Hamilton’s equations (2.19) in the time interval [0, t]. It is worth
mentioning, that this first term describes both classical and quantum effects, due to
the fact that the initial Wigner function W0 (p0 , q0 ), in general, contains all powers
of Planck’s constant contained in the initial state wave functions. These are quan-
tum diffraction and spin effects, depending on the quality of the initial function.
The second and third terms on the r.h.s. of (2.22) describe additional quan-
tum corrections to the time evolution of W (p, q, t) arising from non-classical time
propagation, in particular, the Heisenberg uncertainty principle. Let us consider
the term W (1) (p, q, t) in more detail. It was first proposed in [11]. Later on it
was demonstrated that the multiple integral (2.23) can be calculated stochasti-
cally by Monte Carlo techniques [12, 13, 14]. For this we need to generate an
ensemble of trajectories in phase space. To each trajectory we ascribe a specific
weight, which gives its contribution to (2.23). For example, let us consider a tra-
jectory which starts at point {p0 , q0 , τ = 0}. This trajectory acquires a weight
equal to the value W0 (p0 , q0 ). Up to the time τ = t1 the trajectory is defined
by the Green function G(p1 − s1 , q1 , t1 ; p0 , q0 , 0). At τ = t1 , as it follows from
(2.23), the weight of this trajectory must be multiplied by the factor ω(s1 , q1 , t1 ),
and simultaneously a perturbation in momentum takes place: (p1 − s1 ) → p1 .
As a result the trajectory becomes discontinuous in momentum space, but con-
tinuous in the coordinate space. Obviously this is a manifestation of the Heisen-
berg uncertainty of coordinates and momenta. Now the trajectory consists of two
parts – two classical trajectories which are the solutions of (2.19), which are sep-
arated, at τ = t1 by a momentum jump of magnitude s1 . What about the value
s1 of the jump and the time moment t1 ? Both appear under integrals with a cer-
tain probability. To sample this probability adequately, a statistical ensemble of
trajectories should be generated, further the point in time t1 must be chosen ran-
domly in the interval [0, t], and the momentum jump s1 randomly in the interval
[−∞, +∞]. Finally, also different starting & points {p0 , q0 } of trajectories at τ = 0
must be considered due to the integration dp0 dq0 . Considering a sufficiently large
2 Wigner Function Quantum Molecular Dynamics 53
G(p2 – s2, q2, t2; p0, q0, 0) G(p1 – s1, q1, t1; p0, q0, 0) G(p, q, t; p0, q0, 0)
P
W(0)
W(p, q ,t)
S1
W(1)
W(p0, q0)
S2
W(2)
S1
Fig. 2.7. Illustration of the iteration series. Three types of trajectories are shown: Without
(top curve), with one (middle) and with two (lower) momentum jumps
54 V. S. Filinov et al.
As was noted in Sect. 2.1 the Wigner function allows us to compute the quantum-
mechanical expectation value of an arbitrary one-particle operator A. Using the idea
of iteration series (2.22), we obtain an iteration series also for the expectation value
A(t) = dpdq A(p, q)W (p, q, t) = A (0) (t) + A
(1) (t) + . . . , (2.25)
where different terms correspond to different terms in the series for W . The series
(2.25) maybe computed much more efficiently than the one for W since the result
does not depend on coordinates and momenta anymore.
Certainly, in the iteration series it is possible to take into account only a finite
number of terms and contributions of a limited number of trajectories. Interestingly,
it is not necessary to compute the individual terms iteratively. Instead, all relevant
terms can be calculated simultaneously using the basic concepts of MC methods
[26]. An important task of the MC procedure will be to generate stochastically the
trajectories which give the dominant contribution to the result, for details see [10].
1 iHt β
∗ −iHt
CF A (t) = Tr F e β A e , (2.26)
Z
and the function ω (s, q, t) is defined in the same way as in the microcanonical
ensemble, see (2.16).
Using (2.28) at t = 0, we find that the initial value of the Wigner function is given
by the integral
1
W0 (1; 2; 0; β) = dξ1 dξ2 eip1 ξ1 eip2 ξ2
Z(2π)2N d
' (' (
ξ1 "" −β H/2
"
" ξ2 ξ2 "" −β H/2
"
" ξ1
× q1 − "e " q2 + q2 − "e " q1 + (2.31)
2 2 2 2
with 1 = q1 , p1 and 2 = q2 , p2 .
56 V. S. Filinov et al.
Let us now exploit the group property of the density operator ρ and the high
temperature approximation for the matrix elements of q ′ |
ρ|q (see Chap. 13)
!
M
e−β H = e−β/M H
" " " " " "
" " " " " "
q ′ "e−β/(2M)H " q ′′ ≈ q ′ "e−β/(2M)K " q ′′ q ′ "e−β/(2M)U " q ′′ . (2.32)
Then we obtain
M
1 ′ ′ ′′ ′′ − M m=2 Km − m=1 Um
W0 (1; 2; 0; β) ≈ dq1 . . . dq M dq 1 . . . dqM e
Z(2π)2N d
' " " (' " (
ip1 ξ1 / ′ " −β K/(2M
)" ξ1 ξ1 "" −β K/(2M
) " ′′
× dξ1 e qM "e " q1 + q1 − "e " q1
2 2
' " " ( ' " (
ip2 ξ2 / ′′ " −β K/(2M
)" ξ2 ξ2 "" −β K/(2M
)" ′
× dξ2 e qM "e " q2 + q2 − "e " q1 ,
2 2
(2.33)
) ′ *
where Km = (π/λ2M ) (qm ′
− qm−1 )2 + (qm ′′
− qm−1′′
)2 and Um = (β/(2M ))
′ ′′
[U (qm ) + U (qm )]. Here we have assumed that M ≫ 1, and λ2M = 2π2 β/(mM )
denotes the thermal de Broglie wave length corresponding to the inverse temperature
β/(2M ). A direct calculation of the last two factors in (2.33) gives
where
′ ′′
′ )/λM )2 /(2π)
φ (p; qM , q1 ) = (2λ2M )N d/2 e−(pλM /+iπ(q −q (2.35)
The final result for the Wigner function at t = 0 can be written as
W (1; 2; 0; β) ≈ dq1′ . . . dqM
′
dq1′′ . . . dqM
′′
Ψ (1; 2; q1′ . . . qM
′
; q1′′ . . . qM
′′
; 0; β)
′
×φ(p2 ; qM , q1′′ ) φ(p1 ; qM
′′
, q1′ ) , (2.36)
where
1 −M +1 M
′
Ψ (p1 , q1 ; p2 , q2 ; q1′ . . . qM ; q1′′ . . . qM
′′
; β) = e m=1 Km − m=1 Um . (2.37)
Z
Here we have introduced the notation {q0′ ≡ q1 ; q0′′ ≡q2 } and {qM+1
′ ′′
≡ q2 ; qM+1 ≡
q1 }. Fig. 2.8 illustrates the simulation idea. Two closed loops with the set of points
2 Wigner Function Quantum Molecular Dynamics 57
t+ t+
qM'' e
q1
e q1'
q1''
q2 qM'
t–
t–
Fig. 2.8. Two closed loops illustrating the path integral representation of two electrons in the
density matrices in (2.33). Two special points, (p1 , q1 ) and (p2 , q2 ), are starting points for
two dynamical trajectories propagating forward and backward in time
show the path integral representation of the density matrices in (2.33). The left
chain of points, i.e. {q1 , q1′ , . . . , qM
′
, q2 , q1′′ , . . . , qM
′′
} characterizes the path of a sin-
gle quantum particle. The chain has two special points (p1 , q1 ) and (p2 , q2 ). As it
follows from (2.28) and (2.29) these points are the original points for the Wigner
function, the additional arguments arise from the path integral representation. As
we show in the next section, we can consider these points as starting points for two
dynamical trajectories propagating forward and backward in time, i.e. t → t+ and
t → t− . The Hamilton equations for the trajectories are defined in the next section.
The solution follows the scheme explained before. The only difference is that we
now have to propagate two trajectories instead of one,
dq̄1 p̄1 (τ )
= , q̄1 (0) = q10 ,
dτ 2m
dp̄1 1
= F [q̄1 (τ )] , p̄1 (0) = p01 ,
dτ 2
dq̄2 p̄2 (τ )
=− , q̄2 (0) = q20 ,
dτ 2m
dp̄2 1
= − F [q̄2 (τ )], p̄2 (0) = p02 . (2.38)
dτ 2
The first (second) trajectory propagates forward (backward). Let us substitute ex-
pressions for F [q̄1 (τ )], p̄1 (τ ), F [q̄2 (τ )] and p̄2 (τ ) from (2.38) into (2.29) and sub-
tract the second equation from the first. As a result, on the l.h.s. we obtain a full
differential of the Wigner function. After multiplication by the factor 1/2 and inte-
gration over time, the integral equation for the Wigner function takes the form
58 V. S. Filinov et al.
W (p1 , q1 ; p2 , q2 ; t; β) = dp01 dq10 dp02 dq20
×G(p1 , q1 , p2 , q2 , t; p01 , q10 , p02 , q20 , 0)W (p01 , q10 ; p02 , q20 ; 0; β)
t
+ dτ dp11 dq11 dp12 dq21 G(p1 , q1 , p2 , q2 , t; p11 , q11 , p12 , q21 , τ )
0
∞
× ds dη ϑ(s, q11 ; η, q21 ; τ ) W (p11 − s, q11 ; p12 − η, q21 ; τ ; β) , (2.39)
−∞
where ϑ(s, q11 ; η, q21 ; τ ) = [ω(s, q11 )δ(η) − ω(η, q21 )δ(s)]/2. The dynamical Green
function G is defined as G(p1 , q1 , p2 , q2 , t; p01 , q10 , p02 , q20 , 0)=δ[p1 − p¯1 (τ ; p01 , q10 , 0)]
δ[q1 − q¯1 (τ ; p01 , q10 , 0)]δ[p2 − p¯2 (τ ; p02 , q20 , 0)]δ[q2 − q¯2 (τ ; p02 , q20 , 0)]. Let us de-
note the first term on the r.h.s. of (2.39) as W (0) (p1 , q1 ; p2 , q2 ; t; β). This term
represents the Wigner function of the initial state propagating along classical tra-
jectories (characteristics – solutions of (2.38)). Using the approach applied for
the microcanonical ensemble, we obtain expressions for W (1) (p1 , q1 ; p2 , q2 ; t; β),
W (2) (p1 , q1 ; p2 , q2 ; t; β), . . . and represent W (p1 , q1 ; p2 , q2 ; t; β) as iteration se-
ries. In this case, we can calculate this also with an ensemble of trajectories using
the quantum dynamics MC approach described in [28]. As a result the expression
for the time correlation function (2.27) can be rewritten as
CF A (t) = dp1 dq1 dp2 dq2 F (p1 , q1 )A(p2 , q2 )W (p1 , q1 ; p2 , q2 ; t; β)
∞
2.5 Discussion
We have presented a general idea how to extend the powerful method of molecular
dynamics to quantum systems. First, we discussed semi-classical MD, i.e., classical
2 Wigner Function Quantum Molecular Dynamics 59
MD with accurate quantum pair potentials. This method is very efficient and allows
to compute thermodynamic properties of partially ionized plasmas for temperatures
above the molecule binding energy (i.e. as long as three and four particle correla-
tions can be neglected). Further, frequency dependent quantities, e.g., the plasmon
spectrum, are computed correctly for ω < ωpl . Further progress is possible if more
general quantum potentials are derived.
In the second part, we considered methods for a rigorous solution of the quantum
Wigner-Liouville equation for the N -particle Wigner function. Results were derived
for both, a pure quantum state and a mixed state (canonical ensemble). Although this
method is by now well formulated, it is still very costly in terms of CPU time, so
that practical applications are only starting to emerge. Yet, we expect that, due to its
first principle character, Wigner function QMD will become increasingly important
for a large variety of complex many-body problems.
This work is supported by the Deutsche Forschungsgmeinschaft through SFB
TR 24 and in part by Award No. Y2-P-11-02 of the U.S. Civilian Research and
Development Foundation for the Independent States of the Former Soviet Union
(CRDF) and of Ministry of Education and Science of Russian Federation, and
RF President Grant NS-3683.2006.2 for governmental support of leading scientific
schools.
References
1. M. Allen, D. Tildesley, Computer Simulations of Liquids (Clarendon Press, Oxford,
1987) 41
2. D. Frenkel, B. Smit, Understanding Molecular Simulations: From Algorithms to Appli-
cations (Academic Press, Fribourg, 2002) 41
3. H. Feldmeier, J. Schnack, Rev. Mod. Phys. 72, 655 (2000) 41
4. D. Klakow, C. Toepffer, P.G. Reinhard, Phys. Lett. A 192, 55 (1994) 41, 43
5. D. Klakow, C. Toepffer, P.G. Reinhard, J. Chem. Phys. 101(12), 10766 (1994) 41, 43
6. G. Zwicknagel, T. Pschiwul, J. Phys. A: Math. General 39, 4359 (2006) 41, 50
7. M. Bonitz, Quantum Kinetic Theory (B.G. Teubner, Stuttgart/Leipzig, 1998) 41, 42, 43
8. E. Wigner, Phys. Rev. 40, 749 (1932) 42
9. V. Tatarsky, Sov. Phys. Usp. 26(4), 311 (1983) 42, 50
10. M. Bonitz, D. Semkat (eds.), Introduction to Computational Methods in Many Body
Physics (Princeton: Rinton Press, 2006) 43, 44, 51, 54, 58
11. V. Filinov, Y. Medvedev, V. Kamskyi, Mol. Phys. 85(4), 711 (1995) 43, 51, 52
12. V. Filinov, Mol. Phys. 88(6), 1517 (1996) 43, 52, 55
13. V. Filinov, Y. Lozovik, A. Filinov, E. Zakharov, A. Oparin, Phys. Scripta 58, 297 (1998) 43, 52, 55
14. Y. Lozovik, A. Filinov, Sov. Phys. JETP - USSR 88, 1026 (1999) 43, 52
15. Y. Lozovik, A. Filinov, A. Arkhipov, Phys. Rev. E 67, 026707 (2003) 43
16. G. Kelbg, Ann. Physik 467(3–4), 219 (1963) 43
17. G. Kelbg, Ann. Physik 467(7–8), 354 (1964) 43
18. G. Kelbg, Ann. Physik 469(7–8), 394 (1964) 43
19. A. Filinov, V. Golubnychiy, M. Bonitz, W. Ebeling, J. Dufty, Phys. Rev. E 70, 046411
(2004); W. Ebeling, A. Filinov, M. Bonitz, V. Filinov, T. Pohl, J. Phys. A: Math. Gen. 39,
4309 (2006) 43, 44, 45, 46, 47
60 V. S. Filinov et al.
Detlev Reiter
This chapter presents the basic principles of stochastic algorithms, usually called
Monte Carlo methods. After some historical notes, the generation of random num-
bers is discussed. Then, as a first non-trivial example, the concept is applied to the
evaluation of integrals. More involved problems will be discussed in the two subse-
quent chapters of this part.
D. Reiter: The Monte Carlo Method, an Introduction, Lect. Notes Phys. 739, 63–78 (2008)
DOI 10.1007/978-3-540-74686-7 3 c Springer-Verlag Berlin Heidelberg 2008
64 D. Reiter
Monte Carlo concepts fall into the branch of experimental mathematics. In ordinary
mathematics conclusions are deduced from postulates (Deduction). In experimental
mathematics conclusions are inferred from observations (Induction). Monte Carlo
methods comprise that branch of experimental mathematics, which is concerned
with experiments on random events (mainly random numbers). Monte Carlo meth-
ods can be of probabilistic or deterministic type.
Usually the first reference to the Monte Carlo Method is the famous needle
experiment of Compte de Buffon (1733), a French biologist (1707–1788), Fig. 3.1.
Buffon pointed out that if a needle of length L is tossed on a plane with parallel
lines a distance D apart (D > L), it has probability p = 2L/(πD) to fall such that
is crosses one of the lines. Later, also Laplace suggested this procedure to determine
π by counting the number of crosses n in N repetitions of the experiment. Then
n 2L 2L n
= ⇒π≈ · . (3.1)
N πD D N
This historical use of Monte Carlo has all key features of the method:
3 The Monte Carlo Method, an Introduction 65
Fig. 3.1. Buffon’s needles: What is the probability p, that a needle (length L), which falls
randomly on a sheet, crosses one of the lines (distance D)? (Left: Copyright
c 1998–2003:
The Regents of the University of California)
– Convergence: About N = 100 000 trials are needed for only two digits after the
comma. Convergence is slow, but foolproof.
– Transparency: The method is intuitively understandable, even without any math-
ematical reasoning.
– Error estimates, optimization: Error estimates and optimal choice of L, D are
provided by theory of probability. (Binomial distribution, statistical variance as
2nd central moment etc.).
Modern use Monte Carlo techniques, in the age of digital computers, was initiated
by the pioneering work of John von Neumann and Stanislaw Ulam in thermonuclear
weapon development. They are also credited for having coined the phrase Monte
Carlo.
Many monographs on Monte Carlo Methods start with an introduction to mea-
sure theory and in particular to elementary probability theory. Although we will
introduce and use the proper mathematical vocabulary too, we will, with respect to
purely mathematical aspects, refer to those and largely rely upon the intuitive mean-
ing. We refer in particular to the classic monograph by Hammersley and Handscomb
[2]. This book provides a short and very readable overview of Monte Carlo. Remark-
ably, the theoretical foundations today remain rather similar to those from 1964,
when this book was first published. Just the applications are far more sophisticated
today. The illustrative examples on Monte Carlo integration and some of the ad-
vanced techniques in this present introduction will be based upon this text1 .
1
A pdf-file of that book, which is out of print since long, can be downloaded from the
internet, e.g., http://www.eirene.de/html/textbooks.html.
66 D. Reiter
The principle is to find (estimate) mean values, i.e. expectation values, I of some
system components. If a deterministic problem is to be solved, one first has to invent
a stochastic system such that a mean value ( = expectation value) coincides with the
desired solution I of the deterministic problem.
In any case: I is a single numerical quantity of interest (not an entire functional
dependence), and one might always think of I as some definite integral.
The simple intuitive interpretations are given below, but in abstract mathematical
terms this stochastic model is given by the probability space (Ω, σ, p, X). Ω is a set
of elementary (random) events ω, the σ-field is a set of subsets of Ω to which the
measurable function p assigns a value (the probability) from the interval [0, 1], such
that the Kolmogoroff axioms for a probability are fulfilled. X is a random variable
on Ω, assigning a (usually real) number (or vector) to each random event, e.g.:
X(ω) → R, such that I = E(X), the expected value of X.
The expectation value E(X) and variance σ 2 (X) are defined as the first moment
and second central moment, respectively, and, unless otherwise stated, we assume
that they both exist
E(X) := dp X ,
Ω
2
σ (X) := dp (X − E(X))2 . (3.2)
Ω
Note that E(X) = Ep (X), σ 2 (X) = σp2 (X), i.e., the moments of X of course
depend upon the probability measure p.
A stochastic approximation to I is then obtained by producing an independent
sequence of random events ωi , i = 1, . . . N according to probability law p and
evaluating
N
1
E(XN ) = IN = X(ωi ) . (3.3)
N i=1
The estimator IN is just the arithmetic mean of many (N ) outcomes of the random
experiment.
Even without any of this abstract mathematical background it is intuitively clear
(see examples below) that IN will converge to E(X), hence to I by construction,
as the number of samples N is increased. However the laws of large numbers and
the central limit theorems of probability theory not only provide sound mathematical
proofs that this Monte Carlo procedure is exact
√ (unbiased) but also that it converges:
IN → I for N → ∞, albeit slowly (with 1/ N ). In particular the central limit the-
orem of probability theory2 asserts that the probability distribution of IN , for large
enough N , converges to a Gaussian distribution, with mean value I = E(X) and
2
See any textbook on Monte Carlo, or Probability Theory.
3 The Monte Carlo Method, an Introduction 67
variance σ 2 (IN ) = σ 2 (X)/N . Hence the typical results from statistical error analy-
sis under Gaussian distribution laws apply, e.g., also the resulting confidence levels.
It is, therefore, common practice in Monte Carlo applications to quote results as
and one has also, under the assumptions made, for large sample size N
s2 → σ 2 ,
1 2
σ 2 (IN ) ≈ s2N = s . (3.6)
N
Hence, for large enough N , in the Gaussian based error estimates (3.4) σ can safely
be replaced by sN , at least for large sample size N 100. In the opposite case
N 100 Student’s t-distribution should be employed in error analysis instead.
The third part is required for the abstract mathematical case only (general measur-
able spaces, σ-algebras, . . . ), but it does not occur in practical Monte Carlo appli-
cations. This means, for any distribution law arising in an application we can obtain
random numbers in two steps: First a random decision (based on the two remaining
weighting factors p1 , p2 ) whether the continuous or the discrete distribution is to be
sampled, and second then generating a random number from the chosen distribution
μc or μd . We will show below that for both cases, continuous and discrete distribu-
tions, general procedures for random number generation exist, at least in principle.
We refer to the standard reference on the production of nonuniform random numbers
[5]. This book deals with the myriad number of ways to transform the uniform ran-
dom numbers into anything else one might want. Also the first section (pp. 1–193)
of [6] is a very comprehensive introduction to random number generation.
Uniform random numbers are the basis for generation of random numbers with all
other distribution laws. A random variable is uniformly distributed on an interval
[a, b], if the distribution density f is
1
f (x) = χ[a,b] , xǫR (3.7)
b−a
with χ[a, b] = 1 if x is in the interval [a, b] and f (x) = 0 elsewhere.
The classical method to generate uniform random numbers on [0, 1] is by so
called linear congruential random number generators, which are defined by the
recursion
ξn+1 = [a ξn + b] mod m (3.8)
Here a is a magic multiplicand, m if often chosen to be the largest integer repre-
sentable on the machine (m = 232 , etc.), and b should be prime to m. Proofs for
particular choices of (large) parameters a and m that the generator achieves the
largest possible period of m − 1 different random numbers are quite cumbersome.
Optimal parameter choices are typically found experimentally, see again [6]. The
finite periodicity limits precision only in very large calculations, e.g. on modern
massively parallel computing systems. A rather subtle issue is also independence of
an entire sequence of random numbers (loc.cit.).
P (X = i) = pi ≥ 0 ,
k
pi = 1 ,
i=0
i
F (i) = P (X ≤ i) = pi (3.9)
j=0
Unfortunately, the Gaussian error function erf(x) and hence Φ(x) cannot be inverted
in closed form. We will show how to generate Gaussian random numbers, even
without numerical inversion, further below.
From this procedure follows directly the natural and best format for storing (also
multi-dimensional) tabulated data for random sampling in Monte Carlo applica-
tions: Form the inverse cumulative distribution function F −1 (x) (i.e.: the quantile
function) and store this for x uniformly spaced in [0, 1]. Then take ξ from a uniform
distribution on [0, 1] and find F −1 (ξ) by interpolation in this table.
70 D. Reiter
1
0.6 (a) Gauss (b) 4 (c)
Cauchy
0.8
0.5 2
0.6 ξ2 z1
0.4
0
0.3 ξ1
0.4 z2
0.2 –2
0.2
0.1 –4
0 0
–4 –2 0 2 4 –4 –2 0 2 4 0 0.2 0.4 0.6 0.8 1
Fig. 3.2. (a) Comparison of Cauchy (dashed line) and normal distribution (solid line).
(b) Cumulative distribution function Φ(x) of normal distribution (3.11), (c) Inverse cumu-
lative distribution of normal distribution Φ−1 (ξ). Uniform random numbers ξ1 , ξ2 (abscissa)
are converted to random numbers z1 , z2 from a normal distribution (ordinate)
3.2.2.2 Rejection
Another general method for generating non-uniform random numbers is the re-
jection method (J.v. Neumann, 1947). This method is always applicable, although
it may sometimes be rather inefficient. For distributions with finite support, i.e.,
f (x) = 0 only on a finite domain M (say, M = [a, b]), find the maximum c of
f (x), sample a random pair (ξ1 , ξ2 ) with ξ1 uniform on M and ξ2 uniform on [0, c].
If ξ2 ≤ f (ξ1 ), accept ξ1 . Otherwise reject this pair and pick a new pair. Repeat this
procedure until a pair is accepted. Clearly, the efficiency of this method (e.g. mea-
sured as average number of accepted random pairs to number of pairs produced)
may be quite poor, in particular if the distribution f (x) has sharp maxima.
A more general, and sometimes more efficient rejection method, working even
on infinite sampling domains M , results if one finds a second distribution g(x) and a
numerical constant c such that f (x) ≤ c · g(x). Again find a pair (z1 , z2 ) of random
numbers, however with z1 not sampled uniformly on M but from distribution g(z)
instead. z2 is uniform on the interval [0, c]. The random variable z1 is accepted if
z2 ≤ f (z1 )/g(z1 ). Otherwise a new pair (z1 , z2 ) is generated. See Chap. 5 for an
important application in particle simulation.
3.2.2.3 Examples
3.2.2.3.1 Inversion
Important examples in which the inversion method can be applied are, e.g., the expo-
nential distribution (of the mean free flight length of radiation in matter), the cosine
distribution of polar emission angles against surface normals, the surfaces cross-
ing Maxwellian flux distribution f (v⊥ ) ∝ v⊥ fMaxw (v⊥ ) of normal velocity compo-
nents of gas molecules with Maxwellian velocity distribution (fMaxw ). We explicitly
illustrate the inversion method here for the Cauchy distribution: The Cauchy dis-
tribution, see Fig. 3.2(a), in physical applications also called Lorentz distribution, is
an example of a distribution function that has no moments. It arises often in radia-
tion transfer, e.g., as line-shapes of naturally- or Stark broadened lines or in other
resonance phenomena
3 The Monte Carlo Method, an Introduction 71
c 1
fC (x) = . (3.12)
π (x − b)2 + c2
Here b is the median (line shift), and c is the half width at half maximum (HWHM).
Generating random number with a Cauchy distribution is usually done by inver-
sion. First transform to a standardized Cauchy, by s = (x − b)/c. The cumulative
distribution is then given as
x
1 1 1 1 x−b
FC (x) = ds = + arctan . (3.13)
π s2 + 1 2 π c
−∞
Because the Gaussian error function cannot be inverted in closed form, the following
combination of transformation, rejection and inversion method is typically applied:
Not one, but two independent normally distributed random numbers (z1 , z2 ) are
produced by first transforming random variables Z1 , Z2 from cartesian to polar co-
ordinates R, Φ. The angle Φ is then uniform in [0, 2π]. Only cos(Φ) and sin(Φ) are
needed, and a rejection method (comparing a unit circle and a surrounding square)
can be used for them. The variable R has, due to the Jacobian of the transformation,
a Gaussian flux distribution (see above) rather than a Gaussian itself, and this can be
directly generated by the Method of Inversion. Transforming back Z1 = R · cos(Φ)
and Z2 = R · sin(Φ) provides a pair of independent Gaussian random numbers.
Here f is the one particle distribution (density) function f (r, v, i, t) or f (x), where
the state x of the relevant phase-space may, e.g., be characterized by a position vec-
tor r, a velocity vector v, the time t, i.e. continuous variables, and further a discrete
72 D. Reiter
chemical species index i, also for example for internal quantal states. g(x) is again
some weighting function determined by the particular moment of interest. In math-
ematical terms one would refer to this as Lebesgue-Stieltjes Integral of measurabel
function g(x) with respect to (probability) measure defined by distribution density
f (x).
We will discuss Integration by Monte Carlo using the example from [2]: Let the
integration domain V be the unit interval [0, 1], f (x) the uniform distribution on
[0, 1] (i.e.: f (x) = 1 on [0, 1], and f (x) = 0 elsewhere) and g(x) = (exp(x) −
1)/(e − 1). Clearly,
1
ex − 1
I= dx = 0.418 0 . . . . (3.15)
e−1
0
We will now integrate this same function by Monte Carlo. Our first method does not
require any theory, but instead, inspired by Buffon’s needle experiment, we will just
use pairs ξ1 , ξ2 of independent uniform random numbers and compare the known
area (the unit square [0, 1] × [0, 1]) with the unknown area I, which is the area
underneath function g(x), in [0,1]. I.e., we count a hit if the point defined by the
pair of random numbers is under the curve g(x), and a miss otherwise.
As can clearly be seen on Fig. 3.3 the ratio of hits to total number of samples con-
verges to the exact values of the integral, as expected, and also the √ statistical error,
indicated as empirical standard deviation sN , (3.5) scales with 1/ N as expected.
Of course such a Monte Carlo integration method is patently foolish. By this
method we have, in principle, replaced the single integral over function g by a dou-
ble integral over the area between abscissa and function g(x). The conventional
text-book method (crude Monte Carlo) can be obtained from this one by the obser-
vation that once the first random number ξ1 of the pair is known, we do not have
to rely upon ξ2 to decide about counting zero or one. Given ξ1 , then an one will be
counted with probability p: p = g(ξ1 ). Hence instead we can use that (conditional)
expected value p of the binomial distribution b(1, p) directly. This is, admittedly, a
quite obscure explanation for something really trivial. But it is also the underlying
idea behind a powerful variance reducing Monte Carlo technique known under dif-
ferent names in different areas of application: Conditional expectation estimator (in
neutron shielding), [4], averaging transformation (transfer theory, mainly in Russian
literature), [7], or energy partitioning method in radiative heat transfer [8].
This method is opposite to randomization: We have replaced a sampled result
(zero or one) by its expectation value. In our particular example we have carried out
one of the two integrations analytically, conditional on the outcome ξ1 . The second
random number is not needed at all in this particular trivial case but this not the
relevant point. What is important also in general terms is that one (generally: some)
of the two (generally: many) integrals has been done analytically, and only the re-
maining ones by random sampling. The general rule is: Always try to do as many
integrations analytically or numerically and resort to Monte Carlo only for the rest.
In particle transport theory this concept will lead to powerful hybrid methods com-
bining information gained analytically (or numerically) and stochastically, bridging
3 The Monte Carlo Method, an Introduction 73
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
100.0 100.3 100.6 100.9 101.2 101.5 101.8 102.1 102.4 102.7 103.0
number of samples, logarithmic scaling
integral
approximation
0.4180233
Fig. 3.3. Evaluating Integral of (exp(x) − 1)/(e − 1) on [0,1], method: hit or miss Monte
Carlo
continuously the gap between stochastic and numerical methods. Sometimes, how-
ever, these resulting methods may loose their transparency.
In this crude Monte Carlo integration I is obtained as estimated mean value
(expectation value) of function g(x) with respect to the uniform probability distri-
bution f (x) on [0, 1], I = Ef (g), see remark after (3.2). Also indicated in Fig. 3.4
is again the empirical standard deviation sN , which, as expected, is significantly
smaller than with the hit or miss method.
Note that although this method has certainly a smaller statistical error per sam-
ple, the efficiency gain of one over the other method has also to account for the extra
labor involved in evaluating the smoother estimator (which is hardly measurable in
this trivial example chosen here).
In general Monte Carlo terminology one would refer to the uniform distribu-
tion f (x) as the underlying stochastic law, according to which random samples
X are produced. The random variable g(X) is called estimator, score, or response
function.
We are now in the position to explain the famous Monte Carlo concept of impor-
tance sampling for improving the statistical performance of Monte Carlo methods.
Distinct from the conditional expectation technique discussed above, in which the
74 D. Reiter
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
100.0 100.3 100.6 100.9 101.2 101.5 101.8 102.1 102.4 102.7 103.0
number of samples, logarithmic scaling
integral
approximation
0.4180233
Fig. 3.4. Evaluating Integral of (exp(x) − 1)/(e − 1) on [0,1], method: crude Monte Carlo
Hence we have g#(x) = g(x)f (x)/f#(x). The name of this method, importance
sampling originates from the special techniques often used to find optimal biassing
schemes (i.e.: f#(x)) of the random process, in particular in transfer theory. A more
general, but also somewhat imprecise terminology would refer to this concept as
non-analog Monte Carlo, as compared to the analog Monte Carlo scheme. In the
latter the underlying probability distribution law is directly taken from the applica-
tion, whereas in the former one uses a different distribution, motivated by practical,
economical or other reasons, and statistical weights to compensate this.
As seen from (3.16), the value of I is independent of how the integrant is
decomposed into a product of a probability density and a response function, but
the variances, σf2 (g) and σf2#(#
g ), certainly can be different.
3 The Monte Carlo Method, an Introduction 75
Let’s take, again, our example, to illustrate the concept: In order to reduce the
variance σf2#(#g ) of g# with respect to probability law f# we should try to make #g as
constant as possible on [0,1]. The Taylor expansion of our particular function g(x)
indicates that the ratio g#(x) = g(x)/x should be more constant than g(x) itself.
Hence we try f#(x) ∝ x, i.e., f#(x) = 2x so that f#(x) is normalized to one on [0,1].
Our importance sampling procedure to evaluate I now proceeds as follows:
Draw random numbers ξ# from f#(x). By the method of inversion, this is done by
√
setting ξ# = ξ, with ξ a uniform random number on [0,1]. Then, again, form the
arithmetic average of many (N ) random variables g#(ξ). # Figure 3.5 shows the result
of such an integration, again vs. N . Clearly the convergence is (i) to the correct
value, (ii) still only ∝ 1/ (N ), but (iii) the error bars sN are much smaller than in
both previously discussed Monte Carlo integration methods.
Again, it needs to be pointed out that the efficiency of the procedure is nei-
ther determined by the variance, not by N per CPU-time, but only by the figure
of merit: variance per CPU time. And hence, importance sampling, more generally,
non-analog sampling, can go both ways in Monte Carlo. Its performance has to be
assessed on a case by case basis.
As a general observation, one should note that in non-analog Monte Carlo
schemes the error assessment simply based upon the empirical variance, and error
bars obtained from the central limit theorem, can be less reliable than in analog sim-
ulations. Although the variance may be decreased by a clever importance sampling
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
100.0 100.3 100.6 100.9 101.2 101.5 101.8 102.1 102.4 102.7 103.0
number of samples, logarithmic scaling
integral
approximation
0.4180233
Fig. 3.5. Same integral as in Fig. 3.4, method: importance sampling Monte Carlo
76 D. Reiter
method, the variance of the variance may increase, thus invalidating conventional
error bar estimates, see [9].
As in the case of conditional expectation Monte Carlo we can design an extreme
case of importance sampling with zero statistical error after only one sample: Let us
set f#(x) = g(x)/I, hence: g#(x) = I = const. Monte Carlo integration proceeds by
sampling from this distribution f#(x) which, in case of our particular example can
be done by the rejection technique. Then, independent of the sampling, I is scored.
Unfortunately we needed the knowledge of the final result I already to design this
perfect zero variance scheme.
Finally we use our simple integral to illustrate the concept of the δf Monte Carlo
method, which is widely used in kinetic particle transport simulations. The starting point is the idea to split the unknown quantity into a large, known, nearby part and a small unknown perturbation. In particle simulations this can also be the single-particle distribution function f(x) solving some kinetic equation, or moments of this pdf.
In near-equilibrium situations we have
$$f = f_\text{equil} + \delta f$$
with, for example, the Maxwellian equilibrium distribution $f_\text{equil}$ and a small perturbation $\delta f$. It can then be advantageous to solve, by Monte Carlo sampling, only for $\delta f$ rather than for the full distribution.
So let us consider our integral again, and write, accordingly, I = I0 + δI with
I0 the known part
$$I_0 = \int_0^1 dx\, g_0(x) = \frac{1}{e-1}\int_0^1 dx\left(x + \frac{x^2}{2}\right) = \frac{2}{3}\,\frac{1}{e-1}\,. \qquad(3.18)$$
Figure 3.6 shows the result of the estimate for I, with I0 known and δI evaluated
by crude Monte Carlo. Clearly by eliminating a large, known, contribution to I the
relative errors of the estimates for any given sample size N are greatly reduced as
compared to previous methods.
This method is also related to the so-called correlation sampling technique, in
which one would evaluate both I and I0 by Monte Carlo techniques, but using the
same random numbers. Both estimates are then positively correlated and the statis-
tical precision of the Monte Carlo estimate for the difference δI can be substantially
better than in independent estimates of I and I0 or of I alone.
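A minimal sketch of the $\delta f$ idea for this integral (again assuming the example integrand $g(x) = (e^x - 1)/(e - 1)$ and its known Taylor part $g_0$): only the small difference $g - g_0$ is sampled by crude Monte Carlo.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 100_000
    x = rng.random(N)

    g  = (np.exp(x) - 1.0) / (np.e - 1.0)     # full integrand (assumed example)
    g0 = (x + 0.5 * x**2) / (np.e - 1.0)      # known Taylor part, I0 = 2/(3(e-1))
    I0 = 2.0 / (3.0 * (np.e - 1.0))

    delta = (g - g0).mean()                   # small correction, crude Monte Carlo
    err   = (g - g0).std(ddof=1) / np.sqrt(N)
    print(f"I = I0 + deltaI = {I0 + delta:.7f} +- {err:.7f}")

Because the sampled quantity $g - g_0$ is small, the absolute statistical error of the combined estimate is correspondingly reduced.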
Fig. 3.6. Same integral as in Figs. 3.4 and 3.5 (exact value 0.4180233), with $I_0$ known and $\delta I$ evaluated by crude Monte Carlo; integral approximation plotted vs. the number of samples on a logarithmic scale
3.4 Summary
The purpose of this introduction was to show that random numbers can be generated from any given probability density distribution, and that Monte Carlo methods can be regarded as stochastic (rather than numerical) procedures for integration. Monte Carlo consists of inventing a random game such that the expected value of a proper random variable is exactly equal to the parameter which is to be computed. Averaging over repeated independent Monte Carlo samples from that game converges (in the proper measure-theoretical sense) to the desired solution.
The additional complication arising in many-particle physics applications and in transfer theory is due to one fact only: In contrast to the material in the present chapter, the sampling distribution f(x) is sometimes not known explicitly. Instead it is given only implicitly as the solution of a usually very complicated equation (e.g., the Boltzmann equation, the Fokker-Planck equation, etc.). We will see that this extra complication can be dealt with by sampling from certain stochastic processes (generating particle trajectories), rather than from a given pdf, see Chap. 5. But the rest – estimation of multi-dimensional integrals, the unbiased nature of the method, proof of convergence, error bars, variance reduction methods – remains essentially the same as in the present introduction.
References
1. M.H. Kalos, P.A. Whitlock, Monte Carlo Methods, Vol. I: Basics (Wiley-Interscience, New York, 1986)
2. J.M. Hammersley, D.C. Handscomb, Monte Carlo Methods (Chapman and Hall, London/New York, 1964)
3. R.Y. Rubinstein, Simulation and the Monte Carlo Method, Wiley Series in Probability and Mathematical Statistics (John Wiley and Sons, New York, 1981)
4. J. Spanier, E. Gelbard, Monte Carlo Principles and Neutron Transport Problems (Addison-Wesley, Reading, 1969)
5. L. Devroye, Non-Uniform Random Variate Generation (Springer, Berlin/Heidelberg/New York, 1986)
6. D.E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms (Addison-Wesley, Reading, 1998)
7. G. Mikhailov, Optimization of Weighted Monte Carlo Methods (Springer, Berlin/Heidelberg/New York, 1992)
8. A. Wang, M.F. Modest, J. Quant. Spectrosc. Radiat. Transfer 104, 288 (2007)
9. K. Noack, Ann. Nucl. Energy 18(6), 309 (1991)
4 Monte Carlo Methods in Classical
Statistical Physics
Wolfhard Janke
Institut für Theoretische Physik and Centre for Theoretical Sciences, Universität Leipzig,
04009 Leipzig, Germany
The purpose of this chapter is to give a brief introduction to Monte Carlo simu-
lations of classical statistical physics systems and their statistical analysis. To set
the general theoretical frame, first some properties of phase transitions and sim-
ple models describing them are briefly recalled, before the concept of importance
sampling Monte Carlo methods is introduced. The basic idea is illustrated by a few
standard local update algorithms (Metropolis, heat-bath, Glauber). Then methods
for the statistical analysis of the thus generated data are discussed. Special atten-
tion is paid to the choice of estimators, autocorrelation times and statistical error
analysis. This is necessary for a quantitative description of the phenomenon of crit-
ical slowing down at continuous phase transitions. For illustration purposes, only
the two-dimensional Ising model will be needed. To overcome the slowing-down
problem, non-local cluster algorithms have been developed which will be described
next. Then the general tool of reweighting techniques will be explained which is ex-
tremely important for finite-size scaling studies. This will be demonstrated in some
detail by the sample study presented in the next section, where also methods for es-
timating spatial correlation functions will be discussed. The reweighting idea is also
important for a deeper understanding of so-called generalized ensemble methods
which may be viewed as dynamical reweighting algorithms. After a discussion of simulated and parallel tempering methods, the alternative approach using multicanonical ensembles and the Wang-Landau recursion is briefly outlined.
4.1 Introduction
Classical statistical physics is a well understood subject which poses, however,
many difficult problems when a concrete solution for interacting systems is sought.
In almost all non-trivial applications, analytical methods can only provide approxi-
mate answers. Numerical computer simulations are, therefore, an important comple-
mentary method on our way to a deeper understanding of complex physical systems
such as (spin) glasses and disordered magnets or of biologically motivated prob-
lems such as protein folding. Quantum statistical problems in condensed matter or
the broad field of elementary particle physics and quantum gravity are other ma-
jor applications which, after suitable mappings, also rely on classical simulation
techniques.
The central quantity is the partition function $Z = \sum_{\{\sigma_i\}} e^{-\beta\mathcal{H}(\{\sigma_i\})}$, with the summation running over all possible states of the system. The state space may be continuous or discrete. As usual $\beta \equiv 1/k_B T$ denotes the inverse temperature.
The specific heat then follows as
$$C = \frac{du}{dT} = \beta^2\,\frac{\langle E^2\rangle - \langle E\rangle^2}{V} = \beta^2 V\left(\langle e^2\rangle - \langle e\rangle^2\right)\,, \qquad(4.4)$$
where we have set H ≡ E = eV with V denoting the number of lattice sites, i.e.,
the lattice volume. The magnetization per site m = M/V and the susceptibility χ
are defined as
$$M = \frac{1}{\beta}\,\frac{d\ln Z}{dh} = V\langle\mu\rangle\,, \qquad \mu = \frac{1}{V}\sum_i \sigma_i\,, \qquad(4.5)$$
and
$$\chi = \beta V\left(\langle\mu^2\rangle - \langle\mu\rangle^2\right)\,. \qquad(4.6)$$
The correlation between spins $\sigma_i$ and $\sigma_j$ at sites labeled by i and j can be measured by considering correlation functions like the two-point spin-spin correlation $G(i,j)$, which is defined as
$$G(i,j) = \langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle\,.$$
In the vicinity of the critical temperature $T_c$, the standard observables exhibit power-law singularities,
$$\begin{aligned} C &= C_\text{reg} + C_0\,|1 - T/T_c|^{-\alpha} + \dots\,,\\ m &= m_0\,(1 - T/T_c)^{\beta} + \dots\,,\\ \chi &= \chi_0\,|1 - T/T_c|^{-\gamma} + \dots\,, \end{aligned} \qquad(4.11)$$
where Creg is a regular background term, and the amplitudes are again in general
different on the two sides of the transition. Right at the critical temperature Tc , two
further exponents δ and η are defined through
$$m \propto h^{1/\delta}\,, \qquad G(r) \propto r^{-D+2-\eta}\,. \qquad(4.12)$$
In the 1960’s, Rushbrooke [22], Griffiths [23], Josephson [24, 25] and Fisher
[26] showed that these six critical exponents are related via four inequalities. Sub-
sequent experimental evidence indicated that these relations were in fact equalities,
and they are now firmly established and fundamentally important in the theory of
critical phenomena. With D representing the dimensionality of the system, the scal-
ing relations are
$$\begin{aligned} D\nu &= 2 - \alpha &&\text{(Josephson's law)}\,,\\ 2\beta + \gamma &= 2 - \alpha &&\text{(Rushbrooke's law)}\,,\\ \beta(\delta - 1) &= \gamma &&\text{(Griffiths' law)}\,,\\ \nu(2 - \eta) &= \gamma &&\text{(Fisher's law)}\,. \end{aligned} \qquad(4.13)$$
In the conventional scaling scenario, Rushbrooke’s and Griffiths’ laws can be de-
duced from the Widom scaling hypothesis that the Helmholtz free energy is a ho-
mogeneous function [27, 28]. Widom scaling and the remaining two laws can in turn
be derived from the Kadanoff block-spin construction [29] and ultimately from that
of the renormalization group (RG) [30]. Josephson’s law can also be derived from
the hyperscaling hypothesis, namely that the free energy behaves near criticality as
Table 4.1. Critical exponents of the Ising model in two (2D) and three (3D) dimensions. All
2D exponents are exactly known [31, 32], while for the 3D Ising model the world-average
for ν and γ calculated in [33] is quoted. The other exponents follow from the hyperscaling
relation α = 2 − Dν, and the scaling relations β = (2 − α − γ)/2, δ = γ/β + 1, and
η = 2 − γ/ν
dimension    ν           α        β        γ           δ       η
2D           1           0 (log)  1/8      7/4         15      1/4
3D           0.6301(4)   0.110    0.3265   1.2372(5)   4.789   0.0364

the inverse correlation volume: $f_\infty(t) \sim \xi_\infty^{-D}(t)$. Twice differentiating this relation one recovers Josephson's law (4.13). The critical exponents for the 2D and 3D Ising model [31, 32, 33] are collected in Table 4.1.
In any numerical simulation study, the system size is necessarily finite. While
the correlation length may still become very large, it is therefore always finite. This
implies that also the divergences in other quantities are rounded and shifted [34, 35,
36, 37]. How this happens is described by finite-size scaling (FSS) theory, which in
a nut-shell may be explained as follows: Near Tc the role of ξ is taken over by the
linear size L of the system. By rewriting (4.9) or (4.10) and replacing $\xi \to L$,
$$|1 - T/T_c| \propto \xi^{-1/\nu} \longrightarrow L^{-1/\nu}\,, \qquad(4.14)$$
it is easy to see that the scaling laws (4.11) are replaced by the FSS Ansätze,
$$\begin{aligned} C &= C_\text{reg} + a L^{\alpha/\nu} + \dots\,,\\ m &\propto L^{-\beta/\nu} + \dots\,,\\ \chi &\propto L^{\gamma/\nu} + \dots\,. \end{aligned} \qquad(4.15)$$
As a mnemonic rule, a critical exponent x of the temperature scaling law is
replaced by −x/ν in the corresponding FSS law. In general these scaling laws are
valid in a neighborhood of Tc as long as the scaling variable
$$x = (1 - T/T_c)\,L^{1/\nu} \qquad(4.16)$$
is kept fixed [34, 35, 36, 37]. This implies for the locations Tmax of the (finite)
maxima of thermodynamic quantities such as the specific heat or susceptibility, an
FSS behavior of the form
$$T_\text{max} = T_c\left(1 - x_\text{max} L^{-1/\nu} + \dots\right)\,. \qquad(4.17)$$
In this more general formulation the scaling law for, e.g., the susceptibility reads
$$\chi(T, L) = L^{\gamma/\nu} f(x)\,, \qquad(4.18)$$
where f (x) is a scaling function. By plotting χ(T, L)/Lγ/ν versus the scaling vari-
able x, one thus expects that the data for different T and L fall onto a master
curve described by f (x). This is a nice visual method for demonstrating the scaling
properties.
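A short sketch of such a data-collapse plot in Python; the container chi_data, holding susceptibility curves for several lattice sizes, is a hypothetical placeholder for one's own simulation results:

    import numpy as np
    import matplotlib.pyplot as plt

    def fss_collapse(chi_data, nu=1.0, gamma=1.75,
                     Tc=2.0 / np.log(1.0 + np.sqrt(2.0))):
        """Plot chi/L^(gamma/nu) vs. x = (1 - T/Tc) L^(1/nu), cf. (4.16), (4.18).
        chi_data: dict {L: (T_array, chi_array)} from one's own simulations;
        defaults are the exactly known 2D Ising values."""
        for L, (T, chi) in sorted(chi_data.items()):
            x = (1.0 - T / Tc) * L**(1.0 / nu)      # scaling variable
            plt.plot(x, chi / L**(gamma / nu), "o", label=f"L = {L}")
        plt.xlabel("(1 - T/Tc) L^(1/nu)")
        plt.ylabel("chi / L^(gamma/nu)")
        plt.legend()
        plt.show()

If the exponents and $T_c$ are correct, the curves for different L collapse onto the master curve f(x).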
Similar considerations for first-order phase transitions [38, 39, 40, 41] show
that here the δ-function like singularities in the thermodynamic limit, originating
from phase coexistence, are also smeared out for finite systems [42, 43, 44, 45,
46]. They are replaced by narrow peaks whose height (width) grows proportional to
the volume (1/volume) with a displacement of the peak location from the infinite-
volume limit proportional to 1/volume [47, 48, 49, 50, 51, 52].
The basic idea of importance sampling is to set up a suitable Markov chain that
draws configurations not at random but according to their Boltzmann weight
$$P^\text{eq}(\{\sigma_i\}) = \frac{e^{-\beta\mathcal{H}(\{\sigma_i\})}}{Z}\,. \qquad(4.19)$$
A Markov chain defines stochastic rules for transitions from one state to another
subject to the condition that the probability for the new configuration only depends
on the preceding state but not on the history of the whole trajectory in state space,
i.e., it is almost local in time. Symbolically this can be written as
W W W W
. . . −→ {σi } −→ {σi }′ −→ {σi }′′ −→ . . . , (4.20)
² This number should be compared with the estimated number of protons in the Universe, which is about $10^{80}$.
From condition (iii) we see that the desired Boltzmann distribution P eq is a fixed
point of W (eigenvector of W with unit eigenvalue). A somewhat simpler sufficient condition is detailed balance,
$$W(\{\sigma_i\} \rightarrow \{\sigma_i\}')\, P^\text{eq}(\{\sigma_i\}) = W(\{\sigma_i\}' \rightarrow \{\sigma_i\})\, P^\text{eq}(\{\sigma_i\}')\,. \qquad(4.21)$$
By summing over $\{\sigma_i\}$ and using condition (ii), the more general condition (iii)
follows. After an initial equilibration period (cf. Sect. 4.5.1), expectation values can be estimated as an arithmetic mean over the Markov chain of length N, e.g.,
$$E = \langle\mathcal{H}\rangle = \sum_{\{\sigma_i\}} \mathcal{H}(\{\sigma_i\})\,P^\text{eq}(\{\sigma_i\}) \approx \frac{1}{N}\sum_{j=1}^{N} \mathcal{H}(\{\sigma_i\}_j)\,, \qquad(4.22)$$
where {σi }j denotes the spin configuration at “time” j. A more detailed exposition
of the mathematical concepts underlying any Markov chain Monte Carlo algorithm
can be found in many textbooks and reviews [1, 2, 3, 4, 34, 54, 55].
The Markov chain conditions (i)–(iii) are still quite general and can be satisfied by
many different concrete update rules. In a rough classification one distinguishes be-
tween local and non-local algorithms. Local update algorithms discussed in this
subsection are conceptually much simpler and, as the main merit, quite univer-
sally applicable. The main drawback is their relatively poor performance close to
second-order phase transitions where the spins or fields of a typical configuration
are strongly correlated over large spatial distances. Here non-local update algo-
rithms based on multigrid methods or in particular self-adaptive cluster algorithms
discussed later in Sect. 4.4 perform much better.
The most flexible update rule is the classic Metropolis algorithm [56], which
is applicable in practically all cases (lattice/off-lattice, discrete/continuous, short-
range/long-range interactions, . . . ). Here one proposes an update for a single degree
of freedom (spin, field, . . . ) and accepts this proposal with probability
$$W(\{\sigma_i\}_\text{old} \rightarrow \{\sigma_i\}_\text{new}) = \begin{cases} 1 & E_\text{new} < E_\text{old}\,, \\ e^{-\beta(E_\text{new}-E_\text{old})} & E_\text{new} \ge E_\text{old}\,, \end{cases} \qquad(4.23)$$
where Eold and Enew denote the energy of the old and new spin configuration {σi }old
and {σi }new , respectively, where {σi }new differs from {σi }old only locally by one
modified degree of freedom at, say, i = i0 . More compactly, this may also be writ-
ten as
$$W(\{\sigma_i\}_\text{old} \rightarrow \{\sigma_i\}_\text{new}) = \min\{1, e^{-\beta\Delta E}\}\,, \qquad(4.24)$$
where ∆E = Enew − Eold . If the proposed update lowers the energy, it is always
accepted. On the other hand, when the new configuration has a higher energy, the up-
date has still to be accepted with a certain probability in order to ensure the proper
treatment of entropic contributions – in thermal equilibrium, it is the free energy
F = U − T S which has to be minimized and not the energy. Only in the limit of
zero temperature, β → ∞, the acceptance probability for this case tends to zero and
the Metropolis method degenerates to a minimization algorithm for the energy func-
tional. With some additional refinements, this is the basis for the simulated anneal-
ing technique [57], which is often applied to hard optimization and minimization
problems.
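As an illustration, a minimal Python sketch of a sequential Metropolis sweep for the 2D nearest-neighbor Ising model with periodic boundary conditions (a didactic sketch, not an optimized production code):

    import numpy as np

    def metropolis_sweep(spins, beta, rng):
        """One sequential Metropolis sweep of the 2D Ising model
        (H = -sum_<ij> s_i s_j, periodic boundary conditions)."""
        L = spins.shape[0]
        for i in range(L):
            for j in range(L):
                nn = (spins[(i+1) % L, j] + spins[(i-1) % L, j] +
                      spins[i, (j+1) % L] + spins[i, (j-1) % L])
                dE = 2.0 * spins[i, j] * nn          # energy change of a flip
                # accept with probability min(1, e^{-beta*dE}), cf. (4.24)
                if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
                    spins[i, j] = -spins[i, j]

    rng = np.random.default_rng(0)
    L, beta = 16, 0.4406868                           # close to beta_c
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(1000):
        metropolis_sweep(spins, beta, rng)
    # energy per site, counting each bond once
    e = -float(np.sum(spins * (np.roll(spins, 1, 0) + np.roll(spins, 1, 1)))) / L**2
    print("e =", e)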
The verification of the detailed balance condition (4.21) is straightforward. If
Enew < Eold , then the l.h.s. of (4.21) becomes exp(−βEold ) × 1 = exp(−βEold ).
On the r.h.s. we have to take into account that the reverse move would increase
the energy, Eold > Enew (with Eold now playing the role of the new energy), such
that now the second line of (4.23) with Eold and Enew interchanged is relevant.
This gives exp(−βEnew ) × exp(−β(Eold − Enew )) = exp(−βEold ) on the r.h.s. of
(4.21), completing the demonstration of detailed balance. In the opposite case with $E_\text{new} > E_\text{old}$, a similar reasoning leads to $\exp(-\beta E_\text{old}) \times \exp(-\beta(E_\text{new} - E_\text{old})) = \exp(-\beta E_\text{new}) = \exp(-\beta E_\text{new}) \times 1$. Admittedly, this proof looks a bit like a tautol-
ogy. To uncover its non-trivial content, it is a useful exercise to replace the r.h.s. of
the Metropolis rule (4.23) by some general function f (Enew − Eold ) and repeat the
above steps [58].
Finally, a few remarks on the practical implementation of the Metropolis method are in order. To decide whether a proposed update should be accepted or not, one draws a uniformly distributed random number $r \in [0,1)$, and if $r \le W$, the new state is accepted. Otherwise one keeps the old configuration and continues with
the next spin. In computer simulations, random numbers are generated by means
of pseudo-random number generators (RNGs), which use a deterministic rule to produce (more or less) uniformly distributed numbers whose values are very hard to predict (see [59] and references therein). In other words, given a finite
sequence of subsequent pseudo-random numbers, it should be (almost) impossible
to predict the next one or to even guess the deterministic rule underlying their gen-
eration. The goodness of an RNG is thus measured by the difficulty to derive its
underlying deterministic rule. Related requirements are the absence of trends (cor-
relations) and a very long period. Furthermore, an RNG should be portable among
different computer platforms and, very importantly, it should yield reproducible re-
sults for testing purposes. The design of RNGs is a science in itself, and many things
can go wrong with them. As a recommendation one should better not experiment
too much with some fancy RNG one has picked up somewhere from the Web, say,
but rely on well-tested and well-documented routines.
There are many different ways how the degrees of freedom to be updated can
be chosen. They may be picked at random or according to a random permutation,
which can be updated every now and then. But also a simple fixed lexicographical
(sequential) order is permissible.3 In lattice models one may also update first all odd
and then all even sites, which is the usual choice in vectorized codes. A so-called
sweep is completed when on the average4 for all degrees of freedom an update was
proposed. The qualitative behavior of the update algorithm is not sensitive to these
details, but its quantitative performance does depend on the choice of the update
scheme.
This algorithm is only applicable to lattice models and at least in its most straight-
forward form only to discrete degrees of freedom with a few allowed states. The
new value σi′0 at site i0 is determined by testing all its possible states in the heat-
bath of its (fixed) neighbors (e.g., four on a square lattice and six on a simple-cubic
lattice with nearest-neighbor interactions):
$$W(\{\sigma_i\}_\text{old} \rightarrow \{\sigma_i\}_\text{new}) = \frac{e^{-\beta\mathcal{H}(\{\sigma_i\}_\text{new})}}{\sum_{\sigma_{i_0}} e^{-\beta\mathcal{H}(\{\sigma_i\})}} = \frac{e^{-\beta\sigma'_{i_0} S_{i_0}}}{\sum_{\sigma_{i_0}} e^{-\beta\sigma_{i_0} S_{i_0}}}\,, \qquad(4.25)$$
where $S_{i_0} = -\sum_j \sigma_j - h$ is an effective spin or field collecting all neighboring spins (in their old states) interacting with the spin at site $i_0$, and h is the external magnetic field. Note that this decomposition also works in the case of vectors ($\sigma_i \to \boldsymbol\sigma_i$, $h \to \boldsymbol h$, $S_{i_0} \to \boldsymbol S_{i_0}$), interacting via the usual dot product ($\sigma'_{i_0} S_{i_0} \to \boldsymbol\sigma'_{i_0}\cdot\boldsymbol S_{i_0}$). As the last equality in (4.25) shows, all other contributions to the energy not involving $\sigma'_{i_0}$ cancel due to the ratio in (4.25), so that for the update at each site $i_0$ only a small number of computations is necessary (e.g., about four for a square and
six for a simple-cubic lattice of arbitrary size). Detailed balance (4.21) is obviously
satisfied since
$$e^{-\beta\mathcal{H}(\{\sigma_i\}_\text{old})}\,\frac{e^{-\beta\mathcal{H}(\{\sigma_i\}_\text{new})}}{\sum_{\sigma_{i_0}} e^{-\beta\mathcal{H}(\{\sigma_i\})}} = e^{-\beta\mathcal{H}(\{\sigma_i\}_\text{new})}\,\frac{e^{-\beta\mathcal{H}(\{\sigma_i\}_\text{old})}}{\sum_{\sigma_{i_0}} e^{-\beta\mathcal{H}(\{\sigma_i\})}}\,. \qquad(4.26)$$
How is the probability (4.25) realized in practice? Due to the summation over all local states, special tricks are necessary when each degree of freedom can take many different states, and only in special cases can the heat-bath method be efficiently generalized to continuous degrees of freedom. In many applications, however, the admissible local states of $\sigma_{i_0}$ can be labeled by a small number of integers, say $n = 1, \dots, N$. Since the probability in (4.25) is normalized to unity, the sequence $(P_1, P_2, \dots, P_n, \dots, P_N)$ with $P_n = \exp(-\beta n S_{i_0})/\sum_{k=1}^{N}\exp(-\beta k S_{i_0})$ decomposes the unit interval into segments of length $P_n$. If one now draws a random number $R \in [0,1)$ and compares the accumulated probabilities $\sum_{k=1}^{n} P_k$ with R, then the new state $n_0$ is given as the smallest integer for which $\sum_{k=1}^{n_0} P_k \ge R$. Clearly, for a large number of possible local states, the determination of $n_0$ can become quite time-consuming (in particular, if many small $P_n$ are at the beginning
³ Some special care is necessary, however, for one-dimensional spin chains.
⁴ This is only relevant when the random update order is chosen.
of the sequence, in which case a clever permutation of the Pn -list can help a lot).
The order of updating the individual variables can be chosen as for the Metropolis
algorithm (random, sequential, . . . ).
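A minimal sketch of this search of the accumulated probabilities (the function name heat_bath_pick is ours, not a standard routine):

    import numpy as np

    def heat_bath_pick(states, S_i0, beta, rng):
        """Draw a new local state n with probability P_n ~ exp(-beta*n*S_i0),
        cf. (4.25), by searching the accumulated probabilities."""
        w = np.exp(-beta * np.asarray(states, dtype=float) * S_i0)
        P = np.cumsum(w) / w.sum()                 # accumulated probabilities
        R = rng.random()                           # uniform R in [0,1)
        # smallest n0 with P_1 + ... + P_{n0} >= R
        n0 = int(np.searchsorted(P, R, side="left"))
        return states[n0]

    # Ising example (two local states)
    rng = np.random.default_rng(7)
    print(heat_bath_pick([-1, +1], S_i0=2.0, beta=0.44, rng=rng))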
In the special case of the Ising model with only two states per spin, σi = ±1,
(4.25) reads explicitly as
$$W(\{\sigma_i\}_\text{old} \rightarrow \{\sigma_i\}_\text{new}) = \frac{e^{-\beta\sigma'_{i_0} S_{i_0}}}{e^{\beta S_{i_0}} + e^{-\beta S_{i_0}}}\,. \qquad(4.27)$$
And since $\Delta E = E_\text{new} - E_\text{old} = (\sigma'_{i_0} - \sigma_{i_0})S_{i_0}$, the probability for a spin flip, $\sigma'_{i_0} = -\sigma_{i_0}$, becomes [58]
$$W_{\sigma_{i_0}\rightarrow-\sigma_{i_0}} = \frac{e^{-\beta\Delta E/2}}{e^{\beta\Delta E/2} + e^{-\beta\Delta E/2}}\,. \qquad(4.28)$$
The acceptance ratio (4.28) is plotted in Fig. 4.1 as a function of ∆E for various
(inverse) temperatures and compared with the corresponding ratio (4.24) of the
Metropolis algorithm. As we shall see in the next paragraph, for the Ising model,
the Glauber and heat-bath algorithm are identical.
Fig. 4.1. Comparison of the acceptance ratio for a spin flip with the heat-bath (HB) (or Glauber) and Metropolis (M) algorithm in the Ising model, plotted against the energy difference ΔE for three different inverse temperatures, β = 0.2, 0.44 and 1.0. Note that for all values of ΔE and temperature, the Metropolis acceptance ratio is higher than that of the heat-bath algorithm

The Glauber update prescription [60] is conceptually similar to the Metropolis algorithm in that also here a local update proposal is accepted with a certain probability
or otherwise rejected. For the Ising model with spins σi = ±1 the acceptance prob-
ability can be written as
$$W_{\sigma_{i_0}\rightarrow-\sigma_{i_0}} = \frac{1}{2}\left[1 + \sigma_{i_0}\tanh(\beta S_{i_0})\right]\,, \qquad(4.29)$$
where as before $\sigma_{i_0} S_{i_0}$ with $S_{i_0} = -\sum_j \sigma_j - h$ is the energy of the $i_0$th spin in the current (old) state.
Due to the point symmetry of the hyperbolic tangent, one may rewrite $\sigma_{i_0}\tanh(\beta S_{i_0})$ as $\tanh(\sigma_{i_0}\beta S_{i_0})$. And since as before $\Delta E = E_\text{new} - E_\text{old} = -2\sigma_{i_0} S_{i_0}$, (4.29) becomes
$$W_{\sigma_{i_0}\rightarrow-\sigma_{i_0}} = \frac{1}{2}\left[1 - \tanh(\beta\Delta E/2)\right]\,, \qquad(4.30)$$
showing explicitly that the acceptance probability only depends on the total en-
ergy change as in the Metropolis case. In this form it is thus possible to generalize
the Glauber update rule from the Ising model with only two states per spin to any
general model that can be simulated with the Metropolis procedure. Also detailed
balance is straightforward to prove. Finally by using trivial identities for hyperbolic
functions, (4.30) can be further recast to read
$$W_{\sigma_{i_0}\rightarrow-\sigma_{i_0}} = \frac{1}{2}\,\frac{\cosh(\beta\Delta E/2) - \sinh(\beta\Delta E/2)}{\cosh(\beta\Delta E/2)} = \frac{e^{-\beta\Delta E/2}}{e^{\beta\Delta E/2} + e^{-\beta\Delta E/2}}\,, \qquad(4.31)$$
which is just the flip probability (4.28) of the heat-bath algorithm for the Ising
model, i.e., heat-bath updates for the special case of a 2-state model and the Glauber
update algorithm are identical. In the general case with more than two states per
spin, however, this is not the case.
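This equivalence is easily checked numerically, e.g.:

    import numpy as np

    # check that the Glauber rule (4.30) equals the heat-bath flip
    # probability (4.28) for the Ising model
    beta = 0.44
    dE = np.linspace(-8.0, 8.0, 9)
    w_glauber = 0.5 * (1.0 - np.tanh(beta * dE / 2.0))                    # (4.30)
    w_heatbath = np.exp(-beta * dE / 2.0) / (np.exp(beta * dE / 2.0)
                                             + np.exp(-beta * dE / 2.0))  # (4.28)
    assert np.allclose(w_glauber, w_heatbath)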
The Glauber (or equivalently heat-bath) update algorithm for the Ising model is
also theoretically of interest since in this case the dynamics of the Markov chain can
be calculated analytically for a one-dimensional system [60]. For two and higher
dimensions, however, no exact solutions are known.
Local update algorithms are applicable to a very wide class of models, and the computer codes are usually quite simple and very fast. The main drawback is given by the temporal correlations of the generated Markov chain, which tend to become huge in the vicinity of phase transitions. They can be determined by analysis of autocorrelation functions
$$A(k) = \frac{\langle O_i O_{i+k}\rangle - \langle O_i\rangle^2}{\langle O_i^2\rangle - \langle O_i\rangle^2}\,, \qquad(4.32)$$
where O denotes any measurable quantity, for example the energy or magnetization.
More details and how temporal correlations enter into the statistical error analysis
will be discussed in Sect. 4.5.2.3. For large time separations k, A(k) decays expo-
nentially ($a = \text{const}$),
$$A(k) \xrightarrow{\,k\to\infty\,} a\,e^{-k/\tau_{O,\text{exp}}}\,, \qquad(4.33)$$
which defines the exponential autocorrelation time $\tau_{O,\text{exp}}$. At smaller distances usually also other modes contribute and A(k) no longer behaves purely exponentially.
This is illustrated in Fig. 4.2 for the 2D Ising model on a rather small 16×16
square lattice with periodic boundary conditions at the infinite-volume critical point $\beta_c = \ln(1+\sqrt{2})/2 = 0.440\,686\,793\ldots$ The spins were updated in sequential
order by proposing always to flip a spin and accepting or rejecting this proposal
according to (4.23). The raw data of the simulation are collected in a time-series
file, storing 1 000 000 measurements of the energy and magnetization taken after
each sweep over the lattice, after discarding (quite generously) the first 200 000
sweeps for equilibrating the system from a disordered start configuration. The last
1 000 sweeps of the time evolution of the energy are shown in Fig. 4.2(a). Using the
complete time series the autocorrelation function was computed according to (4.32)
which is shown in Fig. 4.2(b). On the linear-log scale of the inset we clearly see the
asymptotic linear behavior of ln A(k). A linear fit of the form (4.33), ln A(k) =
ln a − k/τe,exp , in the range 10 ≤ k ≤ 40 yields an estimate for the exponential
autocorrelation time of τe,exp ≈ 11.3. In the small k behavior of A(k) we observe
an initial fast drop, corresponding to faster relaxing modes, before the asymptotic
behavior sets in. This is the generic behavior of autocorrelation functions in realistic
models where the small-k deviations are, in fact, often much more pronounced than
for the 2D Ising model.
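A direct estimator of A(k) along the lines of (4.32) might look as follows in Python (energy_series stands for one's own measured time series):

    import numpy as np

    def autocorr(O, kmax):
        """Normalized autocorrelation function A(k), cf. (4.32)."""
        O = np.asarray(O, dtype=float)
        Obar, var, N = O.mean(), O.var(), len(O)
        return np.array([np.mean((O[:N - k] - Obar) * (O[k:] - Obar)) / var
                         for k in range(kmax + 1)])

    # tau_exp from a linear fit of ln A(k) in an asymptotic window,
    # e.g. 10 <= k <= 40 as in the text:
    #   A = autocorr(energy_series, 40)
    #   k = np.arange(10, 41)
    #   slope, _ = np.polyfit(k, np.log(A[10:41]), 1)
    #   tau_exp = -1.0 / slope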
Close to a critical point, in the infinite-volume limit, the autocorrelation time
typically scales as
$$\tau_{O,\text{exp}} \propto \xi^{z}\,, \qquad(4.34)$$
where $z \ge 0$ is the so-called dynamical critical exponent. Since the spatial correlation length $\xi \propto |T - T_c|^{-\nu} \to \infty$ when $T \to T_c$, also the autocorrelation time $\tau_{O,\text{exp}}$ diverges when the critical point is approached, $\tau_{O,\text{exp}} \propto |T - T_c|^{-\nu z}$. This leads to
the phenomenon of critical slowing down at a continuous phase transition. This is
not in the first place a numerical artefact, but can also be observed experimentally for
instance in critical opalescence, see Fig. 1.1 in [5]. The reason is that local spin-flip
Monte Carlo dynamics (or diffusion dynamics in a lattice-gas picture) describes at
least qualitatively the true physical dynamics of a system in contact with a heat-bath
(which, in principle, introduces stochastic elements also into molecular dynamics simulations). In a finite system, the correlation length ξ is limited by the linear system size
L, and similar to the reasoning in (4.14) and (4.15), the scaling law (4.34) becomes
$$\tau_{O,\text{exp}} \propto L^{z}\,. \qquad(4.35)$$
For local dynamics, the critical slowing down effect is quite pronounced since
the dynamical critical exponent takes a rather large value around
z≈2, (4.36)
with
$$p = 1 - e^{-2\beta}\,. \qquad(4.42)$$
Here the $n_{ij}$ are bond occupation variables which can take the values $n_{ij} = 0$ or $n_{ij} = 1$, interpreted as deleted or active bonds. The representation (4.40) in the second line follows from the observation that the product $\sigma_i\sigma_j$ of two Ising spins can only take the two values ±1, so that $\exp(\beta\sigma_i\sigma_j) = x + y\,\delta_{\sigma_i\sigma_j}$ can easily be solved for x and y. And in the third line (4.41) we made use of the trivial (but clever) identity $a + b = \sum_{n=0}^{1}\left(a\,\delta_{n,0} + b\,\delta_{n,1}\right)$.
According to (4.41) a cluster update sweep consists of two alternating steps. First,
updates of the bond variables nij for given spins, and second updates of the spins
σi for a given bond configuration. In practice one proceeds as follows:
Fig. 4.3. Illustration of the bond variable update. The bond between unlike spins is always
deleted as indicated by the dashed line. A bond between like spins is only active with prob-
ability p = 1 − exp(−2β). Only at zero temperature (β → ∞) stochastic and geometrical
clusters coincide
(i) Set $n_{ij} = 0$ if $\sigma_i \neq \sigma_j$, or assign values $n_{ij} = 1$ and 0 with probability p and $1-p$, respectively, if $\sigma_i = \sigma_j$, cp. Fig. 4.3.
(ii) Identify clusters of spins that are connected by active bonds (nij = 1).
(iii) Draw a random value ±1 independently for each cluster (including one-site
clusters), which is then assigned to all spins in a cluster.
Technically the cluster identification part is the most complicated step, but there
are by now quite a few efficient algorithms available which can even be used on
parallel computers. Vectorization, on the other hand, is only partially possible.
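A compact, non-optimized sketch of one Swendsen-Wang update in Python, using a simple union-find cluster search (one of several possible cluster-identification strategies):

    import numpy as np

    def sw_sweep(spins, beta, rng):
        """One Swendsen-Wang update: activate bonds between like spins with
        p = 1 - exp(-2*beta), identify clusters by union-find, and assign a
        random value +-1 to every cluster (including one-site clusters)."""
        L = spins.shape[0]
        p = 1.0 - np.exp(-2.0 * beta)
        parent = list(range(L * L))

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]    # path halving
                a = parent[a]
            return a

        for i in range(L):
            for j in range(L):
                for ni, nj in (((i + 1) % L, j), (i, (j + 1) % L)):
                    if spins[i, j] == spins[ni, nj] and rng.random() < p:
                        ra, rb = find(i * L + j), find(ni * L + nj)
                        if ra != rb:
                            parent[ra] = rb      # active bond: join clusters

        new_value = {}
        for site in range(L * L):
            root = find(site)
            if root not in new_value:
                new_value[root] = rng.choice((-1, 1))
            spins[site // L, site % L] = new_value[root]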
Notice the difference between the just defined stochastic clusters and geometri-
cal clusters whose boundaries are defined by drawing lines through bonds between
unlike spins. In fact, since in the stochastic cluster definition also bonds between
like spins are deleted with probability p0 = 1 − p = exp(−2β), stochastic clus-
ters are smaller than geometrical clusters. Only at zero temperature (β → ∞) p0
approaches zero and the two cluster definitions coincide.
As described above, the cluster algorithm is referred to as Swendsen-Wang (SW)
or multiple-cluster update [61]. The distinguishing point is that the whole lattice is
decomposed into stochastic clusters whose spins are assigned a random value +1 or
−1. In one sweep one thus attempts to update all spins of the lattice.
Shortly after the original discovery of cluster algorithms, Wolff [63] proposed a
somewhat simpler variant in which only a single cluster is flipped at a time. This
variant is therefore sometimes also called single-cluster algorithm. Here one chooses
a lattice site at random, constructs only the cluster connected with this site, and then
flips all spins of this cluster. In principle, one could also here choose for all spins in
the updated cluster a new value +1 or −1 at random, but then nothing at all would
be changed if one hits the current value of the spins. Typical configuration plots
before and after the cluster flip are shown in Fig. 4.4, which also nicely illustrates the
difference between stochastic and geometrical clusters already stressed in the last
paragraph. The upper right plot clearly shows that, due to the randomly distributed
inactive bonds between like spins, the stochastic cluster is much smaller than the
underlying black geometrical cluster which connects all neighboring like spins.
Fig. 4.4. Illustration of the Wolff cluster update, using actual simulation results for the 2D
Ising model at 0.97βc on a 100×100 lattice. Upper left: Initial configuration. Upper right:
The stochastic cluster is marked. Note how it is embedded in the larger geometric cluster
connecting all neighboring like (black) spins. Lower left: Final configuration after flipping
the spins in the cluster. Lower right: The flipped cluster
In the single-cluster variant some care is necessary with the definition of the unit
of time since the number of flipped spins varies from cluster to cluster. It also de-
pends crucially on temperature since the average cluster size automatically adapts
to the correlation length. With |C| denoting the average cluster size, a sweep is
usually defined to consist of V /|C| single cluster steps, assuring that on the av-
erage V spins are flipped in one sweep. With this definition, autocorrelation times
are directly comparable with results from the Swendsen-Wang or Metropolis algo-
rithm. Apart from being somewhat easier to program, Wolff’s single-cluster variant
is usually even more efficient than the Swendsen-Wang multiple-cluster algorithm,
especially in 3D. The reason is that with the single-cluster method, on the average,
larger clusters are flipped.
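A minimal sketch of one Wolff update in Python, growing the cluster with a stack and flipping spins on the fly (which conveniently also marks them as visited):

    import numpy as np

    def wolff_update(spins, beta, rng):
        """One Wolff single-cluster step: grow the stochastic cluster from a
        random seed with bond probability p = 1 - exp(-2*beta) and flip it.
        Returns the cluster size |C|, needed to define a sweep as V/<|C|>
        single-cluster steps."""
        L = spins.shape[0]
        p = 1.0 - np.exp(-2.0 * beta)
        i, j = rng.integers(L), rng.integers(L)
        seed = spins[i, j]
        spins[i, j] = -seed                       # flip immediately
        stack, size = [(i, j)], 1
        while stack:
            i, j = stack.pop()
            for ni, nj in (((i+1) % L, j), ((i-1) % L, j),
                           (i, (j+1) % L), (i, (j-1) % L)):
                if spins[ni, nj] == seed and rng.random() < p:
                    spins[ni, nj] = -seed
                    stack.append((ni, nj))
                    size += 1
        return size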
with $\boldsymbol\sigma_i = (\sigma_{i,1}, \sigma_{i,2}, \dots, \sigma_{i,n})$ and $|\boldsymbol\sigma_i| = 1$, one needs a new strategy for $n \ge 2$ [63, 69, 70, 71] (the case n = 1 degenerates again to the Ising model). Here the basic idea is to isolate Ising degrees of freedom by projecting the spins $\boldsymbol\sigma_i$ onto a randomly chosen unit vector $\boldsymbol r$,
$$\boldsymbol\sigma_i = \boldsymbol\sigma_i^{\parallel} + \boldsymbol\sigma_i^{\perp}\,, \qquad \boldsymbol\sigma_i^{\parallel} = \epsilon\,|\boldsymbol\sigma_i\cdot\boldsymbol r|\,\boldsymbol r\,, \qquad \epsilon = \operatorname{sign}(\boldsymbol\sigma_i\cdot\boldsymbol r)\,, \qquad(4.44)$$
with positive random couplings $J_{ij} = J|\boldsymbol\sigma_i\cdot\boldsymbol r||\boldsymbol\sigma_j\cdot\boldsymbol r| \ge 0$, whose Ising degrees of freedom $\epsilon_i$ can be updated with a cluster algorithm as described above.
Table 4.2. Dynamical critical exponents z for the 2D and 3D Ising model (τ ∝ Lz ). The sub-
scripts indicate the observables and method used (exp resp. int: exponential resp. integrated
autocorrelation time, rel: relaxation, dam: damage spreading)
The intimate relationship of cluster algorithms with the correlated percolation rep-
resentation of Fortuin and Kasteleyn leads to another quite important improvement
which is not directly related with the dynamical properties discussed so far. Within
the percolation picture, it is quite natural to introduce alternative estimators (mea-
surement prescriptions) for most standard quantities which turn out to be so-called
improved estimators. By this one means measurement prescriptions that yield the
same expectation value as the standard ones but have a smaller statistical variance
which helps to reduce the statistical errors. Suppose we want to measure the expectation value $\langle O\rangle$ of an observable O. Then any estimator $\hat O$ satisfying $\langle\hat O\rangle = \langle O\rangle$ is permissible. This does not determine $\hat O$ uniquely, since there are infinitely many other possible choices $\hat O' = \hat O + \hat X$, as long as the added estimator $\hat X$ has zero expectation $\langle\hat X\rangle = 0$. The variance of the estimator $\hat O'$, however, can be quite different and is not necessarily related to any physical quantity (contrary to the standard mean-value estimator of the energy, for instance, whose variance is proportional to the specific heat). It is exactly this freedom in the choice of $\hat O$ which allows the construction of improved estimators.
For the single-cluster algorithm an improved cluster estimator for the spin-spin
correlation function in the high-temperature phase G(xi − xj ) ≡ σ i · σ j is given
by [71]
When introducing the importance sampling technique in Sect. 4.3.1 it was already indicated in (4.22) that within Markov chain Monte Carlo simulations, the expectation value $\langle O\rangle$ of some quantity O, for instance the energy, can be estimated as the arithmetic mean
$$\langle O\rangle = \sum_{\{\sigma_i\}} O(\{\sigma_i\})\,P^\text{eq}(\{\sigma_i\}) \approx \overline{O} = \frac{1}{N}\sum_{j=1}^{N} O_j\,, \qquad(4.50)$$
Fig. 4.5. Phase-ordering with progressing Monte Carlo time (from left to right) of an initially
disordered spin configuration for the 2D Ising model at T = 1.5 ≈ 0.66 Tc [93]
and z is the dynamical critical exponent already introduced in Sect. 4.3.2. In the case
of a simple ferromagnet like the Ising- or q-state Potts model with a non-conserved
scalar order parameter, below Tc the dynamical exponent can be found exactly as
z = 2 [94], according to diffusion or random-walk arguments. Right at the transition
temperature, critical dynamics (for a recent review, see [95]) plays the central role
and the dynamical exponent of, e.g., the 2D Ising model takes the somewhat larger non-trivial value $z \approx 2.17$ [78, 79], cf. Table 4.2. To equilibrate the whole system, ξ
must approach the system size L, so that the typical relaxation time for equilibration
scales as
τrelax ∼ Lz . (4.51)
Note that this implies in the infinite-volume limit L → ∞ that true equilibrium can
never be reached.
Since 1/z < 1, the relaxation process after the quench happens on a growing
time scale. This can be revealed most clearly by measurements of two-time quan-
tities f (t, s) with t > s, which no longer transform time-translation invariantly as
they would do for small perturbations in equilibrium, where f would be a function
of the time difference t − s only. Instead, in phase-ordering kinetics, two-time quan-
tities depend non-trivially on the ratio t/s of the two times. The dependence of the
relaxation on the so-called waiting time s is the notional origin of ageing: Older
samples respond more slowly.
For the most commonly considered two-time quantities, dynamical scaling
forms can be theoretically predicted (for recent reviews see, e.g., [96, 97]). Well
studied are the two-time autocorrelation function (here in q-state Potts model
notation)
$$C(t,s) = \frac{1}{q-1}\left[\frac{q}{V}\sum_{i=1}^{V}\left[\left\langle\delta_{\sigma_i(t),\sigma_i(s)}\right\rangle\right]_\text{av} - 1\right] = s^{-b} f_C(t/s)\,, \qquad(4.52)$$
with the asymptotic behavior $f_C(x) \to x^{-\lambda_C/z}$ ($x \gg 1$), and the two-time response function
$$R(t,s) = \left.\frac{\delta\,[\langle\sigma_i(t)\rangle]_\text{av}}{\delta h_i(s)}\right|_{h=0} = s^{-1-a} f_R(t/s)\,, \qquad(4.53)$$
where $f_R(x) \to x^{-\lambda_R/z}$ ($x \gg 1$). Here h(s) is the amplitude of a small spa-
tially random external field which is switched off after the waiting time s and [. . .]av
denotes an average over different random initial configurations (and random fields
in (4.53)). In phase-ordering kinetics after a quench to T < Tc , in general b = 0 (and
z = 2) [94], but all other exponents depend on the dimensionality of the considered
system. In the simplest case of the Ising model in two dimensions, it is commonly
accepted that λC = λR = 5/4. The value of the remaining exponent a, however, is
more controversial [98, 99], with strong claims for a = 1/z = 1/2 [96, 100], but
also a = 1/4 [101, 102] has been conjectured. In computer simulation studies the
two-time response function is rather difficult to handle and it is more convenient to
consider the integrated response or thermoremanent magnetization (TRM) [103],
$$\rho(t,s) = T\int_0^s du\, R(t,u) = \frac{T}{h}\,M_\text{TRM}(t,s)\,. \qquad(4.54)$$
4.5.2.1 Estimators
The variance of an individual measurement O is
$$\sigma_O^2 = \langle[O - \langle O\rangle]^2\rangle = \langle O^2\rangle - \langle O\rangle^2\,, \qquad(4.55)$$
and the variance of the mean value $\overline O = (1/N)\sum_{i=1}^N O_i$ can be decomposed into a diagonal and an off-diagonal part,
$$\sigma_{\overline O}^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_{O_i}^2 + \frac{1}{N^2}\sum_{i\ne j}^{N}\left(\langle O_i O_j\rangle - \langle O_i\rangle\langle O_j\rangle\right)\,. \qquad(4.57)$$
Assuming equilibrium, the individual variances $\sigma_{O_i}^2 = \langle O_i^2\rangle - \langle O_i\rangle^2$ do not depend on "time" i, such that the first term gives $\sigma_{O_i}^2/N$. The second term with $\langle O_i O_j\rangle - \langle O_i\rangle\langle O_j\rangle = \langle(O_i - \langle O_i\rangle)(O_j - \langle O_j\rangle)\rangle$ records the correlations between measurements at times i and j. For completely uncorrelated data (which is, of course, an unrealistic assumption for importance sampling Monte Carlo simulations), the second term would vanish and (4.57) simplifies to
$$\epsilon_{\overline O}^2 \equiv \sigma_{\overline O}^2 = \sigma_{O_i}^2/N\,. \qquad(4.58)$$
This result is true for any distribution $\mathcal{P}(O_i)$. In particular, for the energy or magnetization, distributions of the individual measurements are often plotted as physically directly relevant (N independent) histograms (see, e.g., Fig. 4.8(b) below) whose squared width ($= \sigma_{O_i}^2$) is proportional to the specific heat or susceptibility, respectively.
Whatever form the distribution P(Oi ) assumes (which, in fact, is often close to
Gaussian because the Oi are usually already lattice averages over many degrees of
freedom), by the central limit theorem the distribution of the mean value is Gaussian,
at least for uncorrelated data in the asymptotic limit of large N. The variance of the mean, $\sigma_{\overline O}^2$, is the squared width of this (N dependent) distribution, which is usually taken as the one-sigma squared error, $\epsilon_{\overline O}^2 \equiv \sigma_{\overline O}^2$, and quoted together with the mean value $\overline O$. Under the assumption of a Gaussian distribution for the mean, the interpretation is that about 68% of all simulations under the same conditions would yield a mean value in the range $[\overline O - \sigma_{\overline O},\, \overline O + \sigma_{\overline O}]$ [113]. For a two-sigma
interval which also is sometimes used, this percentage goes up to about 95.4%, and
for a three-sigma interval which is rarely quoted, the confidence level is higher than
99.7%.
For correlated data the second term in (4.57) does not vanish and things become more involved [114, 115, 116]. Using the symmetry $i \leftrightarrow j$ to reduce the summation $\sum_{i\ne j}^{N}$ to $2\sum_{i=1}^{N}\sum_{j=i+1}^{N}$, reordering the summation, and using time-translation invariance in equilibrium, one finally obtains [111]
$$\sigma_{\overline O}^2 = \frac{1}{N}\left[\sigma_{O_i}^2 + 2\sum_{k=1}^{N}\left(\langle O_1 O_{1+k}\rangle - \langle O_1\rangle\langle O_{1+k}\rangle\right)\left(1 - \frac{k}{N}\right)\right]\,, \qquad(4.59)$$
where, due to the last factor $(1 - k/N)$, the $k = N$ term may be trivially kept in the summation. Factoring out $\sigma_{O_i}^2$, this can be written as
$$\epsilon_{\overline O}^2 \equiv \sigma_{\overline O}^2 = \frac{\sigma_{O_i}^2}{N}\,2\tau'_{O,\text{int}}\,, \qquad(4.60)$$
where we have introduced the (proper) integrated autocorrelation time
$$\tau'_{O,\text{int}} = \frac{1}{2} + \sum_{k=1}^{N} A(k)\left(1 - \frac{k}{N}\right)\,, \qquad(4.61)$$
with
$$A(k) \equiv \frac{\langle O_1 O_{1+k}\rangle - \langle O_1\rangle\langle O_{1+k}\rangle}{\sigma_{O_i}^2} \xrightarrow{\,k\to\infty\,} a\,e^{-k/\tau_{O,\text{exp}}}\,. \qquad(4.62)$$
The notion "integrated" derives from the fact that this may be interpreted as a trapezoidal discretization of the (approximate) integral $\tau_{O,\text{int}} \approx \int_0^N dk\, A(k)$. Notice that, in general, $\tau'_{O,\text{int}}$ (and also $\tau_{O,\text{int}}$) is different from $\tau_{O,\text{exp}}$. In fact, one can show [117] that $\tau_{O,\text{int}} \le \tau_{O,\text{exp}}$ in realistic models. Only if A(k) is a pure exponential do the two autocorrelation times, $\tau_{O,\text{int}}$ and $\tau_{O,\text{exp}}$, coincide (up to minor corrections for small $\tau_{O,\text{int}}$ [58, 111]).
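In an implementation, the sum (4.61) is truncated at an upper cutoff $k_\text{max}$ (see the discussion below); a minimal sketch in Python, with A the autocorrelation array computed, e.g., as in the earlier sketch:

    import numpy as np

    def tau_int(A, kmax):
        """Integrated autocorrelation time from A(k), truncated at kmax,
        cf. (4.61); the factor (1 - k/N) is negligible for kmax << N."""
        return 0.5 + float(np.sum(A[1:kmax + 1]))

    def tau_int_selfconsistent(A, factor=6.0):
        """Increase kmax until kmax >= factor * tau_int(kmax), the
        self-consistent cutoff criterion described in the text."""
        t = 0.5
        for k in range(1, len(A)):
            t += A[k]
            if k >= factor * t:
                return t, k
        return t, len(A) - 1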
As far as the accuracy of Monte Carlo data is concerned, the important point of (4.60) is that due to temporal correlations of the measurements the statistical error $\epsilon_{\overline O} \equiv \sqrt{\sigma_{\overline O}^2}$ on the Monte Carlo estimator $\overline O$ is enhanced by a factor of $\sqrt{2\tau_{O,\text{int}}}$. This can be rephrased by writing the statistical error similar to the uncorrelated case as $\epsilon_{\overline O} = \sqrt{\sigma_{O_i}^2/N_\text{eff}}$, but now with a parameter
$$N_\text{eff} = \frac{N}{2\tau_{O,\text{int}}} \le N$$
describing the effective statistics. This shows more clearly that only every $2\tau_{O,\text{int}}$ iterations are the measurements approximately uncorrelated, and gives a better idea
of the relevant effective size of the statistical sample. In view of the scaling behavior
of the autocorrelation time in (4.34), (4.35) or (4.37), it is obvious that without extra
care this effective sample size may become very small close to a continuous or first-
order phase transition, respectively.
4.5.2.4 Bias
A too small effective sample size does not only affect the error bars, but for some
quantities even the mean values can be severely underestimated. This happens for
so-called biased estimators, as is for instance the case for the specific heat and susceptibility. The specific heat can be computed as $C = \beta^2 V\left(\langle e^2\rangle - \langle e\rangle^2\right) = \beta^2 V \sigma_{e_i}^2$, with the standard estimator for the variance
$$\hat\sigma_{e_i}^2 = \overline{e^2} - \overline{e}^2 = \overline{(e - \overline e)^2} = \frac{1}{N}\sum_{i=1}^{N}(e_i - \overline e)^2\,. \qquad(4.65)$$
Subtracting and adding $\langle\overline e\rangle^2$, one finds for the expectation value
$$\langle\hat\sigma_{e_i}^2\rangle = \langle\overline{e^2}\rangle - \langle\overline e^2\rangle = \left(\langle e^2\rangle - \langle e\rangle^2\right) - \left(\langle\overline e^2\rangle - \langle\overline e\rangle^2\right) = \sigma_{e_i}^2 - \sigma_{\overline e}^2\,. \qquad(4.66)$$
The estimator $\hat\sigma_{e_i}^2$ in (4.65) thus systematically underestimates the true value by a term of the order of $\tau_{e,\text{int}}/N$. Such an estimator is called weakly biased (weakly because the statistical error $\propto 1/\sqrt{N}$ is asymptotically larger than the systematic bias; for medium or small N, however, also prefactors need to be carefully considered).
We thus see that for large autocorrelation times or equivalently small effective
statistics Neff , the bias may be quite large. Since τe,int scales quite strongly with
the system size for local update algorithms, some care is necessary when choosing
the run time N . Otherwise the FSS of the specific heat or susceptibility and thus the
determination of the static critical exponent α/ν or γ/ν could be completely spoiled
by the temporal correlations [118]! Any serious simulation study should therefore
provide at least a rough order-of-magnitude estimate of autocorrelation times.
The above considerations show that not only for the error estimation but also for the computation of static quantities themselves, it is important to have control over the autocorrelation time. A direct estimator is obtained by truncating the sum in (4.61) at an upper cutoff $k_\text{max}$,
$$\tau_{O,\text{int}}(k_\text{max}) = \frac{1}{2} + \sum_{k=1}^{k_\text{max}} A(k)\,,$$
which approaches $\tau_{O,\text{int}}$ in the limit of large $k_\text{max}$ where, however, its statistical error increases rapidly. As an example, Fig. 4.6(a) shows results for the 2D Ising model from an analysis of the same raw data as in Fig. 4.2.
As a compromise between systematic and statistical errors, an often employed procedure is to determine the upper limit $k_\text{max}$ self-consistently by cutting off the summation once $k_\text{max} \ge 6\,\tau_{O,\text{int}}(k_\text{max})$, where $A(k) \approx e^{-6} \approx 10^{-3}$. In this case an a priori error estimate is available [116, 119, 120],
$$\epsilon_{\tau_{O,\text{int}}} = \sqrt{\frac{2(2k_\text{max}+1)}{N}}\;\tau_{O,\text{int}} \approx \sqrt{\frac{12}{N_\text{eff}}}\;\tau_{O,\text{int}}\,. \qquad(4.69)$$
For a 5% relative accuracy one thus needs at least $N_\text{eff} \approx 5\,000$ or $N \approx 10\,000\,\tau_{O,\text{int}}$ measurements. For an order of magnitude estimate consider the 2D Ising model on a square lattice with L = 100 simulated with a local update algorithm. Close to criticality, the integrated autocorrelation time for this example is of the order of $L^z \approx L^2 \approx 100^2 = 10^4$ (ignoring an a priori unknown prefactor of order unity), so that of the order of $10^8$ measurements would be required for this accuracy.
Fig. 4.6. (a) Integrated autocorrelation time approaching τe,int ≈ 5.93 for large upper cutoff
kmax and (b) binning analysis for the energy of the 2D Ising model on a 16×16 lattice at βc ,
using the same data as in Fig. 4.2. The horizontal line in (b) shows 2τe,int with τe,int read off
from (a)
It should be clear by now that ignoring autocorrelation effects can lead to severe
underestimates of statistical errors. Applying the full machinery of autocorrelation
analysis discussed above, however, is often too cumbersome. On a day by day basis
the following binning analysis is much more convenient (though somewhat less ac-
curate). By grouping the N original time-series data into NB non-overlapping bins
or blocks of length k (such that⁶ $N = N_B k$), one forms a new, shorter time series
of block averages,
$$O_j^{(B)} \equiv \frac{1}{k}\sum_{i=1}^{k} O_{(j-1)k+i}\,, \qquad(4.71)$$
with j = 1, . . . , NB , which by choosing the block length k ≫ τ are almost uncor-
related and can thus be analyzed by standard means. The mean value over all block
averages obviously satisfies O(B) = O and their variance can be computed accord-
ing to the standard (unbiased) estimator, leading to the squared statistical error of
the mean value
$$\epsilon_{\overline O}^2 \equiv \sigma_{\overline O}^2 = \sigma_B^2/N_B = \frac{1}{N_B(N_B-1)}\sum_{j=1}^{N_B}\left(O_j^{(B)} - \overline{O^{(B)}}\right)^2\,. \qquad(4.72)$$
By comparing with (4.60) we see that $\sigma_B^2/N_B = 2\tau_{O,\text{int}}\,\sigma_{O_i}^2/N$. Recalling the definition of the block length $k = N/N_B$, this shows that one may also use
$$2\tau_{O,\text{int}} = k\,\sigma_B^2/\sigma_{O_i}^2 \qquad(4.73)$$
for the estimation of τO,int . This is demonstrated in Fig. 4.6(b). Estimates of τO,int
obtained in this way are often referred to as blocking τ or binning τ .
A simple toy model (bivariate time series), where the behavior of the blocking
τ and also of τO,int (kmax ) for finite k resp. kmax can be worked out exactly, is dis-
cussed in [58]. These analytic formulas are very useful for validating the computer
implementations.
⁶ Here we assume that N was chosen cleverly. Otherwise one has to discard some of the data and redefine N.
Even if the data are completely uncorrelated in time, one still has to handle the
problem of error estimation for quantities that are not directly measured in the sim-
ulation but are computed as a non-linear combination of basic observables. This
problem can either be solved by error propagation or by using the Jackknife method
[122, 123], where instead of considering rather small blocks of length k and their fluctuations as in the binning method, one forms $N_B$ large Jackknife blocks $O_j^{(J)}$, containing all data but the j-th block of the previous binning method,
$$O_j^{(J)} = \frac{N\overline O - k\,O_j^{(B)}}{N - k} \qquad(4.74)$$
with j = 1, . . . , NB , cf. the schematic sketch in Fig. 4.7.
Each of the Jackknife blocks thus consists of N − k data, i.e., it contains almost
as many data as the original time series. When non-linear combinations of basic
variables are estimated, the bias is hence comparable to that of the total data set
(typically 1/(N − k) compared to 1/N ). The NB Jackknife blocks are, of course,
trivially correlated because one and the same original data enter in NB − 1 different
Jackknife blocks. This trivial correlation caused by re-using the original data over
and over again has nothing to do with temporal correlations. As a consequence,
the Jackknife block variance $\sigma_J^2$ will be much smaller than the variance estimated in the binning method. Because of the trivial nature of the correlations, however, this reduction can be corrected by multiplying $\sigma_J^2$ with a factor $(N_B - 1)^2$, leading to
$$\epsilon_{\overline O}^2 \equiv \sigma_{\overline O}^2 = \frac{N_B - 1}{N_B}\sum_{j=1}^{N_B}\left(O_j^{(J)} - \overline{O^{(J)}}\right)^2\,. \qquad(4.75)$$
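A minimal Jackknife sketch for a general (possibly non-linear) estimator func, along the lines of (4.74) and (4.75):

    import numpy as np

    def jackknife(O, k, func=np.mean):
        """Jackknife estimate and error of func(O) with NB = len(O)//k blocks."""
        O = np.asarray(O, dtype=float)
        NB = len(O) // k
        O = O[:NB * k]
        full = func(O)
        # Jackknife block j: all data except the j-th block of length k
        est = np.array([func(np.delete(O, slice(j * k, (j + 1) * k)))
                        for j in range(NB)])
        err = np.sqrt((NB - 1) / NB * np.sum((est - est.mean())**2))
        return full, err

Because func is applied to almost the full data set in each Jackknife block, the bias of non-linear combinations is of order 1/(N - k), comparable to that of the total data set.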
To summarize this section, any realization of a Markov chain Monte Carlo up-
date algorithm is characterized by autocorrelation times which enter directly into the
statistical errors of Monte Carlo estimates. Since temporal correlations always in-
crease the statistical errors, it is thus a very important issue to develop Monte Carlo
Fig. 4.7. Schematic sketch of the organization of Jackknife blocks. The grey part of the
N data points is used for calculating the total and the Jackknife block averages. The white
blocks enter into the more conventional binning analysis using non-overlapping blocks
update algorithms that keep autocorrelation times as small as possible. This is the
reason why cluster and other non-local algorithms are so important.
The physics underlying reweighting techniques [124, 125] is extremely simple and the basic idea has been known for a long time (see the list of references in [125]), but their power in practice was realized only relatively late, in 1988. The impor-
tant observation by Ferrenberg and Swendsen [124, 125] was that the best perfor-
mance is achieved near criticality where histograms are usually broad. In this sense
reweighting techniques are complementary to improved estimators, which usually
perform best off criticality.
If we normalized $P_{\beta_0}(E)$ to unit area, the r.h.s. would have to be divided by $\sum_E P_{\beta_0}(E) = Z(\beta_0)$, but the normalization will be unimportant in what follows. Let us assume we have performed a Monte Carlo simulation at inverse temperature $\beta_0$ and thus know $P_{\beta_0}(E)$. It is then easy to see that
$$P_\beta(E) \propto P_{\beta_0}(E)\, e^{-(\beta-\beta_0)E}\,,$$
i.e., the histogram at any point β can be derived, in principle, by reweighting the
simulated histogram at β0 with the exponential factor exp[−(β − β0 )E]. Notice that
in reweighted expectation values,
$$\langle f(E)\rangle(\beta) = \frac{\sum_E f(E)\,P_\beta(E)}{\sum_E P_\beta(E)}\,, \qquad(4.79)$$
the normalization of $P_\beta(E)$ indeed cancels. This gives, for instance, the energy $\langle e\rangle(\beta) = \langle E\rangle(\beta)/V$ and the specific heat $C(\beta) = \beta^2 V\left[\langle e^2\rangle(\beta) - \langle e\rangle(\beta)^2\right]$, as shown in Fig. 4.8.
Fig. 4.8. (a) The specific heat of the 2D Ising model on a 16×16 square lattice computed by reweighting from a single Monte Carlo simulation at $\beta_0 = \beta_c$, marked by the filled data symbol. The continuous line shows for comparison the exact solution of Kaufman [12, 13]. (b) The corresponding energy histogram at $\beta_0$, and reweighted to β = 0.375 and β = 0.475. The dashed lines show for comparison the exact histograms obtained from Beale's expression [112]

⁷ For simplicity we consider here only models with discrete energies. If the energy varies continuously, sums have to be replaced by integrals, etc. Also lattice size dependences are suppressed to keep the notation short.

As a rule of thumb, the range over which reweighting should produce accurate results can be estimated by requiring that the peak location of the reweighted histogram should not exceed the energy value at which the input histogram has decreased to about one half or one third of its maximum value. In most applications
this range is wide enough to locate from a single simulation, e.g., the specific-heat
maximum by employing a standard maximization subroutine to the continuous func-
tion C(β). This is by far more convenient, accurate and faster than the traditional
way of performing many simulations close to the peak of C(β) and trying to deter-
mine the maximum by spline or least-squares fits.
For an analytical estimate of the reweighting range we now require that the peak of the reweighted histogram is within the width $\langle e\rangle(T_0) \pm \Delta e(T_0)$ of the input histogram (where a Gaussian histogram would have decreased to $\exp(-1/2) \approx 0.61$ of its maximum value),
$$|\langle e\rangle(T) - \langle e\rangle(T_0)| \le \Delta e(T_0)\,, \qquad(4.80)$$
where we have made use of the fact that for a not too asymmetric histogram $P_{\beta_0}(E)$ the maximum location approximately coincides with $\langle e\rangle(T_0)$. Recalling that the half width $\Delta e$ of a histogram is related to the specific heat via $(\Delta e)^2 \equiv \langle(e - \langle e\rangle)^2\rangle = \langle e^2\rangle - \langle e\rangle^2 = C(\beta_0)/\beta_0^2 V$ and using the Taylor expansion $\langle e\rangle(T) = \langle e\rangle(T_0) + C(T_0)(T - T_0) + \dots$, this can be written as $C(T_0)|T - T_0| \le T_0\sqrt{C(T_0)/V}$ or
$$\frac{|T - T_0|}{T_0} \le \frac{1}{\sqrt{V\,C(T_0)}}\,. \qquad(4.81)$$
Since C(T0 ) is known from the input histogram this is quite a general estimate of the
reweighting range. For the example in Fig. 4.8 with V =16×16, β0 = βc ≈ 0.44
and C(T0 ) ≈ 1.5, this estimate yields |β − β0 |/β0 ≈ |T − T0 |/T0 ≤ 0.04, i.e.,
|β − β0 | ≤ 0.02 or 0.42 ≤ β ≤ 0.46. By comparison with the exact solution we see
that this is indeed a fairly conservative estimate of the reliable reweighting range.
If we only want to know the scaling behavior with system size V = LD , we can
go one step further by considering three generic cases:
(i) Off-criticality, where $C(T_0) \approx \text{const}$, such that
$$\frac{|T - T_0|}{T_0} \propto V^{-1/2} = L^{-D/2}\,. \qquad(4.82)$$
(ii) Criticality, where $C(T_0) \simeq a_1 + a_2 L^{\alpha/\nu}$, with $a_1$ and $a_2$ being constants, and α and ν denoting the standard critical exponents of the specific heat and correlation length, respectively. For α > 0, the leading scaling behavior becomes $|T - T_0|/T_0 \propto L^{-D/2} L^{-\alpha/2\nu}$. Assuming hyperscaling ($\alpha = 2 - D\nu$) to be valid, this simplifies to
$$\frac{|T - T_0|}{T_0} \propto L^{-1/\nu}\,, \qquad(4.83)$$
i.e., the typical scaling behavior of pseudo-transition temperatures in the finite-
size scaling regime of a second-order phase transition [126]. For α < 0, C(T0 )
approaches asymptotically a constant and the leading scaling behavior of the
reweighting range is as in the off-critical case.
(iii) First-order transitions, where $C(T_0) \propto V$, such that
$$\frac{|T - T_0|}{T_0} \propto V^{-1} = L^{-D}\,, \qquad(4.84)$$
which is again the typical finite-size scaling behavior of pseudo-transition tem-
peratures close to a first-order phase transition [47].
we arrive at
$$\langle g(M)\rangle = \sum_E \langle\langle g(M)\rangle\rangle(E)\,P_{\beta_0}(E)\,. \qquad(4.87)$$
Identifying $\langle\langle g(M)\rangle\rangle(E)$ with f(E) in (4.79), the actual reweighting procedure is precisely as before. An example for computing $\langle\langle|M|\rangle\rangle(E)$ and $\langle\langle M^2\rangle\rangle(E)$ using the data of Fig. 4.8 is shown in Fig. 4.9. Mixed quantities, e.g. $\langle E^k M^l\rangle$, can be treated similarly. One caveat of this method is that one has to decide beforehand which lists $\langle\langle g(M)\rangle\rangle(E)$ one wants to store during the simulation, e.g., which powers k in $\langle\langle M^k\rangle\rangle(E)$ are relevant.
An alternative and more flexible method is based on time series. Suppose we have performed a Monte Carlo simulation at $\beta_0$ and stored the time series of N measurements $E_j$ and $M_j$, $j = 1, \dots, N$. Then the expectation value at β follows from
$$\langle f(E,M)\rangle = \frac{\sum_{j=1}^{N} f(E_j, M_j)\, e^{-(\beta-\beta_0)E_j}}{\sum_{j=1}^{N} e^{-(\beta-\beta_0)E_j}}\,, \qquad(4.88)$$
Fig. 4.9. Microcanonical expectation values for (a) the absolute magnetization and (b) the magnetization squared obtained from the 2D Ising model simulations shown in Fig. 4.8
i.e., in particular all moments $\langle E^k M^l\rangle$ can be computed. Notice that this can also be written as
$$\langle f(E,M)\rangle = \frac{\left\langle f(E,M)\, e^{-(\beta-\beta_0)E}\right\rangle_0}{\left\langle e^{-(\beta-\beta_0)E}\right\rangle_0}\,, \qquad(4.89)$$
where the subscript zero refers to expectation values taken at β0 . Another very im-
portant advantage of the last formulation is that it works without any systematic
discretization error also for continuously distributed energies and magnetizations.
As nowadays hard-disk space is no real limitation anymore, it is advisable to store time series in any case. This guarantees the greatest flexibility in the data analysis. As far as the memory requirement of the actual reweighting code is concerned, however, the method of choice is sometimes not so clear. Using histograms and lists directly, one typically has to store about (6-8)V data, while working directly with the time series one needs 2N computer words. The cheaper solution (also in terms of CPU time) thus obviously depends on both the system size V and the run length N. It is hence sometimes faster to generate from the time series first histograms and the required lists and then proceed by reweighting the latter quantities.
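A minimal time-series reweighting routine along the lines of (4.89) (the energy shift is a standard numerical-stability precaution, not part of the formula):

    import numpy as np

    def reweight(E, O, beta0, beta):
        """Single-histogram reweighting of time-series data, cf. (4.89):
        <O>(beta) = <O e^{-(beta-beta0)E}>_0 / <e^{-(beta-beta0)E}>_0."""
        E = np.asarray(E, dtype=float)
        w = np.exp(-(beta - beta0) * (E - E.min()))   # shift for stability
        return np.sum(np.asarray(O) * w) / np.sum(w)

    # e.g., the specific heat curve from a single run at beta0, with e = E/V:
    #   C(beta) = beta**2 * V * (reweight(E, e**2, beta0, beta)
    #                            - reweight(E, e, beta0, beta)**2)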
Here we shall assume that the histograms $P_{\beta_i}(E)$ are naturally normalized, $\sum_E P_{\beta_i}(E) = N_i$, such that the statistical errors for each of the histograms $P_{\beta_i}(E)$ are approximately given by $\sqrt{P_{\beta_i}(E)}$. By choosing as reference point $\beta_0 = 0$ and working out the error-weighted combined histogram one ends up with
working out the error weighted combined histogram one ends up with
m
Pβi (E)
Ω(E) = m i=1 −1 −βi E
, (4.90)
i=1 i i e
N Z
where the unknown partition function values $Z_i \equiv Z(\beta_i)$ are determined self-consistently from
$$Z_i = \sum_E \Omega(E)\, e^{-\beta_i E} = \sum_E e^{-\beta_i E}\,\frac{\sum_{k=1}^{m} P_{\beta_k}(E)}{\sum_{k=1}^{m} N_k Z_k^{-1} e^{-\beta_k E}}\,, \qquad(4.91)$$
$$U_2(\beta) = 1 - \frac{\langle m^2\rangle}{3\langle|m|\rangle^2}\,, \qquad U_4(\beta) = 1 - \frac{\langle m^4\rangle}{3\langle m^2\rangle^2}\,, \qquad(4.95)$$
Further quantities with a useful FSS behavior are the derivatives of the magnetization,
$$\begin{aligned} \frac{d\langle|m|\rangle}{d\beta} &= V\left(\langle|m|e\rangle - \langle|m|\rangle\langle e\rangle\right)\,,\\ \frac{d\ln\langle|m|\rangle}{d\beta} &= V\left(\frac{\langle|m|e\rangle}{\langle|m|\rangle} - \langle e\rangle\right)\,,\\ \frac{d\ln\langle m^2\rangle}{d\beta} &= V\left(\frac{\langle m^2 e\rangle}{\langle m^2\rangle} - \langle e\rangle\right)\,. \end{aligned} \qquad(4.97)$$
These latter five quantities are good examples of expectation values containing both powers of e and m.
In the infinite-volume limit most of these quantities exhibit singularities at the
transition point. As already discussed in Sect. 4.2, in finite systems the singularities
are smeared out and the standard observables scale in the critical region according to

C = Creg + L^{α/ν} f_C(x) [1 + ...] ,  (4.98)

⟨|m|⟩ = L^{−β/ν} f_m(x) [1 + ...] ,  (4.99)

χ = L^{γ/ν} f_χ(x) [1 + ...] ,  (4.100)

where Creg is a regular background term, α, ν, β (in the exponent of L) and γ are
the usual critical exponents, and f_i(x) are FSS functions with
x = (β − βc) L^{1/ν}  (4.101)

being the scaling variable (do not confuse the unfortunate double meaning of β –
here β = 1/kB T). The brackets [1 + ...] indicate corrections-to-scaling terms which
become unimportant for sufficiently large system sizes L.

[Footnote: Due to the double-peak structure of the magnetization distribution (provided the
runs are long enough), by averaging m one gets zero by symmetry, while the peak locations
±m0(L) are close to the spontaneous magnetization, and the average of |m| is a good
estimator. Things become more involved for slightly asymmetric models, where this recipe
would produce a systematic error and thus cannot be employed. For strongly asymmetric
models, on the other hand, one peak clearly dominates and the average of m can usually be
measured without too many problems.]
A particular role is played by the magnetic cumulants or Binder parameters U2 and
U4, which scale according to

U_{2p} = f_{U_{2p}}(x) [1 + ...] ,  (4.102)
i.e., for constant scaling variable x, they take approximately the same value for
all lattice sizes, in particular U*_{2p} ≡ f_{U_{2p}}(0) at βc. Their curves as a function of
temperature for different L hence cross around (βc, U*_{2p}) (with slopes ∝ L^{1/ν}), apart
from corrections-to-scaling collected in [1 + ...] which explain small systematic
deviations. From a determination of this crossing point, one thus obtains a basically
unbiased estimate of βc, the critical exponent ν, and U*_{2p}. Note that in contrast to the
truly universal critical exponents, U*_{2p} is only weakly universal. By this one means
that the infinite-volume limit of such quantities does depend in particular on the
boundary conditions and geometrical shape of the considered lattice, e.g., on the
aspect ratio r = Ly/Lx [129, 130, 131, 132, 133, 134, 135, 136].
Differentiating U_{2p} with respect to β, one picks up an extra power of L from the
scaling function, dU_{2p}/dβ = (dx/dβ) f′_{U_{2p}} = L^{1/ν} f′_{U_{2p}}. This leads to

dU_{2p}/dβ = L^{1/ν} f′_{U_{2p}}(x) [1 + ...] ,  (4.103)

and similarly for the magnetization derivatives

d⟨|m|⟩/dβ = L^{(1−β)/ν} f′_m(x) [1 + ...] ,  (4.104)

d ln⟨|m|^p⟩/dβ = L^{1/ν} f_{dmp}(x) [1 + ...] .  (4.105)
By applying standard reweighting techniques to the time-series data one first
determines the temperature dependence of C(β), χ(β), . . . , in the neighborhood
of the simulation point β0 ≈ βc (a reasonably good guess of β0 can usually be
obtained quite easily from a few short test runs). It should be stressed that in a
serious study, by estimating the valid reweighting range, one should at any rate
make sure that no systematic errors have crept in through this procedure (which may
be easily overlooked if one works too mechanically). Once the temperature dependence
is known, one can determine the maxima, e.g., Cmax (βmaxC ) ≡ maxβ C(β), by
applying standard extremization routines: When reweighting is implemented as a
subroutine, for instance C(β) can be handled as a normal function with a con-
tinuously varying argument β, i.e., no interpolation or discretization error is in-
volved when iterating towards the maximum. The locations of the maxima of C,
χ, dU2/dβ, dU4/dβ, d⟨|m|⟩/dβ, d ln⟨|m|⟩/dβ, and d ln⟨m²⟩/dβ provide us with
seven sequences of pseudo-transition points β_max_i(L) which all should scale ac-
cording to β_max_i(L) = βc + a_i L^{−1/ν} + .... In other words, the scaling variable
x = (β_max_i(L) − βc) L^{1/ν} = a_i + ... should be constant, if we neglect the small
higher-order corrections indicated by ....
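In code this step is a one-liner on top of the reweighting sketch given earlier (again our own illustrative names; E, V and beta0 denote the stored energy time series, the system volume and the simulation point, and the search window must of course stay inside the valid reweighting range):

```python
from scipy.optimize import minimize_scalar

# C(beta) from reweighting is a smooth function of the continuous argument
# beta, so a standard 1D optimizer locates the maximum directly -- no
# interpolation or discretization error is involved.
res = minimize_scalar(lambda b: -specific_heat(E, V, beta0, b),
                      bounds=(beta0 - 0.02, beta0 + 0.02), method="bounded")
beta_max_C, C_max = res.x, -res.fun
```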
Notice that while the precise estimates of ai do depend on the value of ν, the
qualitative conclusion that x ≈ const for each of the βmaxi (L) sequences does not
require any a priori knowledge of ν or βc . Using this information one thus has
several possibilities to extract unbiased estimates of the critical exponents ν, α/ν,
β/ν, and γ/ν from least-squares fits assuming the FSS behaviors (4.98)–(4.105).
(iii) As a useful cross-check one can determine βc also from the Binder parameter
crossings. For a first rough estimate, this is a very convenient and fast method.
Remarks: As a rule of thumb, an accuracy of about 3–4 digits for βc can be
obtained with this method without any elaborate infinite-volume extrapolations
– the crossing points usually lie much closer to βc than the various maxima
locations. For high precision, however, it is quite cumbersome to control the
necessary extrapolations, and often more accurate estimates can be obtained
by considering the scaling of the maxima locations. Also, error estimates of
crossing points involve the data for two different lattice sizes, which tends to be
quite unhandy.
(iv) Next, similarly to ν, the ratios of critical exponents α/ν, β/ν, and γ/ν can be
obtained from fits to (4.98)–(4.100), and (4.104). Again the maxima of these
quantities or any of the FSS sequences β_max_i can be used. Concerning the
fitting procedure, the same remarks apply as for ν.
Remarks: The specific heat C usually plays a special role in that the expo-
nent α is difficult to determine. The reason is that α is usually relatively small
(3D Ising model: α ≈ 0.1), may be zero (logarithmic divergence as in the 2D
Ising model) or even negative (as for instance in the 3D XY and Heisenberg
models). In all these cases, the constant background contribution Creg in (4.98)
becomes important, which enforces a non-linear three-parameter fit with the
problems just described. Also for the susceptibility χ, a regular background
term cannot be excluded, but it is usually much less important since γ ≫ α.
Therefore, in (4.99), (4.100), and (4.104), similar to the fits for ν, one may take
the logarithm and work with much more stable linear fits.
(v) As a final step one may re-check the FSS behavior of C, χ, dU2 /dβ, . . . at the
numerically determined estimate of βc . These fits should be repeated also at
βc ± ∆βc in order to estimate by how much the uncertainty in βc propagates
into the thus determined exponent estimates.
Remark: In (the pretty rare) cases where βc is known exactly (e.g., through self-
duality), this latter option is by far the most accurate one. This is the reason
why for such models numerically estimated critical exponents are usually also
much more precise.
The purpose of this subsection is to illustrate the above outlined recipe with ac-
tual data from recent simulations of a 2D Ising model with next-nearest neighbor
interactions [137]. The Hamiltonian has the form
H = −J Σ_{⟨i,j⟩} σ_i σ_j − J_d Σ_{(k,l)} σ_k σ_l ,  (4.106)
where the spins can take the values σi = ±1, J denotes the nearest-neighbor (nn)
coupling and Jd is the next-nearest-neighbor (nnn) coupling along the two diagonals
of a square lattice. The corresponding pairs of spins are denoted by the brackets
⟨i, j⟩ and (k, l), respectively. In [137] we restricted ourselves to that region
of the phase diagram where the ground states show ferromagnetic order (J ≥ 0,
Jd ≥ −J/2), and always assumed periodic boundary conditions. Absorbing the nn
coupling J into the inverse temperature β (i.e., formally putting J = 1), the remain-
ing second parameter is the coupling-constant ratio α = Jd /J. In the following we
will concentrate on the case α = 0.5 [138]. The linear size of the lattices varies from
L = 10, 20, 40, . . . up to 640. All simulations are performed with the single-cluster
algorithm which is straightforward to adapt to nnn interactions by setting bonds also
along the diagonals. Similarly to the standard nn model, the integrated autocorrela-
tion time close to criticality is found for α = 1 [137] to scale only weakly with
lattice size: τe,int ∝ Lz with z = 0.134(3).
Another example following closely the lines sketched above is provided by a
Monte Carlo study of the 3D Ising model, albeit not on a regular but on Poissonian
random lattices of Voronoi-Delaunay type [139]. The random lattices are treated as
quenched disorder in the local coordination numbers and hence necessitate an ad-
ditional average over many realizations (in the study described in [139], for each
lattice size 96 independent realizations were used). This introduces in all FSS for-
mulas additional disorder averages which complicate some aspects of the analysis.
The general concept of FSS analysis, however, does not depend on this special fea-
ture and it may be worthwhile to consult [139] for a supplementary example.
Having recorded the time series of the energy and magnetization, all quantities of
the preceding paragraph can be computed in the FSS region. The scaling behavior
of the maxima of d ln⟨|m|^p⟩/dβ and dU_{2p}/dβ for p = 1 and p = 2 is shown in
the log-log plot of Fig. 4.10.
[Fig. 4.10: log-log plot of ln O_max versus ln L for the four maxima
(d ln⟨m²⟩/dβ)_max, (d ln⟨|m|⟩/dβ)_max, (dU4/dβ)_max, and (dU2/dβ)_max]
Table 4.3. Fit results for the correlation length critical exponent ν of the 2D nnn Ising model
with α = Jd /J = 0.5, and the weighted average of the four estimates. Also listed are the χ2
per degree of freedom, χ2 /d.o.f., and the goodness-of-fit parameter Q [113]
             d ln⟨|m|⟩/dβ   d ln⟨m²⟩/dβ   dU2/dβ       dU4/dβ       weighted av.
ν            1.0031(17)     1.0034(21)    1.0027(24)   1.0025(44)   1.0031(11)
χ²/d.o.f.    0.98           0.60          2.02         0.49
Q            0.37           0.55          0.13         0.61
From the parameters of the four linear fits over the data points with Lmin > 40
collected in Table 4.3, we obtain a weighted average of ν = 1.003 1 ± 0.001 1.
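Such fits and the error weighted average are easily reproduced; the following sketch uses synthetic stand-in data (illustrative only, not the actual simulation results):

```python
import numpy as np

def weighted_average(vals, errs):
    """Error weighted average and its error, as quoted in Table 4.3."""
    w = 1.0 / np.asarray(errs) ** 2
    return np.sum(w * vals) / np.sum(w), 1.0 / np.sqrt(np.sum(w))

# Synthetic stand-in data: O_max ~ L^{1/nu} with nu = 1 and 0.2% noise.
rng = np.random.default_rng(1)
L = np.array([80.0, 160.0, 320.0, 640.0])
Omax = 0.4 * L * rng.normal(1.0, 0.002, L.size)

# Linear fit ln O_max = const + (1/nu) ln L, weighted with 1/sigma(ln O).
p, cov = np.polyfit(np.log(L), np.log(Omax), 1,
                    w=np.full(L.size, 1.0 / 0.002), cov=True)
nu = 1.0 / p[0]
dnu = np.sqrt(cov[0, 0]) / p[0] ** 2   # error propagation for nu = 1/slope
```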
As the more detailed analysis in [139] clearly shows, considering all 4×7 =
28 possible FSS sequences (the four observables shown in Fig. 4.10 evaluated at
the seven different β_max_i sequences) does not significantly improve the precision
of the final estimate. The reason is the quite strong correlation between most of
these 28 estimates. In a really large-scale simulation such a more detailed analysis
may still be valuable, however, since it potentially helps to detect systematic trends
which otherwise may remain unnoticed. Also here the weighted average is clearly
dominated by the result from the d ln|m|/dβ fit, and correlations between the first
and second pair of estimates are obvious. Therefore, to account for these correlations
at least heuristically, we usually quote in our investigations the weighted average,
but take the smallest contributing error estimate (here thus from the d ln|m|/dβ
fit). This recipe then gives from Table 4.3 the final result

ν = 1.003 1 ± 0.001 7 ,  (4.107)

in good agreement with the 2D Ising universality class (cf. Table 4.1).
Fixing the critical exponent ν at the numerically determined estimate (or in the
present context at the exactly known value ν = 1), it is now straightforward to
obtain estimates of the critical coupling βc from linear least-squares fits to

β_max_i(L) = βc + a_i L^{−1/ν} ,  (4.108)

where β_max_i are the pseudo-transition points discussed earlier. Depending on the
quantity considered, here we found a significant improvement of the fit quality if the
smallest lattice sizes were excluded. This is illustrated in Table 4.4, where detailed
results for various fit ranges are compiled.
As final result we quote the weighted average of the five estimates and again the
smallest contributing error bar,

βc = 0.262 817 4(17) .  (4.109)
Table 4.4. FSS fits of the pseudo-transition points β_max = βc + a L^{−1/ν} of the nnn model
(4.106) with α = 0.5 for varying fit ranges, assuming ν = 1. Here n is the number of data
points, Lmin denotes the smallest lattice size considered, and Q is the standard goodness-of-fit
parameter. The selected fit ranges used for the final average are highlighted in boldface. The
last line labeled HTS gives a high-temperature series estimate [140] for comparison
observable             n   Lmin   βc               Q
β_max^C                7   10     0.262 699(13)    0.00
                       6   20     0.262 766(15)    0.03
                       5   40     0.262 799(18)    0.88
                       4   80     0.262 807(22)    0.89
β_inf^{⟨|m|⟩}          7   10     0.262 8706(36)   0.00
                       6   20     0.262 8398(40)   0.00
                       5   40     0.262 8272(47)   0.16
                       4   80     0.262 8212(58)   0.38
β_max^χ                7   10     0.262 8253(12)   0.00
                       6   20     0.262 8195(13)   0.00
                       5   40     0.262 8178(14)   0.09
                       4   80     0.262 8153(17)   0.66
β_inf^{ln⟨|m|⟩}        7   10     0.262 8437(62)   0.00
                       6   20     0.262 8243(68)   0.24
                       5   40     0.262 8183(77)   0.42
                       4   80     0.262 8099(97)   0.70
β_inf^{ln⟨m²⟩}         7   10     0.262 8684(94)   0.00
                       6   20     0.262 837(11)    0.43
                       5   40     0.262 837(13)    0.57
                       4   80     0.262 818(17)    0.55
average                           0.262 8204(144)
weighted average                  0.262 8174(16)
final                             0.262 8174(17)
HTS (Oitmaa [140])                0.262 808
The corrections to the asymptotic FSS behavior can also be inspected visually
in Fig. 4.11, where the Monte Carlo data and fits are compared. One immediately
notices a systematic trend that the L = 10 data deviate from the linear behavior.
For larger L, however, the deviations are already so small that only a quantitative
judgement in terms of the χ2 per degree of freedom or goodness-of-fit parameter Q
of the fits [113] can lead to a sensible conclusion.
Following our general recipe sketched above, the Binder parameter U4 (L) is shown
in Fig. 4.12 as a function of temperature. Even though the temperature range is much
Fig. 4.11. FSS fits of the pseudo-transition points βmaxi with ν = 1.0 fixed of the 2D nnn Ising
model (4.106) with α = Jd /J = 0.5. The error weighted average of the FSS extrapolations
yields βc = 0.262 817 4(16), cf. Table 4.4 for details
smaller than in the βmaxi plot of Fig. 4.11, a clear-cut crossing point can be observed.
Already from the crossing of the two curves for the very modestly sized lattices with
L = 10 and L = 20 (which can be obtained in a few minutes of computing time),
one can read off that βc ≈ 0.262 8. This clearly demonstrates the power of this
method, although it should be stressed that the precision is exceptionally good for
this model.
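The crossing point itself can be located by simple root finding once the two Binder parameter curves are available as continuous functions of β (via reweighting); a sketch with our own names, where u4_small and u4_large stand for two such callables:

```python
from scipy.optimize import brentq

def binder_crossing(u4_small, u4_large, beta_lo, beta_hi):
    """Solve U4(beta; L1) = U4(beta; L2) for beta inside [beta_lo, beta_hi]."""
    return brentq(lambda b: u4_small(b) - u4_large(b), beta_lo, beta_hi)

# Toy check with the FSS form U4 = f((beta - beta_c) L^{1/nu}): the two
# curves cross exactly at beta_c, here chosen as 0.2628 for illustration.
f = lambda x: 0.61 - 0.1 * x
bc = binder_crossing(lambda b: f((b - 0.2628) * 10),
                     lambda b: f((b - 0.2628) * 20), 0.25, 0.28)
```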
On the scale of Fig. 4.12 one reads off that U4∗ ≈ 0.61. Performing an extrapo-
lation (on a very fine scale) to infinite size at β = βc , one obtains the more accurate
Fig. 4.12. Fourth-order Binder parameter U4 , exhibiting a sharp crossing point around
(βc , U4∗ ) ≈ (0.262 82, 0.61). Note the much smaller temperature scale compared to Fig. 4.11
estimate of U4∗ = 0.610 8(1). This result for the 2D nnn Ising model with α = 0.5
is in perfect agreement with the very precisely known value for the standard square
lattice nn Ising model with periodic boundary conditions from extrapolating exact
transfer-matrix data for L ≤ 17 [129], U4∗ = 0.610 690 1(5), and a numerical
evaluation of an exact expression [130], U4∗ = 0.610 692(2). This illustrates the ex-
pected universality of U4∗ (and also U2∗ ) for general isotropic interactions (e.g., also
for α = 1 one finds the same result within error bars [137]), as long as boundary
conditions, lattice shapes etc. are the same. As emphasized already in Sect. 4.7.1,
the cumulants are, however, only weakly universal in the sense that they do depend
sensitively on the anisotropy of interactions, boundary conditions and lattice shapes
(aspect ratios) [131, 132, 133, 134, 135, 136].
The exponent ratio γ/ν can be obtained from fits to the FSS behavior (4.100) of
the susceptibility. By monitoring the quality of the fits, using all data starting from
L = 10 is justified. The fits collected in Table 4.5 all have Q ≥ 0.15.
Still it is fairly obvious that the two fits with Q < 0.2 have some problems.
Discarding them in the averages, one obtains from the weighted average (and again
quoting the smallest contributing error estimate to heuristically take into account the
correlations among the individual fits)

γ/ν = 1.747 7 ± 0.001 2 ,  (4.110)
Table 4.5. Fit results for the critical exponents γ/ν, β/ν, and (1 − β)/ν. The fits for γ/ν and
(1 − β)/ν take all lattices with L ≥ 10 into account, while the fits for β/ν start at L = 20.
to be compared with the exact result 7/4 = 1.75. For the critical exponent η, the
estimate (4.110) implies η = 2 − γ/ν = 0.252 3 ± 0.001 2, and, by inserting
our value of ν = 1.003 1(17), one obtains γ = 1.753 1 ± 0.004 2. Here and in
the following we are quite conservative and always quote the maximal error, i.e.,
max{(O1 + ε1)(O2 + ε2) − O1O2, O1O2 − (O1 − ε1)(O2 − ε2)}.
The exponent ratio β/ν can be obtained either from the FSS behavior of |m|
or d|m|/dβ, (4.99) or (4.104). In the first case, Table 4.5 shows that most βmaxi
sequences yield poor Q values (≤ 0.1) even if the L = 10 lattice data is discarded.
If one averages only the fits with Q ≥ 0.02, the final result is

β/ν = 0.124 85 ± 0.000 32 ,  (4.111)

and, by using our estimate (4.107) for ν, β = 0.125 23 ± 0.000 54, in very good
agreement with the exact result β/ν = β = 1/8 = 0.125 00 for the 2D Ising
universality class. Assuming hyperscaling to be valid, the estimate (4.111) implies
γ/ν = D − 2β/ν = 1.750 30(64).
From the Q values in Table 4.5 one can conclude that the FSS of d|m|/dβ is
somewhat better behaved, so that one can keep again all lattice sizes L ≥ 10 in the
fits. By discarding only the fit for the β_max^C sequence, which has an exceptionally
small Q value, one arrives at an estimate of (1 − β)/ν, so that by inserting our
estimate (4.107) for ν, β/ν = 0.119 4 ± 0.003 2, and finally
β = 0.119 8 ± 0.003 0.
Due to the regular background term Creg in the FSS behavior (4.98), the specific
heat is usually among the most difficult quantities to analyze [141]. In the present
example the critical exponent α is expected to be zero, as can be verified by using
the hyperscaling relation α = 2 − Dν = −0.0062(34). In such a situation it may
be useful to test at least the consistency of a linear two-parameter fit with α/ν kept
fixed. In the present case with α = 0, this amounts to the special form C = Creg +
a ln(L). As can be inspected in Fig. 4.13, the expected linear behavior is, in fact,
satisfied over the whole range of lattice sizes.
To conclude this example analysis [138], it should be stressed that no particular
care was taken to arrive at high-precision estimates for the critical exponents since in
the original work [137] primarily the critical coupling was of interest. In applications
aiming also at accurate exponent estimates, one may experiment more elaborately
Fig. 4.13. FSS behavior of the specific heat evaluated at the various βmaxi sequences, assum-
ing α = 0, i.e., a logarithmic scaling ∝ ln L
with the fit ranges and averaging procedures. If (small) inconsistencies happen to
persist, it is in particular also wise to re-check the extent of the reliable reweighting
range, which often turns out to be the source of trouble in the first place (... which
we have not seriously attempted to exclude in this example analysis).
Since critical phenomena are intimately connected with diverging spatial correla-
tions, it is in many applications important to also estimate the correlation length. In
the high-temperature phase down to the critical point, we have ⟨σi⟩ = 0 and the
two-point correlation function (4.7) simplifies to

G(r_i − r_j) = ⟨σ_i σ_j⟩ .  (4.113)
By summing over all lattice points one obtains the susceptibility (without the β
prefactor)

χ′/β = (1/V) Σ_{r_i, r_j} G(r_i − r_j) = Σ_r G(r)
     = V ⟨[(1/V) Σ_r σ_r]²⟩ = V ⟨m²⟩ .  (4.114)
with β-dependent prefactor a and mass parameter m. Inserting this into (4.114), one
finds for large distances |r| ≫ 1 (but |r| ≪ L/2 for finite periodic lattices)

G(r) ∝ e^{−m|r|} / |r|^{(D−1)/2}  (|r| ≫ 1) ,  (4.117)

so that the inverse mass can be identified as the correlation length ξ ≡ 1/m.
In order to avoid the power-like prefactor in (4.117) and to effectively increase the
statistics, one actually measures in most applications a so-called projected (zero-
momentum) correlation function defined by (r = (x1, x2, ...))

g(x1 − x1′) = (1/L^{D−1}) Σ_{x2,x3,...=1}^{L} Σ_{x2′,x3′,...=1}^{L} G(r_i − r_j)
            = L^{D−1} ⟨ [(1/L^{D−1}) Σ_{x2,x3,...=1}^{L} σ_{x1,x2,x3,...}]
                        [(1/L^{D−1}) Σ_{x2′,x3′,...=1}^{L} σ_{x1′,x2′,x3′,...}] ⟩ ,  (4.118)

i.e., the correlations of line magnetizations L^{−1} Σ_{x2=1}^{L} σ_{x1,x2} for 2D systems or
surface magnetizations L^{−2} Σ_{x2,x3=1}^{L} σ_{x1,x2,x3} for 3D systems at x1 and x1′. Notice
that in all dimensions
χ′/β = (1/2) g(0) + Σ_{i=1}^{L−1} g(i) + (1/2) g(L)  (4.119)

is given by the trapezoidal approximation to the area ∫_0^L g(x) dx under the pro-
jected correlation function g(x). Applying the summations in (4.118) to the Fourier
decomposition of G(r_i − r_j) and using
(1/L^{D−1}) Σ_{x2,x3,...=1}^{L} e^{i k2 x2 + i k3 x3 + ...} = δ_{k2,0} δ_{k3,0} ... ,  (4.120)

one finds that g(x) is the one-dimensional version of (4.115) and (4.116), since all
but one momentum component are projected to zero in (4.120). This can be
evaluated exactly as
g(x) = a cosh[m*(L/2 − x)] / [2 sinh m* sinh(m*L/2)]
     = (a / 2 sinh m*) [ e^{−m*x} + (2 e^{−m*L} / (1 − e^{−m*L})) cosh(m*x) ] ,  (4.122)
with m and m* related by

m/2 = sinh(m*/2) ,

m*/2 = ln[ m/2 + √((m/2)² + 1) ] .  (4.123)
For ξ > 10 (m < 0.1) the difference between ξ and ξ ∗ ≡ 1/m∗ is completely
negligible, (ξ ∗ − ξ)/ξ < 0.042%. Notice that there is no x-dependent prefactor in
(4.122). Note also that G(r) computed for r along one of the coordinate axes
is a truly D-dimensional correlation function (albeit along a special direction),
exhibiting the |r|^{−(D−1)/2} prefactor of (4.117).
Figure 4.14 shows as an example g(x) for the standard nn Ising model at
T = 2.5 ≈ 1.1 Tc on a 50×50 square lattice. By fitting the Monte Carlo data
to the cosh-form (4.122), m∗ = 0.167 9 is obtained or ξ ∗ = 5.957. Inserting
this value into (4.123), one obtains ξ = 1/m = 5.950. This is in very good
agreement (at a 0.1-0.2% level) with the exactly known correlation length (of the
two-dimensional correlation function) along one of the two main coordinate axes,
ξ||(ex) = −1/(ln(tanh(β)) + 2β) = 5.962 376 984 . . . [14, 15].
Fig. 4.14. Zero momentum projected correlation function g(x) for the standard 2D nn Ising
model at T = 2.5 > Tc . Also shown is a fit with the cosh-ansatz (4.122), yielding m∗ =
0.167 9 or ξ ∗ = 5.957, and the exponential approximation ∝ exp(−m∗ x)
G(k)^{−1} = (1/a) [ Σ_{i=1}^{D} 2(1 − cos k_i) + m² ] ≡ c1 κ² + c0 ,  (4.125)
Fig. 4.15. Inverse long-wavelength Fourier components G(k)^{−1} versus squared lattice mo-
menta κ² ≡ Σ_{i=1}^{2} 2(1 − cos k_i) ≈ k² for the 2D Ising model at T = 2.5 > Tc. The
fit (4.125), c1κ² + c0, gives c1 = 0.565 5 and c0 = 0.015 96, and hence by (4.126),
ξ = √(c1/c0) = 5.953
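A sketch of this momentum-space route (the helper is our own; spin_configs is assumed to be a list of equilibrated L×L spin arrays, and the averaging over configurations is kept minimal):

```python
import numpy as np

def xi_from_momenta(spin_configs):
    """Estimate xi from the long-wavelength behavior (4.125)/(4.126):
    1/G(k) = c1*kappa^2 + c0  =>  xi = sqrt(c1/c0)."""
    L = spin_configs[0].shape[0]
    # structure factor G(k) = <|sigma_hat(k)|^2>/V, averaged over configs
    Gk = np.mean([np.abs(np.fft.fft2(s)) ** 2 / s.size for s in spin_configs],
                 axis=0)
    k = 2.0 * np.pi * np.fft.fftfreq(L)
    kappa2 = 2 * (1 - np.cos(k))[:, None] + 2 * (1 - np.cos(k))[None, :]
    sel = (kappa2 > 0) & (kappa2 < 0.1)   # long-wavelength modes only
    c1, c0 = np.polyfit(kappa2[sel], 1.0 / Gk[sel], 1)
    return np.sqrt(c1 / c0)
```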
and all m systems at different simulation points β1 < β2 < ... < βm are sim-
ulated in parallel, using any legitimate update algorithm (Metropolis, cluster, ...).
This freedom in the choice of update algorithm is a big advantage of the paral-
lel tempering method. After a certain number of sweeps, exchanges of the cur-
rent configurations {σ}i and {σ}j are attempted (equivalently, the βi may be ex-
changed, as is done in most implementations). Adapting the Metropolis criterion
(4.24) to the present situation, the proposed exchange will be accepted with proba-
bility W = min(1, e∆ ), where
∆ = (βj − βi ) [E({σ}j ) − E({σ}i )] . (4.132)
To assure a reasonable acceptance rate, usually only nearest-neighbor exchanges
(j = i ± 1) are attempted and the βi should again be spaced with the δβ given in
(4.130). In most applications, the smallest inverse temperature β1 is chosen in the
high-temperature phase where the autocorrelation time is expected to be very short
and the system decorrelates rapidly. Conceptually this approach follows again the
avoiding-rare-events strategy.
Notice that in parallel tempering no free-energy parameters have to be adjusted.
The method is thus very flexible and moreover can be almost trivially parallelized.
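A minimal sketch of the exchange step (our own data layout; the canonical update sweeps of the individual replicas at their respective β are assumed to happen elsewhere):

```python
import math, random

def attempt_swaps(betas, replicas):
    """One sweep of nearest-neighbor exchange attempts, cf. (4.132).

    replicas : list of (config, energy) pairs; replicas[i] currently at betas[i].
    Exchanging the configurations is equivalent to exchanging the betas.
    """
    for i in range(len(betas) - 1):
        delta = (betas[i + 1] - betas[i]) * (replicas[i + 1][1] - replicas[i][1])
        # accept with probability W = min(1, exp(delta))
        if delta >= 0 or random.random() < math.exp(delta):
            replicas[i], replicas[i + 1] = replicas[i + 1], replicas[i]
```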
introducing a multicanonical weight factor W (Q) where Q stands for any macro-
scopic observable such as the energy or magnetization. This defines formally
Hmuca = H − (1/β) ln W (Q) which may be interpreted as an effective multicanon-
ical Hamiltonian. The Monte Carlo sampling can then be implemented as usual by
comparing Hmuca before and after a proposed update of {σ}, and canonical expec-
tation values can be recovered exactly by inverse reweighting,

⟨O⟩ = ⟨O W^{−1}(Q)⟩_muca / ⟨W^{−1}(Q)⟩_muca ,

similarly to (4.89). The goal is now to find a suitable weight factor W such that the
dynamics of the multicanonical simulation profits most.
To be specific, let us assume in the following that the relevant macroscopic ob-
servable is the energy E itself. This is for instance the case at a temperature driven
first-order phase transition, where the canonical energy distribution Pcan (E) devel-
ops a characteristic double-peak structure [47]. As an illustration, simulation data
for the 2D seven-state Potts model [158] are shown in Fig. 4.16. With increasing
system size, the region between the two peaks becomes more and more suppressed
(∝ exp(−2σod LD−1 ) where σod is the (reduced) interface tension, LD−1 the cross-
section of a D-dimensional system, and the factor two accounts for the fact that with
the usually employed periodic boundary condition at least two interfaces are present
due to topological reasons) and the autocorrelation time thus grows exponentially
with the system size L. In the literature, this is sometimes termed supercritical slow-
ing down (even though nothing is critical here). Given such a situation, one usually
adjusts W = W (E) such that the multicanonical distribution Pmuca (E) is approx-
imately constant between the two peaks of Pcan (E), thus aiming at a random-walk
(pseudo-) dynamics of the Monte Carlo process, cf. Fig. 4.16.
The crucial non-trivial point is, of course, how this can be achieved. On a piece
of paper, W (E) ∝ 1/Pcan (E) – but we do not know Pcan (E) (otherwise there would
be little need for the simulation ...). The solution of this problem is a recursive
computation: Starting with the canonical distribution, or some initial guess based
on results for already simulated smaller systems together with finite-size scaling
arguments, the weights are improved iteratively from the energy histograms of
successive runs.
Fig. 4.16. The canonical energy density Pcan (E) of the 2D 7-state Potts model on a 60×60
lattice at inverse temperature βeqh,L , where the two peaks are of equal height, together with
the multicanonical energy density Pmuca (E), which is approximately constant between the
two peaks
The recursion is initialized with p0(E) = 0. To derive this recursion one assumes
that (unnormalized) histogram entries Hn(E) have an a priori statistical error
√(Hn(E)) and (quite crudely) that all data are uncorrelated. Due to the accumulation
of statistics, this procedure is rather insensitive to the length of the nth run in the first
step and has proved to be rather stable and efficient in practice.
In most applications local update algorithms have been employed, but for certain
classes of models also non-local multigrid methods [119, 120, 160, 161] are appli-
cable [121, 162]. A combination with non-local cluster update algorithms, on the
other hand, is not straightforward. Only by making direct use of the random-cluster
representation as a starting point, a multibondic variant [163, 164, 165] has been de-
veloped. For a recent application to improved finite-size scaling studies of second-
order phase transitions, see [128]. If Pmuca was completely flat and the Monte Carlo
update moves would perform an ideal random walk, one would expect that after V 2
local updates the system has travelled on average a distance V in total energy. Since
one lattice sweep consists of V local updates, the autocorrelation time should scale
in this idealized picture as τ ∝ V . Numerical tests for various models with a first-
order phase transition have shown that in practice the data are at best consistent with
a behavior τ ∝ V α , with α ≥ 1. While for the temperature-driven transitions of 2D
Potts models the multibondic variant seems to saturate the bound [163, 164, 165],
employing local update algorithms, typical fit results are α ≈ 1.1–1.3, and due to
the limited accuracy of the data even a weak exponential growth law cannot really
be excluded.
In fact, at least for the field-driven first-order transition of the 2D Ising model
below Tc , where one works with the magnetization instead of the energy (some-
times called multimagnetical simulations), it has been demonstrated recently [166]
that even for a perfectly flat multicanonical distribution there are two hidden free
energy barriers (in directions orthogonal to the magnetization) which lead to an ex-
ponential growth of τ with lattice size, which is, however, much weaker than the leading
supercritical slowing down of the canonical simulation. Physically the two barriers
are related to the nucleation of a large droplet of the wrong phase (say down-spins in
the background of up-spins) [167, 168, 169, 170, 171, 172, 173] and the transition
of this large, more or less spherical droplet to the strip phase (coexisting strips of
down- and up-spins, separated by two straight interfaces) around m = 0 [174].
Another more recently proposed method deals directly with estimators Ω(E) of the
density of states [175, 176]. By flipping spins randomly, the transition probability
from energy level E1 to E2 is
p(E1 → E2) = min( Ω(E1)/Ω(E2) , 1 ) .  (4.139)
Each time an energy level is visited, the estimator is multiplicatively updated,

Ω(E) → f Ω(E) ,  (4.140)

starting with f = f0 = e. Once the accumulated energy histogram is sufficiently flat,
the modification factor is refined,

f_{n+1} = √f_n ,  (4.141)

with n = 0, 1, ..., and the energy histogram is reset to zero, until some small value
such as f = e^{10^{−8}} ≈ 1.000 000 01 is reached.
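A compact, unoptimized sketch of the recursion (4.139)–(4.141) for the 2D Ising model (the binning convention, sweep counts and the 80% flatness criterion are our own choices):

```python
import math, random
import numpy as np

def wang_landau(L=8, flat=0.8, lnf_final=1e-8):
    """Wang-Landau sketch for the 2D nn Ising model on a periodic L x L lattice.

    Works with ln Omega(E) to avoid overflow; the energies E = -sum s_i s_j
    lie in {-2V, -2V+4, ..., 2V} (V = L*L) and are stored in bin (E+2V)//4.
    """
    V = L * L
    s = np.random.choice([-1, 1], size=(L, L))
    E = -int(np.sum(s * (np.roll(s, 1, 0) + np.roll(s, 1, 1))))
    lng = np.zeros(V + 1)          # ln Omega(E) over the V+1 energy bins
    H = np.zeros(V + 1)            # accumulated energy histogram
    lnf = 1.0                      # ln f, initially f = e
    while lnf > lnf_final:
        for _ in range(1000 * V):  # a batch of local update attempts
            i, j = random.randrange(L), random.randrange(L)
            nn = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
            dE = 2 * s[i, j] * nn
            b0, b1 = (E + 2*V) // 4, (E + dE + 2*V) // 4
            # accept with min(1, Omega(E1)/Omega(E2)), cf. (4.139)
            if math.log(random.random() + 1e-300) < lng[b0] - lng[b1]:
                s[i, j] *= -1
                E += dE
            b = (E + 2*V) // 4
            lng[b] += lnf          # multiplicative update Omega -> f*Omega, (4.140)
            H[b] += 1
        if H[H > 0].min() > flat * H[H > 0].mean():   # histogram flat enough?
            lnf /= 2.0             # f -> sqrt(f), (4.141)
            H[:] = 0
    return lng                     # ln density of states, up to a constant
```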
For the 2D Ising model this procedure converges very rapidly towards the ex-
actly known density of states, and also for other applications a fast convergence has
been reported. Since the procedure is known to violate detailed balance, however,
some care is necessary in setting up a proper protocol of the recursion. Most authors
who employ the obtained density of states directly to extract canonical expectation
values by standard reweighting argue that, once f is close enough to unity, sys-
tematic deviations become negligible. While this claim can be verified empirically
for the 2D Ising model (where exact results are available for judgement), possible
systematic deviations are difficult to assess in the general case. A safe way would
be to consider the recursion (4.139)–(4.141) as an alternative method to determine
the multicanonical weights, and then to perform a usual multicanonical simulation
based on them. As emphasized earlier, any deviations of multicanonical weights
from their optimal shape do not show up in the final canonical expectation values;
they rather only influence the dynamics of the multicanonical simulations.
The intention of these lecture notes was to give an elementary introduction to the
concepts of modern Markov chain Monte Carlo simulations and to illustrate their
usefulness by applications to the very simple Ising lattice spin model. The basic
Monte Carlo methods employing local update rules are straightforward to generalize
to all models with discrete degrees of freedom and, with small restrictions, also to all
models with continuous variables and off-lattice systems. Non-local cluster update
methods are much more efficient but also more specialized. Some generalizations to
Potts and O(n) symmetric spin models have been indicated and also further models
may be efficiently simulated by this method, but there is no guarantee that for a given
model a cluster update procedure can be developed. The statistical error analysis is
obviously completely general, and also the example finite-size scaling analysis can
be taken as a guideline for any model exhibiting a second-order phase transition.
Finally, reweighting techniques and generalized ensemble ideas such as tempering
methods, the multicanonical ensemble and Wang-Landau sampling can be adapted
to almost every statistical physics problem at hand once the relevant macroscopic
observables are identified.
Acknowledgements
Many people have influenced these lecture notes with their advice, discussions,
questions, and active contributions. In particular I wish to thank Michael Bachmann,
Bertrand Berche, Pierre-Emmanuel Berche, Bernd A. Berg, Alain Billoire, Kurt
Binder, Elmar Bittner, Christophe Chatelain, Thomas Haase, Malte Henkel,
Desmond A. Johnston, Christoph Junghans, Ralph Kenna, David P. Landau, Eric
Lorenz, Thomas Neuhaus, Andreas Nußbaumer, Michel Pleimling, Adriaan Schakel,
and Martin Weigel for sharing their insight and knowledge with me. Special thanks
go to Elmar Bittner for his help with the sample finite-size scaling analysis.
This work was partially supported by the Deutsche Forschungsgemeinschaft
(DFG) under grants JA 483/22-1 and JA 483/23-1, the EU RTN-Network “EN-
RAGE”: Random Geometry and Random Matrices: From Quantum Gravity to
Econophysics under grant MRTN-CT-2004-005616, and the JUMP computer time
grants hlz10, hlz11, and hlz12 of NIC at Forschungszentrum Jülich.
References
1. M. Newman, G. Barkema, Monte Carlo Methods in Statistical Physics (Clarendon
Press, Oxford, 1999) 80, 86
2. D. Landau, K. Binder, Monte Carlo Simulations in Statistical Physics (Cambridge Uni-
versity Press, Cambridge, 2000) 80, 86
3. K. Binder, D. Heermann, Monte Carlo Simulations in Statistical Physics: An Introduc-
tion, 4th edn. (Springer, Berlin, 2002) 80, 86
4. B. Berg, Markov Chain Monte Carlo Simulations and Their Statistical Analysis (World
Scientific, Singapore, 2004) 80, 86, 131
5. H. Stanley, Introduction to Phase Transitions and Critical Phenomena (Oxford Press,
Oxford, 1979) 80, 92
6. J. Binney, N. Dowrick, A. Fisher, M. Newman, The Theory of Critical Phenomena
(Oxford University Press, Oxford, 1992) 80
7. D. Lavis, G. Bell, Statistical Mechanics of Lattice Systems 2 (Springer, Berlin, 1999) 80
8. C. Domb, J. Lebowitz (eds.), Phase Transitions and Critical Phenomena (Academic
Press, New York, 1976) 80
9. W. Lenz, Phys. Z. 21, 613 (1920) 81
10. E. Ising, Phys. Z. 31, 253 (1925) 81
11. L. Onsager, Phys. Rev. 65, 117 (1944) 82
12. B. Kaufman, Phys. Rev. 76, 1232 (1949) 82, 109
13. A. Ferdinand, M. Fisher, Phys. Rev. 185, 832 (1969) 82, 109
14. B. McCoy, T. Wu, The Two-Dimensional Ising Model (Harvard University Press,
Cambridge, 1973) 82, 127
15. R. Baxter, Exactly Solved Models in Statistical Mechanics (Academic Press, New York,
1982) 82, 127
16. L. Onsager, Nuovo Cimento 6, 261 (1949) 82
17. C. Yang, Phys. Rev. 85, 808 (1952) 82
18. C. Chang, Phys. Rev. 88, 1422 (1952) 82
19. W. Orrick, B. Nickel, A. Guttmann, J. Perk, Phys. Rev. Lett. 86, 4120 (2001) 82
20. W. Orrick, B. Nickel, A. Guttmann, J. Perk, J. Stat. Phys. 102, 795 (2001) 82
21. R. Griffiths, Phys. Rev. Lett. 24, 1479 (1970) 83
22. G. Rushbrooke, J. Chem. Phys. 39, 842 (1963) 83
23. R. Griffiths, Phys. Rev. Lett. 14, 623 (1965) 83
24. B. Josephson, Proc. Phys. Soc. 92, 269 (1967) 83
25. B. Josephson, Proc. Phys. Soc. 92, 276 (1967) 83
26. M. Fisher, Phys. Rev. 180, 594 (1969) 83
27. L. Widom, J. Chem. Phys. 43, 3892 (1965) 83
28. L. Widom, J. Chem. Phys. 43, 3898 (1965) 83
29. L. Kadanoff, Physics 2, 263 (1966) 83
30. K. Wilson, J. Kogut, Phys. Rep. C12, 75 (1974) 83
31. F. Wu, Rev. Mod. Phys. 54, 235 (1982) 84
32. F. Wu, Rev. Mod. Phys. 55, 315(E) (1983) 84
33. M. Weigel, W. Janke, Phys. Rev. B62, 6343 (2000) 84
34. K. Binder, in Monte Carlo Methods in Statistical Physics ed. by K. Binder (Springer,
Berlin, 1979), p. 1 84, 86
35. M. Barber, in Phase Transitions and Critical Phenomena, Vol. 8 ed. by C. Domb,
J. Lebowitz (Academic Press, New York, 1983), p. 146 84
36. V. Privman (ed.), Finite-Size Scaling and Numerical Simulations of Statistical Systems
(World Scientific, Singapore, 1990) 84
37. K. Binder, in Computational Methods in Field Theory, Schladming Lecture Notes, ed. by
H. Gausterer, C.B. Lang (Springer, Berlin, 1992), p. 59 84
38. J. Gunton, M. Miguel, P. Sahni, in Phase Transitions and Critical Phenomena, Vol. 8,
ed. by C. Domb, J. Lebowitz (Academic Press, New York, 1983) 85
39. K. Binder, Rep. Prog. Phys. 50, 783 (1987) 85
40. H. Herrmann, W. Janke, F. Karsch (eds.), Dynamics of First Order Phase Transitions
(World Scientific, Singapore, 1992) 85
41. W. Janke, in Computer Simulations in Condensed Matter Physics, Vol. VII, ed. by
D. Landau, K. Mon, H.B. Schüttler (Springer, Berlin, 1994), p. 29 85
42. M. Fisher, A. Berker, Phys. Rev. B 26, 2507 (1982) 85
43. V. Privman, M. Fisher, J. Stat. Phys. 33, 385 (1983) 85
44. K. Binder, D. Landau, Phys. Rev. B 30, 1477 (1984) 85
45. M. Challa, D. Landau, K. Binder, Phys. Rev. B 34, 1841 (1986) 85
46. V. Privman, J. Rudnik, J. Stat. Phys. 60, 551 (1990) 85
47. W. Janke, in Computer Simulations of Surfaces and Interfaces, NATO Science Series,
II. Mathematics, Physics and Chemistry Vol. 114, ed. by B. Dünweg, D. Landau,
A. Milchev (Kluwer, Dordrecht, 2003), pp. 111–135 85, 92, 111, 131, 132
48. C. Borgs, R. Kotecký, J. Stat. Phys. 61, 79 (1990) 85
49. J. Lee, J. Kosterlitz, Phys. Rev. Lett. 65, 137 (1990) 85
50. C. Borgs, R. Kotecký, S. Miracle-Solé, J. Stat. Phys. 62, 529 (1991) 85
51. C. Borgs, W. Janke, Phys. Rev. Lett. 68, 1738 (1992) 85
52. W. Janke, Phys. Rev. B 47, 14757 (1993) 85
53. J. Hammersley, D. Handscomb, Monte Carlo Methods (Chapman and Hall, London,
New York, 1964) 85
54. D. Heermann, Computer Simulation Methods in Theoretical Physics, 2nd edn.
(Springer, Berlin, 1990) 86
55. K. Binder (ed.), The Monte Carlo Method in Condensed Matter Physics (Springer,
Berlin, 1992) 86
56. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, J. Chem. Phys. 21,
1087 (1953) 86
57. S. Kirkpatrick, C. Gelatt. Jr., M. Vecchi, Science 220, 671 (1983) 87
58. W. Janke, in Ageing and the Glass Transition – Summer School, University of Lux-
embourg, September 2005, Lecture Notes in Physics, Vol. 716, ed. by M. Henkel,
M. Pleimling, R. Sanctuary (Springer, Berlin, Heidelberg, 2007), pp. 207–260 87, 89, 103, 106
59. W. Janke, in Proceedings of the Euro Winter School Quantum Simulations of Com-
plex Many-Body Systems: From Theory to Algorithms, NIC Series, Vol. 10, ed. by
J. Grotendorst, D. Marx, A. Muramatsu (John von Neumann Institute for Computing,
Jülich, 2002), pp. 447–458 87
60. R. Glauber, J. Math. Phys. 4, 294 (1963) 89, 90
61. R. Swendsen, J.S. Wang, Phys. Rev. Lett. 58, 86 (1987) 93, 94, 97
62. R. Potts, Proc. Camb. Phil. Soc. 48, 106 (1952) 93
63. U. Wolff, Phys. Rev. Lett. 62, 361 (1989) 93, 94, 96
64. W. Janke, Mathematics and Computers in Simulations 47, 329 (1998) 93
65. P. Kasteleyn, C. Fortuin, J. Phys. Soc. Japan 26, 11 (1969) 93
66. C. Fortuin, P. Kasteleyn, Physica 57, 536 (1972) 93
67. C. Fortuin, Physica 58, 393 (1972) 93
68. C. Fortuin, Physica 59, 545 (1972) 93
69. U. Wolff, Nucl. Phys. B322, 759 (1989) 96, 98
70. M. Hasenbusch, Nucl. Phys. B333, 581 (1990) 96
71. U. Wolff, Nucl. Phys. B334, 581 (1990) 96, 97, 98
72. U. Wolff, Phys. Lett. A 228, 379 (1989) 96, 97
73. C. Baillie, Int. J. Mod. Phys. C 1, 91 (1990) 96
74. M. Hasenbusch, S. Meyer, Phys. Lett. B 241, 238 (1990) 96
75. R. Swendsen, J.S. Wang, A. Ferrenberg, in The Monte Carlo Method in Condensed
Matter Physics ed. by K. Binder (Springer, Berlin, 1992) 96
76. X.L. Li, A. Sokal, Phys. Rev. Lett. 63, 827 (1989) 96
77. X.L. Li, A. Sokal, Phys. Rev. Lett. 67, 1482 (1991) 96
78. M. Nightingale, H. Blöte, Phys. Rev. Lett. 76, 4548 (1996) 97, 100
79. M. Nightingale, H. Blöte, Phys. Rev. B 62, 1089 (2000) 97, 100
80. P. Grassberger, Physica A 214, 547 (1995) 97
81. P. Grassberger, Physica A 217, 227 (E) (1995) 97
82. N. Ito, K. Hukushima, K. Ogawa, Y. Ozeki, J. Phys. Soc. Japan 69, 1931 (2000) 97
83. D. Heermann, A. Burkitt, Physica A 162, 210 (1990) 97
84. P. Tamayo, Physica A 201, 543 (1993) 97
85. N. Ito, G. Kohring, Physica A 201, 547 (1993) 97
86. W. Janke, Phys. Lett. A 148, 306 (1990) 98
87. C. Holm, W. Janke, Phys. Rev. B 48, 936 (1993) 98, 113
88. W. Janke, A. Schakel, Nucl. Phys. B700, 385 (2004) 98
89. W. Janke, A. Schakel, Comp. Phys. Comm. 169, 222 (2005) 98
90. W. Janke, A. Schakel, Phys. Rev. E 71, 036703 (2005) 98
91. W. Janke, A. Schakel, Phys. Rev. Lett. 95, 135702 (2005) 98
92. W. Janke, A. Schakel, in Order, Disorder and Criticality: Advanced Problems of Phase
Transition Theory, Vol. 2, ed. by Y. Holovatch (World Scientific, Singapore, 2007),
pp. 123–180 98
93. E. Lorenz, Ageing phenomena in phase-ordering kinetics in Potts models. Diploma
thesis, Universität Leipzig (2005). www.physik.uni-leipzig.de/~lorenz/diplom.pdf 100, 101
94. A. Rutenberg, A. Bray, Phys. Rev. E 51, 5499 (1995) 100, 101
95. P. Calabrese, A. Gambassi, J. Phys. A 38, R133 (2005) 100
96. C. Godrèche, J.M. Luck, J. Phys.: Condens. Matter 14, 1589 (2002) 100, 101
97. L.F. Cugliandolo, in Slow Relaxation and Non Equilibrium Dynamics in Condensed
Matter, Les Houches Lectures, ed. by J.-L. Barrat, J. Dalibard, J. Kurchan,
M.V. Feigel’man (Springer, Berlin, 2003) 100
98. F. Corberi, E. Lippiello, M. Zannetti, Phys. Rev. Lett. 90, 099601 (2003) 101
99. M. Henkel, M. Pleimling, Phys. Rev. Lett. 90, 099602 (2003) 101
100. L. Berthier, J. Barrat, J. Kurchan, Europhys. J. B 11, 635 (1999) 101
101. F. Corberi, E. Lippiello, M. Zannetti, Europhys. J. B 24, 359 (2001) 101
102. F. Corberi, E. Lippiello, M. Zannetti, Phys. Rev. E 65, 046136 (2003) 101
103. A. Barrat, Phys. Rev. E 57, 3629 (1998) 101
104. M. Henkel, Conformal Invariance and Critical Phenomena (Springer, Berlin, 1999) 101
105. M. Henkel, M. Pleimling, C. Godrèche, J.M. Luck, Phys. Rev. Lett. 87, 265701 (2001)
101
106. M. Henkel, Nucl. Phys. B641, 405 (2002) 101
107. M. Henkel, M. Paessens, M. Pleimling, Europhys. Lett. 62, 664 (2003) 101
108. M. Henkel, M. Pleimling, Phys. Rev. E 68, 065101 (R) (2003) 101
109. M. Henkel, A. Picone, M. Pleimling, Europhys. Lett. 68, 191 (2004) 101
110. E. Lorenz, W. Janke, Europhys. Lett. 77, 10003 (2007) 101
111. W. Janke, in Proceedings of the Euro Winter School Quantum Simulations of Complex
Many-Body Systems: From Theory to Algorithms, NIC Series, Vol. 10, ed. by J. Gro-
tendorst, D. Marx, A. Muramatsu (John von Neumann Institute for Computing, Jülich,
2002), pp. 423–445 102, 103
112. P. Beale, Phys. Rev. Lett. 76, 78 (1996) 109
113. W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in Fortran 77
– The Art of Scientific Computing, 2nd edn. (Cambridge University Press, Cambridge,
1999) 102, 117, 120, 121
114. M. Priestley, Spectral Analysis and Time Series, Vol. 2 (Academic, London, 1981).
Chaps. 5–7 103
115. T. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971) 103
116. N. Madras, A. Sokal, J. Stat. Phys. 50, 109 (1988) 103, 105
117. A. Sokal, L. Thomas, J. Stat. Phys. 54, 797 (1989) 103
118. A. Ferrenberg, D. Landau, K. Binder, J. Stat. Phys. 63, 867 (1991) 104
119. A. Sokal, Monte Carlo Methods in Statistical Mechanics: Foundations and New Algo-
rithms (Cours de Troisième Cycle de la Physique en Suisse Romande, Lausanne, 1989)
105, 133
120. A. Sokal, in Quantum Fields on the Computer, ed. by M. Creutz (World Scientific,
Singapore, 1992), p. 211 105, 133
121. W. Janke, T. Sauer, J. Stat. Phys. 78, 759 (1995) 106, 133
122. B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans (Society for Indus-
trial and Applied Mathematics [SIAM], Philadelphia, 1982) 107
123. R. Miller, Biometrika 61, 1 (1974) 107
124. A. Ferrenberg, R. Swendsen, Phys. Rev. Lett. 61, 2635 (1988) 108
125. A. Ferrenberg, R. Swendsen, Phys. Rev. Lett. 63, 1658(E) (1989) 108
126. N. Wilding, in Computer Simulations of Surfaces and Interfaces, NATO Science Series,
II. Mathematics, Physics and Chemistry Vol. 114, ed. by B. Dünweg, D. Landau,
A. Milchev (Kluwer, Dordrecht, 2003), pp. 161–171 110
127. A. Ferrenberg, R. Swendsen, Phys. Rev. Lett. 63, 1195 (1989) 112
128. B. Berg, W. Janke, Phys. Rev. Lett. 98, 040602 (2007) 114, 133
129. G. Kamieniarz, H. Blöte, J. Phys. A 26, 201 (1993) 116, 123
130. J. Salas, A. Sokal, J. Stat. Phys. 98, 551 (2000) 116, 123
131. X. Chen, V. Dohm, Phys. Rev. E 70, 056136 (2004) 116, 123
132. V. Dohm, J. Phys. A 39, L259 (2006) 116, 123
133. W. Selke, L. Shchur, J. Phys. A 38, L739 (2005) 116, 123
134. M. Schulte, C. Drope, Int. J. Mod. Phys. C 16, 1217 (2005) 116, 123
135. M. Sumour, D. Stauffer, M. Shabat, A. El-Astal, Physica A 368, 96 (2006) 116, 123
136. W. Selke, Europhys. J. B 51, 223 (2006); preprint http://arxiv.org/abs/cond-mat/0701515 116, 123
137. A. Nußbaumer, E. Bittner, W. Janke, Europhys. Lett. 78, 16004 (2007) 118, 119, 123, 124
138. E. Bittner, W. Janke, The pain of example analyses – a (self-)critical discussion. Un-
published results 119, 124
139. W. Janke, R. Villanova, Phys. Rev. B 66, 134208 (2002) 119, 120
140. J. Oitmaa, J. Phys. A 14, 1159 (1981) 121
141. C. Holm, W. Janke, Phys. Rev. Lett. 78, 2265 (1997) 124
142. E. Marinari, G. Parisi, Europhys. Lett. 19, 451 (1992) 129
143. A. Lyubartsev, A. Martsinovski, S. Shevkunov, P. Vorontsov-Velyaminov, J. Chem.
Phys. 96, 1776 (1992) 129
144. C. Geyer, in Proceedings of the 23rd Symposium on the Interface, ed. by E. Keramidas
(Interface Foundation, Fairfax, Virginia, 1991), pp. 156–163 130
145. C. Geyer, E. Thompson, J. Am. Stat. Assoc. 90, 909 (1995) 130
146. K. Hukushima, K. Nemoto, J. Phys. Soc. Japan 65, 1604 (1996) 130
147. B. Berg, Fields Inst. Comm. 26, 1 (2000) 131
148. B. Berg, Comp. Phys. Comm. 104, 52 (2002) 131
149. W. Janke, Physica A 254, 164 (1998) 131
150. W. Janke, in Computer Simulations of Surfaces and Interfaces, NATO Science Series, II.
Mathematics, Physics and Chemistry –Proceedings of the NATO Advanced Study Insti-
tute, Albena, Bulgaria, 9–20 September 2002, Vol. 114, ed. by B. Dünweg, D. Landau,
A. Milchev (Kluwer, Dordrecht, 2003) 131
5 The Monte Carlo Method for Particle Transport Problems

Detlev Reiter
the Monte Carlo method then basically simulates an approximating integral equa-
tion of a jump process. A very clear discussion of the approximation of diffusion
processes by jump processes (i.e., opposite to the direction usually used in physi-
cal arguments to derive Fokker-Planck equations) is given, e.g., in the monograph
by C.W. Gardiner [2]. Once this approximation is done, the Monte Carlo proce-
dures for Fokker-Planck equations and for Boltzmann equations become analogous.
We shall, therefore, only discuss discontinuous jump processes from now on, hence
only Monte Carlo methods for solving Fredholm integral equations. We will fol-
low a similar strategy as in the introductory chapter on Monte Carlo methods in
these lecture notes, Chap. 3: Although we will try to make explicit the underlying
mathematical basis of the method, we strongly build on the key advantage of Monte
Carlo methods over numerical concepts: The important role of intuition to guide the
derivation of the algorithm, which consequently retains a high level of transparency.
The first Monte Carlo computer simulations were carried out within the
US atomic bomb project (Manhattan Project), under the leadership of John von
Neumann (Fig. 5.1, left) and Stan Ulam.
Neutron migration in material was simulated by a cascade of decisions based
on non-uniform random numbers (Fig. 5.1, right): At the start, a neutron velocity
and position were sampled. Then the mean free flight distance (from an exponential
distribution) was determined, leading to the decision: collision or transit through the
medium? If transit, the neutron was moved and a new free flight distance was sampled.
Fig. 5.1. Left: John von Neumann (1952), mathematician of Hungarian origin, 1903–1957
(© 2006 Los Alamos National Security, LLC. All rights reserved). Right: Tracking of
individual particle histories from birth to death
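This decision cascade is easily written down explicitly. The following sketch tracks one particle through a hypothetical homogeneous 1D slab (all parameters are invented for illustration, this is not the historical code): free-flight distances are sampled from the exponential distribution with mean free path 1/Σt, and each collision either absorbs or isotropically re-scatters the particle.

```python
import math, random

def track_neutron(sigma_t=1.0, p_abs=0.3, thickness=5.0):
    """Follow one neutron from birth to death through a homogeneous slab."""
    x, mu = 0.0, 1.0                     # position and direction cosine
    while True:
        # exponential free flight length, sampled by inversion
        l = -math.log(1.0 - random.random()) / sigma_t
        x += mu * l
        if x < 0.0:
            return "reflected"
        if x > thickness:
            return "transmitted"
        if random.random() < p_abs:      # collision: absorption or scattering
            return "absorbed"
        mu = 2.0 * random.random() - 1.0 # isotropic re-scattering (slab geometry)
```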
The new element in the present chapter on particle transport is the fact that the
distribution law f may not be known explicitly anymore. Hence direct random sam-
pling from f is not possible. Common to all Monte Carlo applications to transport
theory is that f is given only implicitly, as the solution of a governing kinetic equation.
This kinetic equation can be a differential equation (diffusive transport, Fokker-
Planck type differential equations, i.e., very soft interactions causing only small
changes), an integral equation (ballistic transport, Boltzmann type integral equa-
tions, hard interactions, causing discontinuous jumps), or of mixed type. We refer to
the historic papers on this relation between analytic properties of trajectories of ran-
dom walks and corresponding differential and integral equations by W. Feller [6],
and references therein. As will be discussed next the key idea is then to generate an
entire random walk (Markov chain) rather than sequences of independent random
numbers.
It is worth noting that a second very wide class of Monte Carlo applica-
tions, namely those to problems in statistical mechanics (Chap. 4), is also based upon
a similar idea: Ensemble averages (very high dimensional integrals) are found there
by generating a random walk in the Gibbs phase space, rather than explicitly con-
sidering the underlying many-body distribution law itself. Because of this similarity
of the concept with the historically earlier developed neutron transport applications,
this procedure too was then named Monte Carlo method, see Metropolis [7].
To introduce the terminology, we briefly recall the basic definitions and principles
of a Monte Carlo linear transport model, following the lead of many textbooks
on Monte Carlo methods for computing neutron transport (see e.g., Spanier and
Gelbard, [3]). We begin with the linear transport equation for the dependent vari-
able ψ (see below), written as integral equation (linear non-homogeneous Fredholm
integral equation (FIE) of 2nd kind). This equation reads
ψ(x) = S(x) + ∫ dx′ K(x′ → x) ψ(x′) ,  (5.2)

with the mean number of secondaries per event

c(x′) = ∫ dx K(x′ → x) .
In Sect. 5.3 we will interpret the generic FIE (5.2) of transport theory with the par-
ticular Boltzmann equation for dilute gases in physics, because this serves then as
guidance of intuition for all our further discussions. The objects will be interpreted
as particles, events will be collisions with a host (background) medium.
We now make connection between the generic transport equation (5.2) and the most
famous and important transport equation in science: The Boltzmann equation for
dilute gases: The phase space is then the space of all relevant independent variables
(co-ordinates) of a single particle and the dependent quantity of interest ψ is then
the one particle distribution function f (r, v, i, t), f (r, E, Ω, i, t), or f (x) where
the state x is characterized by a position vector r, a velocity vector v, a chemical
species index i and the time t, etc.
Integrations are over the velocities v′, v̄′ and v̄. Here σ(v′, v̄′; v, v̄) is the cross
section for a binary particle collision process defined such that the conservation laws
for total energy and momentum are fulfilled. The first two arguments in σ, namely
the velocities v′, v̄′ in the first integral, correspond to the species i0 and b, respec-
tively, prior to a collision. These are turned into the post-collision velocities v, v̄,
again for species i0 and b, respectively. The first integral, therefore, describes transi-
tions (v′, v̄′ → v, v̄) into the velocity space interval [v, v + dv] for species i0, and
the second integral describes loss from that interval for this species.
m is the particle mass and F (r, v, t) is the volume force field. The right hand side
is the collision integral δf /(δt)|b . If there are more than just one possible type of
collision partners, then the collision integral has to be replaced by a sum of collision
integrals over all collision partners b, including, possibly, b = i0 (self collisions)
"
δf δf "
= " . (5.6)
δt δt "b
b
Despite its simple physical content (transition probability from v ′ to v, given a col-
lision at r) the collision kernel C can be a quite complicated integral, as it involves
not only multiple differential cross sections, but also, possibly, particle multipli-
cation factors, e.g. in case of fission by neutron impact, dissociation of molecules
by electron impact, or stimulated photon emission from excited atoms. It can also
include absorption, in which case the post collision state must be an extra limbo
state outside the phase-space considered. Due to both particle multiplication and/or
absorption the collision kernel C is not normalized to one, generally.
The second term on the right hand side is much simpler, because the function
f (v) can be taken out of the integral. We even take the product |v| · f before the
integral. The remaining integral is then just the total macroscopic cross section Σt ,
i.e., the inverse local mean free path (dimension: 1/length). It is solely defined by
total cross sections and independent of particle multiplication factors, since we only
consider binary collisions (exactly two pre-collision partners always).
This term is then often taken on the left hand side of the Boltzmann equation
with a positive sign, in the form
"
δf ""
= Σt (r, v)|v|f (v) . (5.8)
δt "loss
With these formal substitutions the Boltzmann equation takes a form which is often
more convenient, in particular in linear transport theory
[∂/∂t + v·∇_r + (F(r, v, t)/m)·∇_v] f(r, v, t) + Σt(r, v)|v| f(v)
   = ∫ d³v′ C(v′ → v) |v′| f(v′) + Q(r, v, t) .  (5.9)
In this equation an external source term Q has also been added, for completeness.
If the distributions fb of the collision partners b are assumed to be given, then the kernel
C does not depend on the dependent quantity f. Also the extinction coefficient
Σt is independent of the dependent variable f = f_{i0}, and the out-scattering loss
term (last term on left hand side) just describes the loss of particle flux of i0 par-
ticles due to any kind of interaction of them with the host medium. Equation (5.9)
above becomes a linear integro-differential equation. If the characteristic time con-
stants for the considered transport phenomena are very short compared to those for
evolution of the macroscopic background medium one can then neglect explicit time
dependence.
If the particles travel on straight lines between collisions, i.e., with no forces
acting on them (F = 0), the transport equation can be written for the scalar transport
flux (angular flux) Φ = |v| f, where, again, the macroscopic cross section Σt is the
total inverse local mean free path (dimension: 1/length). This cross section can be
written as a sum Σt = Σ_k Σ_k over the macroscopic cross sections for the different
types (identified by the index k) of collision processes.
With these simplifications the transport equation takes the well known form in
linear transport theory (e.g., neutronics, radiation transfer, cosmic rays, etc.)
(v/|v|) · ∇_r Φ(r, v) + Σt(r, v) Φ(r, v)
   = Q(r, v) + ∫ dv′ Φ(r, v′) Σt(r, v′) · C(r, v′ → v) .  (5.12)
In order to see this, define the Green’s function G(v, i; r ′ → r). This is the so-
lution to (5.12), but with the right hand side replaced by a delta-point source at
x = (r ′ , v, i). For this let, again, Ω denote the unit vector in the direction of par-
ticle flight, and let Ω ′ and Ω ′′ be two further unit vectors such that these three
vectors form an ortho-normal basis at the point r ′ . The Green’s function G then
reads as follows
G(v, i; r′ → r) = e^{−∫_0^{Ω·(r−r′)} ds Σt(r′+sΩ)} δ(Ω′·(r − r′)) δ(Ω′′·(r − r′)) H(Ω·(r − r′)) ,  (5.14)
with H(x) = 0 if x ≤ 0, and H(x) = 1 if x > 0, the Heaviside step function. Thus,
G is closely related to the distribution density T(l) for the distance l of a free flight
starting from r′ to the next point of collision r = r′ + l·Ω. The integral

α(r′, r) = e^{−∫_0^{Ω·(r−r′)} ds Σt(r′+sΩ)}  (5.15)

in (5.14) is well known to characterize the optical thickness of the medium in linear
transport theory.
Multiplying (5.12) with that Green’s function and integrating over initial vari-
ables r ′ turns this integro-differential equation into an integral equation for the flux
Φ, which (almost) has the required generic form
Φ(x) = ∫ dx′ Q(x′) G(x′ → x) + ∫ dx′ Φ(x′) Σ(x′) G(x′ → x) C(x′ → x)
     = ∫ dx′ Q(x′) (1/Σ(x)) T(x′ → x) + ∫ dx′ Φ(x′) (Σ(x′)/Σ(x)) T(x′ → x) C(x′ → x) .   (5.16)
Here we have introduced the transport kernel T (x → x′ ) = Σ(x′ )G(x → x′ ),
which will play the role of the distribution of free flight length between two collision
events.
Multiplying this equation with Σ(x) and using the definition of the pre-collision
density Ψ = ΣΦ yields exactly the generic equation (5.13). The source
term S in this equation is now seen to be S = ∫ dx′ Q(x′) T(x′ → x), i.e., it is the contribution to
Ψ directly from the source Q, then transported (free flight) to the first point of collision
with T. It is the density of (un-collided) particles going into their first collision. The
kernel K(x′ → x) is now identified as K = CT: a particle going into a collision
at x′ is first collided by sampling from C, then transported to the next collision at
x with operator T. The once collided contribution (particles going into their second
collision) is ∫ QTCT dx′. The twice collided contribution of particles going into
their third collision is consequently ∫ QTCTCT dx′, and so on.
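To make the structure of these successive collision contributions concrete, the following small sketch (Python with NumPy; the kernel, source and grid are arbitrary illustrative choices, not taken from the text) discretizes a Fredholm integral equation of the generic form Ψ = S + KΨ and checks that the Neumann series S + KS + K²S + . . . converges to the direct solution:

```python
import numpy as np

# Toy check of the Neumann-series structure Psi = S + K S + K^2 S + ...
# for a discretized Fredholm equation Psi(x) = S(x) + int dx' Psi(x') K(x'->x).
# The kernel below is an arbitrary subcritical example, not a physical one.
n = 200
x, dx = np.linspace(0.0, 1.0, n, retstep=True)
K = 0.5 * np.exp(-np.abs(x[:, None] - x[None, :])) * dx  # K[i,j] ~ K(x_j -> x_i) dx'
S = np.exp(-x)                                           # source term

# Direct solution of (I - K) Psi = S
psi_direct = np.linalg.solve(np.eye(n) - K, S)

# Neumann series: un-collided + once-collided + twice-collided + ... contributions
psi, term = np.zeros(n), S.copy()
for _ in range(200):
    psi += term
    term = K @ term          # next collision generation

print("max deviation from direct solve:", np.abs(psi - psi_direct).max())
```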
The kernel C is (excluding normalization) the conditional distribution for new co-
ordinates (v,i) given that a particle of species i′ and with velocity v ′ has undergone
a collision at position r ′ . This kernel can further be decomposed into
C(r′, v′, i′ → v, i) = ∑_k p_k C_k(r′; v′, i′ → v, i) ,   p_k = Σ_k/Σ_t   (5.18)
with summation over the index k for the different types of collision processes under
consideration and pk defined as the (conditional) probability for a collision to be of
type k. The normalizing factor
c_k(x′) = ∑_i ∫ d³v C_k(r′, v′, i′ → v, i) ,   C̃_k = (1/c_k) C_k   (5.19)
gives the mean number of secondaries for this collision process. The normalized
function C̃_k then is a conditional probability density. The particle absorption
process can conveniently be described by adding an absorbing state x_a to the μ-space
(generally referred to as one-point compactification of this space in the language of
mathematical topology). This limbo state, once it is reached, is never left again if
the kernels T or C are employed as transition probabilities.
The Green's function G, and similarly the kernel T(r′ → r) := Σ G(r′ → r),
describes the motion of the test particles between the collision events. It is the
probability distribution of the free flight length l between events. In more
compact notation
T(v′, l) = Σ_t(v′, r) e^{−∫_{r′}^{r} ds Σ_t(v′, s)} .   (5.20)
As the problem is linear, the source Q arising in the inhomogeneous part can be
normalized to one and, thus, Q can be regarded as a distribution density in phase
space for the primary birth points of particles.
Also a secondary birth point distribution (or post collision density) χ of particles
emerging from a collision event (or directly from the source Q) is sometimes defined
and used as dependent variable, instead of Ψ , φ or f
χ(x) = Q(x) + ∫ dx′ Ψ(x′) C(x′ → x) .   (5.21)
Comparing this with the previous definitions and equations one easily sees that
φ(x) = ∫ dx′ χ(x′) G(x′ → x) ,   (5.22)
χ(x) = Q(x) + ∫ dx′ χ(x′) T(x′ → x) C(x′ → x) .   (5.23)
This equation too has exactly the same form as our generic equation for Ψ. But now
the inhomogeneous part is directly the physical source Q, and the order of C and
T is reversed in the transport kernel. This is also intuitive: χ(x) is the emerging
collision density (per unit time), hence for the next higher generation of emerging
particles first the free flight (T) and then the scattering (C) must be applied.
As already mentioned, a detailed knowledge of Φ, Ψ or χ is often not required,
and the output of Monte Carlo simulations are responses R, defined by
R = ⟨Ψ|g_c⟩ = ∫ dx Ψ(x) g_c(x) = ⟨Φ|g_t⟩ = ∫ dx Φ(x) g_t(x) .   (5.24)
This estimator evaluates the response function g at the points of collisions along the
random walks, starting at the first collision. The factors in the product account for
particle absorption and multiplication.
For example: If g = 1 and c = p_sc, i.e., no particle multiplication, then this
estimator simply counts collisions. It is then also intuitively clear that the response
R_g is just the collision density averaged over phase space (or over a sub-domain of phase
space, if g = 0 outside that sub-domain).
But it can be shown rigorously that the statistical expectation E(X_c) produces

R = E(X_c) = ∫ dω X_c(ω) h(ω)   (5.27)
with h(ω) being the probability density for finding a chain ω from the Markov
process defined above. This means: Xc is, indeed, an unbiased (correct) estimator
for response R.
We now sketch the idea of the proof. We will refer to the construction of a Markov
chain by directly employing the terms Q, T and C in the integral equation (5.13) as
analog and the resulting procedure as analog Monte Carlo. Note that this means that
possible physical particle splitting events (fission processes, cascading of ray show-
ers, dissociation of molecules) have been eliminated already, and this is corrected
for by the weight factors pa and c which result from normalization of the scattering
kernel C. Hence the analog Markov process is not a branching process anymore,
even if the underlying physical process was a branching process.
In order to cover variance reducing methods already in this proof, we also con-
sider another, non-analog, equation, of exactly the same type
ψ̃(x) = S̃(x) + ∫ dx′ K̃(x′ → x) ψ̃(x′)   (5.28)
and we use (5.28) to construct a random walk process, rather than (5.13).
If (5.28) = (5.13) we speak of an analog Monte Carlo game, otherwise of non-
analog Monte Carlo: Variance reduction is then possible by making clever choices
for the non-analog process, as already discussed under the topic importance sam-
pling in the introductory chapter before. For the initial distribution of the Markov
chain we set
f₁(x) = S̃(x) .   (5.29)
The transition probability is defined by
f_{2/1}(x₁ → x₂) = p̃_a(x₁) q̃(x₂) + p̃_sc(x₁) K̃(x₁ → x₂)/c̃(x₁) ,   (5.30)
where p̃_a is, again, the absorption probability, p̃_sc the scattering probability (= 1 − p̃_a),
and q̃(x) is an (entirely irrelevant) distribution, formally needed after the transition into
the limbo state (absorbed particle).
The probability for finding a particular chain (x1 , ..., xk ), ending with absorp-
tion in xk , is given by the product
h(x₁, . . . , x_k) = f₁(x₁) ∏_{j=1}^{k−1} f_{2/1}(x_j → x_{j+1}) .   (5.31)
We now define the estimator for the non-analog Monte Carlo process (with the ana-
log estimator X, (5.26) as special case)
X̃(ω) = X(ω) (S(x₁)/S̃(x₁)) ∏_{j=1}^{k−1} [ (K(x_j → x_{j+1})/K̃(x_j → x_{j+1})) (p_sc(x_j)/p̃_sc(x_j)) (c̃(x_j)/c(x_j)) ] (p_a(x_k)/p̃_a(x_k)) .   (5.32)

In the analog case, (5.13) = (5.28), all weight ratios are equal to one and X̃ = X.
The Monte Carlo method for solving a Fredholm IE by this random walk and with
this estimating method is exact, because:
Theorem 1. If K is subcritical, i.e., the absorption p_a is strong enough compared
to the particle multiplication c, and if some measure-theoretic conditions are fulfilled
as well, namely p̃ = 0 ⇒ p = 0 (Radon–Nikodym) for any non-analog probability p̃
and the corresponding analog probability p in the Markov chain, then
E(X̃(ω)) = I_g(ψ) = ∫ dx S(x) g(x)
    + ∫∫ dx′ dx S(x′) K(x′ → x) g(x)
    + ∫∫∫ dx′′ dx′ dx S(x′′) K(x′′ → x′) K(x′ → x) g(x)
    + . . .   (5.33)
i.e., the sum of the contributions from all random walks: those with length one, plus
those with length two, etc., summing over all possible lengths k of random walks
E(X̃(ω)) = ∑_{k=1}^{∞} ∫. . .∫ dx₁ . . . dx_k X̃(x₁, . . . , x_k) h(x₁, . . . , x_k)
    = . . . (after some lengthy algebra) . . .
    = ∑_{i=1}^{∞} ∫. . . S(x₁) ∏_{j=1}^{i−1} K(x_j → x_{j+1}) g(x_i) N_{i,k}(x_i, . . . , x_{i+k})   (5.34)
with

N_{i,k} = 1 − Prob(a chain which starts at x_i will not end at one of the next k events)
        → 1 for k → ∞   (because K is subcritical) ,   (5.35)
hence E(X̃(ω)) = I_g(ψ) = ⟨g|ψ⟩, by convergence of the von Neumann series of
the Fredholm integral equation (FIE).
Other estimators (track-length type estimators) are employed frequently. These esti-
mators are unbiased as well but have higher moments (e.g. variance) different from
those of Xc . Instead of evaluating the detector function gc (x) at the points of colli-
sions xl as Xc does, they involve line integrals of gt (x) along the trajectories, e.g.,
X_t(ω_{in}) = ∑_{l=0}^{n−1} [ ∏_{j=1}^{l−1} c(x_j)/(1 − p_a(x_j)) ] ∫_{x_l}^{x_{l+1}} ds g_t(s) ,   (5.36)
again with R = E(Xt ) = E(Xc ). See (5.24) for the definition of response func-
tions gc and gt .
It can be seen (see also [3]) that the collision estimator, written not for the pre-collision
density Ψ but for the post-collision density χ (integral equation (5.23)),
results in a track-length type conditional expectation estimator X_e, which reads
X_e(ω_{in}) = ∑_{l=0}^{n−1} [ ∏_{j=1}^{l−1} c(x_j)/(1 − p_a(x_j)) ] ∫_{x_l}^{x_{end}} ds g_t(s) e^{−∫₀^s ds′ Σ_t(s′)} .   (5.37)
Here xend is the nearest point on a boundary along the test flight originating in xl .
The proof is identical to the one given above for the collision estimator, but using
(5.23) for χ as starting point instead (which has the identical mathematical form),
and the definition of the flux φ expressed in terms of χ and the Green’s function G
in (5.22).
With this proof for the estimator X_e, as a special case of a collision estimator
after an averaging transformation of the FIE, the track-length estimator X_t is also
proved to be unbiased for the same response. Because the exponential in X_e is just
the sampling distribution for the flight length between collisions, X_t results from
X_e by randomization: Rather than evaluating the integral over g_t exp(. . .) in X_e,
one samples the next collision point from this exponential distribution and evaluates
g_t only up to this point. This is exactly what the track-length estimator X_t does.
The estimator X_e is related to X_t by extending the line integration, which is
restricted to the path from x_l to x_{l+1} in formula (5.36), to the line segment from
x_l to x_end. I.e., the line integration (scoring) may be extended into a region beyond
the next point of collision, which the generated history would not necessarily
reach. X_e is especially useful for deep penetration problems. Furthermore, for a
point source Q and a purely absorbing host medium its variance is exactly zero: This
Monte Carlo scheme then has turned into a purely analytic or numerical concept. See
also the similar discussions on zero variance estimators in the introductory chapter
before.
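As a numerical illustration of the unbiasedness of both estimator types, the following sketch (all parameters illustrative) compares the collision estimator and the track-length estimator for the simplest possible case of a homogeneous, purely absorbing medium with a monodirectional point source, where the response (the collision density in [0, L]) is known analytically:

```python
import numpy as np

# Toy comparison of the collision estimator X_c and track-length estimator X_t
# for a homogeneous, purely absorbing medium (p_a = 1, c = 0): a point source
# at x = 0 emits particles in +x direction; response = collision density in [0, L].
# Analytic answer: R = 1 - exp(-Sigma*L).
rng = np.random.default_rng(42)
Sigma, L, N = 1.0, 1.5, 200_000

l = rng.exponential(1.0 / Sigma, N)     # free flight length, sampled from T(l)

X_c = (l < L).astype(float)             # score g_c = 1 at the (single) collision
X_t = Sigma * np.minimum(l, L)          # score g_t = Sigma along the track in [0, L]

R_exact = 1.0 - np.exp(-Sigma * L)
for name, X in (("collision  ", X_c), ("track-length", X_t)):
    print(name, X.mean(), "+-", X.std(ddof=1) / np.sqrt(N), " exact:", R_exact)
```

Both estimators reproduce the analytic response; the track-length estimator typically shows the smaller statistical error, as one expects from the discussion above.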
Methods for random number generation from the collision kernel C (i.e., sampling
the post collision velocity after a collision) are largely case dependent. Usually first
a discrete random number is used to determine the type of collision process k, next
one finds post collision parameters and weight from kernel Ck , see (5.19). In case of
scattering, one frequently encountered sampling distribution is given by the follow-
ing consideration: Take a classical Monte Carlo test-particle, velocity v0 , traveling
in a host medium of other particles, which have a known velocity distribution fb ,
often: fb = fMaxw (vb ), a Maxwellian, with a given temperature Tb . Given that a
collision point has been found (after sampling from the transport kernel T ), the task
is to find (sample) the velocity vc of the collision partner. Once both v0 and vc
are known (and the masses of the particles involved), the new velocities can be cal-
culated from the collision kinetics (e.g., classical orbits, or using differential cross
sections, etc.).
The distribution of the velocities of collision partners going into a collision at
this point in phase space is proportional to σ(v_rel) v_rel f_b(v_c). Here v_rel = |v₀ − v_c|,
and the f_b-averaged rate coefficient c = ⟨σ(v_rel) v_rel⟩_{f_b} is the
normalization constant.
We now discuss one special sampling method for the transport kernel T , which is
known under various different names in Monte Carlo literature: Null collisions (in
PIC simulations), pseudo collisions (in fusion plasma applications) or delta-events
(in neutron transport).
Let us take l as the coordinate along the flight starting from r, remove all irrelevant
parameters, and assume that the mean free path λ = 1/Σ_t is independent of the
spatial coordinate r′ along the trajectory under consideration. Then the transport
kernel T, see (5.20), is simply given by the exponential distribution T(l) = (1/λ) e^{−l/λ},
and the flight distance l can directly be sampled by the method of inversion of the
chapter before, Sect. 3.2.2.1.
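A minimal sketch of this inversion step, assuming a constant mean free path λ (value illustrative):

```python
import numpy as np

# Sampling the free flight length from T(l) = (1/lambda) * exp(-l/lambda)
# by inversion: solving U = 1 - exp(-l/lambda) for l.
rng = np.random.default_rng(1)
lam = 0.25                              # constant mean free path lambda = 1/Sigma_t
U = rng.random(1_000_000)
l = -lam * np.log(1.0 - U)              # inverted cumulative distribution
print(l.mean(), "~", lam)               # sample mean approaches lambda
```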
If, however, the parameters of the host medium vary along the flight path
(either continuously or, in a grid, from cell to cell), then it may sometimes be
computationally advantageous to modify the collision rate such that the mean free path
remains constant along a flight path. I.e., one replaces Σ_t(r, v) by Σ_t*(v) = Σ_t(r, v) + Σ_δ(r, v),
where Σ_δ is the cross section of an additional fictitious ("null") collision process.
Clearly, by adding this on both sides of the equation the solution Φ is not altered.
But C is modified to become C ∗
C → C* = (Σ_t/Σ_t*) C + (Σ_δ/Σ_t*) δ(v′ − v) .   (5.42)
Rather than applying weight corrections T/T* we now need to sample from the
non-analog collision kernel C*. But this is trivial: A first random number is used to
decide whether the collision is real or with the δ-isotope. In the second case the scattering
is actually a null event: The flight continues without any change in velocity, due to
the delta distribution for post-collision velocities in the δ-scattering kernel.
Note that typically Σ_δ ≥ 0, i.e., the mean free path in the simulation is reduced.
More general δ-scattering operators, also allowing for negative values of Σ_δ, i.e.,
increased mean free paths, have also been derived [10]. They do not seem to be in
much use: Although they are unbiased (correct), they require negative weight
corrections.
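A minimal sketch of such a null-collision flight, assuming an illustrative spatially varying Σ_t(x) and a constant majorant Σ* ≥ Σ_t(x); the profile and all names are made up for the example:

```python
import numpy as np

# Delta-scattering (null-collision) tracking in a medium with varying Sigma_t(x).
# Flights are sampled with the constant majorant Sigma_star; at each tentative
# collision point a random number decides whether the event is real
# (prob. Sigma_t(x)/Sigma_star) or a null event, in which case the flight continues.
rng = np.random.default_rng(7)

def Sigma_t(x):
    return 1.0 + 0.8 * np.sin(2.0 * np.pi * x)   # illustrative profile, > 0

Sigma_star = 1.8                                 # majorant, >= max Sigma_t

def next_real_collision(x0):
    x = x0
    while True:
        x += rng.exponential(1.0 / Sigma_star)   # flight with constant majorant
        if rng.random() < Sigma_t(x) / Sigma_star:
            return x                             # real collision
        # else: null collision with the "delta isotope", flight continues

samples = np.array([next_real_collision(0.0) for _ in range(20_000)])
print("mean distance to first real collision:", samples.mean())
```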
[Figure: EIRENE test particle trajectories and host medium (background), plotted at Z = 0.000E+00. The legend lists the test particle species (H, D, C, H₂, D₂, H₂⁺, D₂⁺, and the background species H⁺, D⁺, C1⁺, C5⁺, C6⁺) and the trajectory event types: locate (1), electron impact (2), heavy particle impact (3), photon impact (4), elastic collision (5), charge exchange (6), Fokker–Planck (7), surface (8), splitting (9), Russian roulette (10), periodicity (11), restart after splitting (12), save/restart at conditional expectation (13, 14), time limit (15), generation limit (16), fluid limit (17), internal grid surface, error detected.]
Fig. 5.2. Left: Inside view of the TEXTOR tokamak, FZ-Jülich. Major and minor radius of
the torus: 1.75 m and 0.5 m, respectively. Right: 45 typical Monte Carlo trajectories (atoms
and molecules). Analog sampling; host medium: hydrogen plasma, central electron density:
4·10¹⁹ m⁻³, central plasma temperature: 1.5 keV
As can be seen the molecular density is compressed underneath the toroidal limiter
blade (bright area). This is also the location of the pump-ducts. The atoms penetrate
deeper into the plasma; for the TEXTOR conditions shown here the density typically
drops from 10¹⁸ m⁻³ at the outer edge to 10¹³ m⁻³ in the plasma center. More
details on the particular application of Monte Carlo transport methods to neutral par-
ticle transport in fusion plasmas can be found at the URL: www.eirene.de.
Fig. 5.3. Neutral particle density in TEXTOR, poloidal distribution. Monte Carlo solution,
with track-length estimator. Left: atom density. Right: molecule density. Shading according
to logarithmic scale for density, density range: 10¹⁴–10¹⁸ m⁻³
References

1. P. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations. Springer Series: Applications of Mathematics (Springer, Berlin, 1995)
2. C. Gardiner, Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences. Springer Series in Synergetics (Springer, Berlin, 2004)
3. J. Spanier, E. Gelbard, Monte Carlo Principles and Neutron Transport Problems (Addison-Wesley, Reading, MA, 1969)
4. M. Kalos, P. Whitlock, Monte Carlo Methods, Vol. I: Basics (Wiley-Interscience, New York, 1986)
5. J. Hammersley, D. Handscomb, Monte Carlo Methods (Chapman and Hall, London & New York, 1964)
6. W. Feller, Trans. Am. Math. Soc. 48 (1940)
7. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, J. Chem. Phys. 21, 1087 (1953)
8. C. Cercignani, The Boltzmann Equation and its Applications. Springer Series: Applied Mathematical Sciences (Springer, Berlin, 1975)
9. D. Reiter, J. Nucl. Mater. 196–198 (1992)
10. L. Carter, E. Cashwell, W. Taylor, Nucl. Sci. Eng. 48 (1972)
6 The Particle-in-Cell Method
David Tskhakaya
Probably the first Particle-in-Cell (PIC) simulations were performed in the late 1950s
by Buneman [1] and Dawson [2], who simulated the motion of 100–1000 particles
including the interaction between them. Present-day PIC codes simulate 10⁵–10¹⁰ particles
and represent a powerful tool for kinetic plasma studies. They are used in practically
all branches of plasma physics, modelling laboratory as well as astrophysical
plasmas. PIC codes have a number of advantages: They represent so-called lowest
codes, i.e. the number of assumptions made in the physical model is reduced to
the minimum; they can simulate high-dimensional cases and can tackle complicated
atomic and plasma-surface interactions. The price for these advantages is a long
simulation time: Some simulations can take up to 10⁴ hours of CPU time. As a result, they
require a high level of optimization and are usually designed for professional use.
This chapter aims at introducing the reader to the basics of the PIC simulation
technique. It is based mainly on the available literature cited below, but includes
some original unpublished material, too. To the interested reader I can recommend
two classical monographs, [3] and [4], and the papers [5, 6] describing new developments
in this field (see also references cited in the text).
The chapter is organized as follows. The main PIC features are discussed in
Sect. 6.1. In Sect. 6.2 we consider solvers of equations of motion used in PIC and
discuss their accuracy and stability aspects. Initialization of particle distribution,
boundary effects and particle sources are described in Sect. 6.3. In Sects. 6.4 and
6.5 we show how plasma macro-parameters are calculated and discuss solvers of
Maxwell’s equations. Particle collisions are considered in Sect. 6.6. Final remarks
are given in Sect. 6.7.
In general, any simultaneous numerical solution of the equations of motion

dX_i/dt = V_i   and   dV_i/dt = F_i(t, X_i, V_i, A)   (6.1)

for i = 1, . . . , N and of the macro fields A = L₁(B), with a prescribed rule
B = L₂(X₁, V₁, . . . , X_N, V_N) for calculating the macro quantities from the particle
positions and velocities, can be called a PIC simulation. Here X_i and V_i are the
generalized (multi-dimensional) coordinate and velocity of the particle i. A and B
are macro fields acting on particles and some macro-quantities associated with par-
ticles, respectively. L1 and L2 are some operators and F i is the force acting on a
particle i. As one can see, PIC simulations have much broader applications than
just plasma physics. On the other hand, inside the plasma community PIC codes are
usually associated with codes solving the equation of motion of particles with the
Newton–Lorentz force (for simplicity we consider the non-relativistic case)

dX_i/dt = V_i   and   dV_i/dt = (e_i/m_i) (E(X_i) + V_i × B(X_i)) ,   (6.2)

coupled to Maxwell's equations
∇·D = ρ(r, t) ,   ∂B/∂t = −∇×E ,   D = εE ,
∇·B = 0 ,   ∂D/∂t = ∇×H − J(r, t) ,   B = μH ,   (6.3)
together with the prescribed rule of calculation of ρ and J
ρ = ρ (X 1 , V 1 , . . . , X N , V N ) , (6.4)
J = J (X 1 , V 1 , . . . , X N , V N ) . (6.5)
Here ρ and J are the charge and current densities and ε and μ the permittivity and
permeability of the medium, respectively. Below we will follow this definition of
the PIC codes.
PIC codes are usually classified according to the dimensionality of the code and
the set of Maxwell's equations used. Codes solving the whole set of Maxwell's
equations are called electromagnetic codes; electrostatic ones, on the contrary, solve just
the Poisson equation. E.g., the XPDP1 code represents a 1D3V electrostatic code,
which means that it is 1D in ordinary space and 3D in velocity space, and obtains only
the electrostatic field from the Poisson equation [7]. Some advanced codes are able
to switch between different dimensionalities and coordinate systems, and to use
electrostatic or electromagnetic models (e.g. the XOOPIC code [8]).
A simplified scheme of the PIC simulation is given in Fig. 6.1. Below we con-
sider each part of it separately.
[Fig. 6.1: Simplified scheme of a PIC simulation, including input/output.]
During a PIC simulation the trajectories of all particles are followed, which requires the
solution of the equations of motion for each of them. This part of the code is frequently
called the "particle mover".
A few words about the simulation particles themselves. The number of particles in a
real plasma is extremely large and exceeds by orders of magnitude the maximum
number of particles which can be handled even by the best supercomputers.
Hence, during a PIC simulation it is usually assumed that one simulation particle
consists of many physical particles. Because the charge-to-mass ratio is invariant under this
transformation, such a superparticle follows the same trajectory as the corresponding
plasma particle. One has to note that for 1D and 2D models this transformation can
easily be avoided by choosing a sufficiently small simulated volume, so that the
number of real plasma particles can be chosen arbitrarily.
As we will see below, the number of simulated particles is dictated by a set of
physical and numerical restrictions, and usually it is extremely large (> 10⁵). As a
result, the main requirements on the particle mover are high accuracy and speed.
One such solver is the so-called leap-frog method (see [3] and [4]),
which we will consider in detail.
As in other numerical codes, the time in PIC is divided into discrete time moments;
in other words, the time is gridded. This means that physical quantities are
calculated only at given time moments. Usually the time step ∆t between
neighbouring time moments is constant, so that the simulated time moments can be given
via the following expression: t → t_k = t₀ + k∆t and A(t) → A_k = A(t = t_k) with
k = 0, 1, 2, . . ., where t is the time, t₀ the initial moment and A denotes any physical
quantity. The leap-frog method calculates the particle velocity not at the usual time steps t_k,
but between them, at t_{k+1/2} = t₀ + (k + 1/2)∆t. In this way the equations become time-centred,
so that they are sufficiently accurate and require relatively short calculation
time
(X_{k+1} − X_k)/∆t = V_{k+1/2} ,
(V_{k+1/2} − V_{k−1/2})/∆t = (e/m) [ E_k + ((V_{k+1/2} + V_{k−1/2})/2) × B_k ] .   (6.6)
The leap-frog scheme is an explicit solver, i.e. it uses only the old force from the
previous time step k. Contrary to implicit schemes, where for the calculation of the particle
velocity a new field (at time step k + 1) is used, explicit solvers are simpler and
faster, but their stability requires a smaller time step ∆t.
By substituting
V_{k±1/2} = V_k ± (∆t/2) V′_k + (∆t²/8) V′′_k ± (1/6)(∆t/2)³ V′′′_k + . . . ,
X_{k+1} = X_k + ∆t V_k + (∆t²/2) V′_k + (∆t³/6) V′′_k + . . .   (6.7)
into (6.6) we obtain an error of order ∼ ∆t². This satisfies the general requirement
on the scaling of the numerical accuracy, ∆t^a with a > 1. In order to understand this requirement
we recall that for a fixed simulated time the number of simulated time steps scales
as N_t ∼ ∆t⁻¹. Then, after N_t time steps the accumulated total error will scale as
N_t ∆t^a ∼ ∆t^{a−1}, where ∆t^a is the scale of the error during one step. Thus, only
a > 1 can guarantee that the accuracy increases with decreasing ∆t.
There exist different methods for the solution of the finite-difference equations (6.6).
Below we consider the Boris method (see [3]), which is frequently used in
PIC codes:

X_{k+1} = X_k + ∆t V_{k+1/2}   and   V_{k+1/2} = u₊ + qE_k   (6.8)

with u₊ = u₋ + (u₋ + (u₋ × h)) × s, u₋ = V_{k−1/2} + qE_k, h = qB_k,
s = 2h/(1 + h²) and q = (e/m)(∆t/2). Although these equations look very simple,
their solution represents the most time consuming part of PIC, because it has to be done
for each particle separately. As a result, optimization of the particle mover can
significantly reduce the simulation time.
In general, the Boris method requires 39 operations (18 additions and 21 multiplications),
assuming that B is constant and h, s and q are calculated only once at the beginning
of the simulation. But if B has only one or two components, then the number of operations
can be significantly reduced. E.g., if B = (0, 0, B) and E = (E^x, 0, 0), then (6.8) reduces to
the following equations
X_{k+1} = X_k + ∆t V_{k+1/2} ,
V^x_{k+1/2} = u^x₋ + (V^y_{k+1/2} + V^y_{k−1/2}) h + qE^x_k ,
V^y_{k+1/2} = V^y_{k−1/2} (1 − sh) − u^x₋ s   (6.9)

with u^x₋ = V^x_{k−1/2} + qE^x_k. They require just 17 operations (8 multiplications and 9 additions),
which can save up to 50% of the CPU time. Some advanced PIC codes include
a subroutine for searching the fastest solver for a given simulation setup, which
significantly decreases the CPU time.
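A minimal sketch of the Boris push (6.8) in Python; the E × B test case at the end, and the assumption q = (e/m)∆t/2, are illustrative:

```python
import numpy as np

def boris_push(x, v, E, B, dt, e_over_m):
    """One Boris step: returns X_{k+1} and V_{k+1/2} from X_k and V_{k-1/2}."""
    q = 0.5 * dt * e_over_m               # assumed q = (e/m) * dt/2
    u_minus = v + q * E                   # first half of the electric push
    h = q * B                             # rotation vector h = q B_k
    s = 2.0 * h / (1.0 + np.dot(h, h))    # s = 2h / (1 + h^2)
    u_plus = u_minus + np.cross(u_minus + np.cross(u_minus, h), s)
    v_new = u_plus + q * E                # second half of the electric push
    return x + dt * v_new, v_new

# Illustrative test: E x B drift with e/m = 1, B along z, E along x
x, v = np.zeros(3), np.array([0.0, 1.0, 0.0])
E, B = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 2.0])
dt, steps = 0.05, 4000
for _ in range(steps):
    x, v = boris_push(x, v, E, B, dt, 1.0)
print("mean y-velocity:", x[1] / (steps * dt), " E x B drift:", -1.0 / B[2])
```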
In order to find correct simulation parameters one has to know the absolute accuracy
and the corresponding stability conditions for the particle mover. These are different
for different movers, and the example considered below applies just to the Boris
scheme.
First of all let us consider the accuracy of the Larmor rotation. Assuming
V_{k−1/2} ⊥ B, we can define the rotation angle during the time ∆t from

cos(ω∆t) = (V_{k+1/2} · V_{k−1/2}) / V²_{k−1/2} ,   (6.10)

with Ω = eB/m, so that for small ∆t we get ω = Ω(1 − (∆tΩ)²/12 + . . .). E.g.,
for a 1% accuracy the following condition has to be satisfied: ∆tΩ ≤ 0.35.
In order to formulate the general stability condition some complicated calcu-
lations are required (see [4]). Below we present simple estimates of the stability
criteria for the (explicit) particle mover.
Let us consider the equation of a linear harmonic oscillator

d²X/dt² = −ω₀² X ,   (6.12)

having the analytic solution

X = A e^{−iω₀t} .   (6.13)

For the leap-frog discretization one obtains stability for

ω₀∆t < 2 .   (6.16)
In practice one usually requires the stricter condition

ω₀∆t ≤ 0.2 ,   (6.17)

giving sufficiently accurate results. It is interesting to note that this number was
derived a few decades ago, when the number of simulation time steps was typically of the
order of N_t ∼ 10⁴. From (6.15) we obtain ω = ω₀(1 − (ω₀∆t)²/24) + . . .. Hence,
the cumulative phase error after N_t steps should be ∆(ω∆t) ≈ N_t(ω₀∆t)³/24.
Assuming N_t = 10⁴ and ∆(ω∆t) < π we obtain the condition (6.17). Although
modern simulations contain a much larger number of time steps, up to N_t = 10⁷, this
condition can still work surprisingly well.
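The following sketch (harmonic oscillator with ω₀ = 1 and ω₀∆t = 0.2, values illustrative) integrates (6.12) with the leap-frog scheme and compares the magnitude of the accumulated phase error after N_t = 10⁴ steps with the estimate N_t(ω₀∆t)³/24 quoted above:

```python
import numpy as np

w0, dt, Nt = 1.0, 0.2, 10_000             # omega0 and omega0*dt = 0.2, cf. (6.17)
x = 1.0
v = 0.5 * dt * w0**2 * x                  # half-step back-shifted velocity V_{-1/2}
xs, vs = np.empty(Nt), np.empty(Nt)
for k in range(Nt):
    v_old = v
    v = v - w0**2 * x * dt                # V_{k+1/2} (leap-frog kick)
    xs[k], vs[k] = x, 0.5 * (v_old + v)   # position and time-centred velocity at t_k
    x = x + v * dt                        # X_{k+1} (leap-frog drift)

phase = np.unwrap(np.arctan2(-vs / w0, xs))   # oscillator phase at each step
err = abs(phase[-1] - w0 * dt * (Nt - 1))     # deviation from the exact phase w0*t
print("measured phase error    :", err)
print("estimate Nt*(w0*dt)^3/24:", Nt * (w0 * dt)**3 / 24)
```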
The restrictions on ∆t described above can require the simulation of an unacceptably
large number of time steps. In order to avoid these restrictions, different implicit
schemes have been introduced: V_{k+1/2} = F(E_{k+1}, . . .). The difference from the
explicit scheme is that for the calculation of the velocity the new field, given at the
next time moment, is used. One example of an implicit particle mover is the scheme of [9]
(X_{k+1} − X_k)/∆t = V_{k+1/2} ,
(V_{k+1/2} − V_{k−1/2})/∆t = (e/m) [ (E_{k+1}(x_{k+1}) + E_{k−1})/2
    + ((V_{k+1/2} + V_{k−1/2})/2) × B_k ] .   (6.18)
From the physics point of view, the boundary conditions for the simulated particles
are relatively easy to formulate: Particles can be absorbed at boundaries, or injected
from there with any distribution. On the other hand, an accurate numerical imple-
mentation of particle boundary conditions can be tricky. The problem is that (i) the
velocity and position of particles are shifted in time (∆t/2), and (ii) the velocity of
particles are known at discrete time steps, while a particle can cross the boundary at
any moment between these steps.

Fig. 6.2. Particle reflection (1) and reinjection (2) at the boundaries (located at x = 0 and x = L). X*_{k+1} and V*_{k+1/2} denote the virtual position and velocity of a particle if there were no boundary
In unbounded plasma simulations particles are usually reflected at the boundaries,
or reinjected from the opposite side (see Fig. 6.2). A frequently used reflection
model, so-called specular reflection, is given as

X^{refl}_{k+1} = −X_{k+1}   and   V^{x,refl}_{k+1/2} = −V^x_{k+1/2} .   (6.20)
Here, the boundary is assumed to be located at x = 0 (see Fig. 6.2). The specular
reflection represents the simplest reflection model, but due to relatively low accuracy
it can cause artificial effects. Let us estimate the accuracy of reflection (see (6.20)).
The exact time when the particle reaches the boundary and the corresponding velocity
can be written as

t₀ = t_k + |X_k / V^x_{k−1/2}| ,
V^x₀ = V^x_{k−1/2} + |X_k / V^x_{k−1/2}| (e/m) E^x_k .   (6.21)
The second term on the right hand side of (6.22) represents the error made during the
specular reflection, which can cause an artificial particle acceleration and heating.
Particle reinjection is usually applied when the fields satisfy periodic boundary
conditions. The reinjection is given by X^{reinj}_{k+1} = L − X_{k+1} and
V^{x,reinj}_{k+1/2} = V^x_{k+1/2}, where x = L denotes the opposite boundary. If the
fields are not periodic, then this expression has to be modified; otherwise a significant
numerical error can arise.
The PIC codes simulating bounded plasmas are usually modeling particle ab-
sorption and injection at the wall, and some of them are able to tackle complicated
plasma-surface interactions too.
Numerically, particle absorption is the most trivial operation and is done by
removing the particle from memory. Contrary to this, particle injection can require
complicated numerical models. When a new particle is injected it
has to be taken into account that its initial coordinate and velocity are known at
the same time, while the leap-frog scheme uses time-shifted values of them. In
most cases the number of particles injected per time step is much smaller than
the number of particles near the boundary; hence, PIC codes use simple injection
models. For example, an old version of the XPDP1 code (see [7]) used
V_{k+1/2} = V + e∆t(R − 0.5)E_k/m and X^x_{k+1} = R∆tV_{k+1/2}, which assumes
that the particle has been injected at the time t₀ = t_{k+1} − R∆t, with R being a uniformly
distributed number between 0 and 1. V is the velocity obtained from a given injection
distribution function (usually a Maxwellian one). The BIT1 code [10] uses an even
simpler injection routine
V_{k+1/2} = V   and   X^x_{k+1} = R∆tV_{k+1/2} ,   (6.23)
which is independent of the field at the boundary and hence, insensitive to a possible
field error there. Description of higher order schemes can be found in [11].
Strictly speaking, plasma-surface interaction processes cannot be attributed to
the classical PIC method, but probably all advanced PIC codes simulating bounded
plasmas contain elements of Monte Carlo techniques [12]. A general scheme of the
plasma-surface interactions implemented in PIC codes is given below.
When a primary particle is absorbed at the wall, it can cause the emission of
a secondary particle (a special case is the reflection of the same particle). In general
the emission probability F depends on the surface properties and on the primary
particle energy ε and incidence angle α. Accordingly, the PIC code calculates F(ε, α)
and compares it to a random number R, uniformly distributed between 0 and 1. If
F > R then a secondary particle is injected. The velocity of the secondary particle is
calculated according to a prescribed distribution f_sev(V). Some codes allow multiple
secondary particle injection, including as a special case thermal emission.
The functions F and f_sev are obtained from different sources on surface and solid-state
physics.
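A sketch of this decision step; the emission-probability model F_emission and the secondary velocity distribution used here are made-up placeholders, since the real functions F and f_sev come from surface data:

```python
import numpy as np

# Hedged sketch of the surface-interaction step: when a primary particle hits
# the wall, a secondary is injected if F(energy, angle) > R. Both the yield
# model and the secondary distribution below are illustrative placeholders.
rng = np.random.default_rng(3)

def F_emission(energy, angle):                   # hypothetical yield model
    return 0.4 * (1.0 - np.cos(angle)) * (1.0 - np.exp(-energy / 10.0))

def sample_secondary_velocity(T_wall=0.5):       # placeholder Maxwellian f_sev
    return rng.normal(0.0, np.sqrt(T_wall), 3)

def wall_hit(energy, angle):
    if F_emission(energy, angle) > rng.random():
        return sample_secondary_velocity()       # inject a secondary particle
    return None                                  # primary is simply absorbed

print(wall_hit(5.0, 0.8))
```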
The particles in a PIC simulation appear either by initial loading, or via particle
injection from the boundary, or at a volumetric source. In any case the corresponding
velocities have to be calculated from a given distribution function f(V). (i) The first
sampling method is based on the inversion of the normalized cumulative distribution function
F(V) = ∫_{V_min}^{V} f(V′) dV′ / ∫_{V_min}^{V_max} f(V′) dV′ ,   (6.24)

which is equated to a number U, uniformly distributed between 0 and 1, and inverted:

V = F⁻¹(U) .   (6.25)
The same method can be applied to multi-dimensional cases which can be ef-
fectively reduced to 1D, e.g., by variable separation: f (V ) = f1 (V x ) f2 (V y )
f3 (V z ). Often inversion of (6.25) can be done analytically, otherwise it is done
numerically.
As an example we consider the injection of Maxwell-distributed particles:
f (V ) ∼ V exp(−V 2 /(2VT2 )). According to (6.24) and (6.25) we get
F(V) = 1 − e^{−V²/(2V_T²)}   and   V = V_T √(−2 ln(1 − U)) .   (6.26)
(ii) Another possibility is to use two sets of random numbers R₁ and R₂ (for simplicity
we consider a 1D case): one sets V = V_min + R₁(V_max − V_min); if f(V)/f_max > R₂
the value V is used, else one tries once more. This rejection method requires high-quality
random number generators and is time consuming. As a result, it is usually used when
the method considered above cannot be applied (e.g. for complicated multi-dimensional
f(V)).
In advanced codes these distributions are generated and saved at the beginning
of a simulation, so that later no further calculations are required except getting V
from the memory. The same methods are used for spatial distributions f (X), too.
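A minimal sketch of method (i) applied to the example (6.26), checking the second moment of the generated distribution (for this distribution ⟨V²⟩ = 2V_T²; all parameters illustrative):

```python
import numpy as np

# Inversion sampling for (6.26): f(V) ~ V exp(-V^2/(2 V_T^2)) gives
# V = V_T * sqrt(-2 ln(1 - U)); U may be random or ordered numbers in [0, 1).
rng = np.random.default_rng(11)
V_T, N = 1.0, 500_000
U = rng.random(N)
V = V_T * np.sqrt(-2.0 * np.log(1.0 - U))
print("<V^2> =", (V**2).mean(), " expected:", 2.0 * V_T**2)
```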
As mentioned above, the required velocity distributions can be generated by a
set of either ordered numbers U or random numbers R, which are uniformly
distributed between 0 and 1. A proper choice of these numbers is not a trivial task
and depends on the simulated system; e.g., the use of random numbers can cause
some noise. In addition, numerically generated random numbers in reality represent
pseudo-random numbers, which can correlate and cause unwanted effects.
Contrary to this, the distributions generated by a set of ordered numbers, e.g. U =
(i + 0.5)/N, i = 0, . . . , N − 1, are less noisy. On the other hand, in this case the
generated distributions represent a multi-beam distribution, which sometimes can
cause a beam instability [3].
All numerical schemes considered up to now can be applied not only to PIC, but to
any test particle simulation, too. In order to simulate a real plasma one has to obtain
the force acting on the particles self-consistently, i.e. to calculate the particle and current
densities and solve Maxwell's equations. The part of the code calculating the macro
quantities associated with particles (n, J, . . .) is called "particle weighting".
For a numerical solution of the field equations it is necessary to grid the space:
x → x_i with i = 0, . . . , N_g. Here x is a general 3D coordinate and N_g the number of
grid cells (e.g. for 3D Cartesian coordinates N_g = (N_gx, N_gy, N_gz)). Accordingly,
the plasma parameters are known at these grid points: A(x) → Ai = A(x = xi ).
The number of simulation particles at the grid points is relatively low, so that one cannot
use the analytic approach of point particles, which is valid only when the number
of these particles is very large. The solution is to associate macro parameters with each
of the simulation particles. In other words, one assumes that the particles have some shape
S(x − X), where X and x denote the particle position and the observation point.
Accordingly, the distribution moments at the grid point i associated with the particle
"j" can be defined as
“j” can be defined as
Am m
i = aj S (xi − X j ) , (6.27)
where A0i = ni , A1i = ni V i , A2i = ni Vi2 etc. and a0j = 1/Vg , a1j = V j /Vg ,
a2j = (V j )2 /Vg etc. Vg is the volume occupied by the grid cell. The total distribution
moments at a given grid point are expressed as
A^m_i = ∑_{j=1}^{N} a^m_j S(x_i − X_j) .   (6.28)
Stability and simulation speed of PIC simulations strongly depend on the choice
of the shape function S (x). It has to satisfy a number of conditions. The first two
conditions correspond to space isotropy
The remaining conditions can be obtained by requiring an increasing accuracy of the
weighting scheme. In order to derive them let us consider the potential generated at
the point x by a unit charge located at the point X: G(x − X). In other words,
G(x − X) is the Green's function (for simplicity we consider a 1D case). Introducing
the weighting scheme we can write the potential generated by a particle
located at X as
φ(x) = e ∑_{i=1}^{m} S(x_i − X) G(x − x_i) ,   (6.31)
here e is the particle charge and m the number of nearest grid points with assigned
charge. Expanding G (x − xi ) near (x − X) we get
φ(x) = e ∑_{i=1}^{m} S(x_i − X) G(x − X)
     + e ∑_{i=1}^{m} S(x_i − X) ∑_{n=1}^{∞} ((X − x_i)ⁿ/n!) dⁿG(x − X)/dxⁿ
     = e G(x − X) + δφ(x) ,

δφ(x) = e ∑_{n=1}^{∞} (1/n!) (dⁿG(x − X)/dxⁿ) ∑_{i=1}^{m} S(x_i − X) (X − x_i)ⁿ .   (6.32)
The first term on the right hand side of expression (6.32) represents the physical
potential, while δφ is an unphysical part introduced by the weighting. It is natural to
require this term to be as small as possible. This can be done by demanding
∑_{i=1}^{m} S(x_i − X) (x_i − X)ⁿ = 0   for n = 1, . . . , n_max .   (6.33)
Thus, at large distances from the particle (|X − x_i| < |x − X|) δφ(x) decreases
with increasing n_max. The shape functions can be directly constructed from the conditions (6.29),
(6.30) and (6.33). The latter two represent algebraic equations for S(x_i − X).
Hence, the number of conditions (6.33) which can be satisfied depends on the
The meaning of this expression is that the density at the grid point x_i assigned by
the particle located at the point x represents the average of the particle's real shape
D(x′ − x) over the area [x_i − ∆x/2; x_i + ∆x/2]. For the nearest grid point (NGP) and
linear weightings, D(x) = δ(x) and D(x) = H(∆x/2 − |x|), respectively. Here
H(x) is the step function: H(x) = 1 if x > 0, else H(x) = 0.

Fig. 6.3. Particle shapes for the NGP (left) and linear (right) weightings in 1D (a) and 2D
(b) cases
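A minimal sketch of the density assignment (6.28) with the linear (CIC) shape function on a periodic 1D grid; grid size and particle number are illustrative:

```python
import numpy as np

# Density assignment (6.28) in 1D with the linear (CIC) shape function: each
# particle contributes to its two nearest grid points with weights 1-d and d.
def deposit_cic(X, grid_n, dx, weight=1.0):
    rho = np.zeros(grid_n)
    cell = np.floor(X / dx).astype(int)
    d = X / dx - cell                       # normalized distance to left point
    np.add.at(rho, cell % grid_n, weight * (1.0 - d))
    np.add.at(rho, (cell + 1) % grid_n, weight * d)
    return rho / dx                         # density = charge per cell volume

rng = np.random.default_rng(5)
Ng, dx = 64, 1.0 / 64
X = rng.random(100_000)                     # particle positions in [0, 1)
rho = deposit_cic(X, Ng, dx, weight=1.0 / 100_000)
print("total charge conserved:", np.isclose(rho.sum() * dx, 1.0))
```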
After the calculation of the charge and current densities, the code solves Maxwell's
equations (cf. Fig. 6.1) and delivers the fields at the grid points i = 0, . . . , N_g. These
fields cannot be used directly for the calculation of the force acting on the particles, which
are located at arbitrary points and not necessarily at the grid points. The calculation of the fields at
an arbitrary point is done in a similar way as the charge assignment and is called field weighting.
So, we have E_i and B_i and want to calculate E(x) and B(x) at any point x. This
interpolation should conserve momentum, which can be achieved by requiring that the
following conditions are satisfied:
(i) The weighting schemes for the fields and the particles are the same:

E(x) = ∑_i E_i S(x_i − x) .   (6.36)
(ii) The field solver has a correct space symmetry, i.e. formally the field can be
expressed in the following form (for simplicity we consider the 1D case)
E_i = ∑_k g_{ik} ρ_k   (6.37)
with gik = −gki , where ρk is the charge density at the grid point k. In order to
understand this condition better, let us consider a 1D electrostatic system. By
integrating the Poisson equation we obtain
E(x) = (1/2ε₀) ( ∫_a^x ρ dx − ∫_x^b ρ dx ) + (E_a + E_b)/2 ,   (6.38)
where a and b define boundaries of the system. Assuming that either a and b
are sufficiently far and Ea,b = ρa,b = 0, or the system (potential) is periodic
Eb = −Ea , ρb = ρa , we obtain
E(x_i) = (1/2ε₀) ( ∫_a^{x_i} ρ dx − ∫_{x_i}^b ρ dx )
       = (∆x/4ε₀) ( ∑_{k=1}^{i−1} (ρ_k + ρ_{k+1}) − ∑_{k=i}^{N_g−1} (ρ_k + ρ_{k+1}) )
       = (∆x/4ε₀) ∑_{k=1}^{N_g} g_{ik} ρ_k   (6.39)
with antisymmetric coefficients g_{ik} = −g_{ki}. For the self-force of a particle on itself one then finds

F_self = (e²/V_g) ∑_{i,k} g_{ik} S(x_i − x) S(x_k − x) = (i ↔ k)
       = −(e²/V_g) ∑_{i,k} g_{ik} S(x_i − x) S(x_k − x) = −F_self = 0 ,   (6.41)

and for the force between two particles one correspondingly obtains F₁₂ = −F₂₁ .   (6.42)
The total electromagnetic force on the particles reads

F = ∑_i E_i ∑_{p=1}^{N} e_p S(x_i − x_p) − ∑_i B_i × ∑_{p=1}^{N} e_p V_p S(x_i − x_p)
  = V_g ∑_i (ρ_i E_i + J_i × B_i) .   (6.43)
As we see, the conditions (6.36) and (6.37) guarantee that during the force weighting
the momentum is conserved and the inter-particle forces are calculated in a proper
way. It has to be noted that:
(i) We neglected the contribution of an internal magnetic field B_int.
(ii) The momentum conserving schemes considered above do not necessarily
conserve the energy, too (for energy conserving schemes see [3] and [4]).
(iii) The condition (6.37) is in general not satisfied for coordinate systems with
nonuniform grids, causing self-forces and incorrect inter-particle forces.
For example, if we introduce a nonuniform grid ∆x_i = ∆x α_i with α_i ≠ α_{j≠i},
then instead of expression (6.39) we obtain

E(x_i) = (∆x/4ε₀) ∑_{k=1}^{N_g} g_{ik} ρ_k   (6.45)
with

g_{ik} = { α_k + α_{k−1}  if i > k ;  −(α_k + α_{k−1})  if i < k ;  α_{i−1} − α_i  if i = k } ,
∆x = (b − a) / ∑_{i=1}^{N_g} α_i .   (6.46)
According to the type of the equations to be solved, the field solvers can be explicit
or implicit. E.g., an explicit solver of the Poisson equation solves the usual Poisson
equation

∇[ε(x)∇ϕ(x, t)] = −ρ(x, t) ,   (6.47)

while an implicit one solves a modified equation containing an implicit numerical
factor η(x). This factor arises due to the fact that in the implicit formulation the new
position (and hence ρ) of the particles is calculated from the new field given at the
same moment.
As an example we consider some matrix methods, which are frequently used in
different codes. For a general overview of different solvers the interested reader can
consult [3] or [4].
aϕ_{i+1} + bϕ_i + cϕ_{i−1} = ϕ_i(2a + b) + ϕ″_i a∆x² + ϕ⁽⁴⁾_i a ∆x⁴/12 + . . . .   (6.52)
Hence, by choosing a = 1 and b = −2a = −2 we get
∂²ϕ/∂x²|_{x=x_i} − (ϕ_{i+1} − 2ϕ_i + ϕ_{i−1})/∆x² = −(∆x²/12) ϕ⁽⁴⁾_i + O(∆x⁴) .   (6.53)
An excellent example of a 1D Poisson solver has been introduced in [7]. The solver
is applied to a 1D bounded plasma between two electrodes and solves the Poisson and
external circuit equations simultaneously. Later, this solver has been applied to a 2D
plasma model [14]. Below we consider a simplified version of this solver, assuming
that the external circuit consists of a voltage source V(t) (or current source I(t)) and a
capacitor C (see Fig. 6.4).
The Poisson equation for a 1D plasma is given as

ϕ_{i+1} − 2ϕ_i + ϕ_{i−1} = −(∆x²/ε₀) ρ_i .   (6.55)
It is a second order equation, so that we need two boundary conditions for the solu-
tion. The first one can be a potential at the right-hand-side (rhs) wall:
ϕN g = 0. (6.56)
[Fig. 6.4: 1D bounded plasma between two walls, with an external circuit consisting of a voltage source V(t) or current source I(t) and a capacitor C.]
Recalling that E0 is the electric field at the l.h.s. wall, we can write E0 = σlhs /ε0 ,
where σlhs is the surface charge density there. Hence, the second boundary condition
can be formulated as
ϕ₀ − ϕ₁ = (∆x/ε₀) ( σ_lhs + (∆x/2) ρ₀ ) .   (6.58)
In order to calculate σlhs we have to employ the circuit equation.
ϕ₀ − ϕ_{N_g} = ϕ₀ = V(t) .   (6.62)
In this case Q_ci can be directly calculated from the expression Q_ci = ∆tI(t). Then
the second boundary condition can be given as
ϕ₀ − ϕ₁ = (∆x/ε₀) ( σ_lhs(t − ∆t) + (Q_pl + ∆tI(t))/S + (∆x/2) ρ₀ ) .   (6.65)
Combining equations (6.55), (6.56) and (6.61)–(6.65) we can write the set of differ-
ence equations in the following matrix form
⎛ a   b   0  · · ·          0 ⎞ ⎛ ϕ₀       ⎞              ⎛ d/∆x     ⎞
⎜ c  −2   1   0  · · ·      0 ⎟ ⎜ ϕ₁       ⎟              ⎜ ρ₁ + e   ⎟
⎜ 0   1  −2   1   0 · · ·   0 ⎟ ⎜ ϕ₂       ⎟ = −(∆x²/ε₀)  ⎜ ρ₂       ⎟ .   (6.66)
⎜        · · ·                ⎟ ⎜  · · ·   ⎟              ⎜  · · ·   ⎟
⎜ 0  · · ·  0   1  −2   1     ⎟ ⎜ ϕ_{Ng−2} ⎟              ⎜ ρ_{Ng−2} ⎟
⎝ 0  · · ·      0   1  −2     ⎠ ⎝ ϕ_{Ng−1} ⎠              ⎝ ρ_{Ng−1} ⎠
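A sketch of solving such a tridiagonal system with the Thomas algorithm; for simplicity plain Dirichlet boundary values ϕ₀ = V_wall and ϕ_{N_g} = 0 are used instead of the full circuit coefficients a, b, c, d, e, and all numbers are illustrative:

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a[i]*phi[i-1] + b[i]*phi[i] + c[i]*phi[i+1] = d[i] (tridiagonal)."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    phi = np.empty(n)
    phi[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = dp[i] - cp[i] * phi[i + 1]
    return phi

Ng, L, eps0, V_wall = 128, 1.0, 1.0, 0.5
dx = L / Ng
rho = np.ones(Ng - 1)                        # uniform charge on interior points
a = np.ones(Ng - 1); b = -2.0 * np.ones(Ng - 1); c = np.ones(Ng - 1)
d = -dx**2 * rho / eps0
d[0] -= V_wall                               # phi_0 enters the first equation
phi_int = thomas(a, b, c, d)
phi = np.concatenate(([V_wall], phi_int, [0.0]))
resid = phi[2:] - 2 * phi[1:-1] + phi[:-2] + dx**2 * rho / eps0
print("max residual of (6.55):", np.abs(resid).max())
```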
This 1D solver can be generalized to the 2D case. The main differences between
the 1D and 2D cases are the decomposition of the field and the boundary
conditions at internal objects introduced in 2D (for details see [14]).
The field is decomposed into a plasma part ϕ_pl and a vacuum part ϕ_vac.
Here ϕ_pl is the plasma field with zero boundary conditions,
∆ϕ_pl(t, x, y) = −(1/ε₀) ρ(x, y) ,   ϕ_pl|_b = 0 ,   (6.68)
and ϕ_vac is the vacuum field with unit boundary conditions, ∆ϕ_vac(x, y) = 0, ϕ_vac|_b = 1.
It is easy to see that (6.75) is similar to the equation for the 1D model considered above
and can be solved in the same way. The main difference are the boundary conditions
along the x-axis. E.g., if the plasma is bounded between two conducting walls, then
ϕ^k₀ = ϕ^k_{N_g} = 0 if k > 0, and for the k = 0 component we have exactly the same
equation as in 1D, with the same boundary condition.
For sufficiently strong fields and/or very fast processes it is necessary to solve the
complete set of Maxwell's equations (6.3). It is obvious that the corresponding solvers
are more complicated than the ones considered above, and a detailed description
of them is beyond the scope of this work. Here we present just one possible
scheme, which is implemented in the XOOPIC code [8].
In order to ensure high speed and accuracy it is convenient to introduce a leap-frog
scheme for the fields, too. The leap-frog scheme is applied to the space coordinates
as well, which means that the electric and magnetic fields are shifted in time by ∆t/2,
and their different components are shifted in space by ∆x/2. In other words:
(i) E is defined at the time moments t = n∆t, and B and J at t = (n + 1/2)∆t.
(ii) The "i" components of the electric field and current density are defined at the points
x_i + ∆_i/2, x_k and x_j, and the same component of the magnetic field at x_i, x_k +
∆_k/2 and x_j + ∆_j/2. Here x_s and ∆_s for s = i, k, j denote the grid point
and grid size along the s-axis; i, k and j denote the indices of the right-handed
Cartesian coordinate system.
As a result the finite-differenced Ampère's and Faraday's laws in Cartesian coordinates can be written as

(D^{i,t}_{i+1/2,k,j} − D^{i,t−∆t}_{i+1/2,k,j})/∆t
    = (H^{j,t−∆t/2}_{i+1/2,k+1/2,j} − H^{j,t−∆t/2}_{i+1/2,k−1/2,j})/∆x_k
    − (H^{k,t−∆t/2}_{i+1/2,k,j+1/2} − H^{k,t−∆t/2}_{i+1/2,k,j−1/2})/∆x_j − J^{i,t−∆t/2}_{i+1/2,k,j} ,   (6.76)

(B^{i,t+∆t/2}_{i,k+1/2,j+1/2} − B^{i,t−∆t/2}_{i,k+1/2,j+1/2})/∆t
    = (D^{k,t}_{i,k+1/2,j+1} − D^{k,t}_{i,k+1/2,j})/∆x_j − (D^{j,t}_{i,k+1,j+1/2} − D^{j,t}_{i,k,j+1/2})/∆x_k .   (6.77)
The stability of this scheme can be analyzed with the plane-wave ansatz

A = A₀ e^{i(kx−ωt)} ,   (6.78)

with A = E, B. After substitution of (6.78) into the field equations (6.76) and (6.77)
and some trivial transformations we obtain
( sin(ω∆t/2) / (c∆t) )² = ∑_{i=1}^{3} ( sin(k_i ∆x_i/2) / ∆x_i )² ,   (6.79)
where c = 1/√(ε₀μ₀) is the speed of light. It is obvious that the solution is stable
(i.e. Im ω ≤ 0) if
(c∆t)² < ( ∑_{i=1}^{3} 1/∆x_i² )⁻¹ .   (6.80)
Often this so-called Courant condition requires an unnecessarily small time step for the
particle mover. In order to relax it one can introduce separate time steps for the fields and
the particles. This procedure is called "sub-cycling" [3].
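A small sketch evaluating the Courant limit (6.80) and the resulting sub-cycling ratio for illustrative grid spacings and particle time step:

```python
import numpy as np

# Maximum stable field-solver time step from the Courant condition (6.80),
# and the sub-cycling ratio for a given particle time step (values illustrative).
c = 3.0e8
dx = np.array([1.0e-3, 1.0e-3, 2.0e-3])          # grid spacings in x, y, z (m)
dt_field_max = 1.0 / (c * np.sqrt(np.sum(1.0 / dx**2)))
dt_particle = 5.0e-12
print("dt_field_max =", dt_field_max)
print("field sub-steps per particle step:", int(np.ceil(dt_particle / dt_field_max)))
```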
The routines described above, namely the field solver, the particle mover with
proper boundary conditions and particle sources, and the weighting of particles and fields,
represent a complete PIC code in its classical understanding. Starting from the 1970s
a number of PIC codes have included different models of particle collisions. Today the
majority of PIC codes include at least some kind of collision operator, so that these
operators have to be attributed to the PIC technique as well. They are usually based on statistical
methods and correspondingly are called Monte Carlo (MC) models; often different
authors use the name PIC-MC code. MC simulations represent an independent
branch of numerical physics and the interested reader can find more on the MC method
in the corresponding literature (e.g., see Part II). Below we consider the main features
of the MC models used in PIC codes.
The forces acting on the particles in a classical PIC scheme correspond to macro
fields, so that the simulated plasma is assumed to be collisionless. In order to simulate
a collisional plasma it is necessary to implement corresponding routines. Moreover,
the field solver is organized in such a way that self-forces are excluded; hence,
the field generated by a particle inside a grid cell decreases with decreasing distance
from this particle. As a result, inter-particle forces inside grid cells are underestimated
(see Fig. 6.5). Hence, they can be (at least partially) compensated by
introducing a Coulomb collision operator.
[Fig. 6.5: Inter-particle force versus distance r/∆r in PIC simulations (1D, 2D, 3D), compared with the physical scalings ∼1/r², ∼1/r, ∼const and ∼r.]
The first codes simulating Coulomb collisions were particle-particle codes,
simulating the exact interaction between each particle pair. Of course this method,
which scales as N², cannot be used in contemporary PIC simulations. Later different
MC models have been developed.
The simplest, linear model assumes that the particle distribution is close to a
Maxwellian and calculates an average force acting on the particles due to collisions
[16]. Although this is the fastest operator, it probably cannot be used for most
kinetic plasma simulations, where the particle distributions are far from Maxwellian.
A nonlinear analogue of this model has been introduced in [17]. Here, the exact
inter-particle collision term is obtained from the particle velocity distribution
function. Unfortunately, the number of particles required for building up a sufficiently
accurate velocity distribution is extremely large (see [18]), which makes it
practically impossible to simulate large systems.
Most of the nonlinear Coulomb collision operators used in present-day PIC codes are
based on the binary collision model introduced in [19]. In this model each particle
inside a cell is collided with one particle from the same cell. This collision operator
conserves energy and momentum and is sufficiently accurate. The main idea is
based on the fact that there is no need to consider the Coulomb interaction between two
particles separated by a distance larger than the Debye radius λ_D (e.g., see [20]).
Since a typical size of the PIC cell is of the order of λ_D, the interaction between the
particles in different cells can be neglected. This method consists of the following
three steps (see Fig. 6.6):
(i) First, all particles are grouped according to the cells where they are located;
(ii) Then these particles are paired in a random way, so that one particle has only
one partner;
(iii) Finally, the paired particles are (statistically) collided.
The latter is not trivial and we consider it in some detail.
Fig. 6.6. Binary collision model from [19]. 1. Grouping of particles in the cells; 2. Randomly
changing the particle order inside the cells; 3a. Colliding particles of the same type; 3b.
Colliding particles of different types
∆V = (O(χ, ψ) − 1) V ,   (6.82)

where O(α, β) is the matrix corresponding to a rotation by the angles α and β (see
[19]); χ and ψ represent the scattering and azimuthal angles.
The scattering angle χ is calculated from a corresponding statistical distribution.
Using the Fokker–Planck collision operator one can show (see [22]) that during the
time ∆t_c the scattering angle has the following Gaussian distribution

P(χ) = (χ/⟨χ²⟩_{∆t_c}) e^{−χ²/(2⟨χ²⟩_{∆t_c})} ,
⟨χ²⟩_{∆t_c} ≡ e₁² e₂² n ∆t_c Λ / (2πε₀² μ² V³) .   (6.83)
Here e₁,₂ and μ = m₁m₂/(m₁ + m₂) denote the charges and the reduced mass of the
collided particles, respectively; n and Λ are the density and the Landau logarithm
[20], respectively. The distribution (6.83) can be inverted to get
χ = √(−2⟨χ²⟩_{∆t_c} ln R₁) .   (6.84)
Correspondingly, the azimuthal angle ψ is chosen randomly between 0 and 2π
ψ = 2πR2 . (6.85)
R1 and R2 are random numbers between 0 and 1.
Finally, the routine for two-particle collision is reduced to the calculation of
expressions (6.81), (6.82), (6.84), and (6.85).
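A sketch of one such binary collision: χ is sampled from (6.84) and ψ from (6.85); the relative velocity is then rotated using standard rotation formulas (assumed here in place of the explicit matrix O(χ, ψ) of [19]) and both particles are updated with the reduced mass, which conserves momentum and energy. ⟨χ²⟩ and the test velocities are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)

def coulomb_collide(v1, v2, m1, m2, chi2_mean):
    u = v1 - v2                               # relative velocity
    u_abs = np.linalg.norm(u)
    u_perp = np.hypot(u[0], u[1])
    # (6.84); using 1-R (also uniform) avoids log(0)
    chi = np.sqrt(-2.0 * chi2_mean * np.log(1.0 - rng.random()))
    psi = 2.0 * np.pi * rng.random()          # (6.85)
    s, c1 = np.sin(chi), 1.0 - np.cos(chi)
    if u_perp > 1e-12 * u_abs:                # generic orientation of u
        du = np.array([
            (u[0] / u_perp) * u[2] * s * np.cos(psi)
            - (u[1] / u_perp) * u_abs * s * np.sin(psi) - u[0] * c1,
            (u[1] / u_perp) * u[2] * s * np.cos(psi)
            + (u[0] / u_perp) * u_abs * s * np.sin(psi) - u[1] * c1,
            -u_perp * s * np.cos(psi) - u[2] * c1,
        ])
    else:                                     # u (almost) parallel to z
        du = np.array([u_abs * s * np.cos(psi),
                       u_abs * s * np.sin(psi), -u[2] * c1])
    mu = m1 * m2 / (m1 + m2)                  # reduced mass
    return v1 + (mu / m1) * du, v2 - (mu / m2) * du

v1, v2 = np.array([1.0, 0.0, 0.0]), np.zeros(3)
w1, w2 = coulomb_collide(v1, v2, 1.0, 1.0, 0.01)
print("momentum conserved:", np.allclose(v1 + v2, w1 + w2))
print("energy conserved  :", np.isclose(v1 @ v1, w1 @ w1 + w2 @ w2))
```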
The Coulomb interaction is a long-range interaction, for which the cumulative effect of
many light collisions with small scattering angles represents the main contribution to
the collisionality. Accordingly, the time step ∆t_c for the Coulomb collisions should
be sufficiently small: ⟨χ²⟩_{∆t_c}(V = V_T) ≪ 1. It is more convenient to formulate
this condition in the following equivalent form

ν_c ∆t_c ≪ 1   with   ν_c = e₁² e₂² n Λ / (2πε₀² μ² V_T³) ,   (6.86)
where ν_c is the characteristic relaxation frequency for the given Coulomb collisions [23]
and V_T is the thermal velocity of the fastest collided particle species. Although
usually ∆t_c ≫ ∆t, the binary collision operator is the most time consuming part of a
PIC code. Recently, in order to speed up collisional plasma simulations, a number
of updated versions of this operator have been developed (e.g., see [6, 24] and [25]).
Under realistic conditions the plasma contains different neutral particles, which suffer
collisions with the plasma particles. The corresponding collision models used in
PIC codes can be divided into two different schemes: direct Monte Carlo and null-collision
models.
The direct Monte Carlo model is the common MC scheme in which all particles carry
information about their collision probability, and all particles have to be
analyzed for a possible collision. Hence, the direct MC requires some additional
memory storage and a sufficiently large amount of CPU time.
The null-collision method (see [26] and [27]) requires a smaller number of particles
to be sampled and is relatively fast. It uses the fact that in each simulation
time step only a small fraction of the charged particles suffer collisions with the
neutrals; hence, there is no necessity to analyze all particles. As a first step the
maximum collision probability is calculated for each charged particle species,

P_max = 1 − e^{−(σn∆s)_max} = 1 − e^{−(σV)_max n_max ∆t} ,   (6.87)
where σ(V) = ∑_i σ_i(V) is the total collision cross-section, i.e. the sum of the
cross-sections σ_i for all possible collision types, n is the neutral density, and
∆s = V∆t is the distance which the particle travels per time step ∆t. Accordingly, the
maximum number of particles which can suffer a collision per time step is given
as N_nc = P_max N ≪ N. As a result only N_nc particles per time step have to be
analyzed. These N_nc particles are randomly chosen, e.g., by using the expression
i = R_j N with j = 1, . . . , N_nc, where i is the index of the particle to be sampled
and the R_j are random numbers between zero and one. The sampling procedure
itself includes the calculation of the collision probability of a sampled particle and
choosing which kind of collision it should suffer (if any). For this a random number
R is compared to the corresponding relative collision probabilities: if
R ≤ (P₁ + P₂)/P_max ≈ nV(σ₁(V) + σ₂(V)) / ((σV)_max n_max) ,   (6.89)
then a collision of the corresponding type takes place. In the linear case the neutral
collision partner's velocity is picked from a prescribed distribution (usually a Maxwellian
with given density and temperature profiles). Contrary to this, in the
nonlinear case the motion of the neutral particles is resolved in the simulation, and the
collided ones are randomly chosen from the same cells where the colliding charged
particles are.
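A sketch of the null-collision selection (6.87)-(6.89) for one charged species with two illustrative collision processes; the cross sections and plasma parameters are made-up placeholders, not real atomic data:

```python
import numpy as np

rng = np.random.default_rng(13)

def sigma_elastic(v):                     # made-up cross sections (m^2)
    return 1.0e-19 * np.ones_like(v)

def sigma_ioniz(v):
    return 5.0e-20 * np.clip(v / 1.0e6 - 1.0, 0.0, None)

n_neutral, dt, N = 1.0e20, 1.0e-10, 100_000
v = np.abs(rng.normal(0.0, 1.0e6, N))     # electron speeds (m/s), illustrative

sigv_max = (v * (sigma_elastic(v) + sigma_ioniz(v))).max()  # majorant of sigma*v
P_max = 1.0 - np.exp(-sigv_max * n_neutral * dt)            # (6.87)

N_nc = int(P_max * N)                     # particles that must be sampled
idx = (rng.random(N_nc) * N).astype(int)  # i = R_j * N

counts = {"elastic": 0, "ionization": 0, "null": 0}
for i in idx:
    r = rng.random()
    p1 = n_neutral * v[i] * sigma_elastic(v[i]) * dt / P_max  # approx. (6.89)
    p2 = n_neutral * v[i] * sigma_ioniz(v[i]) * dt / P_max
    if r < p1:
        counts["elastic"] += 1            # here: scatter particle i
    elif r < p1 + p2:
        counts["ionization"] += 1         # here: remove neutral, add e-i pair
    else:
        counts["null"] += 1               # null collision: nothing happens
print(counts)
```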
When the collision partners and corresponding collision types are chosen, the
collision itself takes place. Each collision type needs a separate consideration, so
that here we discuss the general principle.
The easiest collisions are the ion-neutral charge-exchange collisions. In this case
the collision is reduced to an exchange of velocities:

V′₁ = V₂   and   V′₂ = V₁ .   (6.91)
The recombination collisions are also easy to implement. In this case the collided
particles are removed from the simulation and the newly born particle, i.e. the
recombination product, has the velocity derived from momentum conservation:

V_new = (m₁V₁ + m₂V₂)/m_new .   (6.92)
The elastic collisions are treated in a similar way as the Coulomb collisions
using (6.81). The scattering angle depends on the given atomic data. E.g., often it is
assumed that the scattering is isotropic
cos χ = 1 − 2R . (6.93)
In order to save computational time during electron-neutral elastic collisions, the
neutrals are assumed to be at rest. Accordingly, instead of resolving (6.81) a simplified
expression is used for the calculation of the after-collision electron velocity,

V′_e ≈ V_e √(1 − (2m_e/M_n)(1 − cos χ)) .   (6.94)
Excitation collisions are done in a similar way as the elastic ones; just before the
scattering the threshold energy E_th is subtracted from the charged particle energy:

V ⇒ V′ = V √(1 − E_th/E) ⇒ scattering ⇒ V′′ .   (6.95)
It is important to note that one has to take care of the proper coordinate system; e.g.,
in (6.95) the first transformation should be done in a reference frame where the collided
neutral is at rest.
The implementation of inelastic collisions, when secondary particles are produced,
is case dependent. E.g., in electron-neutral ionization collisions, first the neutral
particle is removed from the simulation and a secondary electron-ion pair is born.
The velocity of this ion is equal to the neutral particle velocity. The velocity of the
electrons is calculated in the following way: First, the ionization energy is subtracted
from the primary electron energy, and then the rest is divided between the primary
and secondary electrons. This division is done according to given atomic data. After
finding these energies the electrons are scattered through the angles χ_prim and χ_sec.
In a similar way the neutral-neutral and inelastic charged-charged particle collisions
can be treated.
References
1. O. Buneman, Phys. Rev. 115(3), 503 (1959)
2. J. Dawson, Phys. Fluids 5(4), 445 (1962)
3. C. Birdsall, A. Langdon, Plasma Physics via Computer Simulation (McGraw-Hill, New York, 1985)
4. R. Hockney, J. Eastwood, Computer Simulation Using Particles (IOP, Bristol and New York, 1989)
5. V. Decyk, Comput. Phys. Commun. 87(1–2), 87 (1995)
6. D. Tskhakaya, R. Schneider, J. Comput. Phys. 225(1), 829 (2007)
7. J. Verboncoeur, M. Alves, V. Vahedi, C. Birdsall, J. Comput. Phys. 104(2), 321 (1993)
8. J. Verboncoeur, A. Langdon, N. Gladd, Comput. Phys. Commun. 87(1–2), 199 (1995)
9. D. Barnes, T. Kamimura, J.N. Le Boeuf, T. Tajima, J. Comput. Phys. 52(3), 480 (1983)
10. D. Tskhakaya, S. Kuhn, Contrib. Plasma Phys. 42(2–4), 302 (2002)
11. K. Cartwright, J. Verboncoeur, C. Birdsall, J. Comput. Phys. 162(2), 483 (2000)
12. D. Tskhakaya, S. Kuhn, Plasma Phys. Contr. F. 47, A327 (2005)
13. F. Collino, T. Fouquet, P. Joly, J. Comput. Phys. 211(1), 9 (2006)
14. V. Vahedi, G. DiPeso, J. Comput. Phys. 13(1), 149 (1997)
15. W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in C (Cambridge University Press, Cambridge, 2002)
16. A. Bergmann, Contrib. Plasma Phys. 38, 231 (1998)
17. O. Batishchev, X. Xu, J. Byers, R. Cohen, S. Krasheninnikov, T. Rognlien, D. Sigmar, Phys. Plasmas 3(9), 3386 (1996)
18. O. Batishchev, S. Krasheninnikov, P. Catto, A. Batishcheva, D. Sigmar, X. Xu, J. Byers, T. Rognlien, R. Cohen, M. Shoucri, I. Shkarofskii, Phys. Plasmas 4(5), 1672 (1997)
19. T. Takizuka, H. Abe, J. Comput. Phys. 25(3), 205 (1977)
20. N. Krall, A. Trivelpiece, Principles of Plasma Physics (San Francisco Press, San Francisco, 1986)
21. L. Landau, E. Lifshitz, Course of Theoretical Physics, vol. 1, Mechanics (Pergamon Press, Oxford, 1960)
22. R. Shanny, J. Dawson, J. Greene, Phys. Fluids 10(6), 1281 (1967)
23. D. Book, NRL Plasma Formulary (Naval Research Laboratory, Washington, D.C., 1978)
24. K. Nanbu, Phys. Rev. E 55(4), 4642 (1997)
25. A. Bobylev, K. Nanbu, Phys. Rev. E 61(4), 4576 (2000)
26. C. Birdsall, IEEE T. Plasma Sci. 19(2), 65 (1991)
27. V. Vahedi, M. Surendra, Comput. Phys. Commun. 87, 179 (1995)
28. K. Bowers, J. Comput. Phys. 173(2), 393 (2001)
29. K. Matyash, R. Schneider, A. Bergmann, W. Jacob, U. Fantz, P. Pecher, J. Nucl. Mater. 313–316, 434 (2003)
30. A. Christlieb, R. Krasny, J. Verboncoeur, IEEE T. Plasma Sci. 32(2), 384 (2004)
31. A. Christlieb, R. Krasny, J. Verboncoeur, Comput. Phys. Commun. 164(1–3), 306 (2004)
7 Gyrokinetic and Gyrofluid Theory and Simulation
of Magnetized Plasmas
Richard D. Sydora
7.1 Introduction
Magnetized plasmas contain a wide range of time and space scales that span many
orders of magnitude. This makes realistic simulations of time-dependent phenomena very difficult, and capturing all the scales within a single calculation is still beyond the reach of our present computational capabilities. Charged particle motion in
time-varying, nonuniform electric and magnetic fields, in the presence of collective
effects and collisions is very complex. This complexity arises because the inter-
particle forces have both a short and long range nature. For the short range, the
cross-section of Coulomb collisions strongly decreases with increasing energy of
the interacting particles and for lower densities. Therefore, the mean free path of
the charged particles in such physical systems as high temperature magnetically-
confined fusion plasmas or in low density space and astrophysical plasmas becomes
enormous; hundreds to thousands of meters or kilometers. The particle trajectories
become more influenced by the electromagnetic forces which are determined by ex-
ternal sources and internal processes. An external source could be a magnetic field
which is necessarily confined to a finite volume and is generally curved and inho-
mogeneous. The Lorentz force that acts on the particles binds them to the magnetic
field and forces them to follow the field lines. The internal processes created by col-
lective plasma motions have a range of scales and these also modify the trajectories
leading to cross-field or anomalous plasma transport.
In this chapter we are concerned with collective plasma effects which reside in
the low frequency range ω < Ωi , where Ωi = eB/mi is the ion cyclotron fre-
quency. This is motivated by the experimental observation [1, 2, 3] that the domi-
nant contribution to low frequency microturbulence in magnetically confined plas-
mas originates from temporal and spatial scales that are associated with the drift frequency ω ≃ Ω_i(ρ_s/L_⊥), where \rho_s = \sqrt{m_i T_e}/(eB) = \sqrt{T_e/T_i}\,\rho_i and ρ_i is the thermal ion gyroradius defined as ρ_i = v_{ti}/Ω_i with ion thermal velocity v_{ti} = \sqrt{T_i/m_i}. Since a typical scale separation between ρ_s and L_⊥ in experiment is ρ_s/L_⊥ ∼ 10⁻³–10⁻², this makes ω/Ω_i of this order and therefore renders kinetic simulations using the complete set of Vlasov-Maxwell equations, or particle simulations based on the Lorentz-Newton and Maxwell's equations, quite impractical.
Another important experimental indication of the relevant physical scales, particularly for convective transport in inhomogeneous magnetized plasmas, is given by the observed peaks in the wavenumber spectra around k_⊥ρ_s ≃ 0.2–0.5 in density fluctuation measurements [4, 5]. Therefore, the electric and magnetic fields
associated with these fluctuations must include finite-gyroradius effects. The ob-
served characteristics of low frequency turbulent fluctuations suggest an ordering
ω/Ωi ∼ ρi /L⊥ ∼ O(ǫ) and k⊥ ρi ∼ O(1), which helps in deriving reduced kinetic
equations for the evolution of the phase space distribution function that removes all
dependence on gyrophase. Thus, the detailed cyclotron time scale does not have to
be explicitly followed. Analytical orbit averaging has been used to derive energy
and phase space preserving drift-kinetic and gyrokinetic equations of motion.
Gyrokinetic theory was originally developed in the 1960’s as an extension to
guiding center theory [6] to include the finite gyroradius effects on low frequency,
short perpendicular wavelength electrostatic fluctuations in general magnetic geom-
etry [7, 8]. In 1978 Catto [9] develops an important approach for gyrokinetic equa-
tions by first transforming the particle coordinates to the guiding center variables
in the Vlasov equation (or collisionless Boltzmann equation) before performing the
gyrophase averaging. This key result then allowed for a more consistent develop-
ment of the linear theory [10, 11], an early formulation of nonlinear gyrokinetic
theory [12] and a gyrokinetic particle simulation model [13]. In the early 1980’s
two important advances in guiding center theory occur. First, Boozer [14] develops
particle drift motion in magnetic coordinates which greatly simplifies the analysis of
orbits in complex geometry and second, Littlejohn [15] develops guiding center the-
ory based on action variational and Lie perturbation methods to obtain phase space
conserving equations. This was soon followed by an extension of the method to gy-
rokinetic theory [16]. In the late 1980’s Hahm [17], Brizard [18] and co-workers
extend the methodology to general magnetic geometry. There is an excellent recent
review on the rigorous perturbation approach using action variational methods [19].
Improved numerical algorithms for performing the gyrophase averaging [20] in 2D
were also made in the late 1980’s as well as the first 3D gyrokinetic simulations
[21, 22]. In the 1990’s 3D gyrokinetic particle simulations with general magnetic
geometry advanced with the rapid growth in massively parallel computational facili-
ties [23, 24, 25]. Recently 3D toroidal geometry simulations have made it possible to
study turbulent fluctuations in magnetically confined fusion plasmas from about the
scale size of the ion gyroradius (typically a few millimeters) up to the minor radius
of the cross-section (about 0.5–1 m). The anomalous transport coefficients obtained
from the models, such as the ion heat diffusivity are well within the experimental
range [26]. The anomalous electron thermal diffusivity is not well understood and
there are indications that fluctuation scales near the electron gyroradius need to be
included [27].
The basic gyrokinetic equations can also be used to formulate reduced or contin-
uum equations representing the time evolution of moments such as density, current
and pressure [16, 28, 29, 30, 31]. The two-fluid equations can be used to capture the
different dynamics of electrons and ions parallel and perpendicular to the magnetic
field and the coupling between both species by electric and magnetic field interac-
tions. These so-called gyrofluid models are able to incorporate the finite gyroradius,
ion polarization drift and coupling to sound waves. There are many computational
advantages in using these continuum-based models in addition to efficiency, such as
the clear identification of important fluid-type nonlinearities, inclusion of sources
and handling of more collisional regimes.
In this chapter, we present some of the key steps in the development of gyrokinetic
and gyrofluid models starting with single particle motion in a magnetic field. Once
the basic transformation from gyro-center to gyrophase averaged coordinates is es-
tablished, it is possible to construct a kinetic theory or self-consistent gyrokinetic
Vlasov-Poisson-Ampere equations which form the basis of an N -body particle sim-
ulation model. The set of equations possess an energy invariant which can be used to
precisely monitor the exchange of energy among the fields and particles and the sys-
tem is inherently phase space conserving. The moments of the fundamental gyroki-
netic equations lead to a set of gyrofluid partial differential equations which form
the basis of magnetized fluid simulations in the low frequency regime and some of
the steps in the derivation are presented. The elements of the gyrokinetic particle
simulation approach are discussed along with the fundamental normal modes and
equilibrium statistical properties of the model. To illustrate the techniques, the re-
sults from a couple of example simulations in simpler geometry (slab or Cartesian)
are presented and are related to current-driven and current gradient-driven microin-
stabilities as a potential source of low frequency turbulence in laboratory and space
plasmas.
cyclotron motion. This leads to computationally efficient methods for the N -body
dynamics. To obtain the gyro-drift equations we utilize the more modern approach
using action-variational Lie perturbation methods applied to single particle motion
[15, 16] under the influence of strong ambient magnetic field and electromagnetic
perturbation. This preserves the Hamiltonian structure of the system under coordi-
nate system changes.
When the dominant force acting on the individual particles in a plasma is electro-
magnetic, the equations of motion for a particle with mass m and charge q in the
electromagnetic fields E(r, t), B(r, t) are
\[
\frac{dx}{dt} = v , \qquad (7.1)
\]
\[
\frac{dv}{dt} = \frac{q}{m}\,(E + v \times B) . \qquad (7.2)
\]
Each of the N plasma particles satisfies such equations, and the solution of the 6N equations yields the particle trajectories. These trajectories determine the local charge and current density,
\[
\rho(r, t) = \sum_j q_j\, \delta(r - r_j(t)) , \qquad
J(r, t) = \sum_j v_j q_j\, \delta(r - r_j(t)) , \qquad (7.3)
\]
\[
v = u + v_E + v_\parallel b , \qquad (7.4)
\]
and ρ = u/Ωc is the gyroradius. Particles with opposite sign of charge move along
the gyro-orbits in opposite directions. The position of the particle with respect to the
gyro-center and gyroradius is r(t) = R(t) + ρ(t) and this is shown schematically
in Fig. 7.1. The gyro-center moves with a velocity
\[
\frac{dR}{dt} = \frac{1}{T_c} \int_0^{T_c} v\, dt = v_\parallel b + v_E , \qquad (7.8)
\]
Fig. 7.1. Charged particle orbit in an ambient magnetic field with gyroradius vector ρ and exact particle position r with respect to the guiding center position R
also be derived from this time scale separation. If we consider the current associated with the particle gyro-motion, I = q/T_c, the magnetic moment of the current loop of area πρ² is μ = Iπρ², which gives
\[
\mu = q\,\frac{\Omega_c}{2\pi}\,\pi\rho^2 = \frac{m u^2}{2B} . \qquad (7.9)
\]
We next consider the electromagnetic fields that are no longer constant but vary
in space and time. Computational advantages over following exact particle motion
arise when an ordering of the multiple scales is applied. If we introduce a character-
istic length L and time τ over which the fields vary, and these satisfy
\[
\frac{\rho}{L} \ll 1 , \qquad \frac{1}{\Omega_c \tau} \ll 1 , \qquad (7.10)
\]
then we can extend the basic drift dynamics equations. Since Ω_c⁻¹ is proportional to m/q, this may be adopted as a smallness parameter and allows us to obtain the inertial corrections to the drift motion and magnetic moment. To lowest order we
have shown that the gyro-center position is given by a vector version of the simple
uniform field result. However, the exact gyro-center position is more complicated.
We may re-write the original Lorentz-Newton equations (7.1) and (7.2), in terms of
guiding center coordinates (R, v⊥ , v , φ), making use of cylindrical coordinates
\[
\frac{dR}{dt} = v_\parallel b + v_E + \frac{1}{\Omega_c}\, b \times \frac{dv_E}{dt} + u \times \frac{d}{dt}\!\left(\frac{b}{\Omega_c}\right) , \qquad (7.12)
\]
\[
\frac{dv_\perp}{dt} = -\,e_\perp \cdot \left( v_\parallel \frac{db}{dt} + \frac{dv_E}{dt} \right) , \qquad (7.13)
\]
\[
\frac{dv_\parallel}{dt} = \frac{q}{m}\, E_\parallel + v_E \cdot \frac{db}{dt} + v_\perp\, e_\perp \cdot \frac{db}{dt} , \qquad (7.14)
\]
\[
\frac{d\phi}{dt} = -\,\Omega_c - e_2 \cdot \frac{de_1}{dt} - \frac{1}{v_\perp}\, e_\phi \cdot \left( v_\parallel \frac{db}{dt} + \frac{dv_E}{dt} \right) , \qquad (7.15)
\]
where the total time derivative is taken along the particle trajectory, d/dt = ∂/∂t + (v_∥ b + v_E + v_⊥ e_⊥)·∇. In order to express the fields as a function of the guiding center position R, we use a Taylor expansion around this point, b(r, t) = b(R, t) + ρ·∇b(R, t) + … .
From these transformed equations we note several important points.
(i) The ordering we established indicates that the fastest timescale is gyromotion
since dφ/dt ≃ −Ωc is the largest term.
(ii) E must be small because the q/m factor in front of it is large. If E were large,
the particles would accelerate on the time scale of gyration and in opposite
directions. This would create a charge separation and generate electric fields on
the shortest time scale, violating our initial assumption of slowly varying fields.
(iii) The parallel and perpendicular velocities are slowly varying and may be con-
sidered constant on the fast gyrofrequency time scale.
(iv) Lastly, the inertial corrections to the gyro-center motion are obtained such as
the polarization drift, the third term on the right hand side of (7.12), which is
associated with time varying electric fields and is much larger for the ions due
to its proportionality to mass.
It is possible to continue working with these equations and derive gyrophase-independent drift motion. We will not go through the detailed steps here but state the result [32]: to first order in the parameter m/q, the guiding center drift velocity is
\[
\frac{dR}{dt} = \left[ v_\parallel + \frac{v_\perp^2}{2\Omega_c}\, b \cdot \nabla \times b \right] b + v_E + \frac{v_\perp^2}{2\Omega_c}\, b \times \frac{\nabla B}{B} + \frac{b}{\Omega_c} \times \left( v_\parallel \frac{db}{dt'} + \frac{dv_E}{dt'} \right) , \qquad (7.16)
\]
where d/dt' = ∂/∂t + (v_∥ b + v_E)·∇ and all fields are taken at the guiding center
position. The first term on the right hand side of (7.16) is the particle transit motion
along the field, the second term is the E × B drift motion and the third term is the
gradient-B drift. This gradient-B drift arises because, as the particle gyrates in the
inhomogeneous field, it periodically experiences stronger and weaker field strengths
along its gyro-orbit, leading to a net drift motion in the direction perpendicular to B
and ∇B. The final two terms in (7.16) are the curvature drift and polarization drift
effects, respectively.
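As an illustration, the leading drift terms of (7.16) can be evaluated directly from local field quantities. The Python sketch below is a minimal example assuming the caller supplies ∇B and the field-line curvature (b·∇)b; the polarization drift is omitted:

import numpy as np

def guiding_center_drifts(E, B, q, m, v_par, v_perp, gradB, curv):
    # Returns the E x B, grad-B and curvature drifts of (7.16).
    # E, B: field vectors; gradB: gradient of |B|; curv: (b . grad) b.
    Bmag = np.linalg.norm(B)
    b = B / Bmag
    Omega_c = q * Bmag / m                        # gyrofrequency
    v_ExB = np.cross(E, b) / Bmag                 # E x B drift
    v_gradB = v_perp**2 / (2.0 * Omega_c) * np.cross(b, gradB) / Bmag
    v_curv = v_par**2 / Omega_c * np.cross(b, curv)
    return v_ExB, v_gradB, v_curv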
Single charged-particle motion in a fixed magnetic field has the Hamiltonian
\[
H = \frac{1}{2m}\,\big(P - qA_0(Q)\big)^2 + q\Phi(Q, t) , \qquad (7.18)
\]
where A_0 is the vector potential of the background magnetic field and the canonical variables are
\[
P(t) = m v(t) + q A_0(r(t)) , \qquad Q(t) = r(t) . \qquad (7.19)
\]
\[
\gamma = \gamma_\mu\, dz^\mu = \gamma_i\, dz^i - h\, dt , \qquad (7.20)
\]
where z = z(P, Q, t) is the phase space coordinate system and, in the summation convention, μ = 0, …, 6 and i = 1, …, 6, with z⁰ = t, γ_i = P·∂Q/∂z^i, and h = H − P·∂Q/∂t. Therefore, the one-form can be written as
\[
\gamma = \big( qA_0(r) + m v \big) \cdot dr - \left( \frac{m v^2}{2} + q\Phi(r, t) \right) dt \qquad (7.21)
\]
and r and v can be obtained from the canonical variables Q and P . A0 is the vector
potential of the background magnetic field. The action associated with the one-form
is given by
\[
S = \int_{t_0}^{t_f} \gamma_\mu\, \frac{dz^\mu}{dt}\, dt \qquad (7.22)
\]
can be derived from this one-form, and it describes the fast periodic gyromotion about θ. The θ-dependent non-secular perturbation is removed by transforming the fundamental one-form to gyrophase-averaged coordinates Z̄ = (R̄, v̄_z, μ̄, θ̄; t). This is accomplished by using the Lie transform, which gives the fundamental one-form in the new coordinate system as [16, 34]
\[
\bar{\Gamma}_0 = \gamma_0 , \qquad \bar{\Gamma}_1 = dS_1 - L_1\gamma_0 + \gamma_1 , \qquad (7.25)
\]
with (L_1\gamma_0)_\nu = g_1^\mu (\partial_\nu \gamma_{0\mu} - \partial_\mu \gamma_{0\nu}). The generating function S_1 is
\[
S_1 = \frac{q}{\Omega_c} \int d\theta'\, \tilde{\Phi} \qquad (7.26)
\]
and is related to the difference between the gyro-center and gyro-averaged potential, \tilde{\Phi} = \Phi - \langle\Phi\rangle_\theta, with \langle\Phi\rangle_\theta = (1/2\pi) \int_0^{2\pi} \Phi(R + \rho, t)\, d\theta.
The generator g_1^\mu is also obtained within the low frequency ordering as
\[
g_1^{R} = \frac{1}{qB}\, \nabla_R S_1 \times b , \qquad
g_1^{v_z} = \frac{1}{m}\, b \cdot \nabla_R S_1 , \qquad
g_1^{\mu} = \frac{\partial S_1}{\partial \theta} , \qquad
g_1^{\theta} = -\frac{\partial S_1}{\partial \mu} . \qquad (7.27)
\]
The fundamental one-form in the gyro-averaged coordinates becomes
\[
\bar{\Gamma} = qA_0 \cdot d\bar{R}_\perp + m\bar{v}_z\, d\bar{R}_z + \bar{\mu}\, d\bar{\theta} - \bar{h}\, dt \qquad (7.28)
\]
with gyrophase-averaged Hamiltonian
\[
\bar{h} = \bar{\mu}\,\Omega_c + \frac{1}{2}\, m \bar{v}_z^2 + q \langle\Phi\rangle_{\bar\theta} . \qquad (7.29)
\]
By taking the variation of Γ̄ we obtain the Euler-Lagrange equations for the particle motion in gyro-averaged coordinates, to first order,
\[
\frac{d\bar{R}}{dt} = \bar{v}_z b + \frac{b \times \nabla_{\bar R} \langle\Phi\rangle_{\bar\theta}}{B} , \qquad
\frac{d\bar{v}_z}{dt} = -\frac{q}{m}\, b \cdot \nabla_{\bar R} \langle\Phi\rangle_{\bar\theta} , \qquad
\frac{d\bar{\mu}}{dt} = 0 , \qquad
\frac{d\bar{\theta}}{dt} = \Omega_c + q\, \frac{\partial \langle\Phi\rangle_{\bar\theta}}{\partial \bar{\mu}} . \qquad (7.30)
\]
It is straightforward to generate the higher order corrections to the gyro-averaged
drift motion, parallel acceleration and magnetic moment using this formalism [16].
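As a simple numerical illustration of (7.30), a gyro-center can be advanced once the gyro-averaged potential gradient at its position is known; this Python sketch uses plain explicit Euler differencing for brevity (production codes use predictor-corrector or higher-order schemes):

import numpy as np

def push_gyro_center(R, v_z, b, B0, grad_phi_gy, q, m, dt):
    # One explicit Euler step of (7.30); grad_phi_gy is the gradient of the
    # gyro-averaged potential at the gyro-center R, b the unit vector along B.
    # The magnetic moment mu is an exact invariant and needs no update.
    dR_dt = v_z * b + np.cross(b, grad_phi_gy) / B0   # parallel + E x B motion
    dv_dt = -(q / m) * np.dot(b, grad_phi_gy)         # parallel acceleration
    return R + dt * dR_dt, v_z + dt * dv_dt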
The single particle gyro-drift dynamics can be used to obtain a Vlasov equation
for the gyro-averaged particle distribution function F̄ (Z̄). The gyrokinetic Vlasov
equation for species α is
\[
\frac{\partial \bar{F}_\alpha}{\partial t} + \left( \bar{v}_z b + \frac{b \times \nabla_{\bar R} \langle\Phi\rangle_{\bar\theta}}{B} \right) \cdot \frac{\partial \bar{F}_\alpha}{\partial \bar{R}} - \frac{q_\alpha}{m_\alpha}\, b \cdot \nabla_{\bar R} \langle\Phi\rangle_{\bar\theta}\, \frac{\partial \bar{F}_\alpha}{\partial \bar{v}_z} = 0 . \qquad (7.32)
\]
For electrons, whose gyroradius is much smaller (ρ_e ≪ ρ_i), this reduces to the drift-kinetic equation
\[
\frac{\partial \bar{F}_e}{\partial t} + \left( \bar{v}_z b + \frac{b \times \nabla_{\bar R}\Phi}{B} \right) \cdot \frac{\partial \bar{F}_e}{\partial \bar{R}} + \frac{e}{m_e}\, b \cdot \nabla_{\bar R}\Phi\, \frac{\partial \bar{F}_e}{\partial \bar{v}_z} = 0 . \qquad (7.33)
\]
In order to obtain the self-consistent electric potential for the gyrokinetic Vlasov
equation, we must consider the density response in real space r and not in R. The
Lie transform can help us relate the distribution function in the gyro-averaged coor-
dinates F̄ to the gyro-center coordinates f . To first order
where J and J¯ are the Jacobians of the gyro-center and gyro-averaged coordinates,
respectively. In evaluating the second term, we can linearize the distribution about a
local Maxwellian defined as
\[
\bar{F}_M(\bar{R}, \bar{v}_z, \bar{\mu}) = \frac{n_0(\bar{R})}{\big( 2\pi T(\bar{R})/m \big)^{3/2}}\; e^{-(m\bar{v}_z^2/2 + \bar{\mu}\Omega_c)/T(\bar{R})} \qquad (7.36)
\]
and use the ordering ρ/L ≪ 1, where L is the density and temperature gradient scale length, to show that the leading order term is (\partial S_1/\partial\bar\theta)(\partial \bar F_M/\partial\bar\mu) = (q/\Omega_c)(\Phi - \langle\Phi\rangle_{\bar\theta})(\partial \bar F_M/\partial\bar\mu), so that the particle density simplifies to
\[
n(r, t) = \int \left[ \bar{F}(\bar{R}, \bar{v}_z, \bar{\mu}, t) + \frac{q}{\Omega_c}\,\big(\Phi - \langle\Phi\rangle_{\bar\theta}\big)\, \frac{\partial \bar{F}_M}{\partial \bar{\mu}} \right] \delta(\bar{R} - r + \bar{\rho})\, \bar{J}\, d^6\bar{Z} . \qquad (7.37)
\]
This expression can be used to construct a Poisson equation by taking the difference
between the electron and ion number densities
\[
\frac{e}{\Omega_i} \int \big(\Phi - \langle\Phi\rangle_{\bar\theta}\big)\, \frac{\partial \bar{F}_M}{\partial \bar{\mu}}\, \delta(\bar{R} - r + \bar{\rho}_i)\, \bar{J}_i\, d^6\bar{Z}
= \int \bar{F}_i\, \delta(\bar{R} - r + \bar{\rho}_i)\, \bar{J}_i\, d^6\bar{Z} - \int \bar{F}_e\, \delta(\bar{R} - r)\, \bar{J}_e\, d^6\bar{Z} , \qquad (7.38)
\]
where the small gyroradius limit for the electrons, ρ_e → 0, has been taken. Using F̄_M from (7.36), the gyrokinetic Poisson equation becomes
\[
\frac{\tau}{\lambda_{De}^2}\,\big(\Phi - \tilde{\Phi}\big) = 4\pi e\,(\bar{n}_i - n_e) \qquad (7.39)
\]
coordinates involves a double gyrophase average; assuming the Maxwellian distribution for F̄_M we obtain
\[
\tilde{\Phi}(r) = \sum_k \Phi_k\, \Gamma_0(b)\, e^{i k \cdot r} , \qquad (7.43)
\]
where \Gamma_0(b) = \int J_0^2(k_\perp v_\perp/\Omega_i)\, \bar{F}_M(\bar{\mu})\, d\bar{\mu} = I_0(b)\, e^{-b}. Here I_0 is the modified Bessel function with argument b = k_\perp^2 \rho_i^2, the ion thermal gyroradius is ρ_i = v_{thi}/Ω_i, and v_{thi} = (T_i/m_i)^{1/2}. The Poisson equation (7.39) in this operator form is therefore expressed as
\[
\frac{\tau}{\lambda_{De}^2}\,(1 - \Gamma_0)\,\Phi = 4\pi e\,(\bar{n}_i - n_e) , \qquad (7.44)
\]
which makes the 1 − Γ_0 operator easy to invert in Fourier space.
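Since the operator is diagonal in Fourier space, (7.44) can be solved with a few FFT calls; the following minimal Python sketch assumes a doubly periodic grid and uses SciPy's numerically stable scaled Bessel function i0e(b) = I0(b)e^{-b}:

import numpy as np
from scipy.special import i0e   # I0(b) * exp(-b)

def solve_gk_poisson(rho, dx, tau, lambda_De, rho_i):
    # Solve (tau/lambda_De^2)(1 - Gamma_0) Phi = rho, with
    # rho = 4 pi e (n_i_bar - n_e) and Gamma_0(b) = I0(b) exp(-b),
    # b = k_perp^2 rho_i^2, cf. (7.44).
    ny, nx = rho.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    b = (kx[None, :]**2 + ky[:, None]**2) * rho_i**2
    op = (tau / lambda_De**2) * (1.0 - i0e(b))   # diagonal operator in k-space
    op[0, 0] = 1.0                               # guard the k = 0 mode
    phi_k = np.fft.fft2(rho) / op
    phi_k[0, 0] = 0.0                            # zero-mean potential
    return np.real(np.fft.ifft2(phi_k))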
Returning to (7.38), we can expand the delta functions about R̄ − r for the ions, and the left-hand side of the Poisson equation becomes
\[
\frac{\omega_{pi}^2}{\Omega_i^2}\,(\nabla_\perp \cdot \nabla_\perp)\,\Phi = -4\pi e\,(\bar{n}_i - n_e) , \qquad (7.45)
\]
where the ion plasma frequency is ω_{pi} = (4πne²/m_i)^{1/2}. This can also be obtained from (7.44) in the long wavelength limit b ≪ 1, where 1 − Γ_0(b) ≃ b and the operator becomes \tau b/\lambda_{De}^2 = k_\perp^2 \tau \rho_i^2/\lambda_{De}^2 = k_\perp^2\, \omega_{pi}^2/\Omega_i^2.
The operator on the left hand side of (7.45) represents the shielding effect due
to the ion polarization field. It is the lowest order contribution to the density fluctua-
tions provided by the polarization drift. We can obtain it heuristically by considering
the polarization drift from (7.12)
\[
v_p = \frac{1}{\Omega_c B}\, \frac{\partial E}{\partial t} = \frac{m}{e B^2}\, \frac{\partial E}{\partial t} , \qquad (7.46)
\]
which gives a polarization current density
\[
J_p = e n_{i0} v_{pi} - e n_{e0} v_{pe} \simeq \frac{n_{i0} m_i}{B^2}\, \frac{\partial E_\perp}{\partial t} , \qquad (7.47)
\]
that is dominated by the ions. Using the continuity equation and integrating, the
polarization density is
\[
n_p = \frac{1}{4\pi e}\, \nabla_\perp \cdot \left( \frac{4\pi n_{i0} m_i}{B^2}\, \nabla_\perp \Phi \right) = \frac{1}{4\pi e}\, \frac{\omega_{pi}^2}{\Omega_i^2}\, \nabla_\perp \cdot \nabla_\perp \Phi . \qquad (7.48)
\]
Equations (7.32), (7.33) and (7.39) form the basis for electrostatic gyrokinetic sim-
ulation of low frequency magnetized plasma dynamics.
In addition to electrostatic perturbations, it is also possible to self-consistently
generate magnetic perturbations via currents that are parallel to the ambient mag-
netic field. The currents are induced via inductive electric fields Ez = −∂Az /∂t, if
7 Gyrokinetic and Gyrofluid Theory and Simulation of Magnetized Plasmas 203
the magnetic field is in the z-direction. From Ampere’s law, the parallel current
causes magnetic perturbations that are primarily perpendicular to the main field
with δB_⊥ < B_{0z}, and this is termed a field line bending effect. The perpendicu-
lar magnetic field perturbations can be expressed as a vector potential ∇ × (Az b) =
∇Az × b. The compressional magnetic field perturbations can also be included by
determining the perpendicular currents and higher β plasmas may be studied, where
β characterizes the ratio of the plasma pressure to magnetic field pressure. Magnet-
ically confined plasmas with high β are of contemporary interest in fusion, space
and astrophysical plasmas.
We proceed to outline the gyrokinetic Vlasov-Poisson-Ampere system of equa-
tions in the low β regime where parallel currents are important. We assume the cur-
rents are sufficiently weak and density fluctuations small such that the gyrokinetic
equations satisfy the ordering
\[
\frac{\omega}{\Omega_i} \simeq \frac{\rho_i}{L} \simeq \frac{e\Phi}{T_e} \simeq \frac{\delta B_\perp}{B_0} \simeq O(\epsilon) , \qquad k_\perp \rho_i \simeq O(1) , \qquad (7.49)
\]
where we keep the finite gyroradius effects for the ions. If we introduce a canonical
momentum
\[
p_z = v_z + \frac{q}{m}\, A_z \qquad (7.50)
\]
into the one-form, we have for the gyro-center Hamiltonian
\[
h_0 = \mu\,\Omega_c + \frac{m p_z^2}{2} , \qquad
h_1 = -q p_z A_z(R + \rho, t) + q\Phi(R + \rho, t) , \qquad (7.51)
\]
and the gyrophase-averaged Hamiltonian can be derived using the Lie perturbation method as
\[
\bar{h} = \bar{\mu}\,\Omega_c + \frac{1}{2}\, m \bar{p}_z^2 + q \langle\Phi\rangle_{\bar\theta} - q \bar{p}_z \langle A_z\rangle_{\bar\theta} . \qquad (7.52)
\]
From this result it is possible to derive the gyrokinetic Vlasov equation including magnetic perturbations from parallel currents [35],
\[
\frac{\partial \bar{F}_i}{\partial t} + \left( \bar{v}_z b^* + \frac{b \times \nabla_{\bar R} \langle\Psi\rangle_{\bar\theta}}{B} \right) \cdot \frac{\partial \bar{F}_i}{\partial \bar{R}} - \frac{e}{m_i}\, b^* \cdot \nabla_{\bar R} \langle\Psi\rangle_{\bar\theta}\, \frac{\partial \bar{F}_i}{\partial \bar{p}_z} = 0 , \qquad (7.53)
\]
where \langle\Psi\rangle_{\bar\theta} is a generalized potential defined by \langle\Psi\rangle_{\bar\theta} = \langle\Phi\rangle_{\bar\theta} - \bar{v}_z \langle A_z\rangle_{\bar\theta}. Ampere's
law expressed in terms of the parallel vector potential becomes
\[
\nabla_\perp^2 A_z = -4\pi e \left[ \int \left( \bar{p}_z - \frac{e}{m_i} A_z \right) \bar{F}_i\, \delta(\bar{R} - r + \bar{\rho}_i)\, \bar{J}_i\, d^6\bar{Z}
- \int \left( \bar{p}_z + \frac{e}{m_e} A_z \right) \bar{F}_e\, \bar{J}_e\, d^6\bar{Z} \right] . \qquad (7.54)
\]
me
The canonical momentum formulation does not explicitly contain the induction electric field, but it reappears when one transforms the distribution function's momentum characteristic back to its evolution along the velocity characteristic. The cost
of removing the explicit induction electric field in the characteristics is that we
must solve (7.54) as a nonlinear elliptic partial differential equation for the vector
potential.
In this section the total energy conservation of the gyrokinetic Vlasov-Poisson system is derived in the transformed coordinates. By using the fundamental conservation law in the Hamiltonian system,
\[
\int H(z, t)\, \frac{\partial f(z, t)}{\partial t}\, J\, d^6 z = 0 , \qquad (7.55)
\]
we can apply this relation for each species in (7.32) and (7.33) to obtain the total energy as
\[
\frac{d}{dt}\left[ \int \frac{1}{2}\, m_i \bar{v}_z^2\, \bar{F}_i\, \bar{J}_i\, d^6\bar{Z} + \int \frac{1}{2}\, m_e \bar{v}_z^2\, \bar{F}_e\, \bar{J}_e\, d^6\bar{Z} \right]
+ \int e \langle\Phi\rangle_{\bar\theta}\, \frac{\partial \bar{F}_i}{\partial t}\, \bar{J}_i\, d^6\bar{Z} - \int e \Phi\, \frac{\partial \bar{F}_e}{\partial t}\, \bar{J}_e\, d^6\bar{Z} = 0 , \qquad (7.56)
\]
where the potential terms combine into the time derivative of the ion polarization drift field energy. The total energy invariant is therefore
\[
\frac{d}{dt}\left[ \int \frac{1}{2}\, m_i \bar{v}_z^2\, \bar{F}_i\, \bar{J}_i\, d^6\bar{Z} + \int \frac{1}{2}\, m_e \bar{v}_z^2\, \bar{F}_e\, \bar{J}_e\, d^6\bar{Z} + \frac{1}{8\pi} \int \frac{\omega_{pi}^2}{\Omega_i^2}\, |\nabla_\perp \Phi|^2\, d^3 r \right] = 0 . \qquad (7.58)
\]
We will present this case here, although it is possible to include higher order mo-
ments and linear wave-particle resonance effects such as Landau damping. These are
known as gyro-Landau fluid closure methods [29]. The moment equations allow for
a computationally efficient way to investigate nonlinearity in low frequency magne-
tized plasmas. However, if the velocity space nonlinearities become significant, the
closure methods may require too many higher order moments and the simulations
become impractical. Therefore, it is important to carefully compare the results of ki-
netic simulations and fluid closure approaches to be sure important physical effects
are not left out.
We now derive a three-field gyrofluid model (Φ, Az , pe ) based on the gyroki-
netic Vlasov-Poisson-Ampere system discussed previously. We will first work in
the long wavelength limit (k⊥ ρi < 1) and neglect the finite ion temperature effects
(Ti = 0) but include the ion polarization drift or polarization shielding in the gy-
rokinetic Poisson equation. The gyrokinetic Vlasov equation, now in gyro-center
coordinates, is
\[
\frac{\partial f}{\partial t} + \left( v_\parallel b^* + \frac{b \times \nabla\Phi}{B_0} \right) \cdot \frac{\partial f}{\partial r} + \frac{q}{m}\left( -b^* \cdot \nabla\Phi - \frac{\partial A_z}{\partial t} \right) \frac{\partial f}{\partial v_\parallel} = 0 \qquad (7.59)
\]
for each species and where the unit vector along the magnetic field becomes tilted to
\[
b^* = b + \frac{\nabla A_z \times b}{B_0} \qquad (7.60)
\]
in the parallel velocity representation. The field equations are
\[
\frac{\omega_{pi}^2}{\Omega_i^2}\, \nabla_\perp^2 \Phi = 4\pi e\,(n_e - n_i) , \qquad (7.61)
\]
\[
\nabla_\perp^2 A_z = -4\pi\,(J_e + J_i) . \qquad (7.62)
\]
We first form the density and current moments for each species,
\[
n = \int dv_\parallel\, f(r, v_\parallel) , \qquad
J = q \int dv_\parallel\, v_\parallel\, f(r, v_\parallel) = q n v , \qquad (7.63)
\]
where v is the fluid velocity along the magnetic field. By integration of (7.59) over
velocity space we obtain the continuity equation for each species as
\[
\frac{dn}{dt} + b^* \cdot \nabla(n v) = 0 . \qquad (7.64)
\]
Taking the convective derivative d/dt, defined by
\[
\frac{d}{dt} = \frac{\partial}{\partial t} + \frac{b \times \nabla\Phi}{B_0} \cdot \nabla , \qquad (7.65)
\]
of (7.61) and using the continuity equation (7.64) with (7.62), an equation for the
vorticity evolution is obtained,
\[
\frac{d}{dt}\, \nabla_\perp^2 \Phi = -\frac{\Omega_i^2}{\omega_{pi}^2}\, b^* \cdot \nabla(\nabla_\perp^2 A_z) . \qquad (7.66)
\]
For the second field equation, which involves Az , we assume the ions move
only with the E × B drift and we set vi = 0, which is equivalent to neglecting the
coupling to ion sound waves. For electrons, we take the velocity moment of (7.59)
which gives
\[
\frac{dJ_e}{dt} - \frac{e}{m_e}\, b^* \cdot \nabla p_e = \frac{n_0 e^2}{m_e} \left( -b^* \cdot \nabla\Phi - \frac{\partial A_z}{\partial t} \right) , \qquad (7.67)
\]
where the field-aligned electron pressure is p_e = m_e \int dv_\parallel\, (v_\parallel - v_e)^2 f_e(r, v_\parallel). This
moment equation should be recognized as a type of Ohm’s law. Using the simplest
closure relation, pe = nTe and Te = const, the field-aligned pressure gradient is
related to the vorticity by
\[
\nabla p_e = T_e \nabla n_e = \frac{1}{4\pi e}\, \frac{\omega_{pi}^2}{\Omega_i^2}\, \nabla(\nabla_\perp^2 \Phi) , \qquad (7.68)
\]
where (7.61) was used. From (7.67), (7.62) and (7.68) we obtain
\[
\frac{\partial A_z}{\partial t} = -b^* \cdot \nabla\Phi + d_e^2\, \frac{d}{dt}\big(\nabla_\perp^2 A_z\big) + \rho_s^2\, b^* \cdot \nabla(\nabla_\perp^2 \Phi) , \qquad (7.69)
\]
where d_e = c/ω_{pe} is the collisionless electron skin depth and ρ_s = c_s/Ω_i is the ion sound radius with c_s = (T_e/m_i)^{1/2} determined by the electron temperature. It is related to the ion gyroradius by \rho_s = \sqrt{T_e/T_i}\,\rho_i. If we define \tilde{A}_z = A_z - d_e^2 \nabla_\perp^2 A_z, the Ohm's law can be re-written as
\[
\frac{\partial \tilde{A}_z}{\partial t} = -\,b^* \cdot \nabla\Phi + \frac{b \cdot \big(\nabla\Phi \times \nabla\tilde{A}_z\big)}{B_0} + \rho_s^2 \left[ b^* \cdot \nabla(\nabla_\perp^2 \Phi) + \frac{b \cdot \big(\nabla(\nabla_\perp^2 \Phi) \times \nabla\tilde{A}_z\big)}{B_0} \right] . \qquad (7.70)
\]
The continuity equation for the electron density completes the three-field gyrofluid
model and is coupled to both the vorticity evolution and Ohm’s law
\[
\frac{\partial n_e}{\partial t} + \frac{b \times \nabla\Phi}{B_0} \cdot \nabla n_e = -\frac{1}{4\pi e}\, b^* \cdot \nabla(\nabla_\perp^2 A_z) . \qquad (7.71)
\]
The finite ion temperature effects can be incorporated by using the form of the
gyrokinetic Poisson equation
\[
\frac{T_e}{T_i \lambda_{De}^2}\,(1 - \Gamma_0)\,\Phi = -4\pi e\,(n_e - \bar{n}_i) , \qquad (7.72)
\]
where \bar{n}_i(r, t) = \int 2\pi v_\perp\, dv_\perp\, dv_\parallel\, J_0\, f_i(r, v_\parallel, v_\perp, t). The ion continuity equation is
obtained by taking the first moment of the ion gyrokinetic equation
\[
\frac{\partial f_i}{\partial t} + \left( v_\parallel b^* + \frac{b \times \nabla(J_0\Phi)}{B_0} \right) \cdot \frac{\partial f_i}{\partial r} + \frac{e}{m_i}\left( -b^* \cdot \nabla(J_0\Phi) - \frac{\partial (J_0 A_z)}{\partial t} \right) \frac{\partial f_i}{\partial v_\parallel} = 0 , \qquad (7.73)
\]
where J_0 represents the gyrophase-averaging effect. We therefore obtain
\[
\frac{d\bar{n}_i}{dt} + (1 - \Gamma_0)\, \frac{b \times \nabla\Phi}{B_0} \cdot \nabla n_i = 0 , \qquad (7.74)
\]
where the dominant term for the gyro-averaged drift is retained. The finite-temperature vorticity equation is obtained by operating with the convective derivative on (7.72) and again using the continuity and Ampere's equations, which gives
\[
\frac{1}{\rho_i^2}\, \frac{d}{dt}\Big( (1 - \Gamma_0)\Phi \Big) = \frac{\Omega_i^2}{\omega_{pi}^2}\, b^* \cdot \nabla(\nabla_\perp^2 A_z) - \frac{T_i (1 - \Gamma_0)}{e n_{e0} \rho_i^2}\, \frac{b \times \nabla\Phi}{B_0} \cdot \nabla n_i . \qquad (7.75)
\]
By using a Padé approximation to the Γ_0 operator, expressed in Fourier space as
\[
1 - \Gamma_0(k_\perp^2 \rho_i^2) \simeq \frac{k_\perp^2 \rho_i^2}{1 + k_\perp^2 \rho_i^2} , \qquad (7.76)
\]
multiplying the vorticity equation on both sides by 1 + k_\perp^2 \rho_i^2, and then replacing k_\perp^2 with the Laplacian \nabla_\perp^2, we finally obtain
\[
\frac{d}{dt}\big(\nabla_\perp^2 \Phi\big) = \frac{\Omega_i^2}{\omega_{pi}^2}\,(1 - \rho_i^2 \nabla_\perp^2)\, b^* \cdot \nabla(\nabla_\perp^2 A_z) + \frac{T_i}{e n_{e0}}\, \frac{b \times \nabla n_i}{B_0} \cdot \nabla(\nabla_\perp^2 \Phi) . \qquad (7.77)
\]
The multi-field gyrofluid models can be extended to include nonuniform elec-
tron and ion temperature as well as the parallel ion velocity. This leads to extended
four-field [36, 37, 38] and five-field (Φ, Az , pe , pi , vi ) models.
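A quick numerical check of the Padé approximant (7.76) against the exact 1 − Γ_0(b) = 1 − I_0(b)e^{-b} illustrates why it is popular: it is exact at long wavelengths and remains qualitatively correct at b ∼ O(1). The short, illustrative Python snippet below tabulates both:

import numpy as np
from scipy.special import i0e

b = np.linspace(0.0, 4.0, 9)
exact = 1.0 - i0e(b)          # 1 - Gamma_0(b) = 1 - I0(b) exp(-b)
pade = b / (1.0 + b)          # Padé approximant, (7.76)
for bi, ex, pd in zip(b, exact, pade):
    print(f"b = {bi:4.1f}   exact = {ex:5.3f}   Pade = {pd:5.3f}")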
and ρ_j = |ρ_j|(e_1 \sin\theta + e_2 \cos\theta). The particle equations of motion (7.78) are finite-differenced in time, and standard second-order predictor-corrector methods can be used to evolve the discrete N-particle distribution [39]
\[
F(R, p_z, \mu, t) = \sum_{i=1}^{N} \delta(R - R_i(t))\, \delta(p_z - p_{zi}(t))\, \delta(\mu - \mu_i) . \qquad (7.81)
\]
Each particle is initially assigned a guiding center position, a parallel velocity and a
magnetic moment, from which a gyroradius can be computed.
Using this discrete particle distribution, the electrostatic and vector potentials are obtained from the gyrokinetic Poisson and Ampere equations,
\[
\frac{\tau}{\lambda_{De}^2}\,(1 - \Gamma_0)\,\Phi = 4\pi e\,(\bar{n}_i - n_e) , \qquad (7.82)
\]
\[
\nabla_\perp^2 A_z = -4\pi\,(\bar{J}_{zi} + J_{ze}) , \qquad (7.83)
\]
where
\[
\bar{n}_i(r) = \int \big\langle F_i(R, v_z, \mu)\, \delta(R - r + \rho_i) \big\rangle_\theta\, dR\, d\mu\, dv_z ,
\qquad
\bar{J}_{zi}(r) = e \int \big\langle v_z F_i(R, v_z, \mu)\, \delta(R - r + \rho_i) \big\rangle_\theta\, dR\, d\mu\, dv_z , \qquad (7.84)
\]
and v_z = p_z − (e/m_i)A_z. The electrons are not gyrophase-averaged and are considered to be drift-kinetic. The gyrophase-averaged ion number density (and current density) can also be obtained numerically by using the ring average of the N-particle distribution function, which gives
\[
\bar{n}_i(r) = \sum_j \frac{1}{M} \sum_{i=1}^{M} \delta\big(R_j + \rho_j(\theta_i) - r\big) . \qquad (7.85)
\]
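Numerically, (7.85) amounts to depositing each marker at M points on its gyro-ring; a minimal Python sketch with nearest-grid-point deposition on a periodic square grid (M = 4 is a common choice) might look as follows:

import numpy as np

def ring_average_density(Rg, rho_mag, grid_n, L, M=4):
    # Gyro-ring average, cf. (7.85): each marker deposits 1/M of its weight
    # at M points on the ring of radius rho_j around its gyro-center R_j.
    # Rg: (N, 2) gyro-centers; rho_mag: (N,) gyroradii; periodic box side L.
    dens = np.zeros((grid_n, grid_n))
    dx = L / grid_n
    for theta in 2.0 * np.pi * np.arange(M) / M:
        pts = Rg + rho_mag[:, None] * np.array([np.cos(theta), np.sin(theta)])
        ij = np.floor(pts / dx).astype(int) % grid_n      # nearest-grid-point
        np.add.at(dens, (ij[:, 0], ij[:, 1]), 1.0 / M)
    return dens / dx**2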
The charge density is obtained at discrete grid points and therefore an interpolation must be made. The delta functions are then replaced by interpolating functions which may be of low order, such as the nearest-grid-point (NGP) method, or higher order, such as the second-order quadratic spline method [40]. Once the charge and current densities are formed at the grid points, the field equations may be solved by inverting the elliptic-type operators on the left-hand side of (7.82) and (7.83). This can be done efficiently using Fast Fourier Transform (FFT) methods. Non-periodic boundary conditions may be incorporated by employing sine or cosine transforms.
Equation (7.83) for the vector potential is a nonlinear equation, since the right-hand side itself depends on the vector potential. Therefore, an iterative procedure must be used to converge the solution.
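A damped fixed-point iteration is one simple way to converge such a solution. The Python sketch below is schematic: current_of_Az and laplacian_inv are hypothetical user-supplied callbacks standing for the particle current deposition and an FFT-based inverse perpendicular Laplacian, respectively:

import numpy as np

def solve_ampere(Az0, current_of_Az, laplacian_inv, tol=1e-8, max_iter=100):
    # Fixed-point iteration for (7.83): the right-hand side depends on Az
    # itself through v_z = p_z -/+ (e/m) Az, so the solve must be repeated.
    Az = Az0.copy()
    for _ in range(max_iter):
        Az_new = laplacian_inv(-4.0 * np.pi * current_of_Az(Az))
        if np.max(np.abs(Az_new - Az)) < tol:
            return Az_new
        Az = 0.5 * (Az + Az_new)        # under-relaxation aids convergence
    return Az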
In order to determine the accuracy and energetics of the simulation plasma, the conservation properties must be carefully examined. The total energy invariant for the gyrokinetic Vlasov-Poisson-Ampere system is used to determine the accuracy of the simulation results and is given by [35]
\[
E_T = \sum_j \left[ \mu_{ej} B + \frac{m_e}{2} \left( p_{zej} + \frac{e}{m_e} A_z \right)^2 \right]
+ \sum_j \left[ \mu_{ij} B + \frac{m_i}{2} \left( p_{zij} - \frac{e}{m_i} \langle A_z \rangle_\theta \right)^2 \right]
+ \frac{1}{8\pi} \int \big( |E|^2 + |B|^2 \big)\, d^3 r , \qquad (7.86)
\]
where E is the electric field determined by the gradient of the electric potential in (7.82) and B is the magnetic field obtained from (7.83) with B = ∇A_z × b.
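In a particle code this invariant is evaluated at every diagnostic step by summing over markers and grid points. A compact Python sketch follows; the array layout and the Gaussian-unit convention are assumptions made for illustration:

import numpy as np

def total_energy(mu_e, p_ze, Az_e, mu_i, p_zi, Az_gy_i,
                 B0, m_e, m_i, e, E_grid, B_grid, dV):
    # Energy invariant (7.86): particle pieces plus field energy.
    # mu_*, p_z*: per-particle magnetic moments and canonical momenta;
    # Az_e / Az_gy_i: (gyro-averaged) vector potential at the particles;
    # E_grid, B_grid: field magnitudes on the grid with cell volume dV.
    E_e = np.sum(mu_e * B0 + 0.5 * m_e * (p_ze + e / m_e * Az_e)**2)
    E_i = np.sum(mu_i * B0 + 0.5 * m_i * (p_zi - e / m_i * Az_gy_i)**2)
    E_f = np.sum(E_grid**2 + B_grid**2) * dV / (8.0 * np.pi)
    return E_e + E_i + E_f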
Linearizing the ion gyrokinetic equation and the electron drift-kinetic equation, and combining them with the Fourier transform of the gyrokinetic Poisson and Ampere equations, one obtains the following dispersion relation:
\[
\omega^2 = \frac{k_\parallel^2 v_A^2}{1 + k_\perp^2 d_e^2} \left[ \frac{k_\perp^2 \rho_i^2}{1 - \Gamma_0(k_\perp^2 \rho_i^2)} + k_\perp^2 \rho_s^2 \right] , \qquad (7.87)
\]
where ω is the real frequency. This fundamental normal mode in gyrokinetic plasmas is known as the kinetic shear Alfven wave [41] and in the long wavelength limit, k_\perp^2 \rho_i^2 < 1, has the form
\[
\omega^2 = k_\parallel^2 v_A^2\, \frac{1 + k_\perp^2 \rho_s^2}{1 + k_\perp^2 d_e^2} , \qquad (7.88)
\]
since 1 - \Gamma_0(k_\perp^2 \rho_i^2) \simeq k_\perp^2 \rho_i^2. This is the highest frequency which must be resolved in the simulation, and therefore the condition ω∆t ≤ 0.1 must be satisfied. Another time step restriction arises from the electron transit motion and hence k_\parallel (v_e)_{\max} \Delta t < 1.
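Both restrictions are easily combined into a helper that returns the allowed time step; a small Python sketch (illustrative, assuming k_perp > 0 so that 1 − Γ_0 does not vanish):

import numpy as np
from scipy.special import i0e

def max_timestep(k_par, k_perp, v_A, rho_i, rho_s, d_e, v_e_max):
    # omega from the KSAW dispersion relation (7.87), then apply
    # omega * dt <= 0.1 and k_par * v_e_max * dt < 1.
    b = (k_perp * rho_i)**2
    omega2 = (k_par**2 * v_A**2) / (1.0 + (k_perp * d_e)**2) \
             * (b / (1.0 - i0e(b)) + (k_perp * rho_s)**2)
    dt_wave = 0.1 / np.sqrt(omega2)
    dt_transit = 1.0 / (k_par * v_e_max)
    return min(dt_wave, dt_transit)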
\[
\frac{\langle E_k^2 \rangle}{8\pi} = \frac{T_e/2}{1 + k^2 \lambda_{De}^2} . \qquad (7.93)
\]
For the magnetic fluctuations which arise from the fluctuating currents, the thermal fluctuation spectrum is
\[
\frac{\langle B_{\perp k}^2 \rangle}{8\pi} = \frac{T_e/2}{1 + k_\perp^2 d_e^2} \qquad (7.94)
\]
with d_e = c/ω_{pe}.
As we have seen in the previous section, the fundamental normal mode of gyrokinetic plasmas is the kinetic shear Alfven wave (KSAW). The particle inertia and
finite gyroradius effects make this mode highly dispersive for short wavelengths.
Fig. 7.2. Parallel electron distribution function F_c(v_z) taken at the initial (Ω_i t = 0) and final (Ω_i t = 20) time levels
Fig. 7.3. Temporal evolution of the energy change ∆E from the initial value of the electrostatic E_L, magnetic E_B, electron kinetic E_{Ke} and total energy E_T for the current-driven kinetic shear Alfven wave instability
There is a net slowing down of the parallel electron distribution with very weak thermal change.
The spatial distribution of the electric potential fluctuations is given in Fig. 7.4 at the saturation phase of the instability. The electric potential vortices have a mean scale
size of about 2–3 ρs and are roughly isotropic with kx ρs ≃ ky ρs . The electron den-
sity fluctuations averaged over the y-direction are also shown and reach a maximum
level of δn/n0 ≃ 0.05. There is also some indication of smaller scales (∼ ρs ) being
modulated by larger scales (∼ 5ρs ). A more complete analysis of these results will
be presented elsewhere, but one can see the large amount of information which can
be obtained in the nonlinear regime of such models. The fluctuation spectra can be
compared to experiment and assist in their interpretation.
In this section we consider electric and magnetic fluctuations which arise from
nonuniform currents. Spatially localized currents can lead to regions of anti-parallel
magnetic fields which can break and reconnect via a microtearing instability to
form magnetic islands. Small-scale magnetic islands have been proposed as a means
of inducing spontaneous symmetry breaking of perfectly nested flux surfaces in
magnetically-confined toroidal plasmas [47] and in certain space plasma environ-
ments [48]. A consequence of this is the generation of substantial anomalous elec-
tron thermal transport, particularly when these islands interact radially [49, 50].
Small-scale magnetic turbulence can also act as a negative effective resistivity on
large-scale magnetic field perturbations which can lead to amplification of the large-
scale fields. Furthermore, sources of small-scale magnetic turbulence can produce
Fig. 7.4. Electron density profile (y-averaged) and electrostatic potential taken at the pre-growth (Ω_i t = 2) (left panel) and saturation phase (Ω_i t = 20) (right panel)
anomalous electron viscosity and enhanced current diffusion which could lead to
self-sustained turbulence [51].
There has been theoretical work on the kinetic theory of magnetic island growth
in the linearly unstable regime of collisionless tearing [52, 53] as well as some early
particle simulations with a full particle magnetoinductive model [54]. More recently,
gyrokinetic particle simulations have been applied to the collisionless tearing insta-
bility dynamics in uniform and nonuniform plasmas [55, 56].
For the results here, we use a plasma slab with sheared magnetic field B = B_z \hat{z} + B_y(x)\, \hat{y} and |B_z| ≫ |B_y|. The shear field B_y(x) is produced by a nonuniform sheet current assumed to vary only in the x-direction. It has the form J_z(x) = -e n_0 v_{dz} \exp\!\big(-(x - L_x/2)^2/a^2\big), where v_{dz} is the electron drift velocity in the z-direction, and this is shown in Fig. 7.5. The sheared B_y(x) field has anti-parallel
field lines across the middle of the simulation domain as displayed in the same
figure. The initial density and temperature profiles are taken to be uniform. The
boundary conditions are periodic in the y-direction and the vector potential Az and
electrostatic potential are set to be zero at the boundaries in the x-direction. The
particles are specularly reflected at these boundaries.
This equilibrium serves as an excellent test case because the linear growth rate and
saturated island width are well-known [52, 53]. The equilibrium sheared magnetic
field B_y(x) can also be represented by a vector potential A_{z0}(x) and the perturbed magnetic field by a vector potential \tilde{A}_z(x, y) through
\[
\tilde{B}_\perp = \nabla \times \big(\tilde{A}_z \hat{z}\big) = \tilde{B}_x \hat{x} + \tilde{B}_y \hat{y} . \qquad (7.98)
\]
Since A_z(x, y) = A_{z0} + \tilde{A}_z and \tilde{A}_z(x, y) = \tilde{A}_z \cos(k_y y), it is possible to show that the width of the magnetic island W is related to the amplitude of the perturbed vector potential by
\[
W = \left( \frac{\tilde{A}_z L_s}{B_z} \right)^{1/2} , \qquad (7.99)
\]
Fig. 7.5. Initial current profile, magnetic field B_y(x), and schematic of the ambient magnetic direction B_z and sheared anti-parallel B_y component
where L_s is the shear scale length of the magnetic field defined as L_s = B_z/|dB_y(x)/dx|. The linear growth rate of the tearing mode has been derived from the electron drift-kinetic equation in the collisionless regime as
\[
\gamma_k \simeq (k_\parallel v_{te})(\Delta' d_e) \simeq \frac{k_y v_{te} d_e}{L_s}\,(\Delta' d_e) , \qquad (7.100)
\]
where d_e = c/ω_{pe}, v_{te} is the electron thermal velocity, and ∆′ is the jump derivative of the perturbed vector potential across the anti-parallel field reversal region,
\[
\Delta' = \frac{\big[ \partial \tilde{A}_z/\partial x \big]_{-\Delta}^{+\Delta}}{\tilde{A}_z(0)} , \qquad (7.101)
\]
where ∆ is the singular layer width and for the Gaussian current profile, the jump
derivative ∆′ ≃ 1/a, where a is the current channel width. Therefore, under the
assumption of a uniform plasma, the condition ∆′ > 0 is required for growing
microtearing modes. The saturation level for the unstable collisionless mode is pre-
dicted to be
∆′ d2e
W max ≃ , (7.102)
2G
where G is a numerical constant with value G = 0.41.
The nonlinear evolution of the collisionless microtearing mode is investigated using the 2D gyrokinetic particle simulation model described earlier, with the parameters: system size L_x × L_y = 16d_e × 16d_e, d_e = c/ω_{pe} = 8∆, ρ_i = 4∆, T_e/T_i = 1, m_i/m_e = 1837, and current layer width 2a = 10∆. The y-direction is periodic and the discrete wavenumbers in this direction are given by k_y = 2πm/L_y, m = 0, 1, …, L_y/2 − 1.
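As a quick cross-check of the theoretical estimates (7.100)–(7.102) against these parameters, a few lines of Python suffice (illustrative arithmetic only):

# d_e = 8 grid cells, current channel half-width a = 5 cells, G = 0.41
d_e, a, G = 8.0, 5.0, 0.41
delta_prime = 1.0 / a                      # jump derivative, Gaussian profile
W_max = delta_prime * d_e**2 / (2.0 * G)   # saturated island width, (7.102)
print(f"W_max = {W_max:.1f} cells = {W_max / d_e:.2f} d_e")   # about 1.95 d_e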
Fig. 7.6(a) displays the magnetic island half-width time evolution for the most unstable wavelength. The final saturation level is W_sat ≃ 1.2d_e, which is comparable to the theoretical estimate of W_max ≃ (d_e/2aG)d_e ≃ 1.9d_e. The simulation results lie below the predicted value mainly because the simulation value of ∆′ changes during the evolution of the current profile, whereas the theory assumes it is constant.
The vector potential A_z(x, y) and electrostatic potential Φ(x, y) at a fixed time level, just prior to saturation, are presented in Fig. 7.6(b). The magnetic island with X-point and O-point is clearly visible in the vector potential, and the electrostatic potential pattern has a quadrupolar structure near the X-point of the island. After the island grows, electrons are trapped in the singular layer and undergo bounce motion with frequency ω_b = k_y v_{te} W/(2L_s), where W is the magnetic island width. This can be seen in Fig. 7.6(a), where electron trapping oscillations appear after saturation and the period is consistent with T_b = 2π/ω_b. When the saturated island is evolved for a long time period, a nonlinear electron distribution function emerges, consisting of trapped and untrapped electron orbits.
Fig. 7.6. (a) Magnetic island width W, temporal evolution, and (b) vector potential A_z(x, y) and electrostatic potential Φ(x, y) at a time level prior to island width saturation
Fig. 7.7 shows the current profile at the initial time and near saturation. A
double-peaked structure forms near the field reversal region and is related to the
quasilinear changes induced by the magnetic island formation. The current becomes
redistributed to the outer regions of the island.
Fig. 7.7. Initial (t = 0) and final (t = 600) electron current profiles J_z(x) for the microtearing instability
7.7 Summary
The multiple scale nature of plasmas presents inherent difficulties for the simulation of low frequency (ω < Ω_i) electromagnetic fluctuations in magnetized plasmas.
In full particle simulation models based on the Vlasov-Maxwell system, the main
problem is the high frequency space charge waves characterized by the electron
plasma frequency ωpe and electron Debye length λDe , which impose severe time
step and spatial resolution restrictions. Their presence gives rise to very high noise
levels which can mask the evolution of low frequency quasi-neutral-type (ne ≈ ni )
waves whose equilibrium fluctuation energy can be orders of magnitude lower.
Beginning with single particle dynamics in an ambient magnetic field, a gy-
rophase averaging procedure can be developed to remove the gyrophase dependence
on drift motion and thus eliminate the fast gyro-motion and associated high fre-
quency cyclotron waves while retaining finite gyroradius effects. Action-variational and Lie perturbation methods can be used to derive the gyro-drift equations of motion to any order while retaining the phase space conserving properties in the change of variables from gyro-center to gyrophase-averaged coordinates.
Using the single particle equations of motion as characteristics of the gyrokinetic
Vlasov equation, it is possible to formulate a self-consistent system of equations in-
cluding a Poisson and Ampere equation for the electrostatic and magnetic potential
from which the electric and magnetic fields are formed. The gyrokinetic Poisson
equation physically describes the ion polarization drift effects without the need to
include them explicitly in the equations of motion. The gyrokinetic Vlasov-Poisson-
Ampere system satisfies particle and energy conservation.
By integration over the phase space, moment equations can be formed to de-
scribe continuum gyrofluids. In some cases the magnetized plasma dynamics can
be modeled by just a few of the lowest order moments, resulting in computationally
efficient simulations without the problems of statistical noise as in the discrete for-
mulation. It should also be mentioned that continuum gyrokinetic Vlasov-Poisson-
Ampere equations are being used for turbulent transport simulations in inhomoge-
neous plasmas [27]. These require very large computing resources because of the
large number of grid points required in the high dimensional phase space.
Gyrokinetic particle simulations have been extensively developed in recent years
for the study of low frequency microturbulence in inhomogeneous plasmas. These
models have the advantage of allowing one to formulate parallel algorithms for implementation on massively parallel computing architectures, and simulations with over one billion particles are now feasible. The advance of low noise techniques, where only the perturbed part of the distribution function is represented by particles, has also allowed for a clearer delineation of the linear growth and saturation phase
of instabilities [39, 56] as well as larger scale simulations.
The author would like to thank the organizers of the Heraeus Summer School,
Prof. H. Fehske, Dr. R. Schneider and Dr. A. Weiße for their support and the hos-
pitality of the Max-Planck-Institute for Plasma Physics, Greifswald, Germany. This
research was supported in part by a grant from the Natural Sciences and Engineering
Research Council (NSERC) of Canada.
References
1. J. Hugill, Nucl. Fusion 23, 331 (1983)
2. P. Liewer, Nucl. Fusion 25, 543 (1985)
3. F. Wagner, U. Stroth, Plasma Phys. Contr. F. 35, 1321 (1993)
4. R. Fonck, G. Cosby, R. Durst, S. Paul, N. Bretz, S. Scott, E. Synakowski, G. Taylor, Phys. Rev. Lett. 70, 3736 (1993)
5. G. McKee, C. Petty, R. Waltz, C. Fenzi, R. Fonck, J. Kinsey, T. Luce, K. Burrell, D. Baker, E. Doyle, X. Garbet, R. Moyer, C. Rettig, T. Rhodes, D. Ross, G. Staebler, R. Sydora, M. Wade, Nucl. Fusion 41, 1235 (2001)
6. T. Northrop, Adiabatic Motion of Charged Particles (Wiley, New York, 1963)
7. P. Rutherford, E. Frieman, Phys. Fluids 11, 569 (1968)
8. J. Taylor, R. Hastie, Plasma Phys. 20, 479 (1968)
9. P. Catto, Plasma Phys. 20, 719 (1978)
10. T. Antonsen, B. Lane, Phys. Fluids 23, 1205 (1980)
11. P. Catto, W. Tang, D. Baldwin, Plasma Phys. 23, 639 (1981)
12. E. Frieman, L. Chen, Phys. Fluids 25, 502 (1982)
13. W. Lee, Phys. Fluids 26, 556 (1983)
14. A. Boozer, Phys. Fluids 23, 904 (1980)
15. R. Littlejohn, Phys. Fluids 24, 1730 (1981)
16. D. Dubin, J. Krommes, C. Oberman, W. Lee, Phys. Fluids 26, 3524 (1983)
17. T. Hahm, Phys. Fluids 31 (1988)
18. A. Brizard, J. Plasma Phys. 41, 541 (1989)
19. A. Brizard, T. Hahm, Rev. Mod. Phys. 79, 421 (2007)
20. W. Lee, J. Comput. Phys. 72, 243 (1987)
21. R. Sydora, T. Hahm, W. Lee, J. Dawson, Phys. Rev. Lett. 64, 2015 (1990)
22. R. Sydora, Phys. Fluids B2, 1455 (1990)
23. S. Parker, W. Lee, R. Santoro, Phys. Rev. Lett. 71, 2042 (1993)
24. R. Sydora, V. Decyk, J. Dawson, Plasma Phys. Contr. F. 12, A281 (1996)
25. Z. Lin, T. Hahm, W. Lee, W. Tang, R. White, Science 281, 1835 (1998)
26. A. Dimits, G. Bateman, M. Beer, B. Cohen, W. Dorland, G. Hammett, C. Kim, J. Kinsey, M. Kotschenreuther, A. Kritz, L. Lao, J. Mandrekas, W. Nevins, S. Parker, A. Redd, D. Shumaker, R. Sydora, J. Weiland, Phys. Plasmas 7, 969 (2000)
27. W. Nevins, J. Candy, S. Cowley, T. Dannert, A. Dimits, W. Dorland, C. Estrada-Mila, G. Hammett, F. Jenko, M. Pueschel, D. Shumaker, Phys. Plasmas 13, 122306 (2006)
28. A. Brizard, Phys. Fluids B4, 1213 (1992)
29. W. Dorland, G. Hammett, Phys. Fluids B5, 812 (1993)
30. M. Beer, G. Hammett, Phys. Plasmas 3, 812 (1996)
31. D. Strintzi, B. Scott, Phys. Plasmas 11, 5452 (2004)
32. K. Miyamoto, Plasma Physics for Nuclear Fusion (MIT Press, Cambridge, MA, 1989)
33. V. Arnold, Mathematical Methods of Classical Mechanics (Springer-Verlag, New York, 1989)
34. J. Cary, R. Littlejohn, Ann. Phys. 151, 1 (1983)
35. T. Hahm, W. Lee, A. Brizard, Phys. Fluids 31, 1940 (1988)
36. R. Hazeltine, C. Hsu, P. Morrison, Phys. Fluids 30, 3204 (1987)
37. A. Aydemir, Phys. Fluids B4, 3469 (1992)
38. B. Scott, Plasma Phys. Contr. F. 45, A385 (2003)
39. H. Naitou, K. Tsuda, W. Lee, R. Sydora, Phys. Plasmas 2, 4257 (1995)
40. C. Birdsall, A. Langdon, Plasma Physics via Computer Simulation (McGraw-Hill, New York, 1985)
41. A. Hasegawa, L. Chen, Phys. Fluids 19, 1924 (1976)
42. J. Krommes, Phys. Fluids B5, 2405 (1993)
43. J. Dawson, Rev. Mod. Phys. 55, 403 (1983)
44. A. Hasegawa, P. Indian Acad. Sci. A 86, 151 (1977)
45. K. Stasiewicz, P. Bellan, C. Chaston, C. Kletzing, R. Lysak, J. Maggs, O. Pokhotelov, C. Seyler, P. Shukla, L. Stenflo, A. Streltsov, J.E. Wahlund, Space Sci. Rev. 92, 423 (2000)
46. D. Leneman, W. Gekelman, J. Maggs, Phys. Rev. Lett. 82, 2673 (1999)
47. B. Kadomtsev, Nucl. Fusion 31, 1301 (1991)
48. A. Galeev, L. Zelenyi, JETP Lett. 29, 614 (1979)
49. A. Rechester, M. Rosenbluth, Phys. Rev. Lett. 40, 88 (1978)
50. P. Rebut, M. Hugon, Comments Plasma Phys. Contr. F. 33, 1085 (1991)
51. M. Yagi, S.I. Itoh, K. Itoh, A. Fukuyama, M. Azumi, Phys. Plasmas 2, 4140 (1995)
52. J. Drake, Y. Lee, Phys. Rev. Lett. 39, 453 (1977)
53. J. Drake, Y. Lee, Phys. Fluids 20, 1341 (1977)
54. I. Katanuma, T. Kamimura, Phys. Fluids 23, 2500 (1980)
55. R. Sydora, Phys. Plasmas 8, 1929 (2001)
56. W. Wan, Y. Chen, S. Parker, Phys. Plasmas 12, 012311 (2005)
8 Boltzmann Transport in Condensed Matter
F. X. Bronold
This chapter presents numerical methods for the solution of Boltzmann equations as applied to the analysis of transport and relaxation phenomena in condensed matter systems.
The Boltzmann equation (BE) is of central importance for the description of trans-
port processes in many-particle systems. Boltzmann introduced this equation in the
second half of the 19th century to study irreversibility in gases from a statistical me-
chanics point of view. Envisaging the molecules of the gas to perform free flights,
which are occasionally interrupted by mutual collisions, he obtained the well-known
equation [1]
\[
\frac{\partial g}{\partial t} + v \cdot \nabla_r g + \frac{F}{M} \cdot \nabla_v g = \left( \frac{\partial g}{\partial t} \right)_c , \qquad (8.1)
\]
where g(r, v, t) is the velocity distribution function, M is the mass of the gas
molecules, F is the external force, and the r.h.s. is the collision integral. With this
equation Boltzmann could not only prove his famous H-theorem, which contains
a definition of entropy in terms of the velocity distribution function and states that
for irreversible processes entropy has to increase, he could also calculate transport
properties of the gas, for instance, its heat conductivity or its viscosity.
In the original form, the BE holds only for dilute, neutral gases with a short range
interaction, for which nR3 ≪ 1, where n is the density of the gas and R is the range
of the interaction potential. However, it has also been applied to physical systems,
for which, at first sight, the condition nR3 ≪ 1 is not satisfied. For instance, the
kinetic description of plasmas in laboratory gas discharges or interstellar clouds is
usually based on a BE, although R → ∞ for the Coulomb interaction. Thus, nR³ ≪ 1 cannot be satisfied for any density. A careful study of the Coulomb collision
integral showed, however, that the bare Coulomb interaction has to be replaced by
the screened one, resulting in a modified BE, the Lenard-Balescu equation [2, 3],
which can then indeed be employed for the theoretical analysis of plasmas [4, 5].
In the temperature and density range of interest, ionized gases are, from a sta-
tistical point of view, classical systems. The technical problems due to the Coulomb interaction notwithstanding, it is therefore clear that a BE, which obviously belongs
to the realm of classical statistical mechanics, can be in principle formulated for a
plasma.
The BE has also been successfully applied to condensed matter, in particular, to
quantum fluids, metals, and semiconductors [6, 7, 8], whose microscopic descrip-
tion has to be quantum-mechanical. Hence, transport properties of these systems
should be calculated quantum-statistically, using a quantum-kinetic equation, in-
stead of a BE [9, 10, 11]. In addition, naively, one would not expect the condition
nR3 ≪ 1 to be satisfied. The densities of condensed matter are too high. A pro-
found quantum-mechanical analysis in the first half of the 20th century [12, 13, 14]
revealed, however, that the carriers in condensed matter are not the tightly bound, dense building blocks but physical excitations which, at a phenomenological level, resemble a dilute gas of quasiparticles for which a BE or a closely related kinetic
equation can indeed be formulated.
The quasiparticle concept opens the door for Boltzmann transport in condensed
matter, see Fig. 8.1. Depending on the physical system, electrons or ion cores in a
solid, normal or superfluid/superconducting quantum fluids etc., various types of
quasiparticles can be defined quantum-mechanically, whose kinetics can then be
modelled by an appropriate semi-classical BE. Its mathematical structure, and with
it the solution strategy, is essentially independent of the physical context. Below,
we restrict ourselves to the transport resulting from electronic quasiparticles in a
crystal. Further examples of semi-classical quasiparticle transport can be found in
the excellent textbook by Smith and Jensen [7].
Fig. 8.1. This cartoon puts the content of this chapter into perspective. For neutral or ionized
gases, the BE and its range of validity, can be directly derived from the Hamilton function for
the classical gas molecules. In that case, the BE determines the distribution function for the
constituents of the physical system under consideration. In the context of condensed matter,
however, the BE describes the distribution function for the excitation modes (quasiparticles)
and not for the constituents (electrons, ion cores, ...) although the BE has to be obtained –
by quantum-statistical means – from the constituents’ Hamilton operator. The definition of
quasiparticles is absolutely vital for setting-up a BE. It effectively maps, as far as the kinetics
is concerned, the quantum-mechanical many-particle system of the constituents to a semi-
classical gas of excitation modes
where we separated the lattice-periodic potential V (r) originating from the rigidly
arranged ion cores from the energy dependent potential Σ(r, E) (self-energy) which
arises from the coupling to other charge carriers as well as to the ion cores’ devia-
tions from the equilibrium positions (phonons).
Let us first consider (8.2) for Σ = 0. An electron moving through the crystal ex-
periences then only the lattice periodic potential V . It gives rise to extremely strong
scattering, with a mean free path of the order of the lattice constant, which could
never be treated in the framework of a BE. However, this scattering is not random.
It originates from the periodic array of the ion cores and leads to the formation of
bare energy bands. Within these bands, the motion of the electron is coherent, but
with a dispersion which differs from the dispersion of a free electron. Because of
¹ The Bloch functions are orthonormal when integrated over a unit cell v_cell: v_{cell}^{-1} \int_{cell} dr\, u_{nk}^*(r)\, u_{n'k'}(r) = \delta_{nn'}\, \delta_{k,k'}.
this difference, the electron no longer sees the rigid array of ion cores. Its mean free
path exceeds therefore the lattice constant, and a BE may become feasible.
However, in a crystal there is not only one electron but many, and the lattice
of ion cores is not rigid but dynamic. Thus, electron-electron and electron-phonon
interaction have to be taken into account giving rise to Σ ≠ 0. As a result, the
Schrödinger equation (8.2) becomes an implicit eigenvalue problem for the renor-
malized energy bands En (k) which may contain a real and an imaginary part. For
the purpose of the discussion, we assume Σ to be real. Physically, the dressing of the
electron incorporated in Σ arises from the fact that the electron attracts positively
charged ion cores and repels other electrons, as visualized in Fig. 8.2. The former
gives rise to a lattice distortion around the considered electron and the latter leads to a depletion of electrons around it².
While coherent scattering on the periodic array of ion cores transforms bare
electrons into band electrons, which is favorable for a BE description, residual in-
teractions turn band electrons into dressed quasiparticles, which may be detrimen-
tal to it, because the dressing is energy and momentum dependent. Quasiparticles
are therefore complex objects. Nevertheless, they are characterized by a dispersion,
carry a negative elementary charge, and obey Fermi statistics, very much like band
electrons. Provided they are well-defined, which means that the imaginary part of
Σ, which we neglected so far in our discussion, is small compared to the real part
of Σ, a BE may thus also be possible for them. However, the justification of the quasiparticle BE will be subtle. Indeed, there are no pre-canned recipes for this task. Each physical situation has to be analyzed separately, using quantum-statistical techniques [9, 10, 11].
Fig. 8.2. Graphical representation of the many-body effects contained in the selfenergy Σ. The lattice distortion (dashed lines) and the depletion region (large solid circle) turn bare band electrons (visualized by the small bullet) into quasiparticles, which carry the lattice distortion and the depletion region with them when they move through the crystal
² Here, exchange effects are also important, because electrons are fermions. At a technical level, the depletion region is encoded in the Coulomb hole and the screened exchange selfenergy.
For standard metals [15] and superconductors [16, 17, 18, 19], for instance, the
BE for quasiparticles can be rigorously derived from basic Hamiltonians, provided
the quasiparticles are well-defined. The main reason is the separation of energy
scales [20]: A high-energy scale set by the Fermi momentum kF and a low-energy
scale given by the width ∆k of the thermal layer around the Fermi surface, see
Fig. 8.3. The latter also defines the wavelength 1/∆k of the quasiparticles respon-
sible for transport. Because of the separation of scales, an ab initio calculation of
transport coefficients is possible along the lines put forward by Rainer [21] for the
calculation of transition temperatures in superconductors, which is a closely related
problem, see also [22].
For semiconductors, on the other hand, a BE for quasiparticles can only be rigor-
ously derived when they are degenerate, that is, heavily doped and thus metal-like;
the scales are then again well separated. However, when the doping is small, or
in moderately optically excited semiconductors, the electrons are non-degenerate.
Thus, neither a Fermi energy nor a transport energy scale can be defined. In that
case, a BE for quasiparticles is very hard to justify from first principles [23], despite the empirical success the BE also enjoys in these situations.
Fig. 8.3. Separation of the momentum (and thus energy) scales in a metal. The Fermi momentum $k_F$ sets the high-energy scale, whereas the thermal smearing-out of the Fermi surface, $\Delta k$, gives the scale relevant for transport. Using quantum-statistical methods, a correlation function called $g^K$, which is closely related to the distribution function $g$, can be systematically expanded in $\Delta k/k_F \sim k_B T/E_F$. If the quasiparticles have long enough lifetimes, $g^K$ reduces, in leading order, to $g$ and satisfies a BE [20]
where En (k) is the band energy obtained from the solution of the Schrödinger equa-
tion (8.2), k is the kinetic momentum, and p and r are, respectively, the canonical
momentum and coordinate of the quasiparticle. From the Hamilton equations it then
follows, that a quasiparticle in the nth band centered at r and k in phase space has
to move according to
3
Naturally, phonons comprising the lattice distortion accounted for in the definition of
quasiparticles do not lead to scattering. But there is a residual electron-phonon interac-
tion which induces transitions between different quasiparticle states.
8 Boltzmann Transport in Condensed Matter 229
$$\frac{d\mathbf{r}}{dt} = \mathbf{v}_n(\mathbf{k}) = \frac{1}{\hbar}\nabla_{\mathbf{k}} E_n(\mathbf{k})\;, \qquad \frac{d\mathbf{k}}{dt} = \frac{1}{\hbar}\mathbf{F}_n = -\frac{e}{\hbar}\left[\mathbf{E} + \frac{1}{c}\,\mathbf{v}_n(\mathbf{k})\times\mathbf{B}\right] \qquad (8.4)$$
with E = −∇r U and B = ∇r × A, which immediately leads to the quasiparticle
BE
$$\frac{\partial g_n}{\partial t} + \mathbf{v}_n\cdot\nabla_{\mathbf{r}}\, g_n - \frac{e}{\hbar}\left[\mathbf{E} + \frac{1}{c}\,\mathbf{v}_n\times\mathbf{B}\right]\cdot\nabla_{\mathbf{k}}\, g_n = \left(\frac{\partial g_n}{\partial t}\right)_{\!c}\;, \qquad (8.5)$$
when the time evolutions of the distribution function due to streaming (l.h.s.) and
scattering (r.h.s.) are balanced.
Suppressing the variables r and t, the general structure of the collision integral
is
$$\left(\frac{\partial g_n}{\partial t}\right)_{\!c} = \sum_{n'\mathbf{k}'}\left\{ S_{n'\mathbf{k}',n\mathbf{k}}\; g_{n'}(\mathbf{k}')\,[1-g_n(\mathbf{k})] - S_{n\mathbf{k},n'\mathbf{k}'}\; g_n(\mathbf{k})\,[1-g_{n'}(\mathbf{k}')]\right\} \qquad (8.6)$$
with Sn′ k′ ,nk the probability for scattering from the quasiparticle state n′ k′ to the
quasiparticle state nk, which has to be determined from the quantum mechanics of
scattering. Its particular form depends on the scattering process (see below). The
collision integral consists of two terms: The term proportional to 1 − gn (k) ac-
counts for scattering-in (gain) processes, whereas the term proportional to gn (k)
takes scattering-out (loss) processes into account. Note, for non-degenerate quasiparticles⁴, $g_n(\mathbf{k}) \ll 1$ and the Pauli-blocking factor $1-g_n(\mathbf{k})$ reduces to unity.
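To make the bookkeeping in (8.6) concrete, the following is a minimal numerical sketch (an assumption of this rewrite, not part of the chapter): the states (n, k) are flattened into a single index, S is a dense matrix of transition probabilities, and the gain and loss terms are evaluated with their Pauli-blocking factors.

```python
import numpy as np

def collision_integral(g, S):
    """Discrete collision integral (8.6) on a finite set of states.

    g : occupations g_n(k), flattened into a 1D array over states a = (n, k)
    S : S[a, b] = transition probability from state a to state b
    (Both the flattening and the array names are illustrative.)
    """
    gain = (S.T @ g) * (1.0 - g)   # sum_b S_{b,a} g_b [1 - g_a]  (scattering-in)
    loss = (S @ (1.0 - g)) * g     # sum_b S_{a,b} [1 - g_b] g_a  (scattering-out)
    return gain - loss
```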
Some of the numerical techniques we will discuss below are tailored for the so-
lution of the steady-state, spatially uniform, linearized quasiparticle BE, applicable
to situations, where the external fields are weak and the system is close to thermal
equilibrium. This equation can be obtained from the full BE (8.5) through an expan-
sion around thermal equilibrium. In the absence of magnetic fields and for a single
band it reads [6, 7]
"
∂f "" (1)
− eE · v = Cg (k) , (8.7)
∂E " E(k)
where the r.h.s. symbolizes the linearized collision integral and $g^{(1)} = g - f$ is the deviation of the distribution function from the Fermi function $f(E) = [\exp(E/k_B T) + 1]^{-1}$. Here $T$ is the temperature and $E$ measures the energy from
the chemical potential. With the help of the detailed balance condition
$$S_{\mathbf{k}',\mathbf{k}}\; f(E(\mathbf{k}'))\,[1-f(E(\mathbf{k}))] = S_{\mathbf{k},\mathbf{k}'}\; f(E(\mathbf{k}))\,[1-f(E(\mathbf{k}'))]\;, \qquad (8.8)$$
the linearized collision integral becomes⁵
⁴ Quasiparticles with mass $m^*$ are non-degenerate when $n\lambda_{dB}^3 \ll 1$, where $n$ is the density and $\lambda_{dB} = \sqrt{h^2/2\pi m^* k_B T}$ is the de Broglie wavelength of the quasiparticles.
⁵ Recall, we suppress in the collision integral the variables r and t.
230 F. X. Bronold
$$C g^{(1)}(\mathbf{k}) = \sum_{\mathbf{k}'} S_{\mathbf{k},\mathbf{k}'}\left[ g^{(1)}(\mathbf{k}')\,\frac{f(E(\mathbf{k}))}{f(E(\mathbf{k}'))} - g^{(1)}(\mathbf{k})\,\frac{1-f(E(\mathbf{k}'))}{1-f(E(\mathbf{k}))}\right] = \sum_{\mathbf{k}'} \tilde{Q}_{\mathbf{k},\mathbf{k}'}\; g^{(1)}(\mathbf{k}')\;, \qquad (8.9)$$
⁶ From now on, BE refers to the quasiparticle BE.
the full BE. With an eye on the calculation of the electric conductivity of metals
and the calculation of hot-electron distributions in semiconductors, we will describe
two such methods: Numerical iteration [27, 28, 29, 30, 31, 32, 33, 34, 35] and al-
gebraization through an expansion of the distribution function in terms of a set of
basis functions [36, 37, 38, 39, 40, 41, 42, 43].
The second group consists of Monte Carlo techniques for the direct simulation
of the stochastic motion of quasiparticles, whose distribution function is governed
by the BE. These techniques are the most popular ones currently used because the
concepts they invoke are easy to grasp and straightforward to implement on a com-
puter. In addition, Monte Carlo techniques can be applied to far-off-equilibrium
situations and are thus ideally suited for studying hot-electron transport in semicon-
ductor devices which is of particular importance for the micro-electronics industry.
Below, we will present two different Monte Carlo approaches. The first approach,
which evolved into a design tool for electronic circuit engineers, samples the phase
space of the quasiparticles by monitoring the time evolution of a single test-particle
[44, 45, 46, 47, 48, 49], whereas the second approach generates the time evolution of N-electron configurations in a discretized momentum space [50]. This is particularly useful for degenerate electrons, where Pauli-blocking is important.
Based on the linearized BE (8.7), numerical iteration has been extensively used for
calculating steady-state transport coefficients for metals in uniform external fields
[27, 28, 29, 30]. In contrast to the full BE, the linearized BE is not an integro-
differential equation but an inhomogeneous integral equation to which an iterative
approach can be directly applied. As an illustration, we consider the calculation of
the electric conductivity tensor σ.
To set up the iteration scheme, g (1) is written in a form which anticipates that
(1)
g will change rapidly in the vicinity of the Fermi surface, see Fig. 8.3, while it
will be a rather smooth function elsewhere. The relaxation time approximation [6, 7]
suggests for g (1) the ansatz
"
(1) ∂f ""
g (k) = − eE · v(k)φ(k) , (8.11)
∂E "E(k)
where E(k) and v(k) are, respectively, the energy measured from the chemical
potential and the group velocity of the quasiparticles. In terms of the function φ(k),
which can be interpreted as a generalized, k-dependent relaxation time, the electric
current becomes [6, 7]
$$\mathbf{j} = 2e\int \frac{d\mathbf{k}}{(2\pi)^3}\; \mathbf{v}(\mathbf{k})\, g^{(1)}(\mathbf{k}) = -2e^2 \int \frac{d\mathbf{k}}{(2\pi)^3}\, \left.\frac{\partial f}{\partial E}\right|_{E(\mathbf{k})} \phi(\mathbf{k})\; \mathbf{v}(\mathbf{k}) : \mathbf{v}(\mathbf{k})\; \mathbf{E} = \sigma \mathbf{E}\;, \qquad (8.12)$$
where : denotes the tensor product and the factor two comes from the spin. Note,
although the particular structure of (8.11) is inspired by the relaxation time approx-
imation, the iterative approach goes beyond it, because it does not replace the lin-
earized collision integral by −g (1) /τ , where τ is the relaxation time, but keeps it
fully intact. In addition, it is also more general than variational approaches [6, 7]
because the function φ(k) is left unspecified.
To proceed, we insert (8.11) into (8.7). Using the collision integral in the form
(8.9) and defining
$$X(\mathbf{k};\mathbf{E}) = -e\mathbf{E}\cdot\mathbf{v}(\mathbf{k})\;, \qquad (8.14)$$
we obtain
" "
∂f "" 1 − f (E(k′ )) ∂f ""
X(k; E) = Sk,k′ X(k; E)φ(k)
∂E "E(k) 1 − f (E(k)) ∂E "E(k)
k′
"
f (E(k)) ∂f "" ′ ′
− X(k ; E)φ(k ) , (8.15)
f (E(k′ )) ∂E "E(k′ )
Notice that the precise form of the single band scattering probability Sk,k′ is im-
material for the iteration procedure which can thus handle all three major scat-
tering processes: Elastic electron-impurity, inelastic electron-phonon, and electron-
electron scattering.
Equation (8.16) is an inhomogeneous integral equation suitable for iteration: Starting with $\phi^{(0)} = 0$ (thermal equilibrium), a sequence of functions $\phi^{(i)}$, $i \ge 1$, can be successively generated which, with increasing $i$, comes arbitrarily close to the exact solution, provided the process converges. Convergence is only guaranteed when the kernel is positive and continuous. This is not necessarily the case, but it can be enforced when selfscattering processes are included, see below.
The iteration process needs as an input the scattering probability. The most
important scattering processes affecting the electric conductivity of metals are
electron-impurity and electron-phonon scattering. The former determines the conductivity at low temperatures, the latter at high temperatures⁷. In our notation, these two scattering probabilities are given by [28]
$$S^{\rm imp}_{\mathbf{k},\mathbf{k}'} = \frac{2\pi}{\hbar}\, |M^{\rm imp}(\cos\theta_{\mathbf{k}\mathbf{k}'})|^2\, \delta(E(\mathbf{k}')-E(\mathbf{k}))\;, \qquad (8.18)$$
$$S^{\rm ph}_{\mathbf{k},\mathbf{k}'} = \frac{2\pi}{\hbar} \sum_{\mathbf{q}\lambda}\sum_{\mathbf{Q}_i} \left|M^{\rm ph}_{\lambda}(\mathbf{k}'-\mathbf{k})\right|^2 \Big\{ N_{\lambda\mathbf{q}}\,\delta(E(\mathbf{k}')-E(\mathbf{k})-\hbar\omega_{\lambda\mathbf{q}})\,\delta_{\mathbf{k}'-\mathbf{k},\mathbf{q}+\mathbf{Q}_i} + [1+N_{\lambda\mathbf{q}}]\,\delta(E(\mathbf{k}')-E(\mathbf{k})+\hbar\omega_{\lambda\mathbf{q}})\,\delta_{\mathbf{k}-\mathbf{k}',\mathbf{q}+\mathbf{Q}_i} \Big\}\;, \qquad (8.19)$$
where $M^{\rm imp}(\cos\theta_{\mathbf{k}\mathbf{k}'})$ is the electron-impurity coupling, which depends on the angle $\theta_{\mathbf{k}\mathbf{k}'}$ between k and k′ (isotropic elastic scattering), $M^{\rm ph}_{\lambda}(\mathbf{k}'-\mathbf{k})$ is the electron-phonon coupling, and $N_{\lambda\mathbf{q}} = [\exp(\hbar\omega_{\lambda\mathbf{q}}/k_B T) - 1]^{-1}$ is the equilibrium distribution function for phonons with frequency $\omega_{\lambda\mathbf{q}}$; q, λ, and $\mathbf{Q}_i$ are the phonon wave-vector, the phonon polarization, and the ith reciprocal lattice vector, respectively. The coupling functions are material specific and can be found in the literature [6, 7, 8].
In order to obtain a numerically feasible integral equation, (8.18) and (8.19)
are inserted into (8.16) and the momentum sums are converted into integrals. The
integral over k′ is then transformed into an integral over constant energy surfaces
using8
$$\sum_{\mathbf{k}'} \to \int \frac{d\mathbf{k}'}{(2\pi)^3} \to \frac{1}{(2\pi)^3}\int dE(\mathbf{k}') \int \frac{d\Omega(\mathbf{k}')}{\hbar\,|\mathbf{v}(\mathbf{k}')|}\;, \qquad (8.20)$$
where $d\Omega(\mathbf{k}')$ is the surface element on the energy surface $E(\mathbf{k}')$. The δ-functions
appearing in the scattering probabilities (8.18) and (8.19) are then utilized to elim-
inate some of the integrations, thereby reducing the dimensionality of (8.16). For isotropic bands $\phi(\mathbf{k}) \to \phi(E(\mathbf{k}))$, and one ends up with a one-dimensional integral equation which can be readily solved by iteration. For more details see [27, 28, 29, 30].
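As an illustration of such an iteration, here is a hedged sketch under the assumption that the kernel has already been reduced to a matrix on an energy grid; the generic inhomogeneous integral equation φ = a + Kφ is then solved by successive substitution (all names are hypothetical, not from [27, 28, 29, 30]).

```python
import numpy as np

def iterate_integral_equation(a, K, dE, tol=1e-10, max_iter=1000):
    """Fixed-point iteration phi = a + K phi, starting from phi^(0) = 0.

    a  : inhomogeneity sampled on an energy grid (1D array)
    K  : kernel matrix, K[i, j] ~ K(E_i, E_j)
    dE : grid spacing, used as the quadrature weight
    Returns the solution and the number of iterations performed.
    """
    phi = np.zeros_like(a)
    for i in range(max_iter):
        phi_new = a + dE * (K @ phi)
        if np.max(np.abs(phi_new - phi)) < tol:   # converged
            return phi_new, i + 1
        phi = phi_new
    return phi, max_iter
```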
The iterative approach can also be applied to the full BE. This is of particular interest
for the calculation of distribution functions for electrons in strong external fields
[31, 32, 33, 34, 35]. In that case, however, the BE has to be first converted into an
integral equation. This is always possible because the free streaming term in (8.5)
has the form of a total differential which can be integrated along its characteristics.
⁷ Electron-electron scattering does not affect the electric conductivity as long as only normal processes are taken into account. Umklapp processes, on the other hand, contribute to the conductivity, but the matrix elements are usually very small.
⁸ The volume is put equal to one.
and added on both sides of the equation a selfscattering term Sk g(k, t), which has
no physical significance, but is later needed to simplify the kernel of the integral
equation.
To transform (8.21) into an integral equation, we introduce path variables $\mathbf{k}^* = \mathbf{k} + e\mathbf{E}t^*/\hbar$ and $t^* = t$, which describe the collisionless motion of the electrons along the characteristics of the differential operator [31, 45]¹⁰. In terms of these variables, (8.21) can be written as
$$\frac{d}{dt^*}\left[ g(\mathbf{k}(\mathbf{k}^*,t^*),t^*)\; e^{\int_0^{t^*} dy\; \tilde{\lambda}_{\mathbf{k}(\mathbf{k}^*,y)}}\right] = e^{\int_0^{t^*} dy\; \tilde{\lambda}_{\mathbf{k}(\mathbf{k}^*,y)}} \sum_{\mathbf{k}'} \tilde{S}_{\mathbf{k}',\mathbf{k}(\mathbf{k}^*,t^*)}\; g(\mathbf{k}',t^*)\;. \qquad (8.23)$$
Integrating (8.23) from $t'$ to $t$ and returning to the original variables yields
$$g(\mathbf{k},t) = g\Big(\mathbf{k} + \frac{e\mathbf{E}}{\hbar}(t-t'),\, t'\Big)\, e^{-\int_{t'}^{t} dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}(t-y)/\hbar}} + \int_{t'}^{t} dt^*\, e^{-\int_{t^*}^{t} dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}(t-y)/\hbar}} \sum_{\mathbf{k}'} \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}(t-t^*)/\hbar}\; g(\mathbf{k}',t^*)\;. \qquad (8.24)$$
Letting $t' \to -\infty$, the initial-condition term drops out and (8.24) becomes
$$g(\mathbf{k},t) = \int_{-\infty}^{t} dt^*\, e^{-\int_{t^*}^{t} dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}(t-y)/\hbar}} \sum_{\mathbf{k}'} \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}(t-t^*)/\hbar}\; g(\mathbf{k}',t^*)\;. \qquad (8.25)$$
This form of the BE is not yet particularly useful, because the time integral
in the exponent contains an integrand which almost always cannot be integrated
exactly. Even if it can, the result would be a complicated function, unsuited for fast
numerical manipulations. It is at this point, where the selfscattering term, which we
artificially added on both sides of the BE, can be used to dramatically simplify the
integral equation, as was first noticed by Rees [32]. Since the selfscattering rate Sk
is completely unspecified, we can use it to enforce a particularly simple form of $\tilde{\lambda}_{\mathbf{k}}$.
An obvious choice is
$$\tilde{\lambda}_{\mathbf{k}} = \lambda_{\mathbf{k}} + S_{\mathbf{k}} = \Gamma \equiv {\rm const} \qquad (8.26)$$
with a constant $\Gamma > \sup_{\mathbf{k}} \lambda_{\mathbf{k}}$, in order to maintain the physically desirable interpretation of $S_{\mathbf{k}} = \Gamma - \lambda_{\mathbf{k}}$ in terms of a selfscattering rate, which, of course, has to be positive.
With (8.26), (8.25) reduces, after a re-labeling of the time integration variable, to
$$g(\mathbf{k},t) = \int_0^{\infty} d\tau\; e^{-\Gamma\tau} \sum_{\mathbf{k}'} \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar}\; g(\mathbf{k}', t-\tau)\;. \qquad (8.27)$$
This form of the uniform BE is well-suited for an iterative solution [32, 34]. The
parameter Γ turns out to be crucial. It not only eliminates a complicated integration
but it also enforces a positive, continuous kernel which is necessary for the iteration
procedure to converge [33].
From a numerical point of view, integral equations are less prone to numerical
errors than differential equations. It can therefore be expected that an iteration-based solution of (8.27) is numerically more robust than a numerical treatment of the BE in integro-differential form. Another nice property of the iterative approach is that it processes the whole distribution function, which is available at any time during the calculation. This is particularly useful for degenerate electrons, where Pauli-blocking
affects electron-electron and electron-phonon scattering rates. In the simplest, and
thus most efficient, particle-based Monte Carlo simulations, in contrast, the distri-
bution function is only available at the end of the simulation, see Sect. 8.2.3.
At first sight the dimensionality of the integral equation¹¹ seems to ruin any
efficient numerical treatment of (8.27). This is however not necessarily so. The
time integration, for instance, is a convolution and can be eliminated by a Laplace
transform (t ↔ s). In the Laplace domain, (8.27) contains s only as a parameter
not as an integration variable. The efficiency of the method depends then on the
efficiency with which the remaining k-integration can be performed. For realistic
band structures and scattering processes this may be time consuming. However, it
is always possible to express $g(\mathbf{k}, s)$ in a symmetry-adapted set of basis functions,
thereby converting (8.27) into a set of integral equations with lower dimensionality.
¹¹ Three momentum variables and one time variable.
Hammar [35], for instance, used Legendre polynomials to expand the angle depen-
dence of the distribution function and obtained an extremely fast algorithm for the
calculation of hot-electron distributions in p-Ge and n-GaAs. In addition, it
is conceivable to construct approximate kernels, which are numerically easier to
handle.
The energy polynomials σn (E) are nth order polynomials in E/(kB T ), which
are orthogonal with respect to the weight function −∂f /∂E with
$$\int dE\, \left(-\frac{\partial f}{\partial E}\right) \sigma_n(E)\, \sigma_{n'}(E) = \delta_{nn'}\;. \qquad (8.34)$$
They will be used to describe variations perpendicular to the Fermi surface. The first two polynomials are $\sigma_0(E) = 1$ and $\sigma_1(E) = \sqrt{3}\,E/(\pi k_B T)$. Higher order ones have to be again constructed on a computer, using the recursion relation given by Allen [37]. As pointed out by Pinski [38], another possible choice for the energy polynomials, which may lead to faster convergence in some cases, is $\sigma_n(E) = \sqrt{2n+1}\, P_n(\tanh[E/(2k_B T)])$, where $P_n$ is the nth order Legendre polynomial.
Allen used the functions FJ (k) and σn (E(k)) to define two complete sets of
functions which are biorthogonal. With the proper normalization, they are given by
$$\chi_{Jn}(\mathbf{k}) = \frac{F_J(\mathbf{k})\,\sigma_n(E(\mathbf{k}))}{N(E(\mathbf{k}))\, v(E(\mathbf{k}))}\;, \qquad \xi_{Jn}(\mathbf{k}) = -F_J(\mathbf{k})\,\sigma_n(E(\mathbf{k}))\, v(E(\mathbf{k}))\, \left.\frac{\partial f}{\partial E}\right|_{E(\mathbf{k})} \qquad (8.35)$$
with N (E) and v(E), respectively, the single-spin density of states and the root-
mean-square velocity at energy E (see above). With the help of (8.33) and (8.34),
it is straightforward to show that χJn (k) and ξJn (k) satisfy the biorthogonality
conditions
$$\sum_{\mathbf{k}} \chi_{Jn}(\mathbf{k})\,\xi_{J'n'}(\mathbf{k}) = \delta_{JJ'}\,\delta_{nn'}\;, \qquad \sum_{Jn} \chi_{Jn}(\mathbf{k})\,\xi_{Jn}(\mathbf{k}') = \delta_{\mathbf{k}\mathbf{k}'}\;. \qquad (8.36)$$
Any function of k can be either expanded in terms of the functions χJn (k) or in
terms of the functions ξJn (k). The functions χJn are most convenient for expanding
functions which are smooth in energy. Since in (8.29) we split off the factor $-\partial_E f$, we expect $\phi(\mathbf{k})$ to exhibit this property and thus write
$$\phi(\mathbf{k}) = \sum_{Jn} \phi_{Jn}\, \chi_{Jn}(\mathbf{k})\;. \qquad (8.37)$$
The functions ξJn , on the other hand, vary strongly in the vicinity of the Fermi
surface. They are used at intermediate steps to express functions which peak at the
Fermi energy.
We are now able to rewrite (8.31). Using the definition of ξJn , the l.h.s. imme-
diately becomes
$$\text{l.h.s. of (8.31)} = eE_x\, \xi_{X0}(\mathbf{k})\;. \qquad (8.38)$$
For the r.h.s., we find
$$\text{r.h.s. of (8.31)} = \sum_{\mathbf{k}'\mathbf{k}''} Q_{\mathbf{k}\mathbf{k}'}\,\delta_{\mathbf{k}'\mathbf{k}''}\,\phi(\mathbf{k}'') = \sum_{J'n'}\sum_{\mathbf{k}'\mathbf{k}''} Q_{\mathbf{k}\mathbf{k}'}\,\chi_{J'n'}(\mathbf{k}')\,\xi_{J'n'}(\mathbf{k}'')\,\phi(\mathbf{k}'') = \sum_{J'n'}\sum_{\mathbf{k}'} Q_{\mathbf{k}\mathbf{k}'}\,\chi_{J'n'}(\mathbf{k}')\,\phi_{J'n'}\;, \qquad (8.39)$$
where in the second line we expressed the Kronecker delta via (8.36) and in the third
line we used the inverse of (8.37)
$$\phi_{Jn} = \sum_{\mathbf{k}} \xi_{Jn}(\mathbf{k})\,\phi(\mathbf{k})\;. \qquad (8.40)$$
Multiplying (8.38) and (8.39) from the left with χJn (k) and summing over all k
leads to the final result
$$E_x\, \delta_{n0}\,\delta_{JX} = \sum_{J'n'} Q_{Jn,J'n'}\, \phi_{J'n'} \qquad (8.41)$$
with $Q_{Jn,J'n'} = \sum_{\mathbf{k}\mathbf{k}'} \chi_{Jn}(\mathbf{k})\, Q_{\mathbf{k}\mathbf{k}'}\, \chi_{J'n'}(\mathbf{k}')$.
Equation (8.41) is the symmetry-adapted matrix representation of the linearized
BE (8.31). Its solution gives the expansion coefficients φJn . To complete the cal-
culation, we have to express the electric current jx in terms of these coefficients.
Using in (8.30) the definition for ξX0 (k) and the biorthogonality condition (8.36),
we obtain
$$j_x = 2e\,\phi_{X0}\;, \qquad (8.42)$$
which with (8.41) yields
$$\sigma_{xx} = 2e^2\,[Q^{-1}]_{X0,X0}\;. \qquad (8.43)$$
Thus, in Allen’s basis, the xx-component of the electric conductivity tensor is just
the upper-most left matrix element of the inverse of the matrix which represents the
linearized collision integral. Remember, because of the symmetry of the basis func-
tions, this matrix is block-diagonal. The numerical inversion is therefore expected
to be fast.
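As a minimal illustration of (8.43) — not the full machinery of assembling the matrix elements — the following sketch assumes the symmetry block of $Q_{Jn,J'n'}$ containing the X basis function has already been computed (the name Q_block and the ordering convention are assumptions of this example):

```python
import numpy as np

def sigma_xx(Q_block, e=1.0):
    """Conductivity from the matrix representation (8.43) of the linearized BE.

    Q_block : the X-symmetry block of Q_{Jn,J'n'}, ordered so that the basis
              function F_X sigma_0 corresponds to index 0.
    Returns sigma_xx = 2 e^2 [Q^{-1}]_{X0,X0}.
    """
    Q_inv = np.linalg.inv(Q_block)   # block-diagonality keeps this inversion cheap
    return 2.0 * e**2 * Q_inv[0, 0]
```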
The numerical bottleneck is the calculation of the matrix elements QJn,J ′ n′ .
They depend on the symmetry of the metal and, of course, on the scattering pro-
cesses. For realistic band structures, this leads to rather involved expressions, which,
fortunately, are amenable to some simplifications arising from the fact that in met-
als kB T /EF ≪ 1, where EF is the Fermi energy. The k-integration can thus be
restricted to the thermal layer with width ∆k, see Fig. 8.3. For explicit expressions,
we refer to the literature [37, 38, 39, 40, 41, 42, 43]. Although Allen’s approach
is not straightforward to implement, it has the advantage that it can handle com-
plicated Fermi surfaces in a transparent manner. In practice, the matrix elements
QJn,J ′ n′ are expressed in terms of generalized coupling functions which can be
either directly obtained from experiments or from ab initio band structure calcula-
tions. Allen’s method of solving the linearized BE is therefore geared towards an ab
initio calculation of transport coefficients for metals.
The most widely accepted method for solving the electron transport problem in
semiconductors is the particle-based Monte Carlo simulation [44, 45, 46, 47, 48, 49].
In its general form, it simulates the stochastic motion of a finite number of test-
particles and is equivalent to the solution of the BE. This technique has been, for
instance, used to simulate field-effect transistors from a microscopic point of view,
starting from the band structures and scattering processes of the semiconducting
materials transistors are made of.
A Monte Carlo simulation of Boltzmann transport is a one-to-one realization of
Boltzmann’s original idea of free flights, occasionally interrupted by random scat-
tering events, as being responsible for the macroscopic transport properties of the
gas under consideration; here, the electrons in a semiconductor. The approach relies
only on general concepts of probability theory, and not on specialized mathematical
techniques. Because of the minimum of mathematical analysis, realistic band struc-
tures, scattering probabilities, and geometries can be straightforwardly incorporated
into a Monte Carlo code. However, the method has some problems to account for
Pauli-blocking in degenerate electron systems. It has thus not been applied to de-
generate semiconductors, metals, or quantum fluids.
Fig. 8.4. Schematic representation of the particle-based Monte Carlo simulation of steady-
state Boltzmann transport in spatially uniform solids. A single test-particle suffices here be-
cause ergodicity guarantees that the whole phase space is sampled. The test-particle performs
free flights in the external field, randomly interrupted by one of the possible scattering pro-
cesses (black bullets). The simulation consists of a finite number of free flights, starting from
an arbitrary initial condition (grey bullet). For each free flight the simulation uses random
numbers to generate its duration ti , to select the terminating scattering process, and to deter-
mine the test-particles’ momentum after the scattering event, which then serves as the initial
momentum for the next free flight
Steady-state averages of single-particle observables are then obtained as time averages along the simulated trajectory,
$$\langle O \rangle = \frac{1}{t_s} \sum_{\rm flights} \int_0^{t_i} dt\; O(\mathbf{k}(t))\;, \qquad (8.45)$$
with $t_i$ the duration of the individual free flights and $t_s = \sum_i t_i$ the total simulation time,
where O can be, for instance, the energy of the electron $E(\mathbf{k})$ or its velocity $\mathbf{v}(\mathbf{k}) = \hbar^{-1}\nabla_{\mathbf{k}} E(\mathbf{k})$. Note, for each free flight, the time integration starts all over again
from zero. The test-electron has no memory, reminiscent of the Markovian property
of the BE.
The probability distributions for the random variables used in the Monte Carlo
simulation are given in terms of the electric field and the transition probabilities for
the various scattering processes. For realistic band structures the distributions can be
quite complicated, in particular, the distribution of the duration of the free flights.
Special techniques have to be used to relate the random variables needed in the
simulation to the uniformly distributed random variables generated by a computer.
Let us first consider the distribution of the duration of the free flights. The proba-
bility for the test-electron to suffer the next collision in the time interval dt centered
around t is given by
&
#k(t) e− t # ′
dt′ λ
P (t)dt = λ 0 k(t ) dt , (8.46)
where $\mathbf{k}(t) = \mathbf{k}_0 - (e/\hbar)\mathbf{E}t$, with $\mathbf{k}_0$ the arbitrary wave vector from which the first free flight started, and $\tilde{\lambda}_{\mathbf{k}}$ the total transition rate from state k due to all scattering
processes, including selfscattering. In Sect. 8.2.1, we introduced selfscattering in
order to simplify the integral representation of the BE. But it is also very useful in
the present context because, using again the choice (8.26), it leads to $\tilde{\lambda}_{\mathbf{k}} = \Gamma$ and
thus to
$$P(t)\,dt = \Gamma\, e^{-\Gamma t}\, dt\;. \qquad (8.47)$$
Note, without selfscattering, we would have had to integrate over $\lambda_{\mathbf{k}(t)}$, comprising only real scattering events. For realistic band structures, this could have been done only numerically, and would have led to a rather complicated P(t)dt, useless for further numerical processing.
To relate the random variable t to a random variable R ∈ [0, 1] with a uniform
distribution, we consider the cumulant of P (t),
$$c(t) = \int_0^t dt'\; P(t') = 1 - e^{-\Gamma t}\;. \qquad (8.48)$$
Equating $c(t)$ with a uniformly distributed random number $R \in [0,1]$ and inverting yields the duration of the free flight, $t = -\Gamma^{-1}\ln(1-R)$.
Fig. 8.5. Illustration of the scattering event in the test-particle-based Monte Carlo simula-
tion. The test-particle scatters off a generalized bath representing impurities and phonons.
For elastic scattering, the test-electron gains or loses only momentum, whereas for inelastic
scattering it can also transfer or receive energy
$$\phi' = 2\pi R_3\;, \qquad \cos\theta' = 1 - 2R_4 \qquad (8.52)$$
with $R_3$ and $R_4$ uniformly distributed random variables in the interval [0, 1].
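A minimal sketch of these two sampling steps — the flight duration via (8.47)/(8.48) and the angles for a randomizing event via (8.52) — is given below (a hedged illustration; function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def free_flight_duration(Gamma):
    """Flight time sampled from P(t) = Gamma exp(-Gamma t), (8.47), by
    inverting the cumulant c(t) = 1 - exp(-Gamma t) of (8.48)."""
    return -np.log(1.0 - rng.random()) / Gamma

def randomizing_angles():
    """Azimuth and polar angle after a randomizing scattering event, (8.52)."""
    phi = 2.0 * np.pi * rng.random()        # phi' = 2 pi R3
    cos_theta = 1.0 - 2.0 * rng.random()    # cos(theta') = 1 - 2 R4
    return phi, cos_theta
```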
For non-randomizing scattering processes, the probability for the angles is proportional to the transition rate written in the polar coordinates introduced above. Hence, for given $\mathbf{k}$ and $\mathbf{k}'$, the properly normalized function $P(\phi',\theta';k,k') = (\sin\theta'/4\pi)\, S(k,k',\phi',\theta')$ gives the probability for the azimuth $\phi'$ and the polar angle $\theta'$, both depending therefore on $k$ and $k'$. Using again the method of conditioning, applicable to any two-variable probability, together with the cumulants, the random variables $\phi'(k,k')$ and $\theta'(k,k')$ can again be expressed in terms of uniformly distributed random variables in the interval [0, 1].
The simulation consists of a finite number of free flights of random duration
and random initial conditions. Average single particle properties, in particular the
¹² Strictly speaking, it is the probability density.
drift velocity, can then be obtained from (8.45). Assuming the electric field to be in
z-direction, the drift will be also along the z-axis, that is, only the z-component of
the electron momentum will be changed due to the field. Hence, writing (8.45) with
O = vz (kz (t)) and integrating with respect to kz instead of t, the drift velocity is
given by [44]
$$\langle v_z \rangle = \frac{1}{K} \sum_{\rm flights} \int_{k_{z,i}}^{k_{z,f}} \frac{1}{\hbar}\frac{\partial E}{\partial k_z}\; dk_z = \frac{1}{\hbar K} \sum_{\rm flights} (E_f - E_i)\;, \qquad (8.53)$$
where the sum goes over all free flights, kz,i and kz,f denote the z-component of the
initial and final momentum of the respective free flights, and K is the total length of
the k-space trajectory.
In some cases, the distribution function g(k) may be also of interest. In order to
determine g(k) from the motion of a single test-particle a grid is set up in momen-
tum space at the beginning of the simulation. During the simulation the fraction of
the total time the test-electron spends in each cell is then recorded and taken as a
measure for the distribution function. This rule results from an application of (8.45).
Indeed, using $O(\mathbf{k}(t)) = n_i(\mathbf{k}(t))$ with $n_i(\mathbf{k}(t)) = 1$ when the test-particle is in cell $i$ and zero otherwise, gives $g(\mathbf{k}_i) \equiv \langle n_i \rangle = \Delta t_i/t_s$, with $\Delta t_i$ the time spent in cell $i$. Averaged single particle quantities for the steady state could then also be obtained from the sum
$$\langle O \rangle = \sum_{\mathbf{k}} O(\mathbf{k})\, g(\mathbf{k})\;, \qquad (8.54)$$
but for a reasonable accuracy the grid in momentum space has to be very fine. It is therefore more convenient to calculate $\langle O \rangle$ directly from (8.45).
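The residence-time estimator and the drift-velocity formula (8.53) translate directly into code; the following is a hedged sketch (array layout and names are assumptions of this example):

```python
import numpy as np

def distribution_from_residence(delta_t):
    """Residence-time estimator g(k_i) = Delta t_i / t_s, with Delta t_i the
    time the test-particle spent in momentum cell i (array over cells)."""
    delta_t = np.asarray(delta_t, dtype=float)
    return delta_t / delta_t.sum()

def drift_velocity(E_initial, E_final, K, hbar=1.0):
    """Drift velocity from (8.53): v_z = (1/(hbar K)) sum_flights (E_f - E_i),
    with K the total length of the k-space trajectory (caller's units)."""
    return float(np.sum(np.asarray(E_final) - np.asarray(E_initial))) / (hbar * K)
```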
It is instructive to demonstrate that the Monte Carlo procedure just outlined is
indeed equivalent to solving the steady-state BE (8.44). The equivalence proof has
been given by Fawcett and coauthors [44] and we follow closely their treatment. The
starting point is the definition of a function Pn (k0 , k, t) which is the probability that
the test-electron will have momentum k at time t during the nth free flight when it
started at t = 0 with momentum k0 . The explicit time dependence must be retained
because the electron can pass through the momentum state k any time during the
nth free flight. This probability satisfies an integral equation, (8.55), whose r.h.s. consists of three probabilities which are integrated over. The first one
Pn−1 (k0 , k′ , t) is the probability that the test-electron passes through some mo-
mentum state k′ during the (n − 1)th free flight, the second, $\tilde{S}_{\mathbf{k}',\mathbf{k}''}$, is the probability that it will be scattered from state k′ to state k′′, whereas the exponential factor
is the probability that it will not be scattered while drifting from k′′ to k during
the nth free flight. The Kronecker-δ ensures that the test-electron follows the tra-
jectory appropriate for the applied electric field E. The Monte Carlo simulation
generates realizations of the random variable $\mathbf{k}(t)$ in accordance with the probability $P_n(\mathbf{k}_0, \mathbf{k}, t)$.
Integrating in (8.55) over k′′ and t′ and substituting τ = t − t′ and y = τ − t′′
yields an equation,
$$P_n(\mathbf{k}_0,\mathbf{k},t) = \sum_{\mathbf{k}'} \int_0^t d\tau\; P_{n-1}(\mathbf{k}_0,\mathbf{k}',t-\tau)\; \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar}\; e^{-\int_0^\tau dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}}\;, \qquad (8.56)$$
which is a disguised BE. To make the connection with the BE more explicit, we
consider the count at k obtained after N collisions
N
ts
1
CN (k0 , k) = lim dtPn (k0 , k, t) . (8.57)
ts →∞ t
n=1 s 0
This number is provided by the Monte Carlo procedure and at the same time it can
be identified with g(k) for N ≫ 1. Thus, CN (k0 , k) is the bridge, which will carry
us from the test-particle Monte Carlo simulation to the traditional BE.
We now perform a series of mathematical manipulations at the end of which
we will have obtained the steady-state, spatially-uniform BE (8.44). Inserting (8.56)
into the definition (8.57), applying on both sides $-(e/\hbar)\mathbf{E}\cdot\nabla_{\mathbf{k}}$ from the left, and
using the two identities
$$-\frac{e\mathbf{E}}{\hbar}\cdot\nabla_{\mathbf{k}}\; \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar} = -\frac{\partial}{\partial\tau}\; \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar}\;,$$
$$-\frac{e\mathbf{E}}{\hbar}\cdot\nabla_{\mathbf{k}}\; e^{-\int_0^\tau dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}} = -\left(\frac{\partial}{\partial\tau} + \tilde{\lambda}_{\mathbf{k}}\right) e^{-\int_0^\tau dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}} \qquad (8.58)$$
gives
$$-\frac{e\mathbf{E}}{\hbar}\cdot\nabla_{\mathbf{k}} C_N(\mathbf{k}_0,\mathbf{k}) = -\tilde{\lambda}_{\mathbf{k}}\, C_N(\mathbf{k}_0,\mathbf{k}) - \lim_{t_s\to\infty} \frac{1}{t_s} \sum_{n=1}^{N} \int_0^{t_s}\! dt \sum_{\mathbf{k}'} \int_0^{t}\! d\tau\; P_{n-1}(\mathbf{k}_0,\mathbf{k}',t-\tau)\; \frac{\partial}{\partial\tau}\!\left[ \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar}\; e^{-\int_0^\tau dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}} \right]\;, \qquad (8.59)$$
where we used definition (8.57) once more to obtain the first term on the r.h.s. This
equation can be rewritten as
$$-\frac{e\mathbf{E}}{\hbar}\cdot\nabla_{\mathbf{k}} C_N(\mathbf{k}_0,\mathbf{k}) = -\tilde{\lambda}_{\mathbf{k}}\, C_N(\mathbf{k}_0,\mathbf{k})$$
$$\quad - \lim_{t_s\to\infty} \frac{1}{t_s} \sum_{n=1}^{N} \int_0^{t_s}\! dt \sum_{\mathbf{k}'} P_{n-1}(\mathbf{k}_0,\mathbf{k}',0)\; \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}t/\hbar}\; e^{-\int_0^{t} dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}}$$
$$\quad + \lim_{t_s\to\infty} \frac{1}{t_s} \sum_{n=1}^{N} \int_0^{t_s}\! dt \sum_{\mathbf{k}'} P_{n-1}(\mathbf{k}_0,\mathbf{k}',t)\; \tilde{S}_{\mathbf{k}',\mathbf{k}}$$
$$\quad - \lim_{t_s\to\infty} \frac{1}{t_s} \sum_{n=1}^{N} \int_0^{t_s}\! dt \sum_{\mathbf{k}'} \int_0^{t}\! d\tau\; \left[\partial_t P_{n-1}(\mathbf{k}_0,\mathbf{k}',t-\tau)\right] \tilde{S}_{\mathbf{k}',\mathbf{k}+e\mathbf{E}\tau/\hbar}\; e^{-\int_0^\tau dy\; \tilde{\lambda}_{\mathbf{k}+e\mathbf{E}y/\hbar}} \qquad (8.60)$$
when the τ-integration is carried out by parts and $\partial_\tau P_{n-1} = -\partial_t P_{n-1}$ is used.
Pulling now in the fourth term on the r.h.s. the differential operator ∂t in front of
the τ -integral produces two terms, one of which cancels with the second term on
the r.h.s. and the other vanishes in the limit ts → ∞. As a result, only the first and
third term on the r.h.s. of (8.60) remain. Using finally in the third term again the
definition (8.57) yields
$$-\frac{e\mathbf{E}}{\hbar}\cdot\nabla_{\mathbf{k}} C_N(\mathbf{k}_0,\mathbf{k}) + \tilde{\lambda}_{\mathbf{k}}\, C_N(\mathbf{k}_0,\mathbf{k}) = \sum_{\mathbf{k}'} C_{N-1}(\mathbf{k}_0,\mathbf{k}')\; \tilde{S}_{\mathbf{k}',\mathbf{k}}\;, \qquad (8.61)$$
For spatially non-uniform situations¹³, typical for semiconductor devices, the simu-
lation of a single test-particle is not enough (see Fig. 8.6). With a single test-particle,
for instance, it is impossible to represent the source term of the Poisson equation.
However, this equation needs to be solved in conjunction with the BE to obtain
the self-consistent electric field responsible for space-charge effects which, in turn,
determine the current-voltage characteristics of electronic devices.
Instead of a single test particle it is necessary to simulate an ensemble of test-
particles for prescribed boundary conditions for the Poisson equation and the BE,
where the latter have to be translated into boundary conditions for the test-particles.
The boundary conditions for the Poisson equation are straightforward; Dirichlet
condition, i.e., fixed potentials, at the electrodes and Neumann condition, i.e., zero
electric field, at the remaining boundaries. But the boundary conditions for the test-
particles, which need to be consistent with the ones for the Poisson equation, can be
rather subtle, resulting in sophisticated particle injection and reflection strategies,
¹³ The same holds for time-dependent situations.
Fig. 8.6. Charge assignment and force interpolation in a device simulation (schematic)
in particular, when the doping profile of the semiconductor structure is taken into
account. An authoritative discussion of the boundary conditions, as well as other as-
pects of device modeling, can be found in the textbook by Jacoboni and Lugli [47].
Conceptually, the Monte Carlo simulation for semiconductor devices resembles
the particle-in-cell simulations for plasmas described in Chap. 6, and we refer there
for technical details. In particular, the techniques for the solution of the Poisson
equation and the particle weighting and force interpolation required for the cou-
pling of the grid-free electron kinetics (simulation of the BE) with the grid-bound
electric field (solution of the Poisson equation) are identical. In addition, except for the differences which arise from the particular electric contacting of the simulation
volume, the implementation of particle injection and reflection (boundary condi-
tions for the test-particles) are also basically the same. The only differences are that
the test-particles have, of course, to be propagated during a free flight according to $d\mathbf{k}/dt = -(e/\hbar)\mathbf{E}$ and $d\mathbf{r}/dt = \hbar^{-1}\nabla_{\mathbf{k}} E(\mathbf{k})$, and that the scattering processes
are the ones appropriate for semiconductors: Electron-impurity scattering, electron-
phonon scattering, and, in some cases, electron-electron scattering.
In this generalized form, the particle-based Monte Carlo simulation has become
the standard tool for analyzing Boltzmann transport of electrons in semiconductors.
In combination with ab initio band structure data, including scattering rates, it is by
now an indispensable tool for electronics engineers optimizing the performance of
semiconductor devices [46, 47, 48, 49].
encoded in the collision integral (8.6) through the factor 1 − gn (r, k, t). It depends
therefore on the one-particle distribution function which in the test-particle-based
Monte Carlo algorithm is only available at the end of the simulation. In principle, the
distribution function from a previous run could be used, but this requires additional
book-keeping, which, if nothing else, demonstrates that the algorithm presented in
the previous subsection loses much of its simplicity.
An alternative method, which is most suitable for degenerate Fermi systems is
the ensemble-based Monte Carlo simulation. There are various ways to simulate an
ensemble. We describe here a simple approach, applicable to a spatially homoge-
neous electron system. It is based on the master equation for the probability Pν (t)
that at time t the many-particle system is in configuration ν = (nk1 , nk2 , ...). For
fermions, nki = 0 when the momentum state ki is empty and nki = 1 when the
state is occupied. The one-particle distribution function, which is the solution of the
corresponding BE, is then given by an ensemble average
$$g(\mathbf{k},t) = \sum_{\nu} P_\nu(t)\, n_{\mathbf{k}}\;. \qquad (8.62)$$
The algorithm has been developed by El-Sayed and coworkers and we closely
follow their treatment [50]. The purpose of the algorithm is to simulate electron re-
laxation in a two-dimensional, homogeneous degenerate electron gas, with electron-
electron scattering as the only scattering process. Such a situation can be realized,
for instance, in the conduction band of a highly optically excited semiconductor
quantum well at low enough temperatures. It is straightforward to take other scatter-
ing processes into account. Inhomogeneous situations, typical for device modeling,
can be in principle also treated but it requires a major overhaul of the approach
which we will not discuss.
Taking only direct electron-electron scattering into account, the force-free Boltz-
mann equation for a homogeneous, two-dimensional electron gas reads14
$$\frac{\partial g_{\mathbf{k}}}{\partial t} = 2 \sum_{\mathbf{p},\mathbf{k}'\mathbf{p}'} W_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'} \left( [1-g_{\mathbf{k}}][1-g_{\mathbf{p}}]\, g_{\mathbf{k}'} g_{\mathbf{p}'} - g_{\mathbf{k}} g_{\mathbf{p}} [1-g_{\mathbf{k}'}][1-g_{\mathbf{p}'}] \right) \qquad (8.63)$$
with
$$W_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'} = \frac{2\pi}{\hbar} \left| V(|\mathbf{k}-\mathbf{k}'|) \right|^2 \delta_{\mathbf{k}+\mathbf{p};\mathbf{k}'+\mathbf{p}'}\; \delta\big(E(\mathbf{k})+E(\mathbf{p})-E(\mathbf{k}')-E(\mathbf{p}')\big) \qquad (8.64)$$
and $V(q) = 2\pi e^2/[\epsilon_0 (q+q_s)]$ the statically screened Coulomb interaction in two dimensions; the area $V = L^2$ is again put equal to one. The factor two in front of the sum in (8.63)
comes from the electron spin. As indicated above, the simulation of this equation
via the test-particle-based Monte Carlo technique is complicated because the Pauli
blocking factors depend on the (instantaneous) distribution function. The ensemble
Monte Carlo method proposed by El-Sayed and coworkers [50] simulates therefore
¹⁴ Notice the slight change in our notation: $g(\mathbf{k}) \to g_{\mathbf{k}}$.
the master equation underlying the Boltzmann description. This equation determines
the time evolution of the probability for the occurrence of a whole configuration in
momentum space,
$$\frac{dP_\nu}{dt} = -\frac{P_\nu(t)}{\tau_\nu} + \sum_{\nu'} W_{\nu'\nu}\, P_{\nu'}(t)\;. \qquad (8.65)$$
Here
$$\frac{1}{\tau_\nu} = \sum_{\nu'} W_{\nu\nu'} \qquad (8.66)$$
is the lifetime of the configuration ν, and Wν,ν ′ is the transition rate from configu-
ration ν to ν ′ . Specifically for electron-electron scattering,
$$W_{\nu\nu'} = \frac{1}{2} \sum_{\mathbf{k}\mathbf{p}\mathbf{k}'\mathbf{p}'} W_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'}\; n_{\mathbf{k}} n_{\mathbf{p}} [1-n_{\mathbf{k}'}][1-n_{\mathbf{p}'}]\; D^{\nu\nu'}_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'} \qquad (8.67)$$
with $D^{\nu\nu'}_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'} = \delta_{n'_{\mathbf{k}},n_{\mathbf{k}}-1}\, \delta_{n'_{\mathbf{p}},n_{\mathbf{p}}-1}\, \delta_{n'_{\mathbf{k}'},n_{\mathbf{k}'}+1}\, \delta_{n'_{\mathbf{p}'},n_{\mathbf{p}'}+1} \prod_{\mathbf{q}\neq\mathbf{k},\mathbf{p},\mathbf{k}',\mathbf{p}'} \delta_{n'_{\mathbf{q}},n_{\mathbf{q}}}$.
The crucial point of the method is that the sampling of the configurations can be
done in discrete time steps τν . The master equation (8.65) then simplifies to
$$P_\nu(t+\tau_\nu) = \sum_{\nu'} \Pi_{\nu'\nu}\, P_{\nu'}(t) \qquad (8.68)$$
with
$$\Pi_{\nu'\nu} = \tau_\nu\, W_{\nu'\nu} \qquad (8.69)$$
the transition probability from configuration ν′ to configuration ν¹⁵. Thus, when
the system was at time t in the configuration ν0 , that is Pν (t) = δνν0 , then the
probability to find the system at time t + τν in the configuration ν is Pν (t + τν ) =
Πν0 ν . In a simulation the new configuration can be therefore chosen according to
the probability Πν0 ν .
However, there is a main drawback. In order to determine $\tau_\nu$ from (8.66), a high-dimensional, configuration-dependent integral has to be numerically calculated before the time propagation can be made. Clearly, this is not very efficient. To overcome the problem, the selfscattering method is used again, but now at the level of
the master equation, where selfscattering events can be also easily introduced be-
cause (8.65) is unchanged, when a diagonal element is added to the transition rate.
It is therefore possible to work with a modified transition rate
$$W^s_{\nu\nu'} = W_{\nu\nu'} + W_\nu\, \delta_{\nu\nu'}\;, \qquad (8.70)$$
¹⁵ The normalization required for the interpretation of $\Pi_{\nu'\nu}$ in terms of a probability is a consequence of the detailed balance $W_{\nu\nu'} = W_{\nu'\nu}$, which holds for energy conserving processes.
and a correspondingly modified lifetime
$$\frac{1}{\tau^s_\nu} = \frac{1}{\tau_\nu} + W_\nu\;. \qquad (8.71)$$
The diagonal elements of the modified transition probability $\Pi^s_{\nu_0\nu}$, that is, (8.69) with $\tau_\nu \to \tau^s_\nu$ and $W_{\nu_0\nu} \to W^s_{\nu_0\nu}$, are now finite. There is thus a finite probability to find the system at time $t+\tau^s_\nu$ still in the configuration $\nu_0$; in other words, there is a finite probability for selfscattering ν → ν.
Allowing for selfscattering provides us with the flexibility we need to speed up
the simulation. Imagine $\tau_\nu$ has a lower bound $\tau^s$. Then, we can always add
$$W_\nu = \frac{1}{\tau^s} - \frac{1}{\tau_\nu} > 0 \qquad (8.72)$$
to the transition rate which, when inserted in (8.71), leads to $\tau^s_\nu = \tau^s$. The sampling time step can therefore be chosen configuration independent, before the sampling starts. In addition, from the fact that $\tau^s$ is a lower bound to $\tau_\nu$ follows $1/\tau_\nu \le 1/\tau^s$. Thus, $1/\tau^s$ can be easily obtained from (8.66) using an approximate integrand which obeys or even enforces this inequality. In particular, replacing in (8.66) the Pauli-blocking factors by unity and the pair rate by its supremum leads to
$$\frac{1}{\tau^s} = \frac{\gamma}{2}\, N(N-1)\;, \qquad (8.74)$$
where $N = \sum_{\mathbf{k}} n_{\mathbf{k}}$ is the total number of electrons and $\gamma = \sup_{\mathbf{k},\mathbf{p}} \gamma_{\mathbf{k}\mathbf{p}}$ with $\gamma_{\mathbf{k}\mathbf{p}} = \sum_{\mathbf{k}'\mathbf{p}'} W_{\mathbf{k}\mathbf{p};\mathbf{k}'\mathbf{p}'}$.
We now have to work out the modified transition probability $\Pi^s_{\nu_0\nu} = \tau^s_\nu W^s_{\nu_0\nu} = \tau^s W^s_{\nu_0\nu}$. Following El-Sayed and coworkers [50], we consider a configuration $\nu_1$ which differs from the configuration $\nu_0$ only in the occupancy of the four momentum states $\mathbf{k}_1$, $\mathbf{p}_1$, $\mathbf{k}'_1$, and $\mathbf{p}'_1$. Then
$$\Pi^s_{\nu_0\nu_i} = P^{(1)}(\mathbf{k}_1,\mathbf{p}_1)\cdot P^{(2)}_{\mathbf{k}_1,\mathbf{p}_1}(\mathbf{k}'_1,\mathbf{p}'_1)\cdot P^{(3)}_{\mathbf{k}_1\mathbf{p}_1;\mathbf{k}'_1\mathbf{p}'_1}(\nu_i) \qquad (8.75)$$
with
$$P^{(1)}(\mathbf{k}_1,\mathbf{p}_1) = \frac{n_{\mathbf{k}_1}}{N}\,\frac{n_{\mathbf{p}_1}}{N-1} \qquad (8.76)$$
the probability for the electrons with momenta $\mathbf{k}_1$ and $\mathbf{p}_1$ to be the scatterers, $P^{(2)}_{\mathbf{k}_1,\mathbf{p}_1}(\mathbf{k}'_1,\mathbf{p}'_1)$ the probability that the two electrons with $\mathbf{k}_1$ and $\mathbf{p}_1$ are scattered into momentum states $\mathbf{k}'_1$ and $\mathbf{p}'_1$, respectively, and
$$P^{(3)}_{\mathbf{k}_1\mathbf{p}_1;\mathbf{k}'_1\mathbf{p}'_1}(\nu_i) = \begin{cases} \dfrac{\gamma_{\mathbf{k}_1\mathbf{p}_1}}{\gamma}\,(1-n_{\mathbf{k}'_1})(1-n_{\mathbf{p}'_1}) & i=1\\[2mm] 1 - \dfrac{\gamma_{\mathbf{k}_1\mathbf{p}_1}}{\gamma}\,(1-n_{\mathbf{k}'_1})(1-n_{\mathbf{p}'_1}) & i=0 \end{cases} \qquad (8.78)$$
8 Boltzmann Transport in Condensed Matter 251
the probability for the selected momentum states to perform a real (i = 1) or a selfscattering (i = 0) event, respectively. Note, the factor $(1-n_{\mathbf{k}'_1})(1-n_{\mathbf{p}'_1})$ guarantees that real scattering events occur only when the final momentum states are empty.
All three probabilities are normalized to unity when summed over the domain of the
independent variables in the brackets.
In order to implement the ensemble-based Monte Carlo simulation, the momen-
tum space is discretized into a large number of cells which can be either occupied
or empty (see Fig. 8.7). A configuration is then specified by the occupancies of all
cells. The temporal evolution of the configurations proceeds in discrete time steps
τ s and is controlled by the probability Πνs′ ν . The basic structure of the algorithm
is thus as follows: First, the initial distribution gk (t = 0) is sampled to create the
initial configuration ν0 , which is then propagated in time in the following manner:
(i) Increment the time by τ s .
(ii) Choose at random two initial momentum states, k1 and p1 , and two final mo-
mentum states, k′1 and p′1 .
(iii) Perform the selfscattering test consisting of two inquiries: First, check whether the chosen momentum states are legitimate by asking whether $R_1 > P^{(1)}(\mathbf{k}_1,\mathbf{p}_1)$ and $R_2 > P^{(2)}_{\mathbf{k}_1,\mathbf{p}_1}(\mathbf{k}'_1,\mathbf{p}'_1)$, with $R_1, R_2 \in [0,1]$ two uniformly distributed random numbers. Second, determine whether the final states are empty or not. In the former case, a real scattering event takes place provided $R_3 > P^{(3)}_{\mathbf{k}_1\mathbf{p}_1;\mathbf{k}'_1\mathbf{p}'_1}(\nu_1)$, with $R_3 \in [0,1]$ again a uniformly distributed random variable, whereas in the latter selfscattering occurs.
(iv) Generate the new configuration ν1 , which is the old configuration ν0 with the
occupancies nk1 , np1 , nk′1 , and np′1 changed in accordance to the outcome of
the selfscattering test.
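A minimal sketch of one such sampling step is given below. It is a hedged illustration, not the implementation of [50]: cells are addressed by a flattened index, the pair rate γ_kp is a user-supplied function, and the momentum-conservation constraint in the choice of the final cells is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng()

def ensemble_mc_step(n, gamma_kp, gamma):
    """One time step tau_s of the ensemble Monte Carlo, steps (i)-(iv) above.

    n        : 0/1 occupation array over the M momentum cells (a configuration)
    gamma_kp : function (k1, p1) -> pair scattering rate gamma_{k1 p1}
    gamma    : upper bound sup gamma_kp entering tau_s via (8.74)
    """
    M = n.size
    k1, p1, k1p, p1p = rng.integers(0, M, size=4)   # (ii) two initial, two final cells
    # (iii) selfscattering test: the initial cells must hold two distinct electrons
    if k1 == p1 or k1p == p1p or n[k1] == 0 or n[p1] == 0:
        return n                                     # selfscattering: nothing changes
    # ... and the real-scattering branch of (8.78) must be accepted
    accept = (gamma_kp(k1, p1) / gamma) * (1 - n[k1p]) * (1 - n[p1p])
    if rng.random() > accept:
        return n                                     # Pauli-blocked or selfscattering
    # (iv) real scattering: update the occupancies
    new = n.copy()
    new[k1] = new[p1] = 0
    new[k1p] = new[p1p] = 1
    return new
```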
Fig. 8.7. Schematic representation of the ensemble-based Monte Carlo simulation. A suf-
ficiently large part of the two-dimensional momentum space is discretized into small cells.
Each cell with size ∆kx ∆ky is labelled by its central momentum k i , i = 1, 2, ..., M with M
the total number of cells. An ensemble of N < M electrons occupies the cells: $n(\mathbf{k}_i) = 1$ when an electron is in cell i, and $n(\mathbf{k}_i) = 0$ otherwise; $\sum_i n(\mathbf{k}_i) = N$. The occupancies of all cells constitute a configuration ν. During the simulation a sequence of configurations is generated stochastically, whereby the transition probability from configuration ν′ at time t to configuration ν at time $t+\tau_s$ is $\Pi^s_{\nu'\nu}$
8.3 Conclusions
In this chapter, we discussed Boltzmann transport in condensed matter, focusing on
the conditions, which need to be satisfied for a BE to be applicable to the quasiparti-
cles in a crystal, and on computational tools to solve the quasiparticle BE. Although
the quasiparticle BE cannot always be rigorously derived from first principles, it
provides in most cases a surprisingly accurate description of transport processes in
condensed matter. Most of semiconductor device engineering, for instance, is based
on a quasiparticle BE, despite the lack of a satisfying microscopic derivation.
We presented various strategies for the numerical solution of the quasiparticle
BE. For the steady-state, spatially uniform, linearized BE, usually employed for
the calculation of transport coefficients for metals, we discussed numerical iteration
and the expansion of the one-particle distribution function in terms of a symmetry-
adapted set of basis functions. In the context of condensed matter, Fermi surface har-
monics are here particularly useful because they adequately describe the topology
of the Fermi surface, which may be anisotropic, or even consist of unconnected
pieces in momentum space.
As far as the numerical solution of the time-dependent, nonlinear BE is con-
cerned, we discussed iteration and Monte Carlo simulation. Both approaches have
been used in the past to calculate hot electron distributions in strongly biased semi-
conductors. Iteration is here based on the integral representation of the BE. The
approach is mathematically very elegant although its potential has not been fully
exploited. By far the most popular method for the numerical solution of the BE is
the Monte Carlo simulation. It has the virtue of an intuitively obvious approach,
requiring a minimum of preparatory mathematical analysis, before the computer
generates the solution. In addition, it requires no k-summations, which makes the
incorporation of realistic band structures particularly easy. We discussed two Monte
Carlo algorithms. In the first, particle-based algorithm, a single test-particle is used to sample the phase space of the quasiparticles.
References
1. L.W. Boltzmann, Ber. Wien. Akad. 66, 275 (1872)
2. A. Lenard, Ann. Phys. (New York) 10, 390 (1960)
3. R. Balescu, Phys. Fluids 3, 52 (1960)
4. G. Ecker, Theory of Fully Ionized Plasmas (Academic Press, New York, 1972)
5. R. Winkler, in Advances in Atomic, Molecular, and Optical Physics, Vol. 43, ed. by B. Bederson, H. Walther (Academic Press, New York, 2000), p. 19
6. J.M. Ziman, Electrons and Phonons (Oxford University Press, Oxford, 1960)
7. H. Smith, H.H. Jensen, Transport Phenomena (Clarendon Press, Oxford, 1989)
8. L.M. Roth, in Handbook on Semiconductors, Completely Revised Edition, Vol. 1, ed. by P.T. Landsberg (Elsevier Science Publishers, Amsterdam, 1992), p. 489
9. L.P. Kadanoff, G. Baym, Quantum Statistical Mechanics (W. A. Benjamin, Inc., New York, 1962)
10. L.V. Keldysh, Sov. Phys. JETP 20, 1018 (1965)
11. E.M. Lifshitz, L.P. Pitaevskii, Physical Kinetics (Pergamon Press, New York, 1981)
12. F. Bloch, Zeitschrift f. Physik 52, 555 (1928)
13. R. Peierls, Ann. d. Physik 3, 1055 (1929)
14. L.D. Landau, Sov. Phys. JETP 3, 920 (1958)
15. R.E. Prange, L.P. Kadanoff, Phys. Rev. 134A, 566 (1964)
16. G. Eilenberger, Zeitschrift f. Physik 214, 195 (1968)
17. A.I. Larkin, Y.N. Ovchinnikov, Sov. Phys. JETP 28, 1200 (1969)
18. G. Eliashberg, Sov. Phys. JETP 34, 668 (1972)
19. A.I. Larkin, Y.N. Ovchinnikov, Sov. Phys. JETP 41, 960 (1976)
9 Semiclassical Description of Quantum Many-Particle Dynamics in Strong Laser Fields
T. Fennel and J. Köhn
$$i\hbar\frac{\partial}{\partial t}\Psi = \left[ \sum_{i=1}^{N}\left( \frac{-\hbar^2}{2m}\nabla^2_{\mathbf{r}_i} + V_{\rm ext}(\mathbf{r}_i)\right) + \sum_{i<j} V_{\rm ee}(|\mathbf{r}_i-\mathbf{r}_j|) \right] \Psi\;, \qquad (9.1)$$
where $V_{\rm ee}(r_{ij}) = e^2/(4\pi\epsilon_0 r_{ij})$ is the Coulomb potential and the full expression in square brackets is the Hamilton operator.
So far this does not seem to simplify the problem, since the number of variables
has doubled. The strategy becomes more transparent after introducing the reduced
k-particle density matrices
ρ(k) (r1 . . . r k , r′1 . . . r′k , t)
N!
= ρ#(r 1 . . . rN , r ′1 . . . r′k , rk+1 . . . rN , t)d3 rk+1 . . . d3 r N (9.4)
(N − k)!
by writing $\mathbf{r}'_i = \mathbf{r}_i$ for all but k spatial coordinates, and integrating over these N − k
variables. To derive the equation of motion for the reduced k-particle density ma-
trix, insert (9.4) into (9.3) and integrate in the same way over all but k coordinates.
The terms $(\nabla^2_{\mathbf{r}_i} - \nabla^2_{\mathbf{r}'_i})$ and $(V_{\rm ext}(\mathbf{r}_i) - V_{\rm ext}(\mathbf{r}'_i))$ vanish if the ith coordinate is integrated out. Also, interaction terms $(V_{\rm ee}(r_{ij}) - V_{\rm ee}(r'_{ij}))$ cancel when both primed
coordinates are equal to the unprimed ones. Then for the one-body density matrix
follows
$$-i\hbar\frac{\partial}{\partial t}\rho^{(1)}(\mathbf{r},\mathbf{r}') = \left[ \frac{-\hbar^2}{2m}\left(\nabla^2_{\mathbf{r}} - \nabla^2_{\mathbf{r}'}\right) + V_{\rm ext}(\mathbf{r}) - V_{\rm ext}(\mathbf{r}') \right] \rho^{(1)}(\mathbf{r},\mathbf{r}') + \int \left( V_{\rm ee}(|\mathbf{r}-\mathbf{r}_2|) - V_{\rm ee}(|\mathbf{r}'-\mathbf{r}_2|) \right) \rho^{(2)}(\mathbf{r},\mathbf{r}_2,\mathbf{r}',\mathbf{r}_2)\; d^3r_2\;. \qquad (9.5)$$
The first term on the right hand side contains all single particle contributions, while
the second interaction term describes two-body effects and depends on the next
higher matrix ρ(2) . Similarly, the evolution of ρ(2) requires prior knowledge of the
three-body density matrix ρ(3) . Thus, the exact reformulation results in a series of
coupled equations of motion for the reduced density matrices $\rho^{(k)}$, representing the quantum counterpart to the famous BBGKY (Bogoliubov-Born-Green-Kirkwood-Yvon) hierarchy known from classical statistical mechanics. For a useful approximation this series must be truncated at some
level. Let us keep only (9.5) and close this equation by an approximation for ρ(2) . A
simple approach is a product of one-body density matrices (Hartree approximation)
$$\rho^{(2)}(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}'_1,\mathbf{r}'_2) = \rho^{(1)}(\mathbf{r}_1,\mathbf{r}'_1)\,\rho^{(1)}(\mathbf{r}_2,\mathbf{r}'_2)\;. \qquad (9.6)$$
Now the integral in (9.5) can be carried out, which allows us to include the interaction terms in an effective field according to
$$-i\hbar\frac{\partial}{\partial t}\rho^{(1)}(\mathbf{r},\mathbf{r}') = \left[ \frac{-\hbar^2}{2m}\left(\nabla^2_{\mathbf{r}} - \nabla^2_{\mathbf{r}'}\right) + V_{\rm eff}(\mathbf{r}) - V_{\rm eff}(\mathbf{r}') \right] \rho^{(1)}(\mathbf{r},\mathbf{r}')\;, \qquad (9.7)$$
with the effective potential
$$V_{\rm eff}(\mathbf{r}) = V_{\rm ext}(\mathbf{r}) + \frac{e^2}{4\pi\epsilon_0} \int \underbrace{\rho^{(1)}(\mathbf{r}'',\mathbf{r}'')}_{=n(\mathbf{r}'')} \frac{1}{|\mathbf{r}-\mathbf{r}''|}\; d^3r''\;. \qquad (9.8)$$
The second term in (9.8) is just the classical Hartree potential resulting from the
total electron density of the system n(r ′′ ). Thus we have found a closed mean-field
approximation to the dynamics of the one-body density matrix.
$$\cdots + \left[ V_{\rm eff}\!\left(\mathbf{r}+\tfrac{\mathbf{q}}{2}\right) - V_{\rm eff}\!\left(\mathbf{r}-\tfrac{\mathbf{q}}{2}\right) \right] \rho^{(1)}\!\left(\mathbf{r}+\tfrac{\mathbf{q}}{2}, \mathbf{r}-\tfrac{\mathbf{q}}{2}\right) d^3q\;. \qquad (9.11)$$
Using the identity $\nabla^2_{\mathbf{r}+\mathbf{q}/2} - \nabla^2_{\mathbf{r}-\mathbf{q}/2} = 2\nabla_{\mathbf{r}}\cdot\nabla_{\mathbf{q}}$ and the Taylor expansion of the potential,
$$V_x(\mathbf{r}) = -\frac{e^2}{4\pi^2\epsilon_0}\left(3\pi^2 n(\mathbf{r})\right)^{1/3}\;, \qquad (9.16)$$
which corresponds to the Dirac exchange energy [6]. Similarly, correlation effects
can be introduced in terms of a local potential, as from [7, 8]. A semiclassical for-
mulation of the Pauli principle will be discussed later in Sect. 9.2.1. Without sym-
metry restrictions, the direct solution of the Vlasov equation requires evolving a
six-dimensional function in phase space, which is numerically unfavorable. An ef-
ficient practical solution is offered by the test particle method described in the next
section, which, however, requires a non-negative distribution function. This can be
achieved either by smoothing out the rapid oscillations of the Wigner function to re-
move their negative values, or by using a suitable approximation to the initial state
of the distribution function. Once the distribution function is continuously differen-
tiable and non-negative, it remains non-negative upon propagation according to the
Vlasov equation.
In general, the calculation of the forces is the most expensive part in simulations of
the dynamics of interacting particles, since forces depend on all pairs of particles.
This leads to the known N 2 -scaling in direct particle-particle simulations. The strat-
egy behind the particle-mesh technique is to use a gridded potential in coordinate
space and to approximate the forces by finite differences [11]. In our case, even the
numerical approximation of derivatives drops out, since we can express the forces
directly as a convolution of the potential and the analytically known gradient of the
weighting function, see (9.20). Now, for high particle numbers the particle-mesh
treatment is obviously advantageous over the direct force calculation, if the numer-
ical method to calculate the potential scales better than N 2 . For our problem this
is possible, as we discuss in a moment. For a Coulomb-coupled system the force
calculation using the particle-mesh technique consists of three steps:
(i) Inject all particles to a grid for the charge density.
(ii) Find the potential by solving Poisson’s equation on the grid.
(iii) Compute forces for all particles from the potential.
For our semiclassical problem we just add local potentials resulting from ions, ex-
ternal laser fields and the approximated exchange-correlation effects to the potential
obtained from step (ii). The only demanding task is the solution of the Poisson
equation on a grid. A common way is the solution in frequency space, which re-
sults in N log(N) scaling due to discrete Fourier transforms, which can be performed efficiently with standard fast-Fourier-transform routines.
² This is related to the Husimi picture, see [10].
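For step (ii) of the force calculation, a minimal sketch of a frequency-space Poisson solver on a periodic cubic grid is given below (grid shape, units, and the function name are assumptions of this example, not part of the chapter):

```python
import numpy as np

def poisson_fft(rho, L, eps0=1.0):
    """Solve nabla^2 V = -rho/eps0 on a periodic (n, n, n) grid in
    frequency space: V(k) = rho(k) / (eps0 k^2).

    rho : charge density on the grid
    L   : edge length of the periodic box
    """
    n = rho.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wave numbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                               # avoid division by zero
    V_hat = np.fft.fftn(rho) / (eps0 * k2)
    V_hat[0, 0, 0] = 0.0                            # fix the mean of the potential
    return np.real(np.fft.ifftn(V_hat))
```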
As the most simple model of a fermionic many-particle system, the infinite Fermi
gas assumes noninteracting particles. The corresponding solutions of the stationary
Schrödinger equation are eigenfunctions of the kinetic energy operator, i.e., plane
waves. Restriction to a fixed volume L3 with periodic boundary conditions yields
the density of states in k-space for the Fermi gas with paired spins as
$$g(k) = \frac{2L^3}{(2\pi)^3}\;. \qquad (9.21)$$
The occupation number of each state in k-space is given from the Fermi-Dirac dis-
tribution
$$f_{\rm FD}(\epsilon(k)-\mu) = \frac{1}{1 + e^{(\epsilon(k)-\mu)/(k_B T)}}\;, \qquad (9.22)$$
with the single-particle energy $\epsilon(k) = \hbar^2 k^2/(2m)$ and the chemical potential μ.
For a given chemical potential μ the number of particles we find in the volume L3
is given by
$$N(\mu) = \frac{2L^3}{(2\pi)^3} \int f_{\rm FD}(\epsilon(k)-\mu)\; d^3k\;. \qquad (9.23)$$
By substituting $\mathbf{p} = \hbar\mathbf{k}$ and dividing by the volume $L^3$ we can write the particle density as an integral over momentum space,
$$n(\mu) = \int f(p)\; d^3p\;, \qquad (9.24)$$
where $f(p) = 2 f_{\rm FD}(\epsilon(p)-\mu)/(2\pi\hbar)^3$ is the momentum distribution of the Fermi gas. The Pauli principle appears here implicitly as an upper limit of the distribution function according to $f(p) \le 2/(2\pi\hbar)^3$, which we can use for
semiclassical considerations as it stands. At zero temperature all states are fully
occupied up to the chemical potential, i.e., the distribution becomes a step function
$$f^{T=0}(p) = \frac{2}{(2\pi\hbar)^3}\, \Theta(\mu - \epsilon(p))\;. \qquad (9.25)$$
It is now straightforward to find the zero-point kinetic energy density as a function of the particle density,
$$u_{\rm kin}(n) = \frac{3}{10}\frac{\hbar^2}{m}\left(3\pi^2\right)^{2/3} n^{5/3}\;. \qquad (9.26)$$
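In code, (9.26) is a one-liner; the following sketch uses SI constants and a hypothetical function name:

```python
import numpy as np

HBAR = 1.054571817e-34   # J s
M_E  = 9.1093837015e-31  # kg

def u_kin_fermi_gas(n):
    """Zero-temperature kinetic energy density of the Fermi gas, (9.26):
    u_kin(n) = (3/10) (hbar^2/m) (3 pi^2)^(2/3) n^(5/3), with n in 1/m^3."""
    return 0.3 * (HBAR**2 / M_E) * (3.0 * np.pi**2) ** (2.0 / 3.0) * n ** (5.0 / 3.0)
```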
The original Thomas-Fermi theory was developed to describe the electronic struc-
ture of heavy atoms at zero temperature and leads to a problem with spherical sym-
metry [15, 16]. Here we consider a more general form that can be derived from a
variational principle and contains the original form as a special case. The central
idea is to describe electrons in an external potential as a Fermi gas at zero tem-
perature by using LDA. Then, the total energy can be written in terms of the total
electron density n(r) as
$$E_{\rm tot}[n(\mathbf{r})] = \int \left[ u_{\rm kin}(n(\mathbf{r})) + V_{\rm ext}(\mathbf{r})\, n(\mathbf{r}) + \frac{1}{2}\frac{e^2}{4\pi\epsilon_0} \int \frac{n(\mathbf{r})\, n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\; d^3r' \right] d^3r\;, \qquad (9.27)$$
where the terms in square brackets describe the approximate kinetic energy density
taken from Fermi gas, the interaction with an external potential and the electron-
electron interaction. Obviously, there is a spurious self-interaction, since an electron
interacts with its own contribution to the total electron density n(r), but we assume
this error to be small for systems with many electrons. To find the density with
minimal energy, i.e., the ground state, we solve the variational problem
$$\frac{\delta}{\delta n}\left[ E_{\rm tot}[n] - \mu \int n(\mathbf{r})\; d^3r \right] = 0\;, \qquad (9.28)$$
where μ has the meaning of a Lagrange multiplier to fix the total number of elec-
trons. The interpretation of (9.28) is the following: The (extremal) energy must re-
main unchanged for any infinitesimal change of the density by δn(r). This leads to
the condition
n(r) = \frac{(2m)^{3/2}}{3\pi^2 \hbar^3} \left[ \mu - V_{eff}(r) \right]^{3/2} ,   (9.29)
with the effective potential from (9.15). Thus, in our notation, the Thomas-Fermi
ground state is defined by a pair of self-consistent equations. For systems with
spherical symmetry the problem can be reduced to a single one-dimensional nonlin-
ear differential equation using Poisson’s equation. If the external potential Vext is a
Coulomb potential, as for a nucleus, this yields the famous Thomas-Fermi equation
[15, 16]. However, we consider the unrestricted case.
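For illustration, the density equation (9.29) can be evaluated on a real-space grid as in the following minimal Python sketch (the function name and the unit choice are ours):

    import numpy as np

    def thomas_fermi_density(v_eff, mu, m=1.0, hbar=1.0):
        # Equation (9.29); the density vanishes in the classically
        # forbidden region where mu < V_eff(r).
        gap = np.maximum(mu - v_eff, 0.0)
        return (2.0 * m) ** 1.5 / (3.0 * np.pi**2 * hbar**3) * gap**1.5

Iterating this map together with a Poisson solve for the electrostatic part of V_eff, and readjusting μ after each step so that the density integrates to the desired electron number, yields the self-consistent Thomas-Fermi ground state.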
As the simplest version of density functional theory (DFT), the Thomas-
Fermi approximation provides a reasonable parameter-free description of heavy
atoms, but has serious shortcomings. For example, molecules are predicted to be
completely unstable within Thomas-Fermi theory [17], since exchange effects are
neglected [18]. In addition, the predicted values of the first atomic ionization po-
tentials are far too small. To cure these problems it was suggested by Dirac to treat
exchange effects in the same way as the kinetic energy [6], i.e., by approximating
the exchange energy locally with the Hartree-Fock result from the Fermi gas. This
adds the exchange energy density in LDA,
u_x(n(r)) = -\frac{3e^2}{16\pi^2 \epsilon_0} (3\pi^2)^{1/3}\, n^{4/3}(r) ,   (9.30)
to the integrand of (9.27). The solution of the variational problem is similar, but
yields an additional term in the effective potential. This is the LDA exchange poten-
tial we have already seen in (9.16).
However, the solution of this extended Thomas-Fermi-Dirac model can lead to
unphysical jumps in the electron density in some cases. Quantum mechanics avoids
sharp density jumps, since the large gradient of the corresponding wavefunction
would result in a very high kinetic energy. Fortunately, we can take advantage of the
test particle representation here, since the weighting functions introduce an artificial
smoothing of the density. As in the Thomas-Fermi picture, the local momentum
distribution is assumed to be that of a zero-temperature Fermi gas, i.e., fully
occupied up to the local Fermi momentum. This allows us to use the Fermi gas result
from (9.26) to approximate the kinetic energy density, but now as a function of the
test particle density according to
u_{kin}(r) = \frac{3}{10} \frac{\hbar^2}{m} (3\pi^2)^{2/3}\, n_\delta^{5/3}(r) .   (9.31)
From the test particle density we find the effective real-space electron density neff (r)
after convolution with the corresponding weighting function as
n_{eff}(r) = \int n_\delta(r')\, g_r(r - r')\, d^3r' .   (9.32)
The effective density can then be used to describe all contributions to the potential
energy density due to external fields, Coulomb interactions between electrons, and
exchange. For simplicity, we restrict the derivation to an external potential and the
classical Coulomb energy, leading to
u_{pot}[n_{eff}](r) = V_{ext}(r)\, n_{eff}(r) + \frac{1}{2} \frac{e^2}{4\pi\epsilon_0} \int \frac{n_{eff}(r)\, n_{eff}(r'')}{|r - r''|}\, d^3r'' .   (9.33)
The dependence on the test particle density is implicit, because it was used to define
the effective density. After integrating the kinetic and potential energy densities and
introducing a Lagrange multiplier we find the variational problem for the minimal
total energy,
\frac{\delta}{\delta n_\delta} \int \left[ u_{kin}(n_\delta(r)) + u_{pot}[n_{eff}](r) - \mu\, n_\delta(r) \right] d^3r = 0 .   (9.34)
Since the varied quantity is the test particle density nδ , the variation of the first and
last term under the integral is straightforward and analogous to the previous section.
The treatment of the potential energy term is more difficult, since it is a functional
of the effective density. Application of the chain rule for the functional derivative
yields
\int \left[ \frac{\hbar^2}{2m} \left( 3\pi^2 n_\delta(r) \right)^{2/3} - \mu \right] \delta n_\delta(r)\, d^3r
+ \int \left[ \int \underbrace{\int \frac{\delta u_{pot}[n_{eff}](r)}{\delta n_{eff}(r')}\, d^3r}_{V_{eff}(r')}\; \underbrace{\frac{\delta n_{eff}(r')}{\delta n_\delta(r'')}}_{g_r(r' - r'')}\, d^3r' \right] \delta n_\delta(r'')\, d^3r'' = 0 .   (9.35)
In the second term, the integration over d³r yields the effective potential corre-
sponding to our potential energy density from (9.33). It has the simple and
familiar form
V_{eff}(r) = V_{ext}(r) + \frac{e^2}{4\pi\epsilon_0} \int \frac{n_{eff}(r')}{|r - r'|}\, d^3r' .   (9.36)
With this definition we perform the d3 r′ -integration in (9.35) and introduce the
smoothed test particle potential
V_\delta(r) = \int V_{eff}(r')\, g(r' - r)\, d^3r' .   (9.37)
The variational condition (9.35) then takes the form

\int \left[ \frac{\hbar^2}{2m} \left( 3\pi^2 n_\delta(r) \right)^{2/3} + V_\delta(r) - \mu \right] \delta n_\delta(r)\, d^3r = 0 .   (9.38)

Since the integrand must vanish to fulfill this equation for arbitrary δn_δ, we find the
condition for extremal energy after solving for n_δ, which reads
n_\delta(r) = \frac{(2m)^{3/2}}{3\pi^2 \hbar^3} \left[ \mu - V_\delta(r) \right]^{3/2} .   (9.39)
It is not surprising that the structure of this equation is analogous to (9.29), but here we
describe the test particle density nδ as a function of the test particle potential Vδ . For
density weighting with delta functions, where neff = nδ and therefore Vδ = Veff , we
recover (9.29) as a limiting case.
For a given external potential the determination of the test particle ground state
density n_δ requires solving (9.32), (9.36), (9.37) and (9.39) self-consistently. Fur-
ther quantum corrections due to exchange and correlation effects can be easily in-
corporated if they are treated in LDA. To this end, the corresponding potentials, such
as that of (9.16) for the LDA exchange, are just added to the effective potential in
(9.36) as a function of the effective density. Once the test particle density is known,
the positions of numerical test particles can be generated by simple Monte-Carlo
sampling of nδ (r). The local momenta are sampled according to the assumed ho-
mogeneous occupation of the local Fermi sphere up to the local Fermi momentum
p_\delta^{max}(r) = \hbar \left( 3\pi^2 n_\delta(r) \right)^{1/3} .   (9.40)
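A possible implementation of this initialization step is sketched below in Python (the interface, the cubic sampling box and the density bound n_max are our assumptions): positions are drawn by rejection sampling of n_δ(r), momenta uniformly from the local Fermi sphere of (9.40).

    import numpy as np

    def sample_test_particles(n_delta, n_max, box, n_samples, hbar=1.0,
                              rng=np.random.default_rng()):
        # n_delta: callable r -> density; n_max: upper bound of the density.
        positions, momenta = [], []
        while len(positions) < n_samples:
            r = rng.uniform(-box, box, size=3)
            if rng.uniform(0.0, n_max) < n_delta(r):       # rejection step
                p_max = hbar * (3.0 * np.pi**2 * n_delta(r)) ** (1.0 / 3.0)
                direction = rng.normal(size=3)
                direction /= np.linalg.norm(direction)
                radius = p_max * rng.uniform() ** (1.0 / 3.0)  # uniform in sphere
                positions.append(r)
                momenta.append(radius * direction)
        return np.array(positions), np.array(momenta)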
For sufficiently fine sampling (Ns ≫ 1) and a finite width of the weighting functions
dr the semiclassically initialized system is numerically stable upon the propagation
described in Sect. 9.1.3. In practice, the parameters Ns and dr are chosen to provide
the required level of long-term stability of the model, i.e., sufficient suppression of
spurious classical thermalization (see [9, 14]).
The delocalized valence electrons of simple metals can be
reasonably approximated as a Fermi gas. This is the major justification for the ap-
plicability of the semiclassical method.
As we want to resolve the structure of the systems, the dynamics and potentials
of the ions (nuclei plus strongly bound electrons) have to be taken into account. As
the contributions of deeper bound electrons are assumed to be less important, it is
convenient to resolve only the valence electrons explicitly, while their interaction
with core electrons and nuclei is described by pseudopotentials. This is also a com-
mon strategy in time-dependent density functional theory. Here, we consider sodium
clusters where each atom contributes one active valence electron to the model ex-
plicitly. For all results discussed in this section the exchange-correlation potential
from [7] and Gaussian density weighting (dr = 1.15 Å) have been used.
For the alkali metals, where the singly charged ion has a closed-shell electronic
structure, it is sufficient to model the ion as an effective charge distribution with
spherical symmetry. A convenient form is a sum of Gaussians according to
\rho_{ion}(r) = e \sum_{n=1}^{k} \frac{c_n}{\pi^{3/2} a_n^3}\, e^{-r^2/a_n^2} ,   (9.41)
where cn and an are the charge and the width of each Gaussian. The corresponding
potential of an electron at position r_e in the field of a pseudo-ion at position R is

V_{e \leftrightarrow ion}(r_e, R) = -\frac{e^2}{4\pi\epsilon_0} \sum_{n=1}^{k} c_n\, \frac{\mathrm{erf}(|r_e - R|/a_n)}{|r_e - R|} ,   (9.42)
where erf(x) is the error function. The parameters an and cn can be optimized so
that the model reproduces central properties of the described element, such as ion-
ization potential and polarizability [14]. Examples of the semiclassical predictions
on the basis of an optimized pseudopotential with two Gaussians are given in Table 9.1,
illustrating the reasonable agreement with experimental values. The sum over the
pseudopotential of all ions at positions Ri then provides the external potential for
the electronic problem
V_{ext}(r) = \sum_i V_{e \leftrightarrow ion}(r, R_i) .   (9.43)
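As an illustration, the pseudopotential (9.42) and the external potential (9.43) may be evaluated as follows (a sketch; the Coulomb prefactor e²/(4πǫ0) is passed as a single constant, and the parameter names mirror (9.41)):

    import numpy as np
    from scipy.special import erf

    def v_e_ion(r_e, R, c, a, coulomb=1.0):
        # Equation (9.42); coulomb stands for e^2/(4*pi*eps0).
        d = np.linalg.norm(np.asarray(r_e) - np.asarray(R))
        if d < 1e-12:            # limit erf(x)/x -> 2/sqrt(pi) for x -> 0
            return -coulomb * np.sum(c * 2.0 / (np.sqrt(np.pi) * a))
        return -coulomb * np.sum(c * erf(d / a) / d)

    def v_ext(r, ion_positions, c, a):
        # Equation (9.43): sum over the pseudopotentials of all ions.
        return sum(v_e_ion(r, R, c, a) for R in ion_positions)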
Table 9.1. Semiclassically calculated atomic properties for the sodium atom using a two-
Gaussian pseudopotential, compared with reference values [14]

                                 model   reference
ionization potential [eV]         5.30      5.13
polarizability [Å³/(4πǫ0)]       21.9      23.6
Fig. 9.1. Icosahedral structure of small sodium clusters predicted by the semiclassical ground
state theory. Since electronic shell effects are not resolved in the semiclassical theory, the
ground state geometries are biased by geometric packing effects [14]
Using the parameters from the optimized atomic problem, the total energy of the
full ground state can be minimized with respect to the ionic coordinates to find the
cluster geometry, e.g., by simulated annealing. Plots of optimized geometries for
three cluster sizes are shown in Fig. 9.1. It should be noted that the semiclassical
theory is biased by geometric packing effects and ignores electronic shell closures.
Nevertheless, the results are surprisingly close to DFT calculations [19], except for
very small particle numbers.
Having obtained the initial state of the considered system (ionic structure plus
test particle distribution), the time-dependent response can be calculated by direct
numerical propagation for various external perturbations, e.g., due to a laser field or
collisions with charged ions. However, this treatment is inefficient for a systematic
characterization of the system, since all possible scenarios would require a separate
calculation. In the limit of small excitations, where the response can be assumed to
be almost linear and allows mode decomposition, it is possible to extract the full
spectrum out of a single numerical calculation, as we discuss here in terms of the
optical response.
In dipole approximation (d ≪ λ), the linear optical response of a finite and isotropic
system to an external electric field is fully characterized by its complex dynamic po-
larizability α(ω). This quantity relates the spectral amplitudes of the induced dipole
moment p(ω) linearly to those of a driving external field E(ω) by

p(\omega) = \alpha(\omega)\, E(\omega) .   (9.44)
As the dipole moment must be real in the time domain, it is required that α(ω) =
α*(−ω). The knowledge of α(ω) enables the calculation of important optical properties
of the system, as, e.g., the light absorption cross section from
\sigma(\omega) = \frac{\omega}{c \epsilon_0}\, \mathrm{Im}[\alpha(\omega)] .   (9.45)
A convenient way to calculate α(ω) for a finite system (on the basis of a time-based
numerical model) is offered by the real-time method [20], as it requires only a sin-
gle simulation run. The idea behind is to excite all modes of the system at once and
to extract their spectral weights from a simple Fourier transform of the response
in the time domain. To see this, assume an external field oriented in z-direction,
having constant spectral amplitudes for all frequencies Ez (ω) = f /(2π). The cor-
responding field in the time domain3 is Ez (t) = f δ(t) and has the meaning of an
impulsive force, instantaneously changing the velocity of all charged particles by
∆vz = qf /m at time t = 0, where q is the charge and m the particle mass. In prac-
tice, only electrons are considered to be kicked, as ions are basically unaffected due
to their higher mass. The impulsive perturbation leads to an excitation of all possible
optical modes of the system in proportion to their excitation strength. The result-
ing dipole moment in the time domain, pz (t), which can be easily recorded from
a simulation, is just a weighted superposition of harmonic oscillations. Their am-
plitudes characterize the corresponding optical activity of the investigated system.
Now, the Fourier transform of the time-dependent dipole moment, if we assume it
is a continuous function and use (9.44), turns out to be directly proportional to the
polarizability according to
\alpha(\omega) = \frac{2\pi}{f}\, p_z(\omega) .   (9.46)
A numerical simulation, of course, requires to sample the evolution of the dipole
moment by a finite number of data points pz (tn ). Assuming an even number of
points N and a fixed time step ∆t, a discrete Fourier transform provides an array
for the polarizability at N discrete values ωk from
\alpha(\omega_k) = \frac{\Delta t}{f} \sum_{n=0}^{N-1} p_z(t_n)\, e^{-2\pi i n k / N}   (9.47)
with ωk = k∆ω, ∆ω = 2π/(N ∆t) and k = −N/2, . . . , N/2. This form has the
advantage that the spectrum can be calculated with Fast Fourier Transform. Due to
the mentioned symmetry properties of α(ω) it is sufficient to use only the values
for positive frequencies. Obviously, the timestep and the number of iterations are
directly related to the bandwidth and the resolution of the spectrum and must there-
fore be chosen adequately for a given problem. Also, the magnitude of the field
impulse f is a sensitive parameter, as it must be small enough to remain in the linear
response regime. A simple cross-check is to vary the value of f , as the resulting
α(ω) must be independent of the strength of the perturbation.
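The entire analysis is easily condensed into a short Python sketch (assuming consistent units for c and ǫ0 and a kick strength f small enough for linear response; the interface is ours):

    import numpy as np

    def optical_spectrum(pz, dt, f, c_light, eps0):
        # Discrete Fourier transform of the dipole signal, cf. (9.47);
        # numpy's fft computes sum_n x_n exp(-2*pi*i*n*k/N).
        alpha = (dt / f) * np.fft.fft(pz)
        omega = 2.0 * np.pi * np.fft.fftfreq(len(pz), d=dt)
        # Cross section from (9.45); since alpha(-w) = alpha*(w), the
        # positive frequencies carry the full information.
        sigma = omega / (c_light * eps0) * np.imag(alpha)
        positive = omega > 0.0
        return omega[positive], sigma[positive]

Varying f and checking that the returned spectrum is unchanged provides the linearity cross-check mentioned above.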
As an example, in Fig. 9.2 the real-time method is applied to icosahedral Na147 .
Starting from the ground state, an initial velocity offset is introduced to all electrons
and the system is propagated in time. The resulting dipole signal is shown in Fig. 9.2(a).
³ We use g(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(t)\, e^{-i\omega t}\, dt and g(t) = \int_{-\infty}^{\infty} g(\omega)\, e^{i\omega t}\, d\omega.
(Fig. 9.2: (a) dipole moment p_z(t) [eÅ] of Na147 versus time [fs]; (b) absorption cross
section σ_tot(ω) [Å²] versus photon energy ℏω [eV])
Fig. 9.2. Calculated linear optical response of Na147 : (a) Evolution of the dipole signal in
the time domain, pz (t), recorded from the semiclassical model, after giving all electrons a
constant velocity offset of ∆vz = −1 Å/fs. (b) Corresponding total light absorption cross
section σ(ω) by using the polarizability obtained from a Fourier transform of the dipole
moment. The dominant peak at ℏω = 2.95 eV corresponds to the plasmon resonance of the
nanoparticle. A significant red-shift with respect to the classical value of the Mie plasmon
(vertical line in (b)) is predicted [14]
From its Fourier transform we obtain the polarizability, and the absorption cross
section, see (b), follows directly from (9.45). The optical spectrum is dominated
by a strong peak, i.e., the plasmon resonance of the metallic nanoparticle. Sharp
transitions through single particle-hole excitations are absent, as discrete electronic
states are not resolved within the semiclassical treatment. However, the predicted
response is reasonable and surprisingly close to results obtained from orbital based
quantum mechanical approaches such as the time-dependent density functional the-
ory [19, 20]. This is due to the fact that the response of simple metals is domi-
nated by collective effects, which are well covered in the semiclassical treatment. An
interesting feature of the plasmon resonance in small metal particles is the signifi-
cant red-shift with respect to the classical value, which is a clear quantum effect. The
magnitude of the semiclassically predicted shift is in agreement with experimental
observations. It can be explained by the non-zero electron density outside the clus-
ter surface, often referred to as spill-out. In a classical metallic sphere, where the
density makes a sharp step at the surface, the energy of the collective dipole mode
(Mie plasmon) reads

\omega_{Mie} = \left( \frac{e^2 n_i}{3 \epsilon_0 m_e} \right)^{1/2} ,   (9.48)
where n_i is the number density of ionic charges. As bulk sodium⁴ is an almost ideal
metal, the prediction of (9.48), ℏω_Mie = 3.41 eV, gives a good estimate for the
macroscopic limit, see vertical line in Fig. 9.2(b). The red-shift of the plasmon in case
of a cluster is a function of particle size and decreases gradually with increasing
particle size.
So far, we have gone a long way without considering truly nonlinear scenarios.
Therefore, let us finally discuss an application of the semiclassical treatment to
metal clusters in ultrashort intense laser pulses. On the basis of the calculated op-
tical absorption spectrum, cf. Fig. 9.2, a high absorption cross section is expected
for laser excitations close to the plasmon resonance. For laser photon energies far
away from the resonance, only a weak response is predicted. This is true, but only
within the linear regime. In intense laser pulses (say I ≫ 10¹⁰ W/cm²) the sys-
tem is changing rapidly during the interaction process, due to laser heating or the
emission of electrons, which results in transient optical properties. As observed in
many experiments, the cluster response is very sensitive to the temporal shape of the
laser field, leading to strong variations in the numbers and energies of emitted elec-
trons, ions and photons. The mechanisms behind these phenomena are a fascinating
aspect of clusters in intense fields. However, full quantum mechanical treatment is
unfeasible and simplified approaches are necessary for a theoretical description. In
case of metal clusters, the semiclassical method is a useful compromise, provid-
ing valuable insight into the dynamics of nonlinear laser-cluster interactions. This
is demonstrated below by analyzing the origin of maximum cluster ionization at
optimal delay of dual pulses.
We consider the excitation of Na55 by a sequence of two linearly polarized 50 fs
laser pulses of moderate intensity (I₀ = 4 × 10¹² W/cm²), having a variable delay
∆t and a photon energy of ℏω = 1.54 eV (Titanium-Sapphire laser at 800 nm). This
means, the system is probed well below its collective mode in the ground state, in
accordance with the typical situation in experiments on simple-metal clusters. A set of
simulations for various pulse delays, say ∆t = 0 . . . 1 ps, will specify a character-
istic optimal pulse separation, resulting in maximal total ionization. This behavior
⁴ n_i = 2.53 × 10²² cm⁻³.
has also been observed in measurements [21]. To identify the mechanism underlying
this effect, Fig. 9.3 shows a set of time-dependent observables from the simulation
with the optimal delay. For the given laser parameters this is ∆topt ≈ 250 fs.
Let us first concentrate on the impact of the leading pulse. The almost vanishing
phase lag between the laser and the dipole moment (b) is a marker for low energy
absorption from the first pulse, as the system is excited far off the resonance. Re-
member, this is what we know from a driven oscillator. Only a small amplitude
of the dipole moment (a) and weak ionization of the cluster (c) is induced by the
leading pulse. However, the cluster is excited strongly enough to become unstable,
as can be seen from the increasing radius (d). There are two important mechanisms
driving the expansion, i.e., the Coulomb pressure due to the total cluster charge and
a hydrodynamic contribution resulting from the heated electron gas. Now, if we
(Fig. 9.3: (a) laser envelope [V/Å] and dipole moment; (b) dipole phase angle; (c) ejected
electrons and total ionization; (d) rms radius [Å]; all versus time [fs])
Fig. 9.3. Response of Na55 for dual pulse laser excitation with I = 4 × 10¹² W/cm², ℏω =
1.54 eV (800 nm), and a pulse delay of 250 fs. Shown are the envelope of the laser field
(grey) and the corresponding electron dipole amplitude (a), the phase angle between the laser
field and the dipole signal (b), the total cluster ionization (c), and the root-mean-square radius
of the ion distribution (d). Note that the dipole phase angle passes π/2 as the rms-radius is
close to the critical value R_c (dotted line) [21]
inspect the impact of the second pulse, much larger amplitudes in the dipole moment
are found and the ionization is increased by a factor of seven. This is a significant
difference from the first excitation step, although the pulses are identical. The enhance-
ment can be explained by a dynamic plasmon resonance. A clear hint at a collective
resonance phenomenon, as opposed to quasi-static field effects, is the transient phase lag of π/2.
The critical cluster radius for frequency matching can be estimated from the simple
classical plasmon formula in (9.48), cf. Fig. 9.3(d). The cluster radius passes this
critical value right at the time where the system absorbs energy most efficiently and
therefore emits many electrons. This effect is called plasmon-enhanced ionization.
Connected to the plasmon enhancement is an efficient non-thermal electron accel-
eration mechanism, discussed in [22].
The optimal pulse delay calculated within the semiclassical model is of the
same order of magnitude as values obtained from corresponding experiments. It
is, of course, only an approximation to the real behavior, but has a number of ad-
vantages over purely classical MD techniques. The introduction of exchange and
correlation effects allows one to start from a stable and bound ground state. An initial
Fermi-Dirac distribution is stable, as the mean-field test particle approach removes
binary collisions and, therefore, unphysical thermalization to a Boltzmann distri-
bution. However, as described above, the treatment neglects collisions if the sys-
tem becomes highly excited. This shortcoming can be removed by introducing a
Uehling-Uhlenbeck collision term [23]. However, this is beyond the scope of this
contribution. For further reading about the semiclassical method and a comparison
to quantum mechanical models we refer to [24].
The authors gratefully acknowledge financial support by the Deutsche For-
schungsgemeinschaft within the Sonderforschungsbereich 652. Computer time was
provided by the High Performance Computing Center for North Germany (HLRN).
References
1. A. Messiah, Quantum Mechanics (North-Holland, Amsterdam, 1976)
2. G. Bertsch, in Many-Body Dynamics of Heavy-Ion Collisions, ed. by R.B. et al. (North-Holland, Amsterdam, 1978)
3. E. Wigner, Phys. Rev. 40, 749 (1932)
4. G. Bertsch, S. Das Gupta, Phys. Rep. 160, 189 (1988)
5. A. Smerzi, Phys. Rev. Lett. 76, 559 (1996)
6. P. Dirac, Proc. Cambridge Philos. Soc. 26, 376 (1930)
7. O. Gunnarsson, B.I. Lundqvist, Phys. Rev. B 13, 4274 (1976)
8. J. Perdew, A. Zunger, Phys. Rev. B 23, 5048 (1981)
9. C. Jarzynski, G. Bertsch, Phys. Rev. C 53, 1028 (1995)
10. A. Domps, P. L'Eplattenier, P.G. Reinhard, E. Suraud, Ann. Phys. (Leipzig) 6, 455 (1997)
11. R. Hockney, J. Eastwood, Computer Simulation Using Particles (McGraw-Hill, New York, 1981)
12. A. Castro, A. Rubio, M.J. Stott, Can. J. Phys. 81, 1151 (2003)
13. T. Beck, Rev. Mod. Phys. 72, 1041 (2000)
14. T. Fennel, G. Bertsch, K.H. Meiwes-Broer, Eur. Phys. J. D 29, 367 (2004)
15. L.H. Thomas, Proc. Cambridge Philos. Soc. 23, 542 (1927)
16. E. Fermi, Z. Phys. 48, 73 (1928)
17. E. Teller, Rev. Mod. Phys. 34, 627 (1962)
18. E.H. Lieb, Rev. Mod. Phys. 48, 553 (1976)
19. C. Legrand, E. Suraud, P.G. Reinhard, J. Phys. B 39, 2481 (2006)
20. K. Yabana, G. Bertsch, Phys. Rev. B 54, 4484 (1996)
21. T. Döppner, T. Fennel, T. Diederich, J. Tiggesbäumker, K.H. Meiwes-Broer, Phys. Rev. Lett. 94, 013401 (2005)
22. T. Fennel, T. Döppner, J. Passig, C. Schaal, J. Tiggesbäumker, K.H. Meiwes-Broer, Phys. Rev. Lett. 98, 143401 (2007)
23. A. Domps, P.G. Reinhard, E. Suraud, Ann. Phys. 280, 211 (2000)
24. P. Reinhard, E. Suraud, Introduction to Cluster Dynamics (Wiley-VCH, Berlin, 2004)
10 World-line and Determinantal Quantum Monte
Carlo Methods for Spins, Phonons and Electrons

F.F. Assaad and H.G. Evertz
10.1 Introduction
The correlated electron problem remains one of the central challenges in solid state
physics. Given the complexity of the problem, numerical simulations provide an
essential source of information to test ideas and develop intuition. In particular, for
a given model describing a specific material, we would ultimately like to be able
to carry out efficient numerical simulations so as to provide exact results on ther-
modynamic, dynamical, transport and ground-state properties. If the model shows a
continuous quantum phase transition we would like to characterize it by computing
the critical exponents. Without restriction on the type of model, this is an extremely
challenging goal.
There are however a set of problems for which numerical techniques have pro-
vided invaluable insight and will continue to do so. Here we list a few which are
exact, capable of reaching large system sizes (the computational effort scales as a
power of the volume), and provide ground-state, dynamical as well as thermody-
namic quantities: (i) Density matrix renormalization group applied to general one-
dimensional (1D) systems [1, 2], (ii) world-line based QMC methods such as the
loop algorithm [3, 4] or directed loops [5] applied to non-frustrated spin systems
in arbitrary dimensions or to 1D electron models on bipartite lattices, and (iii) aux-
iliary field QMC methods [6]. The latter method is capable of handling a class of
models with spin and charge degrees of freedom in dimensions larger than unity.
This class contains fermionic lattice models with attractive interactions (e.g. attrac-
tive Hubbard model), models invariant under particle-hole transformation, as well
as impurity problems modelled by Kondo or Anderson Hamiltonians.
In this lecture we first introduce the world-line approach, exemplarily for the
1D XXZ-chain, see Sect. 10.2. In Sect. 10.3, we discuss world-line representations
of exp(−βH) without Trotter-time discretization errors (where β = 1/(kB T )), in-
cluding the stochastic series expansion (SSE). We emphasize that the issue of such
a representation of exp(−βH) is largely independent of the Monte Carlo algorithm
used to update the world lines. In Sect. 10.4 we explain the loop algorithm from an
operator point of view, and discuss some applications and generalizations. Sect. 10.5
discusses ways to treat coupled systems of spins and phonons, exemplified for 1D
spin-Peierls transitions. It includes a new method which allows the simulation of ar-
bitrary bare phonon dispersions [7]. In Sect. 10.6 we describe the basic formulation
of the auxiliary field QMC method. This includes the formulation of the partition
function, the measurement of equal-time and time-displaced correlation functions
as well as general conditions under which one can show the absence of negative
sign problem. In Sect. 10.7 we concentrate on the implementation of the auxiliary
field method for lattice problems. Here, the emphasis is placed on numerical stabi-
lization of the algorithm. Sect. 10.8 concentrates on the Hirsch-Fye formulation of
the algorithm. This formulation is appropriate for general impurity models, and is
extensively used in the framework of dynamical mean-field theories and their gen-
eralization to cluster methods. Recently, more efficient continuous time algorithms
for the impurity problem (diagrammatic determinantal QMC methods) have been
introduced [8, 9]. Finally in Sect. 10.9 we briefly provide a short and necessarily
biased overview of applications of auxiliary field methods.
To illustrate the world-line QMC method, we concentrate on the XXZ quantum spin
chain. This model is defined as
H = J_x \sum_i \left( S_i^x S_{i+1}^x + S_i^y S_{i+1}^y \right) + J_z \sum_i S_i^z S_{i+1}^z ,   (10.1)
where S_i are spin-1/2 operators on site i and hence satisfy the commutation rules

\left[ S_i^\eta, S_j^\nu \right] = i \epsilon_{\eta,\nu,\gamma}\, S_i^\gamma\, \delta_{i,j} .   (10.2)
In the above, ǫη,ν,γ is the antisymmetric tensor and the sum over repeated indices is
understood. We impose periodic boundary conditions
S i+L = S i , (10.3)
S^+ = S^x + i S^y , \quad S^- = S^x - i S^y ,   (10.6)

such that

S^- |\downarrow\rangle = S^+ |\uparrow\rangle = 0 ,
S^- |\uparrow\rangle = |\downarrow\rangle ,
S^+ |\downarrow\rangle = |\uparrow\rangle .   (10.7)
The Hilbert space of the L-site chain, H_L, is given by the tensor product of L spin-
1/2 Hilbert spaces. H_L contains 2^L state vectors which we will denote by
The eigenstates of the above Hamiltonian are nothing but the singlet and three triplet
states
H_{two\,sites}\, \frac{1}{\sqrt{2}} \left( |\uparrow,\downarrow\rangle - |\downarrow,\uparrow\rangle \right) = \left( -\frac{J_z}{4} - \frac{J_x}{2} \right) \frac{1}{\sqrt{2}} \left( |\uparrow,\downarrow\rangle - |\downarrow,\uparrow\rangle \right) ,
H_{two\,sites}\, \frac{1}{\sqrt{2}} \left( |\uparrow,\downarrow\rangle + |\downarrow,\uparrow\rangle \right) = \left( -\frac{J_z}{4} + \frac{J_x}{2} \right) \frac{1}{\sqrt{2}} \left( |\uparrow,\downarrow\rangle + |\downarrow,\uparrow\rangle \right) ,
H_{two\,sites}\, |\uparrow,\uparrow\rangle = \frac{J_z}{4}\, |\uparrow,\uparrow\rangle ,
H_{two\,sites}\, |\downarrow,\downarrow\rangle = \frac{J_z}{4}\, |\downarrow,\downarrow\rangle .   (10.11)
The basic idea of this original world-line approach is to split the XXZ Hamilto-
nian into a set of independent two-site problems. The way to achieve this decoupling
is with the use of a path integral and the Trotter decomposition. First we write
H = \underbrace{\sum_n H^{(2n+1)}}_{H_1} + \underbrace{\sum_n H^{(2n+2)}}_{H_2}   (10.12)

with H^{(i)} = J_x \left( S_i^x S_{i+1}^x + S_i^y S_{i+1}^y \right) + J_z S_i^z S_{i+1}^z. One may verify that H_1 and
H2 are sums of commuting (i.e. independent) two-site problems. Hence, on their
own H1 and H2 are trivially solvable problems. However, H is not. To use this
fact, we split the imaginary propagation exp(−βH) into successive infinitesimal
propagations of H1 and H2 . Here β corresponds to the inverse temperature. This
is achieved with the Trotter decomposition introduced in detail in Sect. 10.A. The
partition function is then given by
\mathrm{Tr}\left[ e^{-\beta H} \right] = \mathrm{Tr}\left[ \left( e^{-\Delta\tau H} \right)^m \right] = \mathrm{Tr}\left[ \left( e^{-\Delta\tau H_1} e^{-\Delta\tau H_2} \right)^m \right] + \mathcal{O}(\Delta\tau^2)
= \sum_{\sigma_1 \ldots \sigma_{2m}} \langle \sigma_1 | e^{-\Delta\tau H_1} | \sigma_{2m} \rangle \cdots \langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle \langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle + \mathcal{O}(\Delta\tau^2) ,   (10.13)
where m∆τ = β. In the last equality we have inserted the unit operator between
each infinitesimal imaginary time propagation. For each set of states |σ 1 . . . |σ 2m
with non-vanishing contribution to the partition function we have a simple graphical
representation in terms of world lines which track the evolution of the spins in space
and imaginary time. An example of a world-line configuration is shown in Fig. 10.1.
Hence the partition function may be written as a sum over all world-line
configurations w, each world-line configuration having an appropriate weight Ω(w):

Z = \sum_w \Omega(w) ,
\Omega(w) = \langle \sigma_1 | e^{-\Delta\tau H_1} | \sigma_{2m} \rangle \cdots \langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle \langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle ,   (10.14)

where w defines the states |σ_1⟩ ... |σ_{2m}⟩.
Our task is now to compute the weight Ω(w) for a given world-line configura-
tion w. Let us concentrate on the matrix element ⟨σ_{τ+1}| exp(−∆τ H_2)|σ_τ⟩. Since
H_2 is a sum of independent two-site problems, we have

\langle \sigma_{\tau+1} | e^{-\Delta\tau H_2} | \sigma_\tau \rangle = \prod_{i=1}^{L/2} \langle \sigma_{2i,\tau+1}, \sigma_{2i+1,\tau+1} | e^{-\Delta\tau H^{(2i)}} | \sigma_{2i,\tau}, \sigma_{2i+1,\tau} \rangle .   (10.15)
Hence, the calculation of the weight reduces to solving the two-site problem, see
(10.10). We can compute, for example, the spin-flip matrix element
\langle \uparrow, \downarrow |\, e^{-\Delta\tau H^{(i)}}\, | \downarrow, \uparrow \rangle = -e^{\Delta\tau J_z/4} \sinh(\Delta\tau J_x/2) .   (10.16)
Fig. 10.1. (a) World-line configuration for the XXZ model of (10.1). Here, m = 4 and the
system size is L = 8. The bold lines follow the time evolution of the up spins and empty sites,
with respect to the world lines, correspond to the down spins. A full time step ∆τ corresponds
to the propagation with H1 followed by H2 . Periodic boundary conditions are chosen in the
spatial direction. In the time direction, periodic boundary conditions follow from the fact that
we are evaluating a trace. (b) The weights for a given world-line configuration is the product
of the weights of plaquettes listed in the figure. Note that, although the spin-flip processes
come with a minus sign, the overall weight for the world-line configuration is positive since
each world-line configuration contains an even number of spin flips
The other five matrix elements are listed in Fig. 10.1 and may be computed in the
same manner.
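For concreteness, the six non-vanishing plaquette weights follow directly from (10.11); the following Python sketch (the encoding of spins as ±1 and the key layout are our conventions) tabulates them:

    import numpy as np

    def plaquette_weights(dtau, jx, jz):
        # Two-site matrix elements of exp(-dtau*H), cf. (10.11) and (10.16);
        # keys are ((spins at tau), (spins at tau + dtau)), +1 = up spin.
        w_parallel = np.exp(-dtau * jz / 4.0)                              # |uu>, |dd>
        w_diagonal = np.exp(dtau * jz / 4.0) * np.cosh(dtau * jx / 2.0)    # |ud> -> |ud>
        w_spinflip = -np.exp(dtau * jz / 4.0) * np.sinh(dtau * jx / 2.0)   # |ud> -> |du>
        return {
            ((+1, +1), (+1, +1)): w_parallel,
            ((-1, -1), (-1, -1)): w_parallel,
            ((+1, -1), (+1, -1)): w_diagonal,
            ((-1, +1), (-1, +1)): w_diagonal,
            ((+1, -1), (-1, +1)): w_spinflip,
            ((-1, +1), (+1, -1)): w_spinflip,
        }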
We are now faced with a problem, namely that the spin-flip matrix elements are
negative. However, for non-frustrated spin systems, we can show that the overall
sign of the world-line configuration is positive. To prove this statement consider
a bipartite lattice in arbitrary dimensions. A bipartite lattice may be split into two
sub-lattices, A and B, such that the nearest neighbors of sub-lattice A belong to sub-
lattice B and vice-versa. A non-frustrated spin system on a bipartite lattice has solely
spin-spin interactions between two lattice sites belonging to different sub-lattices.
For example, in our 1D case, the even sites correspond to say sub-lattice A and the
odd sites to sub-lattice B. Under those conditions we can carry out the canonical
transformation (i.e. the commutation rules remain invariant) Six → f (i)Six , Siy →
f (i)Siy , and Siz → Siz , where f (i) = 1 (−1) if i belongs to sublattice A (B). Under
this transformation, the matrix element Jx in the Hamiltonian transforms to −Jx ,
which renders all matrix elements positive. The above canonical transformation just
tells us that the spin-flip matrix element occurs an even number of times in any
world-line configuration. The minus sign in the spin-flip matrix element may not
be omitted in the case of frustrated spin systems. This negative sign leads to a sign
problem which to date inhibits large-scale QMC simulations of frustrated spin
systems.
10.2.2 Observables
In the previous section, we have shown how to write the partition function of a
non-frustrated spin system as a sum over world-line configurations, each world-line
configuration having a positive weight. Our task is now to compute observables
\langle O \rangle = \frac{\mathrm{Tr}\left[ e^{-\beta H} O \right]}{\mathrm{Tr}\left[ e^{-\beta H} \right]} = \frac{\sum_w \Omega(w)\, O(w)}{\sum_w \Omega(w)} .   (10.17)
One of the major drawbacks of the world-line algorithm used to be that one
could not measure arbitrary observables. In particular, the correlation functions such
as Si+ Sj− which introduce a cut in a world-line configuration are not accessible
with continuous world lines and local updates. This problem disappears in the loop
algorithm and also with worms and directed loops, as will be discussed later. Here
we will concentrate on observables which locally conserve the z-component of spin,
specifically the total energy as well as the spin-stiffness.
To obtain the last equation, we have used the cyclic properties of the trace: Tr [AB] =
Tr [BA]. Inserting the unit operator 1 = σ |σσ| at each imaginary time interval
yields
\langle H \rangle = \frac{1}{Z} \sum_{\sigma_1, \ldots, \sigma_{2m}} \Big[ \langle \sigma_1 | e^{-\Delta\tau H_1} | \sigma_{2m} \rangle \cdots \langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle \langle \sigma_2 | e^{-\Delta\tau H_2} H_2 | \sigma_1 \rangle
+ \langle \sigma_1 | e^{-\Delta\tau H_1} | \sigma_{2m} \rangle \cdots \langle \sigma_3 | e^{-\Delta\tau H_1} H_1 | \sigma_2 \rangle \langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle \Big]
= \frac{1}{Z} \sum_{\sigma_1, \ldots, \sigma_{2m}} \langle \sigma_1 | e^{-\Delta\tau H_1} | \sigma_{2m} \rangle \cdots \langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle \langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle
\times \left[ \frac{\langle \sigma_3 | e^{-\Delta\tau H_1} H_1 | \sigma_2 \rangle}{\langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle} + \frac{\langle \sigma_2 | e^{-\Delta\tau H_2} H_2 | \sigma_1 \rangle}{\langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle} \right]
= \frac{\sum_w \Omega(w)\, E(w)}{\sum_w \Omega(w)}   (10.19)
with
E(w) = -\frac{\partial}{\partial \Delta\tau} \left[ \ln \langle \sigma_2 | e^{-\Delta\tau H_2} | \sigma_1 \rangle + \ln \langle \sigma_3 | e^{-\Delta\tau H_1} | \sigma_2 \rangle \right] .   (10.20)
We can of course measure the energy on arbitrary time slices. Averaging over all the
time slices to reduce the fluctuations yields the form
E(w) = -\frac{1}{m} \frac{\partial}{\partial \Delta\tau} \ln \Omega(w) .   (10.21)
Hence the energy of a world-line configuration is nothing but the logarithmic deriva-
tive of its weight. This can also be obtained more directly by taking the derivative
of (10.14).
Observables O which locally conserve the z-component of the spin are easy to
compute. If we decide to measure on time slice τ then O|σ τ = O(w)|σ τ . An
example of such an observable is the correlation function O = Siz Sjz .
The spin stiffness probes the sensitivity of the system under a twist – in spin space –
of the boundary condition along one lattice direction. If long-range spin order is
present, the free energy in the thermodynamic limit will acquire a dependence on
the twist. If on the other hand the system is disordered, the free energy is insensitive
to the twist. The spin stiffness hence probes for long range or quasi long-range spin
ordering. It is identical to the superfluid density when viewing spin systems in terms
of hard-core bosons. To define the spin stiffness, we consider the Heisenberg model
on a d-dimensional hyper-cubic lattice of linear length L:
H = J \sum_{\langle i,j \rangle} \tilde{S}_i \cdot \tilde{S}_j ,   (10.22)

with a twist in the boundary condition along the x-direction,

\tilde{S}_{i+L e_x} = R(e, \varphi)\, \tilde{S}_i ,   (10.23)
where R(e, φ) is an SO(3) rotation around the axis e with angle φ. In the other
lattice directions, we consider periodic boundary conditions. The spin stiffness is
then defined as
"
1 −1 "
ρs = d−2 ln Z(φ)"" , (10.24)
L β φ=0
where Z(φ) is the partition function in the presence of the twist in the boundary
condition, and β corresponds to the inverse temperature.
Under the canonical transformation
S_i = R\!\left( e, -\frac{\varphi}{L}\, i \cdot e_x \right) \tilde{S}_i   (10.25)
the twist may be eliminated from the boundary condition:

S_{i+L e_x} = R\!\left( e, -\frac{\varphi}{L}\, (i + L e_x) \cdot e_x \right) \tilde{S}_{i+L e_x}
= R\!\left( e, -\frac{\varphi}{L}\, (i + L e_x) \cdot e_x \right) R(e, \varphi)\, \tilde{S}_i
= R\!\left( e, -\frac{\varphi}{L}\, i \cdot e_x \right) \tilde{S}_i = S_i .   (10.26)
The Hamiltonian then reads

H = J \sum_{\langle i,j \rangle} S_i \cdot R\!\left( e, \frac{\varphi}{L}\, (i - j) \cdot e_x \right) S_j
= J \sum_i S_i \cdot R\!\left( e, -\frac{\varphi a}{L} \right) S_{i+a_x} + J \sum_{i, a \neq a_x} S_i \cdot S_{i+a} .   (10.27)
+ J \sum_{i, a \neq a_x} \left[ S_i^z S_{i+a}^z + \frac{1}{2} \left( S_i^+ S_{i+a}^- + S_i^- S_{i+a}^+ \right) \right] .   (10.29)
Z(\varphi) = \sum_w \underbrace{\prod_p W(S_p(w), \varphi)}_{\Omega(w, \varphi)} .   (10.30)
The sum runs over all world-line configurations w and the weight of the world-line
configuration, Ω(w), is given by the product of the individual plaquette weights
W (Sp (w), φ) in the space-time lattice. Sp (w) denotes the spin configuration on
plaquette p in the world-line configuration w.
Since at φ = 0 time reversal symmetry holds, the spin current

j_s = -\frac{1}{\beta} \left. \frac{\partial}{\partial \varphi} \ln Z(\varphi) \right|_{\varphi=0}   (10.31)

vanishes, and the spin stiffness takes the form

\rho_s = \frac{\sum_w \Omega(w)\, \rho_s(w)}{\sum_w \Omega(w)} ,   (10.32)
where
\rho_s(w) = \frac{-1}{\beta L^{d-2}} \left[ \sum_p \frac{\left. \frac{\partial^2}{\partial \varphi^2} W(S_p(w), \varphi) \right|_{\varphi=0}}{W(S_p(w))} + \sum_{p \neq q} \frac{\left. \frac{\partial}{\partial \varphi} W(S_p(w), \varphi) \right|_{\varphi=0}}{W(S_p(w))} \frac{\left. \frac{\partial}{\partial \varphi} W(S_q(w), \varphi) \right|_{\varphi=0}}{W(S_q(w))} \right] .   (10.33)
In the last line we have used the fact that ⟨σ_{1,p}, σ_{2,p}|S_{i_p}^+ S_{j_p}^- + S_{i_p}^- S_{j_p}^+|σ_{3,p}, σ_{4,p}⟩ = 1
if there is a spin-flip process on plaquette p and zero otherwise. Similarly, we have:
"
∂ "
∂φ W (S p (w), φ)"
φ=0
lim
∆τ →0 W (Sp (w))
+ − − +
∆τ J iex · (j p − ip ) σ1,p , σ2,p |(Sip Sj p − Sip Sj p )|σ3,p , σ4,p
= lim −
∆τ →0 2 L σ1,p , σ2,p |1 − ∆τ Hip ,j p |σ3,p , σ4,p
iex · (j p − ip )
= σ1,p , σ2,p |Si+p Sj−p − Si−p Sj+p |σ3,p , σ4,p . (10.35)
L
Since ⟨σ_{1,p}, σ_{2,p}|S_{i_p}^+ S_{j_p}^- - S_{i_p}^- S_{j_p}^+|σ_{3,p}, σ_{4,p}⟩ = ±1 if there is a spin-flip process
on plaquette p and zero otherwise, the identity

\lim_{\Delta\tau \to 0} \frac{\left. \frac{\partial^2}{\partial \varphi^2} W(S_p(w), \varphi) \right|_{\varphi=0}}{W(S_p(w))} = \left( \lim_{\Delta\tau \to 0} \frac{\left. \frac{\partial}{\partial \varphi} W(S_p(w), \varphi) \right|_{\varphi=0}}{W(S_p(w))} \right)^2   (10.36)

holds.
The problem is now cast into one which may be solved with classical Monte Carlo
methods where we need to generate a Markov chain through the space of world-line
configurations. Along the chain the world-line configuration w occurs on average
with normalized probability Ω(w). There are many ways of generating the Markov
chain. Here we will first discuss a local updating scheme and its limitations. We will
then turn our attention to a more powerful updating scheme which is known under
the name of loop algorithm.
Local updates deform a world-line configuration locally. As shown in Fig. 10.2 one
randomly chooses a shaded plaquette and, if possible, shifts a world line from one
side of the shaded plaquette to the other. This move is local and only involves the
four plaquettes surrounding the shaded one. It is then easy to calculate the ratio of
weights of the new to old world-line configurations and accept the move according
to a Metropolis criterion. As it stands, the above described local move is not ergodic.
For example, the z-component of spin is conserved. This problem can be circum-
vented by considering a move which changes a single down world line into an up
one and vice-versa. However, such a global move will have very low acceptance
probability at large β.
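The acceptance decision is a standard Metropolis step; a sketch (with a plaquette-weight lookup table such as the one above, and an interface of our choosing) reads:

    import numpy as np

    def accept_local_move(old_plaquettes, new_plaquettes, weights,
                          rng=np.random.default_rng()):
        # Only the four plaquettes touched by the shift (cf. Fig. 10.2)
        # enter the ratio; all other factors of Omega(w) cancel. The
        # absolute value guards the sign convention discussed above.
        w_old = np.prod([weights[p] for p in old_plaquettes])
        w_new = np.prod([weights[p] for p in new_plaquettes])
        return rng.uniform() < min(1.0, abs(w_new / w_old))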
Combined, both types of moves are ergodic but only in the case of open bound-
ary conditions in space. The algorithm is not ergodic if periodic or anti-periodic
boundary conditions are chosen. Consider a starting configuration with zero wind-
ing (i.e. Wx (w) = 0). The reader will readily convince himself that with local up-
dates, it will never be possible to generate a configuration with W_x(w) ≠ 0. Hence,
for example, a spin stiffness may not be measured within the world-line algorithm
with local updates. However, one should note that violation of ergodicity lies in
Fig. 10.2. Local updates. A shaded plaquette is chosen randomly and a world line is shifted
from left to right or vice versa across the shaded plaquette
the choice of the boundary condition. Since bulk properties are boundary indepen-
dent in the thermodynamic limit, the algorithm will yield the correct results in the
thermodynamic limit [20].
Different local updates without such problems have been invented in recent
years, namely worms and directed loops. They work by allowing a partial world
line, and iteratively changing the position of its ends until it closes again. They will
be discussed in Sect. 10.4.5.
To introduce loop updates, it is useful to first map the XXZ model onto the six vertex
model of statistical mechanics.
That the XXZ quantum spin chain is equivalent to the classical 2D six vertex model
follows from a one to one mapping of a world-line configuration to one of the six
vertex model. The identification of single plaquettes is shown in Fig. 10.3(a). The
world-line configuration of Fig. 10.1 is plotted in the language of the six vertex model
in Fig. 10.3(b). The vertex model lies on a 45-degree rotated lattice denoted by
bullets in Fig. 10.3(b). At each vertex (bullets in Fig. 10.3) the number of incoming
arrows equals the number of outgoing arrows. In the case of the XYZ chain, source
and drain terms have to be added, yielding the eight vertex model.
The identification of the XXZ model with the six vertex model gives us an intuitive
picture of loop updates [3]. Consider the world-line configuration in Fig. 10.4(a) and
its corresponding vertex formulation (Fig. 10.4(b)). One can pick a site at random
and follow the arrows of the vertex configuration. At each plaquette there are two
possible arrow paths to follow. One is chosen, appropriately, and one follows the
arrows to arrive at the next plaquette. The procedure is then repeated until one re-
turns to the starting point. Such a loop is shown in Fig. 10.4(c). Along the loop,
Fig. 10.3. (a) Identification of world-line configurations on plaquettes with the vertices of
the six vertex model. (b) The world-line configuration of Fig. 10.1 in the language of the six
vertex model
changing the direction of the arrows generates another valid vertex configuration,
see Fig. 10.4(d). The corresponding world-line configuration (after flipping the loop)
is shown in Fig. 10.4(e). As apparent, this is a global update which in this example
changes the winding number. This was not achievable with local moves.
In the previous paragraph we have seen how to build a loop. Flipping the loop has
the potential of generating large-scale changes in the world-line configuration and
hence allows us to move quickly in the space of world lines. There is however a
potential problem. If the loops were constructed at random, then the acceptance rate
for flipping a loop would be extremely small and loop updates would not lead to an
efficient algorithm. The loop algorithm sets up rules to build the loop such that it
can be flipped without any additional acceptance step for the XXZ model.
To do so, additional variables are introduced, which specify for each plaquette
the direction which a loop should take there, Fig. 10.5. These specifications, called
breakups or plaquette-graphs, are completely analogous to the Fortuin-Kasteleyn
bond-variables of the Swendsen-Wang cluster algorithm, discussed in Chap. 4. They
can also be thought of as parts of the Hamilton operator, as discussed in Sect. 10.4.
Note that when a breakup has been specified for every plaquette, this then graphi-
cally determines a complete decomposition of the vertex-lattice into a set of loops
(see also below). The loop algorithm is a cluster algorithm mapping from such sets
of loops to world-line configurations and back to new sets of loops. In contrast,
directed loops are a local method not associated with such graphs.
Which plaquette-graphs are possible? For each plaquette and associated vertex
(spin-configuration) there are several possible choices of plaquette-graphs which
(Fig. 10.5: vertex configuration 1 admits plaquette-graphs 1, 2 and 3; vertex 2 admits
graphs 2, 4 and 3; vertex 3 admits graphs 1, 4 and 3)
Fig. 10.5. Possible plaquette-graphs for vertex configurations. Graph one is a vertical
breakup, graph two a horizontal one, graph four is diagonal. Plaquette-graph three is called
frozen; it corresponds to the combined flip of all four arrows
are compatible with the fact that the arrow direction may not change in the con-
struction of the loop. Figure 10.5 illustrates this. Given the vertex configurations
one in Fig. 10.5 one can follow the arrows vertically (graph one) or horizontally
(graph two). There is also the possibility to flip all the spins of the vertex. This cor-
responds to graph three in Fig. 10.5. The plaquette-graph configuration defines the
loops along which one will propose to flip the orientation of the arrows of the vertex
model.
In order to find appropriate probabilities for choosing the breakups, we need to
find weights W (S, G) for each of the combinations of spin configuration S on a
plaquette and plaquette-graph G shown in Fig. 10.5. We require that
\sum_G W(S, G) = W(S) ,   (10.39)
where W (S) is the weight of the vertex S, i.e. we subdivide the weight of each spin-
configuration on a vertex onto the possible graphs, for example graphs one, four and
three if S = 3, see Fig. 10.5.
Starting from a vertex configuration S on a plaquette we choose an allowed
plaquette-graph with probability
P(S \to (S, G)) = \frac{W(S, G)}{W(S)}   (10.40)
for every vertex-plaquette of the lattice. We then have a configuration of vertices
and plaquette-graphs. When a plaquette-graph has been chosen for every plaquette,
the combined lines subdivide the lattice into a set of loops. To achieve a constant ac-
ceptance rate for the flip of each loop, we require that for a given plaquette-graph G
W (S, G) = W (S ′ , G) , (10.41)
This completes the formal description of the algorithm. We will now illustrate the
algorithm in the case of the isotropic Heisenberg model (J = Jx = Jz ) since this
turns out to be a particularly simple case. Equations (10.39) and (10.41) lead to
e^{\Delta\tau J/4} \cosh(\Delta\tau J/2) \equiv W_1 = W_{1,1} + W_{1,2} + W_{1,3}
e^{\Delta\tau J/4} \sinh(\Delta\tau J/2) \equiv W_2 = W_{2,2} + W_{2,4} + W_{2,3}
e^{-\Delta\tau J/4} \equiv W_3 = W_{3,1} + W_{3,4} + W_{3,3}   (10.45)
with W3,1 = W1,1 , W1,2 = W2,2 and W2,4 = W3,4 . Here we adopt the notation
Wi,j = W (S = i, G = j) and Wi = W (S = i). To satisfy the above equations for
the special case of the Heisenberg model, we can set W•,3 = W•,4 = 0 and thereby
consider only the graphs G = 1 and G = 2. The reader will readily verify that the
equations
e^{\Delta\tau J/4} \cosh(\Delta\tau J/2) \equiv W_1 = W_{1,1} + W_{1,2}
e^{\Delta\tau J/4} \sinh(\Delta\tau J/2) \equiv W_2 = W_{2,2} = W_{1,2}
e^{-\Delta\tau J/4} \equiv W_3 = W_{1,1} = W_{3,1}   (10.46)
are satisfied. We will then only have vertical and horizontal breakups. The prob-
ability of choosing a horizontal breakup is tanh(∆τ J/2) on an antiferromagnetic
plaquette (i.e. type one), it is unity on type two, and zero on a ferromagnetic plaque-
tte (type three).
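These rules translate into a few lines of code; the sketch below (the numeric encoding of plaquette types is our convention) chooses a breakup according to (10.46):

    import numpy as np

    def choose_breakup(plaquette_type, dtau, j, rng=np.random.default_rng()):
        # Type 1: antiferromagnetic, type 2: spin flip, type 3: ferromagnetic.
        if plaquette_type == 1:
            if rng.uniform() < np.tanh(dtau * j / 2.0):
                return "horizontal"
            return "vertical"
        if plaquette_type == 2:
            return "horizontal"
        return "vertical"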
Further aspects of the loop algorithm will be discussed in Sect. 10.3.
The QMC approach is often plagued by the so-called sign problem. Since the origin
of this problem is easily understood in the framework of the world-line algorithm we
will briefly discuss it in this section on a specific model. Consider spinless electrons
on an L-site linear chain
H = -t \sum_i c_i^\dagger \left( c_{i+1} + c_{i+2} \right) + \mathrm{H.c.}   (10.47)

with \{c_i^\dagger, c_j^\dagger\} = \{c_i, c_j\} = 0 and \{c_i^\dagger, c_j\} = \delta_{i,j}. Here, we consider periodic boundary
conditions, ci+L = ci and t > 0.
The world-line representation of spinless fermions is basically the same as that
of spin-1/2 degrees of freedom (which themselves are equivalent to so-called hard-
core bosons) on any lattice. For fermions, world lines stand for occupied locations
in space-time. Additional signs occur when fermion world lines wind around each
other, as we will now discuss.
For the above Hamiltonian it is convenient to split it into a set of independent
four site problems
H = \underbrace{\sum_{n=0}^{L/4-1} H^{(4n+1)}}_{H_1} + \underbrace{\sum_{n=0}^{L/4-1} H^{(4n+3)}}_{H_2}   (10.48)

with H^{(i)} = -t c_i^\dagger (c_{i+1}/2 + c_{i+2}) - t c_{i+1}^\dagger (c_{i+2} + c_{i+3}) - t c_{i+2}^\dagger c_{i+3}/2 + \mathrm{H.c.} With
this decomposition one obtains the graphical representation of Fig. 10.6 [21].
Fig. 10.6. World-line configuration for the model of (10.47). Here m = 3. Since the two
electrons exchange their positions during the imaginary time propagation, this world-line
configuration has a negative weight
The sign problem occurs due to the fact that the weights Ω(w) are not neces-
sarily positive. An example is shown in Fig. 10.6. In this case the origin of negative
signs lies in Fermi statistics. Negative weights cannot be interpreted as probabilities.
To circumvent the problem, one decides to carry out the sampling with an auxiliary
probability distribution
Pr(w) = \frac{|\Omega(w)|}{\sum_w |\Omega(w)|} ,   (10.49)
which in the limit of small values of ∆τ corresponds to the partition function of the
Hamiltonian of (10.47) but with fermions replaced by hard-core bosons. Thus, we
can now evaluate (10.17) with
\langle O \rangle = \frac{\sum_w Pr(w)\, \mathrm{sign}(w)\, O(w)}{\sum_w Pr(w)\, \mathrm{sign}(w)} ,   (10.50)
where both the numerator and denominator are evaluated with MC methods. Let us
first consider the denominator
\langle \mathrm{sign} \rangle = \sum_w Pr(w)\, \mathrm{sign}(w) = \frac{\sum_w \Omega(w)}{\sum_w |\Omega(w)|} = \frac{\mathrm{Tr}\left[ e^{-\beta H} \right]}{\mathrm{Tr}\left[ e^{-\beta H_B} \right]} .   (10.51)
In the auxiliary field approach described later in this chapter, such a sign
problem would not occur for this non-interacting problem, since one-body operators
are treated exactly. That is, the sum over all world lines is carried out exactly in that
approach.
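Numerically, the reweighting of (10.50) amounts to two sample means; a sketch with hypothetical input arrays of measured O(w) and sign(w) values reads:

    import numpy as np

    def reweighted_average(observables, signs):
        # <O> = <sign * O> / <sign>, both averages taken with respect
        # to |Omega(w)|; a small <sign>, cf. (10.51), signals a severe
        # sign problem.
        observables = np.asarray(observables, dtype=float)
        signs = np.asarray(signs, dtype=float)
        return np.mean(signs * observables) / np.mean(signs)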
In this section we describe generalizations of the loop algorithm which allow one
to investigate the physics of single-hole motion in non-frustrated quantum magnets
[22, 23, 24].
The Hamiltonian we will consider is the t-J model defined as

H_{t-J} = P \left[ -t \sum_{\langle i,j \rangle, \sigma} \left( c_{i,\sigma}^\dagger c_{j,\sigma} + \mathrm{H.c.} \right) + J \sum_{\langle i,j \rangle} \left( \vec{S}_i \cdot \vec{S}_j - \frac{1}{4} n_i n_j \right) \right] P .   (10.53)

The t-J model lives in a Hilbert space where double occupancy on a site is excluded.
In the above, this constraint is taken care of by the projection
P = \prod_i \left( 1 - n_{i,\uparrow}\, n_{i,\downarrow} \right) .   (10.54)
We can identify the four fermionic states on a given site with the four states in the
product space of spinless fermions and spins as:
with σ^{z,±} = (1 ± σ^z)/2. Under the above canonical transformation the t-J model
reads

\tilde{H}_{t-J} = \tilde{P} \left[ t \sum_{\langle i,j \rangle} \left( f_j^\dagger f_i\, \tilde{P}_{i,j} + \mathrm{H.c.} \right) + \frac{J}{2} \sum_{\langle i,j \rangle} \left( \tilde{P}_{i,j} - 1 \right) \tilde{\Delta}_{i,j} \right] \tilde{P} ,
\tilde{P}_{i,j} = \frac{1}{2} \left( \sigma_i \cdot \sigma_j + 1 \right) ,
\tilde{\Delta}_{i,j} = 1 - f_i^\dagger f_i - f_j^\dagger f_j ,
\tilde{P} = \prod_i \left( 1 - f_i^\dagger f_i\, \sigma_i^- \sigma_i^+ \right) .   (10.60)
We can check the validity of the above expression by considering the two-site
problem H_{t-J}^{(i,j)}. Applying the Hamiltonian on the four states in the projected Hilbert
space with two electrons gives
H_{t-J}^{(i,j)}\, |\uparrow\rangle_i \otimes |\uparrow\rangle_j = 0 ,
H_{t-J}^{(i,j)}\, |\downarrow\rangle_i \otimes |\downarrow\rangle_j = 0 ,
H_{t-J}^{(i,j)}\, |\uparrow\rangle_i \otimes |\downarrow\rangle_j = P \left[ -t\, |0\rangle_i \otimes |\uparrow\downarrow\rangle_j - t\, |\uparrow\downarrow\rangle_i \otimes |0\rangle_j - \frac{J}{2}\, |\uparrow\rangle_i \otimes |\downarrow\rangle_j + \frac{J}{2}\, |\downarrow\rangle_i \otimes |\uparrow\rangle_j \right]
= -\frac{J}{2}\, |\uparrow\rangle_i \otimes |\downarrow\rangle_j + \frac{J}{2}\, |\downarrow\rangle_i \otimes |\uparrow\rangle_j ,

and, in the new representation,

\tilde{H}_{t-J}^{(i,j)}\, |1,\downarrow\rangle_i \otimes |1,\uparrow\rangle_j = \tilde{P} \left[ -\frac{J}{2}\, |1,\downarrow\rangle_i \otimes |1,\uparrow\rangle_j + \frac{J}{2}\, |1,\uparrow\rangle_i \otimes |1,\downarrow\rangle_j \right]
= -\frac{J}{2}\, |1,\downarrow\rangle_i \otimes |1,\uparrow\rangle_j + \frac{J}{2}\, |1,\uparrow\rangle_i \otimes |1,\downarrow\rangle_j ,   (10.62)
which confirms that the matrix elements of H̃_{t-J}^{(i,j)} are identical to those of H_{t-J}^{(i,j)}.
The reader can readily carry out the calculation in the one and zero particle Hilbert
spaces to see that ⟨η|H_{t-J}^{(i,j)}|ν⟩ = ⟨η̃|H̃_{t-J}^{(i,j)}|ν̃⟩, where |ν⟩ (|η⟩) and |ν̃⟩ (|η̃⟩) cor-
respond to the same states but in the two different representations. Since the t-J
model may be written as a sum of two-sites terms, the above is equivalent to
In the representation of (10.61) the t-J model has two important properties
which facilitate numerical simulations:
(i) As apparent from (10.62) the application of the Hamiltonian (without projec-
tion) on a state in the projected Hilbert space does not generate states with
double occupancy. Hence, the projection operation commutes with the Hamil-
tonian in this representation. The reader can confirm that this is a statement
which holds in the full Hilbert space. This leads to the relation
\left[ t \sum_{\langle i,j \rangle} \left( f_j^\dagger f_i\, \tilde{P}_{i,j} + \mathrm{H.c.} \right) + \frac{J}{2} \sum_{\langle i,j \rangle} \left( \tilde{P}_{i,j} - 1 \right) \tilde{\Delta}_{i,j}\, ,\; \tilde{P} \right] = 0 ,   (10.64)
To use the world-line formulation to the present problem, we introduce the unit
operator in the Hilbert space with no holes
1 = \sum_\sigma |v, \sigma\rangle \langle v, \sigma| , \quad |v, \sigma\rangle = |1, \sigma_1\rangle_1 \otimes |1, \sigma_2\rangle_2 \otimes \cdots \otimes |1, \sigma_N\rangle_N ,   (10.67)

as well as the unit operator in the Hilbert space with a single hole

1 = \sum_{r, \sigma} |r, \sigma\rangle \langle r, \sigma| , \quad |r, \sigma\rangle = \sigma_r^{z,+} f_r^\dagger |v, \sigma\rangle .   (10.68)
r,σ
In the above, r denotes a lattice site and N corresponds to the number of lattice
sites. In the definition of the single hole-states, the operator σrz,+ guarantees that we
will never generate a doubly occupied state on site r (i.e. |0, ↓).
The Green function may now be written as
G(i - j, \tau) = \frac{1}{Z} \sum_{\sigma_1} \langle v, \sigma_1 | \left( e^{-\Delta\tau H_1} e^{-\Delta\tau H_2} \right)^{m - n_\tau} \sigma_i^{z,+} f_i \left( e^{-\Delta\tau H_1} e^{-\Delta\tau H_2} \right)^{n_\tau} \sigma_j^{z,+} f_j^\dagger | v, \sigma_1 \rangle
= \frac{1}{Z} \sum_{\sigma_1 \ldots \sigma_{2m},\, r_2 \ldots r_{2n_\tau}} \langle v, \sigma_1 | e^{-\Delta\tau H_1} | v, \sigma_{2m} \rangle
\times \langle v, \sigma_{2m-1} | e^{-\Delta\tau H_2} | v, \sigma_{2m-2} \rangle \cdots \langle v, \sigma_{2n_\tau+1} | \sigma_i^{z,+} f_i\, e^{-\Delta\tau H_1} | r_{2n_\tau}, \sigma_{2n_\tau} \rangle \cdots
Defining matrices A_1 and A_2 from these single-hole matrix elements, and since
the single-hole states are given by |r, σ⟩ = σ_r^{z,+} f_r^† |v, σ⟩, the Green function
for a given world-line configuration is given by

G_w(i - j, \tau) = \left[ A_1(\sigma_{2n_\tau+1}, \sigma_{2n_\tau})\, A_2(\sigma_{2n_\tau}, \sigma_{2n_\tau-1}) \cdots A_1(\sigma_3, \sigma_2)\, A_2(\sigma_2, \sigma_1) \right]_{i,j} .   (10.73)
We are now left with the task of computing the matrix A. Since H2 is a sum of
commuting bond Hamiltonians (H_b), [A_1(σ_3, σ_2)]_{i,j} is non-zero only if i and
j belong to the same bond b̃. In particular, denoting the two-spin configuration on
bond b by σ 1,b , σ 2,b we have
\left[ A_2(\sigma_2, \sigma_1) \right]_{i,j} = \frac{\prod_{b \neq \tilde{b}} \langle v, \sigma_{2,b} | e^{-\Delta\tau H_b} | v, \sigma_{1,b} \rangle\; \langle v, \sigma_{2,\tilde{b}} | \sigma_i^{z,+} f_i\, e^{-\Delta\tau H_{\tilde{b}}}\, \sigma_j^{z,+} f_j^\dagger | v, \sigma_{1,\tilde{b}} \rangle}{\prod_b \langle v, \sigma_{2,b} | e^{-\Delta\tau H_b} | v, \sigma_{1,b} \rangle} .
Fig. 10.7. Graphical representation of the propagation of a hole in a given world-line or spin
configuration. The solid lines denote the possible routes taken by the hole through the spin
configuration. One will notice that due to the constraint which inhibits the states |0, ↓ the
hole motion tracks the up spins
The possible paths the hole follows for a given spin configuration are shown in
Fig. 10.7. With the above construction, a loop algorithm for a given non-frustrated
spin system in arbitrary dimensions may be quickly generalized to tackle the impor-
tant problem of single-hole dynamics in quantum magnets.
A number of other methods without time discretization errors have been devel-
oped in recent years in different contexts. See for example [8, 9, 19, 25, 26, 27, 28,
29] and Chaps. 11 and 12.
In the context of QMC, it was first realized by Beard and Wiese [30] that the limit
∆τ → 0 can be explicitly taken within the loop algorithm. Actually this applies
to any model with a discrete state space, see Sect. 10.3.3. Let us look again at the
isotropic Heisenberg AF, (10.1) with J = Jz = Jx . There are then only vertical and
horizontal breakups in the loop algorithm.
To lowest order in ∆τ , the probability for a horizontal breakup is J∆τ /2, pro-
portional to ∆τ , and the probability for a vertical breakup is 1 − J∆τ /2. This is like
a discrete Poisson process: The event of a horizontal breakup occurs with probability
J∆τ /2. Note that the vertical breakup does not change the world-line configuration;
it is equivalent to the identity operator, see also Sect. 10.4. In the limit ∆τ → 0 the
Poisson process becomes a Poisson distribution in continuous imaginary time, with
probability density J/2 for a horizontal breakup.
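Sampling such a Poisson process is straightforward; the sketch below (names and interface are ours) generates candidate horizontal-breakup times with probability density J/2 on a given bond segment:

    import numpy as np

    def horizontal_breakup_times(j, t_start, t_end, rng=np.random.default_rng()):
        # Waiting times of a Poisson process with rate j/2 are
        # exponentially distributed with mean 2/j.
        times = []
        t = t_start + rng.exponential(scale=2.0 / j)
        while t < t_end:
            times.append(t)
            t += rng.exponential(scale=2.0 / j)
        return times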
In continuous imaginary time there are no plaquettes anymore. Instead, config-
urations are specified by the space and time coordinates of the events, together with
the local spin values. On average, there will be about one event per unit of βJ on
each lattice bond. Therefore the storage requirements are reduced by O(1/∆τ )! The
events are best stored as linked lists, i.e. for each event on a bond there should be
pointers to the events closest in imaginary time, for both sites of the bond.
Monte Carlo loop updates are implemented quite differently for the multi-loop
and for the single-loop variant, respectively. For multi-loop updates, i.e. the con-
struction and flip of loops for every space-time site of the lattice, one first constructs
a stochastic loop decomposition of the world-line configuration. To do so, horizontal
breakups are put on the lattice with constant probability density in imaginary time
for each bond, but only in time regions where they are compatible with the world-
line configuration, i.e. where the spins are antiferromagnetic. Horizontal breakups
must also be put wherever a world-line jumps to another site. The linked list has to
be updated or reconstructed. The configuration of breakups is equivalent to a con-
figuration of loops, obtained by vertically connecting the horizontal breakups (see
Sect. 10.4). These implicitly given loops then have to be flipped with some constant
probability, usually 1/2. To do so, one can for example go to each stored event
(breakup) and find, and possibly flip, the one or two loops through this breakup,
unless these loop(s) have already been treated.
In single-loop-updates only one single loop is constructed and then always
flipped. Here it is better to make the breakup-decisions during loop construction,
see also Sect. 10.4.1). One starts at a randomly chosen space-time site (i, t0 ). The
loop is constructed piece by piece. It thus has a tail and a moving head. The par-
tial loop can be called a worm (cf. Sect. 10.4.5). The loop points into the present
spin-direction, say upwards in time.
For each lattice bond ij at the present site, the smallest of the following times
is determined:
(i) The time at which the neighboring spin changes;
(ii) If the bond is antiferromagnetic, the present time t0 plus a decay time generated
with uniform probability density;
(iii) The time at which the spin at site i changes.
The loop head is moved to the smallest of all these times, t1 . Existing breakups
between t0 and t1 are removed. If t1 corresponds to case (ii) or (i), a breakup is
inserted there, and the loop head follows it, i.e. it moves to the neighboring site and
changes direction in imaginary time. Then the construction described in the present
paragraph repeats.
It finishes when the loop has closed. All spins along the loop can then be flipped.
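A minimal sketch of the time-selection step (i)–(iii), under an assumed data layout (all names hypothetical), could look as follows; the decay times in case (ii) are drawn with constant probability density J/2, i.e. from an exponential distribution:

```python
import numpy as np

def next_head_move(t0, t_own_change, neighbor_changes, bond_is_af, J, rng):
    """One decision step of the single-loop construction (sketch).
    Returns the new head time t1 and the bond across which a breakup is
    inserted, or None if the head stops at a spin change of the current
    site (case iii)."""
    candidates = [(t_own_change, None)]                    # case (iii)
    for b, (t_nb, af) in enumerate(zip(neighbor_changes, bond_is_af)):
        candidates.append((t_nb, b))                       # case (i)
        if af:                                             # case (ii): decay time
            candidates.append((t0 + rng.exponential(2.0 / J), b))
    return min(candidates, key=lambda c: c[0])
```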
The stochastic series expansion (SSE), invented by A. Sandvik [31, 32, 33], is an-
other representation of exp (−βH) without discretization error. Note that it is not
directly connected to any particular MC-update. Most update methods can (with
some adjustments) be applied either in imaginary time or in the SSE representation.
Let the Hamiltonian be a sum of operators defined on lattice bonds,

H = − Σ_{b=1}^{m_b} H_b . (10.76)
Expanding exp(−βH) in a power series, a configuration is specified by a sequence of bond operators H_{b_α}, α = 1, 2, . . . , n, and only one event per value of the index. The remaining matrix elements can be evaluated easily. With suitable normalizations of the operators
Hb , they can usually be made to be unity. They are zero for operator configura-
tions which are not possible, e.g. not compatible with periodic world lines, which
will thus not be produced in the Monte Carlo. Spins at sites not connected by any
operator to other sites can be summed over immediately.
Note that, in contrast to imaginary time, now diagonal operators Siz Sjz occur
explicitly, since the exponential factor weighing neighboring world lines has also
been expanded in a power series. Thus, SSE needs more operators on average than
imaginary time for a given accuracy.
The average length ⟨n⟩ of the operator sequence is β times the average total
energy (as can be seen from ∂ log Z/∂β), and its variance is related to the specific
heat. Therefore, in any finite length simulation, only a finite value of n of order
β⟨−H⟩ will occur, so that we get results without discretization error, despite the
finiteness of n.
It is convenient to pad the sum in (10.77) with unit operators 1 in order to have
an operator string of constant length N . For details see [31, 32, 33].
Updates in the SSE representation usually proceed in two steps. First, a diagonal
update is performed, for which a switch between diagonal parts of the Hamiltonian,
e.g. Siz Sjz , and unit operators 1 is proposed. This kind of update does not change the
shape of world lines. Second, non-diagonal updates are proposed, e.g. local updates
analogous to the local updates of world lines in imaginary time, see Sect. 10.2. Loop
updates are somewhat different, see Sect. 10.4.
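To make the first step concrete, here is a minimal sketch (not from the original text) of a diagonal update sweep for the isotropic Heisenberg antiferromagnet, following the standard acceptance probabilities of Sandvik's scheme [31, 32, 33]; all array names are hypothetical and the inputs are assumed to be numpy arrays:

```python
import numpy as np

IDENTITY, DIAGONAL, OFFDIAG = 0, 1, 2

def diagonal_update(op_type, op_bond, spins, bonds, beta, rng):
    """One diagonal update sweep through the padded operator string of
    length L.  Assumes the diagonal matrix element is 1/2 on a bond with
    antiparallel spins and 0 otherwise."""
    L = len(op_type)
    n = int(np.count_nonzero(op_type != IDENTITY))
    Nb = len(bonds)
    for p in range(L):
        if op_type[p] == IDENTITY and n < L:
            b = rng.integers(Nb)
            i, j = bonds[b]
            if spins[i] != spins[j]:                   # matrix element 1/2
                if rng.random() < min(1.0, 0.5 * beta * Nb / (L - n)):
                    op_type[p], op_bond[p] = DIAGONAL, b
                    n += 1
        elif op_type[p] == DIAGONAL:
            if rng.random() < min(1.0, (L - n + 1) / (0.5 * beta * Nb)):
                op_type[p] = IDENTITY
                n -= 1
        elif op_type[p] == OFFDIAG:
            i, j = bonds[op_bond[p]]                   # propagate the spin state
            spins[i], spins[j] = spins[j], spins[i]
    return n
```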
Z = Tr\Big[ e^{−β H_0} \sum_{n=0}^{∞} \int_0^{β} dτ_n \cdots \int_0^{τ_3} dτ_2 \int_0^{τ_2} dτ_1 \, V(τ_1) \cdots V(τ_n) \Big] , (10.77)
where V (τ ) = exp(H0 τ )V exp(−H0 τ ). When the system size and β are finite, this
is a convergent expansion.
Indeed, in the form of (10.77), this is already the continuous imaginary time
representation of exp (−βH)! When the time integrals are approximated by discrete
sums, then the discrete time representation results.
The SSE representation can be obtained in the special case that one chooses
H_0 = 0 and V = −H = Σ_{b=1}^{m_b} H_b. Then V(τ) = V does not depend on τ and the time
integrals can be performed:
\int_0^{β} dτ_n \cdots \int_0^{τ_2} dτ_1 = \frac{β^n}{n!} . (10.78)
These are just the configurations compatible with the horizontal breakup of the
loop algorithm. The horizontal breakup can thus be interpreted as an operator pro-
jecting onto a spin singlet. The partition function of the Heisenberg model is then
Z = Tr\, e^{−βH} ∼ Tr\, e^{\frac{βJ}{2} Σ_{⟨ij⟩} Π_{ij}} , (10.83)

with Π_{ij} the horizontal-breakup (singlet projection) operator on bond ⟨ij⟩.
From (10.77) or (10.80) we see that exp(−βH) then corresponds to a Poisson dis-
tribution of horizontal breakups (singlet projection operators) with density J/2 in
imaginary time, on each lattice bond. One instance of such a distribution is shown
in Fig. 10.8 on the left.
Taking the trace means summing over all spin states on the bottom, with periodic
boundary conditions in imaginary time. Between operators, the spin states cannot
change. The operators can therefore be connected by lines, on which the spin di-
rection does not change. The operator configuration, see Fig. 10.8 (left), therefore
implies a configuration of loops, Fig. 10.8 (middle left). A horizontal breakup stands
for a sum over two spin directions on each of its half-circles. On each loop the spin
direction stays constant along the lines. Thus each loop contributes two states to the
partition function. We arrive at the loop representation of the Heisenberg antiferro-
magnet [4, 43, 44]
Z = \sum_{\substack{\text{Poisson distribution of horizontal}\\ \text{breakups with density } J/2}} 2^{\,\text{number of loops}} . (10.84)
Fig. 10.8. Loop operator representation of the Heisenberg model and of the loop algorithm
The spin directions on different loops are independent. Therefore the contribution of
a given loop configuration to the spin Green's function ⟨S^z(x, t) S^z(x′, t′)⟩ averages
to zero when (x, t) and (x′, t′) are on different loops, whereas it gets four identical
contributions when they are on the same loop [4]. Thus this Green's function can be
measured within the loop representation, and it is particularly simple there. For the
Heisenberg AF and at momentum π, this Green's function only takes the values zero
and one: It is one when (x, t) and (x′, t′) are on the same loop, and zero otherwise.
Thus its variance is smaller than that of ⟨S^z(x, t) S^z(x′, t′)⟩ in the spin representation,
which takes values +1 and −1. Observables in the loop representation such as this
Green's function are therefore called improved estimators.
We also see that the Greens function corresponds directly to the space-time size
of the loops: These are the physically correlated objects of the model, in the same
sense that Fortuin-Kasteleyn clusters are the physically correlated objects of the
Ising model [39, 40, 42].
In the loop representation one can also easily measure the off-diagonal Green's
function ⟨S^+(x, t) S^−(x′, t′)⟩. It is virtually inaccessible in the spin world-line rep-
resentation with standard local updates, since contributing configurations would
require partial world lines, which do not occur there. However, in the loop represen-
tation, ⟨S^+(x, t) S^−(x′, t′)⟩ does get a contribution whenever (x, t) and (x′, t′) are
located on the same loop [4]. For the spin-isotropic Heisenberg model, the estima-
tor in the loop representation is identical to that of the diagonal correlation function
⟨S^z(x, t) S^z(x′, t′)⟩.
Fig. 10.9. Left: Sketch of regions updated with subsequent loops on an infinite lattice. Right:
Heisenberg spin ladder with two legs
Fig. 10.10. Spatial correlation function of Heisenberg ladders at β = ∞, for systems of finite
length L and, independently, for L = ∞
Fig. 10.11. Left: Temporal correlation function (Greens function) of Heisenberg ladders at
L = ∞, at finite inverse temperatures β = 2, 5, 10 and, independently, at β = ∞. Right:
Real frequency spectrum obtained by Maximum Entropy continuation
The worm algorithm and directed loops differ in details of the updates. Note
that, like the loop-algorithm, they also allow the measurement of off-diagonal two-
point functions and the change of topological quantum numbers like the number
of particles or the spatial winding. In a suitably chosen version of directed loops,
single-loop updates of the loop algorithm become a special case. For more informa-
tion on worms and directed loops we refer to [5, 14, 15, 16, 17, 35].
Fig. 10.12. Issues for the spin-Peierls transition. Left: Einstein (optical) and acoustical bare
phonon dispersions. Middle: Softening scenario. Right: Central peak scenario
(ii) Is the phonon spectrum beyond the transition softened (i.e. the bare phonon
spectrum moves to lower frequency, down to zero at momentum π), or does it
have a separate central peak?
These phonons are the easiest to treat by QMC. In order to make the quantum
phonons amenable to numerical treatment, one can express them with the basic
Feynman path integral for each xi (see Chap. 11), by introducing discrete Trotter
times τj , inserting complete sets of states xi (τj ) and evaluating the resulting matrix
elements to O(∆τ ). A simple QMC for the phonon degrees of freedom can then be
done with local updates of the phonon world lines xi (τ ).
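As an illustration of such local updates, the following minimal sketch performs one Metropolis sweep over a single discretized phonon world line for a bare Einstein phonon; the spin-phonon coupling is omitted and all names are our own:

```python
import numpy as np

def local_phonon_sweep(x, dtau, omega0, M=1.0, step=0.5,
                       rng=np.random.default_rng()):
    """One Metropolis sweep of local updates on a phonon world line
    x[j] = x(tau_j), periodic in imaginary time; discretized harmonic
    action with mass M and frequency omega0."""
    m = len(x)
    for j in range(m):
        xl, xr = x[(j - 1) % m], x[(j + 1) % m]
        xnew = x[j] + step * rng.uniform(-1, 1)
        def s(xj):   # pieces of the action that involve x_j
            return (M / (2 * dtau)) * ((xj - xl) ** 2 + (xr - xj) ** 2) \
                   + dtau * 0.5 * M * omega0 ** 2 * xj ** 2
        if rng.random() < np.exp(-(s(xnew) - s(x[j]))):
            x[j] = xnew
    return x
```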
A similar approach is possible in second quantization, by inserting complete
sets of occupation number eigenstates ni (τj ) at the Trotter times τj . Again, one can
perform QMC with local updates on the occupation number states [36, 37]. The
discrete Trotter time can be avoided here, either with continuous time or with SSE
[31, 32, 33].
Such local updates suffer from the usual difficulties of long autocorrelation
times, which occur especially close to and beyond the phase transition. They can be
alleviated by using parallel tempering [50, 51] (or simulated tempering [52]) (see
Chap. 4). In this approach, simulations at many different couplings g (originally:
at many temperatures) are run in parallel. Occasionally, a swap of configurations at
neighboring g is proposed. It is accepted with Metropolis probability. The goal of
this strategy is to have approximately a random walk of configurations in the space
of couplings g. Configurations at high g can then equilibrate by first moving to low
g, where the Monte Carlo is efficient, and then back to high g. The proper choice of
couplings (and of re-weighting factors in case of simulated tempering) depends on
the physics of the system and is sometimes cumbersome. It can, however, be auto-
mated [7] efficiently by measuring the distributions of energies during an initial run.
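A minimal sketch of the swap step between two neighboring couplings reads as follows; it assumes a routine `action(g, C)` returning the dimensionless action of configuration C at coupling g (both names are placeholders):

```python
import numpy as np

def propose_swap(action, couplings, configs, k, rng=np.random.default_rng()):
    """Metropolis swap of the configurations at couplings g_k and g_{k+1}."""
    g1, g2 = couplings[k], couplings[k + 1]
    C1, C2 = configs[k], configs[k + 1]
    dS = (action(g1, C2) + action(g2, C1)) - (action(g1, C1) + action(g2, C2))
    if rng.random() < np.exp(-dS):
        configs[k], configs[k + 1] = C2, C1   # swap accepted
```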
The results discussed below were obtained using loop updates for spins and
local updates in second quantization for phonons, in SSE representation, similar to
[36, 37], with additional automated tempering. Spectra were obtained by mapping
the SSE configurations to continuous imaginary time, as explained in Sect. 10.3.3,
and measuring Greens functions there using FFT.
The location of the phase transition is best determined through the finite size
dependence of a staggered susceptibility, of spins, spin-dimers, or phonons. For
spins it reads
χ_S(π) = \frac{1}{N} \sum_{n,m} (−1)^m \int_0^{β} dτ \, ⟨S^z_n(τ) S^z_{n+m}(0)⟩ . (10.86)
At the phase transition, χS (π) is directly proportional to the system size N , whereas
above gc there are additional logarithmic corrections. Below gc it is proportional to
ln N for any g > 0, i.e. there is a non-extensive central peak in the phonon spectrum
for any finite spin-phonon coupling.
The phonon spectra exhibit drastic changes at the phase transition. Figure 10.13
shows that the value of ω0 determines their qualitative behavior: At ω0 = J the
central peak becomes extensive and develops a linear branch at the phase transition,
which shows the spin-wave velocity. At ω0 = 0.25 J the behavior is completely
different: The bare Einstein dispersion has softened and has joined the previously
non-extensive central peak. Thus both the central peak scenario and the softening
scenario occur, depending on the size of ω0 .
Note that large system sizes and low temperature are essential to get the cor-
rect spectra. The finite size gap of a finite system is of order 1/N . When 1/N is
larger than about ω0 /10 (!), then there are drastic finite size effects in the phonon
spectrum [7].
At very large values of g, the spin gap ∆S becomes sizeable. The system enters
an adiabatic regime when ∆S > O(ω0 ) [49]. For the couplings investigated here, it
is always diabatic.
Fig. 10.13. Spectra of phonon coordinates x_i above the phase transition for bond phonons.
Left: ω_0 = 1 J, just above the phase transition. Right: ω_0 = 0.25 J at g = 0.3 > g_c ≃ 0.23.
Lattice size L = 256 and β = 512

Phonons other than those treated in Sect. 10.5.1 have in the past posed great diffi-
culties for QMC. Site phonons have a coupling of the form (10.87), which causes
a sign problem when second phonon quantization is used. In first quantization,
phonon updates are very slow. This is even worse in the case of acoustical phonons,
which have a zero mode at q = 0. Indeed, no efficient QMC method has been
available for arbitrary phonon dispersions.
Let us now discuss a new method [7] which overcomes all these difficulties. We
use the interaction representation with the pure phonon Hamiltonian as the diagonal
part and the spin interaction (10.87) as the interaction part which is expanded. The
partition function then reads
Z = Tr_s \sum_{n=0}^{∞} \int_0^{β} dτ_n \cdots \int_0^{τ_2} dτ_1 \int Dx \; \underbrace{\prod_{l=0}^{n} f(\{x_l\})\, S[l]}_{\text{spin operator sequence}} \; \underbrace{e^{-\int_0^{β} dτ\, H_{\mathrm{ph}}(\{x(τ)\})}}_{\text{phonon path integral}} . (10.88)
Here S[l] is a spin operator like S i S i+1 . The spin-phonon coupling f ({x(τ )})
is to be evaluated at the space-time location where the spin operators act.
For a given sequence of spin operators we now construct a Monte Carlo phonon
update. The effective action S_eff for the phonons contains log(f({x(τ)})). It is
therefore not bilinear and cannot be integrated directly. However, for the purposes
of a Monte Carlo update, we can pretend for a moment that the coupling was
f^prop(x) := exp(gx) instead of f(x) = 1 + gx. Then S_eff^prop is bilinear. For a given
sequence of spin operators, we can diagonalize S_eff^prop in momentum space and Mat-
subara frequencies. This results in independent Gaussian distributions of the phonon
coordinates in the diagonalized basis. We can then generate a new, completely inde-
pendent phonon configuration by taking one sample from this distribution. In order
to achieve a correct Monte Carlo update for the actual model, we take this sample
as a Monte Carlo proposal and accept or reject it with the Metropolis probability for
the actual model, see (10.88).
The acceptance probability will depend on the difference between S_eff^prop and S_eff,
and thus on the typical phonon extensions. In order to achieve high acceptance rates
it is advantageous to change phonon configurations only in part of the complete
(q, ω_n) space for each update proposal. These parts need to be smaller close to the
physically important region (q = π, ω = 0).
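The following toy sketch illustrates the proposal/acceptance logic for a single phonon coordinate instead of the full (q, ω_n) field; the weights and names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
g, n_vertices = 0.3, 4     # coupling and number of interaction vertices (toy values)

def log_w_true(x):   # Gaussian phonon weight times the actual coupling f = 1 + g*x
    return -0.5 * x**2 + n_vertices * np.log1p(g * x)

def log_w_prop(x):   # f replaced by exp(g*x): the total weight stays Gaussian
    return -0.5 * x**2 + n_vertices * g * x

x = 0.1
for _ in range(1000):
    xp = rng.normal(loc=n_vertices * g, scale=1.0)   # exact sample of the proposal weight
    if g * xp <= -1.0:                               # f(x) must stay positive
        continue
    log_r = (log_w_true(xp) - log_w_prop(xp)) - (log_w_true(x) - log_w_prop(x))
    if np.log(rng.random()) < log_r:                 # Metropolis correction
        x = xp
```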
Given a phonon-configuration, the effective model for the spins is a Heisenberg
antiferromagnet with couplings that vary in space-time. It can be simulated effi-
ciently with the loop-algorithm, modified for the fact that probabilities are now not
constant in imaginary time, but depend on the phonon coordinates.
The approach just sketched works for site phonons as well as for bond phonons.
Remarkably, any bare phonon dispersion ω(q) can be used, since it just appears in
the Gaussian effective phonon action. Measurements of phonon properties are easy,
since the configurations are directly available in (q, ωn ) space.
Let us now briefly discuss some recent results [7] for site phonons. Their bare
dispersion is acoustical, i.e. gapless at q = 0. In a recent letter [53] it was concluded
that for this model, the critical coupling is gc = 0, i.e. the system supposedly orders
Fig. 10.14. Spectrum of phonon coordinates xi for acoustical site phonons, at the phase
transition
at any finite coupling. However, it turns out that this conclusion was based on an
incorrect scaling assumption [7].
QMC examination of the spin susceptibility χS (π) on lattices up to length 256
revealed that the critical coupling is actually finite, and almost identical to that of
dispersionless bond phonons with the same ω0 (π).
The phonon dispersion slightly above the phase transition is shown, together
with the bare dispersion, in Fig. 10.14.
One can see clearly that in this case of small ω0 (π) = 0.25J there is again
phonon softening. The spin-Peierls phase transition only affects phonons with mo-
menta close to π. The soft bare dispersion at q = 0 is not affected at all. Indeed, the
bare dispersion at small momenta has no influence on the phase transition [7].
for the partition function. Here, i runs over all lattice sites and τ from 0 to β. For
a fixed HS field Φ(i, τ ), one has to compute the action S[Φ(i, τ )], corresponding
to a problem of non-interacting electrons in an external space and imaginary time
dependent field. The required computational effort depends on the formulation of
the algorithm. In the Blankenbecler-Scalapino-Sugar (BSS) [6] approach for lattice
models such as the Hubbard Hamiltonian, it scales as βN³, where N corresponds to
the number of lattice sites. In the Hirsch-Fye approach [54], appropriate for impurity
problems, it scales as (βN_imp)³, where N_imp corresponds to the number of correlated
sites. Having solved for a fixed HS field, we have to sum over all possible fields.
This is done stochastically with the Monte Carlo method.
In comparison to the loop and SSE approaches, auxiliary field methods are slow.
Recall that the computational effort for loop and SSE approaches – in the absence
of a sign problem – scales as N β. However, the attractive point of the auxiliary field
approach lies in the fact that the sign problem is absent in many non-trivial cases
where the loop and SSE methods fail.
H = H_t + H_U (10.90)

with H_t = −t Σ_{⟨i,j⟩,σ} c†_{i,σ} c_{j,σ} and H_U = U Σ_i (n_{i,↑} − 1/2)(n_{i,↓} − 1/2).
If one is interested in ground-state properties, it is convenient to use the projector
quantum Monte Carlo (PQMC) algorithm [57, 58, 59]. The ground-state expectation
value of an observable O is obtained by projecting a trial wave function |Ψ_T⟩ along
the imaginary time axis,

⟨O⟩ = \lim_{Θ→∞} \frac{⟨Ψ_T| e^{−ΘH} O e^{−ΘH} |Ψ_T⟩}{⟨Ψ_T| e^{−2ΘH} |Ψ_T⟩} .
Fig. 10.15. Fourier transform of the spin-spin correlation functions at Q = (π, π) (a) and
energy (b) for the half-filled Hubbard model (10.90). •: PQMC algorithm. △: FTQMC algo-
rithm at β = 2Θ
In the world-line approach, one uses the Trotter decomposition (see App. 10.A) to
split the Hamiltonian into a set of two-site problems. In the auxiliary field approach,
we use the Trotter decomposition to separate the single-body Hamiltonian H0 from
the two-body interaction term in the imaginary time propagation
Z = Tr\, e^{−β(H−μN)} = Tr\big[ \big( e^{−Δτ H_U} e^{−Δτ H_t} \big)^m \big] + O(Δτ²) , (10.94)

e^{−Δτ U Σ_i (n_{i,↑}−1/2)(n_{i,↓}−1/2)} = C \sum_{s_1,…,s_N=±1} e^{α Σ_i s_i (n_{i,↑}−n_{i,↓})} , (10.95)
where cosh(α) = exp(Δτ U/2) and, on an N-site lattice, the constant C =
exp(−Δτ U N/4)/2^N.
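This discrete HS identity is easy to verify numerically on the four-dimensional single-site Hilbert space; the following quick check (our own, using the per-site constant e^{−Δτ U/4}/2 consistent with the above) passes:

```python
import numpy as np
from scipy.linalg import expm

dtau, U = 0.1, 4.0
n_up = np.diag([0., 1., 0., 1.])     # basis: |0>, |up>, |dn>, |updn>
n_dn = np.diag([0., 0., 1., 1.])
eye = np.eye(4)

lhs = expm(-dtau * U * (n_up - 0.5 * eye) @ (n_dn - 0.5 * eye))
alpha = np.arccosh(np.exp(dtau * U / 2))
gamma = np.exp(-dtau * U / 4) / 2
rhs = gamma * sum(expm(alpha * s * (n_up - n_dn)) for s in (+1, -1))
assert np.allclose(lhs, rhs)
```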
To simplify the notation we introduce the index x = (i, σ) to define

H_t = \sum_{x,y} c†_x T_{x,y} c_y ≡ c† T c ,
α \sum_i s_i (n_{i,↑} − n_{i,↓}) = \sum_{x,y} c†_x V(s)_{x,y} c_y ≡ c† V(s) c . (10.96)
With these definitions,

B_s(τ_2, τ_1) = \prod_{n=n_1+1}^{n_2} e^{V(s_n)} e^{−Δτ T} , (10.97)

where n_1 Δτ = τ_1 and n_2 Δτ = τ_2.
Using the results of App. 10.C we can now write the partition function as
Z = \sum_{s_1,…,s_m} C^m \, Tr[ U_s(β, 0) ] = \sum_{s_1,…,s_m} C^m \det[ 1 + B_s(β, 0) ] . (10.98)
For the PQMC algorithm, we will require the trial wave function to be a Slater
determinant characterized by the rectangular matrix P (see App. 10.C)
|Ψ_T⟩ = \prod_{y=1}^{N_p} \sum_x c†_x P_{x,y} |0⟩ = \prod_{y=1}^{N_p} (c† P)_y |0⟩ . (10.99)
Hence,

⟨Ψ_T| e^{−2ΘH} |Ψ_T⟩ = \sum_{s_1,…,s_m} C^m \det[ P† B_s(2Θ, 0) P ] . (10.100)
One of the big advantages of the auxiliary field approach is the ability to mea-
sure arbitrary observables. This is based on the fact that for a given Hubbard-
Stratonovich field we have to solve a problem of non-interacting fermions subject
to this time- and space-dependent field. This leads to the validity of Wick's theo-
rem. In this section, we will concentrate on equal-time observables, show how to
compute Green functions, and finally demonstrate the validity of Wick's theorem.
10.6.3.1 PQMC
For each lattice site i and time slice n, we have introduced an independent HS field,
s = {s_{i,n}}, and

P_s = \frac{\det[ P† B_s(2Θ, 0) P ]}{\sum_s \det[ P† B_s(2Θ, 0) P ]} ,
⟨c_x c†_y⟩_s = δ_{x,y} − \frac{∂}{∂η} \ln ⟨Ψ_T| U_s(2Θ, Θ)\, e^{η c† A^{(y,x)} c}\, U_s(Θ, 0) |Ψ_T⟩ \Big|_{η=0} , (10.104)

⟨⟨O_n \cdots O_1⟩⟩_s = \frac{∂^n}{∂η_n \cdots ∂η_1} \ln ⟨Ψ_T| U_s(2Θ, Θ)\, e^{η_n O_n} \cdots e^{η_1 O_1}\, U_s(Θ, 0) |Ψ_T⟩ \Big|_{η_1=\cdots=η_n=0} , (10.105)
where Ô_j means that the operator O_j has been omitted from the product [62].
The cumulant may now be computed order by order. We concentrate on the
form ⟨⟨c†_{x_n} c_{y_n} \cdots c†_{x_1} c_{y_1}⟩⟩ so that A^{(i)}_{x,y} = δ_{x,x_i} δ_{y,y_i}. To simplify the notation we
introduce the quantities

B^> = B_s(Θ, 0) P ,
B^< = P† B_s(2Θ, Θ) . (10.108)
For n = 2 we have

⟨⟨c†_{x_2} c_{y_2} c†_{x_1} c_{y_1}⟩⟩ = Tr[ (1 − G_s(Θ)) A^{(2)} G_s(Θ) A^{(1)} ] = ⟨c†_{x_2} c_{y_1}⟩_s ⟨c_{y_2} c†_{x_1}⟩_s (10.110)

with Ḡ = 1 − G. To derive the above, we have used the cyclic properties of the
trace as well as the relation G = 1 − B^> (B^< B^>)^{−1} B^<. Note that for a matrix
A(η), (∂/∂η)A^{−1}(η) = −A^{−1}(η)[(∂/∂η)A(η)]A^{−1}(η). There is a simple rule to
obtain the third cumulant given the second: In the above expression for the second
cumulant, one replaces B^< by B^< exp(η_3 A^{(3)}). This amounts to redefining the
Green function as G(η_3) = 1 − B^> (B^< exp(η_3 A^{(3)}) B^>)^{−1} B^< exp(η_3 A^{(3)}). Thus,
since

\frac{∂}{∂η_3} G_s(Θ, η_3) \Big|_{η_3=0} = −(1 − G_s(Θ)) A^{(3)} G_s(Θ) = −\frac{∂}{∂η_3} (1 − G_s(Θ, η_3)) \Big|_{η_3=0} . (10.112)
Clearly the same procedure may be applied to obtain the (n+1)-th cumulant given the
n-th one. It is also clear that the n-th cumulant is a sum of products of Green functions.
Thus with (10.107) we have shown that any multi-point correlation function may be
reduced to a sum of products of Green functions: Wick's theorem. Useful relations
include

⟨c†_{x_2} c_{y_2} c†_{x_1} c_{y_1}⟩_s = ⟨c†_{x_2} c_{y_1}⟩_s ⟨c_{y_2} c†_{x_1}⟩_s + ⟨c†_{x_2} c_{y_2}⟩_s ⟨c†_{x_1} c_{y_1}⟩_s . (10.113)
10.6.3.2 FTQMC
In the FTQMC, observables are computed analogously, ⟨O⟩ = Σ_s P_s ⟨O⟩_s, where now

P_s = \frac{\det(1 + B_s(β, 0))}{\sum_s \det(1 + B_s(β, 0))} ,
Imaginary-time displaced correlation functions yield important information. On one hand they
may be used to obtain spin and charge gaps [63, 64], as well as quasiparticle weights
[23]. On the other hand, with the use of the Maximum Entropy method [65, 66]
and generalizations thereof [67], dynamical properties such as spin and charge dy-
namical structure factors, optical conductivity, and single-particle spectral functions
may be computed. Those quantities offer the possibility of direct comparison with
experiments, such as photoemission, neutron scattering and optical measurements.
Since there is again a Wick’s theorem for time displaced correlation functions, it
suffices to compute the single-particle Green function for a given HS configuration.
We will first start with the FTQMC and then concentrate on the PQMC.
10.6.4.1 FTQMC
where T corresponds to time ordering. Thus for τ_1 > τ_2, G_s(τ_1, τ_2)_{x,y} reduces to

⟨c_x(τ_1) c†_y(τ_2)⟩_s = \frac{Tr\big[ U_s(β, τ_1) c_x U_s(τ_1, τ_2) c†_y U_s(τ_2, 0) \big]}{Tr[ U_s(β, 0) ]}
= \frac{Tr\big[ U_s(β, τ_2) U_s^{−1}(τ_1, τ_2) c_x U_s(τ_1, τ_2) c†_y U_s(τ_2, 0) \big]}{Tr[ U_s(β, 0) ]} . (10.120)
\frac{∂ c_x(τ)}{∂τ} = e^{τ c† A c} [c† A c, c_x] e^{−τ c† A c} = −\sum_z A_{x,z} c_z(τ) . (10.122)
Thus, c_x(τ) = Σ_z [e^{−τA}]_{x,z} c_z (10.123) and c†_x(τ) = Σ_z c†_z [e^{τA}]_{z,x} (10.124).
Since B_s is a matrix and not a second-quantized operator, we can pull it out of the
trace in (10.120) to obtain

G_s(τ_1, τ_2)_{x,y} = ⟨c_x(τ_1) c†_y(τ_2)⟩_s = [ B_s(τ_1, τ_2) G_s(τ_2, τ_2) ]_{x,y} (10.125)

with τ_1 > τ_2, where G_s(τ_2) is the equal-time Green function computed previously.
A similar calculation yields, for τ_2 > τ_1,

G_s(τ_1, τ_2)_{x,y} = −[ (1 − G_s(τ_1, τ_1)) B_s^{−1}(τ_2, τ_1) ]_{x,y} . (10.126)
The above equations imply the validity of Wick’s theorem for time displaced
Green functions. Any n-point correlation function at different imaginary times may
be mapped onto an expression containing n-point equal-time correlation functions.
The n-point equal-time correlation function may then be decomposed into a sum of
products of equal-time Green functions. For example, for τ_1 > τ_2 let us compute

⟨c†_x(τ_1) c_x(τ_1) c†_y(τ_2) c_y(τ_2)⟩_s
= \sum_{z,z_1} B^{−1}(τ_1, τ_2)_{z,x} B(τ_1, τ_2)_{x,z_1} \Big[ (1 − G(τ_2, τ_2))_{z_1,z} (1 − G(τ_2, τ_2))_{y,y}
+ (1 − G(τ_2, τ_2))_{y,z} G(τ_2, τ_2)_{z_1,y} \Big]
= \big[ B(τ_1, τ_2) (1 − G(τ_2, τ_2)) B^{−1}(τ_1, τ_2) \big]_{x,x} [1 − G(τ_2, τ_2)]_{y,y}
+ \big[ (1 − G(τ_2, τ_2)) B^{−1}(τ_1, τ_2) \big]_{y,x} \big[ B(τ_1, τ_2) G(τ_2, τ_2) \big]_{x,y}
= [1 − G(τ_1, τ_1)]_{x,x} [1 − G(τ_2, τ_2)]_{y,y} − G(τ_2, τ_1)_{y,x} G(τ_1, τ_2)_{x,y} . (10.127)
In the above, we have omitted the index s, used (10.126) and (10.125), Wick's
theorem for equal-time n-point correlation functions, as well as the identity

G(τ_1, τ_1) = B(τ_1, τ_2) G(τ_2, τ_2) B^{−1}(τ_1, τ_2) . (10.128)
10.6.4.2 PQMC
Zero-temperature time displaced Green functions are given by

G_s\Big(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}\Big)_{x,y}
= \frac{⟨Ψ_T| U_s(2Θ, Θ + \frac{τ}{2}) c_x U_s(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}) c†_y U_s(Θ − \frac{τ}{2}, 0) |Ψ_T⟩}{⟨Ψ_T| U_s(2Θ, 0) |Ψ_T⟩}
= \Big[ B_s\Big(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}\Big) G_s\Big(Θ − \frac{τ}{2}\Big) \Big]_{x,y} (10.137)

and

G_s\Big(Θ − \frac{τ}{2}, Θ + \frac{τ}{2}\Big)_{x,y}
= −\frac{⟨Ψ_T| U_s(2Θ, Θ + \frac{τ}{2}) c†_y U_s(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}) c_x U_s(Θ − \frac{τ}{2}, 0) |Ψ_T⟩}{⟨Ψ_T| U_s(2Θ, 0) |Ψ_T⟩}
= −\Big[ \Big(1 − G_s\Big(Θ − \frac{τ}{2}\Big)\Big) B_s^{−1}\Big(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}\Big) \Big]_{x,y} . (10.138)
Here τ > 0 and we have used (10.124), as well as the equal-time Green function of
(10.103). Two comments are in order.
(i) For a given value of τ the effective projection parameter is Θ − τ. Thus, be-
fore starting a simulation, one has to set the maximal value of τ which will be
considered, τ_M, and the effective projection parameter Θ − τ_M should be large
enough to yield the ground state within the desired precision.
(ii) In a canonical ensemble, the chemical potential is meaningless. However, when
single-particle Green functions are computed, it is required to set the reference
energy with regard to which a particle will be added or removed. In other
words, it is the chemical potential which delimits photoemission from inverse
photoemission.
Thus, it is useful to have an estimate of this quantity if single-particle or pairing
correlations are under investigation. For observables such as spin-spin or charge-
charge time displaced correlations this complication does not come into play since
they are in the particle-hole channel.
Proof. Let us first remind the reader that an anti-linear operator K satisfies the
property K(αv + βu) = α†Kv + β†Ku, where α and β are complex numbers.
An anti-unitary operator, corresponding for example to time-reversal symmetry,
is a unitary anti-linear transformation, so that the scalar product remains invari-
ant, (Kv, Ku) = (v, u). Let us assume that v is an eigenvector of the matrix
1 + B_s(β, 0) with eigenvalue λ.
From (10.139) and (10.97) follows that K† (1 + Bs (β, 0)) K = 1 + Bs (β, 0) such
that
(1 + Bs (β, 0)) Kv = λ† Kv . (10.142)
Hence, Kv is an eigenvector with eigenvalue λ† . To complete the proof, we have to
show that v and Kv are linearly independent
(v, Kv) = (K†v, v) = (KK†v, Kv) = −(v, Kv) . (10.143)
In the above, we have used the unitarity of K and the relation K2 = −1. Hence, since
v and Kv are orthogonal, we are guaranteed that λ and λ† will occur in the spectrum.
In particular, if λ is real, it occurs an even number of times in the spectrum.
It is interesting to note that models which show spin-nematic phases can be
shown to be free of sign problems due to the above symmetry, even though the factor-
ization of the determinant is not present [71].
Clearly, the sign problem remains the central issue in Monte Carlo simulations
of correlated electrons. It has been argued that there is no general solution to this
problem [72]. This does not exclude the possibility of finding novel algorithms
which can potentially circumvent the sign problem for a larger class of models than
at present. A very interesting novel algorithm, the Gaussian Monte Carlo approach,
has recently been introduced by Corney and Drummond [18, 73] and is claimed
to solve the negative sign problem for a rather general class of models containing
the Hubbard model on arbitrary lattices and at arbitrary dopings. As it stands, this
method does not produce accurate results and the interested reader is referred to [19]
for a detailed discussion of those problems.
10.6.6 Summary
In principle, we now have all the elements required to carry out a QMC simulation.
The space we have to sample is that of N m Ising spins. Here N is the number of
lattice sites and m the number of imaginary time slices. For each configuration of
Ising spins s, we can associate a weight. For the PQMC it reads
W_s = C^m \det[ P† B_s(2Θ, 0) P ] . (10.144)
The fundamental quantity on which the entire algorithm relies is the equal-time
Green function. For a given HS configuration of auxiliary fields, this quantity is
given by
G_s(τ) = \big( 1 + B_s(τ, 0) B_s(β, τ) \big)^{−1}
for the FTQMC, see (10.117). On finite precision machines a straightforward cal-
culation of the Green function leads to numerical instabilities at large values of β
or projection parameter Θ. To understand the sources of numerical instabilities, it is
convenient to consider the PQMC. The rectangular matrix P accounting for the trial
wave function is just a set of column orthonormal vectors. Typically for a Hubbard
model, at weak couplings, the extremal scales in the matrix Bs (2Θ, 0) are deter-
mined by the kinetic energy and range from exp(8tΘ) to exp(−8tΘ) in the 2D
case. When the set of orthonormal vectors in P are propagated, the large scales will
wash out the small scales yielding a numerically ill defined inversion of the matrix
P† B_s(2Θ, 0) P. To be more precise, consider a two-electron problem. The matrix
P then consists of two column-orthonormal vectors v_1(0) and v_2(0) which, after
propagation along the imaginary time axis, will be dominated by the largest scales in
B_s(2Θ, 0), so that v_1(2Θ) = v_2(2Θ) + ε, where v_1(2Θ) = B_s(2Θ, 0) v_1. It is the
information contained in ε which renders the matrix P† B_s(2Θ, 0) P non-singular.
For large values of Θ this information is lost in round-off errors.
To circumvent this problem a set of matrix decomposition techniques was de-
veloped [58, 59, 61]. Those matrix decomposition techniques are best introduced
with the Gram-Schmidt orthonormalization of N_p linearly independent vec-
tors. At imaginary time τ, B^> ≡ B_s(τ, 0)P is given by the N_p vectors v_1 … v_{N_p}.
Orthogonalizing those vectors yields
Orthogonalizing those vectors yields
v ′1 = v 1
v 2 · v ′1 ′
v ′2 = v 2 − v
v ′1 · v ′1 1
..
.
Np −1
v Np · v ′i
v ′Np = v Np − ′
′ · v′ vi . (10.148)
i=1
v i i
This procedure corresponds to the decomposition B^> = U_R D_R V_R, where the
columns of U_R are the normalized vectors v′_i/|v′_i|, D is a diagonal matrix containing
the scales, and V_R is unit upper triangular. One can repeat the procedure to
obtain B^< ≡ P† B_s(2Θ, τ) = V_L D_L U_L. The Green function for the PQMC is now
particularly easy to compute:
1 − G_s(τ) = B^> (B^< B^>)^{−1} B^<
= U_R D_R V_R \big( V_L D_L U_L U_R D_R V_R \big)^{−1} V_L D_L U_L
= U_R D_R V_R (D_R V_R)^{−1} (U_L U_R)^{−1} (V_L D_L)^{−1} V_L D_L U_L
= U_R (U_L U_R)^{−1} U_L . (10.151)
Thus, in the PQMC, all scales which are at the origin of the numerical instabilities
disappear from the problem when computing Green functions. Since the entire algo-
rithm relies solely on the knowledge of the Green function, the above stabilization
procedure leaves the physical results invariant. Note that although appealing, the
Gram-Schmidt orthonormalization is itself unstable, and hence it is more appropri-
ate to use singular value decompositions based on Householder's method to obtain
the above UDV-form of the B matrices [74]. In practice the frequency at which the
stabilization is carried out is problem dependent. Typically, for the Hubbard model
with Δτ t = 0.125, stabilization at every 10th time slice produces excellent accuracy.
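A minimal sketch of such a stabilized propagation is shown below; it uses QR factorizations in place of the singular value decomposition, a legitimate substitute for the PQMC since only the column space of the propagated matrix enters (10.151). All names are our own:

```python
import numpy as np

def stabilized_columns(b_list, p):
    """Propagate the Slater-determinant columns P through B_m ... B_1,
    re-orthogonalizing with a QR decomposition after each multiplication.
    Only the orthonormal factor Q is kept: for the PQMC Green function
    (10.151) the D and V factors drop out."""
    q = np.linalg.qr(p)[0]
    for b in b_list:
        q = np.linalg.qr(b @ q)[0]
    return q

def pqmc_green(q_right, q_left):
    """G = 1 - U_R (U_L U_R)^{-1} U_L, cf. (10.151); q_right and q_left are
    the stabilized column bases of B(tau,0)P and (P^dag B(2Theta,tau))^dag."""
    ul = q_left.conj().T
    n = q_right.shape[0]
    return np.eye(n) - q_right @ np.linalg.solve(ul @ q_right, ul)
```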
The stabilization procedure for the finite-temperature algorithm is more subtle
since scales do not drop out in the calculation of the Green function. Below, we
provide two ways of computing the Green function.
The first approach relies on the identity

\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} (A − B D^{−1} C)^{−1} & (C − D B^{−1} A)^{−1} \\ (B − A C^{−1} D)^{−1} & (D − C A^{−1} B)^{−1} \end{pmatrix} , (10.152)

where A, B, C and D are matrices. Using the above, we obtain

\begin{pmatrix} 1 & B_s(β, τ) \\ −B_s(τ, 0) & 1 \end{pmatrix}^{-1} = \begin{pmatrix} G_s(0) & −(1 − G_s(0)) B_s^{−1}(τ, 0) \\ B_s(τ, 0) G_s(0) & G_s(τ) \end{pmatrix} . (10.153)
The diagonal terms on the right hand side of the above equation correspond to
the desired equal-time Green functions. The off-diagonal terms are nothing but the
time displaced Green functions, see (10.125) and (10.126). To evaluate the left hand
side of the above equation, we first have to bring B_s(τ, 0) and B_s(β, τ) into UDV
forms. This has to be done step by step so as to avoid mixing large and small scales.
Consider the propagation B_s(τ, 0), and a time interval τ_1, with nτ_1 = τ, for which
the different scales in B_s(nτ_1, (n − 1)τ_1) do not exceed machine precision. Since
B_s(τ, 0) = B_s(nτ_1, (n − 1)τ_1) ⋯ B_s(τ_1, 0), we can evaluate B_s(τ, 0) for n = 2
with

B_s(2τ_1, τ_1) \underbrace{B_s(τ_1, 0)}_{U_1 D_1 V_1} = \underbrace{\big( (B_s(2τ_1, τ_1) U_1) D_1 \big)}_{U_2 D_2 V} V_1 = U_2 D_2 V_2 , (10.154)

where V_2 = V V_1. The parentheses determine the order in which the matrix mul-
tiplications are to be done. In all operations, mixing of scales is avoided. After the
multiplication with the diagonal matrix D_1, scales are again separated with the use of
the singular value decomposition.
Thus, for B_s(τ, 0) = U_R D_R V_R and B_s(β, τ) = V_L D_L U_L we have to invert

\begin{pmatrix} 1 & V_L D_L U_L \\ −U_R D_R V_R & 1 \end{pmatrix}^{-1}
= \Bigg[ \begin{pmatrix} V_L & 0 \\ 0 & U_R \end{pmatrix} \underbrace{\begin{pmatrix} (V_R V_L)^{−1} & D_L \\ −D_R & (U_L U_R)^{−1} \end{pmatrix}}_{UDV} \begin{pmatrix} V_R & 0 \\ 0 & U_L \end{pmatrix} \Bigg]^{-1}
= \begin{pmatrix} V_R^{−1} & 0 \\ 0 & U_L^{−1} \end{pmatrix} V^{−1} D^{−1} U^{−1} \begin{pmatrix} V_L^{−1} & 0 \\ 0 & U_R^{−1} \end{pmatrix} . (10.155)
In the above, all matrix multiplications are well defined. In particular, the matrix
D contains only large scales, since the matrices (V_R V_L)^{−1} and (U_L U_R)^{−1} act as
a cutoff to the exponentially small scales in D_L and D_R. This method of computing
Green functions is very stable and has the advantage of producing time displaced
Green functions. However, it is numerically expensive since the matrices involved
are twice as big as the B matrices.
Alternative methods to compute G_s(τ), which involve matrix manipulations only
of the size of B, include

G_s(τ) = \big( 1 + U_R D_R V_R V_L D_L U_L \big)^{−1} = U_L^{−1} \underbrace{\big( (U_L U_R)^{−1} + D_R (V_R V_L) D_L \big)}_{UDV}{}^{-1} U_R^{−1} . (10.156)

Again, (U_L U_R)^{−1} acts as a cutoff to the small scales in D_R (V_R V_L) D_L, so that
D contains only large scales.
The accuracy of both presented methods may be tested in the following way:
Given the Green function at time τ, we can upgrade and wrap, see (10.128), this
Green function to time slice τ + τ_1. Of course, for the time interval τ_1 the involved
scales should lie within the accuracy of the computer, ∼ 10^{−12} for double precision
numbers. The Green function at time τ + τ_1 obtained in this way may be compared
to the one computed from scratch using (10.155) or (10.156). For a 4 × 4 half-
filled Hubbard model at U/t = 4, βt = 20, Δτ t = 0.1 and τ_1 = 10Δτ, we
obtain an average (maximal) difference between the matrix elements of both Green
functions of 10^{−10} (10^{−6}), which is orders of magnitude smaller than the statistical
uncertainty. Had we chosen τ_1 = 50Δτ, the accuracy would drop to 0.01 and 100.0
for the average and maximal differences, respectively.
The Monte Carlo sampling used in the auxiliary field approach is based on a single
spin-flip algorithm. Acceptance or rejection of this spin flip requires the knowledge
of the ratio
Ps′
R= , (10.157)
Ps
where s and s′ differ only at one point in space, i, and imaginary time, n. For the
Ising field required to decouple the Hubbard interaction, (10.236) and (10.239),

s′_{i′,n′} = \begin{cases} s_{i′,n′} & \text{if } i′ ≠ i \text{ or } n′ ≠ n \\ −s_{i,n} & \text{if } i′ = i \text{ and } n′ = n \end{cases} . (10.158)
The calculation of R boils down to computing the ratio of two determinants:

R = \begin{cases} \dfrac{\det[1 + B_{s′}(β, 0)]}{\det[1 + B_s(β, 0)]} & \text{for the FTQMC} \\[2ex] \dfrac{\det[P† B_{s′}(2Θ, 0) P]}{\det[P† B_s(2Θ, 0) P]} & \text{for the PQMC} \end{cases} . (10.159)
For the Hubbard interaction with the HS transformation of (10.236), only the matrix
V(s_n) will be affected by the move. Hence, with

e^{V(s′_n)} = \Big( 1 + \underbrace{\big[ e^{V(s′_n)} e^{−V(s_n)} − 1 \big]}_{Δ} \Big) e^{V(s_n)} (10.160)

we have

B_{s′}(•, 0) = B_s(•, τ) (1 + Δ) B_s(τ, 0) , (10.161)

where the • stands for 2Θ or β and τ = nΔτ.
For the FTQMC, the ratio is given by

R = \frac{\det[ 1 + (1 + Δ) B_s(τ, 0) B_s(β, τ) ]}{\det[ 1 + B_s(τ, 0) B_s(β, τ) ]} = \det[ 1 + Δ (1 − G_s(τ)) ] , (10.162)

where the last equality follows from the fact that the equal-time Green function reads
G_s(τ) = (1 + B_s(τ, 0) B_s(β, τ))^{−1}. Hence the ratio is uniquely determined from
the knowledge of the equal-time Green function.
Let us now compute the ratio for the PQMC. Introducing the notation B^<_s =
P† B_s(2Θ, τ) and B^>_s = B_s(τ, 0) P, again we have to evaluate

\frac{\det[ B^<_s (1 + Δ^{(i)}) B^>_s ]}{\det[ B^<_s B^>_s ]} = \det\big[ B^<_s (1 + Δ^{(i)}) B^>_s (B^<_s B^>_s)^{−1} \big]
= \det\big[ 1 + B^<_s Δ^{(i)} B^>_s (B^<_s B^>_s)^{−1} \big]
= \det\big[ 1 + Δ^{(i)} B^>_s (B^<_s B^>_s)^{−1} B^<_s \big] , (10.163)

where the last equation follows from the identity det[1 + AB] = det[1 + BA] for
arbitrary rectangular matrices³. We can recognize the Green function of the PQMC,
B^>_s (B^<_s B^>_s)^{−1} B^<_s = 1 − G_s(τ). The result is thus identical to that of the FTQMC
provided that we replace the finite-temperature equal-time Green function with the
zero-temperature one. Hence, in both algorithms, the ratio is essentially given by
the equal-time Green function which, at this point, we know how to compute in a
numerically stable manner.
Having calculated the ratio R for a single spin flip one may now decide stochas-
tically within, for example, a Metropolis scheme if the move is accepted or not.
In case of acceptance, we have to update the Green function since this quantity is
required at the next step.
Since in general the matrix Δ has only a few non-zero entries, it is convenient
to use the Sherman-Morrison formula [74], which states that

(A + u ⊗ v)^{−1} = (1 + A^{−1} u ⊗ v)^{−1} A^{−1}
= [ 1 − A^{−1} u ⊗ v + A^{−1} u ⊗ v \, A^{−1} u ⊗ v − \cdots ] A^{−1}
= [ 1 − A^{−1} u ⊗ v \,(1 − λ + λ^2 − \cdots) ] A^{−1} , \quad λ ≡ v · A^{−1} u
= A^{−1} − \frac{A^{−1} u ⊗ v A^{−1}}{1 + v · A^{−1} u} , (10.164)

where A is an N × N matrix, u and v are N-dimensional vectors, and the tensor product
is defined as (u ⊗ v)_{x,y} = u_x v_y.
To show how to use this formula for the updating of the Green function, let us
first assume that the matrix Δ has only one non-vanishing entry, Δ_{x,y} = δ_{x,z} δ_{y,z′} η^{(z,z′)}.
In the case of the FTQMC we will then have to compute

G_{s′}(τ) = [ 1 + (1 + Δ) B_s(τ, 0) B_s(β, τ) ]^{−1}
= B_s^{−1}(β, τ) [ 1 + B_s(β, τ)(1 + Δ) B_s(τ, 0) ]^{−1} B_s(β, τ)
= B_s^{−1}(β, τ) [ 1 + B_s(β, τ) B_s(τ, 0) + u ⊗ v ]^{−1} B_s(β, τ) , (10.165)

where u_x = [B_s(β, τ)]_{x,z} η^{(z,z′)} and v_x = [B_s(τ, 0)]_{z′,x}.
³ This identity may be formally proven by using the relation det(1 + AB) =
exp(Tr log(1 + AB)), expanding the logarithm, and using the cyclic properties of the
trace.
Using the Sherman-Morrison formula for inverting 1 + B_s(β, τ) B_s(τ, 0) + u ⊗ v
yields

[G_{s′}(τ)]_{x,y} = [G_s(τ)]_{x,y} − \frac{[G_s(τ)]_{x,z} \, η^{(z,z′)} \, [1 − G_s(τ)]_{z′,y}}{1 + η^{(z,z′)} [1 − G_s(τ)]_{z′,z}} . (10.166)
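The following sketch (with illustrative random matrices) implements (10.166) and checks it against a recomputation from scratch; the check passes to machine precision:

```python
import numpy as np

def update_green(G, z, zp, eta):
    """Rank-1 update of the equal-time Green function after a single
    spin flip, cf. (10.166); Delta has the single entry Delta[z, zp] = eta."""
    row = -G[zp, :].copy()          # the row (1 - G)_{zp, :}
    row[zp] += 1.0
    return G - eta * np.outer(G[:, z], row) / (1.0 + eta * row[z])

# consistency check against a direct inversion
rng = np.random.default_rng(0)
N, z, zp, eta = 6, 2, 4, 0.7
B = np.eye(N) + 0.3 * rng.normal(size=(N, N))   # stand-in for B(tau,0)B(beta,tau)
G = np.linalg.inv(np.eye(N) + B)
Delta = np.zeros((N, N)); Delta[z, zp] = eta
G_direct = np.linalg.inv(np.eye(N) + (np.eye(N) + Delta) @ B)
assert np.allclose(update_green(G, z, zp, eta), G_direct)
```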
Precisely the same equation holds for the PQMC provided that one replaces the
finite-temperature Green function by the zero-temperature one. To show this, one
will first compute
(B^<_{s′} B^>_{s′})^{−1} = \big[ B^<_s (1 + Δ) B^>_s \big]^{−1} = \big[ B^<_s B^>_s + u ⊗ v \big]^{−1}
= (B^<_s B^>_s)^{−1} − \frac{(B^<_s B^>_s)^{−1} u ⊗ v (B^<_s B^>_s)^{−1}}{1 + η^{(z,z′)} [1 − G^0_s(τ)]_{z′,z}} (10.167)

with u_x = [B^<_s]_{x,z} η^{(z,z′)} and v_x = [B^>_s]_{z′,x}. Here x runs from 1 … N_p, where N_p
corresponds to the number of particles contained in the trial wave function, and the
zero-temperature Green function reads G^0_s(τ) = 1 − B^>_s (B^<_s B^>_s)^{−1} B^<_s. After some
straightforward algebra, one obtains

\big[ G^0_{s′}(τ) \big]_{x,y} = \big[ 1 − (1 + Δ) B^>_s ( B^<_s (1 + Δ) B^>_s )^{−1} B^<_s \big]_{x,y}
= \big[ G^0_s(τ) \big]_{x,y} − \frac{\big[ G^0_s(τ) \big]_{x,z} \, η^{(z,z′)} \, \big[ 1 − G^0_s(τ) \big]_{z′,y}}{1 + η^{(z,z′)} \big[ 1 − G^0_s(τ) \big]_{z′,z}} . (10.168)
In the above, we have assumed that the matrix Δ has only a single non-zero
entry. In general, it is convenient to work in a basis where Δ is diagonal with n non-
vanishing eigenvalues. One will then iterate the above procedure n times to upgrade
the Green function.
In Sect. 10.6.4 we introduced the time displaced Green functions both within the
ground-state and finite-temperature formulations. Our aim here is to show how to
compute them in a numerically stable manner. We will first start with the FTQMC
and then concentrate on the PQMC.
10.7.3.1 FTQMC
G_s(τ_1, τ_2)_{x,y} = ⟨c_x(τ_1) c†_y(τ_2)⟩_s = [ B_s(τ_1, τ_2) G_s(τ_2) ]_{x,y} , \quad τ_1 > τ_2 , (10.169)

and

G_s(τ_2, τ_1)_{x,y} = −[ (1 − G_s(τ_2)) B_s^{−1}(τ_1, τ_2) ]_{x,y} , (10.170)

where τ_2 < τ_1.
With the above method, we have access to all time displaced Green functions
G_s(0, τ) and G_s(τ, 0). However, we do not use translation invariance in imaginary
time. Clearly, using this symmetry in the calculation of time displaced quantities will
reduce the fluctuations, which may sometimes be desirable. A numerically expensive
but elegant way of producing all time displaced Green functions relies on the inver-
sion of the matrix O given in (10.129). Here, provided that τ_1 is small enough so
that the scales involved in B_s(τ + τ_1, τ) fit on finite precision machines, the matrix
inversion of O is numerically stable and yields the Green functions between
arbitrary time slices nτ_1 and mτ_1. For β/τ_1 = l, the matrix to invert has di-
mension l times the size of the B matrices, and is hence expensive to compute. It is
worth noting that on vector machines the performance grows with growing vector
size, so that the above method can become attractive. Having computed the Green
functions G_s(nτ_1, mτ_1), we can obtain Green functions on any two time slices by
using equations of the type (10.171).
10.7.3.2 PQMC
G_s\Big(Θ − \frac{τ}{2}, Θ + \frac{τ}{2}\Big)_{x,y} = −\Big[ \Big( 1 − G_s\Big(Θ − \frac{τ}{2}\Big) \Big) B_s^{−1}\Big(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}\Big) \Big]_{x,y} , (10.172)
Consider, for example, free electrons on a lattice, H = −t Σ_{⟨i,j⟩} c†_i c_j − μ Σ_i c†_i c_i,
where the sum runs over nearest neighbors. For this Hamiltonian one has

⟨Ψ_0| c†_k(τ) c_k |Ψ_0⟩ = e^{τ(ε_k − μ)} ⟨Ψ_0| c†_k c_k |Ψ_0⟩ . (10.174)
The above involves only well-defined numerical manipulations, even in the large-
τ limit, provided that all scales fit onto finite precision machines for a unit time
interval.
The implementation of this idea in the QMC algorithm is as follows. First, one
has to notice that the Green function G_s(τ) is a projector:

G_s(τ)^2 = G_s(τ) , (10.176)

that is,

G_s^2(τ) = G_s(τ) , \quad (1 − G_s(τ))^2 = 1 − G_s(τ) . (10.177)
G_s\Big(Θ + \frac{τ}{2}, Θ − \frac{τ}{2}\Big) = \prod_{n=0}^{N−1} G_s\Big( Θ − \frac{τ}{2} + [n + 1]τ_1 , \, Θ − \frac{τ}{2} + nτ_1 \Big) , (10.180)

with Nτ_1 = τ.
The above equation is the generalization of (10.175). If τ1 is small enough each
Green function in the above product is accurate and has matrix elements bounded
by order unity. The matrix multiplication is then numerically well defined.
We conclude this section by comparing with a different approach to computing
imaginary-time correlation functions in the framework of the PQMC [63]. We con-
sider the special case of the Kondo lattice model (see Fig. 10.16). As is apparent, the re-
sults are identical within error bars. The important point, however, is that the method
based on (10.180) is, for the considered case, an order of magnitude quicker in CPU
time than the method of [63].
Fig. 10.16. Imaginary time displaced on-site spin-spin correlation function (a) and Green
function (b). We consider a 6 × 6 lattice at half-filling and J/t = 1.2. In both (a) and (b)
results obtained from (10.180) (△) and from an alternative approach presented in [63] (▽)
are plotted
with 1 ≤ nτ ≤ n.
At this stage we can sequentially upgrade the Hubbard Stratonovich fields from
τ = β to τ = ∆τ . In doing so, we will take care of storing information to subse-
quently carry out a sweep from τ = ∆τ to τ = β.
10.7.4.1 From τ = β to τ = Δτ
Fig. 10.17. Each line (solid or dashed) denotes a time slice separated by an imaginary
time propagation ∆τ . The solid lines correspond to time slices where we store the U DV -
decomposition of the matrices Bs (β, nτ τ1 ) or Bs (nτ τ1 , 0) depending upon the direction of
the propagation ( 1 ≤ nτ ≤ n)
We will repeat the above procedure till we arrive at time slice τ = (n_τ − 1)τ_1.
At this stage, we have to recompute the equal-time Green function due to the ac-
cumulation of round-off errors and the resulting loss of precision. To do so, we read
from the storage U_R = U_{n_τ−1}, D_R = D_{n_τ−1} and V_R = V_{n_τ−1}, such that
B_s((n_τ − 1)τ_1, 0) = U_R D_R V_R. Note that we have not yet upgraded the Hubbard-
Stratonovich fields involved in B_s((n_τ − 1)τ_1, 0), so that this storage slot is still
up to date. We then compute the matrix B_s(n_τ τ_1, (n_τ − 1)τ_1) and read from the
storage Ṽ_L = V_{n_τ}, D̃_L = D_{n_τ} and Ũ_L = U_{n_τ}, such that B_s(β, n_τ τ_1) = Ṽ_L D̃_L Ũ_L.
With this information and the computed matrix B_s(n_τ τ_1, (n_τ − 1)τ_1) we will cal-
culate B_s(β, (n_τ − 1)τ_1) = V_L D_L U_L, see (10.154). We now store this result as
V_{n_τ−1} = V_L, D_{n_τ−1} = D_L and U_{n_τ−1} = U_L, and recompute the Green function.
Note that, as a cross check, one can compare both Green functions to test the numer-
ical accuracy. Hence, we now have a fresh estimate of the Green function at time
slice τ = (n_τ − 1)τ_1 and we can iterate the procedure till we arrive at time slice Δτ.
Hence, in this manner, we sweep down from time slice β to time slice Δτ,
upgrade sequentially all the Hubbard-Stratonovich fields, and have stored the
UDV-decompositions of the matrices B_s(β, n_τ τ_1)
with 0 ≤ n_τ ≤ n. We can now carry out a sweep from Δτ to β and take care of
storing the information required for the sweep from β to ∆τ .
10.7.4.2 From τ = Δτ to β
We initially set nτ = 0, read out from the storage Bs (β, 0) = V0 D0 U0 and compute
the Green function on time slice τ = 0. This storage slot is then set to unity such
that Bs (0, 0) = U0 D0 V0 ≡ 1.
Assuming that we are on time slice τ = n_τ τ_1, we propagate the Green function
to time slice τ + Δτ with

G_s(τ + Δτ) = B_s(τ + Δτ, τ) G_s(τ) B_s^{−1}(τ + Δτ, τ) ,

see (10.128), and upgrade the Hubbard-Stratonovich fields on time slice τ + Δτ. The above pro-
cedure is repeated till we reach time slice (n_τ + 1)τ_1, where we have to recompute
the Green function. To do so, we read from the storage V_L = V_{n_τ+1}, D_L = D_{n_τ+1}
and U_L = U_{n_τ+1}, such that B_s(β, (n_τ + 1)τ_1) = V_L D_L U_L. We then compute
B_s((n_τ + 1)τ_1, n_τ τ_1) and, from the UDV-form of B_s(n_τ τ_1, 0) which we obtain
from the storage slot n_τ, we calculate B_s((n_τ + 1)τ_1, 0) = U_R D_R V_R. The result of
the calculation is stored in slot n_τ + 1, and we recompute the Green function on time
slice (n_τ + 1)τ_1. We can now proceed till we reach time slice β, where we will have
accumulated all the information required for carrying out a sweep from β to Δτ.
This completes a possible implementation of the finite-temperature method. The
zero-temperature method follows exactly the same logic. However, it turns out that
it is more efficient to keep track of (P † Bs (2Θ, 0)P )−1 since (i) it is of dimension
Np × Np in contrast to the Green function which is a N × N matrix, and (ii) it is τ
independent. When Green functions are required they are computed from scratch.
H − μN = H_0 + H_U (10.185)

with H_0 the one-body part of the Anderson model, containing the conduction band,
the f-level, and the hybridization of strength V between conduction electrons and
the impurity orbital, and H_U the Hubbard interaction on the f-orbital.
For an extensive overview of the Anderson and related Kondo model, we refer
the reader to [78].
In the next section, we will review the finite-temperature formalism. Since the
CPU time scales as β³, it is expensive to obtain ground-state properties, and projec-
tive formulations of the Hirsch-Fye algorithm become attractive. This is
the topic of Sect. 10.8.2.
In Sect. 10.6 we have shown that the grand-canonical partition function may be writ-
ten as
Z ≡ Tr\big[ e^{−β(H−μN)} \big] = \sum_s \prod_σ \det\big[ 1 + B^σ_m B^σ_{m−1} \cdots B^σ_1 \big] (10.187)

with mΔτ = β.
To define the matrices Bnσ , we will label all the orbitals (conduction and im-
purity) with the index i and use the convention that i = 0 denotes the f -orbital
and i = 1 . . . N the conduction orbitals. We will furthermore define the fermionic
operators
a†_{i,σ} = \begin{cases} f†_σ & \text{if } i = 0 \\ c†_{i,σ} & \text{otherwise} \end{cases} , (10.188)
such that the non-interacting term of the Anderson model takes the form

H_0 = \sum_σ H^σ_0 , \quad H^σ_0 = \sum_{i,j} a†_{i,σ} (h_0)_{i,j} a_{j,σ} . (10.189)
with

O^σ = \begin{pmatrix} 1 & 0 & \cdots & 0 & B^σ_1 \\ −B^σ_2 & 1 & 0 & \cdots & 0 \\ 0 & −B^σ_3 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & \\ 0 & \cdots & 0 & −B^σ_m & 1 \end{pmatrix} . (10.192)
The above identity follows by considering – omitting spin indices – the matrix
A = O − 1. Since

Tr[A^n] = \sum_r δ_{n,rm} (−1)^{r(m+1)} m \, Tr[(B_m \cdots B_1)^r] (10.193)

we obtain

\det O = e^{Tr \ln(1+A)} = e^{\sum_{n=1}^{∞} \frac{(−1)^{n+1}}{n} Tr[A^n]} = e^{\sum_{r=1}^{∞} \frac{(−1)^{r+1}}{r} Tr[(B_m \cdots B_1)^r]} = \det[ 1 + B_m \cdots B_1 ] .
For two HS field configurations s and s′, with corresponding potentials V^σ
and V′^σ, the Green functions g^σ and g′^σ satisfy the following Dyson equation:

g^σ = g′^σ + g′^σ Δ^σ (1 − g^σ) \quad \text{with} \quad Δ^σ = ( e^{V′^σ} e^{−V^σ} − 1 ) . (10.199)

This follows from

g̃ ≡ Õ^{−1} = \big[ Õ′ + \underbrace{(Õ − Õ′)}_{= e^{−V} − e^{−V′}} \big]^{−1} = g̃′ − g̃′ \big( e^{−V} − e^{−V′} \big) g̃ . (10.201)
The starting point of the algorithm is to compute the Green function for a random
HS configuration of Ising spins s′. We will only need the Green function for the
impurity f-site. Let x = (τ_x, i_x) with Trotter index τ_x and orbital i_x. Since

( e^{V′} e^{−V} − 1 )_{x,y} = ( e^{V′} e^{−V} − 1 )_{x,x} δ_{x,y} δ_{i_x,0} , (10.203)

we can use the Dyson equation only for the impurity Green function:

g^σ_{f,f′} = g′^σ_{f,f′} + \sum_{f″} g′^σ_{f,f″} Δ^σ_{f″,f″} (1 − g^σ)_{f″,f′} . (10.204)
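Solving this Dyson equation for a given HS field amounts to a single m × m linear solve in the impurity (Trotter-time) indices; a minimal sketch in our own notation reads:

```python
import numpy as np

def impurity_green(g0, delta):
    """Solve the impurity Dyson equation g = g0 + g0 @ delta @ (1 - g),
    cf. (10.204).  Rearranging gives (1 + g0 @ delta) g = g0 (1 + delta),
    an m x m linear problem; g0 and delta are m x m matrices."""
    m = len(g0)
    return np.linalg.solve(np.eye(m) + g0 @ delta, g0 @ (np.eye(m) + delta))
```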
10.8.1.0.2 Upgrading
At this point we have computed the impurity Green function for a given HS config-
uration s. Adopting a single spin flip algorithm we will propose the configuration
s′_f = \begin{cases} −s_f & \text{if } f = f_1 \\ s_f & \text{otherwise} \end{cases} (10.207)
with

R^σ = \frac{\det\big[ 1 + B′^σ_m B′^σ_{m−1} \cdots B′^σ_1 \big]}{\det\big[ 1 + B^σ_m B^σ_{m−1} \cdots B^σ_1 \big]} = \det\big[ g^σ (g′^σ)^{−1} \big] = \det\big[ 1 + Δ^σ (1 − g^σ) \big] . (10.209)
The matrix Δ^σ has only one non-zero matrix element, Δ^σ_{f_1,f_1}. Hence, R^σ = 1 + Δ^σ_{f_1,f_1} (1 − g^σ_{f_1,f_1}). Since the impurity
Green function g^{I,σ} is at hand, we can readily compute R.
If the move (spin flip) is accepted, we will have to recalculate (upgrade) the
impurity Green function. From the Dyson equation (10.206), we have
g′^{I,σ} = g^{I,σ} \big[ 1 + Δ^{I,σ} (1 − g^{I,σ}) \big]^{−1} . (10.210)

To compute [1 + Δ^{I,σ}(1 − g^{I,σ})]^{−1} we can use the Sherman-Morrison formula of
(10.164). Setting A = 1, u_f = Δ^{I,σ}_{f_1,f_1} δ_{f_1,f} and v_f = (1 − g^{I,σ})_{f_1,f}, we obtain

g′^{I,σ}_{f,f′} = g^{I,σ}_{f,f′} + \frac{g^{I,σ}_{f,f_1} Δ^σ_{f_1,f_1} (g^{I,σ} − 1)_{f_1,f′}}{1 + (1 − g^{I,σ})_{f_1,f_1} Δ^σ_{f_1,f_1}} . (10.211)
Thus, the upgrading of the Green function under a single spin flip is an operation
which scales as m2 . Since for a single sweep we have to visit all spins, the compu-
tational cost of a single sweep scales as m3 .
By construction, the Hirsch-Fye algorithm is free from numerical stabilization
problems. For the Anderson model considered here, it has recently been shown that
there is no sign problem, irrespective of the conduction band electron density [79].
Clearly the attractive feature of the Hirsch-Fye impurity algorithm is that the algo-
rithm may be formulated directly in the thermodynamic limit. This is not possible
within the lattice formulation of the auxiliary field QMC method: Within that ap-
proach the dimension of the matrices scales as the total number of orbitals N, and
the CPU time for a single sweep as N³β. The Hirsch-Fye algorithm is not limited
to impurity models. However, when applied to lattice models, such as the Hubbard
model, it is not efficient, since the CPU time will scale as (βN)³.
To conclude this section we show a typical example of the use of the Hirsch-Fye
algorithm for the Kondo model
H= ε(k)c†k,σ ck,σ + JS Ic · S If . (10.212)
k,σ
For the Monte Carlo formulation, the same ideas as for the lattice problem may
be used for the HS decoupling of the interaction as well as to impose the constraint
of no charge fluctuations on the f -sites. Figure 10.18 plots the impurity spin suscep-
tibility
β
χ = dτ S If (τ ) · S If
I
(10.213)
0
Fig. 10.18. Impurity spin susceptibility Tχ^I of the single-impurity Kondo model versus
T/T_K, as computed with the Hirsch-Fye impurity algorithm [80], for J/t = 2.0 (T^I_K/t = 0.21),
J/t = 1.6 (T^I_K/t = 0.12) and J/t = 1.2 (T^I_K/t = 0.06)
which has |Ψ_T⟩ as a non-degenerate ground state. In the above, and in the context
of the Anderson model, a_{j,σ} denotes c- or f-fermionic operators. Our aim is to
compute

\frac{⟨Ψ_T| e^{−\frac{Θ}{2} H} O e^{−\frac{Θ}{2} H} |Ψ_T⟩}{⟨Ψ_T| e^{−ΘH} |Ψ_T⟩} ≡ \lim_{β→∞} \frac{Tr\big[ e^{−\frac{Θ}{2} H} O e^{−\frac{Θ}{2} H} e^{−βH_T} \big]}{Tr\big[ e^{−ΘH} e^{−βH_T} \big]} (10.215)
and subsequently take the limit Θ → ∞. As apparent, the above equation provides a
link between the finite temperature and projection approaches. To proceed, we will
consider the right hand side of the above equation and retrace the steps carried out
for the standard finite-temperature formulation of the Hirsch-Fye algorithm. After
Trotter decomposition and discrete Hubbard Stratonovich transformation we obtain
⟨Ψ_T| e^{−ΘH} |Ψ_T⟩ = \lim_{β→∞} \sum_s \prod_σ \det\big[ 1 + B^σ_m B^σ_{m−1} \cdots B^σ_1 e^{−βh_T} \big] (10.216)
with m∆τ = Θ. Replacing B1σ by B1σ exp(−βhT ) in (10.192) and following the
steps described for the finite-temperature version, we derive a Dyson equation (omit-
ting spin indices) for the ground-state Green function matrix g0
g^σ_0 = g′^σ_0 + g′^σ_0 Δ^σ (1 − g^σ_0) , \quad Δ^σ = ( e^{V′^σ} e^{−V^σ} − 1 ) , (10.217)

with

g_0 = \begin{pmatrix} G_0(1,1) & G_0(1,2) & \cdots & G_0(1,m) \\ G_0(2,1) & G_0(2,2) & \cdots & G_0(2,m) \\ \vdots & & & \vdots \\ G_0(m,1) & G_0(m,2) & \cdots & G_0(m,m) \end{pmatrix} (10.218)
Fig. 10.19. Comparison between the zero- and finite-temperature Hirsch-Fye algorithms for
the symmetric Anderson model with a 1D density of states: ⟨f†_↑ f_↑ f†_↓ f_↓⟩ versus 1/βt
as well as spin excitations were investigated in detail. One can show numerically
that the quasiparticle residue in the vicinity of k = (π, π) tracks the Kondo scale
of the corresponding single-impurity problem. This statement is valid both in the
magnetically ordered and disordered phases [102]. This suggests that the coherence
temperature tracks the Kondo scale. Furthermore, the effect of a magnetic field on
the Kondo insulating state was investigated. For the particle-hole symmetric con-
duction band, the results show a transition from the Kondo insulator to a canted antifer-
romagnet [103, 104]. Finally, models with regular depletion of localized spins can
be investigated [80]. Within the framework of those models, the typical form of the
resistivity versus temperature can be reproduced.
The most common application of the Hirsch-Fye algorithm is in the framework
of dynamical mean-field theories [77] which map the Hubbard model onto an An-
derson impurity problem supplemented by a self-consistency loop. At each iteration,
the Hirsch-Fye algorithm is used to solve the impurity problem at finite tempera-
ture [76] or at T = 0 [81]. For this particular problem, many competing methods
such as DMRG [105] and NRG [106] are available. In the dynamical mean-field
approximation spatial fluctuations are frozen out. To reintroduce them, one has to
generalize to cluster methods such as the dynamical cluster approximation (DCA)
[107] or cellular-DMFT (CDMFT) [108]. Within those approaches, the complexity
of the problem to solve at each iteration is that of an N -impurity Anderson model
(N corresponds to the cluster size). Generalizations of DMRG and NRG to solve
this problem are difficult. On the other hand, as a function of cluster size the sign
problem in the Hirsch-Fye approach becomes more and more severe but is, in many
instances, still tractable. It however proves to be one of the limiting factors in achiev-
ing large cluster sizes.
10.10 Conclusion
We have discussed in detail a variety of algorithms which can broadly be classified
as world-line based or determinantal algorithms. For fermionic models, such as the
Hubbard model, the determinantal QMC algorithm should be employed because of
the reduced sign problem in this formulation. For purely 1D fermion systems and
for spin models the world-line algorithms are available, which have lower autocor-
relations and better scaling: they scale almost linearly with system size,
in contrast to the cubic scaling of the determinantal algorithms.
The Trotter decomposition,

e^{−βH} = \big( e^{−Δτ H_1} e^{−Δτ H_2} \big)^m + O(Δτ) ,

where mΔτ = β, underlies many QMC algorithms. For [H_1, H_2] ≠ 0 and finite values of the time step Δτ, it
introduces a systematic error. In many QMC algorithms we will not take the limit
Δτ → 0, and it is important to understand the order of the systematic error produced
by the above decomposition. A priori, it is of the order Δτ. However, in many non-
trivial cases, the prefactor of the error of order Δτ vanishes [109].
For a time step Δτ,

e^{−Δτ(H_1+H_2)} = e^{−Δτ H_1} e^{−Δτ H_2} − \frac{Δτ^2}{2} [H_1, H_2] + O(Δτ^3) , (10.222)

such that

e^{−Δτ(H − (Δτ/2)[H_1,H_2])} = e^{−Δτ H_1} e^{−Δτ H_2} + O(Δτ^3) . (10.223)

We can now raise both sides of the former equation to the power m:

e^{−β(H − (Δτ/2)[H_1,H_2])} = \big[ e^{−Δτ H_1} e^{−Δτ H_2} \big]^m + O(Δτ^2) . (10.224)

The systematic error is now of order Δτ², since in the exponentiation the systematic
error of order Δτ³ occurs m times and mΔτ = β.
To evaluate the left-hand side of the above equation we use time-dependent perturbation theory. Let h = h_0 + h_1, where h_1 is small in comparison to h_0. The imaginary-time propagation in the interaction picture reads
U_I(τ) = e^{τ h_0} e^{-τ h} , (10.225)
such that
∂U_I(τ)/∂τ = e^{τ h_0} (h_0 − h) e^{-τ h} = − h_1^I(τ) U_I(τ) , with h_1^I(τ) ≡ e^{τ h_0} h_1 e^{-τ h_0} .
A† = − ∫_0^β dτ e^{-τH} [H_1, H_2] e^{-(β−τ)H} = − ∫_0^β dτ′ e^{-(β−τ′)H} [H_1, H_2] e^{-τ′H} = −A , (10.230)
where we have carried out the substitution τ′ = β − τ. Since A is an anti-Hermitian operator it follows that Tr[A] = Tr[A†] = −Tr[A], as well as Tr[AO] = −Tr[AO].
Recall that the observable O is a Hermitian operator. Thus, if O, H1 and H2 are
simultaneously real representable in a given basis, the systematic error proportional
to ∆τ vanishes since in this case the trace is real. Hence the systematic error is of
order ∆τ 2 .
Clearly there are other choices of the Trotter decomposition which, irrespective of the properties of H_1, H_2 and O, yield systematic errors of the order Δτ². As an example we mention the symmetric decomposition e^{-Δτ(H_1+H_2)} = e^{-Δτ H_1/2} e^{-Δτ H_2} e^{-Δτ H_1/2} + O(Δτ³).
However, in many cases higher order decompositions are cumbersome and numeri-
cally expensive to implement.
Auxiliary field QMC methods are based on various forms of the Hubbard-Stratonovich (HS) decomposition. This transformation is not unique. The efficiency of the
algorithm as well as of the sampling scheme depends substantially on the type of
HS transformation one uses. In this appendix we will review some aspects of the HS
transformation with emphasis on its application to the auxiliary field QMC method.
The generic HS transformation is based on the Gaussian integral
∫_{-∞}^{+∞} dφ e^{-(φ+A)²/2} = √(2π) . (10.232)
Expanding the square in the exponent yields
e^{A²/2} = (1/√(2π)) ∫_{-∞}^{+∞} dφ e^{-φ²/2 − φA} . (10.233)
Hence, if A is a one-body operator, the two-body operator exp(A²/2) can be transformed into an integral over single-body operators interacting with a bosonic field φ.
The importance of this identity in the Monte Carlo approach lies in the fact that for
a fixed field φ the one-body problem is exactly solvable. The integral over the field
φ can then be carried out with Monte Carlo methods. However, the Monte Carlo
integration over a continuous field is much more cumbersome than the sum over a
discrete field.
Let us consider for example the Hubbard interaction for a single site,
H_U = U (n_↑ − 1/2)(n_↓ − 1/2) . (10.234)
Here, n_σ = c†_σ c_σ, where the c†_σ are spin-1/2 fermionic operators. In the Monte Carlo approach, after Trotter decomposition of the kinetic and interaction terms, we will have to compute exp(−Δτ H_U). Since
H_U = −(U/2) (n_↑ − n_↓)² + U/4 , (10.235)
we can set A² = Δτ U (n_↑ − n_↓)² and use (10.233) to compute exp(−Δτ H_U).
There are, however, more efficient ways of carrying out the transformation, which are based on the fact that the Hilbert space for a single site consists of the four states |0⟩, |↑⟩, |↓⟩ and |↑↓⟩. Let us propose the identity
e^{-Δτ H_U} = γ Σ_{s=±1} e^{α s (n_↑ − n_↓)} (10.236)
and see if it is possible to find values of α and γ that satisfy it on the single-site Hilbert space. Applying each state vector on both sides of the equation yields cosh(α) = exp(Δτ U/2) and γ = exp(−Δτ U/4)/2.
Since the HS field s couples to the z-component of the magnetization, the spin symmetry is broken for a fixed value of the field and is restored only after summation over the field. To avoid this symmetry breaking, one can consider alternative HS transformations which couple to the density. In the same manner as above, we can show that
e^{-Δτ H_U} = γ̃ Σ_{s=±1} e^{i α̃ s (n_↑ + n_↓ − 1)} , (10.239)
where cos(α̃) = exp(−Δτ U/2) and γ̃ = exp(Δτ U/4)/2. Clearly, this choice of
the HS transformation conserves the SU (2) spin symmetry for each realization of
the field. However, this comes at the price that one needs to work with complex
numbers. It turns out that when the sign problem is absent, the above choice of the
HS transformation yields in general more efficient codes.
We conclude this appendix with a general discrete HS transformation which
replaces (10.233). For small time steps Δτ we have the identity
e^{Δτ λ A²} = (1/4) Σ_{l=±1,±2} γ(l) e^{√(Δτ λ) η(l) A} + O(Δτ⁴) , (10.240)
with
γ(±1) = 1 + √6/3 , η(±1) = ±√(2(3 − √6)) ,
γ(±2) = 1 − √6/3 , η(±2) = ±√(2(3 + √6)) . (10.241)
This transformation is not exact and produces an overall systematic error pro-
portional to ∆τ 3 in the Monte Carlo estimate of an observable. However, since we
already have a systematic error proportional to ∆τ 2 from the Trotter decomposi-
tion, the transformation is as good as exact. It also has the great advantage of being
discrete thus allowing efficient sampling.
Consider the one-body Hamiltonian
H_0 = Σ_{x,y} c†_x (h_0)_{x,y} c_y , (10.242)
where h_0 is a Hermitian matrix, {c†_x, c_y} = δ_{x,y}, {c†_x, c†_y} = 0, and x runs over the N_s single-particle states. Since h_0 is Hermitian, we can find a unitary matrix U such that U† h_0 U = λ, where λ is a diagonal matrix. Hence,
H_0 = Σ_x λ_{x,x} γ†_x γ_x , γ_x = Σ_y (U†)_{x,y} c_y , γ†_x = Σ_y c†_y U_{y,x} . (10.243)
Since U is a unitary transformation, the γ operators satisfy the commutation relations {γ†_x, γ_y} = δ_{x,y} and {γ†_x, γ†_y} = 0. An N_p-particle eigenstate of the Hamiltonian H_0 is characterized by the occupation of N_p single-particle levels α_1 . . . α_{N_p} and is given by
γ†_{α_1} γ†_{α_2} . . . γ†_{α_{N_p}} |0⟩ = Π_{n=1}^{N_p} ( Σ_x c†_x U_{x,α_n} ) |0⟩ = Π_{n=1}^{N_p} (c† P)_n |0⟩ , (10.244)
where P is the N_s × N_p matrix built from the columns α_1 . . . α_{N_p} of U.
The second property we will need is the overlap of two Slater determinants. Let
|Ψ⟩ = Π_{n=1}^{N_p} (c† P)_n |0⟩ , |Ψ̃⟩ = Π_{n=1}^{N_p} (c† P̃)_n |0⟩ , (10.247)
then
⟨Ψ|Ψ̃⟩ = det(P† P̃) . (10.248)
Indeed,
⟨Ψ|Ψ̃⟩ = ⟨0| Π_{n=N_p}^{1} (P† c)_n Π_{ñ=1}^{N_p} (c† P̃)_ñ |0⟩
= Σ_{y_1 . . . y_{N_p}} Σ_{ỹ_1 . . . ỹ_{N_p}} P†_{N_p,y_{N_p}} . . . P†_{1,y_1} P̃_{ỹ_1,1} . . . P̃_{ỹ_{N_p},N_p} ⟨0| c_{y_{N_p}} . . . c_{y_1} c†_{ỹ_1} . . . c†_{ỹ_{N_p}} |0⟩ . (10.249)
The matrix element in the above equation does not vanish provided that all the y_i, i = 1 . . . N_p, take different values and that there is a permutation π of N_p numbers such that
ỹ_i = y_{π(i)} . (10.250)
Under those conditions, the matrix element is nothing but the sign of the permutation, (−1)^π. Hence,
⟨Ψ|Ψ̃⟩ = Σ_{y_1,...,y_{N_p}} |c†_{y_1} . . . c†_{y_{N_p}} |0⟩|² Σ_{π∈S_{N_p}} (−1)^π P†_{N_p,y_{N_p}} . . . P†_{1,y_1} P̃_{y_{π(1)},1} . . . P̃_{y_{π(N_p)},N_p} . (10.251)
In the above, we have explicitly included the matrix element |c†_{y_1} . . . c†_{y_{N_p}} |0⟩|² to ensure that terms in the sum with y_i = y_j do not contribute, since for such terms the matrix element vanishes due to the Pauli principle. We can, however, omit this factor, since the sum over permutations guarantees that if y_i = y_j for any i ≠ j, then Σ_{π∈S_{N_p}} (−1)^π P†_{N_p,y_{N_p}} . . . P†_{1,y_1} P̃_{y_{π(1)},1} . . . P̃_{y_{π(N_p)},N_p} vanishes. Consider for example N_p = 2 and y_1 = y_2 = x; then the sum reduces to P†_{2,x} P†_{1,x} P̃_{x,1} P̃_{x,2} Σ_{π∈S_2} (−1)^π = 0, since the sum over the signs of the permutations vanishes.
where the trace is over the Fock space. To verify the validity of the above equation, let us set B = e^{T_1} e^{T_2} . . . e^{T_n} and U = e^{c† T_1 c} e^{c† T_2 c} . . . e^{c† T_n c}, where c† T c ≡ Σ_{x,y} c†_x T_{x,y} c_y.
det(1 + B) = Σ_{π∈S_{N_s}} (−1)^π (1 + B)_{π(1),1} . . . (1 + B)_{π(N_s),N_s}
= Σ_{π∈S_{N_s}} (−1)^π δ_{1,π(1)} . . . δ_{N_s,π(N_s)}
+ Σ_x Σ_{π∈S_{N_s}} (−1)^π B_{π(x),x} δ_{1,π(1)} . . . δ̂_{x,π(x)} . . . δ_{N_s,π(N_s)}
+ Σ_{y>x} Σ_{π∈S_{N_s}} (−1)^π B_{π(x),x} B_{π(y),y} δ_{1,π(1)} . . . δ̂_{x,π(x)} . . . δ̂_{y,π(y)} . . . δ_{N_s,π(N_s)}
+ Σ_{y>x>z} Σ_{π∈S_{N_s}} (−1)^π B_{π(x),x} B_{π(y),y} B_{π(z),z} δ_{1,π(1)} . . . δ̂_{x,π(x)} . . . δ̂_{y,π(y)} . . . δ̂_{z,π(z)} . . . δ_{N_s,π(N_s)} + . . . . (10.254)
Here, δ̂_{y,π(y)} means that this term is omitted from the product Π_{x=1}^{N_s} δ_{x,π(x)}. To proceed, let us consider in more detail the term starting with Σ_{y>x} in the last equality. Due to the δ-functions, the sum over the permutations of N_s numbers reduces to two terms, namely the unit permutation and the transposition π(x) = y, π(y) = x. Let us define P^{(x,y)} as a rectangular matrix of dimension N_s × 2, with entries of the first (second) column set to one at row x (y) and zero otherwise. Hence, we can write
Σ_{π∈S_{N_s}} (−1)^π B_{π(x),x} B_{π(y),y} δ_{1,π(1)} . . . δ̂_{x,π(x)} . . . δ̂_{y,π(y)} . . . δ_{N_s,π(N_s)}
= det( P^{(x,y),†} B P^{(x,y)} ) = ⟨0| c_x c_y U c†_y c†_x |0⟩ , (10.255)
where in the last equation we have used the properties of (10.248) and (10.245).
Repeating the same argument for different terms we obtain
det(1 + B) = 1 + Σ_x ⟨0| c_x U c†_x |0⟩ + Σ_{y>x} ⟨0| c_x c_y U c†_y c†_x |0⟩ + Σ_{y>x>z} ⟨0| c_x c_y c_z U c†_z c†_y c†_x |0⟩ + . . . = Tr[U] . (10.256)
References
1. S.R. White, Physics Reports 301, 187 (1998) 277
2. U. Schollwöck, Rev. Mod. Phys. 77, 259 (2005) 277
3. H.G. Evertz, G. Lana, M. Marcu, Phys. Rev. Lett. 70, 875 (1993) 277, 278, 288
4. H.G. Evertz, Adv. Phys. 52, 1 (2003) 277, 278, 303, 304, 306
5. O. Syljuasen, A.W. Sandvik, Phys. Rev. E 66, 046701 (2002) 277, 278, 302, 307, 308
6. R. Blankenbecler, D.J. Scalapino, R.L. Sugar, Phys. Rev. D 24, 2278 (1981) 277, 312
7. F. Michel, H.G. Evertz. URL http://arxiv.org/abs/0705.0799 and in preparation 278, 303, 308, 309, 310
8. A.N. Rubtsov, V.V. Savkin, A.I. Lichtenstein, Phys. Rev. B 72, 035122 (2005) 278, 300, 337
9. P. Werner, A. Comanac, L.D. Medici, M. Troyer, A. Millis, Phys. Rev. Lett. 97, 076405
(2006) 278, 300, 337
10. J.E. Hirsch, D.J. Scalapino, R.L. Sugar, R. Blankenbecler, Phys. Rev. B 26, 5033 (1981)
278
11. M. Barma, B.S. Shastry, Phys. Rev. B 18, 3351 (1978) 278
12. R.J. Baxter, Exactly Solved Models in Statistical Mechanics (Academic Press Limited,
London, 1989) 278
13. M. Troyer, M. Imada, K. Ueda, J. Phys. Soc. Jpn. 66, 2957 (1997) 278
14. O. Syljuasen, Phys. Rev. B 67, 046701 (2003) 278, 307, 308
15. A.W. Sandvik, O.F. Syljuåsen, in THE MONTE CARLO METHOD IN THE PHYSI-
CAL SCIENCES: Celebrating the 50th Anniversary of the Metropolis Algorithm, AIP
Conference Proceedings, Vol. 690, ed. by J.E. Gubernatis (2003), pp. 299–308. URL
http://arxiv.org/abs/cond-mat/0306542 278, 307, 308
16. M. Troyer, F. Alet, S. Trebst, S. Wessel, in THE MONTE CARLO METHOD IN THE
PHYSICAL SCIENCES: Celebrating the 50th Anniversary of the Metropolis Algorithm,
AIP Conference Proceedings, Vol. 690, ed. by J.E. Gubernatis (2003), pp. 156–169.
URL http://arxiv.org/abs/physics/0306128 278, 302, 307, 308
17. N. Kawashima, K. Harada, J. Phys. Soc. Jpn. 73, 1379 (2004) 278, 307, 308
18. J.F. Corney, P.D. Drummond, Phys. Rev. Lett. 93, 260401 (2004) 279, 324
19. F.F. Assaad, P. Werner, P. Corboz, E. Gull, M. Troyer, Phys. Rev. B 72, 224518 (2005) 279, 300, 324
20. F.F. Assaad, D. Würtz, Phys. Rev. B 44, 2681 (1991) 288
21. M. Troyer, F.F. Assaad, D. Würtz, Helv. Phys. Acta. 64, 942 (1991) 292
22. M. Brunner, F.F. Assaad, A. Muramatsu, Eur. Phys. J. B 16, 209 (2000) 294
23. M. Brunner, F.F. Assaad, A. Muramatsu, Phys. Rev. B 62, 12395 (2000) 294, 319
24. C. Brünger, F.F. Assaad, Phys. Rev. B 74, 205107 (2006) 294
25. N. Prokof’ev, B. Svistunov, Phys. Rev. Lett. 81, 2514 (1998) 300
26. S. Rombouts, K. Heide, N. Jachowicz, Phys. Rev. Lett. 82, 4155 (1999) 300
27. E. Burovski, A. Mishchenko, N. Prokof’ev, B. Svistunov, Phys. Rev. Lett. 87, 186402
(2001) 300
28. A. Rubtsov, M. Katsnelson, A. Lichtenstein, Dual fermion approach to nonlocal cor-
relations in the Hubbard model. URL http://arxiv.org/abs/cond-mat/
0612196. Preprint 300
29. M. Boninsegni, N. Prokof’ev, B. Svistunov, Phys. Rev. Lett. 96, 070601 (2006) 300
30. B. Beard, U. Wiese, Phys. Rev. Lett. 77, 5130 (1996) 300
31. A. Sandvik, J. Kurkijärvi, Phys. Rev. B 43, 5950 (1991) 301, 302, 309
32. A.W. Sandvik, J. Phys. A 25, 3667 (1992) 301, 302, 309
33. A.W. Sandvik, Phys. Rev. B 56, 11678 (1997) 301, 302, 309
34. N. Prokof’ev, B. Svistunov, I. Tupitsyn, Sov. Phys. JETP Letters 64, 911 (1996). URL
http://arxiv.org/abs/cond-mat/9612091 302
35. N. Prokof’ev, B. Svistunov, I. Tupitsyn, Sov. Phys. JETP 87, 310 (1998). URL
http://arxiv.org/abs/cond-mat/9703200 302, 307, 308
36. A. Sandvik, R. Singh, D. Campbell, Phys. Rev. B 56, 14510 (1997) 302, 308, 309
37. A. Sandvik, D. Campbell, Phys. Rev. Lett. 83, 195 (1999) 302, 308, 309
38. A. Dorneich, M. Troyer, Phys. Rev. E 64, 066701 (2001) 303
39. P. Kasteleyn, C. Fortuin, J. Phys. Soc. Jpn. Suppl. 26, 11 (1969) 303, 306
40. C. Fortuin, P. Kasteleyn, Physica 57, 536 (1972) 303, 306
41. R. Swendsen, J. Wang, Phys. Rev. Lett. 58, 86 (1987) 303
42. A.D. Sokal, in Quantum Fields on the Computer, ed. by M. Creutz (World Scientific,
Singapore, 1992), pp. 211–274. Available electronically via www.dbwilson.com/
exact 303, 306
43. M. Aizenman, B. Nachtergaele, Comm. Math. Phys. 164, 17 (1994) 304
44. B. Nachtergaele, in Probability Theory and Mathematical Statistics (Proceedings of the
6th Vilnius Conference), ed. by B. Grigelionis, et al. (VSP/TEV, Utrecht Tokyo Vilnius,
1994), pp. 565–590. URL http://arxiv.org/abs/cond-mat/9312012 304
45. A.W. Sandvik, Phys. Rev. B 59, R14157 (1999) 305
46. P. Henelius, A. Sandvik, Phys. Rev. B 62, 1102 (2000) 305
47. A.W. Sandvik, Phys. Rev. Lett. 95, 207203 (2005) 305
48. H.G. Evertz, W. von der Linden, Phys. Rev. Lett. 86, 5164 (2001) 306
49. R. Citro, E. Orignac, T. Giamarchi, Phys. Rev. B 72, 024434 (2005) 308, 310
50. K. Hukushima, K. Nemoto, J. Phys. Soc. Japan 65, 1604 (1996) 309
51. K. Hukushima, H. Takayama, K. Nemoto, Int. J. Mod. Phys. C 7, 337 (1996) 309
52. E. Marinari, G. Parisi, Europhys. Lett. 19, 451 (1992) 309
53. W. Barford, R. Bursill, Phys. Rev. Lett. 95, 137207 (2005) 311
54. J.E. Hirsch, R.M. Fye, Phys. Rev. Lett. 56, 2521 (1986) 312, 321
55. S. Capponi, F.F. Assaad, Phys. Rev. B 63, 155114 (2001) 313, 323, 344
56. F.F. Assaad, Phys. Rev. B 71, 075103 (2005) 313, 344
57. G. Sugiyama, S. Koonin, Ann. Phys. (N.Y.) 168, 1 (1986) 313
58. S. Sorella, S. Baroni, R. Car, M. Parrinello, Europhys. Lett. 8, 663 (1989) 313, 326
59. S. Sorella, E. Tosatti, S. Baroni, R. Car, M. Parrinello, Int. J. Mod. Phys. B 1, 993 (1989)
313, 326
60. J.E. Hirsch, Phys. Rev. B 31, 4403 (1985) 313
61. S.R. White, D.J. Scalapino, R.L. Sugar, E.Y. Loh, J.E. Gubernatis, R.T. Scalettar, Phys.
Rev. B 40, 506 (1989) 313, 326, 344
62. A.M. Tsvelik, Quantum Field Theory in Condensed Matter Physics (Cambridge
University press, Cambridge, 1996) 317
63. F.F. Assaad, M. Imada, J. Phys. Soc. Jpn. 65, 189 (1996) 319, 334
64. F.F. Assaad, Phys. Rev. Lett. 83, 796 (1999) 319, 344
65. M. Jarrell, J. Gubernatis, Phys. Rep. 269, 133 (1996) 319
66. W. von der Linden, Appl. Phys. A 60, 155 (1995) 319
67. K.S.D. Beach, Identifying the maximum entropy method as a special limit of stochas-
tic analytic continuation. URL http://arxiv.org/abs/cond-mat/0403055.
Preprint 319
68. J.E. Hirsch, Phys. Rev. B 38, 12023 (1988) 321
69. C. Wu, S. Zhang, Phys. Rev. B 71, 155115 (2005) 323, 344
70. A. Messiah, Quantum Mechanics. (Dover publications, INC., Mineola, New-York,
1999) 323
71. S. Capponi, F.F. Assaad, Phys. Rev. B 75, 045115 (2007) 324
72. M. Troyer, U. Wiese, Phys. Rev. Lett. 94, 170201 (2005) 324
73. J.F. Corney, P.D. Drummond, J. Phys. A: Math. Gen. 39, 269 (2006) 324
74. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C
(Cambridge University Press, Cambridge, 1992) 327, 330
75. M. Feldbacher, F.F. Assaad, Phys. Rev. B 63, 073105 (2001) 333
76. M. Jarrell, Phys. Rev. Lett. 69, 168 (1992) 337, 345
77. A. Georges, G. Kotliar, W. Krauth, M.J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996) 337, 345
78. A.C. Hewson, The Kondo Problem to Heavy Fermions. Cambridge Studies in Mag-
netism (Cambridge Universiy Press, Cambridge, 1997) 337, 341
79. J. Yoo, S. Chandrasekharan, R.K. Kaul, D. Ullmo, H.U. Baranger, J. Phys. A: Math.
Gen. 38, 10307 (2005) 341
80. F.F. Assaad, Phys. Rev. B 65, 115104 (2002) 342, 345
81. M. Feldbacher, F.F. Assaad, K. Held, Phys. Rev. Lett. 93, 136405 (2004) 343, 345
82. F.F. Assaad, T.C. Lang, Phys. Rev. B 76, 035116 (2007) 343
83. J.E. Hirsch, E. Fradkin, Phys. Rev. B 27, 4302 (1983) 344
84. M. Randeria, N. Trivedi, A. Moreo, R.T. Scalettar, Phys. Rev. Lett. 69, 2001 (1992) 344
85. N. Trivedi, M. Randeria, Phys. Rev. Lett. 75, 312 (1995) 344
86. M. Randeria, N. Trivedi, A. Moreo, R.T. Scalettar, Phys. Rev. D 54, R3756 (1996) 344
87. F.F. Assaad, V. Rousseau, F. Hébert, M. Feldbacher, G. Batrouni, Europhys. Lett. 63,
569 (2003) 344
88. C. Wu, J.P. Hu, S.C. Zhang, Phys. Rev. Lett. 91, 186402 (2003) 344
89. S. Capponi, C. Wu, S.C. Zhang, Phys. Rev. B 70, 220505 (2004) 344
90. J.E. Hirsch, S. Tang, Phys. Rev. Lett. 62, 591 (1989) 344
91. G. Dopf, A. Muramatsu, W. Hanke, Europhys. Lett. 17, 559 (1992) 344
92. G. Dopf, A. Muramatsu, W. Hanke, Phys. Rev. Lett. 68, 353 (1992) 344
93. F.F. Assaad, W. Hanke, D.J. Scalapino, Phys. Rev. B 50, 12835 (1994) 344
94. D.J. Scalapino, S. White, S. Zhang, Phys. Rev. B 47, 7995 (1993) 344
95. N. Furukawa, M. Imada, J. Phys. Soc. Jpn. 62, 2557 (1993) 344
96. F.F. Assaad, M. Imada, Phys. Rev. Lett 74, 3868 (1995) 344
97. F.F. Assaad, M. Imada, Phys. Rev. Lett 76, 3176 (1996) 344
98. G. Dopf, J. Wagner, P. Dieterich, A. Muramatsu, W. Hanke, Phys. Rev. Lett. 68, 2082
(1992) 344
99. C. Gröber, R. Eder, W. Hanke, Phys. Rev. B 62, 4336 (2000) 344
100. M. Imada, A. Fujimori, Y. Tokura, Rev. Mod. Phys. 70, 1039 (1998) 344
101. M. Vekic, J.W. Cannon, D.J. Scalapino, R.T. Scalettar, R.L. Sugar, Phys. Rev. Lett. 74,
2367 (1995) 344
11 Autocorrelations in Quantum Monte Carlo Simulations of Electron-Phonon Models
M. Hohenadler and T. C. Lang
11.1 Introduction
The interaction of electrons with lattice degrees of freedom plays an important role
in many materials, including conventional and high-temperature superconductors,
colossal-magnetoresistance manganites, and low-dimensional nanostructures. Over
more than two decades, lattice and continuum quantum Monte Carlo (QMC) sim-
ulations have proved to be a highly valuable tool to investigate the properties of
coupled fermion-boson models in condensed matter theory.
Despite the recent development of other numerical methods (e.g., the density matrix renormalization group, see Part IX), QMC approaches remain in the focus of research due to their versatility. Especially in the early days of computational physics, they outperformed alternative memory-intensive methods, and this often remains true today, e.g., in more than one dimension or at finite temperature.
Apart from stand-alone applications, QMC algorithms also serve as solvers in the
context of cluster methods (see Chap. 16). Finally, they represent the most reliable
techniques for several classes of problems, e.g., three-dimensional (3D) spin sys-
tems (see Chap. 10).
A general introduction to the concepts common to many QMC methods has
been given in Chap. 10. In this chapter, we focus on the issue of autocorrelations,
which turns out to be of particular importance in the case of coupled fermion-boson
models due to the different physical time scales involved, and the resulting problems
in finding appropriate updating schemes. Quite disturbingly, some recent as well as
early work seems to be unaware of the problem. To illustrate this point, we re-
enact some specific QMC studies from the literature using the same methods, and
demonstrate that statistical errors are severely underestimated if autocorrelations are
neglected.
This chapter is organized as follows. In Sect. 11.2, we introduce the model con-
sidered, and Sect. 11.3 gives a brief description of the algorithms used. Numerical
evidence for the problem of autocorrelations is presented in Sect. 11.4, whereas
their origin and a possible solution are the topic of Sect. 11.5. We end with our
conclusions in Sect. 11.6.
The Hamiltonian of the Holstein model considered here reads
H = −t Σ_{⟨i,j⟩,σ} c†_{i,σ} c_{j,σ} + (ω_0/2) Σ_i (p²_i + x²_i) − α Σ_i x_i n_i . (11.1)
Here c†_{i,σ} creates an electron with spin σ at site i, and n_i = Σ_σ n_{i,σ} with n_{i,σ} = c†_{i,σ} c_{i,σ}. The phonon degrees of freedom at site i are described by the momentum p_i and coordinate (displacement) x_i of a harmonic oscillator. The model parameters
are the nearest-neighbor hopping amplitude t, the Einstein phonon frequency ω0 and
the electron-phonon coupling α. We shall also refer to the spinless Holstein model,
which can be obtained from (11.1) by dropping spin indices and sums over σ. We
consider D-dimensional lattices with V = N D sites and periodic boundary con-
ditions. A useful dimensionless coupling constant is λ = α2 /(ω0 W ) = 2EP /W ,
where W = 4tD and EP denote the free bandwidth and the polaron binding energy,
respectively.
The Holstein model provides a framework to study numerous problems associ-
ated with electron-phonon interaction, such as polaron formation, superconductivity
or charge-density-wave formation. Besides, more complicated models such as the
Holstein-Hubbard model share the same structure of the phonon degrees of freedom
and the electron-phonon interaction, so that the following discussion in principle
applies to a wider range of problems.
To set the stage for the discussion of autocorrelations, we provide here a brief sum-
mary of the most important details of the different QMC algorithms employed. For
details we refer the reader to [1, 2] and Chap. 10.
For the one-electron case (the polaron problem), we make use of the world-
line method originally proposed in [3, 4]. Dividing the imaginary-time axis [0, β]
(β = (kB T )−1 is the inverse temperature) into intervals of length ∆τ = β/L ≪ 1
according to the Suzuki-Trotter approximation (see Chap. 10), the result for the
fermionic³ partition function reads
Z_{f,L} = Σ_{{r_τ}} w_f({r_τ}) , w_f({r_τ}) = exp[ Σ_{τ,τ′=1}^{L} F(τ − τ′) δ_{r_τ,r_{τ′}} ] Π_{τ=1}^{L} I(r_{τ+1} − r_τ) , (11.2)
with the fermionic weight wf . The fermion world-lines, specified by a position vec-
tor r_τ on each time slice,⁴ are subject to periodic boundary conditions both in real
space and imaginary time, and the sum in (11.2) is over all allowed configurations.
The retarded electron (self-)interaction due to the electron-phonon coupling is described by the memory function
F(τ) = (ω_0 Δτ³ α² / 4L) Σ_{ν=0}^{L−1} cos(2πτν/L) / [1 − cos(2πν/L) + (ω_0 Δτ)²/2] , (11.3)
³ The bosonic part can be calculated exactly [4] and is therefore not considered.
⁴ We use bold symbols to indicate the vector character of a quantity. The exact definition of the components should be clear from the context.
Z_L = const. ∫ Dx Π_σ det(1 + Π_{τ=1}^{L} e^{−Δτ K^σ} e^{−Δτ I^σ({x_τ})}) e^{−Δτ S_b({x_τ})} , (11.5)
where the first factor under the integral is the fermionic weight w_f({x_τ}) and the second the bosonic weight w_b({x_τ}), and K^σ, I^σ denote the matrix representations of the spin-σ component of the first respectively last term (including the minus signs) in Hamiltonian (11.1).
The bosonic action is given by
S_b({x_τ}) = Σ_{i=1}^{V} Σ_{τ=1}^{L} [ (ω_0/2) x²_{i,τ} + (1/(2ω_0Δτ²)) (x_{i,τ} − x_{i,τ+1})² ] = Σ_{i=1}^{V} x_i^T A x_i . (11.6)
Here the sampling is over all possible phonon configurations {xτ } of the bosonic
degrees of freedom. In the simplest approach, we select a random time slice
τ0 ∈ [1, L] and a random lattice site i0 ∈ [1, V ], and propose a modified phonon
configuration x′i0 ,τ0 = xi0 ,τ0 ± δx. The latter is then accepted with probability
min[1, wf ({x′τ })wb ({x′τ })/wf ({xτ })wb ({xτ })]. The change δx is determined by
requiring a reasonable acceptance rate. An improved (global) updating scheme will
be discussed below.
Fig. 11.1. Statistical error of the fermionic total energy E_f as a function of binsize k, normalized to the result for the maximum binsize, obtained with the world-line method [4]. Results are for the Holstein model with one electron, N = 32 and λ = 1 (parameter sets: 1D, ω_0/t = 1.0, Δτ = 0.16, N_skip = 32; 3D, ω_0/t = 1.0, Δτ = 0.16, N_skip = 32; 2D, ω_0/t = 0.1, Δτ = 0.05, N_skip = 300)
two-electron model [7], and accurate simulations in the many-electron case turn out to be unfeasible in many cases [10] due to autocorrelation times exceeding 10⁴ MC steps.
To illustrate this point, we consider two parameter sets for the Holstein model at
half filling (one electron per lattice site), representative of the work in [11] and [12].
We use the finite-temperature determinant QMC method, although the results of [11]
have been obtained using the projector method (see Chap. 10; autocorrelation times
are usually comparable). Owing to the substantially larger computational effort as
compared to one-electron calculations, we were not able to obtain converged results.
Therefore, and to compare different parameters, we show in Fig. 11.2 the statistical error of the bosonic energy E_b = (ω_0/2) Σ_i (⟨p²_i⟩ + ⟨x²_i⟩), normalized to the error for binsize k = 1. The definition of λ in terms of the coupling constant g used in [11, 12] reads λ = 2g²/(ω_0 W), and we have used N_skip = 1.
The strong increase of statistical errors as a function of binsize in Fig. 11.2 illus-
trates the substantial autocorrelations in such simulations. No saturation can be seen
in our data even for the largest binsize k > 10⁴ shown (cf. Fig. 11.1) and, in contrast
to the world-line method of Sect. 11.3, skipping thousands of steps is usually not
practicable in the many-electron case. In our opinion, this suggests that reliable re-
sults for the Holstein model in the many-electron case are extremely challenging to
obtain using the determinant QMC method, and the situation becomes even worse
for ω0 /t < 1. Similar conclusions can be drawn about the spinless Holstein model,
models with phonon modes of different symmetry [10], as well as Holstein-Hubbard
models with local and/or non-local Coulomb interaction [13].
Despite these difficulties, some early work [12] as well as more recent pa-
pers, e.g., [11, 14], seem to be unaware of this problem. This issue becomes even
more critical if dynamical quantities such as the one-electron spectral function are
Fig. 11.2. Statistical error of the bosonic energy E_b as a function of binsize k, normalized to the result for k = 1, obtained with the determinant QMC method [2]. Results are for the Holstein model at half filling n = 1, βt = 10 and ω_0/t = 1. Errorbars are not shown
⁶ In the world-line algorithm, the discrete step size used for updates cannot be reduced below one lattice constant.
with the aforementioned L × L matrix A, and the principal components ξ, in terms of which S_b becomes diagonal. Using this representation, the bosonic weight reduces to a Gaussian distribution, w_b = exp(−Δτ Σ_i ξ_i^T · ξ_i). For α = 0, sampling can be done exactly in terms of the new variables ξ_{i,τ} using the Box-Muller method [19].
To further illustrate the origin of autocorrelations, as well as the transformation
to principal components, we show in Fig. 11.3(a) a schematic representation of the
distribution of values for a pair (p, p′ ) of two phonon momenta (shaded area). The
elongated shape originates from the strong correlations mediated by Sb , and requires
a transition A → B between two points in phase space to be performed in many
small steps, leading in turn to long autocorrelation times.
In contrast, the axes of the principal components ξ, ξ ′ in Fig. 11.3(b) lie along
the axes of the ellipse, and a single MC update of ξ ′ is sufficient to get from A to B.
Although we have sketched the more general case, the distribution after the exact
transformation (11.7) – under which wb becomes a Gaussian – is actually circular
in the new variables ξ, ξ ′ (dashed line in Fig. 11.3(b)).
Whereas exact sampling without autocorrelations is straightforward in the non-
interacting case α = 0, the dependence of wf on the phonon coordinates xi,τ for
α > 0 does not permit a simple separation of bosonic and fermionic contributions
in the updating process. Therefore, it has been proposed [9] to base the QMC algo-
rithm on the Lang-Firsov transformed Hamiltonian, which has no explicit coupling
of x to electronic degrees of freedom. To this end, it is advantageous to sample the
phonon momenta p instead of x, as the former depend only weakly on the elec-
tronic degrees of freedom [9], which enables us to treat the fermionic weight wf
Fig. 11.3. Schematic illustration of the transformation from phonon momenta p, p′ to principal components ξ, ξ′ (see text)
as a part of the observables, and renders the MC sampling exact and rejection-free
(every new configuration is accepted). Consequently, we avoid a warm-up phase,
autocorrelations and the computationally expensive evaluation of wf in the updating
process.
The method outlined here has been successfully applied to the polaron [9, 20],
bipolaron [21], and the (spinless) many-polaron problem [22]. Unfortunately, at-
tempts to generalize this approach to the Holstein-Hubbard model, or the spinful
Holstein model, have not been successful [13]. Although the Lang-Firsov transfor-
mation improves the sampling of phonon configurations via principal components,
the complex phase in the transformed hopping term [9] induces a severe sign prob-
lem [13, 22]. Despite encouraging acceptance rates, this global updating scheme
does not permit reliable statements concerning a possible decrease of autocorrela-
tion times.
11.6 Conclusions
By revisiting several QMC studies of Holstein models carried out in the past we
have illustrated the severe problem of autocorrelations in simulations of electron-
phonon models, in accordance with [10]. In particular, we have shown that statisti-
cal errors can be underestimated by orders of magnitude if autocorrelations are ne-
glected. This is particularly dangerous when calculating dynamic properties using,
e.g., Maximum Entropy methods, where meaningful errorbars can usually not be
obtained, introducing substantial uncertainties into the results. Long autocorrelation
times can also lead to critical slowing down as well as non-ergodic sampling during
finite-time MC runs – both phenomena being additional sources for underestimated
statistical errors – thereby also affecting the expectation values of observables.
Similar to the infamous minus-sign problem (see Chap. 10), autocorrelations
in QMC simulations seem to result from the fact that one is dealing with an ill-
conditioned physical problem. As a consequence, their appearance is not restricted
to the Holstein-type models considered here (see Chap. 10), or the particular QMC
methods employed. Besides, autocorrelations even occur in simulations of classical
systems (Chap. 4), although the problem is usually not as substantial as for coupled
fermion-boson systems. This general observation strongly suggests that great care
has to be taken when performing any MC simulations in order to avoid incorrect
results.
Significant advances in terms of efficiency and applicability can be achieved by
constructing a physically motivated global updating scheme. One such possibility
has been presented here in terms of a transformation to principal components. How-
ever, a general solution to overcome the problem of autocorrelations in simulations
of electron-phonon models is not yet known.
Acknowledgements
References
1. H. de Raedt, A. Lagendijk, Phys. Rep. 127, 233 (1985) 358
2. R. Blankenbecler, D.J. Scalapino, R.L. Sugar, Phys. Rev. D 24, 2278 (1981) 358, 359, 362
3. H. De Raedt, A. Lagendijk, Phys. Rev. Lett. 49, 1522 (1982) 358, 359, 360
4. H. De Raedt, A. Lagendijk, Phys. Rev. B 27, 6097 (1983) 358, 359, 361
5. N. Metropolis, A. Rosenbluth, A. Teller, E. Teller, J. Chem. Phys. 21, 1087 (1953) 359
6. P.E. Kornilovitch, J. Phys.: Condens. Matter 9, 10675 (1997) 360, 361
7. M. Hohenadler, H. Fehske, J. Phys.: Condens. Matter 19, 255210 (2007) 360, 361, 362
8. H. Fehske, A. Alvermann, M. Hohenadler, G. Wellein, in Polarons in Bulk Materials and
Systems with Reduced Dimensionality, ed. by G. Iadonisi, J. Ranninger, G. De Filippis
(IOS Press, Amsterdam, Oxford, Tokio, Washington DC, 2006), Proc. Int. School of
Physics “Enrico Fermi”, Course CLXI, pp. 285–296 360
9. M. Hohenadler, H.G. Evertz, W. von der Linden, Phys. Rev. B 69, 024301 (2004) 361, 363, 364, 365
10. D. Eckert, Phononen im Hubbard Modell. Master’s thesis, University of Würzburg
(1997) 362, 365
11. K. Tam, S. Tsai, D.K. Campbell, A.H. Castro Neto, Phys. Rev. B 75, 161103 (2007) 362
12. P. Niyaz, J.E. Gubernatis, R.T. Scalettar, C.Y. Fong, Phys. Rev. B 48, 16011 (1993) 362
13. T.C. Lang, Dynamics and charge order in a quarter filled ladder coupled to the lattice.
Master’s thesis, TU Graz (2005) 362, 365
14. C.E. Creffield, G. Sangiovanni, M. Capone, Eur. Phys. J. B 44, 175 (2005) 362
15. W. von der Linden, Phys. Rep. 220, 53 (1992) 363
16. M. Jarrell, J.E. Gubernatis, Phys. Rep. 269, 133 (1996) 363
17. G.G. Batrouni, R.T. Scalettar, in Quantum Monte Carlo Methods in Physics and Chem-
istry, ed. by M.P. Nightingale, C.J. Umrigar (Kluwer Academic Publishers, Dordrecht,
1998), p. 65 363
18. P.E. Kornilovitch, Phys. Rev. Lett. 81, 5382 (1998) 363
19. G.E.P. Box, M.E. Muller, Ann. Math. Stat. 29, 610 (1958) 364
20. M. Hohenadler, H.G. Evertz, W. von der Linden, phys. stat. sol. (b) 242, 1406 (2005) 365
21. M. Hohenadler, W. von der Linden, Phys. Rev. B 71, 184309 (2005) 365
22. M. Hohenadler, D. Neuber, W. von der Linden, G. Wellein, J. Loos, H. Fehske, Phys.
Rev. B 71, 245111 (2005) 365
12 Diagrammatic Monte Carlo and Stochastic
Optimization Methods for Complex Composite
Objects in Macroscopic Baths
A. S. Mishchenko
CREST, Japan Science and Technology Agency, AIST, 1-1-1, Higashi, Tsukuba
305-8562, Japan
Russian Research Center Kurchatov Institute, 123182 Moscow, Russia
12.1 Introduction
Many physical problems can be reduced to a system of one or a few complex objects
(CO) interacting with each other and with a macroscopic bosonic bath. The state of
such a CO, in general, is defined by a diverse set of quantum numbers, which change
when excitations of the bosonic bath are emitted and annihilated, or when two COs
interact. Despite the varying physical meaning of the CO's quantum numbers in different physical systems, the typical Hamiltonians for a broad range of problems look
very similar, and, thus, similar methods can be applied for their solution.
Historically, the most famous problem treated in the above framework is that of
a polaron, i.e. of an electron coupled to phonons (see [1, 2] for an introduction).
In the initial formulation a bare quasi particle (QP)1 has no internal structure, i.e.
internal quantum numbers, and it is characterized only by the translational quan-
tum number – the momentum – which changes due to the interaction of the QP with
phonons [3, 4]. Hence, in terms of the above definition, the polaron is not a CO since
the quasimomentum completely defines its quantum state and there are no other
quantum numbers determining the internal state of the QP. However, the polaron
concept can be generalized to include additional internal degrees of freedom, which
change their quantum numbers due to the interaction with the environment. Exam-
ples are the Jahn-Teller polaron, where the electron-phonon interaction changes the
quantum numbers of degenerate electronic states [5, 6], and the pseudo Jahn-Teller
(PJT) polaron, where electron-phonon interaction leads to transitions between elec-
tronic levels that are close in energy [7, 8]. Note that for a CO, in addition to the
quasimomentum, some internal quantum numbers are required to define the state
1
In general, a QP is defined as an elementary excitation whose energy separation from the
ground state is larger than the energy broadening due to decay.
Hardly any numerical method, not to speak of analytical approaches, can give
approximation-free results for measurable spectral quantities of a CO, such as the
optical conductivity, the angle resolved photoemission spectrum of a polaron or the
damping of a qubit. There are plenty of effective methods which are either restricted
to finite systems or applicable only to specific cases of macroscopic systems, such
as low dimensional systems, etc. What we need is a general strategy for the whole
class of problems formulated above, i.e. for a few COs in a macroscopic system of
arbitrary dimension interacting with an arbitrary bath in the most general form. This
implies arbitrary momentum dependence of the coupling constant of the CO to the
bosonic bath which, in turn, has an arbitrary dispersion of bosonic excitations. In
addition, it is important to treat the information on the damping of the CO and of
the bosonic bath on the same (approximation-free) level as the interactions. Most of
the standard numerical methods are based on the solution of an eigenvalue problem
where all bare eigenstates have well defined energies. Therefore, any information
which is not explicitly encoded in the Hamiltonian cannot be incorporated in the
solution; in particular, it is not possible to describe damped QPs.
The DMC method provides an elegant way to handle all these difficulties. It re-
lies on an exact numerical summation of the Feynman expansion for the considered
correlation function, and is independent of the analytic expression for the initial
bare Green functions (GFs). Hence, additional information, e.g. damping, which is
not included in the bare Hamiltonian, can easily be incorporated afterwards using
standard rules [40]. Note also, that there are no restrictions on the bosonic bath.
Formulating models suitable for the DMC-SO approach I start from general polaron
models. The simplest problem of a complex polaronic object, where the center-of-
mass motion does not separate from the other degrees of freedom, is given by a
system of two QPs,
H_0^par = Σ_k ε_a(k) a†_k a_k + Σ_k ε_h(k) h†_k h_k . (12.1)
Here a_k and h_k are annihilation operators, and ε_a(k) and ε_h(k) are the dispersions of the QPs, which interact with each other through the instantaneous Coulomb potential U,
H_{a-h} = −(1/N) Σ_{k,p,p′} U_k(p, p′) a†_{k+p} h†_{k−p} h_{k−p′} a_{k+p′} , (12.2)
where N is the number of lattice sites. The QPs are scattered by Q different branches of bosons,
H_{par-bos} = i Σ_{κ=1}^{Q} Σ_{k,q} (b†_{q,κ} − b_{−q,κ}) [ γ_{aa,κ}(k,q) a†_{k−q} a_k + γ_{hh,κ}(k,q) h†_{k−q} h_k + γ_{ah,κ}(k,q) h†_{k−q} a_k ] + h.c. , (12.3)
where the γ_{[aa,ah,hh],κ}(k,q) are the interaction constants. The bosons are described by the Hamiltonian
H_bos = Σ_{κ=1}^{Q} Σ_q ω_{q,κ} b†_{q,κ} b_{q,κ} . (12.4)
with quantum numbers that can also be affected by the non-diagonal part of the particle-boson interaction
H^R_{par-bos} = i Σ_{κ=1}^{Q} Σ_{k,q} Σ_{i,j=1}^{R} γ_{ij,κ}(k,q) (b†_{q,κ} − b_{−q,κ}) a†_{i,k−q} a_{j,k} + h.c. (12.6)
The simplest polaron problem, in turn, can be subdivided into continuous and lattice
polaron models.
The dynamics of a dissipative two-state system, which we need to understand
when operating real quantum computers [20], can be reduced to the so-called spin-
boson Hamiltonian [21], where a two-level system interacts with a bosonic bath.
The properties of the two-level system are determined by the tunneling matrix element Δ and the bias ε. The bosonic bath and the interaction are described by a set of oscillator frequencies {ω_α} and coupling constants {γ_α}. It is convenient to consider the two biased levels and the bosonic bath as the unperturbed system
H_0 = (ε/2) (c†_1 c_1 − c†_2 c_2) + Σ_α ω_α b†_α b_α , (12.10)
where h_{k−p}(τ) = exp(Hτ) h_{k−p} exp(−Hτ), τ > 0. In the case of the exciton-polaron the vacuum state |vac⟩ is the state with filled valence and empty conduction bands. For the bipolaron problem it is a system without particles. In the simpler case of a QP with internal two-level structure described by (12.4)–(12.6) the relevant quantity is the one-particle matrix GF [28, 34].
Information about the response to a weak external perturbation, e.g. optical absorp-
tion, is contained in the current-current correlation function Jβ (τ )Jδ , where β,δ
are Cartesian indices.
The Lehmann spectral representation [40, 43] of G_k(τ) (12.14)–(12.16) at zero temperature reads
G_k(τ) = ∫_0^∞ dω L_k(ω) e^{−ωτ} , (12.17)
where the Lehmann function Lk (ω) given in (12.7) reveals information on the
ground state and the excited states. Lk (ω) has poles (sharp peaks) at the energies
of stable (metastable) states of the particle. For example, if there is a stable state
at energy E(k), the Lehmann function reads Lk (ω) = Z (k) δ(ω − E(k)) + . . . ,
and the state with the lowest energy Egs (k) in a sector of a given momentum k is
characterized by the asymptotic behavior of the GF
G_k(τ ≫ 1/[E_ex(k) − E_gs(k)]) −→ Z^{(k)} e^{−E_gs(k)τ} , (12.18)
where Z (k) is the weight of the ground state and Eex (k) the energy of the first
excited state of the system. Then, the ground state properties are obtained from the
logarithmic plot of the GF (see Fig. 12.1).
Note that the energy and Z-factors of the lowest state in the sector of given
momentum are not the only properties which can be extracted from the asymptotic
behavior. For example, the analysis of the asymptotic behavior of the two-particle GF (12.14) of an exciton [27],
G_k^{p=p′}(τ → ∞) = |ξ_k^{p,gs}|² e^{−E_gs(k)τ} , (12.19)
yields absolute values of the coefficients ξ_k^{p,gs} of the wave function of the relative electron-hole motion for an exciton in the lowest state of the given momentum,
Ψ_gs(k) = Σ_p ξ_k^{p,gs} a†_{k+p} h†_{k−p} |vac⟩ . (12.20)
Ψ_gs(k) = Σ_{i=1}^{R} Σ_{n=0}^{∞} Σ_{q_1...q_n} θ_i(k; q_1, . . . , q_n) c†_{i,k−q_1−...−q_n} b†_{q_1} . . . b†_{q_n} |vac⟩ , (12.22)
Fig. 12.1. Typical behavior of the GF of a polaron and determination of the Z^{(k)}-factor and energy of the ground state from the fit of the linear asymptotics ln[G(τ)] → ln(Z^{(k)}) − E_gs(k) τ
which are normalized to unity, Σ_{n=0}^{∞} Z^{(k)}(n) ≡ 1, and the average number of phonons
N ≡ ⟨Ψ_gs(k)| Σ_q b†_q b_q |Ψ_gs(k)⟩ = Σ_{n=1}^{∞} n Z^{(k)}(n) . (12.24)
This, for instance, yields the light absorption of excitons I(ω). Moreover, the real part of the optical conductivity σ_{βδ}(ω) can be expressed [29] in terms of the current-current correlation function ⟨J_β(τ) J_δ⟩ as
σ_{βδ}(ω) = (π/ω) F_ω^{−1} [⟨J_β(τ) J_δ⟩] . (12.27)
with τ > 0. Here T_τ is the imaginary-time ordering operator, |vac⟩ is a vacuum state
without particles and phonons, and Hint is the interaction Hamiltonian of (12.9).
The exponent denotes the formal summation of a Taylor series which corresponds to
multiple integrations over the internal variables {τ1′ , τ2′ , . . .}. The operators are taken
in the interaction representation A(τ ) = exp[τ (Hpar +Hph )]A exp[−τ (Hpar +Hph )],
and the index “con” denotes an expansion which contains only connected diagrams
where no integral over internal time variables {τ1′ , τ2′ , . . .} can be factorized.
Applying the Wick theorem, a matrix element of time-ordered operators can be
written as a sum of terms, each being the product of matrix elements of pairs of
operators. Then the expansion (12.28) becomes an infinite series of integrals with
an ever increasing number of integration variables
G_k(τ) = Σ_{m=0,2,4,...} Σ_{ξ_m} ∫ dx′_1 · · · dx′_m D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}) . (12.29)
Here the index ξm stands for different Feynman diagrams (FDs) of the same order
m because for m > 2 there is more than one diagram of the same order m. The
zero-order term with m = 0 is the bare GF of the QP.
The aim of DMC is the evaluation of the series (12.29) with the help of importance sampling. Hence, we need to find a positive weight function and an update procedure to formulate something similar to the well-known Metropolis algorithm [49, 50, 51]. In statistical physics the latter is used to calculate the expectation value of an observable Q, which is defined as a sum over all states μ of the system with energies E_μ, each term weighted with the Boltzmann probability, ⟨Q⟩ = Z^{−1} Σ_μ Q_μ exp[−βE_μ]. Here β = 1/T is the inverse temperature and Z = Σ_μ exp[−βE_μ] the partition function. Since it is impossible to sum over all possible states μ of the macroscopic system, the classical MC uses the concept of importance sampling, where the sum is approximated by adding only the contributions of a small but typical set of states. These states are selected such that the probability of a particular state ν equals D_ν = Z^{−1} exp[−βE_ν]. This can be achieved through a Markov chain ν → ν′ → ν′′ → . . . with appropriate transition probabilities between subsequent states. Within the Metropolis scheme the system is offered a new configuration ν′, and the move ν → ν′ is accepted with probability M = D_{ν′}/D_ν, if M < 1, or one otherwise. After N steps of such a stochastic (Markov) process the estimator for the observable Q reads
⟨Q⟩_N = (1/N) Σ_{i=1}^{N} Q_{ν_i} , (12.30)
where Q_{ν_i} is the value of Q in the state ν_i visited at step i.
In close analogy to the weight function of classical MC, D_ν, the DMC method uses the weight function D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}), which depends on the internal integration variables {x′_1, . . . , x′_m} and the external variable τ. The term with m = 0 is the GF of the noninteracting QP, G_k^{(0)}(τ).
For orders m > 0, D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}) can be expressed as a product of GFs of noninteracting QPs, GFs of phonons, and of interaction vertices V(k, q). For the simplest case of a Hamiltonian system the expressions for the GFs are well known: They read G_k^{(0)}(τ_2 − τ_1) = exp[−ε(k)(τ_2 − τ_1)] with τ_2 > τ_1 for the QPs and D_q^{(0)}(τ_2 − τ_1) = exp[−ω_q(τ_2 − τ_1)] with τ_2 > τ_1 for the phonons [42, 40].
An important feature, which distinguishes the DMC method from other exact numerical approaches, is the possibility to explicitly include renormalized GFs into an exact expansion without any change of the algorithm. If we know the damping of the QP caused by interactions that are not included in the Hamiltonian, we can use the renormalized GF
G̃_k^{(0)}(τ) = (1/π) ∫_{−∞}^{∞} dω e^{−ωτ} ImΣ_ret(k, ω) / [ (ω − ε(k) − ReΣ_ret(k, ω))² + (ImΣ_ret(k, ω))² ] (12.31)
for our calculation, instead of the bare GF G_k^{(0)}(τ). To avoid double counting, the retarded self-energy Σ_ret(k, ω) should contain only those interactions which are not included in the Hamiltonian treated by the DMC procedure. The rules for the evaluation of D_m^{(ξ_m)} do not depend on the order and topology of the FDs. In Fig. 12.2 we show examples of typical diagrams. Here GFs of noninteracting QPs, G_k^{(0)}(τ_2 − τ_1) or G̃_k^{(0)}(τ_2 − τ_1), correspond to horizontal lines, whereas noninteracting GFs of phonons, D_q^{(0)}(τ_2 − τ_1), multiplied by the prefactor of the appropriate vertices V(k′, q) V*(k′′, q), are denoted by semi-circles. D_m^{(ξ_m)} then is the product of all GFs occurring in a given diagram. For example, the weight of the second-order term in Fig. 12.2(b) is
Fig. 12.2. (a) Typical FD contributing to the expansion (12.29). (b) FD of second order and (c) of fourth order
D_2(τ; {τ′_2, τ′_1, q}) = |V(k, q)|² D_q^{(0)}(τ′_2 − τ′_1) G_k^{(0)}(τ′_1) G_{k−q}^{(0)}(τ′_2 − τ′_1) G_k^{(0)}(τ − τ′_2) . (12.32)
The DMC process is a numerical procedure which, based on the Metropolis principle [49, 50, 51] and the weight function D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}), samples different FDs in the parameter space (τ, m, ξ_m, {x′_m}). In parallel, it collects the statistics of the external variable τ such that the result converges to the exact GF G_k(τ). Hence, within DMC the Markov process involves changes of both the internal variables and the external variable τ, as well as a switching between different orders and topologies of the FDs. The statistics of the variable τ is measured, e.g. by a histogram method.
Even though the Markov process combines the sampling of the internal parameters of a diagram and the switching between different diagrams, it is instructive to explain these two update mechanisms separately. Let us start with the sampling of one particular diagram of weight D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}), which has much in common with classical MC. Given a set {τ; {x′_1, . . . , x′_m}}, an update x_l^{(old)} → x_l^{(new)} of an arbitrarily chosen parameter is suggested. This update is accepted or rejected according to the Metropolis principle. After many steps, altering all variables, the statistics of the external variable converges to the exact dependence of the term on τ. The suggestion for the new value of the parameter, x_l^{(new)} = S^{−1}(R), is generated from a random number R ∈ [0, 1], where S^{−1}(R) is the root of the integral equation
∫_{x_l^{(min)}}^{x_l^{(new)}} W(x) dx = R , (12.33)
with a probability density W normalized in the range [x_l^{(min)}, x_l^{(max)}]. The update is then accepted with the probability
M = [ D_m^{(ξ_m)}(τ; {x′_1, . . . , x_l^{(new)}, . . . , x′_m}) / W(x_l^{(new)}) ] / [ D_m^{(ξ_m)}(τ; {x′_1, . . . , x_l^{(old)}, . . . , x′_m}) / W(x_l^{(old)}) ] . (12.34)
For the uniform distribution W = const. = (x_l^{(max)} − x_l^{(min)})^{−1}, the probability of any combination of parameters is proportional to the weight function D. However, for better convergence the distribution W(x_l^{(new)}) should be as close as possible to the actual distribution given by the function D_m^{(ξ_m)}({. . . , x_l^{(new)}, . . .}). If these two distributions coincide, M ≡ 1 for every update. Hence, all updates are accepted and the sampling is most effective. For example, if the distribution
W([τ′_4]^{(new)}) = ΔE e^{−([τ′_4]^{(new)} − τ′_3)ΔE} / [1 − e^{−(τ′_2 − τ′_3)ΔE}] (12.35)
is used to update the parameter τ′_4 in the FD of Fig. 12.2(c), [τ′_4]^{(new)} must be generated from random numbers R ∈ [0, 1] as
[τ′_4]^{(new)} = τ′_3 − (1/ΔE) ln[1 − R(1 − e^{−(τ′_2 − τ′_3)ΔE})] . (12.36)
The switching between diagrams of different order differs from the above process, which modifies a term with a given topology. Obviously this process also changes the dimension of the parameter space. All FDs contributing to the polaron GF can be sampled with two complementary updates. Update A,
D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}) −→_A D_{m+2}^{(ξ_{m+2})}(τ; {x′_1, . . . , x′_m; q′, τ′_3, τ′_4}) , (12.37)
transforms a given FD into a higher-order FD with an extra phonon arch, which connects two time points τ′_3 and τ′_4 by a phonon propagator of momentum q′, see Fig. 12.2(c). Conversely, update B performs the reciprocal transformation. Note
that the ratio of the weights D_{m+2}^{(ξ_{m+2})}/D_m^{(ξ_m)} is not dimensionless. The dimensionless Metropolis ratio is
M = (p_B/p_A) [ D_{m+2}^{(ξ_{m+2})}(τ; {x′_1, . . . , x′_m; q′, τ′, τ′′}) / D_m^{(ξ_m)}(τ; {x′_1, . . . , x′_m}) ] · [ 1 / W(q′, τ′, τ′′) ] , (12.38)
where W(q′, τ′, τ′′) is the normalized probability density with which the new variables are seeded, and p_B/p_A is a context factor discussed below.
Let us assume that the DMC process adds and removes lines with equal proba-
bility. To add a phonon propagator the A-procedure randomly chooses an arbitrary
electronic propagator. The value of the left end of the phonon propagator, τ3′ , is se-
lected with uniform probability dτ3′ /∆τ , where ∆τ is the length of the electronic
propagator considered. Then, the right end of the phonon propagator, τ4′ , is seeded
with (normalized) probability density ∝ dτ4′ ω̄ exp(−ω̄(τ4′ − τ3′ )), where ω̄ is an av-
erage frequency of the phonon spectrum. Hence, according to (12.33), the value of
τ4′ is given by
τ′_4 = τ′_3 − (1/ω̄) ln(R) . (12.39)
If τ′_4 is larger than the right end of the diagram, τ, the update is rejected. The momentum q′ of the new phonon propagator is chosen from a uniform distribution over the whole Brillouin zone, dq′/V_BZ. Then, according to the rule (12.38),
M = (p_B/p_A) [ D_{m+2}^{(ξ_{m+2})} dτ′_3 dτ′_4 dq′/V_BZ ] / [ D_m^{(ξ_m)} (dτ′_3/Δτ) dτ′_4 ω̄ e^{−ω̄(τ′_4 − τ′_3)} dq′/V_BZ ] . (12.40)
The removal step B selects an arbitrary phonon propagator and accepts the up-
date with the reciprocal M −1 of the probability, which would be used when adding
the same propagator in step A. Let us emphasize that the context factor pB /pA de-
pends on the way how the add or removal process is organized. If, for instance, the
procedure addresses these processes with equal probabilities, the naive expectation
that pB /pA = 1 is wrong. To understand this, let us consider two diagrams, Dm
and Dm+2 . The diagram Dm contains Ne electron and Nph = (Ne − 1)/2 phonon
propagators. The procedure A transforms the diagram Dm to Dm+2 with Ne + 2
electron and Nph + 1 = (Ne + 1)/2 phonon propagators. The procedure B trans-
forms the second diagram to the first one, respectively. When procedure A selects
an electron propagator for inserting the point τ3′ in Dm , we have Ne possibilities,
hence, pA = 1/Ne . On the other hand, when the procedure B selects a phonon prop-
agator for removal from Dm+2 , there are Nph + 1 = (Ne + 1)/2 possibilities and
pB = 2/(Ne + 1). Therefore, detailed balance requires a context factor of
pB Ne
= . (12.41)
pA Nph + 1
Note that this factor essentially depends on how the processes are organized. For
example, if the rule of equal add and removal probability is relaxed and the add pro-
cess is addressed f times more frequently than the removal process, the probability
of process A is pA = f /Ne and the context factor reads pB /pA = Ne /[f (Nph + 1)].
Writing expression (12.41) I intentionally do not use the relation Nph + 1 =
(Ne + 1)/2 because it is valid only in the particular case of a polaron interacting
with one phonon branch without any other terms in the interaction Hamiltonian. If
the system includes interactions with other phonon branches or external potentials,
the relation between the number of phonon and electron propagators does not hold,
while expression (12.41) is still valid.
Note that the ratio D_{m+2}^{(ξ_{m+2})}/D_m^{(ξ_m)} depends on the topology of the higher-order FD. When the FD in Fig. 12.2(c) is obtained, e.g., from the FD in Fig. 12.2(b), the ratio has the following form:
D_{m+2}^{(ξ_{m+2})}/D_m^{(ξ_m)} = |V(k − q, q′)|² D_0(q′; τ′_4 − τ′_3) G_0(k − q − q′; τ′_4 − τ′_3) / G_0(k − q; τ′_4 − τ′_3) . (12.42)
Finally, let us add a few words about the general features of the DMC algorithm. Note that all updates are local, i.e. they do not depend on the structure of the whole FD. Neither the rules nor the CPU time needed for the update depend on the order of the FD. The DMC method does not imply any explicit truncation of the FD order due to the finite size of computer memory. Even for strong coupling, where the typical number of contributing phonon propagators N_ph is large, the memory requirements are marginal. In fact, according to the central limit theorem, the number of phonon propagators obeys a Gaussian distribution centered at N̄_ph with a half width of the order of N̄_ph^{1/2} [52]. Hence, if memory for at least 2N̄_ph propagators is reserved, the diagram order hardly surpasses this limit.
For a beginner the rules given in the previous section and thoroughly described in [26] may seem rather complicated and not easy to understand. In what follows we therefore apply the DMC method to a set of increasingly complex examples. We start with the Matsubara GF of a noninteracting particle with energy ε,
G^{(0)}(τ) = e^{−(ε−μ)τ} , (12.43)
where μ is the chemical potential.
Fig. 12.3. Accumulation of statistics for (a) the GF of a QP and (b) the GF of a QP in an attractive potential
The statistics of τ are accumulated in a histogram whose cell i is incremented by one whenever the position of the external variable τ is within the cell ξ(i) < τ < ξ(i + 1), see Fig. 12.3(a).
Next, we need to initialize τ with an arbitrary value from the domain [0, τmax ]
and set up rules for the update procedure. I suggest two methods: The “simplest”
one and the “best” one.
The new external parameter τnew is suggested as a shift τold → τnew = τold + δ(R −
1/2) of the old value τold . The new value is generated by a random number R ∈ [0, 1]
with uniform distribution W (x) = 1/δ in the range [τold − δ/2, τold + δ/2]. If τnew
is not in the range [0, τmax ], the update is rejected. Otherwise, the decision to accept
or reject the update is based on the Metropolis procedure with probability ratio
M = exp [−(ε − μ)(τnew − τold )].
For the “best” method one uses the probability density
W(x) = (ε − μ) e^{−(ε−μ)x} , (12.44)
normalized in the range [0, +∞). Then, according to the rules, one solves the equation
∫_0^{τnew} W(x) dx = R , (12.45)
which yields τnew = −(ε − μ)^{−1} ln(1 − R), or equivalently τnew = −(ε − μ)^{−1} ln R, since R and 1 − R are distributed identically.
Inserting the probability densities W (τnew ), W (τold ), and the weights D(τnew ),
D(τold ), in the general expression (12.34) one gets M ≡ 1 and, hence, all updates
are accepted. Note that this update is accepted even if τ > τmax , though there is
nothing to add to the statistics, since the external variable is out of the histogram
range.
Fig. 12.4. GFs of a QP on a logarithmic scale for ε = 0 and μ = −0.3. The solid line represents the exact GF (12.43) of a free QP, ln[G^{(0)}(τ)] = −(ε − μ)τ. The dashed line describes the exact GF of the QP in the attractive potential (12.48), ln[G^{(1)}(τ)] = −(ε − V − μ)τ, for V = 0.25. Triangles and squares are the results of the DMC method for a small (a) and large (b) number of DMC updates, respectively
⁴ One can restrict the external variable τ to the range [0, τmax] using the probability density W(x) = [1 − exp(−(ε − μ)τmax)]^{−1} (ε − μ) exp(−(ε − μ)x), which is normalized in the range [0, τmax]. In this case one generates τnew as τnew = −(ε − μ)^{−1} ln[1 − R(1 − exp(−(ε − μ)τmax))]. Note the similarity of the above equation with (12.36). It occurs because in both cases the distribution of the random variable is exponential and normalized in a finite range.
To get an idea of the diagrammatic expansion of DMC let us solve the problem by Feynman expansion. Since for the given problem the system is always in the Hilbert-space sector with one particle, where c†c ≡ 1, we can introduce the unity operator η and consider H^{(int)} = −|V| η. Then, the Feynman expansion reads

G^{(1)}(\tau) = \left\langle vac \left| T_\tau\, c(\tau)\, c^\dagger(0)\, e^{|V| \int_0^\infty \eta(\tau')\, d\tau'} \right| vac \right\rangle_{con} \,,   (12.50)
with τ > 0, and the structure of diagrams is that of Fig. 12.3(b). According to the general rules, the weight of the diagram is the product of particle propagators . . . , e^{−(ε−μ)(τ_{i+1}−τ_i)}, . . . and vertices |V|. Hence, the weight of each order-m diagram of length τ is D_m(τ) = |V|^m e^{−(ε−μ)τ}.
The GF can be calculated using three different updates: The modification of
the right diagram end τ , and a pair of self-balanced updates which add/remove the
vertex |V |, see the crosses in Fig. 12.3(b). Below I introduce the minimal set of
updates sufficient to reach the numerically exact solution. Note that this set is the
simplest one but not the most efficient.
Moving the external parameter τ: The value τ − τ_last obeys the distribution (12.44), where τ_last is the position of the vertex with the largest imaginary time, or τ_last = 0 when there is not a single vertex. Therefore we can use the recipes of Sect. 12.3.2 and obtain a rejection-free update method, if τ_new is generated through

\tau_{new} = \tau_{last} - \frac{1}{\varepsilon - \mu} \ln R \,.   (12.51)
Add or remove an interaction vertex: To add an interaction vertex one randomly
chooses one particle propagator from the Nprop existing propagators in the FD of
Fig. 12.5(a), the dashed line, for example. Then the position of the new vertex is
suggested with uniform probability density W (x) = (τr − τl )−1 , hence, τnew is
chosen as τ_new = τ_l + (τ_r − τ_l)R. The Metropolis ratio thus reads

M = \frac{N_{prop}}{N_{vert} + 1}\, |V|\, (\tau_r - \tau_l) \,.   (12.52)
The structure of this ratio is intentionally given in a form where all factors have a one-to-one correspondence with those of (12.38) and (12.40). Note the roles of the
last two factors in (12.52), (12.38) and (12.40). Nvert is the number of vertices in the
FD of Fig. 12.5(a). The first factor is the context factor pB /pA of (12.38), which is
necessary to self-balance add and removal processes, and whose form depends on
how these processes are organized. The expression Nprop /(Nvert + 1) accounts for
Fig. 12.5. Updates adding (a)→(b) and removing (a)←(b) an interaction vertex. The circle
in (b) is an existing vertex of the present FD which is suggested for removal. The circle in (a)
is a vertex suggested for the adding procedure
a process, where one of the vertices is selected randomly and then removed, τ_{i+1} in Fig. 12.5(b), for example. Note that there are N*_vert = N_vert + 1 choices in the FD of Fig. 12.5(b). Hence, for a self-balanced MC process one has to divide the weight by the probability to suggest the addition of a new vertex, p_A = 1/N_prop, and multiply by the probability to suggest the removal of the same vertex, p_B = 1/N*_vert = 1/(N_vert + 1). This explains the factor p_B/p_A = N_prop/(N_vert + 1) in (12.52).
The careful reader may have noticed that the context factor is equal to unity,
since for the FDs in Figs. 12.3(b) and 12.5 we always find the relation Nprop =
Nvert + 1. However, this is correct only for the specific example of the interaction
with a single attractive potential. If, e.g., an interaction with phonons is added, the
relation between the numbers of vertices and propagators is different, though the
expression (12.52) is still correct. Hence, it seems better to stick to the correct rea-
soning even in this simple example, and introduce context factors which are valid in
more general and complicated situations. For example, in the case of several types
of interaction vertices one can introduce self-balanced updates for each type of ver-
tices. In this case Nprop is the number of all propagators and Nvert is the number of
vertices of the given type.
The Metropolis ratio for the removal procedure is constructed as the inverse of expression (12.52), which describes the adding of that same vertex which is now considered for removal,

M = \left[ \frac{N^*_{prop} - 1}{N^*_{vert}}\, |V|\, (\tau_{i+2} - \tau_i) \right]^{-1} \,.   (12.53)

Here N*_prop = N_prop + 1 (N*_vert = N_vert + 1) is the number of propagators (vertices) in the FD of Fig. 12.5(b).
In conclusion, the general strategy is the following: We start from a bare FD without interaction vertices and with the external parameter τ in the range τ_min < τ < τ_max, see Fig. 12.3. Then, with some probability, one of the three updates, move, add, or remove, is suggested. Note that with the given context factors the probabilities to address add and removal processes must be equal. One can, of course, address add and removal processes with different probabilities, but in this case the context factor p_B/p_A needs to be modified accordingly. Finally, statistics is collected as shown in Fig. 12.3, and in the end the data is normalized using the condition G^{(1)}(τ = 0) ≡ 1.
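As an illustration of this strategy, the following Python sketch implements the three updates with the Metropolis ratios (12.51)–(12.53) for the attractive-potential example; the equal probabilities of 1/3 per update type and all numerical values are illustrative assumptions.

import numpy as np

# Sketch of the DMC algorithm for G^(1)(tau): the diagram is a sorted list of
# vertex times plus the external time tau. Parameter values are illustrative.
rng = np.random.default_rng(1)
eps_mu, V, tau_max = 0.3, 0.25, 3.0
n_bins = 60
hist = np.zeros(n_bins)
tau, verts = 1.0, []

for step in range(10**6):
    u = rng.random()
    if u < 1/3:                              # move the diagram end, Eq. (12.51)
        t_last = verts[-1] if verts else 0.0
        tau = t_last - np.log(rng.random()) / eps_mu    # rejection-free
    elif u < 2/3:                            # add a vertex, Eq. (12.52)
        i = rng.integers(len(verts) + 1)     # choose one of N_prop propagators
        tl = verts[i - 1] if i > 0 else 0.0
        tr = verts[i] if i < len(verts) else tau
        n_prop, n_vert = len(verts) + 1, len(verts)
        M = n_prop / (n_vert + 1) * V * (tr - tl)
        if rng.random() < min(1.0, M):
            verts.insert(i, tl + (tr - tl) * rng.random())
    elif verts:                              # remove a vertex, Eq. (12.53)
        j = rng.integers(len(verts))
        tl = verts[j - 1] if j > 0 else 0.0
        tr = verts[j + 1] if j + 1 < len(verts) else tau
        ns_prop, ns_vert = len(verts) + 1, len(verts)   # starred counts
        M = ns_vert / ((ns_prop - 1) * V * (tr - tl))
        if rng.random() < min(1.0, M):
            verts.pop(j)
    if tau < tau_max:                        # collect statistics, Fig. 12.3(b)
        hist[int(tau / tau_max * n_bins)] += 1

# After normalization via G^(1)(0) = 1 the histogram should approach
# exp[-(eps - V - mu) tau], cf. the dashed line in Fig. 12.4.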
In Fig. 12.4 we show the convergence of the statistics for the external variable τ
(squares) to the exact answer (dashed line). After ≈ 107 DMC updates the data is
very close to the exact result, and perfectly reproduces the exact GF after ≈ 3 × 109
DMC updates. Note that the integration over different orders of FDs and over the
internal imaginary times of the interaction vertices requires a larger number of DMC
updates, compared to the free particle.
It is straightforward to adapt the algorithm of the previous section to the less trivial
case of the interaction
H_{int} = -\sum_{k,k'} |V(k, k')|\, c^\dagger c \,,   (12.54)

where, for simplicity, we assume that the degrees of freedom k and k′ are restricted to finite domains: k_min < k < k_max and k′_min < k′ < k′_max. Then, all rules are identical to those of the previous section, except for two modifications. First, one changes |V| to |V(k, k′)| in (12.52) and (12.53). Second, in the add-procedure one generates k and k′ with uniform probability densities, k = k_min + (k_max − k_min)R and k′ = k′_min + (k′_max − k′_min)R.
The exact result for the GF,

G^{(2)}(\tau) = \exp\left[ -\left( \varepsilon - \int_{k_{min}}^{k_{max}} dk \int_{k'_{min}}^{k'_{max}} dk'\, |V(k, k')| - \mu \right) \tau \right] \,,   (12.55)

depends on the interaction only through the integral \int_{k_{min}}^{k_{max}} dk \int_{k'_{min}}^{k'_{max}} dk' |V(k, k')|. Nevertheless, we need more DMC updates to converge to the exact result, because of the additional integrations over the internal variables k and k′.
12.3.5 Exciton
Fig. 12.6. Upper panel: Ladder diagrammatic expansion for GF of an exciton with total mo-
mentum k. Lower panel: Equivalent one-line representation for the same class of diagrams
Each FD of this ladder expansion (Fig. 12.6, upper panel) consists of the corresponding interaction vertices U_k(p, p′) (vertical dashed lines) and the propagators of electrons and holes with corresponding momenta (horizontal solid lines). However, for the given structure of ladder diagrams the electron and hole
propagators can be combined into single propagators for the electron-hole pair. The
propagator of the electron-hole pair is the product of hole and electron propagators
and has the form
G_k(p, \tau_{i+1} - \tau_i) = e^{-(\epsilon_k(p) - \mu)(\tau_{i+1} - \tau_i)} \,.   (12.56)
The energy of the electron-hole pair ǫk (p) = εc (k + p) − εv (k − p) corresponds to
the difference of the hole and electron energies with the center of mass momentum
k and the relative momentum 2p. Then, for such a kind of Feynman expansion one
can formulate an effective bare Hamiltonian
H (0) = ǫk (p)ξp† ξp (12.57)
p
i.e. the expansion reduces to the line shown in the lower panel of Fig. 12.6.
12.3.5.1 Updates
The MC procedure for this series of FDs is a trivial modification of the techniques
presented in previous sections. Updating the external parameter τ one needs to take
into account that the distribution W (x) = (ǫk (p3 ) − μ) exp [−(ǫk (p3 ) − μ)x] de-
pends on the momentum p3 of the propagator at the end of the FD. The updates,
which add/remove an interaction vertex to/from the FD are similar to the previous
examples. One of the Nprop propagators is chosen randomly and a time τ ′ is selected
in the range [τl , τr ] with uniform probability. Then, the momentum p2 is selected
with uniform probability from the Brillouin zone and attributed to the new propagator between the imaginary times τ′ and τ_r; τ′ is shown by a circle in Fig. 12.7(a).
Finally, the Metropolis ratio is very similar to that obtained for the simple potential model,

M = \frac{N_{prop}}{N_{vert} + 1}\, \frac{U_k(p_1, p_2)\, U_k(p_2, p_3)}{(\tau_r - \tau_l)^{-1}\, U_k(p_1, p_3)}\, e^{-(\epsilon_k(p_2) - \epsilon_k(p_1))(\tau_r - \tau')} \,.   (12.59)
Fig. 12.7. Updates adding (a)→(b) and removing (a)←(b) an interaction vertex. See caption
of Fig. 12.5
The first two factors are the same as in (12.52). The exponent takes into ac-
count the change of the FD weight due to the modification of the momentum of the
electron-hole pair between τ ′ and τr , and the factor in front of the exponent appears
due to the change of momentum p1 → p2 in the vertex at τr . The ratio for the
removal procedure is a straightforward modification of (12.53).
Having calculated the Green function, let us now extract further properties of the exciton from the limit G(τ → ∞). An eigenstate Ψ_ν(k) with energy E_ν can be written as

\Psi_\nu(k) \equiv \sum_p \xi_{k,p,\nu}\, e^\dagger_{k+p} h^\dagger_{k-p} |0\rangle \,,   (12.60)

where the amplitudes ξ_{k,p,ν} = ⟨ν; k| e†_{k+p} h†_{k−p} |0⟩ describe the wave function of the internal motion of the exciton. In terms of exciton eigenstates we have

G_k^{p=p'}(\tau) = \sum_\nu |\xi_{k,p,\nu}|^2\, e^{-E_\nu \tau} \,.   (12.61)
If τ is much larger than the inverse energy difference between the ground state and the first excited states, the GF projects to the ground state,

G_k^{p=p'}(\tau \to \infty) = |\xi_{k,p,gs}|^2\, e^{-E_{gs} \tau} \,.   (12.62)

Due to the normalization condition \sum_p |\xi_{k,p,\nu}|^2 \equiv 1, the asymptotic behavior of the sum G̃_k = Σ_p G_k^{p=p′} is especially simple: G̃(τ) → exp(−E_gs τ). By definition, in the limit τ → ∞, we have G_k^{p=p′}/G̃_k = |ξ_{k,p,gs}|², i.e. the distribution over the quasimomentum p is related to the wave function of internal motion, which is calculated by simulating the set of GFs G_k^{p=p′} for each p.
One can ask how to calculate the asymptotic behavior of G_k^{p=p′}(τ) when, obviously, the first-order diagram does not obey the condition p = p′ except for the case of the U_k(p, p′ = p) vertex. Moreover, working with the function G̃ we encounter a certain formal problem: The zero- and first-order diagrams with respect to U_k(p, p′ = p) contain macroscopically large factors N. However, since we are only interested in the ground-state properties, we can safely omit the obstructive terms, which in a careful analysis turn out to be irrelevant in the limit τ → ∞. Therefore, in the simulation one simply starts from an arbitrary second-order diagram and excludes all diagrams of order less than two.
The exciton problem (12.1)–(12.2) has been studied for many years, but until recently no rigorous technique was available for its solution. The only solvable cases are the
Frenkel small-radius limit [56] and the Wannier large-radius limit [57], but the range
of validity of these two approximations was unclear.
To study the conditions for the validity of the Frenkel and Wannier approaches
with DMC, we consider a three-dimensional (3D) system and assume an electron-
hole spectrum with symmetric valence and conduction bands of width Ec and a
direct gap Eg at zero momentum [27]. We find that for large ratio κ = Ec /Eg
(κ > 30) the exciton binding energy is in good agreement with the Wannier approx-
imation, see Fig. 12.8(a), and the probability density of the relative electron-hole
motion, see Fig. 12.8(c), corresponds to the hydrogen-like result. For smaller values of κ, however, both the binding energy and the wave function of the relative motion, see Fig. 12.8(d), deviate noticeably from the large-radius results. It is quite
Fig. 12.8. Panel (a): Dependence of the exciton binding energy on the bandwidth Ec = Ev
for conduction and valence bands. The dashed line corresponds to the Wannier model. The
solid line is the cubic spline, the derivatives at the right and left ends being fixed by the
Wannier limit and perturbation theory, respectively. Inset in panel (a): The initial part of the
plot. Panel (b): Wave function of internal motion in real space for the optically forbidden
monopolar exciton. Panels (c)–(e): The wave function of internal motion in real space: (c)
Wannier (Ec = Ev = 60); (d) intermediate (Ec = Ev = 10); (e) near-Frenkel (Ec =
Ev = 0.4) regimes. The solid line in the panel (c) is the Wannier model result while solid
lines in other panels are to guide the eyes only
surprising that we need such large valence and conduction bandwidths (κ > 20) for
the Wannier approximation to be applicable.
Similarly, the range of validity of the Frenkel approach is limited as well. Even
a strongly localized wave function does not guarantee good agreement between the
exact binding energy and the Frenkel approximation. For 1 < κ < 10 the wave
function is already strongly localized, but the binding energies differ considerably.
For example, at κ = 0.4, the relative motion is rather suppressed, cf. Fig. 12.8(e),
but the binding energy of the Frenkel approximation is two times larger than the
exact result, see inset in Fig. 12.8(a).
Another long-standing issue is the formation of charge transfer excitons in 3D
systems and the appropriate modelling of mixed valence semiconductors [58]. A
decade ago some of the unusual properties of SmS and SmB6 were explained on
the basis of an excitonic instability mechanism, thereby assuming a charge-transfer
nature of the optically forbidden exciton [59, 60]. Although this model explained
quantitatively the phonon spectra [61, 62], optical properties [63, 64], and mag-
netic neutron scattering data [65], its basic assumption has been criticized as being
groundless [66, 67]. We have studied the excitonic wave function of mixed valence
materials, starting from typical dispersions of the valence and conduction bands: An
almost flat valence band is separated from a broad conduction band with its maxi-
mum in the center and minimum at the border of the Brillouin zone [27]. The results
presented in Fig. 12.8(b) support the assumption of [59, 60], since the wave function
of the relative motion has an almost vanishing on-site component and its maximal
charge density at nearby neighbors.
Fig. 12.9. Add/remove updates changing the number of interaction vertices. Solid (dashed)
lines correspond to propagators of the particle in a state with energy ε1 (ε2 )
M = \frac{N_{prop}}{N_{vert} + 1}\, \Delta\, (\tau_r - \tau_l)\, e^{-\epsilon\, \delta S} \,,   (12.63)
where δS = (τi+2 − τi+1 ) − (τlast − τi+2 ) + (τ − τlast ). Note that each additional
vertex switches between the GFs G11 (τ ) and G12 (τ ). The statistics for G11 (τ ) is
updated when the right end of the diagram corresponds to a propagator of type 1,
which is denoted by a solid line, i.e. when there is an even number of interaction
vertices, and a contribution to the statistics of GF G12 (τ ) is counted otherwise.
To realize the importance of the above remark, one can take the code for the attractive potential from Sect. 12.3.3 and use it for the calculation of the GF of the degenerate two-level system. In the case of zero bias, ǫ = 0, the exponential factor in (12.63) is irrelevant and the DMC algorithms for both problems are equivalent. The only difference is the way the statistics for the GFs is collected: A diagram contributes to G_11(τ) for an even and to G_12(τ) for an odd number of interaction vertices.
The analytic GFs for the two-level system (12.10)–(12.11) in the case of zero bias can be obtained in the following way: Diagonalization of the Hamiltonian of the two-level system (12.10)–(12.11) without coupling to bosons yields two eigenstates with energies ±Δ. Then, the GFs G_11(τ) = ⟨vac| a_1(τ) a†_1 |vac⟩ and G_12(τ) = ⟨vac| a_1(τ) a†_2 |vac⟩ can be obtained by a canonical transformation a_{1,2} = (1/√2)[a_up ± a_low] of the initial annihilation and creation operators a_{1,2} and a†_{1,2} into the operators of the upper and lower state a_{up,low} and a†_{up,low}. Then, taking into account that a_{up,low}(τ) = exp[−(±Δ − μ)τ] a_{up,low}, one arrives at the following expressions

G_{11,12}(\tau) = \frac{1}{2} \left[ e^{-(-\Delta - \mu)\tau} \pm e^{-(\Delta - \mu)\tau} \right] \,.   (12.64)
Fig. 12.10. Comparison of the DMC data (squares) for G11 (τ ) (a) and G12 (τ ) (b) with
the solid lines corresponding to the analytic expressions (12.64) for the degenerate two-level
system. The dashed line marks the asymptotics ln[G11,12 (τ )]τ →+∞ = ln(1/2)−(−∆−μ)τ
of both GFs. Calculations are done for μ = −0.2 and ∆ = 0.15
The solution of the integral equation (12.17) is known to be an ill-conditioned problem. The GF G_k(τ) is known only with statistical errors and on a finite number of imaginary times in a finite range [0, τ_max]. Due to this incomplete and noisy information, there is an infinite number of approximate solutions which reproduce the GF within some range of deviation, and the problem is to choose the best one. Another problem is the saw-tooth noise instability, which remained a stumbling block for decades. It occurs when the problem is solved naively, e.g. by using a least-squares approach for minimizing the deviation measure

D[\tilde L_k(\omega)] = \int_0^{\tau_{max}} \left| G_k(\tau) - \tilde G_k(\tau) \right| G_k^{-1}(\tau)\, d\tau \,.   (12.65)

In the SO method the final solution is instead constructed as the average over M statistically independent particular solutions,

L_k(\omega) = \frac{1}{M} \sum_{s=1}^{M} \tilde L_k^{(s)}(\omega) \,.   (12.66)
The particular solution L̃_k^{(s)}(ω) is parameterized in terms of a sum

\tilde L_k^{(s)}(\omega) = \sum_{t=1}^{K} \chi_{\{P_t\}}(\omega)   (12.67)
of rectangles {P_t} = {h_t, w_t, c_t} with height h_t > 0, width w_t > 0, and center c_t. The configuration

C = \{ P_t \,,\ t = 1, \dots, K \} \,,   (12.68)

which satisfies the normalization condition \sum_{t=1}^{K} h_t w_t = 1, defines the function G̃_k(τ).
arbitrary initial configuration C_s^{init}. Then, the deviation measure is optimized with
a random sequence of updates, until the deviation is less than Du . In addition to
the updates, which do not change the number of terms in the sum (12.67), there are
updates which increase or decrease K. Hence, since the number of elements K is
not fixed, any spectral function can be reproduced with the desired accuracy.
Although each particular solution L̃_k^{(s)}(ω) suffers from saw-tooth noise in regions where the Lehmann function is smooth, the statistical independence of the solutions leads to a self-averaging of this noise in the sum (12.66). Note that the
noise is suppressed without suppressing high derivatives. Hence, in contrast to reg-
ularization approaches, sharp peaks and edges are not smeared out. Moreover, the
continuous parameterization (12.67) does not need a predefined mesh in ω-space,
and, since the Hilbert space of solutions is sampled directly, no assumptions about
the distribution of statistical errors are required.
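The following Python sketch illustrates this construction on synthetic data; the update mix, the step sizes, the greedy acceptance rule and the fixed number of optimization steps are deliberate simplifications of the full SO scheme, chosen only to make the idea concrete.

import numpy as np

# Sketch of the stochastic-optimization idea: parameterize the spectrum by
# rectangles {h, w, c} with total area 1, improve a particular solution by
# random updates, and average several independent solutions.
rng = np.random.default_rng(2)
taus = np.linspace(0.05, 10.0, 80)            # imaginary-time grid
omegas = np.linspace(0.0, 3.0, 300)           # frequency grid for the output

def gf(rects):                                # G(tau) = int dw L(w) exp(-w tau)
    G = np.zeros_like(taus)
    for h, w, c in rects:
        a, b = max(c - w / 2, 0.0), c + w / 2
        G += h * (np.exp(-a * taus) - np.exp(-b * taus)) / taus
    return G

def deviation(rects, G_data):                 # deviation measure, cf. (12.65)
    return np.sum(np.abs(gf(rects) - G_data) / G_data)

def particular_solution(G_data, n_steps=4000):
    rects = [[0.5, 2.0, 1.0]]                 # one rectangle of area h*w = 1
    d = deviation(rects, G_data)
    for _ in range(n_steps):
        trial = [r[:] for r in rects]
        t = rng.integers(len(trial))
        u = rng.random()
        if u < 0.4:                           # shift the center
            trial[t][2] = abs(trial[t][2] + 0.3 * (rng.random() - 0.5))
        elif u < 0.8:                         # reshape: new width at fixed area
            area = trial[t][0] * trial[t][1]
            trial[t][1] = max(trial[t][1] + 0.3 * (rng.random() - 0.5), 0.05)
            trial[t][0] = area / trial[t][1]
        else:                                 # split a rectangle (K -> K + 1)
            h, w, c = trial.pop(t)
            trial += [[h, w / 2, c - w / 4], [h, w / 2, c + w / 4]]
        d_new = deviation(trial, G_data)
        if d_new < d:                         # keep the trial if it improves
            rects, d = trial, d_new
    return rects

def spectrum(rects):                          # evaluate the rectangles on a grid
    L = np.zeros_like(omegas)
    for h, w, c in rects:
        L[(omegas > c - w / 2) & (omegas < c + w / 2)] += h
    return L

G_data = gf([[2.0, 0.5, 1.0]])                # synthetic "data": a box spectrum
L_avg = np.mean([spectrum(particular_solution(G_data)) for _ in range(20)],
                axis=0)                       # self-averaging of the noise, (12.66)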
In Fig. 12.11 we present results for an averaging over an increasing number
of statistically independent particular solutions. One can notice that the spikes in
the spectral analysis data disappear with increasing M. Note that neither the general shape of the triangle, which is an artificial Lehmann function with infinite first derivatives, nor the sharp low-energy edge of the spectral density are corrupted by the SO method.
Fig. 12.11. Comparison of the actual spectral function (dashed line) with the results of a
spectral analysis after averaging over (a) M = 4, (b) M = 28, and (c) M = 500 particular
solutions
References
1. J. Appel, Polarons. Solid State Physics, Vol. 21 (Academic, New York, 1968) 367
2. S.I. Pekar, Untersuchungen über die Elektronentheorie der Kristalle (Akademie Verlag,
Berlin, 1954) 367
3. L.D. Landau, Phys. Z. Sowjetunion 3, 664 (1933) 367
4. H. Fröhlich, H. Pelzer, S. Zienau, Philos. Mag. 41, 221 (1950) 367
5. J. Kanamori, Appl. Phys. 31, S14 (1960) 367
6. K.I. Kugel, D.I. Khomskii, Sov. Phys. Usp. 25, 231 (1982) 367
7. Y. Toyozawa, J. Hermanson, Phys. Rev. Lett. 21, 1637 (1968) 367
8. I.B. Bersuker, The Jahn-Teller Effect (IFI/Plenum, New York, 1983) 367
9. V.L. Vinetskii, Sov. Phys. JETP 13, 1023 (1961) 368, 371
10. P.W. Anderson, Phys. Rev. Lett. 34, 953 (1975) 368, 371
11. H. Hiramoto, Y. Toyozawa, J. Phys. Soc. Jpn. 54, 245 (1985) 368, 371
12. A. Alexandrov, J. Ranninger, Phys. Rev. B 23, 1796 (1981) 368, 371
13. H. Haken, Il Nuovo Cimento 3, 1230 (1956) 368, 371
14. F. Bassani, G. Pastori Parravicini, Electronic States and Optical Transitions in Solids
(Pergamon, Oxford, 1975) 368, 371
15. J. Pollmann, H. Büttner, Phys. Rev. B 16, 4480 (1977) 368, 371
16. A. Sumi, J. Phys. Soc. Jpn. 43, 1286 (1977) 368
17. M. Ueta, H. Kanzaki, K. Kobayashi, Y. Toyozawa, E. Hanamura, Excitonic Processes in
Solids (Springer-Verlag, Berlin, 1986) 368
18. C.L. Kane, P.A. Lee, N. Read, Phys. Rev. B 39, 6880 (1989) 368, 371
19. Y.A. Izyumov, Phys. Usp. 40, 445 (1997) 368
20. A.J. Leggett, Science 296, 861 (2002) 368, 371
21. A.J. Leggett, S. Chakravarty, A.T. Dorsey, M.P.A. Fisher, A. Garg, W. Zwerger, Rev.
Mod. Phys. 59, 1 (1987) 368, 371, 372
22. A. Macridin, G.A. Sawatzky, M. Jarrell, Phys. Rev. B 69, 245111 (2004) 368
23. H. Fehske, G. Wellein, G. Hager, A. Weiße, A.R. Bishop, Phys. Rev. B 69, 165115 (2004)
368
24. N.V. Prokof’ev, B.V. Svistunov, I.S. Tupitsyn, Sov. Phys. JETP 87, 310 (1998) 368
25. N.V. Prokof’ev, B.V. Svistunov, Phys. Rev. Lett. 81, 2514 (1998) 368
26. A.S. Mishchenko, N.V. Prokof’ev, A. Sakamoto, B.V. Svistunov, Phys. Rev. B 62, 6317
(2000) 368, 369, 373, 374, 380, 391, 393
27. E.A. Burovski, A.S. Mishchenko, N.V. Prokof’ev, B.V. Svistunov, Phys. Rev. Lett. 87,
186402 (2001) 368, 372, 373, 388, 389, 393
28. A.S. Mishchenko, N. Nagaosa, N.V. Prokof’ev, B.V. Svistunov, E.A. Burovski, Nonlin-
ear Optics 29, 257 (2002) 368, 372, 389
29. A.S. Mishchenko, N. Nagaosa, N.V. Prokof’ev, A. Sakamoto, B.V. Svistunov, Phys. Rev.
Lett. 91, 236401 (2003) 368, 374, 393
30. A.S. Mishchenko, N. Nagaosa, Phys. Rev. Lett. 93, 036402 (2004) 368, 393
31. A.S. Mishchenko, Phys. Usp. 48, 887 (2005) 368, 369, 393
32. A.S. Mishchenko, N. Nagaosa, J. Phys. Soc. Jpn. 75, 011003 (2006) 368, 369, 393
33. A.S. Mishchenko, Proceedings of the international school of physics “Enrico Fermi”,
Course CLXI (IOS Press, 2006), pp. 177–206 368
34. A.S. Mishchenko, N. Nagaosa, Phys. Rev. Lett. 86, 4624 (2001) 368, 372, 373, 389, 393
35. A.S. Mishchenko, N.V. Prokof’ev, B.V. Svistunov, Phys. Rev. B 64, 033101 (2001) 368, 393
36. A.S. Mishchenko, N. Nagaosa, N.V. Prokof’ev, A. Sakamoto, B.V. Svistunov, Phys. Rev.
B 66, 020301 (2002) 368, 391, 393
37. A.S. Mishchenko, N. Nagaosa, Phys. Rev. B 73, 092502 (2006) 368, 393
38. A.S. Mishchenko, N. Nagaosa, J. Phys. Chem. Solids 67, 259 (2006) 368
39. G. De Filippis, V. Cataudella, A.S. Mishchenko, C.A. Perroni, J.T. Devreese, Phys. Rev.
Lett. 96, 136405 (2006) 368, 393
40. G.D. Mahan, Many particle physics (Plenum Press, New York, 2000) 369, 370, 372, 374, 376
41. A. Damascelli, Z. Hussain, Z.X. Shen, Rev. Mod. Phys. 75, 473 (2003) 370, 371
42. A.A. Abrikosov, L.P. Gor’kov, I.E. Dzyaloshinski, Quantum Field Theoretical Methods in Statistical Physics (Pergamon Press, Oxford, 1965) 370, 374, 376
43. M. Jarrell, J.E. Gubernatis, Phys. Rep. 269, 133 (1996) 370, 372, 391
44. R. Knox, Theory of Excitons (Academic Press, New York, 1963) 371
45. I. Egri, Phys. Rep. 119, 363 (1985) 371
46. D. Haarer, Chem. Phys. Lett. 31, 192 (1975) 371
47. D. Haarer, M.R. Philpott, H. Morawitz, J. Chem. Phys. 63, 5238 (1975) 371
48. A. Elschner, G. Weiser, Chem. Phys. 98, 465 (1985) 371
49. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, J. Chem. Phys.
21, 1087 (1953) 375, 377
50. M.E.J. Newman, G.T. Barkema, Monte Carlo Methods in Statistical Physics (Clarendon Press, Oxford, 2002) 375, 377
51. D.P. Landau, K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Uni-
versity Press, Cambridge, 2000) 375, 377
52. A.W. Sandvik, J. Kurkijärvi, Phys. Rev. B 43, 5950 (1991) 380
53. D.M. Ceperley, J. Comp. Phys. 51, 404 (1983) 380
54. D.M. Ceperley, B.J. Alder, J. Chem. Phys. 81, 5833 (1984) 380
55. N. Prokof’ev, B. Svistunov, I. Tupitsyn, Phys. Rev. Lett. 82, 5092 (1999) 380
56. J. Frenkel, Phys. Rev. 37, 17 (1931) 388
57. G.H. Wannier, Phys. Rev. 52, 191 (1937) 388
58. S. Curnoe, K.A. Kikoin, Phys. Rev. B 61, 15714 (2000) 389
59. K.A. Kikoin, A.S. Mishchenko, Zh. Eksp. Teor. Fiz. 94, 237 (1988). [Sov. Phys. JETP
67, 2309 (1988)] 389
60. K.A. Kikoin, A.S. Mishchenko, J. Phys.: Condens. Matter 2, 6491 (1990) 389
61. P.A. Alekseev, A.S. Ivanov, B. Dorner, et al., Europhys. Lett. 10, 457 (1989) 389
62. A.S. Mishchenko, K.A. Kikoin, J. Phys.: Condens. Matter 3, 5937 (1991) 389
63. G. Travaglini, P. Wachter, Phys. Rev. B 29, 893 (1984) 389
64. P. Lemmens, A. Hoffman, A.S. Mishchenko, et al., Physica B 206-207, 371 (1995) 389
65. K.A. Kikoin, A.S. Mishchenko, J. Phys.: Condens. Matter 7, 307 (1995) 389
66. T. Kasuya, Europhys. Lett. 26, 277 (1994) 389
67. T. Kasuya, Europhys. Lett. 26, 283 (1994) 389
68. A.N. Tikhonov, V.Y. Arsenin, Solutions of Ill-Posed Problems (Winston, Washington,
1977) 391
69. E. Perchik, Preprint (2003). URL http://arxiv.org/abs/math-ph/0302045 391
70. D.L. Phillips, J. Assoc. Comput. Mach. 9, 84 (1962) 391
71. A.N. Tikhonov, Sov. Math. Dokl. 4, 1035 (1963) 391
72. S.S. Aplesnin, J. Exp. Theor. Phys. 97, 969 (2003) 393
13 Path Integral Monte Carlo Simulation of Charged
Particles in Traps
13.1 Introduction
This chapter is devoted to the computation of equilibrium (thermodynamic) proper-
ties of quantum systems. In particular, we will be interested in the situation where
the interaction between particles is so strong that it cannot be treated as a small per-
turbation. For weakly coupled systems many efficient theoretical and computational
techniques do exist. However, for strongly interacting systems such as nonideal
gases or plasmas, strongly correlated electrons and so on, perturbation methods fail
and alternative approaches are needed. Among them, an extremely successful one
is the Path Integral Monte Carlo (PIMC) method which we are going to consider in
this chapter.
expression for the N-particle density operator which has to be substituted in (13.1). This problem was first overcome by Feynman [2]. The key idea was to express the unknown density matrix,

\langle R |\, \hat\rho(\beta)\, | R' \rangle \quad \text{with} \quad \hat\rho = e^{-\beta \hat H} \,,   (13.2)

in terms of density matrices at higher temperatures, for which accurate approximations are available.
One simple and straightforward strategy is to use the group property of the density matrix. It allows one to express the density matrix at low temperatures in terms of its values at higher temperature, i.e.

\rho(R, R'; \beta_1 + \beta_2) = \langle R | e^{-(\beta_1 + \beta_2)\hat H} | R' \rangle = \int dR_1\, \langle R | e^{-\beta_1 \hat H} | R_1 \rangle \langle R_1 | e^{-\beta_2 \hat H} | R' \rangle = \int dR_1\, \rho(R, R_1; \beta_1)\, \rho(R_1, R'; \beta_2) \,.   (13.3)
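The group property can be tried out directly by "matrix squaring" on a grid, as in the following Python sketch for a 1D harmonic oscillator (units ℏ = m = ω = 1); the grid, the initial Δβ, and the use of a simple kinetic-potential factorization of the high-temperature density matrix (cf. (13.7) below) are illustrative assumptions.

import numpy as np

# Matrix squaring: repeated application of Eq. (13.3) with beta1 = beta2
# lowers the temperature, starting from a high-temperature density matrix.
x = np.linspace(-6.0, 6.0, 400)
dx = x[1] - x[0]
dbeta = 1.0 / 64                               # initial (high) temperature step
lam2 = 2.0 * np.pi * dbeta                     # lambda_Delta^2 for hbar = m = 1
free = np.exp(-np.pi * (x[:, None] - x[None, :])**2 / lam2) / np.sqrt(lam2)
rho = free * np.exp(-dbeta * 0.5 * x[None, :]**2)   # kinetic x potential factor

beta = dbeta
for _ in range(6):                             # beta: 1/64 -> 1
    rho = rho @ rho * dx                       # convolution, Eq. (13.3)
    beta *= 2.0

Z = np.trace(rho) * dx                         # partition function at beta = 1
print(Z, 1.0 / (2.0 * np.sinh(beta / 2.0)))    # compare with the exact result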
Equations (13.4) and (13.5) are correct for any finite M as long as we use exact ex-
pressions for the high-temperature N -particle density matrices, ρ(Ri−1 , Ri ; ∆β).
Unfortunately, they are unknown, and to proceed further we need to introduce
approximations.
¹ The total dimension of the integral, (M − 1) · 3N, may be very large. The success of the method relies on highly efficient Monte Carlo integration.
e^{-\beta \hat H} = \left( e^{-\Delta\beta \hat H} \right)^M \approx \left[ e^{-\Delta\beta \hat T} e^{-\Delta\beta \hat V} + O\!\left( \Delta\beta^2 [\hat T, \hat V]/2 \right) \right]^M \approx \left( e^{-\Delta\beta \hat T} e^{-\Delta\beta \hat V} \right)^M + O\!\left( \frac{1}{M} \right) .   (13.6)
Note that T̂ and V̂ do not commute, giving rise to the commutator [T̂, V̂], which is only the first term of a series². Neglecting the commutator terms gives an error of the order O(1/M). This error can be made arbitrarily small by choosing a sufficiently large number of factors M.
Using the Trotter result (13.6), we immediately obtain an approximation for high temperatures³

\rho(R_i, R_{i+1}; \Delta\beta) \approx \langle R_i | e^{-\Delta\beta \hat T} e^{-\Delta\beta \hat V} | R_{i+1} \rangle = \lambda_\Delta^{-3N}\, e^{-\pi (R_i - R_{i+1})^2 / \lambda_\Delta^2}\, e^{-\Delta\beta V(R_i)} \,,   (13.7)

where λ_Δ = (2πℏ²Δβ/m)^{1/2} is the De Broglie wavelength. Substituting (13.7) in (13.5) we get our final result for low temperatures
\rho(R, R'; \beta) = \int dR_1 \dots dR_{M-1}\ e^{-\sum_{i=0}^{M-1} \pi (R_i - R_{i+1})^2 / \lambda_\Delta^2}\ e^{-\sum_{i=0}^{M-1} \Delta\beta V(R_i)} \,,   (13.8)

with the boundary conditions R_0 = R and R_M = R′. Hence, we have constructed a suitable representation of the N-particle density matrix, which can be evaluated numerically with the help of a Monte Carlo algorithm.
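As a simple illustration of how (13.8) is evaluated in practice, the following Python sketch samples the discretized paths of a single particle in a 1D harmonic trap (units ℏ = m = ω = 1) with single-slice Metropolis moves; the parameter values and the crude single-slice move are illustrative choices only.

import numpy as np

# Metropolis sampling of the discretized action in (13.8) for one particle in
# a 1D harmonic trap; the closed path represents a diagonal element R = R'.
rng = np.random.default_rng(3)
beta, M = 2.0, 32
dbeta = beta / M
path = np.zeros(M)
V = lambda x: 0.5 * x**2

def local_action(i):
    """All terms of the exponent in (13.8) that involve slice i."""
    xm, x, xp = path[i - 1], path[i], path[(i + 1) % M]
    kin = ((x - xm)**2 + (xp - x)**2) / (2.0 * dbeta)   # pi (dR)^2 / lambda^2
    return kin + dbeta * V(x)

x2, n_meas = 0.0, 0
for step in range(200000):
    i = rng.integers(M)
    old, s_old = path[i], local_action(i)
    path[i] = old + 0.5 * (rng.random() - 0.5)           # trial move
    if rng.random() >= np.exp(s_old - local_action(i)):  # Metropolis rule
        path[i] = old                                    # reject
    if step > 50000:                                     # measure after warm-up
        x2 += np.mean(path**2)
        n_meas += 1

print(x2 / n_meas, 0.5 / np.tanh(beta / 2.0))   # PIMC vs. exact <x^2>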
As we can see from (13.5) and (13.8), all N particles have their own images on M different planes (or 'time slices'). We can view these images (3M coordinates for each particle) as a 'trajectory' or a 'path' in the configurational space. The
inverse temperature argument β can be considered as an imaginary time of the path.
The set of M time slices is ordered along the β-axis and separated by intervals ∆β.
In Fig. 13.1 we show typical configurations of particle trajectories which contribute
to the diagonal density matrix element (13.5) with R = R′ . The full density matrix
ρ(R, R; β) is obtained after integration over all possible path configurations with
the fixed end points (R = R′ ).
² Double, triple and higher-order commutators have higher powers Δβⁿ as a prefactor and can be dropped in the limit Δβ → 0.
³ Other more accurate high-temperature approximations are discussed in [1, 3].
Fig. 13.1. Typical configurations of particle paths contributing to the diagonal density matrix element: projection onto the XY plane (left) and particle coordinates versus time-slice index 1, . . . , M (right)
If we look at the final analytical result for the high-temperature density matrix
(13.8), we recognize the usual Boltzmann factor with some effective action in the
exponent. This action describes two types of interaction. The first term,
\sum_{i=0}^{M-1} \frac{\pi}{\lambda_\Delta^2} (R_i - R_{i+1})^2 = \sum_{j=1}^{N} \sum_{i=0}^{M-1} \frac{\pi}{\lambda_\Delta^2} (r_i^j - r_{i+1}^j)^2 = \frac{k}{2} \sum_{j=1}^{N} \sum_{i=0}^{M-1} (\Delta r_{i,i+1}^j)^2 \,,   (13.9)
comes from the kinetic energy density matrices of free particles (j denotes summation over the N particles, and i over the M 'time slices'). This energy can be interpreted as the energy of a spring, U = k(Δr)²/2, with spring constant k = 2π/λ_Δ². Changing one of the coordinates r_i^j at the time slice i is equivalent to a change of the spring energy of the two nearest links, U_i = k(Δr_{i−1,i}^j)²/2 and U_{i+1} = k(Δr_{i,i+1}^j)²/2. These springs ensure that the nearest points on the path are usually at some average distance proportional to λ_Δ. As the temperature is reduced, the average size of the path increases with λ_Δ.
The second term Δβ V(R_i) in (13.8) adds interactions to the system (e.g. an external potential or inter-particle pair interaction),

\sum_{i=0}^{M-1} \Delta\beta\, V(R_i) = \Delta\beta \sum_{i=0}^{M-1} \left[ \sum_{j=1}^{N} V_{ext}(r_i^j) + \sum_{j<k}^{N} V_{pair}(r_i^j, r_i^k) \right] .   (13.10)

Each potential term depends only on the particle coordinates on the same time slice, i.e. (r_i^1, r_i^2, . . . , r_i^N). As a result the number of pair interactions at each time slice, N(N − 1)/2, is conserved.
In all expressions above we have considered the particles as distinguishable.
The generalization to quantum particles obeying Fermi/Bose statistics is considered
below, and discussed in more detail in [1, 3, 4, 5].
Having the general idea of the PIMC simulations we are ready to formulate the first
list of important issues which we need to solve.
It is necessary to explore the whole coordinate space for each intermediate point.
This is very time consuming. To speed up convergence we move several slices
(points of path) at once.
The key point is to sample a path using mid-points R_m and subsequent iteration (bisection), see Fig. 13.2(b).
With the definitions 0 < t < β, τ = i_0 Δβ (i_0 = 1, 2, 3, . . .), R ≡ R(t), R′ ≡ R(t + 2τ), and R_m ≡ R(t + τ), the guiding rule to sample a mid-point R_m is

P(R_m) = \frac{\langle R | e^{-\tau \hat H} | R_m \rangle \langle R_m | e^{-\tau \hat H} | R' \rangle}{\langle R | e^{-2\tau \hat H} | R' \rangle} \approx (2\pi\sigma_\tau^2)^{-d/2}\, e^{-(R_m - \bar R)^2 / 2\sigma_\tau^2} \,,   (13.11)
where d is the spatial dimension of the system. In practice, we can neglect the potential energy in the sampling distribution and use only the ratio of the free-particle density matrices. As a result we get a Gaussian distribution with the mean R̄ = (R + R′)/2 and the variance σ_τ² = ℏ²τ/2m. This will lead to 100% acceptance of the sampling for ideal systems (and close to one for a weakly interacting system).
For strongly interacting systems the overlap of the paths sampled from the free-particle distribution (13.11) results in a large increase of the interaction energy and in a poor acceptance probability at the last level of the bisection sampling [1, 3]. This can be improved by using the optimized mean and variance

\bar R = \frac{R + R'}{2} + \frac{\partial V(\bar R)}{\partial \bar R}\, \sigma_\tau^2 \,, \qquad \sigma_\tau^2 = \frac{\hbar^2 \tau}{2m} + \frac{\hbar^2 \tau}{m}\, \Delta V(\bar R) \,,   (13.12)

which also accounts for the interaction between nearest neighbors (gradient of the potential energy).
The advantages of the bisection sampling method [1, 3] are:
– Detailed balance is satisfied at each level.
– We do not waste time on moves for which paths come close and the potential
energy increases strongly (for repulsive interaction). Such configurations are re-
jected already at early steps.
– Computer time is spent more efficiently because we consider mainly configura-
tions with high acceptance rate.
– The sampling of particle permutations is easy to perform.
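A minimal Python sketch of the recursive mid-point construction for a free particle in one dimension is given below (units ℏ²/m = 1); the recursion cutoff is an illustrative choice, and in the interacting case each level would additionally be accepted or rejected via the Metropolis rule.

import numpy as np

# Free-particle bisection: mid-points are drawn from the Gaussian of (13.11),
# whose variance is halved with each level of the recursion.
rng = np.random.default_rng(4)

def bisect(xl, xr, tau, tau_min=1e-2):
    """Sample the interior of a path between fixed ends xl, xr that are
    separated by imaginary time 2*tau."""
    if tau < tau_min:                                 # illustrative cutoff
        return []
    sigma = np.sqrt(tau / 2.0)                        # sigma_tau^2 = tau/2
    xm = 0.5 * (xl + xr) + sigma * rng.normal()       # mean (R + R')/2
    return bisect(xl, xm, tau / 2.0) + [xm] + bisect(xm, xr, tau / 2.0)

points = bisect(0.0, 0.4, tau=0.16)                   # regrow a path segment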
In the quasi-classical limit (β → 0), only the classical path is important, R_0(t) = (1 − t/β) R + (t/β) R′, which leads to the semi-classical approximation of the high-temperature density matrix

\rho(R, R'; \Delta\beta) = \rho_0(R, R'; \Delta\beta)\, \exp\left[ -\int_0^{\Delta\beta} dt\, V(R_0(t)) \right] ,   (13.14)

which is already much better compared to (13.7) with the substitution of classical (in many cases divergent) potentials.
For systems with pair interactions, in the limit of small Δβ, the full density matrix (13.13) can be approximated by a product of pair density matrices

\left\langle e^{-\int_0^\beta dt\, V(R(t))} \right\rangle_{FK} \approx \prod_{j<k} \left\langle e^{-\int_0^\beta dt\, V_{pair}[r_j(t), r_k(t)]} \right\rangle_{FK} \,,   (13.15)

which is known as the pair approximation. It supposes that on the small time interval Δβ the correlations of two particles become independent of the surroundings. Different derivations of the effective pair potential (the average on the r.h.s. of (13.15)) have been proposed in the literature [6, 7, 8]. More accurate effective interaction potentials, which take into account two-, three- and higher-order correlation effects, help to reduce the number of time slices by a factor of 10 or more.
The implementation of periodic boundary conditions leads to further modifica-
tions, see e.g. [9, 10, 11, 12, 13, 14, 15].
One can note at once, that for weakly interacting systems at high temperatures,
the virial result (13.18) directly gives the classical kinetic energy (first term) and
does not depend on the chosen number of time slices M , whereas using the direct
estimator (13.17) we get this result by calculating the difference of two large terms
which are diverging as M → ∞.
When we try different kinds of moves in the Metropolis algorithm, it may happen that some moves will be frequently rejected or accepted. In both cases, we lose efficiency: The system will be trapped in some local region of phase space for a long time (number of MC steps), and will not explore the whole space within reasonable computer time. In practice, the parameters of the moves are usually chosen to get an acceptance ratio of roughly 50%, which requires the construction of good a priori sampling distributions for the different kinds of PIMC moves (particle displacement, path deformation, permutation sampling). A discussion of these topics, which is beyond the scope of this lecture, can be found in [1, 3].
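A common way to realize the 50% rule in practice is to adapt the move parameters during equilibration, as in the following sketch; the concrete adjustment factor is an illustrative heuristic, not a prescription from the text.

# Adapt a move parameter (e.g. the displacement width) towards a target
# acceptance ratio during the warm-up phase: a high acceptance rate means
# the moves are too timid, so the width is increased, and vice versa.
def tune(delta, n_accepted, n_attempted, target=0.5, factor=1.2):
    rate = n_accepted / max(n_attempted, 1)
    return delta * factor if rate > target else delta / factor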
Now we come to ‘real’ quantum particles. As we have already discussed, the prop-
erties of a system of N particles at a finite temperature T are determined by the
density operator. Due to the Fermi/Bose statistics the total density matrix should
be (anti)symmetric under arbitrary exchange of two identical particles (e.g. elec-
trons, holes, with the same spin projection), i.e. we have to replace ρ → ρA/S for
fermions/bosons. As a result the full density matrix will be a superposition of all N !
permutations of N identical particles. Let us consider the case of two types (e, h) of particles with numbers N_e, N_h,

\rho_{A/S}(R_e, R_h, R_e', R_h'; \beta) = \frac{1}{N_e! N_h!} \sum_{P_e} \sum_{P_h} (\mp 1)^{P_e} (\mp 1)^{P_h}\, \rho(R_e, R_h, \hat P_e R_e', \hat P_h R_h'; \beta) \,,   (13.19)

where P_{e(h)} is the parity of a permutation (number of equivalent pair transpositions) and P̂_{e(h)} the permutation operator. We directly see that for bosons all terms have a positive sign, while for fermions the sign of the prefactor alternates depending on whether the permutation is even or odd.
In the latter case a severe problem arises. The Metropolis algorithm gives the
same distribution of permutations for both Fermi and Bose systems. The reason is
that, for sampling permutations, we use the modulus of the off-diagonal density ma-
trix, |ρ(R, P R; β)| (implementation of the importance sampling in the Metropolis
scheme). We find that:
– For bosons all permutations contribute with the same (positive) sign. Hence with
the increase of the permutation statistics, accuracy in the calculation of the den-
sity matrix increases proportionally.
– For fermions positive and negative terms cancel almost completely (correspond-
ing to even and odd permutations), since both are close in their absolute values.
Accurate calculation of this small difference is hampered noticeably with the
increase of quantum degeneracy (low T , high density). The consequences are
large fluctuations in the computed averages. This is known as the fermion sign
problem. It was shown [5] that the efficiency of the straightforward calculations
scales like exp(−2N β∆F ), where ∆F is the free energy difference per particle
of the same fermionic and bosonic system, and N is the particle number.
Fermi and Bose statistics require sampling of permutations, see (13.19), in addition
to the integrations in real space. From the N ! possibilities, we need to pick up a
permutation which has a non-zero probability for a given particle configuration.
To realize a permutation we pick two end-points {R_i, R_{i+i_0}} along the β-axis with i_0 = 2^{l−1} (l = 1, 2, . . .). Although the permutation operator P̂ in (13.19)
acts on the last time-slice, Re(h) , the permutation of the paths, {Ri , Ri+i0 } →
{Ri , P Ri+i0 } can be carried out at any time slice because the operator P commutes
with the Hamiltonian. In a permutation (k permuted particles) the path coordinates
between the fixed points Ri and Ri+i0 are removed and new paths connecting one
particle to another (new k links) or a new path connecting a particle on itself (if a
given particle undergoes the identity permutation) are sampled.
It is evident that a local permutation move consisting of a cyclic exchange of
k ≥ 2 neighboring particles will be more probable than a global exchange involving
a macroscopic number of particles, and, in general, the probability of exchange will
decrease with the increase of k. The most probable are local updates: Permutations
of only few (2, 3, 4) particles. Moreover, any of the N ! permutations can be decom-
posed in a sequence of successive pair transpositions (two particle exchange), and
we can explore the whole permutation space by making only local updates which
have a high acceptance ratio.
In MC simulations we choose as the sampling probability of permutations

T(P \to P') = \frac{\rho_{kin}(R_i, \hat P' R_{i+i_0}; i_0 \Delta\beta)}{\sum_{\tilde P \in \Omega(P)} \rho_{kin}(R_i, \tilde P R_{i+i_0}; i_0 \Delta\beta)} \,,   (13.20)
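For instance, restricted to pair exchanges, the heat-bath choice (13.20) can be sketched as follows in Python; the one-dimensional coordinates and the restriction of Ω(P) to transpositions are assumptions of this illustration.

import numpy as np

# Choose a pair transposition with probability proportional to the product of
# free-particle kinetic density matrices, cf. (13.20); lam2 = lambda^2 for the
# time interval i0 * dbeta.
rng = np.random.default_rng(5)

def rho_kin(x, y, lam2):
    return np.exp(-np.pi * (x - y)**2 / lam2)

def sample_transposition(Ri, Rii0, lam2):
    """Ri, Rii0: coordinates of the N particles at the fixed slices i and
    i + i0; returns the sampled pair (a, b)."""
    N = len(Ri)
    pairs = [(a, b) for a in range(N) for b in range(a + 1, N)]
    w = np.array([rho_kin(Ri[a], Rii0[b], lam2) * rho_kin(Ri[b], Rii0[a], lam2)
                  for a, b in pairs])
    return pairs[rng.choice(len(pairs), p=w / w.sum())]

# The path pieces between the two slices are then regrown (e.g. by bisection)
# and the whole move is accepted or rejected via the Metropolis rule.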
Fig. 13.2. (a),(c) The Y-coordinates of N = 5 identical particles as a function of the time-
slice number m. Labels show particle indices. Thick gray and light gray lines show the paths
of the particles ‘1’ and ‘4’ which are exchanged by sampling new paths at time-slices m =
17 − 33 (these time-slices are in the region between two dashed lines). (b) Sampling of new
paths using the bisection algorithm for the particles ‘1’ and ‘4’. The new paths are constrained
at the time-slices m = 17−33. Old (new) paths are shown by lines (circles). The filled circles
show two mid-points sampled at the level l = 1 (center of the interval, m = 25) and four
other mid-points for sub-intervals [17, 25] and [25, 33] sampled at level l = 2. Open circles
show final new paths for two particles obtained with the sampling at levels l = 3, 4 and
the transposition, i.e. by exchanging the paths starting from m = 33 up to the end point,
m = 100
Parameters such as the temperature and the interaction strength have a direct influence on these cycle distributions and, hence, on the superfluid and condensate fractions. Below we demonstrate how the latter can be easily related to the statistics of path configurations sampled by PIMC.
To be more specific, in the discussion below we consider a system of trapped bosons with Coulomb interaction described by the Hamiltonian

\hat H = \hat H_0 + \sum_{i<j} \frac{e^2}{\varepsilon |r_i - r_j|} \,, \qquad \hat H_0 = \sum_{i=1}^{N} \left[ -\frac{\hbar^2}{2m} \nabla_{r_i}^2 + \frac{m}{2}\, \omega^2 r_i^2 \right] ,   (13.23)

which can also be reduced to the dimensionless form (in harmonic oscillator units)

\tilde H = \frac{\hat H}{\hbar\omega} = \frac{1}{2} \sum_{i=1}^{N} \left[ -\nabla_{\tilde r_i}^2 + \tilde r_i^2 \right] + \lambda \sum_{i<j} \frac{1}{\tilde r_{ij}} \,.   (13.24)
13.4.1 Superfluidity
In PIMC, the superfluid fraction γ_s can be obtained from the path area formula [24],

\gamma_s = \frac{\rho_s}{\rho} = \frac{4 m^2 \langle A_z^2 \rangle}{\beta \hbar^2 I_c} \,, \qquad A_z = \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{M} \left[ r_{i,k} \times r_{i,k+1} \right]_z \,,   (13.25)

where N is the particle number, M the number of time slices used in the path integral representation, A_z the area enclosed by the paths, I_c the classical moment of inertia, and ⟨. . .⟩ denotes the thermal average with respect to the bosonic (symmetric) N-particle diagonal density matrix,

\langle \dots \rangle = \frac{1}{Z} \int dr_1\, dr_2 \dots dr_N\, (\dots)\, \rho_S(r_1, r_2, \dots, r_N; \beta) \,.   (13.26)
This formula has been derived [24] for finite systems by assuming that particles
are placed in an external field, e.g. in a rotating cylinder. Then one assumes that the
system is put in a permanent slow rotation with the result that the normal component
follows the imposed rotation while the superfluid part stays at rest. The effective
moment of inertia is defined as the work required to rotate the system by a unit
angle.
For macroscopic systems the path area formula (13.25) can be modified [3, 25]. Instead of a filled cylinder, one considers two cylinders with radius R and spacing d̄, where d̄ ≪ R. Such a torus is topologically equivalent to the usual periodic boundary conditions. As a result we have I_c = mNR² and A_z = WR/2, where W is the winding number, defined as the flux of paths winding around the torus and crossing any plane,

\gamma_s = \frac{\rho_s}{\rho} = \frac{\langle W^2 \rangle}{2 \lambda \beta N} \,, \qquad W = \sum_{i=1}^{N} \int_0^\beta dt\, \frac{d r_i(t)}{dt} \,.   (13.27)
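For discretized paths, this estimator can be sketched as follows; the (N, M) array layout in a single periodic dimension of length L, the minimum-image convention, and the averaging over stored configurations are assumptions of this illustration (with λ = ℏ²/2m).

import numpy as np

# Winding-number estimator of the superfluid fraction, Eq. (13.27).
def superfluid_fraction(samples, L, beta, N, lam):
    """samples: iterable of (N, M) arrays of bead coordinates along the
    periodic dimension; closed (possibly permuted) paths are assumed."""
    W2 = []
    for paths in samples:
        d = np.roll(paths, -1, axis=1) - paths   # slice-to-slice displacements
        d -= L * np.round(d / L)                 # minimum image convention
        W2.append(d.sum()**2)                    # W = total flux across the torus
    return np.mean(W2) / (2.0 * lam * beta * N)  # <W^2> / (2 lam beta N)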
For a macroscopic system, Bose condensation manifests itself as off-diagonal long-range order of the single-particle density matrix, ρ(r_1, r_1′) → n_0/V for |r_1 − r_1′| → ∞ (13.29), where n_0 is the fraction of particles in the condensate and V is the volume of the simulation cell. For a homogeneous isotropic system, ρ(r_1, r_1′) = ρ(|r_1 − r_1′|) and, by taking the Fourier transform of an off-diagonal element, one obtains the momentum distribution

\rho(k) = \frac{1}{(2\pi)^d} \int d(r_1 - r_1')\ e^{-i k (r_1 - r_1')}\, \rho(|r_1 - r_1'|; \beta) \,,   (13.30)

which shows a sharp increase at zero momentum when the temperature drops below the critical temperature T_c of Bose condensation.
Obviously, a finite trapped system of particles considered in real experiments
behaves differently. The radial density is strongly inhomogeneous with the highest
value at the trap center. However, these systems do represent an analog of the homo-
geneous macroscopic system in the angular direction (for traps with angular sym-
metry as in the case (13.23)). Hence, the macroscopic formulas (13.29) and (13.30)
should be modified in an appropriate way and the corresponding momentum distri-
bution, the condensate fraction and superfluidity acquire an additional dependence
on the radial distance from the trap center.
As follows from (13.28), for the numerical evaluation of the single-particle density matrix one should allow that one of the N simulated particles has an open path, i.e. r_1 ≠ r_1′. The paths of the other N − 1 particles close at their beginning and end as usual, and one samples the probability

W = \rho_S(r_1, r_2, \dots, r_N, r_1', r_2, \dots, r_N; \beta) / Z' \,,   (13.32)
(Z ′ is the normalization factor) which is then used to obtain the momentum distri-
bution (13.30). The probability W is sampled using the path integral representation
of ρS .
Recently a new method to sample the single-particle density matrix (13.28) has been proposed [26, 27]. It is based on a generalization of the conventional PIMC method to the grand canonical ensemble. The worm algorithm [26, 27] allows for a simultaneous sampling of both diagonal configurations contributing to the partition function and off-diagonal ones which contribute to the one-particle Matsubara Green function. The method has recently been applied to the study of Bose condensation in crystalline ⁴He and of superfluidity in para-hydrogen droplets [28, 29], where a high efficiency in the sampling of long permutation cycles (practically unaffected by system size) and significantly improved convergence in the calculation of superfluid properties have been demonstrated.
Fig. 13.3. Superfluid fraction for N = 5 charged bosons with Coulomb interaction in a two-dimensional harmonic trap (see the Hamiltonian (13.23)). Parameters are the coupling strength λ = 0, 2, 10, 100 and the dimensionless temperature T̃ = k_B T/ℏω. Symbols denote PIMC data (from [30]). The dash-dotted line displays an analytical result, γ_s = 1 − I_q/I_c, for ideal harmonically confined bosons. The insets show the density distributions at λ = 10 for three temperatures
With the PIMC method it is possible to include inter-particle interactions like e.g.
Coulomb repulsion (13.23) from first principles. The effective strength of the inter-
action can be controlled by the trap frequency and is measured by the parameter λ.
As an illustration in Fig. 13.3 we present numerical results from PIMC simulations.
Shown is the temperature dependence of the superfluid fraction for several values
of the control parameter λ (the range 2 ≤ λ ≤ 10 corresponds to typical particle
densities in semiconductor quantum dots). The repulsive interaction causes a shift
of the transition temperatures to lower values. When cooled down, the system typi-
cally forms a crystal like state in intermediate temperature regions until it melts into
a ring like structure with delocalized particles, see insets in Fig. 13.3. Obviously, the
latter shows a high superfluid response which is proportional to the ratio of the area
enclosed by paths to the cross-section of the whole system (see (13.25)). In the ideal
case, the system skips the intermediate crystal phase and directly reaches the delo-
calized state. In the case of dominating interaction strengths, the system stays highly
localized even at absolute zero. Note that even for the crystal phase the simulations yield a nonvanishing value of γ_s. This is a finite-size effect caused by a nonzero area ratio (13.25) (for details see [30]).
13.5 Discussion
We close this lecture with a few general comments. Quantum and classical Monte
Carlo methods are currently actively developing computational tools for a basi-
cally exact treatment of many-body systems in equilibrium. Quantum simulations
are particularly complicated: While in classical mechanics one only has to evaluate
integrals over the Boltzmann distribution, in quantum mechanics one also needs to
determine the quantum density matrix or, at low temperature, the wave function.
The basis for the PIMC approach lies in the correspondence principle, which states
that quantum mechanics reduces to classical mechanics in the limits of low density
and high temperature.
The ability of quantum Monte Carlo methods (including PIMC) to provide an accurate treatment of quite a general class of model Hamiltonians has led to applications in many fields of physics, including low-temperature degenerate plasmas,
solid state physics, nanomaterials, collective effects in ultra-cold Bose and Fermi
gases, molecules etc. This list is far from being complete.
Typical applications include neutral atoms cooled down to temperatures of sev-
eral nano Kelvin, or systems with strong correlations, quantum effects in solids,
melting or liquid-vapor transitions. Particularly interesting are the crystal formation
of electrons or holes in bulk semiconductors [31] and quantum dots [32, 33], the
superfluidity of dense 4 He in Vycor [28, 34], the equation of state, phase transitions
and the phase diagram of hot, dense hydrogen [16, 17, 35, 36, 37].
In addition, there are calculations concerned with the superfluid transition of ⁴He [38, 39]. Since ⁴He is one of the simplest bosonic systems for experimentalists
References
1. A. Filinov, M. Bonitz, in Introduction to Computational Methods for Many Body Sys-
tems, ed. by M. Bonitz, D. Semkat (Rinton Press, Princeton, 2006) 397, 399, 400, 401, 402, 403, 405
2. R. Feynman, A. Hibbs, Quantum Mechanics and Path Integrals (McGraw Hill, New
York, 1965) 398, 406
3. D. Ceperley, Rev. Mod. Phys. 67, 279 (1995) 399, 400, 401, 403, 405, 408
4. H. Kleinert, Path Integrals in Quantum Mechanics, Statistics and Polymer Physics, 2nd
edn. (World Scientific, 1995) 400
5. D. Ceperley, in Monte Carlo and Molecular Dynamics of Condensed Matter Systems,
ed. by K. Binder, G. Ciccotti (Editrice Compositori, Bologna, 1996) 400, 404
6. W. Ebeling, H. Hoffmann, G. Kelbg, Contr. Plasma Phys. 7, 233 (1967). And references
therein 402
7. A. Filinov, V. Golubnychiy, M. Bonitz, W. Ebeling, J. Dufty, Phys. Rev. E 70, 046411
(2004) 402
8. H. Kleinert, Phys. Rev. D 57, 2264 (1998) 402
9. T. Gaskell, Proc. Phys. Soc. 77, 1182 (1961) 402
10. T. Gaskell, Proc. Phys. Soc. 80, 1091 (1962) 402
11. D. Ceperley, Phys. Rev. B 18, 3126 (1978) 402
12. V. Natoli, D. Ceperley, J. Comput. Phys. 117, 171 (1995) 402
13. C. Lin, F. Zong, D. Ceperley, Phys. Rev. E 64, 016702 (2001) 402
14. P. Kent, R. Hood, A. Williamson, R. Needs, W. Foulkes, G. Rajagopal, Phys. Rev. B 59,
1917 (1999) 402
14 Ab-Initio Approach to the Many-Electron Problem

Alexander Quandt
The chemical and physical properties of solids, molecules and nanomaterials de-
pend on a subtle interplay of the spatial arrangement of the ions and the resulting
distribution and density of electrons, which provide the binding forces of the sys-
tem. Predicting the structure and the properties of novel materials, e.g., nanosys-
tems, therefore is impossible without falling back on the elementary interactions
and the most accurate ab initio methods for their simulation. We give a survey of
the most popular ab initio methods used by quantum chemists, and describe some
important modifications that made those methods available for the study of complex
nanomaterials of moderate size.
14.1 Introduction
The term ab initio1 refers to a family of theoretical concepts and computational
methods that literally treat the many-electron problem from the beginning. In other
words, these methods start from the exact (non-relativistic) many-body Hamiltonian
of an atomic, molecular or solid system comprising M atoms and N electrons.
In a strict sense, the one and only approximation ever made will be the Born-
Oppenheimer approximation [1], where one assumes that the electronic and nu-
clear time scales effectively decouple. Then one may freeze the nuclear degrees
of freedom R ≡ {R1 . . . RM } and solve the corresponding Schrödinger equa-
tion for a many-electron wavefunction Ψ that will explicitly depend on the elec-
tronic degrees of freedom r ≡ {r1 . . . rN }, only. Therefore in the framework of the
Born-Oppenheimer approximation, the exact many-electron Hamiltonian will be (in
atomic units, see [2]):
H(r, R) \equiv -\sum_{i}^{N} \frac{1}{2} \Delta_{r_i} - \sum_{i}^{N} \sum_{\alpha}^{M} \frac{Z_\alpha}{|r_i - R_\alpha|} + \sum_{i<j}^{N} \frac{1}{|r_i - r_j|} + \sum_{\alpha<\beta}^{M} \frac{Z_\alpha Z_\beta}{|R_\alpha - R_\beta|} \,.
Here, the first term denotes the operator of the kinetic electronic energies, the sec-
ond term refers to the various attractive electron-nucleus interactions, the third term
describes the various electron-electron repulsions, and the final term describes the
repulsions between the various nuclei of the system.
¹ Latin: ab, from + initio, ablative of initium, beginning.
Fig. 14.1. Moore’s law predicts an exponential growth of computing power, which obviously
extends over various technologies (electromechanical: 1900–1935, relays: 1934–1940, vac-
uum tubes: 1940–1960, transistors: 1960–1970, integrated circuits: since 1970), see [7]
that similar shifts in fabrication technologies and distributed computing will extend
Moore’s law even into the far future [7], and that these developments will provide us
with increasingly powerful computational platforms for future ab initio simulations.
A second important factor for the dramatic progress of ab initio methods were several algorithmic breakthroughs, which considerably boosted the performance of modern ab initio program packages. In order to understand the strong dependence of modern ab initio codes on the development of powerful numerical algorithms, we list some of the most popular algorithms in Table 14.1. This listing was taken from a recent effort to identify the top ten algorithms of the 20th century [8]. It comes as no big surprise that the vast majority of these algorithms actually form key elements of modern ab initio codes, and progress along these lines implies progress in the computational performance of ab initio codes. Probably, the top ten algorithms of the 21st century will also make their way into future ab initio codes.
The third important factor for the progress of ab initio methods were theoretical
and conceptional breakthroughs in applying the variational principle described by
(14.1). Nowadays chemical accuracy may routinely be achieved for system sizes
that imply hundreds of atoms and electrons, and these developments turned out to
be so useful for our current understanding of molecular and solid systems, that the
1998 Nobel prize in Chemistry was given to some of the protagonists in the field of
ab initio methods, Walter Kohn and John A. Pople. We will describe some of their
achievements in Sect. 14.3 and 14.4, but in order to get a more detailed impression
about their pioneering work, we recommend the study of some decent textbooks,
for example [2, 9, 10, 11] or [12].
Table 14.1. Top ten algorithms of the 20th century, after [8]
Let us finally emphasize that these lecture notes are meant to be tutorial in the
first place, and to provide the reader with some sort of jump start concerning mod-
ern ab initio methods. Therefore these notes are no substitute for an extended review
article about ab initio methods, and the interested reader is asked to consult further
literature in order to get a more detailed picture of the vast field of modern ab ini-
tio methods. Beyond that, knowledge usually comes with practice, and we really
want to encourage the reader to get some practical experience with modern ab initio
methods, for example after implementing and running some of the program pack-
ages listed in Sect. 15.A.
It is very likely that already during high school, your chemistry teacher introduced you to the language of Lewis diagrams, just like the ones shown in Fig. 14.2. And after some time, you might even have learned to check chemical
Fig. 14.2. Lewis diagrams. (a) Octet rule for main row elements and some examples. (b)
Dodectet rule for d-block elements and a simple example. (c) Resonance effects stabilizing a
π-electron system
structures by carefully counting electrons from one to eight. But it is also very likely
that someone at the university finally told you that it is all rubbish. Well, the next
sections will show you that even the most simple-minded Lewis picture of the chemical bond is not that far off the truth.
Let us have a closer look at Fig. 14.2. Under (a), we find a rather suggestive representation of the famous octet rule, which tells you that main row elements bind via localized electron pairs, in such a way that all main row elements involved in chemical binding are able to completely fill up their valence shells (s + 3p) with shared electrons. There is a similar rule for d-block elements shown in (b), where the valence shell comprises six orbitals (s + 5d), which leads to a dodectet rule [16].
A single Lewis diagram of course is a very localized description of the chem-
ical bond, and in most cases, there is additional stabilization through delocaliza-
tion effects. In the classical resonance picture of the chemical bond [4], delocal-
ized bonding will be represented by a series of resonance structures, as indicated in
Fig. 14.2(c) for the well-known case of the delocalized π-electron system of ben-
zene. The real π-electron wavefunctions will be superpositions of these resonance
structures, such that all carbon-carbon bonds in Fig. 14.2(c) will turn out to be equal.
According to the Pauli principle, every orbital can be filled with two electrons, and therefore we may consider the doubly occupied orbital solutions of (14.2) to correspond to some localized electron pairs in the Lewis diagrams, to be located
around the atomic cores. Now let us assume that the influence of the nearby ionic
cores and electrons may be described by the addition of a perturbation term fpert (r).
Then according to second order perturbation theory, we obtain the following results:
\phi_i^{(1)}(\mathbf{r}) = \sum_{j \neq i} \frac{\langle \phi_j | f_{\mathrm{pert}} | \phi_i \rangle}{\epsilon_i - \epsilon_j} \, \phi_j(\mathbf{r}) \,, \qquad (14.3)

E_i^{(1)} = \langle \phi_i | f_{\mathrm{pert}} | \phi_i \rangle \,, \qquad (14.4)

E_i^{(2)} = \sum_{j \neq i} \frac{|\langle \phi_i | f_{\mathrm{pert}} | \phi_j \rangle|^2}{\epsilon_i - \epsilon_j} \,. \qquad (14.5)
These equations have an interesting interpretation. First of all, (14.3) tells us that under the influence of a perturbing environment, our localized orbitals will mix and form delocalized orbitals. Equation (14.4) is a simple energy shift lacking any further interpretation. But (14.5) contains a lot of chemistry. Here, the expression for the second order energy correction involves a sum of terms that become negative and rather large (i.e. bonding), whenever there is a strong interaction \langle \phi_i | f_{\mathrm{pert}} | \phi_j \rangle between orbitals i and j that are close in energy, and \epsilon_i < \epsilon_j. Therefore the system usually stabilizes through one or just a few bonding contributions, which correspond to a specific energetic scenario indicated in Fig. 14.3.
The latter diagram also has some chemical interpretation. The occupied orbital
i of energy ǫi is strongly interacting with a nearby unoccupied orbital j ∗ of energy
ǫj ∗ > ǫi . According to Lewis [13], the occupied orbital i is an electron pair donor
(Lewis base), whereas the unoccupied orbital j ∗ is an electron pair acceptor (Lewis
acid). The strong interaction between the donor orbital and the acceptor orbital leads
to the formation of a delocalized bonding orbital, which is lower in energy by an
amount:
E_{i \to j^*}^{(2)} = -2 \, \frac{|\langle \phi_i | f_{\mathrm{pert}} | \phi_{j^*} \rangle|^2}{\epsilon_{j^*} - \epsilon_i} \,. \qquad (14.6)
Fig. 14.3. Donor-acceptor interaction between a doubly occupied orbital i of energy \epsilon_i and an unoccupied orbital j^* of energy \epsilon_{j^*}, forming a new bonding orbital stabilized by an energy E_{i \to j^*}^{(2)}
This delocalized bonding orbital will be filled by the electron pair that originally
occupied the localized orbital i.
Therefore, given the validity of a one-electron picture, where every (localized)
electron is only slightly perturbed by a local interaction fpert (r) corresponding to
the averaged influence of the environment, the chemical bonding will largely be
dominated by donor-acceptor interactions of the type shown in Fig. 14.3. And this
seems to be the standard scenario of quantum chemistry.
From a chemical point of view, the basis functions \varphi(\mathbf{r}) should either be chosen such that they mimic localized electronic states, for example the eigenstates of a single atom (atomic orbitals); or the basis functions should have some important physical or chemical properties in common with the one-electron states they are supposed to describe, for example the periodicity of electron wavefunctions in a solid.
Beyond that, the basis functions also serve some numerical purpose, namely
the transformation of a Schrödinger equation similar to (14.2) into a generalized
matrix eigenvalue problem to be discussed below. Then the criterion must be that the numerical algebra related to these basis functions should be as simple as possible. Consequently, the basis functions \varphi_\mu neither have to be orthogonal, nor do they really have to correspond to any known f_{\mathrm{loc}}(\mathbf{r}). Instead, for the usual one-electron
system encountered in quantum chemistry or solid state physics, it will be most
important to pick a set of basis functions of the right physical shape. This chosen
basis set must be large enough to mimic electrons in a realistic fashion, but at the
same time small enough to keep the related matrix eigenvalue problem manageable.
Some of the most popular choices for basis functions are:
\varphi(\mathbf{r}) = e^{-\alpha r^2} \qquad (14.8)
Within quantum chemistry, the most popular choices are the Gauss-type orbitals
(GTO) of (14.8). Their algebra is well understood [2], and the corresponding basis
sets have been optimized by generations of quantum chemists. Although the atomic
states seem to be more similar to the Slater-type orbitals (STO) of (14.9), it turns
out that the STOs may be well approximated by a suitable fixed linear combination
of GTOs [2].
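To give a feeling for the quality of such a contraction, the following minimal numpy sketch compares a normalized 1s STO (for \zeta = 1) with a three-Gaussian fit; the exponents and contraction coefficients are the standard STO-3G values tabulated e.g. in [2], while the grid and the printout are arbitrary choices made for this illustration.

import numpy as np

# STO-3G-style fit of a normalized 1s Slater orbital (zeta = 1) by three
# Gaussians; exponents and contraction coefficients as tabulated in [2].
alphas = np.array([2.227660, 0.405771, 0.109818])
coeffs = np.array([0.154329, 0.535328, 0.444635])

def sto(r):                       # normalized Slater-type orbital, zeta = 1
    return np.pi**-0.5 * np.exp(-r)

def sto3g(r):                     # contraction of normalized Gaussian primitives
    g = (2.0*alphas/np.pi)**0.75 * np.exp(-np.outer(r**2, alphas))
    return g @ coeffs

r = np.linspace(0.0, 4.0, 9)
print(np.c_[r, sto(r), sto3g(r)])   # good agreement, except at the nuclear cusp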
For solids, the simplest type of basis functions are the plane waves (PW) of
(14.10). There are various numerical advantages in using such a basis set, in par-
ticular in the framework of Car-Parrinello molecular dynamics [20]. The algebra
related to the PWs is extremely simple, and the basic numerics can be carried out
quite effectively using FFT routines [21].
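As a minimal illustration of this point (a one-dimensional sketch with arbitrary grid parameters, not a production routine), the kinetic operator -\tfrac{1}{2}\Delta may be applied in k-space with two FFTs:

import numpy as np

# Apply -1/2 * Laplacian to a periodic function: FFT to k-space,
# multiply by k^2/2, transform back (spectrally accurate).
L, n = 10.0, 128
x = np.linspace(0.0, L, n, endpoint=False)
k = 2.0*np.pi*np.fft.fftfreq(n, d=L/n)

psi = np.exp(np.sin(2.0*np.pi*x/L))            # any smooth periodic function
t_psi = np.fft.ifft(0.5*k**2 * np.fft.fft(psi)).real

# compare with a simple finite-difference Laplacian
fd = -0.5*(np.roll(psi, -1) - 2.0*psi + np.roll(psi, 1))/(L/n)**2
print(np.max(np.abs(t_psi - fd)))              # small difference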
The basis functions of (14.11) are augmented planewaves (APW), which go
back to Slater [22]. These functions are designed to meet the special bonding situ-
ations in (closely packed) solids. Inside a sphere S near the nucleus, the potential
will be nearly spherically symmetric and similar to the potential of a single atom,
whereas in the interstitial region I, the potential will be almost constant. Both parts
of the corresponding electronic wavefunction in (14.11) have to match on the sur-
face of S. Using some clever approximations [23], all of these requirements can
be met in the framework of the linearized augmented planewave method (LAPW),
where the determination of eigenstates based on APWs may again be reduced to a
standard generalized matrix eigenvalue problem [21].
Other interesting basis sets are Muffin-Tin orbitals [24], Wannier functions [25],
or wavelets [26].
In the last paragraph we saw that each type of basis function \varphi_\mu(\mathbf{r}) seems to require its own individual type of algebra. But in the end, the general problem of solving the one-electron Schrödinger equation

f(\mathbf{r}) \, \phi_i(\mathbf{r}) = \epsilon_i \, \phi_i(\mathbf{r}) \quad \mathrm{with} \quad \phi_i(\mathbf{r}) = \sum_\nu C_{\nu i} \, \varphi_\nu(\mathbf{r}) \,, \qquad (14.12)

always reduces to a generalized matrix eigenvalue problem of the form

\sum_\nu F_{\mu\nu} C_{\nu i} = \epsilon_i \sum_\nu S_{\mu\nu} C_{\nu i} \quad \mathrm{with} \quad F_{\mu\nu} = \int \mathrm{d}\mathbf{r} \, \varphi_\mu^*(\mathbf{r}) f(\mathbf{r}) \varphi_\nu(\mathbf{r}) \,, \quad S_{\mu\nu} = \int \mathrm{d}\mathbf{r} \, \varphi_\mu^*(\mathbf{r}) \varphi_\nu(\mathbf{r}) \,. \qquad (14.13)
Here the matrix elements Fμν are a measure for the interaction strength between
two orbitals, and the matrix elements Sμν are a measure for their mutual overlap.
The coefficient matrix Cνi and the diagonal matrix ǫi δij are to be determined by
numerically solving the generalized matrix eigenvalue problem.
In principle (14.13) should be a standard numerical task. There exists a wealth of profound literature dealing with such problems (see e.g. [27]), and there are quite powerful program packages to tackle them, see http://www.netlib.org/lapack/ or [28]. Nevertheless, as we will see in the next section, the generalized matrix eigenvalue problem related to the one-electron problem turns out to be rather special. And therefore even the most powerful solvers, which are designed to tackle the most general cases, may not really be the method of choice for solving this problem.
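As a baseline, such a dense generalized eigenvalue problem is essentially a one-liner with standard LAPACK-based tools; the following sketch uses random stand-in matrices (not a real Fock and overlap matrix) just to show the pattern:

import numpy as np
from scipy.linalg import eigh

# Solve F C = eps S C for symmetric F and positive definite S
# (random stand-ins for a Fock-like and an overlap-like matrix).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
F = 0.5*(A + A.T)
B = rng.standard_normal((4, 4))
S = B @ B.T + 4.0*np.eye(4)            # symmetric positive definite

eps, C = eigh(F, S)                    # LAPACK generalized eigensolver
print(eps)
print(np.allclose(F @ C, S @ C @ np.diag(eps)))   # True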
It is another piece of textbook wisdom that the overlap between bonding atomic orbitals is supposed to be a measure for the strength of that bond (principle of maximum overlap).
Most basis functions will decay rather quickly away from the centers where they
are located, and this means that quantum chemistry is a rather near-sighted business,
where the matrices Fμν and Sμν may be thinned out considerably. In the end (14.13)
will be a rather sparse matrix eigenvalue problem, and even some (over-)simplified
versions of (14.13) might be of considerable theoretical interest. Let us have a look
at the following approximations:
\sum_\nu F_{\mu\nu} C_{\nu i} = \epsilon_i \sum_\nu S_{\mu\nu} C_{\nu i} \,, \qquad (14.15)

with S_{\mu\nu} = 0 except when \varphi_\mu and \varphi_\nu are located within nearest-neighbor distance (or simply S_{\mu\nu} = \delta_{\mu\nu} in a Hückel-type approach), and F_{\mu\nu} = 0 except when \varphi_\mu and \varphi_\nu are located within nearest-neighbor distance. These equations are the essence of the tight-binding approximation, where all the contributions to F_{\mu\nu} and S_{\mu\nu} are zero, except those that involve basis functions located at neighboring sites. In such a case, the interactions between valence orbitals located on neighboring atoms become somewhat standardized and may be tabulated. Furthermore, these tight-binding models are also a perfect starting point for a series of simple, but rather powerful analytical models in solid state physics, see Sect. 15.3 and [19].
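As a minimal illustration (a Hückel-type sketch with made-up parameters), consider a periodic chain with one orbital per site, S_{\mu\nu} = \delta_{\mu\nu} and nearest-neighbor hopping t; diagonalizing F reproduces the well-known cosine band of the chain:

import numpy as np

# Hueckel-type tight-binding chain: on-site energy eps0, nearest-neighbor
# hopping t, periodic boundary conditions, S = identity.
n, eps0, t = 100, 0.0, -1.0
F = eps0*np.eye(n) + t*(np.eye(n, k=1) + np.eye(n, k=-1))
F[0, -1] = F[-1, 0] = t                      # close the ring

levels = np.linalg.eigvalsh(F)
k = 2.0*np.pi*np.arange(n)/n
band = eps0 + 2.0*t*np.cos(k)                # analytic dispersion of the chain
print(np.allclose(np.sort(levels), np.sort(band)))   # True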
14.2.3.2 Pseudopotentials
The idea behind the pseudopotential approach may easily be stated in a few sen-
tences. As we already mentioned before, only the valence electrons are contributing
to the chemical bond. Therefore it would be best to substitute all of the core elec-
trons by a pseudopotential, which weakens the original potential within the core re-
gion. This would lead to a pseudo-wavefunction χv for the valence electrons, which
would be much smoother inside the core region than the real valence wavefunction
φv , which wiggles around much faster, due to some orthogonality constraints with
respect to the core states φc , see Fig. 14.4.
Altogether we may assume that the pseudo-wavefunctions will also contain
some contributions from the core states, and therefore we make the following
Ansatz:
f \, \phi_c = \epsilon_c \, \phi_c \;\; (\mathrm{core}) \,, \qquad f \, \phi_v = \epsilon_v \, \phi_v \;\; (\mathrm{valence}) \,,

\chi_v = \phi_v + \sum_c \langle \phi_c | \chi_v \rangle \, \phi_c \;\; (\mathrm{pseudo\text{-}wavefunction}) \,. \qquad (14.16)
Fig. 14.4. In the framework of the pseudopotential approach, the wavefunctions of the va-
lence electrons are substituted by smooth pseudo-wavefunctions, which implies a weakened
interaction potential within the core region
The next theoretical step is to introduce a core projector P, such that

P = \sum_c |\phi_c\rangle \langle \phi_c| \,, \qquad (14.17)

f \chi_v + \sum_c (\epsilon_v - \epsilon_c) \langle \phi_c | \chi_v \rangle \, \phi_c = \left[ f + (\epsilon_v - f) P \right] \chi_v \equiv \left[ -\tfrac{1}{2} \Delta_{\mathbf{r}} + V_{\mathrm{PS}} \right] \chi_v = \epsilon_v \chi_v \,. \qquad (14.18)
Obviously, the pseudo-wavefunctions have the same energies as the real valence
electron wavefunctions, but the corresponding one-electron Hamiltonian f has been
modified quite dramatically (it should be energy-dependent!). It turns out that this
modified one-electron Hamiltonian may usually be written in the form of the last
line in (14.18), using a simple parameterized form for the pseudopotential VPS like:
V_{\mathrm{PS}} \equiv -\frac{Z - N_c}{r} + \frac{A}{r} \, e^{-\lambda r} \,. \qquad (14.19)
This parameterization comprises the fitting parameters A and \lambda; Z is just the nuclear charge, and N_c the number of core electrons. Of course, there are more sophisticated ways to construct a pseudopotential V_{\mathrm{PS}}, in particular using planewave basis sets [21].
This Hamiltonian could be inserted into the variational principle of (14.1) under the constraint of orthonormality for a set of suitable spin orbitals. These spin orbitals will form a Slater determinant, which is defined as follows:
" "
" χ1 (x1 ) χ2 (x1 ) . . . χN (x1 ) "
" "
1 " χ (x ) χ2 (x2 ) . . . χN (x2 ) ""
Ψ SD (r, m) ≡ √ "" 1 2 . (14.23)
N! " . . . ... ... . . . ""
" χ1 (xN ) χ2 (xN ) . . . χN (xN ) "
Hartree’s idea [29] was to reduce the many-electron problem of chemical bonding
to a one-electron form, where every electron has its own individual wavefunction φi
and energy level ǫi . To this end, he suggested a one-electron Schrödinger equation
of the following kind:
-\frac{1}{2} \Delta_{\mathbf{r}} \phi_i(\mathbf{r}) + V(\mathbf{r}) \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) \,. \qquad (14.24)
The first term denotes the kinetic energy operator for an electron with wavefunction
φi , and the second term represents a general interaction potential for this electron.
The brilliant insight of Hartree was to assume that every electron is moving in a
potential caused by the classical electrostatic interaction with the nuclei, and caused
by the classical electrostatic interaction of the electron with smeared out negative
electric charges that correspond to the electron density
\rho(\mathbf{r}') = \sum_i^N \phi_i^*(\mathbf{r}') \phi_i(\mathbf{r}') \,. \qquad (14.25)
This is the very essence of the mean-field approach. We see that the electron den-
sity in (14.25) is obviously built from the electron wavefunctions themselves, and
therefore the corresponding potential must be constructed by iterating (14.24) until
one obtains a self-consistent electron density or wavefunction.2
In contrast to a common prejudice, the potential given by Hartree was actually
the following:
V(\mathbf{r}) = -\sum_\alpha^M \frac{Z_\alpha}{|\mathbf{R}_\alpha - \mathbf{r}|} + \int \frac{\rho(\mathbf{r}') - \phi_i^*(\mathbf{r}') \phi_i(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}\mathbf{r}' \quad (\mathrm{for\ electron\ } i) \,. \qquad (14.26)
We see that Hartree obviously corrected the interaction between electron i and the
electronic mean field, such that the electron i will not interact with itself, which
would certainly be unphysical.
The final conceptual step of the Hartree theory was to pack the one-electron wavefunctions together to form a many-electron wavefunction. Here Hartree assumed a simple product wavefunction:

\Psi_0(\mathbf{r}) = \prod_i^N \phi_i(\mathbf{r}_i) \,. \qquad (14.27)
It was Fock [30] who pointed out that the many-electron wavefunction of Hartree theory is better approximated by a single Slater determinant (see (14.23)), in order to guarantee its antisymmetry (see (14.21)). In the next section, we will see that this assumption will add another term to the mean field, called exchange interaction. Finally we note that spin is obviously missing from Hartree's original theory.
We now derive Hartree-Fock theory, starting from the variational principle of (14.1). This will lead to a set of non-linear one-electron Schrödinger equations, similar to the Hartree theory ((14.24)-(14.26)). Then, by introducing basis functions, these equations may be transformed into a nonlinear matrix equation (Roothaan equation), where we have to determine the self-consistent solution to a generalized eigenvalue problem similar to (14.13).
2
The first “supercomputer” to carry out these calculations was Hartree’s father.
We apply the variational principle with a twist: We vary E_{\mathrm{tot}} with respect to the (conjugate) spin orbitals \chi_a^*, under the constraint that these spin orbitals should be orthonormal. To this end we introduce Lagrangian multipliers \epsilon_{ab}. Thus (14.1) will be transformed into the following variational principle:

\mathrm{Orthonormality:} \quad [a|b] = \int \mathrm{d}x_1 \, \chi_a^*(x_1) \chi_b(x_1) = \delta_{ab} \,,

\frac{\delta}{\delta \chi_a^*} L[\Psi_0] = \frac{\delta}{\delta \chi_a^*} \left( E[\{\chi_i\}] - \sum_{ab} \epsilon_{ab} \left( [a|b] - \delta_{ab} \right) \right) = 0 \,. \qquad (14.29)
various fHF of (14.30). There are quite powerful minimization techniques that ac-
tually exploit this idea, which lead to a dramatic improvement in convergence for
large systems [20].
Finally we write out (14.30) explicitly:

-\frac{1}{2} \Delta_{\mathbf{r}} \chi_i(x) - \sum_\alpha^M \frac{Z_\alpha}{|\mathbf{R}_\alpha - \mathbf{r}|} \chi_i(x) + \sum_{j \neq i}^N \int \frac{\chi_j^*(x') \chi_j(x')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}x' \; \chi_i(x)

- \sum_{j \neq i}^N \int \frac{\chi_j^*(x') \chi_i(x')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}x' \; \chi_j(x) = \epsilon_i \chi_i(x) \,. \qquad (14.31)
By applying the techniques from Sect. 14.2.2, (14.31) will go over into a nonlinear matrix equation called the Roothaan equation [2]. We explicitly write out everything in its full glory, just to stop the overconfident reader, who might be convinced that he/she will be able to write a HF program overnight:
\sum_\nu F_{\mu\nu} C_{\nu i} = \epsilon_i \sum_\nu S_{\mu\nu} C_{\nu i} \,,

S_{\mu\nu} = \int \mathrm{d}\mathbf{r} \, \varphi_\mu^*(\mathbf{r}) \varphi_\nu(\mathbf{r}) \,,

F_{\mu\nu} = T_{\mu\nu} + V_{\mu\nu}^{\mathrm{nucl}} + G_{\mu\nu} = H_{\mu\nu}^{\mathrm{core}} + G_{\mu\nu} \,,

T_{\mu\nu} = \int \mathrm{d}\mathbf{r} \, \varphi_\mu^*(\mathbf{r}) \left[ -\tfrac{1}{2} \Delta_{\mathbf{r}} \right] \varphi_\nu(\mathbf{r}) \,,

V_{\mu\nu}^{\mathrm{nucl}} = \int \mathrm{d}\mathbf{r} \, \varphi_\mu^*(\mathbf{r}) \left[ -\sum_\alpha \frac{Z_\alpha}{|\mathbf{r} - \mathbf{R}_\alpha|} \right] \varphi_\nu(\mathbf{r}) \,,

G_{\mu\nu} = \sum_a^{N/2} \sum_{\lambda\rho} C_{\lambda a}^* C_{\rho a} \left[ 2(\mu\nu|\rho\lambda) - (\mu\lambda|\rho\nu) \right] \,,

(\mu\nu|\lambda\rho) = \int \mathrm{d}\mathbf{r}_1 \mathrm{d}\mathbf{r}_2 \, \varphi_\mu^*(\mathbf{r}_1) \varphi_\nu(\mathbf{r}_1) \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} \varphi_\lambda^*(\mathbf{r}_2) \varphi_\rho(\mathbf{r}_2) \,,

E_{\mathrm{tot}} = \frac{1}{2} \sum_{\mu\nu} \left[ 2 \sum_a^{N/2} C_{\mu a}^* C_{\nu a} \right] (H_{\mu\nu}^{\mathrm{core}} + F_{\mu\nu}) + \sum_\alpha \sum_{\beta > \alpha} \frac{Z_\alpha Z_\beta}{|\mathbf{R}_\alpha - \mathbf{R}_\beta|} \,. \qquad (14.33)
Again we notice that the operator F_{\mu\nu} depends on the coefficient matrix C_{\nu i} that ought to be determined from (14.33). Therefore we have to solve this equation iteratively, and schemes to accelerate such a procedure have been known for a long time, see [31].
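The structure of such a self-consistency cycle is easy to sketch. In the following toy skeleton the two-electron part G_{\mu\nu}(P) is replaced by a purely hypothetical on-site mean field (a stand-in, not a real Fock build), but the loop itself (assemble F from the density matrix, solve the generalized eigenvalue problem, rebuild the density matrix, mix and repeat) is exactly the iteration described above:

import numpy as np
from scipy.linalg import eigh

# Skeleton of the SCF cycle for (14.33): build F from the density matrix P,
# solve F C = eps S C, rebuild P from the occupied orbitals, mix, repeat.
n, nocc, U = 6, 2, 1.0
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Hcore = 0.5*(A + A.T)                         # random "core" Hamiltonian
S = np.eye(n)                                 # orthonormal basis for simplicity

def fock(P):                                  # mock mean field (stand-in for G)
    return Hcore + U*np.diag(np.diag(P))

P = np.zeros((n, n))
for it in range(200):
    eps, C = eigh(fock(P), S)
    Pnew = 2.0 * C[:, :nocc] @ C[:, :nocc].T  # doubly occupied orbitals
    if np.max(np.abs(Pnew - P)) < 1e-10:
        break
    P = 0.5*P + 0.5*Pnew                      # linear mixing stabilizes the loop
print(it, eps[:nocc])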
Note that we specify those Slater determinants by the orbitals that actually substitute orbitals of the ground-state Slater determinant \Psi_0^{SD}. The various expansion coefficients may be determined from the variational principle of (14.1), which corresponds to the diagonalization of a giant Hamilton matrix H [2]:
H = \begin{pmatrix} \langle \Psi_0 | H | \Psi_0 \rangle & 0 & \langle \Psi_0 | H | \Psi_{ab}^{\alpha\beta} \rangle & 0 & \ldots \\ & \langle \Psi_a^{\alpha} | H | \Psi_b^{\beta} \rangle & \langle \Psi_a^{\alpha} | H | \Psi_{bc}^{\beta\gamma} \rangle & \langle \Psi_a^{\alpha} | H | \Psi_{bcd}^{\beta\gamma\delta} \rangle & \ldots \\ & & \langle \Psi_{ab}^{\alpha\beta} | H | \Psi_{cd}^{\gamma\delta} \rangle & \langle \Psi_{ab}^{\alpha\beta} | H | \Psi_{cde}^{\gamma\delta\epsilon} \rangle & \ldots \\ & & & \mathrm{etc.} \end{pmatrix} \qquad (14.35)
There are actually some selection rules, which make the matrix H a little bit sparser, but usually one needs a large number of Slater determinants to really improve upon the HF method. Therefore the CI method is only applied to obtain some benchmark results for smaller systems. But there is a whole plethora of similarly accurate post-HF ab initio methods described in [2] or [9], and a lot of them actually go back to Pople.
The key entity of density functional theory is the (spinless, reduced) one-electron
density:
ρ0 (r1 ) = N Ψ0∗ (x1 . . . xN )Ψ0 (x1 . . . xN )dm1 dx2 . . . dxN , (14.36)
There are some mathematical subtleties related to this variational principle. In particular it is not clear up to now which types of trial densities \rho_{\mathrm{tr}} are actually allowed in (14.38). But for this and other mathematical details, we refer the interested reader to [35] or [10].
So far, we could convince ourselves that there exists some abstract density func-
tional Ev [ρ0 (r), R] (14.37), and an equally abstract variational principle to deter-
mine the ground-state density ρ0 (14.38). However, Kohn and Sham showed [33]
that density functional theory may be put in a form similar to Hartree or Hartree-
Fock theory.
The key concept is the kind of one-electron Hamiltonian that we discussed at great length in Sect. 14.2, because Kohn and Sham made the assumption that the one-electron density \rho_0 should be equal to the one-electron density of a non-interacting reference system:

\left[ -\frac{1}{2} \Delta_{\mathbf{r}} + V(\mathbf{r}) \right] \phi_i(\mathbf{r}) = \epsilon_i \phi_i \,,

\Psi_s = \Psi^{SD}(\phi_1 \ldots \phi_N) \;\Longrightarrow\; \rho_0(\mathbf{r}) = \sum_i |\phi_i(\mathbf{r})|^2 \,. \qquad (14.39)
Then ρ0 will be made of the orbital solutions φi to (14.39), and the correspond-
ing many-electron ground-state wavefunction Ψs will be a single Slater determinant
made from the most stable orbitals, which is already pretty close to Hartree-Fock
theory!
In order to arrive at a potential V similar to the Hartree or Hartree-Fock one-
electron interaction potential, it is necessary to make some cosmetics and rearrange
various parts of (14.37):
E_{\mathrm{el}}[\rho] = T[\rho] + V_{ee}[\rho] + V_v[\rho] = T_s[\rho] + V_{ee}^{\mathrm{class}}[\rho] + E_{xc}[\rho] + V_v[\rho] \,,

T_s[\rho] = \sum_i \langle \phi_i | -\tfrac{1}{2} \Delta_{\mathbf{r}} | \phi_i \rangle \,,

E_{xc}[\rho] = T[\rho] - T_s[\rho] + V_{ee}[\rho] - V_{ee}^{\mathrm{class}}[\rho] \,. \qquad (14.40)
The exchange correlation functional Exc will become our garbage collection, con-
taining all non-classical electron interactions, as well as corrections to the kinetic
energy functional Ts of the non-interacting reference system. The quality of any
density functional based simulation will depend quite critically on reasonable ap-
proximations for Exc as a functional of ρ0 , see the next section.
With these rearrangements, we may carry out the variational principle of (14.38),
where the variation with respect to ρ will go over into a variation with respect to the
(conjugate) orbitals φ∗i :
\frac{\delta}{\delta \phi_i^*} F[\{\phi_i\}] = \frac{\delta}{\delta \phi_i^*} \left( E_{\mathrm{el}}[\{\phi_i\}] - \sum_{ij} \lambda_{ij} \int \phi_i^*(x) \phi_j(x) \, \mathrm{d}x \right) = 0

\Rightarrow \quad f_{\mathrm{ks}} \, \phi_i = \left[ -\frac{1}{2} \Delta_{\mathbf{r}} + v_s(\rho_0, \mathbf{r}, \mathbf{R}) \right] \phi_i = \epsilon_i \phi_i \,. \qquad (14.41)
2
Thus we formally obtain the kind of non-linear one-electron Schrödinger equation
that was postulated in (14.39). And again we will have to solve this equation itera-
tively.
We may then write out fks and compare it to the Hartree ((14.24)–(14.26)) and
the Hartree-Fock (14.31) one-electron Hamiltonians:
-\frac{1}{2} \Delta_{\mathbf{r}} \phi_i(\mathbf{r}) - \sum_\alpha^M \frac{Z_\alpha}{|\mathbf{R}_\alpha - \mathbf{r}|} \phi_i(\mathbf{r}) + \int \frac{\rho_0(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}\mathbf{r}' \; \phi_i(\mathbf{r}) + \frac{\delta E_{xc}}{\delta \rho}[\rho_0(\mathbf{r})] \, \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) \,. \qquad (14.42)
The biggest difference is the last term in (14.42) called exchange-correlation poten-
tial, which was completely missing within Hartree theory. And in the framework
of Hartree-Fock theory, there was a complicated orbital-dependent exchange poten-
tial taking the place of this density dependent exchange-correlation potential. In the
next section, we will discuss some technical details related to (14.42) that will also
concern the construction of a suitable exchange-correlation potential.
The obvious similarities between the Kohn-Sham scheme and the Hartree(-Fock) method make it easy to implement density functional theory into any existing Hartree-Fock code. To this end, we just have to re-write the Kohn-Sham equations in matrix form, using the standard procedure based on an expansion of the orbitals
and the one-electron density in a suitable set of basis functions. Given a precise
exchange-correlation functional Exc , we will have a Hartree-like method with post-
Hartree-Fock accuracy!
Various types of exchange-correlation functionals have been discussed at great
length in the literature [10], but the most popular ones fall into the following classes:
E_{xc}^{\mathrm{LDA}}[\rho] = \int \rho(\mathbf{r}) \, e_{\mathrm{gas}}[\rho(\mathbf{r})] \, \mathrm{d}\mathbf{r} \,,

E_{xc}^{\mathrm{GA}}[\rho] = E_{xc}^{\mathrm{LDA}} + \delta E_{xc}[\rho(\mathbf{r}), |\nabla_{\mathbf{r}} \rho(\mathbf{r})|] \,. \qquad (14.43)
The first type of exchange-correlation functional, E_{xc}^{\mathrm{LDA}}, refers to the local density approximation (LDA), and e_{\mathrm{gas}} is the energy density of the electron gas. These types of exchange-correlation functionals are basically some parameterized forms of the exchange-correlation functional of a homogeneous electron gas [10]. The LDA seems to be a rather poor assumption, because the electron density within a molecule or solid usually varies noticeably, which is the opposite of a homogeneous electron gas.
But LDA works quite well, mainly due to some miraculous error compensation [34]. And as indicated in (14.43), it is also possible to obtain even better exchange-correlation functionals E_{xc}^{\mathrm{GA}} by determining some correction terms, which depend on the density and the density gradient [36].
Finally the reader may have noticed that we did not introduce any spin into our
formalism. No need to worry, it turns out that the general formalism of density func-
tional theory can easily be modified to meet this requirement, just by introducing
an exchange-correlation functional that will depend on two different one-electron
densities for different electron spins. This method is called spin-density functional
theory, see [34] and [10].
References

1. M. Born, K. Huang, Dynamical Theory of Crystal Lattices (Oxford University Press, Oxford, 1954)
2. A. Szabo, N.S. Ostlund, Modern Quantum Chemistry (McGraw-Hill, New York, 1989)
3. C.A. Coulson, Valence (Clarendon Press, Oxford, 1952)
4. L. Pauling, The Nature of the Chemical Bond, 3rd edn. (Cornell University Press, Ithaca, 1960)
5. H.A. Bethe, E.E. Salpeter, Quantum Mechanics of One- and Two-Electron Atoms (Springer, Berlin Göttingen Heidelberg, 1957)
6. C.A. Mead, J. VLSI Sig. Process. 8, 9 (1994)
7. R. Kurzweil, The Age of Spiritual Machines (Penguin Putnam, New York, 2000)
8. J. Dongarra, F. Sullivan, Comp. in Sci. & Eng. 2, 22 (2000)
9. F.E. Harris, H.J. Monkhorst, D.L. Freeman, Algebraic and Diagrammatic Methods in Many-Fermion Theory (Oxford University Press, Oxford, 1989)
10. R.G. Parr, W. Yang, Density-Functional Theory of Atoms and Molecules (Oxford University Press, Oxford, 1989)
11. R.M. Dreizler, E.K.U. Gross, Density Functional Theory (Springer, Berlin Heidelberg, 1990)
12. N.H. March, Electron Density Theory of Atoms and Molecules (Academic Press, New York, 1992)
13. G.N. Lewis, Valence and the Structure of Atoms and Molecules (The Chemical Catalog Co., New York, 1923)
14. W.N. Lipscomb, Acc. Chem. Res. 6, 257 (1973)
15. F. Weinhold, C.R. Landis, Chem. Educ. Res. Pract. 2, 91 (2001)
16. F. Weinhold, C.R. Landis, Valence and Bonding. A Natural Bond Orbital Donor-Acceptor Perspective (Cambridge University Press, Cambridge, 2005)
17. A.P. Sutton, Electronic Structure of Materials (Clarendon Press, Oxford, 1994)
18. D. Pettifor, Bonding and Structure of Molecules and Solids (Clarendon Press, Oxford, 1995)
19. W.A. Harrison, Elementary Electronic Structure, revised edn. (World Scientific, Singapore, 2004)
20. M.C. Payne, M.P. Teter, D.C. Allan, T.A. Arias, J.D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992)
21. D.J. Singh, Planewaves, Pseudopotentials and the LAPW Method (Kluwer Academic Publishers, Dordrecht, 1994)
22. J.C. Slater, Phys. Rev. 51, 846 (1937)
23. O.K. Andersen, Phys. Rev. B 12, 3060 (1975)
24. O.K. Andersen, Z. Pawlowska, O. Jepsen, Phys. Rev. B 34, 5253 (1986)
25. N. Marzari, D. Vanderbilt, Phys. Rev. B 56, 12847 (1997)
26. T.A. Arias, Rev. Mod. Phys. 71, 267 (1999)
27. G.H. Golub, C.F. van Loan, Matrix Computations (The Johns Hopkins University Press, Baltimore, 1996)
28. E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen, LAPACK Users' Guide (SIAM, Philadelphia, 1999)
29. D.R. Hartree, Proc. Camb. Phil. Soc. 24, 111 (1928)
30. V. Fock, Z. Phys. 61, 126 (1930)
31. P. Pulay, Chem. Phys. Lett. 73, 393 (1980)
32. P. Hohenberg, W. Kohn, Phys. Rev. 136, B864 (1964)
33. W. Kohn, L. Sham, Phys. Rev. 140, A1133 (1965)
34. R.O. Jones, O. Gunnarsson, Rev. Mod. Phys. 61, 689 (1989)
35. E.H. Lieb, Int. J. Quantum Chem. 24, 243 (1983)
36. J. Tao, J.P. Perdew, V.N. Staroverov, G.E. Scuseria, Phys. Rev. Lett. 91, 146401 (2003)
15 Ab-Initio Methods Applied to Structure
Optimization and Microscopic Modelling
Alexander Quandt
In Fig. 15.1 we made a simple sketch of such an energy hypersurface. The simplicity
of this figure is slightly misleading. Normally R is a large multivector, and therefore
the energy landscape may be full of stationary points. But those stationary points are
of the highest chemical relevance:
Fig. 15.1. Sketch of an energy hypersurface Ehyp (R), indicating the location of transition
states (TS), local minima (LM) and global minima (GM)
"
∂Ehyp ""
=0 stationary state ,
∂R "R0
" $
∂ 2 Ehyp "" > 0 for all coord.: isomer ,
= (15.2)
∂ 2 R "R 0
< 0 for at least one coord.: transition state .
Among those isomers, there will usually be a large number of local minima (LM),
and just one or a handful of global minima (GM). These global minima ought to
be detected to make a reliable prediction of the most stable configurations of a cer-
tain system, which is a serious numerical challenge. In Sect. 15.1.3 we will present
several techniques to step over energy hypersurfaces, in a way that will actually
increase our chances to detect the most relevant local and global minima.
Furthermore, there may be chemical or physical processes that connect various chemically relevant minima. Here it will be of immediate chemical relevance to know the transition states that are located on a path connecting both minima.
Similarly one might want to know the size of the energetic activation barriers be-
tween both minima. Unfortunately, transition states are even more difficult to detect
than minima, and there is also no guarantee that the numerical search algorithms for
transition states will generate any meaningful result [2].
A simple toy model will illustrate the complexity of such a task [1]. Just assume
that we want to examine a large system of m mutually independent subsystems
comprising N atoms. For the number of isomers n_{\mathrm{isomers}} we find that

n_{\mathrm{isomers}}(mN) = n_{\mathrm{isomers}}(N)^m \;\Longrightarrow\; n_{\mathrm{isomers}}(N) \approx e^{\alpha N} \,, \qquad (15.3)

which means that the number of isomers grows exponentially with the subsystem size N. For the transition states we assume that each of them is located in one
subsystem, and that a transition state of the complete system with mN atoms is only
occurring when one of the subsystem is in a transition state, and all of the others are
in a minimum. Therefore the number of transition states ntstates may be calculated as
follows:
n_{\mathrm{tstates}}(mN) \approx m \, n_{\mathrm{isomers}}(N)^{m-1} \, n_{\mathrm{tstates}}(N) \;\Longrightarrow\; n_{\mathrm{tstates}}(N) \approx N e^{\alpha N} \,. \qquad (15.4)
Again this will imply exponential growth with N . Finally we see that
\frac{n_{\mathrm{tstates}}}{n_{\mathrm{isomers}}} \approx N \,, \qquad (15.5)

i.e. the ratio of the number of transition states vs. the number of isomers grows linearly with the subsystem size N, which explains the increasing difficulty to detect transition states.
Given the complexity of a rugged energy hypersurface defined over a configura-
tion space made of large multivectors R, we may also ask ourselves how such a com-
plicated object might actually be visualized. The monograph of Wales [1] presents
several interesting techniques like monotonic sequences, disconnectivity graphs of
minimum-transition state-minimum triplets, and a network analysis of disconnec-
tivity graphs to determine some typical scaling laws, and to prove the existence of
chemically relevant hubs.
However, beyond these techniques mentioned in [1], there is a large literature
concerning the graphical visualization of complex data [3], and it might actually pay
off to try one of these techniques to represent and analyze energy hypersurfaces.
15.1.2 Forces
Given an energy hypersurface Ehyp (R), the formal definition of ab initio interatomic
forces is rather simple:
\mathbf{F}_k \equiv -\nabla_{\mathbf{R}_k} E_{\mathrm{hyp}}(\mathbf{R}) \equiv -\frac{\partial E_{\mathrm{hyp}}(\mathbf{R})}{\partial \mathbf{R}_k} \,. \qquad (15.6)
Thus the force on a nucleus with coordinates \mathbf{R}_k is just the negative derivative of the ab initio total energy with respect to these coordinates. The forces on a whole configuration \mathbf{R} then form a corresponding force multivector, as indicated in (15.6). In the following, we will consistently use this notation, and specify the \mathbf{R}_k only when necessary.
There is a disarmingly simple force theorem by Hellmann and Feynman, and
we will discuss it in the following (see [2] for a critical revision of this concept).
Assume that Ψ is the exact (normalized) eigenstate of H(r, R). Then we obtain the
following result for the forces:
\frac{\partial E_{\mathrm{hyp}}(\mathbf{R})}{\partial \mathbf{R}} = \langle \frac{\partial \Psi}{\partial \mathbf{R}} | H(r, R) | \Psi \rangle + \langle \Psi | H(r, R) | \frac{\partial \Psi}{\partial \mathbf{R}} \rangle + \langle \Psi | \frac{\partial H(r, R)}{\partial \mathbf{R}} | \Psi \rangle = \langle \Psi | \frac{\partial H(r, R)}{\partial \mathbf{R}} | \Psi \rangle \,. \qquad (15.7)
The last line follows from the fact that for the exact (normalized) eigenstate Ψ of
H(r, R) we find that
\frac{\partial \langle \Psi | \Psi \rangle}{\partial \mathbf{R}} = 0 = \langle \frac{\partial \Psi}{\partial \mathbf{R}} | \Psi \rangle + \langle \Psi | \frac{\partial \Psi}{\partial \mathbf{R}} \rangle \,. \qquad (15.8)
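For exact eigenstates, the theorem is easy to verify numerically; the following toy sketch (a one-dimensional model Hamiltonian with an anharmonic term, chosen arbitrarily for this illustration) compares a finite-difference derivative of the ground-state energy with the Hellmann-Feynman expectation value of (15.7):

import numpy as np

# Check (15.7) for H(R) = -1/2 d^2/dx^2 + (x-R)^2/2 + 0.1 x^4 on a grid;
# Psi is the (numerically) exact ground state, so the theorem should hold.
n, L = 400, 20.0
x = np.linspace(-L/2, L/2, n)
h = x[1] - x[0]
lap = (np.eye(n, k=1) - 2.0*np.eye(n) + np.eye(n, k=-1))/h**2

def ground(R):
    H = -0.5*lap + np.diag(0.5*(x - R)**2 + 0.1*x**4)
    E, V = np.linalg.eigh(H)
    return E[0], V[:, 0]/np.sqrt(h)        # grid-normalized ground state

R, dR = 0.3, 1e-4
E0, psi = ground(R)
dE_fd = (ground(R + dR)[0] - ground(R - dR)[0])/(2.0*dR)
dE_hf = np.sum(psi**2 * (-(x - R)))*h      # <Psi| dH/dR |Psi>, dH/dR = -(x-R)
print(dE_fd, dE_hf)                         # agree to finite-difference accuracy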
As simple as this theorem might be, it is just as hard to apply in practice! Note that we
were generally composing Ψ using orbitals that might be expanded in some suitable
localized basis sets, see (14.7) and (14.32). If these basis sets are not somehow fol-
lowing the gradient ∂Ψ/∂R, there is no reason that (15.7) will reduce to the simple
result of its last line (see Appendix C of [2]). The way to include a proper basis-set-
following is to determine the formal changes in the orbital expansion coefficients
Cμi (see (14.7) and (14.32)):
#hyp ({Cμi } , R)
∂Ehyp ({Cμi } , R) ∂E ∂Ehyp ({Cμi } , R) ∂Cμi
= +
∂R ∂R μi
∂Cμi ∂R
= an artwork . . . . (15.9)
The tilde-sign in this equation denotes all terms that explicitly depend on R. The
complex analytical artwork indicated by (15.9) for (post) Hartree-Fock methods
may be found in [4], including higher derivatives.
Now we want to present some methods to step over energy surfaces in order to detect isomers and transition states. Useful references are the Appendix C of [2] and [1]. Note that none of these methods is foolproof, and you will dramatically increase your chances to become a fool, if you leave common sense and chemical intuition behind to blindly trust a numerical blackbox.
The goal of any structure optimization method is to detect a stationary point, hope-
fully being the most stable isomer of the system. If there is no indication where to
search, one simply has to construct a reasonable starting configuration R0 . Then
one usually applies one’s favorite search algorithm, which will step over the energy
hypersurface in a systematic fashion, and finally reveal the location of a station-
ary point. This procedure can be repeated with different starting configurations to
achieve a certain sampling of the energy hypersurfaces. The algorithms presented in
this paragraph are all local search algorithms, which at best might be able to detect
some stationary points next to a chosen starting configuration. They are to be used
with care.
The simplest way to step over an energy hypersurface is a steepest descent path.
In such a case we will move from one configuration Ri to the next configuration
Ri+1 along a direction determined by the local forces:
E_{\mathrm{hyp}}^{\mathrm{min}} = \min_\lambda E_{\mathrm{hyp}}(\mathbf{R}_i - \lambda \nabla_{\mathbf{R}} E_{\mathrm{hyp}}(\mathbf{R}_i)) \;\Rightarrow\; \mathbf{R}_{i+1} = \mathbf{R}_i - \lambda_{\mathrm{min}} \nabla_{\mathbf{R}} E_{\mathrm{hyp}}(\mathbf{R}_i) \,. \qquad (15.10)
Here λmin is the λ which minimizes Ehyp along the steepest descent direction. For
complicated hypersurfaces, the steepest descent procedure will mainly consist of
bouncing around like a drunk sailor. A more sober way of stepping over energy hy-
persurfaces is the famous Newton-Raphson method. When applying this method, one is permanently optimistic that for a given configuration \mathbf{R}_i, the next configuration \mathbf{R}_{i+1} will be a stationary point, involving the following approximations:

E_{\mathrm{hyp}}(\mathbf{R}) \approx E_{\mathrm{hyp}}(\mathbf{R}_i) + \nabla_{\mathbf{R}} E_{\mathrm{hyp}}(\mathbf{R}_i) \cdot (\mathbf{R} - \mathbf{R}_i) + \frac{1}{2} (\mathbf{R} - \mathbf{R}_i)^T H(\mathbf{R}_i) (\mathbf{R} - \mathbf{R}_i) \,. \qquad (15.11)

The Hessian H(\mathbf{R}_i) involves analytical second derivatives and may be quite costly to determine. Therefore the search step

\mathbf{R}_{i+1} = \mathbf{R}_i - H^{-1}(\mathbf{R}_i) \, \nabla_{\mathbf{R}} E_{\mathrm{hyp}}(\mathbf{R}_i) \qquad (15.12)

will be by far more tedious than the determination of a steepest descent step, which involves the determination of the forces, only (see (15.10)).
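To make the comparison concrete, here is a minimal steepest descent loop in the spirit of (15.10), run on an arbitrary two-dimensional toy surface (a Rosenbrock-like function standing in for a real energy hypersurface):

import numpy as np
from scipy.optimize import minimize_scalar

# Steepest descent with a line search along -grad E, following (15.10).
def E(R):
    return (1.0 - R[0])**2 + 10.0*(R[1] - R[0]**2)**2

def grad(R):
    return np.array([-2.0*(1.0 - R[0]) - 40.0*R[0]*(R[1] - R[0]**2),
                     20.0*(R[1] - R[0]**2)])

R = np.array([-1.0, 1.0])
for step in range(1000):
    g = grad(R)
    if np.linalg.norm(g) < 1e-8:
        break
    lam = minimize_scalar(lambda l: E(R - l*g)).x   # lambda_min of (15.10)
    R = R - lam*g
print(step, R)        # zig-zags slowly towards the minimum at (1, 1)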
There is a whole family of Quasi-Newtonian algorithms, which circumvent these conceptual difficulties by starting with an initial guess for the inverse Hessian H^{-1}, and updating the latter for every subsequent search step using the forces. The most popular algorithms of this family can be found in the Numerical Recipes [5], but there is also a simple algorithm described in the Appendix C of [2], which may easily be programmed and implemented by the reader.
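For the same toy surface as above, a quasi-Newtonian minimization is readily available in standard libraries; the sketch below uses scipy's BFGS implementation, with only the energies and forces (gradients) as analytic input:

import numpy as np
from scipy.optimize import minimize

# Quasi-Newton (BFGS) optimization of the same toy surface.
def E(R):
    return (1.0 - R[0])**2 + 10.0*(R[1] - R[0]**2)**2

def grad(R):
    return np.array([-2.0*(1.0 - R[0]) - 40.0*R[0]*(R[1] - R[0]**2),
                     20.0*(R[1] - R[0]**2)])

res = minimize(E, x0=np.array([-1.0, 1.0]), jac=grad, method='BFGS')
print(res.nit, res.x)   # far fewer steps than plain steepest descent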
We want to close this section with a little survey of the most popular structure
optimization methods (see [2]):
– Methods without gradients. The most popular method is due to Nelder and Mead
[5]. These methods should only be used, if there is really no chance to determine
analytical derivatives.
– Methods involving analytical first derivatives and numerical second derivatives.
The whole family of Quasi-Newtonian methods mentioned above falls under
this category, the most prominent examples being the Davidon-Fletcher-Powell
method [5], or the Broyden-Fletcher-Goldfarb-Shanno method [5]. There is a
second family of methods falling into this category, which is based on the con-
jugate gradient method. The latter is a rather smart line search algorithm, which
proceeds along conjugate directions rather than steepest descent directions. Like
the steepest descent method described above, the conjugate gradient method in-
volves analytical first derivatives, only. The most prominent examples are the
conjugate gradient methods of Polak and Ribiere [5], and of Fletcher and Reeves [5].
– Methods involving analytical first and second derivatives. These methods are
usually too costly if one is only interested in the isomers of a given molecular or
442 A. Quandt
solid system. However, some of the algorithms to detect transition states involve
the knowledge of analytical second derivatives (see [2]). In the following para-
graph we will present a simple method to detect transition states, which will
involve analytical first derivatives, only.
For a detailed description and proper references we constantly referred to the Nu-
merical Recipes [5], which really should be your first address when trying to under-
stand and implement those methods.
The standard setting for this method is the typical triplet setting on the energy hypersurfaces, where two isomers are connected by a transition state. We assume that both isomers are already known; they could be the reactants and the products of a chemical reaction. In order to detect the transition state and the corresponding energy barrier of a reaction path connecting both isomers, one may apply the general procedure indicated in Fig. 15.2.
Between the isomers M1 and M2 , one may choose a set of images Ii at some-
what intermediate geometries. Those images are supposed to be connected by elastic
spring forces of strength k
\mathbf{F}_{i,\mathrm{spring}} = k \left( |\mathbf{R}_{i+1} - \mathbf{R}_i| - |\mathbf{R}_i - \mathbf{R}_{i-1}| \right) \mathbf{t}_i \,, \qquad (15.13)
which will prevent them from collapsing into one single image. The ti is an estimate
for the normalized tangent vector to the path at Ri . Note that the Fi and Rk and ti
are all multivectors.
The total force on an image Ii is defined as:
\mathbf{F}_i = \mathbf{F}_{i,\mathrm{spring}} - \frac{\partial E_{\mathrm{hyp}}}{\partial \mathbf{R}}(\mathbf{R}_i) + \left( \frac{\partial E_{\mathrm{hyp}}}{\partial \mathbf{R}}(\mathbf{R}_i) \cdot \mathbf{t}_i \right) \mathbf{t}_i \,. \qquad (15.14)
∂R ∂R
Fig. 15.2. The nudged elastic band method involves two known isomers M1 and M2 , and a
set of images Ik located between them, which interact via springs. After starting with a rather
poor configuration (white circles), the elastic band between both isomers will slip downhill
into its final position (grey circles), which marks the proper pathway over a transition state
close to I2
15 Structure Optimization and Microscopic Modelling 443
The last term in (15.14), which involves the scalar product of the multivectors, will
remove the component of the chemical force along the path. This means that we
now have artificial harmonic forces along the path, and the components of the real
chemical forces perpendicular to them.
The forces \mathbf{F}_i for each image are minimized using one of the algorithms with numerical second derivatives described in the last paragraph. This corresponds to a high-dimensional elastic band that slips downhill on an energy hypersurface into the proper reaction pathway connecting two isomers, as indicated in Fig. 15.2.
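A single evaluation of the band forces is easy to write down; the following sketch sets up a straight chain of images on the toy surface used above (again an arbitrary stand-in for a real hypersurface) and evaluates (15.13) and (15.14) for the interior images:

import numpy as np

# One evaluation of the nudged elastic band forces (15.13)-(15.14) for a
# chain of images between two fixed endpoints (not a full NEB driver).
def grad(R):
    return np.array([-2.0*(1.0 - R[0]) - 40.0*R[0]*(R[1] - R[0]**2),
                     20.0*(R[1] - R[0]**2)])

k = 1.0
images = [np.array([-1.0, 1.0]) + s*np.array([2.0, 0.0])
          for s in np.linspace(0.0, 1.0, 7)]

for i in range(1, len(images) - 1):
    t_i = images[i+1] - images[i-1]
    t_i = t_i/np.linalg.norm(t_i)                  # tangent estimate t_i
    f_spring = k*(np.linalg.norm(images[i+1] - images[i])
                  - np.linalg.norm(images[i] - images[i-1]))*t_i
    g = grad(images[i])
    f_i = f_spring - g + (g @ t_i)*t_i             # (15.14)
    print(i, f_i)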
Fig. 15.3. A simulated annealing sequence will eventually be able to detect a global mini-
mum on a rather simple energy hypersurface, but this procedure is hopelessly inaccurate for
complex hypersurfaces. The basin hopping algorithm involves the transformation of an en-
ergy hypersurface into a simpler object composed of catchment basins. Such a hypersurface
is much easier to sample using simulated annealing, and inside each basin, the original energy
hypersurface will be sampled in search for local minima
Then the search will either continue from the next local minimum \tilde{\mathbf{R}}_{i+1}, or again from the old minimum \tilde{\mathbf{R}}_i. By gradually lowering the temperature T, the search will be narrowed down to a basin, which hopefully contains the global minimum to be detected, see Fig. 15.3.
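Basin hopping routines of this kind are available in standard libraries; the sketch below applies scipy's implementation to a rugged one-dimensional toy surface (an arbitrary function with many local minima, chosen only for illustration):

import numpy as np
from scipy.optimize import basinhopping

# Basin hopping: a Monte-Carlo walk over catchment basins, with a local
# minimization inside every basin.
def E(R):
    return np.cos(3.0*R[0]) + 0.1*(R[0] - 1.0)**2   # many local minima

res = basinhopping(E, x0=[4.0], niter=200, T=1.0, stepsize=1.0, seed=7)
print(res.x, res.fun)   # lands in the global-minimum basin near R = 1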
In Fig. 14.1 we saw that Moore's law was obviously holding through rather dramatic technological changes. Now even the most optimistic extrapolations of Moore's law into the near future clearly predict that silicon-based computer technologies will soon hit the lithographic barrier of about 40 nm, and probably run out of steam.
These technologies will have to be substituted by other technologies, and this will involve new materials, new devices and radically new concepts for the layout of future computing machines.
In order to understand the advantages and disadvantages of shrinking devices
down to the nanodomain, we listed the classical scaling behavior of some key phys-
ical properties with system size L in Table 15.1. The only assumptions are that speed
and electrostatic fields should be constant, and that forces are acting via surfaces,
which are proportional to L2 (continuum model [8]).
We notice that nanodevices will have some obvious advantages over micro-
electronic devices: They will be cheaper, they will operate at smaller voltages and
higher frequencies, and they will tolerate more power. On the other hand, the resis-
tance of nanodevices will be rather high, their capacitance will be low, they will be
rather noisy and short-lived. Of course, these simple scaling laws might have to be
amended due to the laws of quantum mechanics, which definitely govern the world
of nanosystems [9].
Nevertheless, even under some of the unfortunate conditions listed in Table 15.1, a successful nanotechnology called biology has already existed for billions of years, offering many possibilities for reverse engineering and technological transfer to novel nanomaterials. And even if there are still a lot of nanotechnological lessons
to be learned from Mother Nature, some remarkable technological breakthroughs
within the last decade have shown that one does not have to be too pessimistic about
the future of nanotechnology [9].
There is indeed a growing number of inorganic nanomaterials, which could be-
come key materials for future nanoelectronics, the most prominent ones being car-
bon fullerenes and carbon nanotubes [10]. And although we know that “prediction
is difficult, especially about the future” (N. Bohr), let us have a look at Fig. 15.4,
which depicts a possible scenario for future nanoelectronics based on nanotubes.
Pretty high up on the list of presents one would like to receive is a controlled lay-
out of heterogeneous tubular networks. Furthermore one would like to have stable
and noiseless junctions in between different nanotubular materials, as well as at the
Table 15.1. Scaling of various physical properties with system size L. We postulate constant speed and electrostatic fields, and assume a continuum model, where forces are acting through surfaces of size L^2, see [8]
interfaces of nanotubular networks with the outside world. Finally those nanotubu-
lar networks will probably require some supporting substrate, or they might have
to be embedded into some matrix. Therefore one would also need some detailed
knowledge about the interactions between those materials and the nanotubes.
So much for the future. Let us now return to reality, which looks less promising, at least for carbon nanotubes. First of all, the chirality of carbon nanotubes, which determines their electronic properties (semiconducting vs. metallic), cannot be controlled during synthesis [10]. Second, despite some recent progress [11], there is no known mechanism to achieve any technologically relevant layout of nanotubular networks. Third, there seems to be no suitable binding partner for carbon to form heterogeneous networks with a certain nanoelectronic functionality. And fourth, the interfaces between carbon and silicon are noisy and rather unstable.
Therefore the search has long been opened to find other nanotubular materials
with promising new properties [12, 13], and to achieve even more ambitious goals
[12] than the ones sketched in Fig. 15.4.
One candidate nanotubular material has been found in a system, where nobody re-
ally expected to find nanotubes. As we will illustrate in the next paragraph, tradi-
tional boron chemistry seems to be incompatible with the existence of boron nan-
otubes [14]. Nevertheless in the last paragraph of this section, we will draw a radi-
cally different picture of boron chemistry [15], which has been established through a
large series of numerical and experimental studies on small boron clusters and boron
nanostructures [7]. The motor for this development was a series of theoretical studies on boron clusters and boron nanotubes [16, 17, 18], which combined ab initio structure optimization methods with a chemically motivated Aufbau principle for small boron clusters to predict new classes of nanostructured boron materials [7].
The most prominent features of traditional boron chemistry are boron icosahedra, as well as a complicated bonding pattern involving 2-center and 3-center bonds, see Fig. 15.5 and [14].
Fig. 15.4. Heterogeneous nanotubular network as a possible blueprint for future technologies.
Such applications may require the controlled layout of nanotubular networks, the formation
of stable and noiseless tubular heterojunctions, a detailed knowledge of tube-substrate or
tube-matrix interactions, and noiseless interfaces between nanotubes and the outside world
Fig. 15.5. Rhombohedral unit cell of α-boron with icosahedral boron clusters located around
its vertices. The bonding is rather complicated, and it involves 2-center and 3-center bonds
between boron icosahedra
By the way, there is a common prejudice that icosahedral symmetry should be impossible for crystalline systems, but Fig. 15.5 is certainly the perfect counterexample.
One might ask oneself: How is it possible that a chemical element with only five electrons shows such a complex bonding pattern? The answer is: Because it has only five electrons! Let us have a look at Table 15.2, where we show the electronic configurations of single atoms for Be, B and C, together with their coordinations in pure solid phases.
Obviously Be and B have a smaller number of valence electrons than stable
orbitals for this shell (see below), and they turn out to be rather highly coordinated.
This is a general trend observed for electron deficient (ED) materials, and Pauling
[14] gave the following characteristics for this kind of bonding:
(i) The ligancy of ED atoms is higher than the number of valence electrons, and even higher than the number of stable orbitals (4: 1 × (2s) + 3 × (2p)).
(ii) ED atoms cause adjacent atoms to increase their ligancy to values greater than the orbital numbers.
A typical electron deficient element is a metal like Be, but even boron, which is a semiconductor [19], shows both characteristics. First we see from Table 15.2
Table 15.2. Electronic configuration of single atoms, and typical atomic coordinations within
solid phases for the electron deficient (ED) elements Be and B, in comparison to a non ED
element like C. Note the rather high atomic coordinations within the solid configurations of
Be and B, which is in clear contrast to C
and Fig. 15.5 that boron has a coordination higher than four. Second, recent ab initio studies of B-C clusters [20] and tubular B-C heterojunctions [21] show that even carbon takes coordinations higher than four in a boron environment.
However, boron icosahedra are only one part of the story. The other part was a series of ab initio studies on small boron clusters summarized by Boustani [16]. The main results are shown in Fig. 15.6. First of all, it is quite obvious from Fig. 15.6(a) that boron icosahedra are unstable. Here the ab initio studies clearly suggest that the stable isolated B12 clusters are flat (the so-called boron flat out, see [15]). This behavior may be understood on the basis of a general aromaticity theory for boron clusters (see [7] and references therein).
Second, from the ab initio studies of small boron clusters, one may infer a general Aufbau principle for boron clusters. This Aufbau principle states that stable boron clusters can be built from only two basic units: the pentagonal and hexagonal pyramidal units B6 and B7 shown in Fig. 15.6(b).
One of the most interesting consequences of this Aufbau principle [16] is shown in
Fig. 15.7 (a): Further and further additions of hexagonal B7 units should lead to sta-
ble nanostructures in the form of boron sheets or boron nanotubes. In the following,
(Structures shown in Fig. 15.6: the clusters B12 – D2h, B12 – C2h and B9 – Cs in panel (a); the pyramidal units B6 – C5v and B7 – C2v in panel (b).)
Fig. 15.6. Ab initio studies of small boron clusters reveal that (a) isolated boron icosahedra
are unstable, because the stable B12 clusters are flat. (b) Pyramidal B6 and B7 clusters being
the basic units of an Aufbau principle for boron clusters [16]
Fig. 15.7. (a) According to the Aufbau principle [16] one may add hexagonal B7 units to
either form stable quasiplanar structures or stable tubular structures. (b) Portrait of a stable
boron nanotube. (c) Typical density of states for boron nanotubes, which should be metallic,
independent of their chirality [22]
we will focus our discussion on boron nanotubes. As for the boron sheets, the inter-
ested reader must be referred to a recent article [22] dealing with structure models
for stable boron sheets and their relations to boron nanotubes.
Boron nanotubes were originally postulated in [17] on the basis of an extensive ab initio study, which demonstrated the principal stability of such structures. Beyond that, a much larger class of metal-boron nanotubes was predicted in [23], which is also summarized in [7].
A proper structure model for a pure boron nanotube is shown in Fig. 15.7 (b).
From a structural point of view, each boron nanotube may be characterized by a
certain chirality, and one may classify them according to a scheme developed for
carbon nanotubes (see [10]). When trying to determine the basic electronic proper-
ties, one finds that boron nanotubes should always be metallic [22], independent of
their chirality, as shown in Fig. 15.7 (c). This is in striking contrast to carbon nan-
otubes, where the basic electronic properties (metallic vs. semiconducting) depend
quite critically on their chirality (see [10]).
Fig. 15.8. (a) Boron nanotubes growing out of a template structure [24]. (b) Amorphous
boron nanowires [25]
We want to represent the electronic part of our general Hamiltonian from (14.1) in the framework of second quantization, where the fermionic degrees of freedom will be represented by a set of fermionic field operators

\psi(\mathbf{r}) = \sum_i \phi_i(\mathbf{r}) \, c_i \,, \qquad \psi^\dagger(\mathbf{r}) = \sum_i \phi_i^*(\mathbf{r}) \, c_i^\dagger \,. \qquad (15.17)
Here i runs over the labels of a complete basis, including spin. These fermionic
operators obey the following anticommutator relations:

\{\psi_\sigma(\mathbf{r}), \psi_{\sigma'}^\dagger(\mathbf{r}')\} = \delta_{\sigma\sigma'} \, \delta(\mathbf{r} - \mathbf{r}') \,, \qquad \{\psi_\sigma(\mathbf{r}), \psi_{\sigma'}(\mathbf{r}')\} = \{\psi_\sigma^\dagger(\mathbf{r}), \psi_{\sigma'}^\dagger(\mathbf{r}')\} = 0 \,,

where \sigma and \sigma' label the spin components of the field operators. A single Slater determinant defined in (14.23) or (14.34) will be interpreted as a set of creation operators c_i^\dagger acting on the vacuum state |0\rangle:

|\Psi^{SD}\rangle = c_1^\dagger c_2^\dagger \cdots c_N^\dagger |0\rangle \,.
With

H(\mathbf{r}, \mathbf{R}) = \sum_i^N \left( -\frac{1}{2} \Delta_{\mathbf{r}_i} \right) + \sum_i^N \sum_\alpha^M \left( -\frac{Z_\alpha}{|\mathbf{r}_i - \mathbf{R}_\alpha|} \right) + \sum_{i<j}^N \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} \,, \qquad (15.20)

the electronic Hamiltonian goes over into

H = \int \psi^\dagger(\mathbf{r}) \left[ -\frac{1}{2} \Delta_{\mathbf{r}} - \sum_\alpha^M \frac{Z_\alpha}{|\mathbf{r} - \mathbf{R}_\alpha|} \right] \psi(\mathbf{r}) \, \mathrm{d}\mathbf{r} + \frac{1}{2} \int\!\!\int \psi^\dagger(\mathbf{r}) \psi^\dagger(\mathbf{r}') \frac{1}{|\mathbf{r} - \mathbf{r}'|} \psi(\mathbf{r}') \psi(\mathbf{r}) \, \mathrm{d}\mathbf{r} \, \mathrm{d}\mathbf{r}'

= \sum_{ij} t_{ij} \, c_i^\dagger c_j + \frac{1}{2} \sum_{ijkl} v_{ijkl} \, c_i^\dagger c_j^\dagger c_l c_k \equiv H(c^\dagger, c) \,. \qquad (15.21)
At this point it already becomes rather clear that any transferable parameterization
of a model Hamiltonian will either involve a lot of parameters t... and v... , or one has
to find a way to remove a lot of interaction terms and eliminate a lot of degrees of
freedom. If successful, one might finally obtain a model Hamiltonian for complex
systems, which contains a few adjustable parameters, only. In order to arrive at this
point, we better rely on intuitive approaches, where we somehow guess the right
form of the Hamiltonian. A useful model Hamiltonian will only comprise those
interactions and degrees of freedom that really determine the physical or chemical
properties we are interested in.
However the formally neglected interactions and degrees of freedom will not be
dropped. Instead we will include them in a chosen model Hamiltonian in terms of
a proper renormalization of the model parameters, but according to the following
rules:
– Include implicitly, as a renormalization of the parameters, what is not included
explicitly in the model.
– What is included explicitly in the model should not be included implicitly (⇒
no double-counting).
In this section, we want to discuss some model Hamiltonians that are useful for our
basic understanding of physical and chemical phenomena in complex materials and
strongly correlated systems.
Our first task will be to derive a one-electron model Hamiltonian as discussed in Sect. 14.2.3. Such a Hamiltonian is formally given by the first part of (15.21), and it is diagonal in the orthogonal basis spanned by its eigenstates \phi_i(x). Nevertheless, apart from the atomic case, these eigenstates are not very localized, and in order to arrive at a transferable model Hamiltonian, we better try to expand the fermionic field operators \psi^\dagger(\mathbf{r}) and \psi(\mathbf{r}) in a complete basis set of (localized, atomic-like) orbitals \varphi_\mu(\mathbf{r}):

\psi(\mathbf{r}) = \sum_\mu \varphi_\mu(\mathbf{r}) \, b_\mu \,, \qquad \psi^\dagger(\mathbf{r}) = \sum_\mu \varphi_\mu^*(\mathbf{r}) \, b_\mu^\dagger \,. \qquad (15.22)
In order to reduce the number of model parameters for H(b† , b), we may assume that
the diagonal elements of tμν should be close to atomic energy levels, and that the
hopping terms (resonances) should extend to nearest neighbors, only. Furthermore,
we may assume that these hopping terms somehow depend on the distance of the
454 A. Quandt
hopping centers, and the mutual orientation of the contributing orbitals. We will
illustrate this point in more detail in Sect. 15.3.2.
We now discuss a number of model Hamiltonians for strongly localized systems.
In such systems, the on-site electron-electron repulsion is much stronger than the
resonance energies associated with the overlap of orbitals centered around different
atoms. The former effect will keep the electrons as far away from each other as
possible, whereas the latter effect will keep them close to each other, in order to
maximize the overlap between neighboring orbitals (see Sect. 14.2.1).
The simplest model Hamiltonian for strongly correlated systems comprises two electrons distributed over two orbitals. Following [27], we denote the contributions from these orbitals with a label l, which means ligand, and a label f, which might stand for a 4f-electron. The corresponding orbital energies are \epsilon_l and \epsilon_f with \epsilon_f < \epsilon_l. The hybridization between the l and f orbitals, which is characterized by a parameter V, is assumed to be small, such that V \ll (\epsilon_l - \epsilon_f). Finally we assume that the strong repulsion between the f-orbitals should be characterized by a very large parameter U \gg (\epsilon_l - \epsilon_f). Then we make the following Ansatz:
H = \epsilon_l \sum_\sigma l_\sigma^\dagger l_\sigma + \epsilon_f \sum_\sigma f_\sigma^\dagger f_\sigma + V \sum_\sigma (l_\sigma^\dagger f_\sigma + f_\sigma^\dagger l_\sigma) + U \, n_{f\uparrow} n_{f\downarrow} \,. \qquad (15.24)
The operators l(†) and f (†) create and destroy electrons with spin σ in the corre-
sponding l and f states, and nfα = fα† fα is the occupation number for f electrons
of spin α. When V = 0, the electrons will just sit on their atomic sites, due to the
strong repulsion U . Furthermore, a polar state, where two electrons actually sit on
the same f site, might safely be excluded by assuming that U → ∞. For further
discussion see [27].
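Such a small model can also be diagonalized exactly by brute force; the following sketch builds the Hamiltonian (15.24) in the two-electron occupation-number basis (spin orbitals ordered as l↑, l↓, f↑, f↓, with textbook Slater-Condon sign rules) and computes its spectrum for one arbitrary parameter set:

import numpy as np
from itertools import combinations

# Exact diagonalization of the two-orbital model (15.24) in the two-electron
# sector; spin orbitals are ordered (l_up, l_dn, f_up, f_dn).
eps_l, eps_f, V, U = 1.0, 0.0, 0.1, 10.0
h1 = np.array([[eps_l, 0.0, V, 0.0],
               [0.0, eps_l, 0.0, V],
               [V, 0.0, eps_f, 0.0],
               [0.0, V, 0.0, eps_f]])

basis = list(combinations(range(4), 2))        # pairs of occupied spin orbitals
H = np.zeros((len(basis), len(basis)))
for a, sa in enumerate(basis):
    for b, sb in enumerate(basis):
        if sa == sb:
            H[a, b] = h1[sa[0], sa[0]] + h1[sa[1], sa[1]]
            if sa == (2, 3):                   # doubly occupied f orbital
                H[a, b] += U
        elif len(set(sa) & set(sb)) == 1:      # determinants differ in one orbital
            i = (set(sa) - set(sb)).pop()
            j = (set(sb) - set(sa)).pop()
            sign = 1.0 if sa.index(i) == sb.index(j) else -1.0
            H[a, b] = sign*h1[i, j]
print(np.linalg.eigvalsh(H))   # for V -> 0 the ground state tends to eps_l + eps_f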
Next we present a popular model Hamiltonian for a magnetic impurity embed-
ded in a metal, which is due to Anderson [29]. The basic setup for a 3d impurity
embedded in a sp host is indicated in Fig. 15.9 (a). The conduction electrons of
the periodic sp-host are noninteracting with each other. Instead they interact with
Fig. 15.9. (a) Anderson model for a 3d impurity embedded in a sp host. There is a weak hybridization V between the host and the impurity, and a strong repulsion U that affects the d-electrons only. (b) Hubbard model to describe strong electron correlations in metallic compounds. We assume hopping t between different sites, and a strong on-site repulsion U
a periodic potential generated by the lattice and the mean field of the remaining
electrons. Therefore they may be represented by Bloch states (see [32]):
\phi_{n\mathbf{k}}(\mathbf{r}) = N e^{i\mathbf{k}\cdot\mathbf{r}} u_{n\mathbf{k}}(\mathbf{r}) \quad \mathrm{with:} \quad u_{n\mathbf{k}}(\mathbf{r} + \mathbf{R}) = u_{n\mathbf{k}}(\mathbf{r}) \;\Longrightarrow\; \phi_{n\mathbf{k}}(\mathbf{r} + \mathbf{R}) = e^{i\mathbf{k}\cdot\mathbf{R}} \phi_{n\mathbf{k}}(\mathbf{r}) \,. \qquad (15.25)
These states are obviously composed of a plane wave times a function u_{n\mathbf{k}}(\mathbf{r}), which is periodic with the lattice period \mathbf{R}. They correspond to freely propagating electrons with dispersion \epsilon_n(\mathbf{k}), where n is the band index, and the continuum of \mathbf{k} values that forms this band is restricted to the first Brillouin zone [32].
According to Hund’s rule, there is a multiplet of electronic states distributed
among the orbitals of the impurity, and a strong Coulomb repulsion U between spins
of different orientation. It is best to describe these states by atomic-like localized
orbitals, for example Wannier functions, which are the Fourier transforms of the Bloch
functions:
w_n(r - R) = N' \sum_{k} e^{-ik\cdot R}\, \phi_{nk}(r) . (15.26)
We also assume a weak hybridization V between the impurity and its host. This
leads to the following model Hamiltonian:
H = \sum_{k\sigma} \epsilon(k)\, c^{\dagger}_{k\sigma} c_{k\sigma} + \epsilon_{3d} \sum_{m} n^{d}_{m} + \frac{U}{2} \sum_{m \neq m'} n^{d}_{m}\, n^{d}_{m'} + \sum_{mk\sigma} \big( V_{mk\sigma}\, d^{\dagger}_{m} c_{k\sigma} + V^{*}_{mk\sigma}\, c^{\dagger}_{k\sigma} d_{m} \big) . (15.27)
Here m and m′ are quantum numbers that characterize the spin up and spin down
multiplets residing on the 3d impurity. The meaning of the remaining terms should be
clear from (15.24).
Finally we want to mention a model Hamiltonian due to Hubbard [30], which is
used to describe strong correlations among 3d-electrons in a transition metal (com-
pound), as illustrated in Fig. 15.9 (b). We assume hopping t between different sites,
and strong Coulomb repulsion U among spin multiplets characterized by m and m′ ,
sitting on the same site. This leads to the following Ansatz:
H = \sum_{\langle ij \rangle mm'\sigma} t_{im,jm'}\, d^{\dagger}_{im\sigma} d_{jm'\sigma} + U \sum_{i} \sum_{(m\sigma)<(m'\sigma')} n_{im\sigma}\, n_{im'\sigma'} . (15.28)
Again, the meaning of all terms should be clear from (15.24) and (15.27). For more
details about this model see Chap. 18.
Note that the model Hamiltonians presented in this section are the topic of
many research papers, and we will not even try to comment on the physics de-
scribed by these models. Instead we want to point out that these model Hamiltoni-
ans are sometimes augmented by adding long-range Coulomb interactions, electron-
phonon coupling and other effects, which might be relevant for the real system under
consideration.
We now discuss a general downfolding method due to Löwdin [31], which is also
known as matrix condensation or Schur complement [33]. It is a general technique
that may be applied to one-electron and many-electron Hamiltonians alike. In or-
der to eliminate some degrees of freedom from a quantum mechanical description
of a chosen system, we will partition the full Hilbert space related to the system
Hamiltonian H into a model space with corresponding projection operator P = P 2 ,
and the rest of that Hilbert space with projector Q = (1 − P ) = Q2 . With a slight
abuse of notation, such a partitioning may be formalized in terms of a block matrix
representation of the Hamiltonian H, and a vector representation of a general state
ψ from the Hilbert space related to H:
H \Rightarrow \begin{pmatrix} PHP & PHQ \\ QHP & QHQ \end{pmatrix} ; \qquad \psi \Rightarrow \begin{pmatrix} P\psi \\ Q\psi \end{pmatrix} . (15.29)
If we let (H − ǫI) operate on ψ, where I denotes the identity matrix and ǫ a real
number, we will obtain a new state ψ ′ different from zero, unless ǫ is an eigenvalue,
and ψ the corresponding eigenvector:
\begin{pmatrix} PHP - \epsilon PIP & PHQ \\ QHP & QHQ - \epsilon QIQ \end{pmatrix} \cdot \begin{pmatrix} P\psi \\ Q\psi \end{pmatrix} = \begin{pmatrix} P\psi' \\ Q\psi' \end{pmatrix} (15.30)

(PHP - \epsilon PIP)\, P\psi + (PHQ)\, Q\psi = P\psi' ,
(QHP)\, P\psi + (QHQ - \epsilon QIQ)\, Q\psi = Q\psi' . (15.31)
If we multiply the second equation with −(P HQ)(QHQ − ǫQIQ)−1 and add this
to the first equation, we obtain a new set of block equations:
(H_{\mathrm{red}}(\epsilon) - \epsilon PIP)\, P\psi = \psi'_{\mathrm{red}}(\epsilon) ,
(QHP)\, P\psi + (QHQ - \epsilon QIQ)\, Q\psi = Q\psi' , (15.32)
where H_{\mathrm{red}}(\epsilon) is a somewhat reduced, but \epsilon-dependent matrix, the so-called Schur
complement, and \psi'_{\mathrm{red}}(\epsilon) is a new energy-dependent component of the primed state:

H_{\mathrm{red}}(\epsilon) = PHP - PHQ\, (QHQ - \epsilon QIQ)^{-1}\, QHP ,
\psi'_{\mathrm{red}}(\epsilon) = P\psi' - PHQ\, (QHQ - \epsilon QIQ)^{-1}\, Q\psi' . (15.33)
Thus the reduced Hamiltonian H_{\mathrm{red}}(\epsilon) will have the same spectrum as the original
Hamiltonian H, but only for those eigenvalues \epsilon whose corresponding eigenstates \psi of H
live in our model space selected by P. In practice one does not lose too much accuracy
if one uses a modified reduced Hamiltonian H_{\mathrm{red}}(\tilde\epsilon), which depends on some suitably
chosen (i.e. typical) energy \tilde\epsilon.
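The downfolding recipe is easy to check numerically. Here is a small sketch (Python; H is a random symmetric matrix and the partitioning into P and Q blocks is arbitrary) verifying that every eigenvalue \epsilon of H makes the condensed matrix H_{\mathrm{red}}(\epsilon) - \epsilon of (15.33) singular:

import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3                        # dimensions of the full space and the model space
A = rng.normal(size=(n, n))
H = (A + A.T)/2                    # a random symmetric "Hamiltonian"

HP  = H[:p, :p]                    # PHP block
HQ  = H[p:, p:]                    # QHQ block
HPQ = H[:p, p:]                    # PHQ block; QHP is its transpose

def H_red(eps):
    """Schur complement of Eq. (15.33): PHP - PHQ (QHQ - eps)^(-1) QHP."""
    return HP - HPQ @ np.linalg.inv(HQ - eps*np.eye(n - p)) @ HPQ.T

# every eigenvalue eps of H satisfies det(H_red(eps) - eps) = 0
for eps in np.linalg.eigvalsh(H):
    resid = np.linalg.det(H_red(eps) - eps*np.eye(p))
    print(f"eps = {eps:+.4f}   det(H_red(eps) - eps) = {resid:+.2e}")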
An interesting example to illustrate the usage of model Hamiltonians is provided by the struc-
tural and physical properties of C60 molecules and their related solid structures
[34]. Undoped C60 molecules as shown in Fig. 15.10 (a) crystallize as a sc phase
at temperatures below 249 K, but at room temperature the preferred structure is
fcc. Within those solid phases, the C60 molecules interact with each other over rel-
atively large distances, and their mutual orientation must be the result of a rather
weak chemical bonding. This is certainly an interesting problem to be tackled using
a suitable model Hamiltonian.
Furthermore it is possible to dope C60 solids with alkali atoms A (B) =
K, Rb, Cs. Those enter the solid at various tetrahedral or octahedral sites of the
lattice [35], and each of them donates one extra electron, but they have little ef-
fect on the electronic states close to the Fermi energy EF indicated in Fig. 15.10 (b).
Also their main structural effect is limited to a mere expansion of the lattices.
For An−x Bx C60 solids with n ≤ 3 one actually observes superconductivity [36]
with Tc in the range of 40 K, whereas for higher doping levels the solid structure
becomes bct or bcc, and superconductivity disappears. Heavily doped compounds
with n = 6 are insulating. Therefore the electronic structure of (doped) C60 solids
will be another interesting question to be tackled using model Hamiltonians.
Fig. 15.10. (a) C60 molecule. (b) Structural details and the valence states for molecular and
solid C60 . The valence states of the molecule broaden to some rather narrow bands of the
solid, due to weak C60 -C60 bonding
Without going into the various details described in [34] and [35], we now want
to illustrate how to arrive at a useful model Hamiltonian for (doped) C60 solids.
Each carbon atom in C60 has one 2s and three 2p orbitals, which form three approx-
imate sp2 orbitals in the molecular surface pointing towards neighboring carbon
atoms, and one radial pr orbital. The sp2 orbitals are forming strongly σ-bonding
or antibonding orbitals far away from the Fermi level, and they are irrelevant for the
properties that we are interested in. But the 60 2pr orbitals form weakly π-bonding
or antibonding orbitals close to the Fermi level, and they point towards neighbor-
ing C60 molecules. These are the type of atomic orbitals that should be included in
our preliminary model Hamiltonian for a C60 molecule, which is a simple hopping
Hamiltonian similar to (15.23):
H = \epsilon_{2p_r} \sum_{i\sigma} c^{\dagger}_{i\sigma} c_{i\sigma} + \sum_{\langle ij \rangle \sigma} t_{ij}\, c^{\dagger}_{i\sigma} c_{j\sigma} . (15.36)
Here i and j are running over all the 60 sites of the C60 molecule, and the hopping
term involves nearest neighbors \langle ij \rangle only. This assumption may be dropped in the
case of a solid, but only for those sites that really contribute to the bonding between
different C60 molecules [35]. Note that in the case of a solid phase, the operators
c^{(\dagger)}_{i} will refer to Bloch states \phi_{ik}(r) (see (15.25)), rather than atomic orbitals \phi_i(r).
In such a case, the molecular states will broaden and become subbands, as indicated
in Fig. 15.10 (b).
To make our model more realistic, we assume that the hopping terms tij will de-
pend on the mutual orientation of the atomic orbitals located at sites Ri and Rj , and
on the interatomic distance between them. This leads to the following Ansatz [35]:
Here n is the number of orbitals that contribute to the band, M is the number of
contributing atoms, and ij runs over all the orbitals in the basis, but the hopping
will be restricted to nearest neighbors only. What we basically have to do now is to
count all hopping cycles of length two to the appropriate neighbors, plug this into
(15.38), and match the resulting bandwidth with the width of the corresponding den-
sity of states. In practice it might be easier to simply adjust the parameter vσ , such
that ab initio bandwidths will be reproduced, and in combination with a variation of
lattice sizes, the second parameter λ may be fitted quite accurately [34, 35].
But it turns out that the model Hamiltonian can be reduced even further. In
alkali-doped C60 molecules, the important orbitals are three degenerate t1u orbitals
close to the Fermi level, see Fig. 15.10 (b). In the original basis of the 2pr atomic
orbitals φi (r), which are located around Ri , the three t1u orbitals φ′m (r) are just:
\phi'_{m}(r) = \sum_{i=1}^{60} c^{m}_{i}\, \phi_{i}(r) . (15.39)
The corresponding hopping terms tmμ,nν between two t1u orbitals labelled by m
and n, which are associated with different C60 molecules located around Rμ and
Rν , may be calculated from the basic hopping terms tij defined in (15.37):
t_{m\mu,n\nu} = \sum_{i=1}^{60} \sum_{j=1}^{60} c^{m}_{i} c^{n}_{j}\, t_{i\mu,j\nu} . (15.40)
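Numerically, (15.40) is nothing but a basis-change contraction. A hedged sketch (the coefficient matrix C and the hopping block below are random stand-ins, not actual C60 data):

import numpy as np

rng = np.random.default_rng(0)
# columns of C play the role of the t1u coefficients c_i^m of Eq. (15.39)
C = np.linalg.qr(rng.normal(size=(60, 3)))[0]
# 60x60 block of atomic hopping terms t_{i mu, j nu} between two molecules
T_atomic = 0.01*rng.normal(size=(60, 60))
# Eq. (15.40): effective 3x3 hopping between the t1u states of the two molecules
T_t1u = C.T @ T_atomic @ C
print(T_t1u)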
Thus we finally obtain a Hamiltonian for alkali-doped C60 solids, where every
molecule may be described by just three t1u states, instead of the 60 × 4 = 240
atomic orbitals:
H = \epsilon_{t_{1u}} \sum_{m\mu\sigma} c^{\dagger}_{m\mu\sigma} c_{m\mu\sigma} + \sum_{\langle m\mu,n\nu \rangle \sigma} t_{m\mu,n\nu}\, c^{\dagger}_{m\mu\sigma} c_{n\nu\sigma} . (15.41)
Again, the hopping terms should comprise nearest neighbors only, and the basic
fitting may be carried out on the basis of the general procedure described above (see
(15.38)). In Fig. 15.11, we show a comparison between an ab initio band structure
for an alkali-doped C60 solid and a band structure obtained using the model
Hamiltonian from (15.41), which are obviously matching quite well [38]. Other
important properties like the orientation of C60 molecules and the main features
of superconductivity may also be predicted quite reliably [34, 35] using the model
Hamiltonians of (15.36) and (15.41).
Fig. 15.11. Comparison between an ab initio band structure for RbC60 (above) and a band
structure calculation using the model Hamiltonian of (15.41), shown below [38]
[Fig. 15.12: structural elements of YBCO, indicating the Cu^{2+}, O2^{2-} and O3^{2-} ions of the copper-oxide planes]
CuO3 chains of edge-sharing CuO4 squares. Those chains determine various phys-
ical properties, but the superconductivity is thought to be mediated by electrons
within the copper-oxide planes.
A simple-minded electron count (Cu^{3+}Cu^{2+}_2) in undoped YBa_2Cu_3O_7 reveals
that the Cu in the copper-oxide planes has the formal charge Cu2+ , see Fig. 15.12.
But the formal charge of Cu2+ implies that the d-shell of the copper atom will be
incomplete (d9 ). The corresponding hole is mainly put into the highest antibond-
ing state of a Cu-O bond, which is of 3dx2 −y2 character. In such a situation, one
would certainly expect a metallic behavior, but YBCO is actually a semiconductor,
caused by strong correlations of the electrons within the copper-oxide planes [41].
Of course, these rather formal considerations must be amended for real YBCO ma-
terials, where doping turns out to be an essential precondition for superconductivity.
If we take all of these basic structural and electronic features into account, the
band structure of YBCO may be simplified using a first downfold to remove all
bands other than the ones that refer to electrons within the planes. This leads to an
8-band model Hamiltonian, which comprises the states of type Cu_{x^2-y^2}, O2_x, O3_y,
Cu_s, Cu_{xz}, Cu_{yz}, O2_z and O3_z. This model Hamiltonian may be parameterized
following a procedure explained in the previous paragraph (see (15.38)). But in
order to obtain an orthonormal model Hamiltonian, we have to modify our general
downfolding procedure. To understand this, we first expand the Hamiltonian Hred (ǫ)
of (15.33) around the Fermi energy ǫF [39, 40]:
H_{\mathrm{red}}(\epsilon) - \epsilon \;\approx\; H_{\mathrm{red}}(\epsilon_F) + (\epsilon - \epsilon_F)\, \frac{\partial H_{\mathrm{red}}}{\partial \epsilon} - \epsilon \;\equiv\; H - \epsilon S . (15.42)
Up to first order in ǫ, the expansion will obviously lead to a generalized eigen-
value problem that we already encountered before in (14.13). Such a generalized
eigenvalue problem indicates that the chosen basis functions are non-orthogonal.
Therefore, in order to obtain a Hamiltonian for an orthogonal basis, we just have to
make the following transformation [2]:
(H - \epsilon S)\, C = 0
\Longrightarrow S^{-\frac{1}{2}} H S^{-\frac{1}{2}} (S^{\frac{1}{2}} C) - \epsilon\, S^{-\frac{1}{2}} S S^{-\frac{1}{2}} (S^{\frac{1}{2}} C) = (H' - \epsilon I)\, C' = 0 . (15.43)
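The equivalence stated in (15.43) can again be checked in a few lines (a sketch, with a random symmetric H and a random positive definite overlap S standing in for the downfolded matrices):

import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n)); H = (A + A.T)/2            # "Hamiltonian"
B = rng.normal(size=(n, n)); S = B @ B.T + n*np.eye(n)  # positive definite overlap

# S^(-1/2) from the spectral decomposition of S
w, V = np.linalg.eigh(S)
S_mhalf = V @ np.diag(w**-0.5) @ V.T

H_prime = S_mhalf @ H @ S_mhalf                         # orthogonalized H' of (15.43)

w_std = np.linalg.eigvalsh(H_prime)                           # (H' - eps I)C' = 0
w_gen = np.sort(np.linalg.eigvals(np.linalg.inv(S) @ H).real) # (H - eps S)C = 0
print(np.allclose(w_std, w_gen))                              # True: identical spectra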
We may then continue to downfold copper and oxygen bands and arrive at
a 3-band model Hamiltonian, which contains the Cux2 −y2 , O2x and O3y bands.
The price to pay is that oxygen on-site energies will be renormalized, and that the
reduced Hamiltonian will contain 2nd -nearest-neighbor O2x ↔ O3y hopping (see
Fig. 15.12), as well as 3rd -nearest-neighbor O2x ↔ O2x and O3y ↔ O3y hopping.
In other words, hopping becomes more and more long-ranged, and downfolding
remains accurate only over a smaller and smaller energy range.
If we finally downfold the remaining oxygen bands to obtain a 1-band model
Hamiltonian for the essential Cux2 −y2 band, the latter will contain up to 9th-nearest-
neighbor hopping integrals [39]. After all, the downfolded bands did not vanish into
thin air! All the way down to the 1-band model Hamiltonian, their basic character
survived in the various renormalizations of the remaining on-site and hopping terms.
Finally we want to recommend another study, which applies the downfolding
procedure described in this paragraph to a much simpler and exactly solvable model
for 3d compounds [42]. That paper also illustrates some of the techniques discussed
in the following section.
In the first line, we set the derivative of the energy as a functional of the
occupation number of the 3d states equal to the corresponding one-particle energy of
the Kohn-Sham equations. This is known as Janak's theorem, and an elementary
derivation can be found in [43]. Furthermore we see from the second line in (15.45)
that U might in principle be obtained from the knowledge of total energies E(n)
for discrete changes of the occupation numbers n, or from the knowledge of the
variation of the 3d Kohn-Sham eigenvalue as a function of a continuous n.
To determine U for rare earth compounds, one can follow Herring [44] and
assume that changes in the occupation numbers of the 4f states are accompanied by
changes in the occupation of other localized atomic states, such that the atom as a
whole will remain neutral (perfect screening). This approach was used in [45] and
[46], and good agreement with experiment [47] was obtained, see Fig. 15.13. Later
work [48] confirmed that perfect screening is indeed a rather useful assumption for
rare earth compounds, but not for transition metal compounds.
When the perfect screening assumption is invalid, one can apply constrained
density functional theory [49]. Here the idea is to fix the occupation number of a
Kohn-Sham state \phi_i to a value N_i by introducing a Lagrange multiplier v. To this
end, we formally rewrite (14.41) including this additional Lagrange multiplier:

E[N_i] = \min_{\phi_k(r)} \Big\{ F[n(r)] + v \int_{\Omega} \big( n_i(r) - N_i \big)\, \mathrm{d}r \Big\} . (15.46)
We then obtain a set of one-particle equations similar to (14.42), but with an addi-
tional projection potential v, which acts on φi (r) in a restricted (atomic) domain Ω,
only. All other orbitals are allowed to relax, thus describing an optimally screened
excitation.
[Fig. 15.13: theoretical and experimental values of the Coulomb parameter U (in eV) across the rare earth series Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm]
The coupling constant gmn;μ is related to the shift ∆ǫt1u (ωμ ) of the atomic ener-
gies ǫt1u , when the C60 molecule is distorted in the direction of the phonon mode
corresponding to ωμ . In order to determine the dimensionless coupling constant λ,
we also have to know the density of states at the Fermi level, denoted by N (ǫF ). All
of these values may easily be extracted from ab initio data, see [56].
The Hamiltonian of (15.48) already includes the Jahn-Teller effect due to the
coupling to phonons with Hg -symmetry [57]. It is also possible to include Hund’s
rule coupling [58], and such an augmented model Hamiltonian, which is entirely
based on ab initio data, can be used to develop a consistent theory of strong super-
conductivity in C60 solids [58, 59].
In summary, it seems that for 4f compounds, C60 and high-Tc cuprates, one
may determine U rather accurately. For many 3d compounds, the theoretically de-
termined U turns out to be too large [60]. However, a recent study, which includes
proper RPA screening, leads to substantially improved results even for early 3d sys-
tems [61].
emphasizes again the importance of ab initio methods for our basic understanding
of the physical and chemical properties of molecules and solids.
Right at the beginning, we pointed out that a continuing success of modern ab
initio methods will not only depend on better theoretical concepts, but also on better
algorithms and better computing hardware. Better theoretical concepts might imply
the construction of novel basis sets, which are ideally suited to treat mesoscopic
or low-dimensional systems, rather than the popular Gaussian or planewave basis
sets implemented in many ab initio packages. Better algorithms could imply novel
algorithms for sparse-matrix eigenvalue problems, or just some new techniques to
visualize and analyze chemistry data provided by ab initio methods. And better
computing hardware could imply new techniques of distributed computing, or just
a brave jump into a new technology.
Whatever simulation tools the future may bring, two things will always remain:
Ab initio alchemists who want to treat ab initio simulations like a black box, and new
pages in the “Journal of Non Reproducible ab initio Results”. Therefore we finally
added a short Appendix to help you choose your alchemist’s package of choice.
Have fun!
Finally the author would like to thank J. Kunstmann (MPI FKF Stuttgart) for
various illustrations used in Sect. 15.2.2, and O. Gunnarsson (MPI FKF Stuttgart)
for his lecture notes and a number of illustrations, which formed the basis of
Sect. 15.3.
– SIESTA∗, another density functional based ab initio code similar to VASP, which uses localized atomic orbital basis sets instead of planewaves (http://www.uam.es/departamentos/ciencias/fismateriac/siesta).
– ABINIT∗, an open source planewave and density functional based ab initio package, which is maintained by a very active newsgroup (www.abinit.org).
– TB-LMTO-ASA∗ , a density functional based ab initio package featuring Muffin-
Tin-orbitals (www.fkf.mpg.de/andersen). It is fast, easy to handle, and
may directly be used to set up analytical models, see Sect. 15.3 and [37].
– CRYSTAL, a package that contains Hartree-Fock and density functional based
methods for solid systems (www.crystal.unito.it).
References
1. D.J. Wales, Energy Landscapes (Cambridge University Press, Cambridge, 2003)
2. A. Szabo, N.S. Ostlund, Modern Quantum Chemistry (McGraw-Hill, New York, 1989)
3. S.K. Card, J.D. MacKinlay, B. Shneiderman, Readings in Information Visualization: Using Vision to Think (Morgan Kaufmann Publishers, San Francisco, 1999)
4. Y. Yamaguchi, Y. Osamura, J.D. Goddard, H.F. Schaefer III, A New Dimension to Quantum Chemistry: Analytic Derivative Methods in Ab Initio Molecular Electronic Structure Theory (Oxford University Press, Oxford, 1994)
5. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes, Vol. 1, 2nd edn. (Cambridge University Press, Cambridge, 1992)
6. F. Weinhold, C.R. Landis, Valence and Bonding: A Natural Bond Orbital Donor-Acceptor Perspective (Cambridge University Press, Cambridge, 2005)
7. A. Quandt, I. Boustani, ChemPhysChem 6, 2001 (2005)
8. K.E. Drexler, Nanosystems (Wiley, New York, 1992)
9. E.L. Wolf, Nanophysics and Nanotechnology (Wiley-VCH, Weinheim, 2004)
10. M.S. Dresselhaus, G. Dresselhaus, P. Eklund, Science of Fullerenes and Carbon Nanotubes (Academic Press, San Diego, 1996)
11. E. Joselevich, C.M. Lieber, Nano Lett. 2, 1137 (2002)
12. B. Halford, Chem. Eng. News 83, 30 (2005)
13. W. Tremel, Angew. Chem. 111, 2311 (1999)
14. L. Pauling, The Nature of the Chemical Bond, 3rd edn. (Cornell University Press, Ithaca, 1960)
15. S.K. Ritter, Chem. Eng. News 82, 28 (2004)
16. I. Boustani, Phys. Rev. B 55, 16426 (1997)
17. I. Boustani, A. Quandt, Europhys. Lett. 39, 527 (1997)
18. A. Gindulyte, N. Krishnamachari, W.N. Lipscomb, L. Massa, Inorg. Chem. 37, 6546 (1998)
19. S. Lee, D.M. Bylander, L. Kleinman, Phys. Rev. B 42, 1316 (1990)
20. K. Exner, P. v. R. Schleyer, Science 290, 1937 (2000)
21. J. Kunstmann, A. Quandt, J. Chem. Phys. 121, 10680 (2004)
22. J. Kunstmann, A. Quandt, Phys. Rev. B 74, 035413 (2006)
23. A. Quandt, A.Y. Liu, I. Boustani, Phys. Rev. B 64, 125422 (2001)
24. D. Ciuparu, R.F. Klie, Y. Zhu, L. Pfefferle, J. Phys. Chem. B 108, 3967 (2004)
25. L. Cao, Z. Zhang, L. Sun, C. Gao, M. He, Y. Wang, Y. Li, X. Zhang, G. Li, J. Zhang, W. Wang, Adv. Mater. 13, 1701 (2001)
26. Y. Imry, Introduction to Mesoscopic Physics, 2nd edn. (Oxford University Press, Oxford, 2002)
27. P. Fulde, Electron Correlations in Molecules and Solids, 3rd edn. (Springer, Berlin Heidelberg New York, 1995)
28. F.E. Harris, H.J. Monkhorst, D.L. Freeman, Algebraic and Diagrammatic Methods in Many-Fermion Theory (Oxford University Press, Oxford, 1989)
29. P.W. Anderson, Phys. Rev. 124, 41 (1961)
30. J. Hubbard, Proc. Roy. Soc. (London) A 276, 238 (1963)
31. P.O. Löwdin, J. Chem. Phys. 19, 1396 (1951)
32. N.W. Ashcroft, N.D. Mermin, Solid State Physics (Holt, Rinehart and Winston, Philadelphia, 1976)
33. G.H. Golub, C.F. van Loan, Matrix Computations (The Johns Hopkins University Press, Baltimore, 1996)
34. O. Gunnarsson, S. Satpathy, O. Jepsen, O.K. Andersen, Phys. Rev. Lett. 67, 3002 (1991)
35. S. Satpathy, V.P. Antropov, O.K. Andersen, O. Jepsen, O. Gunnarsson, A.I. Liechtenstein, Phys. Rev. B 46, 1773 (1992)
36. M.J. Rosseinsky, A.P. Ramirez, S.H. Glarum, D.W. Murphy, R.C. Haddon, A.F. Hebard, T.T.M. Palstra, A.R. Kortan, S.M. Zahurak, A.V. Makhija, Phys. Rev. Lett. 66, 2830 (1991)
37. W.A. Harrison, Elementary Electronic Structure, revised edn. (World Scientific, Singapore, 2004)
38. O. Gunnarsson, S.C. Erwin, E. Koch, R.M. Martin, Phys. Rev. B 57, 2159 (1998)
39. O.K. Andersen, A.I. Liechtenstein, O. Jepsen, F. Paulsen, J. Phys. Chem. Solids 56, 1573 (1995)
40. O. Jepsen, O.K. Andersen, Z. Phys. B 97, 35 (1995)
41. P.W. Anderson, The Theory of Superconductivity in the High-Tc Cuprates (Princeton University Press, Princeton, 1997)
42. O. Gunnarsson, Phys. Rev. B 41, 514 (1990)
43. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, Oxford, 1989)
44. C. Herring, Magnetism (Academic Press, New York, 1966)
45. J.F. Herbst, R.E. Watson, J.W. Wilkins, Phys. Rev. B 13, 1439 (1976)
46. J.F. Herbst, R.E. Watson, J.W. Wilkins, Phys. Rev. B 17, 3089 (1978)
47. J.K. Lang, Y. Baer, P.A. Cox, Phys. Rev. Lett. 42, 74 (1979)
48. V.I. Anisimov, O. Gunnarsson, Phys. Rev. B 43, 7570 (1991)
49. P.H. Dederichs, S. Blügel, R. Zeller, H. Akai, Phys. Rev. Lett. 53, 2512 (1984)
50. M.S. Hybertsen, M. Schlüter, N.E. Christensen, Phys. Rev. B 39, 9028 (1989)
51. A.K. McMahan, R.M. Martin, S. Satpathy, Phys. Rev. B 38, 6650 (1988)
52. O. Gunnarsson, O.K. Andersen, O. Jepsen, J. Zaanen, Phys. Rev. B 39, 1708 (1989)
53. O.K. Andersen, Z. Pawlowska, O. Jepsen, Phys. Rev. B 34, 5253 (1986)
54. V.P. Antropov, O. Gunnarsson, O. Jepsen, Phys. Rev. B 46, 13647 (1992)
55. R.W. Lof, M.A. van Veenendaal, B. Koopmans, H.T. Jonkman, G.A. Sawatzky, Phys. Rev. Lett. 68, 3924 (1992)
56. V.P. Antropov, O. Gunnarsson, A.I. Liechtenstein, Phys. Rev. B 48, 7651 (1993)
57. J.E. Han, E. Koch, O. Gunnarsson, Phys. Rev. Lett. 84, 1276 (2000)
58. M. Capone, M. Fabrizio, C. Castellani, E. Tosatti, Science 296, 2364 (2002)
59. J.E. Han, O. Gunnarsson, V.H. Crespi, Phys. Rev. Lett. 90, 167006 (2003)
60. V. Drchal, O. Gunnarsson, O. Jepsen, Phys. Rev. B 44, 3518 (1991)
61. F. Aryasetiawan, K. Karlsson, O. Jepsen, U. Schönberger, Phys. Rev. B 74, 125106 (2006)
62. C. Le Bris, P.L. Lions, Bull. Am. Math. Soc. 42, 291 (2005)
63. M.C. Payne, M.P. Teter, D.C. Allan, T.A. Arias, J.D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992)
16 Dynamical Mean-Field Approximation
and Cluster Methods for Correlated Electron Systems
Thomas Pruschke
Among the various approximate methods used to study many-particle systems the
simplest are mean-field theories, which map the interacting lattice problem onto
an effective single-site model in an effective field. Based on the assumption that
one can neglect non-local fluctuations, they allow one to construct a comprehensive and
thermodynamically consistent description of the system and calculate various prop-
erties, for example phase diagrams. Well-known examples for successful mean-field
theories are the Weiss theory for spin models or the Bardeen-Cooper-Schrieffer the-
ory for superconductivity. In the case of interacting electrons the proper choice of
the mean-field becomes important. It turns out that a static description is no longer
appropriate. Instead, a dynamical mean-field has to be introduced, leading to a com-
plicated effective single-site problem, a so-called quantum impurity problem.
This chapter gives an overview of the basics of dynamical mean-field theory
and the techniques used to solve the effective quantum impurity problem. Some
key results for models of interacting electrons, limitations as well as extensions that
systematically include non-local physics are presented.
16.1 Introduction
Strongly correlated electron systems still present a major challenge for a theoretical
treatment. The simplest model describing correlation effects in solids is the one-
band Hubbard model [1, 2, 3]
H = \sum_{\langle i,j \rangle, \sigma} t_{ij}\, c^{\dagger}_{i\sigma} c_{j\sigma} + U \sum_{i} c^{\dagger}_{i\uparrow} c_{i\uparrow} c^{\dagger}_{i\downarrow} c_{i\downarrow} , (16.1)
where we use the standard notation of second quantization to represent the electrons
for a given lattice site Ri and spin orientation σ by annihilation (creation) operators
(†)
ciσ . The first term describes a tight-binding band with tunneling amplitude for the
conduction electrons tij , while the second represents the local part of the Coulomb
interaction. Since for this model we assume that the conduction electrons do not
have further orbital degrees of freedom, this local Coulomb interaction acts only if
two electrons at the same site Ri with opposite spin are present.
The complementary nature of the two terms present in the Hubbard model (16.1)
– the kinetic energy or tight-binding part is diagonal in momentum representation,
the interaction part in direct space – already indicates that it will be extremely hard
to solve. One can, however, get at least for filling n = 1 (half filling) some insight
into the physics of the model by a few simple arguments: In the limit U → 0 we
will have a simple metal. On the other hand, for tij → 0, or equivalently U → ∞,
the system will consist of decoupled sites with localized electrons and hence rep-
resents an insulator. We thus can expect that there exists a critical value Uc , where
a transition from a metal to an insulator occurs. Furthermore, from second order
perturbation theory around the atomic limit [4], we find that for |tij |/U → 0 the
Hubbard model (16.1) maps onto a Heisenberg model
H = \sum_{\langle ij \rangle} J_{ij}\, \mathbf{S}_i \cdot \mathbf{S}_j , (16.2)
where S i represents the spin operator at site Ri and the exchange constant is given
by
J_{ij} = 2\, \frac{t_{ij}^2}{U} > 0 . (16.3)
Note that this immediately implies that we will have to expect that the ground state
of the model at half filling will show strong antiferromagnetic correlations.
Away from half filling \langle n \rangle = 1 the situation is much less clear. There exists
a theorem by Nagaoka [5] stating that for U = ∞ and one hole in the half-filled band
the ground state can be ferromagnetic due to a gain in kinetic energy; to what extent
this theorem applies for a thermodynamically finite doping and finite U has not been
solved completely yet. The mapping to the Heisenberg model can still be performed
leading to the so-called t-J model [4], which again tells us that antiferromagnetic
correlations will be at least present and possibly compete with Nagaoka’s mecha-
nism for small Jij or even dominate the physics if Jij is large enough. This is the
realm where models like the Hubbard or t-J model are thought to describe at least
qualitatively the physics of the cuprate high-Tc superconductors [6].
The energy scales present in the model are the bandwidth W of the tight-binding
band and the local Coulomb parameter U . From the discussion so far it is clear that
typically we will be interested in the situation U ≈ W or even U ≫ W . This
means, that there is either no clear-cut separation of energy scales, or the largest
energy scale in the problem is given by the two-particle interactions. Thus, standard
perturbation techniques using the interaction as perturbation are usually not reliable
even on a qualitative level; expansions around the atomic limit, on the other hand,
are extremely cumbersome [7] and suffer from non-analyticities [8] which render
calculations at low temperatures meaningless.
The knowledge of correlated electron systems in general and the Hubbard
model (16.1) in particular acquired during the past decades is therefore mainly due
to the development of a variety of computational techniques, for example quantum
Monte Carlo (QMC), exact diagonalization (ED), and the density-matrix renormal-
ization group (see Parts V, VIII and IX). Since these methods – including modern
developments – have been covered in great detail during this school, I will not dis-
cuss them again at this point but refer the reader to the corresponding chapters in this
book. The aspect interesting here is that basically all of them are restricted to low-
dimensional systems: For ED, calculations in D > 2 are impossible due to the size
of the Hilbert space, and in case of the DMRG the way the method is constructed
restricts it basically to D = 1. QMC in principle can be applied to any system; how-
ever, the sign problem introduces a severe limitation to the range of applicability
regarding system size, temperature or interaction strengths.
In particular the restriction to finite and usually also small systems makes a re-
liable discussion of several aspects of the physics of correlated electron systems
very hard. Typically, one expects these materials to show a rather large variety of
ordered phases, ranging from different magnetic phases with and without orbital or
charge ordering to superconducting phases with properties which typically cannot
be accounted for in standard weak-coupling theory [9]. Moreover, metal-insulator
transitions driven by correlation effects are expected [9], which are connected to a
small energy scale of the electronic system. Both aspects only become visible in a
macroscopically large system: For small finite lattices phase transitions into ordered
states cannot appear, and an identification of such phases requires a thorough finite-
size scaling, which usually is not possible. Furthermore, finite systems typically
have finite-size gaps scaling with the inverse system size, which means that small
low-energy scales appearing in correlated electron materials cannot be identified.
These restrictions motivate the question whether there exists a – possibly approximate
– method that does not suffer from restrictions on temperature and model param-
eters but nevertheless works in the thermodynamic limit and thus allows for phase
transitions and possibly very small low-energy scales dynamically generated due to
the correlations. Such methods are the subject of this contribution.
In Sect. 16.2.1 I will motivate them on a very basic level using the concept of
the mean-field theory well-known from statistical physics, and extend this concept
for the Hubbard model in Sect. 16.2.2, obtaining the so-called dynamical mean-field
theory (DMFT). As we will learn in Sect. 16.2.2.4, the DMFT still constitutes a non-
trivial many-particle problem and I will thus briefly discuss techniques available to
solve the equations of the DMFT. Following some selected results for the Hubbard
model in Sect. 16.2.3 I will touch on a recent development to use DMFT in material
science in Sect. 16.2.4. Section 16.3 of this contribution will deal with extensions of
the DMFT, which will be motivated in Sect. 16.3.1. The actual algorithms and their
computational aspects will be discussed in Sects. 16.3.2 and 16.3.3. Some selected
results for the Hubbard model in Sect. 16.3.4 will finish this chapter.
theory which nevertheless describes the properties of the Heisenberg model at least
qualitatively correct, the Weiss mean-field theory [10]. As we all learned in the
course on statistical physics, the basic idea of this approach is the approximate
replacement
where \langle \dots \rangle_{\mathrm{MFT}} stands for the thermodynamic average with respect to the mean-field
Hamiltonian (16.5), and where we dropped a term \langle \mathbf{S}_i \rangle_{\mathrm{MFT}} \cdot \langle \mathbf{S}_j \rangle_{\mathrm{MFT}} that is unimportant
for the present discussion. If we define an effective magnetic field or Weiss field accord-
ing to

\mathbf{B}_{i,\mathrm{MF}} := 2 \sum_{j \neq i} J_{ij}\, \langle \mathbf{S}_j \rangle_{\mathrm{MFT}} , (16.6)
we may write

H^{(i)}_{\mathrm{MF}} = \mathbf{S}_i \cdot \mathbf{B}_{i,\mathrm{MF}} . (16.7)
This replacement is visualized in Fig. 16.1. The form (16.7) also explains the name
assigned to the theory: The Hamiltonian (16.2) is approximated by a single spin in
an effective magnetic field, the mean-field, given by the average over the surround-
ing spins. Note that this treatment does not make any reference to the system size,
i.e. it is also valid in the thermodynamic limit.
The fact that the mean-field \mathbf{B}_{i,\mathrm{MF}} is determined by \langle \mathbf{S}_j \rangle_{\mathrm{MFT}} immediately leads
to a self-consistency condition for the latter.
The precise form of the functional will in general depend on the detailed structure
of the Hamiltonian (16.2). For a simple cubic lattice and nearest-neighbor exchange
J_{ij} = \begin{cases} J & \text{for } i,j \text{ nearest neighbors} \\ 0 & \text{otherwise} \end{cases} (16.9)
Fig. 16.1. Sketch for the mean-field theory of the Heisenberg model
where we put the quantization axis into the z direction and assumed the same value
\langle S_j^z \rangle_{\mathrm{MFT}} for all 2D nearest neighbors. For k_B T < |J^*|, where J^* := 2DJ, this
equation has a solution |\langle S_i^z \rangle_{\mathrm{MFT}}| \neq 0, i.e. the system undergoes a phase transition
to an ordered state (antiferromagnetic for J > 0 and ferromagnetic for J < 0).
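The self-consistency is easily solved by fixed-point iteration. A small sketch (assuming a spin-1/2 self-consistency of the form m = ½ tanh(2J*m/k_B T) for m = \langle S^z \rangle_{\mathrm{MFT}}, which reproduces the instability at k_B T = |J^*|; the precise prefactors of the equation above may differ):

import numpy as np

def magnetization(T, Jstar=1.0):
    """Fixed-point iteration of m = 0.5*tanh(2*Jstar*m/T), with kB = 1."""
    m = 0.4                          # small symmetry-breaking start value
    for _ in range(5000):
        m_new = 0.5*np.tanh(2*Jstar*m/T)
        if abs(m_new - m) < 1e-12:
            break
        m = m_new
    return m

for T in (0.6, 0.9, 0.99, 1.01, 1.2):    # ordered solution only for T < Jstar
    print(f"T = {T:.2f}   |<S^z>| = {magnetization(T):.4f}")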
As we know, this mean-field treatment yields the qualitatively correct phase di-
agram for the Heisenberg model in D = 3, but fails in dimensions D ≤ 2 and close
to the phase transition for D = 3. The reason is that one has neglected the fluctua-
tions \delta \mathbf{S}_i = \mathbf{S}_i - \langle \mathbf{S}_i \rangle of the neighboring spins. Under what conditions does that
approximation become exact? The answer is given in [10]: The mean-field approx-
imation becomes exact in the formal limit D → ∞, provided one keeps J^* = 2DJ
constant. In this limit, each spin has 2D → ∞ nearest neighbors (for a simple cubic
lattice). Assuming ergodicity of the system, one finds that the phase space average
realized by the sum over nearest neighbors becomes equal to the ensemble average,
i.e.
\frac{1}{2D} \sum_{j \neq i} \mathbf{S}_j \;\overset{D \to \infty}{=}\; \frac{1}{2D} \sum_{j \neq i} \langle \mathbf{S}_j \rangle + O\!\Big(\frac{1}{D}\Big) . (16.11)
The requirement J^* = \text{const.} finally is necessary, because otherwise the energy den-
sity \langle H \rangle / N would either be zero or infinite, and the resulting model would be trivial.
For the Heisenberg model (16.2) one can even show that D > 3 is already
sufficient to make the mean-field treatment exact, which explains why this approxi-
mation can yield a rather accurate description for magnets in D = 3.
Obviously, the above argument based on the limit D = ∞ is rather general and
can be applied to other models to define a proper mean-field theory. For example,
applied to disorder models, one obtains the coherent potential approximation (CPA),
where the disorder is replaced by a coherent local scattering potential, which has to
be determined self-consistently via the disorder average. A more detailed discussion
of the capabilities and shortcomings of this mean-field theory is given in Chap. 17.
Here, we want to use the limit D → ∞ to construct a mean-field theory for models
like the Hubbard model (16.1).
Guided by the previous section, the most obvious possibility to construct something
like a mean-field theory for the Hubbard model (16.1) is to approximate the two-
particle interaction term as
c^{\dagger}_{i\uparrow} c_{i\uparrow} c^{\dagger}_{i\downarrow} c_{i\downarrow} \;\rightarrow\; c^{\dagger}_{i\uparrow} c_{i\uparrow} \langle c^{\dagger}_{i\downarrow} c_{i\downarrow} \rangle + \langle c^{\dagger}_{i\uparrow} c_{i\uparrow} \rangle\, c^{\dagger}_{i\downarrow} c_{i\downarrow} . (16.12)
which however are of order one. Thus, an argument rendering this approximation
exact in a nontrivial limit is missing here.
As we have observed in Sect. 16.2.1, the proper way to set up a mean-field theory
is to consider the limit D → ∞. Again, this limit has to be introduced such that
the energy density H/N remains finite. As far as the interaction term in (16.1) is
concerned, no problem arises, because it is purely local and thus does not care about
dimensionality. The critical part is obviously the kinetic energy
\frac{1}{N} H_{\mathrm{kin}} = \frac{1}{N} \sum_{\langle i,j \rangle \sigma} t_{ij}\, c^{\dagger}_{i\sigma} c_{j\sigma} . (16.14)
To keep the notation simple, I will concentrate on a simple cubic lattice with nearest-
neighbor hopping
t_{ij} = \begin{cases} -t & \text{for } R_i \text{ and } R_j \text{ nearest neighbors} \\ 0 & \text{otherwise} \end{cases} (16.15)
in the following. Starting at a site Ri , one has to apply Hkin to move an electron
to or from site R_j in the nearest-neighbor shell, i.e. \langle c^{\dagger}_{i\sigma} c_{j\sigma} \rangle \propto t, and consequently

\frac{1}{N} \langle H_{\mathrm{kin}} \rangle = \frac{1}{N} \sum_{\langle i,j \rangle \sigma} t_{ij}\, \langle c^{\dagger}_{i\sigma} c_{j\sigma} \rangle \propto -2Dt^2 , (16.16)

where the factor 2D arises because we have to sum over the 2D nearest neighbors
[12]. Thus, in order to obtain a finite result in the limit D → ∞, the limit has to be per-
formed such that Dt^2 = t^* = \text{const.}, or t = t^*/\sqrt{D} [12].
What are the consequences of this scaling? To find an answer to this question,
let us consider the quantity directly related to c†iσ cjσ , namely the single-particle
Green function [11]
G_{\mathbf{k}\sigma}(z) = \frac{1}{z + \mu - \epsilon_{\mathbf{k}} - \Sigma_{\mathbf{k}\sigma}(z)} , (16.17)
where the kinetic term enters as dispersion ǫk , obtained from the Fourier transform
of tij , and the two-particle interaction leads to the self-energy Σkσ (z), which can,
for example, be obtained from a perturbation series using Feynman diagrams [11].
For the following argument it is useful to discuss the perturbation expansion in real
space, i.e. we study now \Sigma_{ij,\sigma}(z). If we represent the Green function G^{(0)}_{ij,\sigma}(z) for
U = 0 by a (directed) full line and the two-particle interaction U by a dashed line,
the first few terms of the Feynman perturbation series read

\Sigma_{ij,\sigma}(z) = [\text{first-order (tadpole) diagram, purely local}]\,\delta_{ij} + [\text{second-order diagram connecting sites } i \text{ and } j] + \dots \ . (16.18)
The first, purely local term evaluates to U\langle n_{i\bar\sigma} \rangle, i.e. it is precisely the Hartree ap-
proximation which we found not sufficient to reproduce at least the fundamental
expectations. Let us now turn to the second term. To discuss it further, we need the
important property G^{(0)}_{ij,\sigma}(z) \propto t^{d(i,j)}, where d(i,j) is the "taxi-cab metric", i.e.
the smallest number of steps to go from site R_i to site R_j. Inserting the scaling
t = t^*/\sqrt{D}, we find

G^{(0)}_{ij,\sigma}(z) \propto \left( \frac{1}{\sqrt{D}} \right)^{d(i,j)} . (16.19)
When we insert this scaling property into the second-order term in the expansion
(16.18), we obtain for j being a nearest neighbor of i

\Sigma_{ij,\sigma}(z) - U\langle n_{i\bar\sigma} \rangle\, \delta_{ij} = [\text{second-order diagram}] + \dots \propto \left( \frac{1}{\sqrt{D}} \right)^{3} . (16.20)

A closer inspection [12] yields an additional factor D on the right-hand side of the
equation, and we finally arrive at the scaling behavior

\Sigma_{ij,\sigma}(z) - U\langle n_{i\bar\sigma} \rangle\, \delta_{ij} \propto \frac{1}{\sqrt{D}} \;\overset{D \to \infty}{\longrightarrow}\; 0 (16.21)
for the non-local part of the one-particle self-energy. Note that the local contribu-
tions Σii,σ (z) stay finite, i.e.
\lim_{D \to \infty} \Sigma_{ij,\sigma}(z) = \Sigma_{\sigma}(z)\, \delta_{ij} , (16.22)
The fundamental observation underlying the DMFT, namely that one can use the
locality of the self-energy to map the lattice model onto an effective impurity prob-
lem, was first made by Brandt and Mielsch [19]. For the actual derivation of the
DMFT equations for the Hubbard model one can use several different techniques.
I will here present the one based on a comparison of perturbation expansions [20].
A more rigorous derivation can for example be found in the review by Georges et al.
[21]. Let us begin by calculating the local Green function Gii,σ (z), which can be
obtained from Gkσ (z) by summing over all k, i.e.
G_{ii,\sigma}(z) = \frac{1}{N} \sum_{\mathbf{k}} \frac{1}{z + \mu - \epsilon_{\mathbf{k}} - \Sigma_{\sigma}(z)} . (16.24)
Since k appears only in the dispersion, we can rewrite the k-sum as integral over
the density of states (DOS) of the model with U = 0
\rho^{(0)}(\epsilon) = \frac{1}{N} \sum_{\mathbf{k}} \delta(\epsilon - \epsilon_{\mathbf{k}}) (16.25)
as
G_{ii,\sigma}(z) = \int \mathrm{d}\epsilon\, \frac{\rho^{(0)}(\epsilon)}{z + \mu - \epsilon - \Sigma_{\sigma}(z)} = G^{(0)}_{ii}\big(z + \mu - \Sigma_{\sigma}(z)\big) , (16.26)

where

G^{(0)}_{ii}(\zeta) = \int \mathrm{d}\epsilon\, \frac{\rho^{(0)}(\epsilon)}{\zeta - \epsilon} (16.27)
is the local Green function for U = 0. Note that due to the analytic properties of
Σσ (z) the relation sign {Im [z + μ − Σσ (z)]} = sign Im z always holds.
Now we can make use of well-known properties of quantities like G^{(0)}_{ii}(z) which
can be represented as Hilbert transform of a positive semi-definite function like the
DOS \rho^{(0)}(\epsilon) (see for example [11]), namely they can quite generally be written as

G^{(0)}_{ii}(\zeta) = \frac{1}{\zeta - \tilde\Delta(\zeta)} , (16.28)
where \tilde\Delta(\zeta) is completely determined by \rho^{(0)}(\epsilon). If we define \mathcal{G}_{\sigma}(z)^{-1} := z + \mu -
\Delta_{\sigma}(z), where \Delta_{\sigma}(z) := \tilde\Delta(z + \mu - \Sigma_{\sigma}(z)), we can write the Green function for
U > 0 as

G_{ii,\sigma}(z) = \frac{1}{z + \mu - \Delta_{\sigma}(z) - \Sigma_{\sigma}(z)} = \frac{1}{\mathcal{G}_{\sigma}(z)^{-1} - \Sigma_{\sigma}(z)} . (16.29)
Let us now assume that we switch off U at site Ri only. Then Gσ (z) can be viewed
as non-interacting Green function of an impurity model with a perturbation series
for the self-energy. The full line now represents \mathcal{G}_{\sigma}(z), but the dashed line visu-
alizes still the same two-particle interaction as in (16.18). Looking into the lit-
erature, for example into the book by Hewson [22], one realizes that this is pre-
cisely the perturbation expansion for the so-called single impurity Anderson model
(SIAM) [23]
H = \sum_{\mathbf{k}\sigma} \varepsilon_{\mathbf{k}}\, \alpha^{\dagger}_{\mathbf{k}\sigma} \alpha_{\mathbf{k}\sigma} + \varepsilon_f \sum_{\sigma} c^{\dagger}_{\sigma} c_{\sigma} + U c^{\dagger}_{\uparrow} c_{\uparrow} c^{\dagger}_{\downarrow} c_{\downarrow} + \frac{1}{\sqrt{N}} \sum_{\mathbf{k}\sigma} \big( V_{\mathbf{k}}\, \alpha^{\dagger}_{\mathbf{k}\sigma} c_{\sigma} + \mathrm{H.c.} \big) ,
(16.31)
which has been studied extensively in the context of moment formation in solids.
Obviously, the quantity Gσ (z) – or equivalently ∆σ (z) – takes the role of the
Weiss field in the MFT for the Heisenberg model. However, in contrast to the MFT
for the Heisenberg model, where we ended up with an effective Hamiltonian of a
single spin in a static field, we now have an effective local problem which is coupled
to a dynamical field, hence the name DMFT. Instead of Weiss field, Gσ (z) or ∆σ (z)
are called effective medium in the context of the DMFT.
The missing link to complete the mean-field equations is the self-consistency
condition which relates the Weiss field Gσ (z) with the solution of the effective im-
purity problem. This reads
G_{ii,\sigma}(z) = \int \mathrm{d}\epsilon\, \frac{\rho^{(0)}(\epsilon)}{z + \mu - \epsilon - \Sigma_{\sigma}(z)} \;\overset{!}{=}\; G^{\mathrm{SIAM}}_{\sigma}(z) . (16.32)
Thus, Σσ (z) has to be chosen such that the local Green function of the Hubbard
model is identical to the Green function of a fictitious SIAM with non-interacting
Green function
\mathcal{G}_{\sigma}(z) = \frac{1}{G_{ii,\sigma}(z)^{-1} + \Sigma_{\sigma}(z)} . (16.33)
Fig. 16.2. The self-consistency loop for the DMFT for the Hubbard model

The resulting flow-chart for the iterative procedure to solve the Hubbard model
with the DMFT is shown in Fig. 16.2. The only unknown in it is the box at the bot-
tom saying “solve effective quantum impurity problem”. What the notion quantum
impurity stands for and how the SIAM can be solved will be discussed next.
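To make the structure of Fig. 16.2 concrete, here is a minimal Python sketch of the self-consistency cycle for an assumed semicircular U = 0 DOS. The impurity solver is deliberately only a stub (the atomic-limit self-energy at half filling); in any real DMFT calculation this box is replaced by one of the solvers discussed below:

import numpy as np

W, U = 1.0, 1.5                            # bandwidth and local interaction
mu = U/2                                   # particle-hole symmetric point, <n> = 1
z = np.linspace(-2.0, 2.0, 4001) + 0.02j   # omega + i*eta on the real axis

eps = np.linspace(-W/2, W/2, 1501)
rho0 = (8.0/(np.pi*W**2))*np.sqrt((W/2)**2 - eps**2)   # semicircular DOS, integral 1

def lattice_gf(Sigma):
    """Local lattice Green function, Eq. (16.26)."""
    integrand = rho0/(z[:, None] + mu - eps[None, :] - Sigma[:, None])
    return integrand.sum(axis=1)*(eps[1] - eps[0])

def impurity_solver_stub(G0):
    """Stub for the box 'solve effective quantum impurity problem': the
    atomic-limit self-energy at half filling, Sigma(z) = U/2 + U^2/(4z)."""
    return U/2 + U**2/(4*z)

Sigma = np.full(z.shape, U/2, dtype=complex)   # start from the Hartree value
for it in range(25):
    G = lattice_gf(Sigma)                      # lattice step
    G0 = 1.0/(1.0/G + Sigma)                   # Weiss field, Eq. (16.33)
    Sigma_new = impurity_solver_stub(G0)
    if np.max(np.abs(Sigma_new - Sigma)) < 1e-10:
        break
    Sigma = 0.5*Sigma + 0.5*Sigma_new          # linear mixing for stability

dos = -lattice_gf(Sigma).imag/np.pi            # two Hubbard bands near +-U/2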
for finite temperatures was quantum Monte Carlo based on the Hirsch-Fye algo-
rithm [21, 24]. This algorithm and its application to e.g. the Hubbard model has already
been discussed extensively in Chap. 10. For these models, the short-ranged interac-
tion and hopping allow for a substantial reduction of the computational effort and
a rather efficient code. For quantum impurity problems, however, the orthogonality
catastrophe mentioned above leads to long-ranged correlations in imaginary time.
Consequently, when we denote with L the number of time slices in the simulation,
the code scales with L3 (instead of L ln L for lattice models [25]). Thus, although
the algorithm does not show a sign problem for quantum impurity problems, the
computational effort increases very strongly with decreasing temperature and also
increasing local interaction. As a result, the quantum Monte Carlo based on the
Hirsch-Fye algorithm is severely limited in the temperatures and interaction param-
eters accessible. For those interested, a rather extensive discussion of the algorithm
and its application to the DMFT can be found in the reviews by Georges et al. and
Maier et al. [21, 24]. Note that with quantum Monte Carlo one is generically re-
stricted to finite temperature, although within the projector quantum Monte Carlo
the ground state properties can be accessed in some cases, too [26].
A rather clever method to handle quantum-impurity systems comprising such a
huge range of energy scales was invented by Wilson in the early 1970’s [27], namely
the numerical renormalization group (NRG). In this approach, the continuum of
states is mapped onto a discrete set, however with exponentially decreasing energy
scales. This trick allows one to solve models like the SIAM for arbitrary model parame-
ters and temperatures. A detailed account of this method is beyond the scope of this
contribution but can be found in a recent review by Bulla et al. [28]. Here, the inter-
esting aspect is the actual implementation. One introduces a discretization parameter
Λ > 1 and divides the energy axis into intervals [Λ−(n+1) , Λ−n ], n = 0, 1, . . ., for
both positive and negative energies. After some manipulations [22, 28, 29, 30] one
arrives at a representation as a semi-infinite chain

H = H_{\mathrm{imp}} + \sum_{\sigma} \sum_{n=-1}^{\infty} t_n \big( \alpha^{\dagger}_{n\sigma} \alpha_{n+1,\sigma} + \mathrm{H.c.} \big) + \sum_{\sigma} \sum_{n=0}^{\infty} \varepsilon_n\, \alpha^{\dagger}_{n\sigma} \alpha_{n\sigma} , (16.34)
where H_{\mathrm{imp}} is the local part of the quantum impurity Hamiltonian. To keep the
notation short, I represented the impurity degrees of freedom by the operators \alpha^{(\dagger)}_{-1,\sigma}.
The quantities εn and tn have the property, that they behave like εn ∝ Λ−n/2 and
tn ∝ Λ−n/2 for large n. The calculation now proceeds as follows: Starting from
the impurity degrees of freedom (n = −1) with the Hamiltonian H−1 ≡ Himp , one
successively adds site after site of the semi-infinite chain, generating a sequence of
Hamiltonians
H_{N+1} = \sqrt{\Lambda}\, H_N + \Lambda^{(N+1)/2} \sum_{\sigma} \varepsilon_{N+1}\, \alpha^{\dagger}_{N+1,\sigma} \alpha_{N+1,\sigma} + \Lambda^{(N+1)/2} \sum_{\sigma} t_{N+1} \big( \alpha^{\dagger}_{N\sigma} \alpha_{N+1,\sigma} + \mathrm{H.c.} \big) . (16.35)

The factors \sqrt{\Lambda} in the mapping ensure that at each step N the lowest energy eigen-
values are always of order one. Since for the chain parameters \Lambda^{(N+1)/2}\, t_{N+1} \to 1
holds, the high energy states of the Hamiltonian at step N will not significantly con-
tribute to the low-energy states at step N + 1 and one discards them. This truncation
restricts the size of the Hilbert space at each step sufficiently that the usual expo-
nential growth is suppressed and one can actually repeat the procedure up to almost
arbitrarily large chains.
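The announced fall-off of the chain parameters is easy to verify: discretize a featureless bath logarithmically and map it onto the chain by Lanczos tridiagonalization. A sketch (assuming a flat hybridization on [−1, 1] and a simple midpoint discretization, which differs in detail from Wilson's prescription):

import numpy as np

Lambda, M = 2.0, 25
xi, gamma = [], []
for n in range(M):                       # intervals [Lambda^-(n+1), Lambda^-n]
    lo, hi = Lambda**-(n+1), Lambda**-n
    for s in (+1.0, -1.0):
        xi.append(s*(lo + hi)/2)         # representative energy of each interval
        gamma.append(np.sqrt(hi - lo))   # coupling ~ sqrt of the interval weight
xi, gamma = np.array(xi), np.array(gamma)

# Lanczos tridiagonalization of diag(xi), seeded with the bath state coupling
# to the impurity; the recursion coefficients are the chain parameters
v, vp, beta = gamma/np.linalg.norm(gamma), np.zeros_like(gamma), 0.0
for n in range(12):
    w = xi*v
    eps_n = v @ w
    w = w - eps_n*v - beta*vp
    beta = np.linalg.norm(w)
    vp, v = v, w/beta
    print(f"n = {n:2d}   eps_n = {eps_n:+.1e}   t_n*Lambda^(n/2) = {beta*Lambda**(n/2):.4f}")

The printed products t_n Λ^{n/2} quickly approach a constant, i.e. t_n ∝ Λ^{−n/2} as stated above, while the on-site energies ε_n vanish for this particle-hole symmetric bath.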
At each step N , one then has to diagonalize HN , generating all eigenvalues and
eigenvectors. The eigenvectors are needed to calculate matrix elements for the next
step by a unitary transformation of the matrix elements from the previous step. Since
this involves two matrix multiplications, the numerical effort (together with the di-
agonalization) scales with the third power of the dimension of the Hilbert space.
Invoking symmetries of the system, like e.g. charge and spin conservation, one can
reduce the Hamilton matrix at each step to a block structure. This block structure on
the one hand allows for an efficient parallelization and use of SMP machines (for
example with OpenMP). On the other hand, the size of the individual blocks is much
smaller than the actual size of the Hilbert space. For example, with 1000 states kept
in the truncation one has a dimension of the order of 200 for the largest subblock.
The use of the block structure thus considerably reduces the computational effort
necessary at each step.
Moreover, one can identify each chain length N with a temperature or energy
scale Λ−N/2 and can thus approach arbitrarily low temperatures and energies. With
presently available workstations the computational effort of solving the effective
impurity model for DMFT calculations at T = 0 then reduces to a few minutes
using on the order of 10 . . . 100 MB of memory.
Unfortunately, an extension of Wilson’s NRG to more complex quantum im-
purity models including e.g. orbital degrees of freedom or multi-impurity systems
(needed for example for the solution of cluster mean-field theories, see Sect. 16.3)
is not possible beyond four impurity degrees of freedom (where the consumption of
computer resources increases to the order of days of computation time with ∼ 20−30 GB
memory usage), because the step “construct Hamilton matrix of step N + 1 from
Hamilton matrix of step N ” increases the size again exponentially with respect to
the number of impurity degrees of freedom. For a compensation, one has to increase
the number of truncated states in each step appropriately. However, this procedure
breaks down when one starts to truncate states that contribute significantly to the
low-energy properties of the Hamiltonian at step N + 1. In this situation, one is left
with quantum Monte Carlo algorithms as the only possible solver at T > 0. At T = 0,
there exists presently no reliable tool to solve quantum impurity models with
substantially more than two impurity degrees of freedom (spin degeneracy). First
attempts to use the density matrix renormalization group method to solve quantum
impurity problems can for example be found in [31, 32, 33].
In the following sections selected results for the Hubbard model within the DMFT
will be discussed. I restrict the presentation to the case of a particle-hole symmetric
Fig. 16.3. Generic DMFT result for the DOS of the Hubbard model at T = 0. Model parameters are U/W = 1.5 and \langle n \rangle = 0.97. W denotes the bandwidth of the dispersion \epsilon_{\mathbf{k}}. The inset shows the corresponding self-energy \Sigma_{\sigma}(\omega + i0^+) in the region about the Fermi energy. One nicely sees the parabolic maximum in the imaginary part and the linear real part as \omega \to 0
are features characteristic for a Fermi liquid. The slope of the real part determines
the quasiparticle renormalization factor or effective mass of the quasiparticles.
One particular feature we expect for the Hubbard model is the occurrence of a metal-
insulator transition (MIT) in the half-filled case \langle n \rangle = 1. As already mentioned,
this particular property can serve as a test for the quality of the approximation used
to study the model. That the expected MIT indeed appears in the DMFT has first
been noticed by Jarrell [20] and was subsequently studied in great detail [21]. The
MIT shows up in the DOS as vanishing of the quasiparticle peak with increasing
U . An example for this behavior can be seen in Fig. 16.4. The full curve is the
result of a calculation with a value of U < W , the dashed obtained with U >
1.5 W ≈ Uc . For the latter, the quasiparticle peak at ω = 0 has vanished, i.e. we
have N (ω = 0) = 0. Since the DOS at the Fermi level determines all properties
of a Fermi system, in particular the transport, we can conclude from this result that
for U > Uc the conductivity will be zero, hence the system is an insulator. One
can now perform a series of calculations for different values of U and temperatures
T to obtain the phase diagram for this MIT (see e.g. [36] and references therein).
The result is shown in Fig. 16.5. As an unexpected feature of this MIT one finds
that there exists a hysteresis region, i.e. starting from a metal and increasing U
leads to a critical value Uc,2 different from the one obtained when starting from the
insulator at large U and decreasing U. The coexistence region terminates in a second-order critical end point, which
has the properties of the liquid-gas transition [37, 38]. At T = 0, the transition is
also second order and characterized by a continuously vanishing Drude weight in
the optical conductivity [21], or equivalently a continuously vanishing quasiparticle
renormalization factor [39].
[Fig. 16.4: DOS of the half-filled Hubbard model for U < Uc and U > Uc; for U > Uc the quasiparticle peak at ω = 0 has vanished]

Fig. 16.5. Paramagnetic phase diagram for the Hubbard model at half filling. The transition between metal and insulator shows a hysteresis denoted by the two critical lines. The inset shows the behavior of the DOS as U increases. Figure taken from [36]
Interestingly, the actual critical line falls almost onto the upper transition [40]. Finally, for temperatures larger than the upper critical point the
MIT turns into a crossover.
Up to now we have discussed the paramagnetic phase of the Hubbard model. What
about the magnetic properties? Does the DMFT in particular cure the failure of the
Hartree approximation, where TN became constant when U → ∞?
Investigations of magnetic properties can be done in two ways. First, one can
calculate the static magnetic susceptibility and search for its divergence. This will
give besides the transition temperature also the proper wave vector of the magnetic
order [21, 41]. For the NRG another method is better suited and yields furthermore
also information about the single-particle properties and hence transport proper-
ties in the antiferromagnetic phase [21, 34]: One starts the calculation with a small
symmetry breaking magnetic field, which will be switched off after the first DMFT
iteration. As result, the system will converge either to a paramagnetic state or a
state with finite polarization. The apparent disadvantage is that only certain com-
mensurate magnetic structures can be studied, such as the ferromagnet or the Néel
antiferromagnet.
For half filling, the result of such a calculation for the Néel structure at T = 0 is
shown in Fig. 16.6. Quite generally, we expect a stable antiferromagnetic phase at
arbitrarily small values of U with an exponentially small Néel temperature [11, 16].
Indeed we find that the Néel antiferromagnet is the stable solution for all values of
U at T = 0 [42].
Fig. 16.6. DOS for spin up and spin down in the antiferromagnetic phase at half filling and T = 0. The inset shows the magnetization as a function of U
The next question concerns the magnetic phase diagram, in particular the de-
pendence of the Néel temperature TN on U . To this end one has to perform a rather
large number of DMFT calculations systematically varying T and U . The result of
such a survey are the circles denoting the DMFT values for TN (U ) in the phase di-
agram in Fig. 16.7. The dotted line is a fit that for small U behaves ∝ exp (−α/U ),
predicted by weak-coupling theory, while for large U a decay like 1/U is reached.
Thus, the DMFT indeed reproduces the correct U dependence in both limits U → 0
and U → ∞.
Fig. 16.7. Phase diagram for the Néel state at half filling. In addition the paramagnetic MIT phase lines are included
Fig. 16.8. Magnetic phase diagram of the Hubbard model for T = 0. The vertical axis has been rescaled as U/(W + U) to encompass the whole interval [0, ∞). The abbreviations mean paramagnetic metal (PM), antiferromagnet (AFM), phase separation (PS) and ferromagnet (FM). δ = 1 − \langle n \rangle denotes the doping
complex physical properties of the Hubbard model (16.1). Moreover, the values for
transition temperatures obtained are strongly reduced as compared to a Stoner or
Hartree approximation [11], thus illuminating the importance of local dynamical
fluctuations due to the two-particle interaction respected by the DMFT.
The finding, that the DMFT for the Hubbard model, besides properly reproduc-
ing all expected features at least qualitatively, also leads to a variety of non-trivial
novel aspects of the physics of this comparatively simple model [21, 35], rather
early triggered the expectation that this theory can also be a reasonable ansatz to
study real 3D materials. This idea was further supported by several experimental
results on transition metal compounds suggesting that the metallic state can be de-
scribed as a Fermi liquid with effective masses larger than the ones predicted by
bandstructure theory [9]. Moreover, with increasing resolution of photoemission
experiments, structures could be resolved that very much looked like the ubiqui-
tous lower Hubbard band and quasiparticle peak found in DMFT, for example in
the series (Sr,Ca)VO3 [44, 45, 46, 47]. It was thus quite reasonable, to try to de-
scribe such materials within a Hubbard model [48, 49]. However, the explanation
of the experiments required an unphysical variation of the value of U across the
series.
The explanation for the failure lies in the orbital degrees of freedom, which are neglected in the Hubbard model (16.1) but definitely present in transition metal ions. Thus, the development of quantum impurity solvers for models including orbital degrees of freedom started [50, 51, 52]. At the same time it became clear that the number of adjustable parameters in a multi-orbital Hubbard model increases dramatically with the degrees of freedom. In view of the restricted set of experiments that one can describe within the DMFT, the idea of material-specific calculations with this method alone actually appears rather ridiculous.
The idea which solved that problem was to use the density functional theory (DFT) [53, 54] to generate the dispersion relation ǫ_k^{mm′} entering the multi-orbital
Hubbard model [55, 56]. Moreover, within the so-called constrained DFT [57] even
a calculation of Coulomb parameters is possible. Thus equipped, a material-specific
many-body theory for transition metal oxides and even lanthanides became possi-
ble, nowadays called LDA+DMFT [58, 59, 60, 61]. The scheme basically works as
follows [58, 61]:
– For a given material, calculate the band structure using DFT with local density
approximation [54].
– Identify the states where local correlations are important and downfold the band-
structure to these states to obtain a Hamilton matrix H(k) describing the dis-
persion of these states. If necessary, include other states overlapping with the
correlated orbitals (for example oxygen 2p for transition metal oxides).
– From a constrained DFT calculation, obtain the Coulomb parameters for the
correlated orbitals.
– Perform a DMFT calculation using the expression
G_{ii,\sigma}(z) = \frac{1}{N} \sum_{k} \frac{1}{z + \mu - H(k) - \Sigma_\sigma(z)} \qquad (16.36)
for the local Green function, which can now be a matrix in the orbital indices taken into account; note that the self-energy can be a matrix, too. (A schematic evaluation of this k-sum is sketched after this list.)
– If desired, use the result of the DMFT to modify the potential entering the DFT
and repeat from the first step until self-consistency is achieved [56].
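To make the k-sum in (16.36) concrete, the following minimal Python sketch evaluates the local Green function for a matrix-valued Hamilton matrix H(k) and self-energy. The function name, the array layout of H_k and the use of NumPy are illustrative assumptions on our part, not part of the LDA+DMFT scheme itself.

import numpy as np

def local_green_function(z, mu, H_k, sigma):
    """Evaluate Eq. (16.36): G_ii(z) = (1/N) sum_k [z + mu - H(k) - Sigma(z)]^(-1).

    H_k:   complex array of shape (N, M, M), the downfolded Hamilton matrix
           at N k-points for M correlated orbitals (hypothetical layout).
    sigma: complex (M, M) self-energy matrix at the complex energy z.
    Returns the (M, M) local Green function matrix."""
    N, M, _ = H_k.shape
    one = np.eye(M)
    G = np.zeros((M, M), dtype=complex)
    for H in H_k:
        # invert the full matrix in the orbital indices at each k-point
        G += np.linalg.inv((z + mu) * one - H - sigma)
    return G / N

Both H(k) and Σ are kept as full matrices in the orbital indices, as emphasized in the text.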
As an example of the results obtained in such a parameter-free calculation, I present in Fig. 16.9 the DOS for (Sr,Ca)VO3 obtained with the LDA+DMFT scheme, compared to photoemission experiments [62]. Apparently, both the position of the structures and their weight are obtained with rather good accuracy. From these calculations one can now infer that the structures seen are indeed the lower Hubbard band originating from the 3d levels, here situated at about −2 eV, and a quasiparticle peak describing the coherent states in the system.
This example shows that the DMFT is indeed a rather powerful tool to study 3D materials where local correlations dominate the physical properties. There is, however, no simple rule of thumb telling us when this approach is indeed applicable and when correlations beyond the DMFT may become important. Even in seemingly rather simple systems non-local correlations can be important and modify the dominant effects of the local interactions in a subtle way [63].
Fig. 16.9. DOS for (Sr,Ca)VO3 obtained from a parameter-free LDA+DMFT calculation (full lines) compared to results from photoemission experiments (symbols). Taken from [62]
The DMFT has turned out to be a rather successful theory to describe properties of
strongly correlated electron systems in three dimensions sufficiently far away from
e.g. magnetic phase transitions. Its strength lies in the fact that it correctly includes
the local dynamics induced by the local two-particle interactions. It is, on the other
hand, well-known that in one or two dimensions or in the vicinity of a transition to a
state with long-range order the physics is rather dominated by non-local dynamics,
e.g. spin waves for materials showing magnetic order. Such features are of course
beyond the scope of the DMFT.
As a particular example let us take a look at the qualitative properties of the
Hubbard model in D = 2 on a square lattice at and close to half filling. As we
already know, the model has strong antiferromagnetic correlations for intermediate
and strong U , leading to a phase transition to a Néel state at finite TN in D = 3.
However, in D = 2 the theorem by Mermin and Wagner [64] inhibits a true phase transition at finite T; only the ground state may show long-range order. Nevertheless, the non-local spin correlations exist and can become strong at low temperature
[6]. In particular, a snapshot of the system will increasingly look like the Néel state,
at least in a certain vicinity of a given lattice site.
Such a short-range order in both time and space can have profound effects for
example on the photoemission spectrum. In a true Néel ordered state the broken
translational symmetry leads to a reduced Brillouin zone and hence to a folding back
of the bandstructure, as depicted in Fig. 16.10(a). At the boundary of this so-called
magnetic Brillouin zone, a crossing of the dispersions occurs, which will be split by
interactions and leads to the gap in the DOS and the insulating behavior of the Néel
antiferromagnet. When we suppress the long-range order but still allow for short-
range correlations, the behavior in Fig. 16.10(b) may occur. There is no true broken
translational symmetry, hence the actual dispersion will not change. However, the
[Figure: panels (a) and (b) showing the dispersion E_k between −W/2 and W/2 as a function of k/π]
Fig. 16.10. Sketch of the effect of long-range Néel order (a) vs. strong short-ranged correla-
tions (b) on the single-particle properties of the Hubbard model in D = 2
system “feels” the ordered state on certain time and length scales, which leads to
broadened structures at the position of the back-folded bands (shadow bands) in the
spectral function [65, 66]. Furthermore, the tendency to form a gap at the crossing
points at the boundary of the magnetic Brillouin zone can lead to a suppression of
spectral weight at these points (pseudo-gaps) [65].
The paradigm for such behavior is surely provided by the high-TC superconductors, but other low-dimensional materials show similar features, too.
Let us begin by stating the minimum requirements that a theory extending the DMFT to include non-local correlations should fulfill: It should
– work in the thermodynamic limit,
– treat the local dynamics exactly,
– include short-ranged dynamical fluctuations in a systematic and possibly non-perturbative way,
– be complementary to finite-system calculations,
– and of course remain computationally manageable.
It is of course tempting to try and start from the DMFT as an approximation that already properly includes the local dynamics and add the non-local physics somehow. Since the DMFT becomes exact in the limit D → ∞, an expansion in powers of 1/D may seem appropriate [67]. However, while such approaches work well for wave functions, their extension to the DMFT suffers from so-called self-avoiding random walk problems, and no proper resummation has been successful yet.
A more pragmatic approach tries to add the non-local fluctuations by hand
[68, 69], but here the problem of possible overcounting of processes arises. More-
over, the type of fluctuations included is strongly biased and the way one includes
them relies on convergence of the perturbation series.
In yet another approach one extends the DMFT by including two-particle fluctuations locally [70]. In this way one can indeed observe effects like pseudo-gap formation in the large-U Hubbard model [71], but cannot obtain any k-dependence in the spectral function, because the renormalizations are still purely local.
The most successful compromise that fulfills all of the previously stated require-
ments is based on the concept of clusters. There, the basic idea is to replace the
impurity of the DMFT by a small cluster embedded in a medium representing the
remaining infinite lattice. In this way, one tries to combine the advantages of finite-
system calculations, i.e. the proper treatment of local and at least short-ranged corre-
lations, with the properties of the DMFT, viz. the introduction of the thermodynamic
limit via the Weiss field. The schematic representation of this approach is shown
in Fig. 16.11. This idea is not new, but has been tried in the context of disordered
systems before [72], and also in various ways for correlated electron models [24].
A rather successful implementation is the cluster perturbation theory, discussed in
Chap. 19. A recent review discussing these previous attempts and their problems is
given in [24].
[Figure: schematic representation of the cluster mean-field construction, and coarse-graining of the first Brillouin zone of the square lattice into patches of linear size ΔK around the cluster momenta K = (0, 0), (π, 0), (0, π), (π, π), with lattice momenta decomposed as k = K + k̃]
Coarse-graining the lattice Green function over the patch around each cluster momentum K yields
\bar{G}_{K\sigma}(z) = \frac{N_c}{N} \sum_{\tilde{k}} \frac{1}{z + \mu - \Sigma_{K\sigma}(z) - \epsilon_{K+\tilde{k}}} \qquad (16.37)
which we will call the effective cluster Green function. Obviously, the quantity \bar{G}_\sigma(K, z)
describes an effective periodic cluster model. The procedure now follows precisely
the ideas of the DMFT. Switching off the interaction in the effective cluster leads to
an effective non-interacting system described by a Green function
\mathcal{G}_\sigma(K, z) = \frac{1}{\bar{G}_\sigma(K, z)^{-1} + \Sigma_\sigma(K, z)} \qquad (16.38)
and a self-consistency loop depicted in Fig. 16.13.
As in the DMFT, the problematic step is the last box, i.e. the solution of the
effective quantum cluster problem. Note that although we started the construction
from a cluster, the presence of the energy-dependent medium \mathcal{G}_\sigma(K, z) renders
this problem again a very complicated many-body problem, just like the effective
quantum impurity problem in the DMFT. However, the situation here is even worse,
because the dynamical degrees of freedom represented by this medium mean that even for clusters as small as Nc = 4, the effective system to solve has infinitely
many degrees of freedom. For example the NRG, which is so successful for the Hubbard model in the DMFT, will suffer from a huge increase of the Hilbert space (by a factor 4^{N_c}) in each step, which makes the method useless. Up to now the only reasonable
technique is quantum Monte Carlo (QMC), and most of the results presented in the
next section will be based on QMC simulations.
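As an illustration of this self-consistency loop, here is a schematic Python sketch of the DCA cycle built from (16.37) and (16.38), for a single complex energy z. The patch dispersions eps_patch, the mixing parameter, and the solve_cluster routine (standing in for the QMC cluster solver, the hard many-body step) are hypothetical placeholders, not the actual implementation used in the literature.

import numpy as np

def dca_cycle(z, mu, eps_patch, solve_cluster, n_iter=50, mix=0.5):
    """Schematic DCA self-consistency at one complex energy z (Im z > 0).

    eps_patch: list of arrays; eps_patch[K] holds the dispersion
               eps_{K+ktilde} on the patch around cluster momentum K.
    solve_cluster: callable returning the new self-energies Sigma(K, z)
               from the Weiss fields (placeholder for the cluster solver)."""
    n_K = len(eps_patch)
    sigma = np.zeros(n_K, dtype=complex)          # initial self-energy
    for _ in range(n_iter):
        # (16.37): coarse-grained (effective cluster) Green function
        G_bar = np.array([np.mean(1.0 / (z + mu - sigma[K] - eps_patch[K]))
                          for K in range(n_K)])
        # (16.38): Weiss field of the effective non-interacting cluster
        G_weiss = 1.0 / (1.0 / G_bar + sigma)
        # solve the embedded cluster problem; mix old and new for stability
        sigma = mix * solve_cluster(G_weiss) + (1.0 - mix) * sigma
    return sigma, G_bar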
Before we move to the presentation of some results for the Hubbard model, let
me make some general comments on the method. First, while the concept of a cluster
MFT seems to be a natural extension of the DMFT, it lacks a similar justification by
an exact limit. The best one can do is to view the cluster MFT as an interpolation scheme between the DMFT and the real lattice, systematically including longer-ranged cor-
relations. Moreover, the use of a finite cluster introduces the problem of boundary
conditions (BC). In a real space implementation [73] one has to use open BC and
thus has broken translational invariance. As a consequence, k is not a good quan-
tum number any more and one has to work out averaging procedures to recover the
desired diagonality in k-space. The DCA implements periodic BC, but introduces
patches in the Brillouin zone where Σσ(K, z) is constant. As a result, one obtains a histogram of self-energy values and must use a fitting procedure to recover a smooth function Σσ(k, z), if desired.
Another potential problem can be causality [72]. In early attempts to set up
cluster approaches, one typically ran into the problem that spectral functions could
become negative. It has been shown, however, that the different implementations of
the cluster MFT are manifestly causal [24].
Last but not least, one may wonder how one can implement non-local two-particle interactions in this scheme, for example a nearest-neighbor Coulomb interaction or the exchange interaction in models like the t-J model. In the DMFT, these interactions reduce to their mean-field description [74]. For cluster mean-field theories, they should in fact be treated similarly to the single-particle hopping. One is then faced with the requirement not only to solve for dynamic single-particle properties in the presence of the effective bath, but also to set up a similar scheme for the two-particle quantities of the effective cluster [24]. In this respect the cluster MFTs acquire a structure similar to the so-called EDMFT proposed by Q. Si et al. [70].
In the following I present some selected results obtained with the DCA for the
Hubbard model in D = 2 on a square lattice. Unless mentioned otherwise, we will
again use the nearest-neighbor hopping (16.15). A much wider overview can be
found in the review [24].
The first obvious question to ask is how the cluster MFT will modify the single-
particle properties of the Hubbard model. As mentioned, the Mermin-Wagner theo-
rem states that no long-range magnetic order can occur, but from the discussion in
the beginning of this chapter we expect at least the presence of strong non-local spin
fluctuations which should lead to precursors of the ordering at T = 0 in the physical
quantities. In Fig. 16.14 the results of calculations for half filling and U = W/2 with
Fig. 16.14. DOS for the 2D Hubbard model at half filling and U = W/2 for different temperatures using the DMFT (middle panel) and the DCA with Nc = 4 (right panel). The left panel shows the bare DOS at U = 0 for comparison. Figure taken from [75]
the DMFT (middle panel) and the DCA with a cluster size of Nc = 4 (right panel)
for different temperatures are shown. For comparison the bare DOS is included in
the left panel. In the DMFT, one obtains a phase transition into the Néel state at
some TN > 0. For T > TN , the DOS shows the ubiquitous three-peak structure,
while for T < TN a gap appears in the DOS. No precursor of the transition can be
seen. The DCA, on the other hand, already shows a pronounced pseudo-gap even
at elevated temperatures, which becomes deeper with decreasing temperatures. This
reflects the expected presence of spin fluctuations. Since the DCA still represents a
MFT, a phase transition will eventually occur here, too. However, the correspond-
ing transition temperature is reduced from its DMFT value and the DOS seemingly
varies smoothly from T > TN to T < TN here.
The influence of spin fluctuations close to half filling can also be seen in the spectral functions A(k, ω) = −Im G(k, ω + i0⁺)/π, which are plotted along high-symmetry directions of the first Brillouin zone of the square lattice (see Fig. 16.16) in Fig. 16.15. The calculations were done with Nc = 16 at a temperature T = W/30 and U = W, using a Hirsch-Fye QMC algorithm and maximum entropy to obtain the real-frequency spectra from the QMC imaginary-time data [24, 77].
In the calculation an additional next-nearest-neighbor hopping t′ = 0.05 W was included. For ⟨n⟩ = 0.8 (left panel of Fig. 16.15) nice quasiparticles along the non-interacting Fermi surface (base-line in the spectra) can be seen, and the imaginary part of the self-energy (plot in the left corner of the panel) has a nice parabolic extremum at ω = 0 and is basically k-independent. Thus, in this parameter regime the DMFT can be a reliable approximation, at least as far as single-particle properties in the paramagnetic phase are concerned. For ⟨n⟩ = 0.95 (right panel in Fig. 16.15), on the other hand, quasiparticles are found along the diagonal of the Brillouin zone (cold spot), while the structures are strongly overdamped in the region k ≈ (0, π) (hot spot). The notion hot spot comes from the observation that in this region the
Fermi surface can be connected with the reciprocal lattice vector Q describing the antiferromagnetic ordering (see Fig. 16.16) (nesting). Obviously, these k-points will be particularly susceptible to spin fluctuations and acquire additional damping due to the coupling to those modes.
Fig. 16.15. Spectral functions along high-symmetry directions of the first Brillouin zone of the square lattice (see Fig. 16.16) obtained from a DCA calculation with Nc = 16 for different fillings ⟨n⟩ = 0.8 (left panel) and ⟨n⟩ = 0.95 (right panel). The figures in the left corners show the imaginary part of the self-energy at special k-points indicated by the arrows in the spectra. The model parameters are U = W and T = W/30. Figure taken from [76]
Finally, one may wonder what the DCA can do for 3D systems. As an example, I show results of a calculation of the Néel temperature for the 3D Hubbard model at half filling in Fig. 16.17. The figure includes several curves: The one labelled "Weiss" is obtained from a Weiss mean-field treatment of the antiferromagnetic Heisenberg model with an exchange coupling J ∼ t²/U according to (16.3). The one called "Heisenberg" represents a full calculation for the 3D Heisenberg model with this exchange coupling, "SOPT" denotes a second-order perturbation theory calculation for the Hubbard model, "Staudt" denotes recent QMC results [79], and finally "DMFT" and "DCA" are the values for TN obtained from the DMFT and DCA, respectively.
Obviously, the DCA results in a substantial reduction of TN as compared to the
DMFT, leading to the correct values for all U . As expected, the DMFT overestimates
TN as usual for a mean-field theory, but, as we already know, is otherwise consistent
with the anticipated behavior at both small and large U on the mean-field level.
[Fig. 16.16: sketch of the first Brillouin zone of the square lattice with the high-symmetry points Γ and X and the antiferromagnetic nesting vector Q]
Fig. 16.17. Néel temperature as function of U for the 3D Hubbard model at half filling. For
the different curves see text. Taken from [78]
Note that for the QMC results and the DCA a finite-size scaling has been performed, where for the DCA lattices up to 32 sites were included, i.e. substantially smaller than in [79].
16.4 Conclusions
Starting from the Weiss mean-field theory for the Heisenberg model, we have devel-
oped a proper mean-field theory for correlated fermionic lattice models with local
interactions. In contrast to the mean-field theory for the Heisenberg model, the fun-
damental quantity in this so-called dynamical mean-field theory is the single-particle
Green function, and the effective local problem turned out to be a quantum-impurity
model. Quantum impurity models are notoriously hard to solve, even with advanced
computational techniques. As a special example, we discussed the numerical renor-
malization group approach in some detail.
As we have seen, the dynamical mean-field theory allows one to calculate a variety of static and dynamic properties for correlated lattice models like the Hubbard
model and its relatives. In contrast to the Hartree-Fock treatment, dynamical renor-
malizations lead to non-trivial phenomena like a Fermi liquid with strongly en-
hanced Fermi liquid parameters, a paramagnetic metal-insulator transition and mag-
netic properties that correctly describe the crossover from weak-coupling Slater an-
tiferromagnetism to Heisenberg physics and Nagaoka ferromagnetism as U → ∞.
In combination with density functional theory, which allows one to determine model parameters for a given material ab initio, a particularly interesting novel approach to
a parameter-free and material-specific calculation of properties of correlated materi-
als arises. Several applications have demonstrated the power of this method, which
can even lead to a quantitative agreement between theory and experiment.
Thus, is the DMFT an all-in-one tool, suitable for every purpose? Definitely not.
We also learned that we have to pay a price for the gain: The DMFT completely
neglects non-local fluctuations. This means, for example, that it does not care about the dimensionality of the system and will in particular lead to phase transitions even in nominally one-dimensional problems. Furthermore, even in three dimensions one cannot realize ordered states with non-local order parameters – e.g. d-wave superconductivity. Thus, for low-dimensional systems or in the vicinity of phase transitions, the DMFT surely is not a good approach.
These deficiencies can be cured at least in part by extending the notion of local to
also include clusters in addition to single lattice sites. One then arrives at extensions
of the DMFT like the cluster dynamical mean-field theory or the dynamical cluster
approximation. These theories allow one to incorporate at least spatially short-ranged fluctuations into the calculations. We have learned that these extensions indeed lead
to new phenomena, like formation of pseudo-gaps in the one-particle spectra and the
appearance of new ordered phases with non-local order parameters. Cluster theories
also lead to further renormalizations of transition temperatures or, with large enough
clusters, lead to a suppression of phase transitions in low-dimensional systems, in
accordance with e.g. the Mermin-Wagner theorem.
Again one has to pay a price for this gain, namely a tremendously increased
computational effort. For this reason, calculations are up to now possible only for
comparatively high temperatures and only moderate values for the interaction pa-
rameters. For the same reason, while the DMFT can also be applied to realistic
materials with additional orbital degrees of freedom, cluster mean-field extensions
are presently restricted to single-orbital models. Also, questions concerning critical
properties of phase transitions are out of reach.
Another phenomenon frequently occurring in correlated electron systems, which can be handled by neither theory, is the quantum phase transition. This class of phenomena typically involves long-ranged two-particle fluctuations and very low temperatures, which are of course beyond the scope of any computational resource presently available.
The roadmap for further developments and investigations is thus obvious. We
need more efficient algorithms to calculate dynamical properties of complex quan-
tum impurity systems, preferably at low temperatures or even T = 0. First steps into
this direction have already been taken through the development of new Monte Carlo
algorithms [80, 81] which show much better performance than the conventional
Hirsch-Fye algorithm and are also sign-problem free [82].
With more efficient algorithms also new possibilities for studies of properties of
correlated electron systems arise: Studies of f -electron systems (heavy Fermions)
with DFT+DMFT or even DFT+cluster mean-field theories; low-temperature prop-
erties of one- or two-dimensional correlated electron systems with large interac-
tion parameter; critical properties and properties in the vicinity of quantum phase
transitions.
This collection of examples shows that, although the DMFT and its cluster extensions are already well established, the list of possible applications and improvements is large, and entering the field is by no means without possible reward.
References
1. J. Hubbard, Proc. Roy. Soc. London A 276, 238 (1963) 473
2. J. Kanamori, Prog. Theor. Phys. 30, 275 (1963) 473
3. M.C. Gutzwiller, Phys. Rev. Lett. 10, 159 (1963) 473
4. P. Fulde, Electron Correlations in Molecules and Solids. Springer Series in Solid-State
Sciences (Springer Verlag, Berlin/Heidelberg/New York, 1991) 474
5. Y. Nagaoka, Phys. Rev. 147, 392 (1966) 474, 489
6. E. Dagotto, Rev. Mod. Phys. 66, 763 (1994) 474, 492
7. N. Grewe, H. Keiter, Phys. Rev. B 24, 4420 (1981) 474
8. N.E. Bickers, Rev. Mod. Phys. 59, 845 (1987) 474
9. M. Imada, A. Fujimori, Y. Tokura, Rev. Mod. Phys. 70, 1039 (1998) 475, 490
10. C. Itzykson, J.M. Drouffe, Statistical Field Theory Vol. I & II (Cambridge University
Press, Cambridge, 1989) 476, 477
11. J. Negele, H. Orland, Quantum Many-Particle Systems (Addison-Wesley, 1988) 477, 478, 479, 480, 487
12. W. Metzner, D. Vollhardt, Phys. Rev. Lett. 62, 324 (1989) 478, 479
13. H. Schweitzer, G. Czycholl, Z. Phys. B 77, 327 (1990) 480
14. H. Schweitzer, G. Czycholl, Phys. Rev. Lett. 67, 3724 (1991) 480
15. B. Menge, E. Müller-Hartmann, Z. Phys. B 82, 237 (1991) 480
16. P.G.J. van Dongen, Phys. Rev. Lett. 67, 757 (1991) 480, 487
17. P.G.J. van Dongen, Phys. Rev. B 50, 14016 (1994) 480
18. P.G.J. van Dongen, Phys. Rev. B 54, 1584 (1996) 480, 489
19. U. Brandt, C. Mielsch, Z. Phys. B 82, 37 (1991) 480
20. M. Jarrell, Phys. Rev. Lett. 69, 168 (1992) 480, 486
21. A. Georges, G. Kotliar, W. Krauth, M.J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996) 480, 483, 485, 486
22. A.C. Hewson, The Kondo Problem to Heavy Fermions. Cambridge Studies in Magnetism
(Cambridge University Press, Cambridge, 1993) 481, 482, 483
23. P.W. Anderson, Phys. Rev. 124, 41 (1961) 481
24. T.A. Maier, M. Jarrell, T. Pruschke, M. Hettler, Rev. Mod. Phys. 77, 1027 (2005) 483, 485, 493, 494
25. R. Blankenbecler, D.J. Scalapino, R.L. Sugar, Phys. Rev. D 24, 2278 (1981) 483
26. M. Feldbacher, K. Held, F. Asaad, Phys. Rev. Lett. 93, 136405 (2004) 483
27. K.G. Wilson, Rev. Mod. Phys. 47, 773 (1975) 483
28. R. Bulla, T. Costi, T. Pruschke, preprint (2007). URL http://arxiv.org/abs/cond-mat/0701105 483
29. H.R. Krishnamurthy, J.W. Wilkins, K.G. Wilson, Phys. Rev. B 21, 1003 (1980) 483
30. H.R. Krishnamurthy, J.W. Wilkins, K.G. Wilson, Phys. Rev. B 21, 1044 (1980) 483
31. S. Nishimoto, E. Jeckelmann, J. Phys.: Condens. Matter 16, 613 (2004) 484
32. S. Nishimoto, T. Pruschke, R.M. Noack, J. Phys.: Condens. Matter 18, 981 (2006) 484
33. C. Raas, G.S. Uhrig, F.B. Anders, Phys. Rev. B 69, R041102 (2004) 484
34. T. Pruschke, Prog. Theor. Phys. Suppl. 160, 274 (2005) 485, 487, 489
35. T. Pruschke, M. Jarrell, J.K. Freericks, Adv. in Phys. 44, 187 (1995) 485, 490
36. R. Bulla, T.A. Costi, D. Vollhardt, Phys. Rev. B 64, 045103 (2001) 486, 487
37. G. Moeller, Q. Si, G. Kotliar, M. Rozenberg, D.S. Fisher, Phys. Rev. Lett. 74, 2082
(1995) 486
38. G. Kotliar, E. Lange, M.J. Rozenberg, Phys. Rev. Lett. 84, 5180 (2000) 486
39. R. Bulla, Phys. Rev. Lett. 83, 136 (1999) 486
40. N.H. Tong, S.Q. Shen, F.C. Pu, Phys. Rev. B 64, 235109 (2001) 487
41. M. Jarrell, T. Pruschke, Z. Phys. B 90, 187 (1993) 487
42. R. Zitzler, T. Pruschke, R. Bulla, Eur. Phys. J. B 27, 473 (2002) 487, 489
43. T. Pruschke, R. Zitzler, J. Phys.: Condens. Matter 15, 7867 (2003) 489
44. Y. Aiura, F. Iga, Y. Nishihara, H. Ohnuki, H. Kato, Phys. Rev. B 47, 6732 (1993) 490
45. K. Morikawa, T. Mizokawa, K. Kobayashi, A. Fujimori, H. Eisaki, S. Uchida, F. Iga,
Y. Nishihara, Phys. Rev. B 52, 13711 (1995) 490
46. K. Maiti, D.D. Sarma, M.J. Rozenberg, I.H. Inoue, H. Makino, O. Goto, M. Pedio,
R. Cimino, Europhys. Lett. 55, 246 (2001) 490
47. I.H. Inoue, C. Bergemann, I. Hase, S.R. Julian, Phys. Rev. Lett. 88, 236403 (2002) 490
48. M.J. Rozenberg, G. Kotliar, H. Kajueter, G.A. Thomas, D.H. Rapkine, J.M. Honig,
P. Metcalf, Phys. Rev. Lett. 75, 105 (1995) 490
49. M.J. Rozenberg, I.H. Inoue, H. Makino, F. Iga, Y. Nishihara, Phys. Rev. Lett. 76, 4781
(1996) 490
50. M.J. Rozenberg, Phys. Rev. B 55, R4855 (1997) 490
51. J.E. Han, M. Jarrell, D.L. Cox, Phys. Rev. B 58, 4199 (1998) 490
52. K. Held, D. Vollhardt, Eur. Phys. J. B 5, 473 (1998) 490
53. O.K. Andersen, Phys. Rev. B 12, 3060 (1975) 490
54. R.O. Jones, O. Gunnarsson, Rev. Mod. Phys. 61, 689 (1989) 490
55. V.I. Anisimov, A.I. Poteryaev, M.A. Korotin, A.O. Anokhin, G. Kotliar, J. Phys.:
Condens. Matter 9, 7359 (1997) 490
56. V.I. Anisimov, D.E. Kondakov, A.V. Kozhevnikov, I.A. Nekrasov, Z.V. Pchelkina, J.W.
Allen, S.K. Mo, H.D. Kim, P. Metcalf, S. Suga, A. Sekiyama, G. Keller, I. Leonov,
X. Ren, D. Vollhardt, Phys. Rev. B 71, 125119 (2005) 490, 491
57. V.I. Anisimov, O. Gunnarsson, Phys. Rev. B 43, 7570 (1991) 490
58. K. Held, I.A. Nekrasov, G. Keller, V. Eyert, N. Blümer, A.K. McMahan, R.T. Scalettar, T. Pruschke, V.I. Anisimov, D. Vollhardt, in Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, ed. by J. Grotendorst, D. Marks, A. Muramatsu, NIC Series, vol. 10 (2002), pp. 175–209 490
59. K. Held, I.A. Nekrasov, N. Blümer, V.I. Anisimov, D. Vollhardt, Int. J. Mod. Phys. 15,
2611 (2001) 490
60. K. Held, I.A. Nekrasov, G. Keller, V. Eyert, N. Blümer, A.K. McMahan, R.T. Scalettar, T.
Pruschke, V.I. Anisimov, D. Vollhardt, 490
61. G. Kotliar, S.Y. Savrasov, K. Haule, V.S. Oudovenko, O. Parcollet, C.A. Marianetti, Rev.
Mod. Phys. 78, 865 (2006) 490
62. A. Sekiyama, H. Fujiwara, S. Imada, S. Suga, H. Eisaki, S.I. Uchida, K. Takegahara,
H. Harima, Y. Saitoh, I.A. Nekrasov, G. Keller, D.E. Kondakov, A.V. Kozhevnikov,
T. Pruschke, K. Held, D. Vollhardt, V.I. Anisimov, Phys. Rev. Lett. 93, 156402 (2004) 491
63. A.I. Poteryaev, A.I. Lichtenstein, G. Kotliar, Phys. Rev. Lett. 93, 086401 (2004) 491
64. A. Gelfert, W. Nolting, Journal of Physics: Condensed Matter 13, R505 (2001) 492
65. A. Kampf, J.R. Schrieffer, Phys. Rev. B 41, 6399 (1990) 493
66. A.P. Kampf, J.R. Schrieffer, Phys. Rev. B 42, 7967 (1990) 493
67. F. Gebhard, Phys. Rev. B 41, 9452 (1990) 493
68. T. Obermeier, T. Pruschke, J. Keller, Physica B 230–232, 892 (1997) 493
69. M.V. Sadovskii, I.A. Nekrasov, E.Z. Kuchinskii, T. Pruschke, V.I. Anisimov, Phys. Rev.
B 72, 155105 (2005) 493
70. J.L. Smith, Q. Si, Phys. Rev. B 61, 5184 (2000) 493, 496
71. K. Haule, A. Rosch, J. Kroha, P. Wölfle, Phys. Rev. Lett. 89, 236402 (2002) 493
72. A. Gonis, Green Functions for Ordered and Disordered Systems. Studies in Mathemat-
ical Physics (North-Holland, Amsterdam, 1992) 493, 496
73. G. Kotliar, S.Y. Savrasov, G. Pallson, G. Biroli, Phys. Rev. Lett. 87, 186401 (2001) 496
74. E. Müller-Hartmann, Z. Phys. B 74, 507 (1989) 496
75. T. Maier, M. Jarrell, T. Pruschke, J. Keller, Eur. Phys. J. B 13, 613 (2000) 497
76. T.A. Maier, T. Pruschke, M. Jarrell, Phys. Rev. B 66, 075102 (2002) 498
77. M. Jarrell, J.E. Gubernatis, Physics Reports 269, 133 (1996) 497
78. P.R.C. Kent, M. Jarrell, T.A. Maier, T. Pruschke, Phys. Rev. B 72, 060411 (2005) 499
79. R. Staudt, M. Dzierzawa, A. Muramatsu, Eur. Phys. J. B 17, 411 (2000) 498, 499
80. A.N. Rubtsov, V.V. Savkin, A.I. Lichtenstein, Phys. Rev. B 72, 035122 (2005) 500
81. P. Werner, A.J. Millis, Phys. Rev. B 74, 155107 (2006) 500
82. E. Gull, P. Werner, A.J. Millis, M. Troyer, preprint (2006). URL http://arxiv.org/abs/cond-mat/0609438 500
17 Local Distribution Approach
17.1 Introduction
Any theory of condensed matter – at least a proper quantum mechanical one – has to
include spatial and temporal fluctuations, and the correlations that develop between
these. Fluctuations in time naturally arise in any interacting system, where a particle
can exchange energy with the rest of the system. In a number of situations spatial
fluctuations are equally important. As we learn in the Born-Oppenheimer approx-
imation [1], electrons in a solid see the ions mainly through a static potential. In
disordered systems spatial fluctuations then arise from an irregular arrangement of
the ions. Even for a regular crystal, at finite temperature the ions are displaced from
their equilibrium positions, and the ionic potential fluctuates in space. On a techni-
cal level, the Hubbard-Stratonovich transformation [2, 3] shows how an interacting
fermion system can be mapped onto a non-interacting one coupled to auxiliary fields
which fluctuate in space (and time).
In traditional mean-field descriptions, such as the Weiss theory of magnetism,
fluctuations are at best approximately described, if not neglected altogether. As a major
improvement the dynamical mean-field theory (DMFT) [4] – for a detailed expla-
nation and a list of references we refer the reader to Chap. 16 – includes fluctua-
tions and correlations in time by establishing a self-consistent theory for a local but
energy-dependent interaction self-energy. In the course of the DMFT construction,
which is based on the limit of infinite dimension (d = ∞), spatial fluctuations are
averaged out. A natural question is whether one can set up a kind of mean-field
theory which accounts for fluctuations and correlations in space. This contribution
will try to explain that an affirmative answer can be found if one adopts a viewpoint which was first advocated by P. W. Anderson in developing his theory of
localization in disordered systems [5]: To take the stochastic nature of spatial fluctuations seriously. Then quantities like the density of states become site-dependent random quantities, and one has to deal with their distribution instead of some averages.
In this tutorial we are going to describe an approach resting on this stochastic
viewpoint. This approach employs the distribution of the local density of states as
the quantity of interest, and is accordingly denoted as local distribution (LD) ap-
proach. We will explain how to turn this approach into a working method, and apply
it to two important examples of disordered non-interacting systems. In the discus-
sion of the results we will relate it to a method based on averages, the coherent
potential approximation (CPA) [6]. Then we outline how to combine the stochastic
approach with DMFT to address both interaction and disorder. Anderson localiza-
tion of a Holstein polaron serves as a particular example in this context. Finally, we
take a short look at how to cast the Holstein model at finite temperature into a stochastic framework. There is one word of warning to the reader: What we are going to explain is a fully worked out machinery only to a lesser degree; rather, it constitutes an original way of thinking which has as yet found only some applications. This tutorial will hopefully serve the purpose of getting the reader accustomed to the fundamental concepts of a stochastic approach to spatial fluctuations, and of convincing him that the stochastic viewpoint is essential for an appropriate treatment.
We can present the basic ideas best if we concentrate on disordered systems, where
spatial fluctuations are explicitly imposed.1 In a substitutionally disordered system,
like a doped semiconductor or an alloy, disorder primarily manifests through site-
dependent random potentials ǫi . A model to describe electron motion in such a
disordered crystal is given by
H = \sum_i \epsilon_i c_i^\dagger c_i - t \sum_{\langle i,j \rangle} c_i^\dagger c_j . \qquad (17.1)
In this Hamiltonian, the c_i^{(\dagger)} denote fermionic operators for tight-binding electrons on a
crystal lattice, and the ǫi account for local potentials arising from the ions composing
the crystal. Note that this is a model of non-interacting fermions whose non-trivial
properties arise from the randomness present in ǫi . Due to randomness, the ǫi are
not fixed to some concrete values, but only their range of possible values is specified
by a probability distribution p(ǫi ). Two examples, which will be discussed below in
detail, are the binary alloy with p(ǫi ) = cA δ(ǫi + ∆/2) + (1 − cA )δ(ǫi − ∆/2),
and the Anderson model of localization p(ǫi ) = (1/γ) Θ(γ/2 − |ǫi |) (see (17.9) and
(17.10)).
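To see how these two disorder distributions translate into practice, here is a minimal Python sketch (the function names are ours) that draws the on-site energies ǫi of (17.1) for the binary alloy and for the Anderson model:

import numpy as np

def eps_binary_alloy(N, c_A, delta, rng=None):
    """On-site energies from the binary-alloy distribution (17.9):
    eps_i = -delta/2 with probability c_A (A atom), +delta/2 otherwise."""
    rng = rng or np.random.default_rng()
    return np.where(rng.random(N) < c_A, -delta / 2.0, +delta / 2.0)

def eps_anderson(N, gamma, rng=None):
    """On-site energies from the box distribution (17.10) of width gamma."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-gamma / 2.0, gamma / 2.0, N)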
A material of certain composition corresponds to some p(ǫi ), while any single
specimen of this material is described by choosing values for ǫi according to p(ǫi ).
¹ For reviews on the interesting physics of disordered systems we refer the reader to [7, 8].
Any p(ǫi) therefore defines many Hamiltonians (17.1), one for each concrete choice of all {ǫi}. Any experiment is carried out on a single specimen, i.e. one of these Hamiltonians, while in general we want to describe common properties of all Hamiltonians defined by p(ǫi). How then is the typical behavior for some p(ǫi) related to the specific behavior for fixed ǫi? For any finite system, there is a small chance to find untypical values for {ǫi}. For the binary alloy (see (17.9) below), for example, there is a finite probability c_A^N + c_B^N to have all ǫi equal on N sites – which gives an ordered system with untypical behavior for the disordered one. In a crystal with many sites, however, this probability is vanishingly small: In this sense any disordered specimen is typical for the material class.²
In a disordered system translational invariance is broken. In contrast to the de-
scription of ordered systems we then employ quantities that depend on position,
like the local density of states (LDOS) ρi (ω). The LDOS counts the number of
states at a certain energy ω at lattice site i. It is related to the local Green function
G_{ii}(\omega) = \langle i|(\omega - H)^{-1}|i\rangle by
\rho_i(\omega) = -\frac{1}{\pi} \lim_{\eta \to 0^+} \mathrm{Im}\, G_{ii}(\omega + i\eta) . \qquad (17.2)
From the LDOS the density of states (DOS) ρ(ω) is obtained as the average over the crystal volume, \rho(\omega) = \frac{1}{N} \sum_i \rho_i(\omega) for an N-site lattice. The LDOS generally contains more information than the DOS. Only in the absence of disorder is ρi(ω) = ρ(ω) for all i. But with disorder, ρi(ω) fluctuates through the system. The important point we will discuss later is that it would be wrong to say that the LDOS fluctuates around the DOS. In general, LDOS fluctuations can render the concept of an averaged DOS to describe the system as a whole almost useless.
A tool to measure the LDOS in the laboratory is scanning tunneling microscopy
(STM). In STM, the tunneling current between a tip and the surface of a specimen
is measured. The tunneling current is, in a suitable approximation, proportional to
ρi (ω), convoluted with some apparatus function which accounts for the finite en-
ergy resolution of the STM device. For a given applied voltage STM can therefore
produce a spatially resolved picture of the LDOS. Note that due to the finite energy
resolution several states contribute to the picture of ρi (ω): One always measures the
typical behavior of some eigenstates of the Hamiltonian in the vicinity of ω.
What cannot be done with STM can be done by numerical techniques: measuring the LDOS even inside a three-dimensional cube (Fig. 17.1). The computer first generates N = L³ values for the ǫi in (17.1) using a random number generator, and then calculates the LDOS for L² sites in a square slice of the cube using e.g. the kernel polynomial method (KPM) (see Chap. 19 in this book). Taking this picture,³ one should easily accept that the site-dependence of the LDOS constitutes
² The critical reader might note that this is not the more difficult question of whether all quantities are self-averaging, that is, whether mean and typical values coincide for large system sizes. The latter is true if the distribution of a quantity is sharp or at least peaked at the mean value. As e.g. the distribution P[ρi(ω)] of the local density of states shows, this is in general not the case. Whether it is true for the conductivity is a different question. The distribution of a quantity itself is nevertheless always typical.
Fig. 17.1. LDOS ρi(ω) for a disordered cube of N = L³, L = 512, sites. The values of ǫi were obtained according to the disorder distribution (17.10) of the Anderson model, with γ/t = 10.0; the calculation has been performed with periodic boundary conditions to avoid boundary effects. The pictures show a slice of L² sites, the value of ρi(ω) is color-coded, from black for very small to white for very large values (see color bar in the middle). In the upper right edge the 50² sites in the upper left edge of the picture are shown in magnification. Left: At energy ω/t = 0.0, the LDOS is comparable throughout the crystal. Right: At ω/t = 7.69, the LDOS is concentrated in finite, separated regions of the crystal. Evidently, the character of states is very different depending on energy. This indicates the existence of a phase transition, the so-called Anderson localization, which we will discuss in Sect. 17.2.1
an eminent aspect of disordered systems. Apparently, the DOS is not significant for
the different structure of the LDOS: On average, both LDOS pictures in Fig. 17.1
would look the same.
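A full KPM calculation for 512³ sites does not fit into a few lines, but the quantity itself is easy to demonstrate. The sketch below is our own construction, for a small one-dimensional Anderson chain instead of the cube, and obtains the LDOS from exact diagonalization with a Lorentzian energy broadening η.

import numpy as np

def ldos_1d_anderson(L, gamma, t=1.0, eta=0.05, rng=None):
    """LDOS rho_i(w) = sum_n |psi_n(i)|^2 * (eta/pi) / ((w - E_n)^2 + eta^2)
    for a 1D chain with box-distributed on-site energies as in (17.10)."""
    rng = rng or np.random.default_rng()
    eps = rng.uniform(-gamma / 2.0, gamma / 2.0, L)
    H = np.diag(eps) - t * (np.eye(L, k=1) + np.eye(L, k=-1))
    E, psi = np.linalg.eigh(H)      # columns of psi are the eigenstates
    def rho(i, w):
        return np.sum(psi[i, :]**2 * (eta / np.pi) / ((w - E)**2 + eta**2))
    return rho

# example: compare the LDOS at two different sites in the band center
rho = ldos_1d_anderson(L=400, gamma=10.0)
print(rho(0, 0.0), rho(200, 0.0))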
To account for the difference, we have to describe the fluctuations of the LDOS. Then both LDOS pictures look different: The right one has strong fluctuations, most values being small but some very large, while in the left picture values are equally distributed in some range, and extreme values are rare. To quantify this behavior we can understand the LDOS, with its different values at different sites, as a statistical quantity whose fluctuations are described by a distribution P[ρi(ω)]. To construct the distribution from the explicit knowledge of the LDOS, we would have to count how often the LDOS takes a value in a certain range. By this counting we would obtain P[ρi(ω)] as a histogram. Then we could also recover the DOS as an (arithmetic) average
\rho(\omega) = \int_0^\infty \rho_i \, P[\rho_i(\omega)] \, d\rho_i . \qquad (17.3)
³ With respect to the previous footnote, for N = 512³ sites we expect that the LDOS shows typical behavior. Indeed, for two different sets of randomly generated values for the ǫi, the two pictures for the LDOS look qualitatively the same.
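The counting procedure just described is a one-liner in practice. A small sketch (names ours) that turns a set of sampled LDOS values into the histogram estimate of P[ρi(ω)] and recovers the DOS as the arithmetic average (17.3):

import numpy as np

def ldos_distribution(ldos_values, bins=100):
    """Histogram estimate of P[rho_i(omega)] from sampled LDOS values,
    plus the DOS as the arithmetic average, Eq. (17.3)."""
    P, edges = np.histogram(ldos_values, bins=bins, density=True)
    dos = np.mean(ldos_values)   # rho(omega) = int rho_i P[rho_i] drho_i
    return P, edges, dos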
To find P[ρi(ω)] not only for one specific Hamiltonian out of the many given by (17.1) for a certain p(ǫi), we would have to repeat this counting for many different choices of the ǫi until we get the typical form of P[ρi(ω)], which then no longer depends on the concrete values of the ǫi but only on the disorder distribution p(ǫi). The aim of the stochastic approach is to construct this distribution at once.
Let us rethink the concept of the LDOS distribution, which we so far have in-
troduced merely as a way of reorganizing information obtained from a calculation
that does not mention distributions at all. To adopt the stochastic viewpoint entirely
we must convince ourselves that distributions of observables are inherent in the def-
inition of the model (17.1). Clearly, the Green function depends on all values {ǫi }.
Each of the values Gii (ω; {ǫi }) occurs with the probability of the realization {ǫi },
which is in turn given by the distribution p(ǫi ). That is: The Green function by itself
is a random variable right from the beginning, and we must deal with its distribution
P [Gii (ω)]. As we will see this point of view is essential for the very understand-
ing of disorder physics. We can now precisely formulate the task to be solved: To
determine P [Gii (ω)] from p(ǫi ).
The distribution P[Gii(ω)] has two important properties. First, though it clearly depends e.g. on the energy ω, it does not depend on the lattice site i – remember, any value Gii(ω; {ǫi}) for given {ǫi} does – since by the definition of the model (17.1) each lattice site is equivalent. On the level of distributions we recover the translational invariance which is otherwise lost. We keep the subscript i just to indicate a local
Green function. Second, ergodicity implies a two-fold meaning of P [Gii (ω)]: It
gives either the probability for a Green function value at a fixed lattice site but all
possible {ǫi }, or the probability for all lattice sites in a typical realization {ǫi }. As
we stated above, for an infinite lattice we get typical realizations almost surely.
We have by now advocated many times using the distribution of the LDOS (or a Green function) instead of its average, the DOS. We now establish a scheme that provides us directly with the distribution for an infinite lattice. Since it is entirely formulated in terms of distributions of local Green functions, we call it the local distribution (LD) approach.
For an arbitrary lattice, both the free DOS ρ0 (ω) and the connectivity K, i.e.
the number of nearest neighbors, enter the LD equations. Compared to theories in
the limit d = ∞, we have the additional parameter K. Since it is a bit tedious to establish the equations in the general case, we restrict ourselves to the case of a Bethe lattice
(see Fig. 17.2) where we get simple equations straightforwardly, as was first realized in [9]. As a byproduct, we obtain exact equations in this case. All principal
physical features are retained despite this simplification, as we will demonstrate
below.
The local Green function Gii (ω) can always be expanded as
Fig. 17.2. Part of the half-infinite Bethe lattice for K = 2. The Bethe lattice is an infinite
loop-free connected graph, where each site is connected to K + 1 different sites. Cutting one
edge, we obtain the half-infinite Bethe lattice (or Bethe tree) as shown here. The relevance of
Bethe lattices originates from the fact that a number of approximations become exact there –
like the LD approach. However, the precise structure of the Bethe lattice is not as relevant for
the LD approach as it may seem: In principle, only the free DOS is of importance. Especially
simple equations are obtained for the Bethe lattice since the inverse Hilbert transform for the
Bethe DOS is a simple, algebraic function
G_{ii}(\omega) = \Big[ \omega - \epsilon_i - t^2 \sum_{j,k=1}^{K} G_{jk}^{(i)}(\omega) \Big]^{-1} . \qquad (17.4)
Here, j, k run over all K neighbors of i, and the superscript (i) indicates that G_{jk}^{(i)}(\omega) has to be calculated with site i removed from the lattice. On the Bethe lattice, no
path connects different sites j, k adjacent to i once i has been removed. Accordingly,
(17.4) simplifies to
G_i(\omega) = \Big[ \omega - \epsilon_i - t^2 \sum_{j=1}^{K} G_j(\omega) \Big]^{-1} , \qquad (17.5)
where G_j(\omega) denotes the Green function G_{jj}^{(i)}(\omega) for which the site i to the left of j is removed (see Fig. 17.2).
Equation (17.5) contains only Green functions of the same type. Hence it is,
in the absence of disorder (ǫi = 0 for all i), a closed equation for the local Green
function Gi (ω) = Gj (ω). Solving that quadratic equation, we find the free Green
function for the Bethe lattice with corresponding semi-circular density of states,
G_i^0(\omega) = \frac{8}{W^2} \Big( \omega - \sqrt{\omega^2 - W^2/4} \Big) , \qquad (17.6)
\rho^0(\omega) = \frac{8}{\pi W^2} \sqrt{W^2/4 - \omega^2} , \quad |\omega| \le W/2 , \qquad (17.7)
where W = 4t\sqrt{K} is the bandwidth. Note that the DOS does not depend on K if W is fixed. In the limit d = ∞, i.e. for K → ∞, the scaling t ∝ 1/\sqrt{K} keeps the bandwidth constant (cf. Chap. 16).
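For reference, (17.6) is straightforward to evaluate numerically. In the small sketch below (our code, not part of the LD formalism) the square root is factorized so that the physical branch with Im G < 0 just above the real axis is selected automatically.

import numpy as np

def g0_bethe(w, W=1.0, eta=1e-8):
    """Free local Green function (17.6) of the Bethe lattice, bandwidth W.
    Factorizing sqrt(z^2 - W^2/4) = sqrt(z - W/2) * sqrt(z + W/2) picks the
    branch with Im G < 0 for energies just above the real axis."""
    z = w + 1j * eta
    s = np.sqrt(z - W / 2.0) * np.sqrt(z + W / 2.0)
    return (8.0 / W**2) * (z - s)

# the semi-circular DOS (17.7) follows as rho0(w) = -g0_bethe(w).imag / pi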
With disorder, the solution of (17.5) is less simple. Then G_i(\omega) \ne G_j(\omega), and (17.5) encodes an infinite set of coupled equations, depending on an infinite number of parameters {ǫi}. The site-dependence of Gi(ω) prevents a closed equation for the local Green function, and hence a simple solution of the problem. But let us look at (17.5) once more from the stochastic viewpoint. We already know that the Green functions in this equation are random variables. We therefore find that (17.5) determines one random variable Gi(ω) from K + 1 random variables ǫi and Gj(ω), j = 1, ..., K. We also know that P[Gi(ω)] = P[Gj(ω)] for all j. Moreover, the K Green functions Gj(ω) which appear on the r.h.s. of (17.5) are independently distributed. These two properties amount to reading (17.5) as a self-consistency or fix-point equation for one random variable Gi(ω): It determines Gi(ω) on the l.h.s. of (17.5) from K copies of Gi(ω) on the r.h.s. The on-site energy ǫi enters the equation as the source of randomness, parameterized by p(ǫi).
To explicitly state this essential point of the LD approach: By the stochastic reinterpretation of (17.5), the infinite set of equations for the values of Gi(ω) turns out to be a single equation for the stochastic variable Gi(ω) (i.e., for its distribution P[Gi(ω)]), with only one parameter p(ǫi). This amounts to a solution for P[Gi(ω)] entirely in terms of distributions, as provided by the sampling procedure described below.
For any finite K, (17.5) is a closed equation for the distribution of the random
variable Gi (ω), which cannot be reduced to an equation for a single value like the
average of Gi(ω). In the limit d = ∞, however, spatial fluctuations are averaged out, and (17.5) should then simplify to an equation for averages. Indeed, with the scaling t ∝ 1/\sqrt{K} for K → ∞, the r.h.s. of (17.5) contains a sum of K summands multiplied
with 1/K. Hence this sum becomes an average for K → ∞ according to the law of
large numbers. Integrating over ǫi gives an average also on the l.h.s., and we obtain
the equation
G^{\mathrm{ave}}(\omega) = \int \Big[ \omega - \epsilon - \frac{W^2}{16}\, G^{\mathrm{ave}}(\omega) \Big]^{-1} p(\epsilon)\, d\epsilon \qquad (17.8)
for the disorder-averaged Green function G^{ave}(ω). This equation is just the self-
consistency equation of the so-called coherent potential approximation (CPA) for
the Bethe lattice4 . Since (17.5) is exact we have, for the special case of the Bethe
lattice, proven that the CPA becomes exact for K → ∞.
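The CPA equation (17.8) can be solved by simple iteration. Here is a minimal sketch (our code), with the disorder average discretized as a weighted sum, a small imaginary part η added for stability, and a mixing step in case plain iteration oscillates:

import numpy as np

def cpa_green(w, eps_vals, weights, W=1.0, eta=1e-6, n_iter=1000):
    """Iterate the CPA self-consistency (17.8) for the Bethe lattice:
    G = sum_j weights[j] / (w - eps_vals[j] - (W^2/16) * G).
    For the binary alloy: eps_vals = [-delta/2, delta/2],
    weights = [c_A, 1 - c_A]."""
    eps_vals = np.asarray(eps_vals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = w + 1j * eta
    G = 0.0 + 0.0j
    for _ in range(n_iter):
        G_new = np.sum(weights / (z - eps_vals - (W**2 / 16.0) * G))
        G = 0.5 * (G + G_new)   # simple mixing for stability
    return G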
It remains to solve the stochastic self-consistency equation (17.5) for P [Gi (ω)]. We
employ a sampling technique which is related to the Gibbs sampling method. Here
⁴ For an extensive review on the CPA see [6].
we have to deal with infinitely many random variables instead of finitely many as in
standard Gibbs sampling.
Generally, the sampling solves any stochastic fix-point equation of the form
x = F [x, . . . , x, ǫ], where x and ǫ are random variables, F [x1 , . . . , xK , ǫ] is a
function5 that takes K values xi of x and one value of ǫ. The distribution p(ǫ)
of the external variable ǫ is known a priori. Obviously (17.5) is of that form, with
F [G1 , . . . , GK , ǫi ] given by the r.h.s. of the equation. Then, an implicit equation
has to be solved: If one already knew the solution P [x] one would obtain it again
by means of F [x, . . . , x, ǫ]. Note the difference to the prominent Monte Carlo tech-
nique of importance sampling: While the latter one performs an integral with respect
to a given known distribution, we have to construct the distribution from scratch. For
that purpose we need to represent the distribution, which is conveniently done by a
sample with a certain number Ns of entries xi . Each entry will, as soon as the solu-
tion to the fix-point equation is obtained, be a possible value of x, and the fraction of
entries in a certain range does determine P [x]. To read off P [x] from the sample, we
therefore construct a histogram by counting the appearances of entries in specified
intervals; to build up a sample to P [x] we throw Ns dice, weighted with P [x], and
take the Ns outcomes as sample entries. We note that any permutation of the sample
still represents the same distribution.
The algorithm shown below solves the stochastic fix-point equation like one is
tempted to solve any fix-point equation: By iteration. Starting with initial random
values the sample is repeatedly updated until convergence is obtained. Then the
distribution represented by the sample is a fix-point of the equation. To examine
the following algorithm closely is a good way to comprehend the interpretation of
(17.5) as a stochastic self-consistency equation.
input: distribution p(ǫ), functional F, sample size Ns
output: sample and distribution for P[x]
(1) initialize the sample {x1, ..., xNs} with random values
(2) update the sample: replace each entry xi by F[xj1, ..., xjK, ǫ], with K entries xj1, ..., xjK drawn at random from the sample and ǫ drawn from p(ǫ)
(3) repeat the update (2) until the distribution represented by the sample is converged
(4) construct the histogram for P[x] from the final sample
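In code, the whole algorithm fits into a few lines. The following Python sketch (our implementation of the steps above, not the authors' original program) solves a general stochastic fix-point equation; the commented line at the end shows how the update map F of the LD equation (17.5) would look at a fixed complex energy z = ω + iη:

import numpy as np

def sample_fixpoint(F, draw_eps, K, Ns=10_000, n_updates=100, rng=None):
    """Iterative sampling solution of x = F[x_1, ..., x_K, eps]:
    the sample represents the distribution P[x], cf. steps (1)-(4)."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(Ns).astype(complex)   # (1) initial random sample
    for _ in range(n_updates):                    # (3) iterate until converged
        for i in range(Ns):                       # (2) update every entry
            xs = x[rng.integers(Ns, size=K)]      # K entries drawn at random
            x[i] = F(xs, draw_eps(rng))           # new value from fix-point map
    return x                                      # (4) histogram x to get P[x]

# update map of the LD equation (17.5) at energy z = w + 1j*eta:
# F = lambda G, eps: 1.0 / (z - eps - t**2 * G.sum())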
We remind ourselves that convergence of the sample does not mean conver-
gence of its entries but of the distribution represented. In practice, we may check
this by comparing some moments extracted from the distributions before and after
each update (2). In principle, convergence of the sampling algorithm cannot be
⁵ For the equation to make sense, one requires F[x_{σ(1)}, ..., x_{σ(K)}, ǫ] = F[x_1, ..., x_K, ǫ] for all permutations σ.
Fig. 17.3. Convergence of a distribution within the sampling algorithm. Solving the equation
x = f(x) with f(x) = x³ − 1.25 · x as a stochastic equation with K = 2. The picture shows the distribution P[x] of x after some updates of a sample with Ns = 5 × 10⁴ entries; the
inset displays the arithmetic average (solid line) and variance (dashed line) of the sample for
K = 2 and K = 5. The distribution converges to a δ-distribution at the fix-point x0 = 0,
although |f ′ (x0 )| = 1.25 > 1
Fig. 17.4. Convergence of a distribution within the sampling algorithm. Fluctuations of the average ⟨ρi⟩ of the LDOS distribution P[ρi(ω)] to (17.5) during updates of a sample with Ns = 5 × 10⁴ entries. The disorder distribution p(ǫi) is taken from (17.10), and ω = 0. The curves for γ = 0.2 and γ = 1.0 correspond to the distributions shown in Fig. 17.8. For γ = 2.5, the average over 100 consecutive updates is shown instead of ⟨ρi⟩ (the fluctuations of ⟨ρi⟩ would fill the picture). The inset displays P[ρi(ω)] for γ = 2.5. Note the logarithmic abscissa
sampling. In Fig. 17.4 we show the fluctuations of the average of the LDOS distribution P[ρi(ω)]. The larger γ in this example, i.e. the larger the variance of ǫi, the stronger the fluctuations are. This is not an artifact of the algorithm, but results unavoid-
ably from the properties of the fix-point distribution. As the inset in Fig. 17.4 shows,
the fix-point distribution has extremely large variance. Resolving this distribution by a sample with a finite number of entries results in typically large fluctuations associated with the statistics of rare events. We will see below that the strength of fluctuations
may even diverge, which signals a phase transition (here, the Anderson transition
from extended to localized states, see Sect. 17.2.1). With strong fluctuations, the
algorithm does not converge even in an approximate sense, and a single sample is
not a good representation of the distribution. To sample the full distribution we then
have to use a large number of consecutive samples obtained in update step (2).
Note that convergence in the first example has been faster for K = 5 than for
K = 2. For (17.5) this observation implies that convergence becomes better with
increasing K – just as one comes close to the limit K = ∞, where the stochastic
equation can be replaced by one for averages.
we set K = 2 in (17.5), and measure energies in units of the bandwidth W (if we fix W = 1, then t = 1/\sqrt{32} for the K = 2 Bethe lattice).
Let us begin with two examples of non-interacting disordered systems [14]. The first example is the binary alloy model, which describes a solid composed of two atomic species A, B. The on-site energies are distributed as
p(\epsilon_i) = c_A\, \delta(\epsilon_i + \Delta/2) + c_B\, \delta(\epsilon_i - \Delta/2) , \qquad (17.9)
where ∆ is the separation of the energy levels of the A and B atoms, and cA (cB = 1 − cA) is the concentration of A (B) atoms.
At first glance we should expect, for ∆ > W, two bands in the DOS centered
at ±∆/2, with weight cA and 1 − cA respectively. Indeed this is what we get by the
CPA, if we solve (17.8). If we compare to the result obtained from the stochastic
approach, solving (17.5) by the sampling algorithm, we find that the averaged CPA
description misses important features of the alloy (see Fig. 17.5). Remember that the
stochastic approach is exact in this situation: The DOS shown gives the true picture
of the system.
Why does CPA fail in this case? Physically, the electron motion is strongly af-
fected by multiple scattering on finite clusters of either A or B atoms, whereby the
DOS develops a rich structure. The most prominent peaks in the DOS can be di-
rectly attributed to small clusters, as indicated in Fig. 17.5. For the parameters cho-
sen here, the concentration cA is below the classical percolation threshold, hence all
clusters of the minority species A are finite. This is the origin of the fragmentation
[Figure: DOS ρ(ω) vs. ω for the binary alloy, LD and CPA curves; the most prominent peaks of the minority band are labeled by the cluster configurations A-BB and B-AB]
Fig. 17.5. DOS ρ(ω) for the binary alloy model, with ∆ = 2.0, cA = 0.1. The picture shows
both CPA and LD results. To resolve the δ-peaks in the minority band, the LD curve has been
broadened by including an artificial imaginary part η = 10−3 in the energy ω + iη. Arrows
mark contributions from small finite clusters of atoms. Figure taken from [14]
of the minority A-band. The CPA, being constructed in the limit K → ∞, averages over spatial fluctuations and therefore does not properly account for multiple scattering. From the stochastic viewpoint, this is manifest in the LDOS distribution P[ρi(ω)] (see Fig. 17.6), which cannot be represented by a single value. In particular, it is not sensible to replace P[ρi(ω)] by ρ(ω) as in the CPA.
The second example we consider is the Anderson model of localization, which
assumes a box distribution of on-site energies
p(\epsilon_i) = \frac{1}{\gamma}\, \Theta\big( \gamma/2 - |\epsilon_i| \big) , \qquad (17.10)
with γ ≥ 0 as the strength of disorder. In contrast to the binary alloy with its discrete distribution, the DOS in the Anderson model is well described by the CPA, except for some details at the band edges (see Fig. 17.7). But, invisible in the DOS, the
character of states is different towards the band edges and in the band center, as
could already be anticipated from Fig. 17.1. While states in the band center resem-
ble distorted Bloch waves, which extend through the whole crystal, states towards
the band edge have appreciable weight only in finite (separated) regions of the crys-
tal. An electron in such a state is not itinerant any more, hence the state is called
localized in contrast to extended Bloch-like states. As localized states do not con-
tribute to the electrical conductivity, Anderson localization is a mechanism to drive
a metal into an insulator as a result of disorder. While for interaction-driven metal-
insulator transitions like the Mott or Peierls transition a gap in the DOS opens at the
transition, the DOS stays finite at the Anderson transition from localized to extended
states. It is only the conductivity which drops to zero.
Guided by our discussion of Fig. 17.1, one expects that localized and extended
states can be distinguished by means of the LDOS distribution. Fig. 17.8 shows
Fig. 17.6. LDOS distribution P [ρi (ω)] for the binary alloy model at ω = 0.0, with ∆ = 0.3,
cA = 0.1. The arrow marks the DOS ρ(ω). Evidently a single value cannot represent the
distribution in any sense
Fig. 17.7. DOS ρ(ω) for the Anderson model, at γ = 1.5. The picture shows both CPA and
LD results. Since ρ(−ω) = ρ(ω), only one half of the figure is shown. Note the sharp band
edge within the CPA approximation, and the smooth Lifshitz tails in the LD result. These
tails result from the exponentially few (localized) states at sites with large |ǫi | which are not
resolved within CPA
P[ρ_i(ω)] for weak and moderate disorder. For weak disorder, the distribution resembles
a Gaussian peaked at the (averaged) DOS ρ(ω). With increasing disorder,
as fluctuations of the LDOS grow, the distribution becomes increasingly broad and
asymmetric. The DOS is then no longer representative of the distribution. Upon
further increasing the disorder, the distribution becomes singular at the transition to localized
states: All but an infinitesimally small weight resides at ρ_i = 0. This singularity
in P[ρ_i(ω)] has to be accessed via analytical continuation of a Green function to
the real axis, as depicted in Fig. 17.9. Although the distribution becomes singular
at the localization transition, the DOS nevertheless stays finite: the vanishing fraction
of sites with large LDOS still dominates the (arithmetic) average (compare Fig. 17.9).
Fig. 17.8. LDOS distribution P[ρ_i(ω)] for the Anderson model, in the band center ω = 0.
The arrows mark the DOS ρ(ω). Left: For weak disorder γ = 0.2, the distribution is peaked
at ρ(ω). Right: Already for moderate disorder γ = 1.0, the DOS is not significant for
the distribution, which is very skewed and broad. Compare this to the distribution shown in
Fig. 17.4 for even stronger disorder
[Log-log plot of P[ρ_i] for η = 10^{-8}, 10^{-9}, 10^{-10}; inset: ρ and ρ_geo as functions of η]
Fig. 17.9. The figure shows how P[ρ_i(ω)] for localized states in the Anderson model depends
on the imaginary part η in the energy argument of the Green function G_i(ω + iη). For η → 0,
numerically performing the analytical continuation to the real axis, the DOS ρ(ω) stays finite, but
a typical moment, like the geometrically averaged LDOS ρ_geo(ω), goes to zero
Fig. 17.10. Phase diagram of the Anderson model. Shown is the mobility edge ωmob vs. γ.
The dashed line shows the exact band edge ω = (W + γ)/2. The trajectory is symmetric
under ωmob → −ωmob . Note that for small γ, ωmob grows before it tends to zero when γ
approaches the critical value for complete localization (so-called re-entrant behavior)
Fig. 17.11. Part of a phase diagram of the binary alloy model for concentration cA = 0.1,
showing the DOS for various ∆. The dashed curves show the CPA band edges, the dotted
lines mark ω = ±∆/2 ± W/2. Figure taken from [14]
whose explicit form is not known in most cases. For the Bethe lattice with its semicircular
DOS, simple expressions for \mathcal{G}_i(\omega) and G_i(\omega) exist, namely

\mathcal{G}_i(\omega) = \Big[\omega - \epsilon_i - t^2 \sum_{j=1}^{K} G_j(\omega)\Big]^{-1}\ ,

G_i(\omega) = \big[\mathcal{G}_i(\omega)^{-1} - \Sigma_i(\omega)\big]^{-1} = \Big[\omega - \epsilon_i - \Sigma_i(\omega) - t^2 \sum_{j=1}^{K} G_j(\omega)\Big]^{-1}\ , \qquad (17.13)
– this is of course just the equivalent of (17.5) – while the complexity of (17.12)
is not reduced at all. Clearly, along with the Green function G_{ii}(ω), the self-energy
Σ_{ii}(ω) is also a random quantity. The equations (17.11)–(17.13) therefore have
the same status in an interacting system as (17.5) has without interaction: They form
stochastic self-consistency equations for Σ_{ii}(ω) and G_{ii}(ω). Again, what would
be an infinite number of coupled equations for self-energies and Green functions
reduces to a few self-consistency equations when reformulated by means of distributions.
When these equations are solved by Monte Carlo sampling, the impurity problem (17.12)
has to be solved in each update step (2c). This accounts for the main part of the
high computational cost of the combined LD+DMFT approach. While in
DMFT the impurity problem has to be solved only a few times until convergence, here it has to
be solved repeatedly for each entry of the sample. The computational effort is
therefore at least N_s times larger than in DMFT.
In few cases the DMFT impurity problem can be solved exactly. With an ex-
plicit solution for (17.12) at hand, the numerical effort to perform the sampling of
Gi (ω) can be handled. One example is the single polaron Holstein model [15] with
Hamiltonian
H = -t \sum_{\langle i,j\rangle} c_i^\dagger c_j - \sqrt{\varepsilon_p \omega_0}\, \sum_i (b_i^\dagger + b_i)\, c_i^\dagger c_i + \omega_0 \sum_i b_i^\dagger b_i\ , \qquad (17.14)
where an electron is coupled to optical phonons of energy ω_0. For this model, Σ_i(ω)
is obtained as an infinite continued fraction [16]

\Sigma_i(\omega) = \cfrac{1\,\varepsilon_p \omega_0}{[\mathcal{G}_i(\omega - 1\,\omega_0)]^{-1} - \cfrac{2\,\varepsilon_p \omega_0}{[\mathcal{G}_i(\omega - 2\,\omega_0)]^{-1} - \cfrac{3\,\varepsilon_p \omega_0}{\;\cdots}}}\ . \qquad (17.15)
The continued fraction is an expansion in terms of the maximal number of vir-
tual phonons that are excited at the same time. Evidently, this expansion is non-
perturbative, and contains diagrams of arbitrary order at any truncation depth of the
fraction.
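For numerical work the fraction is truncated at a finite depth and evaluated from the bottom up. A minimal sketch, assuming a callable G_loc that supplies the propagator 𝒢_i(ω) from the surrounding self-consistency (names are illustrative):

def holstein_self_energy(G_loc, omega, ep, w0, depth=50):
    """Evaluate the continued fraction (17.15), truncated at `depth`
    simultaneously excited virtual phonons, from the bottom level upwards."""
    sigma = 0.0
    for m in range(depth, 0, -1):
        # level m contributes  m*ep*w0 / (G_loc(omega - m*w0)^{-1} - sigma)
        sigma = m * ep * w0 / (1.0 / G_loc(omega - m * w0) - sigma)
    return sigma

The truncation depth must exceed the number of phonons that actually contribute at the chosen coupling, which can be checked by increasing `depth` until the result is stable.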
To give an impression of the physical content of the Holstein model, we show in
Fig. 17.12 the DOS ρ(ω) in the anti-adiabatic (i.e. for large ω0 ) strong coupling
regime as obtained from a DMFT calculation based on (17.15). This picture il-
lustrates the formation of a new quasi-particle which is a compound object of an
[Plot, two panels: DOS ρ(ω) and Im Σ(ω) over −5 ≤ ω ≤ 5 (upper panel); lower panel: close-up of the lowest sub-band near ω ≈ −4.507, with a pole of Σ(ω) marked]
Fig. 17.12. The Holstein polaron at strong coupling and large phonon frequency. We show
the DOS for the Holstein model with ω0 /W = 0.5625, εp /W = 4.5. The center of the
lowest sub-band is located nearly at −εp (the polaron shift), and bands are separated by
ω0 . The bandwidth of the lowest sub-band, which is shown in detail in the lower panel, is
Wsub = 3.45 × 10−4 W
electron with a surrounding cloud of phonons. This so-called small polaron is char-
acterized by an extremely large mass resulting in a narrow quasi-particle band (in
Fig. 17.12 the effective mass of the polaron is four orders of magnitude larger than
the free electron mass). Note that, while the lowest polaron band is fully coherent,
higher bands are incoherent as an effect of the inelastic electron-phonon interaction.
Accordingly, the imaginary part of the self-energy is finite there. The reader should be aware that
the properties of the polaron depend intimately on the parameter values. We by
no means provide a general picture of polaron physics here. For detailed discussions
see e.g. [17, 18, 19]; for a DMFT study of small polarons see [20].
If the Hamiltonians (17.1) and (17.14) are combined, we obtain a model to study
possible effects of Anderson localization of a polaron. As for the polaron itself,
the physics of polaron localization is diverse and complicated. A general discussion,
as partly given in [21], is far beyond the scope of this tutorial. For the parameters
used in Fig. 17.12, however, the polaron in its lowest band is a small and heavy
quasi-particle with infinite lifetime. We therefore expect that disorder affects this
quasi-particle like a free electron, but with the mass of the polaron. We can scrutinize
this expectation within the LD+DMFT approach, which provides the mobility
edge trajectory for the lowest sub-band (Fig. 17.13). Properly rescaled, this trajectory
perfectly matches that of the Anderson model in Fig. 17.10. As a
fundamental observation we note that the critical disorder for complete localization
of all states in the polaron sub-band is renormalized by Wsub /W as compared to
the free electron: In any real material such a polaron would be localized for almost
arbitrarily small disorder.
Fig. 17.13. Phase diagram for Anderson localization of a Holstein polaron at strong coupling
and large phonon frequency. As in the previous figure, ω_0/W = 0.5625, ε_p/W = 4.5.
Shown is the mobility edge for the lowest polaronic sub-band (circles) in comparison to
the Anderson model for a free electron (crosses). γ and ω_mob are rescaled to the respective
bandwidth. The energy scales of the two curves accordingly differ by almost four orders of
magnitude, as W_sub = 3.45 × 10^{-4} W
In the previous section we addressed the Holstein model at zero temperature, and
imposed spatial fluctuations by disorder. But even without disorder, the physics of
the Holstein model (17.14) may be strongly influenced by static scattering off spatial
fluctuations. As mentioned in the introduction, this is the case for heavy ions,
i.e. small oscillator frequency ω_0, when the ions act as static scatterers to first order:
If at finite temperature the ions are displaced from their equilibrium positions, the
concomitant random potential acts as a static disorder potential (see also [19]).
Let us consider the limit of large ionic mass M , keeping the spring constant ks =
M ω02 of the harmonic oscillator ω0 b†i bi constant. This limit, the so-called adiabatic
limit ω0 → 0 of small phonon frequency, is opposite to the regime of large phonon
frequency to which the example from the previous section (Fig. 17.12) belongs. In
the limit ω_0 → 0 the ions are nearly classical particles. Classical states in the context
of the harmonic oscillator can be constructed as coherent states |α⟩. Remember
that a coherent state is a Gaussian wave packet centered at \bar X_i^\alpha = \langle\alpha|X_i|\alpha\rangle = \sqrt{2/(M\omega_0)}\,\mathrm{Re}\,\alpha, with the position operator X_i = \sqrt{1/(2M\omega_0)}\,(b_i + b_i^\dagger).
It is not difficult to convince oneself that the thermal (Boltzmann) trace over
boson eigenstates |n⟩ can be expressed as an integral over coherent states:

\mathrm{Tr}_\beta[\dots] = \frac{1}{Z} \sum_{n=0}^{\infty} e^{-\beta n \omega_0}\, \langle n|\dots|n\rangle = \frac{e^{\beta\omega_0} - 1}{\pi} \int d^2\alpha\; e^{(1-\exp(\beta\omega_0))|\alpha|^2}\, \langle\alpha|\dots|\alpha\rangle\ , \qquad (17.16)

with Z = \sum_n e^{-\beta n \omega_0} the bosonic partition function.
In the spirit of Monte Carlo integration the complex-plane integral \int d^2\alpha \dots has a
stochastic counterpart: The integral value is obtained by sampling the expectation
value ⟨α|…|α⟩ for a complex random variable α with Gaussian probability density
∝ exp[(1 − exp(βω_0))|α|^2]. This results in a stochastic interpretation of the Holstein
model at finite temperature. The random part of the model is the initial state of the
bosonic subspace, given by random coherent states |αi at site i according to the
specific distribution for αi . The bosonic vacuum at T = 0 is therefore replaced by a
fluctuating vacuum, where the strength of fluctuations depends on T .
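Because the density ∝ exp[(1 − e^{βω_0})|α|²] is an isotropic Gaussian in the complex plane, sampling it needs only two normal deviates. A minimal sketch (names illustrative), including a check that ⟨|α|²⟩ reproduces the Bose function 1/(e^{βω_0} − 1):

import numpy as np

def sample_coherent_alpha(beta, w0, size, rng=None):
    """Draw complex alpha with density ~ exp[(1 - exp(beta*w0)) |alpha|^2]:
    Re(alpha) and Im(alpha) are N(0, sigma^2), sigma^2 = 1/(2(e^{beta w0} - 1)).
    The exponent is negative, since 1 - exp(beta*w0) < 0."""
    rng = rng or np.random.default_rng()
    c = np.expm1(beta * w0)                 # e^{beta w0} - 1 > 0
    sigma = np.sqrt(0.5 / c)
    return rng.normal(0, sigma, size) + 1j * rng.normal(0, sigma, size)

# consistency check: <|alpha|^2> equals the Bose function 1/(e^{beta w0} - 1)
alpha = sample_coherent_alpha(2.0, 0.5, 100000, np.random.default_rng(0))
print(np.mean(np.abs(alpha)**2), 1 / np.expm1(2.0 * 0.5))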
From a local point of view as in the previous section, we need the Green function
G_i^\alpha(\omega), which in contrast to the Holstein model at T = 0 is not evaluated in
the bosonic vacuum but within a certain coherent state |α_i⟩. The Green function is
given by

G_i^\alpha(\omega) = \Big[\mathcal{G}_i(\omega)^{-1} - \sqrt{2\varepsilon_p k_s}\;\bar X_i^\alpha - \Sigma_i^\alpha(\omega)\Big]^{-1}\ . \qquad (17.17)
This expression is of the same type as (17.13), with a static disorder contribution
given by the random variable \bar X_i^\alpha, and a self-energy contribution \Sigma_i^\alpha(\omega)
accounting for finite lifetime effects, i.e. finite ω_0. Note that \sqrt{2\varepsilon_p k_s}\,\bar X_i^\alpha, being an
effect of the interaction, enters G_i^\alpha(\omega) but not \mathcal{G}_i(\omega). \bar X_i^\alpha has the Gaussian distribution
P[\bar X_i^\alpha] \propto \exp[(1 - \exp(\beta\omega_0))\, M\omega_0\, (\bar X_i^\alpha)^2/2] resulting from (17.16). Both for
high temperature (β → 0) and in the adiabatic limit (ω_0 → 0), the classical result
P[\bar X_i^\alpha] \propto \exp[-\beta k_s (\bar X_i^\alpha)^2/2] is obtained. Note that the Green function G_i^\alpha(\omega) is
evaluated in bosonic states that are not eigenstates of the bosonic number operator
b_i^\dagger b_i, and therefore in principle is a non-equilibrium Green function with
analytical properties different from those of the retarded Green functions G_i(\omega) used elsewhere in the text.
On average, however, i.e. for the disorder-averaged Green function \langle G_i^\alpha \rangle, which is
obtained as the average over α_i instead of ǫ_i as in the previous sections, the full
analytical properties of a retarded Green function are recovered.
The self-energy \Sigma_i^\alpha(\omega) can be expressed as a continued fraction like for the Holstein
model at zero temperature. The expression is derived considerably less easily
than before – e.g. using Mori-Zwanzig projection techniques [22] – and acquires a
less systematic form. From the top level of the continued fraction,

\Sigma_i^\alpha(\omega) = \cfrac{\omega_0\,\big(\varepsilon_p - 2i\sqrt{\varepsilon_p\omega_0}\,\mathrm{Im}\,\alpha_i\big)}{\omega - \cfrac{2\sqrt{\varepsilon_p\omega_0}\,(\varepsilon_p + \omega_0)\,\mathrm{Re}\,\alpha_i + \varepsilon_p\omega_0\,(1 - 4i\,\mathrm{Re}\,\alpha_i\,\mathrm{Im}\,\alpha_i)}{\varepsilon_p - 2i\sqrt{\varepsilon_p\omega_0}\,\mathrm{Im}\,\alpha_i} - \dots} \qquad (17.18)
we deduce that \Sigma_i^\alpha(\omega) is of order ω_0, while \bar X_i^\alpha is of order 1. The expression for
G_i^\alpha(\omega) therefore acquires the correct form as an expansion in ω_0. Note that (17.16)–(17.18)
hold for any parameter values, but are constructed to work in the limit of
small ω_0. The continued fraction (17.15), which is straightforwardly generalized
to arbitrary eigenstates |n⟩ of b^\dagger b, is not applicable in this case: For ω_0 → 0 the
number of bosons in the thermal trace becomes large, which renders an expansion
in the number of excited bosons useless.
By (17.16)–(17.18) the Holstein model for ω_0 → 0 and finite T is cast
in a form that is amenable to the stochastic method explicated in the preceding
sections. Here, we do not supply actual calculations based on it. The bottom line
instead is the interpretation provided by our reformulation: Temperature-induced
spatial fluctuations act to a certain degree like (static) disorder. In (17.17) the main
source of resistivity due to scattering off thermally excited phonons is translated
to disorder scattering: With increasing T, the fluctuations of the disorder
potential \sqrt{2\varepsilon_p k_s}\,\bar X_i^\alpha increase, and electron motion is strongly suppressed. We know
from disordered systems that the suppression is much larger than expected from
17.3 Summary
At the end of this tutorial we shall return to the initial question we raised: How
to set up a kind of mean-field theory for spatial fluctuations and correlations. The
essential idea argued for is to adopt a stochastic viewpoint: The mean-field in the
theory has to be the distribution of a certain quantity – that is, a stochastic mean-field
theory which does not have a mean-field at all. We first had to convince ourselves –
taking disordered systems as the example for fluctuations of a potential in space –
that important quantities like the density of states are indeed best understood as
random quantities which should be described by their distribution. The main effort
was to construct a working scheme, the LD approach, out of this basic premise of
the stochastic viewpoint. Technically that included the derivation of a closed set of
stochastic equations for the distribution of the local density of states as the quantity
of interest. In the derivation a complicated set of equations could be collapsed into a
single equation if formulated with the help of distributions. For the solution of this
stochastic equation we discussed the application of Monte Carlo sampling.
As much as we used disordered systems to motivate the central concepts leading
to the LD approach, we also took them as the first example to demonstrate its application.
Notably, even a complex non-local effect like Anderson localization is correctly de-
scribed by distributions of a local quantity. This demonstrates how correlations turn
up in local distributions. On the other hand we had to accept that a disordered sys-
tem is always far from the limit d = ∞. Both the second example – Anderson lo-
calization of a Holstein polaron as an interacting disordered system – and the third
example – the Holstein model at finite temperature – show that we generally cannot
separate temporal fluctuations from spatial ones. The competition between the dif-
ferent physical mechanism present in these problems gives rise to very rich physical
behavior. The central features of such systems become accessible only within a the-
ory which accounts for both spatial and temporal fluctuations on an equal footing,
as the combined LD+DMFT approach does.
⁶ Remember the discussion in the previous section concerning the case of large ω_0, opposite
to the adiabatic limit addressed here. There we noted that only in a coherent polaron band
does Anderson localization affect a polaron like a free – albeit heavy – particle.
There are a number of open questions we could not touch upon here. The calculation
of transport properties is one important example, which is not really understood
at the present stage of development. Taking the Holstein model at finite
temperature as an example, we sketched how to address the issue of transport at
T > 0 in the notoriously difficult limit of small ω_0 by means of a stochastic formulation.
To actually resolve this issue within the LD approach we have to specify a
way to obtain the electrical conductivity from local distributions, aside from the
need to actually perform the numerical calculations. There is as yet no definite answer
ready to be implemented. We nevertheless believe we have given arguments
that thinking in terms of distributions can prove worthwhile here as well. Maybe we
should rephrase our introductory word of warning concerning the content of this
tutorial: It's not just about a method, it's about a way of thinking!
References
1. N.W. Ashcroft, N.D. Mermin, Solid State Physics (Saunders College Publ., Philadelphia,
1976) 505
2. J. Hubbard, Phys. Rev. Lett. 3, 77 (1959) 505
3. R.L. Stratonovich, Dokl. Akad. Nauk SSSR 115, 1097 (1957) 505
4. A. Georges, G. Kotliar, W. Krauth, M.J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996) 505
5. P.W. Anderson, Phys. Rev. 109, 1492 (1958) 506
6. R.J. Elliott, J.A. Krumhansl, P.L. Leath, Rev. Mod. Phys. 46, 465 (1974) 506, 511
7. P.A. Lee, T.V. Ramakrishnan, Rev. Mod. Phys. 57, 287 (1985) 506
8. B. Kramer, A. MacKinnon, Rep. Prog. Phys. 56, 1469 (1993) 506
9. R. Abou-Chacra, D.J. Thouless, P.W. Anderson, J. Phys. C 6, 1734 (1973) 509
10. S.M. Girvin, M. Jonson, Phys. Rev. B 22, 3583 (1980) 514
11. D.E. Logan, P.G. Wolynes, Phys. Rev. B 29, 6560 (1984) 514
12. D.E. Logan, P.G. Wolynes, Phys. Rev. B 36, 4135 (1987) 514
13. V. Dobrosavljević, G. Kotliar, Philos. Trans. Roy. Soc. Lond., Ser. A 356, 57 (1998) 514, 520
14. A. Alvermann, H. Fehske, Eur. Phys. J. B 48, 205 (2005) 515, 519
15. T. Holstein, Ann. Phys. (N.Y.) 8, 343 (1959) 521
16. H. Sumi, J. Phys. Soc. Jpn. 36, 770 (1974) 521
17. Y.A. Firsov, Polarons (Izd. Nauka, Moscow, 1975) 522
18. H. Fehske, A. Alvermann, M. Hohenadler, G. Wellein, in Polarons in Bulk Materials and
Systems With Reduced Dimensionality, International School of Physics Enrico Fermi,
Vol. 161, ed. by G. Iadonisi, J. Ranninger, G. De Filippis (IOS Press, Amsterdam, 2006),
pp. 285–296 522
19. H. Fehske, S.A. Trugman, in Polarons in Advanced Materials, ed. by A.S. Alexandrov,
Springer Series in Materials Science, Vol. 103 (Canopus/Springer, Dordrecht, 2007),
pp. 393–461 522, 523
20. S. Ciuchi, F. de Pasquale, S. Fratini, D. Feinberg, Phys. Rev. B 56, 4494 (1997) 522
21. F.X. Bronold, A. Alvermann, H. Fehske, Philos. Mag. 84, 673 (2004) 522
22. P. Fulde, Electron Correlation in Molecules and Solids (Springer-Verlag, Berlin, 1991) 524
18 Exact Diagonalization Techniques
In this chapter we show how to calculate a few eigenstates of the full Hamiltonian
matrix of an interacting quantum system. Naturally, this implies that the Hilbert
space of the problem has to be truncated, either by considering finite systems or by
imposing suitable cut-offs, or both. All of the presented methods are iterative, i.e.,
the Hamiltonian matrix is applied repeatedly to a set of vectors from the Hilbert
space. In addition, most quantum many-particle problems lead to a sparse matrix
representation of the Hamiltonian, where only a very small fraction of the matrix
elements is non-zero.
Before we can start applying sparse matrix algorithms, we need to translate the con-
sidered many-particle Hamiltonian, given in the language of second quantization,
into a sparse Hermitian matrix. Usually, this is the intellectually and technically
challenging part of the project, in particular, if we want to take into account sym-
metries of the problem.
Typical lattice models in solid state physics involve electrons, spins and phonons.
Within this part we will focus on the Hubbard model,
H = -t \sum_{\langle ij\rangle,\sigma} \big(c_{i\sigma}^\dagger c_{j\sigma} + \text{H.c.}\big) + U \sum_i n_{i\uparrow} n_{i\downarrow}\ , \qquad (18.1)

which describes a single band of electrons c_{i\sigma}^{(\dagger)} (n_{i\sigma} = c_{i\sigma}^\dagger c_{i\sigma}) with on-site Coulomb
interaction U. Originally [1, 2, 3], it was introduced to study correlation effects and
ferromagnetism in narrow band transition metals. After the discovery of the high-T_c
superconductors the model became very popular again, since it is considered
the simplest lattice model which, in two dimensions, may have a superconducting
phase. In one dimension, the model is exactly solvable [4, 5], hence we can check
our numerics for correctness. From the Hubbard model at half-filling, taking the
limit U → ∞, we can derive the Heisenberg model
H = \sum_{\langle ij\rangle} J_{ij}\; \boldsymbol{S}_i \cdot \boldsymbol{S}_j\ , \qquad (18.2)
which accounts for the magnetic properties of insulating compounds that are gov-
erned by the exchange interaction J ∼ t2 /U between localized spins S i . In many
solids the electronic degrees of freedom will also interact with vibrations of the
crystal lattice, described in harmonic approximation by bosons b_i^{(\dagger)} (phonons). This
leads to microscopic models like the Holstein-Hubbard model

H = -t \sum_{\langle ij\rangle,\sigma} (c_{i\sigma}^\dagger c_{j\sigma} + \text{H.c.}) + U \sum_i n_{i\uparrow} n_{i\downarrow} - g\omega_0 \sum_{i,\sigma} (b_i^\dagger + b_i)\, n_{i\sigma} + \omega_0 \sum_i b_i^\dagger b_i\ . \qquad (18.3)
With the methods described in this part, such models can be studied on finite
clusters with a few dozen sites, both at zero and at finite temperature. In special
cases, e.g., for the problem of few polarons, also infinite systems are accessible.
To be specific, let us derive all the general concepts of basis construction for the
Hubbard model on a one-dimensional chain or ring. For a single site i, the Hilbert
space of the model (18.1) consists of four states,
(i) |0⟩ = no electron at site i,
(ii) c_{i\downarrow}^\dagger|0⟩ = one down-spin electron at site i,
(iii) c_{i\uparrow}^\dagger|0⟩ = one up-spin electron at site i, and
(iv) c_{i\uparrow}^\dagger c_{i\downarrow}^\dagger|0⟩ = two electrons at site i.
Consequently, for a finite cluster of L sites, the full Hilbert space has dimension
4^L. This is a rapidly growing number, and without symmetrization we could not go
beyond L ≈ 16 even on the biggest supercomputers.
Given a symmetry of the system, i.e. an operator A that commutes with H,
the Hamiltonian will not mix states from different eigenspaces of A. Therefore,
the matrix representing H will acquire a block structure, and we can handle each
block separately (see Fig. 18.1). The Hubbard Hamiltonian (18.1) has a number of
symmetries:
– Particle number conservation: H commutes with the total particle number

N_e = \sum_{i,\sigma} n_{i\sigma}\ . \qquad (18.4)

– SU(2) spin symmetry: H commutes with all components of the total spin

S^\alpha = \frac{1}{2} \sum_i \sum_{\mu,\nu} c_{i\mu}^\dagger\, \sigma^\alpha_{\mu\nu}\, c_{i\nu}\ , \qquad (18.5)

where \sigma^\alpha denotes the Pauli matrices.
Fig. 18.1. With the use of symmetries the Hamiltonian matrix acquires a block structure.
Here: The matrix for the Hubbard model when particle number conservation is neglected
(left) or taken into account (right)
– Translational invariance: for periodic boundary conditions, H commutes with the translation operator

T: c_{i,\sigma}^{(\dagger)} \to c_{i+1,\sigma}^{(\dagger)}\ . \qquad (18.7)
For the basis construction the most important of these symmetries are the parti-
cle number conservation, the spin-S z conservation and the translational invariance.
Note that the conservation of both S z = (N↑ − N↓ )/2 and Ne = N↑ + N↓ is equiv-
alent to the conservation of the total number of spin-↑ and of spin-↓ electrons, N↑
and N↓ , respectively. In addition to S z we could also fix the total spin S 2 , but the
construction of the corresponding eigenstates is too complicated for most practical
computations.
Let us start with building the basis for a system with L sites and fixed electron
numbers N↑ and N↓ . Each element of the basis can be identified by the positions of
the up and down electrons, but for uniqueness we also need to define some normal
532 A. Weiße and H. Fehske
order. For the Hubbard model it is convenient to first sort the electrons by the spin
index, then by the lattice index, i.e.,

c_{3\uparrow}^\dagger c_{2\uparrow}^\dagger c_{0\uparrow}^\dagger\, c_{3\downarrow}^\dagger c_{1\downarrow}^\dagger\, |0\rangle \qquad (18.9)

is a valid ordered state. This ordering has the advantage that the nearest-neighbor
hopping in the Hamiltonian does not lead to complicated phase factors, when ap-
plied to our basis states. Finding all the basis states is a combinatorics problem:
There are \binom{L}{N_\uparrow} ways of distributing N_\uparrow (indistinguishable) up-spin electrons on L
sites, and similarly, \binom{L}{N_\downarrow} ways of distributing N_\downarrow down-spin electrons on L sites.
Hence, the total number of states in our basis is \binom{L}{N_\uparrow}\binom{L}{N_\downarrow}. If we sum up the dimensions
of all (N_\uparrow, N_\downarrow)-blocks, we obtain

\sum_{N_\uparrow=0}^{L} \sum_{N_\downarrow=0}^{L} \binom{L}{N_\uparrow} \binom{L}{N_\downarrow} = 2^L\, 2^L = 4^L\ , \qquad (18.10)

which is the total Hilbert space dimension we derived earlier. The biggest block in
our symmetrized Hamiltonian has N_\uparrow = N_\downarrow = L/2 and dimension \binom{L}{L/2}^2. This
is roughly a factor of \pi L/2 smaller than the original 4^L. Below we will reduce the
dimension of the biggest block by another factor of L using translational invariance.
Knowing the basic structure and the dimension of the Hilbert space with fixed
particle numbers, how can we implement it on a computer? An efficient way to
do so is to use integer numbers and bit operations that are available in many programming
languages. Assume we work with a lattice of L = 4 sites and N_\uparrow = 3,
N_\downarrow = 2. We can then translate the state of (18.9) into a bit pattern,

c_{3\uparrow}^\dagger c_{2\uparrow}^\dagger c_{0\uparrow}^\dagger\, c_{3\downarrow}^\dagger c_{1\downarrow}^\dagger\, |0\rangle \;\to\; (\uparrow, \uparrow, 0, \uparrow) \times (\downarrow, 0, \downarrow, 0) \;\to\; 1101 \times 1010\ . \qquad (18.11)
To build the other basis states, we need all four-bit integers with three bits set to one,
as well as all four-bit integers with two bits set. We leave this to the reader as a little
programming exercise, and just quote the result in Table 18.1.
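A possible solution of the exercise uses Gosper's hack, which steps from one integer to the next larger one with the same number of set bits:

def bit_patterns(L, N):
    """All L-bit integers with exactly N bits set, in increasing order."""
    if N == 0:
        return [0]
    v, out = (1 << N) - 1, []
    while v < (1 << L):
        out.append(v)
        c = v & -v                        # lowest set bit
        r = v + c                         # ripple the carry
        v = r | (((v ^ r) >> 2) // c)     # Gosper's hack: next pattern
    return out

# reproduces Table 18.1:
print([bin(p)[2:].zfill(4) for p in bit_patterns(4, 3)])
# ['0111', '1011', '1101', '1110']
print([bin(p)[2:].zfill(4) for p in bit_patterns(4, 2)])
# ['0011', '0101', '0110', '1001', '1010', '1100']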
The complete basis is given by all 24 pairs of the four up-spin and the six down-
spin states. Having ordered the bit patterns by the integer values they correspond to,
Table 18.1. Basis states of the Hubbard model on four sites with three up- and two down-spin
electrons
no. ↑-patterns no. ↓-patterns
0 0111 = 7 0 0011 = 3
1 1011 = 11 1 0101 = 5
2 1101 = 13 2 0110 = 6
3 1110 = 14 3 1001 = 9
4 1010 = 10
5 1100 = 12
we can label each state by its indices (i, j) in the list of up and down patterns, or
combine the two indices into an overall index n = i · 6 + j. Our sample state (18.9)
corresponds to the index pair (2, 4), which is equivalent to the state 2 · 6 + 4 = 16
of the total 24 states.
Now we need to find the indices of the resulting states on the right. For the
Hubbard model with its decomposition into two spin channels, we can simply use a
table which translates the integer value of the bit pattern into the index in the list of
up and down spin states (see Table 18.1). Note, however, that this table has a length
of 2L . When simulating spin or phonon models such a table would easily exceed all
available memory. For finding the index of a given basis state we then need to resort
to other approaches, like hashing, fast search algorithms or some decomposition of
the state [6]. Having found the indices and denoting our basis in a ket-notation, |n⟩,
(18.12) reads

↑-hopping: |16⟩ → −t (|10⟩ + |22⟩) ,
↓-hopping: |16⟩ → −t (|14⟩ + |17⟩ + |15⟩ − |12⟩) , (18.13)
U-term: |16⟩ → U |16⟩ .
To obtain the complete Hamiltonian matrix we have to repeat this procedure for all
24 basis states. In each case we obtain a maximum of 2L = 8 off-diagonal non-
zero matrix elements. Thus, the matrix is indeed very sparse (see Fig. 18.2). The
generalization of the above considerations to arbitrary values of L, N_\uparrow, and N_\downarrow is
straightforward. For spatial dimensions larger than one we need to be a bit more
careful with fermionic phase factors. In general, minus signs will occur not only at
the boundaries, but also for other hopping processes.
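A hypothetical helper for this bookkeeping, based on the bit representation: the sign of a hop is (−1) raised to the number of occupied sites the electron passes (the precise convention depends on the chosen operator ordering):

def hop_sign(pattern, i, j):
    """Fermionic sign for a hop between sites i and j within one spin
    channel: (-1)^(number of occupied sites strictly between i and j)."""
    lo, hi = sorted((i, j))
    between = pattern & (((1 << hi) - 1) ^ ((1 << (lo + 1)) - 1))
    return -1 if bin(between).count("1") % 2 else +1

# example: the boundary hop 3 -> 0 in the down-pattern 1101 passes the
# occupied site 2, which produces the minus sign of the |12> term in (18.13)
print(hop_sign(0b1101, 3, 0))   # -1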
Fig. 18.2. Schematic representation of the Hamiltonian matrix of the Hubbard model with
L = 4, N↑ = 3, N↓ = 2, and periodic boundary conditions
Clearly, for a given (unsymmetrized) state |n⟩, the state P_k|n⟩, with the projection operator

P_k = \frac{1}{L} \sum_{j=0}^{L-1} e^{2\pi i jk/L}\, T^j\ ,

is an eigenstate of T,

T P_k |n\rangle = \frac{1}{L} \sum_{j=0}^{L-1} e^{2\pi i jk/L}\, T^{j+1} |n\rangle = e^{-2\pi i k/L}\, P_k |n\rangle\ , \qquad (18.15)

and P_k^2 = P_k, as we expect for a projector. Hence, ⟨n|P_k^\dagger P_k|n⟩ = ⟨n|P_k^2|n⟩ = ⟨n|P_k|n⟩. For
most |n⟩ the states T^j|n⟩ with j = 0, 1, …, (L−1) will differ from each other,
therefore ⟨n|P_k|n⟩ = 1/L. However, some states are mapped onto themselves by a
translation T^{ν_n} with ν_n < L, i.e., T^{ν_n}|n⟩ = e^{iφ_n}|n⟩ with a phase φ_n (usually 0 or π).
We can call this group of connected states a cycle, which is completely described
by knowing one of its members. It is convenient to use the pattern with the smallest
integer value to be this special member of the cycle, and we call it the representative
of the cycle.
Applying the projector to the representative of the cycle, P_k|0⟩_↑, we can generate
L linearly independent states, given in our case by (18.19).
The advantage of these new states, which are linear combinations of all members of
the cycle in a spirit similar to discrete Fourier transformation, becomes clear when
we apply the Hamiltonian: Whereas the Hamiltonian mixes the states in (18.18), all
matrix elements between the states in (18.19) vanish. Hence, we have decomposed
the four-dimensional Hilbert space into four one-dimensional blocks.
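In code, cycles and representatives can be found with a few bit operations. The sketch below ignores the fermionic phase factors φ_n, which have to be tracked separately, and reproduces the cycles of Table 18.1:

def translate(pattern, L):
    """Cyclic shift of an L-bit occupation pattern by one site (no sign)."""
    return ((pattern << 1) | (pattern >> (L - 1))) & ((1 << L) - 1)

def find_cycles(patterns, L):
    """Group patterns into cycles under translation T; the representative
    of each cycle is its member with the smallest integer value."""
    cycles, seen = {}, set()
    for p in patterns:
        if p in seen:
            continue
        cycle, t = [], p
        while t not in seen:
            seen.add(t)
            cycle.append(t)
            t = translate(t, L)
        cycles[min(cycle)] = cycle
    return cycles

# the up-patterns form one cycle of length 4, the down-patterns two cycles
# of lengths 4 and 2 (cf. nu_1 = 2 for |1>_down = 0101)
print(find_cycles([7, 11, 13, 14], 4))       # {7: [7, 14, 13, 11]}
print(find_cycles([3, 5, 6, 9, 10, 12], 4))  # {3: [3, 6, 12, 9], 5: [5, 10]}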
In a next step we repeat this procedure for the ↓-patterns of Table 18.1. These
can be decomposed into two cycles represented by the states |0⟩_↓ = 0011 and
|1⟩_↓ = 0101, where due to T²|1⟩_↓ = −|1⟩_↓ the second cycle has size ν₁ = 2.
Note that we also have phase factors here, since the number of fermions is even.
To get the complete symmetrized basis, we need to combine the up and down spin
representatives, thereby taking into account relative shifts between the states. For
our sample case the combined representatives,
no. patterns
0 0111 × 0011
1 0111 × 0110
2 0111 × 1100
3 0111 × 1001
4 0111 × 0101
5 0111 × 1010
(18.20)
generate the symmetrized basis states

|r_k\rangle = \frac{P_k |r\rangle}{\sqrt{\langle r|P_k|r\rangle}}\ , \qquad (18.21)

where we discard those |r⟩ with ⟨r|P_k|r⟩ = 0. In our example all six states
have ⟨r|P_k|r⟩ = 1/4 ∀ k and no state is discarded. Therefore the dimension of each
fixed-k space is six, and summing over all four k we obtain the original number of
states, 24. For other particle numbers or lattice sizes we may obtain representatives
|r⟩ with ⟨r|P_k|r⟩ = 0 for certain k. An example is the case N_\uparrow = N_\downarrow = 2, L = 4,
which leads to ten representatives, but two of them have ⟨r|P_k|r⟩ = 0 for k = 1 and
k = 3. Adding the dimensions of the four k-subspaces, we find 10 + 8 + 10 + 8 = 36,
which agrees with \binom{L}{N_\uparrow}\binom{L}{N_\downarrow} = 6^2.
When calculating the Hamiltonian matrix for a given k-sector, we can make use
of the fact that H commutes with T , and therefore also with Pk . Namely, the matrix
element between two states |r_k⟩ and |r'_k⟩ is simply given by

\langle r'_k|H|r_k\rangle = \frac{\langle r'|P_k^\dagger H P_k|r\rangle}{\sqrt{\langle r'|P_k|r'\rangle\, \langle r|P_k|r\rangle}} = \frac{\langle r'|H P_k|r\rangle}{\sqrt{\langle r'|P_k|r'\rangle\, \langle r|P_k|r\rangle}}\ , \qquad (18.22)

i.e., we need to apply the projector only once after we have applied H to the representative
|r⟩. Repeating the procedure for all representatives, we obtain the matrix for a
given k. The full matrix with fixed particle numbers N_\uparrow and N_\downarrow is decomposed into
L blocks with fixed k. For example, the 24×24 matrix from Fig. 18.2 is decomposed
into the four 6 × 6 matrices
H_{k=0} = \begin{pmatrix}
2U & -t & -t & t & t & 0 \\
-t & 2U & -t & -t & 0 & t \\
-t & -t & 2U & 0 & -t & -t \\
t & -t & 0 & U & -t & t \\
t & 0 & -t & -t & U & -t \\
0 & t & -t & t & -t & U
\end{pmatrix}
\quad
H_{k=1} = \begin{pmatrix}
2U & -t & -it & -it & t & 0 \\
-t & 2U & -t & -t & -2it & t \\
it & -t & 2U & 0 & -t & -it \\
it & -t & 0 & U & -t & -it \\
t & 2it & -t & -t & U & -t \\
0 & t & it & it & -t & U
\end{pmatrix}

H_{k=2} = \begin{pmatrix}
2U & -t & t & -t & t & 0 \\
-t & 2U & -t & -t & 0 & t \\
t & -t & 2U & 0 & -t & t \\
-t & -t & 0 & U & -t & -t \\
t & 0 & -t & -t & U & -t \\
0 & t & t & -t & -t & U
\end{pmatrix}
\quad
H_{k=3} = \begin{pmatrix}
2U & -t & it & it & t & 0 \\
-t & 2U & -t & -t & 2it & t \\
-it & -t & 2U & 0 & -t & it \\
-it & -t & 0 & U & -t & it \\
t & -2it & -t & -t & U & -t \\
0 & t & -it & -it & -t & U
\end{pmatrix} \qquad (18.23)
Note that except for k = 0 and k = 2, which correspond to the momenta zero and
π, the matrices Hk are complex. Their dimension, however, is a factor of L smaller
than the dimension of the initial space with fixed particle numbers. At first glance,
the above matrices look rather dense. This is due to the small dimension of our
sample system. For larger L and Ne the Hamiltonian is as sparse as the example of
Fig. 18.1.
Having constructed a symmetrized basis for the Hubbard and Heisenberg type mod-
els, let us now comment on bosonic models and phonons, in particular. For such sys-
tems the particle number is usually not conserved, and the accessible Hilbert space
is infinite even for a single site. For numerical studies we therefore need an appro-
priate truncation scheme, which preserves enough of the Hilbert space to describe
the considered physics, but restricts the dimension to manageable values. Assume
we are studying a model like the Holstein-Hubbard model (18.3), where the pure
phonon part is described by a set of harmonic Einstein oscillators, one at each site.
For an L-site lattice the eigenstates of this phonon system are given by the Fock
states

|m_0, \dots, m_{L-1}\rangle = \prod_{i=0}^{L-1} \frac{(b_i^\dagger)^{m_i}}{\sqrt{m_i!}}\; |0\rangle\ , \qquad (18.25)

and the corresponding eigenvalue is

E_p = \omega_0 \sum_{i=0}^{L-1} m_i\ . \qquad (18.26)
If we are interested in the ground state or the low energy properties of the interacting
electron-phonon model (18.3), certainly only phonon states with a rather low energy
will contribute. Therefore, a good truncated basis for the phonon Hilbert space is
given by the states

|m_0, \dots, m_{L-1}\rangle \quad \text{with} \quad \sum_{i=0}^{L-1} m_i \le M\ , \qquad (18.27)

which include all states with E_p ≤ ω_0 M. The dimension of the resulting Hilbert
space is \binom{L+M}{M}.
To keep the required M small, we apply another trick [7]. After Fourier transforming
the phonon subsystem,

b_i = \frac{1}{\sqrt L} \sum_{k=0}^{L-1} e^{2\pi i\, ik/L}\, \tilde b_k\ , \qquad (18.28)

we observe that the phonon mode with k = 0 couples to a conserved quantity, the
total number of electrons N_e,

H = -t \sum_{\langle ij\rangle,\sigma} (c_{i\sigma}^\dagger c_{j\sigma} + \text{H.c.}) + U \sum_i n_{i\uparrow} n_{i\downarrow} + \omega_0 \sum_k \tilde b_k^\dagger \tilde b_k
  - \frac{g\omega_0}{\sqrt L} \sum_{i,\sigma} \sum_{k\neq 0} e^{-2\pi i\, ik/L}\, (\tilde b_k^\dagger + \tilde b_{-k})\, n_{i\sigma} - \frac{g\omega_0}{\sqrt L}\, (\tilde b_0^\dagger + \tilde b_0)\, N_e\ . \qquad (18.29)
With a constant shift \tilde b_0 = \bar b_0 + g N_e/\sqrt L this part of the model can thus be solved
analytically. Going back to real space and using the equivalently shifted phonons
b_i = \bar b_i + g N_e/L, the transformed Hamiltonian reads

H = -t \sum_{\langle ij\rangle,\sigma} (c_{i\sigma}^\dagger c_{j\sigma} + \text{H.c.}) + U \sum_i n_{i\uparrow} n_{i\downarrow} + \omega_0 \sum_i \bar b_i^\dagger \bar b_i
  - g\omega_0 \sum_i (\bar b_i^\dagger + \bar b_i)(n_{i\uparrow} + n_{i\downarrow} - N_e/L) - \omega_0 (g N_e)^2/L\ . \qquad (18.30)
Since the shifted phonons \bar b_i^{(\dagger)} couple only to the local charge fluctuations, in a simulation
the same accuracy can be achieved with a much smaller cutoff M, compared
to the original phonons b_i^{(\dagger)}. This is particularly important in the case of strong interaction g.
As in the electronic case, we can further reduce the basis dimension using the
translational symmetry of our lattice model. Under periodic boundary conditions,
the translation operator T transforms a given basis state like

T\, |m_0, m_1, \dots, m_{L-1}\rangle = |m_{L-1}, m_0, \dots, m_{L-2}\rangle\ .
Since we are working with bosons, no additional phase factors can occur, and everything
is a bit easier. As before, we need to find the representatives |r_p⟩ of the cycles
generated by T, and then construct eigenstates of T with the help of the projection
operator P_k. When combining the electronic representatives |r_e⟩ from (18.20) with
the phonon representatives |r_p⟩, we proceed in the same way as we did for the up
and down spin channels, |r⟩ = |r_e⟩ T^j |r_p⟩. A full symmetrized basis state of the
interacting electron-phonon model is then given by P_k|r⟩. Note that the product
structure of the electron-phonon basis is preserved during symmetrization, which is
a big advantage for parallel implementations [8].
Having explained the construction of a symmetrized basis and of the correspond-
ing Hamiltonian matrix for both electron and phonon systems, we are now ready to
work with these matrices. In particular, we will show how to calculate eigenstates
and dynamic correlations of our physical systems.
The Lanczos algorithm is one of the simplest methods for the calculation of ex-
tremal (smallest or largest) eigenvalues of sparse matrices [9]. Initially it was devel-
oped for the tridiagonalization of Hermitian matrices [10], but it turned out not to
be particularly successful for this purpose. The reason for its failure as a tridiagonalization
algorithm is the underlying recursion procedure, which rapidly converges to
eigenstates of the matrix and therefore loses the orthogonality between subsequent
vectors that is required for tridiagonalization. Sometimes, however, deficiencies turn
into advantages, and the Lanczos algorithm made a successful career as an eigen-
value solver.
The basic structure and the implementation of the algorithm is very simple.
Starting from a random initial state (vector) |φ0 , we construct the series of states
540 A. Weiße and H. Fehske
H n |φ0 by repeatedly applying the matrix H (i.e., the Hamiltonian). This series of
states spans what is called a Krylov space in the mathematical literature, and the
Lanczos algorithm therefore belongs to a broader class of algorithms that work on
Krylov spaces [11]. Next we orthogonalize these states against each other to obtain
a basis of the Krylov space. Expressed in terms of this basis, the matrix turns out
to be tridiagonal. We can easily perform these two steps in parallel, and obtain the
following recursion relation:

\beta_{n+1}\, |\phi_{n+1}\rangle = H |\phi_n\rangle - \alpha_n |\phi_n\rangle - \beta_n |\phi_{n-1}\rangle\ ,

with \alpha_n = \langle\phi_n|H|\phi_n\rangle, \beta_{n+1} = \big\| H|\phi_n\rangle - \alpha_n|\phi_n\rangle - \beta_n|\phi_{n-1}\rangle \big\|, and |\phi_{-1}\rangle = 0. In the
orthonormal basis \{|\phi_n\rangle\}_{n=0}^{N-1} the matrix H is represented by a tridiagonal matrix \tilde H_N
with diagonal entries \alpha_n and off-diagonal entries \beta_n. With increasing recursion order N
the eigenvalues of \tilde H_N – starting with the extremal ones – converge to the eigenvalues
of the original matrix H. In Fig. 18.4 we
illustrate this for the ground-state energy of the one-dimensional Hubbard model
(18.1) on a ring of 12 and 14 sites. Using only particle number conservation, the
corresponding matrix dimensions are D = \binom{12}{6}^2 = 853776 and D = \binom{14}{7}^2 =
11778624, respectively. With about 90 iterations the precision of the lowest eigenvalue
is better than 10^{-13}, where we compare with the exact result obtained with the
Bethe ansatz [4]. The eigenvalues of the tridiagonal matrix were calculated with
standard library functions from the LAPACK collection [12]. Since N ≪ D, this
accounts only for a tiny fraction of the total computation time, which is governed
by the application of H to |\phi_n\rangle.
Having found the extremal eigenvalues, we can also calculate the corresponding
eigenvectors of the matrix. If the eigenvector |\psi\rangle of the tridiagonal matrix \tilde H_N has
the components \psi_j, i.e., |\psi\rangle = \{\psi_0, \psi_1, \dots, \psi_{N-1}\}, the eigenvector |\Psi\rangle of the
original matrix H is given by

|\Psi\rangle = \sum_{j=0}^{N-1} \psi_j\, |\phi_j\rangle\ . \qquad (18.34)
Fig. 18.4. Convergence of the Lanczos recursion for the ground-state energy of the Hubbard
model on a ring of L = 12 and L = 14 sites
To calculate this sum we simply need to repeat the above Lanczos recursion with
the same start vector |\phi_0\rangle, thereby omitting the scalar products for the α_j and β_j, which
we know already.
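For illustration, a bare-bones Python sketch of the recursion, which returns the lowest eigenvalue of a sparse Hermitian matrix H (given, e.g., as a scipy.sparse matrix). No reorthogonalization is performed, and for complex H a complex start vector should be used; all names are illustrative:

import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_lowest(H, n_iter=100, seed=0):
    """Plain Lanczos iteration: estimate of the lowest eigenvalue of a
    sparse Hermitian matrix H, keeping only two vectors in memory."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(H.shape[0])
    phi /= np.linalg.norm(phi)
    phi_prev = np.zeros_like(phi)
    alpha, beta = [], [0.0]
    for _ in range(n_iter):
        w = H @ phi - beta[-1] * phi_prev       # one MVM per iteration
        a = np.vdot(phi, w).real
        w -= a * phi
        b = np.linalg.norm(w)
        alpha.append(a)
        beta.append(b)
        if b < 1e-14:                           # exact invariant subspace found
            break
        phi_prev, phi = phi, w / b
    return eigh_tridiagonal(np.asarray(alpha), np.asarray(beta[1:len(alpha)]),
                            eigvals_only=True)[0]

Re-running the recursion with the same seed reproduces the identical sequence |φ_j⟩, which is exactly what the eigenvector computation via (18.34) exploits.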
The efficiency of the Lanczos algorithm is based on three main properties:
(i) It relies only on matrix vector multiplications (MVM) of the matrix H with a
certain vector |φn . If H is sparse, this requires only of the order of D opera-
tions, where D is the dimension of H.
(ii) When calculating eigenvalues, the algorithm requires memory only for two
vectors of dimension D and for the matrix H. For exceptionally large prob-
lems, the matrix can be re-constructed on-the-fly for each MVM, and the mem-
ory consumption is determined by the vectors. When calculating eigenvectors
we need extra memory.
(iii) The first few eigenvalues on the upper and lower end of the spectrum of H
usually converge very quickly. In most cases N ≲ 100 iterations are sufficient.
Extensions of the Lanczos algorithm can also be used for calculating precise
estimates of the full spectral density of H, or of dynamical correlation functions
that depend on the spectrum of H and on the measured operators. We will discuss
more details in Chap. 19 when we describe Chebyshev expansion based methods,
such as the Kernel Polynomial Method.
An alternative to the Lanczos algorithm is the Jacobi-Davidson algorithm [13], which
combines ideas that go back to Jacobi [15] and Davidson [14]. It has the advantage that
not only the lowest eigenstates but also excitations converge
rapidly. In addition, it can correctly resolve degeneracies.
In the Jacobi-Davidson algorithm, like in the Lanczos algorithm, a set of vectors
VN = {|v0 , . . . , |vN −1 } is constructed iteratively, and the eigenvalue problem
for the Hamiltonian H is solved within this subspace. However, in contrast to the
Lanczos algorithm, we do not work in the Krylov space of H, but instead expand
VN with a vector that is orthogonal to our current approximate eigenstates. In more
detail, the procedure is as follows:
(i) Initialize the set V with a random normalized start vector, V_1 = {|v_0⟩}.
(ii) Compute all unknown matrix elements ⟨v_i|H|v_j⟩ of \tilde H_N with |v_i⟩ ∈ V_N.
(iii) Compute an eigenstate |s⟩ of \tilde H_N with eigenvalue θ, and express |s⟩ in the
original basis, |u⟩ = \sum_i |v_i⟩⟨v_i|s⟩.
(iv) Compute the associated residual vector |r⟩ = (H − θ)|u⟩ and stop the iteration
if its norm is sufficiently small.
(v) Otherwise, (approximately) solve the linear equation

(1 − |u⟩⟨u|)\,(H − θ)\,(1 − |u⟩⟨u|)\,|t⟩ = −|r⟩\ , \qquad |t⟩ \perp |u⟩\ . \qquad (18.35)

(vi) Orthogonalize |t⟩ against V_N with the modified Gram-Schmidt method and
append the resulting vector |v_N⟩ to V_N, obtaining the set V_{N+1}.
(vii) Return to step (ii).
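To make the steps concrete, here is a deliberately simple Python sketch for a single eigenpair of a real symmetric matrix, with the correction equation (18.35) solved approximately by a few GMRES steps; restarts and the extended projector for several eigenpairs (see below) are omitted, and all names are illustrative:

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jacobi_davidson_lowest(H, n_iter=50, tol=1e-8, seed=0):
    """Minimal Jacobi-Davidson sketch: lowest eigenpair of a real
    symmetric sparse matrix H; single eigenpair, no restarts."""
    rng = np.random.default_rng(seed)
    D = H.shape[0]
    v0 = rng.standard_normal(D)
    V = [v0 / np.linalg.norm(v0)]                     # step (i)
    for _ in range(n_iter):
        Vm = np.column_stack(V)
        Hn = Vm.T @ (H @ Vm)                          # step (ii)
        w, s = np.linalg.eigh(Hn)
        theta, u = w[0], Vm @ s[:, 0]                 # step (iii)
        r = H @ u - theta * u                         # step (iv)
        if np.linalg.norm(r) < tol:
            break
        P = lambda x, u=u: x - u * (u @ x)            # projector (1 - |u><u|)
        A = LinearOperator((D, D),
                           matvec=lambda x: P((H @ P(x)) - theta * P(x)))
        t, _ = gmres(A, -r, maxiter=10)               # step (v), approximate
        for w_ in V:                                  # step (vi), Gram-Schmidt
            t = t - w_ * (w_ @ t)
        V.append(t / np.linalg.norm(t))
    return theta, u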
Fig. 18.5. Comparison of the Jacobi-Davidson algorithm and the Lanczos algorithm applied
to the four lowest eigenstates of the Hubbard model with L = 12, N↓ = 5, N↑ = 6. Jacobi-
Davidson correctly resolves the two-fold degeneracy, standard Lanczos (although faster) can-
not distinguish true and artificial degeneracy
For (18.35) we only need an approximate solution, which can be obtained, for
instance, with a few steps of the Generalized Minimum Residual Method (GMRES)
or the Quasi Minimum Residual Method (QMR) [16]. If more than one eigenstate
is desired, the projection operator (1 − |u⟩⟨u|) needs to be extended by the already
converged eigenstates, (1 − \sum_k |u_k⟩⟨u_k|), such that the search continues in a new,
yet unexplored direction.
yet unexplored direction. Since the Jacobi-Davidson algorithm requires memory for
all the vectors in VN , it is advisable to restart the calculation after a certain number
of steps. There are clever strategies for this restart, and also for the calculation of
interior eigenstates, which are hard to access with Lanczos. More details can be
found in the original papers [13, 17] or in text books [18].
In Fig. 18.5 we give a comparison of the Lanczos and the Jacobi-Davidson al-
gorithms, calculating the four lowest eigenstates of the Hubbard model on a ring
of L = 12 sites with N↓ = 5 and N↑ = 6 electrons. The matrix dimension is
D = 731808, and each of the lowest states is two-fold degenerate. In terms of speed
and memory consumption the Lanczos algorithm has a clear advantage, but with
the standard setup we have difficulties resolving the degeneracy. The method tends
to create artificial copies of well converged eigenstates, which are indistinguishable
from the true degenerate states. The problem can be circumvented with more ad-
vanced variants of the algorithm, such as Block or Band Lanczos [9, 18], but we
loose the simplicity of the method and part of its speed. Jacobi-Davidson then is
a strong competitor. It is not much slower and it correctly detects the two-fold de-
generacy, since the converged eigenstates are explicitly projected out of the search
space.
References
1. J. Hubbard, Proc. Roy. Soc. London, Ser. A 276, 238 (1963) 529
2. M.C. Gutzwiller, Phys. Rev. Lett. 10, 159 (1963) 529
3. J. Kanamori, Prog. Theor. Phys. 30, 275 (1963) 529
4. E.H. Lieb, F.Y. Wu, Phys. Rev. Lett. 20, 1445 (1968) 529, 540
5. F.H.L. Essler, H. Frahm, F. Göhmann, A. Klümper, V.E. Korepin, The One-Dimensional
Hubbard Model (Cambridge University Press, Cambridge, 2005) 529
6. R. Sedgewick, Algorithmen (Addison-Wesley, Bonn, 1992) 533
7. S. Sykora, A. Hübsch, K.W. Becker, G. Wellein, H. Fehske, Phys. Rev. B 71, 045112
(2005) 538
8. B. Bäuml, G. Wellein, H. Fehske, Phys. Rev. B 58, 3663 (1998) 539
9. J.K. Cullum, R.A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, vol. I & II (Birkhäuser, Boston, 1985) 539, 543
10. C. Lanczos, J. Res. Nat. Bur. Stand. 45, 255 (1950) 539
11. Y. Saad, Numerical Methods for Large Eigenvalue Problems (Manchester University Press,
Manchester, 1992). URL http://www-users.cs.umn.edu/~saad/books.html 540
12. Linear Algebra PACKage. URL http://www.netlib.org 540
13. G.L.G. Sleijpen, H.A. van der Vorst, SIAM J. Matrix Anal. Appl. 17, 401 (1996) 541, 543
14. E.R. Davidson, J. Comput. Phys. 17, 87 (1975) 541
15. C.G.J. Jacobi, J. Reine und Angew. Math. 30, 51 (1846) 541
544 A. Weiße and H. Fehske
16. Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd edn. (SIAM, Philadelphia,
2003). URL http://www-users.cs.umn.edu/~saad/books.html 543
17. D.R. Fokkema, G.L.G. Sleijpen, H.A. van der Vorst, SIAM J. Sci. Comp. 20, 94 (1998) 543
18. Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, H. van der Vorst (eds.), Templates for the
Solution of Algebraic Eigenvalue Problems: A Practical Guide (SIAM, Philadelphia,
2000). URL http://www.cs.utk.edu/~dongarra/etemplates/ 543
19 Chebyshev Expansion Techniques
With the Lanczos and the Jacobi-Davidson algorithm we are able to calculate a
few of the many eigenstates of a sparse matrix. However, it is hardly feasible to
calculate all eigenstates of matrices with dimensions larger than a million, not to
speak of dimensions like 10^9. Nevertheless, we are interested in dynamic correlation
functions and finite temperature properties, which depend on the complete spectrum
of the Hamiltonian.
In this chapter we introduce the Kernel Polynomial Method (KPM), a numerical
approach that on the basis of Chebyshev expansion allows a very precise calcula-
tion of the spectral properties of large sparse matrices and of the static and dynamic
correlation functions, which depend on them. In addition, we show how the KPM
successfully competes against the very popular Lanczos Recursion and Maximum
Entropy Method and can be easily embedded into other numerical techniques, such
as Cluster Perturbation Theory or Monte Carlo simulation. Characterized by a re-
source consumption that scales linearly with the problem dimension the KPM en-
joyed growing popularity over the last decade and found broad application not only
in physics (for a recent more detailed review see [1]).
Let us first recall the basic properties of expansions in orthogonal polynomials and
of Chebyshev expansion in particular. Given a positive weight function w(x) defined
on the interval [a, b] we can introduce a scalar product

\langle f|g\rangle = \int_a^b w(x)\, f(x)\, g(x)\; dx \qquad (19.1)

between two integrable functions f, g: [a, b] → R. With respect to this scalar product
there exists a complete set of polynomials p_n(x), which fulfil the orthogonality
relations \langle p_n|p_m\rangle = \delta_{n,m}/h_n, where h_n = 1/\langle p_n|p_n\rangle denotes the inverse of the
squared norm of p_n(x). These orthogonality relations allow for an easy expansion
of a given function f(x) in terms of the p_n(x), since the expansion coefficients are
proportional to the scalar products of f and p_n,

f(x) = \sum_{n=0}^{\infty} \alpha_n\, p_n(x) \qquad (19.2)

with \alpha_n = \langle p_n|f\rangle\, h_n.
In general, all types of orthogonal polynomials can be used for such an expan-
sion and for the KPM approach which we discuss in this chapter (see e.g. [2]).
However, as we frequently observe whenever we work with polynomial expansions
[3], Chebyshev polynomials [4, 5] of first and second kind turn out to be the best
choice for most applications, mainly due to the good convergence properties of the
corresponding series and the close relation to Fourier transform [6, 7]. The latter
is also an important prerequisite for the derivation of optimal kernels (see below),
which are required for the regularization of finite-order expansions, and which so
far have not been derived for other sets of orthogonal polynomials.
There are two sets of Chebyshev polynomials, both defined on the interval
[a, b] = [−1, 1]: The weight function w(x) = (\pi\sqrt{1-x^2})^{-1} yields the polynomials
of first kind, T_n, and the weight function w(x) = \pi\sqrt{1-x^2} those of second
kind, U_n. In what follows we focus on the T_n = \cos(n \arccos(x)), which
after substituting x = \cos(\varphi) can be shown to fulfil the orthogonality relation
\langle T_n|T_m\rangle = \delta_{n,m}(1 + \delta_{n,0})/2. Moreover, we can easily prove the recursion relation

T_{m+1}(x) = 2\, x\, T_m(x) - T_{m-1}(x)\ , \qquad (19.3)

and the addition formula

2\, T_m(x)\, T_n(x) = T_{m+n}(x) + T_{m-n}(x)\ , \qquad (19.4)

where T_{-n}(x) = T_n(x) and T_0(x) = 1.
Expanding a function f in the standard way of (19.2), the determination of the
coefficients \langle T_n|f\rangle requires integrations over the weight function w(x), see (19.1).
In practical applications to matrix problems this prohibits a simple iterative scheme,
but a solution follows from a slight rearrangement of the expansion, namely

f(x) = \frac{1}{\pi\sqrt{1-x^2}} \Big[ \mu_0 + 2 \sum_{n=1}^{\infty} \mu_n\, T_n(x) \Big] \qquad (19.5)

with the modified expansion coefficients (moments)

\mu_n = \int_{-1}^{1} f(x)\, T_n(x)\; dx\ . \qquad (19.6)
These two equations are the general basis for the Chebyshev expansion. In the
remaining sections we will explain how to translate physical quantities into polyno-
mial expansions of the form of (19.5), how to calculate the moments μn in practice,
and how to improve the convergence of the approach.
As a first step of any expansion, we rescale the Hamiltonian H and all energy scales E
such that the spectrum fits into the domain [−1, 1] of the Chebyshev polynomials,

\tilde H = \frac{H - b}{a}\ , \qquad \tilde E = \frac{E - b}{a}\ , \qquad (19.7)

and denote all rescaled quantities with a tilde hereafter. Given the extremal eigenvalues
of the Hamiltonian, E_min and E_max, which can be calculated, e.g. with the
Lanczos algorithm [8], or for which bounds may be known analytically, the scaling
factors a and b read a = (E_max − E_min)/(2 − ǫ), b = (E_max + E_min)/2. The parameter
ǫ is a small cut-off introduced to avoid stability problems that arise if the spectrum
includes or exceeds the boundaries of the interval [−1, 1]. It can be fixed, e.g. to
ǫ = 0.01, or adapted to the resolution of the calculation, which for an expansion of
finite order N is proportional to 1/N (see below).
The next similarity of most Chebyshev expansions is the form of the moments,
namely their dependence on the matrix or Hamiltonian \tilde H. In general, we find two
types of moments: Simple expectation values of Chebyshev polynomials in \tilde H,

\mu_n = \langle\beta|\, T_n(\tilde H)\, |\alpha\rangle\ , \qquad (19.8)

where |α⟩ and |β⟩ are certain states of the system, or traces over such polynomials
and a given operator A,

\mu_n = \mathrm{Tr}[A\, T_n(\tilde H)]\ . \qquad (19.9)

Handling the first case is rather straightforward. Starting from the state |α⟩ we
can iteratively construct the states |\alpha_n\rangle = T_n(\tilde H)|\alpha\rangle by using the recursion relations
for the T_n (see (19.3)),

|\alpha_0\rangle = |\alpha\rangle\ , \qquad |\alpha_1\rangle = \tilde H |\alpha_0\rangle\ , \qquad |\alpha_{n+1}\rangle = 2\tilde H |\alpha_n\rangle - |\alpha_{n-1}\rangle\ , \qquad (19.10)

and obtain the moments as \mu_n = \langle\beta|\alpha_n\rangle.
If |β⟩ = |α⟩, we can use the addition formula (19.4) to obtain two moments from each |α_n⟩,

\mu_{2n} = 2\,\langle\alpha_n|\alpha_n\rangle - \mu_0\ , \qquad \mu_{2n+1} = 2\,\langle\alpha_{n+1}|\alpha_n\rangle - \mu_1\ , \qquad (19.11)

which is equivalent to two moments per MVM. The numerical effort for N moments
is thus reduced by a factor of two. In addition, like many other numerical approaches
KPM benefits considerably from the use of symmetries that reduce the Hilbert space
dimension.
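In code, (19.8) and (19.10) amount to one sparse MVM per moment. A sketch with illustrative names, H_t being the rescaled matrix H̃:

import numpy as np

def kpm_moments(H_t, alpha, beta, N):
    """Moments mu_n = <beta|T_n(H_t)|alpha>, n = 0..N-1, via the
    Chebyshev recursion (19.10); one MVM per moment."""
    a_prev, a_cur = alpha, H_t @ alpha
    mu = [np.vdot(beta, a_prev), np.vdot(beta, a_cur)]
    for _ in range(2, N):
        a_prev, a_cur = a_cur, 2 * (H_t @ a_cur) - a_prev
        mu.append(np.vdot(beta, a_cur))
    return np.array(mu)

For |β⟩ = |α⟩ the relations (19.11) halve the number of MVMs.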
The second case, where the moments depend on a trace over the whole Hilbert
space, at first glance looks far more complicated. Based on the previous considerations
we would estimate the numerical effort to be proportional to D², because the
iteration needs to be repeated for all D states of a given basis. It turns out, however,
that extremely good approximations of the moments can be obtained with a much
simpler approach: The stochastic evaluation of the trace [2, 9, 10], i.e., an estimate
of μ_n based on the average over only a small number R ≪ D of randomly chosen
states |r⟩,

\mu_n = \mathrm{Tr}[A\, T_n(\tilde H)] \approx \frac{1}{R} \sum_{r=0}^{R-1} \langle r|\, A\, T_n(\tilde H)\, |r\rangle\ . \qquad (19.12)
The number of random states R does not scale with D. It can be kept constant
or even reduced with increasing D. To understand this, let us consider the convergence
properties of the above estimate. Given an arbitrary basis {|i⟩} and a set of
independent identically distributed random variables ξ_{ri} ∈ C, which in terms of the
statistical average ⟨⟨…⟩⟩ fulfil

\langle\langle \xi_{ri} \rangle\rangle = 0\ , \qquad \langle\langle \xi_{ri}\, \xi_{r'j} \rangle\rangle = 0\ , \qquad \langle\langle \xi_{ri}^*\, \xi_{r'j} \rangle\rangle = \delta_{rr'}\, \delta_{ij}\ , \qquad (19.13)

a random vector is defined through |r\rangle = \sum_{i=0}^{D-1} \xi_{ri}\, |i\rangle. We can now calculate the
statistical expectation value of the trace estimate \Theta = \frac{1}{R}\sum_{r=0}^{R-1} \langle r|B|r\rangle for some
Hermitian operator B with matrix elements B_{ij} = \langle i|B|j\rangle, and indeed find

\langle\langle \Theta \rangle\rangle = \frac{1}{R} \sum_{r=0}^{R-1} \big\langle\big\langle \langle r|B|r\rangle \big\rangle\big\rangle = \frac{1}{R} \sum_{r=0}^{R-1} \sum_{i,j=0}^{D-1} \langle\langle \xi_{ri}^*\, \xi_{rj} \rangle\rangle\, B_{ij} = \sum_{i=0}^{D-1} B_{ii} = \mathrm{Tr}(B)\ . \qquad (19.14)
Of course, this only shows that we obtain the correct result on average. To assess
the associated error we also need to study the fluctuation of Θ, which is characterized
by (δΘ)² = ⟨⟨Θ²⟩⟩ − ⟨⟨Θ⟩⟩². Evaluating ⟨⟨Θ²⟩⟩, we get for the fluctuation

(\delta\Theta)^2 = \frac{1}{R} \Big[ \mathrm{Tr}(B^2) + \big( \langle\langle |\xi_{ri}|^4 \rangle\rangle - 2 \big) \sum_{j=0}^{D-1} B_{jj}^2 \Big]\ . \qquad (19.15)
In the preceding sections we introduced the basic ideas underlying the expansion
of a function f (x) in an infinite series of Chebyshev polynomials, and gave a few
hints for the numerical calculation of the expansion coefficients μn . For a numerical
approach, however, the total number of moments will remain finite, and we have to
look for the best (uniform) approximation to f (x) by polynomials of given maxi-
mal degree N . Introducing the concept of kernels, we will investigate and optimize
the convergence properties of the mapping f (x) → fKPM (x) from the considered
function f (x) to our approximation fKPM (x).
Experience shows that a simple truncation of an infinite series,

f(x) \approx \frac{1}{\pi\sqrt{1-x^2}} \Big[ \mu_0 + 2 \sum_{n=1}^{N-1} \mu_n\, T_n(x) \Big]\ , \qquad (19.16)

leads to poor precision and fluctuations – also known as Gibbs oscillations – near
points where the function f(x) is not continuously differentiable. The situation is
even worse for discontinuities or singularities of f(x), as we illustrate in Fig. 19.1.
A common procedure to damp these oscillations relies on an appropriate modification
of the expansion coefficients, μ_n → g_n μ_n, which depends on the order of the
approximation N,

f_{\mathrm{KPM}}(x) = \frac{1}{\pi\sqrt{1-x^2}} \Big[ g_0\,\mu_0 + 2 \sum_{n=1}^{N-1} g_n\,\mu_n\, T_n(x) \Big]\ . \qquad (19.17)
This truncation of the infinite series to order N together with the corresponding
modification of the coefficients is equivalent to the convolution of f(x) with a kernel
K_N(x, y),

f_{\mathrm{KPM}}(x) = \int_{-1}^{1} \pi\sqrt{1-y^2}\; K_N(x, y)\, f(y)\; dy\ , \qquad (19.18)

where

K_N(x, y) = g_0\, \phi_0(x)\phi_0(y) + 2 \sum_{n=1}^{N-1} g_n\, \phi_n(x)\phi_n(y) \qquad (19.19)
[Plot: order N = 64 expansions of a δ function (broadened to width σ = π/N by the Jackson kernel) and of a step function, comparing the plain truncated series with the Jackson kernel result]
Fig. 19.1. Order N = 64 expansions of δ(x) and a step. Whereas the truncated series (Dirich-
let kernel) strongly oscillate, the Jackson results smoothly converge to the expanded functions
and \phi_n(x) = T_n(x)/(\pi\sqrt{1-x^2}). This way the problem translates into finding an
optimal kernel K_N(x, y), i.e., coefficients g_n. Clearly the notion of optimal depends
on the application considered.
The standard truncated series corresponds to the choice g_n^D = 1, which leads to
what is usually called the Dirichlet kernel,

K_N^D(x, y) = \big[ \phi_N(x)\phi_{N-1}(y) - \phi_{N-1}(x)\phi_N(y) \big] / (x - y)\ . \qquad (19.20)
An approximation based on this kernel for N → ∞ converges within the integral
norm \|f\|_2 = \sqrt{\langle f|f\rangle}, i.e. we have

\|f - f_{\mathrm{KPM}}\|_2 \xrightarrow{N\to\infty} 0\ . \qquad (19.21)
This is, of course, not particularly restrictive and leads to the disadvantages we
mentioned earlier.
A much better criterion would be uniform convergence,

\|f - f_{\mathrm{KPM}}\|_\infty = \max_{-1 < x < 1} |f(x) - f_{\mathrm{KPM}}(x)| \xrightarrow{N\to\infty} 0\ , \qquad (19.22)
and, indeed, this can be achieved for continuous functions f under very general
conditions. Specifically, it suffices to demand that:
(i) The kernel is positive: K_N(x, y) > 0 ∀ x, y ∈ [−1, 1].
(ii) The kernel is normalized, \int_{-1}^{1} K_N(x, y)\, dx = \phi_0(y), which is equivalent to
g_0 = 1.
(iii) The second coefficient g_1 approaches 1 as N → ∞.
The conditions (i) and (ii) are very useful for practical applications: The first ensures that approximations of positive quantities remain positive, the second conserves the integral of the expanded function, $\int_{-1}^{1} f_{\mathrm{KPM}}(x)\,dx = \int_{-1}^{1} f(x)\,dx$. Applying the kernel, for example, to a density of states thus yields an approximation which is strictly positive and normalized.
The simplest kernel which fulfils all three conditions is the Fejér kernel [11],
$$K_N^{F}(x,y) = \frac{1}{N}\sum_{\nu=1}^{N} K_\nu^{D}(x,y)\,, \qquad(19.23)$$
i.e., $g_n^{F} = 1 - n/N$, which is the arithmetic mean of all Dirichlet approximations of order less than or equal to $N$. However, with the coefficients $g_n^{F}$ of the Fejér kernel we
have not fully exhausted the freedom offered by the above conditions. We can hope
to further improve the kernel by optimizing the gn in some sense, which will lead
us to recover old results by Jackson [12, 13]. In particular, let us tighten the third
condition by demanding that the kernel has optimal resolution in the sense that
$$Q := \int_{-1}^{1}\int_{-1}^{1}(x-y)^2\,K_N(x,y)\,dx\,dy \qquad(19.24)$$

is minimal. This condition singles out the Jackson kernel [12, 13], whose coefficients read

$$g_n^{J} = \frac{(N-n+1)\cos\frac{\pi n}{N+1} + \sin\frac{\pi n}{N+1}\cot\frac{\pi}{N+1}}{N+1}$$

(see [1] for the derivation).
The Jackson kernel is the best choice for most of the applications we dis-
cuss below. In some situations, however, special analytical properties of the ex-
panded functions become important, which only other kernels can account for.
Single-particle Green functions that appear in the Cluster Perturbation Theory (see Sect. 19.3) are an example. Considering the imaginary part of the Plemelj-Dirac
formula, limǫ→0 1/(x + iǫ) = P(1/x) − iπδ(x) (here P denotes the principal
value), which frequently occurs in connection with Green functions, the δ-function
on the right hand side is approached in terms of a Lorentz curve,
$$\delta(x) = -\frac{1}{\pi}\lim_{\epsilon\to 0}\operatorname{Im}\frac{1}{x+i\epsilon} = \lim_{\epsilon\to 0}\frac{\epsilon}{\pi(x^2+\epsilon^2)}\,. \qquad(19.29)$$
This broadening is matched by the Lorentz kernel with coefficients $g_n^{L} = \sinh[\lambda(1-n/N)]/\sinh(\lambda)$, where the free parameter $\lambda$ plays the role of the rescaled $\epsilon$ [1]. For expansions of $d$-dimensional functions the kernel factorizes, $K_N(\boldsymbol{x},\boldsymbol{y}) = \prod_{j=1}^{d} K_N(x_j,y_j)$, where we can take the $g_n$ of any of the previously discussed kernels. If we use the $g_n^{J}$ of the Jackson kernel, $K_N^{J}(\boldsymbol{x},\boldsymbol{y})$ fulfils generalizations of our conditions for an
optimal kernel, namely:

(i) $K_N^{J}(\boldsymbol{x},\boldsymbol{y})$ is positive for all $\boldsymbol{x},\boldsymbol{y} \in [-1,1]^d$.
(ii) $K_N^{J}(\boldsymbol{x},\boldsymbol{y})$ is normalized with

$$\int_{-1}^{1}\!\cdots\!\int_{-1}^{1} f_{\mathrm{KPM}}(\boldsymbol{x})\,dx_1\ldots dx_d = \int_{-1}^{1}\!\cdots\!\int_{-1}^{1} f(\boldsymbol{x})\,dx_1\ldots dx_d\,. \qquad(19.32)$$

(iii) $K_N^{J}(\boldsymbol{x},\boldsymbol{y})$ has optimal resolution in the sense that

$$Q = \int_{-1}^{1}\!\cdots\!\int_{-1}^{1}(\boldsymbol{x}-\boldsymbol{y})^2\,K_N(\boldsymbol{x},\boldsymbol{y})\,dx_1\ldots dx_d\,dy_1\ldots dy_d = d\,(g_0 - g_1) \qquad(19.33)$$

is minimal.
Note that for simplicity the order of the expansion, $N$, was chosen to be the same for all spatial directions. Of course, we could also define more general kernels, $K_{\boldsymbol{N}}(\boldsymbol{x},\boldsymbol{y}) = \prod_{j=1}^{d} K_{N_j}(x_j,y_j)$, where the vector $\boldsymbol{N}$ denotes the orders of expansion for the different spatial directions.
Having discussed the theory behind Chebyshev expansion, the calculation of mo-
ments, and the various kernel approximations, let us now come to the practical is-
sues of the implementation of KPM, namely to the reconstruction of the expanded
function f (x) from its moments μn . Knowing a finite number N of coefficients
$\mu_n$, we usually want to reconstruct $f(x)$ on a finite set of abscissas $x_k$. Naively we could sum up (19.17) separately for each point, thereby making use of the recursion relations for $T_n$, i.e., $f(x_k) = \big(g_0\mu_0 + 2\sum_{n=1}^{N-1} g_n\mu_n T_n(x_k)\big)\big/\big(\pi\sqrt{1-x_k^2}\big)$.
For a set $\{x_k\}$ containing $\tilde N$ points these summations would require of the order of $N\tilde N$ operations. We can do much better by remembering the definition $T_n(x) = \cos(n\arccos(x))$ and the close relation between KPM and Fourier expansion: First, we may introduce the short-hand notation $\tilde\mu_n = g_n\mu_n$ for the kernel-improved moments. Second, and more important, we make a special choice for our data points,

$$x_k = \cos\!\left(\frac{\pi(k+1/2)}{\tilde N}\right) \qquad(19.34)$$

with $k = 0,\ldots,\tilde N - 1$, which coincides with the abscissas of Chebyshev numerical integration [4]. The number $\tilde N$ of points in the set $\{x_k\}$ is not necessarily the same as the number of moments $N$. Usually we will consider $\tilde N \ge N$, and with the choice (19.34) the reconstruction becomes a discrete cosine transform of the $\tilde\mu_n$,
which allows for the use of divide-and-conquer type algorithms that require only $\tilde N\log\tilde N$ operations – a clear advantage over the above estimate $N\tilde N$.
Routines for fast discrete cosine transform are implemented in many mathe-
matical libraries or Fast Fourier Transform (FFT) packages, for instance, in FFTW
[14, 15] that ships with most Linux distributions. If no direct implementation is at
hand we may also use fast discrete Fourier transform. With

$$\lambda_n = \begin{cases} (2-\delta_{n,0})\,\tilde\mu_n\, e^{\,i\pi n/(2\tilde N)} & 0 \le n < N\,,\\ 0 & \text{otherwise}\,, \end{cases} \qquad(19.36)$$

and $\tilde\lambda_j$ denoting the discrete Fourier transform of the $\lambda_n$, the function values follow from

$$\gamma_{2j} = \operatorname{Re}(\tilde\lambda_j)\,, \qquad \gamma_{2j+1} = \operatorname{Re}(\tilde\lambda_{\tilde N-1-j})\,. \qquad(19.38)$$
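As an illustration, the following Python sketch reconstructs $f$ on the abscissas (19.34) from a given set of moments, using the Jackson damping factors and a type-III discrete cosine transform (which matches exactly these abscissas). The function names and the use of SciPy are our own choices, not part of the original text.

```python
import numpy as np
from scipy.fft import dct  # DCT-III matches the abscissas of (19.34)

def jackson_coefficients(N):
    """Damping factors g_n of the Jackson kernel for expansion order N."""
    n = np.arange(N)
    q = np.pi / (N + 1)
    return ((N - n + 1) * np.cos(q * n) + np.sin(q * n) / np.tan(q)) / (N + 1)

def kpm_reconstruct(mu, n_points):
    """Evaluate f_KPM of (19.17) on the n_points abscissas x_k of (19.34)."""
    N = len(mu)
    mu_t = np.zeros(n_points)
    mu_t[:N] = jackson_coefficients(N) * mu      # kernel-improved moments
    # DCT-III: s_k = mu_t[0] + 2 * sum_n mu_t[n] * cos(pi n (k+1/2)/n_points)
    s = dct(mu_t, type=3)
    x_k = np.cos(np.pi * (np.arange(n_points) + 0.5) / n_points)
    return x_k, s / (np.pi * np.sqrt(1.0 - x_k**2))
```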
The first and basic application of Chebyshev expansion and KPM is the calculation
of the spectral density of Hermitian matrices, which could correspond to the densities of states of interacting or non-interacting quantum models [2, 9, 16, 17]. To be specific, let us consider a $D$-dimensional matrix $M$ with eigenvalues $E_k$, whose spectral density is defined as

$$\rho(E) = \frac{1}{D}\sum_{k=0}^{D-1}\delta(E - E_k)\,. \qquad(19.40)$$
After rescaling $M \to \tilde M$, the moments read

$$\mu_n = \int_{-1}^{1}\tilde\rho(\tilde E)\,T_n(\tilde E)\,d\tilde E = \frac{1}{D}\sum_{k=0}^{D-1}\langle k|T_n(\tilde M)|k\rangle = \frac{1}{D}\operatorname{Tr}\big(T_n(\tilde M)\big)\,. \qquad(19.41)$$
This is exactly the trace form that we introduced in Sect. 19.1, and we can imme-
diately calculate the μn using the stochastic techniques described before. Knowing
the moments we can reconstruct $\tilde\rho(\tilde E)$ for the whole range $[-1,1]$, and a final rescaling yields $\rho(E)$.
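A compact sketch of this procedure in Python/NumPy (our own illustration; H_scaled is assumed to be a sparse matrix already rescaled so that its spectrum lies within $[-1,1]$):

```python
import numpy as np

def dos_moments(H_scaled, N, R=10, seed=None):
    """Stochastic estimate of mu_n = Tr[T_n(M~)]/D, eq. (19.41), using
    the two-term Chebyshev recursion and R random-phase vectors."""
    rng = np.random.default_rng(seed)
    D = H_scaled.shape[0]
    mu = np.zeros(N)
    for _ in range(R):
        r = np.exp(2j * np.pi * rng.random(D)) / np.sqrt(D)  # normalized
        t0 = r.copy()
        t1 = H_scaled @ r
        mu[0] += np.vdot(r, t0).real
        mu[1] += np.vdot(r, t1).real
        for n in range(2, N):
            t0, t1 = t1, 2 * (H_scaled @ t1) - t0   # T_n recursion
            mu[n] += np.vdot(r, t1).real
    return mu / R
```

The resulting moments can then be passed to a reconstruction routine such as the kpm_reconstruct sketch above.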
As the first physical example let us consider percolation of non-interacting
fermions in disordered solids. The percolation problem is characterized by the in-
terplay of pure classical and quantum effects. Besides the question of finding a per-
colating path of accessible sites through a given lattice the quantum nature of the
electrons imposes further restrictions on the existence of extended states and, con-
sequently, of a finite dc-conductivity. As a particularly simple model describing this
situation we consider a tight-binding one-electron Hamiltonian
$$H = \sum_{i}\epsilon_i\, c_i^\dagger c_i - t\sum_{\langle ij\rangle}\big(c_i^\dagger c_j + \mathrm{H.c.}\big) \qquad(19.42)$$
on a simple cubic lattice with L3 sites and random on-site energies ǫi drawn from
the bimodal distribution p(ǫi ) = p δ(ǫi − ǫA ) + (1 − p) δ(ǫi − ǫB ), also known as
the binary alloy model (see Chap. 17). In the limit ∆ = (ǫB − ǫA ) → ∞ the wave-
function of the A sub-band vanishes identically on the B-sites, making them com-
pletely inaccessible for the quantum particles. We then arrive at a situation where
non-interacting electrons move on a random ensemble of lattice points, which, de-
pending on p, may span the entire lattice or not. The corresponding Hamiltonian
reads $H = -t\sum_{\langle ij\rangle\in A}\big(c_i^\dagger c_j + \mathrm{H.c.}\big)$, where the summation extends over nearest-neighbor $A$-sites only and, without loss of generality, $\epsilon_A$ is chosen to be zero.
In the theoretical investigation of disordered systems it turned out that distribu-
tion functions for the random quantities take the center stage [18, 19]. The distribu-
tion $f(\rho_i(E))$ of the local density of states (LDOS)

$$\rho_i(E) = \sum_{n=1}^{N}|\psi_n(\boldsymbol{r}_i)|^2\,\delta(E - E_n) \qquad(19.43)$$
is particularly suited because ρi (E) measures the local amplitude of the wavefunc-
tion at site r i . It therefore contains direct information about the localization proper-
ties. In contrast to the (arithmetically averaged) mean DOS, $\rho_{\mathrm{me}}(E) = \langle\rho_i(E)\rangle$, the LDOS becomes critical at the localization transition [20, 21]. Therefore the (geometrically averaged) so-called typical DOS, $\rho_{\mathrm{ty}}(E) = \exp\langle\ln\rho_i(E)\rangle$, is frequently used to monitor the transition from extended to localized states. The typical DOS puts sufficient weight on small values of $\rho_i$, and a comparison with $\rho_{\mathrm{me}}(E)$ allows one to detect the localization transition.
Using the KPM the LDOS can be easily calculated for a large number of sam-
ples, Kr , and sites, Ks . The mean and typical DOS are then simply obtained from
$$\rho_{\mathrm{me}}(E) = \frac{1}{K_r K_s}\sum_{k=1}^{K_r}\sum_{i=1}^{K_s}\rho_i(E)\,, \qquad \rho_{\mathrm{ty}}(E) = \exp\!\left(\frac{1}{K_r K_s}\sum_{k=1}^{K_r}\sum_{i=1}^{K_s}\ln\rho_i(E)\right), \qquad(19.44)$$
respectively. We classify a state at energy $E$ with $\rho_{\mathrm{me}}(E) \ne 0$ as localized if $\rho_{\mathrm{ty}}(E) = 0$, and as extended if $\rho_{\mathrm{ty}}(E) \ne 0$.
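In code, the two averages of (19.44) are a one-liner each; the sketch below (our own illustration) guards the logarithm against exact zeros:

```python
import numpy as np

def mean_and_typical_dos(rho, floor=1e-300):
    """rho: array of LDOS samples, shape (K_r * K_s, n_energies)."""
    rho_me = rho.mean(axis=0)                                     # arithmetic mean
    rho_ty = np.exp(np.log(np.maximum(rho, floor)).mean(axis=0))  # geometric mean
    return rho_me, rho_ty
```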
In order to discuss possible localization phenomena let us investigate the be-
havior of the mean DOS for the quantum percolation models (19.42). As long as
ǫA and ǫB do not differ too much there exists an asymmetric (if p = 0.5) but still
connected electronic band [22]. At about ∆ ≃ 4tD this band separates into two
sub-bands centered at ǫA and ǫB , respectively. The most prominent feature in the
split-band regime is the series of spikes at discrete energies within the band. As an
obvious guess, we might attribute these spikes to eigenstates on islands of A or B
sites being isolated from the main cluster [23, 24]. It turns out, however, that some
of the spikes persist, even if we neglect all finite clusters and restrict the calcula-
tion to the spanning cluster of A sites, A∞ . This is illustrated in the upper panels
of Fig. 19.2, where we compare the DOS of the model (19.42) (at ∆ → ∞) to that
of the spanning cluster only Hamiltonian. Increasing the concentration of accessible
sites the mean DOS of the spanning cluster is evocative of the DOS of the simple cu-
bic lattice, but even at large values of $p$ a sharp peak structure remains at $E = 0$ (cf. Fig. 19.2, lower panels). Note that the most dominant peaks at $E/t = 0, \pm 1, \pm\sqrt{2}, \pm(1\pm\sqrt{5})/2, \ldots$ correspond to eigenvalues of the tight-binding model on small
clusters with different geometries. We can thus argue that the wavefunctions, which
belong to these special energies, are localized on some dead ends of the spanning
cluster. The assumption that the distinct peaks correspond to localized wavefunc-
tions is corroborated by the fact that the typical DOS vanishes or, at least, shows
Fig. 19.2. Mean (upper curves) and typical (lower curves) DOS for the quantum percolation model in the limit ∆ → ∞. While in the upper left panel all A-sites are taken into account, the other three panels show data for the restricted model on the spanning cluster A∞ only (note that ρty is smaller in the former case because there are more sites with vanishing amplitude of the wavefunction). System sizes were adapted to ensure that A∞ always contains the same number of sites, i.e., 57³ for p = 0.405, 46³ for p = 0.70, and 42³ for p = 0.92. In order to obtain these high-resolution data we used N = 32768 Chebyshev moments and K_s × K_r = 32 × 32
a dip at these energies. Occurring also for finite ∆, this effect becomes more pro-
nounced as ∆ → ∞ and in the vicinity of the classical percolation threshold pc . For
a more detailed discussion see [25].
Densities of states provide only the most basic information about a given quantum
system, and much more details can usually be learned from the study of correlation
functions.
Given the eigenstates |k of an interacting quantum system the thermodynamic
expectation value of an operator A reads
$$\langle A\rangle = \frac{1}{ZD}\operatorname{Tr}\big(A\,e^{-\beta H}\big) = \frac{1}{ZD}\sum_{k=0}^{D-1}\langle k|A|k\rangle\, e^{-\beta E_k}\,, \qquad(19.45)$$
where $H$ is the Hamiltonian of the system, $E_k$ the energy of the eigenstate $|k\rangle$, and $Z = \operatorname{Tr}(e^{-\beta H})/D = D^{-1}\sum_{k=0}^{D-1} e^{-\beta E_k}$ the partition function. Using the function $a(E) = D^{-1}\sum_{k=0}^{D-1}\langle k|A|k\rangle\,\delta(E-E_k)$ and the (canonical) density of states $\rho(E)$, we can express the thermal expectation value in terms of integrals over the Boltzmann weight,
$$\langle A\rangle = \frac{1}{Z}\int_{-\infty}^{\infty} a(E)\, e^{-\beta E}\, dE\,, \qquad Z = \int_{-\infty}^{\infty}\rho(E)\, e^{-\beta E}\, dE\,. \qquad(19.46)$$
Of course, similar relations hold also for non-interacting fermion systems, where
the Boltzmann weight exp(−βE) has to be replaced by the Fermi function f (E) =
1/(1 + exp(β(E − μ))) and the single-electron wave functions play the role of |k.
Again, the particular form of a(E) suggests an expansion in Chebyshev polyno-
mials, and after rescaling we find
$$\mu_n = \int_{-1}^{1}\tilde a(\tilde E)\,T_n(\tilde E)\,d\tilde E = \frac{1}{D}\sum_{k=0}^{D-1}\langle k|A|k\rangle\, T_n(\tilde E_k) = \frac{1}{D}\operatorname{Tr}\big(A\,T_n(\tilde H)\big)\,, \qquad(19.47)$$
which can be evaluated employing the stochastic approach, outlined in Sect. 19.1.
For interacting systems at low temperature the expression in (19.46) is a bit
problematic, since the Boltzmann factor puts most of the weight on the lower end
of the spectrum and heavily amplifies small numerical errors in ρ(E) and a(E). We
can avoid these problems by calculating the ground state and some of the lowest
excitations exactly, using standard iterative diagonalization methods like Lanczos
or Jacobi-Davidson (see Sect. 18.2). Then we split the expectation value of A and
the partition function Z into contributions from the exactly known states and con-
tributions from the rest of the spectrum,
$$\langle A\rangle = \frac{1}{ZD}\sum_{k=0}^{C-1}\langle k|A|k\rangle\, e^{-\beta E_k} + \frac{1}{Z}\int_{-\infty}^{\infty} a_s(E)\, e^{-\beta E}\, dE\,,$$

$$Z = \frac{1}{D}\sum_{k=0}^{C-1} e^{-\beta E_k} + \int_{-\infty}^{\infty}\rho_s(E)\, e^{-\beta E}\, dE\,. \qquad(19.48)$$
Here $a_s(E) = D^{-1}\sum_{k=C}^{D-1}\langle k|A|k\rangle\,\delta(E-E_k)$ and $\rho_s(E) = D^{-1}\sum_{k=C}^{D-1}\delta(E-E_k)$ describe the rest of the spectrum and can be expanded in Chebyshev polynomials easily. Based on the known states we can introduce the projection operator $P = 1 - \sum_{k=0}^{C-1}|k\rangle\langle k|$ and find for the expansion coefficients of $\tilde a_s(E)$

$$\mu_n = \frac{1}{D}\operatorname{Tr}\big(P\,A\,T_n(\tilde H)\big) \approx \frac{1}{RD}\sum_{r=0}^{R-1}\langle r|P\,A\,T_n(\tilde H)\,P|r\rangle\,, \qquad(19.49)$$
Fig. 19.3. Nearest-neighbor $S^z$-$S^z$ correlations of the XXZ model on a square lattice (data for Δ = −0.5, 0.0, 0.5, 1.0 as a function of temperature). Lines represent the KPM results with separation of low-lying eigenstates (bold solid and bold dashed) and without (thin dashed); open symbols denote exact results from a complete diagonalization of a 4 × 4 system
We illustrate the accuracy of this approach in Fig. 19.3 considering the nearest-
neighbor S z -S z correlations of the square-lattice spin-1/2 XXZ model as an
example,
$$H = \sum_{\langle i,\delta\rangle}\big(S_i^x S_{i+\delta}^x + S_i^y S_{i+\delta}^y + \Delta\, S_i^z S_{i+\delta}^z\big)\,. \qquad(19.51)$$
Note that for non-interacting systems the above separation of the spectrum is not
required, since for T → 0 the Fermi function converges to a simple step function
without causing any numerical problems.
Having discussed simple expectation values and static correlations, the calculation
of time dependent quantities is the natural next step in the study of complex quan-
tum models. This is motivated also by many experimental setups, which probe the
response of a physical system to time dependent external perturbations. Examples
are inelastic scattering experiments or measurements of transport coefficients. In
the framework of linear response theory and the Kubo formalism the system’s re-
sponse is expressed in terms of dynamical correlation functions, which can also be
calculated efficiently with Chebyshev expansion and KPM.
Given two operators A and B a general dynamical correlation function can be
defined through
$$\langle\langle A;B\rangle\rangle_\omega^{\pm} = \lim_{\epsilon\to 0}\langle 0|A\,\frac{1}{\omega + i\epsilon \mp H}\,B|0\rangle = \lim_{\epsilon\to 0}\sum_{k=0}^{D-1}\frac{\langle 0|A|k\rangle\langle k|B|0\rangle}{\omega + i\epsilon \mp E_k}\,, \qquad(19.52)$$
which has a similar structure as, e.g., the local density of states in (19.43); in fact, with $\rho_i(E)$ we already calculated a dynamical correlation function. Rescaling the Hamiltonian $H \to \tilde H$ and all energies $\omega \to \tilde\omega$, we can proceed as usual and expand $\operatorname{Im}\langle\langle A;B\rangle\rangle_\omega^{\pm}$ in Chebyshev polynomials,
$$\operatorname{Im}\langle\langle A;B\rangle\rangle_{\tilde\omega}^{\pm} = -\frac{1}{\sqrt{1-\tilde\omega^2}}\left[\mu_0 + 2\sum_{n=1}^{\infty}\mu_n T_n(\tilde\omega)\right]. \qquad(19.54)$$
In many cases, especially for the spectral functions and optical conductivities studied below, only the imaginary part of $\langle\langle A;B\rangle\rangle_\omega^{\pm}$ is of interest, and the above setup
is all we need. Sometimes, however – e.g., within the cluster perturbation theory discussed in Sect. 19.3 – also the real part of a general correlation function $\langle\langle A;B\rangle\rangle_\omega^{\pm}$
is required. Fortunately it can be calculated with almost no additional effort: The analytical properties of $\langle\langle A;B\rangle\rangle_\omega^{\pm}$ arising from causality imply that its real part is fully determined by the imaginary part. Indeed, using the Hilbert transforms of the Chebyshev polynomials,

$$\mathrm{P}\!\int_{-1}^{1}\frac{T_n(y)\,dy}{(y-x)\sqrt{1-y^2}} = \pi\, U_{n-1}(x)\,,$$

$$\mathrm{P}\!\int_{-1}^{1}\frac{\sqrt{1-y^2}\;U_{n-1}(y)\,dy}{(y-x)} = -\pi\, T_n(x)\,, \qquad(19.56)$$
we obtain

$$\operatorname{Re}\langle\langle A;B\rangle\rangle_{\tilde\omega}^{\pm} = \sum_{k=0}^{D-1}\langle 0|A|k\rangle\langle k|B|0\rangle\;\mathrm{P}\frac{1}{\tilde\omega \mp \tilde E_k} = -\frac{1}{\pi}\,\mathrm{P}\!\int_{-1}^{1}\frac{\operatorname{Im}\langle\langle A;B\rangle\rangle_{\tilde\omega'}^{\pm}}{\tilde\omega - \tilde\omega'}\,d\tilde\omega' = -2\sum_{n=1}^{\infty}\mu_n U_{n-1}(\tilde\omega)\,. \qquad(19.57)$$
The complete correlation function can thus be reconstructed from the same moments $\mu_n$ that we derived for its imaginary part (19.55). In contrast to the real quantities we considered so far, the reconstruction merely requires a complex Fourier transform (see (19.39)). If only the imaginary or real part of $\langle\langle A;B\rangle\rangle_\omega^{\pm}$ is needed, a cosine or sine transform, respectively, is sufficient.
Note that the calculation of dynamical correlation functions for non-interacting
electron systems is not possible with the scheme discussed in this section, not even
at zero temperature. At finite band filling (finite chemical potential) the ground state
consists of a sum over occupied single-electron states, and dynamical correlation
functions thus involve a double summation over matrix elements between all single-
particle eigenstates, weighted by the Fermi function. See the section on the optical
conductivity for a discussion of this case, which covers also the calculation of dy-
namical correlation functions at finite temperature.
which is one of the basic models for the study of electron-lattice interaction in elec-
tronically low-dimensional solids. In (19.61), the electrons are approximated by
spinless fermions $c_i^{(\dagger)}$, the density of which couples to the local lattice distortion described by dispersionless phonons $b_i^{(\dagger)}$. At half-filling, i.e., 0.5 fermions per site,
the model allows for the study of quantum effects at the transition from a (Luttinger
liquid) metal to a (Peierls) insulator, marked by the opening of a gap at the Fermi
wave vector and the development of charge-density-wave (CDW) long-range order
and a matching lattice distortion [29, 30, 31]. The Peierls insulator can be classified
as traditional band insulator and polaronic superlattice in the strong electron-phonon
coupling adiabatic (ω0 /t ≪ 1) and anti-adiabatic (ω0 /t ≫ 1) regimes, respectively.
Figure 19.4 shows KPM data for the spectral function of the half-filled Holstein
model and assesses its quality by comparing with results from Dynamical Density
Matrix Renormalization Group (DDMRG) [32] calculations. In the spinless case,
the photo-emission (A− ) and inverse photo-emission (A+ ) parts read
A− (k, ω) = |l, Ne − 1| ck |0, Ne |2 δ[ω + (El,Ne −1 − E0,Ne )] ,
l
+
A (k, ω) = |l, Ne + 1| c†k |0, Ne |2 δ[ω − (El,Ne +1 − E0,Ne )] , (19.62)
l
where |l, Ne denotes the lth eigenstate with Ne electrons and energy El,Ne . For
the parameters of Fig. 19.4 the system is in an insulating phase with a finite charge
excitation gap at the Fermi momentum k = ±π/2. Below and above the gap the
spectrum is characterized by broad multi-phonon absorption, reflecting the Poisson-
like phonon distribution in the ground state. Compared to DDMRG, KPM offers
the better resolution and unfolds all the discrete phonon sidebands. Concerning nu-
merical performance DDMRG has the advantage of a small optimized Hilbert space
Fig. 19.4. Single-particle spectral functions A(k, ω) (for electron removal, ω < 0, and electron injection, ω > 0) of the spinless Holstein model at half-filling on an eight-site lattice with periodic boundary conditions. The system is in the Peierls/CDW insulating phase (ω₀/t = 0.1 and g = 4). The rapidly oscillating thin lines are the KPM results (M = 32) while the smooth thick lines are the DDMRG data (M = 16) obtained with the pseudo-site method for the same lattice size
[33, 34], which can be handled with standard workstations. However, the basis opti-
mization is rather time consuming and, in addition, each frequency value ω requires
a new simulation. The KPM calculations, on the other hand, involved matrix dimen-
sions between $10^8$ and $10^{10}$, and we therefore used high-performance computers
such as Hitachi SR8000-F1 or IBM p690 for the moment calculation. For the recon-
struction of the spectra, of course, a desktop computer is sufficient.
where $\tilde j(x,y)$ refers to the rescaled $j(x,y)$, $g_n$ are the usual kernel damping factors, and $h_{nm}$ account for the correct normalization. The moments $\mu_{nm}$ are obtained from
$$\mu_{nm} = \int_{-1}^{1}\!\int_{-1}^{1}\tilde j(x,y)\,T_n(x)\,T_m(y)\,dx\,dy = \frac{1}{D}\operatorname{Tr}\big[T_n(\tilde H)\,J\,T_m(\tilde H)\,J\big]\,, \qquad(19.67)$$
and again the trace can be replaced by an average over a relatively small number $R$ of random vectors $|r\rangle$. The numerical effort for an expansion of order $n, m < N$ ranges between $2RDN$ and $RDN^2$ operations, depending on whether memory is available for up to $N$ vectors of the Hilbert space dimension $D$ or not. Given the operator density $j(x,y)$ we find the optical conductivity by integrating over Boltzmann factors,

$$\sigma^{\mathrm{reg}}(\omega) = \frac{1}{Z\omega}\int_{-\infty}^{\infty} j(y+\omega, y)\,\big[e^{-\beta y} - e^{-\beta(y+\omega)}\big]\,dy = \sum_{k,q}\frac{|\langle k|J|q\rangle|^2\big(e^{-\beta E_k} - e^{-\beta E_q}\big)}{ZD\,\omega}\,\delta(\omega - \omega_{qk})\,, \qquad(19.68)$$
and, as above, we get the partition function Z from an integral over the density of
states ρ(E). The latter can be expanded in parallel to j(x, y). Note that the calcu-
lation of the conductivity at different temperatures is based on the same operator
density j(x, y), i.e., it needs to be expanded only once for all temperatures.
As a physical example, we consider the conductivity for the Anderson model of
non-interacting fermions moving in a random potential [18],
$$H = -t\sum_{\langle ij\rangle} c_i^\dagger c_j + \sum_i \epsilon_i\, c_i^\dagger c_i\,. \qquad(19.69)$$
Here hopping occurs along nearest neighbor bonds ij on a simple cubic lattice
and the local potential ǫi is chosen randomly with uniform distribution in the inter-
val [−γ/2, γ/2]. With increasing strength of disorder, γ, the single-particle eigen-
states of the model tend to become localized in the vicinity of a particular lattice
site, which excludes these states from contributing to electronic transport. Disorder
can therefore drive a transition from metallic behavior with delocalized fermions to
insulating behavior with localized fermions [35, 36, 37].
Since the Anderson model describes non-interacting fermions, the eigenstates
|k occurring in σ(ω) now denote single-particle wave functions and the Boltzmann
weight has to be replaced by the Fermi function,
$$\sigma^{\mathrm{reg}}(\omega) = \sum_{k,q}\frac{|\langle k|J|q\rangle|^2\big(f(E_k) - f(E_q)\big)}{\omega}\,\delta(\omega - \omega_{qk})\,. \qquad(19.70)$$
Clearly, from a computational point of view this expression is of the same complexity for both zero and finite temperature, i.e. we need the more advanced 2D KPM approach [38].
Figure 19.5 shows the optical conductivity of the Anderson model at γ/t = 12
for different inverse temperatures β = 1/T . The chemical potential is chosen as
μ = 0, i.e., the system is still in the metallic phase. However, the conductivity
shows a pronounced dip near ω = 0 with the functional form σ(ω) ∼ σ0 + |ω|α .
For stronger disorder γ or a different chemical potential μ, the system will become
insulating and the dc-conductivity σ0 will vanish. The role of temperature, in this
example, is limited to suppressing σ(ω), mainly through the (f (Ek ) − f (Eq )) term
in (19.70). The model (19.69) does not describe thermally activated hopping, since
there are no phonons included.
Fig. 19.5. Optical conductivity of the 3D Anderson model with γ = 12t for different inverse temperatures β. Note that all curves are derived from the same matrix element density j(x, y), which was calculated for a 100³ site cluster with expansion order N = 2048 and averaged over K_r = 440 samples
and the problem translates into calculating the time evolution operator U (t) =
exp(−iHt) for a given Hamiltonian H and time t. Using the rescaling introduced
in (19.7), we can expand U (t) in a series of Chebyshev polynomials [39, 40, 41],
$$U(t) = e^{-i(a\tilde H + b)t} = e^{-ibt}\left[c_0 + 2\sum_{k=1}^{N} c_k\, T_k(\tilde H)\right], \qquad(19.73)$$

$$c_k = \int_{-1}^{1}\frac{T_k(x)\,e^{-iaxt}}{\pi\sqrt{1-x^2}}\,dx = (-i)^k J_k(at)\,, \qquad(19.74)$$
−1
and $J_k(at)$ denotes the Bessel function of order $k$. The Chebyshev polynomials of the Hamiltonian, $T_k(\tilde H)$, are calculated with the recursion we introduced earlier, see (19.3). Thus, the wave function at a later time is obtained simply through a set of MVMs with the Hamiltonian.
Asymptotically the Bessel function behaves as

$$J_k(z) \sim \frac{1}{k!}\left(\frac{z}{2}\right)^{k} \sim \frac{1}{\sqrt{2\pi k}}\left(\frac{e z}{2k}\right)^{k} \qquad(19.75)$$
for $k \to \infty$, hence for $k \gg at$ the expansion coefficients $c_k$ decay superexponentially and the series can be truncated with negligible error. With an expansion order of $N \gtrsim 1.5\,at$ we are usually on the safe side. Moreover, we can check the quality of our approximation by comparing the norms of $|\psi_t\rangle$ and $|\psi_0\rangle$. For sparse matrices the whole time evolution scheme is therefore linear in both the matrix dimension and the time.
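A sketch of the resulting propagation scheme in Python (our own illustration; H_scaled, a and b refer to the rescaling $H = a\tilde H + b$, and the default truncation follows the $N \gtrsim 1.5\,at$ rule of thumb):

```python
import numpy as np
from scipy.special import jv   # Bessel function J_k of the first kind

def chebyshev_evolve(H_scaled, a, b, psi, t, N=None):
    """|psi(t)> = exp(-iHt)|psi> via (19.73)/(19.74), H = a*H_scaled + b."""
    if N is None:
        N = int(1.5 * abs(a * t)) + 10         # safe truncation order
    t0 = psi
    t1 = H_scaled @ psi
    out = jv(0, a * t) * t0 + 2 * (-1j) * jv(1, a * t) * t1
    for k in range(2, N + 1):
        t0, t1 = t1, 2 * (H_scaled @ t1) - t0  # T_k recursion
        out = out + 2 * (-1j)**k * jv(k, a * t) * t1
    return np.exp(-1j * b * t) * out
```

Comparing the norm of the returned vector with that of the input state provides the accuracy check mentioned above.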
The Chebyshev expansion method converges much faster than other time inte-
gration methods, in particular, it is faster than the popular Crank-Nicolson method
[42]. Within this approach the time interval $t$ is divided into small steps $\Delta t = t/N$, and the wave function is propagated in a mixed explicit/implicit manner,

$$\Big(1 + \tfrac{i}{2}H\Delta t\Big)|\psi_{t+\Delta t}\rangle = \Big(1 - \tfrac{i}{2}H\Delta t\Big)|\psi_t\rangle\,. \qquad(19.76)$$

Thus, each step requires both a sparse MVM and the solution of a sparse linear
system. Obviously, this is more complicated than the Chebyshev recursion, which
requires only MVMs. In the Crank-Nicolson method the time evolution operator is
approximated as

$$U(t) = \left[\frac{1 - iHt/(2N)}{1 + iHt/(2N)}\right]^{N}. \qquad(19.77)$$
Fig. 19.6. Comparison of the Chebyshev and the Crank-Nicolson approximation of the func-
tion U (t) = exp(ixt) with t = 10 and expansion order N = 15
Fig. 19.7. Formation of a polaron for electron-lattice coupling g = 0.4 and phonon frequency
ω0 = 1 (upper panel), compared to the motion of a non-interacting wave packet (lower
panel). The right panel shows the underlying dispersions (lower curves) and velocities (upper
curves)
quantum dynamics of such a system (for a recent review see also [45]). In Fig. 19.7
we show the time evolution of a single-electron wave packet

$$|\psi_0\rangle = \sum_j e^{\,ipj - (j-j_0)^2/(2\sigma^2)}\, c_j^\dagger\,|0\rangle\,, \qquad(19.78)$$
where in the upper and lower panels the electron-phonon coupling g is finite or
zero, respectively. For finite g, within the first few time steps a polaron is formed,
which then travels at lower speed, compared to the non-interacting wave packet. The
speed difference is given by the difference of the derivatives ε′ (k) of the underlying
dispersions ε(k) at the mean momentum p, see right hand panel. The Chebyshev ex-
pansion method allows for a fast and reliable simulation of this interesting problem.
The spectrum of a finite system of L sites, which we obtain through KPM, differs in
many respects from that of an infinite system, L → ∞, especially since for a finite
system the lattice momenta K = π m/L and the energy levels are discrete. While
we cannot easily increase L without reaching computationally inaccessible Hilbert
space dimensions, we can try to extrapolate from a finite to the infinite system.
With the Cluster Perturbation Theory (CPT) [46, 47, 48] a straightforward way to perform this task approximately has recently been devised. In this scheme one
first calculates the Green function $G_{ij}^c(\omega)$ for all sites $i,j = 1,\ldots,L$ of an $L$-site cluster with open boundary conditions, and then recovers the infinite lattice by pasting identical copies of this cluster at their edges. The glue is the hopping $V$ between these clusters, where $V_{mn} = t$ for $|m-n| = 1$ and $m,n \equiv 0,1 \pmod L$, which is dealt with in first order perturbation theory. Then the Green function $G_{ij}(\omega)$ of the infinite lattice is given through a Dyson equation

$$G_{ij}(\omega) = G_{ij}^c(\omega) + \sum_{mn} G_{im}^c(\omega)\, V_{mn}\, G_{nj}(\omega)\,, \qquad(19.79)$$
where the indices of $G^c(\omega)$ are counted modulo $L$. Obviously this order of perturbation in $V$ is exact for the non-interacting system. The Dyson equation is solved by Fourier transformation over momenta $K = kL$ corresponding to translations by $L$ sites,

$$G_{ij}(K,\omega) = \left[\frac{G^c(\omega)}{1 - V(K)\,G^c(\omega)}\right]_{ij}, \qquad(19.80)$$

from which one finally obtains

$$G(k,\omega) = \frac{1}{L}\sum_{i,j=1}^{L} G_{ij}(Lk,\omega)\, e^{-ik(i-j)}\,. \qquad(19.81)$$
Hence, from the Green function Gcij (ω) on a finite cluster we construct a Green
function G(k, ω) with continuous momenta k.
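The periodization step can be sketched in a few lines of Python (our own illustration; the phase conventions of V(K) follow the nearest-neighbor chain described above and may need adjustment for other geometries):

```python
import numpy as np

def cpt_green(Gc, k, t):
    """CPT lattice Green function G(k, omega) of (19.80)/(19.81), given the
    cluster Green function Gc (an L x L matrix at fixed omega)."""
    L = Gc.shape[0]
    V = np.zeros((L, L), dtype=complex)
    V[0, L - 1] = t * np.exp(-1j * k * L)   # hopping to the left neighbor cluster
    V[L - 1, 0] = t * np.exp(+1j * k * L)   # hopping to the right neighbor cluster
    G = Gc @ np.linalg.inv(np.eye(L) - V @ Gc)        # eq. (19.80)
    i = np.arange(L)
    phase = np.exp(-1j * k * (i[:, None] - i[None, :]))
    return (G * phase).sum() / L                      # eq. (19.81)
```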
Two approximations are made, one by using first order perturbation theory in $V = t$, the second by assuming translational symmetry in $G_{ij}(\omega)$, which is satisfied only approximately. In principle, the CPT spectral function $G(k,\omega)$ does not con-
tain any more information than the cluster Green function Gcij (ω) already does. But
extrapolating to the infinite system it gives a first hint at the scenario in the ther-
modynamic limit. Providing direct access to spectral functions, still without relying
on possibly erroneous approximations, CPT occupies a niche between variational
approaches like (D)DMRG [32, 49] and methods directly working in the thermody-
namic limit like the variational ED method [43].
In applying the CPT, crucial attention has to be paid to the kernel used in the reconstruction of $G_{ij}^c(\omega)$. As it turns out, the Jackson kernel is an inadequate choice
here, since already for the non-interacting tight-binding model it introduces spuri-
ous structures into the spectra [1]. The failure can be attributed to the shape of the
Jackson kernel: Being optimized for high resolution, a pole in the Green function
will give a sharp peak with most of its weight concentrated at the center, and rapidly
decaying tails. The reconstructed (cluster) Green function therefore does not satisfy
the correct analytical properties required in the CPT step. To guarantee these prop-
erties, instead, we use the Lorentz kernel, which is constructed in order to mimic the
effect of a finite imaginary part in the energy argument of a Green function.
Using $G_{ij}^c(\omega) = G_{ji}^c(\omega)$ (no magnetic field), for an $L$-site chain $L$ diagonal and $L(L-1)/2$ off-diagonal elements of $G_{ij}^c(\omega)$ have to be calculated. The latter can be reduced to Chebyshev iterations for the operators $c_i^{(\dagger)} + c_j^{(\dagger)}$. The numerical effort
can be further reduced by a factor $1/L$: If we keep the ground state $|0\rangle$ of the system we can calculate the moments $\mu_n^{ij} = \langle 0|c_i\, T_n(\tilde H)\, c_j^\dagger|0\rangle$ for $L$ elements $i = 1,\ldots,L$ of $G_{ij}^c(\omega)$ in a single Chebyshev iteration. To achieve a similar reduction within
the Lanczos recursion we would have to construct explicitly the eigenstates corresponding to the Lanczos eigenvalues. Then the factor $1/L$ is exceeded by at least $ND$ additional operations for the construction of $N$ eigenstates of a $D$-dimensional sparse matrix. Hence, using KPM for the CPT cluster diagonalization the numerical effort can be reduced by a factor of $1/L$ in comparison to the Lanczos recursion.
As an example we consider the 1D Hubbard model
$$H = -t\sum_{i,\sigma}\big(c_{i,\sigma}^\dagger c_{i+1,\sigma} + \mathrm{H.c.}\big) + U\sum_i n_{i\uparrow} n_{i\downarrow}\,, \qquad(19.82)$$
which is exactly solvable by Bethe ansatz [50] and was also extensively studied with
DDMRG [51]. It thus provides the opportunity to assess the precision of the KPM-
based CPT. The top left panel of Fig. 19.8 shows the one-particle spectral function
at half-filling, calculated on the basis of L = 16 site clusters and an expansion order
of N = 2048. The matrix dimension is $D \approx 1.7\cdot 10^8$. Remember that the cluster
Green function is calculated for a chain with open boundary conditions. The reduced
symmetry compared to periodic boundary conditions results in a larger dimension
of the Hilbert space that has to be dealt with numerically.
In the top right panel the dots show the Bethe ansatz results for a L = 64 site
chain, and the lines denote the L → ∞ spinon and holon excitations each electron
separates into (spin-charge separation). So far the Bethe ansatz does not allow for a
direct calculation of the structure factor, the data thus represents only the position
and density of the eigenstates, but is not weighted with the matrix elements of the operators $c_{k\sigma}^{(\dagger)}$. Although for an infinite system we would expect a continuous re-
sponse, the CPT data shows some faint fine-structure. A comparison with the finite-
size Bethe ansatz data suggests that these features are an artifact of the finite-cluster Green function on which the CPT spectral function is based. The fine-structure is
also evident in the lower panel of Fig. 19.8, where we compare with DDMRG data
for a L = 128 site system. Otherwise the CPT nicely reproduces all expected fea-
tures, like the excitation gap, the two pronounced spinon and holon branches, and
the broad continuum. Note also, that CPT is applicable to all spatial dimensions,
whereas DDMRG works well only for 1D models.
Having demonstrated the wide applicability of KPM, let us now discuss some direct
competitors of KPM, i.e., methods that share the broad application range and some
of its general concepts.
The first of these approaches, the combination of Chebyshev expansion and
Maximum Entropy Method (MEM), is basically an alternative procedure to trans-
form moment data μn into convergent approximations of the considered function
Fig. 19.8. Spectral function of the 1D Hubbard model for half-filling and U = 4t. Top
left: CPT result with cluster size L = 16 and expansion order N = 2048. For similar
data based on Lanczos recursion see [47]. Top right: Within the exact Bethe ansatz solution
each electron separates into the sum of independent spinon (red dashed) and holon (green)
excitations. The dots mark the energies of a 64-site chain. Bottom: CPT data compared to
selected DDMRG results for a system with L = 128 sites, open boundary conditions and a
broadening of ǫ = 0.0625t. Note that in DDMRG the momenta are approximate
f (x). To achieve this, instead of (or in addition to) applying kernel polynomials, an
entropy

$$S(f, f_0) = \int_{-1}^{1}\left[f(x) - f_0(x) - f(x)\log\frac{f(x)}{f_0(x)}\right] dx \qquad(19.83)$$
is maximized under the constraint that the moments of the estimated f (x) agree
with the given data. The function f0 (x) describes our initial knowledge about f (x),
and may in the worst case just be a constant. Being related to Maximum Entropy
approaches to the classical moment problem [52, 53], for the case of Chebyshev
moments different implementations of MEM have been suggested [9, 54, 55]. Since
for a given set of N moments μn the approximation to the function f (x) is usually
not restricted to a polynomial of degree N − 1, compared to the KPM with Jackson
kernel the MEM usually yields estimates of higher resolution. However, this higher
resolution results from adding a priori assumptions and not from a true information
gain (see also Fig. 19.9). The resource consumption of the MEM is generally much
higher than the N log N behavior we found for KPM. In addition, the approach is
non-linear in the moments and can occasionally become unstable for large $N$. Note also that so far MEM has been derived only for positive quantities, $f(x) > 0$, such as densities of states or strictly positive correlation functions.
MEM, nevertheless, is a good alternative to KPM, if the calculation of the μn
is particularly time consuming. Based on only a moderate number of moments it
yields very detailed approximations of f (x), and we obtained very good results for
some computationally demanding problems [56].
The Lanczos recursion technique [57] is certainly the most capable competitor of
KPM. The use of the Lanczos algorithm [8, 58] for the characterization of spec-
tral densities [59, 60] was first proposed at about the same time as the Chebyshev
expansion approaches, and in principle Lanczos recursion is also a kind of modi-
fied moment expansion [61, 62]. Its generalization from spectral densities to zero-
temperature dynamical correlation functions was first given in terms of continued
fractions [63], and later also an approach based on the eigenstates of the tridiagonal
matrix was introduced and termed Spectral Decoding Method [64]. This technique
Fig. 19.9. Comparison of a KPM and a MEM approximation to a spectrum consisting of five
isolated δ-peaks, and to a step function. The expansion order is N = 512. Clearly, for the δ-
peaks MEM yields a higher resolution, but for the step function the Gibbs oscillations return
(algorithm of [54])
was then generalized to finite temperature [65, 66], and, in addition, some variants of
the approach for low temperature [67] and based on the micro-canonical ensemble
[68] have been proposed recently.
To give an impression, in Table 19.1 we compare the setup for the calculation
of a zero-temperature dynamical correlation function within the Chebyshev and the
Lanczos approach. The most time consuming step for both methods is the recursive
construction of a set of vectors |φn , which in terms of scalar products yield the
moments μn of the Chebyshev series or the elements αn , βn of the Lanczos tridi-
agonal matrix.

Table 19.1. Comparison of Chebyshev expansion and Lanczos recursion for the calculation of a zero-temperature dynamical correlation function $f(\omega) = \sum_n |\langle n|A|0\rangle|^2\,\delta(\omega-\omega_n)$. We assume $N$ MVMs with a $D$-dimensional sparse matrix $H$, and a reconstruction of $f(\omega)$ at $M$ points $\omega_i$.

Chebyshev expansion (KPM):
– Setup [$\mathcal{O}(ND)$]: rescale $\tilde H = (H-b)/a$; $|\phi_0\rangle = A|0\rangle$, $|\phi_1\rangle = \tilde H|\phi_0\rangle$; $\mu_0 = \langle\phi_0|\phi_0\rangle$, $\mu_1 = \langle\phi_1|\phi_0\rangle$.
– Recursion for $2N$ moments $\mu_n$: $|\phi_{n+1}\rangle = 2\tilde H|\phi_n\rangle - |\phi_{n-1}\rangle$, $\mu_{2n+2} = 2\langle\phi_{n+1}|\phi_{n+1}\rangle - \mu_0$, $\mu_{2n+1} = 2\langle\phi_{n+1}|\phi_n\rangle - \mu_1$.
– Reconstruction: apply the kernel, $\tilde\mu_n = g_n\mu_n$; Fourier transform $\tilde\mu_n \to \tilde f(\tilde\omega_i)$; rescale $f(\omega_i) = \tilde f[(\omega_i-b)/a]\,\big/\,\big(\pi\sqrt{a^2 - (\omega_i-b)^2}\big)$.
→ procedure is linear in the $\mu_n$; well defined resolution $\propto 1/N$.

Lanczos recursion:
– Setup [$\mathcal{O}(ND)$]: $\beta_0 = \sqrt{\langle 0|A^\dagger A|0\rangle}$, $|\phi_0\rangle = A|0\rangle/\beta_0$, $|\phi_{-1}\rangle = 0$.
– Recursion for $N$ coefficients $\alpha_n$, $\beta_n$: $|\phi'\rangle = H|\phi_n\rangle - \beta_n|\phi_{n-1}\rangle$, $\alpha_n = \langle\phi_n|\phi'\rangle$; $|\phi''\rangle = |\phi'\rangle - \alpha_n|\phi_n\rangle$, $\beta_{n+1} = \sqrt{\langle\phi''|\phi''\rangle}$; $|\phi_{n+1}\rangle = |\phi''\rangle/\beta_{n+1}$.
– Reconstruction: continued fraction
$$f(z) = -\frac{1}{\pi}\operatorname{Im}\cfrac{\beta_0^2}{z - \alpha_0 - \cfrac{\beta_1^2}{z - \alpha_1 - \cdots}}\,, \qquad z = \omega_i + i\epsilon\,.$$
→ procedure is non-linear in the $\alpha_n$, $\beta_n$; $\epsilon$ is somewhat arbitrary.

In terms of the number of operations the Chebyshev recursion has a
small advantage, but, of course, the application of the Hamiltonian as the dominant
factor is the same for both methods. As a drawback, at high expansion order the
Lanczos iteration tends to lose the orthogonality between the vectors |φn , which
it intends to establish by construction. When the Lanczos algorithm is applied to
eigenvalue problems this loss of orthogonality usually signals the convergence of
extremal eigenstates, and the algorithm then starts to generate artificial copies of the
converged states (see Fig. 18.5). For the calculation of spectral densities or correlation functions this means that the information content of the $\alpha_n$ and $\beta_n$ no longer increases proportionally to the number of iterations. Unfortunately, this defi-
ciency can only be cured with more complex variants of the algorithm, which also
increase the resource consumption. Chebyshev expansion is free from such defects,
as there is a priori no orthogonality between the |φn .
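The Chebyshev column of Table 19.1 translates almost line by line into code; a minimal sketch (ours), yielding 2N moments from N matrix-vector multiplications:

```python
import numpy as np

def correlation_moments(H_t, phi0, N):
    """2N moments mu_n = <phi_0|T_n(H~)|phi_0> for |phi_0> = A|0>,
    using the product relations of Table 19.1."""
    mu = np.zeros(2 * N)
    t0, t1 = phi0, H_t @ phi0                  # |phi_0>, |phi_1>
    mu[0] = np.vdot(t0, t0).real
    mu[1] = np.vdot(t1, t0).real
    for n in range(1, N):
        mu[2 * n] = 2 * np.vdot(t1, t1).real - mu[0]   # from |phi_n>
        t0, t1 = t1, 2 * (H_t @ t1) - t0               # advance to |phi_{n+1}>
        mu[2 * n + 1] = 2 * np.vdot(t1, t0).real - mu[1]
    return mu
```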
The reconstruction of the considered function from its moments μn or coeffi-
cients αn , βn , respectively, is also faster and simpler within the KPM, as it makes
use of FFT. In addition, the KPM is a linear transformation of the moments μn ,
a property we used extensively above when averaging moment data instead of the
corresponding functions. Continued fractions, in contrast, are non-linear in the co-
efficients αn , βn . A further advantage of KPM is our good understanding of its
convergence and resolution as a function of the expansion order N . For the Lanczos
algorithm these issues have not been worked out with the same rigor.
In Fig. 19.10 we compare KPM and Lanczos recursion, calculating the spectral function $-\pi^{-1}\operatorname{Im}\langle 0|c_{0\uparrow}(\omega - H)^{-1} c_{0\uparrow}^\dagger|0\rangle$ for the Hubbard model on an $L = 12$ site ring at half-filling. With the Jackson kernel all features of the dynamical corre-
lation function are resolved sharply, whereas with Lanczos recursion, by construc-
tion, we observe Lorentzian broadening. The Lanczos recursion data therefore is
Fig. 19.10. The spectral function $-\pi^{-1}\operatorname{Im}\langle 0|c_{0\uparrow}(\omega-H)^{-1}c_{0\uparrow}^\dagger|0\rangle$ calculated for the Hubbard model with $L = 12$, $N_\downarrow = N_\uparrow = 6$ using KPM and Lanczos recursion (LR). Lanczos recursion closely matches KPM with Lorentz kernel
comparable to KPM with Lorentz kernel, except that the calculation takes a little bit longer (about 10% in this simple case). Note also that within KPM the calculation of non-diagonal correlation functions, like $\langle 0|c_i(\omega - H)^{-1} c_j^\dagger|0\rangle$ with $i \ne j$, is much easier – see our discussion in Sect. 19.3.1.
In conclusion, we think that the Lanczos algorithm is an excellent tool for the
calculation of extremal eigenstates of large sparse matrices, but for spectral densi-
ties and correlation functions the KPM (MEM) is the better choice. Of course, the
advantages of both algorithms can be combined, e.g. when the Chebyshev expan-
sion starts from an exact eigenstate that was calculated with the Lanczos algorithm.
References
1. A. Weiße, G. Wellein, A. Alvermann, H. Fehske, Rev. Mod. Phys. 78, 275 (2006)
2. R.N. Silver, H. Röder, Int. J. Mod. Phys. C 5, 935 (1994)
3. J.P. Boyd, Chebyshev and Fourier Spectral Methods. No. 49 in Lecture Notes in Engineering (Springer-Verlag, Berlin, 1989)
4. M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (Dover, New York, 1970)
5. T.J. Rivlin, Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory, 2nd edn. Pure and Applied Mathematics (John Wiley & Sons, New York, 1990)
6. E.W. Cheney, Introduction to Approximation Theory (McGraw-Hill, New York, 1966)
7. G.G. Lorentz, Approximation of Functions (Holt, Rinehart and Winston, New York, 1966)
8. C. Lanczos, J. Res. Nat. Bur. Stand. 45, 255 (1950)
9. J. Skilling, in Maximum Entropy and Bayesian Methods, ed. by J. Skilling (Kluwer, Dordrecht, 1988), Fundamental Theories of Physics, pp. 455–466
10. D.A. Drabold, O.F. Sankey, Phys. Rev. Lett. 70, 3631 (1993)
11. L. Fejér, Math. Ann. 58, 51 (1904)
12. D. Jackson, Über die Genauigkeit der Annäherung stetiger Funktionen durch ganze rationale Funktionen gegebenen Grades und trigonometrische Summen gegebener Ordnung. Ph.D. thesis, Georg-August-Universität Göttingen (1911)
13. D. Jackson, T. Am. Math. Soc. 13, 491 (1912)
14. M. Frigo, S.G. Johnson, Proceedings of the IEEE 93(2), 216 (2005). Special issue on "Program Generation, Optimization, and Platform Adaptation"
15. M. Frigo, S.G. Johnson, FFTW fast Fourier transform library. URL http://www.fftw.org/
16. J.C. Wheeler, Phys. Rev. A 9, 825 (1974)
17. R.N. Silver, H. Röder, A.F. Voter, D.J. Kress, J. Comput. Phys. 124, 115 (1996)
18. P.W. Anderson, Phys. Rev. 109, 1492 (1958)
19. R. Abou-Chacra, D.J. Thouless, P.W. Anderson, J. Phys. C Solid State 6, 1734 (1973)
20. R. Haydock, R.L. Te, Phys. Rev. B 49, 10845 (1994)
21. V. Dobrosavljević, A.A. Pastor, B.K. Nikolić, Europhys. Lett. 62, 76 (2003)
22. C.M. Soukoulis, Q. Li, G.S. Grest, Phys. Rev. B 45, 7724 (1992)
23. S. Kirkpatrick, T.P. Eggarter, Phys. Rev. B 6, 3598 (1972)
24. R. Berkovits, Y. Avishai, Phys. Rev. B 53, R16125 (1996)
25. G. Schubert, A. Weiße, H. Fehske, Phys. Rev. B 71, 045126 (2005)
26. K. Fabricius, B.M. McCoy, Phys. Rev. B 59, 381 (1999)
27. C. Schindelin, H. Fehske, H. Büttner, D. Ihle, Phys. Rev. B 62, 12141 (2000)
28. H. Fehske, C. Schindelin, A. Weiße, H. Büttner, D. Ihle, Braz. J. Phys. 30, 720 (2000)
29. A. Weiße, H. Fehske, Phys. Rev. B 58, 13526 (1998)
30. M. Hohenadler, G. Wellein, A.R. Bishop, A. Alvermann, H. Fehske, Phys. Rev. B 73, 245120 (2006)
31. H. Fehske, E. Jeckelmann, in Polarons in Bulk Materials and Systems with Reduced Dimensionality, International School of Physics Enrico Fermi, Vol. 161, ed. by G. Iadonisi, J. Ranninger, G. De Filippis (IOS Press, Amsterdam, 2006), pp. 297–311
32. E. Jeckelmann, Phys. Rev. B 66, 045114 (2002)
33. A. Weiße, H. Fehske, G. Wellein, A.R. Bishop, Phys. Rev. B 62, R747 (2000)
34. E. Jeckelmann, H. Fehske, in Polarons in Bulk Materials and Systems with Reduced Dimensionality, International School of Physics Enrico Fermi, Vol. 161, ed. by G. Iadonisi, J. Ranninger, G. De Filippis (IOS Press, Amsterdam, 2006), pp. 247–284
35. D.J. Thouless, Phys. Rep. 13, 93 (1974)
36. P.A. Lee, T.V. Ramakrishnan, Rev. Mod. Phys. 57, 287 (1985)
37. B. Kramer, A. MacKinnon, Rep. Prog. Phys. 56, 1469 (1993)
38. A. Weiße, Eur. Phys. J. B 40, 125 (2004)
39. H. Tal-Ezer, R. Kosloff, J. Chem. Phys. 81, 3967 (1984)
40. J.B. Wang, T.T. Scholz, Phys. Rev. A 57, 3554 (1998)
41. V.V. Dobrovitski, H. De Raedt, Phys. Rev. E 67, 056702 (2003)
42. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. (Cambridge University Press, Cambridge, 1992)
43. J. Bonča, S.A. Trugman, I. Batistić, Phys. Rev. B 60, 1633 (1999)
44. S.A. Trugman, L.C. Ku, J. Bonča, J. Supercond. 17, 193 (2004)
45. H. Fehske, S.A. Trugman, in Polarons in Advanced Materials, ed. by A.S. Alexandrov, Springer Series in Materials Science, Vol. 103, pp. 393–461 (Canopus/Springer, Dordrecht, 2007)
46. C. Gros, R. Valentí, Ann. Phys. (Leipzig) 3, 460 (1994)
47. D. Sénéchal, D. Perez, M. Pioro-Ladrière, Phys. Rev. Lett. 84, 522 (2000)
48. D. Sénéchal, D. Perez, D. Plouffe, Phys. Rev. B 66, 075129 (2002)
49. S.R. White, Phys. Rev. Lett. 69, 2863 (1992)
50. F.H.L. Essler, H. Frahm, F. Göhmann, A. Klümper, V.E. Korepin, The One-Dimensional Hubbard Model (Cambridge University Press, Cambridge, 2005)
51. E. Jeckelmann, F. Gebhard, F.H.L. Essler, Phys. Rev. Lett. 85, 3910 (2000)
52. L.R. Mead, N. Papanicolaou, J. Math. Phys. 25, 2404 (1984)
53. I. Turek, J. Phys. C Solid State 21, 3251 (1988)
54. R.N. Silver, H. Röder, Phys. Rev. E 56, 4822 (1997)
55. K. Bandyopadhyay, A.K. Bhattacharya, P. Biswas, D.A. Drabold, Phys. Rev. E 71, 057701 (2005)
56. B. Bäuml, G. Wellein, H. Fehske, Phys. Rev. B 58, 3663 (1998)
57. E. Dagotto, Rev. Mod. Phys. 66, 763 (1994)
58. J.K. Cullum, R.A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I & II (Birkhäuser, Boston, 1985)
59. R. Haydock, V. Heine, M.J. Kelly, J. Phys. C Solid State 5, 2845 (1972)
60. R. Haydock, V. Heine, M.J. Kelly, J. Phys. C Solid State 8, 2591 (1975)
61. P. Lambin, J.P. Gaspard, Phys. Rev. B 26, 4356 (1982)
62. C. Benoit, E. Royer, G. Poussigue, J. Phys. Cond. Mat. 4, 3125 (1992)
63. E. Gagliano, C. Balseiro, Phys. Rev. Lett. 59, 2999 (1987)
64. Q. Zhong, S. Sorella, A. Parola, Phys. Rev. B 49, 6408 (1994)
65. J. Jaklič, P. Prelovšek, Phys. Rev. B 49, 5065 (1994)
66. J. Jaklič, P. Prelovšek, Adv. Phys. 49, 1 (2000)
67. M. Aichhorn, M. Daghofer, H.G. Evertz, W. von der Linden, Phys. Rev. B 67, 161103 (2003)
68. M.W. Long, P. Prelovšek, S. El Shawish, J. Karadamoglou, X. Zotos, Phys. Rev. B 68, 235106 (2003)
20 The Conceptual Background of Density-Matrix
Renormalization
In the treatment of many-particle quantum systems, one approach is to work with the
wave function and to look for an approximation which is as good as possible. The
density-matrix renormalization group method (DMRG) is a numerical procedure
which does that by selecting an optimal subspace of the complete Hilbert space in
a systematic way. It was developed in the early nineties by Steven White [1, 2] and
has since then become the most powerful tool for treating one-dimensional quan-
tum systems [3, 4, 5]. This is due to the fact that it combines spectacular accura-
cies like ten decimal places for ground-state energies, with the possibility to treat
large systems with e.g. hundreds of spins. Recently it has also been extended to
time-dependent problems. All this will be described in more detail in the following
contributions.
20.1 Introduction
In this introductory chapter, we want to give a general background for the method
and discuss some concepts which arise in the characterization and description of
quantum states. These are not only relevant for the DMRG but appear also in other
contexts and have a basic interest in themselves. Specifically, this will be entangled
states, reduced density matrices, entanglement entropies and matrix-product states.
The emphasis will be on reduced density matrices and their features. These are
crucial for the performance of the DMRG but they also arise naturally if one wants
to quantify entanglement properties. The latter have been the topic of many recent
studies and we will also give a brief account of that.
The simplest example is a system of two spins one-half. A possible state of this system is the product

$$|\Psi\rangle = |+\rangle|-\rangle\,, \qquad(20.1)$$
or, in general,
$$|\Psi\rangle = \big[a\,|+\rangle + b\,|-\rangle\big]\big[c\,|+\rangle + d\,|-\rangle\big]\,. \qquad(20.2)$$
In this case the two spins are independent of each other and all expectation values
factorize. By contrast, an entangled state is
$$|\Psi\rangle = \frac{1}{\sqrt{2}}\big[\,|+\rangle|+\rangle + |-\rangle|-\rangle\,\big]\,. \qquad(20.3)$$
This cannot be written in product form and expectation values do not factorize. The
parts of the composite system are interwoven in the wave function. This is typical
for interacting systems and is the situation one normally encounters, and has to deal
with, in many-particle problems.
In the two-spin case it is relatively easy to check whether a state has product
form or not. In the general case, one proceeds as follows. One divides the system
into two parts 1 and 2.
Expanding $|\Psi\rangle = \sum_{m,n} A_{mn}\,|m\rangle_1|n\rangle_2$ in orthonormal bases of the two parts and performing a singular value decomposition of the coefficient matrix, $\boldsymbol{A} = \boldsymbol{U}\boldsymbol{D}\boldsymbol{V}^\dagger$, leads to the Schmidt decomposition $|\Psi\rangle = \sum_n \lambda_n\,|w_n\rangle_1|w_n\rangle_2$,
which gives the total wave function as a single sum of products of orthonormal
functions. Here the number of terms is limited by the smaller of the two Hilbert
spaces and the weight factors λn are the elements of the diagonal matrix D. If |Ψ
is normalized, their absolute magnitudes squared sum to one. The entanglement
properties are encoded in the set of λn . Only if all except one are zero, the sum
reduces to a single term and |Ψ is a product state. On the other hand, if all λn are
equal in size, one would call the state maximally entangled. Of course, this refers
to a particular bipartition and one should investigate different partitions to obtain a
complete picture. One could also ask for the entanglement of more than two parts
but it turns out that there is no general extension of the Schmidt decomposition.
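For a state given as a coefficient vector, the Schmidt decomposition is exactly an SVD of the reshaped coefficient matrix; a minimal sketch (ours):

```python
import numpy as np

def schmidt_decomposition(psi, d1, d2):
    """Schmidt weights and vectors of a pure state on H_1 (x) H_2.
    psi: coefficient vector of length d1*d2, normalized."""
    A = psi.reshape(d1, d2)                 # Psi = sum_mn A_mn |m>_1 |n>_2
    U, lam, Vh = np.linalg.svd(A, full_matrices=False)
    return lam, U.T, Vh                     # rows: Schmidt vectors of parts 1, 2
```

A product state shows a single non-zero $\lambda_n$; maximal entanglement corresponds to all $\lambda_n$ equal in size.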
Starting from the density matrix of the total system,

$$\rho = |\Psi\rangle\langle\Psi|\,, \qquad(20.6)$$

one can, for a chosen division, take the trace over the degrees of freedom in one part of the system. This gives the reduced density matrix for the other part, i.e.

$$\rho_1 = \operatorname{tr}_2\,\rho\,, \qquad \rho_2 = \operatorname{tr}_1\,\rho\,.$$
The reduced density matrices can be determined for the ground states of a number of
standard systems. These are integrable spin chains like the XY model and the XXZ
model, free bosons like coupled oscillators and free fermions like hopping models.
In all these cases the reduced density matrices are found to have the form

$$\rho_\alpha = \frac{1}{Z_\alpha}\exp\Big(-\sum_l \varepsilon_l\, c_l^\dagger c_l\Big)\,, \qquad(20.9)$$

where, depending on the problem, the $c_l^\dagger$, $c_l$ are fermionic or bosonic creation and annihilation operators and the $\varepsilon_l$ are the corresponding single-particle eigenvalues.
Before we discuss (20.9) further, let us describe briefly how one can derive this
result. Basically, there are three methods to obtain the ρα .
(1) Integration over part of the variables according to the definition. This can be
done e.g. for coupled harmonic oscillators [10, 11]. In this case the ground state
is a Gaussian in the normal coordinates, and the Gaussian integrals over the variables of one part can be carried out explicitly.
(2) Via correlation functions. In a free-particle ground state, all correlation functions of operators in one subsystem factorize into products of the one-particle functions $\langle c_i^\dagger c_j\rangle$.
If all sites are in the same subsystem, a calculation using the reduced density
matrix must give the same result. This is guaranteed by Wick’s theorem if ρα is
the exponential of a free-fermion operator,

$$\rho_\alpha = K\,\exp\Big(-\sum_{i,j} H_{ij}\, c_i^\dagger c_j\Big)\,. \qquad(20.12)$$

The matrix $H_{ij}$, where $i$ and $j$ are sites in the subsystem, is determined by the one-particle correlation function $C_{ij} = \langle c_i^\dagger c_j\rangle$ via

$$\boldsymbol{H} = \ln\!\left[\frac{1-\boldsymbol{C}}{\boldsymbol{C}}\right]. \qquad(20.13)$$
The method has been used in various fermionic problems [14, 15, 16, 17, 18, 19, 20, 21, 22]. If there is pair creation and annihilation, one has to include the anomalous correlation functions $\langle c_i^\dagger c_j^\dagger\rangle$ and $\langle c_i c_j\rangle$. The approach works for arbitrary dimensions and also for bosonic systems [22, 23, 24].
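For hopping models this recipe takes only a few lines of code. The sketch below (ours; the function name is our own) diagonalizes the subsystem correlation matrix and applies (20.13) eigenvalue by eigenvalue, returning also the entanglement entropy of the reduced state:

```python
import numpy as np

def entanglement_spectrum(C):
    """Single-particle eigenvalues eps_l of the effective Hamiltonian in
    (20.12), from the subsystem correlation matrix C_ij = <c_i^dag c_j>."""
    zeta = np.linalg.eigvalsh(C)                 # eigenvalues of C, in [0, 1]
    zeta = np.clip(zeta, 1e-12, 1 - 1e-12)       # guard the logarithms
    eps = np.log((1 - zeta) / zeta)              # eq. (20.13), diagonalized
    entropy = -np.sum(zeta * np.log(zeta) + (1 - zeta) * np.log(1 - zeta))
    return eps, entropy
```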
(3) Via the connection to two-dimensional classical models. Consider a quantum
chain of finite length and imagine that one can obtain its state |Ψ from an initial
state |Ψs by applying a proper operator T many times. If T is the row-to-row
transfer matrix of a classical model, one has thereby related |Ψ to the parti-
tion function of a two-dimensional semi-infinite strip of that system. The total
density matrix |Ψ Ψ | is then given by two such strips. This is sketched on the
left of Fig. 20.1. The reduced density matrix, e.g. for the left part of the chain,
follows by identifying the variables along the right part of the horizontal edges
and summing them, which means mending the two half-strips together. In this
way, ρα is expressed as the partition function of a full strip with a perpendicular
cut, as shown on the right of Fig. 20.1.
Fig. 20.1. Density matrices for a quantum chain as two-dimensional partition functions. Left:
Expression for ρ. Right: Expression for ρ1 . The matrices are defined by the variables along
the thick lines
This approach works for the ground state of a number of integrable quantum
chains [11, 25, 26]. For example, the Ising chain in a transverse field can in this way
be related to a two-dimensional Ising model where the lattice is rotated by 45◦ with
respect to the horizontal. However, to actually calculate such a partition function
and thus ρα , one needs a further ingredient, namely the corner transfer matrices
introduced by Baxter [27]. These are partition functions for a whole quadrant as
shown in Fig. 20.2. For some non-critical integrable models, they are known in the
thermodynamic limit and have exponential form. By multiplying four of them as in
Fig. 20.2. Two-dimensional system built from four quadrants with corresponding corner
transfer matrices A, B, C, D. The arrows indicate the direction of transfer
the figure one can obtain the reduced density matrix for a half-chain which is much
longer than the correlation length.
For a continuum model, the representation just described can be viewed as a
path-integral picture. This can be utilized in particular if the two-dimensional system
is critical and conformally invariant [28, 29].
Returning to (20.9), one sees that $\rho_\alpha$ has a thermal form with some effective free-particle Hamiltonian $\mathcal{H}$ appearing in the exponent. The eigenstates $|\Phi_n^\alpha\rangle$ and their eigenvalues $w_n$ are then specified by the single-particle occupation numbers and the values of the $\varepsilon_l$. The latter can be given explicitly in a few cases but otherwise
have to be found numerically. Degeneracies in the wn will occur either if one of
the εl is zero or if they are commensurate. Note that although the ρα look like
thermal density operators, no temperature appears. However, one can ascribe an
effective temperature to the subsystem if one is dealing with a critical model where
the low-lying spectrum of H has the same linear form as that of the Hamiltonian
itself [30, 31].
For completeness, we mention that ρα can also be determined for some other
states with high symmetry [32] and for a number of systems with infinite-range
interactions [33].
20.5 Spectra
The free-particle models discussed above can be used to calculate the density-matrix
spectra and to show their typical features. It turns out that there are differences
between critical and non-critical systems and also between one and two dimensions.
We will present results for two particular models in their ground states. One is the
Ising chain in a transverse field with Hamiltonian
$$H = -\sum_n \sigma_n^z - \lambda\sum_n \sigma_n^x\,\sigma_{n+1}^x\,, \qquad(20.14)$$
which has a non-degenerate ground state without long-range order for λ < 1, a
two-fold degenerate one for λ > 1 and a quantum critical point at λ = 1. It can
be viewed also as a fermionic model with pair creation and annihilation terms. The
other one is a fermionic hopping model which in one dimension has the Hamiltonian
$$H = -\sum_n t_n\big(c_n^\dagger c_{n+1} + c_{n+1}^\dagger c_n\big)\,. \qquad(20.15)$$
Fig. 20.3. Density-matrix spectra for one-half of a transverse Ising chain with N = 20 sites in its ground state, for λ = 0.1, 0.5 and 1.0. Left: Single-particle eigenvalues εl. Right: Total eigenvalues wn. After [12]
The left part of Fig. 20.3 shows the single-particle eigenvalues εl for one-half of a transverse Ising chain. One observes the following features:
– If the system is non-critical, the dispersion is linear for the lowest εl , i.e. they are equally spaced;
– The spacing becomes smaller and the linear region shrinks as one approaches the critical point;
– At the critical point, the linear region of the dispersion curve is no longer visible.
The equidistance of the levels becomes exact in the limit of an infinite system where
it follows from the corresponding corner transfer matrix spectrum. The explicit for-
mula in this case is, for λ < 1
εl = ε (2l − 1) , l = 1, 2, 3 . . . , (20.16)
where ε = π I(k′)/I(k). Here I(k) denotes the complete elliptic integral of the first kind, k = λ and k′ = √(1 − k²) [25]. The deviations from the linear law are
therefore finite-size effects which, for fixed system size, increase near the critical
point.
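As a quick numerical illustration, the spacing ε = π I(k′)/I(k) can be evaluated with standard special-function routines; the following is a minimal Python sketch, assuming SciPy's convention that ellipk takes the parameter m = k².

import numpy as np
from scipy.special import ellipk   # complete elliptic integral of the first kind, argument m = k^2

def ising_ctm_levels(lam, nlevels=5):
    # single-particle levels eps_l = eps*(2l - 1) of (20.16), valid for lambda < 1
    eps = np.pi * ellipk(1.0 - lam**2) / ellipk(lam**2)
    return eps * (2 * np.arange(1, nlevels + 1) - 1)

for lam in (0.1, 0.5, 0.9):
    print(lam, ising_ctm_levels(lam, 3))

As λ approaches the critical value 1, the printed spacing shrinks, in line with the behavior described above.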
The eigenvalues wn of ρ1 , which follow from the single-particle spectrum, are
displayed in the right part of Fig. 20.3. One sees an extremely rapid decrease (please
note the vertical scale), because the εl appearing in the exponent are all rather large.
This is a typical property of non-critical quantum chains. For the equidistant lev-
els (20.16) one can also determine the asymptotic form of the wn [34]. The decay
becomes slower near the critical point, but is still impressive even for λ = 1.
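The relation between the εl and the total eigenvalues wn can be made concrete in a few lines of code: every wn is proportional to exp(−Σl εl nl) with occupation numbers nl ∈ {0, 1}, normalized such that the trace of ρ is one. A minimal sketch, assuming fermionic occupations and a small number of levels:

import itertools
import numpy as np

def dm_spectrum(eps):
    # all eigenvalues w_n of rho ~ exp(-sum_l eps_l n_l) for free fermions
    w = np.array([np.exp(-np.dot(occ, eps))
                  for occ in itertools.product((0, 1), repeat=len(eps))])
    w = np.sort(w)[::-1]
    return w / w.sum()               # normalize: tr(rho) = 1

# equidistant levels as in (20.16), with eps = 2 for illustration
w = dm_spectrum(2.0 * (2 * np.arange(1, 9) - 1))
print(w[:5])                         # extremely rapid decay of the largest w_n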
A closer look at critical systems, however, shows an important difference. The
spectra then depend on the size of the subsystem in an essential way. Specifically,
the single-particle dispersion becomes flatter and flatter as the size increases, and
correspondingly also the wn -curves become flatter. This is shown in Fig. 20.4 for a
segment of L sites in an infinite homogeneous hopping model. For very large L, the
εl are in this case predicted to have again a linear dispersion as in (20.16)
εl = ± π² (2l − 1) / (2 ln L) ,  l = 1, 2, 3 . . . ,  (20.17)
Fig. 20.4. Size dependence of the density-matrix spectrum in a critical system. Shown are the largest wn for segments of different length (L = 20, 40 and 100) in an infinite hopping model
Fig. 20.5. Lowest single-particle eigenvectors of a hopping model, plotted against the site j, for dimerizations δ = 0 and δ = 0.1
Fig. 20.6. Density-matrix spectra for halves of N × N hopping models (N = 12, 16 and 20) in their ground states. Left: Single-particle eigenvalues εl. Right: Total eigenvalues wn. After [12]
subsystem has two boundaries with the rest, one also finds two such states which
are practically degenerate and only differ in their reflection symmetry. As remarked
above, this leads to degeneracies in the wn and thus to a considerably slower decay
of the spectrum than if one has only one boundary. Such a feature was noted early
on when comparing DMRG calculations for open chains and rings. It also gives an
indication of what happens in two dimensions.
Spectra for homogeneous two-dimensional hopping models in the form of N ×
N squares which are divided into two halves of size N ×N/2 are shown in Fig. 20.6.
The lowest εl now have a kind of band structure with about N states in the lowest
band. These can be associated with the interface. The picture would be even clearer
if one considered a non-critical system where these states are more localized. This
band structure has drastic consequences for the wn , as seen on the right. After an
initial decay, the spectrum flattens extremely, because the corresponding wn can be
generated by a large number of different single-particle combinations. This indicates
that a DMRG calculation will not be successful in this case. Due to the long interface
one has a much higher entanglement in the wave function than in one dimension.
This feature will be discussed again in the next section in a somewhat different way.
20.6 Entanglement Entropy

A natural measure of the entanglement between the two parts of a system is the von Neumann entropy of the reduced density matrix,

S = − tr (ρ1 ln ρ1) = − Σ_n wn ln wn ,  (20.18)

where the trace has been rewritten as a sum using the eigenvalues wn. The entropy is defined in a way that certain basic requirements are automatically fulfilled. The most important properties are as follows:
– The entropy is determined purely by the spectrum of ρ1 , which is known to be
identical to the spectrum of ρ2 , therefore S1 = S2 holds for arbitrary biparti-
tions, thus giving a measure of the mutual connection of the parts;
– The entropy vanishes for product states, and has a maximal value of S = ln M
when all the eigenvalues are equal, wn = 1/M for n = 1, 2, . . . , M . Using
this one can write in general S = ln Meff , where Meff is the effective number of
coupled states in parts 1 and 2.
Apart from these basic properties, the entanglement entropy shows features
which result from the specific underlying density-matrix spectra. Correspondingly,
they are different for critical and non-critical systems and depend on the dimension-
ality. We discuss this again for solvable models.
Consider the case of free fermions or bosons where the reduced density matrix
has the exponential form (20.9). Then the entanglement entropy is given by the same
expression as in thermodynamics, namely
S = Σ_l [ ± ln(1 ± e^{−εl}) + εl / (e^{εl} ± 1) ] ,  (20.19)
where the upper (lower) sign refers to fermions (bosons), respectively. In one di-
mension, these sums can be evaluated analytically in terms of elliptic integrals, if
the εl have a linear dispersion as in (20.16) [15]. In this way, one can obtain S for
the non-critical transverse Ising chain, the XY chain or a chain of harmonic oscilla-
tors and finds that it is finite and typically of order one. Thus the corresponding ground states have Meff ∼ 1–10 and are only weakly entangled, as can be seen also
from the density-matrix spectra.
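For fermions the sum (20.19) is immediate to evaluate once the εl are known; a minimal sketch:

import numpy as np

def entropy_fermions(eps):
    # entanglement entropy (20.19), upper signs (fermions)
    eps = np.asarray(eps, dtype=float)
    return np.sum(np.log(1.0 + np.exp(-eps)) + eps / (np.exp(eps) + 1.0))

# equidistant levels (20.16) with eps = 2: the total entropy is of order one
print(entropy_fermions(2.0 * (2 * np.arange(1, 50) - 1)))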
The critical case is different, however, since as shown above the spectra then
vary with the size of the subsystem. Using the asymptotic form (20.17) for a segment
in a hopping model, one can evaluate S for large ln L by converting the sums into
integrals. This gives
S = (2 ln L / π²) [ ∫₀^∞ dε ln(1 + e^{−ε}) + ∫₀^∞ dε ε/(e^ε + 1) ] .  (20.20)

Each of the two integrals equals π²/12, so that S = (1/3) ln L, which is the conformal result (c/3) ln L with central charge c = 1 [28, 29].
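This behavior is easy to verify numerically by inserting the levels (20.17) into (20.19); since the contribution of a single level is even in ε, the ± branches of (20.17) can be summed by doubling the positive branch. A minimal sketch:

import numpy as np

def entropy_segment(L, nlevels=2000):
    # entropy from the asymptotic dispersion (20.17) inserted into (20.19)
    l = np.arange(1, nlevels + 1)
    eps = np.pi**2 * (2 * l - 1) / (2.0 * np.log(L))
    eps = eps[eps < 50.0]            # higher levels contribute negligibly
    s = np.log(1.0 + np.exp(-eps)) + eps / (np.exp(eps) + 1.0)
    return 2.0 * s.sum()             # +eps and -eps branches contribute equally

for L in (10**2, 10**4, 10**8):
    print(L, entropy_segment(L), np.log(L) / 3.0)   # approaches (1/3) ln L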
These results also show that the entanglement entropy belongs to the quantities dis-
playing critical behavior at a quantum phase transition. This is illustrated in Fig. 20.7
for the dimerized hopping model introduced in (20.15). The entropy is plotted there
against the dimerization parameter δ, which measures the distance from the critical
point δ = 0. With increasing subsystem size, the curves become more and more
peaked, signaling a singularity in the thermodynamic limit. One can also verify that
the entropy has the usual finite-size scaling properties [29]. These features were also
found in hopping models with an energy current [20].
For higher-dimensional systems, the spectra in Fig. 20.6 give some indication
of the behavior of the entropy. The low-lying band of εl roughly has the effect of
multiplying the contribution of one eigenvalue by the length of the interface. In-
deed, there is a long-standing conjecture, called the “area law”, which originated in
Fig. 20.7. Entanglement entropy for segments of different size (L = 20, 50, 100 and 300) in a one-dimensional hopping model as a function of the dimerization parameter δ. The development of a singularity in the case of vanishing dimerization is clearly visible
the context of black-hole physics [35, 36]. It states that the entropy of an entan-
gled state obtained by tracing out the degrees of freedom inside a given region in
space (the black hole) should scale with the surface area of that region (instead of its
volume). It was first checked numerically for massless bosonic fields in three spa-
tial dimensions [36] and has recently been proven for non-critical harmonic lattice
systems in arbitrary dimensions [24].
The idea of an area law is very plausible given the fact that the entanglement
entropy measures mutual connections in a wave function. However, it is not univer-
sally valid. In one dimension, the surface area of a subsystem is just the number of
contact points with the rest of the system, which would lead to a constant entropy.
This is indeed the case for non-critical systems, but as the results presented above
show, not at criticality. It is therefore an intriguing question whether this is only a peculiarity of these one-dimensional systems. Several studies in this direction have shown that in the fermionic case the violation of the area law carries over to higher-dimensional critical systems [22, 37, 38], if the Fermi surface is finite [39]. Thus, to
leading order the behavior of the entropy for fermionic systems is given by
S ∼ L^{d−1}  (non-critical case) ,
S ∼ L^{d−1} ln L  (critical case) .  (20.24)
20.7 Matrix-Product States

Thus at each site one has two coefficients for the two spin directions. Multiplying out the product, one can write this as

|Ψ⟩ = Σ_s c(s) |s⟩ ,  (20.26)

whereas, for an open chain, one uses boundary vectors in the auxiliary space.
The simplest case is a homogeneous state where the matrices are the same for all
sites. Such states were first considered in the eighties [42, 43] and occur as ground
states of certain spin chains with competing interactions [44]. The best-known ex-
ample is the spin-one chain with bilinear and biquadratic interactions and a certain
ratio of the couplings, where the valence-bond ground state [45] can be written in
this form using 2 × 2 matrices. They also appear in non-equilibrium models describ-
ing, for example, the diffusion of hard-core particles between two reservoirs. This
case corresponds to (20.29) and, depending on the parameters, the dimension m of
the matrices can be finite or infinite [46].
The first property can be seen very easily. For an open chain which is divided into
two parts, there are m connections between the matrix product to the left and to the
right of the interface. Thus
|Ψ⟩ = Σ_{n=1}^{m} βn |φn^1⟩ |φn^2⟩ .  (20.30)
This is not yet the Schmidt decomposition (20.5) because in general the states |φn^α⟩ are not orthogonal. Nevertheless, the number of terms in the Schmidt decomposi-
tion is limited by m, the dimension of the matrices. For a ring, where one has two
interfaces, it is limited by m². Correspondingly, the reduced density matrices have up to m or m², respectively, non-zero eigenvalues. If m is small, this gives the possibility to
detect such states by investigating the density-matrix spectrum [47].
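The bound on the number of non-zero eigenvalues can be checked directly: for an open chain described by matrices of dimension m, the reduced density matrix of one part has rank at most m, whatever the system size. A minimal sketch with random 2 × 2 matrices (m = 2; all names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
N, d, m = 8, 2, 2
A = rng.normal(size=(N, d, m, m))    # one matrix A_n(s) per site n and spin state s
u = v = np.ones(m)                   # boundary vectors of the open chain

# build c(s1..sN) = u^T A_1(s1) ... A_N(sN) v for all spin configurations
psi = np.zeros((d,) * N)
for s in np.ndindex(*(d,) * N):
    M = np.eye(m)
    for n, sn in enumerate(s):
        M = M @ A[n, sn]
    psi[s] = u @ M @ v
psi /= np.linalg.norm(psi)

# the reduced density matrix of the left half has at most m non-zero eigenvalues
C = psi.reshape(d ** (N // 2), -1)
print(np.sort(np.linalg.eigvalsh(C @ C.T))[::-1][:4])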
The second property excludes in principle the description of critical systems
by such a state. However, taking the matrices large enough, one may still obtain a
very good approximation for a system of finite size. The question of representing a
quantum state in terms of a matrix product has recently been investigated in detail
[48]. This was motivated partly by the fact that the DMRG produces its approximate
wave function in the form of an (inhomogeneous) matrix product [49], as will be
discussed in the next contribution. An alternative to the usual DMRG procedure
could then be to start with a matrix-product Ansatz from the beginning and to find
the matrices for the ground state by minimizing the energy [50]. This idea can be
extended to higher dimensions [51]. For example, in a square lattice the analogue of the matrices would be fourth-order tensors, which allow each site to be connected to its four neighbors.
20.8 Summary
In this contribution we have discussed quantum states in terms of their entangle-
ment properties. This approach is an alternative to the conventional characterization via correlation functions and a topic of intense current research. It also provides
the framework in which the DMRG operates. Some knowledge of it is therefore
indispensable for a deeper understanding and an appreciation of the nature of this
intriguing numerical method. We have dealt with particular many-body states in or-
der to illustrate basic features of entanglement. The DMRG is also an ideal tool if
one wants to study these features for more complicated systems, because the algo-
rithm is based on density-matrix spectra and determines them routinely. However, it
has much wider applications as will be described in the following chapters.
References
1. S.R. White, Phys. Rev. Lett. 69, 2863 (1992)
2. S.R. White, Phys. Rev. B 48, 10345 (1993)
48. F. Verstraete, J.I. Cirac, Phys. Rev. B 73, 094423 (2006)
49. S. Östlund, S. Rommer, Phys. Rev. Lett. 75, 3537 (1995)
50. F. Verstraete, D. Porras, J.I. Cirac, Phys. Rev. Lett. 93, 227205 (2004)
51. F. Verstraete, J.I. Cirac (2004). Preprint, URL http://arxiv.org/abs/cond-mat/0407066
21 Density-Matrix Renormalization
Group Algorithms
Eric Jeckelmann
Institut für Theoretische Physik, Leibniz Universität Hannover, 30167 Hannover, Germany
In this chapter I will introduce the basic Density Matrix Renormalization Group
(DMRG) algorithms for calculating ground states in quantum lattice many-body
systems using the one-dimensional spin-1/2 Heisenberg model as illustration. I will
attempt to present these methods in a manner which combines the advantages of
both the traditional formulation in terms of renormalized blocks and superblocks
and the new description based on matrix-product states. The latter description is
physically more intuitive but the former description is more appropriate for writing
an actual DMRG program. Pedagogical introductions to DMRG which closely fol-
low the original formulation are available in [2, 1]. The conceptual background of
DMRG and matrix-product states is discussed in the previous chapter and should be
read before. Extensions of the basic DMRG algorithms are presented in the chapters
that follow this one.
21.1 Introduction
The DMRG was developed by White [3, 4] in 1992 to overcome the problems
arising in the application of real-space renormalization groups to quantum lattice
many-body systems in solid-state physics. Since then the approach has been ex-
tended to a great variety of problems in all fields of physics and even in quantum
chemistry. The numerous applications of DMRG are summarized in two recent re-
view articles [5, 6]. Additional information about DMRG can be found at http:
//www.dmrg.info.
Originally, DMRG has been considered as an extension of real-space renor-
malization group methods. The key idea of DMRG is to renormalize a system us-
ing the information provided by a reduced density matrix rather than an effective
Hamiltonian (as done in most renormalization groups), hence the name density-
matrix renormalization. Recently, the connection between DMRG and matrix-
product states has been emphasized (for a recent review, see [7]) and has led to
significant extensions of the DMRG approach. From this point of view, DMRG
is an algorithm for optimizing a variational wavefunction with the structure of a
matrix-product state.
The outline of this chapter is as follows: First I briefly introduce the DMRG
matrix-product state and examine its relation to the traditional DMRG blocks and
For instance, for the spin-1/2 Heisenberg model dn = 2 and B(n) = {|↑⟩, |↓⟩}. Any state |ψ⟩ of H can be expanded in this basis: |ψ⟩ = Σ_s c(s)|s⟩. As explained in Chap. 20, the coefficients c(s) can take the form of a matrix product. Here we consider a particular matrix-product state

c(s) = A1(s1) · · · Aj(sj) Cj Bj+1(sj+1) · · · BN(sN) ,  (21.2)
where Cj is an (aj × bj+1)-matrix (i.e., with aj rows and bj+1 columns). The (an−1 × an)-matrices An(sn) (for sn = 1, . . . , dn; n = 1, . . . , j) and the (bn × bn+1)-matrices Bn(sn) (for sn = 1, . . . , dn; n = j + 1, . . . , N) fulfill the orthonormalization conditions

Σ_{sn=1}^{dn} (An(sn))† An(sn) = I   and   Σ_{sn=1}^{dn} Bn(sn) (Bn(sn))† = I ,  (21.3)
The orthonormality of each set of block states (i.e., the states belonging to the same
block Hilbert space) follows directly from the orthonormalization conditions for the
matrices An (sn ) and Bn (sn ).
Every set of block states spans a subspace of the Hilbert space associated with
the block. Using these states one can build an effective or renormalized (i.e., ap-
proximate) representation of dimension an or bn for every block. By definition, an
effective representation of dimension an for the block L(n) is made of vector and
matrix representations in a subspace basis B(L, n) for every state and operator (acting on sites in L(n)) which our calculation requires. Note that if an = Π_{k=1}^{n} dk,
the block state set is a complete basis of the block Hilbert space and the “effective”
representation is actually exact. An effective representation of dimension bn for a
right block R(n) is defined similarly using a subspace basis B(R, n).
If we combine the left block L(j) with the right block R(j + 1), we obtain
a so-called superblock {L(j) + R(j + 1)} which contains the sites 1 to N . The
tensor-product basis B(SB, j) = B(L, j) ⊗ B(R, j + 1) of the block bases is called
a superblock basis and spans a (aj bj+1 )-dimensional subspace of the system Hilbert
space H. The matrix-product state given by (21.2) can be expanded in this basis
|ψ⟩ = Σ_{α=1}^{aj} Σ_{β=1}^{bj+1} [Cj](α, β) |φα^{L(j)}⟩ |φβ^{R(j+1)}⟩ ,  (21.8)
Numerically, such a state is tractable only if all matrix dimensions are kept small, for instance an, bk ≤ m with m up to a few thousand. A matrix-product state with restricted matrix sizes can be
considered as an approximation for states in H. In particular, it can be used as a vari-
ational ansatz for the ground state of the system Hamiltonian H. Thus the system en-
ergy E = ψ|H|ψ/ψ|ψ is a function of the matrices An (sn ), Bn (sn ), and Cj . It
has to be minimized with respect to these variational parameters subject to the con-
straints (21.3) to determine the ground state. In the following sections I will present
three algorithms (a numerical renormalization group, the infinite-system DMRG
method, and the finite-system DMRG method) for carrying out this minimization.
Fig. 21.1. Schematic representations of the NRG method (left), the infinite-system DMRG
(center), and the finite-system DMRG (right). Solid circles are lattice sites and ovals are
blocks. Going from top to bottom corresponds to the iterations L(1) → L(2) → · · · →
L(5) for the three methods. In the right picture, going from bottom to top corresponds to
the iterations R(N = 8) → R(7) → · · · → R(4) in a sweep from right to left of the
finite-system DMRG
O = Σ_k O_{L,k} ⊗ O_{S,k} ,  (21.11)
where the operators OL,k act only on sites in L(j) and the operators OS,k act only
on the site j + 1. For instance, the (one-dimensional) Heisenberg Hamiltonian on
the block L(j + 1)
H = Σ_{n=1}^{j} S_n · S_{n+1} ,  (21.12)
can be decomposed as
H = Σ_{n=1}^{j−1} S_n · S_{n+1} ⊗ I + Sj^z ⊗ Sj+1^z + (1/2) (Sj^+ ⊗ Sj+1^− + Sj^− ⊗ Sj+1^+) ,  (21.13)
where I is the identity operator and S_n, Sn^z, Sn^+, Sn^− are the usual spin operators for the site n. As a result, the matrix representation of O in the basis (21.10),

O(α, sj+1; α′, s′j+1) = ⟨φα^{L(j)} sj+1| O |φα′^{L(j)} s′j+1⟩ ,  (21.14)

is given by

O(α, sj+1; α′, s′j+1) = Σ_k O_{L,k}(α, α′) O_{S,k}(sj+1, s′j+1) ,  (21.15)
where

O_{L,k}(α, α′) = ⟨φα^{L(j)}| O_{L,k} |φα′^{L(j)}⟩  (21.16)

denotes the known matrix representation of O_{L,k} in the basis B(L, j) of the block L(j). The matrix representations of the site operators

O_{S,k}(sj+1, s′j+1) = ⟨sj+1| O_{S,k} |s′j+1⟩ ,  (21.17)
can be calculated exactly. For instance, they are given (up to a factor 1/2) by the Pauli matrices for the spin operators Sn^x, Sn^y, Sn^z in the spin-1/2 basis B(n) = {|↑⟩, |↓⟩}.
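In a program, the decomposition (21.13) reduces to a few Kronecker products between stored block matrices and exact single-site matrices. A minimal sketch for the spin-1/2 chain (variable names are illustrative):

import numpy as np

# exact spin-1/2 site operators in the basis {up, down}
Sz = np.array([[0.5, 0.0], [0.0, -0.5]])
Sp = np.array([[0.0, 1.0], [0.0, 0.0]])     # S^+;  S^- is its transpose
Id2 = np.eye(2)

def enlarge_block(H_L, Sz_L, Sp_L):
    # Hamiltonian of L(j+1) from the stored L(j) operators, following (21.13)
    a = H_L.shape[0]
    H = (np.kron(H_L, Id2) + np.kron(Sz_L, Sz)
         + 0.5 * (np.kron(Sp_L, Sp.T) + np.kron(Sp_L.T, Sp)))
    # operators of the new rightmost site, needed for the next enlargement
    return H, np.kron(np.eye(a), Sz), np.kron(np.eye(a), Sp)

H, SzB, SpB = np.zeros((2, 2)), Sz, Sp      # the block L(1) is a single site
for _ in range(3):                          # grow L(1) -> L(4)
    H, SzB, SpB = enlarge_block(H, SzB, SpB)
print(np.linalg.eigvalsh(H)[0])             # ground-state energy of a 4-site chain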
Using this procedure we can construct the matrix representation (21.14) of
the Hamiltonian (restricted to the block L(j + 1)) in the basis (21.10). This ma-
trix can be fully diagonalized numerically. In practice, this sets an upper limit of a few thousand on aj dj+1. The eigenvectors are denoted φμ^{L(j+1)}(α, sj+1) for μ = 1, . . . , aj dj+1 and are ordered by increasing eigenenergies εμ^{L(j+1)}. The aj+1 eigenvectors with the lowest eigenenergies are used to define a new basis B(L, j + 1) of L(j + 1) through (21.6) and the other eigenvectors are discarded. The matrix representation in B(L, j + 1) for any operator acting in L(j + 1),

O(μ, μ′) = ⟨φμ^{L(j+1)}| O |φμ′^{L(j+1)}⟩ ;  μ, μ′ = 1, . . . , aj+1 ,  (21.18)
can be calculated using the orthogonal transformation and projection defined by the
reduced set of eigenvectors. Explicitly, we have to perform two successive matrix
products
M(α, sj+1, μ′) = Σ_{α′=1}^{aj} Σ_{s′j+1=1}^{dj+1} O(α, sj+1; α′, s′j+1) φμ′^{L(j+1)}(α′, s′j+1) ,

O(μ, μ′) = Σ_{α=1}^{aj} Σ_{sj+1=1}^{dj+1} (φμ^{L(j+1)}(α, sj+1))* M(α, sj+1, μ′) .  (21.19)
Vector representations of states in L(j + 1) can be obtained using the same princi-
ples. Therefore, we have obtained an effective representation of dimension aj+1 for
the block L(j + 1). We note that the block states (21.6) are not explicitly calculated.
Only matrix and vector representations for operators and states in that basis and the
transformation from a basis to the next one need to be calculated explicitly.
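In matrix form the two products of (21.19) are simply O → V† O V, where the columns of V are the kept eigenvectors. A minimal sketch:

import numpy as np

def renormalize(O, V):
    # project an operator into the kept subspace, the two products of (21.19);
    # V holds the kept eigenvectors as columns
    return V.conj().T @ O @ V

H = np.diag(np.arange(6.0))          # toy block Hamiltonian
w, U = np.linalg.eigh(H)
V = U[:, :3]                         # keep the three lowest eigenstates
print(renormalize(H, V))             # 3 x 3 effective representation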
Once the effective representation of L(j + 1) has been determined, the procedure
can be repeated to obtain the effective representation of the next larger block. This
procedure has to be iterated until j + 1 = N for a finite system or until a fixed
point is reached if one investigates an infinite system. After the last iteration phys-
ical quantities for the (approximate) ground state and low-energy excitations can
be calculated using the effective representation of L(N ). For instance, expectation
values are given by
⟨ψ|O|ψ⟩ = Σ_{μ,μ′=1}^{aN} [CN†](μ) O(μ, μ′) [CN](μ′) ,  (21.20)
where O(μ, μ′ ) is the matrix representation of O in the basis B(L, N ) and CN is the
(aN × 1)-matrix corresponding to the state |ψ in (21.2) and (21.8). For the ground
state we obviously have [CN ](μ) = δμ,1 .
The NRG method is efficient and accurate for quantum impurity problems such
as the Kondo model but fails utterly for quantum lattice problems such as the
Heisenberg model. One reason is that in many quantum systems the exact ground
state can not be represented accurately by a matrix-product state (21.2) with re-
stricted matrix sizes. However, another reason is that in most cases the NRG algo-
rithm does not generate the optimal block representation for the ground state of a
quantum lattice system and thus does not even find the matrix-product state (21.2)
with the minimal energy for given matrix sizes.
tend to vanish. Thus at later iterations the low-energy eigenstates of the effective
Hamiltonian in larger subsystems have unwanted features like nodes where the ar-
tificial boundaries of the previous subsystems were located. White and Noack [9]
have shown that this difficulty can be solved in single-particle problems if the ef-
fects of the subsystem environment are taken into account self-consistently. DMRG
is the application of this idea to many-particle problems. In his initial papers [3, 4],
White described two DMRG algorithms: The infinite-system method presented in
this section and the finite-system method discussed in the next section.
The infinite-system method is certainly the simplest DMRG algorithm and is
the starting point of many other DMRG methods. In this approach the system size
increases by two sites in every iteration, N → N + 2, as illustrated in Fig. 21.1. The
right block R(j + 1) is always an image (reflection) of the left block L(j), which
implies that j ≡ N/2 in (21.2). Therefore, the superblock structure is {L(N/2) +
R(N/2 + 1)} and an effective representation for the N -site system is known if we
have determined one for L(N/2).
As in the NRG method an iteration consists in the calculation of an effective
representation of dimension aj+1 for the block L(j + 1) assuming that we already
know an effective representation of dimension aj for the block L(j). First, we pro-
ceed as with the NRG method and determine an effective representation of dimen-
sion aj dj+1 for L(j + 1) using the tensor product basis (21.10). Next, the effective
representation of R(j + 2) is chosen to be an image of L(j + 1). The quantum
system is assumed to be homogeneous and symmetric (invariant under a reflection
n → n′ = N − n + 3 through the middle of the (N+2)-site lattice) to allow for
this operation. Therefore, one can define a one-to-one mapping between the site
and block bases on the left- and right-hand sides of the superblock. We consider a
mapping between the tensor product bases for L(j + 1) and R(j + 2),

|φα^{L(j)} sj+1⟩ ↔ |sj+2 φβ^{R(j+3)}⟩ .  (21.21)
Any operator acting on the superblock can again be written as a sum of products, O = Σ_k O_{L,k} ⊗ O_{R,k}, where the operator parts O_{L,k} and O_{R,k} act on sites in L(j + 1) and R(j + 2),
respectively. As an example, the Heisenberg Hamiltonian on a (N + 2)-site chain
can be written
H = Σ_{n=1}^{j} S_n · S_{n+1} ⊗ I + I ⊗ Σ_{n=j+2}^{N+1} S_n · S_{n+1} + Sj+1^z ⊗ Sj+2^z + (1/2) (Sj+1^+ ⊗ Sj+2^− + Sj+1^− ⊗ Sj+2^+) ,  (21.25)
where I is the identity operator. Therefore, the matrix representation of any operator in the superblock basis (21.23) takes the form

O(α, sj+1, sj+2, β; α′, s′j+1, s′j+2, β′) = Σ_k O_{L,k}(α, sj+1; α′, s′j+1) O_{R,k}(sj+2, β; s′j+2, β′) ,  (21.26)

which is the superblock matrix of the operator sum

O = Σ_k O_{L,k} ⊗ O_{R,k} .  (21.27)
Storing the matrix representations (21.14) and (21.22) for the block operators requires a memory amount ∝ nk [(aj dj+1)² + (dj+2 bj+3)²], but calculating and storing the superblock matrix (21.26) requires nk (Dj+1)² additional operations and a memory amount ∝ (Dj+1)², where Dj+1 = aj dj+1 dj+2 bj+3 is the superblock dimension. As the number of operator pairs nk is typically much smaller than the matrix dimensions aj, bj+3 (nk = 5 in the Heisenberg model on an open chain), one should not calculate the superblock matrix representation (21.26) explicitly but work directly with the right-hand side of (21.27). For instance, the ap-
plication of the operator O to a state |ψ ∈ H yields a new state |ψ ′ = O|ψ, which
can be calculated without computing the superblock matrix (21.26) explicitly. If

[Cj+1](α, sj+1, sj+2, β) = ⟨φα^{L(j)} sj+1 sj+2 φβ^{R(j+3)} |ψ⟩ ,  (21.28)
is the vector representation of |ψ in the superblock basis (21.23), the vector repre-
sentation C′j+1 of |ψ ′ in this basis is obtained through double matrix products with
the block operator matrices in (21.27)
[C′j+1](α, sj+1, sj+2, β) = Σ_k Σ_{α′=1}^{aj} Σ_{s′j+1=1}^{dj+1} O_{L,k}(α, sj+1; α′, s′j+1) Σ_{s′j+2=1}^{dj+2} Σ_{β′=1}^{bj+3} O_{R,k}(sj+2, β; s′j+2, β′) [Cj+1](α′, s′j+1, s′j+2, β′) .  (21.29)
Performing these operations once requires only nk Dj+1 (aj dj+1 + dj+2 bj+3) operations, while computing a matrix-vector product using the superblock matrix (21.26) would require (Dj+1)² operations. In practice, this sets an upper limit of the order of a few thousand on the matrix dimensions an, bn.
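In code, (21.29) amounts to two matrix products per operator pair, with Cj+1 stored as a matrix of shape (aj dj+1) × (dj+2 bj+3); the superblock matrix is never built. A minimal sketch, assuming lists OL and OR of matching block-operator matrices:

import numpy as np

def superblock_matvec(C, OL, OR):
    # |psi'> = sum_k (O_{L,k} tensor O_{R,k}) |psi> without the superblock matrix;
    # C has shape (a_j*d_{j+1}, d_{j+2}*b_{j+3})
    Cp = np.zeros_like(C)
    for Lk, Rk in zip(OL, OR):
        Cp += Lk @ C @ Rk.T
    return Cp

# toy usage with a single operator pair
C = np.random.rand(4, 6)
print(superblock_matvec(C, [np.eye(4)], [np.eye(6)]).shape)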
As we want to calculate the ground state of the system Hamiltonian H, the next
task is to set up the superblock representation (21.27) of H and then to determine
the vector representation (21.28) of its ground state in the superblock basis. To de-
termine the ground state without using the superblock matrix (21.26) of H we use
iterative methods such as the Lanczos algorithm or the Davidson algorithm, see
Chap. 18. These algorithms do not require an explicit matrix for H but only the
operation |ψ ′ = H|ψ, which can be performed very efficiently with (21.29) as
discussed above.
Once the superblock ground state Cj+1 has been determined, the next step is
finding an effective representation of dimension aj+1 < aj dj+1 for L(j + 1) which describes this ground state as closely as possible. Thus we look for the best approximation C̃j+1 of the superblock ground state Cj+1 with respect to a new basis B(L, j + 1) of dimension aj+1 for L(j + 1). As discussed in Chap. 20
this can be done using the Schmidt decomposition or more generally reduced den-
sity matrices. Choosing the density-matrix eigenvectors with the highest eigenval-
ues is an optimal choice for constructing a smaller block basis (see Sect. 21.7).
Therefore, if the DMRG calculation targets a state with a vector representation
[Cj+1 ](α, sj+1 , sj+2 , β) in the superblock basis (21.23), we calculate the reduced
density matrix for the left block L(j + 1)
ρ(α, sj+1; α′, s′j+1) = Σ_{sj+2=1}^{dj+2} Σ_{β=1}^{bj+3} ([Cj+1](α, sj+1, sj+2, β))* [Cj+1](α′, s′j+1, sj+2, β)  (21.30)
for α, α′ = 1, . . . , aj and sj+1 , s′j+1 = 1, . . . , dj+1 . This density matrix has aj dj+1
eigenvalues wμ ≥ 0 with
Σ_{μ=1}^{aj dj+1} wμ = 1 .  (21.31)
We denote by φμ^{L(j+1)}(α, sj+1) the corresponding eigenvectors. The aj+1 eigenvectors
with the largest eigenvalues are used to define a new basis B(L, j + 1) of L(j + 1)
through (21.6) and the other eigenvectors are discarded. As done in the NRG
method, the matrix representation of any operator in L(j + 1) can be calculated
using the orthogonal transformation and projection (21.19) defined by the reduced
set of eigenvectors. If necessary, vector representations of states in L(j + 1) can be
obtained using the same principles.
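Reshaping the target state into a matrix makes (21.30) and the subsequent truncation one-liners. A minimal sketch:

import numpy as np

def truncated_basis(C, m):
    # C: target state reshaped to (a_j*d_{j+1}, d_{j+2}*b_{j+3});
    # returns the m density-matrix eigenvectors with the largest w_mu
    rho = C @ C.conj().T                 # reduced density matrix (21.30)
    w, U = np.linalg.eigh(rho)           # eigenvalues in ascending order
    V = U[:, ::-1][:, :m]                # keep the m largest
    discarded = 1.0 - np.sort(w)[::-1][:m].sum()
    return V, discarded

C = np.random.rand(8, 8)
C /= np.linalg.norm(C)                   # normalized target state, tr(rho) = 1
V, P = truncated_basis(C, 4)
print(P)                                 # discarded weight, cf. (21.44)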
Thus, we have obtained an effective representation of dimension aj+1 for the
block L(j + 1). We note that as with the NRG method the block states (21.6) are not
explicitly calculated. Only matrix and vector representations of operators and states
in that basis and the transformation from a basis to the next one need to be calculated
explicitly. The procedure can be repeated to obtain an effective representation of the
next larger blocks (i.e., for the next larger lattice size). Iterations are continued until
a fixed point has been reached.
As an illustration Fig. 21.2 shows the convergence of the ground state energy per
site as a function of the superblock size N in the one-dimensional spin-1/2 Heisenberg model. The energy per site EDMRG(N) is calculated from the total energy E0 for two consecutive superblocks, EDMRG(N) = [E0(N) − E0(N − 2)]/2. The exact result for an infinite chain is Eexact = 1/4 − ln(2) according to the Bethe ansatz solu-
tion [10]. The matrix dimensions an , bn are chosen to be not greater than a number
m which is the maximal number of density-matrix eigenstates kept at each itera-
tion. As N increases, EDMRG (N ) converges to a limiting value EDMRG (m) which
is the minimal energy for a matrix-product state (21.2) with matrix dimensions up
to m. This energy minimum EDMRG (m) is always higher than the exact ground
state energy Eexact as expected for a variational method. The error in EDMRG (m)
is dominated by truncation errors, which decrease rapidly as the number m increases
(see the discussion of truncation errors in Sect. 21.7).
Once a fixed point has been reached, ground state properties can be calculated.
For instance, a ground state expectation value Ō = ψ|O|ψ is obtained in two
Fig. 21.2. Convergence of the ground state energy per site calculated with the infinite-system DMRG algorithm in a spin-1/2 Heisenberg chain as a function of the superblock size N for three different numbers of density-matrix eigenstates kept (m = 20, 50 and 100)
steps: First, one calculates |ψ ′ = O|ψ using (21.29), then the expectation value is
computed as a scalar product Ō = ⟨ψ|ψ′⟩. Explicitly,

⟨ψ|ψ′⟩ = Σ_{α=1}^{aj} Σ_{sj+1=1}^{dj+1} Σ_{sj+2=1}^{dj+2} Σ_{β=1}^{bj+3} ([Cj+1](α, sj+1, sj+2, β))* [C′j+1](α, sj+1, sj+2, β) .  (21.32)
effective representation of dimension bj+2 for the right block R(j + 2) calculated
in the previous iteration. For the first iteration j = N − 2, the exact representation
of R(N ) is used. As done for left blocks in the NRG and infinite-system DMRG
algorithm, we first define a tensor-product basis of dimension dj+1 bj+2 for the new
right block using the site basis B(j + 1) and the subspace basis B(R, j + 2) of
R(j + 2),

|sj+1 φβ^{R(j+2)}⟩ = |sj+1⟩ ⊗ |φβ^{R(j+2)}⟩ ,  (21.33)
where the OS,k (sj+1 , s′j+1 ) are site-operator matrices (21.17) and OR,k (β, β ′ ) de-
notes the known matrix representations of operators acting on sites of R(j + 2)
in the basis B(R, j + 2). Thus we obtain an effective representation of dimension
dj+1 bj+2 for R(j +1). Next, we use the available effective representation of dimen-
sion aj−1 for the left block L(j − 1), which has been obtained during the previous
sweep from left to right (or the result of the warmup sweep if this is the first sweep
from right to left). With this block L(j − 1) we build an effective representation of
dimension aj−1 dj for L(j) using a tensor-product basis (21.10) as done in the NRG
and infinite-system DMRG methods.
Now we consider the superblock {L(j) + R(j + 1)} and its tensor-product basis
analogue to (21.23) and set up the representation of operators in this basis, espe-
cially the Hamiltonian, similarly to (21.27). As for the infinite-system algorithm we
determine the ground state Cj of the superblock Hamiltonian in the superblock ba-
sis using the Lanczos or Davidson algorithm and the efficient implementation of the
matrix-vector product (21.29). Typically, we have already obtained a representation
of the ground state Cj+1 for the superblock configuration {L(j + 1) + R(j + 2)} in
the previous iteration. This state can be transformed exactly in the superblock basis
for {L(j) + R(j + 1)} using
[CjG](α, sj, sj+1, β) = Σ_{α′=1}^{aj} Σ_{sj+2=1}^{dj+2} Σ_{β′=1}^{bj+3} φα′^{L(j)}(α, sj) [Cj+1](α′, sj+1, sj+2, β′) (φβ^{R(j+2)}(sj+2, β′))* .  (21.35)
CjG can be used as the initial vector for the iterative diagonalization routine. When the finite-system DMRG algorithm has already partially converged, this initial state CjG is a good guess for the exact ground state Cj of the superblock Hamiltonian in the configuration {L(j) + R(j + 1)} and thus the iterative diagonalization method converges in a few steps. This can result in a speed-up of one or two orders of magnitude compared to a diagonalization using a random initial vector.
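The transformation (21.35) consists of two tensor contractions of the stored density-matrix eigenvectors with the four-index array Cj+1. A minimal sketch using numpy's tensordot (shapes follow the text; all names are illustrative):

import numpy as np

def transform_wavefunction(C, phiL, phiR):
    # initial guess C^G_j from C_{j+1}, following (21.35)
    # C:    (a_j, d_{j+1}, d_{j+2}, b_{j+3}) superblock ground state
    # phiL: (a_{j-1}, d_j, a_j) left-block density-matrix eigenvectors
    # phiR: (d_{j+2}, b_{j+3}, b_{j+2}) right-block eigenvectors
    T = np.tensordot(phiL, C, axes=([2], [0]))          # contract alpha'
    return np.tensordot(T, phiR.conj(), axes=([3, 4], [0, 1]))

C = np.random.rand(4, 2, 2, 5)
CG = transform_wavefunction(C, np.random.rand(3, 2, 4),
                            np.random.rand(2, 5, 4))
print(CG.shape)      # (3, 2, 2, 4), i.e. (a_{j-1}, d_j, d_{j+1}, b_{j+2})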
Once the superblock representation Cj of the targeted ground state has been
obtained, we calculate the reduced density matrix for the right block R(j + 1)
ρ(sj+1, β; s′j+1, β′) = Σ_{sj=1}^{dj} Σ_{α=1}^{aj−1} ([Cj](α, sj, sj+1, β))* [Cj](α, sj, s′j+1, β′) ,  (21.36)
O(μ, μ′) = Σ_{β=1}^{bj+2} Σ_{sj+1=1}^{dj+1} (φμ^{R(j+1)}(sj+1, β))* M(sj+1, β, μ′) ,  (21.37)
When this left-to-right sweep is done, one can start a new couple of sweeps back
and forth. The ground state energy calculated with the superblock Hamiltonian de-
creases progressively as the sweeps are performed. This results from the progres-
sive optimization of the matrix-product state (21.2) for the ground state. Figure 21.3
illustrates this procedure for the total energy of a 400-site Heisenberg chain. The
matrix dimensions an , bn are chosen to be not greater than m = 20 (maximal num-
ber of density-matrix eigenstates kept at each iteration). The sweeps are repeated
until the procedure converges (i.e., the ground state energy converges). In Fig. 21.3
the DMRG energy converges to a value EDMRG (m = 20) which lies about 0.008
above the exact result for the 400-site Heisenberg chain. Since it corresponds to a variational wavefunction (21.2), the DMRG energy EDMRG(m) always lies above the exact ground state energy and decreases as m increases.
Once convergence is achieved, ground state properties can be calculated with
(21.29) and (21.32) as explained in the previous section. Contrary to the infinite-
system algorithm, however, the finite-system algorithm yields consistent results for
the expectation values of operators acting on any lattice site. For example, we show
in Fig. 21.4 the staggered spin bond order (−1)^n (⟨S_n · S_{n+1}⟩ + ln(2) − 1/4) and the staggered spin-spin correlation function C(r) = (−1)^r ⟨S_n · S_{n+r}⟩ obtained in
the 400-site Heisenberg chain using up to m = 200 density-matrix eigenstates. A
strong staggered spin bond order is observed close to the chain edges (Friedel os-
cillations) while a smaller one is still visible in the middle of the chain because
of its finite size. For a distance up to r ≈ 100 the staggered spin-spin correlation
function C(r) decreases approximately as a power-law 1/r as expected but a devi-
ation from this behavior occurs for larger r because of the chain edges. Finite-size
Fig. 21.3. Convergence of the ground state energy calculated with the finite-system DMRG algorithm using m = 20 density-matrix eigenstates as a function of the iterations in a 400-site spin-1/2 Heisenberg chain. Arrows show the sweep direction for the first three sweeps starting from the top
Fig. 21.4. Staggered spin bond order (left) (−1)^n (⟨S_n · S_{n+1}⟩ − 1/4 + ln(2)) and staggered spin-spin correlation function (right) C(r) = (−1)^r ⟨S_n · S_{n+r}⟩. Both quantities have been calculated using the finite-system DMRG algorithm with m = 200 in a 400-site spin-1/2 Heisenberg chain. The dashed line C(r) = 0.8/r is a guide for the eye
and chain-end effects are unavoidable and sometimes troublesome features of the finite-system DMRG method.
Contrary to the infinite-system algorithm the finite-system algorithm always
finds the optimal matrix-product state (21.2) with restricted matrix sizes. Never-
theless, experience shows that the accuracy of DMRG calculations depends sig-
nificantly on the system investigated because the matrix-product state (21.2) with
restricted matrix sizes can be a good or a poor approximation of the true ground
state. In practice, this implies that physical quantities calculated with DMRG can
approach the exact results rapidly or slowly for an increasing number m of density-
matrix eigenstates kept. This so-called truncation error is discussed in Sect. 21.7.
For instance, the finite-system DMRG method yields excellent results for gapped
one-dimensional systems but is less accurate for critical systems or in higher di-
mensions for the reason discussed in Chap. 20.
where the index r numbers the possible quantum numbers qr^{L(j)} of QL(j), α numbers the ar,j basis states with the same quantum number, and Σ_r ar,j = aj.
We note that QL(j+1) = QL(j) + Qj+1. Thus if we choose the site basis states in B(j + 1) to be eigenstates of the site operator Qj+1 and denote by |t, sj+1⟩ a basis state with quantum number qt^{S(j+1)}, the tensor product state (21.10) becomes

|φr,α^{L(j)}; t, sj+1⟩ = |φr,α^{L(j)}⟩ ⊗ |t, sj+1⟩ ,  (21.40)
and its quantum number (eigenvalue of QL(j+1)) is given by qp^{L(j+1)} = qr^{L(j)} + qt^{S(j+1)}. Therefore, the corresponding density-matrix eigenstates take the form φp,α^{L(j+1)}(r, α′, t, sj+1) and vanish if qp^{L(j+1)} ≠ qr^{L(j)} + qt^{S(j+1)}, see (21.6). Similarly, the density-matrix eigenstates for a right block are denoted φp,β^{R(j+1)}(t, sj+1, r, β′) and vanish if qp^{R(j+1)} ≠ qr^{R(j+2)} + qt^{S(j+1)}. We can save computer time and memory if we use this rule to compute and store only the terms which do not identically vanish.
Furthermore, as Q = QL(j) + Qj+1 + Qj+2 + QR(j+3), a superblock basis state (21.23) can be written

|φp,α^{L(j)}; r, sj+1; t, sj+2; φv,β^{R(j+3)}⟩ = |φp,α^{L(j)}; r, sj+1⟩ ⊗ |t, sj+2; φv,β^{R(j+3)}⟩ ,  (21.41)

and its quantum number (eigenvalue of Q) is given by q = qp^{L(j)} + qr^{S(j+1)} + qt^{S(j+2)} + qv^{R(j+3)}. Therefore, the superblock representation (21.28) of a state |ψ⟩ with a quantum number q can be written [Cj+1](p, α, r, sj+1, t, sj+2, v, β) and vanishes if q ≠ qp^{L(j)} + qr^{S(j+1)} + qt^{S(j+2)} + qv^{R(j+3)}. Here again we can save computer time and memory if we use this rule to compute and store only the components of Cj+1 which do not identically vanish.
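A simple way of exploiting this rule in a program is to group the basis states by their quantum number and to diagonalize the density matrix block by block. A minimal sketch, assuming a vector q holding one quantum number per basis state:

import numpy as np

def blockwise_eigh(rho, q):
    # diagonalize a density matrix that is block diagonal in the quantum numbers q
    w = np.zeros(len(q))
    U = np.zeros_like(rho)
    for qn in np.unique(q):
        idx = np.where(q == qn)[0]
        wb, ub = np.linalg.eigh(rho[np.ix_(idx, idx)])
        w[idx] = wb
        U[np.ix_(idx, idx)] = ub         # eigenvectors stay inside their block
    return w, U

# four states with S^z quantum numbers -1, 0, 0, +1; mixing only within q = 0
q = np.array([-1, 0, 0, 1])
rho = np.diag([0.1, 0.3, 0.4, 0.2])
rho[1, 2] = rho[2, 1] = 0.05
print(blockwise_eigh(rho, q)[0])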
ground state with too much precision but strike a balance between accuracy and
computational cost. In DMRG algorithms that target other states than the ground
state (for instance, dynamical correlation functions, see Chap. 22), the diagonaliza-
tion error may become relevant.
Convergence errors correspond to non-optimal matrices An(sn) and Bn(sn)
in the matrix-product state (21.2). They are negligible in DMRG calculations for
ground state properties in non-critical one-dimensional open systems with nearest-
neighbor interactions. For such cases DMRG converges after very few sweeps
through the lattice. Convergence problems occur frequently in critical or inhomo-
geneous systems and in systems with long-range interactions (this effectively in-
cludes all systems in dimension larger than one, see the last section). However, if
one performs enough sweeps through the lattice (up to several tens in hard cases),
these errors can always be made smaller than truncation errors (i.e., the finite-system
DMRG algorithm always finds the optimal matrices for a matrix-product state (21.2)
with restricted matrix sizes).
Truncation errors are usually the dominant source of inaccuracy in the finite-
system DMRG method. They can be systematically reduced by increasing the ma-
trix dimensions an , bn used in (21.2). In actual computations, however, they can be
significant and it is important to estimate them reliably. In the finite-system DMRG
algorithm a truncation error is introduced at every iteration when a tensor-product
basis of dimension aj dj+1 for the left block L(j + 1) is reduced to a basis of di-
mension aj+1 during a sweep from left to right and, similarly, when a tensor-product
basis of dimension bj+2 dj+1 for the right block R(j + 1) is reduced to a basis of
dimension bj+1 during a sweep from right to left. Each state |ψ which is defined
using the original tensor-product basis (usually, the superblock ground state) is re-
placed by an approximate state |ψ # which is defined using the truncated basis. It has
been shown [1] that the optimal choice for constructing a smaller block basis for a
given target state |ψ consists in choosing the eigenvectors with the highest eigen-
values wμ from the reduced density matrix (21.30) or (21.36) of |ψ⟩ for this block. More precisely, this choice minimizes the difference S = ∥ |ψ⟩ − |ψ̃⟩ ∥² between the target state |ψ⟩ and its approximation |ψ̃⟩.
The minimum of S is given by the weight P of the discarded density-matrix
eigenstates. With w1 ≥ w2 ≥ · · · ≥ w_{aj dj+1} we can write

Smin = P(aj+1) = Σ_{μ=aj+1+1}^{aj dj+1} wμ = 1 − Σ_{μ=1}^{aj+1} wμ  (21.44)

for the left block L(j + 1) and similarly Smin = P(bj+1) = 1 − Σ_{μ=1}^{bj+1} wμ for the right block R(j + 1).
right block R(j + 1). It can be shown that errors in physical quantities depend di-
rectly on the discarded weight. For the ground-state energy the truncation introduces
an error
⟨ψ̃|H|ψ̃⟩ / ⟨ψ̃|ψ̃⟩ − ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩ ∝ P(aj+1) or P(bj+1) ,  (21.45)
Fig. 21.5. Error in the ground state energy calculated with the finite-system DMRG algorithm as a function of the number m of density-matrix eigenstates kept for the spin-1/2 Heisenberg Hamiltonian on a one-dimensional 100-site lattice with open (circles) and periodic (squares) boundary conditions
magnitude of round-off errors in the computer system used. For periodic boundary
conditions (a less favorable case) the error decreases slowly with m and is still sig-
nificant for the largest number of density-matrix eigenstates considered m = 400.
In the second approach the density-matrix eigenbasis is truncated so that the discarded weight is approximately constant, P(aj+1), P(bj+1) ≈ P, and thus a variable number of density-matrix eigenstates is kept at every iteration. The physical quantities obtained with this procedure depend on the chosen discarded weight P. Empirically, one finds that the relations (21.45) and (21.46) hold for DMRG results calculated with various P. For the energy one has EDMRG(P) ≈ E(P = 0) + cP and for other expectation values ŌDMRG(P) ≈ Ō(P = 0) + c′√P if P is small enough.
enough. Therefore, we can carry out DMRG calculations for several values of the
discarded weight P and obtain results E(P = 0) and Ō(P = 0) in the limit of van-
ishing discarded weight P → 0 using an extrapolation. In practice, this procedure
yields reliable estimations of the truncation errors and often the extrapolated results
are more accurate than those obtained directly with DMRG for the smallest value of
P used in the extrapolation.
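The extrapolation itself is a simple polynomial fit of the measured energies against the discarded weight; a minimal sketch (the sample numbers are invented purely for illustration):

import numpy as np

# energies obtained at several discarded weights P (illustrative values only)
P = np.array([1e-6, 1e-7, 1e-8, 1e-9])
E = np.array([-43.9912, -43.99905, -43.99986, -43.99998])

c, E0 = np.polyfit(P, E, 1)     # linear law E(P) ~ E(P=0) + c*P of (21.45)
print("extrapolated E(P -> 0):", E0)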
It should be noted that if one works with a fixed number m of density-matrix eigenstates kept, it is possible to calculate an average discarded weight P̄(m) of the values P(bj+1) over a sweep. In many cases, the physical quantities EDMRG(m) and ŌDMRG(m) scale with P̄(m) as in (21.45) and (21.46), respectively. Therefore, an extrapolation to the limit of vanishing discarded weight P̄(m) → 0 is also possible (see [12] for some examples).
where q is the number of fermions in the state |α′⟩ and Δq is the difference between the number of fermions in the states |β⟩ and |β′⟩.
To apply the DMRG method to boson systems such as electron-phonon models,
we must first choose an appropriate finite basis for each boson site to represent the
infinite Hilbert space of a boson as best as possible, which is done also in exact
diagonalization methods [12]. Then the finite-system DMRG algorithm can be used
without modification. However, the computational cost scales as d³ for the CPU time and as d² for the memory if d states are used to represent each boson site. Typically, d = 10–100 is required for accurate computations in electron-phonon
models. Therefore, simulating boson systems with the standard DMRG algorithms
is significantly more demanding than spin systems. More sophisticated DMRG al-
gorithms have been developed to reduce the computational effort involved in solving
boson systems. The best algorithms scale as d or d ln(d) and are presented in [12].
Fig. 21.6. Schematic representations of the site sequence (dashed line) in a two-dimensional
lattice. The site in the bottom left corner is site 1. The superblock structure {L(21) + site 22 + site 23 + R(24)} is shown with solid lines delimiting the left and right blocks and full circles indicating the two sites
The finite-system DMRG method can be applied to quantum systems with vari-
ous degrees of freedom, on lattices in dimension larger than one, and to the non-local
Hamiltonians considered in quantum chemistry and momentum space, see Chap. 24.
We just have to order the lattice sites from 1 to N in some way to be able to carry
out the algorithm described in Sect. 21.5. For instance, Fig. 21.6 shows one possi-
ble site sequence for a two-dimensional cluster. It should be noted that sites which
are close in the two-dimensional lattice are relatively far apart in the sequence. This
corresponds to an effective long-range interaction between the sites even if the two-
dimensional system includes short-range interactions only, and results in a slower
convergence and larger truncation errors than in truly one-dimensional systems with
short-range interactions. As a consequence, reordering of the lattice sites can sig-
nificantly modify the accuracy of a DMRG calculation and various site sequences
should be considered for those systems which do not have a natural order. The dif-
ficulty with DMRG simulations and more generally with matrix-product states in
dimensions larger than one is discussed fully in Chap. 20.
References
1. I. Peschel, X. Wang, M. Kaulke, K. Hallberg (eds.), Density-Matrix Renormalization, A
New Numerical Method in Physics (Springer, Berlin, 1999) 597, 614
2. R.M. Noack, S.R. Manmana, in Lectures on the Physics of Highly Correlated Electron
Systems IX: Ninth Training Course in the Physics of Correlated Electron Systems and
High-Tc Superconductors, AIP Conf. Proc., Vol. 789, ed. by A. Avella, F. Mancini (AIP,
2005), pp. 93–163 597
3. S.R. White, Phys. Rev. Lett. 69(19), 2863 (1992) 597, 603
4. S.R. White, Phys. Rev. B 48(14), 10345 (1993) 597, 603
5. U. Schollwock, Rev. Mod. Phys. 77(1), 259 (2005) 597
6. K. Hallberg, Adv. Phys. 55, 477 (2006) 597
7. I.P. McCulloch (2007). URL http://arxiv.org/abs/cond-mat/0701428.
Preprint 597, 611
22 Dynamical Density-Matrix Renormalization Group

Eric Jeckelmann and Holger Benthien

22.1 Introduction
Calculating the dynamical correlation functions of quantum many-body systems
has been a long-standing problem of theoretical physics because many experi-
mental techniques probe these properties. For instance, solid-state spectroscopy
experiments, such as optical absorption, photoemission, or nuclear magnetic res-
onance, measure the dynamical correlations between an external time-dependent
perturbation and the response of electrons and phonons in solids [1]. Typically, the
zero-temperature dynamic response of a quantum system is given by a dynamical
correlation function (with ħ = 1)

GX(ω + iη) = −(1/π) ⟨ψ0| X† [E0 + ω + iη − H]^{−1} X |ψ0⟩ ,  (22.1)
where H is the time-independent Hamiltonian of the system, E0 and |ψ0 are its
ground-state energy and wavefunction, X is the quantum operator corresponding to
the physical quantity which is analyzed, and X † is the Hermitian conjugate of X. A
small real number η > 0 is used to shift the poles of the correlation function into the
complex plane. The spectral function GX (ω + iη) is also the Laplace transform (up
to a constant prefactor) of the zero-temperature time-dependent correlation function
It describes electrons with spin σ =↑, ↓ which can hop between neighboring sites
on a lattice. Here c†jσ and cjσ are creation and annihilation operators for electrons
with spin σ at site j (= 1, . . . , N ), njσ = c†jσ cjσ are the corresponding density
operators, and nj = nj↑ + nj↓ . The hopping integral t gives rise to a single-electron
band of width 4t. The Coulomb repulsion between electrons is mimicked by a local
Hubbard interaction U ≥ 0. The chemical potential has been chosen as μ = U/2 so
that the number of electrons is equal to the number of sites N (half-filled band) in the
grand-canonical ground state and the Fermi energy is εF = 0 in the thermodynamic
limit. The photoemission spectral function A(k, ω) is the imaginary part of the one-
particle Green’s function
for the operator X = ckσ which annihilates an electron with spin σ in the Bloch state
with wavevector k ∈ (−π, π]. This spectral function corresponds to the spectrum
measured in angle-resolved photoemission spectroscopy experiments. We note that
the spectral function of the Hubbard model is symmetric with respect to spatial
reflection Aσ (−k, ω) = Aσ (k, ω) and spin-reflection A↑ (k, ω) = A↓ (k, ω). The
one-particle density of states (DOS) is
1
nσ (ω ≤ 0) = Aσ (k, ω) . (22.6)
N
k
spectra. Then we will present the dynamical DMRG method, which is presently the
best frequency-space DMRG approach for calculating zero-temperature dynamical
correlation functions when the spectrum is complex or continuous and allows us to
determine spectral properties in the thermodynamic limit (i.e., for infinitely large
lattices). The basic principles of the DMRG method are described in the Chaps. 20
and 21 of this book and are assumed to be known. The direct calculation of time-
dependent quantities (22.2) within DMRG is explained in Chap. 23 while methods
for computing dynamical quantities at finite temperature are described in Chap. 25.
In terms of the eigenstates |n⟩ of H, the correlation function takes the form

GX(ω + iη) = −(1/π) Σ_n |⟨n|X|0⟩|² / (E0 + ω + iη − En) ,  (22.7)

where En − E0 is the excitation energy and |⟨n|X|0⟩|² the spectral weight of the n-th excited state. Obviously, only states with a finite spectral weight contribute to the
dynamical correlation function. Typically, the number of contributing excited states
scales as a power of the system size N (while the Hilbert space dimension increases
exponentially with N ). In principle, one can calculate the contributing excited states
only and reconstruct the spectrum from the sum over these states (22.7).
The simplest method for computing excited states within DMRG is to target
the lowest M eigenstates |ψs instead of the sole ground state using the standard
algorithm. In that case, the density matrix is formed as the sum
ρ = Σ_{s=1}^{M} cs ρs  (22.8)
of the density matrices ρs = |ψs ψs | for each target state [8]. As a result the DMRG
algorithm produces an effective Hamiltonian describing these M states accurately.
Here the coefficients cs > 0 are normalized weighting factors (Σ_s cs = 1), which
allow us to vary the influence of each target state in the formation of the density
matrix. This approach yields accurate results for some problems such as the Holstein
polaron [9]. In most cases, however, this approach is limited to a small number M
of excited states (of the order of ten) because DMRG truncation errors grow rapidly
with the number of targeted states (for a fixed number of density-matrix eigenstates
kept). This is not sufficient for calculating a complete spectrum for a large system
and often does not even allow for the calculation of low-energy excitations. For
instance, in the strong-coupling regime U ≫ t of the half-filled one-dimensional
The Lanczos-DMRG method [12, 13] combines DMRG with the Lanczos algo-
rithm [14] to compute dynamical correlation functions. Starting from the states
|φ−1 = 0 and |φ0 = X|ψ0 , the Lanczos algorithm recursively generates a set
of so-called Lanczos vectors:
where an = φn |H|φn /φn |φn and b2n+1 = φn+1 |φn+1 /φn |φn for n =
0, . . . , L − 1. These Lanczos vectors span a Krylov subspace containing the ex-
cited states contributing to the dynamical correlation function (22.1). Calculating L
Lanczos vectors gives the first 2L − 1 moments of a spectrum and up to L excited
states contributing to it. The spectrum can be obtained from the continued fraction
expansion
−π GX(z − E0) = ⟨ψ0|X†X|ψ0⟩ / ( z − a0 − b1²/( z − a1 − b2²/( z − · · · ))) .  (22.11)
This procedure has proved to be efficient and reliable in the context of exact diago-
nalizations (see Chap. 18).
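A compact sketch of this procedure: generate the coefficients an, bn² by the recursion above and evaluate the continued fraction (22.11) from the bottom up. This is a dense-matrix toy example, not a DMRG implementation, and it omits the reorthogonalization a production code would need:

import numpy as np

def lanczos_coeffs(H, phi0, L):
    # coefficients a_n and b_n^2 of the Lanczos recursion starting from |phi0>
    a, b2 = [], [0.0]
    phi_prev, phi = np.zeros_like(phi0), phi0.copy()
    for n in range(L):
        a.append(phi @ H @ phi / (phi @ phi))
        phi_next = H @ phi - a[n] * phi - b2[n] * phi_prev
        b2.append((phi_next @ phi_next) / (phi @ phi))
        phi_prev, phi = phi, phi_next
    return np.array(a), np.array(b2[1:])

def continued_fraction(z, a, b2, weight):
    # -pi G_X(z - E0) of (22.11), evaluated from the innermost level outwards
    f = 0.0
    for n in range(len(a) - 1, 0, -1):
        f = b2[n - 1] / (z - a[n] - f)
    return weight / (z - a[0] - f)

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 50)); H = 0.5 * (H + H.T)
phi0 = rng.normal(size=50)
a, b2 = lanczos_coeffs(H, phi0, 20)
z = np.linalg.eigvalsh(H)[0] + 1.0 + 0.1j     # z = E0 + omega + i*eta
print(continued_fraction(z, a, b2, phi0 @ phi0))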
Within a DMRG calculation the Lanczos algorithm is applied to the effective
superblock operators H and X and serves two purposes. Firstly, it is used to com-
pute the full dynamical spectrum. Secondly, in addition to the ground state |ψ0
some Lanczos vectors {|φn , n = 0, . . . , M ≤ L} are used as target (22.8) to con-
struct an effective representation of the Hamiltonian which describes both ground
state and excited states accurately. Recall that a target state does not need to
be an eigenstate of the Hamiltonian but can be any quantum state which is well-
defined and can be computed in every superblock during a DMRG sweep through
the lattice. Unfortunately, DMRG truncation errors increase rapidly with the num-
ber M of target Lanczos vectors for a fixed number of density-matrix eigenstates
kept and the method becomes numerically unstable. Therefore, only the first few
Lanczos vectors (often only the first one |φ0 ) are included as target in most ap-
plications of Lanczos DMRG. In that case, the density-matrix renormalization does
not necessarily converge to an optimal representation of H for all excited states
contributing to a dynamical correlation function and the calculated spectrum is not
always reliable. For instance, the shape of continuous spectra (for very large systems
N ≫ 1) can not be determined accurately with the Lanczos-DMRG method [13].
Nevertheless, Lanczos DMRG is a relatively simple and quick method for calculat-
ing dynamical properties within DMRG. In practice, it gives reliable and accurate
results for simple discrete spectra made of (or dominated by) a few peaks only and it
has been used successfully in several studies of low-dimensional correlated systems
(see [6, 7]).
The correction vector associated with GX(ω + iη) is defined as

|ψX(ω + iη)⟩ = [E0 + ω + iη − H]^{−1} |X⟩ ,  (22.12)

where |X⟩ = X|ψ0⟩ is identical to the first Lanczos vector. If the correction vector is known, the dynamical correlation function can be calculated directly
GX(ω + iη) = −(1/π) ⟨X|ψX(ω + iη)⟩ .  (22.13)
To calculate a correction vector, an inhomogeneous linear equation system

(E0 + ω + iη − H) |ψ⟩ = |X⟩  (22.14)

has to be solved for the unknown state |ψ⟩. Typically, the vector space dimension is
very large and the equation system is solved with the conjugate gradient method [16]
or other iterative methods [17].
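For a small dense toy problem the system (22.14) can be solved directly; in a DMRG program the same equation is solved iteratively for the effective superblock operators. A minimal sketch:

import numpy as np

def correction_vector(H, E0, X, omega, eta):
    # solve (E0 + omega + i*eta - H)|psi> = |X>, cf. (22.12) and (22.14)
    M = (E0 + omega + 1j * eta) * np.eye(H.shape[0]) - H
    return np.linalg.solve(M, X)

rng = np.random.default_rng(0)
H = rng.normal(size=(40, 40)); H = 0.5 * (H + H.T)
E0 = np.linalg.eigvalsh(H)[0]
X = rng.normal(size=40)                       # plays the role of X|psi_0>
psi_X = correction_vector(H, E0, X, omega=1.0, eta=0.1)
print(-np.imag(X @ psi_X) / np.pi)            # I_X(omega + i*eta), cf. (22.13)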
The distinctive characteristic of a correction vector approach to the calcula-
tion of dynamical properties is that a specific quantum state (22.12) is constructed
to compute the dynamical correlation function at each frequency ω. To obtain a
complete dynamical spectrum, the procedure has to be repeated for many differ-
ent frequencies. Therefore, in the context of exact diagonalizations the correction-
vector approach is less efficient than the Lanczos technique (22.10) and (22.11). For
DMRG calculations, however, this is a highly favorable characteristic. The dynami-
cal correlation function can be determined for each frequency ω separately using ef-
fective representations of the system Hamiltonian H and operator X which describe
a single excitation energy accurately. The approach can be extended to higher-order
dynamic response functions such as third-order optical polarizabilities [18].
In practice, in a correction-vector DMRG calculation [13] two correction vec-
tors with close frequencies ω1 and ω2 and finite broadening η ∼ ω2 − ω1 > 0
are calculated from the effective superblock operators H and X and used as tar-
get (22.8) beside the ground state |ψ_0⟩ and the first Lanczos vector |X⟩. This is
sufficient to obtain an accurate effective representation of the system excitations for
frequencies ω_1 ≤ ω ≤ ω_2. The spectrum is then calculated for this frequency
interval using (22.13). The calculation is repeated for several (possibly overlap-
ping) intervals to determine the spectral function over a large frequency range. This
correction-vector DMRG method allows one to perform accurate calculations of
complex or continuous spectra for all frequencies in large lattice quantum many-
body systems [6, 7, 13].
The success of the correction-vector DMRG method for calculating dynamical prop-
erties shows that using specific target states for each frequency is the right approach.
This idea can be further improved using a variational formulation of the prob-
lem [19]. Consider the functional

W_{X,η}(ω, ψ) = ⟨ψ| (E_0 + ω − H)² + η² |ψ⟩ + η⟨X|ψ⟩ + η⟨ψ|X⟩ .

For any η ≠ 0 and a fixed frequency ω this functional has a well-defined and non-
degenerate minimum |ψ_min⟩. This state is related to the correction vector (22.12) by

(H − E_0 − ω + iη)|ψ_min⟩ = η|ψ_X(ω + iη)⟩ .    (22.16)

The value of the minimum yields the imaginary part of the dynamical correlation function

W_{X,η}(ω, ψ_min) = −πη I_X(ω + iη) .    (22.17)
Therefore, the calculation of spectral functions can be formulated as a minimization
problem.
This variational formulation is completely equivalent to the correction-vector
method if we can calculate |ψ_min⟩ and |ψ_X(ω + iη)⟩ exactly. However, if we can
only calculate approximate states with an error of the order ε ≪ 1, the variational
formulation (22.17) gives the imaginary part IX (ω + iη) of the correlation function
with an accuracy of the order of ε2 , while the correction-vector approach (22.13)
yields results with an error of the order of ε.
The DMRG method can be used to minimize the functional WX,η (ω, ψ) and thus
to calculate the dynamical correlation function GX (ω + iη). This approach is called
the dynamical DMRG method. The minimization of the functional is easily inte-
grated into the standard DMRG algorithm. At every step of a sweep through the
system lattice, the following calculations are performed for the effective superblock
operators H and X:
(i) The ground-state vector |ψ_0⟩ of H and its energy E_0 are calculated as in the standard DMRG method.
(ii) The state |X⟩ = X|ψ_0⟩ is calculated.
(iii) The functional W_{X,η}(ω, ψ) is minimized using an iterative minimization algorithm. This yields the imaginary part I_X(ω + iη) of the dynamical correlation function and the state |ψ_min⟩.
(iv) The correction vector is calculated using (22.16).
(v) The states |ψ_0⟩, |X⟩, and |ψ_X(ω + iη)⟩ are used as targets (22.8) of the density-matrix renormalization process.
The robust finite-system DMRG algorithm must be used to perform several sweeps
through a lattice of fixed size. Sweeps are repeated until the procedure has converged
to the minimum of WX,η (ω, ψ).
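As a hedged illustration of steps (iii)–(iv): the minimizer of the functional satisfies ((E_0 + ω − H)² + η²)|ψ⟩ = −η|X⟩, a Hermitian, positive-definite system to which conjugate-gradient-type methods apply. The sketch below uses dense linear algebra and assumes a real symmetric H and a real vector |X⟩; it is not an actual DMRG implementation:

```python
import numpy as np

def ddmrg_spectral_weight(H, E0, omega, eta, X_psi0):
    """Variational evaluation of I_X(omega + i*eta): solve the Hermitian,
    positive-definite system ((E0 + omega - H)^2 + eta^2)|psi> = -eta|X>
    and use W_min = -pi*eta*I_X, cf. (22.17)."""
    M = (E0 + omega) * np.eye(H.shape[0]) - H
    A = M @ M + eta**2 * np.eye(H.shape[0])
    psi_min = np.linalg.solve(A, -eta * X_psi0)
    # value of the functional at its minimum
    W_min = psi_min @ A @ psi_min + 2.0 * eta * (X_psi0 @ psi_min)
    return -W_min / (np.pi * eta)               # I_X(omega + i*eta)
```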
To obtain the spectrum IX (ω + iη) over a range of frequencies, one has to re-
peat this calculation for several values of ω. The computational effort is thus roughly
proportional to the number of frequencies. As with the correction-vector approach,
one can perform a DDMRG calculation for two close frequencies ω1 and ω2 si-
multaneously, and then calculate the dynamical correlation function for frequencies
ω between ω1 and ω2 without targeting the corresponding correction vectors. This
approach can significantly reduce the computer time necessary to determine the
spectrum over a frequency range, but the results obtained for ω ≠ ω_1, ω_2 are less
accurate than for the targeted frequencies ω = ω_1 and ω = ω_2.
First, it should be noted that DDMRG calculations are always performed for a fi-
nite parameter η. The spectrum IX (ω + iη) is equal to the convolution of the true
spectrum IX (ω) with a Lorentzian distribution of width η
I_X(ω + iη) = ∫_{−∞}^{+∞} dω′ (1/π) η/[(ω − ω′)² + η²] I_X(ω′) .    (22.18)
where n_σ(k) = ⟨ψ_0| c†_{kσ} c_{kσ} |ψ_0⟩ is the ground-state momentum distribution.
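For example, a discrete stick spectrum {(ω_n, weight_n)} is broadened according to (22.18) as follows (a small NumPy sketch with illustrative names):

```python
import numpy as np

def broaden(omegas, weights, omega_grid, eta):
    """Convolve a stick spectrum with the Lorentzian kernel of (22.18)."""
    diff = omega_grid[:, None] - omegas[None, :]
    kernel = (eta / np.pi) / (diff**2 + eta**2)
    return kernel @ weights          # I_X(omega + i*eta) on omega_grid
```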
Numerous comparisons with exact analytical results and accurate numerical
simulations have demonstrated the unprecedented accuracy and reliability of the
dynamical DMRG method for calculating dynamical correlation functions in one-
dimensional correlated systems [6, 9, 19, 20, 21, 22] and quantum impurity problems
[23, 24, 25]. As an example, we show in Fig. 22.1 the local DOS of the half-filled
one-dimensional Hubbard model calculated with DDMRG for two values of U .
The local DOS is obtained by substituting X = c_{jσ} and X = c†_{jσ} for c_{kσ} and
c†_{kσ} in the definition of the spectral functions A(k, ω) and B(k, ω), respectively.
Fig. 22.1. Local density of states of the half-filled one-dimensional Hubbard model for U = 0
and U = 4t calculated in the middle of an open chain with 128 sites using DDMRG and a
broadening η = 0.08t
The local DOS does not depend on the site j for periodic boundary conditions
and is equal to the integrated DOS defined in Sect. 22.1. For open boundary con-
ditions we have checked that the local DOS in the middle of the chain is indis-
tinguishable from the integrated DOS (22.6) for the typical broadening η used in
DDMRG calculations [22]. On the scale of Fig. 22.1 the DDMRG DOS for the
metallic regime (U = 0) is indistinguishable from the exact result (with the same
broadening η), which illustrates the accuracy of DDMRG. For the insulating regime
U = 4t, one clearly sees the opening of the Mott-Hubbard gap in Fig. 22.1. The
width of the gap agrees with the exact result Ec ≈ 1.286t calculated with the
Bethe Ansatz method [2]. The shape of the spectrum around the spectrum onsets
at ω ≈ ±Ec /2 ≈ 0.643t also agrees with field-theoretical predictions as discussed
in the next section. The effects of the broadening η = 0.08t are also clearly visible
in Fig. 22.1: For U = 4t spectral weight is seen inside the Mott-Hubbard gap and
for U = 0 the DOS divergences at ω = ±2t have been broadened into two sharp
peaks.
The numerical errors in the DDMRG method are dominated by the truncation
of the Hilbert space. As in a ground state DMRG calculation, this truncation error
decreases (and thus the accuracy of DDMRG target states and physical results in-
creases) when more density-matrix eigenstates are kept. As the variational approach
yields a smaller error in the spectrum than the correction-vector approach for the
same accuracy in the targeted states, the DDMRG method is usually more accurate
than the correction-vector DMRG method for the same number of density-matrix
eigenstates kept or, equivalently, the DDMRG method is faster than the correction-
vector DMRG method for a given accuracy.
It should be noted that the order of limits in the above formula is important. Com-
puting both limits from numerical results requires a lot of accurate data for different
values of η and N and can be the source of large extrapolation errors. A better ap-
proach is to use a broadening η(N ) > 0 which decreases with increasing N and
vanishes in the thermodynamic limit [19]:
I_X(ω) = lim_{N→∞} I_X(ω + iη(N)) .    (22.22)
The function η(N ) depends naturally on the specific problem studied and can also
vary for each frequency ω considered. For one-dimensional correlated electron sys-
tems such as the Hubbard model, one finds empirically that the optimal scaling is
η(N) = c/N ,    (22.23)
where the constant c is comparable to the effective band width of the excitations
contributing to the spectrum around ω.
In Fig. 22.2 we see that the DOS of the half-filled one-dimensional Hubbard
model becomes progressively step-like around ω ≈ 0.643t as the system size is
increased using a size-dependent broadening η = 10.24t/N . The slope of nσ (ω)
has a maximum at a frequency which tends to half the value of the Mott-Hubbard
gap Ec ≈ 1.286t for N → ∞. The height of the maximum diverges as η −1 ∼ N
for increasing N (see the inset in Fig. 22.2). This demonstrates the presence of a
Dirac delta-function peak δ(ω − E_c/2) in the derivative of n_σ(ω) [19] or, equivalently,
a step increase of the DOS at the spectrum onset in the thermodynamic limit, in
agreement with the field-theoretical result for a one-dimensional Mott insulator [26].
Thus the features of the infinite-system spectrum can be determined accurately from
DDMRG data for finite systems using a finite-size scaling analysis with a size-
dependent broadening η(N ).
It should be noted that a good approximation for a continuous infinite-system
spectrum can sometimes be obtained at a much lower computational cost than this
scaling analysis by solving the convolution equation (22.18) numerically for an un-
known smooth function IX (ω ′ ) using DDMRG data for a finite system on the left-
hand side (deconvolution) [9, 24, 27].
Fig. 22.2. Expanded view of the DOS around the spectrum onset at ω = Ec /2 ≈ 0.643t
(vertical dashed line) in the half-filled one-dimensional Hubbard model for U = 4t. The
data have been obtained with DDMRG for various system sizes from N = 32 to N = 256
with a broadening η = 10.24t/N . The inset shows the slope of nσ (ω) at ω = Ec /2 as a
function of the system size
with wavevectors k = 2πz/N (momentum p = ℏk) for integers −N/2 < z ≤ N/2.
These plane waves are the one-electron eigenstates of the Hamiltonian (22.4) in the
non-interacting limit (U = 0) for periodic boundary conditions.
Since DMRG calculations can be performed for much larger systems using open
boundary conditions, it is desirable to extend the definition of the spectral function
A(k, ω) to that case. Combining plane waves with filter functions to reduce bound-
ary effects is a possible approach [13] but this method is complicated and does
not always yield good results [22]. A simple and efficient approach is based on
the eigenstates of the particle-in-a-box problem [i.e., the one-electron eigenstates of
the Hamiltonian (22.4) with U = 0 on an open chain]. The operators are defined
for quasi-wavevectors k = πz/(N + 1) (quasi-momenta p = ℏk) with integers
1 ≤ z ≤ N by

c_{kσ} = √(2/(N + 1)) Σ_{j=1}^{N} sin(kj) c_{jσ} .    (22.25)
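The transformation (22.25) is a discrete sine transform and is easily written down explicitly; the sketch below (illustrative names) builds the orthogonal transformation matrix and evaluates the quasi-momentum distribution n(k) = ⟨c†_{kσ} c_{kσ}⟩ from a given one-particle density matrix:

```python
import numpy as np

def quasi_momentum_transform(N):
    """Orthogonal sine transform of (22.25):
    U[z-1, j-1] = sqrt(2/(N+1)) * sin(k*j) with k = pi*z/(N+1)."""
    j = np.arange(1, N + 1)
    k = np.pi * j / (N + 1)                       # quasi-wavevectors
    U = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(k, j))
    return k, U

def quasi_momentum_distribution(rho, U):
    """n(k) = sum_{l,m} U[k,l] U[k,m] rho[l,m], rho[l,m] = <c^dag_l c_m>."""
    return np.real(np.einsum('kl,lm,km->k', U, rho, U))
```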
Fig. 22.3. Momentum distribution n_σ(k) as a function of k/π for open and periodic boundary conditions (inset: enlargement of the region around k/π ≈ 0.4–0.5)
The DDMRG method and the quasi-momentum technique allow us to calculate the
spectral properties of one-dimensional correlated systems on large lattices. To illus-
trate the capability of this approach we have calculated the photoemission spectral
function Aσ (k, ω) of the half-filled one-dimensional Hubbard model. In Fig. 22.4
we show a density plot of this spectral function for U = 4t on a 128-site lattice. Re-
sults for stronger coupling U/t are qualitatively similar [22]. In Fig. 22.4 we observe
dispersive structures which correspond well to the excitation branches (spinon and
holon) predicted by field theory for one-dimensional Mott insulators in the weak
coupling regime (i.e., U/t ≪ 1 in the Hubbard model) [26]. The DDMRG re-
sults can also be compared to those obtained with other numerical methods (see
Chap. 19).
Fig. 22.4. Density plot of the spectral function Aσ (k, ω) in the half-filled one-dimensional
Hubbard model for U = 4t calculated on a 128-site open chain using DDMRG with η =
0.0625t and quasi-momenta
Fig. 22.5. Dispersion of structures found in the DDMRG spectral function of Fig. 22.4
(symbols). Lines show the dispersion of corresponding excitation branches calculated with
the Bethe Ansatz for periodic boundary conditions
where spinon and holon branches merge and α = 0.5 ± 0.1 for other |k| ≤ kF in
excellent agreement with the field-theoretical predictions α = 3/4 and α = 1/2,
respectively.
Finally, we note that in the one-dimensional Hubbard model the dispersion of
excitations (but not their spectral weight) can be calculated with the Bethe Ansatz
method [2]. In Fig. 22.5 we compare the dispersion of structures observed in the
DDMRG spectral function for an open chain with the dispersion of some excitations
obtained with the Bethe Ansatz for periodic boundary conditions. The agreement
is excellent and allows us to identify the dominant structures, such as the spinon
branch (squares) and the holon branches (circles) [21, 22]. This demonstrates once
again the accuracy of the DDMRG method combined with the quasi-momentum tech-
nique. In summary, DDMRG provides a powerful and versatile approach for inves-
tigating the dynamical properties in low-dimensional lattice quantum many-body
systems.
References
1. H. Kuzmany, Solid-State Spectroscopy (Springer, Berlin, 1998)
2. F. Essler, H. Frahm, F. Göhmann, A. Klümper, V. Korepin, The One-Dimensional Hubbard Model (Cambridge University Press, Cambridge, 2005)
3. S.R. White, Phys. Rev. Lett. 69(19), 2863 (1992)
4. S.R. White, Phys. Rev. B 48(14), 10345 (1993)
5. I. Peschel, X. Wang, M. Kaulke, K. Hallberg (eds.), Density-Matrix Renormalization, A New Numerical Method in Physics (Springer, Berlin, 1999)
6. U. Schollwöck, Rev. Mod. Phys. 77(1), 259 (2005)
7. K. Hallberg, Adv. Phys. 55, 477 (2006)
8. S.R. White, D.A. Huse, Phys. Rev. B 48(6), 3844 (1993)
9. E. Jeckelmann, H. Fehske, in Proceedings of the International School of Physics “Enrico Fermi” – Course CLXI Polarons in Bulk Materials and Systems with Reduced Dimensionality (IOS Press, Amsterdam, 2006), pp. 247–284
10. S. Ramasesha, S.K. Pati, H.R. Krishnamurthy, Z. Shuai, J.L. Brédas, Phys. Rev. B 54(11), 7598 (1996)
11. M. Boman, R.J. Bursill, Phys. Rev. B 57(24), 15167 (1998)
12. K.A. Hallberg, Phys. Rev. B 52(14), R9827 (1995)
13. T.D. Kühner, S.R. White, Phys. Rev. B 60(1), 335 (1999)
14. E.R. Gagliano, C.A. Balseiro, Phys. Rev. Lett. 59(26), 2999 (1987)
15. Z.G. Soos, S. Ramasesha, J. Chem. Phys. 90(2), 1067 (1989)
16. W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in C++. The Art of Scientific Computing (Cambridge University Press, Cambridge, 2002)
17. S. Ramasesha, J. Comp. Chem. 11(5), 545 (1990)
18. S.K. Pati, S. Ramasesha, Z. Shuai, J.L. Brédas, Phys. Rev. B 59(23), 14827 (1999)
19. E. Jeckelmann, Phys. Rev. B 66(4), 045114 (2002)
20. E. Jeckelmann, F. Gebhard, F.H.L. Essler, Phys. Rev. Lett. 85(18), 3910 (2000)
21. H. Benthien, F. Gebhard, E. Jeckelmann, Phys. Rev. Lett. 92(25), 256401 (2004)
22. H. Benthien, Dynamical properties of quasi one-dimensional correlated electron systems. Ph.D. thesis, Philipps-Universität, Marburg, Germany (2005)
23. F. Gebhard, E. Jeckelmann, S. Mahlert, S. Nishimoto, R. Noack, Eur. Phys. J. B 36, 491 (2003)
24. S. Nishimoto, E. Jeckelmann, J. Phys. Condens. Matter 16, 613 (2004)
25. C. Raas, G.S. Uhrig, F.B. Anders, Phys. Rev. B 69(4), 041102 (2004)
26. F.H.L. Essler, A.M. Tsvelik, Phys. Rev. B 65(11), 115117 (2002)
27. C. Raas, G. Uhrig, Eur. Phys. J. B 45, 293 (2005)
23 Studying Time-Dependent Quantum Phenomena
with the Density-Matrix Renormalization Group
Recently, the Density Matrix Renormalization Group (DMRG) has been extended
to calculate the time evolution of an arbitrary state. Here, we will discuss this exten-
sion of the DMRG method, in particular, the general properties of the DMRG that
are relevant to the extension, the basic issues that are involved in calculating time-
dependence within the DMRG, and the first attempts at formulating time-dependent
DMRG (t-DMRG) algorithms. Moreover, we describe adaptive t-DMRG methods,
which tailor the reduced Hilbert space to one particular time step and which are
therefore the most efficient algorithms for the majority of applications. Finally, we
discuss in detail the application of the t-DMRG to a system of interacting spinless
fermions which are quenched by suddenly changing the interaction strength. This
system provides a very useful test bed for the method, but also raises physical is-
sues which are illustrative of the general behavior of quenched interacting quantum
systems.
While the time-dependent Schrödinger equation,

iℏ ∂|ψ⟩/∂t = H|ψ⟩ ,    (23.1)

has the formal solution

|ψ(t)⟩ = e^{−iH(t−t_0)/ℏ} |ψ(t_0)⟩    (23.2)

for a time-independent Hamiltonian H given an initial state |ψ(t_0)⟩ at time t = t_0,
this formal expression does not help very much in finding an actual solution: Calcu-
lating the exponential of the Hamiltonian applied to an arbitrary state is, in general,
a quite difficult problem.
Here we will concern ourselves primarily with the case of systems undergoing
a sudden change or quench, as formulated above, i.e., the system is prepared in an
initial state at time t0 ≡ 0 and evolves via a Hamiltonian that is time-independent
for t > 0. In order to simplify the notation, we will take ℏ = 1 and define t_0 ≡ 0
in the following. This physical situation is interesting in a number of experimental
contexts. Examples include experiments in which the depth of an optical lattice con-
taining trapped cold atoms is suddenly changed, leading to the collapse and revival
of a Bose-Einstein condensate [1], the realization of a quantum version of Newton’s
cradle [2], the quenching of a ferromagnetic spinor Bose-Einstein condensate [3],
and transport across quantum dots [4, 5] and other nanostructures. One should also
consider what aspects of time-dependent behavior are interesting. In these systems,
the detailed time evolution of various observables can be followed experimentally
on short to intermediate time scales. For example, for the system of ⁸⁷Rb atoms
on an optical lattice, snapshots of the momentum distribution can be obtained by
releasing the condensate at different times after the quench and then performing
time-of-flight measurements [1]. What is interesting is, first of all, the transient be-
havior, in this case, oscillations between a momentum distribution characteristic of
a Bose-Einstein condensate and that of a bosonic Mott insulator. After a somewhat
longer period of time, one can ask the question of whether there is relaxation of
these oscillations to stationary or quasi-stationary behavior, and, if so, how can this
behavior be characterized.
Numerically, the way to proceed, given an initial state |ψ(0), is to propagate
through a succession of discrete time intervals of size ∆t. The time interval ∆t is
chosen to be sufficiently small so that |ψ(t+∆t) can be calculated to the desired ac-
curacy given |ψ(t). For the single-particle Schrödinger equation, an appropriately
chosen discretization in time and space leads to finite difference equations which
can be iterated numerically; the most well-known variants are the Crank-Nicolson
method and the Runge-Kutta method. For interacting many-particle systems, it is
less evident how to formulate a well-behaved and efficient algorithm, but a dis-
cretization in time nevertheless forms the basis for most tenable algorithms.
One class of such algorithms involves projecting the time-propagation operator
over a finite interval, exp(−iH∆t), onto the Krylov subspace, the subspace gener-
ated by applying the Hamiltonian n times to an arbitrary initial vector, |u0 ,
{ |u_0⟩, H|u_0⟩, H²|u_0⟩, . . . , H^n|u_0⟩ } .
The Lanczos and the related Arnoldi method involve projecting an operator onto
an orthogonalized version of this Krylov subspace, where n is typically chosen to
be much smaller than the total dimension of the Hilbert space (see also Chaps. 18
and 19). In the original methods, the operator projected is the Hamiltonian, and the
lowest (or highest) few eigenstates are good variational (anti-variational) approx-
imations to the exact eigenstates. However, variants of these methods can also be
used to approximate the unitary time-evolution operator. In the Lanczos procedure,
the Hamiltonian becomes tridiagonal in the Lanczos basis, a basis for the Krylov
subspace orthogonalized via the Lanczos recursion. The time evolution operator is
then the exponential of a tridiagonal matrix, which can be formed explicitly and ef-
ficiently. For a given time interval ∆t and bandwidth of the matrix representation of
H, explicit error bounds can be given for the Euclidean norm of the wave function
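A minimal sketch of one such Krylov time step for a small Hermitian matrix H (in practice H|v⟩ would be applied as a sparse operator; names are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def krylov_step(H, u0, dt, n=10):
    """One Krylov time step: project exp(-i H dt) onto an n-dimensional
    orthonormalized Krylov subspace of |u0> (the Lanczos basis)."""
    V = np.zeros((len(u0), n), dtype=complex)    # Lanczos basis vectors
    T = np.zeros((n, n))                         # tridiagonal projection of H
    V[:, 0] = u0 / np.linalg.norm(u0)
    beta = 0.0
    for j in range(n):
        w = H @ V[:, j]
        if j > 0:
            w = w - beta * V[:, j - 1]
        alpha = np.vdot(V[:, j], w).real
        w = w - alpha * V[:, j]
        T[j, j] = alpha
        if j < n - 1:
            beta = np.linalg.norm(w)
            if beta == 0.0:                      # invariant subspace reached
                V, T = V[:, :j + 1], T[:j + 1, :j + 1]
                break
            T[j, j + 1] = T[j + 1, j] = beta
            V[:, j + 1] = w / beta
    # exponential of the small tridiagonal matrix, lifted back to full space
    small = expm(-1j * dt * T[:V.shape[1], :V.shape[1]])
    return np.linalg.norm(u0) * (V @ small[:, 0])
```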
For a set of target states |ψ_α⟩, the reduced density matrix of the system block is formed as

ρ = Σ_α w_α Σ_j ⟨j|ψ_α⟩⟨ψ_α|j⟩ ,    (23.3)

where the states |j⟩ are a basis for the environment block, and the w_α are positive
semi-definite weights, which must sum to one. When only one state enters into the
sum (i.e., only one w_α = 1), the superblock is in a pure state; otherwise it is in a
mixed state. The states |ψ_α⟩ are called target states.
In order to truncate the basis, a given number m of states with the largest weights,
i.e., the largest density-matrix eigenvalues, are retained. For the case of a pure state,
this is equivalent to representing the wave function of the superblock in a reduced
basis by truncating the Schmidt or singular-value decomposition:
Fig. 23.1. Decomposition used in the DMRG: The superblock, which encompasses the entire
system studied, is divided into a system block and an environment block
|ψ_0⟩ ≈ Σ_{γ=1}^{m ≤ dim(γ)} √(w_γ) |φ_γ⟩ |χ_γ⟩ ,    (23.4)
where the w_γ are the nonzero eigenvalues of the reduced density matrices of either
the system or the environment blocks (which are identical), and the |φ_γ⟩ and |χ_γ⟩ are
the eigenstates of the reduced density matrices of the system and the environment
blocks, respectively. This expression can straightforwardly be generalized to the
case of a mixed state. A matrix-product state is built up out of a succession of such
approximations.
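A minimal sketch of the truncation (23.4) for a pure state, using the singular-value decomposition of the superblock wave function (illustrative names):

```python
import numpy as np

def truncate_pure_state(psi, dim_sys, dim_env, m):
    """Truncated Schmidt decomposition (23.4) of a pure superblock state."""
    A = psi.reshape(dim_sys, dim_env)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    w = s**2                                    # density-matrix eigenvalues
    discarded_weight = 1.0 - np.sum(w[:m])      # truncation error
    psi_trunc = ((U[:, :m] * s[:m]) @ Vh[:m, :]).reshape(-1)
    return psi_trunc, w, discarded_weight
```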
In order to accurately calculate a state that evolves in time, the DMRG algorithm
must be extended in two ways: First, states other than extremal eigenstates must be
generated, and second, the basis must be adapted to the time-evolving state. Differ-
ent choices can be made in how these extensions are carried out; these choices can
be used to classify the various algorithms.
The simplest and earliest algorithm, formulated by Cazalilla and Marston [8],
adapts the basis for the initial state only. More specifically, the initial state |ψ(0)⟩ is
determined using a ground-state DMRG calculation, carried out with a Hamiltonian
H0 . The wave vector is then propagated through a set of time steps without fur-
ther changing the basis, i.e., the basis is adapted to the initial state only and is not
changed. The accuracy of this method clearly depends on how well the basis adapted
for the initial state represents the time-evolved state. Since one is, in most cases, in-
terested in a time-evolved state which is significantly different from the initial state,
this method, will not, in general, provide an accurate approximation for the time-
evolved behavior.
Luo, Xiang and Wang [9] subsequently pointed out that better accuracy could be
achieved for the test quantity calculated in [8], the tunnel current across a quantum
dot, when information on all relevant time scales is included in the DMRG proce-
dure. They did this by including in the density matrix (23.3) states at all discrete
time steps,
|ψ(0)⟩, |ψ(∆t)⟩, |ψ(2∆t)⟩, . . . , |ψ(T)⟩    (23.5)
up to a maximum time T . This scheme is illustrated conceptually in Fig. 23.2(a).
While this technique should evidently improve the accuracy at times removed from
t = 0, the penalty that must be paid is that the set of bases built up by the DMRG
procedure, i.e., the matrix-product state that is generated, is adapted for a set of
states rather than for a single state. Therefore, for a fixed number of states kept at
each step, the accuracy of the representation of each particular state suffers. In other
words, the longer the desired maximum time T , the more poorly the matrix-product
state is adapted for a given time, at least at fixed numerical effort.
Fig. 23.2. Schematic sketch of the portions of the complete Hilbert space for which
the matrix-product state is adapted for (a) the complete t-DMRG and (b) the adaptive
t-DMRG
described at the end of the last section by targeting only states associated with the
previous and the current time step. While this statement seems straightforward at
first glance, the problem of how to formulate a controlled, efficient algorithm in-
corporating this strategy is less straightforward. In particular, the original DMRG
algorithm targets extremal eigenstates of the Hamiltonian, i.e., the ground state and
low-lying excited states within a particular symmetry sector. Additional states can
also be targeted, such as the correction vector when dynamical quantities are desired
(see Chap. 22), but they are generally generated by applying an operator to one of
the extremal eigenstates, or by minimizing an additional functional. For an arbitrary
time step, however, the only information available is |ψ(t)⟩, which is not an extremal
state of a particular functional. Information on this state is encoded as a matrix-
product state, i.e., a series of transformations to the basis of the reduced density
matrix for successive partitions of the system. Given this state and the Hamiltonian
H that determines the time evolution, the state |ψ(t + ∆t)⟩ = exp(−iH∆t)|ψ(t)⟩
must be calculated. This must be done by re-adapting the basis to |ψ(t + ∆t)⟩.
In general, such a re-adaption is carried out by performing a finite-system
DMRG sweep in which the state |ψ(t + ∆t)⟩ is targeted at each step. By doing
this for every bipartite decomposition of the system, the matrix-product state is op-
timized for the new state. Note, however, that in order to calculate |ψ(t + ∆t)⟩
accurately, an accurate representation of |ψ(t)⟩ must also be available at each step.
Therefore, the basis must simultaneously be re-adapted for |ψ(t)⟩. However, |ψ(t)⟩
cannot be explicitly recalculated because the previous time step is not known. This
technical problem is the reason that the adaptive method was not developed ear-
lier. The solution is to transform the wave function |ψ(t)⟩ from the last step us-
ing the so-called wave-function transformation; for details, see Chap. 21, Sect. 4,
and, in particular, (35). Note that such a transformation is not exact; it introduces
an additional error that is the truncation error of the particular finite-system step
into the representation of |ψ(t). Therefore, one should avoid performing super-
fluous finite-system sweeps in the time-dependent DMRG; unlike in the ground-
state DMRG, additional sweeps are not guaranteed to always improve the wave
function.
The original work on adaptive t-DMRG [10, 11, 12] treated the time evolution
operator in the Trotter-Suzuki decomposition. The most commonly used second-
order decomposition has the form

e^{−iH∆t} = e^{−iH_odd ∆t/2} e^{−iH_even ∆t} e^{−iH_odd ∆t/2} + O(∆t³) ,    (23.6)

where H_even (H_odd) are the parts of the Hamiltonian involving even (odd) bonds
and we have assumed that H can be decomposed as H = H_even + H_odd. Here
H_even = Σ_i H_{2i,2i+1} is a sum over the even bond operators and, similarly,
H_odd = Σ_i H_{2i−1,2i}. Note that only Hamiltonians composed solely of nearest-
neighbor connections can be decomposed in this way. For one-dimensional lattices,
this decomposition can be integrated quite readily into the usual finite-system al-
gorithm. Since the exponentials of the individual bond operators within the even or
odd bonds commute with one another, the terms can be ordered so that only one
bond term is applied at each finite-system step. This bond term is chosen so that it
corresponds to the two exactly treated sites in the finite-system superblock config-
uration, as depicted in Fig. 23.3. The advantage of this scheme is that the complete
Hilbert space of the two sites is present, so that this piece of the time-evolution op-
erator can be applied exactly and very efficiently. A complete sweep back and forth
then successively applies all the bond operators and, at the end of the sweep, the
propagation through the time step is complete. For more detailed descriptions of the
algorithms, see [11, 12].
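A hedged sketch of one second-order Trotter step by brute-force state-vector manipulation, assuming a uniform two-site bond Hamiltonian; in the actual t-DMRG each bond exponential is instead applied at the corresponding two-site superblock configuration, as described above:

```python
import numpy as np
from scipy.linalg import expm

def apply_bond(psi, gate, i, L, d):
    """Apply a two-site gate to sites (i, i+1) of a flat state vector."""
    psi = psi.reshape(d**i, d * d, d**(L - i - 2))
    return np.einsum('ab,xby->xay', gate, psi).reshape(-1)

def trotter_step(psi, bond_h, dt, L, d):
    """Second-order step: half step on one sublattice of bonds, full step
    on the other, then another half step; the bonds within each sublattice
    commute, so their order within a sweep is irrelevant."""
    g_half = expm(-0.5j * dt * bond_h)
    g_full = expm(-1.0j * dt * bond_h)
    for i in range(0, L - 1, 2):
        psi = apply_bond(psi, g_half, i, L, d)
    for i in range(1, L - 1, 2):
        psi = apply_bond(psi, g_full, i, L, d)
    for i in range(0, L - 1, 2):
        psi = apply_bond(psi, g_half, i, L, d)
    return psi
```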
Feiguin and White [13] subsequently pointed out that an adaptive t-DMRG algo-
rithm can also be formulated without carrying out a Trotter-Suzuki decomposition.
Instead, the complete time evolution operator is applied at each step of the finite-
system procedure, and sweeping is carried out only to adapt the matrix-product state.
Different schemes are then possible to carry out the propagation through a time step;
in [13] the Runge-Kutta method was used. However, if an integrator is used, it would
clearly be better to use one that preserves unitarity, such as Crank-Nicolson. The
most accurate and efficient scheme seems to be to decompose exp(−iH∆t) in a
Lanczos basis [7] or using the Arnoldi method [14], just as is done in the exact di-
agonalization method discussed above in Sect. 23.1. This scheme has the advantage
of preserving unitarity and converges exponentially in the number of applications
of H.
Another crucial issue in the general adaptive algorithm is which states to tar-
get in the density matrix. If one considers the time evolution of the density matrix
through one time step
Fig. 23.3. Four-site superblock decomposition showing how an individual bond operator is
applied in the Trotter-Suzuki decomposition-based variant of the adaptive t-DMRG
ρ(t + ∆t) = ∫_t^{t+∆t} dt′ |ψ(t′)⟩⟨ψ(t′)| ,    (23.7)
it is clear that targeting additional states within the time interval [t, t + ∆t] could be
helpful [13]. Just how many intermediate time steps should be targeted depends on
the overall time step ∆t and the details of the system studied. In practice, we target
one to a few states at intermediate times in the calculations presented in Sect. 23.2;
this issue is illustrated numerically there.
In general, which variant of the adaptive t-DMRG to use will depend on the
problem treated. First, the Lanczos (and related variants of the general adaptive
scheme) can be applied to a more general class of systems than the Trotter method.
When the Trotter method can be applied, it is generally computationally more effi-
cient for a given formal accuracy, i.e., equal number of states kept m or equal cutoff
in discarded weight or quantum information loss. However, in general, the different
methods should be compared in test runs for particular systems in order to determine
which one yields the most accurate and stable results for given computational effort.
In order to illustrate and test the adaptive t-DMRG algorithm as well as to explore
typical physical issues that crop up in suddenly perturbed strongly interacting sys-
tems, we consider a system of spinless fermions with nearest-neighbor Coulomb
repulsion,

H = −t_h Σ_j ( c†_{j+1} c_j + H.c. ) + V Σ_j n_j n_{j+1} .    (23.8)
Our physical motivation for considering this system comes from the “collapse
and revival” phenomena observed in experiments with atoms trapped in optical lat-
tices [1]. When the depth of the optical lattice is suddenly changed, the effective
hopping and interaction strength of the corresponding model are suddenly changed;
this can be parameterized as a change of their ratio. In the bosonic systems treated
in [1], the parameters were changed in such a way that a transition from a superfluid
to a bosonic Mott insulator was induced. Although more difficult to realize exper-
imentally, trapping fermionic atoms is also possible [15, 16]. As we will see, the
phenomena observed when the model parameters of fermionic systems are suddenly
changed are reminiscent of those found in the bosonic systems. In view of this, we will
treat a system initially prepared to be in the ground state of Hamiltonian (23.8) with
a particular value of the interaction V_0, i.e., |ψ(0)⟩ = |ψ_0(V_0)⟩, the ground state of
H(V0 ). At time t = 0, the interaction strength will be suddenly changed to a value
V and the time evolution will be subsequently carried out using H(V ).
In order to investigate the single-particle properties of the system, which are
related to its metallic or insulating nature, we examine the momentum distribution
function
n_k(t) = (1/L) Σ_{l,m=1}^{L} e^{ik(l−m)} ⟨c†_l c_m⟩(t) ,    (23.9)

i.e., the Fourier transform of the one-particle density matrix, ρ_{lm} = ⟨c†_l c_m⟩, as a
function of time. In an insulator, nk has a finite slope at the Fermi wave vector, k =
kF , while for a conventional (Fermi liquid) metal, there is a jump discontinuity at
kF . For a one-dimensional interacting metal, a Luttinger liquid, the jump is replaced
by a power-law singularity in the slope at kF [17, 18]. Note that the behavior of the
density-density correlation function is also interesting for characterizing the CDW
insulating phase [19, 20]; however, for the sake of compactness, we will consider
only single-particle properties here.
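As a self-contained illustration, the sketch below exactly diagonalizes (23.8) on a tiny open chain, performs the quench V_0 → V, and evaluates (23.9) at a sequence of time steps. It is a brute-force check with illustrative names, not a t-DMRG implementation; nearest-neighbor hopping carries no Jordan-Wigner sign on an open chain, while longer-range elements of ρ_{lm} do:

```python
import numpy as np
from scipy.linalg import eigh, expm

def build_H(L, th, V):
    dim = 2**L
    H = np.zeros((dim, dim))
    for s in range(dim):
        for j in range(L - 1):
            nj, nj1 = (s >> j) & 1, (s >> (j + 1)) & 1
            H[s, s] += V * nj * nj1              # interaction V n_j n_{j+1}
            if nj != nj1:                        # hopping c^+_{j+1} c_j + H.c.
                H[s ^ (3 << j), s] -= th
    return H

def one_particle_dm(psi, L):
    """rho[l, m] = <c^+_l c_m> including the Jordan-Wigner string parity."""
    rho = np.zeros((L, L), dtype=complex)
    for s in range(2**L):
        a = psi[s]
        if a == 0:
            continue
        for m in range(L):
            if not (s >> m) & 1:
                continue
            rho[m, m] += abs(a)**2
            for l in range(L):
                if l == m or (s >> l) & 1:
                    continue
                lo, hi = min(l, m), max(l, m)
                between = ((1 << hi) - 1) ^ ((1 << (lo + 1)) - 1)
                sign = (-1)**bin(s & between).count('1')
                rho[l, m] += np.conj(psi[s ^ (1 << m) ^ (1 << l)]) * sign * a
    return rho

L, th, V0, V, dt = 8, 1.0, 0.5, 10.0, 0.05
_, W = eigh(build_H(L, th, V0))
psi = W[:, 0].astype(complex)    # global ground state of H(V0); a production
                                 # code would fix the particle-number sector
U_dt = expm(-1j * dt * build_H(L, th, V))
ks = 2.0 * np.pi * np.arange(L) / L
for step in range(10):
    rho = one_particle_dm(psi, L)
    ph = np.exp(1j * np.outer(ks, np.arange(L)))
    nk = np.real(np.einsum('kl,lm,km->k', ph, rho, np.conj(ph))) / L
    print(f"t = {step * dt:.2f}", nk.round(3))
    psi = U_dt @ psi             # one time step of the quench dynamics
```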
We use the adaptive t-DMRG method described in Sect. 23.1.2, using both the
Lanczos and the Trotter treatment of the time step. In all cases, we set a fixed thresh-
old of discarded weight as well as a limit on the maximum number of states kept;
we set the number of states limit to be appropriate for the weight cutoff and the
system parameters. Typical values for this system are a weight cutoff of 10−9 and a
maximum of 1500 states kept.
We have carried out extensive tests, comparing both variants of the adaptive
t-DMRG algorithm with each other and with control results where available. Un-
fortunately, there are not many interacting quantum systems for which exact results
can be obtained. In order to calculate the full time evolution, all eigenstates of the
system must be obtained; exact methods for the ground state, such as the Bethe
ansatz, are generally not powerful enough to obtain the full time evolution.4 For
spinless fermions, exact results for the time evolution are available for zero inter-
action V = 0 and in the atomic limit, th = 0. In addition, on sufficiently small
⁴ There have been recent advances for single-impurity systems using the Bethe ansatz; see [21, 22].
systems, we can compare with time evolution calculated using the Lanczos method,
for which the numerical errors are well-controlled enough so that the numerical
error can be made arbitrarily small. The behavior of various quantities can be con-
sidered. Since the time evolution is unitary, the expectation value of the Hamiltonian
H(V) and of all higher powers of H, ⟨H²⟩, ⟨H³⟩, . . . , will be conserved. Any ap-
preciable change in these expectation values with time then signifies a breakdown
in accuracy. Since the average energy is not particularly important physically, the
accuracy of the relevant observables is more important. Here the most important
observable is the momentum distribution nk ; other useful quantities include the
local density and the density-density correlation function.
In Fig. 23.4, we compare the maximum error over k in the momentum distri-
bution of various t-DMRG calculations for the same system, an L = 18 site chain
with open boundary conditions in which the interaction is changed from V0 = 0.5 to
V = 10. The numerically exact (for the time range shown) benchmark is provided
by a Lanczos time evolution calculation. In the Lanczos t-DMRG method, it is im-
portant to optimize the number of intermediate time steps targeted. As can be seen,
the accuracy of the Lanczos t-DMRG method depends strongly on the number of
intermediate time steps taken. For the time step taken here, the best accuracy occurs
Fig. 23.4. Maximum value of the deviation of the momentum distribution nk obtained with
the Lanczos and Trotter adaptive t-DMRG methods from a numerically exact Lanczos time-
evolution calculation for a system of interacting spinless fermions with L = 18 sites pushed
out of equilibrium by changing the interaction from V0 = 0.5 to V = 10 at time t = 0. The
time step is ∆t = 5 × 10−3 and the calculations were all limited to 8 CPU hours with fixed
discarded weight
when one intermediate time step is taken, with zero and ten intermediate time steps
significantly less accurate. The Trotter method yields the most accurate result for
times less than approximately one, whereas the Lanczos t-DMRG with one inter-
mediate time step yields somewhat more accurate results for times between one and
five. Note that the CPU time has been held to the same value for all the runs, so
that the length of the curves in time indicates the relative efficiency. For example, the
Trotter method uses about 2/3 the CPU time of the comparably accurate Lanczos
t-DMRG with one intermediate time step. Therefore, for fixed CPU time, one could
gain better accuracy for the same time range by taking a larger m. This result is typ-
ical for the system of spinless fermions treated here. We note, however, that a larger
time step can be taken in the Lanczos t-DMRG than the Trotter variant to obtain the
same accuracy. We nevertheless find that the Trotter method with an optimal choice
of parameters yields the most accurate results for a given computational effort for
the results shown here; the majority of the results shown are therefore calculated
using it. A more extensive analysis of the errors can be found in [19].
One useful limit to consider is the atomic limit, th = 0. With no hopping, the
particle number can be treated as a classical variable and the Hamiltonian, which con-
sists of only a Coulomb repulsion, corresponds to the classical Ising model. In the
Ising language, the ground state is an unmagnetized antiferromagnetic state, which
corresponds to a CDW state at q = π and is two-fold degenerate. Excitations
out of the ground state involve forming at least one domain wall, each of which
has an energy cost V . Such excited states are highly degenerate because the num-
ber of ways of making such an excitation is at least of the order of the system size.
Therefore, the complete excitation spectrum consists of a series of highly degen-
erate, dispersionless levels at energy V , 2V , . . . The time dependence of the rele-
vant observables can be calculated explicitly. Any observable composed of the local
density operator n_i, such as the density-density or spin-spin correlation function,
is time-independent because n_i commutes with H when t_h = 0. The functional
dependence of the single-particle density matrix on time, and thus the frequencies
that enter into its Fourier transform n_k, can be easily obtained. It consists of two
cosine terms with frequencies ω_1 = V and ω_2 = 2V and is therefore periodic with
period T = 2π/V [19].
We display the behavior of the momentum distribution n_k in Fig. 23.5(a). The
initial state is the ground state of H(V_0 = 0.5) with t_h = 1, i.e., an interacting
metallic state. (In the thermodynamic limit, the jump at k_F = π/2 would develop
into the singular, Luttinger-liquid form.) The distribution n_k develops rapidly from
the pseudo-metallic form at t = 0, even attaining inverted behavior as a function of k
at t = 0.3. At t ≈ 0.62, in agreement with the argument above, there is a complete
revival of the momentum distribution. The Fourier transform in the time domain,
Fig. 23.5(b), clearly shows the expected sharp peaks at ω1 = V and ω2 = 2V .
We now turn to the case of finite th , treating first time evolution with large V /th ,
V = 40 (with th = 1), which is well into the CDW insulating phase for the ground
state. The initial state is the ground state of H(V0 = 0.5), which has distinctly
metallic character. As can be seen from the surface plot in Fig. 23.6, the behavior
Fig. 23.5. (a) Momentum distribution n_k in the atomic limit, t_h = 0, V = 10 on an L = 100
site system at the indicated times. (b) Fourier transform in the time domain of the k = π
component from (a). The two sharp peaks occur at angular frequencies ω1 = 10 and ω2 = 20
Fig. 23.6. Surface plot of the momentum distribution n_k as a function of k and time τ for time evolution with H(V = 40)
of the momentum distribution at short time is similar to that in the atomic limit.
There are strong oscillations at all k with a period T = 0.157, which is shorter due
to the larger value of V . However, the revival is not complete, and, after a number
of oscillations and a time of the order of 1/th , the oscillations damp out. At larger
times, there are still residual oscillations which do not become smaller, but also
show no significant drift or revival phenomena on the time scales treated. We argue
that this indicates that a quasi-stationary state has been reached. Note that, although
nk is still relatively steeply changing near the Fermi wave vector kF = π/2, the
slope is actually finite at kF , characteristic of insulating behavior.
When the time evolution for the same initial state is carried out with the smaller
interaction V = 10, Fig. 23.7, oscillations as a function of time are still evident.
However, the period is significantly longer, as would be expected from the smaller
value of V (T = 0.628). Nevertheless, the time over which the oscillations decay is
still of the order of 1/th . Therefore, only two distinct oscillations are evident before
quasi-stationary behavior is reached. The relatively steeply changing portion of the
momentum distribution function is somewhat more pronounced than in the V = 40
case, but it still has insulating character.
For a much smaller interaction, V = 2.5, Fig. 23.8, no oscillations occur; the
metallic quasi-jump decays smoothly to an insulating form that has a somewhat
larger rapidly changing region than for the larger values of V . There still seems
Fig. 23.7. Surface plot of the momentum distribution n_k as a function of k and time τ for time evolution with H(V = 10)
Fig. 23.8. Momentum distribution n_k plotted as a function of k and time τ when the initial
ground state is time-evolved with the Hamiltonian H(V = 2.5). The system size is L = 50,
the time step ∆t = 0.005, and up to 1000 states are kept with a discarded weight cutoff of 10⁻⁹
Fig. 23.9. (a) Momentum distribution for two initial states at V0 = 0.5 and V0 = 5.0086 with
the same energy expectation value ⟨H(V = 2.5)⟩ for the time-evolving Hamiltonian. (b)
Momentum distribution of the two initial states of (a) after being time-evolved with H(V =
2.5) to a time T = 4.5. Also shown is the momentum distribution for a thermal state with
the same average energy calculated using the quantum Monte Carlo method
The V0 = 0.5 state has a clearly metallic initial momentum distribution, while the V0 = 5.0086 state
has clearly insulating character. Nevertheless, after a time t = 4.5/th , Fig. 23.9(b),
they agree almost exactly, with nk showing insulating behavior, but with a some-
what steeper slope at kF than the insulating initial state. Also depicted in Fig. 23.9(b)
is nk for a thermal state with the same average energy as both initial states. The
quasi-stationary state shows a small, but appreciable difference with the thermal
state. Such a difference becomes larger when the time evolution is carried out with
larger values of V . Therefore, we conclude that there is a generic quasi-stationary
momentum distribution for a wide range of initial states and time-evolving parame-
ter values, but that this state is almost always significantly different from the thermal
state with the same average energy and the same interaction strength. We have also
studied the density-density correlation function and have come to analogous con-
clusions [19, 20].
23.3 Discussion
In this chapter, we have given an outline of the method by which the DMRG tech-
nique can be used to calculate the time dependence of interacting quantum systems.
For most applications, some version of the adaptive t-DMRG will be the best-suited
method. Note, however, that it is possible that a system can have a strong enough
dependence on a wide range of earlier times so that the complete t-DMRG method
(i.e., targeting all time steps) could be advantageous in relatively rare circumstances
[14, 24].
Within the adaptive t-DMRG, there are two major variants. The first, the Trotter
method [10, 11, 12], is based on a Trotter-Suzuki decomposition which allows one
to decompose the time evolution operator into pieces that can be treated efficiently
and exactly within a DMRG sweeping procedure. While this method is generally
quite efficient, it is limited, at least in its simplest form, to one-dimensional systems
with nearest-neighbor interactions and also suffers from a systematic error in the
size of the time step. The second variant treats the evolution through a time step
directly [13]. The most effective way to do this seems to be to treat the exponential
time evolution operator in a Lanczos expansion or using the closely related Arnoldi
method [7, 14]. This method can treat more general Hamiltonians and seems to
be more stable and, in some cases, more accurate, but is usually
computationally more expensive than the Trotter method for similar accuracy.
As an example, we have applied the adaptive t-DMRG to a one-dimensional
system of interacting spinless fermions. By starting with a metallic state and time-
evolving with a Hamiltonian with CDW insulating ground state, we find oscillations
in the single-particle momentum distribution that are reminiscent of collapse and
revival phenomena found in bosonic systems on an optical lattice. These oscilla-
tions are damped out on the scale of the inverse hopping and attain quasi-stationary
behavior for a wide range of interaction strengths. Different initial states with the
same average energy lead to very similar quasi-stationary behavior, indicating that
this behavior is generic. However, the quasi-stationary behavior cannot be easily
characterized as a thermal distribution, at least when the temperature is fixed by the
average energy. One possibility to describe this behavior is to use a more general en-
semble such as the generalized Gibbs ensemble rather than the Boltzmann ensemble
[23, 25, 26]. Since the generalized Gibbs ensemble used in [23, 25, 26] is parame-
terized by an indefinite number of parameters, each coupled to a successively higher
power of the Hamiltonian H, any distribution can, in principle, be described. What
is required then is a simple physical description using a small number of parameters.
How to do this, and how to describe the long-time behavior of suddenly perturbed
interacting quantum systems in general, is clearly a very interesting area for further
research.
References
1. M. Greiner, O. Mandel, T. Hänsch, I. Bloch, Nature 419, 51 (2002)
2. T. Kinoshita, T. Wenger, D. Weiss, Nature 440, 900 (2006)
3. L. Sadler, J. Higbie, S. Leslie, M. Vengalattore, D. Stamper-Kurn, Nature 443, 312 (2006)
4. Z. Yao, H. Postma, L. Balents, C. Dekker, Nature 402, 273 (1999)
5. O. Auslaender, A. Yacoby, R. de Picciotto et al., Phys. Rev. Lett. 84, 1764 (2000)
24 Applications of Quantum Information in the Density-Matrix Renormalization Group

In the past few years, there has been an increasingly active exchange of ideas and
methods between the formerly rather disjunct fields of quantum information and
many-body physics. This has been due, on the one hand, to the growing sophisti-
cation of methods and the increasing complexity of problems treated in quantum
information theory, and, on the other, to the recognition that a number of central
issues in many-body quantum systems can fruitfully be approached from the quan-
tum information point of view. Nowhere has this been more evident than in the
context of the family of numerical methods that go under the rubric density-matrix
renormalization group. In particular, the concept of entanglement and its definition,
measurement, and manipulation lies at the heart of much of quantum information
theory [1]. The density-matrix renormalization group (DMRG) methods use proper-
ties of the entanglement of a bipartite system to build up an accurate approximation
to particular many-body wave functions. The cross-fertilization between the two
fields has led to improvements in the understanding of interacting quantum systems
in general and the DMRG method in particular, has led to new algorithms related
to and generalizing the DMRG, and has opened up the possibility of studying many
new physical problems, ones of interest both for quantum information theory and
for understanding the behavior of strongly correlated quantum systems [2].
Along these lines, we discuss some relevant concepts in quantum information theory,
including the relation of the DMRG to data compression and to entanglement.
As an application, we will use the quantum information entropy calculated with the
DMRG to study quantum phase transitions, in particular in the bilinear-biquadratic
spin-one chain and in the frustrated spin-1/2 Heisenberg chain.
The von Neumann entropy of a block of N sites is defined as

s(N) = −Tr [ρ(N) ln ρ(N)] .    (24.1)

Here ρ(N) is the density matrix for the system and the trace is over the degrees of
freedom of the system. Implicit in this description is that the system can be thought
of as forming one part of a larger, bipartite system which can always be constructed
to be in a pure state.
The von Neumann entropy has been found to be intimately connected to many-
body properties of a quantum system such as quantum criticality. In one dimen-
sion, s(N ) will increase logarithmically with N if the system is quantum critical,
but will saturate with N if the system is not [3, 4]. If a quantum critical system is
also conformally invariant, additional, specific statements can be made about the
entropy (see below) [5]. In higher dimensions, the von Neumann entropy will be
bounded from below by a number proportional to the area (or length or volume, as
appropriate) of the interface between the two parts of the system [6].
Since the von Neumann entropy is also a quantification of the fundamental ap-
proximation in the DMRG, a number of entanglement-based approaches to improve
the performance and to extend the applicability of DMRG [2, 7, 8, 9] have been
developed in the past few years [10, 11, 12, 13, 14, 15, 16].
For a more extensive discussion of the relationship of entanglement and von
Neumann entropy with the fundamentals of the DMRG, see Chap. 20 of this volume,
especially Sects. 2 and 6.
The reduction of the Hilbert space carried out in the DMRG method is closely re-
lated to the problem of quantum data compression [17, 18]. In quantum data com-
pression, the Hilbert space of the system Λ is divided into two parts: The “typical
subspace” Λtyp , which is retained, and the “atypical subspace” Λatyp , which is dis-
carded. For pure states, there is a well-defined relationship between Λ_typ and the von
Neumann entropy of the corresponding ensemble. In general, it has been shown that
β ≡ ln(dim Λ_typ) − s ,    (24.2)
is independent of the system size for large enough systems [11, 19].
Since one fundamentally treats a bipartite system in the DMRG, each subsystem
is, in general, in a mixed state. In the context of the DMRG, the accessible informa-
tion [20, 21] of mixed-state ensembles can be interpreted as the information loss due
to the truncation procedure. This information loss is a better measure of the error
than the discarded weight of the reduced density matrix
ε_TE = 1 − Σ_{α=1}^{m} w_α ,    (24.3)
(also called the truncation error). Here the wα are the eigenvalues of the reduced
density matrix ρ of either subsystem; both must have the same nonzero eigenvalue
spectrum.
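Both quantities are easily illustrated for a random bipartite pure state (a minimal NumPy sketch with illustrative names; w are the common eigenvalues of the two subsystem density matrices):

```python
import numpy as np

dim_sys, dim_env, m = 16, 16, 8
A = np.random.randn(dim_sys, dim_env)
A /= np.linalg.norm(A)                          # normalized pure state
w = np.linalg.svd(A, compute_uv=False)**2       # eigenvalues of rho
s = -np.sum(w * np.log(w))                      # von Neumann entropy (24.1)
eps_TE = 1.0 - np.sum(np.sort(w)[::-1][:m])     # discarded weight (24.3)
print(s, eps_TE)
```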
Based on these considerations, the convergence of DMRG can be improved sig-
nificantly by selecting the states kept using a criterion related to the accessible in-
formation. In general, the accessible information must be less than the Kholevo
bound [20]
Fig. 24.1. The relative error of the ground-state energy for the half-filled Hubbard chain for
various values of the on-site Coulomb interaction U on an N = 80-site lattice with periodic
boundary conditions as a function of (a) the truncation error and (b) the threshold value of
the Kholevo bound on accessible information, see (24.4). Taken from [11]
Fig. 24.2. Diagram of Ip,q for the molecules LiF, CO, N2 , and F2 calculated at Hartree-Fock
ordering with m = 200: Lines connect orbital labels with Ip,q > 0.01. The circle for CO and
N2 denotes that the surrounding orbitals are all connected with each other. Taken from [22]
applied to quantum chemical systems. In Fig. 24.2, we show the topology of Ip,q
for four prototypical small molecules, LiF, CO, N2 , and F2 , with a particular ba-
sis set; for details, see [22]. As can be seen, the mutual two-site information yields
a picture of the detailed connectivity of the orbitals, which is different for each
molecule. An attempt to optimize ordering of orbitals using a cost function based
on this information has led to moderate success [22]. However, more work needs
to be done both on defining a meaningful measure of mutual two-site information,
and in developing heuristics to optimize ordering based on this measure. A related
problem has cropped up in an attempt to map the one-dimensional Hubbard model
with periodic boundary condition to a model with open boundary conditions [16].
The transformed effective interaction, which has the form V^{σ,σ′}_{p,q,r,s} (see Hamiltonian
(24.5)), is then nonlocal.
cal terms has been used to optimize the site ordering. Such insights are also relevant
to quantum chemical problems.
The local measure of entanglement, the ℓ-site entropy with ℓ = 1, 2, . . . , N, which
is obtained from the reduced density matrix ρ, can be used to detect and locate
quantum phase transitions (QPTs) [26, 27, 28, 29]. As an example, Fig. 24.3 shows
the block entropy for ℓ = N/2 for the most general isotropic spin-one chain model
described by the Hamiltonian
H = Σ_i [ cos θ (S_i · S_{i+1}) + sin θ (S_i · S_{i+1})² ] ,    (24.7)
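For orientation, the sketch below constructs (24.7) by exact diagonalization for a very small chain and evaluates the half-chain block entropy; DMRG is needed for the N ≈ 200 chains discussed here, and all names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

sq2 = np.sqrt(2.0)
Sz = np.diag([1.0, 0.0, -1.0])                   # spin-1 operators
Sp = np.array([[0, sq2, 0], [0, 0, sq2], [0, 0, 0]])
# two-site S_i . S_{i+1} = (S+ S- + S- S+)/2 + Sz Sz
SS = 0.5 * (np.kron(Sp, Sp.T) + np.kron(Sp.T, Sp)) + np.kron(Sz, Sz)

def H_blbq(N, theta):
    """Open bilinear-biquadratic chain (24.7) as a dense 3^N x 3^N matrix."""
    bond = np.cos(theta) * SS + np.sin(theta) * (SS @ SS)
    H = np.zeros((3**N, 3**N))
    for i in range(N - 1):
        H += np.kron(np.kron(np.eye(3**i), bond), np.eye(3**(N - i - 2)))
    return H

def half_chain_entropy(N, theta):
    _, W = eigh(H_blbq(N, theta))
    w = np.linalg.svd(W[:, 0].reshape(3**(N // 2), -1), compute_uv=False)**2
    w = w[w > 1e-12]                             # drop numerical zeros
    return -np.sum(w * np.log(w))

print(half_chain_entropy(6, 0.0))                # Heisenberg point, N = 6
```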
Fig. 24.3. Entropy of blocks of ℓ = N/2 and ℓ = N/2 + 1 sites of the bilinear-biquadratic
spin S = 1 model for a chain with N = 200 sites
Fig. 24.4. (a) Block entropy s(ℓ) as a function of block length ℓ; (b) power spectrum N²|s̃(q)|² as a function of q/π for θ = π/4 and θ = 0.15π
peaks at q = 0 and π for finite systems. It is known that for θ < π/4 the soft modes
become gapped and the minimum of the energy spectrum moves from q = 2π/3
toward q = π as θ approaches the VBS point [43].
In order to characterize the various phases in the thermodynamic limit, a finite-size extrapolation must be carried out. Fig. 24.5 displays the behavior of ŝ(q∗) with system size for a number of values of θ that are representative of the different phases. The wave vector q∗ is chosen to be appropriate for the corresponding phase, for example, q∗ = 2π/3 in the trimerized phase. The value q∗ = 0.53π for θ = 0.15π (in the incommensurate phase) is the location of the incommensurate peak; see Fig. 24.4(b). As can be seen, all ŝ(q∗) → 0 for N → ∞, except in the range −3π/4 < θ < −π/4, where ŝ(q = π) remains finite, signaling the bond-ordered nature of the dimerized phase. Note that the q∗ = 0 peak (not shown) also scales to a finite value in much of the phase diagram.
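The spectra ŝ(q) analyzed here follow from a Fourier transform of the block-entropy profile s(ℓ); since the defining equation lies outside this excerpt, the normalization in the following sketch is an assumption:

! Sketch: Fourier transform of the block-entropy profile s(l),
! sq(k) = (1/N) * sum_l s(l) exp(-i*q*l) with q = 2*pi*k/N.
! The normalization convention is assumed here.
subroutine entropy_spectrum(s, sq)
  implicit none
  double precision, intent(in) :: s(:)
  complex(kind(0.d0)), intent(out) :: sq(0:size(s)-1)
  double precision, parameter :: pi = 3.14159265358979324d0
  integer :: k, l, n
  n = size(s)
  do k = 0, n-1
     sq(k) = (0.d0, 0.d0)
     do l = 1, n
        sq(k) = sq(k) + s(l)*exp(cmplx(0.d0, -2.d0*pi*k*l/dble(n), kind(0.d0)))
     end do
     sq(k) = sq(k)/dble(n)
  end do
end subroutine entropy_spectrum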
In Fig. 24.6, we summarize the behavior of ŝ(q) for finite systems and in the N → ∞ limit. We determine the position of the peaks in ŝ(q) on finite systems by finding the maxima in splines fit through the discrete allowed q points. Infinite-system behavior, obtained from extrapolations (see Fig. 24.5), is also depicted. In the ferromagnetic phase, θ < −3π/4, θ > π/2, there is a sole peak at q∗ = 0, as expected. The q∗ = 0 peak is present for all θ and persists in the thermodynamic limit. In the dimer phase, −3π/4 < θ < −π/4, the q∗ = π peak persists in the thermodynamic limit (see Fig. 24.5). Two different behaviors can be seen in the Haldane phase, −π/4 < θ < π/4; for θ < θVBS, the q∗ = π peak present in finite-size systems vanishes in the thermodynamic limit. For θ > θVBS, the incommensurate peak present only in finite systems can be seen to move from q = 0 to 2π/3 as θ goes towards π/4, as also seen in Fig. 24.4. Finally, in the spin nematic phase, π/4 < θ < π/2, there is a peak at q∗ = 2π/3 which scales to zero as N → ∞.
Therefore, incommensurability can be detected by the entropy analysis as well.
Fig. 24.5. Finite-size scaling of ŝ(q) for a number of representative values of θ at the appropriate wave vector q. The continuous lines are fits to the form $AN^{-\alpha} + B$

Fig. 24.6. Position of the peak q∗ in the Fourier-transformed block entropy |ŝ(q)|² plotted as a function of the parameter θ for the bilinear-biquadratic spin chain on system sizes of N = 120 and N = 180 (for higher resolution near θVBS), as well as in the thermodynamic limit. The peak at q∗ = 0 on finite systems, which is present for all θ, has been removed for readability

It is known [54] that the VBS point is a disorder point, where incommensurate
oscillations appear in the decaying correlation function; however, the shift of the
minimum of the static structure factor appears only at a larger value, θL = 0.138π,
the Lifshitz point. In contrast to this, the minimum of the block entropy shown in
Fig. 24.3 is exactly at the VBS point, and therefore indicates the location of the
commensurate-incommensurate transition correctly.
A similar analysis can be carried out for the frustrated J–J′ Heisenberg spin-1/2 chain with Hamiltonian
$$H = \sum_i \left[ J \, (\mathbf{S}_i \cdot \mathbf{S}_{i+1}) + J' \, (\mathbf{S}_i \cdot \mathbf{S}_{i+2}) \right] , \qquad (24.10)$$
with the ratio J′/J (J′, J > 0) playing the role of the parameter θ in the bilinear-biquadratic model. For J′/J < Jc ≈ 0.2411, the model is in a critical Heisenberg phase, while a spin gap develops for J′/J > Jc. At J′/J = 0.5, the Majumdar-Ghosh point, the model is exactly solvable and the ground state is a product of
Fig. 24.7. Power spectrum of the block entropy N²|ŝ(q)|² for the frustrated Heisenberg chain at J′/J = 1, calculated on a chain of length N = 128
local dimers [55]. As a function of J′/J, the block entropy is continuous, but has a minimum at J′/J = 0.5. For J′/J > 0.5 an extra peak appears in the Fourier spectrum ŝ(q) and moves from 0 to π/2 as J′/J gets larger. The development of the incommensurate peaks near J′/J = 1 can be seen in Fig. 24.7.
Acknowledgements
This work was supported in part by the Hungarian Research Fund (OTKA) Grants No. K 68340 and NF 61726 and by the János Bolyai Research Fund.
References
1. A. Galindo, M. Martin-Delgado, Rev. Mod. Phys. 74, 347 (2002) 653, 662
2. U. Schollwöck, Rev. Mod. Phys. 77, 259 (2005) 653, 654, 662
3. G. Vidal, J. Latorre, E. Rico, A. Kitaev, Phys. Rev. Lett. 90, 227902 (2003) 654
4. J. Latorre, E. Rico, G. Vidal, Quant. Inf. and Comp. 4, 48 (2004) 654
5. P. Calabrese, J. Cardy, J. Stat. Mech.: Theor. Exp. P06002 (2004) 654, 658
6. M. Srednicki, Phys. Rev. Lett. 71, 666 (1993) 654
7. S. White, Phys. Rev. Lett. 69, 2863 (1992) 654
8. S. White, Phys. Rev. B 48, 10345 (1993) 654
9. R. Noack, S. Manmana, in Lectures on the Physics of Highly Correlated Electron Systems IX, AIP Conference Proceedings, Vol. 789, ed. by A. Avella, F. Mancini (AIP, Melville, New York, 2005), p. 93 654
10. Ö. Legeza, J. Sólyom, Phys. Rev. B 68, 195116 (2003) 654, 656
11. Ö. Legeza, J. Sólyom, Phys. Rev. B 70, 205118 (2004) 654, 655, 656
12. F. Verstraete, D. Porras, J. Cirac, Phys. Rev. Lett. 93, 227205 (2004) 654
13. F. Verstraete, J. Cirac, preprint, http://arxiv.org/abs/cond-mat/0407066 654
14. S.R. White, A. Feiguin, Phys. Rev. Lett. 93, 076401 (2004) 654
15. A.J. Daley, C. Kollath, U. Schollwöck, G. Vidal, J. Stat. Mech.: Theor. Exp. P04005
(2004) 654
16. Ö. Legeza, F. Gebhard, J. Rissler, Phys. Rev. B 74, 195112 (2006) 654, 656, 657
17. B. Schumacher, Phys. Rev. A 51, 2738 (1995) 654
18. R. Jozsa, J. Mod. Opt. 41, 2315 (1994) 654
19. G. Vidal, Phys. Rev. Lett. 91, 147902 (2003) 654
20. A. Kholevo, Probl. Inf. Transm. (USSR) 9, 177 (1973) 654
21. C. Fuchs, C. Caves, Phys. Rev. Lett. 73, 3047 (1994) 654, 655
22. J. Rissler, R. Noack, S. White, Chem. Phys. 323, 519 (2006) 656, 657
23. T. Xiang, Phys. Rev. B 53, 10445 (1996) 656
24. S. Nishimoto, E. Jeckelmann, F. Gebhard, R. Noack, Phys. Rev. B 65, 165114 (2002) 656
25. G.L. Chan, M. Head-Gordon, J. Chem. Phys. 116, 4462 (2002) 656
26. P. Zanardi, Phys. Rev. A 65, 042101 (2002) 657
27. S.J. Gu, S.S. Deng, Y.Q. Li, H.Q. Lin, Phys. Rev. Lett. 93, 086402 (2004) 657, 658
28. J. Vidal, G. Palacios, R. Mosseri, Phys. Rev. A 69, 022107 (2004) 657
29. J. Vidal, R. Mosseri, J. Dukelsky, Phys. Rev. A 69, 054101 (2004) 657
30. G. Fath, J. Sólyom, Phys. Rev. B 44, 11836 (1991) 658, 659
31. G. Fath, J. Sólyom, Phys. Rev. B 47, 872 (1993) 658, 659
32. G. Fath, J. Sólyom, Phys. Rev. B 51, 3620 (1995) 658, 659
33. L. Takhtajan, Phys. Lett. A 87, 479 (1982) 658
34. H.M. Babujian, Phys. Lett. A 90, 479 (1982) 658
35. G. Uimin, JETP Lett. 12, 225 (1970) 658, 659
36. C. Lai, J. Math. Phys. 15, 1675 (1974) 658, 659
37. B. Sutherland, Phys. Rev. B 12, 3795 (1975) 658, 659
38. A. Chubukov, J. Phys. Condens. Matter 2, 1593 (1990) 658
39. A. Chubukov, Phys. Rev. B 43, 3337 (1991) 658
40. A. Läuchli, G. Schmid, S. Trebst, Phys. Rev. B 74, 144426 (2006) 658
41. K. Buchta, G. Fath, Ö. Legeza, J. Sólyom, Phys. Rev. B 72, 054433 (2005) 658
42. Ö. Legeza, J. Sólyom, Phys. Rev. Lett. 96, 116401 (2006) 658
43. I. Affleck, T. Kennedy, E. Lieb, H. Tasaki, Phys. Rev. Lett. 59, 799 (1987) 658, 660
44. D. Larsson, H. Johannesson, Phys. Rev. Lett. 95, 196406 (2005) 658
45. D. Larsson, H. Johannesson, Phys. Rev. A 73, 155108 (2007) 658
46. K. Buchta, Ö. Legeza, E. Szirmai, J. Sólyom, Phys. Rev. B 75, 155108 (2007) 658
47. J. Parkinson, J. Phys. C 20, L1029 (1987) 658
48. J. Parkinson, J. Phys. C 21, 3793 (1988) 658
49. C. Holzhey, F. Larsen, F. Wilczek, Nucl. Phys. B 424, 443 (1994) 658
50. I. Affleck, A.W.W. Ludwig, Phys. Rev. Lett. 67, 161 (1991) 659
51. N. Laflorencie, E.S. Sørensen, M.S. Chang, I. Affleck, Phys. Rev. Lett. 96, 100603 (2006)
659
52. C. Itoi, M.H. Kato, Phys. Rev. B 55, 8295 (1997) 659
53. Ö. Legeza, J. Sólyom, L. Tincani, R.M. Noack, Phys. Rev. Lett. 99, 087203 (2007) 659
54. U. Schollwöck, T. Jolicoeur, T. Garel, Phys. Rev. B 53, 3304 (1996) 660
55. C.K. Majumdar, D.K. Ghosh, J. Math. Phys. 10, 1388, 1399 (1969) 662
56. L. Amico, R. Fazio, A. Osterloh, V. Vedral, preprint, http://arxiv.org/abs/quant-ph/0703044 662
25 Density-Matrix Renormalization Group
for Transfer Matrices: Static and Dynamical
Properties of 1D Quantum Systems
at Finite Temperature
25.1 Introduction
Several years after the invention of the DMRG method to study ground-state properties of 1D quantum systems [1], Nishino showed that the same method can also be applied to the transfer matrix of a 2D classical system, hence allowing one to calculate its partition function at finite temperature [2]. The same idea can also be used to calculate the thermodynamic properties of a 1D quantum system after mapping it to a 2D classical one with the help of a Trotter-Suzuki decomposition [3, 4, 5]. Bursill et al. [6] then presented the first application, but the density matrix chosen in this work to truncate the Hilbert space was not optimal, so that the true potential of this new numerical method was not immediately clear. This changed when Wang and Xiang [7] and Shibata [8] presented an improved algorithm and showed that the density-matrix renormalization group applied to transfer matrices (which we will denote as TMRG from here on) is indeed a serious competitor to other numerical methods such as quantum Monte Carlo (QMC). Since then, the TMRG method has been successfully applied to a number of systems including various spin
chains, the Kondo lattice model, the t − J chain and ladder and also spin-orbital
models [9, 10, 11, 12, 13, 14, 15, 16, 17].
The main advantage of the TMRG algorithm is that the thermodynamic limit
can be performed exactly thus avoiding an extrapolation in system size. Further-
more, there are no statistical errors and results can be obtained with an accuracy
comparable to (T = 0) DMRG calculations. Similar to the (T = 0) DMRG al-
gorithms, the method is best suited for 1D systems with short range interactions.
These systems can, however, be either bosonic or fermionic because no negative
sign problem as in QMC exists. Most important, there are two areas where TMRG
seems to have an edge over any other numerical methods known today. These are:
(i) Impurity or boundary contributions, and
(ii) real-time dynamics at finite temperature.
As first shown by Rommer and Eggert [18], the TMRG method makes it possible to separate an impurity or boundary contribution from the bulk part, thus giving direct access to quantities which are of order O(1/L) compared to the O(1) bulk contribution (here L denotes the length of the system). We will discuss this in more detail in
Sect. 25.5. Calculating numerically the dynamical properties for large or even infi-
nite 1D quantum systems constitutes a particularly difficult problem because QMC
and TMRG algorithms can usually only deal with imaginary-time correlation func-
tions. The analytical continuation of numerical data is, however, an ill-posed prob-
lem putting severe constraints on the reliability of results obtained this way. Very
recently, two of us have presented a modified TMRG algorithm which allows for
the direct calculation of real-time correlations [19]. This new algorithm will be dis-
cussed in Sect. 25.6.
Before coming to these more recent developments we will discuss the definition
of an appropriate quantum transfer matrix for the classical system in Sect. 25.2 and
describe how the DMRG algorithm is applied to this object in Sect. 25.3. Here we
will follow in parts the article by Wang and Xiang in [20] but, at the same time, also
discuss an alternative Trotter-Suzuki decomposition [15, 16].
The standard mapping, widely used in QMC and TMRG calculations, is described in detail in [20]. Therefore we only summarize it briefly here. First, the Hamiltonian is split into a part He acting on even bonds and a part Ho acting on odd bonds, and the partition function is approximated by a Trotter-Suzuki decomposition into 2M imaginary-time slices, with ǫ = β/M, β being the inverse temperature and M an integer (the so-called Trotter number). By inserting 2M times a representation of the identity operator, the partition function is expressed as a product of local Boltzmann weights
$$\tau_{k,k+1}^{i,i+1} = \left\langle s_k^i\, s_k^{i+1} \right| e^{-\epsilon H_{e,o}} \left| s_{k+1}^i\, s_{k+1}^{i+1} \right\rangle , \qquad (25.3)$$
denoted in a graphical language by a shaded plaquette (see Fig. 25.1). The sub-
scripts i and k represent the spin coordinates in the space and the Trotter (imaginary
time) directions, respectively. A column-to-column transfer matrix T_M, the so-called quantum transfer matrix (QTM), can now be defined using these local Boltzmann weights,
$$T_M = (\tau_{1,2}\,\tau_{3,4} \cdots \tau_{2M-1,2M})\,(\tau_{2,3}\,\tau_{4,5} \cdots \tau_{2M,1}) , \qquad (25.4)$$
and is shown in the left part of Fig. 25.1. The partition function is then simply given
by
$$Z = \mathrm{Tr}\, T_M^{L/2} . \qquad (25.5)$$
The disadvantage of this Trotter-Suzuki mapping to a 2D lattice with checker-
board structure is that the QTM is two columns wide. This increases the amount
of memory necessary to store it and also complicates the calculation of correlation
functions.
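For concreteness, a single shaded plaquette (25.3) can be evaluated by diagonalizing the local two-site Hamiltonian. The following sketch assumes a real symmetric 4 × 4 bond matrix (e.g. a spin-1/2 chain) and uses LAPACK's dsyev:

! Sketch: local Boltzmann weight tau = exp(-eps*h) for a two-site
! bond Hamiltonian h, computed as tau = U exp(-eps*diag(w)) U^T
! after diagonalizing h = U diag(w) U^T with LAPACK's dsyev.
subroutine boltzmann_weight(h, eps, tau)
  implicit none
  double precision, intent(in)  :: h(4,4), eps
  double precision, intent(out) :: tau(4,4)
  double precision :: u(4,4), w(4), work(64)
  integer :: info, a, b, k
  u = h
  call dsyev('V', 'U', 4, u, 4, w, work, 64, info)
  do a = 1, 4
     do b = 1, 4
        tau(a,b) = 0.d0
        do k = 1, 4
           tau(a,b) = tau(a,b) + u(a,k)*exp(-eps*w(k))*u(b,k)
        end do
     end do
  end do
end subroutine boltzmann_weight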
Fig. 25.1. The left part shows the standard Trotter-Suzuki mapping of the 1D quantum chain to a 2D classical model with checkerboard structure where the vertical direction corresponds to imaginary time. The QTM is two columns wide. The right part shows the alternative mapping. Here, the QTM is only one column wide
with T_{1,2}(ǫ) = T_{R,L} exp[−ǫH + O(ǫ²)]. Here, T_R and T_L are the right- and left-shift operators, respectively. The resulting classical lattice has alternating rows and additional points in a mathematical auxiliary space. Its main advantage is that it allows one to formulate a QTM which is only one column wide (see right part of Fig. 25.1). The derivation of this QTM is completely analogous to the standard one; even the shaded plaquettes denote the same Boltzmann weight. Here, however, these weights are rotated by 45° clockwise and anti-clockwise in an alternating fashion from row to row. Using this transfer matrix, $\tilde{T}_M$, the partition function is given by $Z = \mathrm{Tr}\, \tilde{T}_M^L$.
The reason why this transfer matrix formalism is extremely useful for numerical calculations has to do with the eigenspectrum of the QTM. At infinite temperature it is easy to show [21] that the largest eigenvalue of the QTM T_M ($\tilde{T}_M$) is given by S² (S) and all other eigenvalues are zero. Here S denotes the number of degrees of freedom of the physical system per lattice site. With decreasing temperature, the gap between the leading eigenvalue Λ0 and the next-leading eigenvalues Λn (n > 0) of the transfer matrix shrinks. The ratio between Λ0 and each of the other eigenvalues Λn, however, defines a correlation length 1/ξn = ln |Λ0/Λn| [20, 21]. Because a 1D quantum system cannot order at finite temperature, any correlation length ξn will stay finite for T > 0, i.e., the gap between the leading and any next-leading eigenvalue stays finite. Therefore the calculation of the free energy in the thermodynamic limit boils down to the calculation of the largest eigenvalue Λ0 of the QTM:
$$f = -\lim_{L\to\infty} \frac{1}{\beta L} \ln Z = -\lim_{L\to\infty}\lim_{\epsilon\to 0} \frac{1}{\beta L} \ln \mathrm{Tr}\, \tilde{T}_M^L = -\lim_{\epsilon\to 0}\lim_{L\to\infty} \frac{1}{\beta L} \ln \left[ \Lambda_0^L \Big( 1 + \sum_{l>1} (\Lambda_l/\Lambda_0)^L \Big) \right] = -\lim_{\epsilon\to 0} \frac{\ln \Lambda_0}{\beta} , \qquad (25.7)$$
since $\sum_{l>1} (\Lambda_l/\Lambda_0)^L \to 0$ for $L \to \infty$.
Here the interchangeability of the limits L → ∞ and ǫ → 0 has been used [5]. Local
expectation values and static two-point correlation functions can be calculated in a
similar fashion (see e.g. [20] and [21]). In the next section, we are going to show
how the eigenvalues of the QTM are computed by means of the density matrix
renormalization group. This is possible since the transfer matrices are built from
local objects. Instead of sums of local objects we are dealing with products, but
this is not essential to the numerical method. However, there are a few important
differences in treating transfer matrices instead of Hamiltonians. At first sight, these
differences look technical, but at closer inspection they reveal a physical core.
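In practice, once the leading eigenvalues of the QTM have been computed, (25.7) and the relation 1/ξn = ln |Λ0/Λn| reduce to a few lines; the names below are illustrative:

! Sketch: free energy per site from (25.7) and correlation lengths
! xi_n = 1/ln|Lambda_0/Lambda_n| from the QTM eigenvalues lam(:),
! assumed sorted by decreasing modulus, lam(1) = Lambda_0.
subroutine qtm_observables(lam, beta, f, xi)
  implicit none
  complex(kind(0.d0)), intent(in) :: lam(:)
  double precision, intent(in)  :: beta
  double precision, intent(out) :: f, xi(2:)
  integer :: n
  f = -log(abs(lam(1)))/beta
  do n = 2, size(lam)
     xi(n) = 1.d0/log(abs(lam(1))/abs(lam(n)))
  end do
end subroutine qtm_observables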
The QTMs as introduced above are real valued, but not symmetric. This is not
a serious drawback for numerical computations, but certainly inconvenient. So the
first question that arises is whether the transfer matrices can be symmetrized. Un-
fortunately, this is not the case. If the transfer matrix were replaceable by a real
symmetric (or a hermitean) matrix all eigenvalues would be real and the ratios of
next-leading eigenvalues to the leading eigenvalue would be real, positive or nega-
tive. Hence all correlation functions would show commensurability with the lattice.
However, we know that a generic quantum system at sufficiently low temperatures
yields incommensurate oscillations with wave vectors being multiples of the Fermi
vector taking rather arbitrary values.
Therefore we know that the spectrum of a QTM must consist of real eigenvalues
or of complex eigenvalues organized in complex conjugate pairs. This opens the
possibility to understand the QTM as a normal matrix upon a suitable choice of
the underlying scalar product. Unfortunately, the above introduced matrices are not normal with respect to standard scalar products, i.e. we do not have $[\tilde{T}_M, \tilde{T}_M^\dagger] = 0$.
The density matrix is now chosen as
$$\rho = \tilde{T}_M^L , \qquad (25.8)$$
which reduces to $\rho = |\Psi_0^R\rangle\langle\Psi_0^L|$ up to a normalization constant in the thermodynamic limit. As in the zero-temperature DMRG algorithm, a reduced density matrix ρS is obtained by taking a partial trace over the environment,
$$\rho_S = \mathrm{Tr}_E \left\{ |\Psi_0^R\rangle\langle\Psi_0^L| \right\} . \qquad (25.9)$$
Note that this matrix is real but non-symmetric, which complicates its numerical diagonalization. It also allows for complex-conjugate pairs of eigenvalues which have to be treated separately (see [21] for details).
In actual computations, the Trotter-Suzuki parameter ǫ is fixed. Therefore the temperature T ∼ 1/(ǫM) is decreased by an iterative algorithm M → M + 1. In the following, the blocks of the QTM, $\tilde{T}_M$, are shown in a 90°-rotated view.
Fig. 25.2. The system block Γ . The plaquettes are connected by a summation over the adja-
cent corner spins
[Fig. 25.3: the superblock, formed from the system block and the environment block, with spins n_s, s_2, s_1, n_e]
(i) First we construct the initial system block Γ (see Fig. 25.2) consisting of M plaquettes so that S^M ≤ N < S^{M+1}, where S is the dimension of the local Hilbert space and N is the number of states which we want to keep. n_s, n_s′ are block-spin variables and contain Ñ = S^M states. The (S²·Ñ²)-dimensional array Γ(σ, n_s, τ, n_s′) is stored.
(ii) The enlarged system block $\tilde{\Gamma}$(σ, n_s, s_2, τ, s_2′, n_s′), an (S⁴·Ñ²)-dimensional array, is formed by adding a plaquette to the system block. If h_{i,i+1} is real and translationally invariant, the environment block can be constructed by a 180° rotation and a subsequent inversion of the system block. Otherwise the environment block has to be treated separately like the system block. Together both blocks form the superblock (see Fig. 25.3).
(iii) The leading eigenvalue Λ0 and the corresponding left and right eigenstates ⟨Ψ_0^L| = Ψ^L(s_1, n_s, s_2, n_e), |Ψ_0^R⟩ = Ψ^R(s_1′, n_s′, s_2′, n_e′) are calculated and normalized, ⟨Ψ_0^L|Ψ_0^R⟩ = 1. Now thermodynamic quantities can be evaluated at the temperature T = 1/(2ǫ(M + 1)).
(iv) A reduced density matrix is calculated by performing the trace over the environment,
$$\rho_S(n_s', s_2' | n_s, s_2) = \sum_{s_1, n_e} \Psi^R(s_1, n_s', s_2', n_e)\, \Psi^L(s_1, n_s, s_2, n_e) ,$$
and the complete spectrum is computed. An (N × (S·Ñ))-matrix V^L(ñ_s|n_s, s_2) (V^R(ñ_s′|n_s′, s_2′)) is constructed from the left (right) eigenstates belonging to the N largest eigenvalues, where ñ_s (ñ_s′) is a new renormalized block-spin variable with only N possible values.
Fig. 25.4. The renormalization step for the system block
(v) Using V^L and V^R the system block is renormalized. The renormalization (see Fig. 25.4) is given by
$$\Gamma(\sigma, \tilde{n}_s, \tau, \tilde{n}'_s) = \sum_{n_s, s_2} \sum_{n'_s, s'_2} V^L(\tilde{n}_s|n_s, s_2)\, \tilde{\Gamma}(\sigma, n_s, s_2, \tau, s'_2, n'_s)\, V^R(\tilde{n}'_s|n'_s, s'_2) .$$
Now the algorithm is repeated starting with step (ii) using the new system block. However, the block-spin variables can now take N instead of Ñ values.
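Computationally, the renormalization in step (v) is a plain tensor contraction; a sketch (array names and shapes are illustrative, with no attempt at efficiency):

! Sketch of step (v): gnew(sigma,m,tau,mp) =
!   sum vl(m|ns,s2) * gbig(sigma,ns,s2,tau,s2p,nsp) * vr(mp|nsp,s2p)
subroutine renormalize(gbig, vl, vr, gnew)
  implicit none
  double precision, intent(in)  :: gbig(:,:,:,:,:,:)    ! enlarged block
  double precision, intent(in)  :: vl(:,:,:), vr(:,:,:) ! projectors
  double precision, intent(out) :: gnew(:,:,:,:)        ! renormalized block
  integer :: sig, tau, m, mp, ns, s2, nsp, s2p
  gnew = 0.d0
  do sig = 1, size(gbig,1)
   do tau = 1, size(gbig,4)
    do m = 1, size(vl,1)
     do mp = 1, size(vr,1)
      do ns = 1, size(vl,2)
       do s2 = 1, size(vl,3)
        do nsp = 1, size(vr,2)
         do s2p = 1, size(vr,3)
          gnew(sig,m,tau,mp) = gnew(sig,m,tau,mp) &
            + vl(m,ns,s2)*gbig(sig,ns,s2,tau,s2p,nsp)*vr(mp,nsp,s2p)
         end do
        end do
       end do
      end do
     end do
    end do
   end do
  end do
end subroutine renormalize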
where H_ext is the external uniform magnetic field and g the Landé factor. An effective staggered magnetic field is realized in spin-chain compounds such as copper pyrimidine dinitrate (CuPM) or copper benzoate if an external uniform magnetic field H_ext is applied [22]. For CuPM the magnetization as a function of applied
Fig. 25.5. TMRG data (solid line) and experimental magnetization curves (circles) for CuPM
at a temperature T = 1.6 K with the magnetic field applied along the c′′ axis. For comparison
ED data for a system of 16 sites and T = 0 are shown (dashed lines). Here J/kB = 36.5 K,
hu = gμB Hext /J, hs = 0.11 hu and g = 2.19. Inset (a): Magnetization for small magnetic
fields. Inset (b): Susceptibility as a function of temperature T at Hext = 0 calculated by
TMRG
magnetic field Hext has been measured experimentally. In Fig. 25.5 the excellent
agreement between these experimental and TMRG data at a temperature T = 1.6 K
with a magnetic field applied along the c′′ axis is shown. Along the c′′ axis the ef-
fect due to the induced staggered field is largest (see [23] for more details). Note
that at low magnetic fields the TMRG data describe the experiment more accurately
than the exact diagonalization (ED) data, because there are no finite size effects (see
inset (a) of Fig. 25.5). For a magnetic field H_ext applied along the c′′ axis a gap, ∆ ∝ H_ext^{2/3}, is induced with multiplicative logarithmic corrections. For H_ext → 0 and low T the susceptibility diverges as χ ∼ 1/T because of the staggered part [24] (see inset (b) of Fig. 25.5).
In the presence of a single impurity or a modified bond, the partition function takes the form
$$Z = \mathrm{Tr}\left( \tilde{T}_M^{L-1}\, T_{\mathrm{imp}} \right) , \qquad (25.11)$$
where T_imp is the QTM describing the site impurity or the modified bond. In the thermodynamic limit the total free energy then becomes
$$F = -T \ln Z = L f_{\mathrm{bulk}} + F_{\mathrm{imp}} = -L T \ln \Lambda_0 - T \ln(\lambda_{\mathrm{imp}}/\Lambda_0) , \qquad (25.12)$$
with Λ0 being the largest eigenvalue of the QTM, $\tilde{T}_M$, and $\lambda_{\mathrm{imp}} = \langle\Psi_0^L| T_{\mathrm{imp}} |\Psi_0^R\rangle$.
As an example, we want to consider a semi-infinite spin-1/2 XXZ-chain with an open boundary. In this case translational invariance is broken and field theory predicts Friedel-type oscillations in the local magnetization ⟨S^z(r)⟩ and susceptibility χ(r) = ∂⟨S^z(r)⟩/∂h near the boundary [30, 31]. Using the TMRG method the local magnetization can be calculated by
$$\langle S^z(r) \rangle = \frac{\langle \Psi_0^L |\, \tilde{T}(S^z)\, \tilde{T}^{\,r-1}\, T_{\mathrm{imp}}\, | \Psi_0^R \rangle}{\Lambda_0^r\, \lambda_{\mathrm{imp}}} , \qquad (25.13)$$
where $\tilde{T}(S^z)$ is the transfer matrix with the operator S^z included and T_imp is the transfer matrix corresponding to the bond with zero exchange coupling. Hence $T_{\mathrm{imp}} |\Psi_0^R\rangle$ is nothing but the state describing the open boundary at the right. In
Fig. 25.6 the susceptibility profile as a function of the distance r from the boundary
for various temperatures as obtained by TMRG calculations [31] is shown. For more
details the reader is referred to [18] and [31].
Fig. 25.6. Susceptibility profile for ∆ = 0.6 and different temperatures T . N = 240 states
have been kept in the DMRG algorithm. The lines are a guide to the eye
Here [r/2] denotes the largest integer smaller than or equal to r/2, and we have set T ≡ T_{2N,M}. A graphical representation of the transfer matrices appearing in the numerator of (25.17) is shown in Fig. 25.7. This new transfer matrix can again be treated with the DMRG algorithm described in Sect. 25.3, where either a τ or a v plaquette is added, corresponding to a decrease in temperature T or an increase in real time t, respectively.
To demonstrate the method, results for the longitudinal spin-spin autocorrelation
function of the XXZ-chain at infinite temperature are shown in Fig. 25.8. For ∆ = 0
the XXZ-model corresponds to free spinless fermions and is exactly solvable. We
focus on the case of free fermions, as here the analysis of the dynamical TMRG
(DTMRG) method, its results and numerical errors can be carried out to a much greater extent than in the general case. The performance of the DTMRG itself is expected to
be independent of the strength of the interaction. The comparison with the exact re-
sult in Fig. 25.8 shows that the maximum time before the DTMRG algorithm breaks
down increases with the number of states. However, the improvement when taking
Fig. 25.7. Transfer matrices appearing in the numerator of (25.17) for r > 1 with r even. The two big black dots denote the operator O. T, T(O) consist of three parts: a part representing exp(−βH) (vertically striped plaquettes), another for exp(itH) (stripes from lower left to upper right) and a third part describing exp(−itH) (upper left to lower right). T, T(O) are split into system (S) and environment (E)
N = 400 instead of N = 300 states is marginal. The reason for the breakdown of
the DTMRG computation can be traced back to an increase of the discarded weight
(see inset of Fig. 25.9). Throughout the RG procedure we keep only N of the leading
eigenstates of the reduced density matrix ρS . As long as the discarded states carry
a total weight less than, say, 10⁻³, the results are faithful. For infinite temperature
and ∆ = 0 we could explain the rapid increase of the discarded weight with time
by deriving an explicit expression for the leading eigenstate of the QTM as well as
for the corresponding reduced density matrix. At the free fermion point the spec-
trum of this density matrix is multiplicative. Hence, from the one-particle spectrum
[Fig. 25.8: the longitudinal autocorrelation ⟨S^z(t)S^z(0)⟩ of the XXZ-chain at infinite temperature as a function of Jt, computed with N = 50, 100, 200, 300, 400 kept states and compared with the exact free-fermion result]
[Fig. 25.9: the leading eigenvalues Λi (i = 1, . . . , 100) of the reduced density matrix and the discarded weight 1 − Σ_{i=1}^{100} Λi as functions of Jt]
Acknowledgement
S.G. acknowledges support by the DFG under contracts KL645/4-2 and GK1052
(Representation theory and its applications in mathematics and physics) and J.S.
by the DFG and NSERC. The numerical calculations have been performed in part
using the Westgrid Facility (Canada).
References
1. S. White, Phys. Rev. Lett. 69, 2863 (1992) 665
2. T. Nishino, J. Phys. Soc. Jpn. 64, 3598 (1995) 665
3. H. Trotter, Proc. Amer. Math. Soc. 10, 545 (1959) 665, 666
4. M. Suzuki, Commun. Math. Phys. 51(2), 183 (1976) 665, 666
5. M. Suzuki, Phys. Rev. B 31, 2957 (1985) 665, 666, 668
6. R. Bursill, T. Xiang, G. Gehring, J. Phys.: Condens. Matter 8, L583 (1996) 665
26 Architecture and Performance Characteristics
of Modern High Performance Computers
In the past two decades the accessible compute power for numerical simulations has
increased by more than three orders of magnitude. Many-particle physics has largely
benefited from this development because the complex particle-particle interactions
often exceed the capabilities of analytical approaches and require sophisticated nu-
merical simulations. The significance of these simulations, which may require large amounts of data and compute cycles, is frequently determined both by the choice of an appropriate numerical method or solver and by the efficient use of modern computers. In particular, the latter point is widely underestimated and requires an understanding of the basic concepts of current (super)computer systems.
In this chapter we present a comprehensive introduction to the architectural concepts and performance characteristics of state-of-the-art high performance computers, ranging from the "poor man's" Linux cluster to leading-edge supercomputers with thousands of processors. In Sect. 26.1 we discuss basic features of modern commodity microprocessors with a slight focus on Intel and AMD products. Vector systems (NEC SX8) are briefly touched upon. The main emphasis is on the various approaches used for on-chip parallelism and data access, including cache design, and the resulting performance characteristics.
In Sect. 26.2 we turn to the fundamentals of parallel computing. First we explain
the basics and limitations of parallelism without specialization to a concrete method
or computer system. Simple performance models are established which help to un-
derstand the most severe bottlenecks that will show up with parallel programming.
In terms of concrete manifestations of parallelism we then cover the principles
of distributed-memory parallel computers, of which clusters are a variant. These
systems are programmed using the widely accepted message passing paradigm
where processes running on the compute nodes communicate via a library that
sends and receives messages between them and thus serves as an abstraction layer
to the hardware interconnect. Whether the program is run on an inexpensive clus-
ter with bare Gigabit Ethernet or on a special-purpose vector system featuring a
high-performance switch like the NEC IXS does not matter as far as the paral-
lel programming paradigm is concerned. The Message Passing Interface (MPI) has
emerged as the quasi-standard for message passing libraries. We introduce the most
important MPI functionality using some simple examples. As the network is often a performance-limiting aspect of MPI programming, some comments are made on it as well.
26.1 Microprocessors
In the “old days” of scientific supercomputing roughly between 1975 and 1995,
leading-edge high performance systems were specially designed for the HPC mar-
ket by companies like Cray, NEC, Thinking Machines, or Meiko. Those systems
were way ahead of standard commodity computers in terms of performance and
price. Microprocessors, which had been invented in the early 1970s, were only ma-
ture enough to hit the HPC market by the end of the 1980s, and it was not until
the end of the 1990s that clusters of standard workstation or even PC-based hard-
ware had become competitive at least in terms of peak performance. Today the sit-
uation has changed considerably. The HPC world is dominated by cost-effective,
off-the-shelf systems with microprocessors that were not primarily designed for sci-
entific computing. A few traditional supercomputer vendors act in a niche market.
They offer systems that are designed for high application performance on the sin-
gle CPU level as well as for highly parallel workloads. Consequently, the scientist
is likely to encounter commodity clusters first and only advance to more specialized hardware as requirements grow. For this reason we will mostly focus on microprocessor-based systems in this chapter. Vector computers feature a different programming paradigm which is in many cases close to the requirements of scientific computation, but they have become rare animals.
Microprocessors are probably the most complicated machinery that man has
ever created. Understanding all inner workings of a CPU is out of the question for
the scientist and also not required. It is helpful, though, to get a grasp of the high-
level features in order to understand potential bottlenecks. Figure 26.1 shows a very simplified block diagram of a modern cache-based microprocessor.
[Fig. 26.1: simplified block diagram of a cache-based microprocessor: integer and floating-point (add/mult) units with register files, load (LD) and store (ST) queues, L1 instruction and data caches, a unified L2 cache, and the memory interface]
All those components can operate at some maximum speed called peak perfor-
mance. Whether this limit can be reached with a specific application code depends
on many factors and is one of the key topics of Chap. 27. Here we would like to
introduce some basic performance metrics that can quantify the speed of a CPU.
Scientific computing tends to be quite centric to floating-point data, usually with
double precision (DP). The performance at which the FP units generate DP results
for multiply and add operations is measured in floating-point operations per sec-
ond (Flops/sec). The reason why more complicated arithmetic (divide, square root,
trigonometric functions) is not counted here is that those are executed so slowly
compared to add and multiply as to not contribute significantly to overall perfor-
mance in most cases (see also Sect. 27.1). At the time of writing, standard micro-
processors feature a peak performance between 4 and 12 GFlops/sec.
Listing 26.1. Basic code fragment for the vector triad benchmark, including performance
measurement
double precision A(N),B(N),C(N),D(N),S,E,MFLOPS
S = get_walltime()
do j=1,R
do i=1,N
A(i) = B(i) + C(i) * D(i) ! 3 loads, 1 store
enddo
call dummy(A,B,C,D) ! prevent loop interchange
enddo
E = get_walltime()
MFLOPS = R*N*2.d0/((E-S)*1.d6) ! compute MFlop/sec rate
¹ Please note that the giga and mega prefixes refer to a factor of 10⁹ and 10⁶, respectively, when used in conjunction with ratios like bandwidth or performance.
This effectively prevents the optimization described, and the cost of the call is negligible as long as N is not too small. Optionally, the call can be masked by an if
statement whose condition is never true (a fact that must of course also be hidden
from the compiler).
The MFLOPS variable is computed to be the MFlops/sec rate for the whole loop
nest. Please note that the most sensible time measure in benchmarking is wallclock
time. Any other “time” that the runtime system may provide, first and foremost
the often-used CPU time, is prone to misinterpretation because there might be con-
tributions from I/O, context switches, other processes etc. that CPU time cannot
encompass. This is even more true for parallel programs (see Sect. 26.2).
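A possible implementation of the get_walltime function from Listing 26.1, based on the standard system_clock intrinsic (the listing leaves the implementation open, so this is only one option):

! Sketch: wallclock time in seconds from the system_clock intrinsic.
function get_walltime() result(t)
  implicit none
  double precision :: t
  integer(kind=8) :: count, rate
  call system_clock(count, rate)
  t = dble(count)/dble(rate)
end function get_walltime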
Figure 26.2 shows performance graphs for the vector triad obtained on current
microprocessor and vector systems. For very small loop lengths we see poor per-
formance no matter which type of CPU or architecture is used. On standard micro-
processors, performance grows with N until some maximum is reached, followed
by several sudden breakdowns. Finally, performance stays constant for very large
loops. Those characteristics will be analyzed and explained in the following sec-
tions.
Vector processors (dotted line in Fig. 26.2) show very contrasting features. The
low-performance region extends much farther than on microprocessors, but after
saturation at some maximum level there are no breakdowns any more. We con-
clude that vector systems are somewhat complementary to standard CPUs in that
they meet different domains of applicability. It may, however, be possible to opti-
mize real-world code in a way that circumvents the low-performance regions. See
Sect. 27.1 for details.
Low-level benchmarks are powerful tools to get information about the basic ca-
pabilities of a processor. However, they often cannot accurately predict the behavior
of real application code. In order to decide whether some CPU or architecture is
Fig. 26.2. Serial vector triad performance data for different architectures. Note the entirely
different performance characteristics of the vector processor (NEC SX8)
well-suited for some application (e.g., in the run-up to a procurement), the only safe
way is to prepare application benchmarks. This means that an application code is
used with input parameters that reflect as closely as possible the real requirements
of production runs but lead to a runtime short enough for testing (no more than a few
minutes). The decision for or against a certain architecture should always be heavily
based on application benchmarking. Standard benchmark collections like the SPEC
suite [3] can only be rough guidelines.
Computer technology had been used for scientific purposes and, more specifically,
for numerical calculations in physics long before the dawn of the desktop PC. For
more than 30 years scientists could rely on the fact that no matter which technology
was implemented to build computer chips, their complexity or general capability
doubled about every 24 months. In its original form, Moore’s law stated that the
number of components (transistors) on a chip required to hit the “sweet spot” of
minimal manufacturing cost per component would increase at the indicated rate [4].
This has held true since the early 1960s despite substantial changes in manufactur-
ing technologies that have happened over the decades. Amazingly, the growth in
complexity has always roughly translated to an equivalent growth in compute per-
formance, although the meaning of performance remains debatable as a processor
is not the only component in a computer (see below for more discussion regarding
this point).
Increasing chip transistor counts and clock speeds have enabled processor de-
signers to implement many advanced techniques that lead to improved applica-
tion performance. A multitude of concepts have been developed, including the
following:
(i) Pipelined functional units. Of all innovations that have entered computer de-
sign, pipelining is perhaps the most important one. By subdividing complex
operations (like, e.g., floating point addition and multiplication) into simple
components that can be executed using different functional units on the CPU,
it is possible to increase instruction throughput, i.e. the number of instructions
executed per clock cycle. Optimally pipelined execution leads to a throughput
of one instruction per cycle. At the time of writing, processor designs exist that
feature pipelines with more than 30 stages. See the next section for details.
(ii) Superscalar architecture. Superscalarity provides for an instruction through-
put of more than one per cycle by using multiple, identical functional units
concurrently. This is also called instruction-level parallelism (ILP). Modern
microprocessors are up to six-way superscalar.
(iii) Out-of-order execution. If arguments to instructions are not available on time,
e.g. because the memory subsystem is too slow to keep up with processor
speed, out-of-order execution can avoid pipeline bubbles by executing instruc-
tions that appear later in the instruction stream but have their parameters avail-
able. This improves instruction throughput and makes it easier for compilers to generate efficient code.
26.1.3 Pipelining
Fig. 26.3. Timeline for a simplified floating-point multiplication pipeline that executes
A(:)=B(:)*C(:). One result is generated on each cycle after a five-cycle wind-up phase
[Fig. 26.4: pipeline throughput N/T_pipe as a function of the number of independent operations N, for pipeline depths m = 5, 10, 30 and 100]
With a pipeline of depth m, executing N independent operations takes T_pipe = N + m − 1 cycles, so the throughput is
$$\frac{N}{T_{\mathrm{pipe}}} = \frac{1}{1 + \frac{m-1}{N}} , \qquad (26.2)$$
approaching one for large N (see Fig. 26.4). It is evident that the deeper the pipeline
the larger the number of independent operations must be to achieve reasonable
throughput because of the overhead incurred by wind-up and wind-down phases.
One can easily determine how large N must be in order to get at least p results
per cycle (0 < p ≤ 1):
$$p = \frac{1}{1 + \frac{m-1}{N_c}} \;\Longrightarrow\; N_c = \frac{(m-1)\,p}{1-p} . \qquad (26.3)$$
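As a quick check of (26.3): a pipeline of depth m = 30 needs Nc = 29 · 0.9/0.1 = 261 independent operations to sustain p = 0.9 results per cycle, which illustrates why deep pipelines require long loops.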
Note that although a depth of five is not unrealistic for a FP multiplication pipeline,
executing a real code involves more operations like, e.g., loads, stores, address cal-
culations, opcode fetches etc. that must be overlapped with arithmetic. Each operand
of an instruction must find its way from memory to a register, and each result must
be written out, observing all possible interdependencies. It is the compiler’s job to
arrange instructions in a way to make efficient use of all the different pipelines. This
is most crucial for in-order architectures, but also required on out-of-order proces-
sors due to the large latencies for some operations.
As mentioned above, an instruction can only be executed if its operands are
available. If operands are not delivered on time to execution units, all the compli-
cated pipelining mechanisms are of no use. As an example, consider a simple scaling
loop:
do i=1,N
A(i) = s * A(i)
enddo
Although the multiply operation can be pipelined, the pipeline will stall if the load
operation on A(i) does not provide the data on time. Similarly, the store operation
can only commence if the latency for mult has passed and a valid result is available.
Assuming a latency of four cycles for load, two cycles for mult and two cycles
for store, it is clear that the above pseudo-code formulation is extremely inefficient.
It is indeed required to interleave different loop iterations to bridge the latencies and
avoid stalls:
loop: load A(i+6)
mult A(i+2) = A(i+2) * s
store A(i)
branch -> loop
Here we assume for simplicity that the CPU can issue all four instructions of an it-
eration in a single cycle and that the final branch and loop variable increment comes
at no cost. Interleaving of loop iterations in order to meet latency requirements is
called software pipelining. This optimization asks for intimate knowledge about pro-
cessor architecture and insight into application code on the side of compilers. Often,
heuristics are applied to arrive at optimal code.
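The loop discussed in the following is a scaling loop whose right-hand side is shifted by an index offset; a minimal sketch (the variable name offset is illustrative):

! Sketch: scaling loop with a shifted operand. For offset = +1 the
! operand A(i+1) is always ready (pseudo-dependency); for
! offset = -1 each iteration waits for the previous result (real
! dependency) and the pipeline stalls.
do i = max(1,1-offset), min(N,N-offset)
   A(i) = s * A(i+offset)
enddo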
As the loop is traversed from small to large indices, it makes a huge difference
whether the offset is negative or positive. In the latter case we speak of a pseudo-
dependency, because A(i+1) is always available when the pipeline needs it for
computing A(i), i.e. there is no stall. In case of a real dependency, however, the
pipelined computation of A(i) must stall until the result A(i-1) is completely
finished. This causes a massive drop in performance as can be seen on the left of
Fig. 26.5. The graph shows the performance of the above scaling loop in MFlops/sec
versus loop length. The drop is clearly visible only in cache because of the small
latencies of on-chip caches. If the loop length is so large that all data has to be
fetched from memory, the impact of pipeline stalls is much less significant.
Although one might expect that it should make no difference whether the offset
is known at compile time, the right graph in Fig. 26.5 shows that there is a dramatic
performance penalty for a variable offset. Obviously the compiler cannot optimally
software pipeline or otherwise optimize the loop in this case. This is actually a com-
mon phenomenon, not exclusively related to software pipelining; any obstruction
that hides information from the compiler can have a substantial performance im-
pact.
Fig. 26.5. Influence of constant (left) and variable (right) offsets on the performance of a
scaling loop. (AMD Opteron 2.0 GHz)
There are issues with software pipelining linked to the use of caches. See below
for details.
26.1.5.1 Cache
Caches are low-capacity, high-speed memories that are nowadays usually integrated
on the CPU die. The need for caches can be easily understood by the fact that data transfer rates to main memory are painfully slow compared to the CPU's arithmetic performance.
[Fig. 26.6: schematic memory hierarchy: arithmetic units and registers on the CPU chip, L1 and L2 caches, and main memory across the "DRAM gap"]
At a peak performance of several GFlops/sec, memory bandwidth, i.e.
the rate at which data can be transferred from memory to the CPU, is still stuck at a
couple of GBytes/sec, which is entirely insufficient to feed all arithmetic units and
keep them busy continuously (see Sect. 27.1 for a more thorough analysis). To make
matters worse, in order to transfer a single data item (usually one or two DP words)
from memory, an initial waiting time called latency occurs until bytes can actually
flow. Often, latency is defined as the time it takes to transfer a zero-byte message.
Memory latency is usually of the order of several hundred CPU cycles and is com-
posed of different contributions from memory chips, the chipset and the processor.
Although Moore’s law still guarantees a constant rate of improvement in chip com-
plexity and (hopefully) performance, advances in memory performance show up at
a much slower rate. The term DRAM gap has been coined for the increasing distance
between CPU and memory in terms of latency and bandwidth.
Caches can alleviate the effects of the DRAM gap in many cases. Usually there
are at least two levels of cache (see Fig. 26.6), and there are two L1 caches, one
for instructions (I-cache) and one for data. Outer cache levels are normally unified,
storing data as well as instructions. In general, the closer a cache is to the CPU’s
registers, i.e. the higher its bandwidth and the lower its latency, the smaller it must be
to keep administration overhead low. Whenever the CPU issues a read request (load)
for transferring a data item to a register, first-level cache logic checks whether this
item already resides in cache. If it does, this is called a cache hit and the request can
be satisfied immediately, with low latency. In case of a cache miss, however, data
must be fetched from outer cache levels or, in the worst case, from main memory. If
all cache entries are occupied, a hardware-implemented algorithm evicts old items
from cache and replaces them with new data. The sequence of events for a cache
miss on a write is more involved and will be described later. Instruction caches
are usually of minor importance as scientific codes tend to be largely loop-based;
I-cache misses are rare events.
Caches can only have a positive effect on performance if the data access pattern
of an application shows some locality of reference. More specifically, data items
that have been loaded into cache are to be used again soon enough to not have been
evicted in the meantime. This is also called temporal locality. Using a simple model,
we will now estimate the performance gain that can be expected from a cache that
is a factor of τ faster than memory (this refers to bandwidth as well as latency; a
more refined model is possible but does not lead to additional insight). Let β be
the cache reuse ratio, i.e. the fraction of loads or stores that can be satisfied from
cache because there was a recent load or store to the same address. Access time
to main memory (again this includes latency and bandwidth) is denoted by Tm . In
cache, access time is reduced to Tc = Tm /τ . For some finite β, the average access
time will thus be Tav = βTc + (1 − β)Tm , and we calculate an access performance
gain of
$$G(\tau, \beta) = \frac{T_m}{T_{\mathrm{av}}} = \frac{\tau\, T_c}{\beta T_c + (1-\beta)\,\tau T_c} = \frac{\tau}{\beta + \tau(1-\beta)} . \qquad (26.4)$$
As Fig. 26.7 shows, a cache can only lead to a significant performance advantage if
the hit ratio is relatively close to one.
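For example, with τ = 10 a seemingly good reuse ratio of β = 0.9 yields only G = 10/(0.9 + 10 · 0.1) ≈ 5.3, i.e. about half of the ideal speedup of ten.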
Fig. 26.7. Performance gain vs. cache reuse ratio. τ parametrizes the speed advantage of cache vs. main memory

However, many applications use streaming patterns where large amounts of data are loaded to the CPU, modified and written back, without the potential of reuse in time. For a cache that only supports temporal locality, the reuse ratio β (see above)
is zero for streaming. Each new load is expensive as an item has to be evicted from
cache and replaced by the new one, incurring huge latency. In order to reduce the la-
tency penalty for streaming, caches feature a peculiar organization into cache lines.
All data transfers between caches and main memory happen on the cache line level.
The advantage of cache lines is that the latency penalty of a cache miss occurs only
on the first miss on an item belonging to a line. The line is fetched from memory as
a whole; neighboring items can then be loaded from cache with much lower latency,
increasing the cache hit ratio γ, not to be confused with the reuse ratio β. So if the
application shows some spatial locality, i.e. if the probability of successive accesses
to neighboring items is high, the latency problem can be significantly reduced. The
downside of cache lines is that erratic data access patterns are not supported. On the
contrary, not only does each load incur a miss and subsequent latency penalty, it also
leads to the transfer of a whole cache line, polluting the memory bus with data that
will probably never be used. The effective bandwidth available to the application
will thus be very low. On the whole, however, the advantages of using cache lines
prevail, and very few processor manufacturers have provided means of bypassing
the mechanism.
Assuming a streaming application working on DP floating point data on a CPU
with a cache line length of Lc = 16 words, spatial locality fixes the hit ratio at
γ = (16 − 1)/16 = 0.94, a seemingly large value. Still it is clear that performance
is governed by main memory bandwidth and latency – the code is memory-bound.
In order for an application to be truly cache-bound, i.e. decouple from main memory
so that performance is not governed by bandwidth or latency any more, γ must be
large enough that the time it takes to process in-cache data becomes larger than the
time for reloading it. If and when this happens depends of course on the details of
the operations performed.
By now we can interpret the performance data for cache-based architectures on
the vector triad in Fig. 26.2. At very small loop lengths, the processor pipeline is too
long to be efficient. Wind-up and wind-down phases dominate and performance is
poor. With growing N this effect becomes negligible, and as long as all four arrays
fit into the innermost cache, performance saturates at a high value that is set by
cache bandwidth and the ability of the CPU to issue load and store instructions.
Increasing N a little more gives rise to a sharp drop in performance because the
innermost cache is not large enough to hold all data. Second-level cache has usually
larger latency but similar bandwidth to L1 so that the penalty is larger than expected.
However, streaming data from L2 has the disadvantage that L1 now has to provide
data for registers as well as continuously reload and evict cache lines from/to L2,
which puts a strain on the L1 cache’s bandwidth limits. This is why performance is
usually hard to predict on all but the innermost cache level and main memory. For
each cache level another performance drop is observed with rising N, until finally
even the large outer cache is too small and all data has to be streamed from main
memory. The sizes of the different caches are directly related to the locations of the
bandwidth breakdowns. Section 27.1 will describe how to predict performance for
simple loops from basic parameters like cache or memory bandwidths and the data
demands of the application.
Storing data is a little more involved than reading. In the presence of caches, if
data to be written out already resides in cache, a write hit occurs. There are several
possibilities for handling this case, but usually outermost caches work with a write-
back strategy: The cache line is modified in cache and written to memory as a whole
when evicted. On a write miss, however, cache-memory consistency dictates that the
cache line in question must first be transferred from memory to cache before it can
be modified. This is called read for ownership (RFO) and leads to the situation that
a data write stream from CPU to memory uses the bus twice, once for all the cache
line RFOs and once for evicting modified lines (the data transfer requirement for the
triad benchmark code is increased by 25 % due to RFOs). Consequently, streaming
applications do not usually profit from write-back caches and there is often a wish
for avoiding RFO transactions. Some architectures provide this option, and there are
generally two different strategies:
– Non-temporal stores. These are special store instructions that bypass all cache
levels and write directly to memory. Cache does not get polluted by store streams
that do not exhibit temporal locality anyway. In order to prevent excessive laten-
cies, there is usually a write combine buffer of sorts that bundles a number of
successive stores.
– Cache line zero. Again, special instructions serve to zero out a cache line and
mark it as modified without a prior read. The data is written to memory when
evicted. In comparison to non-temporal stores, this technique uses up cache
space for the store stream. On the other hand it does not slow down store opera-
tions in cache-bound situations.
Both can be applied by the compiler and hinted at by the programmer by means
of directives. In very simple cases compilers are able to apply those instructions
automatically in their optimization stages, but one must take care to not slow down
a cache-bound code by using non-temporal stores, rendering it effectively memory-
bound.
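As an illustration of such a hint, the following uses the Intel Fortran compiler's directive for non-temporal stores on a pure store stream; other compilers provide analogous, differently spelled directives:

!DIR$ VECTOR NONTEMPORAL
do i = 1, N
   A(i) = 0.d0   ! pure store stream; bypassing cache avoids the RFO
enddo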
Fig. 26.8. Direct-mapped (left) and two-way set-associative cache (right). Shaded boxes in-
dicate cache lines
Memory locations that lie a multiple of the cache size apart are always mapped to the same cache line (see Fig. 26.8, left), and the cache line that corresponds to some address can be obtained very quickly by masking out the most significant bits. Moreover, an algorithm to select which cache line to evict is pointless. No hardware and no clock cycles need to be spent for it.
The downside of a direct-mapped cache is that it is disposed toward cache
thrashing, which means that cache lines are loaded into and evicted from cache
in rapid succession. This happens when an application uses many memory locations
that get mapped to the same cache line. A simple example would be a strided triad
code for DP data:
do i=1,N,CACHE_SIZE/8
A(i) = B(i) + C(i) * D(i)
enddo
By using the cache size in units of DP words as a stride, successive loop iterations
hit the same cache line so that every memory access generates a cache miss. This
is different from a situation where the stride is equal to the line length; in that case,
there is still some (albeit small) N for which the cache reuse is 100 %. Here, the
reuse fraction is exactly zero no matter how small N may be.
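The mapping itself is a one-liner; a sketch for a direct-mapped cache (names are illustrative):

! Sketch: line index selected by a direct-mapped cache for a byte
! address. Addresses that differ by n_lines*line_bytes (the cache
! size) collide in the same line.
function mapped_line(addr, line_bytes, n_lines) result(line)
  implicit none
  integer(kind=8), intent(in) :: addr
  integer, intent(in) :: line_bytes, n_lines
  integer :: line
  line = int(mod(addr/int(line_bytes,8), int(n_lines,8)))
end function mapped_line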
To keep administrative overhead low and still reduce the danger of cache thrash-
ing, a set-associative cache is divided into m direct-mapped caches of equal size,
so-called ways. The number of ways m is the number of different cache lines a
memory address can be mapped to (see Fig. 26.8 (right) for an example of a two-
way set-associative cache). On each memory access, the hardware merely has to
determine which way the data resides in or, in the case of a miss, which of the m
possible cache lines should be evicted.
For each cache level the tradeoff between low latency and prevention of thrash-
ing must be considered by processor designers. Innermost (L1) caches tend to be
less set-associative than outer cache levels. Nowadays, set-associativity varies be-
tween two- and 16-way. Still, the effective cache size, i.e. the part of the cache that
is actually useful for exploiting spatial and temporal locality in an application code
could be quite small, depending on the number of data streams, their strides and
mutual offsets. See Sect. 27.1 for examples.
26.1.5.3 Prefetch
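Consider, as a concrete example, a vector norm computation with a single load stream (a sketch consistent with the timing diagram in Fig. 26.9):

S = 0.d0
do i = 1, N
   S = S + A(i) * A(i)
enddo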
There is only one load stream in this code. Assuming a cache line length of four
elements, three loads can be satisfied from cache before another miss occurs. The
long latency leads to long phases of inactivity on the memory bus.
Making the lines very long will help, but will also slow down applications with
erratic access patterns even more. As a compromise one has arrived at typical cache
line lengths between 64 and 128 bytes (8–16 DP words). This is by far not big
enough to get around latency, and streaming applications would suffer not only
from insufficient bandwidth but also from low memory bus utilization. Assuming
a typical commodity system with a memory latency of 100 ns and a bandwidth of
4 GBytes/sec, a single 128-byte cache line transfer takes 32 ns, so 75 % of the poten-
tial bus bandwidth is unused. Obviously, latency has an even more severe impact on
performance than bandwidth.
The latency problem can be solved in many cases, however, by prefetching.
Prefetching supplies the cache with data ahead of the actual requirements of an
application. The compiler can do this by interleaving special instructions with the
software pipelined instruction stream that touch cache lines early enough to give
the hardware time to load them into cache (see Fig. 26.10). This assumes there is
the potential of asynchronous memory operations, a prerequisite that is to some
extent true for current architectures.

Fig. 26.9. Timing diagram on the influence of cache misses and subsequent latency penalties for a vector norm loop. The penalty occurs on each new miss

Fig. 26.10. Calculation and data transfer can be overlapped much better with prefetching. In this example, two outstanding prefetches are required to hide latency completely

As an alternative, some processors feature a hardware prefetcher that can detect regular access patterns and tries to read ahead
application data, keeping up the continuous data stream and hence serving the same
purpose as prefetch instructions. Whichever strategy is used, it must be emphasized
that prefetching requires resources that are limited by design. The memory subsys-
tem must be able to sustain a certain number of outstanding prefetch operations,
i.e. pending prefetch requests, or else the memory pipeline will stall and latency
cannot be hidden completely. Applications with many data streams can easily over-
strain the prefetch mechanism. Nevertheless, if main memory access is unavoidable,
a good programming guideline is to try to establish long continuous data streams.
Figs. 26.9 and 26.10 stress the role of prefetching for hiding latency, but the ef-
fects of bandwidth limitations are ignored. It should be clear that prefetching cannot
enhance available memory bandwidth, although the transfer time for a single cache
line is dominated by latency.
In recent years it has become increasingly clear that, although Moore’s law is still
valid and will be at least for the next decade, standard microprocessors are starting
to hit the “heat barrier”: Switching and leakage power of several-hundred-million-
transistor chips are so large that cooling becomes a primary engineering effort
and a commercial concern. On the other hand, the necessity of an ever-increasing
clock frequency is driven by the insight that architectural advances and growing
cache sizes alone will not be sufficient to keep up the one-to-one correspondence of
Moore’s law with application performance.
Processor vendors are looking for a way out of this dilemma in the form of multi-
core designs. The technical motivation behind multi-core is based on the observation
that power dissipation of modern CPUs is proportional to the third power of the clock
frequency. If the frequency is scaled by a factor (1 + εf), with εf < 0, the power
dissipation W changes to

W + ∆W = (1 + εf)^3 W . (26.5)
Reducing clock frequency opens the possibility to place more than one CPU core
on the same die while keeping the same power envelope as before. For m cores, this
condition is expressed as
(1 + εf)^3 m = 1  =⇒  εf = m^(−1/3) − 1 . (26.6)
For example, m = 2 requires εf = 2^(−1/3) − 1 ≈ −0.21, i.e. a clock frequency reduction of about 21 %.
Figure 26.11 shows the required relative frequency reduction with respect to the
number of cores. The overall performance of the multi-core chip,

pm = (1 + εp) p m , (26.7)

follows from the single-core performance p, the number of cores m, and the relative per-core performance change εp caused by the reduced clock frequency.
Fig. 26.11. Relative frequency reduction required to keep a given power envelope versus
number of cores on a multi-core chip. The filled dots represent available technology at the
time of writing
Of course it is not easy to grow the CPU die by a factor of m with a given man-
ufacturing technology. Hence the simplest way to multi-core is to place separate
CPU dies in a common package. At some point advances in manufacturing technol-
ogy, i.e. smaller structure lengths, will then enable the integration of more cores on a
single die. Additionally, some compromises regarding the single-core performance
of a multi-core chip with respect to the previous generation will be made so that
the number of transistors per core will go down as will the clock frequency. Some
manufacturers have even adopted a more radical approach by designing new, much
simpler cores, albeit at the cost of introducing new programming paradigms.
Finally, the over-optimistic assumption (26.7) that m cores show m times the
performance of a single core will only be valid in the rarest of cases. Nevertheless,
multi-core has by now been adopted by all major processor manufacturers. There
are, however, significant differences in how the cores in a package can be arranged to
get good performance. Caches can be shared or exclusive to each core, the memory
interface can be on- or off-chip, fast data paths between the cores’ caches may or
may not exist, etc.
The most important conclusion one must draw from the multi-core transition is
the absolute demand for parallel programming. As the single core performance will
at best stagnate over the years, getting more speed for free through Moore’s law
just by waiting for the new CPU generation does not work any more. The following
section outlines the principles and limitations of parallel programming. More details
on dual- and multi-core designs will be discussed in the section on shared-memory
programming, Sect. 26.2.4.
In order to avoid any misinterpretation we will always use the terms core, CPU
and processor synonymously.
Fig. 26.12. Number of systems vs. processor count in the June 2000 and June 2006 Top 500
lists. The average number of CPUs has grown 16-fold in six years
In order to be able to define scalability we first have to identify the basic mea-
surements on which derived performance metrics are built. In a simple model, the
overall problem size (amount of work) shall be s + p = 1, where s is the serial
(non-parallelizable) and p is the perfectly parallelizable fraction. The 1-CPU (se-
rial) runtime for this case,
Fig. 26.13. Parallelizing a sequence of tasks (top) using three workers (W1. . . W3). Left bot-
tom: perfect speedup. Right bottom: some tasks executed by different workers at different
speeds lead to load imbalance. Hatched regions indicate unused resources
Tfs = s + p , (26.9)

is thus normalized to one. Solving the same problem on N CPUs will require a runtime of

Tfp = s + p/N . (26.10)
This is called strong scaling because the amount of work stays constant no matter
how many CPUs are used. Here the goal of parallelization is minimization of time
to solution for a given problem.
If time to solution is not the primary objective because larger problem sizes (for
which available memory is the limiting factor) are of interest, it is appropriate to
scale the problem size with some power of N so that the total amount of work is
s+pN α , where α is a positive but otherwise free parameter. Here we use the implicit
assumption that the serial fraction s is a constant. We define the serial runtime for
the scaled problem as
Tvs = s + pN^α . (26.11)

Consequently, the parallel runtime is

Tvp = s + pN^(α−1) . (26.12)

The term weak scaling has been coined for this approach.
Fig. 26.14. Parallelization in presence of a bottleneck that effectively serializes part of the
concurrent execution. Tasks 3, 7 and 11 cannot overlap across the dashed barriers
Fig. 26.15. Communication processes (arrows represent messages) limit scalability if they
cannot be overlapped with each other or with calculation
Defining performance as work over runtime, the application speedup for the fixed-size problem is

Sf = Pfp/Pfs = 1/(s + (1 − s)/N) . (26.15)
With (26.15) we have derived the well-known Amdahl law which limits application
speedup for large N to 1/s. It answers the question “How much faster (in terms of
runtime) does my application run when I put the same problem on N CPUs?” On
the other hand, in the case of weak scaling where workload grows with CPU count,
the question to ask is “How much more work can my program do in a given amount
of time when I put a larger problem on N CPUs?” Serial performance as defined
above is again
Pvs = (s + p)/Tfs = 1 , (26.16)

as N = 1. Based on (26.11) and (26.12), parallel performance (work over time) is

Pvp = (s + pN^α)/Tvp(N) = (s + (1 − s)N^α)/(s + (1 − s)N^(α−1)) = Sv , (26.17)

which for N ≫ 1 and α < 1 approaches

Sv → (s + (1 − s)N^α)/s = 1 + (p/s) N^α . (26.18)

For α = 1, (26.17) reduces to Sv = s + (1 − s)N, which is Gustafson's law.
In the light of the considerations about scalability, one other point of interest is the
question of how effectively a given resource, i.e. CPU power, can be used in a parallel
program (in the following we assume that the serial part of the program is executed
on one single worker while all others have to wait). Usually, parallel efficiency is
then defined as
ε = (performance on N CPUs)/(N × performance on one CPU) = speedup/N . (26.20)
We will only consider weak scaling, as the limit α → 0 will always recover the
Amdahl case. We get
ε = Sv/N = (sN^(−α) + (1 − s))/(sN^(1−α) + (1 − s)) . (26.21)
For α = 0 this yields 1/(sN + (1 − s)), which is the expected ratio for the Amdahl
case and approaches zero with large N . For α = 1 we get s/N + (1 − s), which is
also correct because the more CPUs are used the more CPU cycles are wasted, and,
starting from ε = s + p = 1 for N = 1, efficiency reaches a limit of 1 − s = p for
large N . Weak scaling enables us to use at least a certain fraction of CPU power,
even when the CPU count is very large. Wasted CPU time grows linearly with N ,
though; at least this issue is clearly visible with the definitions used.
There are situations where Amdahl’s and Gustafson’s laws are not appropriate be-
cause the underlying model does not encompass components like communication,
load imbalance, parallel startup overhead etc. As an example, we will include a
simple communication model. For simplicity we presuppose that communication
cannot be overlapped with computation (see Fig. 26.15), an assumption that is ac-
tually true for many parallel architectures. In a parallel calculation, communication
must thus be accounted for as a correction term added to the parallel runtime (26.12).
Fig. 26.16. Predicted parallel scalability for different models at s = 0.05; in general κ = 0.005 and λ = 0.001, except for the Amdahl case (α = κ = λ = 0), which is shown for reference. The models shown are: α = 0, blocking; α = 0, non-blocking; α = 0, 3D domain decomposition, non-blocking; and α = 1, 3D domain decomposition, non-blocking
Figure 26.16 illustrates the four cases at κ = 0.005, λ = 0.001 and s = 0.05 and
compares with Amdahl’s law. Note that the simplified models we have covered in
this section are far from accurate for many applications. In order to check whether
some performance model is appropriate for the code at hand, one should measure
scalability for some processor numbers and fix the free model parameters by least-
squares fitting.
After covering the principles and limitations of parallelization we will now turn
to the concrete architectures that are at the programmer’s disposal to implement a
parallel algorithm on. Two primary paradigms have emerged, and each features a
dominant and standardized programming model: Distributed-memory and shared-
memory systems. In this section we will be concerned with the former while the
next section covers the latter.
Figure 26.17 shows a simplified block diagram, or programming model, of a
distributed-memory parallel computer. Each processor P (with its own local cache
C) is connected to exclusive local memory, i.e. no other CPU has direct access to it.
Although many parallel machines today, first and foremost the popular PC clusters,
consist of a number of shared-memory compute nodes with two or more CPUs for
price/performance reasons, the programmer’s view does not reflect that (it is even
possible to use distributed-memory programs on machines that feature shared mem-
ory only). Each node comprises at least one network interface (NI) that mediates
the connection to a communication network. On each CPU runs a serial process that
can communicate with other processes on other CPUs by means of the network. In
the simplest case one could use standard switched Ethernet, but a number of more
advanced technologies have emerged that can easily have ten times the bandwidth
Fig. 26.17. Simplified programmer’s view, or programming model, of a distributed-memory
parallel computer
and one tenth of the latency of Gbit Ethernet. As shown in the section on performance
models, the exact layout and speed of the network has considerable impact on ap-
plication performance. The most favorable design consists of a non-blocking wire-
speed network that can switch N/2 connections between its N participants without
any bottlenecks. Although readily available for small systems with tens to a few
hundred nodes, non-blocking switch fabrics become vastly expensive on very large
installations and some compromises are usually made, i.e. there will be a bottleneck
if all nodes want to communicate concurrently.
Note that domain decomposition has the attractive property that domain bound-
ary area grows more slowly than volume if the problem size increases with N con-
stant. Therefore one can alleviate communication bottlenecks just by choosing a
larger problem size. The expected effects of strong and weak scaling with opti-
mal domain decomposition in three dimensions have been discussed in (26.26) and
(26.27).
In order to compile and link MPI programs, compilers and linkers need options
that specify where include files and libraries can be found. As there is considerable
variation in those locations across installations, most MPI implementations provide
compiler wrapper scripts (often called mpicc, mpif77, etc.) that supply the re-
quired options automatically but otherwise behave like normal compilers. Note that
the way that MPI programs should be compiled and started is not fixed by the stan-
dard, so please consult your system documentation.
Listing 26.2. A very simple, fully functional “Hello World” MPI program
1 program mpitest
2   use MPI
3
4   implicit none
5   integer :: rank, size, ierror
6   call MPI_Init(ierror)
7   call MPI_Comm_size(MPI_COMM_WORLD, size, ierror)
8   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
9
10  write(*,*) 'Hello World, I am ',rank,' of ',size
11
12  call MPI_Finalize(ierror)
13
14 end
Listing 26.2 shows a simple “Hello World” type MPI program in Fortran 90. In
line 2, the MPI module is loaded which provides required globals and definitions
(in Fortran 77 and C/C++ one would use the preprocessor to read in the mpif.h
or mpi.h header files, respectively). All MPI calls take an INTENT(OUT) argu-
ment, here called ierror, that transports information about the success of the MPI
operation to the user code (in C/C++, the return code is used for that). As failure
resiliency is not built into the MPI standard today and checkpoint/restart features
are usually implemented by the user code anyway, the error code is rarely checked
at all.
The first call in every MPI code should go to MPI_Init and initializes the par-
allel environment (line 6). In C/C++, &argc and &argv are passed to MPI_Init
so that the library can evaluate and remove any additional command line arguments
that may have been added by the MPI startup process. After initialization, MPI has
set up a so-called communicator, called MPI_COMM_WORLD. A communicator de-
fines a group of MPI processes that can be referred to by a communicator handle.
The MPI_COMM_WORLD handle describes all processes that have been started as
part of the parallel program. If required, other communicators can be defined as
subsets of MPI_COMM_WORLD. Nearly all MPI calls require a communicator as an
argument.
The calls to MPI_Comm_size and MPI_Comm_rank in lines 7 and 8 serve to
determine the number of processes (size) in the parallel program and the unique
identifier (the rank) of the calling process, respectively. The ranks in a commu-
nicator, in this case MPI_COMM_WORLD, are numbered starting from zero up to
N − 1. In line 12, the parallel program is shut down by a call to MPI_Finalize.
Note that no MPI process except rank 0 is guaranteed to execute any code beyond
MPI_Finalize.
In order to compile and run the source code in Listing 26.2, a common imple-
mentation would require the following steps:
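mpif90 -O3 -o mpitest.x mpitest.f90
mpirun -np 4 ./mpitest.x

(The names of the compiler wrapper and the startup command are implementation dependent; mpif90 and mpirun are common but by no means universal choices.)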
This would compile the code and start it with four processes. Be aware that proces-
sors may have to be allocated from some batch system before parallel programs can
be launched. How MPI processes are mapped to actual processors is entirely up to
the implementation. The output of this program could look like the following:
Hello World, I am 3 of 4
Hello World, I am 0 of 4
Hello World, I am 2 of 4
Hello World, I am 1 of 4
Although the stdout and stderr streams of MPI programs are usually redirected
to the terminal where the program was started, the order in which outputs from
different ranks will arrive is undefined.
This example did not contain any real communication apart from starting and
stopping processes. An MPI message is defined as an array of elements of a par-
ticular MPI datatype. Data types can either be basic types (corresponding to the
standard types that every programming language knows) or derived types that must
be defined by appropriate MPI calls. The reason why MPI needs to know the data
types of messages is that it supports heterogeneous environments where it may be
necessary to do on-the-fly data conversions. For some message transfer to take place,
the data types on sender and receiver sides must match. If there is exactly one sender
and one receiver we speak of point-to-point communication. Both ends are identified
uniquely by their ranks. Each message can carry an additional integer label, the so-
called tag that may be used to identify the type of a message, as a sequence number
or any other accompanying information. In Listing 26.3 we show an MPI program
fragment that computes an integral over some function f(x) in parallel. Each MPI
process gets assigned a subinterval of the integration domain (lines 9 and 10), and
some other function can then perform the actual integration (line 12). After that each
process holds its own partial result, which should be added to get the final integral.
This is done at rank 0, who executes a loop over all ranks from 1 to size − 1,
receiving the local integral from each rank in turn via MPI_Recv and accumulat-
ing the result in res. Each rank apart from 0 has to call MPI_Send to transmit
the data. Hence there are size − 1 send and size − 1 matching receive opera-
tions. The data types on both sides are specified to be MPI_DOUBLE_PRECISION,
which corresponds to the usual double precision type in Fortran (be aware
that MPI types are named differently in C/C++ than in Fortran). The message tag
is not used here, so we set it to 0 because identical tags are required for message
matching as well.
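A sketch of such a fragment is shown below. The function integrate() and the interval variables are assumptions, not necessarily identical to the original Listing 26.3, and the line numbering is only approximate:

1  ! integrate f(x) over [a,b] in parallel (sketch)
2  double precision :: a, b, mya, myb, psum, res, tmp
3  integer :: i, rank, size, ierror
4  integer :: status(MPI_STATUS_SIZE)
5
6  call MPI_Comm_size(MPI_COMM_WORLD, size, ierror)
7  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
8  ! limits of my own subinterval
9  mya = a + rank*(b-a)/size
10 myb = mya + (b-a)/size
11 ! integrate f(x) from mya to myb (integrate() is a hypothetical routine)
12 psum = integrate(mya, myb)
13 if(rank.eq.0) then
14   res = psum
15   do i=1,size-1
16     call MPI_Recv(tmp, 1, MPI_DOUBLE_PRECISION, i, 0, &
17                   MPI_COMM_WORLD, status, ierror)
18     res = res + tmp
19   enddo
20 else
21   call MPI_Send(psum, 1, MPI_DOUBLE_PRECISION, 0, 0, &
22                 MPI_COMM_WORLD, ierror)
23 endif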
While all parameters are necessarily fixed on MPI_Send, there is some more
variability on the receiver side. MPI_Recv allows wildcards so that the source rank
and the tag do not have to be specified. Using MPI_ANY_SOURCE as source rank
and MPI_ANY_TAG as tag will match any message, from any source, with any tag
as long as the other matching criteria like data type and communicator are met (this
would have been possible in the integration example without further code changes).
After MPI_Recv has returned to the user code, the status array can be used to
extract the missing pieces of information, i.e. the actual source rank and message
tag, and also the length of the message as the array size specified in MPI_Recv is
only an upper limit.
The accumulation of partial results as shown above is an example for a reduction
operation, performed on all processes in the communicator. MPI has mechanisms
that make reductions much simpler and in most cases more efficient than looping
over all ranks and collecting results. As reduction is a procedure that all ranks in a
communicator participate in, it belongs to the so-called collective communication
operations in MPI. Collective communication, as opposed to point-to-point commu-
nication, requires that every rank calls the same routine, so it is impossible for a mes-
sage sent via point-to-point to match a receive that was initiated using a collective
call. The whole if...else...endif construct (apart from printing the result) in
Listing 26.3 could have been written as a single call:
call MPI_Reduce(psum,                 & ! send buffer
                res,                  & ! receive buffer
                1,                    & ! array length
                MPI_DOUBLE_PRECISION, &
                MPI_SUM,              & ! type of operation
                0,                    & ! root (accumulate res there)
                MPI_COMM_WORLD, ierror)
Most collective routines define a root rank at which some general data source or
sink is located. Although rank 0 is a natural choice for root, it is in no way different
from other ranks.
There are collective routines not only for reduction but also for barriers (each
process stops at the barrier until all others have reached the barrier as well), broad-
casts (the root rank transmits some data to everybody else), scatter/gather (data is
distributed from root to all others or collected at root from everybody else), and
complex combinations of those. Generally speaking, it is a good idea to prefer col-
lectives over point-to-point constructs that emulate the same semantics. Good MPI
implementations are optimized for data flow on collective operations and also have
some knowledge about network topology built in.
All MPI functionalities described so far have the property that the call returns
to the user program only after the message transfer has progressed far enough so
that the send/receive buffer can be used without problems. That means, received
data has arrived completely and sent data has left the buffer so that it can be safely
modified without inadvertently changing the message. In MPI terminology, this is
called blocking communication. Although collective operations are always block-
ing, point-to-point communication can be performed with non-blocking calls as
well. A non-blocking point-to-point call merely initiates a message transmission
and returns very quickly to the user code. In an efficient implementation, waiting
for data to arrive and the actual data transfer occur in the background, leaving re-
sources free for computation. In other words, non-blocking MPI is a way in which
computation and communication may be overlapped. As long as the transfer has not
finished (which can be checked by suitable MPI calls), the message buffer must not
be used. Non-blocking and blocking MPI calls are mutually compatible, i.e. a mes-
sage sent via a blocking send can be matched by a non-blocking receive. Table 26.1
gives a rough overview of available communication modes in MPI.
As mentioned before, there are various options for the choice of a network in a
distributed-memory computer. The simplest and cheapest solution to date is Gbit
Ethernet, which will suffice for many throughput applications but is far too slow –
in terms of bandwidth and latency – for parallel code with any need for fast com-
munication. Assuming that the total transfer time for a message of size N [bytes] is the sum of a constant latency Tl and a bandwidth-limited part, T = Tl + N/B, the effective bandwidth is

Beff = N/(Tl + N/B) . (26.29)

T and Beff are commonly measured with the PingPong benchmark, in which a message is bounced back and forth between two processes and half the round-trip time is taken as the transfer time.
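A minimal PingPong sketch (warm-up and repetition omitted; the buffer buf and the message size N are assumptions):

integer*1 :: buf(N)                  ! message buffer of N bytes
integer :: status(MPI_STATUS_SIZE), ierror
double precision :: t0, t, beff
if(rank.eq.0) then
  t0 = MPI_Wtime()
  call MPI_Send(buf, N, MPI_BYTE, 1, 0, MPI_COMM_WORLD, ierror)
  call MPI_Recv(buf, N, MPI_BYTE, 1, 0, MPI_COMM_WORLD, status, ierror)
  t = (MPI_Wtime()-t0)/2.d0          ! one-way transfer time
  beff = N/t                         ! effective bandwidth for this N
else if(rank.eq.1) then
  call MPI_Recv(buf, N, MPI_BYTE, 0, 0, MPI_COMM_WORLD, status, ierror)
  call MPI_Send(buf, N, MPI_BYTE, 0, 0, MPI_COMM_WORLD, ierror)
endif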
Table 26.1. Rough overview of available communication modes in MPI

               Point-to-point                        Collective
Blocking       MPI_Send(buf,...)                     MPI_Barrier(...)
               MPI_Ssend(buf,...)                    MPI_Bcast(...)
               MPI_Bsend(buf,...)                    MPI_Reduce(...)
               MPI_Recv(buf,...)                     (all processes in communicator
               (buf can be used after call returns)  must call)
Non-blocking   MPI_Isend(buf,...)                    N/A
               MPI_Irecv(buf,...)
               (buf can not be used or modified
               after call returns; check for
               completion with MPI_Wait(...)
               or MPI_Test(...))
Bandwidth in MBytes/sec is then reported for different N (see Fig. 26.20). Common
to all interconnects, we observe very low bandwidth for small message sizes as
expected from the model (26.29). Latency can be measured directly by taking the
Fig. 26.19. Fit of the model for effective bandwidth (26.29) to data measured on a Gbit Ethernet network (fit parameters Tl = 41 µs, B = 102 MBytes/sec)
N = 0 limit of transfer time (inset in Fig. 26.20). The reasons for latency can be
diverse:
– All data transmission protocols have some overhead in the form of administra-
tive data like message headers etc.
– Some protocols (like, e.g., TCP/IP as used over Ethernet) define minimum mes-
sage sizes, so even if the application sends a single byte, a small frame of N > 1
bytes is transmitted.
– Initiating a message transfer is a complicated process that involves multiple soft-
ware layers, depending on the complexity of the protocol. Each software layer
adds to latency.
– Standard PC hardware as frequently used in clusters is not optimized towards
low-latency I/O.
In fact, high-performance networks try to improve latency by reducing the influence
of all of the above. Lightweight protocols, optimized drivers and communication
devices directly attached to processor buses are all used by vendors to provide low
MPI latency.
For large messages, effective bandwidth saturates at some maximum value.
Structures like local minima etc. frequently occur but are very dependent on hard-
ware and software implementations (e.g., the MPI library could decide to switch
to a different buffering algorithm beyond some message size). Although saturation
bandwidths can be quite high (there are systems where achievable MPI bandwidths
are comparable to the local memory bandwidth of the processor), many applications
work in a region on the bandwidth graph where latency effects still play a dominant
role. To quantify this problem, the N1/2 value is often reported. This is the message
size at which Beff = B/2 (see Fig. 26.20). In the model (26.29), N1/2 = B Tl; for the
Gbit Ethernet fit in Fig. 26.19 this amounts to roughly 4 kBytes. From this point of
view it makes sense to ask whether an increase in maximum network bandwidth
by a factor of β is really beneficial for all messages. At message size N, the gain in
effective bandwidth is

Beff(βB)/Beff(B) = (Tl + N/B)/(Tl + N/(βB)) ,
Fig. 26.20. Result of the PingPong benchmark for three different networks (Gbit Ethernet, InfiniBand 4X (PCI-X), SGI NUMALink4). The N1/2 point is marked for the NUMALink4 data. Inset: Latencies can be deduced by extrapolating to zero message length
so that for N = N1/2 and β = 2 the gain is only 33 %. In case of a reduction of la-
tency by a factor of β, the result is the same. Hence it is desirable to improve on both
latency and bandwidth to make an interconnect more efficient for all applications.
Please note that the simple PingPong algorithm described above cannot pinpoint
saturation effects: If the network fabric is not completely non-blocking and all nodes
transmit or receive data (as is often the case with collective MPI operations), aggre-
gated bandwidth, i.e. the sum over all effective bandwidths for all point-to-point
connections, is lower than the theoretical limit. This can severely throttle the perfor-
mance of applications on large CPU numbers as well as overall throughput of the
machine.
In ccNUMA (cache-coherent nonuniform memory access) systems, the block diagram
is quite similar to the distributed-memory case (Fig. 26.17), but network
logic makes the aggregated memory of the whole system appear as one
single address space. Due to the distributed nature, memory access performance
varies depending on which CPU accesses which parts of memory (local vs. re-
mote access).
With multiple CPUs, copies of the same cache line may reside in different caches,
possibly in a modified state. So for both of the above varieties, cache coherence protocols
must guarantee consistency between cached data and data in memory at all times.
Details about UMA, ccNUMA and cache coherence mechanisms are provided in
the following sections.
26.2.4.1 UMA
Fig. 26.21. A UMA system with two single-core CPUs that share a common front-side bus
(FSB)
Fig. 26.22. A UMA system in which the FSBs of two dual-core chips are connected separately
to the chipset
In the design of Fig. 26.22 it is assumed that the bandwidth from chipset to memory matches the aggregated bandwidth of
the front-side buses. Each dual-core chip features a separate L1 on each CPU but
a shared L2 cache for both. The advantage of a shared cache is that, to an extent
limited by cache size, data exchange between cores can be done there and does not
have to resort to the slow front-side bus. Of course, a shared cache should also meet
the bandwidth requirements of all connected cores, which might not be the case.
Due to the shared caches and FSB connections this kind of node is, while still a
UMA system, quite sensitive to the exact placement of processes or threads on its
cores. For instance, with only two processes it may be desirable to keep (pin) them
on separate sockets if the memory bandwidth requirements are high. On the other
hand, processes communicating a lot via shared memory may perform better
when placed on the same socket because of the shared L2 cache. Operating
systems as well as some modern compilers usually have tools or library functions
for observing and implementing thread or process pinning.
The general problem of UMA systems is that bandwidth bottlenecks are bound
to occur when the number of sockets, or FSBs, is larger than a certain limit. In very
simple designs like the one in Fig. 26.21, a common memory bus is used that can
only transfer data to one CPU at a time (this is also the case for all multi-core chips
available today).
In order to maintain scalability of memory bandwidth with CPU number, non-
blocking crossbar switches can be built that establish point-to-point connections
between FSBs and memory modules (similar to the chip set in Fig. 26.22). Due
to the very large aggregated bandwidths those become very expensive for a larger
number of sockets. At the time of writing, the largest UMA systems with scalable
bandwidth (i.e. the memory bandwidth matches the aggregated FSB bandwidths of
all processors in the node) have eight CPUs. This problem can only be solved by
giving up on the UMA principle.
26.2.4.2 ccNUMA
Fig. 26.23. Hypertransport-based ccNUMA system with two locality domains (one per
socket) and four cores
Fig. 26.24. ccNUMA system with routed NUMALink network and four locality domains
Furthermore, as is the case with networks for distributed-memory com-
puters, providing wire-equivalent speed, non-blocking bandwidth in large systems
is extremely expensive.
In all ccNUMA designs network connections must have bandwidth and latency
characteristics comparable to those of local memory. Although this is the case for
all contemporary systems, even a penalty factor of two for non-local transfers can
badly hurt application performance if access cannot be restricted inside locality
domains. This locality problem is the first of two obstacles to take with high per-
formance software on ccNUMA. It occurs even if there is only one serial program
running on a ccNUMA machine. The second problem is potential congestion if two
processors from different locality domains access memory in the same locality do-
main, fighting for memory bandwidth. Even if the network is non-blocking and its
performance matches the bandwidth and latency of local access, congestion can oc-
cur. Both problems can be solved by carefully observing the data access patterns
of an application and restricting data access of each processor to its own locality
domain. Section 27.2.3 will elaborate on this topic.
In inexpensive ccNUMA systems I/O interfaces are often connected to a sin-
gle locality domain. Although I/O transfers are usually slow compared to memory
bandwidth, there are, e.g., high-speed network interconnects that feature multi-GB
bandwidths between compute nodes. If data arrives at the wrong locality domain,
written by an I/O driver that has positioned its buffer space disregarding any cc-
NUMA constraints, it should be copied to its optimal destination, reducing effective
bandwidth by a factor of four (three if RFOs can be avoided, see Sect. 26.1.5.2).
In this case even the most expensive interconnect hardware is wasted. In truly scal-
able ccNUMA designs this problem is circumvented by distributing I/O connections
across the whole machine and using ccNUMA-aware drivers.
1. C1 requests exclusive CL ownership
2. Set CL in C2 to state I
3. CL has state E in C1 → modify A1 in C1 and set to state M
4. C2 requests exclusive CL ownership
5. Evict CL from C1 and set to state I
6. Load CL to C2 and set to state E
7. Modify A2 in C2 and set to state M in C2

Fig. 26.25. Two processors P1, P2 modify the two parts A1, A2 of the same cache line in caches C1 and C2. The MESI coherence protocol ensures consistency between cache and memory
The four states of the MESI protocol are: M (modified: the line has been changed in this cache and resides in no other cache; main memory is not up to date), E (exclusive: the line has been read from memory, is unmodified, and resides in no other cache), S (shared: the line is unmodified, but copies may exist in other caches), and I (invalid: the line holds no valid data, e.g. because another cache has requested exclusive ownership). A cache miss occurs if and only if the chosen
line is invalid.
The order of events is depicted in Fig. 26.25. The question arises how a cache line in
state M is notified when it should be evicted because another cache needs to read the
most current data. Similarly, cache lines in state S or E must be invalidated if another
cache requests exclusive ownership. In small systems a bus snoop is used to achieve
this: Whenever notification of other caches seems in order, the originating cache
broadcasts the corresponding cache line address through the system, and all caches
“snoop” the bus and react accordingly. While simple to implement, this method has
the crucial drawback that address broadcasts pollute the system buses and reduce
available bandwidth for useful memory accesses. A separate network for coherence
traffic can alleviate this effect but is not always practicable.
A better alternative, usually applied in larger ccNUMA machines, is a directory-
based protocol where bus logic like chip sets or memory interfaces keep track of the
location and state of each cache line in the system. This uses up some small part of
main memory (usually far less than 10 %), but the advantage is that state changes
of cache lines are transmitted only to those caches that actually require them. This
greatly reduces coherence traffic through the system. Today even workstation chip
sets implement snoop filters that serve the same purpose.
Coherence traffic can severely hurt application performance if the same cache
line is written to frequently by different processors (false sharing). In Sect. 27.2.1.2
we will give hints for avoiding false sharing in user code.
Spawning a thread is much less costly than forking a new process, because threads share ev-
erything but instruction pointer (the address of the next instruction to be executed),
stack pointer and register state. Each thread can, by means of its local stack pointer,
also have private variables, but as all data is accessible via the common address
space, it is only a matter of taking the address of an item to make it accessible to all
other threads as well: Thread-private data is for convenience, not for protection.
It is indeed possible to use operating system threads (POSIX threads) directly,
but this option is seldom used with numerical software. OpenMP is a layer that
adapts the raw OS thread interface to make it more usable with the typical loop
structures that numerical software tends to show. As an example, consider a paral-
lel version of a simple integration program (Listing 26.4). This is valid serial code,
but equipping it with the comment lines starting with the sequence !$OMP (called
a sentinel) and using an OpenMP-capable compiler makes it shared-memory paral-
lel. The PARALLEL directive instructs the compiler to start a parallel region (see
Fig. 26.26). A team of threads is spawned that executes identical copies of every-
thing up to END PARALLEL (the actual number of threads is unknown at compile
time as it is set by an environment variable). By default, all variables which were
present in the program before the parallel region are shared among all threads. How-
ever, that would include x and sum of which we later need private versions for each
thread. OpenMP provides a way to make existing variables private by means of the
PRIVATE clause. If, in the above example, any thread in a parallel region writes
to sum (see line 4), it will update its own private copy, leaving the other threads’
untouched. Therefore, before the loop starts each thread’s copy of sum is set to zero.
In order to share some amount of work between threads and actually reduce
wallclock time, work sharing directives can be applied. This is done in line 5 using
the DO directive with the optional SCHEDULE clause. The DO directive is always re-
lated to the immediately following loop (line 6) and generates code that distributes
Listing 26.4. A simple program for numerical integration of a function f (x) in OpenMP
1  pi=0.d0
2  w=1.d0/n
3  !$OMP PARALLEL PRIVATE(x,sum)
4  sum=0.d0
5  !$OMP DO SCHEDULE(STATIC)
6  do i=1,n
7    x=w*(i-0.5d0)
8    sum=sum+f(x)
9  enddo
10 !$OMP END DO
11 !$OMP CRITICAL
12 pi=pi+w*sum
13 !$OMP END CRITICAL
14 !$OMP END PARALLEL
Fig. 26.26. Model for OpenMP thread operations: The master thread forks a thread team that
work on shared memory in a parallel region. After the parallel region, the threads are joined
or put to sleep until the next parallel region starts
the loop iterations among the team of threads (please note that the loop counter
variable is automatically made private). How this is done is controlled by the argu-
ment of SCHEDULE. The simplest possibility is STATIC which divides the loop
in chunks of (roughly) equal size and executes each thread on a chunk. If for some
reason the amount of work per loop iteration is not constant but, e.g., decreases
with loop count, this strategy is suboptimal because different threads will get vastly
different workloads, which leads to load imbalance. One solution would be to use
a chunk size like in “STATIC,1” that dictates that chunks of size one should be
distributed across threads in a round-robin manner. There are alternatives to static
schedule for other types of workload (DYNAMIC, GUIDED).
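As a sketch, consider a triangular loop nest in which the work per iteration shrinks with i; round-robin chunks of size one balance the load (the arrays M, B and C and the kernel are assumptions):

!$OMP PARALLEL DO PRIVATE(j,tmp) SCHEDULE(STATIC,1)
do i=1,N
  tmp = 0.d0
  do j=i,N              ! work per iteration decreases with i
    tmp = tmp + M(j,i)*B(j)
  enddo
  C(i) = tmp            ! each iteration writes only its own element
enddo
!$OMP END PARALLEL DO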
The parallelized loop computes a partial sum in each thread’s private sum vari-
able. To get the final result, all the partial sums must be accumulated in the global
pi variable (line 12), but pi is shared so that uncontrolled updates would lead to
a race condition, i.e. the exact order and timing of operations will influence the re-
sult. In OpenMP, critical sections solve this problem by making sure that at most
one thread at a time executes some piece of code. In the example, the CRITICAL
and END CRITICAL directives bracket the update to pi so that a correct result
emerges at all times.
Critical sections hold the danger of deadlocks when used inappropriately. A
deadlock arises when one or more threads wait for resources that will never be-
come available, a situation that is easily generated with badly arranged CRITICAL
directives, as in the sketch below.
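The following sketch illustrates the situation; the routines compute_item() and store_result() are hypothetical:

!$OMP PARALLEL PRIVATE(x)
!$OMP CRITICAL (prepare)
x = compute_item()     ! hypothetical function
call store_result(x)   ! contains the second critical section
!$OMP END CRITICAL (prepare)
!$OMP END PARALLEL
...
subroutine store_result(x)
  double precision :: x
!$OMP CRITICAL (update)
  nresults = nresults + 1
  results(nresults) = x  ! update a shared resource
!$OMP END CRITICAL (update)
end subroutine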
Without the names on the two different critical sections, this code would deadlock.
– There are OpenMP API functions (see below) that support the use of locks for
  protecting shared resources. The advantage of locks is that they are ordinary
  variables that can be arranged as arrays or in structures. That way it is possible
  to protect each single element of an array of resources individually, as in the
  sketch following this list.
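A sketch with an array of locks protecting individual histogram bins (the names locks, hist, NBINS and contrib are assumptions):

use omp_lib
integer(kind=omp_lock_kind) :: locks(NBINS)
do i=1,NBINS
  call omp_init_lock(locks(i))
enddo
!$OMP PARALLEL PRIVATE(j,contrib)
! ... each thread computes a bin index j and a contribution ...
call omp_set_lock(locks(j))    ! only bin j is locked, not the whole array
hist(j) = hist(j) + contrib
call omp_unset_lock(locks(j))
!$OMP END PARALLEL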
Whenever there are different shared resources in a program that must be protected
from concurrent access each for its own but are otherwise unconnected, named crit-
ical sections or OpenMP locks should be used both for correctness and performance
reasons.
In some cases it may be useful to write different code depending on OpenMP
being enabled or not. The directives themselves are no problem here because they
will be ignored gracefully. Conditional compilation, however, is supported by the
preprocessor symbol _OPENMP, which is defined only if OpenMP is available, and
(in Fortran) the special sentinel !$ that acts as a comment only if OpenMP is not
enabled (see Listing 26.5). Here we also see a part of OpenMP that is not concerned
with directives. The use omp_lib declaration loads the OpenMP API func-
tion prototypes (in C/C++, #include <omp.h> serves the same purpose). The
omp_get_thread_num() function determines the thread ID, a number between
zero and the number of threads minus one, while omp_get_num_threads()
returns the number of threads in the current team. So if the general disposition of
OpenMP towards loop-based code is not what the programmer wants, one can easily
switch to an MPI-like style where thread ID determines the tasks of each thread.
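A sketch with these ingredients, in the spirit of Listing 26.5 (the line numbering is only approximate):

1  use omp_lib
2  integer :: myid, numthreads
3  myid = 0
4  numthreads = 1
5  !$OMP PARALLEL PRIVATE(myid)
6  !$ myid = omp_get_thread_num()
7  !$OMP SINGLE
8  !$ numthreads = omp_get_num_threads()
9  !$OMP END SINGLE
10 !$OMP CRITICAL
11 write(*,*) 'Thread ',myid,' of ',numthreads
12 !$OMP END CRITICAL
13 !$OMP END PARALLEL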
In the above example the second API call (line 8) is located in a SINGLE region,
which means that it will be executed by exactly one thread, namely the one that
reaches the SINGLE directive first. This is done because numthreads is global
and should be written to only by one thread. In the critical region each thread just
prints a message, but a necessary requirement for the numthreads variable to
have the updated value is that no thread leaves the SINGLE region before the update
has been promoted to memory. The END SINGLE directive acts as an implicit bar-
rier, i.e. no thread can continue executing code before all threads have reached the
same point. The OpenMP memory model ensures that barriers enforce memory con-
sistency: Variables that have been held in registers are written out so that cache co-
herence can make sure that all caches get updated values. This can also be initiated
under program control via the FLUSH directive, but most OpenMP work-sharing
and synchronization constructs perform implicit barriers and hence flushes at the
end.
There is an important reason for serializing the write statements in line 10.
As a rule, I/O operations and general OS functionality, but also common library
functions should be serialized because they are usually not thread-safe, i.e. calling
them in parallel regions from different threads at the same time may lead to errors.
A prominent example is the rand() function from the C library as it uses a static
variable to store its hidden state (the seed). Although local variables in functions are
private to the calling thread, static data is shared by definition. This is also true for
Fortran variables with a SAVE attribute.
One should note that the OpenMP standard gives no hints as to how threads
are to be distributed among the processors, let alone observe locality constraints.
Usually the OS makes a good choice regarding placement of threads, but sometimes
(especially on multi-core architectures and ccNUMA systems) it makes sense to
Listing 26.6. C/C++ example with reduction clause for adding noise to the elements of an
array and calculating its vector norm. rand() is not thread-safe so it must be protected by a
critical region
1  double r,s;
2  #pragma omp parallel for private(r) reduction(+:s)
3  for(i=0; i<N; ++i) {
4  #pragma omp critical
5    {
6      r = rand(); // not thread-safe
7    }
8    a[i] += func(r/RAND_MAX); // func() is thread-safe
9    s = s + a[i] * a[i]; // calculate norm
10 }
use OS-level tools, compiler support or library functions to explicitly pin threads to
cores. See Sect. 27.2.3 for details.
So far, all OpenMP examples were concerned with the Fortran bindings. Of
course there is also a C/C++ interface that has the same functionality. The C/C++
sentinel is called #pragma omp, and the only way to do conditional compila-
tion is to use the _OPENMP symbol. Loop parallelization only works for canon-
ical for loops that have standard integer-type loop counters (i.e., no STL2-style
iterator loops) and is done via #pragma omp for. All directives that act on
code regions apply to compound statements and an explicit ending directive is not
required.
The example in Listing 26.6 shows a C code that adds some random noise to
the elements of an array a[] and calculates its vector norm. As mentioned be-
fore, rand() is not thread-safe and must be protected with a critical region. The
function func(), however, is assumed to be thread-safe as it only uses automatic
(stack) variables and can thus be called safely from a parallel region (line 8). An-
other peculiarity in this example is the fusion of the parallel and for directives
to parallel for, which allows for more compact code. Finally, the reduction
operation is not performed using critical updates as in the integration example. In-
stead, an OpenMP reduction clause is used (end of line 2) that automatically
initializes the summation variable s with a sensible starting value, makes it private
and accumulates the partial results to it.
A word of caution is in order concerning thread-local variables. Usually the OS
shell restricts the maximum size of all stack variables of its processes. This limit
can often be adjusted by the user or the administrators. However, in a threaded pro-
gram there are as many stacks as there are threads, and the way the thread-local
stacks get their limit set is not standardized at all. Please consult OS and compiler
2 Standard template library
documentation as to how thread-local stacks are limited. Stack overflows are a fre-
quent source of problems with OpenMP programs.
Running an OpenMP program is as simple as starting the executable binary just
like in the serial case. The number of threads to be used is determined by an environ-
ment variable called OMP_NUM_THREADS. There may be other means to influence
the way the program is running, e.g. OS scheduling of threads, pinning, getting de-
bug output etc., but those are not standardized.
The rapid development of faster and more capable processors and architectures has
often led to the false conclusion that the next generation of hardware will easily meet
the scientist’s requirements. This view is at fault for two reasons: First, utilizing the
full power of existing systems by proper parallelization and optimization strategies,
one can gain a competitive advantage without waiting for new hardware. Second,
computer industry has now reached a turning point where exponential growth of
compute power has ended and single-processor performance will stagnate at least
for the next couple of years. The advent of multi-core CPUs was triggered by this
development, making the need for more advanced, parallel, and well-optimized al-
gorithms imminent.
This chapter describes different ways to write efficient code on current super-
computer systems. In Sect. 27.1, simple common sense optimizations for scalar
code like strength reduction, correct layout of data structures and tabulation are
covered first. Many scientific programs are limited by the speed of the computer
system’s memory interface, so it is vital to avoid slow data paths or, if this is not
possible, at least use them efficiently. After some theoretical considerations on data
access and performance estimates based on code analysis and hardware character-
istics, techniques like loop transformations and cache blocking are explained using
examples from linear algebra (matrix-vector multiplication, matrix transpose). The
importance of interpreting compiler logs is emphasized. Along the discussion of per-
formance measurements for vanilla and optimized codes we introduce peculiarities
like cache thrashing and translation look-aside buffer misses, both potential show-
stoppers for compute performance. In a case study we apply the acquired knowl-
edge on sparse matrix-vector multiplication, a performance-determining operation
required for practically all sparse diagonalization algorithms.
Turning to shared-memory parallel programming in Sect. 27.2, we identify typ-
ical pitfalls (OpenMP loop overhead and false sharing) that can severely limit par-
allel scalability, and show some ways to circumvent them. The abundance of AMD
Opteron nodes in clusters has made optimizing for memory locality a necessity:
ccNUMA can lead to diverse bandwidth bottlenecks, and few compilers support
special features for ensuring memory locality. Programming techniques which can
alleviate ccNUMA effects are therefore described in detail using a parallelized
sparse matrix-vector multiplication as a nontrivial but instructive example.
In the age of multi-1000-processor parallel computers, writing code that runs ef-
ficiently on a single CPU has grown slightly old-fashioned in some circles. The
argument for this point of view is derived from the notion that it is easier to add
more CPUs and boast massive parallelism than to invest effort into serial
optimization.
Nevertheless there can be no doubt that single-processor optimizations are of
premier importance. If a speedup of two can be gained by some straightforward
common sense optimization as described in the following section, the user will be
satisfied with half the number of CPUs in the parallel case. In the face of Amdahl’s
law the benefit will usually be even larger. This frees resources for other users and
projects and puts the hardware that was often acquired for considerable amounts of
money to better use. If an existing parallel code is to be optimized for speed, it must
be the first goal to make the single-processor run as fast as possible.
Often very simple changes to code can lead to a significant performance boost.
The most important common sense guidelines regarding the avoidance of perfor-
mance pitfalls are summarized in the following. Those may seem trivial, but ex-
perience shows that many scientific codes can be improved by the simplest of
measures.
In all but the rarest of cases, rearranging the code such that less work than before
is being done will improve performance. A very common example is a loop that
checks a number of objects to have a certain property, but all that matters in the end
is that any object has the property at all:
logical FLAG
FLAG = .false.
do i=1,N
  if(complex_func(A(i)) < THRESHOLD) then
    FLAG = .true.
  endif
enddo
If complex_func() has no side effects, the only information that gets com-
municated to the outside of the loop is the value of FLAG. In this case, depending
on the probability for the conditional to be true, much computational effort can be
saved by leaving the loop as soon as FLAG changes state:
logical FLAG
FLAG = .false.
do i=1,N
  if(complex_func(A(i)) < THRESHOLD) then
    FLAG = .true.
    exit
  endif
enddo
As an example for an expensive operation that can be eliminated, consider the following kernel from a spin simulation:

integer iL,iR,iU,iO,iS,iN,edelz
double precision tt
... ! load spin orientations
edelz=iL+iR+iU+iO+iS+iN ! loop kernel
BF= 0.5d0*(1.d0+tanh(edelz/tt))
The last two lines are executed in a loop that accounts for nearly the whole runtime
of the application. The integer variables store spin orientations (up or down, i.e.
−1 or +1, respectively), so the edelz variable only takes integer values in the
range {−6, . . . , +6}. The tanh() function is one of those operations that take
vast amounts of time (at least tens of cycles), even if implemented in hardware. In
the case described, however, it is easy to eliminate the tanh() call completely by
tabulating the function over the range of arguments required, assuming that tt does
not change its value so that the table only has to be set up once:
double precision tanh_table(-6:6)
integer iL,iR,iU,iO,iS,iN, edelz
double precision tt
...
do i=-6,6 ! do this once
  tanh_table(i) = tanh(dble(i)/tt)
enddo
...
edelz=iL+iR+iU+iO+iS+iN ! loop kernel
BF= 0.5d0*(1.d0+tanh_table(edelz))
The table lookup is performed at virtually no cost compared to the tanh() evalu-
ation since the table will, due to its small size and frequent use, be available in L1
cache at access latencies of a few CPU cycles.
The working set of a code is the amount of memory it uses (i.e. actually touches) in
the course of a calculation. In general, shrinking the working set by whatever means
is a good thing because it raises the probability for cache hits. If and how this can
be achieved and whether it pays off performance-wise depends heavily on the al-
gorithm and its implementation, of course. In the above example, the original code
used standard four-byte integers to store the spin orientations. The working set was
thus much larger than the L2 cache of any processor. By changing the array defini-
tions to use integer*1 for the spin variables, the working set could be reduced
by nearly a factor of four, and became comparable to cache size.
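As a sketch, using the scalar names from the example above (in the real code the spins are array elements), the change amounts to:

integer*1 iL,iR,iU,iO,iS,iN  ! one byte per spin instead of four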
Many recent microprocessor designs have instruction set extensions for integer
and floating-point SIMD operations (see also Sect. 26.1.4) that allow the concur-
rent execution of arithmetic operations on a wide register that can hold, e.g., two
DP or four SP floating-point words. Although vector processors also use SIMD in-
structions and the use of SIMD in microprocessors is often coined vectorization,
it is more similar to the multi-track property of modern vector systems. Generally
speaking, a vectorizable loop in this context will run faster if more operations can be
performed with a single instruction, i.e. the size of the data type should be as small
as possible. Switching from DP to SP data could result in up to a twofold speedup,
with the additional benefit that more items fit into the cache.
Consider, however, that not all microprocessors can handle small types effi-
ciently. Using byte-size integers for instance could result in very ineffective code
that actually works on larger word sizes but extracts the byte-sized data by mask
and shift operations.
A lot of compute time can be saved by eliminating common subexpressions, especially where strong
operations (like sin()) are involved. Although it may happen that subexpressions
are obstructed by other code and not easily recognizable, compilers are in princi-
ple able to detect this situation. They will however often refrain from pulling the
subexpression out of the loop except with very aggressive optimizations turned on.
The reason for this is the well-known non-associativity of FP operations: If floating-
point accuracy is to be maintained compared to non-optimized code, associativity
rules must not be used and it is left to the programmer to decide whether it is safe
to regroup expressions by hand.
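A sketch of such a transformation (the loop body is an assumption):

! original: the loop-invariant subexpression sin(t) is evaluated N times
do i=1,N
  A(i) = A(i) + sin(t) * B(i)
enddo
! transformed: pull the invariant out by hand
tmp = sin(t)
do i=1,N
  A(i) = A(i) + tmp * B(i)
enddo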
Tight loops, i.e. loops that have few operations in them, are typical candidates for
software pipelining (see Sect. 26.1.3.1), loop unrolling and other optimization tech-
niques (see below). If for some reason compiler optimization fails or is inefficient,
performance will suffer. This can easily happen if the loop body contains conditional
branches:
do j=1,N
  do i=1,N
    if(i.gt.j) then
      sign=1.d0
    else if(i.lt.j) then
      sign=-1.d0
    else
      sign=0.d0
    endif
    C(j) = C(j) + sign * A(i,j) * B(i)
  enddo
enddo
In this multiplication of a matrix with a vector, the upper and lower triangular parts
get different signs and the diagonal is ignored. The if statement serves to decide
about which factor to use. Each time a corresponding conditional branch is encoun-
tered by the processor, some branch prediction logic tries to guess the most probable
outcome of the test before the result is actually available, based on statistical meth-
ods. The instructions along the chosen path are then fetched, decoded, and generally
fed into the pipeline. If the anticipation turns out to be false (this is called a mispre-
dicted branch or branch miss), the pipeline has to be flushed back to the position
of the branch, implying many lost cycles. Furthermore, the compiler refrains from
doing advanced optimizations like loop unrolling (see Sect. 27.1.3.2).
Fortunately the loop nest can be transformed so that all if statements vanish:
do j=1,N
  do i=j+1,N
    C(j) = C(j) + A(i,j) * B(i)
  enddo
enddo
do j=1,N
  do i=1,j-1
    C(j) = C(j) - A(i,j) * B(i)
  enddo
enddo
By using two different variants of the inner loop, the conditional has virtually been
moved outside. One should add that there is more optimization potential in this loop
nest. Please consider the section on data access below for more information.
The previous sections have pointed out that the compiler is a crucial component
in writing efficient code. It is very easy to hide important information from the
compiler, forcing it to give up optimization at an early stage. To make the compiler's decisions transparent to the user, many compilers offer options to generate annotated source code listings, or at least logs that describe in some detail which optimizations were performed. Listing 27.1 shows an example of such a compiler annotation for a standard vector triad loop as in Listing 26.1.
Unfortunately, not all compilers have the ability to write such comprehensive code
annotations and users are often left with guesswork.
Of all possible performance-limiting factors in HPC, the most important one is data
access. As explained earlier, microprocessors tend to be inherently unbalanced with
respect to the relation of theoretical peak performance versus memory bandwidth.
As many applications in science and engineering consist of loop-based code that
moves large amounts of data in and out of the CPU, on-chip resources tend to be
underutilized and performance is limited only by the relatively slow data paths to
memory or even disks. Any optimization attempt should therefore aim at reducing
traffic over slow data paths, or, should this turn out to be infeasible, at least make
data transfer as efficient as possible.
Table 27.1. Typical balance values for operations limited by different transfer paths

data path                      balance
cache                          0.5–1.0
machine (memory)               0.05–0.5
interconnect (high speed)      0.01–0.04
interconnect (GBit ethernet)   0.001–0.003
disk                           0.001–0.02
Now it is obvious that the maximum fraction of peak performance one can expect from a code with balance B_c on a machine with balance B_m is

\[ l = \min\left(1,\,\frac{B_m}{B_c}\right). \qquad (27.3) \]

As an example, the standard vector triad loop A(i) = B(i) + C(i)*D(i) features two flops per iteration, for which three loads (to elements B(i), C(i), and D(i)) and one store operation (to A(i)) provide the required input data. The code balance is thus B_c = (3 + 1)/2 = 2. On a CPU with machine balance B_m = 0.1, we can then expect a lightspeed ratio of 0.05, i.e. 5 % of peak.
Standard cache-based microprocessors usually feature an outermost cache level
with write-back strategy. As explained in Sect. 26.1.5, cache line read for ownership
(RFO) is then required to ensure cache-memory coherence if nontemporal stores
or cache line zero is not used. Under such conditions, the store stream to array A
must be counted twice in calculating the code balance, and we would end up with a
lightspeed estimate of l_RFO = 0.04.
A Fortran loop nest and its literal C translation may perform exactly the same task, with the second array index being the fast (inner loop) index both times, and yet show quite distinct memory access patterns: In the Fortran case, the memory address is incremented in steps of N*sizeof(double), whereas in the C case the stride is optimal. This is
because Fortran follows the so-called column major order whereas C follows row
major order for multi-dimensional arrays (see Fig. 27.1). Although mathematically
insignificant, the distinction must be kept in mind when optimizing for data access.
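A minimal sketch of the two access patterns in Fortran (array name assumed):

do i=1,N
  do j=1,N        ! second index innermost: stride N in memory
    A(i,j) = 0.d0
  enddo
enddo

do j=1,N
  do i=1,N        ! first index innermost: stride 1 (optimal in Fortran)
    A(i,j) = 0.d0
  enddo
enddo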
For the following example we assume column major order as implemented in For-
tran. Calculating the transpose of a dense matrix, A = B^T, involves strided memory
access to A or B, depending on how the loops are ordered. The most unfavorable
way of doing the transpose is shown here:
Fig. 27.1. Row major order (left) and column major order (right) storage schemes for matri-
ces. The small numbers indicate the offset of the element with respect to the starting address
of the array. Solid frames symbolize cache lines
do i=1,N
  do j=1,N
    A(i,j) = B(j,i)
  enddo
enddo
Write access to matrix A is strided (see Fig. 27.2). Due to RFO transactions, strided
writes are more expensive than strided reads. Starting from this worst possible code
we can now try to derive expected performance features. As matrix transpose does
not perform any arithmetic, we will use effective bandwidth (i.e., GBytes/sec avail-
able to the application) to denote performance.
Let C be the cache size and L_c the cache line size, both in DP words. Depending
on the size of the matrices we can expect three primary performance regimes:
– In case the two matrices fit into a CPU cache (2N^2 ≲ C), we expect effective
bandwidths of the order of cache speeds. Spatial locality is of importance only
between different cache levels; optimization potential is limited.
– If the matrices are too large to fit into cache but still

\[ N L_c \lesssim C \,, \qquad (27.4) \]
Fig. 27.2. Cache line traversal for vanilla matrix transpose (strided store stream, column
major order). If the leading matrix dimension is a multiple of the cache line size, each column
starts on a line boundary
Fig. 27.3. Cache line traversal for padded matrix transpose. Successive iterations hit different
cache lines
The vanilla graph in Fig. 27.4 shows that the assumptions described above are essentially correct, although the strided write seems to be very unfavorable even when the whole working set fits into cache. This is because the L1 cache on the considered architecture is of write-through type, i.e. the L2 cache is always updated on a write, regardless of whether there was an L1 hit or miss. The RFO transactions between the two caches hence waste the major part of the available internal bandwidth.
In the second regime described above, performance stays roughly constant up to a point where the fraction of cache used by the store stream for N cache lines becomes comparable to the L2 size. Effective bandwidth is then around 1.8 GBytes/sec.
[Plot: effective bandwidth (MBytes/sec, 0–15000) versus N for the vanilla, flipped, flipped/unroll=4, and flipped/block=50/unroll=4 variants; vertical marks at N = 256 and N = 8192]
Fig. 27.4. Performance (effective bandwidth) for different implementations of the dense ma-
trix transpose on a modern microprocessor with 1 MByte of L2 cache. The N = 256 and
N = 8192 lines indicate the positions where the matrices fit into cache and where N cache
lines fit into cache, respectively. (Intel Xeon/Nocona 3.2 Ghz)
[Plot: bandwidth (MBytes/sec, 0–2000) versus N in the range 1020–1032 for the vanilla and padded versions]
Fig. 27.5. Cache thrashing for unfavorable choice of array dimensions (dashed). Padding
removes thrashing completely (solid)
Cache thrashing occurs when successive iterations hit the same (set of) cache line(s) because of insufficient associativity. Fig. 27.2 shows clearly that this can easily happen with matrix trans-
pose if the leading dimension is a power of two. On a direct-mapped cache of size
C, every C/N-th iteration hits the same cache line. At a line length of L_c words,
the effective cache size is
\[ C_{\mathrm{eff}} = L_c \max\left(1,\,\frac{C}{N}\right). \qquad (27.5) \]
It is the number of cache words that are actually usable due to associativity con-
straints. On an m-way set-associative cache this number is merely multiplied by m.
Considering a real-world example with C = 2^17 (1 MByte), L_c = 16, m = 8 and N = 1024 one arrives at C_eff = 2^11 DP words, i.e. 16 kBytes. So N·L_c ≫ C_eff and
performance should be similar to the very large N limit described above, which is
roughly true.
A simple code modification, however, eliminates the thrashing effect: Assuming
that matrix A has dimensions 1024×1024, enlarging the leading dimension by p
(called padding) to get A(1024+p,1024) leads to a fundamentally different cache
use pattern. After L_c/p iterations, the address belongs to another set of m cache lines and there is no associativity conflict if Cm/N > L_c/p (see Fig. 27.3). In Fig. 27.5
the striking effect of padding the leading dimension by p = 1 is shown with the
padded graph.
Generally speaking, one should by all means stay away from powers of two in
array dimensions. It is clear that different dimensions may require different paddings
to get optimal results, so sometimes a rule of thumb is applied: Try to make leading
array dimensions odd multiples of 16.
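For the example above, the change amounts to nothing more than the array declaration (a sketch, with p = 1):

! before: double precision :: A(1024,1024)   -> leading dimension 2**10
double precision :: A(1025,1024)             ! padded: thrashing disappears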
Further optimization approaches will be considered in the following sections.
If both the number of arithmetic operations and the number of data transfers (load-
s/stores) are proportional to the problem size (or loop length) N , optimization po-
tential is usually very limited. Scalar products, vector additions and sparse matrix-vector multiplication are examples of this kind of problem. They are inevitably
memory-bound for large N , and compiler-generated code achieves good perfor-
mance because O(N)/O(N) loops tend to be quite simple and the correct soft-
ware pipelining strategy is obvious. Loop nests, however, are a different matter (see
below).
But even if loops are not nested there is sometimes room for improvement. As
an example, consider the following vector additions:
! original version
do i=1,N
  A(i) = B(i) + C(i)
enddo
do i=1,N
  Z(i) = B(i) + E(i)
enddo

! optimized version after loop fusion
do i=1,N
  A(i) = B(i) + C(i)
  Z(i) = B(i) + E(i)  ! saves a load for B(i)
enddo
Each of the loops on the left has no options left for optimization. The code bal-
ance is 3/1 as there are two loads, one store and one addition per loop (not counting
RFOs). Array B, however, is loaded again in the second loop, which is unneces-
sary: Fusing the loops into one has the effect that each element of B only has to be
loaded once, reducing code balance to 5/2. All else being equal, performance in the
memory-bound case will improve by a factor of 6/5 (if RFO cannot be avoided, this
will be 8/7).
Loop fusion has achieved an O(N) data reuse for the two-loop constellation so
that a complete load stream could be eliminated. In simple cases like the one above,
compilers can often apply this optimization by themselves.
In typical two-level loop nests where each loop has a trip count of N, there are O(N^2) operations for O(N^2) loads and stores. Examples are dense matrix-vector multiplication, matrix transpose, matrix addition etc. Although the situation on the inner level is similar to the O(N)/O(N) case and the problems are generally memory-bound, the nesting opens new opportunities for optimization.
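As an example, consider dense matrix-vector multiplication (MVM). A minimal sketch of the kernel analyzed below, reconstructed here for reference (the scalar tmp makes the register optimization explicit):

do i=1,N
  tmp = C(i)
  do j=1,N
    tmp = tmp + A(j,i) * B(j)  ! A is streamed once, B reloaded for every i
  enddo
  C(i) = tmp                   ! the update stays in a register
enddo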
This code has a balance of 1 (two loads for A and B and two flops). Array C is
indexed by the outer loop variable, so updates can go to a register (here clarified
through the use of the scalar tmp although compilers can do this transformation
automatically) and do not count as load or store streams. Matrix A is only loaded
once, but B is loaded N times, once for each outer loop iteration. One would like to
apply the same fusion trick as above, but there are not just two but N inner loops to
fuse. The solution is loop unrolling: The outer loop is traversed with a stride m and
the inner loop is replicated m times. Obviously, one has to deal with the situation
that the outer loop count might not be a multiple of m. This case has to be handled
by a remainder loop:
! remainder loop
do r=1,mod(N,m)
  do j=1,N
    C(r) = C(r) + A(j,r) * B(j)
  enddo
enddo
! main loop
do i=r,N,m
  do j=1,N
    C(i) = C(i) + A(j,i) * B(j)
  enddo
  do j=1,N
    C(i+1) = C(i+1) + A(j,i+1) * B(j)
  enddo
  ! m times
  ...
  do j=1,N
    C(i+m-1) = C(i+m-1) + A(j,i+m-1) * B(j)
  enddo
enddo
The remainder loop is obviously subject to the same optimization techniques as the
original loop, but otherwise unimportant. For this reason we will ignore remainder
loops in the following.
By just unrolling the outer loop we have not gained anything but a considerable
code bloat. However, loop fusion can now be applied easily:
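The fused loop nest, sketched here (the ellipsis again stands for the replicated lines up to index i+m-1):

do i=r,N,m
  do j=1,N
    C(i)     = C(i)     + A(j,i)     * B(j)
    C(i+1)   = C(i+1)   + A(j,i+1)   * B(j)
    ...
    C(i+m-1) = C(i+m-1) + A(j,i+m-1) * B(j)
  enddo
enddo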
The combination of outer loop unrolling and fusion is often called unroll and jam.
By m-way unroll and jam we have achieved an m-fold reuse of each element of
B from register so that code balance reduces to (m + 1)/(2m) which is clearly
smaller than one for m > 1. If m is very large, the performance gain can get close
to a factor of two. In this case array B is only loaded a few times or, ideally, just
once from memory. As A is always loaded exactly once and has size N^2, the total memory traffic with m-way unroll and jam amounts to N^2(1 + 1/m) + N. Fig. 27.6
shows the memory access pattern for vanilla and 2-way unrolled dense MVM.
All this assumes, however, that register pressure is not too large, i.e. the CPU
has enough registers to hold all the required operands used inside the now quite
sizeable loop body. If this is not the case, the compiler must spill register data to
cache, slowing down the computation. Again, compiler logs can help identify such
a situation.
Unroll and jam can be carried out automatically by some compilers at high opti-
mization levels. Be aware though that a complex loop body may obscure important
information and manual optimization could be necessary, either – as shown above
– by hand-coding or compiler directives that specify high-level transformations like
unrolling. Directives, if available, are the preferred alternative as they are much eas-
ier to maintain and do not lead to visible code bloat. Regrettably, compiler directives
are inherently non-portable.
The matrix transpose code from the previous section is another example of a problem of O(N^2)/O(N^2) type, although in contrast to dense MVM there is
no direct opportunity for saving on memory traffic; both matrices have to be read
Fig. 27.6. Vanilla (left) and 2-way unrolled (right) dense matrix vector multiplication. The
remainder loop is only a single (outer) iteration in this example
or written exactly once. Nevertheless, by using unroll and jam on the flipped
version a significant performance boost of nearly 50 % is observed (see dotted line
in Fig. 27.4):
do j=1,N,m
  do i=1,N
    A(i,j)     = B(j,i)
    A(i,j+1)   = B(j+1,i)
    ...
    A(i,j+m-1) = B(j+m-1,i)
  enddo
enddo
Naively one would not expect any effect at m = 4 because the basic analysis stays
the same: In the mid-N region the number of available cache lines is large enough
to hold up to L_c columns of the store stream. The left picture in Fig. 27.7 shows the
situation for m = 2. However, the fact that m words in each of the load stream’s
cache lines are now accessed in direct succession reduces the TLB misses by a factor
of m, although the TLB is still way too small to map the whole working set.
Even so, cutting down on TLB misses does not remedy the performance break-
down for large N when the cache gets too small to hold N cache lines. It would
be nice to have a strategy which reuses the remaining L_c − m words of the strided stream's cache lines right away so that each line may be evicted soon and would not have to be reclaimed later. A brute force method is L_c-way unrolling, but this ap-
proach leads to large-stride accesses in the store stream and is not a general solution
as large unrolling factors raise register pressure in loops with arithmetic operations.
Loop blocking can achieve optimal cache line use without additional register pres-
sure. It does not save load or store operations but increases the cache hit ratio. For
a loop nest of depth d, blocking introduces up to d additional outer loop levels that
cut the original inner loops into chunks:
do jj=1,N,bs                  ! bs: blocking factor (b in the text)
  jstart=jj; jend=jj+bs-1
  do ii=1,N,bs
    istart=ii; iend=ii+bs-1
    do j=jstart,jend,m
      do i=istart,iend
        a(i,j)     = b(j,i)
        a(i,j+1)   = b(j+1,i)
        ...
        a(i,j+m-1) = b(j+m-1,i)
      enddo
    enddo
  enddo
enddo
In this example we have used 2D blocking with identical blocking factors b for
both loops in addition to m-way unroll and jam. Obviously, this change does not
Fig. 27.7. Two-way unrolled (left) and blocked/unrolled (right) flipped matrix transpose, i.e.
with strided load
alter the loop body so the number of registers needed to hold operands stays the
same. However, the cache line access characteristics are much improved (see the
right picture in Fig. 27.7 which shows a combination of two-way unrolling and 4×4
blocking). If the blocking factors are chosen appropriately, the cache lines of the
strided stream will have been used completely at the end of a block and can be
evicted soon. Hence we expect the large-N performance breakdown to disappear.
The dotted-dashed graph in Fig. 27.4 demonstrates that 50 × 50 blocking combined
with 4-way unrolling alleviates all memory access problems induced by the strided
stream.
Loop blocking is a very general and powerful optimization that can often not be
performed by compilers. The correct blocking factor to use should be determined
experimentally through careful benchmarking, but one may be guided by typical
cache sizes, i.e. when blocking for L1 cache the aggregated working set size of
all blocked inner loop nests should not be much larger than half the cache. Which
cache level to block for depends on the operations performed and there is no general
recommendation.
If the number of operations is larger than the number of data items by a factor that
grows with problem size, we are in the very fortunate situation to have tremendous
optimization potential. By the techniques described above (unroll and jam, loop
blocking) it is usually possible for these kinds of problems to render the imple-
mentation cache-bound. Examples for algorithms that show O(N 3 )/O(N 2 ) char-
acteristics are dense matrix-matrix multiplication (MMM) and dense matrix diago-
nalization. It is beyond the scope of this contribution to develop a well-optimized
MMM, let alone eigenvalue calculation, but we can demonstrate the basic principle
by means of a simpler example which is actually of the O(N^2)/O(N) type:
do i=1,N
  do j=1,N
    sum = sum + foo(A(i),B(j))
  enddo
enddo
The complete data set is O(N) here but O(N^2) operations (calls to foo(), additions) are performed on it. In the form shown above, array B is loaded from memory N times, so the total memory traffic amounts to N(N + 1) words. m-way unroll and jam is possible and will immediately reduce this to N(N/m + 1) words, but the dis-
advantages of large unroll factors have been pointed out already. Blocking the inner
loop with a block size of b, however, achieves a similar effect without the register pressure:

do jj=1,N,bs                ! bs: block size (b in the text)
  jstart=jj; jend=jj+bs-1
  do i=1,N
    do j=jstart,jend
      sum = sum + foo(A(i),B(j))
    enddo
  enddo
enddo

Each block of b elements of B can now stay in cache for a complete sweep over A, so B is effectively loaded only once from memory while A is streamed N/b times; the total traffic is reduced to roughly N(N/b + 1) words.
Several different storage schemes for sparse matrices have been developed, some of
which are suitable only for special kinds of matrices [3]. Of course, memory access
patterns and thus performance characteristics of sMVM depend heavily on the stor-
age scheme used. The two most important and also general formats are CRS (Com-
pressed Row Storage) and JDS (Jagged Diagonals Storage). We will see that CRS
is well-suited for cache-based microprocessors while JDS supports dependency and
loop structures that are favorable on vector systems.
Fig. 27.8. Sparse matrix-vector multiplication. Dark elements visualize entries involved in
updating a single l.h.s. element. Unless the sparse matrix rows have no gaps between the first
and last non-zero elements, some indirect addressing of the r.h.s. vector is inevitable
In CRS, an array val of length N_nz is used to store all non-zeroes of the matrix, row by row, without any gaps, so some information about which element of val originally belonged to which row and column must be supplied. This is done by two additional integer arrays: col_idx of length N_nz and row_ptr of length N_r + 1. col_idx stores the column index of each non-zero element in val, and row_ptr contains the indices at which new rows start in val, with a final entry pointing just beyond the last non-zero (see Fig. 27.9). The basic code to perform a MVM using this format is quite simple:
do i = 1,Nr
  do j = row_ptr(i), row_ptr(i+1) - 1
    c(i) = c(i) + val(j) * b(col_idx(j))
  enddo
enddo
[Fig. 27.9: CRS storage of the 5×5 example matrix; val = −4 2 2 8 8 −5 10 −5 10 −6, col_idx = 1 2 1 3 2 4 5 3 3 5, row_ptr = 1 3 5 8 9 11]
In JDS, all non-zeroes are first shifted to the left and the rows are then sorted by descending number of non-zeroes, which defines a permutation of the rows; the val array stores the resulting jagged diagonals one after another, traversing the sparse matrix from left top to right bottom (see Fig. 27.10). For each non-zero the original column index is stored in col_idx just like in CRS. In order to have the same element order on the r.h.s. and l.h.s. vectors, the col_idx array is subject to the row permutation as well. Array jd_ptr holds the start indices of the N_j jagged diagonals. A standard code for sMVM in JDS format is
only slightly more complex than with CRS:
do diag=1, Nj
  diagLen = jd_ptr(diag+1) - jd_ptr(diag)
  offset = jd_ptr(diag)
  do i=1, diagLen
    c(i) = c(i) + val(offset+i) * b(col_idx(offset+i))
  enddo
enddo
The perm array storing the permutation map is not required here; usually, all sMVM
operations are done in permuted space. These are the notable properties of this loop:
– There is a long inner loop without dependencies, which makes JDS a much
better storage format for vector processors than CRS.
– The outer loop is short (number of jagged diagonals).
[Fig. 27.10 data: the original 5×5 matrix, its left-shifted form, and the row-sorted form; resulting arrays: val = 8 −4 2 10 −5 −5 2 8 −6 10, original col index = 2 1 1 3 3 4 2 3 5 5, perm = 2 3 1 5 4, col_idx = 3 2 2 1 1 5 3 1 4 4, jd_ptr = 1 6 10]
Fig. 27.10. JDS sparse matrix storage format. The permutation map is also applied to the
column index array. One of the jagged diagonals is marked
– The result vector is loaded multiple times (at least partially) from memory, so
there might be some optimization potential.
– The non-zeroes in val are accessed with stride one.
– The r.h.s. vector is accessed indirectly, just as with CRS. The same comments
as above do apply, although a favorable matrix layout would feature straight
diagonals, not compact rows. As an additional complication the matrix rows as
well as the r.h.s. vector are permuted.
– B_c = 9/4 if the integer load to col_idx is counted with four bytes.
The code balance numbers of CRS and JDS sMVM seem to be quite in favor of
CRS.
Assuming that the peeled-off iterations account for a negligible contribution to CPU
time, m-way unroll and jam reduces code balance to
\[ B_c = \frac{1}{m} + \frac{5}{4} \,. \]
If m is large enough, this can get close to the CRS balance. However, as explained before, large m leads to strong register pressure and is not always desirable. Gener-
ally, a sensible combination of unrolling and blocking is employed to reduce mem-
ory traffic and enhance in-cache performance at the same time. Blocking is indeed
possible for JDS sMVM as well (see Fig. 27.12):
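A sketch of the blocked JDS loop nest; its structure can be read off the OpenMP version shown in the parallelization section below (bs denotes the block size):

do ib=1,Nr,bs
  block_start = ib
  block_end = min(ib+bs-1,Nr)
  do diag=1,Nj
    diagLen = jd_ptr(diag+1)-jd_ptr(diag)
    offset = jd_ptr(diag)
    if(diagLen .ge. block_start) then
      do i=block_start, min(block_end,diagLen)
        c(i) = c(i)+val(offset+i)*b(col_idx(offset+i))
      enddo
    endif
  enddo
enddo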
Fig. 27.11. JDS matrix traversal with 2-way unroll and jam and loop peeling. The peeled
iterations are marked
[Bar chart: MFlops/sec (0–450) for CRS, vanilla JDS, 2-way unrolled JDS and blocked (400) JDS on AMD Opteron, Intel Itanium2 (SGI Altix) and Intel Xeon/Core]
Fig. 27.13. Performance comparison of sparse MVM codes with different optimizations. A
matrix with 1.7 × 10^7 unknowns and 20 jagged diagonals was chosen. The blocking size of
400 has proven to be optimal for a wide range of architectures
With this optimization the result vector is effectively loaded only once from memory
if the block size b is not too large. The code should thus achieve performance similar to the CRS version, although the code balance has not changed. As anticipated above
with dense matrix transpose, blocking does not optimize for register reuse but for
cache utilization.
Fig. 27.13 shows a performance comparison of CRS and plain, 2-way unrolled
and blocked (b = 400) JDS sMVM on three different architectures. The CRS vari-
ant seems to be preferable for standard AMD and Intel microprocessors, which is
not surprising because it features the lowest code balance right away, without any subsequent manual optimizations, and because the short inner loop length is less unfavorable on CPUs with out-of-order capabilities. The Intel Itanium2 processor with its EPIC architecture, however, shows mediocre performance for CRS and performs best with the blocked JDS version. This architecture cannot cope well with the short loops of CRS: out-of-order processing is absent, and the compiler, despite detecting all instruction-level parallelism on the inner loop level, is not able to overlap the wind-down of one row with the wind-up phase of the next.
Like any other parallelization method, OpenMP is prone to the standard problems
of parallel programming: Serial fraction (Amdahl’s law) and load imbalance, both
introduced in Sect. 26.2.
An overabundance of serial code can easily arise when critical sections get out of hand. If all threads but one continuously wait for a critical section to become available, the program is effectively serialized. This can be circumvented by em-
ploying finer control on shared resources using named critical sections or OpenMP
locks. Sometimes it may even be useful to supply thread-local copies of otherwise
shared data that may be pulled together by a reduction operation at the end of a par-
allel region. The load imbalance problem can often be solved by choosing a different
OpenMP scheduling strategy (see Sect. 26.2.4.4).
There are, however, very specific performance problems that are inherently con-
nected to shared-memory programming in general and OpenMP in particular.
Fig. 27.14 shows a comparison of vector triad data in the purely serial case
and with one and four OpenMP threads, respectively. The presence of OpenMP
causes overhead at small N even if only a single thread is used. Using the IF
clause leads to an optimal combination of threaded and serial loop versions if
[Plot: MFlops/sec versus N (10^2–10^6) for the serial version, OMP_NUM_THREADS=1, OMP_NUM_THREADS=4, and OMP_NUM_THREADS=4 with IF clause]
Fig. 27.14. OpenMP overhead and the benefits of the IF(N>10000) clause for the vector
triad benchmark. Note the impact of aggregate cache size on the position of the performance
breakdown from L2 to memory. (AMD Opteron 2.0 GHz)
the threshold is chosen appropriately, and is hence mandatory when large loop
lengths cannot be guaranteed.
As a side-note, there is another harmful effect of short loop lengths: If the num-
ber of iterations is comparable to the number of threads, load imbalance may
cause bad scalability.
– In loop nests, parallelize on a level as far out as possible. This is inherently con-
nected to the previous advice. Parallelizing inner loop levels leads to increased
OpenMP overhead because a team of threads is spawned or woken up multiple
times.
– Be aware that most OpenMP work-sharing constructs (including OMP DO and
END DO) insert automatic barriers at the end so that all threads have completed
their share of work before anything after the construct is executed. In cases
where this is not required, a NOWAIT clause removes the implicit barrier:
!$OMP PARALLEL
!$OMP DO
do i=1,N
  A(i) = func1(B(i))
enddo
!$OMP END DO NOWAIT
! still in parallel region here. do more work:
!$OMP CRITICAL
CNT = CNT + 1
!$OMP END CRITICAL
!$OMP END PARALLEL
There is also an implicit barrier at the end of a parallel region that cannot be re-
moved. In general, implicit barriers add to synchronization overhead like critical
regions, but they are often required to protect from race conditions.
Consider, as an example for false sharing, the following serial histogram calculation:

integer, dimension(8) :: S
integer IND
S = 0
do i=1,N
  IND = A(i)
  S(IND) = S(IND) + 1
enddo
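The parallel listing this discussion refers to is not reproduced above; the following is an assumed reconstruction (the thread count NT, the index ID and the use of row 0 to collect the result are hypothetical) in which each thread updates its own row of a now two-dimensional S. The marked update corresponds to "line 10" of the original listing, the collection loop to "line 18":

! requires use omp_lib for omp_get_thread_num()
integer, parameter :: NT = 4       ! assumed number of threads
integer, dimension(0:NT,8) :: S    ! (NT+1)*8 integers = 160 bytes for NT=4
integer IND, ID
S = 0
!$OMP PARALLEL PRIVATE(IND,ID) NUM_THREADS(NT)
ID = omp_get_thread_num() + 1
!$OMP DO
do i=1,N
  IND = A(i)
  S(ID,IND) = S(ID,IND) + 1        ! "line 10": false sharing occurs here
enddo
!$OMP END DO
!$OMP END PARALLEL
do IND=1,8                         ! "line 18": collect the partial results
  do ID=1,NT
    S(0,IND) = S(0,IND) + S(ID,IND)
  enddo
enddo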
The loop starting at line 18 collects the partial results of all threads. Although this
is a valid OpenMP program, it will not run faster but much more slowly when using
four threads instead of one. The reason is that the two-dimensional array S contains
all the histogram data from all threads. With four threads these are 160 bytes, less
than two cache lines on most processors. On each histogram update to S in line 10,
the writing CPU must gain exclusive ownership of one of the two cache lines, i.e.
every write leads to a cache miss and subsequent coherence traffic. Compared to
the situation in the serial case where S fits into the cache of a single CPU, this will
result in disastrous performance.
One should add that false sharing can be eliminated in simple cases by the stan-
dard register optimizations of the compiler. If the crucial update operation can be
performed to a register whose contents are only written out at the end of the loop, no
write misses turn up. This is not possible in the above example, however, because
of the computed second index to S in line 10.
Getting rid of false sharing by manual optimization is often a simple task once
the problem has been identified. A standard technique is array padding, i.e. insertion
of a suitable amount of space between memory locations that get updated by differ-
ent threads. In the histogram example above, an even more painless solution exists
in the form of data privatization: On entry to the parallel region, each thread gets
its own local copy of the histogram array in its own stack space. It is very unlikely
that those different instances will occupy the same cache line, so false sharing is
not a problem. Moreover, the code is simplified and made equivalent to the serial
version by using the REDUCTION clause introduced in Sect. 26.2.4.4:
integer, dimension(8) :: S
integer IND
S = 0
!$OMP PARALLEL DO PRIVATE(IND) REDUCTION(+:S)
do i=1,N
  IND = A(i)
  S(IND) = S(IND) + 1
enddo
!$OMP END PARALLEL DO
Setting S to zero is only required for serial equivalence as the reduction clause au-
tomatically initializes the variables in question with appropriate starting values. We
must add that OpenMP reduction to arrays in Fortran does not work for allocatable,
pointer or assumed size types.
An obvious way to parallelize sparse MVM with OpenMP is to let each thread calculate successive elements (or blocks of elements) of the result vector (see Fig. 27.15). For the CRS matrix format, this principle can be applied in a straightforward manner:
!$OMP PARALLEL DO PRIVATE(j)
do i = 1,Nr
  do j = row_ptr(i), row_ptr(i+1) - 1
    c(i) = c(i) + val(j) * b(col_idx(j))
  enddo
enddo
!$OMP END PARALLEL DO
Due to the long outer loop, OpenMP overhead is usually not a problem here. De-
pending on the concrete form of the matrix, however, some load imbalance might occur if very short or very long matrix rows are clustered in some regions. A differ-
ent kind of OpenMP scheduling strategy like DYNAMIC or GUIDED might help in
this situation.
The vanilla JDS sMVM is also parallelized easily:
!$OMP PARALLEL PRIVATE(diag,diagLen,offset)
do diag=1, Nj
  diagLen = jd_ptr(diag+1) - jd_ptr(diag)
  offset = jd_ptr(diag)
!$OMP DO
  do i=1, diagLen
    c(i) = c(i) + val(offset+i) * b(col_idx(offset+i))
  enddo
!$OMP END DO
enddo
!$OMP END PARALLEL
Fig. 27.15. Parallelization approach for sparse MVM (five threads). All marked elements
are handled in a single iteration of the parallelized loop. The r.h.s. vector is accessed by all
threads
(Footnote: The privatization of inner loop indices in the lexical extent of a parallel outer loop, as in the PRIVATE(j) clause above, is not required in Fortran, but it is in C/C++ [4].)
The parallel loop is the inner loop in this case, but there is no OpenMP overhead
problem as the loop count is large. Moreover, in contrast to the parallel CRS version,
there is no load imbalance because all inner loop iterations contain the same amount
of work. All this would look like an ideal situation were it not for the bad code
balance of vanilla JDS sMVM. However, the unrolled and blocked versions can be
equally well parallelized. For the blocked code (see Fig. 27.12), the outer loop over
all blocks is a natural candidate:
!$OMP PARALLEL DO PRIVATE(block_start,block_end,i,diag,
!$OMP&                    diagLen,offset)
do ib=1,Nr,bs                ! bs: block size (b in the text)
  block_start = ib
  block_end = min(ib+bs-1,Nr)
  do diag=1,Nj
    diagLen = jd_ptr(diag+1)-jd_ptr(diag)
    offset = jd_ptr(diag)
    if(diagLen .ge. block_start) then
      do i=block_start, min(block_end,diagLen)
        c(i) = c(i)+val(offset+i)*b(col_idx(offset+i))
      enddo
    endif
  enddo
enddo
!$OMP END PARALLEL DO
This version has even less OpenMP overhead because the DO directive is on the outermost loop. Unfortunately, there is more potential for load imbalance because the matrix rows are sorted by size. But as the dependence of workload on loop
index is roughly predictable, a static schedule with a chunk size of one can remedy
most of this effect.
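In OpenMP terms this amounts to adding a schedule clause to the directive shown above (a sketch):

!$OMP PARALLEL DO SCHEDULE(STATIC,1) PRIVATE(block_start,block_end,
!$OMP&                                       i,diag,diagLen,offset)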
Fig. 27.16 shows performance and scaling behavior of the parallel CRS and
blocked JDS versions on three different architectures. In all cases, the code was run
on as few locality domains or sockets as possible, i.e. first filling one locality domain
or socket before going to the next. On the ccNUMA systems (Altix and Opterons,
equivalent to the block diagrams in Figs. 26.23 and 26.24), the performance characteristics with growing CPU number are obviously fundamentally different from the
UMA system (Xeon/Core node like in Fig. 26.22). Both code versions seem to be
extremely unsuitable for ccNUMA. Only the UMA node shows the expected be-
havior of strong bandwidth saturation at 2 threads and significant speedup when the
second socket gets used (additional bandwidth due to second FSB).
The reason for the failure of ccNUMA to deliver the expected bandwidth lies in
our ignorance of a necessary prerequisite for scalability that we have not honored
yet: Correct data and thread placement for access locality.
[Plot: MFlops/sec (0–800) versus number of threads (1–8)]
Fig. 27.16. Performance and strong scaling for straightforward OpenMP parallelization of
sparse MVM on three different architectures, comparing CRS (open symbols) and blocked
JDS (closed symbols) variants. The Intel Xeon/Core system (dashed) is of UMA type, the
other two systems are ccNUMA
It was mentioned already in the section on ccNUMA architecture that locality and
congestion problems (see Figs. 27.17 and 27.18) tend to turn up when threads/processes and their data are not carefully placed across the locality domains of
a ccNUMA system. Unfortunately, the current OpenMP standard does not refer to
placement at all and it is up to the programmer to use the tools that system builders
provide.
The placement problem has two dimensions: First, one has to make sure that
memory gets mapped into the locality domains of processors that actually access
them. This minimizes NUMA traffic across the network. Second, threads or pro-
cesses must be “pinned” to those CPUs which had originally mapped their memory
regions in order not to lose locality of access. In this context, mapping means that
a page table entry is set up which describes the association of a physical with a virtual memory page.
Fig. 27.17. Locality problem on a ccNUMA system. Memory pages got mapped into a local-
ity domain that is not connected to the accessing processor, leading to NUMA traffic
Fig. 27.18. Congestion problem on a ccNUMA system. Even if the network is very fast, a
single locality domain can usually not saturate the bandwidth demands from concurrent local
and non-local accesses
If the data is initialized in a parallel loop that uses the same OpenMP schedule as the later work loops, the first-touch policy ensures that, even though the rest of the program is still sequential, the data will be distributed across the locality domains. Array B does not have to be initialized explicitly but will automatically be mapped correctly.
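A minimal sketch of such a placement-aware initialization (array name assumed; the work loop must then use the same STATIC schedule):

!$OMP PARALLEL DO SCHEDULE(STATIC)
do i=1,N
  A(i) = 0.d0    ! pages of A are mapped into the locality domain of the
enddo            ! thread that touches them first
!$OMP END PARALLEL DO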
A required condition for this strategy to work is that the OpenMP loop schedules
of initialization and work loops are identical and reproducible, i.e. the only possible
choice is STATIC with a constant chunk size. As the OpenMP standard does not
define a default schedule, it is generally a good idea to specify it explicitly on all
parallel loops. All current compilers choose STATIC by default, though. Of course,
the use of a static schedule poses some limits on possible optimizations for elim-
inating load imbalance. One option is the choice of an appropriate chunk size (as
small as possible, but at least several pages).
Unfortunately it is not always at the programmer’s discretion how and when data
is touched first. In C/C++, global data (including objects) is initialized before the
main() function even starts. If globals cannot be avoided, properly mapped local
copies of global data may be a possible solution, code characteristics in terms of
communication vs. calculation permitting [5]. A discussion of some of the problems
that emerge from the combination of OpenMP with C++ can be found in [6].
The initialization of b is based on the assumption that the non-zeroes of the matrix
are roughly clustered around the main diagonal. Depending on the matrix structure
it may be hard in practice to perform proper placement for the r.h.s. vector at all.
Fig. 27.19 shows performance data for the same architectures and sMVM codes
as in Fig. 27.16 but with appropriate ccNUMA placement. There is no change in
27 Optimization techniques in HPC 765
Fig. 27.19. Performance and strong scaling for ccNUMA-optimized OpenMP parallelization
of sparse MVM on three different architectures, comparing CRS (open symbols) and blocked
JDS (closed symbols) variants. Cf. Fig. 27.16 for performance without proper placement
scalability for the UMA platform, which was to be expected, but also on the cc-
NUMA systems for up to two threads. The reason is of course that both architectures
feature two-processor locality domains which are of UMA type. On four threads
and above, the locality optimizations yield dramatically improved performance. Es-
pecially for the CRS version scalability is nearly perfect when going from 2n to
2(n + 1) threads (due to bandwidth limitations inside the locality domains, scalabil-
ity on ccNUMA systems should always be reported with reference to performance
on all cores of a locality domain). The JDS variant of the code benefits from the op-
timizations as well, but falls behind CRS for larger thread numbers. This is because
of the permutation map for JDS which makes it hard to place larger portions of the
r.h.s. vector into the correct locality domains, leading to increased NUMA traffic.
It should be obvious by now that data placement is of premier importance on cc-
NUMA architectures, including commonly used two-socket cluster nodes. In prin-
ciple, ccNUMA features superior scalability for memory-bound codes, but UMA
systems are much easier to handle and require no code optimization for locality of
access. It is to be expected, though, that ccNUMA designs will prevail in the mid-
term future.
27.2.3.3 Pinning
One may speculate that the considerations about locality of access on ccNUMA sys-
tems from the previous section do not apply for MPI-parallelized code. Indeed, MPI
processes have no concept of shared memory. They allocate and first-touch memory
pages in their own locality domain by default. Operating systems are nowadays ca-
pable of maintaining strong affinity between threads and processors, meaning that a
thread (or process) will be reluctant to leave the processor it was initially started on.
However, it might happen that system processes or interactive load push threads off
their original CPUs. It is not guaranteed that the previous state will be re-established
after the disturbance. One indicator of insufficient thread affinity is erratic performance numbers (i.e., varying from run to run). Even on UMA systems insufficient
affinity can lead to problems if the UMA node is divided into sections (e.g., sockets
with dual-core processors like in Fig. 26.22) that have separate paths to memory and
internal shared caches. It may be of advantage to keep neighboring thread IDs on
the cores of a socket to exploit the advantage of shared caches. If only one core per
socket is used, migration of both threads to the same socket should be avoided if the
application is bandwidth-bound.
The programmer can avoid those effects by pinning threads to CPUs. Every
operating system has ways of limiting the mobility of threads and processes. Unfor-
tunately, these are by no means portable, but there is always a low-level interface
with library calls that access the basic functionality. Under the Linux OS, PLPA [7]
can be used for that purpose. The following is a C example that pins each thread to
a CPU whose ID corresponds to the thread ID:
#include <plpa.h>
#include <omp.h>
...
#pragma omp parallel
{
  plpa_cpu_set_t mask;
  PLPA_CPU_ZERO(&mask);
  int id = omp_get_thread_num();
  PLPA_CPU_SET(id,&mask);
  PLPA_NAME(sched_setaffinity)((pid_t)0, (size_t)32, &mask);
}
The mask variable is used as a bit mask to identify those CPUs the thread should
be restricted to by setting the corresponding bits to one (this could be more than one
bit, a feature often called CPU set). After this code has executed, no thread will be
able to leave its CPU any more.
System vendors often provide high-level interfaces to the pinning or CPU set
mechanism. Please consult the system documentation for details.
There is one important topic in code optimization that we have neglected for
brevity: The start of any serious optimization attempt on a nontrivial application
should be the production of a profile that identifies the hot spots, i.e. the parts of
the code that take the most time to execute. Many tools, free and commercial, exist
in this field and more are under development. In which form performance data for a parallel run with thousands of processors should be presented to the programmer, and how the vast amounts of data can be filtered to extract the important insights, is the subject of intense research. Multi-core technologies are adding another dimension
to this problem.
References

1. A. Hoisie, O. Lubeck, H. Wassermann, Int. J. High Perform. Comp. Appl. 14, 330 (2000)
2. P.F. Spinnato, G. van Albada, P.M. Sloot, IEEE Trans. Parallel Distrib. Systems 15(1), 81 (2004)
3. R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods (SIAM, 1994)
4. URL http://www.openmp.org
5. B. Chapman, F. Bregier, A. Patil, A. Prabhakar, Concurrency Comput.: Pract. Exper. 14, 713 (2002)
6. C. Terboven, D. an Mey, in Proceedings of IWOMP2006: International Workshop on OpenMP, Reims, France, June 12–15, 2006. URL http://iwomp.univ-reims.fr/cd/papers/TM06.pdf
7. URL http://www.open-mpi.org/software/plpa/
A Appendix: Abbreviations
Abbreviation Explanation
1D,2D,3D one-, two-, three-dimensional
1D3V 1D in a usual space and 3D in a velocity space
ABINIT DFT software package (open source)
AIREBO Adaptive Intermolecular Reactive Bond Order
API Application Programming Interface
ARPES Angle-Resolved Photo-Emission Spectroscopy
BE Boltzmann Equation
BIT1 1D3V PIC code
BLAS Basic Linear Algebra Subprograms
BO Born-Oppenheimer
CASTEP DFT software package (commercial)
ccNUMA cache-coherent Non-Uniform Memory Architecture
CDW Charge Density Wave
CF Correlation Functions
CI Configuration Interaction
CIC Cloud in Cell
CISC Complex Instruction Set Computing
CO Complex Object
CP Car-Parrinello
CPA Coherent Potential Approximation
CP-PAW Car-Parrinello software package
CPT Cluster Perturbation Theory
CPU Central Processing Unit
CRS Compressed Row Storage
dc direct current
DDCF Density-Density time Correlation Functions
DDMRG Dynamical Density Matrix Renormalization Group
DFT Density Functional Theory
DMC Diagrammatic Monte Carlo
DMFT Dynamical Mean-Field Theory
DMRG Density-Matrix Renormalization Group
DOS Density of States
DP Double Precision
DRAM Dynamic Random Access Memory
DTMRG Dynamical TMRG
ED Exact Diagonalization
EDIP Environment-Dependent Interaction Potential
EIRENE A Monte Carlo linear transport solver
EPIC Explicitly Parallel Instruction Computing
FD Feynman Diagram
FFT Fast Fourier Transform
FFTW Fastest Fourier Transform in the West (FFT library)
FHI98md DFT software package
FMM Fast Multipole Method
FP Floating Point
FPGA Field Programmable Gate Arrays
FSB Front Side Bus
GAUSSIAN computational chemistry software program
GF Green Function
GTO Gaussian Type Orbitals
GMRES Generalized Minimum Residual Method
GPU Graphics Processing Unit
HF Hartree-Fock
HPC High Performance Computing
HPF High Performance Fortran
HT Hypertransport
IKP Improved Kelbg Potential
ILP Instruction-Level Parallelism
JDS Jagged Diagonals Storage
KPM Kernel Polynomial Method
LAPACK Linear Algebra Package
LD Local Distribution
LDA Local Density Approximation
LDA-KS Local Density Approximation in the Kohn-Sham scheme
LDOS Local Density of States
LINPACK Linear Algebra Package (superseded by LAPACK)
LJ Lennard-Jones
LR Lanczos Recursion
LRU Least Recently Used
MC Monte Carlo
MD Molecular Dynamics
MEM Maximum Entropy Method
MESI Modified/Exclusive/Shared/Invalid protocol
MIPS Microprocessor without Interlocked Pipeline Stages
MIT Metal-Insulator Transition
MMM Matrix Matrix Multiplication
MOLPRO quantum chemistry software package
MP Message Passing
MPI Message Passing Interface
MPMD Multiple Program Multiple Data
MVM Matrix Vector Multiplication
NGP Nearest Grid Point
NI Network Interface
NL NUMA Link
NRG Numerical Renormalization Group
NUMA Non-Uniform Memory Architecture
NWChem computational chemistry software package
OpenMP Open Multi-Processing
OS Operating System
PDP1 Programmed Data Processor 1
PES Potential Energy Surface
PIC Particle-in-Cell
PIC-MCC Particle-in-Cell Monte Carlo Collision
PIMC Path Integral Monte Carlo
PJT Pseudo Jahn-Teller
PLPA Portable Linux Processor Affinity
POSIX Portable Operating System Interface
QMC Quantum Monte Carlo
QMD Quantum Molecular Dynamics
QMR Quasi Minimum Residual Method
QP Quantum Particle
QPT Quantum Phase Transition
REBO Reactive Empirical Bond Order
RFO Read For Ownership
RG Renormalization Group
RISC Reduced Instruction Set Computing
RKHS Reproducing Kernel Hilbert Space
SIAM Single Impurity Anderson Model;
Society for Industrial and Applied Mathematics
SIMD Single Instruction Multiple Data
SMP Symmetric Multi-Processing
sMVM Sparse Matrix Vector Multiplication
SO Stochastic Optimization
SPEC Standard Performance Evaluation Corporation
SP Single Precision
SPMD Single Program Multiple Data
STL Standard Template Library
STM Scanning Tunneling Microscopy
STO Slater Type Orbitals
TCP/IP Transmission Control Protocol / Internet Protocol
TLB Translation Look-aside Buffer
TMRG Transfer Matrix Renormalization Group
UMA Uniform Memory Architecture
UPC Unified Parallel C
VASP ab initio molecular dynamics software package
VBS Valence Bond Solid
WF Wave Function
XOOPIC X-windows Object Oriented PIC
XPDP1 X-windows PDP1 plasma code
Index
false sharing, 723, 758
fast Fourier transform, 175, 181, 209, 303, 309, 423, 554
Fermi gas, 261
Fermi liquid, 223, 485
Fermi surface, 227, 497, 592
  harmonics, 237
ferromagnetism, 81, 99, 119, 474, 477, 489, 529, 658, 660
Feynman expansion, 375, 383, 479
field weighting, 173
finite-size scaling, 84, 114–128, 307, 475, 591, 630
first touch policy, 763
flop, 683
Fortuin-Kasteleyn representation, 93, 289, 303
Fredholm integral equation, 63, 141, 374
front-side bus, 718

Gauss distribution, 66, 69, 185, 364
Gaussian flux distribution, 70
Gibbs oscillation, 549
Glauber algorithm, 89
global optimization, 443
goodness-of-fit parameter, 117
Green function, 148, 478, 485, 552, 554, 562, 568
  local, 480, 481, 509
Gustafson's law, 705
gyrofluid model
  three-field, 204
  two-fluid equations, 193
  vorticity equation, 207
gyrokinetics
  dispersion relation and fluctuation spectrum, 209
  guiding center drift velocity, 197
  gyro-averaged potential, 201
  gyro-center eq. of motion, 195
  gyrophase-averaged eq. of motion, 197
  history, 192
  one-form, 198
  particle simulation, 207
  polarization drift, 202

Hartree approximation, 427, 477
Hartree-Fock approximation, 427–432
heat-bath algorithm, 88
Heisenberg model, 278, 303, 474–477, 529, 537, 671
hidden free energy barriers, 134
High Performance Fortran, 709
Hilbert transform, 480, 561
Hirsch-Fye algorithm, 337–343, 482
Holstein model, 358, 521, 523, 562, 567
Holstein-Hubbard model, 368
Hubbard model, 455, 473, 480, 484–490, 496, 529–537, 540, 543, 570, 574, 632, 655
  multi-orbital, 490
hypertransport, 720

importance sampling, 73, 85, 151, 375
instruction throughput, 686
instruction-level parallelism, 686
interaction representation, 302, 311, 375
Ising model, 81, 586

jackknife method, 107, 360
Jacobi-Davidson algorithm, 541
jagged diagonals storage, 751

Kelbg potential, improved, 44
kernel
  collision, 146
  Dirichlet, 550
  Fejér, 551
  Jackson, 551
  Lorentz, 552
  subcritical, 152
  transport, 148
kernel polynomial method, see Chebyshev expansion
Kholevo bound, 654
Kohn-Sham method, 433
Kondo problem, 341, 482, 600
Krylov space, 540, 625, 638
Kubo formalism, 253, 560

Lanczos algorithm, 638, 642
  correlation functions, 572, 625
  DMRG, 625
  eigenvalues, 539
  eigenvectors, 540
latency, 693, 698
  of network, 715
leap-frog algorithm, 16, 164