
Statistical Physics

G. Falkovich
http://www.weizmann.ac.il/home/fnfal/papers/statphys2010.pdf

More is different
P.W. Anderson

Contents

1 Thermodynamics (brief reminder)
   1.1 Basic notions
   1.2 Legendre transform
   1.3 Stability of thermodynamic systems

2 Basic statistical physics (brief reminder)
   2.1 Distribution in the phase space
   2.2 Microcanonical distribution
   2.3 Canonical distribution
   2.4 Grand canonical ensemble
   2.5 Two simple examples
       2.5.1 Two-level system
       2.5.2 Harmonic oscillators

3 Entropy and information
   3.1 Lyapunov exponent
   3.2 Adiabatic processes and the third law
   3.3 Information theory approach

4 Gases
   4.1 Ideal Gases
       4.1.1 Boltzmann (classical) gas
   4.2 Fermi and Bose gases
       4.2.1 Degenerate Fermi Gas
       4.2.2 Photons
       4.2.3 Phonons
       4.2.4 Bose gas of particles and Bose-Einstein condensation

5 Non-ideal gases
   5.1 Cluster and virial expansions
   5.2 Van der Waals equation of state
   5.3 Coulomb interaction and screening

6 Phase transitions
   6.1 Thermodynamic approach
       6.1.1 Necessity of the thermodynamic limit
       6.1.2 First-order phase transitions
       6.1.3 Second-order phase transitions
       6.1.4 Landau theory
   6.2 Ising model
       6.2.1 Ferromagnetism
       6.2.2 Impossibility of phase coexistence in one dimension
       6.2.3 Equivalent models

7 Fluctuations
   7.1 Thermodynamic fluctuations
   7.2 Spatial correlation of fluctuations
   7.3 Different order parameters
       7.3.1 Goldstone mode and Mermin-Wagner theorem
       7.3.2 Berezinskii-Kosterlitz-Thouless phase transition
       7.3.3 Higgs mechanism
   7.4 Universality classes and renormalization group

8 Random walks and fluctuating fields
   8.1 Random walk and diffusion
   8.2 Analogy between quantum mechanics and statistical physics
   8.3 Brownian motion

9 Response and fluctuations
   9.1 Static response
   9.2 Temporal correlation of fluctuations
   9.3 Spatio-temporal correlation function
   9.4 General fluctuation-dissipation relation
   9.5 Central limit theorem and large deviations
This is a graduate one-semester course. Chapters 1, 2 and 4 briefly review what is supposed to be known from undergraduate courses, using somewhat more sophisticated language.

1 Thermodynamics (brief reminder)


Physics is an experimental science, and laws usually appear by induction: from particular cases to a general law, and from processes to state functions. The latter step requires integration (to pass, for instance, from the Newton equation of mechanics to the Hamiltonian, or from thermodynamic equations of state to thermodynamic potentials). Generally, it is much easier to differentiate than to integrate, so the deductive (or postulational) approach is usually simpler and more elegant. It also provides a good vantage point for further applications and generalizations. In such an approach, one starts by postulating some function of the state of the system and deduces from it the laws that govern changes when one passes from state to state. Here such a postulational presentation of thermodynamics is given, following the book H. B. Callen, Thermodynamics (John Wiley & Sons, NYC 1965).

1.1 Basic notions


We use a macroscopic description, so that some degrees of freedom remain hidden. Compare mechanics, electricity and magnetism (as related to the
explicit macroscopic degrees of freedom) versus thermodynamics (as related
to the macroscopic manifestations of the hidden degrees of freedom). When
detailed knowledge is unavailable, physicists use symmetries or conserva-
tion laws. Thermodynamics studies restrictions on the possible properties of
macroscopic matter that follow from the symmetries of the fundamental laws.
Therefore, thermodynamics does not predict numerical values but rather sets
inequalities and establishes relations among different properties.
The basic symmetry is invariance with respect to time shifts, which gives energy conservation¹. That allows one to introduce the internal energy E. We define work as the energy change of macroscopic degrees of freedom and heat as the energy change of hidden degrees of freedom.

¹ Be careful when trying to build a thermodynamic description for biological or socio-economic systems, since generally they are not time-invariant. For instance, the amount of money is generally not conserved.

To be able
to measure energy changes in principle, we need adiabatic processes where
there is no heat exchange. We wish to establish the energy of a given system
in equilibrium states which are those that can be completely characterized
by the static² values of extensive parameters like energy E, volume V and mole number N (the number of particles divided by the Avogadro number 6.02 × 10²³). Other extensive quantities may include the numbers of different sorts of particles, electric and magnetic moments etc., i.e., everything whose value for a composite system is a direct sum of the values for the components. For a given system, any two equilibrium states A and B can be related by an adiabatic process either A → B or B → A, which allows one to measure the difference in the internal energy by the work W done by the system. Now, if we encounter a process where the energy change is not equal to minus the work done by the system, we call the difference the heat flux into the system:

dE = δQ − δW . (1)

This statement is known as the first law of thermodynamics. We use δ since heat and work are not differentials of any function: they refer to particular forms of energy transfer, not to energy content.
The basic problem of thermodynamics is the determination of the equilib-
rium state that eventually results after all internal constraints are removed
in a closed composite system. The problem is solved with the help of ex-
tremum principle: there exists an extensive quantity S called entropy which
is a function of the extensive parameters of any composite system. The
values assumed by the extensive parameters in the absence of an internal
constraint maximize the entropy over the manifold of constrained equilib-
rium states. Since the entropy is extensive it is a homogeneous first-order
function of the extensive parameters: S(λE, λV, . . .) = λS(E, V, . . .). The
entropy is a continuous differentiable function of its variables. This function
(called also fundamental relation) is everything one needs to know to solve
the basic problem (and other problems in thermodynamics as well).
Since the entropy is generally a monotonic function of energy³, S = S(E, V, . . .) can be solved uniquely for E(S, V, . . .), which is an equivalent fundamental relation. Indeed, assume (∂E/∂S)X > 0 and consider S(E, X) and E(S, X). Then⁴ (∂S/∂X)E = 0 ⇒ (∂E/∂X)S = −(∂S/∂X)E (∂E/∂S)X = 0. Differentiating the last relation once more, we get (∂²E/∂X²)S = −(∂²S/∂X²)E (∂E/∂S)X, since the term with the derivative of the second factor vanishes at the point where (∂S/∂X)E = 0. We thus see that the equilibrium is defined by the energy

² In other words, equilibrium states must be independent of the way they are prepared.
³ This is not always so; we shall see in the course of statistical physics that the two-level system gives a counter-example, as do other systems with a finite phase space.
minimum instead of the entropy maximum (very much like a circle can be defined as the figure of either maximal area for a given perimeter or of minimal perimeter for a given area). In the figure, unconstrained equilibrium states lie on the curve while all other states lie below. One can reach the state A either by maximizing entropy at a given energy or by minimizing energy at a given entropy:
[Figure: the curve of equilibrium states in the (E, S) plane; the state A is reached either by maximizing S at fixed E or by minimizing E at fixed S.]
One can work either in energy or entropy representation but ought to be


careful not to mix the two. Experimentally, one usually measures changes
thus finding derivatives (called equations of state). The partial derivatives
of an extensive variable with respect to its arguments (also extensive param-
eters) are intensive parameters5 . For example, for the energy one writes
∂E ∂E ∂E
≡T, ≡ −P ≡ µ ,... (2)
∂S ∂V ∂N
These relations are called the equations of state and they serve as definitions
for temperature T , pressure P and chemical potential µ while the respective
extensive quantities are S, V, N . From (2) we write
dE = δQ − δW = T dS − P dV + µdN . (3)
Entropy is thus responsible for hidden degrees of freedom (i.e. heat) while
other extensive parameters describe macroscopic degrees of freedom. The
⁴ An efficient way to treat partial derivatives is to use jacobians ∂(u, v)/∂(x, y) = (∂u/∂x)(∂v/∂y) − (∂v/∂x)(∂u/∂y) and the identity (∂u/∂x)y = ∂(u, y)/∂(x, y).
⁵ In thermodynamics we have only extensive and intensive variables (and not, say, surface-dependent terms ∝ N^{2/3}) because we take the thermodynamic limit N → ∞, V → ∞ keeping N/V finite.
derivatives (2) are defined only in equilibrium. Therefore, δQ = T dS and δW = P dV − µdN for quasi-static processes, i.e. such that the system is close to equilibrium at every point of the process. A process can be considered quasi-static if its typical time of change is larger than the relaxation times (which for pressure can be estimated as L/c and for temperature as L²/κ, where L is the system size, c the sound velocity and κ the thermal diffusivity). Finite deviations from equilibrium make dS > δQ/T, because entropy can increase without heat transfer.
Let us give an example of how the entropy maximum principle solves the basic problem. Consider two simple systems separated by a rigid wall which is impermeable to anything but heat. The whole composite system is closed, that is, E1 + E2 = const. The entropy change under the energy exchange,

dS = (∂S1/∂E1) dE1 + (∂S2/∂E2) dE2 = dE1/T1 + dE2/T2 = (1/T1 − 1/T2) dE1 ,

must be positive, which means that energy flows from the hot subsystem to the cold one (T1 > T2 ⇒ ∆E1 < 0). We see that our definition (2) is in
agreement with our intuitive notion of temperature. When equilibrium is reached, dS = 0, which requires T1 = T2. If the fundamental relation is known, then so is the function T(E, V). The two equations, T(E1, V1) = T(E2, V2) and E1 + E2 = const, completely determine E1 and E2. In the same way one can consider a movable wall and get P1 = P2 in equilibrium. If the wall allows for particle penetration, we get µ1 = µ2 in equilibrium.
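For instance, suppose each subsystem has energy proportional to temperature, E = cNT with the same constant c (as for the ideal gas considered below). Then T1 = T2 gives E1/N1 = E2/N2, which together with E1 + E2 = E yields E1 = EN1/(N1 + N2): the energy is shared in proportion to the particle numbers.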
Both energy and entropy are homogeneous first-order functions of their variables: S(λE, λV, λN) = λS(E, V, N) and E(λS, λV, λN) = λE(S, V, N)
(here V and N stand for the whole set of extensive macroscopic parame-
ters). Differentiating the second identity with respect to λ and taking it at
λ = 1 one gets the Euler equation

E = T S − P V + µN . (4)

Let us show that there are only two independent parameters for a simple one-
component system, so that chemical potential µ, for instance, can be found
as a function of T and P . Indeed, differentiating (4) and comparing with (3)
one gets the so-called Gibbs-Duhem relation (in the energy representation)
N dµ = −SdT + V dP or for quantities per mole, s = S/N and v = V /N :
dµ = −sdT + vdP . In other words, one can choose λ = 1/N and use
first-order homogeneity to get rid of N variable, for instance, E(S, V, N ) =

N E(s, v, 1) = N e(s, v). In the entropy representation the Gibbs-Duhem relation again states that the sum of the products of the extensive parameters and the differentials of the corresponding intensive parameters vanishes:

Ed(1/T ) + V d(P/T ) − N d(µ/T ) = 0 . (5)

One uses µ(P, T), for instance, when considering systems in an external field. One then adds the potential energy (per particle) u(r) to the chemical potential, so that the equilibrium condition is µ(P, T) + u(r) = const. In particular, in the gravity field u(r) = mgz, and differentiating µ(P, T) at T = const one gets vdP = −mgdz. Introducing the density ρ = m/v, one gets the well-known hydrostatic formula P = P0 − ρgz. For composite systems, the number of independent intensive parameters (thermodynamic degrees of freedom) is the number of components plus one.
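For instance, for an ideal gas at constant temperature v = RT/P (per mole, with m the molar mass), and vdP = −mgdz gives dP/P = −(mg/RT)dz, i.e. the barometric formula P = P0 exp(−mgz/RT).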
Processes. While thermodynamics is fundamentally about states it is
also used for describing processes that connect states. Particularly important
questions concern the performance of engines and heaters/coolers. A heat engine works by delivering heat from a reservoir with some higher T1 via some system to another reservoir with T2, doing some work in the process. If the entropy of the hot reservoir decreases by some ∆S1, then the entropy of the cold one must increase by some ∆S2 ≥ ∆S1. The work ∆W is the difference between the heat given by the hot reservoir, T1∆S1, and the heat absorbed by the cold one, T2∆S2. Maximal work is achieved for the minimal entropy change ∆S2 = ∆S1, which happens for reversible (quasi-static) processes — if, for instance, the system is a gas which works by moving a piston, then the pressure of the gas and the work are less for a fast-moving piston than in equilibrium. Engine efficiency is the fraction of heat used for work, that is,

∆W/∆Q1 = (∆Q1 − ∆Q2)/∆Q1 = 1 − T2∆S2/T1∆S1 ≤ 1 − T2/T1 .

Similarly, a refrigerator/heater is something that does work to transfer heat from cold to hot systems. The performance is characterized by the ratio of the transferred heat to the work done: ∆Q2/∆W ≤ T2/(T1 − T2).
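For instance, with reservoirs at T1 = 400 K and T2 = 300 K, an engine converts at most 1 − 300/400 = 25% of the heat taken from the hot reservoir into work, while a refrigerator working between the same reservoirs transfers at most ∆Q2/∆W = 300/(400 − 300) = 3 units of heat per unit of work.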
A specific procedure to accomplish reversible heat and work transfer is to use an auxiliary system which undergoes the so-called Carnot cycle, where heat exchanges take place only at two temperatures. The engine goes through: 1) isothermal expansion at T1, 2) adiabatic expansion until the temperature falls to T2, 3) isothermal compression until the entropy returns to its initial value, 4) adiabatic compression until the temperature reaches T1. The auxiliary system does work during stages 1 and 2, increasing the energy of our system, which then decreases its energy by working on the auxiliary system during stages 3 and 4. The total work is the area of the cycle in the graph. For heat transfer, one reverses the direction.
[Figure: the Carnot cycle 1→2→3→4 in T-S and P-V variables.]


The Carnot cycle provides one with an operational method to measure the ratio of two temperatures by measuring the engine efficiency⁶.
Summary of formal structure. The fundamental relation (in energy rep-
resentation) E = E(S, V, N ) is equivalent to the three equations of state (2).
If only two equations of state are given, then the Gibbs-Duhem relation may be integrated to obtain the third, up to an integration constant; alternatively, one may integrate the molar relation de = T ds − P dv to get e(s, v), again with an undetermined constant of integration.
Example: consider an ideal monatomic gas characterized by two equations
of state (found, say, experimentally with R ≃ 8.3 J/mole K ≃ 2 cal/mole K ):

P V = N RT , E = 3N RT /2 . (6)

The extensive parameters here are E, V, N, so we want to find the fundamental equation in the entropy representation, S(E, V, N). To use (4) we need to find µ using the Gibbs-Duhem relation in the entropy representation (5). We express intensive via extensive variables in the equations of state (6), compute d(1/T) = −(3R/2e²)de and d(P/T) = −(R/v²)dv, and substitute into (5):

d(µ/T) = −(3R/2e)de − (R/v)dv ,    µ/T = C − (3R/2) ln e − R ln v ,
s = e/T + P v/T − µ/T = s0 + (3R/2) ln(e/e0) + R ln(v/v0) .    (7)

Here e0, v0 are the parameters of the state of zero internal energy used to determine the temperature units, and s0 is the constant of integration.

⁶ Practical needs to estimate the engine efficiency during the industrial revolution led to the development of such abstract concepts as entropy.
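As a check, differentiating (7) recovers the equations of state (6): ∂s/∂e = 3R/2e = 1/T gives e = 3RT/2, while ∂s/∂v = R/v = P/T gives Pv = RT.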

1.2 Legendre transform


Let us emphasize that the fundamental relation always relates extensive
quantities. Therefore, even though it is always possible to eliminate, say,
S from E = E(S, V, N ) and T = T (S, V, N ) getting E = E(T, V, N ), this
is not a fundamental relation and it does not contain all the information.
Indeed, E = E(T, V, N ) is actually a partial differential equation (because
T = ∂E/∂S) and even if it can be integrated the result would contain an undetermined function. Still, it is easier to measure, say, temperature than
entropy so it is convenient to have a complete formalism with intensive pa-
rameters as operationally independent variables and extensive parameters
as derived quantities. This is achieved by the Legendre transform: To pass
from the relation Y = Y (X) to that in terms of P = ∂Y /∂X it is not enough
to eliminate X and consider the function Y = Y (P ), which determines the
curve Y = Y (X) only up to a shift along X:
[Figure: two families of curves Y(X) with the same Y(P), shifted along X.]

To fix the shift, one may consider the curve as the envelope of the family of tangent lines characterized by the slope P and the intercept ψ on the Y-axis. The function ψ(P) = Y[X(P)] − P X(P) completely defines
the curve; here one substitutes X(P ) found from P = ∂Y (X)/∂X (which is
possible only when ∂P/∂X = ∂ 2 Y /∂X 2 ̸= 0). The function ψ(P ) is referred
to as a Legendre transform of Y (X). From dψ = −P dX − XdP + dY =
−XdP one gets −X = ∂ψ/∂P i.e. the inverse transform is the same up to
a sign: Y = ψ + XP . In mechanics, we use the Legendre transform to pass
from Lagrangian to Hamiltonian description.
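A simple example: for Y = X² one has P = 2X, so X(P) = P/2 and ψ(P) = Y − PX = −P²/4. The inverse transform indeed restores the function: −X = ∂ψ/∂P = −P/2 and Y = ψ + XP = −P²/4 + P²/2 = P²/4 = X².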

[Figure: the tangent line Y = ψ + XP to the curve Y(X).]

Replacing different extensive parameters by the respective intensive parameters, we obtain thermodynamic potentials suitable for different physical situations.
Free energy F = E − T S (also called Helmholtz potential) is that partial
Legendre transform of E which replaces the entropy by the temperature
as an independent variable: dF (T, V, N, . . .) = −SdT − P dV + µdN + . . ..
It is used to describe a system in thermal contact with a heat reservoir
since the work done by the system under constant temperature (equal to
that of the reservoir) is minus the differential of the free energy: dW =
δQ − dE = T dS − dE = −dF (free energy is that part of the internal
energy which is free to turn into work, the rest of the energy we must keep
to sustain a constant temperature). The equilibrium state minimizes F , not
absolutely, but over the manifold of states with the temperature equal to that
of the reservoir. Indeed, consider F(T, X) = E[S(T, X), X] − T S(T, X); then (∂E/∂X)S = (∂F/∂X)T, that is, they turn into zero simultaneously. Also, at the point of extremum one gets (∂²E/∂X²)S = (∂²F/∂X²)T, i.e. both E and F are minimal in equilibrium. A monatomic gas at fixed T, N has F(V) = E − T S(V) = −N RT ln V + const. If a piston separates equal amounts of gas, then the work done in changing the volume of a subsystem from V1 to V2 is ∆F = N RT ln[V2(V − V2)/V1(V − V1)].
Enthalpy H = E + P V is that partial Legendre transform of E which replaces the volume by the pressure: dH(S, P, N, . . .) = T dS + V dP + µdN + . . .. It is particularly convenient for situations in which the pressure is maintained constant by a pressure reservoir (say, when the vessel is open to the atmosphere). Just as the energy acts as a potential for work at constant entropy and the free energy as a potential for work at constant temperature, so the enthalpy is a potential for work at constant pressure: dW = −dH. Equilibrium minimizes H at constant pressure.
One can replace both entropy and volume, obtaining the Gibbs thermodynamic potential G = E − T S + P V, which has dG(T, P, N, . . .) = −SdT + V dP + µdN + . . . and is minimal in equilibrium at constant temperature and pressure. From (4) we get (remember, they all are functions of different variables):

F = −P(T, V)V + µ(T, V)N ,    H = T S + µN ,    G = µ(T, P)N .    (8)

When there is a possibility of change in the number of particles (because our system is in contact with some particle source having a fixed chemical potential), one uses the grand canonical potential Ω(T, V, µ) = E − T S − µN, which has dΩ = −SdT − P dV − N dµ. The grand canonical potential reaches its minimum at constant temperature and chemical potential. The choice of potential for a given physical situation is usually to take what is fixed as a variable, so as to diminish the number of effective variables.
Maxwell relations. Changing the order of taking mixed second derivatives of a potential creates a class of identities known as Maxwell relations. For example, ∂²E/∂S∂V = ∂²E/∂V∂S gives (∂P/∂S)V = −(∂T/∂V)S. That can be done for all three combinations (SV, SN, VN) possible for a simple single-component system and also for every other potential (F, H, G). Maxwell relations for constant N can be remembered with the help of the mnemonic diagram, with the sides labelled by the four common potentials flanked by their respective natural independent variables. In the differential expression for each potential in terms of the natural variables, an arrow pointing away from the variable implies a positive sign, while one pointing towards the variable implies a negative sign, as in dE = T dS − P dV:
[Figure: the mnemonic square with sides labelled by the potentials E, F, G, H flanked by their natural variables V, T, P, S.]

Maxwell relations are given by the corners of the diagram, for example, (∂V/∂S)P = (∂T/∂P)S etc. If we consider constant N, then any fundamental relation of a single-component system is a function of only two variables and therefore has only three independent second derivatives. Traditionally, all derivatives are expressed via the three basic ones (those of the Gibbs potential): the specific heat and the coefficient of thermal expansion, both at constant pressure, and the isothermal compressibility:

cP = T(∂S/∂T)P = −T(∂²G/∂T²)P ,    α = (1/V)(∂V/∂T)P ,    κT = −(1/V)(∂V/∂P)T .

In particular, the specific heat at constant volume is as follows:

cV = T(∂S/∂T)V = cP − T V α²/N κT .    (9)

That and similar formulas form the technical core of thermodynamics, and the art of deriving them ought to be mastered. It involves a few simple rules for treating partial derivatives:

(∂X/∂Y)Z (∂Y/∂X)Z = 1 ,    (∂X/∂Y)Z = (∂X/∂W)Z / (∂Y/∂W)Z ,
(∂X/∂Y)Z (∂Y/∂Z)X (∂Z/∂X)Y = −1 .    (10)

An alternative (and more general) way to manipulate thermodynamic derivatives is to use jacobians and the identity ∂(T, S)/∂(P, V) = 1.⁷

⁷ Taking, say, S, V as independent variables, ∂(T, S)/∂(P, V) = [∂(T, S)/∂(S, V)]/[∂(P, V)/∂(S, V)] = ESV/EVS = 1.
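As an exercise in these rules, one can derive (9) using jacobians. Write CV = T(∂S/∂T)V = T ∂(S, V)/∂(T, V) = T[∂(S, V)/∂(T, P)]/[∂(T, V)/∂(T, P)]. The numerator equals (∂S/∂T)P(∂V/∂P)T − (∂S/∂P)T(∂V/∂T)P = −(CP/T)V κT + V²α², where the Maxwell relation (∂S/∂P)T = −(∂V/∂T)P (following from dG = −SdT + V dP) was used; the denominator is (∂V/∂P)T = −V κT. Dividing, CV = CP − T V α²/κT, which per mole is (9).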

1.3 Stability of thermodynamic systems


Consider the entropy representation. Stationarity of equilibrium requires
dS = 0 while stability requires d²S < 0. In particular, that means concavity of S(E, X). Indeed, for all ∆E one must have S(E + ∆E, X) + S(E − ∆E, X) ≤ 2S(E, X); otherwise our system could break into two halves with the energies E ± ∆E, thus increasing the total entropy. For ∆E → 0 the stability requirement means (∂²S/∂E²)X ≤ 0 ⇒ (∂T/∂E)X ≥ 0 — an increase of the energy must increase the temperature. This can also be recast into (∂T/∂E)V = (∂T/∂S)V (∂S/∂E)V = 1/cv ≥ 0 (adding heat to a stable system increases its temperature). The same concavity requirement is true with respect to changes in the other parameters X; in particular, (∂²S/∂V²)E ≤ 0 ⇒ (∂P/∂V)T ≤ 0, that is, isothermal expansion must reduce the pressure of a stable system. Considering both changes together, we must require SEE(∆E)² + 2SEV∆E∆V + SVV(∆V)² ≤ 0. This quadratic form has a definite sign if the determinant is positive: SEE SVV − S²EV ≥ 0. Manipulating derivatives, one can show that this is equivalent to (∂P/∂V)S ≤ 0. Alternatively, one may consider the energy representation; here stability requires the energy minimum, which gives ESS = T/cv ≥ 0 and EVV = −(∂P/∂V)S ≥ 0. Considering both variations, one can diagonalize d²E = ESS(dS)² + EVV(dV)² + 2ESV dSdV by introducing the temperature differential dT = ESS dS + ESV dV, so that d²E = (dT)²/ESS + (EVV − E²SV/ESS)(dV)². It is thus clear that EVV − E²SV/ESS = (∂²E/∂V²)T = −(∂P/∂V)T, and we recover all the same inequalities.
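For instance, the ideal gas (6) satisfies these criteria: cv = 3R/2 > 0 and (∂P/∂V)T = −NRT/V² < 0.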


[Figure: lines of constant entropy in the (∆E, ∆V) plane for the unstable and stable cases.]

The physical content of these stability criteria is known as Le Châtelier's principle: if some perturbation deviates the system from a stable equilibrium, it induces spontaneous processes that reduce the perturbation.
Phase transitions happen when some stability condition is not satisfied, as in the region with (∂P/∂V)T > 0 at the lowest isotherm in the figure below. When the pressure corresponds to the level NLC, it is clear that L is an unstable point and cannot be realized. But which stable point is realized, N or C? To get the answer, one must minimize the Gibbs potential, since we have T and P fixed. For one mole, it is the chemical potential, which can be found by integrating the Gibbs-Duhem relation, dµ(T, P) = −sdT + vdP, at constant temperature: G = µ = ∫v(P)dP. It is clear that the pressure that corresponds to D (having equal areas below and above the horizontal line) separates the absolute minimum on the left branch Q (solid-like) from that on the right branch C (liquid-like). The dependence of volume on pressure is discontinuous along the isotherm.

[Figure: the isotherm in the P-V plane with the points N, L, C, D, E, J, Q and the corresponding µ(P) curve of the Maxwell construction.]
2 Basic statistical physics (brief reminder)
Here we introduce the microscopic statistical description in the phase space and
describe three principal ways (microcanonical, canonical and grand canoni-
cal) to derive thermodynamics from statistical mechanics.

2.1 Distribution in the phase space


We consider macroscopic bodies, systems and subsystems. We define the probability for a subsystem to be in some ∆p∆q region of the phase space as the fraction of time it spends there: w = lim_{T→∞} ∆t/T. We introduce the statistical distribution in the phase space as a density: dw = ρ(p, q)dpdq. By definition, the average with the statistical distribution is equivalent to the time average:

f̄ = ∫ f(p, q)ρ(p, q) dpdq = lim_{T→∞} (1/T) ∫₀ᵀ f(t) dt .    (11)

The main idea is that ρ(p, q) for a subsystem does not depend on the initial
states of this and other subsystems so it can be found without actually solving
the equations of motion. We define statistical equilibrium as a state where macroscopic quantities are equal to the mean values. Assuming short-range forces, we
conclude that different macroscopic subsystems interact weakly and are sta-
tistically independent so that the distribution for a composite system ρ12 is
factorized: ρ12 = ρ1 ρ2 .
Now, we take the ensemble of identical systems starting from different
points in phase space. In a flow with the velocity v = (ṗ, q̇) the density
changes according to the continuity equation: ∂ρ/∂t + div (ρv) = 0. If the
motion is considered for not very large time it is conservative and can be
described by the Hamiltonian dynamics: q̇i = ∂H/∂pi and ṗi = −∂H/∂qi .
Hamiltonian flow in the phase space is incompressible: div v = ∂ q̇i /∂qi +
∂ ṗi /∂pi = 0. That gives the Liouville theorem: dρ/dt = ∂ρ/∂t + (v · ∇)ρ = 0
that is the statistical distribution is conserved along the phase trajectories
of any subsystem. As a result, equilibrium ρ must be expressed solely via
the integrals of motion. Since ln ρ is an additive quantity, it must be expressed linearly via the additive integrals of motion, which for a general mechanical system are the energy E(p, q), the momentum P(p, q) and the angular momentum M(p, q):
ln ρa = αa + βEa (p, q) + c · Pa (p, q) + d · M(p, q) . (12)

Here αa is the normalization constant for a given subsystem while the seven
constants β, c, d are the same for all subsystems (to ensure additivity) and
are determined by the values of the seven integrals of motion for the whole
system. We thus conclude that the additive integrals of motion are all we need to get the statistical distribution of a closed system (and of any subsystem); those integrals replace all the enormous microscopic information. Considering a system which neither moves nor rotates, we are down to a single integral, the energy. For any subsystem (or any system in contact with a thermostat) we get the Gibbs canonical distribution

ρ(p, q) = A exp[−βE(p, q)] . (13)

See Landau & Lifshitz, Sects 1-4.

2.2 Microcanonical distribution


For a closed system with the energy E0 , Boltzmann assumed that all mi-
crostates with the same energy have equal probability (ergodic hypothesis)
which gives the microcanonical distribution:

ρ(p, q) = Aδ[E(p, q) − E0 ] . (14)

Usually one considers the energy fixed with the accuracy ∆, so that the microcanonical distribution is

ρ = 1/Γ for E ∈ (E0, E0 + ∆) and ρ = 0 otherwise,    (15)

where Γ is the volume of the phase space occupied by the system:

Γ(E, V, N, ∆) = ∫_{E<H<E+∆} d³ᴺp d³ᴺq .    (16)

For example, for N noninteracting particles (ideal gas), the states with the energy E = Σp²/2m are in the p-space near the hyper-sphere with the radius √(2mE). Recall that the surface area of the hyper-sphere with the radius R in 3N-dimensional space is 2π^{3N/2}R^{3N−1}/(3N/2 − 1)!, so that, using the Stirling formula (3N/2 − 1)! ≈ (3N/2e)^{3N/2} and omitting factors of the form cᴺ, we have

Γ(E, V, N, ∆) ∝ E^{3N/2−1}V^N∆/(3N/2 − 1)! ≈ (E/N)^{3N/2}V^N∆ .    (17)

To link statistical physics with thermodynamics one must define the fundamental relation, i.e. a thermodynamic potential as a function of the respective variables. It can be done using either the canonical or the microcanonical distribution. We start from the latter and introduce the entropy as

S(E, V, N) = ln Γ(E, V, N) .    (18)

This is one of the most important formulas in physics⁸ (on a par with F = ma, E = mc² and E = h̄ω).
Noninteracting subsystems are statistically independent so that the sta-
tistical weight of the composite system is a product and entropy is a sum.
For interacting subsystems, this is true only for short-range forces in the
thermodynamic limit N → ∞. Consider two subsystems, 1 and 2, that
can exchange energy. Assume that the indeterminacy in the energy of any
subsystem, ∆, is much less than the total energy E. Then
Γ(E) = Σ_{i=1}^{E/∆} Γ1(Ei) Γ2(E − Ei) .    (19)

We denote by Ē1, Ē2 = E − Ē1 the values that correspond to the maximal term in the sum (19). The derivative of that term is proportional to (∂Γ1/∂Ei)Γ2 − (∂Γ2/∂Ei)Γ1 = Γ1Γ2[(∂S1/∂E1)Ē1 − (∂S2/∂E2)Ē2]. Then the extremum condition is evidently (∂S1/∂E1)Ē1 = (∂S2/∂E2)Ē2, that is, the extremum corresponds to the thermal equilibrium where the temperatures of the subsystems are equal. The equilibrium is thus where the maximum of probability is. It is obvious that Γ(Ē1)Γ(Ē2) ≤ Γ(E) ≤ Γ(Ē1)Γ(Ē2)E/∆. If the system consists of N particles and N1, N2 → ∞, then S(E) = S1(Ē1) + S2(Ē2) + O(log N), where the last term is negligible.
Identification with the thermodynamic entropy can be done considering any system, for instance, an ideal gas. The problem is that the logarithm of (17) contains the non-extensive term N ln V. The resolution of this controversy is that we need to treat the particles as indistinguishable; otherwise we would need to account for the entropy of mixing different species. We implicitly assume that mixing different parts of the same gas is a reversible process, which presumes that the particles are identical. For identical particles, one needs to divide Γ (17) by the number of permutations N!, which makes the resulting entropy of the ideal gas extensive: S(E, V, N) = (3N/2) ln(E/N) + N ln(V/N) + const. Note that quantum particles (atoms and molecules) are indeed indistinguishable, which is expressed by a proper symmetrization of the wave function. One can only wonder at the genius of Gibbs who introduced N! long before quantum mechanics (see L&L 40 or Pathria 1.5 and 6.1). Defining temperature in the usual way, T⁻¹ = ∂S/∂E = 3N/2E, we get the correct expression E = 3NT/2. We express here temperature in energy units. To pass to Kelvin degrees, one transforms T → kT and S → kS, where the Boltzmann constant k = 1.38 · 10⁻²³ J/K. The value of the classical entropy (18) depends on the units. The proper quantitative definition comes from quantum physics, with Γ being the number of microstates that correspond to a given value of the macroscopic parameters. In the quasi-classical limit the number of states is obtained by dividing the phase space into units with ∆p∆q = 2πh̄.

⁸ It is inscribed on Boltzmann's gravestone.
The same definition (entropy as a logarithm of the number of states)
is true for any system with a discrete set of states. For example, consider
the set of N two-level systems with levels 0 and ϵ. If the energy of the set is E, then there are L = E/ϵ upper levels occupied. The statistical weight is determined by the number of ways one can choose L out of N: Γ(N, L) = C_N^L = N!/L!(N − L)!. We can now define the entropy (i.e. find the fundamental relation): S(E, N) = ln Γ. Considering N ≫ 1 and L ≫ 1, we can use the Stirling formula in the form d ln L!/dL = ln L and derive the equation of state (the temperature-energy relation),

T⁻¹ = ∂S/∂E = ϵ⁻¹(∂/∂L) ln[N!/L!(N − L)!] = ϵ⁻¹ ln[(N − L)/L] ,

and the specific heat C = dE/dT = N(ϵ/2T)² cosh⁻²(ϵ/2T). Note that the ratio of the number of particles on the upper level to that on the lower level is L/(N − L) = exp(−ϵ/T) (the Boltzmann relation). The specific heat turns into zero both at low temperatures (too small portions of energy are "in circulation") and at high temperatures (the occupation numbers of the two levels are already close to equal).
The derivation of thermodynamic fundamental relation S(E, . . .) in the
microcanonical ensemble is thus via the number of states or phase volume.

2.3 Canonical distribution


Let us re-derive the canonical distribution from the microcanonical one, which allows us to specify β = 1/T in (12,13). Consider a small subsystem or a system in contact with a thermostat (which can be thought of as consisting of infinitely many copies of our system — this is the so-called canonical ensemble, characterized by N, V, T). Here our system can have any energy, and the question arises what is the probability W(E). Let us first find the probability for the system to be in a given microstate a with the energy E. Assuming that all the states of the thermostat are equally likely to occur, we see that the probability should be directly proportional to the statistical weight of the thermostat Γ0(E0 − E), where we evidently assume that E ≪ E0, expand Γ0(E0 − E) = exp[S0(E0 − E)] ≈ exp[S0(E0) − E/T] and obtain

wa(E) = Z⁻¹ exp(−E/T) ,    (20)

Z = Σa exp(−Ea/T) .    (21)
a

Note that there is no trace of the thermostat left except for the temperature.
The normalization factor Z(T, V, N ) is a sum over all states accessible to the
system and is called the partition function.
The probability to have a given energy is the probability of the state (20)
times the number of states:

W (E) = Γ(E)wa (E) = Γ(E)Z −1 exp(−E/T ) . (22)

Here Γ(E) grows fast while exp(−E/T ) decays fast when the energy E grows.
As a result, W (E) is concentrated in a very narrow peak and the energy
fluctuations around Ē are very small (see Sect. 2.4 below for more details).
For example, for an ideal gas W (E) ∝ E 3N/2 exp(−E/T ). Let us stress again
that the Gibbs canonical distribution (20) tells that the probability of a given
microstate exponentially decays with the energy of the state while (22) tells
that the probability of a given energy has a peak.
An alternative and straightforward way to derive the canonical distri-
bution is to use consistently the Gibbs idea of the canonical ensemble as a
virtual set, of which the single member is the system under consideration
and the energy of the total set is fixed. The probability to have our system
in the state a with the energy Ea is then given by the average number of
systems n̄a in this state divided by the total number of systems N . The set
of occupation numbers {na} = (n0, n1, n2, . . .) satisfies the obvious conditions

Σa na = N ,    Σa Ea na = E = ϵN .    (23)

Any given set is realized in W{na} = N!/(n0! n1! n2! . . .) ways, and
the probability to realize the set is proportional to the respective W:

n̄a = Σ na W{na} / Σ W{na} ,    (24)

where the summation goes over all the sets that satisfy (23). We assume that in the limit when N, na → ∞ the main contribution into (24) is given by the most probable distribution, which is found by looking at the extremum of ln W − α Σa na − β Σa Ea na. Using the Stirling formula ln n! = n ln n − n, we write ln W = N ln N − Σa na ln na, and the extremum n*a corresponds to ln n*a = −α − 1 − βEa, which gives

n*a / N = exp(−βEa) / Σa exp(−βEa) .    (25)

The parameter β is given implicitly by the relation

E/N = ϵ = Σa Ea exp(−βEa) / Σa exp(−βEa) .    (26)

Of course, physically ϵ(β) is usually more relevant than β(ϵ). See Pathria,
Sect 3.2.
To get thermodynamics from the Gibbs distribution, one needs to define the free energy, because we are at constant temperature. This is done via the partition function Z (which is of central importance, since macroscopic quantities are generally expressed via its derivatives):

F(T, V, N) = −T ln Z(T, V, N) .    (27)

To prove that, differentiate the identity Σa exp[(F − Ea)/T] = 1 with respect to temperature, which gives

F = Ē + T(∂F/∂T)V ,

equivalent to F = E − T S in thermodynamics.
One can also come to this by defining entropy. Recall that for a closed system we defined S = ln Γ, while the probability of a state was wa = 1/Γ. For a system in contact with a thermostat that has a Gibbs distribution, ln wa is linear in E, so that

S(Ē) = −ln wa(Ē) = −⟨ln wa⟩ = −Σ wa ln wa    (28)
     = Σ wa(Ea/T + ln Z) = Ē/T + ln Z .

Even though we derived the formula for entropy, S = −Σ wa ln wa, for equilibrium, this definition can be used for any set of probabilities wa, since it provides a useful measure of our ignorance about the system, as we shall see later.
See Landau & Lifshitz (Sects 31,36).

2.4 Grand canonical ensemble


Let us now repeat the derivation done in Sect. 2.3, but in more detail and considering also the fluctuations in the particle number N. Consider a subsystem in contact with a particle-energy reservoir. The probability for the subsystem to have N particles and to be in a state EaN can be obtained by expanding the entropy of the reservoir. Let us first do the expansion up to the first-order terms, as in (20,21):

waN = A exp[S(E0 − EaN, N0 − N)] = A exp[S(E0, N0) + (µN − EaN)/T]
    = exp[(Ω + µN − EaN)/T] .    (29)

Here we used ∂S/∂E = 1/T, ∂S/∂N = −µ/T and introduced the grand canonical potential, which can be expressed through the grand partition function:

Ω(T, V, µ) = −T ln ΣN exp(µN/T) Σa exp(−EaN/T) .    (30)
N a

That this is equivalent to the thermodynamic definition, Ω = Ē − T S̄ − µN̄, can be seen by calculating the mean entropy similarly to (28):

S̄ = −Σ_{a,N} waN ln waN = (µN̄ + Ω − Ē)/T .    (31)

The grand canonical distribution must be equivalent to canonical if one


neglects the fluctuations in particle numbers. Indeed, when we put N = N̄
the thermodynamic relation gives Ω + µN̄ = F so that (29) coincides with
the canonical distribution wa = exp[(F − Ea )/T ].
Generally, there is a natural hierarchy: microcanonical distribution ne-
glects fluctuations in energy and number of particles, canonical distribution
neglects fluctuations in N but accounts for fluctuations in E, and eventually
grand canonical distribution accounts for fluctuations both in E and N . The
distributions are equivalent as long as fluctuations are small.

To describe the fluctuations one needs to expand the respective thermodynamic potential around the mean value, using the second derivatives ∂²S/∂E² and ∂²S/∂N² (which must be negative for stability). That will give Gaussian distributions of E − Ē and N − N̄. A straightforward way to find the energy variance ⟨(E − Ē)²⟩ is to differentiate with respect to β the identity ⟨E − Ē⟩ = 0. For this purpose one can use the canonical distribution and get

(∂/∂β) Σa (Ea − Ē) e^{β(F−Ea)} = Σa (Ea − Ē)(F + β ∂F/∂β − Ea) e^{β(F−Ea)} − ∂Ē/∂β = 0 ,

⟨(E − Ē)²⟩ = −∂Ē/∂β = T²CV .    (32)

The magnitude of fluctuations is thus determined by the second derivative of the respective thermodynamic potential (which is CV). Since both Ē and CV are proportional to N, the relative fluctuations are small indeed: ⟨(E − Ē)²⟩/Ē² ∝ N⁻¹. In what follows (as in most of what preceded) we do not distinguish between E and Ē.
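For instance, for the ideal monatomic gas, Ē = 3NT/2 and CV = 3N/2, so that ⟨(E − Ē)²⟩/Ē² = 2/(3N); for N ∼ 10²⁰ the root-mean-square relative fluctuation of the energy is of order 10⁻¹⁰.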
Let us now discuss the fluctuations of particle number. One gets the
probability to have N particles by summing (29) over a:

W (N ) ∝ exp{β[µ(T, V )N − F (T, V, N )]}

where F (T, V, N ) is the free energy calculated from the canonical distribu-
tion for N particles in volume V and temperature T . The mean value N̄
is determined by the extremum of probability: (∂F/∂N)N̄ = µ. The second derivative determines the width of the distribution over N, that is, the variance:

⟨(N − N̄)²⟩ = T(∂²F/∂N²)⁻¹ = −T N v⁻² (∂P/∂v)⁻¹ ∝ N .    (33)

Here we used the fact that F(T, V, N) = Nf(T, v) with v = V/N, so that P = −(∂F/∂V)N = −∂f/∂v, and substituted the derivatives calculated at fixed V: (∂F/∂N)V = f(v) − v∂f/∂v and (∂²F/∂N²)V = N⁻¹v²∂²f/∂v² = −N⁻¹v²∂P(v)/∂v. As we discussed in Thermodynamics, ∂P(v)/∂v < 0 for stability. We see that generally the fluctuations are small unless the isothermal compressibility diverges, which happens at first-order phase transitions. The particle number (and density) strongly fluctuates in such systems, which contain different phases of different densities. This is why one uses the grand canonical ensemble in such cases (as we shall do in Chapter 4 below). Note that any extensive quantity f = Σ_{i=1}^N f_i which is a sum over independent subsystems (i.e. ⟨f_i f_k⟩ = f̄_i f̄_k) has small relative fluctuations: (⟨f²⟩ − f̄²)/f̄² ∝ 1/N.
See also Landau & Lifshitz 35 and Huang 8.3-5.

2.5 Two simple examples


Here we consider two examples with the simplest structures of energy levels
to illustrate the use of microcanonical and canonical distributions.

2.5.1 Two-level system


Assume levels 0 and ϵ. Recall that in Sect. 2.2 we already considered the two-level system in the microcanonical approach, calculating the number of ways one can distribute L = E/ϵ portions of energy between N particles and obtaining S(E, N) = ln C_N^L = ln[N!/L!(N − L)!] ≈ N ln[N/(N − L)] + L ln[(N − L)/L]. The temperature in the microcanonical approach is as follows:

T⁻¹ = ∂S/∂E = ϵ⁻¹(∂/∂L) ln[N!/L!(N − L)!] = ϵ⁻¹ ln[(N − L)/L] .    (34)
The entropy as a function of energy is drawn on the Figure:
[Figure: S(E) for the two-level system — zero at E = 0 and E = Nϵ, maximum at E = Nϵ/2 where T = ±∞; T → +0 at E → 0 and T → −0 at E → Nϵ.]
Indeed, entropy is zero at E = 0, N ϵ when all the particles are in the same
state. The entropy is symmetric about E = N ϵ/2. We see that when
E > N ϵ/2 then the population of the higher level is larger than of the
lower one (inverse population as in a laser) and the temperature is negative.
Negative temperature may happen only in systems with an upper limit of energy levels, and simply means that by adding energy beyond some level we actually decrease the entropy, i.e. the number of accessible states. Available (non-equilibrium) states lie below the S(E) plot; notice that the entropy maximum corresponds to the energy minimum for positive temperatures and to the energy maximum for the negative-temperature part. A glance at the figure also shows that when a system with a negative temperature is brought into contact with a thermostat (having positive temperature), our system gives away energy (a laser generates and emits light), decreasing its temperature further until it passes through infinity to positive values and eventually reaches the temperature of the thermostat. That is, negative temperatures are actually "hotter" than positive ones.
Let us stress that there is no volume in S(E, N); that is, we consider only a subsystem, or only part of the degrees of freedom. Indeed, real particles have kinetic energy unbounded from above and can correspond only to positive temperatures [negative temperature and infinite energy would give an infinite Gibbs factor exp(−E/T)]. Apart from the laser, an example of a two-level system is a spin 1/2 in a magnetic field H. Because the interaction between the spins and atomic motions (spin-lattice relaxation) is weak, the spin system keeps its separate temperature for a long time (tens of minutes) and can be considered separately.
External fields are parameters (like volume and chemical potential) that
determine the energy levels of the system. They are sometimes called gen-
eralized thermodynamic coordinates, and the derivatives of the energy with
respect to them are called the respective forces. Let us derive the generalized force M that corresponds to the magnetic field and determines the work done under a change of the magnetic field: dE = T dS − M dH. Since the projection of every magnetic moment on the direction of the field can take two values ±µ, the magnetic energy of a particle is ∓µH and E = −µ(N+ − N−)H. The force (the partial derivative of the energy with respect to the field at fixed entropy) is called the magnetization or the magnetic moment of the system:

M = −(∂E/∂H)S = µ(N+ − N−) = Nµ [exp(µH/T) − exp(−µH/T)] / [exp(µH/T) + exp(−µH/T)] .    (35)

The derivative was taken at constant entropy, that is, at constant populations N+ and N−. Note that negative temperature for the spin system corresponds to the magnetic moment opposite in direction to the applied magnetic field. Such states are experimentally prepared by a fast reversal of the magnetic field. We can also define the magnetic susceptibility: χ(T) = (∂M/∂H)H=0 = Nµ²/T.

At weak fields and positive temperature, µH ≪ T, (35) gives the formula for the so-called Pauli paramagnetism:

M/Nµ = µH/T .    (36)
Para means that the majority of moments point in the direction of the external field. This formula shows, in particular, a remarkable property of the spin system: an adiabatic change of the magnetic field (which keeps N+, N− and thus M) is equivalent to a change of temperature, even though the spins do not exchange energy. One can say that under a change of the value of the homogeneous magnetic field, the relaxation is instantaneous in the spin system. This property is used for cooling substances that contain paramagnetic impurities. Note that the entropy of the spin system does not change when the field changes slowly compared to the spin-spin relaxation and fast compared to the spin-lattice relaxation.
To conclude, let us treat the two-level system in the canonical approach, where we calculate the partition function and the free energy:

Z(T, N) = Σ_{L=0}^N C_N^L exp[−Lϵ/T] = [1 + exp(−ϵ/T)]^N ,    (37)
F(T, N) = −T ln Z = −NT ln[1 + exp(−ϵ/T)] .    (38)

We can now re-derive the entropy as S = −∂F/∂T and derive the (mean) energy and specific heat:

Ē = Z⁻¹ Σa Ea exp(−βEa) = −∂ ln Z/∂β = T² ∂ ln Z/∂T    (39)
  = Nϵ/[1 + exp(ϵ/T)] ,    (40)
C = dE/dT = (Nϵ²/T²) exp(ϵ/T) [1 + exp(ϵ/T)]⁻² .    (41)

Note that (39) is a general formula which we shall use in the future. The specific heat turns into zero both at low temperatures (too small portions of energy are "in circulation") and at high temperatures (the occupation numbers of the two levels are already close to equal).

[Figure: the specific heat C/N of the two-level system versus T/ϵ, with a peak of height somewhat below 1/2.]

A specific heat of this form, characterized by a peak, is observed in all systems with an excitation gap.
More details can be found in Kittel, Section 24 and Pathria, Section 3.9.
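A few lines of Python (a numerical illustration, assuming numpy is available) reproduce this peak directly from (41):

import numpy as np

eps = 1.0                                 # level spacing (sets the units)
T = np.linspace(0.05, 3.0, 3000)          # temperatures in units of eps
x = eps / T
C_over_N = x**2 * np.exp(x) / (1 + np.exp(x))**2   # specific heat per particle, Eq. (41)

i = np.argmax(C_over_N)
print(f"peak C/N = {C_over_N[i]:.2f} at T/eps = {T[i]:.2f}")
# prints approximately: peak C/N = 0.44 at T/eps = 0.42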

2.5.2 Harmonic oscillators


Small oscillations around the equilibrium positions (say, of atoms in the
lattice or in the molecule) can be treated as harmonic and independent.
The harmonic oscillator is described by the Hamiltonian

H(q, p) = (p² + m²ω²q²)/2m .    (42)

We start from the quasi-classical limit, h̄ω ≪ T, when the single-oscillator partition function is obtained by Gaussian integration:

Z1(T) = (2πh̄)⁻¹ ∫_{−∞}^{∞} dp ∫_{−∞}^{∞} dq exp(−H/T) = T/h̄ω .    (43)
We can now get the partition function of N independent oscillators as Z(T, N) = Z1^N(T) = (T/h̄ω)^N, the free energy F = NT ln(h̄ω/T) and the mean energy from (39): E = NT — this is an example of equipartition (every oscillator has two degrees of freedom, with T/2 of energy for each)⁹. The thermodynamic equations of state are µ(T) = T ln(h̄ω/T) and S = N[ln(T/h̄ω) + 1], while the pressure is zero because there is no volume dependence. The specific heat CP = CV = N.

Apart from the thermodynamic quantities, one can write the probability distribution of the coordinate, which is given by the Gibbs distribution using the potential energy:

dwq = ω(2πT)^{−1/2} exp(−ω²q²/2T) dq .    (44)


⁹ If some variable x enters the energy as x^{2n}, then the mean energy associated with that degree of freedom is ∫x^{2n} exp(−x^{2n}/T)dx / ∫exp(−x^{2n}/T)dx = T/2n.

Using the kinetic energy and simply replacing q → p/ω, one obtains a similar formula, dwp = (2πT)^{−1/2} exp(−p²/2T)dp, which is the Maxwell distribution.
For the quantum case, the energy levels are given by En = h̄ω(n + 1/2). The single-oscillator partition function

Z1(T) = Σ_{n=0}^{∞} exp[−h̄ω(n + 1/2)/T] = [2 sinh(h̄ω/2T)]⁻¹    (45)

gives again Z(T, N) = Z1^N(T) and F(T, N) = NT ln[2 sinh(h̄ω/2T)] = Nh̄ω/2 + NT ln[1 − exp(−h̄ω/T)]. The energy now is

E = Nh̄ω/2 + Nh̄ω[exp(h̄ω/T) − 1]⁻¹ ,

where one sees the contribution of zero-point quantum oscillations and the breakdown of classical equipartition. The specific heat is as follows: CP = CV = N(h̄ω/T)² exp(h̄ω/T)[exp(h̄ω/T) − 1]⁻². Comparing with (41), we see the
same behavior at T ≪ h̄ω: CV ∝ exp(−h̄ω/T), because "too small energy portions are in circulation" and they cannot move the system to the next level. At large T, the specific heat of the two-level system turns into zero, because the occupation numbers of both levels are almost equal, while for the oscillator we have classical equipartition (every oscillator has two degrees of freedom, so it has T in energy and 1 in CV).
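Indeed, at T ≫ h̄ω one can expand exp(h̄ω/T) ≈ 1 + h̄ω/T, so that CV = N(h̄ω/T)² exp(h̄ω/T)[exp(h̄ω/T) − 1]⁻² → N, the classical equipartition value.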
[Figure: the specific heat C/N of the harmonic oscillator, saturating at the classical value 1 at high temperature.]

The quantum analog of (44) must be obtained by summing the wave functions of the quantum oscillator with the respective probabilities:

dwq = a dq Σ_{n=0}^{∞} |ψn(q)|² exp[−h̄ω(n + 1/2)/T] .    (46)

Here a is the normalization factor. A straightforward (and beautiful) calculation of (46) can be found in Landau & Lifshitz, Sect. 30. Here we note that the distribution must be Gaussian, dwq ∝ exp(−q²/2⟨q²⟩), where the mean-square displacement ⟨q²⟩ can be read from the expression for the energy, so that one gets:

dwq = [(ω/πh̄) tanh(h̄ω/2T)]^{1/2} exp[−q²(ω/h̄) tanh(h̄ω/2T)] dq .    (47)

At h̄ω ≪ T it coincides with (44), while in the opposite (quantum) limit it gives dwq = (ω/πh̄)^{1/2} exp(−q²ω/h̄)dq, which is the purely quantum formula |ψ0|² for the ground state of the oscillator.
See also Pathria, Sect. 3.7 for more details.

3 Entropy and information


By definition, the entropy of a closed system determines the number of available states (or, classically, the phase volume). Assuming that the system spends comparable time in different available states, we conclude that, since equilibrium must be the most probable state, it corresponds to the entropy maximum. If the system happens to be out of equilibrium at a given moment of time [say, the energy distribution between the subsystems is different from the most probable Gibbs distribution (26)], then it is more probable to go towards equilibrium, that is, to increase entropy. This is the microscopic (probabilistic) interpretation of the second law of thermodynamics formulated by Clausius in 1865. Note that the probability maximum is very sharp in the thermodynamic limit, since exp(S) grows exponentially with the system size. That means that for macroscopic systems the probability to pass into states with lower entropy is so vanishingly small that such events are never observed.
Dynamics (classical and quantum) is time-reversible. Entropy growth is related not to the trajectory of a single point in phase space but to the behavior of finite regions (i.e. sets of such points). Consideration of finite regions is called coarse graining; it is the main feature of the statistical-physics approach, responsible for the irreversibility of statistical laws.

3.1 Lyapunov exponent


The dynamical background of entropy growth is the separation of trajec-
tories in phase space so that trajectories started from a small finite region
fill larger and larger regions of phase space as time proceeds. The relative
motion is determined by the velocity difference between neighboring points:
δvi = rj ∂vi/∂xj = rj σij. Here x = (p, q) is the 6N-dimensional vector of the position and v = (ṗ, q̇) is the velocity in the phase space. The trace
of the tensor σij is the rate of the volume change which must be zero ac-
cording to the Liouville theorem (that is a Hamiltonian dynamics imposes
an incompressible flow in the phase space). We can decompose the tensor
of velocity derivatives into an antisymmetric part (which describes rotation)
and a symmetric part (which describes deformation). We are interested here
in deformation because it is the mechanism of the entropy growth. The sym-
metric tensor, Sij = (∂vi /∂xj + ∂vj /∂xi )/2, can be always transformed into
a diagonal form by an orthogonal transformation (i.e. by the rotation of
the axes). The diagonal components are the rates of stretching in different
directions. Indeed, the equation for the distance between two points along a
principal direction has a form: ṙi = δvi = ri Sii (no summation over i). The
solution is as follows:
ri(t) = ri(0) exp[∫₀ᵗ Sii(t′) dt′] .    (48)

For a time-independent strain, the growth/decay is exponential in time. One


recognizes that a purely straining motion converts a spherical element into an
ellipsoid with the principal diameters that grow (or decay) in time. Indeed,
consider a two-dimensional projection of the initial spherical element, i.e. a circle of radius R at t = 0. The point that starts at (x0, y0) with y0 = √(R² − x0²) goes into

x(t) = e^{S11 t} x0 ,
y(t) = e^{S22 t} y0 = e^{S22 t} √(R² − x0²) = e^{S22 t} √(R² − x²(t) e^{−2S11 t}) ,
x²(t) e^{−2S11 t} + y²(t) e^{−2S22 t} = R² .    (49)
The equation (49) describes how the initial circle turns into the ellipse whose
eccentricity increases exponentially with the rate |S11 − S22 |.

Figure 1: Deformation of a phase-space element by a permanent strain.

Of course, as the system moves in the phase space, both the strain val-
ues and the orientation of the principal directions change, so that expand-

ing direction may turn into a contracting one and vice versa. The ques-
tion is whether averaging over all possibilities gives a zero net result. One
can show that in a general case an exponential stretching persists on av-
erage and the majority of trajectories separate. To see that, consider as
an example two-dimensional incompressible saddle-point flow (pure strain):
vx = λx, vy = −λy. The vector r = (x, y) (which is supposed to char-
acterize the distance between two close trajectories) satisfies the equations
ẋ = vx and ẏ = vy . Whether the vector is stretched or contracted after
some time T depends on its orientation and on T. Since x(t) = x₀ exp(λt)
and y(t) = y₀ exp(−λt) = x₀y₀/x(t), every trajectory is a hyperbola.
A unit vector initially forming an angle φ with the x axis will have the
length [cos²φ exp(2λT) + sin²φ exp(−2λT)]^{1/2} after time T. The vector is
stretched if cos φ ≥ [1 + exp(2λT)]^{−1/2}; since this threshold is less than 1/√2,
the fraction of stretched directions is larger than half. When along the motion
all orientations are equally probable, the net effect is stretching.
Figure 2: The distance of the point from the origin increases if the angle is
less than φ₀ = arccos{[1 + exp(2λT)]^{−1/2}} > π/4. Note that for φ = φ₀ the
initial and final points are symmetric relative to the diagonal.

This is formally proved in mathematics by considering random σ̂(t) and the


transfer matrix Ŵ defined by r(t) = Ŵ (t, t1 )r(t1 ). It satisfies the equation
dŴ /dt = σ̂ Ŵ . The Liouville theorem tr σ̂ = 0 means that det Ŵ = 1. The
modulus r of the separation vector may be expressed via the positive symmetric
matrix Ŵ T Ŵ . The main result (Furstenberg and Kesten 1960; Oseledec, 1968)
states that in almost every realization σ̂(t), the matrix 1t ln Ŵ T (t, 0)Ŵ (t, 0) tends
to a finite limit as t → ∞. In particular, its eigenvectors tend to d fixed orthonor-
mal eigenvectors f_i. Geometrically, that precisely means that an initial sphere
evolves into an elongated ellipsoid at later times. As time increases, the ellipsoid
is more and more elongated and it is less and less likely that the hierarchy of the

ellipsoid axes will change. The limiting eigenvalues

λ_i = lim_{t→∞} t^{−1} ln |Ŵ f_i| (50)

define the so-called Lyapunov exponents. The sum of the exponents is zero due to
the Liouville theorem so there exists at least one positive exponent which corre-
sponds to stretching. Mathematical lesson to learn is that multiplying N random
matrices with unit determinant (recall that the determinant is the product of
eigenvalues), one generally gets some eigenvalues growing (and some decreasing)
exponentially with N .
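This lesson is easy to see numerically. Below is a minimal sketch (Python, not part of the original notes; the random shear matrices are just one convenient way to generate unit-determinant matrices):

    import numpy as np

    # Sketch: multiply many random 2x2 matrices with det = 1 (products of
    # two random shears) and estimate the largest Lyapunov exponent from
    # the growth of ln|W_N ... W_1 r|, renormalizing to avoid overflow.
    rng = np.random.default_rng(0)
    r = np.array([1.0, 0.0])
    log_norm, N = 0.0, 10000
    for _ in range(N):
        W = np.array([[1.0, rng.normal()], [0.0, 1.0]]) @ \
            np.array([[1.0, 0.0], [rng.normal(), 1.0]])   # det W = 1
        r = W @ r
        s = np.linalg.norm(r)
        log_norm += np.log(s)   # accumulate the log of the stretching
        r /= s
    print(f"largest Lyapunov exponent per step: {log_norm / N:.3f}")  # positive

The estimate converges to a positive number: a generic product of volume-preserving linear maps stretches almost every vector exponentially.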
The probability to find a ball turning into an exponentially stretching
ellipse goes to unity as time increases. The physical reason for it is that
substantial deformation appears sooner or later. To reverse it, one needs
to contract the long direction, that is the direction of contraction must be
inside the narrow angle defined by the ellipse eccentricity, which is unlikely.
Randomly oriented deformations on average continue to increase the eccen-
tricity. After the strip length reaches the scale of the velocity change (when
one already cannot approximate the phase-space flow by a linear profile σ̂r),
the strip starts to fold, continuing locally the exponential stretching. Eventually,
one can find points from the initial ball everywhere, which means that
the flow is mixing.
On the figure, one can see how the black square of initial conditions (at
the central box) is stretched in one (unstable) direction and contracted in an-
other (stable) direction so that it turns into a long narrow strip (left and right
boxes). Rectangles in the right box show finite resolution (coarse-graining).
Viewed with such resolution, our set of points occupies larger phase volume
(i.e. corresponds to larger entropy) at t = ±T than at t = 0. Time reversibil-
ity of any particular trajectory in the phase space does not contradict the
time-irreversible filling of the phase space by the set of trajectories considered
with a finite resolution. By reversing time we exchange stable and unstable
directions but the fact of space filling persists.
[Figure: phase-space portraits (p, q) at t = −T, t = 0 and t = T, showing the initial square stretched into long narrow strips.]

When the density spreads, entropy grows. For example, consider a simple
case, when the spread of the probability density ρ(r, t) can be described by
simple diffusion (in Sect 8 below we show that this is possible on a timescale
where one can consider the motion as a series of uncorrelated random walks):
∂ρ/∂t = κ∆ρ. Entropy increases monotonically under diffusion:

dS/dt = −(d/dt)∫ρ(r, t) ln ρ(r, t) dr = −κ∫∆ρ ln ρ dr = κ∫[(∇ρ)²/ρ] dr ≥ 0 . (51)
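One can check (51) numerically; here is a minimal sketch (Python; the one-dimensional periodic grid and the explicit Euler step are assumed for illustration):

    import numpy as np

    # Sketch: the discrete entropy S = -sum(rho ln rho) grows monotonically
    # as a localized density spreads by diffusion on a 1D periodic grid.
    n, kappa, dt = 200, 1.0, 0.1          # dt*kappa <= 0.5 for stability
    rho = np.zeros(n)
    rho[95:105] = 0.1                     # localized density, normalized to 1
    for step in range(2001):
        if step % 500 == 0:
            p = rho[rho > 0]
            print(f"step {step:4d}   S = {-np.sum(p * np.log(p)):.4f}")
        lap = np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)  # discrete Laplacian
        rho += kappa * dt * lap

The printed entropy grows monotonically towards its maximum ln n, reached for the uniform density.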

3.2 Adiabatic processes and the third law


The second law of thermodynamics is valid not only for isolated systems but
also for systems in the (time-dependent) external fields or under external
conditions changing in time as long as there is no heat exchange, that is
for systems that can be described by the microscopic Hamiltonian H(p, q, λ)
with some parameter λ(t) slowly changing with time. That means that the
environment is not a macroscopic body with hidden degrees of freedom but
is completely determined by the value of the single parameter λ, that is the
entropy of the environment is zero. In particular, λ can be the system volume
since the walls can be thought of as confining potential. If temporal changes
are slow enough then the entropy of the system does not change, i.e. the
process is adiabatic. Indeed, the positivity of Ṡ = dS/dt requires that the
expansion of Ṡ(λ̇) starts from the second term:

dS/dt = (dS/dλ)·(dλ/dt) = A(dλ/dt)² ⇒ dS/dλ = A dλ/dt . (52)

We see that when dλ/dt goes to zero, the entropy is getting independent of λ.
That means that we can change λ (say, volume) by a finite amount while making
the entropy change arbitrarily small by doing it slowly enough.
During the adiabatic process the system is assumed to be in thermal
equilibrium at any instant of time (as in quasi-static processes defined in
thermodynamics). Changing λ (called coordinate) one changes the energy
levels Ea and the total energy. Respective force (pressure when λ is volume,
magnetic or electric moments when λ is the respective field) is obtained as the
average (over the equilibrium statistical distribution) of the energy derivative
with respect to λ:
⟨∂H(p, q, λ)/∂λ⟩ = Σ_a w_a ∂E_a/∂λ = (∂/∂λ) Σ_a w_a E_a = (∂E(S, λ, . . .)/∂λ)_S . (53)

We see that the force is equal to the derivative of the thermodynamic energy
at constant entropy because the probabilities wa do not change. Note that
in an adiabatic process all w_a are assumed to be constant, i.e. the entropy of
any subsystem is conserved. This is more restrictive than the condition of
reversibility which requires only the total entropy to be conserved. In other
words, the process can be reversible but not adiabatic. See Landau & Lifshitz
(Section 11) for more details.
The last statement we make here about entropy is the third law of thermo-
dynamics (Nernst theorem) which claims that S → 0 as T → 0. A standard
argument is that since stability requires the positivity of the specific heat
c_v, the energy must increase monotonically with the temperature, and
zero temperature corresponds to the ground state. If the ground state is
non-degenerate (unique) then S = 0. Since generally the degeneracy of the
ground state grows slower than exponentially with N , then the entropy per
particle is zero in the thermodynamic limit. While this argument is correct
it is relevant only for temperatures less than the energy difference between
the first excited state and the ground state. As such, it has nothing to do
with the third law established generally for much higher temperatures and
related to the density of states as function of energy. We shall discuss it later
considering Debye theory of solids. See Huang (Section 9.4) for more details.

3.3 Information theory approach


Here I briefly re-tell the story of statistical physics using a different language.
An advantage of using different formulations is that it helps to understand
things better and triggers different intuition in different people.
Consider first a simple problem in which we are faced with a choice among
n equal possibilities (say, in which of n boxes a candy is hidden). How much
do we not know? Let us denote the missing information by I(n). Clearly,
the information is an increasing function of n and I(1) = 0. If we have a
few independent problems then the information must be additive. For example,
consider each box to have m compartments: I(nm) = I(n) + I(m). Now, we
can write (Shannon, 1948)

I(n) = I(e) ln n = k ln n (54)

That it must be a logarithm is also clear from obtaining the missing information
by asking a sequence of questions about which half contains the box with
the candy: one then needs log₂ n such questions and respective one-bit
answers. We can easily generalize the definition (54) to non-integer rational
numbers by I(n/l) = I(n) − I(l) and to all positive real numbers by
considering limits of the series and using monotonicity.
If we have an alphabet with n symbols then a message of length N
can potentially be one of n^N possibilities, so that it brings the information
kN ln n, or k ln n per symbol. If all the 26 letters of the English alphabet
were used with the same frequency then the word "love" would bring the
information equal to 4k ln 26.
[Illustration: a message of N symbols, each chosen from an alphabet of n letters A . . . Z.]
In reality though it brings even less information (no matter how emotional
we can get) since we know that letters are used with different frequencies.
Indeed, consider the situation when there is a probability wi assigned to
each letter (or box) i = 1, . . . , n. Now if we want to evaluate the missing
information (or, the information that one symbol brings us on average) we
ought to think about repeating our choice N times. As N → ∞ we know
that the candy is in the i-th box in N w_i cases, but we do not know the order
in which different possibilities appear. The total number of orders is
N!/Π_i(N w_i)! and the missing information is

I_N = k ln[N!/Π_i(N w_i)!] ≈ −N k Σ_i w_i ln w_i + O(ln N) . (55)

The missing information per problem (or per symbol in the language) coincides
with the entropy (28):

I(w₁ . . . w_n) = lim_{N→∞} I_N/N = −k Σ_{i=1}^{n} w_i ln w_i . (56)

Incidentally, for the English language the information per symbol is

−Σ_{i=a}^{z} w_i log₂ w_i ≈ 4.11 bits .

The information (56) is zero for delta-distribution wi = δij ; it is generally
less than the information (54) and coincides with it only for equal probabil-
ities, wi = 1/n, when the entropy is maximum. Indeed, equal probabilities
we ascribe when there is no extra information, i.e. in a state of maximum
ignorance. Mathematically, the property

I(1/n, . . . , 1/n) ≥ I(w1 . . . wn ) (57)

is called convexity. It follows from the fact that the function of a single
variable s(w) = −w ln w is strictly downward convex (concave) since its
second derivative, −1/w, is everywhere negative for positive w. For a concave
function, the average over the set of points wi is less or equal to the function
at the average value (so-called Jensen inequality):
(1/n) Σ_{i=1}^{n} s(w_i) ≤ s((1/n) Σ_{i=1}^{n} w_i) . (58)

From here one gets the entropy inequality:

I(w₁ . . . w_n) = Σ_{i=1}^{n} s(w_i) ≤ n s((1/n) Σ_{i=1}^{n} w_i) = n s(1/n) = I(1/n, . . . , 1/n) . (59)

The relation (58) can be proven by induction using the convexity condition,
which states that the linear interpolation between two points a, b lies everywhere
below the function graph: s(λa + (1 − λ)b) ≥ λs(a) + (1 − λ)s(b) for
any λ ∈ [0, 1]. Now we can choose λ = (n − 1)/n, a = (n − 1)^{−1} Σ_{i=1}^{n−1} w_i and
b = w_n to see that

s((1/n) Σ_{i=1}^{n} w_i) = s(((n − 1)/n)(n − 1)^{−1} Σ_{i=1}^{n−1} w_i + w_n/n)
  ≥ ((n − 1)/n) s((n − 1)^{−1} Σ_{i=1}^{n−1} w_i) + (1/n) s(w_n)
  ≥ (1/n) Σ_{i=1}^{n−1} s(w_i) + (1/n) s(w_n) = (1/n) Σ_{i=1}^{n} s(w_i) . (60)

In the last line we used the validity of (58) for n − 1 to prove it for n.
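The inequality is also easy to check numerically; a small sketch (Python, not part of the notes):

    import numpy as np

    # Sketch: draw random probability vectors and check that the missing
    # information (56) (with k = 1) never exceeds the equal-probability
    # value ln n, i.e. the inequality (57)/(59).
    rng = np.random.default_rng(1)
    n = 10
    for _ in range(5):
        w = rng.random(n)
        w /= w.sum()                 # normalize to a probability distribution
        I = -np.sum(w * np.log(w))
        print(f"I = {I:.3f}  <=  ln n = {np.log(n):.3f}")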
Note that when n → ∞ then (54) diverges while (56) may well be finite.
We can generalize (56) for a continuous distribution by dividing into cells

(that is considering a limit of discrete points). Here, different choices of
variables to define equal cells give different definitions of information. It is in
such a choice that physics enters. We use canonical coordinates in the phase
space and write the missing information in terms of the density which may
also depend on time:

I(t) = −∫ρ(p, q, t) ln[ρ(p, q, t)] dpdq . (61)

If the density of the discrete points in the continuous limit is inhomogeneous,


say m(x), then the proper generalization is

I(t) = −∫ρ(x) ln[ρ(x)/m(x)] dx . (62)

Note that (62) is invariant with respect to an arbitrary change of variables
x → y(x), since ρ(y) = ρ(x)dx/dy and m(y) = m(x)dx/dy, while (61) was
invariant only with respect to canonical transformations (including a time
evolution according to a Hamiltonian dynamics) that conserve the element
of the phase-space volume.
Let us mention briefly the application of entropy in communication theory. The
inequality (57) means that a communication channel transmitting bits (ones
and zeros) can on average transmit no more than one unit of the information
(56) per symbol. In other words, −Σ_{i=a}^{z} w_i log₂ w_i gives the minimum number
of bits needed on average to transmit a symbol from the ensemble of messages. We can say that the
information content of a symbol number i is log₂(1/w_i), while the entropy is
the mean information content per symbol. Note that less probable symbols
have larger information content. That suggests a way of signal compression
by coding common letters by short sequences and infrequent letters by more
lengthy combinations - lossless compressions like zip, gz and gif work this way
(you may find it interesting to know that jpeg, mpeg, mp3 and telephone use
lossy compression which removes information presumed to be unimportant
for humans).
Apart from restrictions imposed by the statistics of the symbols to be transferred,
one also wishes to characterize the quality of the channel. Note that in
this context one can view measurements as messages about the value of the
quantity we measure. Here, the message (measurement) A we receive gives
the information about the event (quantity) B as follows:

I(A, B) = ln[P (B|A)/P (B)] ,

where P (B|A) is the so-called conditional probability (of B in the presence
of A). The conditional probability is related to the joint probability P (A, B)
by the evident formula P (A, B) = P (B|A)P (A), which allows one to write
the information in a symmetric form

I(A, B) = ln[P(A, B)/P(A)P(B)] .

When A and B are independent, the conditional probability is independent
of A and the information is zero. When they are dependent, P(A, B) ≥
P(A)P(B), so that the information is always positive. It is interesting
to know how much information about B one obtains on average by measuring
A. Summing over all possible B₁, . . . , B_n and A₁, . . . , A_m we obtain Shannon's
"mutual information" used to evaluate the quality of communication
systems (or measurements):

I(A, B) = Σ_{i=1}^{m} Σ_{j=1}^{n} P(A_i, B_j) ln[P(B_j|A_i)/P(B_j)]
  → I(Z, Y) = ∫dzdy p(z, y) ln[p(z|y)/p(z)] = ∫dzdy p(z, y) ln[p(z, y)/p(z)p(y)] . (63)

Here we used p(z, y) = p(z|y)p(y). If one is just interested in the channel as
specified by P(B|A), then one maximizes I(A, B) over all choices of the source
statistics P(B) and calls the result the channel capacity. Note that (63) is a particular
case of the multidimensional (62), where one takes x = (y, z), m = p(z)p(y).
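A small sketch of the discrete formula (Python; the 2×2 joint distribution is an arbitrary illustrative choice, not from the notes):

    import numpy as np

    # Sketch: the discrete mutual information (63) for a joint distribution
    # P(A_i, B_j); it vanishes only when A and B are independent.
    P = np.array([[0.4, 0.1],
                  [0.1, 0.4]])          # joint probabilities, sum to 1
    PA = P.sum(axis=1, keepdims=True)   # marginal P(A)
    PB = P.sum(axis=0, keepdims=True)   # marginal P(B)
    I = np.sum(P * np.log(P / (PA * PB)))
    print(f"I(A,B) = {I:.4f} nats")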
So far, we defined information via the distribution. Now, we want to
use the idea of information to get the distribution. Statistical mechanics is a
systematic way of guessing, making use of incomplete information. The main
problem is how to get the best guess for the probability distribution ρ(p, q, t)
based on any given information presented as ⟨Rj (p, q, t)⟩ = rj , i.e. as the
expectation (mean) values of some dynamical quantities. Our distribution
must contain the whole truth (i.e. all the given information) and nothing
but the truth, that is, it must maximize the missing information I. This is to
provide for the widest set of possibilities for future use, compatible with the
existing information. Looking for the maximum of

I − Σ_j λ_j⟨R_j(p, q, t)⟩ = −∫ρ(p, q, t){ln[ρ(p, q, t)] + Σ_j λ_j R_j(p, q, t)} dpdq ,

we obtain the distribution

ρ(p, q, t) = Z^{−1} exp[−Σ_j λ_j R_j(p, q, t)] , (64)

where the normalization factor

Z(λ_i) = ∫exp[−Σ_j λ_j R_j(p, q, t)] dpdq

can be expressed via the measured quantities by using

∂ ln Z/∂λ_i = −r_i . (65)
For example, consider our initial "candy-in-the-box" problem (think of an
impurity atom in a lattice if you prefer physics). Let us denote by j the number
of the box with the candy. Different attempts give different j (for the impurity,
think of X-ray scattering on the lattice), but on average after many attempts
we find, say, ⟨cos(kj)⟩ = 0.3. Then, absorbing the sign of the Lagrange
multiplier into λ,

ρ(j) = Z^{−1}(λ) exp[λ cos(kj)] ,
Z(λ) = Σ_{j=1}^{n} exp[λ cos(kj)] , ⟨cos(kj)⟩ = d ln Z/dλ = 0.3 .

We can explicitly solve this for k ≪ 1 ≪ kn, when one can approximate the
sum by an integral, so that Z(λ) ≈ nI₀(λ), where I₀ is the modified Bessel
function. The equation I₀′(λ) = 0.3 I₀(λ) has an approximate solution λ ≈ 0.63.
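The last numerical step can be done, e.g., with the modified Bessel functions of scipy (a sketch, using I₀′ = I₁):

    from scipy.optimize import brentq
    from scipy.special import iv

    # Sketch: solve I0'(lam) = 0.3*I0(lam), i.e. I1(lam)/I0(lam) = 0.3,
    # with the modified Bessel functions iv(n, x).
    lam = brentq(lambda x: iv(1, x) / iv(0, x) - 0.3, 1e-6, 5.0)
    print(f"lambda = {lam:.3f}")   # about 0.63, as quoted in the text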
Note in passing that the set of equations (65) may be self-contradictory
or insufficient, so that the data do not allow one to define the distribution, or
allow it non-uniquely. If, however, the solution exists, then (61,64) define the
missing information I{r_i}, which is analogous to thermodynamic entropy as
a function of (measurable) macroscopic parameters. It is clear that I has
a tendency to increase whenever a constraint is removed (when we measure
fewer quantities R_i).
If we know the given information at some time t1 and want to make
guesses about some other time t2 then our information generally gets less
relevant as the distance |t1 − t2 | increases. In the particular case of guessing
the distribution in the phase space, the mechanism of losing information
is due to separation of trajectories described in Sect. 3. Indeed, if we know

that at t1 the system was in some region of the phase space, the set of
trajectories started at t1 from this region generally fills larger and larger
regions as |t1 − t2 | increases. Therefore, missing information (i.e. entropy)
increases with |t1 − t2 |. Note that it works both into the future and into the
past. Information approach allows one to see clearly that there is really no
contradiction between the reversibility of equations of motion and the growth
of entropy. Also, the concept of entropy as missing information¹⁰ allows
one to understand that entropy does not really decrease in the system with
Maxwell demon or any other information-processing device (indeed, if at the
beginning one has an information on position or velocity of any molecule,
then the entropy was less by this amount from the start; after using and
processing the information the entropy can only increase). Consider, for
instance, a particle in the box. If we know that it is in one half then entropy
(the logarithm of available states) is ln(V /2). That also teaches us that
information has thermodynamic (energetic) value: by placing a piston at the
half of the box and allowing particle to hit and move it we can get the work
T ∆S = T ln 2 done; on the other hand, to get such an information one must
make a measurement whose minimum energetic cost is T ∆S = T ln 2 (Szilard
1929).
Yet there is one class of quantities where information does not age. They
are integrals of motion. A situation in which only integrals of motion are
known is called equilibrium. The distribution (64) takes the canonical form
(12,13) in equilibrium. On the other hand, taking the microcanonical distribution
to be constant over the constant-energy surface corresponds to the same
approach of not adding any additional information to what is known (the energy).
From the information point of view, the statement that systems approach
equilibrium is equivalent to saying that all information is forgotten except the
integrals of motion. If, however, we possess the information about averages
of quantities that are not integrals of motion and those averages do not
coincide with their equilibrium values then the distribution (64) deviates
from equilibrium. Examples are currents, velocity or temperature gradients
like considered in kinetics.
More details can be found in Katz, Sects. 2-5, Sethna Sect. 5.3 and Kardar
I, Problem 2.6.

¹⁰ that entropy is not a property of the system but of our knowledge about the system

4 Gases
We now go on to apply the general theory given in Chapter 2. Here we
consider systems with the kinetic energy exceeding the potential energy of
inter-particle interactions: ⟨U(r₁ − r₂)⟩ ≪ ⟨mv²/2⟩.

4.1 Ideal Gases


We start from neglecting the potential energy of interaction completely. Note
though that molecules in the same state do have quantum interaction so
generally one cannot consider particles completely independent. If however
we introduce a grand canonical ensemble considering all molecules in the
same state as a subsystem (with a non-fixed number of particles na ) then
such subsystems do not interact. Using the distribution (29) with N = na
and E = na ϵa one expresses the probability of occupation numbers:

w(n_a) = exp{β[Ω_a + n_a(µ − ϵ_a)]} . (66)

Consider now a dilute gas, when all na ≪ 1. Then the probability of no


particles in the given state is close to unity, w₀ = exp(βΩ_a) ≈ 1, and the
probability of having one particle and the average number of particles are
given by

n̄_a = Σ_{n_a} w(n_a) n_a ≈ w₁ ≈ exp[(µ − ϵ_a)/T] , (67)

which is called the Boltzmann distribution. It is thus expressed via the chemical
potential.

4.1.1 Boltzmann (classical) gas


is such that one can also neglect the quantum exchange interaction of particles
(atoms or molecules) in the same state, which requires the occupation numbers
of any quantum state to be small, which in turn requires the number of
states V p³/h³ to be much larger than the number of molecules N. Since the
typical momentum is p ≃ (mT)^{1/2}, we get the condition

(mT)^{3/2} ≫ h³n . (68)

To get a feeling for the orders of magnitude, one can make an estimate with
m = 1.6 · 10⁻²⁴ g (proton) and n = 10²¹ cm⁻³, which gives T ≫ 0.5 K. Another

way to interpret (68) is to say that the mean distance between molecules
n−1/3 must be much larger than the wavelength h/p. In this case, one can
pass from the distribution over the quantum states to the distribution in the
phase space:

n̄(p, q) = exp{[µ − ϵ(p, q)]/T} . (69)
In particular, the distribution over momenta is always quasi-classical for the
Boltzmann gas. Indeed, the distance between energy levels is determined by
the size of the box, ∆E ≃ h2 m−1 V −2/3 ≪ h2 m−1 (N/V )2/3 which is much less
than temperature according to (68). To put it simply, if the thermal quantum
wavelength h/p ≃ h(mT )−1/2 is less than the distance between particles it is
also less than the size of the box. We conclude that the Boltzmann gas has the
Maxwell distribution over momenta. If such is the case even in the external
field then n(q, p) = exp{[µ − ϵ(p, q)]/T } = exp{[µ − U (q) − p2 /2m]/T }. That
gives, in particular, the particle density in space n(r) = n0 exp[−U (r)/T ]
where n0 is the concentration without field. In the uniform gravity field we
get the barometric formula n(z) = n(0) exp(−mgz/T ).
Since now the molecules do not interact, we can treat them as members
of the Gibbs canonical ensemble. The partition function of the Boltzmann
gas can be obtained from the partition function of a single particle (like we
did for the two-level system and the oscillator) with the only difference that particles
are now real and indistinguishable, so that we must divide the sum by the
number of permutations:

Z = (1/N!) [Σ_a exp(−ϵ_a/T)]^N .
Using the Stirling formula ln N! ≈ N ln(N/e) we write the free energy

F = −N T ln[(e/N) Σ_a exp(−ϵ_a/T)] . (70)
Since the motion of the particle as a whole is always quasi-classical for the
Boltzmann gas, one can single out the kinetic energy: ϵa = p2 /2m + ϵ′a .
If in addition there is no external field (so that ϵ′a describes rotation and
the internal degrees of freedom of the particle) then one can integrate over
d³p d³q/h³ and get for the ideal gas:

F = −N T ln[(eV/N)(mT/2πh̄²)^{3/2} Σ_a exp(−ϵ′_a/T)] . (71)

To complete the computation we need to specify the internal structure of the
particle. Note though that Σ_a exp(−ϵ′_a/T) depends only on temperature, so
that we can already get the equation of state P = −∂F/∂V = N T/V.
Mono-atomic gas. At temperatures much less than the distance to
the first excited state all the atoms will be in the ground state (we put ϵ₀ = 0).
That means that the energies are much less than the Rydberg ε₀ = e²/a_B =
me⁴/h̄² ≃ 4 · 10⁻¹¹ erg and the temperatures are less than ε₀/k ≃ 3 · 10⁵ K
(otherwise atoms are ionized).
If there is neither orbital angular momentum nor spin (L = S = 0 —
such are the atoms of noble gases) we get Σ_a exp(−ϵ′_a/T) = 1, as the ground
state is non-degenerate, and

F = −N T ln[(eV/N)(mT/2πh̄²)^{3/2}] = −N T ln(eV/N) − N c_v T ln T − N ζT , (72)
c_v = 3/2 , ζ = (3/2) ln(m/2πh̄²) . (73)
Here ζ is called the chemical constant. Note that for F = AT − BT ln T the
energy is linear in temperature, E = F − T∂F/∂T = BT, that is the specific heat,
C_v = B, is independent of temperature. The formulas thus derived allow one to check
the conditions for the Boltzmann statistics to be applicable, which requires
n̄_a ≪ 1. Evidently, it is enough to require exp(µ/T) ≪ 1, where
 ( )3/2 
E − TS + PV F + PV F + NT N 2πh̄2
µ= = = = T ln   .
N N N V mT

Using such µ we get (mT)^{3/2} ≫ h³n. Note that µ < 0.


If there is a nonzero spin the level has a degeneracy 2S + 1 which adds
ζS = ln(2S + 1) to the chemical constant (73). If both L and S are nonzero
then the total angular momentum J determines the fine structure of levels ϵJ .
This is the energy of spin-orbital and spin-spin interactions, both relativistic
effects, so that the energy can be estimated as ϵJ ≃ ε0 (v/c)2 ≃ ε0 (Zn e2 /h̄c)2 .
For not very high nuclei charge Zn , it is generally comparable with the room
temperature ϵJ /k ≃ 200 ÷ 300K. Every such level has a degeneracy 2J + 1
so that the respective partition function is

z = Σ_J (2J + 1) exp(−ϵ_J/T) .

Without actually specifying ϵJ we can determine this sum in two limits of
large and small temperature. If ∀J one has T ≫ ϵJ , then exp(−ϵJ /T ) ≈ 1
and z = (2S + 1)(2L + 1) which is the total number of components of the
fine level structure. In this case
ζSL = ln(2S + 1)(2L + 1) .
In the opposite limit of temperature smaller than all the fine structure level
differences, only the ground state with ϵJ0 = 0 contributes and one gets
ζJ = ln(2J0 + 1) ,
where J0 is the total angular momentum in the ground state.
[Sketch: the chemical constant ζ(T) interpolating between ζ_J at low T and ζ_SL at high T, and the specific heat c_v(T) equal to 3/2 at both low and high T with a maximum in between.]
Note that cv = 3/2 in both limits that is the specific heat is constant at
low and high temperatures (no contribution of electron degrees of freedom)
having some maximum in between (due to contributions of the electrons).
We have already seen this when considering the two-level system, and the lesson is
general: if one has a finite number of levels, then they do not contribute to
the specific heat both at low and high temperatures.
Specific heat of diatomic molecules. We need to calculate the sum
over the internal degrees of freedom in (71). We assume the temperature to
be smaller than the energy of dissociation (which is typically of the order
of electronic excited states). Since most molecules have S = L = 0 in the
ground state we disregard electronic states in what follows. The internal
excitations of the molecule are thus vibrations and rotations with the energy
ϵ′_a characterized by two quantum numbers, j and K:

ϵ_{jK} = h̄ω(j + 1/2) + (h̄²/2I)K(K + 1) . (74)

Here ω is the frequency of vibrations and I is the moment of inertia for


rotations. We estimate the parameters here assuming the typical scale to

be the Bohr radius a_B = h̄²/me² ≃ 0.5 · 10⁻⁸ cm and the typical energy to be
the Rydberg ε₀ = e²/a_B = me⁴/h̄² ≃ 4 · 10⁻¹¹ erg. Note that m = 9 · 10⁻²⁸ g is
the electron mass here. Now the frequency of the atomic oscillations is given
by the ratio of the Coulomb restoring force and the mass of the ion:
ω ≃ (ε₀/a_B²M)^{1/2} = (e²/M a_B³)^{1/2} .

Rotational energy is determined by the moment of inertia I ≃ M a2B . We


may thus estimate the typical energies of vibrations and rotations as follows:

h̄ω ≃ ε₀(m/M)^{1/2} , h̄²/I ≃ ε₀(m/M) . (75)
Since m/M ≃ 10⁻⁴, both energies are much smaller than the energy
of dissociation ≃ ε₀, and the rotational energy is smaller than the vibrational
one, so that rotations start to contribute at lower temperatures: ε₀/k ≃
3 · 10⁵ K, h̄ω/k ≃ 3 · 10³ K and h̄²/Ik ≃ 30 K.
The harmonic oscillator was considered in Sect. 2.5.2. In the quasi-classical
limit, h̄ω ≪ T, the partition function of N independent oscillators
is Z(T, N) = Z₁^N(T) = (T/h̄ω)^N, the free energy F = N T ln(h̄ω/T) and the
mean energy from (39): E = N T. The specific heat is C_V = N.
In the quantum case, the energy levels are given by E_n = h̄ω(n + 1/2).
The single-oscillator partition function

Z₁(T) = Σ_{n=0}^{∞} exp[−h̄ω(n + 1/2)/T] = [2 sinh(h̄ω/2T)]^{−1} (76)

gives again Z(T, N) = Z₁^N(T) and F(T, N) = N T ln[2 sinh(h̄ω/2T)] =
N h̄ω/2 + N T ln[1 − exp(−h̄ω/T)]. The energy now is

E = N h̄ω/2 + N h̄ω[exp(h̄ω/T ) − 1]−1

where one sees the contribution of the zero quantum oscillations and the breakdown
of classical equipartition. The specific heat (per molecule) of vibrations
is thus c_vib = (h̄ω/T)² exp(h̄ω/T)[exp(h̄ω/T) − 1]^{−2}. At T ≪ h̄ω
we have c_vib ∝ exp(−h̄ω/T). At large T we have classical equipartition (every
oscillator has two degrees of freedom, so it has T in energy and 1 in the specific heat).

To calculate the contribution of rotations one ought to calculate the partition
function

z_rot = Σ_K (2K + 1) exp[−h̄²K(K + 1)/2IT] . (77)

Again, when the temperature is much smaller than the distance to the first
level, T ≪ h̄²/2I, the specific heat must be exponentially small. Indeed,
retaining only the first two terms in the sum (77), we get z_rot = 1 + 3 exp(−h̄²/IT),
which gives in the same approximation F_rot = −3N T exp(−h̄²/IT) and c_rot =
3(h̄²/IT)² exp(−h̄²/IT). We thus see that at low temperatures the diatomic gas
behaves as a mono-atomic one.
At large temperatures, T ≫ h̄2 /2I, the terms with large K give the main
contribution to the sum (77). They can be treated quasi-classically replacing
the sum by the integral:

z_rot = ∫_0^∞ dK (2K + 1) exp[−h̄²K(K + 1)/2IT] = 2IT/h̄² . (78)

That gives the constant specific heat c_rot = 1. The resulting specific heat of
the diatomic molecule, c_v = 3/2 + c_rot + c_vib, is shown on the figure:

[Sketch: c_v(T) rising in steps from 3/2 to 5/2 near T ≃ h̄²/I and from 5/2 to 7/2 near T ≃ h̄ω.]

Note that for h̄2 /I < T ≪ h̄ω the specific heat (weakly) decreases be-
cause the distance between rotational levels increases so that the level density
(which is actually cv ) decreases.
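The whole curve is easy to reproduce numerically. A sketch (Python, not part of the notes; the values Θ_rot = h̄²/2I = 1 and Θ_vib = h̄ω = 100 are arbitrary illustrative choices respecting the hierarchy (75)):

    import numpy as np

    # Sketch: c_v(T) of a diatomic molecule (in units of k) from the
    # partition functions (76) and (77), via e = T^2 d(ln z)/dT and
    # c = de/dT, computed by central differences.
    Theta_rot, Theta_vib = 1.0, 100.0

    def lnz_rot(T):   # rotational sum (77), truncated once converged
        K = np.arange(500)
        return np.log(np.sum((2*K + 1) * np.exp(-Theta_rot * K * (K + 1) / T)))

    def lnz_vib(T):   # (76): ln Z1 = -Theta/2T - ln(1 - exp(-Theta/T))
        return -Theta_vib / (2 * T) - np.log1p(-np.exp(-Theta_vib / T))

    def c_from_lnz(lnz, T):
        h = 1e-3 * T  # relative step for numerical differentiation
        e = lambda t: t**2 * (lnz(t + h) - lnz(t - h)) / (2 * h)
        return (e(T + h) - e(T - h)) / (2 * h)

    for T in [0.3, 3.0, 30.0, 300.0, 3000.0]:
        cv = 1.5 + c_from_lnz(lnz_rot, T) + c_from_lnz(lnz_vib, T)
        print(f"T = {T:6.1f}   c_v = {cv:.2f}")  # from ~3/2 through ~5/2 to ~7/2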
For (non-linear) molecules with N > 2 atoms we have 3 translations, 3
rotations and 6N − 12 vibrational degrees of freedom (3N − 6 vibrational
coordinates and as many momenta: out of the total 3N coordinates one subtracts
3 for the motion as a whole and 3 for rotations). That makes for the
high-temperature specific heat c_v = c_tr + c_rot + c_vib = 3/2 + 3/2 + 3N − 6 =
3N − 3. Indeed, every variable (i.e. every degree of freedom) that enters
ϵ(p, q) quadratically contributes 1/2 to c_v. Translation and rotation each
contribute only momenta and thus give 1/2 per degree of freedom, while each
vibration contributes both momentum and coordinate (i.e. kinetic and potential
energy) and gives 1.
Landau & Lifshitz, Sects. 47, 49, 51.

4.2 Fermi and Bose gases


Like we did at the beginning of Section 4.1, we consider all the particles in
the same quantum state as a Gibbs subsystem and apply the grand canonical
distribution with the potential

Ω_a = −T ln Σ_{n_a} exp[n_a(µ − ϵ_a)/T] . (79)

Here the sum is over all possible occupation numbers na . For fermions, there
are only two terms in the sum with na = 0, 1 so that

Ωa = −T ln {1 + exp[β(µ − ϵa )]} .

For bosons, one must sum the infinite geometric progression (which converges
when µ < 0) to get Ω_a = T ln{1 − exp[β(µ − ϵ_a)]}. Recall that Ω depends
on T, V, µ. The average number of particles in the state with the energy ϵ is
thus

n̄(ϵ) = −∂Ω_a/∂µ = 1/{exp[β(ϵ − µ)] ± 1} . (80)
Upper sign here and in the subsequent formulas corresponds to the Fermi
statistics, lower to Bose. Note that at exp[β(ϵ − µ)] ≫ 1 both distributions
turn into Boltzmann distribution (67). The thermodynamic potential of the
whole system is obtained by summing over the states:

Ω = ∓T Σ_a ln[1 ± e^{β(µ−ϵ_a)}] . (81)

Fermi and Bose distributions are generally applied to elementary particles


(electrons, nucleons or photons) or quasiparticles (phonons) since atomic
and molecular gases are described by the Boltzmann distribution (with the
exception of ultra-cold atoms in optical traps). For elementary particles, the
energy is the kinetic energy, ϵ = p²/2m, which is always quasi-classical (that is,
the thermal wavelength is always smaller than the size of the box but can

now be comparable to the distance between particles). In this case we may
pass from summation to the integration over the phase space with the only
addition that particles are also distinguished by the direction of the spin s,
so there are g = 2s + 1 states in the elementary cell of the phase space.
We thus replace (80) by

dN(p, q) = g dp_x dp_y dp_z dx dy dz h^{−3} / {exp[β(ϵ − µ)] ± 1} . (82)
Integrating over the volume we get the quantum analog of the Maxwell distribution:

dN(ϵ) = [gV m^{3/2}/√2 π²h̄³] ϵ^{1/2} dϵ / {exp[β(ϵ − µ)] ± 1} . (83)
In the same way we rewrite (81):

Ω = ∓[gV T m^{3/2}/√2 π²h̄³] ∫_0^∞ ϵ^{1/2} ln[1 ± e^{β(µ−ϵ)}] dϵ
  = −(2/3)[gV m^{3/2}/√2 π²h̄³] ∫_0^∞ ϵ^{3/2} dϵ / {exp[β(ϵ − µ)] ± 1} = −(2/3)E . (84)
Since also Ω = −P V, we get the equation of state

P V = (2/3)E . (85)
We see that this relation is the same as for a classical gas, it actually is true for
any non-interacting particles with ϵ = p2 /2m in 3-dimensional space. Indeed,
consider a cube with the side l. Every particle hits a wall |px |/2ml times per
unit time transferring the momentum 2|px | in every hit. The pressure is the
total momentum transferred per unit time p2x /ml divided by the wall area l2
(see Kubo, p. 32):

P = Σ_{i=1}^{N} p_{ix}²/ml³ = Σ_{i=1}^{N} p_i²/3ml³ = 2E/3V . (86)
In the limit of Boltzmann statistics we have E = 3N T /2 so that (85)
reproduces P V = N T . Let us obtain the (small) quantum corrections to the
pressure assuming exp(µ/T) ≪ 1. Expanding the integral in (84),

∫_0^∞ ϵ^{3/2} dϵ/[e^{β(ϵ−µ)} ± 1] ≈ ∫_0^∞ ϵ^{3/2} e^{β(µ−ϵ)} [1 ∓ e^{β(µ−ϵ)}] dϵ = (3√π/4β^{5/2}) e^{βµ}(1 ∓ 2^{−5/2} e^{βµ}) ,

and substituting the Boltzmann expression for µ, we get

P V = N T [1 ± (π^{3/2}/2g) N h̄³/V(mT)^{3/2}] . (87)
Not surprisingly, the small factor here is the ratio of the thermal wavelength
to the distance between particles. We see that quantum effects give some
effective attraction between bosons and repulsion between fermions.
Landau & Lifshitz, Sects. 53, 54, 56.

4.2.1 Degenerate Fermi Gas


The main goal of the theory here is to describe the electrons in metals
(it is also applied to the Thomas-Fermi model of electrons in large atoms,
to protons and neutrons in large nuclei, to electrons in white dwarf stars,
to neutron stars and the early Universe). Drude and Lorentz at the beginning
of the 20th century applied the Boltzmann distribution and obtained decent results
for conductivity but a disastrous discrepancy for the specific heat (which they
expected to be 3/2 per electron). That was cleared up by Sommerfeld in
1928 with the help of the Fermi-Dirac distribution. The energy of an electron in
a metal is comparable to Rydberg and so is the chemical potential (which
is positive for degenerate Fermi gas in distinction from Boltzmann and Bose
gases, since one increases energy by putting extra particle into the system,
see below). Therefore, for most temperatures we may assume T ≪ µ so that
the Fermi distribution is close to the step function:
[Sketch: the Fermi distribution n(ϵ), a step at ϵ = ϵ_F smeared over a width of order T.]
At T = 0 electrons fill all the momenta up to p_F, which can be expressed
via the concentration (g = 2 for s = 1/2):

N/V = (2·4π/h³) ∫_0^{p_F} p² dp = p_F³/3π²h̄³ , (88)

which gives the Fermi energy

ϵ_F = (3π²)^{2/3} (h̄²/2m) (N/V)^{2/3} . (89)
The chemical potential at T = 0 coincides with the Fermi energy (putting
already one electron per unit cell one obtains ϵ_F/k ≃ 10⁴ K). The condition
T ≪ ϵ_F is evidently opposite to (68). The chemical potential of the Fermi gas
decreases with temperature and changes sign at T ≈ ϵ_F. Note that the
condition of ideality requires that the electrostatic energy Ze²/a be much
less than ϵ_F, where Ze is the charge of the ion and a ≃ (ZV/N)^{1/3} is the mean
distance between electrons and ions. We see that the condition of ideality,
N/V ≫ (e²m/h̄²)³Z², surprisingly improves with increasing concentration.
Note nevertheless that in most metals the interaction is substantial; why one
can still use the Fermi distribution (only introducing an effective electron mass)
is the subject of the Landau theory of Fermi liquids, to be described in the course
of condensed matter physics (in a nutshell, it is because the main effect of the
interaction reduces to some mean effective periodic field).
To obtain the specific heat, Cv = (∂E/∂T )V,N one must find E(T, V, N )
i.e. exclude µ from two relations, (83) and (84):

N = (√2 V m^{3/2}/π²h̄³) ∫_0^∞ ϵ^{1/2} dϵ / {exp[β(ϵ − µ)] + 1} ,
E = (√2 V m^{3/2}/π²h̄³) ∫_0^∞ ϵ^{3/2} dϵ / {exp[β(ϵ − µ)] + 1} .
At T ≪ µ ≈ ϵ_F this can be done perturbatively using the formula

∫_0^∞ f(ϵ) dϵ / {exp[β(ϵ − µ)] + 1} ≈ ∫_0^µ f(ϵ) dϵ + (π²/6) T² f′(µ) , (90)
which gives

N = (√2 V m^{3/2}/π²h̄³)(2/3) µ^{3/2} (1 + π²T²/8µ²) ,
E = (√2 V m^{3/2}/π²h̄³)(2/5) µ^{5/2} (1 + 5π²T²/8µ²) .
From the first equation we find µ(N, T) perturbatively:

µ = ϵ_F (1 − π²T²/8ϵ_F²)^{2/3} ≈ ϵ_F (1 − π²T²/12ϵ_F²)

and substitute it into the second equation:

E = (3/5) N ϵ_F (1 + 5π²T²/12ϵ_F²) , (91)
C_V = (π²/2) N T/ϵ_F . (92)
We see that CV ≪ N . Another important point to stress is that the energy
(and P V ) are much larger than N T , the consequence is that the fermionic
nature of electrons is what actually determines the resistance of metals (and
neutron stars) to compression. For a typical electron density in metals, n ≃
10²² cm⁻³, we get

P ≈ (2/5) nϵ_F = (3π²)^{2/3} (h̄²/5m) n^{5/3} ≃ 10⁴ atm .
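The Sommerfeld result (92) can be checked numerically by solving for µ(T) from the particle-number integral and differentiating the energy; a sketch (Python, not part of the notes, in units ϵ_F = k = 1):

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq
    from scipy.special import expit

    # Sketch: check C_V/N = (pi^2/2) T/eps_F at T << eps_F.
    # expit(x) = 1/(1 + e^{-x}) is the Fermi occupation factor.
    def fermi_int(power, mu, T):
        f = lambda e: e**power * expit((mu - e) / T)
        return quad(f, 0.0, 50.0, points=[mu], limit=200)[0]

    N0 = 2.0 / 3.0   # the integral of sqrt(e) up to eps_F = 1 at T = 0
    def energy_per_particle(T):
        mu = brentq(lambda m: fermi_int(0.5, m, T) - N0, 0.5, 1.5)
        return fermi_int(1.5, mu, T) / N0

    T, dT = 0.02, 0.002
    cv = (energy_per_particle(T + dT) - energy_per_particle(T - dT)) / (2 * dT)
    print(f"numerical C_V/N = {cv:.4f}   Sommerfeld (92): {np.pi**2 * T / 2:.4f}")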
5 5m

Landau & Lifshitz, Sects. 57, 58 and Pathria 8.3.

4.2.2 Photons
Consider electromagnetic radiation in an empty cavity kept at the temper-
ature T . Since electromagnetic waves are linear (i.e. they do not interact)
thermalization of radiation comes from the interaction with walls (absorption
and re-emission)¹¹. One can derive the equation of state without all the formalism
of the partition function. Indeed, consider a plane electromagnetic
wave with the fields having amplitudes E and B. The average energy density
is (E 2 + B 2 )/2 = E 2 while the momentum flux modulus is |E × B| = E 2 .
The radiation field in the box can be considered as incoherent superposi-
tion of plane wave propagating in all directions. Since all waves contribute
the energy density and only one-third of the waves contribute the radiation
pressure on any wall then
P V = E/3 . (93)
In a quantum consideration we treat electromagnetic waves as photons
which are massless particles with the spin 1 that can have only two inde-
pendent orientations (correspond to two independent polarizations of a clas-
sical electromagnetic wave). The energy is related to the momentum by
¹¹ It is meaningless to take perfect mirror walls which do not change the frequency of light under reflection and formally correspond to zero T.

ϵ = cp. Now, exactly as we did for particles [where the law ϵ = p²/2m gave
P V = 2E/3 — see (86)] we can derive (93) considering¹² that every incident
photon brings momentum 2p cos θ to the wall, that the normal velocity is
c cos θ, and integrating ∫cos²θ sin θ dθ. Photon pressure is relevant inside the
stars, particularly inside the Sun.
stars, particularly inside the Sun.
Let us now apply the Bose distribution to the system of photons in a
cavity. Since the number of photons is not fixed then a minimum of the free
energy, F (T, V, N ), requires zero chemical potential: (∂F/∂N )T,V = µ =
0. The Bose distribution over the quantum states with fixed polarization,
momentum h̄k and energy ϵ = h̄ω = h̄ck is called the Planck distribution:

n̄_k = 1/(e^{h̄ω/T} − 1) . (94)
At T ≫ h̄ω it gives the Rayleigh-Jeans distribution h̄ωn̄_k = T, which is
classical equipartition. Assuming the cavity large, we consider the distribution
over wave vectors continuous. Multiplying by 2 (the number of polarizations)
we get the spectral distribution of energy:

dE_ω = h̄ck [2V/(2π)³] 4πk² dk/(e^{h̄ck/T} − 1) = (V h̄/π²c³) ω³ dω/(e^{h̄ω/T} − 1) . (95)

It has a maximum at h̄ω_m = 2.8T. The total energy is

E = (4σ/c) V T⁴ , (96)

where the Stefan-Boltzmann constant is σ = π²/60h̄³c². The
specific heat is c_v ∝ T³. Since P = 4σT⁴/3c depends only on temperature,
c_P does not exist (may be considered infinite). We consider fixed
temperature, so that the relevant thermodynamic potential is the free energy
(which coincides with Ω for µ = 0). It is derived from the energy using
S = −∂F/∂T and F + T S = F − T∂F/∂T = −T²∂(F/T)/∂T = E, which
gives F = −E/3 ∝ V T⁴ and the entropy S = −∂F/∂T ∝ V T³, that is the
¹² This consideration is not restricted to bosons. Indeed, ultra-relativistic fermions have ϵ = cp and P = E/3V, e.g. electrons in graphene. In the relativistic theory energy and momentum are parts of the energy-momentum tensor whose trace must be positive, which requires cp ≤ ϵ and P ≤ E/3V, where E is the total energy including the rest mass N mc², L&L 61.

Nernst law is satisfied: S → 0 when T → 0. Under adiabatic compres-
sion or expansion of radiation, entropy constancy requires V T 3 = const and
P V 4/3 = const.
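Two numbers quoted above are easy to verify numerically (a Python sketch, not part of the notes):

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    # Sketch: the maximum of the Planck spectrum (95), x^3/(e^x - 1) with
    # x = hbar*omega/T, solves 3(1 - e^{-x}) = x; the Stefan-Boltzmann
    # constant in (96) rests on the integral of x^3/(e^x - 1) = pi^4/15.
    x_max = brentq(lambda x: 3 * (1 - np.exp(-x)) - x, 1.0, 5.0)
    planck_integral = quad(lambda x: x**3 / np.expm1(x), 1e-8, 100.0)[0]
    print(f"hbar*omega_m/T = {x_max:.3f}")                 # the 2.8 quoted above
    print(f"integral = {planck_integral:.5f}, pi^4/15 = {np.pi**4 / 15:.5f}")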
If one makes a small orifice in the cavity then it absorbs all the incident
light like a black body. Therefore, what comes out of such a hole is called
black-body radiation. The energy flux from a unit surface is the energy
density times c and times the geometric factor:

I = (cE/V) ∫_0^{π/2} cos θ sin θ dθ = cE/4V = σT⁴ . (97)
Landau & Lifshitz, Sect. 63 and Huang, Sect. 12.1.

4.2.3 Phonons
The specific heat of a crystal lattice can be calculated using the powerful
idea of quasi-particles: turning the set of strongly interacting atoms into a
set of weakly interacting waves. In this way one considers the oscillations of
the atoms as acoustic waves with three branches (two transversal and one
longitudinal) ωi = ui k where ui is the respective sound velocity. Debye took
this expression for the spectrum and imposed a maximal frequency ωmax so
that the total number of degrees of freedom is equal to 3 times the number
of atoms:

[4πV/(2π)³] Σ_{i=1}^{3} ∫_0^{ω_max} ω² dω/u_i³ = V ω_max³/2π²u³ = 3N . (98)

Here we introduced some effective sound velocity u defined by 3u⁻³ = 2u_t⁻³ + u_l⁻³.
One usually introduces the Debye temperature

Θ = h̄ω_max = h̄u(6π²N/V)^{1/3} ≃ h̄u/a , (99)

where a is the lattice constant.


We can now write the energy of lattice vibrations using the Planck dis-
tribution (since the number of phonons is indefinite, µ = 0)
( ) ( )
3V ∫
ωmax
1 1 2 9N Θ Θ
E= 2 3 h̄ω + ω dω = +3N T D , (100)
2π u 2 exp(h̄ω/T )−1 8 T
0
{
3 ∫ x z 3 dz 1 for x ≪ 1 ,
D(x) = 3 =
x 0 e −1
z π 4 /5x3 for x ≫ 1 .

At T ≪ Θ we have for the specific heat the same cubic law as for photons:

C = N (12π⁴/5)(T/Θ)³ . (101)
For liquids, there is only one (longitudinal) branch of phonons, so C =
N(4π⁴/5)(T/Θ)³, which works well for He IV at low temperatures.
At T ≫ Θ we have classical specific heat (Dulong-Petit law) C = 3N .
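Differentiating (100) gives C/3N = 4D(Θ/T) − 3(Θ/T)/(e^{Θ/T} − 1), which interpolates between (101) and the Dulong-Petit value; a numerical sketch (Python, not part of the notes):

    import numpy as np
    from scipy.integrate import quad

    # Sketch: the Debye specific heat per atom (in units of k).
    def D(x):   # the Debye function defined in (100)
        return (3 / x**3) * quad(lambda z: z**3 / np.expm1(z), 1e-10, x)[0]

    def C_over_3N(x):          # x = Theta / T
        return 4 * D(x) - 3 * x / np.expm1(x)

    for T_over_Theta in [0.05, 0.2, 1.0, 5.0]:
        x = 1 / T_over_Theta
        print(f"T/Theta = {T_over_Theta:4.2f}   C/3N = {C_over_3N(x):.4f}")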
Debye temperatures of different solids are between 100 and 1000 degrees
Kelvin. We can also write the free energy of the phonons as a sum/integral
over frequencies of the single-oscillator expression:

F = 9N T (T/Θ)³ ∫_0^{Θ/T} z² ln(1 − e^{−z}) dz = N T [3 ln(1 − e^{−Θ/T}) − D(Θ/T)] , (102)

and find that, again, at low temperatures S = −∂F/∂T ∝ T 3 i.e. Nernst


theorem. An interesting quantity is the coefficient of thermal expansion
α = (∂ ln V /∂T )P . To get it one must pass to the variables P, T, µ introducing
the Gibbs potential G(P, T ) = E − T S + P V and replacing V = ∂G/∂P .
At high temperatures, F ≈ 3N T ln(Θ/T ). It is the Debye temperature
here which depends on P , so that the part depending on T and P in both
potentials is linearly proportional to T : δF (P, T ) = δG(P, T ) = 3N T ln Θ.
That makes the mixed derivative

α = V^{−1} ∂²G/∂P∂T = 3(N/V) ∂ ln Θ/∂P
independent of temperature. One can also express it via the so-called mean
geometric frequency defined by ln ω̄ = (3N)^{−1} Σ_a ln ω_a. Then δF = δG =
T Σ_a ln(h̄ω_a/T) = 3N T ln[h̄ω̄(P)/T], and α = (3N/V ω̄) dω̄/dP. When the
pressure increases, the atoms get closer, the restoring force increases and
so does the frequency of oscillations, so that α ≥ 0.
Note that we've got a constant contribution 9NΘ/8 in (100), which is due
to quantum zero oscillations. While it does not contribute to the specific heat,
it manifests itself in X-ray scattering, the Mössbauer effect etc. Incidentally,
this is not the whole energy of a body at zero temperature, this is only the
energy of excitations due to atoms shifting from their equilibrium positions.
There is also a negative energy of attraction when the atoms are precisely in
their equilibrium positions. The total (so-called binding) energy must be negative
for the crystal to exist at T = 0.

One may ask why we didn't account for zero oscillations when considering
photons in (95,96). Since the frequency of photons is not restricted from
above, the respective contribution seems to be infinite. How to make sense
out of such infinities is considered in quantum electrodynamics; note that
the zero oscillations of the electromagnetic field are real and manifest them-
selves, for example, in the Lamb shift of the levels of a hydrogen atom. In
thermodynamics, zero oscillations of photons are of no importance.
Landau & Lifshitz, Sects. 64–66; Huang, Sect. 12.2

4.2.4 Bose gas of particles and Bose-Einstein condensation


We consider now an ideal Bose gas of massive particles with the fixed number
of particles. This is applied to atoms at very low temperatures. As usual,
equating the total number of particles to the sum of the Bose distribution over all
states gives the equation that determines the chemical potential as a function
of temperature and the specific volume. It is more convenient here to work
with the function z = exp(µ/T ) which is called fugacity:
∑ 1 4πV ∫ ∞ p2 dp z V g3/2 (z) z
N= = + = + .
p eβ(ϵp −µ) −1 h3 0 −1
z e p2 /2mT
−1 1 − z λ 3 1−z

We introduced the thermal wavelength λ = (2πh̄²/mT)^{1/2} and the function

g_a(z) = [1/Γ(a)] ∫_0^∞ x^{a−1} dx/(z^{−1}e^x − 1) = Σ_{i=1}^{∞} z^i/i^a . (103)

One may wonder why we single out the contribution of the zero-energy level, as it
is not supposed to contribute in the thermodynamic limit V → ∞. Yet this
is not true at sufficiently low temperatures. Indeed, let us rewrite it denoting
by n₀ = z/(1 − z) the number of particles at p = 0:

n₀/V = 1/v − g_{3/2}(z)/λ³ . (104)
The function g_{3/2}(z) behaves as shown in the figure: it monotonically grows
while z changes from zero (µ = −∞) to unity (µ = 0). Recall that the
chemical potential of bosons is non-positive (otherwise one would have infinite
occupation numbers). At z = 1, the value is g_{3/2}(1) = ζ(3/2) ≈ 2.6
and the derivative is infinite. When the temperature and the specific volume

v = V /N are such that λ3 /v > g3/2 (1) (notice that the thermal wavelength is
now larger than the inter-particle distance) then there is a finite fraction of
particles that occupies the zero-energy level. The graphic solution of (104)
for a finite V can be seen in the Figure below by plotting λ3 /v (broken
line) and g3/2 (z) (solid line). When V → ∞ we have a sharp transition at
λ3 /v = g3/2 (1) i.e. at T = Tc = 2πh̄2 /m[vg3/2 (1)]2/3 : at T ≤ Tc we have
z ≡ 1 that is µ ≡ 0. At T > Tc we obtain z solving λ3 /v = g3/2 (z).
[Sketch: the graphic solution of (104) — g_{3/2}(z) (solid line), growing from 0 to 2.6 as z goes from 0 to 1, intersected by the level λ³/v (broken line); for finite V the solution sits within O(1/V) of z = 1 when λ³/v > g_{3/2}(1).]
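The graphic solution is straightforward to carry out numerically; a sketch (Python, not part of the notes, with the series (103) truncated):

    import numpy as np
    from scipy.optimize import brentq

    # Sketch: above T_c the fugacity solves g_{3/2}(z) = lambda^3/v =
    # g_{3/2}(1)(T_c/T)^{3/2}; below T_c, z = 1 and n_0/N = 1 - (T/T_c)^{3/2}.
    def g32(z, terms=100000):   # the series (103) with a = 3/2, truncated
        i = np.arange(1, terms + 1)
        return np.sum(z**i / i**1.5)

    g32_at_1 = g32(1.0)   # ~ zeta(3/2) = 2.612 (slowly convergent)
    for t in [0.5, 0.9, 1.1, 2.0]:               # t = T/T_c
        lam3_over_v = g32_at_1 * t**-1.5
        if lam3_over_v >= g32_at_1:              # T <= T_c
            print(f"T/Tc = {t:3.1f}   z = 1.000   n0/N = {1 - t**1.5:.3f}")
        else:                                    # T > T_c: solve for z < 1
            z = brentq(lambda x: g32(x) - lam3_over_v, 1e-12, 1.0 - 1e-12)
            print(f"T/Tc = {t:3.1f}   z = {z:.3f}   n0/N = 0")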
Therefore, in the thermodynamic limit we put n₀ = 0 at T > T_c and
n₀/N = 1 − (T/T_c)^{3/2}, as follows from (104). All thermodynamic relations
now have different expressions above and below T_c (upper and lower cases
respectively):
E = (3/2)P V = (2πV/mh³) ∫_0^∞ p⁴ dp/[z^{−1} exp(p²/2mT) − 1]
  = { (3V T/2λ³) g_{5/2}(z) ; (3V T/2λ³) g_{5/2}(1) } (105)

c_v = { (15v/4λ³) g_{5/2}(z) − 9g_{3/2}(z)/4g_{1/2}(z) ; (15v/4λ³) g_{5/2}(1) } (106)

At low T, c_v ∝ λ⁻³ ∝ T^{3/2}: it decreases faster than c_v ∝ T for electrons
(since the number of over-condensate particles now changes with T, as for
phonons and photons, and µ = 0 too), yet slower than c_v ∝ T³ (which we
had for ϵ_p = cp), because the particle levels ϵ_p = p²/2m are denser at lower
energies. On the other hand, since the distance between levels increases with
energy, at high temperatures c_v decreases with T, as for the rotators in
Sect. 4.1.1:
[Sketches: c_v(T) rising from zero as T^{3/2} to a maximum at T_c and decreasing towards 3/2 at high T; and the P-v isotherms, horizontal at v < v_c(T), with the transition line P v^{5/3} = const.]
At T < T_c the pressure is independent of the volume, which prompts the
analogy with a first-order phase transition. Indeed, this is reminiscent of the
properties of a saturated vapor (particles with nonzero energy) in contact
with the liquid (particles with zero energy): changing the volume at fixed
temperature, we change the fraction of particles in the liquid but not the
pressure. This is why the phenomenon is called Bose-Einstein condensation.
Increasing the temperature, we cause evaporation (particles leaving the condensate in
our case), which increases c_v; after all the liquid evaporates (at T = T_c), c_v starts
to decrease. It is sometimes said that it is a "condensation in momentum
space", but if we put the system in a gravity field, then there will be a spatial
separation of the two phases, just like in a gas-liquid condensation (liquid at
the bottom).
We can also obtain the entropy [above T_c by the usual formulas that follow
from (84), and below T_c by just integrating the specific heat, S = ∫dE/T =
N∫c_v(T)dT/T = 5E/3T = 2N c_v/3]:

S/N = { (5v/2λ³) g_{5/2}(z) − ln z ; (5v/2λ³) g_{5/2}(1) } (107)

The entropy is zero at T = 0, which means that the condensed phase has no
entropy. At finite T all the entropy is due to the gas phase. Below T_c we can
write S/N = (T/T_c)^{3/2} s = (v/v_c)s, where s is the entropy per gas particle:
s = 5g_{5/2}(1)/2g_{3/2}(1). The latent heat of condensation per particle is Ts,
so that it is indeed a phase transition of the first order.
To conclude, we have seen in this Section how quantum effects lead to
switching off degrees of freedom at low temperatures. Fermi and Bose sys-
tems reach the zero-entropy state at T = 0 in different ways. It is also
instructive to compare their chemical potentials:
[Sketch: µ(T) for Fermi and Bose gases — positive at T = 0 and decreasing for fermions; zero up to T_c and negative beyond for bosons.]

Landau & Lifshitz, Sect. 62; Huang, Sect. 12.3.

5 Non-ideal gases
Here we take into account a weak interaction between particles. There are
two limiting cases when the consideration is simplified:
i) when the typical range of interaction is much smaller than the mean dis-
tance between particles so that it is enough to consider only two-particle
interactions,
ii) when the interaction is long-range, so that every particle effectively interacts
with many other particles and one can apply some mean-field description.

5.1 Cluster and virial expansions


Consider a dilute gas with the short-range inter-particle energy of interaction
u(r). We assume that u(r) decays on the scale r0 and

ϵ ≡ (2/3)πr03 N/V ≡ bN/V ≪ 1 .

Integrating over momenta we get the partition function Z and the grand
partition function Z as

Z(N, V, T) = (1/N!λ_T^{3N}) ∫dr₁ . . . dr_N exp[−U(r₁, . . . , r_N)/T] ≡ Z_N(V, T)/N!λ_T^{3N} ,
Z(z, V, T) = Σ_{N=0}^{∞} z^N Z_N/N!λ_T^{3N} . (108)

Here we use fugacity z = exp(µ/T ) instead of the chemical potential. The


terms with N = 0, 1 give unit integrals, with N = 2 we shall have U12 =
u(r12 ), then U123 = u(r12 ) + u(r13 ) + u(r23 ), etc. In every term we may
integrate over the coordinate of the center of mass of the N particles and obtain

Z(µ, V, T) = 1 + V z/λ_T³ + (V/2!)(z/λ_T³)² ∫dr exp[−u(r)/T]
  + (V/3!)(z/λ_T³)³ ∫exp{−[u(r₁₂) + u(r₁₃) + u(r₂₃)]/T} dr₂ dr₃ + . . . (109)
The first term does not account for the interaction. The second one accounts
for the interaction of only one pair (under the assumption that when one
pair of particles happens to be close and interact, this is such a rare event
that the rest can be considered non-interacting). The third term accounts for

the simultaneous interaction of three particles, etc. We can now write the grand
potential Ω = −P V = −T ln Z and expand the logarithm in powers of z/λ_T³:

P = (T/λ_T³) Σ_{l=1}^{∞} b_l z^l . (110)

It is convenient to introduce the two-particle function, called interaction fac-


tor, f_{ij} = exp[−u(r_{ij})/T] − 1, which is zero outside the range of interaction.
Terms containing integrals of k functions f_{ij} are proportional to ϵ^k.
The coefficients b_l can be expressed via f_{ij}:


b₁ = 1 , b₂ = (1/2)λ_T^{−3} ∫f_{12} dr₁₂ ,
b₃ = (1/6)λ_T^{−6} ∫(e^{−U₁₂₃/T} − e^{−U₁₂/T} − e^{−U₂₃/T} − e^{−U₁₃/T} + 2) dr₁₂ dr₁₃
  = (1/6)λ_T^{−6} ∫(3f_{12}f_{13} + f_{12}f_{13}f_{23}) dr₁₂ dr₁₃ . (111)

It is pretty cumbersome to analyze higher orders in analytical expressions.
Instead, every term in

Z_N(V, T) = ∫ Π_{i<j}(1 + f_{ij}) dr₁ . . . dr_N = ∫ (1 + Σ f_{ij} + Σ f_{ij}f_{kl} + . . .) dr₁ . . . dr_N

can be represented as a graph with N points and lines connecting particles


which interaction we account for. In this way, ZN is a sum of all distinct N -
particle graphs. Since most people are better in manipulating visual (rather
than abstract) objects then it is natural to use graphs to represent analytic
expressions which is called diagram technique. For example, the three-particle
clusters are as follows:

[Diagram (112): the three two-link three-particle graphs plus the fully connected triangle, 3 × (chain) + (triangle),]
which corresponds to (111). Factorization of terms into independent integrals
corresponds to decomposition of graphs into l-clusters i.e. l-point graphs
where all points are directly or indirectly connected. Associated with the
l-th cluster we may define dimensionless factors b_l (called cluster integrals):

b_l = [1/(l! V λ_T^{3(l−1)})] × [sum of all l-clusters] . (113)

In the square brackets here stand integrals like

∫dr = V for l = 1 , ∫f(r₁₂) dr₁ dr₂ = V ∫f(r) dr for l = 2 , etc.

Using the cluster expansion we can now show that the cluster integrals bl
indeed appear in the expansion (110). For l = 1, 2, 3 we saw that this is
indeed so.
Denote by m_l the number of l-clusters and by {m_l} the whole set of m₁, . . ..
In calculating Z_N we need to include the number of ways to distribute N
particles over clusters, which is N!/Π_l (l!)^{m_l}. We then must multiply it by
the sum of all possible clusters in the power m_l, divided by m_l! (since an
exchange of all particles of one cluster with another cluster of the same
size does not matter). Since the sum of all l-clusters is b_l l! λ_T^{3(l−1)} V, then

Z_N = N! λ_T^{3N} Σ_{{m_l}} Π_l (b_l λ_T^{−3} V)^{m_l}/m_l! .

Here we used N = Σ_l l m_l. The problem here is that the sum over different
partitions {m_l} needs to be taken under this restriction too, and this is technically
very cumbersome. Yet when we pass to calculating the grand canonical
partition function¹³ and sum over all possible N, we obtain an unrestricted
summation over all possible {m_l}. Writing z^N = z^{Σ_l l m_l} = Π_l (z^l)^{m_l} we get

Z = Σ_{m₁,m₂,...=0}^{∞} Π_{l=1}^{∞} (V b_l z^l/λ_T³)^{m_l} (1/m_l!)
  = [Σ_{m₁=0}^{∞} (1/m₁!)(V b₁z/λ_T³)^{m₁}] [Σ_{m₂=0}^{∞} (1/m₂!)(V b₂z²/λ_T³)^{m₂}] · · ·
  = exp[(V/λ_T³) Σ_{l=1}^{∞} b_l z^l] . (114)
We can now reproduce (110) and write the total number of particles:

P V = −Ω = T ln Z(z, V) = (T V/λ_T³) Σ_{l=1}^{∞} b_l z^l , (115)
1/v = (z/V) ∂ ln Z/∂z = λ_T^{−3} Σ_{l=1}^{∞} l b_l z^l . (116)
¹³ Sometimes the choice of the ensemble is dictated by the physical situation, sometimes by technical convenience, like now. The equation of state must be the same in the canonical and microcanonical ensembles, as we expect the pressure on the wall restricting the system to be equal to the pressure measured inside.

To get the equation of state one now must express z via v/λ³ from (116) and
substitute into (115). That generates the series called the virial expansion:

P v/T = Σ_{l=1}^{∞} a_l(T) (λ_T³/v)^{l−1} . (117)
The dimensionless virial coefficients can be expressed via the cluster coefficients,
i.e. they depend on the interaction potential and temperature:

a₁ = b₁ = 1 , a₂ = −b₂ , a₃ = 4b₂² − 2b₃ = −(λ_T^{−6}/3) ∫f_{12}f_{13}f_{23} dr₁₂ dr₁₃ , . . .

In distinction from the cluster coefficients bl which contain terms of different


order in fij we now have al ∝ ϵl i.e. al comes from simultaneous interaction
of l particles. Using graph language, virial coefficients al are determined by
irreducible clusters i.e. such that there are at least two entirely independent
non-intersecting paths that connect any two points. Tentative physical in-
terpretation of the cluster expansion (117) is that we consider an ideal gas of
clusters whose pressures are added. Further details can be found in Pathria,
Sects. 9.1-2 (second edition).

5.2 Van der Waals equation of state


We thus see that the cluster expansion in powers of f generates the virial
expansion of the equation of state in powers of n = N/V . Here we account
only for pairs of the interacting particles. The second virial coefficient
∫ { }
B(T ) = a2 λ3T = 2π 1 − exp[−u(r)/T ] r2 dr (118)

can be estimated by splitting the integral into two parts, from 0 to r0 (where
we can neglect the exponent assuming u large positive) and from r0 to ∞
(where we can assume small negative energy, u ≪ T , and expand the expo-
nent). That gives
∫ ∞
a
B(T ) = b − , a ≡ 2π u(r)r2 dr . (119)
T r0

with b = (2/3)πr03 introduced above. Of course, for any particular u(r) it is


pretty straightforward to calculate a2 (T ) but (119) gives a good approxima-
tion for most cases. We can now get the first correction to the equation of
state: [ ]
NT N B(T )
P = 1+ = nT (1 + bn) − an2 . (120)
V V

61
Generally, B(T ) is negative at low and positive at high temperatures. We
have seen that for Coulomb interaction the correction to pressure (129) is
always negative while in this case it is positive at high temperature where
molecules hit each other often and negative at low temperatures when long-
range attraction between molecules decreases the pressure. Since N B/V <
N b/V ≪ 1 the correction is small. Note that a/T ≪ 1 since we assume weak
interaction.
While by its very derivation the formula (120) is derived for a dilute
gas one may desire to change it a bit so that it can (at least qualitatively)
describe the limit of incompressible liquid. That would require the pressure
to go to infinity when density reaches some value. This is usually done by
replacing in (120) 1 + bn by (1 − bn)−1 which is equivalent for bn ≪ 1 but
for bn → 1 gives P → ∞. The resulting equation of state is called van der
Waals equation: ( )
P + an2 (1 − nb) = nT . (121)
There is though an alternative way to obtain (121) without assuming the gas
dilute. This is some variant of the mean field even though it is not a first
step of any consistent procedure. Namely, we assume that every molecule
moves in some effective field Ue (r) which is a strong repulsion (Ue → +∞)
in some region of volume bN and is an attraction of order −aN outside:
{∫ } [ ]
aN
F − Fid ≈ −T N ln e−Ue (r)/T dr/V = −T N ln(1 − bn) + . (122)
VT
Differentiating (122) with respect to V gives (121). That “derivation” also
helps understand better the role of the parameters b (excluded volume) and
a (mean interaction energy per molecule). From (122) one can also find the
entropy of the van der Waals gas S = −(∂F/∂T )V = Sid + N ln(1 − nb)
and the energy E = Eid − N 2 a/V , which are both lower than those for an
ideal gas, while the sign of the correction to the free energy depends on the
temperature. Since the correction to the energy is T -independent then CV
is the same as for the ideal gas.
Let us now look closer at the equation of state (121). The set of isotherms
is shown on the figure:

62
P V µ J
C E
D L Q
E D
N
L J
Q J E C
D
N L C N
V Q P P

Since it is expected to describe both gas and liquid then it must show
phase transition. Indeed, we see the region with (∂P/∂V )T > 0 at the lower
isotherm in the first figure. When the pressure correspond to the level NLC,
it is clear that L is an unstable point and cannot be realized. But which
stable point is realized, N or C? To get the answer, one must minimize the
Gibbs potential G(T, P, N ) = N µ(T, P ) since we have T and P fixed. For
one mole, integrating the relation∫ dµ(T, P ) = −sdT +vdP under the constant
temperature we find: G = µ = v(P )dP . It is clear that the pressure that
corresponds to D (having equal areas before and above the horizontal line)
separates the absolute minimum at the left branch Q (liquid-like) from that
on the right one C (gas-like). The states E (over-cooled or over-compressed
gas) and N (overheated or overstretched liquid) are metastable, that is they
are stable with respect to small perturbations but they do not correspond to
the global minimum of chemical potential. We thus conclude that the true
equation of state must have isotherms that look as follows:
P

Tc
V

The dependence of volume on pressure is discontinuous along the isotherm


in the shaded region (which is the region of phase transition). True partition
function and true free energy must give such an equation of state. We were
enable to derive it because we restricted ourselves by the consideration of the
uniform systems while in the shaded region the system is nonuniform being
the mixture of two phases. For every such isotherm T we have a value of

63
pressure P (T ), that corresponds to the point D, where the two phases coexist.
On the other hand, we see that if temperature is higher than some Tc (critical
point), the isotherm is monotonic and there is no phase transition. Critical
point was discovered by Mendeleev (1860) who also built the periodic table
of elements. At critical temperature the dependence P (V ) has an inflection
point: (∂P/∂V )T = (∂ 2 P/∂V 2 )T = 0. According to (33) the fluctuations
must be large at the critical point (more detail in the next chapter).

5.3 Coulomb interaction and screening


Interaction of charged particles is long-range and one may wonder how at all
one may use a thermodynamic approach (divide a system into independent
subsystems, for instance). The answer is in screening. Indeed, if the system
is neutral and the ions and electrons are distributed uniformly then the total
Coulomb energy of interaction is zero. Of course, interaction leads to corre-
lations in particle positions (particle prefer to be surrounded by the particles
of the opposite charge) which makes for a nonzero contribution to the energy
and other thermodynamic quantities. The semi-phenomenological descrip-
tion of such systems has been developed by Debye and Hückel (1923) and it
works for plasma and electrolytes. Consider the simplest situation when we
have electrons of the charge −e and ions of the charge +e.
We start from a rough estimate for the screening radius rD which we define
as that of a sphere around an ion where the total charge of all particles is of
order −e i.e. compensates the charge of the ion. Particles are distributed in
the field U (r) according to the Boltzmann formula n(r) = n0 exp[−U (r)/T ]
and the estimate is as follows:
3
rD n0 [exp(e2 /rD T ) − exp(−e2 /rD T )] ≃ 1 . (123)

We obtain what is called the Debye radius



T
rD ∼ (124)
n0 e2
1/3
under the condition of interaction weakness, e2 /rD T = (e2 n0 /T )3/2 ≪ 1.
Note that under that condition there are many particles inside the Debye
sphere: n0 rd3 ≫ 1 (in electrolytes rD is of order 10−3 ÷ 10−4 cm while in
ionosphere plasma it can be kilometers). Everywhere n0 is the mean density
of either ions or electrons.

64
We can now estimate the electrostatic contribution to the energy of the
system of N particles (what is called correlation energy):

e2 N 3/2 e3 A
Ū ≃ −N ≃−√ = −√ . (125)
rD VT VT
The (positive) addition to the specific heat
A e2
∆CV = ≃N ≪N . (126)
2V 1/2 T 3/2 rD T
One can get the correction to the entropy by integrating the specific heat:
∫ ∞ CV (T )dT A
∆S = − = − 1/2 3/2 . (127)
T T 3V T
We set the limits of integration here as to assure that the effect of screening
disappears at large temperatures. We can now get the correction to the free
energy and pressure
2A A
∆F = Ū − T ∆S = − , ∆P = − . (128)
3V 1/2 T 1/2 3V 3/2 T 1/2
Total pressure is P = N T /V − A/3V 3/2 T 1/2 — a decrease at small V (see
figure) hints about the possibility of phase transition which indeed happens
(droplet creation) for electron-hole plasma in semiconductors even though
our calculation does not work at those concentrations.
P
ideal

The correlation between particle positions (around every particle there


are more particles of opposite charge) means that attraction prevails over
repulsion so that it is necessary that corrections to energy, entropy, free
energy and pressure are negative. Positive addition to the specific heat could
be interpreted that increasing temperature one decreases screening and thus
increases energy.

65
Now, we can do all the consideration in a more consistent way calculating
exactly the value of the constant A. To calculate the correlation energy of
electrostatic interaction one needs to multiply every charge by the potential
created by other charges at its location. In estimates, we took the Coulomb
law for the potential around every charge, while it must differ as the distance
increases. Indeed, the electrostatic potential ϕ(r) around an ion determines
the distribution of ions (+) and electrons (-) by the Boltzmann formula
n± (r) = n0 exp[∓eϕ(r)/T ] while the charge density e(n+ − n− ) in its turn
determines the potential by the Poisson equation
( ) 8πe2 n0
∆ϕ = −4πe(n+ − n− ) = −4πen0 e−eϕ/T − eeϕ/T ≈ ϕ, (129)
T
where we expanded the exponents assuming the weakness of interaction.
This equation has a central-symmetric solution ϕ(r) = (e/r) exp(−κr) where
−2
κ2 = 8πrD . We are interesting in this potential near the ion i.e. at small r:
ϕ(r) ≈ e/r − eκ where the first term is the field of the ion itself while the
second term is precisely what we need i.e. contribution of all other charges.
We can now write the energy of every ion and electron as −e2 κ and get the
total electrostatic energy multiplying by the number of particles (N = 2n0 V )
and dividing by 2 so as not to count every couple of interacting charges twice:
√ N 3/2 e3
Ū = −n0 V κe2 = − π √ . (130)
VT

Comparing with the rough estimate (125), we just added the factor π.
The consideration by Debye-Hückel is the right way to account for the
1/3
first-order corrections in the small parameter e2 n0 /T . One cannot though
get next corrections within the method [further expanding the exponents in
(129)]. That would miss multi-point correlations which contribute the next
orders. Indeed, the existence of an ion at some point influences not only
the probability of having an electron at some other point but they together
influence the probability of charge distribution in the rest of the space. To
account for multi-point correlations, one needs Bogolyubov’s method of cor-
relation functions. Such functions are multi-point joint probabilities to find
simultaneously particles at given places. The correlation energy is expressed
via the two-point correlation function wab where the indices mark both the
type of particles (electrons or ions) and the positions ra and rb :
1 ∑ Na Nb ∫ ∫
E= uab wab dVa dVb . (131)
2 a,b V 2

66
Here uab is the energy of the interaction. The pair correlation function is
determined by the Gibbs distribution integrated over the positions of all
particles except the given pair:
∫ [ ]
2−N F − Fid − U (r1 . . . rN )
wab = V exp dV1 . . . dVN −2 . (132)
T

Here ∑ ∑
U = uab + (uac + ubc ) + ucd .
c c,d̸=a,b

Expanding (132) in U/T we get terms like uab wab and in addition (uac +
ubc )wabc which involves the third particle c and the triple correlation function
that one can express via the integral similar to (132):
∫ [ ]
F − Fid − U (r1 . . . rN )
wabc = V 3−N exp dV1 . . . dVN −3 . (133)
T

We can also see this (so-called closure problem) by differentiating


∑ ∫
∂wab wab ∂uab ∂ubc
=− − (V T )−1 Nc wabc dVc , (134)
∂rb T ∂rb c ∂rb

and observing that the equation on wab is not closed, it contains wabc ; the
similar equation on wabc will contain wabcd etc. Debye-Hückel approximation
corresponds to closing this hierarchical system of equations already at the
level of the first equation (134) putting wabc ≈ wab wbc wac and assuming
ωab = wab − 1 ≪ 1, that is assuming that two particles rarely come close
while three particles never come together:
∑ ∫
∂ωab 1 ∂uab −1 ∂ubc
=− − (V T ) Nc ωac dVc , (135)
∂rb T ∂rb c ∂rb

For other contributions to wabc , the integral turns into zero due to isotropy.
This is the general equation valid for any form of interaction. For Coulomb
interaction, we can turn the integral equation (135) into the differential equa-
tion by using ∆r−1 = −4πδ(r). For that we differentiate (135) once more:

4πza zb e2 4πzb e2 ∑
∆ωab (r) = δ(r) + Nc zc ωac (r) . (136)
T TV c

67
The dependence on ion charges and types is trivial, ωab (r) = za zb ω(r) and
we get ∆ω = 4πe2 δ(r)/T + κ2 ω which is (129) with delta-function enforc-
ing the condition at zero. We see that the pair correlation function satis-
fies the same equation as the potential. Substituting the solution ω(r) =
−(e2 /rT ) exp(−κr) into wab (r) = 1 + za zb ω(r) and that into (131) one gets
contribution of 1 vanishing because of electro-neutrality and the term linear
in ω giving (130). To get to the next order, one considers (134) together with
the equation for wabc , where one expresses wabcd via wabc .
After seeing how screening works, it is appropriate to ask what happens
when there is no screening in a system with a long-range interaction. One
example of that is gravity. Indeed, thermodynamics is very peculiar in this
case. Arguably, the most dramatic manifestation is the Jeans instability of
sufficiently large systems which leads to gravitational collapse, creation of
stars and planets and, eventually, appearance of life.
For more details on Coulomb interaction, see Landau & Lifshitz, Sects.
78,79 for more details.
The quantum (actually quasi-classical) variant of such mean-field consid-
eration is called Thomas-Fermi method (1927) and is traditionally studied
in the courses of quantum mechanics as it is applied to the electron distri-
bution in large atoms (such placement is probably right despite the method
is stat-physical because objects of study are more important than methods).
In this method we consider the effect of electrostatic interaction on a degen-
erate electron gas at zero temperature. According to the Fermi distribution
(88), the maximal kinetic energy (Fermi energy) is related to the local con-
centration n(r) by p20 /2m = (3π 2 n)2/3 h̄2 /2m, which is also the expression for
the chemical potential at the zero temperature. We need to find the elec-
trostatic potential ϕ(r) which determines the interaction energy for every
electron, −eϕ. The sum of the chemical potential and the interaction energy,
p20 /2m − eϕ = −eϕ0 , must be space-independent otherwise the electrons drift
to the places with lower −ϕ0 . The constant ϕ0 is zero for neutral atoms and
negative for ions. We can now relate the local electron density n(r) to the lo-
cal potential ϕ(r): p20 /2m = eϕ − eϕ0 = (3π 2 n)2/3 h̄2 /2m — that relation one
must now substitute into the Poisson equation ∆ϕ = 4πen ∝ (ϕ−ϕ0 )3/2 . The
boundary condition is ϕ → Z/r at r → 0 where Z is the charge of the nuclei.
This equation has a power-law solution at large distances ϕ ∝ r−4 , n ∝ r−6
which does not make much sense as atoms are supposed to have finite sizes.
Indeed, at large distances (∝ Z 1/3 ), the quasi-classical description breaks as
the quantum wavelength is comparable to the distance r. The description

68
also is inapplicable below the Bohr radius. The Thomas-Fermi approxima-
tion works well for large-atoms where there is an intermediate interval of
distances (Landau&Lifshitz, Quantum Mechanics, Sect. 70).

6 Phase transitions
6.1 Thermodynamic approach
The main theme of this Chapter is the competition (say, in minimizing the
free energy) between the interaction energy, which tends to order systems,
and the entropy, which brings a disorder. Upon the change of some parame-
ters, systems can undergo a phase transition from more to less ordered states.
We start this Chapter from the phenomenological approach to the transitions
of both first and second orders. We then proceed to develop a microscopic
statistical theory based on Ising model.

6.1.1 Necessity of the thermodynamic limit


So far we got the possibility of a phase transition almost for free by cooking
the equation of state for the van der Waals gas. But can one really derive
the equations of state that have singularities or discontinuities? Let us show
that this is impossible in a finite system. Indeed, the classical grand partition
function (expressed via fugacity z = exp(µ/T )) is as follows:


Z(z, V, T ) = z N Z(N, V, T ) . (137)
N =0

Here the classical partition function of the N -particle system is


1 ∫
Z(N, V, T ) = exp[−U (r1 , . . . , rN )/T ]dr1 , . . . , rN (138)
N !λ3N
and the thermal wavelength is given by λ2 = 2πh̄2 /mT . Now, interaction
means that for a finite volume V there is a maximal number of molecules
Nm (V ) that can be accommodated in V . That means that the sum in (137)
actually goes until Nm so that the grand partition function is a polynomial in
fugacity with all coefficients positive14 . The equation of state can be obtained
14
Even when one does not consider hard-core models, the energy of repulsion grows so
fast when the distance between molecules are getting less than some scale that Boltzmann
factor effectively kills the contribution of such configurations.

69
by eliminating z from the equations that give P (v) in a parametric form —
see (115,116):

P 1 1 [ ]
= ln Z(z, V ) = ln 1 + zZ(1, V, T ) + . . . + z Nm Z(Nm , V, T ) ,
T V V
1 z ∂ ln Z(z, V ) zZ(1, V, T ) + 2z 2 Z(2, V, T ) . . . + Nm z Nm Z(Nm , V, T )
= = .
v V ∂z V [1 + zZ(1, V, T ) +. . .+ z Nm Z(Nm , V, T )]

For Z(z) being a polynomial, both P and v are analytic functions of z in a


region of the complex plane that includes the real positive axis. Therefore,
P (v) is an analytic function in a region of the complex plane that includes
the real positive axis. Note that V /Nm ≤ v < ∞. One can also prove that
∂v −1 /∂z > 0 so that ∂P/∂v = (∂P/∂z)/(∂v/∂z) < 0.
For a first-order transition, the pressure must be independent of v in the
transition region. We see that strictly speaking in a finite volume we cannot
have that since P (v) is analytic, nor we can have ∂P/∂v > 0. That means
that singularities, jumps etc can appear only in the thermodynamic limit
N → ∞, V → ∞ (where, formally speaking, the singularities that existed
in the complex plane of z can come to the real axis). Such singularities are
related to zeroes of Z(z). When such a zero z0 tends to a real axis at the
limit N → ∞ (like the root e−iπ/2N of the equation z N + 1 = 0) then 1/v(z)
and P (z) are determined by two different analytic functions in two regions:
one, including the part of the real axis with z < z0 and another with z > z0 .
Depending on the order of zero of Z(z), 1/v itself may have a jump or its n-th
derivative may have a jump, which corresponds to the n + 1 order of phase
transition. For n = 0, P ∝ limN →∞ N −1 ln |z − z0 − O(N −1 )| is continuous at
z → z0 but ∂P/∂z and 1/v are discontinuous; this is the transition of the first
order. For a second-order transition, volume is continuous but its derivative
jumps. We see now what happens as we increase T towards Tc : another zero
comes from the complex plane into real axis and joins the zero that existed
there before, turning 1st order phase transition into the 2nd order transition;
at T > Tc the zeroes leave the real axis. That one needs N → ∞ for a phase
transition is one of the manifestation of the ”more is different” principle.
See more details in Huang, Sect. 15.1-2.

70
P 1/v P

first order

z z v
z0 z0
P 1/v P

second order

z z v
z0 z0

6.1.2 First-order phase transitions


Let us now consider equilibrium between phases from a general viewpoint.
We must have T1 = T2 and P1 = P2 . Requiring dG/dN1 = ∂G1 /∂N1 +
(∂G2 /∂N2 )(dN2 /dN1 ) = µ1 (P, T ) − µ2 (P, T ) = 0 we obtain the curve of the
phase equilibrium P (T ). We thus see on the P − T plane the states outside
the curve are homogeneous while on the curve we have the coexistence of two
different phases. If one changes pressure or temperature crossing the curve
then the phase transition happens. Three phases can coexist only at a point.
On the T − V plane the states with phase coexistence fill whole domains
(shaded on the figure) since different phases have different specific volumes.
Different points on the V − T diagram inside the coexistence domains cor-
respond to different fractions of phases. Consider, for instance, the point A
inside the gas-solid coexistence domain. Since the specific volumes of the
solid and the gas are given by the abscissas of the points 1 and 2 respectively
then the fractions of the phases in the state A are inversely proportional to
the lengthes A1 and A2 respectively (the lever rule).

71
P T
Critical point L−S L G−L
LIQUID
T tr G
SOLID S
1 A 2
Triple point
G−S
GAS T V
Ttr

Changing V at constant T in the coexistence domain (say, from the state


1 to the state 2) we realize the phase transition of the first order. Phase
transitions of the first order are accompanied by an absorption or release of
some (latent) heat L. Since the transition happens at fixed temperature and
pressure then the heat equals to the enthalpy change or simply to L = T ∆s
(per mole). If 2 is preferable to 1 at higher T (see the figure below) then
L > 0 (heat absorbed) as it must be according to the Le Chatelier principle:
µ

µ1 L>0
µ2 T

On the other hand, differentiating µ1 (P, T ) = µ2 (P, T ) and using s =


−(∂µ/∂T )P , v = (∂µ/∂P )T , one gets the Clausius-Clapeyron equation

dP s1 − s2 L
= = . (139)
dT v1 − v2 T (v2 − v1 )

Since the entropy of a liquid is usually larger than that of a solid then L > 0
that is the heat is absorbed upon melting and released upon freezing. Most
of the substances also expand upon melting then the solid-liquid equilibrium
line has dP/dT > 0, as on the P-T diagram above. Water, on the contrary,
contracts upon melting so the slope of the melting curve is negative (for-
tunate for fish and unfortunate for Titanic, ice floats on water). Note that
symmetries of solid and liquid states are different so that one cannot contin-
uously transform solid into liquid. That means that the melting line starts
on another line and goes to infinity since it cannot end in a critical point
(like the liquid-gas line).
Clausius-Clapeyron equation allows one, in particular, to obtain the pres-
sure of vapor in equilibrium with liquid or solid. In this case, v1 ≪ v2 .
We may treat the vapor as an ideal so that v2 = T /P and (139) gives

72
d ln P/dT = L/T 2 . We may further assume that L is approximately inde-
pendent of T and obtain P ∝ exp(−L/T ) which is a fast-increasing function
of temperature. Landau & Lifshitz, Sects. 81–83.

6.1.3 Second-order phase transitions


As we have seen, in the critical point, the differences of specific entropies and
volumes turn into zero. Considering µ(P ) at T = Tc one can say that the
chemical potential of one phase ends where another starts and the derivative
v(P ) = (∂µ/∂P )Tc is continuous.
µ v

µ2
µ1

P P
Another examples of continuous phase transitions (i.e. such that corre-
spond to a continuous change in the system) are all related to the change
in symmetry upon the change of P or T . Since symmetry is a qualitative
characteristics, it can change even upon an infinitesimal change (for exam-
ple, however small ferromagnetic magnetization breaks isotropy). Here too
every phase can exist only on one side of the transition point. The tran-
sition with first derivatives of the thermodynamic potentials continuous is
called second order phase transition. Because the phases end in the point
of transition, such point must be singular for the thermodynamic potential,
and indeed second derivatives, like specific heat, are generally discontinuous.
One set of such transitions is related to the shifts of atoms in a crystal lattice;
while close to the transition such shift is small (i.e. the state of matter is
almost the same) but the symmetry of the lattice changes abruptly at the
transition point. Another set is a spontaneous appearance of macroscopic
magnetization (i.e. ferromagnetism) below Curie temperature. Transition
to superconductivity is of the second order. Variety of second-order phase
transitions happen in liquid crystals etc. Let us stress that some transitions
with a symmetry change are first-order (like melting) but all second-order
phase transitions correspond to a symmetry change.

73
6.1.4 Landau theory
To describe general properties of the second-order phase transitions Lan-
dau suggested to characterize symmetry breaking by some order parameter
η which is zero in the symmetrical phase and is nonzero in nonsymmetric
phase. Example of an order parameter is magnetization. The choice of order
parameter is non-unique; to reduce arbitrariness, it is usually required to
transform linearly under the symmetry transformation. The thermodynamic
potential can be formally considered as G(P, T, η) even though η is not an
independent parameter and must be found as a function of P, T from requir-
ing the minimum of G. We can now consider the thermodynamic potential
near the transition as a series in small η:
G(P, T, η) = G0 + A(P, T )η 2 + B(P, T )η 4 . (140)
The linear term is absent to keep the first derivative continuous at η = 0. The
coefficient B must be positive since arbitrarily large values of η must cost a
lot of energy. The coefficient A must be positive in the symmetric phase when
minimum in G corresponds to η = 0 (left figure below) and negative in the
non-symmetric phase where η ̸= 0. Therefore, at the transition Ac (P, T ) = 0
and Bc (P, T ) > 0:
G G

A>0 A<0
η η

We assume that the symmetry of the system requires the absence of η 3 -


term, then the only requirement on the transition is Ac (P, T ) = 0 so that
the transition points fill the line in P − T plane. If the transition happens
at some Tc then generally near transition15 A(P, T ) = a(P )(T − Tc ). Writing
then the potential
G(P, T, η) = G0 + a(P )(T − Tc )η 2 + B(P, T )η 4 , (141)
and requiring ∂G/∂η = 0 we get
a
η̄ 2 = (Tc − T ) . (142)
2B
15
We assume a > 0 since in almost all cases the more symmetric state corresponds to
higher temperatures; rare exceptions exist so this is not a law.

74
In the lowest order in η the entropy is S = −∂G/∂T = S0 + a2 (T − Tc )/2B at
T < Tc and S = S0 at T > Tc . Entropy is lower at lower-temperature phase
(which is generally less symmetric). Specific heat Cp = T ∂S/∂T has a jump
at the transitions: ∆Cp = a2 Tc /2B. Specific heat increases when symmetry
is broken since more types of excitations are possible.
If symmetries allow the cubic term C(P, T )η 3 (like in a gas or liquid
near the critical point discussed in Sect. 7.2 below) then one generally has a
first-order transition, say, when A < 0 and C changes sign:
G G G

C>0 C=0 C<0

η η η

It turns into a second-order transition, for instance, when A = C = 0 i.e.


only in isolated points in P − T plane.
Consider now what happens when there is an external field (like magnetic
field) which contributes the energy (and thus the thermodynamic potential)
by the term −hηV . Equilibrium condition,

2a(T − Tc )η + 4Bη 3 = hV , (143)

has one solution η(h) above the transition and may have three solutions (one
stable, two unstable) below the transition:
η η η

h h h

T>T c T=T c T<T c

The similarity to the van der Waals isotherm is not occasional: changing
the field at T < Tc one encounters √ a first-order phase transition at h = 0
where the two phases with η = ± a(Tc − T )/2B coexist. We see that p − pc
is analogous to h and 1/v − 1/vc to the order parameter (magnetization) η.

75
Susceptibility,
( ) {
∂η V [2α(T − Tc )]−1 at T > Tc
χ= = = (144)
∂h h=0
2a(T − Tc ) + 12Bη 2 [4α(Tc − T )]−1 at T < Tc

which diverges at T → Tc . Compared to χ ∝ 1/T obtained in (36) for


the noninteracting spins we see that the paramagnetic phase corresponds to
T ≫ Tc . Experiments support the Curie law (144). Since a ∝ V we have
introduced α = a/V in (144).
We see that Landau theory (based on the only assumption that the ther-
modynamic potential must be an analytic function of the order parameter)
gives universal predictions independent on space dimensionality and of all
the details of the system except symmetries. Is it true? Considering specific
systems we shall see that Landau theory actually corresponds to a mean-field
approximation i.e. to neglecting the fluctuations. The potential is getting
flat near the transition then the fluctuations grow. In particular, the prob-
ability of the order parameter fluctuations around the equilibrium value η̄
behaves as follows [ ( ) ]
(η − η̄)2 ∂ 2 G
exp − ,
Tc ∂η 2 T,P
so that the mean square fluctuation of the order parameter, ⟨(η − η̄)2 ⟩ =
Tc /2A = Tc /2a(T − Tc ). Remind that a is proportional to the volume under
consideration. Fluctuations are generally inhomogeneous and are correlated
on some scale. To establish how the correlation radius depends on T − Tc one
can generalize the Landau theory for inhomogeneous η(r), which is done in
Sect. 7.2 below, where we also establish the validity conditions of the mean
field approximation and the Landau theory.
Landau & Lifshitz, Sects. 142, 143, 144, 146.

6.2 Ising model


We now descend from phenomenology to real microscopic statistical the-
ory. Our goal is to describe how disordered systems turn into ordered one
when interaction prevails over thermal motion. Different systems seem to be
having interaction of different nature with their respective difficulties in the
description. For example, for the statistical theory of condensation one needs
to account for many-particle collisions. Magnetic systems have interaction

76
of different nature and the technical difficulties related with the commuta-
tion relations of spin operators. It is remarkable that there exists one highly
simplified approach that allows one to study systems so diverse as ferromag-
netism, condensation and melting, order-disorder transitions in alloys, phase
separation in binary solutions, and also model phenomena in economics, so-
ciology, genetics, to analyze the spread of forest fires etc. This approach is
based on the consideration of lattice sites with the nearest-neighbor interac-
tion that depends upon the manner of occupation of the neighboring sites.
We shall formulate it initially on the language of ferromagnetism and then
establish the correspondence to some other phenomena.

6.2.1 Ferromagnetism
Experiments show that ferromagnetism is associated with the spins of elec-
trons (not with their orbital motion). Spin 1/2 may have two possible pro-
jections. We thus consider lattice sites with elementary magnetic moments
±µ. We already considered (Sect. 2.5.1) this system in an external magnetic
field H without any interaction between moments and got the magnetization
(35):
exp(µH/T ) − exp(−µH/T )
M = nµ . (145)
exp(µH/T ) + exp(−µH/T )
Of course, this formula gives no first-order phase transition upon the change
of the sign of H as in Landau theory (143). The reason is that we did not
account for interaction (i.e. B-term). First phenomenological treatment of
the interacting system was done by Weiss who assumed that there appears
some extra magnetic field proportional to magnetization which one adds to
H and thus describes the influence that M causes upon itself:
µ(H + ξM )
M = N µ tanh ). (146)
T
And now put the external field to zero H = 0. The resulting equation can
be written as
Tc η
η = tanh , (147)
T
where we denoted η = M/µN and Tc = ξµ2 N . At T > Tc there is a
single solution η = 0 while at T < Tc there are two more nonzero solutions
which exactly means the appearance of the spontaneous magnetization. At
Tc − T ≪ Tc one has η 2 = 3(Tc − T ) exactly as in Landau theory (142).

77
η
1

T/Tc
1
One can compare Tc with experiments and find surprisingly high ξ ∼ 103 ÷
4
10 . That means that the real interaction between moments is much higher
than the interaction between neighboring dipoles µ2 n = µ2 /a3 . Frenkel and
Heisenberg solved this puzzle (in 1928): it is not the magnetic energy but
the difference of electrostatic energies of electrons with parallel and antipar-
allel spins, so-called exchange energy, which is responsible for the interaction
(parallel spins have antisymmetric coordinate wave function and much lower
energy of interaction than antiparallel spins).
We can now at last write the Ising model (formulated by Lenz in 1920
and solved in one dimension by his student Ising in 1925): we have the
variable σi = ±1 at every lattice site. The energy includes interaction with
the external field and between neighboring spins:


N ∑
H = −µH σi + J/4 (1 − σi σj ) . (148)
i ij

We assume that every spin has γ neighbors (γ = 2 in one-dimensional chain, 4


in two-dimensional square lattice, 6 in three dimensional simple cubic lattice
etc). We see that parallel spins have zero interaction energy while antiparallel
have J (which is comparable to Rydberg).
Let us start from H = 0. Magnetization is completely determined by
the numbers of spins up: M = µ(N+ − N− ) = µ(2N+ − N ). We need
to write the free energy F = E − T S and minimizing it find N+ . The
competition between energy and entropy determines the phase transition.
N
Entropy is easy to get: S = ln CN + = ln[N !/N+ !(N − N+ )!]. The energy
of interaction depends on the number of neighbors with opposite spins N+− .
The crudest approximation (Bragg and Williams, 1934) is, of course, mean-
field, i.e. replacing N-particle problem with a single-particle problem. It
consists of saying that every up spin has the number of down neighbors
equal to the mean value γN− /N so that the energy ⟨H⟩ = E = JN+− ≈
γN+ (N − N+ )J/N . Requiring the minimum of the free energy, ∂F/∂N+ = 0,

78
we get:
N − 2N+ N − N+
γJ − T ln =0. (149)
N N+
Here we can again introduce the variables η = M/µN and Tc = γJ/2 and
reduce (149) to (147). We thus see that indeed Weiss approximation is equiv-
alent to the mean field. The only addition is that now we have the expression
for the free energy, F/2N = Tc (1−η 2 )−T (1+η) ln(1+η)−T (1−η) ln(1−η),
so that we can indeed make sure that the nonzero η at T < Tc correspond to
minima. Here is the free energy plotted as a function of magnetization, we
see that it has exactly the form we assumed in the Landau theory (which as
we see near Tc corresponds to the mean field approximation). The energy is
symmetrical with respect to flipping all the spins simultaneously. The free
energy is symmetric with respect to η ↔ −η. But the system at T < Tc lives
in one of the minima (positive or negative η). When the symmetry of the
state is less than the symmetry of the potential (or Hamiltonian) it is called
spontaneous symmetry breaking.
F
T<Tc
T=T_
c

T>Tc η

We can also calculate the specific heat using E = γN+ (N − N+ )J/N =


(Tc N/2)(1 − η 2 ) and obtain the jump exactly like in Landau theory:
√ dη
η= 3(Tc − T ) , ∆C = C(Tc − 0) = −Tc N η = 3N/2 .
dT
At T → 0, we have η ≈ 1 − 2 exp(−2Tc /T ) and the specific heat vanishes
exponentially: C(T ) ≈ 4N (Tc /T )2 exp(−2Tc /T ).
Note that in our approximation, when the long-range order (i.e. N+ )
is assumed to completely determine the short-range order (i.e. N+− ), the
energy is independent of temperature at T > Tc since N+ ≡ N/2. We do
not expect this in reality. Moreover, let us not delude ourselves that we
proved the existence of the phase transition. How wrong is the mean-field
approximation one can see comparing it with the exact solution for the one-
dimensional chain. Indeed, consider again H = 0. It is better to think
not about spins but about the links between spins. Starting from the first
spin, the state of the chain can be defined by saying whether the next one is

79
parallel to the previous one or not. If the next spin is opposite it gives the
energy J and if it is parallel the energy is zero. There are N − 1 links. The
partition function is that of the N − 1 two-level systems (37):

Z = 2[1 + exp(−J/T )]N −1 . (150)

Here 2 because there are two possible orientations of the first spin.
One can do it for a more general Hamiltonian (as in the Exercise 1.2)


N ∑
N N {
∑ }
H
H = −H σi − J σi σi+1 = − (σi + σi+1 ) + Jσi σi+1 , (151)
i=1 i=1 i=1 2

which we consider on a ring so that σN +1 = σ1 . We now write the partition


function as
[ N { }]
∑ ∑ H
Z = exp β (σi + σi+1 ) + Jσi σi+1 (152)
{σi } i=1 2
∑∏
N [ { }]
H
= exp β (σi + σi+1 ) + Jσi σi+1 (153)
{σi } i=1
2

Every factor in the product can be written as a matrix element ⟨σj |T̂ |σj+1 ⟩ =
Tσj σj+1 = exp[β{H(σi + σi+1 )/2 + Jσi σi+1 } of the transfer matrix
( )
T1,1 T1,−1
T = (154)
T−1,1 T−1,−1

where T11 = eβ(J+H) , T−1,−1 = eβ(J−H) , T−1,1 = T1,−1 = e−βJ . The sum over
σj = ±1 corresponds to taking trace of the matrix:
∑ ∑
Z= Tσ1 σ2 Tσ2 σ3 . . . TσN σ1 = ⟨σ1 |T̂ N |σ1 ⟩ = trace T N . (155)
{σi } σ1 =±1

Therefore

Z = λN N
1 + λ2 (156)

where λ1 and λ2 are eigenvalues of T , given by



λ1,2 = e βJ
cosh(βH) ± e2βJ sinh2 (βH) + e−2βJ (157)

80
Therefore
  ( )N 
 
λ2 
F = −kB T log(λN
1 + λ2 ) = −kB T N log(λ1 ) + log 1 +
N
λ1
→ −N kB T log λ1 as N →∞ (158)

Now, as we know, there is no phase transitions for a two-level system. In


particular one can compare the mean-field energy E = Tc (1 − η 2 ) with the
exact 1d expression (40) which can be written as E(T ) = N J/(1 + eJ/T ) and
compare the mean field specific heat with the exact 1d expression:
C mean−field
1d

T
We can improve the mean-field approximation by accounting exactly for
the interaction of a given spin σ0 with its γ nearest neighbors and replacing
the interaction with the rest of the lattice by a new mean field H ′ (this is
called Bethe-Peierls or BP approximation):

γ ∑
γ
Hγ+1 = −µH ′ σj − (J/2) σ0 σj . (159)
j=1 j=1

The external field H ′ is determined by the condition of self-consistency, which


requires that the mean values of all spins are the same: σ̄0 = σ̄i . To do that,
let us calculate the partition function of this group of γ + 1 spins:
∑ ( ∑
γ ∑
γ )
Z= exp η σj + ν σ0 σj = Z+ + Z− ,
σ0 ,σj =±1 j=1 j=1
∑ [ ∑γ ]
Z± = exp (η ± ν) σj = [2 cosh(η ± ν)]γ , η = µH ′ /T , ν = J/2T .
σj =±1 j=1

Z± correspond to σ0 = ±1. Requiring σ̄0 = (Z+ − Z− )/Z to be equal to


⟨ γ ⟩
1 ∑ 1 ∂Z
σ̄j = σj =
γ j=1 γZ ∂η
{ }
= Z −1 [2 cosh(η + ν)]γ−1 sinh(η + ν) + [2 cosh(η − ν)]γ−1 sinh(η − ν) ,

81
we obtain [ ]
γ−1 cosh(η + ν)
η= ln (160)
2 cosh(η − ν)
instead of (147) or (149). Condition that the derivatives with respect to η at
zero are the same, (γ − 1) tanh ν = 1, gives the critical temperature:
( )
γ
Tc = J ln−1 , γ≥2. (161)
γ−2
It is lower than the mean field value γJ/2 and tends to it when γ → ∞
— mean field is exact in an infinite-dimensional space. More important, it
shows that there is no phase transition in 1d when γ = 2 and Tc = 0 (in
fact, BP is exact in 1d). Note that η is now not a magnetization, which
is given by the mean spin σ̄0 = sinh(2η)/[cosh(2η) + exp(−2ν)]. While at
T > Tc both η = σ̄0 = 0, BP gives nonzero energy and specific heat. Indeed,
now Z = 2γ+1 coshγ (βJ/2) and F = −T ln Z = −(γ/β) ln cosh(βJ/2). The
energy is E = ∂(F β)/∂β = (γJ/2) tanh(βJ/2) and the specific heat, C =
(γJ 2 /8T 2 ) cosh2 (J/2T ), such that C ∝ T −2 at T → ∞ (see Pathria 11.6 for
more details):
C exact 2d solution

BP
mean−field

T/J
1.13 1.44 2
Bragg-Williams and Bethe-Peierls approximations are the first and the
second steps of some consistent procedure. When the space dimensionality
is large, then 1/d is a small parameter whose powers determine the contri-
butions of the subsequent approximations. Mean-field corresponds to the
total neglect of fluctuations, while BP accounts for them in the first approx-
imation. One can also say that it corresponds to the account of correlations:
indeed, correlations make fluctuations (like having many spins with the same
direction in a local neighborhood) more probable and require us to account
for them. The two-dimensional Ising model was solved exactly by Onsager
(1944). The exact solution shows the phase transition in two dimensions.
The main qualitative difference from the mean field is the divergence of the
specific heat at the transition: C ∝ − ln |1 − T /Tc |. This is the result of

82
fluctuations: the closer one is to Tc the stronger are the fluctuations. The
singularity of the specific heat ∫is integrable that is, for instance, the en-
tropy change S(T1 ) − S(T2 ) = TT12 C(T )dT /T is finite across the transition
(and goes to zero when T1√→ T2 ) and so is the energy change. Note also
that the true Tc = J/2 ln[( 2 − 1)−1 ] is less than both the mean-field value
Tc = γJ/2 = 2J and BP value Tc = J/ ln 2 also because of fluctuations
(one needs lower temperature to “freeze” the fluctuations and establish the
long-range order).

6.2.2 Impossibility of phase coexistence in one dimension


It is physically natural that fluctuations has much influence in one dimen-
sion: it is enough for one spin to flip to loose the information of the preferred
orientation. It is thus not surprising that phase transitions are impossible
in one-dimensional systems with short-range interaction. Another way to
understand that the ferromagnetism is possible only starting from two di-
mensions is to consider the spin lattice and ask if we can make temperature
low enough to have a nonzero magnetization. The state of lowest energy has
all spins parallel. The first excited state correspond to one spin flip and has
an energy higher by ∆E = γJ, the concentration of such opposite spins is
proportional to exp(−γJ/T ) and must be low at low temperatures so that
the magnetization is close to µN and η ≈ 1. In one dimension, however, the
lowest excitation is not the flip of one spin (energy 2J) but flipping all the
spins to the right or left from some site (energy J). Again the mean number
of such flips is N exp(−J/T ) and in sufficiently long chain this number is
larger than unity i.e. the mean magnetization is zero. Note that short pieces
with N < exp(J/T ) are magnetized.
That argument can be generalized for arbitrary systems with the short-
range interaction in the following way (Landau, 1950; Landau & Lifshitz,
Sect. 163.): assume we have n contact points of two different phases. Those
points add nϵ − T S to the thermodynamic potential. The entropy is ln CLn
where L is the length of the chain. Evaluating entropy at 1 ≪ n ≪ L
we get the addition to the potential nϵ − T n ln(eL/n). The derivative of
the thermodynamic potential with respect to n is thus ϵ − T ln(L/n) and it
is negative for sufficiently small n/L. That means that one decreases the
thermodynamic potential creating the mixture of two phases all the way
until the derivative comes to zero which happens at L/n = exp(ϵ/T ) — this
length can be called the correlation scale of fluctuations and it is always

83
finite in 1d at a finite temperature as in a disordered state. Let us stress
that for the above arguments it is important that the ground state is non-
degenerate so that the first excited state has a higher energy (degeneracy
leads to criticality).

6.2.3 Equivalent models


The anti-ferromagnetic case has J < 0 and the ground state at T = 0 corre-
sponds to the alternating spins i.e. to two sublattices. Without an external
magnetic field, the magnetization of every sublattice is the same as for Ising
model with J > 0 which follows from the fact that the energy is invariant
with respect to the transformation J → −J and flipping all the spins of
one of the sublattices. Therefore we have the second-order phase transition
at zero field and at the temperature which is called Neel temperature. The
difference from ferromagnetic is that there is a phase transition also at a
nonzero external field (there is a line of transition in H − T plane.
One can try to describe the condensation transition by considering a
regular lattice with N cites that can be occupied or not. We assume our
lattice to be in a contact with a reservoir of atoms so that the total number
of atoms, Na , is not fixed. We thus use a grand canonical description with
Z(z, N, T ) given by (108). We model the hard-core repulsion by requiring
that a given cite cannot be occupied by more than one atom. The number of
cites plays the role of volume (choosing the volume of the unit cell unity). If
the neighboring cites are occupied by atoms it corresponds to the (attraction)
energy −2J so we have the energy E = −2JNaa where Naa is the total
number of nearest-neighbor pairs of atoms. The partition function is

a
Z(Na , T ) = exp(2JNaa /T ) , (162)

where the sum is over all ways of distributing Na indistinguishable atoms


over N cites. Of course, the main problem is in calculating how many times
one finds the given Naa . The grand partition function,


Z(z, V, T ) = z Na Z(Na , T )) , (163)
Na

gives the equation of state in the implicit form (like in Sect. 6.1.1): P =
T ln Z/N and 1/v = (z/V )∂ ln Z/∂z. The correspondence with the Ising
model can be established by saying that an occupied site has σ = 1 and

84
unoccupied one has σ = −1. Then Na = N+ and Naa = N++ . Recall that
for Ising model, we had E = −µH(N+ − N− ) + JN+− = µHN + (Jγ −
2µH)N+ − 2JN++ . Here we used the identity γN+ = 2N++ + N+− which
one derives counting the number of lines drawn from every up spin to its
nearest neighbors. The partition function of the Ising model can be written
similarly to (163) with z = exp[(γJ − 2µH)/T ]. Further correspondence can
be established: the pressure P of the lattice gas can be expressed via the
free energy per cite of the Ising model: P ↔ −F/N + µH and the inverse
specific volume 1/v = Na /N of the lattice gas is equivalent to N+ /N =
(1 + M/µN )/2 = (1 + η)/2. We see that generally (for given N and T ) the
lattice gas corresponds to the Ising model with a nonzero field H so that the
transition is generally of the first-order in this model. Indeed, when H = 0
we know that η = 0 for T > Tc which gives a single point v = 2, to get the
whole isotherm one needs to consider the nonzero H i.e. the fugacity different
from exp(γJ). In the same way, the solutions of the zero-field Ising model
at T < Tc gives us two values of η that is two values of the specific volume
for a given pressure P . Those two values, v1 and v2 , precisely correspond to
two phases in coexistence at the given pressure. Since v = 2/(1 + η) then as
T → 0 we have two roots η1 → 1 which correspond to v1 → 1 and η1 → −1
which corresponds to v1 → ∞. For example, in the mean field approximation
(149) we get (denoting B = µH)

γJ T 1 − η2 γJ T
P =B− (1 + η ) − ln
2
, B= − ln z ,
4 (
2 4 ) 2 2
2 B γJη
v= , η = tanh + . (164)
1+η T 2T
As usual in a grand canonical description, to get the equation of state one
expresses v(z) [in our case B(η)] and substitutes it into the equation for the
pressure. On the figure, the solid line corresponds to B = 0 at T < Tc where
we have a first-order phase transition with the jump of the specific volume,
the isotherms are shown by broken lines. The right figure gives the exact
two-dimensional solution.

85
P/T
c P/ T
c

5
0.2

T
1 T
c

v v
0
1 2 3 0 1 2 3

The mean-field approximation (164) is equivalent to the Landau theory


near the critical point. In the variables t = T − Tc , η = n − nc the equation of
state takes the form p = P − Pc = bt + 2atη + 4Cη 3 with C > 0 for stability
and a > 0 to have a homogeneous state at t > 0. In coordinates p, η the
isotherms at t = 0 (upper curve) and t < 0 (lower curve) look as follows:
p
η2 η
η1

The densities of the two phases in equilibrium, η1 , η2 are given by the


condition
∫2 ∫2 ∫η2 ( ) ∫η2 (
∂p )
v dp = 0 ⇒ η dp = η dη = η 2at + 12Cη 2 dη = 0 , (165)
∂η t η1
1 1 η1 1

where we have used v = n−1 ∼ n−1 −2


c − ηnc . We find from (165) η1 = −η2 =
(−at/2C)1/2 . According to Clausius-Clapeyron equation
√ (139) we get the
latent heat of the transition q ≈ bTc (η1 − η2 )/nc ∝ −t. We thus have the
2

phase transition of the first order at t < 0. As t → −0 this transition is


getting close to the phase transitions of the second order. See Landau &
Lifshitz, Sect. 152.
As T → Tc the mean-field theory predicts 1/v1 − 1/v2 ∝ (Tc − T )1/2 while
the exact Onsager solution gives (Tc − T )1/8 . Real condensation transition

86
gives the power close 1/3. Also lattice theories give always (for any T )
1/v1 + 1/v2 = 1 which is also a good approximation of the real behavior (the
sum of vapor and liquid densities decreases linearly with the temperature
increase but very slowly). One can improve the lattice gas model considering
the continuous limit with the lattice constant going to zero and adding the
pressure of the ideal gas.
Another equivalent model is that of the binary alloy that is consisting
of two types of atoms. X-ray scattering shows that below some transition
temperature there are two crystal sublattices while there is only one lattice at
higher temperatures. Here we need the three different energies of inter-atomic
interaction: E = ϵ1 N11 +ϵ2 N22 +ϵ12 N12 = (ϵ1 +ϵ2 −2ϵ12 )N11 +γ(ϵ12 −ϵ2 )N1 +
γϵ2 N/2. This model described canonically is equivalent to the Ising model
with the free energy shifted by γ(ϵ12 − ϵ2 )N1 + γϵ2 N/2. We are interested in
the case when ϵ1 + ϵ2 > 2ϵ12 so that it is indeed preferable to have alternating
atoms and two sublattices may exist at least at low temperatures. The phase
transition is of the second order with the specific heat observed to increase
as the temperature approaches the critical value. Huang, Chapter 16 and
Pathria, Chapter 12.
As we have seen, to describe the phase transitions of the second order
near Tc we need to describe strongly fluctuating systems. We shall study
fluctuations more systematically in the next section and return to critical
phenomena in Sects. 7.2 and 7.4.

87
7 Fluctuations
7.1 Thermodynamic fluctuations
Consider fluctuations of energy and volume of a given (small) subsystem. The
probability of a fluctuation is determined by the entropy change of the whole
system w ∝ exp(∆S0 ) which is determined by the minimal work needed for
a reversible creation of such a fluctuation: T ∆S0 = −Rmin . For example,
if the fluctuation is that the system starts moving as a whole with the ve-
locity v then the minimal work is the kinetic energy of the system, so that
the probability of such a fluctuation is w(v) ∝ exp(−M v 2 /2T ). Generally,
Rmin = ∆E + P0 ∆V − T0 ∆S, where ∆S, ∆E, ∆V relate to the subsystem.
Indeed, the energy change of the subsystem ∆E is equal to the work R done
on it (by something from outside the system) plus the work done by the rest
of the system −P0 ∆V0 = P0 ∆V plus the heat received from the rest of the
system T0 ∆S0 . Minimal work corresponds to ∆S0 = −∆S. In calculating
variations we also assume P, T equal to their mean values which are P0 , T0 .
Stress that we only assumed the subsystem to be small i.e. ∆S0 ≪ S0 ,
E ≪ E0 , V ≪ V0 while fluctuations can be substantial, i.e. ∆E can be
comparable with E.
V0 S0

V − ∆S 0

Rmin
E0

If, in addition, we assume the fluctuations to be small (∆E ≪ E) we can


expand ∆E(S, V ) up to the first non-vanishing terms (quadratic):

Rmin = ∆E +P ∆V −T ∆S = [ESS (∆S)2 +2ESV ∆S∆V +EV V (∆V )2 ]/2


= (1/2)(∆S∆ES + ∆V ∆EV ) = (1/2)(∆S∆T − ∆P ∆V ) . (166)

Written in such a way, it shows a sum of contributions of hidden and me-


chanical degrees of freedom. Of course, only two variables are independent.
From that general formula one obtains different cases by choosing different
pairs of independent variables. In particular, choosing an extensive vari-
able from one pair and an intensive variable from another pair (i.e. either
V, T or P, S), we get cross-terms cancelled because of Maxwell identities like

88
(∂P/∂T )V = (∂S/∂V )T . That means the absence of cross-correlation i.e.
respective quantities fluctuate independently16 : ⟨∆T ∆V ⟩ = ⟨∆P ∆S⟩ = 0.
Indeed, choosing T and V as independent variables we must express
( ) ( ) ( )
∂S ∂S Cv ∂P
∆S = ∆T + ∆V = ∆T + ∆V (167)
∂T V ∂V T T ∂T V

and obtain [ ( ) ]
Cv 1 ∂P
w ∝ exp − 2 (∆T )2 + (∆V ) 2
. (168)
2T 2T ∂V T

Mean squared fluctuation of the volume (for a given number of particles),

⟨(∆V )2 ⟩ = −T (∂V /∂P )T ,

gives the fluctuation of the specific volume

⟨(∆v)2 ⟩ = N −2 ⟨(∆V )2 ⟩

which can be converted into the mean squared fluctuation of the number of
particles in a fixed volume:
( )
V 1 V ∆N N2 ∂V
∆v = ∆ = V ∆ = − , ⟨(∆N ) ⟩ = −T 2
2
. (169)
N N N2 V ∂P T

We see that the fluctuations cannot be considered small near the critical
point where ∂V /∂P is large.
For a classical ideal gas with V = N T /P it gives ⟨(∆N )2 ⟩ = N . In this
case, we can do more than considering small fluctuations (or large volumes).
Namely, we can find the probability of fluctuations comparable to the mean
value N̄ = N0 V /V0 . The probability for N (noninteracting) particles to be
inside some volume V out of the total volume V0 is
( ) ( )
N0 ! V N V0 − V N0 −N
wN =
N !(N0 − N )! V0 V0
N( )N0 N
N̄ N̄ N̄ exp(−N̄ )
≈ 1− ≈ . (170)
N! N0 N!
16
Remind that the Gaussian probability distribution w(x, y) ∼ exp(−ax2 − 2bxy − cy 2 )
corresponds to the second moments ⟨x2 ⟩ = 2c/(ac − b2 ), ⟨y 2 ⟩ = a/(ac − b2 ) and to the
cross-correlation ⟨xy⟩ = 2b/(b2 − ac).

89
Here we assumed that N0 ≫ N and N0 ! ≈ (N0 − N )!N0N . Note that N0
disappeared from (170). The distribution (170) is called Poisson law which
takes place for independent events. Mean squared fluctuation is the same as
for small fluctuations:
∑ N̄ N N
⟨(∆N )2 ⟩ = ⟨N 2 ⟩ − N̄ 2 = exp(−N̄ ) − N̄ 2
N =1 (N − 1)!
[ ]
∑ N̄ N ∑ N̄ N
= exp(−N̄ ) + − N̄ 2 = N̄ . (171)
N =2 (N − 2)! N =1 (N − 1)!
Recall that the measurement volume is proportional to N̄ . In particular,
the probability that a given volume is empty (N = 0) decays exponentially
with the volume. On the other hand, to cram more than average number
of particles into the volume decays with N in a factorial way, i.e. faster
than exponential: wN ∝ exp[−N ln(N/N̄ )]. One can check that near the
maximum, at |N − N̄ | ≪ N̄ , the Poisson distribution coincide with the
Gaussian distribution: wN = (2π N̄ )−1/2 exp[−(N − N̄ )2 /2N̄ ].
Of course, real molecules do interact, so that the statistics of their density
fluctuations deviate from the Poisson law, particularly near the critical point
where the interaction energy is getting comparable to the entropy contribu-
tion into the free energy.
Landau & Lifshitz, Sects. 20, 110–112, 114.

7.2 Spatial correlation of fluctuations


We now consider systems with interaction and discuss a spatial correlation of
fluctuations of concentration n = N/V , which is particularly interesting near
the critical point. Indeed, the level of fluctuations depends on the volume of
the subsystem. To estimate whether the fluctuations are important or not
(say in destroying the order appearing in a phase transition) we need to con-
sider the subsystem size of the order of the correlation scale of fluctuations.
We have seen that in 1d the correlation scale is always finite and turns into
infinity only at zero temperature. We consider 3d in this section and 2d in
the next one.
Since the fluctuations of n and T are independent, we assume T = const
so that the minimal work is the change in the free energy, which we again
expand to the quadratic terms
1∫
w ∝ exp(−∆F/T ) , ∆F = ϕ(r12 )∆n(r1 )∆n(r2 ) dV1 dV2 . (172)
2
90
Here ϕ is the second (variational) derivative of F with respect to n(r). After
Fourier transform,
∑ 1 ∫ −ikr

∆n(r) = ∆nk e ikr
, ∆nk = ∆n(r)e dr , ϕ(k) = ϕ(r)e−ikr dr .
k V

the free energy change takes the form


V ∑
∆F = ϕ(k)|∆nk |2 ,
2 k

which corresponds to a Gaussian probability distribution of independent vari-


ables - amplitudes of the harmonics. The mean squared fluctuation is as
follows
T
⟨|∆nk |2 ⟩ = . (173)
V ϕ(k)
Usually, the largest fluctuations correspond to small k where we can use the
expansion called the Ornshtein-Zernicke approximation

ϕ(k) ≈ ϕ0 + 2gk 2 . (174)

This presumes short-range interaction which makes large-scale limit regular.


From the previous section, ϕ0 (T ) = n−1 (∂P/∂n)T . The free energy must
increase when order parameter is not constant in space, so the coefficient g
is assumed positive.
Making the inverse Fourier transform we find (the large-scale part of) the
pair correlation function of the concentration in 3d:
∑ ∫
V d3 k
⟨∆n(0)∆n(r)⟩ = |∆nk |2 eikr = |∆nk |2 eikr
k (2π)3
∫ ∞ e ikr
− e−ikr V k 2 dk T exp(−r/rc )
= |∆nk |2 2
= . (175)
0 ikr (2π) 8πgr

One can derive that by recalling (from Sect. 5.3) that (κ2 − ∆) exp(−κr)/r =
4πδ(r) or directly: expand the integral to −∞ and then close the contour in
the complex upper half plane for the first exponent and in the lower half plane
for the second exponent so that the integral is determined by the respective
poles k = ±iκ = ±irc−1 . We defined the correlation radius of fluctuations
rc = [2g(T )/ϕ0 (T )]1/2 . Far from any phase transition, the correlation radius
is typically the mean distance between molecules.

91
Not only for the concentration but also for other quantities, (175) is a
general form of the correlation function at long distances. Near the critical
point, ϕ0 (T ) ∝ A(T )/V = α(T − Tc ) decreases and the correlation radius
increases. To generalize the Landau theory for inhomogeneous η(r) one writes
the space density of the thermodynamic potential uniting (141) and (174) as

F {η(r)} = g|∇η|2 + α(T − Tc )η 2 + bη 4 . (176)

The thermodynamic potential is now a functional since it depends on the


whole function η(r). Addition of the extra term g|∇η|2 means that having
an inhomogeneous state costs us extra value of the thermodynamic poten-
tial. We accounted for that extra cost using only the first spatial derivative,
which means that this expression is obtained by effectively averaging the
potential over the scales large comparing to the inter-atom distances (but
small comparing to the correlation radius, so that this expression makes
sense only not far from the critical temperature. We again assume that only
the coefficient at η 2 turns into zero at the √ transition. The correlation
√ ra-
dius diverges at the transition : rc (T ) = g/α(T − Tc ) = rc0 Tc /(T − Tc ).
17

Here we expressed g ≃ αTc rc0 2


via the correlation radius far from the transi-
tion. We must now estimate the typical size of fluctuations for the volume
rc3 : ⟨(∆η)2 ⟩ ≃ T /A ≃ T /ϕ0 rc3 . As any thermodynamic approach, the Landau
theory is valid only if the mean square fluctuation on the scale of rc is much
less than η̄, so-called Ginzburg-Levanyuk criterium:
( )6
Tc α(T − Tc ) T − Tc b2 ri
≪ ⇒ ≫ ≡ . (177)
2αrc3 (T − Tc ) b Tc 6
α4 rc0 rc0
We introduced ri3 = b/α2 which can be interpreted as the volume of inter-
action: if we divide the energy density of interaction bη 4 ≃ b(α/b)2 by the
energy of a single degree of freedom Tc we get the number of degrees of free-
dom per unit volume i.e. ri−3 . Since the Landau theory is built at T −Tc ≪ Tc
then it has validity domain only when ri /rc0 ≪ 1 which often takes place
(in superconductors, this ratio is less than 10−2 ). In a narrow region near Tc
fluctuations dominate. We thus must use the results of the Landau theory
only outside the fluctuation region, in particular, the jump in the specific
heat ∆Cp is related to the values on the boundaries of the fluctuation region.
17
The correlation radius generally stays finite at a first-order phase transition. The
divergence of rc at T → Tc means that fluctuations are correlated over all distances so
that the whole system is in a unique critical phase at a second-order phase transition.

92
Landau theory predicts rc ∝ (T − Tc )−1/2 and the correlation function
approaching the power law 1/r as T → Tc in 3d. Of course, those scalings
are valid under the condition that the criterium (177) is satisfied that is
not very close to Tc . As we have seen from the exact solution of 2d Ising
model, the true asymptotics at T → Tc (i.e. inside the fluctuation region)
are different: rc ∝ (T − Tc )−1 and φ(r) = ⟨σ(0)σ(r)⟩ ∝ r−1/4 at T = Tc
in that case. Yet the fact of the radius divergence remains. It means the
breakdown of the Gaussian approximation for the probability of fluctuations
since we cannot divide the system into independent subsystems. Indeed, far
from the critical point, the probability distribution of the density has two
approximately Gaussian peaks, one at the density of liquid nl , another at
the density of gas ng . As we approach the critical point and the distance
between peaks is getting comparable to their widths, the distribution is non-
Gaussian. In other words, one needs to describe a strongly interaction system
near the critical point which makes it similar to other great problems of
physics (quantum field theory, turbulence).
w

ng nl n

7.3 Different order parameters


Phenomenological treatment in the previous section assumed that at critical-
ity (ϕ0 = 0) the energy of the perturbation goes to zero with the wave num-
ber. Here we show that this is the case even below the critical temperature
when the symmetry broken by the phase transition is continuous. Remind
that in the Ising model the symmetry was discrete (up-down, present-absent)
and was described by a scalar real order parameter.

7.3.1 Goldstone mode and Mermin-Wagner theorem


Here we consider the case when the Hamiltonian is invariant under O(n)
rotations and the order parameter is a vector (η1 , . . . , ηn ). Then the ana-

log of the Landau thermodynamic potential must have a form g i |∇ηi |2 +

93
∑ (∑ )2
α(T − Tc ) i ηi2 + b i ηi2 . Here we assumed that the interaction is short-
range and used the Ornshtein-Zernicke approximation for the spatial depen-
dence. When T < Tc the minimum corresponds to breaking the O(n) sym-
metry, for example, by taking the first component [α(Tc − T )/2b]1/2 and the
other components zero. Considering fluctuations we put ([α(Tc − T )/2b]1/2 +

η1 , η2 , . . . , ηn ) and obtain the thermodynamic potential g i |∇ηi |2 + 2α(Tc −
T )η12 + higher order terms. That form means that only the longitudinal mode
η1 has a finite correlation length rc = [2α(Tc − T )]−1/2 . Almost uniform fluc-
tuations of the transverse modes do not cost any energy. This is an example
of the Goldstone theorem which claims that whenever continuous symmetry
is spontaneously broken (i.e. the symmetry of the state is less than the sym-
metry of the thermodynamic potential or Hamiltonian) then the mode must
exist with the energy going to zero with the wavenumber. This statement is
true beyond the mean-field approximation or Landau theory as long as the
force responsible for symmetry breaking is short-range. For a spin system,
the broken symmetry is rotational and the Goldstone mode is the excited
state where the spin turns as the location changes, as shown in the Figure.
That excitation propagates as a spin wave. For a solid, the broken symmetry
is translational and the Goldstone mode is a phonon.
1
0 1
0 11
1
00
0
111
01
0 1111
1111
111
1111
0000
0000
000
0000
1
0 1
0 1
00
0
11
1
01
0 1111
1111
111
1111
0000
0000
000
0000
1
0 1
0 1
00
0
11
1
00
0
1
0
1
0
1
0 1111
1111
111
1111
0000
0000
000
0000
1
0
1
0
1
0
1
0 11
1 0
00
0
1
10
1
0 1111
1111
111
1111
0000
0000
000
0000
1111
1111
111
1111
0000
0000
000
0000
1
0 1
0 11
1
00
0
111
01
0 1111
1111
111
1111
0000
0000
000
0000
1
0 1
0 1
00
0
111
01
10 1111
1111
111
1111
0000
0000
000
0000
1
0 1
0 1 0
00
0 1
0 1111
1111
111
1111
0000
0000
000
0000
Equivalent ground states

1
0
1 1
0 111
000 111
000
0
1
0
1
0
1
0
111
000
111
000 11111
00000
111
000 1111
0000
1
0 1
0 111
000 11111
00000
111
000 1111
0000
11111
1
0 1 000
0 00000
111 111
000 1111
0000
11111
00000 1111
0000
1
0 1
0 111
000 111
000 1111
0000
11111
00000 1111
0000
1111
1
0 1
0 111
000 111
000 0000
11111
00000 1111
0000
1111
0000
1
0 1
0 111 111
000
000 11111
00000 1111
0000
1111
0000
Goldstone mode − spin wave
Goldstone modes are easily excited by thermal fluctuations and they de-
stroy long-range order for d ≤ 2. Indeed, in less than three dimensions, the
integral (175) at ϕ0 = 0 describes the correlation function which grows with
the distance:
∫ ∫ 1/r r2−d − L2−d
⟨∆η(0)∆η(r)⟩ ∝ (1 − eikr )k −2 d d k ≈ k −2 d d k ∝ . (178)
1/L d−2

For example, ⟨∆n(0)∆n(r)⟩ ∝ ln r in 2d. Simply speaking, if at some point


you have some value then far enough from this point the value can be much

94
larger. That means that the state is actually disordered despite rc = ∞: soft
(Goldstone) modes with no energy price for long fluctuations (ϕ0 = 0) destroy
long order (this statement is called Mermin-Wagner theorem). One can state
this in another way by saying that the mean variance of the order parameter,
⟨(∆η)2 ⟩ ∝ L2−d , diverges with the system size L → ∞ at d ≤ 2. In exactly
the same way phonons with ωk ∝ k make 2d crystals impossible: the energy
of the lattice vibrations is proportional to the squared atom velocity (which
is the frequency ωk times displacement uk ),∫ T ≃ ωk2 u∫2k ; that makes mean
squared displacement proportional to ⟨u2 ⟩ ∝ dd ku2k = dd kT /ωk2 ∝ L2−d —
in large enough samples the amplitude of displacement is getting comparable
to the distance between atoms in the lattice, which means the absence of a
long-range order.
Let us consider the so-called XY model which describes a system of two-
dimensional spins s = (s cos φ, s sin φ) with ferromagnetic interaction i.e.
with the Hamiltonian
∑ ∑
H = −J s(i) · s(j) = −Js2 cos(φi − φj ) . (179)
i,j i,j

At low enough temperature, |φi − φj | ≪ 2π, and we may approximate cos


and also go into a continuous limit (spin-wave approximation):
γN Js2 Js2 ∑
H ≈ − + |φi − φj |2
2 2 i,j
γN Js2 Js2 ∫
≈ − + |∇φ(r)|2 d2 r . (180)
2 2
Of course, that Hamiltonian can be written as a sum over Fourier harmonics
∑ ∑
H + γN Js2 /2 = k Hk = N a2 Js2 k k 2 |φk |2 /2 with each term having an
Ornstein-Zernike form. Here a is the lattice spacing. There is no φ20 term
because of the O(2) rotational symmetry of the spins which corresponds to
translational symmetry of the phases φ. In this (low-T ) approximation, the
phases have Gaussian statistics with the pair correlation function which is
logarithmic: ⟨φ(r)φ(0)⟩ ∝ ln r. Let us calculate now the correlation function
between two spins distance r apart:
∫ { }
∑[ ( ) ]
⟨exp[iφ(r) − iφ(0)]⟩ = dφk dφ∗k exp iφk e i(kr)
− 1 − βHk
k
[ ]
∑ 1 − cos(kr)
−1
= exp −(βN Js2 a2 )
k k2

95
[ ] ( )−1/2πβJs2
2 −1 πr
≈ exp −(2πβJs ) ln(πr/a) = . (181)
a
Here we used the formula of Gaussian integration
∫ ∞ √ 2 A−1 /2
dφe−Aφ 2π/Ae−J
2 /2+iJφ
= . (182)
−∞

We see that the correlation function falls of by a power law at however


low temperature, and it does not approach constant as in a state with a
long-range order. We thus conclude that there is no long-range order at all
temperatures. Description looks practically the same for two dimensional
crystals where the energy is proportional to the squared difference between
the displacements of the neighboring atoms.
Another example of the Goldstone mode destroying the long-range co-
herence is the case of the complex scalar order parameter Ψ = ϕ1 + ıϕ2
(say, the amplitude of the quantum condensate). In this case, the density
of the Landau thermodynamic potential invariant with respect to the phase
change Ψ → Ψ exp(iα) (called global gauge invariance) has the (so-called
Landau-Ginzburg) form

F = g|∇Ψ|2 + α(T − Tc )|Ψ|2 + b|Ψ|4 . (183)

At T < Tc , the (space-independent) minima of the potential form a circle in


ϕ1 − ϕ2 plane: |Ψ|2 = ϕ21 + ϕ22 = α(Tc − T )/2b = ϕ20 . Any ordered (coherent)
state would correspond to some choice of the phase, say ϕ1 = ϕ0 , ϕ2 = 0. For
small perturbations around this state, Ψ = ϕ0 + φ + ıξ, the quadratic part of
the potential takes the form g|∇ξ|2 +g|∇φ|2 +4bϕ20 |φ|2 . Since there is no term
proportional to ξ 2 , then fluctuations of ξ have an infinite correlation radius
even at T < Tc ; their correlation function (175) diverges at d ≤ 2, which
means that phase fluctuations destroy coherence of 2d quantum condensate.

7.3.2 Berezinskii-Kosterlitz-Thouless phase transition


Still, a power-law decay of correlations (181) is very much different from the
exponential decay in a state with a finite correlation radius. That is the state
with a power-law decay formally corresponds to an infinite correlation radius.
A long-range order is absent in that state yet a local order exists, which means
that at sufficiently low temperatures superfluidity and superconductivity can
exist in 2d films, and 2d crystals can support transverse sound (recall that

96
longitudinal sound exists in fluids as well, so it is transverse sound which is
a defining property of a solid). Remind that our consideration (179-181) was
for sufficiently low temperatures. One then asks if the power-law behavior
disappears and a finite correlation radius appears above some temperature.
The answer is that, apart from Goldstone modes, there is another set of
excitations, which destroy an order and lead to a finite correlation radius.
For a 2d crystal, an order is destroyed by randomly placed dislocations while
for a spin system or condensate by vortices. As opposite to initial spins or
atoms that create crystal or condensate and interact locally, vortices and
dislocations have a long-range (logarithmic) interaction and the energy E of
a single vortex is proportional to the logarithms of a sample size L. Indeed,
for the XY model in the spin-wave approximation one can minimize the free
energy:
δ ∫
|∇φ(r′ )|2 dr′ = ∆φ = 0 .
δφ(r)
The vortex is the solution of this equation, which in the polar coordinates
r, θ is simply

φ(r, θ) = mθ so that (∇φ)θ = m/r and the energy E =
(Js2 /2) |∇φ(r)|2 d2 r = πJs2 m2 ln(L/a). But the entropy S associated with
a single vortex is also logarithmic: S = ln(L/a)2 . That means that there
exists some sufficiently high temperature T = πJs2 m2 /2 when the energy
contribution into the free energy E − T S is getting less than the entropy
contribution T S. That temperature corresponds to the so-called Berezinskii-
Kosterlitz-Thouless (BKT) phase transition from a high-T state of free dislo-
cations/vortices to a low-T where there are no isolated vortices. Free vortices
provide for a screening (a direct analog of the Debye screening of charges) so
that the high-T state has a finite correlation length equal to the Debye ra-
dius. Actually, in a low-T state (with power-law correlations) those vortices
that exist are bound into dipole vortex pairs whose energy does not depend
on the sample size.
Let now describe a vortex as an inhomogeneous state that corresponds
to an extremum of the thermodynamic Landau-Ginzburg potential, which
contains nonlinearity. Varying the potential (183) with respect to Ψ∗ we get
the equation
( )
δF/δΨ∗ = −g∆Ψ + 2b |Ψ|2 − ϕ20 Ψ = 0 . (184)

In polar coordinates, the Laplacian takes the form ∆ = r−1 ∂r r∂r + r−2 ∂θ2 .
Vortex corresponds to Ψ = A exp(imθ) where A can be chosen real and

97
integer m determines the circulation. Going around the vortex point in 2d
(vortex line in 3d), the phase acquires 2mπ. The dependence on the angle
θ makes the condensate amplitude turning into zero on the axis: A ∝ rm as
r → 0. At infinity, we have a homogeneous condensate: limr→∞ (A − ϕ0 ) ∝
r−2 . That this is a vortex is also clear from the fact that there is a current
J (and the velocity) around it:
( )
∂ψ ∗ ∂ψ A2
Jθ ∝ i ψ − ψ∗ = ,
r∂θ r∂θ r
so that the circulation is independent of the distance from the vortex. Second,
notice that the energy of a vortex indeed diverges logarithmically with the
sample size (as well as the energy of dislocations and 2d Coulomb charges):
velocity around the vortex
∫ 2 2 ∫ 2
decays as v ∝ 1/r so that the kinetic energy di-
verges as A v d r ∝ d r/r2 . A single vortex/charge has its kinetic/electrostatic
2

energy proportional to the logarithm of the sample size L; therefore, at low


temperatures, one does not meet isolated charges but only dipoles, i.e. pairs
with m = ±1, whose energy is determined by the inter-vortex distance and
is independent of the system size L. Dipoles cannot screen which leads to
power-law correlations. As we argued, since any charge can be placed in
roughly (L/a)2 different positions, where a is the charge size (vortex core),
then the entropy S = 2 ln(L/a) is also logarithmic. The free energy is as
follows:
[ ]
α(Tc − T )
F = E − T S = (A2 − 2T ) ln(L/a) = − 2T ln(L/a) .
2b

We find that the second (BKT) phase transition happens at


α
TBKT = Tc ,
α + 4b
i.e. at the temperature less than Tc , when the condensate exists. At T >
TBKT , the entropy term T S wins so that the minimum of F corresponds to
proliferation of isolated charges which provide for screening and an exponen-
tial decay of correlations.
The statistics of vortices and similar systems can be studied using a simple
model of 2d Coulomb gas, which is described by the Hamiltonian

H=− mi mj ln |ri − rj | (185)
i̸=j

98
with mi = ±1. Since we are interested only in the mutual positions of
the vortices, we accounted only for the energy of their interaction. One
can understand it better if we recall that the Hamiltonian describes also
dynamics. The coordinates ri = (xi , yi ) of the point vortices (charges) satisfy
the Hamiltonian equations
∂H ∑ yi − yj ∂H ∑ xj − xi
mi ẋi = = mi mj , mi ẏi = − = mi mj ,
∂yi j |ri − rj |2 ∂xi j |ri − rj |2

which simply tell that every vortex moves in the velocity field induced by all
other vortices.
For example, the statistics of the XY-model at low temperatures can be
considered with the Hamiltonian which is a sum of (180) and (185), and
the partition function respectively a product of those due to spin waves and
vortices. In the same way, condensate perturbations are sound waves and
vortices. Since the Gaussian partition function of the waves is analytic,
the phase transition can only originate from the vortices; near such phase
transition the vortex degrees of freedom can be treated separately.
In the canonical approach, one writes the Gibbs distribution P = Z −1 exp(−βH).
The partition function,
∫ ∏∫
2 βmi mj 2
Z(β) = exp(−βH) Πi d ri = V rij d rij ,
i̸=j

behaves at rij → ϵ → 0 as ϵ2(N −1)−β(N −N (N −2))/4 . Here we took into account


2

that a neutral system with N vortices must have the following number of
pairs: N+− = (N/2)2 and N++ = N−− = N (N − 2)/8. That is the integral
converges only for β < 4(N − 1)/N . In the thermodynamic limit N →
∞, the convergence condition is β < 4. When β → 4 with overwhelming
probability the system of vortices collapses, which corresponds to the BKT
phase transition.
In the microcanonical approach one considers vortices in a domain with
the area A. The density of states,

g(E, A) = δ(E − H)Πi dzi , (186)

must have a maximum since the phase volume Γ(E) = E g(E)dE increases
monotonically with E from zero to AN . The entropy, S(E) = ln g(E), there-
fore, gives a negative temperature T = g/g ′ at sufficiently high energy. Since

99
∑N
g(E, A) = AN g(E ′ , 1) where E ′ = E + 2 ln A i<j mi mj we can get the equa-
tion of state
( )  
∂S NT  1 ∑ N
P =T = 1+ mi mj  . (187)
∂A E
A 2N T i<j
∑N
For a neutral system with mi = ±1 we find i<j mi mj = N++ +N−− −N+− =
−N/2 and
( )
NT β
P = 1− , (188)
A 4

which shows that pressure turns to zero at the BKT transition temperature.
Note that the initial system had a short-range interaction, which allowed
to use the potential (183) in the long-wave limit. In other words, particles
that create condensate interact locally. However, condensate vortices have
a long-range interaction (185). While BKT transition may seem exotic and
particular for 2d, the lesson is quite general: when there are entities (like vor-
tices, monopoles, quarks) whose interaction energy depends on the distance
similarly to entropy then the phase transition from free entities to pairs is
possible at some temperature. In particular, it is argued that a similar tran-
sition took place upon cooling of the early universe in which quarks (whose
interaction grows with the distance, just like entropy) bind together to form
the hadrons and mesons we see today.
See Kardar, Fields, sect. 8.2.

7.3.3 Higgs mechanism


Let us stress that the previous subsection treated the condensate made of
particles with local interaction. The physics is different, however, if the
condensate is made of charged particles which can interact with the elec-
tromagnetic field, defined by the vector potential A and the field tensor
Fµν = ∇µ Aν − ∇ν Aµ . That corresponds to the thermodynamic potential
|(∇ − ıeA)Ψ|2 + α(T − Tc )|Ψ|2 + b|Ψ|4 − Fµν F µν /4, which is invariant with re-
spect to the local (inhomogeneous) gauge transformations Ψ → Ψ exp[iφ(r)],
A → A + ∇φ/e for any differentiable function φ(r). Note that the term
Fµν F µν = (∇ × A)2 does not change when one adds gradient to A. Pertur-
bations can now be defined as Ψ = (ϕ0 + h) exp[ıφ/ϕ0 ], absorbing also the

100
phase shift into the vector potential: A → A + ∇φ/eϕ0 . The quadratic part
of the thermodynamic potential is now

|∇h|2 + α(T − Tc )h2 + e2 ϕ20 A2 − Fµν F µν /4 , (189)

i.e. the correlation radii of both modes h, A are√ finite. The correlation
radius of the gauge-field mode is rc = 1/2eϕ0 = b/α(Tc − T ). This case
does not belong to the validity domain of the Mermin-Wagner theorem since
the Coulomb interaction is long-range.
Thinking dynamically, if the energy of a perturbation contains a nonzero
homogeneous term quadratic in the perturbation amplitude, then the spec-
trum of excitations has a gap at k → 0. For example, the (quadratic
part of the) Hamiltonian H = g|∇ψ|2 + ϕ20 |ψ|2 gives the dynamical equa-
tion ıψ̇ = δH/δψ∗ = −g∆ψ + ϕ20 ψ, which gives ψ ∝ exp(ıkx − ıωk t) with
ωk = ϕ20 + gk 2 . Therefore, what we have just seen is that the Coulomb
interaction leads to a gap in the plasmon spectrum in superconducting tran-
sition. In other words, photon is massless in a free space but has a mass if
it has to move through the condensate of Cooper pairs in a superconductor
(Anderson, 1963). Another manifestation of this simple effect comes via the
analogy between quantum mechanics and statistical physics (which we dis-
cuss in more detail in the subsection 8.2). On the language of relativistic
quantum field theory, the system Lagrangian plays a role of the thermody-
namic potential and the vacuum plays a role of the equilibrium. Here, the
presence of a constant term means a finite energy (mc2 ) at zero momentum,
which means a finite mass. Interaction with the gauge field described by
(189) is called Higgs mechanism by which particles acquire mass in quantum
field theory, the excitation analogous to that described by A is called vector
Higgs boson (Nambu, 1960; Englert, Brout, Higgs, 1964). What is dynami-
cally a finite inertia is translated statistically into a finite correlation radius.
When you hear about the infinite correlation radius, gapless excitations or
massless particles, remember that people talk about the same phenomenon.
See also Kardar, Fields, p. 285-6.
Let us conclude this section by restating the relation between the space
dimensionality and the possibility of phase transition and ordered state for
a short-range interaction. At d ≥ 3 an ordered phase is always possible
for a continuous as well as discrete broken symmetry. For d = 2, one type
of a possible phase transition is that which breaks a discrete symmetry and
corresponds to a real scalar order parameter like in Ising model; another type

101
is the BKT phase transition between power-law and exponential decrease of
correlations. At d = 1 no symmetry breaking and ordered state is possible
for system with a short-range interaction.
Landau & Lifshitz, Sects. 116, 152.

7.4 Universality classes and renormalization group


Since the correlation radius diverges near the critical point, then fluctuations
of all scales (from the lattice size to rc ) contribute the free energy. One there-
fore may hope that the particular details of a given system (type of atoms,
their interaction, etc) are unimportant in determining the most salient fea-
tures of the phase transitions, what is important is the space dimensionality
and which type of symmetry is broken — for instance, whether it is described
by scalar, complex or vector order parameter. Those salient features must be
related to the nature of singularities that is to the critical exponents which
govern the power-law behavior of different physical quantities as functions
of t = (T − Tc )/Tc and the external field h. Every physical quantity may
have its own exponent, for instance, specific heat C ∝ t−α , order parameter
η ∝ (−t)β and η ∝ h1/δ , susceptibility χ ∝ t−γ , correlation radius rc ∝ t−ν ,
the pair correlation function ⟨σi σj ⟩ ∝ |i − j|2−d−η , etc. Only two exponents
are independent since all quantities must follow from the free energy which,
according to the scaling hypothesis, must be scale invariant. That means that
if we re-scale the lengthes by the factor k then one can find such numbers a, b
that the free energy is transformed under re-scaling of arguments as follows:
F (k a t, k b h) = k d F (t, h). This is a very powerful statement which tells that
this is the function of one argument (rather than two), for instance,

F (t, h) = td/a g(h/tb/a ) . (190)

One can now express the quantities of interest via the derivatives at h = 0:
C = ∂ 2 F/∂t2 , η = ∂F/∂h, χ = ∂ 2 F/∂h2 and relate β = (d − b)/a etc.
A general formalism which describes how to make a coarse-graining to
keep only most salient features in the description is called the renormaliza-
tion group (RG). It consists in subsequently eliminating small-scale degrees
of freedom and looking for fixed points of such a procedure. For Ising model,
it is achieved with the help of a block spin transformation that is dividing all
the spins into groups (blocks) with the side k so that there are k d spins in
every block (d is space dimensionality). We then assign to any block a new

102
variable σ ′ which is ±1 when respectively the spins in the block are predom-
inantly up or down. We assume that the phenomena very near critical point
can be described equally well in terms of block spins with the energy of the
∑ ∑
same form as original, E ′ = −h′ i σi′ + J ′ /4 ij (1 − σi′ σj′ ), but with differ-
ent parameters J ′ and h′ . Let us demonstrate how it works using 1d Ising
model with
∑ [ ∑ h = 0 and ] J/2T ≡ K. Let us transform the partition function
18
{σ} exp K i σi σi+1 by the procedure (called decimation ) of eliminating
degrees of freedom by ascribing (undemocratically) to every block of k = 3
spins the value of the central spin. Consider two neighboring blocks σ1 , σ2 , σ3
and σ4 , σ5 , σ6 and sum over all values of σ3 , σ4 keeping σ1′ = σ2 and σ2′ = σ5
fixed. The respective factors in the partition function can be written as fol-
lows: exp[Kσ3 σ4 ] = cosh K + σ3 σ4 sinh K. Denote x = tanh K. Then only
the terms with even powers of σ3 , σ4 contribute the factors in the partition
function function that involve these degrees of freedom:

exp[K(σ1′ σ3 + σ3 σ4 + σ4 σ2′ )]
σ3 ,σ4 =±1

= cosh3 K (1 + xσ1′ σ3 )(1 + xσ4 σ3 )(1 + xσ2′ σ4 )
σ3 ,σ4 =±1

= 4 cosh3 K(1 + x3 σ1′ σ2′ ) = e−g(K) cosh K ′ (1 + x′ σ1′ σ2′ ) , (191)


( )
cosh K ′
g(K) = ln . (192)
4 cosh3 K

The expression (191) has the form of the Boltzmann factor exp(K ′ σ1′ σ2′ ) with
the re-normalized constant K ′ = tanh−1 (tanh3 K) or x′ = x3 — this formula
and (192) are called recursion relations. The partition function of the whole
system in the new variables can be written as
∑ [ ∑ ]
exp N g(K) − K ′ σi′ σi+1

.
{σ ′ } i

The term proportional to g(K) represents the contribution into the free en-
ergy of the short-scale degrees of freedom which have been averaged out.
This term does not affect the calculation of any spin correlation function.
Yet the renormalization of the constant, K → K ′ , influences the correlation
functions. Let us discuss this renormalization. Since K ∝ 1/T then T → ∞
18
the term initially meant putting to death every tenth soldier of a Roman army regiment
that run from a battlefield.

103
correspond to x → 0+ and T → 0 to x → 1−. One is interested in the set of
the parameters which does not change under the RG, i.e. represents a fixed
point of this transformation. Both x = 0 and x = 1 are fixed points, the first
one stable and the second one unstable. Indeed, after iterating the process
we see that x approaches zero and effective temperature infinity. That means
that large-scale degrees of freedom are described by the partition function
where the effective temperature is high so the system is in a paramagnetic
state. At this limit we have K, K ′ → 0 so that the contribution of the
small-scale degrees of freedom is getting independent of the temperature:
g(K) → − ln 4.
We see that there is no phase transition since there is no long-range order
for any T (except exactly for T = 0). RG can be useful even without critical
behavior, for example, the correlation length measured in lattice units must
satisfy rc (x′ ) = rc (x3 ) = rc (x)/3 which has a solution rc (x) ∝ ln−1 x, an
exact result for 1d Ising. It diverges at x → 1 (T → 0) as

rc (T ) ∝ ln−1 (tanh K) ≈ ln−1 (1 + e−2K ) ≈ exp(2K) = exp(J/T )

in agreement with the general argument of Sec. 6.2.2.

σ1 σ2 σ3 σ4 σ5 σ6
1d 2d

T=0 K=0 T=0 Tc K=0


The picture of RG flow is different in higher dimensions. Indeed, in 1d
in the low-temperature region (x ≈ 1, K → ∞) the interaction constant K
is not changed upon renormalization: K ′ ≈ K⟨σ3 ⟩σ2 =1 ⟨σ4 ⟩σ5 =1 ≈ K. This is
clear because the interaction between k-blocks is mediated by their boundary
spins that all look at the same direction (at high temperatures ⟨σ⟩ ∝ K so
that K ′ ∝ K 3 ). In d dimensions, there are k d−1 spins at the block side
so that K ′ ∝ k d−1 K as K → ∞ (in the case k = 3 and d = 2 we have
K ′ ≈ 3K). That means that K ′ > K that is the low-temperature fixed point
is stable at d > 1. On the other hand, the paramagnetic fixed point K = 0
is stable too, so that there must be an unstable fixed point in between at
some Kc which precisely corresponds to Tc . Indeed, consider rc (K0 ) ∼ 1 at
some K0 that corresponds to sufficiently high temperature, K0 < Kc . Since
rc (K) ∼ k n(K) , where n(K) is the number of RG iterations one needs to

104
come from K to K0 , and n(K) → ∞ as K → Kc then rc → ∞ as T → Tc .
Critical exponent ν = −d ln rc /d ln t is expressed via the derivative of RG at
Tc . Indeed, denote dK ′ /dK = k y at K = Kc so that K ′ − Kc ≈ k y (K − Kc ).
Since krc (K ′ ) = rc (K) then we may present:
rc (K) ∝ (K − Kc )−ν = k(K ′ − Kc )−ν = k [k y (K − Kc )]−ν ,
which requires ν = 1/y. We see that in general, the RG transformation of
the set of parameters K is nonlinear. Linearizing it near the fixed point one
can find the critical exponents from the eigenvalues of the linearized RG and,
more generally, classify different types of behavior. That requires generally
the consideration of RG flows in multi-dimensional spaces.
RG flow with two couplings
K2
critical surface
K2

K1 σ

K1

Already in 2d, summing over corner spin σ produces diagonal coupling


between blocks. In addition to K1 , that describes an interaction between
neighbors, we need to introduce another parameter, K2 , to account for a
next-nearest neighbor interaction. In fact, RG generates all possible further
couplings so that it acts in an infinite-dimensional K-space. An unstable fixed
point in this space determines critical behavior. We know, however, that we
need to control a finite number of parameters to reach a phase transition; for
Ising at h = 0 and many other systems it is a single parameter, temperature.
For all such systems (including most magnetic ones), RG flow has only one
unstable direction (with positive y), all the rest (with negative y) must be
contracting stable directions, like the projection on K1 , K2 plane shown in
the Figure. The line of points attracted to the fixed point is the projection
of the critical surface, so called because the long-distance properties of each
system corresponding to a point on this surface are controlled by the fixed
point. The critical surface is a separatrix, dividing points that flow to high-
T (paramagnetic) behavior from those that flow to low-T (ferromagnetic)
behavior at large scales. We can now understand universality of critical
behavior in a sense that systems in different regions of the parameter K-
space flow to the same fixed point and have thus the same exponents. Indeed,

105
changing the temperature in a system with only nearest-neighbor coupling,
we move along the line K2 = 0. The point where this line meets critical
surface defines K1c and respective Tc1 . At that temperature, the large-scale
behavior of the system is determined by the RG flow i.e. by the fixed point.
In another system with nonzero K2 , by changing T we move along some
other path in the parameter space, indicated by the broken line at the figure.
Intersection of this line with the critical surface defines some other critical
temperature Tc2 . But the long-distance properties of this system are again
determined by the same fixed point i.e. all the critical exponents are the
same. For example, the critical exponents of a simple fluid are the same as of
a uniaxial ferromagnetic. In this case (of the Ising model in an external field),
RG changes both t and h, the free energy (per block) is then transformed
as f (t, h) = g(t, h) + k −d f (t′ , h′ ). The part g originates from the degrees
of freedom within each block so it is an analytic function even at criticality.
Therefore, if we are interested in extracting the singular behavior (like critical
exponents) near the critical point, we consider only singular part, which has
the form (190) with a, b determined by the derivatives of the RG.
See Cardy, Sect 3 and http://www.weizmann.ac.il/home/fedomany/

8 Random walks and fluctuating fields


Many of the properties of the statistical systems, in particularly, the statistics
of fluctuations can be related to a fundamental problem of random walk.

8.1 Random walk and diffusion


Consider a particle that can hop randomly to a neighboring cite of d-dimensional
cubic lattice, starting from the origin at t = 0. We denote a the lattice spac-
ing, τ the time between hops and ei the orthogonal lattice vectors that satisfy
ei · ej = a2 δij . The probability to be in a given cite x satisfies the equation

1 ∑d
P (x, t + τ ) = P (x ± ei , t) . (193)
2d i=1

The first (painful) way to solve this equation is to turn it into averaging
exponents as we always do in statistical physics. This is done using the

106

Fourier transform, P (x) = (a/2π)d eikx P (k) dd k, which gives

1∑ d
P (k, t + τ ) = cos aki P (k, t) . (194)
d i=1

The initial condition for (193) is P (x, 0) = δ(x), which gives P (k, 0) = 1
( ∑d )t/τ
and P (k, t) = d−1 i=1 cos aki . That gives the solution in space as an
integral
∫ ( )t/τ
d ikx 1∑ d
P (x, t) = (a/2π) e cos aki dd k . (195)
d i=1
We are naturally interested in the continuous limit a → 0, τ → 0. If
we take τ → 0 first, the integral tends to zero and if we take a → 0 first,
the answer remains delta-function. A non-trivial evolution appears when the
lattice constant and the jump time go to zero simultaneously. Consider the
cosine expansion,
( )t/τ
1∑ d ( )t/τ
cos aki = 1 − a2 k 2 /2d + . . . ,
d i=1

where k 2 = di=1 ki2 . The finite answer exp(−κtk 2 ) appears only if one takes
the limit keeping constant the ratio κ = a2 k 2 /2dτ . In this limit, the space
density of the probability stays finite and is given by the saddle-point calcu-
lation of the integral:
∫ ( )
−d −d ikx−tκk2 −d/2 x2
ρ(x, t) = P (x, t)a ≈ (2π) e d
d k = (4πκt) exp − (196)
.
4κt

The second (painless) way to get this answer is to pass to the continuum
limit already in the equation (193):

1 ∑d
P (x, t+τ )−P (x, t) = [P (x + ei , t) + P (x − ei , t) − 2P (x, t)] . (197)
2d i=1

This is a finite difference approximation to the diffusion equation

(∂t − κ∆)P (x, t) = 0 . (198)

Of course, ρ satisfies the same equation, and (196) is its solution. Note that
(196,197) are isotropic and translation invariant while the discrete version

107
respected only cubic∫
symmetries. Also, the diffusion equation conserves the
total probability, ρ(x, t) dx, because it has the form of a continuity equation,
∂t ρ(x, t) = −div j with the current j = −κ∇ρ.
Another way to describe it is to treat ei as a random variable with ⟨ei ⟩ = 0
∑t/τ
and ⟨ei ej ⟩ = a2 δij , so that x = i=1 ei .
Random walk is a basis of the statistical theory of fields. To see the
relation, consider the Green function which is the mean time spent on a
given site x:


G(x) = P (x, t) . (199)
t=0
The most natural question is whether this time if finite or infinite. It depends
on the space dimensionality. Indeed, summing (195) over t as a geometric
series one gets

eikx dd k
G(x) = ∑ . (200)
1 − d−1 cos(aki )
It diverges at k → 0 when d ≤ 2. In this limit one can also use the continuous
limit, where it has a form
∫ ∞ ∫ ∫
2 eikx dd k
g(x) = lim (a2−d /2d)G(x/a) = dt eikx−tk dd k = . (201)
a→0 0 (2π)d k 2
We have seen this integral calculating the correlation function of fluctuation
(178). The divergence of this integral in one and two dimensions meant
before that the fluctuation are so strong that they destroy any order. Now,
(201) suggests another interpretation: integral divergence means that the
mean time spent on any given site is infinite. In other words, it means that
the random walker in 1d and 2d returns to any point infinite number of
times. Short-correlated nature of a random walk corresponds to short-range
interaction.
A path of a random walker behaves rather like a surface than a line.
Indeed, surfaces generally intersect along curves in 3d, they meet at isolated
points in 4d and do not meet at d > 4. That is reflected in special properties
of critical phenomena in 2d and 4d. Two-dimensionality√of the random walk
is a reflection of the square-root diffusion law: ⟨x⟩ ∝ t. Indeed, one can
define the dimensionality of a geometric object as a relation between its size
R and the number of standard-size elements (i.e. volume or area) N . For a
line, N ∝ R, generally N ∝ Rd . For a random walk, N ∝ t while R ∝ x so
that d = 2.

108
A slight generalization gives the generating function for the probabilities:

∑ ∫
eikx dd k
G(x, λ) = λ λt/τ P (x, t) = ∑ . (202)
t=0 λ−1 − d−1 cos(aki )

At λ = 1 it coincides with the Green functions while its derivatives give


moments of time: ( )
∂ nG
⟨(1 + t/τ ) ⟩ =
n
.
∂λn λ=1
The continuous limit of the generating function,

eikx dd k
g(x, m) = lim (a2−d /2d)G(x/a, , λ) = , (203)
a→0 (2π)d (k 2 + m2 )

exactly corresponds to the Ornstein-Zernike approximation of the correlation


function of fluctuations of order parameter away from criticality (with a finite
correlation radius). Here we denoted 1/λ = 1 + m2 a2 /2d so that m plays the
role of the inverse radius of correlation or mass of the quantum particle. Note
that this Green function can be presented as an integral of the probability
density (196) taken with κ = 1:
∫ ∞
e−m t ρ(x, t) dt .
2
g(x, m) = (204)
0

The properties of random walks can be expressed alternatively in terms


of sums over different paths. Let us write the transition probability indi-
cating explicitly the origin: ρ(x, t; 0, 0). Then we can write the convolution
identity which simply states that the walker was certainly somewhere at an
intermediate time t1 :

ρ(x, t; 0, 0) = ρ(x, t; x1 , t1 )ρ(x1 , t1 ; 0, 0) dx1 . (205)

We now divide the time interval t into an arbitrary large number of intervals
and using (196) we write
∫ [ ]
dxi+1 (xi+1 − xi )2
ρ(x, t; 0, 0) = Πni=0 exp −
[4πκ(ti+1 − ti )]d/2 4κ(ti+1 − ti )
∫ [ ∫ t ]
1
→ Dx(t′ ) exp − dt′ẋ2 (t′ ) . (206)
4κ 0

109
The last expression is an integral over paths that start at zero and end up at
x at t. By virtue of (204) it leads to a path-integral expression for the Green
function:
∫ ∞ ∫ { ∫ t [ ]}
1
g(x, m) = dt Dx(t′ ) exp − dt′ m2 + ẋ2 (t′ ) . (207)
0 0 4
Comparison of the Green functions (178,203) shows the relation between
the random walk and a free field. This analogy goes beyond the correlation
function to all the statistics. Indeed, much in the same way, the partition
function of a fluctuating field η(x) that takes continuous values in a con-
tinuum limit can be written as a path integral over all realizations of the
field:

Z = Dη exp [−βH{η(x)}] . (208)

For a Gaussian free field in a discrete case one takes


1∑
βH = η(x)[λ−1 δx,x′ − Jx,x′ ]η(x′ ) (209)
2 x,x′
1∫ d ∑
= d kη(k)[λ−1 − d−1 cos(aki )]η(−k) , (210)
2
where Jx,x′ = 1/2d when |x − x′ | = a and Jx,x′ = 0 otherwise. In the
continuous limit it corresponds to the Ornstein-Zernike correlation function
(178,203),
1∫ ( )
βH = dx m2 η 2 + |∇η|2 . (211)
2

Here one needs to renormalize η(x/a) → 2dad/2−1 η(x).
Stochastic dynamics can thus be seen as thermodynamics in space-time
with trajectories playing the role of configurations.

8.2 Analogy between quantum mechanics and statisti-


cal physics
Looking at the transition probability (206), one can also see the analogy
between the statistical physics of a random walk and quantum mechanics.
According to Feynman, for a quantum non-relativistic particle with a mass

110
M in the external potential U (x), the transition amplitude T (x, t; 0, 0) from
zero to x during t is given by the sum over all possible paths connecting
the points. Every path is weighted by the factor exp(iS/h̄) where S is the
classical action:
∫ [ ( )]
′ i ∫ t ′ M ẋ2
T (x, t; 0, 0) = Dx(t ) exp dt + U (x) . (212)
h̄ 0 2

Comparing with (206), we see that the transition probability of a random


walk is given by the transition amplitude of a free quantum particle during
an imaginary time.
In quantum theory, one averages over quantum rather than thermal fluc-
tuations, yet the formal structure of the theory is similar. This similarity
can be also revealed by using the formalism of the transfer matrix. In-
deed, in a nutshell, quantum mechanics is done by specifying two sets of
states |q⟩ and ⟨p|, which has ortho-normality and completeness: ⟨p|q⟩ = δqp

and q |q⟩⟨q| = 1. Physical quantities are represented by operators and
measurement corresponds to taking a trace of the operator over the set of

sets: trace P = q ⟨p|P |q⟩. One special operator, called Hamiltonian H, de-
termines the temporal evolution of all other operators according to P (t) =
exp(iHt)P (0) exp(−iHt). The operator T (t) = exp(iHt) is called time trans-
lation operator. The trace of it is the normalization factor in averages so it
can be called the partition function, let us formally consider it for an imag-
inary time t = iβ: then the quantum-mechanical average of any operator Q
is calculated as a trace normalized by the partition function:

T (iβ)Q ∑
⟨Q⟩ = trace , Z(β) = trace T (iβ) = e−βEa . (213)
Z(β) a

If the inverse ”temperature” β goes to infinity then all the sums are domi-
nated by the ground state, Z(β) ≈ exp(−βE0 ) and the average in (213) are
just expectation values in the ground state.
On the other hand, recall the one dimensional ring chain (151) with a
nearest neighbor interaction, whose partition function was expressed via the
trace of the transfer matrix (154,155). Here trace means the sum (or integral)
over all possible values on the cite. In particular, for the Ising model, it is
the sum over two values. If there are m values on the cite, then T is m × m
matrix. For O(n) model, trace means integrating over orientations of the
spin in n-dimensional space. We see that the translations along the chain

111
are analogous to quantum-mechanical translations in (imaginary) time. This
analogy is not restricted to 1d systems, one can consider 2d strips that way
too.

8.3 Brownian motion


A similar problem is the random walk in the momentum space. The mo-
mentum of a small particle in a fluid, p = M v, changes because of collisions
with the molecules. When the particle is much heavier than the molecules
then its velocity is small comparing to the typical velocities of the molecules.
Then one can write the force acting on it as Taylor expansion with the parts
independent of p and linear in p:
ṗ = −λp + f . (214)
Here, f (t) is a random function which makes (214) Langevin equation.
Its solution ∫ t

p(t) = f (t′ )eλ(t −t) dt′ . (215)
−∞
We now assume that ⟨f ⟩ = 0 and that ⟨f (t′ ) · f (t′ + t)⟩ = 3C(t) decays with
t during the correlation time τ which is much smaller than λ−1 . Since the
integration time in (215) is of order λ−1 then the condition λτ ≪ 1 means
that the momentum of a Brownian particle can be considered as a sum of
many independent random numbers (integrals over intervals of order τ ) and
so it must have a Gaussian statistics ρ(p) = (2πσ 2 )−3/2 exp(−p2 /2σ 2 ) where
∫ ∞
σ 2 = ⟨p2x ⟩ = ⟨p2y ⟩ = ⟨p2z ⟩ = C(t1 − t2 )e−λ(t1 +t2 ) dt1 dt2
0
∫ ∫
∞ 2t 1 ∫∞
≈ e−2λt dt C(t′ ) dt′ ≈ C(t′ ) dt′ . (216)
0 −2t 2λ −∞
On the other hand, equipartition guarantees that ⟨p2x ⟩ = M T so that we
can express the friction coefficient via the correlation function of the force
fluctuations (a particular case of the fluctuation-dissipation theorem to be
discussed below in Sect. 9.2):
1 ∫∞
λ= C(t′ ) dt′ . (217)
2T M −∞
Displacement
∫ t′

∆r = r(t + t ) − r(t) = v(t′′ ) dt′′
0

112
is also Gaussian with a zero mean. To get its second moment we need the
different-time correlation function of the velocities

⟨v(t) · v(0)⟩ = (3T /M ) exp(−λ|t|) (218)

which can be obtained from (215). Note that the friction makes velocity
correlated on a longer timescale than the force. That gives
∫ t′ ∫ t′ 6T ′
⟨(∆r) ⟩ =
2
dt1 dt2 ⟨v(t1 )v(t2 )⟩ = 2
(λt′ + e−λt − 1) .
0 0 Mλ
The mean squared distance initially grows quadratically (so-called ballistic
regime at λt′ ≪ 1). In the limit of a long time (comparing to the relaxation
time λ−1 rather than to the force correlation time τ ) we have the diffusion
growth ⟨(∆r)2 ⟩ ≈ 6T t′ /M λ. Generally ⟨(∆r)2 ⟩ = 2dDt where d is space
dimensionality and D - diffusivity. In our case d = 3 and then the diffu-
sivity is as follows: D = T /M λ — the Einstein relation19 . The probability
distribution of displacement,

ρ(∆r, t′ ) = (4πDt′ )−3/2 exp[−(∆r)2 /4Dt′ ] ,

satisfies the diffusion equation ∂ρ/∂t′ = D∇2 ρ. If we have many particles


initially distributed

according to n(r, 0) then their distribution at any time,
n(r, t) = ρ(r−r , t)n(r′ , 0) dr′ , also satisfies the diffusion equation: ∂n/∂t′ =

D∇2 n.
In the external field V (q), the particle satisfies the equations

ṗ = −λp + f − ∂q V , q̇ = p/M . (219)

Note that these equations characterize the system with the Hamiltonian H =
p2 /2M +V (q), that interact with the thermostat, which provides friction −λp
and agitation f - the balance between these two terms expressed by (217)
means that the thermostat is in equilibrium.
Consider now the over-damped limit, λp ≫ ṗ, which gives the first-order
equation:
λp = λM q̇ = f − ∂q V . (220)
19
With temperature in degrees, it contains the Boltzmann constant, k = DM λ/T , which
was actually determined by this relation and found constant indeed, i.e. independent of
the medium and the type of particle. That proved the reality of atoms - after all, kT is
the kinetic energy of a single atom.

113
Let us derive the equation on ρ(q, t), that is to pass from considering indi-
vidual trajectories to the description of the ”cloud” of trajectories. We know
that without V , the density satisfies the diffusion equation. Consider now
dynamic equation without any randomness, λM q̇ = −∂q V , it corresponds to
a flow in q-space with the velocity w = −∂q V /λM . In a flow, density satisfies
the continuity equation ∂t ρ = −div ρw. Together, diffusion and advection
give the so-called Fokker-Planck equation
∂ρ 1 ∂ ∂V
= D∇2 ρ + ρ = −div J . (221)
∂t λM ∂qi ∂qi
More formally, one can derive this equation by considering the Langevin
equation q̇−w = η, taking the noise Gaussian delta-correlated ⟨ηi (0)ηj (t)⟩ =
2Dδij δ(t). In this case the path-integral representation (206) change into
∫ [ ]
1 ∫t ′
ρ(q, t; 0, 0) = Dq(t′ ) exp − dt |q̇ − w|2 , (222)
4D 0
since it is the quantity q̇ − w which is Gaussian now. To describe the time
change, consider the convolution identity (205) for an infinitesimal time shift
ϵ:
∫ [ ]
′ −d/2 [q − q′ − ϵw(q′ )]2
ρ(q, t) = dq (4πDϵ) exp − ρ(q′ , t − ϵ) . (223)
4Dϵ
What is written here is simply that the transition probability is the Gaussian
probability of finding the noise η with the right magnitude to provide for the
transition from q′ to q. We now change integration variable, y = q − q′ −
ϵw(q′ ), and keep only the first term in ϵ: dq′ = dy[1 − ϵ∂q · w(q)]. Here
∂q · w = ∂i wi = div w. In the resulting expression, we expand the last factor,

dy(4πDϵ)−d/2 e−y
2 /4Dϵ
ρ(q, t) ≈ (1 − ϵ∂q · w) ρ(q + y − ϵw, t − ϵ)
∫ [
dy(4πDϵ)−d/2 e−y
2 /4Dϵ
≈ (1 − ϵ∂q · w) ρ(q, t) + (y − ϵw) · ∂q ρ(q, t)
]
+(yi yj − 2ϵyi wj + ϵ2 wi wj )∂i ∂j ρ(q, t)/2 − ϵ∂t ρ(q, t)
= (1 − ϵ∂q · w)[ρ − ϵw · ∂q ρ + ϵD∆ρ − ϵ∂t ρ + O(ϵ2 )] , (224)

and obtain (221) collecting terms linear in ϵ. Note that it was necessary to
expand until the quadratic terms in y, which gave the contribution linear in
ϵ, namely the Laplacian, i.e. the diffusion operator.

114
The Fokker-Planck equation has a stationary solution which corresponds
to the particle in an external field and in thermal equilibrium with the sur-
rounding molecules:

ρ(q) ∝ exp[−V (q)/λM D] = exp[−V (q)/T ] . (225)

Apparently it has a Boltzmann-Gibbs form, and it turns into zero the proba-
bility current: J = −ρ∂V /∂q − D∂ρ/∂q = 0. We shall return to the Fokker-
Planck equation in the next Chapter for the consideration of the detailed
balance and fluctuation-dissipation relations.
Ma, Sect. 12.7; Kardar Fields, Sect 9.1.

9 Response and fluctuations


The mean squared thermodynamic fluctuation of any quantity is determined
by the second derivative of the thermodynamic potential with respect to this
quantity. Those second derivatives are related to susceptibilities with respect
to the properly defined external forces. One can formulate a general relation.

9.1 Static response


Consider a system with the Hamiltonian H and add some small static ex-
ternal force f so that the Hamiltonian becomes H − xf where x is called
the coordinate. The examples of force-coordinate pairs are magnetic field
and magnetization, pressure and volume etc. The mean value of any other
variable B can be calculated by the canonical distribution with the new
Hamiltonian ∑
B exp[(xf − H)/T ]
B̄ = ∑ .
exp[(xf − H)/T ]
Note that we assume that the perturbed state is also in equilibrium. The
susceptibility of B with respect to f is as follows

∂ B̄ ⟨Bx⟩ − B̄ x̄ ⟨Bx⟩c
χ≡ = ≡ . (226)
∂f T T

Here the cumulant (also called the irreducible correlation function) is defined
for quantities with the subtracted mean values ⟨xy⟩c ≡ ⟨(x − x̄)(y − ȳ)⟩ and
it is thus the measure of statistical correlation between x and y. We thus

115
learn that the susceptibility is the measure of the statistical coherence of the
system, increasing with the statistical dependence of variables. Consider few
examples of this relation.
Example 1. If x = H is energy itself then f represents the fractional
increase in the temperature: H(1 − f )/T ≈ H/(1 + f )T . Formula (226) then
gives the relation between the specific heat (which is a kind of susceptibility)
and the squared energy fluctuation which can be written via the irreducible
correlation function of the energy density ϵ(r):

∂ H̄ ∂E
=T = T Cv = ⟨(∆E)2 ⟩/T
∂f ∂T
∫ ∫
1 ′ ′ V
= ⟨ϵ(r)ϵ(r )⟩c drdr = ⟨ϵ(r)ϵ(0)⟩c dr (227)
T T
Growth of the specific heat when the temperature approached criticality
is related to the growth of the correlation function of fluctuations. As we
discussed before, the specific heat is extensive i.e. proportional to the volume
(or number of particles), but the coefficient of proportionality actually tells
us how many degrees of freedom are effective in absorbing energy at a given
temperature (recall two-level system where specific heat was small for high
and low temperatures). We see from (227) that the higher the correlation
the larger is the specific heat that is the more energy one needs to spend to
raise the temperature by one degree).
Example 2. If f = h is a magnetic field then the coordinate x = M is the
magnetization and (226) gives the magnetic susceptibility

∂M ⟨M 2 ⟩c V ∫
χ= = = ⟨m(r)m(0)⟩c dr .
∂h T T
Divergence of χ near the Curie point means the growth of correlations be-
tween distant spins i.e. the growth of correlation length. For example, the
Ornshtein-Zernicke correlation function (175) gives ⟨m(r)m(0)⟩c ∝ r2−d so
∫ rc d 2−d
that in the mean-field approximation χ ∝ a d rr ∝ rc2 ∝ |T − Tc |−1 .
General remark. These fluctuation-response relations can be related to
the change of the thermodynamic potential (free energy) under the action of
the force:

F = −T ln Z = −T ln exp[(xf − H)/T ]

116
f2 2
= −T ln Z0 − T ln⟨exp(xf /T )⟩0 = F0 − f ⟨x⟩0 − ⟨x ⟩0c + . . . (228)
2T
⟨x⟩ = −∂F/∂f, ⟨x2 ⟩c /T = ∂⟨x⟩/∂f = −∂F 2 /∂f 2 . (229)

Subscript 0 means an average over the state with f = 0, we don’t write it


in the expansion (229) which can take place around any value of f . Formula
(228) is based on the cumulant expansion theorem (which we already used
implicitly in making virial expansion in Sect. 5.1):

∑ an
⟨exp(ax)⟩ = 1 + ⟨xn ⟩ ,
n=1 n!
( ∞
)n
∑ 1 ∑ 1 ∑ am
ln⟨exp(ax)⟩ = (1 − ⟨exp(ax)⟩) = n
− ⟨x ⟩
m

n=1 n n=1 n m=1 m!


( ) a2 ∞
∑ a n
= a1 ⟨x⟩ + ⟨x2 ⟩ − ⟨x⟩2 + ... = ⟨xn ⟩c = ⟨eax − 1⟩c .
2! n=1 n!

Example 3. Consider now the inhomogeneous force f (r) and denote


a(r) ≡ x(r) − x0 . The Hamiltonian change is now the integral
∫ ∑ ∫ ∑

f (r)a(r) dr = fk ak′ ei(k+k )·r dr = V fk a−k .
kk′ k

The mean linear response can be written as an integral of the force with the
response (Green) function:

ā(r) = G(r − r′ )f (r′ ) dr′ , āk = Gk fk . (230)

One relates the Fourier components of the Green function and the correlation
function of the coordinate fluctuations choosing B = ak , x = a−k in (226):

1∫ ′ ik·(r′ −r) ′ V ∫
V Gk = ⟨a(r)a(r )⟩c e drdr = ⟨a(r)a(0)⟩c e−ik·r dr .
T T
T Gk = (a2 )k .

Example 4. If B = x = N then f is the chemical potential µ:


( )
∂N ⟨N 2 ⟩c ⟨(∆N )2 ⟩ V ∫
= = = ⟨n(r)n(0)⟩c dr . (231)
∂µ T,V
T T T

117
This formula coincides with (169) if one accounts for
( ) ( ) ( )
∂V ∂n ∂N
−n 2
= N =n
∂P T,N
∂P T,N
∂P T,V
( ) ( ) ( )
∂P ∂N ∂N
= = . (232)
∂µ T,V
∂P T,V
∂µ T,V

Here we used the fact Ω(T, µ) = P V and N = ∂Ω/∂µ. We conclude that the
response of the density to the pressure is expressed via the density fluctua-
tions.
In the simplest case of an ideal gas with ⟨n(r)n(0)⟩c = nδ(r), (231,232)
give dn/dP = 1/T . For the pair interaction energy U (r) the first ap-
proximation (neglecting multiple correlations) gives ⟨n(r)n(0)⟩c = n{δ(r) +
n[e−U (r)/T − 1]} and the correction to the equation of state

p = nT + n2 T [1 − e−U (r)/T ] dr ,

which we already derived in Sect. 5.2.


More details in Shang-Keng Ma, Statistical Mechanics, Sect. 13.1

9.2 Temporal correlation of fluctuations


We now consider the time-dependent force f (t) so that the Hamiltonian is
H = H0 − xf (t). Time dependence requires more elaboration than space
inhomogeneity20 because one must find the non-equilibrium time-dependent
probability density in the phase space solving the Liouville equation
∂ρ ∂ρ ∂H ∂ρ ∂H
= − ≡ −{ρ, H} , (233)
∂t ∂x ∂p ∂p ∂x
or the respective equation for the density matrix in the quantum case. Here
p is the canonical momentum conjugated to the coordinate x. One can solve
the equation (233) perturbatively in f starting from ρ0 = Z −1 exp(−βH0 )
and then solving
∂ρ1 dρ1 ∂H0
+ {ρ1 , H0 } = = {ρ0 , f x} = f β ρ0 . (234)
∂t dt ∂p
20
As the poet (Brodsky) said, ”Time is bigger than space: space is an entity, time is in
essence a thought of an entity.”

118
Here d/dt denotes the derivative in the moving reference frame, like in
Sect. 2.1. The motion is along an unperturbed trajectory determined by
H0 . Recall now that ∂H0 /∂p = dx/dt (calculated at f = 0 i.e. also along
an unperturbed trajectory). The formal solution of (234) is written as an
integral over the past:
∫ t dx(τ )
ρ1 = βρ0 f (τ ) dτ . (235)
−∞ dτ
We now use (235) to derive the relation between the fluctuations and
response in the time-dependent case. Indeed, the linear response of the co-
ordinate to the force is as follows
∫ t ∫
′ ′ ′
⟨x(t)⟩ ≡ α(t, t )f (t ) dt = xdxρ1 (x, t) , (236)
−∞

which defines generalized susceptibility (also called response or Green func-


tion) α(t, t′ ) = α(t − t′ ) ≡ δ⟨x(t)⟩/δf (t′ ). Note that causality requires
α(t − t′ ) = 0 for t < t′ . Substituting (235) into (236) and taking a variational
derivative δ/δf (t′ ) we obtain the fluctuation-dissipation theorem (FDT)


⟨x(t)x(t′ )⟩ = T α(t, t′ ) , t ≥ t′ . (237)
∂t
It relates quantity in equilibrium (the decay rate of correlations) to the weakly
non-equilibrium quantity (response to a small perturbation). While it is
similar to the fluctuation-response relations obtained in the previous section,
it is called the fluctuation-dissipation theorem. Pay attention to the fact
that the derivative is with respect to the earlier time, which is related to
causality and is also clear looking at (235). To understand (237) better and
to see where the word ”dissipation” comes from, we introduce the spectral
decomposition of the fluctuations:
∫ ∞ ∫ ∞ dω
xω = x(t)eiωt dt , x(t) = xω e−iωt . (238)
−∞ −∞ 2π
The pair correlation function, ⟨x(t′ )x(t)⟩ must be a function of the time
difference which requires ⟨xω xω′ ⟩ = 2πδ(ω + ω ′ )(x2 )ω — this relation is the
definition of the spectral density of fluctuations (x2 )ω . Linear response in the
spectral form is x̄ω = αω fω where
$$\alpha(\omega) = \int_0^{\infty}\alpha(t)\,e^{i\omega t}\,dt = \alpha' + i\alpha''$$

is analytic in the upper half-plane of complex ω under the assumption that
α(t) is finite everywhere. Since α(t) is real then α(−ω ∗ ) = α∗ (ω). Let us
show that the imaginary part α′′ determines the energy dissipation,

$$\frac{dE}{dt} = \frac{dH}{dt} = \frac{\partial H}{\partial t} = \frac{\partial H}{\partial f}\,\frac{df}{dt} = -\bar x\,\frac{df}{dt} . \qquad (239)$$
For purely monochromatic perturbation, f (t) = fω exp(−iωt) + fω∗ exp(iωt),
x̄ = α(ω)fω exp(−iωt) + α(−ω)fω∗ exp(iωt), the dissipation averaged over a
period is as follows:

$$\frac{dE}{dt} = \int_0^{2\pi/\omega}\frac{\omega\,dt}{2\pi}\,[\alpha(-\omega) - \alpha(\omega)]\,i\omega|f_\omega|^2 = 2\omega\,\alpha''(\omega)\,|f_\omega|^2 . \qquad (240)$$
We can now calculate the average dissipation using (235)
$$\frac{dE}{dt} = -\int x\dot f\,\rho_1\,dx\,dp = -\beta\int x(t)\dot f(t)\,\rho_0\,dx\,dp\int_{-\infty}^{t}\dot x(\tau)\,f(\tau)\,d\tau$$
$$= -i\omega|f_\omega|^2\,\beta\int_{-\infty}^{\infty}\langle x(t)\dot x(t')\rangle\,e^{i\omega(t - t')}\,dt' = \beta\omega^2|f_\omega|^2\,(x^2)_\omega , \qquad (241)$$

where the spectral density of the fluctuations is calculated with ρ0 (i.e. at


unperturbed equilibrium). Comparing (240) and (241) we obtain the spectral
form of the fluctuation-dissipation theorem (Callen and Welton, 1951):

2T α′′ (ω) = ω(x2 )ω . (242)

We can also obtain (242) directly by making a Fourier transform of (237):
$$\frac{T\alpha(\omega)}{i\omega} = \int_0^{\infty}\langle x(0)x(t)\rangle\,e^{i\omega t}\,dt ,$$
$$(x^2)_\omega = \int_0^{\infty}\langle x(0)x(t)\rangle\,e^{i\omega t}\,dt + \int_{-\infty}^{0}\langle x(0)x(t)\rangle\,e^{i\omega t}\,dt = \frac{T[\alpha(\omega) - \alpha(-\omega)]}{i\omega} = \frac{2T\alpha''(\omega)}{\omega} .$$
This truly amazing formula relates the dissipation coefficient that governs
non-equilibrium kinetics under the external force with the equilibrium fluc-
tuations. The physical idea is that to know how a system reacts to a force
one might as well wait until the fluctuation appears which is equivalent to
the result of that force. Note that the force f disappeared from the final

result which means that the relation is true even when the (equilibrium)
fluctuations of x are not small. Integrating (242) over frequencies we get
$$\langle x^2\rangle = \int_{-\infty}^{\infty}(x^2)_\omega\,\frac{d\omega}{2\pi} = \frac{T}{\pi}\int_{-\infty}^{\infty}\frac{\alpha''(\omega)\,d\omega}{\omega} = \frac{T}{i\pi}\int_{-\infty}^{\infty}\frac{\alpha(\omega)\,d\omega}{\omega} = T\alpha(0) . \qquad (243)$$
The last step used the Cauchy integration formula (for the contour that
runs along the real axis, avoids zero by a small semi-circle and closes at the
upper half plane). We thus see that the mean squared fluctuation is the
zero-frequency response, which is the integral of the response over time:
$$\alpha(\omega = 0) = \int_0^{\infty}\alpha(t)\,dt .$$
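As a quick numerical check of this sum rule (an illustration added here, not in the original notes), take the exponentially decaying response α(t) = e^{−λt}, for which α(ω) = (λ − iω)^{−1}, α''(ω) = ω/(λ² + ω²), and α(0) = 1/λ:

```python
import numpy as np
from scipy.integrate import quad

lam, T = 2.0, 1.3

# left-hand side of (243): (T/pi) * integral of alpha''(omega)/omega;
# note alpha''(w)/w = 1/(lam^2 + w^2) is regular at w = 0
lhs, _ = quad(lambda w: (T/np.pi)/(lam**2 + w**2), -np.inf, np.inf)

# right-hand side: T * alpha(0), with alpha(0) = int_0^inf exp(-lam*t) dt
alpha0, _ = quad(lambda t: np.exp(-lam*t), 0.0, np.inf)
print(lhs, T*alpha0)   # both should equal T/lam = 0.65
```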

We have seen the simplest case of the FDT before - the relation (217)
relating the friction coefficient of the Brownian particle to the variance of
the fluctuating molecular forces and the temperature of the medium. The
same Langevin equation (214) is satisfied by many systems, in particular by
the current I flowing through an electric L-R circuit at a finite temperature:
$$L\,\frac{dI}{dt} = -RI + V(t) ,$$
where the fluctuating voltage V (t) is due to thermal fluctuations. Similar to
(217) we can then relate the resistance to the equilibrium voltage fluctuations
on a resistor R:
$$\int_{-\infty}^{\infty}\langle V(0)V(t)\rangle\,dt = 2RT .$$

This relation is called the Nyquist theorem; note that L does not enter. Equipartition requires L⟨I²⟩/2 = T/2, so that similar to (218) we can write the current auto-correlation function ⟨I(0)I(t)⟩ = (T/L) exp(−R|t|/L), which corresponds to the Lorentzian spectral density (I²)_ω = 2RT/(L²ω² + R²).
low frequencies it corresponds to a constant spectral density (white noise).
Generally, the spectral density has a universal Lorentzian form in the
low-frequency limit when the period of the force is much longer than the
relaxation time for establishing the partial equilibrium characterized by the
given value x̄ = α(0)f . In this case, the evolution of x is the relaxation
towards x̄:
ẋ = −λ(x − x̄) . (244)

For harmonics, (λ − iω)x_ω = λx̄_ω = λα(0)f_ω, so that
$$\alpha(\omega) = \alpha(0)\,\frac{\lambda}{\lambda - i\omega} , \qquad \alpha''(\omega) = \alpha(0)\,\frac{\lambda\omega}{\lambda^2 + \omega^2} . \qquad (245)$$
The spectral density of such (so-called quasi-stationary) fluctuations is as follows:
$$(x^2)_\omega = \langle x^2\rangle\,\frac{2\lambda}{\lambda^2 + \omega^2} . \qquad (246)$$
It corresponds to the long-time exponential decay of the temporal correlation function: ⟨x(t)x(0)⟩ = ⟨x²⟩ exp(−λ|t|). That exponential decay is a temporal analog of the large-scale formula (175). The non-smooth behavior at zero is an artefact of the long-time approximation; a consistent consideration would give zero derivative at t = 0. The susceptibility is α(t) = α(0)λ exp(−λt) for t > 0.
When several degrees of freedom are weakly deviated from equilibrium,
the relaxation must be described by the system of linear equations (consider
all xi = 0 at the equilibrium)

ẋi = −λij xj . (247)

The dissipation coefficients are generally non-symmetric: λij ̸= λji . One


can however find proper coordinates in which the coefficients are symmetric. The single-time probability distribution of small fluctuations is Gaussian, w(x) ∼ exp(∆S) ≈ exp(−β_{jk}x_jx_k). The matrix β̂ is symmetric by definition. Introduce generalized forces X_j = −∂S/∂x_j = β_{jk}x_k, so that ẋ_i = γ_{ij}X_j with γ_{ij} = λ_{ik}(β̂^{−1})_{kj} and ⟨x_iX_j⟩ = ∫ dx x_iX_jw = −∫ dx x_i∂w/∂x_j = δ_{ij} — we have seen that the coordinates and the generalized forces do not cross-correlate already in the simplest case of uniform fluctuations described by (166), which gave ⟨∆T∆V⟩ = 0, for instance. Returning to the general case, note also that ⟨X_iX_j⟩ = β_{ij} and ⟨x_jx_k⟩ = (β̂^{−1})_{jk}.
properties with respect to the time reversal then their correlation function
is symmetric too: ⟨xi (0)xk (t)⟩ = ⟨xi (t)xk (0)⟩. Differentiating it with respect
to t at t = 0 we get the Onsager symmetry principle, γik = γki . For ex-
ample, the conductivity tensor is symmetric in anisotropic crystals without
magnetic field. Also, a temperature difference produces the same electric
current as the heat current produced by a voltage. Such symmetry relations
due to time reversibility are valid only near equilibrium steady state and are
manifestations of the detailed balance (i.e. absence of any persistent currents

in the phase space). Let us stress that this is different from the susceptibil-
ities in equilibrium which were symmetric for the simple reason that they
were second derivatives of the thermodynamic potential; for instance, dielec-
tric susceptibility χij = ∂Pi /∂Ej = χji where P is the polarization of the
medium - this symmetry is analogous to the (trivial) symmetry of β̂, not the
(non-trivial) symmetry of γ̂.
See Landau & Lifshitz, Sects. 119-120 for the details and Sect. 124 for the quantum case; also Kittel, Sects. 33-34.

9.3 Spatio-temporal correlation function


To have a specific example, let us calculate the correlation function of the
density at different points and times for the simplest case of an ideal gas. We
have N particles with the time-independent momenta pk and the coordinates
Rk (t) = Rk (0)+pk t/m. The concentration is just the sum of delta-functions,
corresponding to different particles:

$$n(r, t) = \sum_{k=1}^{N}\delta\big(r - R_k(t)\big) . \qquad (248)$$

Since the particles do not interact, there is no correlation between differ-


ent particles. Therefore, the only correlation between densities n(r, t) and
n(r′ , t′ ) can appear due to a particle that visited both places at respective
times:
$$\langle n(r, t)\,n(r', t')\rangle_c = \Big\langle\sum_{k=1}^{N}\delta\big(r - R_k(t)\big)\,\delta\big(r' - R_k(t')\big)\Big\rangle = N\Big\langle\delta\big(r - R_k(t)\big)\,\delta\big(r' - R_k(t) - p_k(t' - t)/m\big)\Big\rangle$$
$$= N\Big\langle\delta\big(r - R_k(t)\big)\,\delta\big(r' - r - p_k(t' - t)/m\big)\Big\rangle . \qquad (249)$$
There are two averages here. The first one is over all possible positions within the space V, which gives ⟨δ(r − R_k(t))⟩ = 1/V. The second average is over the Maxwell distribution of the momenta:
$$\langle n(r, t)\,n(r', t')\rangle_c = \frac{N}{V}\Big\langle\delta\big(r' - r - p_k(t' - t)/m\big)\Big\rangle = n\,(2\pi mT)^{-d/2}\int dp\;e^{-p^2/2mT}\,\delta\big(r' - r - p\,(t' - t)/m\big)$$
$$= n\,|t - t'|^{-d}\left(\frac{m}{2\pi T}\right)^{d/2}\exp\left(-\frac{m|r - r'|^2}{2T|t - t'|^2}\right) . \qquad (250)$$

That function determines the response of the concentration to the change
in the chemical potential. In particular, when t′ → t it tends to nδ(r − r′ ),
which determines the static response described in Sect. 9.1. For coinciding
points it decays by the diffusion law, ⟨n(r, t)n(r, t′ )⟩c ∝ |t − t′ |−d , so that the
response decays as |t − t′ |−d−1 .
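The Gaussian form of (250) is easy to check by direct sampling (an illustration added here, not in the original): for free flight, each displacement component is p(t′ − t)/m with p drawn from the Maxwell distribution, hence Gaussian with variance T(t′ − t)²/m. A one-dimensional sketch with assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T, dt = 1.0, 1.5, 2.0                      # mass, temperature, time lag
p = rng.normal(0.0, np.sqrt(m*T), 1_000_000)  # Maxwell momenta in 1d
dx = p*dt/m                                   # displacement r' - r
print("sample variance:", dx.var(), " theory T dt^2/m:", T*dt**2/m)
```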

9.4 General fluctuation-dissipation relation


Consider the over-damped Brownian particle with the coordinate x(t) in a
time-dependent potential V (x, t):

ẋ = −∂x V + η . (251)

Here ⟨η(0)η(t)⟩ = 2δ(t)/β. The Fokker-Planck equation for the probability


ρ(x, t) has the same form (221):

∂t ρ = T ∂x2 ρ + ∂x (ρ∂x V ) = −ĤF P ρ . (252)

We have introduced the Fokker-Planck operator,


$$\hat H_{FP} = -\frac{\partial}{\partial x}\left(\frac{\partial V}{\partial x} + T\,\frac{\partial}{\partial x}\right) ,$$

which allows one to exploit another instance of the analogy between quantum
mechanics and statistical physics. We may say that the probability density is the ψ-function in the x-representation, ρ(x, t) = ⟨x|ψ(t)⟩. In other words, we consider evolution in the Hilbert space of functions, so that we may rewrite (252) in a Schrödinger representation as d|ψ⟩/dt = −Ĥ_FP|ψ⟩, which has a formal solution |ψ(t)⟩ = exp(−tĤ_FP)|ψ(0)⟩. The transition probability is given by the matrix element:
$$\rho(x', t'; x, t) = \langle x'|\exp[(t - t')\hat H_{FP}]|x\rangle . \qquad (253)$$

Without the coordinate-dependent field V (x), the transition probability is


symmetric, ρ(x′, t; x, 0) = ρ(x, t; x′, 0), which is formally manifested by the fact that the respective Fokker-Planck operator ∂²_x is Hermitian. This property is called the detailed balance. How is this modified in an external field? If
the potential V is time independent, then we have a Gibbs steady state which

also satisfies the detailed balance: the probability current is the (Gibbs) prob-
ability density at the starting point times the transition probability; forward
and backward currents must be equal in equilibrium:

$$\rho(x', t; x, 0)\,e^{-V(x)/T} = \rho(x, t; x', 0)\,e^{-V(x')/T} , \qquad (254)$$
or, in operator form,
$$\langle x'|e^{-t\hat H_{FP}}e^{-V/T}|x\rangle = \langle x|e^{-t\hat H_{FP}}e^{-V/T}|x'\rangle = \langle x'|e^{-V/T}e^{-t\hat H^\dagger_{FP}}|x\rangle .$$
Since this must be true for any x, x′, then e^{−tĤ†_FP} = e^{V/T}e^{−tĤ_FP}e^{−V/T} and
$$\hat H^\dagger_{FP} = e^{V/T}\,\hat H_{FP}\,e^{-V/T} , \qquad (255)$$
i.e. e^{V/2T}Ĥ_FP e^{−V/2T} is Hermitian, which can be checked directly.
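A discrete-state analog of this check (added here as an illustration, not from the text): a Metropolis transition-rate matrix with Gibbs stationary state obeys detailed balance exactly, so the similarity-transformed matrix e^{V/2T} W e^{−V/2T} is symmetric to machine precision.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 20, 0.7
V = rng.normal(size=n)                  # arbitrary energies of n states

W = np.zeros((n, n))
for i in range(n - 1):                  # nearest-neighbor Metropolis rates
    W[i+1, i] = 0.5*min(1.0, np.exp(-(V[i+1] - V[i])/T))
    W[i, i+1] = 0.5*min(1.0, np.exp(-(V[i] - V[i+1])/T))
W -= np.diag(W.sum(axis=0))             # conservation of probability

S = np.diag(np.exp(V/(2*T)))            # the analog of e^{V/2T}
M = S @ W @ np.linalg.inv(S)
print("max asymmetry:", np.abs(M - M.T).max())   # ~ 1e-16
```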


Consider now a time-dependent potential. Even though one can define a time-dependent Gibbs state Z_t^{−1} exp[−βV(x, t)] with Z_t = ∫ exp[−βV(x, t)]dx, it is no longer a solution. Can we have anything generalizing the detailed-balance relation (254) we had in equilibrium? Such a relation was found surprisingly recently, despite its generality and the relative technical simplicity of its derivation.
Define the work done during the time t as
$$W_t = \int_0^t\frac{\partial V(x(t'), t')}{\partial t'}\,dt' . \qquad (256)$$
The time derivative here is partial i.e. taken only with respect to the second
argument. The work is a fluctuating quantity depending on the trajectory
x(t′ ). We consider an ensemble of trajectories starting from the initial po-
sitions taken with the Gibbs distribution corresponding to the potential at
that moment: ρ(x, 0) = Z_0^{−1} exp[−βV(x, 0)]. Then the remarkable relation
holds (Jarzynski 1997):

⟨exp(−βWt )⟩ = Zt /Z0 . (257)

Here the bracket means double averaging, over the initial distribution ρ(x, 0)
and over the different realizations of the Gaussian noise η(t) during the time
interval (0, t). Remark that the entropy is defined only in equilibrium, yet the
work divided by temperature is an analog of the entropy change (production),
and the exponent of it is an analog of the phase volume change.
To prove (257) we consider the generalized Fokker-Planck equation for
the joint probability ρ(x, W, t). Similar to the argument preceding (221),

we note that the flow along W in x − W space proceeds with the velocity
dW/dt = ∂t V so that the respective component of the current is ρ∂t V and
the equation takes the form
$$\partial_t\rho = \beta^{-1}\partial_x^2\rho + \partial_x(\rho\,\partial_xV) - \partial_W(\rho\,\partial_tV) . \qquad (258)$$
Since W_0 = 0, the initial condition for (258) is
$$\rho(x, W, 0) = Z_0^{-1}\exp[-\beta V(x, 0)]\,\delta(W) . \qquad (259)$$
While we cannot find ρ(x, W, t) for arbitrary V(t), we can multiply (258) by exp(−βW) and integrate over dW. Since V(x, t) does not depend on W, we get the closed equation for f(x, t) = ∫ dW ρ(x, W, t) exp(−βW):
$$\partial_tf = \beta^{-1}\partial_x^2f + \partial_x(f\,\partial_xV) - \beta f\,\partial_tV . \qquad (260)$$
Now, this equation does have an exact time-dependent solution f (x, t) =
Z0−1 exp[−βV (x, t)] where the factor is chosen to satisfy the initial condition
(259). In other words, the distribution weighted by exp(−βWt ) looks like
Gibbs state, adjusted to the time-dependent potential at every moment of
time. To get (257), what remains is to integrate f (x, t) over x.
Let us reflect. We started from a Gibbs distribution but considered ar-
bitrary temporal evolution of the potential. Therefore, our distribution was
arbitrarily far from equilibrium during the evolution. Still, to obtain the
mean exponent of the work done, it is enough to know the partition func-
tions of the equilibrium Gibbs distributions corresponding to the potential at
the beginning and at the end (even though the system is not in equilibrium
at the end). Remarkable.
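The relation (257) is easy to test numerically for the simplest protocol: an overdamped particle in a harmonic trap V(x, t) = k(t)x²/2 whose stiffness is ramped linearly, so that Z_t/Z_0 = √(k_0/k_1). A Python sketch added here for illustration (all parameters are assumptions; Euler-Maruyama integration of (251), with the work accumulated according to (256)):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k0, k1, tf, dt = 1.0, 1.0, 3.0, 1.0, 1e-3
nsteps, ntraj = int(tf/dt), 20_000
kdot = (k1 - k0)/tf                         # dk/dt for the linear ramp

x = rng.normal(0.0, np.sqrt(T/k0), ntraj)   # initial Gibbs state for k(0)
W = np.zeros(ntraj)
for i in range(nsteps):
    k = k0 + kdot*i*dt
    W += 0.5*kdot*x**2*dt                   # dW = (dV/dt) dt = (dk/dt) x^2/2 dt
    x += -k*x*dt + rng.normal(0.0, np.sqrt(2*T*dt), ntraj)
print("<exp(-W/T)> =", np.exp(-W/T).mean(), " Z_t/Z_0 =", np.sqrt(k0/k1))
```

Even though the ramp drives the ensemble out of equilibrium, the exponential average should reproduce √(k_0/k_1) ≈ 0.577 within sampling error.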
One can obtain different particular results from the general fluctuation-
dissipation relation (257). For example, using the Jensen inequality ⟨e^A⟩ ≥ e^{⟨A⟩}
and introducing the free energy Ft = −T ln Zt , one can obtain the second
law of thermodynamics:
⟨W ⟩ ≥ Ft − F0 .
Moreover, the Jarzynski relation is a generalization of the fluctuation-dissipation theorem, which can be derived from it for small deviations from equilibrium. Namely, we can consider V(x, t) = V_0(x) − f(t)x, take the limit f → 0, expand (257) up to second-order terms in f, and get (237).
In a multi-dimensional case, there is another way to deviate the system
from equilibrium - to apply a non-potential force f (q, t) (which is not a
gradient of any scalar):
q̇ = f − ∂q V + η . (261)

The new Fokker-Planck equation has an extra term
$$\frac{\partial\rho}{\partial t} = \frac{\partial}{\partial q_i}\left[\rho\left(\frac{\partial V}{\partial q_i} - f_i\right) + T\,\frac{\partial\rho}{\partial q_i}\right] = -\hat H_{FP}\,\rho . \qquad (262)$$
Again, there is no Gibbs steady state and the detailed balance (254,255) is
now violated in the following way:

$$\hat H^\dagger_{FP} = e^{V/T}\,\hat H_{FP}\,e^{-V/T} + (f\cdot\dot q)/T , \qquad (263)$$

The last term is again the power divided by temperature i.e. the entropy
production rate. A close analog of the Jarzynski relation can be formulated
for the production rate measured during the time t:
$$\sigma_t = \frac{1}{tT}\int_0^t (f\cdot\dot q)\,dt . \qquad (264)$$
This quantity fluctuates from realization to realization (of the noise η).
The probabilities P (σt ) satisfy the following relation, which we give with-
out derivation (see Kurchan for details)
$$\frac{P(\sigma_t)}{P(-\sigma_t)} \propto e^{t\sigma_t} . \qquad (265)$$
The second law of thermodynamics states that to keep the system away from
equilibrium, the external force f must on average do a positive work. Over a
long time we thus expect σt to be overwhelmingly positive, yet fluctuations
do happen. The relation (265) shows how low is the probability to observe a
negative entropy production rate - this probability decays exponentially with
the time of observation.
The relation similar to (265) can be derived for any system symmetric
with respect to some transformation to which we add anti-symmetric per-
turbation. Consider a system with the variables s1 , . . . , sN and the even
energy: E0 (s) = E0 (−s). Consider the energy perturbed by an odd term,

E = E_0 − hM/2, where M(s) = Σ_i s_i = −M(−s). The probability of the perturbation P[M(s)] satisfies the direct analog of (265), which is obtained by changing the integration variable s → −s:
$$P(a) = \int ds\,\delta[M(s) - a]\,e^{-\beta E_0 + \beta ha/2} = \int ds\,\delta[M(s) + a]\,e^{-\beta E_0 + \beta ha/2} = P(-a)\,e^{\beta ha} . \qquad (266)$$

9.5 Central limit theorem and large deviations
Mathematical statement underlying most of the statistical physics in the
thermodynamic limit is the central limit theorem, which states that the sum
of many independent random numbers has Gaussian probability distribution.
Recently, however, we are more and more interested in the statistics of not
very large systems or in the statistics of really large fluctuations. To answer
such questions, here we discuss the sum of random numbers in more detail.
Consider the variable X which is a sum of many independent identically distributed (iid) random numbers, X = Σ_{i=1}^N y_i. Its mean value ⟨X⟩ = N⟨y⟩ grows linearly with N. Its fluctuations X − ⟨X⟩ on scales less than or comparable with O(N^{1/2}) are governed by the Central Limit Theorem, which states that (X − ⟨X⟩)/N^{1/2} becomes for large N a Gaussian random variable with variance ⟨y²⟩ − ⟨y⟩² ≡ ∆. Finally, its fluctuations on the larger scale O(N)
are governed by the Large Deviation Theorem that states that the PDF of
X has asymptotically the form

P(X) ∝ e−N H(X/N −⟨y⟩) . (267)

To show this, let us characterize y by its generating function ⟨e^{zy}⟩ ≡ e^{S(z)} (assuming that the mean value exists for all complex z). The derivatives of S(z) at zero determine the cumulants of y. The basic statement is that, because all the y-s in the sum are independent, the generating function ⟨e^{zX}⟩ of the moments of X has exponential dependence on N: ⟨e^{zX}⟩ = e^{NS(z)}. The PDF P(X) is then given by the inverse Laplace transform (2πi)^{−1}∫ e^{−zX + NS(z)} dz with
the integral over any axis parallel to the imaginary one. For large N , the inte-
gral is dominated by the saddle point z0 such that S ′ (z0 ) = X/N and the large
deviation relation (267) follows with H = −S(z0 ) + z0 S ′ (z0 ). The function
H of the variable X/N − ⟨y⟩ is called entropy function since it is a logarithm
of probability. Note that N dH/dX = z0 (X) and N 2 d2 H/dX 2 = 1/S ′′ (z0 ).
A few important properties of H (also called rate or Cramér function) may
be established independently of the distribution P(y) or S(z). It is a convex
function which takes its minimum at zero, i.e. for X taking its mean value
⟨X⟩ = N ⟨y⟩ = N S ′ (0). The minimal value of H vanishes since S(0) = 0.
The entropy is quadratic around its minimum with H ′′ (0) = ∆−1 , where
∆ = S ′′ (0) is the variance of y. Quadratic entropy means Gaussian probabil-
ity near the maximum — this statement is (loosely speaking) the essence of
the central limit theorem. In the particular case of Gaussian y, X is Gaussian
as well. Non-Gaussianity of the y’s leads to a non-quadratic behavior of H

for (large) deviations of X/N from the mean of the order of ∆/S ′′′ (0).
A simple example is provided by tossing a coin and attributing y = 0 to heads and y = 1 to tails. X then gives the overall number of tails and it appears with probability N!/[2^N X!(N − X)!]. In this example, S(z) = ln(1/2 + e^z/2), ⟨X⟩ = N/2, ∆ = 1/4. The Large Deviation form (267) may be easily obtained from the Stirling formula, and the entropy H is equal to (1 − X/N) ln(1 − X/N) + (X/N) ln(X/N) + ln 2 for X/N between 0 and 1, and to +∞ otherwise (X is never bigger than N).
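This entropy can be checked against the exact binomial probability (a numerical illustration added here; gammaln from scipy gives ln Γ):

```python
import numpy as np
from scipy.special import gammaln

N = 1000
for X in (500, 600, 800):
    # exact: ln[ N! / (2^N X! (N-X)!) ]
    exact = gammaln(N+1) - gammaln(X+1) - gammaln(N-X+1) - N*np.log(2)
    x = X/N
    H = (1 - x)*np.log(1 - x) + x*np.log(x) + np.log(2)
    print(X, exact, -N*H)   # agree up to O(ln N) prefactors
```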

Another example is the statistics of the kinetic energy, E = Σ_{i=1}^N p_i²/2, of N classical identical unit-mass particles. We considered that back in Section 2.2 in the micro-canonical approach and the thermodynamic limit N → ∞. Let us now look using the canonical Gibbs distribution, which is Gaussian for momenta: ρ(p_1, ..., p_N) = (2πT)^{−N/2} exp(−Σ_{i=1}^N p_i²/2T). The energy probability for any N is as follows:
$$\rho(E, N) = \int\rho(p_1, \ldots, p_N)\,\delta\Big(E - \sum_{i=1}^{N}p_i^2/2\Big)\,dp_1\ldots dp_N = \left(\frac{E}{T}\right)^{N/2}\frac{\exp(-E/T)}{E\,\Gamma(N/2)} . \qquad (268)$$

Plotting it for different N , one can appreciate how the thermodynamic limit
appears. Taking the logarithm and using the Stirling formula one gets the
large-deviation form for the energy R = E/Ē, normalized by the mean energy
Ē = N T /2:

$$\ln\rho(E, N) = \frac{N}{2}\ln\frac{RN}{2} - \ln\Big(\frac{N}{2}\Big)! - \frac{RN}{2} \approx \frac{N}{2}\,(1 - R + \ln R) . \qquad (269)$$
This expression has a maximum at R = 1, i.e. the most probable value is the mean energy. The probability of R is Gaussian near the maximum when R − 1 ≤ N^{−1/2} and non-Gaussian for larger deviations. Notice that this function is not symmetric with respect to the maximum: it has logarithmic asymptotics at zero and linear asymptotics at infinity.
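The approach of the exact (268) to the large-deviation form (269) can also be seen numerically (an added illustration, not part of the original text):

```python
import numpy as np
from scipy.special import gammaln

N, T = 100, 1.0
for R in (0.5, 1.0, 2.0):
    E = R*N*T/2
    exact = (N/2)*np.log(E/T) - E/T - np.log(E) - gammaln(N/2)  # ln of (268)
    approx = (N/2)*(1 - R + np.log(R))                          # (269)
    print(R, exact, approx)   # differ only by subleading O(ln N) terms
```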

Basic books
L. D. Landau and E. M. Lifshitz, Statistical Physics Part 1.
R. K. Pathria, Statistical Mechanics.
R. Kubo, Statistical Mechanics.
K. Huang, Statistical Mechanics.
C. Kittel, Elementary Statistical Physics.

Additional reading
S.-K. Ma, Statistical Mechanics.
A. Katz, Principles of Statistical Mechanics.
J. Cardy, Scaling and renormalization in statistical physics.
M. Kardar, Statistical Physics of Particles, Statistical Physics of Fields.
J. Sethna, Entropy, Order Parameters and Complexity.
J. Kurchan, Six out of equilibrium lectures, http://xxx.tau.ac.il/abs/0901.1271

Exam 2007
1. A lattice in one dimension has N sites and is at temperature T. At each site there is an atom which can be in either of two energy states: E_i = ±ϵ. When L consecutive atoms are in the +ϵ state, we say that they form a cluster of length L (provided that the atoms adjacent to the ends of the cluster are in the state −ϵ). In the limit N → ∞,
a) Compute the probability P_L that a given site belongs to a cluster of length L (don't forget to check that Σ_{L=0}^∞ P_L = 1);
b) Calculate the mean length of a cluster ⟨L⟩ and determine its low- and high-temperature limits.
2. Consider a box containing an ideal classical gas at pressure P and
temperature T. The walls of the box have N0 absorbing sites, each of which
can absorb at most two molecules of the gas. Let −ϵ be the energy of an
absorbed molecule. Find the mean number of absorbed molecules ⟨N ⟩. The
dimensionless ratio ⟨N ⟩/N0 must be a function of a dimensionless parameter.
Find this parameter and consider the limits when it is small and large.
3. Consider the spin-1 Ising model on a cubic lattice in d dimensions, given by the Hamiltonian
$$H = -J\sum_{\langle i,j\rangle}S_iS_j - \Delta\sum_i S_i^2 - h\sum_i S_i ,$$
where S_i = 0, ±1, ⟨i,j⟩ denotes a sum over nearest-neighbor sites (z neighbors per site) and J, ∆ > 0.

(a) Write down the equation for the magnetization m = ⟨Si ⟩ in the mean-
field approximation.

(b) Calculate the transition line in the (T, ∆) plane (take h = 0) which
separates the paramagnetic and the ferromagnetic phases. Here T is
the temperature.

(c) Calculate the magnetization (for h = 0) in the ferromagnetic phase near the transition line, and show that to leading order m ∼ √(T_c − T), where T_c is the transition temperature.

(d) Show that the zero-field (h = 0) susceptibility χ in the paramagnetic phase is given by
$$\chi = \frac{1}{k_BT}\,\frac{1}{1 + \frac{1}{2}e^{-\beta\Delta} - \frac{Jz}{k_BT}} .$$

4. Compare the decrease in the entropy of a reader’s brain with the


increase in entropy due to illumination. Take, for instance, that it takes t =
100 seconds to read one page with 3000 characters written by the alphabet
that uses 32 different characters (letters and punctuation marks). At the
same time, the illumination is due to a 100 Watt lamp (which emits P =
100J/s). Take T = 300K and use the Boltzmann constant k = 1.38·10−23 J/K.

Solutions
Problem 1.
a) The probabilities for a given site to have energy ±ϵ are
$$P_\pm = e^{\mp\beta\epsilon}\,(e^{\beta\epsilon} + e^{-\beta\epsilon})^{-1} .$$
The probability for a given site to belong to an L-cluster is P_L = LP_+^LP_-^2 for L ≥ 1, since sites are independent and we also need the two sites adjacent to the cluster to have −ϵ. The cluster of zero length corresponds to a site having −ϵ, so that P_L = P_- for L = 0. We ignore the possibility that a given site is within L of the ends of the lattice, which is legitimate at N → ∞. Using 1 − P_+ = P_-,
$$\sum_{L=0}^{\infty}P_L = P_- + P_-^2\sum_{L=1}^{\infty}LP_+^L = P_- + P_-^2P_+\frac{\partial}{\partial P_+}\sum_{L=1}^{\infty}P_+^L = P_- + \frac{P_-^2P_+}{(1 - P_+)^2} = P_- + P_+ = 1 .$$
b)
$$\langle L\rangle = \sum_{L=0}^{\infty}LP_L = P_-^2P_+\frac{\partial}{\partial P_+}\sum_{L=1}^{\infty}LP_+^L = \frac{P_+(1 + P_+)}{P_-} = e^{-2\beta\epsilon}\,\frac{e^{\beta\epsilon} + 2e^{-\beta\epsilon}}{e^{\beta\epsilon} + e^{-\beta\epsilon}} .$$
At T = 0 all sites are in the lower level and ⟨L⟩ = 0. As T → ∞, the probabilities P_+ and P_- become equal and the mean length approaches its maximum
⟨L⟩ = 3/2.
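A direct Monte Carlo check of this answer (added here for illustration): draw independent site states with probability P_+ and average the length of the cluster containing each site.

```python
import numpy as np

rng = np.random.default_rng(3)
be = 0.7                                      # beta*epsilon (assumed value)
Pp = np.exp(-be)/(2*np.cosh(be))              # P_+ found above
N = 200_000
up = rng.random(N) < Pp                       # True where the site is in +eps

lengths = np.zeros(N)                         # cluster length seen by each site
i = 0
while i < N:
    if up[i]:
        j = i
        while j < N and up[j]:
            j += 1
        lengths[i:j] = j - i                  # every site of the cluster sees L
        i = j
    else:
        i += 1
theory = np.exp(-2*be)*(np.exp(be) + 2*np.exp(-be))/(2*np.cosh(be))
print("MC:", lengths.mean(), " theory:", theory)
```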

Problem 2.
Since each absorbing site is in equilibrium with the gas, the site and the gas must have the same chemical potential µ and the same temperature T. The fugacity of the gas, z = exp(βµ), can be expressed via the pressure from the grand canonical partition function:
$$Z_g(T, V, \mu) = \exp[zV(2\pi mT)^{3/2}h^{-3}] , \qquad PV = \Omega = T\ln Z_g = zVT^{5/2}(2\pi m)^{3/2}h^{-3} .$$
The grand canonical partition function of an absorbing site, Z_site = 1 + ze^{βϵ} + z²e^{2βϵ}, gives the average number of absorbed molecules per site:
$$\frac{\langle N\rangle}{N_0} = z\,\frac{\partial\ln Z_{site}}{\partial z} = \frac{x + 2x^2}{1 + x + x^2} ,$$
where the dimensionless parameter is x = ze^{βϵ} = PT^{−5/2}e^{βϵ}h³(2πm)^{−3/2}. The limits are ⟨N⟩/N_0 → 0 as x → 0 and ⟨N⟩/N_0 → 2 as x → ∞.

Problem 3.
a) H_eff(S) = −JmzS − ∆S² − hS, with S = 0, ±1, so
$$m = \frac{e^{\beta\Delta}\left[e^{\beta(Jzm + h)} - e^{-\beta(Jzm + h)}\right]}{1 + e^{\beta\Delta}\left[e^{\beta(Jzm + h)} + e^{-\beta(Jzm + h)}\right]} .$$
b) At h = 0,
$$m \approx e^{\beta\Delta}\,\frac{2\beta Jzm + (\beta Jzm)^3/3}{1 + 2e^{\beta\Delta}[1 + (\beta Jzm)^2/2]} .$$
The transition line is β_cJz = 1 + ½e^{−β_c∆}. At ∆ → ∞ it turns into the Ising result.
c)
$$m^2 = \frac{(\beta - \beta_c)Jz}{(\beta_cJz)^2/2 - (\beta_cJz)^3/6} .$$
d)
$$m \approx e^{\beta\Delta}\,\frac{2\beta Jzm + 2\beta h}{1 + 2e^{\beta\Delta}} , \qquad m \approx 2\beta h\,(2 + e^{-\beta\Delta} - 2\beta Jz)^{-1} , \qquad \chi = \partial m/\partial h .$$

Problem 4
Since there are 2⁵ = 32 different characters, every character brings 5 bits, and the entropy decrease is 5 × 3000 ln 2 (in units of the Boltzmann constant). The energy emitted by the lamp during the reading, Pt, brings the entropy increase Pt/kT, which is Pt/(15000 kT ln 2) = 10⁴/(1.38 × 10⁻²³ × 300 × 15000 × ln 2) ≃ 10²⁰ times larger.
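The same arithmetic in code (illustration only):

```python
import math

k, T, P, t = 1.38e-23, 300.0, 100.0, 100.0
dS_info = 5*3000*math.log(2)     # entropy decrease in units of k (15000 bits)
dS_light = P*t/(k*T)             # entropy increase in units of k
print(dS_light/dS_info)          # ~ 2e20
```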

Exam 2009
1. Consider a classical ideal gas of atoms whose mass is m moving in an
attractive potential of an impenetrable wall: V (x) = Ax2 /2 with A > 0 for
x > 0 and V (x) = ∞ for x < 0. Atoms move freely along y and z. Let T
be the temperature and n be the number of atoms per unit area of the wall.
Consider the thermodynamic limit.
a) Calculate the concentration n(x) — the number of atoms per unit volume as a function of the distance x from the wall. Note that n = ∫₀^∞ n(x) dx.
b) Calculate the energy and specific heat per unit area of the wall.
c) Find the free energy and chemical potential.
d) Find the pressure the atoms exert on the wall (i.e. at x = 0).

2. A cavity containing a gas of electrons has a small hole of area A through


which electrons can escape. External electrodes are so arranged that voltage
is V between inside and outside of the cavity. Assume that
i) a constant number density of electrons is maintained inside (for example,
by thermionic emission);
ii) electrons are in thermal equilibrium with temperature T and chemical
potential µ such that kT ≪ V − µ;
iii) electrons moving towards the hole escape if they have an energy greater
than V.
Estimate the total current carried by escaping electrons.

3. A d-dimensional container is divided into two regions A and B by a


fixed wall. The two regions contain identical Fermi gases of spin 1/2 particles
which have a magnetic moment τ . In region A there is a magnetic field of
strength H, but there is no field in region B. Initially, the entire system is
at zero temperature, and the numbers of particles per unit volume are the
same in both regions. If the wall is now removed, particles may flow from
one region to the other. Determine the direction in which particles begin to
flow, and how the answer depends on the space dimensionality d.

4. One may think of certain networks, such as the internet, as a directed
graph G of N vertices. Every pair of vertices, say i and j, can be connected by
multiple edges (e.g. hyperlinks), and loops may connect vertices to themselves.
The graph can therefore be described in terms of an adjacency matrix Aij
with N 2 elements; each Aij counts the number of edges connecting i and j
and can be any non-negative integer, 0, 1, 2...

The entropy of the ensemble of all possible graphs is S = −Σ_G p_G ln p_G = −Σ_{{A_{ij}}} p(A_{ij}) ln p(A_{ij}). Consider such an ensemble with the fixed average number of edges per vertex ⟨k⟩.
number of edges per vertex ⟨k⟩.
(i) Write an expression for the number of edges per vertex k for a given
graph Aij . Use the maximum entropy principle to calculate pG (Aij ) and the
partition function Z (denote the Lagrange multiplier that accounts for fixed
⟨k⟩ by τ ). What is the equivalent of the Hamiltonian? What are the degrees
of freedom? What kind of “particles” are they? Are they interacting?
(ii) Calculate the free energy F = − ln Z, and express it in terms of τ . Is
it extensive with respect to any number?
(iii) Write down an expression for the mean occupation number ⟨Aij ⟩ as
a function of τ . What is the name of this statistics? What is the ”chemical
potential” and why?
(iv) Express F via N and ⟨k⟩. Express pG for a given graph as a function
of k, ⟨k⟩ and N .

Solutions

Problem 1.
a) n(x) = n(0) exp(−βAx²/2) = 2n(Aβ/2π)^{1/2} exp(−βAx²/2).
b) For x_i > 0,
$$H = \sum_i\left(\frac{p_i^2}{2m} + \frac{Ax_i^2}{2}\right) .$$
Equipartition gives E = 4(T /2)N = 2T N and Cv = 2N .
c)
$$Z_N = \frac{1}{N!\,h^{3N}}\int\prod_i dp_i\,dr_i\;\exp\left[-\beta\sum_i\left(\frac{p_i^2}{2m} + \frac{Ax_i^2}{2}\right)\right] = \frac{L^{2N}}{N!\,h^{3N}}\,(2\pi mT)^{3N/2}\left(\frac{1}{2}\sqrt{\frac{2\pi T}{A}}\right)^N ,$$
$$F = -T\left[-3N\ln h - N\ln N + N + 2N\ln L + N\ln\left(2\pi^2T^2m^{3/2}/A^{1/2}\right)\right] ,$$
$$\mu = \frac{\partial F}{\partial N} = -T\left[-3\ln h - \ln N + 2\ln L + \ln\left(2\pi^2T^2m^{3/2}/A^{1/2}\right)\right] = -T\left[-3\ln h - \ln n + \ln\left(2\pi^2T^2m^{3/2}/A^{1/2}\right)\right] .$$

d) P(x) = n(x)T. Note that the pressure is inhomogeneous, so it is not ∂F/∂l, where l is the length of the system in the x-direction — the free energy F is not extensive in l, and in the limit l → ∞ we have ∂F/∂l = 0.

Problem 2
Our main task is to find the number of particles per unit time escaping
from the cavity. To this end, we first suppose that the cavity is a cube of side
L and that single-particle wavefunctions satisfy periodic boundary conditions
at its walls. Then the allowed momentum eigenvalues are p_i = hn_i/L (i = 1, 2, 3), where the n_i are positive or negative integers. Then (allowing for
two spin polarizations) the number of states with momentum in the range d3 p
is (2V /h3 )d3 p, where V = L3 is the volume. (To an excellent approximation,
this result is independent of the shape of the cavity.) Multiplying by the
grand canonical occupation numbers for these states, we find that the number

of particles per unit volume with momentum in the range d³p near p is n(p)d³p, where
$$n(p) = \frac{2}{h^3}\,\frac{1}{\exp[\beta(\epsilon(p) - \mu)] + 1} \qquad (270)$$
with ϵ(p) = |p|2 /2m. Now we consider, in particular, a small volume enclos-
ing the hole in the cavity wall, and adopt a polar coordinate system with
the usual angles θ and ϕ, such that the axis θ = 0 (the z axis, say) is the
outward normal to the hole (see corresponding figure).
The number of electrons per unit volume whose momentum has a magnitude between p and p + dp and is directed into a solid angle dΩ = sin θ dθ dϕ surrounding the direction (θ, ϕ) is
$$n(p)\,p^2\sin\theta\,dp\,d\theta\,d\phi \qquad (271)$$

and, since these electrons have a speed p/m, the number per unit time cross-
ing a unit area normal to the (θ, ϕ) direction is
$$\frac{1}{m}\,n(p)\,p^3\sin\theta\,dp\,d\theta\,d\phi . \qquad (272)$$
The hole subtends an area δA cos θ normal to this direction, so the number
of electrons per unit time passing through the hole with momentum between
p and p + dp into the solid angle dΩ is
$$\frac{\delta A}{m}\,n(p)\,p^3\sin\theta\cos\theta\,dp\,d\theta\,d\phi . \qquad (273)$$
It is useful to check this for the case of a classical gas, with

$$n(p) = n\,(2\pi mk_BT)^{-3/2}\,e^{-p^2/2mk_BT} ,$$

where n is the total number density, and with V = 0. Bearing in mind that
only those particles escape for which 0 ≤ θ < π/2, we then find that the
total number of particles escaping per unit time is
$$\frac{\delta A\,n}{m\,(2\pi mk_BT)^{3/2}}\int_0^{\infty}dp\int_0^{\pi/2}d\theta\int_0^{2\pi}d\phi\;p^3\exp\left[-\frac{p^2}{2mk_BT}\right]\sin\theta\cos\theta = \frac{1}{4}\,n\bar c\,\delta A , \qquad (274)$$
where c̄ = (8k_BT/πm)^{1/2} is the mean speed. This standard result can be
obtained by several methods in elementary kinetic theory. For the case in

hand, the number of electrons escaping per unit time is
$$\frac{dN}{dt} = \frac{\delta A}{m}\int_{\sqrt{2mV}}^{\infty}dp\int_0^{\pi/2}d\theta\int_0^{2\pi}d\phi\;p^3\,n(p)\sin\theta\cos\theta \qquad (275)$$
$$= \frac{2\pi\,\delta A}{mh^3}\int_{\sqrt{2mV}}^{\infty}\frac{p^3}{e^{\beta(\epsilon(p) - \mu)} + 1}\,dp . \qquad (276)$$
If V − µ ≫ kB T , then β(ϵ(p) − µ) is large for all values of p in the range of


integration, and we can use the approximation
$$\frac{dN}{dt} \simeq \frac{2\pi\,\delta A}{mh^3}\int_{\sqrt{2mV}}^{\infty}p^3\,e^{-\beta(\epsilon(p) - \mu)}\,dp \simeq \frac{\pi\,\delta A}{mh^3}\,(2mk_BT)^2\,e^{-\beta(V - \mu)}\left(1 + \frac{V}{k_BT}\right) . \qquad (277)$$
Since µ is positive for a Fermi gas at low temperatures, we also have V ≫ k_BT, so the 1 in 1 + V/k_BT can be neglected. Finally, therefore, on multiplying by the charge −e of each electron, we estimate the current as
$$I = -\left(\frac{4\pi me}{h^3}\right)\delta A\,V k_BT\,\exp[-\beta(V - \mu)] . \qquad (278)$$
If the temperature is low enough that k_BT ≪ ϵ_F, then µ can be replaced by the Fermi energy ϵ_F = (h²/8m)(3n/π)^{2/3}, where n is the total number of particles per unit volume. Of course, the current that we have calculated is the charge per unit time emerging from the hole. The number of electrons per unit solid angle at an angle θ to the normal (z direction) is proportional to cos θ, so, although no electrons emerge tangentially to the wall (θ = π/2), not all of them travel in the z direction.

Problem 3
In general, particles flow from a region of higher chemical potential to a region of lower chemical potential. We therefore need to find out in which region the chemical potential is higher, and we do this by considering the grand canonical expression for the number of particles per unit volume. In the presence of a magnetic field, the single-particle energy is ϵ ± τH, where ϵ is the kinetic energy, depending on whether the magnetic moment is antiparallel or parallel to the field. The total number of particles is then given by
$$N = \int_0^{\infty}\frac{g(\epsilon)\,d\epsilon}{\exp[\beta(\epsilon - \mu - \tau H)] + 1} + \int_0^{\infty}\frac{g(\epsilon)\,d\epsilon}{\exp[\beta(\epsilon - \mu + \tau H)] + 1} . \qquad (279)$$

For non-relativistic particles in a d−dimensional volume V , the density of
states is g(ϵ) = γV ϵd/2−1 , where γ is a constant. At T = 0, the Fermi
distribution function is
$$\lim_{\beta\to\infty}\left(\frac{1}{e^{\beta(\epsilon - \mu \pm \tau H)} + 1}\right) = \theta(\mu \mp \tau H - \epsilon) , \qquad (280)$$
where θ(·) is the step function, so the integrals are easily evaluated with the result
$$\frac{N}{V} = \frac{2\gamma}{d}\left[(\mu + \tau H)^{d/2} + (\mu - \tau H)^{d/2}\right] . \qquad (281)$$
At the moment that the wall is removed, N/V is the same in regions A and B; so (with H = 0 in region B) we have
$$(\mu_A + \tau H)^{d/2} + (\mu_A - \tau H)^{d/2} = 2\mu_B^{d/2} . \qquad (282)$$

For small fields, we can make use of the Taylor expansion
$$(1 \pm x)^{d/2} = 1 \pm \frac{d}{2}\,x + \frac{d}{4}\left(\frac{d}{2} - 1\right)x^2 + \ldots \qquad (283)$$
to obtain
$$\left(\frac{\mu_B}{\mu_A}\right)^{d/2} = 1 + \frac{d(d - 2)}{8}\left(\frac{\tau H}{\mu_A}\right)^2 + \ldots \qquad (284)$$
We see that, for d = 2, the chemical potentials are equal, so there is no flow of particles. For d > 2, we have µ_B > µ_A, so particles flow towards the magnetic field in region A while, for d < 2, the opposite is true. We can prove that the same result holds for any magnetic field strength as follows. For compactness, we write λ = τH. Since our basic equation (µ_A + λ)^{d/2} + (µ_A − λ)^{d/2} = 2µ_B^{d/2} is unchanged if we change λ to −λ, we can take λ > 0 without loss of generality. Bearing in mind that µ_B is fixed, we calculate dµ_A/dλ as
$$\frac{d\mu_A}{d\lambda} = \frac{(\mu_A - \lambda)^{d/2 - 1} - (\mu_A + \lambda)^{d/2 - 1}}{(\mu_A - \lambda)^{d/2 - 1} + (\mu_A + \lambda)^{d/2 - 1}} . \qquad (285)$$

Since µA + λ > µA − λ, we have (µA + λ)d/2−1 > (µA − λ)d/2−1 if d > 2 and
vice versa. Therefore, if d > 2, then dµA /dλ is negative and, as the field
is increased, µA decreased from its zero-field value µB and is always smaller
than µB . Conversely, if d < 2, then µA is always greater than µB . For d = 2,
we have µA = µB independent of the field.

Problem 4
(i) k = N^{−1}Σ_{i,j}A_{ij}. The thermodynamic potential (Lagrangian in mechanical terms) to be extremized is therefore
$$L = -\sum_{\{A_{ij}\}}p(A_{ij})\ln p(A_{ij}) + \lambda_0\sum_{\{A_{ij}\}}p(A_{ij}) - \tau N^{-1}\sum_{\{A_{ij}\}}p(A_{ij})\sum_{i,j}A_{ij} . \qquad (286)$$
Demanding ∂L/∂p(A_{ij}) = 0, one finds the distribution and the partition function
$$p_G(A_{ij}) = \frac{1}{Z}\exp\left[-\frac{\tau}{N}\sum_{i,j}A_{ij}\right] , \qquad Z = \sum_{\{A_{ij}\}}\exp\left[-\frac{\tau}{N}\sum_{i,j}A_{ij}\right] . \qquad (287)$$

The equivalent of the Hamiltonian is k. The degrees of freedom are the N² entries of A_{ij}. Each d.o.f. is a "boson" since it takes all non-negative integers as possible values. The bosons are non-interacting in this "Hamiltonian".
(ii)
$$F = -\ln Z = -\ln\sum_{\{A_{ij}\}}\exp\left[-\frac{\tau}{N}\sum_{i,j}A_{ij}\right] = -\ln\sum_{\{A_{ij}\}}\prod_{i,j}\exp\left(-\frac{\tau A_{ij}}{N}\right) = -\ln\prod_{i,j}\sum_{A_{ij}=0}^{\infty}\exp\left(-\frac{\tau A_{ij}}{N}\right) = N^2\ln\left(1 - e^{-\tau/N}\right) .$$

F is extensive in N².
(iii) ⟨A_{ij}⟩ = (e^{τ/N} − 1)^{−1}, that is, Bose-Einstein statistics with zero chemical potential, since additional edges are free to form.
(iv) ⟨k⟩ = N^{−1}Σ_{i,j}⟨A_{ij}⟩ = N(e^{τ/N} − 1)^{−1} ⇒ τ/N = ln(N/⟨k⟩ + 1). It follows that the free energy is F = N² ln(1 − e^{−τ/N}) = −N² ln(1 + ⟨k⟩/N). Similarly, p_G(A_{ij}) = (N/⟨k⟩ + 1)^{−Nk}(1 + ⟨k⟩/N)^{−N²}.
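Since the A_{ij} are independent geometric ("Bose") variables, the ensemble is trivial to sample and the result of (iii)-(iv) can be checked directly (a sketch added for illustration; note that numpy's geometric distribution starts at 1, hence the shift):

```python
import numpy as np

rng = np.random.default_rng(4)
N, k_target = 50, 3.0
tau_over_N = np.log(N/k_target + 1)
q = np.exp(-tau_over_N)                    # P(A_ij = n) = (1 - q) q^n
A = rng.geometric(1 - q, size=(N, N)) - 1  # shift support from {1,...} to {0,...}
print("sampled k =", A.sum()/N, " target <k> =", k_target)
```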
