
Ph 12c 2024

Physics 12c: Introduction to Statistical Mechanics


David Simmons-Duffin

Course website
https://courses.caltech.edu/course/view.php?id=3823

About these notes


These notes are not original. Their primary purpose is to remind me what to say during lectures. They draw heavily from several sources, including Kittel and Kroemer, Statistical Physics lecture notes by David Tong [1] and Matt Schwartz [2], lecture notes from John Preskill's 2016 Physics 12c course, and possibly other sources. Last updated: May 23, 2024.

1. http://www.damtp.cam.ac.uk/user/tong/statphys.html
2. http://users.physics.harvard.edu/~schwartz/teaching


Contents

1 Counting 7
1.1 Macrostates and microstates . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Large numbers and sharp distributions . . . . . . . . . . . . . . . . . 8
1.3 Probability and expectation values . . . . . . . . . . . . . . . . . . . 11
1.4 Spin system again . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Connection to random walks . . . . . . . . . . . . . . . . . . . . . . 14

2 Entropy and Temperature 16


2.1 Fundamental assumption of statistical mechanics . . . . . . . . . . . 16
2.1.1 Intuition for the fundamental assumption . . . . . . . . . . . 16
2.1.2 Mathematical statement of fundamental assumption . . . . . 18
2.2 Thermal contact and most probable configuration . . . . . . . . . . . 18
2.2.1 Example: spin system . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Back to the general case . . . . . . . . . . . . . . . . . . . . . 20
2.3 Entropy and temperature . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Comments on entropy . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Entropy increases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 The arrow of time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 Heat capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.8 The laws of thermodynamics . . . . . . . . . . . . . . . . . . . . . . 25

3 The Boltzmann distribution 26


3.1 Microcanonical ensemble . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Canonical ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Why Taylor expand the entropy? . . . . . . . . . . . . . . . . . . . . 28
3.4 The partition function . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Example: spin system . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6 Thermal density matrix . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6.1 Density matrices . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6.2 Thermal density matrix . . . . . . . . . . . . . . . . . . . . . 33

4 Entropy in general ensembles 34


4.1 Gibbs/Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Sanity checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Entropy of the canonical ensemble . . . . . . . . . . . . . . . . . . . 36
4.4 Comments on Shannon Entropy . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 Optimal compression and entropy . . . . . . . . . . . . . . . 39


5 The thermodynamic limit and free energy 40


5.1 The thermodynamic limit . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Free energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.4 Thermodynamic identities . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4.1 Constrained derivatives . . . . . . . . . . . . . . . . . . . . . 43
5.4.2 Pressure, entropy, and free energy . . . . . . . . . . . . . . . 44
5.5 Ideal gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5.1 One atom in a box . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5.2 N particles and the Gibbs paradox . . . . . . . . . . . . . . . 47

6 Blackbody radiation and the Planck distribution 50


6.1 The Planck distribution . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.1.1 Energy in a mode . . . . . . . . . . . . . . . . . . . . . . . . 50
6.1.2 What are photons? . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1.3 Modes in a cavity . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1.4 Energy of a photon gas and the Planck distribution . . . . . 53
6.1.5 Energy flux . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Kirchhoff’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.3 Planck distribution, temperature, and CMB . . . . . . . . . . . . . . 56
6.4 Partition function, entropy, and pressure of photon gas . . . . . . . . 57

7 Debye theory 59

8 Classical Statistical Mechanics 62


8.1 Equipartition of energy . . . . . . . . . . . . . . . . . . . . . . . . . 65

9 Chemical potential and the Gibbs Distribution 68


9.1 Entropy and conserved quantities . . . . . . . . . . . . . . . . . . . . 68
9.2 Chemical potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.3 Relation to free energy . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4 Gibbs factor and Gibbs sum . . . . . . . . . . . . . . . . . . . . . . . 72

10 Ideal Gas 74
10.1 Spin and statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
10.1.1 Bosons vs. Fermions . . . . . . . . . . . . . . . . . . . . . . . 74
10.1.2 Spin review . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.1.3 The spin-statistics theorem . . . . . . . . . . . . . . . . . . . 76
10.2 Statistics of Fermions and Bosons . . . . . . . . . . . . . . . . . . . . 77
10.2.1 Fermi-Dirac statistics . . . . . . . . . . . . . . . . . . . . . . 77
10.2.2 Bose-Einstein statistics . . . . . . . . . . . . . . . . . . . . . 77
10.3 The classical limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


11 Fermi Gases 80
11.1 Gibbs sum and energy . . . . . . . . . . . . . . . . . . . . . . . . . . 80
11.2 Zero temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
11.3 Heat capacity at low temperature . . . . . . . . . . . . . . . . . . . . 83

12 Bose Gases and Bose-Einstein Condensation 91


12.1 Chemical potential near τ = 0 . . . . . . . . . . . . . . . . . . . . . . 91
12.2 Orbital occupancy versus temperature . . . . . . . . . . . . . . . . . 92
12.3 Liquid ⁴He and superfluidity . . . . . . . . . . . . . . . . . . . . 96
12.4 Dilute gas Bose-Einstein condensates . . . . . . . . . . . . . . . . . . 98
12.4.1 Magnetic traps . . . . . . . . . . . . . . . . . . . . . . . . . . 98
12.4.2 Laser cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
12.4.3 Evaporative cooling . . . . . . . . . . . . . . . . . . . . . . . 100
12.4.4 BEC in a harmonic trap . . . . . . . . . . . . . . . . . . . . . 101

13 Heat and Work 103


13.1 Reversibility, heat, and work . . . . . . . . . . . . . . . . . . . . . . 103
13.2 Heat engines and Carnot efficiency . . . . . . . . . . . . . . . . . . . 104
13.3 Refrigerators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
13.4 Different types of expansion . . . . . . . . . . . . . . . . . . . . . . . 106
13.5 The Carnot cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
13.5.1 Path dependence of heat and work . . . . . . . . . . . . . . . 109
13.5.2 Universality of reversible engines . . . . . . . . . . . . . . . . 110

14 Gibbs free energy 111


14.1 Heat and work at constant temperature . . . . . . . . . . . . . . . . 111
14.2 Heat and work at constant pressure . . . . . . . . . . . . . . . . . . . 111
14.3 Thermodynamic identities for Gibbs free energy . . . . . . . . . . . . 112
14.4 Gibbs free energy and chemical potential . . . . . . . . . . . . . . . . 113
14.5 Chemical processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
14.6 Equilibrium for ideal gases . . . . . . . . . . . . . . . . . . . . . . . . 115
14.6.1 Heat capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

15 Phase transitions 119


15.0.1 Nonanalyticities . . . . . . . . . . . . . . . . . . . . . . . . . 119
15.0.2 Phase diagram of water . . . . . . . . . . . . . . . . . . . . . 120
15.1 Coexistence curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
15.1.1 Approximate solution to the vapor pressure equation . . . . . 122
15.1.2 Physics on the coexistence curve . . . . . . . . . . . . . . . . 122
15.2 The Van der Waals equation of state . . . . . . . . . . . . . . . . . . 123
15.2.1 Isotherms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
15.2.2 The Maxwell construction . . . . . . . . . . . . . . . . . . . . 125
15.2.3 Metastable states . . . . . . . . . . . . . . . . . . . . . . . . . 126


15.3 The critical point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


15.4 The Ising Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
15.4.1 The Ising model as a Lattice Gas . . . . . . . . . . . . . . . . 130
15.5 Mean Field Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
15.5.1 Critical exponents in Mean Field Theory . . . . . . . . . . . 132
15.6 Critical phenomena and universality . . . . . . . . . . . . . . . . . . 133
15.7 Solving the 1d Ising model . . . . . . . . . . . . . . . . . . . . . . . . 135
15.8 The 1d Ising model and the transfer matrix . . . . . . . . . . . . . . 135
15.9 Quantization in quantum mechanics . . . . . . . . . . . . . . . . . . 138
15.10 The 2d Ising model . . . . . . . . . . . . . . . . . . . . . . . . . . 139

16 Maxwell’s demon 143

A Review: trace 145


List of definitions
Definition (microstate/state) . . . . . . . . . . . . . . . . . . . . . . 7
Definition (macrostate) . . . . . . . . . . . . . . . . . . . . . . . . . 8
Definition (expectation value) . . . . . . . . . . . . . . . . . . . . . . 11
Definition (entropy) . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Definition (temperature) . . . . . . . . . . . . . . . . . . . . . . . . . 20
Definition (heat capacity) . . . . . . . . . . . . . . . . . . . . . . . . 23
Definition (ensemble) . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Definition (microcanonical ensemble) . . . . . . . . . . . . . . . . . . 26
Definition (reservoir) . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Definition (Boltzmann factor) . . . . . . . . . . . . . . . . . . . . . . 28
Definition (partition function) . . . . . . . . . . . . . . . . . . . . . . 28
Definition (canonical ensemble) . . . . . . . . . . . . . . . . . . . . . 28
Definition (thermal density matrix) . . . . . . . . . . . . . . . . . . . 33
Definition (Gibbs/Shannon entropy) . . . . . . . . . . . . . . . . . . 35
Definition (von-Neumann entropy) . . . . . . . . . . . . . . . . . . . 35
Definition (thermodynamic limit) . . . . . . . . . . . . . . . . . . . . 40
Definition (free energy) . . . . . . . . . . . . . . . . . . . . . . . . . 41
Definition (constrained derivative) . . . . . . . . . . . . . . . . . . . 43
Definition (orbital) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Definition (black) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Definition (absorptivity) . . . . . . . . . . . . . . . . . . . . . . . . . 56
Definition (emissivity) . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Definition (isentropic) . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Definition (diffusive contact) . . . . . . . . . . . . . . . . . . . . . . 68
Definition (chemical potential) . . . . . . . . . . . . . . . . . . . . . 69
Definition (general reservoir) . . . . . . . . . . . . . . . . . . . . . . 72
Definition (grand canonical ensemble) . . . . . . . . . . . . . . . . . 72
Definition (grand canonical partition function/Gibbs sum) . . . . . . 73
Definition (Fermi energy) . . . . . . . . . . . . . . . . . . . . . . . . 82
Definition (reversible process) . . . . . . . . . . . . . . . . . . . . . . 103
Definition (heat) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Definition (work) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Definition (heat engine) . . . . . . . . . . . . . . . . . . . . . . . . . 104
Definition (mechanical contact) . . . . . . . . . . . . . . . . . . . . . 111
Definition (Landauer’s principle) . . . . . . . . . . . . . . . . . . . . 144


1 Counting
1.1 Macrostates and microstates
Classical mechanics and quantum mechanics give precise predictions for almost any
physical observable, within their respective realms of validity. For example, to
make a prediction using quantum mechanics, one must determine the initial state
|ψ(t = 0)⟩ of a system and the Hamiltonian H that generates time evolution. Then
one simply evolves the state forward in time using the Schrödinger equation

i\hbar \frac{\partial}{\partial t} |\psi(t)\rangle = H |\psi(t)\rangle. \qquad (1.1)
This is already an interesting problem for an individual hydrogen atom. In the approximation that the proton is stationary (and ignoring the spin of the proton), we can write the state as a linear combination of position and spin eigenstates for the electron:

|\psi_{\text{single-electron}}(t)\rangle = \sum_{s=\uparrow,\downarrow} \int d^d x\, \psi_s(x, t)\, |x, s\rangle, \qquad (1.2)

where $s = \uparrow, \downarrow$ is the spin of the electron, and d = 3 is the number of spatial dimensions. The Schrödinger equation becomes a partial differential equation (PDE)
for two functions ψ↑,↓ (x, t) of d + 1 = 4 variables (that is, the number of spatial
dimensions plus the time).
Now let us use this framework to describe the weather. The number of molecules in the atmosphere is approximately $10^{44}$. So all we need to do is solve the Schrödinger equation, which is a PDE for a function $\psi_{s_1,\ldots,s_{10^{44}}}(x_1, \ldots, x_{10^{44}}, t)$ of $3 \times 10^{44} + 1 \approx 3 \times 10^{44}$ variables.
Even to achieve a more modest goal of exactly modeling a handful of a typical substance, we would need to solve the Schrödinger equation for a wavefunction with $O(N_A) = O(10^{23})$ variables, where $N_A$ is Avogadro's number.
Of course, these are impossible computational tasks. But that doesn’t stop
us from doing physics. The reason is that the complete wavefunction of a system,
including the positions and spins of all the constituent particles, contains much more
information than we usually care about. We can still make predictions without all
that detailed information.
Let us introduce some terminology.
Definition (microstate/state). A microstate is a complete specification of the
physical state of a system. We often use the term state to mean a microstate.
In the context of quantum mechanics, a microstate is a complete specification of
the wavefunction of the system. In the context of classical mechanics, a microstate
is a complete list of the positions and momenta (and other degrees of freedom) of
all the constituent particles {x1 , . . . , xN , p1 , . . . , pN }.


Definition (macrostate). A macrostate is a partial specification of the state of


a system, typically involving a small number of quantities. A single macrostate
corresponds to a large number of microstates.
For example, we might specify a macrostate of a gas by giving its total energy U
and its volume V . A more complicated example of a macrostate is a description of
the atmosphere using fluid dynamics in terms of a density field ρ(x, t) and velocity
field v(x, t) satisfying the Navier-Stokes equations. Although this is much more information than the two numbers {U, V}, it is vastly less information than the full quantum wavefunction $\psi(x_1, \ldots, x_{10^{44}})$, so it is still correct to call {ρ(x, t), v(x, t)}
a macrostate of the atmosphere.
For practical purposes, we usually only measure macroscopic quantities, i.e., we
measure the macrostate of a system. The idea of statistical mechanics is to avoid
studying the precise dynamics of individual microstates by instead describing the
statistics of microstates using probability theory.
Statistics and probability theory create a bridge between microscopic laws and
macroscopic phenomena. We will see that they explain many different physical
processes and have lots of applications.

1.2 Large numbers and sharp distributions


Using statistics is a good idea because of the incredibly enormous numbers involved.
The fact that these numbers are so huge means that statistical predictions can be
ridiculously good.
As an example, let us study a toy model. Consider a system with N sites on
a line, where each site can be in one of two states ↑, ↓. For concreteness, imagine
that each site contains a small magnet which can point either up or down, with
magnetization mi = msi with si = ±1, (i = 1, . . . , N ). We refer to si as the “spin”
at site i. A state is specified by giving the spins at each site. An example state for
the case N = 60 is [3]

↓↓↓↓↓↑↑↓↑↓↓↑↑↓↑↑↑↑↓↓↓↑↑↑↓↑↑↓↑↑↓↑↑↑↓↓↑↓↓↑↓↑↑↑↓↓↓↑↓↑↑↑↓↓↑↑↓↓↓↓ (1.3)

The total number of states is $2^N$.


Let us imagine coupling these spins to some random environment and letting
them evolve for a while. For simplicity, we assume the spins do not interact strongly
with each other and that they have no particular preference for pointing up or
down. After a while, let us measure the total magnetization, given by the sum of
the magnetic moments at each individual site:
M = m \sum_{i=1}^{N} s_i. \qquad (1.4)

3. Mathematica code: StringJoin[Table[RandomChoice[{"\uparrow","\downarrow"}],60]]


What will we observe?


For convenience, let us get rid of the factors of m and discuss a slightly more
convenient quantity, the “spin excess”
s = \frac{M}{2m} = \frac{1}{2}\sum_{i=1}^{N} s_i = \frac{1}{2}(N_\uparrow - N_\downarrow), \qquad (1.5)

where N↑ and N↓ are the number of spins pointing up and down, respectively. Using
the fact that N = N↑ + N↓ , we find

N_\uparrow = \frac{1}{2}N + s, \qquad N_\downarrow = \frac{1}{2}N - s. \qquad (1.6)
Note that $s$ can take $N + 1$ different values, $s \in \{-\frac{N}{2}, -\frac{N}{2}+1, \ldots, \frac{N}{2}-1, \frac{N}{2}\}$.
If $N$ is large, the number of possible values of $s$ (macrostates) is much smaller than the total number of microstates of the system, which is $2^N$. This means that some macrostates correspond to a huge number of microstates. We see that the most extreme macrostates $s = \pm\frac{N}{2}$ have one microstate each. The next most extreme $s = \pm(\frac{N}{2}-1)$ have $N$ microstates each. You may be able to guess that most of the microstates are bunched near middle values of $s$. Let us count them more precisely.
The number of states with spin excess s is given by the number of ways to choose
$N_\uparrow$ objects from a set of size $N$:

g(N, s) = \binom{N}{N_\uparrow} = \binom{N}{N_\downarrow} = \frac{N!}{N_\uparrow!\, N_\downarrow!} = \frac{N!}{(\frac{1}{2}N + s)!\,(\frac{1}{2}N - s)!}. \qquad (1.7)

Another way to derive this result is using generating functions. First let us fix a given state. For each site, associate a symbolic factor $x$ to ↑ and a factor $x^{-1}$ to ↓. After multiplying the factors associated to all the sites, the result is $x^{N_\uparrow - N_\downarrow} = x^{2s}$. Now, to count all possible states, we can associate to each site an expression $x + x^{-1}$. We claim that the coefficient of $x^{2s}$ in

Z(x) = \underbrace{(x + x^{-1})(x + x^{-1}) \cdots (x + x^{-1})}_{N\ \text{factors}} \qquad (1.8)

is the number of states with spin excess $s$. To see this, note that each configuration (1.3) is in one-to-one correspondence with a term in the expansion of $Z(x)$:

Z(x) = x x \cdots x + x x^{-1} x \cdots x + x x x^{-1} \cdots x + (2^N - 3\ \text{more terms}). \qquad (1.9)


Each configuration with spin excess $s$ contributes $x^{2s}$, so the coefficient of $x^{2s}$ is the total number of states with spin excess $s$. We can compute (1.8) with the binomial theorem:

Z(x) = (x + x^{-1})^N = \sum_s \frac{N!}{(\frac{1}{2}N + s)!\,(\frac{1}{2}N - s)!}\, x^{2s} = \sum_s g(N, s)\, x^{2s}. \qquad (1.10)

Matching coefficients of $x^{2s}$, we recover (1.7).
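As a quick sanity check of (1.10) against (1.7), here is a short numerical sketch (in Python with sympy; the value N = 10 is an arbitrary small example, not a number used elsewhere in these notes):

# Sketch: the coefficient of x^(2s) in (x + 1/x)^N should equal binomial(N, N/2 + s).
# N = 10 is an arbitrary small example; sympy does the symbolic expansion.
import sympy as sp
from math import comb

x = sp.symbols('x')
N = 10
# Multiply by x^N so that all exponents are non-negative integers before expanding.
poly = sp.expand(x**N * (x + 1/x)**N)
for s in range(-N // 2, N // 2 + 1):
    coeff = poly.coeff(x, N + 2 * s)        # coefficient of x^(2s) in (x + 1/x)^N
    assert coeff == comb(N, N // 2 + s)     # matches the counting formula (1.7)
print("all coefficients match g(N, s)")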


Let us study the behavior of $g(N, s)$ in the limit of large $N$. To evaluate it, we can use Stirling's approximation [4]:

N! \approx (2\pi N)^{1/2}\, N^N\, e^{-N + \frac{1}{12N} + \ldots} \approx e^{(N + \frac{1}{2})\log N - N + \log\sqrt{2\pi}}, \qquad (1.11)

You will derive Stirling's approximation on the first problem set. Plugging this into $g(N, s)$, we have

\log g(N, s) \approx (N + \tfrac{1}{2})\log N - N + \log\sqrt{2\pi} - (N_\uparrow + \tfrac{1}{2})\log N_\uparrow + N_\uparrow - \log\sqrt{2\pi} - (N_\downarrow + \tfrac{1}{2})\log N_\downarrow + N_\downarrow - \log\sqrt{2\pi}. \qquad (1.12)

Now adding and subtracting $\tfrac{1}{2}\log N$ and using that $N = N_\uparrow + N_\downarrow$,

\log g(N, s) \approx (N_\downarrow + \tfrac{1}{2} + N_\uparrow + \tfrac{1}{2})\log N - (N_\uparrow + \tfrac{1}{2})\log N_\uparrow - (N_\downarrow + \tfrac{1}{2})\log N_\downarrow - \tfrac{1}{2}\log N - \log\sqrt{2\pi}
  = -(N_\uparrow + \tfrac{1}{2})\log\frac{N_\uparrow}{N} - (N_\downarrow + \tfrac{1}{2})\log\frac{N_\downarrow}{N} - \tfrac{1}{2}\log N - \log\sqrt{2\pi}. \qquad (1.13)
We have already anticipated that the largest number of microstates will be concen-
trated at small s. Thus, let us make the approximation that s/N is small. We can
then Taylor expand the logarithms,

\log\frac{N_{\uparrow,\downarrow}}{N} = \log\left(\frac{1}{2}\left(1 \pm \frac{2s}{N}\right)\right) = -\log 2 \pm \frac{2s}{N} - \frac{2s^2}{N^2} + \ldots \qquad (1.14)
4. In this course, log x will always denote the correct definition of the logarithm, which is log base e.


Keeping only the terms up to quadratic order in $s^2/N^2$ (we justify this in a moment), our expression becomes

-\left(\frac{N}{2} + s + \frac{1}{2}\right)\left(-\log 2 + \frac{2s}{N} - \frac{2s^2}{N^2}\right) - \left(\frac{N}{2} - s + \frac{1}{2}\right)\left(-\log 2 - \frac{2s}{N} - \frac{2s^2}{N^2}\right) - \frac{1}{2}\log N - \log\sqrt{2\pi}. \qquad (1.15)

Because of symmetry under $s \leftrightarrow -s$, the odd-order terms in $s$ will drop out. We only need to keep the constant and quadratic terms, which are

\log g(N, s) \approx -\frac{2s^2}{N}\left(1 - \frac{1}{N}\right) + N\log 2 - \frac{1}{2}\log\frac{N\pi}{2} \approx -\frac{2s^2}{N} + N\log 2 - \frac{1}{2}\log\frac{N\pi}{2}. \qquad (1.16)

In the last step, we discarded a term of the form $\frac{2s^2}{N^2}$. To justify this, we should make sure that this term is subleading compared to all the others — in other words, it should be smaller than constants like $-\frac{1}{2}\log\frac{\pi}{2}$ in the large $N$ limit. We confirm this in a moment. The result can be written as

g(N, s) \approx g(N, 0)\, e^{-2s^2/N}, \qquad (1.17)

where

g(N, 0) \approx 2^N \left(\frac{2}{\pi N}\right)^{1/2}. \qquad (1.18)
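To see how accurate (1.16)–(1.18) are at modest N, here is a small Python sketch (N = 100 and the sample values of s are arbitrary choices) comparing the exact log g(N, s) from (1.7), computed with log-gamma functions to avoid overflow, against the Gaussian approximation:

# Sketch: exact log g(N, s) from (1.7) vs. the Gaussian approximation (1.16).
# N = 100 and the values of s below are arbitrary choices for illustration.
import math

N = 100

def log_g_exact(N, s):
    # log[N! / ((N/2 + s)! (N/2 - s)!)] via log-gamma, valid for even N
    return (math.lgamma(N + 1) - math.lgamma(N // 2 + s + 1)
            - math.lgamma(N // 2 - s + 1))

def log_g_gauss(N, s):
    # N log 2 - (1/2) log(pi N / 2) - 2 s^2 / N, i.e. equations (1.16)-(1.18)
    return N * math.log(2) - 0.5 * math.log(math.pi * N / 2) - 2 * s**2 / N

for s in [0, 5, 10, 20]:
    print(s, round(log_g_exact(N, s), 3), round(log_g_gauss(N, s), 3))
# Near the peak (|s| of order s* = sqrt(N)/2 = 5) the two agree to better than 0.1%;
# the approximation only degrades for |s| much larger than sqrt(N), where g is tiny anyway.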

1.3 Probability and expectation values


Definition (expectation value). Given a probability distribution P (s), the ex-
pectation value (or average value) of a function f (s) is defined as
\langle f \rangle = \sum_s f(s)\, P(s). \qquad (1.19)

Here, the probability function $P(s)$ must be normalized:

\sum_s P(s) = 1, \quad \text{or equivalently} \quad \langle 1 \rangle = 1. \qquad (1.20)

Note that expectation values are linear ⟨af + b⟩ = a⟨f ⟩ + b, where a and b are
constants independent of s.


1.4 Spin system again


For our spin system, we made the physical assumption that the spins were allowed
to freely interact with a random environment for a while. Mathematically, we will
assume that when we observe the spin system, every spin configuration (microstate)
is equally likely. Since a spin excess s corresponds to g(N, s) different microstates,
the probability of observing a given spin excess is
P(N, s) = \frac{g(N, s)}{2^N} \approx \left(\frac{2}{\pi N}\right)^{1/2} e^{-2s^2/N}. \qquad (1.21)

We divide by the total number of states so that our probability distribution is normalized:

\sum_{s=-N/2}^{N/2} P(N, s) = 1. \qquad (1.22)

For convenience, we will abbreviate $\sum_{s=-N/2}^{N/2} \to \sum_s$.
We can now compute expectation values of various measurements. Firstly, we have

\langle s \rangle = \sum_s s\, P(N, s) = 0. \qquad (1.23)

We don't have to do any computation to get this result — it follows from the fact that $s\, P(N, s)$ is odd under $s \leftrightarrow -s$. A more interesting question is

\langle s^2 \rangle = \sum_s s^2 P(N, s). \qquad (1.24)

To compute this, we can approximate it as an integral:

\langle s^2 \rangle = \left(\frac{2}{\pi N}\right)^{1/2} \int_{-\infty}^{\infty} ds\, s^2 e^{-2s^2/N}. \qquad (1.25)

Here, we used another trick. Even though the maximal values of $s$ are $\pm\frac{N}{2}$, we extended the integral from $-\infty$ to $\infty$. This is OK because the integrand is extremely tiny at extreme values of $s$, so we don't make a very big error by changing the integral there. We can now change variables $s \to \sqrt{N/2}\, x$:

\langle s^2 \rangle = \left(\frac{2}{\pi N}\right)^{1/2} \left(\frac{N}{2}\right)^{3/2} \int_{-\infty}^{\infty} dx\, x^2 e^{-x^2}. \qquad (1.26)


To do the integral over $x$, note that

I(a) = \int_{-\infty}^{\infty} dx\, e^{-a x^2} = \sqrt{\frac{\pi}{a}}. \qquad (1.27)

The integral we want is

-\left.\frac{dI(a)}{da}\right|_{a=1} = \left.\frac{1}{2}\frac{\sqrt{\pi}}{a^{3/2}}\right|_{a=1} = \frac{\sqrt{\pi}}{2}. \qquad (1.28)

In the end, we find

\langle s^2 \rangle = \frac{N}{4}. \qquad (1.29)

The root mean square spin excess is

s_* = \langle s^2 \rangle^{1/2} = \frac{\sqrt{N}}{2}. \qquad (1.30)

This is the width of the gaussian distribution (1.21). In terms of $s_*$, we can write the probability distribution (1.21) as

P(N, s) = \frac{1}{\sqrt{2\pi}\, s_*}\, e^{-\frac{1}{2}\left(\frac{s}{s_*}\right)^2}. \qquad (1.31)
The width $s_*$ sets the typical size of $s$ in the probability distribution $P(N, s)$. Having computed it, we can revisit some of the approximations we made in computing $g(N, s)$. One approximation was that $s/N \ll 1$, so that we could expand the logarithm in (1.14). This is clearly true if $s \approx \sqrt{N}$ and $N$ is large. Thus, (1.14) is a good approximation in the region $s \approx s_*$ where the probability is largest. It is a bad approximation where the probability is small (very very small, as we will see in a moment), but who cares about that! Another approximation we made was to discard the term $\frac{2s^2}{N^2}$ but keep the constant $-\frac{1}{2}\log\frac{\pi}{2}$ in (1.16). Again, we see that this is justified near the peak of the probability distribution, since $\frac{s_*^2}{N^2} \sim \frac{1}{N} \ll 1$.

In the large $N$ limit, the width $s_* = \frac{\sqrt{N}}{2}$ is large. However, it is small compared to the full possible range of $s$. To be more precise, let us define a fractional spin excess

t = \frac{s}{N} \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right]. \qquad (1.32)

The fractional width is $\langle t^2 \rangle^{1/2} = \frac{1}{2\sqrt{N}}$. In the large-$N$ limit, this is tiny. For example, suppose $N = 10^{22}$, which is typical for a macroscopic system. The fractional width is $t_* = \langle t^2 \rangle^{1/2} \approx 10^{-11}$, so most states have extremely tiny fractional spin excess.
Furthermore, if we measure t, we are overwhelmingly likely to find it close to t∗ . You


might think: Hey, $10^{-6}$ is also pretty small. Perhaps we might sometimes measure $t \in [10^{-6}, 2 \times 10^{-6}]$! You would be wrong. The probability of this happening is

\int_{10^{-6}}^{2\times 10^{-6}} ds\, P(N, s) \approx 10^{-6}\, e^{-\frac{1}{2}\left(\frac{10^{-6}}{10^{-11}}\right)^2} \approx 10^{-6}\, e^{-10^{10}/2} \approx 10^{-2\times 10^{9}}. \qquad (1.33)

If you do observe a spin excess of $10^{-6}$, here is a list of more likely scenarios:

• Your measurement equipment is broken.

• Your eyes are not working (i.e. your personal measurement equipment is bro-
ken).

• You accidentally drove to the wrong university and measured the wrong spin
system.

• You are actually living in the matrix and the machines are doing an experiment
to see if you ever studied statistical mechanics.

The point is that the distribution (1.21) becomes ridiculously sharply peaked
near s = 0 when N is large, at least if we measure deviations relative to macroscopic
quantities (i.e. quantities that scale linearly with N ). For all intents and purposes,
even though we have very little knowledge of the spin system and its connection to
the environment, we can confidently predict t = 0, based on counting.

1.5 Connection to random walks



The fact that deviations away from mean values go like $\sqrt{N}$, and fractional deviations go like $1/\sqrt{N}$, at large $N$, is a generic phenomenon about sums of random contributions. Another way to derive it is to think of the spins as specifying a random walk in spin excess space. We start at the left of the spin chain with spin excess $S_0 = 0$. Each time we move past a site, we change the spin excess in the direction specified by that site:

S_n = S_{n-1} + \frac{1}{2} s_n. \qquad (1.34)

The total spin excess is the endpoint of the random walk, $S_N = s$. Taking an expectation value of the above, we have

\langle S_n \rangle = \langle S_{n-1} \rangle + \frac{1}{2}\langle s_n \rangle. \qquad (1.35)

However, $\langle s_n \rangle = 0$, so we find the recursion relation

\langle S_n \rangle = \langle S_{n-1} \rangle, \qquad (1.36)


which has solution ⟨Sn ⟩ = ⟨S0 ⟩ = 0. This is just the statement that the average
spin excess vanishes.
Now consider the expectation value of the square of (1.34),

\langle S_n^2 \rangle = \langle (S_{n-1} + \tfrac{1}{2} s_n)^2 \rangle = \langle S_{n-1}^2 \rangle + \langle S_{n-1} s_n \rangle + \tfrac{1}{4} \langle s_n^2 \rangle. \qquad (1.37)

The middle term vanishes because the value of $s_n$ does not depend on $S_{n-1}$. We say that the two quantities are uncorrelated. We have $\langle S_{n-1} s_n \rangle = \langle S_{n-1} \rangle \langle s_n \rangle = 0$. More precisely,

\langle S_{n-1} s_n \rangle = \sum_{s_1,\ldots,s_n} P(s_1, \ldots, s_n)\, S_{n-1}(s_1, \ldots, s_{n-1})\, s_n
  = \sum_{s_1,\ldots,s_n} P(s_1) \cdots P(s_n)\, S_{n-1}(s_1, \ldots, s_{n-1})\, s_n
  = \left(\sum_{s_1,\ldots,s_{n-1}} P(s_1) \cdots P(s_{n-1})\, S_{n-1}(s_1, \ldots, s_{n-1})\right) \left(\sum_{s_n} P(s_n)\, s_n\right)
  = \langle S_{n-1} \rangle \langle s_n \rangle = 0. \qquad (1.38)

In the second line, we used the fact that the probability of a configuration of spins is
the product of the probabilities for each individual spin (P (si ) = 1/2 for si = ±1).
This is true in our simple model. However, in more general cases, like in real
magnets, we might not be able to factorize P (s1 , . . . , sn ) and the above computation
would not be correct.
However, the last term is interesting: because $s_n = \pm 1$, $s_n^2$ is always 1. Thus, we find

\langle S_n^2 \rangle = \langle S_{n-1}^2 \rangle + \tfrac{1}{4}. \qquad (1.39)

The solution to this recursion relation is

\langle S_n^2 \rangle = \frac{n}{4}, \qquad (1.40)

so in particular $\langle s^2 \rangle = \frac{N}{4}$, and the RMS value is $\langle s^2 \rangle^{1/2} = \frac{\sqrt{N}}{2}$, which agrees with our earlier computation. Thus, $\sqrt{N}$ fluctuations are typical in quantities that get a large number of independent random contributions.
The sharpness of probability distributions in the large-N limit is to a large extent
responsible for the incredible success of statistical methods for macroscopic systems.
It allows us to turn probabilistic statements into precise predictions.
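The same √N scaling is easy to see in a quick Monte Carlo sketch (Python; N and the number of trials below are arbitrary choices): draw many random spin configurations and compare the measured RMS spin excess to the prediction √N/2.

# Sketch: Monte Carlo check that the RMS spin excess is sqrt(N)/2.
# N and the number of trials are arbitrary illustration choices.
import math
import random

N = 10_000
trials = 2_000

excesses = []
for _ in range(trials):
    spins = [random.choice((-1, 1)) for _ in range(N)]
    excesses.append(0.5 * sum(spins))          # spin excess s = (1/2) sum_i s_i

rms = math.sqrt(sum(s * s for s in excesses) / trials)
print("measured RMS spin excess:", rms)        # fluctuates around the prediction
print("predicted sqrt(N)/2:     ", math.sqrt(N) / 2)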
[End of lecture 1(2024)]


2 Entropy and Temperature


2.1 Fundamental assumption of statistical mechanics
Consider a closed (isolated) physical system with some space of states. The funda-
mental assumption of statistical mechanics is
Claim (Fundamental assumption of statistical mechanics). All accessible
states are equally likely.
The term “accessible” requires some explanation:

• Conservation laws. Accessible states should be consistent with conservation


laws: they should have the correct values of conserved quantities like total
energy, momentum, and angular momentum.
In this course, we will encounter other types of conserved quantities, some of
which are fundamental and some of which are approximate. An example of an
approximate conserved quantity is particle number. In relativistic quantum
field theory (QFT), all particles can be created or destroyed. However, par-
ticle creation/destruction might not occur at the energy scales relevant for a
given problem. For example, chemical processes do not involve enough energy
to create or destroy electrons, protons, or neutrons. For this reason, elec-
tron number, proton number, and neutron number are good approximately
conserved quantities in chemistry. By contrast, photons can be created or de-
stroyed in chemical processes, so photon number is not a good approximately
conserved quantity.
Let us also comment about what it means to have the “correct” value of a
conserved quantity, e.g. energy. If a system is in an exact energy eigenstate
|E⟩, then only that eigenstate is accessible. However, we typically don’t find
macroscopic systems in exact energy eigenstates. In practice, we usually only
know the total energy of a system with a resolution ∆U that depends on our
measuring device. In this case, states with energies in the window [U −∆U, U +
∆U ] are accessible.

• Time scale. Which states are accessible also depends on the time scale
being considered. Glass is a famous example of a substance that exhibits
dynamics on very long time scales — its relaxation time can be millions of
years. However, for practical purposes on human time scales, we shouldn’t
consider states where glass flows into other configurations.

2.1.1 Intuition for the fundamental assumption


The intuition behind the fundamental assumption is that interactions within the
system will tend to randomize the state. However, those interactions must still


respect conservation laws, so the system will wander through the space of states in
a random way while still respecting, e.g. conservation of energy.
Two properties of typical systems support this picture

• Chaos. Chaos is the exponential sensitivity of a system to changes in initial


conditions. It is sometimes called the “butterfly effect:” the flapping of a but-
terfly’s wings is a tiny perturbation, but its effects can amplify exponentially
over time. Because of chaos, even if we have precise knowledge of the initial
state of a system, that knowledge rapidly degrades when we try to predict its
future evolution.
For an illustration of chaos, I like this simulation of a double pendulum with 1 million closeby initial conditions: https://twitter.com/rickyreusser/status/1231990500023926784. Although all 1 million instances start out
very close to one another, their trajectories quickly diverge from each other.
After a short time, the pendulums are spread out over all possible configura-
tions (consistent with conservation of energy).
Here’s another fun one, showing chaos in a simulation of four bodies interacting
via gravity: https://twitter.com/simon_tardivel/status/1215728659010670594.

• Ergodicity. An ergodic system is one for which the average of an observable


over the set of accessible states is the same as the average over time for a
particular state.
The idea behind ergodicity in classical systems is that a trajectory through
phase space q(t), p(t) will eventually pass close by any other accessible point
in phase space. For example, a gas molecule bouncing around a room will
eventually go everywhere and have every momentum.
Most systems are not ergodic in the strict mathematical sense. There can be
regions of phase space that are never reached. (Example: A circular billiards
table on which a billiard ball traces out a path in the shape of an n-pointed
regular star centered at the center of the table. In this case, there is an n-gon
region in the center of the table that the ball never reaches.) Or it can take an
exponentially long time to visit every point in phase space. However, we are
usually not interested in whether a system passes close by every microstate.
For studying the dynamics of macrostates, ergodicity is often an excellent
approximation.

All this intuition does not amount to a proof, and in fact the fundamental
assumption is only an approximation. There are some properties of a state that are
very difficult to measure, and therefore not important for our purposes, but take a
very long time to randomize. An example is the “computational complexity” of a
state.


2.1.2 Mathematical statement of fundamental assumption


Mathematically, the fundamental assumption says that, in an isolated system, the
probability of observing a state s is P (s) = 1/g, where g (which is independent of
s) is the number of accessible states. Expectation values of observables X(s) are
given by

\langle X \rangle = \sum_s X(s)\, \frac{1}{g} = \frac{1}{g}\sum_s X(s), \qquad (2.1)

where the sum $\sum_s$ runs over accessible states.

2.2 Thermal contact and most probable configuration


Consider two systems S1 , S2 and let us bring them into contact so that energy can
be exchanged between them. This is called “thermal contact.” Together, the two
systems form a larger closed system S = S1 + S2 with constant energy U = U1 + U2 .
Let the number of states of the system S1 with energy U1 be g1 (U1 ) and let the
number of states of S2 with energy U2 be g2 (U2 ) = g2 (U − U1 ). The number of
states of S such that S1 has energy U1 is

g1 (U1 )g2 (U − U1 ) (2.2)

The total number of states of the combined system is

g(U) = \sum_{U_1} g_1(U_1)\, g_2(U - U_1). \qquad (2.3)

According to the fundamental assumption, after some amount of time, every state of the combined system is equally likely. In particular, the probability distribution for observing that subsystem $S_1$ has energy $U_1$, given that the combined system has energy $U$, is

P(U, U_1) = \frac{g_1(U_1)\, g_2(U - U_1)}{g(U)}. \qquad (2.4)

Note that due to (2.3), the distribution is correctly normalized, i.e. $\sum_{U_1} P(U, U_1) = 1$.

2.2.1 Example: spin system


Let us look at a typical example of such a probability distribution. Consider the
spin system from the last lecture, and let us turn on a magnetic field B so that
the energy $U$ is proportional to the spin excess, $U = 2mBs$ [5]. Using (1.17) from the previous lecture, we can write the number of microstates as a function of $U$:

g(N, U) = g(N, 0)\, e^{-\frac{U^2}{2N(mB)^2}}. \qquad (2.5)

5. In most of this lecture, we use s to label microstates. Here, we are briefly using it to label the spin excess, which specifies a macrostate. We hope this does not cause confusion.
Let us bring two such systems into thermal contact, and suppose that the total
energy is U . The number of states where S1 has energy U1 is
g(N_1, U_1)\, g(N_2, U_2) = g(N_1, 0)\, g(N_2, 0) \exp\left(-\frac{U_1^2}{2N_1(mB)^2} - \frac{U_2^2}{2N_2(mB)^2}\right)
  = g(N_1, 0)\, g(N_2, 0) \exp\left(-\frac{1}{2(mB)^2}\left(\frac{U_1^2}{N_1} + \frac{(U - U_1)^2}{N_2}\right)\right). \qquad (2.6)
The quantity in the exponent has a maximum when

0 = \frac{U_1}{N_1} - \frac{U - U_1}{N_2} \quad \Longrightarrow \quad U_1 = \frac{N_1}{N_1 + N_2}\, U \equiv \hat{U}_1. \qquad (2.7)

The largest number of microstates corresponds to the case where the total energy $U$ is distributed between the two systems in amounts proportional to their size: system $S_1$ has $\frac{N_1}{N_1+N_2}$ of the energy and system $S_2$ has $\frac{N_2}{N_1+N_2}$ of the energy.
Let us write $U_1 = \hat{U}_1 + \delta U_1$ and substitute to find

g(N_1, 0)\, g(N_2, 0) \exp\left[-\frac{1}{2(mB)^2}\left(\frac{U^2}{N_1 + N_2} + \delta U_1^2 \left(\frac{1}{N_1} + \frac{1}{N_2}\right)\right)\right]. \qquad (2.8)

We can compute $g(U)$ by integrating over $\delta U_1$ (exercise). Dividing, we find

P(U, U_1) = \sqrt{\frac{2}{\pi}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}\; e^{-\frac{\delta U_1^2}{2(mB)^2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}. \qquad (2.9)
Suppose that $N_1$ and $N_2$ are both large. We will write $N_1, N_2 = O(N)$, where $N$ is some large number. Let us also take $U = O(N)$ in the large-$N$ limit, so that the total energy scales with the size of the system. The probability distribution $P(U, U_1)$ has width $\delta U_1 \sim O(\sqrt{N})$ [6]. Meanwhile, $\hat{U}_1$ scales like $O(N)$. Thus, the fractional deviation $\delta U_1/U_1 \sim O(1/\sqrt{N})$ is small. We should think of (2.9) as being extremely sharply-peaked as a function of $U_1$.

6. We could obtain this by computing $\langle \delta U_1^2 \rangle^{1/2}$, but a quicker way is to look at the exponent in (2.9). Typical values of $\delta U_1$ will be such that the quantity in the exponent is not too negative (otherwise it will be very suppressed). Let us suppose it is $O(1)$:

\frac{\delta U_1^2}{2(mB)^2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right) = O(1). \qquad (2.10)

Ignoring the factors $2(mB)^2$, which are not important for determining the large-$N$ scaling of $\delta U_1$, we find

\frac{\delta U_1^2}{N} = O(1), \qquad (2.11)

or $\delta U_1 = O(\sqrt{N})$ as claimed.
sharply-peaked as a function of U1 .

2.2.2 Back to the general case


Generically, for macroscopic systems S1 and S2 , the product g1 (U1 )g2 (U − U1 ) will
be sharply peaked. Typically, g1 (U1 ) is a rapidly-changing function of U1 . Similarly
g2 (U2 ) is a rapidly-changing function of U2 . The product g1 (U1 )g2 (U − U1 ) involves
a competition between the two factors. A local maximum U b1 can occur when there
is a delicate balance between the rate of increase of g1 (U1 ) and the rate of decrease
of g2 (U − U1 ). If we move U1 slightly away from this point, the balance is destroyed
and the product decreases rapidly away from its maximum.
Because P (U, U1 ) is sharply peaked, we are overwhelmingly likely to observe a
value of U1 where P (U, U1 ) is maximized — the most probable configuration.
In summary, before the systems are in thermal contact, we will observe a value
of U1 where g1 (U1 ) alone is maximized. After thermal contact, the two systems
will evolve to the most probable configuration $U_1 = \hat{U}_1$, where $g_1(U_1)\, g_2(U - U_1)$ is
maximized. The final configuration is called thermal equilibrium.

2.3 Entropy and temperature


The condition to have a maximum of $P(U, U_1)$, or equivalently $g_1(U_1)\, g_2(U - U_1)$, is

0 = \frac{\partial}{\partial U_1}\left(g_1(U_1)\, g_2(U - U_1)\right) = \frac{\partial g_1}{\partial U_1}\, g_2 - g_1\, \frac{\partial g_2}{\partial U_2}. \qquad (2.12)

We can write this as

\frac{\partial \log g_1}{\partial U_1} = \frac{\partial \log g_2}{\partial U_2}. \qquad (2.13)
Definition (entropy). The entropy of a system is the log of the number of acces-
sible states

σ = log g. (2.14)

The condition for thermal equilibrium (i.e. maximizing $P(U_1)$) becomes

\frac{\partial \sigma_1}{\partial U_1} = \frac{\partial \sigma_2}{\partial U_2}. \qquad (2.15)

Definition (temperature). The temperature of a system is defined by

\frac{1}{\tau} = \frac{\partial \sigma}{\partial U}. \qquad (2.16)


We use 1/τ on the left-hand side so that τ has units of energy. (Note that the
entropy σ is unitless.) With this definition, the condition for thermal equilibrium
becomes

τ1 = τ2 . (2.17)

For historical reasons, the temperature that we usually measure with a thermometer is defined by

T = \frac{\tau}{k_B}, \qquad (2.18)

where $k_B = 1.381 \times 10^{-23}\,\mathrm{J/K}$ is Boltzmann's constant. Similarly, the entropy is usually defined [7]

S = k_B \log g, \qquad (2.19)

so that we have

\frac{1}{T} = \frac{\partial S}{\partial U}. \qquad (2.20)
These conventions date from a time when the relationship between temperature,
entropy, and the number of states was not understood. From a modern point of
view, kB is a silly constant. Entropy should be unit-less, and temperature should
have units of energy. In this course, we will almost always work with τ and σ instead
of T and S.
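As a quick worked example of (2.16) (not derived in the text above, but a direct application of the definition to the spin-system density of states (2.5), which is valid roughly for $|U| \ll NmB$):

\sigma(U) = \log g(N, U) = \log g(N, 0) - \frac{U^2}{2N(mB)^2},
\qquad
\frac{1}{\tau} = \frac{\partial \sigma}{\partial U} = -\frac{U}{N(mB)^2}
\quad\Longrightarrow\quad
U = -\frac{N(mB)^2}{\tau}.

At positive temperature the equilibrium energy is negative and its magnitude falls off as $1/\tau$; this is the familiar high-temperature behavior of a paramagnet, and it holds only in the regime $\tau \gg mB$ where the Gaussian approximation (2.5) is valid.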

2.4 Comments on entropy


The logarithm of the number of states, the entropy σ = log g, has some nice prop-
erties compared to the number of states itself. Firstly, entropy is extensive. If we
have two non-interacting (or weakly-interacting) systems, then the entropy of the
combined system is the sum of the entropies of the individual systems,

σ = log(g1 g2 ) = σ1 + σ2 . (2.21)

If we take N copies of a system, the entropy of the combined system is N times


the entropy of an individual system. In particular, for typical physical systems, the
entropy scales like the number of degrees of freedom of the system. For example, in
a gas at fixed temperature and pressure, the entropy is proportional to the volume.
Another nice property of the logarithm is that σ is insensitive to many details
that we have left unspecified, unlike g. For example, in our definition of accessible
states, we said that there might be some energy resolution, so that accessible states
have energy in a window $U \pm \Delta U$. If we double the size of the window, $\Delta U' = 2\Delta U$, then the number of accessible states will roughly double, $g \to g' = 2g$ [8]. Thus, the actual value of $g$ depends sensitively on our precise energy resolution. However, the entropy will change by $\sigma \to \sigma' = \sigma + \log 2$. If $\sigma \sim 10^{22}$ is large, the difference between $\sigma$ and $\sigma'$ is insignificant. Thus, we can reliably discuss the entropy of a system without having to give a detailed definition of the space of accessible states.

7. This expression is engraved on Boltzmann's gravestone in the form $S = k \log W$.
All of this supports the mantra that when you have a ridiculously large number,
it's better to think about its logarithm. You'll notice that in our examples with
spin systems and gaussian integrals, we were manipulating logarithms.

2.5 Entropy increases


Suppose system $S_1$ has energy $U_1^0$ before thermal contact. The initial number of states of the combined system is

(g_1 g_2)_0 = g_1(U_1^0)\, g_2(U - U_1^0). \qquad (2.22)

After thermal contact, the systems will evolve to a configuration with $U_1 = \hat{U}_1$ where the number of states is maximized,

(g_1 g_2)_{\max} = g_1(\hat{U}_1)\, g_2(U - \hat{U}_1). \qquad (2.23)

Clearly we have

(g_1 g_2)_{\max} \geq (g_1 g_2)_0, \qquad (2.24)

with equality if $\hat{U}_1 = U_1^0$ (i.e. the system $S_1$ was already at the equilibrium temperature of the combined system).
This is the statement that entropy increases. The entropy of a closed system will
remain constant or increase when a constraint internal to the system is removed.
Some examples of operations that increase entropy are adding particles, adding
energy, increasing volume, decomposing molecules.
It is extremely unlikely to observe a system spontaneously lower its entropy. For
example, suppose we have a gas of particles in a room. A low entropy configuration
would be one where all the particles are contained in the top left corner. (This is
low entropy because the number of such states is much smaller than the number of
states where the particles roam freely.) However, the chances of the room evolving
into this state are ridiculously tiny. Suppose the particles are non-interacting. The
probability at any given time of observing a particle in 1/2 of the room is 1/2.
The probability of observing all the particles simultaneously in 1/2 the room is $1/2^N \sim 1/2^{10^{23}}$ — i.e. it will never happen.
8. This is true assuming the density of states is roughly constant as a function of energy over scales of size ∆U.


Consider two systems brought into thermal contact that have not yet reached
equilibrium. They will exchange some energy U1 → U1 + δU and U2 → U2 − δU in
such a way that entropy increases. Thus, we have
\frac{\partial \sigma_1}{\partial U_1}\,\delta U - \frac{\partial \sigma_2}{\partial U_2}\,\delta U \geq 0 \quad \Longrightarrow \quad \left(\frac{1}{\tau_1} - \frac{1}{\tau_2}\right)\delta U \geq 0. \qquad (2.25)
In other words, δU will be positive and S1 will get hotter if τ2 > τ1 , and vice versa
if τ1 > τ2 . Thus, energy flows in the way one would expect: from hotter to cooler.

2.6 The arrow of time


An interesting property of the laws of nature is that they are time-reversal invariant.
For example, if we have a solution x(t) to
F(x) = m\,\frac{d^2 x}{dt^2}, \qquad (2.26)
then x(−t) is another solution.
Actually, the Standard Model of particle physics is not invariant under time-
reversal alone. Instead, it is invariant under a symmetry called CRT that simulta-
neously reverses time (T), reflects a spatial direction (R), and exchanges particles
with antiparticles (C for “charge conjugation”). It was discovered in 1964 that if
you don’t do CR as well as T, then certain processes involving nuclear decay are
not invariant. Anyway, applying the symmetry CRT, a solution of the equations of
the Standard Model with time evolving one way can be transformed into another
solution with time evolving the other way.
In this case, one might ask: how do we know which direction of time is the right
one? The fact that entropy increases gives our answer. If we took a solution with
time reversed, then all microscopic physical laws would be obeyed, but we’d see
weird things happen like eggs unbreaking, shockwaves miraculously pushing bombs
back together and causing them to bounce off the ground and attach to airplane
wings, etc. It is possible to unambiguously determine the arrow of time as long as
a system does not start out in a maximum entropy configuration. One of the great
mysteries of fundamental physics is why the initial entropy of the universe is so low.
[End of lecture 2(2024)]

2.7 Heat capacity


Definition (heat capacity). The heat capacity C is defined by
C = \frac{\partial U}{\partial \tau}. \qquad (2.27)


It is the change in energy over change in temperature.


Heat capacity is extensive: it scales with the size of a system. This is hopefully
intuitive — the bigger a system is, the more energy it can store. However, let us
understand it from the formula (2.27). Energy U is extensive: the total energy of
system S1 + S2 is the sum U = U1 + U2 . Meanwhile, temperature is intensive: if we
combine two systems with temperature τ , the resulting system still has temperature
τ . Consider a system S with heat capacity CS (τ ) = ∂U S
∂τ . Now consider N copies of
S, which we denote N S. The heat capacity is
∂UN S ∂US
CN S (τ ) = =N = N CS (τ ). (2.28)
∂τ ∂τ
Even more roughly, we can say: in a system of size N , U scales like O(N ) and τ
scales like O(1). Thus ∂U
∂τ scales like O(N/1) = O(N ).
A nice property of heat capacity is that it is directly measurable. To measure it,
you add some energy and see how the temperature changes. From the heat capacity,
one can compute entropy by integrating. Note that
\frac{\partial \sigma}{\partial \tau} = \frac{\partial \sigma}{\partial U}\frac{\partial U}{\partial \tau} = \frac{C}{\tau}. \qquad (2.29)

Thus, the difference in entropy of a system at temperatures $\tau_1$ and $\tau_2$ is

\sigma_2 - \sigma_1 = \int_{\tau_1}^{\tau_2} \frac{C(\tau)}{\tau}\, d\tau. \qquad (2.30)

For example, setting $\tau_1 = 0$, we have

\sigma(\tau_0) = \int_0^{\tau_0} \frac{C(\tau)}{\tau}\, d\tau + \sigma(0). \qquad (2.31)

The entropy at zero temperature is the log of the ground state multiplicity
σ(0) = log g(0). For most systems, g(0) is small, so σ(0) is effectively zero compared
to macroscopic entropies at nonzero temperature (which are O(N ), where N is
the number of particles). In such systems, we can drop the last term above. For
glassy systems, σ(0) can still have macroscopic size — such systems have frozen-in
disorder, so that there are many different (near) ground states where the atoms are
distributed in many different ways.
Equation (2.31) is an awesome formula! You might think that the entropy, i.e.
the log of the number of accessible states, is a pretty abstract concept. Furthermore,
it's not easy to imagine computing it directly: how can we count the microstates of a system of $10^{23}$ particles? But using (2.31) you can actually figure out what it is!
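As a simple worked example of (2.30) (an illustration, not a case treated above): if the heat capacity happens to be independent of temperature over the range of interest, $C(\tau) = C_0$, then

\sigma_2 - \sigma_1 = \int_{\tau_1}^{\tau_2} \frac{C_0}{\tau}\, d\tau = C_0 \log\frac{\tau_2}{\tau_1},

so doubling the temperature of such a system adds $C_0 \log 2$ to its entropy. (A classical monatomic ideal gas at fixed volume, which we will meet later, has constant heat capacity, so this logarithmic growth is realistic.)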
Another property we should mention is that heat capacity is usually positive:
adding energy usually increases temperature. However, there are some exotic sys-
tems that possess negative heat capacity. The most famous example is a black hole:


adding energy increases the size of the black hole, which lowers the temperature
(and lowers the rate of Hawking radiation). Conversely, removing energy decreases
the size of the black hole and raises the temperature. This leads to a runaway effect:
the black hole will radiate more and more, its temperature rising and rising, until
it decays away to nothing.

2.8 The laws of thermodynamics


The above observations about temperature and entropy have been codified into "laws" of thermodynamics [9]:

0th law: If two systems are in thermal equilibrium with a third system, then
they are in thermal equilibrium with each other. This is the definition of
equilibrium τ1 = τ2 , together with the transitive property of equality.

1st law: Heat is a form of energy. This is the principle of conservation of


energy.

2nd law: Entropy increases.

3rd law: Entropy approaches a constant value as the temperature approaches


zero. This is simply the ground-state entropy σ(0) = log g(0).

From the point of view of statistical mechanics, most of these aren’t really funda-
mental laws — they’re consequences of statistics and properties of typical systems.
Honestly, I never learned which law had which number, just like I never bothered
to learn the difference between class 1, class 2, and class 3 levers. I’m supposedly a
professional physicist, and it’s almost never been a problem, although people talk a
lot about the 2nd law when discussing black hole physics (for reasons that we’ll see
later), so I kind of know which one that is. To avoid hypocrisy, I will try not to test
you on which law is which.

9. Here's a song about the laws of thermodynamics: https://www.youtube.com/watch?v=VnbiVw_1FNs.


3 The Boltzmann distribution


3.1 Microcanonical ensemble
Definition (ensemble). A probability distribution P (s) on the space of states is
called an ensemble.
An ensemble is needed to compute expectation values of observables:

\langle X \rangle = \sum_s P(s)\, X(s). \qquad (3.1)

In the last lecture, we studied the ensemble for an isolated system with a known fixed energy $U$. According to the fundamental assumption of statistical mechanics, it is given by

P_{\text{microcanonical}}(s) = \begin{cases} \frac{1}{g(U)} & \text{if } s \text{ has energy } U, \\ 0 & \text{otherwise}, \end{cases} \qquad (3.2)

where g(U ) is the number of states with energy U — i.e., the number of accessible
states.
Definition (microcanonical ensemble). For historical reasons, the probability
distribution (3.2) is called the microcanonical ensemble.

3.2 Canonical ensemble


However, many systems we encounter can exchange energy with their environment,
and thus should not be thought of as having fixed energy. A basic question in
thermal physics is how a system S behaves when exposed to an environment at
some temperature. To model the environment, let us introduce:
Definition (reservoir). A reservoir $R$ (or "heat bath") is an extremely large system. In particular, a reservoir has a very large heat capacity $C = \frac{\partial U}{\partial \tau} \gg 1$.
Because of its large heat capacity, a reservoir can absorb essentially any amount of
energy without changing its temperature. We would like to know how S behaves
when brought into thermal contact with a reservoir R at temperature τ . We are
interested in this question even when S is not macroscopic. Some examples of a
system S and reservoir R are
• S is an atom in a crystal and R is the crystal itself

• S is a few air molecules in the room and R is all the air in the room

• S is a dark matter particle and R is the dark matter halo in the galaxy

• More generally, S is a small subpart of a larger system R


What is the probability that the system S will be in a state s with energy Es ?
Let the total energy of the combined system S + R be Etot . By the fundamental
assumption of statistical mechanics,

P(s) \propto g_S(S \text{ is in state } s) \times g_R(E_{\text{tot}} - E_s) = g_R(E_{\text{tot}} - E_s). \qquad (3.3)

The first factor $g_S(S \text{ is in state } s)$ is simply 1 because we are demanding that the microstate of $S$ be exactly $s$. In the second factor $g_R(E_{\text{tot}} - E_s)$, we have used conservation of energy.

The constant of proportionality in (3.3) is the inverse of the number of accessible states of $S + R$. So that we don't have to discuss this quantity, consider the ratio of probabilities associated to two microstates $s, s'$,

\frac{P(s)}{P(s')} = \frac{g_R(E_{\text{tot}} - E_s)}{g_R(E_{\text{tot}} - E_{s'})}. \qquad (3.4)

Let us write the number of states of $R$ in terms of its entropy,

\frac{P(s)}{P(s')} = \frac{e^{\sigma_R(E_{\text{tot}} - E_s)}}{e^{\sigma_R(E_{\text{tot}} - E_{s'})}}. \qquad (3.5)
We should imagine that Es ≪ Etot , so that it is a good idea to expand the entropy
in a Taylor series

\sigma_R(E_{\text{tot}} - E_s) = \sigma_R(E_{\text{tot}}) - E_s \frac{\partial \sigma_R}{\partial U} + \frac{1}{2} E_s^2 \frac{\partial^2 \sigma_R}{\partial U^2} + \ldots
  = \sigma_R(E_{\text{tot}}) - \frac{E_s}{\tau} + \frac{1}{2} E_s^2 \frac{\partial^2 \sigma_R}{\partial U^2} + \ldots \qquad (3.6)
In the second line, we have assumed that R is at temperature τ and we have used
the definition of temperature.
We claim that in the limit of a very large reservoir, the second and higher
derivative terms above can be dropped. Let us first consider the second derivative
term. It is given by

\frac{\partial^2 \sigma_R}{\partial U^2} = \frac{\partial}{\partial U}\left(\frac{1}{\tau}\right) = -\frac{1}{\tau^2}\frac{\partial \tau}{\partial U} = -\frac{1}{\tau^2}\frac{1}{C}, \qquad (3.7)

where $C$ is the heat capacity of the reservoir. Remember that a reservoir has extremely large heat capacity, so this term is negligible.

Let us understand this result in a cruder way that will help with the higher-order terms in (3.6). The largeness of $C$ comes from the fact that energy $U$ is extensive and temperature $\tau$ is intensive. Thus, $C = \frac{\partial U}{\partial \tau}$ scales like $O(N)$. By the same large-$N$ scaling analysis, we expect $\frac{\partial^k \sigma_R}{\partial U^k}$ to scale like $O(N/N^k) = O(N^{1-k})$:


the $N$ in the numerator comes from the fact that $\sigma_R$ is extensive, and the $N^k$ in the denominator comes from the fact that $U$ is extensive and we have $k$ factors of $U$. Thus, in the limit of a large reservoir, the higher-order terms in (3.6) should be dropped as well. On your homework, you will make this result rigorous in the case where the environment can be modeled as $N$ non-interacting copies of a single system.

Thus, let us drop the higher-derivative terms in (3.6). The quantity $\sigma_R(E_{\text{tot}})$ cancels between the numerator and denominator, and we are left with

\frac{P(s)}{P(s')} = \frac{e^{-E_s/\tau}}{e^{-E_{s'}/\tau}} \quad \Longrightarrow \quad P(s) = A\, e^{-E_s/\tau}, \qquad (3.8)

where $A$ is independent of $s$. Using the fact that probabilities must sum to 1, we can solve for $A$:

\sum_s A\, e^{-E_s/\tau} = 1 \quad \Longrightarrow \quad A = \frac{1}{Z(\tau)}, \qquad (3.9)

where

Z(\tau) \equiv \sum_s e^{-E_s/\tau}. \qquad (3.10)

Overall, we find

P(s) = \frac{e^{-E_s/\tau}}{Z(\tau)}. \qquad (3.11)
Note that almost all reference to the reservoir has dropped out, with the only re-
maining dependence on R through its temperature τ . Equation (3.11) is the most
important equation in statistical mechanics. It’s so important that the numerator,
denominator, and ratio on the right-hand side all have their own names:
Definition (Boltzmann factor). The quantity e−Es /τ in the numerator of (3.11)
is called a Boltzmann factor.

Definition (partition function). The quantity Z(τ ) in (3.10) is called the parti-
tion function.

Definition (canonical ensemble). For historical reasons, the probability distri-


bution (3.11) is called the canonical ensemble.
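To make (3.10)–(3.11) concrete, here is a minimal Python sketch (the three energy levels and the temperature are made-up numbers, not taken from the text) that builds the Boltzmann probabilities and checks that they are normalized:

# Sketch: canonical-ensemble probabilities P(s) = exp(-E_s/tau) / Z(tau)
# for a made-up spectrum of three levels; energies and tau are in the same units.
import math

energies = [0.0, 1.0, 2.5]          # hypothetical energy levels E_s
tau = 0.8                           # hypothetical temperature

weights = [math.exp(-E / tau) for E in energies]   # Boltzmann factors
Z = sum(weights)                                   # partition function, eq. (3.10)
P = [w / Z for w in weights]                       # probabilities, eq. (3.11)

print("Z =", Z)
print("P =", P, " sum =", sum(P))                  # the probabilities sum to 1
print("average energy U =", sum(E * p for E, p in zip(energies, P)))

Lowering tau concentrates the probability on the ground state; raising it makes the three probabilities approach 1/3 each.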

3.3 Why Taylor expand the entropy?


If you look over the above derivation, all we did was Taylor expand the entropy and
keep the first two terms. One question is: why didn’t we Taylor expand the number


of states in (3.4) and keep the first two terms of that expansion instead? Instead of
an exponential, that procedure would have given a linear function of Es , which is
the wrong answer. The reason is that Taylor expanding a logarithm gives a more
efficient way to approximate quickly-growing functions.
As an example, consider the model function g(U ) = U N , where N = 1022 is
enormous. If we Taylor expand g(U ) itself, then we have
N (N − 1) Es2
 
N Es
g(Etot − Es ) = Etot 1 − N + 2 + ... (3.12)
Etot 2 Etot
The problem with this series compared to (3.6) is that even though each subsequent
term comes with an extra 1/Etot (which goes like 1/N and is small), it also comes
with an extra factor of N (which is large). In the large-N limit, we cannot drop
all the higher-derivative terms. Instead, the best we can do is drop subleading
quantities in N in each term. This leads to
g(Etot − Es) ∼ Etot^N (1 − N Es/Etot + (N²/2) Es²/Etot² + ...)
            = Etot^N e^{−Es N/Etot} = g(Etot) e^{−Es/τ},    (3.13)
so we get an exponential again. Here, we used the fact that τ = g(Etot )/g ′ (Etot ) =
Etot /N in this example. Taking the logarithm first gives a more efficient way to sum
all these terms.

3.4 The partition function


The partition function Z(τ ) is the most important quantity in statistical mechanics.
Almost everything you might ever want to know about a system is hidden in its
partition function (suitably generalized), and the −1’st law of thermodynamics is
that the first thing you should try to do when you encounter a new system is compute
its partition function. An example of a quantity hidden in the partition function is
the expectation value of the energy
U = ⟨E⟩ = Σ_s Es P(s)
        = (1/Z(τ)) Σ_s Es e^{−Es/τ}
        = (τ²/Z(τ)) ∂Z(τ)/∂τ
        = τ² ∂/∂τ log Z(τ).    (3.14)
It is conventional to define β = 1/τ and write

⟨E⟩ = −∂/∂β log Z(β).    (3.15)


We can also compute the expectation value of the square of the energy

⟨E²⟩ = (1/Z(β)) ∂²Z(β)/∂β² = ∂²/∂β² log Z(β) + ((1/Z(β)) ∂Z(β)/∂β)².    (3.16)
In particular, the variance of the energy is

∆E² = ⟨(E − ⟨E⟩)²⟩ = ⟨E²⟩ − ⟨E⟩² = ∂²/∂β² log Z(β).    (3.17)
As an example, consider a two-state system with ground-state energy 0 and
excited state energy ε. The partition function is

Z(τ ) = 1 + e−ε/τ . (3.18)

The average energy is

U = ⟨E⟩ = (τ²/Z(τ)) ∂Z(τ)/∂τ
        = ε e^{−ε/τ}/(1 + e^{−ε/τ}).    (3.19)
When τ /ε is small, this is close to zero (in fact, it dies incredibly quickly as τ gets
small). When τ /ε is large, this asymptotes to ε/2. The heat capacity is

C = ∂U/∂τ = (ε²/τ²) e^{−ε/τ}/(1 + e^{−ε/τ})².    (3.20)

This has the surprising feature that it spikes near τ /ε ∼ 0.5, a phenomenon known
as the Schottky anomaly. It is “anomalous” because in most materials, the heat
capacity is monotonically increasing with temperature. The reason the Schottky
anomaly appears here is that our system has a maximum energy ε. As τ → ∞,
the largest U = ⟨E⟩ can ever get is ε/2 (coming from an equal probability of being
in the excited and ground state). Typical macroscopic systems have no maximum
energy (or a maximum energy that is way bigger than typical temperatures). In
that case, as we increase τ , the probability becomes nontrivial for more and more
high energy states and ⟨E⟩ can keep increasing.
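As a quick check of (3.19) and (3.20), the following sketch evaluates U and C on a grid of temperatures (setting ε = 1) and locates the Schottky peak numerically.

import numpy as np

eps = 1.0                                 # excited-state energy; ground state at 0
tau = np.linspace(0.05, 3.0, 2000)        # temperatures in units of eps

U = eps * np.exp(-eps / tau) / (1 + np.exp(-eps / tau))                  # eq. (3.19)
C = (eps / tau)**2 * np.exp(-eps / tau) / (1 + np.exp(-eps / tau))**2    # eq. (3.20)

i = np.argmax(C)
print("U at large tau ->", U[-1])                   # approaches eps/2 = 0.5
print("Schottky peak at tau/eps ≈", tau[i], "with C ≈", C[i])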
[End of lecture 3(2024)]
Consider now a pair of two-state systems, each with zero ground state energy,
and with excited state energies ε1 , ε2 . The partition function is

Z1+2(τ) = 1 + e^{−ε1/τ} + e^{−ε2/τ} + e^{−(ε1+ε2)/τ}
        = (1 + e^{−ε1/τ})(1 + e^{−ε2/τ})
        = Z1(τ)Z2(τ).    (3.21)


[Plot of ⟨E⟩ and C versus τ/E0 omitted.]

Figure 1: ⟨E⟩ for a two-state system as a function of τ/E0.

This is a special case of a more general result: if S1, S2 are non-interacting systems, then the partition function of the combined system is the product of the partition functions,

ZS1+S2(τ) = ZS1(τ)ZS2(τ).    (3.22)

3.5 Example: spin system


As an example, the partition function of our spin system from the last lecture can
be computed as follows. An individual spin has energies E = ±mB. The partition
function for a single spin is thus

Zsingle-spin (τ ) = e−mB/τ + emB/τ = 2 cosh(mB/τ ) (3.23)

Thus, the partition function for N non-interacting spins is

Zspin-system(τ) = (2 cosh(mB/τ))^N = (2 cosh(βmB))^N.    (3.24)

The expected value of the energy is


⟨E⟩ = −∂/∂β log Z = −∂/∂β (N log 2 + N log cosh(βmB))
    = −N mB tanh(βmB) = −N mB tanh(mB/τ).    (3.25)

At low temperatures, we have ⟨E⟩ = −N mB as all the spins align with the magnetic
field. At high temperatures, we have ⟨E⟩ = 0 as thermal fluctuations dominate over
the effects of the applied field.
The variance in energy is

∆E² = −∂/∂β (−N mB tanh(βmB)) = N(mB)²/cosh²(βmB).    (3.26)


Note that the fluctuations are of size ∆E² ∼ N, whereas the energy itself is also O(N). Thus, the fractional fluctuations are small, of order ∆E/⟨E⟩ ∼ 1/√N.
There is a more general result relating the size of fluctuations in energy to the
heat capacity. Recall that

C = ∂⟨E⟩/∂τ = (∂τ/∂β)^{−1} ∂⟨E⟩/∂β = β² ∂²/∂β² log Z = ∆E²/τ².    (3.27)
We know that heat capacity scales linearly with the size of the system N , so this
shows that ∆E 2 also scales like N .
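A short numerical sketch of the scaling claims above, using (3.25)-(3.26) with mB set to 1 (the values of β and N are arbitrary, chosen just for illustration):

import numpy as np

def energy_and_fluctuation(N, beta, mB=1.0):
    """<E> and ΔE for N non-interacting spins, from eqs. (3.25)-(3.26)."""
    E_mean = -N * mB * np.tanh(beta * mB)
    delta_E = np.sqrt(N * mB**2 / np.cosh(beta * mB)**2)
    return E_mean, delta_E

beta = 0.7
for N in [10**2, 10**4, 10**6]:
    E, dE = energy_and_fluctuation(N, beta)
    print(N, dE / abs(E))   # fractional fluctuations shrink like 1/sqrt(N)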

3.6 Thermal density matrix


3.6.1 Density matrices
In quantum mechanics, a state |ψ⟩ is a vector in a Hilbert space H. The expectation
value of an observable X̂ in the state |ψ⟩ is computed by¹⁰

⟨X̂⟩ = ⟨ψ|X̂|ψ⟩.    (3.28)

More generally, we may not know the precise state of a system, but only a
probability distribution on the space of states. For example, we may have a lab
partner that randomly decides whether to prepare an up state or down state with
probabilities p and 1 − p. In this case, the expectation value of an observable is

⟨X̂⟩ = p⟨↑|X̂|↑⟩ + (1 − p)⟨↓|X̂|↓⟩.    (3.29)

This is not the same thing as having a single quantum state

|?⟩ = √p |↑⟩ + √(1 − p) |↓⟩    (wrong).    (3.30)

The expectation value of X̂ in such a state would involve cross-terms ⟨↑|X̂|↓⟩ that
are not present in (3.29). Instead, the quantities p and (1 − p) represent classical
probabilities, on top of the inherent quantum mechanical probabilities associated
with a measurement.
A probability distribution on the space of states can be represented by a density
matrix. In the case described above, the density matrix is

ρ̂ = p|↑⟩⟨↑| + (1 − p)|↓⟩⟨↓|.    (3.31)

It has matrix elements

( ⟨↑|ρ̂|↑⟩  ⟨↑|ρ̂|↓⟩ )     ( p    0   )
( ⟨↓|ρ̂|↑⟩  ⟨↓|ρ̂|↓⟩ )  =  ( 0   1−p ).    (3.32)
10
In these few sections, I will try to be careful about putting hats on quantum mechanical
operators, to distinguish them from C-numbers. However, I will be less careful about this later.


The expectation value of an observable can be written

⟨X̂⟩_ρ̂ = Tr(ρ̂X̂)
       = p Tr(|↑⟩⟨↑|X̂) + (1 − p) Tr(|↓⟩⟨↓|X̂)
       = p⟨↑|X̂|↑⟩ + (1 − p)⟨↓|X̂|↓⟩.    (3.33)
In general, given a probability distribution P (s) on a space of orthonormal states,
the corresponding density matrix is given by
ρ̂ = Σ_s P(s)|s⟩⟨s|,    (3.34)

and the expectation value of an observable is

⟨X̂⟩_ρ̂ = Tr(X̂ρ̂) = Σ_s P(s) Tr(X̂|s⟩⟨s|) = Σ_s P(s)⟨s|X̂|s⟩.    (3.35)
s s
This is a more general version of (3.1). Specifically, (3.1) applies when the states |s⟩ are eigenstates of X̂ with eigenvalues X(s).

3.6.2 Thermal density matrix


In the canonical ensemble, the probability distribution P (s) is given by (3.11). This
leads to
Definition (thermal density matrix). The thermal density matrix at tempera-
ture τ is given by
ρ̂_τ = Σ_s e^{−Es/τ} |s⟩⟨s| / Z(τ).    (3.36)
The quantity in the numerator can be written in a basis-independent way as
Σ_s e^{−Es/τ} |s⟩⟨s| = e^{−H/τ},    (3.37)
where H is the Hamiltonian. Equation (3.37) is simply the definition of the expo-
nential of a diagonalizable operator: we just exponentiate all the eigenvalues. See
the discussion around (A.10) for details.
The partition function itself can be written
Z(τ) = Σ_s e^{−Es/τ} = Tr(e^{−H/τ}),    (3.38)
which is the statement that the trace of an operator is the sum of its eigenvalues.
Overall, the density matrix associated to the canonical ensemble is
ρ̂_τ = e^{−H/τ} / Tr(e^{−H/τ}).    (3.39)
This incredibly beautiful formula has abundant applications, from statistical physics
to condensed matter physics to particle physics to string theory. We will meet some
of them later in this course.
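Here is a minimal sketch of (3.39) for a randomly generated Hermitian matrix standing in for a Hamiltonian (the 4×4 matrix and the temperature are arbitrary, chosen only for illustration): build ρ̂_τ by exponentiating the eigenvalues, as in (3.37), and check that Tr ρ̂ = 1 and that Tr(ρ̂H) reproduces ⟨E⟩ computed from the Boltzmann weights.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2                    # a random Hermitian "Hamiltonian" (illustrative)
tau = 0.5

# Exponentiate H by exponentiating its eigenvalues, as in (3.37).
E, V = np.linalg.eigh(H)
rho = V @ np.diag(np.exp(-E / tau)) @ V.T
rho /= np.trace(rho)                 # divide by Z = Tr exp(-H/tau)

P = np.exp(-E / tau) / np.exp(-E / tau).sum()   # canonical probabilities (3.11)
print(np.trace(rho))                 # equals 1
print(np.trace(rho @ H), (P * E).sum())         # both give <E>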


4 Entropy in general ensembles


4.1 Gibbs/Shannon entropy
We defined entropy in the microcanonical ensemble as the log of the number of
accessible states. Suppose that instead we have a general ensemble, i.e. a general
probability distribution P (s) on the space of states. How should we define the
entropy? After answering this question in general, we will be particularly interested
in the case of the canonical ensemble.
We will look for a generalized definition of entropy satisfying two conditions:

• If there are g equally likely accessible states, then we should have σ = log g,
as before.

• σ should be extensive.

It turns out that these conditions single out a unique answer!


The argument is as follows. Suppose that we don’t have just one copy of our
system S, but instead a large number N of identical copies. We will compute the
entropy of N copies, and then divide by N .
The systems are in states

si1 , si2 , . . . , siN . (4.1)

If N is large enough, the number of systems in state si will be P (si )N . That is, the
string above will contain P (s1 )N s1 ’s, P (s2 )N s2 ’s, etc.. By taking a large number of
systems, we have turned probabilities into eventualities. A string where s1 , s2 , etc. do
not appear in the correct proportions should be considered inaccessible in the large-
N limit. To determine the entropy, we count the number of accessible configurations
gtotal of the total system and apply the familiar definition σtotal = log gtotal . The
entropy of a single system is then σ = σtotal /N .
The problem of counting the number of strings (4.1) with P (s1 )N s1 ’s, P (s2 )N
s2 ’s, etc. was on the homework. The answer is

e^{Nσ},    (4.2)

where σ is (log 2/log e) times the Shannon entropy of the probability distribution,¹¹

σ = −Σ_s P(s) log P(s).    (4.3)

This is how we will define the entropy of a general ensemble.


11
In information theory, it is conventional to define entropy using log2 . In physics, it is more
appropriate to use the natural logarithm.


Definition (Gibbs/Shannon entropy). The Gibbs/Shannon entropy of a general ensemble P(s) is given by (4.3).
Formula (4.3) was written down by Gibbs, and was rediscovered many years later
by Shannon in the context of information theory.
We see that entropy is a function of a probability distribution. When we discuss
“the” entropy of a macroscopic system, we are using the fact that many different
probability distributions all give similar answers. For example, if I know the average
energy within a window U ±∆U , this gives me a probability distribution on the space
of states. For macroscopic systems, the Gibbs/Shannon entropy of that distribution
does not depend strongly on ∆U .
Definition (von Neumann entropy). In the quantum mechanical context, the entropy can be written

σ = −Tr(ρ̂ log ρ̂),    (4.4)

where ρ̂ is the density matrix. This is called the von Neumann entropy.

4.2 Sanity checks


Let us check that the Gibbs/Shannon entropy gives the right answer for the micro-
canonical ensemble. Recall that
Pmicrocanonical(s) = { 1/g(U)   if s has energy U
                     { 0        otherwise.    (4.5)

The microcanonical Gibbs/Shannon entropy is then

− Σ_{s with energy U} (1/g(U)) log(1/g(U)) − Σ_{other s} 0 = log g(U).    (4.6)

Here, it is important that we define 0 × log 0 = lim_{p→0} p log p = 0.


Another sanity check is that the entropy is additive, even for a combination of two
systems with different probability distributions. This is not something we used in
our derivation in section 4, but it will happen naturally anyways. Consider systems
S1 and S2 with probability distributions P1 (s1 ) and P2 (s2 ) on their respective space
of states. States of the combined system S1 + S2 are labeled by pairs (s1, s2), with probabilities P(s1, s2) = P1(s1)P2(s2). The combined Gibbs/Shannon entropy is


σ12 = − Σ_{s1,s2} P(s1, s2) log P(s1, s2)
    = − Σ_{s1,s2} P1(s1)P2(s2) (log P1(s1) + log P2(s2))
    = − (Σ_{s1} P1(s1) log P1(s1)) (Σ_{s2} P2(s2)) − (Σ_{s2} P2(s2) log P2(s2)) (Σ_{s1} P1(s1))
    = σ1 + σ2.    (4.7)

[End of lecture 4(2024)]

4.3 Entropy of the canonical ensemble


Let us finally compute the Gibbs/Shannon entropy of the canonical ensemble:
σ = − Σ_s P(s) log P(s)
  = − Σ_s (e^{−Es/τ}/Z) log(e^{−Es/τ}/Z)
  = Σ_s (e^{−Es/τ}/Z) (Es/τ + log Z)
  = ⟨E⟩/τ + log Z
  = U/τ + log Z.    (4.8)
This gives a definition of entropy appropriate for a system in thermal contact with
a reservoir.
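A quick numerical check of (4.8), using a made-up spectrum and temperature chosen only for illustration: the Gibbs/Shannon entropy of the canonical distribution should equal U/τ + log Z.

import numpy as np

energies = np.array([0.0, 0.3, 1.1, 2.0, 2.7])    # made-up spectrum
tau = 0.8

w = np.exp(-energies / tau)
Z = w.sum()
P = w / Z

sigma_shannon = -(P * np.log(P)).sum()            # eq. (4.3)
U = (P * energies).sum()
print(sigma_shannon, U / tau + np.log(Z))         # the two agree, eq. (4.8)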

4.4 Comments on Shannon Entropy


In information theory, Shannon entropy is a measure of the amount of information
in a message. Consider a data file given by a collection of 0’s and 1’s. How much
can we compress the file? We would like to perform lossless compression, so that the
original file can be completely recovered from its compressed version. It is reasonable
to define the amount of information in the file as the minimal number of bits it can
be compressed to.
If the data is completely random and unstructured, then there is no way to
compress it. This is because the possible number of random data files with n bits is 2^n. A hypothetical compression algorithm would map each of these files to a string of m < n bits. However, the number of such strings is only 2^m < 2^n, so this cannot
be done in a 1-1 manner.
However, if the data has some structure, compression is possible. An extreme
example is if we know the file is just a sequence of exactly n 0’s. This can be encoded
using zero bits. A slightly less extreme example is if we know the file is a constant
sequence of length n, containing either all 0’s or all 1’s. Such files can be encoded
using 1 bit: 0 if the file is all 0’s or 1 if the file is all 1’s.
In general, when data has structure, the presence of that structure decreases
the number of possible files, making compression possible. For example, an XML
file always starts with <xml> and ends with </xml>. In total, this is 88 bits of
information that we already know is there. Consequently, the number of valid XML
files of length n is less than 2^{n−88} < 2^n.
A more interesting example of structured data is human language. A naive way
to store a document of English text uses ASCII encoding (https://en.wikipedia.
org/wiki/ASCII), where each character is stored using 8 bits. For example, a is
ASCII code 01100001, b is ASCII code 01100010, etc.. This is not very efficient
for English text because the English language contains only 26 letters — perhaps
together with punctuation and spaces, we can say that there are 2^5 = 32 symbols.
Thus, we should be able to compress English using no more than 5 bits per character.
However, not every random sequence of characters is a good English sentence.
For example, the letter e is more likely to appear than z. We can exploit this extra
structure to make a better compression scheme.
Let us understand how this works in a toy language with only two characters a
and b. First, suppose that a,b occur equally often. From the point of view of letter
frequency, our text file is like a random set of 0’s and 1’s, and we cannot compress
it. (This statement ignores possible correlations between letters, which we discuss
in a moment.) Now suppose instead that a occurs 90% of the time and b occurs
10% of the time. To compress a sequence, it’s a good idea to use fewer bits for a
than for b. One way to accomplish this is the following scheme:

Sequence   Code
aa         0
a          10
b          11         (4.9)

Because a occurs very often, we use 0 to stand for a pair of a’s. For example, the
sequence aaaaaabaaa, which is 10 characters, becomes 00011010, which is 8 bits.
The probabilities of all possible two-letter sequences are

Sequence aa ab ba bb
Probability 81% 9% 9% 1% (4.10)
Code 0 1011 1110 1111


Thus, the average number of bits per character in scheme (4.9) is


(1/2)(1 × 0.81 + 4 × 0.09 + 4 × 0.09 + 4 × 0.01) = 0.785.    (4.11)
Given a sequence of symbols ai occurring with probabilities Pi , Shannon’s answer
for the number of bits per symbol needed to compress the sequence is the Shannon
entropy
H = minimal # of bits = − Σ_i Pi log₂ Pi.    (4.12)

This is Shannon’s source coding theorem. In our two-character example, H is given by

H = −(0.9 log₂ 0.9 + 0.1 log₂ 0.1) ≈ 0.469.    (4.13)
So the compression scheme (4.9) that uses 0.785 bits per character is not optimal.
Can you improve it?
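A small sketch reproducing the numbers above: the average cost of scheme (4.9) and the Shannon bound (4.12) for the two-character a/b language.

import math

p_a, p_b = 0.9, 0.1

# Scheme (4.9): encode pairs of characters; code lengths 1 (aa), 4 (ab), 4 (ba), 4 (bb).
pairs = {"aa": (p_a * p_a, 1), "ab": (p_a * p_b, 4),
         "ba": (p_b * p_a, 4), "bb": (p_b * p_b, 4)}
avg_bits_per_char = sum(prob * length for prob, length in pairs.values()) / 2
print(avg_bits_per_char)             # 0.785

# Shannon bound (4.12): minimal bits per character.
H = -(p_a * math.log2(p_a) + p_b * math.log2(p_b))
print(H)                             # about 0.469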
The probabilities of occurrences of various letters in English text are

      a     b     c     d     e     f     g     h     i     j     k     l     m
 %   8.17  1.49  2.20  4.25  12.7  2.23  2.02  6.09  6.97  0.15  1.29  4.03  2.41
      n     o     p     q     r     s     t     u     v     w     x     y     z
 %   6.75  7.51  1.93  0.10  5.99  6.33  9.36  2.76  0.98  2.56  0.15  1.99  0.08
                                                                           (4.14)
For example, a randomly-selected letter is 12.7% likely to be e and 0.08% likely to
be z. According to the source coding theorem, we should be able to encode English
using
H = − (0.0817 log2 0.0817 + 0.0149 log2 0.0149 + · · · + 0.0008 log2 0.0008) = 4.17
(4.15)
bits per letter. This is slightly better than the 5 bits we estimated above.
However, letters aren’t just randomly distributed according to some probabili-
ties. They form words and sentences. This additional structure makes language
more compressible. By thinking about sequences of words rather than letters,
Shannon estimated that the information entropy of English text has H = 2.62
bits/character.
The Hutter prize is a 50,000€ competition to encode a 100 MB snapshot of Wikipedia using the smallest possible number of bits. The current record is 16 MB, which is a compression ratio of 100/16 = 6.25, so each character is represented by 8/6.25 bits = 1.28 bits. This is better than Shannon's estimate of 2.62 bits. Thus, correlations
between words also play a role in reducing the entropy of English text.12
12
Most people speak approximately 14 characters per second. Using the Hutter prize result H = 1.28 bits/letter to estimate the information content, this is about 18 bits/second. In conversation, you convey roughly 18 bit/s × (1 byte/8 bits) × 3600 s/hour ≈ 8 kB/hour of information.


4.4.1 Optimal compression and entropy


An optimal compression algorithm for files with structured data of length N is as
follows: make a list of all valid files of length N . (“Valid” means consistent with
the known structure of the files.) Map each file to its index in the list, encoded as
a binary number. The amount of information in a file is the length of that binary
number. As you showed on your homework, if we perform this exercise for a file
of length N whose content is drawn from an alphabet ai with probabilities Pi , we
obtain N S bits, where S is the Shannon entropy of the probability distribution Pi .
Another way to think about this algorithm is as follows: our knowledge of the
structure of the data is a “macrostate.” It is a coarse measurement that tells us
something, but not everything, about the contents of the file. The set of valid files
is the set of “accessible states.” The number of bits we need to encode a structured
file is the log of the number of accessible states.
The more we know about the structure of the file, the smaller the number of
accessible states and the smaller the information entropy. Conversely, the less we
know about the structure of the file, the larger the number of accessible states
and the larger the information entropy. Thus, information entropy is a measure
of ignorance about the contents of some binary data. Likewise, thermodynamic
entropy is a measure of ignorance about the microstate of a system. Hopefully this
analogy explains why tools of information theory are useful in statistical mechanics.
Here’s a simulation that demonstrates a connection between information entropy
and thermodynamic entropy in a beautiful way:
https://twitter.com/i/status/1182208172100116480. On the left is a simula-
tion of two types of particles that are initially unmixed, and then mix over time. On
the right is the size of the resulting image, compressed using the png compression
algorithm. As the particles mix, our ignorance about the precise microstate of the
system increases. This is because there are more accessible states when the particles
are mixed than when they are not mixed. (We discuss entropy of mixing in sec-
tion ??.) At the same time, the image gets more complicated and requires a larger
png. This is roughly because there are more possible images of mixed particles than
unmixed particles. If the compression algorithm were lossless and optimal (which
is not really true in this case), then the size of the resulting file would equal the
thermodynamic entropy of the mixture.


5 The thermodynamic limit and free energy


5.1 The thermodynamic limit
Definition (thermodynamic limit). The limit N → ∞, where N represents the
size of a system, is called the thermodynamic limit.
You will show on the problem set that for a system with N degrees of freedom in the canonical ensemble, ∆E/⟨E⟩ = O(1/√N). Consequently, in the thermody-
namic limit, all uncertainty about the energy of a system goes away. The canonical
ensemble becomes effectively the same as the microcanonical ensemble (with some
relationship between ⟨E⟩ and τ ).
To see this more precisely, consider the partition function
Zc(τ) = Σ_s e^{−Es/τ}.    (5.1)

Here, we include the subscript ‘c’ to emphasize that Zc is the partition function in
the canonical ensemble. Let us organize the states into energy levels with degeneracy
g(U ) = eσmc (U ) , where σmc (U ) is the entropy of the microcanonical ensemble. The
sum (5.1) becomes
Zc(τ) = Σ_U e^{σmc(U) − U/τ}.    (5.2)

The summand involves a competition between two effects: (1) the increasing number
of states eσmc (U ) as energy increases, and (2) the exponentially dying Boltzmann
factors.13 The crossover point where these effects balance out is the solution U∗ of
the equation


0 = ∂/∂U (σmc(U) − U/τ) |_{U=U∗}.    (5.3)

In the thermodynamic limit, the summand in (5.2) becomes sharply peaked at the
“most probable configuration” U = U∗ , so we can approximate it by the value of
the summand at that point:

Zc (τ ) ≈ eσmc (U∗ )−U∗ /τ = Zmc (τ ), (thermodynamic limit) (5.4)

The right-hand side of (5.4) is the partition function of the microcanonical ensemble
(i.e. of a system with e^{σmc(U∗)} states, all with energy U∗). Applying the formula ⟨E⟩ = τ² ∂ log Z/∂τ, we see that ⟨E⟩ ≈ U∗ in the thermodynamic limit.
13
Student question: how do we know that the Boltzmann factor eventually balances the growth
of g(U ) — i.e. that g(U ) doesn’t grow superexponentially? Answer: A system where g(U ) grows
superexponentially forever would have an infinite partition function. If this happens, it means that
some approximation we made in describing the system is breaking down.


Because all thermodynamic observables are encoded in the partition function, equality of the canonical and microcanonical partition functions in the thermodynamic limit implies equality of other quantities too. As an example, let us apply formula (4.8) for the entropy:

σc = ⟨E⟩/τ + log Z
   ≈ U∗/τ + log(e^{σmc(U∗) − U∗/τ})    (thermodynamic limit)
   = σmc(U∗).    (5.5)

5.2 Free energy


Definition (free energy). The free energy is defined by

F = U − τ σ = −τ log Z. (5.6)

In section 5.1, we showed that in the thermodynamic limit, a system will evolve toward the “most probable configuration,” where σ(U) − U/τ = −F/τ is maximized (equation 5.3). Equivalently, in the thermodynamic limit, a system at fixed temperature will seek to minimize F.
The fact that F is minimized in the most probable configuration is not a new
principle. Recall that the entropy of a reservoir is given by

σR (Utot − U ) = σR (Utot ) − U/τ. (5.7)

Thus, we have

F = U − τσ
= τ σR (Utot ) − τ σR (Utot − U ) − τ σ
= −τ σS+R (Utot , U ) + τ σR (Utot )
= −τ σS+R (Utot , U ) + independent of U (5.8)

F is just −τ times the entropy σS+R (Utot , U ) = σ + σR (Utot − U ) of the combined


system and reservoir, plus a term that doesn’t depend on U . So the statement that
F is minimized in the thermodynamic limit is just the usual statement that the
entropy of the combined system+reservoir is maximized.14
Free energy plays a similar role for systems in contact with a heat bath as energy
for damped systems. (Damping really means coupled to a zero-temperature heat
bath.) The term U causes the system to seek low energies, and the term −τ σ causes
14
Student question: what do we mean that the energy evolves in time toward the most probable
configuration? Answer: imagine the system is very weakly coupled to a reservoir, so that it is in
thermal equilibrium with itself, with energy as a function of time.


the system to seek high entropy. These effects balance each other at the minimum
of the free energy.
[End of lecture 5(2024)]
A nice example is a polymer that can straighten out or curl up. There are many
more configurations where the polymer is curled up than where it is straight. Thus,
the polymer wants to curl up to increase its entropy. We can make it energetically
favorable for the polymer to straighten out by pulling on it with an external force Fext.
The free energy is then

F = −ℓFext − τ σ(ℓ), (5.9)

where σ(ℓ) is the entropy at length ℓ. The minimum is achieved when

Fext = −τ ∂σ/∂ℓ.    (5.10)
Thus, the entropy term −τ σ(ℓ) leads to a noticeable force that we can measure. It
is called an “entropic force.” If we increase the temperature, the entropic force gets
stronger. For example, if you heat a rubber band, it contracts.
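As a crude numerical illustration of (5.10) (a sketch, not from the notes): model the polymer as N freely-jointed segments of length b along a line, so that the number of configurations with end-to-end length ℓ is a binomial coefficient, as in the earlier spin-counting problem. Then σ(ℓ) is the log of that count, and the force needed to hold a given extension grows with τ. The values of N, b, and ℓ below are arbitrary illustrative choices.

import numpy as np
from math import lgamma

N, b = 1000, 1.0        # number of segments and segment length (illustrative values)

def sigma(ell):
    """log of the number of N-step walks with end-to-end length ell = b*(n_plus - n_minus)."""
    n_plus = (N + ell / b) / 2
    n_minus = N - n_plus
    return lgamma(N + 1) - lgamma(n_plus + 1) - lgamma(n_minus + 1)

def entropic_force(ell, tau, dl=2 * b):
    """F_ext = -tau * dsigma/dell, estimated by a centered finite difference."""
    return -tau * (sigma(ell + dl) - sigma(ell - dl)) / (2 * dl)

ell = 200 * b           # hold the polymer at a fixed extension
for tau in [1.0, 2.0]:
    print(tau, entropic_force(ell, tau))   # the required force doubles when tau doubles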

5.3 Pressure
Consider a system in an energy eigenstate s with energy Es , and let us suppose
that Es is a function of the volume V of the system. The volume can be changed
V → V −∆V by applying an external force. Let us suppose that the volume changes
sufficiently slowly that the system remains in an energy eigenstate.15 The change
in energy is
Es(V − ∆V) = Es(V) − (dEs/dV) ∆V + . . . .    (5.11)
Suppose that the system is compressed by applying pressure p to a surface of
area A. The force is pA. If the force acts over a distance ∆x, then the work imparted
to the system is

∆W = (pA)∆x = p∆V. (5.12)


15
This is sometimes called the “adiabatic approximation.” Consider a time-dependent Hamilto-
nian H(t) such that H(t0 ) = H0 and H(t1 ) = H1 . Suppose the system starts out in an eigenstate of
H0 , H0 |E0 ⟩ = E0 |E0 ⟩. What state will it be in at time t1 ? If the perturbation happens very slowly,
then the eigenstate |E0⟩ of H0 will evolve to an eigenstate |E1⟩ of H1. However, if the perturbation happens quickly, we typically end in a nontrivial superposition of eigenstates of H1: Σ_i ci|Ei⟩. In
other words, a slow perturbation takes eigenstates to eigenstates, while a fast perturbation induces
transitions between eigenstates. The meaning of slow vs. fast depends on the energy gaps Egap
between eigenstates and the amplitude. “Slow” means ∆t ≫ ℏ/Egap .


The work ∆W is equal to the change in energy of the system. Comparing (5.11)
and (5.12), we see that the pressure in state s is
ps = −dEs/dV.    (5.13)
Averaging over all states in an ensemble, we find
 
⟨p⟩ = Σ_s P(s) (−dEs/dV)
    = −(d/dV) Σ_s P(s) Es
    = −(∂⟨E⟩/∂V)_σ = −(∂U/∂V)_σ.    (5.14)
The subscript σ means that we take the derivative, keeping the entropy fixed. The
entropy is held fixed because the probability distribution on the space of states is
fixed during the compression. Specifically, the compression is slow enough that if
the system starts out in energy eigenstate state s, it will remain in state s. Thus,
the occupation probabilities don’t change and the entropy is unchanged.

5.4 Thermodynamic identities


In this section, we derive an alternative expression for pressure in terms of the free
energy of a system. This is useful because the free energy is given by F = −τ log Z,
and the partition function Z is the nicest thing to calculate. In our derivation, we
will encounter the notion of constrained partial derivatives. Let us explain them in
more detail.

5.4.1 Constrained derivatives


A common situation in thermodynamics is that we have three macroscopic mea-
surements x, y, z that depend on each other. Once we specify any two, the third is
determined. We can express this with a constraint equation f (x, y, z) = 0.
 
Definition (constrained derivative). A constrained partial derivative (∂x/∂y)_z is the change in x over the change in y, with z held fixed. This notion generalizes to multiple variables: in general, if we have a constraint f(x, y, z, w, . . .) = 0, then (∂x/∂y)_{z,w,...} means that we hold z, w, . . . fixed when we take the derivative.

If x, y, z are related by a constraint f(x, y, z) = 0, then their constrained derivatives are related. We can see this by looking at the differential form of the constraint

0 = f(x + dx, y + dy, z + dz) − f(x, y, z)
  = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.    (5.15)


 
To compute the constrained derivative (∂x/∂y)_z, we can set dz = 0 and solve for dx/dy:

(∂x/∂y)_z = −(∂f/∂y)/(∂f/∂x).    (5.16)

Similarly, we find
 
(∂y/∂z)_x = −(∂f/∂z)/(∂f/∂y),
(∂z/∂x)_y = −(∂f/∂x)/(∂f/∂z).    (5.17)

An immediate consequence of (5.16) and (5.17) is the handy formula

(∂x/∂y)_z (∂y/∂z)_x (∂z/∂x)_y = −1,    (5.18)

which we will use in a moment. The nice thing about (5.18) is that it no longer
makes reference to the constraint f (x, y, z). This is useful because we don’t have to
bother writing down f (x, y, z) or choosing conventions for it.
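A quick numerical sanity check of (5.18) (not in the notes): take the ideal gas constraint f(p, V, τ) = pV − Nτ = 0, for which all three constrained derivatives are known in closed form, and verify that their product is −1. The numbers below are arbitrary.

N = 3.0e5   # arbitrary particle number for the check

# For f(p, V, tau) = p V - N tau = 0:
#   (dp/dV)_tau = -N tau / V^2,  (dV/dtau)_p = N / p,  (dtau/dp)_V = V / N.
p, V = 2.0, 7.0
tau = p * V / N          # a point on the constraint surface

dp_dV = -N * tau / V**2
dV_dtau = N / p
dtau_dp = V / N
print(dp_dV * dV_dtau * dtau_dp)   # equals -1, as in eq. (5.18)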

5.4.2 Pressure, entropy, and free energy


For a homogeneous system (like a gas) with variable volume at finite temperature,
the entropy σ, energy U , and volume V are examples of three macroscopic variables
that satisfy a constraint. For example, we can compute the entropy σ(U, V ) as a
function of energy and volume. If we consider changes dU, dV , then we have
   
dσ = (∂σ/∂U)_V dU + (∂σ/∂V)_U dV
   = (1/τ) dU + (∂σ/∂V)_U dV.    (5.19)

In the second line, we used the definition of temperature. The second term involves a
constrained derivative that we don’t a-priori know, but it can be related to pressure
using the identity (5.18) for U, V, σ:
     
(∂σ/∂V)_U (∂V/∂U)_σ (∂U/∂σ)_V = −1
(∂σ/∂V)_U (−1/p) τ = −1
(∂σ/∂V)_U = p/τ.    (5.20)


In the second line, we used the definition of pressure (5.14) and the definition of
temperature.
Substituting this result into (5.19), we find
dσ = (1/τ) dU + (p/τ) dV
τ dσ = dU + p dV.    (5.21)

Let us now compute the change in free energy corresponding to dU, dV . We have

dF = dU − τ dσ − σdτ
dF = −σdτ − pdV, (5.22)

from which we derive a relationship between pressure and free energy, together with
another useful expression for entropy
 
p = −(∂F/∂V)_τ
σ = −(∂F/∂τ)_V.    (5.23)

We will put these to work momentarily.

5.5 Ideal gas


5.5.1 One atom in a box
Our eventual goal is to compute the partition function of a gas of N particles. It is
helpful to first compute the partition function Z1 of a single particle in a box. We
can then build the whole partition function from Z1 .
Definition (orbital). An orbital is a solution of the single-particle Schrodinger
equation.
The term “orbital” is motivated by atomic physics: in that case, orbital refers to a
solution of the single-particle Schrodinger equation for an electron in the potential
well created by the nucleus. However, we will use the term more generally.
Let us calculate the partition function Z1 of one particle of mass m in a cubical
box of volume V = L³. The orbitals of the free particle Schrodinger equation

−(ℏ²/2m) ∇²ψ = Eψ    (5.24)

are

ψ(x, y, z) = A sin(nxπx/L) sin(nyπy/L) sin(nzπz/L),    (5.25)


where nx , ny , nz are any positive integers. The energy eigenvalues are

En = (ℏ²/2m)(π/L)² (nx² + ny² + nz²).    (5.26)
We neglect the spin and other structure of the particle.
The partition function is the sum

Z1 = Σ_{nx,ny,nz=1}^∞ e^{−(ℏ²π²/2mL²τ)(nx² + ny² + nz²)}
   = Σ_{nx,ny,nz=1}^∞ e^{−α²(nx² + ny² + nz²)},    (5.27)

where α² = ℏ²π²/(2mL²τ), and we exclude nx = 0, ny = 0, and nz = 0 terms as the wavefunction vanishes for those values.
If α² is sufficiently small, then the quantity inside the sum is slowly varying and we can approximate the sum as an integral,

Z1 ≈ ∫₀^∞ dnx ∫₀^∞ dny ∫₀^∞ dnz e^{−α²(nx² + ny² + nz²)}
   = (∫₀^∞ dn e^{−α²n²})³
   = ((1/α) ∫₀^∞ dx e^{−x²})³
   = (π^{1/2}/(2α))³
   = (mτ/2πℏ²)^{3/2} V = nQ V = nQ/n,    (5.28)

where n = 1/V and nQ = (mτ/2πℏ²)^{3/2} is called the quantum concentration. Note we
have set the integration lower bound to 0 with negligible error, as the integrand is
slowly varying—the difference between starting from 0 and 1 is small. It is sometimes
more common to define the thermal wavelength
nQ = 1/λ³,    λ = √(2πℏ²/mτ).    (5.29)
The thermal wavelength is approximately the de Broglie wavelength of an atom with
energy ⟨E⟩ ∼ τ. Specifically, λ ∼ ℏ/⟨p²⟩^{1/2} = ℏ/(2m⟨E⟩)^{1/2} ∼ ℏ/(mτ)^{1/2}. Note that λ is an
1/2 ∼ (mτ )1/2 . Note that λ is an

intrinsically quantum-mechanical quantity, due to the presence of the ℏ. In some


formulas, like the ideal gas law derived below, it will drop out. However, in other
cases λ will remain, signaling a quantum-mechanical effect that cannot be explained classically. Thus, nQ is the concentration associated with one atom in a cube of size
equal to the thermal average de Broglie wavelength.
The smallness of α is equivalent to the statement

L ≫ λ, (5.30)

i.e. the box is big compared to the thermal wavelength. When this condition holds,
we say that the gas is in the “classical regime.” An ideal gas is a gas of noninteracting
particles in the classical regime.
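The integral approximation in (5.28) can be checked directly against the triple sum (5.27). The sketch below works in units where α is the only parameter, so Z1 should approach (√π/(2α))³ once α ≪ 1, i.e. once the box is large compared to the thermal wavelength.

import numpy as np

def Z1_sum(alpha, nmax=2000):
    """Direct evaluation of the sum (5.27); it factorizes into a 1d sum cubed."""
    n = np.arange(1, nmax + 1)
    one_d = np.exp(-alpha**2 * n**2).sum()
    return one_d**3

def Z1_integral(alpha):
    """The classical-regime approximation (5.28)."""
    return (np.sqrt(np.pi) / (2 * alpha))**3

for alpha in [0.5, 0.1, 0.02]:
    print(alpha, Z1_sum(alpha), Z1_integral(alpha))
# The two agree better and better as alpha -> 0.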
The average energy of the atom in a box is
 
U = τ² ∂/∂τ log Z1 = τ² ∂/∂τ ((3/2) log τ) = (3/2) τ.    (5.31)
In D spatial dimensions, the answer would be
U = (D/2) τ.    (5.32)
This is a special case of a more general result called “equipartition of energy” that
we derive in section 8.

5.5.2 N particles and the Gibbs paradox


If we had N distinguishable particles in a box, then we could treat each particle like
a separate non-interacting system, and the partition function would be
ZN^{distinguishable} = Σ_{s1,...,sN} e^{−(Es1 + ··· + EsN)/τ}
                     = (Σ_s e^{−Es/τ})^N
                     = Z1^N.    (5.33)

This will turn out to be an incorrect expression for the ideal gas, for reasons
that we will explain. However, let us continue with it for now. The energy of the
ideal gas follows from the N -particle partition function,

U = τ² ∂/∂τ log ZN
  = (3/2) N τ.    (5.34)
The free energy is
F^{distinguishable} = −τ log ZN^{distinguishable}
                    = −τ N log(nQ V).    (5.35)


From here, we can compute the pressure


 
p = −(∂F/∂V)_τ = Nτ/V
pV = Nτ.    (5.36)

This is called the ideal gas law. Note that the thermal wavelength λ has dropped
out of this formula. In more traditional units,

pV = N kB T = Nmoles RT. (5.37)

[End of lecture 6(2024)]


We can also go on to compute the entropy

σ^{distinguishable} = −(∂F/∂τ)_V
                    = −∂/∂τ (−τ N log((mτ/2πℏ²)^{3/2} V))
                    = N log(nQ V) + (3/2) N.    (5.38)
Here, we used that τ log nQ ∝ (3/2) τ log τ. This formula has some surprising properties.
Firstly, note that nQ has not dropped out. The notion of counting the states requires
quantum mechanics.
Secondly, consider a gas with some fixed concentration of particles n = N/V , so
that the volume can be written V = N/n. The above formula becomes
σ^{distinguishable} = N log(nQ/n) + (3/2) N + N log N.    (5.39)
This is mysterious because it is not extensive — i.e. it is not proportional to N . In
particular, if this formula were true, then if we took two containers of identical gasses
at the same temperature and put them together, then the entropy would increase.
We could then bring the containers apart and return to our starting configuration.
More directly, this formula just doesn’t agree with experiment. This problematic
answer is called the “Gibbs paradox.”
The fix was understood by Gibbs before the advent of quantum mechanics.
However, it is a purely quantum mechanical effect with no classical analog. The
key point is that identical fundamental particles are indistinguishable. This means
that the multi-particle state |s1 , s2 , . . . , sN ⟩ is the same (up to a sign) under a
permutation of the labels,

|s1 , s2 , . . . , sN ⟩ = (−1)F |s2 , s1 , . . . , sN ⟩ (5.40)


Here, F = 0 if the particles are bosons and F = 1 if the particles are fermions.
Thus, for indistinguishable particles, the sum (5.33) over-counts the states. To
correctly count the states, we must divide by the number of permutations of the
labels s1 , . . . , sN . For example, if we have three bosonic particles in the follow-
ing states, then the distinguishable partition function overcounts by the following
factors:

|1, 1, 1⟩ → 1
|1, 1, 2⟩ → 3
|1, 2, 3⟩ → 6
|123, 14, 999⟩ → 6. (5.41)

If the particles are fermionic, then the first two states above actually vanish, while
the last two are again overcounted in Z distinguishable by a factor of 6.
If the number of particles is much smaller than the number of states si , then most
states have all labels si different (i.e. very few orbitals will be multiply occupied), so
the distinguishable partition function overcounts by a factor of N !. This is a good
approximation when we have a dilute gas, where multiple-occupancy is very rare.
We will see how to deal correctly with multiple-occupancy later.
Let us assume a dilute gas. Taking into account the indistinguishability of
fundamental particles, the correct partition function is
ZN^{indistinguishable} = (1/N!) Z1^N    (dilute gas).    (5.42)
The free energy is

F = −τ log ZN
= −τ N log (nQ V ) + τ log N !
= −τ N log (nQ V ) + τ (N log N − N ), (5.43)

where we have used Stirling’s approximation and only kept terms of size N or
N log N . The pressure computation is the same as before. However, now the result
for the entropy is different
 
∂F
σ=−
∂τ V
3
= N log(nQ V ) + N − N log N + N
  2
 
V 5
= N log nQ + . (5.44)
N 2
If we fix the concentration N/V , then this formula is now extensive. The extra factor
of 1/N! has solved the Gibbs paradox. Equation (5.44) is called the Sackur-Tetrode
equation, and it does agree with experiment.
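A numerical sketch contrasting (5.38) with (5.44): double N and V at fixed concentration and compare σ/N with and without the 1/N! factor. The concentration and the choice of units (nQ = 1) are arbitrary, for illustration only.

import numpy as np

def sigma_distinguishable(N, V, nQ=1.0):
    return N * np.log(nQ * V) + 1.5 * N                  # eq. (5.38)

def sigma_sackur_tetrode(N, V, nQ=1.0):
    return N * (np.log(nQ * V / N) + 2.5)                # eq. (5.44)

n = 0.01                                                 # fixed concentration N/V
for N in [1e3, 1e6, 1e9]:
    V = N / n
    print(N,
          sigma_distinguishable(N, V) / N,               # grows with N: not extensive
          sigma_sackur_tetrode(N, V) / N)                # independent of N: extensive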


6 Blackbody radiation and the Planck distribution


6.1 The Planck distribution
The Planck distribution describes the spectrum of electromagnetic radiation in a
cavity in thermal equilibrium. We can think of electromagnetic radiation as a gas
of photons which are essentially non-interacting. However there are two main dif-
ferences that will make the analysis different from the one in the previous section:
• Unlike in the case of the ideal gas, the gas of photons is not dilute. Each
orbital can be occupied by multiple photons.
• In addition, photon number is not conserved — states with any number of
photons should be considered accessible states.

6.1.1 Energy in a mode


An orbital in this case is an oscillation mode in the cavity. There can be zero or
more photons present in each mode. The energy of a single photon in a mode with
frequency ω is
ℏω, (6.1)
and the energy of s photons in this mode is
sℏω. (6.2)
Thus, the possible energy levels associated with a mode look just like energy levels
of a harmonic oscillator.
The partition function of a single mode is

X 1
Zmode = e−sℏω/τ = (6.3)
s=0
1 − e−ℏω/τ
ℏω
where y = τ . The average energy in the mode is

⟨E⟩ = τ 2 log Zmode
∂τ

= −τ 2 log(1 − e−ℏω/τ )
∂τ
e−ℏω/τ
 
2 ℏω
= −τ − 2
τ 1 − e−ℏω/τ
ℏω
= ℏω/τ . (6.4)
e −1
When ℏω ≪ τ , we have simply ⟨E⟩ ≈ τ . When ℏω ≫ τ , ⟨E⟩ ≈ 0, and in fact it
is exponentially suppressed. This suppression is due to the fact that the minimum
energy in the mode is ℏω, and the probability of having this energy is suppressed
due to a Boltzmann factor.


6.1.2 What are photons?


Let us explain where this picture comes from. In quantum electrodynamics, an
oscillating electric field is best understood in terms of its relation to a vector potential
A:

E = −∂A/∂t
B = ∇ × A.    (6.5)

We can decompose the vector potential into spatial modes


A = Σ_k (qk(t) cos(k · x) + rk(t) sin(k · x)).    (6.6)

The fields are


E = Σ_k (−q̇k(t) cos(k · x) − ṙk(t) sin(k · x))
B = Σ_k (−k × qk(t) sin(k · x) + k × rk(t) cos(k · x)).    (6.7)

Plugging these expressions into the energy of an EM field, we have


E = (1/2) ∫ d³x (E² + c²B²)
  ∝ Σ_k (1/2)(q̇k² + c²k²qk²) + Σ_k (1/2)(ṙk² + c²k²rk²).    (6.8)

This is the Hamiltonian for a bunch of oscillators with frequency ω = c|k|. In


quantum field theory, each classical oscillator gets promoted to a quantum oscillator,
and the different energy levels of the oscillators are interpreted as photons.

6.1.3 Modes in a cavity


There are an infinite number of modes of the electromagnetic field, and each one
can be occupied by any number of photons. Let us solve Maxwell’s equations to
find the radiation modes in a cubical cavity of size L × L × L.
Because Maxwell’s equations are linear with time-independent coefficients, they
can be solved by going to Fourier space in the time variables:
E(x, t) = ∫_{−∞}^∞ dω e^{iωt} Ẽ(x, ω),
B(x, t) = ∫_{−∞}^∞ dω e^{iωt} B̃(x, ω).    (6.9)


Because E and B must be real, we have the condition Ẽ(x, ω)* = Ẽ(x, −ω) and similarly for B̃.
Maxwell's equations are also translationally-invariant, so the functions Ẽ(x, ω) and B̃(x, ω) can be written as linear combinations of plane waves e^{ik·x}. However, we must impose boundary conditions, and this will require us to take special linear combinations of e^{ik·x} that become sines and cosines. We have two kinds of boundary conditions:

• Firstly, E∥ = 0. This follows from the equation ∇ × E = −∂B/∂t. We integrate this equation over the interior of a very thin rectangular loop with one edge inside the conductor and one edge outside. The right-hand side is zero because the rectangle has zero area. The left-hand side is (E∥)out − (E∥)in, by Stokes’ theorem. Finally, we use the fact that electric fields vanish inside a conductor to conclude that (E∥)out = 0.

• The other boundary condition is B⊥ = 0. To derive this, we start from the equation ∇ · B = 0. Integrating this over a small “pill box” with a face inside the conductor and another face outside the conductor, we find (B⊥)in = (B⊥)out. Now the dynamic part of the magnetic field inside a conductor vanishes. The reason is that the electric and magnetic fields are related by

  ∇ × E = −∂B/∂t.    (6.10)
Thus, a changing magnetic field inside a conductor implies the presence of an
electric field, which is not allowed. A conductor can contain a static magnetic
field, but we can ignore this in our analysis. All our modes will be oscillating
at some nonzero frequency ω and do not include static contributions.

A linear combination of plane waves that satisfies the above boundary conditions
is
Ẽx(x, ω) = Ex0 cos(nxπx/L) sin(nyπy/L) sin(nzπz/L)
Ẽy(x, ω) = Ey0 sin(nxπx/L) cos(nyπy/L) sin(nzπz/L)
Ẽz(x, ω) = Ez0 sin(nxπx/L) sin(nyπy/L) cos(nzπz/L),    (6.11)
where n = (nx , ny , nz ) is a vector of positive integers. It is hopefully easy to see
that (6.11) obeys the boundary condition E∥ = 0. To see that it obeys B⊥ = 0, we
can use (6.10). For example,
Ḃx = ∂yEz − ∂zEy ∝ sin(nxπx/L) cos(nyπy/L) cos(nzπz/L),    (6.12)


which vanishes at x = 0.
The components of the electric field are not independent. We also have to impose
∇ · E = ∂Ex/∂x + ∂Ey/∂y + ∂Ez/∂z = 0.    (6.13)
This gives the condition
Ex0 nx + Ey0 ny + Ez0 nz = E0 · n = 0, (6.14)
where we have defined E0 = (Ex0 , Ey0 , Ez0 ). For any given n, equation (6.14) says
that fluctuations of the electric field must be perpendicular to n. The two perpen-
dicular directions are the two possible polarizations of electromagnetic radiation.
Finally, we also have the wave equation, which says
c² ∇²E = ∂²E/∂t².    (6.15)

Plugging in (6.9) with Ẽ(x, ω) given by (6.11), we find a relationship between ω and n:

c²π²n²/L² = ω².    (6.16)

6.1.4 Energy of a photon gas and the Planck distribution


Now that we know the modes of the electromagnetic field in a cavity and the parti-
tion function (6.3) of a single mode, we can write down the partition function of a
photon gas. Because each mode is independent, the partition function is
Z(τ) = ∏_{n,polarizations} Zmode(n, τ)
     = ∏_n Zmode(n, τ)²
     = ∏_n 1/(1 − e^{−ℏωn/τ})².    (6.17)
Let us take the logarithm, which turns the product into a sum:
log Z = −2 Σ_n log(1 − e^{−ℏωn/τ}).    (6.18)

We return to this quantity in a moment. For now, let us study the energy U of
photons in the cavity. We have

U = τ² ∂/∂τ log Z
  = 2 Σ_n ℏωn/(e^{ℏωn/τ} − 1).    (6.19)


We now make a “big box” approximation, so that the quantity ℏωn/τ is slowly-varying with n. In more detail, our assumption is ℏπc/(Lτ) ≪ 1. The quantity ℏc/τ can be thought of as the thermal wavelength of a relativistic particle — it is the typical wavelength at temperature τ. We are making the assumption that L ≫ ℏc/τ, so that the size of the box is large compared to the typical wavelength of radiation in the box. Under our assumption, the sum Σ_n turns into an integral

U ≈ 2 · (1/8) ∫₀^∞ 4πn² dn ℏωn/(e^{ℏωn/τ} − 1).    (6.20)
Let us change variables to the frequency ω = πcn/L. We find

U/V = ∫₀^∞ dω (ℏ/π²c³) ω³/(e^{ℏω/τ} − 1),    (6.21)

where V = L³. The quantity in the integrand

uω = (ℏ/π²c³) ω³/(e^{ℏω/τ} − 1)    (6.22)
has the interpretation as the energy density of electromagnetic radiation per unit
frequency range. This is the Planck radiation law.
[End of lecture 7(2024)]
In the small ω limit, the Planck law becomes

uω ≈ ω²τ/(π²c³),    (ℏω ≪ τ)    (6.23)
This is the Rayleigh-Jeans law. The ℏ’s have cancelled and the resulting formula
can be understood purely classically. A problem with the Rayleigh-Jeans law is
that if we extrapolate it up to high energies, the spectral density continues to grow,
and we find that the energy density is divergent. This is the so-called “ultraviolet
catastrophe” that was fixed by the advent of quantum mechanics. The catastrophe
is avoided in the Planck distribution because high frequency modes are unoccupied
due to the fact that they have a minimum energy ℏω (the energy of a single photon
in the mode). This leads to exponential suppression by the Boltzmann factor.
Let us finally integrate (6.21) to get the total energy density

U/V = (ℏ/π²c³)(τ/ℏ)⁴ ∫₀^∞ dx x³/(e^x − 1).    (6.24)


The integral is

∫₀^∞ dx x³/(e^x − 1) = ∫₀^∞ dx x³ (e^{−x} + e^{−2x} + e^{−3x} + . . .)
                     = Γ(4) (1 + 1/2⁴ + 1/3⁴ + . . .)
                     = 6ζ(4)
                     = 6 π⁴/90,    (6.25)
where ζ(s) is the Riemann-ζ function.16 Altogether, we have

U/V = (ℏ/π²c³)(τ/ℏ)⁴ · 6π⁴/90
    = (π²/15c³ℏ³) τ⁴
    = ατ⁴.    (6.26)

The fact that the energy density of radiation is proportional to τ 4 is the Stefan-
Boltzmann law.
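The integral (6.25) is easy to check numerically; here is a simple sketch using the same term-by-term expansion as above together with a direct midpoint-rule integration.

import numpy as np

# Term by term, as in (6.25): each term integrates to Gamma(4)/k^4 = 6/k^4.
series = 6 * sum(1.0 / k**4 for k in range(1, 10001))

# Direct numerical integration on a grid (the integrand is negligible beyond x ~ 50).
dx = 1e-3
x = np.arange(dx / 2, 50, dx)
direct = np.sum(x**3 / np.expm1(x)) * dx

print(series, direct, np.pi**4 / 15)   # all three agree, ~6.4939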

6.1.5 Energy flux


Consider a cubical cavity of radiation, and let us cut a small hole of area A in the
cavity. The amount of radiation that escapes in a time ∆t depends on the direction
the radiation is moving. Specifically, if the angle between the wave vector and the
normal to the hole is θ, then the radiation in a column of length

c∆t cos θ (6.27)

escapes the hole.


The average volume of radiation that escapes the hole is given by averaging over
solid angles. A solid-angle element is dϕ dθ sin θ. We integrate only over θ ∈ [0, π/2] because only right-moving radiation can escape the cavity. We divide by 4π, which is the volume of S²:

Vescape = (1/4π) Ac∆t ∫₀^{π/2} dθ ∫₀^{2π} dϕ sin θ cos θ = Ac∆t/4.    (6.28)
16
If you haven’t heard of it, definitely read about the ζ-function, one of the most (in)famous
functions in all of mathematics: https://en.wikipedia.org/wiki/Riemann_zeta_function. For
a history and some derivations of values of the zeta function at even integers, see https://en.
wikipedia.org/wiki/Basel_problem.


Thus, the flux of energy per unit area per unit time is

JU = (Vescape/A∆t)(U/V) = (c/4) ατ⁴ = π²τ⁴/(60ℏ³c²) = σB T⁴,
σB = π²kB⁴/(60ℏ³c²).    (6.29)

6.2 Kirchhoff ’s Law


Definition (black). An object is defined to be black in a given frequency range if
all EM radiation incident on it in that frequency range is absorbed.
A small hole in a cavity is black because the radiation will enter the cavity and
bounce around many times, being absorbed by the walls long before it escapes back
out through the hole.
The radiant energy JU from a black surface is equal to the energy flux density
from a hole in a cavity at the same temperature. To see this, cover the hole with
the surface. All the energy radiated from the hole is absorbed by the surface. But
to achieve thermal equilibrium, the same amount of energy must be radiated from
the surface back into the hole.
Definition (absorptivity). An object has absorptivity a ∈ [0, 1] if it absorbs a
fraction a of the radiation incident upon it.

Definition (emissivity). An object has emissivity e ∈ [0, 1] if the energy flux from
the object is e times that of a black body.
Realistic objects are not black and have some absorptivity/emissivity less than
1. The above argument about radiation flux from a black body generalizes to show
that absorptivity and emissivity are always equal, a = e. Consider covering the hole
in a cavity with an object with absorptivity a at temperature τ . The object absorbs
an energy flux aJUblack . In order for thermal equilibrium to be maintained, it must
emit the same energy flux aJUblack , implying a = e. This is called Kirchhoff’s law.
The law holds as a function of frequency as well, a(ω) = e(ω).

6.3 Planck distribution, temperature, and CMB


The Planck law can be used to compute the temperature of an object. One looks for the peak of the energy density in frequency space. This occurs at ℏω/τ ≈ 2.82.
An example is the cosmic microwave background (CMB): the universe is filled
with approximately black-body radiation at temperature 2.73K, see figure 2. This
radiation was originally generated when the entire universe was a hot interacting
plasma of electrons and protons strongly interacting with the radiation. When the
temperature of the radiation cooled to 3000K, most of the electrons and protons
got bound into Hydrogen. This is called “recombination.” After recombination, only


Figure 2: This xkcd comic (https://xkcd.com/54/) shows a plot of the energy density of the CMB as a function of frequency. The dots represent data from the COBE satellite, and the curve is the Planck distribution (with ω = 2πf and τ = kT).

specific frequencies of EM radiation could interact strongly with matter, so most of


the photons decoupled from matter. The gas of photons then underwent isentropic
expansion.
Definition (isentropic). We say that a process is isentropic if entropy remains
constant during the process.

6.4 Partition function, entropy, and pressure of photon gas


Let us return to the log of the partition function (6.18). We can make the big-box
approximation and turn it into an integral as before. We will be left with another rather complicated looking integral ∫ dx x² log(1 − e^{−x}). This is doable using the same tricks as before.


However, instead of going through all this, let us simply integrate our expression for U = ⟨E⟩ = τ² ∂τ log Z:

log Z = ∫₀^τ dτ′ U(τ′)/τ′² = αV τ³/3.    (6.30)

The integration constant vanishes because when τ = 0, all modes are in their ground state, and we see explicitly from (6.17) that Z(τ = 0) = 1. The free energy is

F = −τ log Z = −αV τ⁴/3.    (6.31)
The entropy is

σ = −(∂F/∂τ)_V = 4αV τ³/3.    (6.32)

Note that this is consistent with F = U − τ σ.


If a gas of photons undergoes isentropic expansion, the temperature scales like V^{−1/3} = L^{−1}, where L = V^{1/3}. This makes physical sense. Under isentropic
expansion, the occupancy of every mode remains the same. The only effect of
expansion L → γL is that the wave vectors rescale by k → k/γ, and consequently
the frequencies and thus the energies in each mode scale like ω → ω/γ. To keep the
Boltzmann factors unchanged, we must have τ → τ /γ as well.
The pressure of the photon gas is
 
p = −(∂F/∂V)_τ = (1/3) ατ⁴ = (1/3) U/V.    (6.33)

Together with the energy density U/V, this quantity plays an important role in cosmology because it controls the way that radiation affects the evolution of the geometry of the universe.
The factor of 1/3 in (6.33) reflects an interesting property of electromagnetic
radiation: conformal symmetry. A conformally-symmetric system has a traceless
stress-energy tensor. The stress-energy tensor of a stationary perfect fluid is given
by
 
Tµ^ν = diag(−ρ, p, p, p),    (6.34)

where ρ = U/V is the energy density. Thus, tracelessness of T implies (6.33).


7 Debye theory
A solid has vibrational modes. A quantum of energy of a vibrational mode of a
solid is called a phonon. In this section, we study the thermodynamics of a solid
by modeling it as a gas of phonons. Phonons have some similarities to photons,
so certain parts of the analysis of phonons will parallel our analysis of photons.
However, there are also some important differences.
One similarity is that in the limit of an infinitely large system, there can be
phonons with very low energy. In the case of the electromagnetic field, this followed
from the dispersion relation ω = c|k|, where k = (π/L)n is the wavevector. In the limit that L is very large, there are wavevectors k with very small magnitude (i.e. modes with very long wavelength), and the energy E = ℏω = ℏc|k| of quanta in those modes is small.
Phonons also satisfy a dispersion relation of the same form ω = v|k|, but this
time v is the speed of sound in the solid. In particular, note that we have arbitrarily
low-energy phonons in an arbitrarily large solid L → ∞, since the wave number |k| can get arbitrarily small in that case. This is a consequence of translational
symmetry. The argument is as follows: As the wavelength of a phonon gets longer,
it looks locally more and more like a translation of the lattice. Translation doesn’t
cost any energy because it is a symmetry. Thus, the energy of a phonon should
approach zero as k → 0. An excitation whose long-wavelength limit is controlled by
a symmetry is called a goldstone boson, and a phonon is an example of a goldstone
boson for translational symmetry.
[End of lecture 8(2024)]
An important difference between photons and phonons is as follows. There are
an infinite number of electromagnetic modes in any fixed volume. However, the
number of phonon modes in a solid is bounded. The reason is that a solid is made
up of N atoms. So the number of position coordinates of the atoms is 3N and
the number of momentum coordinates is 3N . The normal modes are obtained by
writing the Hamiltonian
Hsolid = Σ_i pi²/2m + (1/2) m Σ_{i,j} Ωij xi xj    (7.1)

and diagonalizing the matrix Ωij . The dimension of the space of modes is unchanged
by the diagonalization, and hence the total number of modes is still 3N .
Example 7.1 (1d lattice). Another way to say this is that wavelengths smaller
than the lattice spacing are redundant with other wavelengths. For simplicity, con-
sider a 1-dimensional lattice with lattice spacing a. A mode with number n is equivalent to a mode with number n + 2mL/a for m ∈ Z because


sin(nπx/L) = sin((n + 2mL/a)πx/L),    (7.2)
for all x ∈ aZ in the lattice. (In signal processing, this phenomenon is related to
the Nyquist limit.) Note that L/a = N , the number of atoms in the lattice, so this
redundancy is n ∼ n + 2N , where “∼” means “is equivalent to.” Furthermore, we
have n ∼ −n. Thus, the set of inequivalent mode numbers is n = 1, . . . , N . There
are N of them.
A vibrational wave has three polarizations: two transverse polarizations and one longitudinal polarization (think of a slinky). Thus, the sum over modes takes the form

Σ_{n,polarizations} ≈ 3 · (1/8) ∫ 4πn² dn.    (7.3)

Following Debye, let us suppose that n has a uniform cutoff nD and determine nD
by solving the equation
(3/8) ∫₀^{nD} 4πn² dn = (π/2) nD³ = 3N
nD = (6N/π)^{1/3}.    (7.4)
The calculation of the energy is the same as for photons. Now we have ω = v|k|,
where v is the speed of sound. The speed of sound can actually be different for
transverse and longditudinal polarizations in general, but we will assume it is the
same for simplicity. As before k = πn/L is a wavevector. The energy is
\[
U = \sum_\text{modes} \frac{\hbar\omega}{e^{\hbar\omega/\tau}-1}
= \frac{3}{8}\, 4\pi \int_0^{n_D} dn\, n^2\, \frac{\hbar\omega_n}{e^{\hbar\omega_n/\tau}-1}. \qquad (7.5)
\]
We have ω_n = (πv/L)n, where v is the velocity of sound (which we assume is the same
for all polarizations, for simplicity). Thus, the energy can be written
\[
U = \frac{3\pi^2\hbar v}{2L}\left(\frac{\tau L}{\pi\hbar v}\right)^4\int_0^{x_D} dx\, \frac{x^3}{e^x-1}, \qquad (7.6)
\]
where
\[
x_D = \frac{\theta}{\tau}, \qquad
\theta = \hbar v\left(\frac{6\pi^2 N}{V}\right)^{1/3} \qquad (7.7)
\]
is the Debye temperature. The integral can be done analytically at low tempera-
tures, where xD → ∞. The result is

\[
U \approx \frac{3\pi^4 N \tau^4}{5\theta^3}. \qquad (7.8)
\]
The heat capacity is

\[
C_V = \left(\frac{\partial U}{\partial\tau}\right)_V = \frac{12\pi^4 N}{5}\left(\frac{\tau}{\theta}\right)^3, \qquad (7.9)
\]

which matches experiment pretty well. This is called the Debye τ³ law.
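The Debye integral is easy to evaluate numerically. Here is a minimal sketch (in Python) that checks both the low-temperature τ³ law (7.9) and the high-temperature Dulong–Petit limit C_V → 3N. The rewriting of (7.6)–(7.7) in the form U = 9Nτ(τ/θ)³∫₀^{θ/τ} dx x³/(eˣ−1) is an algebraic rearrangement (not stated explicitly above), and the value θ = 1 is just a placeholder:

```python
import numpy as np
from scipy.integrate import quad

def debye_energy(tau, theta, N=1.0):
    """U(tau) from (7.6)-(7.7), rewritten as U = 9 N tau (tau/theta)^3 * int_0^{theta/tau} x^3/(e^x - 1) dx."""
    integral, _ = quad(lambda x: x**3 / np.expm1(x), 0, theta / tau)
    return 9 * N * tau * (tau / theta)**3 * integral

def heat_capacity(tau, theta, N=1.0, dtau=1e-5):
    # numerical derivative C_V = dU/dtau
    return (debye_energy(tau + dtau, theta, N) - debye_energy(tau - dtau, theta, N)) / (2 * dtau)

theta = 1.0  # Debye temperature in arbitrary units (placeholder value)
for tau in [0.02, 0.05, 0.1]:
    print(tau, heat_capacity(tau, theta), 12 * np.pi**4 / 5 * (tau / theta)**3)  # compare with (7.9), N = 1
print("high temperature:", heat_capacity(20.0, theta), "vs Dulong-Petit value 3N =", 3.0)
```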


8 Classical Statistical Mechanics


We have seen that there are some quantities, like the partition function and the entropy,
from which quantum mechanical effects do not completely decouple. However,
it is still possible to approximate these quantities using classical calculations with a
small amount of input from quantum mechanics. This stems from a general prin-
ciple that in the classical limit, numbers of states can be approximated by volumes
of phase space. Recall that phase space is the space of positions and momenta of
particles. For example, the classical phase space of a single particle in one dimension
is specified by (x, p) ∈ ℝ².
Let us study a single quantum mechanical particle moving in 1 dimension, with
Hamiltonian
\[
\hat{H} = \frac{\hat{p}^2}{2m} + V(\hat{x}). \qquad (8.1)
\]
I will try to carefully distinguish quantum operators from classical numbers using
hats. We have seen that the partition function can be written
\[
Z = \mathrm{Tr}\left(e^{-\beta\hat H}\right) = \int_{-\infty}^{\infty} dx\, \langle x|e^{-\beta\hat H}|x\rangle, \qquad (8.2)
\]
where in the second equality we chose the position basis.


Now, the partition function is complicated because we have the exponential of
this operator involving x̂ and p̂, which of course do not commute
\[
[\hat x, \hat p] = i\hbar. \qquad (8.3)
\]
The exponential of a sum of non-commuting matrices can be written
\[
e^{\hat A + \hat B} = e^{\hat A}\, e^{\hat B}\, e^{-\frac{1}{2}[\hat A,\hat B] + \ldots}, \qquad (8.4)
\]
where ... represents higher nested commutators $[[\hat A,\hat B],\hat B]$, etc. (This is a form of
the Baker-Campbell-Hausdorff formula.)¹⁷

¹⁷ As a sanity check on this formula, let us expand the left-hand side:
\[
\begin{aligned}
e^{\hat A+\hat B} &= 1 + \hat A + \hat B + \tfrac{1}{2}\left(\hat A^2 + \hat A\hat B + \hat B\hat A + \hat B^2\right) + \ldots \\
&= 1 + \hat A + \hat B + \tfrac{1}{2}\hat A^2 + \hat A\hat B + \tfrac{1}{2}\hat B^2 - \tfrac{1}{2}[\hat A,\hat B] + \ldots \\
&= \left(1 + \hat A + \tfrac{1}{2}\hat A^2 + \ldots\right)\left(1 + \hat B + \tfrac{1}{2}\hat B^2 + \ldots\right)\left(1 - \tfrac{1}{2}[\hat A,\hat B] + \ldots\right) \qquad (8.5)
\end{aligned}
\]
The claim is that if we include higher-order terms, we can always group the discrepancy between
$e^{\hat A+\hat B}$ and $e^{\hat A}e^{\hat B}$ into nested commutators.


However, the classical limit is defined as the limit in which ℏ → 0. (Because ℏ is


a dimensionful quantity, this really means that other dimensionful quantities with
units of energy×time (i.e. action) are large compared to ℏ.) In this limit, we have
\[
e^{-\beta\frac{\hat p^2}{2m}-\beta V(\hat x)} \approx e^{-\beta\frac{\hat p^2}{2m}}\, e^{-\beta V(\hat x)}. \qquad (8.6)
\]

Plugging this into our trace, we get


\[
\begin{aligned}
Z &\approx \int_{-\infty}^{\infty} dx\, \langle x|e^{-\beta\frac{\hat p^2}{2m}}\, e^{-\beta V(\hat x)}|x\rangle
= \int_{-\infty}^{\infty} dx\, \langle x|e^{-\beta\frac{\hat p^2}{2m}}|x\rangle\, e^{-\beta V(x)} \\
&= \int_{-\infty}^{\infty} dx\, dp\, \langle x|e^{-\beta\frac{\hat p^2}{2m}}|p\rangle\langle p|x\rangle\, e^{-\beta V(x)}
= \int_{-\infty}^{\infty} dx\, dp\, \langle x|p\rangle\langle p|x\rangle\, e^{-\beta\frac{p^2}{2m}-\beta V(x)}. \qquad (8.7)
\end{aligned}
\]

But we also have


\[
\langle x|p\rangle = A\, e^{ipx/\hbar}. \qquad (8.8)
\]

This is the wavefunction for a momentum eigenstate. The factor A is determined


(up to a phase) by demanding that a momentum eigenstate be properly normalized
\[
\begin{aligned}
\langle p'|p\rangle &= \int_{-\infty}^{\infty} dx\, \langle p'|x\rangle\langle x|p\rangle
= \int_{-\infty}^{\infty} dx\, |A|^2 e^{ix(p-p')/\hbar} \\
&= 2\pi\hbar\, |A|^2\, \delta(p-p') \quad\Longrightarrow\quad |A|^2 = \frac{1}{2\pi\hbar}. \qquad (8.9)
\end{aligned}
\]
Thus, finally we have
\[
Z \approx \frac{1}{2\pi\hbar}\int dx\, dp\; e^{-\beta\frac{p^2}{2m}-\beta V(x)}
= \frac{1}{2\pi\hbar}\int dx\, dp\; e^{-\beta H(x,p)}, \qquad (8.10)
\]
where H(x, p) is the classical Hamiltonian.
The lesson is that a sum over quantum mechanical states can be approximated
by an integral over phase space, where one quantum state has phase-space volume
2πℏ for each pair (x, p). For example, in (N+N)-dimensional phase space, a quantum
state has volume (2πℏ)^N,
\[
\mathrm{Tr}\left(e^{-\beta\hat H}\right) \;\longrightarrow_{\text{classical}}\; \int \frac{d^N x\, d^N p}{(2\pi\hbar)^N}\; e^{-\beta H(p_1,\ldots,p_N, x_1,\ldots,x_N)}. \qquad (8.11)
\]


Thus, a little bit of quantum mechanics is still necessary to make sense of a sum
over states. However, once we know the density of states in phase space, we can
compute various quantities in a classical approximation.
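As a sanity check on (8.10), we can compare the phase-space integral with the exact quantum partition function in a case where both are known in closed form: the harmonic oscillator H = p²/2m + mω²x²/2. The following minimal sketch (with ℏ = m = ω = 1) shows that the two agree in the classical regime βℏω ≪ 1 and differ at low temperature:

```python
import numpy as np

# Harmonic oscillator with hbar = m = omega = 1.
def Z_quantum(beta):
    # sum_n exp(-beta (n + 1/2)) = 1 / (2 sinh(beta/2))
    return 1.0 / (2.0 * np.sinh(beta / 2.0))

def Z_classical(beta):
    # (1 / 2 pi hbar) * integral dx dp exp(-beta H) = 1 / (beta hbar omega)
    return 1.0 / beta

for beta in [0.01, 0.1, 1.0, 5.0]:
    print(beta, Z_quantum(beta), Z_classical(beta))
```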
Example 8.1 (Classical ideal gas). As an application, let us recover our formula
for the partition function of the ideal gas in the classical regime. In this case, we
have V (x) = 0, so the single-particle partition function is
\[
\begin{aligned}
Z_1 &= \frac{1}{(2\pi\hbar)^3}\int d^3\vec x\, d^3\vec p\; e^{-\beta\frac{\vec p^{\,2}}{2m}}
= \frac{V}{(2\pi\hbar)^3}\left(\int_{-\infty}^{\infty} dp\, e^{-\beta\frac{p^2}{2m}}\right)^3 \\
&= \frac{V}{(2\pi\hbar)^3}\left(\frac{2\pi m}{\beta}\right)^{3/2} = V\left(\frac{m\tau}{2\pi\hbar^2}\right)^{3/2}. \qquad (8.12)
\end{aligned}
\]

Once we have the single-particle partition function, we can compute the full partition
function via
\[
Z_N = \frac{1}{N!}\, Z_1^N, \qquad (8.13)
\]
and this agrees with our previous analysis.
A nice property of the classical phase space integral is that it is simple and
flexible: we could have a box with a complicated shape, and we don’t have to worry
about finding the quantum wavefunctions inside this shape and figuring out the
number of states at each energy level. The contribution from the d3 x integral is
always V . This means that we have actually improved our previous analysis where
we had to assume a cubical box. We now see that the thermodynamics of the gas
doesn’t depend on the shape of the box, in the classical regime.
Example 8.2 (Density of eigenvalues of the Laplacian). We can turn this
observation around and deduce an interesting mathematical result. Consider a box
with an arbitrary shape. The wavefunctions for energy eigenstates inside the con-
tainer are solutions of the time-independent Schrodinger equation

\[
-\frac{\hbar^2\vec\nabla^2}{2m}\,\psi(x) = E\,\psi(x) \qquad (8.14)
\]
with the boundary condition ψ(x)|boundary = 0 (called a Dirichlet boundary condi-
tion). Thus, the energies are given by Ei = ℏ2 λi /2m, where λi are the eigenvalues
of the Laplacian −∇ ⃗ 2 with Dirichlet boundary conditions. For a general region,

the spectrum of −∇2 can be quite complicated, see e.g. https://en.wikipedia.
org/wiki/Hearing_the_shape_of_a_drum. However, our result (8.12) encodes a
universal answer for the asymptotic density of large eigenvalues.


To see this, let us write the single-particle partition function as an integral over
eigenvalues of the Laplacian
\[
Z_1 = \sum_i e^{-E_i/\tau} = \sum_i e^{-\hbar^2\lambda_i/(2m\tau)} = \int d\lambda\, \rho(\lambda)\, e^{-\hbar^2\lambda/(2m\tau)}. \qquad (8.15)
\]
In the last equality, we introduced the density of eigenvalues $\rho(\lambda) = \sum_i \delta(\lambda-\lambda_i)$. In the
classical limit ℏ → 0, the exponential decays slowly and the integral is dominated
by large λ. Let us suppose that ρ(λ) behaves like a power law at large λ,18

\[
\rho(\lambda) \sim a\lambda^b \quad (\text{large }\lambda). \qquad (8.16)
\]

In the classical limit, the integral becomes


\[
Z_1 \approx a\,\Gamma(b+1)\left(\frac{2m\tau}{\hbar^2}\right)^{b+1} \quad (\hbar\to 0). \qquad (8.17)
\]

Comparing to (8.12), we can solve for a and b to find

\[
\rho(\lambda) \sim \frac{V}{4\pi^2}\,\lambda^{1/2} \quad (\text{large }\lambda). \qquad (8.18)
\]
This is Weyl’s law https://en.wikipedia.org/wiki/Weyl_law in 3-dimensions.
Can you derive the d-dimensional version?
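For a cube the Dirichlet eigenvalues are known explicitly, λ = (π/L)²(n_x² + n_y² + n_z²), so we can count them directly and compare with the integrated form of (8.18), ∫₀^Λ ρ(λ) dλ = VΛ^{3/2}/(6π²). A minimal sketch (the cutoff Λ is arbitrary; the agreement improves as Λ grows, since Weyl's law is an asymptotic statement):

```python
import numpy as np

L, lam_max = 1.0, 4000.0
nmax = int(np.sqrt(lam_max) * L / np.pi) + 1
n = np.arange(1, nmax + 1)
nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
lam = (np.pi / L)**2 * (nx**2 + ny**2 + nz**2)

count = np.sum(lam < lam_max)                    # exact number of eigenvalues below lam_max
weyl = L**3 * lam_max**1.5 / (6 * np.pi**2)      # integrated Weyl's law estimate
print(count, weyl, count / weyl)
```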

[End of lecture 9(2024)]

8.1 Equipartition of energy


Equipartition of energy states that in the classical limit, every degree of freedom,
defined as a quadratic term in the Hamiltonian, has average energy τ/2. This is simple to
prove using phase space integrals. Suppose that the Hamiltonian has the form
H = yᵀAy, where y is a vector of phase space variables (x's and p's), and A is
some positive-definite matrix. The partition function is
\[
Z(\beta) = \int d^N\vec y\; e^{-\beta\, y^T A y} = \frac{1}{\beta^{N/2}}\int d^N\vec y\; e^{-y^T A y} = \frac{\text{const.}}{\beta^{N/2}}. \qquad (8.19)
\]
The average energy is
 
\[
U = -\frac{\partial}{\partial\beta}\log Z = -\frac{\partial}{\partial\beta}\left(-\frac{N}{2}\log\beta\right) = \frac{N}{2}\,\tau. \qquad (8.20)
\]
18
In reality, ρ(λ) is a sum of delta functions, so cannot be a power law. However, if we smooth it
out a little bit (for example by averaging over a window), then it becomes a power law in the large
λ limit.


In practice, the Hamiltonian is never exactly quadratic. Equipartition of energy


applies when the quadratic approximation to the Hamiltonian is good over the
entire region of phase-space that dominates the partition function integral. As an
example, in the monatomic ideal gas, the Hamiltonian is quadratic in momenta to
extremely high precision. However, the Hamiltonian is not well-approximated as
being quadratic in position — instead the dependence on position is some function
that changes sharply at the boundary of the box. Thus, we must separate out the
position variables when computing Z(β), which is why equipartition only applies to
the three momenta in this case.
Furthermore, a quantum harmonic oscillator needs sufficiently high temperature
in order to have ⟨E⟩ = τ , so equipartition of energy only applies to those degrees of
freedom that are in the classical regime, i.e. such that τ ≫ ℏω. For temperatures
τ ≪ ℏω (the quantum regime), a quantum harmonic oscillator is exponentially likely
to be in its ground state, and so has average energy ⟨E⟩ ≈ 0.
Example 8.3 (Monatomic gas). The equipartition theorem gives the energy of a
monatomic gas in the classical regime almost instantly. Consider a gas of N particles.
The Hamiltonian is $\frac{1}{2m}\sum_{i=1}^N \vec p_i^{\,2}$. This has 3N quadratic degrees of freedom, so the
average energy is U = (3/2)Nτ and the heat capacity is C = (3/2)N. This recovers our
earlier result (5.34).
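Here is a small Monte Carlo check of equipartition for a generic quadratic Hamiltonian H = yᵀAy (a sketch; the positive-definite matrix A below is a random example, not anything specific from these notes). The Boltzmann weight e^{-βyᵀAy} is a Gaussian with covariance (2βA)⁻¹, so we can sample it directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6                                  # number of phase-space variables
M = rng.normal(size=(N, N))
A = M @ M.T + N * np.eye(N)            # a random positive-definite matrix

tau = 2.0                              # temperature (k_B = 1)
cov = np.linalg.inv(2 * A / tau)       # covariance of the Boltzmann-distributed y
y = rng.multivariate_normal(np.zeros(N), cov, size=200_000)
energies = np.einsum("si,ij,sj->s", y, A, y)
print(energies.mean(), "expected N*tau/2 =", N * tau / 2)
```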
Example 8.4. The equipartition theorem also gives a useful qualitative under-
standing of the heat capacity of a diatomic gas. Diatomic molecules have

• 3 momenta,

• 2 rotation angles,

• 1 longitudinal vibration mode.

To apply equipartition of energy, we must decide which of these degrees of freedom


is in the classical regime.
Assuming we have a big box, the momentum degrees of freedom are in the
classical regime, and they contribute an energy per molecule of (3/2)τ.
The rotation degrees of freedom are described by the quantum rigid rotor that
you studied on your homework set. The rotor has a characteristic energy scale
Erot = ℏ2 /2I, where I is the moment of inertia of the molecule. When τ is below
this value, the rotational degrees of freedom are in the quantum regime, and when τ
is above this value, the rotational degrees of freedom are in the classical regime. In
the classical regime, the rotor has two quadratic degrees of freedom, which are the
momenta conjugate to the rotation angles. Thus, the rotational degrees of freedom
contribute energy 2 × τ/2 = τ per molecule in the classical regime.
Finally, the longitudinal vibration mode is in the classical regime if τ ≫ ℏω,
where ω is the characteristic frequency of oscillation. In this regime, there are two
quadratic degrees of freedom — one position and one momentum variable for the
harmonic oscillator. Thus longitudinal vibrations contribute energy 2 × τ/2 = τ per


molecule in the classical regime.
The energy scales associated with spatial momenta, rotations, and longitudinal
vibrations are usually ordered as

Emom < Erot < Evib . (8.21)

Thus, the heat capacity of a diatomic gas as a function of temperature looks like
figure 3.

Figure 3: Heat capacity per molecule Ĉ = C/N for a diatomic gas, as a function of
temperature. At low temperatures, only translational degrees of freedom are in the
classical regime and Ĉ = 3/2. At intermediate temperatures, both translations and
rotations are classical and Ĉ = 3/2 + 1 = 5/2, and at high temperatures, translations,
rotations, and vibrations are in the classical regime and Ĉ = 3/2 + 1 + 1 = 7/2. Figure
from Wikipedia https://en.wikipedia.org/wiki/Molar_heat_capacity.


9 Chemical potential and the Gibbs Distribution


9.1 Entropy and conserved quantities
We began this course by discussing the notion of thermal contact. Two systems are
in thermal contact if energy is allowed to flow between them, but they are otherwise
isolated. Our analysis of systems in thermal contact relied on two ingredients:

• The fundamental assumption of statistical mechanics, which says that the
systems are overwhelmingly likely to maximize the entropy

σ = σ1 (U1 ) + σ2 (U2 ), (9.1)

• Conservation of energy

U = U1 + U2 . (9.2)

From these ingredients, we derived the Boltzmann distribution, defined the partition
function, and computed lots of interesting things.
Analogous reasoning applies whenever there is a conserved quantity that the
entropy depends on. Such a conserved quantity gives rise to a generalization of the
Boltzmann distribution, a generalization of the partition function, and lots more
interesting results. In this section, we will develop these methods for an important
(approximately) conserved quantity: particle number.

9.2 Chemical potential


Unlike energy, particle number is not a fundamental conserved quantity in the Stan-
dard Model of particle physics.19 However, particle number can be effectively con-
served in different physical situations. For example, if we consider physical processes
at temperatures small compared to the activation energy of various chemical reac-
tions, then the numbers of each type of atom/molecule are approximately conserved.
If we consider processes at temperatures small compared to the binding energies of
nuclei, then the numbers of various types of nuclei are approximately conserved.
Thus, let us generalize our tools to take into account conserved particle number.
For simplicity, we mostly assume that there is one species of particle, and that the
total number N of that species is conserved.
Definition (diffusive contact). In analogy with the definition of thermal contact,
we say that two systems S1 , S2 are in diffusive contact if they can exchange particles.
19
There is a particle-number-like quantity called B − L, which measures the difference between
baryon number and lepton number, that is very nearly conserved. Experiments have not yet been
able to detect changes in B − L. However, B − L non-conservation is expected due to quantum
gravitational effects. For example, Hawking radiation of black holes violates B − L conservation.


Consider two systems S1 , S2 in thermal and diffusive contact. Let the number
of particles in system Si be Ni , and let the energy of Si be Ui . Let the total number
of particles be N and the total energy be U . The entropy of each system depends
on the number of particles and the energy of that system:
σ1 (U1 , N1 ) and σ2 (U2 , N2 ). (9.3)
By conservation of energy and particle number, the total entropy is
σ = σ1 (U1 , N1 ) + σ2 (U2 , N2 )
= σ1 (U1 , N1 ) + σ2 (U − U1 , N − N1 ) (9.4)
The systems will reach thermal and diffusive equilibrium in the most probable
configuration, where (9.4) is maximized. This leads to the conditions
   
\[
\left(\frac{\partial\sigma_1}{\partial U_1}\right)_N = \left(\frac{\partial\sigma_2}{\partial U_2}\right)_N,
\qquad
\left(\frac{\partial\sigma_1}{\partial N_1}\right)_U = \left(\frac{\partial\sigma_2}{\partial N_2}\right)_U. \qquad (9.5)
\]
The first condition is simply τ1 = τ2 . To write the second condition, let us introduce

Definition (chemical potential). The chemical potential µ is defined by


 
\[
\frac{\mu}{\tau} = -\left(\frac{\partial\sigma}{\partial N}\right)_U, \qquad (9.6)
\]
where τ is the temperature.
The factor of τ is a convention and it means that µ has dimensions of energy.
The minus sign is another convention introduced so that chemical potential behaves
kind of like a potential energy for particles. To see this, suppose that µ1 > µ2 .
We see that in order to increase the total entropy (9.4), we must decrease N1 and
increase N2 . Thus, particles move from the region of higher chemical potential to
the region of lower chemical potential.
With the above definition, the condition for diffusive equilibrium is
µ1 = µ2 (diffusive equilibrium). (9.7)
If several different species of particle are present, each one has its own chemical
potential
 
\[
\frac{\mu_j}{\tau} = -\left(\frac{\partial\sigma}{\partial N_j}\right)_{U,\, N_1,\ldots,N_{j-1},N_{j+1},\ldots}. \qquad (9.8)
\]

(Here, the partial derivative is taken while holding all particle numbers except Nj
fixed.) The condition for diffusive equilibrium becomes
(µj )1 = (µj )2 for all species j. (9.9)


9.3 Relation to free energy


Consider a change in the energy U and particle number N of a system. The entropy
changes by
   
\[
d\sigma = \left(\frac{\partial\sigma}{\partial U}\right)_N dU + \left(\frac{\partial\sigma}{\partial N}\right)_U dN
= \frac{1}{\tau}\,dU - \frac{\mu}{\tau}\,dN. \qquad (9.10)
\]
Now consider the change in free energy

dF = d(U − τ σ) = dU − τ dσ − σdτ
= τ dσ + µdN − τ dσ − σdτ
= µdN − σdτ. (9.11)

This leads to another expression for µ in terms of the free energy:


 
\[
\mu = \left(\frac{\partial F}{\partial N}\right)_\tau. \qquad (9.12)
\]
If the system depends on a volume V , then we must keep that fixed as well when
computing the derivative
 
\[
\mu = \left(\frac{\partial F}{\partial N}\right)_{\tau, V}. \qquad (9.13)
\]

These expressions are useful because F is simple to compute, via its relationship to
Z.
Example 9.1 (Ideal gas). We previously computed the free energy of the ideal
gas
 
\[
F = -\tau\log\left(\frac{1}{N!}(n_Q V)^N\right)
= -\tau\left(N\log n_Q V - N\log N + N\right). \qquad (9.14)
\]

The chemical potential is

µ = −τ (log nQ V − log N ) = τ log(n/nQ ), (9.15)

where n = N/V is the concentration. Typical gases are dilute relative to the
quantum concentration n/nQ ≪ 1, so that µ is negative.
Example 9.2 (Voltage barrier). Consider two systems S1 , S2 with chemical po-
tentials µ1 , µ2 . Using (9.12), we have
\[
\frac{\partial F}{\partial N_1} = \mu_1 - \mu_2. \qquad (9.16)
\]


That is, the derivative of the total free energy F = F1 + F2 with respect to N1 is
the difference of chemical potentials.
Suppose also that the particles are charged with charge q, and imagine turning
on a voltage difference between the two systems so that particles in system Si have
potential energy qVi .20 The potential energy will give a new contribution to the
total energy and hence the free energy:
F ′ = F + N1 qV1 + N2 qV2 (9.17)
We now have
\[
\frac{\partial F'}{\partial N_1} = \mu_1 - \mu_2 + q(V_1 - V_2) \qquad (9.18)
\]
Thus, the new difference of chemical potentials after turning on the voltages is
µ1 − µ2 + q(V1 − V2 ). We can achieve diffusive equilibrium by tuning q(V1 − V2 ) =
−(µ1 − µ2 ).
Thus, another definition of chemical potential is the electrical potential difference
required to maintain diffusive equilibrium. This gives a way of measuring chemical
potential for charged particles.
In general, when an external potential is present, the total chemical potential of
a system is
µ = µext + µint , (9.19)
where µext is the potential energy per particle in the external potential, and the
internal chemical potential µint is the chemical potential that would be present if
the external potential were zero.
[End of lecture 10(2024)]

Example 9.3 (Isothermal model of the atmosphere). We can model the at-
mosphere as an ideal gas at temperature τ that is in thermal and diffusive equilib-
rium. The isothermal/equilibrium assumptions are actually not true, but this is a
reasonable model. The total chemical potential at height h is
µ = µint + µext
= τ log(n(h)/nQ ) + M gh, (9.20)
where M is the mass of a gas particle, h is the height, and we have allowed the
concentration n(h) to depend on height. In equilibrium, this must be constant, so
we have
τ log(n(h)/nQ ) + M gh = τ log(n(0)/nQ )
n(h) = n(0)e−M gh/τ . (9.21)
20
Don’t confuse Vi with volume in this example.


An alternative fun way to derive this result (that you’ll do on the homework) is
to demand that the pressure of a slab of gas be enough to hold up the slab above it.
Here’s a third slick way that I really like. Consider a single molecule of gas as a
subsystem and the whole atmosphere as a reservoir at temperature τ . The density of
orbitals at height h is independent of h. Thus, the probability that the molecule will
be at height h is proportional to a Boltzmann factor P (h) ∝ e−E(h)/τ = e−M gh/τ .
The concentration n(h) is proportional to this probability.
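To get a feel for the numbers in (9.21), here is a quick estimate of the scale height τ/Mg (a sketch; the molecular mass 28 u and temperature 300 K are illustrative values for a nitrogen-like atmosphere, not numbers from these notes):

```python
import numpy as np

kB, amu, g = 1.380649e-23, 1.66054e-27, 9.81   # SI units
M, T = 28 * amu, 300.0                          # roughly an N2 molecule at room temperature

print("scale height tau/(M g) ~", kB * T / (M * g) / 1000, "km")
for h in [0.0, 2000.0, 8000.0]:                 # heights in meters
    print(h, "m:", np.exp(-M * g * h / (kB * T)))
```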

9.4 Gibbs factor and Gibbs sum


We have seen that N can be treated as a conserved quantity in a way similar to
energy U . Let us revisit our discussion of system and reservoir, this time allowing
the system and reservoir to be in both diffusive and thermal contact.
Definition (general reservoir). Let us generalize our definition of a reservoir to
be a system whose entropy is linear in fluctuations of conserved quantities, to very
good approximation.
In the case of energy fluctuations, we argued that this definition is reasonable by con-
∂k σ
sidering how derivatives ∂U k scaled with the size of a system: the higher derivatives
become very small for large systems. The same reasoning holds for any conserved
quantity; hence the definition above.
If R is such a reservoir, then we have
   
\[
\begin{aligned}
\sigma_R(U_0 - E_s, N_0 - N_s) &= \sigma_R(U_0, N_0) - \left(\frac{\partial\sigma_R}{\partial N}\right)_U N_s - \left(\frac{\partial\sigma_R}{\partial U}\right)_N E_s \\
&= \sigma_R(U_0, N_0) + \mu N_s/\tau - E_s/\tau. \qquad (9.22)
\end{aligned}
\]

Let us suppose our system is in a state with energy Es and particle number Ns .
The reservoir has energy U0 − Es and particle number N0 − Ns . The probability of
the state is proportional to

\[
g_R(U_0 - E_s, N_0 - N_s) = e^{\sigma_R(U_0, N_0) + \mu N_s/\tau - E_s/\tau}. \qquad (9.23)
\]

To compute the constant of proportionality, we can impose that the sum of proba-
bilities is equal to 1. We find
\[
P(s) = \frac{e^{(\mu N_s - E_s)/\tau}}{\mathcal{Z}(\tau,\mu)} \qquad (9.24)
\]
\[
\mathcal{Z}(\tau,\mu) = \sum_s e^{(N_s\mu - E_s)/\tau} = \sum_{N=0}^{\infty}\sum_{s(N)} e^{(N\mu - E_{s(N)})/\tau}, \qquad (9.25)
\]

where s(N ) indicates states with particle number N .


Definition (grand canonical ensemble). The probability distribution P (s) de-
fined in (9.24) is called the grand canonical ensemble.


Definition (grand canonical partition function/Gibbs sum). The object Z


defined in (9.25) is sometimes called the grand canonical partition function, although
Kittel and Kroemer call it the Gibbs sum.
Sometimes it is convenient to define the activity λ = eµ/τ . Then the grand
canonical partition function can be written

\[
\mathcal{Z} = \sum_{N=0}^{\infty} \lambda^N Z_N, \qquad Z_N = \sum_{s(N)} e^{-E_s/\tau}, \qquad (9.26)
\]

so it is essentially a generating function for canonical partition functions with dif-


ferent particle number.
We can use the Gibbs sum in the same way that we use the usual partition
function. For example, we can compute the expected number of particles
\[
\langle N\rangle = \sum_s N_s P(s) = \sum_s \frac{N_s\, e^{(N_s\mu - E_s)/\tau}}{\mathcal{Z}(\tau,\mu)} = \tau\frac{\partial}{\partial\mu}\log\mathcal{Z}(\tau,\mu). \qquad (9.27)
\]
One tricky point is that because of the way µ is defined, the expression for the
average energy is different in the grand canonical ensemble:
 
\[
U = \langle E_s\rangle = \left(\frac{\mu}{\beta}\frac{\partial}{\partial\mu} - \frac{\partial}{\partial\beta}\right)\log\mathcal{Z}
= \mu\langle N\rangle - \frac{\partial}{\partial\beta}\log\mathcal{Z}. \qquad (9.28)
\]
Example 9.4 (Ionization of impurities in a semiconductor). A semiconduc-
tor has energy levels called the “conduction band” which act as an ideal gas of
electrons. The semiconductor can also contain impurities, which are atoms whose
electrons are in thermal and diffusive equilibrium with the conduction band. In a
simple model, an impurity atom has a single valence electron, and that electron can
be either absent (zero energy, ionized) or present in either the spin-up or spin-down
state (energy −I). The Gibbs sum is
\[
1 + 2e^{(\mu+I)/\tau}. \qquad (9.29)
\]
The probability of ionization is
\[
\frac{1}{1 + 2e^{(\mu+I)/\tau}}. \qquad (9.30)
\]
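A small numerical illustration of (9.30) (a sketch; the binding energy I and chemical potential µ below are hypothetical values chosen only to show the temperature dependence):

```python
import numpy as np

def p_ionized(tau, mu, I):
    """Probability that the impurity is ionized, eq. (9.30)."""
    return 1.0 / (1.0 + 2.0 * np.exp((mu + I) / tau))

mu, I = -0.2, 0.05                     # hypothetical values, in eV
for tau in [0.01, 0.025, 0.05, 0.1]:   # temperatures in eV (0.025 eV is roughly room temperature)
    print(tau, p_ionized(tau, mu, I))
```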


10 Ideal Gas
10.1 Spin and statistics
10.1.1 Bosons vs. Fermions
Quantum mechanical particles are indistinguishable. This means that probabili-
ties and expectation values of operators are invariant under exchanging the labels
associated to a pair of particles. Consider a two-particle state

|x1 , x2 ⟩. (10.1)

Here, the particles are labeled by their positions. If they also had spin or other inter-
nal quantum numbers, then they would carry those labels as well. Under relabeling
x1 ↔ x2 , observables must be unchanged. This means that we must have

|x2 , x1 ⟩ = eiϕ |x1 , x2 ⟩. (10.2)

Note that we don’t simply have 1 on the right-hand side because we are free to
rotate amplitudes by a phase without changing expectation values and probabilities.
However, exchanging the labels twice should bring us back to the same state. Thus,
the phase should be a plus or minus sign

|x2 , x1 ⟩ = ±|x1 , x2 ⟩. (10.3)

Given a general two-particle state |Ψ⟩, the wavefunction is

Ψ(x1 , x2 ) = ⟨x1 , x2 |Ψ⟩. (10.4)

If the particles get a ± sign under exchange according to (10.3), then the wavefunc-
tion satisfies

Ψ(x1 , x2 ) = ±Ψ(x2 , x1 ). (10.5)

We see that particles fall into two types:

• bosons have wavefunctions that are symmetric under exchange of labels,

• fermions have wavefunctions that are antisymmetric under exchange of labels.

Let us try to build two-particle wavefunctions from a set of single-particle wave-


functions ψa (x) (“orbitals”). For example, the single particle wavefunctions could
be energy eigenstates of the single-particle Schrödinger equation in some potential.
We can define the two-particle wavefunctions

Ψab (x1 , x2 ) = ψa (x1 )ψb (x2 ) ± ψa (x2 )ψb (x1 ), (10.6)


and these will automatically have the correct property under exchange (10.5). How-
ever, consider the case where a = b:

Ψaa (x1 , x2 ) = ψa (x1 )ψa (x2 ) ± ψa (x2 )ψa (x1 ). (10.7)

If we have a + sign (bosons), there is no problem with this wavefunction. However,


if we have a minus sign (fermions), then it is simply zero. This is the Pauli exclusion
principle: two fermions cannot occupy the same orbital.
Thus, the types of multi-particle states that are available depend on whether we
have bosons or fermions. This has dramatic consequences for their statistics, and
consequently for experiment.

10.1.2 Spin review


Recall that a rotation by an angle θ acts on a quantum state by

Rn (θ) = eiθn·J , (10.8)

where n is a unit vector for the axis of rotation, and J = (Jx , Jy , Jz ) are angular
momentum generators. By studying the commutation relations of the J’s, one can
show that the possible eigenvalues for any of the J’s are

m = −j, −j + 1, . . . , j − 1, j, (10.9)

where j is either an integer or half-integer. For example, acting on an eigenstate of


Jz , and abbreviating R(θ) = Rez (θ), we have

R(θ)|m⟩ = eimθ |m⟩. (10.10)

Note that a rotation by 2π gives either ±1, depending on whether j is integer or


half-integer (regardless of which value of m (10.9) we have):
\[
R(2\pi)|m\rangle = q|m\rangle, \qquad q = \begin{cases} +1 & j\in\mathbb{Z} \\ -1 & j\in\mathbb{Z}+\tfrac{1}{2} \end{cases}. \qquad (10.11)
\]

The generators J get contributions from both the orbital angular momentum
and the intrinsic spin of a particle. In particular, if a spin-s particle is in a position
eigenstate at the origin, then R(2π) acts on it by ±1, depending on whether the
spin is integer or half-integer.
The fact that R(2π) is not necessarily equal to 1 is surprising from the classical
point of view. Classically, a 2π rotation brings a system back to itself. However, this
is not necessarily true for quantum states. Only probabilities are preserved under
a 2π rotation, and this leaves room for the appearance of a phase. You might ask:
why can the phase only be ±1, and not something more general? The answer is
related to the topology of the rotation group.


The reason we have the possibility of a nontrivial phase associated with 2π


rotations is that, when rotating the system, we can keep track of the path that we
take in the rotation group. The final quantum state we get can depend on that
path, not just the endpoint of the path.
In 2 dimensions, we can picture the rotation group as a circle, and a rotation
by 2π corresponds to moving once around the circle. A rotation by 4π corresponds
to moving twice around the circle. The circle allows for arbitrary integer winding
numbers, and so we can imagine accumulating many different phases as we go around
and around.
In 3 dimensions, the rotation group is a more complicated topological space.21 It
turns out that a rotation by 2π cannot be continuously deformed into a trivial path,
but a rotation by 4π can. See for example the demonstration with coffee cup from
the lecture, or https://twitter.com/krzhang/status/1256117297766268936.

10.1.3 The spin-statistics theorem


A-priori, there is no obvious connection between the sign associated with R(2π) and
the sign associated with exchange of particle labels. However, it turns out that they
are directly related.
Theorem 10.1 (Spin-statistics). All particles with integer spin are bosons and
all particles with half-integer spin are fermions.
The spin-statistics theorem is not obvious. It is a consequence of relativistic
invariance and the existence of antiparticles. A heuristic picture of why it’s true
is as follows. Let us imagine drawing the world lines of particles. We draw an
arrow to indicate particles vs. anti-particles. These world lines have two important
properties:

• They can be created or destroyed in pairs. Thus, we can have loops.

• They keep track of the history of rotations applied to each particle. To model,
this we can thicken each line into a “belt.”

In the Feynman path integral formalism, we must assign an amplitude to each


possible history of particles. On the board, draw a history associated to an exchange
and explain how it can be modified into a history where one particle is rotated by
2π.
[End of lecture 11(2024)]

21
Topologically, it is equivalent to S³/ℤ₂.


10.2 Statistics of Fermions and Bosons


10.2.1 Fermi-Dirac statistics
Consider an orbital (solution of the single-particle Schrödinger equation) that can
be occupied or unoccupied by a fermion, and let the energy of the orbital be E. The
Gibbs sum is

Zorbital = 1 + λe−E/τ , (10.12)

where λ = eµ/τ . The expected occupancy is

\[
f(E) = \langle N_\text{orbital}\rangle = \frac{\lambda e^{-E/\tau}}{\mathcal{Z}} = \frac{1}{e^{(E-\mu)/\tau}+1}. \qquad (10.13)
\]
This is the Fermi-Dirac distribution. It is equal to 1 when E ≪ µ, 0 when E ≫ µ,
and 1/2 when E = µ. At low temperatures, it approaches a step function at
E = µ(τ = 0) = EF (the Fermi energy).

10.2.2 Bose-Einstein statistics


Let us consider the analogous problem for bosons. In this case, each orbital can be
occupied by an arbitrary number of bosons. The Gibbs sum is a geometric series

\[
\mathcal{Z}_\text{orbital} = \sum_{N=0}^{\infty} \lambda^N e^{-NE/\tau} = \frac{1}{1-\lambda e^{-E/\tau}}. \qquad (10.14)
\]

The expected occupancy is

\[
f(E) = \lambda\frac{\partial}{\partial\lambda}\log\mathcal{Z}_\text{orbital} = \frac{\lambda e^{-E/\tau}}{1-\lambda e^{-E/\tau}} = \frac{1}{e^{(E-\mu)/\tau}-1}. \qquad (10.15)
\]
This is the Bose-Einstein distribution. The only difference with the Fermi-Dirac
distribution is the minus sign in the denominator, but that difference is dramatic.
Note that the occupancy blows up when E ≈ µ — bosons love to live together in
low energy orbitals.

10.3 The classical limit


An ideal gas is a system of free non-interacting particles in the classical regime.
Here, classical regime means that the occupancy of each orbital is much smaller
than 1. In this case, we have

e(E−µ)/τ ≫ 1, (10.16)


so that

\[
f_{FD}(E) \approx f_{BE}(E) \approx e^{(\mu-E)/\tau} = \lambda e^{-E/\tau}. \qquad (10.17)
\]

This is called the “classical distribution function.”
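A quick numerical comparison of the Fermi-Dirac, Bose-Einstein, and classical distributions as functions of x = (E − µ)/τ (a minimal sketch):

```python
import numpy as np

f_FD = lambda x: 1.0 / (np.exp(x) + 1.0)
f_BE = lambda x: 1.0 / (np.exp(x) - 1.0)
f_cl = lambda x: np.exp(-x)

for x in [1.0, 3.0, 6.0, 10.0]:
    print(x, f_FD(x), f_BE(x), f_cl(x))
# For x >> 1 the occupancy is small and all three agree;
# they differ strongly when the occupancy is of order 1.
```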


You might recall that in our initial analysis of the ideal gas, we used the low-
occupancy assumption to justify dividing by N ! in the multi-particle partition func-
tion. This ignored the combinatorics associated with multiple occupancy. Using
the technology of Gibbs sums and chemical potentials, we will be able to treat the
low-occupancy approximation in a different way.22 We will study an ideal gas of
particles in the classical limit, where the difference between FD and BE statistics is
unimportant. Later, we will study gasses in the quantum regime, where the different
statistics of fermions and bosons will play a major role.
Let us determine the chemical potential of a gas of N atoms in the classical
regime. We must have
\[
N = \langle N\rangle = \sum_s f(E_s) = \lambda\sum_s e^{-E_s/\tau} = \lambda Z_1, \qquad (10.18)
\]

where Z1 is simply the single-particle partition function. We have already evaluated


this: $Z_1 = n_Q V$, where $n_Q = \left(\frac{M\tau}{2\pi\hbar^2}\right)^{3/2}$ is the quantum concentration. Thus, we
find

N = λnQ V =⇒ λ = n/nQ ,
=⇒ µ = τ log(n/nQ ). (10.19)

where n = N/V is the concentration. This recovers our result from before.
From the chemical potential µ = (∂F/∂N)_{τ,V}, we can recover the free energy
\[
F = \int_0^N dN'\, \mu(N') = \int_0^N dN'\, \tau\left(\log N' + \log\frac{1}{n_Q V}\right)
= \tau\left((N\log N - N) + N\log\frac{1}{n_Q V}\right) = \tau N\left(\log(n/n_Q) - 1\right). \qquad (10.20)
\]

The pressure is
 
\[
p = -\left(\frac{\partial F}{\partial V}\right)_{\tau, N} = N\tau/V, \qquad (10.21)
\]
22
A way where the errors are more easily quantified.


as before. The entropy is


   
\[
\sigma = -\left(\frac{\partial F}{\partial\tau}\right)_{V,N} = -N\left(\log(n/n_Q) - 1 - \frac{3}{2}\right)
= N\left(\log(n_Q/n) + \frac{5}{2}\right), \qquad (10.22)
\]

which is the Sackur-Tetrode equation again.


11 Fermi Gases
In the previous section, we considered gases in the classical regime n ≪ nQ . The
behavior of gases in the quantum regime n ≫ nQ is rich and interesting, and in
particular depends in a crucial way on whether the gas is composed of fermions or
bosons. In this section, we will study a Fermi gas (gas of fermions). Some examples
of Fermi gases are

• Electrons in the conduction band of a metal,

• Electrons in a white dwarf star,

• Liquid 3 He,

• Nuclear matter.

11.1 Gibbs sum and energy


For concreteness, we will consider a gas of nonrelativistic spin-1/2 particles (e.g. electrons).
Such particles have two spin states, s = ±1/2. Suppose the particles are
constrained to a cube of side length L. The orbitals are labeled by mode numbers,
which are a triplet of nonnegative integers n = (n_x, n_y, n_z) ∈ ℤ³_{≥0}, and also a spin
s = ±1/2. Their energies are a function of n = |n| given by

\[
E_n = \frac{\hbar^2}{2m}\left(\frac{\pi n}{L}\right)^2. \qquad (11.1)
\]
Each orbital can be occupied or unoccupied, so the partition function of a single
orbital is

Zn = 1 + eβ(µ−En ) , (11.2)

where β = 1/τ .
The Gibbs sum of the entire system is a product of the Gibbs sums for each
orbital
\[
\mathcal{Z} = \prod_{\substack{n\in\mathbb{Z}^3_{\geq 0}\\ s=\pm\frac12}} \mathcal{Z}_n = \prod_{n\in\mathbb{Z}^3_{\geq 0}} \mathcal{Z}_n^2. \qquad (11.3)
\]

Note that we have a square because of the possibility of having two different spins.


The expected number of particles is

\[
\begin{aligned}
\langle N\rangle &= \frac{\sum_s N_s\, e^{\beta(\mu N_s - E_s)}}{\mathcal{Z}}
= \frac{1}{\beta}\frac{\partial}{\partial\mu}\log\mathcal{Z}
= \frac{1}{\beta}\frac{\partial}{\partial\mu}\, 2\sum_n \log\left(1 + e^{\beta(\mu-E_n)}\right) \\
&= 2\sum_n \frac{e^{\beta(\mu-E_n)}}{1 + e^{\beta(\mu-E_n)}} = 2\sum_n f_{\mathrm{FD}}(E_n,\tau,\mu), \qquad (11.4)
\end{aligned}
\]
where we have abbreviated $\sum_n = \sum_{n\in\mathbb{Z}^3_{\geq 0}}$. This is an intuitive formula: the Fermi-
Dirac distribution gives the expected occupancy of each mode number, so the ex-
pected value of N is the sum of the expected occupancies, times 2 to account for
spin. We abbreviate fFD (E, τ, µ) as f (E, τ, µ) or sometimes f (E), throughout this
section.
Let us make the “big box” assumption that the thermal wavelength is much
smaller than the size of the box. The sum over modes $\sum_n$ can then be approximated
as a phase space integral. For any function f, we have
\[
\begin{aligned}
\sum_n f(E_n) &\approx \int \frac{d^3x\, d^3p}{(2\pi\hbar)^3}\, f\!\left(\frac{p^2}{2m}\right)
= \frac{4\pi V}{(2\pi\hbar)^3}\int_0^\infty dp\, p^2\, f\!\left(\frac{p^2}{2m}\right) \\
&= \frac{4\pi V}{(2\pi\hbar)^3}\int_0^\infty d\!\left(\sqrt{2mE}\right)(2mE)\, f(E)
= \frac{V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\int_0^\infty dE\, E^{1/2} f(E) \\
&= \int_0^\infty dE\, D_0(E)\, f(E), \qquad (11.5)
\end{aligned}
\]

where we have defined the “density of modes”

\[
D_0(E) = \frac{V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E^{1/2}. \qquad (11.6)
\]

You may recognize this as just Weyl’s law (8.18). Remember that the density of
eigenvalues of the Laplacian $-\vec\nabla^2$ is given by
\[
\rho(\lambda)\, d\lambda = \frac{V}{4\pi^2}\,\lambda^{1/2}\, d\lambda. \qquad (11.7)
\]
To get $D_0(E)\,dE$, we can just plug in $\lambda = 2mE/\hbar^2$.


Because we have particles with spin s = 1/2, the “density of orbitals” is

D(E) = (2s + 1)D0 (E) = 2D0 (E). (11.8)

With this definition, we can write


\[
\langle N\rangle = \int_0^\infty dE\, D(E)\, f_{\mathrm{FD}}(E,\tau,\mu). \qquad (11.9)
\]

We can interpret D(E)fFD (E, τ, µ) as the density of occupied orbitals.


To compute the expectation value of the energy, we apply (9.28):
   
\[
\left(\frac{\mu}{\beta}\frac{\partial}{\partial\mu} - \frac{\partial}{\partial\beta}\right)\log\mathcal{Z}_n
= \left(\frac{\mu}{\beta}\frac{\partial}{\partial\mu} - \frac{\partial}{\partial\beta}\right)\log\left(1 + e^{\beta(\mu-E_n)}\right)
= \frac{E_n\, e^{\beta(\mu-E_n)}}{1 + e^{\beta(\mu-E_n)}} = E_n\, f_{\mathrm{FD}}(E_n,\tau,\mu). \qquad (11.10)
\]
Thus, we have
\[
U = 2\sum_n E_n\, f(E_n,\tau,\mu) \approx \int_0^\infty dE\, D(E)\, E\, f_{\mathrm{FD}}(E,\tau,\mu), \qquad (11.11)
\]

where in the second step, we have made the big-box approximation.

11.2 Zero temperature


Fermi gases have a characteristic energy scale called the Fermi energy (defined be-
low). We would often like to understand the physics of Fermi gases at temperatures
much lower than the Fermi energy. To do this, we can perform perturbation theory
around zero temperature. The starting point is to understand τ = 0.
In the limit of zero temperature, the Fermi-Dirac distribution becomes a step
function
\[
\lim_{\tau\to 0} f(E,\tau,\mu) = \theta(\mu - E) = \begin{cases} 1 & \text{if } E < \mu \\ 0 & \text{if } E > \mu \end{cases}, \qquad (11.12)
\]

where θ(x) is the Heaviside step function (https://en.wikipedia.org/wiki/Heaviside_


step_function).
Definition (Fermi energy). The Fermi energy is defined as the chemical potential
at zero temperature: EF = µ(τ = 0).


Thus, at zero temperature, we have


\[
\begin{aligned}
N &= \int_0^{E_F} dE\, D(E) = \frac{2}{3} E_F D(E_F) = \frac{V}{3\pi^2}\left(\frac{2mE_F}{\hbar^2}\right)^{3/2} \\
E_F &= \frac{\hbar^2}{2m}\left(3\pi^2 n\right)^{2/3}, \qquad (11.13)
\end{aligned}
\]

where we have used that D(E) ∝ E 1/2 . This gives a connection between the Fermi
energy and the concentration.
The total energy at zero temperature is
\[
U = \int_0^{E_F} dE\, E\, D(E) = \frac{2}{5} E_F^2 D(E_F) = \frac{3}{5} E_F N, \qquad (11.14)
\]

where we have used that ED(E) ∝ E 3/2 . This is a striking result. The energy
at zero temperature is enormous when N is large or the concentration n is large,
coming from the fact that the Pauli exclusion principle forces orbitals with high
energy to be occupied.
Because EF ∝ n2/3 ∼ V −2/3 , the energy increases with decreasing volume. This
leads to an outward pressure sometimes called degeneracy pressure. The pressure at
zero temperature is given by
     
\[
p = -\left(\frac{\partial F}{\partial V}\right)_{\tau,N}\bigg|_{\tau=0}
= -\left(\frac{\partial(U-\tau\sigma)}{\partial V}\right)_{\tau,N}\bigg|_{\tau=0}
= -\left(\frac{\partial U}{\partial V}\right)_{\tau,N}
= -\left(-\frac{2}{3}\frac{U}{V}\right)
= \frac{2}{5} E_F n. \qquad (11.15)
\]
In metals, the degeneracy pressure is cancelled by Coulomb attraction
between electrons and ions, and in white dwarf stars by gravitational attraction.
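For orientation, here are the Fermi energy, Fermi temperature, and degeneracy pressure from (11.13) and (11.15) for a typical metallic electron concentration (a sketch; the value n ≈ 8.5 × 10²⁸ m⁻³, roughly the conduction-electron density of copper, is an assumed input rather than a number from these notes):

```python
import numpy as np

hbar, m_e, kB, eV = 1.0546e-34, 9.109e-31, 1.381e-23, 1.602e-19   # SI units

n = 8.5e28                                            # electrons per m^3 (roughly copper)
EF = hbar**2 / (2 * m_e) * (3 * np.pi**2 * n)**(2/3)  # eq. (11.13)
print("E_F =", EF / eV, "eV")
print("T_F =", EF / kB, "K")
print("degeneracy pressure (2/5) E_F n =", 0.4 * EF * n, "Pa")   # eq. (11.15)
```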

11.3 Heat capacity at low temperature


For electrons in the conduction band of a metal, typical Fermi energies are EF /kB =
50000K, so the regime of low temperatures τ ≪ EF is of practical interest. For
concreteness, consider the energy U as a function of temperature. When we turn on
the temperature, the Fermi-Dirac distribution f (E, τ, µ) is no longer a step function.
However, when τ is small, f (E, τ, µ) only changes significantly near the Fermi energy
E ∼ EF . Thus, there will be some reshuffling of particles between energy levels near
EF , but not elsewhere.


We have to be a bit careful because we must let µ depend on temperature in


order to keep N constant. Let us study the condition that N is constant
\[
\begin{aligned}
N &= \int_0^\infty dE\, D(E)\, f(E,\tau,\mu) \\
0 &= \int_0^\infty dE\, D(E)\left(\frac{\partial f}{\partial\tau} + \frac{\partial\mu}{\partial\tau}\frac{\partial f}{\partial\mu}\right). \qquad (11.16)
\end{aligned}
\]
Here, we encounter two important functions:

\[
\begin{aligned}
\frac{\partial f}{\partial\mu} &= \frac{1}{\tau}\frac{e^{(E-\mu)/\tau}}{\left(e^{(E-\mu)/\tau}+1\right)^2} = \frac{1}{\tau}\frac{1}{4\cosh^2\!\left(\frac{E-\mu}{2\tau}\right)}, \\
\frac{\partial f}{\partial\tau} &= \frac{E-\mu}{\tau^2}\frac{e^{(E-\mu)/\tau}}{\left(e^{(E-\mu)/\tau}+1\right)^2} = \frac{E-\mu}{\tau^2}\frac{1}{4\cosh^2\!\left(\frac{E-\mu}{2\tau}\right)}. \qquad (11.17)
\end{aligned}
\]

Note that µ just shifts the Fermi-Dirac distribution over, so ∂f/∂µ looks like the derivative
of a step function, which is a delta function δ(E − µ). Meanwhile, ∂f/∂τ looks like
two nearby bumps: one negative just below µ and one positive just above µ. This
should remind you of the derivative of a delta function δ′(E − µ).²³
Let us turn this into a quantitative approximation. Let us guess
\[
\frac{\partial f}{\partial\mu} \approx A\,\delta(E-\mu), \qquad
\frac{\partial f}{\partial\tau} \approx B\,\delta'(E-\mu), \qquad (\tau\ll\mu, E), \qquad (11.21)
\]
We can determine the coefficients A, B by integrating both sides against 1 and
E − µ. Note that we can integrate from −∞ to ∞ instead of from 0 to ∞ because
²³ The Dirac delta function is defined by
\[
\int_{-\infty}^{\infty} dx\, \delta(x-a)\, f(x) = f(a). \qquad (11.18)
\]
Technically speaking, the delta function is not really a function, but a distribution — something
that can be integrated against a function to get a number. Derivatives of delta functions are similar:
to define them we must explain how to integrate them against a function. Specifically, they are
defined by integrating by parts
\[
\int_{-\infty}^{\infty} dx\, \delta'(x-a)\, f(x) = -\int_{-\infty}^{\infty} dx\, \delta(x-a)\, \frac{\partial}{\partial x} f(x) = -f'(a). \qquad (11.19)
\]
Similarly, we can compute an integral against the n-th derivative δ⁽ⁿ⁾(x − a) by integrating by parts
n times:
\[
\int_{-\infty}^{\infty} dx\, \delta^{(n)}(x-a)\, f(x) = (-1)^n f^{(n)}(a). \qquad (11.20)
\]


the functions are very small when E ≪ µ, and we will assume that µ is large. For
example,
\[
\int_{-\infty}^{\infty} dE\, \frac{\partial f}{\partial\mu} = \int_{-\infty}^{\infty} dE\left(-\frac{\partial f}{\partial E}\right) = f(-\infty) - f(\infty) = 1. \qquad (11.22)
\]
(In this case, we didn't actually have to do an integral.) Comparing to $\int dE\, \delta(E-\mu) = 1$, we conclude
\[
\frac{\partial f}{\partial\mu} \approx \delta(E-\mu), \qquad (\tau\ll\mu, E). \qquad (11.23)
\]

[End of lecture 12(2024)]


In the case of ∂f/∂τ, to compute the coefficient B, we should integrate against E − µ:²⁴
\[
\int_{-\infty}^{\infty} dE\, (E-\mu)\,\frac{\partial f}{\partial\tau}
= \int_{-\infty}^{\infty} dE\, \frac{(E-\mu)^2}{\tau^2}\frac{e^{(E-\mu)/\tau}}{\left(e^{(E-\mu)/\tau}+1\right)^2}
= \tau\int_{-\infty}^{\infty} dx\, \frac{x^2 e^x}{(e^x+1)^2} = \frac{\pi^2\tau}{3}. \qquad (11.25)
\]
Comparing to
\[
\int dE\, (E-\mu)\,\delta'(E-\mu) = -\int dE\, \frac{\partial(E-\mu)}{\partial E}\,\delta(E-\mu) = -1, \qquad (11.26)
\]
we conclude
\[
\frac{\partial f}{\partial\tau} \approx -\frac{\pi^2\tau}{3}\,\delta'(E-\mu) \qquad (\tau\ll\mu, E). \qquad (11.27)
\]
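The integral appearing in (11.25) is easy to check numerically (a minimal sketch; the integrand decays exponentially, so cutting the range off at ±50 is effectively the same as ±∞):

```python
import numpy as np
from scipy.integrate import quad

value, _ = quad(lambda x: x**2 * np.exp(x) / (np.exp(x) + 1.0)**2, -50, 50)
print(value, np.pi**2 / 3)
```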
These expansions in terms of δ-functions and their derivatives are extremely
useful. They are the first terms in more general series expansions at small τ , e.g.
\[
\begin{aligned}
\frac{\partial f}{\partial\mu} &= \delta(E-\mu) + a_2\tau^2\,\delta''(E-\mu) + \ldots \\
\frac{\partial f}{\partial\tau} &= -\frac{\pi^2\tau}{3}\,\delta'(E-\mu) + b_3\tau^3\,\delta'''(E-\mu) + \ldots. \qquad (11.28)
\end{aligned}
\]
²⁴ To evaluate the integral, we first compute
\[
\begin{aligned}
I(a) &= \int_0^\infty dx\, \frac{x}{e^{ax}+1} = \int_0^\infty dx\, x\left(e^{-ax} - e^{-2ax} + e^{-3ax} - e^{-4ax} + \ldots\right) \\
&= \int_0^\infty dx\, x\left(e^{-ax} + e^{-2ax} + \ldots\right) - 2\int_0^\infty dx\, x\left(e^{-2ax} + e^{-4ax} + \ldots\right) \\
&= \frac{1}{a^2}\zeta(2) - \frac{2}{(2a)^2}\zeta(2) = \frac{\pi^2}{12a^2}. \qquad (11.24)
\end{aligned}
\]
The desired answer is $-2I'(1)$.


The coefficients a₂, ... and b₃, ... can be computed by integrating both sides
against (E − µ)ⁿ for various n. The additional terms would be necessary to compute
further corrections in τ to the formulas that we will derive next.

Note that this kind of expansion is only good if ∂f/∂τ is integrated against another
function that is smooth at E = µ. This is the case for D(E) above. However, we
sometimes encounter integrals like ∫ dE g(E) ∂f/∂τ, where g(E) is not smooth at E = µ.
One example is graphene (which you may encounter on your problem set). In those
cases, we cannot separately expand ∂f/∂τ — we must consider the complete product
g(E) ∂f/∂τ.
We can now return to (11.16) and plug in our approximations for ∂f/∂τ and ∂f/∂µ.
Because we are working near τ = 0, we can also substitute µ = E_F in the δ functions.
We find
\[
0 = \int_0^\infty dE\, D(E)\left(-\frac{\pi^2\tau}{3}\,\delta'(E-E_F) + \frac{\partial\mu}{\partial\tau}\,\delta(E-E_F)\right)
= \frac{\pi^2\tau}{3} D'(E_F) + \frac{\partial\mu}{\partial\tau} D(E_F). \qquad (11.29)
\]
This determines the leading correction to µ at small τ :

\[
\begin{aligned}
\frac{\partial\mu}{\partial\tau} &= -\frac{\pi^2\tau}{3}\frac{D'(E_F)}{D(E_F)} = -\frac{\pi^2\tau}{6E_F} \\
\mu(\tau) &= E_F - \frac{\pi^2\tau^2}{12E_F} \qquad (\tau\ll E_F). \qquad (11.30)
\end{aligned}
\]
Let us now study the energy at small τ . We have
\[
\begin{aligned}
C_V = \frac{\partial U}{\partial\tau} &= \int_0^\infty dE\, E\, D(E)\left(\frac{\partial f}{\partial\tau} + \frac{\partial\mu}{\partial\tau}\frac{\partial f}{\partial\mu}\right) \\
&= \int_0^\infty dE\, E\, D(E)\left(-\frac{\pi^2\tau}{3}\,\delta'(E-E_F) + \frac{\partial\mu}{\partial\tau}\,\delta(E-E_F)\right) \\
&= \frac{\pi^2\tau}{3}\left(D(E_F) + E_F D'(E_F)\right) + \frac{\partial\mu}{\partial\tau}\, E_F D(E_F) \\
&= \frac{\pi^2\tau}{3} D(E_F) = \frac{\pi^2}{2}\frac{N\tau}{E_F}, \qquad (11.31)
\end{aligned}
\]
where we cancelled the term involving D′ (EF ) using the first line of (11.30).
Note that the heat capacity is proportional to τ : it is small at small temperature,
which is another striking prediction. (Compare to an ideal gas in the classical regime
whose heat capacity is independent of τ .) The physical interpretation is that when
we increase τ , particles in orbitals with energy EF − τ get lifted to have energy
EF + τ . The change in energy of each particle is proportional to τ , but the number
of particles whose orbitals are changing is also proportional to τ . Thus, the total
energy changes quadratically with τ , and the heat capacity is linear in τ .


Example 11.1 (Metals). The heat capacity of many metals can be written as a
sum of an electronic contribution from a free electron gas and a contribution from
phonons. The former goes like τ , and the latter goes like τ 3 according to Debye
theory. Overall, we have

CVmetal = γτ + Aτ 3 (11.32)

which fits the data extremely well. In more detail, we expect the form

\[
C_V^\text{metal} = \frac{\pi^2 N}{2E_F}\,\tau + \frac{12\pi^4 N}{5\theta^3}\,\tau^3 \qquad (11.33)
\]
At low temperatures, the free electron term will be dominant. The crossover tem-
perature occurs at
\[
\tau = \left(\frac{5\theta^3}{24\pi^2 E_F}\right)^{1/2}, \qquad (11.34)
\]
which in metals is about 1 K.
In practice, the heat capacity in metals is linear at low temperatures. However,
the slope γ does not precisely agree with the prediction (11.33). For example, in
potassium, the observed slope γ satisfies γ/γ₀ ≈ 1.23, where γ₀ = π²N/(2E_F) is the ideal
gas prediction. The reason is that electrons in a metal are not really free — they
experience Coulomb interactions with each other and with the lattice of ions.
In fact, it is initially surprising that the linear slope prediction works at all. The
reason is that the important excitations in a metal aren’t really individual electrons,
but “quasiparticles” — an electron together with accompanying deformations and
vibrations of the lattice of ions (called “dressing”). The dressing changes the way
electrons interact with the lattice and with each other. Firstly, it tends to screen
Coulomb interactions, so that the dressed electrons interact more weakly with each
other and the lattice than undressed electrons. Secondly, it changes the effective
mass of the excitations m → m∗ , so that the Fermi energy of dressed electrons EF ∗
is different.
This is an example of the phenomenon of “renormalization.” We are given a
microscopic theory of particles with some masses and interactions. Interactions
dress the particles, so that the quasiparticles relevant for long-distance dynamics
have different properties from their microscopic constituents. Thus, effective masses
and interactions change with distance scale. This phenomenon is ubiquitous in
particle physics — in fact, many of the so-called “fundamental” particles might be
quasiparticles of some underlying more fundamental theory. Even the fine-structure
constant α, which controls the strength of electromagnetic interactions, changes
with distance scale.
Example 11.2 (White dwarf stars). White dwarf stars have masses approx-
imately equal to the mass of the sun, but radii 100 times smaller. The density
of the sun is approximately 1 g/cm³. The density of a white dwarf star is 10⁴ to
10⁷ g/cm³. All the atoms in a white dwarf star are ionized and the electrons form
an approximately free electron gas.
Typical interatomic spacings are 1 Bohr radius, or 10⁻⁸ cm. When the atoms are
100 times closer together, their typical spacing is 10⁻¹⁰ cm, so the concentration of
electrons is approximately n = 10³⁰ electrons/cm³. The corresponding Fermi energy
is25
\[
\frac{\hbar^2}{2m}\left(3\pi^2 n\right)^{2/3} \approx 3.6\times 10^5\ \text{eV} = 0.36\ \text{MeV}. \qquad (11.35)
\]
The Fermi temperature is 10⁹ K. The actual temperature inside a white dwarf is
expected to be on the order of 10⁷ K, thus we are in the same highly quantum
regime that we just analyzed.
One important point is that the Fermi energy is comparable to the rest mass of
an electron, which is me c2 = 0.5 MeV. Thus, relativistic effects are nontrivial, but
don’t completely invalidate our analysis.
What determines the size of a white dwarf star? Very roughly, the energy due
to gravitational potential energy is

GM 2
Ugrav = − . (11.36)
r
The kinetic energy of the Fermi gas is
2/3
ℏ2 ℏ2 N 5/3

3 N
Ufermi = EF N ≈ N = . (11.37)
5 me r3 me r 2

We have N = M/mp , where mp is the proton mass. Minimizing the energy, we find

\[
0 = \frac{\partial}{\partial r}\left(U_\text{grav} + U_\text{fermi}\right) = \frac{GM^2}{r^2} - \frac{\hbar^2 M^{5/3}}{m_e m_p^{5/3} r^3}, \qquad (11.38)
\]

which gives

\[
r \approx \frac{\hbar^2}{G\, m_e\, m_p^{5/3}\, M^{1/3}}. \qquad (11.39)
\]
So more massive white dwarf stars are actually smaller. Plugging in a solar mass
M_⊙ ≈ 10³³ g, we find
\[
r(M_\odot) \approx 10^4\ \text{km}, \qquad (11.40)
\]

roughly the size of the earth.
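Plugging numbers into (11.39) gives a quick order-of-magnitude check (a sketch; as in the text, order-one prefactors have been dropped):

```python
import numpy as np

hbar, G = 1.0546e-34, 6.674e-11                 # SI units
m_e, m_p, M_sun = 9.109e-31, 1.673e-27, 2e30    # kg

def white_dwarf_radius(M):
    """Order-of-magnitude radius from eq. (11.39)."""
    return hbar**2 / (G * m_e * m_p**(5/3) * M**(1/3))

print(white_dwarf_radius(M_sun) / 1000, "km")   # a few thousand km, i.e. roughly Earth-sized
```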


25
I often use google for these types of calculations: https://www.google.com/search?q=hbar%
5E2%2F(2*electron+mass)*(3+pi%5E2+*+10%5E30%2Fcm%5E3)%5E(2%2F3)+in+MeV


Example 11.3 (Relativistic white dwarf stars and the Chandrasekhar


limit). When a white dwarf star is more massive, the Fermi energy can get larger
than the electron rest mass and relativistic effects become more important. When
the star gets very massive, we can approximate the electron gas as a relativistic
degenerate Fermi gas. Recall that the energy of a relativistic particle is E = pc = ℏcπn/L.
To determine the Fermi energy, we have

\[
N = 2\cdot\frac{1}{8}\int_0^{n_F} 4\pi n^2\, dn
= \pi\left(\frac{L}{\hbar c\pi}\right)^3\int_0^{E_F} dE\, E^2
= \frac{\pi}{3(\hbar c\pi)^3}\, V E_F^3 \qquad (11.41)
\]

The kinetic energy of the gas is


\[
U = \pi\left(\frac{L}{\hbar c\pi}\right)^3\int_0^{E_F} dE\, E^3
= \frac{3N}{4} E_F = \frac{3N}{4}(\hbar c\pi)\left(\frac{3N}{\pi V}\right)^{1/3}
\sim \frac{\hbar c N^{4/3}}{V^{1/3}} \sim \frac{\hbar c N^{4/3}}{r} \qquad (11.42)
\]

This depends on 1/r in the same way that gravitational potential energy does.
Thus, if the relativistic dwarf star is heavy enough, gravitational potential energy
will dominate kinetic energy and the star will collapse. The condition for collapse is

\[
GM^2 \approx \hbar c N^{4/3} = \hbar c\, (M/m_p)^{4/3}
\qquad\Longrightarrow\qquad
M_\text{Chandrasekhar} \approx \frac{M_\text{Pl}^3}{m_p^2}, \qquad (11.43)
\]
where $M_\text{Pl} = \left(\frac{\hbar c}{G}\right)^{1/2}$ is the Planck mass (approximately 2 × 10⁻⁵ g). Plugging in
numbers, this gives about 10³⁰ kg, or about 1 solar mass M_⊙. The currently accepted
value for M_Chandrasekhar, based on a more careful analysis, is 1.4 M_⊙.
Example 11.4 (Neutron star). When a relativistic white dwarf star collapses,
what does it turn into? A way for the star to decrease its energy is to reduce the
number of electrons, which it can do via the “electron capture” reaction

p + e → n + νe , (11.44)

where νe is a neutrino. Under normal conditions, this reaction is (thankfully) not


energetically favorable because mp +me < mn +mνe . However, the high temperature
of a collapsing dwarf star is enough to overcome the activation energy and the
reaction occurs, creating a neutron star. In a neutron star, the gravitational energy
is balanced by the kinetic energy of a degenerate neutron gas. The Fermi energy of a
neutron gas is set by m_n ≈ 2 × 10³ m_e, so from (11.39) the neutron star is about 10³
times smaller than a white dwarf, or a few kilometers. The flux of neutrinos from the
core of a newly-forming neutron star blasts away a shell of other collapsing matter,
creating a supernova. Most of the energy is released in the form of neutrinos.26
The Schwarzschild radius of a black hole is
\[
r_S = \frac{2GM}{c^2}, \qquad (11.45)
\]
which for M = M⊙ is about 3 kilometers. Thus, a solar-mass neutron star is close to
being a black hole. Indeed, if the neutron star has mass about 3M⊙ , it will collapse
into a black hole.
[End of lecture 13(2024)]

26
The Wikipedia article on neutron stars (https://en.wikipedia.org/wiki/Neutron_star) is
pretty mindblowing. One sentence I like is: “A neutron star is so dense that one teaspoon (5
milliliters) of its material would have a mass over 5.5 × 10¹² kg, about 900 times the mass of the
Great Pyramid of Giza.”


12 Bose Gases and Bose-Einstein Condensation


Bose gases behave very differently from Fermi gases in the degenerate regime. Re-
call that it is possible for multiple bosons to occupy the same orbital. In fact, a
spectacular thing happens in Bose gases at low temperatures: many particles accu-
mulate in the lowest-energy orbital (the “ground orbital”).27 This phenomenon is
called Bose-Einstein condensation. It is not surprising that this might happen at
temperatures that are extremely low compared to the energy spacing ∆E between
the lowest orbital and the first excited orbital. However, the surprising fact is that
Bose-Einstein condensation happens even for temperatures much larger than ∆E.
As an example, the spacing between the lowest and first excited orbital for atoms
in a cubical box with side length L is

\[
\begin{aligned}
\Delta E &= \frac{\hbar^2}{2m}\left(\frac{\pi}{L}\right)^2\left(n_\text{excited}^2 - n_\text{ground}^2\right)
= \frac{\hbar^2}{2m}\left(\frac{\pi}{L}\right)^2\left(4+1+1 - (1+1+1)\right) \\
&= 3\,\frac{\hbar^2}{2m}\left(\frac{\pi}{L}\right)^2. \qquad (12.1)
\end{aligned}
\]
The mass of ⁴He is 4 GeV/c² (4 times the mass of a nucleon). For a box of size
L = 1 cm, we have
\[
\Delta E \approx 10^{-18}\ \text{eV}, \qquad (12.2)
\]
which is a temperature of about ∆E/k_B ≈ 10⁻¹⁴ K. However, the observed tem-


perature of Bose-Einstein condensation in such a system is about 2 K.

12.1 Chemical potential near τ = 0


Naively, the Boltzmann factors e−E/τ for the lowest several energy levels would be
nearly the same at a temperature of τ = 2 K. The key to Bose-Einstein condensation
is that the chemical potential µ can be extremely small compared to ∆E, even when
τ is not small. The smallness of µ makes the physics of the ground orbital very
different from the physics of the excited orbitals.
Consider a Bose gas of N particles. For simplicity, we will consider spin-0 bosons.
Recall that the expected occupancy of an orbital with energy E is
\[
f(E,\mu,\tau) = \frac{1}{e^{(E-\mu)/\tau}-1}, \qquad (12.3)
\]
27
People usually say that particles accumulate in the ground state, where here “state” means a
solution to the single-particle Schrodinger equation, i.e. what we have been calling an “orbital.”


where in this section f stands for the Bose-Einstein distribution. The occupancy of
the ground orbital E = 0 is
\[
f(0,\mu,\tau) = \frac{1}{e^{-\mu/\tau}-1}. \qquad (12.4)
\]
Let us begin by calculating the chemical potential at very small τ ≪ ∆E. As
we emphasized above, τ ≪ ∆E is not necessary for Bose-Einstein condensation.
However, it will help us understand how small µ is. When τ ≪ ∆E, all N particles
are in the ground orbital, and we have
\[
\begin{aligned}
f(0,\mu,\tau) = \frac{1}{e^{-\mu/\tau}-1} &\approx N \qquad (\tau\ll\Delta E), \\
\mu = -\tau\log\left(1 + \frac{1}{N}\right) &\approx -\frac{\tau}{N} \qquad (\tau\ll\Delta E,\ 1\ll N). \qquad (12.5)
\end{aligned}
\]

For example, if we have 10²² particles, then the chemical potential when τ ≪ ∆E
is −τ/10²².

12.2 Orbital occupancy versus temperature


The fact that the chemical potential is so much closer to the energy of the lowest
orbital means that the physics of that orbital is very different from the physics of
the other orbitals. Let us study the occupancy of various orbitals as a function of
temperature τ .
Our central claim is that when τ is larger than ∆E (but not too large) we can
still have an O(1) fraction of the particles in the ground orbital. This is because µ
remains extremely small even as we increase τ . To see this, let us assume that an
O(1) fraction of the particles are in the ground orbital. Using this assumption, we
will solve for µ and verify that our assumption is consistent for some range of τ .
Let us write

N = N0 (τ ) + Ne (τ ), (12.6)

where N0 = f (0, µ, τ ) is the number of particles in the ground orbital (which we


assume to be very large). As before, we can solve for µ
\[
N_0(\tau) = \frac{1}{e^{-\mu/\tau}-1} \quad\Longrightarrow\quad \mu = -\frac{\tau}{N_0(\tau)}, \qquad (12.7)
\]

where we used our assumption that N0 (τ ) is large. If an O(1) fraction of particles


are in the ground orbital, µ is still ridiculously small. For example, we will later
find that at temperature τ = k_B · 1.5 K ≈ 10⁻⁴ eV, half of the particles in a gas of
⁴He are in the ground orbital. In this case, we have µ = −10⁻²⁵ eV, which is much
smaller than the spacing ∆E = 10⁻¹⁸ eV we computed earlier.


In general, for a box of size L, if we increase L while holding the concentration


fixed, then the number of particles scales like L³, so that µ ∼ 1/L³. Meanwhile, the
energy spacing ∆E scales like 1/L². Thus, in a sufficiently large box, we can ensure
µ ≪ ∆E.
The number of particles in excited orbitals is
\[
N_e(\tau) = \sum_{n\neq 0} f(E_n,\mu,\tau) = \sum_{n\neq 0}\frac{1}{e^{(E_n-\mu)/\tau}-1}, \qquad (12.8)
\]

where
\[
E_n = \frac{\hbar^2}{2m}\left(\frac{\pi n}{L}\right)^2 - E_\text{ground}. \qquad (12.9)
\]
Here, we have shifted the energy by $E_\text{ground} = 3\frac{\hbar^2}{2m}\frac{\pi^2}{L^2}$ so that the ground orbital has
energy 0. For each excited orbital, we have En ≫ µ, so we can safely replace the
above sum with
\[
N_e(\tau) = \sum_{n\neq 0} f(E_n, 0, \tau) = \sum_{n\neq 0}\frac{1}{e^{E_n/\tau}-1} \qquad (12.10)
\]

We can now analyze this sum using our usual tools. When $L \gg \hbar/\sqrt{m\tau}$, we can
approximate the sum as an integral. The density of orbitals is half of what we
computed before, because now we have spin-0 particles

\[
D_B(E) \equiv \frac{V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E^{1/2}. \qquad (12.11)
\]
We use the subscript “B” to indicate spin-0 bosons.
The combination DB (E)f (E, µ, τ ) behaves like E −1/2 at small E. Even though
this function has a singularity at E = 0, it is an integrable singularity near E = 0 —
the area under the curve E −1/2 is finite. This guarantees that the sum Ne (τ ) can
be approximated by an integral
\[
N_e(\tau) \approx \int_0^\infty dE\, D_B(E)\, f(E,\mu,\tau). \qquad (12.12)
\]

There are some confusing things about this integral: it is supposed to represent only
the contribution of excited states, and yet the integral goes from 0 to ∞. Doesn’t
this include the ground state? And if so, aren’t we double-counting the ground
state?
Example 12.1. The answer is no, we are not double-counting. To see why let us
study the following toy model of the sum over energy levels:

\[
S_{0,\epsilon}(a) = \frac{1}{a}\sum_{n=0}^{\infty}\frac{(n/a+\epsilon)^{1/2}}{e^{n/a+\epsilon}-1}, \qquad (12.13)
\]

in the limit ϵ ≪ 1/a ≪ 1. The important feature of this toy model is that the
small quantity ϵ is only important in the first term n = 0. Thus, we will have
to treat this term differently from the others. We will see that the other terms
can be approximated as an integral from n = 0 to ∞, and this integral does not
double-count the n = 0 term.
To begin, consider the following simplified version of the sum over the “excited”
levels from n = 1 to ∞,

\[
S_1(a) = \frac{1}{a}\sum_{n=1}^{\infty}\frac{(n/a)^{1/2}}{e^{n/a}-1} \qquad (12.14)
\]

The sum is the area under the red curve in the following picture (here, we have
a = 10)
[Figure: the function x^{1/2}/(eˣ − 1) plotted versus x, together with the step-function (staircase) approximation whose area gives the sum S₁(a), for a = 10.]

Meanwhile, the area under the blue curve is


\[
I = \int_0^\infty \frac{dn}{a}\,\frac{(n/a)^{1/2}}{e^{n/a}-1} = \int_0^\infty dx\, \frac{x^{1/2}}{e^x-1}. \qquad (12.15)
\]

Because I is finite, it is hopefully clear that S1 (a) ≈ I, with the approximation


getting better for larger a. Note that the integral goes from x = 0 to x = ∞, but it
approximates the terms in the sum with n = 1, 2, . . . .
Now, if ϵ ≪ 1/a, then modifying the sum over “excited” levels to include ϵ
doesn’t change the answer very much

S_{1,ϵ}(a) = (1/a) Σ_{n=1}^∞ (n/a + ϵ)^{1/2}/(e^{n/a+ϵ} − 1) ≈ S₁(a) ≈ I.   (ϵ ≪ 1/a ≪ 1)   (12.16)

The reason is that n/a ≫ ϵ for all n = 1, 2, . . . , by our assumption that ϵ ≪ 1/a.
Thus every individual term in the sum is well-approximated by replacing n/a + ϵ → n/a.


Now consider the sum including the “ground” level n = 0. We can evaluate it
by separating out the n = 0 term:

S_{0,ϵ}(a) = (1/a) Σ_{n=0}^∞ (n/a + ϵ)^{1/2}/(e^{n/a+ϵ} − 1)
           = (1/a) ϵ^{1/2}/(e^ϵ − 1) + S_{1,ϵ}(a)
           ≈ ϵ^{−1/2}/a + I.   (ϵ ≪ 1/a ≪ 1)   (12.17)
The sum S0,ϵ (a) is analogous to the one we must do: because the ground energy is so
close to a singularity of the Bose-Einstein distribution, we must treat it separately.
The remaining energy levels are not too close to the singularity and they can be
approximated in terms of an integral.
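As a quick numerical aside (not part of the original notes), the approximation (12.17) is easy to check directly in Python; the values of a, ϵ, and the cutoff nmax below are arbitrary choices satisfying ϵ ≪ 1/a ≪ 1.

import numpy as np
from scipy.special import gamma, zeta

def S0(a, eps, nmax=5000):
    # direct evaluation of the toy-model sum (12.13), truncated at a large cutoff nmax
    x = np.arange(nmax) / a + eps
    return np.sum(np.sqrt(x) / np.expm1(x)) / a

I = gamma(1.5) * zeta(1.5)             # I = Gamma(3/2) zeta(3/2), the integral (12.15)
a, eps = 100.0, 1e-6                   # regime eps << 1/a << 1
print(S0(a, eps), eps**-0.5 / a + I)   # the two values agree well, as predicted by (12.17)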
Let us now return to the sum (12.8). Approximating it as an integral, we have

Ne(τ) = ∫₀^∞ dE DB(E) 1/(e^{E/τ} − 1)
      = (V/4π²)(2m/ℏ²)^{3/2} ∫₀^∞ dE E^{1/2}/(e^{E/τ} − 1)
      = (V/4π²)(2m/ℏ²)^{3/2} τ^{3/2} ∫₀^∞ dx x^{1/2}/(e^x − 1)
      = (V/4π²)(2mτ/ℏ²)^{3/2} Γ(3/2) ζ(3/2)
      = ζ(3/2) nQ V,   (12.18)

where nQ = (mτ/2πℏ²)^{3/2} is the quantum concentration. Above, we have used Γ(3/2) = √π/2. Numerically, ζ(3/2) ≈ 2.61238. Note that the shift En → En − Eground is not
important in the integral approximation.
Thus, the fraction of particles in excited orbitals is

ne/n = ζ(3/2) nQ/n,   (12.19)

where n = N/V. Because particle number is conserved, the fraction of particles in the ground orbital is

n0/n = 1 − ζ(3/2) nQ/n = −τ/(Nµ).   (12.20)

This finally determines the temperature dependence of µ (for τ < τE — see below).


The Einstein condensation temperature τE is the temperature above which all particles are in excited orbitals. It is given by

ζ(3/2) nQ(τE)/n = 1   =⇒   τE = (2πℏ²/m) (n/ζ(3/2))^{2/3}.   (12.21)

Plugging this back in, we can write

ne/n = (τ/τE)^{3/2}.   (12.22)

Above this temperature, we can no longer use (12.20) to determine µ. Instead, we


must solve the equation N = Ne (τ ), taking into account µ-dependent corrections to
Ne (τ ) (which we ignored above because µ was so small).
The interpretation of the calculation that we just did is as follows. At a given
temperature τ , the excited orbitals can only hold a certain fraction of the particles
in the gas. Any particle that doesn’t go in an excited orbital goes into the ground
state.
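As a numerical aside (not in the original notes), here is a short Python evaluation of (12.21) in SI units; it uses the ⁴He mass and the concentration 2 × 10^{22}/cm³ quoted in the next subsection and reproduces τE ≈ 3 K.

import numpy as np
from scipy.constants import hbar, k, atomic_mass
from scipy.special import zeta

m = 4.0 * atomic_mass              # approximate mass of a 4He atom
n = 2e22 * 1e6                     # 2 x 10^22 cm^-3 converted to m^-3
tau_E = (2 * np.pi * hbar**2 / m) * (n / zeta(1.5))**(2 / 3)   # eq. (12.21), in joules
print("T_E =", tau_E / k, "K")     # about 3 K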

12.3 Liquid 4 He and superfluidity


Plugging in numbers for ⁴He, we find that for a concentration of 2 × 10^{22}/cm³, the
Einstein condensation temperature calculated using (12.21) is τE = 3 K. Nothing
actually happens at this temperature — our discussion above is modified strongly
by interactions.28
Instead, when T ≈ 2.17K (at low pressure), one finds a new phase with negligible
viscosity, meaning it flows almost without resistance. Below this temperature, liquid
Helium is in a two-fluid state: a superfluid, which is essentially a Bose condensate
consisting of particles in the ground orbital, and a normal fluid consisting of particles
in excited orbitals.
Note that ³He also shows superfluidity, but at a much lower temperature τ ≈ 10^{−3} K. This might be surprising because ³He atoms are fermions. What happens
is that interactions between the fermions cause them to form “Cooper” pairs which
are bosons. These Cooper pairs can then condense. The very low temperatures are
necessary so that the Cooper pairs are sufficiently tightly bound.
Superfluidity is not implied by Bose-Einstein condensation, but it is related. Its
appearance depends on interactions, so it goes beyond what we can analyze in this
lecture. However, we can give some idea of how superfluidity works.
Consider a heavy object with mass M moving through a liquid. The object
may excite the liquid, and if it does, that will manifest as resistance to the object’s
motion. However, it can happen that conservation of energy and momentum don’t
28
Later, we will discuss some examples of true Bose Einstein condensation that can be described
using the framework we developed above.


allow the object to excite the liquid. If this occurs, the object won’t experience
resistance, and the liquid will appear to have zero viscosity.
Suppose the excitations of the liquid have momentum p and energy Ep . Then
during a scattering process with the object, conservation of energy and momentum
imply
(1/2)MV² = (1/2)MV′² + Ep,
MV = MV′ + p,   (12.23)

where V and V′ are the initial and final velocities of the object. Subtracting p from
both sides of the second equation, squaring, and dividing by 2M, we get

(1/2)MV² − V·p + p²/(2M) = (1/2)MV′².   (12.24)
Subtracting this from the first equation, we find

V·p − p²/(2M) = Ep.   (12.25)
Suppose that M is very large, so that we have

V · p = Ep . (12.26)

Depending on the functional form of Ep, it may not be possible to satisfy this
equation. The dot product V · p is maximized when the velocity and momentum
are aligned. In that case, we find

V = Ep/p.   (12.27)
The lowest possible V for which this equation can be satisfied (the “critical velocity”)
is obtained by plotting Ep vs p and finding the lowest slope of a line through the
origin tangent to the curve.
If the excitations of the liquid have a free-particle dispersion relation Ep = p²/(2m),
then this equation becomes V = p/(2m), which always has a solution for any value
of V. The curve Ep vs. p is a quadratic, and the slope of the line through the origin
tangent to the curve is zero.
However, the excitations of 4 He do not have a free-particle dispersion relation
— instead Ep is a somewhat complicated function of p. At small p, the excitations
of ⁴He are longitudinal sound waves with a dispersion relation

Ep = vs p, (12.28)

where vs ≈ 237 m/s is the speed of sound. Naively, (12.28) would then give a critical velocity equal to vs. The actual critical velocity in liquid ⁴He is much lower, about 50 m/s, because the curve of Ep vs. p dips back down at larger p, at momenta corresponding to a wavelength of about 10^{−8} cm.
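As an illustration (not from the notes), the critical velocity can be found numerically as the minimum of Ep/p over the dispersion curve. The dispersion below is a made-up phonon curve with an artificial dip, not the measured ⁴He spectrum; only the qualitative conclusion (a critical velocity well below vs) is meaningful.

import numpy as np

v_s = 237.0                               # speed of sound in m/s (from the text)
p = np.linspace(1e-27, 3e-24, 200000)     # momentum grid (kg m/s), arbitrary range
# toy dispersion: linear phonon part plus an artificial dip at larger p
delta, p0, w = 3.5e-22, 2.0e-24, 4.0e-25  # made-up dip depth (J), location, width
E = v_s * p - delta * np.exp(-((p - p0) / w)**2)
v_c = np.min(E / p)                       # lowest slope of a line through the origin
print("toy critical velocity:", v_c, "m/s; compare v_s =", v_s, "m/s")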


12.4 Dilute gas Bose-Einstein condensates


In the mid-90’s, two competing groups at CU Boulder (Cornell and Wieman) and
at MIT (Ketterle) created Bose-Einstein condensates (BECs) in dilute atomic gases
that are well-described by the theory we have developed. Because the gases are di-
lute, interatomic collisions are rare, and thermodynamic properties are not strongly
modified by interactions as in the case of superfluid helium.
The experiments were done with 87 Rb (Boulder) and 23 Na (MIT). These are al-
kali atoms which have a single valence electron. A benefit of alkali atoms is that they
have nonzero magnetic moments, and thus can be held in a magnetic trap. Magnetic
trapping was essential for achieving Bose condensation at very low temperatures,
since otherwise the atoms would interact strongly with whatever container they were
placed in.
The trapped atoms have a density of about 10^{14} cm^{−3}, for which the Einstein
temperature is τE ≈ 1µK. The main challenge in achieving a dilute gas BEC
was achieving such low temperatures, while maintaining a suitable concentration of
atoms.

12.4.1 Magnetic traps


A magnetic field B produces a splitting of energy levels called the Zeeman effect.
The energy of a magnetic dipole in the field B is

E = −µ · B (12.29)

Alkali atoms have a single valence electron with zero orbital angular momentum
L = 0, so their magnetic moment is entirely due to the spin of the valence electron

µ = −gs µB S (12.30)

where gs ≈ 2 is the electron spin g-factor, S is the electron spin angular momentum,
and µB = eℏ/2me is the Bohr magneton.
An atom whose magnetic moment is aligned with B experiences a lower potential
at high values of |B|, and hence seeks regions of large |B|. An atom whose magnetic
moment is anti-aligned will seek regions of small |B|. Earnshaw’s theorem says
that it is not possible to have a local maximum of |B|. Thus, magnetic traps
work by setting up a local minimum of |B|. If the field is sufficiently strong, the
magnetic moments of atoms moving around in the trap will adiabatically follow the
orientation of B — i.e. high-field-seekers will remain high-field-seekers and low-
field-seekers will remain low-field-seekers. The low-field-seekers will be attracted to
the local minimum of |B|, and the high-field-seekers will be expelled from the trap.
The magnetic traps used in the first BEC experiments were quadrupole traps,
whose local magnetic field looks like

B ∝ xex + yey − 2zez . (12.31)


(Note that this indeed satisfies ∇ · B = 0.) This can be achieved locally by
placing two finite-length solenoids end-to-end with opposite fields. Unfortunately,
quadrupole traps have a zero of B in the middle. Atoms can experience spin-flips
at this location, resulting in them being lost from the trap. The MIT team dealt
with this problem by “plugging” the hole at (x, y, z) = 0 using a laser with fre-
quency tuned to repel the atoms.29 A nicer choice is the Ioffe-Pritchard trap, which
has a nonzero magnetic field at its center and gives rise to a harmonic (though not
isotropic) potential.

12.4.2 Laser cooling


Magnetic traps are not very strongly confining, so atoms must be cooled significantly
to even remain in the trap. The technique used for this initial cooling is laser cooling
or “optical molasses,” which is a very cool idea.
The idea is to shine a laser through the gas of atoms at a frequency ν = ω − δ
slightly lower than a resonant frequency ω of the atoms. That is, ω is such that
ℏω = E1 − E2 is a difference between two atomic energy levels, and δ is called
the “detuning.” Atoms moving toward the laser with velocity v will experience a
blue-shifted frequency ν → ν − k · v = ω − (δ + k · v), where k is the wave vector
of the laser, satisfying c|k| = ν. This blue-shift will make the laser light closer to
resonance, and therefore make the atom more likely to absorb a photon, receiving
a momentum kick ∆p = m∆v = −ℏk opposite to its direction of motion. Once
the photon is absorbed, the atom radiates it away isotropically, resulting in zero
momentum kick on average. Thus, the atom is more likely to slow down than speed
up during its interaction with the laser.
More precisely, the rate of absorption of photons with detuning δ is proportional
to
Rabs ∝ A/(γ² + (ν − ω)²) = A/(γ² + δ²),   (12.32)

where ω is the resonant frequency of the atom, ν is the applied frequency of light,
γ is the rate of spontaneous emission of photons, and A depends on the squared
amplitude of the applied light. You can think of the atom as being like a harmonic
oscillator with frequency ω and damping rate 2γ. Then (12.32) is the expression
for the squared amplitude of a damped-driven oscillator with driving frequency ν.
29
The laser is tuned to a frequency ω + δ, where ω is a characteristic oscillation frequency of the
atom and δ is a large positive “detuning.” The laser creates an oscillating electric field that drives
oscillations of the electric dipole moment of the atoms. The energy of an electric dipole in a field E
is −E · d. When δ is positive, oscillations of d occur π out of phase with the driving electric field,
causing the atom to have higher energy in regions of stronger oscillating electric field.


Thus, the force due to absorption of photons is proportional to

Fk = ℏk Rabs = ℏk A/(γ² + (δ + k·v)²)
   ≈ (Aℏk/(γ² + δ²)) (1 − 2δ k·v/(γ² + δ²)),   (12.33)
where the shift of the detuning δ → δ + k · v in the denominator is due to the
doppler shift. In the second line, we expanded in small v = |v|. The first term
gives a universal kick in the direction of the laser light, while the second term gives
the velocity-dependent cooling force. We do not need to consider the force due to
spontaneous emission because it averages to zero.
A nifty trick for canceling the unwanted first term is to use two counterpropa-
gating laser beams — one with wavevector k and the other with wavevector −k.
The sum of the resulting forces is
Fk + F−k = −4Aℏδ k(k·v)/(γ² + δ²)².   (12.34)
For example, if the lasers and velocity are oriented in the z-direction, the force is
Fz ∝ −(4Aℏδk²/(γ² + δ²)²) vz,   (12.35)
which looks like a viscous drag force. For this reason, the counterpropagating lasers
are called “optical molasses.” The setup we just described is sufficient to slow atoms
in one dimension. To slow them in three dimensions, we must apply three pairs of
counterpropagating beams.
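As a minimal illustration (not in the notes), the drag force (12.35) makes vz decay exponentially; the damping constant below is an arbitrary illustrative value, not a real atomic parameter.

import numpy as np

alpha_over_m = 5.0e3            # damping rate alpha/m in 1/s (illustrative value only)
dt, steps, v = 1e-5, 100, 1.0   # time step, number of steps, initial velocity (m/s)
for _ in range(steps):
    v += -alpha_over_m * v * dt                   # Euler step of m dv/dt = -alpha v_z
print(v, np.exp(-alpha_over_m * dt * steps))      # numerical decay vs. exact exponential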
In the MIT experiment, they start with sodium atoms effusing from an oven at
a temperature of 600 K and a concentration of 10^{14}/cm³. Using laser cooling, the
atoms are eventually cooled to about 100 µK.

12.4.3 Evaporative cooling


The next step is evaporative cooling. Conceptually, the height of the trap is lowered
gradually, so atoms near the top of the potential well (i.e. those with the most
energy) escape, while those deeper in the potential well do not. Atomic collisions
occur often enough to keep the remaining atoms close to thermal equilibrium with
each other, so that the temperature steadily decreases. In this way, the temperature
is further reduced from 100µK to T ∼ TE ∼ 1µK, with approximately 107 atoms
remaining in the BEC.
The way the walls of the trap are “lowered” is really interesting. Recall that
the atoms are trapped by an inhomogeneous magnetic field. They experience an
effective potential

Veff = ±(1/2) gs µB |B|.   (12.36)


The source of this effective potential is the Zeeman splitting between spin states

∆E = gs µB |B| ≡ ℏωZeeman . (12.37)

Atoms whose magnetic moments are anti-aligned with the magnetic field experience
a plus sign in (12.36), and thus seek low values of |B| (and remain in the trap),
while atoms whose magnetic moments are aligned seek high values of |B| (and get
ejected).
One can induce oscillations between energy levels ∆E = ℏω by applying an
oscillating electric field with frequency ν ≈ ω. Thus, by applying such a field with
ν = ωZeeman , one can turn low-field-seeking atoms into high-field-seeking atoms and
kick them out of the trap. The key trick is to match the oscillation frequency ν to
the value of ωZeeman for atoms at the top of the trap. In this way, only atoms at the
top of the trap get ejected.
[End of lecture 14(2024)]

12.4.4 BEC in a harmonic trap


The fact that the atoms are in a harmonic trap changes slightly the analysis of Bose
Einstein condensation. For simplicity, let us assume that the trap is isotropic,30 so
that the energy levels for a single atom are the energy levels of three independent
harmonic oscillators

En = ℏω0 (nx + ny + nz ). (12.38)

The number of particles in excited orbitals is

Ne = Σ_{n≠0} f(En, µ, τ) ≈ ∫₀^∞ dnx dny dnz f(ℏω0(nx + ny + nz), µ, τ).   (12.39)

Last time, the energy depended on nx² + ny² + nz², so it was useful to introduce a
variable n = √(nx² + ny² + nz²) representing the radial direction in mode number space.
We can use the same trick here. This time, our “radial” variable is s = nx + ny + nz.
By dimensional analysis, the measure for the radial direction will be a s² ds for some
constant a. We can fix the constant by requiring that ∫₀¹ a s² ds = 1/6, the volume of
a unit simplex. This gives a = 1/2.
30
For typical BEC’s in the laboratory, this is not the case — the trap is typically elongated in
some direction.


Overall, we get

Ne = (1/2) ∫₀^∞ s² ds 1/(e^{(ℏω0 s − µ)/τ} − 1)
   ≈ (1/2) (τ³/(ℏω0)³) ∫₀^∞ dx x²/(e^x − 1)
   = ζ(3) τ³/(ℏω0)³.   (12.40)

The Einstein condensation temperature is the temperature where Ne = N, i.e.

τE = ℏω0 (N/ζ(3))^{1/3}.   (12.41)

The mean squared position of an atom in an excited state in a harmonic trap is

(1/2) mω0² ⟨x²⟩_τ = (1/2) τ   =⇒   ⟨x²⟩_τ = τ/(mω0²),   (12.42)

by equipartition of energy. However, the squared width of the ground state is



⟨0|x²|0⟩ = ℏ/(2mω0).   (12.43)
Thus, at temperatures comparable to τE, we have

⟨0|x²|0⟩^{1/2} / ⟨x²⟩_τ^{1/2} ≈ (ℏω0/(2τE))^{1/2} ≈ N^{−1/6}.   (12.44)

Consequently, the BEC appears as a dense “pit” inside a cloud.
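As a numerical aside (not in the original notes), plugging representative numbers into (12.41) and (12.44) is a one-liner in Python; the trap frequency below is an assumed typical value, and N = 10⁷ is the atom number quoted earlier.

import numpy as np
from scipy.constants import hbar, k
from scipy.special import zeta

omega0 = 2 * np.pi * 100.0        # assumed trap frequency of order 100 Hz
N = 1e7                           # number of atoms (value quoted above)
tau_E = hbar * omega0 * (N / zeta(3.0))**(1 / 3)      # eq. (12.41)
print("T_E ~", tau_E / k * 1e9, "nK")                 # order of 1 microkelvin
print("ground/thermal size ratio ~", N**(-1 / 6))     # eq. (12.44)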


13 Heat and Work


13.1 Reversibility, heat, and work
Definition (reversible process). A reversible process is a process where the com-
bined entropy of a system plus reservoir remains constant. Because the total entropy
remains constant, it is possible to run the process in reverse without violating the
laws of thermodynamics (in this case, without leaving the “most probable configu-
ration” for the combined system).
Note that the entropy σS of the system can change in a reversible process — it
is only the combined entropy σtot = σS + σR that cannot change. Henceforth, we
denote σS by σ. Achieving reversibility in practice requires e.g. removing friction,
and ensuring that two objects with different temperatures never come into direct
contact. If heat is only exchanged between objects whose temperatures are infinites-
imally close to each other, then the heat flow can be reversed by an infinitesimal
change in one of the temperatures. In this way, we can imagine running the process
backwards. However, if heat flows from a hotter object to a cooler object, then we
know from our discussion of thermal contact that this involves an increase in total
entropy that cannot be undone.
In a reversible process, there are two types of energy exchange that we can
distinguish:
Definition (heat). An exchange of energy that accompanies a change in entropy
of the system σ → σ + dσ is called heat and we denote it by d̄Q. In a reversible
process, the change in entropy of the reservoir must be dσR = −dσ. By conservation
of energy, the change in energy of the reservoir is dUR = −d̄Q. Finally, from the
definition of temperature 1/τ = ∂σR/∂UR, we have d̄Q = τ dσ.

Definition (work). An exchange of energy that comes from a change in external


parameters (such as the position of a piston) is called work and is denoted d̄W . Work
does not involve a change in entropy of the system.
The d̄-notation in d̄Q and d̄W is to emphasize that there do not exist functions
Q(σ, V) and W(σ, V) of which d̄Q and d̄W are differentials. That is, heat and
work are not functions of the state of a system — we cannot say that my car has
heat 1000J or that an iceberg has work 1000J. Instead, heat and work characterize
changes. By contrast σ, U , and V are functions of the state of a system — one
can observe the system without knowing its history and (in principle) write down
σ, U , and V . We write their differentials as dσ, dU , and dV . Don’t worry if this
distinction is confusing — we will return to it later.31
31
For the mathematically inclined, the distinction is this: differentials like dσ and dV are exact
differential forms obtained by applying the exterior derivative operator d to some function of the
state of a system. By contrast, “differentials” with a bar, d̄W and d̄Q, are not exact or even closed.
They are more properly thought of as 1-forms on configuration space.


It is often useful to think of S as coupled to two different auxiliary systems:


a reservoir R with which it can exchange heat (i.e. energy with an accompanying
exchange of entropy), and another system S ′ on which it can do work but not
exchange entropy. In practice, there may be no physical distinction between R and
S ′.
By conservation of energy, we have
dU = d̄W + d̄Q = d̄W + τ dσ,   (13.1)
which says that the total change in energy is caused partly by heat exchanged with
the reservoir and partly by work done on the system.

13.2 Heat engines and Carnot efficiency


All forms of work can be completely converted into each other — for example,
mechanical energy can be used to generate an electrical current, and an electrical
current can be converted back into mechanical energy. Meanwhile, work can be
completely converted into heat (e.g. by friction), but not vice-versa. This makes
work a valuable quantity. It is possible to convert some heat to work, but there are
fundamental limits on how efficient this process can be.
Definition (heat engine). A heat engine is a cyclic device that converts heat into
work by exploiting a difference in temperature between two reservoirs.
Here, “cyclic” means that the device periodically returns to its initial state.
Let us model such a device as a system S coupled to three auxiliary systems
• A reservoir Rh at high temperature τh ,
• A reservoir Rl at low temperature τl ,
• An auxiliary system S ′ on which S can do work.
Over the course of a cycle, the device takes in heat Qh at high temperature τh , does
some work W on S ′ , and ejects heat Ql at low temperature τl . By conservation of
energy, we have
Qh = W + Ql . (13.2)
Let us assume the device operates reversibly. This means that the change in
entropy of the system must be zero over the course of a cycle. The entropy in-
crease accompanying the intake of Qh is σh = Qh /τh , and the entropy decrease
accompanying the output of Ql is σl = Ql /τl . Overall, we must have
σh = σl
Qh /τh = Ql /τl . (13.3)


From here, we can solve for W in terms of Qh:

W = Qh (1 − τl/τh) = Qh (τh − τl)/τh = ηC Qh.   (13.4)
The quantity ηC is called the Carnot efficiency. It gives the fraction of heat that
can be converted into work by a heat engine operating reversibly.
Suppose now that the heat engine operates irreversibly, so that there is some
entropy production inside the engine. For example, at some point the engine could
allow gas to expand suddenly into a larger chamber, or friction inside the engine
could directly turn work into heat. Because entropy must increase, in this case we
have
σh ≤ σl
Qh /τh ≤ Ql /τl (irreversible engine) (13.5)
Solving for W in this case, we have
W = Qh − Ql ≤ Qh (1 − τl /τh ) = ηC Qh (irreversible engine). (13.6)
Thus, heat engines operating irreversibly have a lower energy conversion efficiency
W/Qh ≤ ηC .

13.3 Refrigerators
A refrigerator is a heat engine run in reverse. The refrigerator takes input heat
Ql from a reservoir at low temperature τl , together with input work, and outputs
heat Qh to a reservoir at temperature τh . In a reversible refrigerator, we again have
Qh /τh = Ql /τl so that total entropy is unchanged. Conservation of energy implies
W = Qh − Ql . The efficiency of a refrigerator is measured by
γ = Ql/W = Ql/(Qh − Ql).   (13.7)
In the case of reversible operation, this is

γC = τl/(τh − τl).   (13.8)
For an irreversible refrigerator, we have Qh /τh ≥ Ql /τl , so that the coefficient γ is
smaller than γC .
An air conditioner is a refrigerator that cools the inside of a building. Note that
the efficiency of an air conditioner is highest when the inside temperature τl is close
to the outside temperature τh . A heat pump is an air conditioner with the input and
output switched, so it uses work to heat up a building. The efficiency of a reversible
heat pump is
Qh/W = Qh/(Qh − Ql) = τh/(τh − τl) > 1,   (13.9)
which is better than directly converting W into heat and injecting it into the room
(as with a heater).
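The reversible bounds (13.4), (13.8), and (13.9) are simple enough to package as helper functions; here is a small Python sketch (my own illustration, with arbitrary example temperatures).

def carnot_efficiency(tau_h, tau_l):
    return (tau_h - tau_l) / tau_h          # eq. (13.4)

def refrigerator_cop(tau_h, tau_l):
    return tau_l / (tau_h - tau_l)          # eq. (13.8)

def heat_pump_cop(tau_h, tau_l):
    return tau_h / (tau_h - tau_l)          # eq. (13.9)

# example: room at 295 K, outdoors at 275 K
print(carnot_efficiency(295, 275), refrigerator_cop(295, 275), heat_pump_cop(295, 275))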


13.4 Different types of expansion


Let us give some different examples of processes involving a classical ideal gas and
their relationship to reversibility.
Example 13.1 (Reversible isothermal expansion). Isothermal means the tem-
perature is held constant. The gas is in contact with a reservoir and can exchange
energy with the reservoir during expansion. As the gas expands isothermally, it does
work on the walls of its container. The work done is
W = ∫_{Vi}^{Vf} p dV = ∫_{Vi}^{Vf} (Nτ/V) dV = Nτ log(Vf/Vi).   (13.10)
The entropy is given by (10.22). The quantum concentration nQ is constant during
isothermal expansion, so the entropy of the gas increases by
∆σ = N log(Vf /Vi ). (13.11)
Note that W = τ ∆σ. This is the integral of the thermodynamic relation
dU = τ dσ − pdV
∆U = τ ∆σ − W. (13.12)
For an ideal gas, τ is unchanged, so since U = (3/2)Nτ, the energy U is unchanged as
well. This implies W = τ ∆σ.
Note that there are two sources of energy flow into/out of the gas. 1) The gas
does work on its environment through a piston, meaning energy W leaves the gas.
2) Energy in the form of heat enters the gas as it expands to maintain constant
temperature.
Example 13.2 (Reversible isentropic expansion). Isentropic means the en-
tropy is held fixed during the expansion. This can be achieved by insulating the gas
from a reservoir so that no heat can flow. The entropy of a monatomic gas depends
on the volume and temperature as
 
σ(τ, V) = N (log τ^{3/2} + log V + . . .).   (13.13)

To keep the entropy constant, we must have V τ 3/2 = const. So as the system
expands, the temperature drops. The energy is proportional to the temperature, so
the energy drops as well.
Because no heat can flow, we have ∆U = −W . Consider a small amount of
expansion V → V + dV . The change in energy due to work is
dU = −p dV = −(Nτ/V) dV = −(N/V)(2U/3N) dV = −(2U/3)(dV/V)
dU/U = −(2/3)(dV/V)
U = C V^{−2/3}.   (13.14)


Thus, we have τ V^{2/3} ∝ U V^{2/3} = const., which agrees with our direct analysis using
the formula for the entropy.
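As a quick check (not from the notes), one can integrate dU/U = −(2/3) dV/V numerically and compare with the closed form U ∝ V^{−2/3}; the initial values below are arbitrary.

V, U = 1.0, 1.0                        # arbitrary initial volume and energy
V_final, steps = 2.0, 100000
dV = (V_final - V) / steps
for _ in range(steps):
    U += -(2.0 / 3.0) * U * dV / V     # dU = -p dV with p = 2U/(3V)
    V += dV
print(U, V_final**(-2.0 / 3.0))        # both approximately 0.63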
Example 13.3 (Irreversible expansion into vacuum). If a gas suddenly ex-
pands, without exchanging energy with its environment, the process is irreversible.
The total energy doesn’t change, and so the temperature doesn’t either. The en-
tropy thus changes by N log(V2 /V1 ). For example, if we double the size of the box,
the entropy changes by N log 2. We can think of this as adding 1 bit of information
per particle, since now to specify the state of the system, we must say whether it is
in the left- or right-half of the box.

13.5 The Carnot cycle


We have argued that any heat engine has an upper bound on its efficiency, η ≤ ηC = (τh − τl)/τh, coming from the principle that entropy does not decrease. The Carnot cycle
is an example of a heat engine that achieves this maximum efficiency.
In the Carnot cycle, a gas is in a box adjacent to two reservoirs at temperatures τl
and τh . At any given time, we can remove an insulating barrier to couple the gas to
a reservoir, or add a barrier to decouple the gas. Furthermore, the gas can do work
against a piston by changing its volume. By adding/removing barriers and doing
work on or extracting work from the gas, we change the macroscopic configuration
of the gas in a cycle so that it comes back to its starting point.
In order to have a reversible engine, we must only couple the gas to a reservoir
when the gas already has the temperature of that reservoir. This tells us that some
parts of the cycle should be isothermal at temperatures τl and τh . Furthermore,
when the gas is not coupled to a reservoir, we must ensure that its entropy does not
change. This tells us that the other parts of the cycle should be isentropic.
Essentially the only solution is a four-stage process. Isothermal expansion, isen-
tropic expansion, isothermal compression, isentropic compression. In the τ, σ plane,
the gas traverses a rectangle between the points

p1 = (τh , σL ), p2 = (τh , σH ), p3 = (τl , σH ), p4 = (τl , σL ). (13.15)

Recall that the entropy of a gas is given by

σ = N (log τ 3/2 + log V + const.). (13.16)

Thus, we see that σH − σL = N log(V2 /V1 ) = N log(V3 /V4 ), so that we must have
V2 /V1 = V3 /V4 . In more detail, the process is32
32
Following Kittel and Kroemer, we use conventions where we always associate a positive number
and a direction (i.e. from the reservoir to the system) to heat and work, as opposed to using negative
numbers to indicate direction. Thus, when combining different sources of heat and work, we will
have to keep in mind the direction of flow to know what signs to use.


• (1 → 2) The gas starts with temperature τh and entropy σL . After being


coupled to the reservoir at temperature τh , it undergoes isothermal expansion,
doing work on the piston and also increasing its entropy to σH . During this
stage, heat flows from the reservoir:

Qh = τh (σH − σL ) = N τh log(V2 /V1 ). (13.17)

Recall that the internal energy of a monatomic ideal gas is given by U = (3/2)Nτ.


Since the temperature isn’t changing during isothermal expansion, the internal
energy of the gas isn’t changing either. Thus, all the heat Qh goes into work
done on the piston

W12 = Qh . (13.18)

• (2 → 3) The gas now has temperature τh and entropy σH . The reservoir is


decoupled from the gas, and the gas undergoes isentropic expansion. During
this expansion, the gas does more work on the piston, but its temperature
drops to τl . The work is given by the change in internal energy of the gas,
which is given by
W23 = U(τh) − U(τl) = (3/2)N(τh − τl).   (13.19)

During isentropic expansion or compression, V τ 3/2 is constant. Thus, we have


V2 τh^{3/2} = V3 τl^{3/2}.   (13.20)

• (3 → 4) The gas now has temperature τl and entropy σH . It is coupled to the


low-temperature reservoir with temperature τl , and then undergoes isothermal
compression. During the compression, its entropy changes from σH back to
σL , so that it dumps heat Ql = τl (σH −σL ) into the low-temperature reservoir.
Because the internal energy of the gas doesn’t change, this heat must come
from work done by the piston.

W34 = Ql = τl (σH − σL ) = N τl log(V3 /V4 ) = N τl log(V2 /V1 ) (13.21)

• (4 → 1) The gas has temperature τl and entropy σL . It is decoupled from the


reservoir and undergoes isentropic compression. During the compression, the
temperature increases from τl to τh . To achieve the compression, the piston
must do work
W41 = U(τh) − U(τl) = (3/2)N(τh − τl).   (13.22)


The total work done by the heat engine is

W12 + W23 − W34 − W41 = W12 − W34


= (τh − τl )(σH − σL ). (13.23)

[End of lecture 15(2024)]

This is simply the area of the rectangle in the τ-σ plane. This simple result is a
consequence of the fact that the integral of dU over a closed cycle in configuration
space must vanish:

∮ dU = 0.   (13.24)

On the other hand, we have

dU = d̄Q + d̄W = τ dσ − p dV.   (13.25)

Thus, we find

∮ p dV = ∮ τ dσ = ∫_rectangle dτ dσ,   (13.26)

where in the last line we used Stokes’ theorem.


Meanwhile, the heat taken in from the high temperature reservoir is τh(σH − σL). Thus, the Carnot cycle exactly achieves the Carnot efficiency W/Qh = (τh − τl)/τh = ηC.
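As a numerical aside (my own bookkeeping sketch, not from the notes), the cycle can be checked in a few lines of Python with arbitrary values of N, τh, τl, V1, V2: the net work equals the rectangle area and W/Qh equals ηC.

import numpy as np

N, tau_h, tau_l = 1.0, 2.0, 1.0        # arbitrary illustrative values
V1, V2 = 1.0, 3.0                      # isothermal expansion from V1 to V2
dsigma = N * np.log(V2 / V1)           # sigma_H - sigma_L

Q_h = tau_h * dsigma                   # (13.17)
W12 = Q_h                              # (13.18)
W23 = 1.5 * N * (tau_h - tau_l)        # (13.19)
W34 = tau_l * dsigma                   # (13.21)
W41 = 1.5 * N * (tau_h - tau_l)        # (13.22)

W_net = W12 + W23 - W34 - W41
print(W_net, (tau_h - tau_l) * dsigma)        # equal: eq. (13.23)
print(W_net / Q_h, (tau_h - tau_l) / tau_h)   # equal: the Carnot efficiency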

13.5.1 Path dependence of heat and work


In the Carnot cycle, we see an example of the path-dependence of heat and work.
The integrals
∮ d̄Q = ∮ τ dσ,   ∮ d̄W = −∮ p dV   (13.27)

are both nonzero. This shows that Q cannot be a function of the state of the system
— i.e. there cannot be a function Q(σ, V) such that d̄Q is the differential of that
function.33 If such a function existed, then we’d have ∮ d̄Q = Q(σf, Vf) − Q(σi, Vi) = 0
for any closed cycle. The object d̄Q is something that must be integrated along a
path. The resulting heat transfer Q depends on that path — not just the initial and
final points. This is why we use the funny d̄. The regular d is reserved for something
whose integral along a closed path is zero, as in (13.24).
33
The variables we use to parametrize the system are not important as long as they completely
characterize the macrostate. For example, there can’t be a function Q(τ, V ) either.


13.5.2 Universality of reversible engines


Suppose that we had two reversible engines with different efficiencies
W1/Qh1 = η1 > η2 = W2/Qh2.   (13.28)
We can hook them up to each other with the lower-efficiency engine operating as a
refrigerator. Specifically, we set

Qh1 = Qh2 = Qh , (13.29)

so that W1 = W2 + ∆W . The refrigerator takes in heat

Ql2 = Qh2 − W2 = Qh − W1 + ∆W, (13.30)

from the low temperature reservoir and requires work W1 − ∆W . The heat engine
deposits heat

Ql1 = Qh1 − W1 (13.31)

into the low-temperature reservoir and does work W1 .


Overall, the net effect is to take heat Ql2 − Ql1 = ∆W from the low temperature
reservoir and do work W1 −(W1 −∆W ) = ∆W . That is, we have converted heat into
work, thus violating the second law of thermodynamics. Of course, the contradiction
is that a reversible heat engine or refrigerator must not produce entropy, and we
already showed that this fixes their efficiency to be the Carnot efficiency. Still this
argument is a way of seeing that the efficiency of a reversible engine is universal
without invoking entropy explicitly. Now we can exhibit the Carnot cycle, which is
a reversible cycle, and deduce that all reversible engines have efficiency ηC .


14 Gibbs free energy


14.1 Heat and work at constant temperature
In a process at constant temperature, the work done on a system is

d̄W = dU − d̄Q = dU − τ dσ = dU − d(τσ) = dF,   (14.1)

where F = U − τ σ is the free energy. Recall also that the condition for thermody-
namic equilibrium for a system at constant temperature τ is that F is minimized.
This is because the number of accessible states of a system+reservoir in thermal
contact is

g = gS gR = eσ(U ) eσR (Utot −U ) = eσ(U ) eσR (Utot )−U/τ = eσR (Utot ) e−F/τ , (14.2)

Thus, a minimum of F is a maximum of g, i.e. the “most probable configuration.”


For these reasons, free energy plays a role for isothermal processes at nonzero
temperature analogous to the role that energy plays for processes at zero tempera-
ture. The extra term −τ σ in the free energy accounts for heat exchange with the
reservoir. If heat is provided by the reservoir, then we get it for free — no external
work is required to make it happen. If heat is given to the reservoir, then that
energy cannot be used to do work.

14.2 Heat and work at constant pressure


Many processes, particularly those open to the atmosphere, take place at constant
pressure. An example is the boiling of a liquid against a piston with a constant
applied force. Such a system is in thermal and mechanical contact with a reservoir
R at temperature τ and pressure p.
Definition (mechanical contact). Two systems are in mechanical contact if they
can exchange volume.
Recall that one of our expressions for pressure was
 
p/τ = (∂σ/∂V)_U.   (14.3)

Thus, for the reservoir R, we have


σR(Utot − U, Vtot − V) = σR(Utot, Vtot) − (p/τ)V − (1/τ)U + . . . .   (14.4)
The definition of a reservoir is that the “. . . ” terms can be dropped (we should
imagine that they are higher order in 1/N , where N is some extensive quantity), so
that the entropy of the reservoir is linear in V and U (to very good approximation).


Mechanical contact means that in order for a system to change volume V →


V + δV , it must steal volume from the reservoir VR → VR − δV . That is, volume
becomes a conserved quantity in the context of mechanical contact. Consequently,
the number of accessible states of the combined system and reservoir is
e^{σ(U,V)} e^{σR(Utot−U, Vtot−V)} = e^{σR(Utot,Vtot)} e^{σ(U,V) − pV/τ − U/τ} = e^{σR(Utot,Vtot)} e^{−G/τ}.   (14.5)

Here, we have defined a quantity called the Gibbs free energy

G = F + pV = U + pV − τ σ (14.6)

Thus, the minimum of G is the “most probable configuration” for a system and
reservoir in thermal and mechanical contact.
One of the equilibrium conditions for thermal and mechanical contact is
   
0 = (∂σS/∂V)_U − (∂σR/∂V)_U,   (14.7)

which says that the pressure of the system and reservoir must be equal.
Recall that the free energy F = U −τ σ has an interpretation as the energy, where
we subtract off the “useless” contribution from heat τ σ. The Gibbs free energy has
a similar interpretation. When a system is in mechanical contact with a reservoir,
an amount of energy −pV goes into changing the volume of the reservoir. This
energy is “useless” in the same way as heat, and it is useful to subtract it off. Thus
G represents the “useful” work that can be extracted from a system at constant
temperature and pressure.

14.3 Thermodynamic identities for Gibbs free energy


Consider a system with variable energy U , volume V , and number of particles N .
Under changes dU, dV, dN , the entropy changes by
     
dσ = (∂σ/∂U)_{V,N} dU + (∂σ/∂V)_{U,N} dV + (∂σ/∂N)_{U,V} dN
   = (1/τ) dU + (p/τ) dV − (µ/τ) dN.   (14.8)
The corresponding change in G is

dG = dU + pdV + V dp − τ dσ − σdτ
= (τ dσ − pdV + µdN ) + pdV + V dp − τ dσ − σdτ
= V dp − σdτ + µdN. (14.9)


As a result, we get the following thermodynamic identities

(∂G/∂N)_{τ,p} = µ,
(∂G/∂τ)_{N,p} = −σ,
(∂G/∂p)_{τ,N} = V.   (14.10)

14.4 Gibbs free energy and chemical potential


Note that p and τ are intensive quantities: they do not change when two identical
systems with the same p and τ are put together. By contrast, U, V, σ, F, G, N are
extensive quantities: they are additive under putting two systems together.
If we think of G as a function of τ, p, N , then N and G are the only extensive
quantities, so they must be proportional:

G = N ϕ(τ, p), (14.11)

where ϕ is some function of the intensive quantities that is independent of N. However, we also saw that (∂G/∂N)_{τ,p} = µ, which implies ϕ(τ, p) = µ(τ, p), so that we can
write

G = N µ(τ, p). (14.12)

Example 14.1 (Gibbs free energy of ideal gas). Recall that the entropy of an
ideal gas is
   
σ = N [log(nQ/n) + 5/2] = N [log(V nQ/N) + 5/2].   (14.13)

The free energy is


 
F(τ, V, N) = U − τσ = (3/2)Nτ − Nτ [log(V nQ/N) + 5/2]
           = Nτ [log(N/(V nQ)) − 1].   (14.14)

We previously found the formula


 
µ = (∂F/∂N)_{τ,V} = τ log(N/(V nQ)).   (14.15)


We can alternatively get this from the Gibbs free energy


 
G(τ, p, N) = F + pV = Nτ [log(N/(V nQ)) − 1] + Nτ
           = Nτ log(p/(τ nQ))
           = N µ(τ, p),   (14.16)

where we have used the ideal gas law pV = N τ .


Consider now a system made up of multiple chemical species with particle num-
bers Nj . Because G is extensive, we must have

G(λN1 , λN2 , . . . , λNk ) = λG(N1 , N2 , . . . , Nk ). (14.17)

Taking a derivative with respect to λ and evaluating at λ = 1, we find


Σ_j Nj (∂G/∂Nj) = Σ_j Nj µj(τ, p) = G,   (14.18)

where we used a generalization of (14.10) to compute the µj from derivatives of G.


Here, we made a physical assumption that µj is only a function of τ and p, and
doesn’t depend for instance on the intensive ratios Ni /Nj . This assumption is true
in the limit that the different species are weakly interacting, so the presence of one
type of particle does not affect others. The thermodynamic identity is

τ dσ = dU + p dV − Σ_j µj dNj,   (14.19)

so that we have

dG = Σ_j µj dNj − σ dτ + V dp.   (14.20)

14.5 Chemical processes


The equation for a chemical reaction can be written

Σ_j νj Aj = 0,   (14.21)

where Aj represent chemical species, and νj are coefficients. For example, for the
reaction H2 + Cl2 = 2HCl, we have

A1 = H2, A2 = Cl2, A3 = HCl,
ν1 = 1, ν2 = 1, ν3 = −2.   (14.22)


Consider this reaction happening at constant temperature τ and pressure p. The


equation (14.20) becomes

dG = Σ_j µj dNj.   (14.23)

On the other hand, we can write

dNj = νj dN̂,   (14.24)

where dN̂ is the change in the number of times the reaction has taken place. Plugging
this into the expression for dG, we have

dG = (Σ_j µj νj) dN̂.   (14.25)

In equilibrium, we have dG = 0, so that

Σ_j µj νj = 0.   (14.26)

14.6 Equilibrium for ideal gases


Consider the case where each constituent is an ideal gas. We will have to generalize
our discussion of the ideal gas slightly to make this interesting. Recall that the
partition function of a single monatomic atom was

Z1^{monatomic} = Σ_n e^{−En/τ} = nQ V.   (14.27)

Here, the energies appearing in the sum come from the center-of-mass momentum of
the atoms, and we have written the result after approximating the sum over modes
as an integral in the usual way. If the gas is not monatomic, but the particles have
some internal structure, then the partition function becomes
Z1 = Σ_{n,internal} e^{−(En + Eint)/τ} = Σ_n e^{−En/τ} Σ_{internal} e^{−Eint/τ} = nQ V Zint(τ),   (14.28)

where Zint (τ ) is the partition function associated with the internal degrees of freedom
— for instance, rotational or vibrational motion.
As an example, rotational degrees of freedom contribute

Zrot(τ) = Σ_{j=0}^∞ (2j + 1) e^{−ℏ²j(j+1)/(2Iτ)},   (14.29)


as you computed on problem set 4. Vibrational degrees of freedom contribute



Zvib(τ) = Σ_{n=0}^∞ e^{−nℏω/τ} = 1/(1 − e^{−ℏω/τ}).   (14.30)

The total internal partition function for a diatomic molecule is a product of these
factors
Zint (τ ) = Zrot (τ )Zvib (τ ). (14.31)

[End of lecture 16(2024)]


The Gibbs sum is

Z = Σ_{N=0}^∞ λ^N Z1^N = 1/(1 − λZ1) ≈ 1 + λZ1,   (14.32)

where λ = eµ/τ , and we have assumed λ ≪ 1 so that we are in the classical regime.
The expectation value of N is
N = ⟨N⟩ = λ (∂/∂λ) log Z ≈ λ (∂/∂λ)(λZ1) = λZ1.   (14.33)
Thus, the chemical potential is given by

µ = τ log(N/Z1) = τ (log n − log c),   (14.34)
where c = nQ Zint . This is a generalization of our previous results for monatomic
gases.
Suppose that we have multiple species, so that
µj = τ (log nj − log cj ), (14.35)
where cj = nQj Zint,j(τ). The equilibrium condition (14.26) can be written

log Π_j nj^{νj} = Σ_j νj log nj = Σ_j νj log cj = log K(τ),
K(τ) ≡ Π_j cj^{νj}.   (14.36)

Here, K is called the equilibrium constant, which is a function only of temperature.


It can also be written
K(τ) = Π_j nQj^{νj} e^{−νj Fint,j/τ},   (14.37)

where Fint,j = −τ log Zint,j is the internal free energy. Equation (14.36) is called the
law of mass action.


Example 14.2 (Atomic and molecular hydrogen). As an example, consider


the reaction H2 − 2H = 0 for the dissociation of molecular hydrogen into atomic
hydrogen. The concentrations satisfy

[H2][H]^{−2} = K(τ).   (14.38)

Here, the notation [A] = nA is shorthand for the concentration of species A. Since
H is monatomic, it has Zint,H = 1. What about H2 ? One important point is that
there is a nontrivial binding energy EB = 4.476eV for two H’s inside an H2 molecule.
When computing Zint,H2 , we must use the same conventions for the zero of energy
as we used for H. Thus, we have

Zint,H2 = e−(−EB )/τ Zrot,H2 Zvib,H2 , (14.39)

where Zrot,H2 and Zvib,H2 are the partition functions associated with rotational mo-
tion and vibrations. The equilibrium constant is
K(τ) = nQ,H2 Zint,H2 / nQ,H² = 2^{3/2} e^{EB/τ} Zrot,H2 Zvib,H2 / nQ,H.   (14.40)
Here, we used nQ,H2/nQ,H ∝ m_{H2}^{3/2}/m_H^{3/2} = 2^{3/2}. At low temperatures, the vibrational and
rotational partition functions become 1, and the dominant term above is the binding
energy term eEB /τ . We can write our result as

[H2]/[H] = 2^{3/2} e^{EB/τ} Zrot,H2 Zvib,H2 [H]/nQ,H.   (14.41)

At low concentrations [H] ≪ nQ,H , the concentration of H2 is suppressed. This


is a consequence of entropy: even though H2 has lower energy due to binding,
dissociated molecules have higher entropy. Entropy competes against the binding
energy through the term eEB /τ . As an example, in intergalactic space, the hydrogen
has very low concentration, so most of it is unbound.
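As an illustration (not part of the notes), the law of mass action for H2 ⇌ 2H can be solved for the atomic fraction once K(τ) and the total density of hydrogen nuclei are given. The function name and the dimensionless toy values of K below are my own choices.

import numpy as np

def atomic_fraction(K, n_tot):
    # [H2] = K [H]^2 and n_tot = [H] + 2[H2]  =>  2K [H]^2 + [H] - n_tot = 0
    nH = (-1.0 + np.sqrt(1.0 + 8.0 * K * n_tot)) / (4.0 * K)
    return nH / n_tot

for K in [1e-6, 1.0, 1e6]:     # small K: mostly atomic; large K: mostly molecular
    print(K, atomic_fraction(K, n_tot=1.0))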

14.6.1 Heat capacity


The heat capacity at constant volume is defined by

CV = (∂U/∂τ)_V = (∂σ/∂τ)_V (∂σ/∂U)_V^{−1} = τ (∂σ/∂τ)_V.   (14.42)

For the ideal gas, it is


 
τ (∂/∂τ) [N ((3/2) log τ + . . . )] = 3N/2.   (14.43)


The heat capacity at constant pressure is defined by


   
Cp = τ (∂σ/∂τ)_p = (∂U/∂τ)_p + p (∂V/∂τ)_p.   (14.44)

The energy of an ideal gas is U = (3/2)Nτ, which is independent of p. Thus, this term
is the same as before. We get a new contribution from

p (∂V/∂τ)_p = p (N/p) = N.   (14.45)

Thus, Cp = CV + N = (5/2)N for an ideal gas. The physical interpretation is that in


order to raise the temperature when the system is at constant pressure, we need to
pour in more energy. Some of the energy goes to expanding the system, doing work
against the pressure.


15 Phase transitions
You will notice that these lecture notes are lacking in pictures, especially in this
section. For the pictures, you can (a) copy them down in class, (b) consult the
textbook (which we’re following relatively closely), (c) check out some of the other
notes linked from the website.
Sources: K&K chapter 10, David Tong’s notes on statistical physics
A phase transition is a discontinuity in thermodynamic quantities or their deriva-
tives. For example, the heat capacity and the compressibility of water jump between
the liquid phase and the gas phase. A small change in temperature or pressure leads
to a big change in the equilibrium configuration.

15.0.1 Nonanalyticities
Discontinuities in thermodynamic quantities can only occur in infinite-size systems.
To see this, consider a system in the canonical ensemble at temperature τ . The
partition function is
X
Z(τ ) = e−Es /τ . (15.1)
s

Quantities like, e.g., the heat capacity can be computed from derivatives of Z: CV ∝ β² (∂²/∂β²) log Z. If the sum over states s is finite, then CV can never be discontinuous,
because Z(τ) will be a finite sum of smooth curves e^{−Es/τ} as a function of τ, and
hence itself a smooth curve. However, as the number of states goes to infinity, Z(τ)
can potentially develop non-smooth features, called “nonanalyticities.”
In fact, it’s not sufficient simply for the number of states to be infinite — for
example, a single harmonic oscillator will not exhibit a phase transition. In prac-
tice, the number of degrees of freedom must be infinite as well. This means that
phase transitions only actually occur in the infinite volume limit. In practice, the
rounding out of thermodynamic quantities due to finite system size is unobservable
for macroscopic systems.
To see how nonanalyticities can emerge, consider a density of states of the form
f(E) = E^{αN−1}/Γ(Nα) + E^{βN−1}/Γ(Nβ),   (15.2)
where β > α. (The important thing is the two different power laws — the coefficients
are chosen for convenience.) The partition function is
Z(τ) = ∫₀^∞ dE e^{−E/τ} f(E) = τ^{Nα} + τ^{Nβ}.   (15.3)

The free energy is


 
F = −τ log Z = −τ log(τ^{Nα} + τ^{Nβ}).   (15.4)


Consider the limit N → ∞. Note that the free energy grows like F ∼ N , so this is
like the limit of a large number of degrees of freedom. If τ < 1, then the first term
in parentheses dominates and we have F ≈ −αN τ log τ . If τ > 1, the second term
dominates, and we have F ≈ −βN τ log τ . Thus, in the large-N limit, we have
F = −αNτ log τ   (τ < 1),
F = −βNτ log τ   (τ > 1),      (N ≫ 1).

The derivative of the free energy becomes discontinuous at τ = 1. The free energy
itself is not discontinuous — it is equal to zero at τ = 1 in both expressions.
This type of discontinuity in the derivative of F is characteristic of a first-order
phase transition. First-order transitions are typically cases where two distinct con-
tributions to Z are switching places in importance, as is happening in our toy model.
Physically, one contribution τ N α comes from states in one kind of configuration, and
the other contribution τ N β comes from states in a different type of configuration.
When their contributions switch places, it means that thermodynamic quantities go
from being dominated by one type of configuration to the other.
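As a numerical aside (not from the notes), one can watch the kink develop by evaluating the free energy per particle of the toy model for increasing N; the log-sum-exp trick below just avoids numerical overflow.

import numpy as np

alpha, beta = 1.0, 2.0
tau = np.array([0.9, 0.99, 1.0, 1.01, 1.1])
for N in [10, 100, 1000]:
    a, b = N * alpha * np.log(tau), N * beta * np.log(tau)
    logZ = np.maximum(a, b) + np.log1p(np.exp(-np.abs(a - b)))   # log(tau^(N*alpha) + tau^(N*beta))
    print(N, -tau * logZ / N)   # F/N approaches -alpha*tau*log(tau) for tau<1, -beta*tau*log(tau) for tau>1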

15.0.2 Phase diagram of water


As a physical example of a system with phase transitions, let us look at the phase di-
agram of water as a function of temperature and pressure. There are three (naively)
distinct phases: solid, liquid, and gas.34 The phases are separated by coexistence
curves in the τ -p plane. For example, the upper-left curve separates solid and liquid.
It is almost vertical, with a huge downward slope. The downward slope reflects the
fact that ice, when put under pressure, melts.
The top-right curve separates the liquid and gas phases — boiling occurs here.
Interestingly, the curve ends at a point (τc , pc ) = (647K = 374◦ C, 218 atm) called
the critical point. Here, the distinction between liquid and gas disappears, and we
might call the system simply a “fluid.” By going around the critical point, you can
actually turn liquid into gas without ever encountering a discontinuity. Thus, in
some sense we cannot say they are different phases until we go sufficiently close to
the coexistence curve and observe a discontinuity in thermodynamic quantities.
At sufficiently low pressure, the liquid phase disappears and only solid and gas are
allowed. Ice sublimates directly to vapor as the temperature is increased. The three
phase boundaries meet at a triple point (273.16K = 0.01◦ C, 0.006 atm),35 where
solid, liquid, and gas simultaneously coexist.36
34
Actually, the full phase diagram includes other exotic phases, like different types of ice. Here,
we are considering the part of the diagram involving the three most familiar phases.
35
The Kelvin scale is defined such that the triple point of water is 273.16K.
36
There are some awesome videos of triple points online, e.g. https://www.youtube.com/watch?
v=r3zP9Rj7lnc.


15.1 Coexistence curves


At a coexistence curve, the two phases, say liquid and gas, are in thermal, mechan-
ical, and diffusive equilibrium. The conditions for this are

τl = τg , µl = µg , pl = pg . (15.5)

In particular, the chemical potentials must be equal, and this determines a curve in
the τ -p plane

µg (p, τ ) = µl (p, τ ). (15.6)

Let us relate this curve to some more easily measurable quantities.


We start by determining its slope. Consider changes τ → τ + dτ and p → p + dp, moving
along the coexistence curve. In addition to (15.6), we have

µg(p + dp, τ + dτ) = µl(p + dp, τ + dτ),
(∂µg/∂p)_τ dp + (∂µg/∂τ)_p dτ = (∂µl/∂p)_τ dp + (∂µl/∂τ)_p dτ,   (15.7)

which can be rearranged to give

dp/dτ = [(∂µl/∂τ)_p − (∂µg/∂τ)_p] / [(∂µg/∂p)_τ − (∂µl/∂p)_τ].   (15.8)

Recall that we have the relations


   
G = N µ(p, τ),   (∂G/∂p)_{N,τ} = V,   (∂G/∂τ)_{N,p} = −σ.   (15.9)

Let us define

v = V /N, s = σ/N (15.10)

the volume and entropy per particle. We then have


   
(∂µ/∂p)_τ = (1/N)(∂G/∂p)_{N,τ} = V/N = v,
(∂µ/∂τ)_p = (1/N)(∂G/∂τ)_{N,p} = −σ/N = −s.   (15.11)

Thus, we can write our equation for the slope of the coexistence curve as
dp/dτ = ∆s/∆v,   ∆s = sg − sl,   ∆v = vg − vl.   (15.12)


The quantity in the numerator is the change in entropy per particle in going from
liquid to gas, and the quantity in the denominator is the change in volume per
particle in going from liquid to gas.
The quantity ∆s is related to the amount of heat needed to boil a molecule —
i.e. to transfer a molecule reversibly from the liquid to the gas while keeping the
temperature constant. The heat during the transfer is

d̄Q = τ∆s ≡ L,   (15.13)

where L is called the latent heat of vaporization. With this definition, we have
dp/dτ = L/(τ∆v),   (15.14)
which is called the Clausius-Clapeyron equation or vapor pressure equation.

15.1.1 Approximate solution to the vapor pressure equation


We can obtain a useful approximation to (15.14) with two assumptions. First that
the volume per molecule in the gas phase is much larger than the volume per
molecule in the liquid phase vg ≫ vl , which implies ∆v ≈ vg . Second, that the
gas phase is well-described by the ideal gas law pv = τ , so that ∆v ≈ vg = τ /p. We
find
dp/dτ = (L/τ²) p.   (15.15)
Let us furthermore assume that L ≈ L0 is approximately constant as a function of
τ . This is a good approximation in water and actually ice as well for a large range
of temperatures from 220K to 650K (the critical point). With this approximation,
we can integrate the above differential equation to obtain
log p(τ) = −L0/τ + C,
p(τ) = p0 e^{−L0/τ}.   (15.16)
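As a rough numerical illustration (not in the notes), one can evaluate (15.16) for water by taking L0 ≈ 0.42 eV per molecule (the approximate latent heat of vaporization) and fixing p0 so that p = 1 atm at 373 K; all numbers below are approximate.

import numpy as np

k_B = 8.617e-5                          # Boltzmann constant in eV/K
L0 = 0.42                               # approximate latent heat per molecule, in eV
p0 = np.exp(L0 / (k_B * 373.0))         # in atm, fixed so that p(373 K) = 1 atm
for T in [273.0, 300.0, 350.0, 373.0]:
    print(T, "K:", p0 * np.exp(-L0 / (k_B * T)), "atm")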

15.1.2 Physics on the coexistence curve


Suppose τ and p lie on a liquid-gas coexistence curve. Because the chemical poten-
tials µl (p, τ ) and µg (p, τ ) are equal, we can move molecules from liquid to gas in a
reversible manner. Specifically,

G = µg Ng + µl Nl = µ(Ng + Nl ) = µN, (15.17)

which is independent of Ng . However, the volume is not independent of Ng . We


have

V = Ng vg + Nl vl = Ng vg + (N − Ng )vl , (15.18)


and vg ̸= vl . Thus, we can solve


Ng = (V − N vl)/(vg − vl),   Nl = (N vg − V)/(vg − vl).   (15.19)
There exists a range of V such that both of these quantities are nonzero, and hence
the system will be in an inhomogeneous mixture of two phases. In this range, the
volume can be changed reversibly while the Gibbs free energy, temperature, and
pressure remain constant.
If we change V into a region where V < N vl or V > N vg , we naively find a
contradiction because Ng or Nl appears to become negative. In practice, this means
that if we try to move V into these regions, the temperature and/or pressure will
change.

15.2 The Van der Waals equation of state


We will now study a toy model of an interacting non-ideal gas that illustrates some
of the physics we’ve discussed. The Van der Waals equation of state is a modification
of the ideal gas law p = Nτ/V that takes into account a simple model of interactions
between molecules. It is

pVdW = Nτ/(V − Nb) − a(N/V)²
     = τ/(v − b) − a/v²,   (15.20)
where a and b are constants. The replacement V → V − N b models the fact that
particles have short-range repulsive forces when they are brought together — each
molecule essentially has some volume b, which means the total volume available for
the particles to move is reduced by N b.
The correction −aN²/V² to the pressure takes into account long-range pairwise attraction between
molecules. The shift in energy due to pairwise interactions is
δU = Σ_pairs Vpair = (1/2) ∫ d³x d³y n(x) n(y) ϕ(x − y),   (15.21)

where ϕ is the intermolecular potential and the 1/2 is to avoid overcounting pairs.


Let us make the mean field approximation that the density of atoms is constant
n(x) = n. This is of course not correct — for example, the density of atoms n(x) is
a sum of delta functions in the classical limit. However, the mean field approximation
should be ok when there are not large spatial fluctuations in the density n(x). It
is also important that the gas is dilute — if the atoms were close and could feel
large variations in each others’ potentials, then n(x) = const would be a very poor
approximation. And finally, it is also important that the interactions ϕ(x) are weak
— otherwise it would not make sense to start with the ideal gas law p = Nτ/V and


make small corrections to it. We will talk more about the validity of mean field
approximations later. Plugging in n(x) = n, we have

δU = (n²/2) ∫ d³x d³y ϕ(x − y) = (n²/2) V ∫ d³x ϕ(x)
   = (n²/2) V (−2a) = −a N²/V.   (15.22)
where we simply gave the integral ∫ d³x ϕ(x) the name −2a. The negative sign is
because the potential should be attractive. This gives a correction to the pressure via

δp ≈ −∂(δU)/∂V = −a N²/V².   (15.23)
This derivation was a reasonable approximation in the limit of low density and
weak interactions. However, we are going to now take the Van der Waals equation
of state and use it outside of its regime of validity because it gives a very useful toy
model in which to understand some properties of phase transitions. The quantitative
results will be incorrect, but some of the qualitative features will be correct.

15.2.1 Isotherms
Let us fix τ , and plot p vs V in the Van der Waals model. Such curves are called
isotherms. If τ is large, the term −a/v 2 is unimportant, and the curves closely follow
the ideal gas law p ≈ τ /v. However, as τ gets smaller, the curve p(v) develops a
wiggle. The value τ = τc below which the wiggles occur is the one where the curve
has an inflection point

∂p/∂v = ∂²p/∂v² = 0.   (15.24)
A nice way to find the inflection point is to write the VdW equation as

p v³ − (pb + τ)v² + a v − ab = 0.   (15.25)

This is a cubic in v with three solutions for τ < τc . When τ = τc and p = pc , the
three solutions coincide, so that the equation has the form

C(v − vc )3 = pc v 3 − (pc b + τc )v 2 + av − ab, (15.26)

for some constant C. Expanding out and equating coefficients of powers of v, we


find C = pc and
τc = 8a/(27b),   vc = 3b,   pc = a/(27b²).   (15.27)
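As a quick symbolic check (not from the notes), one can verify (15.27) by substituting the critical values back into ∂p/∂v and ∂²p/∂v²; a small sympy sketch:

import sympy as sp

v, tau, a, b = sp.symbols('v tau a b', positive=True)
p = tau / (v - b) - a / v**2
v_c, tau_c = 3 * b, sp.Rational(8, 27) * a / b
print(sp.simplify(sp.diff(p, v).subs({v: v_c, tau: tau_c})))      # 0
print(sp.simplify(sp.diff(p, v, 2).subs({v: v_c, tau: tau_c})))   # 0
print(sp.simplify(p.subs({v: v_c, tau: tau_c})))                  # a/(27*b**2) = p_c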


Let us consider τ < τc . There is a range of p for which there are three possible
solutions v1 , v2 , v3 for the equation of state p(v). What is happening in this range?
Note that the middle solution v = v2 has the feature
∂p/∂v > 0.   (15.28)
This means that as the system grows in volume, the pressure grows. As the system
shrinks in volume, the pressure decreases. The result is an instability — the system
will evolve away from this configuration toward some other state.
Example 15.1 (Negative heat capacity). Incidentally, a similar instability con-
dition applies to heat transfer. Recall that the heat capacity is defined as
CV = ∂U/∂τ.   (15.29)
If the heat capacity were negative, then increasing the energy would decrease the
temperature. This would cause more energy to flow into the system, further de-
creasing the temperature. Meanwhile, decreasing the energy would increase the
temperature, causing more energy to flow out, causing the temperature to increase
further. A famous example of a system with negative heat capacity is a black hole.
The Hawking temperature is τ ∝ 1/M, which means it decreases as the mass increases.
The instability results in the black hole evaporating to nothing.
Back to our liquid-gas system. What configuration replaces the unstable one?
Well, the Van der Waals equation of state was derived assuming the system has a
fixed density n/V . We can move off this curve by making the density inhomogeneous
— i.e. by turning the system into an inhomogeneous mixture of coexisting gas and
liquid. As we argued before, in the inhomogeneous phase, the pressure and temper-
ature are constant as a function of volume. Thus, the physically correct isotherm
does not follow the Van der Waals curve pVdW(v) between v1 and v3, but rather is a straight line
in that region.

15.2.2 The Maxwell construction


To find out where the straight line occurs, we should set the chemical potentials of
the liquid and gas phases equal

µ(p, v1 (p)) = µ(p, v3 (p)). (15.30)

This equation can be solved for p once we know µ for the Van der Waals gas.
Remember that we have

dG = µdN − σdτ + V dp,   and also   dG = d(Nµ) = µdN + N dµ,

so that

N dµ = −σdτ + V dp.   (15.31)


We can integrate this equation along any path in configuration space to compute
the chemical potential µ. A convenient choice is to consider paths of constant N
and τ , in which case we have

N dµ = V dp (dτ = 0)
µg = µl + (1/N) ∫_{pl}^{pg} V dp   (15.32)

To find two points with equal µ, we should find two points such that the integral
of V dp between the points vanishes. This says that the area above the straight line
and the area below it should be equal, which tells us
where to draw the isotherm in the coexistence region. This is called the Maxwell
construction.
Actually, there is something problematic about the Maxwell construction: it
requires us to integrate dµ through the unstable region, where the physics described
by the Van der Waals curve is completely wrong! A better derivation would be to
choose a path between the gas point and liquid point that does not pass through
the unstable region. Ultimately this would be more mathematically complicated,
and will give the same answer. Anyway, the Maxwell construction is not really very
important — it is a trick to help us visualize where the condition (15.30) holds.
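As a numerical sketch of the Maxwell construction (Python with numpy/scipy; the units a = b = 1, the choice τ = 0.9 τc, and the helper names p_vdw and area are all just illustrative), one can locate the pressure at which the integral of (pVdW − p) dv between the outermost roots of (15.25) vanishes:

    import numpy as np
    from scipy.optimize import brentq

    a, b = 1.0, 1.0                      # units where tau_c = 8/27, v_c = 3, p_c = 1/27
    p_c = a / (27 * b**2)
    tau = 0.9 * 8 * a / (27 * b)         # a temperature below tau_c

    def p_vdw(v):
        return tau / (v - b) - a / v**2

    # Spinodal volumes, where dp/dv = 0, i.e. roots of tau*v^3 = 2a(v-b)^2 with v > b.
    spin = np.roots([tau, -2*a, 4*a*b, -2*a*b**2])
    spin = np.sort(spin[np.abs(spin.imag) < 1e-9].real)
    spin = spin[spin > b]
    p_lo, p_hi = sorted(p_vdw(spin))

    def area(p):
        # Outermost roots v_l < v_g of the cubic (15.25) at this pressure, then the
        # signed area between the isotherm and the horizontal line at p.
        roots = np.sort(np.roots([p, -(p*b + tau), a, -a*b]).real)
        vl, vg = roots[0], roots[-1]
        return tau*np.log((vg - b)/(vl - b)) + a*(1/vg - 1/vl) - p*(vg - vl)

    p_coex = brentq(area, 1.001*p_lo, 0.999*p_hi)
    print("coexistence pressure / p_c =", p_coex / p_c)   # roughly 0.65 at tau = 0.9 tau_c

The bracket for the root-finder is the pressure window between the spinodal points, which is where three solutions of (15.25) exist.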
Let us consider changing the pressure a little bit away from its value at the
coexistence curve pcoexistence at some temperature τ. If p is slightly above pcoexistence,
then we follow the liquid part of the curve, (∂µ/∂p)_τ = vl. If p is slightly below
pcoexistence, then we follow the gas part of the curve, (∂µ/∂p)_τ = vg. Thus, there is a
discontinuity in the first derivative of the chemical potential (and also the Gibbs
free energy) as we move across the coexistence pressure at fixed τ . Note that there
is no discontinuity in the Gibbs free energy itself, since the chemical potentials have
to be equal for coexistence. A discontinuity in the first derivative of the free energy
is characteristic of a first order transition. This is the same thing that happened in
our toy model of the partition function Z(τ) = τ^{Nα} + τ^{Nβ}.

15.2.3 Metastable states


For each isotherm, we can determine the pressure at which the liquid and gas states
are in equilibrium. The coexistence region lies inside a downward-facing parabola-
like curve whose top is at the critical point. Note that there are regions inside
this curve where there is no instability, i.e. ∂p/∂V < 0 as it should be. These are
metastable phases — i.e. supercooled gas or superheated liquid. We can coax the
system into these states by, e.g. slowly cooling the gas. A small perturbation will
cause the supercooled gas to liquify. (By contrast, in the unstable region, no small
perturbation is needed.)


15.3 The critical point


Let us look at how various quantities behave near the critical point in the Van der
Waals model. For example, let us ask how the difference in volume vg − vl between
the liquid and gas phases changes with temperature near the critical point.
Plugging vg and vl into the Van der Waals equation of state, we find
p = τ/(vg − b) − a/vg^2 = τ/(vl − b) − a/vl^2   (15.33)

We can use this to solve for τ ,

τ = a(vl + vg)(vl − b)(vg − b) / (vl^2 vg^2).   (15.34)

When we’re near the critical point, we can write vl = vc − ϵ/2 and vg = vc + ϵ/2,
where ϵ is small. Plugging this in, we find

τ ≈ 2a vc ((vc − b)^2 − (ϵ/2)^2) / (vc^4 − (ϵ vc)^2 + O(ϵ^3))
  = (2a vc (vc − b)^2 / vc^4) (1 − ϵ^2/(4(vc − b)^2) − ϵ^2/vc^2) + O(ϵ^3)   (15.35)

where O(ϵ3 ) means terms of order ϵ3 and higher. When ϵ = 0, we must have τ = τc ,
so we can rewrite this as

τc − τ ∼ (vg − vl)^2
(τc − τ)^{1/2} ∼ vg − vl.   (15.36)

This is our first example of a critical exponent — the exponent in a power law
relationship between
  quantities as one approaches a critical point.
Recall that (∂µ/∂p)_gas = vg and (∂µ/∂p)_liquid = vl. Below the critical temperature, we
noticed that these quantities jumped as we dialed the pressure past its coexistence
value. Thus, the Gibbs free energy (G = N µ) had a discontinuous first derivative.
However, at the critical temperature, the difference between vl and vg vanishes.
The Gibbs free energy no longer has a discontinuous first derivative as a function of
pressure. Instead, we have ∂^2∆µ/∂p^2 = ∂(vg − vl)/∂p ∼ (p − pc)^{−2/3}, which diverges. This is
characteristic of a second-order phase transition.
Let us compute a few more critical exponents in the Van der Waals model.
We can ask how the volume changes with pressure as we move along the critical
isotherm. Remember that the critical point was an inflection point: ∂p/∂v = ∂^2p/∂v^2 = 0.
Thus, the Taylor expansion starts with a cubic term

p − pc ∼ (v − vc )3 (τ = τc ). (15.37)


Finally, let us consider the compressibility, defined as


 
κ = −(1/v) (∂v/∂p)_τ.   (15.38)

We would like to see how κ changes as we approach τ → τc from above. We already
know that at the critical point (∂p/∂v)_{τ=τc} = 0. Thus, expanding for temperatures
close to τc , we have
 
(∂p/∂v)_{τ, v=vc} ∼ τ − τc,   (15.39)

so that

κ ∼ (τ − τc )−1 . (15.40)

The compressibility diverges near the critical point.


How well do these results agree with experiment? The actual answers for the
behavior of water near its critical point are

vg − vl ∼ (τc − τ)^β,   β = 0.326419(3)
p − pc ∼ (v − vc)^δ,   δ = 4.78984(1)
κ ∼ (τ − τc)^{−γ},   γ = 1.23708(7).   (15.41)

So the Van der Waals model does not actually get the critical exponents right. This
is perhaps not surprising, since it is a very crude model and doesn’t incorporate
many of the properties of actual water. However, a very surprising fact is that the
above values of the critical exponents don’t actually depend on special properties of
water at all. If you measure critical exponents in any other liquid-vapor transition,
you find the same crazy numbers. The above critical exponents are actually some
kinds of universal “constants of nature” like e and π. The failure of the Van der
Waals model has nothing to do with the failure to build in details of water molecules
— rather it is missing some essential feature of critical points that leads to the
emergence of these universal numbers.
I have to show you so many decimal digits for β, δ and γ because the current
most precise values for these quantities are actually due to me — your instructor —
together with some collaborators. See, for example https://arxiv.org/abs/1603.04436,
which computes precise values for the numbers ∆σ = 0.5181489(10), ∆ϵ = 1.412625(10),
which are related to the critical exponents above by β = ∆σ/(3 − ∆ϵ), δ = (3 − ∆σ)/∆σ,
γ = (3 − 2∆σ)/(3 − ∆ϵ). These numbers are computed using a method called the “con-
formal bootstrap,” which is based on techniques from quantum field theory. In last
few lectures of the term, I hope to explain a little bit about why critical exponents
are universal and why quantum field theory is a good tool for describing them.
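As a quick arithmetic check of the relations just quoted (a two-line Python computation; nothing here depends on the bootstrap method itself):

    d_sigma, d_eps = 0.5181489, 1.412625
    beta  = d_sigma / (3 - d_eps)
    delta = (3 - d_sigma) / d_sigma
    gamma = (3 - 2*d_sigma) / (3 - d_eps)
    print(beta, delta, gamma)   # ~0.326419, ~4.78984, ~1.23708, matching (15.41)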


15.4 The Ising Model


Other important examples of phase transitions and critical points occur in magnets.
In fact, the phases of a ferromagnet are very closely related to those of a liquid-vapor
transition in a way that we will make precise shortly.
The Ising model is a simple toy model of a ferromagnet that still displays very
nontrivial behavior. It was invented by Wilhelm Lenz in 1920 and given to his
student Ernst Ising, who solved the 1-dimensional case for his Ph.D. thesis in 1924.
It consists of N sites in a d-dimensional lattice. On each lattice site i lives a spin
that can be either up (+1) or down (−1): si ∈ {±1}. The energy of the system of
spins is
E = −J Σ_{⟨ij⟩} si sj − B Σ_i si.   (15.42)

The interpretation of the second term is that the spins sit in an applied external
magnetic field B, causing it to be energetically favorable to point in the direction
of B. The energy of a magnetic moment msi in a magnetic field is −Bmsi , and we
have defined Bm → B for brevity.
In the first term, the notation Σ_{⟨ij⟩} means that the sum runs over nearest-
neighbor pairs of spins. Its precise meaning depends on the dimension d and the
type of lattice. We write q for the number of nearest neighbors of a given lattice
site. For a square lattice in d-dimensions, each spin has q = 2d nearest neighbors,
corresponding to the sites reached by moving by ±1 along each of the d axes. The
Ising model can be defined on any type of lattice, but for concreteness we will mostly
consider a square lattice. The cases d = 1, 2, 3 are relevant for real-world materials
— they describe magnetic
strings, surfaces, and bulk materials.
The term −J Σ_{⟨ij⟩} si sj models interactions between neighboring spins and the
interaction strength J is a parameter. When J > 0, this term makes it energetically
favorable for the spins to align with each other. This case is called a ferromagnet.
The case J < 0 is called an antiferromagnet. For the purposes of the present
discussion, the distinction is not so important, but we will imagine that we have a
ferromagnet.
Let us work in the canonical ensemble and introduce the partition function
Z = Σ_{{si}} e^{−βE[si]}
  = Σ_{s1=±1} Σ_{s2=±1} · · · Σ_{sN=±1} e^{−βE(s1, s2, ..., sN)},   (15.43)
where β = 1/τ. The notation Σ_{{si}} means we sum over all configurations of spins
si ∈ ±1, and we have written it out more explicitly on the second line. An important
observable is average spin or magnetization, which can be written in terms of a


derivative of the partition function with respect to B,


m = (1/N) Σ_i ⟨si⟩ = (1/(Nβ)) ∂log Z/∂B.   (15.44)
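For a very small lattice, the sums (15.43) and (15.44) can be evaluated by brute force. A minimal Python sketch (the 3×3 size, the parameter values, and the helper names energy and log_Z are arbitrary choices) that checks the direct thermal average of the spin against the derivative of log Z:

    import numpy as np
    from itertools import product

    J, B, beta = 1.0, 0.2, 0.7
    Lx = Ly = 3                     # 3x3 periodic lattice: 2^9 = 512 configurations
    N = Lx * Ly

    def energy(s, B_value):
        bonds = (s * np.roll(s, -1, 0)).sum() + (s * np.roll(s, -1, 1)).sum()
        return -J * bonds - B_value * s.sum()

    def log_Z(B_value):
        return np.log(sum(np.exp(-beta * energy(np.array(c).reshape(Lx, Ly), B_value))
                          for c in product([1, -1], repeat=N)))

    # magnetization two ways: direct thermal average of (1/N) sum_i s_i,
    # and the derivative formula (1/(N*beta)) d(log Z)/dB from (15.44)
    Z = np.exp(log_Z(B))
    m_direct = sum((np.array(c).sum() / N)
                   * np.exp(-beta * energy(np.array(c).reshape(Lx, Ly), B)) / Z
                   for c in product([1, -1], repeat=N))
    eps = 1e-5
    m_deriv = (log_Z(B + eps) - log_Z(B - eps)) / (2 * eps) / (N * beta)
    print(m_direct, m_deriv)        # the two should agree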

15.4.1 The Ising model as a Lattice Gas


The Ising model has another interpretation as a lattice model of a gas. Consider
a d-dimensional lattice, but now with particles hopping between lattice sites. The
particles have hard cores, so that we do not allow multiple particles on the same
site. Thus, the state of the lattice is described by occupation numbers ni ∈ {0, 1}.
The particles have an attractive force that makes them want to sit on neighboring
sites, and we have some chemical potential µ. A Gibbs factor is e−β(E−µN ) , where
E − µN = −K Σ_{⟨ij⟩} ni nj − µ Σ_i ni.   (15.45)

However, this system is related to the Ising model by a simple change of variable:

si = 2ni − 1. (15.46)

As an exercise, you can determine how J, B are related to K, µ. Up to some shifts,


the chemical potential µ plays the role of an applied magnetic field, and the attrac-
tion term K plays the role of the nearest-neighbor spin-spin interaction.
For this reason, you might optimistically expect the Ising model to have some-
thing to do with liquid-vapor transitions as well. This might seem over-optimistic
because this model of a gas is so much cruder than the Van der Waals model we
have discussed before. However, it turns out to be right!

15.5 Mean Field Theory


For general lattices in general dimensions d, the partition function (15.43) cannot
be computed analytically. An exact solution exists in d = 1 (which we will derive
shortly). The case B = 0 and d = 2 was solved by Onsager in 1944 in a tour de
force calculation. The case d = 3 is particularly interesting for real world materials,
but has been an open problem for almost 100 years.
Mean field theory is an approximate method to evaluate Z that makes man-
ifest some of the physics of the Ising model. We write the interactions between
neighboring spins in terms of their deviation from the average spin m,

si sj = [(si − m) + m][(sj − m) + m]
= (si − m)(sj − m) + m(si + sj ) − m2 (15.47)


In the mean field approximation, we assume that fluctuations of spins away from
the average are small, so that we can neglect the first term. The energy is then
E ≈ −J Σ_{⟨ij⟩} [m(si + sj) − m^2] − B Σ_i si
  = (1/2)JNqm^2 − (Jqm + B) Σ_i si,   (15.48)

where q is the number of nearest neighbors for a given lattice site. The factor of
1/2 in the first term is to avoid overcounting pairs of lattice sites.
The mean field approximation has removed the interactions. We now have a
bunch of non-interacting spins in the effective magnetic field
Beff = B + Jqm, (15.49)
which includes the external applied field and also a field generated by nearest neigh-
bors. We analyzed a system of noninteracting spins a long time ago in section ??.
The partition function is
Z = Σ_{{si}} e^{−β((1/2)JNqm^2 − Beff Σ_i si)}
  = e^{−(1/2)βJNqm^2} (e^{βBeff} + e^{−βBeff})^N
  = e^{−(1/2)βJNqm^2} 2^N cosh^N(βBeff).   (15.50)
This expression for Z depends separately on m and Beff . However, we can determine
m from (15.44), and this gives us a consistency equation
m = tanh(βBeff ) = tanh(βB + βJqm). (15.51)
This is a transcendental equation that we can solve for m for various values of β, J, B.
We can visualize the solution by plotting the curves m and tanh(βB + βJqm) and
asking that they intersect. Let us consider some different cases.
• B = 0, βJq < 1 (zero applied magnetic field, high temperature/low spin-spin
interaction). When B = 0, the curve tanh(βB + βJqm) is symmetric around
m = 0. When βJq < 1, there is only one solution to (15.51) at m = 0. The
interpretation is that at high temperatures and zero applied field, the spins
fluctuate a lot and average out to zero.
• B = 0, βJq > 1 (zero applied magnetic field, low temperature/high spin-
spin interaction). There are now three solutions to equation (15.51), m =
−m0 , 0, m0 . The middle solution is unstable. To see this, we can expand the
free energy near m = 0 when B = 0,
F = −(1/β) log Z ≈ −(N log 2)/β − (1/2) JNq(Jqβ − 1) m^2 + O(m^4)   (B = 0).   (15.52)


We see that m = 0 is a local maximum of the free energy and thus does not
dominate the canonical ensemble. Thus, the system settles into the states
m = ±m0. The interpretation is that the spin-spin interactions overwhelm the
effects of thermal fluctuations and the spins overwhelmingly want to point in
the same direction.
Precisely which direction they choose depends on the history of the system
(hysteresis). If more spins start out up than down, then the system is most
likely to evolve to a state where most of the spins are up (and vice-versa). This
is an example of a phenomenon called spontaneous symmetry breaking. When
B = 0, the system has a Z2 symmetry under flipping all the spins si → −si ,
which acts as m → −m. However, the actual state of the system at low
temperature is not invariant under m → −m. (When the temperature is high,
the symmetry is unbroken.)

• B = 0, βJq = 1 (zero applied magnetic field, critical temperature). This is


the point where the distinction between the high and low temperature phases
disappears. The critical temperature is τc = Jq.

• B ̸= 0. In this case, there is always a solution for m with the same sign
as B. At sufficiently high values of βJq, there are two additional solutions.
The middle one is always unstable. The other one, with opposite sign to B is
metastable — it is a local but not global minimum of the free energy. Thus,
for the stable solution, the magnetization always points in the direction of B.
The applied magnetic field breaks the Z2 symmetry. If we slowly change the
applied magnetic field from positive to negative, the system will move into
the metastable state and then at some point decay into the stable state where
most spins are flipped.

To summarize, if we fix τ < τc and vary B, there is a phase transition at B = 0


where the sign of the magnetization m jumps. Because m is a first derivative of
the free energy, this is a first-order transition. As we increase τ , the first-order
transition disappears at τ = τc . There, we have a second-order transition. Near the
critical point, the phase diagram in the B, τ plane looks like a rotated version of the
liquid-vapor phase diagram we discussed last time.
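The consistency equation (15.51) is also easy to study numerically. A minimal Python sketch (fixed-point iteration; the values q = 4, J = 1 and the starting points ±0.5 are arbitrary choices) showing m flowing to 0 above τc = Jq and to ±m0 below it:

    import numpy as np

    def solve_m(beta, J, q, B, m_start, n_iter=2000):
        # iterate m <- tanh(beta*(B + J*q*m)); the stable solutions are attracting
        m = m_start
        for _ in range(n_iter):
            m = np.tanh(beta * (B + J * q * m))
        return m

    J, q, B = 1.0, 4.0, 0.0        # e.g. a square lattice in d = 2 has q = 4; zero field
    tau_c = J * q
    for tau in [2.0 * tau_c, 1.1 * tau_c, 0.9 * tau_c, 0.5 * tau_c]:
        print(tau / tau_c,
              solve_m(1.0 / tau, J, q, B, +0.5),
              solve_m(1.0 / tau, J, q, B, -0.5))
    # above tau_c both starting points flow to m = 0; below tau_c they flow to +-m_0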

15.5.1 Critical exponents in Mean Field Theory


Let us compute the critical exponents of the Ising model in the mean field approx-
imation. We first consider B = 0 and ask how the difference 2m0 in magnetization
between the two phases depends on τ . Just below τ = τc , m is small and we can
Taylor expand (15.51) to obtain
m ≈ βJqm − (1/3)(βJqm)^3.   (15.53)
3


Solving for the nonzero solutions gives m0^2 = 3(βJq − 1)/(βJq)^3 ≈ 3(τc − τ)/τc near τc, so the magnetization goes like

m0 ∼ ±(τc − τ)^{1/2}.   (15.54)

We see that we get the same exponent as in our computation of ng − nl in the Van
der Waals model.
We can also set τ = τc and ask how the magnetization changes as we approach
B = 0. We then have βJq = 1 and our equation is

m = tanh(B/Jq + m) (15.55)

Expanding in small B gives


m ≈ B/(Jq) + m − (1/3)(B/(Jq) + m)^3 ≈ B/(Jq) + m − (1/3)m^3 + O(B^2),   (15.56)

so that

m ∼ B 1/3 . (15.57)

This is analogous to our result vg − vl ∼ (p − pc )1/3 in the Van der Waals model.
Finally, let us consider the magnetic susceptibility
 
χ = N (∂m/∂B)_τ.   (15.58)

This is the analog of compressibility for a gas. Let us ask how χ changes as we
approach τ → τc from above at B = 0. Differentiating (15.51) with respect to B,
we find
 
χ = (Nβ / cosh^2(βJqm)) (1 + (Jq/N) χ).   (15.59)

Setting m = 0, B = 0, we find

χ = Nβ/(1 − Jqβ) ∼ (τ − τc)^{−1}.   (15.60)
We again get the same critical exponent as in the Van der Waals model.
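The same equation can be used to extract the mean-field exponent numerically: solve m = tanh(m τc/τ) for several τ just below τc and fit the slope of log m0 against log(τc − τ). A rough Python sketch (the temperatures and iteration count are arbitrary; convergence is slow near τc):

    import numpy as np

    tau_c = 1.0
    def m0(tau, n_iter=100000):
        m = 0.5
        for _ in range(n_iter):
            m = np.tanh(m * tau_c / tau)
        return m

    taus = tau_c - np.array([1e-2, 1e-3, 1e-4])
    ms = np.array([m0(t) for t in taus])
    slope = np.polyfit(np.log(tau_c - taus), np.log(ms), 1)[0]
    print(slope)    # approaches 1/2, the mean-field value of beta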

15.6 Critical phenomena and universality


We have seen that the mean field approximation gives the same answers for critical
exponents in both the Van der Waals gas and the Ising model. A rough analogy
between the two cases is


gas magnet
τ τ
p B
τc τc
p = pc B=0
n m
κ χ
n = ng m = m0
n = nl m = −m0
On the other hand, there are some differences — the Ising model has an inbuilt Z2
symmetry under B, m → −B, −m, whereas there is no obvious symmetry relating
the liquid and gas phases.
Just like with liquid-vapor transitions, the critical exponents of the Ising model
turn out to be different from those predicted by mean field theory. However, amaz-
ingly they are the same critical exponents as in liquid vapor transitions.
The very same β, γ, δ that we discussed before appear not only in myriad liquid-
vapor transitions, but at critical points of ferromagnets, and many other settings.
We say that all these systems are in the same universality class — their critical
points are controlled by the same critical exponents. The particular universality
class we have been discussing is called the 3d Ising universality class. However, this
is only because the 3d Ising model is the simplest model in the same universality
class. Other than being simple, there is nothing special about it — it is just one of
many models with the same emergent critical behavior.
What does mean field theory get wrong in all of these systems, and what is
shared between different critical points? A defining feature of critical points is that
fluctuations are important. In fact, the structure of fluctuations at critical points is
extremely special. As an example, in a ferromagnet with τ < τc and B = 0, there
are two phases m = ±m0 . In practice, these phases will coexist, so that we will have
large pockets of up spins and pockets of down spins. As we approach the critical
point, the distinction between m = ±m0 starts to disappear and thermal fluctuations
become more important. Inside the pockets of up spins, there are smaller pockets of
down spins. Inside those pockets are even smaller pockets of up spins, etc.. At the
critical point, this nested structure goes on ad infinitum, and we get (in the limit
of an infinite lattice) infinitely nested pockets of up and down spins at all distance
scales.
In liquid-vapor transitions, something similar happens. When τ < τc the liquid
can boil. Boiling happens by a bubble of gas appearing inside the liquid. As we
approach the critical temperature, fluctuations in density become more important
and we end up with droplets inside the bubbles, and bubbles inside the droplets,
etc. until there are bubbles and droplets at all distance scales.
This infinitely nested structure of different phases at a critical point is reflective
of an emergent symmetry: scale invariance. The system looks the same at all


distance scales, like a fractal. There is strong evidence that scale-invariant theories
are extremely tightly mathematically constrained, and this is the reason why the
same scale-invariant theories show up in so many different physical systems.
As an example, the tight mathematical constraints on scale-invariant theories in
2-dimensions have led to an exact solution for the critical exponents of the 2d Ising
model
m0 ∼ (τc − τ)^β,   β = 1/8
m ∼ B^{1/δ},   δ = 15
χ ∼ (τ − τc)^{−γ},   γ = 7/4.   (15.61)
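As a consistency check, these values satisfy the standard scaling relation γ = β(δ − 1) (which we have not derived here): (1/8)(15 − 1) = 7/4. The 3d values quoted in (15.41) obey the same relation to the stated precision: 0.326419 × (4.78984 − 1) ≈ 1.23708.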
The critical exponents in 3 dimensions are not known exactly, but they can be
computed numerically using conformal bootstrap techniques to give the values in
eq. (15.41) above.
Let us now develop a bit more technology to understand the significance of
fluctuations and scaling symmetry in the Ising model. Along the way, we will solve
the 1d Ising model (though unfortunately we won’t earn a Ph.D. for doing so like
Ising did).

15.7 Solving the 1d Ising model


In this section, we will solve the 1d Ising model the right way. This is not necessarily
the easiest way, but it is the best way in that it can be vastly generalized to many
different types of lattice models and beyond.

15.8 The 1d Ising model and the transfer matrix


Let us start with the Ising lattice model in 1-dimension. For concreteness, we will
study the theory on a periodic lattice with length M , so the spins si are labeled by
i ∈ ZM . The partition function is given by
Z_1d = Σ_{{si=±1}} e^{−S[s]}
S[s] = −K Σ_{i=1}^{M} si si+1 − h Σ_{i=1}^{M} si,   (15.62)

where K = βJ and h = βB. I will sometimes refer to S[s] as the “action,” even
though it is equal to βE, where E is the classical energy.
We’re going to solve this theory using an analogy with quantum mechanics. The
partition sum can be thought of as a discrete version of a path-integral. This path
integral can be computed by relating it to a quantum-mechanical theory. This is an
example of the notion of quantization.


The key idea is to build up the partition sum by moving along the lattice site-
by-site. Forget about periodicity for the moment, and consider the contribution to
the partition function from spins j < i for some fixed i,
Zpartial(i, si) = Σ_{{sj : j<i}} e^{K Σ_{j<i} sj sj+1 + h Σ_{j<i} sj}.   (15.63)

Because of the interaction term si−1 si , we cannot do the sum over {s1 , . . . , si−1 }
without specifying the spin si . Thus, we have a function of si . In short, Zpartial (i, s)
is the partition function of the theory on the lattice 1 . . . i, with fixed boundary
condition s at site i.37
Note that Zpartial (i + 1, si+1 ) can be related to Zpartial (i, si ) by inserting the
remaining Boltzmann weights that depend on si and performing the sum over si =
±1,
Zpartial(i + 1, si+1) = Σ_{si=±1} T(si+1, si) Zpartial(i, si),   (15.64)

where

T(si+1, si) ≡ e^{K si si+1 + h si}.   (15.65)

The key step is to recognize (15.64) as a discrete version of the Schrodinger


equation in a 2-dimensional Hilbert space H. This Hilbert space has basis |s⟩ = |±1⟩.
The T (s′ , s)’s are elements of a 2 × 2 matrix Tb acting on H

T(s′, s) = ⟨s′|T̂|s⟩,   T̂ = ( e^{K+h}    e^{−K−h}
                               e^{−K+h}   e^{K−h} ),   (15.66)

and Zpartial (i, s) are the components of a vector |Ψi ⟩ ∈ H,

Zpartial (i, s) = ⟨s|Ψi ⟩. (15.67)

In this notation, (15.64) becomes

|Ψi+1 ⟩ = Tb|Ψi ⟩. (15.68)

The matrix Tb is called the “transfer matrix”, and it plays the role of a discrete
time-translation operator. Here, i should be thought of as a discrete imaginary time
coordinate.
To be explicit, the (integrated) Schrodinger equation in a quantum theory in
imaginary time is

|Ψ(tE + ∆tE)⟩ = e^{−∆tE Ĥ} |Ψ(tE)⟩,   (15.69)

37
We must also impose some boundary condition at site 1. The precise choice is not important
for this discussion, so we have left it implicit.


where tE is the imaginary time coordinate, ∆tE is some imaginary time-step, and
Hb is the quantum Hamiltonian. Thus, the 1-dimensional Ising lattice model is
equivalent to a 2-state quantum theory with Hamiltonian

Ĥ = −(1/∆tE) log T̂.   (15.70)
When the lattice is periodic with length M , the partition function is related to
the transfer matrix by
Z = Σ_{{si}} ⟨sM|T̂|sM−1⟩⟨sM−1|T̂|sM−2⟩ · · · ⟨s1|T̂|sM⟩ = Tr(T̂^M).   (15.71)

This is easy to evaluate by diagonalizing Tb,

Tr(T̂^M) = λ+^M + λ−^M,   (15.72)

where
λ± = e^K cosh h ± √(e^{2K} sinh^2 h + e^{−2K})
   → 2 cosh K  or  2 sinh K   (when h = 0).   (15.73)

In the thermodynamic limit M → ∞, the partition function is dominated by the


larger eigenvalue
Z_1d = λ+^M (1 + (λ−/λ+)^M) ≈ λ+^M.   (15.74)

This is an interesting result: the 1d Ising model has a partition function that is
completely smooth as a function of the parameters K, h. Thus, there is no phase
transition in the 1d Ising model. The interpretation is that the spin-spin interactions
aren’t strong enough to create different high and low temperature phases. To see a
phase-transition, we will have to look in higher dimensions.
There is a nice interpretation of (15.74) in terms of our quantum mechanics
analogy: the state with the largest eigenvalue of Tb has the smallest eigenvalue of
Hb — i.e. it is the ground state, and we should call it |0⟩. We have shown that the
ground state dominates the thermodynamic limit. Contributions from the excited
state are exponentially suppressed in the size of the system
(λ−/λ+)^M = e^{−M/ξ},   where   1/ξ ≡ −log(λ−/λ+).   (15.75)


The decay rate is set by 1/ξ, where ξ is called the “correlation length”.


The correlation length characterizes the rate at which correlations fall off with dis-
tance.
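A minimal Python sketch (the values of K, h, M are arbitrary) that checks (15.71)–(15.73) and the correlation length (15.75) against brute-force enumeration:

    import numpy as np
    from itertools import product

    K, h, M = 0.4, 0.1, 8

    # brute force: sum exp(-S[s]) over all 2^M configurations of the periodic chain
    Z_brute = 0.0
    for config in product([1, -1], repeat=M):
        s = np.array(config)
        S = -K * (s * np.roll(s, -1)).sum() - h * s.sum()
        Z_brute += np.exp(-S)

    # transfer matrix (15.66), basis ordered (s = +1, s = -1)
    T = np.array([[np.exp(K + h), np.exp(-K - h)],
                  [np.exp(-K + h), np.exp(K - h)]])
    lam = np.sort(np.linalg.eigvals(T).real)[::-1]        # lambda_+, lambda_-
    Z_transfer = np.trace(np.linalg.matrix_power(T, M))

    print(Z_brute, Z_transfer, lam[0]**M + lam[1]**M)     # all three agree
    xi = -1.0 / np.log(lam[1] / lam[0])                   # correlation length (15.75)
    print(xi)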
We can also use the transfer matrix to compute correlation functions. For ex-
ample, consider the two-point function ⟨si1 si2 ⟩ZM where the subscript M indicates
that we are on a periodic lattice with length M . Suppose i1 > i2 . We have
⟨si1 si2⟩_{ZM} = (1/Z_1d) Σ_{{si}} ⟨sM|T̂|sM−1⟩ · · · ⟨si1+1|T̂|si1⟩ si1 ⟨si1|T̂|si1−1⟩ · · · ⟨si2+1|T̂|si2⟩ si2 ⟨si2|T̂|si2−1⟩ · · · ⟨s1|T̂|sM⟩
             = (1/Z_1d) Tr(T̂^{M−i1} σ^z T̂^{i1−i2} σ^z T̂^{i2})   (i1 > i2).   (15.76)
Here, we introduced the Pauli spin operator σ z that measures the spin of a state
σ z |s⟩ = s|s⟩. (15.77)
It is easy to compute the correlation function (15.76) by expressing σ z in the eigen-
basis of Tb. One can show that in the limit of large M and large “distance” i1 − i2 ,
the correlator factorizes into a product of expectation values ⟨0|σ z |0⟩ = ⟨si ⟩, plus
exponential corrections from the excited state
⟨si1 si2⟩ = ⟨si1⟩⟨si2⟩ + O(e^{−|i1−i2|/ξ}, e^{−(M−|i1−i2|)/ξ}).   (15.78)
Thus, the “correlation length” indeed controls the rate at which correlations between
different spins die off.

15.9 Quantization in quantum mechanics


In problem set 5, you studied a quantum mechanical particle moving in 1 dimension.
The system had Hamiltonian
Ĥ = p̂^2/(2m) + V(x̂),   where   p̂|x⟩ = −iℏ (∂/∂x)|x⟩.   (15.79)
You showed that the partition function could be written
 
Tr[(1 − TĤ/(ℏN) + O(1/N^2))^N]
  = A_{T/N} ∫_{−∞}^{∞} dx0 · · · A_{T/N} ∫_{−∞}^{∞} dxN−1 exp(−(1/ℏ) ∫_0^T dt L(x(t), ẋ(t))),   (15.80)
where x(t) is a piecewise linear path between the xk at times kT/N and L(x, ẋ) = (1/2)mẋ^2 − V(x) is the “Lagrangian.” Taking the limit N → ∞ of (15.80), we get

Tr(e^{−TĤ/ℏ}) = ∫_{x(0)=x(T)} Dx(t) exp(−(1/ℏ) ∫_0^T dt L(x(t), ẋ(t))),   (15.81)


where here the integral is over periodic paths. Note that the left-hand side is the
partition function at inverse temperature β = 1/τ = T /ℏ. Thus, the partition
function is a path integral over paths that are periodic in imaginary time with
periodicity T = ℏβ, or ∆t = −iℏβ.
This transformation is analogous to the one we just did for the 1d Ising model,
but backwards. The analogy is as follows. Firstly, the sum over a spin is analogous
to the integral over x
Σ_{si=±1}  ⟺  ∫_{−∞}^{∞} dxi   (15.82)

The energy functional S[s] in the Ising model is analogous to the action
S[s]  ⟺  (1/ℏ) ∫_0^T dt L(x(t), ẋ(t))   (15.83)

The partition function of the Ising model is analogous to the path integral
Σ_{{si}} e^{−S[s]}  ⟺  A_{T/N} ∫_{−∞}^{∞} dx0 · · · A_{T/N} ∫_{−∞}^{∞} dxN−1 exp(−(1/ℏ) ∫_0^T dt L(x(t), ẋ(t)))   (15.84)

The state |s⟩ is analogous to |x⟩

|s⟩ ⇐⇒ |x⟩, (15.85)

and the transfer matrix is analogous to the operator that evolves with the Hamilto-
nian from one time to the next
T̂  ⟺  e^{−TĤ/(ℏN)} = 1 − TĤ/(ℏN) + O(1/N^2)   (15.86)

The lesson is: a statistical system on a 1d lattice can be transformed into a


quantum mechanical system, where the direction along the lattice gets interpreted
as imaginary time. This begs the question: can we do something similar for higher-
dimensional lattices, and what do we get?

15.10 The 2d Ising model


Let us now consider a slightly more complicated case: the 2d Ising model. For
simplicity, we set h = 0. We consider the partition function on the doubly-periodic
lattice ZM × ZN and label spins si,j by a pair (i, j) ∈ ZM × ZN .


The action is given by


S[s] = −K Σ_{i,j} (si,j si+1,j + si,j si,j+1)
     = K Σ_{i,j} [(1/2)(si,j+1 − si,j)^2 − 1] − K Σ_{i,j} si,j si+1,j
     = const. + Σ_{j=1}^{N} L(sj+1, sj).   (15.87)

In the last line, we split the action into contributions from pairs of neighboring rows.
The notation sj represents the configuration of spins in the j-th row,
(sj )i = si,j . (15.88)
The action associated with a pair of neighboring rows is given by
L(s′, s) = (1/2) K Σ_{i=1}^{M} (s′i − si)^2 − (1/2) K Σ_{i=1}^{M} (si+1 si + s′i+1 s′i).   (15.89)

(The constant in (15.87) gives an unimportant multiplicative constant that we will


ignore.)
To quantize the theory, we can think of the j direction as time, so sj is inter-
preted as a configuration of spins on a fixed time-slice. The Hilbert space has an
orthonormal basis vector for each such configuration,
HM = Span{|±1, ±1, · · · , ±1⟩} = ⊗_{i=1}^{M} Hi,   (15.90)

where Hi is a 1-qubit Hilbert space for each site i. HM is the quantum Hilbert
space of M qubits, and is 2M -dimensional. This is sometimes called a (quantum-
mechanical) spin-chain.
The transfer matrix between successive time slices is a 2M × 2M matrix with
entries

⟨s′|T̂|s⟩ = e^{−L(s′,s)}.   (15.91)
The partition function on ZM × ZN is then
Z(ZM × ZN ) = TrHM (TbN ). (15.92)
To compute correlation functions, we need an operator that measures the spin at
site i. This is simply the Pauli spin matrix σiz associated with the i-th site
σ^z_i |s1, . . . , si, . . . , sM⟩ = si |s1, . . . , si, . . . , sM⟩.   (15.93)


Correlation functions become traces of time-ordered products, e.g.

⟨si1,j1 si2,j2⟩ = (1/Z) Tr_{HM}(T̂^{N+j2−j1} σ^z_{i1} T̂^{j1−j2} σ^z_{i2})   (j1 > j2).   (15.94)

Let us write Tb in a more familiar way as an operator on a spin chain. First split
L into contributions from horizontal and vertical bonds

L(s′, s) = Lh(s′) + Lh(s) + Lv(s′, s),
Lh(s) = −(1/2) K Σ_i si+1 si,
Lv(s′, s) = Σ_i (1/2) K (s′i − si)^2.   (15.95)

Note that
e^{(1/2)K Σ_i σ^z_i σ^z_{i+1}} |s⟩ = e^{−Lh(s)} |s⟩.   (15.96)

Meanwhile, Lv only involves spins at a single site, so let us imagine that we have
only one site. Note that
⟨s′|(1 + e^{−2K} σ^x)|s⟩ = e^{−(1/2)K(s′−s)^2}.   (15.97)

We also have
1 + e^{−2K} σ^x = e^{A + K′σ^x},   where   tanh K′ = e^{−2K},   e^A = √(1 − e^{−4K}),   (15.98)
which follows by expanding out the Taylor series for e^{A+K′σ^x} and matching the
coefficients of 1, σ x . Thus,
e^{AM} ⟨s′| e^{K′ Σ_i σ^x_i} |s⟩ = e^{−Lv(s′,s)}.   (15.99)

The constant eAM will cancel in correlation functions, so we will ignore it. Putting
everything together, we find
T̂ ∝ exp((1/2)K Σ_i σ^z_i σ^z_{i+1}) exp(K′ Σ_i σ^x_i) exp((1/2)K Σ_i σ^z_i σ^z_{i+1}).   (15.100)

In a quantum-mechanical interpretation, we would write T̂ = e^{−∆τ Ĥ}, but the resulting Ĥ would be very complicated.
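Still, for a small number of sites everything can be checked numerically. A Python sketch (M = N = 3 and K = 0.3 are arbitrary choices) that builds T̂ directly from its matrix elements ⟨s′|T̂|s⟩ = e^{−L(s′,s)} and verifies Z = e^{KMN} Tr(T̂^N), where e^{KMN} is the constant dropped in (15.87):

    import numpy as np
    from itertools import product

    K = 0.3          # coupling beta*J, with h = 0
    M, N = 3, 3      # small periodic lattice Z_M x Z_N

    # brute force: Z = sum over all spin configurations of exp(K * sum over bonds of s*s)
    Z_brute = 0.0
    for config in product([1, -1], repeat=M * N):
        s = np.array(config).reshape(M, N)
        bonds = (s * np.roll(s, -1, 0)).sum() + (s * np.roll(s, -1, 1)).sum()
        Z_brute += np.exp(K * bonds)

    # transfer matrix: <s'|T|s> = exp(-L(s',s)) with L from (15.89)
    rows = list(product([1, -1], repeat=M))      # the 2^M row configurations
    def L(sp, s):
        sp, s = np.array(sp), np.array(s)
        vert = 0.5 * K * ((sp - s)**2).sum()
        horiz = 0.5 * K * ((np.roll(s, -1) * s).sum() + (np.roll(sp, -1) * sp).sum())
        return vert - horiz

    T = np.array([[np.exp(-L(sp, s)) for s in rows] for sp in rows])
    Z_transfer = np.exp(K * M * N) * np.trace(np.linalg.matrix_power(T, N))

    print(Z_brute, Z_transfer)    # the two agree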
Onsager’s 1944 solution of the 2d Ising model consisted of diagonalizing the above
matrix. The solution has an interesting and important feature: when the parameter


K is dialed to a very special value, the correlation length ξ (which is the inverse of
the gap between the two lowest eigenvalues of Ĥ) diverges — i.e. the low-lying
eigenvalues of Ĥ collapse toward each other. This is the critical point of the theory, and
it results in lots of interesting and rich physics. In fact, the right definition of a
critical point/second-order phase transition is a point where the correlation length
of a system diverges. A divergent correlation length means the system is in some
sense coupled to itself on all distance scales, and can display interesting emergent
phenomena.
We have started with a 2d lattice model, and in the end obtained a quantum
mechanical system that has a spatial direction (in addition to time). In the con-
tinuum limit, where we zoom out to distance scales much larger than the lattice
spacing, this system becomes a quantum field theory. In general, a d-dimensional
lattice model in the continuum limit can be rewritten as a quantum field theory
with d − 1 spatial directions and 1 time direction. In fact, one of our best definitions
of the strong nuclear force is as the continuum limit of a 4d lattice model.
A critical point is described by a quantum field theory with an infinite correlation
length. The fact that the correlation length is infinite means that the theory doesn’t
have any intrinsic length scales — it becomes invariant under rescaling.


16 Maxwell’s demon
Maxwell’s demon is a deceptively simple paradox about entropy that was proposed
in 1867 and resolved more than 100 years later.
Imagine a box of gas with a divider separating it into two halves. Both halves
initially have the same temperature, volume, and number of particles. On the
divider, there is a small door operated by a machine that we call the “demon.” The
demon looks at each gas molecule approaching the door. By selectively choosing to
open or close the door, it allows only fast-moving molecules to enter the right half,
and only slow-moving molecules to enter the left half. In this way, the temperature
on the left decreases and the temperature on the right increases. The demon has
apparently violated the second law of thermodynamics, since the configuration where
both sides have different temperatures has lower entropy than the configuration
where they have the same temperature (i.e. thermal equilibrium).
The demon does not need to be conscious — it could be a computer with a laser
sensor, or even something as simple as a spring-loaded door that only opens to the
right and can only be knocked open by a sufficiently fast molecule. There are also
versions of the paradox that violate the second law in other ways. For example,
starting with a mixture of two gases A and B, the demon could selectively let A
molecules pass to the left and B molecules pass to the right, separating the mixtures.
The separated gases have lower entropy than the mixed gas.
No matter what, the demon has to perform some kind of computation: it has
to measure a molecule, make a decision, and act accordingly. The resolution to the
paradox comes from understanding some thermodynamic properties of computation:
specifically, which kinds of computations are reversible and which ones are not.
Let us start with a very concrete model of a bit of information: a ball in a double-
well potential V (x) = (x2 − a2 )2 . We say that the ball in the left well represents
0, and the ball in the right well represents 1. Given a ball in the 1 state, we can
move it to the 0 state in a completely reversible manner without doing any work or
dissipating any heat. For example, we can hook the ball up via a pulley to another
ball in an exactly opposite potential −V (x). Now an infinitesimally-tiny nudge will
cause the ball to move from state 1 to state 0.
Suppose, however, that we do not know the initial state of the ball, and we
wish to move it to the 0 state. Importantly, we want an operation that works no
matter what the initial state is. Let us call this operation “SetToZero.” Almost by
definition, SetToZero cannot be reversible. The problem is that it maps two different
initial states (0 and 1) to a single final state 0. The laws of physics can always be
run backwards, so how can we achieve an irreversible operation? We must use the
environment.
As a concrete example, suppose that the ball experiences some friction. Then
our SetToZero operation can be: swing a hammer at the right well. If the ball is
already on the left, nothing will happen. If the ball is on the right, it will get kicked


over the barrier, and settle on the other side due to friction. Friction is important for
this, since otherwise the ball will just come back over the barrier. More abstractly,
in this example, information about the initial state has moved via heat dissipation
into the environment, where it is forgotten. In general, this must happen any time
we want to erase information, since erasing is an irreversible operation. This is
Definition (Landauer’s principle). Erasing information requires energy to be
dissipated as heat.
The heat that’s dissipated is at least τ ∆σ, where ∆σ is the information entropy
associated with the erased bits.
We can now see a problem with Maxwell’s demon. During operation, the demon
must measure a molecule. This measurement requires storing information, like “the
molecule is moving fast” in some kind of memory bank. The demon makes a decision
based on this information. Then before the next molecule arrives, the demon must
erase its memory bank. Erasure requires dissipating heat, which means the entropy
of that information is put into the environment.
As an example, consider the spring-loaded door. In order to function correctly,
the door must settle back down after each molecule comes through. This is a form
of erasing information: the door is more excited if a fast molecule hits it and less
excited if a slow molecule hits it, but either way it must settle down to the same
state. However, settling down requires sending away energy, and hence entropy, into
the environment.
Thus, the demon is ultimately taking entropy from the gas and sending it into
the environment. The total entropy of system plus environment is non-decreasing,
and the paradox is resolved.
One could ask: why must the demon erase its memory? Couldn’t it just leave
the information in memory? To understand the answer, let us imagine a memory
bank given by a ticker tape that can store a string of 0’s and 1’s. To avoid erasing
its memory, the demon must start filling up the ticker tape. After it has separated
a large number of molecules, it has created a long ticker tape with an essentially
random sequence of 0’s and 1’s. The entropy of the gas has turned into information
entropy of the ticker tape. An important insight is that we should count the informa-
tion entropy of the tape as physical entropy. After all, the tape is a physical system.
A tape with length N has at least 2^N different microstates, and these should be
treated on the same footing as the microstates of any other physical system. Thus,
in this case, the demon has transferred entropy from the gas to the tape, but it has
not decreased the total entropy of the combined gas-tape system.
The insight that information entropy should be counted as thermodynamic en-
tropy, and that this gives the resolution to the Maxwell’s demon paradox, was largely
due to Bennett in 1982.


A Review: trace
Let us review the trace and some of its properties. Consider a linear operator A.
Its matrix elements Aji in an n-dimensional basis ⃗ei are defined by38
A e⃗i = Σ_{j=1}^{n} Aji e⃗j.   (A.1)

The trace of A is defined by


Tr(A) = Σ_{i=1}^{n} Aii.   (A.2)

The trace has two important properties:

• Cyclicity. The trace is invariant under cyclic permutations of a product of


matrices

Tr(A1 · · · An−1 An ) = Tr(An A1 · · · An−1 ). (A.3)

Actually, this follows from cyclicity for a product of two matrices (exercise!)

Tr(AB) = Tr(BA). (A.4)

We can prove this as follows:


Tr(AB) = Σ_i (AB)ii = Σ_i Σ_j Aij Bji = Σ_j Σ_i Bji Aij = Σ_j (BA)jj = Tr(BA)   (A.5)

• Basis independence. Under a change of basis implemented by a matrix U ,


A changes by A → U AU −1 . We can prove invariance of the trace under this
transformation using cyclicity

Tr(U AU −1 ) = Tr(U −1 U A) = Tr(A). (A.6)

Suppose A is diagonalizable, i.e. there exists a change-of-basis matrix U such


that

A = U diag(λ1 , . . . , λn )U −1 , (A.7)
38
If ⃗ei is an orthonormal basis, we have Aji = ⃗ej · (A⃗ei ), or in Dirac notation Aji = ⟨j|A|i⟩.
However, what we say in this section holds in any basis.


where λ1 , . . . , λn are the eigenvalues of A. Then the trace is given by the sum of the
eigenvalues:
Tr(A) = Tr(diag(λ1, . . . , λn)) = Σ_i λi.   (A.8)

Finally, for a diagonalizable matrix, a function of that matrix is defined by applying


the function to each eigenvalue

f (A) = U diag(f (λ1 ), . . . , f (λn ))U −1 . (A.9)

In particular,
Tr(f(A)) = Σ_i f(λi).   (A.10)

This is the equation used in writing (3.38), in the case f (H) = e−H/τ .
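A small numerical spot-check of these statements (Python; the random 3×3 matrices are just for illustration):

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    A, B, U = (rng.normal(size=(3, 3)) for _ in range(3))

    print(np.trace(A @ B), np.trace(B @ A))                  # cyclicity, cf. (A.4)
    print(np.trace(A), np.trace(U @ A @ np.linalg.inv(U)))   # basis independence, cf. (A.6)

    lam = np.linalg.eigvals(A)                               # Tr f(A) = sum_i f(lambda_i), f = exp
    print(np.trace(expm(A)), np.exp(lam).sum().real)         # cf. (A.10)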
