Statistical Mechanics Notes
Jared Kaplan
Abstract
Various lecture notes to be supplemented by other course materials.
Contents
1 What Are We Studying?
6 Thermodynamic Potentials
9 Transport, In Brief
9.1 Heat Conduction
9.2 Viscosity
9.3 Diffusion
11 Phase Transitions
11.1 Clausius-Clapeyron
11.2 Non-Ideal Gases
11.3 Phase Transitions of Mixtures
11.4 Chemical Equilibrium
11.5 Dilute Solutions
11.6 Critical Points
12 Interactions
12.1 Ising Model
12.2 Interacting Gases
Basic Info
This is an advanced undergraduate course on thermodynamics and statistical physics. Topics
include basics of temperature, heat, and work; state-counting, probability, entropy, and the second
law; partition functions and the Boltzmann distribution; applications to engines, refrigerators, and
computation; phase transitions; basic quantum statistical mechanics and some advanced topics.
The course will use Schroeder’s An Introduction to Thermal Physics as a textbook. We will follow
the book fairly closely, though there will also be some extra supplemental material on information
theory, theoretical limits of computation, and other more advanced topics. If you’re not planning
to go on to graduate school and become a physicist, my best guess is that the subject matter and
style of thinking in this course may be the most important/useful in the physics major, as it has
significant applications in other fields, such as economics and computer science.
1 What Are We Studying?
This won’t all make sense yet, but it gives you an outline of central ideas in this course.
We’re interested in the approximate or coarse-grained properties of very large systems. You may
have heard that chemists define Avogadro's number N_A = 6.02 × 10^23 as the number of atoms in a
mole, ie in a small amount of stuff (12 grams of Carbon-12). Compare this to all the grains of sand on all of
the world's beaches ∼ 10^18, the number of stars in our galaxy ∼ 10^11, or the number of stars in the
visible universe ∼ 10^22. Atoms are small and numerous, and we'll be taking advantage of that fact!
Here are some examples of systems we’ll study:
• We’ll talk a lot about gases. Why? Because gases have a large number of degrees of freedom,
so statistics applies to them, but they’re simple enough that we can deduce their properties by
looking at one particle at a time. They also have some historical importance. But that’s it.
• Magnetization – are magnetic dipoles in a substance aligned, anti-aligned, random? Once
again, if they don't interact, it's easy, and that's part of why we study them!
• Metals. Why and how are metals all roughly the same (answer: they’re basically just gases of
electrons, but note that quantum mechanics matters a lot for their behavior).
• Phase Transitions, eg gas/liquid/solid. This is obviously a coarse-grained concept that
requires many constituents to even make sense (how could you have a liquid consisting of only
one molecule?). More interesting examples include metal/superconductor, fluid/superfluid,
gas/plasma, aligned/random magnetization, confined quarks/free quarks, Higgs/UnHiggsed,
and a huge number of other examples studied by physicists.
• Chemical reactions – when, how, how fast do they occur, and why?
Another important idea is universality – in many cases, systems with very different underlying
physics behave the same way, because statistically, ie on average, they end up being very similar.
For example, magnetization and the critical point of water both have the same physical description!
Related ideas have been incredibly important in contemporary physics, as the same tools are used
by condensed matter and high-energy theorists; eg superconductivity and the Higgs mechanism are
essentially the same thing.
Historically physicists figured out a lot about thermodynamics without understanding statistical
mechanics. Note that
• Thermodynamics deals with Temperature, Heat, Work, Entropy, Energy, etc as rather abstract
macroscopic concepts obeying certain rules or ‘laws’. It works, but it’s hard to explain why
without appealing to...
• Statistical Mechanics gets into the details of the physics of specific systems and makes statistical
predictions about what will happen. If the system isn’t too complicated, you can directly
derive thermo from stat mech. So statistical mechanics provides a much more complete
picture, and is easier to understand, but at the same time the analysis involved in stat mech is
more complicated. It was discovered by Maxwell, Boltzmann, Gibbs, and many others, who
provided a great deal of evidence for the existence of atoms themselves along the way.
A Very Brief Introduction
So what principles do we use to develop stat mech?
• Systems tend to be in typical states, ie they tend to be in and evolve towards the most likely
states. This is the second law of thermodynamics (from a stat mech viewpoint). Note
S ≡ k_B \log N \qquad (1.0.1)
is the definition of entropy, where N is the number of accessible states, and kB is a pointless
constant included for historical reasons. (You should really think of entropy as a pure number,
simply the log of the number of accessible states.)
For example, say I’m a bank robber, and I’m looking for a challenge, so instead of stealing $100
bills I hijack a huge truck full of quarters. But I end up crashing it on the freeway, and 10^6 quarters
fly everywhere, and come to rest on the pavement. How many do you expect will be heads vs tails?
The kind of reasoning you just (intuitively) employed is exactly the sort of thinking we use in stat
mech. You should learn basic statistical intuition well, as it's very useful both in physics, and in the
rest of science, engineering, and economics. It’s very likely the most universally important thing you
can learn well as a physics undergraduate.
Is there any more to stat mech than entropy? Not much! Only that
• The laws of physics have to be maintained. Mostly this means that energy is conserved.
This course will largely be an exploration of the consequences of basic statistics + physics, which
will largely mean counting + energy conservation.
Say I have 200 coins, and I restrict myself to coin configurations with the same number of heads
and tails. Now I start with the heads in one pile and the tails in the other (100 each). I play a game
where I start randomly flipping a head and a tail over simultaneously. After a while, will the two
100-coin piles still be mostly heads and mostly tails? Or will each pile have roughly half heads and
half tails? This is a pretty good toy model for heat transfer and entropy growth.
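Here's a minimal simulation sketch of this game (the number of swap steps is an arbitrary illustrative choice):

import random

# 200 coins: pile A = positions 0..99 (start all heads),
#            pile B = positions 100..199 (start all tails).
coins = [1] * 100 + [0] * 100   # 1 = heads, 0 = tails

# Each step: pick one random head and one random tail and flip both,
# so the total number of heads stays fixed at 100.
for step in range(20000):
    heads = [i for i, c in enumerate(coins) if c == 1]
    tails = [i for i, c in enumerate(coins) if c == 0]
    coins[random.choice(heads)] = 0
    coins[random.choice(tails)] = 1

print("heads in pile A:", sum(coins[:100]))   # typically ~50, not ~100
print("heads in pile B:", sum(coins[100:]))   # typically ~50, not ~0

The two piles end up looking the same, even though the dynamics never 'knows' which pile is which – this is the sense in which the most likely macrostate wins.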
• Machine Learning systems are largely built on the same conceptual foundations. In particular,
‘learning’ in ML is essentially just statistical regression, and differences between ML models
are often measured with a version of entropy. More importantly, even when ML systems are
deterministic algorithms, they are so complicated and have so many parameters that we use
statistical reasoning to understand them – this is identical to the situation in stat mech, where
we describe deterministic underlying processes using probability and statistics. (It’s also what
we do when we talk about probability in forecasting, eg when we say there’s a 30% probability
someone will win an election.)
Ideal Gas and the Energy-Temperature Relation
Gases are the harmonic oscillator of thermodynamics – the reason is that we want to study a large
number of very simple (approximately independent!) degrees of freedom in a fairly intuitive context.
Everyone breathes and has played with a balloon.
Low-density gases are nearly ideal, and turn out to satisfy (eg experiments demonstrate this)
P V = N kT (2.1.1)
where k is Boltzmann’s constant (which is sort of arbitrary and meaningless, as we’ll see, but we
have to keep it around for historical reasons).
• Note that Chemists use Avogadro's number N_A = 6.02 × 10^23 and write nR = Nk, where n is the
number of moles. This is also totally arbitrary, but it hints at something very important – there are
very big numbers floating around!
• Of course the ideal gas law is an approximation, but it’s a good approximation that’s important
to understand.
Why are we talking about this? Because the relationship between temperature and energy
is simple for an ideal gas. Given
\bar{P} = \frac{\bar{F}_{x,\,\rm on\ piston}}{A} = -\frac{\bar{F}_{x,\,\rm on\ molecule}}{A} = -\frac{m}{A}\overline{\left(\frac{\Delta v_x}{\Delta t}\right)} \qquad (2.1.2)
It's natural to use the round-trip time between collisions with the piston,
\delta t = \frac{2L}{v_x} \qquad (2.1.3)
Note that in each collision with the piston the molecule's x-velocity reverses, so
\Delta v_x = -2 v_x \qquad (2.1.4)
So we find that
\bar{P} = -\frac{m}{A}\,\frac{(-2v_x)}{2L/v_x} = \frac{m v_x^2}{V} \qquad (2.1.5)
The two factors of vx come from the momentum transfer and from the time between collisions.
If we have N molecules, the total pressure is the sum, so we can write
\bar{P} V = N m \overline{v_x^2} \qquad (2.1.6)
So far we haven’t made any assumptions. But if we assume the ideal gas law, we learn that
\frac{1}{2} kT = \frac{1}{2} m \overline{v_x^2} \qquad (2.1.7)
or the average total kinetic energy is
\frac{3}{2} kT = \frac{1}{2} m\left(\overline{v_x^2} + \overline{v_y^2} + \overline{v_z^2}\right) \qquad (2.1.8)
Cool! We have related energy to temperature, by assuming the (empirical for now) ideal gas law.
We can also get the RMS speed of the molecules as
\overline{v^2} = \frac{3kT}{m} \qquad (2.1.9)
Is the average of the squares of a set of numbers equal to the square of their average? (No.) So the
RMS speed √(v̄²) isn't the same as the average speed, but it's pretty close. (What does it being close
tell us about the shape of the distribution of speeds?)
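As a quick numerical sketch of this relation (nitrogen at room temperature is my choice of example, not one from the text):

import math

k = 1.381e-23          # Boltzmann constant, J/K
T = 300.0              # room temperature, K
m = 28 * 1.66e-27      # mass of an N2 molecule, kg

v_rms = math.sqrt(3 * k * T / m)   # from v^2 = 3kT/m
print(f"v_rms for N2 at 300 K: {v_rms:.0f} m/s")   # roughly 500 m/s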
Our result is true for ideal gases. The derivation breaks down at very low temperatures, when
the molecules get very close together, though actually the result is still true (just not our argument).
We’ll discuss this later in the course.
Equipartition of Energy
The equipartition theorem, which we'll prove later in the course, says that all 'quadratic' degrees of
freedom have average energy = (1/2) kT. This means kinetic energies and harmonic oscillator potential
energies:
• Atoms in molecules can rotate, and their bonds can vibrate.
• Solids have 3 kinetic DoF per molecule (its motion) and 3 DoF for the potential energy in bonds.
• Due to QM, some motions can be frozen out, so that they don’t contribute.
This is all useful for understanding how much heat energy various substances can absorb.
• Heat is the spontaneous flow of energy due to temperature differences.
• Work is any other transfer of energy... you do work by pushing a piston, stirring a cup, running
current through a resistor or electric motor. In these cases the internal energy of these systems
will increase, and they will usually ‘get hotter’, but we don’t call this heat.
• Note that these are both descriptions of energy transfers, not total energies in a system.
ΔU = Q + W (2.2.1)
and this is just energy conservation, sometimes called the first law of thermodynamics.
• Note these are the amounts of heat and work entering the system.
• Note that we often want to talk about infinitesimals, but note that 'dQ' and 'dW' aren't the
change in anything – heat and work flow in or out, they aren't properties of the system
itself.
Compression Work
For compression work we have
W = \vec{F} \cdot d\vec{x} \to F \Delta x \qquad (2.2.2)
for a simple piston. If compression happens slowly, so the gas stays in equilibrium, then I can replace
W = P AΔx = −P ΔV (2.2.3)
This is called quasistatic compression. To leave this regime you need to move the piston at a speed
of order the speed of sound or more (then you can make shock waves).
That was for infinitesimal changes, but for larger changes we have
W = -\int_{V_i}^{V_f} P(V)\, dV \qquad (2.2.4)
For isothermal compression of an ideal gas, U depends only on T and so doesn't change, and the heat flow is
Q = \Delta U - W = -W \qquad (2.2.7)
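As a sketch of how the work integral is used, here is a numerical check for the isothermal compression of an ideal gas, compared against the closed form W = NkT log(V_i/V_f) that follows from P = NkT/V (the particle number, temperature, and volumes below are arbitrary illustrative choices):

import numpy as np

k = 1.381e-23
N, T = 1e22, 300.0           # illustrative particle number and temperature
Vi, Vf = 2e-3, 1e-3          # compress from 2 liters down to 1 liter

V = np.linspace(Vi, Vf, 100001)
P = N * k * T / V                                            # ideal gas law at fixed T
W_numeric = -np.sum(0.5 * (P[1:] + P[:-1]) * np.diff(V))     # W = -∫ P dV, trapezoid rule
W_analytic = N * k * T * np.log(Vi / Vf)

print(W_numeric, W_analytic)    # both ≈ 28.7 J, positive since we compress the gas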
Heat Capacities
Heat capacity is the amount of heat needed to raise an object's temperature (by one degree). The
specific heat capacity is the heat capacity per unit mass.
As stated, specific heat capacity is ambiguous because Q isn’t a state function. That is
C = \frac{Q}{\Delta T} = \frac{\Delta U - W}{\Delta T} \qquad (2.2.14)
It’s possible that U only depends on T (but likely not!), but W definitely depends on the process.
• Most obvious choice is W = 0, which usually means heat capacity at constant volume
C_V = \left(\frac{\partial U}{\partial T}\right)_V \qquad (2.2.15)
This could also be called the ‘energy capacity’.
• Objects tend to expand when heated, and so it's easier to use constant pressure
C_P = \left(\frac{\Delta U + P\Delta V}{\Delta T}\right)_P = \left(\frac{\partial U}{\partial T}\right)_P + P\left(\frac{\partial V}{\partial T}\right)_P \qquad (2.2.16)
For gases this distinction is important.
Interesting and fun to try to compute CV and CP theoretically.
When the equipartition theorem applies and f is constant, we find
C_V = \frac{\partial U}{\partial T} = \frac{\partial}{\partial T}\left(\frac{f}{2} N k T\right) = \frac{f}{2} N k \qquad (2.2.17)
For a monatomic gas we have f = 3, while gases with rotational and vibrational DoF have (strictly)
larger f . So CV allows us to measure f .
Constant pressure? For an ideal gas the only difference is the addition of
\left(\frac{\partial V}{\partial T}\right)_P = \frac{\partial}{\partial T}\left(\frac{NkT}{P}\right) = \frac{Nk}{P} \qquad (2.2.18)
so that
CP = CV + N k (2.2.19)
Interestingly CP is independent of P , because for larger P the gas just expands less.
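A tiny numerical sketch of these equipartition predictions (per molecule, in units of k; the ratio C_P/C_V is included because it shows up later when we discuss adiabats and engines):

# Equipartition predictions for ideal gases: C_V = (f/2) N k and C_P = C_V + N k.
for name, f in [("monatomic (He, Ar)", 3), ("diatomic with rotations (N2, O2)", 5)]:
    cv = f / 2          # per molecule, in units of k
    cp = cv + 1
    print(f"{name}: C_V = {cv} k, C_P = {cp} k, C_P/C_V = {cp / cv:.2f}")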
Phase Transitions and Latent Heat
Sometimes you can put heat in without increasing the temperature at all! This is the situation
where there's Latent Heat; this usually occurs near a first-order phase transition. We say that
L \equiv \frac{Q}{m} \qquad (2.2.20)
is the latent heat per mass needed to accomplish the transformation. Conventionally this is at
constant pressure, as it's otherwise ambiguous. Note it takes 80 cal/g to melt ice and 540 cal/g to
boil water (while it takes 100 cal/g to heat water from freezing to boiling).
Enthalpy
Since we often work at constant pressure, it’d be convenient if we could avoid constantly having to
include compression-expansion work. We can do this by defining the enthalpy
H = U + PV (2.2.21)
This is the total energy we’d need to create the system out of nothing at constant
pressure. Or if you annihilate the system, you could get H back, since the atmosphere does some of the work for you as it contracts to fill the vacated space!
It’s useful because
ΔH = ΔU + P ΔV
= Q + Wother (2.2.22)
We could call C_P the "enthalpy capacity" while C_V is the "energy capacity". No need for any heat at all,
since the input could just be 'other' work (eg a microwave oven).
For example, boiling a mole of water at 1 atmosphere takes 40,660 J, but via the ideal gas law
P V = RT \approx 3100\ {\rm J} \qquad (2.2.24)
at the boiling temperature, so this means about 8% of the energy put in went into pushing the atmosphere away.
Two-State Systems, Microstates, and Macrostates
Let’s go back to our bank robber scenario, taking it slow.
If I have a penny, a nickel, and a dime and I flip them, there are 8 possible outcomes. If the coins
are fair, each has 1/8 probability. Note that there are 3 ways of getting 2 heads and a tail, so the
probability of that outcome (2 heads and a tail) is 3/8. But if I care about whether the tail comes
from the penny, nickel, or dime, then I might want to discriminate among those three outcomes.
This sort of choice of what to discriminate, what to care about, or what I can actually distinguish
plays an important role in stat mech. Note that
• A microstate is one of these 8 different outcomes.
• A macrostate ignores or ‘coarse-grains’ some of the information. So for example just giving
information about how many heads there were, and ignoring which coin gave which H/T
outcome, defines a macrostate.
• The number of microstates in a macrostate is called the multiplicity of the macrostate. We
often use the symbol Ω for the multiplicity.
If we label macrostates by the number of heads n, then for n = 0, 1, 2, 3 we find
p(n) = \frac{\Omega(n)}{\Omega({\rm all})} = \frac{1}{8},\ \frac{3}{8},\ \frac{3}{8},\ \frac{1}{8} \qquad (3.1.1)
How many total microstates are there if we have N coins? It's 2^N, as each can be H or T. What are
the multiplicities of the macrostates with n heads with N = 100? Well note that
\Omega(0) = 1,\quad \Omega(1) = 100,\quad \Omega(2) = \frac{100\times 99}{2},\quad \Omega(3) = \frac{100\times 99\times 98}{3\times 2} \qquad (3.1.2)
In general we have
\Omega(n) = \frac{N!}{n!(N-n)!} \qquad (3.1.3)
which is N choose n. So this gives us a direct way to compute the probability of finding a given
number of heads, even in our bank robbery scenario where N = 10^6.
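A minimal sketch of this counting (the particular values of n are illustrative):

from math import comb, sqrt, pi

N = 100
total = 2 ** N
for n in [0, 25, 50]:
    omega = comb(N, n)                 # multiplicity of the n-heads macrostate
    print(n, omega, omega / total)     # probability of exactly n heads

# The peak probability only falls off like 1/sqrt(N):
print(comb(N, N // 2) / total, sqrt(2 / (pi * N)))   # ≈ 0.0796 vs ≈ 0.0798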
We can immediately make this relevant to physics by studying the simplest conceivable magnetic
system, a two-state paramagnet. Here each individual magnetic dipole has only two states
(because of QM), so it can only point up or down, and furthermore each dipole behaves as though
its magnetic moment is independent of the others (they interact very weakly... in fact, in our
approximation, they don’t interact at all). Now H/T just becomes ↑ / ↓, so a microstate is
· · · ↑↑↑↓↑↓↓↑↓↓↑ · · · (3.1.4)
while a macrostate is specified by the total magnetization, which is just N↑ − N↓ = 2N↑ − N , and
multiplicity
\Omega(N_\uparrow) = \frac{N!}{N_\uparrow!\, N_\downarrow!} \qquad (3.1.5)
This really becomes physics if we apply an external magnetic field, so that the system has an
energy corresponding with how many little dipoles are aligned or anti-aligned with the external field.
A very large number is one like
10^{23} + 50 \approx 10^{23} \qquad (3.1.6)
which is essentially unchanged when we add an ordinary number to it, while a very very large number is one like
8973497 \times 10^{10^{23}} \approx 10^{10^{23}} \qquad (3.1.7)
which is barely changed even when we multiply it by a large number.
With very very large numbers, units don’t even matter! Be very careful though, because in many
calculations the very very large, or just very large numbers cancel out, and so you really do have to
carefully keep track of pre-factors.
Important Math
Now we will derive the special case of a very universal result, which provides information about the
aggregate distribution of many independent random events.
Since N is large, N ! is really large. How big? One way to approximate it is to note that
N! = e^{\log N!} = e^{\sum_{n=1}^{N} \log n} \qquad (3.1.8)
Approximating the sum by an integral, \sum_{n=1}^N \log n \approx \int_1^N \log x\, dx \approx N\log N - N, so that
N! \sim N^N e^{-N} \qquad (3.1.10)
This is the most basic part of Stirling’s Approximation to the factorial. The full result is
N! \approx N^N e^{-N}\sqrt{2\pi N} \qquad (3.1.11)
You can derive this more precise result by taking the large N limit of the exact formula
N! = \int_0^\infty x^N e^{-x}\, dx \qquad (3.1.12)
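A quick numerical sketch comparing both forms of Stirling's approximation with the exact factorial:

from math import factorial, sqrt, pi, e

for N in [10, 50, 100]:
    exact = factorial(N)
    basic = N**N * e**(-N)                       # N! ~ N^N e^(-N), misses the sqrt(2 pi N)
    full = N**N * e**(-N) * sqrt(2 * pi * N)     # full Stirling approximation
    print(N, basic / exact, full / exact)        # the full version is accurate to ~1/(12N)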
That was a warm-up to considering the distribution of H/T, or spins in a paramagnet.
\Omega(n) = \frac{N!}{n!(N-n)!} \approx \frac{N^N e^{-N}}{n^n e^{-n}\,(N-n)^{N-n}\, e^{n-N}} \qquad (3.1.13)
Writing n = xN and taking the log, this says that \log\Omega \approx N\left[-x\log x - (1-x)\log(1-x)\right]. The prefactor of N in the exponent means that this is very, very sharply peaked at its maximum,
which is at x = 1/2. Writing x = 1/2 + y we have
-x\log x - (1-x)\log(1-x) = \log 2 - 2y^2 - \frac{4}{3}y^4 + \cdots \qquad (3.1.15)
so that
\Omega\left(n = \left(\tfrac{1}{2} + y\right)N\right) \approx 2^N e^{-2Ny^2} \approx 2^N e^{-\frac{(2n-N)^2}{2N}} \qquad (3.1.16)
This is a special case of the central limit theorem, which is arguably the most important result
in probability and statistics. The distribution we have found is the Gaussian Distribution, or
the 'bell curve'. Its key feature is its variance, which says intuitively that
\left|n - \frac{N}{2}\right| \lesssim \text{few} \times \sqrt{N} \qquad (3.1.17)
where here √N sets the standard deviation. Thus we expect that (2n − N)/N ≲ 1/√N ≪ 1.
This means that if we flip a million fair coins, we expect the difference between the number of
heads and the number of tails to be of order 1000, and not of order 10, 000. And if the difference
was only, say, 3, then we might be suspicious that the numbers are being faked.
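A minimal simulation sketch of this claim (the number of trials is an arbitrary choice):

import random

N = 10**6
for trial in range(5):
    heads = sum(random.getrandbits(1) for _ in range(N))
    print("heads - tails =", 2 * heads - N)   # typically a few hundred, rarely beyond ~3000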
number, but it's a very small fraction of the total, namely a fraction ∼ 10^{-11}. Thus the difference
between up and down spins is minute.
We have also largely solved the problem of a (1-dimensional) random walk. Note that 'heads'
and 'tails' can be viewed as steps to the right or left. Then n_H − n_T is just the net number of steps
taken to the right, ie the displacement. This means that if we take N random steps, we should expect
to end up a distance of order √N from our starting point. This is also an important, classic result.
Note that there's a unique configuration with minimum and maximum energy, whereas there are
\frac{100!}{50!\,50!}
configurations with energy 0.
So let’s assume that we have two paramagnets, which can exchange energy between them by
flipping some of the dipoles.
• We will treat the two paramagnets as weakly coupled, so that energy flows between them
slowly and smoothly, ie energy exchange within each paramagnet is much faster than between
the two. We’ll call these energies UA and UB .
• Utotal = UA + UB is fixed, and we refer to the macrostate just via UA and UB . Note that the
number of spins in each paramagnet NA and NB are fixed.
• In an isolated system in thermal equilibrium, all accessible (ie possible) microstates are equally
probable.
A microstate might be inaccessible because it has the wrong energy, or the wrong values of conserved
charges. Otherwise it’s accessible.
Microscopically, if we can go from X → Y , we expect we can also transition from Y → X (this
is the principle of detailed balance). Really there are so many microstates that we can't possibly
‘visit’ all of them, so the real assumption is that the microstates we do see are a representative
sample of all the microstates. Sometimes this is called the ‘ergodic hypothesis’ – averaging over
time is the same as averaging over all accessible states. This isn’t obvious, but should seem plausible
– we’re assuming the transitions are ‘sufficiently random’.
For simplicity, let’s imagine that the total energy is UA + UB = 0, and the two paramagnets
have NA = NB = N dipoles each. How many configurations are there with all spins up in one
paramagnet, and all spins down in the other? Just one! But with n up spins in the first paramagnet
and N − n up spins in the second, there are
\Omega_{\rm tot} = \Omega(N, (N-n)\mu)\,\Omega(N, (n-N)\mu) = \frac{N!}{n!(N-n)!} \times \frac{N!}{n!(N-n)!} = \left(\frac{N!}{n!(N-n)!}\right)^2 \qquad (3.2.3)
Thus the number of states is vastly larger when n ≈ N/2. In fact, we can directly apply our analysis
of combinatorics to conclude that the number of states has an approximately Gaussian distribution
with width ∼ √N.
Note that this is mathematically identical to the coin flipping example from the first lecture.
where • denotes a unit of energy and | denotes a partition between neighboring oscillators. So this expression has
E = 12 and N = 9. Thus we just have a string with N − 1 of the | and E of the •; we can put the |
and • anywhere, and they're indistinguishable.
If this reasoning isn’t obvious, first consider how to count a string of N digits, say N = 7. A
configuration might be
4561327 (3.3.5)
and since the digits are all distinct, there are
7 × 6 × 5 × 4 × 3 × 2 × 1 = 7! \qquad (3.3.6)
configurations. But in the case of the • and |, there are only two symbols, so we divide to avoid
over-counting identical configurations.
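As a sketch, this 'dots and bars' counting gives Ω(N, E) = (E + N − 1)! / (E! (N − 1)!) for N oscillators sharing E energy units, which we can check by brute force for small cases (the small values below are illustrative):

from math import comb
from itertools import product

def omega_formula(N, E):
    # number of ways to distribute E indistinguishable energy units among N oscillators
    return comb(E + N - 1, E)

def omega_brute(N, E):
    # brute force: count all tuples of N non-negative occupation numbers summing to E
    return sum(1 for occ in product(range(E + 1), repeat=N) if sum(occ) == E)

for N, E in [(3, 4), (4, 5), (9, 12)]:
    check = omega_brute(N, E) if N * E <= 30 else "(brute force skipped)"
    print(N, E, omega_formula(N, E), check)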
3.4 Ideal Gases
All large systems – ie systems with many accessible states or degrees of freedom – have the property
that only a tiny fraction of macrostates have reasonably large probabilities. Let us now apply this
reasoning to the ideal gas. What is the multiplicity of states for an ideal gas?
In fact, in the ideal limit the atoms don’t interact, so... what is the multiplicity of states for a
single atom in a box with volume V ?
If you know QM then you can compute this exactly. Without QM it would seem that there are an
infinite number of possible states, since the atom could be anywhere (its location isn't quantized!).
But we can get the right answer with almost no QM at all.
• First note that the number of states must be proportional to the volume V .
• But we can’t specify a particle’s state without also specifying its momentum. Thus the number
of states must be proportional to the volume in momentum space, Vp .
Note that the momentum space volume is constrained by the total energy via p⃗² = 2mU.
We also need a constant of proportionality, but in fact that's (essentially) why Planck's constant
ℏ exists! You may have heard that δx δp ≥ ℏ/2, ie this is the uncertainty principle. So we can write
\Omega_1 = \frac{V\, V_p}{h^3} \qquad (3.4.1)
by dimensional analysis. You can confirm this more precisely if you’re familiar with QM for a free
particle in a box.
If we have two particles, then there are two added complications
• We must decide if the particles are distinguishable or not. For distinguishable particles we'd
have Ω_1², while for indistinguishable particles we'd have (1/2)Ω_1².
• We need to account for the fixed total energy via p⃗_1² + p⃗_2² = 2mU. With more particles we
have a sum over all of their momenta.
\Omega_N = \frac{1}{N!}\,\frac{V^N}{h^{3N}} \times (\text{momentum hypersphere area}) \qquad (3.4.2)
where the momentum hypersphere is the surface in momentum space picked out by the fixed total
energy.
The surface area of a D-dimensional sphere is
A = \frac{2\pi^{D/2}}{\left(\frac{D}{2}-1\right)!}\, R^{D-1} \qquad (3.4.3)
We know that R = √(2mU), where U is the total energy of the gas, and that D = 3N. Plugging this
in gives
\Omega_N = \frac{1}{N!}\,\frac{V^N}{h^{3N}}\,\frac{2\pi^{3N/2}}{\left(\frac{3N}{2}-1\right)!}\,(2mU)^{\frac{3N-1}{2}} \approx \frac{1}{N!}\,\frac{\pi^{3N/2}}{\left(\frac{3N}{2}\right)!}\,\frac{V^N}{h^{3N}}\,(2mU)^{\frac{3N}{2}} \qquad (3.4.4)
where we threw away some 2s and −1s because they are totally insignificant. This eliminates the
distinction between ‘area’ and ‘volume’ in momentum space, because momentum space has the
enormous dimension 3N . With numbers this big, units don’t really matter!
Note that we can write this formula as
\Omega_N = f(N)\, V^N U^{3N/2} \qquad (3.4.5)
Since UA + UB = Utot , we see that multiplicity is maximized very sharply by UA = UB = Utot /2. And
the peak is very, very sharp as a function of U_A – we once again have a Gaussian distribution for
fluctuations about this equilibrium. Though the exact result is the simpler function [x(1 − x)]^P,
with x = U_A/U_tot, for very, very large P = 3N/2. It should be fairly intuitive that this function is
very sharply maximized at x = 1/2.
If the partition can move, then it will shift around so that
VA = VB = Vtot /2 (3.4.8)
as well. This will happen ‘automatically’, ie the pressures on both sides will adjust to push the
partition into this configuration. Once again, the peak is extremely sharp.
Finally, we could imagine N_A ≠ N_B to start, but we could poke a hole in the partition to see
how the molecules move. Here the analysis is a bit more complicated – it requires using Stirling’s
formula again – but once again there’s a very, very sharply peaked distribution.
Note that as a special case, putting all the molecules on one side is roughly 2^{−N} times as likely as
having them roughly equally distributed. This case is exactly like flipping N coins!
3.5 Entropy
We have now seen in many cases that the highest-multiplicity macrostates contain vastly, vastly more
microstates than the low-multiplicity macrostates, and that as a result systems tend to evolve towards
these most likely states. Multiplicity tends to increase.
For a variety of reasons, it’s useful to replace the multiplicity with a smaller number, its logarithm,
which is the entropy
S \equiv \log \Omega \qquad (3.5.1)
or, restoring the conventional (historical) constant,
S \equiv k \log \Omega \qquad (3.5.2)
A nice feature is that entropies simply add,
S_{\rm tot} = S_A + S_B \qquad (3.5.3)
when Ωtot = ΩA × ΩB . This is quite convenient for a variety of reasons. For one thing, the entropy is
a much smoother function of other thermodynamic variables (ie it doesn’t have an extremely sharp
peak).
Since the logarithm is a monotonic function, the tendency of multiplicity to increase is the same
thing as saying that entropy tends to increase:
δS ≥ 0 (3.5.4)
Entropy of an Ideal Gas
We can compute the entropy of an ideal gas from the multiplicity formula
" #
1 π 3N/2 V N 3N
S = log (2mU ) 2
N ! ( 3N
2
)! h 3N
3 3 3N V
= N k (1 − log N ) + π + 1 − log + log 3 + log 2mU
2 2 2 h
" 3/2 !#
5 V 4πmU
= Nk + log
2 N 3N h2
" 3/2 !#
5 1 4πmu
= Nk + log (3.5.5)
2 n 3h2
where u = U/N and n = N/V are the energy per particle and the number density, respectively. Apparently
this is called the Sackur-Tetrode equation, but I didn’t remember that.
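Here is a minimal numerical sketch of (3.5.5) for a mole of helium at room temperature and atmospheric pressure (my choice of example values), using U = (3/2)NkT:

import math

k = 1.381e-23; h = 6.626e-34
N = 6.022e23                    # one mole of atoms
m = 4.0 * 1.66e-27              # helium atom mass, kg
T, P = 300.0, 1.013e5
V = N * k * T / P               # ideal gas volume
U = 1.5 * N * k * T

S = N * k * (2.5 + math.log((V / N) * (4 * math.pi * m * U / (3 * N * h**2))**1.5))
print(S, "J/K")                 # roughly 126 J/K
print(S / (N * k))              # entropy per atom in units of k, roughly 15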
What would this result have looked like if we had used distinguishable particles??? The problem
associated with this is called the Gibbs paradox. We’ll talk about extensive vs intensive quantities
more thoroughly later.
Entropy dependence on volume (with fixed U and N ) is simplest, ie just
\Delta S = Nk\log\frac{V_f}{V_i} \qquad (3.5.6)
Note that this makes sense in terms of doubling the volume and gas particles being in either the left
or right half of the new container.
This formula applies both to expansion that does work, eg against a piston (where the gas
does work, and heat flows in to replace the lost internal energy) and to free expansion of a gas.
These are quite different scenarios, but result in the same change in entropy.
What about increasing the energy U while fixing V and N ? Where did the exponent of 3/2
come from, and why does it take that value?
Entropy of Mixing
Another way to create entropy is to mix two different substances together. There are many more
states where they’re all mixed up than when they are separated, so mixing creates entropy.
One way to think about it is if we have gas A in one side of a partition, and gas B on the other
side, then when they mix they each double in volume, so we end up with
\Delta S_{\rm mixing} = 2Nk\log 2
in extra entropy. It's important to note that this only occurs if the two gases are different
(distinguishable).
Note that doubling the number of molecules of the same type of gas in a fixed volume does not
double the entropy. This is because while the prefactor N → 2N, we also have \log\frac{1}{n} \to \log\frac{1}{2n}, and
so the change in the entropy is strictly less than a factor of 2.
(Now you may wonder if the entropy ever goes down; it doesn’t, because for N (X − log N ) to be
a decreasing function of N , we must have S < 0, which means a multiplicity less than 1!)
In fact, the very definition of entropy strongly suggests that S ≥ 0 always.
• Free expansion, or processes requiring heat exchange will increase entropy. We’ll see soon that
reversible changes in volume must be quasistatic with W = −P ΔV , and no heat flow.
• From a quantum point of view, these quasistatic processes slowly change the energy levels of
atoms in a gas, but they do not change the occupancy of any energy levels. This is what keeps
multiplicity constant.
• We saw earlier that heat ‘wants’ to flow precisely because heat flow increases entropy
(multiplicity). The only way for heat to flow without a change in entropy is if it flows
extremely slowly, so that we can take a limit where multiplicity almost isn’t changing.
To see what we mean by the last point, consider the entropy of two gases separated by a partition.
As a function of the internal energies, the total entropy is
S_{\rm tot} = \frac{3}{2}Nk\log\Big((U_{\rm tot} - \delta U)(U_{\rm tot} + \delta U)\Big) \qquad (3.5.9)
So we see that as a function of δU, the multiplicity does not change when δU ≪ U_tot. More precisely
\frac{\partial S_{\rm tot}}{\partial(\delta U)} = 0 \qquad (3.5.10)
at δU = 0. So if we keep δU infinitesimal, we can cause heat to move energy across the partition
without any change in entropy or multiplicity. In fact, this idea will turn out to be important...
The only important ideas here are that entropy/multiplicity tends to increase until we reach
equilibrium (so it stops increasing once we're in equilibrium), and that energy can be shared between
sub-systems, but total energy is conserved.
If we divide our total system into sub-systems A and B, then we have UA + UB = Utot is fixed,
and total entropy Stot = SA + SB is maximized. This means that
\left.\frac{\partial S_{\rm tot}}{\partial U_A}\right|_{\rm equilibrium} = 0 \qquad (4.1.1)
But
\frac{\partial S_{\rm tot}}{\partial U_A} = \frac{\partial S_A}{\partial U_A} + \frac{\partial S_B}{\partial U_A} = \frac{\partial S_A}{\partial U_A} - \frac{\partial S_B}{\partial U_B} \qquad (4.1.2)
where to get the second line we used energy conservation, dUA = −dUB . Thus we see that at
equilibrium
\frac{\partial S_A}{\partial U_A} = \frac{\partial S_B}{\partial U_B} \qquad (4.1.3)
So the derivative of the entropy with respect to energy is the thing that’s the same between
substances in thermal equilibrium.
If we look at units, we note that the energy is in the denominator, so it turns out that temperature
is really defined to be
\frac{1}{T} \equiv \left(\frac{\partial S}{\partial U}\right)_{N,V} \qquad (4.1.4)
where the subscript means that we’re holding other quantities like the number of particles and the
volume fixed.
Let’s check something basic – does heat flow from larger temperatures to smaller temperatures?
The infinitesimal change in the entropy is
\delta S = \frac{1}{T_A}\delta U_A + \frac{1}{T_B}\delta U_B = \left(\frac{1}{T_A} - \frac{1}{T_B}\right)\delta U_A \qquad (4.1.5)
So we see that to increase the total entropy (as required by the 2nd law of thermodynamics), if
TA < TB then we need to have δUA > 0 – that is, the energy of the low-temperature sub-system
increases, as expected.
What if we hadn’t known about entropy, but were just using multiplicity? Then we would have
found that
\frac{\partial \Omega_{\rm tot}}{\partial U_A} = \Omega_B\frac{\partial \Omega_A}{\partial U_A} + \Omega_A\frac{\partial \Omega_B}{\partial U_A} = \Omega_A\Omega_B\left(\frac{1}{\Omega_A}\frac{\partial\Omega_A}{\partial U_A} - \frac{1}{\Omega_B}\frac{\partial\Omega_B}{\partial U_B}\right) \qquad (4.1.6)
and we would have learned through this computation that entropy was the more natural quantity,
rather than multiplicity. I emphasize this to make it clear that entropy isn’t arbitrary, but is
actually a natural quantity in this context.
Note that we now have a crank to turn – whenever there’s a quantity X that’s conserved,
we can consider what happens when two systems are brought into contact, and find the equilibrium
configuration where S(XA , XB ) is maximized subject to the constraint that XA + XB is constant.
We can do this for X = Volume, Number of Particles, and other quantities.
Examples
For an Einstein Solid, we computed S(U ) a while ago and found
u U
S = N k log = N k log + Nk (4.1.7)
N
where is some energy unit in the oscillators. So the temperature is
U
T = (4.1.8)
Nk
This is what the equipartition theorem would predict, since a harmonic oscillator has a kinetic
energy and a potential.
More interestingly, for an monatomic ideal gas we found
S = Nk\log V + \frac{3}{2}Nk\log U + k\log f(N) \qquad (4.1.9)
Thus we find that
U = \frac{3}{2}NkT \qquad (4.1.10)
which once again agrees with equipartition.
1. Compute S in terms of U and other quantities.
We’ll learn a quite different and often more efficient route in a few lectures.
• If the volume is changing, but the process is quasistatic, then this rule also applies.
In some examples CV is approximately constant. For instance if we heat a cup of water from room
temperature to boiling, we have
\Delta S \approx \int_{293}^{373} \frac{840\ {\rm J/K}}{T}\, dT = (840\ {\rm J/K})\log\frac{373}{293} \approx 200\ {\rm J/K} \approx 1.5\times 10^{25}\ k \qquad (4.2.6)
In fundamental units this is an increase of 1.5 × 1025 , so that the multiplicity increases by a factor
of eΔS , which is a truly large number. This is about 2 per molecule.
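A quick sketch checking these numbers (assuming, as the 840 J/K suggests, a cup of roughly 200 g of water):

import math

k = 1.381e-23
C = 840.0                        # heat capacity of the cup of water, J/K
Ti, Tf = 293.0, 373.0

dS = C * math.log(Tf / Ti)                   # in J/K
N_molecules = (200.0 / 18.0) * 6.022e23      # ~200 g of water
print(dS, "J/K")                             # ≈ 200 J/K
print(dS / k)                                # ≈ 1.5e25 in fundamental units
print(dS / k / N_molecules)                  # ≈ 2 per molecule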
If we knew CV all the way down to zero, then we could compute
S_f - S(0) = \int_0^{T_f} \frac{C_V}{T}\, dT \qquad (4.2.7)
So what is S(0)? Intuitively, it should be S(0) = 0, so that Ω = 1, the unique lowest energy state.
Unless there isn’t a unique ground state.
In practice many systems have residual entropy at (effective) T ≈ 0, due to eg re-arrangements
of molecules in a crystal that cost very, very little energy. There can also be residual entropy from
isotope mixing. (Interestingly Helium actually re-arranges its isotopes at T = 0, as it remains a
liquid.)
Note that to avoid a divergence in our definitions, it would seem that
CV → 0 as T → 0 (4.2.8)
This, and the statement that S(0) = 0, are both sometimes called the third law of thermodynamics.
Apparently ideal gas formulas for heat capacities must be wrong as T → 0, as we have already
observed.
which means that N↑ = N/2 for U = 0. Where is multiplicity maximized?
We see that as a function of energy U , multiplicity is maximized at U = 0, and it goes down when
we increase or decrease U ! This is very different from most other systems, as typically multiplicity
increases with energy, but not here.
Let’s draw the plot of S(U ). In the minimum energy state the graph is very steep, so the system
wants to absorb energy, as is typical for systems at very low energy. But at U = 0 the slope goes to
zero, so in fact that system stops wanting to absorb more energy. And then for U > 0 the slope
becomes negative, so the paramagnet wants to spontaneously give up energy!
But remember that
T \equiv \left(\frac{\partial S}{\partial U}\right)^{-1} \qquad (4.3.6)
Thus when S′(U) = 0, we have T = ∞. And when S′(U) < 0, we have that T is negative.
Note that negative temperatures are larger than positive temperatures, insofar as systems at
negative temperature give up energy more freely than systems at any positive temperature. When
|T| is small and T < 0, the temperature is at its effective maximum.
It’s possible to actually do experiments on systems at negative temperature using nuclear
paramagnets by simply flipping the external magnetic field. Nuclear dipoles are useful because they
don’t equilibrate with the rest of the system, so they can be studied in isolation.
Negative temperature can only occur in equilibrium for systems with bounded total energy and
finite number of degrees of freedom.
Aside from simply demonstrating a new phenomenon, this example is useful for emphasizing
that entropy is more fundamental than temperature.
Explicit Formulae
It’s straightforward to apply Stirling’s approximation to get an analytic solution for large N .
The entropy is
This goes to zero very fast at both low and high temperature, and is maximized at T → ∞ (with
either sign).
All this work wasn't for nothing. For physical magnets at room temperature, kT ≈ 0.025 eV, while
if µ = µ_B ≈ 5.8 × 10^{-5} eV/T is the Bohr magneton (the value for an electron), then µB ∼ 10^{-5} eV
even for strong magnets. Thus µB/(kT) ≪ 1. In that limit
tanh x ≈ x (4.3.12)
equilibrium. If the partition were to move without this assumption, energies would not be fixed, and
things would be more complicated.
Since TA = TB and T × S has units of energy, we can identify the pressure by
P = T\left(\frac{\partial S}{\partial V}\right)_{U,N} \qquad (4.4.4)
For an ideal gas we found
S = Nk\log V + \cdots \qquad (4.4.5)
where the ellipsis denotes functions of U and N that are independent of V . So we find that
P = T\frac{\partial}{\partial V}\left(Nk\log V\right) = \frac{NkT}{V} \qquad (4.4.6)
So we have derived the ideal gas law! Or alternatively, we have verified our expression for the
pressure in a familiar example.
Thermodynamic Identity
We can summarize the relation between entropy, energy, and volume by a thermodynamic identity.
Let’s consider how S changes if we change U and V a little. This will be
dS = \left(\frac{\partial S}{\partial U}\right)_V dU + \left(\frac{\partial S}{\partial V}\right)_U dV \qquad (4.4.7)
where we fix volume when we vary U , and fix U when we vary the volume. We can recognize these
quantities and write
dS = \frac{1}{T}\,dU + \frac{P}{T}\,dV \qquad (4.4.8)
This is most often written as
dU = T dS − P dV (4.4.9)
This is true for any system where T and P make sense, and nothing else is changing (eg number of
particles). This formula reminds us how to compute everything via partial derivatives.
It also looks a lot like the first law,
dU = Q + W \qquad (4.4.10)
where we interpret Q ∼ T dS and W ∼ −P dV . Can we make this precise?
Not always. This is true if the volume changes quasistatically, if there’s no other work done, and
if no other relevant variables are changing. Then we know that W = −P dV , so that
Q = T dS (quasistatic) (4.4.11)
This means that ΔS = Q/T , even if there’s work being done. If the temperature changes, we can
still compute
(\Delta S)_P = \int \frac{C_P}{T}\, dT \qquad (4.4.12)
if the process is quasistatic.
Also, if Q = 0 and the process is quasistatic, then the entropy doesn't change – quasistatic adiabatic
processes are isentropic.
What if the compression isn't quasistatic? Then
W > -P\, dV \qquad (4.4.14)
because we hit the molecules harder than in a quasistatic process. However, we can choose to only
move the piston infinitesimally (jiggling it back and forth), so that the volume barely changes while
we still do work on the gas, and
dS = \frac{dU}{T} + \frac{P}{T}\,dV > \frac{Q}{T} \qquad (4.4.15)
This is really just a way of saying that we’re doing work on the gas while keeping V constant, and so
U , T , and S will increase. It looks like compression work but it’s ‘other’ work.
A more interesting example is free expansion of the gas into a vacuum. No work is done and
no heat flows, but S increases.
It’s easy to create more entropy, with or without work and heat. But we can’t ever decrease it.
This is the quantity that’s the same for two systems in equilibrium. It’s the potential for sharing
particles.
Since we added a minus sign, if two systems aren’t in equilibrium the one with smaller µ gains
particles. This is like temperature. Particles flow towards low chemical potential.
We can generalize the thermodynamic identity as
dS = \frac{1}{T}\,dU + \frac{P}{T}\,dV - \frac{\mu}{T}\,dN \qquad (4.5.3)
where the minus sign comes from the definition. Thus
dU = T dS − P dV + µdN (4.5.4)
dU = µdN (4.5.5)
This is how much energy changes when we add a particle and keep S, V fixed.
Usually to add particles without changing S, you need to remove energy. But if you need to give
the particle potential energy to add it, this contributes to µ.
Let’s compute µ for an ideal gas. We take
S = Nk\left[\log\left(V\left(\frac{4\pi m U}{3h^2}\right)^{3/2}\right) - \log N^{5/2}\right] + \frac{5}{2}Nk \qquad (4.5.7)
Thus we have
\mu = -T\left(\frac{\partial S}{\partial N}\right)_{U,V} = -kT\left[\log\left(V\left(\frac{4\pi m U}{3h^2}\right)^{3/2}\right) - \log N^{5/2}\right] - \frac{5}{2}kT + TNk\,\frac{5}{2N}
= -kT\log\left[\frac{V}{N}\left(\frac{4\pi m U}{3Nh^2}\right)^{3/2}\right] = -kT\log\left[\frac{V}{N}\left(\frac{2\pi m k T}{h^2}\right)^{3/2}\right] \qquad (4.5.8)
where in the last step we used U = \frac{3}{2}NkT.
For gas at room temperature and atmospheric pressure, V/N ≈ 4 × 10^{-26} m³, whereas the other
factor is, for helium, about 10^{-31} m³, and so the logarithm is 12.7 and µ = −0.32 eV for helium at
300 K and 10^5 N/m².
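A small numerical sketch reproducing these numbers for helium:

import math

k = 1.381e-23; h = 6.626e-34
T, P = 300.0, 1.0e5
m = 4.0 * 1.66e-27                # helium atom mass, kg

V_over_N = k * T / P                                        # ≈ 4e-26 m^3
quantum_volume = (h**2 / (2 * math.pi * m * k * T))**1.5    # ≈ 1e-31 m^3
log_term = math.log(V_over_N / quantum_volume)              # ≈ 12.7
mu = -k * T * log_term
print(V_over_N, quantum_volume, log_term)
print(mu / 1.602e-19, "eV")                                 # ≈ -0.32 eV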
Increasing the density of particles increases the chemical potential, as particles don’t ‘want’ to
be there.
So far we used only one particle type, but with many types we have a chemical potential for
each. If two systems are in diffusive equilibrium all the chemical potentials must be the same.
So we see that through the idea of maximizing entropy/multiplicity, we have been led to define
T , P , and µ, and the relation
dU = T dS − P dV + µdN (4.5.9)
Extensive quantities scale with the amount of stuff in the system:
V, N, S, U, H, mass \qquad (4.6.1)
Intensive quantities don't:
T, P, µ, density \qquad (4.6.2)
The product of an extensive and an intensive quantity is extensive,
for example volume × density = mass. Conversely, ratios of extensive quantities are
intensive.
Note that you should never add an intensive and an extensive quantity – this is often wrong by
dimensional analysis, but in cases where it’s not, it also doesn’t make sense conceptually.
So intensive/extensive is like a new kind of dimensional analysis that you can use to check what
you’re calculating.
Engines can’t convert heat into work perfectly. As we’ll see, the reason is that absorbing heat
increases the entropy of the engine, and if it's operating on a cycle, then it must somehow get rid
of that entropy before it can cycle again. This leads to inefficiency. Specifically, the engine must
dump waste heat into the environment, and so only some of the heat it initially absorbed can be
converted into work.
We can actually say a lot about this without knowing anything specific about how the engine
works. We can just represent it with a hot reservoir, the engine itself, and a cold reservoir.
Energy conservation says that
Qh = W + Qc (5.0.1)
Carnot Cycle
The arguments above also suggest how to build a maximally efficient engine – at least in theory.
Let’s see how it would work.
The basic point is that we don't want to produce excess entropy. This means that heat should only be
absorbed while the gas is at (essentially) T_h, and only expelled while it is at (essentially) T_c.
This basically fixes the behavior of the engine, as pointed out by Sadi Carnot in 1824.
To have heat transfers exactly at Th and Tc , we must expand or contract the gas isothermally.
To otherwise not produce any entropy, we should have adiabatic expansion and contraction for the
other parts of the cycle, ie with Q = 0. That’s how we get the gas from Th to Tc and back.
You should do problem 4.5 to check that this all works out, but based on our understanding of
energy conservation and the 2nd law of thermodynamics, it should actually be ‘obvious’ that this is
a maximally efficient engine.
Note that Carnot engines are extremely impractical, because they must operate extremely slowly
to avoid producing excess entropy.
It’s easier to draw the Carnot cycle in T and S space rather than P and V space.
5.1 Refrigerators
Theoretically, a refrigerator is just a heat engine run in reverse.
We define the coefficient of performance of a refrigerator as
\mathrm{COP} = \frac{Q_c}{W} = \frac{Q_c}{Q_h - Q_c} = \frac{1}{\frac{Q_h}{Q_c} - 1} \qquad (5.1.1)
We cannot just use the inequality from our discussion of engines, because heat flows in the opposite
direction.
Instead, the entropy dumped into the hot reservoir must be at least as large as that drawn from
the cold reservoir, which means that
\frac{Q_h}{T_h} \geq \frac{Q_c}{T_c} \qquad (5.1.2)
so that
\frac{Q_h}{Q_c} \geq \frac{T_h}{T_c} \qquad (5.1.3)
This then tells us that
\mathrm{COP} \leq \frac{1}{\frac{T_h}{T_c} - 1} = \frac{T_c}{T_h - T_c} \qquad (5.1.4)
So in theory we can refrigerate very, very well when T_c ≈ T_h. This should be intuitive, because in
this limit we need not transfer almost any entropy while transferring heat.
Conversely, if T_c ≪ T_h then the performance is very poor. It takes an immense amount of work
to cool something towards absolute zero.
It should also be clear from the way we derived it that the maximally efficient refrigerator would
be a Carnot cycle run in reverse.
Brief Timeline
Carnot came up with ingenious arguments that engines couldn't be more efficient than the Carnot
cycle, and that they must always produce waste heat. But he didn’t distinguish very clearly between
entropy and heat. Clausius defined entropy (and coined the term) clearly as Q/T , but he didn’t
know what it really was. Boltzmann mostly figured this out by 1877.
This leads to
e = 1 - \frac{rT_3 - rT_2}{T_3 - T_2} = 1 - r \qquad (5.2.6)
where r = \left(\frac{V_2}{V_1}\right)^{\gamma - 1}. So in other words, the overall efficiency is
e = 1 - \left(\frac{V_2}{V_1}\right)^{\gamma - 1} \qquad (5.2.7)
For air γ = 7/5 and we might have a compression ratio V_1/V_2 = 8, so that the efficiency is around 56%.
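A one-line numerical check of this number:

gamma, compression_ratio = 7 / 5, 8
e_otto = 1 - (1 / compression_ratio)**(gamma - 1)
print(e_otto)   # ≈ 0.565, ie about 56%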
Recall that TV^{γ−1} is fixed on an adiabat, so to compare this efficiency with Carnot we can solve
for V ratios in terms of T ratios, and we find
e = 1 - \frac{T_1}{T_2} = 1 - \frac{T_4}{T_3} \qquad (5.2.8)
which doesn't involve the extreme temperatures, so it is less efficient than Carnot. In practice real
gasoline engines are 20-30% efficient.
You can get greater efficiency with higher compression ratio, but the fuel will pre-ignite once
it gets too hot. We can take advantage of this using diesel engines, which simply use heat from
compression to ignite the mixture. They spray the fuel/air mixture as the piston starts to move
outward, meaning that (to some approximation) they have a constant volume and a constant pressure
piece in their cycles, which means computing their efficiency is more complicated. They ultimately
have higher efficiency (around 40%) using very high compression ratios, which are limited by not
melting the engine.
Steam Engines
See the tables and diagrams in the book.
These use a Rankine cycle, and go outside the ideal gas regime, as the steam condenses!
Here water gets pumped in, heats into steam and pushes a turbine (as it cools and decreases in
pressure), then is condensed back into water.
We can’t compute the efficiency from first principles. But under constant pressure conditions,
the heat absorbed is equal to the change in enthalpy H. So we can write the efficiency as
e = 1 - \frac{Q_c}{Q_h} = 1 - \frac{H_4 - H_1}{H_3 - H_2} \approx 1 - \frac{H_4 - H_1}{H_3 - H_1} \qquad (5.2.9)
where the last approximation is good because the pump adds very little energy to the water, and
the P V term is small for liquids.
There are tables called ‘steam tables’ that simply list the enthalpies and entropies of steam, water,
and water-steam mixtures in various conditions. We need the entropies because for the steam+water
mixture, we only get H by using the fact that 3 → 4 is an adiabat, so S3 = S4 , allowing us to use
the S values of the tables to get H.
5.3 Real Refrigerators
Real refrigerators follow a cycle much like the inverse of the Rankine cycle. The working substance
changes back and forth between a liquid and a gas. That liquid-gas transition (boiling point) must
be much lower in a refrigerator though, because we want to get to lower temperatures.
A variety of fluids have been used, including carbon dioxide, ammonia (dangerous), Freon, and
now HFC-134a.
In the PV diagram, starting from point 1 the gas is compressed adiabatically, raising P and T. Then
from 2 to 3 it gives up heat and liquefies in the condenser. Then from 3 to 4 it passes through
a throttling valve – a narrow opening – emerging on the other side at much lower pressure and
temperature. Finally it absorbs heat and turns back into a gas in the evaporator.
We can write the COP using enthalpies
\mathrm{COP} = \frac{Q_c}{Q_h - Q_c} = \frac{H_1 - H_4}{H_2 - H_3 - H_1 + H_4} \qquad (5.3.1)
Enthalpies at points 1, 2, and 3 can be looked up; for point 2 we can assume S is constant during compression.
We can understand point 4 by thinking more about throttling.
Throttling
Throttling, or the Joule-Thomson process involves pushing a fluid through a tiny hole or plug from
a region with pressure Pi to one with pressure Pf . Initial and final volumes are Vi and Vf . There is
no heat flow, so
U_f - U_i = Q + W = 0 + W_{\rm left} + W_{\rm right} \qquad (5.3.2)
which gives
Uf − Ui = Pi Vi − Pf Vf (5.3.3)
or
Uf + Pf Vf = Ui + Pi Vi (5.3.4)
or just conservation of enthalpy.
The purpose is to cool the fluid below the cold reservoir temperature, so it can absorb heat. This
wouldn't work if the fluid were an ideal gas, since H = \frac{f+2}{2}NkT depends only on the temperature.
But in a dense gas or liquid, the energy contains a potential term which decreases (due to
attractive forces) when the molecules get close together. So
H = Upot + Ukin + P V (5.3.5)
and so as the gas molecules get further apart, Upot increases, and the kinetic energy and temperature
tend to decrease.
If we use H3 = H4 in refrigeration, then we have
\mathrm{COP} = \frac{H_1 - H_3}{H_2 - H_1} \qquad (5.3.6)
and so we just need to look up the enthalpies.
Cooling MORE
See the book for some fun reading.
• You can make dry ice by liquefying CO2 at higher pressure; then when you throttle it you get a
dry ice frost residue.
• Making liquid Nitrogen is more difficult. You can just throttle for a while, but you need to
start at higher pressure.
• Then you need to do it more efficiently, so you can use a heat exchanger to cool one gas as
another is throttled.
However, throttling isn’t good enough for hydrogen or helium, as these gases become hotter
when throttled. This is because attractive interactions between gas molecules are very weak, so
interactions are dominated by positive potential energies (repulsion). Hydrogen and Helium were
first liquefied in 1898 and 1908 respectively.
One can use further evaporative cooling to get from 4.2 K (boiling point of helium at 1 atm) to
about 1 K. To go even further need
• A helium dilution refrigerator, where 3 He evaporates into a liquid bath with 4 He. We use the
3
He to take heat from the 4 He. This can get us from 1K to a few milliK.
6 Thermodynamic Potentials
Forces push us towards minima of energies. Probability pushes us towards maximization of entropy.
Are there thermodynamic potentials mixing U and S that combine to tell us how a system will
tend to change?
And a related question – we have seen that enthalpy H is useful when we are in an environment
with constant pressure. What if instead we’re connected to a reservoir at constant T ? Perhaps
there’s a natural way to account for heat transfer to and from this reservoir, just as H automatically
accounts for work due to the ambient pressure?
Let’s assume the environment acts as a reservoir of energy, so that it can absorb or release
unlimited energy without any change in its temperature. If S_R is the reservoir entropy, then at fixed T, V, N the total entropy change is
dS_{\rm total} = dS + dS_R = dS - \frac{dU}{T} = -\frac{1}{T}\,d(U - TS)
so if we define
F = U - TS \qquad (6.0.5)
the second law dS_total ≥ 0 becomes
dF \leq 0 \qquad (6.0.6)
which means that F can only decrease. So it's a kind of 'potential' such that we can only move to
lower and lower values of F at fixed T, V, N .
And now we can just ignore the reservoir and account for the 2nd law using F . The system will
just try to minimize F .
We can play the same game letting the volume of the system change, and working at constant
pressure, so that
dS_{\rm total} = dS - \frac{1}{T}\,dU - \frac{P}{T}\,dV = -\frac{1}{T}\left(dU - T\,dS + P\,dV\right) = -\frac{1}{T}\,dG \qquad (6.0.7)
T T T T
where
G = U − TS + PV (6.0.8)
and we work at constant T and constant P. Thus in this case it's the Gibbs free energy that must
decrease.
We can summarize this as
• At constant U and V, S increases.
• At constant T and V, F decreases.
• At constant T and P, G decreases.
We can also understand these potentials more intuitively. In
F \equiv U - TS \qquad (6.0.9)
S 'wants' to increase, and (one may think that) U 'wants' to decrease. The latter is only because it
can give energy to the environment, and thereby increase the total entropy of the universe.
Wait what? Here what’s happening is that if U decreases without S increasing, then this means
that the entropy of the system isn’t changing – but the entropy of the environment will definitely
change, by −dU/T. So the fact that objects 'want' to roll down hills (and then not roll back up again!)
ultimately comes from the 2nd law of thermodynamics.
Note also that the desire to give away energy isn’t very great if T is large, since then the system
may not want to do this and risk ‘losing valuable entropy’. This is one way of seeing that at high
temperature, our system will have a lot of energy.
Reasoning for the Gibbs free energy
G ≡ U + PV − TS (6.0.10)
is similar, except that the environment can also ‘take’ volume from the system.
There is another way to think about F. If we create the system from nothing in an environment at constant temperature T, then looking at
F = U - TS \qquad (6.0.11)
we see that we can get the heat Q = TΔS = TS from the environment for free when we make the
system, so the work we need to provide is only F.
Conversely, F is the amount of energy that comes out as work if we annihilate the system,
because we have to dump some heat Q = TS into the environment just to get rid of the system's
entropy. So the available or ‘free’ energy is F .
Above we meant all work, but if we’re in a constant pressure environment then we get some work
for free, and so in that context we can (combining H and F , in some sense) define
G = U − TS + PV (6.0.12)
which is the system's energy, minus the heat term, plus the work the constant-P atmosphere will
automatically do.
We call U, H, F, G thermodynamic potentials, which differ by adding P V or −T S. They are all
useful for considering changes in the system, and changes in the relevant potential. Note that
ΔF = ΔU − T ΔS = Q + W − T ΔS (6.0.13)
If no new entropy is created then Q = T ΔS, otherwise Q < T ΔS, and so we have
ΔF ≤ W (6.0.14)
at constant T . This includes all work done on the system. Similarly, we find that
ΔG ≤ Wother (6.0.15)
dU = T dS − P dV + µdN (6.0.16)
In this sense we are automatically thinking about U as a function of S, V, N, since it's these quantities
that we’re directly varying.
We can use this plus the definitions of H, F, G to write down a ton of other thermodynamic
identities. The math that we use is the ‘Legendre transform’. Given a convex function f (y), it
makes it possible to effectively ‘switch variables’ and treat f 0 (y) = x as the independent variable.
See 0806.1147 for a nice discussion.
Say we have a function so that its differential is
df = x\,dy \qquad (6.0.17)
If we define
g = xy - f \qquad (6.0.18)
then we have
dg = y\,dx + x\,dy - df = y\,dx \qquad (6.0.19)
So now we can instead use our new function g(x) in place of f(y).
For example, perhaps instead of using the entropy S(E) as a function of E, we would instead
like to make ∂S/∂E = 1/T, ie the temperature, the independent variable. That way we can use T as our
'knob' instead of E, perhaps because we can control T but not E.
Although it may seem a bit mysterious – and the number of possibilities is rather mystifying –
we can apply these ideas very directly. For example since
H = U + PV (6.0.20)
we have
dH = dU + P dV + V dP (6.0.21)
dH = T dS − P dV + µdN + P dV + V dP
= T dS + µdN + V dP (6.0.22)
so we can think of the enthalpy as a function of S, N, P . We traded V for P and switched from
energy U to enthalpy H.
We can really go to town and do this for all the thermodynamic potentials. For example, for F
we find
dF = -S\,dT - P\,dV + \mu\,dN \qquad (6.0.23)
and this tells us how F changes as we change T, V, N, and also that F is naturally dependent on
those variables. We also have
G = Nµ (6.0.26)
Our argument was subtle – why doesn’t it apply to
\mu = \left(\frac{\partial F}{\partial N}\right)_{T,V} \qquad (6.0.27)
The reason is that with fixed V , as you change N the system becomes more and more dense. So an
intensive quantity is changing as you change N , namely the density N/V . It was crucial in the case
of G that all fixed quantities were intensive.
With more types of particles, we just have G = \sum_i N_i \mu_i. Note though that the µ_i for a mixture are
not equal to the µ_i for a pure substance.
We can use this to get a formula for µ for an ideal gas. Using the fact that
V = \left(\frac{\partial G}{\partial P}\right)_{T,N} \qquad (6.0.28)
we see that
\frac{\partial \mu}{\partial P} = \frac{1}{N}\frac{\partial G}{\partial P} = \frac{V}{N} \qquad (6.0.29)
But by the ideal gas law this is kT /P . So we can integrate to get
\mu(T, P) - \mu(T, P_0) = kT\log\frac{P}{P_0} \qquad (6.0.30)
for any reference P0 , usually atmospheric pressure.
This formula also applies to each species independently in a mixture, if P is the partial pressure
of that species. This works because ideal gases are non-interacting – that’s essentially what makes
them ideal.
So the electrical work we need to do is 237 kJ. This is the change ΔG in the Gibbs free energy:
ΔG = ΔH − T ΔS (6.0.35)
Standard tables include this information. If you perform the reaction in reverse, you can get ΔG of
energy out.
The same reasoning applies to batteries. For example lead-acid cells in car batteries run the
reaction
{\rm Pb} + {\rm PbO_2} + 4{\rm H^+} + 2{\rm SO_4^{2-}} \to 2{\rm PbSO_4} + 2{\rm H_2O} \qquad (6.0.36)
Tables say ΔG = −390 kJ/mol in standard conditions, so per mole of metallic lead we get 390 kJ
from the battery.
Note that ΔH = −312 kJ/mol for this reaction, so we actually get extra energy from heat
absorbed from the environment! When we charge the battery and run the reaction in reverse, we
have to put the extra 78 kJ of heat back into the environment.
You can also compute the battery voltage if you know how many electrons get pushed around
per reaction. For this reaction 2 electrons get pushed through the circuit, so the electrical work per
electron is
\frac{390\ {\rm kJ}}{2\times N_A} = 3.24\times 10^{-19}\ {\rm J} = 2.02\ {\rm eV} \qquad (6.0.37)
A volt is the voltage needed to give an electron 1 eV, so the voltage is 2.02 V. Car batteries have six
cells to get to 12 V.
We know that for an isolated system, all accessible microstates are equally probable. Our atom
isn’t isolated, but the atom + reservoir system is isolated. We expect that we’re equally likely to
find this combined system in any microstate.
The reservoir will have ΩR (s1 ) available microstates when the atom is in s1 , and ΩR (s2 ) microstates
when the atom is in s2 . These will be different because the reservoir has more or less energy, and
thus it has access to more or less states. Since all states in total are equally likely, the ratio of
probabilities must be
P(s_2)/P(s_1) = Ω_R(s_2)/Ω_R(s_1)      (7.1.1)
So we just need to write the right-side in a more convenient form that doesn’t make reference to the
reservoir.
Note that something a bit surprising happened – because multiplicity depends so strongly on
states and energies, changing our single atom’s state just a bit actually had a significant effect on
the multiplicity of a giant reservoir of energy. This contrasts with most questions in physics, where
the state of a single atom has almost no relevance to the behavior of a large reservoir.
But that’s easy by noting that
e−E(s)/kT (7.1.6)
as the proportionality factor for any given state s with energy E(s).
Note that probabilities aren’t equal to this factor because they also have to add up to 1! And so
we need to divide by the sum of all Boltzmann factors, including all states. We call that sum over
these factors
Z = Σ_s e^{−E(s)/kT}      (7.1.7)
with δE = 10.2 eV, and we have kT = 0.5 eV (note that an eV is about 10,000 K). So the ratio of
probabilities is about e^{−20.4} ≈ 1.4 × 10^{−9}. Since there are four excited states, the portion in the first
excited level is about 5 × 10^{−9}.
Photons passing through the atmosphere of the sun can be absorbed if they induce a transition between
atomic states. A hydrogen atom in the first excited state can transition up in energy to give the Balmer
series, which gives missing wavelengths in sunlight. Some other lines are also missing due to other
kinds of atoms, but those transition from their ground state. So there must be way more hydrogen
than other atoms in the sun's atmosphere!
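A minimal numerical version of this estimate (using δE = 10.2 eV and kT = 0.5 eV from above, with the factor of 4 for the four states of the first excited level):

import math

dE = 10.2       # energy gap to the first excited level of hydrogen, eV
kT = 0.5        # thermal energy at the sun's surface, eV
degeneracy = 4  # number of states in the first excited level

ratio_per_state = math.exp(-dE / kT)          # ~1.4e-9
fraction_excited = degeneracy * ratio_per_state
print(ratio_per_state, fraction_excited)      # ~1.4e-9 and ~5e-9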
P(s) = e^{−βE(s)} / Z      (7.2.1)
Given the probability to be in any given state, we can easily (in principle) compute the average or
expectation value for any quantity f (s). Formally this means that
f̄ = Σ_s P(s) f(s)
  = (1/Z) Σ_s e^{−βE(s)} f(s)
  = [Σ_s e^{−βE(s)} f(s)] / [Σ_s e^{−βE(s)}]      (7.2.2)
For example, the simplest thing we might want to know about is the average energy itself, ie
f (s) = E(s). Thus we have that
Ē = [Σ_s e^{−βE(s)} E(s)] / [Σ_s e^{−βE(s)}]      (7.2.3)
U = N Ē (7.2.5)
This means that in many cases, working atom-by-atom is no different than studying the whole
system (if we only care about averages).
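As a concrete illustration (not from the text), here is a small Python sketch that computes Z and Ē directly from a list of single-particle energy levels; the three-level spectrum used is just an arbitrary example:

import math

def thermal_average(energies, kT, f):
    """Average of f(E) over the Boltzmann distribution for the given levels."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)
    return sum(w * f(E) for w, E in zip(weights, energies)) / Z

levels = [0.0, 1.0, 2.5]   # example single-particle energies, arbitrary units
kT = 1.0
E_bar = thermal_average(levels, kT, lambda E: E)
print(E_bar)               # average energy per 'atom'; U = N * E_bar for N independent atoms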
7.2.1 Paramagnets
For paramagnets, there's just an up and a down dipole moment, with energies ∓µB, so
Z = e^{βµB} + e^{−βµB} = 2 cosh(βµB)
and so we can easily compute the up and down probabilities. The average energy is
Ē = −(1/Z) ∂Z/∂β = −µB tanh(βµB)      (7.2.7)
which is what we found before. We also get the magnetization right.
7.2.2 Rotation
A more interesting example is the rotational motion of a diatomic molecule.
In QM, rotations are quantized with total angular momentum jℏ. The energy levels are
E(j) = j(j + 1) ε
for some fixed energy ε inversely proportional to the molecule's moment of inertia. The number of
states at angular momentum j is 2j + 1. (Here we are assuming the two ends of the molecule
are non-identical.)
Given this structure, we can immediately write the partition function
Z = Σ_{j=0}^∞ (2j + 1) e^{−E(j)/kT}
  = Σ_{j=0}^∞ (2j + 1) e^{−j(j+1)ε/kT}      (7.2.9)
Unfortunately we can’t do the sum in closed form, but we can approximate it.
Note that for a CO molecule, ε = 0.00024 eV, so that ε/k = 2.8 K. Usually we're interested in
much larger temperatures, so that ε/kT ≪ 1. This is the classical limit, where the quantum spacing
doesn't matter much. Then we have
Z ≈ ∫_0^∞ (2j + 1) e^{−j(j+1)ε/kT} dj = kT/ε      (7.2.10)
when kT ≫ ε. As one might expect, the partition function increases with temperature. Now we find
Ē = −∂ log Z/∂β = kT      (7.2.11)
Differentiating with respect to T gives the contribution to the heat capacity per molecule
C_V ⊃ ∂Ē/∂T = k      (7.2.12)
as expected for 2 rotational DoF. Note that this only holds for kT ≫ ε; actually the heat capacity
goes to zero at small T.
If the molecules are made of indistinguishable atoms like H_2 or O_2, then turning the molecule around gives
back the same configuration, and there are half as many distinct rotational states. Thus we have
Z = kT/(2ε)      (7.2.13)
The extra 1/2 cancels out when we compute the average energy, so it has no effect on the heat capacity.
At low temperatures we need to more carefully consider QM to get the right answer.
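A quick numerical check of the classical approximation (the value ε = 0.00024 eV for CO is taken from above; the temperatures are illustrative):

import math

eps = 0.00024          # rotational constant for CO, in eV (from the text)
k = 8.617e-5           # Boltzmann constant in eV/K

def Z_rot(T, jmax=2000):
    """Direct sum of the rotational partition function for a molecule with non-identical ends."""
    kT = k * T
    return sum((2*j + 1) * math.exp(-j*(j + 1) * eps / kT) for j in range(jmax))

for T in [3.0, 30.0, 300.0]:
    print(T, Z_rot(T), k*T/eps)   # exact sum vs the classical estimate kT/eps; they agree once T >> 2.8 K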
7.2.3 Very Quantum System
What if we have a very quantum system, so that energy level splittings are much larger than kT ?
Then we have
Z ≈ 1 + e−βE1 (7.2.14)
where E_1 is the lowest excited state, and we are assuming E_n > E_1 ≫ kT. Note that the expectation
of the energy is
Ē ≈ E_1 e^{−E_1/kT}
where we have set the energy of the ground state to 0 as our normalization.
We also have that
∂Ē/∂T = (E_1^2/kT^2) e^{−E_1/kT}      (7.2.16)
It may appear that as T → 0 this blows up, but actually it goes to zero very rapidly, because
exponentials dominate power-laws. For comparison, note that x^n e^{−x} → 0 as x → ∞; this is just the
x → 1/x version of that statement.
7.3 Equipartition
We have discussed the equipartition theorem... now let’s derive it.
Recall that it doesn’t apply to all systems, but just to quadratic degrees of freedom with energies
like
E = kq^2      (7.3.1)
where q = x as in a harmonic oscillator, or q = p as in the kinetic energy term. We’ll just treat q as
the thing that parameterizes states, so that different q corresponds to a different independent state.
This leads to
Z = Σ_q e^{−βE(q)} = Σ_q e^{−βkq^2}
  ≈ ∫_{−∞}^{∞} e^{−βkq^2} dq      (7.3.2)
Really the latter should be the definition in the classical limit, but we might also imagine starting
with a QM description with discrete q and smearing it out. Anyway, we can evaluate the integral by
a change of variables to x = √(βk) q, so that
Z ≈ (1/√(βk)) ∫_{−∞}^{∞} e^{−x^2} dx      (7.3.3)
This is just the statement that we can take the classical limit, because energy level spacings are
small.
where this is the nth moment. The v_rms is the square root of the 2nd moment.
The Boltzmann factor for a gas molecule is the simple
e^{−m v⃗^2/(2kT)}      (7.4.3)
using the usual formula for kinetic energy. But if we want to compute the probability distribution
for speed P(s), we need to account for the dimensionality of space. This means that we have
P(s) ∝ ∫_{|v⃗|=s} d^2 v⃗ e^{−m v⃗^2/(2kT)} = 4πs^2 e^{−m s^2/(2kT)}      (7.4.4)
We used proportional to because we need to normalize. For that we need to compute the constant
N such that
N = ∫_0^∞ ds 4πs^2 e^{−m s^2/(2kT)}
  = 4π (2kT/m)^{3/2} ∫_0^∞ x^2 e^{−x^2} dx      (7.4.5)
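A quick numerical sanity check of this normalization and the resulting moments (a sketch for nitrogen at room temperature; the gas and temperature are my illustrative choices):

import math

m = 28 * 1.66e-27     # mass of an N2 molecule, kg
k_B = 1.381e-23       # J/K
T = 300.0             # K

# crude numerical integration of the (unnormalized) speed distribution 4*pi*s^2*exp(-m s^2/2kT)
ds = 1.0
speeds = [i * ds for i in range(1, 4000)]
weights = [4 * math.pi * s**2 * math.exp(-m * s**2 / (2 * k_B * T)) for s in speeds]
norm = sum(weights) * ds

v2_avg = sum(w * s**2 for w, s in zip(weights, speeds)) * ds / norm
print(math.sqrt(v2_avg), math.sqrt(3 * k_B * T / m))   # v_rms from the distribution vs sqrt(3kT/m), ~517 m/s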
For a system at temperature T (in contact with a reservoir), the partition function Z(T ) is
the fundamental quantity. We know that Z(T ) is proportional to the number of accessible states.
Since Z(T ) should (thus) tend to increase, its logarithm will tend to increase too. This suggests a
relationship with −F, where F is the Helmholtz free energy. In fact
F = −kT log Z      (7.5.1)
One could even take this as the definition of F. Instead, we'll derive it from our other definition
F = U − TS      (7.5.2)
This is a differential equation for F . Let’s show that −kT log Z satisfies this differential equation,
and with the same initial condition.
Letting F̃ = −kT log Z, we have that
∂F̃/∂T = −k log Z − kT ∂ log Z/∂T      (7.5.4)
but
∂ log Z/∂T = U/(kT^2)      (7.5.5)
This then tells us that F̃ obeys the correct differential equation.
They also agree at T = 0, since both F and F̃ reduce to the ground state energy there, as the system will just be in its ground
state. So they are the same thing.
One reason why this formula is so useful is that
S = −∂F/∂T,   P = −∂F/∂V,   µ = ∂F/∂N      (7.5.6)
where we fix the T, V, N we aren’t differentiating with respect to. So from Z we can get all of these
quantities.
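To make this concrete, here is a small Python sketch (not from the text) that builds Z for an arbitrary two-level system and extracts F, U, and S numerically, checking that F = U − TS:

import math

E1 = 1.0    # excited state energy, with the ground state at 0; units where k = 1
def Z(T):   return 1.0 + math.exp(-E1 / T)
def F(T):   return -T * math.log(Z(T))
def U(T):   return E1 * math.exp(-E1 / T) / Z(T)

T = 0.7
dT = 1e-6
S = -(F(T + dT) - F(T - dT)) / (2 * dT)   # S = -dF/dT by numerical derivative
print(F(T), U(T) - T * S)                 # the two agree: F = U - TS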
From this perspective, the statement that
F = −kT log Z (7.6.2)
is equivalent to the statement that we can just view
Z = ∫_0^∞ dE e^{S(E)−E/kT}
  ≈ e^{S(U)−U/kT}      (7.6.3)
That is, somehow we have simply forgotten about the integral! What’s going on? Have we made a
mistake?
The reason why this is alright is that the function
e^{S(E)−E/kT}      (7.6.4)
will be very, very sharply peaked at the energy E = Ē = U. It's so sharply peaked that its integral is
simply dominated by the value at the peak. This 'sharply peaked' behavior is exactly what we
spent the first couple of weeks of class going on and on about. So
F = −kT log Z = U − T S (7.6.5)
is actually true because we are taking averages in the thermodynamic limit.
This provides a new perspective on thermodynamic potentials. That is, if we work in the
microcanonical ensemble, that is the ensemble of states with fixed energy U , then its natural to
study the multiplicity
Ω(U ) = eS(U ) (7.6.6)
and expect that systems in equilibrium will maximize this quantity. But we can introduce a
temperature by ‘averaging over states’ weighted by a Boltzmann factor. This puts us in the
canonical ensemble, the ensemble of states of all energies with fixed average energy. And it’s
simply given by taking the Laplace transform with respect to energy
Z(β) = ∫_0^∞ dE e^{−βE} e^{S(E)}      (7.6.7)
where β = 1/kT as usual. Notice that we have traded a function of energy U for a function of
temperature T , just as in the Legendre transform discussion. This isn’t a coincidence!
The statement that this is dominated by its maximum also implies that the canonical and
microcanonical ensembles are the same thing in the thermodynamic limit. Further, it tells us
that since
−kT log Z = U − T S(U ) (7.6.8)
we learn that the Legendre transform is just the thermodynamic limit of the Laplace
transform. This is a useful way to think about Legendre transforms in thermodynamics.
We can obtain all the other thermodynamic potentials, and all of the other Legendre transforms,
in exactly this way.
8 Entropy and Information
8.1 Entropy in Information Theory
We have been thinking of entropy as simply
S = log Ω (8.1.1)
But for a variety of reasons, it’s more useful to think of entropy as a function of the probability
distribution rather than as depending on a number of states.
If we have Ω states available, and we believe the probability of being in any of those states is
uniform, then this probability will be
p_i = 1/Ω      (8.1.2)
for any of the states i = 1, · · · , Ω. So we see that
S = − log p (8.1.3)
for a uniform distribution with all pi = p = 1/Ω. But notice that this can also be viewed as
S ≡ −Σ_i p_i log p_i      (8.1.4)
since all pi are equal, and their sum is 1 (since probabilities are normalized). We have not yet
demonstrated it for distributions that are not uniform, but this will turn out to be the most useful
general notion of entropy. It can also be applied for continuous probability distributions, where
S ≡ −∫ dx p(x) log p(x)      (8.1.5)
where we see that log Ω ≈ −N Σ_i p_i log p_i = N S for a long sequence of N independent draws.
So the entropy S that we’ve studied in statistical mechanics, ie the log of the number of possible
sequences, naturally turns into the definition in terms of probability distributions discussed above.
This has to do with information theory because N S quantifies the amount of information
we actually gain by observing the sequence. Notice that as Warren Weaver wrote in an initial
popularization of Shannon’s ideas:
• “The word information in communication theory is not related to what you do say, but to
what you could say. That is, information is a measure of one’s freedom of choice when one
selects a message.”
2. If all pi are equal, so that all pi = 1/n, then H should increase with n. That is, having more
equally likely options increases the amount of ‘possibility’ or ‘choice’.
3. If the probabilities can be broken down into a series of events, then H must be a weighted
sum of the individual values of H. For example, the probabilities for events A, B, C
{A, B, C} = {1/2, 1/3, 1/6}      (8.2.1)
can be rewritten as the process
{A, B or C} = {1/2, 1/2},   then   {B, C} = {2/3, 1/3}      (8.2.2)
We are requiring that such a situation follows the rule
H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3)      (8.2.3)
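A quick numerical check of this grouping rule (a sketch using natural logs; any base works as long as it's used consistently):

import math

def H(*probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + 0.5 * H(2/3, 1/3)
print(lhs, rhs)   # the two agree, as required by property 3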
We will show that up to a positive constant factor, the entropy S is the only quantity with these
properties.
Before we try to give a formal argument, let’s see why the logarithm shows up. Consider
H(1/4, 1/4, 1/4, 1/4) = H(1/2, 1/2) + 2 × (1/2) H(1/2, 1/2)
                      = 2 H(1/2, 1/2)      (8.2.4)
Similarly
H(1/8, · · · , 1/8) = 3 H(1/2, 1/2)      (8.2.5)
and so this is the origin of the logarithm. To really prove it, we will take two fractions, and note
that 1/s^n ≈ 1/t^m for sufficiently large n and m. The only other point we then need is that we can
approximate any other numbers using long, large trees.
Here is the formal proof. Let
A(n) ≡ H(1/n, · · · , 1/n)      (8.2.6)
for n equally likely possibilities. By using an exponential tree we can decompose a choice of s^m
equally likely possibilities into a series of m choices among s possibilities. So A(s^m) = m A(s), and similarly A(t^n) = n A(t).
By taking arbitrarily large n we can find an m with s^m ≤ t^n < s^{m+1}, and so by taking the
logarithm and re-arranging we can write
m/n ≤ log t / log s ≤ m/n + 1/n      (8.2.9)
which means that we can make
|m/n − log t / log s| < ε      (8.2.10)
So we also find that
|m/n − A(t)/A(s)| < ε      (8.2.12)
and so we conclude via these relations that
A(t) = K log t (8.2.13)
for some positive constant K, since we can make ε arbitrarily small.
But now by continuity we are essentially done, because we can approximate any set of probabilities
pi arbitrarily well by using a very fine tree of equal probabilities.
Notice that the logarithm was picked out by our assumption 3 about the way that H applies to
the tree decomposition of a sequence of events.
8.4 A More Sophisticated Derivation of Boltzmann Factors
Boltzmann factors can be derived in another simple and principled way – they are the probabilities
that maximize the entropy given the constraint that the average energy is held fixed.
We can formalize this with some Lagrange multipliers, as follows.
We want to fix the expectation value of the energy and the total probability while maximizing
the entropy. We can write this maximization problem using a function (Lagrangian)
L = −Σ_i p_i log p_i + β (⟨E⟩ − Σ_i p_i E_i) + ν (Σ_i p_i − 1)      (8.4.1)
where we are maximizing/extremizing L with respect to the pi and β, ν; the latter are Lagrange
multipliers.
Varying gives the two constraints along with
p_i = e^{ν−1−βE_i}      (8.4.3)
Now ν just sets the total sum of the pi while β is determined by the average energy itself. So we have
re-derived Boltzmann factors in a different way. This derivation also makes it clear what abstract
assumptions are important in arriving at e−βE .
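As an illustration of this logic (a sketch, not from the text), the snippet below fixes a target average energy for a small example spectrum and solves for the β that reproduces it; ν is fixed implicitly by normalization:

import math

energies = [0.0, 1.0, 2.0, 3.0]   # example spectrum, arbitrary units with k = 1
E_target = 1.2                     # the constrained average energy

def avg_E(beta):
    w = [math.exp(-beta * E) for E in energies]
    return sum(wi * E for wi, E in zip(w, energies)) / sum(w)

# bisection for beta: avg_E decreases monotonically as beta increases
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if avg_E(mid) > E_target:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(beta, avg_E(beta))   # the maximum-entropy distribution is p_i proportional to exp(-beta * E_i)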
58 + 23 = 81 (8.5.1)
We started out with information representing both 58 and 23. Typically this would be stored as an
integer, and for example a 16 bit integer has information, or entropy, 16 log 2. But at the end of the
computation, we don’t remember what we started with, rather we just know the answer. Thus we
have created an entropy
Since our computer will certainly be working at finite temperature, eg room temperature, we
will be forced by the laws of thermodynamics to create heat
Clearly this isn’t very significant for one addition, but it’s interesting as its the fundamental limit.
Furthermore, computers today are very powerful. For example, it has been estimated that while
training AlphaGo Zero, roughly
were performed. Depending on whether they were using 8, 16, or 32 bit floating point numbers (let’s
assume the last), this meant that erasure accounted for
of heat. That’s actually a macroscopic quantity, and the laws of thermodynamics say that it’s
impossible to do better with irreversible computation!
But note that this isn’t most of the heat. For example, a currently state of the art GPU like the
Nvidia Tesla V100 draw about 250 Watts of power and perform at max about 1014 flop/s. This
means their theoretical minimum power draw is
Thus state of the art GPUs are still tens of millions of times less efficient than the theoretical
minimum. We’re much, much further from the theoretical limits of computation than we are from
the theoretical limits of heat engine efficiency.
In principle we can do even better through reversible computation. After all, there's no reason to
make erasures. For example, when adding we could perform an operation mapping
(x, y) → (x, x + y)
for example (58, 23) → (58, 81),
so that no information is erased. In this case, we could in principle perform any computation we like
without producing any waste heat at all. But we need to keep all of the input information around
to avoid creating entropy and using up energy.
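For scale, here is a minimal sketch of the Landauer estimate discussed above (room temperature, with the 32 bits per erased number and 10^14 flop/s taken as the assumptions used in the text):

import math

k_B = 1.381e-23        # J/K
T = 300.0              # room temperature, K
landauer_per_bit = k_B * T * math.log(2)      # ~2.9e-21 J of heat per erased bit

bits_per_flop = 32
flops_per_second = 1e14
min_power = landauer_per_bit * bits_per_flop * flops_per_second
print(min_power)                              # ~1e-5 W, versus ~250 W for the actual GPU
print(250.0 / min_power)                      # tens of millions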
9 Transport, In Brief
Let’s get back to real physics and discuss transport. This is a fun exercise. We’ll study the transport
of heat (energy), of momentum, and of particles.
9.1 Heat Conduction
We’ll talk about radiation later, in the quantum stat mech section. And discussing convection is
basically hopeless, as fluid motions and mixings can be very complicated. But we can try to discuss
conduction using some simple models and physical intuition.
The idea is roughly that molecules are directly bumping into each other and transferring energy.
But this can also mean that energy is carried by lattice vibrations in a solid, or by the motion of
free electrons (good electrical conductors tend to have good heat conductivity too).
We can start by just making a simple guess that
Q ∝ (ΔT · A · δt)/δx      (9.1.1)
or
dQ/dt ∝ A dT/dx      (9.1.2)
The overall constant is called the thermal conductivity of a given fixed material. So
dQ/dt = −k_t A dT/dx      (9.1.3)
which is called the Fourier heat conduction law.
Thermal conductivities of common materials vary by four orders of magnitude, in W/(mK): air
0.026, wood 0.08, water 0.6, glass 0.8, iron 80, copper 400. In common household situations thin
layers of air are very important as insulation, and can be more important than glass itself.
π(2r)^2 ℓ      (9.1.4)
9.2 Viscosity
Another quantity that you may have heard of is viscosity. It concerns the spread of momentum
through a fluid.
To understand this, we want to think of a picture of a fluid flowing in the x̂ direction, but where
the velocity of the fluid changes in the z direction. This could lead to chaotic turbulence, but let’s
assume that’s not the case, so that we have laminar flow.
In almost all cases, as one might guess intuitively, fluids tend to resist this shearing, differential
flow. This resistance is called viscosity. More viscosity means more resistance to this differential.
It’s not hard to guess what the viscous force is proportional to. We’d expect that
F_x/A ∝ du_x/dz      (9.2.1)
and the proportionality constant is the viscosity, ie
F_x/A = η du_x/dz      (9.2.2)
for coefficient of viscosity η. Although F_x/A has units of pressure, it isn't a pressure; rather it's a
shear stress.
Viscosities vary a great deal with both material and temperature. Gases have lower viscosities
than liquids. And ideal gases have viscosity independent of pressure but increasing with temperature.
This can be explained because viscosity depends on the momentum density in the gas, the mean free
path, and the average thermal speed. The first two depend on the density, but this cancels between
them – a denser gas carries more momentum, but it carries it less far! But since molecules move
faster at higher temperature, more momentum gets transported, and so in a gas η ∝ √T just as for
the heat conductivity.
In a liquid, viscosity actually decreases with temperature because the molecules don’t stick to
each other as much or as well at higher temperature, and this ‘sticking’ is the primary cause of
liquid viscosity. Note that in effect η = ∞ for a solid, where ‘sticking’ dominates.
9.3 Diffusion
Now let’s talk about the spread of particles from high concentration to low concentration.
Let’s imagine the density n of particles increases uniformly in the x̂ direction. The flux is the
~ Once again we’d guess that
net number of particles crossing a surface, we’ll write it as J.
dn
Jx = −D (9.3.1)
dx
where D is the diffusion coefficient, which depends on both what’s diffusing and what it’s diffusing
through. Diffusion for large molecules is slower than for small molecules, and diffusion is much
faster through gases than through liquids. Values ranging from 10−11 to 10−5 m2 /s can be found.
Diffusion coefficients increase with temperature, because molecules move faster.
Diffusion is a very inefficient way for particles to travel. If I make a rough estimate for dye in
water, demanding that roughly all the dye move through a glass of water, then
N/(A Δt) = D (N/V)/δx      (9.3.2)
and with V = A δx, I have that
Δt ≈ V δx/(A D) ≈ δx^2/D      (9.3.3)
For a glass of water with δx ≈ 0.1 m and D ≈ 10^{−9} m^2/s, it would take 10^7 seconds, or about 4
months!
Real mixing happens through convection and turbulence, which is much, much faster, but also
more complicated.
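The estimate is simple enough to spell out numerically (the glass size and diffusion coefficient are the rough, illustrative values used above):

dx = 0.1          # size of the glass, m
D = 1e-9          # diffusion coefficient for dye in water, m^2/s

dt = dx**2 / D    # characteristic diffusion time, seconds
print(dt, dt / (3600 * 24))   # ~1e7 s, i.e. roughly 100 days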
10 Quantum Statistical Mechanics
Let’s get back to our main topic and (officially) begin to discuss quantum statistical mechanics.
This follows because every state of the total systems is (s1 , s2 ), the Cartesian product. And
furthermore, there is no relation or indistinguishability issue between states in the first and second
system.
I put this discussion under the Quantum Stat Mech heading because, in many cases, the systems
are indistinguishable. In that case the state stotal = (s1 , s2 ) = (s2 , s1 ), ie these are the same state.
We can approximate the partition function for that case by
Z_total ≈ (1/2) Z_1 Z_2      (10.1.2)
but this is only approximate, because we have failed to account correctly for s1 = s2 , in which case
we don’t need the 1/2. In the limit that most energy levels are unoccupied, this isn’t a big problem.
We’ll fix it soon.
These expressions have the obvious generalizations
Z_total = Π_{i=1}^N Z_i      (10.1.3)
The total partition function is
Z_total = Z_1^N / N!      (10.1.5)
in terms of the partition function for a single gas molecule, Z1 . So we just have to compute that.
To get Z1 we need to add up the Boltzmann factors for all of the microstates of a single molecule
of gas. This combines into
Z_1 = Z_tr Z_int
where the first is the kinetic energy from translational DoF, and the latter are internal states such
as vibrations and rotations. There can also be electronic internal states due to degeneracies in the
ground states of the electrons in a molecule. For example, oxygen molecules have 3-fold degenerate
ground states.
Let’s just focus on Ztr . To do this correctly we need quantum mechanics.
This isn’t a QM course, so I’m mostly just going to tell you the result. A particle in a 1-d box
must have momentum
p_n = h n/(2L)      (10.1.7)
as it must be a standing wave (with Dirichlet boundary conditions). So the energy levels are
E_n = p_n^2/(2m) = h^2 n^2/(8mL^2)      (10.1.8)
From the energies, we can write down the partition function
Z_1d = Σ_n e^{−β h^2 n^2/(8mL^2)}      (10.1.9)
Unless the box is extremely small, or T is extremely small, we can approximate this by the integral
Z_1d ≈ ∫_0^∞ dn e^{−β h^2 n^2/(8mL^2)} = L √(2πm k_B T)/h      (10.1.10)
We then have Z_3d = (Z_1d)^3 because the x, y, z motions are all independent. This gives
Z_3d = V/ℓ_Q^3      (10.1.13)
where ℓ_Q = h/√(2πm k_B T) is the 'quantum length'.
For example we can find the pressure, entropy, and chemical potential this way. We will once again
recover expressions we’ve seen earlier in the semester.
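To see how safely classical a typical gas is, here is a small sketch comparing ℓ_Q^3 with the volume per particle for helium at room temperature and atmospheric pressure (the gas and conditions are my illustrative choices):

import math

h = 6.626e-34          # Planck's constant, J s
k_B = 1.381e-23        # Boltzmann constant, J/K
m = 4 * 1.66e-27       # mass of a helium atom, kg
T = 300.0              # K
P = 1.0e5              # Pa

l_Q = h / math.sqrt(2 * math.pi * m * k_B * T)   # quantum length
v_per_particle = k_B * T / P                      # V/N from the ideal gas law
print(l_Q**3, v_per_particle)    # l_Q^3 is several hundred thousand times smaller, so the gas is safely classical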
Recall that
P(s_2)/P(s_1) = e^{[S_R(s_2) − S_R(s_1)]/k}      (10.2.1)
Now we need to use
dS_R = (1/T)(dU_R + P dV_R − µ dN_R)      (10.2.2)
to find that
S_R(s_2) − S_R(s_1) = −(1/T)[E(s_2) − E(s_1) − µN(s_2) + µN(s_1)]      (10.2.3)
where E and N refer to the system. Thus we find that the Gibbs factor is
e^{−[E(s) − µN(s)]/kT}
and to normalize the probabilities we divide by
Z = Σ_s e^{−[E(s) − µN(s)]/kT}
where the sum includes all states, ranging over all values of E and all N. This is the promised
grand partition function or grand canonical distribution. In the presence of more types of
particles, we have chemical potentials and numbers for each.
Z = 1 + e^{−(ε−µ)/kT}      (10.2.7)
where ε = −0.7 eV. The chemical potential µ is high in the lungs, where oxygen is abundant, but
lower in cells where oxygen is needed. Near the lungs the partial pressure of oxygen is about 0.2
atm, and so the chemical potential is
µ = −kT log(V Z_int/(N ℓ_Q^3)) ≈ −0.6 eV      (10.2.8)
so that
e^{−(ε−µ)/kT} ≈ 40      (10.2.9)
and so the probability of occupation by oxygen is 98%.
But things change in the presence of carbon monoxide, CO. Then there are three possible states, and
Z = 1 + e^{−(ε−µ)/kT} + e^{−(ε′−µ′)/kT}      (10.2.10)
The CO molecule will be less abundant than oxygen. If it's 100 times less abundant, then its chemical
potential µ′ is lower than µ by kT log 100 ≈ 0.12 eV.
But CO binds more tightly to hemoglobin, so that ε′ ≈ −0.85 eV. In total this means
e^{−(ε′−µ′)/kT} ≈ 120      (10.2.12)
which means that the oxygen occupation probability sinks to 25%. This is why CO is poisonous.
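A small numerical sketch of these Gibbs-factor estimates (ε, ε′, µ, and the factor of 100 are the values quoted above; kT is taken at body temperature, my natural assumption here):

import math

kT = 0.0267          # ~310 K body temperature, in eV
eps_O2, mu_O2 = -0.7, -0.6
eps_CO, mu_CO = -0.85, mu_O2 - kT * math.log(100)   # CO assumed 100x less abundant than oxygen

g_O2 = math.exp(-(eps_O2 - mu_O2) / kT)   # ~40
g_CO = math.exp(-(eps_CO - mu_CO) / kT)   # ~120

print(g_O2 / (1 + g_O2))                  # ~0.98 occupation without CO
print(g_O2 / (1 + g_O2 + g_CO))           # drops to roughly 0.25 with CO present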
V/N ≫ ℓ_Q^3      (10.3.2)
When this is violated, the wavefunctions of the particles start to overlap, and quantum effects
become important. Systems that are quantum are either very dense or very cold, since we recall that
ℓ_Q = h/√(2πm k_B T)      (10.3.3)
We can picture this as a situation where the particle’s individual wavefunctions start to overlap. We
can’t separate them (make the wavefunctions more narrow) without giving them a lot of momentum,
increasing the energy and temperature of the gas.
Distribution Functions
We introduced the Gibbs ensemble to make computing the distribution functions easier. The idea is
to focus on individual, specific states that (some number of) particles can be in.
If we have a given single-particle state, then the probability of it being occupied by n particles is
P(n) = (1/Z) e^{−n(ε−µ)/kT}      (10.3.4)
So we can use this to determine the Z for fermions and bosons.
In the case of fermions, the only occupancy allowed is n = 0, 1. That means that for any given
state, we simply have
Z = 1 + e^{−(ε−µ)/kT}      (10.3.5)
n̄ = 1/(e^{(ε−µ)/kT} − 1)      (10.3.9)
where once again we emphasize that the signs here are crucial! This is the Bose-Einstein
distribution.
Notice that the B-E distribution goes to infinity as ε → µ. This means that in that limit, the
state achieves infinite occupancy! In the case of fermions this was impossible because occupancy can
only be 0 or 1. This divergence is Bose condensation. Draw it!
Let’s compare these results to the classical limit, where we’d have a Boltzmann distribution.
In the Boltzmann distribution
n̄ = e^{−(ε−µ)/kT} ≪ 1      (10.3.10)
Draw it! I wrote the last inequality because the classical limit obtains when we don’t have many
particles in the same state. In this limit, it’s fine to simply ignore large occupancies. Note that this
is equal to both the F-D and B-E distributions in the desired limit. The classical limit is low
occupancy.
In the rest of this section we will apply these ideas to simple systems, treating ‘quantum gases’.
That means that we can pretend that the systems are governed by single-particle physics. This
applies to electrons in metals, neutrons in neutron stars, atoms in a fluid at low temperature, photons
in a hot oven, and even ‘phonons’, the quantized sound (vibrations) in a solid. We’ll determine µ in
most cases indirectly, using the total number of particles.
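Here is a short sketch comparing the three occupancy formulas as a function of x = (ε − µ)/kT, showing that they all agree once the occupancy is small:

import math

def n_FD(x): return 1.0 / (math.exp(x) + 1.0)   # Fermi-Dirac
def n_BE(x): return 1.0 / (math.exp(x) - 1.0)   # Bose-Einstein (needs x > 0)
def n_MB(x): return math.exp(-x)                 # Boltzmann

for x in [0.5, 1.0, 3.0, 6.0]:
    print(x, n_FD(x), n_BE(x), n_MB(x))          # the three converge once n_MB << 1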
Zero Temperature
Recall the F-D distribution
n̄_FD = 1/(e^{(ε−µ)/kT} + 1)      (10.4.2)
In the limit that T = 0 this is a step function. All states with ε < µ are occupied, and all others are
unoccupied.
We call
ε_F ≡ µ(T = 0)      (10.4.3)
the Fermi energy. In this low-temperature state of affairs the fermion gas is said to be 'degenerate'.
How can we determine ε_F? It can be fixed if we know how many electrons are actually present.
If we imagine adding electrons to an empty box, then they simply fill the states from lowest energy
up. To add one more electron you need ε_F = µ of energy, and our understanding that
µ = (∂U/∂N)_{S,V}      (10.4.4)
makes perfect sense. Note that adding an electron doesn’t change S, since we’re at zero temperature
and the state is unique!
To compute F we’ll assume that the electrons are free particles in a box of volume V = L3 . This
isn’t a great approximation because of interactions with ions in the lattice, but we’ll neglect them
anyway.
As previously seen, the allowed wavefunctions are just sines and cosines, with momenta
p⃗ = (h/2L) n⃗      (10.4.5)
where n⃗ = (n_x, n_y, n_z). Energies are therefore
ε = h^2 n⃗^2/(8mL^2)      (10.4.6)
To visualize the allowed states, it's easiest to draw them in n⃗-space, where only positive integer
coordinates are allowed. Each point describes two states after accounting for electron spins. Energies
are proportional to n⃗^2, so states start at the origin and expand outward to fill availabilities.
The Fermi energy is just
ε_F = h^2 n_max^2/(8mL^2)      (10.4.7)
We can compute this by computing the total volume of the interior region in the space, which is
N = 2 × (1/8) × (4/3)π n_max^3 = π n_max^3/3      (10.4.8)
Thus we find that
ε_F = (h^2/8m) (3N/πV)^{2/3}      (10.4.9)
Is this intensive or extensive? What does that mean? (The shape of the 'box' is irrelevant.)
We can compute the average energy per electron by computing the total energy and dividing.
That is
U = 2 (1/8) ∫ d^3n (h^2 n⃗^2/(8mL^2))
  = ∫_0^{n_max} dn (πn^2) (h^2 n^2/(8mL^2))
  = π h^2 n_max^5/(40 m L^2)
  = (3/5) N ε_F      (10.4.10)
where the 3/5 is a geometrical factor that would be different in more or fewer spatial dimensions.
Note that ε_F ∼ a few eV, which is much, much larger than kT ≈ 1/40 eV. This is the same as
the comparison between the quantum volume and the average volume per particle. The Fermi
temperature is
T_F ≡ ε_F/k_B ≳ 10^4 K      (10.4.11)
which is hypothetical insofar as metals would liquefy before reaching this temperature.
Using the standard or thermodynamic definition of the pressure we can find
P = −∂U/∂V
  = −∂/∂V [ (3/5) N (h^2/8m) (3N/πV)^{2/3} ]
  = 2N ε_F/(5V) = 2U/(3V)      (10.4.12)
which is the degeneracy pressure. Electrons don’t want to be compressed, as they want space!
(Remember this has nothing to do with electrostatic repulsion.)
The degeneracy pressure is enormous, of order 10^9 N/m^2, but it isn't measurable, as it is canceled
by the electrostatic forces that pulled the electrons into the metal in the first place. The bulk
modulus is measurable, and it's just
B = −V (∂P/∂V)_T = (10/9) (U/V)      (10.4.13)
This quantity is also large, but it's not completely canceled by electrostatic forces, and (very roughly)
accords with experiment.
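As a rough numerical illustration (the conduction electron density used for copper, about 8.5 × 10^28 per cubic meter, is my assumed textbook-style value rather than a number from these notes):

import math

h = 6.626e-34        # J s
m_e = 9.11e-31       # electron mass, kg
k_B = 1.381e-23      # J/K
n = 8.5e28           # assumed conduction electron density of copper, 1/m^3

eps_F = (h**2 / (8 * m_e)) * (3 * n / math.pi) ** (2.0 / 3.0)
print(eps_F / 1.602e-19)    # ~7 eV, i.e. 'a few eV' as stated above
print(eps_F / k_B)          # Fermi temperature ~8e4 K, comfortably above 10^4 K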
Small Temperatures
The distribution of electrons barely changes when T_F ≫ T > 0, but we need to study finite
temperature to see how metals change with temperature – for example, to compute the heat
capacity!
Usually particles want to acquire a thermal energy kT , but in a degenerate Fermi gas they can’t,
because most states are occupied. So it's only the states near the Fermi surface that can become
excited. Furthermore, it's only states within kT of the Fermi surface that can be excited. So the
number of excitable electrons is proportional to T .
This suggests that
δU ∝ (N kT )(kT ) ∼ N (kT )2 (10.4.14)
By dimensional analysis then, since the only energy scale around is F , we should have
δU ≈ N (kT)^2/ε_F      (10.4.15)
This implies that
C_V ∝ N k^2 T/ε_F      (10.4.16)
This agrees with experiments. Note that it also matches what we expect from the 3rd law of
thermodynamics, namely that CV → 0 as T → 0.
Density of States
A natural object of study is the density of states, which is the number of single-particle states
per unit of energy. For a Fermi gas note that
ε = h^2 n⃗^2/(8mL^2),   |n⃗| = (L/h)√(8mε)      (10.4.17)
so that
dn = (L/h)√(2m/ε) dε      (10.4.18)
This means that we can write integrals like
U = ∫_0^{n_max} dn (πn^2) (h^2 n^2/(8mL^2))
  = ∫_0^{ε_F} dε ε [ (π/2) (8mL^2/h^2)^{3/2} √ε ]      (10.4.19)
The quantity in brackets is the density of states g(ε).
but at higher temperature we have to multiply by the non-trivial occupancy, which is really the
right way to think about zero temperature as well
N = ∫_0^∞ g(ε) n̄_FD(ε) dε = ∫_0^∞ g(ε)/(e^{(ε−µ)/kT} + 1) dε      (10.4.22)
similarly for the energy
U = ∫_0^∞ ε g(ε) n̄_FD(ε) dε = ∫_0^∞ ε g(ε)/(e^{(ε−µ)/kT} + 1) dε      (10.4.23)
The zero temperature limit is just a special case.
Note that at T > 0 the chemical potential changes. It's determined by the fact that the integral
expression for N must remain fixed, and since g(ε) is larger for larger ε, the chemical potential must
change to compensate. In fact it must decrease slightly. We can compute it by doing the integrals
and determining µ(T). Then we can determine U(T) as well.
k_B T ≪ ε_F      (10.4.24)
So far this was exact, but now we will start making approximations.
First of all, note that µ/kT ≫ 1 and that the integrand falls off exponentially when −x ≫ 1.
This means that we can approximate
N ≈ (2/3) g_0 ∫_{−∞}^∞ ε^{3/2} e^x/(e^x + 1)^2 dx      (10.4.30)
Now the integral can be performed term-by-term. This provides a series expansion of the result,
while neglecting some exponentially small corrections.
The first term is
∫_{−∞}^∞ e^x/(e^x + 1)^2 dx = ∫_{−∞}^∞ (−∂_x) [1/(e^x + 1)] dx = 1      (10.4.33)
Including the next terms, one finds
N = (2/3) g_0 µ^{3/2} [1 + (π^2/8)(kT)^2/µ^2 + · · ·]
  = N (µ/ε_F)^{3/2} [1 + (π^2/8)(kT)^2/µ^2 + · · ·]      (10.4.36)
This shows that µ/ε_F ≈ 1, and so we can use that in the second term to solve and find
µ/ε_F ≈ 1 − (π^2/12)(kT)^2/ε_F^2 + · · ·      (10.4.37)
so the chemical potential decreases a bit as T increases.
We can evaluate the energy integrals using the same tricks, with the result that
U = (3/5) N µ^{5/2}/ε_F^{3/2} + (3π^2/8) N (kT)^2/ε_F + · · ·      (10.4.38)
Planck Distribution
Instead of equipartition, each mode of the electromagnetic field, which is its own harmonic oscillator,
can only have energy
E_n = nℏω      (10.5.1)
where the values of ω are fixed by the boundary conditions of the box. This n is the number of
photons!
The partition function is
Z = 1/(1 − e^{−βℏω})      (10.5.2)
for the Bose-Einstein distribution. So the average energy in an oscillator is
Ē = −(d/dβ) log Z = ℏω/(e^{βℏω} − 1)      (10.5.3)
The number of photons is just given by n̄_BE. But in the context of photons, it's often called the
Planck distribution.
This solves the ultraviolet catastrophe, by suppressing the high energy contributions exponentially.
This really required quantization, so that high energy modes are ‘turned off’.
Note that for photons µ = 0. At equilibrium this is required by the fact that photons can be
freely created or destroyed, ie
(∂F/∂N)_{T,V} = 0 = µ_γ      (10.5.4)
Photons in a Box
We would like to know the total number and energy of photons inside a box.
As usual, the momentum of these photons is
p_m = h m/(2L)      (10.5.5)
but since they are relativistic, the energy formula is now
ε = pc = hcm/(2L)      (10.5.6)
This is also just what we get from their frequencies.
The energy in a mode is just the occupancy times the energy. This is
U = 2 Σ_{m_x,m_y,m_z} (hc|m⃗|/2L) · 1/(e^{hc|m⃗|/(2LkT)} − 1)      (10.5.7)
the integrand is the spectrum, or energy density per unit photon energy,
u(ε) = (8π ε^3/(hc)^3) · 1/(e^{ε/kT} − 1)      (10.5.10)
The number density is just u(ε)/ε. The spectrum peaks at ε ≈ 2.8 kT; the ε^3 comes from living in
3 dimensions.
To evaluate the integral, it's useful to change variables to x = ε/kT, so that all of the physical
quantities come out of the integral, giving
U/V = (8π(kT)^4/(hc)^3) ∫_0^∞ x^3/(e^x − 1) dx      (10.5.11)
This means that one can determine the temperature of an oven by letting a bit of radiation leak out
and looking at its color.
Total Energy
Doing the last integral gives
U/V = 8π^5 (kT)^4/(15(hc)^3)      (10.5.12)
Note that this only depends on kT, as the other constants are just there to make up units. Since it's
an energy density, pure dimensional analysis could have told us that it had to scale as (kT)^4.
Numerically, the result is small. At a typical oven temperature of 460 K, the energy per unit
volume is 3.5 × 10^{−5} J/m^3. This is much smaller than the thermal energy of the air inside the oven...
because there are far more air molecules than photons. And that's because the typical photon wavelengths are
much larger than the typical separations between air molecules at this temperature.
Entropy
We can determine the entropy by computing C_V and then integrating ∫ (C_V/T) dT. We have that
C_V = (∂U/∂T)_V = 4aT^3      (10.5.13)
where a = 8π^5 k^4/(15(hc)^3). Since our determinations were fully quantum, this works down to T = 0, which
means that
S(T) = ∫_0^T (C_V/t) dt = (4/3) a T^3      (10.5.14)
The total number of photons scales the same way, but with a different coefficient.
CMB
The Cosmic Microwave Background is the most interesting photon gas; it's at about 2.73 K. So its
spectrum peaks at 6.6 × 10^{−4} eV, with wavelength around a millimeter, in the far infrared.
The CMB has far less energy than ordinary matter in the universe, but it has far more entropy –
about 10^9 units of S per cubic meter.
Photon Emission Power
Now that we have understood ‘photons in a box’, we would like to see how radiation will be emitted
from a hot body. A natural and classical starting point is to ask what happens if you poke a hole in
the box of photons.
Since all light travels at the same speed, the spectrum of emitted radiation will be the same as
the spectrum in the box. To compute the amount of radiation that escapes, we just need to do some
geometry.
To get out through the hole, radiation needs to have once been in a hemisphere of radius R from
the hole. The only tricky question is how much of the radiation at a given point on this hemisphere
goes out through a hole with area A. At an angle θ from the perpendicular the area A looks like it
has area
A_eff = A cos θ      (10.5.15)
So the fraction of radiation that’s pointed in the correct direction, and thus gets out is
∫_0^{π/2} dθ 2πR^2 sin θ (A_eff/(4πR^2))
  = ∫_0^{π/2} dθ (A/2) sin θ cos θ = A/4      (10.5.16)
Other than that, the rate is just given by the speed of light times the energy density, so we find
c (A U/4V) = 2π^5 A (kT)^4/(15 h^3 c^2)      (10.5.17)
In fact, this is the famous power per unit area blackbody emission formula
P = A (2π^5 k^4/(15 h^3 c^2)) T^4 = A σ T^4      (10.5.18)
where σ is known as the Stefan-Boltzmann constant, with value σ = 5.67 × 10^{−8} W/(m^2 K^4). The
dependence on the fourth power of the temperature was discovered by Stefan empirically in 1879.
Sun and Earth
We can use basic properties of the Earth and Sun to make some interesting deductions.
The Earth receives 1370 W/m^2 from the sun (known as the solar constant). The Earth is 150
million km from the sun. This tells us the sun's total luminosity is 4 × 10^{26} Watts.
The sun's radius is a little over 100 times Earth's, or 7 × 10^8 m. So its surface area is 6 × 10^{18} m^2.
From this information, if we assume an emissivity of 1, we can find that
T = (luminosity/σA)^{1/4} = 5800 K      (10.5.19)
The corresponding spectrum peaks at a photon energy ε ≈ 2.8 kT ≈ 1.4 eV,
which is in the near infrared. This is testable and agrees with experiment. This is close to visible
red light, so we get a lot of the sun's energy in the visible spectrum. Note that 'peak' and 'average'
aren't quite the same here. (Perhaps in some rough sense evolution predicts that we should be able
to see light near the sun's peak?)
We can also easily estimate the Earth's equilibrium temperature, assuming its emission and
absorption are balanced. If the power emitted is equal to the power absorbed and the emissivity is 1,
then
σT^4 (4πR_E^2) = (1370 W/m^2)(πR_E^2)
which gives
T ≈ 280 K      (10.5.22)
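A short numerical sketch of both estimates (using the solar constant, distance, and solar radius quoted above):

import math

sigma = 5.67e-8        # Stefan-Boltzmann constant, W/(m^2 K^4)
solar_constant = 1370  # W/m^2 at Earth
d = 1.5e11             # Earth-sun distance, m
R_sun = 7e8            # solar radius, m

luminosity = solar_constant * 4 * math.pi * d**2        # ~4e26 W
A_sun = 4 * math.pi * R_sun**2
T_sun = (luminosity / (sigma * A_sun)) ** 0.25
print(T_sun)                                            # ~5800 K

T_earth = (solar_constant / (4 * sigma)) ** 0.25        # absorbs over pi R^2, emits over 4 pi R^2
print(T_earth)                                          # ~280 K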
In fact, the ‘phonons’ of sound are extremely similar to the photons of light, except that phonons
travel at cs , have 3 polarizations, can have different speeds for different polarizations (and even
directions!), and cannot have wavelength smaller than (twice) the atomic spacing.
All this means that we have a simple relation
ε = hf = (hc_s/2L)|n⃗|      (10.6.1)
and these modes have a Bose-Einstein distribution
n̄_BE = 1/(e^{ε/kT} − 1)      (10.6.2)
with vanishing chemical potential, as phonons can be created and destroyed.
The energy is also the usual formula
U = 3 Σ_{n_x,n_y,n_z} ε n̄_BE(ε)      (10.6.3)
Following Debye, we replace the true region of allowed n⃗ by an eighth of a sphere of radius n_max, chosen
so that the total volume in n-space (i.e. the total number of modes) is the same. This approximation is exact at both low and high
temperatures! Why?
Given this approximation, we can easily convert our sum into spherical coordinates and compute
the total average energy. This is
U = 3 ∫_0^{n_max} dn (4πn^2/8) ε/(e^{ε/kT} − 1)
  = (3π/2) ∫_0^{n_max} dn (hc_s n^3/2L)/(e^{hc_s n/(2LkT)} − 1)      (10.6.5)
Don’t be fooled – this is almost exactly the same as our treatment of photons.
We can change variables to
x = (hc_s/2LkT) n      (10.6.6)
This means that we are integrating up to
x_max = (hc_s/2LkT) n_max = (hc_s/2kT)(6N/πV)^{1/3} = T_D/T      (10.6.7)
where
T_D = (hc_s/2k)(6N/πV)^{1/3}      (10.6.8)
so that at high temperatures we find
U = 3NkT      (10.6.11)
while at low temperatures phonons behave exactly like photons; in both cases the physics is dominated by the
number of excited modes.
This means that at low temperature
C_V = 12π^4 N k T^3/(5 T_D^3)      (10.6.13)
and this T 3 rule beautifully agrees with experimental data. Though for metals, to really match, we
need to also include the electron contribution.
for electrons. Notice that the power-law dependence on the density is different because electrons are
non-relativistic. In particular, we have
T_D/T_F ∝ (c_s m_e/h)(V/N)^{1/3}      (10.6.15)
This shows that if V/N doesn't vary all that much, the main thing determining T_D is the speed of
sound c_s. If we had time to cultivate a deeper understanding of materials, we could try to estimate
if we should expect this quantity to be order one.
You should also note that for T ≫ T_D we might expect that solids will melt. Why? If not, it
means that the harmonic oscillator potentials have significant depth, so we can excite many, many
modes without the molecules escaping.
As we’ve seen before, we cannot do the integral analytically.
Let’s see what happens if we just choose µ = 0. In that case, using our usual x = /kT type
change of variables we have
3/2 Z ∞ √
2 2πmkT xdx
N=√ 2
V x
(10.7.5)
π h 0 e −1
Numerically this gives
N = 2.612 (2πmkT/h^2)^{3/2} V      (10.7.6)
This really just means that there’s a particular Tc for which it’s true that µ = 0, so that
kT_c = 0.527 (h^2/2πm)(N/V)^{2/3}      (10.7.7)
At T > Tc we know that µ < 0 so that the total N is fixed. But what about at T < Tc ?
Actually, our integral representation breaks down at T < Tc as the discreteness of the states
becomes important. The integral does correctly represent the contribution of high energy states, but
fails to account for the few states near the bottom of the spectrum.
This suggests that
N_excited ≈ 2.612 (2πmkT/h^2)^{3/2} V      (10.7.8)
for T < Tc . (This doesn’t really account for low-lying states very close to the ground state.)
Thus we have learned that at temperature T > Tc , the chemical potential is negative and all
atoms are in excited states. But at T < Tc , the chemical potential µ ≈ 0 and so
N_excited ≈ N (T/T_c)^{3/2}      (10.7.9)
and so the rest of the atoms are in the ground state with
N_0 = N − N_excited ≈ N [1 − (T/T_c)^{3/2}]      (10.7.10)
The accumulation of atoms in the ground state is Bose-Einstein condensation, and Tc is called
the condensation temperature.
Notice that as one would expect, this occurs when quantum mechanics becomes important, so
that the quantum length
V/N ≈ ℓ_Q^3 = (h/√(2πmkT))^3      (10.7.11)
So condensation occurs when the wavefunctions begin to overlap quite a bit.
Notice that having many bosons helps – that is for fixed volume
kT_c = 0.527 (h^2/2πm)(N/V)^{2/3} ∝ N^{2/3} ε_0      (10.7.12)
We do not need a kT of order the ground state energy to get all of the bosons into the ground state!
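A rough numerical sketch of the condensation temperature formula (the rubidium-87 mass is standard; the number density of a few × 10^18 per cubic meter is my assumed, illustrative value for a dilute trapped gas, not a number from the text):

import math

h = 6.626e-34          # J s
k_B = 1.381e-23        # J/K
m = 87 * 1.66e-27      # mass of a rubidium-87 atom, kg
n = 2.5e18             # assumed number density, 1/m^3

T_c = 0.527 * (h**2 / (2 * math.pi * m)) * n ** (2.0 / 3.0) / k_B
print(T_c)                                  # a few tens of nanokelvin

T = 0.5 * T_c
print(1 - (T / T_c) ** 1.5)                 # condensate fraction N_0/N at T = T_c/2, ~0.65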
Achieving BEC
BEC with cold atoms was first achieved in 1995, using a condensate of about 1000 rubidium-87
atoms. This involved laser cooling and trapping. By 1999 BEC was also achieved with sodium,
lithium, hydrogen.
Superfluids are also BECs, but in that case interactions among the atoms are important for
the interesting superfluidic properties. The famous example is helium-4, which has a superfluid
component below 2.17 K.
Note that superconductivity is a kind of BEC condensation of pairs of electrons, called Cooper
pairs. Helium-3 has a similar pairing-based superfluidic behavior at extremely low temperatures.
Why? Entropy
With distinguishable particles, the ground state's not particularly favored until kT ∼ ε_0. But with
indistinguishable particles, the counting changes dramatically.
The number of ways of arranging N indistinguishable particles in Z1 1-particle states is
(N + Z_1)!/(N! Z_1!) ∼ e^{Z_1} (N/Z_1)^{Z_1}      (10.7.13)
for Z_1 ≪ N. Whereas in the distinguishable case we just have Z_1^N states, which is far, far larger. So
in the QM case with Bosons, at low temperatures we have very small relative multiplicity or
entropy, as these states will be suppressed by e−U/kT ∼ e−N . So we quickly transition to a BEC.
11 Phase Transitions
A phase transition is a discontinuous change in the properties of a substance as its environment
(pressure, temperature, chemical potential, magnetic field, composition or 'doping') changes only
infinitesimally. The different forms of a substance are called phases. A diagram showing the
equilibrium phases as a function of environmental variables is called a phase diagram.
On your problem set you’ll show (if you do all the problems) that BEC is a phase transition.
The phase structure of water has a triple point where all three phases meet, and a critical point where water
and steam become indistinguishable (374 C and 221 bars for water). There is no ice/liquid critical point,
though long molecules can form a liquid crystal phase where molecules move but stay oriented. There
are many phases of ice at high pressure. Metastable phases are possible via super-cooling/heating,
etc.
The pressure at which a gas can coexist with its solid or liquid phase is called the vapor
pressure. At T = 0.01 Celsius and P = 0.006 bar all three phases can coexist at the triple point.
At low pressure ice sublimates directly into vapor.
We’ve seen sublimation of CO2 , this is because the triple point is above atmospheric pressure, at
5.2 bars. Note pressure lowers the melting temperature of ice, which is unusual (because ice is less
dense than water).
There are also phase transitions between fluid/superfluid and to superconductivity (eg in the
temperature and magnetic field plane). Ferromagnets can have domains of up or down magnetization,
and there’s a critical point at B = 0 and T = Tc called the Curie point.
Phase transitions have been traditionally classified as 'first order' or 'second order' depending on
whether the first derivatives of the Gibbs free energy or only its higher derivatives are discontinuous;
nowadays we usually just say first order or 'continuous'.
11.1 Clausius-Clapeyron
Given the relations for derivatives of G, it’s easy to see how phase boundaries must depend on P
and T .
At the boundary we must have that the two phases have G_1 = G_2. But this also means that
dG_1 = dG_2 as we move along the phase boundary, i.e. −S_1 dT + V_1 dP = −S_2 dT + V_2 dP, so that
dP/dT = (S_1 − S_2)/(V_1 − V_2)      (11.1.3)
and this determines the slope of the phase boundary (eg between a liquid and gas transition).
We can re-write this in terms of the latent heat L via
L = T (S1 − S2 ) (11.1.4)
For non-ideal gases, the equation of state can be organized as a virial expansion,
PV = NkT [1 + B_2(T)(N/V) + B_3(T)(N/V)^2 + · · ·].
We can actually try to compute the functions B_n(T) by doing what's called a 'cluster expansion' in
the interactions of the gas molecules.
But for now, let’s just see what we get if we include the first correction. A particular equation
of state that’s of this form, and which keeps only the first two terms, is the famous va der Waals
equation of state
N2
P + a 2 (V − N b) = N kT (11.2.2)
V
or
P = NkT/(V − Nb) − a N^2/V^2      (11.2.3)
for constants a and b. This formula is just an approximation! Note that the absence of higher order
terms means that we are neglecting effects from interactions among many molecules at once, which
are certainly important at high density.
We can also write the vdW equation more elegantly in terms of purely intensive quantities as
P = kT/(v − b) − a/v^2      (11.2.4)
where v = V /N . This makes it clear that the total amount of stuff doesn’t matter!
The modification by b is easy to understand, as it means that the fluid cannot be compressed
down to zero volume. Thus b is a kind of minimum volume occupied by a molecule. It’s actually
well approximated by the cube of the average width of a molecule, eg 4 angstroms cubed for water.
The a term accounts for short-range attractive forces between molecules. Note that it scales
like N^2/V^2, so it corresponds to forces on all the molecules from all the other molecules, and it effectively
decreases the pressure (for a > 0), so it's attractive. The constant a varies a lot between substances,
in fact by more than two orders of magnitude, depending on how much the molecules interact.
Now let’s investigate the implications of the van der Waals equation. We’ll actually be focused
on the qualitative implications. Let’s plot pressure as a function of volume at fixed T . At large T
curves are smooth, but at small T they’re more complicated, and have a local minimum! How can a
gas’s pressure decrease when its volume decreases!?
To see what really happens, let’s compute G. We have
dG = −SdT + V dP + µdN (11.2.5)
At fixed T and N this is just dG = V dP so
(∂G/∂V)_{T,N} = V (∂P/∂V)_{T,N}      (11.2.6)
From the van der Waals equation we can compute derivatives of P , giving
(∂G/∂V)_{T,N} = −NkT V/(V − Nb)^2 + 2aN^2/V^2      (11.2.7)
Integrating then gives
G = −NkT log(V − Nb) + N^2 b kT/(V − Nb) − 2aN^2/V + c(T)      (11.2.8)
where c(T ) is some constant that can depend on T but not V .
Note that the thermodynamically stable state has minimum G. This is important because at a
given value of pressure P , there can be more than one value of G. We see this by plotting G and P
parametrically as functions of V . This shows the volume changes abruptly at a gas-liquid transition!
We can determine P at the phase transition directly, but there’s another cute, famous method.
Since the total change in G is zero across the transition, we have
0 = ∮ dG = ∮ (∂G/∂P)_{T,N} dP = ∮ V dP      (11.2.9)
We can compute this last quantity from the PV diagram. This is called the Maxwell construction.
This is a bit non-rigorous, since we’re using unstable states/phases to get the result, but it works.
Repeating the analysis shows where the liquid-gas transition occurs for a variety of different
temperatures. The corresponding pressure is the vapor pressure where the transition occurs. At
the vapor pressure, at a fixed temperature, liquid and gas can coexist.
At high temperature, there is no phase boundary! Thus there’s a critical temperature Tc and
a corresponding Pc and Vc (N ) at which the phase transition disappears. At the critical point, liquids
and gases become identical! The use of the volume in all of this is rather incidental. Recall that
G = N µ, and so we can rephrase everything purely in terms of intensive quantities.
Note though that while van der Waals works qualitatively, it fails quantitatively. This isn’t
surprising, since it ignores multi-molecule interactions.
G = U + PV − TS      (11.3.1)
If two substances A and B simply sat side by side without mixing, we would have the straight line
G = (1 − x) G_A + x G_B
where x is the fraction of B and G_{A,B} are the Gibbs free energies of the unmixed substances.
But if we account for the entropy of mixing, then instead of a straight line, S_mix will be largest when
x ≈ 1/2, so that G will be smaller than this straight line suggests. Instead G will be concave up as
a function of x.
We can roughly approximate the mixing entropy by
S_mix ≈ −Nk [x log x + (1 − x) log(1 − x)]
where N is the total number of molecules. This is correct for ideal gases, and for liquids or solids
where the molecules of each substance are roughly the same. This leads to
G ≈ (1 − x) G_A + x G_B + NkT [x log x + (1 − x) log(1 − x)]
This is the result for an ideal mixture. Liquid and solid mixtures are usually far from ideal, but
this is a good starting point.
Since the derivative of G wrt x at x = 0 and x = 1 is actually infinite, equilibrium phases almost
always contain impurities.
Even non-ideal mixtures tend to behave this way, unless there’s some non-trivial energy or U
dependence in mixing, eg with oil and water, where the molecules are only attracted to their own
kind. In that case the G of mixing can actually be concave down – that’s why oil and water don’t
mix, at least at low T . Note that at higher T there’s a competition between the energy effect and
the entropy effect. Even at very low T , the entropy dominates near x = 0 or x = 1, so oil and water
will have some ‘impurities’.
Note that concave-down G(x) simply indicates an instability where the mixture will split up
into two unmixed substances. In this region we say that the two phases are immiscible or have a
solubility gap.
So we end up with two stable mixtures at small x > 0 and large x < 1, and unmixed substances
between. Solids can have solubility gaps too.
You can read about a variety of more complicated applications of these ideas in the text.
H2 O ↔ H + + OH − (11.4.1)
Even if one configuration of chemicals is more stable (water), every once in a while there’s enough
energy to cause the reaction to go the other way.
As usual, the way to think about this quantitatively is in terms of the Gibbs free energy
G = U − TS + PV (11.4.2)
and to consider the chemical reaction as a system of particles that will react until they reach an
equilibrium. We end up breaking up a few H_2O molecules because although it costs energy, the
entropy increase makes up for that. Of course this depends on temperature though!
We can plot G as a function of x, where 1 − x is the fraction of H2 O. Without any mixing we
would expect G(x) to be a straight line, and since U is lower for water, we’d expect x = 0. However,
due to the entropy of mixing, equilibrium will have x > 0 – recall that G′(x) is infinite at x = 0!
Plot this.
We can characterize the equilibrium condition by the fact that the slope of G(x) is zero. This
means that
0 = dG = Σ_i µ_i dN_i      (11.4.3)
where we assumed T, P are fixed. The sum on the right runs over all species. But we know that the dN_i
are related by the stoichiometry of the reaction, so that
µ_{H_2O} = µ_{H^+} + µ_{OH^−}
at equilibrium. Since the chemical potentials are a function of species concentration, this determines
the equilibrium concentrations!
The equilibrium condition is always the same as the reaction itself, with names of chemicals
replaced by their chemical potentials. For instance with
N2 + 3H2 ↔ 2N H3 (11.4.6)
we would have
µ_{N_2} + 3µ_{H_2} = 2µ_{NH_3}
For ideal gases we can also write each chemical potential as µ_i = µ_i^0 + kT log(P_i/P_0),
where µ_i^0 is the chemical potential in a fixed 'standard' state when its partial pressure is P_0. So for
example, for the last chemical reaction we can write
kT log[P_{N_2} P_{H_2}^3/(P_{NH_3}^2 P_0^2)] = 2µ_{NH_3}^0 − µ_{N_2}^0 − 3µ_{H_2}^0 = ΔG^0/N_A      (11.4.9)
Exponentiating gives P_{NH_3}^2 P_0^2/(P_{N_2} P_{H_2}^3) = e^{−ΔG^0/RT} ≡ K,
where K is the equilibrium constant. This equation is called the law of mass action by chemists.
Even if you don’t know K a priori, you can use this equation to determine what happens if you add
reactants to a reaction.
For this particular reaction – Nitrogen fixing – we have to use high temperatures and very high
pressures – this was invented by Fritz Haber, and it revolutionized the production of fertilizers and
explosives.
Ionization of Hydrogen
For the simple reaction
H ↔p+e (11.4.11)
we can compute everything from first principles. We have the equilibrium condition
kT log[P_H P_0/(P_p P_e)] = µ_p^0 + µ_e^0 − µ_H^0      (11.4.12)
We can treat all of these species as structureless monatomic gases, so that we can use
µ = −kT log[(kT/P)(2πmkT/h^2)^{3/2}]      (11.4.13)
The only additional issue is that we need to subtract the ionization energy I = 13.6 eV from µ_H^0.
Since m_p ≈ m_H we have
−kT log[P_H P_0/(P_p P_e)] = kT log[(kT/P_0)(2πmkT/h^2)^{3/2}] − I      (11.4.14)
P p Pe P0 h kT
Then with a bit of algebra we learn that
P_p/P_H = (kT/P_e)(2πmkT/h^2)^{3/2} e^{−I/kT}      (11.4.15)
which is called the Saha equation. Note that Pe /kT = Ne /V . If we plug in numbers for the
surface of the sun this gives
P_p/P_H ≲ 10^{−4}      (11.4.16)
so less than one atom in ten thousand are ionized. Note though that this is much, much larger than
the Boltzmann factor by itself!
But we also need to account for the redundancy because B molecules are identical. The result is
G = N_A µ_0(T, P) + N_B f(T, P) − N_B kT + N_B kT log(N_B/N_A)
where f(T, P) doesn't depend on N_A. It is what accounts for the interaction of B molecules with
the A molecules that surround them. This expression is valid as long as N_B ≪ N_A.
The chemical potentials follow from
µ_A = (∂G/∂N_A)_{T,P,N_B} = µ_0(T, P) − kT N_B/N_A      (11.5.5)
and
µ_B = (∂G/∂N_B)_{T,P,N_A} = f(T, P) + kT log(N_B/N_A)      (11.5.6)
Adding solute reduces µ_A and increases µ_B. Also, these chemical potentials depend only on the
intensive ratio N_B/N_A.
Osmotic Pressure
If we have pure solvent on one side of a barrier permeable only by solvent, and solute-solvent solution
on the other side, then solvent will want to move across the barrier to further dilute the solution.
This is called osmosis.
To prevent osmosis, we could add additional pressure to the solution side. This is the osmotic
pressure. It’s determined by equality of solvent chemical potential. So we need
NB ∂µ0
µ0 − kT = µ0 + δP (11.5.7)
NA ∂P
Note that
∂µ_0/∂P = V/N      (11.5.8)
at fixed T, N because G = Nµ, and so
(V/N_A) δP = kT N_B/N_A      (11.5.9)
or
δP = N_B kT/V      (11.5.10)
is the osmotic pressure.
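This gives surprisingly large pressures; here is a quick sketch for an illustrative solute concentration of 1 mol per liter (my choice, not a number from the text):

N_A = 6.022e23       # Avogadro's number
k_B = 1.381e-23      # J/K
T = 300.0            # K

n_B = 1.0 * N_A / 1e-3     # 1 mol of solute per liter, converted to particles per m^3
delta_P = n_B * k_B * T    # osmotic pressure, Pa
print(delta_P, delta_P / 1.013e5)    # ~2.5e6 Pa, i.e. roughly 25 atmospheres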
Boiling and Freezing Points
The boiling and freezing points of a solution are altered by the presence of the solute. Often we can
assume the solute never evaporates, eg if we consider salt in water. Intuitively, the boiling point
may increase (and the freezing point decrease) because the effective liquid surface area is decreased
due to the solute ‘taking up space’ near the surface and preventing liquid molecules from moving.
In the case of freezing, the solute takes up space for building solid bonds.
Our theorizing above provides a quantitative treatment of these effects. For boiling we have
µ_0(T, P) − kT N_B/N_A = µ_gas(T, P)      (11.5.11)
and so the NB /NA fraction of solute makes the liquid chemical potential smaller, so that higher
temperatures are needed for boiling.
If we hold the pressure fixed and vary the temperature from the pure boiling point T0 we find
µ_0(T_0, P) + (∂µ_0/∂T)(T − T_0) − kT N_B/N_A = µ_gas(T_0, P) + (∂µ_gas/∂T)(T − T_0)      (11.5.12)
and we have that
∂µ/∂T = −S/N      (11.5.13)
so that
−(T − T_0)(S_liquid − S_gas)/N_A = kT N_B/N_A      (11.5.14)
The difference in entropies between gas and liquid is L/T0 , where L is the latent heat of vaporization,
so we see that
T − T_0 = N_B kT_0^2/L      (11.5.15)
where we treat T ≈ T0 on the RHS.
A similar treatment of pressure shows that
P/P_0 = 1 − N_B/N_A      (11.5.16)
for the boiling transition – we need to be at lower pressure for boiling to occur.
The first and second derivatives of P with respect to V vanish
at the critical point, which helps us to easily identify it (eg using the van der Waals equation). A
cute way to say this is to write the vdW equation using intensive variables as
P v 3 − (P b + kT )v 2 + av − ab = 0 (11.6.2)
The critical point occurs when all three roots of the cubic coincide, so that
pc (v − vc )3 = 0 (11.6.3)
In terms of the reduced variables P̄ = P/p_c, v̄ = v/v_c, T̄ = T/T_c, this becomes
P̄ = (8/3) T̄/(v̄ − 1/3) − 3/v̄^2      (11.6.6)
This is often called the law of corresponding states. The idea being that all gases/liquids correspond
with each other once expressed in these variables. This extends the extreme equivalence among
gases implied by the ideal gas law to the vdW equation.
Furthermore, since $T_c$, $v_c$, $p_c$ only depend on $a$ and $b$, there must be a relation among them; it is that
$$\frac{p_c v_c}{kT_c} = \frac{3}{8} \qquad (11.6.7)$$
Thus all gases described by the vdW equation must obey this relation! Of course general gases
depend on more than the two parameters a, b, since vdW required truncating the expansion in the
density.
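Here is a small sympy sketch (sympy assumed available) checking that the critical values quoted above really make (11.6.2) a perfect cube, and that they give the ratio 3/8:

```python
# Check v_c = 3b, p_c = a/(27 b^2), kT_c = 8a/(27 b) against the cubic (11.6.2).
import sympy as sp

v, a, b = sp.symbols('v a b', positive=True)
vc, pc, kTc = 3*b, a/(27*b**2), 8*a/(27*b)

cubic = pc*v**3 - (pc*b + kTc)*v**2 + a*v - a*b
print(sp.simplify(cubic - pc*(v - vc)**3))   # 0, so the cubic is p_c (v - v_c)^3
print(pc*vc/kTc)                             # 3/8
```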
This universality has been confirmed in experiments, which show that all gases, no matter their
molecular makeup, behave very similarly in the vicinity of the critical point. Let’s explore that.
For $T < T_c$, so that $\bar T < 1$, the vdW equation has two solutions to
$$\bar P = \frac{8}{3}\,\frac{\bar T}{\bar v - 1/3} - \frac{3}{\bar v^2} \qquad (11.6.8)$$
corresponding to a vgas and vliquid . This means that we can write
$$\bar T = \frac{(3\bar v_l - 1)(3\bar v_g - 1)(\bar v_l + \bar v_g)}{8\,\bar v_g^2\,\bar v_l^2} \qquad (11.6.9)$$
We can expand near the critical point by expanding in $v_g - v_l$, which gives
$$\bar T \approx 1 - \frac{1}{16}(\bar v_g - \bar v_l)^2 \qquad (11.6.10)$$
or
$$\bar v_g - \bar v_l \propto \sqrt{T_c - T} \qquad (11.6.11)$$
so this tells us how the difference in molecular volumes varies near the critical temperature.
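A short numerical sketch of this expansion: take the symmetric parametrization $\bar v_l = 1 - \epsilon$, $\bar v_g = 1 + \epsilon$ (an assumption that holds to leading order near the critical point) and compare (11.6.9) with the approximation (11.6.10).

```python
# Compare the exact coexistence relation (11.6.9) with 1 - (v_g - v_l)^2/16.
def T_bar(vl, vg):
    return (3*vl - 1) * (3*vg - 1) * (vl + vg) / (8 * vg**2 * vl**2)

for eps in (0.2, 0.1, 0.05, 0.01):
    vl, vg = 1 - eps, 1 + eps
    exact = T_bar(vl, vg)
    approx = 1 - (vg - vl)**2 / 16
    print(f"eps={eps:5.2f}   T_bar={exact:.6f}   approx={approx:.6f}")
```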
We can answer a similar question about pressure with essentially no additional work, giving
$$P - P_c \propto (v - v_c)^3 \qquad (11.6.12)$$
since the first and second derivatives of P wrt volume at Pc vanish.
As a third example, let’s consider the compressibility
$$\kappa = -\frac{1}{v}\left(\frac{\partial v}{\partial P}\right)_T \qquad (11.6.13)$$
We know that at Tc the derivative of pressure wrt volume vanishes, so we must have that
$$\left(\frac{\partial P}{\partial v}\right)_{T,\,v_c} = -a(T - T_c) \qquad (11.6.14)$$
for some constant $a$, so that $\kappa \propto 1/(T - T_c)$ diverges as we approach the critical temperature.
Experimentally measured critical exponents, however, differ significantly from our predictions. In particular, the true behaviors are non-analytic, ie they don't have a nice Taylor series expansion about the critical point.
We already saw a hint of why life’s more complicated at the critical point – there are large
fluctuations, for example in the density. The fact that the compressibility
$$-\frac{1}{v}\left(\frac{\partial v}{\partial P}\right)_T \propto \frac{1}{(T - T_c)^\gamma} \qquad (11.6.17)$$
(where we thought $\gamma = 1$, and in fact $\gamma \approx 1.2$) means that the gas/liquid is becoming arbitrarily
compressible at the critical point. But this means that there will be huge, local fluctuations in the
density. In fact one can show that locally
$$\frac{\overline{\Delta n^2}}{n} = -kT\,\frac{\partial \langle n\rangle}{\partial v}\bigg|_{P,T}\,\frac{1}{v}\frac{\partial v}{\partial P}\bigg|_{N,T} \qquad (11.6.18)$$
so that fluctuations in density are in fact diverging. This also means that we can’t use an equation
of state that only accounts for average pressure, volume, and density.
Understanding how to account for this is the subject of ‘Critical Phenomena’, with close links to
Quantum Field Theory, Conformal Field Theory, and the ‘Renormalization Group’. Perhaps they’ll
be treated by a future year-long version of this course. But for now we’ll move on to study a specific
model that shows some interesting related features.
12 Interactions
Let’s study a few cases where we can include the effects of interactions.
12.1 Ising Model

As a Magnetic System
The kind of model that we’ve already studied has energy or Hamiltonian
$$E_B = -B\sum_{i=1}^N s_i \qquad (12.1.1)$$
$$s_i = \pm 1 \qquad (12.1.2)$$
The Ising model adds an additional complication, a coupling between nearest neighbor spins, so
that
$$E = -J\sum_{\langle ij\rangle} s_i s_j - B\sum_{i=1}^N s_i \qquad (12.1.3)$$
where the notation $\langle ij\rangle$ implies that we only sum over the nearest neighbors. How this actually
works depends on the dimension – one can study the Ising model in 1, 2, or more dimensions. We
can also consider different lattice shapes. We often use q to denote the number of nearest neighbors.
If J > 0 then neighboring spins prefer to be aligned, and the model acts as a ferromagnet; when
J < 0 the spins prefer to be anti-aligned and we have an anti-ferromagnet.
We can thus describe the Ising model via a partition function
$$Z = \sum_{\{s_i\}} e^{-\beta E[s_i]} \qquad (12.1.4)$$
The field $B$ and a coupling $J > 0$ will make the spins want to align, but temperature causes them to fluctuate. The natural observable to study is the magnetization
$$m = \frac{1}{N\beta}\frac{\partial}{\partial B}\log Z \qquad (12.1.5)$$
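Since the sum over configurations is finite, a tiny system can be handled by brute force. The sketch below enumerates an 8-site Ising chain with periodic boundaries (an assumed toy setup, not anything specified in the notes) and extracts m from (12.1.5) by a numerical derivative with respect to B.

```python
# Brute-force partition function and magnetization for a small 1D Ising chain.
from itertools import product
import math

def log_Z(N, J, B, beta):
    total = 0.0
    for spins in product((-1, 1), repeat=N):
        E = -J * sum(spins[i] * spins[(i + 1) % N] for i in range(N))
        E -= B * sum(spins)
        total += math.exp(-beta * E)
    return math.log(total)

N, J, beta, B = 8, 1.0, 0.5, 0.3
dB = 1e-4   # step for the numerical derivative of log Z
m = (log_Z(N, J, B + dB, beta) - log_Z(N, J, B - dB, beta)) / (2 * dB) / (N * beta)
print(f"m = {m:.4f} at B={B}, beta={beta}")
```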
As a Lattice Gas
With the same math we can say different words and view the model as a lattice gas. Since the
particles have hard cores, only one can sit at each site, so the sites are either full, ni = 1, or empty,
ni = 0. Then we can further add a reward in energy for them to sit on neighboring sites, so that
$$E = -4J\sum_{\langle ij\rangle} n_i n_j - \mu\sum_i n_i \qquad (12.1.6)$$
where $\mu$ is the chemical potential, which determines the overall particle number. This Hamiltonian is the same as the magnetic one above with $s_i = 2n_i - 1$, up to an additive constant and a shift relating $B$ and $\mu$.
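As a quick check of this mapping, here is a one-bond sympy sketch (sympy assumed available) substituting $s = 2n - 1$ into the coupling term; the extra terms linear in n just shift the chemical potential, and the constant is irrelevant.

```python
# Substitute s = 2n - 1 into a single nearest-neighbor coupling term.
import sympy as sp

J, ni, nj = sp.symbols('J n_i n_j')
si, sj = 2*ni - 1, 2*nj - 1

print(sp.expand(-J * si * sj))   # -4*J*n_i*n_j + 2*J*n_i + 2*J*n_j - J
```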
To make progress we can use the mean field (MFT) approximation. We write each spin as
$$s_i = (s_i - m) + m \qquad (12.1.7)$$
where $m$ is the average spin throughout the lattice. Then the neighboring interactions are
$$s_i s_j = (s_i - m)(s_j - m) + m(s_i + s_j) - m^2$$
The approximation comes in when we assume that the fluctuations of spins away from the average are small. This means we treat
$$(s_i - m)(s_j - m) \approx 0$$
though we are not assuming that $s_i \approx m$, as this latter statement is never true, since we will sum over $s_i = \pm 1$. The former statement is possibly true because $s_i \neq s_j$, ie they are different spins whose fluctuations need not add up coherently. In this approximation the energy simplifies greatly to
$$E_{\rm mft} = -J\sum_{\langle ij\rangle}\left[m(s_i + s_j) - m^2\right] - B\sum_i s_i = \frac{1}{2}JNqm^2 - (Jqm + B)\sum_i s_i \qquad (12.1.11)$$
where $Nq/2$ is the number of nearest neighbor pairs. So the mean field approximation has removed the interactions. Instead we have an effective magnetic field
$$B_{\rm eff} = B + Jqm$$
so that the spins see an extra contribution to the magnetic field set by the mean field of their neighbors.
Now we are back in paramagnet territory, and so we can work out that the partition function is
$$Z = e^{-\beta JNqm^2/2}\left(e^{-\beta B_{\rm eff}} + e^{\beta B_{\rm eff}}\right)^N = e^{-\beta JNqm^2/2}\, 2^N \cosh^N(\beta B_{\rm eff}) \qquad (12.1.13)$$
However, this is a function of m... yet m is the average spin, so this can't be correct unless it predicts the correct value of m. We resolve this issue by computing the magnetization from (12.1.5) and solving for m self-consistently, giving
$$m = \tanh(\beta B_{\rm eff}) = \tanh\left(\beta(Jqm + B)\right)$$
Vanishing B
Perhaps the most interesting case is B = 0. The behavior depends on the value of βJq, as can be
noted from the expansion of tanh.
If $\beta Jq < 1$, then the only solution is $m = 0$. So for $kT > Jq$, ie at high temperatures, the average magnetization vanishes. Thermal fluctuations dominate and the spins do not align.
However if $kT < Jq$, ie at low temperatures, we have a solution $m = 0$ as well as magnetized
solutions with m = ±m0 . The latter correspond to phases where the spins are aligning. It turns out
that m = 0 is unstable, in a way analogous to the unstable solutions of the van der Waals equation.
The critical temperature separating the phases is
kTc = Jq (12.1.15)
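The self-consistency condition is easy to explore numerically. The sketch below solves $m = \tanh(\beta Jqm)$ at $B = 0$ by fixed-point iteration, using illustrative units $J = k = 1$ and $q = 4$ (a square lattice); these choices are assumptions made for the example, not values from the notes.

```python
# Mean field magnetization at B = 0: iterate m -> tanh(beta*J*q*m).
import math

def mft_magnetization(T, J=1.0, q=4, m_start=0.9, iters=2000):
    beta = 1.0 / T              # units with k = 1
    m = m_start                 # start from a magnetized guess
    for _ in range(iters):
        m = math.tanh(beta * J * q * m)
    return m

# kTc = J*q = 4 in these units; below Tc a nonzero solution survives,
# above Tc the iteration collapses to m = 0 (right at Tc it decays only slowly).
for T in (2.0, 3.0, 3.9, 4.5, 6.0):
    print(f"T = {T:4.1f}   m = {mft_magnetization(T):.4f}")
```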
Free Energy
We can understand the phase structure of the Ising model by computing the free energy. In the
MFT approximation it is
$$F = -\frac{1}{\beta}\log Z = \frac{1}{2}JNqm^2 - \frac{N}{\beta}\log\left(2\cosh(\beta B_{\rm eff})\right) \qquad (12.1.16)$$
where m must still satisfy
$$m = \tanh(\beta B_{\rm eff}) \qquad (12.1.17)$$
To determine which solutions dominate, we just calculate F(m). For example when $B = 0$ and $T < T_c$ we have the possibilities $m = 0, \pm m_0$, and we find that $F(\pm m_0) < F(m = 0)$, so that the aligned phases dominate.
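A quick numerical check of this statement, using the per-spin form of (12.1.16) at $B = 0$ with the same illustrative units as above ($J = k = 1$, $q = 4$ assumed):

```python
# Compare F(m = 0) with F(m0) below Tc, per spin, at B = 0.
import math

J, q, T = 1.0, 4, 3.0    # T < Tc = J*q = 4 in these units
beta = 1.0 / T

def F_per_spin(m):
    return 0.5 * J * q * m**2 - math.log(2 * math.cosh(beta * J * q * m)) / beta

m0 = 0.9                 # solve m = tanh(beta*J*q*m) by fixed-point iteration
for _ in range(200):
    m0 = math.tanh(beta * J * q * m0)

print(f"m0 = {m0:.4f}")
print(f"F(0)  = {F_per_spin(0.0):.4f}")
print(f"F(m0) = {F_per_spin(m0):.4f}")   # lower, so the aligned phase wins
```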
Usually we can only define F in equilibrium. But we can take a maverick approach and pretend
that F (T, B, m) actually makes sense away from equilibrium, and see what it says about various
phases with different values of m. This is the Landau theory of phases. It is the beginning of a rich
subject...