Statistical Mechanics Notes
Jared Kaplan
Abstract
Various lecture notes to be supplemented by other course materials.
Contents
1 What Are We Studying?
6 Thermodynamic Potentials
9 Transport, In Brief
9.1 Heat Conduction
9.2 Viscosity
9.3 Diffusion
11 Phase Transitions
11.1 Clausius-Clapeyron
11.2 Non-Ideal Gases
11.3 Phase Transitions of Mixtures
11.4 Chemical Equilibrium
11.5 Dilute Solutions
11.6 Critical Points
12 Interactions
12.1 Ising Model
12.2 Interacting Gases
Basic Info
This is an advanced undergraduate course on thermodynamics and statistical physics. Topics
include basics of temperature, heat, and work; state-counting, probability, entropy, and the second
law; partition functions and the Boltzmann distribution; applications to engines, refrigerators, and
computation; phase transitions; basic quantum statistical mechanics and some advanced topics.
The course will use Schroeder’s An Introduction to Thermal Physics as a textbook. We will follow
the book fairly closely, though there will also be some extra supplemental material on information
theory, theoretical limits of computation, and other more advanced topics. If you’re not planning
to go on to graduate school and become a physicist, my best guess is that the subject matter and
style of thinking in this course may be the most important/useful in the physics major, as it has
significant applications in other fields, such as economics and computer science.
1 What Are We Studying?
This won’t all make sense yet, but it gives you an outline of central ideas in this course.
We’re interested in the approximate or coarse-grained properties of very large systems. You may
have heard that chemists define Avogadro's number N_A = 6.02 × 10^23 as the number of atoms in a
mole, ie in a small amount of stuff (12 grams of Carbon-12). Compare this to all the grains of sand on all of
the world's beaches ∼ 10^18, the number of stars in our galaxy ∼ 10^11, or the number of stars in the
visible universe ∼ 10^22. Atoms are small and numerous, and we'll be taking advantage of that fact!
Here are some examples of systems we’ll study:
• We’ll talk a lot about gases. Why? Because gases have a large number of degrees of freedom,
so statistics applies to them, but they’re simple enough that we can deduce their properties by
looking at one particle at a time. They also have some historical importance. But that’s it.
• Magnetization – are magnetic dipoles in a substance aligned, anti-aligned, random? Once
again, if they don't interact, it's easy, and that's part of why we study them!
• Metals. Why and how are metals all roughly the same (answer: they’re basically just gases of
electrons, but note that quantum mechanics matters a lot for their behavior).
• Phase Transitions, eg gas/liquid/solid. This is obviously a coarse-grained concept that
requires many constituents to even make sense (how could you have a liquid consisting of only
one molecule?). More interesting examples include metal/superconductor, fluid/superfluid,
gas/plasma, aligned/random magnetization, confined quarks/free quarks, Higgs/UnHiggsed,
and a huge number of other examples studied by physicists.
• Chemical reactions – when, how, how fast do they occur, and why?
Another important idea is universality – in many cases, systems with very different underlying
physics behave the same way, because statistically, ie on average, they end up being very similar.
For example, magnetization and the critical point of water both have the same physical description!
Related ideas have been incredibly important in contemporary physics, as the same tools are used
by condensed matter and high-energy theorists; eg superconductivity and the Higgs mechanism are
essentially the same thing.
Historically physicists figured out a lot about thermodynamics without understanding statistical
mechanics. Note that
• Thermodynamics deals with Temperature, Heat, Work, Entropy, Energy, etc as rather abstract
macroscopic concepts obeying certain rules or ‘laws’. It works, but it’s hard to explain why
without appealing to...
• Statistical Mechanics gets into the details of the physics of specific systems and makes statistical
predictions about what will happen. If the system isn’t too complicated, you can directly
derive thermo from stat mech. So statistical mechanics provides a much more complete
picture, and is easier to understand, but at the same time the analysis involved in stat mech is
more complicated. It was discovered by Maxwell, Boltzmann, Gibbs, and many others, who
provided a great deal of evidence for the existence of atoms themselves along the way.
A Very Brief Introduction
So what principles do we use to develop stat mech?
• Systems tend to be in typical states, ie they tend to be in and evolve towards the most likely
states. This is the second law of thermodynamics (from a stat mech viewpoint). Note
S ≡ k_B \log N \qquad (1.0.1)
is the definition of entropy, where N is the number of accessible states, and kB is a pointless
constant included for historical reasons. (You should really think of entropy as a pure number,
simply the log of the number of accessible states.)
For example, say I’m a bank robber, and I’m looking for a challenge, so instead of stealing $100
bills I hijack a huge truck full of quarters. But I end up crashing it on the freeway, and 10^6 quarters
fly everywhere, and come to rest on the pavement. How many do you expect will be heads vs tails?
The kind of reasoning you just (intuitively) employed is exactly the sort of thinking we use in stat
mech. You should learn basic statistical intuition well, as it's very useful both in physics, and in the
rest of science, engineering, and economics. It’s very likely the most universally important thing you
can learn well as a physics undergraduate.
Is there any more to stat mech than entropy? Not much! Only that
• The laws of physics have to be maintained. Mostly this means that energy is conserved.
This course will largely be an exploration of the consequences of basic statistics + physics, which
will largely mean counting + energy conservation.
Say I have 200 coins, and I restrict myself to coin configurations with the same number of heads
and tails. Now I start with the heads in one pile and the tails in the other (100 each). I play a game
where I start randomly flipping a head and a tail over simultaneously. After a while, will the two
100-coin piles still be mostly heads and mostly tails? Or will each pile have roughly half heads and
half tails? This is a pretty good toy model for heat transfer and entropy growth.
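Here's a minimal simulation sketch of this game (the number of swap steps is an arbitrary illustrative choice):

import random

# 200 coins: pile A = positions 0..99 (start all heads),
#            pile B = positions 100..199 (start all tails).
coins = [1] * 100 + [0] * 100   # 1 = heads, 0 = tails

# Each step: pick one random head and one random tail and flip both,
# so the total number of heads stays fixed at 100.
for step in range(20000):
    heads = [i for i, c in enumerate(coins) if c == 1]
    tails = [i for i, c in enumerate(coins) if c == 0]
    coins[random.choice(heads)] = 0
    coins[random.choice(tails)] = 1

print("heads in pile A:", sum(coins[:100]))   # typically ~50, not ~100
print("heads in pile B:", sum(coins[100:]))   # typically ~50, not ~0

The two piles end up looking the same, even though the dynamics never 'knows' which pile is which – this is the sense in which the most likely macrostate wins.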
• Machine Learning systems are largely built on the same conceptual foundations. In particular,
‘learning’ in ML is essentially just statistical regression, and differences between ML models
are often measured with a version of entropy. More importantly, even when ML systems are
deterministic algorithms, they are so complicated and have so many parameters that we use
statistical reasoning to understand them – this is identical to the situation in stat mech, where
we describe deterministic underlying processes using probability and statistics. (It’s also what
we do when we talk about probability in forecasting, eg when we say there’s a 30% probability
someone will win an election.)
Ideal Gas and the Energy-Temperature Relation
Gases are the harmonic oscillator of thermodynamics – the reason is that we want to study a large
number of very simple (approximately independent!) degrees of freedom in a fairly intuitive context.
Everyone breathes and has played with a balloon.
Low-density gases are nearly ideal, and turn out to satisfy (eg experiments demonstrate this)
P V = N kT (2.1.1)
where k is Boltzmann’s constant (which is sort of arbitrary and meaningless, as we’ll see, but we
have to keep it around for historical reasons).
• Note that Chemists use Avogadro's number N_A = 6.02 × 10^23 and write nR = Nk, where n is the
number of moles. This is also totally arbitrary, but it hints at something very important – there are
very big numbers floating around!
• Of course the ideal gas law is an approximation, but it’s a good approximation that’s important
to understand.
Why are we talking about this? Because the relationship between temperature and energy
is simple for an ideal gas. Given
\bar{P} = \frac{\bar{F}_{x,\,\rm on\ piston}}{A} = -\frac{\bar{F}_{x,\,\rm on\ molecule}}{A} = -\frac{m}{A}\overline{\left(\frac{\Delta v_x}{\Delta t}\right)} \qquad (2.1.2)
It's natural to use the round-trip time between collisions with the piston,
\delta t = \frac{2L}{v_x} \qquad (2.1.3)
Note that in each collision with the piston the molecule's x-velocity reverses, so
\Delta v_x = -2 v_x \qquad (2.1.4)
So we find that
\bar{P} = -\frac{m}{A}\,\frac{(-2v_x)}{2L/v_x} = \frac{m v_x^2}{V} \qquad (2.1.5)
The two factors of vx come from the momentum transfer and from the time between collisions.
If we have N molecules, the total pressure is the sum, so we can write
\bar{P} V = N m \overline{v_x^2} \qquad (2.1.6)
So far we haven’t made any assumptions. But if we assume the ideal gas law, we learn that
\frac{1}{2} kT = \frac{1}{2} m \overline{v_x^2} \qquad (2.1.7)
or the average total kinetic energy is
\frac{3}{2} kT = \frac{1}{2} m\left(\overline{v_x^2} + \overline{v_y^2} + \overline{v_z^2}\right) \qquad (2.1.8)
Cool! We have related energy to temperature, by assuming the (empirical for now) ideal gas law.
We can also get the RMS speed of the molecules as
\overline{v^2} = \frac{3kT}{m} \qquad (2.1.9)
Is the average of the squares of a set of numbers equal to the square of their average? (No.) So the
RMS speed √(v̄²) isn't the same as the average speed, but it's pretty close. (What does it being close
tell us about the shape of the distribution of speeds?)
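As a quick numerical sketch of this relation (nitrogen at room temperature is my choice of example, not one from the text):

import math

k = 1.381e-23          # Boltzmann constant, J/K
T = 300.0              # room temperature, K
m = 28 * 1.66e-27      # mass of an N2 molecule, kg

v_rms = math.sqrt(3 * k * T / m)   # from v^2 = 3kT/m
print(f"v_rms for N2 at 300 K: {v_rms:.0f} m/s")   # roughly 500 m/s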
Our result is true for ideal gases. The derivation breaks down at very low temperatures, when
the molecules get very close together, though actually the result is still true (just not our argument).
We’ll discuss this later in the course.
Equipartition of Energy
The equipartition theorem, which we'll prove later in the course, says that all 'quadratic' degrees of
freedom have average energy = (1/2) kT. This means kinetic energies and harmonic oscillator potential
energies:
• Atoms in molecules can rotate, and their bonds can vibrate.
• Solids have 3 kinetic DoF per molecule (its motion) and 3 DoF for the potential energy in bonds.
• Due to QM, some motions can be frozen out, so that they don’t contribute.
This is all useful for understanding how much heat energy various substances can absorb.
• Heat is the spontaneous flow of energy due to temperature differences.
• Work is any other transfer of energy... you do work by pushing a piston, stirring a cup, running
current through a resistor or electric motor. In these cases the internal energy of these systems
will increase, and they will usually ‘get hotter’, but we don’t call this heat.
• Note that these are both descriptions of energy transfers, not total energies in a system.
ΔU = Q + W (2.2.1)
and this is just energy conservation, sometimes called the first law of thermodynamics.
• Note these are the amounts of heat and work entering the system.
• Note that we often want to talk about infinitesimals, but note that 'dQ' and 'dW' aren't the
change in anything – heat and work flow in or out, they aren't properties of the system
itself.
Compression Work
For compression work we have
W = \vec{F} \cdot d\vec{x} \to F \Delta x \qquad (2.2.2)
for a simple piston. If compression happens slowly, so the gas stays in equilibrium, then I can replace
W = P AΔx = −P ΔV (2.2.3)
This is called quasistatic compression. To leave this regime you need to move the piston at a speed
of order the speed of sound or more (then you can make shock waves).
That was for infinitesimal changes, but for larger changes we have
W = -\int_{V_i}^{V_f} P(V)\, dV \qquad (2.2.4)
For isothermal compression of an ideal gas, U depends only on T and so doesn't change, and the heat flow is
Q = \Delta U - W = -W \qquad (2.2.7)
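As a sketch of how the work integral is used, here is a numerical check for the isothermal compression of an ideal gas, compared against the closed form W = NkT log(V_i/V_f) that follows from P = NkT/V (the particle number, temperature, and volumes below are arbitrary illustrative choices):

import numpy as np

k = 1.381e-23
N, T = 1e22, 300.0           # illustrative particle number and temperature
Vi, Vf = 2e-3, 1e-3          # compress from 2 liters down to 1 liter

V = np.linspace(Vi, Vf, 100001)
P = N * k * T / V                                            # ideal gas law at fixed T
W_numeric = -np.sum(0.5 * (P[1:] + P[:-1]) * np.diff(V))     # W = -∫ P dV, trapezoid rule
W_analytic = N * k * T * np.log(Vi / Vf)

print(W_numeric, W_analytic)    # both ≈ 28.7 J, positive since we compress the gas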
Heat Capacities
Heat capacity is the amount of heat needed to raise an object's temperature (by one degree). The
specific heat capacity is the heat capacity per unit mass.
As stated, specific heat capacity is ambiguous because Q isn’t a state function. That is
C = \frac{Q}{\Delta T} = \frac{\Delta U - W}{\Delta T} \qquad (2.2.14)
It’s possible that U only depends on T (but likely not!), but W definitely depends on the process.
• Most obvious choice is W = 0, which usually means heat capacity at constant volume
C_V = \left(\frac{\partial U}{\partial T}\right)_V \qquad (2.2.15)
This could also be called the ‘energy capacity’.
• Objects tend to expand when heated, and so it's easier to use constant pressure
C_P = \left(\frac{\Delta U + P\Delta V}{\Delta T}\right)_P = \left(\frac{\partial U}{\partial T}\right)_P + P\left(\frac{\partial V}{\partial T}\right)_P \qquad (2.2.16)
For gases this distinction is important.
Interesting and fun to try to compute CV and CP theoretically.
When the equipartition theorem applies and f is constant, we find
C_V = \frac{\partial U}{\partial T} = \frac{\partial}{\partial T}\left(\frac{f}{2} N k T\right) = \frac{f}{2} N k \qquad (2.2.17)
For a monatomic gas we have f = 3, while gases with rotational and vibrational DoF have (strictly)
larger f . So CV allows us to measure f .
Constant pressure? For an ideal gas the only difference is the addition of
\left(\frac{\partial V}{\partial T}\right)_P = \frac{\partial}{\partial T}\left(\frac{NkT}{P}\right) = \frac{Nk}{P} \qquad (2.2.18)
so that
CP = CV + N k (2.2.19)
Interestingly CP is independent of P , because for larger P the gas just expands less.
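A tiny numerical sketch of these equipartition predictions (per molecule, in units of k; the ratio C_P/C_V is included because it shows up later when we discuss adiabats and engines):

# Equipartition predictions for ideal gases: C_V = (f/2) N k and C_P = C_V + N k.
for name, f in [("monatomic (He, Ar)", 3), ("diatomic with rotations (N2, O2)", 5)]:
    cv = f / 2          # per molecule, in units of k
    cp = cv + 1
    print(f"{name}: C_V = {cv} k, C_P = {cp} k, C_P/C_V = {cp / cv:.2f}")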
Phase Transitions and Latent Heat
Sometimes you can put heat in without increasing the temperature at all! This is the situation
where there's Latent Heat; this usually occurs near a first-order phase transition. We say that
L \equiv \frac{Q}{m} \qquad (2.2.20)
is the latent heat per mass needed to accomplish the transformation. Conventionally this is at
constant pressure, as it's otherwise ambiguous. Note it takes 80 cal/g to melt ice and 540 cal/g to
boil water (while it takes 100 cal/g to heat water from freezing to boiling).
Enthalpy
Since we often work at constant pressure, it’d be convenient if we could avoid constantly having to
include compression-expansion work. We can do this by defining the enthalpy
H = U + PV (2.2.21)
This is the total energy we’d need to create the system out of nothing at constant
pressure. Or if you annihilate the system, you could get H back, since the atmosphere does some of the work for you as it contracts to fill the vacated space!
It’s useful because
ΔH = ΔU + P ΔV
= Q + Wother (2.2.22)
We could call C_P the "enthalpy capacity" while C_V is the "energy capacity". No need for any heat at all,
since the input could just be 'other' work (eg a microwave oven).
For example, boiling a mole of water at 1 atmosphere takes 40,660 J, but via the ideal gas law
P V = RT \approx 3100\ {\rm J} \qquad (2.2.24)
at the boiling temperature, so this means about 8% of the energy put in went into pushing the atmosphere away.
Two-State Systems, Microstates, and Macrostates
Let’s go back to our bank robber scenario, taking it slow.
If I have a penny, a nickel, and a dime and I flip them, there are 8 possible outcomes. If the coins
are fair, each has 1/8 probability. Note that there are 3 ways of getting 2 heads and a tail, so the
probability of that outcome (2 heads and a tail) is 3/8. But if I care about whether the tail comes
from the penny, nickel, or dime, then I might want to discriminate among those three outcomes.
This sort of choice of what to discriminate, what to care about, or what I can actually distinguish
plays an important role in stat mech. Note that
• A microstate is one of these 8 different outcomes.
• A macrostate ignores or ‘coarse-grains’ some of the information. So for example just giving
information about how many heads there were, and ignoring which coin gave which H/T
outcome, defines a macrostate.
• The number of microstates in a macrostate is called the multiplicity of the macrostate. We
often use the symbol Ω for the multiplicity.
If we label macrostates by the number of heads n, then for n = 0, 1, 2, 3 we find
p(n) = \frac{\Omega(n)}{\Omega({\rm all})} = \frac{1}{8},\ \frac{3}{8},\ \frac{3}{8},\ \frac{1}{8} \qquad (3.1.1)
How many total microstates are there if we have N coins? It's 2^N, as each can be H or T. What are
the multiplicities of the macrostates with n heads with N = 100? Well note that
\Omega(0) = 1,\quad \Omega(1) = 100,\quad \Omega(2) = \frac{100\times 99}{2},\quad \Omega(3) = \frac{100\times 99\times 98}{3\times 2} \qquad (3.1.2)
In general we have
\Omega(n) = \frac{N!}{n!(N-n)!} \qquad (3.1.3)
which is N choose n. So this gives us a direct way to compute the probability of finding a given
number of heads, even in our bank robbery scenario where N = 10^6.
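A minimal sketch of this counting (the particular values of n are illustrative):

from math import comb, sqrt, pi

N = 100
total = 2 ** N
for n in [0, 25, 50]:
    omega = comb(N, n)                 # multiplicity of the n-heads macrostate
    print(n, omega, omega / total)     # probability of exactly n heads

# The peak probability only falls off like 1/sqrt(N):
print(comb(N, N // 2) / total, sqrt(2 / (pi * N)))   # ≈ 0.0796 vs ≈ 0.0798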
We can immediately make this relevant to physics by studying the simplest conceivable magnetic
system, a two-state paramagnet. Here each individual magnetic dipole has only two states
(because of QM), so it can only point up or down, and furthermore each dipole behaves as though
its magnetic moment is independent of the others (they interact very weakly... in fact, in our
approximation, they don’t interact at all). Now H/T just becomes ↑ / ↓, so a microstate is
· · · ↑↑↑↓↑↓↓↑↓↓↑ · · · (3.1.4)
while a macrostate is specified by the total magnetization, which is just N↑ − N↓ = 2N↑ − N , and
multiplicity
\Omega(N_\uparrow) = \frac{N!}{N_\uparrow!\, N_\downarrow!} \qquad (3.1.5)
This really becomes physics if we apply an external magnetic field, so that the system has an
energy corresponding with how many little dipoles are aligned or anti-aligned with the external field.
A very large number is one like
10^{23} + 50 \approx 10^{23} \qquad (3.1.6)
which is essentially unchanged when we add an ordinary number to it, while a very very large number is one like
8973497 \times 10^{10^{23}} \approx 10^{10^{23}} \qquad (3.1.7)
which is barely changed even when we multiply it by a large number.
With very very large numbers, units don’t even matter! Be very careful though, because in many
calculations the very very large, or just very large numbers cancel out, and so you really do have to
carefully keep track of pre-factors.
Important Math
Now we will derive the special case of a very universal result, which provides information about the
aggregate distribution of many independent random events.
Since N is large, N ! is really large. How big? One way to approximate it is to note that
N! = e^{\log N!} = e^{\sum_{n=1}^{N} \log n} \qquad (3.1.8)
Approximating the sum by an integral, \sum_{n=1}^N \log n \approx \int_1^N \log x\, dx \approx N\log N - N, so that
N! \sim N^N e^{-N} \qquad (3.1.10)
This is the most basic part of Stirling’s Approximation to the factorial. The full result is
N! \approx N^N e^{-N}\sqrt{2\pi N} \qquad (3.1.11)
You can derive this more precise result by taking the large N limit of the exact formula
N! = \int_0^\infty x^N e^{-x}\, dx \qquad (3.1.12)
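A quick numerical sketch comparing both forms of Stirling's approximation with the exact factorial:

from math import factorial, sqrt, pi, e

for N in [10, 50, 100]:
    exact = factorial(N)
    basic = N**N * e**(-N)                       # N! ~ N^N e^(-N), misses the sqrt(2 pi N)
    full = N**N * e**(-N) * sqrt(2 * pi * N)     # full Stirling approximation
    print(N, basic / exact, full / exact)        # the full version is accurate to ~1/(12N)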
That was a warm-up to considering the distribution of H/T, or spins in a paramagnet.
\Omega(n) = \frac{N!}{n!(N-n)!} \approx \frac{N^N e^{-N}}{n^n e^{-n}\,(N-n)^{N-n}\, e^{n-N}} \qquad (3.1.13)
Writing n = xN and taking the log, this says that \log\Omega \approx N\left[-x\log x - (1-x)\log(1-x)\right]. The prefactor of N in the exponent means that this is very, very sharply peaked at its maximum,
which is at x = 1/2. Writing x = 1/2 + y we have
-x\log x - (1-x)\log(1-x) = \log 2 - 2y^2 - \frac{4}{3}y^4 + \cdots \qquad (3.1.15)
so that
\Omega\left(n = \left(\tfrac{1}{2} + y\right)N\right) \approx 2^N e^{-2Ny^2} \approx 2^N e^{-\frac{(2n-N)^2}{2N}} \qquad (3.1.16)
This is a special case of the central limit theorem, which is arguably the most important result
in probability and statistics. The distribution we have found is the Gaussian Distribution, or
the 'bell curve'. Its key feature is its variance, which says intuitively that
\left|n - \frac{N}{2}\right| \lesssim \text{few} \times \sqrt{N} \qquad (3.1.17)
where here √N sets the standard deviation. Thus we expect that (2n − N)/N ≲ 1/√N ≪ 1.
This means that if we flip a million fair coins, we expect the difference between the number of
heads and the number of tails to be of order 1000, and not of order 10, 000. And if the difference
was only, say, 3, then we might be suspicious that the numbers are being faked.
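A minimal simulation sketch of this claim (the number of trials is an arbitrary choice):

import random

N = 10**6
for trial in range(5):
    heads = sum(random.getrandbits(1) for _ in range(N))
    print("heads - tails =", 2 * heads - N)   # typically a few hundred, rarely beyond ~3000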
number, but it's a very small fraction of the total, namely a fraction ∼ 10^{-11}. Thus the difference
between up and down spins is minute.
We have also largely solved the problem of a (1-dimensional) random walk. Note that 'heads'
and 'tails' can be viewed as steps to the right or left. Then n_H − n_T is just the net number of steps
taken to the right, ie the displacement. This means that if we take N random steps, we should expect
to end up a distance of order √N from our starting point. This is also an important, classic result.
Note that there's a unique configuration with minimum and maximum energy, whereas there are
\frac{100!}{50!\,50!}
configurations with energy 0.
So let’s assume that we have two paramagnets, which can exchange energy between them by
flipping some of the dipoles.
• We will treat the two paramagnets as weakly coupled, so that energy flows between them
slowly and smoothly, ie energy exchange within each paramagnet is much faster than between
the two. We’ll call these energies UA and UB .
• Utotal = UA + UB is fixed, and we refer to the macrostate just via UA and UB . Note that the
number of spins in each paramagnet NA and NB are fixed.
• In an isolated system in thermal equilibrium, all accessible (ie possible) microstates are equally
probable.
A microstate might be inaccessible because it has the wrong energy, or the wrong values of conserved
charges. Otherwise it’s accessible.
Microscopically, if we can go from X → Y , we expect we can also transition from Y → X (this
is the principle of detailed balance). Really there are so many microstates that we can't possibly
‘visit’ all of them, so the real assumption is that the microstates we do see are a representative
sample of all the microstates. Sometimes this is called the ‘ergodic hypothesis’ – averaging over
time is the same as averaging over all accessible states. This isn’t obvious, but should seem plausible
– we’re assuming the transitions are ‘sufficiently random’.
For simplicity, let’s imagine that the total energy is UA + UB = 0, and the two paramagnets
have NA = NB = N dipoles each. How many configurations are there with all spins up in one
paramagnet, and all spins down in the other? Just one! But with n up spins in the first paramagnet
and N − n up spins in the second, there are
\Omega_{\rm tot} = \Omega(N, (N-n)\mu)\,\Omega(N, (n-N)\mu) = \frac{N!}{n!(N-n)!} \times \frac{N!}{n!(N-n)!} = \left(\frac{N!}{n!(N-n)!}\right)^2 \qquad (3.2.3)
Thus the number of states is vastly larger when n ≈ N/2. In fact, we can directly apply our analysis
of combinatorics to conclude that the number of states has an approximately Gaussian distribution
with width ∼ √N.
Note that this is mathematically identical to the coin flipping example from the first lecture.
where • denotes a unit of energy and | denotes a partition between neighboring oscillators. So this expression has
E = 12 and N = 9. Thus we just have a string with N − 1 of the | and E of the •; we can put the |
and • anywhere, and they're indistinguishable.
If this reasoning isn’t obvious, first consider how to count a string of N digits, say N = 7. A
configuration might be
4561327 (3.3.5)
and since the digits are all distinct, there are
7 × 6 × 5 × 4 × 3 × 2 × 1 = 7! \qquad (3.3.6)
configurations. But in the case of the • and |, there are only two symbols, so we divide to avoid
over-counting identical configurations.
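As a sketch, this 'dots and bars' counting gives Ω(N, E) = (E + N − 1)! / (E! (N − 1)!) for N oscillators sharing E energy units, which we can check by brute force for small cases (the small values below are illustrative):

from math import comb
from itertools import product

def omega_formula(N, E):
    # number of ways to distribute E indistinguishable energy units among N oscillators
    return comb(E + N - 1, E)

def omega_brute(N, E):
    # brute force: count all tuples of N non-negative occupation numbers summing to E
    return sum(1 for occ in product(range(E + 1), repeat=N) if sum(occ) == E)

for N, E in [(3, 4), (4, 5), (9, 12)]:
    check = omega_brute(N, E) if N * E <= 30 else "(brute force skipped)"
    print(N, E, omega_formula(N, E), check)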
3.4 Ideal Gases
All large systems – ie systems with many accessible states or degrees of freedom – have the property
that only a tiny fraction of macrostates have reasonably large probabilities. Let us now apply this
reasoning to the ideal gas. What is the multiplicity of states for an ideal gas?
In fact, in the ideal limit the atoms don’t interact, so... what is the multiplicity of states for a
single atom in a box with volume V ?
If you know QM then you can compute this exactly. Without QM it would seem that there are an
infinite number of possible states, since the atom could be anywhere (its location isn't quantized!).
But we can get the right answer with almost no QM at all.
• First note that the number of states must be proportional to the volume V .
• But we can’t specify a particle’s state without also specifying its momentum. Thus the number
of states must be proportional to the volume in momentum space, Vp .
Note that the momentum space volume is constrained by the total energy via p⃗² = 2mU.
We also need a constant of proportionality, but in fact that's (essentially) why Planck's constant
ℏ exists! You may have heard that δx δp ≥ ℏ/2, ie this is the uncertainty principle. So we can write
\Omega_1 = \frac{V\, V_p}{h^3} \qquad (3.4.1)
by dimensional analysis. You can confirm this more precisely if you’re familiar with QM for a free
particle in a box.
If we have two particles, then there are two added complications
• We must decide if the particles are distinguishable or not. For distinguishable particles we'd
have Ω_1², while for indistinguishable particles we'd have (1/2)Ω_1².
• We need to account for the fixed total energy via p⃗_1² + p⃗_2² = 2mU. With more particles we
have a sum over all of their momenta.
\Omega_N = \frac{1}{N!}\,\frac{V^N}{h^{3N}} \times (\text{momentum hypersphere area}) \qquad (3.4.2)
where the momentum hypersphere is the surface in momentum space picked out by the fixed total
energy.
The surface area of a D-dimensional sphere is
A = \frac{2\pi^{D/2}}{\left(\frac{D}{2}-1\right)!}\, R^{D-1} \qquad (3.4.3)
We know that R = √(2mU), where U is the total energy of the gas, and that D = 3N. Plugging this
in gives
\Omega_N = \frac{1}{N!}\,\frac{V^N}{h^{3N}}\,\frac{2\pi^{3N/2}}{\left(\frac{3N}{2}-1\right)!}\,(2mU)^{\frac{3N-1}{2}} \approx \frac{1}{N!}\,\frac{\pi^{3N/2}}{\left(\frac{3N}{2}\right)!}\,\frac{V^N}{h^{3N}}\,(2mU)^{\frac{3N}{2}} \qquad (3.4.4)
where we threw away some 2s and −1s because they are totally insignificant. This eliminates the
distinction between ‘area’ and ‘volume’ in momentum space, because momentum space has the
enormous dimension 3N . With numbers this big, units don’t really matter!
Note that we can write this formula as
\Omega_N = f(N)\, V^N U^{3N/2} \qquad (3.4.5)
Since UA + UB = Utot , we see that multiplicity is maximized very sharply by UA = UB = Utot /2. And
the peak is very, very sharp as a function of U_A – we once again have a Gaussian distribution for
fluctuations about this equilibrium. Though the exact result is the simpler function [x(1 − x)]^P,
with x = U_A/U_tot, for very, very large P = 3N/2. It should be fairly intuitive that this function is
very sharply maximized at x = 1/2.
If the partition can move, then it will shift around so that
VA = VB = Vtot /2 (3.4.8)
as well. This will happen ‘automatically’, ie the pressures on both sides will adjust to push the
partition into this configuration. Once again, the peak is extremely sharp.
Finally, we could imagine N_A ≠ N_B to start, but we could poke a hole in the partition to see
how the molecules move. Here the analysis is a bit more complicated – it requires using Stirling’s
formula again – but once again there’s a very, very sharply peaked distribution.
Note that as a special case, putting all the molecules on one side is roughly 2^{−N} times as likely as
having them roughly equally distributed. This case is exactly like flipping N coins!
3.5 Entropy
We have now seen in many cases that the highest-multiplicity macrostates contain vastly, vastly more
microstates than the low-multiplicity macrostates, and that as a result systems tend to evolve towards
these most likely states. Multiplicity tends to increase.
For a variety of reasons, it’s useful to replace the multiplicity with a smaller number, its logarithm,
which is the entropy
S \equiv \log \Omega \qquad (3.5.1)
or, restoring the conventional (historical) constant,
S \equiv k \log \Omega \qquad (3.5.2)
A nice feature is that entropies simply add,
S_{\rm tot} = S_A + S_B \qquad (3.5.3)
when Ωtot = ΩA × ΩB . This is quite convenient for a variety of reasons. For one thing, the entropy is
a much smoother function of other thermodynamic variables (ie it doesn’t have an extremely sharp
peak).
Since the logarithm is a monotonic function, the tendency of multiplicity to increase is the same
thing as saying that entropy tends to increase:
δS ≥ 0 (3.5.4)
Entropy of an Ideal Gas
We can compute the entropy of an ideal gas from the multiplicity formula
" #
1 π 3N/2 V N 3N
S = log (2mU ) 2
N ! ( 3N
2
)! h 3N
3 3 3N V
= N k (1 − log N ) + π + 1 − log + log 3 + log 2mU
2 2 2 h
" 3/2 !#
5 V 4πmU
= Nk + log
2 N 3N h2
" 3/2 !#
5 1 4πmu
= Nk + log (3.5.5)
2 n 3h2
where u = U/N and n = N/V are the energy per particle and the number density, respectively. Apparently
this is called the Sackur-Tetrode equation, but I didn’t remember that.
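Here is a minimal numerical sketch of (3.5.5) for a mole of helium at room temperature and atmospheric pressure (my choice of example values), using U = (3/2)NkT:

import math

k = 1.381e-23; h = 6.626e-34
N = 6.022e23                    # one mole of atoms
m = 4.0 * 1.66e-27              # helium atom mass, kg
T, P = 300.0, 1.013e5
V = N * k * T / P               # ideal gas volume
U = 1.5 * N * k * T

S = N * k * (2.5 + math.log((V / N) * (4 * math.pi * m * U / (3 * N * h**2))**1.5))
print(S, "J/K")                 # roughly 126 J/K
print(S / (N * k))              # entropy per atom in units of k, roughly 15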
What would this result have looked like if we had used distinguishable particles??? The problem
associated with this is called the Gibbs paradox. We’ll talk about extensive vs intensive quantities
more thoroughly later.
Entropy dependence on volume (with fixed U and N ) is simplest, ie just
\Delta S = Nk\log\frac{V_f}{V_i} \qquad (3.5.6)
Note that this makes sense in terms of doubling the volume and gas particles being in either the left
or right half of the new container.
This formula applies both to expansion that does work, eg against a piston (where the gas
does work, and heat flows in to replace the lost internal energy) and to free expansion of a gas.
These are quite different scenarios, but result in the same change in entropy.
What about increasing the energy U while fixing V and N ? Where did the exponent of 3/2
come from, and why does it take that value?
Entropy of Mixing
Another way to create entropy is to mix two different substances together. There are many more
states where they’re all mixed up than when they are separated, so mixing creates entropy.
One way to think about it is if we have gas A in one side of a partition, and gas B on the other
side, then when they mix they each double in volume, so we end up with
\Delta S_{\rm mixing} = 2Nk\log 2
in extra entropy. It's important to note that this only occurs if the two gases are different
(distinguishable).
Note that doubling the number of molecules of the same type of gas in a fixed volume does not
double the entropy. This is because while the prefactor N → 2N, we also have \log\frac{1}{n} \to \log\frac{1}{2n}, and
so the change in the entropy is strictly less than a factor of 2.
(Now you may wonder if the entropy ever goes down; it doesn’t, because for N (X − log N ) to be
a decreasing function of N , we must have S < 0, which means a multiplicity less than 1!)
In fact, the very definition of entropy strongly suggests that S ≥ 0 always.
• Free expansion, or processes requiring heat exchange will increase entropy. We’ll see soon that
reversible changes in volume must be quasistatic with W = −P ΔV , and no heat flow.
• From a quantum point of view, these quasistatic processes slowly change the energy levels of
atoms in a gas, but they do not change the occupancy of any energy levels. This is what keeps
multiplicity constant.
• We saw earlier that heat ‘wants’ to flow precisely because heat flow increases entropy
(multiplicity). The only way for heat to flow without a change in entropy is if it flows
extremely slowly, so that we can take a limit where multiplicity almost isn’t changing.
To see what we mean by the last point, consider the entropy of two gases separated by a partition.
As a function of the internal energies, the total entropy is
S_{\rm tot} = \frac{3}{2}Nk\log\Big((U_{\rm tot} - \delta U)(U_{\rm tot} + \delta U)\Big) \qquad (3.5.9)
So we see that as a function of δU, the multiplicity does not change when δU ≪ U_tot. More precisely
\frac{\partial S_{\rm tot}}{\partial(\delta U)} = 0 \qquad (3.5.10)
at δU = 0. So if we keep δU infinitesimal, we can cause heat to move energy across the partition
without any change in entropy or multiplicity. In fact, this idea will turn out to be important...
The only important ideas here are that entropy/multiplicity tends to increase until we reach
equilibrium (so it stops increasing once we're in equilibrium), and that energy can be shared between
sub-systems, but total energy is conserved.
If we divide our total system into sub-systems A and B, then we have UA + UB = Utot is fixed,
and total entropy Stot = SA + SB is maximized. This means that
\left.\frac{\partial S_{\rm tot}}{\partial U_A}\right|_{\rm equilibrium} = 0 \qquad (4.1.1)
But
\frac{\partial S_{\rm tot}}{\partial U_A} = \frac{\partial S_A}{\partial U_A} + \frac{\partial S_B}{\partial U_A} = \frac{\partial S_A}{\partial U_A} - \frac{\partial S_B}{\partial U_B} \qquad (4.1.2)
where to get the second line we used energy conservation, dUA = −dUB . Thus we see that at
equilibrium
\frac{\partial S_A}{\partial U_A} = \frac{\partial S_B}{\partial U_B} \qquad (4.1.3)
So the derivative of the entropy with respect to energy is the thing that’s the same between
substances in thermal equilibrium.
If we look at units, we note that the energy is in the denominator, so it turns out that temperature
is really defined to be
\frac{1}{T} \equiv \left(\frac{\partial S}{\partial U}\right)_{N,V} \qquad (4.1.4)
where the subscript means that we’re holding other quantities like the number of particles and the
volume fixed.
Let’s check something basic – does heat flow from larger temperatures to smaller temperatures?
The infinitesimal change in the entropy is
\delta S = \frac{1}{T_A}\delta U_A + \frac{1}{T_B}\delta U_B = \left(\frac{1}{T_A} - \frac{1}{T_B}\right)\delta U_A \qquad (4.1.5)
So we see that to increase the total entropy (as required by the 2nd law of thermodynamics), if
TA < TB then we need to have δUA > 0 – that is, the energy of the low-temperature sub-system
increases, as expected.
What if we hadn’t known about entropy, but were just using multiplicity? Then we would have
found that
\frac{\partial \Omega_{\rm tot}}{\partial U_A} = \Omega_B\frac{\partial \Omega_A}{\partial U_A} + \Omega_A\frac{\partial \Omega_B}{\partial U_A} = \Omega_A\Omega_B\left(\frac{1}{\Omega_A}\frac{\partial\Omega_A}{\partial U_A} - \frac{1}{\Omega_B}\frac{\partial\Omega_B}{\partial U_B}\right) \qquad (4.1.6)
and we would have learned through this computation that entropy was the more natural quantity,
rather than multiplicity. I emphasize this to make it clear that entropy isn’t arbitrary, but is
actually a natural quantity in this context.
Note that we now have a crank to turn – whenever there’s a quantity X that’s conserved,
we can consider what happens when two systems are brought into contact, and find the equilibrium
configuration where S(XA , XB ) is maximized subject to the constraint that XA + XB is constant.
We can do this for X = Volume, Number of Particles, and other quantities.
Examples
For an Einstein Solid, we computed S(U ) a while ago and found
u U
S = N k log = N k log + Nk (4.1.7)
N
where is some energy unit in the oscillators. So the temperature is
U
T = (4.1.8)
Nk
This is what the equipartition theorem would predict, since a harmonic oscillator has a kinetic
energy and a potential.
More interestingly, for an monatomic ideal gas we found
S = Nk\log V + \frac{3}{2}Nk\log U + k\log f(N) \qquad (4.1.9)
Thus we find that
U = \frac{3}{2}NkT \qquad (4.1.10)
which once again agrees with equipartition.
1. Compute S in terms of U and other quantities.
We’ll learn a quite different and often more efficient route in a few lectures.
• If the volume is changing, but the process is quasistatic, then this rule also applies.
In some examples CV is approximately constant. For instance if we heat a cup of water from room
temperature to boiling, we have
\Delta S \approx \int_{293}^{373} \frac{840\ {\rm J/K}}{T}\, dT = (840\ {\rm J/K})\log\frac{373}{293} \approx 200\ {\rm J/K} \approx 1.5\times 10^{25}\ k \qquad (4.2.6)
In fundamental units this is an increase of 1.5 × 1025 , so that the multiplicity increases by a factor
of eΔS , which is a truly large number. This is about 2 per molecule.
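A quick sketch checking these numbers (assuming, as the 840 J/K suggests, a cup of roughly 200 g of water):

import math

k = 1.381e-23
C = 840.0                        # heat capacity of the cup of water, J/K
Ti, Tf = 293.0, 373.0

dS = C * math.log(Tf / Ti)                   # in J/K
N_molecules = (200.0 / 18.0) * 6.022e23      # ~200 g of water
print(dS, "J/K")                             # ≈ 200 J/K
print(dS / k)                                # ≈ 1.5e25 in fundamental units
print(dS / k / N_molecules)                  # ≈ 2 per molecule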
If we knew CV all the way down to zero, then we could compute
S_f - S(0) = \int_0^{T_f} \frac{C_V}{T}\, dT \qquad (4.2.7)
So what is S(0)? Intuitively, it should be S(0) = 0, so that Ω = 1, the unique lowest energy state.
Unless there isn’t a unique ground state.
In practice many systems have residual entropy at (effective) T ≈ 0, due to eg re-arrangements
of molecules in a crystal that cost very, very little energy. There can also be residual entropy from
isotope mixing. (Interestingly Helium actually re-arranges its isotopes at T = 0, as it remains a
liquid.)
Note that to avoid a divergence in our definitions, it would seem that
CV → 0 as T → 0 (4.2.8)
This, and the statement that S(0) = 0, are both sometimes called the third law of thermodynamics.
Apparently ideal gas formulas for heat capacities must be wrong as T → 0, as we have already
observed.
which means that N↑ = N/2 for U = 0. Where is multiplicity maximized?
We see that as a function of energy U , multiplicity is maximized at U = 0, and it goes down when
we increase or decrease U ! This is very different from most other systems, as typically multiplicity
increases with energy, but not here.
Let’s draw the plot of S(U ). In the minimum energy state the graph is very steep, so the system
wants to absorb energy, as is typical for systems at very low energy. But at U = 0 the slope goes to
zero, so in fact that system stops wanting to absorb more energy. And then for U > 0 the slope
becomes negative, so the paramagnet wants to spontaneously give up energy!
But remember that
T \equiv \left(\frac{\partial S}{\partial U}\right)^{-1} \qquad (4.3.6)
Thus when S′(U) = 0, we have T = ∞. And when S′(U) < 0, we have that T is negative.
Note that negative temperatures are larger than positive temperatures, insofar as systems at
negative temperature give up energy more freely than systems at any positive temperature. When
|T| is small and T < 0, the temperature is at its effective maximum.
It’s possible to actually do experiments on systems at negative temperature using nuclear
paramagnets by simply flipping the external magnetic field. Nuclear dipoles are useful because they
don’t equilibrate with the rest of the system, so they can be studied in isolation.
Negative temperature can only occur in equilibrium for systems with bounded total energy and
finite number of degrees of freedom.
Aside from simply demonstrating a new phenomenon, this example is useful for emphasizing
that entropy is more fundamental than temperature.
Explicit Formulae
It’s straightforward to apply Stirling’s approximation to get an analytic solution for large N .
The entropy is
This goes to zero very fast at both low and high temperature, and is maximized at T → ∞ (with
either sign).
All this work wasn't for nothing. For physical magnets at room temperature, kT ≈ 0.025 eV, while
if µ = µ_B ≈ 5.8 × 10^{-5} eV/T is the Bohr magneton (the value for an electron), then µB ∼ 10^{-5} eV
even for strong magnets. Thus µB/(kT) ≪ 1. In that limit
tanh x ≈ x (4.3.12)
equilibrium. If the partition were to move without this assumption, energies would not be fixed, and
things would be more complicated.
Since TA = TB and T × S has units of energy, we can identify the pressure by
P = T\left(\frac{\partial S}{\partial V}\right)_{U,N} \qquad (4.4.4)
For an ideal gas we found
S = Nk\log V + \cdots \qquad (4.4.5)
where the ellipsis denotes functions of U and N that are independent of V . So we find that
P = T\frac{\partial}{\partial V}\left(Nk\log V\right) = \frac{NkT}{V} \qquad (4.4.6)
So we have derived the ideal gas law! Or alternatively, we have verified our expression for the
pressure in a familiar example.
Thermodynamic Identity
We can summarize the relation between entropy, energy, and volume by a thermodynamic identity.
Let’s consider how S changes if we change U and V a little. This will be
dS = \left(\frac{\partial S}{\partial U}\right)_V dU + \left(\frac{\partial S}{\partial V}\right)_U dV \qquad (4.4.7)
where we fix volume when we vary U , and fix U when we vary the volume. We can recognize these
quantities and write
dS = \frac{1}{T}\,dU + \frac{P}{T}\,dV \qquad (4.4.8)
This is most often written as
dU = T dS − P dV (4.4.9)
This is true for any system where T and P make sense, and nothing else is changing (eg number of
particles). This formula reminds us how to compute everything via partial derivatives.
It also looks a lot like the first law,
dU = Q + W \qquad (4.4.10)
where we interpret Q ∼ T dS and W ∼ −P dV . Can we make this precise?
Not always. This is true if the volume changes quasistatically, if there’s no other work done, and
if no other relevant variables are changing. Then we know that W = −P dV , so that
Q = T dS (quasistatic) (4.4.11)
This means that ΔS = Q/T , even if there’s work being done. If the temperature changes, we can
still compute
(\Delta S)_P = \int \frac{C_P}{T}\, dT \qquad (4.4.12)
if the process is quasistatic.
Also, if Q = 0 and the process is quasistatic, then the entropy doesn't change – quasistatic adiabatic
processes are isentropic.
What if the compression isn't quasistatic? Then
W > -P\, dV \qquad (4.4.14)
because we hit the molecules harder than in a quasistatic process. However, we can choose to only
move the piston infinitesimally (jiggling it back and forth), so that the volume barely changes while
we still do work on the gas, and
dS = \frac{dU}{T} + \frac{P}{T}\,dV > \frac{Q}{T} \qquad (4.4.15)
This is really just a way of saying that we’re doing work on the gas while keeping V constant, and so
U , T , and S will increase. It looks like compression work but it’s ‘other’ work.
A more interesting example is free expansion of the gas into a vacuum. No work is done and
no heat flows, but S increases.
It’s easy to create more entropy, with or without work and heat. But we can’t ever decrease it.
This is the quantity that’s the same for two systems in equilibrium. It’s the potential for sharing
particles.
Since we added a minus sign, if two systems aren’t in equilibrium the one with smaller µ gains
particles. This is like temperature. Particles flow towards low chemical potential.
We can generalize the thermodynamic identity as
dS = \frac{1}{T}\,dU + \frac{P}{T}\,dV - \frac{\mu}{T}\,dN \qquad (4.5.3)
where the minus sign comes from the definition. Thus
dU = T dS − P dV + µdN (4.5.4)
dU = µdN (4.5.5)
This is how much energy changes when we add a particle and keep S, V fixed.
Usually to add particles without changing S, you need to remove energy. But if you need to give
the particle potential energy to add it, this contributes to µ.
Let’s compute µ for an ideal gas. We take
S = Nk\left[\log\left(V\left(\frac{4\pi m U}{3h^2}\right)^{3/2}\right) - \log N^{5/2}\right] + \frac{5}{2}Nk \qquad (4.5.7)
Thus we have
\mu = -T\left(\frac{\partial S}{\partial N}\right)_{U,V} = -kT\left[\log\left(V\left(\frac{4\pi m U}{3h^2}\right)^{3/2}\right) - \log N^{5/2}\right] - \frac{5}{2}kT + TNk\,\frac{5}{2N}
= -kT\log\left[\frac{V}{N}\left(\frac{4\pi m U}{3Nh^2}\right)^{3/2}\right] = -kT\log\left[\frac{V}{N}\left(\frac{2\pi m k T}{h^2}\right)^{3/2}\right] \qquad (4.5.8)
where in the last step we used U = \frac{3}{2}NkT.
For gas at room temperature and atmospheric pressure, V/N ≈ 4 × 10^{-26} m³, whereas the other
factor is, for helium, about 10^{-31} m³, and so the logarithm is 12.7 and µ = −0.32 eV for helium at
300 K and 10^5 N/m².
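A small numerical sketch reproducing these numbers for helium:

import math

k = 1.381e-23; h = 6.626e-34
T, P = 300.0, 1.0e5
m = 4.0 * 1.66e-27                # helium atom mass, kg

V_over_N = k * T / P                                        # ≈ 4e-26 m^3
quantum_volume = (h**2 / (2 * math.pi * m * k * T))**1.5    # ≈ 1e-31 m^3
log_term = math.log(V_over_N / quantum_volume)              # ≈ 12.7
mu = -k * T * log_term
print(V_over_N, quantum_volume, log_term)
print(mu / 1.602e-19, "eV")                                 # ≈ -0.32 eV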
Increasing the density of particles increases the chemical potential, as particles don’t ‘want’ to
be there.
So far we used only one particle type, but with many types we have a chemical potential for
each. If two systems are in diffusive equilibrium all the chemical potentials must be the same.
So we see that through the idea of maximizing entropy/multiplicity, we have been led to define
T , P , and µ, and the relation
dU = T dS − P dV + µdN (4.5.9)
Extensive quantities scale with the amount of stuff in the system:
V, N, S, U, H, mass \qquad (4.6.1)
Intensive quantities don't:
T, P, µ, density \qquad (4.6.2)
The product of an extensive and an intensive quantity is extensive,
for example volume × density = mass. Conversely, ratios of extensive quantities are
intensive.
Note that you should never add an intensive and an extensive quantity – this is often wrong by
dimensional analysis, but in cases where it’s not, it also doesn’t make sense conceptually.
So intensive/extensive is like a new kind of dimensional analysis that you can use to check what
you’re calculating.
Engines can’t convert heat into work perfectly. As we’ll see, the reason is that absorbing heat
increases the entropy of the engine, and if it's operating on a cycle, then it must somehow get rid
of that entropy before it can cycle again. This leads to inefficiency. Specifically, the engine must
dump waste heat into the environment, and so only some of the heat it initially absorbed can be
converted into work.
We can actually say a lot about this without knowing anything specific about how the engine
works. We can just represent it with a hot reservoir, the engine itself, and a cold reservoir.
Energy conservation says that
Qh = W + Qc (5.0.1)
Carnot Cycle
The arguments above also suggest how to build a maximally efficient engine – at least in theory.
Let’s see how it would work.
The basic point is that we don't want to produce excess entropy. This means that heat should only be
absorbed while the gas is at (essentially) T_h, and only expelled while it is at (essentially) T_c.
This basically fixes the behavior of the engine, as pointed out by Sadi Carnot in 1824.
To have heat transfers exactly at Th and Tc , we must expand or contract the gas isothermally.
To otherwise not produce any entropy, we should have adiabatic expansion and contraction for the
other parts of the cycle, ie with Q = 0. That’s how we get the gas from Th to Tc and back.
You should do problem 4.5 to check that this all works out, but based on our understanding of
energy conservation and the 2nd law of thermodynamics, it should actually be ‘obvious’ that this is
a maximally efficient engine.
Note that Carnot engines are extremely impractical, because they must operate extremely slowly
to avoid producing excess entropy.
It’s easier to draw the Carnot cycle in T and S space rather than P and V space.
5.1 Refrigerators
Theoretically, a refrigerator is just a heat engine run in reverse.
We define the coefficient of performance of a refrigerator as
\mathrm{COP} = \frac{Q_c}{W} = \frac{Q_c}{Q_h - Q_c} = \frac{1}{\frac{Q_h}{Q_c} - 1} \qquad (5.1.1)
We cannot just use the inequality from our discussion of engines, because heat flows in the opposite
direction.
Instead, the entropy dumped into the hot reservoir must be at least as large as that drawn from
the cold reservoir, which means that
\frac{Q_h}{T_h} \geq \frac{Q_c}{T_c} \qquad (5.1.2)
so that
\frac{Q_h}{Q_c} \geq \frac{T_h}{T_c} \qquad (5.1.3)
This then tells us that
\mathrm{COP} \leq \frac{1}{\frac{T_h}{T_c} - 1} = \frac{T_c}{T_h - T_c} \qquad (5.1.4)
So in theory we can refrigerate very, very well when T_c ≈ T_h. This should be intuitive, because in
this limit we need not transfer almost any entropy while transferring heat.
Conversely, if T_c ≪ T_h then the performance is very poor. It takes an immense amount of work
to cool something towards absolute zero.
It should also be clear from the way we derived it that the maximally efficient refrigerator would
be a Carnot cycle run in reverse.
Brief Timeline
Carnot came up with ingenious arguments that engines couldn't be more efficient than the Carnot
cycle, and that they must always produce waste heat. But he didn’t distinguish very clearly between
entropy and heat. Clausius defined entropy (and coined the term) clearly as Q/T , but he didn’t
know what it really was. Boltzmann mostly figured this out by 1877.
This leads to
e = 1 - \frac{rT_3 - rT_2}{T_3 - T_2} = 1 - r \qquad (5.2.6)
where r = \left(\frac{V_2}{V_1}\right)^{\gamma - 1}. So in other words, the overall efficiency is
e = 1 - \left(\frac{V_2}{V_1}\right)^{\gamma - 1} \qquad (5.2.7)
For air γ = 7/5 and we might have a compression ratio V_1/V_2 = 8, so that the efficiency is around 56%.
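A one-line numerical check of this number:

gamma, compression_ratio = 7 / 5, 8
e_otto = 1 - (1 / compression_ratio)**(gamma - 1)
print(e_otto)   # ≈ 0.565, ie about 56%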
Recall that TV^{γ−1} is fixed on an adiabat, so to compare this efficiency with Carnot we can solve
for V ratios in terms of T ratios, and we find
e = 1 - \frac{T_1}{T_2} = 1 - \frac{T_4}{T_3} \qquad (5.2.8)
which doesn't involve the extreme temperatures, so it is less efficient than Carnot. In practice real
gasoline engines are 20-30% efficient.
You can get greater efficiency with higher compression ratio, but the fuel will pre-ignite once
it gets too hot. We can take advantage of this using diesel engines, which simply use heat from
compression to ignite the mixture. They spray the fuel/air mixture as the piston starts to move
outward, meaning that (to some approximation) they have a constant volume and a constant pressure
piece in their cycles, which means computing their efficiency is more complicated. They ultimately
have higher efficiency (around 40%) using very high compression ratios, which are limited by not
melting the engine.
Steam Engines
See the tables and diagrams in the book.
These use a Rankine cycle, and go outside the ideal gas regime, as the steam condenses!
Here water gets pumped in, heats into steam and pushes a turbine (as it cools and decreases in
pressure), then is condensed back into water.
We can’t compute the efficiency from first principles. But under constant pressure conditions,
the heat absorbed is equal to the change in enthalpy H. So we can write the efficiency as
e = 1 - \frac{Q_c}{Q_h} = 1 - \frac{H_4 - H_1}{H_3 - H_2} \approx 1 - \frac{H_4 - H_1}{H_3 - H_1} \qquad (5.2.9)
where the last approximation is good because the pump adds very little energy to the water, and
the P V term is small for liquids.
There are tables called ‘steam tables’ that simply list the enthalpies and entropies of steam, water,
and water-steam mixtures in various conditions. We need the entropies because for the steam+water
mixture, we only get H by using the fact that 3 → 4 is an adiabat, so S3 = S4 , allowing us to use
the S values of the tables to get H.
5.3 Real Refrigerators
Real refrigerators follow a cycle much like the inverse of the Rankine cycle. The working substance
changes back and forth between a liquid and a gas. That liquid-gas transition (boiling point) must
be much lower in a refrigerator though, because we want to get to lower temperatures.
A variety of fluids have been used, including carbon dioxide, ammonia (dangerous), Freon, and
now HFC-134a.
In the PV diagram, starting from point 1 the gas is compressed adiabatically, raising P and T. Then
from 2 to 3 it gives up heat and liquefies in the condenser. Then from 3 to 4 it passes through
a throttling valve – a narrow opening – emerging on the other side at much lower pressure and
temperature. Finally it absorbs heat and turns back into a gas in the evaporator.
We can write the COP using enthalpies
\mathrm{COP} = \frac{Q_c}{Q_h - Q_c} = \frac{H_1 - H_4}{H_2 - H_3 - H_1 + H_4} \qquad (5.3.1)
Enthalpies at points 1, 2, and 3 can be looked up; for point 2 we can assume S is constant during compression.
We can understand point 4 by thinking more about throttling.
Throttling
Throttling, or the Joule-Thomson process involves pushing a fluid through a tiny hole or plug from
a region with pressure Pi to one with pressure Pf . Initial and final volumes are Vi and Vf . There is
no heat flow, so
U_f - U_i = Q + W = 0 + W_{\rm left} + W_{\rm right} \qquad (5.3.2)
which gives
Uf − Ui = Pi Vi − Pf Vf (5.3.3)
or
Uf + Pf Vf = Ui + Pi Vi (5.3.4)
or just conservation of enthalpy.
The purpose is to cool the fluid below the cold reservoir temperature, so it can absorb heat. This
wouldn't work if the fluid were an ideal gas, since H = \frac{f+2}{2}NkT depends only on the temperature.
But in a dense gas or liquid, the energy contains a potential term which decreases (due to
attractive forces) when the molecules get close together. So
H = Upot + Ukin + P V (5.3.5)
and so as the gas molecules get further apart, Upot increases, and the kinetic energy and temperature
tend to decrease.
If we use H3 = H4 in refrigeration, then we have
\mathrm{COP} = \frac{H_1 - H_3}{H_2 - H_1} \qquad (5.3.6)
and so we just need to look up the enthalpies.
Cooling MORE
See the book for some fun reading.
• You can make dry ice by liquefying CO2 at higher pressure; then when you throttle it you get a
dry ice frost residue.
• Making liquid Nitrogen is more difficult. You can just throttle for a while, but you need to
start at higher pressure.
• Then you need to do it more efficiently, so you can use a heat exchanger to cool one gas as
another is throttled.
However, throttling isn’t good enough for hydrogen or helium, as these gases become hotter
when throttled. This is because attractive interactions between gas molecules are very weak, so
interactions are dominated by positive potential energies (repulsion). Hydrogen and Helium were
first liquefied in 1898 and 1908 respectively.
One can use further evaporative cooling to get from 4.2 K (boiling point of helium at 1 atm) to
about 1 K. To go even further need
• A helium dilution refrigerator, where 3 He evaporates into a liquid bath with 4 He. We use the
3
He to take heat from the 4 He. This can get us from 1K to a few milliK.
6 Thermodynamic Potentials
Forces push us towards minima of energies. Probability pushes us towards maximization of entropy.
Are there thermodynamic potentials mixing U and S that combine to tell us how a system will
tend to change?
And a related question – we have seen that enthalpy H is useful when we are in an environment
with constant pressure. What if instead we’re connected to a reservoir at constant T ? Perhaps
there’s a natural way to account for heat transfer to and from this reservoir, just as H automatically
accounts for work due to the ambient pressure?
Let’s assume the environment acts as a reservoir of energy, so that it can absorb or release
unlimited energy without any change in its temperature. If S_R is the reservoir entropy, then at fixed T, V, N the total entropy change is
dS_{\rm total} = dS + dS_R = dS - \frac{dU}{T} = -\frac{1}{T}\,d(U - TS)
so if we define
F = U - TS \qquad (6.0.5)
the second law dS_total ≥ 0 becomes
dF \leq 0 \qquad (6.0.6)
which means that F can only decrease. So it's a kind of 'potential' such that we can only move to
lower and lower values of F at fixed T, V, N .
And now we can just ignore the reservoir and account for the 2nd law using F . The system will
just try to minimize F .
We can play the same game letting the volume of the system change, and working at constant
pressure, so that
dS_{\rm total} = dS - \frac{1}{T}\,dU - \frac{P}{T}\,dV = -\frac{1}{T}\left(dU - T\,dS + P\,dV\right) = -\frac{1}{T}\,dG \qquad (6.0.7)
T T T T
where
G = U − TS + PV (6.0.8)
and we work at constant T and constant P. Thus in this case it's the Gibbs free energy that must
decrease.
We can summarize this as
• At constant U and V, S increases.
• At constant T and V, F decreases.
• At constant T and P, G decreases.
We can also understand these potentials more intuitively. In
F \equiv U - TS \qquad (6.0.9)
S 'wants' to increase, and (one may think that) U 'wants' to decrease. The latter is only because it
can give energy to the environment, and thereby increase the total entropy of the universe.
Wait what? Here what’s happening is that if U decreases without S increasing, then this means
that the entropy of the system isn’t changing – but the entropy of the environment will definitely
change, by −dU/T. So the fact that objects 'want' to roll down hills (and then not roll back up again!)
ultimately comes from the 2nd law of thermodynamics.
Note also that the desire to give away energy isn’t very great if T is large, since then the system
may not want to do this and risk ‘losing valuable entropy’. This is one way of seeing that at high
temperature, our system will have a lot of energy.
Reasoning for the Gibbs free energy
G ≡ U + PV − TS (6.0.10)
is similar, except that the environment can also ‘take’ volume from the system.
There is another way to think about F. If we create the system from nothing in an environment at constant temperature T, then looking at
F = U - TS \qquad (6.0.11)
we see that we can get the heat Q = TΔS = TS from the environment for free when we make the
system, so the work we need to provide is only F.
Conversely, F is the amount of energy that comes out as work if we annihilate the system,
because we have to dump some heat Q = TS into the environment just to get rid of the system's
entropy. So the available or ‘free’ energy is F .
Above we meant all work, but if we’re in a constant pressure environment then we get some work
for free, and so in that context we can (combining H and F , in some sense) define
G = U − TS + PV (6.0.12)
which is the system's energy, minus the heat term, plus the work the constant-P atmosphere will
automatically do.
We call U, H, F, G thermodynamic potentials, which differ by adding P V or −T S. They are all
useful for considering changes in the system, and changes in the relevant potential. Note that
ΔF = ΔU − T ΔS = Q + W − T ΔS (6.0.13)
If no new entropy is created then Q = T ΔS, otherwise Q < T ΔS, and so we have
ΔF ≤ W (6.0.14)
at constant T . This includes all work done on the system. Similarly, we find that
ΔG ≤ Wother (6.0.15)
dU = T dS − P dV + µdN (6.0.16)
In this sense we are automatically thinking about U as a function of S, V, N, since it's these quantities
that we’re directly varying.
We can use this plus the definitions of H, F, G to write down a ton of other thermodynamic
identities. The math that we use is the ‘Legendre transform’. Given a convex function f (y), it
makes it possible to effectively ‘switch variables’ and treat f 0 (y) = x as the independent variable.
See 0806.1147 for a nice discussion.
Say we have a function so that its differential is
df = x\,dy \qquad (6.0.17)
If we define
g = xy - f \qquad (6.0.18)
then we have
dg = y\,dx + x\,dy - df = y\,dx \qquad (6.0.19)
So now we can instead use our new function g(x) in place of f(y).
For example, perhaps instead of using the entropy S(E) as a function of E, we would instead
like to make ∂S/∂E = 1/T, ie the temperature, the independent variable. That way we can use T as our
'knob' instead of E, perhaps because we can control T but not E.
Although it may seem a bit mysterious – and the number of possibilities is rather mystifying –
we can apply these ideas very directly. For example since
H = U + PV (6.0.20)
we have
dH = dU + P dV + V dP (6.0.21)
dH = T dS − P dV + µdN + P dV + V dP
= T dS + µdN + V dP (6.0.22)
so we can think of the enthalpy as a function of S, N, P . We traded V for P and switched from
energy U to enthalpy H.
We can really go to town and do this for all the thermodynamic potentials. For example, for F
we find
dF = -S\,dT - P\,dV + \mu\,dN \qquad (6.0.23)
and this tells us how F changes as we change T, V, N, and also that F is naturally dependent on
those variables. We also have
G = Nµ (6.0.26)
Our argument was subtle – why doesn’t it apply to
\mu = \left(\frac{\partial F}{\partial N}\right)_{T,V} \qquad (6.0.27)
The reason is that with fixed V , as you change N the system becomes more and more dense. So an
intensive quantity is changing as you change N , namely the density N/V . It was crucial in the case
of G that all fixed quantities were intensive.
With more types of particles, we just have G = \sum_i N_i \mu_i. Note though that the µ_i for a mixture are
not equal to the µ_i for a pure substance.
We can use this to get a formula for µ for an ideal gas. Using the fact that
V = \left(\frac{\partial G}{\partial P}\right)_{T,N} \qquad (6.0.28)
we see that
\frac{\partial \mu}{\partial P} = \frac{1}{N}\frac{\partial G}{\partial P} = \frac{V}{N} \qquad (6.0.29)
But by the ideal gas law this is kT /P . So we can integrate to get
\mu(T, P) - \mu(T, P_0) = kT\log\frac{P}{P_0} \qquad (6.0.30)
for any reference P0 , usually atmospheric pressure.
This formula also applies to each species independently in a mixture, if P is the partial pressure
of that species. This works because ideal gases are non-interacting – that’s essentially what makes
them ideal.
So the electrical work we need to do is 237 kJ. This is the change ΔG in the Gibbs free energy:
ΔG = ΔH − T ΔS (6.0.35)
Standard tables include this information. If you perform the reaction in reverse, you can get ΔG of
energy out.
The same reasoning applies to batteries. For example lead-acid cells in car batteries run the
reaction
{\rm Pb} + {\rm PbO_2} + 4{\rm H^+} + 2{\rm SO_4^{2-}} \to 2{\rm PbSO_4} + 2{\rm H_2O} \qquad (6.0.36)
Tables say ΔG = −390 kJ/mol in standard conditions, so per mole of metallic lead we get 390 kJ
from the battery.
Note that ΔH = −312 kJ/mol for this reaction, so we actually get extra energy from heat
absorbed from the environment! When we charge the battery and run the reaction in reverse, we
have to put the extra 78 kJ of heat back into the environment.
You can also compute the battery voltage if you know how many electrons get pushed around
per reaction. For this reaction 2 electrons get pushed through the circuit, so the electrical work per
electron is
\frac{390\ {\rm kJ}}{2\times N_A} = 3.24\times 10^{-19}\ {\rm J} = 2.02\ {\rm eV} \qquad (6.0.37)
A volt is the voltage needed to give an electron 1 eV, so the voltage is 2.02 V. Car batteries have six
cells to get to 12 V.
We know that for an isolated system, all accessible microstates are equally probable. Our atom
isn’t isolated, but the atom + reservoir system is isolated. We expect that we’re equally likely to
find this combined system in any microstate.
The reservoir will have ΩR (s1 ) available microstates when the atom is in s1 , and ΩR (s2 ) microstates
when the atom is in s2 . These will be different because the reservoir has more or less energy, and
thus it has access to more or less states. Since all states in total are equally likely, the ratio of
probabilities must be
P(s_2)/P(s_1) = Ω_R(s_2)/Ω_R(s_1)      (7.1.1)
So we just need to write the right-side in a more convenient form that doesn’t make reference to the
reservoir.
Note that something a bit surprising happened – because multiplicity depends so strongly on
states and energies, changing our single atom’s state just a bit actually had a significant effect on
the multiplicity of a giant reservoir of energy. This contrasts with most questions in physics, where
the state of a single atom has almost no relevance to the behavior of a large reservoir.
But that’s easy by noting that
e−E(s)/kT (7.1.6)
as the proportionality factor for any given state s with energy E(s).
Note that probabilities aren’t equal to this factor because they also have to add up to 1! And so
we need to divide by the sum of all Boltzmann factors, including all states. We call that sum over
these factors
Z = Σ_s e^{−E(s)/kT}      (7.1.7)
with δE = 10.2 eV, and we have kT = 0.5 eV (note that an eV is about 10,000 K). So the ratio of
probabilities is about e^{−20.4} ≈ 1.4 × 10^{−9}. Since there are four excited states, the portion in the first
excited level is about 5 × 10^{−9}.
Photons passing through the atmosphere of the sun can be absorbed if they induce a transition between
atomic states. A hydrogen atom in the first excited state can transition up in energy to give the Balmer
series, which gives missing wavelengths in sunlight. Some other lines are also missing due to other
kinds of atoms, but those transition from their ground state. So there must be way more hydrogen
than other atoms in the sun's atmosphere!
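A minimal numerical version of this estimate (using δE = 10.2 eV and kT = 0.5 eV from above, with the factor of 4 for the four states of the first excited level):

import math

dE = 10.2       # energy gap to the first excited level of hydrogen, eV
kT = 0.5        # thermal energy at the sun's surface, eV
degeneracy = 4  # number of states in the first excited level

ratio_per_state = math.exp(-dE / kT)          # ~1.4e-9
fraction_excited = degeneracy * ratio_per_state
print(ratio_per_state, fraction_excited)      # ~1.4e-9 and ~5e-9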
P(s) = e^{−βE(s)} / Z      (7.2.1)
Given the probability to be in any given state, we can easily (in principle) compute the average or
expectation value for any quantity f (s). Formally this means that
f̄ = Σ_s P(s) f(s)
  = (1/Z) Σ_s e^{−βE(s)} f(s)
  = [Σ_s e^{−βE(s)} f(s)] / [Σ_s e^{−βE(s)}]      (7.2.2)
For example, the simplest thing we might want to know about is the average energy itself, ie
f (s) = E(s). Thus we have that
Ē = [Σ_s e^{−βE(s)} E(s)] / [Σ_s e^{−βE(s)}]      (7.2.3)
U = N Ē (7.2.5)
This means that in many cases, working atom-by-atom is no different than studying the whole
system (if we only care about averages).
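As a concrete illustration (not from the text), here is a small Python sketch that computes Z and Ē directly from a list of single-particle energy levels; the three-level spectrum used is just an arbitrary example:

import math

def thermal_average(energies, kT, f):
    """Average of f(E) over the Boltzmann distribution for the given levels."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)
    return sum(w * f(E) for w, E in zip(weights, energies)) / Z

levels = [0.0, 1.0, 2.5]   # example single-particle energies, arbitrary units
kT = 1.0
E_bar = thermal_average(levels, kT, lambda E: E)
print(E_bar)               # average energy per 'atom'; U = N * E_bar for N independent atoms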
7.2.1 Paramagnets
For paramagnets, there's just an up and a down dipole moment, with energies ∓µB, so
Z = e^{βµB} + e^{−βµB} = 2 cosh(βµB)
and so we can easily compute the up and down probabilities. The average energy is
Ē = −(1/Z) ∂Z/∂β = −µB tanh(βµB)      (7.2.7)
which is what we found before. We also get the magnetization right.
7.2.2 Rotation
A more interesting example is the rotational motion of a diatomic molecule.
In QM, rotations are quantized with total angular momentum jℏ. The energy levels are
E(j) = j(j + 1) ε
for some fixed energy ε inversely proportional to the molecule's moment of inertia. The number of
states at angular momentum j is 2j + 1. (Here we are assuming the two ends of the molecule
are non-identical.)
Given this structure, we can immediately write the partition function
Z = Σ_{j=0}^∞ (2j + 1) e^{−E(j)/kT}
  = Σ_{j=0}^∞ (2j + 1) e^{−j(j+1)ε/kT}      (7.2.9)
Unfortunately we can’t do the sum in closed form, but we can approximate it.
Note that for a CO molecule, ε = 0.00024 eV, so that ε/k = 2.8 K. Usually we're interested in
much larger temperatures, so that ε/kT ≪ 1. This is the classical limit, where the quantum spacing
doesn't matter much. Then we have
Z ≈ ∫_0^∞ (2j + 1) e^{−j(j+1)ε/kT} dj = kT/ε      (7.2.10)
when kT ≫ ε. As one might expect, the partition function increases with temperature. Now we find
Ē = −∂ log Z/∂β = kT      (7.2.11)
Differentiating with respect to T gives the contribution to the heat capacity per molecule
C_V ⊃ ∂Ē/∂T = k      (7.2.12)
as expected for 2 rotational DoF. Note that this only holds for kT ≫ ε; actually the heat capacity
goes to zero at small T.
If the molecules are made of indistinguishable atoms like H_2 or O_2, then turning the molecule around gives
back the same configuration, and there are half as many distinct rotational states. Thus we have
Z = kT/(2ε)      (7.2.13)
The extra 1/2 cancels out when we compute the average energy, so it has no effect on the heat capacity.
At low temperatures we need to more carefully consider QM to get the right answer.
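A quick numerical check of the classical approximation (the value ε = 0.00024 eV for CO is taken from above; the temperatures are illustrative):

import math

eps = 0.00024          # rotational constant for CO, in eV (from the text)
k = 8.617e-5           # Boltzmann constant in eV/K

def Z_rot(T, jmax=2000):
    """Direct sum of the rotational partition function for a molecule with non-identical ends."""
    kT = k * T
    return sum((2*j + 1) * math.exp(-j*(j + 1) * eps / kT) for j in range(jmax))

for T in [3.0, 30.0, 300.0]:
    print(T, Z_rot(T), k*T/eps)   # exact sum vs the classical estimate kT/eps; they agree once T >> 2.8 K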
7.2.3 Very Quantum System
What if we have a very quantum system, so that energy level splittings are much larger than kT ?
Then we have
Z ≈ 1 + e−βE1 (7.2.14)
where E_1 is the lowest excited state, and we are assuming E_n > E_1 ≫ kT. Note that the expectation
of the energy is
Ē ≈ E_1 e^{−E_1/kT}
where we have set the energy of the ground state to 0 as our normalization.
We also have that
∂Ē/∂T = (E_1^2/kT^2) e^{−E_1/kT}      (7.2.16)
It may appear that as T → 0 this blows up, but actually it goes to zero very rapidly, because
exponentials dominate power-laws. For comparison, note that x^n e^{−x} → 0 as x → ∞; this is just the
x → 1/x version of that statement.
7.3 Equipartition
We have discussed the equipartition theorem... now let’s derive it.
Recall that it doesn’t apply to all systems, but just to quadratic degrees of freedom with energies
like
E = kq^2      (7.3.1)
where q = x as in a harmonic oscillator, or q = p as in the kinetic energy term. We’ll just treat q as
the thing that parameterizes states, so that different q corresponds to a different independent state.
This leads to
Z = Σ_q e^{−βE(q)} = Σ_q e^{−βkq^2}
  ≈ ∫_{−∞}^{∞} e^{−βkq^2} dq      (7.3.2)
Really the latter should be the definition in the classical limit, but we might also imagine starting
with a QM description with discrete q and smearing it out. Anyway, we can evaluate the integral by
a change of variables to x = √(βk) q, so that
Z ≈ (1/√(βk)) ∫_{−∞}^{∞} e^{−x^2} dx      (7.3.3)
This is just the statement that we can take the classical limit, because energy level spacings are
small.
where this is the nth moment. The v_rms is the square root of the 2nd moment.
The Boltzmann factor for a gas molecule is the simple
e^{−m v⃗^2/(2kT)}      (7.4.3)
using the usual formula for kinetic energy. But if we want to compute the probability distribution
for speed P(s), we need to account for the dimensionality of space. This means that we have
P(s) ∝ ∫_{|v⃗|=s} d^2 v⃗ e^{−m v⃗^2/(2kT)} = 4πs^2 e^{−m s^2/(2kT)}      (7.4.4)
We used proportional to because we need to normalize. For that we need to compute the constant
N such that
N = ∫_0^∞ ds 4πs^2 e^{−m s^2/(2kT)}
  = 4π (2kT/m)^{3/2} ∫_0^∞ x^2 e^{−x^2} dx      (7.4.5)
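A quick numerical sanity check of this normalization and the resulting moments (a sketch for nitrogen at room temperature; the gas and temperature are my illustrative choices):

import math

m = 28 * 1.66e-27     # mass of an N2 molecule, kg
k_B = 1.381e-23       # J/K
T = 300.0             # K

# crude numerical integration of the (unnormalized) speed distribution 4*pi*s^2*exp(-m s^2/2kT)
ds = 1.0
speeds = [i * ds for i in range(1, 4000)]
weights = [4 * math.pi * s**2 * math.exp(-m * s**2 / (2 * k_B * T)) for s in speeds]
norm = sum(weights) * ds

v2_avg = sum(w * s**2 for w, s in zip(weights, speeds)) * ds / norm
print(math.sqrt(v2_avg), math.sqrt(3 * k_B * T / m))   # v_rms from the distribution vs sqrt(3kT/m), ~517 m/s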
For a system at temperature T (in contact with a reservoir), the partition function Z(T ) is
the fundamental quantity. We know that Z(T ) is proportional to the number of accessible states.
Since Z(T ) should (thus) tend to increase, its logarithm will tend to increase too. This suggests a
relationship with −F, where F is the Helmholtz free energy. In fact
F = −kT log Z      (7.5.1)
One could even take this as the definition of F. Instead, we'll derive it from our other definition
F = U − TS      (7.5.2)
This is a differential equation for F . Let’s show that −kT log Z satisfies this differential equation,
and with the same initial condition.
Letting F̃ = −kT log Z, we have that
∂F̃/∂T = −k log Z − kT ∂ log Z/∂T      (7.5.4)
but
∂ log Z/∂T = U/(kT^2)      (7.5.5)
This then tells us that F̃ obeys the correct differential equation.
They also agree at T = 0, since both F and F̃ reduce to the ground state energy there, as the system will just be in its ground
state. So they are the same thing.
One reason why this formula is so useful is that
S = −∂F/∂T,   P = −∂F/∂V,   µ = ∂F/∂N      (7.5.6)
where we fix the T, V, N we aren’t differentiating with respect to. So from Z we can get all of these
quantities.
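To make this concrete, here is a small Python sketch (not from the text) that builds Z for an arbitrary two-level system and extracts F, U, and S numerically, checking that F = U − TS:

import math

E1 = 1.0    # excited state energy, with the ground state at 0; units where k = 1
def Z(T):   return 1.0 + math.exp(-E1 / T)
def F(T):   return -T * math.log(Z(T))
def U(T):   return E1 * math.exp(-E1 / T) / Z(T)

T = 0.7
dT = 1e-6
S = -(F(T + dT) - F(T - dT)) / (2 * dT)   # S = -dF/dT by numerical derivative
print(F(T), U(T) - T * S)                 # the two agree: F = U - TS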
From this perspective, the statement that
F = −kT log Z (7.6.2)
is equivalent to the statement that we can just view
Z = ∫_0^∞ dE e^{S(E)−E/kT}
  ≈ e^{S(U)−U/kT}      (7.6.3)
That is, somehow we have simply forgotten about the integral! What’s going on? Have we made a
mistake?
The reason why this is alright is that the function
e^{S(E)−E/kT}      (7.6.4)
will be very, very sharply peaked at the energy E = Ē = U. It's so sharply peaked that its integral is
simply dominated by the value at the peak. This 'sharply peaked' behavior is exactly what we
spent the first couple of weeks of class going on and on about. So
F = −kT log Z = U − T S (7.6.5)
is actually true because we are taking averages in the thermodynamic limit.
This provides a new perspective on thermodynamic potentials. That is, if we work in the
microcanonical ensemble, that is the ensemble of states with fixed energy U , then its natural to
study the multiplicity
Ω(U ) = eS(U ) (7.6.6)
and expect that systems in equilibrium will maximize this quantity. But we can introduce a
temperature by ‘averaging over states’ weighted by a Boltzmann factor. This puts us in the
canonical ensemble, the ensemble of states of all energies with fixed average energy. And it’s
simply given by taking the Laplace transform with respect to energy
Z(β) = ∫_0^∞ dE e^{−βE} e^{S(E)}      (7.6.7)
where β = 1/kT as usual. Notice that we have traded a function of energy U for a function of
temperature T , just as in the Legendre transform discussion. This isn’t a coincidence!
The statement that this is dominated by its maximum also implies that the canonical and
microcanonical ensembles are the same thing in the thermodynamic limit. Further, it tells us
that since
−kT log Z = U − T S(U ) (7.6.8)
we learn that the Legendre transform is just the thermodynamic limit of the Laplace
transform. This is a useful way to think about Legendre transforms in thermodynamics.
We can obtain all the other thermodynamic potentials, and all of the other Legendre transforms,
in exactly this way.
8 Entropy and Information
8.1 Entropy in Information Theory
We have been thinking of entropy as simply
S = log Ω (8.1.1)
But for a variety of reasons, it’s more useful to think of entropy as a function of the probability
distribution rather than as depending on a number of states.
If we have Ω states available, and we believe the probability of being in any of those states is
uniform, then this probability will be
p_i = 1/Ω      (8.1.2)
for any of the states i = 1, · · · , Ω. So we see that
S = − log p (8.1.3)
for a uniform distribution with all pi = p = 1/Ω. But notice that this can also be viewed as
S ≡ −Σ_i p_i log p_i      (8.1.4)
since all pi are equal, and their sum is 1 (since probabilities are normalized). We have not yet
demonstrated it for distributions that are not uniform, but this will turn out to be the most useful
general notion of entropy. It can also be applied for continuous probability distributions, where
S ≡ −∫ dx p(x) log p(x)      (8.1.5)
where we see that log Ω ≈ −N Σ_i p_i log p_i = N S for a long sequence of N independent draws.
So the entropy S that we’ve studied in statistical mechanics, ie the log of the number of possible
sequences, naturally turns into the definition in terms of probability distributions discussed above.
This has to do with information theory because N S quantifies the amount of information
we actually gain by observing the sequence. Notice that as Warren Weaver wrote in an initial
popularization of Shannon’s ideas:
• “The word information in communication theory is not related to what you do say, but to
what you could say. That is, information is a measure of one’s freedom of choice when one
selects a message.”
2. If all pi are equal, so that all pi = 1/n, then H should increase with n. That is, having more
equally likely options increases the amount of ‘possibility’ or ‘choice’.
3. If the probabilities can be broken down into a series of events, then H must be a weighted
sum of the individual values of H. For example, the probabilities for events A, B, C
{A, B, C} = {1/2, 1/3, 1/6}      (8.2.1)
can be rewritten as the process
{A, B or C} = {1/2, 1/2},   then   {B, C} = {2/3, 1/3}      (8.2.2)
We are requiring that such a situation follows the rule
H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3)      (8.2.3)
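A quick numerical check of this grouping rule (a sketch using natural logs; any base works as long as it's used consistently):

import math

def H(*probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + 0.5 * H(2/3, 1/3)
print(lhs, rhs)   # the two agree, as required by property 3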
We will show that up to a positive constant factor, the entropy S is the only quantity with these
properties.
Before we try to give a formal argument, let’s see why the logarithm shows up. Consider
H(1/4, 1/4, 1/4, 1/4) = H(1/2, 1/2) + 2 × (1/2) H(1/2, 1/2)
                      = 2 H(1/2, 1/2)      (8.2.4)
Similarly
H(1/8, · · · , 1/8) = 3 H(1/2, 1/2)      (8.2.5)
and so this is the origin of the logarithm. To really prove it, we will take two fractions, and note
that 1/s^n ≈ 1/t^m for sufficiently large n and m. The only other point we then need is that we can
approximate any other numbers using long, large trees.
Here is the formal proof. Let
A(n) ≡ H(1/n, · · · , 1/n)      (8.2.6)
for n equally likely possibilities. By using an exponential tree we can decompose a choice of s^m
equally likely possibilities into a series of m choices among s possibilities. So A(s^m) = m A(s), and similarly A(t^n) = n A(t).
By taking arbitrarily large n we can find an m with s^m ≤ t^n < s^{m+1}, and so by taking the
logarithm and re-arranging we can write
m/n ≤ log t / log s ≤ m/n + 1/n      (8.2.9)
which means that we can make
|m/n − log t / log s| < ε      (8.2.10)
So we also find that
|m/n − A(t)/A(s)| < ε      (8.2.12)
and so we conclude via these relations that
A(t) = K log t (8.2.13)
for some positive constant K, since we can make ε arbitrarily small.
But now by continuity we are essentially done, because we can approximate any set of probabilities
pi arbitrarily well by using a very fine tree of equal probabilities.
Notice that the logarithm was picked out by our assumption 3 about the way that H applies to
the tree decomposition of a sequence of events.
8.4 A More Sophisticated Derivation of Boltzmann Factors
Boltzmann factors can be derived in another simple and principled way – they are the probabilities
that maximize the entropy given the constraint that the average energy is held fixed.
We can formalize this with some Lagrange multipliers, as follows.
We want to fix the expectation value of the energy and the total probability while maximizing
the entropy. We can write this maximization problem using a function (Lagrangian)
L = −Σ_i p_i log p_i + β (⟨E⟩ − Σ_i p_i E_i) + ν (Σ_i p_i − 1)      (8.4.1)
where we are maximizing/extremizing L with respect to the pi and β, ν; the latter are Lagrange
multipliers.
Varying gives the two constraints along with
p_i = e^{ν−1−βE_i}      (8.4.3)
Now ν just sets the total sum of the pi while β is determined by the average energy itself. So we have
re-derived Boltzmann factors in a different way. This derivation also makes it clear what abstract
assumptions are important in arriving at e−βE .
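As an illustration of this logic (a sketch, not from the text), the snippet below fixes a target average energy for a small example spectrum and solves for the β that reproduces it; ν is fixed implicitly by normalization:

import math

energies = [0.0, 1.0, 2.0, 3.0]   # example spectrum, arbitrary units with k = 1
E_target = 1.2                     # the constrained average energy

def avg_E(beta):
    w = [math.exp(-beta * E) for E in energies]
    return sum(wi * E for wi, E in zip(w, energies)) / sum(w)

# bisection for beta: avg_E decreases monotonically as beta increases
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if avg_E(mid) > E_target:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(beta, avg_E(beta))   # the maximum-entropy distribution is p_i proportional to exp(-beta * E_i)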
58 + 23 = 81 (8.5.1)
We started out with information representing both 58 and 23. Typically this would be stored as an
integer, and for example a 16 bit integer has information, or entropy, 16 log 2. But at the end of the
computation, we don’t remember what we started with, rather we just know the answer. Thus we
have created an entropy
Since our computer will certainly be working at finite temperature, eg room temperature, we
will be forced by the laws of thermodynamics to create heat
Clearly this isn’t very significant for one addition, but it’s interesting as its the fundamental limit.
Furthermore, computers today are very powerful. For example, it has been estimated that while
training AlphaGo Zero, roughly
were performed. Depending on whether they were using 8, 16, or 32 bit floating point numbers (let’s
assume the last), this meant that erasure accounted for
of heat. That’s actually a macroscopic quantity, and the laws of thermodynamics say that it’s
impossible to do better with irreversible computation!
But note that this isn’t most of the heat. For example, a currently state of the art GPU like the
Nvidia Tesla V100 draw about 250 Watts of power and perform at max about 1014 flop/s. This
means their theoretical minimum power draw is
Thus state of the art GPUs are still tens of millions of times less efficient than the theoretical
minimum. We’re much, much further from the theoretical limits of computation than we are from
the theoretical limits of heat engine efficiency.
In principle we can do even better through reversible computation. After all, there's no reason to
make erasures. For example, when adding we could perform an operation mapping
(x, y) → (x, x + y)
for example (58, 23) → (58, 81),
so that no information is erased. In this case, we could in principle perform any computation we like
without producing any waste heat at all. But we need to keep all of the input information around
to avoid creating entropy and using up energy.
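For scale, here is a minimal sketch of the Landauer estimate discussed above (room temperature, with the 32 bits per erased number and 10^14 flop/s taken as the assumptions used in the text):

import math

k_B = 1.381e-23        # J/K
T = 300.0              # room temperature, K
landauer_per_bit = k_B * T * math.log(2)      # ~2.9e-21 J of heat per erased bit

bits_per_flop = 32
flops_per_second = 1e14
min_power = landauer_per_bit * bits_per_flop * flops_per_second
print(min_power)                              # ~1e-5 W, versus ~250 W for the actual GPU
print(250.0 / min_power)                      # tens of millions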
9 Transport, In Brief
Let’s get back to real physics and discuss transport. This is a fun exercise. We’ll study the transport
of heat (energy), of momentum, and of particles.
9.1 Heat Conduction
We’ll talk about radiation later, in the quantum stat mech section. And discussing convection is
basically hopeless, as fluid motions and mixings can be very complicated. But we can try to discuss
conduction using some simple models and physical intuition.
The idea is roughly that molecules are directly bumping into each other and transferring energy.
But this can also mean that energy is carried by lattice vibrations in a solid, or by the motion of
free electrons (good electrical conductors tend to have good heat conductivity too).
We can start by just making a simple guess that
Q ∝ (ΔT · A · δt)/δx      (9.1.1)
or
dQ/dt ∝ A dT/dx      (9.1.2)
The overall constant is called the thermal conductivity of a given fixed material. So
dQ/dt = −k_t A dT/dx      (9.1.3)
which is called the Fourier heat conduction law.
Thermal conductivities of common materials vary by four orders of magnitude, in W/(mK): air
0.026, wood 0.08, water 0.6, glass 0.8, iron 80, copper 400. In common household situations thin
layers of air are very important as insulation, and can be more important than glass itself.
π(2r)^2 ℓ      (9.1.4)
9.2 Viscosity
Another quantity that you may have heard of is viscosity. It concerns the spread of momentum
through a fluid.
To understand this, we want to think of a picture of a fluid flowing in the x̂ direction, but where
the velocity of the fluid changes in the z direction. This could lead to chaotic turbulence, but let’s
assume that’s not the case, so that we have laminar flow.
In almost all cases, as one might guess intuitively, fluids tend to resist this shearing, differential
flow. This resistance is called viscosity. More viscosity means more resistance to this differential.
It’s not hard to guess what the viscous force is proportional to. We’d expect that
F_x/A ∝ du_x/dz      (9.2.1)
and the proportionality constant is the viscosity, ie
F_x/A = η du_x/dz      (9.2.2)
for coefficient of viscosity η. Although F_x/A has units of pressure, it isn't a pressure; rather it's a
shear stress.
Viscosities vary a great deal with both material and temperature. Gases have lower viscosities
than liquids. And ideal gases have viscosity independent of pressure but increasing with temperature.
This can be explained because viscosity depends on the momentum density in the gas, the mean free
path, and the average thermal speed. The first two depend on the density, but this cancels between
them – a denser gas carries more momentum, but it carries it less far! But since molecules move
faster at higher temperature, more momentum gets transported, and so in a gas η ∝ √T just as for
the heat conductivity.
In a liquid, viscosity actually decreases with temperature because the molecules don’t stick to
each other as much or as well at higher temperature, and this ‘sticking’ is the primary cause of
liquid viscosity. Note that in effect η = ∞ for a solid, where ‘sticking’ dominates.
9.3 Diffusion
Now let’s talk about the spread of particles from high concentration to low concentration.
Let’s imagine the density n of particles increases uniformly in the x̂ direction. The flux is the
~ Once again we’d guess that
net number of particles crossing a surface, we’ll write it as J.
dn
Jx = −D (9.3.1)
dx
where D is the diffusion coefficient, which depends on both what’s diffusing and what it’s diffusing
through. Diffusion for large molecules is slower than for small molecules, and diffusion is much
faster through gases than through liquids. Values ranging from 10−11 to 10−5 m2 /s can be found.
Diffusion coefficients increase with temperature, because molecules move faster.
Diffusion is a very inefficient way for particles to travel. If I make a rough estimate for dye in
water, demanding that roughly all the dye move through a glass of water, then
N/(A Δt) = D (N/V)/δx      (9.3.2)
and with V = A δx, I have that
Δt ≈ V δx/(A D) ≈ δx^2/D      (9.3.3)
For a glass of water with δx ≈ 0.1 m and D ≈ 10^{−9} m^2/s, it would take 10^7 seconds, or about 4
months!
Real mixing happens through convection and turbulence, which is much, much faster, but also
more complicated.
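The estimate is simple enough to spell out numerically (the glass size and diffusion coefficient are the rough, illustrative values used above):

dx = 0.1          # size of the glass, m
D = 1e-9          # diffusion coefficient for dye in water, m^2/s

dt = dx**2 / D    # characteristic diffusion time, seconds
print(dt, dt / (3600 * 24))   # ~1e7 s, i.e. roughly 100 days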
10 Quantum Statistical Mechanics
Let’s get back to our main topic and (officially) begin to discuss quantum statistical mechanics.
This follows because every state of the total systems is (s1 , s2 ), the Cartesian product. And
furthermore, there is no relation or indistinguishability issue between states in the first and second
system.
I put this discussion under the Quantum Stat Mech heading because, in many cases, the systems
are indistinguishable. In that case the state stotal = (s1 , s2 ) = (s2 , s1 ), ie these are the same state.
We can approximate the partition function for that case by
Z_total ≈ (1/2) Z_1 Z_2      (10.1.2)
but this is only approximate, because we have failed to account correctly for s1 = s2 , in which case
we don’t need the 1/2. In the limit that most energy levels are unoccupied, this isn’t a big problem.
We’ll fix it soon.
These expressions have the obvious generalizations
Z_total = Π_{i=1}^N Z_i      (10.1.3)
The total partition function is
Z_total = Z_1^N / N!      (10.1.5)
in terms of the partition function for a single gas molecule, Z1 . So we just have to compute that.
To get Z1 we need to add up the Boltzmann factors for all of the microstates of a single molecule
of gas. This combines into
Z_1 = Z_tr Z_int
where the first is the kinetic energy from translational DoF, and the latter are internal states such
as vibrations and rotations. There can also be electronic internal states due to degeneracies in the
ground states of the electrons in a molecule. For example, oxygen molecules have 3-fold degenerate
ground states.
Let’s just focus on Ztr . To do this correctly we need quantum mechanics.
This isn’t a QM course, so I’m mostly just going to tell you the result. A particle in a 1-d box
must have momentum
p_n = h n/(2L)      (10.1.7)
as it must be a standing wave (with Dirichlet boundary conditions). So the energy levels are
E_n = p_n^2/(2m) = h^2 n^2/(8mL^2)      (10.1.8)
From the energies, we can write down the partition function
Z_1d = Σ_n e^{−β h^2 n^2/(8mL^2)}      (10.1.9)
Unless the box is extremely small, or T is extremely small, we can approximate this by the integral
Z_1d ≈ ∫_0^∞ dn e^{−β h^2 n^2/(8mL^2)} = L √(2πm k_B T)/h      (10.1.10)
We then have Z_3d = (Z_1d)^3 because the x, y, z motions are all independent. This gives
Z_3d = V/ℓ_Q^3      (10.1.13)
where ℓ_Q = h/√(2πm k_B T) is the 'quantum length'.
For example we can find the pressure, entropy, and chemical potential this way. We will once again
recover expressions we’ve seen earlier in the semester.
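To see how safely classical a typical gas is, here is a small sketch comparing ℓ_Q^3 with the volume per particle for helium at room temperature and atmospheric pressure (the gas and conditions are my illustrative choices):

import math

h = 6.626e-34          # Planck's constant, J s
k_B = 1.381e-23        # Boltzmann constant, J/K
m = 4 * 1.66e-27       # mass of a helium atom, kg
T = 300.0              # K
P = 1.0e5              # Pa

l_Q = h / math.sqrt(2 * math.pi * m * k_B * T)   # quantum length
v_per_particle = k_B * T / P                      # V/N from the ideal gas law
print(l_Q**3, v_per_particle)    # l_Q^3 is several hundred thousand times smaller, so the gas is safely classical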
Recall that
P(s_2)/P(s_1) = e^{[S_R(s_2) − S_R(s_1)]/k}      (10.2.1)
Now we need to use
dS_R = (1/T)(dU_R + P dV_R − µ dN_R)      (10.2.2)
to find that
S_R(s_2) − S_R(s_1) = −(1/T)[E(s_2) − E(s_1) − µN(s_2) + µN(s_1)]      (10.2.3)
where E and N refer to the system. Thus we find that the Gibbs factor is
e^{−[E(s) − µN(s)]/kT}
and to normalize the probabilities we divide by
Z = Σ_s e^{−[E(s) − µN(s)]/kT}
where the sum includes all states, ranging over all values of E and all N. This is the promised
grand partition function or grand canonical distribution. In the presence of more types of
particles, we have chemical potentials and numbers for each.
Z = 1 + e^{−(ε−µ)/kT}      (10.2.7)
where ε = −0.7 eV. The chemical potential µ is high in the lungs, where oxygen is abundant, but
lower in cells where oxygen is needed. Near the lungs the partial pressure of oxygen is about 0.2
atm, and so the chemical potential is
µ = −kT log(V Z_int/(N ℓ_Q^3)) ≈ −0.6 eV      (10.2.8)
so that
e^{−(ε−µ)/kT} ≈ 40      (10.2.9)
and so the probability of occupation by oxygen is 98%.
But things change in the presence of carbon monoxide, CO. Then there are three possible states, and
Z = 1 + e^{−(ε−µ)/kT} + e^{−(ε′−µ′)/kT}      (10.2.10)
The CO molecule will be less abundant than oxygen. If it's 100 times less abundant, then its chemical
potential µ′ is lower than µ by kT log 100 ≈ 0.12 eV.
But CO binds more tightly to hemoglobin, so that ε′ ≈ −0.85 eV. In total this means
e^{−(ε′−µ′)/kT} ≈ 120      (10.2.12)
which means that the oxygen occupation probability sinks to 25%. This is why CO is poisonous.
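A small numerical sketch of these Gibbs-factor estimates (ε, ε′, µ, and the factor of 100 are the values quoted above; kT is taken at body temperature, my natural assumption here):

import math

kT = 0.0267          # ~310 K body temperature, in eV
eps_O2, mu_O2 = -0.7, -0.6
eps_CO, mu_CO = -0.85, mu_O2 - kT * math.log(100)   # CO assumed 100x less abundant than oxygen

g_O2 = math.exp(-(eps_O2 - mu_O2) / kT)   # ~40
g_CO = math.exp(-(eps_CO - mu_CO) / kT)   # ~120

print(g_O2 / (1 + g_O2))                  # ~0.98 occupation without CO
print(g_O2 / (1 + g_O2 + g_CO))           # drops to roughly 0.25 with CO present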
V/N ≫ ℓ_Q^3      (10.3.2)
When this is violated, the wavefunctions of the particles start to overlap, and quantum effects
become important. Systems that are quantum are either very dense or very cold, since we recall that
ℓ_Q = h/√(2πm k_B T)      (10.3.3)
We can picture this as a situation where the particle’s individual wavefunctions start to overlap. We
can’t separate them (make the wavefunctions more narrow) without giving them a lot of momentum,
increasing the energy and temperature of the gas.
Distribution Functions
We introduced the Gibbs ensemble to make computing the distribution functions easier. The idea is
to focus on individual, specific states that (some number of) particles can be in.
If we have a given single-particle state, then the probability of it being occupied by n particles is
P(n) = (1/Z) e^{−n(ε−µ)/kT}      (10.3.4)
So we can use this to determine the Z for fermions and bosons.
In the case of fermions, the only occupancy allowed is n = 0, 1. That means that for any given
state, we simply have
Z = 1 + e^{−(ε−µ)/kT}      (10.3.5)
n̄ = 1/(e^{(ε−µ)/kT} − 1)      (10.3.9)
where once again we emphasize that the signs here are crucial! This is the Bose-Einstein
distribution.
Notice that the B-E distribution goes to infinity as ε → µ. This means that in that limit, the
state achieves infinite occupancy! In the case of fermions this was impossible because occupancy can
only be 0 or 1. This divergence is Bose condensation. Draw it!
Let’s compare these results to the classical limit, where we’d have a Boltzmann distribution.
In the Boltzmann distribution
n̄ = e^{−(ε−µ)/kT} ≪ 1      (10.3.10)
Draw it! I wrote the last inequality because the classical limit obtains when we don’t have many
particles in the same state. In this limit, it’s fine to simply ignore large occupancies. Note that this
is equal to both the F-D and B-E distributions in the desired limit. The classical limit is low
occupancy.
In the rest of this section we will apply these ideas to simple systems, treating ‘quantum gases’.
That means that we can pretend that the systems are governed by single-particle physics. This
applies to electrons in metals, neutrons in neutron stars, atoms in a fluid at low temperature, photons
in a hot oven, and even ‘phonons’, the quantized sound (vibrations) in a solid. We’ll determine µ in
most cases indirectly, using the total number of particles.
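Here is a short sketch comparing the three occupancy formulas as a function of x = (ε − µ)/kT, showing that they all agree once the occupancy is small:

import math

def n_FD(x): return 1.0 / (math.exp(x) + 1.0)   # Fermi-Dirac
def n_BE(x): return 1.0 / (math.exp(x) - 1.0)   # Bose-Einstein (needs x > 0)
def n_MB(x): return math.exp(-x)                 # Boltzmann

for x in [0.5, 1.0, 3.0, 6.0]:
    print(x, n_FD(x), n_BE(x), n_MB(x))          # the three converge once n_MB << 1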
Zero Temperature
Recall the F-D distribution
n̄_FD = 1/(e^{(ε−µ)/kT} + 1)      (10.4.2)
In the limit that T = 0 this is a step function. All states with ε < µ are occupied, and all others are
unoccupied.
We call
ε_F ≡ µ(T = 0)      (10.4.3)
the Fermi energy. In this low-temperature state of affairs the fermion gas is said to be 'degenerate'.
How can we determine ε_F? It can be fixed if we know how many electrons are actually present.
If we imagine adding electrons to an empty box, then they simply fill the states from lowest energy
up. To add one more electron you need ε_F = µ of energy, and our understanding that
µ = (∂U/∂N)_{S,V}      (10.4.4)
makes perfect sense. Note that adding an electron doesn’t change S, since we’re at zero temperature
and the state is unique!
To compute F we’ll assume that the electrons are free particles in a box of volume V = L3 . This
isn’t a great approximation because of interactions with ions in the lattice, but we’ll neglect them
anyway.
As previously seen, the allowed wavefunctions are just sines and cosines, with momenta
p⃗ = (h/2L) n⃗      (10.4.5)
where n⃗ = (n_x, n_y, n_z). Energies are therefore
ε = h^2 n⃗^2/(8mL^2)      (10.4.6)
To visualize the allowed states, it's easiest to draw them in n⃗-space, where only positive integer
coordinates are allowed. Each point describes two states after accounting for electron spins. Energies
are proportional to n⃗^2, so states start at the origin and expand outward to fill availabilities.
The Fermi energy is just
ε_F = h^2 n_max^2/(8mL^2)      (10.4.7)
We can compute this by computing the total volume of the interior region in the space, which is
N = 2 × (1/8) × (4/3)π n_max^3 = π n_max^3/3      (10.4.8)
Thus we find that
ε_F = (h^2/8m) (3N/πV)^{2/3}      (10.4.9)
Is this intensive or extensive? What does that mean? (The shape of the 'box' is irrelevant.)
We can compute the average energy per electron by computing the total energy and dividing.
That is
U = 2 (1/8) ∫ d^3n (h^2 n⃗^2/(8mL^2))
  = ∫_0^{n_max} dn (πn^2) (h^2 n^2/(8mL^2))
  = π h^2 n_max^5/(40 m L^2)
  = (3/5) N ε_F      (10.4.10)
where the 3/5 is a geometrical factor that would be different in more or fewer spatial dimensions.
Note that ε_F ∼ a few eV, which is much, much larger than kT ≈ 1/40 eV. This is the same as
the comparison between the quantum volume and the average volume per particle. The Fermi
temperature is
T_F ≡ ε_F/k_B ≳ 10^4 K      (10.4.11)
which is hypothetical insofar as metals would liquefy before reaching this temperature.
Using the standard or thermodynamic definition of the pressure we can find
P = −∂U/∂V
  = −∂/∂V [ (3/5) N (h^2/8m) (3N/πV)^{2/3} ]
  = 2N ε_F/(5V) = 2U/(3V)      (10.4.12)
which is the degeneracy pressure. Electrons don’t want to be compressed, as they want space!
(Remember this has nothing to do with electrostatic repulsion.)
The degeneracy pressure is enormous, of order 10^9 N/m^2, but it isn't measurable, as it is canceled
by the electrostatic forces that pulled the electrons into the metal in the first place. The bulk
modulus is measurable, and it's just
B = −V (∂P/∂V)_T = (10/9) (U/V)      (10.4.13)
This quantity is also large, but it's not completely canceled by electrostatic forces, and (very roughly)
accords with experiment.
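As a rough numerical illustration (the conduction electron density used for copper, about 8.5 × 10^28 per cubic meter, is my assumed textbook-style value rather than a number from these notes):

import math

h = 6.626e-34        # J s
m_e = 9.11e-31       # electron mass, kg
k_B = 1.381e-23      # J/K
n = 8.5e28           # assumed conduction electron density of copper, 1/m^3

eps_F = (h**2 / (8 * m_e)) * (3 * n / math.pi) ** (2.0 / 3.0)
print(eps_F / 1.602e-19)    # ~7 eV, i.e. 'a few eV' as stated above
print(eps_F / k_B)          # Fermi temperature ~8e4 K, comfortably above 10^4 K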
Small Temperatures
The distribution of electrons barely changes when T_F ≫ T > 0, but we need to study finite
temperature to see how metals change with temperature – for example, to compute the heat
capacity!
Usually particles want to acquire a thermal energy kT , but in a degenerate Fermi gas they can’t,
because most states are occupied. So it's only the states near the Fermi surface that can become
excited. Furthermore, it's only states within kT of the Fermi surface that can be excited. So the
number of excitable electrons is proportional to T .
This suggests that
δU ∝ (N kT )(kT ) ∼ N (kT )2 (10.4.14)
By dimensional analysis then, since the only energy scale around is F , we should have
δU ≈ N (kT)^2/ε_F      (10.4.15)
This implies that
C_V ∝ N k^2 T/ε_F      (10.4.16)
This agrees with experiments. Note that it also matches what we expect from the 3rd law of
thermodynamics, namely that CV → 0 as T → 0.
Density of States
A natural object of study is the density of states, which is the number of single-particle states
per unit of energy. For a Fermi gas note that
ε = h^2 n⃗^2/(8mL^2),   |n⃗| = (L/h)√(8mε)      (10.4.17)
so that
dn = (L/h)√(2m/ε) dε      (10.4.18)
This means that we can write integrals like
U = ∫_0^{n_max} dn (πn^2) (h^2 n^2/(8mL^2))
  = ∫_0^{ε_F} dε ε [ (π/2) (8mL^2/h^2)^{3/2} √ε ]      (10.4.19)
The quantity in brackets is the density of states g(ε).
but at higher temperature we have to multiply by the non-trivial occupancy, which is really the
right way to think about zero temperature as well
N = ∫_0^∞ g(ε) n̄_FD(ε) dε = ∫_0^∞ g(ε)/(e^{(ε−µ)/kT} + 1) dε      (10.4.22)
similarly for the energy
U = ∫_0^∞ ε g(ε) n̄_FD(ε) dε = ∫_0^∞ ε g(ε)/(e^{(ε−µ)/kT} + 1) dε      (10.4.23)
The zero temperature limit is just a special case.
Note that at T > 0 the chemical potential changes. It's determined by the fact that the integral
expression for N must remain fixed, and since g(ε) is larger for larger ε, the chemical potential must
change to compensate. In fact it must decrease slightly. We can compute it by doing the integrals
and determining µ(T). Then we can determine U(T) as well.
k_B T ≪ ε_F      (10.4.24)
So far this was exact, but now we will start making approximations.
First of all, note that µ/kT ≫ 1 and that the integrand falls off exponentially when −x ≫ 1.
This means that we can approximate
N ≈ (2/3) g_0 ∫_{−∞}^∞ ε^{3/2} e^x/(e^x + 1)^2 dx      (10.4.30)
Now the integral can be performed term-by-term. This provides a series expansion of the result,
while neglecting some exponentially small corrections.
The first term is
∫_{−∞}^∞ e^x/(e^x + 1)^2 dx = ∫_{−∞}^∞ (−∂_x) [1/(e^x + 1)] dx = 1      (10.4.33)
Including the next terms, one finds
N = (2/3) g_0 µ^{3/2} [1 + (π^2/8)(kT)^2/µ^2 + · · ·]
  = N (µ/ε_F)^{3/2} [1 + (π^2/8)(kT)^2/µ^2 + · · ·]      (10.4.36)
This shows that µ/ε_F ≈ 1, and so we can use that in the second term to solve and find
µ/ε_F ≈ 1 − (π^2/12)(kT)^2/ε_F^2 + · · ·      (10.4.37)
so the chemical potential decreases a bit as T increases.
We can evaluate the energy integrals using the same tricks, with the result that
U = (3/5) N µ^{5/2}/ε_F^{3/2} + (3π^2/8) N (kT)^2/ε_F + · · ·      (10.4.38)
Planck Distribution
Instead of equipartition, each mode of the electromagnetic field, which is its own harmonic oscillator,
can only have energy
E_n = nℏω      (10.5.1)
where the values of ω are fixed by the boundary conditions of the box. This n is the number of
photons!
The partition function is
Z = 1/(1 − e^{−βℏω})      (10.5.2)
for the Bose-Einstein distribution. So the average energy in an oscillator is
Ē = −(d/dβ) log Z = ℏω/(e^{βℏω} − 1)      (10.5.3)
The number of photons is just given by n̄_BE. But in the context of photons, it's often called the
Planck distribution.
This solves the ultraviolet catastrophe, by suppressing the high energy contributions exponentially.
This really required quantization, so that high energy modes are ‘turned off’.
Note that for photons µ = 0. At equilibrium this is required by the fact that photons can be
freely created or destroyed, ie
(∂F/∂N)_{T,V} = 0 = µ_γ      (10.5.4)
Photons in a Box
We would like to know the total number and energy of photons inside a box.
As usual, the momentum of these photons is
p_m = h m/(2L)      (10.5.5)
but since they are relativistic, the energy formula is now
ε = pc = hcm/(2L)      (10.5.6)
This is also just what we get from their frequencies.
The energy in a mode is just the occupancy times the energy. This is
U = 2 Σ_{m_x,m_y,m_z} (hc|m⃗|/2L) · 1/(e^{hc|m⃗|/(2LkT)} − 1)      (10.5.7)
the integrand is the spectrum, or energy density per unit photon energy,
u(ε) = (8π ε^3/(hc)^3) · 1/(e^{ε/kT} − 1)      (10.5.10)
The number density is just u(ε)/ε. The spectrum peaks at ε ≈ 2.8 kT; the ε^3 comes from living in
3 dimensions.
To evaluate the integral, it's useful to change variables to x = ε/kT, so that all of the physical
quantities come out of the integral, giving
U/V = (8π(kT)^4/(hc)^3) ∫_0^∞ x^3/(e^x − 1) dx      (10.5.11)
This means that one can determine the temperature of an oven by letting a bit of radiation leak out
and looking at its color.
Total Energy
Doing the last integral gives
U/V = 8π^5 (kT)^4/(15(hc)^3)      (10.5.12)
Note that this only depends on kT, as the other constants are just there to make up units. Since it's
an energy density, pure dimensional analysis could have told us that it had to scale as (kT)^4.
Numerically, the result is small. At a typical oven temperature of 460 K, the energy per unit
volume is 3.5 × 10^{−5} J/m^3. This is much smaller than the thermal energy of the air inside the oven...
because there are far more air molecules than photons. And that's because the typical photon wavelengths are
much larger than the typical separations between air molecules at this temperature.
Entropy
We can determine the entropy by computing C_V and then integrating ∫ (C_V/T) dT. We have that
C_V = (∂U/∂T)_V = 4aT^3      (10.5.13)
where a = 8π^5 k^4/(15(hc)^3). Since our determinations were fully quantum, this works down to T = 0, which
means that
S(T) = ∫_0^T (C_V/t) dt = (4/3) a T^3      (10.5.14)
The total number of photons scales the same way, but with a different coefficient.
CMB
The Cosmic Microwave Background is the most interesting photon gas; it's at about 2.73 K. So its
spectrum peaks at 6.6 × 10^{−4} eV, with wavelength around a millimeter, in the far infrared.
The CMB has far less energy than ordinary matter in the universe, but it has far more entropy –
about 10^9 units of S per cubic meter.
Photon Emission Power
Now that we have understood ‘photons in a box’, we would like to see how radiation will be emitted
from a hot body. A natural and classical starting point is to ask what happens if you poke a hole in
the box of photons.
Since all light travels at the same speed, the spectrum of emitted radiation will be the same as
the spectrum in the box. To compute the amount of radiation that escapes, we just need to do some
geometry.
To get out through the hole, radiation needs to have once been in a hemisphere of radius R from
the hole. The only tricky question is how much of the radiation at a given point on this hemisphere
goes out through a hole with area A. At an angle θ from the perpendicular the area A looks like it
has area
A_eff = A cos θ      (10.5.15)
So the fraction of radiation that’s pointed in the correct direction, and thus gets out is
∫_0^{π/2} dθ 2πR^2 sin θ (A_eff/(4πR^2))
  = ∫_0^{π/2} dθ (A/2) sin θ cos θ = A/4      (10.5.16)
Other than that, the rate is just given by the speed of light times the energy density, so we find
c (A U/4V) = 2π^5 A (kT)^4/(15 h^3 c^2)      (10.5.17)
In fact, this is the famous power per unit area blackbody emission formula
P = A (2π^5 k^4/(15 h^3 c^2)) T^4 = A σ T^4      (10.5.18)
where σ is known as the Stefan-Boltzmann constant, with value σ = 5.67 × 10^{−8} W/(m^2 K^4). The
dependence on the fourth power of the temperature was discovered by Stefan empirically in 1879.
Sun and Earth
We can use basic properties of the Earth and Sun to make some interesting deductions.
The Earth receives 1370 W/m^2 from the sun (known as the solar constant). The Earth is 150
million km from the sun. This tells us the sun's total luminosity is 4 × 10^{26} Watts.
The sun's radius is a little over 100 times Earth's, or 7 × 10^8 m. So its surface area is 6 × 10^{18} m^2.
From this information, if we assume an emissivity of 1, we can find that
T = (luminosity/σA)^{1/4} = 5800 K      (10.5.19)
The corresponding spectrum peaks at a photon energy ε ≈ 2.8 kT ≈ 1.4 eV,
which is in the near infrared. This is testable and agrees with experiment. This is close to visible
red light, so we get a lot of the sun's energy in the visible spectrum. Note that 'peak' and 'average'
aren't quite the same here. (Perhaps in some rough sense evolution predicts that we should be able
to see light near the sun's peak?)
We can also easily estimate the Earth's equilibrium temperature, assuming its emission and
absorption are balanced. If the power emitted is equal to the power absorbed and the emissivity is 1,
then
σT^4 (4πR_E^2) = (1370 W/m^2)(πR_E^2)
which gives
T ≈ 280 K      (10.5.22)
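A short numerical sketch of both estimates (using the solar constant, distance, and solar radius quoted above):

import math

sigma = 5.67e-8        # Stefan-Boltzmann constant, W/(m^2 K^4)
solar_constant = 1370  # W/m^2 at Earth
d = 1.5e11             # Earth-sun distance, m
R_sun = 7e8            # solar radius, m

luminosity = solar_constant * 4 * math.pi * d**2        # ~4e26 W
A_sun = 4 * math.pi * R_sun**2
T_sun = (luminosity / (sigma * A_sun)) ** 0.25
print(T_sun)                                            # ~5800 K

T_earth = (solar_constant / (4 * sigma)) ** 0.25        # absorbs over pi R^2, emits over 4 pi R^2
print(T_earth)                                          # ~280 K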
In fact, the ‘phonons’ of sound are extremely similar to the photons of light, except that phonons
travel at cs , have 3 polarizations, can have different speeds for different polarizations (and even
directions!), and cannot have wavelength smaller than (twice) the atomic spacing.
All this means that we have a simple relation
ε = hf = (hc_s/2L)|n⃗|      (10.6.1)
and these modes have a Bose-Einstein distribution
n̄_BE = 1/(e^{ε/kT} − 1)      (10.6.2)
with vanishing chemical potential, as phonons can be created and destroyed.
The energy is also the usual formula
U = 3 Σ_{n_x,n_y,n_z} ε n̄_BE(ε)      (10.6.3)
Following Debye, we replace the true region of allowed n⃗ by an eighth of a sphere of radius n_max, chosen
so that the total volume in n-space (i.e. the total number of modes) is the same. This approximation is exact at both low and high
temperatures! Why?
Given this approximation, we can easily convert our sum into spherical coordinates and compute
the total average energy. This is
U = 3 ∫_0^{n_max} dn (4πn^2/8) ε/(e^{ε/kT} − 1)
  = (3π/2) ∫_0^{n_max} dn (hc_s n^3/2L)/(e^{hc_s n/(2LkT)} − 1)      (10.6.5)
Don’t be fooled – this is almost exactly the same as our treatment of photons.
We can change variables to
x = (hc_s/2LkT) n      (10.6.6)
This means that we are integrating up to
x_max = (hc_s/2LkT) n_max = (hc_s/2kT)(6N/πV)^{1/3} = T_D/T      (10.6.7)
where
T_D = (hc_s/2k)(6N/πV)^{1/3}      (10.6.8)
so that at high temperatures we find
U = 3NkT      (10.6.11)
while at low temperatures phonons behave exactly like photons; in both cases the physics is dominated by the
number of excited modes.
This means that at low temperature
C_V = 12π^4 N k T^3/(5 T_D^3)      (10.6.13)
and this T 3 rule beautifully agrees with experimental data. Though for metals, to really match, we
need to also include the electron contribution.
for electrons. Notice that the power-law dependence on the density is different because electrons are
non-relativistic. In particular, we have
T_D/T_F ∝ (c_s m_e/h)(V/N)^{1/3}      (10.6.15)
This shows that if V/N doesn't vary all that much, the main thing determining T_D is the speed of
sound c_s. If we had time to cultivate a deeper understanding of materials, we could try to estimate
if we should expect this quantity to be order one.
You should also note that for T ≫ T_D we might expect that solids will melt. Why? If not, it
means that the harmonic oscillator potentials have significant depth, so we can excite many, many
modes without the molecules escaping.
As we’ve seen before, we cannot do the integral analytically.
Let’s see what happens if we just choose µ = 0. In that case, using our usual x = /kT type
change of variables we have
3/2 Z ∞ √
2 2πmkT xdx
N=√ 2
V x
(10.7.5)
π h 0 e −1
Numerically this gives
N = 2.612 (2πmkT/h^2)^{3/2} V      (10.7.6)
This really just means that there’s a particular Tc for which it’s true that µ = 0, so that
kT_c = 0.527 (h^2/2πm)(N/V)^{2/3}      (10.7.7)
At T > Tc we know that µ < 0 so that the total N is fixed. But what about at T < Tc ?
Actually, our integral representation breaks down at T < Tc as the discreteness of the states
becomes important. The integral does correctly represent the contribution of high energy states, but
fails to account for the few states near the bottom of the spectrum.
This suggests that
N_excited ≈ 2.612 (2πmkT/h^2)^{3/2} V      (10.7.8)
for T < Tc . (This doesn’t really account for low-lying states very close to the ground state.)
Thus we have learned that at temperature T > Tc , the chemical potential is negative and all
atoms are in excited states. But at T < Tc , the chemical potential µ ≈ 0 and so
N_excited ≈ N (T/T_c)^{3/2}      (10.7.9)
and so the rest of the atoms are in the ground state with
N_0 = N − N_excited ≈ N [1 − (T/T_c)^{3/2}]      (10.7.10)
The accumulation of atoms in the ground state is Bose-Einstein condensation, and Tc is called
the condensation temperature.
Notice that as one would expect, this occurs when quantum mechanics becomes important, so
that the quantum length
V/N ≈ ℓ_Q^3 = (h/√(2πmkT))^3      (10.7.11)
So condensation occurs when the wavefunctions begin to overlap quite a bit.
Notice that having many bosons helps – that is for fixed volume
kT_c = 0.527 (h^2/2πm)(N/V)^{2/3} ∝ N^{2/3} ε_0      (10.7.12)
We do not need a kT of order the ground state energy to get all of the bosons into the ground state!
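A rough numerical sketch of the condensation temperature formula (the rubidium-87 mass is standard; the number density of a few × 10^18 per cubic meter is my assumed, illustrative value for a dilute trapped gas, not a number from the text):

import math

h = 6.626e-34          # J s
k_B = 1.381e-23        # J/K
m = 87 * 1.66e-27      # mass of a rubidium-87 atom, kg
n = 2.5e18             # assumed number density, 1/m^3

T_c = 0.527 * (h**2 / (2 * math.pi * m)) * n ** (2.0 / 3.0) / k_B
print(T_c)                                  # a few tens of nanokelvin

T = 0.5 * T_c
print(1 - (T / T_c) ** 1.5)                 # condensate fraction N_0/N at T = T_c/2, ~0.65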
Achieving BEC
BEC with cold atoms was first achieved in 1995, using a condensate of about 1000 rubidium-87
atoms. This involved laser cooling and trapping. By 1999 BEC was also achieved with sodium,
lithium, hydrogen.
Superfluids are also BECs, but in that case interactions among the atoms are important for
the interesting superfluidic properties. The famous example is helium-4, which has a superfluid
component below 2.17 K.
Note that superconductivity is a kind of BEC condensation of pairs of electrons, called Cooper
pairs. Helium-3 has a similar pairing-based superfluidic behavior at extremely low temperatures.
Why? Entropy
With distinguishable particles, the ground state's not particularly favored until kT ∼ ε_0. But with
indistinguishable particles, the counting changes dramatically.
The number of ways of arranging N indistinguishable particles in Z1 1-particle states is
(N + Z_1)!/(N! Z_1!) ∼ e^{Z_1} (N/Z_1)^{Z_1}      (10.7.13)
for Z_1 ≪ N. Whereas in the distinguishable case we just have Z_1^N states, which is far, far larger. So
in the QM case with Bosons, at low temperatures we have very small relative multiplicity or
entropy, as these states will be suppressed by e−U/kT ∼ e−N . So we quickly transition to a BEC.
11 Phase Transitions
A phase transition is a discontinuous change in the properties of a substance as its environment
(pressure, temperature, chemical potential, magnetic field, composition or 'doping') changes only
infinitesimally. The different forms of a substance are called phases. A diagram showing the
equilibrium phases as a function of environmental variables is called a phase diagram.
On your problem set you’ll show (if you do all the problems) that BEC is a phase transition.
The phase structure of water has a triple point where all three phases meet, and a critical point where water
and steam become indistinguishable (374 C and 221 bars for water). There is no ice/liquid critical point,
though long molecules can form a liquid crystal phase where molecules move but stay oriented. There
are many phases of ice at high pressure. Metastable phases are possible via super-cooling/heating,
etc.
The pressure at which a gas can coexist with its solid or liquid phase is called the vapor
pressure. At T = 0.01 Celsius and P = 0.006 bar all three phases can coexist at the triple point.
At low pressure ice sublimates directly into vapor.
We’ve seen sublimation of CO2 , this is because the triple point is above atmospheric pressure, at
5.2 bars. Note pressure lowers the melting temperature of ice, which is unusual (because ice is less
dense than water).
There are also phase transitions between fluid/superfluid and to superconductivity (eg in the
temperature and magnetic field plane). Ferromagnets can have domains of up or down magnetization,
and there’s a critical point at B = 0 and T = Tc called the Curie point.
Phase transitions have been traditionally classified as 'first order' or 'second order' depending on
whether the first derivatives of the Gibbs free energy or only its higher derivatives are discontinuous;
nowadays we usually just say first order or 'continuous'.
11.1 Clausius-Clapeyron
Given the relations for derivatives of G, it’s easy to see how phase boundaries must depend on P
and T .
At the boundary we must have that the two phases have G_1 = G_2. But this also means that
dG_1 = dG_2 as we move along the phase boundary, i.e. −S_1 dT + V_1 dP = −S_2 dT + V_2 dP, so that
dP/dT = (S_1 − S_2)/(V_1 − V_2)      (11.1.3)
and this determines the slope of the phase boundary (eg between a liquid and gas transition).
We can re-write this in terms of the latent heat L via
L = T (S1 − S2 ) (11.1.4)
For non-ideal gases, the equation of state can be organized as a virial expansion,
PV = NkT [1 + B_2(T)(N/V) + B_3(T)(N/V)^2 + · · ·].
We can actually try to compute the functions B_n(T) by doing what's called a 'cluster expansion' in
the interactions of the gas molecules.
But for now, let’s just see what we get if we include the first correction. A particular equation
of state that’s of this form, and which keeps only the first two terms, is the famous va der Waals
equation of state
N2
P + a 2 (V − N b) = N kT (11.2.2)
V
or
P = NkT/(V − Nb) − a N^2/V^2      (11.2.3)
for constants a and b. This formula is just an approximation! Note that the absence of higher order
terms means that we are neglecting effects from interactions among many molecules at once, which
are certainly important at high density.
We can also write the vdW equation more elegantly in terms of purely intensive quantities as
P = kT/(v − b) − a/v^2      (11.2.4)
where v = V /N . This makes it clear that the total amount of stuff doesn’t matter!
The modification by b is easy to understand, as it means that the fluid cannot be compressed
down to zero volume. Thus b is a kind of minimum volume occupied by a molecule. It’s actually
well approximated by the cube of the average width of a molecule, eg 4 angstroms cubed for water.
The a term accounts for short-range attractive forces between molecules. Note that it scales
like N^2/V^2, so it corresponds to forces on all the molecules from all the other molecules, and it effectively
decreases the pressure (for a > 0), so it's attractive. The constant a varies a lot between substances,
in fact by more than two orders of magnitude, depending on how much the molecules interact.
Now let’s investigate the implications of the van der Waals equation. We’ll actually be focused
on the qualitative implications. Let’s plot pressure as a function of volume at fixed T . At large T
curves are smooth, but at small T they’re more complicated, and have a local minimum! How can a
gas’s pressure decrease when its volume decreases!?
To see what really happens, let’s compute G. We have
dG = −SdT + V dP + µdN (11.2.5)
At fixed T and N this is just dG = V dP so
(∂G/∂V)_{T,N} = V (∂P/∂V)_{T,N}      (11.2.6)
From the van der Waals equation we can compute derivatives of P , giving
(∂G/∂V)_{T,N} = −NkT V/(V − Nb)^2 + 2aN^2/V^2      (11.2.7)
Integrating then gives
G = −NkT log(V − Nb) + N^2 b kT/(V − Nb) − 2aN^2/V + c(T)      (11.2.8)
where c(T ) is some constant that can depend on T but not V .
Note that the thermodynamically stable state has minimum G. This is important because at a
given value of pressure P , there can be more than one value of G. We see this by plotting G and P
parametrically as functions of V . This shows the volume changes abruptly at a gas-liquid transition!
We can determine P at the phase transition directly, but there’s another cute, famous method.
Since the total change in G is zero across the transition, we have
0 = ∮ dG = ∮ (∂G/∂P)_{T,N} dP = ∮ V dP      (11.2.9)
We can compute this last quantity from the PV diagram. This is called the Maxwell construction.
This is a bit non-rigorous, since we’re using unstable states/phases to get the result, but it works.
Repeating the analysis shows where the liquid-gas transition occurs for a variety of different
temperatures. The corresponding pressure is the vapor pressure where the transition occurs. At
the vapor pressure, at a fixed temperature, liquid and gas can coexist.
At high temperature, there is no phase boundary! Thus there’s a critical temperature Tc and
a corresponding Pc and Vc (N ) at which the phase transition disappears. At the critical point, liquids
and gases become identical! The use of the volume in all of this is rather incidental. Recall that
G = N µ, and so we can rephrase everything purely in terms of intensive quantities.
Note though that while van der Waals works qualitatively, it fails quantitatively. This isn’t
surprising, since it ignores multi-molecule interactions.
G = U + PV − TS      (11.3.1)
If two substances A and B simply sat side by side without mixing, we would have the straight line
G = (1 − x) G_A + x G_B
where x is the fraction of B and G_{A,B} are the Gibbs free energies of the unmixed substances.
But if we account for the entropy of mixing, then instead of a straight line, S_mix will be largest when
x ≈ 1/2, so that G will be smaller than this straight line suggests. Instead G will be concave up as
a function of x.
We can roughly approximate the mixing entropy by
S_mix ≈ −Nk [x log x + (1 − x) log(1 − x)]
where N is the total number of molecules. This is correct for ideal gases, and for liquids or solids
where the molecules of each substance are roughly the same. This leads to
G ≈ (1 − x) G_A + x G_B + NkT [x log x + (1 − x) log(1 − x)]
This is the result for an ideal mixture. Liquid and solid mixtures are usually far from ideal, but
this is a good starting point.
Since the derivative of G wrt x at x = 0 and x = 1 is actually infinite, equilibrium phases almost
always contain impurities.
Even non-ideal mixtures tend to behave this way, unless there’s some non-trivial energy or U
dependence in mixing, eg with oil and water, where the molecules are only attracted to their own
kind. In that case the G of mixing can actually be concave down – that’s why oil and water don’t
mix, at least at low T . Note that at higher T there’s a competition between the energy effect and
the entropy effect. Even at very low T , the entropy dominates near x = 0 or x = 1, so oil and water
will have some ‘impurities’.
Note that concave-down G(x) simply indicates an instability where the mixture will split up
into two unmixed substances. In this region we say that the two phases are immiscible or have a
solubility gap.
So we end up with two stable mixtures at small x > 0 and large x < 1, and unmixed substances
between. Solids can have solubility gaps too.
You can read about a variety of more complicated applications of these ideas in the text.
H2 O ↔ H + + OH − (11.4.1)
Even if one configuration of chemicals is more stable (water), every once in a while there’s enough
energy to cause the reaction to go the other way.
As usual, the way to think about this quantitatively is in terms of the Gibbs free energy
G = U − TS + PV (11.4.2)
and to consider the chemical reaction as a system of particles that will react until they reach an
equilibrium. We end up breaking up a few H_2O molecules because although it costs energy, the
entropy increase makes up for that. Of course this depends on temperature though!
We can plot G as a function of x, where 1 − x is the fraction of H2 O. Without any mixing we
would expect G(x) to be a straight line, and since U is lower for water, we’d expect x = 0. However,
due to the entropy of mixing, equilibrium will have x > 0 – recall that G′(x) is infinite at x = 0!
Plot this.
We can characterize the equilibrium condition by the fact that the slope of G(x) is zero. This
means that
0 = dG = Σ_i µ_i dN_i      (11.4.3)
where we assumed T, P are fixed. The sum on the right runs over all species. But we know that the dN_i
are related by the stoichiometry of the reaction, so that
µ_{H_2O} = µ_{H^+} + µ_{OH^−}
at equilibrium. Since the chemical potentials are a function of species concentration, this determines
the equilibrium concentrations!
The equilibrium condition is always the same as the reaction itself, with names of chemicals
replaced by their chemical potentials. For instance with
N2 + 3H2 ↔ 2N H3 (11.4.6)
we would have
µ_{N_2} + 3µ_{H_2} = 2µ_{NH_3}
For ideal gases we can also write each chemical potential as µ_i = µ_i^0 + kT log(P_i/P_0),
where µ_i^0 is the chemical potential in a fixed 'standard' state when its partial pressure is P_0. So for
example, for the last chemical reaction we can write
kT log[P_{N_2} P_{H_2}^3/(P_{NH_3}^2 P_0^2)] = 2µ_{NH_3}^0 − µ_{N_2}^0 − 3µ_{H_2}^0 = ΔG^0/N_A      (11.4.9)
Exponentiating gives P_{NH_3}^2 P_0^2/(P_{N_2} P_{H_2}^3) = e^{−ΔG^0/RT} ≡ K,
where K is the equilibrium constant. This equation is called the law of mass action by chemists.
Even if you don’t know K a priori, you can use this equation to determine what happens if you add
reactants to a reaction.
For this particular reaction – Nitrogen fixing – we have to use high temperatures and very high
pressures – this was invented by Fritz Haber, and it revolutionized the production of fertilizers and
explosives.
Ionization of Hydrogen
For the simple reaction
H ↔p+e (11.4.11)
we can compute everything from first principles. We have the equilibrium condition
kT log[P_H P_0/(P_p P_e)] = µ_p^0 + µ_e^0 − µ_H^0      (11.4.12)
We can treat all of these species as structureless monatomic gases, so that we can use
µ = −kT log[(kT/P)(2πmkT/h^2)^{3/2}]      (11.4.13)
The only additional issue is that we need to subtract the ionization energy I = 13.6 eV from µ_H^0.
Since m_p ≈ m_H we have
−kT log[P_H P_0/(P_p P_e)] = kT log[(kT/P_0)(2πmkT/h^2)^{3/2}] − I      (11.4.14)
P p Pe P0 h kT
Then with a bit of algebra we learn that
P_p/P_H = (kT/P_e)(2πmkT/h^2)^{3/2} e^{−I/kT}      (11.4.15)
which is called the Saha equation. Note that Pe /kT = Ne /V . If we plug in numbers for the
surface of the sun this gives
P_p/P_H ≲ 10^{−4}      (11.4.16)
so less than one atom in ten thousand are ionized. Note though that this is much, much larger than
the Boltzmann factor by itself!
But we also need to account for the redundancy because B molecules are identical. The result is
G = N_A µ_0(T, P) + N_B f(T, P) − N_B kT + N_B kT log(N_B/N_A)
where f(T, P) doesn't depend on N_A. It is what accounts for the interaction of B molecules with
the A molecules that surround them. This expression is valid as long as N_B ≪ N_A.
The chemical potentials follow from
µ_A = (∂G/∂N_A)_{T,P,N_B} = µ_0(T, P) − kT N_B/N_A      (11.5.5)
and
µ_B = (∂G/∂N_B)_{T,P,N_A} = f(T, P) + kT log(N_B/N_A)      (11.5.6)
Adding solute reduces µ_A and increases µ_B. Also, these chemical potentials depend only on the
intensive ratio N_B/N_A.
Osmotic Pressure
If we have pure solvent on one side of a barrier permeable only by solvent, and solute-solvent solution
on the other side, then solvent will want to move across the barrier to further dilute the solution.
This is called osmosis.
To prevent osmosis, we could add additional pressure to the solution side. This is the osmotic
pressure. It’s determined by equality of solvent chemical potential. So we need
NB ∂µ0
µ0 − kT = µ0 + δP (11.5.7)
NA ∂P
Note that
∂µ_0/∂P = V/N      (11.5.8)
at fixed T, N because G = Nµ, and so
(V/N_A) δP = kT N_B/N_A      (11.5.9)
or
δP = N_B kT/V      (11.5.10)
is the osmotic pressure.
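This gives surprisingly large pressures; here is a quick sketch for an illustrative solute concentration of 1 mol per liter (my choice, not a number from the text):

N_A = 6.022e23       # Avogadro's number
k_B = 1.381e-23      # J/K
T = 300.0            # K

n_B = 1.0 * N_A / 1e-3     # 1 mol of solute per liter, converted to particles per m^3
delta_P = n_B * k_B * T    # osmotic pressure, Pa
print(delta_P, delta_P / 1.013e5)    # ~2.5e6 Pa, i.e. roughly 25 atmospheres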
Boiling and Freezing Points
The boiling and freezing points of a solution are altered by the presence of the solute. Often we can
assume the solute never evaporates, eg if we consider salt in water. Intuitively, the boiling point
may increase (and the freezing point decrease) because the effective liquid surface area is decreased
due to the solute ‘taking up space’ near the surface and preventing liquid molecules from moving.
In the case of freezing, the solute takes up space for building solid bonds.
Our theorizing above provides a quantitative treatment of these effects. For boiling we have
µ_0(T, P) − kT N_B/N_A = µ_gas(T, P)      (11.5.11)
and so the NB /NA fraction of solute makes the liquid chemical potential smaller, so that higher
temperatures are needed for boiling.
If we hold the pressure fixed and vary the temperature from the pure boiling point T0 we find
µ_0(T_0, P) + (∂µ_0/∂T)(T − T_0) − kT N_B/N_A = µ_gas(T_0, P) + (∂µ_gas/∂T)(T − T_0)      (11.5.12)
and we have that
∂µ/∂T = −S/N      (11.5.13)
so that
−(T − T_0)(S_liquid − S_gas)/N_A = kT N_B/N_A      (11.5.14)
The difference in entropies between gas and liquid is L/T0 , where L is the latent heat of vaporization,
so we see that
T − T_0 = N_B kT_0^2/L      (11.5.15)
where we treat T ≈ T0 on the RHS.
A similar treatment of pressure shows that
P/P_0 = 1 − N_B/N_A      (11.5.16)
for the boiling transition – we need to be at lower pressure for boiling to occur.
The first and second derivatives of P with respect to V vanish
at the critical point, which helps us to easily identify it (eg using the van der Waals equation). A
cute way to say this is to write the vdW equation using intensive variables as
P v 3 − (P b + kT )v 2 + av − ab = 0 (11.6.2)
The critical point occurs when all three roots of the cubic coincide, so that
pc (v − vc )3 = 0 (11.6.3)
In terms of the reduced variables P̄ = P/p_c, v̄ = v/v_c, T̄ = T/T_c, this becomes
P̄ = (8/3) T̄/(v̄ − 1/3) − 3/v̄^2      (11.6.6)
This is often called the law of corresponding states. The idea being that all gases/liquids correspond
with each other once expressed in these variables. This extends the extreme equivalence among
gases implied by the ideal gas law to the vdW equation.
Furthermore, since $T_c$, $v_c$, $p_c$ only depend on $a$ and $b$, there must be a relation among them; it is that
$$\frac{p_c v_c}{kT_c} = \frac{3}{8} \qquad (11.6.7)$$
Thus all gases described by the vdW equation must obey this relation! Of course general gases
depend on more than the two parameters a, b, since vdW required truncating the expansion in the
density.
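Here is a small sympy sketch (sympy assumed available) checking that the critical values quoted above really make (11.6.2) a perfect cube, and that they give the ratio 3/8:

```python
# Check v_c = 3b, p_c = a/(27 b^2), kT_c = 8a/(27 b) against the cubic (11.6.2).
import sympy as sp

v, a, b = sp.symbols('v a b', positive=True)
vc, pc, kTc = 3*b, a/(27*b**2), 8*a/(27*b)

cubic = pc*v**3 - (pc*b + kTc)*v**2 + a*v - a*b
print(sp.simplify(cubic - pc*(v - vc)**3))   # 0, so the cubic is p_c (v - v_c)^3
print(pc*vc/kTc)                             # 3/8
```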
This universality has been confirmed in experiments, which show that all gases, no matter their
molecular makeup, behave very similarly in the vicinity of the critical point. Let’s explore that.
For $T < T_c$, so that $\bar T < 1$, the vdW equation has two solutions to
$$\bar P = \frac{8}{3}\,\frac{\bar T}{\bar v - 1/3} - \frac{3}{\bar v^2} \qquad (11.6.8)$$
corresponding to a vgas and vliquid . This means that we can write
$$\bar T = \frac{(3\bar v_l - 1)(3\bar v_g - 1)(\bar v_l + \bar v_g)}{8\,\bar v_g^2\,\bar v_l^2} \qquad (11.6.9)$$
We can expand near the critical point by expanding in $v_g - v_l$, which gives
$$\bar T \approx 1 - \frac{1}{16}(\bar v_g - \bar v_l)^2 \qquad (11.6.10)$$
or
$$\bar v_g - \bar v_l \propto \sqrt{T_c - T} \qquad (11.6.11)$$
so this tells us how the difference in molecular volumes varies near the critical temperature.
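A short numerical sketch of this expansion: take the symmetric parametrization $\bar v_l = 1 - \epsilon$, $\bar v_g = 1 + \epsilon$ (an assumption that holds to leading order near the critical point) and compare (11.6.9) with the approximation (11.6.10).

```python
# Compare the exact coexistence relation (11.6.9) with 1 - (v_g - v_l)^2/16.
def T_bar(vl, vg):
    return (3*vl - 1) * (3*vg - 1) * (vl + vg) / (8 * vg**2 * vl**2)

for eps in (0.2, 0.1, 0.05, 0.01):
    vl, vg = 1 - eps, 1 + eps
    exact = T_bar(vl, vg)
    approx = 1 - (vg - vl)**2 / 16
    print(f"eps={eps:5.2f}   T_bar={exact:.6f}   approx={approx:.6f}")
```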
We can answer a similar question about pressure with essentially no additional work, giving
$$P - P_c \propto (v - v_c)^3 \qquad (11.6.12)$$
since the first and second derivatives of P wrt volume at Pc vanish.
As a third example, let’s consider the compressibility
$$\kappa = -\frac{1}{v}\left(\frac{\partial v}{\partial P}\right)_T \qquad (11.6.13)$$
We know that at Tc the derivative of pressure wrt volume vanishes, so we must have that
$$\left(\frac{\partial P}{\partial v}\right)_{T,\,v_c} = -a(T - T_c) \qquad (11.6.14)$$
for some constant $a$, so that $\kappa \propto 1/(T - T_c)$ diverges as we approach the critical temperature.
Experimentally measured critical exponents, however, differ significantly from our predictions. In particular, the true behaviors are non-analytic, ie they don't have a nice Taylor series expansion about the critical point.
We already saw a hint of why life’s more complicated at the critical point – there are large
fluctuations, for example in the density. The fact that the compressibility
$$-\frac{1}{v}\left(\frac{\partial v}{\partial P}\right)_T \propto \frac{1}{(T - T_c)^\gamma} \qquad (11.6.17)$$
(where we thought $\gamma = 1$, and in fact $\gamma \approx 1.2$) means that the gas/liquid is becoming arbitrarily
compressible at the critical point. But this means that there will be huge, local fluctuations in the
density. In fact one can show that locally
$$\frac{\overline{\Delta n^2}}{n} = -kT\,\frac{\partial \langle n\rangle}{\partial v}\bigg|_{P,T}\,\frac{1}{v}\frac{\partial v}{\partial P}\bigg|_{N,T} \qquad (11.6.18)$$
so that fluctuations in density are in fact diverging. This also means that we can’t use an equation
of state that only accounts for average pressure, volume, and density.
Understanding how to account for this is the subject of ‘Critical Phenomena’, with close links to
Quantum Field Theory, Conformal Field Theory, and the ‘Renormalization Group’. Perhaps they’ll
be treated by a future year-long version of this course. But for now we’ll move on to study a specific
model that shows some interesting related features.
12 Interactions
Let’s study a few cases where we can include the effects of interactions.
12.1 Ising Model

As a Magnetic System
The kind of model that we’ve already studied has energy or Hamiltonian
$$E_B = -B\sum_{i=1}^N s_i \qquad (12.1.1)$$
$$s_i = \pm 1 \qquad (12.1.2)$$
The Ising model adds an additional complication, a coupling between nearest neighbor spins, so
that
$$E = -J\sum_{\langle ij\rangle} s_i s_j - B\sum_{i=1}^N s_i \qquad (12.1.3)$$
where the notation $\langle ij\rangle$ implies that we only sum over the nearest neighbors. How this actually
works depends on the dimension – one can study the Ising model in 1, 2, or more dimensions. We
can also consider different lattice shapes. We often use q to denote the number of nearest neighbors.
If J > 0 then neighboring spins prefer to be aligned, and the model acts as a ferromagnet; when
J < 0 the spins prefer to be anti-aligned and we have an anti-ferromagnet.
We can thus describe the Ising model via a partition function
$$Z = \sum_{\{s_i\}} e^{-\beta E[s_i]} \qquad (12.1.4)$$
The field $B$ and a coupling $J > 0$ will make the spins want to align, but temperature causes them to fluctuate. The natural observable to study is the magnetization
$$m = \frac{1}{N\beta}\frac{\partial}{\partial B}\log Z \qquad (12.1.5)$$
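Since the sum over configurations is finite, a tiny system can be handled by brute force. The sketch below enumerates an 8-site Ising chain with periodic boundaries (an assumed toy setup, not anything specified in the notes) and extracts m from (12.1.5) by a numerical derivative with respect to B.

```python
# Brute-force partition function and magnetization for a small 1D Ising chain.
from itertools import product
import math

def log_Z(N, J, B, beta):
    total = 0.0
    for spins in product((-1, 1), repeat=N):
        E = -J * sum(spins[i] * spins[(i + 1) % N] for i in range(N))
        E -= B * sum(spins)
        total += math.exp(-beta * E)
    return math.log(total)

N, J, beta, B = 8, 1.0, 0.5, 0.3
dB = 1e-4   # step for the numerical derivative of log Z
m = (log_Z(N, J, B + dB, beta) - log_Z(N, J, B - dB, beta)) / (2 * dB) / (N * beta)
print(f"m = {m:.4f} at B={B}, beta={beta}")
```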
As a Lattice Gas
With the same math we can say different words and view the model as a lattice gas. Since the
particles have hard cores, only one can sit at each site, so the sites are either full, ni = 1, or empty,
ni = 0. Then we can further add a reward in energy for them to sit on neighboring sites, so that
$$E = -4J\sum_{\langle ij\rangle} n_i n_j - \mu\sum_i n_i \qquad (12.1.6)$$
where $\mu$ is the chemical potential, which determines the overall particle number. This Hamiltonian is the same as the magnetic one above with $s_i = 2n_i - 1$, up to an additive constant and a shift relating $B$ and $\mu$.
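As a quick check of this mapping, here is a one-bond sympy sketch (sympy assumed available) substituting $s = 2n - 1$ into the coupling term; the extra terms linear in n just shift the chemical potential, and the constant is irrelevant.

```python
# Substitute s = 2n - 1 into a single nearest-neighbor coupling term.
import sympy as sp

J, ni, nj = sp.symbols('J n_i n_j')
si, sj = 2*ni - 1, 2*nj - 1

print(sp.expand(-J * si * sj))   # -4*J*n_i*n_j + 2*J*n_i + 2*J*n_j - J
```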
To make progress we can use the mean field (MFT) approximation. We write each spin as
$$s_i = (s_i - m) + m \qquad (12.1.7)$$
where $m$ is the average spin throughout the lattice. Then the neighboring interactions are
$$s_i s_j = (s_i - m)(s_j - m) + m(s_i + s_j) - m^2$$
The approximation comes in when we assume that the fluctuations of spins away from the average are small. This means we treat
$$(s_i - m)(s_j - m) \approx 0$$
though we are not assuming that $s_i \approx m$, as this latter statement is never true, since we will sum over $s_i = \pm 1$. The former statement is possibly true because $s_i \neq s_j$, ie they are different spins whose fluctuations need not add up coherently. In this approximation the energy simplifies greatly to
$$E_{\rm mft} = -J\sum_{\langle ij\rangle}\left[m(s_i + s_j) - m^2\right] - B\sum_i s_i = \frac{1}{2}JNqm^2 - (Jqm + B)\sum_i s_i \qquad (12.1.11)$$
where $Nq/2$ is the number of nearest neighbor pairs. So the mean field approximation has removed the interactions. Instead we have an effective magnetic field
$$B_{\rm eff} = B + Jqm$$
so that the spins see an extra contribution to the magnetic field set by the mean field of their neighbors.
Now we are back in paramagnet territory, and so we can work out that the partition function is
$$Z = e^{-\beta JNqm^2/2}\left(e^{-\beta B_{\rm eff}} + e^{\beta B_{\rm eff}}\right)^N = e^{-\beta JNqm^2/2}\, 2^N \cosh^N(\beta B_{\rm eff}) \qquad (12.1.13)$$
However, this is a function of m... yet m is the average spin, so this can't be correct unless it predicts the correct value of m. We resolve this issue by computing the magnetization from (12.1.5) and solving for m self-consistently, giving
$$m = \tanh(\beta B_{\rm eff}) = \tanh\left(\beta(Jqm + B)\right)$$
Vanishing B
Perhaps the most interesting case is B = 0. The behavior depends on the value of βJq, as can be
noted from the expansion of tanh.
If $\beta Jq < 1$, then the only solution is $m = 0$. So for $kT > Jq$, ie at high temperatures, the average magnetization vanishes. Thermal fluctuations dominate and the spins do not align.
However if $kT < Jq$, ie at low temperatures, we have a solution $m = 0$ as well as magnetized
solutions with m = ±m0 . The latter correspond to phases where the spins are aligning. It turns out
that m = 0 is unstable, in a way analogous to the unstable solutions of the van der Waals equation.
The critical temperature separating the phases is
kTc = Jq (12.1.15)
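The self-consistency condition is easy to explore numerically. The sketch below solves $m = \tanh(\beta Jqm)$ at $B = 0$ by fixed-point iteration, using illustrative units $J = k = 1$ and $q = 4$ (a square lattice); these choices are assumptions made for the example, not values from the notes.

```python
# Mean field magnetization at B = 0: iterate m -> tanh(beta*J*q*m).
import math

def mft_magnetization(T, J=1.0, q=4, m_start=0.9, iters=2000):
    beta = 1.0 / T              # units with k = 1
    m = m_start                 # start from a magnetized guess
    for _ in range(iters):
        m = math.tanh(beta * J * q * m)
    return m

# kTc = J*q = 4 in these units; below Tc a nonzero solution survives,
# above Tc the iteration collapses to m = 0 (right at Tc it decays only slowly).
for T in (2.0, 3.0, 3.9, 4.5, 6.0):
    print(f"T = {T:4.1f}   m = {mft_magnetization(T):.4f}")
```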
Free Energy
We can understand the phase structure of the Ising model by computing the free energy. In the
MFT approximation it is
$$F = -\frac{1}{\beta}\log Z = \frac{1}{2}JNqm^2 - \frac{N}{\beta}\log\left(2\cosh(\beta B_{\rm eff})\right) \qquad (12.1.16)$$
where m must still satisfy
$$m = \tanh(\beta B_{\rm eff}) \qquad (12.1.17)$$
To determine which solutions dominate, we just calculate F(m). For example when $B = 0$ and $T < T_c$ we have the possibilities $m = 0, \pm m_0$, and we find that $F(\pm m_0) < F(m = 0)$, so that the aligned phases dominate.
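A quick numerical check of this statement, using the per-spin form of (12.1.16) at $B = 0$ with the same illustrative units as above ($J = k = 1$, $q = 4$ assumed):

```python
# Compare F(m = 0) with F(m0) below Tc, per spin, at B = 0.
import math

J, q, T = 1.0, 4, 3.0    # T < Tc = J*q = 4 in these units
beta = 1.0 / T

def F_per_spin(m):
    return 0.5 * J * q * m**2 - math.log(2 * math.cosh(beta * J * q * m)) / beta

m0 = 0.9                 # solve m = tanh(beta*J*q*m) by fixed-point iteration
for _ in range(200):
    m0 = math.tanh(beta * J * q * m0)

print(f"m0 = {m0:.4f}")
print(f"F(0)  = {F_per_spin(0.0):.4f}")
print(f"F(m0) = {F_per_spin(m0):.4f}")   # lower, so the aligned phase wins
```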
Usually we can only define F in equilibrium. But we can take a maverick approach and pretend
that F (T, B, m) actually makes sense away from equilibrium, and see what it says about various
phases with different values of m. This is the Landau theory of phases. It is the beginning of a rich
subject...