Lecture Notes on Undergraduate Physics
Kevin Zhou
[email protected]
These notes review the undergraduate physics curriculum, with an emphasis on quantum mechanics.
They cover, essentially, the material that every working physicist should know. The notes are
not self-contained, but rather assume everything in the high school Physics Olympiad syllabus as
prerequisite knowledge. Nothing in these notes is original; they have been compiled from a variety
of sources. The primary sources were:
• David Tong’s Classical Dynamics lecture notes. A friendly set of notes that covers Lagrangian
and Hamiltonian mechanics with neat applications, such as the gauge theory of a falling cat.
• Arnold, Mathematical Methods of Classical Mechanics. The classic advanced mechanics book.
The first half of the book covers Lagrangian mechanics compactly, with nice and tricky problems,
while the second half covers Hamiltonian mechanics geometrically.
• David Tong’s Electrodynamics lecture notes. Covers electromagnetism at the standard Griffiths
level. Especially nice because it does the most complex calculations in index notation, when
vector notation becomes clunky or ambiguous.
• David Tong’s Statistical Mechanics lecture notes. Has an especially good discussion of phase
transitions, which leads in well to a further course on statistical field theory.
• Blundell and Blundell, Concepts in Thermal Physics. A good first statistical mechanics book
filled with applications, touching on information theory, non-equilibrium thermodynamics, the
Earth’s atmosphere, and much more.
• David Tong’s Applications of Quantum Mechanics lecture notes. A conversational set of notes,
with a focus on solid state physics. Also contains a nice section on quantum foundations.
• Robert Littlejohn’s Physics 221 notes. An exceptionally clear set of graduate-level quantum
mechanics notes, with a focus on atomic physics: you read it and immediately understand.
Every important point and pitfall is discussed carefully, and complex material is developed
elegantly, often in a cleaner and more rigorous way than in any of the standard textbooks.
Much of these notes is just an imperfect summary of Littlejohn’s notes; most diagrams are his.
The most recent version is here; please report any errors found to [email protected].
Contents
1 Classical Mechanics 1
1.1 Lagrangian Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Rigid Body Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Hamiltonian Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Poisson Brackets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Action-Angle Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 The Hamilton–Jacobi Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Electromagnetism 21
2.1 Electrostatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Magnetostatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Electrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Electromagnetism in Matter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3 Statistical Mechanics 46
3.1 Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3 Entropy and Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Classical Gases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Bose–Einstein Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.6 Fermi–Dirac Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4 Kinetic Theory 76
4.1 Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 The Boltzmann Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Hydrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5 Fluids (TODO) 84
13 Scattering 214
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
13.2 Partial Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
13.3 Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
13.4 The Lippmann–Schwinger Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
13.5 The S-Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
1 1. Classical Mechanics
1 Classical Mechanics
1.1 Lagrangian Formalism
We begin by carefully considering generalized coordinates.
• It follows directly from the chain rule that the Euler–Lagrange equations are preserved by any
invertible coordinate change, to generalized coordinates qa = qa (xA ), because the action is a
property of a path and hence is extremized regardless of the coordinates used to describe the
path. The ability to use any generalized coordinates we want is a key practical advantage of
Lagrangian mechanics over Newtonian mechanics.
• It is a little less obvious that this holds for time-dependent transformations qa = qa(xA, t), so we
will prove this explicitly. Again dropping indices,
$$\frac{\partial L}{\partial q} = \frac{\partial L}{\partial x} \frac{\partial x}{\partial q} + \frac{\partial L}{\partial \dot{x}} \frac{\partial \dot{x}}{\partial q}$$
where we have q̇ = q̇(x, ẋ, t) and hence by invertibility x = x(q, t) and ẋ = ẋ(q, q̇, t), and
$$\dot{x} = \frac{\partial x}{\partial q}\, \dot{q} + \frac{\partial x}{\partial t}.$$
This yields the ‘cancellation of dots’ identity
$$\frac{\partial \dot{x}}{\partial \dot{q}} = \frac{\partial x}{\partial q}.$$
• It is a bit confusing why these partial derivatives are allowed. The point is that we are working
on the tangent bundle of some manifold, where the position and velocity are independent. They
are only related once we evaluate quantities on a specific path x(t). All total time derivatives
here implicitly refer to such a path.
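The ‘cancellation of dots’ identity is easy to check symbolically; a minimal sympy sketch, using the made-up time-dependent transformation x = q cos t (as for a rotating frame):

```python
import sympy as sp

t = sp.symbols("t")
q_sym, qdot_sym = sp.symbols("q qdot")
q = sp.Function("q")(t)

# A sample time-dependent point transformation, x = q(t) cos t.
x = q * sp.cos(t)
xdot = sp.diff(x, t)   # chain rule: qdot cos t - q sin t

# On the tangent bundle, q and qdot are treated as independent variables.
x_tb = x.subs(q, q_sym)
xdot_tb = xdot.subs(sp.diff(q, t), qdot_sym).subs(q, q_sym)

# 'Cancellation of dots': the two partial derivatives agree.
lhs = sp.diff(xdot_tb, qdot_sym)   # d(xdot)/d(qdot)
rhs = sp.diff(x_tb, q_sym)         # dx/dq
print(sp.simplify(lhs - rhs))
```

The same check goes through for any invertible transformation x(q, t), since only the term linear in q̇ survives the q̇-derivative.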
Next we show that if constraints exist, we can work in a reduced set of generalized coordinates.
• Holonomic constraints are relations among the coordinates of the form
$$f_\alpha(x_A, t) = 0$$
which must hold on all physical paths. Holonomic constraints are useful because each one can
be used to eliminate a generalized coordinate; note that inequalities are not holonomic.
• Velocity-dependent constraints are holonomic if they can be ‘integrated’. For example, consider
a ball rolling without slipping. In one dimension, this is holonomic, since v = Rθ̇. In two
dimensions, it’s possible to roll the ball in a loop and have it come back in a different orientation.
Formally, a velocity constraint is holonomic if there is no nontrivial holonomy.
Thus, in these generalized coordinates, the constraint forces have disappeared. We may restrict
to the coordinates q a and use the original Lagrangian L. Note that in such an approach, we
cannot solve for the values of the constraint forces.
• In problems with symmetry, there will be conserved quantities, which may be formally written
as constraints on the positions and velocities. However, it’s important to remember that they
are not genuine constraints, because they only hold on-shell. Treating a conserved quantity as
a constraint and using the procedure above will give incorrect results.
• We may think of the coordinates q a as contravariant under changes of coordinates. Then the
conjugate momenta are covariant, so the quantity pi q̇ i is invariant. Similarly, the differential
form pi dq i is invariant.
Example. A single relativistic particle. The Lagrangian should be a Lorentz scalar, and the only
one available is the proper time. Setting c = 1, we have
$$L = -m\sqrt{1 - \dot{\mathbf{r}}^2}.$$
Then the momentum is γmv as expected, and the action is proportional to the proper time,
$$S = -m \int \sqrt{dt^2 - d\mathbf{r}^2} = -m \int d\tau.$$
Now consider how one might add a potential term. For a nonrelativistic particle, the potential term
is additive; in the relativistic case it can go inside or outside the square root. The two options are
$$S_1 = -m \int \sqrt{\left(1 + \frac{2V}{m}\right) dt^2 - d\mathbf{r}^2}, \qquad S_2 = -m \int d\tau + \int V\, dt.$$
Neither of these options is Lorentz invariant, which makes sense if we regard V as sourced by a
fixed background. However, we can get a Lorentz invariant action if we also transform the source.
In both cases, we need to extend V to a larger object. In the first case we must promote V to a
rank 2 tensor (because dt2 is rank 2), while in the second case we must promote V to a four-vector
(because dt is rank 1),
$$S_1 = -m \int \sqrt{g_{\mu\nu}\, dx^\mu dx^\nu}, \qquad S_2 = -m \int d\tau + e \int A_\mu\, dx^\mu.$$
These two possibilities yield gravity and electromagnetism, respectively. We see that in the nonrel-
ativistic limit, writing gµν = ηµν + hµν for small hµν, the quantity c²h00/2 becomes the gravitational potential.
There are a few ways to see this “in advance”. For example, the former forces the effect of
the potential to be proportional to the mass, which corresponds to the equivalence of inertial and
gravitational mass in gravity. Another way to argue this is to note that electric charge is assumed to
be Lorentz invariant; this is experimentally supported because atoms are electrically neutral, despite
the much greater velocities of the electrons. This implies the charge density is the timelike part of
a current four-vector j µ . Since the source of electromagnetism is a four-vector, the fundamental
field Aµ is as well. However, the total mass/energy is not Lorentz invariant, but rather picks up a
factor of γ upon Lorentz transformation. This is because the energy density is part of a tensor T µν ,
and accordingly the gravitational field in relativity is described by a tensor gµν .
Specializing to electromagnetism, we have
$$L = -m\sqrt{1 - \dot{\mathbf{r}}^2} - e(\phi - \dot{\mathbf{r}} \cdot \mathbf{A})$$
and varying the action yields the equation of motion
$$m\, \frac{d^2 x^\mu}{d\tau^2} = e F^{\mu}{}_{\nu}\, \frac{dx^\nu}{d\tau}, \qquad F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$$
where Fµν is the field strength tensor. The current associated with the particle is
$$j^\mu(x) = e \int d\tau\, \frac{dx^\mu}{d\tau}\, \delta(x - x(\tau)).$$
Further discussion of the relativistic point particle is given in the notes on String Theory.
1.2 Rigid Body Motion
• A rigid body is a collection of masses constrained so that kri − rj k is constant for all i and j.
Thus a rigid body has six degrees of freedom, from translations and rotations.
• If we fix a point to be the origin, we have only the rotational degrees of freedom. Define a fixed
coordinate system {e
ea } as well as a moving body frame {ea (t)} which moves with the body.
Both sets of axes are orthogonal and thus related by an orthogonal matrix, ea(t) = Rab(t) ẽb.
Since the body frame is specified by R(t), the configuration space C of orientations is SO(3).
• Every point r in the body can be expanded in the space frame or the body frame as r(t) = r̃a(t) ẽa = ra ea(t), where the body frame components ra are constant.
• Defining the matrix ωab by dea/dt = ωab eb, one can show ω is antisymmetric, so we take the Hodge dual to get the angular velocity vector
$$\omega_a = \frac{1}{2}\, \epsilon_{abc}\, \omega_{bc}, \qquad \boldsymbol{\omega} = \omega_a e_a.$$
Inverting this relation, we have ωbc = εabc ωa. Substituting into the above,
$$\frac{de_a}{dt} = -\epsilon_{abc}\, \omega_b\, e_c = \boldsymbol{\omega} \times e_a$$
where we used (ea)d = δad.
• The velocity of a point fixed in the body is
$$\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$$
which can be derived from simple vector geometry. Using that picture, the physical interpretation
of ω is n̂ dφ/dt, where n̂ is the instantaneous axis of rotation and dφ/dt is the rate of rotation.
Generally, both n̂ and dφ/dt change with time.
Example. To get an explicit formula for R(t), note that Ṙ = ωR. The naive solution is the
exponential, but since ω doesn’t commute with itself at different times, we must use the path-ordered exponential,
$$R(t) = \mathcal{P} \exp \int_0^t \omega(t')\, dt'.$$
For example, the second-order term here is
$$\int_0^t dt''\, \omega(t'') \int_0^{t''} dt'\, \omega(t')$$
where the ω’s are ordered from later to earlier. Then when we differentiate with respect to t, it
only affects the dt00 integral, which pops out a factor of ω on the left as desired. This exponential
operation relates rotations R in SO(3) with infinitesimal rotations ω in so(3).
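The path-ordered exponential can be checked numerically by multiplying many short rotations, with later times acting on the left; a sketch, with a made-up angular velocity profile ω(t) whose axis and magnitude both change:

```python
import numpy as np

def hat(w):
    """Antisymmetric matrix satisfying hat(w) @ v = w x v."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot(w, dt):
    """exp(hat(w) dt), computed exactly via the Rodrigues formula."""
    th = np.linalg.norm(w) * dt
    K = hat(w / np.linalg.norm(w))
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

# A made-up angular velocity: the rotation axis itself rotates in time.
omega = lambda t: np.array([np.cos(t), np.sin(t), 0.5])

# Path-ordered product: factors at later times multiply on the left.
dt, N = 1e-3, 2000
R = np.eye(3)
for k in range(N):
    R = rot(omega(k * dt), dt) @ R
T = N * dt

# R stays in SO(3), and (R(t+dt) - R(t))/dt approximates hat(omega(t)) R(t).
print(np.abs(R.T @ R - np.eye(3)).max())
```

The finite-difference derivative of the product indeed pops out a factor of ω on the left, confirming Ṙ = ωR.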
• The rotational kinetic energy may be written as T = ½ ωa Iab ωb, where
$$I_{ab} = \sum_i m_i \left( r_i^2\, \delta_{ab} - (r_i)_a (r_i)_b \right)$$
is called the inertia tensor. Note that since the components of ω are in the body frame, so are
the components of I and ri that appear above; hence the Iab are constant.
• Explicitly, for a continuous rigid body with mass density ρ(r), we have
$$I = \int d^3r\, \rho(\mathbf{r}) \begin{pmatrix} y^2 + z^2 & -xy & -xz \\ -xy & x^2 + z^2 & -yz \\ -xz & -yz & x^2 + y^2 \end{pmatrix}.$$
• Since I is symmetric, we can rotate the body frame to diagonalize it. The eigenvectors are
called the principal axes and the eigenvalues Ia are the principal moments of inertia. Since T
is nonnegative, I is positive semidefinite, so Ia ≥ 0.
• The parallel axis theorem states that if I0 is the inertia tensor about the center of mass, the inertia
tensor about a point displaced by c is
$$I(\mathbf{c})_{ab} = (I_0)_{ab} + M \left( c^2\, \delta_{ab} - c_a c_b \right).$$
The proof is similar to the two-dimensional parallel axis theorem, with contributions proportional
P
to mi ri vanishing. The extra contribution is the inertia tensor we would get if the object’s
mass was entirely at the center of mass.
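The parallel axis theorem is easy to verify numerically for a random cloud of point masses; a sketch:

```python
import numpy as np

def inertia(m, r):
    """Inertia tensor I_ab = sum_i m_i (r_i^2 delta_ab - r_ia r_ib)."""
    I = np.zeros((3, 3))
    for mi, ri in zip(m, r):
        I += mi * ((ri @ ri) * np.eye(3) - np.outer(ri, ri))
    return I

rng = np.random.default_rng(0)
m = rng.random(6)                    # point masses
r = rng.standard_normal((6, 3))      # positions
com = (m[:, None] * r).sum(axis=0) / m.sum()

c = np.array([1.0, -2.0, 0.5])       # displacement of the new origin from the com
I0 = inertia(m, r - com)             # about the center of mass
Ic = inertia(m, r - com - c)         # about the displaced point

# Extra term: the inertia tensor of a point mass M sitting at displacement c.
extra = m.sum() * ((c @ c) * np.eye(3) - np.outer(c, c))
print(np.abs(Ic - (I0 + extra)).max())
```

The cross terms proportional to Σ mi ri vanish because the positions are measured from the center of mass, exactly as in the proof above.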
• Similarly, the translational and rotational motion of a free spinning body ‘factorize’. If the
center of mass position is R(t), then
$$T = \frac{1}{2} M \dot{\mathbf{R}}^2 + \frac{1}{2}\, \omega_a I_{ab}\, \omega_b.$$
This means we can indeed ignore the center of mass motion for dynamics.
We thus recognize
$$\mathbf{L} = I \boldsymbol{\omega}, \qquad T = \frac{1}{2}\, \boldsymbol{\omega} \cdot \mathbf{L}.$$
For general I, the angular momentum and angular velocity are not parallel.
• To find the equation of motion, we use the fact that dL/dt = 0 in the space frame; transforming to the body frame,
$$0 = \frac{d\mathbf{L}}{dt}\bigg|_{\text{body}} + \boldsymbol{\omega} \times \mathbf{L}.$$
Dotting both sides with ea gives 0 = L̇a + εaij ωi Lj. In the case of principal axes (L1 = I1 ω1), this gives
$$I_1 \dot{\omega}_1 + \omega_2 \omega_3 (I_3 - I_2) = 0$$
along with cyclic permutations thereof. These are Euler’s equations. In the case of a torque,
the components of the torque (in the principal axis frame) appear on the right.
We now analyze the motion of free tops. We consider the time evolution of the vectors L, ω, and
e3 . In the body frame, e3 is constant and points upward; in the space frame, L is constant, and for
convenience we take it to point upward. In general, we know that L and 2T = ω · L are constant.
Example. A spherical top. In this trivial case, ω̇a = 0, so ω doesn’t move in the body frame, nor
does L. In the space frame, L and ω are again constant, and the axis e3 rotates about them. As a
simple example, the motion of e3 looks like the motion of a point on the globe as it rotates about
its axis.
Example. The symmetric top. Suppose I1 = I2 6= I3 , e.g. for a top with radial symmetry. Then
the Euler equations give ω̇3 = 0, so ω3 is constant, while the other two components rotate with frequency
$$\Omega = \omega_3 (I_1 - I_3)/I_1.$$
This implies that |ω| is constant. Moreover, we see that L, ω, and e3 all lie in the same plane.
In the body frame, both ω and L precess about e3 . Similarly, in the space frame, both ω and e3
precess about L. To visualize this motion, consider the point e2 and the case where Ω, ω1 , ω2 ≪ ω3 .
Without the precession, e2 simply rotates about L, tracing out a circle. With the precession, the
orbit of e2 also ‘wobbles’ slightly with frequency Ω.
Example. The Earth is an oblate ellipsoid with (I1 − I3 )/I1 ≈ −1/300, with ω3 = (1 day)−1 . Since
the oblateness itself is caused by the Earth’s rotation, the angular velocity is very nearly aligned
with e3 , though not exactly. We thus expect the Earth to wobble with a period of about 300 days;
this phenomenon is called the Chandler wobble.
Example. The asymmetric top. If all of the Ii are unequal, the Euler equations are much more
difficult to solve. Instead, we can consider the effect of small perturbations. Suppose that
$$\omega_1 = \Omega + \eta_1, \qquad \omega_2 = \eta_2, \qquad \omega_3 = \eta_3.$$
To first order in the ηi, the Euler equations give
$$I_2 \ddot{\eta}_2 = \frac{\Omega^2}{I_3} (I_3 - I_1)(I_1 - I_2)\, \eta_2.$$
Therefore, we see that rotation about e1 is unstable iff I1 is in between I2 and I3 . An asymmetric
top rotates stably only about the principal axes with largest and smallest moment of inertia.
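The stability claim can be seen by integrating Euler's equations directly; a numerical sketch with made-up moments of inertia, comparing a perturbed rotation about the smallest axis with one about the intermediate axis:

```python
import numpy as np

I1, I2, I3 = 1.0, 2.0, 3.0   # made-up principal moments; I2 is intermediate

def rhs(w):
    """Euler's equations for a free top: I1 w1' = (I2 - I3) w2 w3, and cyclic."""
    return np.array([(I2 - I3) * w[1] * w[2] / I1,
                     (I3 - I1) * w[2] * w[0] / I2,
                     (I1 - I2) * w[0] * w[1] / I3])

def step(w, dt):
    # One fourth-order Runge-Kutta step.
    k1 = rhs(w); k2 = rhs(w + dt/2*k1); k3 = rhs(w + dt/2*k2); k4 = rhs(w + dt*k3)
    return w + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

def max_dev(w0, steps=10000, dt=2e-3):
    """Largest deviation of omega from its initial value over the run."""
    w = np.array(w0); dev = 0.0
    for _ in range(steps):
        w = step(w, dt)
        dev = max(dev, np.abs(w - w0).max())
    return dev

stable = max_dev([1.0, 1e-3, 1e-3])     # perturbed rotation about axis 1
unstable = max_dev([1e-3, 1.0, 1e-3])   # perturbed rotation about axis 2
print(stable, unstable)
```

The small perturbation about axis 1 only wobbles, while the one about the intermediate axis 2 grows until the top tumbles, with ω2 eventually reversing sign.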
Note. We can visualize the Euler equations with the Poinsot construction. In the body frame, we
have conserved quantities
$$2T = I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2, \qquad L^2 = I_1^2 \omega_1^2 + I_2^2 \omega_2^2 + I_3^2 \omega_3^2$$
defining two ellipsoids. The first ellipsoid is called the inertia ellipsoid, and its intersection with the
L2 ellipsoid gives the polhode curve, which contains possible values of ω.
An inertia ellipsoid with some polhode curves is shown above. Since polhode curves are closed, the
motion is periodic in the body frame. This figure also gives an intuitive proof of the intermediate axis
theorem: polhodes are small loops near minima and maxima of L2 , but not near the intermediate
axis, which corresponds to a saddle point.
Note. The space frame is more complicated, as our nice results for the symmetric top no longer
apply. The only constraint we have is that L · ω is constant, which means that ω must lie on a
plane perpendicular to L called the invariable plane. We imagine the inertial ellipsoid as an abstract
object embedded inside the top.
Since L = ∂T /∂ ω, L is perpendicular to the inertial ellipsoid, which implies that the invariable
plane is tangent to the inertial ellipsoid. We can thus imagine this ellipsoid as rolling without
slipping on the invariable plane, as shown above. The angular velocity traces a path on this plane
called the herpolhode curve, which is not necessarily closed.
1.3 Hamiltonian Formalism
In the language of thermodynamics, we have L = L(q, q̇) and H = H(q, p) naturally. In order
to write H in terms of these variables, we must be able to eliminate q̇ in favor of p, which is
generally only possible if L is convex in q̇.
In this context, the variations in pi and qi are independent. However, as before, δq̇ = d(δq)/dt.
Plugging in the variation, we see that δq must vanish at the endpoints to integrate by parts,
while δp doesn’t have to, so our formulation isn’t totally symmetric.
However, doing it for the covariant Lagrangian, with λ as the “time parameter”, yields H = 0. This
occurs generally for reparametrization-invariant actions. The notion of a Hamiltonian is inherently
not Lorentz invariant, as it generates time translation in a particular frame.
Both of the examples above are special cases of the minimal coupling prescription: to incorporate
an interaction with the electromagnetic field, we must replace
$$p^\mu \to p^\mu - eA^\mu, \qquad \text{i.e.} \qquad E \to E - e\phi, \qquad \mathbf{p} \to \mathbf{p} - e\mathbf{A}.$$
We would need a non-minimal coupling to account for, e.g., the spin of the particle.
• Liouville’s theorem states that volumes of regions of phase space are constant. To see this,
consider the infinitesimal time evolution
$$q_i \to q_i + \frac{\partial H}{\partial p_i}\, dt, \qquad p_i \to p_i - \frac{\partial H}{\partial q_i}\, dt.$$
Then the Jacobian matrix is
$$J = \begin{pmatrix} I + (\partial^2 H/\partial p_i \partial q_j)\, dt & (\partial^2 H/\partial p_i \partial p_j)\, dt \\ -(\partial^2 H/\partial q_i \partial q_j)\, dt & I - (\partial^2 H/\partial q_i \partial p_j)\, dt \end{pmatrix}.$$
Using the identity det(I + M) = 1 + tr M + O(M²), we have det J = 1 to first order in dt, by equality of mixed partials.
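The cancellation of the O(dt) term can be checked symbolically for an arbitrary Hamiltonian with one degree of freedom; a sympy sketch:

```python
import sympy as sp

q, p, dt = sp.symbols("q p dt")
H = sp.Function("H")(q, p)   # an arbitrary Hamiltonian

# Infinitesimal Hamiltonian time evolution.
Q = q + sp.diff(H, p) * dt
P = p - sp.diff(H, q) * dt

# Jacobian of the map (q, p) -> (Q, P).
J = sp.Matrix([[sp.diff(Q, q), sp.diff(Q, p)],
               [sp.diff(P, q), sp.diff(P, p)]])

d = sp.expand(J.det())
# The O(dt) term cancels by equality of mixed partials, leaving det J = 1 + O(dt^2).
print(d.coeff(dt, 0), d.coeff(dt, 1))
```

The O(dt²) term generically does not vanish, but it integrates away in the continuum limit, which is why phase space volume is exactly conserved.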
• In statistical mechanics, we might have a phase space probability distribution ρ(q, p, t). The
convective derivative dρ/dt is the rate of change while comoving with the phase space flow,
and Liouville’s theorem implies that dρ/dt = 0, which is equivalent to
$$\frac{\partial \rho}{\partial t} = \frac{\partial \rho}{\partial p_i} \frac{\partial H}{\partial q_i} - \frac{\partial \rho}{\partial q_i} \frac{\partial H}{\partial p_i}.$$
• Liouville’s theorem holds even if energy isn’t conserved, as in the case of an external field. It
fails in the presence of dissipation, where there isn’t a Hamiltonian description at all.
• Poincaré recurrence states that for a system with bounded phase space, given an initial point
p, every neighborhood D0 of p contains a point that will return to D0 in finite time.
Proof: consider the neighborhoods Dk formed by evolving D0 for time kT, for an arbitrary
time T. Since the phase space volume is finite, and the Dk all have the same volume, we
must have some overlap between two of them, say Dk and Dk′. Since Hamiltonian evolution is
reversible, we may evolve backwards, yielding an overlap between D0 and Dk−k′.
• As a corollary, it can be shown that Hamiltonian evolution is generically either periodic or
fills some submanifold of phase space densely. We will revisit this below in the context of
action-angle variables.
1.4 Poisson Brackets
The Poisson bracket of two phase space functions is defined by
$$\{f, g\} = \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i}.$$
Geometrically, it is possible to associate g with a vector field Xg , and {f, g} is the rate of change
of f along the flow of Xg .
• Applying Hamilton’s equations, for any function f (p, q, t),
$$\frac{df}{dt} = \{f, H\} + \frac{\partial f}{\partial t}$$
where the total derivative is a convective derivative; this states that the flow associated with
H is time translation. In particular, if I(p, q) satisfies {I, H} = 0, then I is conserved.
• The Poisson bracket is antisymmetric, linear, and obeys the product rule {f, gh} = {f, g} h + g {f, h},
as expected from the geometric intuition above. It also satisfies the Jacobi identity, so the space
of functions with the Poisson bracket is a Lie algebra.
• By the Jacobi identity, Lie brackets of conserved quantities are also conserved, so conserved
quantities form a Lie subalgebra.
Example. The Poisson brackets of position and momentum are always zero, except for
$$\{q_i, p_j\} = \delta_{ij}.$$
The flow generated by momentum is translation along its direction, and vice versa for position.
Example. For the angular momentum L = r × p, one finds
$$\{L_i, L_j\} = \epsilon_{ijk} L_k, \qquad \{L_i, L^2\} = 0$$
as in quantum mechanics. The first equation may be understood intuitively from the commutation
of infinitesimal rotations.
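These brackets follow directly from the definition; a sympy sketch verifying the angular momentum algebra:

```python
import sympy as sp

x = sp.symbols("x1 x2 x3")
p = sp.symbols("p1 p2 p3")

def pb(f, g):
    """Canonical bracket {f,g} = sum_i (df/dq_i)(dg/dp_i) - (df/dp_i)(dg/dq_i)."""
    return sum(sp.diff(f, x[i]) * sp.diff(g, p[i])
               - sp.diff(f, p[i]) * sp.diff(g, x[i]) for i in range(3))

# Angular momentum L = r x p.
L = [x[1]*p[2] - x[2]*p[1], x[2]*p[0] - x[0]*p[2], x[0]*p[1] - x[1]*p[0]]
L2 = sum(Li**2 for Li in L)

print(sp.expand(pb(L[0], L[1]) - L[2]))   # {L1, L2} - L3 vanishes
print(sp.expand(pb(L2, L[0])))            # {L^2, L1} vanishes
```

Since the bracket is a derivation in each argument, checking it on the coordinate functions qi and pi determines it everywhere.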
We now consider the changes of coordinates that preserve the form of Hamilton’s equations; these are
called canonical transformations. Generally, they are more flexible than coordinate transformations
in the Lagrangian formalism, since we can mix position and momentum.
• Now consider a transformation qi → Qi (q, p) and pi → Pi (q, p), written as xi → yi (x). Then
$$\dot{y} = (M J M^T)\, \frac{\partial H}{\partial y}, \qquad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$
where M is the Jacobian matrix Mij = ∂yi /∂xj and J is the symplectic matrix. We say the
Jacobian is symplectic if M J Mᵀ = J, and in this case, the transformation is called canonical.
• The Poisson bracket is invariant under canonical transformations. To see this, note that
$$\{f, g\}_x = (\partial_x f)^T J\, (\partial_x g)$$
where (∂x f)i = ∂f/∂xi. By the chain rule, ∂x = Mᵀ ∂y for the Jacobian M, so {f, g}x = (∂y f)ᵀ M J Mᵀ (∂y g) = {f, g}y when M is symplectic. Then if we only
consider canonical transformations, we don’t have to specify which coordinates the Poisson
bracket is taken in.
• Conversely, if a transformation preserves the canonical Poisson brackets, {yi , yj }x = Jij , it is
canonical. To see this, apply the chain rule for
$$J_{ij} = \{y_i, y_j\}_x = (M J M^T)_{ij}.$$
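As a concrete check of this criterion, a sympy sketch testing whether the Jacobian of a given transformation is symplectic (a rotation in phase space is canonical; rescaling q alone is not):

```python
import sympy as sp

q, p, a = sp.symbols("q p alpha")
J = sp.Matrix([[0, 1], [-1, 0]])   # symplectic matrix for one degree of freedom

def is_canonical(Q, P):
    """True if the Jacobian M of (q,p) -> (Q,P) satisfies M J M^T = J."""
    M = sp.Matrix([[sp.diff(Q, q), sp.diff(Q, p)],
                   [sp.diff(P, q), sp.diff(P, p)]])
    return sp.simplify(M * J * M.T - J) == sp.zeros(2, 2)

# A rotation in the (q, p) plane mixes position and momentum, yet is canonical.
print(is_canonical(q*sp.cos(a) + p*sp.sin(a), -q*sp.sin(a) + p*sp.cos(a)))
# A bare rescaling of q alone changes phase space areas, so it is not.
print(is_canonical(2*q, p))
```

For one degree of freedom, M J Mᵀ = (det M) J, so the criterion reduces to area preservation.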
Example. Consider a ‘point transformation’ qi → Qi (q). We have shown that these leave Lagrange’s
equations invariant, but in the Hamiltonian formalism, we must also transform the momentum
accordingly. Dropping indices and defining Θ = ∂Q/∂q, the Jacobian is
$$M = \begin{pmatrix} \Theta & 0 \\ \partial P/\partial q & \partial P/\partial p \end{pmatrix}, \qquad M J M^T = \begin{pmatrix} 0 & \Theta\, (\partial P/\partial p)^T \\ -(\partial P/\partial p)\, \Theta^T & (\partial P/\partial q)(\partial P/\partial p)^T - (\partial P/\partial p)(\partial P/\partial q)^T \end{pmatrix}$$
so the transformation is canonical when Θ(∂P/∂p)ᵀ = I and the lower-right block vanishes; in particular, the choice P = Θ^{-T} p works.
• We say G is a symmetry of H if the flow generated by G does not change H, i.e. {H, G} = 0.
But this is just the condition for G to be conserved: since the Poisson bracket is antisymmetric,
flow under H doesn’t change G either. This is Noether’s theorem in Hamiltonian mechanics.
• For example, using G = H simply generates time translation, y(t) = x(t − t0 ). Less trivially,
G = pk generates qi → qi + αδik , so momentum generates translations.
Now we give a very brief glimpse of the geometrical formulation of classical mechanics.
• In Lagrangian mechanics, the configuration space is a manifold M , and the Lagrangian is a
function on its tangent bundle L : T M → R. The action is a real-valued function on paths
through the manifold.
• The momentum p = ∂L/∂ q̇ is a covector on M , and we have a map
F : T M → T ∗ M, (q, q̇) 7→ (q, p)
called the Legendre transform, which is invertible if the Lagrangian is regular. The cotangent
bundle T ∗ M can hence be identified with phase space.
• A cotangent bundle has a canonical one-form ω = pi dq i , where the q i are arbitrary coordinates
and the pi are coordinates in the dual basis. Its exterior derivative Ω = dpi ∧ dq i is a symplectic
form, i.e. a closed and nondegenerate two-form on an even-dimensional manifold.
• Conversely, the Darboux theorem states that for any symplectic form we may always choose
coordinates so that locally it has the form dpi ∧ dq i .
• The symplectic form relates functions f on phase space to vector fields Xf by
iXf Ω = df, Ωµν Xfµ = ∂ν f
where iXf is the interior product with Xf , and the indices range over the 2 dim M coordinates
of phase space. The nondegeneracy condition means the form can be inverted, giving
Xfµ = Ωµν ∂ν f
and thus Xf is unique given f .
• Time evolution is flow under XH , so the rate of change of any phase space function f is XH (f ).
• The Poisson bracket is defined as
{f, g} = Ω(Xf , Xg ) = Ωµν ∂µ f ∂ν g.
The closure of Ω implies the Jacobi identity for the Poisson bracket.
• If flow under the vector field X preserves the symplectic form, LX Ω = 0, then X is called a
Hamiltonian vector field. In particular, using Cartan’s magic formula and the closure of Ω, this
holds for all Xf derived from the symplectic form.
• If Ω is preserved, so is any exterior power of it. Since Ωn is proportional to the volume form,
its conservation recovers Liouville’s theorem.
Note. Consider a single particle with a parametrized path xµ (τ ). Then the velocity is naturally a
Lorentz vector and the canonical momentum is a Lorentz covector. However, the physical energy
and momentum are vectors, because they are the conserved quantities associated with translations,
which are vectors. Hence we must pick up signs when converting canonical momentum to physical
momentum, which is the fundamental reason why p = −i∇ but H = +i∂t in quantum mechanics.
1.5 Action-Angle Variables
• For the harmonic oscillator, passing to polar-like phase space coordinates (θ, I) gives
$$H = \omega I, \qquad \dot{\theta} = \omega, \qquad \dot{I} = 0.$$
We have “straightened out” the phase space flow into straight lines on a cylinder. This is the
simplest example of action-angle variables.
• In general, for n degrees of freedom, we would like to find variables (θi , Ii ) so that the Hamiltonian
is only a function of the Ii . Then the Ii are conserved, and θ̇i = ωi , where the ωi depend on
the Ii but are time independent. When the system is bounded, we scale θi to lie in [0, 2π). The
resulting variables are called action-angle variables, and the system is integrable.
• Liouville’s integrability theorem states that if there are n mutually Poisson-commuting constants of motion
Ii , then the system is integrable. (At first glance, this seems to be a trivial criterion – how
could one possibly prove that such constants of motion don’t exist? However, it is possible; for
instance, Poincaré famously proved that there are no such conserved quantities for the general
three body problem, analytic in the canonical variables and the masses.)
• Integrable systems are rare and special; chaotic systems are not integrable. The question of
whether a system is integrable has to do with global structure, since one can always straighten
out the phase space flow lines locally.
• The motion of an integrable system lies on a surface of constant Ii . These surfaces are topolog-
ically tori Tn , called invariant tori.
• For one degree of freedom, the frequency of the motion is θ̇ = ω = dE/dI.
Note that by pulling the d/dE out of the integral, we neglected the change in phase space area due
to the change in the endpoints of the path, because this contribution is second order in dE.
Therefore, we have the nice results
$$I = \frac{1}{2\pi} \oint p\, dq, \qquad T = \frac{d}{dE} \oint p\, dq.$$
We can thus calculate T without finding a closed-form expression for θ, which can be convenient.
For completeness, we can also determine θ, by
$$\theta = \omega t = \frac{dE}{dI}\, \frac{d}{dE} \int p\, dq = \frac{d}{dI} \int p\, dq.$$
Here the value of θ determines the upper bound on the integral, and the derivative acts on the
integrand.
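For the harmonic oscillator, ∮ p dq = 2πE/ω, so these formulas give I = E/ω and T = 2π/ω; a numerical sketch confirming both, with made-up parameters:

```python
import numpy as np

m, w = 1.0, 2.0   # sample oscillator parameters

def loop(E, n=200001):
    """oint p dq for H = p^2/2m + m w^2 q^2/2 at energy E (trapezoid rule)."""
    qmax = np.sqrt(2.0 * E / (m * w**2))
    q = np.linspace(-qmax, qmax, n)
    p = np.sqrt(np.maximum(2.0 * m * E - (m * w * q)**2, 0.0))
    upper = np.sum((p[1:] + p[:-1]) / 2 * np.diff(q))
    return 2.0 * upper           # upper plus lower branch of the closed orbit

E = 1.3
I = loop(E) / (2.0 * np.pi)                       # action variable
T = (loop(E + 1e-4) - loop(E - 1e-4)) / 2e-4      # period from d/dE of the area
print(I, T)   # compare with E/w and 2 pi / w
```

Note that the period is obtained without ever solving for θ(t), exactly as promised.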
• Consider a situation where the Hamiltonian depends on a parameter λ(t) that changes slowly.
Then energy is not conserved; taking H(q(t), p(t), λ(t)) = E(t) and differentiating, we have
$$\dot{E} = \frac{\partial H}{\partial \lambda}\, \dot{\lambda}.$$
However, certain “adiabatic invariants” are approximately conserved.
These two contributions are due to the nonconservation of energy and to the change in the
shape of the orbits at fixed energy, respectively.
where we applied Hamilton’s equations, and neglected a higher-order term from the change in
the endpoints.
• To simplify the integrand, take H(q, p(q, λ, E), λ) = E and differentiate with respect to λ at
fixed E. Then
$$\frac{\partial H}{\partial q}\bigg|_{\lambda, p} \frac{\partial q}{\partial \lambda}\bigg|_E + \frac{\partial H}{\partial p}\bigg|_{\lambda, q} \frac{\partial p}{\partial \lambda}\bigg|_E + \frac{\partial H}{\partial \lambda}\bigg|_{q, p} = 0.$$
By construction, the first term is zero. Then we conclude that
$$\frac{\partial I}{\partial \lambda}\bigg|_E = -\frac{1}{2\pi} \oint \frac{\partial H}{\partial \lambda}\bigg|_E\, dt'.$$
Taking the time average of İ and noting that the change in λ is slow compared to the period
of the motion, the two quantities above cancel, so ⟨İ⟩ = 0 and I is an adiabatic invariant.
Example. The simple harmonic oscillator has I = E/ω. Then if ω is changed slowly, the ratio
E/ω remains constant. The above example also manifests in quantum mechanics; for example, for
quanta in a harmonic oscillator, we have E = nℏω. If the ω of the oscillator is changed slowly, the
energy can only remain quantized if E/ω remains constant, as it does in classical mechanics.
Example. The adiabatic theorem can also be proved heuristically with Liouville’s theorem. We
consider an ensemble of systems with fixed E but equally spaced phase θ, which thus travel along
a single closed curve in phase space. Under any time variation of λ, the phase space curve formed
by the systems remains closed, and the area inside it is conserved because none can leak in or out.
Now suppose λ is varied extremely slowly. Then every system on the ring should be affected in
the same way, so the final ring remains a curve of constant energy E 0 . By the above reasoning, the
area inside this curve is conserved, proving the theorem.
Example. A particle in a magnetic field. Consider a particle confined to the xy plane, experiencing
a magnetic field
B = B(x, y, t)ẑ
which is slowly varying. Also assume that B is such that the particle forms closed orbits. If the
variation of the field is slow, then the adiabatic theorem holds. Integrating over a cycle gives
$$I = \frac{1}{2\pi} \oint \mathbf{p} \cdot d\mathbf{q} \propto \oint m\mathbf{v} \cdot d\mathbf{q} - e \oint \mathbf{A} \cdot d\mathbf{q} = \frac{2\pi}{\omega}\, m v^2 - e \Phi_B.$$
In the case of a uniform magnetic field, we have
$$v = R\omega, \qquad \omega = \frac{eB}{m}$$
which shows that the two terms are proportional; hence the magnetic flux is conserved. Alternatively,
since ΦB = AB and B ∝ ω, the magnetic moment of the current loop made by the particle is
conserved; this is called the first adiabatic invariant by plasma physicists. One consequence is that
charged particles can be heated by increasing the field.
Alternatively, suppose that B = B(r) and the particle performs circular orbits centered about
the origin. Then the adiabatic invariant can be written as
I ∝ r2 (2B − Bav )
where Bav is the average field inside the circular orbit. This implies that as B(r, t) changes in time,
the orbit will get larger or smaller unless we have 2B = Bav , a condition which betatron accelerators,
which accelerate particles by changing the magnetic field in this way, are designed to satisfy.
The first adiabatic invariant is also the principle behind magnetic mirrors. Suppose one has a
magnetic field B(x, y, z) where Bz dominates, and varies slowly in space. Particles can perform
helical orbits, spiraling along magnetic field lines. The speed is invariant, so vx² + vy² + vz² is constant.
On the other hand, if we boost to match the vz of a spiraling particle, then the situation looks just
like a particle in the xy plane with a time-varying magnetic field. Approximating the orbit as small
and the Bz inside as roughly constant, we have
$$I \propto \frac{m v^2}{\omega} \propto \frac{v_x^2 + v_y^2}{B_z} = \text{const}.$$
Therefore, as Bz increases along the particle’s path, vz decreases, and at some point the particle is “reflected” and
spirals back in the opposite direction. Such magnetic mirrors can be used to confine plasmas in
fusion reactors.
1.6 The Hamilton–Jacobi Equation
• Given initial conditions (qi , ti ) and final conditions (qf , tf ), there can generally be multiple
classical paths between them. Often, the paths are discrete, so we may label them with a branch
index b. However, note that for the harmonic oscillator we need a continuous branch index.
• Define Hamilton’s principal function S(qf , tf ; qi , ti ) as the value of A on the classical path, where A stands for the usual action. We suppress the branch index below, so the four arguments
of S alone specify the entire path.
• Consider an infinitesimal change in qf . Then the new path is equal to the old path plus a
variation δq with δq(tf ) = δqf . Integrating by parts gives an endpoint contribution pf δqf , so
$$\frac{\partial S}{\partial q_f} = p_f.$$
• Next, suppose we simply extend the existing path by running it for an additional time dtf .
Then we can compute the change in S in two ways,
dS = Lf dtf = (∂S/∂tf) dtf + (∂S/∂qf) dqf.
Since dqf = q̇f dtf here, this gives ∂S/∂tf = Lf − pf q̇f = −Hf.
• Henceforth we take qi and ti as fixed and implicit, and rename qf and tf to q and t. Then we
have S(q, t) with
dS = −H dt + p dq
where qi and ti simply provide the integration constants. The signs here are natural if one
imagines them descending from special relativity.
• To evaluate S, we use our result for ∂S/∂t, called the Hamilton–Jacobi equation,
H(q, ∂S/∂q, t) + ∂S/∂t = 0.
That is, S can be determined by solving a PDE. The utility of this method is that the PDE can
be separated whenever the problem has symmetry, reducing the problem to a set of independent
ODEs. We can also run the Hamilton–Jacobi equation in reverse to solve PDEs by identifying
them with mechanical systems.
• For a time-independent Hamiltonian, the value of the Hamiltonian is just the conserved energy,
so the quantity S′ = S + Et is time-independent and satisfies the time-independent Hamilton–Jacobi equation
H(q, ∂S′/∂q) = E.
The function S′ can be used to find the paths of particles of energy E.
That is, Hamilton’s principal function can reduce the equations of motion to first-order equations
on configuration space.
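Note. A minimal numerical check, not from the notes: for a free particle with H = p²/2m, Hamilton's principal function for a path from (q0, t0) to (q, t) is S = m(q − q0)²/(2(t − t0)). The sketch below verifies ∂S/∂t + (∂S/∂q)²/2m = 0 and ∂S/∂q = p by central finite differences; all numerical values are arbitrary.

```python
# Free-particle principal function S and a finite-difference check of the
# Hamilton-Jacobi equation dS/dt + (dS/dq)^2 / 2m = 0.
m, q0, t0 = 2.0, 0.3, 0.0

def S(q, t):
    return m * (q - q0)**2 / (2 * (t - t0))

q, t, h = 1.7, 2.5, 1e-6
dSdq = (S(q + h, t) - S(q - h, t)) / (2*h)   # equals the final momentum p
dSdt = (S(q, t + h) - S(q, t - h)) / (2*h)   # equals -H = -p^2/2m
residual = dSdt + dSdq**2 / (2*m)
print(dSdq, residual)   # p = m(q - q0)/(t - t0), residual ~ 0
```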
Indeed, differentiating p = ∂S/∂q along a trajectory gives
ṗ = (d/dt)(∂S/∂q) = ∂²S/∂t∂q + (∂²S/∂q²) q̇
while differentiating the Hamilton–Jacobi equation with respect to q gives
∂²S/∂t∂q = −(∂/∂q) H(q, ∂S/∂q, t) = −∂H/∂q − (∂²S/∂q²) q̇
so that ṗ = −∂H/∂q, recovering Hamilton's equations.
• The quantity S(q, t) acts like a real-valued ‘classical wavefunction’. Given a position, its gradient specifies the momentum. To see the connection with quantum mechanics, let
ψ = R e^{iW/ℏ}.
Some care needs to be taken here: we assume R and W are analytic in ℏ, but this implies that ψ is not. Then in the semiclassical limit, W obeys the Hamilton–Jacobi equation. The action S(q, t) is the semiclassical phase of the quantum wavefunction. This result anticipates the de Broglie relations p = ℏk and E = ℏω classically, and inspires the path integral formulation.
• With this intuition, we can read off the Hamilton–Jacobi equation from a dispersion relation.
For example, a free relativistic particle has pµ pµ = m2 , which means the Hamilton–Jacobi
equation is
η^{µν} ∂µS ∂νS = m².
This generalizes immediately to curved spacetime by using a general metric.
• To see how classical paths emerge in one dimension, consider forming a wavepacket by superpos-
ing solutions with the same phase at time ti = 0 but slightly different energies. The solutions
constructively interfere when ∂S/∂E = 0, because
∂S/∂E = −t + ∫ (∂p/∂E) dq = −t + ∫ dq/(∂H/∂p) = −t + ∫ dq/q̇ = 0
so the superposition is peaked where the elapsed time matches the classical travel time, i.e. on the classical trajectory.
• Fermat’s principle of least time states that light travels between two points in the shortest
possible time. We consider an inhomogeneous anisotropic medium. Consider the set of all
points that can be reached from point q0 within time t. The boundary of this set is the wavefront Φq0(t). Huygens' principle states that Φq0(s + t) is the envelope of the fronts Φq(s) with q ∈ Φq0(t). This follows because Φq0(s + t) is the set of points we need time s + t to reach, and an optimal path to one of these points should be locally optimal as well. In particular, note that each of the fronts Φq(s) is tangent to Φq0(s + t).
• Let Sq0 (q) be the minimum time needed to reach point q from q0 . We define
p = ∂S/∂q
to be the vector of normal slowness of the front. It describes the motion of wavefronts, while q̇
describes the motion of rays of light. We thus have dS = p dq.
• The quantities p and q̇ can be related geometrically. Let the indicatrix at a point be the
surface defined by the possible velocity vectors; it is essentially the wavefront at that point for
infinitesimal time. Define the conjugate of q̇ to be the plane tangent to the indicatrix at q̇.
• The wavefront Φq0(t) at the point q(t) is conjugate to q̇(t). By decomposing t = (t − ε) + ε and applying the definition of an indicatrix, this follows from Huygens' theorem.
• Everything we have said here is perfectly analogous to mechanics; we simply replace the total
time with the action, and hence the indicatrix with the Lagrangian. The rays correspond to
trajectories. The main difference is that the speed at which the rays are traversed is fixed in optics but variable in mechanics, so our space is (q, t) rather than just q, and dS = p dq − H dt instead.
2 Electromagnetism
2.1 Electrostatics
The fundamental equations of electrostatics are
∇·E = ρ/ε0,   ∇×E = 0.
The latter equation allows us to introduce the potential E = −∇φ, giving Poisson’s equation
∇²φ = −ρ/ε0.
The case ρ = 0 is Laplace’s equation and the solutions are harmonic functions.
Example. The field of a point charge is spherically symmetric with ∇2 φ = 0 except at the origin.
Guessing the form φ ∝ 1/r, we have
∇(1/r) = −∇r/r² = −r/r³.
Next, we can take the divergence by the product rule,
∇²(1/r) = −(∇·r)/r³ + 3(r̂·r)/r⁴ = −3/r³ + 3/r³ = 0
as desired. To get the overall constant, we use Gauss's law, giving φ = q/(4πε0r).
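As a quick finite-difference sanity check (not part of the notes), the discrete Laplacian of f = 1/r indeed vanishes away from the origin; the evaluation point and step are arbitrary choices.

```python
# 7-point discrete Laplacian of 1/r at a point away from the origin:
# should vanish up to O(h^2) truncation error.
def f(x, y, z):
    return (x*x + y*y + z*z) ** -0.5

x0, y0, z0, h = 0.7, -0.4, 1.1, 1e-3
lap = (f(x0+h, y0, z0) + f(x0-h, y0, z0)
     + f(x0, y0+h, z0) + f(x0, y0-h, z0)
     + f(x0, y0, z0+h) + f(x0, y0, z0-h) - 6*f(x0, y0, z0)) / h**2
print(lap)   # ~ 0
```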
For two charges ±Q separated by displacement d, the potential at large distances falls off as 1/r² and depends only on the dipole moment p = Qd, with φ = p·r̂/(4πε0r²). Differentiating using the usual quotient rule,
E = (1/4πε0) (3(p·r̂)r̂ − p)/r³.
Taking only the first term of the Taylor series is justified if r ≫ d. More generally, for an arbitrary charge distribution
φ(r) = (1/4πε0) ∫ dr′ ρ(r′)/|r − r′|
and approximating the integrand with Taylor series gives the multipole expansion.
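As a numerical sketch (not from the notes): for two charges ±Q separated by a small displacement d, the exact potential approaches the dipole term p·r̂/(4πε0r²) at distances r ≫ d. All parameter values are arbitrary illustrative choices.

```python
import math

# Exact two-charge potential vs. the leading dipole term of the
# multipole expansion, evaluated far from the pair.
eps0 = 8.8541878128e-12
Q = 1e-9
d = (0.0, 0.0, 1e-3)
p = tuple(Q * di for di in d)            # dipole moment p = Q d

def phi_exact(r):
    plus  = tuple( di / 2 for di in d)   # +Q at +d/2
    minus = tuple(-di / 2 for di in d)   # -Q at -d/2
    return (Q / (4 * math.pi * eps0 * math.dist(r, plus))
          - Q / (4 * math.pi * eps0 * math.dist(r, minus)))

def phi_dipole(r):
    rr = math.hypot(*r)
    return sum(pi * ri for pi, ri in zip(p, r)) / (4 * math.pi * eps0 * rr**3)

r = (0.3, 0.2, 0.6)                      # |r| >> |d|
ratio = phi_exact(r) / phi_dipole(r)
print(ratio)   # close to 1
```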
Note. Electromagnetic field energy. The energy needed to assemble a set of particles is
U = (1/2) Σ_i qi φ(ri).
Example. Dipole-dipole interactions. Consider a dipole moment p1 at the origin, and a second dipole with charge Q at r and −Q at r − d, with dipole moment p2 = Qd. The potential energy is U = p2 · ∇φ1(r), which evaluates to
U = (1/4πε0) (p1·p2 − 3(p1·r̂)(p2·r̂))/r³.
Example. Boundary value problems. Consider a volume bounded by surfaces Si , which could
include a surface at infinity. Then Laplace’s equation ∇2 φ = 0 has a unique solution (up to
constants) if we fix φ or ∇φ · n̂ ∝ E⊥ on each surface. These are called Dirichlet and Neumann
boundary conditions respectively. To see this, let f be the difference of two solutions. Then
∫ dV (∇f)² = ∫ dV ∇·(f∇f) = ∮ f∇f · dS
where we used ∇2 f = 0 in the first equality. However, boundary conditions force the right-hand
side to be zero, so the left-hand side is zero, which requires f to be constant.
In the case where the surfaces are conductors, it also suffices to specify the charge on each surface.
To see this, note that potential is constant on a surface, so
∮ f∇f · dS = f ∮ ∇f · dS = 0
because the total charge on a surface is zero if we subtract two solutions. Then ∇f = 0 as before,
giving the same conclusion.
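The uniqueness argument can be illustrated numerically (this sketch is not from the notes): solving Laplace's equation on a grid with fixed Dirichlet boundary values by Jacobi relaxation, two different interior initial guesses converge to the same solution. Grid size, boundary values, and sweep count are arbitrary choices.

```python
# Jacobi relaxation for Laplace's equation on an N x N grid with Dirichlet
# boundary conditions: phi = 1 on the top edge, 0 on the other edges.
N = 20

def relax(fill):
    phi = [[1.0 if i == 0 else 0.0 for j in range(N)] for i in range(N)]
    for i in range(1, N-1):
        for j in range(1, N-1):
            phi[i][j] = fill                  # interior initial guess
    for _ in range(2000):                     # replace by neighbor averages
        new = [row[:] for row in phi]
        for i in range(1, N-1):
            for j in range(1, N-1):
                new[i][j] = 0.25*(phi[i+1][j] + phi[i-1][j]
                                  + phi[i][j+1] + phi[i][j-1])
        phi = new
    return phi

a, b = relax(0.0), relax(5.0)                 # two different initial guesses
diff = max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))
print(diff)   # the two solutions coincide
```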
2.2 Magnetostatics
• The fundamental equations of magnetostatics are
∇ × B = µ0 J, ∇ · B = 0.
• Since the divergence of a curl is zero, we must have ∇ · J = 0. This is simply a consequence of
the continuity equation
∂ρ/∂t + ∇·J = 0
and the fact that we’re doing statics.
• Across a surface current with density K, the magnetic field satisfies the boundary conditions ∆B∥ = µ0K, ∆B⊥ = 0.
This is similar to the case of a surface charge, except there E⊥ is discontinuous instead.
• Consider an infinite cylindrical solenoid. Then B = B(r)ẑ by symmetry. Both inside and
outside the solenoid, we have ∇ × B = 0 which implies ∂B/∂r = 0. Since fields vanish at
infinity, the field outside must be zero, and by Ampere’s law, the field inside is
B = µ0 K
where K is the surface current density, equal to nI where n is the number of turns per length.
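This limit can be checked numerically (a sketch, not from the notes): build a long solenoid out of discrete circular loops and sum the standard on-axis loop field B = µ0IR²/(2(R² + z²)^{3/2}). Near the center the total should approach µ0nI; all parameter values are arbitrary.

```python
import math

# On-axis field at the center of a finite solenoid, summed loop by loop,
# compared with the infinite-solenoid result B = mu0 n I.
mu0 = 4e-7 * math.pi
I, R, n = 2.0, 0.05, 1000.0        # current, radius, turns per meter
L = 2.0                            # length much greater than the radius
N = int(n * L)

def B_center():
    total = 0.0
    for k in range(N):
        z = (k + 0.5) / n - L / 2  # loop position relative to the center
        total += mu0 * I * R**2 / (2 * (R**2 + z**2) ** 1.5)
    return total

ratio = B_center() / (mu0 * n * I)
print(ratio)   # approaches 1 for L >> R
```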
• Writing B = ∇×A and choosing Coulomb gauge ∇·A = 0, Ampere's law becomes ∇²A = −µ0J, where the vector Laplacian is defined through ∇²A = ∇(∇·A) − ∇×(∇×A).
Note. What is the vector Laplacian? Formally, the Laplacian of any tensor is defined as
∇2 T = ∇ · (∇T ).
In a general manifold with metric, the operations on the right-hand side are defined through covariant
derivatives, and depend on a connection. Going to the other extreme of generality, it can be defined
24 2. Electromagnetism
in Cartesian components in Rn as the tensor whose components are the scalar Laplacians of those
of T ; we can then generalize to, e.g. spherical coordinates by a change of coordinates.
In the case of the vector Laplacian, the most practical definition for curvilinear coordinates on
Rn is to use the curl-of-curl identity in reverse, then plug in the known expressions for divergence,
gradient, and curl. This route doesn’t require any tensor operations.
We now use our mathematical tools to derive the Biot–Savart law.
• To simplify, pull the 1/r³ out of the integral, then dot the integral with a constant vector g. By Stokes' theorem,
∮_C gi rj r′j dr′i = ∫_S εijk ∂′i(gj rℓ r′ℓ) dS′k = ∫_S εijk ri gj dS′k = g · ∫_S dS′ × r.
• Since g is arbitrary, the dipole term of the vector potential is A(r) = (µ0/4π) m × r/r³, where m = I ∫_S dS′ is the magnetic dipole moment. Taking the curl gives
B(r) = (µ0/4π) (3(m·r̂)r̂ − m)/r³
which is the same as the far-field of an electric dipole.
• Near the dipoles, the fields differ because the electric and magnetic fields are curlless and
divergenceless, respectively. For instance, the field inside an electric dipole is opposite the
dipole moment, while the field inside a magnetic dipole is in the same direction.
• One can show that, in the limit of small dipoles, the fields are
E = (1/4πε0)(3(p·r̂)r̂ − p)/r³ − p δ(r)/(3ε0),   B = (µ0/4π)(3(m·r̂)r̂ − m)/r³ + (2µ0/3) m δ(r)
where the delta function terms account for the average fields inside the dipoles.
Example. We can do more complicated variants of these tricks for a general current distribution. Expanding the vector potential in moments, the monopole term involves ∫ dr′ Ji(r′) = ∫ dr′ ∂′j(Jj r′i), where we used ∇·J = 0. Then the monopole term is a total derivative and hence vanishes. The intuitive interpretation is that currents must go around in loops, with no net motion; our identity then says something like ’the center of charge doesn’t move’.
To simplify the second term, note that
∂′j(Jj r′i r′k) = Ji r′k + Jk r′i
which integrates to zero, so ∫ dr′ Ji r′k is antisymmetric in its indices; combining this with the double cross product identity, we conclude the dipole field has the same form as before, with the more general dipole moment
m = (1/2) ∫ dr′ r′ × J(r′)
which is equivalent to our earlier result by the vector identity
(1/2) ∮ r × ds = ∫ dS.
Example. The force on a magnetic dipole. The force on a general current distribution is
F = ∫ dr J(r) × B(r).
Taylor expanding the field about the dipole's position R gives B(r) ≈ B(r′) + (r·∇′)B(r′), evaluated at r′ = R. Here, we turned the R into an r′ evaluated at R so it's clear what coordinate the derivative is acting on. The first term contributes nothing, by the same logic as the previous example. In indices, the second term is
F = ∫ dr J(r) × (r·∇′)B(r′) = ∫ dr εijk Ji rℓ ∂′ℓ Bj(r′) êk.
Now we focus on the terms in parentheses. In general, the curl is just the exterior derivative, so if
the curl of B vanishes, then
∂i Bj − ∂j Bi = 0.
This looks different from the usual (3D) expression for vanishing curl, which contains ijk , because
there we additionally take the Hodge dual. This means that we can swap the indices for
F = ∫ dr εijk Ji rℓ ∂′j Bℓ(r′) êk = −∇′ × ∫ dr (r · B(r′)) J(r).
Now the integral is identical to our magnetic dipole integral from above, with a constant vector of B(r′) instead. Therefore
F = −∇′ × (m × B(r′)) = (m·∇′) B(r′) = ∇′(m · B(r′)).
In the first step, we use a product rule along with ∇ · B = 0. For the final step, we again use
the ’derivative index swapping’ trick which works because the curl of B vanishes. The resulting
potential energy can also be used to find the torque on a dipole.
2.3 Electrodynamics
The first fundamental equation of electrodynamics is Faraday’s law,
∇×E + ∂B/∂t = 0.
• For conducting loops, the resulting emf will create a current that creates a field that opposes
the change in flux; this is Lenz’s law. This is simply a consequence of energy conservation; if
the sign were flipped, we would get runaway positive feedback.
• The integrated form of Faraday’s law still holds for moving wires. Consider a loop C with
surface S whose points have velocity v(r) in a static field. After a small time dt, the surface
becomes S 0 . Since the flux through any closed surface is zero,
dΦ = ∫_{S′} B·dS − ∫_S B·dS = −∫_{Sc} B·dS
where Sc is the side surface swept out by the moving loop.
• As an example, a solenoid has B = µ0nI with total flux Φ = BAnℓ where ℓ is the total length. Therefore L = µ0n²V where V = Aℓ is the total volume.
• We can use our inductor energy expression to get the magnetic field energy density,
U = (1/2) I ∫_S B·dS = (1/2) I ∮_C A·dr = (1/2) ∫ dx J·A
where we turned the line integral into a volume integral.
• The second fundamental equation is Ampere's law with Maxwell's correction, ∇×B = µ0J + µ0ε0 ∂E/∂t, so that taking the divergence now gives the full continuity equation. We see a changing electric field behaves like a current; it is called displacement current. This leads to propagating wave solutions.
• In vacuum, we have
∇·E = 0,   ∇·B = 0,   ∇×E = −∂B/∂t,   ∇×B = µ0ε0 ∂E/∂t.
Combining these equations, we find
µ0ε0 ∂²E/∂t² = −∇×(∇×E) = ∇²E
with a similar equation for B, so electromagnetic waves propagate at speed c = 1/√(µ0ε0).
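As a one-line arithmetic check (not part of the notes), the measured vacuum constants indeed combine to the speed of light:

```python
import math

# c = 1/sqrt(mu0 eps0) from the CODATA values of the vacuum constants.
mu0  = 1.25663706212e-6   # H/m
eps0 = 8.8541878128e-12   # F/m
c = 1 / math.sqrt(mu0 * eps0)
print(c)   # ~2.998e8 m/s
```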
• Taking plane waves with amplitudes E0 and B0 , we read off from Maxwell’s equations
k · E0 = k · B0 = 0, k × E0 = ωB0
• In an electromagnetic wave, the average field energy density is u = ε0E²/2, where we get a factor of 1/2 from averaging a squared trigonometric function and a factor of 2 from the magnetic field. As expected, the Poynting vector obeys S = cu.
• Electromagnetic waves can also be written in terms of potentials, though these have gauge
freedom. A common choice for plane waves is to set the electric potential φ to zero.
2.4 Relativity
Next, we rewrite our results relativistically.
• The continuity equation reads ∂µJ^µ = 0, with J^µ = (ρ, J). For a static charge distribution with density ρ0, boosting to a frame moving with velocity v gives ρ′ = γρ0, J′ = −γρ0v.
Though the charge density is not invariant, the total charge is. To see this, note that
Q = ∫ d³x J⁰(x) = ∫ d⁴x J^µ(x) nµ δ(n·x)
where nµ = (1, 0, 0, 0) is the unit normal to the constant-time surface. In a boosted frame, Q′ is given by the same expression, except that n has been replaced with n′. Said another way,
way, we can compute the total charge measured in another frame by doing an integral over a tilted
spacelike surface in our original frame. Then by the continuity equation, we must have Q = Q0 .
More formally, we can use nµ δ(n · x) = ∂µ θ(n · x) to show the difference is a total derivative.
Example. Deriving magnetism. Consider a wire with positive charges q moving with velocity v
and negative charges −q moving with velocity −v. Then
I = 2nAqv.
Now consider a particle moving in the same direction with velocity u, which measures the velocities of the charges to be v± = u ⊕ (∓v). Let n0 be the number density in the rest frame of each kind of charge, so that n = γ(v)n0. Using the property γ(u ⊕ w) = γ(u)γ(w)(1 + uw), the particle sees a net charge density
ρ′ = q(n′+ − n′−) = −2q uvγ(u) n
in its rest frame. It thus experiences an electric force of magnitude F′ ∼ uvγ(u). Transforming back to the original frame gives F ∼ uv, in agreement with our results from magnetostatics.
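The gamma-factor identity used here can be checked numerically (a sketch, not from the notes), with c = 1 and arbitrary subluminal velocities:

```python
import math

# Check gamma(u (+) w) = gamma(u) gamma(w) (1 + u w), where
# u (+) w = (u + w)/(1 + u w) is relativistic velocity addition (c = 1).
gamma = lambda v: 1 / math.sqrt(1 - v*v)

u, w = 0.6, -0.85                    # w = -v for the negative charges
lhs = gamma((u + w) / (1 + u*w))
rhs = gamma(u) * gamma(w) * (1 + u*w)
print(lhs, rhs)
```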
• In relativistic notation, we define Aµ = (φ, A) (noting that this makes the components of Aµ
metric dependent), and gauge transformations are
Aµ → Aµ − ∂µ χ.
• The field strength tensor is defined by Fµν = ∂µAν − ∂νAµ and is gauge invariant. It contains the electric and magnetic fields in its components,
Fµν = (  0    Ex    Ey    Ez
       −Ex    0    −Bz    By
       −Ey    Bz    0    −Bx
       −Ez   −By    Bx    0  ).
• Under a Lorentz transformation, F′^{µν} = Λ^µ_ρ Λ^ν_σ F^{ρσ}, or in matrix notation F′ = ΛFΛᵀ. In the latter, F has both indices up, and Λ is the matrix that transforms vectors, v → Λv.
• Under rotations, E and B also rotate. Under boosts along the x direction, with c = 1,
E′x = Ex,   E′y = γ(Ey − vBz),   E′z = γ(Ez + vBy)
B′x = Bx,   B′y = γ(By + vEz),   B′z = γ(Bz − vEy).
The intuition for the latter is that taking the dual simply swaps E and B (with some signs, i.e. E → B → −E), so we can read off the answer.
Note. The Helmholtz decomposition states that a general vector field can be written as a curl-free
part plus a divergence-free part, as long as the field falls faster than 1/r at infinity. The slickest
way to show this is to take the Fourier transform F̃(k), which is guaranteed to exist by the decay
condition. Then the curl-free part is the part parallel to k (i.e. (F̃(k) · k̂)k̂), and the divergence-
free part is the part perpendicular to k. Since A can always be taken to be divergence-free, our
expression for E above is an example of the Helmholtz decomposition.
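The Fourier-space projection can be demonstrated directly (a numerical sketch, not from the notes, assuming numpy is available): decompose a periodic 2D vector field with the FFT and check that the transverse remainder is divergence-free.

```python
import numpy as np

# Helmholtz decomposition on a periodic grid: the longitudinal part is the
# projection of F~(k) onto k-hat; the transverse part is the remainder.
N = 64
x = np.linspace(0, 2*np.pi, N, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')

# an arbitrary smooth periodic field containing both kinds of parts
Fx = np.sin(X) * np.cos(Y) + np.cos(2*Y)
Fy = np.cos(X) * np.sin(2*Y)

kx = np.fft.fftfreq(N, d=2*np.pi/N) * 2*np.pi   # integer wavenumbers
KX, KY = np.meshgrid(kx, kx, indexing='ij')
K2 = KX**2 + KY**2
K2[0, 0] = 1.0                                  # avoid 0/0 at k = 0

Fxh, Fyh = np.fft.fft2(Fx), np.fft.fft2(Fy)
dot = (Fxh*KX + Fyh*KY) / K2                    # (F~ . k)/k^2
Lx = np.fft.ifft2(dot*KX).real                  # longitudinal (curl-free)
Ly = np.fft.ifft2(dot*KY).real
Tx, Ty = Fx - Lx, Fy - Ly                       # transverse (div-free)

div_T = np.fft.ifft2(1j*KX*np.fft.fft2(Tx) + 1j*KY*np.fft.fft2(Ty)).real
print(np.abs(div_T).max())                      # ~ 0
```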
Example. Slightly boosting the field of a line charge at rest gives a magnetic field −v × E which
wraps around the wire, thus yielding Ampere’s law. For larger boosts, we pick up a Lorentz
contraction factor γ due to the contraction of the charge density.
Now consider instead a point charge at rest, with Coulomb field E ∼ r/r³, viewed from a frame moving with velocity v = v î. Then the boosted field is
E′ ∼ (x, γy, γz) / (x² + y² + z²)^{3/2}
using the coordinates in the original field. Switching the coordinates to the boosted ones,
E′ ∼ γ (x′ + vt′, y′, z′) / (γ²(x′ + vt′)² + y′² + z′²)^{3/2}
where we used x = γ(x′ + vt′). Interestingly, the field remains radial. However, the x′ coordinate in the denominator is effectively γx′, so it's as if electric field lines have been length contracted.
By charge invariance and Gauss’s law, the total flux remains constant, so the field is stronger than
usual along the perpendicular direction and weaker than usual along the parallel direction.
We conclude by rewriting Maxwell’s equations and the Lorentz force law relativistically.
• One neat trick is that whenever E · B = 0, we can boost to get either zero electric or zero
magnetic field. For example, a particle in crossed fields either undergoes cycloid-like motion, or
falls arbitrarily far; the sign of E 2 − B 2 separates the two cases.
2.5 Radiation
In this section, we show how radiation is produced by accelerating charges.
• The inhomogeneous Maxwell equations read ∂νF^{νµ} = µ0J^µ, equivalently ∂²A^µ − ∂^µ(∂νA^ν) = µ0J^µ.
• In Lorenz gauge, ∂µA^µ = 0, this simplifies to ∂²A^µ = µ0J^µ.
That is, the potential solves the wave equation, and its source is the current.
• Lorenz gauge exists if we can always pick a gauge transformation χ so that ∂ 2 χ = −∂µ Aµ .
Thus solving the wave equation will also show us how to get to Lorenz gauge in the first place.
• In Coulomb gauge, the expression for φ in terms of ρ is the same as in electrostatics, with no
retardation, which appears to violate causality. This is physically acceptable because φ is not
directly measurable, but it makes the analysis more confusing. However, Coulomb gauge is
useful for certain calculations, as we will see for the Darwin Lagrangian.
• In Coulomb gauge, it is useful to break the current into transverse and longitudinal components,
J = J` + Jt , ∇ × J` = 0, ∇ · Jt = 0.
Jℓ(x) = −(1/4π) ∇ ∫ dx′ (∇′·J(x′))/|x − x′|,   Jt(x) = (1/4π) ∇×∇× ∫ dx′ J(x′)/|x − x′|.
The vector potential then obeys
∇²A − (1/c²) ∂²A/∂t² = −µ0Jt
which makes sense because A has no longitudinal component.
Returning to Lorenz gauge, we are thus motivated to find the Green’s function for ∂ 2 .
Fourier transforming in time, the wave operator becomes the Helmholtz operator, and we need (∇² + ω²)Gω(x) = δ(x) in units with c = 1. This is called the Helmholtz equation; the Poisson equation is the limit ω → 0. The function Jµ(x, ω) is the time Fourier transform of Jµ(x, t) at every point x.
• In spherical coordinates,
(1/r²) (d/dr)(r² dGω/dr) + ω² Gω = δ(r).
This equation has solutions
Gω(r) = −(1/4π) e^{±iωr}/r.
One can arrive at this result by guessing that amplitudes fall as 1/r, and hence working in
terms of rG instead of G. The constant is found by integrating in a ball around r = 0.
• The result is like the solution to the Poisson equation, except that the current must be evaluated
at the retarded or advanced time; we take the retarded time as physical, defining
tret = t − |x − x′|.
We see that the Helmholtz equation contains the correct speed of light travel delay.
• Note that while the potentials just depend on the current in the usual way, but evaluated at the
retarded time, the same is not true of the fields! When we differentiate the potentials, we pick
up extra terms from differentiating tret . These extra terms are crucial because they provide the
radiation fields which fall off as 1/r, rather than 1/r2 .
We can also take the Fourier transform in both time and space.
G(r, t) = −∫ đ⁴k e^{i(k·r − ωt)}/(k² − ω²/c²).
• In order to perform the dω integration, we need to deal with the poles. By adding an infinitesimal
damping forward in time, we can push the poles below the real axis. Now, when t < 0, the
integration contour can be closed in the upper-half plane, giving zero. When t > 0, we close in
the lower-half plane, picking up both poles, so
∫_C dω e^{−iωt}/((ω − ck)(ω + ck)) = −(2π/ck) θ(t) sin(ckt).
Gret(r, t) = −θ(t) δ(tret)/(4πr).
This is the retarded Green’s function; plugging it into the wave equation gives us the same
expression for the retarded potential as derived earlier.
Closing the contours the other way instead gives the advanced Green's function,
Gadv(r, t) = −θ(−t) δ(tadv)/(4πr).
• Both of these conventions can be visualized by pushing the integration contour above or below
the real axis. If we instead tilt it about the origin, we get the Feynman propagator.
Note. Checking Lorenz gauge. Our retarded potential solution has the form
Aµ(x) ∼ ∫ d⁴x′ G(x, x′) Jµ(x′).
Now consider computing ∂µ Aµ . Since the Green’s function only depends on x − x0 , we have
∂µA^µ ∼ ∫ d⁴x′ ∂µG(x, x′) Jµ(x′) = −∫ d⁴x′ (∂′µ G(x, x′)) Jµ(x′).
Integrating by parts moves the derivative onto Jµ(x′), which vanishes by current conservation, so the Lorenz gauge condition indeed holds.
• For a compact oscillating source, the retarded potential reduces to the integral of the current over the source, since ∫ dx′ J = ṗ, which is like our results in magnetostatics, but allowing for a varying dipole moment p. Evaluating this at the time t − r/c,
A(x, t) ≈ (µ0/4πr) ṗ(t − r/c).
• Applying the product rule, we have
B ≈ −(µ0/4π) ( x̂ × ṗ(t − r/c)/r² + x̂ × p̈(t − r/c)/(rc) ).
The former is just the usual magnetic field but time-delayed, and the latter is the 1/r radiation field. If the dipole has characteristic frequency ω, then the latter dominates for r ≫ λ = c/ω, the far-field/radiation zone.
• In the radiation zone, the fields look like plane waves, with E = −cx̂ × B. Then
S = (1/µ0) E × B = (c/µ0) B² x̂ = (µ0/16π²r²c) |x̂ × p̈|² x̂
where we used the triple cross product rule.
• The total instantaneous power is thus
P = (µ0 |p̈|²/16π²c) ∫ sin²θ dΩ = (µ0/6πc) |p̈|².
• Consider a particle of charge Q oscillating in the ẑ direction with frequency ω and amplitude
d, and hence dipole moment with amplitude p = Qd. Expanding and time averaging,
Pav = µ0p²ω⁴/(12πc) = Q²a²/(12πε0c³)
where a = ω²d is the acceleration amplitude.
This is the Larmor formula; note that it is quadratic in charge and acceleration (the field is
linear, but energy is bilinear). Since we used the electric dipole approximation, it only applies
for nonrelativistic motion.
• Note that the radiation fields are zero along the ẑ axis. This is related to the hairy ball theorem:
since the radiation fields are everywhere tangent to spheres about the charge, they must vanish
somewhere.
• By taking higher-order terms in our Taylor series, we can get magnetic dipole and electric
quadrupole terms, and so on. The magnetic dipole term is dominant in situations where there
is no electric dipole moment (e.g. a current loop), but for moving charges its power is suppressed
by v 2 /c2 and hence is much smaller in the nonrelativistic limit.
• As a warmup, we consider Thomson scattering. Consider a free particle in light, and assume
that it never moves far compared to the wavelength of the light. Equivalently, we assume it
never moves relativistically fast. Then
m ẍ(t) ≈ qE(x = 0, t),   x(t) = −(qE0/mω²) sin(ωt).
Applying the Larmor formula,
Pav = µ0q⁴E0²/(12πm²c).
• The averaged Poynting vector for the light is
Sav = E0²/(2µ0c).
Therefore, Thomson scattering has a ‘cross section’ of
σ = Pav/Sav = (8π/3) rq²,   q²/(4πε0 rq) = mc².
Here, rq is called the classical electron radius. Note that it is independent of frequency.
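Plugging in the electron's constants gives the familiar numbers (a numerical aside, not part of the notes):

```python
import math

# Classical electron radius and Thomson cross section from CODATA values.
e    = 1.602176634e-19
m_e  = 9.1093837015e-31
c    = 2.99792458e8
eps0 = 8.8541878128e-12

r_q = e**2 / (4*math.pi*eps0 * m_e * c**2)
sigma = 8*math.pi/3 * r_q**2
print(r_q, sigma)   # ~2.82e-15 m, ~6.65e-29 m^2
```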
• Thomson scattering is elastic, but if the particle moves relativistically fast, the scattered light
can be redshifted by radiation recoil effects.
• Experimentally, it was found that the scattered light had a shifted wavelength for high frequencies
and arbitrarily low intensities (Compton scattering), which provided support for the particle
nature of light.
• Rayleigh scattering describes the scattering of light off a neutral but polarizable atom or molecule.
We effectively add a spring and damping to the model of Thomson scattering, so
x(t) = −(qE(t)/m)/(ω² − ω0² + iγω).
• In the limit ω ≪ ω0, which is a good approximation for visible light and molecules in the atmosphere, the amplitude is constant (rather than the 1/ω² for Thomson scattering), giving
σ = (8πrq²/3) (ω/ω0)⁴.
The fact that σ ∝ ω 4 explains why the sky is blue. Intuitively, scattering of low frequency light
is suppressed because the ‘molecular springs’ limit how far the electrons can go.
• Rayleigh scattering holds when the size of the molecules involved is much smaller than the
wavelength of the light. In the case where they are comparable, we get Mie scattering, which
preferentially scatters longer wavelengths. The reason is that nearby molecules oscillate in
phase, so their amplitudes superpose, giving a quadratic increase in power. Mie scattering
applies for water droplets in the atmosphere, explaining why clouds are visible, and white. In
the case where the scattering particles are much larger, we simply use geometric optics.
Note. As a final note, we can generalize our results to a relativistically moving charge. Suppose a
point charge has position r(t). Then its retarded potential is
φ(x, t) ∝ ∫ dx′ δ(x′ − r(tret))/|x − x′|.
The tricky part is that tret depends on x0 nontrivially. Instead, it’s easier to switch the delta function
to be over time,
φ(x, t) ∝ ∫ dx′ dt′ δ(x′ − r(t′)) δ(t′ − tret)/|x − x′| = ∫ dt′ δ(t − t′ − |x − r(t′)|/c)/|x − r(t′)|.
The argument of the delta function changes both because of the explicit t′ and because of the velocity of the particle towards the point x, giving an extra contribution akin to a Doppler shift. Then
φ(x, t) = (q/4πε0) 1/(R(t′)(1 − R̂(t′)·v(t′)/c)),   A(x, t) = (qµ0/4π) v(t′)/(R(t′)(1 − R̂(t′)·v(t′)/c)),   t′ + R(t′)/c = t
where R is the separation vector R(t) = x − r(t). These are the Lienard–Wiechert potentials.
Carrying through the analysis, we can find the fields of a relativistic particle and the relativistic
analogue of the Larmor formula. The result is that the radiation rate is greatly enhanced, and
concentrated along the direction of motion of the particle.
Note. A cheap, very heuristic estimate of radiation power. Consider sound waves emitted by a
speaker. The relevant field is the velocity field v, and sources correspond to adding mass Ṁ (which
the speaker simulates by pushing mass outward). The “coupling” is the inverse of the air density,
1/ρ, in the sense that the static field and energy density are
v = Ṁ/(4πρr²),   u = (1/2) ρv².
Now we consider the power radiated by a spherically symmetric speaker, which has amplitude Ṁ
and angular frequency ω. A simple estimate would be to take the energy density at some radius,
and multiply it by 4πr2 c, where c is the speed of sound. However, at small radii, the 1/r radiation
field is overwhelmed by a 1/r2 quasistatic field, which does not count as radiation.
By dimensional analysis, the two types of fields must be equally important at the intermediate
field distance r ∼ c/ω. Evaluating the field there, we have
P ∼ (4πr²c) · (1/2)ρ (Ṁ/4πρr²)² |_{r=c/ω} = Ṁ²ω²/(8πρc).
This is a correct estimate of the radiation power; evaluating the static field at r = c/ω has saved us
from having to think about how to compute the radiation field at all.
To convert this to electromagnetism, we convert Ṁ to q and the coupling 1/ρ to 1/0 , giving
P ∼ (1/8π) q²ω²/(ε0c).
However, this is incorrect, because monopole radiation does not exist for electromagnetism, because
of charge conservation. Instead, we need to use the static dipole field, which is smaller by a factor
of `/r where ` is the separation between the charges. This gives
P ∼ (1/8π) q²ℓ²ω⁴/(ε0c³)
which is the Larmor formula up to an O(1) factor. We can recast this in a more familiar form using
a ∼ `ω 2 . A similar argument can be used to estimate (electric) quadrupole radiation power,
P ∼ (1/8π) q²ℓ⁴ω⁶/(ε0c⁵).
This is especially relevant for gravitational waves, where the quadrupole is the leading contribution,
due to energy and momentum conservation. The charge is M and the coupling is 4πG, giving
P ∼ (G/2) M²ℓ⁴ω⁶/c⁵.
For a binary system of separation ` and masses M , we have
ω² = GM/ℓ³
which gives
P ∼ (1/2) G⁴M⁵/(ℓ⁵c⁵).
This matches the result of the quadrupole formula, derived in the notes on General Relativity, up
to an O(1) factor.
Note. Two slowly moving charges can be approximately described by the Lagrangian
L = Σ_i (1/2) mi vi² − q1q2/r.
It is difficult to account for radiation effects without having to think about the dynamics of the
entire electromagnetic field, drastically increasing the number of degrees of freedom. A typical
procedure is to compute the power radiated using the formulas above, then introduce it here as an
ad hoc energy loss. Radiation can also be accounted for more directly through a “self-force” on
each charge, but this is infamously tricky.
However, it is more straightforward to account for relativistic effects at lowest order. At order
(v/c)2 , the two effects are the retardation of propagation of the Coulomb field, and the magnetic
forces between the charges. We set c = 1 and work in Coulomb gauge. In this gauge, the scalar
potential has no retardation at all, instead propagating instantaneously, so the desired effect is
absorbed entirely into the vector potential. The new terms we want are
L1 = q1 v1 · A2 (r1 ) + q2 v2 · A1 (r2 ).
Since there is already a prefactor linear in v, the vector potential can be taken to first order in v.
This is the lowest order, so it can be found from the magnetostatic expression,
A(r) = (µ0/4π) ∫ dr′ Jt(r′)/|r − r′|.
The transverse part of the current can be calculated by starting from the current of a point charge
and taking the transverse part as described above. This leads to the Darwin Lagrangian,
L1 = (q1q2/2r) ( v1·v2 + (v1·r̂)(v2·r̂) ).
Going to higher order requires accounting for the field degrees of freedom.
• In the previous equation, E is the total average field in the dielectric, counting both external
fields and the fields sourced by the dielectric itself. For example, consider a parallel plate
capacitor, whose plates alone produce field Eext . Then
P = ε0χe E,   E = Eext − P/ε0.
• To generalize this analysis, define free charge to be all charge besides bound charge, so that
ρ = ρbound + ρfree .
D = ε0E + P,   ∇·D = ρfree.
This implies that at boundaries, D⊥ is continuous. The name “electric displacement” is due to
Maxwell, who thought of it as a literal displacement of the ether.
D = εE,   ε = ε0(1 + χe)
where ε is called the permittivity of the material. For example, a point charge in a dielectric medium would result in the electric field
E = (q/4πεr²) r̂.
The dielectric constant κ = ε/ε0 is also called the relative permittivity εr.
• We may heuristically think of D as the “external field” ε0Eext alone. However, this analogy isn’t perfect, because the above equation does not determine ∇×D. We know that in electrostatics ∇×E = 0, but the relation D = εE means that at boundaries, where ε jumps, ∇×D is generically nonzero.
• Next, we consider the energy of a linear dielectric. Suppose a dielectric material is fixed in
position while free charge is slowly brought in. Then
∆U = ∫ dr (∆ρf)V = ∫ dr (∇·(∆D))V = ∫ dr (∆D)·E
where we integrated by parts and threw away a boundary term. Now for a linear dielectric with
D = E, we have ∆(D · E) = 2(∆D) · E, so
U = (1/2) ∫ dr D·E.
• This differs from the usual formula, with energy density u = ε0E²/2, because the two count different things. We may split the total energy as
Utot = Ufree + Ufree/bound + Ubound + Uspring
where the first three terms count electrostatic interactions, and Uspring is the non-electrostatic energy stored in the “springs” that hold each atom or molecule in place.
• In the procedure above, we assembled the system by adding free charge. At each step of this
process, the dielectric is in equilibrium, so (right?)
Ubound + Uspring = 0.
Hence the procedure computes Ufree + Ufree/bound , which is the total energy Utot .
• If we instead think of assembling the entire system, including the dielectric charges, from scratch, then we arrive at u = ε0E²/2. The total energy computed this way is Utot − Uspring, because no mention is made of the spring forces.
• Earlier, we found the energy of a dipole in a field is −p · E. This is simply equal to Ufree/bound ;
the spring force is not counted because we treated p as fixed in that derivation. Hence
Ufree/bound = −∫ dr P·E.
Now in a real situation, the dipoles are in equilibrium, which means the spring force balances the
stretching force. But for a linear dielectric the spring potential is quadratic, so differentiating
it gives a factor of 2, giving
Uspring = (1/2) ∫ dr P·E.
Finally by our earlier identity we have (right?)
Ubound = −(1/2) ∫ dr P·E.
• For the purposes of thermodynamics, it is ambiguous what to count as the “internal” energy. If
one counts only Uspring , because the electromagnetic fields extend well outside the spring, then
dUspring = (1/2)(E·dp + p·dE) = E·dp
which is the form usually seen in textbooks. The latter formula works even if the spring
dissipates energy, as it’s essentially just F dx.
Note. In solids, there is no definite distinction between bound charge and free charge. For example,
consider the ionic lattice of NaCl. We might divide the crystal into unit cells and treat each one as
a molecule. Then the dipole moment of each unit cell due to “bound charge” depends on how the
cell is chosen. Similarly, the “free” charge due to atoms on the boundary not in full unit cells also
depends on the cell. Of course, both these contributions must sum to zero.
Example. Consider a sphere of radius R with uniform polarization P. This is equivalent to having
two uniformly charged balls of total charge ±Q displaced by d so that Qd = (4πR³/3)P. By the
shell theorem, the field inside is
Ep = −P/(3ε0)
and the field outside is exactly a dipole field. Now suppose such a dielectric sphere is in a uniform
field. The total field is
E = E0 + Ep
where E0 is the applied external field, and we know that
P = χe ε0 E.
Solving the system, we find
E = (3/(κ + 2)) E0,   P = 3ε0 ((κ − 1)/(κ + 2)) E0 = ε0 (χe/(1 + χe/3)) E0.
For small χe this is about equal to the naive result P = χe ε0 E0, but it is smaller because the sphere
itself shields the field that it sees. This is important for relating χe to atomic measurements. The
polarizability of an atom is defined as
p = αE0
where we only count the applied field E0 , because the field produced by the atom itself is negligible.
Then naively for a medium with a number density n of atoms, χe = nα/ε0. But instead we have
α = (3ε0/n) (κ − 1)/(κ + 2)
which is called the Clausius-Mossotti formula, or the Lorentz-Lorenz equation in optics. One might
worry that this result only applies for a spherical sample, but we need only imagine a spherical
surface around each atom, much larger than the atomic size, for the argument to work.
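As a quick numerical illustration (a sketch, not from the original notes; the function name is made up), the result P = ε0 χe/(1 + χe/3) E0 means a sphere of susceptibility χe responds with an effective susceptibility χe/(1 + χe/3):

```python
# Minimal sketch of the shielding factor for a dielectric sphere:
# P = eps0 * chi_e / (1 + chi_e/3) * E0, always less than the naive chi_e * eps0 * E0.
def effective_chi(chi_e):
    return chi_e / (1 + chi_e / 3)

for chi_e in [0.01, 0.5, 4.0]:
    print(chi_e, effective_chi(chi_e))
```

Note the saturation: as χe → ∞ (a conductor-like sphere, κ → ∞), the effective response approaches 3, so P can never exceed 3ε0 E0.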
Next, we turn to the analogous statements for magnetic fields, which are slightly more confusing.
• A current loop has a magnetic dipole moment
µ = I ∫ da = I a.
In an external field, it experiences
τ = µ × B,   F = ∇(µ · B)
where the latter is proved by Taylor expanding the field near the dipole. The associated
potential energy is
Umech = −µ · B.
This expression is somewhat subtle. The expression above is the potential energy associated
with mechanical forces and torques on the dipole as it is moved, assuming µ and the external
field B are fixed. It does not account for the energy required to maintain the magnetic dipole
µ or the field B, which could be supplied by an electromagnet, but its derivative yields the
correct mechanical forces on the dipole.
• Note that the total field energy density is B²/2µ0, so the interaction energy between two current
distributions (the first of which is a dipole) is
U12 = (1/µ0) ∫ dr B1 · B2 = ∫ dr J1 · A2 = µ1 · B2.
This is precisely the opposite of Umech .
• To verify the two results are consistent, one can show the work required to maintain the dipole’s
current is U1 = µ1 · B2 . Then U1 + Umech = 0, reflecting the fact that magnetic fields do no
work. Similarly the work required to maintain the external field is U2 = µ2 · B1 = µ1 · B2 by
reciprocity. Hence
U12 = Umech + U1 + U2 = µ1 · B2
which is consistent with our result above.
• In summary, U12 is the true energy, but Umech is the energy one should use when computing
forces on dipoles. The subtleties here have nothing to do with the ones we encountered for
dielectrics. They instead arise from using the wrong variables to describe the situation. In
electrostatics, one can describe the interaction of two conductors by fixing their voltages or
fixing their charges; in the former case we pick up an extra sign because batteries must do work
to maintain the voltages. Similarly, in magnetostatics we can describe the interaction of two
current distributions by fixing their currents or fixing their fluxes. Fluxes can be fixed for free,
assuming perfect conductors, but currents must be fixed using batteries.
• Conceptually, the opposite sign in the true energy compared to the electric dipole case is because
electric and magnetic dipoles have opposite internal fields. A magnetic dipole aligned with a
magnetic field increases the total field energy, while an electric dipole decreases it.
• Define the magnetization M as the dipole moment density. In a linear medium, we define
M = (1/µ0) (χm/(1 + χm)) B.
This is not fully analogous to the definition of χe , and we’ll see why later.
• Diamagnets are repelled by regions of higher B field, while paramagnets are attracted.
• Note that a dielectric has χe > 0 but is attracted to regions of higher E. These sign flips are
again because of the differences in the internal fields. Both dielectrics and diamagnets reduce
the field in the bulk.
• By similar manipulations to the electric case, we see that magnetization leads to the surface
and volume currents
Kbound = M × n̂, Jbound = ∇ × M.
• In terms of the total current, Ampère’s law reads ∇ × B = µ0 (Jfree + Jbound). Defining
H = B/µ0 − M, this becomes ∇ × H = Jfree. For a linear medium, we then have
M = χm H,   µ = µ0 (1 + χm),   B = µH
where µ is called the permeability of the material. Note that the definition of χm is different
from that of χe , which instead related D and E.
• The asymmetry is because Jfree and hence H is easy to measure, by using an ammeter outside
of the material. But a voltmeter indirectly measures E, which depends on the total charge ρ,
not ρfree . The definitions of χm and χe are hence made so they are easy to measure.
• In general, H is a much more useful quantity than D, though both are used for historical
reasons. In fact, some sources regard H as the fundamental quantity and call it the magnetic
field, referring to B as the magnetic induction.
• As before, we may think of H as the magnetic field sourced by Jfree alone, but this is deceptive
because ∇ · H ≠ 0. The boundary conditions are that B⊥ is continuous, while the discontinuity
in H∥ is fixed by the free surface current.
Note. Earnshaw’s theorem for magnets. We know that in free space, ∇²V = 0, so one cannot
confine charges by an electrostatic field. Similarly, one might ask if it is possible to confine magnetic
materials using a magnetostatic field.
The effective potential experienced by the material is proportional to |B|, and we know ∇ · B = 0
and ∇ × B = 0. Then the Laplacian of a field component vanishes,
∇²Bi = ∂j ∂j Bi = ∂j ∂i Bj = ∂i (∂j Bj) = 0
where the second step uses the curl-free condition. We thus have
∇²(B²) = 2Bi ∇²Bi + 2(∂j Bi)(∂j Bi) = 2(∂j Bi)(∂j Bi) ≥ 0.
Therefore, B² and hence |B| can have local minima but not local maxima. Since diamagnets are
attracted to regions with low |B|, we can have stable equilibrium for diamagnets but not paramagnets.
Examples of the former include superconducting levitation and magnetic traps for atomic gases.
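The key inequality can be checked symbolically. As a sketch (assuming sympy, with an arbitrarily chosen harmonic potential), any magnetostatic field in current-free space is B = ∇ψ with ∇²ψ = 0, and ∇²(B²) comes out manifestly non-negative:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# In current-free space, B = grad(psi) with psi harmonic; pick one example.
psi = x**2 + y**2 - 2*z**2 + x*y*z
assert sum(sp.diff(psi, v, 2) for v in (x, y, z)) == 0  # check psi is harmonic

B = [sp.diff(psi, v) for v in (x, y, z)]   # automatically curl- and divergence-free
B2 = sum(b**2 for b in B)
lap_B2 = sp.expand(sum(sp.diff(B2, v, 2) for v in (x, y, z)))
print(lap_B2)  # equals 48 + 4*(x**2 + y**2 + z**2), non-negative everywhere
```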
3 Statistical Mechanics
3.1 Ensembles
First, we define the microcanonical ensemble.
• The fundamental postulate of statistical mechanics is that, for an isolated system in equilibrium,
all accessible microstates are equally likely. Here, accessible means ‘reachable due to small
fluctuations’. For example, such fluctuations cannot modify conserved quantities.
• For simplicity, we suppose that energy is the only conserved quantity. Then the probability of
occupying state |n⟩ is
pn = 1/Ω(E)
where Ω(E) is the number of states with energy E.
• We know that for a quantum system the energy levels can be discrete, but for a thermodynam-
ically large system they form a continuum. Then what we really mean by Ω(E) is the number
of states with energy in [E, E + δE] where δE specifies how well we know the energy.
• Define the entropy as S = kB log Ω(E). For two non-interacting systems, Ω multiplies, so S
adds. That is, entropy is extensive.
• Often, we consider systems in the classical limit. In this case, the many-particle equivalent of
the WKB approximation applies, which states that for a system of N particles, there is one
quantum state per hN of phase space volume. The entropy in this case can then be defined in
terms of the logarithm of the volume of available phase space.
• Now suppose we allow two systems to weakly interact, so they can exchange energy, but the
energy levels of the states aren’t significantly shifted. Then the number of states is
Ω(Etotal) = Σ_Ei Ω1(Ei) Ω2(Etotal − Ei) = Σ_Ei exp[(S1(Ei) + S2(Etotal − Ei))/kB].
After allowing the systems to come to equilibrium, so that the new system is described by a
microcanonical ensemble, we find the entropy has increased. This is an example of the Second
Law of Thermodynamics.
• Since S is extensive, the argument of the exponential above is huge in the thermodynamic limit,
so we can approximate the sum by its maximum summand. (This is just the discrete saddle
point method.) Then the final entropy is approximately Stotal = S1 (E∗ ) + S2 (Etotal − E∗ ) where
E∗ is chosen to maximize Stotal .
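As a toy check (a sketch with made-up sizes, taking Ωi(E) = E^N so that Si = N kB log E), the log of the full sum over energy partitions is dominated by its single maximum summand:

```python
import math

N, E_tot = 50, 1000  # toy sizes; energy in integer units

def log_omega(E):
    return N * math.log(E)   # log of Omega(E) = E**N

terms = [log_omega(E1) + log_omega(E_tot - E1) for E1 in range(1, E_tot)]
log_max = max(terms)
# stable log-sum-exp over all summands
log_sum = log_max + math.log(sum(math.exp(t - log_max) for t in terms))
print(log_max, log_sum)  # nearly equal; the equal split E* = 500 dominates
```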
Note. Motivating the fundamental postulate. In a generic dynamical system, we would expect
a generic initial distribution of states to settle into an “attractor”, thereby justifying equilibrium
ensembles. But the situation in Hamiltonian mechanics is subtler, because Liouville’s theorem tells
us that phase space attractors don’t exist. Instead, what happens is that any initial distribution
gets distorted and folded all throughout the phase space, so that after any coarse-graining, the
result looks like the microcanonical ensemble.
To make this a little bit more rigorous, we note that in practice, we usually use statistical
mechanics to predict the time averages of single systems; the microcanonical ensemble is valid if
the time average equals the ensemble average. Let us consider a reduced phase space S which has
constant energy. We define an ergodic component of S to be a subset that remains invariant under
time evolution, and an ergodic system to be one whose ergodic components all have either measure
zero or the same measure as S.
By Liouville’s theorem, the microcanonical ensemble over S is time-independent, so its ensemble
average equals its time average. However, long time averages are constant along trajectories, so for
an ergodic system, time averages are the same starting from almost all of S. Therefore, the time
average starting from almost any point equals the microcanonical ensemble average.
There are many different definitions of ergodicity, and it is generally hard to establish any.
(Ergodicity is also sometimes used as a synonym for chaos. Though they often appear together,
chaos is specifically about the exponential divergence of nearby trajectories, while ergodicity is
about what happens in the long run. There is another distinct criterion called “mixing”, which has
to do with the decay of autocorrelation functions.)
This entire discussion gets far more complex when one moves to quantum statistical mechanics.
In quantum mechanics, the idea of a phase space distribution is blurred, and there is a huge variety
of time-independent ensembles, since energy eigenstates don’t evolve in time. However, many-body
energy eigenstates are generally extremely fragile superpositions, which are not observed in practice;
instead, such states quickly decohere into a mixture of non-eigenstates.
Note. Not every nontrivial, realistic system is ergodic. For example, if the solar system were
ergodic, then one would expect catastrophic results, such as Earth and Venus swapping places, or
Jupiter ejecting every planet from the solar system, as these are permitted by conservation laws.
In the case where the planets don’t interact, the motion takes place on invariant tori. The KAM
theorem states that in the three-body problem, for sufficiently weak interplanetary interactions,
and for planetary orbit periods whose ratios are not resonant (i.e. not close to simple rational
numbers), the tori are distorted but survive. Numerically, we find that stronger interactions completely destroy
the tori. This was the culmination of much work in the 19th century, which attempted to find
convergent series to describe the evolution.
Ergodicity can also fail due to kinetic barriers. For example, a cold magnet with spontaneous
symmetry breaking will in practice never fluctuate to have its bulk magnetization point the opposite
direction, so to match with observation we must fix the magnetization, even though there is no
corresponding conservation law. Similarly, as glasses are cooled, they become trapped in one of
many metastable states.
• Keeping V implicitly fixed for the partial derivatives below, we define the temperature T as
1/T = ∂S/∂E.
Comparing this with our previous result, we find that in thermal equilibrium, the temperatures
of the two systems are equal. Moreover, in the approach to equilibrium, energy flows from the
hotter system to the colder one.
• Above, we are only guaranteed that E∗ maximizes Stotal if, for each of the two systems,
∂²Si/∂E² < 0.
If a system does not satisfy this condition, it is thermodynamically unstable. Placed in contact
with a reservoir, it would never reach thermal equilibrium, instead emitting or absorbing as
much energy as possible. In terms of the heat capacity, stability requires C > 0.
• For example, black holes are hotter than the CMB, and so emit energy by Hawking radiation.
Since they get hotter as they lose energy, they continue emitting energy until they disappear.
• Another exotic option is for a system to have negative temperature. Such a system gets more
ordered as it absorbs energy. For the purposes of entropy maximization, negative temperature
is always “hotter” than any positive temperature. This weird behavior is just because the
natural variable is 1/T . The simple general rule is that heat always flows to higher 1/T .
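A system of N two-level spins makes this concrete. As a sketch (units where kB and the level splitting are 1), S(E) = log C(N, n) first rises and then falls with the number n of excited spins, so 1/T = ∂S/∂E changes sign at half filling:

```python
import math

N = 1000  # number of spins; energy E = n, the number of excited spins

def entropy(n):  # S = log C(N, n), via log-gamma to avoid huge factorials
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

def beta(n):  # finite-difference 1/T = dS/dE
    return entropy(n + 1) - entropy(n)

print(beta(100) > 0, beta(900) < 0)  # True True: the upper half has T < 0
```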
• Similarly, at fixed energy we may define the pressure by p = T (∂S/∂V)|E, which agrees with
the mechanical definition p = −(∂E/∂V)|S, where we used the triple product rule. Then by
similar arguments as above, the pressures of systems are equal in equilibrium.
• This might sound strange, because we are used to pressure balancing because of mechanical
equilibrium. The point is that both mechanical and thermal equilibrium ensure pressure balance
independently, even though in many cases the former might take effect much faster, e.g. when
two gases are separated by a movable heat conducting piston.
• Combining the above definitions, dE = T dS − p dV. We call ‘work’ the energy transferred by
exchange of volume; the rest is ‘heat’. More generally, we can write the work as a sum Σi Ji dxi,
where the xi are generalized displacements and the Ji are their conjugate generalized forces,
adding yet more terms.
• In general, the (xi , Ji ) behave similarly to (S, T ). In equilibrium, the Ji are equal. For stability,
we must have ∂²E/∂x² > 0, which implies that the matrix ∂Ji /∂xj is positive definite. For
example, a gas with (∂p/∂V )|T > 0 is unstable to expansion or collapse.
• Consider a system S in thermal equilibrium with a large reservoir R. Then the number of
microstates associated with a state where the system has energy En is
Ω = ΩR (Etotal − En) = exp[SR (Etotal − En)/kB] ≈ exp[SR (Etotal)/kB − (∂SR/∂Etotal)(En/kB)]
where the approximation holds because the reservoir is very large. Here we have summed over
reservoir states, which one could call “integrating out” or “tracing out” the reservoir.
pn = exp(−En/kB T)/Z,   Z = Σn exp(−En/kB T).
For convenience, we define β = 1/kB T . The partition function Z just normalizes the distribution.
If one takes the ground state energy to be zero, it heuristically measures the number of available
states.
• One might protest that the only reason we get an exponential in the final result is because
we chose to Taylor expand the logarithm of Ω, i.e. the entropy, and take just the leading
term. More precisely, the derivation above holds only when the subleading terms really can be
neglected in the thermodynamic limit. For a wide variety of systems, this is true of log Ω, but
not Ω itself or other functions thereof, as we will see in the next example.
Example. As we will see below, the entropy of an ideal gas depends on energy logarithmically,
S(E) ∼ N log E, so Ω(E) ∼ E^N. Expanding in a small energy transfer ε,
Ω(E − ε) ∼ Ω(E) − N ε E^(N−1) + (ε²/2) N(N − 1) E^(N−2) + . . .
and higher-order terms are suppressed by powers of N ε/E, which is not small. (Another way of
saying this is that the thermodynamic limit is N → ∞, but with E/N held fixed.)
Example. For noninteracting systems, the partition functions multiply. Another useful property
is that the partition function is similar to the cumulant generating function for the energy,
f(γ) = log⟨exp(γE)⟩ = log Σn exp(−(β − γ)En)/Z.
The cumulants are the derivatives of f evaluated at γ = 0. Only the numerator depends on γ,
and since it contains only the combination (β − γ), we can differentiate with respect to β instead,
f^(n)(γ)|γ=0 = (−1)^n ∂^n (log Z)/∂β^n.
As an explicit example,
⟨E⟩ = −∂ log Z/∂β,   var E = ∂² log Z/∂β².
However, since var E = −∂⟨E⟩/∂β, we have
var E = kB T² CV
which is a relative of the fluctuation-dissipation theorem. Moreover, all cumulants of the energy
can be found by differentiating ⟨E⟩, so they are all extensive. Then in the thermodynamic limit
the system has a definite energy and the canonical and microcanonical ensembles coincide. (This
doesn’t hold when we’re applying the canonical ensemble to a small system, like a single atom.)
To see this another way, note that
Z = Σ_Ei Ω(Ei) exp(−βEi)
where we are now summing over energies instead of states. But in the thermodynamic limit, the
two factors in the sum are rapidly rising and falling, so they are dominated by the maximum term,
which has fixed energy.
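As a quick numerical check (a sketch for a two-level system with unit splitting and kB = 1, not from the original notes), var E indeed equals −∂⟨E⟩/∂β:

```python
import math

def stats(beta):
    # two-level system: energies 0 and 1, so E^2 = E and var E = <E> - <E>^2
    Z = 1 + math.exp(-beta)
    E_avg = math.exp(-beta) / Z
    return E_avg, E_avg - E_avg**2

beta, h = 0.7, 1e-5
dE_dbeta = (stats(beta + h)[0] - stats(beta - h)[0]) / (2 * h)  # central difference
var_E = stats(beta)[1]
print(var_E, -dE_dbeta)  # agree to the finite-difference accuracy
```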
Example. We now compute the entropy of the canonical ensemble. Suppose we had W copies
of the canonical ensemble; then there will be pn W systems in state |n⟩. Since W is large, we can
consider all the copies to lie in the microcanonical ensemble, for which the entropy is
S = kB log Ω = kB log[W!/Πn (pn W)!] ≈ −kB W Σn pn log pn
where we used Stirling’s approximation. The entropy per copy is
S = −kB Σn pn log pn
and this expression is called the Gibbs entropy. It is proportional to the Shannon entropy of
information theory; it is the amount of information we gain if we learn what the microstate is, given
knowledge of the macrostate.
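As a numeric sanity check (a sketch with an arbitrary three-state distribution), the Stirling approximation used above converges quickly in the number of copies W:

```python
import math

p = [0.5, 0.3, 0.2]                       # arbitrary three-state distribution
gibbs = -sum(q * math.log(q) for q in p)  # Gibbs entropy per copy (k_B = 1)

for W in [100, 10000]:
    counts = [round(q * W) for q in p]    # p_n * W copies in each state
    log_omega = math.lgamma(W + 1) - sum(math.lgamma(c + 1) for c in counts)
    print(W, log_omega / W, gibbs)  # per-copy entropy approaches the Gibbs value
```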
Next, we define the free energy and other potentials.
• Define the free energy as F = E − T S.
• Next, we can allow the particle number N to vary, and define the chemical potential
µ = −T (∂S/∂N)|E,V .
• Note that the chemical potential for a classical gas is negative, because it is the energy cost
of a particle at fixed S. To keep the entropy the same, we typically have to remove more
energy than the particle’s presence added. By contrast, for the Fermi gas at zero temperature,
µ = EF > 0 because the entropy is exactly zero.
• We may similarly define the grand canonical ensemble by allowing N to vary. Then
pn = exp(−β(En − µNn))/Z,   Z(T, µ, V) = Σn exp(−β(En − µNn))
from which we find
⟨N⟩ = ∂ log Z/∂(βµ),   var N = ∂² log Z/∂(βµ)².
In particular, as with energy, we see that variance is extensive, so fluctuations disappear in the
thermodynamic limit.
Example. In most cases, the energy and entropy are extensive. This implies that
E(λS, λV, λN) = λ E(S, V, N).
Differentiating at λ = 1, we find
E = T S − pV + µN.
3.2 Thermodynamics
At this point, we start over with thermodynamics. For simplicity, we’ll consider gases whose only
thermodynamic variables are pressure, volume, and temperature.
• The point of thermodynamics is to describe a system with many degrees of freedom in terms
of only its macroscopically observable quantities, which we call the thermodynamic variables.
Historically this approach was taken by necessity, and it continues to be useful today because
of its simplicity. It gives only partial information, but this limited information is often exactly
what we want to know in practice anyway.
• Thermodynamics is a kind of predecessor to the modern idea of effective field theory and the
renormalization group. As described in the notes on Statistical Field Theory, it can be derived
from microscopic physics by applying statistical mechanics and successive coarse grainings until
only macroscopic information remains. But thermodynamics also stands on its own; to a large
extent, its validity is independent of what the microscopic physics is.
• The Zeroth Law states that thermal equilibrium between systems exists, and is transitive.
This means that we can assign systems a temperature T (p, V ) so that systems with the same
temperature are in equilibrium. The equation T = T (p, V ) is called an equation of state. At
this stage, T can be replaced by f (T ) for any monotonic f .
• The First Law tells us that energy is a state function. Work is the subset of energy transfers
due to macroscopically observable changes in macroscopic quantities, such as volume. All other
energy transfer is called heat, so
dE = d̄Q + d̄W
where the d̄ indicates an inexact differential. (Here ‘exact’ is used in the same sense as in the
theory of differential forms, as all terms above can be regarded as one-forms on the space of
thermodynamic variables.)
• The Second Law tells us that it’s impossible to transfer heat from a colder body to a warmer
body without any other effects.
• A Carnot cycle is a process involving an ideal gas that extracts heat QH from a hot reservoir,
performs work W, and dumps heat QC to a cold reservoir. We define the efficiency
η = W/QH.
By construction, the Carnot cycle is reversible. Then by the Second Law, no cycle can have
greater efficiency.
• By composing Carnot cycles, one can show that the efficiency depends only on the reservoir
temperatures, through some function f,
1 − η(T1, T2) = f(T2)/f(T1).
For simplicity, we make the choice f (T ) = T , thereby fixing the definition of temperature. (In
statistical mechanics, this choice is forced by the definition S = kB log Ω.)
• Under this choice, the Carnot cycle satisfies QH/TH + QC/TC = 0, with heats counted as
positive when absorbed. Since any reversible process can be decomposed into infinitesimal
Carnot cycles,
∮ d̄Q/T = 0
for any reversible cycle. This implies that ∫ d̄Q/T is independent of path, as long as we only
use reversible paths, so we can define a state function
S(A) = ∫_0^A d̄Q/T.
• The Third Law tells us that S/N goes to zero as T goes to zero; this means that heat capacities
must go to zero. Another equivalent statement is that it takes infinitely many steps to get to
T = 0 via isothermal and adiabatic processes.
• In statistical mechanics, the Third Law simply says that the log-degeneracy of the ground state
can’t be extensive. For example, in a system of N spins in zero field, one might think that the
ground state has degeneracy 2N . But in reality, arbitrarily weak interactions always break the
degeneracy.
Note. Reversible and irreversible processes. For reversible processes only, we have
d̄Q = T dS,   d̄W = −p dV.
For example, in the process of free expansion, the volume and entropy change, even though there is
no heat or work. Now, for a reversible process the First Law gives dE = T dS − p dV . Since both
sides are state functions, this must be true for all processes, though the individual terms will no
longer describe heat or work! We’ll ignore this subtlety below and think of all changes as reversible.
Example. We define the enthalpy, Helmholtz free energy, and Gibbs free energy as
H = U + P V, F = U − T S, G = U + P V − T S.
Then we have
dH = T dS + V dp,   dF = −S dT − p dV,   dG = −S dT + V dp.
From these differentials, we can read off the natural variables of these functions. Also, to convert
between the quantities, we can use the Gibbs–Helmholtz equations
U = −T² ∂(F/T)/∂T |V ,   H = −T² ∂(G/T)/∂T |p .
Note. The potentials defined above have direct physical interpretations. Consider a system with
d̄W = −p dV + d̄W′, where d̄W′ contains other types of work, such as electrical work supplied by a
battery. Since d̄Q ≤ T dS, the First Law gives
−p dV + d̄W′ ≥ dU − T dS.
If the process is carried out at constant temperature and volume, then dF = dU − T dS, so
d̄W′ ≥ dF. Hence the decrease in the Helmholtz free energy is the maximum amount of work that
can be extracted at fixed temperature. If we instead fix the temperature and pressure, then
d̄W′ ≥ dG, so the decrease in the Gibbs free energy is the maximum amount of non-p dV work
that can be extracted.
The interpretation of enthalpy is different; at constant pressure, we have dH = T dS = d̄Qrev ,
so changes in enthalpy tell us whether a chemical reaction is endothermic or exothermic.
Note. Deriving the Maxwell relations. Recall that area in the T S plane is heat and area in the pV
plane is work. In a closed cycle, the change in U is zero, so the heat and work are equal,
∫ dp dV = ∫ dT dS.
Since this holds for the region enclosed by any cycle, the corresponding two-forms must be equal,
dp ∧ dV = dT ∧ dS.
In terms of calculus, this means the Jacobian for changing variables from (p, V ) to (T, S) is one.
This equality can be used to derive all the Maxwell relations. For example, suppose we write
T = T (S, V ) and P = P (S, V ). Expanding the differentials and using dS ∧ dS = dV ∧ dV = 0,
(∂T/∂V)|S dV ∧ dS = (∂P/∂S)|V dS ∧ dV
which yields the Maxwell relation (∂T/∂V)|S = −(∂P/∂S)|V .
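The unit-Jacobian statement can be verified directly. As a sketch (assuming sympy, and using the ideal gas entropy S = CV log T + R log V derived later in this section for one mole, with constants dropped):

```python
import sympy as sp

p, V, R, Cv = sp.symbols('p V R C_v', positive=True)
T = p * V / R                       # ideal gas law for one mole
S = Cv * sp.log(T) + R * sp.log(V)  # ideal gas entropy, constants dropped

# Jacobian d(T,S)/d(p,V); the statement dp ^ dV = dT ^ dS is equivalent to J = 1
J = sp.Matrix([[sp.diff(T, p), sp.diff(T, V)],
               [sp.diff(S, p), sp.diff(S, V)]]).det()
print(sp.simplify(J))  # 1
```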
We now give some examples of problems using the Maxwell relations and partial derivative rules.
Example. As stated above, the natural variables of U are S and V . Other derivatives, such as
∂U/∂V |T , are complicated, though one can be deceived because it is simple (i.e. zero) for ideal
gases. But generally, we have
(∂U/∂V)|T = (∂U/∂V)|S + (∂U/∂S)|V (∂S/∂V)|T = −p + T (∂p/∂T)|V = T² ∂(p/T)/∂T |V
where we used a Maxwell relation in the second equality. This is the simplest way of writing
∂U/∂V |T in terms of thermodynamic variables.
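As a check (a sketch assuming sympy), applying this formula to a van der Waals gas, p = RT/(V − b) − a/V², gives a nonzero answer, unlike the ideal gas:

```python
import sympy as sp

T, V, R, a, b = sp.symbols('T V R a b', positive=True)
p = R * T / (V - b) - a / V**2   # van der Waals equation of state (one mole)

# (dU/dV)|_T = T^2 * d(p/T)/dT |_V
dU_dV = sp.simplify(T**2 * sp.diff(p / T, T))
print(dU_dV)  # a/V**2: the attractive term stores energy on expansion
```

Setting a = 0 recovers the ideal gas result (∂U/∂V)|T = 0.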
Example. The ratio of the isothermal and adiabatic compressibilities is the heat capacity ratio γ:
κT/κS = (∂V/∂p)|T / (∂V/∂p)|S = [(∂V/∂T)|p (∂T/∂p)|V] / [(∂V/∂S)|p (∂S/∂p)|V]
= [(∂V/∂T)|p (∂S/∂V)|p] / [(∂p/∂T)|V (∂S/∂p)|V] = (∂S/∂T)|p / (∂S/∂T)|V = γ
where we used the triple product rule, the reciprocal rule, and the regular chain rule.
Using the ideal gas law, (∂p/∂T )|V = R/V , and integrating gives
S = ∫ (CV/T) dT + ∫ (R/V) dV = CV log T + R log V + const.
where we can do the integration easily since the coefficient of dT doesn’t depend on V , and vice versa.
The singular behavior for T → 0 is incompatible with the Third Law, as is the result CP = CV + R,
as all heat capacities must vanish for T → 0. These tensions arise because the Third Law is quantum
mechanical, and they indicate the classical model of the ideal gas must break down. A more careful
derivation starting from statistical mechanics, given below, can account for the dependence on N
and the unknown constant.
Example. Work for a rubber band. Instead of dW = −pdV , we have dW = f dL, where f is the
tension. Now, we have
(∂S/∂L)|T = −(∂f/∂T)|L = (∂f/∂L)|T (∂L/∂T)|f
where we used a Maxwell relation and the triple product rule. The first factor on the right is
positive (rubber bands act like springs), while the second is negative (a stretched rubber band
contracts when heated), so (∂S/∂L)|T < 0. The sign can be understood microscopically: an
expanding gas has more position phase space, but if we model a rubber band as a chain of
molecules taking a random walk with a constrained total length, there are fewer microstates if
the length is longer.
Next, using the triple product rule and the result above gives
(∂S/∂T)|L (∂T/∂L)|S = −(∂S/∂L)|T > 0
and the first term must be positive by thermodynamic stability; therefore a rubber band heats up
if it is quickly stretched, just the opposite of the result for a gas.
Example. Work for electric dipoles. In the previous section, we argued that the increment of work
for an electric dipole is
dUdip = E · dp
which corresponds directly to the F dx energy when the dipole is stretched. However, one could
also include the potential energy of the dipole in the field,
Upot = −p · E, dUpot = −p · dE − E · dp
which thereby includes some of the electric field energy. Conventions differ over whether this should
be counted as part of the dipole’s “internal” energy, as the electric fields are not localized to the
dipole. If we do count it, we find
dUtot = −p · dE
and similarly dUtot = −m · dB for magnetic dipoles. Ultimately, the definition is simply a matter
of convention, and observable quantities will always agree. For example, the Maxwell relations
associated with the “internal energy” Udip are the same as the Maxwell relations associated with
the “free energy” Utot + p · E. Switching the convention simply swaps what is called the internal
energy and what is called the free energy, with actual results staying the same.
Note. In practice, the main difference between magnets and gases is that m decreases with temper-
ature at fixed B, while V increases with temperature at fixed p; then cycles involving magnets in
(m, B) space run opposite the analogous direction for gases.
Note. Chemical reactions. For multiple particle species, we get a contribution Σi µi dNi to the energy.
Now, consider an isolated system where some particle has no conservation law; then the equilibrium
amount Ni of that particle is found by minimizing the free energy, which sets µ = 0. This is the case for
photons in most situations. More generally, if chemical reactions can occur, then minimizing the
free energy means that chemical potentials are balanced on both sides of the reaction.
As an example, consider the reaction n A ↔ m B. Then in equilibrium, nµA = mµB . On the
other hand, if the A and B species are both uniformly distributed in space, then
µi = kB T log(Ni /V) + const.
Letting [A] and [B] denote the concentrations of A and B, we thus have the law of mass action,
[A]^n/[B]^m = K(T)
which generalizes in the obvious way to more complex reactions. In introductory chemistry classes,
the law of mass action is justified by saying that the probability for n A molecules to come together
is proportional to [A]^n, but this isn’t a good argument because real reactions occur in multiple
steps. For example, two A molecules could combine into an unstable intermediate, which then
reacts with a third A molecule, and so on.
Note. The Clausius–Clapeyron equation. At a phase transition, the chemical potentials of the two
phases (per molecule) are equal. Now consider two nearby points on a coexistence curve in (p, T )
space. If we connect these points by a path in the region with phase i, then
∆µi = −si ∆T + vi ∆P
where we used µ = G/N , and si and vi are the entropy and volume divided by the total particle
number N . Since we must have ∆µ1 = ∆µ2 ,
dP/dT = (s2 − s1)/(v2 − v1) = L/(T (V2 − V1)).
This can also be derived by demanding that a heat engine running through a phase transition
doesn’t violate the Second Law.
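As a worked numerical example (a sketch: approximating ΔV by the ideal vapor volume RT/P and taking the latent heat of water as constant), integrating Clausius–Clapeyron gives P = P0 exp(−(L/R)(1/T − 1/T0)), which can be inverted for the boiling point at reduced pressure:

```python
import math

L, R = 40660.0, 8.314   # rough molar latent heat of water (J/mol); gas constant
T0 = 373.15             # boiling point (K) at P0 = 1 atm

def boiling_T(P_atm):
    # invert P/P0 = exp(-(L/R)(1/T - 1/T0)) for the boiling temperature
    return 1.0 / (1.0 / T0 - (R / L) * math.log(P_atm))

print(boiling_T(0.7) - 273.15)  # about 90 C, as at roughly 3 km altitude
```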
Note. Insight into the Legendre transform. The Legendre transform of a function F (x) is the
function G(s) satisfying
G(s) + F(x) = sx,   s = dF/dx
from which one may show that x = dG/ds. The symmetry of the above equation makes it clear
that the Legendre transform is its own inverse. Moreover, the Legendre transform crucially requires
F (x) to be convex, in order to make the function s(x) single-valued. It is useful whenever s is an
easier parameter to control or measure than x.
However, the Legendre transforms in thermodynamics seem to come with some extra minus
signs. The reason is that the fundamental quantity is entropy, not energy. Specifically, we have
F(β) + S(E) = βE,   β = ∂S/∂E,   E = ∂F/∂β.
That is, we are using β and E as conjugate variables, not T and S! Another hint of this comes from
the definition of the partition function,
Z(β) = ∫ Ω(E) exp(−βE) dE,   F(β) = −log Z(β),   S(E) = log Ω(E)
from which we recover the above result by the saddle point approximation.
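As a numerical sketch of this convention (with an arbitrary toy entropy S(E) = N log E and kB = 1), the concavity of S makes βE − S(E) convex in E, so F(β) = min_E [βE − S(E)], with the minimizer at β = ∂S/∂E:

```python
import math

N = 5.0  # toy entropy S(E) = N log E, so beta = dS/dE = N/E

def F(beta):
    # crude scan for min_E [beta*E - S(E)]; the true minimum is at E = N/beta
    return min(beta * E - N * math.log(E)
               for E in (0.01 * k for k in range(1, 100000)))

beta = 2.0
exact = N - N * math.log(N / beta)  # analytic result, evaluated at E* = 2.5
print(F(beta), exact)
```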
• These two ideas are unified by the adiabatic theorem. An entropy-conserving process in
thermodynamics corresponds to a slowly varying Hamiltonian which satisfies the requirements
of the adiabatic theorem; this leads immediately to the conservation of phase space volume.
The same idea holds in quantum statistical mechanics, where the entropy quantifies the number
of possible states, which is conserved by the quantum adiabatic theorem.
• The general results of thermodynamics are not significantly changed if the underlying micro-
scopic physics changes. (Steam engines didn’t stop working when quantum mechanics was
discovered!) For example, suppose it is discovered that a gas can be magnetized. Subsequently
including the magnetization in the list of thermodynamic variables would change the numeric
values of the work, free energy, entropy, and so on.
• However, this does not invalidate results derived without this variable. Work quantifies how
much energy is given to a system through macroscopically measurable means. Entropy quantifies
how many states a system could be in, given the macroscopically measured variables. Free
energy quantifies how much work we can extract from a system given knowledge of those
same variables. (In the limit of including all variables, the free energy simply becomes the
microscopic Hamiltonian.) All of these can perfectly legitimately change if more quantities
become measurable.
• A more modern, unifying way to think about entropy is as a measure of our subjective ignorance
of the state. As we saw above for the canonical ensemble,
$$S = -k_B \sum_n p_n \log p_n.$$
But this is proportional to $-\langle \log_2 p_n \rangle$, the expected number of bits of information we receive
upon learning the state $n$. We can use this to define the entropy for nonequilibrium systems.
• In the context of Hamiltonian mechanics, the entropy becomes an integral over phase space of
−ρ log ρ. By Liouville’s theorem, the entropy is thus conserved. However, as mentioned earlier,
in practice the distribution gets more and more finely foliated, so that time evolution combined
with coarse-graining increases the entropy.
• In the context of information theory, the Shannon information $-\langle \log_2 p_n \rangle$ is the average number
of bits per symbol needed to transmit a message, if the symbols in the message are independent
and occur with probabilities $p_n$.
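A minimal sketch of this formula (the function name is ours):

```python
import math

def shannon_bits(probs):
    """Shannon information -sum_n p_n log2 p_n, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_bits([0.5, 0.5]))    # a fair coin: 1.0 bit per flip
print(shannon_bits([0.25] * 4))    # four equally likely symbols: 2.0 bits
print(shannon_bits([0.9, 0.1]))    # a biased coin carries less than 1 bit
```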
• More generally, the Shannon information is a unique measure of ignorance: it is the only function
of the $\{p_n\}$ that is continuous, increases with $n$ for uniform distributions, and is additive under
composition of choices.
• Extending this reasoning further leads to a somewhat radical reformulation of statistical me-
chanics, promoted by Jaynes. In this picture, equilibrium distributions maximize entropy not
because of their dynamics, but because that is simply the least informative guess for what the
system is doing. This seems to me to be too removed from the physics to actually be a useful
way of thinking, but it is a neat idea.
Example. Glasses are formed when liquids are cooled too fast to form the crystalline equilibrium
state. Generally, glasses occupy one of many metastable equilibrium states, leading to a “residual
entropy” (i.e. quenched disorder) at very low temperatures. To estimate this residual entropy, we
could start with a cold perfect crystal (which has approximately zero entropy), melt it, then cool it
into a glass. The residual entropy is then
$$S_{\rm res} = \int_{T=0}^{T=T_\ell} \frac{\bar d Q}{T} + \int_{T=T_\ell}^{T=0} \frac{\bar d Q}{T}.$$
In other words, the residual entropy is related to the amount of "missing heat", which we transfer
in when melting the crystal, but don't get back when cooling it into a glass.
More concretely, consider a double well potential with energy difference δ and a much larger
barrier height. As the system is cooled to $k_B T \lesssim \delta$, the system gets stuck in one of the valleys,
leading to a statistical entropy of kB log 2 ∼ kB . If the system gets stuck in the higher valley, then
there is a “missing” heat of δ, which one would have harvested at T ∼ δ/kB if the barrier were low,
so the system retains a thermodynamic entropy of δ/T ∼ kB . Hence both definitions of entropy
agree: there is a residual entropy of roughly kB times the number of such “choices” the system
must make as it cools.
Example. Some people object that identifying subjective information with entropy is a category
error; however, it really is true that “information is physical”. Suppose that memory is stored in
a computer as follows: each bit is a box with a divider. For a bit value of 0/1, a single atom is
present on the left/right side. Bit values can be flipped without energy cost; for instance, a 0 can
be converted to a 1 by moving the left wall and the divider to the right simultaneously.
One can harvest energy by forgetting the value of a bit. Concretely, one allows the divider to
move out adiabatically under the pressure of the atom. Once the divider is at the wall, we put a
new divider in. We have harvested a P dV work of kB T log 2, at the cost of no longer knowing the
value of the bit. Thus, pure “information” can be used to run an engine.
This reasoning also can be used to exorcise Maxwell’s demon. It is possible for a demon to
measure the state of a previously unknown bit without any energy cost, and then to extract work
from it. However, in the process, the entropy of the demon goes up – concretely, if the demon uses
similar bits to perform the measurement, known values turn into unknown values.
We would have a paradox if the demon were able to reset these unknown values to known ones
without consequence. But if the demon just tries to push pistons inward, then he increases the
temperatures of the atoms, and thereby produces a heat of kB T log 2 per bit. That is, erasing pure
“information” can cause the demon to warm up. As such, there is nothing paradoxical, because the
demon just behaves in every way like an ordinary cold reservoir.
The result that kB T log 2 heat is produced upon erasing a bit is known as Landauer’s principle,
and it also applies to computation in general. For example, an AND gate fed with uniformly random
inputs produces an output with a lower Shannon entropy, which means running the AND gate on
such inputs must produce heat. Numerically, at room temperature, $k_B T \log 2 \approx 0.0175\ \mathrm{eV}$.
However, computation can be performed with no heat dissipation at all if one uses only reversible
gates. During the computation one accumulates “garbage” bits that cannot be erased; at the end
one can just copy the answer bits, then run the computation in reverse. Numerous concrete models
of reversible computation have been proposed to demonstrate this point, as earlier it was thought
that Maxwell’s demon implied computation itself required energy dissipation.
Example. For each particle, we have the Hamiltonian Ĥ = p̂2 /2m + V (q̂), where the potential
confines the particle to a box. The partition function is defined as Z = tr e−β Ĥ . In the classical
limit, we neglect commutators,
$$e^{-\beta \hat H} = e^{-\beta \hat p^2/2m}\, e^{-\beta V(\hat q)} + O(\hbar).$$
• For a particle in an ideal gas, the position integral gives a volume factor V . Performing the
Gaussian momentum integrals,
$$Z = \frac{V}{\lambda^3}, \qquad \lambda = \sqrt{\frac{2\pi\hbar^2}{m k_B T}}.$$
The thermal de Broglie wavelength $\lambda$ is the typical de Broglie wavelength of a particle. Then
our expression for $Z$ makes sense if we think of $Z$ as the 'number of thermally accessible states',
each of which could be a wavepacket of volume $\lambda^3$.
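To get a feel for the numbers, here is a quick evaluation of $\lambda$ (our own sketch; the helium mass and the room-temperature gas density are assumed illustrative values):

```python
import math

hbar = 1.054571817e-34  # J s
k_B  = 1.380649e-23     # J/K

def thermal_wavelength(m, T):
    """Thermal de Broglie wavelength, lambda = sqrt(2 pi hbar^2 / (m k_B T))."""
    return math.sqrt(2 * math.pi * hbar**2 / (m * k_B * T))

m_He = 4 * 1.66053907e-27   # helium-4 mass, kg
lam  = thermal_wavelength(m_He, 300.0)
n    = 2.5e25               # typical gas number density near STP, m^-3

print(lam)          # ~5e-11 m, about half an angstrom
print(n * lam**3)   # << 1, so the classical treatment is valid
```

Since $n\lambda^3 \ll 1$, thermally accessible wavepackets vastly outnumber particles, which is why the classical treatment works at room temperature.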
One common, slick derivation of the Maxwell–Boltzmann distribution is to assume the velocity
components are independent and identically distributed, while $f$ can only depend on the speed by
rotational symmetry. Then $f(\mathbf v) = g(v_x)\, g(v_y)\, g(v_z)$ can depend only on $v_x^2 + v_y^2 + v_z^2$, which
forces each factor to be a Gaussian.
Example. Gaseous reactions. At constant temperature, the chemical potential per mole of gas is
$$\mu(p) = \mu^\circ + RT \log(p/p^\circ)$$
where $\mu^\circ$ is the chemical potential at standard pressure $p^\circ$. For a reaction A ↔ B, we define the
equilibrium constant as K = pB /pA . Then
$$\Delta G = \Delta G^\circ + RT \log \frac{p_B}{p_A}$$
and setting $\Delta G = 0$ in equilibrium gives
$$\log K = -\frac{\Delta G^\circ}{RT}.$$
This result also holds for arbitrarily complicated reactions. Applying the Gibbs–Helmholtz relation,
$$\frac{d \log K}{dT} = \frac{\Delta H^\circ}{RT^2}$$
which is an example of Le Chatelier’s principle.
Example. Counting degrees of freedom. A monatomic gas has three degrees of freedom; the atom
has kinetic energy $(3/2) k_B T$. A diatomic gas has seven: the three translational degrees of freedom
of the center of mass, the two rotations, and the vibrational mode, which counts twice due to the
potential energy of the bond, but is frozen out at room temperature.
An alternative counting method is to simply assign $(3/2) k_B T$ of kinetic energy to every atom; this
is correct because the derivation of the monatomic gas's energy holds for each atom separately, in
the moment it collides with another. The potential energy then adds $(1/2) k_B T$ per vibrational mode.
• Corrections to the ideal gas law are often expressed in terms of a density expansion,
$$\frac{p}{k_B T} = \frac{N}{V} + B_2(T)\, \frac{N^2}{V^2} + B_3(T)\, \frac{N^3}{V^3} + \cdots$$
• To calculate the coefficients, we need an ansatz for the interaction potential. We suppose the
density is relatively low, so only pairwise interactions matter, so
$$H_{\rm int} = \sum_{i<j} U(r_{ij}).$$
• If the atoms are neutral with no permanent dipole moment, they will have an attractive $1/r^6$
van der Waals interaction. Atoms will also have a strong repulsion at short distances; in the
Lennard–Jones potential, we take it to be $1/r^{12}$ for convenience. In our case, we will take the
even simpler choice of a hard core repulsion,
$$U(r) = \begin{cases} \infty & r < r_0 \\ -U_0\, (r_0/r)^6 & r \geq r_0. \end{cases}$$
It is tempting to expand in βU , but this doesn’t work because U is large (infinite!). Instead
we define the Mayer f function
$$f(r) = e^{-\beta U(r)} - 1$$
which is bounded here between −1 and 0. Then
$$Z(N, V, T) = \frac{1}{N!\, \lambda^{3N}} \int \prod_i dr_i \prod_{j>k} (1 + f_{jk}).$$
• The zeroth order term recovers V N . The first order term gives
$$\int \prod_i dr_i \sum_{j>k} f_{jk} \approx \frac{N^2}{2}\, V^{N-2} \int dr_1\, dr_2\, f(r_{12}) \approx \frac{N^2}{2}\, V^{N-1} \int dr\, f(r)$$
where we integrated out the center of mass coordinate. We don’t have to worry about bounds
of integration on the r integral, as most of its contribution comes from atomic-scale r.
• Since $\int dr\, f(r) \sim r_0^3$, the ratio of the first and zeroth order terms goes as $N r_0^3/V$, giving us a measure
of what 'low density' means. To this order,
$$\frac{pV}{N k_B T} = 1 - \frac{N}{2V} \int dr\, f(r).$$
Evidently, we have computed the virial coefficient B2 (T ). Finding f explicitly yields the van
der Waals equations of state.
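This can be checked numerically. The sketch below (in dimensionless units with $r_0 = U_0 = k_B = 1$, our own choices) integrates the Mayer $f$ function for the hard-core potential above and compares against the van der Waals form $B_2 = b - a/k_B T$ with $b = 2\pi r_0^3/3$ and $a = 2\pi U_0 r_0^3/3$:

```python
import math

r0, U0, T = 1.0, 1.0, 20.0   # dimensionless units; high T so the expansion applies
beta = 1.0 / T

def f(r):
    """Mayer f function for the hard-core + r^-6 potential."""
    if r < r0:
        return -1.0
    return math.expm1(beta * U0 * (r0 / r)**6)

# B2(T) = -(1/2) int d^3r f(r) = -2 pi int_0^inf f(r) r^2 dr (midpoint Riemann sum)
dr = 1e-3
integral = sum(f((i + 0.5) * dr) * ((i + 0.5) * dr)**2 for i in range(60000)) * dr
B2 = -2 * math.pi * integral

b = 2 * math.pi * r0**3 / 3
a = 2 * math.pi * U0 * r0**3 / 3
print(B2, b - a / T)   # both ~1.99 in these units
```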
• Note that the integral for f diverges if the potential falls off as 1/r3 or slower. These potentials
are “long-ranged”.
Higher order corrections can be found efficiently using the cluster expansion.
• Consider a generic term of order $f^E$ in the full expansion of $Z$ above. Such a term can be
represented by a graph $G$ with $N$ vertices and $E$ edges, with no edges repeated. Denoting the
value of a graph by $W[G]$, we have
$$Z = \frac{1}{N!\, \lambda^{3N}} \sum_G W[G].$$
• Each graph G factors into connected components called clusters, each of which contributes an
independent multiplicative factor to W [G].
• The most convenient way to organize the expansion is by the number and sizes of the clusters.
Let Ul denote the contribution from all l-clusters,
$$U_l = \int \prod_{i=1}^{l} dr_i \sum_{G\ \text{an}\ l\text{-cluster}} W[G].$$
Now consider the contributions of all graphs with $m_l$ $l$-clusters, so that $\sum_l m_l l = N$. They have
the value
$$N! \prod_l \frac{U_l^{m_l}}{(l!)^{m_l}\, m_l!}$$
where the various factorials prevent overcounting within and between l-clusters.
$$Z = \frac{1}{\lambda^{3N}} \sum_{\{m_l\}} \prod_l \frac{U_l^{m_l}}{(l!)^{m_l}\, m_l!}$$
Passing to the grand canonical ensemble, the sum over $\{m_l\}$ factorizes, giving $\log \mathcal{Z} = (V/\lambda^3) \sum_l b_l z^l$
with $b_l = \lambda^{3l-3}\, U_l/(l!\, V)$. We see that if we take the log to get the free energy, only the $b_l$ appear,
not higher powers of $b_l$. This reduces a sum over all diagrams to a sum over only connected diagrams.
Expanding in powers of $z$ then allows us to find the virial coefficients.
Note. Calculating the density of states. For independent particles in a box with periodic boundary
conditions, the states are plane waves, leading to the usual 1/h3 density of states in phase space.
Integrating out position and momentum angle, we have
$$g(k) = \frac{4\pi V}{(2\pi)^3}\, k^2.$$
For an ultrarelativistic dispersion $E = \hbar c k$, this gives
$$g(E) = \frac{V E^2}{2\pi^2 \hbar^3 c^3}.$$
In general, we should also multiply by the number of spin states/polarizations.
• Using E = ~ω and the fact that photons are bosons with two polarizations, the partition
function for photons in a mode of frequency ω is
$$Z_\omega = 1 + e^{-\beta\hbar\omega} + e^{-2\beta\hbar\omega} + \cdots = \frac{1}{1 - e^{-\beta\hbar\omega}}.$$
Note that the number of photons is not fixed. We can imagine we’re working in the canonical
ensemble, but summing over states of the quantum field. Alternatively, we can imagine we’re
working in the grand canonical ensemble, where µ = 0 since photon number is not conserved;
instead the photon number sits at a minimum of the Gibbs free energy. There are no extra
combinatoric factors, involving which photons sit in which modes, because photons are identical.
The energy is
The energy is
$$E = -\frac{\partial}{\partial \beta} \log Z = \frac{V\hbar}{\pi^2 c^3} \int_0^\infty \frac{\omega^3\, d\omega}{e^{\beta\hbar\omega} - 1}$$
where the integrand is the Planck distribution. Taking the high T limit then recovers the
Rayleigh–Jeans law, from equipartition.
• Now, to evaluate the integral, note that it has dimensions of $\omega^4$, so it must produce $1/(\beta\hbar)^4$. Then
$$E \propto V (k_B T)^4$$
which recovers the Stefan–Boltzmann law.
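The proportionality constant comes from the standard value $\int_0^\infty x^3/(e^x - 1)\,dx = \pi^4/15$, which can be verified by expanding the integrand in powers of $e^{-x}$ (this is what fixes the Stefan–Boltzmann constant):

```python
import math

# Each term: int_0^inf x^3 e^{-mx} dx = 6/m^4, so the integral is 6 zeta(4).
total = sum(6.0 / m**4 for m in range(1, 100000))
print(total, math.pi**4 / 15)   # both ~6.4939
```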
• To get other quantities, we differentiate the free energy. One particularly important result is
$$p = \frac{E}{3V}$$
which is important in cosmology. One way to derive the constant is to note that the pressure
from kinetic theory depends on pv, and pv is twice the kinetic energy for a nonrelativistic gas,
but equal to the kinetic energy for a photon gas. Thus pV = (1/2)(2E/3) for a photon gas.
• By considering an isochoric change,
$$dS = \frac{dE}{T} \propto V T^2\, dT, \qquad S \propto V T^3$$
where the constant is zero by the Third Law. Thus pV γ is invariant in adiabatic (entropy
conserving) processes, where γ = 4/3.
• Note that adiabatically expanding or contracting a photon gas must keep it in equilibrium, just
like any other gas. This is simply because a photon gas can be used in a Carnot cycle, and if the
gas were not in equilibrium at the end of each adiabat, we could extract more work, violating
the second law.
• Microscopically, the number of photons is conserved during adiabatic processes, and every
photon redshifts by the same factor. This is because every photon has the same speed and
hence bounces off the walls equally often, picking up the same redshift factor every time.
Since adiabatic processes preserve equilibrium, scaling the energies/frequencies in Planck’s law
is exactly the same as scaling the temperature.
• Since γ = 4/3 for the photon gas, it naively seems we have six degrees of freedom. The catch is
that the equipartition theorem only works for quadratic degrees of freedom; for a linear degree
of freedom, as in E = pc, the contribution is twice as much, giving twice as many effective
‘degrees of freedom’ as for a monatomic gas.
• However, this analogy is not very good: while a classical ultrarelativistic gas has energy 3N kB T ,
this is not true for a quantum gas, and a photon gas is always quantum. We cannot hide the
discreteness of the quantum states by raising the temperature, because the energy will always
mostly come from photons with energy of order kB T , so the mode occupancy numbers will
always be order 1. A photon gas is simply not the same as an ultrarelativistic classical gas with
conserved particle number, in any regime.
Note. Above, we’ve thought of every photon mode as a harmonic oscillator. To see this microscop-
ically, note that A is the conjugate momentum to E and the energy is
1 1
H ∼ (E 2 + B 2 ) ∼ (E 2 + ω 2 A2 )
2 2
where we worked in Coulomb gauge. This is then formally identical to a harmonic oscillator. The
reason that E and B are in phase, rather than the usual 90◦ out of phase, is that B is a derivative
of the true canonical variable A.
Note. Historically, Planck was the first to suggest that energy could be transferred between matter
and radiation only in integer multiples of ~ω. It was Einstein who made the further suggestion that
energy in radiation should always come in integer multiples of ~ω, in particles called photons. This
seems strange to us today because we used the idea of photons to derive Planck’s law. However,
Planck did not have a strong understanding of equilibrium statistical mechanics. Instead, he
attempted to solve a kinetic equation and find equilibrium in the long-time limit, e.g. by formulating
an H-theorem. This was a much harder task, which required an explicit theory of the interaction
of matter and radiation. Incidentally, Boltzmann derived the Stefan–Boltzmann law in the 1870s
by using blackbody radiation as the working fluid in a Carnot cycle.
Example. Phonons. The exact same logic applies for phonons in a solid, except that there are
three polarization states, and the speed of light c is replaced with the speed of sound cs . (That is,
we are assuming the dispersion relation remains linear.) There is also a high-frequency cutoff ωD
imposed by the lattice.
To get a reasonable number for $\omega_D$, note that the number of normal modes is equal to the number
of degrees of freedom, so
$$\int_0^{\omega_D} d\omega\, g(\omega) = 3N$$
where N is the number of lattice ions. The partition function is very similar to the blackbody case.
At low temperatures, the cutoff ωD doesn’t matter, so the integral is identical, and
$$E \propto T^4, \qquad C \propto T^3.$$
At high temperatures, one can show that with the choice of ωD above, we simply reproduce the
Dulong-Petit law. The only problem with the Debye model is that the phonon dispersion relation
isn’t actually linear. This doesn’t matter at very high or low temperatures, but yields slight
deviations at intermediate ones.
Now we formally introduce the Bose–Einstein distribution. For convenience, we work in the grand
canonical ensemble.
• Consider a configuration of particles where $n_i$ particles are in state $i$, and $\sum_i n_i = N$. In the
Maxwell–Boltzmann distribution, we treat the particles as distinguishable, then divide by $N!$
at the end, so the probability of this configuration is proportional to
$$\frac{1}{N!} \binom{N}{n_1} \binom{N - n_1}{n_2} \cdots = \prod_i \frac{1}{n_i!}.$$
In the Bose–Einstein distribution, we instead treat each configuration as one state of the
quantum field, so all states have weight 1.
• As long as all of the $n_i$ are zero or one (the classical limit), the two methods agree. However,
once we introduce discrete quantum states, simply dividing by $N!$ no longer "takes us
from distinguishable to indistinguishable". States in which some energy levels have multiple
occupancy aren't weighted enough.
• Similarly, the Fermi–Dirac distribution also agrees with the classical result, as long as $\langle n_i \rangle \ll 1$.
• Another way of saying this is that in the classical case, we're imagining we can paint labels
on all the particles; at the end we divide by $N!$ because the labels are arbitrary. This is an
imperfect approximation to true indistinguishability, because when two particles get into the
same state, we must lose track of the labels!
• For one single-particle quantum state $|r\rangle$, the Bose–Einstein partition function is
$$Z_r = \sum_{n_r} e^{-\beta n_r (E_r - \mu)} = \frac{1}{1 - e^{-\beta(E_r - \mu)}}.$$
Note that in the classical case, we would have also multiplied by $1/n_r!$. Without this factor,
the sum might not converge, so we also demand $E_r > \mu$ for all $E_r$. Setting the ground state
energy $E_0$ to zero, we require $\mu < 0$.
• Using the Bose–Einstein distribution, we can compute properties of the Bose gas,
$$N = \int dE\, \frac{g(E)}{z^{-1} e^{\beta E} - 1}, \qquad E = \int dE\, \frac{E\, g(E)}{z^{-1} e^{\beta E} - 1}$$
where z = eβµ is the fugacity. The stability requirement µ < 0 means z < 1.
• At high temperatures, we can compute the corrections to the ideal gas law by expanding in
$z \ll 1$, finding
$$\frac{N}{V} = \frac{z}{\lambda^3} \left(1 + \frac{z}{2\sqrt{2}} + \cdots\right)$$
$$pV = N k_B T \left(1 - \frac{\lambda^3 N}{4\sqrt{2}\, V} + \cdots\right).$$
The pressure is less; the physical intuition is that bosons ‘like to clump up’, since they’re missing
the 1/nr ! weights that a classical gas has.
Note. To get more explicit results, it’s useful to define the functions
$$g_n(z) = \frac{1}{\Gamma(n)} \int_0^\infty dx\, \frac{x^{n-1}}{z^{-1} e^x - 1}.$$
To simplify this, expand the denominator as a geometric series, for
$$g_n(z) = \frac{1}{\Gamma(n)} \sum_{m=1}^\infty z^m \int_0^\infty dx\, x^{n-1} e^{-mx} = \frac{1}{\Gamma(n)} \sum_{m=1}^\infty \frac{z^m}{m^n} \int_0^\infty du\, u^{n-1} e^{-u} = \sum_{m=1}^\infty \frac{z^m}{m^n}.$$
In terms of these,
$$\frac{N}{V} = \frac{g_{3/2}(z)}{\lambda^3}, \qquad \frac{E}{V} = \frac{3}{2} \frac{k_B T}{\lambda^3}\, g_{5/2}(z)$$
for the ideal Bose gas. Finally, for photon gases where µ = 0 we use
gn (1) = ζ(n).
$$\zeta(2) = \frac{\pi^2}{6}, \qquad \zeta(4) = \frac{\pi^4}{90}.$$
These results may be derived by evaluating
$$\int_{-\pi}^{\pi} dx\, |f(x)|^2$$
for $f(x) = x$ and $f(x) = x^2$, respectively, using direct integration and Fourier series (Parseval's theorem).
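The equality of the integral and series forms of $g_n$ can also be checked numerically; here is a quadrature check of $g_2(1) = \zeta(2)$ (our own sketch, using $\Gamma(2) = 1$ and the fact that the integrand $x/(e^x - 1) \to 1$ as $x \to 0$):

```python
import math

def g2_at_unit_fugacity(xmax=50.0, steps=200000):
    """Trapezoid estimate of int_0^xmax x/(e^x - 1) dx, i.e. Gamma(2) g_2(1)."""
    h = xmax / steps
    total = 0.5 * 1.0   # integrand -> 1 as x -> 0
    for i in range(1, steps):
        x = i * h
        total += x / math.expm1(x)
    total += 0.5 * xmax / math.expm1(xmax)
    return total * h

print(g2_at_unit_fugacity(), math.pi**2 / 6)   # both ~1.6449
```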
Note. We may also derive the Bose–Einstein distribution starting from the microcanonical ensemble.
Indexing energy levels by s, let there be Ns bosons in an energy level with degeneracy Ms . The
number of states is
$$\Omega = \prod_s \frac{(N_s + M_s - 1)!}{N_s!\, (M_s - 1)!}.$$
Using Stirling’s approximation, the entropy is
$$S = k_B \log \Omega = k_B \sum_s \left[(N_s + M_s) \log(N_s + M_s) - N_s \log N_s - M_s \log M_s\right].$$
An equivalent way to phrase this step is that we are maximizing the entropy subject to fixed $N$ and
$U$, where the constraints enter through Lagrange multipliers. Rearranging immediately gives the
Bose–Einstein distribution, where $\langle n_s \rangle = N_s/M_s$. Similar arguments work for the Fermi–Dirac and
Boltzmann distributions.
Note. This idea of thinking of thermodynamic quantities as Lagrange multipliers is quite general.
We get a Lagrange multiplier every time there is a conserved quantity. For particle number we
get the chemical potential. As another example, for electric charge the corresponding Lagrange
multiplier would be the electric potential. This is rather different from our usual interpretation of
these quantities, which is in terms of the energy cost to pull some of the corresponding conserved
quantity from the environment. But just as for temperature, we can recover that picture by just
partitioning our original system into a subsystem and “environment” and analyzing the subsystem.
• Consider low temperatures, which correspond to high $z$, and fix $N$. Since we have
$$\frac{N}{V} = \frac{g_{3/2}(z)}{\lambda^3}$$
the quantity g3/2 (z) must increase as λ3 increases. However, we know that the maximum value
of g3/2 (z) is g3/2 (1) = ζ(3/2), so this is impossible below the critical temperature
$$T_c = \frac{2\pi\hbar^2}{k_B m} \left(\frac{n}{\zeta(3/2)}\right)^{2/3}$$
• The problem is that, early on, we took the continuum limit and turned sums over states into
integrals; this is a good approximation whenever the occupancy of any state is small. But for
T < Tc , the occupancy of the ground state becomes macroscopically large!
• The ground state isn’t counted in the integral because g(0) = 0, so we manually add it, for
N g3/2 (z) 1
= + n0 , n0 = .
V λ3 z −1 − 1
Then for T < Tc , z becomes extremely close to one (z ∼ 1 − 1/N ), and the second term makes
up for the first. In the limit T → 0, all particles sit in the ground state.
• We say that for T < Tc , the system forms a Bose–Einstein condensate (BEC). Since the
number of uncondensed particles in a BEC at fixed temperature is independent of the density,
the equation of state of a BEC doesn’t depend on the density.
• To explicitly see the phase transition behavior, note that for $z \to 1$, one can show
$$g_{3/2}(z) \approx \zeta(3/2) + A \sqrt{1 - z} + \cdots.$$
• Another way of characterizing the BEC transition is that it occurs when the chemical potential
increases to the ground state energy, creating a formally divergent number of particles in it.
Note. In a gas where the particle number N is not conserved, particles are created or destroyed freely
to maximize the entropy, setting the chemical potential µ to zero. For such a gas, Bose–Einstein
condensation cannot occur. Instead, as the temperature is lowered, N goes to zero.
Note that if N is almost conserved, with N changing on a timescale T much greater than the
thermalization time, then for times much less than T we can see a quasiequilibrium with nonzero µ.
Also note that setting µ = 0 formally makes N diverge if there are zero energy states. This infrared
divergence is actually correct; for instance, a formally infinite number of photons are created in
every single scattering event. This is physically acceptable since these photons cannot be detected.
Note. Bose–Einstein condensation was first predicted in 1925. In 1938, superfluidity was discovered
in 4 He. However, superfluids are far from ideal BECs, as they cannot be understood without
interactions. The first true BECs were produced in 1995 from dilute atomic gases in a magnetic
trap, with Tc ∼ 100 nK. This temperature was achieved using Doppler laser cooling and evaporative
cooling. Further details are given in the notes on Optics.
• Each single-particle quantum state $|r\rangle$ can be occupied by zero or one particles, so
$$Z_r = 1 + e^{-\beta(E_r - \mu)}, \qquad \langle n_r \rangle = \frac{1}{e^{\beta(E_r - \mu)} + 1}.$$
Our expression for nr is called the Fermi–Dirac distribution; it differs from the Bose–Einstein
distribution by only a sign. Since there are no convergence issues, µ can be positive.
• Our expressions for $N$, $E$, and $pV$ are almost identical to the Bose gas case, again differing by
a few signs. As before, we have $pV = (2/3)E$. The extra minus signs result in a first-order
increase in pressure over that of a classical gas at high temperatures.
• At zero temperature, all states with energies up to the Fermi energy $E_F$ are filled, where in this case $E_F$ is just
equal to the chemical potential. These filled states form the 'Fermi sea' or 'Fermi sphere', and
its boundary is the Fermi surface. The quantity $E_F$ can be quite high, with the corresponding
temperature $T_F = E_F/k_B$ at around $10^4\,$K for metals and $10^7\,$K for white dwarfs.
• Next, consider the particle number and energy density near zero temperature,
$$N = \int_0^\infty dE\, \frac{g(E)}{z^{-1} e^{\beta E} + 1}, \qquad E = \int_0^\infty dE\, \frac{E\, g(E)}{z^{-1} e^{\beta E} + 1}$$
where g(E) is the density of states. We look at how E and µ depend on T , holding N fixed.
• Next, consider the change in energy. Since $dN/dT = 0$, the only effect is that a fraction $k_B T/E_F$ of the
particles are excited by energy on the order of $k_B T$. Then $\Delta E \sim T^2$, so $C_V \sim T$. In a real metal,
$$C_V = \gamma T + \alpha T^3$$
where the second term is from phonons. We can test this by plotting CV /T against T 2 . The
linear contribution is only visible at very low temperatures.
Note. The classical limit. Formally, both the Fermi–Dirac and Bose–Einstein distributions approach
the Maxwell–Boltzmann distribution in the limit of low occupancy numbers,
$$\frac{E - \mu}{T} \gg 1.$$
Since this is equivalent to $T \ll E - \mu$, it is sometimes called the low temperature limit, but this is
deceptive; it would be better to call it the ‘high energy limit’. Specifically, the high energy tail of a
Bose or Fermi gas always behaves classically. But at low temperature Bose and Fermi gases look
‘more quantum’ as a whole.
Note. The chemical potential is a bit trickier when the energy levels are discrete, since it can’t
be defined by a derivative; it is instead defined by fixing N . It can be shown that in the zero
temperature limit, the chemical potential is the average of the energies of the highest occupied
state and the lowest unoccupied state. This ensures that N is fixed upon turning in a small T . In
particular, it holds even if these two states have different degeneracies, because the adjustment in
µ needed to cancel this effect is exponentially small.
Note. We can establish the above results quantitatively with the Sommerfeld expansion. Define
$$f_n(z) = \frac{1}{\Gamma(n)} \int_0^\infty dx\, \frac{x^{n-1}}{z^{-1} e^x + 1}$$
which are the fermionic equivalent of the gn functions. Then
$$\frac{N}{V} = \frac{g_s}{\lambda^3}\, f_{3/2}(z), \qquad \frac{E}{V} = \frac{3}{2} k_B T\, \frac{g_s}{\lambda^3}\, f_{5/2}(z)$$
where we plugged in the form of $g(E)$, and $g_s$ is the number of spin states. We want to expand the
$f_n(z)$ at high $z$. At infinite $z$, the integrands are just $x^{n-1}\, \theta(\beta\mu - x)$, so the integral is $(\beta\mu)^n/n$.
For high $z$, the integrands still contain an approximate step function. Then it's convenient to
peel off the difference from the step function by splitting the integral into two pieces,
$$\Gamma(n) f_n(z) = \int_0^{\beta\mu} dx\, x^{n-1} \left(1 - \frac{1}{1 + z e^{-x}}\right) + \int_{\beta\mu}^\infty dx\, \frac{x^{n-1}}{z^{-1} e^x + 1}.$$
The first term simply reproduces the infinite $z$ result. Now, the deviations above and
below βµ tend to cancel each other, as we saw for dN/dT above. Then it’s useful to subtract them
against each other; defining η = βµ − x and η = x − βµ respectively, we get
$$\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} + \int_0^\infty d\eta\, \frac{(\beta\mu + \eta)^{n-1} - (\beta\mu - \eta)^{n-1}}{1 + e^\eta}$$
where we extended a limit of integration from βµ to ∞, incurring an exponentially small O(z −1 )
error. Taylor expanding to lowest order in βµ gives
$$\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} + 2(n-1)(\log z)^{n-2} \int_0^\infty d\eta\, \frac{\eta}{e^\eta + 1}.$$
This integral can be done by expanding the denominator as a geometric series in $e^{-\eta}$. Termwise
integration gives the series $\sum_m (-1)^{m+1}/m^2 = \frac{1}{2} \sum_m 1/m^2 = \pi^2/12$, giving the final result
$$\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} \left(1 + \frac{\pi^2\, n(n-1)}{6 (\log z)^2} + \cdots\right).$$
By keeping more terms in the Taylor expansion, we get a systematic expansion in 1/ log z = 1/βµ.
Applying the expansion to $N/V$, we immediately find
$$\frac{\Delta N}{N} \sim \left(\frac{k_B T}{\mu}\right)^2$$
which shows that, to keep $N$ constant,
$$\Delta \mu \sim \frac{(k_B T)^2}{E_F}$$
as expected earlier. Similarly, the first term in $\Delta E$ goes as $T^2$, giving a linear heat capacity.
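The Sommerfeld expansion can be checked directly; the sketch below (our own quadrature) compares a numerical evaluation of $\Gamma(2) f_2(z)$ at $\log z = \beta\mu = 10$ against the expansion, which for $n = 2$ reduces to $(\log z)^2/2 + \pi^2/6$ up to exponentially small terms:

```python
import math

beta_mu = 10.0   # log z, deep in the degenerate regime

# Direct Riemann-sum evaluation of Gamma(2) f_2(z) = int_0^inf x/(z^-1 e^x + 1) dx
h, xmax = 1e-3, 60.0
total = 0.0
for i in range(1, int(xmax / h)):
    x = i * h
    total += x / (math.exp(x - beta_mu) + 1.0)
total *= h

sommerfeld = beta_mu**2 / 2 + math.pi**2 / 6   # leading Sommerfeld terms for n = 2
print(total, sommerfeld)   # agree up to exponentially small corrections
```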
Example. Pauli paramagnetism. Paramagnetism results from dipoles aligning with an external
field, and Pauli paramagnetism is the alignment of spin. In a field B, electrons have energy
$$E = \mu_B B s, \qquad s = \pm 1, \qquad \mu_B = \frac{|e|\hbar}{2mc}$$
where µB is the Bohr magneton. Then the occupancy numbers are
$$\frac{N_\uparrow}{V} = \frac{1}{\lambda^3}\, f_{3/2}(z e^{\beta\mu_B B}), \qquad \frac{N_\downarrow}{V} = \frac{1}{\lambda^3}\, f_{3/2}(z e^{-\beta\mu_B B}).$$
The resulting magnetization is
M = µB (N↑ − N↓ ).
In the high-temperature limit, z is small and f3/2 (z) ≈ z, so
$$M = \frac{2\mu_B V z}{\lambda^3} \sinh(\beta\mu_B B) = \mu_B N \tanh(\beta\mu_B B)$$
where N = N↑ + N↓ . This is simply the classical result, as given by Maxwell–Boltzmann statistics.
One important feature is that the susceptibility χ = ∂M/∂B goes as 1/T , i.e. Curie’s law.
In the low-temperature limit, we take the leading term in the Sommerfeld expansion, then expand
to first order in B, for
$$M = \mu_B^2\, g(E_F)\, B.$$
Then at low temperatures, the susceptibility no longer obeys Curie’s law, but instead saturates to
a constant. To understand this result, note that only g(EF )∆E = g(EF )µB B electrons are close
enough to the Fermi surface to participate, and they each contribute magnetization µB .
Note. Classically, charged particles also exhibit diamagnetism because they begin moving in circles
when a magnetic field is turned on, creating an opposing field. However, this explanation isn’t
completely right because of the Bohr-van Leeuwen theorem; the canonical partition function Z does
not depend on the external field, as can be seen by shifting p − eA to p in the integral.
Physically, this is because the particles must be in a finite box, say with reflecting walls. Then
the particles whose orbits hit the walls effectively orbit backwards. Since the magnetic moment is
proportional to the area, this cancels the magnetic moment of the bulk exactly. This is a significantly
trickier argument. It is much easier to simply calculate Z and then differentiate the free energy,
since Z itself is less sensitive to the boundary conditions.
In quantum mechanics, this argument does not hold. The partition function isn’t an integral, so
the first argument fails; we will instead find nontrivial dependence of Z on the field. In terms of
the energy levels, electron states near the boundary are much higher energy due to the repulsive
potential, so they are less relevant, though this is a bit difficult to see.
The idea behind the Euler summation formula is that one can approximate a smooth function by a
low-order polynomial (or a Taylor series with decreasing coefficients). To see the origin of the first
term, consider the formula for a unit interval,
$$h(1/2) \approx \int_0^1 h(x)\, dx + \cdots.$$
There is no correction term if $h(x)$ is a first-order polynomial. The correction due to second-degree
terms in $h(x)$ can be found by subtracting $h'(x)$ at the endpoints,
$$h(1/2) \approx \int_0^1 h(x)\, dx + c\,(h'(0) - h'(1)) + \cdots.$$
To find the value of $c$, consider $h(x) = (x - 1/2)^2$, which fixes $c = 1/24$. Telescoping the sum
gives the $h'(0)/24$ term in the formula above. Generally, all higher correction terms will involve odd
derivatives, because terms like $(x - 1/2)^{2n+1}$ don't contribute to the area.
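The value $c = 1/24$ is easy to verify numerically (our own sketch; by construction the corrected formula is exact for quadratics, and for $e^x$ the correction shrinks the error by over an order of magnitude):

```python
import math

def midpoint_formula(h, hp, steps=100000):
    """Return (h(1/2), int_0^1 h dx + (h'(0) - h'(1))/24); integral by midpoint rule."""
    dx = 1.0 / steps
    integral = sum(h((i + 0.5) * dx) for i in range(steps)) * dx
    return h(0.5), integral + (hp(0.0) - hp(1.0)) / 24.0

# Exact for the quadratic used to fix c:
mid, approx = midpoint_formula(lambda x: (x - 0.5)**2, lambda x: 2 * (x - 0.5))
print(mid, approx)   # both ~0

# For e^x, the correction improves on the bare integral e - 1 ~ 1.718:
mid_e, approx_e = midpoint_formula(math.exp, math.exp)
print(mid_e, approx_e)
```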
Example. An explicit calculation of Landau diamagnetism. When the electrons are constrained
to the xy plane, they occupy Landau levels with
$$E = \left(n + \frac{1}{2}\right)\hbar\omega_c, \qquad \omega_c = \frac{eB}{m}$$
with degeneracy
$$N = \frac{\Phi}{\Phi_0}, \qquad \Phi = L^2 B, \qquad \Phi_0 = \frac{2\pi\hbar c}{e}.$$
Allowing the electrons to move in the third dimension gives an energy contribution ~2 kz2 /2m. Then
the grand partition function is
$$\log Z = \frac{L}{2\pi}\, \frac{2L^2 B}{\Phi_0} \sum_{n=0}^\infty \int dk_z\, \log\left[1 + z \exp\left(-\frac{\beta\hbar^2 k_z^2}{2m} - \beta\hbar\omega_c\left(n + \frac{1}{2}\right)\right)\right]$$
where we added a factor of 2 to account for the spin sum, and converted the kz momentum sum
into an integral. Now we apply the Euler summation formula with the choice
$$h(x) = \int dk_z\, \log\left[1 + \exp\left(-\frac{\beta\hbar^2 k_z^2}{2m} + \beta x\right)\right].$$
Then our grand partition function becomes
$$\log Z = \frac{VB}{\pi\Phi_0} \sum_{n=0}^\infty h(\mu - \hbar\omega_c(n + 1/2)) = \frac{VB}{\pi\Phi_0}\left[\int_0^\infty h(\mu - \hbar\omega_c x)\, dx - \frac{\hbar\omega_c}{24}\frac{dh}{d\mu} + \cdots\right].$$
The magnetization is then
$$M = \frac{1}{\beta} \frac{\partial(\log Z)}{\partial B} = -\frac{\mu_B^2}{3}\, g(E_F)\, B$$
where we have µB = |e|~/2mc as usual. Since the paramagnetic effect is three times larger, one
might expect that every solid is paramagnetic. The subtlety is that when the crystal lattice is
accounted for, the mass m used above becomes the effective mass m∗ . But the paramagnetic effect
is not changed at all, because it only depends on the intrinsic magnetic moments of the electrons,
which are independent of their motion. Another, independent factor is that core electrons still
contribute via Larmor diamagnetism but have no paramagnetic effects.
Note. Consider the hydrogen atom, with energy levels En = −E0 /n2 . The partition function
diverges, so formally the probability of occupancy of any state is zero! The situation only gets worse
when we consider unbound states as well.
The resolution is that we are missing a spatial cutoff; the sum over n includes states that
are extremely large. Any reasonable cutoff gives a reasonable result. For infinite volume, a zero
probability of occupancy really is the correct answer, because once the electron moves a significant
distance from the atom, it has little chance of ever coming back: a random walk in three dimensions
will likely never return to its starting point.
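The divergence can be made concrete numerically (a sketch, not from the notes; it uses the hydrogen degeneracy n² and arbitrarily sets βE₀ = 1):

```python
import math

# Partial sums of Z = Σ n² exp(βE₀/n²), with degeneracy n², Eₙ = -E₀/n²,
# and βE₀ = 1 (arbitrary choice for this check).
def partial_Z(N, beta_E0=1.0):
    return sum(n * n * math.exp(beta_E0 / (n * n)) for n in range(1, N + 1))

# Terms approach n², so the partial sums grow like N³/3 without bound:
ratio = partial_Z(2000) / partial_Z(1000)
print(ratio)  # close to 8 after doubling N
```

Since each doubling of the cutoff multiplies the sum by about 8, any finite occupation probability requires a cutoff, exactly as the text says.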
4 Kinetic Theory
4.1 Fundamentals
So far, we’ve only considered systems in thermal equilibrium. Kinetic theory is the study of the
microscopic dynamics of macroscopically many particles, and we will use it to study the approach
to equilibrium. We begin with a heuristic introduction.
• We will need the fact that in equilibrium, the velocities of the particles in a gas obey the
Maxwell–Boltzmann distribution
$$f(v) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-mv^2/2k_B T}.$$
• Now suppose we model the gas particles as hard spheres of diameter d. This is equivalent to
modeling the particles as points, with an interaction potential that turns on at a distance d, so
the interaction cross section is πd². Hence the mean free path is
$$\ell = \frac{1}{n\pi d^2}.$$
We assume the gas is dilute, so ℓ ≫ d.
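A rough numerical sketch of the mean free path formula (the density n and diameter d below are assumed round numbers for a gas near room conditions, not values from the notes):

```python
import math

# Mean free path ℓ = 1/(nπd²); inputs are assumed round numbers:
n = 2.5e25   # number density in m⁻³ (ideal gas near 300 K, 1 atm)
d = 3e-10    # effective molecular diameter in m (~0.3 nm)

ell = 1 / (n * math.pi * d * d)
print(ell)  # of order 10⁻⁷ m, so indeed ℓ ≫ d
```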
• The typical time between collisions is called the scattering time or relaxation time,
$$\tau = \frac{\ell}{\langle v_{\rm rel}\rangle}.$$
To estimate ⟨v_rel⟩, note that
$$\langle v_{\rm rel}^2\rangle = \langle(v - v')^2\rangle = \langle v^2\rangle + \langle v'^2\rangle = \frac{6k_B T}{m}$$
since the Maxwell–Boltzmann distribution is isotropic, and we used equipartition of energy in
the last step.
• Zooming out, we can roughly think of each gas molecule as performing a random walk with
step size ` and time interval τ . For motion in one dimension starting at x = 0, the probability
of being at position x = m` after time t = N τ is
$$P(x, t) = 2^{-N}\binom{N}{(N - m)/2} \approx \sqrt{\frac{2}{\pi N}}\, e^{-m^2/2N} = \sqrt{\frac{2\tau}{\pi t}}\, e^{-x^2\tau/2\ell^2 t}$$
where we used Stirling's approximation to expand the binomial coefficient, and expanded to leading order in m/N. The probability distribution is hence a Gaussian with variance
$$\langle x^2\rangle = \frac{\ell^2}{\tau}\, t.$$
This makes sense, as each of the t/τ steps is independent with variance `2 .
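A quick Monte Carlo sketch of this claim (not from the notes; step size ℓ = 1, so the prediction is ⟨x²⟩ = N):

```python
import random

# Sample many N-step random walks with unit steps and check ⟨x²⟩ ≈ N,
# i.e. ⟨x²⟩ = (ℓ²/τ) t with ℓ = τ = 1 and t = N.
random.seed(0)
N, walkers = 1000, 5000
mean_sq = sum(
    sum(random.choice((-1, 1)) for _ in range(N)) ** 2
    for _ in range(walkers)
) / walkers
print(mean_sq / N)  # close to 1
```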
• For many particles diffusing independently, their density is described by the diffusion equation,
$$\frac{\partial n}{\partial t} = D\nabla^2 n.$$
Since this equation is linear, it suffices to establish this for an initial condition n(x, t = 0) = δ(x),
where it should match the result of the random walk above. In one dimension, the solution is
$$n(x, t) = \frac{1}{\sqrt{4\pi D t}}\, e^{-x^2/4Dt}$$
from which we conclude D = `2 /2τ . Similarly, in three dimensions we also find a spreading
Gaussian with D = `2 /6τ .
• Consider two plates at z = 0 and z = d. If the top plate is moved at a constant speed u in the x direction, there will be a velocity gradient u_x(z) within the fluid. The upper plate then experiences a resistive force
$$F = \eta A \frac{du_x}{dz} \approx \eta A \frac{u}{d}$$
where the latter holds when d is small. The coefficient η is the dynamic viscosity.
• Microscopically, viscosity can be thought of in terms of the transport of px through the fluid.
The plates are "sticky", so that molecules pick up an average nonzero p_x when colliding with the top plate and lose it when colliding with the bottom plate. In the steady state, collisions between
particles in the body of the fluid continually transport px from the top plate to the bottom.
• As a simple approximation, we’ll suppose the local velocity distribution in the fluid is just the
Maxwell–Boltzmann distribution shifted by ux (z), which is assumed to be small.
• We now compute the momentum flowing through a surface of constant z. The number of
particles passing through it per unit time per unit area is
$$n \int dv\, v_z\, f(v).$$
A particle crossing at angle θ to the z axis last collided a distance ℓ away, and hence carries the average momentum from a height offset by
$$\Delta z = \ell \cos\theta.$$
Putting it all together, the momentum transferred per unit time per unit area is
$$\frac{F}{A} = n \int dv\, v_z f(v)\, \Delta p_x = mn\ell\, \frac{du_x}{dz} \left(\frac{m}{2\pi k_B T}\right)^{3/2} \int dv\, v\, e^{-mv^2/2k_B T} \cos^2\theta.$$
• Now, the integral is essentially computing hvi up to the factor of cos2 θ. Working in spherical
coordinates, the only difference would be the θ integral,
$$\int_0^\pi d\theta\, \cos^2\theta \sin\theta = \frac{2}{3}, \qquad \int_0^\pi d\theta\, \sin\theta = 2.$$
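These two angular integrals are easy to confirm with a simple midpoint rule (a sketch, not from the notes):

```python
import math

# Midpoint-rule check of ∫₀^π cos²θ sinθ dθ = 2/3 and ∫₀^π sinθ dθ = 2.
M = 100_000
dtheta = math.pi / M
I1 = I2 = 0.0
for k in range(M):
    theta = (k + 0.5) * dtheta
    I1 += math.cos(theta) ** 2 * math.sin(theta) * dtheta
    I2 += math.sin(theta) * dtheta
print(I1, I2)
```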
The phase space is 6N -dimensional, and we describe the configuration of the system as a
probability distribution f on phase space, normalized as
$$\int dV\, f(r_i, p_i, t) = 1, \qquad dV = \prod_i dr_i\, dp_i.$$
Liouville’s theorem states that df /dt = 0, where the derivative is to be interpreted as a convective
derivative, following the phase space flow.
• We define the one-particle distribution function by integrating over all but one particle,
$$f_1(r_1, p_1, t) = N \int dV_1\, f(r_i, p_i, t), \qquad dV_1 = \prod_{i=2}^N dr_i\, dp_i.$$
This doesn’t treat the first particle as special, because the particles are all identical, so f may be
taken symmetric. The one-particle distribution function allows us to compute most quantities
of interest, such as the density and average velocity,
$$n(r, t) = \int dp\, f_1(r, p, t), \qquad u(r, t) = \int dp\, \frac{p}{m}\, f_1(r, p, t).$$
• Now, by the same logic as when we were computing dhAi/dt, we can integrate by parts for
j 6= 1, throwing away boundary terms and getting zero. Hence we only have to worry about
j = 1. Relabeling (r1 , p1 ) to (r, p), we have
$$\frac{\partial f_1}{\partial t} = N \int dV_1 \left(-\frac{p}{m}\cdot\frac{\partial f}{\partial r} + \frac{\partial V(r)}{\partial r}\cdot\frac{\partial f}{\partial p} + \sum_{k=2}^N \frac{\partial U(r - r_k)}{\partial r}\cdot\frac{\partial f}{\partial p}\right).$$
• The first two terms simply reflect the dynamics of free “streaming” particles, while the final
term includes collisions. Hence we can write this result as
$$\frac{\partial f_1}{\partial t} = \{H_1, f_1\} + \left.\frac{\partial f_1}{\partial t}\right|_{\rm coll}, \qquad H_1 = \frac{p^2}{2m} + V(r).$$
• The collision integral cannot be written in terms of f1 alone, which is not surprising, as it
represents collisions between two particles. We introduce the n-particle distribution functions
$$f_n(r_1, \ldots, r_n, p_1, \ldots, p_n, t) = \frac{N!}{(N-n)!} \int dV_n\, f(r_i, p_i, t), \qquad dV_n = \prod_{i=n+1}^N dr_i\, dp_i.$$
Next, we note that all N − 1 terms in the collision integral are identical, so
$$\left.\frac{\partial f_1}{\partial t}\right|_{\rm coll} = N(N-1) \int dV_1\, \frac{\partial U(r - r_2)}{\partial r}\cdot\frac{\partial f}{\partial p} = \int dr_2\, dp_2\, \frac{\partial U(r - r_2)}{\partial r}\cdot\frac{\partial f_2}{\partial p}.$$
• The same logic may be repeated recursively to find the time evolution of fn . We find
$$\frac{\partial f_n}{\partial t} = \{H_n, f_n\} + \sum_{i=1}^n \int dr_{n+1}\, dp_{n+1}\, \frac{\partial U(r_i - r_{n+1})}{\partial r_i}\cdot\frac{\partial f_{n+1}}{\partial p_i}.$$
That is, the n-particle distribution evolves by considering the interactions between n particles
alone, plus a correction term involving collisions with an outside particle. This is the BBGKY
hierarchy, converting Hamilton’s equations into N coupled PDEs.
The utility of the BBGKY hierarchy is that it isolates the physically most relevant information in
the lower fn , allowing us to apply approximations.
• We further assume that collisions occur locally in space. Then if there are two particles at a
point r with momenta p and p₂, the rate at which they scatter to p′₁ and p′₂ is
$$\omega(p, p_2|p'_1, p'_2)\, f_2(r, r, p, p_2)\, dp_2\, dp'_1\, dp'_2$$
where ω describes the dynamics of the collision, and depends on the interaction potential.
The collision integral is then
$$\left.\frac{\partial f_1}{\partial t}\right|_{\rm coll} = \int dp_2\, dp'_1\, dp'_2 \left[\omega(p'_1, p'_2|p, p_2)\, f_2(r, r, p'_1, p'_2) - \omega(p, p_2|p'_1, p'_2)\, f_2(r, r, p, p_2)\right]$$
where the two terms account for scattering into and out of momentum p. In a proper derivation of the Boltzmann equation, we would have arrived here by explicitly applying approximations to the BBGKY hierarchy.
– We’ve tacitly assumed the scattering is the same at all points, so ω doesn’t depend on r.
– Assuming that the external potential only varies appreciably on macroscopic distance scales, energy and momentum are conserved in collisions, so ω vanishes unless p + p₂ = p′₁ + p′₂ and E + E₂ = E′₁ + E′₂.
– Parity symmetry flips the momenta without swapping incoming and outgoing, so ω(p, p₂|p′₁, p′₂) = ω(−p, −p₂|−p′₁, −p′₂).
– Combining these two, we have symmetry between incoming and outgoing momenta, ω(p, p₂|p′₁, p′₂) = ω(p′₁, p′₂|p, p₂).
– Finally, we assume molecular chaos, f₂(r, r, p, p₂) = f₁(r, p) f₁(r, p₂), which assumes the momenta are uncorrelated. This is intuitive because collisions are rare, and each successive collision a molecule experiences is with a completely different molecule.
• The assumption of molecular chaos is the key assumption that converts the BBGKY hierarchy
into a closed system. It introduces an arrow of time, as the momenta are correlated after a
collision. Since the dynamics are microscopically time-reversible, the momenta must actually
have been correlated before the collision as well. However, generically these initial correlations
are extremely subtle and destroyed by any coarse-graining.
• The collision integral will clearly vanish if we satisfy the detailed balance condition,
$$f_1(r, p)\, f_1(r, p_2) = f_1(r, p'_1)\, f_1(r, p'_2).$$
• Taking the logarithm of both sides, it is equivalent to say that the sum of log f1 (r, pi ) is
conserved during a collision. Since we know energy and momentum are conserved during a
collision, detailed balance can be achieved if
$$\log f_1(r, p) = \beta(\mu - E(p) + u \cdot p)$$
where µ sets the local particle density. Exponentiating both sides, we see f₁ is simply a Maxwell–Boltzmann distribution with temperature 1/β and drift velocity u.
• Note that β, µ, and u can all be functions of position. Such a solution is said to be in local
equilibrium, and we used them in our heuristic calculations in the previous section.
• For simplicity, set V (r) = 0. Then the streaming term also vanishes if β, µ, and u are all
constants. When u is zero, we have a standard gas at equilibrium; the freedom to have u
nonzero is a result of momentum conservation. Similarly, the streaming term also vanishes if
u ∝ r × p because of angular momentum conservation, giving a rotating equilibrium solution.
• For quantum gases, the rate for two particles at r with momenta p and p₂ to scatter to p′₁ and p′₂ is modified to
$$\omega(p, p_2|p'_1, p'_2)\, f_2(r, r, p, p_2)\,(1 \pm f_1(r, p'_1))(1 \pm f_1(r, p'_2))\, dp_2\, dp'_1\, dp'_2$$
with a plus sign for bosons and a minus sign for fermions. In the fermionic case, the extra factors simply enforce Pauli exclusion; in the bosonic case, they account for the √n enhancement of the amplitude for n bosons to be together.
• All the reasoning then goes through as before, and the detailed balance condition becomes
$$\log \frac{f_1(p)}{1 \pm f_1(p)} \quad \text{conserved in collisions.}$$
When we set this to β(µ−E +u·p), we recover the Bose–Einstein and Fermi–Dirac distributions
with chemical potential µ, temperature 1/β, and drift velocity u.
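This is easy to verify directly (a sketch, not from the notes; the drift u is set to zero and the values of β, µ, E are arbitrary test points):

```python
import math

# Check that the BE and FD distributions give log[f/(1 ± f)] = -β(E - μ),
# which is conserved whenever energy is conserved in a collision.
beta, mu = 2.0, -0.5
for E in (0.1, 1.0, 3.7):
    x = beta * (E - mu)
    f_be = 1 / (math.exp(x) - 1)   # Bose–Einstein
    f_fd = 1 / (math.exp(x) + 1)   # Fermi–Dirac
    assert abs(math.log(f_be / (1 + f_be)) + x) < 1e-9
    assert abs(math.log(f_fd / (1 - f_fd)) + x) < 1e-9
print("ok")
```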
4.3 Hydrodynamics
5 Fluids (TODO)
6 Fundamentals of Quantum Mechanics
• A Hilbert space H is a complex vector space with a positive-definite sesquilinear form ⟨α|β⟩. Elements of H are called kets, while elements of the dual space H* are called bras. Using the form, we can canonically identify the ket |α⟩ with the bra ⟨α|, analogously to raising and lowering indices. This is an antilinear map, c|α⟩ ↔ c*⟨α|, since the form is sesquilinear.
• A ray is a nonzero ket up to the equivalence relation |ψ⟩ ∼ c|ψ⟩ for any nonzero complex number c, indicating that global phases in quantum mechanics are not important.
• Hilbert spaces are also complete, i.e. every Cauchy sequence of kets converges in H.
• A Hilbert space V is separable if it has a countable subset D whose closure is V, which turns out to be equivalent to having a countable orthonormal basis. Hilbert spaces that aren't separable are mathematically problematic, so we'll usually assume separability.
• The Cauchy–Schwarz inequality states that
$$\langle\alpha|\alpha\rangle\langle\beta|\beta\rangle \geq |\langle\alpha|\beta\rangle|^2.$$
The trick to the proof is to use ⟨γ|γ⟩ ≥ 0 for |γ⟩ = |α⟩ + λ|β⟩, with λ = −⟨β|α⟩/⟨β|β⟩.
Example. Finite-dimensional Hilbert spaces are all complete and separable. We will mostly deal with countably infinite-dimensional Hilbert spaces, such as the one spanned by the QHO basis |n⟩. Such spaces are separable, though uncountably infinite-dimensional spaces are not.
Example. Not all countably infinite-dimensional spaces are complete: consider the space V of infinite vectors with a finite number of nonzero entries. Then the sequence |v_k⟩ = (1, 1/2, …, 1/2^k, 0, 0, …) is Cauchy, but its limit has infinitely many nonzero entries and hence does not lie in V.
• Given an operator A on H, we can define its pullback A* acting on bras by
$$A^*(\langle\beta|)\,|\alpha\rangle = \langle\beta|\,(A|\alpha\rangle).$$
Since we can always construct the pullback (which does not even require an inner product), it is convenient to use a notation where both sides above are represented in the same way. In Dirac notation, both sides are written as ⟨β|A|α⟩, where the leftward action of A on bras is just that of A* above.
Example. Not all isometries are unitary: if |n⟩ is an orthonormal basis with n ≥ 0, the shift operator A|n⟩ = |n + 1⟩ is an isometry, but AA† = 1 − |0⟩⟨0| ≠ 1.
• The spectral theorem states that if A = A†, then all eigenvalues of A are real, and all eigenspaces with distinct a_i are orthogonal. If the space is separable, every eigenspace has a countable orthonormal basis, so we can construct an orthonormal eigenbasis by Gram–Schmidt.
• Given a complete orthonormal basis, we can decompose operators and vectors into matrix elements. For example,
$$A = \sum_{i,j} |\phi_i\rangle\langle\phi_i|A|\phi_j\rangle\langle\phi_j| \sim \begin{pmatrix} \langle\phi_1|A|\phi_1\rangle & \langle\phi_1|A|\phi_2\rangle & \cdots \\ \langle\phi_2|A|\phi_1\rangle & \ddots & \\ \vdots & & \ddots \end{pmatrix}.$$
Example. If we consider infinite-dimensional spaces, not all Hermitian operators have a complete
eigenbasis. Let H = L2 ([0, 1]) and let A = x̂. Then A has no eigenvectors in H.
• We say A is bounded if
$$\sup_{|\alpha\rangle \in H \setminus \{0\}} \frac{\langle\alpha|A|\alpha\rangle}{\langle\alpha|\alpha\rangle} < \infty.$$
We say A is compact if every bounded sequence {|αn i} (with hαn |αn i < β for some fixed β)
has a subsequence {|αnk i} so that {A|αnk i} is norm-convergent in H.
• One can show that if A is compact, then A is bounded. Compactness is sufficient for a Hermitian operator to have a complete orthonormal eigenbasis, but boundedness is neither necessary nor sufficient. However, we will still consider observables that are neither bounded nor compact, when it turns out to be useful.
• If |ai i and |bi i are two complete orthonormal bases, then U defined by U |ai i = |bi i is unitary.
This yields the change of basis formula,
$$X = \sum_{i,j} X_{ij}\, |a_i\rangle\langle a_j| = \sum_{k,l} Y_{kl}\, |b_k\rangle\langle b_l|, \qquad X_{ij} = U_{ik} Y_{kl} U^\dagger_{lj}.$$
• Using the above formula, a finite-dimensional Hermitian matrix can always be diagonalized by
a unitary, i.e. a matrix that changes basis to an orthonormal eigenbasis.
• If A and B are diagonalizable, they are simultaneously diagonalizable iff [A, B] = 0, in which
case we say A and B are compatible. The forward direction is easy. For the converse, let
A|αi i = ai |αi i. Then AB|αi i = ai B|αi i so B preserves A’s eigenspaces. Therefore when A is
diagonalized, B is block diagonal, and we can make B diagonal by diagonalizing within each
eigenspace of A.
• We also have the identity
$$e^A B e^{-A} = e^{\mathrm{ad}_A} B, \qquad \mathrm{ad}_A = [A, \cdot]$$
which can be shown by defining F(λ) = e^{λA} B e^{−λA} and finding a differential equation for F; this is the same idea in different notation.
• Glauber's theorem states that if [A, B] commutes with both A and B, then
$$e^A e^B = \exp\left(A + B + \frac{1}{2}[A, B]\right).$$
To see this, note that G(λ) = e^{λA} e^{λB} satisfies the differential equation
$$\frac{dG}{d\lambda} = (A + e^{\lambda A} B e^{-\lambda A}) G = (A + B + \lambda[A, B]) G.$$
The claimed solution
$$F(\lambda) = \exp\left(\lambda(A + B) + \frac{\lambda^2}{2}[A, B]\right)$$
satisfies the same differential equation, as long as the argument of the exponential commutes with its derivative, which we can quickly verify. Setting λ = 1 gives the result.
• In the case of general [A, B], eA eB can still be expressed as a single exponential in a more
complicated way, using the full Baker–Campbell–Hausdorff theorem, which subsumes Glauber’s
theorem as a special case.
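Glauber's theorem can be checked exactly on strictly upper-triangular 3×3 matrices, where [A, B] is automatically central and every exponential series terminates (a self-contained sketch, not from the notes):

```python
# Check e^A e^B = exp(A + B + ½[A,B]) for A = E₁₂, B = E₂₃, where
# [A, B] = E₁₃ commutes with both, and M³ = 0 makes exp(M) a finite sum.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def madd(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(3)] for i in range(3)]

def scale(c, M):
    return [[c * M[i][j] for j in range(3)] for i in range(3)]

def expm(M):  # exact for nilpotent M with M³ = 0
    I = [[float(i == j) for j in range(3)] for i in range(3)]
    return madd(I, M, scale(0.5, matmul(M, M)))

A = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]   # E₁₂
B = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # E₂₃
comm = madd(matmul(A, B), scale(-1, matmul(B, A)))  # [A,B] = E₁₃

lhs = matmul(expm(A), expm(B))
rhs = expm(madd(A, B, scale(0.5, comm)))
print(lhs == rhs)  # True
```

All arithmetic here is exact, so the two sides agree identically rather than just approximately.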
4. If an observable A is measured when the system is in a state |αi, where A has an orthonormal
basis of eigenvectors |αi i with eigenvalues ai , the probability of observing A = a is
$$\sum_{a_j = a} |\langle a_j|\alpha\rangle|^2 = \langle\alpha|P_a|\alpha\rangle, \qquad P_a = \sum_{a_j = a} |a_j\rangle\langle a_j|.$$
The fourth postulate implies the state of a system can change in an irreversible, discontinuous way.
There are other formalisms that do not have this feature, though we’ll take it as truth here.
Example. Let all eigenvalues of A be nondegenerate. Then if |α⟩ = Σ_i c_i|a_i⟩, the probability of observing A = a_i is |c_i|², and the resulting state is |a_i⟩. The expectation value of A is
$$\langle A\rangle = \sum_i |c_i|^2 a_i = \langle\alpha|A|\alpha\rangle.$$
Example. Spin 1/2. The Hilbert space is two-dimensional, and the operators that measure spin
about each axis are
$$S_i = \frac{\hbar}{2}\sigma_i, \qquad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
• To derive the uncertainty principle, write ∆A = A − ⟨A⟩ and split the product as
$$\Delta A\, \Delta B = \frac{1}{2}[\Delta A, \Delta B] + \frac{1}{2}\{\Delta A, \Delta B\}.$$
These two terms are skew-Hermitian and Hermitian, so their expectation values are imaginary and real, respectively. Then by the Cauchy–Schwarz inequality, we have
$$\langle\Delta A^2\rangle\langle\Delta B^2\rangle \geq \frac{1}{4}|\langle[A, B]\rangle|^2 + \frac{1}{4}|\langle\{\Delta A, \Delta B\}\rangle|^2.$$
Ignoring the second term gives
$$\sigma_A \sigma_B \geq \frac{1}{2}|\langle[A, B]\rangle|$$
where σX is the standard deviation. This is the uncertainty principle.
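A numerical spot check of this bound for A = σx, B = σy on random spin-1/2 states (a sketch, not from the notes): since [σx, σy] = 2iσz and σx² = σy² = 1, the bound reads σ_A σ_B ≥ |⟨σz⟩|.

```python
import random

# Verify sqrt((1 - ⟨σx⟩²)(1 - ⟨σy⟩²)) ≥ |⟨σz⟩| on random spin-1/2 states.
random.seed(1)

def expval(M, psi):
    a, b = psi
    Ma, Mb = M[0][0] * a + M[0][1] * b, M[1][0] * a + M[1][1] * b
    return (a.conjugate() * Ma + b.conjugate() * Mb).real

sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]

for _ in range(100):
    a = complex(random.gauss(0, 1), random.gauss(0, 1))
    b = complex(random.gauss(0, 1), random.gauss(0, 1))
    norm = (abs(a) ** 2 + abs(b) ** 2) ** 0.5
    psi = (a / norm, b / norm)
    var_x = max(0.0, 1 - expval(sx, psi) ** 2)  # σx² = 1
    var_y = max(0.0, 1 - expval(sy, psi) ** 2)
    assert (var_x * var_y) ** 0.5 >= abs(expval(sz, psi)) - 1e-12
print("ok")
```

For pure spin-1/2 states one can show the product actually equals |⟨σz⟩|² plus a nonnegative term, so the bound always holds.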
• The state of a particle on a line is an element of the Hilbert space H = L2 (R), the set of square
integrable functions on R. This space is separable, and hence has a countable basis.
However, this approach is physically inconvenient because most operators of interest (e.g. x̂,
p̂ = −i~∂x ) cannot be diagonalized in H, as their eigenfunctions would not be normalizable.
• We will treat all of these operators as acceptable, and formally include their eigenvectors, even if they are not in H. This greatly enlarges the space under consideration, because x and p have uncountable eigenbases while the original space had a countable basis. Physically, this is not a problem because all physical measurements of x are 'smeared out' and not infinitely precise. Thus the observables we actually measure do live in H, and x is just a convenient formal tool.
Using completeness,
$$|\psi\rangle = \int dx\, |x\rangle\langle x|\psi\rangle = \int dx\, \psi(x)|x\rangle.$$
• In many cases, a quantum theory can be obtained by “canonical quantization”, replacing Poisson
brackets of classical observables with commutators of quantum operators, times a factor of i~.
When applied to position and momentum, this gives [x̂, p̂] = i~.
• Note that for a finite-dimensional Hilbert space, the trace of the left-hand side vanishes by the cyclic property of the trace, while the trace of the right-hand side doesn't. The cyclic property doesn't hold in infinite-dimensional Hilbert spaces, which are hence required to describe position and momentum.
p̂|p⟩ = p|p⟩.
Hence we may define a momentum space wavefunction, and the commutation relation immediately yields the Heisenberg uncertainty principle σ_x σ_p ≥ ℏ/2.
Therefore, we conclude
$$\langle x|p\rangle = \frac{1}{\sqrt{2\pi\hbar}}\, e^{ipx/\hbar}$$
where we set an arbitrary phase to one.
Example. The momentum basis is complete if the position basis is. Inserting the identity twice,
$$\int dp\, |p\rangle\langle p| = \int dx\, dx'\, dp\, |x\rangle\langle x|p\rangle\langle p|x'\rangle\langle x'| = \int dx\, dx'\, dp\, |x\rangle \frac{e^{ip(x - x')/\hbar}}{2\pi\hbar} \langle x'| = \int dx\, |x\rangle\langle x|.$$
Then if one side is the identity, so is the other.
Example. The momentum-space wavefunction φ(p) = ⟨p|ψ⟩ is related to ψ(x) by Fourier transform,
$$\phi(p) = \int dx\, \langle p|x\rangle\langle x|\psi\rangle = \frac{1}{\sqrt{2\pi\hbar}}\int dx\, e^{-ipx/\hbar}\, \psi(x), \qquad \psi(x) = \frac{1}{\sqrt{2\pi\hbar}}\int dp\, e^{ipx/\hbar}\, \phi(p).$$
This is the main place where conventions may differ. The original factor of 2π comes from the representation of the delta function
$$\delta(x) = \int d\xi\, e^{2\pi i x\xi}.$$
When defining the momentum eigenstates, we have freedom in choosing the scale of p, which can change the ⟨x|p⟩ expression above. This allows us to move the 2π factor around. Field theory texts prefer to define momentum integrals with a differential of the form d^k p/(2π)^k.
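As a numerical sanity check of this convention (with ℏ = 1, a sketch not from the notes), the Gaussian ψ(x) = π^{−1/4} e^{−x²/2} should transform into φ(p) = π^{−1/4} e^{−p²/2}:

```python
import math

# Riemann-sum evaluation of φ(p) = (2π)^(-1/2) ∫ dx e^(-ipx) ψ(x)
# for the Gaussian ψ(x) = π^(-1/4) exp(-x²/2), with ħ = 1.
def phi_numeric(p, L=10.0, M=4000):
    dx = 2 * L / M
    re = 0.0
    for k in range(M):
        x = -L + (k + 0.5) * dx
        psi = math.pi ** -0.25 * math.exp(-x * x / 2)
        re += math.cos(p * x) * psi * dx   # the sine part integrates to zero
    return re / math.sqrt(2 * math.pi)

for p in (0.0, 0.5, 1.5):
    exact = math.pi ** -0.25 * math.exp(-p * p / 2)
    assert abs(phi_numeric(p) - exact) < 1e-6
print("ok")
```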
Note. Consider energy eigenstates in one dimension, which satisfy (in units where ℏ²/2m = 1)
$$-\psi'' + V\psi = E\psi.$$
Consider two degenerate solutions ψ and φ. Then combining the equations gives
$$\phi\psi'' - \psi\phi'' = 0 = \frac{dW}{dx}$$
where W is the Wronskian of the solutions,
$$W = \phi\psi' - \psi\phi' = \det\begin{pmatrix} \phi & \psi \\ \phi' & \psi' \end{pmatrix}.$$
• In this case, if both ψ and φ vanish at some point, then W = 0 so the solutions are simply
multiples of each other. In particular, bound state wavefunctions vanish at infinity, so bound
states are not degenerate. Unbound states can be two-fold degenerate, such as e±ikx for the
free particle.
• Since the Schrodinger equation is real, if ψ is a solution with energy E, then ψ ∗ is a solution
with energy E. If the solution ψ is not degenerate, then we must have ψ = cψ ∗ , which means
ψ is real up to a constant phase. Hence bound state wavefunctions can be chosen real. It turns
out nonbound state wavefunctions can also be chosen real. These arguments are really just
time-reversal invariant arguments in disguise, since we are conjugating the wavefunction.
• For bound states, the bound state with the nth lowest energy has n − 1 nodes. Moreover, the
nodes interleave as n is increased.
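The node count can be illustrated with the particle in a box, whose eigenfunctions sin(nπx) on (0, 1) are known exactly (a sketch, not from the notes):

```python
import math

# The nth box eigenfunction ψₙ(x) = sin(nπx) should have n - 1 interior
# nodes, counted here as sign changes on a fine grid.
def interior_nodes(n, M=10_000):
    vals = [math.sin(n * math.pi * (k / M)) for k in range(1, M)]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)

for n in range(1, 6):
    assert interior_nodes(n) == n - 1
print("ok")
```

The interleaving of nodes is also visible here: each node of ψₙ lies between two consecutive nodes of ψₙ₊₁.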
• Define the probability density ρ = |ψ|² and probability current J = Re(ψ* v ψ), where the velocity operator is
$$v = \frac{p}{m} = -\frac{i\hbar}{m}\nabla.$$
The probability density and current satisfy the continuity equation
$$\frac{\partial\rho}{\partial t} + \nabla\cdot J = 0.$$
In particular, note that for an energy eigenfunction, J = 0 identically since it can be chosen real.
Also note that with a magnetic field, we would have v = (p − qA)/m instead.
However, physically interpreting ρ and J is subtle. For example, consider multiplying by the particle charge q, so we have formal charge densities and currents. It is not true that a particle sources an electromagnetic field with charge density qρ and current density qJ. The electric field of a particle at x is
$$E_x(r) = \frac{q(r - x)}{|r - x|^3}.$$
Hence a perfect measurement of E is a measurement of the particle position x. Thus for the hydrogen atom, we would not measure an exponentially small electric field at large distances, but a dipole field! The state of the system is not |ψ⟩ ⊗ |E_ρ⟩, but rather an entangled state like
$$\int dx\, \psi(x)\, |x\rangle \otimes |E_x\rangle$$
where we consider only the electrostatic field. To avoid these errors, it’s better to think of the
wavefunction as describing an ensemble of particles, rather than a single “spread out” particle. Note
if the measurement takes longer than the characteristic orbit time of the electron, then we will only
see the averaged field due to qJ.
• Suppose we have a Hamiltonian H(xa , λi ) with control parameters λi . If the energies never cross,
we can index the eigenstates as a function of λ as |n(λ)i. If the space of control parameters is
contractible, the |n(λ)i can be taken to be smooth, though we will see cases where they cannot.
• The adiabatic theorem states that if the λi are changed sufficiently slowly, a state initially
in |n(λ(ti ))i will end up in the state |n(λ(tf ))i, up to an extra phase called the Berry phase.
This is essentially because the rapid phase oscillations of the coefficients prevent transition
amplitudes from accumulating, as we’ve seen in time-dependent perturbation theory.
94 6. Fundamentals of Quantum Mechanics
• The phase oscillations between two energy levels have timescale ~/∆E, so the adiabatic theorem
holds if the timescale of the change in the Hamiltonian is much greater than this; it fails if
energy levels become degenerate with the occupied one.
• The quantum adiabatic theorem implies that quantum numbers n are conserved, and in the
semiclassical limit
$$\oint p\, dq = nh$$
which implies the classical adiabatic theorem. Additionally, since the occupancy of quantum
states is preserved, the entropy stays the same, linking to the thermodynamic definition of an
adiabatic process.
• To parametrize the error in the adiabatic theorem, we could write the time dependence as H = H(τ) with τ = εt, and take ε → 0 and t → ∞, holding τ fixed. We can then expand the coefficients in a power series in ε.
• When this is done carefully, we find that as long as the energy levels are nondegenerate, the adiabatic theorem holds to all orders in ε. To see why, note that the error terms will look like
$$\int_{\tau_i}^{\tau_f} d\tau\, e^{i\omega\tau/\epsilon}\, f(\tau).$$
If the levels are nondegenerate, then the integral must be evaluated by the saddle point approximation, giving a result of the form e^{−ωτ/ε}, which vanishes faster than any power of ε.
• For comparison, note that for a constant perturbation, time-dependent perturbation theory gives a transition amplitude that goes as ε, rather than e^{−1/ε}. This discrepancy is because the constant perturbation is suddenly added, rather than adiabatically turned on; if all time derivatives of the Hamiltonian are smooth, we get e^{−1/ε}.
• Writing the slowly evolving state as |ψ(t)⟩ = e^{iγ(t)}|n(λ(t))⟩ with the dynamical phase stripped off, the Schrodinger equation gives
$$i\dot\gamma + \langle n|\dot n\rangle = 0.$$
• Define the Berry connection A_i(λ) = i⟨n|∂n/∂λ^i⟩. Under a phase redefinition |n′(λ)⟩ = e^{−iω(λ)}|n(λ)⟩, it transforms as
$$A'_i = A_i + \partial_i\omega.$$
This is just like a gauge transformation in electromagnetism, except there, the parameters λ_i are replaced by spatial coordinates. Geometrically, A_i is a one-form over the space of parameters, just as the electromagnetic potential is a one-form over Minkowski space.
• Similarly, we can define the field strength
$$F_{ij}(\lambda) = \partial_i A_j - \partial_j A_i$$
called the Berry curvature. Using Stokes' theorem, we may write the Berry phase as
$$\gamma = \oint_C A_i\, d\lambda^i = \int_S F_{ij}\, dS^{ij}.$$
• Geometrically, we can describe this situation using a U (1) bundle over M , the parameter space.
The Berry connection is simply a connection on this bundle; picking a phase convention amounts
to choosing a section.
• More generally, if our state has an n-fold degeneracy, we have a non-abelian Berry connection for a U(n) bundle. The equations pick up more indices; we have
$$(A_i)_{ba}(\lambda) = i\langle n_a|\frac{\partial}{\partial\lambda^i}|n_b\rangle$$
while a gauge transformation |n′_a(λ)⟩ = Ω_{ab}(λ)|n_b(λ)⟩ produces
$$A'_i = \Omega A_i \Omega^\dagger - i\frac{\partial\Omega}{\partial\lambda^i}\Omega^\dagger$$
and the generalization of the Berry phase, called the Berry holonomy, is
$$U = \mathcal{P}\exp\left(i\oint A_i\, d\lambda^i\right).$$
Example. A particle with spin s in a magnetic field of fixed magnitude. The parameter space is the sphere S² in magnetic field space. We may define states over this space by rotating a fixed |s, m⟩ eigenstate,
$$|n(\theta, \phi)\rangle = e^{-i\phi S_z/\hbar}\, e^{-i\theta S_y/\hbar}\, e^{i\phi S_z/\hbar}\, |s, m\rangle.$$
This is potentially singular at θ = 0 and θ = π, and the extra phase factor ensures there is no
singularity at θ = 0. The Berry connection is
A(m) = m(cos θ − 1) dφ
Hence we have a magnetic monopole in B-space of strength proportional to m, and the singularity
in the states and in A(m) is due to the Dirac string.
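The connection A(m) can be cross-checked numerically for m = 1/2 by transporting the spin-1/2 state (cos(θ/2), e^{iφ} sin(θ/2)) around a circle of constant θ and accumulating overlap phases; up to a sign convention, the holonomy should be |γ| = π(1 − cos θ). (A sketch, not from the notes; the explicit state is an assumed standard parametrization.)

```python
import cmath, math

# Accumulate the phases of successive overlaps ⟨n(θ,φₖ)|n(θ,φₖ₊₁)⟩
# around a loop of constant θ. The overlap is the same at every step:
#   ⟨n|n'⟩ = cos²(θ/2) + sin²(θ/2) e^{i dφ}.
def berry_phase(theta, M=4000):
    c2, s2 = math.cos(theta / 2) ** 2, math.sin(theta / 2) ** 2
    dphi = 2 * math.pi / M
    return M * cmath.phase(c2 + s2 * cmath.exp(1j * dphi))

for theta in (math.pi / 3, math.pi / 2):
    expected = math.pi * (1 - math.cos(theta))  # from A = ½(cosθ - 1) dφ
    assert abs(abs(berry_phase(theta)) - expected) < 1e-3
print("ok")
```

The result is the solid angle enclosed by the loop, times m, as expected for a monopole of strength proportional to m.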
Next, we consider the Born–Oppenheimer approximation, an important application.
• In the theory of molecules, the basic Hamiltonian includes the kinetic energies of the nuclei
and electrons, as well as Coulomb interactions between them. We have a small parameter
κ ∼ (m/M )1/4 where m is the electron mass and M is the mass of the nuclei.
• In a precise treatment, we would expand in orders of κ. For example, for diatomic molecules
we can directly show that electronic excitations have energies of order E0 = e2 /a0 , where a0
is the Bohr radius, vibrational modes have energies of order κ2 E0 , and rotational modes have
energies of order κ4 E0 . These features generalize to all molecules.
• A simpler approximation is to simply note that if the electrons and nuclei have about the same
kinetic energy, the nuclei move much slower. Moreover, the uncertainty principle places weaker
constraints on their positions and momenta. Hence we could treat the positions R of the nuclei
as classical, giving a Hamiltonian Helec (r, p; R) for the electrons,
$$H_{\rm elec} = \sum_i \frac{p_i^2}{2m} + \frac{e^2}{4\pi\epsilon_0}\left(\sum_{i \neq j} \frac{1}{|r_i - r_j|} - \sum_{i\alpha} \frac{Z_\alpha}{|r_i - R_\alpha|}\right).$$
• Applying the adiabatic theorem to variations of R in Helec , we find eigenfunctions and energies
for the electrons alone. We can hence write the wavefunction of the full system as
$$|\Psi\rangle = \sum_n |\Phi_n\rangle|\phi_n\rangle$$
where the |φ_n⟩ are the electronic eigenstates and the |Φ_n⟩ are wavefunctions for the nuclei.
• To reduce this to an effective Schrodinger equation for the nuclei, we act with ⟨φ_m|, giving
$$\sum_n \langle\phi_m|H_{\rm nuc}|\phi_n\Phi_n\rangle + E_m(R)|\Phi_m\rangle = E|\Phi_m\rangle.$$
Then naively, H_nuc is diagonal in the electron space and the effective Schrodinger equation for the nuclei is just the ordinary Schrodinger equation with an extra contribution to the energy, E_m(R). This shows quantitatively how nuclei are attracted to each other by changes in electronic energy levels, forming a chemical bond.
• A bit more accurately, we note that Hnuc contains ∇2α , which also acts on the electronic
wavefunctions. Applying the product rule and inserting the identity,
$$\langle\phi_m|\nabla_\alpha^2|\phi_n\Phi_n\rangle = \sum_k \left(\delta_{mk}\nabla_\alpha + \langle\phi_m|\nabla_\alpha|\phi_k\rangle\right)\left(\delta_{kn}\nabla_\alpha + \langle\phi_k|\nabla_\alpha|\phi_n\rangle\right)|\Phi_n\rangle.$$
Off-diagonal elements are suppressed by differences of electronic energies, which we assume are
large. However, differentiating the electronic wavefunction has converted ordinary derivatives
to covariant derivatives, giving
$$H^{\rm eff}_{\rm nuc} = -\sum_\alpha \frac{\hbar^2}{2M_\alpha}(\nabla_\alpha - iA_\alpha)^2 + \frac{e^2}{4\pi\epsilon_0}\sum_{\alpha \neq \beta} \frac{Z_\alpha Z_\beta}{|R_\alpha - R_\beta|} + E_n(R).$$
The electron motion provides an effective magnetic field for the nuclei.
• A charged particle in an electromagnetic field has the Hamiltonian
$$H = \frac{(p - qA)^2}{2m} + q\phi$$
as in classical mechanics. Here, p is the canonical momentum, so it corresponds to −i~∇.
• There is an ordering ambiguity, since A and p do not commute at the quantum level. We will set the term linear in A to p · A + A · p, as this is the only combination that makes H Hermitian, as one can check by demanding that ⟨ψ|H|ψ⟩ be real. Another way out is to just stick with Coulomb gauge, ∇ · A = 0, since in this case p · A = A · p.
• The kinetic momentum is π = p − qA and the velocity operator is v = π/m. The velocity
operator is the operator that should appear in the continuity equation for probability, as it
corresponds to the classical velocity.
• Under a gauge transformation specified by an arbitrary function α, called the gauge scalar,
φ → φ − ∂t α, A → A + ∇α.
• In order to make the Schrodinger equation gauge invariant, we need to allow the wavefunction
to transform as well, by
ψ → eiqα/~ ψ.
If the Schrodinger equation holds for the old potential and wavefunction, then it also holds for
the gauge-transformed potential and wavefunction. Roughly speaking, the extra eiqα/~ factor
can be ‘pulled through’ the time and space derivatives, leaving behind extra ∂µ α factors that
exactly cancel the additional terms from the gauge transformation.
• In the context of gauge theories, the reasoning goes the other way. Given that we want to
make ψ → eiqα/~ ψ a symmetry of the theory, we conclude that the derivative (here, p) must
be converted into a covariant derivative (here, π).
• The phase of the wavefunction has no direct physical meaning, since it isn’t gauge invariant.
Similarly, the canonical momentum isn’t gauge invariant, but the kinetic momentum π is. The
particle satisfies the Lorentz force law in Heisenberg picture if we work in terms of π.
• The fact that the components of velocity v don’t commute can be understood directly from
our intuition for Poisson brackets; in the presence of a magnetic field parallel to ẑ, a particle
moving in the x̂ direction is deflected in the ŷ direction.
Note. More about the canonical momentum p = π +qA. We may roughly think of qA as “potential
momentum” so that p is, in certain restricted settings, conserved. For example, suppose a particle
is near a solenoid, which is very rapidly turned on. According to the Schrodinger equation, p does
not change during this process if it is sufficiently fast. On the other hand, the particle receives a
finite impulse since
$$E = -\frac{\partial A}{\partial t}.$$
Hence this process may be viewed as transferring momentum from kinetic to potential. Another
place this picture works is in the interaction of charges and monopoles, since we have translational
invariance, giving significant insight into the equations of motion.
Electromagnetic fields lead to some interesting topological phenomena.
Example. A particle around a flux tube. Consider a particle constrained to lie on a ring of radius
r, through which a magnetic flux Φ passes. Then we can take
$$A_\phi = \frac{\Phi}{2\pi r}$$
and the Hamiltonian is
$$H = \frac{(p_\phi - qA_\phi)^2}{2m} = \frac{1}{2mr^2}\left(-i\hbar\partial_\phi - \frac{q\Phi}{2\pi}\right)^2.$$
The energy eigenstates are still exponentials, of the form
$$\psi = \frac{1}{\sqrt{2\pi r}}\, e^{in\phi}$$
where n ∈ Z since the wavefunction is single-valued. Plugging this in, the energy is
$$E = \frac{\hbar^2}{2mr^2}\left(n - \frac{\Phi}{\Phi_0}\right)^2$$
where Φ0 = 2π~/q is the quantum of flux. Since generally Φ/Φ0 is not an integer, the presence
of the magnetic field affects the spectrum even though the magnetic field is zero everywhere the
wavefunction is nonzero!
We can also look at this phenomenon in a slightly different way. Suppose we were to try to
gauge away the vector potential. Since
$$A = \nabla\alpha, \qquad \alpha = \frac{\Phi\phi}{2\pi}$$
we might try a gauge transformation with gauge scalar α. Then the wavefunction transforms as
$$\psi \to \exp\left(\frac{iq\alpha}{\hbar}\right)\psi = \exp\left(i\phi\frac{\Phi}{\Phi_0}\right)\psi.$$
But this is single-valued only when Φ/Φ₀ is an integer, so the vector potential can only be gauged away when Φ is a multiple of Φ₀.
Note. Sometimes, these two arguments are mixed up, leading to claims that the flux through any
loop must be quantized in multiples of Φ0 . This is simply incorrect, but it is true for superconducting
loops if ψ is interpreted as the macroscopic wavefunction. This is because the energy of the
superconducting loop is minimized when Φ/Φ0 is an integer. (add more detail)
Note. It is also useful to think about how the energy levels move, i.e. the “spectral flow”. For
zero field, the |n = 0⟩ state sits at the bottom, while the states |±n⟩ are degenerate. As the field is increased, the energy levels shift around so that once the flux is Φ₀, the |n⟩ state has moved to the energy level of the original |n + 1⟩ state.
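The spectral flow can be read off directly from E ∝ (n − Φ/Φ₀)²: shifting Φ by one flux quantum relabels n → n + 1 and leaves the spectrum invariant as a set, even though each individual level moves (a sketch, not from the notes):

```python
# Ring spectrum Eₙ ∝ (n - Φ/Φ₀)²: compare Φ/Φ₀ = 0.3 with Φ/Φ₀ = 1.3.
def spectrum(flux_ratio, nmax=50):
    return sorted((n - flux_ratio) ** 2 for n in range(-nmax, nmax + 1))

a, b = spectrum(0.3), spectrum(1.3)
# Compare low-lying levels, which are unaffected by the cutoff nmax:
same = all(abs(x - y) < 1e-12 for x, y in zip(a[:10], b[:10]))
print(same)  # True
```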
Example. The Aharonov–Bohm effect. Consider the double slit experiment, but with a solenoid
hidden behind the wall between the slits. Then the presence of the solenoid affects the interference
pattern, even if its electromagnetic field is zero everywhere the particle goes! To see this, note that
a path from the starting point to a point x picks up a phase
$$\Delta\theta = \frac{q}{\hbar}\int^x A(x')\cdot dx'.$$
Then the two possible paths through the slits pick up a relative phase
$$\Delta\theta = \frac{q}{\hbar}\oint A\cdot dx = \frac{q}{\hbar}\int B\cdot dS = \frac{q\Phi}{\hbar}$$
which shifts the interference pattern. Again, we see that if Φ is a multiple of Φ0 , the effect vanishes,
but in general there is a physically observable effect.
Note. There are two ways to justify the phases. In the path integral formulation, we sum over all
classical paths with phase eiS/~ . The dominant contribution comes from the two classical paths, so
we can ignore all others; the phase shift for each path is just ei∆S/~ .
Alternatively, we can use the adiabatic theorem. Suppose that we have a well-localized, slowly-
moving particle in a vector potential A(x). Then we can apply the adiabatic theorem, where
the parameter is the particle’s position, one can show the Berry connection is A, and the Berry
curvature is B, giving the same conclusion. In the path integral method, the adiabatic assumption
manifests as ignoring the p · dx phase.
Note. We may also describe the above effects with fiber bundles, though it adds little because all
U (1) bundles over S 1 are trivial. However, it can be useful to think in terms of gauge patches. If
we cover S 1 with two patches, we can gauge away A within each patch, and the physical phases in
both examples above arise solely from transition functions. This can be more convenient in some
situations, since the effects of A don’t appear in the Schrodinger equations in each patch.
Example. Dirac quantization of magnetic monopoles. A magnetic monopole has a magnetic field
$$B = \frac{g\hat r}{4\pi r^2}$$
where the magnetic charge g is its total flux. To get around Gauss’s law (i.e. writing B = ∇ × A),
we must use a singular vector potential. Two possible examples are
$$A^N_\phi = \frac{g}{4\pi r}\,\frac{1 - \cos\theta}{\sin\theta}, \qquad A^S_\phi = -\frac{g}{4\pi r}\,\frac{1 + \cos\theta}{\sin\theta}.$$
These vector potentials are singular along the lines θ = π and θ = 0, respectively, which we call
Dirac strings. Physically, we can think of a magnetic monopole as one end of a solenoid that extends
off to infinity that’s too thin to detect; the solenoid then lies on the Dirac string. Note that there
is only one Dirac string, not two, but where it is depends on whether we use $A^N_\phi$ or $A^S_\phi$.
To solve the Schrodinger equation for a particle in this field, we must solve it separately in the
Northern hemisphere (where $A^N_\phi$ is nonsingular) and the Southern hemisphere, giving wavefunctions
ψN and ψS . On the equator, where they overlap, they must differ by a gauge transformation
$$\psi_N = e^{iq\alpha/\hbar}\psi_S, \qquad \alpha = \frac{g\phi}{2\pi}.$$
But since the wavefunction must be single-valued, g must be a multiple of Φ0 , giving the Dirac
quantization condition
$$qg = 2\pi\hbar n.$$
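The geometry behind this argument can be checked numerically: the difference $A^N - A^S$ is a pure gauge whose winding around any line of latitude carries the full flux g. A sketch with illustrative values (not from the notes):

```python
import numpy as np

# Sketch (illustrative values): A^N - A^S is a pure gauge whose winding around
# any circle of latitude carries the full monopole flux g, matching the total
# flux of B = g rhat / (4 pi r^2) through the sphere.
g, r = 4 * np.pi, 1.0

def AN_phi(theta):   # nonsingular away from theta = pi
    return g * (1 - np.cos(theta)) / (4 * np.pi * r * np.sin(theta))

def AS_phi(theta):   # nonsingular away from theta = 0
    return -g * (1 + np.cos(theta)) / (4 * np.pi * r * np.sin(theta))

# loop integral of (A^N - A^S) around circles of latitude: exactly g each time
loops = [(AN_phi(th) - AS_phi(th)) * 2 * np.pi * r * np.sin(th)
         for th in (0.3, np.pi / 2, 2.8)]
print([round(l / g, 12) for l in loops])   # -> [1.0, 1.0, 1.0]

# cross-check: total flux of the radial field through the whole sphere
th = np.linspace(0, np.pi, 200001)
flux = 2 * np.pi * np.sum(g / (4 * np.pi) * np.sin(th)) * (th[1] - th[0])
print(round(flux / g, 6))                  # -> 1.0
```

Since the wavefunction's single-valuedness forces this winding to be a multiple of Φ0, the code is just the flux bookkeeping underlying the quantization condition.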
A slight modification of this argument for dyons, with both electric and magnetic charge, gives
$$q_1 g_2 - q_2 g_1 = 2\pi\hbar n.$$
Note. An alternate derivation of the Dirac quantization condition. Consider a particle that moves
in the field of a monopole, in a closed path that subtends a magnetic flux Φ. As we know already,
the resulting phase shift is $\Delta\theta = q\Phi/\hbar$. But we could also have taken a surface that wrapped about
the monopole the other way, with a flux Φ − g and phase shift $\Delta\theta' = q(\Phi - g)/\hbar$.
Since we consider the exact same path in both situations (and the phase shift is observable, as
we could interfere it with a state that didn’t move at all), the phase shifts must differ by a multiple
of 2π for consistency. This recovers the Dirac quantization condition.
The exact same argument applies to the abstract monopole in B-space in the previous section.
This underscores the fact that the quantization of magnetic charge has nothing to do with real
space; it is fundamentally because there are discretely many distinct U (1) bundles on the sphere,
as we show in more detail below.
Note. A heuristic derivation of the Dirac quantization condition. One can show the conserved
angular momentum of the monopole-charge system, with the monopole again fixed, is
$$L = r\times mv - \frac{qg}{4\pi}\,\hat r.$$
The second term is the angular momentum stored in the electromagnetic fields. Using the fact that
angular momentum is quantized in units of ~/2 gives the same result.
Note. Formally, a wavefunction is a section of a complex line bundle associated with the U (1)
gauge bundle. In the case of a nontrivial bundle, the wavefunction can only be defined on patches;
naively attempting to define it globally will give a multivalued or singular wavefunction. This is
why some say that the wavefunction can be multivalued in certain situations. In all the cases we
have considered here, the bundle is trivial, so all wavefunctions may be globally defined. It turns
out that over a manifold M the equivalence classes of complex line bundles are classified by the
Picard group H 2 (M, Z). For instance, this is nontrivial for a two-dimensional torus.
This formalism lets us derive the Dirac quantization condition without referring to matter. The
point is that $A^N - A^S = d\lambda$ on the equator $S^1$, where λ need not be single-valued, though the
transition function $e^{iq\lambda/\hbar}$ must be. Then
$$\int_{S^2} F = \int_N dA^N + \int_S dA^S = \int_{S^1}(A^N - A^S) = \int_{S^1} d\lambda$$
which must therefore be a multiple of $2\pi\hbar/q$, recovering the Dirac quantization condition. The
integer $(q/2\pi\hbar)\int_{S^2} F$ is called the first Chern number of the U(1) bundle.
Note. The behavior of a wavefunction has a neat analogy with fluid flow. We let $\psi = \sqrt{\rho}\,e^{i\theta}$. Then
the Schrodinger equation is
$$\frac{\partial\rho}{\partial t} = -\nabla\cdot(\rho v), \qquad \hbar\frac{\partial\theta}{\partial t} = -\frac{mv^2}{2} - q\phi + \frac{\hbar^2}{2m}\frac{1}{\sqrt\rho}\nabla^2\sqrt\rho$$
where the velocity is $v = (\hbar\nabla\theta - qA)/m$. The first equation is simply the continuity equation, while
the second is familiar from hydrodynamics if ℏθ is identified as the “velocity potential”, and the
right-hand side is identified as the negative of the energy. We see there is an additional “quantum”
contribution to the energy, which can be interpreted as the energy required to compress the fluid.
The second equation becomes a bit more intuitive by taking the gradient, giving
$$\frac{\partial v}{\partial t} = -\frac{q}{m}\nabla\phi - \frac{q}{m}\frac{\partial A}{\partial t} - v\times(\nabla\times v) - (v\cdot\nabla)v + \nabla\left(\frac{\hbar^2}{2m^2}\frac{1}{\sqrt\rho}\nabla^2\sqrt\rho\right).$$
Note that the definition of the velocity relates the vorticity with the magnetic field,
$$\nabla\times v = -\frac{q}{m}B.$$
Then the first two terms on the right-hand side give the electric force, and by the vorticity relation
the third gives the magnetic force, completing the Lorentz force. The fourth simply
converts the partial time derivative to a convective derivative. Now in general this picture isn't
physical, because we can’t think of the wavefunction ψ as a classical field, identifying the probability
density with charge density. However, it is a perfectly good picture when ψ is a macroscopic
wavefunction, as is the case for superconductivity.
• The Hamiltonian
$$H = \frac{\hat p^2}{2m} + \frac{m\omega^2\hat x^2}{2}$$
has a characteristic length $\sqrt{\hbar/m\omega}$, characteristic momentum $\sqrt{m\hbar\omega}$, and characteristic energy
ℏω. Setting all of these quantities to one, or equivalently setting ω = ℏ = m = 1,
$$H = \frac{\hat p^2 + \hat x^2}{2}, \qquad [\hat x, \hat p] = i.$$
We can later recover all units by dimensional analysis.
• Since the potential goes to infinity at infinity, there are only bound states, and hence the
spectrum of H is discrete. Moreover, since we are working in one dimension, the eigenfunctions
of H are nondegenerate.
• Defining the operators $a = (\hat x + i\hat p)/\sqrt 2$ and $a^\dagger = (\hat x - i\hat p)/\sqrt 2$, we have $H = N + 1/2$, where
the number operator $N = a^\dagger a$ obeys $[N, a] = -a$ and $[N, a^\dagger] = a^\dagger$. Since N is positive
semidefinite, its eigenvalues are nonnegative,
$$N|\nu\rangle = \nu|\nu\rangle, \qquad \nu \geq 0.$$
This implies that a|νi is an eigenket of N with eigenvalue ν −1, and similarly a† |νi has eigenvalue
ν + 1. Therefore, starting with a single eigenket, we can get a ladder of eigenstates.
Therefore, the ladder terminates on the bottom with ν = 0 and doesn’t terminate on the top.
Moreover, all eigenvalues ν must be integers; if not, we could lower until the eigenvalue was
negative, contradicting the positive definiteness of N . We can show there aren’t multiple copies
of the ladder by switching to wavefunctions and using uniqueness, as shown below.
• Using the equations above, we find that for the |ni to be normalized, we have
$$a|n\rangle = \sqrt n\,|n-1\rangle, \qquad a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle.$$
There can in principle be a phase factor, but we use our phase freedom in the eigenkets to
rotate it to zero. Repeating this, we find
$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\,|0\rangle.$$
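These relations are easy to confirm with truncated matrices in the number basis; a minimal sketch (not from the notes; the truncation only corrupts the highest level):

```python
import numpy as np
from math import factorial

# Sketch: truncated number-basis matrices for the ladder operators, checking
# the ladder relations above numerically.
dim = 12
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)    # a|n> = sqrt(n) |n-1>
adag = a.T.copy()                               # a^dag|n> = sqrt(n+1) |n+1>
N = adag @ a                                    # number operator

# [a, a^dag] = 1 away from the truncation boundary
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(dim - 1)))  # True

# build |3> by raising the vacuum: (a^dag)^3 / sqrt(3!) |0>
ket0 = np.zeros(dim); ket0[0] = 1.0
ket3 = np.linalg.matrix_power(adag, 3) @ ket0 / np.sqrt(factorial(3))
print(np.allclose(N @ ket3, 3 * ket3))               # True: N|3> = 3|3>
print(np.allclose(a @ ket3, np.sqrt(3) * np.linalg.matrix_power(adag, 2) @ ket0
                  / np.sqrt(factorial(2))))          # True: a|3> = sqrt(3)|2>
```

The last entry of the commutator is −(dim − 1) rather than 1, a standard artifact of truncating the infinite ladder.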
To simplify the derivative factor, we ‘commute past the exponential’, using the identity
$$(x - \partial_x)\left(e^{x^2/2} f\right) = -e^{x^2/2}\,\partial_x f.$$
Therefore we find
$$\psi_n(x) = \frac{1}{\pi^{1/4}\sqrt{n!\,2^n}}\,(-1)^n e^{x^2/2}\,\partial_x^n\, e^{-x^2}.$$
In terms of the Hermite polynomials, we have
$$\psi_n(x) = \frac{1}{\pi^{1/4}\sqrt{n!\,2^n}}\,H_n(x)\,e^{-x^2/2}, \qquad H_n(x) = (-1)^n e^{x^2}\partial_x^n e^{-x^2}.$$
Note. Similarly, we can find the momentum space wavefunction $\tilde\psi_n(p)$ by writing $a^\dagger$ in momentum
space. The result turns out to be identical up to phase factors and scaling; this is because unitary
evolution with the harmonic oscillator potential for time π/2 Fourier transforms the wavefunction
(as shown below), and this evolution leaves $\psi_n(x)$ unchanged up to a phase factor.
• The Hamiltonian is still H = (x̂2 + p̂2 )/2, but the operators have time-dependence equivalent
to the classical equations of motion,
$$\frac{d\hat x}{dt} = \hat p, \qquad \frac{d\hat p}{dt} = -\hat x.$$
The solution to this is simply clockwise circular motion in phase space, as it is classically,
$$\begin{pmatrix}\hat x(t)\\ \hat p(t)\end{pmatrix} = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t\end{pmatrix}\begin{pmatrix}\hat x_0\\ \hat p_0\end{pmatrix}.$$
Then the expectation values of position and momentum behave as they do classically.
• Moreover, the time evolution for π/2 turns position eigenstates into momentum eigenstates. To
see this, let $U = e^{-iH(\pi/2)}$ and let $\hat x_0|x\rangle = x|x\rangle$. Then
$$U\hat x_0 U^{-1}\,U|x\rangle = x\,U|x\rangle$$
so $U|x\rangle$ is an eigenstate of $U\hat x_0 U^{-1} = \hat x(-\pi/2) = -\hat p_0$, i.e. a momentum eigenstate.
• Not all “nearly classical” states are coherent states, but it's also true that not all states
with high occupancy numbers look nearly classical. For example, |n⟩ for high n doesn't look
classical, since it is completely delocalized.
• The state |0⟩ is a coherent state, and we can generate others by applying the position and
momentum translation operators, which act as
$$(T(a)\psi)(x) = \psi(x - a), \qquad (T(a)\varphi)(p) = e^{-iap}\varphi(p)$$
and
$$(S(b)\psi)(x) = e^{ibx}\psi(x), \qquad (S(b)\varphi)(p) = \varphi(p - b).$$
Therefore the translation operators shift expectation values and keep dispersions constant.
Moreover, they don't commute; using the above relations, we instead have
$$T(a)\,S(b) = e^{-iab}\,S(b)\,T(a).$$
• Due to the noncommutativity, the order of the position and momentum translations matters.
To put them on an equal footing, we define the Heisenberg operators
$$W(a, b) = e^{i(b\hat x - a\hat p)}.$$
With this setup, it’s easy to show some important properties of coherent states.
• From our Heisenberg picture results, we know that the expectation values of |a, bi will evolve
classically. To show that the dispersions are constant over time, it’s convenient to switch to
raising and lowering operators. Defining the complex variable z as before, we have
• This makes it easy to compute properties of the coherent states; for example,
$$\langle z|\hat n|z\rangle = \langle z|a^\dagger a|z\rangle = |z|^2$$
as well as
$$\langle z|\hat n^2|z\rangle = \langle z|a^\dagger a a^\dagger a|z\rangle = |z|^2\langle z|a a^\dagger|z\rangle = |z|^4 + |z|^2.$$
In particular, this means $\operatorname{var}(\hat n) = |z|^2$. All these results are consistent with the fact that the
number distribution is Poisson with mean $|z|^2$.
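A quick numerical check of these number statistics, using a truncated Fock expansion of |z⟩ (the value of z is an arbitrary illustration):

```python
import numpy as np
from math import factorial

# Sketch: number statistics of a coherent state |z> in a truncated Fock space,
# checking that mean and variance of n both equal |z|^2 (Poisson statistics).
z, dim = 1.3 + 0.4j, 60
n = np.arange(dim)
amps = np.array([z**k / np.sqrt(float(factorial(k))) for k in range(dim)])
probs = np.abs(amps)**2
probs /= probs.sum()               # normalize; the truncation tail is negligible

mean = np.sum(n * probs)
var = np.sum(n**2 * probs) - mean**2
print(mean, var, abs(z)**2)        # mean and var both -> |z|^2
```

The same construction shows directly that probs matches the Poisson distribution with parameter |z|², term by term.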
in accordance with the classical z(t) evolution we saw before. This implies the coherent state
remains coherent. We can also see this result from the Heisenberg time evolution of a and a† .
• In the $z$, $\bar z$ variables, the uncertainty relation is $\Delta n\,\Delta\varphi \gtrsim 1$, where Δϕ is the uncertainty on the
phase of z. Physically, if we consider the quantum electromagnetic field, this relation bounds
the uncertainty on the number of photons and the phase of the corresponding classical wave.
• Since a is not Hermitian, its eigenvectors are not a complete set, nor are they even orthogonal.
However, they are an “overcomplete” set, in the sense that
$$\int \frac{dx\,dp}{2\pi}\,|z\rangle\langle z| = 1.$$
To see this, act with ⟨m| on the left and |n⟩ on the right for
$$\int\frac{dx\,dp}{2\pi}\;e^{-|z|^2}\,\frac{z^n(z^*)^m}{\sqrt{n!\,m!}} = \int d|z|^2\int\frac{d\varphi}{2\pi}\;e^{-|z|^2}\,\frac{z^n(z^*)^m}{\sqrt{n!\,m!}}.$$
The phase integral is zero unless n = m. When n = m, the phase integral is 1, and the $d|z|^2$
integral also gives 1, showing the result.
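The overcompleteness relation can also be verified numerically on a truncated Fock space, discretizing the measure as $d|z|^2\,d\varphi/2\pi$ exactly as in the calculation above; a sketch:

```python
import numpy as np
from math import factorial

# Sketch: build the integral of |z><z| dx dp / 2 pi on a truncated Fock space,
# using the measure d|z|^2 dphi / (2 pi) with a midpoint grid in s = |z|^2.
dim = 6
s = (np.arange(4000) + 0.5) * 0.01        # s = |z|^2 up to 40, midpoint rule
phi = np.arange(256) * 2 * np.pi / 256    # periodic grid in the phase of z
ds, dphi = 0.01, 2 * np.pi / 256

M = np.zeros((dim, dim), dtype=complex)
for si in s:
    z = np.sqrt(si) * np.exp(1j * phi)    # all phases at this radius
    kets = np.array([z**n / np.sqrt(float(factorial(n)))
                     for n in range(dim)]) * np.exp(-si / 2)
    M += (kets @ kets.conj().T) * ds * dphi / (2 * np.pi)
print(np.allclose(M, np.eye(dim), atol=1e-4))   # -> True
```

The periodic phase grid kills the off-diagonal elements exactly, while the radial integral reproduces the unit-normalized Gamma integrals on the diagonal.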
Note. Coherent states are ubiquitous in nature, because they are generically produced by classically
driving a harmonic oscillator. For a harmonic oscillator experiencing force f (t), we have
$$x(t) = x_0(t) + \int dt'\,\sin(t - t')\,\theta(t - t')\,f(t')$$
where we fix â and ↠to be the Heisenberg operators at time t = 0. Now we focus on times t after
the driving ends. The step function is just 1, so denoting a Fourier transform with a tilde,
$$\hat x(t) = \frac{1}{\sqrt 2}\left[\left(\hat a + \frac{i}{\sqrt 2}\tilde f(1)\right)e^{-it} + \left(\hat a^\dagger - \frac{i}{\sqrt 2}\tilde f(-1)\right)e^{it}\right]$$
where the expressions look a little strange because we have set ω = 1. However, for all times we
may write $\hat x(t) = (\hat a(t)\,e^{-it} + \hat a^\dagger(t)\,e^{it})/\sqrt 2$,
so the final expressions for â(t) and ↠(t) must be the factors in parentheses above. The ground
state evolves into a state annihilated by â(t), which is precisely a coherent state. The other states
evolve into this state, raised by powers of ↠(t).
This result can also be derived directly at the level of the states. Setting ~ = ω = 1 again, let
the Hamiltonian be
$$H = a^\dagger a + f^*(t)\,a + f(t)\,a^\dagger$$
where we have generalized the forcing term to the most general one, which is Hermitian and linear
in x and p. In the interaction picture, the remaining Hamiltonian is linear in a and a† , with
time-dependent coefficients.
Solving the Schrodinger equation then yields a time evolution operator whose form is an exponential
of a linear combination of a and a† . But this is precisely the form of the operators W (a, b) defined
above, so it turns the vacuum into a coherent state.
Note. The classical electromagnetic field in a laser is really a coherent state of the quantum
electromagnetic field; in general classical fields emerge from quantum ones by stacking many quanta
together. A more exotic example occurs for superfluids, where the excitations are bosons which
form a coherent field state, $\hat\psi(x)|\psi\rangle = \psi(x)|\psi\rangle$. In the limit of large occupancies, we may treat the
state as a classical field ψ(x), which is often called a “macroscopic wavefunction”.
Note. As we've seen, coherent states simply oscillate indefinitely, with their wavefunctions never
spreading out. This is special to the harmonic oscillator: its energy levels have integer spacing,
which makes all energy differences multiples of ℏω. Forming analogues of coherent states
in general potentials, such as the Coulomb potential, is much harder.
• We consider the standard ‘kinetic-plus-potential’ Hamiltonian, and attempt to solve the time-
independent Schrodinger equation. For a constant potential the solutions are plane waves, so
for a slowly varying potential we try the ansatz
$$\psi(x) = A(x)\,e^{iS(x)/\hbar}$$
where we expect A(x) varies slowly, on the scale L of the potential, while S(x) still varies rapidly,
on the scale of the de Broglie wavelength λ. Then the solution locally looks like a plane wave
with momentum
$$p(x) = \nabla S(x).$$
• To make this more quantitative, we write the logarithm of the wavefunction as a series in ℏ,
$$\psi(x) = \exp\left(\frac{i}{\hbar}W(x)\right), \qquad W(x) = W_0(x) + \hbar W_1(x) + \hbar^2 W_2(x) + \cdots.$$
Comparing this to our earlier ansatz, we identify W0 with S and W1 with −i log A, though the
true S and A receive higher-order corrections.
• To see the meaning of this result, define a velocity field and density
$$v(x) = \frac{\partial H}{\partial p} = \frac{p(x)}{m}, \qquad \rho(x) = A(x)^2.$$
Then the amplitude transport equation says
$$\nabla\cdot J = 0, \qquad J(x) = \rho(x)\,v(x)$$
• The same reasoning can be applied to the time-dependent Schrodinger equation with a time-
dependent Hamiltonian, giving
$$\frac{1}{2m}(\nabla S)^2 + V(x, t) + \frac{\partial S}{\partial t} = 0.$$
This is simply the time-dependent Hamilton-Jacobi equation.
Since S is the integral of p(x), it is simply the phase space area swept out by the classical
particle’s path.
• Note that in classically forbidden regions, S becomes imaginary, turning oscillation into ex-
ponential decay. In classically allowed regions, the two signs of S are simply interpreted as
whether the particle is moving left or right. For concreteness we choose
$$p(x) = \begin{cases}\sqrt{2m(E - V(x))} & E > V(x),\\ i\sqrt{2m(V(x) - E)} & E < V(x).\end{cases}$$
• The result $A \propto 1/\sqrt p$ has a simple classical interpretation. Consider a classical particle
oscillating in a potential well. Then the amount of time it spends at a point is inversely proportional
to the velocity at that point, and indeed A2 ∝ 1/p ∝ 1/v. Then the semiclassical swarm of
particles modeling a stationary state should be uniformly distributed in time.
• This semiclassical picture also applies to time-independent scattering states, which can be
interpreted as a semiclassical stream of particles entering and disappearing at infinity.
• Note that the WKB approximation breaks down near classical turning points (where V (x) = E),
since the de Broglie wavelength diverges there.
We now derive the connection formulas, which deal with turning points.
• In the classically forbidden region to the right of a turning point $x_r$, we define $K(x) = \int_{x_r}^x |p(x')|\,dx'$
to deal with only real quantities. Then the general WKB solution is
$$\psi_{II}(x) = \frac{1}{\sqrt{|p(x)|}}\left(c_g\,e^{K(x)/\hbar} + c_d\,e^{-K(x)/\hbar}\right)$$
• The connection formulas relate $c_r$ and $c_\ell$ with $c_g$ and $c_d$. Taylor expanding near the turning
point, the Schrodinger equation is
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + V'(x_r)(x - x_r)\psi = 0.$$
To nondimensionalize, we switch to the shifted and scaled variable z defined by
$$x = x_r + az, \qquad a = \left(\frac{\hbar^2}{2mV'(x_r)}\right)^{1/3}, \qquad \frac{d^2\psi}{dz^2} - z\psi = 0.$$
• The two independent solutions to Airy's equation are Ai(z) and Bi(z). They are the exact
solutions of Schrodinger's equation for a particle in a uniform field, such as a gravitational or
electric field. Both oscillate for z ≪ 0, while for z ≫ 0 they exponentially decay and grow respectively,
$$\mathrm{Ai}(z) \approx \begin{cases}\dfrac{\cos\alpha(z)}{\sqrt\pi\,(-z)^{1/4}} & z \ll 0,\\[2ex] \dfrac{e^{-\beta(z)}}{2\sqrt\pi\,z^{1/4}} & z \gg 0,\end{cases} \qquad \mathrm{Bi}(z) \approx \begin{cases}\dfrac{\sin\alpha(z)}{\sqrt\pi\,(-z)^{1/4}} & z \ll 0,\\[2ex] \dfrac{e^{\beta(z)}}{\sqrt\pi\,z^{1/4}} & z \gg 0,\end{cases}$$
where
$$\alpha(z) = -\frac{2}{3}(-z)^{3/2} + \frac{\pi}{4}, \qquad \beta(z) = \frac{2}{3}z^{3/2}$$
as can be shown by the saddle point approximation.
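These asymptotic forms are easy to compare directly against library Airy functions (assuming scipy is available); a sketch:

```python
import numpy as np
from scipy.special import airy

# Sketch: compare Ai, Bi to the asymptotic forms quoted above.
alpha = lambda z: -2/3 * (-z)**1.5 + np.pi / 4
beta = lambda z: 2/3 * z**1.5

for z in (-30.0, -10.0):                    # oscillatory region, z << 0
    ai, _, bi, _ = airy(z)
    amp = 1 / (np.sqrt(np.pi) * (-z)**0.25)
    print(abs(ai - np.cos(alpha(z)) * amp) < 0.01 * amp,
          abs(bi - np.sin(alpha(z)) * amp) < 0.01 * amp)

for z in (10.0, 30.0):                      # exponential region, z >> 0
    ai, _, bi, _ = airy(z)
    amp = 1 / (np.sqrt(np.pi) * z**0.25)
    print(abs(ai - np.exp(-beta(z)) * amp / 2) < 0.01 * np.exp(-beta(z)) * amp,
          abs(bi - np.exp(beta(z)) * amp) < 0.01 * np.exp(beta(z)) * amp)
```

The agreement is already at the percent level for |z| of order ten, which is why the connection formulas work so well away from the immediate neighborhood of a turning point.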
• The analysis for a classically forbidden region on the left is very similar. On the left,
$$\psi_{III}(x) = \frac{1}{\sqrt{|p(x)|}}\left(c_g\,e^{K(x)/\hbar} + c_d\,e^{-K(x)/\hbar}\right), \qquad K(x) = \int_{x_\ell}^x |p(x')|\,dx'$$
where the phase factors are again chosen for convenience. Then we find
$$\begin{pmatrix}c_g\\ c_d\end{pmatrix} = \begin{pmatrix}\tfrac12 & \tfrac12\\ -i & i\end{pmatrix}\begin{pmatrix}c_r\\ c_\ell\end{pmatrix}.$$
• Next, consider an oscillator with turning points $x_\ell$ and $x_r$. This problem can be solved by
demanding exponential decay on both sides. Intuitively, the particle picks up a phase of
$$\frac{1}{\hbar}\oint p\,dx - \pi$$
through one oscillation, so demanding the wavefunction be single-valued gives
$$2\pi I = \oint p\,dx = (n + 1/2)h, \qquad n = 0, 1, 2, \ldots$$
which is the Bohr-Sommerfeld quantization rule. The quantity I is proportional to the phase
space area of the orbit, and called the action in classical mechanics. The semiclassical estimate
for the energy of the state is just the energy of the classical solution with action I.
• For the harmonic oscillator, the orbit is an ellipse in phase space, with
$$\oint p\,dx = \pi\sqrt{2mE}\sqrt{\frac{2E}{m\omega^2}} = \frac{2\pi E}{\omega}$$
which yields
$$E_n = (n + 1/2)\hbar\omega$$
which are the exact energy eigenvalues; however, the energy eigenstates are not exact.
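A numerical check that the quantization rule reproduces these levels, evaluating the closed integral of p dx by quadrature with ℏ = m = ω = 1:

```python
import numpy as np

# Sketch: Bohr-Sommerfeld for the harmonic oscillator (hbar = m = omega = 1).
# The orbit with E = n + 1/2 should enclose phase space area (n + 1/2) h.
def action(E):
    """The closed integral of p dx at energy E, by direct quadrature."""
    xt = np.sqrt(2 * E)                      # turning points at +- xt
    x = np.linspace(-xt, xt, 200001)
    p = np.sqrt(np.maximum(2 * E - x**2, 0.0))
    return 2 * np.sum(p) * (x[1] - x[0])     # factor 2: out and back

for n in range(4):
    E = n + 0.5
    print(n, action(E) / (2 * np.pi))        # -> n + 1/2, i.e. E_n in units hbar*omega
```

Replacing the quadratic potential by any other well in `action` gives the semiclassical spectrum of that well by solving action(E) = (n + 1/2)h, though it is then only approximate.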
• We can also consider reflection from a hard wall, i.e. an infinite potential. In this case the
right-moving and left-moving waves must cancel exactly at the wall, $c_\ell = -ic_r$, which implies
that the reflected wave picks up a phase of −π. For a particle in a box of width L, applying
this at both walls gives
$$E_n = \frac{(n+1)^2\hbar^2\pi^2}{2mL^2}$$
which is the exact answer.
• Finally, we can have periodic boundary conditions, such as when a particle moves on a ring.
Then there are no phase shifts at all, and the quantization condition is just $\oint p\,dx = nh$.
• Generally, we find that for a system with an n-dimensional configuration space, each stationary
state occupies a phase space volume of hn . This provides a quick way to calculate the density
of states.
Note. Classical and quantum frequencies. The classical frequency ωc is the frequency of the classical
oscillation, and obeys ωc = dE/dI. The quantum frequency ωq is the rate of change of the quantum
phase. These are different; for the harmonic oscillator ωc does not depend on n but ωq does.
Now, when a quantum oscillator transitions between states with difference ∆ωq in quantum
frequencies, it releases radiation of frequency ∆ωq . On the other hand, we know that a classical
oscillator radiates at its classical frequency ωc (and its harmonics).
Note. The real Bohr model. Typically the Bohr model is introduced by the postulate that L = n~
in circular orbits, but this is a simplification; Bohr actually had a better justification. By the
correspondence principle as outlined above, we have ∆ωq = ωc , and Planck had previously motivated
∆E = ℏ∆ωq for matter oscillators. If we assume circular orbits with radii r and r − ∆r, these
relations give $\Delta r = 2\sqrt{a_0 r}$, which implies that $r \propto n^2$ when $n \gg 1$. This is equivalent to L = nℏ.
Bohr's radical step is then to assume these results hold for all n.
7 Path Integrals
7.1 Formulation
• Define the propagator as the position-space matrix elements of the time evolution operator,
$$K(x, t; x_0, t_0) = \langle x|U(t, t_0)|x_0\rangle.$$
• Since we often work in the position basis, we distinguish the Hamiltonian operator acting on
kets, Ĥ, and the differential operator acting on wavefunctions, H. They are related by
$$\langle x|\hat H|\psi\rangle = H\,\langle x|\psi\rangle.$$
Example. The propagator for the free particle. Since the problem is time-independent, we set
t0 = 0 and drop it. Then
$$K(x, x_0, t) = \int\frac{dp}{2\pi\hbar}\,\exp\left[\frac{i}{\hbar}\left(p(x - x_0) - \frac{p^2 t}{2m}\right)\right] = \sqrt{\frac{m}{2\pi i\hbar t}}\,\exp\left[\frac{i}{\hbar}\,\frac{m(x - x_0)^2}{2t}\right]$$
where we performed a Gaussian integral. The limit t → 0 is somewhat singular; we expect it is
a delta function, yet the magnitude of the propagator is equal for all x. The resolution is that
the phase oscillations in x get faster and faster, so that K(x, t) behaves like a delta function when
integrated against a test function.
The path integral is an approach for calculating the propagator in more complicated settings. We
work with the Hamiltonian H = T + V = p2 /2m + V (x), as more general Hamiltonians with higher
powers of p are more difficult to handle.
Within each factor, we insert a resolution of the identity in momentum space for
$$\int dp\,\langle x_{j+1}|e^{-i\epsilon\hat p^2/2m\hbar}|p\rangle\langle p|e^{-i\epsilon V(\hat x)/\hbar}|x_j\rangle = \sqrt{\frac{m}{2\pi i\hbar\epsilon}}\,\exp\left[\frac{i\epsilon}{\hbar}\left(\frac{m}{2}\,\frac{(x_{j+1} - x_j)^2}{\epsilon^2} - V(x_j)\right)\right]$$
where $\epsilon = t/N$ is the timestep, and we performed a Gaussian integral almost identical to the free
particle case. Then
$$K(x, x_0, t) = \lim_{N\to\infty}\left(\frac{m}{2\pi i\hbar\epsilon}\right)^{N/2}\int dx_1\cdots dx_{N-1}\,\exp\left[\frac{i\epsilon}{\hbar}\sum_{j=0}^{N-1}\left(\frac{m}{2}\,\frac{(x_{j+1} - x_j)^2}{\epsilon^2} - V(x_j)\right)\right]$$
• Next, we can differentiate the above identity with respect to j at j = 0. But since
$$\partial_{j_m}\,e^{j^T A^{-1} j/2} = (A^{-1}j)_m\,e^{j^T A^{-1} j/2}$$
the result vanishes for a single derivative when evaluated at j = 0. However, for two derivatives,
we can get a nonzero result by differentiating the $A^{-1}j$ term, giving
$$\int dv\;e^{-v^T A v/2}\,v_m v_n = \sqrt{\frac{(2\pi)^N}{\det A}}\;A^{-1}_{mn}.$$
Interpreting the Gaussian as a probability distribution, this implies
$$\langle v_m v_n\rangle = A^{-1}_{mn}.$$
Similarly, for any even number of derivatives, we get a sum over all pairings,
$$\langle v_{i_1}\cdots v_{i_{2n}}\rangle = \sum_{\text{pairings}} A^{-1}_{i_{k_1} i_{k_2}}\cdots A^{-1}_{i_{k_{2n-1}} i_{k_{2n}}}.$$
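This pairing structure is easy to verify for a small example by integrating the Gaussian directly; a sketch for a 2 × 2 matrix A:

```python
import numpy as np

# Sketch: verify <v_m v_n> = (A^-1)_mn and the four-point pairing sum for a
# 2x2 Gaussian by direct quadrature on a grid.
A = np.array([[2.0, 0.6], [0.6, 1.0]])
Ainv = np.linalg.inv(A)

x = np.linspace(-9, 9, 601)
X, Y = np.meshgrid(x, x, indexing="ij")
w = np.exp(-0.5 * (A[0, 0] * X**2 + 2 * A[0, 1] * X * Y + A[1, 1] * Y**2))
w /= w.sum()                          # normalized weights; the measure cancels

avg = lambda f: np.sum(f * w)
print(avg(X * Y), Ainv[0, 1])         # two-point function
# Wick: <v0 v0 v1 v1> = <00><11> + 2 <01>^2 (three pairings, two equal)
print(avg(X * X * Y * Y), Ainv[0, 0] * Ainv[1, 1] + 2 * Ainv[0, 1]**2)
```

The four-point check counts the three pairings explicitly, two of which coincide here, which is exactly the combinatorics that Feynman diagrams organize in field theory.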
For complex variables, the analogous Gaussian integral is
$$\int d(v^\dagger, v)\;e^{-v^\dagger A v + w^\dagger v + v^\dagger w'} = \frac{\pi^N}{\det A}\,e^{w^\dagger A^{-1} w'}.$$
Similarly, we can take derivatives; to get nonzero results, we must pair derivatives with respect
to v with derivatives with respect to $\bar v$. Then Wick's theorem is
$$\langle \bar v_{i_1}\cdots \bar v_{i_n}\,v_{j_1}\cdots v_{j_n}\rangle = \sum_{\text{perms }P} A^{-1}_{j_1 i_{P_1}}\cdots A^{-1}_{j_n i_{P_n}}$$
• In the continuum limit, the vectors and matrices above become functions and operators, and
the integral becomes a path integral, giving
$$\int Dv(x)\,\exp\left(-\frac12\int dx\,dx'\;v(x)A(x, x')v(x') + \int dx\;j(x)v(x)\right) \propto \frac{1}{\sqrt{\det A}}\,\exp\left(\frac12\int dx\,dx'\;j(x)A^{-1}(x, x')j(x')\right)$$
where we have thrown away some normalization factors, which drop out of averages. Wick's
theorem generalizes to this case straightforwardly.
Note. We now review the stationary phase approximation. We consider the integral
$$\int dx\;e^{i\varphi(x)/\kappa}$$
for small κ. Then the integrand oscillates wildly except at points of stationary phase $\bar x$. Approxi-
mating the exponent as a quadratic there, we have a Gaussian integral, giving
$$\int dx\;e^{i\varphi(x)/\kappa} \approx \sqrt{\frac{2\pi i\kappa}{\varphi''(\bar x)}}\,e^{i\varphi(\bar x)/\kappa} = e^{i\nu\pi/4}\sqrt{\frac{2\pi\kappa}{|\varphi''(\bar x)|}}\,e^{i\varphi(\bar x)/\kappa}, \qquad \nu = \operatorname{sign}(\varphi''(\bar x)).$$
If there are multiple points of stationary phase, we must sum over each such point. Similarly, we
can consider the multidimensional integral
$$\int d\mathbf{x}\;e^{i\varphi(\mathbf{x})/\kappa}$$
for small κ. Then the stationary points are where ∇ϕ = 0. Expanding about these points and
applying our multidimensional Gaussian formula,
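The one-dimensional formula can be tested on the integral of $e^{i\cos(x)/\kappa}$ over a period, which equals $2\pi J_0(1/\kappa)$ exactly (this example is not from the notes); a sketch using scipy's Bessel function:

```python
import numpy as np
from scipy.special import j0

# Sketch: test the stationary phase formula on int_0^{2pi} e^{i cos(x)/kappa} dx
# = 2 pi J_0(1/kappa). The stationary points are x = 0 (phi'' = -1, nu = -1)
# and x = pi (phi'' = +1, nu = +1).
kappa = 0.02
exact = 2 * np.pi * j0(1 / kappa)

approx = (np.sqrt(2 * np.pi * kappa) * (
    np.exp(-1j * np.pi / 4 + 1j / kappa)     # x = 0:  phi = +1, phi'' = -1
    + np.exp(1j * np.pi / 4 - 1j / kappa)    # x = pi: phi = -1, phi'' = +1
)).real
print(exact, approx)   # agree to well under a percent at this kappa
```

The sum over the two stationary points reproduces the leading Bessel asymptotics, with the $e^{\pm i\pi/4}$ factors supplying the familiar phase of the large-argument expansion.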
• In this case, the small parameter is κ = ℏ and the function is the discretized action
$$\varphi(x_1, \ldots, x_{N-1}) = \epsilon\sum_{j=0}^{N-1}\left(\frac{m}{2}\,\frac{(x_{j+1} - x_j)^2}{\epsilon^2} - V(x_j)\right).$$
Differentiating, we have
$$\frac{\partial\varphi}{\partial x_k} = \frac{m}{\epsilon}(2x_k - x_{k+1} - x_{k-1}) - \epsilon V'(x_k), \qquad \frac{\partial^2\varphi}{\partial x_k\,\partial x_\ell} = \frac{m}{\epsilon}\,Q_{k\ell}$$
where the matrix $Q_{k\ell}$ is tridiagonal,
$$Q = \begin{pmatrix}2 - c_1 & -1 & 0 & 0 & \cdots\\ -1 & 2 - c_2 & -1 & 0 & \cdots\\ 0 & -1 & 2 - c_3 & -1 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots\end{pmatrix}, \qquad c_k = \frac{\epsilon^2}{m}\,V''(x_k).$$
• In the limit N → ∞, the stationary points are simply the classical paths $\bar x(\tau)$, so the phase
at each stationary point becomes the corresponding classical action.
• Next, we must evaluate det Q. This must combine with the path integral prefactor, which is
proportional to $\epsilon^{-N/2}$, to give a finite result, so we expect $\det Q \propto 1/\epsilon$. The straightforward
way to do this would be to diagonalize Q, finding eigenfunctions of the second variation of the
action. However, we can do the whole computation in one go by a slick method.
• Letting Dk be the determinant of the upper-left k × k block, we have
Dk+1 = (2 − ck+1 )Dk − Dk−1 .
This may be rearranged into a difference equation, which becomes, in the continuum limit,
$$m\,\frac{d^2 F(\tau)}{d\tau^2} = -V''(\bar x(\tau))\,F(\tau), \qquad F_k = \epsilon D_k.$$
We pulled out a factor of ε to make F(τ) regular, with initial conditions
$$F(0) = \lim_{\epsilon\to 0}\epsilon D_0 = \lim_{\epsilon\to 0}\epsilon = 0, \qquad F'(0) = \lim_{\epsilon\to 0}(D_1 - D_0) = 1.$$
• The equation of motion for F is the equation of motion for a small deviation about the classical
path, $x(\tau) = \bar x(\tau) + F(\tau)$, as the right-hand side is the linearized change in force. Thus F (t) is
the change in position at time t per unit change in velocity at t = 0, so
$$F(t) = \frac{\partial x}{\partial v_0} = m\left(\frac{\partial p_0}{\partial x}\right)^{-1} = -m\left(\frac{\partial^2 S}{\partial x_0\,\partial x}\right)^{-1}.$$
This is regular, as expected, and we switch back to D(t) by dividing by ε. Intuitively, this
factor tells us how many paths near the original classical path contribute. In the case where
$V''(\bar x(\tau)) < 0$, nearby paths rapidly diverge away, while for $V''(\bar x(\tau)) > 0$ a restoring force pushes
them back, enhancing the contribution.
• Finally, we need the number of negative eigenvalues, which we call µ. It will turn out that µ
approaches a definite limit as N → ∞. In that limit, it is the number of perturbations of the
classical path that further decrease the action, which is typically small.
• Putting everything together and restoring the branch index gives the Van Vleck formula
$$K(x, x_0, t) \approx \sum_b \frac{e^{-i\mu_b\pi/2}}{\sqrt{2\pi i\hbar}}\left|\frac{\partial^2 S_b}{\partial x\,\partial x_0}\right|^{1/2}\exp\left(\frac{i}{\hbar}S_b(x, x_0, t)\right).$$
The van Vleck formula expands the action to second order about stationary paths. It is exact
when the potential energy is at most quadratic, i.e. for a particle that is free, in a uniform
electric or gravitational field, or in a harmonic oscillator. It is also exact for a particle in a
magnetic field, since the Lagrangian remains at most quadratic in velocity.
Note. The van Vleck formula has a simple intuitive interpretation. It essentially states that
$$P(x, x_0) \propto \left|\frac{\partial^2 S}{\partial x\,\partial x_0}\right|.$$
By changing variables, we have
$$P(x, x_0) = \tilde P(x_0, p_0)\left|\frac{\partial p_0}{\partial x}\right| = \frac{1}{h}\left|\frac{\partial p_0}{\partial x}\right|$$
because the initial phase space distribution P̃ (x0 , p0 ) must always fill a Planck cell. These two
expressions are consistent since p0 = −∂S/∂x0 .
Example. The free particle. In this case the classical paths are straight lines and
$$S = \frac{m\dot x^2 t}{2} = \frac{m(x - x_0)^2}{2t}.$$
The determinant factor is
$$\left|\frac{\partial^2 S}{\partial x\,\partial x_0}\right|^{1/2} = \sqrt{\frac{m}{t}}.$$
The second-order change in action would be the integral of $m(\delta\dot x)^2/2$, which is positive definite, so
µ = 0. Putting everything together gives
$$K(x, x_0, t) = \sqrt{\frac{m}{2\pi i\hbar t}}\,\exp\left[\frac{i}{\hbar}\,\frac{m(x - x_0)^2}{2t}\right]$$
as we found earlier.
Note. We can verify that the path integral reproduces the Schrodinger equation. Expanding the
exact time evolution over a short time ε,
$$\psi(x, \epsilon) = \psi(x, 0) - \frac{i\epsilon}{\hbar}\left(-\frac{\hbar^2}{2m}\nabla^2 + V(x)\right)\psi(x, 0) + O(\epsilon^2).$$
Now we compare this to the path integral. Here we use a single timestep, so
$$\psi(x, \epsilon) = \int dy\;K(x, y, \epsilon)\,\psi(y, 0), \qquad K(x, y, \epsilon) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{3/2}\exp\left[\frac{i\epsilon}{\hbar}\left(\frac{m(x - y)^2}{2\epsilon^2} - V(y)\right)\right].$$
The expansion is a little delicate because of the strange dependence on ε. The key is to note that
by the stationary phase approximation, most of the contribution comes from $\xi = x - y = O(\epsilon^{1/2})$.
We then expand everything to first order in ε, treating $\xi = O(\epsilon^{1/2})$, for
$$\psi(x, \epsilon) = \left(\frac{m}{2\pi i\hbar\epsilon}\right)^{3/2}\int d\xi\;\exp\left(\frac{im\xi^2}{2\hbar\epsilon}\right)\left(1 - \frac{i\epsilon}{\hbar}V(x + \xi) + \cdots\right) \times \left(\psi(x, 0) + \xi^i\partial_i\psi(x, 0) + \frac12\,\xi^i\xi^j\partial_i\partial_j\psi(x, 0) + \cdots\right)$$
where we cannot expand the remaining exponential since its argument is O(1). Now we consider
the terms in the products of the two expansions. The O(1) term gives ψ(x, 0), as expected. The
$O(\epsilon^{1/2})$ term gives zero because it is odd in ξ. The O(ε) term is
$$-\frac{i\epsilon}{\hbar}V(x)\,\psi(x, 0) + \frac12\,\xi^i\xi^j\partial_i\partial_j\psi(x, 0).$$
The first of these terms is the potential term. The second term integrates to give the kinetic term.
Finally, the $O(\epsilon^{3/2})$ term vanishes by symmetry, proving the result.
Example. Path integrals in quantum statistical mechanics. Since the density matrix is $\rho = e^{-\beta H}/Z$,
we would like to compute the matrix elements of $e^{-\beta H}$. This is formally identical to what we've
done before if we set $t = -i\hbar\beta$. Substituting this in, we have
$$\langle x|e^{-\beta H}|x_0\rangle = \lim_{N\to\infty}\left(\frac{m}{2\pi\hbar\eta}\right)^{N/2}\int dx_1\cdots dx_{N-1}\,\exp\left[-\frac{\eta}{\hbar}\sum_{j=0}^{N-1}\left(\frac{m(x_{j+1} - x_j)^2}{2\eta^2} + V(x_j)\right)\right]$$
where we have defined $\eta = \hbar\beta/N$, so that $\epsilon = -i\eta$. The relative sign between the kinetic and potential
terms has changed, so we have an integral of the Hamiltonian instead, and the integral is now
damped rather than oscillatory. Taking the continuum limit, the partition function is
$$Z = C\int dx_0\int Dx(u)\,\exp\left(-\frac{1}{\hbar}\int_0^{\beta\hbar} H\,du\right)$$
where the path integral is taken over paths with x(0) = x(βℏ) = x0 . As a simple example, suppose
that the temperature is high, so βℏ is small. Then the particle can't move too far from x(0) in the
short ‘time’ u = βℏ, so we can approximate the potential as constant,
$$Z \approx C\int dx_0\,e^{-\beta V(x_0)}\int Dx(u)\,\exp\left(-\frac{1}{\hbar}\int_0^{\beta\hbar}\frac{m}{2}\left(\frac{dx}{du}\right)^2 du\right) = \sqrt{\frac{m}{2\pi\beta\hbar^2}}\int dx_0\,e^{-\beta V(x_0)}$$
where the last step used the analytically continued free particle propagator. This is the result from
classical statistical mechanics, where Z is simply an integral of e−βH over phase space, but we can
now find corrections order by order in β~.
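A numerical sketch of this classical limit for the harmonic oscillator (with m = ω = ℏ = 1), comparing the exact partition function to the leading approximation above:

```python
import numpy as np

# Sketch: harmonic oscillator with m = omega = hbar = 1. The exact Z should
# approach the leading classical result sqrt(1/(2 pi beta)) * int dx e^{-beta V}
# as beta -> 0 (high temperature).
def Z_exact(beta):
    return np.exp(-beta / 2) / (1 - np.exp(-beta))   # geometric sum over levels

def Z_classical(beta):
    x = np.linspace(-80, 80, 400001)
    config = np.sum(np.exp(-beta * x**2 / 2)) * (x[1] - x[0])
    return np.sqrt(1 / (2 * np.pi * beta)) * config

for beta in (1.0, 0.3, 0.1, 0.03):
    print(beta, Z_classical(beta) / Z_exact(beta))   # ratio -> 1 as beta -> 0
```

The residual ratio behaves as $1 + (\beta\hbar\omega)^2/24$, which is precisely the first of the corrections "order by order in βℏ" mentioned above.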
Example. The harmonic oscillator with frequency ω. This is somewhat delicate since some choices
of (x0 , x, t) give infinitely many branches, or no branches at all. However, assuming we have chosen
a set with exactly one branch, we can show
$$S(x, x_0, t) = \frac{m\omega}{2\sin(\omega t)}\left((x^2 + x_0^2)\cos(\omega t) - 2x x_0\right).$$
To find µ, note that we may write the second variation as
$$\delta^2 S = \frac{m}{2}\int d\tau\;\delta x(\tau)\left(-\frac{d^2}{d\tau^2} - \omega^2\right)\delta x(\tau)$$
by integration by parts; hence we just need the number of negative eigenvalues of the operator
above, where the boundary conditions are δx(0) = δx(t) = 0. The eigenfunctions are of the form
sin(nπτ /t) for positive integer n with eigenvalue (nπ/t)2 − ω 2 . Therefore the number of negative
eigenvalues depends on the value of t, but for sufficiently small t there are none.
Applying the Van Vleck formula gives the exact propagator,
$$K(x, x_0, t) = \sqrt{\frac{m\omega}{2\pi i\hbar\sin(\omega t)}}\,\exp(iS(x, x_0, t)/\hbar), \qquad t < \pi/\omega.$$
Setting t = −i~β and simplifying gives the partition function
$$Z = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}$$
which matches the results from standard statistical mechanics.
Example. Operator ordering in the path integral. At the quantum level, operators generally do not
commute, and their ordering affects the physics. But all the variables in the path integral appear
to commute. It turns out that the operator ordering is determined by the discretization procedure.
For example, for a particle in an electromagnetic field, the correct phase factor is
$$\exp\left[\frac{i\epsilon}{\hbar}\sum_{j=0}^{N-1}\left(\frac{m(x_{j+1} - x_j)^2}{2\epsilon^2} + \frac{q}{c}\,\frac{x_{j+1} - x_j}{\epsilon}\cdot A\!\left(\frac{x_{j+1} + x_j}{2}\right) - V(x_j)\right)\right]$$
where V is evaluated as usual at the initial point, but A is evaluated at the midpoint. One can
show this is the right choice by expanding order by order in ε as we did before. While the evaluation
point of V doesn’t matter, the evaluation point of A ensures that the path integral describes a
Hamiltonian with term p · A + A · p.
Naively, the evaluation point can't matter because it makes no difference in the continuum limit.
The issue is that the path integral paths are not differentiable, as we saw earlier, with $\xi = O(\epsilon^{1/2})$
instead of $\xi = O(\epsilon)$. The midpoint evaluation makes a difference at order $O(\xi^2) = O(\epsilon)$, which
is exactly the term that matters. This subtlety is swept under the rug in the casual, continuum
notation for path integrals.
In general there are various prescriptions for operator ordering, including normal ordering (used
in quantum field theory) and Weyl ordering, which heuristically averages over all possible orders.
However, we won’t encounter any other Hamiltonians below for which this subtlety arises.
Note. If we take the path integral as primary, we can even use it to define the Hilbert space, by
“cutting it open”. Note that by the product property of the path integral,
$$K(x_f, x_0, t) = \int dx'\left(\int_{x(t')=x'}^{x(t)=x_f} Dx(\tau)\,e^{iS/\hbar}\right)\left(\int_{x(0)=x_0}^{x(t')=x'} Dx(\tau)\,e^{iS/\hbar}\right).$$
The extra $\int dx'$ integral produced is an integral over the Hilbert space of the theory. In a more
general setting, such as string theory, we can “cut open” the path integral in different ways, giving
different Hilbert space representations of a given amplitude. This is known as world-sheet duality.
8 Angular Momentum
8.1 Classical Rotations
First, we consider rotations classically.
• Physical rotations are operators R that take spatial points to spatial points in an inertial
coordinate system, preserving lengths and the origin.
• By taking coordinates, $r = x_i\hat e_i$, we can identify every spatial point with a 3-vector. As a result,
we can identify rotation operators R with 3 × 3 rotation matrices $R_{ij}$. Under a rotation $r' = Rr$,
we have $x_i' = R_{ij}x_j$.
• We distinguish the physical rotations R and the rotation matrices R. The latter provide a
representation of the former.
• It’s also important to distinguish active/passive transformations. We prefer the active viewpoint;
the passive viewpoint is tied to coordinate systems, so we can’t abstract out to the geometric
rotations R.
• Using the length-preserving property shows R^T = R^{−1}, so the group of rotations is isomorphic
to O(3). From now on we specialize to proper rotations, with group SO(3). The matrices R
acting on R3 form the fundamental representation of SO(3).
• Every proper rotation can be written as a rotation of an angle θ about an axis n̂, R(n̂, θ).
Proof: every rotation has a unit eigenvalue, because ∏ λi = 1 and |λi | = 1, with complex
eigenvalues coming in conjugate pairs. The corresponding eigenvector is the axis. (Note that
this argument fails in even dimensions, such as for SO(4).)
• Working in the fundamental representation, we consider the infinitesimal elements R = I + A.
Then we require A + At = 0, so the (fundamental representation of the) Lie algebra so(3)
contains antisymmetric matrices. One convenient basis is
(Ji )jk = −εijk
and we write an algebra element as A = a · J.
• Using the above definition, we immediately find
(Ji Jj )kl = δil δkj − δij δkl
which gives the commutation relations
[Ji , Jj ] = εijk Jk , [a · J, b · J] = (a × b) · J.
• More generally, the set of infinitesimal elements of a Lie group is a Lie algebra, and we go
between the two by taking exponentials, or differentiating paths through the origin (to get
tangent vectors).
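As a quick numerical check of the algebra above (a Python sketch, not part of the original notes), one can build the matrices (Ji)jk = −εijk and verify the commutation relations directly:

```python
import numpy as np

# Levi-Civita symbol eps[i, j, k]
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1, -1

# Fundamental-representation generators: (J_i)_{jk} = -eps_{ijk}
J = [-eps[i] for i in range(3)]

# Verify [J_i, J_j] = eps_{ijk} J_k for all i, j
for i in range(3):
    for j in range(3):
        comm = J[i] @ J[j] - J[j] @ J[i]
        assert np.allclose(comm, sum(eps[i, j, k] * J[k] for k in range(3)))
```

The same matrices also verify [a · J, b · J] = (a × b) · J for any numerical vectors a, b.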
A group acts on itself by conjugation; this is called the adjoint action. The Lie algebra is closed
under this operation, giving an action of the group on the algebra. Viewing the algebra as a vector
space, this gives a representation of the Lie group on V = g called the adjoint representation.
Example. In the case of SO(3), the fundamental representation happens to coincide with the
adjoint representation. To see this, note that
R(a × b) = (Ra) × (Rb),
which simply states that the cross product transforms as a vector under rotations (it’s actually a
pseudovector). Then we find
R(a · J)R^{−1} = (Ra) · J.
This provides a representation of the Lie group, representing R as the operator that takes the vector
a to Ra. This is just the fundamental representation, but viewed in a more abstract way – the
vector space now contains infinitesimal rotations rather than spatial vectors.
Another statement of the above is that ‘angular velocity is a vector’. This is not generally
true; in SO(2), it is a scalar and the adjoint representation is trivial; in SO(4), the Lie group is
six-dimensional, and the angular velocity is more properly a two-form.
Example. Variants of the adjoint representation. Exponentiating the above gives the formula for
the adjoint action on the group,
R0 e^{a·J} R0^{−1} = e^{(R0 a)·J}.
We can also derive the adjoint action of an algebra on itself, which yields a representation of the
Lie algebra. First consider conjugation acting on an infinitesimal group element,
R(1 + εh)R^{−1} = 1 + ε RhR^{−1}.
This shows that the adjoint action also conjugates algebra elements. Then if A = 1 + εg with g ∈ g,
(1 + εg)h(1 + εg)^{−1} = h + ε[g, h] + O(ε^2).
Taking the derivative with respect to ε to define the algebra’s adjoint action, we find that g acts
on h by sending it to [g, h]. Incidentally, this is also a proof that the Lie algebra is closed under
commutators, since we know the algebra is closed under the adjoint action.
As a direct example, consider the matrix Lie group SO(3). Since the operation is matrix
multiplication, the commutator above is just the matrix commutator. Our above calculations shows
that the adjoint action of the Lie algebra so(3) on itself is the cross product.
Note. Noncommutativity in the Lie group reflects a nontrivial Lie bracket. The first manifestation
of this is the fact that
e^{tg} e^{th} e^{−tg} e^{−th} = 1 + t^2 [g, h] + . . .
This tells us that a nonzero Lie bracket causes the corresponding group elements to not commute;
as a simple example, the commutator of small rotations about x̂ and ŷ is a rotation about x̂ × ŷ = ẑ.
Conversely, if the Lie bracket is zero, the commutator is zero.
Another form of the above statement is the Baker–Campbell–Hausdorff theorem, which is the
matrix identity
e^X e^Y = e^Z , Z = X + Y + (1/2)[X, Y ] + (1/12)[X, [X, Y ]] + (1/12)[Y, [Y, X]] + . . .
where all the following terms are built solely out of commutators of X and Y . Therefore, if we can
compute the commutator in the algebra, we can in principle compute multiplication in the group.
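A small numerical illustration of the group-commutator statement (a Python sketch using scipy's matrix exponential; not from the original notes): for g = Jx and h = Jy in the fundamental representation, the product e^{tg}e^{th}e^{−tg}e^{−th} matches 1 + t²[g, h] to the stated order.

```python
import numpy as np
from scipy.linalg import expm

# so(3) generators in the fundamental representation, (J_i)_{jk} = -eps_{ijk}
Jx = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
Jy = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)

t = 1e-3
lhs = expm(t * Jx) @ expm(t * Jy) @ expm(-t * Jx) @ expm(-t * Jy)
rhs = np.eye(3) + t**2 * (Jx @ Jy - Jy @ Jx)
# agreement up to O(t^3) corrections
assert np.allclose(lhs, rhs, atol=1e-8)
```

This is the quantitative version of "the commutator of small rotations about x̂ and ŷ is a rotation about ẑ".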
The group SO(3) is a compact connected three-dimensional manifold; it is also the configuration
space for a rigid body, so wavefunctions for rigid bodies are defined on the SO(3) manifold. As
such, it’s useful to have coordinates for it; one set is the Euler angles.
Note. The Euler angles. A rotation corresponds to an orientation of a coordinate system; therefore,
we can specify a rotation uniquely by defining axes x̂0 , ŷ0 , ẑ0 that we would like to rotate our original
axes into. Suppose the spherical coordinates of ẑ0 in the original frame are α and β. Then the
rotation
R(ẑ, α)R(ŷ, β)
will put a vector originally pointing along ẑ along ẑ0 . However, the x̂ and ŷ axes won’t be in the
right place. To fix this, we can perform a pre-rotation about ẑ before any of the other rotations;
therefore, any rotation may be written as
R(α, β, γ) = R(ẑ, α)R(ŷ, β)R(ẑ, γ).
This is the zyz convention for the Euler angles. We see that α and γ range from 0 to 2π, while β
ranges from 0 to π. The group manifold SO(3), however, is not S 1 × S 1 × [0, π]. This is reflected
in the fact that for extremal values of the angles, the Euler angle parametrization is not unique.
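To make the geometry concrete, here is a short numerical check (a Python sketch, illustrative only) that R(ẑ, α)R(ŷ, β) sends ẑ to the direction with spherical angles (β, α), as claimed above:

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

alpha, beta = 1.1, 0.6  # arbitrary sample angles
v = Rz(alpha) @ Ry(beta) @ np.array([0.0, 0.0, 1.0])
target = np.array([np.sin(beta) * np.cos(alpha),
                   np.sin(beta) * np.sin(alpha),
                   np.cos(beta)])
assert np.allclose(v, target)
```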
• Given a quantum mechanical system with an associated Hilbert space, we expect rotations R
are realized by unitary operators U (R) on the space. It is reasonable to expect that R → U (R)
is a group homomorphism, so we have a representation of SO(3) on the Hilbert space.
• Given a representation of a Lie group, we automatically have a representation of the Lie algebra.
Specifically, we define
Jk = i~ ∂U (θ)/∂θk |_{θ=0}
where U (θ) is the rotation with axis θ̂ and angle θ. Then we must have
[Ji , Jj ] = i~εijk Jk .
• The operators J generate rotations, the factor of i makes them Hermitian, and the factor of
~ makes them have dimensions of angular momentum. We hence define J to be the angular
momentum operator of the system.
Since we can recover a representation of the group by exponentiation, it suffices to find repre-
sentations of the algebra, i.e. triplets of matrices that satisfy the above commutation relations.
• Even though the angular momentum of a spin 1/2 particle is not a vector, we still expect that
angular momentum behaves like a vector under rotations, in the sense that the expectation
value hJi transforms as a vector. Then we require
U (R)† JU (R) = RJ.
• The above formula is equivalent to our earlier adjoint formula. Inverting and dotting with a,
we find
U (a · σ)U † = (Ra) · σ.
This is just another formula for the adjoint action; conjugation by the group takes a to Ra.
Note. Euler angle decomposition also works for spinor rotations, with
U (α, β, γ) = U (ẑ, α)U (ŷ, β)U (ẑ, γ),
where the elementary rotations are
U (x̂, θ) = [[cos θ/2, −i sin θ/2], [−i sin θ/2, cos θ/2]],
U (ŷ, θ) = [[cos θ/2, − sin θ/2], [sin θ/2, cos θ/2]],
U (ẑ, θ) = [[e^{−iθ/2}, 0], [0, e^{iθ/2}]],
and α ∈ [0, 2π], β ∈ [0, π], γ ∈ [0, 4π]. The extended range of γ accounts for the double cover.
To see that this gives all rotations, note that classical rotations R are a representation of spinor
rotations U with kernel ±I. Then with the extended range of γ, which provides the −1, we get
everything.
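These spinor rotations are easy to check numerically (a Python sketch using scipy, not part of the original notes): U(n̂, θ) = exp(−iθ n̂ · σ/2) reproduces the matrices above, and a rotation by 2π gives −1 rather than +1, reflecting the double cover.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def U(n, theta):
    """Spinor rotation exp(-i theta n.sigma / 2)."""
    return expm(-0.5j * theta * (n[0] * sx + n[1] * sy + n[2] * sz))

th = 0.7
# closed form for U(x, theta) quoted above
Ux = np.array([[np.cos(th / 2), -1j * np.sin(th / 2)],
               [-1j * np.sin(th / 2), np.cos(th / 2)]])
assert np.allclose(U([1, 0, 0], th), Ux)
# a rotation by 2 pi is -1, not +1
assert np.allclose(U([0, 0, 1], 2 * np.pi), -np.eye(2))
```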
Example. The ket |+i = (1, 0) points in the +ẑ direction, since h+|σ|+i = ẑ and σz |+i = |+i.
Similarly, we can define the kets pointing in arbitrary directions as
|n̂, +i = U |+i.
Then the expectation value of the spin along any direction perpendicular to n̂ vanishes.
Note. The above reasoning doesn’t work for higher spin. For example, using the notation in the
next section, for a spin 1 particle, the state (0, 1, 0) has hσi = 0, so it’s not ‘pointing’ in any direction.
For spin higher than 1/2, the action of the rotation operators U (R) on the states |ψi isn’t even
transitive, since the dimension of SU (2) is less than the (real) dimension of the state space. (This
is compatible with the spin representations being irreps, as that only requires that the span of the
entire orbit of each vector is the whole representation.)
We now consider general representations of su(2) on a Hilbert space. That is, we are looking
for triplets of operators J satisfying the angular momentum commutation relations. Given these
operators, we can recover the rotation operators by exponentiation; conversely, we can get back to
the angular momentum operators by differentiation at θ = 0.
• From the components of J we can form the scalar operator J 2 = Jx2 + Jy2 + Jz2 ,
which commutes with J; such an operator is called a Casimir operator. As a result, J 2 commutes
with any function of J, including the rotation operators.
• Given the above structure, we consider simultaneous eigenkets |ami of J 2 and J3 , with eigenval-
ues ~2 a and ~m. Since J 2 and J3 are Hermitian, a and m are real, and since J 2 is nonnegative
definite, a ≥ 0. For simplicity, we assume we are dealing with an irrep; physically, we can
guarantee this by postulating that J 2 and J3 form a CSCO.
• Since the norm of J+ |ami must be nonnegative, we find
~2 (a − m(m + 1)) ≥ 0,
and similarly, for J− |ami,
~2 (a − m(m − 1)) ≥ 0.
Therefore, we require a ≥ max(m(m + 1), m(m − 1)). If the maximum value of |m| is j, the
corresponding value of a is j(j + 1). For convenience, we switch to labeling the states by j and
m values.
• In terms of j, the first condition reads
a − m(m + 1) = (j − m)(j + m + 1) ≥ 0
where we have equality if j = m. (The other case, m = −j − 1, is forbidden by our second
equation.) Doing a similar analysis on the second equation, we conclude m ≥ −j, with equality
exactly when J− |jmi = 0.
• Finally, using the commutation relations, we see that acting with J± doesn’t change the j value,
but raises/lowers m by 1.
As a result, we conclude that m − j is an integer; if not, we can keep applying the raising
operator until our inequalities above are broken. Similarly, m − (−j) is an integer. Therefore,
2j is an integer and m = −j, . . . , +j. These are all of the irreps of su(2).
Now that we’ve found all of the irreps, we turn to calculations and applications.
Above we used the phase freedom in the |jmi to set all possible phase factors to zero. Then
|jmi = sqrt[(j + m)!/((2j)!(j − m)!)] (J− /~)^{j−m} |jji.
• Given the above, we know the matrix elements of J± , as well as the matrix elements of J3 ,
J± |jmi = ~ sqrt(j(j + 1) − m(m ± 1)) |j, m ± 1i, J3 |jmi = ~m|jmi.
Then we can simply write down the matrix elements of all of the J, and hence the matrix of
any function of J, including the rotation operators.
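These matrix elements are easy to implement; the following Python sketch (with ħ = 1, illustrative and not part of the original notes) builds J3 and J± for any j and checks the commutation relations and the Casimir:

```python
import numpy as np

def jmatrices(j):
    """J3 and J+ in the |j m> basis (m = j, j-1, ..., -j), with hbar = 1."""
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)
    J3 = np.diag(m)
    Jp = np.zeros((dim, dim))
    for k in range(1, dim):
        # <j, m+1| J+ |j, m> = sqrt(j(j+1) - m(m+1))
        Jp[k - 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] + 1))
    return J3, Jp

for j in [0.5, 1, 1.5, 2]:
    J3, Jp = jmatrices(j)
    Jm = Jp.T
    Jsq = J3 @ J3 + (Jp @ Jm + Jm @ Jp) / 2
    assert np.allclose(J3 @ Jp - Jp @ J3, Jp)      # [J3, J+] = J+
    assert np.allclose(Jp @ Jm - Jm @ Jp, 2 * J3)  # [J+, J-] = 2 J3
    assert np.allclose(Jsq, j * (j + 1) * np.eye(Jsq.shape[0]))
```

For j = 1 this reproduces the example matrices given below.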
Note. The j values which appear must be determined separately for each physical situation. If
we’re considering central force motion of a particle, it turns out that only integral j matter. If we
consider p-wave scattering, j = 1 appears. The spin state of a photon is (roughly) described by
j = 1, but the spin state of two electrons is described by j = 0, 1.
Example. In the case j = 1, we have
J3 = ~ diag(1, 0, −1), J+ = √2 ~ [[0, 1, 0], [0, 0, 1], [0, 0, 0]].
Evaluating e−2πiJ3 /~ , we find that a rotation by 2π is the identity. In general, for integer j, we end
up with a normal representation of SO(3), rather than a projective one.
Note. Reading a table of rotation matrices. The operator U (n̂, θ) has matrix elements
D^j_{m′m}(U ) = hjm′ |U |jmi.
Note that U must be diagonal in j-space, so we aren’t missing any information here. We think of
the D^j_{m′m} as a set of matrices indexed by j. Parametrizing a rotation by Euler angles as above,
D^j_{m′m}(α, β, γ) = e^{−im′α} d^j_{m′m}(β) e^{−imγ}.
Here, d^j_{m′m}(β) is the reduced rotation matrix. Using tables of d^j_{m′m} values, we may construct
rotation matrices for arbitrary spin.
The Dj matrices have numerous properties which aid calculation. We can view them as a
representation of the U operators; the distinction is that while the U operators act on a physical
Hilbert space, the Dj matrices are just numbers acting on vectors of numbers. Since U is unitary,
and we are using the orthonormal basis |jmi, the Dj are also unitary. These two properties imply
D^j_{mm′}(U^{−1}) = D^{j∗}_{m′m}(U ).
Therefore, the left-hand side is J − θ n̂ × J, which is simply the infinitesimal spatial rotation R^{−1} applied to J.
• Experimentally, we find that for nuclei and elementary particles, µ ∝ J, and the relevant state
space is just a single copy of a single irrep of su(2).
• For a classical current loop with total mass m and charge q, we can show that
µ = (q/2mc) L.
The coefficient is called the gyromagnetic ratio γ. For general configurations, µ and L need
not be proportional, since the former depends only on the current distribution while the latter
depends only on the mass distribution. However, the relation does hold for orbital angular
momentum in quantum mechanics, as we’ll justify below.
• For spin, the relation above holds with a modified gyromagnetic ratio,
µ = g (q/2mc) S.
For electrons, µB = e~/2mc is called the Bohr magneton, and g ≈ 2.
• For nuclei, the magnetic moment must be determined experimentally. Since many nuclei are
neutral but still have magnetic moments, it is useful to define the g-factors in terms of the
nuclear magneton,
µ = gµN S/~, µN = q~/2mp c
where q is the elementary charge and mp is the proton mass. For the proton and neutron,
gp ≈ 5.56, gn ≈ −3.83.
Note the factors of 2. When we take magnitudes, µN gives a 1/2, S gives a 1/2, and for electrons
only, g gives a 2.
• The magnetic moment of the proton comes from a mix of the spin and orbital motion of the
quarks and gluons. Similarly, the magnetic moment of the deuteron (one proton and one
neutron) comes from a combination of the magnetic moments of the proton and neutron, and
the orbital motion of the proton. For spin zero particles, like the α particle, S = 0, so µ = 0.
• Assuming rotational invariance, the spectrum of the Hamiltonian is split into irreps each
containing 2j + 1 degenerate states. Now, since accidental degeneracies are very unlikely, the
irreps won’t be degenerate; instead, they will be separated by energies on the nuclear energy
scale. This energy scale is much larger than the splitting within each irrep induced by an external
field; therefore, if the nucleus starts in the ground state, it suffices to only consider the lowest-
energy irrep. (While additional symmetries can cause more degeneracies, such symmetries are
not generic.)
• The above argument explains the situation for nuclei. For fundamental particles, the reason
there isn’t degeneracy of different j is that no symmetries besides supersymmetry can relate
particles of different j. This is the Coleman–Mandula theorem, and its proof requires relativistic
quantum field theory.
• Supposing that a single irrep is relevant, we will show below that every vector operator (i.e.
triplet of operators transforming as a vector) is a multiple of J. Since µ is a vector, µ ∝ J.
• In the case of atoms, the irreps are much closer together, as the atomic energy scale is much
smaller than the nuclear energy scale. In this case we do see mixing of irreps for sufficiently
strong fields, such as in the strong field Zeeman effect. Each irrep has its own g-factor, so that
the total µ is no longer proportional to the total angular momentum, recovering the classical
behavior.
We now consider the example of a spinless particle in three-dimensional space. We again assume
rotational symmetry, which in this case means V = V (r).
• We can define angular momentum as x×p, but instead we define it as the generator of rotations,
which is more fundamental. Let
U (R)|xi = |Rxi.
Then it’s straightforward to check the U (R) are a unitary representation of SO(3).
• Wavefunctions transform as
ψ′ (x) = ψ(R^{−1} x).
One way of remembering this rule is to note that if the rotation takes x to x0 , then we must
have ψ 0 (x0 ) = ψ(x). This rule is necessary in the active point of view, which we take throughout
these notes.
• Note that in this context, x and p don’t have ordering issues. For example, we have
x × p = −p × x, x · L = p · L = 0.
The reason is that there are only nonzero commutators between xi and the same component of
momentum pi , and the cross products prevent components from matching.
• We now find the standard angular momentum basis |lmi in the position basis. That is, we are
looking for wavefunctions ψlm (x) such that
Lz ψlm = ~m ψlm , L2 ψlm = ~2 l(l + 1) ψlm .
In spherical coordinates, the operators are
Lz = −i~ ∂φ , L2 = −~2 [ (1/sin θ) ∂θ (sin θ ∂θ ) + (1/sin^2 θ) ∂φ^2 ].
That is, L2 is just the spherical Laplacian, up to a constant factor.
• We notice that ∂r appears nowhere above, which makes sense since angular momentum generates
rotations, which keep r constant. Therefore, it suffices to find wavefunctions on the unit sphere,
f (θ, φ) = f (r̂). We define their inner product by
Z
hf |gi = dΩ f (θ, φ)∗ g(θ, φ), dΩ = sin θ dθdφ.
As an example, the state |ri has angular wavefunction δ(θ − θ0 )δ(φ − φ0 )/ sin θ, where the sine
cancels the Jacobian factor in dΩ.
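As a sanity check on the differential operator (a sympy sketch, illustrative only, with ħ = 1): applying L² to cos θ (∝ Y10) and to sin θ e^{iφ} (∝ Y11) should return l(l + 1) = 2 times the function.

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')

def L2(f):
    """L^2 = -[(1/sin t) d_t (sin t d_t) + (1/sin^2 t) d_p^2], with hbar = 1."""
    return -(sp.diff(sp.sin(theta) * sp.diff(f, theta), theta) / sp.sin(theta)
             + sp.diff(f, phi, 2) / sp.sin(theta) ** 2)

# l = 1 eigenfunctions, eigenvalue l(l+1) = 2
for f in [sp.cos(theta), sp.sin(theta) * sp.exp(sp.I * phi)]:
    assert sp.simplify(L2(f) - 2 * f) == 0
```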
• The solutions for the ψlm on the sphere are the spherical harmonics Ylm . Using the definition
of Lz , we have Ylm ∝ eimφ . After solving for Yll , we apply the lowering operator to find
Ylm (θ, φ) = ((−1)^l /2^l l!) sqrt[((2l + 1)/4π)((l + m)!/(l − m)!)] (e^{imφ}/sin^m θ) (d/d cos θ)^{l−m} sin^{2l} θ.
Here, the choice of phase factor (−1)l is conventional and makes Yl0 real and positive at the
North pole. The (l + m)!/(l − m)! normalization factor comes from the application of L− .
• We may also write the θ dependence in terms of the Legendre polynomials, which can be given
by the Rodrigues formula
Pl (x) = ((−1)^l /2^l l!) (d^l /dx^l )(1 − x^2 )^l ,
and the associated Legendre functions
Plm (x) = (1 − x^2 )^{m/2} (d^m /dx^m ) Pl (x).
This yields
Ylm (θ, φ) = (−1)^m sqrt[((2l + 1)/4π)((l − m)!/(l + m)!)] e^{imφ} Plm (cos θ), m ≥ 0
where the m < 0 spherical harmonics are related by
Yl,−m = (−1)^m Ylm^∗ .
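These conventions can be verified symbolically; the sympy sketch below (illustrative, not from the original notes) implements the Legendre-based formula and compares against the standard closed forms for l = 1, checking Y11 at a sample point to sidestep the sign of sqrt(1 − cos²θ):

```python
import sympy as sp

x, theta, phi = sp.symbols('x theta phi')

def P(l):
    # Rodrigues formula: P_l = (-1)^l / (2^l l!) d^l/dx^l (1 - x^2)^l
    return sp.diff((1 - x**2)**l, x, l) * (-1)**l / (2**l * sp.factorial(l))

def Plm(l, m):
    return (1 - x**2)**sp.Rational(m, 2) * sp.diff(P(l), x, m)

def Y(l, m):
    N = sp.sqrt((2*l + 1) / (4 * sp.pi) * sp.factorial(l - m) / sp.factorial(l + m))
    return (-1)**m * N * sp.exp(sp.I * m * phi) * Plm(l, m).subs(x, sp.cos(theta))

# Y_1^0 = sqrt(3/4pi) cos(theta)
assert sp.simplify(Y(1, 0) - sp.sqrt(3 / (4 * sp.pi)) * sp.cos(theta)) == 0
# Y_1^1 = -sqrt(3/8pi) sin(theta) e^{i phi}, checked numerically
lhs = complex(Y(1, 1).subs({theta: 0.7, phi: 0.3}).evalf())
rhs = complex((-sp.sqrt(3 / (8 * sp.pi)) * sp.sin(0.7) * sp.exp(sp.I * 0.3)).evalf())
assert abs(lhs - rhs) < 1e-12
```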
• In the above analysis, we have found that precisely one copy of each integer irrep appears, since
the solution to L+ ψll = 0 is unique for each l.
For a particle in three-dimensional space, the Ylm will be multiplied by a function u(r). Then
multiple copies of each irrep may appear, depending on how many solutions there are for u(r),
and we must index the states by a third quantum number (e.g. n for the hydrogen atom).
• The spherical harmonics are then our standard angular momentum basis |lmi. We can find
an identity by computing hr̂|U (R)|lmi in two different ways. Acting on the right, we have
Ylm (R^{−1} r̂). Alternatively, we may insert a resolution of the identity, for
Σ_{m′} hr̂|lm′ ihlm′ |U (R)|lmi = Σ_{m′} Ylm′ (r̂) D^l_{m′m}(R).
Here, we only needed to insert states with the same l since they form an irrep. Then
Ylm (R^{−1} r̂) = Σ_{m′} Ylm′ (r̂) D^l_{m′m}(R).
• One useful special case of the above is to choose r̂ = ẑ and replace R with R−1 , for
Ylm (r̂) = Σ_{m′} Ylm′ (ẑ) D^l_{m′m}(R^{−1})
where R is the rotation that maps ẑ to r̂, i.e. the one with Euler angles α = φ and β = θ.
Moreover, only the m = 0 spherical harmonic is nonzero at ẑ (because of the centrifugal force),
and plugging it in gives
Ylm (θ, φ) = sqrt[(2l + 1)/4π] D^{l∗}_{m0}(φ, θ, 0)
where we applied the unitarity of the D matrices.
• For a multiparticle system, with state space |x1 , . . . , xn i, the angular momentum operator
is L = Σi xi × pi . To construct the angular momentum basis, we use addition of angular
momentum techniques, as discussed later.
momentum techniques, as discussed later.
Note. One can show that rl Ylm is a homogeneous polynomial of degree l in the Cartesian
coordinates. In this representation, it is also easy to see that the parity of Ylm is (−1)l .
• Consider a spinless particle moving in a central potential. Since L2 and Lz commute with H,
the eigenstates are of the form
ψ(x) = R(r)Ylm (θ, φ).
Substituting this into the Schrodinger equation, and noting that L2 is −~2 /r2 times the angular
part of the Laplacian, we have
−(~^2/2m)(1/r^2 ) ∂r (r^2 ∂r R) + U R = ER, U (r) = V (r) + l(l + 1)~^2/2mr^2
where the extra contribution to the effective potential U (r) is equal to L2 /2mr2 . As in the
classical case, this is the angular part of the kinetic energy.
• Next, we let f (r) = rR(r). This is reasonable, because then |f |2 gives the radial probability
density, so we expect this should simplify the radial kinetic energy term. Indeed we have
−(~^2/2m) d^2 f /dr^2 + U (r)f (r) = Ef (r), ∫0^∞ dr |f (r)|^2 = 1.
The resulting equation looks just like the regular 1D Schrodinger equation, but on (0, ∞).
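To illustrate, one can discretize this 1D equation directly. The Python sketch below (my own illustration, assuming atomic units and the Coulomb potential V = −1/r that appears later in this section, with l = 0) recovers the hydrogen levels En = −1/2n²:

```python
import numpy as np

# Radial equation -(1/2) f'' + U f = E f on (0, rmax), atomic units,
# with U = V = -1/r for l = 0 and boundary conditions f(0) = f(rmax) = 0.
N, rmax = 1500, 50.0
r = np.linspace(0, rmax, N + 2)[1:-1]   # interior grid points
dr = r[1] - r[0]

# Three-point finite-difference Hamiltonian (symmetric tridiagonal)
main = 1.0 / dr**2 - 1.0 / r
off = -0.5 / dr**2 * np.ones(N - 1)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

E = np.linalg.eigvalsh(H)[:2]
# lowest levels should approximate -1/2 and -1/8
assert abs(E[0] + 0.5) < 5e-3
assert abs(E[1] + 0.125) < 5e-3
```

The same code handles any central potential by changing `main`, including the centrifugal term for l > 0.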
• We could also have arrived at this conclusion using separation of variables. Generally, this
technique works when there is a continuous symmetry. Then the (differential) operator that
generates this symmetry commutes with the Hamiltonian, and we can take the eigenfunctions
to be eigenfunctions of that operator. In an appropriate coordinate system (i.e. when fixing
some of the coordinates gives an orbit of the symmetry) this automatically gives separation
of variables; for example, Lz generates rotations which change only φ, so diagonalizing Lz
separates out the coordinate φ.
These account for the bound states; there also may be unbound states with a continuous
spectrum. Focusing on just the bound states, the irreps are indexed by n and l and each contain
2l + 1 states.
• There generally is no degeneracy in l unless there is additional symmetry; this occurs for the
hydrogen atom (hidden SO(4) symmetry) and the 3D harmonic oscillator (SU (3) symmetry).
• The hydrogen atom’s energy levels are also degenerate in ms . This is simply because nothing
in the Hamiltonian depends on the spin, but in terms of symmetries, it is because there are
two independent SU (2) rotational symmetries, which act on the orbital or spin parts alone.
• Next, we consider degeneracy in n, i.e. degenerate eigenfunctions f (r) of the same effective
potential. These eigenfunctions satisfy the same Schrodinger equation (with the same energy E
and effective potential U (r)), so there can be at most two of them, as the Schrodinger equation
is second-order. However, as we’ll show below, we must have f (0) = 0, which effectively
removes one degree of freedom – eigenfunctions are solely determined by f 0 (0). Therefore there
is only one independent solution for each energy, bound or not, so different values of n are
nondegenerate. (In the bound case, we can also appeal to the fact that f vanishes at infinity.)
Therefore we conclude irreps are generically nondegenerate.
• We now consider the behavior of R(r) for small r. If R(r) ∼ ar^k for small r, then the terms in
the reduced (1D) Schrodinger equation scale as
f ′′ ∼ k(k + 1) r^{k−1} , (l(l + 1)/r^2 )f ∼ l(l + 1) r^{k−1} , V f ∼ r^k , Ef ∼ r^{k+1} .
If we suppose the potential is regular at the origin and diverges no faster than 1/r, then the
last two terms are negligible. Then for the equation to remain true, the first two terms must
cancel, so
k(k + 1) = l(l + 1), k = l or k = −l − 1.
The second solution is nonnormalizable for l ≥ 1, so we ignore it. For l = 0, it gives R(r) ∝ 1/r,
which is the solution for the delta function potential, which we have ruled out by regularity.
Therefore the first solution is physical, and R(r) ∝ r^l for small r.
Example. Two-body interactions. Suppose that two massive bodies interact with Hamiltonian
H = p1^2 /2m1 + p2^2 /2m2 + V (|x1 − x2 |).
In this case it’s convenient to switch to the coordinates
R = (m1 x1 + m2 x2 )/M, r = x2 − x1
M
where M = m1 + m2 . Defining the conjugate momenta P = −i~∂R and p = −i~∂r , we have
P = p1 + p2 , p = (m1 p2 − m2 p1 )/M.
M
This transformation is an example of a canonical transformation, as it preserves the canonical
commutation relations. The Hamiltonian becomes
H = P^2 /2M + p^2 /2µ + V (r), 1/µ = 1/m1 + 1/m2 .
We see that P 2 /2M commutes with H, so we can separate out the variable R, giving the overall
center-of-mass motion. We then focus on the wavefunction of the relative coordinate, ψ(r). This
satisfies the same equation as a single particle in a central force, with m replaced with µ.
Finally, we may decompose the total angular momentum L = L1 + L2 into
L=R×P+r×p
which is an ‘orbit’ plus ‘spin’ (really, ‘relative’) contribution, just as in classical mechanics. The
relative contribution commutes with the relative-coordinate Hamiltonian p2 /2µ + V (r), so the
quantum numbers l and m in the solution for ψ(r) refer to the angular momentum of the particles
in their CM frame.
Example. The rigid rotor. Consider two masses m1 and m2 connected with a massless, rigid rod
of length r0 . The Hamiltonian is
H = L^2 /2I, I = µr0^2 .
Since the length r0 is fixed, there is no radial dependence; the solution is just
El = l(l + 1)~^2 /2µr0^2 , ψlm (θ, φ) = Ylm (θ, φ).
This can also be viewed as a special case of the central force problem, with a singular potential.
• For a typical diatomic molecule, such as CO, the reduced mass is on the order of several
times the atomic mass, so the rotational energy levels are much more closely spaced than the
atomic levels. (Here, we treat the two atoms as point particles; this is justified by the Born–
Oppenheimer approximation, which works because the electronic degrees of freedom are faster,
i.e. higher energy.) There are also vibrational degrees of freedom due to oscillations in the
separation distance between the atoms.
• To estimate the energy levels of the vibrational motion, we use dimensional analysis on the
parameters m, e, and ~, where m and e are the mass and charge of the electron; this is
reasonable because valence electrons are responsible for bonding. We don’t use c, as the
situation is nonrelativistic.
In atomic units, we set e = m = ~ = 1, so c = 1/α ≈ 137.
• Now, we estimate the diatomic bond as a harmonic oscillator near its minimum. Assuming that
the ‘spring constant’ of the bond is about the same as the ‘spring constant’ of the bond between
the valence electrons and their own atoms (which makes sense since the bond is covalent), and
using ω ∝ 1/√m, we have
ωvib = √(m/M ) ω0 , ω0 = K0 /~
where M is the reduced mass, on the order of 104 m. Therefore the vibrational energy level
spacing is about 100 times closer than the electronic energy level spacing, or equivalently the
bond dissociation energy.
• Similarly, the rotational energy scale is
∆Erot = ~^2 /2I ∼ ~^2 /M a0^2 = (m/M ) K0 ∼ 10^{−4} K0 .
The rotational levels are another factor of 100 times closer spaced than the vibrational ones.
• At room temperature, the rotational levels are active, and the vibrational levels are partially
or completely frozen out, depending on the mass of the atoms involved.
• For a hydrogen-like ion, with electron charge eel = e and nuclear charge enuc = Ze, the Bohr
model scalings are as follows.
– The characteristic distance is a = ~2 /meel enuc = a0 /Z, so the electrons orbit closer for
higher Z.
– The characteristic energy is K = eel enuc /a = Z 2 K0 , so the energies are higher for higher Z.
– The characteristic velocity is v = eel enuc /~ = Zv0 = (Zα)c, so for heavy nuclei, the
nonrelativistic approximation breaks down.
• We can now solve the equation by standard methods. As an overview, we first take the high ρ
limit to find the asymptotic behavior for normalizable solutions, f ∝ e−ρ/2 . We also know that
at small ρ, R(r) ∝ rl , so f (r) ∝ rl+1 . Peeling off these two factors, we let
f (ρ) = ρ^{l+1} e^{−ρ/2} g(ρ).
• If one expands g(ρ) in a power series, one obtains a recursion relation for the power series
coefficients. If the series does not terminate, this series sums up to a growing exponential eρ
that causes f (ρ) to diverge. It turns out the series terminates if
ν = n ∈ Z, l < n.
• If one is interested in the non-normalizable solutions, one way to find them is to peel off
f (ρ) = ρ−l e−ρ/2 h(ρ) and expand h(ρ) in a power series. This is motivated by the fact that the
non-normalizable solutions to the Laplace equation look like ρ−l−1 at small ρ.
• The solutions for f are polynomials of degree n times the exponential e−ρ/2 , with energies
En = −1/(2n^2 )
independent of l. Therefore we have n2 degeneracy for each value of n, or 2n2 if we count the
spin. Restoring ordinary units, the energies are
En = −Z^2 e^4 m/(2~^2 n^2 )
where m is really the reduced mass, which is within 0.1% of the electron mass.
• The bound l < n can be understood classically. For a planet orbiting a star with a fixed energy
(and hence fixed semimajor axis), there is a highest possible angular momentum corresponding
to l ≈ n (in some units), corresponding to a circular orbit. The analogous quantum states have
f (ρ) peaked around a single value. The low angular momentum states correspond to long, thin
ellipses, and indeed the corresponding f (ρ) extend further out with multiple nodes.
Note. Many perturbations break the degeneracy in l. For example, consider an alkali atom, i.e. a
neutral atom with one valence electron. The potential interpolates between −e2 /r at long distances
and −Ze2 /r at short distances, because of the shielding effect of the other electrons. Orbits which
approach the core are lowered in energy, and this happens more for low values of l. In sodium, this
effect makes the 3s state significantly lower in energy than the 3p state. In general atoms, this
causes the strange ordering of orbital filling in the aufbau principle.
In practice, these energy level shifts can be empirically parametrized as
En` = −Z^2 e^4 m/(2~^2 (n − δ` )^2 )
where δ` is called the quantum defect, which rapidly falls as ` increases and does not depend on n.
For example, the electron energies in sodium can be fit fairly well by taking δs = 1.35, δp = 0.86, and
all others zero. The reason this works is that, for each fixed ` and in the Hartree–Fock approximation,
the energies En` are the energy eigenvalues associated with a fixed radial potential, which has a
1/r tail. A correspondence principle argument, just like that used to derive the Bohr model, shows
that En` ∝ 1/(n − δ` )2 for integers n with n ≫ 1. Thus the quantum defect is an excellent way
to parametrize the energy levels of a Rydberg atom, i.e. an atom with an electron in a state with
n ≫ 1. It turns out, just as for the Bohr model, that it still works decently for n ∼ 1.
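As a quick plausibility check (a Python sketch; the defect values are the ones quoted above, and the comparison of the 3s–3p splitting to the roughly 2.1 eV sodium D line is my illustration, not a claim from the notes):

```python
# Quantum-defect estimate of sodium levels, atomic units (1 hartree = 27.2 eV)
defects = {'s': 1.35, 'p': 0.86, 'd': 0.0}

def E(n, l):
    return -1.0 / (2.0 * (n - defects[l]) ** 2)

# 3s -> 3p transition energy; the observed sodium D line is about 2.1 eV
dE_eV = (E(3, 'p') - E(3, 's')) * 27.211
assert 1.9 < dE_eV < 2.2
```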
For reference, we summarize facts about special functions and the contexts in which they appear.
• The time-independent Schrodinger equation is
−∇^2 ψ + V ψ = Eψ
(in units where ~^2 /2m = 1), which comes from separating the ordinary Schrodinger equation.
We only consider the rotationally symmetric case V = V (r).
• If we separate the wave equation, the spatial part is the Helmholtz equation, which is the special
case V = 0 above. If we further set E = 0 above, we get Laplace’s equation, whose solutions
are harmonic functions. These represent static solutions of the wave equation.
• It only makes sense to add source terms to full PDEs, not separated ones, so we shouldn’t add
sources to the time-independent Schrodinger equation or the Helmholtz equation. By contrast,
Laplace’s equation is purely spatial, and adding a source term gives Poisson’s equation.
– The spherical harmonics Y`m (θ, φ) form a complete basis for functions on the sphere. The
quantity ` can take on nonnegative integer values.
– They are proportional to eimφ times an associated Legendre function P`m (cos θ).
– Setting m = 0 gives the Legendre polynomials, which are orthogonal on [−1, 1].
– More generally, the associated Legendre functions satisfy orthogonality relations which,
combined with those for eimφ , ensure that the spherical harmonics are orthogonal.
– Spherical harmonics are not harmonic functions on the sphere. Harmonic functions on the
sphere have zero L2 eigenvalue, and the only such function is the constant function Y00 .
– If we were working in two dimensions, we’d just get eimθ .
• The radial equation depends on the potential V (r) and the total angular momentum `, which
contributes a centrifugal force term.
– For V = 0, the solutions are spherical Bessel functions, j` (r) and y` (r). They are called
spherical Bessel functions of the first and second kind; the latter are singular at r = 0.
– For high r, the Bessel functions asymptote to sinusoids with amplitude 1/r. (As a special
case, setting ` = 0 gives j0 (r) = sin(r)/r, y0 (r) = −cos(r)/r, recovering the familiar form of
an isotropic spherical wave.)
– If we were working in two dimensions, we would instead get the ordinary, or cylindrical
Bessel functions.
– We define the (spherical) Hankel functions in terms of linear combinations of Bessel functions
to correspond to incoming and outgoing waves at infinity.
– For a Coulomb field, the solutions are exponentials times associated Laguerre polynomials.
Again, there are two solutions, with exponential growth and decay, but only the decaying
solution is relevant for bound states.
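Returning to the spherical Bessel functions above, the ` = 0 closed forms are easy to confirm against scipy (a sketch; note the sign convention for y0):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

r = np.linspace(0.1, 20, 200)
assert np.allclose(spherical_jn(0, r), np.sin(r) / r)   # j0 = sin(r)/r
assert np.allclose(spherical_yn(0, r), -np.cos(r) / r)  # y0 = -cos(r)/r
# j0 is regular at the origin, tending to 1
assert abs(spherical_jn(0, np.array([1e-8]))[0] - 1) < 1e-8
```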
• Our results also apply to Laplace's equation, in which case the radial equation yields solutions
r^ℓ and 1/r^{ℓ+1}. These are the small-r limits of the spherical Bessel functions, because near the
origin the energy term Eψ is negligible compared to the centrifugal term.
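The ℓ = 0 closed forms and the small-r power-law behavior can be spot-checked against scipy's spherical Bessel implementations (a sketch assuming scipy is available):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

r = np.linspace(0.01, 0.1, 5)

# l = 0 closed forms: j_0(r) = sin(r)/r and y_0(r) = -cos(r)/r
assert np.allclose(spherical_jn(0, r), np.sin(r)/r)
assert np.allclose(spherical_yn(0, r), -np.cos(r)/r)

# near the origin, j_l ~ r^l/(2l+1)!!, matching the r^l solution of Laplace's equation
l = 2
assert np.allclose(spherical_jn(l, r), r**l/15, rtol=1e-3)  # (2l+1)!! = 15
```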
• Consider two Hilbert spaces with angular momentum operators J1 and J2 . Then the tensor
product space has angular momentum operator
J = J1 ⊗ 1 + 1 ⊗ J2 = J1 + J2 .
The goal is to express the angular momentum basis of the joint system |jm⟩ in terms of the
uncoupled angular momentum basis |j1 m1⟩ ⊗ |j2 m2⟩ = |j1 m1 j2 m2⟩.
• It suffices to consider the tensor product of two irreps; for concreteness, we consider 5/2 ⊗ 1. The
Jz eigenvalue is just m1 + m2, so the m eigenvalues of the uncoupled basis states range from
−7/2 to 7/2, with multiplicities 1, 2, 3, 3, 3, 3, 2, 1.
• To find the coupled angular momentum basis, we first consider the state |5/2, 5/2⟩ ⊗ |1, 1⟩, which has
m = 7/2. This state spans a one-dimensional eigenspace of Jz. Since Jz commutes with
J², this eigenspace is invariant under J², so the state has a definite j value. Since there
are no states with higher m, we must have j = 7/2, so |5/2, 5/2⟩|1, 1⟩ = |7/2, 7/2⟩.
• Next, we may apply the total lowering operator to give |7/2, 5/2⟩. There are two states with m = 5/2,
and hence by similar reasoning, the orthogonal state with m = 5/2 must be an eigenstate of
J², so it is |5/2, 5/2⟩.
• Continuing this process, lowering our basis vectors and finding new irreps by orthogonality, we
conclude that 5/2 ⊗ 1 = 3/2 ⊕ 5/2 ⊕ 7/2. By very similar reasoning, we generally have
    j1 ⊗ j2 = |j1 − j2| ⊕ (|j1 − j2| + 1) ⊕ · · · ⊕ (j1 + j2).
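A minimal consistency check on this decomposition is that the dimensions add up, (2j1+1)(2j2+1) = Σ_j (2j+1); a sketch in Python using exact fractions:

```python
from fractions import Fraction

def cg_decomposition(j1, j2):
    # j values appearing in j1 ⊗ j2 = |j1-j2| ⊕ ... ⊕ (j1+j2), in steps of 1
    j, out = abs(j1 - j2), []
    while j <= j1 + j2:
        out.append(j)
        j += 1
    return out

def dim(j):
    return int(2*j + 1)

j1, j2 = Fraction(5, 2), Fraction(1)
js = cg_decomposition(j1, j2)
assert js == [Fraction(3, 2), Fraction(5, 2), Fraction(7, 2)]
# dimension count: 6 * 3 = 4 + 6 + 8
assert dim(j1)*dim(j2) == sum(dim(j) for j in js)
```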
• We define the Clebsch–Gordan coefficients as the overlaps ⟨j1 j2 m1 m2|jm⟩. These coefficients
satisfy the relations
    Σ_{m1 m2} ⟨jm|j1 j2 m1 m2⟩⟨j1 j2 m1 m2|j′m′⟩ = δ_{jj′} δ_{mm′},
    Σ_{jm} ⟨j1 j2 m1 m2|jm⟩⟨jm|j1 j2 m1′ m2′⟩ = δ_{m1 m1′} δ_{m2 m2′}
which simply follow from completeness of the coupled and uncoupled bases. In addition, we
have the selection rule
    ⟨jm|j1 j2 m1 m2⟩ ∝ δ_{m, m1+m2}.
We may also obtain recurrence relations for the Clebsch–Gordan coefficients by applying J− in
both the coupled and uncoupled bases.
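The first completeness relation can be spot-checked with sympy's Clebsch–Gordan coefficients (assuming sympy is available), here for the simplest case j1 = j2 = 1/2:

```python
from sympy import S
from sympy.physics.quantum.cg import CG

j1 = j2 = S(1)/2
ms = [-S(1)/2, S(1)/2]

def braket(j, m, jp, mp):
    # Σ_{m1,m2} <jm|j1 j2 m1 m2><j1 j2 m1 m2|j'm'>; the coefficients are real
    return sum(CG(j1, m1, j2, m2, j, m).doit() * CG(j1, m1, j2, m2, jp, mp).doit()
               for m1 in ms for m2 in ms)

assert braket(1, 0, 1, 0) == 1   # normalized
assert braket(0, 0, 0, 0) == 1
assert braket(1, 0, 0, 0) == 0   # triplet and singlet are orthogonal
```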
Example. Combining spin and spatial degrees of freedom for the electron. We must work in the
tensor product space with basis |r, m⟩. Wavefunctions are of the form
    ψ(r, m) = ⟨r, m|ψ⟩
which gives a separate wavefunction for each spin component, or equivalently, a spinor for every
position in space. The inner product is
    ⟨φ|ψ⟩ = Σ_m ∫ d³r φ*(r, m) ψ(r, m).
In the case of the electron, the Hamiltonian is the sum of the spatial and spin Hamiltonians we
have considered before,
    H = (1/2m)(p − qA)² + qφ − µ · B,    µ = (gµ/2) σ.
This is called the Pauli Hamiltonian and the resulting evolution equation is the Pauli equation. In
practice, it looks like two separate Schrodinger equations, for the two components of ψ, which are
coupled by the µ · B term.
The Pauli equation arises from expanding the Dirac equation to order (v/c)2 . The Dirac
equation also fixes g = 2. Further terms can be systematically found using the Foldy–Wouthuysen
transformation, as described here. At order (v/c)4 , this recovers the fine structure corrections we
will consider below.
Note. The probability current in this case can be defined as we saw earlier,
    J = Re ψ† v ψ,    v = (1/m)(−iℏ∇ − qA).
Mathematically, J is not unique, as it remains conserved if we add any divergence-free vector field;
in particular, we can add any curl. But the physically interesting question is which possible J
is relevant when we perform a measurement. Performing a measurement of abstract “probability
current” is meaningless, in the sense that there do not exist detectors that couple to it. However,
in the case of a spinless charged particle, we can measure the electric current, and experiments
indicate it is Jc = eJ where J is defined as above; this gives J preference above other options.
However, when the particle has spin, the situation is different. By a classical analogy, we would
expect to regard M = ψ † µψ as a magnetization. But a magnetization gives rise to a bound current
Jb = ∇ × M, so we expect to measure the electric current
Jc = eJ + ∇ × (ψ † µψ).
This is indeed what is seen experimentally. For instance, without the second term, magnetic fields
could not arise from spin alignment, though they certainly do in ferromagnets.
Example. The Landau–Yang theorem states that a massive, spin 1 particle can’t decay into two
photons. This places restrictions on the decay of, e.g. some states of positronium and charmonium,
and the weak gauge bosons. To demonstrate this, work in the rest frame of the decaying particle. By
energy and momentum conservation, after some time, the state of the system will be a superposition
of the particle still being there, and terms involving photons coming out back to back in various
directions and polarizations, |k, e1; −k, e2⟩.
Now, pick an arbitrary z-axis. We will show that photons can't come out back to back along this
axis, i.e. that terms |kẑ, e1; −kẑ, e2⟩ cannot appear in the state. Since ẑ is arbitrary, this shows
that the decay can't occur at all. The e_i can be expanded into circular polarizations,
    e_{R,L} = ∓(x̂ ± iŷ)/√2,
where these two options have Jz eigenvalues ±1. Since |Jz| ≤ 1 for a spin 1 particle, the Jz
eigenvalues of the two photons must be opposite, so the allowed polarization combinations are
|kẑ, e_{1R}, −kẑ, e_{2R}⟩ and |kẑ, e_{1L}, −kẑ, e_{2L}⟩, giving Jz = 0. Now consider the effect of the rotation
R_y(π). Both of these states are eigenstates of this rotation, with an eigenvalue of 1. But the
so the term is forbidden. Similar reasoning can be used to restrict various other decays; further
constraints come from parity.
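The key fact here, that the Jz = 0 state of spin 1 flips sign under a π rotation about y, is just d¹₀₀(π) = cos π = −1, which can be checked with sympy's Wigner d function (assuming sympy is available):

```python
from sympy import pi
from sympy.physics.quantum.spin import Rotation

# d^1_{00}(β) = cos β, so a rotation by π about y flips the Jz = 0 spin-1 state
assert Rotation.d(1, 0, 0, pi).doit() == -1
# while the Jz = ±1 states are exchanged rather than picking up this sign:
assert Rotation.d(1, 1, 1, pi).doit() == 0
assert Rotation.d(1, 1, -1, pi).doit() == 1
```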
• We now consider how operators transform under rotations. Just as states transform as |ψ⟩ → U(R)|ψ⟩,
operators transform by conjugation,
    A′ = U(R) A U(R)†.
• A vector operator V is a triple of operators whose expectation values transform as the components
of a classical vector,
    ⟨ψ′|V|ψ′⟩ = R⟨ψ|V|ψ⟩.
• For infinitesimal rotations, this condition is equivalent to the commutation relations
    [J_i, V_j] = iℏ ε_{ijk} V_k.
• Similarly, we may show that the dot product of vector operators is a scalar operator, the cross
product is a vector operator, and so on. For example, p2 is a scalar operator and L = r × p
is a vector operator. The adjoint formula shows that angular momentum is always a vector
operator.
145 8. Angular Momentum
• More generally, a rank-2 tensor operator is a set of nine operators transforming as T′_{ij} = R_{ik} R_{jl} T_{kl}.
For example, the outer product of vector operators T_{ij} = V_i W_j is a tensor operator. A physical
example of a rank-2 tensor operator is the quadrupole moment.
• Starting with the Cartesian basis x̂, ŷ, ẑ, we define the spherical basis vectors
    ê_1 = −(x̂ + iŷ)/√2,    ê_0 = ẑ,    ê_{−1} = (x̂ − iŷ)/√2.
We may expand vectors in this basis (or technically, the basis ê*_q) by
    X = Σ_q ê*_q X_q,    X_q = ê_q · X.
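The orthonormality and completeness of the spherical basis can be verified directly (a numpy sketch; note that X_q = ê_q · X uses the dot product without conjugation):

```python
import numpy as np

xhat, yhat, zhat = np.eye(3)
e = {1: -(xhat + 1j*yhat)/np.sqrt(2),
     0: zhat.astype(complex),
     -1: (xhat - 1j*yhat)/np.sqrt(2)}

# orthonormality under the Hermitian inner product (vdot conjugates its first argument)
for q in e:
    for qp in e:
        assert np.isclose(np.vdot(e[q], e[qp]), 1.0 if q == qp else 0.0)

# expansion/reconstruction: X = Σ_q ê*_q X_q with X_q = ê_q · X (no conjugation)
X = np.array([0.3, -1.2, 0.7])
Xq = {q: e[q] @ X for q in e}
X_rec = sum(np.conj(e[q]) * Xq[q] for q in e)
assert np.allclose(X_rec, X)
```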
• As an example application, consider calculating the dipole transition rate, which is proportional
to ⟨n′ℓ′m′|x|nℓm⟩. This is messy, but a simplification occurs if we expand x in the spherical
basis, because
    r Y_{1q}(Ω) = √(3/4π) x_q.
Then the matrix element factors into an angular and radial part,
    ⟨n′ℓ′m′|x_q|nℓm⟩ = √(4π/3) ∫₀^∞ r² dr R_{n′ℓ′}(r) r R_{nℓ}(r) × ∫ dΩ Y*_{ℓ′m′}(Ω) Y_{1q}(Ω) Y_{ℓm}(Ω).
This is a substantial improvement: we see that n and n′ only appear in the first factor, while
m and m′ only appear in the second. Furthermore, the integral vanishes automatically unless
m′ = q + m, which significantly reduces the work that must be done. Even better, the angular
part is the same for all rotationally symmetric systems; the radial part factors out what is
specific to hydrogen.
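The angular integral, and hence the selection rules, can be spot-checked using sympy's Gaunt coefficients, which give integrals of three unconjugated spherical harmonics; we use Y*_{ℓ′m′} = (−1)^{m′} Y_{ℓ′,−m′}. A sketch assuming sympy is available:

```python
from sympy.physics.wigner import gaunt

def angular(lp, mp, q, l, m):
    # ∫ dΩ Y*_{l'm'} Y_{1q} Y_{lm}, rewritten via Y*_{l'm'} = (-1)^{m'} Y_{l',-m'}
    return (-1)**mp * gaunt(lp, 1, l, -mp, q, m)

assert angular(1, 1, 1, 0, 0) != 0   # allowed: m' = m + q and Δl = 1
assert angular(1, 0, 0, 0, 0) != 0   # allowed: q = 0 component
assert angular(1, 1, 0, 0, 0) == 0   # forbidden: violates m' = m + q
assert angular(2, 0, 0, 2, 0) == 0   # forbidden: Δl = 0 (parity)
```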
• The ‘coincidence’ arises because both the spherical harmonics and spherical basis arise out of
the representation theory of SU(2). The Y_{ℓm}'s are the standard angular momentum basis for
the action of rotations on functions on the sphere. Similarly, the spherical basis is the standard
angular momentum basis for the action of rotations in space, which carries the representation
j = 1.
• More generally, tensor quantities carry representations of SO(3) classically, and hence tensor
operators carry representations of SU (2) in quantum mechanics. Hence it is natural for the
photon, which is represented by the vector A classically, to have spin 1.
• Tensor operators can be broken down into irreps. Scalar and vector operators are already irreps,
but the tensor operator Tij = Vi Wj contains the scalar and vector irreps
tr T = V · W, X = V × W.
The remaining degrees of freedom form a five-dimensional irrep, the symmetric traceless part
of Tij . This is in accordance with the Clebsch–Gordan decomposition 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2. The
same decomposition holds for arbitrary Tij by linearity.
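This decomposition is easy to verify numerically: the trace piece carries V · W, the antisymmetric piece carries V × W, and the symmetric traceless remainder has five independent components (a numpy sketch):

```python
import numpy as np

V, W = np.array([1., 2., 3.]), np.array([0.5, -1., 2.])
T = np.outer(V, W)   # 9 components: the 1 ⊗ 1 tensor

scalar   = np.trace(T)*np.eye(3)/3                  # spin 0 (1 component)
antisym  = (T - T.T)/2                              # spin 1 (3 components)
symtrace = (T + T.T)/2 - np.trace(T)*np.eye(3)/3    # spin 2 (5 components)

assert np.allclose(scalar + antisym + symtrace, T)
assert np.isclose(np.trace(T), V @ W)               # trace is the dot product
assert np.isclose(np.trace(symtrace), 0)
# the antisymmetric part encodes the cross product: antisym[i,j] = (1/2) ε_ijk (V×W)_k
recovered = 2*np.array([antisym[1, 2], antisym[2, 0], antisym[0, 1]])
assert np.allclose(recovered, np.cross(V, W))
```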
• Irreps in the standard basis transform by the same D matrices that we introduced earlier. For
example, an irreducible tensor operator of order k is a set of 2k + 1 operators T^k_q satisfying
    U(R) T^k_q U(R)† = Σ_{q′} T^k_{q′} D^k_{q′q}(R).
That is, an irreducible tensor operator of order k transforms like a spin k particle. In our new language,
writing x in terms of the x_q is just writing it as an irreducible tensor operator of order 1.
• Rotations act on kets by multiplication by U(R), while rotations act on operators by conjugation,
which turns into commutation for infinitesimal rotations. Therefore the angular momentum
operators affect the irreducible tensor operator T^k_q exactly as they affect the kets |kq⟩, but with
commutators,
    [Jz, T^k_q] = ℏq T^k_q,    Σ_i [J_i, [J_i, T^k_q]] = ℏ² k(k+1) T^k_q.
We don't even have to prove this independently; it just carries over from our previous work.
• In the case of operators, there’s no simple ‘angular momentum operator’ as in the other cases,
because it would have to be a superoperator, i.e. a linear map of operators.
Note. The ideas above can be used to understand higher spherical harmonics as well. The functions
x, y, and z form an irrep under rotations, and hence the set of homogeneous second-order polynomials
forms a representation as well. Using the decomposition 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2 yields a five-dimensional
irrep, and dividing these functions by r² yields the ℓ = 2 spherical harmonics.
This explains the naming of chemical orbitals. The p orbitals are px , py , and pz , corresponding
to angular parts x/r, y/r, and z/r. Note that this is not the standard angular momentum basis;
the functions are instead chosen to be real and somewhat symmetrical. The names of the d orbitals
are similar, though d_{z²} should actually be called d_{3z²−r²}.
We now state the Wigner–Eckart theorem, which simplifies matrix elements of irreducible tensor
operators.
• Consider a setup with rotational symmetry, and work in the basis |γjm⟩. A scalar operator K
commutes with both Jz and J², and hence preserves j and m. Moreover, since it commutes
with J±, its matrix elements do not depend on m,
    ⟨γ′j′m′|K|γjm⟩ = δ_{j′j} δ_{m′m} C^j_{γ′γ}.
This implies, for instance, that the eigenvalues come in multiplets of degeneracy 2j + 1. We've
already seen this reasoning before, for the special case K = H, but the result applies for any
scalar operator in any rotationally symmetric system.
• Now let T^k_q be an irreducible tensor operator. The Wigner–Eckart theorem states that
    ⟨γ′j′m′|T^k_q|γjm⟩ = ⟨γ′j′‖T^k‖γj⟩ ⟨j′m′|jk mq⟩
where the first factor is called a reduced matrix element, and the second is a Clebsch–Gordan
coefficient. The reduced matrix element is not a literal matrix element, but just stands in for a
quantity that only depends on T^k and the γ's and j's.
• The Wigner–Eckart theorem factors the matrix element into a part that depends only on the
irreps (and hence depends on the detailed dynamics of the system), and a part that depends
on the m’s that label states inside the irreps (and hence is determined completely by rotational
symmetry). This simplifies the computation of transition rates, as we saw earlier. Fixing the
γ's and j's, there are generally (2j + 1)(2j′ + 1)(2k + 1) matrix elements to compute, but we
can just compute one, to get the reduced matrix element.
• The intuition for the Clebsch–Gordan coefficient is that T^k_q|jm⟩ transforms under rotations just
like the ket |kq⟩|jm⟩. The Clebsch–Gordan factor also provides several selection rules,
    m′ = m + q,    j′ ∈ {|j − k|, . . . , j + k}.
• If there is only one irrep, then all irreducible tensor operators of order k must be proportional
to each other. To show this directly, note that all such operators must be built out of linear
combinations of |m⟩⟨m′|. This set of operators transforms as
    j ⊗ j = 0 ⊕ 1 ⊕ · · · ⊕ 2j.
Hence there is a unique irreducible tensor operator for all spins up to 2j, and none above that.
This shows, for example, that we must have µ ∝ S for spins.
• For example, an alpha particle is a nucleus whose ground state has spin zero. Restricting our
Hilbert space to this irrep, the selection rules show that every irreducible tensor operator with
k > 0 must be zero. Thus alpha particles cannot have a magnetic dipole moment.
• To derive it, one first establishes an identity expressing a vector operator V in terms of J and
the scalar J · V, where the last step uses explicit Clebsch–Gordan coefficients for the j ⊗ 1 case.
• We now sandwich this identity between ⟨γ′jm′| and |γjm⟩. Since the same j value is on both
sides, the left-hand side vanishes, giving
    ⟨γ′jm′|V_q|γjm⟩ = ⟨γ′jm|J · V|γjm⟩ ⟨jm′|J_q|jm⟩ / ℏ²j(j+1)
which is known as the projection theorem. Intuitively, the right-hand side is the projection of
V "in the J direction", and the theorem says that this projection agrees with V when we restrict to
a subspace of constant j. This is a generalization of the idea above that, for constant γ and j,
there is only one vector operator.
• The projection theorem can also be applied by explicitly evaluating the reduced matrix element
in the Wigner–Eckart theorem. Since the right-hand side involves the product of a scalar and
vector operator, we first seek to simplify such products.
• Let A be a vector operator and let f be a scalar operator. The Wigner–Eckart theorem says
    ⟨γ′j′m′|A_q|γjm⟩ = ⟨γ′j′‖A‖γj⟩ ⟨j′m′|j1mq⟩
and
    ⟨γ′j′m′|f A_q|γjm⟩ = ⟨γ′j′‖fA‖γj⟩ ⟨j′m′|j1mq⟩.
Furthermore, since f is a scalar, its matrix elements are diagonal in j and m and independent
of m. Combining these results gives a decomposition for the reduced matrix elements of fA,
    ⟨γ′j′‖fA‖γj⟩ = Σ_Γ ⟨γ′j′‖f‖Γj′⟩ ⟨Γj′‖A‖γj⟩
which makes sense: both A and f can move between irreps, though only A can change j. The
reduced matrix elements appearing in the projection theorem can then be evaluated using the
decompositions above and the reduced matrix elements of J.
9 Discrete Symmetries
9.1 Parity
In the previous section, we studied proper rotations. We now add in parity, an improper rotation,
and consider its representations. Discrete symmetries are also covered in the context of relativistic
quantum mechanics in the notes on the Standard Model.
• In classical mechanics, the parity operator P inverts all spatial components. It has matrix
representation −I, satisfies P 2 = I, and commutes with all proper rotations, P RP −1 = R.
• In quantum mechanics, we correspondingly postulate a unitary parity operator π with π² = 1
which commutes with all rotations, πU(R)π† = U(R). These postulates rule out projective
representations, which are allowed in principle, but won't be necessary for any of our applications.
• For a spinless particle, we have previously defined U(R)|x⟩ = |Rx⟩. Similarly, we may define
π|x⟩ = |−x⟩, which obeys all of the postulates above. We may also explicitly compute
    π x π† = −x,    π p π† = −p,    π L π† = L
where L is the orbital angular momentum r × p. The parity of the state |ℓm⟩ is (−1)^ℓ.
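The statement that |ℓm⟩ has parity (−1)^ℓ is the identity Y_{ℓm}(π − θ, φ + π) = (−1)^ℓ Y_{ℓm}(θ, φ), which can be spot-checked numerically with sympy (assuming sympy is available):

```python
from sympy import Ynm, pi

# under parity, (θ, φ) → (π − θ, φ + π), and Y_lm picks up (−1)^l
theta, phi = 0.7, 1.1   # arbitrary test angles, in radians
for l in range(4):
    for m in range(-l, l + 1):
        lhs = complex(Ynm(l, m, pi - theta, pi + phi).expand(func=True).evalf())
        rhs = (-1)**l * complex(Ynm(l, m, theta, phi).expand(func=True).evalf())
        assert abs(lhs - rhs) < 1e-10
```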
• Another example is a spin-s particle with no spatial wavefunction. The states are |sm⟩ for
m = −s, . . . , s. Since π is a scalar operator, we must have
    π|sm⟩ = η|sm⟩
for some constant η = ±1. In nonrelativistic quantum mechanics, the sign has no physical
consequences, so we choose η = 1 so that parity does nothing to the spin state. Adding back
the spatial degrees of freedom gives π|x, m⟩ = |−x, m⟩.
• In relativistic quantum mechanics, the sign of η makes a physical difference because particle
number can change, but the overall parity must be conserved; this provides some selection rules.
For example, the fact that the photon has negative parity is related to the fact that the parity
of an atom flips during an electric dipole transition, which involves one photon.
• Note that E is a polar vector while B is an axial vector. In particular, adding an external
magnetic field does not break parity symmetry.
• Parity is conserved if [π, H] = 0. This is satisfied by the central force Hamiltonian, and more
generally by any system of particles interacting through pairwise forces of the form V(|ri − rj|).
• Parity remains conserved when we account for relativistic effects. For example, such effects
lead to a spin-orbit coupling L · S, but this term is a true scalar. Parity can appear to be
violated when photons are emitted (or generally when a system is placed in an external field),
but remains conserved as long as we account for the parity of the electromagnetic field.
• Parity is also conserved by the strong interaction, but not by the weak interaction. The weak
interaction is extremely weak at atomic energy scales, so parity symmetry is extremely accurate
in atomic physics.
• Just like rotational symmetry, parity symmetry can lower the dimensionality of a system. If
[π, H] = 0, then we can split the Hilbert space into representations with +1 and −1 parity and
diagonalize H within them separately, which is more computationally efficient.
• In the case of rotational symmetry, every rotational irrep has definite parity since π is a scalar
operator. In particular, if there is no degeneracy of irreps, then every energy eigenstate is
automatically a parity eigenstate.
• For example, in hydrogen, the 2s and 2p irreps are degenerate, with even and odd parity. A
linear combination of these states gives an energy eigenstate without definite parity.
Example. Selection rules for electric dipole transitions. Such a transition is determined by the
matrix element ⟨n′ℓ′m′|x|nℓm⟩. It must be parity invariant, but under parity it picks up a factor
of (−1)^{ℓ+ℓ′+1}, giving the selection rule ∆ℓ = odd. The Wigner–Eckart theorem rules out |∆ℓ| > 1,
so we must have ∆ℓ = ±1. The Wigner–Eckart theorem also gives |∆m| ≤ 1.
Example. A spin-orbit coupling. Consider a particle with spatial state |nℓm_ℓ⟩, which separates
into a radial and angular part |nℓ⟩|ℓm_ℓ⟩, and a spin state |sm_s⟩. Ignoring the radial part, which
separates out, we consider the total spin states
    |ℓjm_j⟩ = Σ_{m_ℓ, m_s} |ℓm_ℓ⟩|sm_s⟩ ⟨ℓs m_ℓ m_s|jm_j⟩.
The wavefunction of such a state takes in an angular coordinate and outputs a spinor. A spin-orbit
coupling is of the form σ · x. Since this term is rotationally invariant, it conserves j and mj . From
the standpoint of the spatial part, it’s like an electric dipole transition, so ∆` = ±1. Thus the
interaction can transfer angular momentum between the spin and orbit, one unit at a time.
• This reasoning fails in the case of an external magnetic field. However, if we consider the field
to be internally generated by charges in the system, then time reversal takes
ρ → ρ, J → −J, E → E, B → −B,
where we suppress time coordinates. This gives an extra sign flip that restores the symmetry.
• Note that this is the opposite of the situation with parity. In this case, J is flipped as well, but
E is flipped while B isn’t.
• Now consider the Schrodinger equation for a spinless particle,
    iℏ ∂ψ/∂t = (−(ℏ²/2m)∇² + V(x)) ψ(x, t).
It is tempting to implement time reversal by taking ψ(x, t) → ψ(x, −t), but this doesn't work
because only the left-hand side changes sign. However, if we take
    ψ_r(x, t) = ψ*(x, −t)
then we do get a solution, as we can conjugate both sides. Since position information is in the
magnitude of ψ and momentum information is in the phase, this is simply performing the flip
p → −p we already did in the classical case.
Setting t = 0, the time reversal operator takes the initial condition |ψ(0)⟩ to the initial condition
for the reversed motion |ψ_r(0)⟩.
• We postulate that the time reversal operator Θ preserves norms,
    Θ†Θ = 1.
• We postulate that spin angular momentum flips as well. This can be understood classically by
thinking of spin as just internal rotation. Since µ ∝ S, the magnetic moment also flips.
• The operator Θ cannot be unitary. Conjugating the canonical commutator [x, p] = iℏ by a
unitary operator would leave the right-hand side unchanged, but we must get [x, −p]. Alternatively,
we know that Θ flips the sign of L, but this seems impossible to reconcile with L × L = iℏL.
• However, we can construct Θ if we let it be an antilinear operator, i.e. an operator that complex
conjugates everything to its right. This conjugation causes an extra sign flip due to the i's in
the commutators, and leads to a conjugated wavefunction, as already seen above.
• Generally, Wigner's theorem states that any map that preserves probabilities,
    |⟨φ′|ψ′⟩| = |⟨φ|ψ⟩|,
must be either unitary or antiunitary. Continuous symmetries must be unitary, since they are
connected to the identity, which is unitary; of the common discrete symmetries, time reversal
symmetry is the only antiunitary one.
Working with antilinear operators is delicate, because Dirac notation is made for linear operators.
• For any scalar c, a linear operator L and an antilinear operator A satisfy
    Lc = cL,    Ac = c*A.
• For a linear operator, we can define its action on bras by
    (⟨φ|L)|ψ⟩ ≡ (⟨φ|)(L|ψ⟩).
However, if we naively extend this to antilinear operators, then ⟨φ|A would be an antilinear
functional, while bras must be linear functionals. Thus we add a complex conjugation,
    (⟨φ|A)|ψ⟩ ≡ [(⟨φ|)(A|ψ⟩)]*.
It matters which way an antilinear operator acts, and switching it gives a complex conjugate.
• For a linear operator, the Hermitian conjugate is defined by ⟨φ|L†|ψ⟩ = [⟨ψ|(L|φ⟩)]*. To extend
this to antilinear operators, we need to fix which way A and A† act. The correct
rule is to flip the direction of action,
    ⟨φ|A†|ψ⟩ = [⟨ψ|(A|φ⟩)]*.
One can check that this behaves correctly when |ψ⟩ and |φ⟩ are multiplied by scalars. The
rule can be remembered by simply flipping everything when taking the Hermitian conjugate.
Equivalently, for an antiunitary operator we simply maintain the rule
    A†A = AA† = 1.
• In the case of a spinless system, let K_x denote complex conjugation in the position basis. One
can verify that K_x acts on x and p in the appropriate manner, so this is the time reversal
operator; as we've seen, it conjugates wavefunctions.
• Now consider a particle of spin s, ignoring spatial degrees of freedom. Since ΘSΘ† = −S, we
have ΘSzΘ† = −Sz, so Θ flips m, with Θ|sm⟩ = c_m|s, −m⟩. On the other hand, applying the
same relation to the raising and lowering operators yields c_{m+1} = −c_m, so that c_m = η(−1)^{s−m}.
We can absorb an arbitrary phase into η. The common choice is
    Θ|sm⟩ = i^{2m} |s, −m⟩.
• An alternate way to derive this result is to write Θ = L K_{Sz}, where K_{Sz} denotes conjugation in the
standard angular momentum basis, then choose the linear operator L to fix up the commutation
relations. The result is
    Θ = e^{−iπSy/ℏ} K_{Sz} = K_{Sz} e^{−iπSy/ℏ}
where the exponential commutes with K_{Sz} because its matrix elements are real.
• Combining the spatial and spin degrees of freedom, the full time reversal operator is
    Θ = K_{x,Sz} e^{−iπSy/ℏ}.
One might wonder why Sy appears, rather than Sx. This goes back to our choice of Condon–
Shortley phase conventions, which have a nontrivial effect here because Θ is antilinear.
• In the case of many particles with spin, we may either multiply the individual Θ’s or replace
Sy and Sz above with the total angular momenta. These give the same result because the
Clebsch–Gordan coefficients are real.
• Time reversal invariance holds for any Hamiltonian of the form H = p2 /2m + V (x). It is broken
by an external magnetic field, but not by internal fields. For example, the spin-orbit coupling
L · S is time-reversal invariant because both the angular momenta flip.
• First, we verify that the time-reversed state |ψ_r(t)⟩ = Θ|ψ(−t)⟩ obeys the Schrodinger equation.
Setting ℏ = 1 and writing τ = −t,
    i∂_t|ψ_r(t)⟩ = Θ[i∂_τ|ψ(τ)⟩] = ΘH|ψ(τ)⟩ = (ΘHΘ†)|ψ_r(t)⟩.
Hence the time-reversed state satisfies the Schrodinger equation under the time-reversed Hamil-
tonian. The Hamiltonian itself is invariant under time reversal if [Θ, H] = 0.
• If the Hamiltonian is invariant under time reversal and |ψ⟩ is a nondegenerate energy eigenstate,
we must have Θ|ψ⟩ = e^{iθ}|ψ⟩, where the eigenvalue is a phase because Θ preserves norms. Then
the state e^{iθ/2}|ψ⟩ has Θ eigenvalue 1.
• For the case of spatial degrees of freedom, this implies that the wavefunctions of nondegenerate
states can be chosen real.
• More generally, Θ can link pairs of degenerate energy eigenstates. One can show that we can
always change basis in this subspace so that both have Θ eigenvalue 1.
• For example, for the free particle, e^{±ikx} can be combined into sin(kx) and cos(kx). As another
example, atomic orbitals in chemistry are conventionally taken to be linear combinations of the
Y_{ℓ,±m}, with real wavefunctions.
• In general, we have
    Θ² = K e^{−iπSy/ℏ} K e^{−iπSy/ℏ} = e^{−i(2π)Sy/ℏ} = +1 for bosons, −1 for fermions.
This does not depend on phase conventions, as any phase adjustment cancels itself out.
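The sign of Θ² can be checked numerically by building Sy for various spins and verifying that M = e^{−iπSy/ℏ} is a real matrix with M² = ±1 (a sketch assuming numpy and scipy are available, with ℏ = 1):

```python
import numpy as np
from scipy.linalg import expm

def Sy(s):
    # spin-s S_y in the standard |s m> basis (m = s, ..., -s), with ħ = 1
    dim = int(2*s + 1)
    m = s - np.arange(dim)
    Sp = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):
        # <m+1|S+|m> = sqrt(s(s+1) - m(m+1))
        Sp[k-1, k] = np.sqrt(s*(s + 1) - m[k]*(m[k] + 1))
    return (Sp - Sp.conj().T)/(2j)

# Θ = M K with K complex conjugation; M = exp(-iπS_y) is real, so Θ² = M M* = M²
for s, sign in [(0.5, -1), (1.0, 1), (1.5, -1), (2.0, 1)]:
    M = expm(-1j*np.pi*Sy(s))
    assert np.allclose(np.imag(M), 0)                        # M is real
    assert np.allclose(M @ M, sign*np.eye(int(2*s + 1)))     # +1 bosons, -1 fermions
```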
• When there are an odd number of fermions, Θ² = −1. Then energy levels must be twofold
degenerate, because if they were not, we would have Θ²|ψ⟩ = Θe^{iθ}|ψ⟩ = |ψ⟩, which contradicts
Θ² = −1. This result is called Kramers degeneracy.
• For example, given rotational symmetry, Kramers degeneracy trivially holds because |ℓ, m⟩
pairs with |ℓ, −m⟩, where m ≠ 0 for half-integer ℓ. The nontrivial point is that this remains
true even when, e.g. an external electric field is turned on, breaking rotational symmetry. One
might protest that no degeneracy then remains, in the case of a particle with an electric dipole
moment – but as we'll now see, time reversal forbids such dipole moments!
In general, quantum objects such as atoms and nuclei can have multipole moments, which presents
a useful application of parity, time reversal, and the Wigner–Eckart theorem.
• The scalar potential of a localized charge distribution has the multipole expansion
    Φ(r) = q/r + d · r/r³ + (1/2) Q_{ij} r_i r_j/r⁵ + · · ·
where the charge, electric dipole moment, and electric quadrupole moment are
    q = ∫ dr ρ(r),    d = ∫ dr ρ(r) r,    Q_{ij} = ∫ dr ρ(r)(3 r_i r_j − r² δ_{ij}).
• There is a similar expansion for the vector potential, but the monopole term vanishes, and we
won't consider any situations where the quadrupole term matters, leaving the dipole term,
    A(r) = (µ × r)/r³,    µ = (1/2) ∫ dr r × J(r).
• We call the terms in the multipole expansion "2^k-poles" for convenience. Formally, the multipole
expansion is just representation theory: a 2^k-pole transforms in the spin k irrep, and hence is
described by 2k + 1 numbers. Accordingly, at the quantum level, the 2^k-poles become irreducible
tensor operators.
• Now restrict to systems described by a single irrep, such as nuclei. In this case, many of the
multipoles are forbidden by symmetry. For example, consider an electric dipole moment d.
Classically, we expect d to flip under P and stay the same under T . But by the Wigner–Eckart
theorem, d ∝ S, which stays the same under P but flips under T . So a permanent electric
dipole moment for a nucleus would violate P and T .
• This argument is actually too quick, because there’s no reason the electric dipole moment of
a quantum system has to behave like our classical intuition suggests. A better argument is to
show that there is no way to extend the definitions of P and T we are familiar with, for classical
objects, to these quantum objects, in such a way that it is a symmetry of the theory.
• To do this, note that our quick argument shows d must transform like S. However, we measure
the effects of d by interaction terms like d · E, and we know that E must flip under P and stay
the same under T . Hence the term d · E is odd under both P and T , so the Hamiltonian is not
symmetric.
• Of course, one could just modify how E transforms to get a symmetry of the Hamiltonian, but
that symmetry, even if useful, could not reasonably be called “parity” or “time reversal”. The
E here is a classical electric field whose transformation we should already know.
• Usually people talk about electric dipole moments as violating T , not violating P , even though
they violate both. The reason is that the former is more interesting. By the CP T theorem,
T violation is equivalent to CP violation. While the Standard Model has a lot of known P
violation, it has very little CP violation, so the latter is a more sensitive probe for new physics.
• Note that this argument only applies to particles described by a single irrep. That is, it applies to
neutrons because we are assuming the irreps of nuclei are spaced far apart; there’s no symmetry
that would make the lowest irrep degenerate. But a typical molecule in laboratory conditions
has enough energy to enter many irreps, since the rotational energy levels are closely spaced,
which is why we say, e.g. that water molecules have a permanent electric dipole moment.
• Similar arguments show that electric multipoles of odd k and magnetic multipoles of even k
are forbidden, by both P and T . (However, magnetic monopoles are forbidden for a different
reason.) Hence the leading allowed multipoles are electric monopoles and magnetic dipoles.
• Another rule is that for a spin j irrep, a multipole can only exist if k ≤ 2j. This follows from
the fact that there aren’t irreducible tensor operators for k > 2j on a spin j irrep. For example,
a proton has j = 1/2 and hence cannot have quadrupole moments or higher. We also saw earlier
that an alpha particle has j = 0 and hence cannot have anything but an electric monopole.
10 Time Independent Perturbation Theory
• Bound-state perturbation theory is a method for finding the discrete part of the spectrum of
a perturbed Hamiltonian, as well as the corresponding eigenstates. It is also known as time-
independent perturbation theory. While the Hamiltonian can have a continuous spectrum as
well, analyzing states in the continuum requires different techniques, such as time-dependent
perturbation theory.
• Let the unperturbed Hamiltonian H0 have eigenstates
    H0|kα⟩ = ε_k|kα⟩
where α is an index to resolve degeneracies. The Hilbert space splits into eigenspaces H_k.
• We focus on one of the energy levels n for closer study. Let the full Hamiltonian be H =
H0 + λH1 , where λ ∈ [0, 1]. Introducing this parameter gives us a way to continuously turn on
the perturbation, and also a small parameter to expand in.
• Let |ψ⟩ be an exact eigenstate with energy E, which "grows out" of the eigenspace H_n as λ
increases,
    H|ψ⟩ = E|ψ⟩.
Both E and |ψ⟩ implicitly depend on λ.
• If the perturbation is small, then we expect |ψ⟩ to lie mostly in H_n. Let P be the orthogonal
projector onto H_n and Q = 1 − P. Computing P|ψ⟩ is "easy", while computing the part Q|ψ⟩
in all the other H_k is "hard".
• We will write an expression for Q|ψ⟩ using a power series. First, note that
    (E − H0)|ψ⟩ = λH1|ψ⟩.
We could get a formal solution for |ψ⟩ by multiplying by (E − H0)⁻¹, which satisfies
    1/(E − H0) = Σ_{kα} |kα⟩⟨kα|/(E − ε_k).
However, the k = n terms are dangerous, since E − ε_n → 0 as λ → 0, so we instead define the
operator
    R = Σ_{k≠n, α} |kα⟩⟨kα|/(E − ε_k)
which excludes them. The denominator could also blow up if E coincides with some ε_k for k ≠ n.
We will consider this case in more detail later.
• First consider the nondegenerate case, where there is only a single state |n⟩ in H_n. We normalize
|ψ⟩ so that P|ψ⟩ = |n⟩, which implies ⟨n|ψ⟩ = 1. The series reduces to
    |ψ⟩ = Σ_{s≥0} λ^s (RH1)^s |n⟩.
We see the expected suppression by energy differences of the contribution of other states.
• Next, taking the inner product of H|ψ⟩ = E|ψ⟩ with ⟨n| gives E = ε_n + λ⟨n|H1|ψ⟩. The last
term can be computed using the series above, giving
    E = ε_n + Σ_{s≥0} λ^{s+1} ⟨n|H1 (RH1)^s |n⟩.
• This is still an implicit expression, because E appears on both sides (it is hidden in R). However,
we can use it to extract an explicit series for E. For example, at first order we have
    E = ε_n + λ⟨n|H1|n⟩ + O(λ²).
To go to second order, it suffices to take the first three terms in the series for E, plugging in
the zeroth-order expression for E into the O(λ²) term, giving
    E = ε_n + λ⟨n|H1|n⟩ + λ² Σ_{k≠n, α} ⟨n|H1|kα⟩⟨kα|H1|n⟩/(ε_n − ε_k) + O(λ³)
which is what appears in most textbooks. However, at higher orders the explicit Rayleigh–
Schrodinger expansion begins to look more complicated.
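The second-order formula is easy to test against exact diagonalization for a small matrix; with λ = 10⁻³ the residual error should be of order λ³ (a numpy sketch with an arbitrary symmetric H1):

```python
import numpy as np

# unperturbed levels ε_k on the diagonal, nondegenerate
H0 = np.diag([0., 1., 3., 6.])
H1 = np.array([[ 0.2,  0.5,  0.1,  0.3],
               [ 0.5, -0.1,  0.4,  0.2],
               [ 0.1,  0.4,  0.3,  0.6],
               [ 0.3,  0.2,  0.6, -0.2]])
lam, n = 1e-3, 0

E1 = H1[n, n]
E2 = sum(H1[n, k]**2/(H0[n, n] - H0[k, k]) for k in range(4) if k != n)
E_pert = H0[n, n] + lam*E1 + lam**2*E2

# the exact level that "grows out" of ε_0 is the smallest eigenvalue
E_exact = np.linalg.eigvalsh(H0 + lam*H1)[0]
assert abs(E_exact - E_pert) < 10*lam**3   # agreement up to the O(λ³) term
```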
• We can then plug this back into the expansion for |ψ⟩. For example, at first order,
    |ψ⟩ = |n⟩ + λ Σ_{k≠n, α} (⟨kα|H1|n⟩/(ε_n − ε_k)) |kα⟩ + O(λ²).
• Now consider the degenerate case. Often, we will only want the first order effect. In this case,
the eigenvectors are just those of H1 restricted to H_n, and the energy shifts are just the
corresponding eigenvalues of H1.
• Sometimes, some or all of the states will remain degenerate. This degeneracy might be broken
at some higher order. If it's never broken at any order, then in almost every case we can identify
a symmetry of the full Hamiltonian which is responsible for this.
• To work at second order, we can substitute E with ε_n in the denominator of the quadratic term,
giving a standard eigenvalue equation. Alternatively, we can treat λ⟨nα|H1|nβ⟩ as part of the
unperturbed Hamiltonian, as we have presumably already diagonalized it to get the first order
result, and treat the quadratic term as the perturbation in a new, lower-dimensional problem.
• Once we have E and the cα to some order, we know P |ψi, and we can simply plug this into
our series for |ψi to get the full state to the same order.
• Sometimes, one is concerned with “nearly degenerate” perturbation theory, where some energy
levels are very close in the unperturbed Hamiltonian. Then even a weak perturbation can cause
the perturbed energy E to cross another unperturbed energy level, causing R to diverge.
• To fix this, we can transfer a small term from H0 and H1 so that these problematic unperturbed
energy levels are exactly degenerate, then use ordinary degenerate perturbation theory. (This
is, of course, what we are implicitly doing whenever we use degenerate perturbation theory at
all, since practically there are always further effects that break the degeneracies.)
• A completely equivalent solution is to define R excluding both Hn and all nearly degenerate
eigenspaces; the resulting series is the same.
Note. Why does a state within a continuum have to be treated with time-dependent perturbation
theory? The point is that the state generally gets “lost” into the continuum, i.e. the true energy
eigenstates have zero overlap with the original unperturbed state. For example, if we prepare an
atom in an excited state but allow it to radiate into vacuum (thus introducing a continuum of states
of the electromagnetic field), then no matter how we prepare the atom, the long-term probability
of occupancy of the state is zero.
• In hydrogen, the potential is V0 (r) = −e2 /r and the energy levels are
$$E_n = -\frac{1}{2n^2}\frac{e^2}{a_0}$$
while for alkali atoms, the energy levels are En,` with energy increasing with `.
• We take the electric field to be F = F ẑ (to avoid confusion with energy), so the perturbation is
V1 = qΦ = eF z
• It’s reasonable to treat this as a small perturbation near the nucleus, since electric fields made
in the laboratory are typically much weaker than those in atoms. However, V1 grows for large
r while V0 falls, so the perturbation analysis doesn’t work for states with sufficiently large n.
• Technically, there don’t exist any bound states at all, no matter how small the field F is, because
the potential will become very negative for z → −∞. Hence all states can tunnel over a barrier
and escape to infinity. We’ll ignore this, since for small n, the barrier width grows as 1/F ,
and hence the tunneling rate falls very quickly as F is decreased. More precisely, it can be
calculated using the WKB approximation.
• The ground state of hydrogen is |100⟩, while the ground state of an alkali atom is |n00⟩. There
is no linear (i.e. first order) Stark effect in this case, because
$$\langle n00|z|n00\rangle = 0$$
by parity, since the states have definite parity and z is odd under parity. We saw a similar
conclusion when discussing electric dipole transitions above.
• The linear Stark effect can only occur if the corresponding eigenstate has a permanent dipole
moment d. Classically, we expect that ∆E = −d · F. Quantum mechanically, the dipole
moment operator and linear energy shift are
$$\mathbf{d} = -e\mathbf{r}, \qquad \Delta E^{(1)} = -\langle \mathbf{d}\rangle \cdot \mathbf{F}.$$
However, hdi must vanish for any nondegenerate energy eigenstate, simply because parity leaves
the energy invariant but flips d. Hence to see a linear Stark effect, we require degenerate states.
• In a generic alkali atom, only states of the same n and ` are degenerate. But as we argued
earlier, the operator z has to change ` by ±1, so again there is no linear Stark effect. More
generally, we need systems with degenerate SU (2) multiplets with opposite parities.
• Now consider the excited states of hydrogen. We focus on the states with principal quantum
number n, which are n2 -fold degenerate. The linear Stark effect depends on the matrix elements
hn`m|eF z|n`0 m0 i.
As shown earlier, we must have ∆` = ±1, and since z is invariant under rotations about the z-
axis, m = m0 . For example, for n = 2, the only states that can be connected by the perturbation
are |200i and |210i.
Writing W for the coupling, with magnitude |W| = |⟨200|eF z|210⟩| = 3eF a₀, the energy W is
of the order of the energy needed to shift the electron from one side of the atom to the other.
This is because |210⟩ has two symmetric lobes, of positive and negative z. Adding on |200⟩
will make one lobe grow and the other shrink, depending on the phase.
Thus the first-order energy shifts are $\Delta E_2^{(1)} = \pm W$. The n = 2 energy level splits into three, with
the new eigenstates
$$|{\pm W}\rangle = \frac{1}{\sqrt{2}}\left(|200\rangle \mp |210\rangle\right)$$
and the states |211i and |21, −1i remaining degenerate at this order.
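This degenerate perturbation theory step can be checked numerically. The sketch below assumes the standard matrix element magnitude |⟨200|z|210⟩| = 3a₀ (the overall sign is a phase convention), and diagonalizes the only nontrivial block by hand.

```python
import math

# Linear Stark effect in the n = 2 subspace of hydrogen, in the ordered
# basis (|200>, |210>, |211>, |21,-1>). The perturbation eF z couples only
# |200> and |210>, with coupling magnitude W = 3 e F a0 (standard result,
# assumed here; sign is a phase convention).
W = 1.0  # in units of 3 e F a0

# The 2x2 block [[0, W], [W, 0]] has eigenvalues +-W with
# eigenvectors (|200> +- |210>)/sqrt(2).
block = [[0.0, W], [W, 0.0]]
evals = [-W, W]
evecs = [[1 / math.sqrt(2), -1 / math.sqrt(2)],
         [1 / math.sqrt(2), 1 / math.sqrt(2)]]

for lam, v in zip(evals, evecs):
    hv = [block[0][0] * v[0] + block[0][1] * v[1],
          block[1][0] * v[0] + block[1][1] * v[1]]
    assert all(abs(hv[i] - lam * v[i]) < 1e-12 for i in range(2))

# |211> and |21,-1> are uncoupled at first order, so the level splits
# into three: -W, 0 (doubly degenerate), +W.
print(sorted(evals + [0.0, 0.0]))
```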
• This degeneracy remains at all orders, and can be explained using symmetries. In fact, it turns
out that this degeneracy can be explained by the surviving subset of the SO(4) symmetry of
the hydrogen atom. However, the degeneracy can also be explained more simply by noting that
Lz and time reversal Θ commute with H.
• From Lz , we know the energy eigenstates can be labeled as |γmi where γ is an additional index.
We also know that Θ flips the sign of m. Hence all states with m 6= 0 are at least doubly
degenerate.
• This result should not be confused with Kramers degeneracy, which applies to systems without
rotational symmetry but an odd number of fermions. Since we have neglected the electron’s
spin, its fermionic nature never came into play above.
Example. In the H2+ molecule, the protons can be treated as roughly fixed. Then there is rotational
symmetry along the axis connecting them, causing two-fold degeneracy for m 6= 0 states as above.
However, in reality the protons are free to move, causing a small splitting known as “Λ-doubling”,
where Λ is the standard name for the magnetic quantum number of the electrons about the axis of
a diatomic molecule.
We now continue discussing the Stark effect in hydrogen.
• In the absence of an external field, the 2p level of hydrogen decays quickly to 1s, with a lifetime
on the order of 10−9 seconds. But the 2s state has a much longer lifetime of 10−1 seconds,
because it decays to 1s by emitting two photons. This makes it easy to prepare a population
of 2s and 1s states.
• However, by turning on an electric field, the 2s and 2p states rapidly evolve into each other.
When such a field is applied to a population of 2s hydrogen atoms, the result is a rapid burst
of photons.
• Now we return to the ground state and consider the first order wavefunction shift. The result is
$$|\psi\rangle = |100\rangle + \sum_{n\ell m \neq 100} \frac{\langle n\ell m|eFz|100\rangle}{E_1 - E_n}\,|n\ell m\rangle.$$
This shifted state has an induced dipole moment ⟨d⟩ = αF, which defines the polarizability α.
More generally, the polarizability could be a tensor, ⟨dᵢ⟩ = αᵢⱼFⱼ + O(F²). We can convert the
polarizability of an atom to a dielectric constant of a gas using the Clausius–Mossotti formula.
• Next, we can compute the energy shift of the ground state to second order, i.e. the quadratic
Stark effect. The result is
$$\Delta E_g^{(2)} = \sum_{n\ell m \neq 100} \frac{\langle 100|eFz|n\ell m\rangle\langle n\ell m|eFz|100\rangle}{E_1 - E_n} = -\frac{1}{2}\alpha F^2 = -\frac{1}{2}\langle \mathbf{d}\rangle\cdot\mathbf{F}.$$
This factor of 1/2 is exactly as expected, because the dipole moment is induced, rather than
permanent; it grows linearly with F as F is turned on.
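This can be made explicit with a standard ramping argument: as the field is slowly increased from 0 to F, the induced dipole at intermediate field F′ is ⟨d⟩ = αF′, so the work done on the atom is

```latex
\Delta E = -\int_0^F \langle \mathbf{d}(F')\rangle \cdot d\mathbf{F}' = -\int_0^F \alpha F'\, dF' = -\frac{1}{2}\alpha F^2.
```

A permanent dipole, by contrast, would contribute the full −⟨d⟩ · F.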
• Calculating α is a little tricky, because we must sum over an infinite number of intermediate
states, including the ionized continuum states. However, a crude estimate can be done using
$$E_n - E_1 \geq E_2 - E_1 = \frac{3}{8}\frac{e^2}{a_0}$$
which implies
$$\alpha < \frac{2e^2}{E_2 - E_1} \sum_{n\ell m} \langle 100|z|n\ell m\rangle\langle n\ell m|z|100\rangle$$
where we have removed the restriction on the sum since the additional term doesn’t contribute
anyway. Recognizing a resolution of the identity,
$$\alpha < \frac{2e^2}{E_2 - E_1}\langle 100|z^2|100\rangle = \frac{2e^2}{E_2 - E_1}\,a_0^2 = \frac{16}{3}a_0^3.$$
In fact, the exact answer is α = (9/2)a30 . We could have guessed that α ∼ a30 from a classical
model, e.g. thinking of the electron as a mass on a spring, but only this calculation gives us
the coefficient.
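The arithmetic here is simple enough to verify directly; the comparison below just checks that the crude bound sits above the exact value quoted in the text, and by how much.

```python
from fractions import Fraction

# Crude upper bound on the hydrogen ground-state polarizability vs the
# exact answer, both in units of a0^3 (values as quoted in the text).
bound = Fraction(16, 3)  # from replacing E_n - E_1 by E_2 - E_1
exact = Fraction(9, 2)   # exact result

assert exact < bound                    # the bound indeed holds
assert float(bound / exact - 1) < 0.20  # and overshoots by under 20%
print(float(bound), float(exact))
```

So even this one-line estimate pins the polarizability to within about 19%.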
• Above, we have discussed a stark difference between a system with degeneracy and a system
without: a lack of degeneracy guarantees no linear Stark effect. But in real life, degeneracy
is never perfect. More precisely, if the degeneracy is weakly broken by some other physics,
then the Stark effect will be quadratic in the regime where that other physics dominates, and
linear when the Stark effect dominates. This is just the case for hydrogen, where the 2s and 2p
degeneracy is already broken by the Lamb shift.
• A more formal way to say this is that the full Hamiltonian can be written as H0 plus a
possibly large number of small perturbations. To get the right answer, we should account for
the most important perturbation first, then treat the next-most important perturbation as a
perturbation on the result, and so on. Of course the physical answer doesn’t depend on how
we do the ordering, but if we choose it wrong, then our resulting series won’t be good.
• In chemistry, one often speaks of molecules with permanent electric dipole moments. This
doesn’t violate parity; it simply means that two energy levels of opposite parity are close
enough that even a small electric field takes the Stark effect into the linear regime; however,
as long as the energy levels are not exactly degenerate (which will always be the case) there is
also a quadratic regime at low fields.
• There are three new terms: the relativistic kinetic energy correction, the Darwin term, and the
spin-orbit term,
$$H_{FS} = H_{RKE} + H_D + H_{SO}$$
where
$$H_{RKE} = -\frac{\alpha^2}{8}p^4, \qquad H_D = \frac{\alpha^2}{8}\nabla^2 V, \qquad H_{SO} = \frac{\alpha^2}{2}\frac{1}{r}\frac{dV}{dr}\,\mathbf{L}\cdot\mathbf{S}$$
and it is clear the terms are all of the same order.
• As we will see below, the energy shifts will all be proportional to (Zα)2 . In fact, the full
expansion from the Dirac equation is a series in (Zα)², and hence is good when Zα ≪ 1. For
heavy atoms such as uranium, it is better to use a fully relativistic treatment.
• Since we are now dealing with spin, we include the spin magnetic quantum number, giving the
unperturbed basis |n`m`ms⟩, which simultaneously diagonalizes L², Lz, S², and Sz. The energy
levels are En = −Z²/2n².
• In general, it is useful to choose a basis which diagonalizes observables that commute with the
full Hamiltonian. If we choose the basis naively, we will have to diagonalize a 2n2 × 2n2 matrix,
while if we choose it well, we get selection rules which break the matrix into smaller pieces.
• As such, it may be useful to consider the total angular momentum J = L + S. Since HRKE is a
scalar, it commutes with L. Since it only depends on the orbital motion, it commutes with S,
and hence with J. Similarly, HD commutes with all of these operators. But we have
$$[\mathbf{L}, H_{SO}] \neq 0, \qquad [\mathbf{S}, H_{SO}] \neq 0$$
but [J, HSO] = 0, since J rotates the entire system. Furthermore, HSO commutes with L² and
S² because, for example, [L², L · S] = [L², L] · S = 0.
• Hence we are motivated to work in the “coupled basis” |n`jmj i which simultaneously diagonal-
izes L2 , S 2 , J 2 , and Jz . This is related to the original basis by Clebsch-Gordan coefficients,
$$|n\ell j m_j\rangle = \sum_{m_\ell, m_s} |n\ell m_\ell m_s\rangle\langle \ell s m_\ell m_s|j m_j\rangle$$
and we will suppress s below. Since all three fine structure terms are diagonal in the coupled
basis, there is no need to do degenerate perturbation theory; we just have to compute their
diagonal matrix elements. (There is no point in going to second order perturbation theory,
since there are other effects that are more important at first order.)
• It’s easier to think about HRKE in the uncoupled basis, then transform to the coupled basis.
This term is purely orbital and commutes with L2 , so
$$\langle n\ell m_\ell m_s|H_{RKE}|n\ell' m_\ell' m_s'\rangle = \delta_{\ell\ell'}\,\delta_{m_s m_s'}\,\langle n\ell m_\ell|H_{RKE}|n\ell m_\ell'\rangle = \delta_{\ell\ell'}\,\delta_{m_s m_s'}\,\delta_{m_\ell m_\ell'}\,\langle n\ell 0|H_{RKE}|n\ell 0\rangle$$
where the last step holds because HRKE is a scalar, so its diagonal elements are independent of m`. To evaluate the result, write
$$H_{RKE} = -\frac{\alpha^2}{2}T^2 = -\frac{\alpha^2}{2}(H_0 - V)^2$$
since we know how to calculate the expectation values of H0 and V ,
$$\langle H_0\rangle = E_n, \qquad \langle V\rangle = -Z\left\langle \frac{1}{r}\right\rangle = -\frac{Z^2}{n^2}$$
• The difficult part is calculating ⟨V²⟩, which requires special function techniques, giving
$$\left\langle \frac{1}{r^2}\right\rangle = \frac{Z^2}{n^3(\ell + 1/2)}$$
• Next, for the spin-orbit term, using L · S = (J² − L² − S²)/2 we have
$$\langle n\ell j m_j|H_{SO}|n\ell j m_j\rangle = \frac{Z\alpha^2}{4}\left(j(j+1) - \ell(\ell+1) - s(s+1)\right)\left\langle n\ell j m_j\left|\frac{1}{r^3}\right|n\ell j m_j\right\rangle$$
where j = ` ± 1/2.
The needed expectation value is
$$\left\langle \frac{1}{r^3}\right\rangle = \frac{Z^3}{n^3\,\ell(\ell+1/2)(\ell+1)}.$$
• In the case ` = 0, the prefactor is zero, but h1/r3 i diverges, so the result is indeterminate. The
proper way to handle this is to regulate the Coulomb singularity, which causes h1/r3 i not to
diverge, giving a result of zero.
• The spin-orbit and Darwin terms both have special cases for ` = 0, contributing or not con-
tributing respectively, but combine into something simple. The total result is
$$\Delta E_{FS} = (Z\alpha)^2(-E_n)\,\frac{1}{n^2}\left(\frac{3}{4} - \frac{n}{j+1/2}\right).$$
Remarkably, the answer only depends directly on n and j, so the energy levels are
$$E_{nj} = -\frac{Z^2}{2n^2}\left[1 - \frac{(Z\alpha)^2}{n^2}\left(\frac{3}{4} - \frac{n}{j+1/2}\right)\right].$$
The energy levels are shifted downward, and the total energy increases with j. Some degeneracy
remains, indicating a residual symmetry of the system.
• As shown here, the Dirac equation gives an exact result for the hydrogen energy levels,
$$E_{nj} = \frac{mc^2}{\sqrt{1 + \left(\dfrac{Z\alpha}{n - j - 1/2 + \sqrt{(j+1/2)^2 - (Z\alpha)^2}}\right)^2}}$$
which recovers mc2 , the ordinary energy levels, and the fine structure when expanded. However,
at the next order additional effects appear which are not captured by the Dirac equation, such
as hyperfine structure and the Lamb shift.
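This expansion is easy to check numerically. The sketch below works in hartree (atomic units, where mc² = c² and c = 1/α), and compares the exact Dirac levels against the first-order fine structure formula for Z = 1; the conversion factor 1 hartree ≈ 6.58 × 10⁶ GHz is assumed.

```python
import math

# Exact Dirac levels vs the fine structure formula for hydrogen (Z = 1),
# in hartree, where m c^2 = c^2 and c = 1/alpha.
alpha = 1 / 137.035999
c = 1 / alpha

def e_dirac(n, j):
    """Binding energy from the exact Dirac result, rest mass subtracted."""
    d = n - j - 0.5 + math.sqrt((j + 0.5)**2 - alpha**2)
    return c**2 * (1 / math.sqrt(1 + (alpha / d)**2) - 1)

def e_fs(n, j):
    """Unperturbed level plus the first-order fine structure shift."""
    return -1 / (2 * n**2) * (1 - alpha**2 / n**2 * (0.75 - n / (j + 0.5)))

for n, j in [(1, 0.5), (2, 0.5), (2, 1.5)]:
    assert abs(e_dirac(n, j) - e_fs(n, j)) < 1e-8  # agree up to O(alpha^4) hartree

# The 2p_{3/2} - 2p_{1/2} splitting, converted to GHz (1 hartree ~ 6.58e6 GHz):
split_ghz = (e_fs(2, 1.5) - e_fs(2, 0.5)) * 6.5797e6
assert 10 < split_ghz < 12  # the ~10 GHz fine structure splitting quoted later
print(split_ghz)
```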
• Some energy levels are shown below, with the fine structure exaggerated for clarity. This
diagram uses spectroscopic notation n`j , where ` = s, p, d, f, . . ..
• The arrows above also show the allowed electric dipole transitions. These are determined by
the matrix elements hn`jmj |x|n0 `0 j 0 m0j i. Note that the operator x is a tensor operator of spin
1 with respect to both purely spatial rotations, generated by L, and rotations of the whole
system, generated by J. Applying the Wigner–Eckart theorem gives the constraints
$$\Delta\ell = \pm 1, \qquad \Delta j = 0, \pm 1, \qquad \Delta m_j = 0, \pm 1.$$
• The Lamb shift is due to the interaction of the electron with the quantized electromagnetic
field. Its most historically important effect is splitting the degeneracy between 2s1/2 and 2p1/2 ,
so that 2s1/2 is about 1 GHz higher than 2p1/2 . For comparison, fine structure places 2p3/2
about 10 GHz higher. Parametrically, the Lamb shift scales as En α3 log(1/α).
• Since 2s1/2 cannot participate in electric dipole transitions, the Lamb shift means that its
dominant decay mode is to 2p1/2 , upon which the atom quickly decays to 1s1/2 .
• In alkali atoms, much of the above reasoning also goes through, except that here the degeneracy
in ` is already strongly split by the non-Coulomb nature of the potential. In this case, the most
important effect is the spin-orbit coupling, because this is the only term that breaks degeneracy
in j. By a similar analysis,
$$\Delta E_{SO} = \frac{\alpha^2}{4}\left(j(j+1) - \ell(\ell+1) - 3/4\right)\left\langle \frac{1}{r}\frac{dV}{dr}\right\rangle.$$
For example, this term splits the 3p level of sodium to 3p1/2 and 3p3/2 . When these levels decay
to 3s, one observes the sodium doublet.
Note. The Lamb shift is just an additional smearing like the Darwin term, which is due to interaction
with vacuum fluctuations. Consider an atom in a large cubical box of side length L. The modes of
the quantum electromagnetic field perpetually have vacuum energy ~ωk , where ωk is their frequency.
These quantum fluctuations form a randomly varying classical electric field Ek , where
|Ek |2 L3 ∼ ~ωk
since both sides measure the total field energy in that mode. The random fluctuations change over
a characteristic time τ ∼ 1/ωk , over which the displacement of the particle is
$$\delta r \sim e|E_k|\tau^2 \sim \frac{e|E_k|}{\omega_k^2}.$$
Since the fluctuations of these modes are independent, the mean square fluctuation is
$$\langle \delta r^2\rangle \sim \sum_k \frac{e^2|E_k|^2}{\omega_k^4} \sim e^2\hbar \sum_k \frac{1}{(L\omega_k)^3} \sim e^2\hbar \left(\frac{L}{\hbar}\right)^3 \int \frac{d^3k}{(L\omega_k)^3} \sim \frac{e^2\hbar}{c^3}\int \frac{dk}{k}$$
where we used the fact that states are spaced in momentum space by ∆k ∼ ~/L. This integral is
logarithmically divergent, but we should put in cutoffs. Modes with wavelengths larger than the
atom don’t affect the electron much, just pushing on it adiabatically, while modes with wavelengths
smaller than the electron’s Compton wavelength will instead cause new particles to spontaneously
pop out of the vacuum. The ratio between these two scales is α, so
$$\langle \delta r^2\rangle \sim \frac{e^2\hbar}{c^3}\log\frac{1}{\alpha}.$$
Following the same reasoning for the Darwin term, this gives an energy shift of
$$\frac{\Delta E}{E_n} \sim \alpha^3 \log\frac{1}{\alpha}$$
for ` = 0 states. One can use a similar story to justify the Darwin term within quantum field
theory. Instead of interacting with virtual photons, an electron-positron pair suddenly, spontaneously
appears out of the vacuum. The positron annihilates the old electron and the new electron continues
on in its place, effectively allowing the electron’s position to teleport.
One might wonder how much of these amazing stories are actually true. Unfortunately, while
this kind of reasoning is common in popular science, it bears little resemblance to how quantum field
theory actually works. “Quantum fluctuations” do not behave like classical stochastic ones, and the
theory would not make any sense if they did. We only arrive at the correct answer by accident: while
the words above are wrong, the equations happen to have similar dimensions to the real, correct
ones. While this derivation worked, almost all heuristic arguments using quantum fluctuations give
wrong answers. If one took the above statements seriously, as many legitimately curious laypeople
do, one would also conclude, for example, that electrons with definite momentum actually are
randomly battered around by field fluctuations, and hence infinite energy can be extracted from
the vacuum by harnessing the kinetic energy the electrons pick up.
Using fragile and misleading analogies like this is perhaps responsible for the majority of pop-
ular misconceptions about quantum mechanics. Luckily, for practitioners who do know quantum
mechanics, the properties of these supposed fluctuations are so vague that one can just tweak a
“derivation” using them to get any desired answer. This makes them good at providing an “intuitive”
explanation of a real calculation one has already done, but not a good tool when one doesn’t know
the answer in advance.
Note. A quick and dirty derivation of Thomas precession. Consider an electron moving at speed
v ≪ c, which is following a straight track, which suddenly turns by an angle θ ≪ 1. In the electron's
frame, the track is length contracted in the longitudinal direction, so it has a larger turn angle,
θ′ ≈ γθ. That is, the electron thinks it turns by a larger amount than it does in the lab frame, by
$$\frac{\theta' - \theta}{\theta} \approx \gamma - 1 \approx \frac{v^2}{2c^2}.$$
If the electron moves uniformly in a circle in the lab frame, then the "extra" precession accumulates at the rate
$$\omega_T = \frac{\omega v^2}{2c^2} = \frac{av}{2c^2}$$
and thinking a bit about the directions gives
$$\boldsymbol{\omega}_T = \frac{\mathbf{v}\times\mathbf{a}}{2c^2}.$$
This is the result for Thomas precession in the nonrelativistic limit. Plugging in a = r̂(dV /dr)
shows that half of the naive spin-orbit contribution is cancelled, as claimed above. The exact result,
which properly should be derived by integrating infinitesimal Lorentz transformations, is
$$\boldsymbol{\omega}_T = \frac{\gamma^2}{\gamma+1}\,\frac{\mathbf{v}\times\mathbf{a}}{c^2}.$$
• We continue to use atomic units, where c = 1/α ≈ 137. This means the Bohr magneton is
$$\mu_B = \frac{e\hbar}{2mc} = \frac{1}{2c} = \frac{\alpha}{2}.$$
Taking the electron g factor to be 2, we hence have
$$\boldsymbol{\mu} = -g\mu_B\,\frac{\mathbf{S}}{\hbar} = -\alpha\mathbf{S}$$
so the energy of interaction of an electron spin in a magnetic field is
−µ · B = αB · S.
• A natural atomic unit for the magnetic field is
$$B_0 = \frac{e}{a_0^2} = \frac{m^2 e^5}{\hbar^4} = 1.72\times 10^3\ \mathrm{T}.$$
This is equal to the electric field at the Bohr radius, which in Gaussian units has the same units
as the magnetic field.
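This field scale can be double-checked in SI units; the sketch below uses standard CODATA constants and the fact that a Gaussian electric field of E statvolt/cm numerically equals a magnetic field of E gauss, i.e. B₀ = E/c in SI.

```python
import math

# Reproduce B0 = e/a0^2 ~ 1.72e3 T in SI units: the electric field at the
# Bohr radius, divided by c (Gaussian E and B have the same units).
e = 1.602176634e-19        # C
eps0 = 8.8541878128e-12    # F/m
a0 = 5.29177210903e-11     # m
c = 2.99792458e8           # m/s
alpha = 1 / 137.035999

E_field = e / (4 * math.pi * eps0 * a0**2)   # ~5.14e11 V/m
B0 = E_field / c                             # ~1.72e3 T

assert abs(E_field / 5.14e11 - 1) < 0.01
assert abs(B0 / 1.72e3 - 1) < 0.01
assert abs((B0 / alpha) / 2.35e5 - 1) < 0.01  # the atomic unit of B, used below
print(B0)
```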
• However, the most important quantity for perturbation theory is the magnitude of the force;
magnetic forces are suppressed by a factor of v/c = α relative to electric ones. Hence for a
magnetic field perturbation to be comparable in effect to the electrostatic field, we need field
strength B₀/α = 2.35 × 10⁵ T, which is much higher than anything that can be made in the
lab. As such, we will always treat the magnetic fields as weak.
• Turning on a uniform magnetic field B = Bẑ, the kinetic term becomes
$$T = \frac{1}{2}(\mathbf{p} + \alpha\mathbf{A})^2 = \frac{p^2}{2} + \alpha\,\mathbf{p}\cdot\mathbf{A} + \frac{\alpha^2}{2}A^2 = T_1 + T_2 + T_3.$$
• In the gauge A = B × r/2, the term T₃ is
$$T_3 = \frac{\alpha^2}{8}B^2(x^2 + y^2)$$
and hence behaves like a potential. However, it is suppressed by another power of α, and hence
can be dropped.
• The last term is the spin term αB · S. Combining this with T2 gives the total perturbation
$$H_Z = \frac{\alpha}{2}\,\mathbf{B}\cdot(\mathbf{L} + 2\mathbf{S}) = \frac{\alpha}{2}B(L_z + 2S_z).$$
The reason we can’t drop the fine structure contributions is that they scale as α2 , while the
Zeeman perturbation scales as αB. As a crude estimate, the two are equally important for field
strengths αB0 ∼ 10 T, which is quite high, though the threshold is actually about a factor of
10 smaller due to suppression by dimensionless quantum numbers.
• On the scale of materials, the spin and T2 terms are responsible for Pauli paramagnetism, while
the T3 term is responsible for Landau diamagnetism; we’ve seen both when covering statistical
mechanics. The Zeeman effect is also used to measure magnetic fields via spectral lines.
First, we consider the strong field case, where HZ dominates. This strong-field Zeeman effect is also
called the Paschen–Back effect. Note that we can’t take the field strength to be too high, or else
the term T3 will become important.
• The first task is to choose a good basis. Since the magnetic field is in the ẑ direction, HZ
commutes with Lz , Sz , and Jz . Furthermore, it commutes with L2 and S 2 . However, we have
$$[J^2, H_Z] \neq 0$$
because J 2 contains L · S, which in turn contains Lx Sx + Ly Sy . Thus, the Zeeman effect prefers
the uncoupled basis.
• In the uncoupled basis, the perturbation is already diagonal, so we just read off
$$\Delta E = \frac{\alpha}{2}B\,\langle n\ell m_\ell m_s|L_z + 2S_z|n\ell m_\ell m_s\rangle = \frac{\alpha}{2}B(m_\ell + 2m_s) = \mu_B B(m_\ell + 2m_s).$$
Note that if one didn’t know about spin, one would expect that a spectral line always splits into
an odd number of lines, since ∆E = µB Bm` . Violations of this rule were called the anomalous
Zeeman effect, and were one of the original pieces of evidence for spin. (In fact, a classical model
of the atom can account for three lines, one of the most common cases. The lines correspond
to the electron oscillating along the field, and rotating clockwise and anticlockwise about it.)
• The 2p states |m`ms⟩ = |−1, ½⟩ and |1, −½⟩ are degenerate. This degeneracy is broken by QED
corrections to the electron g factor, though this is suppressed by another factor of α. This
result holds identically for alkali atoms.
• For one-electron atoms, some of the 2s states are also degenerate with the 2p states, as |`m`ms⟩ =
|00½⟩ is degenerate with |10½⟩, and |00, −½⟩ with |10, −½⟩. In total, the eight n = 2 states are
split into five energy levels, three of which have two-fold degeneracy.
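The level counting above can be verified by brute force: enumerate the eight n = 2 states and bin them by the strong-field shift m` + 2ms.

```python
from collections import Counter

# Strong-field (Paschen-Back) Zeeman shifts, proportional to (m_l + 2 m_s),
# for the eight n = 2 states of hydrogen (2s: l = 0, 2p: l = 1),
# ignoring fine structure.
states = [(l, ml, ms) for l in (0, 1)
          for ml in range(-l, l + 1) for ms in (-0.5, 0.5)]
shifts = Counter(ml + 2 * ms for (l, ml, ms) in states)

assert len(states) == 8
assert len(shifts) == 5                            # five distinct levels
assert sorted(shifts.values()) == [1, 1, 2, 2, 2]  # three are two-fold degenerate
print(dict(shifts))
```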
• We now consider the impact of fine structure, treating HZ as part of the unperturbed Hamilto-
nian. For simplicity, we only consider the spin-orbit contribution,
$$H_{SO} = f(r)\,\mathbf{L}\cdot\mathbf{S}, \qquad f(r) = \frac{\alpha^2}{2}\frac{1}{r}\frac{dV}{dr}.$$
This is the conceptually trickiest one, since it prefers the coupled basis, while we must work in
the uncoupled basis |n`m` ms i, where there are two-fold degeneracies.
• Using this basis is tricky because HSO can modify m` and ms values (though not ` values, since
[L², HSO] = 0). However, it can only modify m` and ms by at most one unit at a time, since
$$\mathbf{L}\cdot\mathbf{S} = \frac{1}{2}(L_+S_- + L_-S_+) + L_zS_z$$
or by applying the Wigner–Eckart theorem. The degenerate 2p states differ in m` by two
units, so HSO can't mix the degenerate states. Hence to calculate the first order shift, it suffices
to look at its diagonal matrix elements.
$$\Delta E = \frac{\alpha^2}{2}m_\ell m_s\left\langle n\ell 0\left|\frac{1}{r^3}\right|n\ell 0\right\rangle = \frac{\alpha^2}{2}\frac{m_\ell m_s}{n^3\,\ell(\ell+1/2)(\ell+1)}.$$
In the case ` = 0, the form above is indeterminate, but the energy shift is zero by similar
reasoning to before.
• For hydrogen, we should properly consider the Lamb shift, which is only 10 times smaller than
the fine structure shifts on the n = 2 energy levels. However, we will ignore it for simplicity.
• In this case, we need to use the coupled basis |n`jmj i. The difficulty is that [J 2 , HZ ] 6= 0.
Luckily, the fine-structure energy levels depend directly on j, in the sense that energy levels
with different j are not degenerate. Hence to calculate the first-order shift, we again do not
have to diagonalize any matrices, and can focus on the diagonal elements,
writing Lz + 2Sz = Jz + Sz, applying the projection theorem ⟨Sz⟩ = mj⟨S · J⟩/j(j + 1), and using
$$\mathbf{S}\cdot\mathbf{J} = \frac{1}{2}\left(J^2 + S^2 - L^2\right).$$
This gives the result
$$\Delta E = \frac{\alpha}{2}B\,g_L m_j, \qquad g_L = \frac{3}{2} + \frac{s(s+1) - \ell(\ell+1)}{2j(j+1)}.$$
• The fundamental reason we can write the shift as linear in mj , even when it depends on m` and
ms separately, is again the Wigner–Eckart theorem: there is only one possible vector operator
on the relevant subspace.
• The naive classical result would be gL = 1 + 1/2 = 3/2, and the result here is different because
J, L and S are not classical vectors, but rather noncommuting quantum operators. (A naive
intuition here is that, due to the spin-orbit coupling, L and S are rapidly changing; we need to
use the projection theorem to calculate their component along J, which changes more slowly
because the magnetic field is weak.) Note that gL satisfies the expected limits: when ` = 0 we
have gL = 2, while for ` → ∞ we have gL → 1.
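These limits, and the familiar special cases, follow from the standard Landé g-factor formula g_L = 3/2 + [s(s+1) − ℓ(ℓ+1)]/2j(j+1) for s = 1/2, which can be checked directly:

```python
# Standard Lande g-factor for a single electron (s = 1/2).
def g_lande(l, j, s=0.5):
    return 1.5 + (s * (s + 1) - l * (l + 1)) / (2 * j * (j + 1))

assert g_lande(0, 0.5) == 2.0                   # l = 0: pure spin, g = 2
assert abs(g_lande(1, 0.5) - 2 / 3) < 1e-12    # p_{1/2}
assert abs(g_lande(1, 1.5) - 4 / 3) < 1e-12    # p_{3/2}
assert abs(g_lande(1000, 1000.5) - 1) < 1e-2   # l -> infinity: orbital, g -> 1
print(g_lande(1, 0.5), g_lande(1, 1.5))
```

The p₁/₂ and p₃/₂ values 2/3 and 4/3 are what appear in analyses of the sodium doublet.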
• For stronger magnetic fields, we would have to calculate the second-order effect, which does
involve mixing between subspaces of different `. For the n = 2 energy levels this isn’t too
difficult, as only pairs of states are mixed, so one can easily calculate the exact answer.
• Hyperfine effects couple the nucleus and electrons together, thereby enlarging the Hilbert space.
They have many useful applications. For example, the hyperfine splitting of the ground state
of hydrogen produces the 21 cm line, which is useful in radio astronomy. Most atomic clocks
use the frequency of a hyperfine transition in a heavy alkali atom, such as rubidium or cesium,
the latter of which defines the second.
• We will denote the spin of the nucleus by I, and as usual assume the nucleus is described by a
single irrep, of I 2 eigenvalue i(i + 1)~2 . The nucleus Hilbert space is spanned by |imi i.
• For stable nuclei, i ranges from 0 to 15/2. For example, the proton has i = 1/2, the deuteron
has i = 1, and 133 Cs, used in atomic clocks, has i = 7/2.
• We restrict to nuclei with i = 1/2, in which case the only possible multipole moment, besides
the electric monopole, is the magnetic dipole.
The magnetic field of such a point dipole µ can be written as
$$\mathbf{B} = \frac{8\pi}{3}\boldsymbol{\mu}\,\delta(\mathbf{r}) + \frac{\boldsymbol{\mu}\cdot T}{r^5}, \qquad T = 3\mathbf{r}\mathbf{r} - r^2 I.$$
Here we're mixing vector and tensor notation; I is the identity tensor, T is the quadrupole
tensor, and dotting with µ on the left indicates contraction with the first index. The delta
function terms, present for all physical dipoles, will be important for the final result.
The full Hamiltonian is then
$$H = \frac{1}{2}\left(\mathbf{p} + \frac{\mathbf{A}}{c}\right)^2 + V(r) + H_{FS} + H_{Lamb} + \frac{1}{c}\mathbf{S}\cdot\mathbf{B}.$$
• The magnetic moment of the nucleus is
$$\boldsymbol{\mu} = g_N \mu_N \mathbf{I}$$
where µN is the nuclear magneton. The states in the Hilbert space can be written as |n`jmjmi⟩,
which we refer to as the "uncoupled" basis since J and I are uncoupled.
• As in our analysis of the Zeeman effect, the vector potential is in Coulomb gauge and the A2
term is negligible, so by the same logic we have
$$H_1 = \frac{1}{c}\left(\mathbf{p}\cdot\mathbf{A} + \mathbf{S}\cdot\mathbf{B}\right).$$
However, it will be more difficult to evaluate these orbital and spin terms.
• For the orbital term, the dipole vector potential is proportional to I × r, and
$$\mathbf{p}\cdot(\mathbf{I}\times\mathbf{r}) = \mathbf{I}\cdot(\mathbf{r}\times\mathbf{p}) = \mathbf{I}\cdot\mathbf{L}$$
where one can check there are no ordering issues. Similarly, there are no ordering issues in the
spin term, since S and I act on separate spaces. Hence we arrive at
$$H_{1,\text{orb}} = k(\mathbf{I}\cdot\mathbf{L})\left(\frac{4\pi}{3}\delta(\mathbf{r}) + \frac{1}{r^3}\right), \qquad H_{1,\text{spin}} = k\left(\frac{8\pi}{3}\delta(\mathbf{r})(\mathbf{I}\cdot\mathbf{S}) + \frac{\mathbf{I}\cdot T\cdot\mathbf{S}}{r^5}\right).$$
The delta function terms are called Fermi contact terms, and we have defined
$$k = 2g_N\mu_B\mu_N = g_e g_N\mu_B\mu_N.$$
The term H1,spin is a spin-spin interaction, while H1,orb can be thought of as the interaction of
the moving electron with the proton’s magnetic field.
• It’s tempting to add in additional terms, representing the interaction of the proton’s magnetic
moment with the magnetic field produced by the electron, due to its spin and orbital motion.
These give additional copies of H1,spin and H1,orb respectively, but they shouldn’t be added
since they would double count the interaction.
• The terms I · L and I · S don’t commute with L, S, or I. So just as for fine structure, we are
motivated to go to the coupled basis. We define F = J + I and diagonalize L2 , J 2 , F 2 , and Fz .
The coupled basis is related to the uncoupled one as
$$|n\ell j f m_f\rangle = \sum_{m_j, m_i} |n\ell j m_j m_i\rangle\langle j i m_j m_i|f m_f\rangle.$$
To relate this coupled basis to the original uncoupled basis |n`m`msmi⟩, we need to apply
Clebsch–Gordan coefficients twice. Alternatively, we can use tools such as the Wigner 6j symbols
or the Racah coefficients to do the addition in one step.
• In the coupled basis, the perturbation is diagonal, so we again can avoid diagonalizing matrices.
It suffices to compute diagonal matrix elements,
• First we consider the case ` 6= 0, where the contact terms do not contribute. We can write the
energy shift as
$$\Delta E = k\langle n\ell j f m_f|\mathbf{I}\cdot\mathbf{G}|n\ell j f m_f\rangle, \qquad \mathbf{G} = \frac{\mathbf{L}}{r^3} + \frac{T\cdot\mathbf{S}}{r^5} = \frac{\mathbf{L}}{r^3} + \frac{3\mathbf{r}(\mathbf{r}\cdot\mathbf{S}) - r^2\mathbf{S}}{r^5}.$$
• The quantity G is a purely electronic vector operator, and we are taking matrix elements within
a single irrep of electronic rotations (generated by J), so we may apply the projection theorem,
$$\Delta E = \frac{k}{j(j+1)}\langle n\ell j f m_f|(\mathbf{I}\cdot\mathbf{J})(\mathbf{J}\cdot\mathbf{G})|n\ell j f m_f\rangle.$$
• Now consider the case ` = 0. As we just saw, the non-contact terms get a factor of J·G = L2 /r3 ,
so they vanish in this case. Only the contact term in H1,spin contributes, giving
$$\Delta E = \frac{8\pi}{3}k\,\langle \delta(\mathbf{r})(\mathbf{I}\cdot\mathbf{S})\rangle.$$
Since F = I + S when L = 0, we have
$$\mathbf{I}\cdot\mathbf{S} = \frac{1}{2}(F^2 - I^2 - S^2) = \frac{1}{2}\left(f(f+1) - \frac{3}{2}\right).$$
The delta function is evaluated as for the Darwin term. The end result is that the energy shift
we found above for ` 6= 0 also holds for ` = 0.
• When the hyperfine splitting is included, the energy levels become En`jf . The states |n`jf mf i
are (2f + 1)-fold degenerate.
• For example, the ground state 1s1/2 of hydrogen splits into two levels, where f = 0 is the true
ground state and f = 1 is three-fold degenerate; these correspond to antiparallel and parallel
nuclear and electronic spins. The frequency difference is about 1.42 GHz, which corresponds to
a 21 cm wavelength.
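The I · S eigenvalues and the quoted wavelength are quick to verify numerically:

```python
# Hyperfine structure of 1s hydrogen: I.S eigenvalues for i = s = 1/2,
# and the wavelength of the ~1.42 GHz transition.
def i_dot_s(f):
    # I.S = (1/2)(f(f+1) - 3/2), in units of hbar^2
    return 0.5 * (f * (f + 1) - 1.5)

assert i_dot_s(0) == -0.75  # singlet: antiparallel nuclear and electron spins
assert i_dot_s(1) == 0.25   # triplet: parallel spins

c = 2.99792458e8            # m/s
wavelength = c / 1.420e9    # ~0.211 m, i.e. the 21 cm line
assert abs(wavelength - 0.21) < 0.005
print(wavelength)
```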
• The 2s1/2 and 2p1/2 states each split similarly; the hyperfine splitting within these levels is
smaller than, but comparable to, the Lamb shift between them. The fine structure level 2p3/2
also splits, into f = 1 and f = 2.
• For electric dipole transitions between hyperfine levels, we need matrix elements of xq. The
Wigner–Eckart theorem can be applied to rotations in J, F, and I separately, under each
of which xq is a k = 1 irreducible tensor operator, giving the constraints
$$\Delta j = 0, \pm 1, \qquad \Delta f = 0, \pm 1, \qquad \Delta m_f = 0, \pm 1.$$
• Finally, there is a special case for f′ = 0, because this is the only representation that, upon
multiplication by the spin 1 representation, does not contain itself: 0 ∉ 0 ⊗ 1. This means we
cannot have a transition from f 0 = 0 to f = 0. The same goes for `, but this case is already
excluded by parity.
• Note that the 21 cm line of hydrogen is forbidden by the rules above; it actually proceeds as a
magnetic dipole transition. The splitting is small enough for it to be excited by even the cosmic
microwave background radiation. The 21 cm line is especially useful because its wavelength is
too large to be scattered effectively by dust. Measuring its intensity gives a map of the atomic
hydrogen gas distribution, measuring its Doppler shift gives information about the gas velocity,
and measuring its line width determines the temperature. Doppler shift measurements were
used to map out the arms of the Milky Way. (These statements hold for atomic hydrogen;
molecular hydrogen (H2 ) has a rather different hyperfine structure.)
• It is occasionally useful to consider both the weak-field Zeeman effect and hyperfine structure.
Consider a fine structure energy level with j = 1/2. For each value of mf there are two states,
with f = i ± 1/2. The two perturbations don’t change mf , so they only mix pairs of states.
Thus the energy level splits into pairs of levels, which are relatively easy to calculate; the result
is the Breit–Rabi formula. The situation is just like how the Zeeman effect interacts with fine
structure, but with (`, s) replaced with (j, i). At lower fields the coupled basis is preferred,
while at higher fields the uncoupled basis is preferred.
Note. The perturbations we’ve considered, relative to the hydrogen energy levels, are of order:
$$\text{fine structure: } \alpha^2, \qquad \text{Lamb: } \alpha^3\log\frac{1}{\alpha}, \qquad \text{Zeeman: } \alpha B, \qquad \text{hyperfine: } \alpha^2\,\frac{m_e}{m_p}$$
where α ∼ 10−2 , me /mp ∼ 10−3 , and the fine structure is suppressed by O(10) numeric factors.
The hydrogen energy levels themselves are of order α2 mc2 .
It’s interesting to see how these scalings are modified in positronium. The fine structure is
still α2 , but the Lamb shift enters at the same order, since there is a tree-level diagram where
the electron and positron annihilate and reappear; the Lamb shift for hydrogen is loop-level. The
hyperfine splitting also enters at order α2 , so one must account for all of these effects at once.
• The variational method is a rather different kind of approximation method, which does not
require perturbing about a solvable Hamiltonian. It is best used for approximating the energies
of ground states.
• Let H be a Hamiltonian with at least some bound states, and energy eigenvalues E₀ < E₁ < E₂ < .... Then for any normalizable state |ψ⟩, we have

    ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩ ≥ E₀.

The reason is simple: |ψ⟩ has some component along the true ground state and some component orthogonal to it. The first component has expected energy E₀, while the second has expected energy at least E₀.
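As a quick numerical illustration (not from the original notes), the bound can be checked for a random Hermitian matrix standing in for a Hamiltonian on a finite-dimensional Hilbert space:

```python
import numpy as np

# Sketch: for a random Hermitian "Hamiltonian", the Rayleigh quotient of any
# state is bounded below by the lowest eigenvalue, as the argument above shows.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))
H = (A + A.T) / 2                    # a random Hermitian matrix
E0 = np.linalg.eigvalsh(H)[0]        # lowest eigenvalue

for _ in range(1000):
    psi = rng.normal(size=8)         # a random (unnormalized) state
    assert psi @ H @ psi / (psi @ psi) >= E0 - 1e-12
```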
• If we can guess |ψ⟩ so that its overlap with the ground state is 1 − ε when normalized, then its expected energy will match the ground state energy up to O(ε²) corrections.
• In practice, we use a family of trial wavefunctions |ψ(λ)⟩ and minimize the “Rayleigh–Ritz quotient”,

    F(λ) = ⟨ψ(λ)|H|ψ(λ)⟩ / ⟨ψ(λ)|ψ(λ)⟩

to approximate the ground state energy. This family could either be linear (i.e. a linear subspace of the Hilbert space) or nonlinear (e.g. the set of Gaussian wavefunctions). For a linear family, minimizing F subject to normalization with a Lagrange multiplier β just tells us that |ψ⟩ is an eigenvector of the Hamiltonian restricted to our variational subspace, with eigenvalue β. Our upper bound on the ground state energy is then the lowest eigenvalue of this restricted Hamiltonian, which is intuitive.
• This sort of procedure is extremely common when computing ground state energies numerically,
since a computer can’t work with an infinite-dimensional Hilbert space. The variational principle
tells us that we always overestimate the ground state energy by truncating the Hilbert space,
and that the estimates always go down as we add more states.
• In fact, we can say more. Let β_m^(M) be the mth lowest energy eigenvalue for the Hamiltonian truncated to a subspace of dimension M. The Hylleraas–Undheim theorem states that if we expand to a subspace of dimension N > M,

    β_m^(N) ≤ β_m^(M) ≤ β_{N−M+m}^(N).

In particular, if the Hilbert space has finite dimension N, then the variational estimate can become exact, giving

    E_m ≤ β_m^(M) ≤ E_{N−M+m}.
This means that we can extract both upper bounds and lower bounds on excited state energies,
though still only an upper bound for the ground state energy.
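These inequalities are just eigenvalue interlacing for nested principal submatrices, and can be checked directly on a random Hermitian matrix (an illustration, not from the notes):

```python
import numpy as np

# Truncate a fixed Hermitian "Hamiltonian" to nested subspaces spanned by the
# first M and first N basis vectors, and verify the interlacing inequalities.
rng = np.random.default_rng(0)
dim, M, N = 12, 4, 7
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2

E = np.linalg.eigvalsh(H)            # exact spectrum E_0 <= E_1 <= ...
bM = np.linalg.eigvalsh(H[:M, :M])   # beta^(M): subspace of dimension M
bN = np.linalg.eigvalsh(H[:N, :N])   # beta^(N): larger subspace, N > M

for m in range(M):
    assert bN[m] <= bM[m] <= bN[N - M + m]   # Hylleraas-Undheim
    assert E[m] <= bM[m] <= E[dim - M + m]   # bounds against the exact levels
```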
• Another way to derive information about excited states is to use symmetry properties. For
example, for an even one-dimensional potential, the ground state is even, so we get a variational
upper bound on the first excited state’s energy by using odd trial wavefunctions. More generally,
we can upper bound the energy of the lowest excited state with any given symmetry.
    E(α*) = 1.08,

which is a fairly good estimate. Now, the first excited state has E₁ ≈ 3.80. We can estimate this with an odd trial wavefunction, such as

    ψ(x, α) = (4α³/π)^{1/4} x e^{−αx²/2},

which gives an estimate E(α*) = 3.85.
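These numbers can be reproduced on a grid. The excerpt's setup for this example was lost, but the quoted values are consistent with the quartic Hamiltonian H = p² + x⁴ (an inference, since with that normalization E₀ ≈ 1.060 and E₁ ≈ 3.800):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Rayleigh quotients for Gaussian-type trial states under H = p^2 + x^4
# (assumed normalization, inferred from the quoted numbers).
x = np.linspace(-8, 8, 4001)
dx = x[1] - x[0]

def rayleigh(psi):
    dpsi = np.gradient(psi, dx)
    num = np.sum(dpsi**2) * dx + np.sum(x**4 * psi**2) * dx  # <p^2> + <x^4>
    return num / (np.sum(psi**2) * dx)

E_even = minimize_scalar(lambda a: rayleigh(np.exp(-a*x**2/2)),
                         bounds=(0.2, 5), method='bounded').fun
E_odd = minimize_scalar(lambda a: rayleigh(x * np.exp(-a*x**2/2)),
                        bounds=(0.2, 5), method='bounded').fun
print(E_even, E_odd)   # ~1.082 and ~3.847, the text's 1.08 and 3.85
```

The analytic minima are at α = 3^{1/3} and α = 5^{1/3} respectively, which the bounded scan recovers.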
The variational principle can be used to prove a quantum version of the virial theorem.
• Consider a power law potential, V(r) ∝ rⁿ, in d dimensions, and suppose there is a normalizable ground state ψ₀(x). Then the wavefunction ψ(x) = α^{d/2} ψ₀(αx), which is squished by α, has

    E(α) = α² ⟨T⟩₀ + α⁻ⁿ ⟨V⟩₀.

We know the minimum of this function must occur at α = 1, since this is the true ground state. This implies the virial theorem,

    2⟨T⟩₀ = n⟨V⟩₀

where the expectation values are taken in the ground state.
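The theorem can be verified on a grid (a sketch, using the harmonic oscillator n = 2 with H = p²/2 + x²/2, where the answer is known):

```python
import numpy as np

# Finite-difference check of 2<T> = n<V> for n = 2: both expectation values
# in the ground state should come out close to 0.25.
Npts, L = 1000, 20.0
x = np.linspace(-L/2, L/2, Npts)
dx = x[1] - x[0]
T = (np.diag(np.full(Npts, 1.0)) - 0.5*np.diag(np.ones(Npts-1), 1)
     - 0.5*np.diag(np.ones(Npts-1), -1)) / dx**2       # -(1/2) d^2/dx^2
V = np.diag(0.5 * x**2)

E, vecs = np.linalg.eigh(T + V)
psi = vecs[:, 0]                                       # ground state
T_avg, V_avg = psi @ T @ psi, psi @ V @ psi
print(T_avg, V_avg)   # both ~0.25, so 2<T> = 2<V> as the virial theorem says
```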
• For the harmonic oscillator, we have ⟨T⟩₀ = ⟨V⟩₀, which fits with what we already know.
• For the Coulomb potential, we have E₀ = −⟨T⟩₀ < 0, which makes sense since we know this potential has bound states. However, the theorem doesn’t apply to a repulsive Coulomb potential, because the ground state is not normalizable; it contains plane waves.
• For a −1/r² potential, the virial theorem gives E₀ = 0. But if we revisit the derivation, we find E(α) ∝ α², so E(α) has no minimum at all, and the virial theorem does not apply. What’s really going on is that the spectrum is unbounded below: given any state of negative energy, one can always lower the energy further by squeezing it, so there is no ground state at all.
• Physically, this occurs because for a −1/r2 potential, the Schrodinger equation has no length
scales. More realistically, since a spectrum can’t actually be unbounded below, it means we
need to regularize the potential.
• For a −1/r³ potential, the virial theorem gives E₀ > 0. This seems puzzling, because one would think that for a sufficiently strong potential there would be bound states. In fact, there are none, and we can understand why classically: particles in such a potential are unstable against falling all the way down to r = 0.
Note. Bound states in various dimensions. To prove that bound states exist, it suffices by the variational principle to exhibit any state for which ⟨H⟩ < 0.
In one dimension, any overall attractive potential (i.e. one whose average potential is negative)
which falls off at infinity has a bound state. To see this, consider a Gaussian centered at the origin
with width λ. Then for large λ, the kinetic energy falls as 1/λ2 while the potential energy falls as
1/λ, since this is the fraction of the probability over the region of significant potential. Then for
sufficiently large λ, the energy is negative.
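The scaling argument can be made concrete numerically (an illustration with arbitrary parameters: a square well of depth V0 = 10⁻³ and half-width a = 1, in units with ℏ = m = 1):

```python
import numpy as np

# <H> for a normalized Gaussian of width lam in a shallow 1D square well.
# The kinetic term falls as 1/lam^2 while the potential term falls only as
# 1/lam, so a wide enough Gaussian always has negative energy.
V0, a = 1e-3, 1.0   # illustrative values, not from the text

def energy(lam):
    x = np.linspace(-40*lam, 40*lam, 200001)
    dx = x[1] - x[0]
    psi = (np.pi * lam**2)**-0.25 * np.exp(-x**2 / (2*lam**2))
    kinetic = 0.5 * np.sum(np.gradient(psi, dx)**2) * dx
    potential = -V0 * np.sum((np.abs(x) < a) * psi**2) * dx
    return kinetic + potential

print(energy(1.0) > 0, energy(300.0) < 0)   # narrow: positive; wide: negative
```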
This argument does not work in more than one dimension. In fact, the statement remains
true in d = 2, as can be proven using a more sophisticated ansatz, as shown here. In d = 3 the
statement is not true; for instance, a sufficiently weak delta function well doesn’t have any bound
states. Incidentally, for central potentials in d = 3, if there exist bound states, then the ground
state must be an s-wave. This is because, given any bound state that is not an s-wave, one can get a variational wavefunction with lower ⟨H⟩ by converting it to an s-wave.
Note. In second order nondegenerate perturbation theory, we saw that energy levels generally
“repel” each other, which means that the ground state is pushed downward at second order. This
might lead us to guess that the first order result is always an overestimate of the ground state
energy. That can’t be justified rigorously with perturbation theory alone, but it follows rigorously
from the variational principle, because the first order result is just the energy expectation of the
unperturbed ground state |0i.
11 Atomic Physics
11.1 Identical Particles
In this section, we will finally consider quantum mechanical systems with multiple, interacting
particles. To begin, we discuss some bookkeeping rules for identical particles.
• For concreteness, consider two identical particles of mass m, interacting through a potential that depends only on their separation,

    H = p₁²/2m + p₂²/2m + V(|x₂ − x₁|).
Examples of such systems include homonuclear diatomic molecules such as H₂, N₂, or Cl₂. The statements we will make below only apply to these molecules, and not to heteronuclear diatomics such as HCl.
• One might protest that diatomic molecules contain more than two particles; for instance,
H2 contains two electrons and two protons. Here we’re really using the Born–Oppenheimer
approximation. We are keeping track of the locations of the nuclei, assuming they move slowly
relative to the electrons. The electrons only affect the potential, causing an attraction.
• If the electronic state is ¹Σ, using standard notation for diatomic molecules, then the spin and orbital angular momentum of the electrons can be ignored. In fact, the ground electronic state of most diatomic molecules is ¹Σ, though O₂ is an exception, with ground state ³Σ.
• The exchange operator E₁₂ switches the identities of the two particles. For instance, if each particle can be described with basis |α⟩, then E₁₂(|α⟩ ⊗ |β⟩) = |β⟩ ⊗ |α⟩.
• The exchange operator is unitary and squares to one, which means it is Hermitian. Furthermore,

    E₁₂† H E₁₂ = H,    [E₁₂, H] = 0.
• There is no reasonable way to define an exchange operator for non-identical particles; everything
we will say here makes sense only for identical particles.
• Just like parity, the Hilbert space splits into subspaces that are even or odd under exchange,
which are not mixed by time evolution. However, unlike parity, it turns out that only one of
these subspaces actually exists for physical systems. If the particles have half-integer spin, only
the odd subspace is ever observed; if the particles have integer spin, only the even subspace is
observed. This stays true no matter how the system is perturbed or prepared.
• In the second quantized formalism of field theory, there is no need to (anti)symmetrize at all;
the Fock space already contains only the physical states. The symmetrization postulate is a
consequence of working with first quantized notation, where we give the particles unphysical
labels and must subsequently take them away.
• This also means that we must be careful to avoid using “unphysical” operators, which are not invariant under exchange. For example, the operator x₁ has no physical meaning, nor does the spin S₁, though S₁ + S₂ does.
• We first consider ¹²C₂, a homonuclear diatomic molecule where both nuclei have spin 0. It does not form a gas because it is chemically reactive, but it avoids the complication of spin.
• The two coordinates are completely decoupled, so energy eigenstates can be chosen to have the form Ψ(R, r) = Φ(R)ψ(r). The center of mass degree of freedom has no potential, so Φ(R) can be taken to be a plane wave,

    Φ(R) = exp(iP · R).

The relative term ψ(r) is the solution to a central force problem, and hence has the form ψ(r) = R_{nℓ}(r) Y_{ℓm}(θ, φ). The energy is

    E = P²/2M + E_{nℓ}.
• For many molecules, the low-lying energy levels have the approximate form

    E_{nℓ} = ℓ(ℓ + 1)ℏ²/(2I) + (n + 1/2)ℏω

where the first term comes from approximating the rotational levels using a rigid rotor, and the second term comes from approximating the vibrational levels with a harmonic oscillator, and I and ω depend on the molecule.
• The exchange operator flips the sign of r, which multiplies the state by (−1)^ℓ. This is like parity, but with the crucial difference that this selection rule is never observed to be broken. Spectroscopy tells us that all of the states of odd ℓ in ¹²C₂ are missing, a conclusion which is confirmed by thermodynamic measurements.
• Furthermore, levels are not missing if the nuclei are different isotopes, even though, without
the notion of identical particles, the difference in the masses of the nuclei should be too small
to affect anything. Results like this are the experimental basis of the symmetrization postulate.
• Next we consider the hydrogen molecule H2 , where the nuclei (protons) have spin 1/2. Naively,
the interaction of the nuclear spins has a negligible effect on the energy levels. But the spins
actually have a dramatic effect due to the symmetrization postulate.
• We can separate the wavefunction as above, now introducing spin degrees of freedom |m1 m2 i.
The total spin is in the representation 0 ⊕ 1, where the singlet 0 is odd under exchanging the
spins, and the triplet 1 is even.
• The protons are fermions, so the total wavefunction must be odd under exchange. Therefore,
when the nuclear spins are in the singlet state, ` must be even, and we call this system
parahydrogen. When the nuclear spins are in the triplet state, ` must be odd, and we call this
system orthohydrogen. In general, “para” refers to a symmetric spatial wavefunction.
• These differences have a dramatic effect on the thermodynamic properties of H2 gas. Since
every orthohydrogen state is three-fold degenerate, at high temperature (where many ` values
can be occupied), H2 gas is 25% parahydrogen and 75% orthohydrogen. At low temperatures,
H2 gas is 100% parahydrogen.
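The equilibrium fractions follow from spin-weighted rotational partition functions. A sketch (θ_rot ≈ 85 K for H₂ is an assumed literature value, not from the text):

```python
import numpy as np

# Ortho fraction of H2 in thermal equilibrium: odd l pairs with the nuclear
# triplet (weight 3), even l with the singlet (weight 1).
theta_rot = 85.0   # rotational temperature of H2 in kelvin (assumed value)

def ortho_fraction(T, l_max=60):
    l = np.arange(l_max + 1)
    w = (2*l + 1) * np.exp(-l*(l + 1) * theta_rot / T)
    Z_para = w[l % 2 == 0].sum()         # even l, nuclear singlet
    Z_ortho = 3 * w[l % 2 == 1].sum()    # odd l, nuclear triplet
    return Z_ortho / (Z_para + Z_ortho)

print(ortho_fraction(1000.0), ortho_fraction(20.0))  # ~0.75 and ~0
```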
• Note that we have taken the wavefunction to be the product of a spin and spatial part. Of
course, this is only valid because we ignored spin interactions; more formally, it is because the
Hamiltonian commutes with both exchanges of spin state and exchanges of orbital state alone.
Note. The singlet being antisymmetric and the triplet being symmetric under exchange is a special case of a general rule. Suppose we combine two identical spins, j ⊗ j. The spin 2j irrep is symmetric, because its top component is |m₁m₂⟩ = |jj⟩, and applying the lowering operator preserves symmetry.
Now consider the subspace with total S_z = 2j − 1, spanned by |j − 1, j⟩ and |j, j − 1⟩. This has one symmetric and one antisymmetric state; the symmetric one is part of the spin 2j irrep, so the antisymmetric one must be part of the spin 2j − 1 irrep, which is hence completely antisymmetric.
Then the next subspace has two symmetric and one antisymmetric state, so the spin 2j − 2 irrep is
symmetric. Continuing this logic shows that the irreps alternate in symmetry.
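The alternation can be cross-checked by counting dimensions (a sketch, not from the notes): the symmetric part of j ⊗ j has dimension (2j+1)(2j+2)/2, which must equal the total dimension of the irreps 2j, 2j−2, ..., and likewise for the antisymmetric part.

```python
# Dimension-counting check that the irreps of j (x) j alternate in symmetry,
# for 2j = 0, 1, ..., 10 (covering both integer and half-integer spins).
def check(two_j):
    d = two_j + 1                         # dimension of one spin-j space
    sym, anti = d*(d+1)//2, d*(d-1)//2    # symmetric/antisymmetric dimensions
    s_count = sum(2*S + 1 for S in range(two_j, -1, -2))      # S = 2j, 2j-2, ...
    a_count = sum(2*S + 1 for S in range(two_j - 1, -1, -2))  # S = 2j-1, 2j-3, ...
    return (sym, anti) == (s_count, a_count)

assert all(check(two_j) for two_j in range(0, 11))
```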
Note. A quick estimate of the equilibration time, in SI units. The scattering cross section for hydrogen molecules is σ ∼ a₀², so the collision frequency at standard temperature and pressure is f ∼ nσv ∼ 10⁸ s⁻¹.
During the collision, the nuclei don’t get closer than about distance a₀. The magnetic field experienced by a proton is hence

    B ∼ μ₀qv/a₀² ∼ 0.1 T.
The collision takes time τ ∼ a₀/v. The resulting classical spin precession is

    Δθ ∼ (μ_N B/ℏ)(a₀/v) ∼ 10⁻⁷
and what this means at the quantum level is that the opposite spin component picks up an amplitude
of order ∆θ. The spin performs a random walk with frequency f and step sizes ∆θ, so it flips over
in a characteristic time
    T ∼ 1/(f(Δθ)²) ∼ 10⁶ s
which is on the order of days.
11.2 Helium
We now investigate helium and helium-like atoms.
• We consider systems with a single nucleus of atomic number Z, and two electrons. This includes helium when Z = 2, but also ions such as Li⁺ and H⁻. One nontrivial fact we will show below is that H⁻ has a bound state at all.
• We work in atomic units and place the nucleus at the origin. The basic Hamiltonian is

    H = p₁²/2 + p₂²/2 − Z/r₁ − Z/r₂ + 1/r₁₂

where r₁₂ = |x₂ − x₁|. This ignores fine structure, the Lamb shift, and hyperfine structure (though
there is no hyperfine structure for ordinary helium, since alpha particles have zero spin). Also
note that the fine structure now has additional terms, corresponding to the interaction of each
electron’s spin with the spin or orbital angular momentum of the other. Interactions between
the electrons also must account for retardation effects.
• There is another effect we are ignoring, known as “mass polarization”, which arises because
the nucleus recoils when the electrons move. To see this, suppose we instead put the center
of mass at the origin and let the nucleus move. Its kinetic energy contributes a term P²/2M where P = −p₁ − p₂.
• The terms proportional to p₁² and p₂² simply cause the electron mass to be replaced with the electron–nucleus reduced mass, as in hydrogen. But there is also a cross-term (p₁ · p₂)/2M, which is a new effective interaction between the electrons. We ignore this here because it is suppressed by a power of m/M.
• Under the approximations above, the Hamiltonian does not depend on the spin of the electrons
at all; hence the energy eigenstates can be taken to have definite exchange symmetry under
both orbital and spin exchanges alone, as we saw for H2 .
• Thus, by the same reasoning as for H2 , there is parahelium (spin singlet, even under orbital
exchange) and orthohelium (spin triplet, odd under orbital exchange). Parahelium and orthohe-
lium behave so differently and interconvert so slowly that they were once thought to be separate
species.
• The main difference versus H2 is that it will be much harder to find the spatial wavefunction,
since this is not a central force problem: the electrons interact both with the nucleus and
with each other. In particular, since the nucleus can absorb momentum, we can’t separate the
electron wavefunction into a relative and center-of-mass part. We must treat it directly as a
function of all 6 variables, ψ(x1 , x2 ).
• The Hamiltonian still commutes with the total orbital and spin angular momenta,

    L = L₁ + L₂,    S = S₁ + S₂.
• In fact, we will see that S has a very large impact on the energy, on the order of the Coulomb
energy itself. This is because the exchange symmetry of the orbital wavefunction has a strong
influence on how the electrons are distributed in space. Reasoning in reverse, this means there
is a large effective “exchange interaction” between spins, favoring either the singlet or the triplet
spin state, which is responsible in other contexts for ferromagnetism.
• The ionization potential of an atom is the energy needed to remove one electron from the atom,
assumed to be in its ground state, to infinity. One can define a second ionization potential by
the energy required to remove the second electron, and so on. These quantities are useful since
they are close to directly measurable.
• For helium, the ionization potentials are 0.904 and 2 in atomic units. (For comparison, for hydrogen-like atoms it is Z²/2, so 1/2 for hydrogen.) In fact, helium has the highest first ionization potential of any neutral atom.
• The first ionization potential tells us that continuum states exist at energies 0.904 above the
ground state, so bound states can only exist in between; any purported bound states above the
first ionization potential would mix with continuum states and become delocalized.
• For H⁻, the ionization potentials are 0.028 and 0.5. The small relative size of the first gives rise to the intuition that H⁻ is just an electron weakly bound to a hydrogen atom. There is only a single bound state, the 1¹S.
• The bound states for parahelium and orthohelium are shown below.
These values are obtained by numerically solving our simplified Hamiltonian, and do not include
fine structure or other effects. In principle, the values of L range from zero to infinity, while for
each L, the values of N range up to infinity. The starting value of each N is fixed by convention,
so that energy levels with similar N line up; this is why there is no 1³S state. Looking more
closely, one can see that energy increases with L for fixed N (the “staircase effect”), and the
energy levels are lower for orthohelium.
• We focus on the orbital part, and take the perturbation to be 1/r12 . This means the perturbation
parameter is 1/Z, which is not very good for helium, and especially bad for H− . However, the
results will be roughly correct, and an improved analysis is significantly harder.
• The two electrons will each occupy hydrogen-like states labeled by nℓm, which we refer to as orbitals. Thus the two-particle eigenfunctions of the unperturbed Hamiltonian are

    H₀|n₁ℓ₁m₁n₂ℓ₂m₂⟩ = E⁽⁰⁾_{n₁n₂}|n₁ℓ₁m₁n₂ℓ₂m₂⟩,    E⁽⁰⁾_{n₁n₂} = −(Z²/2)(1/n₁² + 1/n₂²)
if we neglect identical particle effects. Note that we use lowercase to refer to individual electrons,
and uppercase to refer to the atom as a whole.
• In order to account for identical particle effects, we just symmetrize or antisymmetrize the orbitals, giving

    (1/√2)(|n₁ℓ₁m₁n₂ℓ₂m₂⟩ ± |n₂ℓ₂m₂n₁ℓ₁m₁⟩).

This has no consequence on the energy levels, except that states of the form |nℓm nℓm⟩ antisymmetrize to zero, and hence don’t appear for orthohelium.
• The energy levels are lower than the true ones, because the electrons repel each other. We also note that the “doubly excited” states with n₁, n₂ ≠ 1 lie in the continuum. Upon including the perturbation, they mix with the continuum states, and are hence no longer bound states.
• However, the doubly excited states can be interpreted as resonances. A resonance is a state
that is approximately an energy eigenstate, but whose amplitude “leaks away” over time into
continuum states. For example, when He in the ground state is bombarded with photons, there
is a peak in absorption at energies corresponding to resonances.
• We can get some intuition by semiclassical thinking. We imagine that a photon excites both
electrons to higher orbits. It is then energetically possible for one electron to hit the other,
causing it to be ejected and falling into the n = 1 state in the process. Depending on the
quantum numbers involved, this could take a long time. There is hence an absorption peak at
the resonance, because at short timescales it behaves just like a bound state.
• A similar classical situation occurs in the solar system. It is energetically possible for Jupiter
to eject all of the other planets, at the cost of moving slightly closer to the Sun. In fact,
considerations from chaos theory suggest that over a long enough timescale, this will almost
certainly occur. This timescale, however, is long enough that we can ignore this process and
think of the solar system as a bound object.
• Now we focus on the true bound states, which are at most singly excited. These are characterized by a single number n,

    E⁽⁰⁾_{1n} = −(Z²/2)(1 + 1/n²)

and can be written as

    |NLM±⟩ = (1/√2)(|100 nℓm⟩ ± |nℓm 100⟩)

where N = n, L = ℓ, and M = m. We see there is no N = 1 state for orthohelium.
• The unperturbed energy levels are rather far off. For helium, the unperturbed ground state has
energy −4, while the real answer is about −2.9. For H− , we get −1, while the real answer is
about −0.53.
• To compute matrix elements of the perturbation, we use the expansion

    1/r₁₂ = 1/|x₁ − x₂| = Σ_{ℓ=0}^{∞} (r_<^ℓ / r_>^{ℓ+1}) P_ℓ(cos γ)

where r_< and r_> are the lesser and greater of r₁ and r₂, and γ is the angle between x₁ and x₂. We expand the Legendre polynomial in terms of spherical harmonics with the addition theorem,

    P_ℓ(cos γ) = (4π/(2ℓ + 1)) Σ_m Y*_{ℓm}(Ω₁) Y_{ℓm}(Ω₂).
This has the benefit that the angular integrals can be done with the orthonormality of spherical harmonics. We have

    ∫ dΩ Y_{ℓm}(Ω) = √(4π) ∫ dΩ Y_{ℓm}(Ω) Y*₀₀(Ω) = √(4π) δ_{ℓ0} δ_{m0}.
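This integral can be checked by direct quadrature, using the standard explicit forms of a few low-ℓ harmonics (a numerical illustration, not from the notes):

```python
import numpy as np

# Solid-angle integrals of Y_lm: nonzero only for l = m = 0, where the
# result is sqrt(4 pi). Uses explicit expressions for Y00, Y20, Y11.
theta = np.linspace(0, np.pi, 400)                   # polar angle
phi = np.linspace(0, 2*np.pi, 400, endpoint=False)   # azimuth
TH, PH = np.meshgrid(theta, phi, indexing='ij')
dOmega = np.sin(TH) * (theta[1] - theta[0]) * (phi[1] - phi[0])

Y00 = np.full_like(TH, 1/np.sqrt(4*np.pi))
Y20 = np.sqrt(5/(16*np.pi)) * (3*np.cos(TH)**2 - 1)
Y11 = -np.sqrt(3/(8*np.pi)) * np.sin(TH) * np.exp(1j*PH)

vals = [np.sum(Y * dOmega) for Y in (Y00, Y20, Y11)]
print([abs(v) for v in vals])   # ~[sqrt(4 pi), 0, 0]
```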
The first order shift of the ground state turns out to be ΔE⁽¹⁾ = 5Z/8, after some tedious algebra. This is one factor of Z down from the unperturbed result −Z², so as expected the series is a series in 1/Z.
• The negatives of the ground state energies for H⁻ and He are hence 0.375 and 2.75, which are a significant improvement, though the first order correction overshoots. Indeed, as mentioned earlier, the first order result always overestimates the ground state energy by the variational principle, and hence sets an upper bound. It is trickier to set a lower bound, though at the very least the zeroth order result serves as one, since it omits a repulsive interaction.
• To show H− has a bound state, we must show that the ground state energy is below the
continuum threshold of −0.5. Unfortunately, our result of −0.375 is not quite strong enough.
We now compute the first-order energy shift for the excited states.
• As stated earlier, we only need to consider singly excited states, namely the states |NLM±⟩ defined above for N > 1. The energy shift is

    ΔE_{NL±} = ⟨NLM±|H₁|NLM±⟩ = J_{nℓ} ± K_{nℓ}

where J_{nℓ} is called the direct integral and K_{nℓ} the exchange integral.
• The direct integral has the simple interpretation of the mutual electrostatic energy of the two electron clouds,

    J_{nℓ} = ∫ dx₁ dx₂ |ψ₁₀₀(x₁)|² |ψ_{nℓm}(x₂)|² / |x₁ − x₂|.
It is clearly real and positive.
• The fact that K_{nℓ} is positive means that the ortho states are lower in energy than the para states. Intuitively this is because the ortho wavefunctions vanish when x₁ = x₂, while the para wavefunctions have maxima there. Hence the ortho states have less electrostatic repulsion.
• Another important qualitative feature is that the direct integrals J_{nℓ} increase with ℓ, leading to the “staircase effect” mentioned earlier. As for the alkali atoms, this is intuitively because as the angular momentum of one electron is increased, it can move further away from the nucleus, and the nuclear charge is more effectively screened by the other electron(s).
We have hence explained all the qualitative features of the spectrum, though perturbation theory
doesn’t do very well quantitatively. We can do a bit better using the variational principle.
• We recall that the unperturbed ground state just consists of two 1s electrons, which we refer to as 1s², with wavefunction

    Ψ_{1s²}(x₁, x₂) = (Z³/π) e^{−Z(r₁+r₂)}.
However, we also know that each electron partially screens the nucleus from the other, so each electron sees an effective nuclear charge Zₑ between Z − 1 and Z. This motivates the trial wavefunction

    Ψ(x₁, x₂) = (Zₑ³/π) e^{−Zₑ(r₁+r₂)}

where Zₑ is a variational parameter.
This has the advantage that the first two terms are both clearly equal to −Zₑ²/2.
This is closer than our result from first-order perturbation theory. However, since the estimate for H⁻ is still not below −0.5, it isn’t enough to prove existence of the bound state. This can be done by using a more sophisticated ansatz; ours was very crude, not even accounting for the fact that the electrons should preferentially be on opposite sides of the nucleus.
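The minimization can be carried out explicitly. The excerpt elides the algebra, but the standard closed form for this trial state is E(Zₑ) = Zₑ² − 2ZZₑ + (5/8)Zₑ in atomic units (quoted here, not derived), minimized at Zₑ = Z − 5/16:

```python
from scipy.optimize import minimize_scalar

# Minimize the screened-charge trial energy E(Ze) = Ze^2 - 2*Z*Ze + (5/8)*Ze
# (standard result for this trial wavefunction, assumed rather than derived).
def best(Z):
    res = minimize_scalar(lambda Ze: Ze**2 - 2*Z*Ze + 0.625*Ze,
                          bounds=(0.1, Z + 1.0), method='bounded')
    return res.x, res.fun   # analytic answer: Ze = Z - 5/16, E = -(Z - 5/16)^2

Ze_He, E_He = best(2.0)   # Ze ~ 1.6875, E ~ -2.848 (true value ~ -2.904)
Ze_H, E_H = best(1.0)     # Ze ~ 0.6875, E ~ -0.473, still above -0.5
print(E_He, E_H)
```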
• The idea of the model is to represent the electron cloud surrounding the nucleus as a zero tem-
perature, charged, degenerate Fermi–Dirac fluid, in hydrostatic equilibrium between degeneracy
pressure and electrostatic forces.
• The results we need from statistical mechanics are that for zero-temperature electrons in a rectangular box of volume V with number density n, the Fermi wavenumber is

    k_F = (3π²n)^{1/3}

and the degeneracy pressure is

    P = −dE/dV = (ℏ²/(15mπ²))(3π²n)^{5/3}.
We note that P is written solely in terms of constants and n. The key to the Thomas–Fermi model is to allow n to vary in space, and treat the electrons as a fluid with pressure P(n(x)). Of course, this is precisely valid only in the thermodynamic limit.
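The pressure formula can be checked symbolically from the total energy of the T = 0 Fermi gas, E = V ℏ²k_F⁵/(10π²m) (a standard result assumed here, not derived in the text):

```python
import sympy as sp

# Symbolic check that P = -dE/dV at fixed N reproduces the quoted pressure.
hbar, m, N, V = sp.symbols('hbar m N V', positive=True)
kF = (3 * sp.pi**2 * N / V) ** sp.Rational(1, 3)
E = V * hbar**2 * kF**5 / (10 * sp.pi**2 * m)    # total energy, assumed form
P = -sp.diff(E, V)
n = N / V
P_quoted = hbar**2 * (3 * sp.pi**2 * n) ** sp.Rational(5, 3) / (15 * m * sp.pi**2)
assert sp.simplify(P - P_quoted) == 0
```

The check also makes the relation P = (2/3)E/V for a nonrelativistic gas manifest, since E ∝ V^{−2/3} at fixed N.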
• The condition for hydrostatic equilibrium is

    ∇P = en∇Φ

in Gaussian units, where we included the charge density for the nucleus explicitly. We will drop this term below and incorporate it in the boundary conditions at r = 0.
Meanwhile, from the pressure formula,

    ∇P = (ℏ²/(3m))(3π²)^{2/3} n^{2/3} ∇n

and plugging this into the hydrostatic equilibrium equation gives

    (ℏ²/(3m))(3π²)^{2/3} n^{−1/3} ∇n = e∇Φ.
We may integrate both sides to obtain

    (ℏ²/(2m))(3π²n)^{2/3} = e(Φ − Φ₀) ≡ eΨ.

Equivalently, this can be written as

    p_F²/(2m) − eΦ = −eΦ₀.
2m
The left-hand side is the energy of an electron at the top of the local Fermi sea, so evidently
this result tells us it is a constant, the chemical potential of the gas. This makes sense, as in
equilibrium these electrons shouldn’t have an energetic preference for being in any one location
over any other.
It is intuitively clear that as we move outward, the potential energy goes up monotonically and
the kinetic energy goes down.
– If N > Z, we have a negative ion. Such atoms can’t be described by the Thomas–Fermi model: the pressure force on the electron fluid always points outward, and at some radius the electrostatic force will start pointing outward as well, making the hydrostatic equilibrium equation impossible to satisfy. In this model, the extra negative charge just falls off.
– If N = Z, we have a neutral atom. Then Φ(r) falls off faster than 1/r. Such a case is
described by Φ0 = 0.
– If N < Z, we have a positive ion, so Φ(r) falls off as (Z − N )e/r. Such a case is described
by Φ0 > 0. At some radius r0 , the kinetic energy and hence n falls to zero. Negative values
are not meaningful, so for all r > r0 the density is simply zero.
– The case Φ0 < 0 also has physical meaning, and corresponds to a neutral atom under
applied pressure.
• To summarize, our two equations are

    (ℏ²/(2m))(3π²n)^{2/3} = eΨ,    ∇²Ψ = 4πne.
We eliminate n to solve for Ψ. However, since we also know that Ψ ∼ Ze/r for small r, it is useful to define the dimensionless function

    f(r) = rΨ(r)/(Ze),    f(0) = 1.
After some algebra, the result is the Thomas–Fermi equation,

    d²f/dx² = f^{3/2}/x^{1/2},    r = bx,    b = (3π)^{2/3} a₀ / (2^{7/3} Z^{1/3})

where x is a dimensionless radial variable.
• Since f(0) is already set, the solutions to the equation are parametrized by f′(0). Some numeric solutions are shown below.
The case f′(0) = −1.588 corresponds to a neutral atom. The density only approaches zero asymptotically. It is a universal function that is the same, up to scaling, for all neutral atoms in this model.
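The neutral-atom slope can be recovered by shooting (a sketch: slopes that are too shallow make f turn around and blow up, slopes too steep make f cross zero, and bisection finds the boundary):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Shooting method for the Thomas-Fermi equation f'' = f^{3/2}/sqrt(x), f(0) = 1.
def rhs(x, y):
    f, fp = y
    return [fp, max(f, 0.0)**1.5 / np.sqrt(x)]   # clamp avoids f < 0 during root-finding

def hit_zero(x, y): return y[0]        # f crossed zero: slope too steep (an ion)
hit_zero.terminal = True
def blow_up(x, y): return y[0] - 5.0   # f ran away upward: slope too shallow
blow_up.terminal = True

def too_shallow(fp0, x_max=50.0):
    x0 = 1e-8   # start just off x = 0 to avoid the integrable singularity
    sol = solve_ivp(rhs, (x0, x_max), [1.0 + fp0 * x0, fp0],
                    events=[hit_zero, blow_up], rtol=1e-8, atol=1e-10)
    return sol.t_events[0].size == 0   # never crossed zero

lo, hi = -1.7, -1.5   # bracket: -1.7 crosses zero, -1.5 blows up
for _ in range(30):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if too_shallow(mid) else (mid, hi)

slope = 0.5 * (lo + hi)
print(slope)   # ~ -1.588, the universal neutral-atom slope
```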
• As the initial slope becomes more negative, the density reaches zero at finite radius, correspond-
ing to a positive ion with a definite radius.
• When the initial slope is less negative, the density never falls to zero. Instead, we can manually
cut it off at some radius and just declare the density is zero outside this radius, which physically
translates to imposing an external pressure. This is only useful for modeling neutral atoms
(with neutrality determining where the cutoff radius is) since one cannot collect a bulk sample
of charged ions.
• The Thomas-Fermi model has obvious limitations. For example, by treating the electrons as
a continuous fluid, we lose all shell structure. In general, the model is only reasonable for
describing the electron density at intermediate radii, breaking down both near the nucleus and
far from it.
• It can be used to calculate average properties, such as the average binding energy or charge radius, which make it useful in experimental physics, e.g. for calculations of the slowing down of particles passing through matter.
• We consider an atom with N electrons and nuclear charge Z, and use the basic Hamiltonian

    H = Σ_{i=1}^{N} (pᵢ²/2 − Z/rᵢ) + Σ_{i<j} 1/r_{ij} ≡ H₁ + H₂.
This neglects effects from the finite nuclear mass, fine and hyperfine structure, retardation, radiative corrections, and so on. In particular, fine structure becomes more important for heavier atoms, since it scales as (Zα)², and in these cases it is better to start from the Dirac equation. Also note that the electron spin plays no role in the Hamiltonian.
• The Hamiltonian commutes with the total orbital angular momentum L, as well as each of
the individual spin operators Si . It also commutes with parity π, as well as all the exchange
operators Eij .
• This is our first situation with more than 2 identical particles, so we note that exchanges
generate all permutations. For each permutation P ∈ SN , there is a unitary permutation
operator U (P ) which commutes with the Hamiltonian, and which we hereafter just denote by
P. We denote the sign of P by (−1)^P.
All physically meaningful operators must commute with the U (P ). If one begins with a formal
Hilbert space that doesn’t account for the symmetrization postulate, then one can project onto
the fermionic subspace with
    A = (1/N!) Σ_P (−1)^P P.
We will investigate such projectors in more detail in the notes on Group Theory.
• In Hartree’s basic ansatz, we simply ignore the symmetrization postulate. We take a trial wavefunction of the form

    |Φ_H⟩ = |1⟩^(1) · · · |N⟩^(N)

where the individual factors are single particle orbitals, describing the state of one electron. The notation is a bit ambiguous: here Latin indices in parentheses label the electrons while Greek indices in the kets label the orbitals.
• The orbitals are assumed to be normalized, and the product of a spatial and spin part, |λ⟩ = |u_λ⟩ ⊗ |m_{sλ}⟩, where |m_{sλ}⟩ is assumed to be an eigenstate of S_z with eigenvalue m_{sλ} = ±1/2. This causes no loss of generality, because the Hamiltonian has no spin dependence.
• The variational parameters are, in principle, the entire spatial wavefunctions of the orbitals u_λ(r). It is straightforward to compute the expectation of H₁,

    ⟨Φ_H|H₁|Φ_H⟩ = Σ_{λ=i=1}^{N} ⟨λ|^(i) hᵢ |λ⟩^(i),    hᵢ = pᵢ²/2 − Z/rᵢ

where the other bras and kets collapse by normalization. Explicitly, the expectation is

    ⟨Φ_H|H₁|Φ_H⟩ = Σ_λ I_λ,    I_λ = ∫ dr u*_λ(r) (p²/2 − Z/r) u_λ(r).
Similarly, the expectation of H₂ is a sum of direct integrals,

    ⟨Φ_H|H₂|Φ_H⟩ = (1/2) Σ_{λ≠µ} J_{λµ},    J_{λµ} = ∫ dr dr′ |u_λ(r)|² |u_µ(r′)|² / |r − r′|.

No exchange integrals have appeared, since we haven’t antisymmetrized. We are dropping the self-interaction term λ = µ since we dropped the i = j term in the original Hamiltonian. This term was dropped classically to avoid infinite self-energy for point charges, though note that in this quantum context, J_{λλ} actually need not be divergent.
However, we can’t just minimize this directly; as usual we need Lagrange multipliers to enforce normalization, so we instead minimize

    F[Φ_H] = E[Φ_H] − Σ_λ ε_λ (⟨λ|λ⟩ − 1).
• The vanishing of the functional derivative δF/δu_λ(r) gives the Hartree equations

    (p²/2 − Z/r) u_λ(r) + V_λ(r) u_λ(r) = ε_λ u_λ(r),    V_λ(r) = Σ_{µ≠λ} ∫ dr′ |u_µ(r′)|² / |r − r′|.
These equations have a simple interpretation. We see that each electron obeys a Schrodinger equation with energy ε_λ, and feels a potential sourced by the average field of the other charges, which makes the equations an example of a mean field theory.
• In practice, one solves the Hartree equations by iteration. For example, one can begin by
computing the Thomas–Fermi potential, then setting the initial guess for the orbitals to be
the eigenfunctions of this potential. Then the new potentials are computed, and the resulting
Schrodinger equation is solved, and so on until convergence.
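The iteration described above can be sketched numerically. Below is a minimal toy Hartree solver for helium ($Z = 2$, both electrons in the same 1s orbital) in atomic units, on a uniform radial grid; the grid sizes, iteration count, and 50/50 potential mixing are arbitrary choices for illustration, not anything prescribed in the text.

```python
import numpy as np

# Toy self-consistent Hartree solver for helium (Z = 2), in atomic units.
# Both electrons occupy the same 1s orbital, written as u(r) = r R(r), so each
# feels the angular-averaged potential of the one other electron,
#   V_H(r) = (1/r) int_0^r |u|^2 dr'  +  int_r^inf |u(r')|^2 / r' dr'.
Z, N, rmax = 2.0, 600, 15.0
h = rmax / (N + 1)
r = h * np.arange(1, N + 1)

def lowest_orbital(V):
    """Ground state of -u''/2 + V(r) u = eps u, by finite differences."""
    H = np.diag(1.0 / h**2 + V)
    H += np.diag(-0.5 / h**2 * np.ones(N - 1), 1)
    H += np.diag(-0.5 / h**2 * np.ones(N - 1), -1)
    vals, vecs = np.linalg.eigh(H)
    return vals[0], vecs[:, 0] / np.sqrt(h)   # normalized so sum(u^2) h = 1

def hartree_potential(u):
    rho = u**2 * h                            # charge per grid cell
    inner = np.cumsum(rho)                    # charge enclosed within radius r
    outer = np.cumsum((rho / r)[::-1])[::-1]  # potential from charge outside
    return inner / r + outer - rho / r        # own cell counted only once

V_H = np.zeros(N)                             # initial guess: bare nucleus
for it in range(30):                          # plain SCF iteration with mixing
    eps, u = lowest_orbital(-Z / r + V_H)
    V_H = 0.5 * V_H + 0.5 * hartree_potential(u)

J = np.sum(u**2 * V_H) * h                    # direct (Coulomb) integral
E_total = 2 * eps - J                         # undo the double counting
print(f"eps = {eps:.3f} a.u., E = {E_total:.3f} a.u.")
```

The total energy is $2\epsilon - J$ rather than $2\epsilon$, illustrating the double-counting point made below; the converged numbers land near the known mean-field helium values ($\epsilon \approx -0.92$, $E \approx -2.86$ a.u.).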
• Since the Hartree orbitals are eigenfunctions of different Schrodinger equations, there is no need
for them to be orthogonal.
• It is tempting to think of $\epsilon_\lambda$ as the "energy of each electron", but this is misleading, because each electron's Hartree equation counts the interaction with every other electron. That is, if we just summed up all the $\epsilon_\lambda$, we would not get the total energy, because the interaction would be double counted.
• More explicitly, if we multiply the Hartree equation by $u_\lambda^*(\mathbf{r})$ and integrate,
$$I_\lambda + \sum_{\mu \neq \lambda} J_{\lambda\mu} = \epsilon_\lambda.$$
This second way of writing the wavefunction is known as a Slater determinant, and is expanded
like a regular determinant with scalar multiplication replaced with tensor product. The rest of
the idea is the same: we simply variationally minimize the energy.
• Note that the Slater determinant vanishes if the N orbitals are not linearly independent. Mean-
while, if they are linearly independent, then they span an N -dimensional subspace of the
single-particle Hilbert space, and up to scaling the Slater determinant only depends on what
this subspace is. Hence, unlike in Hartree’s wavefunction, we can always choose the orbitals to
be orthonormal without loss of generality, in which case |Φi is automatically normalized.
• We have to make a point about language. We often speak of a particle being “in” a single-
particle state |λi (such as the Pauli exclusion principle’s “two particles can’t be in the same
state”). But because of the antisymmetrization, what we actually mean is that the joint state
is a Slater determinant over a subspace containing |λi.
• However, even though Slater determinants are very useful, they are not the most general valid
states! For instance, the superposition of two such states is generally not a Slater determinant.
Accordingly, the Hartree–Fock trial wavefunction doesn’t generally get the exact answer. In
such cases it really is not valid to speak of any individual particle as being “in” a state. Even
saying “the electrons fill the 1s and 2s orbitals” implicitly assumes a Slater determinant and is
not generally valid, but we use such language anyway because of the great difficulty of going
beyond Hartree–Fock theory.
• To evaluate the energy functional, we note that $A$ commutes with $H$, since the latter is a physical operator, so
$$\langle \Phi | H | \Phi \rangle = N! \langle \Phi_H | A^\dagger H A | \Phi_H \rangle = N! \langle \Phi_H | H A^2 | \Phi_H \rangle = N! \langle \Phi_H | H A | \Phi_H \rangle = \sum_P (-1)^P \langle \Phi_H | H P | \Phi_H \rangle$$
as we saw for helium. Note that the exchange integrals, unlike the direct integrals, depend
on the spin. As for helium, one can show the exchange integrals are positive. Since they
contribute with a minus sign, they lower the energy functional, confirming the expectation that
Hartree–Fock theory gives a better estimate of the ground state energy than Hartree theory.
• Again as we saw for helium, the lowering is only in effect for aligned spins, as this corresponds
to antisymmetry in the spatial wavefunction. This leads to Hund’s first rule, which is that
electrons try to align their spins. Half-filled electron shells are especially stable, since all the
electrons are aligned, leading to, e.g. the high ionization energy of nitrogen. It also explains
why chromium has configuration 3d5 4s instead of 3d4 4s2 as predicted by the aufbau principle.
• Another way in which Hartree–Fock theory makes more sense is that the self-energy, if included, ultimately cancels out, because $J_{\lambda\lambda} = K_{\lambda\lambda}$. Hence we can include it, giving
$$\langle \Phi | H_2 | \Phi \rangle = \frac{1}{2} \sum_{\lambda\mu} (J_{\lambda\mu} - K_{\lambda\mu}).$$
As we’ll see, including the self-energy makes the final equations nicer as well.
Note that we are only enforcing normalization with Lagrange multipliers; we will see below that
we automatically get orthogonality.
Since we included the self-energy contributions, all electrons feel the same direct potential,
$$V_d(\mathbf{r}) = \sum_\mu \int d\mathbf{r}'\, \frac{|u_\mu(\mathbf{r}')|^2}{|\mathbf{r} - \mathbf{r}'|}.$$
The exchange term instead involves the nonlocal potentials
$$V_{ex}^\pm(\mathbf{r}, \mathbf{r}') = \sum_\mu \delta(m_{s\mu}, \pm 1/2)\, \frac{u_\mu(\mathbf{r})\, u_\mu^*(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}.$$
Hence all spin up and spin down orbitals experience $V_{ex}^+$ and $V_{ex}^-$ respectively.
• As such, the Hartree–Fock equations can be thought of as just two coupled Schrodinger-like
equations, one for each spin. The solutions are automatically orthogonal, because orbitals of
different spins are orthogonal, while orbitals of the same spin are eigenfunctions of the same
Hamiltonian. This illustrates how Hartree–Fock theory is more elegant than Hartree theory.
• The main disadvantage of Hartree–Fock theory is numerically handling the nonlocal potential,
and there are many clever schemes to simplify dealing with it.
• On the other hand, note that if we remove the electron with the highest associated $\epsilon_\lambda$, chosen to be $\epsilon_N$, we can write the energy of the remaining electrons as
$$E' = \sum_{\lambda=1}^{N-1} I_\lambda' + \frac{1}{2} \sum_{\lambda,\mu=1}^{N-1} (J_{\lambda\mu}' - K_{\lambda\mu}').$$
If we assume the self-consistent fields have not been significantly changed, so that $I' = I$, $J' = J$, and $K' = K$, then we have an expression for the ionization potential,
$$E - E' = \epsilon_N.$$
• To simplify calculations, we can average the potentials over angles, just as in Hartree theory. This is a little trickier to write down explicitly for the nonlocal potential, but corresponds to replacing $V_{ex}^\pm(\mathbf{r}, \mathbf{r}')$ with an appropriately weighted average of $U(R)\, V_{ex}^\pm\, U(R)^\dagger$ for $R \in SO(3)$, where the $U(R)$ rotates space but not spin. The resulting averaged potential can only depend on the rotational invariants of two vectors, namely $|\mathbf{r}|^2$, $|\mathbf{r}'|^2$, and $\mathbf{r} \cdot \mathbf{r}'$.
• A further approximation is to average over spins, replacing the two exchange potentials $V_{ex}^\pm$ with their average. In this case, we have reduced the problem to an ordinary central force problem, albeit with a self-consistent potential, and we can label its orbitals as $|n \ell m_\ell m_s\rangle$. This is what people mean, for example, when they say that the ground state of sodium is $1s^2 2s^2 2p^6 3s$. However, this level of approximation also erases, e.g. the tendency of valence electrons to align their spins, which must be put in manually.
• The Hartree–Fock method gives a variational ansatz for the ground state of the basic Hamiltonian
$$H = \sum_i \left( \frac{p_i^2}{2} - \frac{Z}{r_i} \right) + \sum_{i<j} \frac{1}{r_{ij}}.$$
The resulting states in the Slater determinant are solutions of the Schrodinger-like equation
$$h u_\lambda(\mathbf{r}) = \epsilon_\lambda u_\lambda(\mathbf{r}), \qquad h(\mathbf{r}, \mathbf{p}) = \frac{p^2}{2} - \frac{Z}{r} + V_d - V_{ex}.$$
Note that numerically, everything about a Hartree–Fock solution can be specified by the $R_{n\ell}(r)$ and $\epsilon_{n\ell}$, since these can be used to infer the potentials.
• Hartree–Fock theory gives us the exact ground state to the so-called central field approximation to the Hamiltonian,
$$H_0 = \sum_i h(\mathbf{r}_i, \mathbf{p}_i).$$
Thus, we can treat this as the unperturbed Hamiltonian and the error as a perturbation,
$$H = H_0 + H_1, \qquad H_1 = \sum_{i<j} \frac{1}{r_{ij}} - \sum_i (V_{d,i} - V_{ex,i}).$$
The term $H_1$ is called the residual Coulomb potential, and the benefit of using Hartree–Fock theory is that $H_1$ may be much smaller than just $\sum_{i<j} 1/r_{ij}$ alone.
• The unperturbed Hamiltonian H0 is highly symmetrical; for example, it commutes with the
individual Li and Si of the electrons. (Here and below, the potentials in H0 are regarded as
fixed; they are always equal to whatever they were in the Hartree–Fock ground state.) Therefore,
the useful quantum numbers depend on the most important perturbations.
• If $H_1$ is the dominant perturbation, then adding it back recovers the basic Hamiltonian $H$, for which $L$ and $S$ are good quantum numbers; as usual, capital letters denote properties of the atom as a whole. This is known as LS or Russell–Saunders coupling. The competing perturbation is the spin-orbit coupling, which instead favors the so-called jj coupling. Fine structure is more important for heavier atoms, so for simplicity we will only consider lighter atoms, and hence only LS coupling.
• Now we consider the degeneracies in $H_0$. The energy only depends on the $n\ell$ values of the occupied states. In general, a state can be specified by its electron configuration, i.e. the set of completely filled orbitals plus a list of partly filled orbitals, along with the $(m_\ell, m_s)$ values of the states filled in these orbitals; we call this data an m-set.
• For the ground states of the lightest atoms, at most one orbital will be partly filled. If it contains $n$ electrons, then the degeneracy is
$$\binom{2(2\ell+1)}{n}.$$
In the case of multiple partly filled orbitals, we would get a product of such factors.
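As a quick check of this counting, the binomial degeneracies can be computed directly; the function name below is my own.

```python
from math import comb

# Degeneracy of a partly filled subshell: choose n electrons among the
# 2(2l+1) spin-orbitals of the subshell.
def subshell_degeneracy(l, n):
    return comb(2 * (2 * l + 1), n)

print(subshell_degeneracy(1, 1))  # 2p^1 (boron): 6
print(subshell_degeneracy(1, 2))  # 2p^2 (carbon): 15
print(subshell_degeneracy(1, 3))  # 2p^3 (nitrogen): 20
```

These match the degeneracies quoted in the examples below.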
• Now, the relevant operators that commute with $H$ are $L^2$, $L_z$, $S^2$, $S_z$, and $\pi$. Hence in LS coupling we can write the states as $|\gamma L S M_L M_S\rangle$, where $\gamma$ is an index for degenerate multiplets; these start appearing at $Z = 23$. The energy depends only on the $L$ and $S$ values. The m-set states are eigenstates of $L_z$ and $S_z$, where
$$M_L = \sum_i m_{\ell i}, \qquad M_S = \sum_i m_{s i}.$$
The sums range over all of the electrons, but can be taken to range over only unfilled orbitals
since filled ones contribute nothing.
• However, the Slater determinants are not eigenstates of L2 and S 2 . Computing the coefficients
that link the |m-seti and |LSML MS i bases is somewhat complicated, so we won’t do it in
detail. (It is more than using Clebsch–Gordan coefficients, because there can be more than two
electrons in the m-set, and we need to keep track of both orbital and spin angular momentum
as well as the antisymmetrization.) Instead, we will simply determine which values of L and S
appear, for a given electron configuration. Note that the parity does not come into play here,
since all states for a given electron configuration have the same parity.
• As for helium, we label these multiplets as $^{2S+1}L$. The spin-orbit coupling splits the multiplets apart based on their $J$ eigenvalue, so when we account for it, we write $^{2S+1}L_J$. For clarity, we can also write which electron configuration a given multiplet comes from, as in $2p^2\ {}^3P$.
• Finally, once we account for the spin-orbit coupling, we can also account for the Zeeman effect,
provided that it is weaker than even the spin-orbit coupling. In this case, the procedure runs
exactly as for a hydrogen atom with fine structure, and Lande g-factors appear after applying
the projection theorem.
We now give a few examples, focusing on the ground states of H0 for simplicity.
Example. Boron. The ground states of $H_0$ have an electron configuration $1s^2 2s^2 2p$, with a degeneracy of 6. Since there is only one relevant electron, there is clearly one multiplet $^2P$, where $L = \ell = 1$ and $S = s = 1/2$.
Example. Carbon. We start with the electron configuration $1s^2 2s^2 2p^2$, which has degeneracy $\binom{6}{2} = 15$. Now, since there are only two relevant electrons, we can have $L = 0, 1, 2$ and $S = 0, 1$, with each $L$ value represented once. Overall antisymmetry determines the $S$ values, giving $^1S$, $^3P$, and $^1D$. These have dimensions 1, 9, and 5, which add up to 15 as expected.
Some of the low-lying atomic energy levels for carbon are shown below, where the energy is
measured in eV.
Example. Nitrogen. The electron configuration is $1s^2 2s^2 2p^3$, with degeneracy $\binom{6}{3} = 20$. One can tabulate the $(M_L, M_S)$ values of the m-sets, where the prefactor indicates the multiplicity. The first state is then the highest weight state of a $^2D$ multiplet. Crossing this multiplet out, the first state left over is the highest weight state of a $^2P$ multiplet, and finally we are left with a $^4S$ multiplet. The dimensions are $10 + 6 + 4$, which add up to 20 as expected.
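The highest-weight peeling used in these examples is mechanical enough to automate. The sketch below tabulates the $(M_L, M_S)$ values of all m-sets of an $\ell^n$ configuration and peels off multiplets from the top; the function name and output convention (pairs of $L$ and $2S+1$) are my own.

```python
from itertools import combinations
from collections import Counter

# Find which (L, S) multiplets appear in an l^n configuration by counting
# (M_L, M_S) over all antisymmetric m-sets and peeling off highest weights.
# Spins are stored doubled (2*m_s) to keep everything in integers.
def ls_multiplets(l, n):
    orbitals = [(ml, ms2) for ml in range(-l, l + 1) for ms2 in (-1, 1)]
    counts = Counter()
    for occ in combinations(orbitals, n):       # Pauli: distinct spin-orbitals
        counts[(sum(m for m, _ in occ), sum(s for _, s in occ))] += 1
    terms = []
    while counts:
        L, S2 = max(counts)                     # max M_L, then max M_S: a highest weight
        terms.append((L, S2 + 1))               # record (L, 2S+1)
        for ml in range(-L, L + 1):             # cross out the whole multiplet
            for ms2 in range(-S2, S2 + 1, 2):
                counts[(ml, ms2)] -= 1
        counts = +counts                        # drop entries that hit zero
    return sorted(terms)

print(ls_multiplets(1, 2))  # carbon 2p^2: 1S, 3P, 1D
print(ls_multiplets(1, 3))  # nitrogen 2p^3: 4S, 2P, 2D
```

The state with the largest remaining $M_L$ (ties broken by largest $M_S$) must be the highest weight state of a multiplet with $L = M_L$ and $S = M_S$, which is why the peeling works.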
Example. Oxygen. The electron configuration is 1s2 2s2 2p4 . This is actually easier than nitrogen
because we can treat the two missing electrons as “holes”, with the same ` and s but opposite m`
and ms from an electron. The LS multiplets are hence exactly the same as in carbon.
Note. The first case in which the ground state of H0 yields degenerate LS multiplets is the case
of three d electrons, which first occurs for Vanadium, Z = 23. For anything more complicated than
this, the answer is rather tedious to work out, and one consults standard tables.
11.6 Chemistry
12 Time Dependent Perturbation Theory
We consider Hamiltonians of the form
$$H(t) = H_0 + H_1(t)$$
and wish to compute transition amplitudes
$$\langle f | U(t) | i \rangle$$
where typically $|i\rangle$ and $|f\rangle$ are two eigenstates of the unperturbed Hamiltonian, and $U(t)$ is the time evolution operator. It's useful to do this in the interaction picture.
• In Heisenberg picture, the states are 'frozen' at time $t = 0$, while operators evolve as $A_H(t) = U^\dagger(t) A_S U(t)$, where $U(t)$ is the time evolution operator for $H(t)$ from 0 to $t$. Then all expectation values come out the same as in Schrodinger picture. In the special case $[H_S(t), H_S(t')] = 0$ for all times (e.g. when the Hamiltonian is time-independent), we find $H_H(t) = H_S(t)$.
• Using the Schrodinger equation
$$i\hbar \frac{\partial U(t)}{\partial t} = H_S(t) U(t)$$
we find the Heisenberg equation of motion,
$$i\hbar \frac{dA_H(t)}{dt} = [A_H(t), H_H(t)] + i\hbar \left( \frac{\partial A_S(t)}{\partial t} \right)_H.$$
• Time-independent Schrodinger operators that always commute with the Hamiltonian are said
to be ‘conserved’ in Schrodinger picture; in Heisenberg picture, they have no time evolution.
• In the interaction picture, we define
$$|\psi_I(t)\rangle = U_0^\dagger(t) |\psi_S(t)\rangle, \qquad A_I(t) = U_0^\dagger(t) A_S(t) U_0(t).$$
That is, we evolve forward in time according to the exact Hamiltonian, then evolve backward under the unperturbed Hamiltonian.
• In general, we can always split the Hamiltonian so that one piece contributes to the time
evolution of the operators (by the Heisenberg equation) and the other contributes to the time
evolution of the states (by the Schrodinger equation). Interaction picture is just the particular
splitting into H0 and H1 (t).
• Applying the Dyson series, the interaction picture state at a later time is
$$|\psi_I(t)\rangle = |i\rangle + \frac{1}{i\hbar} \int_0^t dt'\, H_{1I}(t') |i\rangle + \frac{1}{(i\hbar)^2} \int_0^t dt' \int_0^{t'} dt''\, H_{1I}(t') H_{1I}(t'') |i\rangle + \cdots.$$
The coefficients $c_n(t) = \langle n | \psi_I(t) \rangle$ differ from the transition amplitudes mentioned earlier because they lack the rapidly oscillating phase factors $e^{i E_n t/\hbar}$; such factors don't affect transition probabilities. (Note that the eigenstates $|n\rangle$ are the same in all pictures; states evolve in time but eigenstates don't.)
• Using the Dyson series, we can expand each coefficient in a power series; inserting a resolution of the identity, the second order term evidently accounts for transitions through one intermediate state.
• To make further progress, we need to specify more about the perturbation $H_1$. For example, for a constant perturbation, the phase factors come out of the time integrals, which can then be evaluated directly.
Example. The next simplest case is sinusoidal driving, where the most general perturbation is
$$H_1(t) = K e^{-i\omega_0 t} + K^\dagger e^{i\omega_0 t}.$$
Physically, this could translate to absorption of light, where a sinusoidal electromagnetic field is the driving; the response is Lorentzian. Since the $K^\dagger$ term must be there as well, we also get resonance for $\omega_{ni} \approx -\omega_0$. Physically, that process corresponds to stimulated emission.
Generally, the probability is proportional to $1/(\Delta\omega)^2$ and initially grows as $t^2$. The probability can exceed unity close to resonance, signaling that first order perturbation theory breaks down.
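These statements can be checked by integrating the Schrodinger equation for a hypothetical two-level system driven by a perturbation of the $K e^{-i\omega_0 t} + K^\dagger e^{i\omega_0 t}$ form, and comparing against the first-order probability $4|K_{fi}|^2 \sin^2(\Delta\omega\, t/2)/\hbar^2(\Delta\omega)^2$; all the numerical values below are made up for illustration.

```python
import numpy as np

# First-order perturbation theory vs. exact evolution for sinusoidal driving.
# Two-level model (hbar = 1): H0 = diag(0, w_fi), and
# H1(t) = k e^{-i w0 t}|f><i| + h.c., so first order predicts
#   P(t) = 4 k^2 sin^2(dw t / 2) / dw^2,   dw = w_fi - w0.
w_fi, w0, k = 1.0, 0.9, 0.01
dw = w_fi - w0
dt, T = 0.001, 40.0

psi = np.array([1.0 + 0j, 0.0 + 0j])          # start in |i>
for n in range(int(T / dt)):
    t = n * dt
    H = np.array([[0.0, k * np.exp(1j * w0 * t)],
                  [k * np.exp(-1j * w0 * t), w_fi]])
    # one small exact step: psi <- exp(-i H dt) psi, via diagonalizing 2x2 H
    vals, vecs = np.linalg.eigh(H)
    psi = vecs @ (np.exp(-1j * vals * dt) * (vecs.conj().T @ psi))

P_exact = abs(psi[1])**2
P_first = 4 * k**2 * np.sin(dw * T / 2)**2 / dw**2
print(P_exact, P_first)
```

With a weak coupling ($k \ll \Delta\omega$) the two answers agree to within a few percent; cranking $k$ up pushes the first-order result past unity while the exact probability saturates, showing the breakdown.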
Next, we consider the case of a continuum of final states, which yields Fermi’s golden rule.
• Shifting the frequency to be zero on resonance, the total transition probability to all states near resonance, at first order, is
$$P(t) \approx \frac{4}{\hbar^2} \int_{-\infty}^\infty d\omega\, g(\omega) |\langle f_\omega | K | i \rangle|^2 \frac{\sin^2(\omega t/2)}{\omega^2}.$$
• The function $\sin^2(\omega t/2)/\omega^2$ is peaked around $|\omega| \lesssim 1/t$, with height $t^2/4$, so the area of the central lobe is $O(t)$. Away from the lobe, for $|\omega| \gtrsim 1/t$, we have oscillations of amplitude $1/\omega^2$. Integrating, the total area of the side lobes also grows as $t$. We thus expect the total area to grow as $t$, and contour integrating shows
$$\int_{-\infty}^\infty d\omega\, \frac{\sin^2(\omega t/2)}{\omega^2} = \frac{\pi t}{2}.$$
Hence we have
$$\lim_{t \to \infty} \frac{1}{t} \frac{\sin^2(\omega t/2)}{\omega^2} = \frac{\pi}{2} \delta(\omega).$$
More generally, for arbitrary $t$, we can define
$$\frac{1}{t} \frac{\sin^2(\omega t/2)}{\omega^2} = \frac{\pi}{2} \Delta_t(\omega).$$
Plugging this into our integral and taking the long time limit gives
$$P(t) \approx \frac{2\pi t}{\hbar^2} g(\omega_{ni}) |\langle f | K | i \rangle|^2$$
where $f$ is a representative final state. This is called Fermi's golden rule.
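The key integral above is easy to verify numerically; the integration window and grid spacing below are arbitrary choices.

```python
import numpy as np

# Check that the area under sin^2(wt/2)/w^2 equals pi*t/2, so that
# (2/(pi t)) sin^2(wt/2)/w^2 is a nascent delta function in omega.
t = 3.7
dw = 1e-3
w = np.arange(dw / 2, 2000.0, dw)      # half-step offset avoids w = 0
area = 2 * np.sum(np.sin(w * t / 2)**2 / w**2) * dw   # integrand is even
print(area, np.pi * t / 2)
```

The cutoff at $|\omega| = 2000$ loses only the far side-lobe tails, whose total area is of order $1/\omega_{\max}$.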
• The transition probability grows linearly in time, which fits with our classical intuition (i.e. for
absorption of light), as the system has a constant ‘cross section’. For long times, the probability
exceeds unity, again signaling that first order perturbation theory breaks down.
• For very early times, the rule also fails, and we recover the $t^2$ dependence. To see this, note that for $t \to 0$, $\sin^2(\omega t/2)/\omega^2 \to t^2/4$, independent of $\omega$. Therefore, we can pull $\Delta_t(\omega)$ out of the integral to get
$$P(t) \propto t^2 \int d\omega\, g(\omega) |\langle f_\omega | K | i \rangle|^2 \propto t^2.$$
Fermi’s golden rule becomes valid once the variation of g(ω)|hfω |K|ii|2 is slow compared to the
variation of ∆t (ω), and we can pull the former out of the integral instead.
Note. It is sometimes said that for finite times, transitions can violate energy conservation, because $\Delta_t(\omega)$ has support for $\omega \neq 0$, so we can have transitions of energy greater or less than $\hbar\omega_0$. However, what's really going on is that for finite times, the energy of the photons we're sending into the system isn't definite to begin with, since they must form a finite-time wavepacket. Energy is always conserved, even in quantum mechanics.
On the other hand, thinking this way can occasionally be useful. For example, it explains why the probability can go as $t$ rather than $t^2$. Roughly speaking, the amplitude to go into each decay state scales as $t$, giving a probability $t^2$ for each state, but the number of accessible states is proportional to the energy window $\Delta E \sim \hbar/t$ of the energy-time uncertainty principle, giving the observed linear dependence and the expected exponential decay. For very early times, we see deviations because $\Delta E$ is so large that we can hit all the states.
This sort of confusing language is very common in the AMO community. For example, consider
any system involving a bare Hamiltonian and a perturbation. Then a state prepared in an eigenstate
of the bare Hamiltonian need not remain there; it can develop a small component orthogonal to
this eigenstate, because the eigenstates of the full Hamiltonian can be different. This basic result of
perturbation theory is unfortunately referred to as a "virtual transition" to a "virtual state" which "violates conservation of energy".
• To convert a cross section to a count rate, we let $J$ be the flux of incident particles and $w$ be the total count rate of scattered particles. Then
$$w = J\sigma, \qquad \sigma = \int d\Omega\, \frac{d\sigma}{d\Omega}$$
where $\sigma$ is the total cross section, and the integral omits the forward direction.
• For example, for hard-sphere scattering off an obstacle of radius r, the cross section is σ = πr2 .
However, classically the total cross section is often infinite, as we count particles that are
scattered even a tiny amount.
• In the case of two-body scattering, we switch to the center-of-mass frame, with variables
$$\mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad \mathbf{p} = \frac{m_2 \mathbf{p}_1 - m_1 \mathbf{p}_2}{m_1 + m_2}.$$
The momentum $\mathbf{p}$ is simply chosen to be the conjugate momentum to $\mathbf{r}$. It is the momentum of one of the particles in the center-of-mass frame.
• In the case of two beams scattering off each other, with number densities $n_1$ and $n_2$ and relative velocity $v$,
$$\frac{dw}{d\Omega} = v \frac{d\sigma}{d\Omega} \int d\mathbf{x}\, n_1 n_2.$$
• We split the Hamiltonian as $H_0 = p^2/2m$ and $H_1 = V(\mathbf{x})$. The perturbation is not time-dependent, but the results above hold just as well.
• We take periodic boundary conditions in a cube of volume $V = L^3$, with plane wave states $|\mathbf{k}\rangle$ with wavefunctions
$$\psi_{\mathbf{k}}(\mathbf{x}) = \langle \mathbf{x} | \mathbf{k} \rangle = \frac{e^{i\mathbf{k} \cdot \mathbf{x}}}{\sqrt{V}}.$$
These are the eigenstates of $H_0$. We take the initial state to be $|\mathbf{k}_i\rangle$.
To make contact with our classical theory, we consider the rate of scattering into a cone of solid angle $\Delta\Omega$,
$$\frac{dw}{d\Omega} \Delta\Omega = \sum_{\mathbf{k} \in \text{cone}} \frac{2\pi}{\hbar^2} \Delta_t(\omega) |\langle \mathbf{k} | U(\mathbf{x}) | \mathbf{k}_i \rangle|^2,$$
where $w$ is now interpreted as probability per time, corresponding to a classical count rate. The incident flux is also interpreted as a probability flux, $J = n_i v_i = \hbar k_i / mV$.
• Plugging everything in and using the symmetric convention for the Fourier transform,
$$\frac{d\sigma}{d\Omega} = \frac{2\pi m^2}{\hbar^4 k_i^2} \int_0^\infty dk\, k^2\, \delta(k - k_i)\, |\tilde{U}(\mathbf{k} - \mathbf{k}_i)|^2 = \frac{2\pi m^2}{\hbar^4} |\tilde{U}(\mathbf{k}_f - \mathbf{k}_i)|^2$$
where $\mathbf{k}_f$ is parallel to $\mathbf{k}$ with $k_f = k_i$ by energy conservation. This is the first Born approximation.
• After a time $t \gg a/v$, the evolved wavefunction $U(t)|\mathbf{k}\rangle$ will look like an energy eigenstate in a region of radius about $tv$ about the origin, as we have reached a 'steady state' of particles coming in and being scattered out. This lends some intuition for why scattering rates can be computed using energy eigenstates alone.
Example. The Yukawa potential
$$U(r) = A \frac{e^{-\kappa r}}{r}, \qquad \tilde{U}(q) = \frac{2A}{(2\pi)^{1/2}} \frac{1}{\kappa^2 + q^2}$$
arises in nuclear physics because it is the Green's function for the Klein–Gordon equation. Applying our scattering formula with $\mathbf{q} = \mathbf{k} - \mathbf{k}_i$, we have $q^2 = 4k^2 \sin^2(\theta/2)$, giving
$$\frac{d\sigma}{d\Omega} = \frac{4A^2 m^2}{\hbar^4} \frac{1}{(4k^2 \sin^2(\theta/2) + \kappa^2)^2}.$$
Taking $\kappa \to 0$ and $A = Z_1 Z_2 e^2$ gives
$$\frac{d\sigma}{d\Omega} = \frac{Z_1^2 Z_2^2 e^4 m^2}{4 \hbar^4 k^4 \sin^4(\theta/2)}.$$
This is the Rutherford cross section, the exact result for classical nonrelativistic Coulomb scattering.
It is also the exact result in nonrelativistic quantum mechanics if the particles are distinguishable,
though we couldn’t have known this as we only computed the first term in a perturbation series.
However, the scattering amplitude for the Coulomb potential turns out to be incorrect by phase
factors, because the Coulomb potential doesn’t fall off quickly enough. This doesn’t matter for
distinguishable particles, but for identical particles it renders our answer incorrect because we must
combine distinct scattering amplitudes with phases intact. The correct answer for two electrons is
called the Mott cross section.
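As a numerical sanity check on these formulas (in units $\hbar = m = 1$, with made-up values of $A$ and $k$), the screened cross section reduces smoothly to the Rutherford one as $\kappa \to 0$:

```python
import numpy as np

# Born cross sections, with hbar = m = 1. The Yukawa result should approach
# the Rutherford formula (with A = Z1 Z2 e^2) as the screening kappa -> 0.
def yukawa(theta, A, k, kappa):
    q2 = 4 * k**2 * np.sin(theta / 2)**2
    return 4 * A**2 / (q2 + kappa**2)**2

def rutherford(theta, A, k):
    return A**2 / (4 * k**4 * np.sin(theta / 2)**4)

theta = np.linspace(0.3, np.pi, 200)     # avoid the forward singularity
ratio = yukawa(theta, A=1.0, k=2.0, kappa=1e-4) / rutherford(theta, A=1.0, k=2.0)
print(np.max(np.abs(ratio - 1)))         # vanishes as kappa -> 0
```

The forward region is excluded because the Rutherford cross section diverges there, which is the quantum echo of the divergent classical total cross section.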
where we take the sum over final states in a cone of solid angle ∆Ω.
• We next convert from $dw/d\Omega$ to a cross section $d\sigma/d\Omega$ using
$$\frac{dw}{d\Omega} = n_i v_i \frac{d\sigma}{d\Omega}.$$
Now, the velocity is simply $v_i = c$, while the number density can be found by computing the energy density in two different ways,
$$u = n_i \hbar \omega_0, \qquad u = \frac{E^2 + B^2}{8\pi} = \frac{\omega_0^2 A_0^2}{2\pi c^2}$$
which tells us that
$$n_i = \frac{k_0 A_0^2}{2\pi \hbar c}.$$
The remaining factor is proportional to $\tilde{\psi}_g(\mathbf{q})$ where $\mathbf{q} = \mathbf{k} - \mathbf{k}_0$, by logic we've seen before. Note that for typical optics applications, where $k_0$ is in the visible range and hence $e^{i\mathbf{k}_0 \cdot \mathbf{x}}$ varies slowly, we often expand the exponential instead, yielding a multipole expansion. We will describe this in more detail in the notes on Optics.
• The result is
$$\frac{d\sigma}{d\Omega} = (2\pi)^2 \frac{e^2}{mc^2} \frac{k_f}{k_0} (\hat{\boldsymbol{\epsilon}} \cdot \mathbf{k}_f)^2 |\tilde{\psi}_g(\mathbf{q})|^2$$
where the magnitude of the final momentum $k_f$ is set by energy conservation, and $\hat{\boldsymbol{\epsilon}}$ is the polarization of the incident wave. We can then proceed further with an explicit form for $|g\rangle$, which would show that harder (higher energy) X-rays penetrate further, and that larger atoms are more effective at stopping them.
• We might wonder why momentum isn’t conserved here, while energy is. The reason is that
momentum is absorbed by the nucleus, which we have implicitly assumed to be infinitely heavy
by taking the potential as static; a proper treatment of the nucleus would be able to compute
its recoil.
• Without the nucleus present, the reaction γ + e → e would be forbidden. The same effect is
observed in Bremsstrahlung, e → e + γ, which only occurs when matter is nearby to absorb
the momentum. (However, note that gamma decay in isolated nuclei is allowed, as is photon
emission from isolated atoms. This is because the initial and final nuclei/atoms have different
rest masses.)
• Note that this derivation has treated the electromagnetic field as completely classical. Contrary
to what is usually taught, the photoelectric effect is not direct evidence for photons: quantizing
the matter alone is sufficient to make its energy transfer with the field discrete, even if the field
is treated classically! However, the photoelectric effect did play an important historical role in
the advent of quantum mechanics.
• In our study of atomic physics, we neglected the dynamics of the electromagnetic field entirely,
just assuming an instantaneous Coulomb attraction between charges. However, this isn’t right
even classically: one must account for magnetic fields, retardation, and radiation.
• If the velocities are low, and the retardation effects are negligible, one can account for magnetic
fields by adding velocity-dependent terms to the Lagrangian, resulting in the Darwin Lagrangian.
While we don’t do this explicitly, the spin-orbit coupling was very much in this spirit.
• To account for retardation and radiation, we are forced to consider the dynamics of the field
itself. In fact, for multi-electron atoms, retardation effects are of the same order as the fine
structure. Radiation is also important, since it plays a role whenever an atom decays by
spontaneous emission of a photon, but we’ve managed to get by treating this implicitly.
• Now suppose we do include the full dynamics of the field. Classically, there are two categories
of “easy” electromagnetism problems: those in which the field is given, and those in which the
charges and currents are given. Cases where we need to solve for both, as they affect each other,
are very difficult.
• In the semiclassical theory of radiation, one treats the charges with quantum mechanics but the
field as a fixed, classical background, neglecting the backreaction of the charges. As we have
seen above, this approach can be used to compute the rate of absorption of radiation.
• It is more difficult to compute the rate of spontaneous emission, since the classical background is
simply zero in this case, but it can be done indirectly with thermodynamics, using the Einstein
coefficients. (In quantum field theory, one can compute the spontaneous emission rate directly,
or heuristically describe it as stimulated emission due to “vacuum fluctuations”.)
• Any attempt to incorporate backreaction while keeping the field classical is ultimately incon-
sistent. For example, one can measure a classical field perfectly, leading to a violation of the
uncertainty principle.
• The semiclassical theory also leads to violation of conservation of energy. For instance, if an atom has a 50% chance of dropping in energy by $\hbar\omega$, then the energy of the classical field must be $\hbar\omega/2$ to preserve the expectation value of energy. But the whole point is that energy is transferred to the field only in multiples of $\hbar\omega$. Either option for the field's energy violates energy conservation; the problem fundamentally arises because quantum systems can have indefinite energy, while classical systems can't.
• The same problems occur in semiclassical theories of gravity. Instead, a proper description must
involve the quantization of the electromagnetic field itself, carried out in the notes on Quantum
Field Theory. In these notes, we will focus on cases where the semiclassical theory of radiation
applies. For some interesting examples where it doesn’t, within the context of atomic physics,
see The Concept of the Photon—Revisited. A fuller account of the interaction of atoms with
quantized light is given in the notes on Optics.
13 Scattering
13.1 Introduction
In the previous section, we considered scattering from a time-dependent point of view. In this
section, we instead solve the time-independent Schrodinger equation.
• We consider scattering off a potential $V(\mathbf{x})$ which goes to zero outside a cutoff radius $r_{\rm co}$. Outside this radius, energy eigenstates obey the free Schrodinger equation.
• As argued earlier, if we feed in an incident plane wave, the wavefunction will approach a steady state after a long time, with constant probability density and current; hence it approaches an energy eigenstate. Thus we can also compute scattering rates by directly looking at energy eigenstates; such eigenstates are all nonnormalizable.
• We look for energy eigenstates $\psi(\mathbf{x})$ which contain an incoming plane wave,
$$\psi(\mathbf{x}) = \psi_{\rm inc}(\mathbf{x}) + \psi_{\rm scat}(\mathbf{x}), \qquad \psi_{\rm inc}(\mathbf{x}) = e^{i\mathbf{k} \cdot \mathbf{x}}.$$
For large $r$, the scattered wave must be a spherical wave with the same energy as the original wave (i.e. the same magnitude of momentum),
$$\psi_{\rm scat}(\mathbf{x}) \sim f(\theta, \phi) \frac{e^{ikr}}{r}.$$
The function $f(\theta, \phi)$ is called the scattering amplitude.
The function f (θ, φ) is called the scattering amplitude.
• Now, if we wanted ψscat to be an exact eigenstate for r > rco , then f would have to be constant,
yielding an isotropic spherical wave. However, the correction terms for arbitrary f are subleading
in r, and we only care about the large r behavior.
Similarly, the incoming plane wave eik·x isn’t an eigenstate; the correction terms are included
in ψinc (x) and are subleading.
• Next, we convert the scattering amplitude to a cross section. The probability current is
$$\mathbf{J} = \frac{\hbar}{m} \operatorname{Im}(\psi^* \nabla \psi).$$
For the incident wave, $\mathbf{J}_{\rm inc} = \hbar \mathbf{k}/m$. For the outgoing wave,
$$\mathbf{J}_{\rm scat} \sim \frac{\hbar k}{m} \frac{|f(\theta, \phi)|^2}{r^2} \hat{\mathbf{r}}.$$
The area of a cone of solid angle $\Delta\Omega$ at radius $r$ is $r^2 \Delta\Omega$, and hence
$$\frac{d\sigma}{d\Omega} = \frac{r^2 J_{\rm scat}(\Omega)}{J_{\rm inc}} = |f(\theta, \phi)|^2$$
which is a very simple result.
• We’ve ignored a subtlety above: the currents for the incident and scattered waves should
interfere because J is bilinear. We ignore this because the incident wave has a finite area in
reality, so it is zero for all angles except the forward direction. In the forward direction, the
incident and scattered waves interfere destructively, as required by conservation of probability.
Applying this quantitatively yields the optical theorem.
• The total cross section almost always diverges classically, because we count any particle scattered
by an arbitrarily small amount. By contrast, in quantum mechanics we can get finite cross
sections because an ‘arbitrarily small push’ can instead become an arbitrarily small scattering
amplitude, plus a high amplitude for continuing exactly in the forward direction. (However,
the cross section can still diverge if V (r) falls slowly enough.)
• The electron Compton wavelength is the scale where pair production can occur, and is
$$\frac{\lambda_c}{2\pi} \sim \begin{cases} 4 \times 10^{-13}\ \mathrm{m} & \text{SI}, \\ \alpha & \text{atomic}, \\ 1/m \sim (0.5\ \mathrm{MeV})^{-1} & \text{natural}. \end{cases}$$
• The classical electron radius is the size of an electron where the electrostatic potential energy matches the mass, i.e. the scale where QED renormalization effects become important. It is
$$r_e \sim \begin{cases} 3 \times 10^{-15}\ \mathrm{m} & \text{SI}, \\ \alpha^2 & \text{atomic}, \\ \alpha/m \sim (70\ \mathrm{MeV})^{-1} & \text{natural}. \end{cases}$$
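These scales are easy to check numerically; the constants below are assumed CODATA-level SI values, not taken from the text.

```python
# Electron length scales in SI units (constants assumed, CODATA-level).
hbar = 1.054571817e-34      # J s
c = 2.99792458e8            # m / s
me = 9.1093837015e-31       # kg
alpha = 1 / 137.035999

lambda_bar = hbar / (me * c)           # reduced Compton wavelength, ~ 4e-13 m
r_e = alpha * lambda_bar               # classical electron radius, ~ 3e-15 m
E_scale = 197.3269804 / (r_e * 1e15)   # hbar c = 197.3 MeV fm -> ~ 70 MeV
print(lambda_bar, r_e, E_scale)
```

Note $r_e = \alpha \cdot (\lambda_c/2\pi) = \alpha^2 a_0$, which is why the atomic-unit values form a geometric ladder in $\alpha$.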
• High-frequency elastic scattering, or elastic scattering of any frequency off a free electron, is
known as Thomson scattering. If the frequency is high enough to require relativistic corrections,
it becomes Compton scattering, which is described by the Klein–Nishina formula.
• Raman scattering is the inelastic scattering of photons by matter, which typically is associated
with inducing vibrational excitation or deexcitation in molecules.
The quantum number $k$ parametrizes the energy by $E = \hbar^2 k^2/2m$. It is the wavenumber of the incident and scattered waves far from the potential, i.e. $R_{k\ell}(r) \propto e^{ikr}$. The function $u_{k\ell}(r) = r R_{k\ell}(r)$ obeys the radial equation
$$u_{k\ell}''(r) = (W(r) - k^2)\, u_{k\ell}(r)$$
where
$$W(r) = \frac{\ell(\ell+1)}{r^2} + \frac{2m}{\hbar^2} V(r).$$
• Therefore, the general solution of energy $E$ is
$$\psi(\mathbf{x}) = \sum_{\ell m} A_{\ell m} R_{k\ell}(r) Y_{\ell m}(\theta, \phi).$$
Our next task is to find the expansion coefficients $A_{\ell m}$ to get a scattering solution.
• In the case of the free particle, the solutions for the radial wavefunction $R_{k\ell}$ are the spherical Bessel functions $j_\ell(kr)$ and $y_\ell(kr)$, where
$$j_\ell(\rho) \approx \frac{1}{\rho} \sin(\rho - \ell\pi/2), \qquad y_\ell(\rho) \approx -\frac{1}{\rho} \cos(\rho - \ell\pi/2)$$
for $\rho \gg \ell$, and the $y$-type Bessel functions are singular at $\rho = 0$.
• Since the incident wave $e^{i\mathbf{k} \cdot \mathbf{x}}$ describes a free particle, it must be possible to write it in terms of the $j$-type Bessel functions. One can show
$$e^{i\mathbf{k} \cdot \mathbf{x}} = 4\pi \sum_{\ell m} i^\ell j_\ell(kr) Y_{\ell m}^*(\hat{\mathbf{k}}) Y_{\ell m}(\hat{\mathbf{r}}).$$
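This expansion is easy to verify numerically: by the addition theorem, the sum over $m$ collapses to $e^{ikr\cos\theta} = \sum_\ell (2\ell+1) i^\ell j_\ell(kr) P_\ell(\cos\theta)$, which the sketch below checks using the standard upward recursions for $j_\ell$ and $P_\ell$ (safe here since $\ell_{\max}$ is only moderately larger than $kr$).

```python
import numpy as np

# Verify e^{i kr cos(theta)} = sum_l (2l+1) i^l j_l(kr) P_l(cos(theta)),
# the m-summed form of the plane wave expansion above.
def spherical_j(lmax, x):
    js = [np.sin(x) / x, np.sin(x) / x**2 - np.cos(x) / x]
    for l in range(1, lmax):
        js.append((2 * l + 1) / x * js[l] - js[l - 1])
    return js

def legendre(lmax, c):
    ps = [np.ones_like(c), c]
    for l in range(1, lmax):
        ps.append(((2 * l + 1) * c * ps[l] - l * ps[l - 1]) / (l + 1))
    return ps

kr, lmax = 5.0, 15
c = np.cos(np.linspace(0.0, np.pi, 50))
js, ps = spherical_j(lmax, kr), legendre(lmax, c)
series = sum((2 * l + 1) * 1j**l * js[l] * ps[l] for l in range(lmax + 1))
err = np.max(np.abs(series - np.exp(1j * kr * c)))
print(err)   # small truncation error
```

The rapid convergence for $\ell \gtrsim kr$ previews the s-wave dominance argument below: partial waves with $\ell$ much larger than $kr$ barely contribute.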
• Next, we find the asymptotic behavior of the radial wavefunction $R_{k\ell}(r)$ for large $r$. If the potential $V(r)$ cuts off at a finite radius $r_0$, then the solutions are Bessel functions of both the $j$ and $y$ type, since we don't care about the region $r < r_0$, giving $u_{k\ell}(r) \sim e^{\pm ikr}$.
• If there is no sharp cutoff, parametrize the error as $u_{k\ell}(r) = e^{g(r) \pm ikr}$, giving
$$g'' + g'^2 \pm 2ik g' = W(r).$$
We already know the centrifugal term alone gives Bessel functions, so we consider the case where the potential dominates at long distances, $V(r) \sim 1/r^p$ with $0 < p < 2$. Taking the leading term on both sides gives $g(r) \sim 1/r^{p-1}$, so the correction factor $g$ goes to zero for large $r$ only if $p > 1$. In particular, the Coulomb potential is ruled out, as it gives logarithmic phase shifts $e^{i \log(kr)}$. This can also be shown using the first-order WKB approximation.
• Assuming that $V(r)$ does fall faster than $1/r$, we may write
$$R_{k\ell} \sim \frac{\sin(kr - \ell\pi/2 + \delta_\ell)}{kr}$$
for large $r$. To interpret the phase shift $\delta_\ell$, note that we would have $\delta_\ell = 0$ in the case of a free particle, by the expansion of $j_\ell(kr)$. Thus the phase shift tells us how the potential asymptotically modifies radial phases.
• For large r, the sine in R_{k\ell} can be expanded as the sum of incoming and outgoing waves e^{-ikr}/r and e^{ikr}/r. We only want an extra outgoing component beyond the plane wave, so matching the incoming parts against the plane wave expansion gives
\[ A_{\ell m} = 4\pi i^\ell e^{i\delta_\ell}\, Y^*_{\ell m}(\hat{k}), \]
where we used the addition theorem for spherical harmonics and set k̂ = ẑ.
• The above result is known as the partial wave expansion. It gives the scattering amplitude
\[ f(\theta, \phi) = \frac{1}{k} \sum_\ell (2\ell + 1)\, e^{i\delta_\ell} \sin(\delta_\ell)\, P_\ell(\cos\theta). \]
There is no dependence on φ, and hence no angular momentum in the z direction, because the problem is symmetric under rotations about ẑ. Instead the scattered waves are parametrized by their total angular momentum ℓ. The individual terms are m = 0 spherical harmonics, and are called the s-wave, the p-wave, and so on. Each of these contributions is present in the initial plane wave and scatters independently, since L² is conserved.
• The differential cross section has interference terms, but the total cross section does not, due to the orthogonality of the Legendre polynomials, giving
\[ \sigma = \frac{4\pi}{k^2} \sum_\ell (2\ell + 1) \sin^2 \delta_\ell. \]
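As a quick numerical illustration (with made-up phase shifts, not taken from any particular potential), we can check that integrating |f|² over solid angle reproduces the partial-wave sum:

```python
# Check that interference terms drop out of the total cross section:
# integrate |f(θ)|² over solid angle and compare with
# σ = (4π/k²) Σ (2ℓ+1) sin²δ_ℓ. The phase shifts are arbitrary.
import numpy as np
from scipy.special import eval_legendre
from scipy.integrate import quad

k = 1.0
deltas = np.array([0.8, 0.3, 0.05])          # arbitrary δ_0, δ_1, δ_2
ls = np.arange(len(deltas))

def f(theta):
    return (1/k) * np.sum((2*ls + 1) * np.exp(1j*deltas) * np.sin(deltas)
                          * eval_legendre(ls, np.cos(theta)))

# σ from the differential cross section, dΩ = 2π sinθ dθ
sigma_int, _ = quad(lambda t: abs(f(t))**2 * 2*np.pi*np.sin(t), 0, np.pi)
# σ from the partial-wave sum
sigma_sum = (4*np.pi/k**2) * np.sum((2*ls + 1) * np.sin(deltas)**2)
print(sigma_int, sigma_sum)                  # the two agree
```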
• For any localized potential with lengthscale a, when ka ≲ 1, s-wave scattering (ℓ = 0) dominates and the scattered particles are spherically symmetric. To see this, note that the centrifugal potential is equal to the energy when
\[ \frac{\ell(\ell+1)\hbar^2}{2ma^2} = E = \frac{\hbar^2 k^2}{2m}, \]
which has solution ℓ ≈ ka. Then for ka ≲ 1, partial waves with ℓ ≥ 1 cannot classically reach the potential at all, so they have the same phase as a free particle and hence no phase shift.
• In reality, the phase shift will be small but nonzero for ℓ > ka because of quantum tunneling, but it drops off exponentially in ℓ. In the case where the potential is a power law (long-ranged), the phase shifts instead drop off as powers.
• In many experimental situations, s-wave scattering dominates (e.g. neutron scattering off nuclei
in reactors). In this case we can replace the potential V (r) with any potential with the same
δ0 . A common and convenient choice is a δ-function potential.
• We can also import some heuristic results from our knowledge of Fourier transforms, though the partial wave expansion is in Legendre polynomials instead. If the scattering amplitude is dominated by terms up to ℓ_cutoff, the minimum angular size of a feature is about 1/ℓ_cutoff. Moreover, if the phase shifts fall off exponentially, then the scattering amplitude will be analytic. Otherwise, we generally get singularities in the forward direction.
• Each scattering term σ` is bounded by (4π/k 2 )(2` + 1). This is called the unitarity bound; it
simply says we can’t scatter out more than we put in.
• As an example, consider scattering off a hard sphere of radius a. For r > a the particle is free, so the radial wavefunction is a combination of both types of Bessel functions,
\[ R_{k\ell}(r) \propto \cos\delta_\ell\, j_\ell(kr) - \sin\delta_\ell\, y_\ell(kr) \]
for r > a, where δ_ℓ is the phase shift, as can be seen by taking the r → ∞ limit. The boundary condition R_{k\ell}(a) = 0 gives
\[ \tan(\delta_\ell) = \frac{j_\ell(ka)}{y_\ell(ka)}. \]
First we consider the case ka ≪ 1. Applying the asymptotic forms of the Bessel functions,
\[ \sin(\delta_\ell) \approx \delta_\ell \approx -\frac{(ka)^{2\ell+1}}{(2\ell-1)!!\,(2\ell+1)!!}. \]
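A sketch of this calculation using scipy's spherical Bessel functions (the values of k and a are arbitrary, chosen so that ka ≪ 1):

```python
# Hard-sphere phase shifts tan δ_ℓ = j_ℓ(ka)/y_ℓ(ka), checking that for
# ka << 1 the s-wave dominates and σ approaches its low-energy limit 4πa².
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def delta(l, k, a):
    return np.arctan(spherical_jn(l, k*a) / spherical_yn(l, k*a))

k, a = 0.1, 1.0                      # ka = 0.1 << 1
d0, d1 = delta(0, k, a), delta(1, k, a)
print(d0, -k*a)                      # δ_0 = -ka exactly for a hard sphere
print(d1)                            # suppressed: δ_1 ≈ -(ka)³/3
sigma = (4*np.pi/k**2) * sum((2*l + 1)*np.sin(delta(l, k, a))**2
                             for l in range(5))
print(sigma, 4*np.pi*a**2)           # low-energy limit σ → 4πa²
```

Note that the low-energy cross section is 4πa², four times the geometric cross section πa².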
Next, we prove the optical theorem, σ = (4π/k) Im f(0), which follows from conservation of probability: the total probability flux of the full wavefunction must vanish when integrated over a large sphere. The flux J splits into three terms: the incident wave (which contributes zero flux), the scattered wave (which contributes vσ), and the interference term,
\[ \mathbf{J}_{\text{int}} = \frac{\hbar}{m} \operatorname{Im}\left( \psi_{\text{scat}}^* \nabla \psi_{\text{inc}} + \psi_{\text{inc}}^* \nabla \psi_{\text{scat}} \right) = \frac{v}{r} \operatorname{Re}\left[ f(\theta,\phi)^* e^{ik(x-r)}\, \hat{x} + f(\theta,\phi)\, e^{ik(r-x)}\, \hat{r} \right], \]
where x = r cos θ is the coordinate along the incident direction. Integrating over a sphere of radius r, we must have
\[ \sigma = -r \operatorname{Re} \int d\phi\, \sin\theta\, d\theta\, e^{ikr(1-\cos\theta)}\, f(\theta, \phi)(1 + \cos\theta) \]
in the limit r → ∞. Then the phase factor is rapidly oscillating, so the only contribution comes
from the endpoints θ = 0, π since there are no points of stationary phase. The contribution at θ = π
is zero due to the (1 + cos θ) factor, while the θ = 0 peak gives the desired result.
Note. Resonances can be understood semiclassically. Consider an attractive potential with a centrifugal barrier, so that the effective potential
\[ V_{\text{tot}}(r) = V(r) + \frac{\ell(\ell+1)\hbar^2}{2mr^2} \]
has a well between the turning points r = r_0 and r = r_1, and a classically forbidden region between r = r_1 and the turning point r = r_2. We define
\[ p(r) = \sqrt{2m(E - V_{\text{tot}}(r))}, \qquad \Phi = \frac{2}{\hbar} \int_{r_0}^{r_1} p(r)\, dr, \qquad K = \frac{1}{\hbar} \int_{r_1}^{r_2} |p(r)|\, dr. \]
Note that Φ is the action for an oscillation inside the well, so the bound state energies satisfy the WKB quantization condition Φ(E_n) = 2π(n + 1/2).
Starting with an exponentially decaying solution for r < r_0, the connection formulas give
\[ u(r) = \frac{1}{\sqrt{p(r)}} \left( 2 e^{K} \cos\frac{\Phi}{2} + \frac{i}{2}\, e^{-K} \sin\frac{\Phi}{2} \right) e^{iS(r)/\hbar - i\pi/4} + \text{c.c.}, \qquad S(r) = \int_{r_2}^{r} p(r')\, dr' \]
in the region r > r_2, where cos(Φ/2) = 0 for a bound state. Suppose the forbidden region is large, so e^K ≫ 1. Then away from bound states, the e^{−K} term does not contribute; we get the same solution we would get if there were no potential well at all. In particular, assuming V(r) is negligible for r > r_2, the particle doesn't feel its effect at all, so δ_ℓ = 0.
Now suppose we are near a bound state, E = E_n + δE. Then
\[ \Phi(E) = 2\pi(n + 1/2) + \frac{\delta E}{\hbar \omega_c} \]
according to the theory of action-angle variables, and expanding to lowest order in δE gives
\[ e^{2i\delta_\ell} = \frac{-\delta E + i\Gamma/2}{-\delta E - i\Gamma/2}, \qquad \Gamma = \hbar \omega_c\, e^{-2K}. \]
That is, across a resonance, the phase shift rapidly changes by π. Then we have a Lorentzian
resonance in the cross-section,
\[ \sin^2 \delta_\ell = \frac{\Gamma^2/4}{(E - E_n)^2 + \Gamma^2/4}. \]
Since we have assumed K is large, the width Γ is much less than the spacing between energy
levels ~ωc , so the cross-section has sharp spikes as a function of E. Such spikes are common in
neutron-nucleus scattering. Physically, we imagine that the incoming particle tunnels through the
barrier, gets ‘stuck inside’ bouncing back and forth for a timescale 1/Γ, then exits. This is the
physical model for the production of decaying particles in quantum field theory.
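The algebra relating e^{2iδ_ℓ} to the Lorentzian lineshape, and the rapid sweep of the phase through π, can be checked directly; the value of Γ below is arbitrary.

```python
# With e^{2iδ} = (-δE + iΓ/2)/(-δE - iΓ/2), extract δ and verify the
# Breit-Wigner lineshape sin²δ = (Γ²/4)/(δE² + Γ²/4).
import numpy as np

Gamma = 0.1
dE = np.linspace(-5*Gamma, 5*Gamma, 201)       # E - E_n across the resonance
s = (-dE + 1j*Gamma/2) / (-dE - 1j*Gamma/2)    # e^{2iδ}, unimodular
delta = np.angle(s) / 2
lorentzian = (Gamma**2/4) / (dE**2 + Gamma**2/4)
print(np.max(np.abs(np.sin(delta)**2 - lorentzian)))   # ~0
```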
• Now we consider the case where the source is determined by A itself, J = σA. Then Maxwell's equations read
\[ \Box A = \sigma A, \qquad (\Box - \sigma) A = 0. \]
We have arrived at a homogeneous equation, but now A must be determined self-consistently;
it will generally be the sum of an incident and scattered term, both sourcing current.
• As a specific example, consider reflection of an incident wave off a mirror, which is a region of high σ. The usual approach is to search for a solution of \Box A = 0 containing an incoming wave, satisfying a boundary condition at the mirror. But as shown above, we can also solve self-consistently, letting A = A_{inc} + A_{scat} where \Box A = σA. We would then find that A_{scat} cancels A_{inc} inside the mirror and also contains a reflected wave.
• Even given a Green's function, we will not have a closed form for ψ. Instead, we'll get a self-consistent expression for ψ in terms of itself, which we can expand to get a series solution.
We define the Green's function K(x, t, x′, t′) to satisfy the Schrodinger equation with the source iℏ δ(t − t′) δ³(x − x′), where the iℏ is by convention. We always indicate sources by primed coordinates.
Explicitly, the retarded Green's function is K(x, t, x′, t′) = θ(t − t′)⟨x|U(t, t′)|x′⟩, where the additional step function gives the desired δ-function when differentiated. This Green's function is zero for all t < t′. In terms of a water wave analogy, it describes the surface of a lake which is previously still, and which we poke at (x′, t′).
If we want a causal solution, then ψ_h(x, t) must also vanish before the driving starts, but this implies it must vanish for all times. Therefore
\[ \psi(\mathbf{x}, t) = \int_{-\infty}^{t} dt' \int d\mathbf{x}'\, K(\mathbf{x}, t, \mathbf{x}', t')\, S(\mathbf{x}', t') \]
We can also write the Green's function as the matrix element of a Green's operator, K(x, t, x′, t′) = ⟨x|K̂(t, t′)|x′⟩. This form is often more useful, as it does not privilege the position basis. In particular, Green's operators can be defined for systems with a much broader range of Hilbert spaces, such as spin systems or field theories.
Example. In the case of a time-independent Hamiltonian, we replace the arguments t and t′ with the single argument t, the time difference. For example, for a free particle in three dimensions,
\[ K_0(\mathbf{x}, \mathbf{x}', t) = \left( \frac{m}{2\pi i \hbar t} \right)^{3/2} \exp\left( \frac{i}{\hbar} \frac{m(\mathbf{x} - \mathbf{x}')^2}{2t} \right). \]
Next, we turn to energy-dependent Green’s functions, which are essentially the Fourier transforms
of time-dependent ones.
• Energy-dependent Green's functions are defined for the time-independent driven equation
\[ (E - H)\psi(\mathbf{x}) = S(\mathbf{x}). \]
Note that the homogeneous solution ψ_h(x) is simply a stationary state with energy E.
• We imagine the energy-dependent Green's functions as follows. We consider a lake with finite area which is quiet for t < 0. At t = 0, we begin driving a point x′ sinusoidally with frequency E. After a long time, the initial transients die out by dissipation and the surface approaches a sinusoidally oscillating steady state; this is G(x, x′, E).
Then naively we have the solution Ĝ(E) = 1/(E − H), but this is generally not well defined.
As usual, the ambiguity that exists comes from freedom in the boundary conditions.
• Note that we are not explicitly distinguishing the operator H, which acts on the Hilbert space,
and the coordinate form of H, which is a differential operator that acts on wavefunctions.
then we have
\[ \hat G_+(E) = \frac{1}{i\hbar} \int_0^\infty dt\, e^{iEt/\hbar}\, U(t) = \frac{1}{i\hbar} \int_0^\infty dt\, e^{i(E-H)t/\hbar} = -\left. \frac{e^{i(E-H)t/\hbar}}{E - H} \right|_0^\infty \]
where all functions of operators are defined by power series. Then Ĝ_+(E) would be a Green's operator if we could neglect the upper limit of integration.
• The problem above is due to the fact that the Schrodinger equation has no damping, so initial transients never die out. Instead we replace H → H − iε, giving exponential decay, or equivalently E → E + iε. Then generally we may define
\[ \hat G_+(z) = \frac{1}{i\hbar} \int_0^\infty dt\, e^{izt/\hbar}\, U(t) = \frac{1}{z - H} \]
for any z = E + iε with ε > 0.
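We can sanity check this in a finite-dimensional toy model, where H is just a small random symmetric matrix (a stand-in, not from the notes) and both sides can be computed directly:

```python
# Check Ĝ+(z) = 1/(z - H) from its time-integral definition, with ħ = 1:
# Ĝ+(z) = (1/i) ∫_0^∞ dt e^{izt} e^{-iHt}, convergent since Im z > 0.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
H = (A + A.T) / 2                       # random symmetric "Hamiltonian"
evals, U = np.linalg.eigh(H)
z = 0.4 + 0.6j                          # Im z > 0 supplies the damping

ts = np.linspace(0, 80, 40001)          # e^{-0.6t} is negligible by t = 80
dt = ts[1] - ts[0]
# e^{-iHt} = U e^{-iλt} U^T, so the time integral is done per eigenvalue
integrand = np.exp(1j * (z - evals[None, :]) * ts[:, None])
vals = (integrand.sum(axis=0) - 0.5*(integrand[0] + integrand[-1])) * dt
G_int = U @ np.diag(vals / 1j) @ U.T    # (1/i) × trapezoid-rule integral
G_exact = np.linalg.inv(z * np.eye(3) - H)
print(np.max(np.abs(G_int - G_exact)))  # small (trapezoid-rule error)
```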
• For Im z > 0, the Green’s operator has a complete set of eigenfunctions (since H does), though
it is not Hermitian. Moreover, none of the eigenvalues are vanishing because they all have
nonzero imaginary part. Thus the inverse of z − H exists and is unique. (We ignore subtle
mathematical issues, such as nonnormalizable eigenfunctions.)
• Suppose that H has a discrete spectrum with negative energies E_n and a continuous spectrum with positive energies E, as is typical for scattering problems. Then the Green's operator has the spectral decomposition
\[ \hat G_+(z) = \sum_{n\alpha} \frac{|n\alpha\rangle\langle n\alpha|}{z - E_n} + \int_0^\infty dE' \sum_\alpha \frac{|E'\alpha\rangle\langle E'\alpha|}{z - E'}. \]
• From the above expression we conclude that Ĝ_+(E + iε) is well-defined in the upper-half plane, but may become singular in the limit ε → 0. We define
\[ \hat G_+(E) = \lim_{\epsilon \to 0^+} \hat G_+(E + i\epsilon), \]
where the right-hand side is often written as Ĝ_+(E + i0). When E is not an eigenvalue, then the
limit exists by the decomposition above. When E is a discrete eigenvalue, the limit is singular
and the Green’s function fails to exist. Finally, when E > 0 the integrand above diverges,
though it turns out the limit of the integral exists, as we’ll show in an example later. All these
results are perfectly analogous to the water waves above.
• Similarly, we define the incoming Green's operator
\[ \hat G_-(z) = -\frac{1}{i\hbar} \int_{-\infty}^0 dt\, e^{izt/\hbar}\, U(t) = \frac{1}{z - H} \]
where now z = E − iε. It is defined in the lower-half plane and limits to Ĝ_−(E) as ε → 0, where the limit is well defined if E is not equal to any of the E_n.
• In the water wave analogy, we have 'antidamping', and energy is continually absorbed by the drive. In the case E < 0, this makes no difference in the limit ε → 0, where the drive absorbs zero energy. But in the case of a continuous eigenfrequency E > 0, the drive will continuously absorb energy even for ε → 0 because it 'comes in from infinity', just as it continuously radiates energy out in the outgoing case.
• Note that since everything in the definitions of Ĝ± is real except for the i, the Ĝ± are Hermitian
conjugates.
With the above water wave intuition, we can understand the Green's operators analytically. Define the discontinuity ∆̂(E) = Ĝ_+(E) − Ĝ_−(E). Since
\[ \frac{1}{E - E_0 \pm i\epsilon} \to \mathrm{P}\, \frac{1}{E - E_0} \mp i\pi\, \delta(E - E_0) \]
as ε → 0, the principal value parts cancel, and therefore we have
\[ \hat\Delta(E) = -2\pi i\, \delta(E - H). \]
The operator on the right-hand side is defined by its action on each eigenvector, i.e. an eigenvector of H with eigenvalue E_0 becomes an eigenvector with eigenvalue δ(E − E_0). Explicitly,
\[ \delta(E - H) = \sum_{n\alpha} |n\alpha\rangle\langle n\alpha|\, \delta(E - E_n) + \int_0^\infty dE' \sum_\alpha |E'\alpha\rangle\langle E'\alpha|\, \delta(E - E'). \]
We see that ∆̂(E) is zero when E is not an eigenvalue, diverges when E = E_n, and is finite when E > 0, with ∆̂(E) = −2πi Σ_α |Eα⟩⟨Eα|.
• Therefore Ĝ_−(z) is the analytic continuation of Ĝ_+(z) through the gaps between the discrete eigenvalues, so they are both part of the same analytic function called the resolvent,
\[ \hat G(z) = \frac{1}{z - H}, \]
which is defined for all z that are not eigenvalues of H. The resolvent has poles at every discrete eigenvalue, and a branch cut along the continuous eigenvalues.
• We can analytically continue Ĝ_+(z) across the positive real axis, 'pushing aside' the branch cut to reach the second Riemann sheet of the resolvent. In this case we can encounter additional singularities in the lower-half plane, which correspond to resonances (e.g. long-lived bound states). (need a good example for this!)
Example. The free particle Green's functions G_{0\pm}(x, x′, E) in three dimensions. Setting z = E + iε,
\[ G_{0+}(\mathbf{x}, \mathbf{x}', z) = \langle \mathbf{x}|(z - H_0)^{-1}|\mathbf{x}'\rangle = \int d\mathbf{p}\, d\mathbf{p}'\, \langle \mathbf{x}|\mathbf{p}\rangle \langle \mathbf{p}| \frac{1}{z - H_0} |\mathbf{p}'\rangle \langle \mathbf{p}'|\mathbf{x}'\rangle = \int \frac{d\mathbf{p}}{(2\pi\hbar)^3}\, \frac{e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{x}')/\hbar}}{z - p^2/2m}. \]
To simplify, we set x′ = 0 by translational invariance, let p = ℏq, and let z = E + iε = ℏ²w²/2m for a complex wavenumber w (so that w lies in the first quadrant), giving
\[ G_{0+}(\mathbf{x}, z) = -\frac{1}{(2\pi)^3} \frac{2m}{\hbar^2} \int d\mathbf{q}\, \frac{e^{i\mathbf{q}\cdot\mathbf{x}}}{q^2 - w^2} = \frac{1}{(2\pi)^2} \frac{2m}{\hbar^2} \frac{i}{x} \int_{-\infty}^{\infty} dq\, \frac{q\, e^{iqx}}{(q - w)(q + w)} \]
where we performed the angular integration. To do the final integral, we close the contour in the
upper-half plane, picking up the q = w pole. Then
\[ G_{0+}(\mathbf{x}, z) = -\frac{1}{4\pi} \frac{2m}{\hbar^2} \frac{e^{iwx}}{x}. \]
The incoming Green’s function is similar, but now we choose the branch of the square root so that
w lies in the fourth quadrant, so we pick up the q = −w pole instead, giving e−iwx . Converting
back to wavenumbers, we have
\[ G_{0\pm}(\mathbf{x}, E) = -\frac{1}{4\pi} \frac{2m}{\hbar^2} \begin{cases} e^{\pm ikx}/x, & E \geq 0, \\ e^{-\kappa x}/x, & E \leq 0, \end{cases} \]
where k = \sqrt{2mE}/\hbar and \kappa = \sqrt{-2mE}/\hbar are real and positive. By taking this choice of branches, we have ensured that G_{0\pm} is continuous across the negative real axis, but as a result it is discontinuous across the positive real axis, as expected.
• We now apply Green's functions to scattering, writing the Schrodinger equation as
\[ (E - H_0)\psi(\mathbf{x}) = V(\mathbf{x})\psi(\mathbf{x}) \]
and treating the right-hand side as a source. The general solution is then
\[ \psi(\mathbf{x}) = \phi(\mathbf{x}) + \int d\mathbf{x}'\, G_0(\mathbf{x}, \mathbf{x}', E)\, V(\mathbf{x}')\psi(\mathbf{x}'), \]
where φ(x) solves the homogeneous equation (i.e. free particle with energy E).
• Since we are interested in scattering solutions, we take the outgoing Green's function G_{0+} and let the homogeneous solution be an incoming plane wave |φ_k⟩ = |k⟩, which satisfies E = ℏ²k²/2m. This yields the Lippmann–Schwinger equation. In terms of kets, it reads
\[ |\psi_{\mathbf{k}}\rangle = |\mathbf{k}\rangle + G_{0+}(E)\, V\, |\psi_{\mathbf{k}}\rangle. \]
We add the subscript k to emphasize that the solution depends on the choice of k, not just on E, as it tells us which direction the particles are launched in. In terms of wavefunctions,
\[ \psi_{\mathbf{k}}(\mathbf{x}) = \phi_{\mathbf{k}}(\mathbf{x}) - \frac{1}{4\pi} \frac{2m}{\hbar^2} \int d\mathbf{x}'\, \frac{e^{ik|\mathbf{x}-\mathbf{x}'|}}{|\mathbf{x}-\mathbf{x}'|}\, V(\mathbf{x}')\psi_{\mathbf{k}}(\mathbf{x}'). \]
• There are many variations on the Lippmann–Schwinger equation. For example, in proton-
proton scattering V is the sum of a Coulomb potential and the nuclear potential. Then we
might include the Coulomb term in H0 , so that the incoming wave would be a Coulomb solution
of positive energy, and we would use Green’s functions for the Coulomb potential.
• Now suppose that the potential cuts off after a finite radius, and we observe the scattering at a much larger radius r = |x|. Then x′ ≪ r in the integral above, and we may expand in a power series in x′/r, throwing away all terms falling faster than 1/r, giving
\[ \psi_{\mathbf{k}}(\mathbf{x}) \approx \phi_{\mathbf{k}}(\mathbf{x}) - \frac{1}{4\pi} \frac{2m}{\hbar^2} \frac{e^{ikr}}{r} \int d\mathbf{x}'\, e^{-i\mathbf{k}'\cdot\mathbf{x}'}\, V(\mathbf{x}')\psi_{\mathbf{k}}(\mathbf{x}'), \]
where k′ = k r̂ points toward the observation point.
In particular, this matches the 'incident plus scattered' form of the wavefunction postulated in the beginning of this section, with scattering amplitude
\[ f(\mathbf{k}, \mathbf{k}') = -\frac{(2\pi)^{3/2}}{4\pi} \frac{2m}{\hbar^2} \int d\mathbf{x}'\, e^{-i\mathbf{k}'\cdot\mathbf{x}'}\, V(\mathbf{x}')\psi_{\mathbf{k}}(\mathbf{x}') = -\frac{4\pi^2 m}{\hbar^2} \langle \mathbf{k}'|V|\psi_{\mathbf{k}}\rangle. \]
Thus we have proven that the wavefunction must take this form whenever the potential has finite range. A similar statement holds for rapidly decaying potentials, but it fails for the Coulomb potential.
• We can also use the incoming Green’s function; this describes a solution where waves come in
from infinity and combine to come out as a plane wave. Since the outgoing solution is much
more realistic, we focus on it and may leave the plus sign implicit.
• For bound states with E < 0, the analogous equation is |ψ⟩ = G_0(E)V|ψ⟩, where there is no homogeneous term, because free particle solutions do not decay at infinity. Solutions only exist for discrete values of E. There is also no choice in Green's function, as both agree on the negative real axis.
We can use the Lippmann–Schwinger equation to derive a perturbation series for scattering, called
the Born series.
• Formally solving the Lippmann–Schwinger equation gives |ψ_k⟩ = (1 − G_{0+}(E)V)^{−1}|k⟩ ≡ Ω_+(E)|k⟩, where Ω_+(E) is called the Moller scattering operator. Similarly we may define an incoming form Ω_−(E) and a general operator Ω(z) with complex energy. Expanding the inverse as a geometric series,
\[ |\psi_{\mathbf{k}}\rangle = |\mathbf{k}\rangle + G_{0+}(E)V|\mathbf{k}\rangle + G_{0+}(E)V G_{0+}(E)V|\mathbf{k}\rangle + \ldots. \]
Substituting this into the expression for the scattering amplitude gives
\[ f(\mathbf{k}, \mathbf{k}') = -\frac{4\pi^2 m}{\hbar^2} \left( \langle \mathbf{k}'|V|\mathbf{k}\rangle + \langle \mathbf{k}'|V G_{0+}(E) V|\mathbf{k}\rangle + \ldots \right). \]
When we truncate these series at V^n, we get the n-th Born approximation. The Born series can also be derived by plugging the Lippmann–Schwinger equation into itself.
• The first Born approximation recovers our first-order result from time-dependent perturbation
theory: the scattering amplitude is proportional to the Fourier transform of the potential. In
general, the Dyson series (from time-dependent perturbation theory) is very similar to the Born
series. They both expand in powers of V , but in the time/energy domain respectively.
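As an illustration of the first Born approximation (with an assumed Yukawa potential V(r) = V_0 e^{−μr}/r and units ℏ = m = 1, neither taken from the notes), the amplitude f = −(m/2πℏ²) Ṽ(q) really is set by the Fourier transform of the potential at momentum transfer q = 2k sin(θ/2):

```python
# First Born approximation for a Yukawa potential. We check the radial
# Fourier integral Ṽ(q) = (4π/q) ∫ r sin(qr) V(r) dr against the closed
# form Ṽ(q) = 4πV0/(q² + μ²); parameter values are arbitrary.
import numpy as np
from scipy.integrate import quad

V0, mu = -1.0, 0.5            # attractive Yukawa, screening length 1/μ
k, theta = 2.0, 1.0
q = 2 * k * np.sin(theta / 2) # momentum transfer |k - k'|

# (4π/q) r sin(qr) V(r) with the 1/r of the Yukawa cancelled
Vtilde, _ = quad(lambda r: (4*np.pi/q) * np.sin(q*r) * V0 * np.exp(-mu*r),
                 0, 60, limit=200)        # e^{-μr} dead long before r = 60
f_born = -(1/(2*np.pi)) * Vtilde          # f = -(m/2πħ²) Ṽ(q), m = ħ = 1
f_exact = -2 * V0 / (q**2 + mu**2)        # closed-form Born amplitude
print(f_born, f_exact)
```

In the μ → 0 limit this reproduces the Rutherford 1/q² amplitude, which is why the first Born approximation works so well for Coulomb-like problems.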
• We can also phrase the results in terms of the exact Green's operator
\[ G(z) = \frac{1}{z - H}. \]
Playing around and suppressing the z argument, we have
\[ G = G_0 + G_0 V G = G_0 + G V G_0, \]
which are Lippmann–Schwinger equations for G. This gives the exact Green’s function as a
series in the number of scatterings off the potential.
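These operator identities can be verified exactly in a finite-dimensional toy model, with random Hermitian matrices standing in for H_0 and V (an illustration, not from the notes):

```python
# Finite-dimensional check of G = G0 + G0 V G = G0 + G V G0, where
# G = 1/(z - H0 - V) and G0 = 1/(z - H0) are computed by matrix inversion.
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H0 = (A + A.conj().T) / 2                  # random Hermitian "free" H0
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V = (B + B.conj().T) / 2                   # random Hermitian potential
z = 0.7 + 0.5j                             # any z off the real axis
I = np.eye(n)

G0 = np.linalg.inv(z * I - H0)
G = np.linalg.inv(z * I - H0 - V)
print(np.max(np.abs(G - (G0 + G0 @ V @ G))))   # ≈ 0 (machine precision)
print(np.max(np.abs(G - (G0 + G @ V @ G0))))   # ≈ 0 (machine precision)
```

The identity is just (z − H_0)G = 1 + VG rearranged, so it holds exactly for matrices, not only perturbatively.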
For example, iterating gives |ψ_k⟩ = |k⟩ + G(E)V|k⟩. In this picture, a scattering process occurs through an initial scattering off the potential, followed by propagation by the exact Green's function.
Example. We show the scattering states |ψ_k⟩ are orthonormal using Green's functions. We have
\[ \langle \psi_{\mathbf{k}'}|\psi_{\mathbf{k}}\rangle = \langle \psi_{\mathbf{k}'}|\mathbf{k}\rangle + \langle \psi_{\mathbf{k}'}|G_+(E)V|\mathbf{k}\rangle = \langle \psi_{\mathbf{k}'}|\mathbf{k}\rangle + \lim_{\epsilon \to 0} \frac{1}{E + i\epsilon - E'} \langle \psi_{\mathbf{k}'}|V|\mathbf{k}\rangle \]
where E′ = ℏ²k′²/2m. Next, using the Lippmann–Schwinger equation on the first factor,
\[ \langle \psi_{\mathbf{k}'}|\mathbf{k}\rangle = \langle \mathbf{k}'|\mathbf{k}\rangle + \langle \psi_{\mathbf{k}'}|V G_{0-}(E')|\mathbf{k}\rangle = \langle \mathbf{k}'|\mathbf{k}\rangle + \lim_{\epsilon \to 0} \frac{1}{E' - i\epsilon - E} \langle \psi_{\mathbf{k}'}|V|\mathbf{k}\rangle. \]
Then the extra terms cancel, giving ⟨ψ_{k′}|ψ_k⟩ = ⟨k′|k⟩ = δ(k − k′). The completeness relation is
\[ \sum_{n\alpha} |n\alpha\rangle\langle n\alpha| + \int d\mathbf{k}\, |\psi_{\mathbf{k}}\rangle\langle \psi_{\mathbf{k}}| = 1, \]
where the first term includes bound states, which are orthogonal to all scattering states.
• Next, we turn to scattering in one dimension, where an incident plane wave can be reflected with amplitude r or transmitted with amplitude t. Then R = |r|² and T = |t|² give the probability of reflection and transmission, as can be seen by computing the probability fluxes. Conservation of probability requires R + T = 1.
• Since the potential is real, if ψ is a solution, then ψ ∗ is as well. This gives the identities
\[ t' = t, \qquad r' = -\frac{r^* t}{t^*}, \]
so that |r| = |r′|. These results also appear in classical scattering as a result of time-reversal symmetry. The same symmetry is acting here, as time reversal is complex conjugation.
• As an explicit example, the finite well potential V(x) = −V_0\, θ(a/2 − x)\, θ(x + a/2) has
\[ t(k) = \frac{e^{-ika}}{\cos(qa) - i\,\frac{k^2 + q^2}{2kq}\sin(qa)}, \qquad q = \frac{\sqrt{2m(E + V_0)}}{\hbar}. \]
• Combining our identities shows that S_{++} and S_{--} are pure phases.
This is analogous to how we distilled three-dimensional central force scattering into a set of
phases in the partial wave decomposition.
• The S-matrix can also detect bound states. Since the algebra used to derive r(k) and t(k) never assumed that k was real, the same expressions hold for general complex k. Consider a pure imaginary wavenumber k = iλ with even parity, for which
\[ \lim_{|x|\to\infty} \psi_+(x) = I_+(x) + S_{++}\, O_+(x). \]
It looks like there can't be a bound state solution here, since the I_+ component diverges at infinity. The trick is to rewrite this, after rescaling, as
\[ \lim_{|x|\to\infty} \psi_+(x) = S_{++}^{-1}\, I_+(x) + O_+(x), \]
which gives a valid bound state as long as S_{++}^{-1} = 0, which corresponds to a pole in S_{++}. That is, we can identify bound states from poles in S-matrix elements! (The same reasoning works in the original left/right basis, though there are more terms.)
For the finite square well, we find
\[ S_{++}(k) = -e^{-ika}\, \frac{q \tan(qa/2) - ik}{q \tan(qa/2) + ik}, \]
which shows that bound states of even parity occur when λ = q tan(qa/2), a familiar result. We can recover the bound state energy from E = −ℏ²λ²/2m.
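A minimal sketch of extracting the even-parity bound states from this pole condition, with arbitrary well parameters and units ℏ = 2m = 1 (so that λ² + q² = V_0 and E = −λ²):

```python
# Even-parity bound states of the finite well from the S-matrix pole
# condition λ = q tan(qa/2), with λ² + q² = V0 in units ħ = 2m = 1.
import numpy as np
from scipy.optimize import brentq

V0, a = 10.0, 2.0                    # arbitrary well depth and width

def pole_condition(q):
    lam = np.sqrt(V0 - q**2)         # λ fixed by λ² + q² = V0
    return q * np.tan(q * a / 2) - lam

# scan for sign changes, discarding the spurious ones at tan's divergences
qs = np.linspace(1e-6, np.sqrt(V0) - 1e-6, 2000)
vals = pole_condition(qs)
roots = [brentq(pole_condition, qs[i], qs[i+1])
         for i in range(len(qs) - 1)
         if vals[i]*vals[i+1] < 0 and abs(vals[i]) + abs(vals[i+1]) < 50]
energies = [q**2 - V0 for q in roots]    # E = -λ² = q² - V0
print(energies)                          # two even bound states here
```

For these parameters there are two even states, one deeply bound and one just barely below threshold, matching the usual graphical solution of λ = q tan(qa/2).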