Statistical Physics
for Electrical Engineering
Neri Merhav
The Andrew and Erna Viterbi Faculty
of Electrical Engineering
Technion—Israel Institute of Technology
Technion City, Haifa
Israel
This short book is based on lecture notes of a course on statistical physics and thermodynamics, which is oriented, to a certain extent, toward electrical engineering students. The course has been taught in the Electrical Engineering Department of the Technion (Haifa, Israel) since 2013. The main body of the book is devoted to statistical physics, whereas much less emphasis is given to the thermodynamics part. In particular, the idea is to let the important results of thermodynamics (most notably, the laws of thermodynamics) be obtained as conclusions from the derivations in statistical physics.
Beyond the variety of central topics in statistical physics that are important to the
general scientific education of the electrical engineering student, special emphasis is
devoted to subjects that are vital specifically to the engineering education. These
include, first of all, quantum statistics, like the Fermi–Dirac distribution, as well as
diffusion processes, which are both fundamental for deep understanding of semi-
conductor devices. Another important issue for the electrical engineering student is
to understand mechanisms of noise generation and stochastic dynamics in physical
systems, most notably, in electric circuitry. Accordingly, the fluctuation–dissipation
theorem of statistical mechanics, which is the theoretical basis for understanding
thermal noise processes in systems, is presented from a signals-and-systems point
of view, in a way that would hopefully be understandable and useful for an engineering student, and well connected to other important courses taken by students of electrical engineering, such as courses on random processes. The quantum regime, in this context, is important too, and hence it is covered as well. Finally, we
touch very briefly upon some relationships between statistical mechanics and
information theory, which is the theoretical basis for communications engineering,
and demonstrate how the statistical–mechanical approach can be useful for the study of information–theoretic problems. These relationships are explored further, and in much greater depth, in [1].
In the table of contents below, chapters and sections, marked by asterisks, can be
skipped without loss of continuity.
Reference
1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory 6(1–2), 1–212 (2009)
Acknowledgements
I would first like to express my deep gratitude to several colleagues in the Technion
Physics Department, including Profs. Dov Levine, Shmuel Fishman, and Yariv
Kafri, for many fruitful discussions, and for relevant courses that they have
delivered and that I have listened to. I have certainly learned a lot from them.
I would like to thank Profs. Nir Tessler and Baruch Fischer, of my department, for
their encouragement to develop a statistical physics course for our students. I am
also grateful to Prof. Yuval Yaish of my department, who has been teaching the
course too (in alternate years), for sharing with me his thoughtful ideas about the
course. The lecture notes of the course have served as the basis for this book.
Finally, I thank my dear wife, Ilana, and a student of mine, Mr. Aviv Lewis, for
their useful comments on the English grammar, typographical errors, and style.
Introduction

Statistical physics is a branch of physics that deals with systems with a huge number of particles (or any other elementary units). For example, Avogadro's number, which is about $6\times 10^{23}$, is the number of molecules in 22.4 liters of ideal gas at standard temperature and pressure. Evidently, when it comes to systems with such an enormous number of particles, there is no hope of keeping track of the physical
state (e.g., position and momentum) of each and every individual particle by means
of the classical methods in physics, that is, by solving a gigantic system of dif-
ferential equations pertaining to Newton's laws for all particles. Moreover, even if
those differential equations could have been solved somehow (at least approxi-
mately), the information that they would have given us would be virtually useless.
What we normally really want to know about our physical system boils down to a
fairly short list of macroscopic quantities, such as energy, heat, pressure, temper-
ature, volume, magnetization, and the like. In other words, while we continue to use
the well-known laws of physics, even the classical ones, we no longer use them in
the ordinary manner that we have known from elementary physics courses. Instead,
we think of the state of the system, at any given moment, as a realization of a
certain probabilistic ensemble. This is to say that we approach the problem from a
probabilistic (or a statistical) point of view. The beauty of statistical physics is that
it derives the macroscopic theory of thermodynamics (i.e., the relationships
between thermodynamic potentials, temperature, pressure, etc.) as ensemble aver-
ages that stem from this probabilistic microscopic theory, in the limit of an infinite
number of particles, that is, the thermodynamic limit.
The purpose of this book is to teach statistical mechanics and thermodynamics,
with some degree of orientation toward students in electrical engineering. The main
body of the lectures is devoted to statistical mechanics, whereas much less emphasis
is given to the thermodynamics part. In particular, the idea is to let the laws of thermodynamics be obtained as conclusions from the derivations in statistical
mechanics.
Beyond the variety of central topics in statistical physics that are important to the
general scientific education of the electrical engineering student, special emphasis is
devoted to subjects that are vital specifically to the engineering education. These
include, first of all, quantum statistics, like the Fermi–Dirac distribution, as well as
diffusion processes, which are both fundamental for understanding semiconductor
devices. Another important issue for the electrical engineering student is to
understand mechanisms of noise generation and stochastic dynamics in physical
systems, most notably, in electric circuitry. Accordingly, the fluctuation–dissipation
theorem of statistical mechanics, which is the theoretical basis for understanding
thermal noise processes in physical systems, is presented from the standpoint of a system with an input and an output, in a way that would be understandable and useful for an engineer, and well connected to other courses in the undergraduate curriculum of electrical engineering, such as courses on random processes. This engineering perspective is not available in standard physics textbooks. The quantum regime, in this
context, is important and hence provided as well. Finally, we touch upon some
relationships between statistical mechanics and information theory, and demon-
strate how the statistical–mechanical approach can be useful for the study of
information theoretic problems. These relationships are further explored, and in a
much deeper manner, in [1].
Most of the topics in this book are covered on the basis of several other
well-known books on statistical mechanics. However, several perspectives and
mathematical derivations are original and new (to the best of the author’s knowl-
edge). The book includes a fair number of examples, exercises, and figures, which will
hopefully help the student to grasp the material better.
It is assumed that the reader has prior background in the following subjects:
(i) elementary calculus and linear algebra, (ii) basics of quantum mechanics, and
(iii) fundamentals of probability theory. Chapter 7 also assumes basic background
in signals-and-systems theory, as well as the theory of random processes, including
the response of linear systems to random input signals.
Reference
1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory 6(1–2), 1–212 (2009)
Chapter 1
Kinetic Theory and the Maxwell Distribution
The concept that a gas consists of many small mobile mass particles is very old; it dates back to the Greek philosophers. It has been periodically rejected and revived
throughout many generations of the history of science. Around the middle of the
19th century, against the general trend of rejecting the atomistic approach, Clausius,1
Maxwell2 and Boltzmann3 succeeded in developing a kinetic theory for the motion
of gas molecules, which was mathematically solid, on the one hand, and agreed
satisfactorily with the experimental evidence (at least in simple cases), on the other
hand.
In this chapter, we present some elements of Maxwell's formalism and derivation that build the kinetic theory of the ideal gas, and derive some rather useful results from first principles. While the main results that we shall see in this section can be viewed as special cases of the more general concepts and principles that will be provided later on, the purpose here is to give a quick taste of the subject and to demonstrate how the statistical approach to physics, which is based on very few reasonable assumptions, gives rise to rather far-reaching results and conclusions.
The choice of the ideal gas, as a system of many mobile particles, is a good test-
bed to begin with, as on the one hand, it is simple, and on the other hand, it is not
irrelevant to electrical engineering and electronics in particular. For example, the free
electrons in a metal can often be considered a “gas” (albeit not an ideal gas), as we
shall see later on.
1 Rudolf Julius Emanuel Clausius (1822–1888) was a German physicist and mathematician who is
considered to be one of the central pioneers of thermodynamics.
2 James Clerk Maxwell (1831–1879) was a Scottish physicist and mathematician, whose other contributions, beyond his celebrated work on electromagnetism, were in statistical mechanics and thermodynamics. He was one of the advocates of the atomic theory when it was still very controversial.
The rationale behind the assumption of identical distributions is, as in item 1 above, the isotropic nature of the pdf. The rationale behind the independence assumption is that in each collision between two particles, the total momentum is conserved, in each component (x, y, and z) separately, so there are actually no interactions among the component momenta. Each three-dimensional particle actually behaves like three independent one-dimensional particles, as far as the momentum is concerned.
We now argue that there is only one kind of (differentiable) joint pdf $f(v_x, v_y, v_z)$ that complies with both assumptions at the same time, and this is the Gaussian density where all three components of $\vec{v}$ are independent, zero-mean and with the same variance.

To see why this is true, consider the equation
$$f(v_x)\,f(v_y)\,f(v_z) = g\!\left(v_x^2 + v_y^2 + v_z^2\right),$$
which combines both requirements. Let us assume that both $f$ and $g$ are differentiable. Taking now partial derivatives w.r.t. $v_x$, $v_y$ and $v_z$, we obtain
$$f'(v_x)\,f(v_y)\,f(v_z) = 2v_x\, g'\!\left(v_x^2+v_y^2+v_z^2\right),$$
and similarly for $v_y$ and $v_z$, implying that
$$\frac{f'(v_x)}{v_x f(v_x)} = \frac{f'(v_y)}{v_y f(v_y)} = \frac{f'(v_z)}{v_z f(v_z)},$$
and in particular,
$$\frac{f'(v_x)}{v_x f(v_x)} = \frac{f'(v_y)}{v_y f(v_y)}. \tag{1.1.8}$$
Since the l.h.s. depends only on $v_x$ and the r.h.s. depends only on $v_y$, the last identity can hold only if $f'(v_x)/[v_x f(v_x)] = \mathrm{const}$. Let us denote this constant by $-2\alpha$. Then, we have a simple differential equation,
$$\frac{f'(v_x)}{f(v_x)} = -2\alpha v_x, \tag{1.1.9}$$
whose solution is
$$f(v_x) = B\, e^{-\alpha v_x^2}, \tag{1.1.10}$$
and similar relations hold also for $v_y$ and $v_z$. For $f$ to be a valid pdf, $\alpha$ must be positive and $B$ must be the appropriate constant of normalization, which gives
$$f(v_x) = \sqrt{\frac{\alpha}{\pi}}\; e^{-\alpha v_x^2}. \tag{1.1.11}$$
Consequently, the average impulse of a single molecule, exerted within time $\tau$, is the integral
$$\frac{2m\tau}{L_x}\sqrt{\frac{\alpha}{\pi}}\int_0^\infty v_x^2\, e^{-\alpha v_x^2}\,\mathrm{d}v_x = \frac{m\tau}{2\alpha L_x}.$$
It follows that the average4 force exerted on the $YZ$ wall is obtained by dividing the last expression by $\tau$, namely, it is $m/(2\alpha L_x)$, and then the average pressure contributed by a single molecule is $m/(2\alpha L_x L_y L_z)$. Therefore, the total pressure contributed by all $N$ molecules is
$$P = \frac{mN}{2\alpha L_x L_y L_z} = \frac{mN}{2\alpha V} = \frac{\rho m}{2\alpha}, \tag{1.1.13}$$
where $\rho = N/V$ denotes the density, and so, we can determine $\alpha$ in terms of the physical quantities $P$, $\rho$, and $m$:
$$\alpha = \frac{\rho m}{2P}. \tag{1.1.14}$$
On the other hand, the equation of state of the ideal gas asserts that
$$P = \rho kT, \tag{1.1.15}$$
where $k$ is Boltzmann's constant. Comparing this with (1.1.14), we identify $\alpha = m/(2kT)$, and so the joint pdf of the velocity becomes
$$f(v_x, v_y, v_z) = \left(\frac{m}{2\pi kT}\right)^{3/2} e^{-m(v_x^2+v_y^2+v_z^2)/(2kT)} = \left(\frac{m}{2\pi kT}\right)^{3/2} e^{-\epsilon/(kT)},$$
where $\epsilon$ is the (kinetic) energy of the molecule. This form of a pdf, that is proportional to $e^{-\epsilon/(kT)}$, where $\epsilon$ is the energy, is not a coincidence. We shall see it again and again later on, and in much greater generality, as a fact that stems from much deeper and more fundamental principles. It is called the Boltzmann–Gibbs distribution.6
Having derived the pdf of $\vec{v}$, we can now calculate a few moments. Throughout this book, we will denote the expectation operator by $\langle\cdot\rangle$, which is the customary notation used by physicists. Since
$$\langle v_x^2\rangle = \langle v_y^2\rangle = \langle v_z^2\rangle = \frac{kT}{m}, \tag{1.1.18}$$
we readily have
$$\langle v^2\rangle = \langle v_x^2\rangle + \langle v_y^2\rangle + \langle v_z^2\rangle = \frac{3kT}{m}, \tag{1.1.19}$$
Other related statistical quantities that can be derived from $f(\vec{v})$ are the average speed $\bar{s}$ and the most likely speed. Like $s_{\mathrm{RMS}}$, they are also proportional to $\sqrt{kT/m}$, but with different constants of proportionality (see Exercise 1.1 below). The average kinetic energy per molecule is
$$\bar\epsilon = \frac{1}{2}m\langle v^2\rangle = \frac{3kT}{2}, \tag{1.1.21}$$
independent of m. This relation gives the basic significance to the notion of temper-
ature: at least in the case of the ideal gas, temperature is simply a quantity that is
directly proportional to the average kinetic energy of each particle. In other words,
temperature and kinetic energy are almost synonyms in this case. In the sequel, we
will see a more general definition of temperature. The factor of 3 in the numerator is
due to the fact that space has three dimensions, and so, each molecule has 3 degrees of
freedom. Every degree of freedom contributes an amount of energy given by kT /2.
This will turn out later to be a special case of a more general principle called the
equipartition of energy.
The pdf of the speed, $s = \|\vec{v}\|$, can be derived from the pdf of the velocity $\vec{v}$ using the obvious consideration that all vectors $\vec{v}$ of the same norm correspond to the same speed. Thus, the pdf of $s$ is simply the pdf of $\vec{v}$ (which depends solely on $\|\vec{v}\| = s$) multiplied by the surface area of a three-dimensional sphere of radius $s$, which is $4\pi s^2$, i.e.,
$$f(s) = 4\pi s^2\left(\frac{m}{2\pi kT}\right)^{3/2} e^{-ms^2/(2kT)} = \sqrt{\frac{2}{\pi}}\left(\frac{m}{kT}\right)^{3/2} s^2\, e^{-ms^2/(2kT)}. \tag{1.1.22}$$
This is called the Maxwell distribution and it is depicted in Fig. 1.1 for various values of the parameter $kT/m$. To obtain the pdf of the energy $\epsilon$, we change variables according to $s = \sqrt{2\epsilon/m}$ and $\mathrm{d}s = \mathrm{d}\epsilon/\sqrt{2m\epsilon}$. The result is
$$f(\epsilon) = \frac{2\sqrt{\epsilon}}{\sqrt{\pi}\,(kT)^{3/2}}\; e^{-\epsilon/(kT)}. \tag{1.1.23}$$
Exercise 1.1 Use the above to calculate: (i) the average speed $\bar{s}$, (ii) the most likely speed, $\arg\max_s f(s)$, and (iii) the most likely energy, $\arg\max_\epsilon f(\epsilon)$.
An interesting relation, that will be referred to later on, links the average energy per particle $\bar\epsilon = \langle\epsilon\rangle$, the density $\rho$, and the pressure $P$, or equivalently, the total energy $E = N\bar\epsilon$, the volume $V$ and $P$:
$$P = \rho kT = \frac{2\rho}{3}\cdot\frac{3kT}{2} = \frac{2\rho}{3}\cdot\bar\epsilon, \tag{1.1.24}$$
which after multiplying by $V$ becomes
$$PV = \frac{2E}{3}. \tag{1.1.25}$$
It is interesting to note that this relation can be obtained directly from the analysis of
the impulse exerted by the particles on the walls, similarly to the earlier derivation of the parameter $\alpha$, and without recourse to the equation of state (see, for example, [1, Sect. 20-4, pp. 353–355]). This is because the parameter $\alpha$ of the Gaussian pdf of each component of $\vec{v}$ has the obvious meaning of $1/(2\sigma_v^2)$, where $\sigma_v^2$ is the common variance of each component of $\vec{v}$. Thus, $\sigma_v^2 = 1/(2\alpha)$ and so, $\langle v^2\rangle = 3\sigma_v^2 = 3/(2\alpha)$, which in turn implies that
$$\bar\epsilon = \frac{m}{2}\langle v^2\rangle = \frac{3m}{4\alpha} = \frac{3m}{4\rho m/(2P)} = \frac{3P}{2\rho}, \tag{1.1.26}$$
in agreement with (1.1.24).
1.2 Collisions
We now take a closer look into the issue of collisions. We first define the concept of the collision cross-section, which we denote by $\sigma$. Referring to Fig. 1.2, consider a situation where two hard spheres, labeled A and B, with diameters $2a$ and $2b$, respectively, are approaching each other, and let $c$ be the projection of the distance between their centers in the direction perpendicular to the direction of their relative motion, $\vec{v}_1 - \vec{v}_2$. Clearly, collision will occur if and only if $c < a + b$. In other words, the two spheres will collide only if the center of B lies inside a volume whose cross-sectional area is $\sigma = \pi(a+b)^2$, or, for identical spheres, $\sigma = 4\pi a^2$. Let the colliding particles have relative velocity $\Delta\vec{v} = \vec{v}_1 - \vec{v}_2$. Passing to the coordinate system of the center of mass of the two particles, this is equivalent to the motion of one particle with the reduced mass $\mu = m_1 m_2/(m_1 + m_2)$, and so, in the case of identical particles, $\mu = m/2$. The average relative speed is easily calculated from the Maxwell distribution, but with $m$ replaced by $\mu = m/2$, i.e.,
$$\langle\Delta v\rangle = 4\pi\left(\frac{m}{4\pi kT}\right)^{3/2}\int_0^\infty (\Delta v)^3\, e^{-m(\Delta v)^2/(4kT)}\,\mathrm{d}(\Delta v) = 4\sqrt{\frac{kT}{\pi m}} = \sqrt{2}\,\bar{s}. \tag{1.2.1}$$
The total number of particles per unit volume that collide with a particular particle within time $\tau$ is
$$N_{\mathrm{col}}(\tau) = \rho\sigma\langle\Delta v\rangle\tau = 4\rho\sigma\tau\sqrt{\frac{kT}{\pi m}}. \tag{1.2.2}$$
The mean distance between collisions (a.k.a. the mean free path) is therefore
$$\lambda = \frac{\bar{s}}{\nu} = \frac{1}{\sqrt{2}\,\rho\sigma} = \frac{kT}{\sqrt{2}\,P\sigma}, \tag{1.2.4}$$
where $\nu = N_{\mathrm{col}}(\tau)/\tau = \rho\sigma\langle\Delta v\rangle$ denotes the collision rate.
What is the probability distribution of the random distance L between two con-
secutive collisions of a given particle? In particular, what is p(l) = Pr{L ≥ l}? Let
us assume that the collision process is memoryless in the sense that the event of not
colliding before distance l1 + l2 is the intersection of two independent events, the
first one being the event of not colliding before distance l1 , and the second one being
the event of not colliding before the additional distance $l_2$. That is,
$$p(l_1 + l_2) = p(l_1)\, p(l_2).$$
We argue that under this assumption, $p(l)$ must be exponential in $l$. This follows from the following consideration.7 Taking partial derivatives of both sides w.r.t. both $l_1$ and $l_2$, we get
$$p'(l_1 + l_2) = p'(l_1)\,p(l_2) = p(l_1)\,p'(l_2).$$
Thus,
$$\frac{p'(l_1)}{p(l_1)} = \frac{p'(l_2)}{p(l_2)} \tag{1.2.7}$$
for all non-negative $l_1$ and $l_2$. Thus, $p'(l)/p(l)$ must be a constant, which we shall denote by $-a$. This trivial differential equation has only one solution that obeys the obvious initial condition $p(0) = 1$:
$$p(l) = e^{-al},$$
so it only remains to determine the parameter $a$, which must be positive since the function $p(l)$ must be monotonically non-increasing by definition. This can easily be found by using the fact that $\langle L\rangle = 1/a = \lambda$, and so,
$$p(l) = e^{-l/\lambda} = \exp\left\{-\frac{\sqrt{2}\,P\sigma l}{kT}\right\}. \tag{1.2.9}$$
1.3 Dynamical Aspects

The discussion thus far focused on the static (equilibrium) behavior of the ideal gas.
In this subsection, we will briefly touch upon dynamical issues pertaining to non–
equilibrium situations. These issues will be further developed in Chap. 7, and with
much greater generality.
Consider two adjacent containers separated by a wall. Both have the same volume
V of the same ideal gas at the same temperature T , but with different densities ρ1
and ρ2 , and hence different pressures P1 and P2 . Let us assume that P1 > P2 . At time
t = 0, a small hole is created in the separating wall. The area of this hole is A (see
Fig. 1.3).
If the mean free distances λ1 and λ2 are relatively large compared to the dimensions
of the hole, it is safe to assume that every molecule that reaches the hole, passes
7 The idea is similar to that of the earlier derivation of the Gaussian pdf of the ideal gas.
through it. The mean number of molecules that pass from left to right within time $\tau$ is given by8
$$N_\rightarrow = \rho_1 V\cdot\int_0^\infty \mathrm{d}v_x\,\sqrt{\frac{\alpha}{\pi}}\, e^{-\alpha v_x^2}\cdot\frac{v_x\tau A}{V} = \frac{\rho_1\tau A}{2\sqrt{\pi\alpha}}, \tag{1.3.1}$$
and so the number of particles per second flowing from left to right is
$$\frac{\mathrm{d}N_\rightarrow}{\mathrm{d}t} = \frac{\rho_1 A}{2\sqrt{\pi\alpha}}. \tag{1.3.2}$$
Similarly, the number of particles per second flowing from right to left is
$$\frac{\mathrm{d}N_\leftarrow}{\mathrm{d}t} = \frac{\rho_2 A}{2\sqrt{\pi\alpha}}, \tag{1.3.3}$$
so the net left-to-right current is
$$I = \frac{\mathrm{d}N_\rightarrow}{\mathrm{d}t} - \frac{\mathrm{d}N_\leftarrow}{\mathrm{d}t} = \frac{(\rho_1 - \rho_2)A}{2\sqrt{\pi\alpha}} = (\rho_1 - \rho_2)\,A\sqrt{\frac{kT}{2\pi m}}. \tag{1.3.4}$$
An important point here is that the current is proportional to the difference between the densities, $\rho_1 - \rho_2$, and considering the equation of state of the ideal gas, it is therefore also proportional to the pressure difference, $P_1 - P_2$. This brings to mind the well-known analogous fact that the electric current is proportional to the voltage, which in turn is the difference between the electric potentials at two points. Considering the fact that $\rho = (\rho_1 + \rho_2)/2$ is constant, we obtain a simple differential equation,
$$\frac{\mathrm{d}\rho_1}{\mathrm{d}t} = \frac{A}{V}\sqrt{\frac{kT}{2\pi m}}\,(\rho_2 - \rho_1) = C(\rho_2 - \rho_1) = 2C(\rho - \rho_1), \tag{1.3.5}$$
8 Note that for $v_y = v_z = 0$, the factor $v_x\tau A/V$, in the forthcoming equation, is clearly the relative volume (and hence the probability) of being in the 'box' in which a particle must be found in order to pass the hole within $\tau$ seconds. When $v_y$ and $v_z$ are non-zero, instead of a rectangular box, this region becomes a parallelepiped, but the relative volume remains $v_x\tau A/V$, independently of $v_y$ and $v_z$.
whose solution is
$$\rho_1(t) = \rho + [\rho_1(0) - \rho]\, e^{-2Ct}, \tag{1.3.6}$$
which means that equilibrium is approached exponentially fast with time constant
$$\tau = \frac{1}{2C} = \frac{V}{2A}\sqrt{\frac{2\pi m}{kT}}. \tag{1.3.7}$$
Imagine now a situation, where there is a long pipe aligned along the x–direction.
The pipe is divided into a chain of cells in a linear fashion, and in the wall between
each two consecutive cells there is a hole of area A. The length of each cell (i.e., the
distance between consecutive walls) is the mean free distance λ, so that collisions
within each cell can be neglected. Assume further that λ is so small that the density
of each cell at time t can be approximated using a continuous function ρ(x, t). Let
x0 be the location of one of the walls. Then, according to the above derivation, the
current at x = x0 is
$$I(x_0) = \left[\rho\left(x_0 - \frac{\lambda}{2}, t\right) - \rho\left(x_0 + \frac{\lambda}{2}, t\right)\right] A\sqrt{\frac{kT}{2\pi m}} \approx -A\lambda\sqrt{\frac{kT}{2\pi m}}\cdot\left.\frac{\partial\rho(x,t)}{\partial x}\right|_{x=x_0}. \tag{1.3.8}$$
Thus, the current is proportional to the negative gradient of the density. This is quite
a fundamental result which holds with much greater generality. In the more general
context, it is known as Fick’s law.
Consider next two close points $x_0$ and $x_0 + \Delta x$, with possibly different current densities (i.e., currents per unit area) $J(x_0)$ and $J(x_0 + \Delta x)$. The difference $J(x_0) - J(x_0 + \Delta x)$ is the rate at which matter accumulates along the interval $[x_0, x_0 + \Delta x]$, per unit area in the perpendicular plane. Within $\Delta t$ seconds, the number of particles per unit area within this interval has grown by $[J(x_0) - J(x_0 + \Delta x)]\Delta t$. But this amount is also $[\rho(x_0, t + \Delta t) - \rho(x_0, t)]\Delta x$. Taking the appropriate limits, we get
$$\frac{\partial J(x)}{\partial x} = -\frac{\partial\rho(x,t)}{\partial t}, \tag{1.3.9}$$
which is a one–dimensional version of the so called equation of continuity. Differ-
entiating now Eq. (1.3.8) w.r.t. x and comparing with (1.3.9), we obtain the diffusion
equation (in one dimension):
$$\frac{\partial\rho(x,t)}{\partial t} = D\,\frac{\partial^2\rho(x,t)}{\partial x^2}, \tag{1.3.10}$$
where $D$ is the diffusion coefficient.
The following books are recommended for further reading and for related material.
Beck [2, Sect. 2.2] derives the pdf of the particle momenta in a manner somewhat
different than here. Other parts of this section are quite similar to those of [2, Chap. 2].
A much more detailed exposition of the kinetic theory of gases appears also in many
other textbooks, including: Huang [3, Chap. 4], Kardar [4, Chap. 3], Kittel [5, Part I,
Chap. 13], Mandl [6, Chap. 7], Reif [7, Chap. 9], and Tolman [8, Chap. IV], to name
a few.
References
1. F.W. Sears, M.W. Zemansky, H.D. Young, University Physics (Addison-Wesley, Reading, 1976)
2. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, London, 1976)
3. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
4. M. Kardar, Statistical Physics of Particles (Cambridge University Press, Cambridge, 2007)
5. C. Kittel, Elementary Statistical Physics (Wiley, New York, 1958)
6. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)
7. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
8. R.C. Tolman, The Principles of Statistical Mechanics (Dover Publications, New York, 1979)
Chapter 2
Elementary Statistical Physics
In this chapter, we provide the formalism and the elementary background in statistical
physics. We first define the basic postulates of statistical mechanics, and then define
various ensembles. Finally, we shall derive some of the thermodynamic potentials
and their properties, as well as the relationships among them. The important laws of
thermodynamics will also be pointed out. The content of this chapter has considerable overlap with Chap. 2 of [1] (though there are also considerable differences). It is provided in this book too, mostly for the sake of completeness.
Before we proceed, let us slightly broaden the scope of our discussion. In a more
general context, associated with our N –particle physical system, is a certain instan-
taneous microstate, generically denoted by x = (x1 , x2 , . . . , x N ), where each xi ,
1 ≤ i ≤ N , may itself be a vector of several physical quantities associated with par-
ticle number i, e.g., its position, momentum, angular momentum, magnetic moment,
spin, and so on, depending on the type and the nature of the physical system. Accord-
ing to the physical model of the given system, there is a certain energy function, a.k.a.
Hamiltonian, that assigns to every x a certain energy E(x).2 Now, let us denote by
A(E) the volume of the shell of energy about E. This means
1 This is a result of the energy conservation law along with the fact that probability mass behaves
like an incompressible fluid in the sense that whatever mass that flows into a certain region from
some direction must be equal to the outgoing flow from some other direction. This is reflected in
the equation of continuity, which was demonstrated earlier.
2 For example, in the case of an ideal gas, $E(\mathbf{x}) = \sum_{i=1}^N \|\vec{p}_i\|^2/(2m)$, where $m$ is the mass of each molecule, namely, it accounts for the contribution of the kinetic energies only. In more complicated situations, there might be additional contributions of potential energy, which depend on the positions.
$$A(E) = \mathrm{Vol}\{\mathbf{x}:\ E\le E(\mathbf{x})\le E+\Delta E\} = \int_{\{\mathbf{x}:\ E\le E(\mathbf{x})\le E+\Delta E\}}\mathrm{d}\mathbf{x}, \tag{2.2.1}$$
where $\Delta E$ is a very small (but fixed) energy increment, which is immaterial when $N$ is large. Then, our above postulate concerning the ensemble of an isolated system, which is called the microcanonical ensemble, is that the probability density $P(\mathbf{x})$ is given by
$$P(\mathbf{x}) = \begin{cases}\dfrac{1}{A(E)} & E\le E(\mathbf{x})\le E+\Delta E\\[2mm] 0 & \text{elsewhere.}\end{cases} \tag{2.2.2}$$
In the discrete case, things are simpler: here, A(E) is the number of microstates with
E(x) = E (exactly) and P(x) is the uniform probability mass function over this set
of states.
Returning to the general case, we next define the notion of the density of states,
ω(E), which is intimately related to A(E). Basically, in simple cases, ω(E) is defined
such that $\omega(E)\Delta E = A(E)$, where $\Delta E$ is very small, but there might be a few minor corrections, depending on the concrete system being addressed. More generally, we define the density of states such that $\omega(E)\Delta E = \Omega(E)$, where $\Omega(E)$ will be the relevant (possibly corrected) function. The first correction has to do with the fact that $A(E)$ is, in general, not dimensionless: in the above example of a gas, it has the physical units of $[\text{length}\times\text{momentum}]^{3N} = [\mathrm{J\cdot s}]^{3N}$, but we must eliminate these physical units because we will have to apply to it non-linear functions like the logarithmic function. To this end, we normalize the volume $A(E)$ by an elementary reference volume. In the gas example, this reference volume is taken to be $h^{3N}$, where $h$ is Planck's constant ($h\approx 6.62\times 10^{-34}\ \mathrm{J\cdot s}$). Informally, the intuition comes from the fact that $h$ is our best available "resolution" in the plane spanned by each component of $\vec{r}_i$ and the corresponding component of $\vec{p}_i$, owing to the uncertainty principle in quantum mechanics, which tells us that the product of the standard deviations $\Delta p_a\cdot\Delta r_a$ of each component $a$ ($a = x, y, z$) is lower bounded by $\hbar/2$, where $\hbar = h/(2\pi)$. More formally, this reference volume is obtained in a natural manner from quantum statistical mechanics: by changing the integration variable $\vec{p}$ to $\vec{k}$ using the relation $\vec{p} = \hbar\vec{k}$, where $\vec{k}$ is the wave vector. This is a well-known relation (one of the de Broglie relations) pertaining to particle–wave duality.
The second correction that is needed to pass from $A(E)$ to $\Omega(E)$ is applicable when the particles are indistinguishable3: in these cases, we do not consider permutations between particles in a given configuration as distinct microstates. Thus, we have to divide also by $N!$. Taking into account both corrections, we find that in the example of the ideal gas,
$$\Omega(E) = \frac{A(E)}{N!\,h^{3N}}. \tag{2.2.3}$$
3 In the example of the ideal gas, since the particles are mobile and since they have no colors and no identity certificates, there is no distinction between a state where particle no. 15 has position $\vec{r}$ and momentum $\vec{p}$ while particle no. 437 has position $\vec{r}'$ and momentum $\vec{p}'$, and a state where these two particles are swapped.
Once again, it should be understood that both of these corrections are optional and their applicability depends on the system in question: the first correction is applicable only if $A(E)$ has physical units, and the second correction is applicable only if the particles are indistinguishable. For example, if $\mathbf{x}$ is discrete, in which case the integral defining $A(E)$ is replaced by a sum (that counts $\mathbf{x}$'s with $E(\mathbf{x}) = E$), and the particles are distinguishable, then no corrections are needed at all, i.e., $\Omega(E) = A(E)$. The entropy of the system is now defined as
$$S(E) = k\ln\Omega(E),$$
where $k$ is Boltzmann's constant. We will see later what is the relationship between
S(E) and the classical thermodynamic entropy, due to Clausius (1850), as well as the
information–theoretic entropy, due to Shannon (1948). As will turn out, all three are
equivalent to one another. Here, a comment on the notation is in order: the entropy S
may depend on additional quantities, other than the energy E, like the volume V and
the number of particles N . When this dependence will be relevant and important, we
will use the more complete form of notation S(E, V, N ). If only the dependence on
E is relevant in a certain context, we use the simpler notation S(E).
To get some insight into the behavior of the entropy, it should be noted that
normally, $\Omega(E)$ (and hence also $\omega(E)$) behaves as an exponential function of $N$ (at least asymptotically), and so, $S(E)$ is roughly linear in $N$. For example, if $E(\mathbf{x}) = \sum_{i=1}^N\frac{\|\vec{p}_i\|^2}{2m}$, then $\Omega(E)$ is the volume of a thin shell about the surface of a $(3N)$-dimensional sphere with radius $\sqrt{2mE}$, divided by $N!h^{3N}$, which is proportional to $(2mE)^{3N/2}V^N/(N!h^{3N})$, where $V$ is the volume. The quantity $\omega(E)$ is then associated with the surface area of this $(3N)$-dimensional sphere. Specifically (ignoring the contribution of the factor $\Delta E$), we get
$$S(E,V,N) = k\ln\left[\left(\frac{4\pi mE}{3N}\right)^{3N/2}\cdot\frac{V^N}{N!\,h^{3N}}\right] + \frac{3}{2}Nk \approx Nk\ln\left[\left(\frac{4\pi mE}{3N}\right)^{3/2}\cdot\frac{V}{Nh^3}\right] + \frac{5}{2}Nk. \tag{2.2.6}$$
Quantities that are not extensive, i.e., that are independent of the system size, like temperature and pressure, are called intensive.
It is interesting to point out that from the function S(E, V, N ), one can obtain the
entire information about the relevant macroscopic physical quantities of the system,
e.g., temperature, pressure, and so on. Specifically, the temperature T of the system
is defined according to:
$$\frac{1}{T} = \left[\frac{\partial S(E,V,N)}{\partial E}\right]_{V,N}, \tag{2.2.7}$$
where $[\cdot]_{V,N}$ emphasizes that the derivative is taken while keeping $V$ and $N$ constant.
One may wonder, at this point, what is the justification for defining temperature this
way. We will get back to this point a bit later, but for now, we can easily see that this
is indeed true at least for the ideal gas, as by taking the derivative of (2.2.6) w.r.t. E,
we get
$$\frac{\partial S(E,V,N)}{\partial E} = \frac{3Nk}{2E} = \frac{1}{T}, \tag{2.2.8}$$
where the second equality has been shown already in Chap. 1.
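As a quick sanity check (a sketch, not part of the text), one can evaluate the approximate entropy (2.2.6) numerically and verify that its derivative w.r.t. $E$ indeed reproduces $1/T = 3Nk/(2E)$. The helium-like mass, particle number, and volume below are assumed illustrative values.

```python
import numpy as np

k = 1.380649e-23   # Boltzmann's constant [J/K]
h = 6.626e-34      # Planck's constant [J s]
m = 6.64e-27       # assumed particle mass [kg] (helium-like)
N = 6.022e23       # number of particles (one mole, assumed)
V = 1.0e-3         # volume [m^3] (one liter, assumed)

def S(E):
    # Approximate entropy of the ideal gas, Eq. (2.2.6), with the logarithm expanded
    # term by term to avoid floating-point overflow.
    return N * k * (1.5 * np.log(4 * np.pi * m * E / (3 * N))
                    + np.log(V / N) - 3 * np.log(h)) + 2.5 * N * k

T = 300.0
E = 1.5 * N * k * T                            # E = 3NkT/2 for the ideal gas
dE = 1e-6 * E
dS_dE = (S(E + dE) - S(E - dE)) / (2 * dE)     # numerical derivative

print("S(E, V, N) [J/K]  :", S(E))
print("dS/dE  (numerical):", dS_dE)
print("1/T    (expected) :", 1.0 / T)
```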
Intuitively, in most situations, we expect that S(E) would be an increasing function
of E for fixed V and N (although this is not strictly always the case), which means
T ≥ 0. But T is also expected to increase with E (or equivalently, E is increasing
with T , as otherwise, the heat capacity dE/dT < 0). Thus, 1/T should decrease with
E, which means that the increase of S in E slows down as E grows. In other words,
we expect S(E) to be a concave function of E. In the above example, indeed, S(E) is logarithmic in E and E = 3NkT/2, as we have seen.
How can we be convinced, in mathematical terms, that under certain regularity
conditions, S(E) is a concave function in E? The answer may be given by a simple
superadditivity argument: As both E and S are extensive quantities, let us define
$E = N\epsilon$ and, for a given density $\rho$,
$$s(\epsilon) = \lim_{N\to\infty}\frac{S(N\epsilon)}{N}, \tag{2.2.9}$$
i.e., the per-particle entropy as a function of the per-particle energy, where we assume that the limit exists. Consider the case where the Hamiltonian is additive, i.e.,
$$E(\mathbf{x}) = \sum_{i=1}^N E(x_i), \tag{2.2.10}$$
just like in the above example where $E(\mathbf{x}) = \sum_{i=1}^N\frac{\|\vec{p}_i\|^2}{2m}$. Then, the inequality
$$S_{N_1+N_2}(N_1\epsilon_1 + N_2\epsilon_2)\ \ge\ S_{N_1}(N_1\epsilon_1) + S_{N_2}(N_2\epsilon_2)$$
expresses the simple fact that if our system is partitioned into two parts,4 one with $N_1$ particles and the other with $N_2 = N - N_1$ particles, then every combination of individual microstates with energies $N_1\epsilon_1$ and $N_2\epsilon_2$ corresponds to a combined microstate with a total energy of $N_1\epsilon_1 + N_2\epsilon_2$ (but there are more ways to split this total energy between the two parts). Thus, dividing both sides by $N$ and taking the limit $N\to\infty$ with $N_1/N\to\lambda$ and $N_2/N\to 1-\lambda$, we get
$$s(\lambda\epsilon_1 + (1-\lambda)\epsilon_2)\ \ge\ \lambda s(\epsilon_1) + (1-\lambda)s(\epsilon_2),$$
which establishes the concavity of $s(\cdot)$, at least in the case of an additive Hamiltonian. This means that the entropy of mixing two systems of particles is greater than (or equal to) the total entropy before the mix. A similar proof can be generalized to the case where $E(\mathbf{x})$ includes also a limited degree of interactions (short-range interactions), e.g., $E(\mathbf{x}) = \sum_{i=1}^N E(x_i, x_{i+1})$, but this requires somewhat more caution. In general, however, concavity may no longer hold when there are long-range interactions, e.g., where some terms of $E(\mathbf{x})$ depend on a linear subset of particles.
Example 2.1 (Schottky defects) In a certain crystal, the atoms are located in a lattice, and at any positive temperature there may be defects, where some of the atoms are dislocated (see Fig. 2.1). Assuming that the defects are sparse enough, such that around each dislocated atom all neighbors are in place, the activation energy, $\epsilon_0$, required for dislocation is fixed. Denoting the total number of atoms by $N$ and the number of defected ones by $n$, the total energy is then $E = n\epsilon_0$, and so,
$$\Omega(E) = \binom{N}{n} = \frac{N!}{n!(N-n)!}, \tag{2.2.14}$$
4 This argument works for distinguishable particles. Later on, a more general argument will be presented, that holds for indistinguishable particles too.
or, equivalently,
$$S(E) = k\ln\Omega(E) = k\ln\frac{N!}{n!(N-n)!} \approx k[N\ln N - n\ln n - (N-n)\ln(N-n)], \tag{2.2.15}$$
where in the last passage we have used the Stirling approximation. It is important to point out that here, unlike in the example of the ideal gas, we have not divided $A(E)$ by $N!$. The reason is that we do distinguish between two different configurations where the same number of particles were dislocated but the sites of dislocation are different. Yet, we do not distinguish between two microstates whose only difference is two (identical) particles which are not dislocated but swapped. This is the reason for the denominator $n!(N-n)!$ in the expression of $\Omega(E)$. Now,5
$$\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{\mathrm{d}n}{\mathrm{d}E}\cdot\frac{\mathrm{d}S}{\mathrm{d}n} = \frac{1}{\epsilon_0}\cdot k\ln\frac{N-n}{n}, \tag{2.2.16}$$
which gives
$$n = \frac{N}{\exp(\epsilon_0/kT) + 1}. \tag{2.2.17}$$
At $T = 0$, there are no defects, but their number increases gradually with $T$, approximately according to $\exp(-\epsilon_0/kT)$. Note also that
$$S(E) = k\ln\binom{N}{n} \approx kN h_2\left(\frac{n}{N}\right) = kN h_2\left(\frac{E}{N\epsilon_0}\right) = kN h_2\left(\frac{\epsilon}{\epsilon_0}\right), \tag{2.2.18}$$
where
$$h_2(x) = -x\ln x - (1-x)\ln(1-x), \qquad 0\le x\le 1,$$
is the so-called binary entropy function. Note also that $s(\epsilon) = k h_2(\epsilon/\epsilon_0)$ is indeed concave in this example.
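A short numerical illustration of the Schottky-defect formulas (a sketch with an assumed activation energy of about 1 eV, not taken from the text): it evaluates the defect fraction (2.2.17) at a few temperatures and checks its consistency with (2.2.16).

```python
import numpy as np

k = 1.380649e-23      # Boltzmann's constant [J/K]
eps0 = 1.602e-19      # assumed activation energy of about 1 eV [J]

def defect_fraction(T):
    # Eq. (2.2.17): n/N = 1/(exp(eps0/(kT)) + 1)
    return 1.0 / (np.exp(eps0 / (k * T)) + 1.0)

for T in (300.0, 600.0, 1200.0):
    x = defect_fraction(T)
    # Consistency check with Eq. (2.2.16): (k/eps0) * ln((N-n)/n) should equal 1/T.
    inv_T = (k / eps0) * np.log((1.0 - x) / x)
    print(f"T = {T:6.0f} K   n/N = {x:.3e}   (k/eps0)ln((N-n)/n) = {inv_T:.3e}   1/T = {1/T:.3e}")
```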
5 Here and in the sequel, the reader might wonder about the meaning of taking derivatives of, and with respect to, integer-valued variables, like the number of dislocated particles, $n$. To this end, imagine an approximation where $n$ is interpolated to be a continuous-valued variable.
Suppose we have two systems that are initially at certain temperatures (and with
corresponding energies). At a certain time instant, the two systems are brought into
thermal contact with one another, but their combination remains isolated. What hap-
pens after a long time? How does the total energy $E$ split, and what is the final temperature $T$ of the combined system? The number of combined microstates where subsystem no. 1 has energy $E_1$ and subsystem no. 2 has energy $E_2 = E - E_1$ is $\Omega_1(E_1)\cdot\Omega_2(E - E_1)$. As the combined system is isolated, the probability of such a combined macrostate is proportional to $\Omega_1(E_1)\cdot\Omega_2(E - E_1)$. Keeping in mind that, normally, $\Omega_1$ and $\Omega_2$ are exponential in $N$, then for large $N$, this product is dominated by the value of $E_1$ for which it is maximum, or equivalently, for which the sum of logarithms, $S_1(E_1) + S_2(E - E_1)$, is maximum, i.e., it is a maximum entropy situation, which is the second law of thermodynamics, asserting that an isolated system (in this case, composed of two subsystems) achieves its maximum possible entropy in equilibrium. This maximum is normally achieved at the value of $E_1$ for which the derivative vanishes, i.e.,
$$\frac{\mathrm{d}}{\mathrm{d}E_1}\left[S_1(E_1) + S_2(E - E_1)\right] = 0,$$
or
$$S_1'(E_1) - S_2'(E - E_1) = 0,$$
which means
$$\frac{1}{T_1} \equiv S_1'(E_1) = S_2'(E_2) \equiv \frac{1}{T_2}. \tag{2.2.21}$$
Thus, in equilibrium, which is the maximum entropy situation, the energy splits in such a way that the temperatures are the same. Now, we can understand the concavity of the entropy more generally: $\lambda s(\epsilon_1) + (1-\lambda)s(\epsilon_2)$ was the total entropy per particle when the two subsystems (with the same entropy function) were isolated from one another, whereas $s(\lambda\epsilon_1 + (1-\lambda)\epsilon_2)$ is the equilibrium entropy per particle after we let them interact thermally.
At this point, we are ready to justify why $S'(E)$ is equal to $1/T$ in general, as was promised earlier. Although it is natural to expect that equality between $S_1'(E_1)$ and $S_2'(E_2)$, in thermal equilibrium, is related to equality between $T_1$ and $T_2$, this does not automatically mean that the derivative of each entropy is given by one over its temperature. On the face of it, for the purpose of this implication, this derivative could have been equal to any one-to-one function of temperature, $f(T)$. To see why $f(T) = 1/T$ indeed, imagine that we have a system with an entropy function $S_0(E)$ and that we let it interact thermally with an ideal gas whose entropy function, which we shall denote now by $S_{\mathrm{g}}(E)$, is given as in Eq. (2.2.6). Now, at equilibrium $S_0'(E_0) = S_{\mathrm{g}}'(E_{\mathrm{g}})$, but as we have seen already, $S_{\mathrm{g}}'(E_{\mathrm{g}}) = 1/T_{\mathrm{g}}$, where $T_{\mathrm{g}}$ is the temperature of the ideal gas. But in thermal equilibrium the temperatures equalize, i.e., $T_{\mathrm{g}} = T_0$, where $T_0$ is the temperature of the system of interest. It then follows eventually that $S_0'(E_0) = 1/T_0$, which now means that in equilibrium, the derivative of the entropy of the system of interest is equal to the reciprocal of its temperature in
general, and not only for the ideal gas! At this point, the fact that our system has
interacted and equilibrated with an ideal gas is not important anymore and it does
not limit the generality of this statement. In simple words, our system does not ‘care’
what kind of system it has interacted with, whether ideal gas or any other. This
follows from a fundamental principle in thermodynamics, called the zero–th law of
thermodynamics, which states that thermal equilibrium has a transitive property:
If system A is in equilibrium with system B and system B is in equilibrium with
system C, then A is in equilibrium with C.
So we have seen that ∂ S/∂ E = 1/T , or equivalently, δS = δ E/T . But in the
absence of any mechanical work (V is fixed) applied to the system and any chemical
energy injected into the system (N is fixed), any change in energy must be in the
form of heat,6 thus we denote δ E = δ Q, where Q is the heat intake. Consequently,
$$\delta S = \frac{\delta Q}{T}. \tag{2.2.22}$$
This is exactly the definition of the classical thermodynamic entropy due to Clausius.
Thus, at least for the case where no mechanical work is involved, we have demon-
strated the equivalence of the two notions of entropy: the statistical notion due to Boltzmann, $S = k\ln\Omega$, and the thermodynamic entropy due to Clausius, $S = \int\mathrm{d}Q/T$, where the integration should be understood to be taken along a slow (quasi-static) process, where after each small increase in the heat intake, the system is allowed to equilibrate, which means that $T$ is given enough time to adjust before more heat is further added. For a given $V$ and $N$, the difference $\Delta S$ between the entropies $S_A$ and $S_B$ associated with two temperatures $T_A$ and $T_B$ (pertaining to internal energies $E_A$ and $E_B$, respectively) is given by $\Delta S = \int_A^B\mathrm{d}Q/T$ along such a
quasi–static process. This is a rule that defines entropy differences, but not absolute
levels. A reference value is determined by the third law of thermodynamics, which
asserts that as T tends to zero, the entropy tends to zero as well.7
We have seen what is the meaning of the partial derivative of S(E, V, N ) w.r.t. E.
Is there also a simple meaning to the partial derivative w.r.t. V ? Again, let us begin
by examining the ideal gas. Differentiating the expression of S(E, V, N ) of the ideal
gas w.r.t. V , we obtain
$$\frac{\partial S(E,V,N)}{\partial V} = \frac{Nk}{V} = \frac{P}{T}, \tag{2.2.23}$$
6 Heat is a form of energy that is transferred neither by mechanical work nor by matter. It is the type
of energy that flows spontaneously from a system/body at a higher temperature to one with a lower
temperature (and this transfer is accompanied by an increase in the total entropy).
7 In this context, it should be understood that the results we derived for the ideal gas hold only for
high enough temperatures: since S was found proportional to ln E and E is proportional to T , then
S is proportional to ln T , but this cannot be true for small T as it contradicts (among other things)
the third law.
where the second equality follows again from the equation of state. So at least for the
ideal gas, this partial derivative is related to the pressure P. For similar considerations
as before, the relation
$$\frac{\partial S(E,V,N)}{\partial V} = \frac{P}{T} \tag{2.2.24}$$
is true not only for the ideal gas, but in general. Consider again an isolated system
that consists of two subsystems, separated by a wall (or a piston). Initially, this
wall is fixed and the volumes are V1 and V2 . At a certain moment, this wall is
released and allowed to be pushed in either direction. How would the total volume
V = V1 + V2 divide between the two subsystems in equilibrium? Again, the total
entropy S1 (E 1 , V1 ) + S2 (E − E 1 , V − V1 ) would tend to its maximum for the same
reasoning as before. The maximum will be reached when the partial derivatives of
this sum w.r.t. both E 1 and V1 would vanish. The partial derivative w.r.t. E 1 has
already been addressed. The partial derivative w.r.t. V1 gives
$$\frac{P_1}{T_1} = \frac{\partial S_1(E_1,V_1)}{\partial V_1} = \frac{\partial S_2(E_2,V_2)}{\partial V_2} = \frac{P_2}{T_2}, \tag{2.2.25}$$
and since the temperatures are equal in equilibrium, the pressures must be equal too, $P_1 = P_2$, namely, mechanical equilibrium. Next, the change in entropy can be written as
$$\delta S = \frac{\partial S}{\partial E}\delta E + \frac{\partial S}{\partial V}\delta V = \frac{\delta E}{T} + \frac{P\delta V}{T}, \tag{2.2.26}$$
or
$$\delta E = T\delta S - P\delta V,$$
which is the first law of thermodynamics, asserting that the change in the energy
δ E of a system with a fixed number of particles is equal to the difference between
the incremental heat intake δ Q and the incremental mechanical work δW carried out
by the system. This is nothing but a restatement of the law of energy conservation.
Example 2.2 (Compression of ideal gas) Consider again an ideal gas of N particles
at constant temperature T . The energy is E = 3N kT /2 regardless of the volume.
This means that if we compress (slowly) the gas from volume V1 to volume V2
(V2 < V1 ), the energy remains the same, in spite of the fact that we injected energy
by applying mechanical work
$$W = -\int_{V_1}^{V_2} P\,\mathrm{d}V = -NkT\int_{V_1}^{V_2}\frac{\mathrm{d}V}{V} = NkT\ln\frac{V_1}{V_2}. \tag{2.2.28}$$
What happened to that energy? The answer is that it was transformed into heat, as the entropy of the system (which is proportional to $\ln V$) has changed by the amount $\Delta S = -Nk\ln(V_1/V_2)$, and so, the heat intake $Q = T\Delta S = -NkT\ln(V_1/V_2)$ exactly balances the work.
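A small numerical check of Example 2.2 (with assumed, illustrative values of $N$, $T$, and the volumes): the heat released, $Q = T\Delta S$, exactly balances the mechanical work $W$ of Eq. (2.2.28), so the energy $E = 3NkT/2$ is indeed unchanged.

```python
import numpy as np

k = 1.380649e-23          # Boltzmann's constant [J/K]
N = 1.0e22                # number of particles (assumed)
T = 300.0                 # temperature [K]
V1, V2 = 2.0e-3, 1.0e-3   # assumed compression from two liters to one liter

W = N * k * T * np.log(V1 / V2)   # mechanical work injected, Eq. (2.2.28)
dS = -N * k * np.log(V1 / V2)     # entropy change of the gas
Q = T * dS                        # heat intake (negative: heat flows out)

print("work W        [J]:", W)
print("heat intake Q [J]:", Q)
print("W + Q         [J]:", W + Q, "  (zero: E = 3NkT/2 is unchanged)")
```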
Finally, consider the partial derivative of $S(E,V,N)$ w.r.t. $N$, which we write as
$$\frac{\partial S(E,V,N)}{\partial N} = -\frac{\mu}{T}, \tag{2.2.29}$$
where μ is called the chemical potential. If we now consider again the isolated system,
which consists of two subsystems that are allowed to exchange, not only heat and
volume, but also particles (of the same kind), whose total number is N = N1 + N2 ,
then again, maximum entropy considerations would yield an additional equality
between the chemical potentials, μ1 = μ2 (chemical equilibrium).8 The chemical
potential should be understood as a kind of a force that controls the ability to inject
particles into the system. For example, if the particles are electrically charged, then
the chemical potential has a simple analogy to the electrical potential. The first law
is now extended to have an additional term, pertaining to an increment of chemical
energy, and it now reads:
$$\delta E = T\delta S - P\delta V + \mu\delta N. \tag{2.2.30}$$
So far we have assumed that our system is isolated, and therefore has a strictly fixed
energy E. Let us now relax this assumption and assume instead that our system is
free to exchange energy with its very large environment (heat bath) and that the total
energy of the system plus heat bath, E 0 , is by far larger than the typical energy of the
system. The combined system, composed of our original system plus the heat bath,
is now an isolated system at temperature T .
8 Equality of chemical potentials is, in fact, the general principle of chemical equilibrium, and not
equality of concentrations or densities. In Sect. 1.3, we saw equality of densities, because in the
case of the ideal gas, the chemical potential is a function of the density, so equality of chemical
potentials happens to be equivalent to equality of densities in this case.
Since the combined system is isolated, the probability of finding our system at a microstate $\mathbf{x}$ is proportional to the number of microstates of the heat bath that correspond to it, namely,
$$P(\mathbf{x}) = \frac{\Omega_B(E_0 - E(\mathbf{x}))}{\sum_{\mathbf{x}'}\Omega_B(E_0 - E(\mathbf{x}'))}. \tag{2.2.31}$$
Let us focus on the numerator for now, and normalize the result at the end. Then,
$$\begin{aligned} P(\mathbf{x}) &\propto \Omega_B(E_0 - E(\mathbf{x})) = \exp\{S_B(E_0 - E(\mathbf{x}))/k\}\\ &\approx \exp\left\{\frac{S_B(E_0)}{k} - \frac{1}{k}\left.\frac{\partial S_B(E)}{\partial E}\right|_{E=E_0}\cdot E(\mathbf{x})\right\}\\ &= \exp\left\{\frac{S_B(E_0)}{k} - \frac{1}{kT}\cdot E(\mathbf{x})\right\} \propto \exp\{-E(\mathbf{x})/(kT)\}. \end{aligned} \tag{2.2.32}$$
It is customary to work with the inverse temperature
$$\beta = \frac{1}{kT}, \tag{2.2.33}$$
and so,
$$P(\mathbf{x}) \propto e^{-\beta E(\mathbf{x})},$$
as we have already seen in the example of the ideal gas (where $E(\mathbf{x})$ was the kinetic energy), but now it is much more general. Thus, all that remains to do is to normalize, and we then obtain the Boltzmann–Gibbs (B–G) distribution, or the canonical ensemble, which describes the underlying probability law in equilibrium:
$$P(\mathbf{x}) = \frac{\exp\{-\beta E(\mathbf{x})\}}{Z(\beta)}, \tag{2.2.35}$$
where
$$Z(\beta) = \sum_{\mathbf{x}} \exp\{-\beta E(\mathbf{x})\}$$
in the discrete case, or
$$Z(\beta) = \int \mathrm{d}\mathbf{x}\, \exp\{-\beta E(\mathbf{x})\}$$
in the continuous case. This function is called the partition function. As with the
function $\Omega(E)$, similar comments apply to the partition function: it must be dimensionless, so if the components of $\mathbf{x}$ do have physical units, we must normalize by a 'reference' volume, which in the case of the (ideal) gas is again $h^{3N}$. By the same token, for indistinguishable particles, it should also be divided by $N!$. While the microcanonical ensemble was defined in terms of the extensive variables $E$, $V$ and $N$, in the canonical ensemble, we replaced the variable $E$ by the intensive variable that controls it, namely, $\beta$ (or $T$). Thus, the full notation of the partition function should be $Z_N(\beta, V)$ or $Z_N(T, V)$.
For the ideal gas, carrying out the (Gaussian) integrals, one obtains
$$Z_N(\beta, V) = \frac{V^N}{N!\,\lambda^{3N}},$$
where $\lambda$ is the so-called thermal de Broglie wavelength,9
$$\lambda = \frac{h}{\sqrt{2\pi mkT}}. \tag{2.2.39}$$
The formula of the B–G distribution is one of the most fundamental results in sta-
tistical mechanics, obtained solely from the energy conservation law and the postulate
of the uniform distribution in an isolated system. As we shall see, the meaning of the
partition function is by far deeper than just being a normalization constant. Interest-
ingly, a great deal of the macroscopic physical quantities, like the internal energy,
the free energy, the entropy, the heat capacity, the pressure, etc., can be obtained
from the partition function. This is in analogy to the fact that in the microcanonical
ensemble, S(E) (or, more generally, S(E, V, N )) was pivotal to the derivation of all
macroscopic physical quantities of interest.
9 The origin of this name comes from the wave–particle de Broglie relation $\lambda = h/p$, together with the fact that the denominator, $\sqrt{2\pi mkT}$, can be viewed as a notion of a thermal momentum of the ideal gas, given the fact that the average molecular speed is proportional to $\sqrt{kT/m}$ (see Sect. 1.1).
$$P_{\mathrm{floor}} = \frac{mgN}{A\left(1 - e^{-mgh/kT}\right)}; \qquad P_{\mathrm{ceiling}} = \frac{mgN}{A\left(e^{mgh/kT} - 1\right)}. \tag{2.2.40}$$
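The text leading into Eq. (2.2.40) is not included above; assuming it refers to an ideal-gas column of height $h$ and cross-section $A$ in a uniform gravitational field, a quick consistency check is that $P_{\mathrm{floor}} - P_{\mathrm{ceiling}}$ equals the total weight per unit area, $Nmg/A$. The following sketch verifies this numerically in dimensionless form ($x = mgh/(kT)$, pressures measured in units of $Nmg/A$):

```python
import math

def pressures(x):
    # Eq. (2.2.40) in units of Nmg/A, with x = mgh/(kT).
    p_floor = 1.0 / (1.0 - math.exp(-x))
    p_ceiling = 1.0 / (math.exp(x) - 1.0)
    return p_floor, p_ceiling

for x in (0.1, 1.0, 5.0):
    pf, pc = pressures(x)
    print(f"mgh/kT = {x}:  (P_floor - P_ceiling) * A/(Nmg) = {pf - pc:.12f}")  # always 1
```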
Let us now examine more closely the partition function and make a few observations about its basic properties. For simplicity, we shall assume that $\mathbf{x}$ is discrete. First, let's look at the limits: obviously, $Z(0)$ is equal to the size of the entire set of microstates, which is also $\sum_E \Omega(E)$. This is the high-temperature limit, where all microstates are equiprobable. At the other extreme, we have
$$\lim_{\beta\to\infty}\frac{\ln Z(\beta)}{\beta} = -\min_{\mathbf{x}} E(\mathbf{x}) = -E_{\mathrm{GS}}, \tag{2.2.41}$$
which describes the situation where the system is frozen to the absolute zero. Only states with minimum energy, the ground-state energy, prevail.
Another important property of $Z(\beta)$, or more precisely, of $\ln Z(\beta)$, is that it is a cumulant generating function: by taking derivatives of $\ln Z(\beta)$, we can obtain cumulants of $E(\mathbf{x})$. For the first cumulant, we have
$$\langle E(\mathbf{X})\rangle = \frac{\sum_{\mathbf{x}} E(\mathbf{x})\, e^{-\beta E(\mathbf{x})}}{\sum_{\mathbf{x}} e^{-\beta E(\mathbf{x})}} = -\frac{\mathrm{d}\ln Z(\beta)}{\mathrm{d}\beta}. \tag{2.2.42}$$
Similarly, the second derivative gives the second cumulant, i.e., the variance:
$$\mathrm{Var}\{E(\mathbf{X})\} = \langle E^2(\mathbf{X})\rangle - \langle E(\mathbf{X})\rangle^2 = \frac{\mathrm{d}^2\ln Z(\beta)}{\mathrm{d}\beta^2}, \tag{2.2.44}$$
which means that ln Z (β) must always be a convex function. Note also that
$$\frac{\mathrm{d}^2\ln Z(\beta)}{\mathrm{d}\beta^2} = -\frac{\mathrm{d}\langle E(\mathbf{x})\rangle}{\mathrm{d}\beta} = -\frac{\mathrm{d}\langle E(\mathbf{x})\rangle}{\mathrm{d}T}\cdot\frac{\mathrm{d}T}{\mathrm{d}\beta} = kT^2 C(T), \tag{2.2.46}$$
where $C(T) = \mathrm{d}\langle E(\mathbf{x})\rangle/\mathrm{d}T$ is the heat capacity (at constant volume). Thus, the convexity of $\ln Z(\beta)$ is intimately related to the physical fact that the heat capacity of the system is positive.
Next, we look at the function $Z(\beta)$ slightly differently. Instead of summing the terms $\{e^{-\beta E(\mathbf{x})}\}$ over all states individually, we sum them by energy levels, in a collective manner. This amounts to
$$\begin{aligned} Z(\beta) &= \sum_{\mathbf{x}} e^{-\beta E(\mathbf{x})} = \sum_E \Omega(E)\, e^{-\beta E} \approx \sum_\epsilon e^{N s(\epsilon)/k}\cdot e^{-\beta N\epsilon} = \sum_\epsilon \exp\{-N\beta[\epsilon - T s(\epsilon)]\}\\ &\doteq \max_\epsilon \exp\{-N\beta[\epsilon - T s(\epsilon)]\} = \exp\{-N\beta\min_\epsilon[\epsilon - T s(\epsilon)]\}\\ &= \exp\{-N\beta[\epsilon^* - T s(\epsilon^*)]\} = e^{-\beta F}, \end{aligned} \tag{2.2.47}$$
where here and throughout the sequel, the notation $\doteq$ means asymptotic equivalence in the exponential scale. More precisely, $a_N \doteq b_N$ for two positive sequences $\{a_N\}$ and $\{b_N\}$ means that $\lim_{N\to\infty}\frac{1}{N}\ln\frac{a_N}{b_N} = 0$.
The quantity $f = \epsilon^* - T s(\epsilon^*)$ is the (per-particle) free energy. Similarly, the entire free energy, $F$, is defined as
$$F = E - TS = -\frac{\ln Z(\beta)}{\beta} = -kT\ln Z(\beta). \tag{2.2.48}$$
Once again, due to the exponentiality of (2.2.47) in $N$, with very high probability the system would be found in a microstate $\mathbf{x}$ whose normalized energy $\epsilon(\mathbf{x}) = E(\mathbf{x})/N$ is very close to $\epsilon^*$, the normalized energy that minimizes $\epsilon - T s(\epsilon)$ and hence achieves $f$. Note that the minimizing $\epsilon^*$ (obtained by equating the derivative of $\epsilon - T s(\epsilon)$ to zero) is the solution to the equation $s'(\epsilon^*) = 1/T$, which conforms with the definition of temperature. We see then that equilibrium in the canonical ensemble amounts to minimum free energy. This extends the second law of thermodynamics from isolated systems to non-isolated ones. While in an isolated system, the second law asserts the principle of maximum entropy, when it comes to a non-isolated system, this rule is replaced by the principle of minimum free energy.
The pressure can be obtained from the free energy as
$$P = -\frac{\partial F}{\partial V} = kT\cdot\frac{\partial\ln Z_N(\beta, V)}{\partial V}.$$
Examine this formula for the canonical ensemble of the ideal gas, and compare to the equation of state.
The physical meaning of the free energy, or more precisely, the difference between
two free energies F1 and F2 , is the minimum amount of work that it takes to transfer
the system from equilibrium state 1 to another equilibrium state 2 in an isothermal
(fixed temperature) process. This minimum is achieved when the process is quasi–
static, i.e., so slow that the system is always almost in equilibrium. Equivalently,
−F is the maximum amount of energy in the system, that is free and useful for
performing work (i.e., not dissipated as heat) in fixed temperature.
To demonstrate this point, let us consider the case where E(x) includes a term of
a potential energy that is given by the (scalar) product of a certain external force and
the conjugate physical variable at which this force is exerted (e.g., pressure times
volume, gravitational force times height, moment times angle, magnetic field times
magnetic moment, voltage times electric charge, etc.), i.e.,
$$E(\mathbf{x}) = E_0(\mathbf{x}) - \lambda\cdot L(\mathbf{x}),$$
where $\lambda$ is the force, $E_0(\mathbf{x})$ is the internal part of the Hamiltonian, and $L(\mathbf{x})$ is the conjugate physical variable, which depends on
(some coordinates of) the microstate. The partition function then depends on both β
and λ and hence will be denoted11 Z (β, λ). It is easy to see (similarly as before) that
ln Z (β, λ) is convex in λ for fixed β. Also,
$$\langle L(\mathbf{x})\rangle = kT\cdot\frac{\partial\ln Z(\beta,\lambda)}{\partial\lambda}. \tag{2.2.50}$$
11 Since the term λ · L(x) is not considered part of the internal energy (but rather an external energy
resource), formally, this ensemble is no longer the canonical ensemble, but a somewhat different
ensemble, called the Gibbs ensemble, which will be discussed later on.
12 At this point, there is a distinction between the Helmholtz free energy and the Gibbs free energy.
The free energy is then given by
$$F = E - TS = -kT\ln Z + \lambda\langle L(\mathbf{X})\rangle = kT\left[\lambda\cdot\frac{\partial\ln Z}{\partial\lambda} - \ln Z\right]. \tag{2.2.51}$$
Now, let $F_1$ and $F_2$ be the equilibrium free energies pertaining to two values of $\lambda$, denoted $\lambda_1$ and $\lambda_2$. Then,
$$\begin{aligned} F_2 - F_1 &= \int_{\lambda_1}^{\lambda_2}\mathrm{d}\lambda\cdot\frac{\partial F}{\partial\lambda} = kT\int_{\lambda_1}^{\lambda_2}\mathrm{d}\lambda\cdot\lambda\cdot\frac{\partial^2\ln Z}{\partial\lambda^2}\\ &= \int_{\lambda_1}^{\lambda_2}\mathrm{d}\lambda\cdot\lambda\cdot\frac{\partial\langle L(\mathbf{X})\rangle}{\partial\lambda} = \int_{\langle L(\mathbf{X})\rangle_{\lambda_1}}^{\langle L(\mathbf{X})\rangle_{\lambda_2}}\lambda\cdot\mathrm{d}\langle L(\mathbf{X})\rangle, \end{aligned} \tag{2.2.52}$$
namely, the free energy difference is the work $\int\lambda\,\mathrm{d}\langle L\rangle$ performed along the (quasi-static) process.
Consider next the normalized (per-particle) logarithm of the density of states,
$$\Sigma(\epsilon) = \lim_{N\to\infty}\frac{\ln\Omega(N\epsilon)}{N} = \frac{s(\epsilon)}{k}, \tag{2.2.54}$$
and the normalized logarithm of the partition function,
$$\phi(\beta) = \lim_{N\to\infty}\frac{\ln Z(\beta)}{N} = \lim_{N\to\infty}\frac{1}{N}\ln\sum_\epsilon e^{N[\Sigma(\epsilon)-\beta\epsilon]} = \max_\epsilon[\Sigma(\epsilon) - \beta\epsilon],$$
so that $\phi$ and $\Sigma$ are related via the Legendre–Fenchel transform; the corresponding backward transform is $\Sigma(\epsilon) = \min_\beta[\beta\epsilon + \phi(\beta)]$.

The achiever, $\epsilon^*(\beta)$, of $\phi(\beta)$ in the forward transform is obtained by equating the derivative to zero, i.e., it is the solution to the equation
$$\beta = \Sigma'(\epsilon), \tag{2.2.56}$$
where $\Sigma'(\epsilon)$ is the derivative of $\Sigma(\epsilon)$. In other words, $\epsilon^*(\beta)$ is the inverse function of $\Sigma'(\cdot)$. By the same token, the achiever, $\beta^*(\epsilon)$, of $\Sigma(\epsilon)$ in the backward transform is obtained by equating the other derivative to zero, i.e., it is the solution to the equation
$$\epsilon = -\phi'(\beta),$$
or, in other words, it is the inverse function of $-\phi'(\cdot)$. This establishes a relationship
between the typical per-particle energy $\epsilon$ and the inverse temperature $\beta$ that gives rise to it (cf. the Lagrange interpretation above, where we said that $\beta$ controls the average energy).
Example 2.3 (Two level system) Similarly to the earlier example of Schottky defects, which was previously given in the context of the microcanonical ensemble, consider now a system of $N$ independent particles, each having two possible states: state 0 of zero energy and state 1, whose energy is $\epsilon_0$, i.e., $E(x) = \epsilon_0 x$, $x\in\{0,1\}$. The $x_i$'s are independent, each having a marginal14:
$$P(x) = \frac{e^{-\beta\epsilon_0 x}}{1 + e^{-\beta\epsilon_0}}, \qquad x\in\{0,1\}. \tag{2.2.58}$$
14 Note that the expected number of excited particles is $N/(e^{\beta\epsilon_0} + 1)$, in agreement with the result of Example 2.1 (Eq. (2.2.17)). This demonstrates the ensemble equivalence principle.

In this case,
$$\phi(\beta) = \ln\left(1 + e^{-\beta\epsilon_0}\right),$$
and the backward transform is $\Sigma(\epsilon) = \min_\beta\left[\beta\epsilon + \ln\left(1 + e^{-\beta\epsilon_0}\right)\right]$, where the minimizing $\beta$ solves
$$\epsilon - \frac{\epsilon_0\, e^{-\beta\epsilon_0}}{1 + e^{-\beta\epsilon_0}} = 0, \tag{2.2.61}$$
which gives
$$\beta^*(\epsilon) = \frac{\ln(\epsilon_0/\epsilon - 1)}{\epsilon_0}. \tag{2.2.62}$$
Thus, the per-particle entropy (in units of $k$) is
$$\Sigma(\epsilon) = \frac{\epsilon}{\epsilon_0}\ln\left(\frac{\epsilon_0}{\epsilon} - 1\right) + \ln\left[1 + \exp\left\{-\ln\left(\frac{\epsilon_0}{\epsilon} - 1\right)\right\}\right] = h_2\left(\frac{\epsilon}{\epsilon_0}\right), \tag{2.2.63}$$
in agreement with (2.2.18). In the forward direction,
$$\phi(\beta) = \max_\epsilon\left[h_2\left(\frac{\epsilon}{\epsilon_0}\right) - \beta\epsilon\right], \tag{2.2.65}$$
where the maximizing $\epsilon$ solves
$$\frac{1}{\epsilon_0}\ln\frac{1 - \epsilon/\epsilon_0}{\epsilon/\epsilon_0} = \beta, \tag{2.2.66}$$
or, equivalently,
$$\epsilon^*(\beta) = \frac{\epsilon_0}{e^{\beta\epsilon_0} + 1}, \tag{2.2.67}$$
which is exactly the inverse function of $\beta^*(\epsilon)$ above, and which, when substituted back into the expression of $\phi(\beta)$, indeed gives
$$\phi(\beta) = \ln\left(1 + e^{-\beta\epsilon_0}\right).$$
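A quick numerical check of the example (a sketch, not from the text): for a few values of $\beta$, maximize $h_2(\epsilon/\epsilon_0) - \beta\epsilon$ on a grid of $\epsilon$ and compare both the maximum and the maximizer with the closed forms derived above (the grid resolution is an arbitrary choice).

```python
import numpy as np

eps0 = 1.0                                         # excited-state energy (arbitrary units)
eps = np.linspace(1e-6, eps0 - 1e-6, 200001)       # grid of per-particle energies

def h2(x):
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

for beta in (0.5, 1.0, 3.0):
    objective = h2(eps / eps0) - beta * eps        # the bracket of Eq. (2.2.65) on a grid
    i = int(np.argmax(objective))
    phi_grid = objective[i]
    phi_closed = np.log(1.0 + np.exp(-beta * eps0))        # closed form of phi above
    eps_star = eps0 / (np.exp(beta * eps0) + 1.0)          # Eq. (2.2.67)
    print(f"beta = {beta}: phi grid {phi_grid:.6f} vs closed {phi_closed:.6f};"
          f"  maximizer grid {eps[i]:.6f} vs closed {eps_star:.6f}")
```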
Comment A very similar model (and hence with similar results) pertains to non-interacting spins (magnetic moments), where the only difference is that $x\in\{-1,+1\}$ rather than $x\in\{0,1\}$. Here, the meaning of the parameter $\epsilon_0$ becomes that of a magnetic field, which is more customarily denoted by $B$ (or $H$), and which is either parallel or anti-parallel to that of the spin, and so the potential energy (in the appropriate physical units), $B\cdot x$, is either $Bx$ or $-Bx$. Thus,
$$P(x) = \frac{e^{\beta Bx}}{2\cosh(\beta B)}; \qquad Z(\beta) = 2\cosh(\beta B). \tag{2.2.69}$$
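The continuation of this passage is not included above; a natural corollary of (2.2.69), stated here as an added remark, is that the mean spin equals $\tanh(\beta B)$, by direct averaging over the two states. A minimal numerical check:

```python
import numpy as np

beta, B = 0.7, 1.3                                    # arbitrary illustrative values
x = np.array([-1.0, +1.0])                            # the two spin orientations
P = np.exp(beta * B * x) / (2.0 * np.cosh(beta * B))  # Eq. (2.2.69)

print("probabilities sum to one:", P.sum())
print("<x> by direct averaging :", np.sum(x * P))
print("tanh(beta * B)          :", np.tanh(beta * B))
```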
Consider now the per-particle Shannon (or Gibbs) entropy associated with the B–G distribution,
$$\bar{H} = -\lim_{N\to\infty}\frac{1}{N}\sum_{\mathbf{x}} P(\mathbf{x})\ln P(\mathbf{x}) = -\lim_{N\to\infty}\frac{1}{N}\langle\ln P(\mathbf{X})\rangle \tag{2.2.72}$$
$$= \lim_{N\to\infty}\left[\frac{\ln Z(\beta)}{N} + \frac{\beta\langle E(\mathbf{X})\rangle}{N}\right] = \phi(\beta) - \beta\cdot\phi'(\beta),$$
but this is exactly the same expression as in (2.2.71), and so, $\Sigma(\epsilon)$ and $\bar{H}$ are identical whenever $\beta$ and $\epsilon$ are related accordingly. The former, as we recall, was defined as the normalized logarithm of the number of microstates with per-particle energy $\epsilon$. Thus, we have learned that the number of such microstates is of the exponential order of $e^{N\bar{H}}$. Another look at this relation is the following:
$$1 \ \ge \sum_{\mathbf{x}:\ E(\mathbf{x})\approx N\epsilon} P(\mathbf{x}) = \sum_{\mathbf{x}:\ E(\mathbf{x})\approx N\epsilon}\frac{\exp\{-\beta\sum_i E(x_i)\}}{Z_N(\beta)} \doteq \sum_{\mathbf{x}:\ E(\mathbf{x})\approx N\epsilon}\exp\{-\beta N\epsilon - N\phi(\beta)\},$$
which implies that the number of microstates with $E(\mathbf{x})\approx N\epsilon$ cannot exceed $\exp\{N[\beta\epsilon + \phi(\beta)]\}$, for every $\beta$; minimizing over $\beta$ gives the bound $\exp\{N\Sigma(\epsilon)\} = e^{N\bar{H}}$.
A compatible lower bound is obtained by observing that the minimizing $\beta$ gives rise to $\langle E(X_1)\rangle = \epsilon$, which makes the event $\{\mathbf{x}:\ E(\mathbf{x})\approx N\epsilon\}$ a high-probability event, by the weak law of large numbers. A good reference for further study, and from a more general perspective, is the article by Hall [6]. See also [7].
Now, that we identified the Gibbs entropy with the Boltzmann entropy, it is instruc-
tive to point out that the B–G distribution could have been obtained also in a different
manner, owing to the maximum–entropy principle that stems from the second law,
or the minimum free–energy principle. Specifically, let us denote the Gibbs entropy
as
H(P) = −Σ_x P(x) ln P(x),   (2.2.76)

and consider the constrained optimization problem

max H(P)   s.t.   ⟨E(X)⟩ = E.   (2.2.77)

By formalizing the equivalent Lagrange problem, where β now plays the role of a Lagrange multiplier:

max { H(P) + β·[E − Σ_x P(x)E(x)] },   (2.2.78)

or equivalently,

min { Σ_x P(x)E(x) − H(P)/β },   (2.2.79)
one readily verifies that the solution to this problem is the B–G distribution where the
choice of the (Lagrange multiplier) β controls the average energy E. If β is identified
with the inverse temperature, the above is nothing but the minimization of the free
energy.
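As a small numerical illustration of the free–energy formulation in Eq. (2.2.79), the sketch below (the four energy levels, β, and the number of random trials are arbitrary choices) checks that the B–G distribution attains a value of ⟨E⟩ − H(P)/β that no randomly drawn distribution beats:

```python
import numpy as np

rng = np.random.default_rng(0)
E = np.array([0.0, 1.0, 2.5, 4.0])   # toy single-particle energy levels
beta = 1.3

def free_energy(P):
    # the objective of Eq. (2.2.79): <E> - H(P)/beta
    return np.dot(P, E) + np.sum(P * np.log(P)) / beta

P_bg = np.exp(-beta * E) / np.sum(np.exp(-beta * E))   # Boltzmann-Gibbs distribution
best_random = min(free_energy(rng.dirichlet(np.ones(len(E)))) for _ in range(20000))
print(free_energy(P_bg), best_random)   # the B-G value is never undercut
```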
Note also that Eq. (2.2.71), which we will rewrite, with a slight abuse of notation, as

φ(β) − β·φ'(β) = Σ(β),   (2.2.80)

can be viewed in two ways. The first suggests taking derivatives of both sides w.r.t. β, thereby obtaining Σ'(β) = −β·φ''(β), and so,
s(β) = k·Σ(β)
     = k·∫_β^∞ β̃·φ''(β̃) dβ̃        [third law]
     = k·∫_0^T [1/(kT̃)]·[kT̃²·c(T̃)]·dT̃/(kT̃²)        [c(T̃) = heat capacity per particle]
     = ∫_0^T c(T̃) dT̃/T̃,   (2.2.81)
recovering the Clausius entropy as c(T̃ )dT̃ is the increment of heat intake per particle
dq. The second way to look at Eq. (2.2.80) is as a first order differential equation in
φ(β), whose solution is easily found to be
φ(β) = −β·ε_GS + β·∫_β^∞ [Σ(β̂)/β̂²] dβ̂,   (2.2.82)

where ε_GS denotes the per–particle ground–state energy. Since F = E − ST, we have

E = E_GS + ST − ∫_0^T S(T̃) dT̃ = E_GS + ∫_0^S T(S̃) dS̃,   (2.2.85)
where the second term amounts to the heat Q that accumulates in the system, as
the temperature is raised from 0 to T . This is a special case of the first law of
thermodynamics. The more general form, as said, takes into account also possible
work performed on (or by) the system.
Let us now summarize the main properties of the partition function that we have
seen thus far:
1. Z(β) is a continuous function. Z(0) = |X^N| and lim_{β→∞} [ln Z(β)]/β = −E_GS.
2. Generating cumulants: ⟨E(X)⟩ = −d ln Z/dβ, Var{E(X)} = d²ln Z/dβ², which implies convexity of ln Z, and hence also of φ(β).
3. φ and Σ are a Legendre–Fenchel transform pair. Σ is concave.
We have also seen that Boltzmann’s entropy is not only equivalent to the Clausius
entropy, but also to the Gibbs/Shannon entropy. Thus, there are actually three different
forms of the expression of entropy.
Comment Consider Z(β) for an imaginary temperature β = iω, where i = √−1, and define z(E) as the inverse Fourier transform of Z(iω). It can readily be seen that z(E) is the density of states, i.e., for E_1 < E_2, the number of states with energy between E_1 and E_2 is given by ∫_{E_1}^{E_2} z(E) dE. Thus, Z(·) can be related to energy enumeration in two
different ways: one is by the Legendre–Fenchel transform of ln Z (β) for real β, and
the other is by the inverse Fourier transform of Z (β) for imaginary β. It should be
kept in mind, however, that while the latter relation holds for every system size N ,
the former is true only in the thermodynamic limit, as mentioned.
It turns out that in the case of a quadratic Hamiltonian, E(x) = (1/2)αx², which means that x is Gaussian, the average per–particle energy is always given by 1/(2β) = kT/2, independently of α. If we have N such quadratic terms, then of course, we
end up with N kT /2. In the case of the ideal gas, we have three such terms (one
for each dimension) per particle, thus a total of 3N terms, and so, E = 3N kT /2,
which is exactly the expression we obtained also from the microcanonical ensemble
as well as in the previous chapter. In fact, we observe that in the canonical ensemble,
whenever we have a Hamiltonian of the form (α/2)x_i² plus some arbitrary terms that do not depend on x_i, then x_i is Gaussian (with variance kT/α) and independent of the other variables, i.e., p(x_i) ∝ e^{−αx_i²/(2kT)}. Hence it contributes an amount of

⟨(1/2)αX_i²⟩ = (1/2)·α·(kT/α) = kT/2   (2.2.86)

to the average energy. A particle with a purely quadratic three–dimensional Hamiltonian is then like three independent one–dimensional particles, and so, it contributes 3kT/2. This principle is called the energy equipartition theorem.
Below is a direct derivation of the equipartition theorem:
⟨(1/2)αX²⟩ = ∫_{−∞}^{∞} dx·(αx²/2)·e^{−βαx²/2} / ∫_{−∞}^{∞} dx·e^{−βαx²/2}
           = −(∂/∂β) ln[ ∫_{−∞}^{∞} dx·e^{−βαx²/2} ]
           = −(∂/∂β) ln[ (1/√β)·∫_{−∞}^{∞} d(√β·x)·e^{−α(√β·x)²/2} ]
           = −(∂/∂β) ln[ (1/√β)·∫_{−∞}^{∞} du·e^{−αu²/2} ]
           = (1/2)·d ln β/dβ = 1/(2β) = kT/2.
Note that although we could have used closed–form expressions for both the numer-
ator and the denominator of the first line, we have deliberately taken a somewhat
different route in the second line, where we have presented it as the derivative of
the denominator of the first line. Also, rather than calculating the Gaussian integral
explicitly, we only figured out how it scales with β, because this is the only thing
that matters after taking the derivative relative to β. The reason for using this trick
of bypassing the need to calculate integrals, is that it can easily be extended in two
directions at least:
1. Let x ∈ IR^N and let E(x) = (1/2)x^T Ax, where A is an N × N positive definite matrix. This corresponds to a physical system with a quadratic Hamiltonian, which includes also interactions between pairs (e.g., harmonic oscillators or springs, which are coupled because they are tied to one another). It turns out that here, regardless of A, we get:

⟨E(X)⟩ = ⟨(1/2)X^T AX⟩ = N·kT/2.   (2.2.87)
2. Back to the case of a scalar x, but suppose now a more general power–law
Hamiltonian, E(x) = α|x|θ . In this case, we get
⟨E(X)⟩ = α·⟨|X|^θ⟩ = kT/θ.   (2.2.88)
Moreover, if lim_{x→±∞} x·e^{−βE(x)} = 0 for all β > 0, and we denote E'(x) = dE(x)/dx, then

⟨X·E'(X)⟩ = kT.   (2.2.89)
It is easy to see that the earlier power–law result is obtained as a special case of this, as E'(x) = αθ|x|^{θ−1}·sgn(x) in this case.
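These identities are easy to verify numerically. The following sketch (with arbitrarily chosen values of kT, α and θ) evaluates the one–dimensional averages by direct quadrature:

```python
import numpy as np
from scipy.integrate import quad

kT, alpha, theta = 1.7, 0.8, 4.0      # arbitrary example values
beta = 1.0 / kT
E = lambda x: alpha * np.abs(x) ** theta

Z, _ = quad(lambda x: np.exp(-beta * E(x)), -np.inf, np.inf)
avgE, _ = quad(lambda x: E(x) * np.exp(-beta * E(x)) / Z, -np.inf, np.inf)
print(avgE, kT / theta)               # <alpha |X|^theta> = kT/theta, Eq. (2.2.88)

# the more general identity <X E'(X)> = kT, Eq. (2.2.89)
Eprime = lambda x: alpha * theta * np.abs(x) ** (theta - 1) * np.sign(x)
avgXE, _ = quad(lambda x: x * Eprime(x) * np.exp(-beta * E(x)) / Z, -np.inf, np.inf)
print(avgXE, kT)
```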
As an illustration, consider an ideal gas in a uniform gravitational field, where the Hamiltonian of each particle is

E(x) = (p_x² + p_y² + p_z²)/(2m) + mgz.   (2.2.90)
The average kinetic energy of each particle is 3kT /2, as said before. The contribution
of the average potential energy is kT (one degree of freedom with θ = 1). Thus, the
total is 5kT /2, where 60% come from kinetic energy and 40% come from potential
energy, universally, that is, independent of T , m, and g.
A brief summary of what we have done thus far, is the following: we started with
the microcanonical ensemble, which was very restrictive in the sense that the energy
was held strictly fixed to the value of E, the number of particles was held strictly
fixed to the value of N , and at least in the example of a gas, the volume was also held
strictly fixed to a certain value V . In the passage from the microcanonical ensemble
to the canonical one, we slightly relaxed the first of these parameters, E: rather than
insisting on a fixed value of E, we allowed energy to be exchanged back and forth
with the environment, and thereby to slightly fluctuate (for large N ) around a certain
average value, which was controlled by temperature, or equivalently, by the choice
of β. This was done while keeping in mind that the total energy of both system and
heat bath must be kept fixed, by the law of energy conservation, which allowed us
to look at the combined system as an isolated one, thus obeying the microcanonical
ensemble. We then had a one–to–one correspondence between the extensive quantity
E and the intensive variable β, that adjusted its average value. But the other extensive
variables, like N and V were still kept strictly fixed.
It turns out, that we can continue in this spirit, and ‘relax’ also either one of the
other variables N or V (but not both at the same time), allowing it to fluctuate around
a typical average value, and controlling it by a corresponding intensive variable. Like
E, both N and V are also subjected to conservation laws when the combined system
is considered. Each one of these relaxations leads to a new ensemble, in addition to
the microcanonical and the canonical ensembles that we have already seen. In the
case where it is the variable V that is allowed to be flexible, this ensemble is called
the Gibbs ensemble. In the case where it is the variable N , this ensemble is called
the grand–canonical ensemble. There are, of course, additional ensembles based on
this principle, depending on the kind of the physical system.
The fundamental idea is essentially the very same as the one we used to derive the
canonical ensemble: let us get back to our (relatively small) subsystem, which is in
contact with a heat bath, and this time, let us allow this subsystem to exchange with
the heat bath, not only energy, but also matter, i.e., particles. The heat bath consists of
a huge reservoir of energy and particles. The total energy is E 0 and the total number
of particles is N_0. Suppose that we can calculate the number/volume of states of the heat bath as a function of both its energy E′ and amount of particles N′, and denote this function by Ω_B(E′, N′). A microstate is now a combination (x, N), where N is the (variable) number of particles in our subsystem and x is as before for a given N. From the same considerations as before, whenever our subsystem is in state (x, N), the heat bath can be in any one of Ω_B(E_0 − E(x), N_0 − N) microstates of its own.
Thus, owing to the microcanonical ensemble,
P(x, N) ∝ Ω_B(E_0 − E(x), N_0 − N)
        = exp{S_B(E_0 − E(x), N_0 − N)/k}
        ≈ exp{ S_B(E_0, N_0)/k − (1/k)(∂S_B/∂E)·E(x) − (1/k)(∂S_B/∂N)·N }
        ∝ exp{ −E(x)/(kT) + μN/(kT) },   (2.2.91)
where μ is the chemical potential of the heat bath. Thus, we now have the grand–canonical distribution:

P(x, N) = e^{β[μN − E(x)]}/Ξ(β, μ),   (2.2.92)

where the normalization constant, Ξ(β, μ) = Σ_{N≥0} Σ_x e^{β[μN − E(x)]}, is called the grand partition function.
Example 2.5 (Grand partition function of the ideal gas) Using the result of Exercise
2.1, we have for the ideal gas:
Ξ(β, μ) = Σ_{N=0}^∞ e^{βμN}·(1/N!)·(V/λ³)^N
        = Σ_{N=0}^∞ (1/N!)·(e^{βμ}·V/λ³)^N
        = exp{ e^{βμ}·V/λ³ }.   (2.2.94)
Defining z ≡ e^{βμ} (the fugacity), we can write Ξ̃(β, z) = Σ_{N≥0} z^N Z_N(β). This notation emphasizes the fact that for a given β, Ξ̃(z) is actually the z–transform of the sequence {Z_N(β)}_{N≥0}. A natural way to think about P(x, N) is as P(N)·P(x|N), where P(N) is proportional to z^N Z_N(β) and P(x|N) corresponds to the canonical ensemble as before.
Using the grand partition function, it is now easy to obtain moments of the random
variable N . For example, the first moment is:
⟨N⟩ = Σ_N N·z^N Z_N(β) / Σ_N z^N Z_N(β) = z·∂ln Ξ̃(β, z)/∂z.   (2.2.96)
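As a small illustration of the z–transform viewpoint, the sketch below evaluates ⟨N⟩ for the ideal gas by truncating the sums in Eq. (2.2.96), with Z_N = (V/λ³)^N/N! as in Example 2.5 below, and compares the result with the closed form z·V/λ³; the numerical values of z and V/λ³ are arbitrary:

```python
import numpy as np
from math import lgamma

z, V_over_lambda3 = 0.05, 200.0       # fugacity and V/lambda^3 (arbitrary example values)
Ns = np.arange(0, 2000)
# log of z^N Z_N(beta), with Z_N = (V/lambda^3)^N / N!
logw = Ns * np.log(z) + Ns * np.log(V_over_lambda3) - np.array([lgamma(n + 1) for n in Ns])
w = np.exp(logw - logw.max())         # normalized weights, for numerical stability
print(np.sum(Ns * w) / np.sum(w))     # first moment via Eq. (2.2.96), truncated sum
print(z * V_over_lambda3)             # closed form z*V/lambda^3
```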
In the thermodynamic limit, the sum over N is dominated by its largest term,15 and so

kT·ln Ξ̃(β, z, V) ≈ max_N [μN + kT·ln Z_N(β, V)].
15 The best way to understand this is in analogy to the derivation of ε* as the minimizer of the free
energy in the canonical ensemble, except that now the ‘big’ extensive variable is V rather than
N , so that z N Z N (β, V ) is roughly exponential in V for a given fixed ρ = N /V . The exponential
coefficient depends on ρ, and the ‘dominant’ ρ∗ maximizes this coefficient. Finally, the ‘dominant’
N is N ∗ = ρ∗ V .
P·δV = μ·δN + T·δS − δE
     = δ(μN + TS − E)
     = δ(μN − F)
     ≈ kT·δ[ln Ξ̃(β, z, V)]   (V large),   (2.2.97)

and so,

P = kT·lim_{V→∞} [ln Ξ̃(β, z, V)]/V.   (2.2.98)
Example 2.6 (more on the ideal gas) Applying formula (2.2.96) to Eq. (2.2.94), we
readily obtain
⟨N⟩ = zV/λ³ = e^{μ/kT}·V/λ³.   (2.2.99)
We see that the grand–canonical factor eμ/kT has the physical meaning of the average
number of ideal gas atoms in a cube of size λ × λ × λ, where λ is the thermal de
Broglie wavelength. Now, applying Eq. (2.2.98) to (2.2.94), we get

P = kT·e^{μ/kT}/λ³ = ⟨N⟩·kT/V,   (2.2.100)
recovering again the equation of state of the ideal gas. This also demonstrates the
principle of ensemble equivalence.
Once again, it should be pointed out that beyond the obvious physical significance of the grand–canonical ensemble, it sometimes proves useful to work with it for reasons of pure mathematical convenience, using the principle of ensemble equivalence.
We will see this very clearly in the next chapters on quantum statistics.
Consider now the case where T and N are fixed, but V is allowed to fluctuate
around an average volume controlled by the pressure P. Again, we can analyze our
relatively small test system surrounded by a heat bath. The total energy is E 0 and
the total volume of the system and the heat bath is V_0. Suppose that we can calculate the count/volume of states of the heat bath as a function of both its energy E′ and the volume V′, and call it Ω_B(E′, V′). A microstate is now a combination (x, V), where V
is the (variable) volume of our subsystem. Once again, the same line of thought is
used: whenever our subsystem is at state (x, V), the heat bath may be in any one of Ω_B(E_0 − E(x), V_0 − V) microstates of its own. Thus,

P(x, V) ∝ Ω_B(E_0 − E(x), V_0 − V)
        = exp{S_B(E_0 − E(x), V_0 − V)/k}
        ≈ exp{ S_B(E_0, V_0)/k − (1/k)(∂S_B/∂E)·E(x) − (1/k)(∂S_B/∂V)·V }
        ∝ exp{ −E(x)/(kT) − PV/(kT) }
        = exp{−β[E(x) + PV]}.   (2.2.101)
For a given N and β, the function Y_N(β, P) = ∫_0^∞ dV e^{−βPV} Z_N(β, V) (the Gibbs partition function) can be thought of as the Laplace transform of Z_N(β, V) as a function of V. In the thermodynamic limit, lim_{N→∞} (1/N)·ln Y_N(β, P) is the Legendre–Fenchel transform of lim_{N→∞} (1/N)·ln Z_N(β, V) for fixed β, similarly
to the Legendre–Fenchel relationship between the entropy and the canonical log–
partition function.16 Note that analogously to Eq. (2.2.96), here the Gibbs partition
function serves as a cumulant generating function for the random variable V , thus,
for example,

⟨V⟩ = −kT·∂ln Y_N(β, P)/∂P.   (2.2.103)
As mentioned in footnote no. 20, G = −kT·ln Y_N(β, P) is the Gibbs free energy of the system, and for the case considered here, the force
is the pressure and the conjugate variable it controls is the volume. In analogy to
the grand–canonical ensemble, here too, there is only one extensive variable, this
time, it is N . Thus, G should be asymptotically proportional to N with a constant of
proportionality that depends on the fixed values of T and P.
16 Exercise 2.5 Write explicitly the Legendre–Fenchel relation (and its inverse) between the Gibbs
partition function and the canonical partition function.
Exercise 2.7 Complete the diagram of Fig. 2.2 by the three additional ensembles just
defined. Can you give physical meanings to A, B and C? Also, as said, C(E, P, μ) has
only E as an extensive variable. Thus, lim E→∞ C(E, P, μ)/E should be a constant.
What is this constant?
Even more generally, we could start from a system model, whose micro–canonical
ensemble consists of many extensive variables L 1 , . . . , L n , in addition to the inter-
nal energy E (not just V and N ). The entropy function is then S(E, L 1 , . . . , L n , N ).
Here, L i can be, for example, volume, mass, electric charge, electric polarization in
each one of the three axes, magnetization in each one of three axes, and so on. The
first Legendre–Fenchel transform takes us from the micro–canonical ensemble to the
canonical one upon replacing E by β. Then we can think of various Gibbs ensem-
bles obtained by replacing any subset of extensive variables L i by their respective
conjugate forces λi = T ∂ S/∂ L i , i = 1, . . . , n (in the above examples: pressure,
gravitational force (weight), voltage (or electric potential), electric fields, and mag-
netic fields in the corresponding axes, respectively). In the extreme case, all L i are
replaced by λ_i upon applying successive Legendre–Fenchel transforms, or equivalently, a single multi–dimensional Legendre–Fenchel transform.
Part of the presentation in this chapter is similar to a corresponding chapter in [1]. The
derivations associated with the various ensembles of statistical mechanics, as well
as their many properties, can also be found in any textbook on elementary statistical
mechanics, including: Beck [8, Chap. 3], Huang [9, Chaps. 6, 7], Honerkamp [10,
Chap. 3], Landau and Lifshitz [2], Pathria [11, Chaps. 2–4], and Reif [12, Chap. 6],
among many others.
References
1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics,
Part 1, 3rd edn. (Elsevier: Butterworth–Heinemann, New York, 1980)
3. J. Barré, D. Mukamel, S. Ruffo, Inequivalence of ensembles in a system with long-range
interactions. Phys. Rev. Lett. 87(3), 030601 (2001)
4. D. Mukamel, Statistical mechanics of systems with long range interactions, in AIP Conference
Proceedings, vol. 970, no. 1, ed. by A. Campa, A. Giansanti, G. Morigi, F. Sylos Labini (AIP,
2008), pp. 22–38
5. R. Kubo, Statistical Mechanics (North-Holland, New York, 1965)
6. M.J.W. Hall, Universal geometric approach to uncertainty, entropy, and information. Phys. Rev.
A 59(4), 2602–2615 (1999)
7. R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics (Springer, New York, 1985)
8. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, Lon-
don, 1976)
9. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
10. J. Honerkamp, Statistical Physics - An Advanced Approach with Applications, 2nd edn.
(Springer, Berlin, 2002)
11. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford,
1996)
12. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
Chapter 3
Quantum Statistics – The Fermi–Dirac
Distribution
In our discussion thus far, we have largely taken for granted the assumption that our
system can be analyzed in the classical regime, where quantum effects are negligible.
This is, of course, not always the case, especially at very low temperatures. Also,
if radiation plays a role in the physical system, then at very high frequency ν, the
classical approximation also breaks down. Roughly speaking, kT should be much
larger than hν for the classical regime to be well justified.1 It is therefore necessary
to address quantum effects in statistical physics issues, most notably, the fact that
certain quantities, like energy and angular momentum (or spin), no longer take on
values in the continuum, but only in a discrete set, which depends on the system in
question.
Consider a gas of identical particles with discrete single–particle quantum states,
1, 2, . . . , r, . . ., corresponding to energies
ε_1 ≤ ε_2 ≤ · · · ≤ ε_r ≤ · · · .
Since the particles are assumed indistinguishable, then for a gas of N particles, a
micro–state is defined by the combination of occupation numbers
N_1, N_2, . . . , N_r, . . . , where N_r is the number of particles occupying the r-th single–particle state.
1 One well–known example is black–body radiation. According to the classical theory, the radiation
density per unit frequency grows proportionally to kT ν 2 , a function whose integral over ν, from zero
to infinity, diverges (“the ultraviolet catastrophe”). This absurdity is resolved by quantum mechanical
considerations, according to which the factor kT should be replaced by hν/[ehν/(kT ) − 1], which
is close to kT at low frequencies, but decays exponentially for ν > kT / h.
The first fundamental question is the following: what values can the occupation
numbers N1 , N2 , . . . assume? According to quantum mechanics, there might be cer-
tain restrictions on these numbers. In particular, there are two kinds of situations that
may arise, which divide the various particles in the world into two mutually exclusive
classes.
For the first class of particles, there are no restrictions at all. The occupation
numbers can assume any non–negative integer value (Nr = 0, 1, 2, . . .). Particles
of this class are called Bose–Einstein (BE) particles2 or bosons for short. Another
feature of bosons is that their spins are always integral multiples of ħ, namely, 0, ħ, 2ħ, etc. Examples of bosons are photons, π mesons and K mesons. We will focus
on them in the next chapter.
In the second class of particles, the occupation numbers are restricted by the Pauli
exclusion principle (discovered in 1925), according to which no more than one par-
ticle can occupy a given quantum state r (thus Nr is either 0 or 1 for all r ), since
the wave function of two such particles is anti–symmetric and thus vanishes if they
assume the same quantum state (unlike bosons for which the wave function is sym-
metric). Particles of this kind are called Fermi–Dirac (FD) particles3 or fermions for
short. Another characteristic of fermions is that their spins are always odd multiples
of /2, namely, /2, 3/2, 5/2, etc. Examples of fermions are electrons, positrons,
protons, and neutrons. The statistical mechanics of fermions will be discussed in this
chapter.
2 Bosons were first introduced by Bose (1924) in order to derive Planck’s radiation law, and Einstein extended the treatment to material particles shortly thereafter.
To derive the equilibrium behavior of this system, we analyze the Helmholtz free energy F as a function of the occupation numbers, and use the fact that in equilibrium, it should be minimum. Since E = Σ_s N̂_s·ε̂_s and F = E − TS, this boils down to the evaluation of the entropy S = k·ln Ω(N̂_1, N̂_2, . . .). Let Ω_s(N̂_s) be the number of ways of putting N̂_s particles into G_s states of group no. s. Now, for fermions each one of the G_s states is either empty or occupied by one particle. Thus,

Ω_s(N̂_s) = G_s!/[N̂_s!·(G_s − N̂_s)!]   (3.1.1)

and

Ω(N̂_1, N̂_2, . . .) = Π_s Ω_s(N̂_s).   (3.1.2)

Therefore,

F(N̂_1, N̂_2, . . .) = Σ_s [N̂_s·ε̂_s − kT·ln Ω_s(N̂_s)]
                    ≈ Σ_s [N̂_s·ε̂_s − kT·G_s·h_2(N̂_s/G_s)].   (3.1.3)

As said, we wish to minimize F(N̂_1, N̂_2, . . .) s.t. the constraint Σ_s N̂_s = N. Consider then the minimization of the Lagrangian4

L = Σ_s [N̂_s·ε̂_s − kT·G_s·h_2(N̂_s/G_s)] − λ·(Σ_s N̂_s − N).   (3.1.4)

The minimizing occupation numbers are

N̂_s = G_s/(e^{(ε̂_s − λ)/kT} + 1).   (3.1.5)
4 For readers that are not familiar with Lagrangians, the minimization of F s.t. Σ_s N̂_s = N is equivalent to the unconstrained minimization of F − λ(Σ_s N̂_s − N) for the value of λ at which the constraint is met with equality by the minimizer {N̂_s*}. This is because F(N̂_1*, N̂_2*, . . .) − λ(Σ_s N̂_s* − N) ≤ F(N̂_1, N̂_2, . . .) − λ(Σ_s N̂_s − N), together with Σ_s N̂_s = Σ_s N̂_s* = N, imply F(N̂_1*, N̂_2*, . . .) ≤ F(N̂_1, N̂_2, . . .) for every {N̂_s} with Σ_s N̂_s = N.
[Fig. 3.1: The Fermi–Dirac mean occupation number as a function of energy.]
Exercise 3.1 After showing the general relation μ = (∂ F/∂ N )T,V , show that λ =
μ, namely, the Lagrange multiplier λ has the physical meaning of the chemical
potential. From now on, then we replace the notation λ by μ.
Note that N̂s /G s is the mean occupation number N̄r of a single state r within
group no. s. I.e.,
N̄_r = 1/(e^{(ε_r − μ)/kT} + 1).   (3.1.7)
It is pleasing that this result no longer depends on the partition into groups. Equa-
tion (3.1.7) is the FD distribution, and it is depicted in Fig. 3.1.
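The shape of Fig. 3.1 is easy to reproduce numerically. Here is a minimal sketch (μ and the energy grid are arbitrary illustrative values) that evaluates Eq. (3.1.7) at decreasing temperatures and exhibits the sharpening towards a unit step at ε = μ:

```python
import numpy as np

def fd(eps, mu, kT):
    # Fermi-Dirac mean occupation, Eq. (3.1.7)
    return 1.0 / (np.exp((eps - mu) / kT) + 1.0)

mu = 5.0                               # chemical potential (illustrative value, e.g. in eV)
eps = np.array([4.0, 4.9, 5.0, 5.1, 6.0])
for kT in [1.0, 0.1, 0.01]:            # cooling: the profile sharpens into a unit step at eps = mu
    print(kT, np.round(fd(eps, mu, kT), 4))
```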
Ξ(β, μ) = Σ_{N=0}^∞ e^{βμN} Σ_{N_1=0}^1 Σ_{N_2=0}^1 · · · δ{Σ_r N_r = N}·e^{−β·Σ_r N_r ε_r}
        = Σ_{N_1=0}^1 Σ_{N_2=0}^1 · · · Σ_{N=0}^∞ δ{Σ_r N_r = N}·e^{β·Σ_r N_r(μ − ε_r)}
        = Σ_{N_1=0}^1 Σ_{N_2=0}^1 · · · e^{β·Σ_r N_r(μ − ε_r)}
        = Σ_{N_1=0}^1 Σ_{N_2=0}^1 · · · Π_r e^{β·N_r(μ − ε_r)}
        = Π_r Σ_{N_r=0}^1 e^{β·N_r(μ − ε_r)}
        = Π_r [1 + e^{β(μ − ε_r)}].   (3.2.2)
Note that this product form of the grand partition function means that under the grand–
canonical ensemble the binary random variables {Nr } are statistically independent,
i.e.,
P(N_1, N_2, . . .) = Π_r P_r(N_r),   (3.2.3)

where

P_r(N_r) = e^{β N_r(μ − ε_r)}/[1 + e^{β(μ − ε_r)}],   N_r = 0, 1,   r = 1, 2, . . . .   (3.2.4)

Thus,

N̄_r = Pr{N_r = 1} = e^{(μ − ε_r)/kT}/[1 + e^{(μ − ε_r)/kT}] = 1/[e^{(ε_r − μ)/kT} + 1].   (3.2.5)

Equivalently, defining α_r = β(μ − ε_r), we have Ξ = Π_r Σ_{N_r=0}^1 e^{α_r N_r}, giving rise to N̄_r = ∂ln Ξ/∂α_r = e^{α_r}/(1 + e^{α_r}), which is the same result.
Let us now examine what happens if the system is cooled to the absolute zero (T →
0). It should be kept in mind that the chemical potential μ depends on T , so let μ0
be the chemical potential at T = 0. It is readily seen that N̄_r approaches a unit step function (see Fig. 3.1), namely, all energy levels {ε_r} below μ_0 are occupied (N̄_r ≈ 1) by a fermion, whereas all those that are above μ_0 are empty (N̄_r ≈ 0).
The explanation is simple: Pauli’s exclusion principle does not allow all particles to
reside at the ground state at T = 0 since then many of them would occupy the same
quantum state. The minimum energy of the system that can possibly be achieved is
when all energy levels are filled up, one by one, starting from the ground state up to
some maximum level, which is exactly μ0 . This explains why even at the absolute
zero, fermions have energy.5 The maximum occupied energy level in a gas of non–interacting fermions at the absolute zero is called the Fermi energy, which we shall denote by ε_F. Thus, μ_0 = ε_F, and then the FD distribution at very low temperatures is approximately

N̄_r = 1/(e^{(ε_r − ε_F)/kT} + 1).   (3.3.1)
We next take a closer look on the FD distribution, taking into account the density
of states. Consider a metal box of dimensions L x × L y × L z and hence volume
V = L_x L_y L_z. The energy level associated with quantum numbers (l_x, l_y, l_z) is given by

ε_{l_x,l_y,l_z} = (π²ħ²/2m)·(l_x²/L_x² + l_y²/L_y² + l_z²/L_z²) = (ħ²/2m)·(k_x² + k_y² + k_z²),   (3.3.2)

where k_x, k_y and k_z are the wave numbers pertaining to the various solutions of the Schrödinger equation. First, we would like to count how many quantum states {(l_x, l_y, l_z)} give rise to energy between ε and ε + dε. We denote this number by g(ε)dε, where g(ε) is the density of states.

g(ε)dε = Σ_{l_x,l_y,l_z} 1{ 2mε/ħ² ≤ π²l_x²/L_x² + π²l_y²/L_y² + π²l_z²/L_z² ≤ 2m(ε + dε)/ħ² }
       ≈ (L_x L_y L_z/π³)·Vol{ k⃗ : 2mε/ħ² ≤ ‖k⃗‖² ≤ 2m(ε + dε)/ħ² }
       = (V/π³)·Vol{ k⃗ : 2mε/ħ² ≤ ‖k⃗‖² ≤ 2m(ε + dε)/ħ² }.   (3.3.3)
where ρ_e is the electron density. In most metals ε_F is of the order of 5–10 electron–volts (eV), whose equivalent temperature T_F = ε_F/k (the Fermi temperature) is of the order of 100,000 K. Hence, the Fermi energy is much larger than kT in laboratory conditions. In other words, electrons in a metal behave like a gas at an extremely high temperature. This means that the internal pressure in metals (the Fermi pressure) is extremely large, and this is a reason why metals are almost incompressible. This kind of pressure also stabilizes a neutron star (a Fermi gas of neutrons) or a white dwarf star (a Fermi gas of electrons) against the inward pull of gravity, which would ostensibly collapse the star into a black hole. Only when a star is sufficiently massive to overcome the degeneracy pressure can it collapse into a singularity.
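These orders of magnitude are easy to reproduce. The sketch below assumes the standard zero–temperature relation ε_F = (ħ²/2m)(3π²ρ_e)^{2/3} (the content of the omitted Eq. (3.3.7), not displayed above) together with a conventional literature value for the conduction–electron density of copper:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
m_e  = 9.1093837015e-31  # kg
k_B  = 1.380649e-23      # J/K
eV   = 1.602176634e-19   # J

rho_e = 8.5e28           # conduction-electron density of copper, ~8.5e28 m^-3 (typical value)
eps_F = (hbar**2 / (2 * m_e)) * (3 * np.pi**2 * rho_e) ** (2.0 / 3.0)
print("Fermi energy:", eps_F / eV, "eV")        # ~7 eV
print("Fermi temperature:", eps_F / k_B, "K")   # ~8e4 K, i.e. of the order of 100,000 K
```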
f(ε) = 1/(e^{(ε − μ)/kT} + 1).   (3.4.1)
For example, if we wish to calculate the average energy, we have to deal with an
integral like
∫_0^∞ ε^{3/2} f(ε) dε.
where the last line was obtained by calculating the integral of x 2 φ (x) using a power
series expansion. Note that this series contains only even powers of kT /μ, thus the
convergence is rather fast. Let us now repeat the calculation of Eq. (3.3.7), this time
at T > 0.
ρ_e ≈ [√(2m³)/(π²ħ³)]·∫_0^∞ √ε·dε/(e^{(ε − μ)/kT} + 1)
    = [√(2m³)/(π²ħ³)]·I_{1/2}
    = [√(2m³)/(π²ħ³)]·(2/3)·μ^{3/2}·[1 + (π²/8)(kT/μ)²],   (3.4.5)
which gives

ε_F = μ·[1 + (π²/8)(kT/μ)²]^{2/3}
    ≈ μ·[1 + (π²/12)(kT/μ)²]
    = μ + (πkT)²/(12μ).   (3.4.6)
This relation between μ and ε_F can be easily inverted by solving a simple quadratic equation, which yields

μ ≈ [ε_F + ε_F·√(1 − (πkT/ε_F)²/3)]/2
  ≈ ε_F·[1 − (π²/12)(kT/ε_F)²]
  = ε_F·[1 − (π²/12)(T/T_F)²].   (3.4.7)
Since T/T_F ≪ 1 for all T in the interesting range, we observe that the chemical potential depends extremely weakly on T. In other words, we can safely approximate μ ≈ ε_F for all relevant temperatures of interest. The assumption that kT ≪ μ was found self–consistent with the result μ ≈ ε_F.
Having established the approximation μ ≈ F , we can now calculate the average
energy of the electron at an arbitrary temperature T :
⟨ε⟩ = [√(2m³)/(π²ħ³·ρ_e)]·∫_0^∞ ε^{3/2}·dε/(e^{(ε − ε_F)/kT} + 1)
    = [√(2m³)/(π²ħ³·ρ_e)]·I_{3/2}
    ≈ [√(2m³)/(π²ħ³·ρ_e)]·(2ε_F^{5/2}/5)·[1 + (5π²/8)(T/T_F)²]
    = [3ħ²/(10m)]·(3π²ρ_e)^{2/3}·[1 + (5π²/8)(T/T_F)²]
    = (3ε_F/5)·[1 + (5π²/8)(T/T_F)²].   (3.4.8)
Note that the dependence of the average per–particle energy on the temperature is drastically different from that of the ideal gas. While in the ideal gas it was linear (⟨ε⟩ = 3kT/2), here it is actually almost a constant, independent of the temperature (just like the chemical potential).
The same technique can be used, of course, to calculate any moment of the electron
energy.
physics of white dwarfs. We next briefly touch upon the very basics of conductance
in solids, as well as on two other applications: thermionic emission and photoelectric
emission.
The structure of the electron energy levels in a solid is basically obtained using
quantum–mechanical considerations. In the case of a crystal, this amounts to solving
the Schrödinger equation in a periodic potential, stemming from the corresponding
periodic lattice structure. Its idealized form, which ignores the size of each atom, is
given by a train of equispaced Dirac delta functions. This is an extreme case of the so
called Kronig–Penney model, where the potential function is a periodic rectangular
on–off function (square wave function), and it leads to a certain band structure. In
particular, bands of allowed energy levels are alternately interlaced with bands of
forbidden energy levels. The Fermi energy level F , which depends on the overall
concentration of electrons, may either fall in an allowed band or in a forbidden band.
The former case is the case of a metal, whereas the latter case is the case of an
insulator or a semiconductor (the difference being only how wide the forbidden band in which ε_F lies is). While in metals it is impossible to change ε_F, in semiconductors it is possible to change it by doping.
A semiconductor can then be thought of as a system with electron orbitals grouped
into two6 energy bands separated by an energy gap. The lower band is the valence
band (where electrons are tied to their individual atoms) and the upper band is the
conduction band, where they are free. In a pure semiconductor at T = 0, all valence
orbitals are occupied with electrons and all conduction orbitals are empty. A full
band cannot carry any current so a pure semiconductor at T = 0 is an insulator. In a
pure semiconductor the Fermi energy is exactly in the middle of the gap between the valence band (where f(ε) is very close to 1) and the conduction band (where f(ε) is very close to 0). Finite conductivity in a semiconductor follows either from the presence of
electrons in the conduction band (conduction electrons) or from unoccupied orbitals
in the valence band (holes).
Two different mechanisms give rise to conduction electrons and holes: the first is
thermal excitation of electrons from the valence band to the conduction band, while
the second is the presence of impurities that change the balance between the number
of orbitals in the valence band and the number of electrons available to fill them.
We will not delve into this too much beyond this point, since this material is
normally well–covered in other courses in the standard curriculum of electrical engi-
neering, namely, courses on solid state physics. Here we only demonstrate the use of
the FD distribution in order to calculate the density of charge carriers. The density of
charge carriers n of the conduction band is found by integrating, from the conduction band edge ε_C upward, the product of the density of states g_e(ε) and the FD distribution f(ε), i.e.,

n = ∫_{ε_C}^∞ dε·g_e(ε)·f(ε) = [√(2m³)/(π²ħ³)]·∫_{ε_C}^∞ √(ε − ε_C)·dε/(e^{(ε − ε_F)/kT} + 1),   (3.5.1)

where here m designates the effective mass of the electron7 and where we have taken the density of states to be proportional to √(ε − ε_C), since ε_C is now the reference energy and only the difference ε − ε_C goes for kinetic energy.8 For a semiconductor at room temperature, kT is much smaller than the gap, and so f(ε) ≈ e^{−(ε − ε_F)/kT} throughout the range of integration, which yields n ∝ e^{−(ε_C − ε_F)/kT}.

6 We treat both bands as single bands for our purposes. It does not matter that each of them may itself consist of several sub-bands.
We see then that the density of conduction electrons, and hence also the conduction properties, depend critically on the gap between ε_C and ε_F. A similar calculation holds for the holes, of course.
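A rough numerical sketch of this dependence is the following. It uses the nondegenerate–limit evaluation of the integral in Eq. (3.5.1), n ≈ 2(m·kT/2πħ²)^{3/2}·e^{−(ε_C − ε_F)/kT}; this closed form, as well as the effective mass and gap values below, are standard illustrative assumptions rather than numbers taken from the text:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J/K
eV   = 1.602176634e-19   # J
m_e  = 9.1093837015e-31  # kg

m_eff = 0.26 * m_e       # illustrative effective mass (roughly that quoted for silicon electrons)
T = 300.0                # room temperature
gap_half = 0.56 * eV     # eps_C - eps_F, i.e. half of a ~1.12 eV gap in an intrinsic sample

# closed form of Eq. (3.5.1) in the nondegenerate limit f(eps) ~ exp(-(eps - eps_F)/kT)
n = 2.0 * (m_eff * k_B * T / (2 * np.pi * hbar**2)) ** 1.5 * np.exp(-gap_half / (k_B * T))
print("conduction electron density ~", n, "m^-3")
```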
7 The effective mass is obtained by a second order Taylor series expansion of the energy as a function of the wave–number (used to obtain the density of states), and thinking of the coefficient of the quadratic term as ħ²/2m.
8 Recall that earlier we calculated the density of states for a simple potential well, not for a periodic
potential function. Thus, the earlier expression of ge () is not correct here.
Thus, dv_x = h·dl_x/(2mL_x), and similar relations hold for the two other components, which together yield

dl_x·dl_y·dl_z = (m/h)³·V·dv_x·dv_y·dv_z,   (3.5.5)

where we have divided by 8 since every quantum state can be occupied by only one out of 8 combinations of the signs of the three component velocities. The fraction of electrons dN with quantum states within the cube dl_x dl_y dl_z is simply the expected number of occupied quantum states within that cube, which is

dN = dl_x·dl_y·dl_z / [1 + exp{(ε_{l_x,l_y,l_z} − ε_F)/kT}].

Thus, we can write the distribution function of the number of electrons in a cube dv_x × dv_y × dv_z as

dN = 2V·(m/h)³·dv_x·dv_y·dv_z / (1 + exp{[(1/2)m(v_x² + v_y² + v_z²) − ε_F]/kT}),   (3.5.6)

where we have doubled the expression due to the spin and we have taken the chemical potential of the electron gas to be ε_F, independently of temperature, as was justified in the previous subsection. Assuming that the surface is parallel to the YZ plane, the
9 We are assuming that the potential barrier φ is fairly large (relative to kT ), such that the relationship
between energy and quantum numbers is reasonably well approximated by that of a particle in a
box.
minimum escape velocity in the x–direction is v_0 = √[(2/m)(ε_F + φ)], and there are no restrictions on v_y and v_z. The current along the x–direction is
where the factor vx dt/L x in the second line is the fraction of electrons close enough
to the surface so as to be emitted within time dt. Thus, the current density (current
per unit area) is
J = 2q_e·(m/h)³·∫_{v_0}^∞ dv_x·v_x ∫_{−∞}^{+∞}∫_{−∞}^{+∞} dv_y·dv_z / (1 + exp{[(1/2)m(v_x² + v_y² + v_z²) − ε_F]/kT}).   (3.5.8)
which yields

J = [4πm²q_e·kT/h³]·∫_{v_0}^∞ dv_x·v_x·ln[1 + exp{(ε_F − (1/2)mv_x²)/kT}].   (3.5.10)
Now, since normally10 φ ≫ kT, the exponent in the integrand is very small throughout the entire range of integration, and so it is safe to approximate it by ln(1 + x) ≈ x, i.e.,

J ≈ [4πm²q_e·kT/h³]·e^{ε_F/kT}·∫_{v_0}^∞ dv_x·v_x·e^{−mv_x²/(2kT)}
  = [4πmq_e·(kT)²/h³]·exp{[ε_F − (1/2)mv_0²]/kT}
  = [4πmq_e·(kT)²/h³]·e^{−φ/kT},   (3.5.11)
and thus we have obtained a simple expression for the current density as function
of temperature. This result, which is known as the Richardson–Dushman equation,
is in very good agreement with experimental evidence. Further discussion on this
result can be found in [1, 2].
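For a feeling of the numbers, the following sketch evaluates Eq. (3.5.11) for a work function of a few eV (an illustrative value, roughly that of tungsten):

```python
import numpy as np

m_e = 9.1093837015e-31   # kg
q_e = 1.602176634e-19    # C
k_B = 1.380649e-23       # J/K
h   = 6.62607015e-34     # J s
eV  = 1.602176634e-19    # J

phi = 4.5 * eV           # work function, roughly that of tungsten (illustrative value)

def J(T):
    # Richardson-Dushman current density, Eq. (3.5.11)
    return 4 * np.pi * m_e * q_e * (k_B * T) ** 2 / h ** 3 * np.exp(-phi / (k_B * T))

for T in [1500.0, 2000.0, 2500.0]:
    print(T, "K ->", J(T), "A/m^2")
```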
hν + (1/2)mv_0² = ε_F + φ = ε_F + hν_0.   (3.5.12)
Let α denote the probability that a photon actually excites an electron. Then, similarly
as in the previous subsection,
J = α·[4πm²q_e·kT/h³]·∫_{v_0}^∞ dv_x·v_x·ln[1 + exp{(ε_F − (1/2)mv_x²)/kT}].   (3.5.13)
For h(ν − ν_0) ≫ kT, it can be shown (using the same technique as in Sect. 3.4) that

J = α·(2πmq_e/h)·(ν − ν_0)²,   (3.5.18)

independently of T. In other words, when the energy of the light quantum is much larger than the thermal energy kT, temperature becomes irrelevant. At the other extreme of very low frequency, where h(ν_0 − ν) ≫ kT, we have

J = α·[4πmq_e·(kT)²/h³]·e^{(hν − φ)/kT},   (3.5.19)

which is like the thermionic current density, enhanced by a photon factor e^{hν/kT}.
The Fermi–Dirac distribution, its derivation, and its various applications can also be
found in many alternative textbooks, such as: Beck [1, Chap. 4], Huang [3, Chap. 11],
Kittel [4, Part I, Chap. 19], Landau and Lifshitz [5, Chap. V], Mandl [6, Sect. 11.4.1],
Pathria [2], and Reif [7, Chap. 9]. The exposition in this chapter is based, to a large
extent, on the books by Beck, Mandl and Pathria. Applications to semiconductor
physics are based also on Omar [8, Chaps. 6, 7] and Gershenfeld [9].
References
1. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, Lon-
don, 1976)
2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford, 1996)
3. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
4. C. Kittel, Elementary Statistical Physics (Wiley, New York, 1958)
5. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics, Part
1, 3rd edn. Elsevier: Butterworth–Heinemann, New York (1980)
6. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)
7. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
8. M.A. Omar, Elementary Solid State Physics: Principles and Applications (Addison Wesley,
Reading, 1975)
9. N. Gershenfeld, The Physics of Information Technology (Cambridge University Press, Cam-
bridge, 2000)
Chapter 4
Quantum Statistics – The Bose–Einstein
Distribution
Using the same notation as in Chap. 3, again, we are partitioning the energy levels ε_1, ε_2, . . . into groups, labeled by s, where in group no. s, which has G_s quantum states, the representative energy is ε̂_s. As before, a microstate is defined in terms of {N̂_s} and Ω(N̂_1, N̂_2, . . .) = Π_s Ω_s(N̂_s), but now we need a different estimate of each factor Ω_s(N̂_s), since now there are no restrictions on the occupation numbers of the quantum states. In how many ways can one partition N̂_s particles among G_s
states? Imagine that the N̂s particles of group no. s are arranged along a line. By
means of G s − 1 partitions we divide the particles into G s subsets corresponding to
the various states in that group. We have a total of ( N̂s + G s − 1) elements, N̂s of
them are particles and the remaining (G s − 1) are partitions (see Fig. 4.1). In how
many distinct ways can we configure them? The answer is simple:
Ω_s(N̂_s) = (N̂_s + G_s − 1)!/[N̂_s!·(G_s − 1)!].   (4.1.1)

On account of G_s ≫ 1, the −1 terms can be safely neglected, and we approximate

Ω_s(N̂_s) = (N̂_s + G_s)!/(N̂_s!·G_s!).   (4.1.2)
Repeating the same derivation as in Sect. 3.1, but with the above Ω_s(N̂_s), we get:

ln Ω_s(N̂_s) ≈ (N̂_s + G_s)·h_2(G_s/(N̂_s + G_s)),   (4.1.3)

F ≈ Σ_s [N̂_s·ε̂_s − kT·(N̂_s + G_s)·h_2(G_s/(N̂_s + G_s))],   (4.1.4)

which should be minimized s.t. Σ_s N̂_s = N. Upon carrying out the minimization of the corresponding Lagrangian, we arrive1 at the following result for the most probable occupation numbers:

N̂_s = G_s/(e^{β(ε̂_s − μ)} − 1),   (4.1.5)

and, accordingly, the mean occupation number of a single state r is

N̄_r = 1/(e^{β(ε_r − μ)} − 1),   (4.1.6)
where μ is again the Lagrange multiplier, which has the meaning of the chemical
potential. This is the Bose–Einstein (BE) distribution. As we see, the formula is very
similar to that of the FD distribution, the only difference is that in the denominator,
+1 is replaced by −1. Surprisingly enough, this is a crucial difference that makes
the behavior of bosons drastically different from that of fermions. Note that for this
expression to make sense, μ must be smaller than the ground energy ε_1, otherwise
the denominator either vanishes or becomes negative. If the ground–state energy is
zero, this means μ < 0.
As in Sect. 3.2, an alternative derivation can be carried out using the grand–canonical ensemble. The only difference is that now, the summations over {N_r} are not only over {0, 1}, but over all non–negative integers. In particular,

Ξ(β, μ) = Π_r Σ_{N_r=0}^∞ e^{β N_r(μ − ε_r)}.   (4.2.1)

Of course, here too, for convergence of each geometric series, we must assume μ < ε_1, and then the result is

Ξ(β, μ) = Π_r 1/[1 − e^{β(μ − ε_r)}].   (4.2.2)
Thus, N̄r is just the expectation of this geometric random variable, which is readily
found2 to be as in Eq. (4.1.6).
In analogy to the FD case, here too, the chemical potential μ is determined from the
constraint on the total number of particles. In this case, it reads
Σ_r 1/(e^{β(ε_r − μ)} − 1) = N.   (4.3.1)
At this point, an important peculiarity should be discussed. Consider Eq. (4.3.2) and
suppose that we are cooling the system. As T decreases, μ must adjust in order to keep
Eq. (4.3.2) holding since the number of particles must be preserved. In particular, as
T decreases, μ must increase, yet it must be negative. The point is that even for μ = 0,
which is the maximum allowed value of μ, the integral at the r.h.s. of (4.3.2) is finite3 as the density of states is proportional to √ε and hence balances the divergence of the BE integrand near ε = 0. Let us define then

ρ_c(T) ≡ [√(2m³)/(2π²ħ³)]·∫_0^∞ √ε·dε/(e^{ε/kT} − 1),   (4.3.3)

and let T_c be the solution to the equation ρ_c(T) = ρ, which can be found as follows. By changing the integration variable to z = ε/kT, we can rewrite the r.h.s. as

ρ_c(T) = (mkT/2πħ²)^{3/2}·{ (2/√π)·∫_0^∞ √z·dz/(e^z − 1) } ≈ 2.612·(mkT/2πħ²)^{3/2},   (4.3.4)

where the constant 2.612 is the numerical value of the expression in the curly brackets. Thus,

T_c ≈ 0.5274·(2πħ²/mk)·ρ^{2/3} = 3.313·ħ²·ρ^{2/3}/(mk).   (4.3.5)
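As a numerical illustration, Eq. (4.3.5) applied to rubidium–87 at a density chosen (as an assumption, not a number from the text) to represent a dilute trapped alkali gas gives a critical temperature in the hundred–nanokelvin range, of the same order as the experiment mentioned in footnote 4 below:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J/K
u    = 1.66053906660e-27 # kg, atomic mass unit

m   = 87 * u             # mass of a rubidium-87 atom
rho = 2.5e19             # number density in m^-3 (illustrative value for a dilute trapped gas)

# Eq. (4.3.5): T_c = 3.313 * hbar^2 * rho^(2/3) / (m k)
T_c = 3.313 * hbar**2 * rho ** (2.0 / 3.0) / (m * k_B)
print("T_c ~", T_c * 1e9, "nK")      # of the order of 100 nK
```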
The problem is that for T < Tc , Eq. (4.3.2) can no longer be solved by any non–
positive value of μ. So what happens below Tc ?
The root of the problem is in the passage from the discrete sum over r to the
integral over ε. The paradox is resolved when it is understood that below T_c, the contribution of ε = 0 should be separated from the integral. That is, the correct form is

N = 1/(e^{−μ/kT} − 1) + [√(2m³)·V/(2π²ħ³)]·∫_0^∞ √ε·dε/(e^{(ε − μ)/kT} − 1).   (4.3.6)

Dividing by V and letting μ → 0⁻, this reads ρ = ρ_0 + ρ_c(T), where ρ_0 is the density of ground–state particles, and now the integral accommodates the contribution of all particles with strictly positive energy. Now, for T < T_c, we simply have ρ_0 = ρ − ρ_c(T), which means that a macroscopic fraction of the particles condense at the ground state. This phenomenon is called Bose–Einstein condensation. Note that for T < T_c,
ρ_0 = ρ − ρ_c(T)
    = ρ_c(T_c) − ρ_c(T)
    = ρ_c(T_c)·[1 − ρ_c(T)/ρ_c(T_c)]
    = ρ_c(T_c)·[1 − (T/T_c)^{3/2}]
    = ρ·[1 − (T/T_c)^{3/2}].   (4.3.8)
where the integral over x (including the minus sign) is just a positive constant C that we will not calculate here. Now,

P = lim_{V→∞} kT·ln Ξ/V = C·√(2m³)·(kT)^{5/2}/(2π²ħ³).   (4.3.10)
We see that the pressure is independent of the density ρ (compare with the ideal gas
where P = ρkT ). This is because the condensed particles do not contribute to the
pressure. What matters is only the density of those with positive energy, and this
density in turn depends only on T .
Exercise 4.4 Why don't fermions condense? What changes in the last derivation?
Exercise 4.5 The last derivation was in three dimensions (d = 3). Modify the derivation of the BE statistics to apply to a general dimension d, taking into account the dependence of the density of states upon d. For which values of d do bosons condense?
4 In 1995 the first gaseous condensate was produced by Eric Cornell and Carl Wieman at the Univer-
sity of Colorado, using a gas of rubidium atoms cooled to 170 nanokelvin. For their achievements
Cornell, Wieman, and Wolfgang Ketterle of MIT received the 2001 Nobel Prize in Physics. In
November 2010 the first photon BEC was observed.
A black body is an (idealized model of an) object that absorbs all the incident electro-
magnetic radiation (and reflects none), regardless of the wavelength. A black body
in thermal equilibrium emits radiation that is called black–body radiation. It should
be understood that all bodies emit electromagnetic radiation whenever at positive
temperature, but normally, this radiation is not in thermal equilibrium. One of the
important applications of the BE statistics is to investigate the equilibrium properties
of black–body radiation.
If we consider the radiation inside an opaque object whose surfaces and walls
are kept at fixed temperature T , then the radiation and the surfaces arrive at thermal
equilibrium and then, the radiation has properties that are appreciably close to those
of a black body. To study the behavior of such radiation, one creates a tiny hole in the surface of the enclosure (small enough that a photon entering the cavity is ‘trapped’ by internal reflections and is never reflected out, so that the hole does not disturb the equilibrium of the cavity); the emitted radiation then has the same properties as the cavity radiation, which in turn are the same as the radiation properties of a black body. The
temperature of the black body is T as well, of course. In this section, we study these
radiation properties using BE statistics.
We consider a radiation cavity of volume V and temperature T . Historically,
Planck (1900) viewed this system as an assembly of harmonic oscillators with quan-
tized energies (n + 1/2)ħω, n = 0, 1, 2, . . ., where ω is the angular frequency of the
oscillator. An alternative point of view is as an ideal gas of identical and indistin-
guishable photons, each one with energy ħω. Photons have integral spin and hence are
bosons, but they have zero mass and zero chemical potential when they interact with
a black–body. The reason is that there is no constraint that their total number would
be conserved (they are emitted and absorbed in the black–body material with which
they interact). Since in equilibrium F should be minimum, then (∂ F/∂ N )T,V = 0.
But (∂ F/∂ N )T,V = μ, and so, μ = 0. It follows then that distribution of photons
across the quantum states obeys BE statistics with μ = 0, that is
N̄_ω = 1/(e^{ħω/kT} − 1).   (4.4.1)
The calculation of the density of states here is somewhat different from the one in
Sect. 4.3. Earlier, we considered a particle with positive mass m, whose kinetic energy is ‖p⃗‖²/2m = ħ²‖k⃗‖²/2m, whereas now we are talking about a photon whose rest mass is zero and whose energy is ħω = ħ‖k⃗‖c = pc (c being the speed of light), so the dependence on ‖k⃗‖ is now linear rather than quadratic. This is a relativistic
effect.
Assuming that V is large enough, we can pass to the continuous approximation.
As in Sect. 3.3, the number of waves (i.e., the number of quantum states) whose
wave–vector magnitude lies between k and k + dk, is given by

(1/8)·4πk²·dk / [(π/L_x)·(π/L_y)·(π/L_z)] = V·k²·dk/(2π²).

Multiplying by 2 for the two independent polarization states, substituting k = ω/c, and weighting each mode by the BE occupation (4.4.1), the expected number of photons with frequency between ω and ω + dω is

dN_ω = (V/π²c³)·ω²·dω/(e^{ħω/kT} − 1).   (4.4.2)
The corresponding energy in this frequency range is

dE_ω = ħω·dN_ω = (Vħ/π²c³)·ω³·dω/(e^{ħω/kT} − 1).   (4.4.3)
This expression for the spectrum of black–body radiation is known as Planck’s law.
In the regime ħω ≪ kT, this becomes

dE_ω ≈ (kT·V/π²c³)·ω²·dω,   (4.4.4)
which is the Rayleigh–Jeans law. This is actually the classic limit (see footnote at the
Introduction to Chap. 3), obtained from multiplying kT by the “number of waves.”
In the other extreme of ħω ≫ kT, we have

dE_ω = ħω·dN_ω ≈ (Vħ/π²c³)·ω³·e^{−ħω/kT}·dω,   (4.4.5)
which is Wien’s law. At low temperatures, this is an excellent approximation over a
very wide range of frequencies. The frequency of maximum radiation is (Fig. 4.2)
ω_max = 2.822·kT/ħ,   (4.4.6)
namely, linear in temperature. This relation has immediate applications. For example,
the sun is known to be a source of radiation, which with a good level of approximation,
can be considered a black body. Using a spectrometer, one can measure the frequency
ωmax of maximum radiation (which turns out to be at the lower limit of the visible
range), and estimate the sun’s surface temperature (from Eq. (4.4.6)), to be T ≈
5800◦ K. At room temperature, ωmax falls deep in the infrared range, and thus invisible
to the human eye. Hence the name black body.
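A short numerical check of Eq. (4.4.6) reproduces both statements: at about 5800 K the peak sits near the red edge of the visible range, while at room temperature it lies deep in the infrared.

```python
import numpy as np

hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J/K

def omega_max(T):
    # Eq. (4.4.6): frequency of maximum radiation
    return 2.822 * k_B * T / hbar

for T in [300.0, 5800.0]:
    w = omega_max(T)
    print(T, "K : omega_max =", w, "rad/s, i.e. nu_max =", w / (2 * np.pi) / 1e12, "THz")
```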
[Fig. 4.2: The black–body radiation spectrum as a function of frequency, peaking at ω_max.]
The relation E/V = aT 4 is called the Stefan–Boltzmann law. The heat capacity at
constant volume, C V = (∂ E/∂T )V , is therefore proportional to T 3 .
For example, the pressure of the photon gas can be calculated from

P = kT·ln Ξ/V
  = −(kT/π²c³)·∫_0^∞ dω·ω²·ln[1 − e^{−ħω/kT}]
  = −[(kT)⁴/(π²c³ħ³)]·∫_0^∞ dx·x²·ln(1 − e^{−x})
  = (1/3)·aT⁴ = E/(3V),   (4.4.10)
where the integral is calculated using integration by parts.5 Note that while in the
ideal gas P was only linear in T , here it is proportional to the fourth power of T . Note
also that here, P V = E/3, which is different from the ideal gas, where P V = 2E/3.
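The following sketch evaluates the energy density and the photon–gas pressure of Eq. (4.4.10) at two temperatures; the explicit value a = π²k⁴/(15ħ³c³) of the radiation constant is the standard one and is assumed here, since the corresponding equation is not reproduced above:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J/K
c    = 2.99792458e8      # m/s

a = np.pi**2 * k_B**4 / (15 * hbar**3 * c**3)   # radiation constant, E/V = a T^4

for T in [300.0, 5800.0]:
    u = a * T ** 4        # energy density (Stefan-Boltzmann law)
    P = u / 3.0           # photon-gas pressure, Eq. (4.4.10)
    print(T, "K : E/V =", u, "J/m^3, P =", P, "Pa")
```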
The exposition in this chapter is heavily based on those of Mandl [1] and Pathria
[2]. Additional relevant textbooks are the same as those that are mentioned also in
Sect. 3.6 (as BE statistics and FD statistics are almost always presented on a similar
footing).
References

1. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)
2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford, 1996)

Chapter 5
Interacting Particle Systems and Phase Transitions
In this chapter, we discuss systems with interacting particles. As we shall see, when
the interactions among the particles are significant, the system exhibits a certain
collective behavior that, in the thermodynamic limit, may be subjected to phase
transitions, i.e., abrupt changes in the behavior of the system in the presence of
a gradual change in an external control parameter, like temperature, pressure, or
magnetic field. The contents of this chapter have a considerable overlap with Chap. 5 of [1], and they are provided in this book too for the sake of completeness.
So far, we have dealt almost exclusively with systems that have additive Hamiltonians, E(x) = Σ_i E(x_i), which means, under the canonical ensemble, that the particles
are statistically independent and there are no interactions among them. In Nature,
of course, this is seldom really the case. Sometimes this is still a reasonably good
approximation, but in other cases, the interactions are appreciably strong and cannot
be neglected. Among the different particles there could be many sorts of mutual
forces, such as mechanical, electrical, or magnetic forces. There could also be inter-
actions that stem from quantum–mechanical effects: as described earlier, fermions
must obey Pauli’s exclusion principle. Another type of interaction stems from the fact
that the particles are indistinguishable, so permutations between them are not con-
sidered as distinct states. For example, referring to BE statistics, had the N particles
been statistically independent, the resulting partition function would be
Z_N(β) = (Σ_r e^{−βε_r})^N
       = Σ_{N_1,N_2,...} δ{Σ_r N_r = N}·(N!/Π_r N_r!)·exp{−β·Σ_r N_r·ε_r},   (5.1.1)

whereas in Eq. (3.2.1), the combinatorial factor, N!/Π_r N_r!, that distinguishes
between the various permutations among the particles, is absent. This introduces
dependency, which means interaction. Indeed, for the ideal boson gas, we have
encountered the effect of Bose–Einstein condensation, which is a phase transition,
and phase transitions can occur only in systems of interacting particles, as will be
discussed in this chapter.1
The simplest forms of deviation from the purely additive Hamiltonian structure are
those that consist, in addition to the individual energy terms, {E(xi )}, also terms that
depend on pairs, and/or triples, and/or even larger cliques of particles. In the case of
purely pairwise interactions, this means a structure like the following:
E(x) = Σ_{i=1}^N E(x_i) + Σ_{(i,j)} ε(x_i, x_j),   (5.2.1)
where the summation over pairs can be defined over all pairs i = j, or over some of
the pairs, according to a given rule, e.g., depending on the distance between particle i
and particle j, and according to the geometry of the system, or according to a certain
graph whose edges connect the relevant pairs of variables (that in turn, are designated
as nodes).
For example, in a one–dimensional array (a lattice) of particles, a customary model
accounts for interactions between neighboring pairs only, neglecting more remote
ones, thus the second term above would be Σ_i ε(x_i, x_{i+1}). A well known special
case of this is that of a polymer or a solid with crystal lattice structure, where, in the
one–dimensional version of the model, atoms are thought of as a chain of masses
connected by springs (see left part of Fig. 5.1), i.e., an array of coupled harmonic
oscillators. In this case, ε(x_i, x_{i+1}) = (1/2)K(x_{i+1} − x_i)², where K is a constant and x_i
is the displacement of the i-th atom from its equilibrium location, i.e., the potential
energies of the springs. In higher dimensional arrays (or lattices), similar interactions
apply, there are just more neighbors to each site, from the various directions (see right
part of Fig. 5.1). These kinds of models will be discussed in the next chapter in some
depth.
In a system where the particles are mobile and hence their locations vary and
have no geometrical structure, like in a gas, the interaction terms are also potential
energies pertaining to the mutual forces (see Fig. 5.2), and these normally depend
solely on the distances ‖r⃗_i − r⃗_j‖:

E(x) = Σ_{i=1}^N ‖p⃗_i‖²/(2m) + Σ_{i≠j} φ(‖r⃗_i − r⃗_j‖).   (5.2.2)
A simple special case is that of hard spheres (billiard balls), without any forces, where

φ(‖r⃗_i − r⃗_j‖) = ∞ for ‖r⃗_i − r⃗_j‖ < 2R,  and  0 for ‖r⃗_i − r⃗_j‖ ≥ 2R,   (5.2.3)
which expresses the simple fact that balls cannot physically overlap. The analysis of
this model can be carried out using diagrammatic techniques (the cluster expansion,
etc.), but we will not get into details in this book.2 To demonstrate, however, the
effect of interactions on the deviation from the equation of state of the ideal gas, we
consider next a simple one–dimensional example.
Example 5.1 (Non–ideal gas in one dimension) Consider a one–dimensional object
of length L that contains N + 1 particles, whose locations are 0 ≡ r0 ≤ r1 ≤ . . . ≤
r N −1 ≤ r N ≡ L, namely, the first and the last particles are fixed at the edges. The
2 The reader can find the derivations in any textbook on elementary statistical mechanics, for exam-
order of the particles is fixed, namely, they cannot be swapped. Let the Hamiltonian
be given by
order of the particles is fixed, namely, they cannot be swapped. Let the Hamiltonian be given by

E(x) = Σ_{i=1}^N φ(r_i − r_{i−1}) + Σ_{i=1}^N p_i²/(2m),   (5.2.4)
where φ is a given potential function designating the interaction between two neigh-
boring particles along the line. The partition function, which is an integral of
the Boltzmann factor pertaining to this Hamiltonian, should incorporate the fact
that the positions {ri } are not independent. It is convenient to change variables to
ξ_i = r_i − r_{i−1}, i = 1, 2, . . . , N, where it should be kept in mind that ξ_i ≥ 0 for all i and Σ_{i=1}^N ξ_i = L. Let us assume that L is an extensive variable, i.e., L = Nξ_0 for some constant ξ_0 > 0. Thus, the partition function is

Z_N(β, L) = (1/h^N)·∫ dp_1 · · · dp_N ∫_{IR_+^N} dξ_1 · · · dξ_N·e^{−β·Σ_{i=1}^N [φ(ξ_i) + p_i²/2m]}·δ(L − Σ_{i=1}^N ξ_i)   (5.2.5)
          = (1/λ^N)·∫_{IR_+^N} dξ_1 · · · dξ_N·e^{−β·Σ_{i=1}^N φ(ξ_i)}·δ(L − Σ_{i=1}^N ξ_i),   (5.2.6)
where λ = h/√(2πmkT). The constraint Σ_{i=1}^N ξ_i = L makes the analysis of the configurational partition function difficult. Let us pass to the corresponding Gibbs ensemble, where instead of fixing the length L, we control it by applying a force f.3 The corresponding partition function now reads

Y_N(β, f) = ∫_0^∞ dL·e^{−βfL}·Z_N(β, L)
          = λ^{−N}·∫_0^∞ dL·e^{−βfL}·∫_{IR_+^N} dξ_1 · · · dξ_N·e^{−β·Σ_{i=1}^N φ(ξ_i)}·δ(L − Σ_{i=1}^N ξ_i)
          = λ^{−N}·∫_{IR_+^N} dξ_1 · · · dξ_N·[∫_0^∞ dL·e^{−βfL}·δ(L − Σ_{i=1}^N ξ_i)]·e^{−β·Σ_{i=1}^N φ(ξ_i)}
          = λ^{−N}·∫_{IR_+^N} dξ_1 · · · dξ_N·exp{−βf·Σ_{i=1}^N ξ_i − β·Σ_{i=1}^N φ(ξ_i)}
          = λ^{−N}·∫_{IR_+^N} dξ_1 · · · dξ_N·exp{−s·Σ_{i=1}^N ξ_i − β·Σ_{i=1}^N φ(ξ_i)}        [s ≡ βf]
          = { (1/λ)·∫_0^∞ dξ·e^{−[sξ + βφ(ξ)]} }^N.   (5.2.7)
With a slight abuse of notation, from now on, we will denote the last expression by
Y N (β, s). Consider now the following potential function
φ(ξ) = ∞ for 0 ≤ ξ ≤ d;   −ε for d < ξ ≤ d + δ;   0 for ξ > d + δ.   (5.2.8)
In words, distances below d are strictly forbidden (e.g., because of the size of the particles), in the range between d and d + δ there is a negative potential −ε, and beyond d + δ the potential is zero.4 Now, for this potential function, the one–dimensional integral above is given by

I = ∫_0^∞ dξ·e^{−[sξ + βφ(ξ)]} = (e^{−sd}/s)·[e^{−sδ}·(1 − e^{βε}) + e^{βε}],   (5.2.9)
and so,
Y_N(β, s) = [e^{−sdN}/(λ^N·s^N)]·[e^{−sδ}·(1 − e^{βε}) + e^{βε}]^N
          = exp{N·(ln[e^{−sδ}·(1 − e^{βε}) + e^{βε}] − sd − ln(λs))}.   (5.2.10)
⟨L⟩ = −∂ln Y_N(β, s)/∂s
    = N·δ·e^{−sδ}·(1 − e^{βε})/[e^{−sδ}·(1 − e^{βε}) + e^{βε}] + N·d + N/s,   (5.2.11)

or, equivalently, ⟨ΔL⟩ = ⟨L⟩ − N·d, which is the excess length beyond the possible minimum, is given by

⟨ΔL⟩ = N/s − N·δ·(e^{βε} − 1)/[e^{βε}·(e^{sδ} − 1) + 1].   (5.2.12)

Thus, recalling that s = βf,

f·⟨ΔL⟩ = N·kT·[1 − βfδ·(e^{βε} − 1)/(e^{βε}·(e^{βfδ} − 1) + 1)],   (5.2.13)
4 Thisis a caricature of the Lennard–Jones potential function φ(ξ) ∝ [(d/ξ)12 − (d/ξ)6 ], which
begins from +∞, decreases down to a negative minimum, and finally increases and tends to zero.
where the last line is obtained after some standard algebraic manipulation. Note that without the potential well of the intermediate range of distances (ε = 0 or δ = 0), the second term in the square brackets disappears and we get a one–dimensional version of the equation of state of the ideal gas (with the volume being replaced by length and the pressure replaced by force). The second term is then a correction term due to the interaction. The attractive potential reduces the product f·⟨ΔL⟩.
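The closed form (5.2.11) is easy to verify numerically. The following sketch (with arbitrarily chosen values of d, δ, ε and f) computes ⟨L⟩ both by numerically differentiating ln I(s) from Eq. (5.2.9) and directly from the closed form, for the square–well potential of Eq. (5.2.8):

```python
import numpy as np
from scipy.integrate import quad

kT, d, delta, eps = 1.0, 1.0, 0.2, 0.5   # arbitrary example values
beta, N, f = 1.0 / kT, 100, 2.0
s = beta * f

def phi(xi):
    # square-well pair potential of Eq. (5.2.8)
    if xi <= d:
        return np.inf
    return -eps if xi <= d + delta else 0.0

def lnI(s):
    # per-particle configurational integral I(s) of Eq. (5.2.9)
    # (the region xi < d contributes nothing since phi = infinity there)
    val, _ = quad(lambda xi: np.exp(-s * xi - beta * phi(xi)), d, 50.0, points=[d + delta])
    return np.log(val)

# <L> = -d ln Y_N / ds = -N d lnI/ds (lambda does not depend on s)
h = 1e-5
L_numeric = -N * (lnI(s + h) - lnI(s - h)) / (2 * h)

# closed form, Eq. (5.2.11)
L_closed = N * delta * np.exp(-s * delta) * (1 - np.exp(beta * eps)) / (
    np.exp(-s * delta) * (1 - np.exp(beta * eps)) + np.exp(beta * eps)) + N * d + N / s
print(L_numeric, L_closed)   # the two values agree
```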
Yet another example of a model, or more precisely, a very large class of models
with interactions, are those of magnetic materials. These models will closely accom-
pany our discussions from this point onward in this chapter. Although few of these
models are solvable, most of them are not. For the purpose of our discussion, a mag-
netic material is one for which the relevant property of each particle is its magnetic
moment. As a reminder, the magnetic moment is a vector proportional to the angular
momentum of a revolving charged particle (like a rotating electron, or a current loop),
or the spin, and it designates the intensity of its response to the net magnetic field that
this particle ‘feels’. This magnetic field is given by the superposition of an externally
applied magnetic field and the magnetic fields generated by the neighboring spins.
Quantum mechanical considerations dictate that each spin, which will be denoted
by si , is quantized, that is, it may take only one out of finitely many values. In the
simplest case to be adopted in our study – two values only. These will be designated
by si = +1 (“spin up”) and si = −1 (“spin down”), corresponding to the same
intensity, but in two opposite directions, one parallel to the magnetic field, and the
other – anti-parallel (see Fig. 5.3). The Hamiltonian associated with an array of spins
s = (s1 , . . . , s N ) is customarily modeled (up to certain constants that, among other
things, accommodate for the physical units) with a structure like this:
N
E(s) = −B · si − Ji j si s j , (5.2.14)
i=1 (i, j)
where B is the externally applied magnetic field and {Ji j } are the coupling constants
that designate the levels of interaction between spin pairs, and they depend on prop-
erties of the magnetic material and on the geometry of the system. The first term
accounts for the contributions of potential energies of all spins due to the magnetic
field, which in general, are given by the inner product B · si , but since each si is either
as said, these boil down to simple products, where only
parallel or anti-parallel to B,
the sign of each si counts. Since P(s) is proportional to e−βE(s) , the spins ‘prefer’
to be parallel, rather than anti-parallel to the magnetic field. The second term in the
above Hamiltonian accounts for the interaction energy. If Ji j are all positive, they
also prefer to be parallel to each other (the probability for this is larger), which is
the case where the material is called ferromagnetic (like iron and nickel). If they are
all negative, the material is antiferromagnetic. In the mixed case, it is called a spin
glass. In the latter, the behavior is rather complicated.
5.2 Models of Interacting Particles 81
The case where all Ji j are equal and the double summation over {(i, j)} is over
nearest neighbors only is called the Ising model. A more general version of it is
called the O(n) model, according to which each spin is an n–dimensional unit vector
si = (si1 , . . . , sin ) (and so is the magnetic field), where n is not necessarily related to
the dimension d of the lattice in which the spins reside. The case n = 1 is then the
Ising model. The case n = 2 is called the XY model, and the case n = 3 is called the
Heisenberg model.
Of course, the above models for the Hamiltonian can (and, in fact, are being)
generalized to include interactions formed also, by triples, quadruples, or any fixed
size p (that does not grow with N ) of spin–cliques.
We next discuss a very important effect that exists in some systems with strong
interactions (both in magnetic materials and in other models): the effect of phase
transitions.
and, as mentioned earlier, this is a ferromagnetic model, where all spins ‘like’ to be
in the same direction, especially when β J is large. In other words, the interactions,
in this case, tend to introduce order into the system. On the other hand, the second
law talks about maximum entropy, which tends to increase the disorder. So there are
two conflicting effects here. Which one of them prevails?
The answer turns out to depend on temperature. Recall that in the canonical
ensemble, equilibrium is attained at the point of minimum free energy f = −T s().
Now, T plays the role of a weighting factor for the entropy. At low temperatures,
the weight of the second term of f is small, and minimizing f is approximately
equivalent to minimizing
, which is obtained by states with a high level of order,
as E(s) = −J (i, j) si s j , in this example. As T grows, however, the weight of the
term −T s() increases, and min f , becomes more and more equivalent to max s(),
which is achieved by states with a high level of disorder (see Fig. 5.4). Thus, the
order–disorder characteristics depend primarily on temperature. It turns out that for
some magnetic systems of this kind, this transition between order and disorder may
be abrupt, in which case, we call it a phase transition. At a certain critical temperature,
called the Curie temperature, there is a sudden transition between order and disorder.
In the ordered phase, a considerable fraction of the spins align in the same direction,
which means that the system is spontaneously magnetized (even without an external
magnetic field), whereas in the disordered phase, about half of the spins are in either
direction, and then the net magnetization vanishes. This happens if the interactions,
or more precisely, their dimension in some sense, is strong enough.
What is the mathematical significance of a phase transition? If we look at the
partition function, Z N (β), which is the key to all physical quantities of interest, then
for every finite N , this is simply the sum of finitely many exponentials in β and
therefore it is continuous and differentiable infinitely many times. So what kind of
abrupt changes could there possibly be in the behavior of this function? It turns out
that while this is true for all finite N , it is no longer necessarily true if we look at the
thermodynamic limit, i.e., if we look at the behavior of
ln Z N (β)
φ(β) = lim . (5.3.2)
N →∞ N
While φ(β) must be continuous for all β > 0 (since it is convex), it need not necessar-
ily have continuous derivatives. Thus, a phase transition, if exists, is fundamentally an
asymptotic property, it may exist in the thermodynamic limit only. While a physical
system is, after all finite, it is nevertheless well approximated by the thermodynamic
limit when it is very large.
The above discussion explains also why a system without interactions, where all
{xi } are i.i.d., cannot have phase transitions. In this case, Z N (β) = [Z 1 (β)] N , and
so, φ(β) = ln Z 1 (β), which is always a smooth function without any irregularities.
For a phase transition to occur, the particles must behave in some collective manner,
which is the case only if interactions take place.
There is a distinction between two types of phase transitions:
• If φ(β) has a discontinuous first order derivative, then this is called a first order
phase transition.
• If φ(β) has a continuous first order derivative, but a discontinuous second order
derivative then this is called a second order phase transition, or a continuous phase
transition.
We can talk, of course, about phase transitions w.r.t. additional parameters other
than temperature. In the above magnetic example, if we introduce back the magnetic
field B into the picture, then Z , and hence also φ, become functions of B too. If we
then look at derivative of
ln Z N (β, B)
φ(β, B) = lim
N →∞ N
⎡ ⎧ ⎫⎤
1 ⎣ ⎨ N ⎬
= lim ln exp β B si + β J si s j ⎦ (5.3.3)
N →∞ N ⎩ ⎭
s i=1 (i, j)
w.r.t. the product (β B), which multiplies the magnetization, i si , at the exponent,
this would give exactly the average magnetization per spin
$ %
1
N
m(β, B) = Si , (5.3.4)
N i=1
and this quantity might not always be continuous. Indeed, as mentioned earlier, below
the Curie temperature there might be a spontaneous magnetization. If B ↓ 0, then
this magnetization is positive, and if B ↑ 0, it is negative, so there is a discontinuity
at B = 0. We shall see this more concretely later on.
84 5 Interacting Particle Systems and Phase Transitions
The most familiar model of a magnetic system with interactions is the one–
dimensional Ising model, according to which
N
N
E(s) = −B si − J si si+1 (5.4.1)
i=1 i=1
N
N
Z N (β, B) = exp β B si + β J si si+1
s i=1 i=1
N
N
= exp h si + K si si+1 h = β B, K = β J
s i=1 i=1
h
N N
= exp (si + si+1 ) + K si si+1 . (5.4.2)
s
2 i=1 i=1
= tr{P N }
= λ1N + λ2N (5.4.4)
5.4 The One–Dimensional Ising Model 85
then clearly,
ln Z N (h, K )
φ(h, K ) = lim = ln λ1 . (5.4.7)
N →∞ N
sinh(β B)
m(β, B) = ) . (5.4.10)
e−4β J + sinh2 (β B)
For β > 0 and B > 0 this is a smooth function, and so, there are no phase transi-
tions and no spontaneous magnetization at any finite temperature.5 However, at the
absolute zero (β → ∞), we get
5 Note, in particular, that for J = 0 (i.i.d. spins) we get paramagnetic characteristics m(β, B) =
tanh(β B), in agreement with the result pointed out in the example of two–level systems, in the
comment that follows Example 2.3.
86 5 Interacting Particle Systems and Phase Transitions
For lattice dimension larger than two, the problem is still open.
It turns out then that what counts for the existence of phase transitions, is not
only the intensity of the interactions (designated by the magnitude of J ), but more
importantly, the “dimensionality” of the structure of the pairwise interactions. If we
denote by n the number of –th order neighbors of every given site, namely, the
number of sites that can be reached within steps from the given site, then what
counts is how fast does the sequence {n } grow, or more precisely, what is the value
of d = lim→∞ lnlnn , which is exactly the ordinary dimensionality for hyper-cubic
lattices. Loosely speaking, this dimension must be sufficiently large for a phase
transition to exist.
To demonstrate this point, we next discuss an extreme case of a model where
this dimensionality is actually infinite. In this model “everybody is a neighbor of
everybody else” and to the same extent, so it definitely has the highest connectivity
possible. This is not quite a physically realistic model, but it is pleasing that it is easy
to solve and that it exhibits a phase transition that is fairly similar to those that exist
in real systems. It is also intimately related to a very popular approximation method
in statistical mechanics, called the mean field approximation. Hence it is sometimes
called the mean field model. It is also known as the Curie–Weiss model or the infinite
range model.
Finally, we should comment that there are other “infinite–dimensional” Ising
models, like the one defined on the Bethe lattice (an infinite tree without a root and
without leaves), which is also easily solvable (by recursion) and it also exhibits phase
transitions [4], but we will not discuss it here.
N
J
E(s) = −B si − si s j . (5.5.1)
i=1
2N i= j
5.5 The Curie–Weiss Model 87
Here, all pairs {(si , s j )} communicate to the same extent, and without any geometry.
The 1/N factor here is responsible for keeping the energy of the system extensive
(linear in N ), as the number of interaction terms is quadratic in N . The factor 1/2
compensates for the fact that the summation over i = j counts each pair twice. The
first observation is the trivial fact that
2
si = si2 + si s j = N + si s j (5.5.2)
i i i= j i= j
where the second equality holds since si2 ≡ 1. It follows then, that our Hamiltonian
is, up to a(n immaterial) constant, equivalent to
N 2
J
N
E(s) = −B si − si
i=1
2N i=1
⎡ 2 ⎤
1 N
J 1 N
= −N ⎣ B · si + si ⎦ , (5.5.3)
N i=1 2 N i=1
thus E(s) depends on s only via the magnetization m(s) = 1
N i si . This fact makes
the C–W model very easy to handle:
J
Z N (β, B) = exp N β B · m(s) + m 2 (s)
s
2
+1
(m) · e N β(Bm+J m /2)
2
=
m=−1
+1
·
e N h 2 ((1+m)/2) · e N β(Bm+J m /2)
2
=
m=−1
& '
· 1+m βm 2 J
= exp N · max h 2 + β Bm + (5.5.4)
|m|≤1 2 2
Fig. 5.5 Graphical solutions of equation m = tanh(β J m): The left part corresponds to the case
β J < 1, where there is one solution only, m ∗ = 0. The right part corresponds to the case β J > 1,
where in addition to the zero solution, there are two non–zero solutions m ∗ = ±m 0
at m = 0 is negative, and therefore it is indeed the maximum (see Fig. 5.6, left part).
Thus, the dominant magnetization is m ∗ = 0, which means disorder and hence no
spontaneous magnetization for T > Tc . On the other hand, when β J > 1, which
means temperatures lower than Tc , the initial slope of the tanh function is larger than
that of the linear function, but since the tanh function cannot take values outside the
interval (−1, +1), the two functions must intersect also at two additional, symmetric,
non–zero points, which we denote by +m 0 and −m 0 (see Fig. 5.5, right part). In this
case, it can readily be shown that the second derivative of ψ(m) is positive at the
origin (i.e., there is a local minimum at m = 0) and negative at m = ±m 0 , which
means that there are maxima at these two points (see Fig. 5.6, right part). Thus, the
dominant magnetizations are ±m 0 , each capturing about half of the probability.
Consider now the case β J > 1, where the magnetic field B is brought back
into the picture. This will break the symmetry of the right graph of Fig. 5.6 and the
corresponding graphs of ψ(m) would be as in Fig. 5.7, where now the higher local
6 Onceagain, for J = 0, we are back to non–interacting spins and then this equation gives the
paramagnetic behavior m = tanh(β B).
5.5 The Curie–Weiss Model 89
Fig. 5.6 The function ψ(m) = h 2 ((1 + m)/2) + β J m 2 /2 has a unique maximum at m = 0 when
β J < 1 (left graph) and two local maxima at ±m 0 , in addition to a local minimum at m = 0, when
β J > 1 (right graph)
Fig. 5.7 The case β J > 1 with a magnetic field B. The left graph corresponds to B < 0 and the
right graph – to B > 0
maximum (which is also the global one) is at m 0 (B) whose sign is as that of B. But
as B → 0, m 0 (B) → m 0 of Fig. 5.6. Thus, we see the spontaneous magnetization
here. Even after removing the magnetic field, the system remains magnetized to the
level of m 0 , depending on the direction (the sign) of B before its removal. Obviously,
the magnetization m(β, B) has a discontinuity at B = 0 for T < Tc , which is a first
order phase transition w.r.t. B (see Fig. 5.8). We note that the point T = Tc is the
Fig. 5.8 Magnetization versus magnetic field: For T < Tc there is spontaneous magnetization:
lim B↓0 m(β, B) = +m 0 and lim B↑0 m(β, B) = −m 0 , and so there is a discontinuity at B = 0
90 5 Interacting Particle Systems and Phase Transitions
boundary between the region of existence and the region of non–existence of a phase
transition w.r.t. B. Such a point is called a critical point. The phase transition w.r.t.
β is of the second order.
Finally, we should mention here an alternative technique that can be used to
analyze this model, which is based on the so called Hubbard–Stratonovich transform.
Specifically, we have the following chain of equalities:
⎧ N 2 ⎫
⎨ N
K ⎬
Z (h, K ) = exp h si + si h = β B, K = β J
s
⎩ 2N ⎭
i=1 i=1
N
⎧ N 2 ⎫
⎨ K ⎬
= exp h si · exp si
s
⎩ 2N ⎭
i=1 i=1
N
*
N N z2 N
= exp h si · dz exp − +z· si
s i=1
2πK IR 2K i=1
*
N N
−N z 2 /(2K )
= dze exp (h + z) si
2πK IR s i=1
* 1 N
N
dze−N z /(2K ) e(h+z)s
2
=
2πK IR s=−1
*
N
dze−N z /(2K ) [2 cosh(h + z)] N
2
=
2πK IR
*
N
=2 ·
N
dz exp{N [ln cosh(h + z) − z 2 /(2K )]}, (5.5.9)
2πK IR
where the passage from the second to the third line follows the use of
+ the, characteristic
variable: If X ∼ N (0, σ 2 ), then eαX = eα σ /2 (in
2 2
function of a Gaussian random
our case, σ 2 = K /N and α = i si ).
The integral in the last line can be shown (see, e.g., [1, Chap. 4]) to be dominated
by e to N times the maximum of the function in the square brackets at the exponent
of the integrand, or equivalently, the minimum of the function
z2
γ(z) = − ln cosh(h + z). (5.5.10)
2K
by equating its derivative to zero, we get the very same equation as m = tanh(β B +
β J m) by setting z = β J m. The function γ(z) is different from the function ψ that we
maximized earlier, but the extremum is the same. This function is called the Landau
free energy.
5.6 Spin Glasses∗ 91
So far we discussed only models where the non–zero coupling coefficients, J = {Ji j }
are equal, thus they are either all positive (ferromagnetic models) or all negative
(antiferromagnetic models). As mentioned earlier, there are also models where the
signs of these coefficients are mixed, which are called spin glass models.
Spin glass models have a much more complicated and more interesting behavior
than ferromagnets, because there might be meta-stable states, due to the fact that not
all spin pairs {(si , s j )} can necessarily be in their preferred mutual polarization. It
might be the case that some of these pairs are “frustrated.” In order to model situations
of amorphism and disorder in such systems, it is customary to model the coupling
coefficients as random variables. This model with random parameters means that
there are now two levels of randomness:
• Randomness of the coupling coefficients J.
• Randomness of the spin configuration s given J, according to the Boltzmann
distribution, i.e.,
- .
N
exp β B i=1 si + (i, j) Ji j si s j
P(s| J) = . (5.6.1)
Z (β, B| J)
However, these two sets of random variables have a rather different stature. The
underlying setting is normally such that J is considered to be randomly drawn once
and for all, and then remain fixed, whereas s keeps varying all the time (according to
the dynamics of the system). At any rate, the time scale along which s varies is much
smaller than that of J. Another difference is that J is normally not assumed to depend
on temperature, whereas s does. In the terminology of physicists, s is considered an
annealed random variable, whereas J is considered a quenched random variable.7
Accordingly, there is a corresponding distinction between annealed averages and
quenched averages.
Let us see what is exactly the difference between the quenched averaging and
the annealed one. If we examine, for instance, the free energy, or the log–partition
function, ln Z (β| J), this is now a random variable because it depends on the random
J. If we denote by
· J the expectation w.r.t. the randomness of J, then quenched
averaging means
ln Z (β| J) J , whereas annealed averaging means ln
Z (β| J) J .
Normally, the relevant average is the quenched one, because the random variable
1
N
ln Z (β| J) typically converges to the same limit as its expectation N1
ln Z (β| J) J
(the so called self–averaging property), but more often than not, it is also much harder
to calculate. Clearly, the annealed average is never smaller than the quenched one
because of Jensen’s inequality, but they sometimes coincide at high temperatures.
The difference between them is that in quenched averaging, the dominant realizations
7 Ina nutshell, annealing means slow cooling, whereas quenching means fast cooling, that causes
the material to freeze without enough time to settle in an ordered structure. The result is then a
disordered structure, modeled by frozen (fixed) random parameters, J.
92 5 Interacting Particle Systems and Phase Transitions
of J are the typical ones, whereas in annealed averaging, this is not necessarily the
case. This follows from the following sketchy consideration. As for the annealed
average, we have:
Z (β| J) J = P( J)Z (β| J)
J
·
≈ Pr{ J : Z (β| J) = e N α } · e N α
α
≈ e−N E(α) · e N α (assuming exponential probabilities)
α
·
= e N maxα [α−E(α)] (5.6.2)
which means that the annealed average is dominated by realizations of the system
with
ln Z (β| J)
≈ α∗ = arg max[α − E(α)], (5.6.3)
N α
1
α = φ(β) ≡ lim
ln Z (β| J) . (5.6.4)
N →∞ N
On the other hand, when it comes to quenched averaging, the random variable
ln Z (β| J) behaves linearly in N , and concentrates strongly around the typical value
N φ(β), whereas other values are weighted by (exponentially) decaying probabilities.
The literature on spin glasses includes many models for the randomness of the
coupling coefficients. We end this part by listing just a few.
• The Edwards–Anderson (E–A) model, where {Ji j } are non–zero for nearest–
neighbor pairs only (e.g., j = i ± 1 in one–dimensional model). According to
this model, {Ji j } are i.i.d. random variables, which are normally modeled to have
a zero–mean Gaussian pdf, or binary symmetric with levels ±J0 . It is customary
to work with a zero–mean distribution if we have a pure spin glass in mind. If the
mean is nonzero, the model has either a ferromagnetic or an anti-ferromagnetic
bias, according to the sign of the mean.
• The Sherrington–Kirkpatrick (S–K) model, which is similar to the E–A model,
except that the support of {Ji j } is extended to include all N (N − 1)/2 pairs, and
not only nearest–neighbor pairs. This can be thought of as a stochastic version of
the C–W model in the sense that here too, there is no geometry, and every spin
interacts with every other spin to the same extent, but here the coefficients are
random, as said.
• The p–spin model, which is similar to the S–K model, but now the interaction
term consists, not only of pairs, but also of triples, quadruples, and so on, up to
cliques of size p, i.e., products si1 si2 · · · si p , where (i 1 , . . . , i p ) exhaust all possible
subsets of p spins out of N . Each such term has a Gaussian coefficient Ji1 ,...,i p with
an appropriate variance.
5.6 Spin Glasses∗ 93
Considering the p–spin model, it turns out that if we look at the extreme case of
p → ∞ (taken after the thermodynamic limit N → ∞), the resulting behavior
turns out to be extremely erratic: all energy levels {E(s)}s∈{−1,+1} N become i.i.d.
Gaussian random variables. This is, of course, a toy model, which has very little to
do with reality (if any), but it is surprisingly interesting and easy to work with. It is
called the random energy model (REM).
References
1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth–Heinemann, Oxford, 1996)
3. L. Onsager, Crystal statistics. I. A two-dimensional model with an order-disorder transition.
Phys. Rev. 65(3–4), 117–149 (1944)
4. R.J. Baxter, Exactly Solved Models in Statistical Mechanics (Academic Press, London, 1982)
5. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
6. M. Kardar, Statistical Physics of Particles (Cambridge University Press, Cambridge, 2007)
7. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics, Part
1, 3rd edn. (Elsevier: Butterworth–Heinemann, New York, 1980)
8. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
Chapter 6
Vibrations in a Solid – Phonons
and Heat Capacity∗
6.1 Introduction
1 In general, there are additional contributions to the heat capacity (e.g., from orientational ordering
in paramagnetic salts, or from conduction electrons in metals, etc.), but here we shall consider only
the vibrational heat capacity.
2 Each atom has 6 degrees of freedom (3 of position + 3 of momentum). Classically, each one of
them contributes one quadratic term to the Hamiltonian, whose mean is kT /2, thus a total mean
energy of 3 kT, which means specific heat of 3 k per atom.
© The Author(s) 2018 95
N. Merhav, Statistical Physics for Electrical Engineering,
DOI 10.1007/978-3-319-62063-3_6
96 6 Vibrations in a Solid – Phonons and Heat Capacity∗
6.2 Formulation
1 2 1 ˙2
3N 3N
K = m ẋi = m ξ (6.2.1)
2 i=1 2 i=1 i
∂ 1 ∂2
(x) = ( x̄) + ξi + ξi ξ j + . . . (6.2.2)
i
∂xi x= x̄ i, j
2 ∂xi ∂x j x= x̄
The first term in this expansion represents the minimum energy when all atoms are
at rest in their mean positions x̄i . We henceforth denote this energy by 0 . The
second term is identically zero because (x) is minimized at x = x̄. The second
order terms of this expansion represent the harmonic component of the vibrations.
If we assume that the overall amplitude of the vibrations is reasonably small, we
can safely neglect all successive terms and then we are working with the so called
harmonic approximation. Thus, we may write
1 ˙2
3N
E(x) = 0 + m ξ + αi j ξi ξ j (6.2.3)
2 i=1 i i, j
6.2 Formulation 97
This Hamiltonian corresponds to harmonic oscillators that are coupled to one another,
as discussed in Sects. 2.2.2 and 5.2, where the off–diagonal terms of the matrix
A = {αi j } designate the pairwise interactions. This Hamiltonian obeys the general
form of Eq. (5.2.1)
While Einstein neglected the off–diagonal terms of A in the first place, Debye
did not. In the following, we present the latter approach, which is more general (and
more realistic), whereas the former will essentially be a special case.
The first idea of the analysis is to transform the coordinates into a new domain where
the components are all decoupled. This means diagonalizing the matrix A. Since A
is a symmetric non–negative definite matrix, it is clearly diagonalizable by a unitary
matrix formed by its eigenvectors, and the diagonal elements of the diagonalized
matrix (which are the eigenvalues of A) are non–negative. Let us denote the new
coordinates of the system by qi , i = 1, . . . , 3N , and the eigenvalues – by 21 mωi2 .
By linearity of the differentiation operation, the same transformation take us from
the vector of velocities {ξ˙i } (of the kinetic component of the Hamiltonian) to the
vector of derivatives of {qi }, which will be denoted {q̇i }. Fortunately enough, since
the transformation is unitary it leaves the components {q̇i } decoupled. In other words,
by the Parseval theorem, the norm of {ξ˙i } is equal to the norm of {q̇i }. Thus, in the
transformed domain, the Hamiltonian reads
1 2
E(q) = 0 + m (q̇i + ωi2 qi2 ). (6.3.1)
2 i
which can be viewed as 3N decoupled harmonic oscillators, each one oscillating in its
individual normal mode ωi . The parameters {ωi } are called characteristic frequencies
or normal modes.
Example 6.1 (One–dimensional ring of springs) If the system has translational sym-
metry and if, in addition, there are periodic boundary conditions, then the matrix A is
circulant, which means that it is always diagonalized by the discrete Fourier transform
(DFT). In this case, qi are the corresponding spatial frequency variables, conjugate
to the location displacement variables ξi . The simplest example of this is a ring of N
one–dimensional springs, as discussed in Sect. 5.2 (see left part of Fig. 5.1), where
the Hamiltonian (in the current notation) is
98 6 Vibrations in a Solid – Phonons and Heat Capacity∗
1 ˙2 1
E(x) = 0 + m ξi + K (ξi+1 − ξi )2 . (6.3.2)
2 i
2 i
The eigenvalues of A are λi = K [1 − cos(2πi/N )], which are simply the DFT
coefficients of the N –sequence formed by any row of A (removing the com-
plex exponential
√ of the phase factor). This means that the normal modes are
ωi = 2K [1 − cos(2πi/N )]/m.
Classically, each of the 3N normal modes of vibration corresponds to a wave
of distortion of the lattice. Quantum–mechanically, these modes give rise to quanta
called phonons, in analogy to the fact that vibrational modes of electromagnetic
waves give rise to photons. There is one important difference, however: while the
number of normal modes in the case of an electromagnetic wave is infinite, here the
number of modes (or the number of phonon energy levels) is finite – there are exactly
3N of them. This gives rise to a few differences in the physical behavior, but at low
temperatures, where the high–frequency modes of the solid become unlikely to be
excited, these differences become insignificant.
The Hamiltonian is then
1
E(n 1 , n 2 , . . .) = 0 + ni + ωi , (6.3.4)
i
2
where the non-negative integers {n i } denote the ‘states of excitation’ of the various
oscillators, or equally well, the occupation numbers of the various phonon levels in
the system. The internal energy is then
∂
E = − ln Z 3N (β)
∂β
∂ 1
=− ln . . . exp −β 0 + ni + ωi
∂β n1 n2 i
2
∂ e−βωi /2
=− ln e−β0
∂β i
1 − e−βωi
6.3 Heat Capacity Analysis 99
1 ∂
= 0 + ln(1 − e−βωi )
ωi +
i
2 i
∂β
1 ωi
= 0 + ωi + . (6.3.5)
i
2 i
1 − e−βωi
Only the last term of the last expression depends on T . Thus, the heat capacity at
constant volume3 is:
To proceed from here, one has to know (or assume something) about the form of the
density g(ω) of {ωi } and then pass from summation to integration. It is this point
where the difference between Einstein’s approach and Debye’s approach starts to
show up.
For Einstein, who assumed that the oscillators do not interact in the original, ξ–
domain, all the normal modes are equal ωi = ω E for all i, because then (assuming
translational symmetry) A is proportional to the identity matrix and then all its
eigenvalues are the same. Thus, in Einstein’s model g(ω) = 3N δ(ω − ω E ), and the
result is
C V = 3N k E(x) (6.3.7)
x 2 ex
E(x) = (6.3.8)
(e x − 1)2
with
ω E E
x= = . (6.3.9)
kT T
3 Exercise 6.1 Why is this the heat capacity at constant volume? Where is the assumption of constant
the observed rate, which is cubic, as described earlier. But at least, Einstein’s theory
predicts the qualitative behavior correctly.
Debye (1912), on the other hand, assumed a continuous density g(ω). He assumed
some cutoff frequency ω D , so that
ωD
g(ω)dω = 3N . (6.3.10)
0
This, together with the previous equation, determines the cutoff frequency to be
1/3
18π 2 ρ
ωD = (6.3.12)
1/v L3 + 2/vT3
C V = 3N k D(x0 ) (6.3.14)
with
ω D D
x0 = = , (6.3.16)
kT T
6.3 Heat Capacity Analysis 101
In other words, Debye’s theory indeed recovers the T 3 behavior at low tempera-
tures, in agreement with experimental evidence. Moreover, the match to experimental
results is very good, not only near T = 0, but across a rather wide range of temper-
atures. In some textbooks, like [1, p. 164, Fig. 6.7], or [2, p. 177, Fig. 7.10], there are
plots of C V as a function of T for certain materials, which show impressive proximity
between theory and measurements.
Exercise 6.2 Extend Debye’s analysis to allow two different cutoff frequencies, ω L
and ωT – for the longitudinal and the transverse modes, respectively.
Exercise 6.3 Calculate the density g(ω) for a ring of springs as described in
Example 6.1. Write an expression for C V as an integral and try to simplify it as
much as you can.
The exposition in this chapter is based largely on the books by Mandl [1] and Pathria
[2]. Additional material appears also in Kardar [3, Sect. 6.2].
102 6 Vibrations in a Solid – Phonons and Heat Capacity∗
References
(E)e−β E
P(E) =
Z (β)
e−β[E−T S(E)]
≈
e−β F
e−β[E−T S(E)]
=
e−β[E ∗ −T S(E ∗ )]
= e−β[E−T S] , E = E − E ∗ ; S = S(E) − S(E ∗ ). (7.1.1)
Now,
∂S 1 ∂2 S
S = E + · · (E)2 + . . .
∂E 2 ∂ E2
1 1 ∂2 S
= · E + · · (E)2 + . . . (7.1.2)
T 2 ∂ E2
7.1 Elements of Fluctuation Theory 105
and so,
−β[E−T S] βT ∂ 2 S
P(E) ≈ e ≈ exp · · (E)2
2 ∂ E2
1 ∂2 S ∗ 2
= exp · · (E − E ) . (7.1.3)
2k ∂ E 2
One should keep in mind that since S(E) is concave, its second derivative is negative,
so in the vicinity of E ∗ , the random variable E(X) is nearly Gaussian with mean E ∗
and variance k/|∂ 2 S/∂ E 2 |. How does this variance scale with N ? Note that since S
and E are both extensive (proportional to N ), the first derivative is intensive, and the
second derivative is proportional to 1/N , so the variance of E is proportional to √N ,
which means that the standard √ deviation of the energy fluctuations
√ scales like N,
and so the relative
variance Var{E}/E scales like 1/ N (cf. the additive case,
where E(X) = i E(X i ) is the sum of N i.i.d. random variables). This asymptotic
Gaussianity should not be a surprise, as we have approximated F(E) by a second
order Taylor series expansion around its minimum, so e−β F(E) is approximated by
an exponentiated quadratic expression which is Gaussian. The same idea can be
used for additional quantities that fluctuate. For example, in the Gibbsian ensemble,
where both E and V fluctuate, the Gibbs free energy is nearly quadratic in (E, V )
around its equilibrium value, and so, this random vector is Gaussian with a covariance
matrix that is proportional to the inverse of the Hessian of S w.r.t. E and V .
Example 7.1 (Ideal gas) In the case of the ideal gas, Eq. (2.2.6) gives
3N 5/3 h 2
E(S, V ) = e2S/(3N k) , (7.1.4)
4πe5/3 mV 2/3
whose Hessian is
2E 2/(N k)2 −2/(N kV )
∇2 E = . (7.1.5)
9 −2/(N kV ) 5/V 2
The term “Brownian motion” is after the botanist Robert Brown, who, in 1828, had
been observing tiny pollen grains in a liquid under a microscope and saw that they
moved in a random fashion, and that this motion was not triggered by currents or
other processes in the liquid, like evaporation, etc. The movement was caused by
frequent collisions with the particles of the liquid. Einstein (1905) was the first to
provide a sound theoretical analysis of Brownian motion on the basis of the “random
walk problem.” Here, we introduce the topic using a formulation due to the French
physicist Paul Langevin (1872–1946), which makes the derivation extremely simple.
Langevin focuses on the motion of a relatively large particle of mass m, located at x(t)
at time t, whose velocity is v(t) = ẋ(t). The particle is subjected to the influence of a
force, composed of two components: one is a slowly varying macroscopic force and
the other is varying rapidly and randomly. The latter has zero mean, but it fluctuates.
In the one–dimensional case, it obeys the differential equation
d(x(t)ẋ(t))
mx(t)ẍ(t) ≡ m − ẋ 2 (t) = −γx(t)ẋ(t) + x(t)Fr (t). (7.2.2)
dt
Taking the expectation, while assuming that, due to the randomness of {Fr (t)}, x(t)
and Fr (t) at time
t,are independent, we have, x(t)Fr (t) = x(t) Fr (t) = 0. Also,
note that m ẋ 2 (t) = kT by the energy equipartition theorem (which applies here
since we are assuming the classical regime), and so, we end up with
d x(t)ẋ(t)
m = kT − γ x(t)ẋ(t) , (7.2.3)
dt
a simple first order differential equation, whose solution is
kT
x(t)ẋ(t) = + Ce−γt/m , (7.2.4)
γ
7.2 Brownian Motion and the Langevin Equation 107
where C is a constant of integration. Imposing the condition that x(0) = 0, this gives
C = −kT /γ, and so
1 d x 2 (t) kT
≡ x(t)ẋ(t) = 1 − e−γt/m , (7.2.5)
2 dt γ
which yields
2kT m
x 2 (t) = t − (1 − e−γt/m ) . (7.2.6)
γ γ
The last equation gives the mean square deviation of a particle away from its origin,
at time t. The time constant
of the dynamics, a.k.a. the relaxation time, is θ = m/γ.
For short times (t θ), x 2 (t) ≈ √
kT t 2 /m, which means that it looks like the particle
is moving at constant velocity of kT /m. For t
θ, however,
2kT
x 2 (t) ≈ · t. (7.2.7)
γ
It should now be pointed out that this linear growth rate of x 2 (t) is a characteristic of
Brownian motion. Here it is only an approximation for t
θ, as for m > 0, {x(t)} is
not a pure Brownian motion. Pure Brownian motion corresponds to the case m = 0
(hence θ = 0), namely, the term m ẍ(t) in
the Langevin equation can be neglected,
t
and then x(t) is simply proportional to 0 Fr (τ )dτ where {Fr (t)} is white noise.
Figure 7.1 illustrates a few realizations of a Brownian motion in one dimension and
in two dimensions.
We may visualize each collision on the pollen grain as that of an impulse, because
the duration of each collision is extremely short. In other words, the position of
the particle x(t) is responding to a sequence of (positive and negative) impulses at
random times. Let
denote the autocorrelation of the random process v(t) = ẋ(t) and let Sv (ω) =
F{Rv (τ )} be the power spectral density.1
Clearly, by the Langevin equation {v(t)} is the response of a linear, time–invariant
linear system
1 1 1 −t/θ
H (s) = = ; h(t) = e u(t) (7.2.9)
ms + γ m(s + 1/θ) m
1 To avoid confusion, one should keep in mind that although Sv (ω) is expressed as a function of
the radial frequency ω, which is measured in radians per second, the physical units of the spectral
density function itself here are Volt2 /Hz and not Volt2 /[radian per
+∞second]. To pass to the latter,
one should divide by 2π. Thus, to calculate power, one must use −∞ Sv (2π f )d f .
108 7 Fluctuations, Stochastic Dynamics and Noise
30 20 10
25
15
20 0
15 10
−10
10 5
5 −20
0 0
−5 −30
−5
−10
−10 −40
−15
−20 −15 −50
0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000
10 15 70
0 10
60
−10 5
50
−20 0
−30 −5 40
−40 −10 30
−50 −15
20
−60 −20
10
−70 −25
−80 −30 0
−90 −35 −10
−10 −5 0 5 10 15 20 25 30 35 −5 0 5 10 15 20 25 30 35 −25−20−15−10 −5 0 5 10 15 20 25
Fig. 7.1 Illustration of a Brownian motion. Upper figures: one–dimensional Brownian motion –
three realizations of x(t) as a function of t. Lower figures: two–dimensional Brownian motion –
three realizations of r (t) = [x(t), y(t)]. All realizations start at the origin
to the random input process {Fr (t)}. Assuming that the impulse process {Fr (t)} is
white noise, then
kT −|τ |/θ
Rv (τ ) = const · h(τ ) ∗ h(−τ ) = const · e−|τ |/θ = Rv (0)e−|τ |/θ = ·e
m
(7.2.10)
and
2kT ω0 1 γ
Sv (ω) = · 2 , ω0 = = (7.2.11)
m ω + ω02 θ m
that is, a Lorentzian spectrum. We see that the relaxation time θ is indeed a measure
of the “memory” of the particle and ω0 = 1/θ plays the role of 3 dB cutoff frequency
of the spectrum of {v(t)}. What is the spectral density of the driving input white
noise process?
Sv (ω) 2kT ω0
S Fr (ω) = = · m 2 (ω 2 + ω02 ) = 2kT mω0 = 2kT γ.
|H (iω)|2 m(ω 2 + ω02 )
(7.2.12)
This result is very important. The spectral density of the white noise is 2kT times
the dissipative coefficient of the system, γ. In other words, the dissipative element of
7.2 Brownian Motion and the Langevin Equation 109
the system is ‘responsible’ for the noise. At first glance, it may seem surprising: why
should the intensity of the (external) driving force Fr (t) be related to the dissipative
coefficient γ? The answer is that they are related via energy balance considerations,
since we are assuming thermal equilibrium. Because the energy waste (dissipation)
is proportional to γ, the energy supply from Fr (t) must also be proportional to γ in
order to balance it.
Example 7.2 (Energy balance for the Brownian particle) The friction force
Ffriction (t) = −γv(t) causes the particle to loose kinetic energy at the rate of
kT kT
Ploss = Ffriction (t)v(t) = −γ v 2 (t) = −γ · =− .
m θ
On the other hand, the driving force Fr (t) injects kinetic energy at the rate of
These principles apply not only to a Brownian particle in a liquid, but to any linear
system that obeys a first order stochastic differential equation with a white noise
input, provided that the energy equipartition theorem applies. An obvious electrical
analogue of this is a simple electric circuit where a resistor R and a capacitor C are
connected to each other (see Fig. 7.2). The thermal noise generated by the resistor
(due to the thermal random motion of the colliding free electrons in the conductor
with extremely short mean time between collisions), a.k.a. the Johnson–Nyquist
noise, is modeled as a current source connected in parallel to the resistor (or as an
equivalent voltage source connected in series to the resistor), which generates a white
noise current process Ir (t). The differential equation pertaining to Kirchoff’s current
law is
2 More rigorously, think of the delta function here as the limit of narrow (symmetric) autocorrelation
V (t)
C V̇ (t) + = Ir (t) (7.2.17)
R
where V (t) is the voltage across the resistor as well as the parallel capacitor. Now,
this is exactly the same differential equation as before, where Ir (t) plays the role of
the driving force, V (t) is replacing ẋ(t), C substitutes m, and 1/R is the dissipative
coefficient instead of γ. Thus, the spectral density of the current is
2kT
S Ir (ω) = . (7.2.18)
R
Alternatively, if one adopts the equivalent serial voltage source model then Vr (t) =
R Ir (t) and so
2kT
SVr (ω) = · R 2 = 2kT R. (7.2.19)
R
This result is studied in every elementary course on random processes.
Finally, note that here we have something similar to the ultraviolet catastro-
phe: white noise has infinite power, which is nonphysical. Once again, this hap-
pens because we have not addressed quantum effects pertaining to high frequencies
(ω
kT ), which as in black–body radiation, cause an exponential decay in the
spectrum beyond a frequency of about kT /. We will get back to this later on in
Sect. 7.5.
In this subsection, we consider the temporal evolution of the probability density func-
tion of x(t) (and not only its second order statistics, as in the previous subsection),
under quite general conditions. The first successful treatment of Brownian motion
was due to Einstein, who as mentioned earlier, reduced the problem to one of dif-
fusion. Einstein’s argument can be summarized as follows: assume that all particles
move independently. The relaxation time is short compared to the observation time,
but long enough for the motions of a particle in two consecutive intervals of θ to
be independent. Let the number of suspended grains be N and let the x coordinate
change by in one relaxation time. is a random variable, symmetrically dis-
tributed around 0. The number of particles dN displaced by more than but less
7.3 Diffusion and the Fokker–Planck Equation 111
This equation tells that the probability of finding the particle around x at time t + δ
is the probability of finding it in x − (for any ) at time t, and then moving
by within duration δ to arrive at x at time t + δ. Here we assume independence
between the location x − at time t and the probability distribution of , as p()
is independent of x − . Since δ is small, we use the Taylor series expansion
∂ρ(x, t)
ρ(x, t + δ) ≈ ρ(x, t) + δ · . (7.3.2)
∂t
Also, for small , we approximate ρ(x − , t), this time to the second order:
∂ρ(x, t) 2 ∂ 2 ρ(x, t)
ρ(x − , t) ≈ ρ(x, t) − · + · . (7.3.3)
∂x 2 ∂x 2
Putting these in Eq. (7.3.1), we get
+∞ +∞
∂ρ(x, t) ∂ρ(x, t)
ρ(x, t) + δ · = ρ(x, t) p()d − p()d +
∂t −∞ ∂x −∞
1 ∂ 2 ρ(x, t) +∞ 2
· p()d (7.3.4)
2 ∂x 2 −∞
or
+∞
∂ρ(x, t) 1 ∂ 2 ρ(x, t)
= · 2 p()d (7.3.5)
∂t 2δ ∂x 2 −∞
∂ρ(x, t) ∂ 2 ρ(x, t)
= D· (7.3.6)
∂t ∂x 2
with the diffusion coefficient being
2 [x(t + δ) − x(t)]2
D = lim = lim . (7.3.7)
δ→0 2δ δ→0 2δ
To solve the diffusion equation, define (κ, t) as the Fourier transform of ρ(x, t)
w.r.t. the variable x, i.e.,
112 7 Fluctuations, Stochastic Dynamics and Noise
+∞
(κ, t) = dx · e−iκx ρ(x, t). (7.3.8)
−∞
(κ, t)
= D(iκ)2 (κ, t) ≡ −Dκ2 (κ, t) (7.3.9)
∂t
δ(x), this means C(κ) = (κ, 0) = 1 for all κ, and so (κ, t) = e−Dκ t . The density
2
e−x /(4Dt)
2
ρ(x, t) = √ , (7.3.10)
4π Dt
and so x(t) is zero–mean Gaussian with variance x 2 (t) = 2Dt.3 Of course, any
other initial location x0 would yield a Gaussian with the same variance 2Dt, but the
mean would be x0 . Comparing the variance 2Dt with (7.2.7), we have D = kT /γ,
which is known as the Einstein relation, widely used in semiconductor physics.
The analysis thus far assumed that = 0, namely, there is no drift to either the
left or the right direction. We next drop this assumption. In this case, the diffusion
equation generalizes to
has the obvious meaning of the average velocity. Equation (7.3.11) is well known as
the Fokker–Planck equation. The diffusion equation and the Fokker–Planck equation
are very central in physics. As mentioned already in Chap. 1, they are fundamental
in semiconductor physics, describing processes of propagation of concentrations of
electrons and holes in semiconductor materials.
Exercise 7.2 Solve the Fokker–Planck equation and show that the solution is
ρ(x, t) = N (vt, 2Dt). Explain the intuition.
3 A point to think about: what is the intuition behind the resultant Gaussianity? We have not assumed
important point to retain is that given the present location x(t) = x, would be
independent of the earlier history of {x(t ), t < t}, which means that {x(t)} should
be a Markov process. Consider then a general continuous–time Markov process
defined by the transition probability density function Wδ (x |x), which denotes the
pdf of x(t + δ) at x given that x(t) = x. A straightforward extension of the earlier
derivation would lead to the following more general form4
∂ρ(x, t) ∂ ∂2
= − [v(x)ρ(x, t)] + 2 [D(x)ρ(x, t)], (7.3.13)
∂t ∂x ∂x
where
+∞
1
v(x) = lim (x − x)Wδ (x |x)dx = E[ẋ(t)|x(t) = x] (7.3.14)
δ→0 δ −∞
+∞
1 1
D(x) = lim (x − x)2 Wδ (x |x)dx = lim E{[x(t + δ) − x(t)]2 |x(t) = x}
δ→0 2δ −∞ δ→0 2δ
(7.3.15)
where n(t) is a Gaussian white noise with spectral density N0 /2. From the solution
of this differential equation, it is easy to see that
t+δ
−aδ −a(t+δ)
x(t + δ) = x(t)e +e dτ n(τ )eaτ .
t
This relation, between x(t) and x(t + δ), can be used to derive the first and the
second moments of [x(t + δ) − x(t)] for small δ, and to find that v(x) = −ax and
D(x) = N0 /4 (Exercise 7.4 Show this). Thus, the Fokker–Planck equation, in this
example, reads
∂ρ(x, t) ∂ N0 ∂ 2 ρ(x, t)
=a· [x · ρ(x, t)] + · .
∂t ∂x 4 ∂x 2
It is easy to check that the r.h.s. vanishes for ρ(x, t) ∝ e−2ax /N0 (independent of t),
2
which means that in equilibrium, x(t) is Gaussian with zero mean and variance
N0 /4a. This is in agreement with the fact that, as x(t) is the response of the lin-
ear system H (s) = 1/(s + a) (or in the time domain, h(t) = e−at u(t)) to n(t), its
variance is indeed (as we know from courses on random processes):
∞ ∞
N0 N0 N0
h 2 (t)dt = e−2at dt = .
2 0 2 0 4a
Note that if x(t) is the voltage across the capacitor in a simple R–C network, then
a = 1/RC, and since E(x) = C x 2 /2, then in equilibrium we have the Boltzmann
weight ρ(x) ∝ exp(−C x 2 /2kT ), which is again, a zero–mean Gaussian. Comparing
the exponents, we immediately obtain N0 /2 = 2kT /RC 2 .
Exercise 7.5 Find the solution ρ(x, t) for all x and t subject to the initial condition
ρ(x, 0) = δ(x).
Now, v(x)ρ(x, t) has the obvious interpretation of the drift current density Jdrift (x, t)
of a ‘mass’ whose density is ρ(x, t) (in this case, it is a probability mass), whereas
∂
Jdiffusion (x, t) = − [D(x)ρ(x, t)]
∂x
is the diffusion current density.5 While the drift current is related to the overall motion
of the object, the diffusion current is associated with the tendency to equalize the
density ρ (which is why it is proportional to the negative density gradient). Thus,
5 Thisgeneralizes Fick’s law that we have seen in Chap. 1. There, D was fixed (independent of x),
and so the diffusion current was proportional to the negative gradient of the density.
7.3 Diffusion and the Fokker–Planck Equation 115
∂ρ
J = ρqe μE + Dqe = 0. (7.3.18)
∂x
This gives ρ(x) ∝ e−μE x/D . On the other hand, under thermal equilibrium, with
potential energy V (x) = qe E x, we also have ρ(x) ∝ e−V /kT = e−qe E x/kT . Upon
comparing the exponents, we readily obtain the Einstein relation, D = kT μ/qe .
Note that μ/qe = |v|/qe |E| is related to the admittance (the dissipative coefficient)
since |v| is proportional to the current and |E| is proportional to the voltage.
Now, let w(x) be an observable (a measurable physical quantity that depends on the
microstate), which has a conjugate force F, so that when F is applied, the change
in the Hamiltonian is E(x) = −F · w(x). Next, suppose that the external force is
time–varying according to a certain waveform {F(t), − ∞ < t < ∞}. As in the
previous subsection, it should be kept in mind that the overall effective force can
be thought of as a superposition of two contributions, a deterministic contribution,
which is the above mentioned F(t) – the external field that the experimentalist applies
on purpose and fully controls, and a random part Fr (t), which pertains to interaction
with the environment (or the heat bath at temperature T ). The former is deterministic
and the latter symbolizes the random, spontaneous thermal fluctuations.6 The random
component Fr (t) is responsible for the randomness of the microstate x and hence also
the randomness of the observable. We shall denote the random variable corresponding
to the observable at time t by W (t). Thus, W (t) is random variable, which takes values
in the set {w(x), x ∈ X }, where X is the space of microstates. When the external
deterministic field is kept fixed (F(t) ≡ const.), the system is expected to converge
into equilibrium and eventually obey the Boltzmann law. While in the section on
Brownian motion, we focused only on the contribution of the random part, Fr (t),
now let us refer only to the deterministic part, F(t). We will get back to the random
part later on.
Let us assume first that F(t) was switched on to a small level
at time −∞, and
then switched off at time t = 0, in other words, F(t) =
U (−t), where U (·) is the
unit step function (a.k.a. the Heaviside function). We are interested in the behavior
of the mean of the observable W (t) at time t, which we shall denote by W (t),
for t > 0. Also, W (∞) will denote the limit of W (t) as t → ∞, namely, the
equilibrium mean of the observable in the absence of an external field. Define now
the (negative) step response function as
W (t) − W (∞)
ζ(t) = lim (7.4.2)
→0
RW (τ ) = lim W (t)W (t + τ ) − W (∞)2 , (7.4.3)
t→∞
RW (τ ) = kT · ζ(τ ). (7.4.4)
The FDT then relates between the linear transient response of the system to a small
excitation (after it has been removed) and the autocovariance of the observable in
6 The random part of the force Fr (t) does not necessarily exist physically, but it is a way to refer
the random thermal fluctuations of the system to the ‘input’ F from a pure signals–and–systems
perspective. For example, think again of the example a Brownian particle colliding with other
particles. The other particles can be thought of as the environment in this case.
7.4 The Fluctuation–Dissipation Theorem 117
Fig. 7.3 Illustration of the response of W (t) to a step function at the input force F(t) =
U (−t).
According to the FDT, the response (on top of the asymptotic level W (∞)) is proportional to the
equilibrium autocorrelation function RW (t), which in turn may decay either monotonically (solid
curve) or in an oscillatory manner (dashed curve)
equilibrium. The transient response, that fades away is the dissipation, whereas the
autocovariance is the fluctuation. Normally, RW (τ ) decays for large τ and so W (t)
converges to W (∞) at the same rate (see Fig. 7.3).
To prove this result, we proceed as follows: first, we have by definition:
−βE(x)
x w(x)e
W (∞) = −βE(x)
. (7.4.5)
xe
e−βE(x)−βE(x)
P(x) = −βE(x)−βE(x) . (7.4.6)
xe
Let Pt (x |x) denote the probability that the system will be at state x at time t
(t > 0) given that it was at state x at time t = 0− . This probability depends on the
dynamical properties of the system (in the absence of the perturbing force). Let us
define W (t)x = x w(x )Pt (x |x), which is the expectation of W (t) (t > 0) given
that the system was at state x at t = 0− . Now,
118 7 Fluctuations, Stochastic Dynamics and Noise
W (t)x e−βE(x)−βE(x)
W (t) = x
−βE(x)−βE(x)
xe
W (t)x e−βE(x)+β
w(x)
= x
−βE(x)+β
w(x) (7.4.8)
xe
and W (∞) can be seen as a special case of this quantity for
= 0 (no perturbation
at all). Thus, ζ(t) is, by definition, nothing but the derivative of W (t) w.r.t.
,
computed at
= 0. I.e.,
−β E (x)+β
w(x)
∂ x W (τ )x e
ζ(τ ) = −β E (x)+β
w(x)
∂
xe
=0
−β E (x) −β E (x) w(x)e−β E (x)
x W (τ )x w(x)e x W (τ )x e x
=β· −β E (x) −β· −β E (x) ·
−β E (x)
xe x e xe
= β RW (τ ), (7.4.9)
where we have used the fact that the dynamics of {Pt (x |x)} preserve the equilibrium
distribution.
Exercise 7.6 Extend the FDT to account for a situation where the force F(t) is not
conjugate to w(x), but to another physical quantity v(x).
While ζ(t) is essentially the response of the system to a (negative) step function
in F(t), then obviously,
0 t <0 0 t <0
h(t) = = (7.4.10)
−ζ̇(t) t ≥ 0 −β ṘW (t) t ≥ 0
would be the impulse response. Thus, we have characterized the “linear” system that
describes the transient response of W (t) − W (∞) to a small input signal in F(t).
It is directly related to the equilibrium autocovariance function of {W (t)}. We can
now express the response of W (t) to a general small signal F(t) that vanishes for
t ≥ 0 to be
0
W (t) − W (∞) ≈ −β ṘW (t − τ )F(τ )dτ
−∞
0
= −β RW (t − τ ) Ḟ(τ )dτ
−∞
= −β RW ⊗ Ḟ, (7.4.11)
7.4 The Fluctuation–Dissipation Theorem 119
where the second passage is from integration by parts and where ⊗ denotes convo-
lution. Indeed, in our first example, Ḟ(t) = −
δ(t) and we are back with the result
W (t) − W (∞) = β
RW (t).
It is instructive to look at these relations also in the frequency domain. Applying
the one sided Fourier transform on both sides of the relation h(t) = −β ṘW (t) and
taking the complex conjugate (i.e., multiplying by eiωt and integrating over t > 0),
we get
∞ ∞ ∞
H (−iω) ≡ h(t)eiωt dt = −β ṘW (t)eiωt dt = βiω RW (t)eiωt dt + β RW (0),
0 0 0
(7.4.12)
where the last step is due to integration by parts. Upon taking the imaginary parts of
both sides, we get:
∞
1
Im{H (−iω)} = βω RW (t) cos(ωt)dt = βωSW (ω), (7.4.13)
0 2
where SW (ω) is the power spectrum of {W (t)} in equilibrium, that is, the Fourier
transform of RW (τ ). Equivalently, we have:
Example 7.4 (An electric circuit) Consider the circuit in Fig. 7.4. The driving force
is the voltage source V (t) and the conjugate variable is Q(t) the electric charge of
the capacitor. The resistors are considered part of thermal environment. The voltage
waveform is V (t) =
U (−t). At time t = 0− , the voltage across the capacitor is
/2
and the energy is 21 C(Vr + 2
)2 , whereas for t → ∞, it is 21 C Vr2 , so the difference
is E = 21 C Vr
= 21 Q r
, neglecting the O(
2 ) term. According to the FDT then,
ζ(t) = 21 β R Q (t), where the factor of 1/2 follows the one in E. This then gives
Im{H (−iω)}
S Q (ω) = 4kT · . (7.4.15)
ω
In this case,
(R1/[iωC]) · C C
H (iω) = = (7.4.16)
R + (R1/[iωC]) 2 + iω RC
for which
ω RC 2
Im{H (−iω)} = (7.4.17)
4 + (ω RC)2
and finally,
4kT RC 2
S Q (ω) = . (7.4.18)
4 + (ω RC)2
Thus, the thermal noise voltage across the capacitor is 4kT R/[4 + (ω RC)2 ]. The
same result can be obtained, of course, using the method studied in courses on
random processes, where the voltage noise spectrum across a certain pair of points
in the circuit is given by 2kT times the real part of the input impedance seen from
these points, which in this case, is given by
1 4kT R
2kT · Re RR = . (7.4.19)
iωC 4 + (ω RC)2
viewing the first bracketed term as the deterministic part, responding to the determin-
istic signal F(t), and the second bracketed term as the random fluctuation, responding
to a random input Fr (t). If we wish to think of our physical system in equilibrium as
a linear(ized) system with input Fr (t) and output W (t) − W (t), then what should
the spectrum of the input process {Fr (t)} be in order to comply with the last result?
Denoting by S Fr (ω) the spectrum of the input process, we know from the basic of
random processes that
Im{H (−iω)}
S Fr (ω) = 2kT · . (7.4.21)
ω · |H (iω)|2
7.4 The Fluctuation–Dissipation Theorem 121
This extends our earlier result concerning the spectrum of the driving white noise in
the case of the Brownian particle, where we obtained a spectral density of 2kT γ.
Example 7.5 (Second order linear system) For a second order linear system (e.g., a
damped harmonic oscillator),
the force Fr (t) is indeed conjugate to the variable W (t), which is the location, as
required by the FDT. Here, we have
1 1
H (iω) = = . (7.4.23)
m(iω)2 + γiω + K K − mω 2 + γiω
In this case,
γω
Im{H (−iω)} = = γω|H (iω)|2 (7.4.24)
(K − mω 2 )2 + γ 2 ω 2
and so, we readily obtain
recovering the principle that the spectral density of the noise process is 2kT times
the dissipative coefficient γ of the system, which is responsible to the irreversible
component. The difference between this and the earlier derivation is that earlier, we
assumed in advance that the input noise process is white and we only computed its
spectral level, whereas now, we have actually shown that at least for a second order
linear system like this, it must be white noise (as far as the classical approximation
holds).
From Eq. (7.4.21), we see that the thermal interaction with the environment, when
referred to the input of the system, has a spectral density of the form that we can
calculate. In general, it does not necessarily have to be a flat spectrum. Consider for
example, an arbitrary electric network consisting of one voltage source (in the role
of F(t)) and several resistors and capacitors. Suppose that our observable W (t) is
the voltage across one of the capacitors. Then, there is a certain transfer function
H (iω) from the voltage source to W (t). The thermal noise process stemming from
all resistors is calculated by considering equivalent noise sources (parallel current
sources or serial voltage sources) attached to each resistor. However, in order to refer
the contribution of these noise sources to the input F(t), we must calculate equivalent
noise sources which are in series with the given voltage source F(t). These equivalent
noise sources will no longer generate white noise processes, in general. For example,
in the circuit of Fig. 7.4, if an extra capacitor C would be connected in series to one
of the resistors, then, the contribution of the right resistor referred to the left one is
not white noise.7
Finally, it should be pointed out that this concept of referring the randomness in
the system to the input is not always feasible, as in general, there is no apparent
guarantee that the r.h.s. of Eq. (7.4.21) is a legitimate spectrum density function,
i.e., that it is non–negative everywhere. In the absence of this condition, the idea of
referring the noise to the input should simply be abandoned.
As promised at the end of Sect. 7.3, we now return to the problematic behavior of the
formula $S_{V_r}(\omega) = 2kTR$ when it comes to very high frequencies, namely, the electrical
analogue of the ultraviolet catastrophe. Very high frequencies mean very short wavelengths,
much shorter than the physical size of the electric circuit.
The remedy to the unreasonable classical results in the high frequency range is
to view the motion of electrons in a resistor as an instance of black–body radiation,
but instead of the three–dimensional case that we studied earlier, this time we are
talking about the one–dimensional case. The difference is mainly the calculation of
the density of states. Consider a long transmission line with characteristic impedance
R and length L, terminated at both ends by resistances R (see Fig. 7.5), so that the
impedances are matched at both ends. Then any voltage wave propagating along
the transmission line is fully absorbed by the terminating resistor without reflection.
The system resides in thermal equilibrium at temperature T . The resistor then can be
thought of as a black–body radiator in one dimension. A voltage wave of the form
V (x, t) = V0 exp[i(κx − ωt)] propagates along the transmission line with velocity
v = ω/κ, which depends on the capacitance and the inductance of the transmission
line per unit length. To assess the number of modes, let us impose the periodic
boundary condition V (0, t) = V (L , t). Then κL = 2πn for any positive integer n.
Thus, there are Δn = LΔκ/2π = LΔω/2πv such modes in the frequency range
between ω = vκ and ω + Δω = v(κ + Δκ). The mean energy of such a mode is
given by
$$\bar{\epsilon}(\omega) = \frac{\hbar\omega}{e^{\hbar\omega/kT} - 1}. \qquad (7.5.1)$$
Since there are Δn = LΔω/(2πv) propagating modes in this frequency range, the
mean energy per unit time (i.e., the power) incident upon a resistor in this frequency
range is
$$P = \frac{1}{L/v}\cdot\frac{L\Delta\omega}{2\pi v}\cdot\bar{\epsilon}(\omega) = \frac{1}{2\pi}\cdot\frac{\hbar\omega\,\Delta\omega}{e^{\hbar\omega/kT} - 1}, \qquad (7.5.2)$$
where L/v in the denominator is the travel time of the wave along the transmis-
sion line. This is the radiation power absorbed by the resistor, which must be
equal to the power emitted by the resistor in this frequency range. Let the thermal
voltage generated by the resistor in the frequency range [ω, ω + Δω] be denoted
by $V_r(t)[\omega, \omega + \Delta\omega]$. This voltage sets up a current of $V_r(t)[\omega, \omega + \Delta\omega]/2R$ and
hence an average power of $\langle V_r^2(t)\rangle[\omega, \omega + \Delta\omega]/4R$. Thus, the balance between the
absorbed and the emitted power gives
$$\frac{\langle V_r^2(t)\rangle[\omega,\omega+\Delta\omega]}{4R} = \frac{1}{2\pi}\cdot\frac{\hbar\omega\cdot\Delta\omega}{e^{\hbar\omega/kT} - 1}, \qquad (7.5.3)$$
which is
$$\frac{\langle V_r^2(t)\rangle[\omega,\omega+\Delta\omega]}{\Delta\omega} = \frac{4R}{2\pi}\cdot\frac{\hbar\omega}{e^{\hbar\omega/kT} - 1} \qquad (7.5.4)$$
or
$$\frac{\langle V_r^2(t)\rangle[f,f+\Delta f]}{\Delta f} = 4R\cdot\frac{hf}{e^{hf/kT} - 1}. \qquad (7.5.5)$$
Taking the limit Δf → 0, the left–hand side becomes the one–sided spectral density
of the thermal noise, and so (returning to the angular frequency domain), the two–
sided spectral density is
$$S_{V_r}(\omega) = 2R\cdot\frac{\hbar\omega}{e^{\hbar\omega/kT} - 1}. \qquad (7.5.6)$$
At low frequencies, this recovers the classical result, since
$$\hbar\omega \ll kT \;\Longrightarrow\; \frac{\hbar\omega}{e^{\hbar\omega/kT} - 1} \approx kT. \qquad (7.5.7)$$
The total power of the thermal noise is
$$\langle V_r^2(t)\rangle = \int_0^\infty \frac{4Rhf\,\mathrm{d}f}{e^{hf/kT} - 1} = \frac{2R(\pi kT)^2}{3h}, \qquad (7.5.8)$$
which is quadratic in T since both the (low frequency) spectral density and the
effective bandwidth are linear in T. The RMS is then
$$V_{\mathrm{RMS}} = \sqrt{\langle V_r^2(t)\rangle} = \sqrt{\frac{2R}{3h}}\cdot\pi kT, \qquad (7.5.9)$$
namely, proportional to temperature and to the square root of the resistance. To assess
the order of magnitude, a resistor of 100 Ω at T = 300 K generates an RMS thermal
noise of about 10 mV when it stands alone (without a circuit that limits the bandwidth
much more drastically than ωc). The equivalent noise bandwidth is
$$B_{\mathrm{eq}} = \frac{2R(\pi kT)^2/3h}{2kTR} = \frac{\pi^2 kT}{3h} = \frac{\pi^2}{3}\cdot f_c. \qquad (7.5.10)$$
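As a quick numerical sanity check of the orders of magnitude in (7.5.8)–(7.5.10), here is a short sketch (not part of the original text) using the standard values of the physical constants and the same R = 100 Ω, T = 300 K as in the example above.

```python
import numpy as np

k = 1.380649e-23    # Boltzmann constant [J/K]
h = 6.62607015e-34  # Planck constant [J*s]

R, T = 100.0, 300.0  # resistance [ohm] and temperature [K]

# Total thermal-noise power and RMS voltage, Eqs. (7.5.8)-(7.5.9)
V2 = 2 * R * (np.pi * k * T) ** 2 / (3 * h)
V_rms = np.sqrt(V2)

# Cutoff frequency f_c = kT/h and equivalent noise bandwidth, Eq. (7.5.10)
f_c = k * T / h
B_eq = np.pi ** 2 * k * T / (3 * h)

print(f"V_RMS ~ {V_rms * 1e3:.1f} mV (millivolt scale for a bare resistor)")
print(f"f_c ~ {f_c:.3e} Hz, B_eq ~ {B_eq:.3e} Hz")
```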
Exercise 7.8 Derive an expression for the autocorrelation function of the Johnson–
Nyquist noise in the quantum mechanical regime.
7.6 Other Noise Sources

In addition to thermal noise, there are other physical mechanisms that generate noise
in Nature in general, and in electronic circuits, in particular. We will only provide short
descriptions here. The interested reader is referred to the course notes “Fundamentals
of Noise Processes” by Y. Yamamoto in the following link:
http://www.nii.ac.jp/qis/first-quantum/e/forStudents/lecture/index.html
These notes contain a very detailed and comprehensive account of many more topics
that evolve around the physics of noise processes in electronic circuitry and other
systems.
Flicker Noise
Flicker noise, also known as 1/ f noise, is a random process with a spectrum that falls
off steadily into the higher frequencies. It occurs in almost all electronic devices, and
results from a variety of effects, though always related to a direct current. According
to the underlying theory, there are fluctuations in the conductivity due to the superpo-
sition of many independent thermal processes of alternate excitation and relaxation
of certain defects (e.g., dopant atoms or vacant lattice sites). This means that every
once in a while, a certain lattice site or a dopant atom gets excited and it moves into
a state of higher energy for some time, and then it relaxes back to the lower energy
state until the next excitation. Each one of these excitation/relaxation processes can
be modeled as a random telegraph signal (RTS) with a different time constant θ (due
to different physical/geometric characteristics) and hence contributes a Lorentzian
spectrum parametrized by θ. The superposition of these processes, whose spectrum
is given by the integral of the Lorentzian function over a range of values of θ (with
a certain weight), gives rise to the 1/ f behavior over a wide range of frequencies.
To see this more concretely in the mathematical language, an RTS X (t) is given by
X (t) = (−1) N (t) , where N (t) is a Poisson process of rate λ. It is a binary signal where
the level +1 can symbolize excitation and the level −1 designates relaxation. Here
the dwell times between jumps are exponentially distributed. The autocorrelation
function is given by
$$\begin{aligned}
\langle X(t)X(t+\tau)\rangle &= \left\langle(-1)^{N(t)+N(t+\tau)}\right\rangle\\
&= \left\langle(-1)^{N(t+\tau)-N(t)}\right\rangle\\
&= \left\langle(-1)^{N(\tau)}\right\rangle\\
&= e^{-\lambda\tau}\cdot\sum_{k=0}^{\infty}(-1)^k\frac{(\lambda\tau)^k}{k!}\\
&= e^{-\lambda\tau}\sum_{k=0}^{\infty}\frac{(-\lambda\tau)^k}{k!}\\
&= e^{-2\lambda\tau} \qquad (7.6.1)
\end{aligned}$$
$$S_X(\omega) = \mathcal{F}\{e^{-2\lambda|\tau|}\} = \frac{4\lambda}{\omega^2 + 4\lambda^2} = \frac{2\theta}{1 + (\omega\theta)^2}, \qquad (7.6.2)$$
where the time constant is θ = 1/(2λ) and the cutoff frequency is ωc = 2λ. Now,
calculating the integral
$$\int_{\theta_{\min}}^{\theta_{\max}} \mathrm{d}\theta\cdot g(\theta)\cdot\frac{2\theta}{1 + (\omega\theta)^2}$$
with a weight function $g(\theta) \propto 1/\theta$, the result is proportional to
$$\frac{1}{\omega}\tan^{-1}(\omega\theta_{\max}) - \frac{1}{\omega}\tan^{-1}(\omega\theta_{\min}).$$
For $\omega \ll 1/\theta_{\max}$, using the approximation $\tan^{-1}(x)\approx x$ ($|x|\ll 1$), this is approx-
imately a constant. For $\omega \gg 1/\theta_{\min}$, using the approximation $\tan^{-1}(x)\approx \frac{\pi}{2} - \frac{1}{x}$
($|x|\gg 1$), this is approximately proportional to $1/\omega^2$. In between, in the range
$1/\theta_{\max} \ll \omega \ll 1/\theta_{\min}$ (assuming that $1/\theta_{\max} \ll 1/\theta_{\min}$), the behavior is according
to
$$\frac{1}{\omega}\left[\frac{\pi}{2} - \frac{1}{\omega\theta_{\max}}\right] - \theta_{\min}
= \frac{1}{\omega}\left[\frac{\pi}{2} - \frac{1}{\omega\theta_{\max}} - \omega\theta_{\min}\right] \approx \frac{\pi}{2\omega},$$
which is the 1/f behavior in this wide range of frequencies. There are several theories
as to why g(θ) should be inversely proportional to θ, but the truth is that they are not perfect,
and the issue of 1/ f noise is not yet perfectly (and universally) understood.
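The 1/f range can also be seen numerically. The following sketch (not part of the original text; the range of time constants is an arbitrary choice) superposes Lorentzians of the form (7.6.2) with the weight g(θ) ∝ 1/θ and estimates the spectral slope in the intermediate band, which should be close to −1.

```python
import numpy as np

# Hypothetical range of RTS time constants; any sufficiently wide range illustrates the effect
theta_min, theta_max = 1e-6, 1.0   # seconds
thetas = np.logspace(np.log10(theta_min), np.log10(theta_max), 2000)
g = 1.0 / thetas                   # weight g(theta) ~ 1/theta (unnormalized)

# Evaluate the superposition integral at frequencies well inside 1/theta_max << omega << 1/theta_min
omegas = np.logspace(1, 5, 50)
d_theta = np.diff(thetas)
S = np.array([np.sum(g[:-1] * 2 * thetas[:-1] / (1 + (w * thetas[:-1]) ** 2) * d_theta)
              for w in omegas])

# Slope of log S versus log omega; should be close to -1, i.e., S ~ 1/omega (1/f behavior)
slope = np.polyfit(np.log(omegas), np.log(S), 1)[0]
print(f"estimated spectral slope: {slope:.2f} (expected about -1)")
```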
Shot Noise
Shot noise in electronic devices consists of unavoidable random statistical fluctu-
ations of the electric current in an electrical conductor. Random fluctuations are
inherent when current flows, as the current is a flow of discrete charges (electrons).
First, some background on Poisson processes: a Poisson process {N (t)}t≥0 is a
continuous–time counting process, starting from N (0) = 0 and incremented by 1 at
random time instants T1 , T2 , . . .. The number of events N (t), counted up to time t,
is distributed according to
$$\Pr\{N(t) = k\} = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}, \qquad k = 0, 1, 2, \ldots \qquad (7.6.3)$$
and events counted at non–overlapping time intervals are statistically independent.
Thus, over a total time interval of t0 seconds, the joint probability of N (t0 ) = k
together with counting event times within [τ1 , τ1 + dτ1 ] × . . . × [τk , τk + dτk ] is
given by
$$\Pr\{N(t_0) = k,\ T_i\in[\tau_i,\tau_i+\mathrm{d}\tau_i],\ i=1,\ldots,k\} = e^{-\lambda t_0}\lambda^k\,\mathrm{d}\tau_1\cdots\mathrm{d}\tau_k.$$
Hence, the density of the epochs, conditioned on $N(t_0) = k$, is
$$f(\tau_1,\ldots,\tau_k|N(t_0) = k) = \frac{e^{-\lambda t_0}\lambda^k}{e^{-\lambda t_0}(\lambda t_0)^k/k!} \qquad (7.6.7)$$
$$= k!\cdot\left(\frac{1}{t_0}\right)^k. \qquad (7.6.8)$$
The current through the device is the superposition of the pulses contributed by the
individual electrons, $I(t) = \sum_{i=1}^{K} i_e(t - \tau_i)$ with $K = N(t_0)$,
where $i_e(\cdot)$ is the (very short) current pulse generated by the passage of a single
electron.9 The DC current is simply the average of this, which is $\lambda t_0 q_e/t_0 = \lambda q_e = I_0$.
The noise, which is associated with the fluctuations around this average, is given by
the second order statistics. Neglecting edge effects, we have:
$$R(s) = E\{I(t)I(t+s)\} = E\{E\{I(t)I(t+s)\,|\,K\}\} \qquad (7.6.10)$$
$$= E\left\{\sum_{i=1}^{K}\sum_{j=1}^{K} i_e(t - \tau_i)\,i_e(t + s - \tau_j)\right\} \qquad (7.6.11)$$
9 Note that i e (t) integrates to qe , and since it is a very short pulse, it is nearly qe δ(t) for a passage
at time t = 0.
$$= E\left\{\sum_{i=1}^{K}\frac{1}{t_0}\int_0^{t_0} i_e(t + s - \theta)\,i_e(t - \theta)\,\mathrm{d}\theta\right\} + \qquad (7.6.12)$$
$$E\left\{\sum_{i\neq j}\frac{1}{t_0^2}\int_0^{t_0} i_e(t + s - \theta)\,\mathrm{d}\theta\cdot\int_0^{t_0} i_e(t - \theta)\,\mathrm{d}\theta\right\} \qquad (7.6.13)$$
$$= \frac{E\{K\}}{t_0}\,R_e(s) + \frac{E\{K^2 - K\}}{t_0^2}\cdot q_e^2 \qquad (7.6.14)$$
$$= \frac{\lambda t_0}{t_0}\cdot R_e(s) + \frac{\lambda^2 t_0^2}{t_0^2}\,q_e^2 \qquad (7.6.15)$$
$$= \frac{I_0}{q_e}\cdot R_e(s) + I_0^2, \qquad (7.6.16)$$
where
$$R_e(s) = \int_{-\infty}^{+\infty} i_e(t)\,i_e(t + s)\,\mathrm{d}t. \qquad (7.6.17)$$
Now, the second term, $I_0^2$, is the contribution of the pure DC component, i.e., the
(stationary) average current. The first term is the fluctuation noise. Note that for
$i_e(t) = q_e\delta(t)$, we have $R_e(s) = q_e^2\delta(s)$, and so, the (flat) spectrum of the noisy part
is
$$S_{\mathrm{shot}}(\omega) = \frac{I_0}{q_e}\cdot q_e^2 = I_0 q_e,$$
or $S(\omega) = 2I_0 q_e$ for the single–sided spectrum. A few comments are in order:
1. By measuring the noise intensity in a diode, one can find qe experimentally.
2. In the derivation above, we assumed that i e (t) is proportional to the Dirac delta
function, which is an idealization. For a general pulse shape, Sshot (ω) would
become proportional to the Fourier transform of Re (s) as defined in (7.6.17).
Equivalently, this can be thought of as letting the white noise process derived
above undergo a linear filter whose impulse response is i e (t).
3. The result applies as long as I0 is not too large. For a strong DC current I0, there
is another effect that kicks in, namely, the space–charge effect: when a large number of
electrons cross at the same time, they create a space charge that interferes with
the emission of additional electrons, and this causes the shot noise spectral level
to be smaller than predicted by the above derivation.
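As an illustration of the result S(ω) = 2I₀qₑ, here is a minimal simulation sketch (not part of the original text; the DC current, sampling rate and observation time are arbitrary choices). It generates a Poisson train of charge impulses and compares the measured noise level of the current with the single-sided shot-noise formula.

```python
import numpy as np

rng = np.random.default_rng(0)

q_e = 1.602176634e-19   # electron charge [C]
I0 = 1e-9               # assumed DC current [A]
lam = I0 / q_e          # electron arrival rate [1/s]

fs, T = 1e10, 1e-4      # sampling rate [Hz] and observation time [s]
n = int(fs * T)

# Poisson impulse train: number of arrivals per sample is Poisson(lam/fs);
# each arrival deposits a charge q_e within one sample of duration 1/fs
counts = rng.poisson(lam / fs, size=n)
I = counts * q_e * fs   # current signal [A]

# One-sided PSD of the fluctuations via a crude periodogram (DC bin excluded)
x = I - I.mean()
psd = 2 * np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
print(f"measured noise level ~ {psd[1:].mean():.3e} A^2/Hz")
print(f"theory 2*I0*q_e      = {2 * I0 * q_e:.3e} A^2/Hz")
```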
Burst Noise
Burst noise consists of sudden step–like transitions between two or more levels (non-
Gaussian), as large as several hundred microvolts, at random and unpredictable times.
Each shift in offset voltage or current lasts for several milliseconds, and the intervals
between pulses tend to be in the audio range (less than 100 Hz), leading to the term
popcorn noise for the popping or crackling sounds it produces in audio circuits. Burst
noise is customarily modeled as an RTS and therefore, another synonym for burst
noise is RTS noise. Accordingly, it has a Lorentzian spectrum, similar to (7.6.2):
$$S_{\mathrm{burst}}(\omega) \propto \frac{1}{1 + (\omega/\omega_0)^2}, \qquad (7.6.19)$$
which means that the spectrum is nearly flat at low frequencies (compared to the
cutoff frequency ω0 ) and nearly proportional to 1/ω 2 for high frequencies.
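Since burst noise is modeled as an RTS, the following short simulation sketch (not part of the original text; the switching rate and time step are arbitrary) generates X(t) = (−1)^N(t) and checks that its empirical autocorrelation follows e^{−2λτ}, the result (7.6.1) underlying both the flicker-noise building block (7.6.2) and the Lorentzian shape (7.6.19).

```python
import numpy as np

rng = np.random.default_rng(2)

lam = 5.0                  # Poisson switching rate [1/s] (arbitrary choice)
dt, n = 1e-3, 1_000_000    # time step [s] and number of samples

# Random telegraph signal X(t) = (-1)^N(t), with N(t) a Poisson process of rate lam
jumps = rng.poisson(lam * dt, size=n)    # number of switchings in each time step
X = (-1.0) ** np.cumsum(jumps)

# Empirical autocorrelation at a few lags versus exp(-2*lam*tau), Eq. (7.6.1)
for k in (0, 50, 100, 200, 400):
    tau = k * dt
    emp = np.mean(X[: n - k] * X[k:]) if k else np.mean(X * X)
    print(f"tau={tau:.2f}  empirical={emp:+.3f}  theory={np.exp(-2 * lam * tau):+.3f}")
```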
Avalanche Noise
Avalanche noise is the noise produced when a junction diode is operated at the onset
of avalanche breakdown, a semiconductor junction phenomenon in which carriers
in a high voltage gradient develop sufficient energy to dislodge additional carriers
through physical impact, creating ragged current flows.
Parts of the material in this chapter are based on Beck [1, Chaps. 6 and 9] and Reif
[2, Chap. 15]. For additional recommended reading, the reader is referred to van
Kampen [3], Risken [4], and Sethna [5, Chap. 10].
References
1. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, London, 1976)
2. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
3. N.G. van Kampen, Stochastic Processes in Physics and Chemistry (North Holland, Amsterdam,
1992)
4. H. Risken, The Fokker–Planck Equation - Methods of Solution and Applications, 2nd edn.
(Springer, Berlin, 1989)
5. J.P. Sethna, Statistical Mechanics: Entropy, Order Parameters, and Complexity (Oxford Uni-
versity Press, Oxford, 2006)
Chapter 8
A Brief Touch on Information Theory∗
Our last topic in this book consists of a very brief description of the relation between
statistical physics and information theory, a research field pioneered by Claude
Elwood Shannon (1916–2001), whose seminal paper “A Mathematical Theory of
Communication” (1948) established the cornerstone of this field.
In a nutshell, information theory is a science that focuses on the fundamental limits,
on the one hand, and the achievable performance, on the other hand, concerning
various information processing tasks, including most notably:
1. Data compression (lossless/lossy).
2. Error correction coding (coding for protection against errors due to channel noise).
3. Encryption.
There are also additional tasks of information processing that are considered to belong
under the umbrella of information theory, like: signal detection, estimation (parameter
estimation, filtering/smoothing, prediction), information embedding, process
simulation, extraction of random bits, information relaying, and more.
Core information theory, which is called Shannon theory in the jargon of the
professionals, is about coding theorems. It is associated with the development of
computable formulas that characterize the best performance that can possibly be
achieved in these information processing tasks under some (usually simple) assump-
tions on the probabilistic models that govern the data, the channel noise, the side
information, the jammers if applicable, etc. While in most cases, this theory does not
suggest constructive communication systems, it certainly provides insights concern-
ing the features that an optimal (or nearly optimal) communication system must have.
Shannon theory serves, first and foremost, as the theoretical basis for modern digital
communication engineering. That being said, much of the modern research activity
in information theory revolves not only around Shannon theory, but also around the
never-ending efforts to develop methodologies (mostly, specific code structures and
8.2 Entropy in Information Theory and Statistical Physics

Perhaps the first relation that crosses one’s mind is that in both fields there is a
fundamental notion of entropy. Actually, in information theory, the term entropy was
coined in the footsteps of the thermodynamic/statistical–mechanical entropy. Throughout
this book, we have already seen three (seemingly) different forms of the entropy: the
first is the thermodynamic entropy, defined in its differential form as
$$\delta S = \delta Q/T, \qquad (8.2.1)$$
which was first introduced by Clausius in 1850. The second is the statistical entropy
$$S = k\ln\Omega, \qquad (8.2.2)$$
which was defined by Boltzmann in 1872. The third is yet another formula for the
entropy – the Gibbs formula for the entropy of the canonical ensemble:
$$S = -k\sum_x P(x)\ln P(x) = -k\langle\ln P(x)\rangle. \qquad (8.2.3)$$
In information theory, the entropy of a random variable X drawn according to P is defined
as $H = -\sum_x P(x)\log_2 P(x)$,
namely, the same expression as above exactly, just without the factor k and with the
base of the logarithm being 2 rather than e. Indeed, this clear analogy was recognized
already by Shannon and von Neumann. According to a well–known anecdote, von
Neumann advised Shannon to adopt this term because it would provide him with
“... a great edge in debates because nobody really knows what entropy is anyway.”
What is the information–theoretic meaning of entropy? It turns out that it has many
information–theoretic meanings, but the most fundamental one concerns optimum
compressibility of data. Suppose that we have a string of N i.i.d. random variables,
x1 , x2 , . . . , x N , taking values in a discrete set, say, the components of the microstate
in a quantum system of non–interacting particles, and we want to represent the
microstate information digitally (in bits) as compactly as possible, without losing
any information – in other words, we require the ability to fully reconstruct the data
from the compressed binary representation. How short can this binary representation
be?
Let us look at the following example. Suppose that each xi takes values in the set
{A, B, C, D}, independently with probabilities
$$\Pr\{A\} = \tfrac{1}{2}, \quad \Pr\{B\} = \tfrac{1}{4}, \quad \Pr\{C\} = \Pr\{D\} = \tfrac{1}{8}.$$
Clearly, when translating the letters into bits, the naive approach would be to say
the following: we have 4 letters, so it takes 2 bits to distinguish between them, by
mapping, say lexicographically, as follows:
$$A \to 00, \quad B \to 01, \quad C \to 10, \quad D \to 11.$$
This would mean representing the list of x’s using 2 bits per symbol. This is very
simple. But is this the best thing one can do?
It turns out that the answer is negative. Intuitively, if we can assign variable–
length code-words to the various letters, using shorter code-words for more probable
symbols and longer code-words for the less frequent ones, we might be able to gain
something. In our example, A is most probable, while C and D are the least probable,
so how about the following solution:
$$A \to 0, \quad B \to 10, \quad C \to 110, \quad D \to 111.$$
The average number of bits per symbol is then
$$\frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = 1.75.$$
We have improved the average bit rate by 12.5%. This is fine, but is this the best one
can do or can we improve even further?
It turns out that this time, the answer is affirmative. Note that in this solution,
each letter has a probability of the form $2^{-\ell}$ ($\ell$ – a positive integer) and the length
of the assigned code-word is exactly $\ell$ (for A, $\ell = 1$, for B, $\ell = 2$, and for C
and D, $\ell = 3$). In other words, the length of the code-word for each letter is the
negative logarithm of its probability, so the average number of bits per symbol is
$\sum_{x\in\{A,B,C,D\}} P(x)[-\log_2 P(x)]$, which is exactly the entropy H of the information
source. One of the basic coding theorems of information theory tells us that we
cannot compress to any coding rate below the entropy and still expect to be able to
reconstruct the x’s perfectly. But why is this true?
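Before turning to that question, here is a tiny numerical check (a sketch, not part of the original text; it assumes the code-word assignment suggested above) that the variable-length code attains exactly the source entropy for this dyadic distribution.

```python
import numpy as np

# The source of the example and the assumed prefix code A->0, B->10, C->110, D->111
P = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

entropy = -sum(p * np.log2(p) for p in P.values())   # H = 1.75 bits/symbol
avg_len = sum(P[x] * len(code[x]) for x in P)        # average code-word length

print(f"entropy H      = {entropy:.3f} bits/symbol")
print(f"average length = {avg_len:.3f} bits/symbol")  # coincides with H for this source
```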
We will not get into a rigorous proof of this statement, but we will make an
attempt to give a statistical–mechanical insight into it. Consider the microstate x =
(x1, . . . , xN) and let us think of the probability function
$$P(x_1,\ldots,x_N) = \prod_{i=1}^{N} P(x_i) = \exp\left\{-(\ln 2)\sum_{i=1}^{N}\log_2[1/P(x_i)]\right\} \qquad (8.2.5)$$
as a Boltzmann–Gibbs distribution with inverse temperature $\beta = \ln 2$ and ‘energies’
$\epsilon(x_i) = \log_2[1/P(x_i)]$. By the law of large numbers, for large N,
$$\frac{1}{N}\sum_{i=1}^{N}\epsilon(x_i) \approx \langle\epsilon(x_i)\rangle = -\langle\log_2 P(x_i)\rangle = H, \qquad (8.2.6)$$
so the average ‘internal energy’ is NH. It is safe to consider, instead, the corresponding
microcanonical ensemble, which is equivalent as far as macroscopic averages go. In
the microcanonical ensemble, we would then have:
$$\frac{1}{N}\sum_{i=1}^{N}\epsilon(x_i) = H \qquad (8.2.7)$$
for every realization of x. How many bits would it take us to represent x in this micro-
canonical ensemble? Since all x’s are equiprobable in the microcanonical ensemble,
we assign to all x’s binary code-words of the same length, call it L. In order to
have a one–to–one mapping between the set of accessible x’s and binary strings of
representation, 2 L , which is the number of binary strings of length L, should be no
less than the number of microstates {x} of the microcanonical ensemble. Thus,
$$L \ge \log_2\left|\left\{\mathbf{x}: \sum_{i=1}^{N}\epsilon(x_i) = NH\right\}\right| = \log_2\Omega(NH), \qquad (8.2.8)$$
but the r.h.s. is exactly related (up to a constant factor) to Boltzmann’s entropy
associated with ‘internal energy’ at the level of NH. Now, observe that the free
energy of the original, canonical ensemble, which is zero (as $Z = \sum_{\mathbf{x}} P(\mathbf{x}) = 1$), is related
to the entropy $\ln\Omega(NH)$ via the Legendre relation $\ln Z \approx \ln\Omega - \beta E$, which is
$$0 \approx \ln\Omega(NH) - (\ln 2)\cdot NH, \qquad (8.2.9)$$
and so,
$$\ln\Omega(NH) \approx NH\ln 2 \qquad (8.2.10)$$
or
$$\log_2\Omega(NH) \approx NH, \qquad (8.2.11)$$
which means that the length of the binary representation essentially cannot be less
than NH, namely, a compression rate of H bits per component of x. So, we have
seen that the entropy has a very concrete information–theoretic meaning, and in fact,
it is not the only one, but we will not delve into this any further here.
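To illustrate the counting argument numerically, here is a short sketch (not part of the original text) for the {A, B, C, D} source above: it computes (1/N)·log₂ of the number of sequences having the typical composition (N/2 A's, N/4 B's, N/8 C's, N/8 D's) and shows that it approaches H = 1.75 as N grows, in the spirit of (8.2.8)–(8.2.11).

```python
from math import comb, log2

def log2_typical_count(N):
    """log2 of the number of length-N sequences with the typical composition (N divisible by 8)."""
    nA, nB, nC = N // 2, N // 4, N // 8
    # multinomial coefficient N! / (nA! nB! nC! nD!) built from binomial coefficients
    count = comb(N, nA) * comb(N - nA, nB) * comb(N - nA - nB, nC)
    return log2(count)

for N in (8, 64, 512, 4096):
    print(N, round(log2_typical_count(N) / N, 4))   # approaches H = 1.75 as N grows
```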
8.3 Statistical Physics of Optimum Message Distributions

The capacity of a communication channel, in bits per second, can be defined as
$$C = \lim_{E\to\infty}\frac{\log_2 M(E)}{E},$$
where M(E) is the number of distinct messages (and log2 of this is the number of
bits) that can be transmitted over a time interval of E seconds. Over a duration of E
seconds, L information symbols are conveyed, so that the average transmission time
per symbol is σ = E/L seconds per symbol. In the absence of any constraints on
the structure of the encoded messages, M(E) = r L = r E/σ , where r is the channel
input–output alphabet size. Thus, C = (log r )/σ bits per second.
Consider now the thermodynamic limit of L → ∞. Suppose that the L symbols
of duration E form N words, where by ‘word’, we mean a certain variable–length
string of channel symbols. The average transmission time per word is then $\bar\tau = E/N$.
Suppose further that the code defines a certain set of word transmission times:
word number i takes $\tau_i$ seconds to transmit. What is the optimum allocation of
word probabilities {Pi} that would support full utilization of the channel capacity?
Equivalently, given the probabilities {Pi}, what are the optimum transmission times
{$\tau_i$}? For simplicity, we will assume that {$\tau_i$} are all distinct. Suppose that each word
appears in the message $N_i = NP_i$ times, where $P_i$ is its relative frequency. The number
of distinct messages with a given composition $(N_1, N_2, \ldots)$ is
$$\Omega(N) = \frac{N!}{\prod_i N_i!} \doteq \exp\{N\cdot H(\mathbf{P})\}, \qquad (8.3.2)$$
and the total number of messages of duration E is obtained by summing $\Omega(N)$ over all
compositions that comply with $\sum_i N_i\tau_i = E$.
This sum is dominated by the maximum term, namely, the maximum–entropy assign-
ment of relative frequencies
$$P_i = \frac{e^{-\beta\tau_i}}{Z(\beta)} \qquad (8.3.4)$$
where β > 0 is a Lagrange multiplier chosen such that $\sum_i P_i\tau_i = \bar\tau$, which gives
$$\tau_i = -\frac{\ln[P_i Z(\beta)]}{\beta}. \qquad (8.3.5)$$
For β = 1, this is in agreement with our earlier observation that the optimum message
length assignment in variable–length lossless data compression is according to the
negative logarithm of the probability.
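As an illustration of (8.3.4)–(8.3.5), the following sketch (not part of the original text; the transmission times and the target average are arbitrary choices) solves numerically for the Lagrange multiplier β that matches a prescribed average transmission time, and prints the resulting maximum-entropy word probabilities.

```python
import numpy as np

# Hypothetical word transmission times [s] and a target average time per word
tau = np.array([1.0, 2.0, 3.0, 5.0])
tau_bar = 2.0

def mean_tau(beta):
    """Average transmission time under P_i = exp(-beta*tau_i)/Z(beta), Eq. (8.3.4)."""
    w = np.exp(-beta * tau)
    return np.sum(tau * w) / np.sum(w)

# Solve sum_i P_i*tau_i = tau_bar for beta by bisection (mean_tau decreases in beta)
lo, hi = 0.0, 50.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_tau(mid) > tau_bar else (lo, mid)
beta = 0.5 * (lo + hi)

P = np.exp(-beta * tau)
P /= P.sum()
print("beta =", round(beta, 4), " P =", np.round(P, 4), " mean time =", round(mean_tau(beta), 4))
```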
Suppose now that {$\tau_i$} are kept fixed and consider a small perturbation in Pi,
denoted dPi. Then
$$\begin{aligned}
\mathrm{d}\bar\tau &= \sum_i \tau_i\,\mathrm{d}P_i\\
&= -\frac{1}{\beta}\sum_i (\mathrm{d}P_i)\ln[P_i Z(\beta)]\\
&= -\frac{1}{\beta}\sum_i (\mathrm{d}P_i)\ln P_i - \frac{1}{\beta}\sum_i (\mathrm{d}P_i)\ln Z(\beta)\\
&= -\frac{1}{\beta}\sum_i (\mathrm{d}P_i)\ln P_i\\
&= \frac{1}{k\beta}\,\mathrm{d}\left(-k\sum_i P_i\ln P_i\right)\\
&= T\,\mathrm{d}s, \qquad (8.3.6)
\end{aligned}$$
where we have defined T = 1/(kβ) and $s = -k\sum_i P_i\ln P_i$. The free energy per
particle is given by
$$f = \bar\tau - Ts = -kT\ln Z, \qquad (8.3.7)$$
which is related to the redundancy of the code. In [2], there is also an extension of this
setting to the case where N is not fixed, with correspondence to the grand–canonical
ensemble.
References
1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. H. Reiss, Thermodynamic-like transformations in information theory. J. Stat. Phys. 1(1), 107–
131 (1969)
3. H. Reiss, C. Huang, Statistical thermodynamic formalism in the solution of information theory
problems. J. Stat. Phys. 3(2), 191–211 (1971)
4. M. Mézard, A. Montanari, Information, Physics, and Computation (Oxford University Press,
Oxford, 2009)