
Neri Merhav

Statistical Physics for Electrical Engineering
Neri Merhav
The Andrew and Erna Viterbi Faculty
of Electrical Engineering
Technion—Israel Institute of Technology
Technion City, Haifa
Israel

ISBN 978-3-319-62062-6 ISBN 978-3-319-62063-3 (eBook)


DOI 10.1007/978-3-319-62063-3
Library of Congress Control Number: 2017945248

© The Author(s) 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This short book is based on lecture notes of a course on statistical physics and thermodynamics, which is oriented, to a certain extent, toward electrical engineering students. The course has been taught in the Electrical Engineering Department of the Technion (Haifa, Israel) since 2013. The main body of the book is devoted to statistical physics, whereas much less emphasis is given to the thermodynamics part. In particular, the idea is to let the important results of thermodynamics (most notably, the laws of thermodynamics) be obtained as conclusions from the derivations in statistical physics.
Beyond the variety of central topics in statistical physics that are important to the general scientific education of the electrical engineering student, special emphasis is devoted to subjects that are specifically vital to the engineering education. These include, first of all, quantum statistics, like the Fermi–Dirac distribution, as well as diffusion processes, both of which are fundamental for a deep understanding of semiconductor devices. Another important issue for the electrical engineering student is to understand mechanisms of noise generation and stochastic dynamics in physical systems, most notably, in electric circuitry. Accordingly, the fluctuation–dissipation theorem of statistical mechanics, which is the theoretical basis for understanding thermal noise processes in systems, is presented from a signals-and-systems point of view, in a way that will hopefully be understandable and useful for an engineering student, and well connected to other important courses taken by students of electrical engineering, such as courses on random processes. The quantum regime, in this context, is important too and hence covered as well. Finally, we touch very briefly upon some relationships between statistical mechanics and information theory, which is the theoretical basis for communications engineering, and demonstrate how the statistical–mechanical approach can be useful for the study of information-theoretic problems. These relationships are further explored, in a much deeper manner, in [1].
In the table of contents below, chapters and sections marked by asterisks can be skipped without loss of continuity.

Technion City, Haifa, Israel Neri Merhav


Reference

1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory, vol. 6, no. 1–2, pp. 1–212, 2009.
Acknowledgements

I would first like to express my deep gratitude to several colleagues in the Technion
Physics Department, including Profs. Dov Levine, Shmuel Fishman, and Yariv
Kafri, for many fruitful discussions, and for relevant courses that they have
delivered and that I have listened to. I have certainly learned a lot from them.
I would like to thank Profs. Nir Tessler and Baruch Fischer, of my department, for
their encouragement to develop a statistical physics course for our students. I am
also grateful to Prof. Yuval Yaish of my department, who has been teaching the
course too (in alternate years), for sharing with me his thoughtful ideas about the
course. The lecture notes of the course have served as the basis for this book.
Finally, I thank my dear wife, Ilana, and a student of mine, Mr. Aviv Lewis, for
their useful comments on the English grammar, typographical errors, and style.

Contents

1 Kinetic Theory and the Maxwell Distribution . . . . . . . . . . . . . . . . . . 1


1.1 The Statistical Nature of the Ideal Gas . . . . . . . . . . . . . . . . . . . . . 2
1.2 Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Dynamical Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 12
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Elementary Statistical Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 Basic Postulates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Statistical Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 The Microcanonical Ensemble . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 The Canonical Ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.3 The Grand–Canonical Ensemble and the Gibbs Ensemble . . . . . . . . 38
2.3 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 44
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3 Quantum Statistics – The Fermi–Dirac Distribution . . . . . . . . . . . . . 47
3.1 Combinatorial Derivation of the FD Statistics . . . . . . . . . . . . . . . . 48
3.2 FD Statistics from the Grand–Canonical Ensemble . . . . . . . . . . . . 50
3.3 The Fermi Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Useful Approximations of Fermi Integrals . . . . . . . . . . . . . . . . . . . 54
3.5 Applications of the FD Distribution . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.1 Electrons in a Solid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.5.2 Thermionic Emission . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.3 Photoelectric Emission . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.6 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63


4 Quantum Statistics – The Bose–Einstein Distribution . . . . . . . . . . . . 65


4.1 Combinatorial Derivation of the BE Statistics . . . . . . . . . . . . . . . . 65
4.2 Derivation Using the Grand–Canonical Ensemble . . . . . . . . . . . . . 66
4.3 Bose–Einstein Condensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Black–Body Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.5 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 73
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5 Interacting Particle Systems and Phase Transitions . . . . . . . . . . . . . 75
5.1 Introduction – Sources of Interaction . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Models of Interacting Particles . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.3 A Qualitative Discussion on Phase Transitions . . . . . . . . . . . . . . . 81
5.4 The One–Dimensional Ising Model . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 The Curie–Weiss Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 Spin Glasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 93
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 Vibrations in a Solid – Phonons and Heat Capacity . . . . . . . . . . . . 95
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.3 Heat Capacity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.3.1 Einstein’s Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.3.2 Debye’s Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.4 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 101
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7 Fluctuations, Stochastic Dynamics and Noise . . . . . . . . . . . . . . . . . . . 103
7.1 Elements of Fluctuation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.2 Brownian Motion and the Langevin Equation . . . . . . . . . . . . . . . . 106
7.3 Diffusion and the Fokker–Planck Equation . . . . . . . . . . . . . . . . . . 110
7.4 The Fluctuation–Dissipation Theorem . . . . . . . . . . . . . . . . . . . . . . 115
7.5 Johnson–Nyquist Noise in the Quantum–Mechanical Regime . . . . 122
7.6 Other Noise Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.7 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8 A Brief Touch on Information Theory . . . . . . . . . . . . . . . . . . . . . . . 131
8.1 Introduction – What Is Information Theory About? . . . . . . . . . . . 131
8.2 Entropy in Information Theory and Statistical Physics . . . . . . . . . 132
8.3 Statistical Physics of Optimum Message Distributions . . . . . . . . . 135
8.4 Suggestions for Supplementary Reading . . . . . . . . . . . . . . . . . . . . 137
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Introduction

Statistical physics is a branch of physics which deals with systems with a huge number of particles (or any other elementary units). For example, Avogadro's number, which is about $6 \times 10^{23}$, is the number of molecules in 22.4 liters of ideal gas at standard temperature and pressure. Evidently, when it comes to systems with such an enormous number of particles, there is no hope of keeping track of the physical state (e.g., position and momentum) of each and every individual particle by means of the classical methods in physics, that is, by solving a gigantic system of differential equations pertaining to Newton's laws for all particles. Moreover, even if those differential equations could somehow be solved (at least approximately), the information that they would have given us would be virtually useless.
What we normally really want to know about our physical system boils down to a
fairly short list of macroscopic quantities, such as energy, heat, pressure, temper-
ature, volume, magnetization, and the like. In other words, while we continue to use
the well-known laws of physics, even the classical ones, we no longer use them in
the ordinary manner that we have known from elementary physics courses. Instead,
we think of the state of the system, at any given moment, as a realization of a
certain probabilistic ensemble. This is to say that we approach the problem from a
probabilistic (or a statistical) point of view. The beauty of statistical physics is that
it derives the macroscopic theory of thermodynamics (i.e., the relationships
between thermodynamic potentials, temperature, pressure, etc.) as ensemble aver-
ages that stem from this probabilistic microscopic theory, in the limit of an infinite
number of particles, that is, the thermodynamic limit.
The purpose of this book is to teach statistical mechanics and thermodynamics,
with some degree of orientation toward students in electrical engineering. The main
body of the lectures is devoted to statistical mechanics, whereas much less emphasis
is given to the thermodynamics part. In particular, the idea is to let the laws of thermodynamics be obtained as conclusions from the derivations in statistical mechanics.
Beyond the variety of central topics in statistical physics that are important to the
general scientific education of the electrical engineering student, special emphasis is


devoted to subjects that are vital to the engineering education concretely. These
include, first of all, quantum statistics, like the Fermi–Dirac distribution, as well as
diffusion processes, which are both fundamental for understanding semiconductor
devices. Another important issue for the electrical engineering student is to
understand mechanisms of noise generation and stochastic dynamics in physical
systems, most notably, in electric circuitry. Accordingly, the fluctuation–dissipation
theorem of statistical mechanics, which is the theoretical basis for understanding thermal noise processes in physical systems, is presented from the standpoint of a system with an input and an output, in a way that should be understandable and useful for an engineer, and well related to other courses in the undergraduate curriculum of
for an engineer, and well related to other courses in the undergraduate curriculum of
electrical engineering, like courses on random processes. This engineering per-
spective is not available in standard physics textbooks. The quantum regime, in this
context, is important and hence provided as well. Finally, we touch upon some
relationships between statistical mechanics and information theory, and demon-
strate how the statistical–mechanical approach can be useful for the study of
information theoretic problems. These relationships are further explored, and in a
much deeper manner, in [1].
Most of the topics in this book are covered on the basis of several other
well-known books on statistical mechanics. However, several perspectives and
mathematical derivations are original and new (to the best of the author’s knowl-
edge). The book includes a fair number of examples, exercises, and figures, which will hopefully help the student grasp the material better.
It is assumed that the reader has prior background in the following subjects:
(i) elementary calculus and linear algebra, (ii) basics of quantum mechanics, and
(iii) fundamentals of probability theory. Chapter 7 assumes also basic background
in signals-and-systems theory, as well as the theory of random processes, including
the response of linear systems to random input signals.

Reference

1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory, vol. 6, no. 1–2, pp. 1–212, 2009.
Chapter 1
Kinetic Theory and the Maxwell Distribution

The concept that a gas consists of many small mobile mass particles is very old–it
dates back to the Greek philosophers. It has been periodically rejected and revived
throughout many generations of the history of science. Around the middle of the
19th century, against the general trend of rejecting the atomistic approach, Clausius,1
Maxwell2 and Boltzmann3 succeeded in developing a kinetic theory for the motion
of gas molecules, which was mathematically solid, on the one hand, and agreed
satisfactorily with the experimental evidence (at least in simple cases), on the other
hand.
In this chapter, we present some elements of Maxwell's formalism and derivation, which build the kinetic theory of the ideal gas and derive some rather useful results from first principles. While the main results that we shall see in this chapter can be viewed as special cases of the more general concepts and principles that will be provided later on, the purpose here is to give a quick taste of this subject and to demonstrate how the statistical approach to physics, which is based on very few reasonable assumptions, gives rise to rather far–reaching results and conclusions.
The choice of the ideal gas, as a system of many mobile particles, is a good test-
bed to begin with, as on the one hand, it is simple, and on the other hand, it is not
irrelevant to electrical engineering and electronics in particular. For example, the free
electrons in a metal can often be considered a “gas” (albeit not an ideal gas), as we
shall see later on.

1 Rudolf Julius Emanuel Clausius (1822–1888) was a German physicist and mathematician who is
considered to be one of the central pioneers of thermodynamics.
2 James Clerk Maxwell (1831–1879) was a Scottish physicist and mathematician, whose other

prominent achievement was formulating classical electromagnetic theory.


3 Ludwig Eduard Boltzmann (1844–1906) was an Austrian physicist, who made founding contributions to statistical mechanics and thermodynamics. He was one of the advocates of the atomic theory when it was still very controversial.

1.1 The Statistical Nature of the Ideal Gas

From the statistical–mechanical perspective, an ideal gas is a system of mobile par-


ticles, which interact with one another only via elastic collisions, whose duration is
extremely short compared to the time elapsed between two consecutive collisions in
which a given particle is involved. This basic assumption is valid as long as the gas
is not too dense and the pressure that it exerts is not too high. As explained in the
Introduction, the underlying idea of statistical mechanics in general, is that instead
of hopelessly trying to keep track of the motion of each individual molecule, using
differential equations that are based on Newton’s laws, one treats the population of
molecules as a statistical ensemble using tools from probability theory, hence the
name statistical mechanics (or statistical physics).
What is the probability distribution of the state of the molecules of an ideal gas in
equilibrium? Here, by “state” we refer to the positions and the velocities (or momenta)
of all molecules at any given time. As for the positions, if gravity is neglected, and
assuming that the gas is contained in a given box (container) of volume V , there is no
apparent reason to believe that one region is preferable over others, so the distribution
of the locations is assumed uniform across the container, and independently of one
another. Thus, if there are N molecules, the joint probability density of their positions
is 1/V N everywhere within the container and zero outside. It is therefore natural to
define the density of particles per unit volume as ρ = N /V .
What about the distribution of velocities? This is slightly more involved, but as
we shall see, still rather simple, and the interesting point is that once we derive
this distribution, we will be able to derive some interesting relationships between
macroscopic quantities pertaining to the equilibrium state of the system (pressure,
density, energy, temperature, etc.). As for the velocity of each particle, we will make
two assumptions:
1. All possible directions of motion in space are equally likely. In other words, there are no preferred directions (as gravity is neglected). Thus, the probability density function (pdf) of the velocity vector $\vec{v} = v_x \hat{x} + v_y \hat{y} + v_z \hat{z}$ depends on $\vec{v}$ only via its magnitude, i.e., the speed $s = \|\vec{v}\| = \sqrt{v_x^2 + v_y^2 + v_z^2}$, or in mathematical terms:

$$f(v_x, v_y, v_z) = g(v_x^2 + v_y^2 + v_z^2) \qquad (1.1.1)$$

for some function $g$.

2. The various components $v_x$, $v_y$ and $v_z$ are identically distributed and independent, i.e.,

$$f(v_x, v_y, v_z) = f(v_x)\, f(v_y)\, f(v_z). \qquad (1.1.2)$$

The rationale behind identical distributions is, as in item 1 above, the isotropic nature of the pdf. The rationale behind the independence assumption is that in each collision between two particles, the total momentum is conserved in each component ($x$, $y$, and $z$) separately, so there are actually no interactions among the component momenta. Each three–dimensional particle actually behaves like three independent one–dimensional particles, as far as the momentum is concerned.
We now argue that there is only one kind of (differentiable) joint pdf f (vx , v y , vz ) that
complies with both assumptions at the same time, and this is the Gaussian density
where all three components of v are independent, zero–mean and with the same
variance.
To see why this is true, consider the equation

$$f(v_x)\, f(v_y)\, f(v_z) = g(v_x^2 + v_y^2 + v_z^2), \qquad (1.1.3)$$

which combines both requirements. Let us assume that both $f$ and $g$ are differentiable. Taking now partial derivatives w.r.t. $v_x$, $v_y$ and $v_z$, we obtain

$$f'(v_x)\, f(v_y)\, f(v_z) = 2 v_x\, g'(v_x^2 + v_y^2 + v_z^2) \qquad (1.1.4)$$

$$f(v_x)\, f'(v_y)\, f(v_z) = 2 v_y\, g'(v_x^2 + v_y^2 + v_z^2) \qquad (1.1.5)$$

$$f(v_x)\, f(v_y)\, f'(v_z) = 2 v_z\, g'(v_x^2 + v_y^2 + v_z^2), \qquad (1.1.6)$$

implying that

$$\frac{f'(v_x)\, f(v_y)\, f(v_z)}{v_x} = \frac{f(v_x)\, f'(v_y)\, f(v_z)}{v_y} = \frac{f(v_x)\, f(v_y)\, f'(v_z)}{v_z} = 2 g'(v_x^2 + v_y^2 + v_z^2), \qquad (1.1.7)$$

and in particular,

$$\frac{f'(v_x)}{v_x f(v_x)} = \frac{f'(v_y)}{v_y f(v_y)}. \qquad (1.1.8)$$

Since the l.h.s. depends only on $v_x$ and the r.h.s. depends only on $v_y$, the last identity can hold only if $f'(v_x)/[v_x f(v_x)] = \mathrm{const}$. Let us denote this constant by $-2\alpha$. Then, we have a simple differential equation,

$$\frac{f'(v_x)}{f(v_x)} = -2\alpha v_x, \qquad (1.1.9)$$

whose solution is easily found to be

$$f(v_x) = B e^{-\alpha v_x^2}, \qquad (1.1.10)$$

and similar relations hold also for $v_y$ and $v_z$. For $f$ to be a valid pdf, $\alpha$ must be positive and $B$ must be the appropriate constant of normalization, which gives
$$f(v_x) = \sqrt{\frac{\alpha}{\pi}}\, e^{-\alpha v_x^2}, \qquad (1.1.11)$$

and the same applies to $v_y$ and $v_z$. Thus, we finally obtain

$$f(v_x, v_y, v_z) = \left(\frac{\alpha}{\pi}\right)^{3/2} e^{-\alpha(v_x^2 + v_y^2 + v_z^2)}, \qquad (1.1.12)$$
namely, a Gaussian pdf of three zero–mean independent variables with the same
variance, and it only remains to determine one constant, α, which is related to this
variance. We would like now to express α in terms of some physical parameters of
the system.
To this end, we adopt the following consideration. Assume, without essential loss of generality, that the container is a box of sizes $L_x \times L_y \times L_z$, whose walls are parallel to the axes of the coordinate system. Consider a molecule with velocity $\vec{v} = v_x \hat{x} + v_y \hat{y} + v_z \hat{z}$ hitting a wall parallel to the $YZ$ plane from the inner side (left side) of the box. The molecule bounces elastically with a new velocity vector $\vec{v}\,' = -v_x \hat{x} + v_y \hat{y} + v_z \hat{z}$, and so, the change in momentum, which is also the impulse that the molecule exerts on the wall, is $\Delta p = 2 m v_x$, where $m$ is the mass of the molecule. For a molecule of velocity $v_x$ in the $x$–direction to hit the wall within time duration $\tau$, its initial distance from the wall must not exceed $v_x \tau$ in the $x$–direction. Thus, the total average impulse contributed by a molecule with an $x$–component velocity ranging between $v_x$ and $v_x + \mathrm{d}v_x$ is given by

$$2 m v_x \cdot \frac{v_x \tau}{L_x} \cdot \sqrt{\frac{\alpha}{\pi}}\, e^{-\alpha v_x^2}\, \mathrm{d}v_x.$$

Consequently, the average impulse of a single molecule, exerted within time $\tau$, is the integral, given by

$$\frac{2 m \tau}{L_x} \sqrt{\frac{\alpha}{\pi}} \int_0^\infty v_x^2\, e^{-\alpha v_x^2}\, \mathrm{d}v_x = \frac{m \tau}{2 \alpha L_x}.$$

It follows that the average⁴ force exerted on the $YZ$ wall is obtained by dividing the last expression by $\tau$, namely, it is $m/(2\alpha L_x)$, and then the average pressure contributed by a single molecule is $m/(2\alpha L_x L_y L_z)$. Therefore, the total pressure contributed by all $N$ molecules is

$$P = \frac{m N}{2 \alpha L_x L_y L_z} = \frac{m N}{2 \alpha V} = \frac{\rho m}{2 \alpha}, \qquad (1.1.13)$$

4 Average – over time as well.


and so, we can determine $\alpha$ in terms of the physical quantities $P$, $\rho$ and $m$:

$$\alpha = \frac{\rho m}{2 P}. \qquad (1.1.14)$$

From the equation of state of the ideal gas,⁵

$$P = \rho k T, \qquad (1.1.15)$$

where $k$ is Boltzmann's constant ($k \approx 1.381 \times 10^{-23}$ Joules/degree) and $T$ is the absolute temperature. Thus, an alternative expression for $\alpha$ is:

$$\alpha = \frac{m}{2 k T}. \qquad (1.1.16)$$

On substituting this into the general Gaussian form of the pdf, we finally obtain

$$f(\vec{v}) = \left(\frac{m}{2\pi k T}\right)^{3/2} \exp\left\{-\frac{m}{2kT}\left(v_x^2 + v_y^2 + v_z^2\right)\right\} = \left(\frac{m}{2\pi k T}\right)^{3/2} \exp\left\{-\frac{\epsilon}{kT}\right\}, \qquad (1.1.17)$$

where $\epsilon$ is the (kinetic) energy of the molecule. This form of a pdf, proportional to $e^{-\epsilon/(kT)}$, where $\epsilon$ is the energy, is not a coincidence. We shall see it again and again later on, and in much greater generality, as a fact that stems from much deeper and more fundamental principles. It is called the Boltzmann–Gibbs distribution.⁶
Having derived the pdf of $\vec{v}$, we can now calculate a few moments. Throughout this book, we will denote the expectation operator by $\langle\cdot\rangle$, which is the customary notation used by physicists. Since

$$\langle v_x^2\rangle = \langle v_y^2\rangle = \langle v_z^2\rangle = \frac{kT}{m}, \qquad (1.1.18)$$

we readily have

$$\langle\|\vec{v}\|^2\rangle = \langle v_x^2 + v_y^2 + v_z^2\rangle = \frac{3kT}{m}, \qquad (1.1.19)$$

5 Consider this as an experimental fact.


6 Another example of the Boltzmann form $e^{-\epsilon/(kT)}$ is the barometric formula: considering gravity, the pressure increment $\mathrm{d}P$ between height $h$ and height $h + \mathrm{d}h$ in an ideal–gas atmosphere must be equal to $-\mu g\,\mathrm{d}h$, which is the pressure contributed by a layer of thickness $\mathrm{d}h$, where $\mu$ is the mass density. Thus, $\mathrm{d}P/\mathrm{d}h = -\mu g$. But by the equation of state, $\mu = N m / V = m P / (kT)$, which gives the simple differential equation $\mathrm{d}P/\mathrm{d}h = -P m g / (kT)$, whose solution is $P = P_0 e^{-mgh/kT}$, and so, $\mu = -P'/g = \mu_0 e^{-mgh/kT}$, which is proportional to the probability density. Thus, here we have $\epsilon = mgh$, the gravitational potential energy of one particle.
and so the root mean square (RMS) speed is given by

$$s_{\mathrm{RMS}} = \sqrt{\langle\|\vec{v}\|^2\rangle} = \sqrt{\frac{3kT}{m}}. \qquad (1.1.20)$$
m

Other related statistical quantities, that can be derived from $f(\vec{v})$, are the average speed $\langle s\rangle$ and the most likely speed. Like $s_{\mathrm{RMS}}$, they are also proportional to $\sqrt{kT/m}$, but with different constants of proportionality (see Exercise 1.1 below). The average kinetic energy per molecule is

$$\bar{\epsilon} = \left\langle \frac{1}{2}\, m \|\vec{v}\|^2 \right\rangle = \frac{3kT}{2}, \qquad (1.1.21)$$

independent of m. This relation gives the basic significance to the notion of temper-
ature: at least in the case of the ideal gas, temperature is simply a quantity that is
directly proportional to the average kinetic energy of each particle. In other words,
temperature and kinetic energy are almost synonyms in this case. In the sequel, we
will see a more general definition of temperature. The factor of 3 in the numerator is due to the fact that space has three dimensions, and so, each molecule has 3 degrees of
freedom. Every degree of freedom contributes an amount of energy given by kT /2.
This will turn out later to be a special case of a more general principle called the
equipartition of energy.
The pdf of the speed, $s = \|\vec{v}\|$, can be derived from the pdf of the velocity $\vec{v}$ using the obvious consideration that all vectors $\vec{v}$ of the same norm correspond to the same speed. Thus, the pdf of $s$ is simply the pdf of $\vec{v}$ (which depends solely on $\|\vec{v}\| = s$) multiplied by the surface area of a three–dimensional sphere of radius $s$, which is $4\pi s^2$, i.e.,

$$f(s) = 4\pi s^2 \left(\frac{m}{2\pi k T}\right)^{3/2} e^{-m s^2/(2kT)} = \sqrt{\frac{2}{\pi}}\left(\frac{m}{kT}\right)^{3/2} s^2\, e^{-m s^2/(2kT)}. \qquad (1.1.22)$$

This is called the Maxwell distribution and it is depicted in Fig. 1.1 for various values of the parameter $kT/m$. To obtain the pdf of the energy $\epsilon$, we should change variables according to $s = \sqrt{2\epsilon/m}$ and $\mathrm{d}s = \mathrm{d}\epsilon/\sqrt{2m\epsilon}$. The result is

$$f(\epsilon) = \frac{2\sqrt{\epsilon}}{\sqrt{\pi}\,(kT)^{3/2}}\, e^{-\epsilon/(kT)}. \qquad (1.1.23)$$

Exercise 1.1 Use the above to calculate: (i) the average speed $\langle s\rangle$, (ii) the most likely speed, $\arg\max_s f(s)$, and (iii) the most likely energy, $\arg\max_\epsilon f(\epsilon)$.
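As a quick numerical cross-check (an illustration added here, not part of the original text), the short Python sketch below samples the three velocity components as independent zero-mean Gaussians with variance $kT/m$, as in (1.1.17), and compares the empirical RMS speed and mean kinetic energy with (1.1.20) and (1.1.21). The mass and temperature are arbitrary illustrative values; the empirical mean speed is also printed so it can be compared with the answer to Exercise 1.1.

import numpy as np

k = 1.380649e-23   # Boltzmann constant [J/K]
T = 300.0          # temperature [K] (illustrative)
m = 4.65e-26       # molecular mass [kg], roughly that of N2 (assumed)

rng = np.random.default_rng(0)
# Each Cartesian velocity component is Gaussian with variance kT/m, Eq. (1.1.17)
v = rng.normal(scale=np.sqrt(k * T / m), size=(1_000_000, 3))
speed = np.linalg.norm(v, axis=1)

print("RMS speed   (MC):     ", np.sqrt(np.mean(speed**2)))
print("RMS speed   (1.1.20): ", np.sqrt(3 * k * T / m))
print("mean energy (MC):     ", np.mean(0.5 * m * speed**2))
print("mean energy (1.1.21): ", 1.5 * k * T)
print("mean speed  (MC), compare with Exercise 1.1(i):", np.mean(speed))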

An interesting relation, that will be referred to later on, links the average energy per particle $\bar{\epsilon} = \langle\epsilon\rangle$, the density $\rho$, and the pressure $P$, or equivalently, the total energy $E = N\bar{\epsilon}$, the volume $V$ and $P$:
Fig. 1.1 Demonstration of the Maxwell distribution for various values of the parameter $kT/m$. The red curve (tall and narrow) corresponds to the smallest value and the blue curve (short and wide) to the highest value.

$$P = \rho k T = \frac{2\rho}{3}\cdot\frac{3kT}{2} = \frac{2\rho}{3}\cdot\bar{\epsilon}, \qquad (1.1.24)$$

which after multiplying by $V$ becomes

$$P V = \frac{2E}{3}. \qquad (1.1.25)$$
It is interesting to note that this relation can be obtained directly from the analysis of the impulse exerted by the particles on the walls, similarly as in the earlier derivation of the parameter $\alpha$, and without recourse to the equation of state (see, for example, [1, Sect. 20–4, pp. 353–355]). This is because the parameter $\alpha$ of the Gaussian pdf of each component of $\vec{v}$ has the obvious meaning of $1/(2\sigma_v^2)$, where $\sigma_v^2$ is the common variance of each component of $\vec{v}$. Thus, $\sigma_v^2 = 1/(2\alpha)$ and so, $\langle\|\vec{v}\|^2\rangle = 3\sigma_v^2 = 3/(2\alpha)$, which in turn implies that

$$\bar{\epsilon} = \frac{m}{2}\langle\|\vec{v}\|^2\rangle = \frac{3m}{4\alpha} = \frac{3m}{4\rho m/(2P)} = \frac{3P}{2\rho}, \qquad (1.1.26)$$

in equivalence to the above.

1.2 Collisions

We now take a closer look into the issue of collisions. We first define the concept
of collision cross–section, which we denote by σ. Referring to Fig. 1.2, consider
a situation, where two hard spheres, labeled A and B, with diameters 2a and 2b,
respectively, are approaching each other, and let c be the projection of the distance
between their centers in the direction perpendicular to the direction of their relative

Fig. 1.2 Hard sphere collision

motion, $\vec{v}_1 - \vec{v}_2$. Clearly, collision will occur if and only if $c < a + b$. In other words, the two spheres would collide only if the center of B lies inside a volume whose cross sectional area is $\sigma = \pi(a+b)^2$, or for identical spheres, $\sigma = 4\pi a^2$. Let the colliding particles have relative velocity $\Delta\vec{v} = \vec{v}_1 - \vec{v}_2$. Passing to the coordinate system of the center of mass of the two particles, this is equivalent to the motion of one particle with the reduced mass $\mu = m_1 m_2/(m_1 + m_2)$, and so, in the case of identical particles, $\mu = m/2$. The average relative speed is easily calculated from the Maxwell distribution, but with $m$ being replaced by $\mu = m/2$, i.e.,

$$\langle\|\Delta\vec{v}\|\rangle = 4\pi\left(\frac{m}{4\pi k T}\right)^{3/2}\int_0^\infty (\Delta v)^3\, e^{-m(\Delta v)^2/(4kT)}\, \mathrm{d}(\Delta v) = 4\sqrt{\frac{kT}{\pi m}} = \sqrt{2}\,\langle s\rangle. \qquad (1.2.1)$$

The average number of particles that collide with a particular particle within time $\tau$ is

$$N_{\mathrm{col}}(\tau) = \rho\sigma\langle\|\Delta\vec{v}\|\rangle\tau = 4\rho\sigma\tau\sqrt{\frac{kT}{\pi m}}, \qquad (1.2.2)$$

and so, the collision rate of each particle is

$$\nu = 4\rho\sigma\sqrt{\frac{kT}{\pi m}}. \qquad (1.2.3)$$

The mean distance between collisions (a.k.a. the mean free path) is therefore

$$\lambda = \frac{\langle s\rangle}{\nu} = \frac{1}{\sqrt{2}\,\rho\sigma} = \frac{kT}{\sqrt{2}\,P\sigma}. \qquad (1.2.4)$$

What is the probability distribution of the random distance $L$ between two consecutive collisions of a given particle? In particular, what is $p(l) = \Pr\{L \ge l\}$? Let us assume that the collision process is memoryless, in the sense that the event of not colliding before distance $l_1 + l_2$ is the intersection of two independent events, the first one being the event of not colliding before distance $l_1$, and the second one being the event of not colliding before the additional distance $l_2$. That is,

$$p(l_1 + l_2) = p(l_1)\, p(l_2). \qquad (1.2.5)$$

We argue that under this assumption, $p(l)$ must be exponential in $l$. This follows from the following consideration.⁷ Taking partial derivatives of both sides w.r.t. both $l_1$ and $l_2$, we get

$$p'(l_1 + l_2) = p'(l_1)\, p(l_2) = p(l_1)\, p'(l_2). \qquad (1.2.6)$$

Thus,

$$\frac{p'(l_1)}{p(l_1)} = \frac{p'(l_2)}{p(l_2)} \qquad (1.2.7)$$

for all non-negative $l_1$ and $l_2$. Thus, $p'(l)/p(l)$ must be a constant, which we shall denote by $-a$. This trivial differential equation has only one solution which obeys the obvious initial condition $p(0) = 1$:

$$p(l) = e^{-al}, \qquad l \ge 0, \qquad (1.2.8)$$

so it only remains to determine the parameter $a$, which must be positive since the function $p(l)$ must be monotonically non–increasing by definition. This can easily be found by using the fact that $\langle L\rangle = 1/a = \lambda$, and so,

$$p(l) = e^{-l/\lambda} = \exp\left\{-\frac{\sqrt{2}\,P\sigma l}{kT}\right\}. \qquad (1.2.9)$$
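As a rough numerical illustration of these formulas (added here; the molecular parameters are assumed, order-of-magnitude values and not part of the original text), the following lines evaluate the collision rate (1.2.3), the mean free path (1.2.4), and the exponential free-path law (1.2.9) for a nitrogen-like gas at room temperature and atmospheric pressure.

import numpy as np

k = 1.380649e-23     # Boltzmann constant [J/K]
T = 300.0            # temperature [K]
P = 1.013e5          # pressure [Pa]
m = 4.65e-26         # molecular mass [kg], roughly N2 (assumed)
d = 3.7e-10          # effective molecular diameter [m] (assumed)

rho = P / (k * T)              # number density from the equation of state (1.1.15)
sigma = np.pi * d**2           # sigma = pi*(a+b)^2 with a = b = d/2 (identical spheres)
nu = 4 * rho * sigma * np.sqrt(k * T / (np.pi * m))   # collision rate, Eq. (1.2.3)
lam = k * T / (np.sqrt(2) * P * sigma)                # mean free path, Eq. (1.2.4)

print("number density [1/m^3]:", rho)
print("collision rate [1/s]:  ", nu)
print("mean free path [m]:    ", lam)
# Probability of flying at least 3 mean free paths without a collision, Eq. (1.2.9)
print("Pr{L >= 3*lambda}:     ", np.exp(-3.0))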

1.3 Dynamical Aspects

The discussion thus far focused on the static (equilibrium) behavior of the ideal gas.
In this subsection, we will briefly touch upon dynamical issues pertaining to non–
equilibrium situations. These issues will be further developed in Chap. 7, and with
much greater generality.
Consider two adjacent containers separated by a wall. Both have the same volume
V of the same ideal gas at the same temperature T , but with different densities ρ1
and ρ2 , and hence different pressures P1 and P2 . Let us assume that P1 > P2 . At time
t = 0, a small hole is created in the separating wall. The area of this hole is A (see
Fig. 1.3).
If the mean free distances λ1 and λ2 are relatively large compared to the dimensions
of the hole, it is safe to assume that every molecule that reaches the hole, passes

7 Similar idea to the one of the earlier derivation of the Gaussian pdf of the ideal gas.

Fig. 1.3 Gas leakage through a small hole

through it. The mean number of molecules that pass from left to right within time $\tau$ is given by⁸

$$N_{\rightarrow} = \rho_1 V \int_0^\infty \mathrm{d}v_x\, \sqrt{\frac{\alpha}{\pi}}\, e^{-\alpha v_x^2}\cdot\frac{v_x\tau A}{V} = \frac{\rho_1\tau A}{2\sqrt{\pi\alpha}}, \qquad (1.3.1)$$

and so the number of particles per second, flowing from left to right, is

$$\frac{\mathrm{d}N_{\rightarrow}}{\mathrm{d}t} = \frac{\rho_1 A}{2\sqrt{\pi\alpha}}. \qquad (1.3.2)$$

Similarly, in the opposite direction, we have

$$\frac{\mathrm{d}N_{\leftarrow}}{\mathrm{d}t} = \frac{\rho_2 A}{2\sqrt{\pi\alpha}}, \qquad (1.3.3)$$

and so, the net left–to–right current is

$$I = \frac{\mathrm{d}\Delta N}{\mathrm{d}t} = \frac{(\rho_1 - \rho_2) A}{2\sqrt{\pi\alpha}} = (\rho_1 - \rho_2)\, A \sqrt{\frac{kT}{2\pi m}}. \qquad (1.3.4)$$

An important point here is that the current is proportional to the difference between the densities, $(\rho_1 - \rho_2)$, and considering the equation of state of the ideal gas, it is therefore also proportional to the pressure difference, $(P_1 - P_2)$. This rings the bell of the well known analogous fact that the electric current is proportional to the voltage, which in turn is the difference between the electric potentials at two points. Considering the fact that $\bar\rho = (\rho_1 + \rho_2)/2$ is constant, we obtain a simple differential equation,

$$\frac{\mathrm{d}\rho_1}{\mathrm{d}t} = \frac{A}{V}\sqrt{\frac{kT}{2\pi m}}\,(\rho_2 - \rho_1) = C(\rho_2 - \rho_1) = 2C(\bar\rho - \rho_1), \qquad (1.3.5)$$

8 Note that for $v_y = v_z = 0$, the factor $v_x\tau A/V$, in the forthcoming equation, is clearly the relative volume (and hence the probability) of being in the 'box' in which a particle must be found in order to pass the hole within $\tau$ seconds. When $v_y$ and $v_z$ are non–zero, instead of a rectangular box, this region becomes a parallelepiped, but the relative volume remains $v_x\tau A/V$, independently of $v_y$ and $v_z$.
whose solution is

$$\rho_1(t) = \bar\rho + [\rho_1(0) - \bar\rho]\, e^{-2Ct}, \qquad (1.3.6)$$

which means that equilibrium is approached exponentially fast, with time constant

$$\tau = \frac{1}{2C} = \frac{V}{2A}\sqrt{\frac{2\pi m}{kT}}. \qquad (1.3.7)$$
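For a feel of the time scales involved, here is a small numerical sketch (the container and hole parameters are assumed, illustrative values, not taken from the text): it evaluates the relaxation time constant (1.3.7) for a helium-like gas leaking from a one-liter container through a one-square-micron hole, and the exponential decay (1.3.6) of the density imbalance.

import numpy as np

k = 1.380649e-23   # Boltzmann constant [J/K]
T = 300.0          # temperature [K]
m = 6.65e-27       # particle mass [kg], roughly helium (assumed)
V = 1.0e-3         # container volume [m^3] (one liter, assumed)
A = 1.0e-12        # hole area [m^2] (one square micron, assumed)

tau = (V / (2 * A)) * np.sqrt(2 * np.pi * m / (k * T))   # Eq. (1.3.7)
print("relaxation time constant [s]:", tau)

# Eq. (1.3.6): the density imbalance rho1(t) - rho_bar decays as exp(-t/tau)
for t in (0.0, tau, 3 * tau):
    print("t =", t, " remaining imbalance fraction:", np.exp(-t / tau))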

Imagine now a situation where there is a long pipe aligned along the $x$–direction. The pipe is divided into a chain of cells in a linear fashion, and in the wall between each two consecutive cells there is a hole of area $A$. The length of each cell (i.e., the distance between consecutive walls) is the mean free distance $\lambda$, so that collisions within each cell can be neglected. Assume further that $\lambda$ is so small that the density of each cell at time $t$ can be approximated using a continuous function $\rho(x, t)$. Let $x_0$ be the location of one of the walls. Then, according to the above derivation, the current at $x = x_0$ is

$$I(x_0) = \left[\rho\left(x_0 - \frac{\lambda}{2},\, t\right) - \rho\left(x_0 + \frac{\lambda}{2},\, t\right)\right] A\sqrt{\frac{kT}{2\pi m}} \approx -A\lambda\sqrt{\frac{kT}{2\pi m}}\cdot\left.\frac{\partial\rho(x,t)}{\partial x}\right|_{x = x_0}. \qquad (1.3.8)$$

Thus, the current is proportional to the negative gradient of the density. This is quite a fundamental result which holds with much greater generality. In the more general context, it is known as Fick's law.
Consider next two close points $x_0$ and $x_0 + \Delta x$, with possibly different current densities (i.e., currents per unit area) $J(x_0)$ and $J(x_0 + \Delta x)$. The difference $J(x_0) - J(x_0 + \Delta x)$ is the rate at which matter accumulates along the interval $[x_0, x_0 + \Delta x]$, per unit area in the perpendicular plane. Within $\Delta t$ seconds, the number of particles per unit area within this interval has grown by $[J(x_0) - J(x_0 + \Delta x)]\Delta t$. But this amount is also $[\rho(x_0, t + \Delta t) - \rho(x_0, t)]\Delta x$. Taking the appropriate limits, we get

$$\frac{\partial J(x)}{\partial x} = -\frac{\partial\rho(x,t)}{\partial t}, \qquad (1.3.9)$$

which is a one–dimensional version of the so called equation of continuity. Differentiating now Eq. (1.3.8) w.r.t. $x$ and comparing with (1.3.9), we obtain the diffusion equation (in one dimension):

$$\frac{\partial\rho(x,t)}{\partial t} = D\,\frac{\partial^2\rho(x,t)}{\partial x^2}, \qquad (1.3.10)$$
where the constant $D$, which in this case is given by

$$D = \frac{A\lambda}{S}\cdot\sqrt{\frac{kT}{2\pi m}}, \qquad (1.3.11)$$

is called the diffusion coefficient. Here $S$ is the cross–section area.


This is, of course, merely a toy model – it is a caricature of a real diffusion process,
but it captures its essence. Diffusion processes are central in irreversible statistical
mechanics, since the solution to the diffusion equation is sensitive to the sign of time.
This is different from the Newtonian equations of frictionless motion, which have a
time reversal symmetry and hence are reversible. We will touch upon these issues in
Chap. 7.
The equation of continuity, Fick’s law, the diffusion equation and its extension, the
Fokker–Planck equation (which will also be discussed), are all very central in physics
in general and in semiconductor physics, in particular, as they describe processes of
propagation of concentrations of electrons and holes in semiconductor materials.
Another branch of physics where these equations play an important role is fluid
mechanics.
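To complement the toy model, the following sketch (an added illustration with arbitrary, assumed parameters) integrates the one-dimensional diffusion equation (1.3.10) with a simple explicit finite-difference scheme, starting from a step-like density profile between closed ends; the profile relaxes toward a uniform density, illustrating the irreversible behavior discussed above.

import numpy as np

D = 1.0e-5             # diffusion coefficient [m^2/s] (illustrative)
L, nx = 1.0e-2, 101    # domain length [m] and number of grid points (assumed)
dx = L / (nx - 1)
dt = 0.4 * dx**2 / D   # time step within the explicit stability limit dt <= dx^2/(2D)

x = np.linspace(0.0, L, nx)
rho = np.where(x < L / 2, 2.0e25, 1.0e25)   # step-like initial density [1/m^3]

for _ in range(2000):
    lap = np.zeros_like(rho)
    lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / dx**2
    rho = rho + D * dt * lap                # explicit update of Eq. (1.3.10)
    rho[0], rho[-1] = rho[1], rho[-2]       # closed (zero-flux) ends

print("mean density (approximately conserved):", rho.mean())
print("remaining density spread:", rho.max() - rho.min())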

1.4 Suggestions for Supplementary Reading

The following books are recommended for further reading and for related material.
Beck [2, Sect. 2.2] derives the pdf of the particle momenta in a manner somewhat
different than here. Other parts of this section are quite similar to those of [2, Chap. 2].
A much more detailed exposition of the kinetic theory of gases appears also in many
other textbooks, including: Huang [3, Chap. 4], Kardar [4, Chap. 3], Kittel [5, Part I,
Chap. 13], Mandl [6, Chap. 7], Reif [7, Chap. 9], and Tolman [8, Chap. IV], to name
a few.

References

1. F.W. Sears, M.W. Zemansky, H.D. Young, University Physics (Addison-Wesley, Reading, 1976)
2. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, Lon-
don, 1976)
3. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
4. M. Kardar, Statistical Physics of Particles (Cambridge University Press, Cambridge, 2007)
5. C. Kittel, Elementary Statistical Physics (Wiley, New York, 1958)
6. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)
7. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
8. R.C. Tolman, The Principles of Statistical Mechanics (Dover Publications, New York, 1979)
Chapter 2
Elementary Statistical Physics

In this chapter, we provide the formalism and the elementary background in statistical
physics. We first define the basic postulates of statistical mechanics, and then define
various ensembles. Finally, we shall derive some of the thermodynamic potentials
and their properties, as well as the relationships among them. The important laws of
thermodynamics will also be pointed out. The content of this chapter has a considerable overlap with Chap. 2 of [1] (though there are also considerable differences). It is provided in this book too, mostly for the sake of completeness.

2.1 Basic Postulates

As explained in the Introduction, statistical physics is about a probabilistic approach to systems of many particles. While our discussion here will no longer be specific to the ideal gas as before, we will nonetheless start again with this example in mind, just for the sake of concreteness. Consider then a system with a very large number $N$ of mobile particles, which are free to move in a given volume. The microscopic state (or microstate, for short) of the system, at each time instant $t$, consists, in this example, of the position vector $\vec{r}_i(t)$ and the momentum vector $\vec{p}_i(t)$ of each and every particle, $1 \le i \le N$. Since each one of these is a vector of three components, the microstate is then given by a $(6N)$–dimensional vector $x(t) = \{(\vec{r}_i(t), \vec{p}_i(t)) : i = 1, 2, \ldots, N\}$, whose trajectory along the time axis, in the phase space $\mathbb{R}^{6N}$, is called the phase trajectory.

Let us assume that the system is closed, i.e., isolated from its environment, in the sense that no energy flows in or out. Imagine that the phase space $\mathbb{R}^{6N}$ is partitioned into very small hypercubes (or cells) $\Delta p \times \Delta r$. One of the basic postulates of statistical mechanics is the following: in the long run, the relative amount of time that $x(t)$ spends within each such cell converges to a certain number between 0 and 1, which can be given the meaning of the probability of this cell.

Thus, there is an underlying assumption of equivalence between temporal averages and ensemble averages; this is the postulate of ergodicity. Considerable efforts were dedicated to the proof of the ergodic hypothesis, at least in some cases. As reasonable and natural as it may seem, the ergodic hypothesis should not be taken for granted. It does not hold for every system, but only if no other conservation law holds. For example, a single molecule ($N = 1$) of a gas in a box is non–ergodic. The reason is simple: assuming elastic collisions with the walls, the kinetic energy of the molecule is conserved, and hence so is the speed $s$, which therefore never samples the Maxwell distribution over time.
What are then the probabilities of the above–mentioned phase–space cells? We
would like to derive these probabilities from first principles, based on as few as
possible basic postulates. Our second postulate is that for an isolated system (i.e.,
whose energy is fixed) all microscopic states {x(t)} are equiprobable. The rationale
behind this postulate is twofold:
• In the absence of additional information, there is no apparent reason that certain
regions in phase space would have preference relative to any others.
• This postulate is in harmony with a basic result in kinetic theory of gases – the
Liouville theorem (see e.g., [2]), which we will not touch upon in this book, but
in a nutshell, it asserts that the phase trajectories must lie along hyper-surfaces of
constant probability density.1

2.2 Statistical Ensembles

2.2.1 The Microcanonical Ensemble

Before we proceed, let us slightly broaden the scope of our discussion. In a more
general context, associated with our N –particle physical system, is a certain instan-
taneous microstate, generically denoted by x = (x1 , x2 , . . . , x N ), where each xi ,
1 ≤ i ≤ N , may itself be a vector of several physical quantities associated with par-
ticle number i, e.g., its position, momentum, angular momentum, magnetic moment,
spin, and so on, depending on the type and the nature of the physical system. Accord-
ing to the physical model of the given system, there is a certain energy function, a.k.a.
Hamiltonian, that assigns to every x a certain energy E(x).2 Now, let us denote by
A(E) the volume of the shell of energy about E. This means

1 This is a result of the energy conservation law along with the fact that probability mass behaves
like an incompressible fluid in the sense that whatever mass that flows into a certain region from
some direction must be equal to the outgoing flow from some other direction. This is reflected in
the equation of continuity, which was demonstrated earlier.
2 For example, in the case of an ideal gas, $E(x) = \sum_{i=1}^N \|\vec{p}_i\|^2/(2m)$, where $m$ is the mass of each molecule, namely, it accounts for the contribution of the kinetic energies only. In more complicated situations, there might be additional contributions of potential energy, which depend on the positions.
$$A(E) = \mathrm{Vol}\{x : E \le E(x) \le E + \Delta E\} = \int_{\{x:\, E \le E(x) \le E + \Delta E\}} \mathrm{d}x, \qquad (2.2.1)$$

where $\Delta E$ is a very small (but fixed) energy increment, which is immaterial when $N$ is large. Then, our above postulate concerning the ensemble of an isolated system, which is called the microcanonical ensemble, is that the probability density $P(x)$ is given by

$$P(x) = \begin{cases} \dfrac{1}{A(E)} & E \le E(x) \le E + \Delta E \\ 0 & \text{elsewhere.} \end{cases} \qquad (2.2.2)$$

In the discrete case, things are simpler: here, $A(E)$ is the number of microstates with $E(x) = E$ (exactly) and $P(x)$ is the uniform probability mass function over this set of states.
Returning to the general case, we next define the notion of the density of states, $\omega(E)$, which is intimately related to $A(E)$. Basically, in simple cases, $\omega(E)$ is defined such that $\omega(E)\Delta E = A(E)$, where $\Delta E$ is very small, but there might be a few minor corrections, depending on the concrete system being addressed. More generally, we define the density of states such that $\omega(E)\Delta E = \Omega(E)$, where $\Omega(E)$ is the relevant (possibly corrected) function. The first correction has to do with the fact that $A(E)$ is, in general, not dimensionless: in the above example of a gas, it has the physical units of $[\text{length}\times\text{momentum}]^{3N} = [\mathrm{J}\cdot\mathrm{s}]^{3N}$, but we must eliminate these physical units because we will have to apply to it non–linear functions like the logarithmic function. To this end, we normalize the volume $A(E)$ by an elementary reference volume. In the gas example, this reference volume is taken to be $h^{3N}$, where $h$ is Planck's constant ($h \approx 6.62 \times 10^{-34}\ \mathrm{J}\cdot\mathrm{s}$). Informally, the intuition comes from the fact that $h$ is our best available "resolution" in the plane spanned by each component of $\vec{r}_i$ and the corresponding component of $\vec{p}_i$, owing to the uncertainty principle in quantum mechanics, which tells us that the product of the standard deviations $\Delta p_a \cdot \Delta r_a$ of each component $a$ ($a = x, y, z$) is lower bounded by $\hbar/2$, where $\hbar = h/(2\pi)$. More formally, this reference volume is obtained in a natural manner from quantum statistical mechanics: by changing the integration variable $\vec{p}$ to $\vec{k}$ using the relation $\vec{p} = \hbar\vec{k}$, where $\vec{k}$ is the wave vector. This is a well–known relation (one of the de Broglie relations) pertaining to particle–wave duality. The second correction that is needed to pass from $A(E)$ to $\Omega(E)$ is applicable when the particles are indistinguishable:³ in these cases, we do not consider permutations between particles in a given configuration as distinct microstates. Thus, we have to divide also by $N!$. Taking into account both corrections, we find that in the example of the ideal gas,

3 In the example of the ideal gas, since the particles are mobile and since they have no colors and no identity certificates, there is no distinction between a state where particle no. 15 has position $\vec{r}$ and momentum $\vec{p}$ while particle no. 437 has position $\vec{r}\,'$ and momentum $\vec{p}\,'$, and a state where these two particles are swapped.
$$\Omega(E) = \frac{A(E)}{N!\, h^{3N}}. \qquad (2.2.3)$$
Once again, it should be understood that both of these corrections are optional and their applicability depends on the system in question: the first correction is applicable only if $A(E)$ has physical units and the second correction is applicable only if the particles are indistinguishable. For example, if $x$ is discrete, in which case the integral defining $A(E)$ is replaced by a sum (that counts the $x$'s with $E(x) = E$), and the particles are distinguishable, then no corrections are needed at all, i.e.,

$$\Omega(E) = A(E). \qquad (2.2.4)$$

Now, the entropy is defined as

$$S(E) = k \ln \Omega(E), \qquad (2.2.5)$$

where k is Boltzmann’s constant. We will see later what is the relationship between
S(E) and the classical thermodynamic entropy, due to Clausius (1850), as well as the
information–theoretic entropy, due to Shannon (1948). As will turn out, all three are
equivalent to one another. Here, a comment on the notation is in order: the entropy S
may depend on additional quantities, other than the energy E, like the volume V and
the number of particles N . When this dependence will be relevant and important, we
will use the more complete form of notation S(E, V, N ). If only the dependence on
E is relevant in a certain context, we use the simpler notation S(E).
To get some insight into the behavior of the entropy, it should be noted that normally, $\Omega(E)$ (and hence also $\omega(E)$) behaves as an exponential function of $N$ (at least asymptotically), and so, $S(E)$ is roughly linear in $N$. For example, if $E(x) = \sum_{i=1}^N \|\vec{p}_i\|^2/(2m)$, then $\Omega(E)$ is the volume of a thin shell about the surface of a $(3N)$–dimensional sphere with radius $\sqrt{2mE}$, divided by $N! h^{3N}$, which is proportional to $(2mE)^{3N/2} V^N/(N! h^{3N})$, where $V$ is the volume. The quantity $\omega(E)$ is then associated with the surface area of this $(3N)$–dimensional sphere. Specifically (ignoring the contribution of the factor $\Delta E$), we get

$$S(E, V, N) = k \ln\left[\left(\frac{4\pi m E}{3N}\right)^{3N/2}\cdot\frac{V^N}{N!\, h^{3N}}\right] + \frac{3}{2} N k$$
$$\approx N k \ln\left[\left(\frac{4\pi m E}{3N}\right)^{3/2}\cdot\frac{V}{N h^3}\right] + \frac{5}{2} N k. \qquad (2.2.6)$$

Assuming that $E$ and $V$ are both proportional to $N$ ($E = N\epsilon$ and $V = N/\rho$), it is readily seen that $S(E, V, N)$ is also proportional to $N$. A physical quantity that has a linear dependence on the size of the system $N$ is called an extensive quantity. Energy, volume and entropy are then extensive quantities. Other quantities, which

are not extensive, i.e., independent of the system size, like temperature and pressure,
are called intensive.
It is interesting to point out that from the function $S(E, V, N)$, one can obtain the entire information about the relevant macroscopic physical quantities of the system, e.g., temperature, pressure, and so on. Specifically, the temperature $T$ of the system is defined according to

$$\frac{1}{T} = \left[\frac{\partial S(E, V, N)}{\partial E}\right]_{V, N}, \qquad (2.2.7)$$

where $[\cdot]_{V,N}$ emphasizes that the derivative is taken while keeping $V$ and $N$ constant. One may wonder, at this point, what is the justification for defining temperature this way. We will get back to this point a bit later, but for now, we can easily see that this is indeed true, at least for the ideal gas, as by taking the derivative of (2.2.6) w.r.t. $E$, we get

$$\frac{\partial S(E, V, N)}{\partial E} = \frac{3Nk}{2E} = \frac{1}{T}, \qquad (2.2.8)$$

where the second equality has been shown already in Chap. 1.
Intuitively, in most situations, we expect that S(E) would be an increasing function
of E for fixed V and N (although this is not strictly always the case), which means
T ≥ 0. But T is also expected to increase with E (or equivalently, E is increasing
with T , as otherwise, the heat capacity dE/dT < 0). Thus, 1/T should decrease with
E, which means that the increase of S in E slows down as E grows. In other words,
we expect S(E) to be a concave function of E. In the above example, indeed, S(E)
is logarithmic and E = 3N kT /2, as we have seen.
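As a quick sanity check (an added illustration, not in the original text), the following lines evaluate the ideal-gas entropy (2.2.6) numerically, using the Stirling-approximated form, and verify that a finite-difference derivative $\partial S/\partial E$ reproduces $3Nk/(2E) = 1/T$ as in (2.2.7)–(2.2.8). The particle number, mass, volume and temperature are arbitrary illustrative values.

import numpy as np

k = 1.380649e-23     # Boltzmann constant [J/K]
h = 6.626e-34        # Planck constant [J s]
m = 6.65e-27         # particle mass [kg] (helium-like, assumed)
N = 1.0e22           # number of particles (illustrative)
V = 1.0e-3           # volume [m^3] (illustrative)
T = 300.0
E = 1.5 * N * k * T  # energy consistent with E = 3NkT/2

def S(E, V, N):
    # Eq. (2.2.6), second (Stirling-approximated) form
    return N * k * np.log((4 * np.pi * m * E / (3 * N))**1.5 * V / (N * h**3)) + 2.5 * N * k

dE = 1.0e-6 * E
dSdE = (S(E + dE, V, N) - S(E - dE, V, N)) / (2 * dE)   # numerical derivative
print("dS/dE     =", dSdE)
print("3Nk/(2E)  =", 1.5 * N * k / E)
print("1/T       =", 1.0 / T)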
How can we be convinced, in mathematical terms, that under certain regularity conditions, $S(E)$ is a concave function of $E$? The answer may be given by a simple superadditivity argument: as both $E$ and $S$ are extensive quantities, let us define $E = N\epsilon$ and, for a given density $\rho$,

$$s(\epsilon) = \lim_{N\to\infty}\frac{S(N\epsilon)}{N}, \qquad (2.2.9)$$

i.e., the per–particle entropy as a function of the per–particle energy, where we assume that the limit exists. Consider the case where the Hamiltonian is additive, i.e.,

$$E(x) = \sum_{i=1}^N E(x_i), \qquad (2.2.10)$$

just like in the above example where $E(x) = \sum_{i=1}^N \|\vec{p}_i\|^2/(2m)$. Then, the inequality

$$\Omega(N_1\epsilon_1 + N_2\epsilon_2) \ge \Omega(N_1\epsilon_1)\cdot\Omega(N_2\epsilon_2), \qquad (2.2.11)$$


Fig. 2.1 Schottky defects in a crystal lattice

expresses the simple fact that if our system is partitioned into two parts,⁴ one with $N_1$ particles and the other with $N_2 = N - N_1$ particles, then every combination of individual microstates with energies $N_1\epsilon_1$ and $N_2\epsilon_2$ corresponds to a combined microstate with a total energy of $N_1\epsilon_1 + N_2\epsilon_2$ (but there are more ways to split this total energy between the two parts). Thus,

$$\frac{k\ln\Omega(N_1\epsilon_1 + N_2\epsilon_2)}{N_1 + N_2} \ge \frac{k\ln\Omega(N_1\epsilon_1)}{N_1 + N_2} + \frac{k\ln\Omega(N_2\epsilon_2)}{N_1 + N_2}$$
$$= \frac{N_1}{N_1 + N_2}\cdot\frac{k\ln\Omega(N_1\epsilon_1)}{N_1} + \frac{N_2}{N_1 + N_2}\cdot\frac{k\ln\Omega(N_2\epsilon_2)}{N_2}, \qquad (2.2.12)$$

and so, by taking $N_1$ and $N_2$ to $\infty$, with $N_1/(N_1 + N_2) \to \lambda \in (0, 1)$, we get

$$s(\lambda\epsilon_1 + (1 - \lambda)\epsilon_2) \ge \lambda s(\epsilon_1) + (1 - \lambda) s(\epsilon_2), \qquad (2.2.13)$$

which establishes the concavity of $s(\cdot)$, at least in the case of an additive Hamiltonian. This means that the entropy of mixing two systems of particles is greater than the total entropy before they mix. A similar proof can be generalized to the case where $E(x)$ includes also a limited degree of interactions (short range interactions), e.g., $E(x) = \sum_{i=1}^N E(x_i, x_{i+1})$, but this requires somewhat more caution. In general, however, concavity may no longer hold when there are long range interactions, e.g., where some terms of $E(x)$ depend on a linear subset of particles.

Example 2.1 (Schottky defects) In a certain crystal, the atoms are located in a lattice, and at any positive temperature there may be defects, where some of the atoms are dislocated (see Fig. 2.1). Assuming that the defects are sparse enough, such that around each dislocated atom all neighbors are in place, the activation energy, $\epsilon_0$, required for dislocation is fixed. Denoting the total number of atoms by $N$ and the number of defected ones by $n$, the total energy is then $E = n\epsilon_0$, and so,

4 This argument works for distinguishable particles. Later on, a more general argument will be presented that holds for indistinguishable particles too.
$$\Omega(E) = \binom{N}{n} = \frac{N!}{n!(N-n)!}, \qquad (2.2.14)$$

or, equivalently,

$$S(E) = k\ln\Omega(E) = k\ln\frac{N!}{n!(N-n)!} \approx k[N\ln N - n\ln n - (N - n)\ln(N - n)], \qquad (2.2.15)$$

where in the last passage we have used the Stirling approximation. It is important to point out that here, unlike in the example of the ideal gas, we have not divided $A(E)$ by $N!$. The reason is that we do distinguish between two different configurations where the same number of particles were dislocated but the sites of dislocation are different. Yet, we do not distinguish between two microstates whose only difference is that two (identical) particles which are not dislocated are swapped. This is the reason for the denominator $n!(N-n)!$ in the expression of $\Omega(E)$. Now,⁵

$$\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{\mathrm{d}n}{\mathrm{d}E}\cdot\frac{\mathrm{d}S}{\mathrm{d}n} = \frac{1}{\epsilon_0}\cdot k\ln\frac{N - n}{n}, \qquad (2.2.16)$$

which gives the number of defects as

$$n = \frac{N}{\exp(\epsilon_0/kT) + 1}. \qquad (2.2.17)$$

At $T = 0$, there are no defects, but their number increases gradually with $T$, approximately according to $\exp(-\epsilon_0/kT)$. Note also that

$$S(E) = k\ln\binom{N}{n} \approx k N h_2\left(\frac{n}{N}\right) = k N h_2\left(\frac{E}{N\epsilon_0}\right) = k N h_2\left(\frac{\epsilon}{\epsilon_0}\right), \qquad (2.2.18)$$

where

$$h_2(x) = -x\ln x - (1 - x)\ln(1 - x), \qquad 0 \le x \le 1,$$

is the so called binary entropy function. Note also that $s(\epsilon) = k h_2(\epsilon/\epsilon_0)$ is indeed concave in this example.
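The closed-form results of this example are easy to probe numerically. The following sketch (an added illustration, not part of the original text) computes the equilibrium defect fraction (2.2.17) at a few temperatures for an assumed activation energy of 1 eV, and also evaluates the per-particle entropy $s(\epsilon) = k h_2(\epsilon/\epsilon_0)$ at a midpoint and on the corresponding chord, confirming its concave shape.

import numpy as np

k = 1.380649e-23          # Boltzmann constant [J/K]
eps0 = 1.602e-19          # activation energy [J] (1 eV, assumed for illustration)

def defect_fraction(T):
    # n/N from Eq. (2.2.17)
    return 1.0 / (np.exp(eps0 / (k * T)) + 1.0)

for T in (300.0, 600.0, 1200.0):
    print(f"T = {T:6.0f} K   n/N = {defect_fraction(T):.3e}")

def h2(x):
    # binary entropy function (natural logarithms), as defined after Eq. (2.2.18)
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

# Concavity of s(eps) = k*h2(eps/eps0): the chord lies below the curve
e1, e2, lam = 0.1 * eps0, 0.4 * eps0, 0.5
print("s at midpoint     :", k * h2((lam * e1 + (1 - lam) * e2) / eps0))
print("chord at midpoint :", lam * k * h2(e1 / eps0) + (1 - lam) * k * h2(e2 / eps0))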

5 Hereand in the sequel, the reader might wonder about the meaning of taking derivatives of, and
with respect to, integer valued variables, like the number of dislocated particles, n. To this end,
imagine an approximation where n is interpolated to be a continuous valued variable.

Suppose we have two systems that are initially at certain temperatures (and with
corresponding energies). At a certain time instant, the two systems are brought into
thermal contact with one another, but their combination remains isolated. What hap-
pens after a long time? How does the total energy E split and what is the final
temperature T of the combined system? The number of combined microstates where
subsystem no. 1 has energy E₁ and subsystem no. 2 has energy E₂ = E − E₁ is
Ω₁(E₁) · Ω₂(E − E₁). As the combined system is isolated, the probability of such
a combined macrostate is proportional to Ω₁(E₁) · Ω₂(E − E₁). Keeping in mind
that, normally, Ω₁ and Ω₂ are exponential in N, then for large N, this product is
dominated by the value of E₁ for which it is maximum, or equivalently, the sum
of logarithms, S₁(E₁) + S₂(E − E₁), is maximum, i.e., it is a maximum entropy
situation, which is the second law of thermodynamics, asserting that an isolated
system (in this case, combined of two subsystems) achieves its maximum possible
entropy in equilibrium. This maximum is normally achieved at the value of E 1 for
which the derivative vanishes, i.e.,

S₁′(E₁) − S₂′(E − E₁) = 0    (2.2.19)

or

S₁′(E₁) − S₂′(E₂) = 0    (2.2.20)

which means

1/T₁ ≡ S₁′(E₁) = S₂′(E₂) ≡ 1/T₂.    (2.2.21)

Thus, in equilibrium, which is the maximum entropy situation, the energy splits
in a way that temperatures are the same. Now, we can understand the concavity of
entropy more generally: λs(ε₁) + (1 − λ)s(ε₂) was the total entropy per particle when
two subsystems (with the same entropy function) were isolated from one another,
whereas s(λε₁ + (1 − λ)ε₂) is the equilibrium entropy per particle after we let them
interact thermally.
At this point, we are ready to justify why S′(E) is equal to 1/T in general, as
was promised earlier. Although it is natural to expect that equality between S₁′(E₁)
and S₂′(E₂), in thermal equilibrium, is related to equality between T₁ and T₂, this
does not automatically mean that the derivative of each entropy is given by one over
its temperature. On the face of it, for the purpose of this implication, this derivative
could have been equal to any one–to–one function of temperature f(T). To see why
f(T) = 1/T indeed, imagine that we have a system with an entropy function
S₀(E) and that we let it interact thermally with an ideal gas whose entropy function,
which we shall denote now by S_g(E), is given as in Eq. (2.2.6). Now, at equilibrium
S₀′(E₀) = S_g′(E_g), but as we have seen already, S_g′(E_g) = 1/T_g, where T_g is the
temperature of the ideal gas. But in thermal equilibrium the temperatures equalize,
i.e., Tg = T0 , where T0 is the temperature of the system of interest. It then follows

eventually that S₀′(E₀) = 1/T₀, which now means that in equilibrium, the derivative
of entropy of the system of interest is equal to the reciprocal of its temperature in
general, and not only for the ideal gas! At this point, the fact that our system has
interacted and equilibrated with an ideal gas is not important anymore and it does
not limit the generality of this statement. In simple words, our system does not ‘care’
what kind of system it has interacted with, whether ideal gas or any other. This
follows from a fundamental principle in thermodynamics, called the zero–th law of
thermodynamics, which states that thermal equilibrium has a transitive property:
If system A is in equilibrium with system B and system B is in equilibrium with
system C, then A is in equilibrium with C.
So we have seen that ∂ S/∂ E = 1/T , or equivalently, δS = δ E/T . But in the
absence of any mechanical work (V is fixed) applied to the system and any chemical
energy injected into the system (N is fixed), any change in energy must be in the
form of heat,6 thus we denote δ E = δ Q, where Q is the heat intake. Consequently,

δS = δQ/T,    (2.2.22)
This is exactly the definition of the classical thermodynamic entropy due to Clausius.
Thus, at least for the case where no mechanical work is involved, we have demon-
strated the equivalence of the two notions of entropy, the statistical notion due
to Boltzmann, S = k ln Ω, and the thermodynamic entropy due to Clausius,
S = ∫dQ/T, where the integration should be understood to be taken along a slow
(quasi–static) process, where after each small increase in the heat intake, the sys-
tem is allowed to equilibrate, which means that T is given enough time to adjust
before more heat is further added. For a given V and N, the difference ΔS between
the entropies S_A and S_B associated with two temperatures T_A and T_B (pertaining to
internal energies E_A and E_B, respectively) is given by ΔS = ∫_A^B dQ/T along such a
quasi–static process. This is a rule that defines entropy differences, but not absolute
levels. A reference value is determined by the third law of thermodynamics, which
asserts that as T tends to zero, the entropy tends to zero as well.7
We have seen what is the meaning of the partial derivative of S(E, V, N ) w.r.t. E.
Is there also a simple meaning to the partial derivative w.r.t. V ? Again, let us begin
by examining the ideal gas. Differentiating the expression of S(E, V, N ) of the ideal
gas w.r.t. V , we obtain
∂S(E, V, N)/∂V = Nk/V = P/T,    (2.2.23)

6 Heat is a form of energy that is transferred neither by mechanical work nor by matter. It is the type

of energy that flows spontaneously from a system/body at a higher temperature to one with a lower
temperature (and this transfer is accompanied by an increase in the total entropy).
7 In this context, it should be understood that the results we derived for the ideal gas hold only for

high enough temperatures: since S was found proportional to ln E and E is proportional to T , then
S is proportional to ln T , but this cannot be true for small T as it contradicts (among other things)
the third law.

where the second equality follows again from the equation of state. So at least for the
ideal gas, this partial derivative is related to the pressure P. For similar considerations
as before, the relation
∂S(E, V, N)/∂V = P/T    (2.2.24)
is true not only for the ideal gas, but in general. Consider again an isolated system
that consists of two subsystems, separated by a wall (or a piston). Initially, this
wall is fixed and the volumes are V1 and V2 . At a certain moment, this wall is
released and allowed to be pushed in either direction. How would the total volume
V = V1 + V2 divide between the two subsystems in equilibrium? Again, the total
entropy S1 (E 1 , V1 ) + S2 (E − E 1 , V − V1 ) would tend to its maximum for the same
reasoning as before. The maximum will be reached when the partial derivatives of
this sum w.r.t. both E 1 and V1 would vanish. The partial derivative w.r.t. E 1 has
already been addressed. The partial derivative w.r.t. V1 gives

P₁/T₁ = ∂S₁(E₁, V₁)/∂V₁ = ∂S₂(E₂, V₂)/∂V₂ = P₂/T₂    (2.2.25)

Since T₁ = T₂ by the thermal equilibrium pertaining to derivatives w.r.t. energies, it
follows that P₁ = P₂, which means mechanical equilibrium: the wall will be pushed
to the point where the pressures from both sides are equal. We now have the following
differential relationship:

δS = (∂S/∂E)δE + (∂S/∂V)δV = δE/T + PδV/T    (2.2.26)
or

δ E = T δS − PδV = δ Q − δW, (2.2.27)

which is the first law of thermodynamics, asserting that the change in the energy
δ E of a system with a fixed number of particles is equal to the difference between
the incremental heat intake δ Q and the incremental mechanical work δW carried out
by the system. This is nothing but a restatement of the law of energy conservation.

Example 2.2 (Compression of ideal gas) Consider again an ideal gas of N particles
at constant temperature T . The energy is E = 3N kT /2 regardless of the volume.
This means that if we compress (slowly) the gas from volume V1 to volume V2
(V2 < V1 ), the energy remains the same, in spite of the fact that we injected energy
by applying mechanical work
W = −∫_{V₁}^{V₂} P dV = −NkT ∫_{V₁}^{V₂} dV/V = NkT ln(V₁/V₂).    (2.2.28)

What happened to that energy? The answer is that it was transformed into heat as
the entropy of the system (which is proportional to ln V) has changed by the amount
ΔS = −Nk ln(V₁/V₂), and so, the heat intake Q = TΔS = −NkT ln(V₁/V₂)
exactly balances the work. ∎
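A one-line numerical check of this balance, with arbitrary illustrative values of N, T, V₁ and V₂ (Python sketch):

import numpy as np

k = 1.380649e-23
N, T = 1.0e23, 300.0
V1, V2 = 2.0e-3, 1.0e-3                # m^3, with V2 < V1

W = N * k * T * np.log(V1 / V2)        # mechanical work injected, Eq. (2.2.28)
dS = -N * k * np.log(V1 / V2)          # entropy change (S is proportional to ln V)
Q = T * dS                             # heat intake
print(W, Q, W + Q)                     # W + Q vanishes: the injected work leaves as heat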

Finally, we should consider the partial derivative of S w.r.t. N . This is given by

∂S(E, V, N)/∂N = −μ/T,    (2.2.29)
where μ is called the chemical potential. If we now consider again the isolated system,
which consists of two subsystems that are allowed to exchange, not only heat and
volume, but also particles (of the same kind), whose total number is N = N1 + N2 ,
then again, maximum entropy considerations would yield an additional equality
between the chemical potentials, μ1 = μ2 (chemical equilibrium).8 The chemical
potential should be understood as a kind of a force that controls the ability to inject
particles into the system. For example, if the particles are electrically charged, then
the chemical potential has a simple analogy to the electrical potential. The first law
is now extended to have an additional term, pertaining to an increment of chemical
energy, and it now reads:

δ E = T δS − PδV + μδ N . (2.2.30)

Equation (2.2.30) can be used to derive a variety of relations. For example, μ =
(∂E/∂N)_{S,V}, T = (∂H/∂S)_N, where H = E + PV is called the enthalpy, P =
μ(∂N/∂V)_{E,S}, and so on.

2.2.2 The Canonical Ensemble

So far we have assumed that our system is isolated, and therefore has a strictly fixed
energy E. Let us now relax this assumption and assume instead that our system is
free to exchange energy with its very large environment (heat bath) and that the total
energy of the system plus heat bath, E 0 , is by far larger than the typical energy of the
system. The combined system, composed of our original system plus the heat bath,
is now an isolated system at temperature T .

8 Equality of chemical potentials is, in fact, the general principle of chemical equilibrium, and not
equality of concentrations or densities. In Sect. 1.3, we saw equality of densities, because in the
case of the ideal gas, the chemical potential is a function of the density, so equality of chemical
potentials happens to be equivalent to equality of densities in this case.

Similarly as before, since the combined system is isolated, it is governed by the
microcanonical ensemble. The only difference is that now we assume that one of
the systems (the heat bath) is very large compared to the other (our test system).
This means that if our small system is in microstate x (for whatever definition of the
microstate vector) with energy E(x), then the heat bath must have energy E 0 − E(x)
to complement the total energy to E 0 . The number of ways that the heat bath may
have energy E₀ − E(x) is Ω_B(E₀ − E(x)), where Ω_B(·) is the state volume function
pertaining to the entropy of the heat bath. In other words, the volume/number of
microstates of the combined system for which the small subsystem is in microstate
x is Ω_B(E₀ − E(x)). Since the combined system is governed by the microcanonical
ensemble, the probability of this is proportional to Ω_B(E₀ − E(x)). More precisely:

P(x) = Ω_B(E₀ − E(x)) / Σ_{x′} Ω_B(E₀ − E(x′)).    (2.2.31)

Let us focus on the numerator for now, and normalize the result at the end. Then,

P(x) ∝ Ω_B(E₀ − E(x))
     = exp{S_B(E₀ − E(x))/k}
     ≈ exp{S_B(E₀)/k − (1/k)·[∂S_B(E)/∂E]|_{E=E₀} · E(x)}
     = exp{S_B(E₀)/k − E(x)/(kT)}
     ∝ exp{−E(x)/(kT)}.    (2.2.32)

It is customary to work with the so called inverse temperature:

β = 1/(kT)    (2.2.33)
and so,

P(x) ∝ e−βE(x) , (2.2.34)

as we have already seen in the example of the ideal gas (where E(x) was the kinetic
energy), but now it is much more general. Thus, all that remains to do is to normal-
ize, and we then obtain the Boltzmann–Gibbs (B–G) distribution, or the canonical
ensemble, which describes the underlying probability law in equilibrium:

P(x) = exp{−βE(x)}/Z(β)    (2.2.35)

where Z(β) is the normalization factor:

Z(β) = Σ_x exp{−βE(x)}    (2.2.36)

in the discrete case, or



Z(β) = ∫ dx exp{−βE(x)}    (2.2.37)

in the continuous case. This function is called the partition function. As with the
function Ω(E), similar comments apply to the partition function: it must be dimen-
sionless, so if the components of x do have physical units, we must normalize by a
‘reference’ volume, which in the case of the (ideal) gas is again h 3N . By the same
token, for indistinguishable particles, it should be divided by N ! While the micro-
canonical ensemble was defined in terms of the extensive variables E, V and N , in
the canonical ensemble, we replaced the variable E by the intensive variable that
controls it, namely, β (or T ). Thus, the full notation of the partition function should
be Z N (β, V ) or Z N (T, V ).

Exercise 2.1 Show that for the ideal gas

Z_N(T, V) = (1/(N!h^{3N})) V^N (2πmkT)^{3N/2} = (1/N!)(V/λ³)^N    (2.2.38)

where

λ = h/√(2πmkT).    (2.2.39)

λ is called the thermal de Broglie wavelength.9
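To get a feel for the scale of λ, the following sketch evaluates Eq. (2.2.39) for a helium-like atomic mass at room temperature and compares it to the typical interparticle spacing of a gas at atmospheric pressure; the mass and pressure values are assumptions used only for illustration.

import numpy as np

h = 6.62607015e-34        # Planck constant [J s]
k = 1.380649e-23          # Boltzmann constant [J/K]
m = 6.65e-27              # roughly the mass of a helium atom [kg] (illustrative)
T = 300.0                 # room temperature [K]
P = 1.013e5               # atmospheric pressure [Pa]

lam = h / np.sqrt(2 * np.pi * m * k * T)   # thermal de Broglie wavelength, Eq. (2.2.39)
spacing = (k * T / P) ** (1.0 / 3.0)       # mean interparticle distance from PV = NkT
print(f"lambda = {lam:.2e} m, spacing = {spacing:.2e} m, ratio = {spacing/lam:.1f}")

The wavelength (a fraction of an angstrom) comes out much smaller than the interparticle spacing (a few nanometers), which is the regime where the classical treatment of this chapter is adequate.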

The formula of the B–G distribution is one of the most fundamental results in sta-
tistical mechanics, obtained solely from the energy conservation law and the postulate
of the uniform distribution in an isolated system. As we shall see, the meaning of the
partition function is by far deeper than just being a normalization constant. Interest-
ingly, a great deal of the macroscopic physical quantities, like the internal energy,
the free energy, the entropy, the heat capacity, the pressure, etc., can be obtained
from the partition function. This is in analogy to the fact that in the microcanonical
ensemble, S(E) (or, more generally, S(E, V, N )) was pivotal to the derivation of all
macroscopic physical quantities of interest.

9 The origin of this name comes from the wave–particle de Broglie relation λ = h/p together with
the fact that the denominator, √(2πmkT), can be viewed as a notion of thermal momentum of the
ideal gas, given the fact that the average molecular speed is proportional to √(kT/m) (see Sect. 1.1).

Several comments are in order:


• The B–G distribution tells us that the system “prefers” to visit its low energy states
more than the high energy states, and what counts is only energy differences, not
absolute energies: If we add to all states a fixed amount of energy E 0 , this will
result in an extra factor of e−β E0 both in the numerator and in the denominator of
the B–G distribution, which of course will cancel out.
• In many physical systems, the Hamiltonian is a quadratic (or “harmonic”) function,
e.g., ½mv², ½kx², ½CV², ½LI², ½Iω², etc., in which case the resulting B–G
distribution turns out to be Gaussian. This is at least part of the explanation why
the Gaussian distribution is so frequently encountered in Nature.
• When the Hamiltonian is additive, that is, E(x) = Σ_{i=1}^N E(xᵢ), the various particles
are statistically independent: Additive Hamiltonians correspond to non–interacting
particles. In other words, the {xᵢ}’s behave as if they were drawn from an i.i.d. prob-
ability distribution. By the law of large numbers, (1/N)Σ_{i=1}^N E(xᵢ) will tend (almost
surely) to ε = ⟨E(Xᵢ)⟩. Thus, the average energy of the system is about N·ε, not
only on the average, but moreover, with an overwhelmingly high probability for
large N. Nonetheless, this is different from the microcanonical ensemble where
(1/N)Σ_{i=1}^N E(xᵢ) was held strictly at the value of ε.

One of the important principles of statistical mechanics is that the microcanonical
ensemble and the canonical ensemble (with the corresponding temperature) are
asymptotically equivalent (in the thermodynamic limit) as far as macroscopic quan-
tities go. They continue to be such, even in cases of interactions, as long as these are
short range10 and the same is true with the other ensembles that we will encounter
later in this chapter. This is an important and useful fact, because more often than
not, it is more convenient to analyze things in one ensemble rather than in others,
and so it is appropriate to pass to another ensemble for the purpose of the analysis,
even though the “real system” is in the other ensemble. We will use this ensemble
equivalence principle many times later on. The important thing, however, is to be
consistent and not to mix up two ensembles or more. Having moved to the other
ensemble, it is recommended to keep all further analysis in that ensemble.
Exercise 2.2 Consider the ideal gas with gravitation, where the Hamiltonian
includes, in addition to the kinetic energy term for each molecule, also an additive
term of potential energy mgz i for the i–th molecule (z i being its height). Suppose
that an ideal gas of N molecules of mass m is confined to a room whose floor and
ceiling areas are both A and whose height is h: (i) Write an expression for the joint
pdf of the location r and the momentum p of each molecule. (ii) Use this expression
to show that the gas pressures at the floor and the ceiling are given by

P_floor = mgN/[A(1 − e^{−mgh/kT})];    P_ceiling = mgN/[A(e^{mgh/kT} − 1)].    (2.2.40)
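A simple sanity check of Eq. (2.2.40) is that the net upward force on the gas, (P_floor − P_ceiling)·A, must equal its total weight Nmg. The following Python sketch verifies this numerically with arbitrary illustrative parameters.

import numpy as np

k = 1.380649e-23
N, m, g = 1.0e24, 4.8e-26, 9.81      # particle number, molecular mass [kg], gravity [m/s^2]
A, hgt, T = 10.0, 3.0, 300.0         # floor area [m^2], room height [m], temperature [K]

x = m * g * hgt / (k * T)
P_floor = m * g * N / (A * (1.0 - np.exp(-x)))     # Eq. (2.2.40)
P_ceiling = m * g * N / (A * (np.exp(x) - 1.0))    # Eq. (2.2.40)
print((P_floor - P_ceiling) * A, N * m * g)        # the two numbers coincide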

10 This is related to the concavity of s(ε) [3, 4].



Properties of the Partition Function and the Free Energy

Let us now examine more closely the partition function and make a few observations
about its basic properties. For simplicity, we shall assume that x is discrete. First, let’s
look at the limits: obviously, Z(0) is equal to the size of the entire set of microstates,
which is also Σ_E Ω(E). This is the high temperature limit, where all microstates
are equiprobable. At the other extreme, we have:

lim_{β→∞} [ln Z(β)/β] = −min_x E(x) = −E_GS    (2.2.41)

which describes the situation where the system is frozen to the absolute zero. Only
states with minimum energy – the ground–state energy, prevail.
Another important property of Z (β), or more precisely, of ln Z (β), is that it is
a cumulant generating function: by taking derivatives of ln Z (β), we can obtain
cumulants of E(x). For the first cumulant, we have

⟨E(X)⟩ = Σ_x E(x)e^{−βE(x)} / Σ_x e^{−βE(x)} = −d ln Z(β)/dβ.    (2.2.42)

For example, referring to Exercise 2.1, for the ideal gas,

Z_N(β, V) = (1/N!)(V/λ³)^N = (V^N/(N!h^{3N})) · (2πm/β)^{3N/2},    (2.2.43)

thus, ⟨E(X)⟩ = −d ln Z_N(β, V)/dβ = 3N/(2β) = 3NkT/2, in agreement with
the result we obtained both in Chap. 1 and in the microcanonical ensemble, thus
demonstrating the ensemble equivalence principle. Similarly, it is easy to show that

Var{E(X)} = ⟨E²(X)⟩ − ⟨E(X)⟩² = d² ln Z(β)/dβ².    (2.2.44)

This in turn implies that


d² ln Z(β)/dβ² ≥ 0,    (2.2.45)

which means that ln Z (β) must always be a convex function. Note also that

d² ln Z(β)/dβ² = −d⟨E(x)⟩/dβ
               = −(d⟨E(x)⟩/dT) · (dT/dβ)
               = kT²C(T)    (2.2.46)

where C(T) = d⟨E(x)⟩/dT is the heat capacity (at constant volume). Thus, the
convexity of ln Z (β) is intimately related to the physical fact that the heat capacity
of the system is positive.
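The cumulant-generating property is easy to check numerically. For the ideal gas, only the β-dependent part of ln Z_N matters for the derivative, so the sketch below compares a finite-difference estimate of −d ln Z/dβ with 3N/(2β); the values of N and β are arbitrary illustrative choices.

import numpy as np

N, beta = 100, 2.5

def lnZ_beta_part(b):
    # beta-dependent part of ln Z_N(beta, V) for the ideal gas, cf. Eq. (2.2.43)
    return -1.5 * N * np.log(b)

db = 1e-6
E_avg = -(lnZ_beta_part(beta + db) - lnZ_beta_part(beta - db)) / (2 * db)
print(E_avg, 1.5 * N / beta)   # both should equal 3N/(2*beta) = 60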
Next, we look at the function Z (β) slightly differently. Instead of summing the
terms {e−βE(x) } over all states individually, we sum them by energy levels, in a
collective manner. This amounts to:

Z(β) = Σ_x e^{−βE(x)}
     = Σ_E Ω(E)e^{−βE}
     ≈ Σ_ε e^{Ns(ε)/k} · e^{−βNε}
     = Σ_ε exp{−Nβ[ε − Ts(ε)]}
     ≐ max_ε exp{−Nβ[ε − Ts(ε)]}
     = exp{−Nβ min_ε [ε − Ts(ε)]}
     = exp{−Nβ[ε* − Ts(ε*)]}
     = e^{−βF},    (2.2.47)

where here and throughout the sequel, the notation ≐ means asymptotic equivalence
in the exponential scale. More precisely, a_N ≐ b_N for two positive sequences {a_N}
and {b_N} means that lim_{N→∞} (1/N) ln(a_N/b_N) = 0.


The quantity f = ε* − Ts(ε*) is the (per–particle) free energy. Similarly, the
entire free energy, F, is defined as

F = E − TS = −ln Z(β)/β = −kT ln Z(β).    (2.2.48)

Once again, due to the exponentiality of (2.2.47) in N , with very high probability the
system would be found in a microstate x whose normalized energy ε(x) = E(x)/N is
very close to ε*, the normalized energy that minimizes ε − Ts(ε) and hence achieves
f. Note that the minimizing ε* (obtained by equating the derivative of ε − Ts(ε)
to zero) is the solution to the equation s′(ε*) = 1/T, which conforms with the
definition of temperature. We see then that equilibrium in the canonical ensemble
amounts to minimum free energy. This extends the second law of thermodynamics
from isolated systems to non–isolated ones. While in an isolated system, the second
law asserts the principle of maximum entropy, when it comes to a non–isolated
system, this rule is replaced by the principle of minimum free energy.

Exercise 2.3 Show that the canonical average pressure is given by

P = −∂F/∂V = kT · ∂ ln Z_N(β, V)/∂V.
Examine this formula for the canonical ensemble of the ideal gas. Compare to the
equation of state.

The physical meaning of the free energy, or more precisely, the difference between
two free energies F1 and F2 , is the minimum amount of work that it takes to transfer
the system from equilibrium state 1 to another equilibrium state 2 in an isothermal
(fixed temperature) process. This minimum is achieved when the process is quasi–
static, i.e., so slow that the system is always almost in equilibrium. Equivalently,
−F is the maximum amount of energy in the system, that is free and useful for
performing work (i.e., not dissipated as heat) in fixed temperature.
To demonstrate this point, let us consider the case where E(x) includes a term of
a potential energy that is given by the (scalar) product of a certain external force and
the conjugate physical variable at which this force is exerted (e.g., pressure times
volume, gravitational force times height, moment times angle, magnetic field times
magnetic moment, voltage times electric charge, etc.), i.e.,

E(x) = E₀(x) − λ · L(x)    (2.2.49)

where λ is the force and L(x) is the conjugate physical variable, which depends on
(some coordinates of) the microstate. The partition function then depends on both β
and λ and hence will be denoted11 Z (β, λ). It is easy to see (similarly as before) that
ln Z (β, λ) is convex in λ for fixed β. Also,

⟨L(x)⟩ = kT · ∂ ln Z(β, λ)/∂λ.    (2.2.50)

The free energy is given by12

11 Since the term λ · L(x) is not considered part of the internal energy (but rather an external energy

resource), formally, this ensemble is no longer the canonical ensemble, but a somewhat different
ensemble, called the Gibbs ensemble, which will be discussed later on.
12 At this point, there is a distinction between the Helmholtz free energy and the Gibbs free energy.

The former is defined as F = E − TS in general, as mentioned earlier. The latter is defined as
G = E − TS − λL = −kT ln Z, where L is shorthand notation for ⟨L(x)⟩ (the quantity H = E − λL
is called the enthalpy). The physical significance of the Gibbs free energy is similar to that of the
Helmholtz free energy, except that it refers to the total work of all other external forces in the system
(if there are any), except the work contributed by the force λ (Exercise 2.4 show this!). The passage
to the Gibbs ensemble, which replaces a fixed value of L(x) (say, constant volume of a gas) by
the control of the conjugate external force λ, (say, pressure in the example of a gas) can be carried
out by another Legendre–Fenchel transform (see, e.g., [5, Sect. 1.14]) as well as Sect. 2.2.3 in the
sequel.

F = E − TS
  = −kT ln Z + λ⟨L(X)⟩
  = kT[λ · ∂ ln Z/∂λ − ln Z].    (2.2.51)

Now, let F₁ and F₂ be the equilibrium free energies pertaining to two values of λ,
denoted λ₁ and λ₂. Then,

F₂ − F₁ = ∫_{λ₁}^{λ₂} dλ · ∂F/∂λ
        = kT · ∫_{λ₁}^{λ₂} dλ · λ · ∂² ln Z/∂λ²
        = ∫_{λ₁}^{λ₂} dλ · λ · ∂⟨L(X)⟩/∂λ
        = ∫_{⟨L(X)⟩_{λ₁}}^{⟨L(X)⟩_{λ₂}} λ · d⟨L(X)⟩    (2.2.52)

The product λ · d⟨L(X)⟩ designates an infinitesimal amount of (average) work performed
by the force λ on a small change in the average of the conjugate variable
⟨L(X)⟩, where the expectation is taken w.r.t. the actual value of λ. Thus, the last
integral expresses the total work along a slow process of changing the force λ in
small steps and letting the system adapt and equilibrate after this small change every
time. On the other hand, it is easy to show (using the convexity of ln Z in λ), that if
λ varies in large steps, the resulting amount of work will always be larger.
Let us define
φ(β) = lim_{N→∞} ln Z(β)/N    (2.2.53)

and, in order to avoid dragging the constant k, let us define

Σ(ε) = lim_{N→∞} ln Ω(Nε)/N = s(ε)/k.    (2.2.54)

Then, the chain of equalities (2.2.47), written slightly differently, gives

φ(β) = lim_{N→∞} ln Z(β)/N
     = lim_{N→∞} (1/N) ln[Σ_ε e^{N[Σ(ε)−βε]}]
     = max_ε [Σ(ε) − βε].


Thus, φ(β) is (a certain variant of) the Legendre–Fenchel transform13 of Σ(ε). As
Σ(ε) is (normally) a concave, monotonically increasing function, then it can readily
be shown that the inverse transform is:

Σ(ε) = min_β [βε + φ(β)].    (2.2.55)

The achiever, ε*(β), of φ(β) in the forward transform is obtained by equating the
derivative to zero, i.e., it is the solution to the equation

β = Σ′(ε),    (2.2.56)

where Σ′(ε) is the derivative of Σ(ε). In other words, ε*(β) is the inverse function of
Σ′(·). By the same token, the achiever, β*(ε), of Σ(ε) in the backward transform is
obtained by equating the other derivative to zero, i.e., it is the solution to the equation

ε = −φ′(β)    (2.2.57)

or in other words, the inverse function of −φ′(·). This establishes a relationship
between the typical per–particle energy ε and the inverse temperature β that gives
rise to ε (cf. the Lagrange interpretation above, where we said that β controls the
average energy).

Example 2.3 (Two level system) Similarly to the earlier example of Schottky defects,
which was previously given in the context of the microcanonical ensemble, consider
now a system of N independent particles, each having two possible states: state 0 of
zero energy and state 1, whose energy is ε₀, i.e., E(x) = ε₀x, x ∈ {0, 1}. The xᵢ’s are
independent, each having a marginal14:

P(x) = e^{−βε₀x}/(1 + e^{−βε₀}),    x ∈ {0, 1}.    (2.2.58)

In this case,

φ(β) = ln(1 + e^{−βε₀})    (2.2.59)

13 More precisely, the one–dimensional Legendre–Fenchel transform of a real function f(x) is
defined as g(y) = sup_x [xy − f(x)]. If f is convex, it can readily be shown that: (i) The inverse
transform has the very same form, i.e., f(x) = sup_y [xy − g(y)], and (ii) The derivatives f′(x) and
g′(y) are inverses of each other.
14 Note that the expected number of ‘activated’ particles is n = N P(1) = N e^{−βε₀}/(1 + e^{−βε₀}) =
N/(e^{βε₀} + 1), in agreement with the result of Example 2.1 (Eq. (2.2.17)). This demonstrates the
ensemble equivalence principle.

and

Σ(ε) = min_{β≥0} [βε + ln(1 + e^{−βε₀})].    (2.2.60)

To find β*(ε), we take the derivative and equate to zero:

ε − ε₀e^{−βε₀}/(1 + e^{−βε₀}) = 0    (2.2.61)

which gives

β*(ε) = ln(ε₀/ε − 1)/ε₀.    (2.2.62)

On substituting this back into the above expression of Σ(ε), we get:

Σ(ε) = (ε/ε₀) ln(ε₀/ε − 1) + ln[1 + exp{−ln(ε₀/ε − 1)}],    (2.2.63)

which after a short algebraic manipulation, becomes

Σ(ε) = h₂(ε/ε₀),    (2.2.64)

just like in the Schottky example. In the other direction:

φ(β) = max_ε [h₂(ε/ε₀) − βε],    (2.2.65)

whose achiever ε*(β) solves the zero–derivative equation:

(1/ε₀) ln[(1 − ε/ε₀)/(ε/ε₀)] = β    (2.2.66)

or equivalently,

ε*(β) = ε₀/(e^{βε₀} + 1),    (2.2.67)

which is exactly the inverse function of β*(ε) above, and which when substituted
back into the expression of φ(β), indeed gives

φ(β) = ln(1 + e^{−βε₀}).    (2.2.68)

This concludes Example 2.3.
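The Legendre–Fenchel relation of this example is easy to verify numerically: for any β, maximizing h₂(ε/ε₀) − βε over a fine grid of ε should reproduce φ(β) = ln(1 + e^{−βε₀}). A minimal Python sketch, with ε₀ = 1 as an arbitrary normalization:

import numpy as np

eps0 = 1.0
eps = np.linspace(1e-6, eps0 - 1e-6, 200001)   # grid over (0, eps0)

def h2(x):
    return -x * np.log(x) - (1.0 - x) * np.log(1.0 - x)

for beta in [0.2, 1.0, 3.0]:
    lhs = np.max(h2(eps / eps0) - beta * eps)   # numerical maximum over the grid
    rhs = np.log1p(np.exp(-beta * eps0))        # phi(beta) of Eq. (2.2.59)
    print(f"beta = {beta}:  grid maximum = {lhs:.6f},  phi(beta) = {rhs:.6f}")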



Comment A very similar model (and hence with similar results) pertains to non–
interacting spins (magnetic moments), where the only difference is that x ∈ {−1, +1}
rather than x ∈ {0, 1}. Here, the meaning of the parameter ε₀ becomes that of
a magnetic field, which is more customarily denoted by B (or H), and which is
either parallel or anti-parallel to that of the spin, and so the potential energy (in the
appropriate physical units), B · x, is either Bx or −Bx. Thus,

P(x) = e^{βBx}/[2 cosh(βB)];    Z(β) = 2 cosh(βB).    (2.2.69)

The net magnetization per–spin is defined as

m = ⟨(1/N) Σ_{i=1}^N X_i⟩ = ⟨X₁⟩ = ∂φ/∂(βB) = tanh(βB).    (2.2.70)

This is the paramagnetic characteristic of the magnetization as a function of the
magnetic field: as B → ±∞, the magnetization m → ±1 accordingly. When the
magnetic field is removed (B = 0), the magnetization vanishes too. We will get back
to this model and its extensions in Chap. 5. ∎
Now, observe that whenever β and ε are related as explained above, we have:

Σ(ε) = βε + φ(β) = φ(β) − β · φ′(β).    (2.2.71)

The Gibbs entropy per particle is defined in its normalized form as

H̄ = −lim_{N→∞} (1/N) Σ_x P(x) ln P(x) = −lim_{N→∞} (1/N) ⟨ln P(X)⟩,    (2.2.72)

which in the case of the B–G distribution amounts to

H̄ = lim_{N→∞} (1/N) ⟨ln[Z(β)/e^{−βE(X)}]⟩
  = lim_{N→∞} [ln Z(β)/N + β⟨E(X)⟩/N]
  = φ(β) − β · φ′(β),

but this is exactly the same expression as in (2.2.71), and so, Σ(ε) and H̄ are identical
whenever β and ε are related accordingly. The former, as we recall, we defined as the
normalized logarithm of the number of microstates with per–particle energy ε. Thus,
we have learned that the number of such microstates is of the exponential order of
e^{N H̄}. Another look at this relation is the following:

1 ≥ Σ_{x: E(x)≈Nε} P(x) = Σ_{x: E(x)≈Nε} exp{−β Σ_i E(x_i)}/Z_N(β)
  ≐ Σ_{x: E(x)≈Nε} exp{−βNε − Nφ(β)}
  = Ω(Nε) · exp{−N[βε + φ(β)]}    (2.2.73)

which means that

Ω(Nε) ≤ exp{N[βε + φ(β)]}    (2.2.74)

for all β, and so,

Ω(Nε) ≤ exp{N min_β [βε + φ(β)]} = e^{NΣ(ε)} = e^{N H̄}.    (2.2.75)

A compatible lower bound is obtained by observing that the minimizing β gives rise
to ⟨E(X₁)⟩ = ε, which makes the event {x : E(x) ≈ Nε} a high–probability event,
by the weak law of large numbers. A good reference for further study, and from a
more general perspective, is the article by Hall [6]. See also [7].
Now, that we identified the Gibbs entropy with the Boltzmann entropy, it is instruc-
tive to point out that the B–G distribution could have been obtained also in a different
manner, owing to the maximum–entropy principle that stems from the second law,
or the minimum free–energy principle. Specifically, let us denote the Gibbs entropy
as

H (P) = − P(x) ln P(x) (2.2.76)
x

and consider the following optimization problem:

max H(P)   s.t.   ⟨E(X)⟩ = E    (2.2.77)

By formalizing the equivalent Lagrange problem, where β now plays the role of a
Lagrange multiplier:

max { H(P) + β[E − Σ_x P(x)E(x)] },    (2.2.78)

or equivalently,

min { Σ_x P(x)E(x) − H(P)/β }    (2.2.79)

one readily verifies that the solution to this problem is the B–G distribution where the
choice of the (Lagrange multiplier) β controls the average energy E. If β is identified
with the inverse temperature, the above is nothing but the minimization of the free
energy.
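The following sketch illustrates this variational view on a toy four-level system: the value of β that meets the energy constraint is found by bisection (since ⟨E⟩ is decreasing in β), and perturbing the resulting B–G distribution along a direction that preserves both normalization and mean energy can only decrease the Gibbs entropy. The level energies and the target ⟨E⟩ are arbitrary illustrative choices.

import numpy as np

E = np.array([0.0, 1.0, 2.0, 5.0])   # toy energy levels
E_target = 1.2                       # energy constraint

def bg(beta):
    a = -beta * E
    w = np.exp(a - a.max())          # shifted to avoid overflow
    return w / w.sum()

lo, hi = -50.0, 50.0
for _ in range(200):                 # bisection on the mean energy
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if np.dot(bg(mid), E) > E_target else (lo, mid)
P = bg(0.5 * (lo + hi))

def H(Q):
    return -np.sum(Q * np.log(Q))

v = np.array([1.0, -2.0, 1.0, 0.0])  # v sums to zero and is orthogonal to E
for t in [0.0, 0.02, 0.05]:
    Q = P + t * v                    # feasible: same normalization and same <E>
    print(f"t = {t:.2f}   <E> = {np.dot(Q, E):.3f}   H = {H(Q):.5f}")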
Note also that Eq. (2.2.71), which we will rewrite, with a slight abuse of notation
as
φ(β) − βφ′(β) = Σ(β)    (2.2.80)

can be viewed in two ways. The first suggests to take derivatives of both sides w.r.t.
β and then obtain Σ′(β) = −βφ″(β), and so,

s(β) = kΣ(β)
     = k ∫_β^∞ β̃ φ″(β̃) dβ̃        [Σ(∞) = 0 by the 3rd law]
     = k ∫_0^T (1/(kT̃)) · kT̃²c(T̃) · dT̃/(kT̃²)        [c(T̃) = heat capacity per particle]
     = ∫_0^T c(T̃) dT̃/T̃    (2.2.81)

recovering the Clausius entropy as c(T̃ )dT̃ is the increment of heat intake per particle
dq. The second way to look at Eq. (2.2.80) is as a first order differential equation in
φ(β), whose solution is easily found to be

φ(β) = −βε_GS + β · ∫_β^∞ dβ̂ Σ(β̂)/β̂²,    (2.2.82)

where ε_GS = lim_{N→∞} E_GS/N. Equivalently,

Z(β) ≐ exp{−βE_GS + Nβ · ∫_β^∞ dβ̂ Σ(β̂)/β̂²},    (2.2.83)

namely, the partition function at a certain temperature can be expressed as a functional
of the entropy pertaining to all temperatures lower than that temperature. Changing
the integration variable from β to T, this readily gives the relation

F = E_GS − ∫_0^T S(T′)dT′.    (2.2.84)

Since F = E − ST, we have

E = E_GS + ST − ∫_0^T S(T′)dT′ = E_GS + ∫_0^S T(S′)dS′,    (2.2.85)

where the second term amounts to the heat Q that accumulates in the system, as
the temperature is raised from 0 to T . This is a special case of the first law of
thermodynamics. The more general form, as said, takes into account also possible
work performed on (or by) the system.
Let us now summarize the main properties of the partition function that we have
seen thus far:
1. Z(β) is a continuous function. Z(0) = |X^N| and lim_{β→∞} ln Z(β)/β = −E_GS.
2. Generating cumulants: ⟨E(X)⟩ = −d ln Z/dβ, Var{E(X)} = d² ln Z/dβ², which
implies convexity of ln Z, and hence also of φ(β).
3. φ and Σ are a Legendre–Fenchel transform pair. Σ is concave.
We have also seen that Boltzmann’s entropy is not only equivalent to the Clausius
entropy, but also to the Gibbs/Shannon entropy. Thus, there are actually three different
forms of the expression of entropy.

Comment Consider Z(β) for an imaginary temperature β = iω, where i = √−1,
and define z(E) as the inverse Fourier transform of Z(iω). It can readily be seen that
z(E) = ω(E), the density of states, i.e., for E₁ < E₂, the number of states with energy
between E₁ and E₂ is given by ∫_{E₁}^{E₂} z(E)dE. Thus, Z(·) can be related to energy enumeration in two
different ways: one is by the Legendre–Fenchel transform of ln Z (β) for real β, and
the other is by the inverse Fourier transform of Z (β) for imaginary β. It should be
kept in mind, however, that while the latter relation holds for every system size N ,
the former is true only in the thermodynamic limit, as mentioned.

The Energy Equipartition Theorem

It turns out that in the case of a quadratic Hamiltonian, E(x) = ½αx², which means
that x is Gaussian, the average per–particle energy is always given by 1/(2β) =
kT/2, independently of α. If we have N such quadratic terms, then of course, we
end up with NkT/2. In the case of the ideal gas, we have three such terms (one
for each dimension) per particle, thus a total of 3N terms, and so, E = 3NkT/2,
which is exactly the expression we obtained also from the microcanonical ensemble
as well as in the previous chapter. In fact, we observe that in the canonical ensemble,
whenever we have a Hamiltonian of the form (α/2)xᵢ² plus some arbitrary terms that
do not depend on xᵢ, then xᵢ is Gaussian (with variance kT/α) and independent of
the other variables, i.e., p(xᵢ) ∝ e^{−αxᵢ²/(2kT)}. Hence it contributes an amount of

 
⟨½αXᵢ²⟩ = ½α · (kT/α) = kT/2    (2.2.86)

to the total average energy, independently of α. It is more precise to refer to this xᵢ as
a degree of freedom rather than a particle. This is because in the three–dimensional
world, the kinetic energy, for example, is given by p_x²/(2m) + p_y²/(2m) + p_z²/(2m),
that is, each particle contributes three additive quadratic terms rather than one ( just

like three independent one–dimensional particles) and so, it contributes 3kT /2. This
principle is called the energy equipartition theorem.
Below is a direct derivation of the equipartition theorem:

⟨½αX²⟩ = ∫_{−∞}^{∞} dx (αx²/2) e^{−βαx²/2} / ∫_{−∞}^{∞} dx e^{−βαx²/2}
       = −(∂/∂β) ln[∫_{−∞}^{∞} dx e^{−βαx²/2}]
       = −(∂/∂β) ln[(1/√β) ∫_{−∞}^{∞} d(√β x) e^{−α(√β x)²/2}]
       = −(∂/∂β) ln[(1/√β) ∫_{−∞}^{∞} du e^{−αu²/2}]
       = ½ · d ln β/dβ = 1/(2β) = kT/2.

Note that although we could have used closed–form expressions for both the numer-
ator and the denominator of the first line, we have deliberately taken a somewhat
different route in the second line, where we have presented it as the derivative of
the denominator of the first line. Also, rather than calculating the Gaussian integral
explicitly, we only figured out how it scales with β, because this is the only thing
that matters after taking the derivative relative to β. The reason for using this trick
of bypassing the need to calculate integrals, is that it can easily be extended in two
directions at least:
1. Let x ∈ ℝ^N and let E(x) = ½xᵀAx, where A is an N × N positive definite
matrix. This corresponds to a physical system with a quadratic Hamiltonian, which
includes also interactions between pairs (e.g., harmonic oscillators or springs, which
are coupled because they are tied to one another). It turns out that here, regardless of
A, we get:

⟨E(X)⟩ = ⟨½XᵀAX⟩ = N · kT/2.    (2.2.87)

2. Back to the case of a scalar x, but suppose now a more general power–law
Hamiltonian, E(x) = α|x|^θ. In this case, we get

⟨E(X)⟩ = ⟨α|X|^θ⟩ = kT/θ.    (2.2.88)

Moreover, if lim_{x→±∞} xe^{−βE(x)} = 0 for all β > 0, and we denote E′(x) = dE(x)/dx,
then

⟨X · E′(X)⟩ = kT.    (2.2.89)

It is easy to see that the earlier power–law result is obtained as a special case of this,
as E′(x) = αθ|x|^{θ−1} sgn(x) in this case.
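A direct numerical check of this generalization, computing the B–G average ⟨α|X|^θ⟩ by simple quadrature (in units where k = 1, with illustrative values of α and θ):

import numpy as np

k, T = 1.0, 2.0
beta = 1.0 / (k * T)
x = np.linspace(-50.0, 50.0, 400001)
dx = x[1] - x[0]

for alpha, theta in [(1.0, 2.0), (3.0, 2.0), (0.5, 4.0), (2.0, 1.0)]:
    Ex = alpha * np.abs(x) ** theta
    w = np.exp(-beta * Ex)                        # unnormalized B-G weights
    avg = np.sum(Ex * w) / np.sum(w)              # <E(X)> (dx cancels in the ratio)
    print(f"alpha = {alpha}, theta = {theta}:  <E> = {avg:.4f}   kT/theta = {k*T/theta:.4f}")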

Example 2.4 (Ideal gas with gravitation) Let

E(x) = (p_x² + p_y² + p_z²)/(2m) + mgz.    (2.2.90)

The average kinetic energy of each particle is 3kT/2, as said before. The contribution
of the average potential energy is kT (one degree of freedom with θ = 1). Thus, the
total is 5kT/2, where 60% come from kinetic energy and 40% come from potential
energy, universally, that is, independent of T, m, and g. ∎

2.2.3 The Grand–Canonical Ensemble and the Gibbs Ensemble

A brief summary of what we have done thus far, is the following: we started with
the microcanonical ensemble, which was very restrictive in the sense that the energy
was held strictly fixed to the value of E, the number of particles was held strictly
fixed to the value of N , and at least in the example of a gas, the volume was also held
strictly fixed to a certain value V . In the passage from the microcanonical ensemble
to the canonical one, we slightly relaxed the first of these parameters, E: rather than
insisting on a fixed value of E, we allowed energy to be exchanged back and forth
with the environment, and thereby to slightly fluctuate (for large N ) around a certain
average value, which was controlled by temperature, or equivalently, by the choice
of β. This was done while keeping in mind that the total energy of both system and
heat bath must be kept fixed, by the law of energy conservation, which allowed us
to look at the combined system as an isolated one, thus obeying the microcanonical
ensemble. We then had a one–to–one correspondence between the extensive quantity
E and the intensive variable β, that adjusted its average value. But the other extensive
variables, like N and V were still kept strictly fixed.
It turns out, that we can continue in this spirit, and ‘relax’ also either one of the
other variables N or V (but not both at the same time), allowing it to fluctuate around
a typical average value, and controlling it by a corresponding intensive variable. Like
E, both N and V are also subjected to conservation laws when the combined system
is considered. Each one of these relaxations leads to a new ensemble, in addition to
the microcanonical and the canonical ensembles that we have already seen. In the
case where it is the variable V that is allowed to be flexible, this ensemble is called
the Gibbs ensemble. In the case where it is the variable N , this ensemble is called
the grand–canonical ensemble. There are, of course, additional ensembles based on
this principle, depending on the kind of the physical system.

The Grand–Canonical Ensemble

The fundamental idea is essentially the very same as the one we used to derive the
canonical ensemble: let us get back to our (relatively small) subsystem, which is in
contact with a heat bath, and this time, let us allow this subsystem to exchange with
the heat bath, not only energy, but also matter, i.e., particles. The heat bath consists of
a huge reservoir of energy and particles. The total energy is E 0 and the total number
of particles is N0 . Suppose that we can calculate the number/volume of states of the
heat bath as a function of both its energy E′ and amount of particles N′, and denote
this function by Ω_B(E′, N′). A microstate is now a combination (x, N), where N is
the (variable) number of particles in our subsystem and x is as before for a given N .
From the same considerations as before, whenever our subsystem is in state (x, N ),
the heat bath can be in any one of Ω_B(E₀ − E(x), N₀ − N) microstates of its own.
Thus, owing to the microcanonical ensemble,

P(x, N) ∝ Ω_B(E₀ − E(x), N₀ − N)
        = exp{S_B(E₀ − E(x), N₀ − N)/k}
        ≈ exp{S_B(E₀, N₀)/k − (1/k)(∂S_B/∂E)·E(x) − (1/k)(∂S_B/∂N)·N}
        ∝ exp{−E(x)/(kT) + μN/(kT)}    (2.2.91)

where μ is the chemical potential of the heat bath. Thus, we now have the grand–
canonical distribution:
P(x, N) = e^{β[μN − E(x)]}/Ξ(β, μ),    (2.2.92)

where the denominator is called the grand partition function:

Ξ(β, μ) = Σ_{N=0}^∞ e^{βμN} Σ_x e^{−βE(x)} = Σ_{N=0}^∞ e^{βμN} Z_N(β).    (2.2.93)

Example 2.5 (Grand partition function of the ideal gas) Using the result of Exercise
2.1, we have for the ideal gas:

Ξ(β, μ) = Σ_{N=0}^∞ e^{βμN} · (1/N!)(V/λ³)^N
        = Σ_{N=0}^∞ (1/N!)(e^{βμ} · V/λ³)^N
        = exp{e^{βμ} · V/λ³}.    (2.2.94)

It is convenient to change variables and to define z = e^{βμ} (which is called the
fugacity) and then, define

Ξ̃(β, z) = Σ_{N=0}^∞ z^N Z_N(β).    (2.2.95)

This notation emphasizes the fact that for a given β, Ξ̃(β, z) is actually the z–transform
of the sequence {Z_N(β)}_{N≥0}. A natural way to think about P(x, N) is as P(N) ·
P(x|N), where P(N) is proportional to z^N Z_N(β) and P(x|N) corresponds to the
canonical ensemble as before.
Using the grand partition function, it is now easy to obtain moments of the random
variable N . For example, the first moment is:

⟨N⟩ = Σ_N N z^N Z_N(β) / Σ_N z^N Z_N(β) = z · ∂ ln Ξ̃(β, z)/∂z.    (2.2.96)

Thus, we have replaced the fixed number of particles, N, by a random number of
particles, which concentrates around an average controlled by μ, or equivalently,
by z. The dominant15 value of N is the one that maximizes the product z^N Z_N(β),
or equivalently, βμN + ln Z_N(β) = β(μN − F_N). Thus, ln Ξ̃ is related to ln Z_N
by another kind of a Legendre–Fenchel transform: ln Ξ̃(β, z, V) ≈ max_N [βμN +
ln Z_N(β, V)], or equivalently

kT ln Ξ̃(β, z, V) ≈ max_N [μN + kT ln Z_N(β, V)].

Note that by passing to the grand–canonical ensemble, we have replaced two
extensive quantities, E and N, by their respective conjugate intensive variables,
T and μ. This means that the grand partition function depends on one remaining
extensive variable only, which is V, and so, under ordinary conditions, ln Ξ(β, z),
or in its more complete notation, ln Ξ(β, z, V), depends linearly on V at least in the
thermodynamic limit, namely, lim_{V→∞} [ln Ξ(β, z, V)]/V tends to a constant that
depends only on β and z. What is this constant? Let us examine again the first law in
its more general form, as it appears in Eq. (2.2.30). For fixed T and μ, we have the
following:

15 The best way to understand this is in analogy to the derivation of ε* as the minimizer of the free
energy in the canonical ensemble, except that now the ‘big’ extensive variable is V rather than
N, so that z^N Z_N(β, V) is roughly exponential in V for a given fixed ρ = N/V. The exponential
coefficient depends on ρ, and the ‘dominant’ ρ* maximizes this coefficient. Finally, the ‘dominant’
N is N* = ρ*V.

PδV = μδN + TδS − δE
     = δ(μN + TS − E)
     = δ(μN − F)
     ≈ kT · δ[ln Ξ(β, z, V)]    (V large)    (2.2.97)

Thus, the constant of proportionality must be P. In other words, the grand–canonical
formula of the pressure is:

P = kT · lim_{V→∞} ln Ξ(β, z, V)/V.    (2.2.98)

This is different from the canonical–ensemble formula (Exercise 2.3), P = kT · ∂ ln Z_N(β, V)/∂V,
and from the microcanonical–ensemble formula, P = T · ∂S(E, V, N)/∂V.

Example 2.6 (more on the ideal gas) Applying formula (2.2.96) to Eq. (2.2.94), we
readily obtain
⟨N⟩ = zV/λ³ = e^{μ/kT} V/λ³.    (2.2.99)

We see that the grand–canonical factor e^{μ/kT} has the physical meaning of the average
number of ideal gas atoms in a cube of size λ × λ × λ, where λ is the thermal de
Broglie wavelength. Now, applying Eq. (2.2.98) to (2.2.94), we get

P = kT · e^{μ/kT}/λ³ = ⟨N⟩ · kT/V,    (2.2.100)
recovering again the equation of state of the ideal gas. This also demonstrates the
principle of ensemble equivalence.
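The following sketch puts numbers into Eqs. (2.2.99)–(2.2.100); the helium-like mass and the value of μ are illustrative assumptions only.

import numpy as np

h = 6.62607015e-34       # Planck constant [J s]
k = 1.380649e-23         # Boltzmann constant [J/K]
m = 6.65e-27             # helium-like atomic mass [kg] (illustrative)
T, V = 300.0, 1.0e-3     # temperature [K], volume [m^3]
mu = -5.0e-20            # chemical potential [J] (illustrative)

lam = h / np.sqrt(2 * np.pi * m * k * T)       # thermal de Broglie wavelength
z = np.exp(mu / (k * T))                       # fugacity
N_avg = z * V / lam ** 3                       # Eq. (2.2.99)
P = k * T * z / lam ** 3                       # Eq. (2.2.100)
print(f"<N> = {N_avg:.3e}   P = {P:.3e} Pa   <N>kT/V = {N_avg * k * T / V:.3e} Pa")

The last two printed numbers coincide, which is the equation of state recovered above.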

Once again, it should be pointed out that beyond the obvious physical significance
of the grand–canonical ensemble, sometimes it proves useful to work with for rea-
sons of pure mathematical convenience, using the principle of ensemble equivalence.
We will see this very clearly in the next chapters on quantum statistics.

The Gibbs Ensemble

Consider now the case where T and N are fixed, but V is allowed to fluctuate
around an average volume controlled by the pressure P. Again, we can analyze our
relatively small test system surrounded by a heat bath. The total energy is E 0 and
the total volume of the system and the heat bath is V0 . Suppose that we can calculate
the count/volume of states of the heat bath as a function of both its energy E′ and the
volume V′, call it Ω_B(E′, V′). A microstate is now a combination (x, V), where V
is the (variable) volume of our subsystem. Once again, the same line of thought is

used: whenever our subsystem is at state (x, V), the heat bath may be in any one of
Ω_B(E₀ − E(x), V₀ − V) microstates of its own. Thus,

P(x, V) ∝ Ω_B(E₀ − E(x), V₀ − V)
        = exp{S_B(E₀ − E(x), V₀ − V)/k}
        ≈ exp{S_B(E₀, V₀)/k − (1/k)(∂S_B/∂E)·E(x) − (1/k)(∂S_B/∂V)·V}
        ∝ exp{−E(x)/(kT) − PV/(kT)}
        = exp{−β[E(x) + PV]}.    (2.2.101)

The corresponding partition function that normalizes this probability function is
given by

Y_N(β, P) = ∫_0^∞ e^{−βPV} Z_N(β, V) dV = ∫_0^∞ e^{−βPV} dV Σ_x e^{−βE(x)}.    (2.2.102)

For a given N and β, the function Y N (β, P) can be thought of as the Laplace transform
of Z_N(β, V) as a function of V. In the thermodynamic limit, lim_{N→∞} (1/N) ln Y_N(β, P)
is the Legendre–Fenchel transform of lim_{N→∞} (1/N) ln Z_N(β, V) for fixed β, similarly
to the Legendre–Fenchel relationship between the entropy and the canonical log–
partition function.16 Note that analogously to Eq. (2.2.96), here the Gibbs partition
function serves as a cumulant generating function for the random variable V , thus,
for example,
⟨V⟩ = −kT · ∂ ln Y_N(β, P)/∂P.    (2.2.103)
As mentioned in footnote no. 12,

G = −kT ln Y_N(β, P) = E − TS + PV = F + PV    (2.2.104)

is the Gibbs free energy of the system, and for the case considered here, the force
is the pressure and the conjugate variable it controls is the volume. In analogy to
the grand–canonical ensemble, here too, there is only one extensive variable, this
time, it is N . Thus, G should be asymptotically proportional to N with a constant of
proportionality that depends on the fixed values of T and P.

16 Exercise 2.5 Write explicitly the Legendre–Fenchel relation (and its inverse) between the Gibbs
partition function and the canonical partition function.

Exercise 2.6 Show that this constant is the chemical potential μ.


All this is, of course, relevant when the physical system is a gas in a container. In
general, the Gibbs ensemble is obtained by a similar Legendre–Fenchel transform
replacing an extensive physical quantity of the canonical ensemble by its conjugate
force. For example, magnetic field is conjugate to magnetization, electric field is
conjugate to electric charge, mechanical force is conjugate to displacement, moment
is conjugate to angular shift, and so on. By the same token, the chemical potential is
a ‘force’ that is conjugate to the number of particles in grand–canonical ensemble,
and (inverse) temperature is a ‘force’ that controls the heat energy.
Figure 2.2 summarizes the thermodynamic potentials associated with the various
statistical ensembles. The arrow between each two connected blocks in the diagram
designates a passage from one ensemble to another by a Legendre–Fenchel transform
operator L that is defined generically at the bottom of the figure. In each passage,
it is also indicated which extensive variable is replaced by its conjugate intensive
variable.

Fig. 2.2 Diagram of Legendre–Fenchel relations between the various ensembles
It should be noted, that at least mathematically, one could have defined three
more ensembles that would complete the picture of Fig. 2.2 in a symmetric manner.
Two of the additional ensembles can be obtained by applying Legendre–Fenchel
transforms on S(E, V, N ), other than the transform that takes us to the canonical
ensemble. The first Legendre–Fenchel transform is w.r.t. the variable V , replacing
it by P, and the second additional ensemble is w.r.t. the variable N , replacing it
by μ. Let us denote the new resulting ‘potentials’ (minus kT times log–partition
functions) by A(E, P, N ) and B(E, V, μ), respectively. The third ensemble, with
potential C(E, P, μ), whose only extensive variable is E, could be obtained by


yet another Legendre–Fenchel transform, either on A(E, P, N ) or B(E, V, μ) w.r.t.


the appropriate extensive variable. Of course, A(E, P, N ) and B(E, V, μ) are also
connected directly to the Gibbs ensemble and to the grand–canonical ensemble,
respectively, both by Legendre–Fenchel–transforming w.r.t. E. While these three
ensembles are not really used in physics, they might prove useful to work with
for calculating certain physical quantities, by taking advantage of the principle of
ensemble equivalence.

Exercise 2.7 Complete the diagram of Fig. 2.2 by the three additional ensembles just
defined. Can you give physical meanings to A, B and C? Also, as said, C(E, P, μ) has
only E as an extensive variable. Thus, lim E→∞ C(E, P, μ)/E should be a constant.
What is this constant?

Even more generally, we could start from a system model, whose micro–canonical
ensemble consists of many extensive variables L 1 , . . . , L n , in addition to the inter-
nal energy E (not just V and N ). The entropy function is then S(E, L 1 , . . . , L n , N ).
Here, L i can be, for example, volume, mass, electric charge, electric polarization in
each one of the three axes, magnetization in each one of three axes, and so on. The
first Legendre–Fenchel transform takes us from the micro–canonical ensemble to the
canonical one upon replacing E by β. Then we can think of various Gibbs ensem-
bles obtained by replacing any subset of extensive variables L i by their respective
conjugate forces λi = T ∂ S/∂ L i , i = 1, . . . , n (in the above examples: pressure,
gravitational force (weight), voltage (or electric potential), electric fields, and mag-
netic fields in the corresponding axes, respectively). In the extreme case, all L i are
replaced by λi upon applying successive Legendre–Fenchel transforms, or equiva-
lently, a multi–dimensional Legendre–Fenchel transform:

G(T, λ₁, …, λₙ, N) = −sup_{L₁,…,Lₙ} [kT ln Z_N(β, L₁, …, Lₙ) − λ₁L₁ − … − λₙLₙ].    (2.2.105)
Once again, there must be one extensive variable at least.

2.3 Suggestions for Supplementary Reading

Part of the presentation in this chapter is similar to a corresponding chapter in [1]. The
derivations associated with the various ensembles of statistical mechanics, as well
as their many properties, can also be found in any textbook on elementary statistical
mechanics, including: Beck [8, Chap. 3], Huang [9, Chaps. 6, 7], Honerkamp [10,
Chap. 3], Landau and Lifshitz [2], Pathria [11, Chaps. 2–4], and Reif [12, Chap. 6],
among many others.

References

1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics,
Part 1, 3rd edn. (Elsevier: Butterworth–Heinemann, New York, 1980)
3. J. Barré, D. Mukamel, S. Ruffo, Inequivalence of ensembles in a system with long-range
interactions. Phys. Rev. Lett. 87(3), 030601 (2001)
4. D. Mukamel, Statistical mechanics of systems with long range interactions, in AIP Conference
Proceedings, vol. 970, no. 1, ed. by A. Campa, A. Giansanti, G. Morigi, F. Sylos Labini (AIP,
2008), pp. 22–38
5. R. Kubo, Statistical Mechanics (North-Holland, New York, 1965)
6. M.J.W. Hall, Universal geometric approach to uncertainty, entropy, and information. Phys. Rev.
A 59(4), 2602–2615 (1999)
7. R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics (Springer, New York, 1985)
8. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, Lon-
don, 1976)
9. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
10. J. Honerkamp, Statistical Physics - An Advanced Approach with Applications, 2nd edn.
(Springer, Berlin, 2002)
11. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford,
1996)
12. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
Chapter 3
Quantum Statistics – The Fermi–Dirac Distribution

In our discussion thus far, we have largely taken for granted the assumption that our
system can be analyzed in the classical regime, where quantum effects are negligible.
This is, of course, not always the case, especially at very low temperatures. Also,
if radiation plays a role in the physical system, then at very high frequency ν, the
classical approximation also breaks down. Roughly speaking, kT should be much
larger than hν for the classical regime to be well justified.1 It is therefore necessary
to address quantum effects in statistical physics issues, most notably, the fact that
certain quantities, like energy and angular momentum (or spin), no longer take on
values in the continuum, but only in a discrete set, which depends on the system in
question.
Consider a gas of identical particles with discrete single–particle quantum states,
1, 2, . . . , r, . . ., corresponding to energies

ε₁ ≤ ε₂ ≤ · · · ≤ ε_r ≤ · · · .

Since the particles are assumed indistinguishable, then for a gas of N particles, a
micro–state is defined by the combination of occupation numbers

N 1 , N 2 , . . . , Nr , . . . ,

where Nr is the number of particles at a single state r .

1 One well–known example is black–body radiation. According to the classical theory, the radiation

density per unit frequency grows proportionally to kT ν 2 , a function whose integral over ν, from zero
to infinity, diverges (“the ultraviolet catastrophe”). This absurd is resolved by quantum mechanical
considerations, according to which the factor kT should be replaced by hν/[e^{hν/(kT)} − 1], which
is close to kT at low frequencies, but decays exponentially for ν > kT / h.

The first fundamental question is the following: what values can the occupation
numbers N1 , N2 , . . . assume? According to quantum mechanics, there might be cer-
tain restrictions on these numbers. In particular, there are two kinds of situations that
may arise, which divide the various particles in the world into two mutually exclusive
classes.
For the first class of particles, there are no restrictions at all. The occupation
numbers can assume any non–negative integer value (Nr = 0, 1, 2, . . .). Particles
of this class are called Bose–Einstein (BE) particles2 or bosons for short. Another
feature of bosons is that their spins are always integral multiples of ℏ, namely, 0, ℏ,
2ℏ, etc. Examples of bosons are photons, π mesons and K mesons. We will focus
on them in the next chapter.
In the second class of particles, the occupation numbers are restricted by the Pauli
exclusion principle (discovered in 1925), according to which no more than one par-
ticle can occupy a given quantum state r (thus Nr is either 0 or 1 for all r ), since
the wave function of two such particles is anti–symmetric and thus vanishes if they
assume the same quantum state (unlike bosons for which the wave function is sym-
metric). Particles of this kind are called Fermi–Dirac (FD) particles3 or fermions for
short. Another characteristic of fermions is that their spins are always odd multiples
of ℏ/2, namely, ℏ/2, 3ℏ/2, 5ℏ/2, etc. Examples of fermions are electrons, positrons,
protons, and neutrons. The statistical mechanics of fermions will be discussed in this
chapter.

3.1 Combinatorial Derivation of the FD Statistics

Consider a gas of N fermions in volume V and temperature T. In the thermodynamic
limit, where the dimensions of the system are large, the discrete single–particle energy
levels {ε_r} are very close to each other. Therefore, instead of considering each one of
them individually, we shall consider groups of neighboring states. Since the energy
levels in each group are very close, we will approximate all of them by a single
energy value. Let us label these groups by s = 1, 2, . . .. Let group no. s contain G_s
single–particle states and let the representative energy level be ε̂_s. Let us assume that
G_s ≫ 1. A (coarse–grained) microstate of the gas is now defined by the occupation
numbers

N̂1 , N̂2 , . . . , N̂s , . . . ,



N̂_s being the total number of particles in group no. s, where, of course, Σ_s N̂_s = N.

2 Bosons were first introduced by Bose (1924) in order to derive Planck’s radiation law, and Einstein

applied this finding in the same year to a perfect gas of particles.


3 Introduced independently by Fermi and Dirac in 1926.

To derive the equilibrium behavior of this system, we analyze the Helmholtz free energy $F$ as a function of the occupation numbers, and use the fact that in equilibrium, it should be minimum. Since $E = \sum_s \hat N_s\hat\varepsilon_s$ and $F = E - TS$, this boils down to the evaluation of the entropy $S = k\ln\Omega(\hat N_1,\hat N_2,\ldots)$. Let $\Omega_s(\hat N_s)$ be the number of ways of putting $\hat N_s$ particles into the $G_s$ states of group no. $s$. Now, for fermions, each one of the $G_s$ states is either empty or occupied by one particle. Thus,
$$\Omega_s(\hat N_s) = \frac{G_s!}{\hat N_s!\,(G_s-\hat N_s)!} \qquad (3.1.1)$$
and
$$\Omega(\hat N_1,\hat N_2,\ldots) = \prod_s \Omega_s(\hat N_s). \qquad (3.1.2)$$
Therefore,
$$F(\hat N_1,\hat N_2,\ldots) = \sum_s\left[\hat N_s\hat\varepsilon_s - kT\ln\Omega_s(\hat N_s)\right] \approx \sum_s\left[\hat N_s\hat\varepsilon_s - kT\,G_s\,h_2\!\left(\frac{\hat N_s}{G_s}\right)\right], \qquad (3.1.3)$$
where $h_2(x) = -x\ln x-(1-x)\ln(1-x)$ is the binary entropy function (in nats), arising here from the Stirling approximation.


As said, we wish to minimize F( N̂1 , N̂2 , . . .) s.t. the constraint s N̂s = N . Consider
then the minimization of the Lagrangian4
    
 N̂s 
L= N̂s ˆs − kT G s h 2 −λ N̂s − N . (3.1.4)
s
Gs s

The solution is readily obtained to read
$$\hat N_s = \frac{G_s}{e^{(\hat\varepsilon_s-\lambda)/kT}+1}, \qquad (3.1.5)$$
where the Lagrange multiplier $\lambda$ is determined so as to satisfy the constraint
$$\sum_s \frac{G_s}{e^{(\hat\varepsilon_s-\lambda)/kT}+1} = N. \qquad (3.1.6)$$


4 For readers that are not familiar with Lagrangians, the minimization of F s.t. s N̂s = N , is

equivalent to the unconstrained minimization of F − λ( s N̂s − N ) for the value of λ at which
the constraint is met with equality by the minimizer { N̂s∗ }. This is because F( N̂1∗ , N̂2∗ , . . .) −
   
λ( s N̂s∗ − N ) ≤ F( N̂1 , N̂2 , . . .) − λ( s N̂s − N ), together with s N̂s = s N̂s∗ = N , imply

F( N̂1∗ , N̂2∗ , . . .) ≤ F( N̂1 , N̂2 , . . .) for every { N̂s } with s N̂s = N .

Fig. 3.1 Illustration of the FD distribution. As $T$ decreases, the curve becomes closer to $\bar N_r = u(\mu-\varepsilon_r)$, where $u(\cdot)$ is the unit step function

Exercise 3.1 After showing the general relation μ = (∂ F/∂ N )T,V , show that λ =
μ, namely, the Lagrange multiplier λ has the physical meaning of the chemical
potential. From now on, then we replace the notation λ by μ.

Note that $\hat N_s/G_s$ is the mean occupation number $\bar N_r$ of a single state $r$ within group no. $s$, i.e.,
$$\bar N_r = \frac{1}{e^{(\varepsilon_r-\mu)/kT}+1} \qquad (3.1.7)$$
with the constraint
$$\sum_r \frac{1}{e^{(\varepsilon_r-\mu)/kT}+1} = N. \qquad (3.1.8)$$

It is pleasing that this result no longer depends on the partition into groups. Equa-
tion (3.1.7) is the FD distribution, and it is depicted in Fig. 3.1.
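To get a concrete feel for Fig. 3.1, here is a short illustrative Python sketch that evaluates Eq. (3.1.7) on an arbitrary energy grid; the numerical values of $\mu$, the energies and the temperatures are arbitrary choices in consistent units, made only for illustration.

import numpy as np

k = 1.0                              # Boltzmann constant in the chosen units
mu = 5.0                             # chemical potential (assumed value)
eps = np.linspace(0.0, 10.0, 11)     # single-state energies (assumed grid)

def fd(eps, mu, T):
    # Mean occupation of a single state, Eq. (3.1.7)
    return 1.0 / (np.exp((eps - mu) / (k * T)) + 1.0)

for T in (2.0, 0.5, 0.1):
    print(f"T = {T}:", np.round(fd(eps, mu, T), 3))

As $T$ decreases, the printed occupations approach 1 for $\varepsilon<\mu$ and 0 for $\varepsilon>\mu$, i.e., the unit–step limit depicted in Fig. 3.1.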

3.2 FD Statistics from the Grand–Canonical Ensemble

Thanks to the principle of ensemble equivalence, an alternative, simpler derivation of


the FD distribution results from the use of the grand–canonical ensemble. Beginning
from the canonical partition function
 

$$Z_N(\beta) = \sum_{N_1=0}^{1}\sum_{N_2=0}^{1}\cdots\ \delta\!\left\{\sum_r N_r = N\right\} e^{-\beta\sum_r N_r\varepsilon_r}, \qquad (3.2.1)$$

we pass to the grand–canonical ensemble in the following manner:
$$\begin{aligned}
\Xi(\beta,\mu) &= \sum_{N=0}^{\infty} e^{\beta\mu N}\sum_{N_1=0}^{1}\sum_{N_2=0}^{1}\cdots\ \delta\!\left\{\sum_r N_r = N\right\} e^{-\beta\sum_r N_r\varepsilon_r}\\
&= \sum_{N_1=0}^{1}\sum_{N_2=0}^{1}\cdots\sum_{N=0}^{\infty}\delta\!\left\{\sum_r N_r = N\right\} e^{\beta\sum_r N_r(\mu-\varepsilon_r)}\\
&= \sum_{N_1=0}^{1}\sum_{N_2=0}^{1}\cdots\ e^{\beta\sum_r N_r(\mu-\varepsilon_r)}\\
&= \sum_{N_1=0}^{1}\sum_{N_2=0}^{1}\cdots\prod_r e^{\beta N_r(\mu-\varepsilon_r)}\\
&= \prod_r\left[\sum_{N_r=0}^{1} e^{\beta N_r(\mu-\varepsilon_r)}\right]\\
&= \prod_r\left[1+e^{\beta(\mu-\varepsilon_r)}\right]. \qquad (3.2.2)
\end{aligned}$$

Note that this product form of the grand partition function means that under the grand–
canonical ensemble the binary random variables {Nr } are statistically independent,
i.e.,

$$P(N_1,N_2,\ldots) = \prod_r P_r(N_r), \qquad (3.2.3)$$
where
$$P_r(N_r) = \frac{e^{\beta N_r(\mu-\varepsilon_r)}}{1+e^{\beta(\mu-\varepsilon_r)}}, \qquad N_r = 0,1,\quad r = 1,2,\ldots. \qquad (3.2.4)$$
Thus,
$$\bar N_r = \Pr\{N_r=1\} = \frac{e^{(\mu-\varepsilon_r)/kT}}{1+e^{(\mu-\varepsilon_r)/kT}} = \frac{1}{e^{(\varepsilon_r-\mu)/kT}+1}. \qquad (3.2.5)$$
Equivalently, defining $\alpha_r = \beta(\mu-\varepsilon_r)$, we have $\Xi = \prod_r\sum_{N_r=0}^{1}e^{\alpha_r N_r}$, giving rise to $\bar N_r = \partial\ln\Xi/\partial\alpha_r = e^{\alpha_r}/(1+e^{\alpha_r})$, which is the same result.

3.3 The Fermi Energy

Let us now examine what happens if the system is cooled to the absolute zero ($T\to 0$). It should be kept in mind that the chemical potential $\mu$ depends on $T$, so let $\mu_0$ be the chemical potential at $T=0$. It is readily seen that $\bar N_r$ approaches a unit step function (see Fig. 3.1), namely, all energy levels $\{\varepsilon_r\}$ below $\mu_0$ are occupied ($\bar N_r\approx 1$) by a fermion, whereas all those that are above $\mu_0$ are empty ($\bar N_r\approx 0$). The explanation is simple: Pauli's exclusion principle does not allow all particles to reside at the ground state at $T=0$, since then many of them would occupy the same quantum state. The minimum energy of the system that can possibly be achieved is when all energy levels are filled up, one by one, starting from the ground state up to some maximum level, which is exactly $\mu_0$. This explains why even at the absolute zero, fermions have energy.⁵ The maximum occupied energy level in a gas of non–interacting fermions at the absolute zero is called the Fermi energy, which we shall denote by $\varepsilon_{\rm F}$. Thus, $\mu_0 = \varepsilon_{\rm F}$, and then the FD distribution at very low temperatures is approximately
$$\bar N_r = \frac{1}{e^{(\varepsilon_r-\varepsilon_{\rm F})/kT}+1}. \qquad (3.3.1)$$

We next take a closer look at the FD distribution, taking into account the density of states. Consider a metal box of dimensions $L_x\times L_y\times L_z$ and hence volume $V = L_xL_yL_z$. The energy level associated with quantum numbers $(l_x,l_y,l_z)$ is given by
$$\varepsilon_{l_x,l_y,l_z} = \frac{\pi^2\hbar^2}{2m}\left(\frac{l_x^2}{L_x^2}+\frac{l_y^2}{L_y^2}+\frac{l_z^2}{L_z^2}\right) = \frac{\hbar^2}{2m}(k_x^2+k_y^2+k_z^2), \qquad (3.3.2)$$
where $k_x$, $k_y$ and $k_z$ are the wave numbers pertaining to the various solutions of the Schrödinger equation. First, we would like to count how many quantum states $\{(l_x,l_y,l_z)\}$ give rise to energy between $\varepsilon$ and $\varepsilon+{\rm d}\varepsilon$. We denote this number by $g(\varepsilon){\rm d}\varepsilon$, where $g(\varepsilon)$ is the density of states:
$$\begin{aligned}
g(\varepsilon){\rm d}\varepsilon &= \sum_{l_x,l_y,l_z} 1\left\{\frac{2m\varepsilon}{\hbar^2}\le \frac{\pi^2 l_x^2}{L_x^2}+\frac{\pi^2 l_y^2}{L_y^2}+\frac{\pi^2 l_z^2}{L_z^2}\le\frac{2m(\varepsilon+{\rm d}\varepsilon)}{\hbar^2}\right\}\\
&\approx \frac{L_xL_yL_z}{\pi^3}\cdot{\rm Vol}\left\{\vec k:\ \frac{2m\varepsilon}{\hbar^2}\le\|\vec k\|^2\le\frac{2m(\varepsilon+{\rm d}\varepsilon)}{\hbar^2}\right\}\\
&= \frac{V}{\pi^3}\cdot{\rm Vol}\left\{\vec k:\ \frac{2m\varepsilon}{\hbar^2}\le\|\vec k\|^2\le\frac{2m(\varepsilon+{\rm d}\varepsilon)}{\hbar^2}\right\}. \qquad (3.3.3)
\end{aligned}$$

5 Indeed, free electrons in a metal continue to be mobile and free even at T = 0.



A volume element pertaining to a given value of $K = \|\vec k\|$ ($\vec k$ being $k_x\hat x+k_y\hat y+k_z\hat z$) is given by ${\rm d}K$ times the surface area of a sphere of radius $K$, namely, $4\pi K^2{\rm d}K$, but it has to be divided by 8, to account for the fact that the components of $\vec k$ are positive. I.e., it is $\frac{\pi}{2}K^2{\rm d}K$. From Eq. (3.3.2), we have $\varepsilon = \hbar^2K^2/2m$, and so
$$K^2{\rm d}K = \frac{2m\varepsilon}{\hbar^2}\cdot\frac{1}{\hbar}\sqrt{\frac{m}{2\varepsilon}}\,{\rm d}\varepsilon = \frac{\sqrt{2m^3\varepsilon}\,{\rm d}\varepsilon}{\hbar^3}. \qquad (3.3.4)$$
Therefore, combining the above, we get
$$g(\varepsilon) = \frac{\sqrt{2m^3\varepsilon}\,V}{2\pi^2\hbar^3}. \qquad (3.3.5)$$
For electrons, spin values of $\pm 1/2$ are allowed, so this density should be doubled, and so
$$g_e(\varepsilon) = \frac{\sqrt{2m^3\varepsilon}\,V}{\pi^2\hbar^3}. \qquad (3.3.6)$$
Approximating the equation of the constraint on the total number of electrons, we get
$$\begin{aligned}
N_e &= \sum_r\frac{1}{e^{(\varepsilon_r-\varepsilon_{\rm F})/kT}+1}\\
&\approx \int_0^\infty\frac{g_e(\varepsilon)\,{\rm d}\varepsilon}{e^{(\varepsilon-\varepsilon_{\rm F})/kT}+1}\\
&= \frac{\sqrt{2m^3}\,V}{\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{(\varepsilon-\varepsilon_{\rm F})/kT}+1}\\
&\approx \frac{\sqrt{2m^3}\,V}{\pi^2\hbar^3}\int_0^{\varepsilon_{\rm F}}\sqrt\varepsilon\,{\rm d}\varepsilon \qquad (T\approx 0)\\
&= \frac{\sqrt{2m^3}\,V}{\pi^2\hbar^3}\cdot\frac{2\varepsilon_{\rm F}^{3/2}}{3}, \qquad (3.3.7)
\end{aligned}$$
which easily leads to the following simple formula for the Fermi energy:
$$\varepsilon_{\rm F} = \frac{\hbar^2}{2m}\left(\frac{3\pi^2 N_e}{V}\right)^{2/3} = \frac{\hbar^2}{2m}(3\pi^2\rho_e)^{2/3}, \qquad (3.3.8)$$

where $\rho_e$ is the electron density. In most metals, $\varepsilon_{\rm F}$ is of the order of 5–10 electron–volts (eV), whose equivalent temperature $T_{\rm F} = \varepsilon_{\rm F}/k$ (the Fermi temperature) is of the order of magnitude of 100,000 ◦K. Hence, the Fermi energy is much larger than $kT$ under laboratory conditions. In other words, electrons in a metal behave like a gas

at an extremely high temperature. This means that the internal pressure in metals (the Fermi pressure) is extremely large, and this is a reason why metals are almost incompressible. This kind of pressure also stabilizes a neutron star (a Fermi gas of neutrons) or a white dwarf star (a Fermi gas of electrons) against the inward pull of gravity, which would ostensibly collapse the star into a black hole. Only when a star is sufficiently massive to overcome the degeneracy pressure can it collapse into a singularity.
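As a quick numerical reading of Eq. (3.3.8), the following Python sketch evaluates $\varepsilon_{\rm F}$ and $T_{\rm F}$ for an assumed free–electron density of $\rho_e\approx 8.5\times 10^{28}$ m$^{-3}$, a value of the order typical of a good metal (this density figure is an illustrative assumption, not a number quoted in the text).

import numpy as np

hbar = 1.054571817e-34    # J*s
m_e  = 9.1093837015e-31   # kg
k_B  = 1.380649e-23       # J/K
eV   = 1.602176634e-19    # J

rho_e = 8.5e28            # assumed electron density [1/m^3]

# Fermi energy, Eq. (3.3.8), and Fermi temperature T_F = eps_F / k
eps_F = (hbar**2 / (2.0 * m_e)) * (3.0 * np.pi**2 * rho_e) ** (2.0 / 3.0)
T_F = eps_F / k_B
print(f"eps_F ~ {eps_F / eV:.1f} eV,  T_F ~ {T_F:.2e} K")

The output, about 7 eV and $8\times 10^4$ K, is consistent with the 5–10 eV and ~100,000 ◦K orders of magnitude mentioned above.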

Exercise 3.2 Derive an expression for $\langle\varepsilon^n\rangle$ of an electron near $T=0$, in terms of $\varepsilon_{\rm F}$.

3.4 Useful Approximations of Fermi Integrals

Before considering applications, it will be instructive to develop some useful approximations for integrals associated with the Fermi function
$$f(\varepsilon) = \frac{1}{e^{(\varepsilon-\mu)/kT}+1}. \qquad (3.4.1)$$
For example, if we wish to calculate the average energy, we have to deal with an integral like
$$\int_0^\infty \varepsilon^{3/2} f(\varepsilon)\,{\rm d}\varepsilon.$$
Consider then, more generally, an integral of the form
$$I_n = \int_0^\infty \varepsilon^n f(\varepsilon)\,{\rm d}\varepsilon.$$

Upon integrating by parts, we readily have
$$I_n = \left.f(\varepsilon)\cdot\frac{\varepsilon^{n+1}}{n+1}\right|_0^\infty - \frac{1}{n+1}\int_0^\infty \varepsilon^{n+1}f'(\varepsilon)\,{\rm d}\varepsilon = -\frac{1}{n+1}\int_0^\infty \varepsilon^{n+1}f'(\varepsilon)\,{\rm d}\varepsilon. \qquad (3.4.2)$$

Changing variables to $x = (\varepsilon-\mu)/kT$,
$$I_n = -\frac{1}{n+1}\int_{-\mu/kT}^\infty(\mu+kTx)^{n+1}\phi'(x)\,{\rm d}x \approx -\frac{\mu^{n+1}}{n+1}\int_{-\infty}^\infty\left(1+\frac{kTx}{\mu}\right)^{n+1}\phi'(x)\,{\rm d}x, \qquad (3.4.3)$$

where we have introduced the scaled version of $f$, which is $\phi(x) = f(\mu+kTx) = 1/(e^x+1)$, $\phi'$ is its derivative, and where in the second line we are assuming $\mu\gg kT$. Applying the Taylor series expansion to the binomial term (recall that $n$ is not necessarily an integer), and using the symmetry of $\phi'$ around the origin, we have
$$\begin{aligned}
I_n &= -\frac{\mu^{n+1}}{n+1}\int_{-\infty}^{\infty}\left[1+(n+1)\frac{kTx}{\mu}+\frac{n(n+1)}{2}\left(\frac{kTx}{\mu}\right)^2+\cdots\right]\phi'(x)\,{\rm d}x\\
&= -\frac{\mu^{n+1}}{n+1}\left[\int_{-\infty}^{\infty}\phi'(x)\,{\rm d}x+\frac{n(n+1)}{2}\left(\frac{kT}{\mu}\right)^2\int_{-\infty}^{\infty}x^2\phi'(x)\,{\rm d}x+\cdots\right]\\
&\approx \frac{\mu^{n+1}}{n+1}\left[1+\frac{n(n+1)\pi^2}{6}\left(\frac{kT}{\mu}\right)^2\right], \qquad (3.4.4)
\end{aligned}$$

where the last line was obtained by calculating the integral of $x^2\phi'(x)$ using a power series expansion. Note that this series contains only even powers of $kT/\mu$, thus the convergence is rather fast. Let us now repeat the calculation of Eq. (3.3.7), this time at $T>0$:
$$\rho_e \approx \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{(\varepsilon-\mu)/kT}+1} = \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\cdot I_{1/2} = \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\cdot\frac{2}{3}\mu^{3/2}\left[1+\frac{\pi^2}{8}\left(\frac{kT}{\mu}\right)^2\right], \qquad (3.4.5)$$

which gives
$$\varepsilon_{\rm F} = \mu\left[1+\frac{\pi^2}{8}\left(\frac{kT}{\mu}\right)^2\right]^{2/3} \approx \mu\left[1+\frac{\pi^2}{12}\left(\frac{kT}{\mu}\right)^2\right] = \mu+\frac{(\pi kT)^2}{12\mu}. \qquad (3.4.6)$$

This relation between $\mu$ and $\varepsilon_{\rm F}$ can be easily inverted by solving a simple quadratic equation, which yields

$$\begin{aligned}
\mu &\approx \frac{\varepsilon_{\rm F}+\varepsilon_{\rm F}\sqrt{1-(\pi kT/\varepsilon_{\rm F})^2/3}}{2}\\
&\approx \varepsilon_{\rm F}\left[1-\frac{\pi^2}{12}\cdot\left(\frac{kT}{\varepsilon_{\rm F}}\right)^2\right]\\
&= \varepsilon_{\rm F}\left[1-\frac{\pi^2}{12}\cdot\left(\frac{T}{T_{\rm F}}\right)^2\right]. \qquad (3.4.7)
\end{aligned}$$

Since $T/T_{\rm F}\ll 1$ for all $T$ in the interesting range, we observe that the chemical potential depends extremely weakly on $T$. In other words, we can safely approximate $\mu\approx\varepsilon_{\rm F}$ for all relevant temperatures of interest. The assumption that $kT\ll\mu$ was found self–consistent with the result $\mu\approx\varepsilon_{\rm F}$.
Having established the approximation $\mu\approx\varepsilon_{\rm F}$, we can now calculate the average energy of the electron at an arbitrary temperature $T$:
$$\begin{aligned}
\langle\varepsilon\rangle &= \frac{\sqrt{2m^3}}{\pi^2\hbar^3\rho_e}\int_0^\infty\frac{\varepsilon^{3/2}\,{\rm d}\varepsilon}{e^{(\varepsilon-\varepsilon_{\rm F})/kT}+1}\\
&= \frac{\sqrt{2m^3}}{\pi^2\hbar^3\rho_e}\cdot I_{3/2}\\
&\approx \frac{\sqrt{2m^3}}{\pi^2\hbar^3\rho_e}\cdot\frac{2\varepsilon_{\rm F}^{5/2}}{5}\left[1+\frac{5\pi^2}{8}\left(\frac{T}{T_{\rm F}}\right)^2\right]\\
&= \frac{3\hbar^2}{10m}\cdot(3\pi^2\rho_e)^{2/3}\cdot\left[1+\frac{5\pi^2}{8}\left(\frac{T}{T_{\rm F}}\right)^2\right]\\
&= \frac{3\varepsilon_{\rm F}}{5}\cdot\left[1+\frac{5\pi^2}{8}\left(\frac{T}{T_{\rm F}}\right)^2\right]. \qquad (3.4.8)
\end{aligned}$$

Note that the dependence of the average per–particle energy on the temperature is drastically different from that of the ideal gas. While in the ideal gas it was linear ($\langle\varepsilon\rangle = 3kT/2$), here it is actually almost a constant, independent of the temperature (just like the chemical potential).
The same technique can be used, of course, to calculate any moment of the electron energy.
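A small numerical experiment makes the weak temperature dependence of $\mu$ concrete. The Python sketch below solves the constraint (3.4.5) for $\mu$ by bisection, working in units of $\varepsilon_{\rm F}$ (so that no material constants are needed), and compares the result with the Sommerfeld approximation (3.4.7); the chosen values of $T/T_{\rm F}$ are arbitrary illustration points.

import numpy as np

def density_integral(mu_r, t, u_max=30.0, n=400001):
    # Trapezoidal evaluation of the integral in Eq. (3.4.5), with energies in
    # units of eps_F (u = eps/eps_F, mu_r = mu/eps_F, t = T/T_F).
    u = np.linspace(0.0, u_max, n)
    expo = np.clip((u - mu_r) / t, -700.0, 700.0)   # avoid overflow warnings
    f = np.sqrt(u) / (np.exp(expo) + 1.0)
    du = u[1] - u[0]
    return (np.sum(f) - 0.5 * (f[0] + f[-1])) * du

def mu_numerical(t):
    # Bisection on mu/eps_F: at T = 0 the integral equals 2/3, by Eq. (3.3.7).
    lo, hi = 0.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if density_integral(mid, t) < 2.0 / 3.0 else (lo, mid)
    return 0.5 * (lo + hi)

for t in (0.05, 0.1, 0.2):
    sommerfeld = 1.0 - np.pi**2 * t**2 / 12.0       # Eq. (3.4.7)
    print(f"T/T_F = {t}:  mu/eps_F numerical = {mu_numerical(t):.5f},"
          f"  Sommerfeld = {sommerfeld:.5f}")

The two columns agree closely, with a discrepancy that shrinks like $(T/T_{\rm F})^4$, which is the quantitative content of the statement $\mu\approx\varepsilon_{\rm F}$.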

3.5 Applications of the FD Distribution

The FD distribution is at the heart of modern solid–state physics and semiconductor


physics (see also, for example, [1, Sect. 4.5]) and indeed frequently encountered in
related courses on semiconductor devices. It is also useful for understanding the

physics of white dwarfs. We next briefly touch upon the very basics of conductance
in solids, as well as on two other applications: thermionic emission and photoelectric
emission.

3.5.1 Electrons in a Solid

The structure of the electron energy levels in a solid is basically obtained using quantum–mechanical considerations. In the case of a crystal, this amounts to solving the Schrödinger equation in a periodic potential, stemming from the corresponding periodic lattice structure. Its idealized form, which ignores the size of each atom, is given by a train of equispaced Dirac delta functions. This is an extreme case of the so–called Kronig–Penney model, where the potential function is a periodic rectangular on–off function (a square wave), and it leads to a certain band structure. In particular, bands of allowed energy levels are alternately interlaced with bands of forbidden energy levels. The Fermi energy level $\varepsilon_{\rm F}$, which depends on the overall concentration of electrons, may fall either in an allowed band or in a forbidden band. The former case is the case of a metal, whereas the latter is the case of an insulator or a semiconductor (the difference being only how wide the forbidden band in which $\varepsilon_{\rm F}$ lies is). While in metals it is impossible to change $\varepsilon_{\rm F}$, in semiconductors it can be changed by doping.
A semiconductor can then be thought of as a system with electron orbitals grouped into two⁶ energy bands separated by an energy gap. The lower band is the valence band (where electrons are tied to their individual atoms) and the upper band is the conduction band, where they are free. In a pure semiconductor at $T=0$, all valence orbitals are occupied with electrons and all conduction orbitals are empty. A full band cannot carry any current, so a pure semiconductor at $T=0$ is an insulator. In a pure semiconductor, the Fermi energy is exactly in the middle of the gap between the valence band (where $f(\varepsilon)$ is very close to 1) and the conduction band (where $f(\varepsilon)$ is very close to 0). Finite conductivity in a semiconductor follows either from the presence of electrons in the conduction band (conduction electrons) or from unoccupied orbitals in the valence band (holes).
Two different mechanisms give rise to conduction electrons and holes: the first is
thermal excitation of electrons from the valence band to the conduction band, while
the second is the presence of impurities that change the balance between the number
of orbitals in the valence band and the number of electrons available to fill them.
We will not delve into this too much beyond this point, since this material is
normally well–covered in other courses in the standard curriculum of electrical engi-
neering, namely, courses on solid state physics. Here we only demonstrate the use of
the FD distribution in order to calculate the density of charge carriers. The density of
charge carriers n of the conduction band is found by integrating up, from the conduc-

6 We treat both bands as single bands for our purposes. It does not matter that both may be themselves

groups of (sub)bands with additional gaps within each group.



tion band edge $\varepsilon_C$, the product of the density of states $g_e(\varepsilon)$ and the FD distribution $f(\varepsilon)$, i.e.,
$$n = \int_{\varepsilon_C}^\infty{\rm d}\varepsilon\cdot g_e(\varepsilon)f(\varepsilon) = \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\int_{\varepsilon_C}^\infty\frac{\sqrt{\varepsilon-\varepsilon_C}\,{\rm d}\varepsilon}{e^{(\varepsilon-\varepsilon_{\rm F})/kT}+1}, \qquad (3.5.1)$$
where here $m$ designates the effective mass of the electron⁷ and where we have taken the density of states to be proportional to $\sqrt{\varepsilon-\varepsilon_C}$, since $\varepsilon_C$ is now the reference energy and only the difference $\varepsilon-\varepsilon_C$ goes for kinetic energy.⁸ For a semiconductor at room temperature, $kT$ is much smaller than the gap, and so
$$f(\varepsilon) \approx e^{-(\varepsilon-\varepsilon_{\rm F})/kT}, \qquad (3.5.2)$$

which yields the approximation
$$\begin{aligned}
n &\approx \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\cdot e^{\varepsilon_{\rm F}/kT}\int_{\varepsilon_C}^\infty{\rm d}\varepsilon\cdot\sqrt{\varepsilon-\varepsilon_C}\cdot e^{-\varepsilon/kT}\\
&= \frac{\sqrt{2m^3}}{\pi^2\hbar^3}\cdot e^{-(\varepsilon_C-\varepsilon_{\rm F})/kT}\int_0^\infty{\rm d}\varepsilon\cdot\sqrt{\varepsilon}\,e^{-\varepsilon/kT}\\
&= \frac{\sqrt{2(mkT)^3}}{\pi^2\hbar^3}\cdot e^{-(\varepsilon_C-\varepsilon_{\rm F})/kT}\int_0^\infty{\rm d}x\cdot\sqrt{x}\,e^{-x}\\
&= \frac{\sqrt\pi}{2}\cdot\frac{\sqrt{2(mkT)^3}}{\pi^2\hbar^3}\cdot e^{-(\varepsilon_C-\varepsilon_{\rm F})/kT}\\
&= \frac{1}{4}\cdot\left(\frac{2mkT}{\pi\hbar^2}\right)^{3/2}\cdot e^{-(\varepsilon_C-\varepsilon_{\rm F})/kT}. \qquad (3.5.3)
\end{aligned}$$

We see then that the density of conduction electrons, and hence also the conduction properties, depend critically on the gap between $\varepsilon_C$ and $\varepsilon_{\rm F}$. A similar calculation holds for the holes, of course.
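To illustrate the exponential sensitivity expressed by Eq. (3.5.3), the following Python sketch evaluates it at room temperature for a few assumed values of the gap $\varepsilon_C-\varepsilon_{\rm F}$, taking the free–electron mass for the effective mass (both the gap values and this choice of mass are illustrative assumptions rather than material data).

import numpy as np

hbar = 1.054571817e-34    # J*s
m    = 9.1093837015e-31   # kg (free-electron mass used for the effective mass)
k_B  = 1.380649e-23       # J/K
eV   = 1.602176634e-19    # J
T    = 300.0              # K

def n_conduction(gap_eV):
    # Conduction-electron density, Eq. (3.5.3), for a given eps_C - eps_F in eV
    prefactor = 0.25 * (2.0 * m * k_B * T / (np.pi * hbar**2)) ** 1.5
    return prefactor * np.exp(-gap_eV * eV / (k_B * T))

for gap in (0.2, 0.4, 0.6):
    print(f"eps_C - eps_F = {gap} eV  ->  n ~ {n_conduction(gap):.2e} m^-3")

Every additional ~0.06 eV of gap (about 2.3 kT at 300 ◦K) costs roughly a factor of 10 in carrier density, which is why the gap dominates the conduction properties.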

3.5.2 Thermionic Emission∗

Thermionic emission is a current of charge carriers (most notably, electrons or ions) emitted over a surface (which acts as a potential barrier), caused by heat energy that overcomes the electrostatic forces. If a refractory metal (e.g., tungsten) is heated up to a high
the electrostatic forces. If a refractory metal (e.g., tungsten) is heated up to high

7 The effective mass is obtained by a second order Taylor series expansion of the energy as a function of the wave number (used to obtain the density of states), and thinking of the coefficient of the quadratic term as $\hbar^2/2m$.
8 Recall that earlier we calculated the density of states for a simple potential well, not for a periodic potential function. Thus, the earlier expression of $g_e(\varepsilon)$ is not correct here.

enough temperature (somewhat below the melting point), an emission of electrons


can be obtained towards a positive anode. Quantum–mechanically speaking, this
heated metal can be viewed as a potential well with finitely high walls determined by
the surface potential barrier. Thus, some of the particles will be sufficiently energetic
to surmount the surface barrier (a.k.a. the surface work function) and hence will be
emitted. The work function φ varies between 2 and 6 eV for pure metals. The electron
will not be emitted unless the energy component normal to the surface would exceed
F + φ. The excess energy beyond this threshold is in the form of translational kinetic
energy which dictates the velocity away from the surface.
The analysis of this effect is made by transforming the distribution of energy into a distribution in terms of the components of the velocity, $v_x$, $v_y$, and $v_z$. We begin with the expression of the energy of a single electron⁹
$$\varepsilon = \frac{1}{2}m(v_x^2+v_y^2+v_z^2) = \frac{\pi^2\hbar^2}{2m}\left(\frac{l_x^2}{L_x^2}+\frac{l_y^2}{L_y^2}+\frac{l_z^2}{L_z^2}\right). \qquad (3.5.4)$$

Thus, ${\rm d}v_x = h\,{\rm d}l_x/(2mL_x)$, and similar relations hold for the two other components, which together yield
$${\rm d}l_x\,{\rm d}l_y\,{\rm d}l_z = \left(\frac{m}{h}\right)^3 V\,{\rm d}v_x\,{\rm d}v_y\,{\rm d}v_z, \qquad (3.5.5)$$
where we have divided by 8 since every quantum state can be occupied by only one out of the 8 combinations of the signs of the three velocity components. The fraction of electrons ${\rm d}N$ with quantum states within the cube ${\rm d}l_x\,{\rm d}l_y\,{\rm d}l_z$ is simply the expected number of occupied quantum states within that cube, which is
$${\rm d}N = \frac{{\rm d}l_x\,{\rm d}l_y\,{\rm d}l_z}{1+\exp\{(\varepsilon_{l_x,l_y,l_z}-\varepsilon_{\rm F})/kT\}}.$$

Thus, we can write the distribution function of the number of electrons in a cube ${\rm d}v_x\times{\rm d}v_y\times{\rm d}v_z$ as
$${\rm d}N = 2V\left(\frac{m}{h}\right)^3\frac{{\rm d}v_x\,{\rm d}v_y\,{\rm d}v_z}{1+\exp\left\{\frac{1}{kT}\left[\frac{1}{2}m(v_x^2+v_y^2+v_z^2)-\varepsilon_{\rm F}\right]\right\}}, \qquad (3.5.6)$$
where we have doubled the expression due to the spin, and we have taken the chemical potential of the electron gas to be $\varepsilon_{\rm F}$, independently of temperature, as was justified in the previous subsection. Assuming that the surface is parallel to the $YZ$ plane, the

9 We are assuming that the potential barrier φ is fairly large (relative to kT ), such that the relationship

between energy and quantum numbers is reasonably well approximated by that of a particle in a
box.

minimum escape velocity in the x–direction is v0 = m2 (F + φ) and there are no
restrictions on v y and vz . The current along the x–direction is

dq qe dN [leaving the surface]


I = =
dt dt
    m 3
qe ∞ +∞ +∞ vx dt dvx dv y dvz
= · 2V   
dt v0 −∞ −∞ L x h 1 1
1 + exp m(vx2 + v 2y + vz2 ) − F
kT 2
 m 3  ∞  +∞ dv y dvz
= 2L y L z qe vx dvx    ,
h v0 −∞ 1 1
1 + exp m(vx2 + v 2y + vz2 ) − F
kT 2
(3.5.7)

where the factor vx dt/L x in the second line is the fraction of electrons close enough
to the surface so as to be emitted within time dt. Thus, the current density (current
per unity area) is

 m 3  ∞  +∞  +∞ dv y dvz
J = 2qe dvx · vx    .
h v0 −∞ −∞ 1 1
1 + exp m(vx + v y + vz ) − F
2 2 2
kT 2
(3.5.8)

As for the inner double integral, transform to polar coordinates to obtain
$$\begin{aligned}
\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}&\frac{{\rm d}v_y\,{\rm d}v_z}{1+\exp\left\{\frac{1}{kT}\left[\frac{1}{2}m(v_x^2+v_y^2+v_z^2)-\varepsilon_{\rm F}\right]\right\}}\\
&= 2\pi\int_0^\infty\frac{v_{yz}\,{\rm d}v_{yz}}{1+e^{mv_{yz}^2/2kT}\cdot\exp\left\{\frac{1}{kT}\left[\frac{1}{2}mv_x^2-\varepsilon_{\rm F}\right]\right\}}\\
&= \frac{2\pi kT}{m}\int_0^\infty\frac{{\rm d}u}{1+\exp\left\{\frac{1}{kT}\left[\frac{1}{2}mv_x^2-\varepsilon_{\rm F}\right]\right\}\cdot e^u} \qquad\left(u = \frac{mv_{yz}^2}{2kT}\right)\\
&= \frac{2\pi kT}{m}\ln\left[1+\exp\left\{\frac{1}{kT}\left[\varepsilon_{\rm F}-\frac{1}{2}mv_x^2\right]\right\}\right], \qquad (3.5.9)
\end{aligned}$$

which yields
$$J = \frac{4\pi m^2q_ekT}{h^3}\int_{v_0}^\infty{\rm d}v_x\cdot v_x\ln\left[1+\exp\left\{\frac{1}{kT}\left[\varepsilon_{\rm F}-\frac{1}{2}mv_x^2\right]\right\}\right]. \qquad (3.5.10)$$

Now, since normally¹⁰ $\phi\gg kT$, the exponential in the integrand is very small throughout the entire range of integration, and so it is safe to use the approximation $\ln(1+x)\approx x$, i.e.,
$$\begin{aligned}
J &\approx \frac{4\pi m^2q_ekT}{h^3}\,e^{\varepsilon_{\rm F}/kT}\int_{v_0}^\infty{\rm d}v_x\cdot v_x e^{-mv_x^2/2kT}\\
&= \frac{4\pi mq_e(kT)^2}{h^3}\exp\left\{\frac{1}{kT}\left[\varepsilon_{\rm F}-\frac{1}{2}mv_0^2\right]\right\}\\
&= \frac{4\pi mq_e(kT)^2}{h^3}\,e^{-\phi/kT}, \qquad (3.5.11)
\end{aligned}$$
and thus we have obtained a simple expression for the current density as a function of temperature. This result, which is known as the Richardson–Dushman equation, is in very good agreement with experimental evidence. Further discussion on this result can be found in [1, 2].
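A quick numerical reading of Eq. (3.5.11) is also instructive: the prefactor $A = 4\pi mq_ek^2/h^3$ is a universal constant, while the work function and temperatures used below are assumed, tungsten–like illustration values (not data quoted in the text).

import numpy as np

m   = 9.1093837015e-31    # kg
q_e = 1.602176634e-19     # C
k_B = 1.380649e-23        # J/K
h   = 6.62607015e-34      # J*s
eV  = 1.602176634e-19     # J

A = 4.0 * np.pi * m * q_e * k_B**2 / h**3    # universal prefactor of Eq. (3.5.11)
print(f"A = {A:.3e} A m^-2 K^-2")            # ~1.2e6 A m^-2 K^-2

phi = 4.5 * eV                               # assumed (tungsten-like) work function
for T in (1500.0, 2000.0, 2500.0):
    J = A * T**2 * np.exp(-phi / (k_B * T))
    print(f"T = {T:.0f} K:  J ~ {J:.3e} A/m^2")

The factor $e^{-\phi/kT}$ dominates: raising $T$ from 1500 to 2500 ◦K increases $J$ by about six orders of magnitude, which is why thermionic cathodes must run white–hot.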

3.5.3 Photoelectric Emission∗

An analysis based on a similar line of thought applies also to photoelectric emission, an effect where electrons are emitted from a metal as a result of radiation at frequencies beyond a certain critical frequency $\nu_0$ (the Einstein threshold frequency), whose corresponding photon energy $h\nu_0$ is equal to the work function $\phi$. Here, the electron gains an energy amount of $h\nu$ from a photon, which helps it to pass the energy barrier, and so the minimum velocity of emission, after excitation by a photon of energy $h\nu$, is given by
$$h\nu + \frac{1}{2}mv_0^2 = \varepsilon_{\rm F}+\phi = \varepsilon_{\rm F}+h\nu_0. \qquad (3.5.12)$$
Let $\alpha$ denote the probability that a photon actually excites an electron. Then, similarly as in the previous subsection,
$$J = \alpha\cdot\frac{4\pi m^2q_ekT}{h^3}\int_{v_0}^\infty{\rm d}v_x\cdot v_x\ln\left[1+\exp\left\{\frac{1}{kT}\left[\varepsilon_{\rm F}-\frac{1}{2}mv_x^2\right]\right\}\right], \qquad (3.5.13)$$

where this time
$$v_0 = \sqrt{\frac{2}{m}\left[\varepsilon_{\rm F}+h(\nu_0-\nu)\right]}. \qquad (3.5.14)$$

10 At room temperature ($T = 300$ ◦K), $kT\approx 4\times 10^{-21}$ Joules $\approx 0.024$ eV, whereas $\phi$ is between 2 eV and 6 eV.

Changing the integration variable to
$$x = \frac{1}{kT}\left[\frac{1}{2}mv_x^2+h(\nu-\nu_0)-\varepsilon_{\rm F}\right],$$
we can write the last integral as
$$J = \alpha\cdot\frac{4\pi mq_e(kT)^2}{h^3}\int_0^\infty{\rm d}x\,\ln\left[1+\exp\left\{\frac{h(\nu-\nu_0)}{kT}-x\right\}\right]. \qquad (3.5.15)$$
Now, let us denote
$$\Delta = \frac{h(\nu-\nu_0)}{kT}. \qquad (3.5.16)$$
Integrating by parts (twice), we have
$$\int_0^\infty{\rm d}x\,\ln\left(1+e^{\Delta-x}\right) = \int_0^\infty\frac{x\,{\rm d}x}{e^{x-\Delta}+1} = \frac{1}{2}\int_0^\infty\frac{x^2e^{x-\Delta}\,{\rm d}x}{\left(e^{x-\Delta}+1\right)^2} = f(e^\Delta). \qquad (3.5.17)$$

For $h(\nu-\nu_0)\gg kT$, we have $e^\Delta\gg 1$, and then it can be shown (using the same technique as in Sect. 3.4) that $f(e^\Delta)\approx\Delta^2/2$, which gives
$$J = \alpha\cdot\frac{2\pi mq_e}{h}(\nu-\nu_0)^2, \qquad (3.5.18)$$
independently of $T$. In other words, when the energy of the light quantum is much larger than the thermal energy $kT$, temperature becomes irrelevant. At the other extreme of very low frequency, where $h(\nu_0-\nu)\gg kT$, and then $e^\Delta\ll 1$, we have $f(e^\Delta)\approx e^\Delta$, and then
$$J = \alpha\cdot\frac{4\pi mq_e(kT)^2}{h^3}\,e^{(h\nu-\phi)/kT}, \qquad (3.5.19)$$
which is like the thermionic current density, enhanced by a photon factor $e^{h\nu/kT}$.

3.6 Suggestions for Supplementary Reading

The Fermi–Dirac distribution, its derivation, and its various applications can also be
found in many alternative textbooks, such as: Beck [1, Chap. 4], Huang [3, Chap. 11],
Kittel [4, Part I, Chap. 19], Landau and Lifshitz [5, Chap. V], Mandl [6, Sect. 11.4.1],

Pathria [2], and Reif [7, Chap. 9]. The exposition in this chapter is based, to a large
extent, on the books by Beck, Mandl and Pathria. Applications to semiconductor
physics are based also on Omar [8, Chaps. 6, 7] and Gershenfeld [9].

References

1. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, London, 1976)
2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford, 1996)
3. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
4. C. Kittel, Elementary Statistical Physics (Wiley, New York, 1958)
5. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics, Part 1, 3rd edn. (Elsevier: Butterworth-Heinemann, New York, 1980)
6. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)
7. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
8. M.A. Omar, Elementary Solid State Physics: Principles and Applications (Addison Wesley, Reading, 1975)
9. N. Gershenfeld, The Physics of Information Technology (Cambridge University Press, Cambridge, 2000)
Chapter 4
Quantum Statistics – The Bose–Einstein
Distribution

The general description of bosons was provided in the introductory paragraphs of


Chap. 3. As said, the crucial difference between bosons and fermions is that in the
case of bosons, Pauli’s exclusion principle does not apply. In this chapter, we study
the statistical mechanics of bosons.

4.1 Combinatorial Derivation of the BE Statistics

Using the same notation as in Chap. 3, again, we are partitioning the energy levels $\varepsilon_1, \varepsilon_2, \ldots$ into groups, labeled by $s$, where in group no. $s$, which has $G_s$ quantum states, the representative energy is $\hat\varepsilon_s$. As before, a microstate is defined in terms of $\{\hat N_s\}$ and $\Omega(\hat N_1,\hat N_2,\ldots) = \prod_s\Omega_s(\hat N_s)$, but now we need a different estimate of each factor $\Omega_s(\hat N_s)$, since now there are no restrictions on the occupation numbers of the quantum states. In how many ways can one partition $\hat N_s$ particles among $G_s$ states? Imagine that the $\hat N_s$ particles of group no. $s$ are arranged along a line. By means of $G_s-1$ partitions we divide the particles into $G_s$ subsets corresponding to the various states in that group. We have a total of $(\hat N_s+G_s-1)$ elements, $\hat N_s$ of them are particles and the remaining $(G_s-1)$ are partitions (see Fig. 4.1). In how many distinct ways can we configure them? The answer is simple:
$$\Omega_s(\hat N_s) = \frac{(\hat N_s+G_s-1)!}{\hat N_s!\,(G_s-1)!}. \qquad (4.1.1)$$
On account of the fact that $G_s\gg 1$, the $-1$ term can be safely neglected, and we approximate
$$\Omega_s(\hat N_s) \approx \frac{(\hat N_s+G_s)!}{\hat N_s!\,G_s!}. \qquad (4.1.2)$$


Fig. 4.1 $\hat N_s$ particles and $G_s-1$ partitions

Repeating the same derivation as in Sect. 3.1, but with the above $\Omega_s(\hat N_s)$, we get
$$\ln\Omega_s(\hat N_s) \approx (\hat N_s+G_s)\,h_2\!\left(\frac{G_s}{\hat N_s+G_s}\right), \qquad (4.1.3)$$
and so the free energy is now
$$F \approx \sum_s\left[\hat N_s\hat\varepsilon_s - kT(\hat N_s+G_s)\,h_2\!\left(\frac{G_s}{\hat N_s+G_s}\right)\right], \qquad (4.1.4)$$

which should be minimized subject to $\sum_s\hat N_s = N$. Upon carrying out the minimization of the corresponding Lagrangian, we arrive¹ at the following result for the most probable occupation numbers:
$$\hat N_s = \frac{G_s}{e^{\beta(\hat\varepsilon_s-\mu)}-1}, \qquad (4.1.5)$$
or, moving back to the original occupation numbers,
$$\bar N_r = \frac{1}{e^{\beta(\varepsilon_r-\mu)}-1}, \qquad (4.1.6)$$
where $\mu$ is again the Lagrange multiplier, which has the meaning of the chemical potential. This is the Bose–Einstein (BE) distribution. As we see, the formula is very similar to that of the FD distribution; the only difference is that in the denominator, $+1$ is replaced by $-1$. Surprisingly enough, this is a crucial difference that makes the behavior of bosons drastically different from that of fermions. Note that for this expression to make sense, $\mu$ must be smaller than the ground–state energy $\varepsilon_1$, otherwise the denominator either vanishes or becomes negative. If the ground–state energy is zero, this means $\mu<0$.

4.2 Derivation Using the Grand–Canonical Ensemble

As in Sect. 3.2, an alternative derivation can be carried out using the grand–canonical ensemble. The only difference is that now, the summations over $\{N_r\}$ are not only over $\{0,1\}$, but over all non–negative integers. In particular,

1 Exercise 4.1 Fill in the detailed derivation.





$$\Xi(\beta,\mu) = \prod_r\sum_{N_r=0}^{\infty}e^{\beta N_r(\mu-\varepsilon_r)}. \qquad (4.2.1)$$
Of course, here too, for convergence of each geometric series, we must assume $\mu<\varepsilon_1$, and then the result is
$$\Xi(\beta,\mu) = \prod_r\frac{1}{1-e^{\beta(\mu-\varepsilon_r)}}. \qquad (4.2.2)$$

Here, under the grand–canonical ensemble, $N_1, N_2, \ldots$ are independent geometric random variables with distributions
$$P_r(N_r) = \left[1-e^{\beta(\mu-\varepsilon_r)}\right]e^{\beta N_r(\mu-\varepsilon_r)}, \qquad N_r = 0,1,2,\ldots,\quad r = 1,2,\ldots \qquad (4.2.3)$$
Thus, $\bar N_r$ is just the expectation of this geometric random variable, which is readily found² to be as in Eq. (4.1.6).

4.3 Bose–Einstein Condensation

In analogy to the FD case, here too, the chemical potential $\mu$ is determined from the constraint on the total number of particles. In this case, it reads
$$\sum_r\frac{1}{e^{\beta(\varepsilon_r-\mu)}-1} = N. \qquad (4.3.1)$$
Taking into account the density of states in a potential well of sizes $L_x\times L_y\times L_z$ (as was done in Chap. 3), in the continuous limit, this yields
$$\rho = \frac{\sqrt{2m^3}}{2\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{(\varepsilon-\mu)/kT}-1}. \qquad (4.3.2)$$

At this point, an important peculiarity should be discussed. Consider Eq. (4.3.2) and suppose that we are cooling the system. As $T$ decreases, $\mu$ must adjust in order to keep Eq. (4.3.2) holding, since the number of particles must be preserved. In particular, as $T$ decreases, $\mu$ must increase, yet it must remain negative. The point is that even for $\mu=0$, which is the maximum allowed value of $\mu$, the integral on the r.h.s. of (4.3.2) is finite,³ as the density of states is proportional to $\sqrt\varepsilon$ and hence balances the divergence of the BE integrand near $\varepsilon=0$. Let us define then
2 Exercise 4.2 Show this.


3 Exercise 4.3 Show this.
$$\rho_c(T) \stackrel{\triangle}{=} \frac{\sqrt{2m^3}}{2\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{\varepsilon/kT}-1} \qquad (4.3.3)$$
and let $T_c$ be the solution to the equation $\rho_c(T) = \rho$, which can be found as follows. By changing the integration variable to $z = \varepsilon/kT$, we can rewrite the r.h.s. as
$$\rho_c(T) = \left(\frac{mkT}{2\pi\hbar^2}\right)^{3/2}\left\{\frac{2}{\sqrt\pi}\int_0^\infty\frac{\sqrt z\,{\rm d}z}{e^z-1}\right\} \approx 2.612\cdot\left(\frac{mkT}{2\pi\hbar^2}\right)^{3/2}, \qquad (4.3.4)$$
where the constant 2.612 is the numerical value of the expression in the curly brackets. Thus,
$$T_c \approx 0.5274\cdot\frac{2\pi\hbar^2}{mk}\cdot\rho^{2/3} = 3.313\cdot\frac{\hbar^2\rho^{2/3}}{mk}. \qquad (4.3.5)$$
The problem is that for $T<T_c$, Eq. (4.3.2) can no longer be solved by any non–positive value of $\mu$. So what happens below $T_c$?
The root of the problem is in the passage from the discrete sum over $r$ to the integral over $\varepsilon$. The paradox is resolved when it is understood that below $T_c$, the contribution of $\varepsilon=0$ should be separated from the integral. That is, the correct form is
$$N = \frac{1}{e^{-\mu/kT}-1} + \frac{\sqrt{2m^3}\,V}{2\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{(\varepsilon-\mu)/kT}-1}, \qquad (4.3.6)$$
or, after dividing by $V$,
$$\rho = \rho_0 + \frac{\sqrt{2m^3}}{2\pi^2\hbar^3}\int_0^\infty\frac{\sqrt\varepsilon\,{\rm d}\varepsilon}{e^{(\varepsilon-\mu)/kT}-1}, \qquad (4.3.7)$$

where $\rho_0$ is the density of ground–state particles, and now the integral accommodates the contribution of all particles with strictly positive energy. Now, for $T<T_c$, we simply have $\rho_0 = \rho-\rho_c(T)$, which means that a macroscopic fraction of the particles condense at the ground state. This phenomenon is called Bose–Einstein condensation. Note that for $T<T_c$,
$$\begin{aligned}
\rho_0 &= \rho-\rho_c(T)\\
&= \rho_c(T_c)-\rho_c(T)\\
&= \rho_c(T_c)\left[1-\frac{\rho_c(T)}{\rho_c(T_c)}\right]\\
&= \rho_c(T_c)\left[1-\left(\frac{T}{T_c}\right)^{3/2}\right]\\
&= \rho\left[1-\left(\frac{T}{T_c}\right)^{3/2}\right], \qquad (4.3.8)
\end{aligned}$$

which gives a precise characterization of the condensation as a function of temperature. It should be pointed out that $T_c$ is normally extremely low.⁴
One might ask why the point $\varepsilon=0$ requires special caution when $T<T_c$, but does not require such caution for $T>T_c$. The answer is that for $T>T_c$, $\rho_0 = 1/\{V[e^{-\mu/kT}-1]\}$ tends to zero in the thermodynamic limit ($V\to\infty$), since $\mu<0$. However, as $T\to T_c$, $\mu\to 0$, and $\rho_0$ becomes singular.
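To attach a number to the statement that $T_c$ is normally extremely low, the sketch below evaluates Eq. (4.3.5) for an assumed particle mass and density of the order of those of liquid helium–4 (illustrative assumptions, not values quoted in the text).

import numpy as np

hbar = 1.054571817e-34    # J*s
k_B  = 1.380649e-23       # J/K

m   = 6.64e-27            # kg, assumed (helium-4-like) atomic mass
rho = 2.2e28              # 1/m^3, assumed (liquid-helium-like) number density

T_c = 3.313 * hbar**2 * rho ** (2.0 / 3.0) / (m * k_B)   # Eq. (4.3.5)
print(f"T_c ~ {T_c:.2f} K")

This gives a few kelvin for such liquid–like densities; for dilute atomic gases, whose densities are many orders of magnitude lower, $T_c$ drops to the microkelvin–nanokelvin range mentioned in the footnote.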
It is instructive to derive the pressure exerted by the ideal boson gas for $T<T_c$. This can be obtained from the grand partition function
$$\begin{aligned}
\ln\Xi &= -\sum_r\ln\left(1-e^{-\varepsilon_r/kT}\right) \qquad (\mu=0)\\
&\approx -\frac{\sqrt{2m^3}\,V}{2\pi^2\hbar^3}\int_0^\infty{\rm d}\varepsilon\cdot\sqrt\varepsilon\,\ln\left(1-e^{-\varepsilon/kT}\right)\\
&= -\frac{\sqrt{2m^3}\,(kT)^{3/2}V}{2\pi^2\hbar^3}\int_0^\infty{\rm d}x\cdot\sqrt x\,\ln\left(1-e^{-x}\right), \qquad (4.3.9)
\end{aligned}$$
where the integral over $x$ (including the minus sign) is just a positive constant $C$ that we will not calculate here. Now,
$$P = \lim_{V\to\infty}\frac{kT\ln\Xi}{V} = \frac{C\sqrt{2m^3}\,(kT)^{5/2}}{2\pi^2\hbar^3}. \qquad (4.3.10)$$

We see that the pressure is independent of the density ρ (compare with the ideal gas
where P = ρkT ). This is because the condensed particles do not contribute to the
pressure. What matters is only the density of those with positive energy, and this
density in turn depends only on T .

Exercise 4.4 Why don't fermions condense? What changes in the last derivation?

Exercise 4.5 The last derivation was in three dimensions ($d=3$). Modify the derivation of the BE statistics to apply to a general dimension $d$, taking into account the dependence of the density of states upon $d$. For which values of $d$ do bosons condense?

4 In 1995 the first gaseous condensate was produced by Eric Cornell and Carl Wieman at the Univer-

sity of Colorado, using a gas of rubidium atoms cooled to 170 nanokelvin. For their achievements
Cornell, Wieman, and Wolfgang Ketterle of MIT received the 2001 Nobel Prize in Physics. In
November 2010 the first photon BEC was observed.

4.4 Black–Body Radiation

A black body is an (idealized model of an) object that absorbs all the incident electro-
magnetic radiation (and reflects none), regardless of the wavelength. A black body
in thermal equilibrium emits radiation that is called black–body radiation. It should
be understood that all bodies emit electromagnetic radiation whenever at positive
temperature, but normally, this radiation is not in thermal equilibrium. One of the
important applications of the BE statistics is to investigate the equilibrium properties
of black–body radiation.
If we consider the radiation inside an opaque object whose surfaces and walls
are kept at fixed temperature T , then the radiation and the surfaces arrive at thermal
equilibrium and then, the radiation has properties that are appreciably close to those
of a black body. To study the behavior of such radiation, one creates a tiny hole in the surface of the enclosure, small enough that it does not disturb the equilibrium of the cavity (a photon entering the cavity will be ‘trapped’ by internal reflections, but will essentially never be reflected out); the emitted radiation then has the same properties as the cavity radiation, which in turn are the same as the radiation properties of a black body. The temperature of the black body is $T$ as well, of course. In this section, we study these radiation properties using BE statistics.
We consider a radiation cavity of volume $V$ and temperature $T$. Historically, Planck (1900) viewed this system as an assembly of harmonic oscillators with quantized energies $(n+1/2)\hbar\omega$, $n = 0,1,2,\ldots$, where $\omega$ is the angular frequency of the oscillator. An alternative point of view is as an ideal gas of identical and indistinguishable photons, each one with energy $\hbar\omega$. Photons have integral spin and hence are bosons, but they have zero mass and zero chemical potential when they interact with a black body. The reason is that there is no constraint that their total number be conserved (they are emitted and absorbed in the black–body material with which they interact). Since in equilibrium $F$ should be minimum, then $(\partial F/\partial N)_{T,V}=0$. But $(\partial F/\partial N)_{T,V}=\mu$, and so, $\mu=0$. It follows then that the distribution of photons across the quantum states obeys BE statistics with $\mu=0$, that is
$$\bar N_\omega = \frac{1}{e^{\hbar\omega/kT}-1}. \qquad (4.4.1)$$

The calculation of the density of states here is somewhat different from the one in Sect. 4.3. Earlier, we considered a particle with positive mass $m$, whose kinetic energy is $\|\vec p\|^2/2m = \hbar^2\|\vec k\|^2/2m$, whereas now we are talking about a photon whose rest mass is zero and whose energy is $\hbar\omega = \hbar\|\vec k\|c = \|\vec p\|c$ ($c$ being the speed of light), so the dependence on $\|\vec k\|$ is now linear rather than quadratic. This is a relativistic effect.
Assuming that $V$ is large enough, we can pass to the continuous approximation. As in Sect. 3.3, the number of waves (i.e., the number of quantum states) whose wave–vector magnitude lies between $\|\vec k\|$ and $\|\vec k\|+{\rm d}\|\vec k\|$ is given by

$$\frac{(1/8)\cdot 4\pi\|\vec k\|^2{\rm d}\|\vec k\|}{(\pi/L_x)\cdot(\pi/L_y)\cdot(\pi/L_z)} = \frac{V\|\vec k\|^2{\rm d}\|\vec k\|}{2\pi^2}.$$
In terms of frequencies, using the relation $\omega = \|\vec k\|c$, and doubling the above expression, due to the two directions of polarization (left– and right–circular polarizations), we have that the total number of quantum states of a photon in the range $[\omega,\omega+{\rm d}\omega]$ is $V\omega^2{\rm d}\omega/\pi^2c^3$. Thus, the number of photons in this frequency range is
$${\rm d}N_\omega = \frac{V}{\pi^2c^3}\cdot\frac{\omega^2{\rm d}\omega}{e^{\hbar\omega/kT}-1}. \qquad (4.4.2)$$
The contribution of this to the energy is
$${\rm d}E_\omega = \hbar\omega\,{\rm d}N_\omega = \frac{V}{\pi^2c^3}\cdot\frac{\hbar\omega^3{\rm d}\omega}{e^{\hbar\omega/kT}-1}. \qquad (4.4.3)$$

This expression for the spectrum of black–body radiation is known as Planck’s law.

Exercise 4.6 Write Planck’s law in terms of the wavelength dE λ .

At low frequencies ($\hbar\omega\ll kT$), this gives
$${\rm d}E_\omega \approx \frac{kTV}{\pi^2c^3}\,\omega^2{\rm d}\omega, \qquad (4.4.4)$$
which is the Rayleigh–Jeans law. This is actually the classical limit (see the footnote in the introduction to Chap. 3), obtained from multiplying $kT$ by the “number of waves.”
In the other extreme of $\hbar\omega\gg kT$, we have
$${\rm d}E_\omega = \hbar\omega\,{\rm d}N_\omega \approx \frac{V\hbar}{\pi^2c^3}\cdot\omega^3e^{-\hbar\omega/kT}{\rm d}\omega, \qquad (4.4.5)$$
which is Wien's law. At low temperatures, this is an excellent approximation over a very wide range of frequencies. The frequency of maximum radiation is (Fig. 4.2)

$$\omega_{\max} = 2.822\cdot\frac{kT}{\hbar}, \qquad (4.4.6)$$
namely, linear in temperature. This relation has immediate applications. For example,
the sun is known to be a source of radiation, which with a good level of approximation,
can be considered a black body. Using a spectrometer, one can measure the frequency
ωmax of maximum radiation (which turns out to be at the lower limit of the visible
range), and estimate the sun’s surface temperature (from Eq. (4.4.6)), to be T ≈
5800◦ K. At room temperature, ωmax falls deep in the infrared range, and thus invisible
to the human eye. Hence the name black body.
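The solar estimate quoted above can be reproduced in a few lines; the sketch below simply evaluates Eq. (4.4.6) at the quoted temperature of 5800 ◦K and reports the corresponding ordinary frequency and wavelength.

import numpy as np

hbar = 1.054571817e-34    # J*s
k_B  = 1.380649e-23       # J/K
c    = 2.99792458e8       # m/s

T = 5800.0                                   # K, surface temperature quoted in the text
omega_max = 2.822 * k_B * T / hbar           # Eq. (4.4.6)
nu_max = omega_max / (2.0 * np.pi)           # ordinary frequency
lam = c / nu_max                             # corresponding wavelength

print(f"omega_max ~ {omega_max:.2e} rad/s, nu_max ~ {nu_max:.2e} Hz, lambda ~ {lam * 1e9:.0f} nm")

The result, roughly $3.4\times 10^{14}$ Hz (a wavelength of about 880 nm), sits just below the visible band, consistent with the remark that the peak lies at the lower limit of the visible range.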

Fig. 4.2 Illustration of Planck's law: the energy density per unit frequency as a function of frequency

Now, the energy density is
$$\frac{E}{V} = \frac{\hbar}{\pi^2c^3}\int_0^\infty\frac{\omega^3{\rm d}\omega}{e^{\hbar\omega/kT}-1} = aT^4, \qquad (4.4.7)$$
where the second equality is obtained by changing the integration variable to $x = \hbar\omega/kT$, and then
$$a = \frac{\hbar}{\pi^2c^3}\left(\frac{k}{\hbar}\right)^4\int_0^\infty\frac{x^3{\rm d}x}{e^x-1} = \frac{\pi^2k^4}{15\hbar^3c^3}. \qquad (4.4.8)$$

The relation $E/V = aT^4$ is called the Stefan–Boltzmann law. The heat capacity at constant volume, $C_V = (\partial E/\partial T)_V$, is therefore proportional to $T^3$.
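As a numerical sanity check, the constant $a$ of Eq. (4.4.8) is readily evaluated and can be compared with the familiar Stefan–Boltzmann constant $\sigma = ac/4$ of radiometry:

import numpy as np

hbar = 1.054571817e-34    # J*s
k_B  = 1.380649e-23       # J/K
c    = 2.99792458e8       # m/s

a = np.pi**2 * k_B**4 / (15.0 * hbar**3 * c**3)   # Eq. (4.4.8)
sigma = a * c / 4.0                               # Stefan-Boltzmann constant

print(f"a     = {a:.4e} J m^-3 K^-4")
print(f"sigma = {sigma:.4e} W m^-2 K^-4")         # ~5.67e-8, the accepted value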

Exercise 4.7 Calculate ρ, the density of photons.

Additional thermodynamic quantities can now be calculated from the logarithm of the grand–canonical partition function,
$$\ln\Xi = -\sum_r\ln\left[1-e^{-\hbar\omega_r/kT}\right] = -\frac{V}{\pi^2c^3}\int_0^\infty{\rm d}\omega\cdot\omega^2\ln\left[1-e^{-\hbar\omega/kT}\right]. \qquad (4.4.9)$$

For example, the pressure of the photon gas can be calculated from
$$\begin{aligned}
P &= \frac{kT\ln\Xi}{V}\\
&= -\frac{kT}{\pi^2c^3}\int_0^\infty{\rm d}\omega\cdot\omega^2\ln\left[1-e^{-\hbar\omega/kT}\right]\\
&= -\frac{(kT)^4}{\pi^2c^3\hbar^3}\int_0^\infty{\rm d}x\cdot x^2\ln\left(1-e^{-x}\right)\\
&= \frac{1}{3}aT^4 = \frac{E}{3V}, \qquad (4.4.10)
\end{aligned}$$

where the integral is calculated using integration by parts.5 Note that while in the
ideal gas P was only linear in T , here it is proportional to the fourth power of T . Note
also that here, P V = E/3, which is different from the ideal gas, where P V = 2E/3.

4.5 Suggestions for Supplementary Reading

The exposition in this chapter is heavily based on those of Mandl [1] and Pathria
[2]. Additional relevant textbooks are the same as those that are mentioned also in
Sect. 3.6 (as BE statistics and FD statistics are almost always presented on a similar
footing).

References

1. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)


2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth-Heinemann, Oxford, 1996)

5 Exercise 4.8 Fill in the details.


Chapter 5
Interacting Particle Systems and Phase
Transitions

In this chapter, we discuss systems with interacting particles. As we shall see, when
the interactions among the particles are significant, the system exhibits a certain
collective behavior that, in the thermodynamic limit, may be subjected to phase
transitions, i.e., abrupt changes in the behavior of the system in the presence of
a gradual change in an external control parameter, like temperature, pressure, or
magnetic field. The content of this chapter has a considerable overlap with Chap. 5 of [1], and it is provided in this book too for the sake of completeness.

5.1 Introduction – Sources of Interaction

So far, we have dealt almost exclusively with systems that have additive Hamiltonians, $E(x) = \sum_i E(x_i)$, which means, under the canonical ensemble, that the particles are statistically independent and there are no interactions among them. In Nature, of course, this is seldom really the case. Sometimes this is still a reasonably good approximation, but in other cases, the interactions are appreciably strong and cannot
be neglected. Among the different particles there could be many sorts of mutual
forces, such as mechanical, electrical, or magnetic forces. There could also be inter-
actions that stem from quantum–mechanical effects: as described earlier, fermions
must obey Pauli’s exclusion principle. Another type of interaction stems from the fact
that the particles are indistinguishable, so permutations between them are not con-
sidered as distinct states. For example, referring to BE statistics, had the N particles
been statistically independent, the resulting partition function would be
 N

−βr
Z N (β) = e
r
 

  N! 
= δ Nr = N  · exp −β Nr r (5.1.1)
N1 ,N2 ,... r r Nr ! r



whereas in Eq. (3.2.1), the combinatorial factor, N !/ r Nr !, that distinguishes
between the various permutations among the particles, is absent. This introduces
dependency, which means interaction. Indeed, for the ideal boson gas, we have
encountered the effect of Bose–Einstein condensation, which is a phase transition,
and phase transitions can occur only in systems of interacting particles, as will be
discussed in this chapter.1

5.2 Models of Interacting Particles

The simplest forms of deviation from the purely additive Hamiltonian structure are those that include, in addition to the individual energy terms $\{E(x_i)\}$, also terms that depend on pairs, and/or triples, and/or even larger cliques of particles. In the case of purely pairwise interactions, this means a structure like the following:


$$E(x) = \sum_{i=1}^N E(x_i) + \sum_{(i,j)}\varepsilon(x_i,x_j), \qquad (5.2.1)$$
where the summation over pairs can be defined over all pairs $i\ne j$, or over some of the pairs, according to a given rule, e.g., depending on the distance between particle $i$ and particle $j$, and according to the geometry of the system, or according to a certain graph whose edges connect the relevant pairs of variables (that, in turn, are designated as nodes).
For example, in a one–dimensional array (a lattice) of particles, a customary model accounts for interactions between neighboring pairs only, neglecting more remote ones, thus the second term above would be $\sum_i\varepsilon(x_i,x_{i+1})$. A well known special case of this is that of a polymer or a solid with a crystal lattice structure, where, in the one–dimensional version of the model, atoms are thought of as a chain of masses connected by springs (see left part of Fig. 5.1), i.e., an array of coupled harmonic oscillators. In this case, $\varepsilon(x_i,x_{i+1}) = \frac{1}{2}K(x_{i+1}-x_i)^2$, where $K$ is a constant and $x_i$ is the displacement of the $i$-th atom from its equilibrium location, i.e., these are the potential energies of the springs. In higher dimensional arrays (or lattices), similar interactions apply; there are just more neighbors to each site, from the various directions (see right part of Fig. 5.1). These kinds of models will be discussed in the next chapter in some depth.
In a system where the particles are mobile and hence their locations vary and have no geometrical structure, like in a gas, the interaction terms are also potential energies pertaining to the mutual forces (see Fig. 5.2), and these normally depend solely on the distances $\|\vec r_i-\vec r_j\|$.

1 Another way to understand the dependence is to observe that occupation numbers {N


r } are depen-
dent via the constraint on their sum. This is different from the grand–canonical ensemble, where
they are independent.

Fig. 5.1 Elastic interaction


forces between adjacent
atoms in a one–dimensional
lattice (left part of the figure)
and in a two–dimensional
lattice (right part)

Fig. 5.2 Mobile particles


and mutual forces between
them

For example, in a non–ideal gas,
$$E(x) = \sum_{i=1}^N\frac{\|\vec p_i\|^2}{2m} + \sum_{i\ne j}\phi(\|\vec r_i-\vec r_j\|). \qquad (5.2.2)$$
A simple special case is that of hard spheres (billiard balls), without any forces, where
$$\phi(\|\vec r_i-\vec r_j\|) = \begin{cases}\infty & \|\vec r_i-\vec r_j\| < 2R\\ 0 & \|\vec r_i-\vec r_j\| \ge 2R\end{cases} \qquad (5.2.3)$$

which expresses the simple fact that balls cannot physically overlap. The analysis of
this model can be carried out using diagrammatic techniques (the cluster expansion,
etc.), but we will not get into details in this book.2 To demonstrate, however, the
effect of interactions on the deviation from the equation of state of the ideal gas, we
consider next a simple one–dimensional example.
Example 5.1 (Non–ideal gas in one dimension) Consider a one–dimensional object
of length L that contains N + 1 particles, whose locations are 0 ≡ r0 ≤ r1 ≤ . . . ≤
r N −1 ≤ r N ≡ L, namely, the first and the last particles are fixed at the edges. The

2 The reader can find the derivations in any textbook on elementary statistical mechanics, for exam-

ple, [2, Chap. 9].



order of the particles is fixed, namely, they cannot be swapped. Let the Hamiltonian
be given by
N n
pi2
E(x) = φ(ri − ri−1 ) + (5.2.4)
i=1 i=1
2m

where φ is a given potential function designating the interaction between two neigh-
boring particles along the line. The partition function, which is an integral of
the Boltzmann factor pertaining to this Hamiltonian, should incorporate the fact
that the positions {ri } are not independent. It is convenient to change variables to
ξi = ri − ri−1 , i = 1, 2, . . . , N , where it should be kept in mind that ξi ≥ 0 for all
N
i and i=1 ξi = L. Let us assume that L is an extensive variable, i.e., L = N ξ0 for
some constant ξ0 > 0. Thus, the partition function is
$$\begin{aligned}
Z_N(\beta,L) &= \frac{1}{h^N}\int{\rm d}p_1\cdots{\rm d}p_N\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\,e^{-\beta\sum_{i=1}^N[\phi(\xi_i)+p_i^2/2m]}\cdot\delta\!\left(L-\sum_{i=1}^N\xi_i\right) \qquad (5.2.5)\\
&= \frac{1}{\lambda^N}\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\,e^{-\beta\sum_{i=1}^N\phi(\xi_i)}\cdot\delta\!\left(L-\sum_{i=1}^N\xi_i\right), \qquad (5.2.6)
\end{aligned}$$
where $\lambda = h/\sqrt{2\pi mkT}$. The constraint $\sum_{i=1}^N\xi_i = L$ makes the analysis of the configurational partition function difficult. Let us pass to the corresponding Gibbs ensemble, where instead of fixing the length $L$, we control it by applying a force $f$.³ The corresponding partition function now reads

$$\begin{aligned}
Y_N(\beta,f) &= \int_0^\infty{\rm d}L\,e^{-\beta fL}Z_N(\beta,L)\\
&= \lambda^{-N}\int_0^\infty{\rm d}L\,e^{-\beta fL}\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\,e^{-\beta\sum_{i=1}^N\phi(\xi_i)}\cdot\delta\!\left(L-\sum_{i=1}^N\xi_i\right)\\
&= \lambda^{-N}\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\left[\int_0^\infty{\rm d}L\,e^{-\beta fL}\delta\!\left(L-\sum_{i=1}^N\xi_i\right)\right]e^{-\beta\sum_{i=1}^N\phi(\xi_i)}\\
&= \lambda^{-N}\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\exp\left\{-\beta\left[f\sum_{i=1}^N\xi_i+\sum_{i=1}^N\phi(\xi_i)\right]\right\}\\
&= \lambda^{-N}\int_{{\rm I\!R}_+^N}{\rm d}\xi_1\cdots{\rm d}\xi_N\exp\left\{-s\sum_{i=1}^N\xi_i-\beta\sum_{i=1}^N\phi(\xi_i)\right\} \qquad (s \stackrel{\triangle}{=} \beta f)\\
&= \left[\frac{1}{\lambda}\int_0^\infty{\rm d}\xi\,e^{-[s\xi+\beta\phi(\xi)]}\right]^N. \qquad (5.2.7)
\end{aligned}$$

3 Here we use the principle of ensemble equivalence.


5.2 Models of Interacting Particles 79

With a slight abuse of notation, from now on, we will denote the last expression by
Y N (β, s). Consider now the following potential function

⎨∞ 0≤ξ≤d
φ(ξ) = − d <ξ ≤d +δ (5.2.8)

0 ξ >d +δ

In words, distances below d are strictly forbidden (e.g., because of the size of the
particles), in the range between d and d+δ there is a negative potential −, and beyond
d + δ the potential is zero.4 Now, for this potential function, the one–dimensional
integral above is given by

e−sd −sδ
I = dξe−[sξ+βφ(ξ)] = [e (1 − eβ ) + eβ ], (5.2.9)
0 s

and so,
$$Y_N(\beta,s) = \frac{e^{-sdN}}{\lambda^Ns^N}\left[e^{-s\delta}(1-e^{\beta\epsilon})+e^{\beta\epsilon}\right]^N = \exp\left\{N\left[\ln\left(e^{-s\delta}(1-e^{\beta\epsilon})+e^{\beta\epsilon}\right)-sd-\ln(\lambda s)\right]\right\}. \qquad (5.2.10)$$

Now, the average length of the system is given by
$$\langle L\rangle = -\frac{\partial\ln Y_N(\beta,s)}{\partial s} = \frac{N\delta e^{-s\delta}(1-e^{\beta\epsilon})}{e^{-s\delta}(1-e^{\beta\epsilon})+e^{\beta\epsilon}} + Nd + \frac{N}{s}, \qquad (5.2.11)$$

or, equivalently, $\langle\Delta L\rangle \stackrel{\triangle}{=} \langle L\rangle - Nd$, which is the excess length beyond the possible minimum, is given by
$$\langle\Delta L\rangle = \frac{N\delta e^{-f\delta/kT}(1-e^{\epsilon/kT})}{e^{-f\delta/kT}(1-e^{\epsilon/kT})+e^{\epsilon/kT}} + \frac{NkT}{f}. \qquad (5.2.12)$$

Thus,
$$\begin{aligned}
f\cdot\langle\Delta L\rangle &= NkT + \frac{Nf\delta e^{-f\delta/kT}(1-e^{\epsilon/kT})}{e^{-f\delta/kT}(1-e^{\epsilon/kT})+e^{\epsilon/kT}}\\
&= NkT\left[1 - \frac{f\delta/kT}{e^{(\epsilon+f\delta)/kT}/(e^{\epsilon/kT}-1)-1}\right], \qquad (5.2.13)
\end{aligned}$$

4 This is a caricature of the Lennard–Jones potential function $\phi(\xi)\propto[(d/\xi)^{12}-(d/\xi)^6]$, which begins from $+\infty$, decreases down to a negative minimum, and finally increases and tends to zero.

where the last line is obtained after some standard algebraic manipulation. Note that without the potential well in the intermediate range of distances ($\epsilon=0$ or $\delta=0$), the second term in the square brackets disappears, and we get a one–dimensional version of the equation of state of the ideal gas (with the volume replaced by length and the pressure replaced by force). The second term is then a correction term due to the interaction. The attractive potential reduces the product $f\cdot\langle\Delta L\rangle$. □
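The size of this correction is easy to explore numerically. The following Python sketch evaluates the bracketed factor of Eq. (5.2.13), i.e., the ratio $f\langle\Delta L\rangle/(NkT)$, which equals 1 for the ideal gas; the values of $\epsilon$, $\delta$ and the force range are arbitrary illustration choices in units where $kT=1$.

import numpy as np

kT = 1.0        # work in units where kT = 1
eps = 0.5       # assumed well depth, in units of kT
delta = 0.2     # assumed well width, in arbitrary length units

def ratio(f):
    # f*<Delta L>/(N kT) from Eq. (5.2.13); equals 1 when eps = 0 or delta = 0
    denom = np.exp((eps + f * delta) / kT) / (np.exp(eps / kT) - 1.0) - 1.0
    return 1.0 - (f * delta / kT) / denom

for f in (0.1, 1.0, 5.0, 20.0):
    print(f"f = {f:5.1f}:  f*<dL>/(N kT) = {ratio(f):.4f}")

The ratio stays below 1 for every positive force (the attraction shortens the chain relative to the ideal case) and returns to 1 as $f\to 0$, where the correction term vanishes.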

Yet another example of a model, or more precisely, a very large class of models
with interactions, are those of magnetic materials. These models will closely accom-
pany our discussions from this point onward in this chapter. Although few of these
models are solvable, most of them are not. For the purpose of our discussion, a mag-
netic material is one for which the relevant property of each particle is its magnetic
moment. As a reminder, the magnetic moment is a vector proportional to the angular
momentum of a revolving charged particle (like a rotating electron, or a current loop),
or the spin, and it designates the intensity of its response to the net magnetic field that
this particle ‘feels’. This magnetic field is given by the superposition of an externally
applied magnetic field and the magnetic fields generated by the neighboring spins.
Quantum mechanical considerations dictate that each spin, which will be denoted
by si , is quantized, that is, it may take only one out of finitely many values. In the
simplest case to be adopted in our study – two values only. These will be designated
by si = +1 (“spin up”) and si = −1 (“spin down”), corresponding to the same
intensity, but in two opposite directions, one parallel to the magnetic field, and the
other – anti-parallel (see Fig. 5.3). The Hamiltonian associated with an array of spins
s = (s1 , . . . , s N ) is customarily modeled (up to certain constants that, among other
things, accommodate for the physical units) with a structure like this:


$$E(s) = -B\sum_{i=1}^N s_i - \sum_{(i,j)}J_{ij}s_is_j, \qquad (5.2.14)$$

where B is the externally applied magnetic field and {Ji j } are the coupling constants
that designate the levels of interaction between spin pairs, and they depend on prop-
erties of the magnetic material and on the geometry of the system. The first term
accounts for the contributions of potential energies of all spins due to the magnetic
field, which in general, are given by the inner product $\vec B\cdot\vec s_i$, but since each $\vec s_i$ is either parallel or anti-parallel to $\vec B$, as said, these boil down to simple products, where only the sign of each $s_i$ counts. Since $P(s)$ is proportional to $e^{-\beta E(s)}$, the spins ‘prefer’
to be parallel, rather than anti-parallel to the magnetic field. The second term in the
above Hamiltonian accounts for the interaction energy. If Ji j are all positive, they
also prefer to be parallel to each other (the probability for this is larger), which is
the case where the material is called ferromagnetic (like iron and nickel). If they are
all negative, the material is antiferromagnetic. In the mixed case, it is called a spin
glass. In the latter, the behavior is rather complicated.

Fig. 5.3 Illustration of a


spin array on a square lattice

The case where all $J_{ij}$ are equal and the double summation over $\{(i,j)\}$ is over nearest neighbors only is called the Ising model. A more general version of it is called the O($n$) model, according to which each spin is an $n$–dimensional unit vector $\vec s_i = (s_{i1},\ldots,s_{in})$ (and so is the magnetic field), where $n$ is not necessarily related to the dimension $d$ of the lattice in which the spins reside. The case $n=1$ is then the Ising model. The case $n=2$ is called the XY model, and the case $n=3$ is called the Heisenberg model.
Of course, the above models for the Hamiltonian can (and, in fact, are being) generalized to include interactions formed also by triples, quadruples, or any fixed size $p$ (that does not grow with $N$) of spin–cliques.
We next discuss a very important effect that exists in some systems with strong
interactions (both in magnetic materials and in other models): the effect of phase
transitions.

5.3 A Qualitative Discussion on Phase Transitions

As was mentioned in the introductory paragraph of this chapter, a phase transition


means an abrupt change in the collective behavior of a physical system, as we change
gradually one of the externally controlled parameters, like the temperature, pressure,
or magnetic field. The most common example of a phase transition in our everyday
life is the water that we boil in the kettle when we make coffee, or when it turns into
ice as we put it in the freezer.
What exactly these phase transitions are? In physics, phase transitions can occur
only if the system has interactions. Consider, the above example of an array of spins
with B = 0, and let us suppose that all Ji j > 0 are equal, and thus will be denoted
commonly by J (like in the O(n) model). Then,
  
$$P(s) = \frac{\exp\left\{\beta J\sum_{(i,j)}s_is_j\right\}}{Z(\beta)}, \qquad (5.3.1)$$

and, as mentioned earlier, this is a ferromagnetic model, where all spins ‘like’ to be
in the same direction, especially when β J is large. In other words, the interactions,
in this case, tend to introduce order into the system. On the other hand, the second
law talks about maximum entropy, which tends to increase the disorder. So there are
two conflicting effects here. Which one of them prevails?
The answer turns out to depend on temperature. Recall that in the canonical ensemble, equilibrium is attained at the point of minimum free energy $f = \epsilon - Ts(\epsilon)$. Now, $T$ plays the role of a weighting factor for the entropy. At low temperatures, the weight of the second term of $f$ is small, and minimizing $f$ is approximately equivalent to minimizing $\epsilon$, which is obtained by states with a high level of order, as $E(s) = -J\sum_{(i,j)}s_is_j$ in this example. As $T$ grows, however, the weight of the term $-Ts(\epsilon)$ increases, and $\min f$ becomes more and more equivalent to $\max s(\epsilon)$, which is achieved by states with a high level of disorder (see Fig. 5.4). Thus, the
order–disorder characteristics depend primarily on temperature. It turns out that for
some magnetic systems of this kind, this transition between order and disorder may
be abrupt, in which case, we call it a phase transition. At a certain critical temperature,
called the Curie temperature, there is a sudden transition between order and disorder.
In the ordered phase, a considerable fraction of the spins align in the same direction,
which means that the system is spontaneously magnetized (even without an external
magnetic field), whereas in the disordered phase, about half of the spins are in either
direction, and then the net magnetization vanishes. This happens if the interactions,
or more precisely, their dimension in some sense, is strong enough.
What is the mathematical significance of a phase transition? If we look at the
partition function, Z N (β), which is the key to all physical quantities of interest, then
for every finite N , this is simply the sum of finitely many exponentials in β and
therefore it is continuous and differentiable infinitely many times. So what kind of
abrupt changes could there possibly be in the behavior of this function? It turns out
that while this is true for all finite N , it is no longer necessarily true if we look at the
thermodynamic limit, i.e., if we look at the behavior of

$$\phi(\beta) = \lim_{N\to\infty}\frac{\ln Z_N(\beta)}{N}. \qquad (5.3.2)$$

Fig. 5.4 Qualitative graphs of $f(\epsilon)$ at various temperatures. The minimizing $\epsilon$ increases with $T$

While φ(β) must be continuous for all β > 0 (since it is convex), it need not necessar-
ily have continuous derivatives. Thus, a phase transition, if exists, is fundamentally an
asymptotic property, it may exist in the thermodynamic limit only. While a physical
system is, after all finite, it is nevertheless well approximated by the thermodynamic
limit when it is very large.
The above discussion explains also why a system without interactions, where all
{xi } are i.i.d., cannot have phase transitions. In this case, Z N (β) = [Z 1 (β)] N , and
so, φ(β) = ln Z 1 (β), which is always a smooth function without any irregularities.
For a phase transition to occur, the particles must behave in some collective manner,
which is the case only if interactions take place.
There is a distinction between two types of phase transitions:
• If φ(β) has a discontinuous first order derivative, then this is called a first order
phase transition.
• If φ(β) has a continuous first order derivative, but a discontinuous second order
derivative then this is called a second order phase transition, or a continuous phase
transition.
We can talk, of course, about phase transitions w.r.t. additional parameters other
than temperature. In the above magnetic example, if we introduce back the magnetic
field B into the picture, then Z , and hence also φ, become functions of B too. If we
then look at the derivative of
$$\phi(\beta,B) = \lim_{N\to\infty}\frac{\ln Z_N(\beta,B)}{N} = \lim_{N\to\infty}\frac{1}{N}\ln\left[\sum_s\exp\left\{\beta B\sum_{i=1}^Ns_i+\beta J\sum_{(i,j)}s_is_j\right\}\right] \qquad (5.3.3)$$
w.r.t. the product $(\beta B)$, which multiplies the magnetization, $\sum_is_i$, at the exponent, this would give exactly the average magnetization per spin
$$m(\beta,B) = \left\langle\frac{1}{N}\sum_{i=1}^NS_i\right\rangle, \qquad (5.3.4)$$

and this quantity might not always be continuous. Indeed, as mentioned earlier, below
the Curie temperature there might be a spontaneous magnetization. If B ↓ 0, then
this magnetization is positive, and if B ↑ 0, it is negative, so there is a discontinuity
at B = 0. We shall see this more concretely later on.

5.4 The One–Dimensional Ising Model

The most familiar model of a magnetic system with interactions is the one–
dimensional Ising model, according to which


E(s) = −B Σ_{i=1}^N s_i − J Σ_{i=1}^N s_i s_{i+1}    (5.4.1)

with the periodic boundary condition s_{N+1} = s_1. Thus,

Z_N(β, B) = Σ_s exp{βB Σ_{i=1}^N s_i + βJ Σ_{i=1}^N s_i s_{i+1}}
          = Σ_s exp{h Σ_{i=1}^N s_i + K Σ_{i=1}^N s_i s_{i+1}}        (h = βB, K = βJ)
          = Σ_s exp{(h/2) Σ_{i=1}^N (s_i + s_{i+1}) + K Σ_{i=1}^N s_i s_{i+1}}.    (5.4.2)

Consider now the 2 × 2 matrix P whose entries are exp{(h/2)(s + s′) + K ss′}, s, s′ ∈ {−1, +1}, i.e.,

P = [ e^{K+h}   e^{−K}
      e^{−K}    e^{K−h} ].    (5.4.3)

Also, s_i = +1 will be represented by the column vector σ_i = (1, 0)^T and s_i = −1 will be represented by σ_i = (0, 1)^T. Thus,

Z(β, B) = Σ_{σ_1} ··· Σ_{σ_N} (σ_1^T P σ_2)(σ_2^T P σ_3) ··· (σ_N^T P σ_1)
        = Σ_{σ_1} σ_1^T P (Σ_{σ_2} σ_2 σ_2^T) P (Σ_{σ_3} σ_3 σ_3^T) P ··· P (Σ_{σ_N} σ_N σ_N^T) P σ_1
        = Σ_{σ_1} σ_1^T P · I · P · I ··· I · P σ_1
        = Σ_{σ_1} σ_1^T P^N σ_1
        = tr{P^N}
        = λ_1^N + λ_2^N    (5.4.4)

where λ_1 and λ_2 are the eigenvalues of P, which are

λ_{1,2} = e^K cosh(h) ± √(e^{−2K} + e^{2K} sinh²(h)).    (5.4.5)

Letting λ_1 denote the larger (the dominant) eigenvalue, i.e.,

λ_1 = e^K cosh(h) + √(e^{−2K} + e^{2K} sinh²(h)),    (5.4.6)

then clearly,

φ(h, K) = lim_{N→∞} (1/N) ln Z_N(h, K) = ln λ_1.    (5.4.7)

The average magnetization is

M(h, K) = ⟨Σ_{i=1}^N S_i⟩
        = Σ_s (Σ_{i=1}^N s_i) exp{h Σ_{i=1}^N s_i + K Σ_{i=1}^N s_i s_{i+1}} / Σ_s exp{h Σ_{i=1}^N s_i + K Σ_{i=1}^N s_i s_{i+1}}
        = ∂ ln Z(h, K)/∂h    (5.4.8)
and so, the per–spin magnetization is:

m(h, K) = lim_{N→∞} M(h, K)/N = ∂φ(h, K)/∂h = sinh(h) / √(e^{−4K} + sinh²(h))    (5.4.9)

or, returning to the original parametrization:

m(β, B) = sinh(βB) / √(e^{−4βJ} + sinh²(βB)).    (5.4.10)

For β > 0 and B > 0 this is a smooth function, and so, there are no phase transi-
tions and no spontaneous magnetization at any finite temperature.5 However, at the
absolute zero (β → ∞), we get

lim_{B↓0} lim_{β→∞} m(β, B) = +1;    lim_{B↑0} lim_{β→∞} m(β, B) = −1,    (5.4.11)

⁵ Note, in particular, that for J = 0 (i.i.d. spins) we get the paramagnetic characteristics m(β, B) = tanh(βB), in agreement with the result pointed out in the example of two-level systems, in the comment that follows Example 2.3.

thus m is discontinuous w.r.t. B at β → ∞, which means that there is a phase


transition at T = 0. In other words, the Curie temperature is Tc = 0 independent
of J .
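As a numerical aside (a sketch added here for illustration, not part of the original derivation; the values of K and h below are arbitrary, and NumPy is assumed available), the following check differentiates ln λ_1 of Eq. (5.4.6) numerically w.r.t. h and confirms that it reproduces the closed-form per-spin magnetization of Eq. (5.4.9):

import numpy as np

def lambda1(h, K):
    # dominant eigenvalue of the transfer matrix P, Eq. (5.4.6)
    return np.exp(K) * np.cosh(h) + np.sqrt(np.exp(-2 * K) + np.exp(2 * K) * np.sinh(h) ** 2)

def m_closed_form(h, K):
    # Eq. (5.4.9): m = sinh(h) / sqrt(exp(-4K) + sinh^2(h))
    return np.sinh(h) / np.sqrt(np.exp(-4 * K) + np.sinh(h) ** 2)

K, h, dh = 1.0, 0.3, 1e-6          # illustrative values of K = beta*J and h = beta*B
m_numeric = (np.log(lambda1(h + dh, K)) - np.log(lambda1(h - dh, K))) / (2 * dh)
print(m_numeric, m_closed_form(h, K))   # the two values should agree to high accuracy

The numerical derivative is smooth in h for any finite K, consistent with the absence of a phase transition at positive temperature.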
We see then that the one–dimensional Ising model is easy to handle, but it is not very
interesting in the sense that there is actually no phase transition. The extension to
the two–dimensional Ising model on the square lattice is surprisingly more difficult.
It is still solvable, but only without an external magnetic field. It was first solved in
1944 by Onsager [3], who showed that it exhibits a phase transition with Curie
temperature given by

T_c = 2J / [k ln(√2 + 1)].    (5.4.12)

For lattice dimension larger than two, the problem is still open.
It turns out then that what counts for the existence of phase transitions, is not
only the intensity of the interactions (designated by the magnitude of J ), but more
importantly, the “dimensionality” of the structure of the pairwise interactions. If we
denote by n_ℓ the number of ℓ-th order neighbors of every given site, namely, the
number of sites that can be reached within ℓ steps from the given site, then what
counts is how fast the sequence {n_ℓ} grows, or more precisely, what is the value
of d = lim_{ℓ→∞} (ln n_ℓ)/(ln ℓ), which is exactly the ordinary dimensionality for hyper-cubic
lattices. Loosely speaking, this dimension must be sufficiently large for a phase
transition to exist.
To demonstrate this point, we next discuss an extreme case of a model where
this dimensionality is actually infinite. In this model “everybody is a neighbor of
everybody else” and to the same extent, so it definitely has the highest connectivity
possible. This is not quite a physically realistic model, but it is pleasing that it is easy
to solve and that it exhibits a phase transition that is fairly similar to those that exist
in real systems. It is also intimately related to a very popular approximation method
in statistical mechanics, called the mean field approximation. Hence it is sometimes
called the mean field model. It is also known as the Curie–Weiss model or the infinite
range model.
Finally, we should comment that there are other “infinite–dimensional” Ising
models, like the one defined on the Bethe lattice (an infinite tree without a root and
without leaves), which is also easily solvable (by recursion) and it also exhibits phase
transitions [4], but we will not discuss it here.

5.5 The Curie–Weiss Model

According to the Curie–Weiss (C–W) model,


E(s) = −B Σ_{i=1}^N s_i − (J/2N) Σ_{i≠j} s_i s_j.    (5.5.1)

Here, all pairs {(si , s j )} communicate to the same extent, and without any geometry.
The 1/N factor here is responsible for keeping the energy of the system extensive
(linear in N ), as the number of interaction terms is quadratic in N . The factor 1/2
compensates for the fact that the summation over i ≠ j counts each pair twice. The
first observation is the trivial fact that
(Σ_i s_i)² = Σ_i s_i² + Σ_{i≠j} s_i s_j = N + Σ_{i≠j} s_i s_j    (5.5.2)

where the second equality holds since s_i² ≡ 1. It follows, then, that our Hamiltonian
is, up to a(n immaterial) constant, equivalent to

E(s) = −B Σ_{i=1}^N s_i − (J/2N) (Σ_{i=1}^N s_i)²
     = −N [B · (1/N) Σ_{i=1}^N s_i + (J/2) ((1/N) Σ_{i=1}^N s_i)²],    (5.5.3)

thus E(s) depends on s only via the magnetization m(s) = (1/N) Σ_i s_i. This fact makes
the C–W model very easy to handle:

Z_N(β, B) = Σ_s exp{Nβ[B·m(s) + (J/2) m²(s)]}
          = Σ_{m=−1}^{+1} Ω(m) · e^{Nβ(Bm + Jm²/2)}
          ≐ Σ_{m=−1}^{+1} e^{N h_2((1+m)/2)} · e^{Nβ(Bm + Jm²/2)}
          ≐ exp{N · max_{|m|≤1} [h_2((1+m)/2) + βBm + βJm²/2]}    (5.5.4)

and so,

φ(β, B) = max_{|m|≤1} [h_2((1+m)/2) + βBm + βJm²/2].    (5.5.5)

The maximum is found by equating the derivative to zero, i.e.,

0 = (1/2) ln[(1−m)/(1+m)] + βB + βJm ≡ −tanh⁻¹(m) + βB + βJm    (5.5.6)

Fig. 5.5 Graphical solutions of equation m = tanh(β J m): The left part corresponds to the case
β J < 1, where there is one solution only, m ∗ = 0. The right part corresponds to the case β J > 1,
where in addition to the zero solution, there are two non–zero solutions m ∗ = ±m 0

or equivalently, the maximizing (and hence the dominant) m is a solution m ∗ to the


equation⁶
m = tanh(β B + β J m).

Consider first the case B = 0, where the equation boils down to

m = tanh(β J m). (5.5.7)

It is instructive to look at this equation graphically. Referring to Fig. 5.5, we have to



make a distinction between two cases: If β J < 1, namely, T > Tc = J/k, the slope
of the function y = tanh(β J m) at the origin, β J , is smaller than the slope of the
linear function y = m, which is 1, thus these two graphs intersect only at the origin.
It is easy to check that in this case, the second derivative of
ψ(m) = h_2((1+m)/2) + βJm²/2    (5.5.8)

at m = 0 is negative, and therefore it is indeed the maximum (see Fig. 5.6, left part).
Thus, the dominant magnetization is m ∗ = 0, which means disorder and hence no
spontaneous magnetization for T > Tc . On the other hand, when β J > 1, which
means temperatures lower than Tc , the initial slope of the tanh function is larger than
that of the linear function, but since the tanh function cannot take values outside the
interval (−1, +1), the two functions must intersect also at two additional, symmetric,
non–zero points, which we denote by +m 0 and −m 0 (see Fig. 5.5, right part). In this
case, it can readily be shown that the second derivative of ψ(m) is positive at the
origin (i.e., there is a local minimum at m = 0) and negative at m = ±m 0 , which
means that there are maxima at these two points (see Fig. 5.6, right part). Thus, the
dominant magnetizations are ±m 0 , each capturing about half of the probability.
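The graphical discussion above can also be mimicked numerically. The short sketch below (an illustration added here; the values of βJ are arbitrary, and NumPy is assumed) iterates the self-consistency equation m = tanh(βJm) from positive and negative initial guesses; for βJ < 1 both iterations collapse to m* = 0, while for βJ > 1 they converge to the two symmetric solutions ±m₀.

import numpy as np

def solve_m(beta_J, m_init, n_iter=200):
    # fixed-point iteration of m = tanh(beta_J * m)
    m = m_init
    for _ in range(n_iter):
        m = np.tanh(beta_J * m)
    return m

for beta_J in [0.5, 0.9, 1.1, 1.5, 2.0]:
    print(beta_J, solve_m(beta_J, +0.9), solve_m(beta_J, -0.9))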
Consider now the case β J > 1, where the magnetic field B is brought back
into the picture. This will break the symmetry of the right graph of Fig. 5.6 and the
corresponding graphs of ψ(m) would be as in Fig. 5.7, where now the higher local

⁶ Once again, for J = 0, we are back to non-interacting spins, and then this equation gives the paramagnetic behavior m = tanh(βB).

Fig. 5.6 The function ψ(m) = h 2 ((1 + m)/2) + β J m 2 /2 has a unique maximum at m = 0 when
β J < 1 (left graph) and two local maxima at ±m 0 , in addition to a local minimum at m = 0, when
β J > 1 (right graph)

Fig. 5.7 The case β J > 1 with a magnetic field B. The left graph corresponds to B < 0 and the
right graph – to B > 0

maximum (which is also the global one) is at m 0 (B) whose sign is as that of B. But
as B → 0, m 0 (B) → m 0 of Fig. 5.6. Thus, we see the spontaneous magnetization
here. Even after removing the magnetic field, the system remains magnetized to the
level of m 0 , depending on the direction (the sign) of B before its removal. Obviously,
the magnetization m(β, B) has a discontinuity at B = 0 for T < Tc , which is a first
order phase transition w.r.t. B (see Fig. 5.8). We note that the point T = Tc is the

Fig. 5.8 Magnetization versus magnetic field: For T < Tc there is spontaneous magnetization:
lim B↓0 m(β, B) = +m 0 and lim B↑0 m(β, B) = −m 0 , and so there is a discontinuity at B = 0

boundary between the region of existence and the region of non–existence of a phase
transition w.r.t. B. Such a point is called a critical point. The phase transition w.r.t.
β is of the second order.
Finally, we should mention here an alternative technique that can be used to
analyze this model, which is based on the so called Hubbard–Stratonovich transform.
Specifically, we have the following chain of equalities:

Z(h, K) = Σ_s exp{h Σ_{i=1}^N s_i + (K/2N)(Σ_{i=1}^N s_i)²}        (h = βB, K = βJ)
        = Σ_s exp{h Σ_{i=1}^N s_i} · exp{(K/2N)(Σ_{i=1}^N s_i)²}
        = Σ_s exp{h Σ_{i=1}^N s_i} · √(N/2πK) ∫_ℝ dz exp{−Nz²/(2K) + z Σ_{i=1}^N s_i}
        = √(N/2πK) ∫_ℝ dz e^{−Nz²/(2K)} Σ_s exp{(h + z) Σ_{i=1}^N s_i}
        = √(N/2πK) ∫_ℝ dz e^{−Nz²/(2K)} [Σ_{s=−1}^{+1} e^{(h+z)s}]^N
        = √(N/2πK) ∫_ℝ dz e^{−Nz²/(2K)} [2 cosh(h + z)]^N
        = 2^N · √(N/2πK) ∫_ℝ dz exp{N[ln cosh(h + z) − z²/(2K)]},    (5.5.9)

where the passage from the second to the third line follows from the characteristic function of a Gaussian random variable: if X ∼ N(0, σ²), then ⟨e^{αX}⟩ = e^{α²σ²/2} (in our case, σ² = K/N and α = Σ_i s_i).
The integral in the last line can be shown (see, e.g., [1, Chap. 4]) to be dominated
by e to N times the maximum of the function in the square brackets at the exponent
of the integrand, or equivalently, the minimum of the function

γ(z) = z²/(2K) − ln cosh(h + z).    (5.5.10)

By equating its derivative to zero, we get the very same equation as m = tanh(β B +
β J m) by setting z = β J m. The function γ(z) is different from the function ψ that we
maximized earlier, but the extremum is the same. This function is called the Landau
free energy.
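As a small numerical check (not part of the original derivation; K and h below are arbitrary illustrative values, and NumPy is assumed), one can minimize γ(z) on a grid and verify that the minimizer indeed satisfies z* = βJ·m*, with m* solving m = tanh(βB + βJm):

import numpy as np

K, h = 1.5, 0.1                                   # illustrative values of beta*J and beta*B
gamma = lambda z: z**2 / (2 * K) - np.log(np.cosh(h + z))   # Landau free energy, Eq. (5.5.10)

z_grid = np.linspace(-5.0, 5.0, 200001)
z_star = z_grid[np.argmin(gamma(z_grid))]         # grid minimizer of gamma(z)

m = 0.9
for _ in range(200):                              # fixed-point iteration of m = tanh(h + K*m)
    m = np.tanh(h + K * m)

print(z_star / K, m)                              # should agree (z* = K*m*)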

5.6 Spin Glasses∗

So far we discussed only models where the non–zero coupling coefficients, J = {Ji j }
are equal, thus they are either all positive (ferromagnetic models) or all negative
(antiferromagnetic models). As mentioned earlier, there are also models where the
signs of these coefficients are mixed, which are called spin glass models.
Spin glass models have a much more complicated and more interesting behavior
than ferromagnets, because there might be meta-stable states, due to the fact that not
all spin pairs {(si , s j )} can necessarily be in their preferred mutual polarization. It
might be the case that some of these pairs are “frustrated.” In order to model situations
of amorphism and disorder in such systems, it is customary to model the coupling
coefficients as random variables. This model with random parameters means that
there are now two levels of randomness:
• Randomness of the coupling coefficients J.
• Randomness of the spin configuration s given J, according to the Boltzmann
distribution, i.e.,

P(s|J) = exp{β[B Σ_{i=1}^N s_i + Σ_{(i,j)} J_ij s_i s_j]} / Z(β, B|J).    (5.6.1)

However, these two sets of random variables have a rather different stature. The
underlying setting is normally such that J is considered to be randomly drawn once
and for all, and then remain fixed, whereas s keeps varying all the time (according to
the dynamics of the system). At any rate, the time scale along which s varies is much
smaller than that of J. Another difference is that J is normally not assumed to depend
on temperature, whereas s does. In the terminology of physicists, s is considered an
annealed random variable, whereas J is considered a quenched random variable.7
Accordingly, there is a corresponding distinction between annealed averages and
quenched averages.
Let us see what is exactly the difference between the quenched averaging and
the annealed one. If we examine, for instance, the free energy, or the log–partition
function, ln Z (β| J), this is now a random variable because it depends on the random
J. If we denote by ⟨·⟩_J the expectation w.r.t. the randomness of J, then quenched
averaging means ⟨ln Z(β|J)⟩_J, whereas annealed averaging means ln⟨Z(β|J)⟩_J.
Normally, the relevant average is the quenched one, because the random variable
(1/N) ln Z(β|J) typically converges to the same limit as its expectation (1/N)⟨ln Z(β|J)⟩_J
(the so-called self-averaging property), but more often than not, it is also much harder
to calculate. Clearly, the annealed average is never smaller than the quenched one
because of Jensen’s inequality, but they sometimes coincide at high temperatures.
The difference between them is that in quenched averaging, the dominant realizations

⁷ In a nutshell, annealing means slow cooling, whereas quenching means fast cooling, which causes
the material to freeze without enough time to settle in an ordered structure. The result is then a
disordered structure, modeled by frozen (fixed) random parameters, J.

of J are the typical ones, whereas in annealed averaging, this is not necessarily the
case. This follows from the following sketchy consideration. As for the annealed
average, we have:

⟨Z(β|J)⟩_J = Σ_J P(J) Z(β|J)
           ≈ Σ_α Pr{J : Z(β|J) ≐ e^{Nα}} · e^{Nα}
           ≈ Σ_α e^{−N E(α)} · e^{Nα}    (assuming exponential probabilities)
           ≐ e^{N max_α [α − E(α)]}    (5.6.2)

which means that the annealed average is dominated by realizations of the system
with
ln Z(β|J)/N ≈ α* = arg max_α [α − E(α)],    (5.6.3)

which may differ from the typical value of α, which is

α = φ(β) ≡ lim_{N→∞} (1/N) ⟨ln Z(β|J)⟩.    (5.6.4)

On the other hand, when it comes to quenched averaging, the random variable
ln Z (β| J) behaves linearly in N , and concentrates strongly around the typical value
N φ(β), whereas other values are weighted by (exponentially) decaying probabilities.
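The gap between the two averages is easy to see numerically. The toy sketch below (added for illustration; the system size, temperature, and coupling statistics are arbitrary choices, and NumPy is assumed) enumerates a small S–K-like system exactly for many realizations of J and compares ⟨ln Z⟩_J with ln⟨Z⟩_J; by Jensen's inequality the annealed value is never smaller.

import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, beta, n_draws = 8, 1.0, 200
configs = np.array(list(product([-1, 1], repeat=N)))    # all 2^N spin configurations

log_Z = []
for _ in range(n_draws):
    J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
    J = np.triu(J, 1)                                    # keep each pair (i < j) once
    energies = -np.einsum('ci,ij,cj->c', configs, J, configs)   # E(s) = -sum_{i<j} J_ij s_i s_j
    log_Z.append(np.log(np.sum(np.exp(-beta * energies))))

log_Z = np.array(log_Z)
quenched = log_Z.mean()                      # <ln Z>_J
annealed = np.log(np.mean(np.exp(log_Z)))    # ln <Z>_J
print(quenched, annealed)                    # annealed >= quenched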
The literature on spin glasses includes many models for the randomness of the
coupling coefficients. We end this part by listing just a few.
• The Edwards–Anderson (E–A) model, where {Ji j } are non–zero for nearest–
neighbor pairs only (e.g., j = i ± 1 in one–dimensional model). According to
this model, {Ji j } are i.i.d. random variables, which are normally modeled to have
a zero–mean Gaussian pdf, or binary symmetric with levels ±J0 . It is customary
to work with a zero–mean distribution if we have a pure spin glass in mind. If the
mean is nonzero, the model has either a ferromagnetic or an anti-ferromagnetic
bias, according to the sign of the mean.
• The Sherrington–Kirkpatrick (S–K) model, which is similar to the E–A model,
except that the support of {Ji j } is extended to include all N (N − 1)/2 pairs, and
not only nearest–neighbor pairs. This can be thought of as a stochastic version of
the C–W model in the sense that here too, there is no geometry, and every spin
interacts with every other spin to the same extent, but here the coefficients are
random, as said.
• The p–spin model, which is similar to the S–K model, but now the interaction
term consists, not only of pairs, but also of triples, quadruples, and so on, up to
cliques of size p, i.e., products si1 si2 · · · si p , where (i 1 , . . . , i p ) exhaust all possible
subsets of p spins out of N . Each such term has a Gaussian coefficient Ji1 ,...,i p with
an appropriate variance.

Considering the p–spin model, it turns out that if we look at the extreme case of
p → ∞ (taken after the thermodynamic limit N → ∞), the resulting behavior
turns out to be extremely erratic: all energy levels {E(s)}, s ∈ {−1, +1}^N, become i.i.d.
Gaussian random variables. This is, of course, a toy model, which has very little to
do with reality (if any), but it is surprisingly interesting and easy to work with. It is
called the random energy model (REM).

5.7 Suggestions for Supplementary Reading

As mentioned earlier, part of the presentation in this chapter is similar to Chap. 5


of [1]. The topic of interacting particles and phase transitions is covered in many
textbooks, including: Huang [5, Part C], Kardar [6, Chap. 5], Landau and Lifshitz [7,
Chap. VI and onward], Pathria [2, Chaps. 10–13], and Reif [8, Chap. 10].

References

1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth–Heinemann, Oxford, 1996)
3. L. Onsager, Crystal statistics. I. A two-dimensional model with an order-disorder transition.
Phys. Rev. 65(3–4), 117–149 (1944)
4. R.J. Baxter, Exactly Solved Models in Statistical Mechanics (Academic Press, London, 1982)
5. K. Huang, Statistical Mechanics, 2nd edn. (Wiley, New York, 1987)
6. M. Kardar, Statistical Physics of Particles (Cambridge University Press, Cambridge, 2007)
7. L.D. Landau, E.M. Lifshitz, Course of Theoretical Physics – Volume 5: Statistical Physics, Part
1, 3rd edn. (Elsevier: Butterworth–Heinemann, New York, 1980)
8. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
Chapter 6
Vibrations in a Solid – Phonons
and Heat Capacity∗

6.1 Introduction

In analogy to black–body radiation, discussed earlier, there is a similar issue related


to vibrational modes of a solid. As in black–body radiation, the analysis of vibrational
modes in a solid can be viewed either by regarding the system as interacting harmonic
oscillators or as a gas of particles, called phonons – the analogue of photons, but in
the context of sound waves, rather than electromagnetic waves.
In this chapter, we shall use this point of view and apply statistical mechanical
methods to calculate the heat capacity pertaining to the lattice vibrations of crys-
talline solids.1 There are two basic experimental facts which any reasonable theory
must be able to explain. The first is that at room temperature the heat capacity of
most solids is about 3k per atom.² This is the Dulong and Petit law (1819),
but this is only an approximation. The second fact is that at low temperatures, the
heat capacity at constant volume, C V , decreases, and actually vanishes at T = 0.
Experimentally, it was observed that the low–temperature dependence is of the form
C V = αT 3 + γT , where α and γ are constants that depend on the material and the
volume. For certain insulators, like potassium chloride, γ = 0, namely, C V is propor-
tional to T 3 . For metals (like copper), the linear term is present, but it is contributed
by the conduction electrons. A good theory of the vibrational contribution to heat
capacity should therefore predict the T 3 behavior at low temperatures. In classical
statistical mechanics, the equipartition theorem suggests a constant heat capacity at
all temperatures, in contradiction with both experiments and with the third law of

1 In general, there are additional contributions to the heat capacity (e.g., from orientational ordering

in paramagnetic salts, or from conduction electrons in metals, etc.), but here we shall consider only
the vibrational heat capacity.
2 Each atom has 6 degrees of freedom (3 of position + 3 of momentum). Classically, each one of

them contributes one quadratic term to the Hamiltonian, whose mean is kT /2, thus a total mean
energy of 3 kT, which means specific heat of 3 k per atom.

thermodynamics that asserts that as T → 0, the entropy S tends to zero (whereas


a constant heat capacity would yield S ∝ ln T for small T ). A fundamental con-
tribution in resolving this contradiction between theory and experiment was due to
Einstein (1907), who considered the lattice vibrations in the quantum regime. Ein-
stein’s derivations reproduce the desired features (observed experimentally) at least
qualitatively. However, he used a simplified model and did not expect full agree-
ment with experiment, but he pointed out the kind of modifications which the model
requires. Einstein’s theory was later improved by Debye (1912), who considered a
more realistic model.

6.2 Formulation

Consider a Hamiltonian of a classical solid composed of N atoms whose positions


in space are specified by the coordinates x = (x1 , . . . , x3N ). In the state of lowest
energy (the ground state), these coordinates are denoted by x̄ = (x̄1 , . . . , x̄3N ), which
are normally points of a lattice in the three–dimensional space, if the solid in question
is a crystal. Let ξi = xi − x̄i , i = 1, . . . , 3N , denote the displacements. The kinetic
energy of the system is clearly

K = (m/2) Σ_{i=1}^{3N} ẋ_i² = (m/2) Σ_{i=1}^{3N} ξ̇_i²    (6.2.1)

and the potential energy is

  ∂   1  ∂2 
(x) = ( x̄) + ξi + ξi ξ j + . . . (6.2.2)
i
∂xi x= x̄ i, j
2 ∂xi ∂x j x= x̄

The first term in this expansion represents the minimum energy when all atoms are
at rest in their mean positions x̄_i. We henceforth denote this energy by Φ₀. The
second term is identically zero because Φ(x) is minimized at x = x̄. The second
order terms of this expansion represent the harmonic component of the vibrations.
If we assume that the overall amplitude of the vibrations is reasonably small, we
can safely neglect all successive terms and then we are working with the so called
harmonic approximation. Thus, we may write

E(x) = Φ₀ + (m/2) Σ_{i=1}^{3N} ξ̇_i² + Σ_{i,j} α_ij ξ_i ξ_j    (6.2.3)

where we have denoted

α_ij = (1/2) · (∂²Φ/∂x_i ∂x_j)|_{x=x̄}.    (6.2.4)

This Hamiltonian corresponds to harmonic oscillators that are coupled to one another,
as discussed in Sects. 2.2.2 and 5.2, where the off–diagonal terms of the matrix
A = {αi j } designate the pairwise interactions. This Hamiltonian obeys the general
form of Eq. (5.2.1).
While Einstein neglected the off–diagonal terms of A in the first place, Debye
did not. In the following, we present the latter approach, which is more general (and
more realistic), whereas the former will essentially be a special case.

6.3 Heat Capacity Analysis

The first idea of the analysis is to transform the coordinates into a new domain where
the components are all decoupled. This means diagonalizing the matrix A. Since A
is a symmetric non–negative definite matrix, it is clearly diagonalizable by a unitary
matrix formed by its eigenvectors, and the diagonal elements of the diagonalized
matrix (which are the eigenvalues of A) are non–negative. Let us denote the new
coordinates of the system by q_i, i = 1, . . . , 3N, and the eigenvalues by (1/2)mω_i².
By linearity of the differentiation operation, the same transformation takes us from
the vector of velocities {ξ˙i } (of the kinetic component of the Hamiltonian) to the
vector of derivatives of {qi }, which will be denoted {q̇i }. Fortunately enough, since
the transformation is unitary it leaves the components {q̇i } decoupled. In other words,
by the Parseval theorem, the norm of {ξ˙i } is equal to the norm of {q̇i }. Thus, in the
transformed domain, the Hamiltonian reads

E(q) = Φ₀ + (m/2) Σ_i (q̇_i² + ω_i² q_i²),    (6.3.1)

which can be viewed as 3N decoupled harmonic oscillators, each one oscillating in its
individual normal mode ωi . The parameters {ωi } are called characteristic frequencies
or normal modes.

Example 6.1 (One–dimensional ring of springs) If the system has translational sym-
metry and if, in addition, there are periodic boundary conditions, then the matrix A is
circulant, which means that it is always diagonalized by the discrete Fourier transform
(DFT). In this case, qi are the corresponding spatial frequency variables, conjugate
to the location displacement variables ξi . The simplest example of this is a ring of N
one–dimensional springs, as discussed in Sect. 5.2 (see left part of Fig. 5.1), where
the Hamiltonian (in the current notation) is
E(x) = Φ₀ + (m/2) Σ_i ξ̇_i² + (K/2) Σ_i (ξ_{i+1} − ξ_i)².    (6.3.2)

In this case, the matrix A is given by

        ⎛  1    −1/2    0    ···    0   −1/2 ⎞
        ⎜ −1/2    1   −1/2    0    ···    0  ⎟
A = K · ⎜  0    −1/2    1   −1/2    0    ··· ⎟    (6.3.3)
        ⎜  ·      ·     ·     ·     ·     ·  ⎟
        ⎜  0      0    ···  −1/2    1   −1/2 ⎟
        ⎝ −1/2    0    ···    0   −1/2    1  ⎠

The eigenvalues of A are λi = K [1 − cos(2πi/N )], which are simply the DFT
coefficients of the N-sequence formed by any row of A (removing the complex exponential of the phase factor). This means that the normal modes are ω_i = √(2K[1 − cos(2πi/N)]/m).
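A quick numerical sanity check of this example (a sketch added here; the values of N, K and m are arbitrary, and NumPy is assumed) builds the circulant matrix A of Eq. (6.3.3) and verifies that its eigenvalues match K[1 − cos(2πi/N)], from which the normal modes follow:

import numpy as np

N, K, m = 16, 2.0, 1.0
A = np.zeros((N, N))
for i in range(N):
    A[i, i] = K                       # diagonal entries K * 1
    A[i, (i + 1) % N] = -K / 2        # nearest neighbors, periodic boundary
    A[i, (i - 1) % N] = -K / 2

eig = np.sort(np.linalg.eigvalsh(A))
formula = np.sort(K * (1 - np.cos(2 * np.pi * np.arange(N) / N)))
print(np.max(np.abs(eig - formula)))  # should be ~1e-15 (machine precision)

omega = np.sqrt(2 * formula / m)      # normal modes of Example 6.1
print(omega)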
Classically, each of the 3N normal modes of vibration corresponds to a wave
of distortion of the lattice. Quantum–mechanically, these modes give rise to quanta
called phonons, in analogy to the fact that vibrational modes of electromagnetic
waves give rise to photons. There is one important difference, however: while the
number of normal modes in the case of an electromagnetic wave is infinite, here the
number of modes (or the number of phonon energy levels) is finite – there are exactly
3N of them. This gives rise to a few differences in the physical behavior, but at low
temperatures, where the high–frequency modes of the solid become unlikely to be
excited, these differences become insignificant.
The Hamiltonian is then

E(n_1, n_2, . . .) = Φ₀ + Σ_i (n_i + 1/2) ℏω_i,    (6.3.4)

where the non-negative integers {n i } denote the ‘states of excitation’ of the various
oscillators, or equally well, the occupation numbers of the various phonon levels in
the system. The internal energy is then

E = −∂/∂β ln Z_{3N}(β)
  = −∂/∂β ln Σ_{n_1} Σ_{n_2} · · · exp{−β[Φ₀ + Σ_i (n_i + 1/2) ℏω_i]}
  = −∂/∂β ln [e^{−βΦ₀} Π_i e^{−βℏω_i/2} / (1 − e^{−βℏω_i})]
  = Φ₀ + (1/2) Σ_i ℏω_i + Σ_i ∂/∂β ln(1 − e^{−βℏω_i})
  = Φ₀ + (1/2) Σ_i ℏω_i + Σ_i ℏω_i / (e^{βℏω_i} − 1).    (6.3.5)

Only the last term of the last expression depends on T . Thus, the heat capacity at
constant volume³ is:

C_V = ∂E/∂T = k Σ_i (ℏω_i/kT)² e^{ℏω_i/kT} / (e^{ℏω_i/kT} − 1)².    (6.3.6)

To proceed from here, one has to know (or assume something) about the form of the
density g(ω) of {ωi } and then pass from summation to integration. It is this point
where the difference between Einstein’s approach and Debye’s approach starts to
show up.

6.3.1 Einstein’s Theory

For Einstein, who assumed that the oscillators do not interact in the original, ξ–
domain, all the normal modes are equal ωi = ω E for all i, because then (assuming
translational symmetry) A is proportional to the identity matrix and then all its
eigenvalues are the same. Thus, in Einstein’s model g(ω) = 3N δ(ω − ω E ), and the
result is

C V = 3N k E(x) (6.3.7)

where E(x) is the so-called Einstein function:

E(x) = x² eˣ / (eˣ − 1)²    (6.3.8)

with
x = ℏω_E/(kT) = Θ_E/T,    (6.3.9)

where Θ_E = ℏω_E/k is called the Einstein temperature. At high temperatures,
T ≫ Θ_E, where x ≪ 1 and then E(x) ≈ 1, we readily see that C_V(T) ≈ 3Nk, in
agreement with classical physics. For low temperatures, C V (T ) falls exponentially
fast as T → 0. This theoretical rate of decay, however, is way too fast compared to

3 Exercise 6.1 Why is this the heat capacity at constant volume? Where is the assumption of constant

volume being used here?



the observed rate, which is cubic, as described earlier. But at least, Einstein’s theory
predicts the qualitative behavior correctly.

6.3.2 Debye’s Theory

Debye (1912), on the other hand, assumed a continuous density g(ω). He assumed
some cutoff frequency ω_D, so that

∫₀^{ω_D} g(ω) dω = 3N.    (6.3.10)

For g(ω) in the range 0 ≤ ω ≤ ω D , Debye adopted a Rayleigh expression in the


spirit of the one we saw in black–body radiation, but with a distinction between
the longitudinal mode and the two independent transverse modes associated with
the propagation of each wave at a given frequency. Letting v L and vT denote the
corresponding velocities of these modes, this amounts to

g(ω) dω = V [ω² dω/(2π² v_L³) + ω² dω/(π² v_T³)].    (6.3.11)

This, together with the previous equation, determines the cutoff frequency to be

ω_D = [18π²ρ / (1/v_L³ + 2/v_T³)]^{1/3}    (6.3.12)

where ρ = N /V is the density of the atoms. Accordingly,



g(ω) = (9N/ω_D³) ω²   for ω ≤ ω_D,   and g(ω) = 0 elsewhere.    (6.3.13)

The Debye formula for the heat capacity is now

C V = 3N k D(x0 ) (6.3.14)

where D(·) is called the Debye function

D(x₀) = (3/x₀³) ∫₀^{x₀} x⁴ eˣ dx / (eˣ − 1)²    (6.3.15)

with
x₀ = ℏω_D/(kT) = Θ_D/T,    (6.3.16)

where Θ_D = ℏω_D/k is called the Debye temperature. Integrating by parts, the Debye


function can also be written as

D(x₀) = −3x₀/(e^{x₀} − 1) + (12/x₀³) ∫₀^{x₀} x³ dx / (eˣ − 1).    (6.3.17)

Now, for T ≫ Θ_D, which means x₀ ≪ 1, D(x₀) can be approximated by a Taylor


series expansion:
D(x₀) = 1 − x₀²/20 + . . .    (6.3.18)
Thus, for high temperatures, we again recover the classical result C V = 3N k. On the
other hand, for T ≪ Θ_D, which is x₀ ≫ 1, the dominant term in the integration by
parts is the second one, which gives the approximation

D(x₀) ≈ (12/x₀³) ∫₀^∞ x³ dx / (eˣ − 1) = 4π⁴/(5x₀³) = (4π⁴/5)(T/Θ_D)³.    (6.3.19)

Therefore, at low temperatures, the heat capacity is

C_V ≈ (12π⁴/5) Nk (T/Θ_D)³.    (6.3.20)

In other words, Debye’s theory indeed recovers the T 3 behavior at low tempera-
tures, in agreement with experimental evidence. Moreover, the match to experimental
results is very good, not only near T = 0, but across a rather wide range of temper-
atures. In some textbooks, like [1, p. 164, Fig. 6.7], or [2, p. 177, Fig. 7.10], there are
plots of C V as a function of T for certain materials, which show impressive proximity
between theory and measurements.
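For completeness, the sketch below (an illustration added here, not from the text; parameter values and the quadrature are arbitrary choices, and NumPy is assumed) evaluates the Einstein function (6.3.8) and the Debye function (6.3.15) numerically, showing that both tend to 1 (i.e., C_V → 3Nk) at high temperature and that the Debye result reproduces the 4π⁴/5 low-temperature coefficient of Eq. (6.3.19):

import numpy as np

def einstein(x):
    # Einstein function E(x), Eq. (6.3.8), with x = Theta_E / T
    return x**2 * np.exp(x) / (np.exp(x) - 1)**2

def debye(x0, n=20000):
    # Debye function D(x0), Eq. (6.3.15), with x0 = Theta_D / T (midpoint rule)
    x = (np.arange(n) + 0.5) * (x0 / n)
    integrand = x**4 * np.exp(x) / (np.exp(x) - 1)**2
    return 3.0 / x0**3 * np.sum(integrand) * (x0 / n)

for T_over_Theta in [2.0, 1.0, 0.5, 0.1, 0.05]:
    x0 = 1.0 / T_over_Theta
    print(T_over_Theta, einstein(x0), debye(x0))   # Einstein decays much faster at low T

# low-temperature check of Eq. (6.3.19): D(x0)*x0^3 -> 4*pi^4/5
print(debye(50.0) * 50.0**3, 4 * np.pi**4 / 5)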

Exercise 6.2 Extend Debye’s analysis to allow two different cutoff frequencies, ω L
and ωT – for the longitudinal and the transverse modes, respectively.

Exercise 6.3 Calculate the density g(ω) for a ring of springs as described in
Example 6.1. Write an expression for C V as an integral and try to simplify it as
much as you can.

6.4 Suggestions for Supplementary Reading

The exposition in this chapter is based largely on the books by Mandl [1] and Pathria
[2]. Additional material appears also in Kardar [3, Sect. 6.2].

References

1. F. Mandl, Statistical Physics (Wiley, Chichester, 1971)


2. R.K. Pathria, Statistical Mechanics, 2nd edn. (Elsevier: Butterworth–Heinemann, Oxford, 1996)
3. M. Kardar, Statistical Physics of Particles (Cambridge University Press, Cambridge, 2007)
Chapter 7
Fluctuations, Stochastic Dynamics
and Noise

So far we have discussed mostly systems in equilibrium. Extensive quantities like


volume, energy, etc., have been calculated as means of certain ensembles, and these
were not only means, but moreover, the values of high probability in the thermody-
namic limit. In this chapter, we investigate the statistical fluctuations around these
means, as well as dynamical issues, like the rate of approach to equilibrium when
a system is initially away from equilibrium. We will also discuss noise generation
mechanisms as well as their implications on electric circuits and other systems.
Historically, the theory of fluctuations has been interesting and useful because it
made several experimental effects explicable. This was refreshing, considering the
fact that late–nineteenth–century classical physicists were not able to explain these
effects rigorously. One such phenomenon is Brownian motion – the irregular, random
motion of very light particles suspended in a drop of liquid, which is observed using
a microscope. Another phenomenon is electrical noise, such as thermal noise and
shot noise, as mentioned in the previous paragraph.
Classical thermodynamics cannot explain fluctuations and, in fact, even denies
their existence, because a fluctuation into a less probable state leads to a decrease
of entropy, which is seemingly contradictory to the ideas of the consistent increase
of entropy. This contradiction is resolved by the statistical–mechanical viewpoint,
according to which the increase of entropy holds true only on the average (or with
high probability), not deterministically. Apart from their theoretical interest, fluc-
tuations are important to understand in order to make accurate measurements of
physical properties and at the same time, to realize that the precision is limited by
the fluctuations.


7.1 Elements of Fluctuation Theory

So far, we have established probability distributions for various physical situations


and have taken for granted the most likely value (or the mean value) as the value of
the physical quantity of interest. For example, the internal energy in the canonical
ensemble was taken to be E, which is also the most likely value, with a very sharp
peak as N grows.
The first question is what is the probabilistic characterization of the departure
from the mean. One of the most natural measures of this departure is the variance,
in the above example of the energy, Var{E} = ⟨E²⟩ − ⟨E⟩², or the relative standard
deviation √Var{E}/⟨E⟩. When several physical quantities are involved, then the
covariances between them are also measures of fluctuation. There are two possible
routes to assess fluctuations in this second order sense. The first is directly from
the relevant ensemble, and the second is by a Gaussian approximation. It is empha-
sized that when it comes to fluctuations, the principle of ensemble equivalence no
longer holds. For example, in the microcanonical ensemble, Var{E} = 0 (since E is
fixed), whereas in the canonical ensemble, it is normally extensive, as we shall see
shortly. Only √Var{E}/⟨E⟩, which is proportional to 1/√N and hence tends to 0,
can be considered asymptotically equivalent (in a very rough sense) to that of the
microcanonical ensemble.
Consider a system in the canonical ensemble, and let us calculate the probability
of energy level E(x) = E, which fluctuates from the mean E*. Then,

P(E) = Ω(E) e^{−βE} / Z(β)
     ≈ e^{−β[E − T S(E)]} / e^{−βF}
     = e^{−β[E − T S(E)]} / e^{−β[E* − T S(E*)]}
     = e^{−β[ΔE − T ΔS]},    ΔE = E − E*;  ΔS = S(E) − S(E*).    (7.1.1)

Now,

ΔS = (∂S/∂E)·ΔE + (1/2)·(∂²S/∂E²)·(ΔE)² + . . .
   = (1/T)·ΔE + (1/2)·(∂²S/∂E²)·(ΔE)² + . . .    (7.1.2)

and so,

P(E) ≈ e^{−β[ΔE − TΔS]} ≈ exp{(βT/2)·(∂²S/∂E²)·(ΔE)²}
     = exp{(1/2k)·(∂²S/∂E²)·(E − E*)²}.    (7.1.3)

One should keep in mind that since S(E) is concave, its second derivative is negative,
so in the vicinity of E ∗ , the random variable E(X) is nearly Gaussian with mean E ∗
and variance k/|∂ 2 S/∂ E 2 |. How does this variance scale with N ? Note that since S
and E are both extensive (proportional to N ), the first derivative is intensive, and the
second derivative is proportional to 1/N, so the variance of E is proportional to N,
which means that the standard deviation of the energy fluctuations scales like √N,
and so the relative standard deviation √Var{E}/⟨E⟩ scales like 1/√N (cf. the additive case,
where E(X) = Σ_i E(X_i) is the sum of N i.i.d. random variables). This asymptotic
Gaussianity should not be a surprise, as we have approximated F(E) by a second
order Taylor series expansion around its minimum, so e−β F(E) is approximated by
an exponentiated quadratic expression which is Gaussian. The same idea can be
used for additional quantities that fluctuate. For example, in the Gibbsian ensemble,
where both E and V fluctuate, the Gibbs free energy is nearly quadratic in (E, V )
around its equilibrium value, and so, this random vector is Gaussian with a covariance
matrix that is proportional to the inverse of the Hessian of S w.r.t. E and V .
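To make the scaling concrete, here is a short numerical illustration (added here; the two-level system and parameter values are arbitrary choices, and NumPy is assumed): for N independent two-level systems, Var{E} = ∂² ln Z/∂β² grows linearly in N, so the relative standard deviation indeed decays like 1/√N.

import numpy as np

eps, beta, dbeta = 1.0, 0.7, 1e-4

def ln_Z(beta, N):
    # Z_N = (1 + exp(-beta*eps))^N for N independent two-level systems
    return N * np.log(1.0 + np.exp(-beta * eps))

for N in [10, 100, 1000, 10000]:
    mean_E = -(ln_Z(beta + dbeta, N) - ln_Z(beta - dbeta, N)) / (2 * dbeta)   # <E> = -d ln Z / d beta
    var_E = (ln_Z(beta + dbeta, N) - 2 * ln_Z(beta, N) + ln_Z(beta - dbeta, N)) / dbeta**2  # Var{E}
    print(N, np.sqrt(var_E) / mean_E)    # relative standard deviation ~ 1/sqrt(N)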

Example 7.1 (Ideal gas) In the case of the ideal gas, Eq. (2.2.6) gives

E(S, V) = [3N^{5/3} h² / (4π e^{5/3} m V^{2/3})] · e^{2S/(3Nk)},    (7.1.4)
whose Hessian is

∇²E = (2E/9) · [ 2/(Nk)²    −2/(NkV)
                −2/(NkV)     5/V²    ].    (7.1.5)

Thus, the covariance matrix of (V, S) is

Σ = kT · (∇²E)^{−1} = (9kT/2E) · ((NkV)²/6) · [ 5/V²       2/(NkV)
                                               2/(NkV)    2/(Nk)²  ]    (7.1.6)

or, using the relation E = 3NkT/2,

Σ = [ 5Nk²/2    kV
      kV        V²/N ].    (7.1.7)

Thus, Var{S} = 5Nk²/2, Var{V} = V²/N, and ⟨ΔS ΔV⟩ = kV, which are all extensive. 

7.2 Brownian Motion and the Langevin Equation

The term “Brownian motion” is after the botanist Robert Brown, who, in 1828, had
been observing tiny pollen grains in a liquid under a microscope and saw that they
moved in a random fashion, and that this motion was not triggered by currents or
other processes in the liquid, like evaporation, etc. The movement was caused by
frequent collisions with the particles of the liquid. Einstein (1905) was the first to
provide a sound theoretical analysis of Brownian motion on the basis of the “random
walk problem.” Here, we introduce the topic using a formulation due to the French
physicist Paul Langevin (1872–1946), which makes the derivation extremely simple.
Langevin focuses on the motion of a relatively large particle of mass m, located at x(t)
at time t, whose velocity is v(t) = ẋ(t). The particle is subjected to the influence of a
force, composed of two components: one is a slowly varying macroscopic force and
the other is varying rapidly and randomly. The latter has zero mean, but it fluctuates.
In the one–dimensional case, it obeys the differential equation

m ẍ(t) + γ ẋ(t) = F + Fr (t), (7.2.1)

where γ is a frictional (dissipative) coefficient and Fr (t) is the random component


of the force. While this differential equation is nothing but Newton’s law and hence
obvious in macroscopic physics, it should not be taken for granted in the micro-
scopic regime. In elementary Gibbsian statistical mechanics, all processes are time
reversible in the microscopic level, since energy is conserved in collisions as the
effect of dissipation in binary collisions is traditionally neglected. A reasonable the-
ory, however, should incorporate the dissipative term.
The response of x(t) to F + Fr (t) is, clearly, the superposition of the individ-
ual responses to F and to Fr (t) separately. The former is the solution to a simple
(deterministic) differential equation, which is not the center of our interest here.
Considering the response to Fr (t) only, multiply Eq. (7.2.1) by x(t), to get

d(x(t)ẋ(t))
mx(t)ẍ(t) ≡ m − ẋ 2 (t) = −γx(t)ẋ(t) + x(t)Fr (t). (7.2.2)
dt

Taking the expectation, while assuming that, due to the randomness of {F_r(t)}, x(t)
and F_r(t) at time t are independent, we have ⟨x(t)F_r(t)⟩ = ⟨x(t)⟩⟨F_r(t)⟩ = 0. Also,
note that m⟨ẋ²(t)⟩ = kT by the energy equipartition theorem (which applies here
since we are assuming the classical regime), and so, we end up with

m d⟨x(t)ẋ(t)⟩/dt = kT − γ⟨x(t)ẋ(t)⟩,    (7.2.3)
a simple first order differential equation, whose solution is

⟨x(t)ẋ(t)⟩ = kT/γ + C e^{−γt/m},    (7.2.4)

where C is a constant of integration. Imposing the condition that x(0) = 0, this gives
C = −kT/γ, and so

(1/2) d⟨x²(t)⟩/dt ≡ ⟨x(t)ẋ(t)⟩ = (kT/γ)(1 − e^{−γt/m}),    (7.2.5)

which yields

⟨x²(t)⟩ = (2kT/γ)[t − (m/γ)(1 − e^{−γt/m})].    (7.2.6)

The last equation gives the mean square deviation of a particle away from its origin,
at time t. The time constant of the dynamics, a.k.a. the relaxation time, is θ = m/γ.
For short times (t ≪ θ), ⟨x²(t)⟩ ≈ kT t²/m, which means that it looks like the particle
is moving at a constant velocity of √(kT/m). For t ≫ θ, however,

⟨x²(t)⟩ ≈ (2kT/γ) · t.    (7.2.7)
 
It should now be pointed out that this linear growth rate of ⟨x²(t)⟩ is a characteristic of
Brownian motion. Here it is only an approximation for t ≫ θ, as for m > 0, {x(t)} is
not a pure Brownian motion. Pure Brownian motion corresponds to the case m = 0
(hence θ = 0), namely, the term mẍ(t) in the Langevin equation can be neglected,
and then x(t) is simply proportional to ∫₀ᵗ F_r(τ)dτ, where {F_r(t)} is white noise.
Figure 7.1 illustrates a few realizations of a Brownian motion in one dimension and
in two dimensions.
We may visualize each collision on the pollen grain as that of an impulse, because
the duration of each collision is extremely short. In other words, the position of
the particle x(t) is responding to a sequence of (positive and negative) impulses at
random times. Let

R_v(τ) = ⟨v(t)v(t + τ)⟩ = ⟨ẋ(t)ẋ(t + τ)⟩    (7.2.8)

denote the autocorrelation of the random process v(t) = ẋ(t) and let Sv (ω) =
F{Rv (τ )} be the power spectral density.1
Clearly, by the Langevin equation, {v(t)} is the response of the linear, time-invariant
system

H(s) = 1/(ms + γ) = 1/[m(s + 1/θ)];    h(t) = (1/m) e^{−t/θ} u(t)    (7.2.9)

¹ To avoid confusion, one should keep in mind that although S_v(ω) is expressed as a function of the radial frequency ω, which is measured in radians per second, the physical units of the spectral density function itself here are Volt²/Hz and not Volt²/[radian per second]. To pass to the latter, one should divide by 2π. Thus, to calculate power, one must use ∫_{−∞}^{+∞} S_v(2πf) df.


Fig. 7.1 Illustration of a Brownian motion. Upper figures: one–dimensional Brownian motion –
three realizations of x(t) as a function of t. Lower figures: two–dimensional Brownian motion –
three realizations of r (t) = [x(t), y(t)]. All realizations start at the origin

to the random input process {Fr (t)}. Assuming that the impulse process {Fr (t)} is
white noise, then

R_v(τ) = const · h(τ) ∗ h(−τ) = const · e^{−|τ|/θ} = R_v(0) e^{−|τ|/θ} = (kT/m) · e^{−|τ|/θ}    (7.2.10)

and

S_v(ω) = (2kT/m) · ω₀/(ω² + ω₀²),    ω₀ = 1/θ = γ/m    (7.2.11)

that is, a Lorentzian spectrum. We see that the relaxation time θ is indeed a measure
of the “memory” of the particle and ω0 = 1/θ plays the role of 3 dB cutoff frequency
of the spectrum of {v(t)}. What is the spectral density of the driving input white
noise process?

S_{F_r}(ω) = S_v(ω)/|H(iω)|² = [2kTω₀/(m(ω² + ω₀²))] · m²(ω² + ω₀²) = 2kT m ω₀ = 2kTγ.    (7.2.12)
This result is very important. The spectral density of the white noise is 2kT times
the dissipative coefficient of the system, γ. In other words, the dissipative element of

the system is ‘responsible’ for the noise. At first glance, it may seem surprising: why
should the intensity of the (external) driving force Fr (t) be related to the dissipative
coefficient γ? The answer is that they are related via energy balance considerations,
since we are assuming thermal equilibrium. Because the energy waste (dissipation)
is proportional to γ, the energy supply from Fr (t) must also be proportional to γ in
order to balance it.
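The energy-balance picture can be checked by direct simulation. The following Euler–Maruyama sketch (illustrative only; units with k = 1 and arbitrary parameter values are assumed, as is NumPy) integrates the Langevin equation with a white-noise force of spectral density 2kTγ and checks both equipartition, m⟨v²⟩ ≈ kT, and the long-time law ⟨x²(t)⟩ ≈ (2kT/γ)t of Eq. (7.2.7).

import numpy as np

rng = np.random.default_rng(1)
kT, m, gamma = 1.0, 1.0, 2.0                      # illustrative values (k = 1 units)
dt, n_steps, n_paths = 1e-3, 20000, 2000
theta = m / gamma                                 # relaxation time

v = np.zeros(n_paths)
x = np.zeros(n_paths)
for _ in range(n_steps):
    # impulse of the white-noise force over dt has variance 2*kT*gamma*dt
    noise = np.sqrt(2 * kT * gamma * dt) * rng.standard_normal(n_paths)
    v += (-gamma * v * dt + noise) / m
    x += v * dt

t = n_steps * dt                                  # total time, here 40*theta
print(m * np.mean(v**2), kT)                      # equipartition: m<v^2> ~ kT
print(np.mean(x**2), 2 * kT / gamma * t)          # Einstein's linear law, Eq. (7.2.7)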

Example 7.2 (Energy balance for the Brownian particle) The friction force
F_friction(t) = −γv(t) causes the particle to lose kinetic energy at the rate of

P_loss = ⟨F_friction(t)v(t)⟩ = −γ⟨v²(t)⟩ = −γ · kT/m = −kT/θ.

On the other hand, the driving force Fr (t) injects kinetic energy at the rate of

P_injected = ⟨F_r(t)v(t)⟩    (7.2.13)
           = ⟨F_r(t) ∫₀^∞ dτ h(τ)F_r(t − τ)⟩    (7.2.14)
           = 2kTγ ∫₀^∞ dτ h(τ)δ(τ)    (7.2.15)
           = kTγ h(0) = kTγ/m = kT/θ,    (7.2.16)

which exactly balances the loss. Here, we used the fact that ∫₀^∞ dτ h(τ)δ(τ) = h(0)/2, since only “half” of the delta function is “alive” where h(τ) > 0.² Exercise 7.1
What happens if γ = 0, yet F_r(t) has spectral density N₀? Calculate the rate of kinetic
energy increase in two different ways: (i) Show that (m/2)⟨v²(t)⟩ is linear in t and find
the constant of proportionality. (ii) Calculate ⟨F_r(t)v(t)⟩ for this case.

These principles apply not only to a Brownian particle in a liquid, but to any linear
system that obeys a first order stochastic differential equation with a white noise
input, provided that the energy equipartition theorem applies. An obvious electrical
analogue of this is a simple electric circuit where a resistor R and a capacitor C are
connected to each other (see Fig. 7.2). The thermal noise generated by the resistor
(due to the thermal random motion of the colliding free electrons in the conductor
with extremely short mean time between collisions), a.k.a. the Johnson–Nyquist
noise, is modeled as a current source connected in parallel to the resistor (or as an
equivalent voltage source connected in series to the resistor), which generates a white
noise current process Ir (t). The differential equation pertaining to Kirchoff’s current
law is

2 More rigorously, think of the delta function here as the limit of narrow (symmetric) autocorrelation

functions which all integrate to unity.



Fig. 7.2 An R–C circuit

C V̇(t) + V(t)/R = I_r(t)    (7.2.17)
where V (t) is the voltage across the resistor as well as the parallel capacitor. Now,
this is exactly the same differential equation as before, where Ir (t) plays the role of
the driving force, V (t) is replacing ẋ(t), C substitutes m, and 1/R is the dissipative
coefficient instead of γ. Thus, the spectral density of the current is

S_{I_r}(ω) = 2kT/R.    (7.2.18)
Alternatively, if one adopts the equivalent serial voltage source model then Vr (t) =
R Ir (t) and so
2kT
SVr (ω) = · R 2 = 2kT R. (7.2.19)
R
This result is studied in every elementary course on random processes.
Finally, note that here we have something similar to the ultraviolet catastro-
phe: white noise has infinite power, which is nonphysical. Once again, this hap-
pens because we have not addressed quantum effects pertaining to high frequencies
(ω
kT ), which as in black–body radiation, cause an exponential decay in the
spectrum beyond a frequency of about kT /. We will get back to this later on in
Sect. 7.5.

7.3 Diffusion and the Fokker–Planck Equation

In this subsection, we consider the temporal evolution of the probability density func-
tion of x(t) (and not only its second order statistics, as in the previous subsection),
under quite general conditions. The first successful treatment of Brownian motion
was due to Einstein, who as mentioned earlier, reduced the problem to one of dif-
fusion. Einstein’s argument can be summarized as follows: assume that all particles
move independently. The relaxation time is short compared to the observation time,
but long enough for the motions of a particle in two consecutive intervals of θ to
be independent. Let the number of suspended grains be N and let the x coordinate
change by Δ in one relaxation time. Δ is a random variable, symmetrically distributed
around 0. The number of particles dN displaced by more than Δ but less
than Δ + dΔ is dN = N p(Δ)dΔ, where p(Δ) is the pdf of Δ. Since only small


displacements are likely to occur, p() is sharply peaked at the origin. Let ρ(x, t)
denote the density of particles at position x, at time t. The number of particles in the
interval [x, x + dx] at time t + δ (δ small) is
 +∞
ρ(x, t + δ)dx = dx ρ(x − , t) p()d (7.3.1)
−∞

This equation tells that the probability of finding the particle around x at time t + δ
is the probability of finding it in x −  (for any ) at time t, and then moving
by  within duration δ to arrive at x at time t + δ. Here we assume independence
between the location x −  at time t and the probability distribution of , as p()
is independent of x − . Since δ is small, we use the Taylor series expansion

∂ρ(x, t)
ρ(x, t + δ) ≈ ρ(x, t) + δ · . (7.3.2)
∂t

Also, for small , we approximate ρ(x − , t), this time to the second order:

∂ρ(x, t) 2 ∂ 2 ρ(x, t)
ρ(x − , t) ≈ ρ(x, t) −  · + · . (7.3.3)
∂x 2 ∂x 2
Putting these in Eq. (7.3.1), we get

ρ(x, t) + δ·∂ρ(x, t)/∂t = ρ(x, t) ∫_{−∞}^{+∞} p(Δ)dΔ − ∂ρ(x, t)/∂x ∫_{−∞}^{+∞} Δ p(Δ)dΔ + (1/2)·∂²ρ(x, t)/∂x² ∫_{−∞}^{+∞} Δ² p(Δ)dΔ    (7.3.4)

or

∂ρ(x, t)/∂t = (1/2δ) · ∂²ρ(x, t)/∂x² ∫_{−∞}^{+∞} Δ² p(Δ)dΔ    (7.3.5)

which is the diffusion equation

∂ρ(x, t)/∂t = D · ∂²ρ(x, t)/∂x²    (7.3.6)
with the diffusion coefficient being

D = lim_{δ→0} ⟨Δ²⟩/(2δ) = lim_{δ→0} ⟨[x(t + δ) − x(t)]²⟩/(2δ).    (7.3.7)

To solve the diffusion equation, define ρ̂(κ, t) as the Fourier transform of ρ(x, t)
w.r.t. the variable x, i.e.,

ρ̂(κ, t) = ∫_{−∞}^{+∞} dx · e^{−iκx} ρ(x, t).    (7.3.8)

Then, the diffusion equation becomes an ordinary differential equation w.r.t. t:

∂ρ̂(κ, t)/∂t = D(iκ)² ρ̂(κ, t) ≡ −Dκ² ρ̂(κ, t)    (7.3.9)

whose solution is easily found to be ρ̂(κ, t) = C(κ)e^{−Dκ²t}. Assuming ρ(x, 0) = δ(x), this means C(κ) = ρ̂(κ, 0) = 1 for all κ, and so ρ̂(κ, t) = e^{−Dκ²t}. The density ρ(x, t) is obtained by the inverse Fourier transform, which is

ρ(x, t) = e^{−x²/(4Dt)} / √(4πDt),    (7.3.10)

and so x(t) is zero-mean Gaussian with variance ⟨x²(t)⟩ = 2Dt.³ Of course, any
other initial location x0 would yield a Gaussian with the same variance 2Dt, but the
mean would be x0 . Comparing the variance 2Dt with (7.2.7), we have D = kT /γ,
which is known as the Einstein relation, widely used in semiconductor physics.
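A direct random-walk illustration of this result (a sketch added here; the step distribution, time step, and sample sizes are arbitrary choices, and NumPy is assumed) is given below: summing many i.i.d. small displacements produces a position variance equal to 2Dt, with D = ⟨Δ²⟩/(2δ) as in Eq. (7.3.7).

import numpy as np

rng = np.random.default_rng(2)
delta, n_relax, n_particles = 0.01, 2000, 50000   # relaxation time, number of steps, particles

x = np.zeros(n_particles)
sum_sq = 0.0
for _ in range(n_relax):
    Delta = rng.uniform(-0.1, 0.1, size=n_particles)   # per-step displacement, zero mean
    x += Delta
    sum_sq += np.mean(Delta**2)

D = (sum_sq / n_relax) / (2 * delta)              # diffusion coefficient, Eq. (7.3.7)
t = n_relax * delta
print(np.var(x), 2 * D * t)                       # should be close to each other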
The analysis thus far assumed that ⟨Δ⟩ = 0, namely, there is no drift to either the
left or the right direction. We next drop this assumption. In this case, the diffusion
equation generalizes to

∂ρ(x, t)/∂t = −v · ∂ρ(x, t)/∂x + D · ∂²ρ(x, t)/∂x²    (7.3.11)
where

v = lim_{δ→0} ⟨Δ⟩/δ = lim_{δ→0} ⟨x(t + δ) − x(t)⟩/δ = (d/dt)⟨x(t)⟩ = ⟨ẋ(t)⟩    (7.3.12)

has the obvious meaning of the average velocity. Equation (7.3.11) is well known as
the Fokker–Planck equation. The diffusion equation and the Fokker–Planck equation
are very central in physics. As mentioned already in Chap. 1, they are fundamental
in semiconductor physics, describing processes of propagation of concentrations of
electrons and holes in semiconductor materials.

Exercise 7.2 Solve the Fokker–Planck equation and show that the solution is
ρ(x, t) = N (vt, 2Dt). Explain the intuition.

It is possible to further extend the Fokker–Planck equation so as to allow the pdf


of Δ to be location-dependent, that is, p(Δ) would be replaced by p_x(Δ), but the

3 A point to think about: what is the intuition behind the resultant Gaussianity? We have not assumed

any specific distribution in advance.



important point to retain is that given the present location x(t) = x, Δ would be
independent of the earlier history of {x(t  ), t  < t}, which means that {x(t)} should
be a Markov process. Consider then a general continuous–time Markov process
defined by the transition probability density function Wδ (x  |x), which denotes the
pdf of x(t + δ) at x  given that x(t) = x. A straightforward extension of the earlier
derivation would lead to the following more general form4

∂ρ(x, t)/∂t = −(∂/∂x)[v(x)ρ(x, t)] + (∂²/∂x²)[D(x)ρ(x, t)],    (7.3.13)
where

v(x) = lim_{δ→0} (1/δ) ∫_{−∞}^{+∞} (x′ − x) W_δ(x′|x) dx′ = E[ẋ(t)|x(t) = x]    (7.3.14)

is the local average velocity and

D(x) = lim_{δ→0} (1/2δ) ∫_{−∞}^{+∞} (x′ − x)² W_δ(x′|x) dx′ = lim_{δ→0} (1/2δ) E{[x(t + δ) − x(t)]² | x(t) = x}    (7.3.15)

is the local diffusion coefficient.

Example 7.3 Consider the stochastic differential equation

ẋ(t) = −ax(t) + n(t),

where n(t) is a Gaussian white noise with spectral density N0 /2. From the solution
of this differential equation, it is easy to see that

x(t + δ) = x(t) e^{−aδ} + e^{−a(t+δ)} ∫_t^{t+δ} dτ n(τ) e^{aτ}.

This relation, between x(t) and x(t + δ), can be used to derive the first and the
second moments of [x(t + δ) − x(t)] for small δ, and to find that v(x) = −ax and
D(x) = N0 /4 (Exercise 7.4 Show this). Thus, the Fokker–Planck equation, in this
example, reads

∂ρ(x, t)/∂t = a · (∂/∂x)[x · ρ(x, t)] + (N₀/4) · ∂²ρ(x, t)/∂x².

It is easy to check that the r.h.s. vanishes for ρ(x, t) ∝ e^{−2ax²/N₀} (independent of t),
which means that in equilibrium, x(t) is Gaussian with zero mean and variance

4 Exercise 7.3 Prove it.



N0 /4a. This is in agreement with the fact that, as x(t) is the response of the lin-
ear system H (s) = 1/(s + a) (or in the time domain, h(t) = e−at u(t)) to n(t), its
variance is indeed (as we know from courses on random processes):

(N₀/2) ∫₀^∞ h²(t) dt = (N₀/2) ∫₀^∞ e^{−2at} dt = N₀/(4a).

Note that if x(t) is the voltage across the capacitor in a simple R–C network, then
a = 1/RC, and since E(x) = C x 2 /2, then in equilibrium we have the Boltzmann
weight ρ(x) ∝ exp(−C x 2 /2kT ), which is again, a zero–mean Gaussian. Comparing
the exponents, we immediately obtain N0 /2 = 2kT /RC 2 .
Exercise 7.5 Find the solution ρ(x, t) for all x and t subject to the initial condition
ρ(x, 0) = δ(x). 
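A short simulation sketch of this example (illustrative values of a, N₀ and the discretization, added here; NumPy is assumed): integrating ẋ = −ax + n(t) with white noise of spectral density N₀/2 indeed settles into a zero-mean Gaussian of variance N₀/(4a), the stationary solution of the Fokker–Planck equation above.

import numpy as np

rng = np.random.default_rng(3)
a, N0, dt, n_steps, n_paths = 1.0, 2.0, 1e-3, 20000, 5000

x = np.zeros(n_paths)
for _ in range(n_steps):
    # integrated white noise over dt has variance (N0/2)*dt
    noise = np.sqrt(N0 / 2 * dt) * rng.standard_normal(n_paths)
    x += -a * x * dt + noise

print(np.var(x), N0 / (4 * a))    # both should be close to 0.5 here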

A slightly different representation of the Fokker–Planck equation is the following:

∂ρ(x, t)/∂t = −(∂/∂x){v(x)ρ(x, t) − (∂/∂x)[D(x)ρ(x, t)]}.    (7.3.16)

Now, v(x)ρ(x, t) has the obvious interpretation of the drift current density Jdrift (x, t)
of a ‘mass’ whose density is ρ(x, t) (in this case, it is a probability mass), whereas

J_diffusion(x, t) = −(∂/∂x)[D(x)ρ(x, t)]

is the diffusion current density.5 While the drift current is related to the overall motion
of the object, the diffusion current is associated with the tendency to equalize the
density ρ (which is why it is proportional to the negative density gradient). Thus,

∂ρ(x, t)/∂t = −∂J_total(x, t)/∂x    (7.3.17)
where

Jtotal (x, t) = Jdrift (x, t) + Jdiffusion (x, t).

Equation (7.3.17) is the equation of continuity, which we saw in Chap. 1. In steady


state, when ρ(x, t) is time–invariant, the total current may vanish. The drift current
and the diffusion current balance each other, or at least the net current is homogeneous
(independent of x), so no mass accumulates anywhere.

5 Thisgeneralizes Fick’s law that we have seen in Chap. 1. There, D was fixed (independent of x),
and so the diffusion current was proportional to the negative gradient of the density.

Comment It is interesting to relate the diffusion constant D to the mobility of the


electrons in the context of electric conductivity. The mobility μ is defined according
to v = −μE, where E is the electric field, and the minus sign is because electrons are
accelerated in a direction opposite to that of the electric field (due to their negative
charge). According to Fick’s law, the diffusion current density is proportional to the
negative gradient of the concentration, and D is defined as the constant of propor-
tionality, i.e., Jdiff. elec. = −D∂ρ/∂x, which is Jdiffusion = Dqe ∂ρ/∂x, with the sign
change, again, due to the negative charge of the electron. If one sets up a field E in
an open circuit, the diffusion current cancels the drift current, that is

∂ρ
J = ρqe μE + Dqe = 0. (7.3.18)
∂x

This gives ρ(x) ∝ e−μE x/D . On the other hand, under thermal equilibrium, with
potential energy V (x) = qe E x, we also have ρ(x) ∝ e−V /kT = e−qe E x/kT . Upon
comparing the exponents, we readily obtain the Einstein relation, D = kT μ/qe .
Note that μ/qe = |v|/qe |E| is related to the admittance (the dissipative coefficient)
since |v| is proportional to the current and |E| is proportional to the voltage.

7.4 The Fluctuation–Dissipation Theorem

We next take another point of view on stochastic dynamics of a physical system:


suppose that a system (not necessarily a single particle as in the previous subsections)
is initially in equilibrium of the canonical ensemble, but at a certain time instant, it is
subjected to an abrupt, yet small, change in a certain parameter that controls it (say,
a certain force, like pressure, magnetic field, etc.). Right after this abrupt change
in the parameter, the system is, of course, no longer in equilibrium, but it is not
far, since the change is assumed small. How fast does the system re–equilibrate and
what is its dynamical behavior in the course of the passage to the new equilibrium?
Also, since the change was abrupt and not quasi–static, how is energy dissipated?
Quite remarkably, it turns out that the answers to both questions are related to the
equilibrium fluctuations of the system. Accordingly, the principle that quantifies and
characterizes this relationship is called the fluctuation–dissipation theorem and this is
the subject of this subsection. We shall also relate it to the derivations of the previous
subsection.
Consider a physical system which, in the absence of any applied external field, has
a Hamiltonian E(x), where x denotes the microstate, so that its equilibrium
distribution is the Boltzmann distribution with partition function

Z(β) = Σ_x e^{−βE(x)}.   (7.4.1)

Now, let w(x) be an observable (a measurable physical quantity that depends on the
microstate), which has a conjugate force F, so that when F is applied, the change
in the Hamiltonian is ΔE(x) = −F · w(x). Next, suppose that the external force is
time–varying according to a certain waveform {F(t), − ∞ < t < ∞}. As in the
previous subsection, it should be kept in mind that the overall effective force can
be thought of as a superposition of two contributions, a deterministic contribution,
which is the above mentioned F(t) – the external field that the experimentalist applies
on purpose and fully controls, and a random part Fr (t), which pertains to interaction
with the environment (or the heat bath at temperature T ). The former is deterministic
and the latter symbolizes the random, spontaneous thermal fluctuations.6 The random
component Fr (t) is responsible for the randomness of the microstate x and hence also
the randomness of the observable. We shall denote the random variable corresponding
to the observable at time t by W(t). Thus, W(t) is a random variable, which takes values
in the set {w(x), x ∈ X }, where X is the space of microstates. When the external
deterministic field is kept fixed (F(t) ≡ const.), the system is expected to converge
into equilibrium and eventually obey the Boltzmann law. While in the section on
Brownian motion, we focused only on the contribution of the random part, Fr (t),
now let us refer only to the deterministic part, F(t). We will get back to the random
part later on.
Let us assume first that F(t) was switched on to a small level ε at time −∞, and
then switched off at time t = 0, in other words, F(t) = εU(−t), where U(·) is the
unit step function (a.k.a. the Heaviside function). We are interested in the behavior
of the mean of the observable W(t) at time t, which we shall denote by ⟨W(t)⟩,
for t > 0. Also, ⟨W(∞)⟩ will denote the limit of ⟨W(t)⟩ as t → ∞, namely, the
equilibrium mean of the observable in the absence of an external field. Define now
the (negative) step response function as

ζ(t) = lim_{ε→0} [⟨W(t)⟩ − ⟨W(∞)⟩]/ε   (7.4.2)

and the auto–covariance function pertaining to the final equilibrium as


R_W(τ) = lim_{t→∞} [⟨W(t)W(t + τ)⟩ − ⟨W(∞)⟩²],   (7.4.3)

Then, the fluctuation–dissipation theorem (FDT) asserts that

R_W(τ) = kT · ζ(τ).   (7.4.4)

The FDT thus relates the linear transient response of the system to a small
excitation (after it has been removed) to the autocovariance of the observable in

6 The random part of the force Fr (t) does not necessarily exist physically, but it is a way to refer
the random thermal fluctuations of the system to the ‘input’ F from a pure signals–and–systems
perspective. For example, think again of the example of a Brownian particle colliding with other
particles. The other particles can be thought of as the environment in this case.

Fig. 7.3 Illustration of the response of ⟨W(t)⟩ to a step function at the input force F(t) = εU(−t).
According to the FDT, the response (on top of the asymptotic level ⟨W(∞)⟩) is proportional to the
equilibrium autocorrelation function R_W(t), which in turn may decay either monotonically (solid
curve) or in an oscillatory manner (dashed curve)

equilibrium. The transient response, which fades away, is the dissipation, whereas the
autocovariance is the fluctuation. Normally, R_W(τ) decays for large τ and so ⟨W(t)⟩
converges to ⟨W(∞)⟩ at the same rate (see Fig. 7.3).
To prove this result, we proceed as follows: first, we have by definition:
⟨W(∞)⟩ = Σ_x w(x)e^{−βE(x)} / Σ_x e^{−βE(x)}.   (7.4.5)

Now, for t < 0, we have

P(x) = e^{−βE(x)−βΔE(x)} / Σ_x e^{−βE(x)−βΔE(x)}.   (7.4.6)

Thus, for all negative times, and for t = 0⁻ in particular, we have

⟨W(0⁻)⟩ = Σ_x w(x)e^{−βE(x)−βΔE(x)} / Σ_x e^{−βE(x)−βΔE(x)}.   (7.4.7)

Let P_t(x′|x) denote the probability that the system will be at state x′ at time t
(t > 0) given that it was at state x at time t = 0⁻. This probability depends on the
dynamical properties of the system (in the absence of the perturbing force). Let us
define ⟨W(t)⟩_x = Σ_{x′} w(x′)P_t(x′|x), which is the expectation of W(t) (t > 0) given
that the system was at state x at t = 0⁻. Now,

⟨W(t)⟩ = Σ_x ⟨W(t)⟩_x e^{−βE(x)−βΔE(x)} / Σ_x e^{−βE(x)−βΔE(x)}
       = Σ_x ⟨W(t)⟩_x e^{−βE(x)+βεw(x)} / Σ_x e^{−βE(x)+βεw(x)}   (7.4.8)

and ⟨W(∞)⟩ can be seen as a special case of this quantity for ε = 0 (no perturbation
at all). Thus, ζ(t) is, by definition, nothing but the derivative of ⟨W(t)⟩ w.r.t. ε,
computed at ε = 0. I.e.,
 
ζ(τ) = ∂/∂ε [ Σ_x ⟨W(τ)⟩_x e^{−βE(x)+βεw(x)} / Σ_x e^{−βE(x)+βεw(x)} ]_{ε=0}
     = β · Σ_x ⟨W(τ)⟩_x w(x)e^{−βE(x)} / Σ_x e^{−βE(x)}
       − β · [Σ_x ⟨W(τ)⟩_x e^{−βE(x)} / Σ_x e^{−βE(x)}] · [Σ_x w(x)e^{−βE(x)} / Σ_x e^{−βE(x)}]
     = β [ lim_{t→∞} ⟨W(t)W(t + τ)⟩ − ⟨W(∞)⟩² ]
     = β R_W(τ),   (7.4.9)

where we have used the fact that the dynamics of {P_t(x′|x)} preserve the equilibrium
distribution.
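The FDT can also be checked numerically. The following Monte Carlo sketch (not from the text; the choice of an overdamped harmonic system and all parameter values are arbitrary assumptions, in units where kT = γ = 1) simulates γẆ = −KW + F(t) + F_r(t) with white thermal noise of level 2kTγ, estimates the step response ζ(τ) and the equilibrium autocovariance R_W(τ), and compares kTζ(τ) with R_W(τ); both should follow (kT/K)e^{−Kτ/γ}.

```python
import numpy as np

rng = np.random.default_rng(1)
kT, gamma, K, eps = 1.0, 1.0, 2.0, 0.2   # arbitrary units; eps is the small force
dt, n_lags = 1e-2, 200
taus = np.arange(n_lags) * dt

def step(w, force):
    # Euler step of gamma*dW/dt = -K*W + force + F_r(t), <F_r F_r> = 2*kT*gamma*delta
    noise = np.sqrt(2 * kT * gamma * dt) * rng.standard_normal(w.shape)
    return w + (-K * w + force) * dt / gamma + noise / gamma

# (1) Step response: start in equilibrium under the small force eps (mean eps/K),
#     switch the force off at t = 0, and average the relaxation over many runs.
w = eps / K + np.sqrt(kT / K) * rng.standard_normal(200000)
resp = np.empty(n_lags)
for i in range(n_lags):
    resp[i] = w.mean()
    w = step(w, 0.0)
zeta = resp / eps                        # here <W(infinity)> = 0

# (2) Equilibrium autocovariance from one long unforced trajectory.
n_eq = 400000
traj = np.empty(n_eq)
w1 = np.sqrt(kT / K) * rng.standard_normal(1)
for i in range(n_eq):
    w1 = step(w1, 0.0)
    traj[i] = w1[0]
R = np.array([np.mean(traj[:n_eq - k] * traj[k:]) for k in range(n_lags)])

for k in (0, 30, 60, 90):
    print(f"tau={taus[k]:.2f}  kT*zeta={kT*zeta[k]:.3f}  R_W={R[k]:.3f}  "
          f"theory={kT/K*np.exp(-K*taus[k]/gamma):.3f}")
```

Within the Monte Carlo error, the two estimates coincide, illustrating (7.4.4).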

Exercise 7.6 Extend the FDT to account for a situation where the force F(t) is not
conjugate to w(x), but to another physical quantity v(x).

Since ζ(t) is essentially the response of the system to a (negative) step function
in F(t), obviously

h(t) = 0 for t < 0, and h(t) = −ζ̇(t) = −βṘ_W(t) for t ≥ 0   (7.4.10)

would be the impulse response. Thus, we have characterized the “linear” system that
describes the transient response of ⟨W(t)⟩ − ⟨W(∞)⟩ to a small input signal in F(t).
It is directly related to the equilibrium autocovariance function of {W(t)}. We can
now express the response of ⟨W(t)⟩ to a general small signal F(t) that vanishes for
t ≥ 0 as

⟨W(t)⟩ − ⟨W(∞)⟩ ≈ −β ∫_{−∞}^{0} Ṙ_W(t − τ)F(τ)dτ
              = −β ∫_{−∞}^{0} R_W(t − τ)Ḟ(τ)dτ
              = −β R_W ⊗ Ḟ,   (7.4.11)

where the second passage is from integration by parts and where ⊗ denotes convo-
lution. Indeed, in our first example, Ḟ(t) = −εδ(t) and we are back with the result
⟨W(t)⟩ − ⟨W(∞)⟩ = εβR_W(t).
It is instructive to look at these relations also in the frequency domain. Applying
the one sided Fourier transform on both sides of the relation h(t) = −β ṘW (t) and
taking the complex conjugate (i.e., multiplying by eiωt and integrating over t > 0),
we get
H(−iω) ≡ ∫_0^∞ h(t)e^{iωt}dt = −β∫_0^∞ Ṙ_W(t)e^{iωt}dt = βiω∫_0^∞ R_W(t)e^{iωt}dt + βR_W(0),   (7.4.12)
where the last step is due to integration by parts. Upon taking the imaginary parts of
both sides, we get:
Im{H(−iω)} = βω∫_0^∞ R_W(t)cos(ωt)dt = (1/2)βωS_W(ω),   (7.4.13)

where SW (ω) is the power spectrum of {W (t)} in equilibrium, that is, the Fourier
transform of RW (τ ). Equivalently, we have:

S_W(ω) = 2kT · Im{H(−iω)}/ω = −2kT · Im{H(iω)}/ω.   (7.4.14)

Example 7.4 (An electric circuit) Consider the circuit in Fig. 7.4. The driving force
is the voltage source V(t) and the conjugate variable is Q(t), the electric charge of
the capacitor. The resistors are considered part of the thermal environment. The voltage
waveform is V(t) = εU(−t). At time t = 0⁻, the voltage across the capacitor is ε/2
and the energy is (1/2)C(V_r + ε/2)², whereas for t → ∞, it is (1/2)CV_r², so the difference
is ΔE = (1/2)CV_rε = (1/2)Q_rε, neglecting the O(ε²) term. According to the FDT, then,
ζ(t) = (1/2)βR_Q(t), where the factor of 1/2 follows the one in ΔE. This then gives

S_Q(ω) = 4kT · Im{H(−iω)}/ω.   (7.4.15)
In this case,
H(iω) = (R ‖ [1/(iωC)]) · C / (R + R ‖ [1/(iωC)]) = C/(2 + iωRC)   (7.4.16)

Fig. 7.4 Electric circuit for Example 7.4

for which
Im{H(−iω)} = ωRC²/[4 + (ωRC)²]   (7.4.17)

and finally,
S_Q(ω) = 4kTRC²/[4 + (ωRC)²].   (7.4.18)

Thus, the spectrum of the thermal noise voltage across the capacitor is S_Q(ω)/C² =
4kTR/[4 + (ωRC)²]. The same result can be obtained, of course, using the method studied in courses on
random processes, where the voltage noise spectrum across a certain pair of points
in the circuit is given by 2kT times the real part of the input impedance seen from
these points, which in this case, is given by

2kT · Re{R ‖ R ‖ [1/(iωC)]} = 4kTR/[4 + (ωRC)²].   (7.4.19)

This concludes Example 7.4. 
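The algebra of Example 7.4 is easy to verify numerically. The sketch below (not from the text; the component values are arbitrary) checks that 4kT·Im{H(−iω)}/ω coincides with (7.4.18), and that dividing by C² reproduces the input–impedance formula (7.4.19).

```python
# Sanity check of Example 7.4 for H(i*omega) = C/(2 + i*omega*R*C).
import numpy as np

R, C, kT = 50.0, 1e-9, 4.14e-21          # arbitrary values (kT at about 300 K)
omega = np.logspace(3, 10, 7)            # a few test frequencies [rad/s]

H = lambda w: C / (2 + 1j * w * R * C)
S_Q_fdt = 4 * kT * np.imag(H(-omega)) / omega            # Eq. (7.4.15)
S_Q_formula = 4 * kT * R * C**2 / (4 + (omega * R * C)**2)  # Eq. (7.4.18)

Zpar = 1 / (1/R + 1/R + 1j * omega * C)  # R || R || 1/(i*omega*C)
S_V_impedance = 2 * kT * np.real(Zpar)   # Eq. (7.4.19)

print(np.allclose(S_Q_fdt, S_Q_formula))            # True
print(np.allclose(S_Q_fdt / C**2, S_V_impedance))   # True
```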

Earlier, we have seen that ⟨W(t)⟩ − ⟨W(∞)⟩ responds to a deterministic (small)
waveform F(t) via a linear (or actually, linearized) system with an impulse response
h(t). By the same token, we can think of the random part around the mean, W(t) −
⟨W(t)⟩, as the response of the same system to a random input F_r(t) (thus, the total
response is the superposition). In other words, we are decomposing the total “output
signal” as

W(t) − ⟨W(∞)⟩ = [⟨W(t)⟩ − ⟨W(∞)⟩] + [W(t) − ⟨W(t)⟩],

viewing the first bracketed term as the deterministic part, responding to the determin-
istic signal F(t), and the second bracketed term as the random fluctuation, responding
to a random input F_r(t). If we wish to think of our physical system in equilibrium as
a linear(ized) system with input F_r(t) and output W(t) − ⟨W(t)⟩, then what should
the spectrum of the input process {F_r(t)} be in order to comply with the last result?
Denoting by S_{F_r}(ω) the spectrum of the input process, we know from the basics of
random processes that

S_W(ω) = S_{F_r}(ω) · |H(iω)|²   (7.4.20)

and so comparing with (7.4.14), we have

S_{F_r}(ω) = 2kT · Im{H(−iω)} / (ω · |H(iω)|²).   (7.4.21)

This extends our earlier result concerning the spectrum of the driving white noise in
the case of the Brownian particle, where we obtained a spectral density of 2kT γ.
Example 7.5 (Second order linear system) For a second order linear system (e.g., a
damped harmonic oscillator),

m Ẅ (t) + γ Ẇ (t) + K W (t) = Fr (t) (7.4.22)

the force Fr (t) is indeed conjugate to the variable W (t), which is the location, as
required by the FDT. Here, we have
H(iω) = 1/[m(iω)² + γiω + K] = 1/(K − mω² + γiω).   (7.4.23)
In this case,
Im{H(−iω)} = γω/[(K − mω²)² + γ²ω²] = γω|H(iω)|²   (7.4.24)
and so, we readily obtain

S_{F_r}(ω) = 2kTγ,   (7.4.25)

recovering the principle that the spectral density of the noise process is 2kT times
the dissipative coefficient γ of the system, which is responsible for the irreversible
component. The difference between this and the earlier derivation is that earlier, we
assumed in advance that the input noise process is white and we only computed its
spectral level, whereas now, we have actually shown that at least for a second order
linear system like this, it must be white noise (as far as the classical approximation
holds). 
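The whiteness of the referred noise can also be confirmed directly from (7.4.21). A small numerical check (not from the text; the parameter values are arbitrary):

```python
# Verify that for H(i*omega) = 1/(K - m*omega**2 + i*gamma*omega), the referred
# input spectrum 2kT*Im{H(-i*omega)}/(omega*|H(i*omega)|**2) equals 2*kT*gamma
# at every frequency, i.e., the driving noise is white.
import numpy as np

m, gamma, K, kT = 0.3, 0.7, 5.0, 1.0     # arbitrary values
omega = np.logspace(-2, 3, 11)

H = 1.0 / (K - m * omega**2 + 1j * gamma * omega)
Im_H_neg = np.imag(1.0 / (K - m * omega**2 - 1j * gamma * omega))
S_Fr = 2 * kT * Im_H_neg / (omega * np.abs(H)**2)

print(np.allclose(S_Fr, 2 * kT * gamma))   # True
```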
From Eq. (7.4.21), we see that the thermal interaction with the environment, when
referred to the input of the system, has a spectral density of the form that we can
calculate. In general, it does not necessarily have to be a flat spectrum. Consider for
example, an arbitrary electric network consisting of one voltage source (in the role
of F(t)) and several resistors and capacitors. Suppose that our observable W (t) is
the voltage across one of the capacitors. Then, there is a certain transfer function
H (iω) from the voltage source to W (t). The thermal noise process stemming from
all resistors is calculated by considering equivalent noise sources (parallel current
sources or serial voltage sources) attached to each resistor. However, in order to refer
the contribution of these noise sources to the input F(t), we must calculate equivalent
noise sources which are in series with the given voltage source F(t). These equivalent
noise sources will no longer generate white noise processes, in general. For example,
in the circuit of Fig. 7.4, if an extra capacitor were connected in series with one
of the resistors, then the contribution of the right resistor, referred to the left one, is
not white noise.7

7 Exercise 7.7 Calculate its spectrum.


122 7 Fluctuations, Stochastic Dynamics and Noise

Finally, it should be pointed out that this concept of referring the randomness in
the system to the input is not always feasible, as in general, there is no apparent
guarantee that the r.h.s. of Eq. (7.4.21) is a legitimate spectral density function,
i.e., that it is non–negative everywhere. In the absence of this condition, the idea of
referring the noise to the input should simply be abandoned.

7.5 Johnson–Nyquist Noise in the Quantum–Mechanical Regime

As promised at the end of Sect. 7.3, we now return to the problem with the formula
S_{V_r}(ω) = 2kTR when it comes to very high frequencies, namely, the electrical ana-
logue to the ultraviolet catastrophe. Very high frequencies mean very short wave-
lengths, much shorter than the physical sizes of the electric circuit.
The remedy to the unreasonable classical results in the high frequency range, is
to view the motion of electrons in a resistor as an instance of black–body radiation,
but instead of the three–dimensional case that we studied earlier, this time we are
talking about the one–dimensional case. The difference is mainly the calculation of
the density of states. Consider a long transmission line of length L with characteristic
impedance R, terminated at both ends by resistances R (see Fig. 7.5), so that the
impedances are matched at both ends. Then any voltage wave propagating along
the transmission line is fully absorbed by the terminating resistor without reflection.
The system resides in thermal equilibrium at temperature T . The resistor then can be
thought of as a black–body radiator in one dimension. A voltage wave of the form
V (x, t) = V0 exp[i(κx − ωt)] propagates along the transmission line with velocity
v = ω/κ, which depends on the capacitance and the inductance of the transmission
line per unit length. To assess the number of modes, let us impose the periodic
boundary condition V(0, t) = V(L, t). Then κL = 2πn for any positive integer n.
Thus, there are Δn = LΔκ/2π = LΔω/(2πv) such modes in the frequency range
between ω = vκ and ω + Δω = v(κ + Δκ). The mean energy of such a mode is
given by

ε̄(ω) = ℏω/(e^{ℏω/kT} − 1).   (7.5.1)

Fig. 7.5 Transmission line of length L, terminated by resistances R at both ends

Since there are Δn = LΔω/(2πv) propagating modes in this frequency range, the
mean energy per unit time (i.e., the power) incident upon a resistor in this frequency
range is

P = [1/(L/v)] · [LΔω/(2πv)] · ε̄(ω) = (1/2π) · ℏωΔω/(e^{ℏω/kT} − 1),   (7.5.2)

where L/v in the denominator is the travel time of the wave along the transmis-
sion line. This is the radiation power absorbed by the resistor, which must be
equal to the power emitted by the resistor in this frequency range. Let the thermal
voltage generated by the resistor in the frequency range [ω, ω + Δω] be denoted
by V_r(t)[ω, ω + Δω]. This voltage sets up a current of V_r(t)[ω, ω + Δω]/(2R) and
hence an average power of ⟨V_r²(t)[ω, ω + Δω]⟩/(4R). Thus, the balance between the
absorbed and the emitted power gives

⟨V_r²(t)[ω, ω + Δω]⟩/(4R) = (1/2π) · ℏωΔω/(e^{ℏω/kT} − 1),   (7.5.3)

which is
 
⟨V_r²(t)[ω, ω + Δω]⟩/Δω = (4R/2π) · ℏω/(e^{ℏω/kT} − 1)   (7.5.4)
or
 
⟨V_r²(t)[f, f + Δf]⟩/Δf = 4R · hf/(e^{hf/kT} − 1).   (7.5.5)

Taking the limit Δf → 0, the left–hand side becomes the one–sided spectral density
of the thermal noise, and so (returning to the angular frequency domain), the two–
sided spectral density is

S_{V_r}(ω) = 2R · ℏω/(e^{ℏω/kT} − 1).   (7.5.6)

We see that when quantum–mechanical considerations are incorporated, the noise
spectrum is no longer flat. As long as ℏω ≪ kT, the denominator is very well approx-
imated by ℏω/kT, and we recover the formula 2kTR, but for frequencies of the
order of magnitude of ω_c = kT/ℏ, the spectrum decays exponentially rapidly. So the
quantum–mechanical correction is the substitution:

kT ⟹ ℏω/(e^{ℏω/kT} − 1).   (7.5.7)

At T = 300 °K, the cutoff frequency is f_c = ω_c/2π ≈ 6.2 THz,8 so the spectrum
can safely be considered flat at the level 2kTR over any frequency range of practical interest.
What is the total RMS noise voltage generated by a resistor R at temperature T ?
The total mean square noise voltage is
⟨V_r²(t)⟩ = 4R ∫_0^∞ [ℏω/(e^{ℏω/kT} − 1)] dω/(2π)
         = [4R(kT)²/h] ∫_0^∞ x dx/(e^x − 1)
         = [4R(kT)²/h] ∫_0^∞ x e^{−x} dx/(1 − e^{−x})
         = [4R(kT)²/h] Σ_{n=1}^∞ ∫_0^∞ x e^{−nx} dx
         = [4R(kT)²/h] Σ_{n=1}^∞ 1/n²
         = 2R(πkT)²/(3h),   (7.5.8)
which is quadratic in T since both the (low frequency) spectral density and the
effective bandwidth are linear in T . The RMS is then
 
V_RMS = √⟨V_r²(t)⟩ = √(2R/(3h)) · πkT,   (7.5.9)

namely, proportional to temperature and to the square root of the resistance. To assess
the order of magnitude, a resistor of 100 Ω at T = 300 °K generates an RMS thermal
noise of a few millivolts when it stands alone (without a circuit that limits the bandwidth
much more drastically than ω_c). The equivalent noise bandwidth is

B_eq = [2R(πkT)²/(3h)] / (2kTR) = π²kT/(3h) = (π²/3) · f_c.   (7.5.10)
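For concreteness, here is a small script (not from the text) that evaluates these quantities with SI constants; it reproduces the cutoff frequency quoted above and the RMS estimate for a 100 Ω resistor at 300 °K.

```python
# Order-of-magnitude check of f_c, V_RMS (Eq. 7.5.9) and B_eq (Eq. 7.5.10).
import numpy as np

k, h, hbar = 1.380649e-23, 6.62607015e-34, 1.054571817e-34
T, R = 300.0, 100.0

f_c = k * T / hbar / (2 * np.pi)                  # omega_c/(2*pi)
V_rms = np.sqrt(2 * R / (3 * h)) * np.pi * k * T  # Eq. (7.5.9)
B_eq = np.pi**2 * k * T / (3 * h)                 # Eq. (7.5.10)

print(f"f_c   = {f_c/1e12:.2f} THz")    # about 6.2 THz, as quoted above
print(f"V_RMS = {V_rms*1e3:.1f} mV")    # about 4 mV for 100 Ohm at 300 K
print(f"B_eq  = {B_eq/1e12:.1f} THz")   # = (pi^2/3)*f_c, about 20 THz
```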
Exercise 7.8 Derive an expression for the autocorrelation function of the Johnson–
Nyquist noise in the quantum mechanical regime.

8 Recall that 1 THz = 10¹² Hz.



7.6 Other Noise Sources

In addition to thermal noise, there are other physical mechanisms that generate noise
in Nature in general, and in electronic circuits, in particular. We will only provide short
descriptions here. The interested reader is referred to the course notes “Fundamentals
of Noise Processes” by Y. Yamamoto in the following link: http://www.nii.ac.jp/qis/first-quantum/e/forStudents/lecture/index.html.
These notes contain a very detailed and comprehensive account of many more topics
that evolve around the physics of noise processes in electronic circuitry and other
systems.
Flicker Noise
Flicker noise, also known as 1/ f noise, is a random process with a spectrum that falls
off steadily into the higher frequencies. It occurs in almost all electronic devices, and
results from a variety of effects, though always related to a direct current. According
to the underlying theory, there are fluctuations in the conductivity due to the superpo-
sition of many independent thermal processes of alternate excitation and relaxation
of certain defects (e.g., dopant atoms or vacant lattice sites). This means that every
once in a while, a certain lattice site or a dopant atom gets excited and it moves into
a state of higher energy for some time, and then it relaxes back to the lower energy
state until the next excitation. Each one of these excitation/relaxation processes can
be modeled as a random telegraph signal (RTS) with a different time constant θ (due
to different physical/geometric characteristics) and hence contributes a Lorentzian
spectrum parametrized by θ. The superposition of these processes, whose spectrum
is given by the integral of the Lorentzian function over a range of values of θ (with
a certain weight), gives rise to the 1/ f behavior over a wide range of frequencies.
To see this more concretely in the mathematical language, an RTS X (t) is given by
X (t) = (−1) N (t) , where N (t) is a Poisson process of rate λ. It is a binary signal where
the level +1 can symbolize excitation and the level −1 designates relaxation. Here
the dwell times between jumps are exponentially distributed. The autocorrelation
function is given by
 
⟨X(t)X(t + τ)⟩ = ⟨(−1)^{N(t)+N(t+τ)}⟩
             = ⟨(−1)^{N(t+τ)−N(t)}⟩
             = ⟨(−1)^{N(τ)}⟩
             = e^{−λτ} Σ_{k=0}^∞ (−1)^k (λτ)^k/k!
             = e^{−λτ} Σ_{k=0}^∞ (−λτ)^k/k!
             = e^{−2λτ}   (7.6.1)

and so the spectrum is Lorentzian:

S_X(ω) = F{e^{−2λ|τ|}} = 4λ/(ω² + 4λ²) = 2θ/[1 + (ωθ)²],   (7.6.2)

where the time constant is θ = 1/(2λ) and the cutoff frequency is ω_c = 2λ. Now,
calculating the integral

∫_{θ_min}^{θ_max} dθ · g(θ) · 2θ/[1 + (ωθ)²]

with g(θ) = 1/θ yields a composite spectrum that is proportional to

(1/ω) tan⁻¹(ωθ_max) − (1/ω) tan⁻¹(ωθ_min).

For ω ≪ 1/θ_max, using the approximation tan⁻¹(x) ≈ x (|x| ≪ 1), this is approx-
imately a constant. For ω ≫ 1/θ_min, using the approximation tan⁻¹(x) ≈ π/2 − 1/x
(|x| ≫ 1), this is approximately proportional to 1/ω². In between, in the range
1/θ_max ≪ ω ≪ 1/θ_min (assuming that 1/θ_max ≪ 1/θ_min), the behavior is according
to

(1/ω)[π/2 − 1/(ωθ_max)] − (1/ω) · ωθ_min = (1/ω)[π/2 − 1/(ωθ_max) − ωθ_min] ≈ π/(2ω),

which is the 1/ f behavior in this wide range of frequencies. There are several theories
why g(θ) should be inversely proportional to θ, but the truth is that they are not perfect,
and the issue of 1/ f noise is not yet perfectly (and universally) understood.
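The emergence of the 1/f range from this superposition can be checked numerically. The sketch below (not from the text; θ_min and θ_max are arbitrary, widely separated values) evaluates the integral over θ with g(θ) = 1/θ and prints ωS(ω), which is approximately constant (≈ π) throughout the wide intermediate range, i.e., S(ω) ∼ 1/ω there.

```python
# Superposition of Lorentzians 2*theta/(1+(omega*theta)^2) with weight 1/theta.
import numpy as np

theta_min, theta_max = 1e-6, 1e2
theta = np.logspace(np.log10(theta_min), np.log10(theta_max), 200001)

def composite(omega):
    f = (1.0 / theta) * 2 * theta / (1 + (omega * theta) ** 2)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(theta))   # trapezoidal rule

for omega in (1e-4, 1e-1, 1e1, 1e3, 1e8):
    S = composite(omega)
    print(f"omega={omega:9.1e}   S={S:.3e}   omega*S={omega*S:.3e}")
# omega*S stays near pi for 1/theta_max << omega << 1/theta_min (the 1/f range),
# while it is much smaller in the flat regime (small omega) and in the 1/omega^2
# regime (large omega).
```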
Shot Noise
Shot noise in electronic devices consists of unavoidable random statistical fluctu-
ations of the electric current in an electrical conductor. Random fluctuations are
inherent when current flows, as the current is a flow of discrete charges (electrons).
First, some background on Poisson processes: a Poisson process {N (t)}t≥0 is a
continuous–time counting process, starting from N (0) = 0 and incremented by 1 at
random time instants T1 , T2 , . . .. The number of events N (t), counted up to time t,
is distributed according to

Pr{N(t) = k} = e^{−λt}(λt)^k/k!,   k = 0, 1, 2, . . .   (7.6.3)
and events counted at non–overlapping time intervals are statistically independent.
Thus, over a total time interval of t0 seconds, the joint probability of N (t0 ) = k
together with counting event times within [τ1 , τ1 + dτ1 ] × . . . × [τk , τk + dτk ] is
given by

Pr{T_1 ∈ [τ_1, τ_1 + dτ_1), . . . , T_k ∈ [τ_k, τ_k + dτ_k), N(t_0) = k}   (7.6.4)
= f(τ_1, . . . , τ_k, N(t_0) = k) dτ_1 · · · dτ_k   (7.6.5)
= e^{−λτ_1} · λdτ_1 · e^{−λ(τ_2−τ_1)} · λdτ_2 · · · λdτ_k · e^{−λ(t_0−τ_k)}
= e^{−λt_0} λ^k · dτ_1 · · · dτ_k.   (7.6.6)

Therefore, by the Bayes theorem

f(τ_1, . . . , τ_k | N(t_0) = k) = e^{−λt_0}λ^k / [e^{−λt_0}(λt_0)^k/k!]   (7.6.7)
                              = k! · (1/t_0)^k.   (7.6.8)

Consider k independent random variables, Θ_1, . . . , Θ_k, all uniformly distributed
within the interval [0, t_0]. Their joint density is, of course, (1/t_0)^k, which is similar
to the above except for the factor k!. But T_1, T_2, . . . , T_k are ordered in increasing order,
whereas Θ_1, Θ_2, . . . , Θ_k are not necessarily so. One can think of T_1, T_2, . . . , T_k as the
result of ordering Θ_1, Θ_2, . . . , Θ_k in increasing order, and since there are k! possible
orderings, this gives rise to the factor k! in f(τ_1, . . . , τ_k | N(t_0) = k). This means that
one can simulate a Poisson process {N(t), 0 ≤ t < t_0} as follows: (i) First, randomly
select k according to the Poisson distribution (7.6.3). (ii) Draw Θ_1, . . . , Θ_k indepen-
dently and uniformly at random over [0, t_0]. (iii) Sort Θ_1, Θ_2, . . . , Θ_k in increasing
order to obtain T_1, T_2, . . . , T_k. (iv) Let N(t) = Σ_{i=1}^k u(t − T_i) = Σ_{i=1}^k u(t − Θ_i)
for t ∈ [0, t_0).
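Steps (i)–(iv) translate directly into code. A minimal implementation sketch (not from the text; the function name and parameter values are of course arbitrary):

```python
# Simulate a Poisson process on [0, t0) via the recipe (i)-(iv).
import numpy as np

def simulate_poisson_process(lam, t0, rng=np.random.default_rng()):
    k = rng.poisson(lam * t0)                  # (i) number of events in [0, t0)
    jumps = np.sort(rng.uniform(0.0, t0, k))   # (ii)+(iii) uniform times, sorted
    def N(t):                                  # (iv) N(t) = sum_i u(t - T_i)
        return np.searchsorted(jumps, t, side='right')
    return jumps, N

jumps, N = simulate_poisson_process(lam=5.0, t0=10.0)
print("number of events:", len(jumps), "  N(t0/2) =", N(5.0))
```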
Now, consider a DC current in a device, which shoots electrons according to a
Poisson process (e.g., a p-n junction of a diode), i.e.,

I(t) = Σ_i i_e(t − T_i),   (7.6.9)

where i_e(·) is the (very short) current pulse generated by the passage of a single
electron.9 The DC current is simply the average of this, which is λt_0 q_e/t_0 = λq_e = I_0.
The noise, which is associated with the fluctuations around this average, is given by
the second order statistics. Neglecting edge effects, and denoting by K = N(t_0) the
total number of electrons emitted during [0, t_0], we have:

R(s) = E{I(t)I(t + s)} = E{E{I(t)I(t + s) | K}}   (7.6.10)
     = E{ Σ_{i=1}^{K} Σ_{j=1}^{K} i_e(t − Θ_i) i_e(t + s − Θ_j) }   (7.6.11)
     = E{ Σ_{i=1}^{K} (1/t_0) ∫_0^{t_0} i_e(t + s − θ) i_e(t − θ) dθ }   (7.6.12)
       + E{ Σ_{i≠j} (1/t_0²) ∫_0^{t_0} i_e(t + s − θ) dθ · ∫_0^{t_0} i_e(t − θ) dθ }   (7.6.13)
     = [E{K}/t_0] · R_e(s) + [E{K² − K}/t_0²] · q_e²   (7.6.14)
     = (λt_0/t_0) · R_e(s) + (λ²t_0²/t_0²) · q_e²   (7.6.15)
     = (I_0/q_e) · R_e(s) + I_0²,   (7.6.16)

where

R_e(s) = ∫_{−∞}^{+∞} i_e(t) i_e(t + s) dt.   (7.6.17)

9 Note that i_e(t) integrates to q_e, and since it is a very short pulse, it is nearly q_e δ(t) for a passage
at time t = 0.

Now, the second term, I_0², is the contribution of the pure DC component, i.e., the
(stationary) average current. The first term is the fluctuation noise. Note that for
i_e(t) = q_e δ(t), we have R_e(s) = q_e² δ(s), and so, the (flat) spectrum of the noisy part
is

S_shot(ω) = I_0 q_e,   (7.6.18)

or S(ω) = 2I0 qe for the single–sided spectrum. A few comments are in order:
1. By measuring the noise intensity in a diode, one can find qe experimentally.
2. In the derivation above, we assumed that i e (t) is proportional to the Dirac delta
function, which is an idealization. For a general pulse shape, Sshot (ω) would
become proportional to the Fourier transform of Re (s) as defined in (7.6.17).
Equivalently, this can be thought of as letting the white noise process derived
above undergo a linear filter whose impulse response is i e (t).
3. The result applies as long as I_0 is not too large. For a strong DC current I_0, there
is another effect that kicks in, namely, the space–charge effect: if a large bulk of
electrons cross at the same time, they create a space charge that interferes with
the emission of additional electrons, and this causes the shot noise spectral level
to be smaller than predicted by the above derivation.
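A quick Monte Carlo check of the shot–noise level (a sketch, not from the text; the rate λ and bin width are arbitrary): with i_e(t) ≈ q_eδ(t), the fluctuation part of the current has autocorrelation q_eI_0δ(s), so the current averaged over bins of width Δt should have variance q_eI_0/Δt.

```python
# Bin a Poisson electron stream and compare the binned-current variance with
# the flat shot-noise level q_e*I_0 spread over the bandwidth 1/dt.
import numpy as np

qe, lam, t0, dt = 1.602e-19, 1e15, 1e-3, 1e-9   # arbitrary rate, duration, bin width
rng = np.random.default_rng(2)

n_bins = int(t0 / dt)
counts = rng.poisson(lam * dt, size=n_bins)     # electrons per bin
I = qe * counts / dt                            # binned current
I0 = qe * lam

print("mean current  :", I.mean(), " (theory", I0, ")")
print("var of current:", I.var(), " (theory qe*I0/dt =", qe * I0 / dt, ")")
```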
Burst Noise
Burst noise consists of sudden step–like transitions between two or more levels (non-
Gaussian), as high as several hundred micro-volts, at random and unpredictable times.
Each shift in offset voltage or current lasts for several milliseconds, and the intervals
between pulses tend to be in the audio range (less than 100 Hz), leading to the term

popcorn noise for the popping or crackling sounds it produces in audio circuits. Burst
noise is customarily modeled as an RTS and therefore, another synonym for burst
noise is RTS noise. Accordingly, it has a Lorentzian spectrum, similarly as in (7.6.2):

1
Sburst (ω) ∝ , (7.6.19)
1 + (ω/ω0 )2

which means that the spectrum is nearly flat at low frequencies (compared to the
cutoff frequency ω0 ) and nearly proportional to 1/ω 2 for high frequencies.
Avalanche Noise
Avalanche noise is the noise produced when a junction diode is operated at the onset
of avalanche breakdown, a semiconductor junction phenomenon in which carriers
in a high voltage gradient develop sufficient energy to dislodge additional carriers
through physical impact, creating ragged current flows.

7.7 Suggestions for Supplementary Reading

Parts of the material in this chapter are based on Beck [1, Chaps. 6 and 9] and Reif
[2, Chap. 15]. For additional recommended reading, the reader is referred to van
Kampen [3], Risken [4], and Sethna [5, Chap. 10].

References

1. A.H.W. Beck, Statistical Mechanics, Fluctuations and Noise (Edward Arnold Publishers, London, 1976)
2. F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw-Hill, New York, 1965)
3. N.G. van Kampen, Stochastic Processes in Physics and Chemistry (North Holland, Amsterdam,
1992)
4. H. Risken, The Fokker–Planck Equation - Methods of Solution and Applications, 2nd edn.
(Springer, Berlin, 1989)
5. J.P. Sethna, Statistical Mechanics: Entropy, Order Parameters, and Complexity (Oxford University Press, Oxford, 2006)
Chapter 8
A Brief Touch on Information Theory∗

8.1 Introduction – What Is Information Theory About?

Our last topic in this book consists of a very brief description on the relation between
statistical physics and information theory, a research field pioneered by Claude
Elwood Shannon (1916–2001), whose seminal paper “A Mathematical Theory of
Communications” (1948) has established the corner–stone of this field.
In a nutshell, information theory is a science that focuses on the fundamental limits,
on the one hand, and the achievable performance, on the other hand, concerning
various information processing tasks, including most notably:
1. Data compression (lossless/lossy).
2. Error correction coding (coding for protection against errors due to channel noise).
3. Encryption.
There are also additional tasks of information processing that are considered to belong
under the umbrella of information theory, like: signal detection, estimation (parameter
estimation, filtering/smoothing, prediction), information embedding, process
simulation, extraction of random bits, information relaying, and more.
Core information theory, which is called Shannon theory in the jargon of the
professionals, is about coding theorems. It is associated with the development of
computable formulas that characterize the best performance that can possibly be
achieved in these information processing tasks under some (usually simple) assump-
tions on the probabilistic models that govern the data, the channel noise, the side
information, the jammers if applicable, etc. While in most cases, this theory does not
suggest constructive communication systems, it certainly provides insights concern-
ing the features that an optimal (or nearly optimal) communication system must have.
Shannon theory serves, first and foremost, as the theoretical basis for modern digital
communication engineering. That being said, much of the modern research activ-
ity in information theory evolves, not only around Shannon theory, but also on the
never-ending efforts to develop methodologies (mostly, specific code structures and


algorithms) for designing very efficient communication systems, which hopefully


come close to the bounds and the fundamental performance limits.
But the scope of information theory is not limited merely to communication
engineering: it plays a role also in computer science and many other disciplines,
one of which is thermodynamics and statistical mechanics, the focus of
this last chapter. Often, information–theoretic problems are well approached from a
statistical–mechanical point of view. We will get a very brief taste of this in two
example problems.
In this book, we will not delve into information theory too deeply, but our purpose
is merely to touch upon the interface between these two fields. A more advanced exposition,
which goes much deeper than our scope here, is provided in [1].

8.2 Entropy in Information Theory and Statistical Physics

Perhaps the first relation that crosses one’s mind is that in both fields there is a
fundamental notion of entropy. Actually, in information theory, the term entropy was
coined in the footsteps of the thermodynamic/statistical–mechanical entropy. Throughout
this book, we have already seen three (seemingly) different forms of the entropy: the
first is the thermodynamic entropy, defined in its differential form as

δS = δ Q/T, (8.2.1)

which was first introduced by Clausius in 1850. The second is the statistical entropy

S = k ln Ω,   (8.2.2)

which was defined by Boltzmann in 1872. The third is yet another formula for the
entropy – the Gibbs formula for the entropy of the canonical ensemble:

S = −k Σ_x P(x) ln P(x) = −k⟨ln P(x)⟩,   (8.2.3)

which we encountered in Chap. 2.


It is virtually impossible to miss the functional resemblance between the last form
above and the information–theoretic entropy, a.k.a. the Shannon entropy, which is
simply

H = −Σ_x P(x) log₂ P(x),   (8.2.4)

namely, exactly the same expression as above, just without the factor k and with the
base of the logarithm being 2 rather than e. Indeed, this clear analogy was recognized
already by Shannon and von Neumann. According to a well–known anecdote, von

Neumann advised Shannon to adopt this term because it would provide him with
“... a great edge in debates because nobody really knows what entropy is anyway.”
What is the information–theoretic meaning of entropy? It turns out that it has many
information–theoretic meanings, but the most fundamental one concerns optimum
compressibility of data. Suppose that we have a string of N i.i.d. random variables,
x1 , x2 , . . . , x N , taking values in a discrete set, say, the components of the microstate
in a quantum system of non–interacting particles, and we want to represent the
microstate information digitally (in bits) as compactly as possible, without losing
any information – in other words, we require the ability to fully reconstruct the data
from the compressed binary representation. How short can this binary representation
be?
Let us look at the following example. Suppose that each xi takes values in the set
{A, B, C, D}, independently with probabilities

P(A) = 1/2; P(B) = 1/4; P(C) = 1/8; P(D) = 1/8.

Clearly, when translating the letters into bits, the naive approach would be to say
the following: we have 4 letters, so it takes 2 bits to distinguish between them, by
mapping, say lexicographically, as follows:

A → 00; B → 01; C → 10; D → 11.

This would mean representing the list of x’s using 2 bits per symbol. This is very
simple. But is this the best thing one can do?
It turns out that the answer is negative. Intuitively, if we can assign variable–
length code-words to the various letters, using shorter code-words for more probable
symbols and longer code-words for the less frequent ones, we might be able to gain
something. In our example, A is most probable, while C and D are the least probable,
so how about the following solution:

A → 0; B → 10; C → 110; D → 111.

Now the average number of bits per symbol is:

(1/2) · 1 + (1/4) · 2 + (1/8) · 3 + (1/8) · 3 = 1.75.
We have improved the average bit rate by 12.5%. This is fine, but is this the best one
can do or can we improve even further?
It turns out that this time, the answer is affirmative. Note that in this solution,
each letter has a probability of the form 2^{−ℓ} (ℓ – a positive integer) and the length
of the assigned code-word is exactly ℓ (for A, ℓ = 1, for B, ℓ = 2, and for C
and D, ℓ = 3). In other words, the length of the code-word for each letter is the
negative logarithm of its probability, so the average number of bits per symbol is
Σ_{x∈{A,B,C,D}} P(x)[−log₂ P(x)], which is exactly the entropy H of the information

source. One of the basic coding theorems of information theory tells us that we
cannot compress to any coding rate below the entropy and still expect to be able to
reconstruct the x’s perfectly. But why is this true?
We will not get into a rigorous proof of this statement, but we will make an
attempt to give a statistical–mechanical insight into it. Consider the microstate x =
(x_1, . . . , x_N) and let us think of the probability function

P(x_1, . . . , x_N) = Π_{i=1}^{N} P(x_i) = exp{−(ln 2) Σ_{i=1}^{N} log₂[1/P(x_i)]}   (8.2.5)

as an instance of the canonical ensemble at inverse temperature β = ln 2, where the
Hamiltonian is additive, namely, E(x_1, . . . , x_N) = Σ_{i=1}^{N} ε(x_i), with ε(x_i) = −log₂ P(x_i).
Obviously, Z(β) = Z(ln 2) = 1, so the free energy is exactly zero here. Now,
by the weak law of large numbers, for most realizations of the microstate x,

(1/N) Σ_{i=1}^{N} ε(x_i) ≈ ⟨ε(x_i)⟩ = ⟨−log₂ P(x_i)⟩ = H,   (8.2.6)

so the average ‘internal energy’ is NH. It is safe to consider, instead, the corresponding
microcanonical ensemble, which is equivalent as far as macroscopic averages go. In
the microcanonical ensemble, we would then have:

(1/N) Σ_{i=1}^{N} ε(x_i) = H   (8.2.7)

for every realization of x. How many bits would it take us to represent x in this
microcanonical ensemble? Since all x’s are equiprobable in the microcanonical ensemble,
we assign to all x’s binary code-words of the same length, call it L. In order to
have a one–to–one mapping between the set of accessible x’s and binary strings of
representation, 2^L, which is the number of binary strings of length L, should be no
less than the number of microstates {x} of the microcanonical ensemble. Thus,

L ≥ log₂ |{x : Σ_{i=1}^{N} ε(x_i) = NH}| = log₂ Ω(NH),   (8.2.8)

but the r.h.s. is exactly related (up to a constant factor) to Boltzmann’s entropy
associated with ‘internal energy’ at the level of NH. Now, observe that the free
energy of the original, canonical ensemble, which is zero, is related to the entropy
ln Ω(NH) via the Legendre relation ln Z ≈ ln Ω − βE, which is

0 = ln Z(ln 2) ≈ ln Ω(NH) − NH ln 2   (8.2.9)



and so,

ln Ω(NH) ≈ NH ln 2   (8.2.10)

or

log₂ Ω(NH) ≈ NH,   (8.2.11)

and therefore, by (8.2.8):

L ≥ log₂ Ω(NH) ≈ NH,   (8.2.12)

which means that the length of the binary representation essentially cannot be less
than N H , namely, a compression rate of H bits per component of x. So, we have
seen that the entropy has a very concrete information–theoretic meaning, and in fact,
it is not the only one, but we will not delve into this any further here.
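The four–letter example above is easily reproduced in a few lines (a small sketch, not part of the text): the Shannon entropy of {1/2, 1/4, 1/8, 1/8} equals the average length of the code A→0, B→10, C→110, D→111.

```python
# Entropy vs. average code length for the four-letter example.
import math

p = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

H = -sum(px * math.log2(px) for px in p.values())
avg_len = sum(p[x] * len(code[x]) for x in p)
print(H, avg_len)   # both are 1.75 bits per symbol
```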

8.3 Statistical Physics of Optimum Message Distributions

We next study another, very simple paradigm of a communication system, studied
by Reiss [2] and Reiss and Huang [3]. The analogy and the parallelism to the basic
concepts of statistical mechanics, that were introduced earlier, will be quite evident
from the choice of the notation, which is deliberately chosen to correspond to that
of analogous physical quantities.
Consider a continuous–time communication system that includes a noiseless
channel, with capacity
C = lim_{E→∞} log₂ M(E)/E,   (8.3.1)

where M(E) is the number of distinct messages (and log2 of this is the number of
bits) that can be transmitted over a time interval of E seconds. Over a duration of E
seconds, L information symbols are conveyed, so that the average transmission time
per symbol is σ = E/L seconds per symbol. In the absence of any constraints on
the structure of the encoded messages, M(E) = r^L = r^{E/σ}, where r is the channel
input–output alphabet size. Thus, C = (log₂ r)/σ bits per second.
Consider now the thermodynamic limit of L → ∞. Suppose that the L symbols
of duration E form N words, where by ‘word’, we mean a certain variable–length
string of channel symbols. The average transmission time per word is then ε = E/N.
Suppose further, that the code defines a certain set of word transmission times:
word number i takes ε_i seconds to transmit. What is the optimum allocation of
word probabilities {P_i} that would support full utilization of the channel capacity?
Equivalently, given the probabilities {P_i}, what are the optimum transmission times
{ε_i}? For simplicity, we will assume that {ε_i} are all distinct. Suppose that each word

appears N_i times in the entire message. Denoting N = (N_1, N_2, . . .), P_i = N_i/N,
and P = (P_1, P_2, . . .), the total number of messages pertaining to a given N is

Ω(N) = N!/(Π_i N_i!) ≐ exp{N · H(P)}   (8.3.2)

where H(P) is the Shannon entropy pertaining to the probability distribution P.
Now,

M(E) = Σ_{N: Σ_i N_i ε_i = E} Ω(N).   (8.3.3)

This sum is dominated by the maximum term, namely, the maximum–entropy assign-
ment of relative frequencies

P_i = e^{−βε_i}/Z(β)   (8.3.4)

where β > 0 is a Lagrange multiplier chosen such that Σ_i P_i ε_i = ε, which gives

ε_i = −ln[P_i Z(β)]/β.   (8.3.5)

For β = 1, this is in agreement with our earlier observation that the optimum message
length assignment in variable–length lossless data compression is according to the
negative logarithm of the probability.
Suppose now that {ε_i} are kept fixed and consider a small perturbation in P_i,
denoted dP_i. Then

dε = Σ_i ε_i dP_i
   = −(1/β) Σ_i (dP_i) ln[P_i Z(β)]
   = −(1/β) Σ_i (dP_i) ln P_i − (1/β) Σ_i (dP_i) ln Z(β)
   = −(1/β) Σ_i (dP_i) ln P_i
   = [1/(kβ)] · d[−k Σ_i P_i ln P_i]
   = T ds,   (8.3.6)


where we have defined T = 1/(kβ) and s = −k i Pi ln Pi . The free energy per
particle is given by

f =  − T s = −kT ln Z , (8.3.7)

which is related to the redundancy of the code. In [2], there is also an extension of this
setting to the case where N is not fixed, with correspondence to the grand—canonical
ensemble.
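Numerically, the maximum–entropy assignment (8.3.4) amounts to solving a one–dimensional equation for β. A small sketch (not from the text; the transmission times and the target average are arbitrary values):

```python
# Solve for beta such that sum_i P_i*eps_i = eps, with P_i = exp(-beta*eps_i)/Z(beta).
import numpy as np
from scipy.optimize import brentq

eps_i = np.array([1.0, 2.0, 3.0, 5.0])    # word transmission times [s]
eps_target = 2.0                          # desired average time per word

def mean_time(beta):
    w = np.exp(-beta * eps_i)
    return np.sum(eps_i * w) / np.sum(w)

beta = brentq(lambda b: mean_time(b) - eps_target, 1e-6, 50.0)
P = np.exp(-beta * eps_i); P /= P.sum()
print("beta =", beta, " P =", P, " <eps> =", np.sum(P * eps_i))
```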

8.4 Suggestions for Supplementary Reading

In addition to the references mentioned in this chapter, the reader is referred to
[1, 4], as well as many references therein, for a much deeper exposition of the
relation to information theory. We should also mention that Sect. 8.3 above is similar
to Sect. 3.1 of [1].

References

1. N. Merhav, Statistical physics and information theory. Found. Trends Commun. Inf. Theory
6(1–2), 1–212 (2009)
2. H. Reiss, Thermodynamic-like transformations in information theory. J. Stat. Phys. 1(1), 107–
131 (1969)
3. H. Reiss, C. Huang, Statistical thermodynamic formalism in the solution of information theory
problems. J. Stat. Phys. 3(2), 191–211 (1971)
4. M. Mézard, A. Montanari, Information, Physics, and Computation (Oxford University Press,
Oxford, 2009)
